Snowball Sampling

Snowball sampling is a method typically used with unknown or rare populations. Members of these populations have not all been previously identified and are more difficult to locate or contact than members of known populations (Coleman, 1958; Goodman, 1961; Spreen, 1992). Obtaining a sample from such a population typically rules out traditional random sampling methods, which require that the entire population be known (e.g., the population of students at a university). Instead, methods such as snowball sampling use the presumed social networks between members of a target population to build a sample (Fig. 1). Snowball sampling is more directed and purposeful than many other non-random sampling techniques, such as convenience sampling, which focuses only on the most easily identified and reachable members of a population. When carefully conducted, snowball sampling can provide comprehensive (though not generalizable) characterizations of unknown populations.

Figure 1. Example of a social network analysis diagram showing the linkages between people in a group. Source: Wikimedia Commons (http://commons.wikimedia.org/wiki/File:Sna_large.png).

For a 2009 survey of Landsat users in the United States (U.S.) conducted by the USGS Fort Collins Science Center’s Social and Economic Analysis (SEA) Branch, snowball sampling provided a way to identify otherwise unknown users of Landsat imagery. The population of Landsat users in the U.S. can be characterized as unknown because no list exists that contains contact information for every user in the U.S. from which to sample. There are many remote sensing, GIS, and other satellite imagery-related organizations, but none of them encompasses the entire population of users in the U.S. Additionally, several imagery providers retain lists of customers, but again those lists are not exhaustive. Although the results of the survey are not generalizable to the population as a whole, a non-probability sampling method such as snowball sampling was, in this case, the only means to gather information on the population. By using a novel approach to sampling and minimizing biases where possible, we obtained a very diverse sample from which we collected valuable baseline information and gained a better understanding of the population.

Conducting Snowball Sampling

The snowball sampling process is relatively simple. In the same way that a snowball rolled in the snow picks up more and more flakes with each turn, snowball sampling is a multi-step process in which more and more people are added to the sample with each step. Typically, the initial step involves identifying a group of individuals who are known members of the population to create a “seed.” Often, the seed comprises an existing list (or lists) of members of the population, but these lists tend to be fairly homogeneous, such as the members of a professional organization. For Landsat users, a diverse seed group of confirmed users did not exist, so the first step was defining this group. For large, unidentified populations such as Landsat users, initial contacts can be found in many different ways. For this study, we used a mix of approaches in order to identify a broad variety of users. First, we conducted a Web search for potential professional users of moderate-resolution imagery in the U.S. on a state-by-state basis. Because the survey was to be conducted online, the email addresses of potential users were the only contact information collected. Second, during the Web search, we recorded the contact information for professional organizations related to remote sensing and GIS. These organizations, ranging from local to national, were then asked to provide their membership lists (if appropriate) or to send our snowball sampling request for contacts to their members.

Next, we contacted these potential users to confirm their use of moderate-resolution imagery and to elicit the names of up to three other moderate-resolution imagery users. Those who confirmed their use composed the seed for this snowball sampling effort. The contacts provided by the seed produced the first wave of individuals, who in turn were contacted, thus providing the second wave (Fig. 2). Contacting the second wave produced the third wave, which in turn produced the fourth wave, and so on. We concluded the snowball sampling after six waves and later contacted the identified users via email with an invitation to participate in the survey itself.
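The wave-by-wave growth described above can be sketched as a breadth-first traversal of a referral network. This is a minimal illustration, not the procedure used in the Landsat study: the `network` dictionary is a hypothetical stand-in for the contacts each person would name if asked, and the sketch ignores nonresponse and the confirmation step, treating everyone named as a new sample member. The parameters `max_referrals=3` and `max_waves=6` mirror the "up to three" contacts and six waves mentioned in the text.

```python
from typing import Dict, List, Set


def snowball_sample(
    network: Dict[str, List[str]],
    seed: List[str],
    max_referrals: int = 3,
    max_waves: int = 6,
) -> List[Set[str]]:
    """Return the seed followed by each successive wave of newly identified people.

    `network` is a hypothetical mapping from each person to the contacts
    they would name if asked (assumed to always respond).
    """
    sampled: Set[str] = set(seed)
    waves: List[Set[str]] = [set(seed)]
    current = set(seed)
    for _ in range(max_waves):
        next_wave: Set[str] = set()
        for person in sorted(current):  # sorted only for deterministic results
            # Each participant names at most `max_referrals` contacts.
            for contact in network.get(person, [])[:max_referrals]:
                if contact not in sampled:  # people already sampled are not re-added
                    next_wave.add(contact)
                    sampled.add(contact)
        if not next_wave:  # no new contacts: the snowball has stopped growing
            break
        waves.append(next_wave)
        current = next_wave
    return waves


# Hypothetical referral network: "A" knows four users, but only the first
# three are requested, so "E" is never reached.
network = {
    "A": ["B", "C", "D", "E"],
    "B": ["F"],
    "C": ["F", "G"],
    "D": [],
    "F": ["H"],
    "G": ["A"],
    "H": [],
}
waves = snowball_sample(network, seed=["A"])
for i, wave in enumerate(waves):
    print(i, sorted(wave))
```

Here wave 0 is the seed, and the sample grows through three waves before no new names appear. Note how the referral cap keeps "E" out of the sample, which is one way capping trades coverage for protection against people with very large networks dominating the referrals.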

Figure 2. Snowball sampling process wherein each wave increases the sample size.

There are a number of parameters that need to be determined when designing snowball sampling, including:

  • Number of waves. The number of waves may or may not be determined beforehand. Often, the process ends when the waves cease to produce a predetermined number of new contacts. In small populations, it may take only a few waves before almost no new contacts are obtained in a wave, whereas larger populations may require more waves. For this study, we considered the sampling complete when fewer than three percent of those contacted in a wave responded.

  • Number of contacts to request. Participants are often asked for three contacts, partly to minimize the burden on the respondent but also to minimize the potentially biasing impact of participants with very large social networks. We followed this approach and asked for contact information for no more than three other moderate-resolution imagery users.

  • Criteria for including a participant in the sample. Inclusion is usually based on the number of times a person is identified by others as a member of the population. For some populations, it is appropriate to include a participant after only one mention, while for others, it may require two or three mentions. For this study, we used self-confirmation as our criterion for inclusion: if a participant identified themselves as a user of moderate-resolution imagery, we included them in the sample.
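The stopping rule and the two inclusion rules listed above can be written as small functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, the 3 percent threshold comes from the text, the mention-count rule simply counts how many times each name appears in a list of referrals (it does not check whether the same referrer named someone twice), and the wave counts in the usage lines are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def wave_response_rate(contacted: int, responded: int) -> float:
    """Fraction of the people contacted in a wave who responded."""
    return responded / contacted if contacted else 0.0


def should_stop(contacted: int, responded: int, threshold: float = 0.03) -> bool:
    """Stopping rule: end sampling when fewer than `threshold` of a wave respond."""
    return wave_response_rate(contacted, responded) < threshold


def include_by_mentions(referrals: Iterable[str], min_mentions: int) -> Set[str]:
    """Mention-count rule: include anyone named at least `min_mentions` times."""
    counts = Counter(referrals)
    return {person for person, n in counts.items() if n >= min_mentions}


def include_by_self_confirmation(responses: Dict[str, bool]) -> Set[str]:
    """Self-confirmation rule: include anyone who reports using the imagery."""
    return {person for person, uses_imagery in responses.items() if uses_imagery}


# Hypothetical wave: 5 of 200 contacted responded (2.5 percent), so sampling stops.
print(should_stop(contacted=200, responded=5))
# Hypothetical referrals: B is named three times, C twice, D once.
print(sorted(include_by_mentions(["B", "C", "B", "D", "C", "B"], min_mentions=2)))
print(include_by_self_confirmation({"B": True, "C": False}))
```

The mention-count rule trades sample size for confidence that a person truly belongs to the population; self-confirmation, as used in this study, accepts anyone who says they belong, which is reasonable when the defining trait (using moderate-resolution imagery) is not stigmatized.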

Advantages of Snowball Sampling

The primary advantage of snowball sampling is its success in identifying individuals from unknown (and potentially very large) populations beyond any known segments of a given population. In the case of the entire population of Landsat users, a random sample is not possible, and snowball sampling provided a way to identify users who otherwise might not have been included in the sample. Another advantage is that a sample can be produced quickly and cost-effectively, particularly when it is completed on the Web. For our study, the costs to create the sample consisted of the time spent searching the Web, emailing potential users, and managing the contacts database. Contacting potential users via email also significantly reduced individual response times, as well as the time needed between contacts.

Challenges of Snowball Sampling

There are several challenges inherent in snowball sampling, foremost being that snowball sampling does not yield a random sample. Thus, the results from a study using a snowball sample are not generalizable to the population under study. However, when a population is unknown and little information about it is available, snowball sampling can provide a better understanding and more complete characterization of that population. In the case of Landsat users, the few studies that have been conducted focused only on known segments of the population, such as members of professional organizations. One of the goals of this study was to reach as diverse an array of users as possible, including those who may have been excluded from previous studies, and snowball sampling provided a means to do so.

Biases

Snowball sampling does not produce a random sample because of the potential biases present in the process. The initial seed may introduce bias at the outset because the people who make up the seed are typically selected via a convenience sample. Volunteerism bias frequently exists, both in the seed and in subsequent waves. Masking is also common in at-risk or stigmatized populations (e.g., drug users, people with HIV), where people may not want to reveal that their acquaintances are members of that population, though masking can occur in any population. Additionally, personal network size influences the chances that a person will be included in the sample. Members of the population with the largest networks and highest social visibility are more likely to be referred (Biernacki and Waldorf, 1981; Henslin, 1972). Conversely, those with small networks, or isolated individuals, can be omitted from the sample because they are less likely to be mentioned by another member of the population (Van Meter, 1990).
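The effect of network size on the chance of being referred can be made concrete with a simplified model. This sketch is an illustration, not an analysis from the study: it assumes a hypothetical, symmetric acquaintance network in which every member is asked and each names exactly one contact, chosen uniformly at random and independently of the others. (The study asked for up to three contacts; one is used here to keep the arithmetic simple.) Under those assumptions, the probability that a person is named by at least one acquaintance is one minus the product, over their acquaintances, of the chance that each acquaintance picks someone else.

```python
from math import prod
from typing import Dict, Set


def prob_named(network: Dict[str, Set[str]], person: str) -> float:
    """Probability that `person` is named by at least one acquaintance.

    Assumes a symmetric network in which every member is asked and names ONE
    contact chosen uniformly at random from their ties, independently.
    """
    # Acquaintance i misses `person` with probability 1 - 1/deg(i).
    misses = [1 - 1 / len(network[i]) for i in network[person]]
    return 1 - prod(misses)  # prod([]) == 1, so an isolated person gets 0.0


# Hypothetical network: H is a well-connected hub, C and D know only H,
# and I is isolated.
network = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
    "I": set(),
}
for person in sorted(network):
    print(f"{person}: {prob_named(network, person):.3f}")
# A: 0.625
# B: 0.625
# C: 0.250
# D: 0.250
# H: 1.000
# I: 0.000
```

Even in this small example, the hub is certain to be named, peripheral members have a one-in-four chance, and the isolated member can never be referred, which is the pattern described by Biernacki and Waldorf (1981) and Van Meter (1990).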

Minimizing Biases

Some of these biases can be minimized, though not entirely removed, by taking a few extra steps suggested in the literature, such as:

  • Obtaining a large sample size (Atkinson and Flint, 2001; Tsvetovat and Sharabati, 2006). We eventually collected a sample of over 4,500 satellite imagery users, of whom over 2,500 responded to the survey.

  • Relying on a variety of indirect sources to develop a seed (Blanken and others, 1992; Faugier and Sargeant, 1997). By conducting a Web search and contacting professional organizations, we created the sample from a diversity of sources. The results from the snowball sampling and the survey show that we reached a very diverse group of users from all sectors, who apply the imagery in many different areas.

  • Reaching isolated members of the population. We addressed this issue by creating a new list of potential users through the Web search and by requesting contact information for no more than three users to avoid biasing the sample toward people with large personal networks. The results from the snowball sampling and the survey indicate that we were successful in reaching isolated individuals. During the snowball sampling, over half of the participants stated that they did not know any other moderate-resolution imagery users. While some of these responses were probably the result of masking, the majority were most likely true responses because this is not a stigmatized population. The survey results revealed that around half of the Landsat users were not members of any professional organization, indicating that we reached many members of the population who would have been excluded by contacting only those organizations.

For more information about snowball sampling, please see the references listed below.  Additional information and references are also available in Appendix A of the full report.

References

Atkinson, R., and Flint, J., 2001, Accessing hidden and hard-to-reach populations—Snowball research strategies: University of Surrey Social Research Update, v. 33. Accessed on January 2, 2007, at http://sru.soc.surrey.ac.uk/SRU33.html.

Berg, S., 1988, Snowball sampling, in Kotz, S., and Johnson, N.L., eds., Encyclopedia of Statistical Sciences, v. 8: p. 528-532.

Biernacki, P., and Waldorf, D., 1981, Snowball sampling—Problems and techniques of chain referral sampling: Sociological Methods & Research, v. 10, p. 141-163.

Blanken, P., Hendricks, V.M., and Adriaans, N.F.P., 1992, Snowball sampling—Methodological analysis, in Hendricks, V.M., Blanken, P., and Adriaans, N.F.P., eds., Snowball sampling—A pilot study on cocaine use: Rotterdam, IVO, p. 83-100.

Coleman, J.S., 1958, Relational analysis—The study of social organizations with survey data: Human Organization, v. 17, p. 28-36.

Faugier, J., and Sargeant, M., 1997, Sampling hard to reach populations: Journal of Advanced Nursing, v. 26, p. 790-797.

Goodman, L.A., 1961, Snowball sampling: The Annals of Mathematical Statistics, v. 32, no. 1, p. 148-170.

Henslin, J.M., 1972, Studying deviance in four settings—Research experiences with cabbies, suicide, drug users, and abortionees, in Douglas, J., ed., Research on Deviance: New York, Random House, p. 35-70.

Lunsford, T.R., and Lunsford, B.R., 1995, The research sample, Part I—Sampling: Journal of Prosthetics and Orthotics, v. 7, no. 3, p. 105-112. Accessed on January 4, 2007, at http://www.oandp.org/jpo/library/1995_03_105.asp.

Snijders, T., 1992, Estimation on the basis of snowball samples—How to weight: Bulletin de Méthodologie Sociologique, v. 36, p. 59-70.

Spreen, M., 1992, Rare populations, hidden populations and link-tracing designs—What and why?: Bulletin de Méthodologie Sociologique, v. 36, p. 34-58.

Thompson, S., 1997, Adaptive sampling in behavioral surveys, in Harrison, L., and Hughes, A., eds., The validity of self-reported drug use—Improving the accuracy of survey estimates: Rockville, MD, National Institute on Drug Abuse, NIDA Research Monograph 167, p. 296-319.

Tsvetovat, M., and Sharabati, W., 2006, CSS 692—Social Network Analysis: LifeJournal, Fall, p. 1-20. Accessed on January 4, 2007, at http://www.academic2.american.edu/~sharabat/files/SNA_Problem_Set2.pdf.

Van Meter, K.M., 1990, Methodological and design issues—Techniques for assessing the representatives of snowball samples, in Lambert, E.Y., ed., The collection and interpretation of data from hidden populations: Rockville, MD, National Institute on Drug Abuse, NIDA Research Monograph 98, p. 31-43.

Vogt, W.P., 1999, Dictionary of statistics and methodology—A Nontechnical Guide for the Social Sciences: London, Sage.

Watters, J.K., and Biernacki, P., 1989, Targeted sampling—Options for the study of hidden populations: Social Problems, v. 36, p. 416-430.