Snowball sampling is a method typically used with unknown or rare populations. Members of these populations have not all been previously identified and are more difficult to locate or contact than members of known populations (Coleman, 1958; Goodman, 1961; Spreen, 1992). Sampling from such a population typically rules out traditional random sampling methodologies, which require that the entire population be known (e.g., the population of students at a university). Instead, methodologies such as snowball sampling employ the presumed social networks that exist between members of a target population to build a sample (Fig. 1). Snowball sampling is more directed and purposeful than many other non-random sampling techniques, such as convenience sampling, which focuses only on the most easily identified and reachable members of a population. When carefully conducted, snowball sampling can provide comprehensive (though not generalizable) characterizations of unknown populations.
For a survey of Landsat users in the United States, conducted by the USGS Fort Collins Science Center’s Policy Analysis and Science Assistance Branch (PASA), snowball sampling provided a way to identify otherwise unknown users of Landsat imagery. The population of Landsat users in the U.S. can be characterized as "unknown" because there is no list from which to sample that contains contact information for every user in the U.S. There are many remote sensing, GIS, and other satellite-imagery-related organizations, but none includes the entire population of users in the U.S. Additionally, several imagery providers retain lists of customers, but those lists are not exhaustive either. Though the results of the survey are not generalizable to the population as a whole, a non-probability sampling method such as snowball sampling was, in this case, the only means to gather information on the population. By using a novel approach to sampling and minimizing biases where possible, we obtained a very diverse sample from which we collected valuable baseline information and reached a better understanding of the population.
The snowball sampling process is relatively simple. In the same way that a snowball rolled in the snow will pick up more and more flakes with each turn, snowball sampling is a multi-step process in which more and more people are added to the sample with each step. Typically, the initial step involves identifying a group of individuals who are known members of the population to create a “seed.” Often, the seed comprises an existing list (or lists) of members of the population, but these lists tend to be fairly homogeneous, such as the members of a professional organization. For Landsat users, a diverse seed group of confirmed users did not exist, so the first step was defining this group. For large, unidentified populations such as Landsat users, initial contacts can be found in many different ways. For this study, we used a mix of approaches in order to identify a broad variety of users. First, we conducted a Web search for potential professional users of moderate-resolution imagery in the U.S. on a state-by-state basis. Email addresses were the only contact information collected for potential users because the survey was to be conducted online. Second, during the Web search, we recorded the contact information for professional organizations related to remote sensing and GIS. These organizations, ranging from local to national, were then asked to provide their membership lists (if appropriate) or to send our snowball sampling request for contacts to their members.
Next, we contacted these potential users to confirm their use of moderate-resolution imagery and to elicit the names of up to three other moderate-resolution imagery users. Those who confirmed their use composed the seed for this snowball sampling effort. The contacts provided by the seed produced the first wave of individuals, who in turn were contacted, thus providing the second wave (Fig. 2). The outcome from contacting the second wave produced the third wave, which contributed to the fourth wave, and so on. We concluded the snowball sampling after six waves and contacted the identified users later via email with an invitation to participate in the survey itself.
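The wave-by-wave referral process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used for the survey: the function name `snowball_sample`, the `get_referrals` callback, and the toy referral network are all hypothetical. The defaults mirror the limits described in the text (up to three referrals per person, six waves), with the seed counted as wave 0.

```python
def snowball_sample(seed, get_referrals, max_waves=6, max_referrals=3):
    """Return a dict mapping each identified person to the wave in which
    they were first identified (0 = seed). Hypothetical sketch."""
    wave_of = {person: 0 for person in seed}
    current = list(seed)
    for wave in range(1, max_waves + 1):
        next_wave = []
        for person in current:
            # Each person may name only a limited number of contacts.
            for contact in get_referrals(person)[:max_referrals]:
                if contact not in wave_of:  # skip people already identified
                    wave_of[contact] = wave
                    next_wave.append(contact)
        if not next_wave:  # no new names: the snowball has stopped growing
            break
        current = next_wave
    return wave_of


# Hypothetical referral network: who each person would name, in order.
referrals = {
    "ana": ["ben", "cy", "dee", "eli"],  # "eli" is cut off by the cap of three
    "ben": ["ana", "fay"],
    "cy": [],
    "dee": ["gus"],
    "fay": ["gus", "hal"],
    "gus": [],
    "hal": ["ana"],
}

sample = snowball_sample(["ana"], lambda person: referrals.get(person, []))
```

In this toy network the process ends after three waves because no new names are produced, and "eli" is never identified because the only person who knows them has already used all three referrals. The sketch tracks identification only; it does not model non-response or the confirmation step.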
There are a number of parameters that need to be determined when designing snowball sampling, including:
The primary advantage of snowball sampling is its success in identifying individuals from unknown (and potentially very large) populations beyond any known segments of the population. In the case of the population of Landsat users, a random sample is not possible, and snowball sampling provided a way to identify users who otherwise might not have been included in the sample. Another advantage is that a sample can be produced quickly and cost-effectively, particularly when it is completed on the Web. For our study, the costs to create the sample consisted of the time spent searching the Web, emailing potential users, and managing the contacts database. Contacting potential users via email also significantly reduced individual response times, as well as the time needed between contacts.
There are several challenges inherent in snowball sampling, foremost being that snowball sampling does not yield a random sample. Thus, the results from a study using a snowball sample are not generalizable to the population under study. However, when a population is unknown and there is little information available about it, snowball sampling can provide a better understanding and more complete characterization of a population. In the case of Landsat users, the few studies that have been conducted focused on only known segments of the population, such as members of professional organizations. One of the goals of this study was to reach as diverse an array of users as possible, including those who may have been excluded from previous studies, and snowball sampling provided a means to do so.
Snowball sampling does not produce a random sample because of the potential biases present in the process. The initial seed may introduce bias at the beginning because the people who make up the seed are typically selected via a convenience sample. Volunteerism bias frequently exists, both in the seed and in subsequent waves. Masking is also common in at-risk or stigmatized populations (e.g., drug users, people with HIV), where people may not want to reveal that their acquaintances are members of that population, though masking can occur in any population. Additionally, personal network size influences the chances that a person will be included in the sample. Members of the population with the largest networks and highest social visibility are more likely to be referred (Biernacki and Waldorf, 1981; Henslin, 1972). Conversely, those with small networks, or isolated individuals, can be omitted from the sample because they are less likely to be mentioned by another member of the population (Van Meter, 1990).
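The effect of personal network size can be illustrated with a simple calculation. The sketch below rests on a simplifying assumption that is not from the study: every member names up to `max_referrals` of their acquaintances, chosen uniformly at random and independently of everyone else. Under that assumption, the chance that a person is named by at least one acquaintance is one minus the product of the chances that each acquaintance fails to name them. The function name and the toy network are hypothetical.

```python
from math import prod


def referral_probability(network, person, max_referrals=3):
    """Probability that `person` is named by at least one acquaintance,
    assuming each acquaintance names up to `max_referrals` of their own
    acquaintances uniformly at random (an illustrative assumption)."""
    p_not_named = prod(
        # An acquaintance with d contacts names `person` with
        # probability min(max_referrals, d) / d.
        1 - min(max_referrals, len(network[friend])) / len(network[friend])
        for friend in network[person]
    )
    return 1 - p_not_named


# Hypothetical symmetric acquaintance network.
network = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a", "c"},
    "c": {"hub", "b"},
    "loner": set(),
}

# One referral per person keeps the toy example small.
chances = {p: referral_probability(network, p, max_referrals=1) for p in network}
```

With one referral per person, the two members with three acquaintances ("hub" and "b") are named with probability 5/6, the two members with two acquaintances with probability 5/9, and the isolated member with probability 0. Real referral behavior is not uniform or independent, so these figures only illustrate the direction of the bias.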
Some of these biases can be minimized, though not entirely removed, by taking a few extra steps suggested in the literature, such as:
The references below provide more in-depth information about snowball sampling. Additional information and references are also available in Appendix A of the Executive Report.
Atkinson, R., and J. Flint. 2001. Accessing hidden and hard-to-reach populations: Snowball research strategies. University of Surrey Social Research Update 33. Accessed on January 2, 2007, at http://sru.soc.surrey.ac.uk/SRU33.html.
Berg, S. 1988. Snowball sampling. Pp. 528-532 in S. Kotz and N.L. Johnson, eds. Encyclopedia of Statistical Sciences, Vol. 8.
Biernacki, P., and D. Waldorf. 1981. Snowball sampling: Problems and techniques of chain referral sampling. Sociological Methods & Research 10: 141-163.
Blanken, P., V.M. Hendricks, and N.F.P. Adriaans. 1992. Snowball sampling: Methodological analysis? Pp. 83-100 in V.M. Hendricks, P. Blanken, and N.F.P. Adriaans, eds. Snowball sampling: A pilot study on cocaine use. Rotterdam: IVO.
Coleman, J.S. 1958. Relational analysis: The study of social organizations with survey methods. Human Organization 17: 28-36.
Faugier, J., and M. Sargeant. 1997. Sampling hard to reach populations. Journal of Advanced Nursing 26: 790-797.
Goodman, L.A. 1961. Snowball sampling. The Annals of Mathematical Statistics 32(1): 148-170.
Henslin, J.M. 1972. Studying deviance in four settings: Research experiences with cabbies, suicide, drug users, and abortionees. Pp. 35-70 in J. Douglas, ed. Research on deviance. New York: Random House.
Lunsford, T.R., and B.R. Lunsford. 1995. The research sample, Part I: Sampling. Journal of Prosthetics and Orthotics 7(3): 105-112. Accessed on January 4, 2007, at http://www.oandp.org/jpo/library/1995_03_105.asp.
Snijders, T. 1992. Estimation on the basis of snowball samples: How to weight. Bulletin de Méthodologie Sociologique 36: 59-70.
Spreen, M. 1992. Rare populations, hidden populations and link-tracing designs: What and why? Bulletin de Méthodologie Sociologique 36: 34-58.
Thomson, S. 1997. Adaptive sampling in behavioral surveys. Pp. 296-319 in L. Harrison and A. Hughes, eds. The validity of self-reported drug use: Improving the accuracy of survey estimates. NIDA Research Monograph 167. Rockville, MD: National Institute on Drug Abuse.
Tsvetovat, M., and W. Sharabati. 2006. CSS 692: Social network analysis. LifeJournal Fall: 1-20. Accessed on January 4, 2007, at http://www.academic2.american.edu/~sharabat/files/SNA_Problem_Set2.pdf.
Van Meter, K.M. 1990. Methodological and design issues: Techniques for assessing the representatives of snowball samples. Pp. 31-43 in E.Y. Lambert, ed. The collection and interpretation of data from hidden populations. NIDA Research Monograph 98. Rockville, MD: National Institute on Drug Abuse.
Vogt, W.P. 1999. Dictionary of statistics and methodology: A nontechnical guide for the social sciences. London: Sage.
Watters, J.K., and P. Biernacki. 1989. Targeted sampling: Options for the study of hidden populations. Social Problems 36: 416-430.