Statistical Interpretation

Where data are compared in the results from the Landsat user surveys, chi-square and t-test statistics are reported if they are significant (p < 0.001) and have at least a small effect size. Occasionally, differences significant at p < 0.05 are reported if there is at least a small effect size. Because statistically significant differences are more likely to occur with large sample sizes, effect sizes are necessary to understand whether the differences are meaningful. For chi-square analyses, the effect sizes are phi (Φ) or Cramer’s V, and for t-tests, the effect size is Cohen’s d. The following guidelines are used to determine the magnitude of the effect size (Cohen, 1988, p. 25, 79):

Magnitude of Effect Size    Cramer’s V/phi    Cohen’s d
Small                       0.1               0.2
Medium                      0.3               0.5
Large                       0.5               0.8

Chi-Square (χ²)

Chi-square tests compare the expected and actual distribution of data across categories (e.g., gender, work sector). For instance, if you had a sample that was half female and half male and wanted to know whether the distribution of males and females within a work sector, such as government or private business, was the same as the overall distribution in the sample, you would use a chi-square test. The expected distribution for the sector would be 50% female and 50% male, but the actual distribution could be 60% male and 40% female. The chi-square statistic sums, over all categories, the squared difference between the actual and expected counts divided by the expected count. The greater the difference between the expected and actual distribution, the larger the chi-square statistic. Whether the difference is statistically significant (as shown by the p-value) depends on the size of the chi-square statistic and the number of categories compared (the degrees of freedom). Because the statistic is computed from counts, the same percentage difference produces a larger chi-square statistic in a larger sample.
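The 50/50 versus 60/40 example can be worked through in a few lines. This is a minimal sketch, not the analysis code used for the surveys; the function name and the sector sizes (100 and 1,000 people) are assumptions chosen for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical sector of 100 people: 60 males and 40 females observed,
# 50 of each expected from the overall 50/50 split in the sample.
chi2_small = chi_square_statistic([60, 40], [50, 50])

# The same 60/40 split in a hypothetical sector ten times as large.
chi2_large = chi_square_statistic([600, 400], [500, 500])

print(chi2_small, chi2_large)
```

The percentage split is identical in both cases, but the statistic is ten times larger for the larger sector, which is why the size of the sample matters when judging significance.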

T-tests

T-tests are used to determine whether the means (or averages) of a continuous variable (e.g., age, years of education) differ between two sets of measurements. They can be used to compare means of the same variable from two different groups of people (e.g., mean income of men versus mean income of women) or from the same group of people at two different times (e.g., mean weight before beginning a diet program versus mean weight after completing the diet program). T-tests take into account both the absolute difference between the means and the distribution of data within each group. Given the same absolute difference in means, the more the distributions of data from each group overlap, the less significant the difference is between them. T-statistics can be positive or negative. A positive t-statistic indicates that the mean of group 1 (or time 1) is larger than the mean of group 2 (or time 2), and a negative t-statistic indicates the opposite. Whether the difference is statistically significant is based on the size of the t-statistic and the number of people in the sample.
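The role of overlap and the sign of the statistic can be illustrated with a small calculation. This sketch assumes the two-group, unequal-variance (Welch) form of the t-statistic, which the text does not specify; the function name and data values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(group1, group2):
    """Two-group t-statistic: difference in means over its standard error."""
    standard_error = sqrt(variance(group1) / len(group1)
                          + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / standard_error

# Hypothetical groups whose means differ by 2, with little spread.
tight_1, tight_2 = [5, 6, 7, 8, 9], [3, 4, 5, 6, 7]
# Same difference in means, but more spread, so the groups overlap more.
wide_1, wide_2 = [3, 5, 7, 9, 11], [1, 3, 5, 7, 9]

print(t_statistic(tight_1, tight_2))  # less overlap, larger t
print(t_statistic(wide_1, wide_2))    # more overlap, smaller t
print(t_statistic(tight_2, tight_1))  # groups swapped, sign flips
```

The difference in means is the same in the first two calls; only the spread within each group changes the size of the t-statistic.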

P-value

Statistical significance for these test statistics is determined by the p-value. The p-value is the probability of obtaining a difference at least as large as the one observed if there were actually no difference in the population; a small p-value suggests that the difference is unlikely to be a chance finding. The p-value threshold is set before analysis begins and is based on the characteristics of the study. Typically, a threshold of 0.05 is used in social science research to indicate significance. This means that there is a 5% chance of incorrectly finding a significant difference when there actually is none. However, p-values are sensitive to sample size, and analysis of data from a large sample will often yield many significant differences. For this study, we decided on a threshold of 0.001, meaning there is one chance in a thousand that we will find a significant difference when there is none. Even with this conservative threshold, we felt calculating effect sizes was necessary to identify meaningful differences.
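The sensitivity of p-values to sample size can be seen by holding a small difference fixed and increasing the sample. The sketch below assumes a two-category chi-square test against an expected 50/50 split (one degree of freedom), for which the p-value can be computed with the standard library; the function names and the 52/48 counts are hypothetical.

```python
from math import erfc, sqrt

def chi_square_two_categories(count_a, count_b):
    """Chi-square statistic for two observed counts against a 50/50 split."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected

def chi_square_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

THRESHOLD = 0.001  # the significance threshold used in this study

# The same 52%/48% split at three hypothetical sample sizes.
p_values = {}
for count_a, count_b in [(52, 48), (520, 480), (5200, 4800)]:
    n = count_a + count_b
    p_values[n] = chi_square_p_value_df1(chi_square_two_categories(count_a, count_b))
    verdict = "significant" if p_values[n] < THRESHOLD else "not significant"
    print(n, p_values[n], verdict)
```

Only the largest sample falls below the 0.001 threshold, even though the split is 52/48 in every case.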

Effect Size

Effect sizes, or measures of association, indicate the strength of a difference or relationship largely independent of sample size. They demonstrate practical or meaningful differences, rather than simply statistical differences. Effect size can be thought of as a measurement of the amount of impact an independent variable has on a dependent variable (Murphy and Myors, 1998, p. 12). To return to our earlier example of gender and work sector, the effect size would reveal how strongly gender was associated with the sector in which a person worked. Effect sizes are generally reported as small, medium, or large. To illustrate what these levels mean in a practical sense, Cohen (1988, p. 25–27) provides the following examples, expressed in terms of Cohen’s d:

  • a small effect (0.2) = the difference in mean height between 15- and 16-year-old girls,
  • a medium effect (0.5) = the difference in mean height between 14- and 18-year-old girls, and
  • a large effect (0.8) = the difference in mean height between 13- and 18-year-old girls.

Following Cohen’s recommendations on the interpretation of effect size for behavioral and psychological studies (1988, p. 25), we consider a statistically significant measure with a small effect size or greater to indicate a meaningful difference for this study.
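As an illustration of how these effect sizes are obtained and classified, the sketch below computes Cramer's V (equal to phi for a 2 x 2 table) and Cohen's d using their standard formulas, then labels them with Cohen's thresholds. The function names, the gender-by-sector counts, and the "below small" label are assumptions made for this example, not part of the survey analysis.

```python
from math import sqrt
from statistics import mean, variance

def chi_square_independence(table):
    """Chi-square statistic and total count for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramer's V; for a 2 x 2 table this is the same as phi."""
    chi2, n = chi_square_independence(table)
    smaller_dimension = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * smaller_dimension))

def cohens_d(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_variance = ((n1 - 1) * variance(group1)
                       + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_variance)

def magnitude(value, small, medium, large):
    """Label an effect size using the thresholds supplied."""
    value = abs(value)
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "below small"  # hypothetical label for effects under the small threshold

# Hypothetical counts: rows are female/male, columns are government/private.
v_small_sample = cramers_v([[40, 60], [60, 40]])
v_large_sample = cramers_v([[400, 600], [600, 400]])  # same pattern, 10x the people
print(v_small_sample, v_large_sample, magnitude(v_small_sample, 0.1, 0.3, 0.5))

d = cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])  # hypothetical measurements
print(d, magnitude(d, 0.2, 0.5, 0.8))
```

Multiplying every count by ten leaves Cramer's V unchanged, whereas the chi-square statistic and its p-value would change considerably.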

References

Cohen, Jacob, 1988, Statistical power analysis for the behavioral sciences (2nd ed.): Hillsdale, N.J., Lawrence Erlbaum Associates, Inc.

Murphy, K.R., and Myors, Brett, 1998, Statistical power analysis—A simple and general model for traditional and modern hypothesis tests: Mahwah, N.J., Lawrence Erlbaum Associates, Inc.