Statistics and sampling are fundamental to almost all of our understanding of the world. The world is too big to measure directly. Measuring representative samples is a way to understand the entire picture.
Popular and academic literature are both full of examples of poor sample selection resulting in flawed conclusions about the population. Some of the most famous examples relied on sampling from telephone books (in the days when phone books still mattered and only relatively wealthy people had telephones) resulting in skewed samples.
This post is not about bias in sample selection but rather the simpler matter of sample sizes.
Population size is usually irrelevant to sample size
I’ve too often read some version of this: “Your sample was only 60 people from a population of 100,000. That’s not statistically relevant.” This is plain wrong, and frustratingly widespread.
Required sample size is dictated by:
- How accurate one needs the estimate to be
- The standard deviation of the population
- The homogeneity of the population
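To make these factors concrete, here is a minimal sketch of the usual textbook calculation for estimating a mean. The function name `required_sample_size`, the 95% confidence level (z = 1.96) and the example margins are my assumptions for illustration, not figures from this post. Note that population size does not appear anywhere in the calculation.

```python
import math

def required_sample_size(sigma, margin, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) <= margin.

    sigma  -- standard deviation of the population
    margin -- acceptable margin of error, in the same units as sigma
    z      -- z-score for the confidence level (1.96 assumes 95%)
    """
    return math.ceil((z * sigma / margin) ** 2)

# Standard deviation of 25, estimate wanted to within +/- 5 units
print(required_sample_size(sigma=25, margin=5))    # 97

# Halving the margin roughly quadruples the required sample
print(required_sample_size(sigma=25, margin=2.5))  # 385
```

Under these assumptions, the answer is the same whether the population is 1,000 or 100,000.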
Only in exceptional circumstances does population size matter at all. To demonstrate this, consider the graph of the standard error of the estimated mean as the sample size increases, for a population of 1,000 whose members have a standard deviation of 25.
The standard error drops very quickly at first, then decreases only gradually, even out to a large sample of 100. Let’s see how this compares to a larger population of 10,000.
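The comparison can be sketched numerically. This is a rough illustration rather than the code behind the graph: I am assuming simple random sampling without replacement and applying the standard finite population correction, and the function name `standard_error` and the chosen sample sizes are mine.

```python
import math

def standard_error(sigma, n, population=None):
    """Standard error of the sample mean.

    With population=None this is the plain sigma / sqrt(n).
    With a population size it applies the finite population
    correction sqrt((N - n) / (N - 1)), which assumes sampling
    without replacement.
    """
    se = sigma / math.sqrt(n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se

# Same standard deviation (25), two population sizes
for n in (5, 10, 25, 50, 100):
    small = standard_error(25, n, population=1_000)
    large = standard_error(25, n, population=10_000)
    print(f"n={n:>3}  N=1,000: {small:.2f}  N=10,000: {large:.2f}")
```

At a sample of 100 the two standard errors come out at roughly 2.37 and 2.49, against 2.5 with no correction at all. A tenfold change in population barely moves the result.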