Twenty Third Floor


The virtual irrelevancy of population size to required sample size

April 5, 2013

David Kirk

Statistics and sampling are fundamental to almost all of our understanding of the world. The world is too big to measure directly. Measuring representative samples is a way to understand the entire picture.

Popular and academic literature are both full of examples of poor sample selection resulting in flawed conclusions about the population. Some of the most famous examples relied on sampling from telephone books (in the days when phone books still mattered and only relatively wealthy people had telephones) resulting in skewed samples.

This post is not about bias in sample selection but rather the simpler matter of sample sizes.

Population size is usually irrelevant to sample size

Too often I’ve read something like: “Your sample was only 60 people from a population of 100,000. That’s not statistically relevant.” This is plain wrong, and frustratingly widespread.

Required Sample Size is dictated by:

How accurate one needs the estimate to be
The standard deviation of the population
The homogeneity of the population

Only in exceptional circumstances does population size matter at all. To demonstrate this, consider the graph of how the standard error of the estimated mean changes as the sample size increases, for a population of 1,000 whose members have a standard deviation of 25.

Standard Error as Sample Size increases for population of 1,000

The standard error drops very quickly at first, then only very gradually, so that relatively little is gained even by going to a large sample of 100. Let’s see how this compares to a larger population of 10,000.
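To put rough numbers on that curve, here is a minimal sketch. It assumes the textbook formula for the standard error of a mean, sd / sqrt(n), and ignores population size entirely; the graph itself may have been produced slightly differently. The function name and the chosen sample sizes are mine, for illustration only.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean, ignoring population size."""
    return sd / math.sqrt(n)

SD = 25  # standard deviation of the population members, as in the example above

# Each fourfold increase in sample size only halves the standard error.
for n in (1, 4, 25, 100, 400):
    print(f"sample size {n:>3}: standard error {standard_error(SD, n):.2f}")
```

Nothing in this calculation refers to the size of the population, which is the point of the post.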

Standard Error as Sample Size increases for population of 1,000 vs 10,000

It’s not an error that those graphs are almost identical (see the next section for why they’re not exactly the same). Population size is irrelevant.

Finite Population Correction – or when population size begins to matter

Population size is usually irrelevant to sample size. There is an exception. Few people even know about this because it is so rarely relevant.

If one samples the entire population, there is no room for error in estimating statistics related to the population. The mean of the population is then known precisely. We don’t have a “sample” any more. The standard error of our estimate is thus zero since it is no longer an estimate but a direct measurement of the population statistics.

Standard Error as Sample Size increases for population of 100 vs 1,000

As the sample size approaches the population size, the standard error declines faster than usual, reaching zero when the whole population is sampled. We adjust the normal standard error estimate by multiplying it by sqrt((Population - Sample) / (Population - 1)). In the graph above, you can see that the standard error for the population of 100 does fall away more quickly.
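A small sketch of that adjustment, using the correction factor exactly as written above. The function names and the sample sizes in the loop are my own choices for illustration.

```python
import math

def finite_population_correction(population, sample):
    """sqrt((Population - Sample) / (Population - 1))"""
    return math.sqrt((population - sample) / (population - 1))

def corrected_standard_error(sd, sample, population):
    """Usual standard error, sd / sqrt(n), multiplied by the correction."""
    return sd / math.sqrt(sample) * finite_population_correction(population, sample)

SD = 25  # same standard deviation as in the graphs

for sample in (10, 50, 100):
    row = ", ".join(
        f"N={population:,}: {corrected_standard_error(SD, sample, population):.3f}"
        for population in (100, 1_000, 10_000)
    )
    print(f"sample size {sample:>3} -> {row}")
```

The factor stays close to 1 until the sample is a sizeable fraction of the population, and it is exactly 0 when the sample is the whole population.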

You can go ahead and forget that formula now as you’ll probably never need it. It does explain the slight difference in the graphs in the previous section though.

Non-homogeneous populations, aka sample size still isn’t the answer

One last point: if the population is not homogeneous, this will typically increase the total population variability and result in higher standard errors. It adds risk to using a small sample size. The best solution is not simply to increase the sample size. Cluster Sampling is a straightforward way of reducing the variability of overall estimates for population statistics and better understanding differences between the clusters.
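To see why mixing dissimilar groups inflates variability, here is a toy sketch. The two groups and all their values are hypothetical, invented purely for illustration; the only thing it demonstrates is that the standard deviation of the combined population is larger than the standard deviation within either group.

```python
import statistics

# Hypothetical values: two internally similar groups with different averages.
group_a = [75, 100, 125]
group_b = [175, 200, 225]
combined = group_a + group_b

sd_a = statistics.pstdev(group_a)      # variability within group A
sd_b = statistics.pstdev(group_b)      # variability within group B
sd_all = statistics.pstdev(combined)   # variability when the groups are mixed

print(f"within group A:      {sd_a:.1f}")
print(f"within group B:      {sd_b:.1f}")
print(f"combined population: {sd_all:.1f}")
```

The gap between the group averages shows up as extra variability in the combined population, which a bigger sample does nothing to remove.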

So what determines the required sample size?

I get asked “how big a sample will we need” typically with only the population size to go on. Hopefully I’ve explained why the population size is irrelevant in almost all scenarios. How should you estimate the required sample size?

Ideally, you need an initial estimate of the population standard deviation. You can get this from prior studies of similar populations, or by doing a quick, small initial survey to estimate the standard deviation and using that to set a sample size. You could also continue to sample until the standard error of your estimate becomes sufficiently low, but this often isn’t practical and can encourage unintended biases in how the sample is extended over time.
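As a rough sketch of that approach: estimate the standard deviation from a small pilot, pick the standard error you can live with, and solve sd / sqrt(n) for n. The pilot values and the target below are hypothetical. The optional population argument is my own addition: it solves the finite-population-corrected standard error for n, to show how little the answer moves.

```python
import math
import statistics

def required_sample_size(sd, target_se, population=None):
    """Smallest sample size whose standard error is at most target_se."""
    n0 = (sd / target_se) ** 2          # from sd / sqrt(n) = target_se
    if population is None:
        return math.ceil(n0)
    # Solve sd / sqrt(n) * sqrt((N - n) / (N - 1)) = target_se for n.
    return math.ceil(n0 * population / (population - 1 + n0))

# Hypothetical pilot survey of 8 observations.
pilot = [102, 87, 131, 95, 110, 76, 124, 99]
sd_estimate = statistics.stdev(pilot)   # sample standard deviation
target_se = 5                           # hypothetical accuracy requirement

print("ignoring population size:", required_sample_size(sd_estimate, target_se))
for population in (1_000, 100_000, 10_000_000):
    n = required_sample_size(sd_estimate, target_se, population)
    print(f"population {population:>10,}: {n}")
```

A pilot this small gives only a noisy estimate of the standard deviation, so treat the result as a starting point rather than a guarantee.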

Without anything to go on, I use the rule of thumb that 10 observations is usually the minimum needed within a homogeneous population to measure a single characteristic. 20 is better, and more than that usually doesn’t make much of a difference.

Smaller than you thought?
