Have you ever been in a situation where you needed to know something about a population but didn't have the budget to measure everyone? By sampling the population and applying an Excel function, you can obtain an approximation for the population as a whole. This concept is best illustrated through an example.
Suppose you are responsible for product marketing at an amusement park. You need to know the proportion of male vs. female visitors so that you know how many blue vs. pink sunglasses to stock. (I am using colour as a simple proxy for “sunglasses that boys like” vs. “sunglasses that girls like”; we won’t get into discussing colour and gender.) You discover that the customer service team conducted a customer satisfaction survey that collected gender information. You can use this sample to draw conclusions about the entire population.
The magic comes from the binomial distribution. I won’t get into the mathematical details here, but for those so inclined, the distribution is described on Wikipedia. Using the distribution, you can calculate a confidence interval for a chosen confidence level. The more samples you use, the tighter the confidence interval. The great thing is that it doesn’t matter whether the overall population is 2,000 or 200,000: provided you sample randomly and the sample is a small fraction of the population, the confidence interval is essentially the same.
If we observe that 30% of the people who responded to the survey are male and we want a 90% confidence level, the following table shows the confidence interval for different numbers of samples.
| Number of samples | Low end of interval | High end of interval |
|---|---|---|
| 100 | 23% | 38% |
| 300 | 26% | 34% |
| 500 | 27% | 33% |
| 1000 | 28% | 32% |
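For readers who want the arithmetic behind the table, here is a sketch of how the bounds can be derived. This is my reconstruction, assuming the table was built from binomial quantiles at the observed proportion $\hat p = 0.3$, with the 10% left over by a 90% confidence level split equally between the two tails. With $n$ samples and $F$ the cumulative binomial distribution:

$$
F(k; n, \hat p) = \sum_{i=0}^{k} \binom{n}{i}\, \hat p^{\,i} (1-\hat p)^{n-i}
$$

$$
k_{\text{low}} = \min\{k : F(k; n, \hat p) \ge 0.05\}, \qquad
k_{\text{high}} = \min\{k : F(k; n, \hat p) \ge 0.95\}, \qquad
\text{interval} = \left[\frac{k_{\text{low}}}{n},\ \frac{k_{\text{high}}}{n}\right]
$$

For $n = 100$: $F(22) \approx 0.048$ is just under 0.05 while $F(23) \approx 0.076$, so $k_{\text{low}} = 23$; $F(37) \approx 0.947$ while $F(38) \approx 0.966$, so $k_{\text{high}} = 38$. That gives 23% to 38%, the first row of the table.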
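For example, with 300 samples we can be 90% confident that between 26% and 34% of the entire population is male, regardless of the population size. Microsoft Excel provides the CRITBINOM function to help calculate these values: CRITBINOM(trials, probability_s, alpha) returns the smallest number of successes for which the cumulative binomial probability is greater than or equal to alpha. In Excel 2010 and later, the same calculation is also available as BINOM.INV.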
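To check the table without Excel, here is a minimal Python sketch using only the standard library. It assumes the table was produced by taking CRITBINOM at the 5% and 95% points of a binomial distribution with the observed proportion (0.30) and dividing by the number of samples. `binom_pmf`, `critbinom` and `sample_interval` are hypothetical helper names; `critbinom` only mirrors the documented behaviour of Excel's CRITBINOM (smallest k whose cumulative probability is at least alpha), not Excel's internal implementation.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space so that large
    binomial coefficients (e.g. n = 1000) do not overflow a float."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log1p(-p))


def critbinom(trials: int, probability_s: float, alpha: float) -> int:
    """Hypothetical re-creation of Excel's CRITBINOM: the smallest k for which
    the cumulative binomial probability P(X <= k) is at least alpha."""
    if not 0.0 < probability_s < 1.0:
        raise ValueError("probability_s must be strictly between 0 and 1")
    cumulative = 0.0
    for k in range(trials + 1):
        cumulative += binom_pmf(k, trials, probability_s)
        if cumulative >= alpha:
            return k
    return trials  # guards against rounding leaving the sum just below 1.0


def sample_interval(n: int, observed_share: float, confidence: float) -> tuple[float, float]:
    """Low and high ends of the interval, as proportions, for n samples,
    splitting the leftover probability equally between the two tails."""
    tail = (1.0 - confidence) / 2.0
    low = critbinom(n, observed_share, tail) / n
    high = critbinom(n, observed_share, 1.0 - tail) / n
    return low, high


if __name__ == "__main__":
    # 30% of respondents were male; 90% confidence level, as in the table.
    for n in (100, 300, 500, 1000):
        low, high = sample_interval(n, 0.30, 0.90)
        print(f"{n:>5} samples: {low:.0%} to {high:.0%}")
```

Run as a script, it prints one row per sample size; rounded to whole percentages, these should agree with the table above. In Excel, the corresponding cell formulas for the 300-sample row would presumably be `=CRITBINOM(300, 0.3, 0.05)/300` and `=CRITBINOM(300, 0.3, 0.95)/300`.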
Note that this method only works if you sample randomly. In practice, that means spreading the samples across the factors that might influence the result. In the amusement park example, if you conducted all the surveys just as the local girls’ school arrived for its annual trip, you certainly couldn’t apply the results to the entire population.
When measuring something within your organization, sampling is a cost-effective, statistically sound way to get the information you need when you don’t require the exact value.