Introduction
The aim of sampling is to set up a sampling process that limits the cost and time of a statistical study while ensuring that the observations can be generalized to the population.
This concept first appeared in De Ratiociniis in Ludo Aleae, published in 1657 by the Dutch scientist Christiaan Huygens. However, these methods were only recognized in the twentieth century.
Clearly, the larger the sample, the more reliable the observations. The most reliable method is of course exhaustive data collection. However, for reasons of cost, time or opportunity, sampling instead selects an appropriate number of individuals.
Notation:
- N: Population size
- n: Sample size
The sampling process
1. Establish the objectives of the study
The challenge is to identify what we expect from the statistical study: for example, deciding whether to accept a batch, estimating what proportion of the population has a defect, or understanding which parameter influences the result…
2. Define the target population
This is the total population for which information is needed. Define this population with specific characteristics (size, date, typology…).
3. Determine the data to be collected
A good statistical study is based on clear and reliable data. To achieve this, everyone involved must share a common vocabulary and use precise definitions.
4. Choose the sampling method
To take a representative sample, three broad families of methods can be used:
- Random (probabilistic) methods: individuals are drawn at random from the population. Each unit has a quantifiable chance of being selected.
- Non-random (non-probabilistic) methods: individuals are selected according to criteria judged significant for the distribution.
- Sampling standards or scales: many sectors have predetermined sampling standards indicating the collection method, the sample size… Most companies also have their own methods, based on their history, the desired level of quality…
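As an illustration of the random (probabilistic) family, here is a minimal Python sketch of a simple random draw without replacement. The function name, the batch of 1000 numbered units and the seed are hypothetical and only serve the example; with this method each unit has a selection probability of n / N.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units at random; each unit has probability n / N of being selected."""
    N = len(population)
    if not 0 <= n <= N:
        raise ValueError("sample size n must be between 0 and the population size N")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical batch of 1000 numbered units
batch = list(range(1, 1001))
sample = simple_random_sample(batch, 30, seed=42)

# Chance of any given unit being in the sample: n / N
inclusion_probability = len(sample) / len(batch)
print(sample[:5], inclusion_probability)
```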
5. Set the confidence interval
As soon as we work with a sample, there is always some uncertainty about its representativeness. It is therefore necessary to set a degree of precision. The higher the required precision, the larger the sample size.
6. Identify the sample size
6.1 Sample size to make an estimate
Here you want to estimate a characteristic of the population from a sample. This is the case, for example, with surveys, where a sample of X people is used to conclude that x% of the population votes for Y. It is also the case when we want to know, from a sample, what percentage of the population is defective.
Sample size and normal law
Across this site, you will repeatedly find the rule that, for a study to be of good quality (capability…), the sample must contain at least 30 individuals^{1}. This is not arbitrary: it comes from the mathematical demonstration that, from 28 data points onward, the data can be treated as following a normal law (according to Cochran’s conditions), the law on which these tools are based.
However, by convention, the minimum size is 30 individuals.
To calculate this sample size, the first question is whether the population is finite or infinite. A population of fewer than 100 000 individuals is considered finite; above that, it is considered infinite.
The following formulas use:
- t: margin coefficient deduced from the confidence level (1.96 for a 95% confidence level)
- e: margin of error. In general, 1/10 of the estimated proportion of the population is taken
- p: a priori estimate of the proportion of individuals who show the observed character
- n: sample size
- N: size of the population
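The link between the confidence level and the margin coefficient t can be sketched in Python. This assumes that t is the two-sided quantile of the standard normal law, which is consistent with the value 1.96 used for a 95% confidence level; the function name is hypothetical.

```python
from statistics import NormalDist


def margin_coefficient(confidence):
    """Two-sided standard normal quantile for a confidence level given as a fraction (e.g. 0.95)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence  # total risk, split over both tails
    return NormalDist().inv_cdf(1 - alpha / 2)


t_95 = margin_coefficient(0.95)
print(round(t_95, 2))  # 1.96
```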
Clarification on the margin of error – e
The margin of error is the level of error we are willing to accept on the estimate. For example, wanting to know the true proportion to within 2% means accepting a result of plus or minus 2%. Generally, the margin of error is taken as 1/10th of the proportion of the population with the character studied. For example, if we analyze parts and think that 20% of them have the defect we are studying, we take a margin of error of 2%.
As a result, the lower the margin of error, the larger the sample size.
Sample size for an infinite population
A population is considered infinite when it is composed of more than 100 000 individuals. The formula for the sample size n is:
$$ n = \frac{t^2 \, p \, (1 - p)}{e^2} $$
Sample size n at a 95% confidence level (t = 1.96), by estimated proportion p (rows) and margin of error e (columns):
p \ e | 1% | 2% | 3% | 4% | 5% | 6% | 7% | 8% | 9% | 10% |
---|---|---|---|---|---|---|---|---|---|---|
10% | 3457 | 864 | 384 | 216 | 138 | 96 | 71 | 54 | 43 | 35 |
20% | 6147 | 1537 | 683 | 384 | 246 | 171 | 125 | 96 | 76 | 61 |
30% | 8067 | 2017 | 896 | 504 | 323 | 224 | 165 | 126 | 100 | 81 |
40% | 9220 | 2305 | 1024 | 576 | 369 | 256 | 188 | 144 | 114 | 92 |
50% | 9604 | 2401 | 1067 | 600 | 384 | 267 | 196 | 150 | 119 | 96 |
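The table values can be recomputed with a short Python sketch. It assumes the usual formula n = t² p (1 − p) / e² with t = 1.96 and rounding to the nearest integer, which matches the figures above; the function name is hypothetical.

```python
def sample_size_infinite(p, e, t=1.96):
    """Unrounded sample size for an infinite population: n = t^2 * p * (1 - p) / e^2."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if e <= 0:
        raise ValueError("e must be positive")
    return t ** 2 * p * (1 - p) / e ** 2


# Recompute the p = 10% row of the table for margins of error from 1% to 10%
row = [round(sample_size_infinite(0.10, m / 100)) for m in range(1, 11)]
print(row)  # [3457, 864, 384, 216, 138, 96, 71, 54, 43, 35]
```

In practice, rounding up rather than to the nearest integer is the more cautious choice, since it never gives a sample smaller than the formula requires.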
Sample size for a finite population
To calculate the sample size, we start from the previous formula (sample size for an infinite population) and apply a correction factor. The correction is used when the population is below 100 000 individuals. In this case, the size n' of the sample is calculated with the following formula:
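A common form of this correction is the standard finite population correction. It is assumed here because, rounded up to the next integer, it reproduces the values tabulated for the example that follows:

$$ n' = \frac{n}{1 + \dfrac{n - 1}{N}} $$

where n is the sample size obtained for an infinite population and N is the population size. When N is very large compared with n, the denominator is close to 1 and n' is close to n.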
For example, for an estimated proportion of 10%, a confidence level of 95%, the margin coefficient being 1.96, and a margin of error of 1%, we have a sample size of 3457. For a finite population, we get:
Population size N | Sample size n' |
---|---|
100 | 98 |
1000 | 776 |
10000 | 2570 |
100000 | 3457 |
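These values can be checked with a small Python sketch. It assumes the correction n' = n / (1 + (n − 1) / N) and rounding up to the next integer, which is an assumption that happens to reproduce the first three values; at N = 100 000 the population is treated as infinite and no correction is applied. The function name is hypothetical.

```python
import math


def sample_size_finite(n, N):
    """Apply the assumed finite population correction n' = n / (1 + (n - 1) / N)."""
    if n <= 0 or N <= 0:
        raise ValueError("n and N must be positive")
    return n / (1 + (n - 1) / N)


n_infinite = 3457  # p = 10%, e = 1%, t = 1.96
sizes = {N: math.ceil(sample_size_finite(n_infinite, N)) for N in (100, 1000, 10000)}
print(sizes)  # {100: 98, 1000: 776, 10000: 2570}
```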
Example
We want to estimate the percentage of defects that we generate in our candy production. We produce in batches of 80 000, so we consider that we are dealing with a finite population. We wish to make our estimate with a confidence level of 95% and a margin of error of 1%. The estimated proportion is 0.1%. The sample size is therefore 39. In terms of accuracy, this means that, at a 95% confidence level, the true proportion lies within plus or minus 1 percentage point of the estimated 0.1%, that is, between 0% and 1.1% (a proportion cannot be negative). In other words, about 5% of samples drawn in this way would give a range that misses the true proportion.
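The figure of 39 can be checked by hand, assuming the usual formula n = t² p (1 − p) / e² with t = 1.96, p = 0.001 and e = 0.01:

$$ n = \frac{1.96^2 \times 0.001 \times 0.999}{0.01^2} \approx 38.4 $$

which is rounded up to 39. With a batch of N = 80 000, a finite population correction changes this value by less than 0.1, so the result stays at 39.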
Identify the likelihood of occurrence
We calculate the probability of observing the studied event in our sample. For this, we use the hypergeometric law in the most common case (where the part is not put back for the next draw, which is the typical case in quality control), or the binomial law (if the part we took is put back before the next draw).
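Both laws can be written directly from their definitions. The Python sketch below is only an illustration; the function names and the small numerical example (a lot of 10 parts containing 2 defective ones) are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k defective parts in a sample of n drawn WITHOUT replacement
    from a lot of N parts containing K defective parts)."""
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """P(exactly k defective parts in n draws WITH replacement, defect proportion p)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical lot: 10 parts, 2 defective, sample of 2 parts
p_none_without_replacement = hypergeom_pmf(0, N=10, K=2, n=2)  # 28/45
p_none_with_replacement = binom_pmf(0, n=2, p=0.2)             # 0.8 ** 2
print(p_none_without_replacement, p_none_with_replacement)
```

When the lot is very large compared with the sample, the two laws give nearly the same probabilities.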
6.2 Sample size to see a difference
In this case, we want to identify a sample size that lets us detect a difference in results at a given level of confidence. This is the case, for example, when we want to show that a before/after improvement is significant, or to determine the number of samples for each trial of a design of experiments. The calculation is then based on statistical power.
The calculation formula is as follows:
With:
- t: margin coefficient
- P0: The estimated proportion of the event for the first population
- P1: The estimated proportion of the event for the second population
- zβ: the coefficient of the normal law associated with the statistical power. Most commonly, a risk β of 10% is chosen, which corresponds to a power of 90%.
Example
We have improved our filtration processes and we want to know whether our proportion of defects has decreased significantly. This is done with a Student hypothesis test. The question is how many individuals the proportion should be calculated on. For this, we want a risk α of 5% (a confidence level of 95%) and a risk β of 10% (a power of 90%).
We know that initially about 10% of the filtration tests were defective. With our improvements, we expect to fall to around 1%. Performing the calculation gives a sample size of 70.
In other words, we have to calculate our proportion on the basis of a sample of 70 individuals.
Source
1. H. Saranadasa (2003) – The square root of N plus one sampling rule
P. Ardilly (1994) – Survey techniques
B. Le Maux (2013) – The selection of the sample
F. Kohler (2014) – Data collection
L. Gerville Reache, V. Couallier, N. Paris (2012) – Representative sample
J. Desabie (1963) – Applied Statistics Review
C. Durand (2002) – Sampling, Gemba Management