Sampling methods ensure that the data collected is reliable enough to support a meaningful study.
The main purpose of sampling methods is to structure data collection so that a reliable statistical analysis can be performed. The first essential condition is that the data must be random: in other words, each unit must have the same chance of being selected.
Probabilistic sampling methods
Probabilistic or random samples are drawn from a parent population for which the complete list of all the survey units that comprise it (individuals, families, businesses, etc.) is available. The methods described below are distinguished.
Simple random sampling
A sample is called random when each individual in the population has a known, non-zero probability of belonging to the sample. Random selection is the only method that ensures the sample is representative of the population. It also ensures the independence of errors, which is indispensable for several types of statistical tests.
The principle is to select the n individuals from the population randomly and independently.
1. Have the entire population available, and assign a number to each unit or draw up the complete list.
2. Choose the type of draw:
- With replacement (independent draw): the composition of the sampling frame is unchanged, and a single individual can be chosen several times.
- Without replacement (exhaustive draw): the sampling frame changes with each draw, and each remaining individual has the same probability of being part of the sample. An individual who has already been chosen cannot be chosen again. This is the most common method in practice.
3. Using a random number table or a computer program, generate random numbers that indicate which individuals to select. In a draw without replacement, each individual has an n/N probability of being selected.
4. Draw the n numbers constituting the sample.
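The steps above can be sketched in a few lines of Python. This is only an illustration using the standard `random` module. The function name `simple_random_sample`, the population of 100 numbered units and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, replacement=False, seed=None):
    """Draw n units from a numbered population, with or without replacement."""
    rng = random.Random(seed)
    if replacement:
        # Independent draw: the frame is unchanged, a unit may appear several times.
        return rng.choices(population, k=n)
    # Exhaustive draw: a unit already chosen cannot be chosen again.
    return rng.sample(population, n)

# Hypothetical population: units numbered 1..N with N = 100.
units = list(range(1, 101))
without = simple_random_sample(units, 10, seed=42)
with_repl = simple_random_sample(units, 10, replacement=True, seed=42)
print(without)
print(with_repl)
```

Fixing the seed makes the draw reproducible, which is useful when the sample has to be audited later.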
Systematic sampling
Systematic sampling consists of selecting units at a fixed interval. This is only possible when the individuals can be placed in some order.
- Have the entire population available, and assign a number to each unit or draw up the complete list.
- Determine the sampling interval K, calculated as the ratio N/n.
- Randomly choose the first individual, whose number a lies between 1 and K. This individual is the origin of the sampling.
- Starting from a, take successive units: a, a + K, a + 2K…
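As a minimal sketch of this procedure, assuming the population is held in an ordered Python list and that K is taken as the integer part of N/n (the function name `systematic_sample` is hypothetical):

```python
import random

def systematic_sample(population, n, seed=None):
    """Select n units at a fixed interval K from an ordered population."""
    rng = random.Random(seed)
    N = len(population)
    k = N // n              # sampling interval K (assumes n <= N)
    a = rng.randint(1, k)   # random origin between 1 and K, inclusive
    # Unit numbers are 1-based (a, a + K, a + 2K, ...), list indices are 0-based.
    return [population[a - 1 + i * k] for i in range(n)]

units = list(range(1, 101))          # hypothetical ordered population, N = 100
picked = systematic_sample(units, 10, seed=1)
print(picked)
```

Only the origin a is random; once it is drawn, the rest of the sample is fully determined by the interval.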
Sampling with probability proportional to size
This method builds the "size" of each individual into its probability of selection. The larger the individual, the more likely it is to be selected.
For example, we want to study the influence of bottle moulds on the quality of their labelling. It is known that mould 3 produced ¾ of the output. The sampling must take this into account, so bottles from this mould must represent 75% of the total sample.
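The bottle example can be illustrated with a weighted draw. This is a sketch under stated assumptions: the text only says that mould 3 accounts for ¾ of production, so the number of moulds and the shares of moulds 1 and 2 below are invented for the illustration, and `random.choices` draws with replacement.

```python
import random

def pps_sample(units, sizes, n, seed=None):
    """Draw n units with probability proportional to size (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(units, weights=sizes, k=n)

moulds = ["mould 1", "mould 2", "mould 3"]
# Hypothetical shares: only the 0.75 for mould 3 comes from the example.
shares = [0.125, 0.125, 0.75]

draws = pps_sample(moulds, shares, 1000, seed=0)
print(draws.count("mould 3") / len(draws))  # should be close to 0.75
```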
Stratified sampling
Stratified sampling consists of dividing the population into subsets called strata and drawing a sample from each of them.
- The population is divided into homogeneous, mutually exclusive groups (age, sex, income…). The division criteria must be simple to use, easy to observe and related to the subject of the study.
- The number of individuals required in the sample to represent each stratum is calculated in proportion to the stratum's weight in the population.
- Independent samples are selected from each stratum. Any of the sampling methods can be used, and the method may differ from one stratum to another.
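The three steps can be sketched as follows, assuming proportional allocation and simple random sampling without replacement inside each stratum. The function name `stratified_sample`, the `stratum_of` callback and the population of 600 women and 400 men are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional allocation, then an independent random draw per stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    N = len(population)
    sample = {}
    for name, members in strata.items():
        # Stratum sample size in proportion to its weight in the population.
        # Rounding may make the total differ slightly from n.
        n_h = round(n * len(members) / N)
        sample[name] = rng.sample(members, min(n_h, len(members)))
    return sample

# Hypothetical population of (identifier, sex) pairs.
people = [(i, "F") for i in range(600)] + [(i, "M") for i in range(600, 1000)]
result = stratified_sample(people, lambda p: p[1], n=50, seed=7)
print({name: len(units) for name, units in result.items()})
```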
Note that the total variance is the sum of the variance within the strata and the variance between the strata. For an efficient collection, we seek the smallest possible variance within each stratum and the largest possible variance between strata.
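This decomposition can be checked numerically. The sketch below assumes population variances and weights each stratum by its share N_h/N of the population. The function name and the two small strata are invented for the check.

```python
from statistics import fmean, pvariance

def variance_decomposition(strata):
    """Return (total, within, between) population variances for a list of strata."""
    all_values = [x for s in strata for x in s]
    N = len(all_values)
    mu = fmean(all_values)
    # Within-strata part: stratum variances weighted by N_h / N.
    within = sum(len(s) / N * pvariance(s) for s in strata)
    # Between-strata part: weighted squared gaps between stratum means and the overall mean.
    between = sum(len(s) / N * (fmean(s) - mu) ** 2 for s in strata)
    return pvariance(all_values), within, between

total, within, between = variance_decomposition([[1, 2, 3], [10, 11, 12, 13]])
print(total, within + between)  # the two values agree up to rounding
```

In this example almost all of the variance lies between the strata, which is the favourable situation for stratification.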
Two-phase sampling (also called double sampling) is used when the data to be collected is particularly complex. It consists of first collecting data, even if imprecise, on a large sample. This data is used to narrow down ("prune") the subject, and accurate data is then collected on a smaller sample determined from the first data.
Cluster and/or multi-degree sampling
Cluster sampling consists of dividing the population into groups or clusters. A number of clusters are randomly selected to represent the population, and every individual in the selected clusters is studied.
- Identify criteria for dividing the total population
- Determine the different clusters, each of which is composed of individuals who are assumed to be representative of the population
- Randomly select the cluster or clusters that will represent the population
- Collect data from all the individuals in the selected clusters
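A minimal sketch of these steps, assuming the clusters are already defined and stored in a dictionary (the factory names and unit labels are hypothetical):

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters and keep every unit of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: four factories with five units each.
factories = {
    f"factory {c}": [f"{c}{i}" for i in range(1, 6)]
    for c in "ABCD"
}
selected = cluster_sample(factories, 2, seed=3)
print(selected)
```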
Multi-degree sampling is similar to cluster sampling, but only a certain number of individuals are taken from each selected cluster. The sampling is therefore carried out in several degrees. If the population is divided into M clusters (factories, establishments…):
- 1st degree: the different clusters, also called primary units.
- 2nd degree: within each selected cluster, individuals are selected for the sample.
- 3rd degree: if the clusters can be subdivided further into groups, each additional level of subgroups adds a degree. For example, for bottle moulds: batches of bottles, the moulds used to produce these batches, and the impressions of each of these moulds.
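The first two degrees can be sketched as below. This is an illustration only: it assumes simple random sampling at both degrees, and the function name, the factory names and the sample sizes are hypothetical.

```python
import random

def two_degree_sample(clusters, m, n_per_cluster, seed=None):
    """1st degree: draw m clusters. 2nd degree: draw units inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical primary units: five factories with twenty units each.
plants = {f"factory {k}": list(range(k * 100, k * 100 + 20)) for k in range(1, 6)}
subsample = two_degree_sample(plants, m=2, n_per_cluster=5, seed=11)
print(subsample)
```

A third degree would repeat the same idea one level down, drawing subgroups inside each selected cluster before drawing individuals.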
Non-probabilistic sampling methods
The non-probabilistic sampling method is used when it is not possible to compile an exhaustive list of all survey units. In probabilistic sampling, each unit has a known chance of being selected. This is no longer true in non-probabilistic sampling. The disadvantage of these methods is that they cannot guarantee that the sample is representative of the population.
Haphazard sampling consists of selecting units arbitrarily, without any particular technique.
In quota sampling, units are selected until a specific number of units (quotas) has been reached for various sub-populations.
- Quotas can be based on population proportions (e.g. 50% men and 50% women).
- Retain only a small number of quotas. Beyond 2 or 3 quotas, the investigators' task becomes complicated.
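Filling quotas can be sketched as follows, assuming respondents are taken in the order they are encountered, which is what makes the method non-probabilistic. The function name, the respondent list and the quotas of 5 men and 5 women are hypothetical.

```python
def quota_sample(stream, group_of, quotas):
    """Take units in arrival order until every group's quota is filled."""
    remaining = dict(quotas)
    sample = []
    for unit in stream:
        group = group_of(unit)
        if remaining.get(group, 0) > 0:
            sample.append(unit)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas reached
    return sample

# Hypothetical respondents in order of encounter: (identifier, sex).
respondents = [(f"r{i}", "F" if i % 3 == 0 else "M") for i in range(30)]
interviewed = quota_sample(respondents, lambda r: r[1], {"M": 5, "F": 5})
print(interviewed)
```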
Convenience or judgment sampling
A sample is taken based on judgments made about the population as a whole.
|Criterion|Simple random sampling|Systematic|Stratified|Multi-degree|Cluster|Non-probabilistic|
|---|---|---|---|---|---|---|
|Simplicity of implementation|++|+|+|+|+|+++|
|Level of representativeness|++|+|++|0|0|0|
|Need for an exhaustive list of units|Yes|Yes|Yes|No|No|No|
Standards and sampling scales
Many business sectors or companies have their own sampling standards or scales, which generally specify the sampling method, the size of the batches…
Examples of standards include:
- ISO 2854: Statistical interpretation of data. Techniques of estimation and tests relating to means and variances
- ISO 2859: Sampling procedures for inspection by attributes
- ISO 3494: Statistical interpretation of data. Power of tests relating to means and variances
- ISO 3951: Sampling procedures and charts for inspection by variables for percent nonconforming
- ISO 5725: Accuracy (trueness and precision) of measurement methods and results
- ISO 7002: Agricultural food products. Layout for a standard method of sampling from a lot
- ISO 8423: Sequential sampling plans for inspection by variables for percent nonconforming (known standard deviation)
- ISO 8422: Sequential sampling plans for inspection by attributes
- ISO/TR 8550: Guide for the selection of an acceptance sampling system, scheme or plan for inspection of discrete items in lots
- ISO 10725: Acceptance sampling plans and procedures for the inspection of bulk materials
- ISO/FDIS 11 648: Statistical aspects of sampling from bulk materials
- ISO/DIS 14 560: Acceptance sampling procedures by attributes. Specified quality levels in nonconforming items per million