The main challenge of sampling methods is to collect usable data that support meaningful statistical analysis.
The methods of collection
The choice of collection method depends on whether the data for the study exist and on how difficult they are to collect. To make sure the collection is feasible, it is necessary to:
- Go to the Gemba with the work team.
- Answer the 5 key questions: What should be observed? What kind of information is needed? What data are needed? How much data is needed? What precision do we need?
- Collect the data, making sure they are simple, objective and reliable.
- Synthesize the data.
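The questions "how much data is needed?" and "what precision do we need?" are linked. As a minimal sketch, assuming the goal is to estimate a mean, that a standard deviation is already known or estimated from past data, and that the sample mean is approximately normal, the required sample size could be computed as follows. The function name and parameters are hypothetical and chosen for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n whose confidence-interval half-width is at most `margin`.

    Assumptions (not from the original text): sigma is known or estimated
    beforehand, and the sample mean is approximately normally distributed.
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # n >= (z * sigma / margin)^2, rounded up to a whole observation.
    return math.ceil((z * sigma / margin) ** 2)


# Halving the desired margin roughly quadruples the amount of data needed.
n_coarse = sample_size_for_mean(sigma=2.0, margin=0.5)
n_fine = sample_size_for_mean(sigma=2.0, margin=0.25)
```

Under these assumptions, the cost of extra precision grows quickly, which is why the precision question is worth settling before the collection starts.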
If the data already exist (historical records, a database to purchase…), it is enough to identify the possible sources and select the data.
In cases where data is difficult to collect or non-existent, specific methods must be used:
- Design of experiments: used when many factors are assumed to influence one another.
- Sampling plans: tools that can be used to structure the collection.
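To illustrate the first option, here is a minimal sketch of a full factorial design, one common form of design of experiments in which every combination of factor levels is run so that interactions between factors can be studied. The factor names and levels below are hypothetical and serve only as an example.

```python
from itertools import product


def full_factorial(factors):
    """Return one run (a dict of factor -> level) per combination of levels."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_lists)]


# Hypothetical factors, two levels each: 2 x 2 x 2 = 8 runs.
factors = {
    "temperature_C": [150, 180],
    "pressure_bar": [1.0, 2.0],
    "operator": ["A", "B"],
}
runs = full_factorial(factors)
for run in runs:
    print(run)
```

The number of runs grows multiplicatively with each added factor, so in practice fractional designs are often preferred when many factors are involved.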
The data sheet
The data sheet is the key tool in data collection. Although simple, it follows specific rules:
- The name of the data to be collected is clearly specified. It must be as short as possible.
- The row/column layout must be coherent and ergonomic so that the person in charge of the collection cannot make a mistake.
- The data format must be specified on the document: unit, number of decimal places…
- The date, location and person in charge of the record must be noted.
- Provide a code for cases where a value could not be collected or could not be recorded correctly.
- An example should be given allowing someone outside the project to understand the data.
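The rules above can be sketched as a small data structure. This is only an illustration: the class name, field names and the missing-value code "NR" are assumptions made for the example, not part of any standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Union

MISSING_CODE = "NR"  # hypothetical code meaning "not recorded"


@dataclass
class DataSheet:
    variable: str       # short, explicit name of the data collected
    unit: str           # unit of measurement
    decimals: int       # number of decimal places to keep
    collected_on: date  # date of the record
    location: str       # where the data were collected
    collector: str      # person in charge of the record
    example: str        # worked example for readers outside the project
    values: List[Union[float, str]] = field(default_factory=list)

    def record(self, value: Optional[float]) -> None:
        """Store a value in the agreed format, or the missing-value code."""
        if value is None:
            self.values.append(MISSING_CODE)
        else:
            self.values.append(round(value, self.decimals))

    def valid_values(self) -> List[float]:
        """Return only the values that were actually recorded."""
        return [v for v in self.values if v != MISSING_CODE]


sheet = DataSheet(
    variable="shaft diameter",
    unit="mm",
    decimals=1,
    collected_on=date(2024, 1, 15),
    location="Line 3",
    collector="J. Doe",
    example="12.3 means a diameter of 12.3 mm",
)
sheet.record(12.34)  # stored with one decimal place
sheet.record(None)   # measurement could not be taken
```

Keeping an explicit missing-value code, rather than leaving a blank or a zero, lets the later analysis distinguish "not measured" from a genuine value.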
Errors due to the measuring system
- An instrument is repeatable if it responds in exactly the same way when it is placed in two identical situations.
- An instrument is valid when it really measures what it is supposed to measure.
- To verify both properties, conduct an analysis of the measurement system.
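As a minimal sketch of what such an analysis checks, assume one operator measures the same reference part several times under identical conditions. The spread of the readings then reflects how consistently the instrument responds, and the gap between their mean and the reference value reflects whether it measures what it should. A full measurement system analysis (for example a gauge R&R study) is more involved; the function name and data here are hypothetical.

```python
from statistics import mean, stdev


def check_instrument(readings, reference):
    """Return (bias, repeatability) for repeated readings of one reference part.

    bias          : mean reading minus the known reference value
    repeatability : sample standard deviation of the readings
    """
    if len(readings) < 2:
        raise ValueError("at least two readings are needed")
    bias = mean(readings) - reference
    repeatability = stdev(readings)
    return bias, repeatability


# Hypothetical readings of a 10.0 mm reference gauge block.
bias, repeatability = check_instrument([10.1, 10.2, 10.0, 10.1], reference=10.0)
print(f"bias = {bias:.3f}, repeatability = {repeatability:.3f}")
```

Whether the resulting bias and spread are acceptable depends on the tolerance of the characteristic being measured, which has to be decided for each study.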
Errors due to the organization
- These are errors that occur during data collection: were the instructions followed? Did the investigators all proceed in the same way?
- To avoid these errors, use the same instruments under the same conditions.
Errors due to the sampling method
- Always verify, in light of the objectives of the statistical study, that the sampling method is appropriate.
- In particular, avoid over-representing certain parts of the population.
- Moreover, except in special cases, the data must be collected randomly: each individual must have the same chance of being selected.
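The requirement that each individual has the same chance of being selected corresponds to simple random sampling without replacement. A minimal sketch using Python's standard library is shown below; the function name and the optional seed (used only to make the draw reproducible) are choices made for this example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each with the same chance of selection."""
    items = list(population)
    if n > len(items):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # a dedicated generator keeps the draw reproducible
    return rng.sample(items, n)


# Hypothetical population of 100 numbered parts; draw 10 of them.
sample = simple_random_sample(range(100), 10, seed=1)
print(sorted(sample))
```

Selecting "the first ten parts of the day" instead would be easier, but it would over-represent one part of the population, which is exactly what the rules above warn against.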
The notion of statistical bias
In statistics, a bias is a systematic error, introduced by the way a study is conducted, that distorts its outcome. The bias of an estimator (of a mean, of a standard deviation…) is the difference between its expected value (the average of its estimates over repeated samples) and the true value of the quantity it is supposed to estimate.
For example, a biased sample is a set of individuals that is supposed to represent a population but whose selection introduced a bias, so that conclusions can no longer be properly drawn for the entire population.
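The definition of the bias of an estimator can be written compactly. The notation below is an assumption introduced for illustration: $\hat{\theta}$ denotes the estimator and $\theta$ the true value it targets. The second line gives a classic example, valid for independent observations with common variance $\sigma^2$: dividing the sum of squared deviations by $n$ rather than $n-1$ yields a variance estimator that is biased downward.

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta

\hat{\sigma}^2_n = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
\quad\Longrightarrow\quad
\mathbb{E}\!\left[\hat{\sigma}^2_n\right] = \frac{n-1}{n}\,\sigma^2,
\qquad
\operatorname{Bias}\!\left(\hat{\sigma}^2_n\right) = -\frac{\sigma^2}{n}
```

An estimator is called unbiased when this difference is zero, as is the case for the sample mean under the same assumptions.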
Errors due to aberrant data
In some cases, a few points can completely distort the results. These points deviate significantly from the others and are referred to as aberrant points (outliers), in the sense that they are unlikely to belong to the parent population.
The reasons for the appearance of this type of observation are:
- An error in data collection: a person recorded as 4 years old takes out life insurance, when she is actually 40 years old.
- A genuinely different behavior: an athlete so heavily doped that he breaks world records.
The position of these points relative to the overall cloud can suggest (or mask) a clear link between the variables. Most of the time, a simple scatter plot is enough to detect them. When it is not, use:
- A control chart: points that meet one of the chart's out-of-control criteria should be considered aberrant.
- A test of homoscedasticity: if the test indicates a difference in variance, aberrant points may be present.
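As an illustration of the control chart approach, here is a minimal sketch based on an individuals chart. It assumes one particular criterion, a point falling outside the three-sigma control limits, with sigma estimated from the average moving range divided by the usual constant d2 = 1.128 for ranges of two consecutive points. Other criteria exist, and the function name and data are hypothetical.

```python
from statistics import mean

D2_FOR_N2 = 1.128  # standard constant d2 for moving ranges of size 2


def individuals_chart_outliers(values):
    """Return the indices of points outside the 3-sigma control limits."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = mean(moving_ranges) / D2_FOR_N2
    lower, upper = center - 3 * sigma, center + 3 * sigma
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical measurements; the eighth value (index 7) is suspicious.
data = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 15.0, 10.0, 9.9]
flagged = individuals_chart_outliers(data)
print(flagged)
```

A flagged point is only a candidate: as the examples above show, it may be a recording error or a genuinely different behavior, and it should be investigated before being discarded.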
Preparing the data collection
- Get support from management and stakeholders in order to avoid obstacles when collecting data.
- Communicate with everyone directly or indirectly affected by the data collection.
- Determine who will collect the data.
- Determine the logistics, resources and technology required.
- Minimize the impact and inconvenience for those involved, in the workplace or in a service environment, which means choosing the most appropriate time to collect data.
- Plan a trial period or pilot phase to improve and adjust the data collection method.