Introduction
The χ² law represents the distribution of the sum of the squares of independent standard normal variables; the distribution of the sample variance is a particular case.
The χ² test is very robust (it does not depend on the data following a normal law, and it remains effective even with small samples). It serves several purposes, namely to compare:
 A standard deviation to a target value.
 One distribution to a known theoretical distribution.
 Several distributions with one another.
A particular feature of this test is that it can handle any type of data, quantitative or qualitative. In addition, no assumptions of normality or equal variances are required.
Some examples:
 Are the dice rigged?
 Is there a link between patients' blood group and the onset of a disease?
 Is our defect rate similar across different technologies?
The principle
We want to measure the difference between an observed data distribution and the one we expect in theory if our starting hypothesis is correct. The typical case is heads or tails, where we "know" that the probability is one in two.
The test consists of calculating a statistic based on the discrepancy between the data we observe and the theoretical data we expect.
Note, however, that the test is only really reliable when there are more than 5 individuals per category.
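As a minimal sketch of this principle, the Python snippet below computes Pearson's statistic, the sum over categories of (observed − expected)² / expected, for a hypothetical run of 100 coin tosses. The counts and the function name `pearson_chi2` are illustrative assumptions. With only 2 categories there is 1 degree of freedom, and in that case the upper-tail probability of the χ² law reduces to `math.erfc(sqrt(x / 2))`, so no statistics library is needed here.

```python
import math


def pearson_chi2(observed, expected):
    """Pearson statistic: sum over categories of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical experiment: 100 tosses of a coin, 60 heads and 40 tails.
observed = [60, 40]
expected = [50, 50]  # probability one in two for each face, times 100 tosses

chi2 = pearson_chi2(observed, expected)
# 2 categories -> 1 degree of freedom; for 1 dof the upper-tail probability
# of the chi-square law is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.4f}")  # chi2 = 4.00, p-value = 0.0455
```

A gap of 10 tosses on each face gives χ² = 4 and a p-value of about 0.046; each category here holds far more than 5 individuals.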
Comparing a standard deviation to a target
Here the aim is to reduce the variability of a process. We have identified a target standard deviation, which we compare with the standard deviation obtained after improvement actions. The χ² test compares the observed standard deviation with a confidence interval calculated from the target.
1. Calculate the observed standard deviation and the mean
2. Choose the α risk
3. Calculate the practical value of χ² with the following formula, where n is the sample size, s the observed standard deviation and σ₀ the target standard deviation: $\chi^2 = \frac{(n - 1)\, s^2}{\sigma_0^2}$
4. Calculate the critical value with dof = n − 1
5. Calculate the p-value
6. Interpret the result (see below)
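The six steps above can be sketched in Python as follows. This is an illustrative version under stated assumptions: the measurements and the 0.2 target are hypothetical, and the helpers `chi2_sf` and `chi2_isf` are hand-written stand-ins for Excel's CHIDIST and CHIINV (a series expansion of the incomplete gamma function, and a bisection), not library calls. Following the convention used in Steps 5 to 7, the practical value is compared with the upper-tail critical value at risk α.

```python
import math
import statistics


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square law (role of Excel's CHIDIST).

    Uses the series expansion of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_isf(alpha, dof):
    """Critical value c such that P(X >= c) = alpha (role of Excel's CHIINV), by bisection."""
    lo, hi = 0.0, dof + 10.0
    while chi2_sf(hi, dof) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, dof) > alpha:
            lo = mid  # tail beyond mid still larger than alpha: c is further right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical measurements after improvement actions, and a hypothetical target.
sample = [10.2, 9.8, 10.5, 10.1, 9.7, 10.4, 10.0, 9.9, 10.3, 10.6]
target_sd = 0.2
alpha = 0.05                                # step 2: alpha risk

n = len(sample)
s = statistics.stdev(sample)                # step 1: observed standard deviation (n - 1 denominator)
dof = n - 1
practical = dof * s ** 2 / target_sd ** 2   # step 3: (n - 1) s^2 / sigma_0^2
critical = chi2_isf(alpha, dof)             # step 4: critical value
p_value = chi2_sf(practical, dof)           # step 5: p-value
print(f"mean = {statistics.mean(sample):.3f}, s = {s:.3f}")
print(f"practical = {practical:.3f}, critical = {critical:.3f}, p-value = {p_value:.4f}")
print("standard deviation not on target" if practical >= critical else "standard deviation on target")
```

With these hypothetical numbers the practical value is 20.625 for 9 degrees of freedom, above the critical value of about 16.92, so this standard deviation would be judged not on target.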
Compare a distribution to a known theoretical distribution
Also known as the χ² goodness-of-fit (or suitability) test, this use of the χ² test is based on comparing observed proportions with a theoretical model. It is used to test the association or non-association between rows and columns in a contingency table. Note that criteria X and Y can be either qualitative or quantitative: since the test compares proportions between values, the type of variable does not matter.
This test is often used as a goodness-of-fit (adjustment) test to determine whether a data distribution follows a normal law or another law.
It is also used simply to check whether the observed results match what was expected. For example, our specification states that our tank must be able to heat 1000 L of liquid to 20 °C in 10 minutes, then to 30 °C in 15 min… When we receive the tank, we run tests and check whether our results are statistically consistent with the specification.
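For the dice question from the introduction, a goodness-of-fit sketch against known expected proportions (a fair die, 1/6 per face) might look like this. The 60 rolls are hypothetical (each face then expects 10 rolls, above the 5-per-category threshold), the degrees of freedom follow the "number of values − 1" rule given in Step 5, and `chi2_sf` / `chi2_isf` are hand-written approximations standing in for CHIDIST and CHIINV, not library functions.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square law (series for the incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_isf(alpha, dof):
    """Critical value c with P(X >= c) = alpha, found by bisection."""
    lo, hi = 0.0, dof + 10.0
    while chi2_sf(hi, dof) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def goodness_of_fit(observed, expected_props, alpha=0.05):
    """Compare observed counts with known expected proportions; dof = number of values - 1."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    practical = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    return practical, chi2_isf(alpha, dof), chi2_sf(practical, dof)


# Hypothetical record of 60 rolls of a six-sided die (counts for faces 1 to 6).
rolls = [8, 9, 12, 11, 6, 14]
practical, critical, p_value = goodness_of_fit(rolls, [1 / 6] * 6)
print(f"practical = {practical:.2f}, critical = {critical:.2f}, p-value = {p_value:.3f}")
```

Here the practical value (4.2) stays well below the critical value for 5 degrees of freedom (about 11.07), so these rolls give no evidence that the die is rigged.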
Compare two or more distributions with one another
Also known as the χ² test of association, or test of independence, this test works on the same principle as the previous one. The only difference is that the theoretical model is built from the experimental data rather than from a predefined model.
Step 1: Assumptions
We want to check whether the proportions are similar across all the samples. The following hypotheses are stated:
H0: $P_1 = P_2 = \dots = P_k$
H1: at least one $P_i$ differs from the others
Step 2: The data table
The data table is a contingency table, in which the categories (modalities) of criterion X are crossed with those of criterion Y.
Case 1: Comparison of a distribution to a theoretical distribution
A goodness-of-fit (adjustment) test is used, for example, to decide whether a parametric test (which requires normally distributed data) can be applied, or simply to predict the behavior of our variable by fitting it to a known law.
In this case, the data table is simply the set of measures observed.
Case 2: Comparison of 2 or more distributions with one another
We want to compare 2 or more distributions with respect to the same parameter. Let's take the previous example and add that we want to compare how often the defect occurs according to the production rate, across 3 different machines. The table becomes the following:
Production rate | Machine 1 | Machine 2 | Machine 3
150 | 50 | 40 | 35
100 | 30 | 45 | 25
50 | 20 | 45 | 20
Total | 100 | 130 | 80
Step 3: Develop the theoretical model
Whether for a goodness-of-fit test or for the search for a relationship between variables, the calculation principle and formulas are the same. Only the construction of the theoretical model differs, as detailed below.
Case 1: Comparison of a distribution to a theoretical distribution
Used as a goodness-of-fit (adjustment) test, the χ² test compares the distribution of the data with a previously chosen law (Normal, Fisher…). From the observed data, we therefore build a model that follows this known law. To do so, we calculate the probability that a value falls in each class interval according to the chosen law, and multiply it by the number of data. For the normal law, this theoretical frequency is calculated with the following formula, where Φ is the cumulative distribution function of the standard normal law:
$f_{th} = N \left[ \Phi\left(\frac{b_{sup} - m}{\sigma}\right) - \Phi\left(\frac{b_{inf} - m}{\sigma}\right) \right]$
 N: the total number of observations
 m: the mean of the data
 $b_{inf}$: the lower bound of the interval in question
 $b_{sup}$: the upper bound of the interval
 σ: the standard deviation of the data
Note that in Excel, the cumulative probability of the normal law is returned by the function NORM.DIST(x; m; σ; TRUE).
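The formula above can be sketched with Python's standard `statistics.NormalDist`, whose `cdf` method plays the role of Φ (or of NORM.DIST with TRUE in Excel). The function name `expected_frequencies`, the class bounds and the parameters N = 100, m = 0, σ = 1 are hypothetical, chosen so the result is easy to check against a standard normal table.

```python
from statistics import NormalDist


def expected_frequencies(bounds, n_total, mean, sd):
    """Theoretical frequency of each class [b_inf, b_sup) under a normal law:
    N * (Phi((b_sup - m) / sd) - Phi((b_inf - m) / sd))."""
    law = NormalDist(mean, sd)
    return [n_total * (law.cdf(b_sup) - law.cdf(b_inf))
            for b_inf, b_sup in zip(bounds, bounds[1:])]


# Hypothetical classes and parameters (not taken from the table below).
bounds = [-2.0, -1.0, 0.0, 1.0, 2.0]
freqs = expected_frequencies(bounds, n_total=100, mean=0.0, sd=1.0)
print([round(f, 2) for f in freqs])  # [13.59, 34.13, 34.13, 13.59]
```

Each middle class receives about 34.13 of the 100 observations and each outer class about 13.59; values outside the first and last bounds are not counted, so the theoretical frequencies sum to slightly less than N.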
We build a table of this type:
Interval | < 16 | 16 to 20 | 20 to 24 | 24 to 28 | 28 to 32 | 32 to 36 | 36 to 40 | 40 to 44
Observed frequency | 5 | 11 | 16 | 21 | 15 | 12 | 8 | 2
Estimated frequency | 3.82 | 9.29 | 16.25 | 20.45 | 18.53 | 12.08 | 5.67 | 1.91
Sample size: 90
Average: 26.8
Standard deviation: 26.8
Case 2: Comparison of 2 or more distributions with one another
The analysis of a relationship between variables is based on comparing the proportions of the studied parameters between the observed data and the theoretical data. The theoretical model is built on the assumption that the proportions are the same in all our samples: the overall (pooled) proportions are calculated from all the observed data and then applied to each sample.
Let's take the previous example. We want to know whether replacing a part in a machine significantly reduces the defect rate. Samples 2 and 3 were made with the new part and samples 1 and 4 with the original part. The following table is obtained:
The theoretical model is deduced from this simply by applying this percentage to all samples. The following table is obtained:
Theoretical table
Production rate | Machine 1 | Machine 2 | Machine 3
150 | 40.32* | 52.42 | 32.26
100 | 32.26 | 41.94 | 25.81
50 | 27.42 | 35.65 | 21.94
Total | 100 | 130 | 80
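* Example: 100 × 125 / 310 = 40.32, i.e. the Machine 1 total (100) multiplied by the total for production rate 150 (50 + 40 + 35 = 125), divided by the grand total (310).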
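The construction of the theoretical table can be sketched in Python. The function name `expected_table` is illustrative; it applies the rule of the footnote, (column total × row total) / grand total, to the observed counts of the machine example and reproduces the theoretical table.

```python
def expected_table(observed):
    """Theoretical table under H0: (row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts of the machine example:
# rows = production rate 150, 100, 50; columns = Machine 1, 2, 3.
observed = [
    [50, 40, 35],
    [30, 45, 25],
    [20, 45, 20],
]
expected = expected_table(observed)
for rate, row in zip((150, 100, 50), expected):
    print(rate, [round(v, 2) for v in row])
```

Because each theoretical row is proportional to the column totals, every sample (machine) is given the same pooled proportions, which is exactly the H0 assumption.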
Step 4: Practical value
The practical value measures the discrepancy between the theoretical model and the observations. We write:
 $f_{ij}$: the value observed in sample i for modality j
 $f_{th,ij}$: the theoretical value for sample i and modality j
For each cell, the gap is calculated with the following formula; the gap of a sample is the sum over its modalities:
$D_{ij} = \frac{(f_{ij} - f_{th,ij})^2}{f_{th,ij}}$
Example: gap on sample 1 = $\frac{(12 - 15.47)^2}{15.47} + \frac{(188 - 184.53)^2}{184.53}$
The practical value is then simply the sum of the gaps over all samples:
Practical value = $\sum_i \sum_j D_{ij}$
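As a sketch, the practical value for the machine example can be computed in Python, rebuilding the theoretical table from the row and column totals as in Step 3. The function name `practical_value` is illustrative. The same function also reproduces the gap of sample 1 from the example when given its two observed and theoretical values.

```python
def practical_value(observed, expected):
    """Sum over all cells of D_ij = (f_ij - f_th,ij)^2 / f_th,ij."""
    return sum(
        (f - f_th) ** 2 / f_th
        for obs_row, exp_row in zip(observed, expected)
        for f, f_th in zip(obs_row, exp_row)
    )


# Machine example: rows = production rate 150, 100, 50; columns = Machine 1, 2, 3.
observed = [[50, 40, 35], [30, 45, 25], [20, 45, 20]]
# Theoretical table: row total * column total / grand total (Step 3).
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
total = sum(rows)
expected = [[r * c / total for c in cols] for r in rows]

print(f"practical value = {practical_value(observed, expected):.3f}")
print(f"gap on sample 1 = {practical_value([[12, 188]], [[15.47, 184.53]]):.4f}")
```

For the machine table this gives a practical value of about 10.54.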
Step 5: Critical value
Case 1: Comparison of a distribution to a theoretical distribution
The χ² law is used only for one-sided (upper-tail) tests, since it is not symmetrical. The risk used is therefore always α, not α/2 as for symmetrical laws. The number of degrees of freedom is equal to the number of classes in the distribution − 3 (for comparison with the normal law) or the number of values − 1 (for comparison with given expected values).
In Excel, the function is CHIINV(α; dof).
Case 2: Comparison of 2 or more distributions with one another
The practical value is compared with the critical value given by the χ² law. We choose the risk α, usually 5%, then calculate the number of degrees of freedom with the formula:
dof = (number of categories of X − 1) × (number of categories of Y − 1)
The critical value is then found either directly in the χ² table or in Excel with the function CHIINV(α; dof).
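The degrees-of-freedom rules and the critical value can be sketched as follows. `chi2_isf` is a hand-written stand-in for CHIINV (bisection on an upper-tail probability computed by series), not a library function. The helper `dof_goodness_of_fit` is a hypothetical name that generalizes the two Case 1 rules as classes − 1 − number of parameters estimated from the data, which gives classes − 3 for a normal law whose mean and standard deviation are estimated.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square law (series for the incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_isf(alpha, dof):
    """Critical value c with P(X >= c) = alpha (role of Excel's CHIINV), by bisection."""
    lo, hi = 0.0, dof + 10.0
    while chi2_sf(hi, dof) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def dof_contingency(n_categories_x, n_categories_y):
    """Case 2: (number of categories of X - 1) * (number of categories of Y - 1)."""
    return (n_categories_x - 1) * (n_categories_y - 1)


def dof_goodness_of_fit(n_classes, n_estimated_params=0):
    """Case 1: classes - 1 - parameters estimated from the data.

    0 estimated parameters gives 'number of values - 1'; a normal law with its
    mean and standard deviation estimated (2 parameters) gives 'classes - 3'.
    """
    return n_classes - 1 - n_estimated_params


alpha = 0.05
dof = dof_contingency(3, 3)  # machine example: 3 production rates x 3 machines
critical = chi2_isf(alpha, dof)
dof_fit = dof_goodness_of_fit(8, n_estimated_params=2)  # 8 classes fitted to a normal law
print(f"contingency: dof = {dof}, critical value = {critical:.3f}")
print(f"normal fit: dof = {dof_fit}, critical value = {chi2_isf(alpha, dof_fit):.3f}")
```

For the 3 × 3 machine table, dof = 4 and the critical value at α = 5% is about 9.49, the value found in a χ² table.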
Step 6: p-value
The p-value of the test allows us to reach a final conclusion about the model. It is obtained from the χ² law and computed in Excel with the function:
CHIDIST(practical value; dof)
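A sketch of the p-value for the machine example, again with a hand-written `chi2_sf` standing in for CHIDIST rather than a library call. For an even number of degrees of freedom the upper tail also has a closed form, e^(−x/2) Σ (x/2)^k / k! for k < dof/2, which is used here as a cross-check. The last line applies the p-value rule of Step 7.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square law (role of Excel's CHIDIST)."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


# Machine example: rows = production rate 150, 100, 50; columns = Machine 1, 2, 3.
observed = [[50, 40, 35], [30, 45, 25], [20, 45, 20]]
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
total = sum(rows)
practical = sum(
    (f - r * c / total) ** 2 / (r * c / total)
    for r, obs_row in zip(rows, observed)
    for c, f in zip(cols, obs_row)
)
dof = (len(rows) - 1) * (len(cols) - 1)
alpha = 0.05

p_value = chi2_sf(practical, dof)
# Cross-check: closed form of the upper tail, valid for an even number of dof only.
half = practical / 2.0
closed_form = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

print(f"practical value = {practical:.3f}, dof = {dof}, p-value = {p_value:.4f}")
print("We reject H0" if p_value < alpha else "We retain H0")
```

With a practical value of about 10.54 and 4 degrees of freedom, the p-value is about 0.032, below α = 5%, so H0 is rejected and, following Step 7, the samples are considered different.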
Step 7: Interpretation
Result | Statistical conclusion | Comparison of a standard deviation with a target standard deviation | Adjustment to a known distribution law | Comparison of distributions with one another
Practical value ≥ Critical value | We reject H0 | Our standard deviation is not on target. | The distribution of our data is not that of our comparison model. | The samples are different.
Practical value < Critical value | We retain H0 | Our standard deviation is on target. | The distribution of our data can be considered that of our comparison model. | The samples can be considered similar.

Result | Statistical conclusion | Comparison of a standard deviation with a target standard deviation | Adjustment to a known distribution law | Comparison of distributions with one another
p-value > α | We retain H0 | Our result is on target. | Our data follow the comparison law. | The samples are similar.
p-value < α | We reject H0 | Our result is not on target. | Our data do not follow the comparison law. | The samples are different.