Hypothesis tests make it possible to compare one or more samples, and to validate or invalidate a hypothesis.


Hypothesis tests compare groups of data to see whether they are similar. They can also compare a group of data with an expected target to see whether it is on target. Finally, they can support or supplement design of experiments and regressions to analyze cause-and-effect relationships. Hypotheses can be of any kind:

  • Has the defect percentage really decreased as a result of the actions taken?
  • Have we significantly lowered the level of inventory?
  • Are more than half of the employees a driving force behind the improvement process?
  • Does changing this part really have an impact on product quality?

Step 0: Define the purpose of the test

Compare a sample with a target

| Data type | Qualitative, 2 modalities | Qualitative, more than 2 modalities | Quantitative |
| --- | --- | --- | --- |
| Example | We want to compare a defect percentage (good / not good) with a target percentage. | We want to compare percentages for data divided into several categories. | We want to compare a mean or a standard deviation with a desired target. |
| Test to use | Student's t-test for a proportion | χ2 goodness-of-fit test | Student's t-test for a mean; χ2 test for a standard deviation |

Compare 2 samples together

| Data type | Qualitative, 2 modalities | Qualitative, more than 2 modalities | Ordinal qualitative | Quantitative |
| --- | --- | --- | --- | --- |
| Example | We want to compare the defect percentages (good / not good) of 2 samples. | We want to compare percentages for data divided into several categories. | We want to compare the rankings proposed by several juries. | We want to compare the means or the standard deviations of 2 samples. |
| Independent data | Student's t-test; χ2 test for association | χ2 test for association | Wilcoxon-Mann-Whitney test | Student's t-test for means; Brown-Forsythe test or Fisher-Snedecor test for variances (Brown-Forsythe is preferable) |
| Paired data | McNemar test | Wilcoxon test | Wilcoxon test | Student's t-test for means or variances |

Compare more than 2 samples

| Data type | Qualitative, 2 modalities | Qualitative, more than 2 modalities | Ordinal qualitative | Quantitative |
| --- | --- | --- | --- | --- |
| Example | We want to compare the defect percentages (good / not good) of several samples. | We want to compare percentages for data divided into several categories. | We want to compare the rankings proposed by several juries. | We want to compare the means or the standard deviations of more than 2 samples. |
| Independent data | χ2 test for association | χ2 test for association | Kruskal-Wallis test; Kendall's tau; Spearman's rho | ANOVA for means; Brown-Forsythe test for variances |
| Paired data | Cochran's Q | Friedman test | Friedman test | Friedman test; ANOVA in blocks; MANOVA for means |

Nonparametric vs parametric tests

In the previous tables, we find 2 families of tests:

  • Parametric tests: they compare the groups under a set of assumptions, namely that the distribution is normal and that the variances of the samples are similar.
  • Nonparametric tests: they are almost all based on the notion of ranks. The principle is to replace each value with its rank in the data set. The advantage of these tests is that they can be used for quantitative and qualitative data, and when the distribution is not normal.
| | Nonparametric test | Parametric test |
| --- | --- | --- |
| Test name | Friedman; Kruskal-Wallis; Wilcoxon-Mann-Whitney; Cochran's Q; Kendall's tau; Spearman's rho | Brown-Forsythe; Fisher-Snedecor; Student for paired data; ANOVA and ANOVA in blocks |
| Advantages | No conditions to satisfy. Easily handles qualitative, quantitative and even ordinal data. Less sensitive to outliers. Suitable for small samples. | More accurate if the conditions are met. |
| Disadvantages | Less accurate than parametric tests. | Requires normality. Requires "similar" variances for tests that compare a parameter other than the variance. |

In general, when the conditions are met, we prefer a parametric test, which is more accurate than a nonparametric test.

Step 1 : Validate the test conditions

Data distribution

For parametric tests, normality is necessary. To check it, we perform a goodness-of-fit test for normality.

Homogeneity of the Variance

Parametric tests are reliable when the variances of the samples are similar. To check this, we perform a homoscedasticity test.

Independence or paired data

Pairing consists of creating pairs of measurements in order to reduce the risk of misinterpreting the results.


Suppose we want to assess how well an additive reduces vehicle fuel consumption. There are 2 possible scenarios.

In the first, we choose 10 vehicles and split them into 2 groups, one of which receives the additive. All vehicles make the same journey and we compare the consumption of the two groups. The results will probably be skewed because we do not know whether the vehicles have comparable baseline consumption.

In the second, we choose 5 vehicles. Each makes the same journey twice, first without the additive and then with it. We then compare the two consumptions of each vehicle.

In the second case, our measurements are much more precise: we are free of the vehicle-to-vehicle variability that distorts the results.
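
The effect of pairing can be illustrated with a minimal sketch. The consumption figures below are invented for illustration, and the helper names `independent_t` and `paired_t` are hypothetical. The same numbers are analyzed twice: once as if they came from two unrelated groups (a Welch-type t statistic) and once as before/after pairs on the same vehicles.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical fuel consumption in L/100 km for 5 vehicles (illustrative values only).
without_additive = [6.0, 7.5, 9.0, 5.5, 8.0]
with_additive = [5.8, 7.2, 8.7, 5.4, 7.7]


def independent_t(a, b):
    """Welch t statistic: treats the two lists as unrelated groups."""
    standard_error = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / standard_error


def paired_t(a, b):
    """Paired t statistic: works on the per-vehicle differences."""
    differences = [x - y for x, y in zip(a, b)]
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))


t_independent = independent_t(without_additive, with_additive)
t_paired = paired_t(without_additive, with_additive)
print(f"independent t = {t_independent:.2f}, paired t = {t_paired:.2f}")
```

With these made-up values the paired statistic is about 6.0 while the independent one is about 0.27: the difference between vehicles hides the small, consistent gain unless each vehicle is compared with itself.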

Sample size

In general, it is preferable to have similar sample sizes across all the groups being tested. This reduces the variability of the variances.

In addition, the more data we have, the closer the distribution of the sample mean comes to a normal distribution. A sample size of at least 30 individuals per group is recommended.

Absence of outliers

Outliers are a source of biased results. Even though some tests, particularly nonparametric ones, are robust to this type of value, outliers should be removed before the study.

Step 2: State the hypotheses

Expression of hypotheses

The principle of a hypothesis test is to weigh a hypothesis against its opposite. For example, we weigh the hypothesis that the defect rate is 3% against the hypothesis that it is not 3%.

We therefore define:

  • Null hypothesis H0: this is the hypothesis under which we learn nothing, that is, the one under which our result is not significant. In other words, the observed result can be explained by chance. For example, when evaluating a drug, the null hypothesis is that the drug has no effect. Note that H0 is always expressed with an equality.
  • Alternative hypothesis H1: this is the hypothesis under which we learn something, that is, the one under which the tested result is significant. In other words, something other than chance happened and the data collected show a statistical difference. It usually represents what we want to demonstrate, unless what we want to demonstrate is an equality.

For example, we think our defect rate is 3% on average and we want to test it. Our hypotheses are:

  • H0: our defect percentage is equal to 3%
  • H1: our defect percentage is different from 3%

The test direction

We define which way the balance may tip. To do so, we give the test a direction and express our hypotheses as follows:

  • H0 = 3% and H1 ≠ 3%: we define a bilateral (two-sided) test
  • H0 = 3% and H1 < 3%: we define a unilateral (one-sided) test on the left
  • H0 = 3% and H1 > 3%: we define a unilateral (one-sided) test on the right

Step 3 : Identify the Practical Value

Also called the test statistic, it is the value calculated from our samples that we will compare with the critical value. Its calculation depends on the chosen test.
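
As an illustration, here is a minimal sketch of the practical value for one of the tests listed in Step 0, Student's t-test comparing a sample mean with a target. The function name `one_sample_t`, the measurements and the target of 10.0 are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, target):
    """Return the Student t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - target) / standard_error, n - 1


# Hypothetical measurements compared with a hypothetical target of 10.0.
measurements = [10.2, 9.9, 10.4, 10.1, 10.3, 10.0, 10.2, 10.5]
t_value, dof = one_sample_t(measurements, target=10.0)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

The statistic expresses how many standard errors separate the sample mean from the target; other tests use other formulas, but the role of the value is the same.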

Step 4 : Choose the level of risk

In hypothesis testing, there are two types of risk.

| Decision | H0 is true | H1 is true |
| --- | --- | --- |
| H0 retained | Correct conclusion | Error of the second kind |
| H1 retained | Error of the first kind | Correct conclusion |

In other words, by definition :

  • A risk of the first kind, called α: the risk of rejecting the null hypothesis when it is true (also called the significance threshold). It is the risk of seeing an effect when there is none: we condemn an innocent person.
  • A risk of the second kind, called β: the risk of retaining the null hypothesis when it is false. It is the risk of not seeing an effect when there is one: we acquit a guilty party.
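
The meaning of the risk α can be checked by simulation. The sketch below is only an illustration under stated assumptions: the data are drawn from a normal distribution with a known standard deviation of 1, H0 (mean equal to 0) is true by construction, and a bilateral z-test at α = 5% is applied many times. The sample size, number of trials and random seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)  # bilateral critical value
n, trials = 30, 5000

false_alarms = 0
for _ in range(trials):
    # H0 is true here: the data really have mean 0 and standard deviation 1.
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = mean(sample) / (1.0 / sqrt(n))
    if abs(z) > critical:
        false_alarms += 1  # H0 rejected although it is true

false_alarm_rate = false_alarms / trials
print(f"share of wrongly rejected H0: {false_alarm_rate:.3f}")
```

The share of wrong rejections should come out close to α, which is exactly what the risk of the first kind describes.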

Statistical power

Calculated as 1 – β, the power must be as high as possible, that is, the risk of the second kind must be as low as possible.

Put another way, statistical power is the probability of rejecting H0 when H0 is false; it therefore represents our probability of detecting a difference.

Ideally, it should be 0.8 or above in order to detect a reasonable deviation from the null hypothesis.

The power depends on several parameters: the size of the effect to be detected, the sample size, the baseline risk and the risk α.

In practice, it is customary to set the risk α to 5% and the risk β to 10%. These values are set arbitrarily, by philosophical choice:

We prefer to accept a bad lot rather than refuse a good lot, or to acquit a guilty person rather than condemn an innocent person.

Thus, the risk associated with the error of the first kind, which is considered the most serious error, is better controlled. Because the two risks α and β pull in opposite directions, choosing an α that is too small leads to rejecting H0 very rarely. Conversely, choosing an α that is too large leads to retaining H0 only very rarely. The risk β is then deduced by calculation, if the distribution under H1 is known.
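
When the distribution under H1 is known, β and the power can be computed directly. The sketch below assumes a very simple case that is not taken from the text: a unilateral test on the right for a mean, with normally distributed data and a known standard deviation (a z-test). The function name `power_one_sided_z` and the numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-sided z-test on a mean with known sigma."""
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha)
    shift = effect * sqrt(n) / sigma  # how far H1 moves the test statistic
    beta = standard_normal.cdf(critical_z - shift)  # chance of missing the effect
    return 1 - beta


for n in (10, 30, 50):
    print(n, round(power_one_sided_z(effect=0.5, sigma=1.0, n=n), 3))
```

In this setting the power grows with the sample size and with the effect size, and shrinks when α is made smaller, which is the trade-off between the two risks described above.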

The confidence level is equal to 1 – α, where α is the risk of the first kind.

Example :

Consider the test of the following hypotheses :

  • Hypothesis H0 : the patient must be hospitalized,
  • Alternative hypothesis H1 : the patient should not be hospitalized.

The error of the first kind is failing to hospitalize a patient who needed it. This error is very serious, since it can lead to the death of the patient. The error of the second kind, hospitalizing a patient who did not need it, is less serious.

Step 5 : Establish the decision rule

The critical value of the test is calculated from the previously defined confidence level. This critical value separates 2 decision areas:

  • Rejection area: the set of values of the test statistic that are unlikely if H0 is true. When the test statistic falls in this area, we reject H0 and retain H1.
  • Non-rejection area: the set formed by the other values, which are plausible if H0 is true. When the test statistic falls in this area, we retain H0.

Unilateral or Bilateral tests

For every distribution, one chooses between a bilateral test and a unilateral test. The two cases are summarized as follows:

| Test type | Risk used | Use | Decision rule |
| --- | --- | --- | --- |
| Bilateral test | The calculation uses α divided by 2 on each side. | We want to know whether our value is different from the test value. Example: the mean of this sample is different from that of another sample. | Practical value < – critical value or practical value > + critical value → H0 is rejected. We conclude that our two samples are different. |
| Unilateral test on the left | The calculation uses α equal to the total risk. | We want to know whether our value is lower than the test value. Example: the average consumption of the new vehicle is significantly lower than that of the old one. | Practical value < critical value → H0 is rejected. We conclude that sample 1 is significantly lower than sample 2. |
| Unilateral test on the right | The calculation uses α equal to the total risk. | We want to know whether our value is higher than the test value. Example: the lifetime of the new washing machines is longer than that of the old ones. | Practical value > critical value → H0 is rejected. We conclude that sample 1 is significantly greater than sample 2. |
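
The three decision rules can be sketched in a few lines. This example assumes, for simplicity, that the practical value follows a standard normal distribution under H0 (other tests use Student, χ2 or Fisher distributions instead). The function name `reject_h0` and the direction labels are hypothetical.

```python
from statistics import NormalDist


def reject_h0(statistic, alpha=0.05, direction="bilateral"):
    """Apply the decision rule for a statistic that is standard normal under H0."""
    standard_normal = NormalDist()
    if direction == "bilateral":
        critical = standard_normal.inv_cdf(1 - alpha / 2)  # alpha split over both tails
        return abs(statistic) > critical
    if direction == "left":
        return statistic < standard_normal.inv_cdf(alpha)  # negative critical value
    if direction == "right":
        return statistic > standard_normal.inv_cdf(1 - alpha)
    raise ValueError("direction must be 'bilateral', 'left' or 'right'")


print(reject_h0(1.80, direction="bilateral"))  # 1.80 lies inside +/- 1.96
print(reject_h0(1.80, direction="right"))      # 1.80 lies beyond 1.645
```

The same practical value can therefore lead to different decisions depending on the direction chosen, which is why the direction must be fixed before looking at the data.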

Calculate the critical value

The critical value is read from statistical tables established for each test. It depends on the distribution that applies to the chosen test and, most of the time, on the number of degrees of freedom.

Degree of freedom concept

The number of degrees of freedom is a measure of the amount of information that can be obtained from an observation. The more degrees of freedom we have, the more information we have.

For example, the equation A * B = 10 has several solutions, such as:

  • if A = 2, then B = 5
  • if A = 5, then B = 2

In other words, once we fix one of the two parameters, the other is fully determined. So in this case we have 1 degree of freedom, that is, n – 1 with n = 2 parameters.
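
The same reasoning applies to a sample: once the mean is known, the deviations from it must sum to zero, so only n – 1 of them are free. A minimal sketch with made-up values:

```python
from statistics import mean

# Hypothetical sample of n = 4 values.
sample = [4.0, 7.0, 5.0, 8.0]
sample_mean = mean(sample)

# Deviations from the mean always sum to zero...
deviations = [x - sample_mean for x in sample]

# ...so the last one can be rebuilt from the first n - 1.
rebuilt_last = -sum(deviations[:-1])

degrees_of_freedom = len(sample) - 1
print(deviations[-1], rebuilt_last, degrees_of_freedom)
```

This is why a statistic built on deviations from a sample mean, such as the sample variance, has n – 1 degrees of freedom.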

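Step 6 : Calculate the p-value

The p-value, or significance index, is an important concept in statistics. Introduced by Fisher, it measures how far the result can be attributed to chance: it is the probability, if H0 is true, of obtaining a result at least as extreme as the one observed.

It is calculated from the same distribution as the chosen test.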
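
As a minimal sketch, assuming a practical value that follows a standard normal distribution under H0 (the distribution would be different for a Student, χ2 or Fisher test), the p-value can be computed for each test direction as follows. The function name `p_value` and the practical value 2.1 are hypothetical.

```python
from statistics import NormalDist


def p_value(statistic, direction="bilateral"):
    """p-value of a statistic that is standard normal under H0."""
    standard_normal = NormalDist()
    if direction == "left":
        return standard_normal.cdf(statistic)
    if direction == "right":
        return 1 - standard_normal.cdf(statistic)
    if direction == "bilateral":
        return 2 * (1 - standard_normal.cdf(abs(statistic)))  # both tails
    raise ValueError("direction must be 'bilateral', 'left' or 'right'")


alpha = 0.05
p = p_value(2.1)
print(f"p = {p:.4f}, reject H0: {p < alpha}")
```

Comparing the p-value with α leads to the same decision as comparing the practical value with the critical value, since both are derived from the same distribution.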

Step 7 : Make the statistical decision

7.1 Reading the comparison between the practical value and the critical value

The results are always read with respect to the null hypothesis. There are 2 possible cases:

  • We have retained H0: the data do not provide sufficient evidence in favor of the alternative hypothesis H1.
  • We have rejected H0: we conclude in favor of the alternative hypothesis H1, with a risk α of being wrong.


7.2 Reading the p-value

The p-value is interpreted as follows:

  • p < α: the result is significant; it is unlikely to be due to chance alone, and H0 is rejected
  • p ≥ α: the result is not significant; it can be explained by chance, and H0 is retained

Step 8 : The Post Hoc test

When more than 2 samples have been compared and the statistical conclusion is that one or more differ from the others, post-hoc tests are carried out. These tests make it possible to identify which sample or samples differ from the others.

Note, however, that a simple pairwise analysis is in most cases sufficient to identify these samples.


