Introduction
The Wilcoxon-Mann-Whitney test is arguably the best-known nonparametric test. Strictly speaking, these are 2 tests, but each can be deduced from the other, which is why we still speak of the Wilcoxon-Mann-Whitney test. Historically, Frank Wilcoxon's test, published in 1945^{1}, preceded that of Henry Mann and his student Donald Ransom Whitney, published in 1947^{2}. The difference is that the Wilcoxon test requires choosing a reference sample, which by convention is the one with the fewest individuals.
The Wilcoxon and Mann-Whitney tests are based on the same principles and the same types of calculations. Most software offers both, but some packages compute only one of them. For this reason, we present both methods.
The principle
Like most nonparametric tests, this one can compare data of any type that can be ordered (qualitative, continuous…): each value is assigned a rank. It is these ranks that are then compared to decide whether the two distributions of values are similar.
Let's take an example: we have 4 juries that have each evaluated 2 products by awarding them a score. The table is as follows:
| | Jury 1 | Jury 2 | Jury 3 | Jury 4 |
|---|---|---|---|---|
| Product 1 | 4 | 9 | 11 | Score lost |
| Product 2 | 6 | 8 | 10 | 12 |
The question that the Wilcoxon-Mann-Whitney test can answer in this case is: is one of the 2 products rated significantly better than the other?
Step 1: Hypotheses
The Wilcoxon-Mann-Whitney test relies on comparing the distributions of the ranked data. For a bilateral (two-sided) test, the hypotheses are:
H_{0}: The 2 distributions are identical
H_{1}: The 2 distributions are different
A unilateral (one-sided) test can also be carried out, in which case the hypotheses are:
H_{0}: The 2 distributions are identical
H_{1}: The values of Sample 1 are smaller than those of Sample 2 (left unilateral test)
or
H_{1}: The values of Sample 1 are larger than those of Sample 2 (right unilateral test)
Step 2: Calculate the sum of the ranks
Like most nonparametric tests, the Wilcoxon-Mann-Whitney statistic uses the sum of ranks. A new variable is introduced: the sum of the ranks of each sample. This has 2 consequences:
 The distribution of the ranks is necessarily symmetrical, whatever the initial distribution of the data. Thanks to this rank transformation, the sum of ranks approaches a normal distribution as the sample sizes grow.
 The impact of outliers is reduced, or even cancelled.
2.1 Identify the rank of each value
The rank of each value is determined relative to the set of values of the 2 samples. There are two ways to calculate it:
| Wilcoxon test | Mann-Whitney test |
|---|---|
| The rank of a value is its raw position within the pooled data of the 2 samples. | The rank of a value is obtained by comparing it with the other sample: it is the number of values of the other sample that lie below it. |
The difficulty lies in handling ties. For these, the mid-rank method is used: tied values are each given the average of the ranks they occupy.
For example:
 If 2 equal values occupy 8th and 9th place, each is given the rank 8.5.
 If 3 equal values occupy 10th, 11th and 12th place, each is given the rank 11.
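The mid-rank rule can be sketched in a few lines of Python. The function name `mid_ranks` and the data are hypothetical, chosen so that two values tie for 8th and 9th place and three values tie for 10th to 12th place, as in the two cases above.

```python
def mid_ranks(values):
    """Return the 1-based rank of each value, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is equal to the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Places start+1 .. end+1 are shared by the group: assign their average.
        average = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

# Hypothetical data: 20 appears twice (places 8-9), 30 three times (places 10-12).
data = [1, 2, 3, 4, 5, 6, 7, 20, 20, 30, 30, 30]
print(mid_ranks(data)[7:])  # [8.5, 8.5, 11.0, 11.0, 11.0]
```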
2.2 Calculating the sum of the ranks per sample
Once the set of ranks is assigned, the sum of the ranks is calculated for each of the two samples.
Taking the table above, we get the following results:
| Wilcoxon test | Mann-Whitney test |
|---|---|
| Ranks of sample 1: 1, 4, 6, so W_{1} = 11. Ranks of sample 2: 2, 3, 5, 7, so W_{2} = 17. | Ranks of sample 1: 0, 2, 3, so U_{1} = 5. Ranks of sample 2: 1, 1, 2, 3, so U_{2} = 7. |
It is noted that the results of the Mann Whitney test can be inferred from the results of the Wilcoxon test, whose rank calculation is simpler. We find:
U_{1} = W_{1} - n_{1}(n_{1} + 1)/2
U_{2} = W_{2} - n_{2}(n_{2} + 1)/2
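As a check, the figures of the example can be recomputed with a short Python sketch. The function names `rank_sums` and `u_statistic` are hypothetical, and the code assumes there are no ties, which is the case in the example table.

```python
product_1 = [4, 9, 11]       # the score of the fourth jury was lost
product_2 = [6, 8, 10, 12]

def rank_sums(sample_1, sample_2):
    """Wilcoxon: sum, per sample, of the ranks in the pooled data (no ties assumed)."""
    pooled = sorted(sample_1 + sample_2)
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[v] for v in sample_1), sum(rank[v] for v in sample_2)

def u_statistic(sample, other):
    """Mann-Whitney: for each value, count the values of the other sample below it."""
    return sum(sum(1 for o in other if o < v) for v in sample)

w1, w2 = rank_sums(product_1, product_2)
u1 = u_statistic(product_1, product_2)
u2 = u_statistic(product_2, product_1)
n1, n2 = len(product_1), len(product_2)

print(w1, w2, u1, u2)                      # 11 17 5 7
print(u1 == w1 - n1 * (n1 + 1) // 2)       # True
print(u2 == w2 - n2 * (n2 + 1) // 2)       # True
```

Note that U_{1} + U_{2} = 12 = n_{1} × n_{2}: each pair made of one value from each sample is counted exactly once.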
Step 3: Calculate the mean and the variance
Under the null hypothesis, the 2 distributions are the same. We want to know where U and W lie relative to their mean under that hypothesis. In other words, if the value of U or W is unusually small or large compared with this mean, we can conclude that the 2 distributions are different.
For the mean
n = n_{1} + n_{2}
T_{g}: the number of observations sharing a given value (the size of a group of ties). For example, if 2 observations have the value 6, then T_{g} = 2 for that group.
For the variance
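The expressions usually given in textbooks for these quantities, written with the notation above, are sketched below. They are assumed here as the standard forms rather than taken from this text; G, the number of groups of ties, is a symbol introduced for this sketch. The tie term vanishes when all values are distinct.

```latex
\begin{aligned}
E_W &= \frac{n_1 (n + 1)}{2} \\
E_U &= \frac{n_1 n_2}{2} \\
V_W = V_U &= \frac{n_1 n_2}{12} \left[ (n + 1) - \sum_{g=1}^{G} \frac{T_g^{3} - T_g}{n (n - 1)} \right]
\end{aligned}
```

For the example table (n_{1} = 3, n_{2} = 4, no ties), these give E_{W} = 12, E_{U} = 6 and V_{W} = V_{U} = 8.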
Step 4: Practical value
For the Wilcoxon test
If n_{1} and n_{2} <= 8
The exact tables of Wilcoxon-Mann-Whitney are used:
 Choose the table that corresponds to the size of the smaller sample
 The column is identified by the size of the second sample
 The row is identified by the W value of sample 1
 At their intersection, we read the practical value of Wilcoxon-Mann-Whitney
Example: we have one sample of 3 individuals and another of 5. The smaller of the two W values is 11. We get a practical value of 0.286.
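The value read from the table in this example can be recovered by enumeration. The sketch below assumes that the tabulated value is the lower-tail probability P(W ≤ w) under the null hypothesis, with no ties, so that every choice of ranks for the smaller sample is equally likely. The function name `exact_lower_tail` is hypothetical.

```python
from itertools import combinations

def exact_lower_tail(n_small, n_other, w):
    """P(W <= w) when every set of ranks for the smaller sample is equally likely."""
    ranks = range(1, n_small + n_other + 1)
    sums = [sum(c) for c in combinations(ranks, n_small)]
    return sum(1 for s in sums if s <= w) / len(sums)

# Samples of 3 and 5 individuals, W = 11 for the smaller sample.
print(round(exact_lower_tail(3, 5, 11), 3))  # 0.286
```

Here 16 of the 56 possible rank sets have a sum of 11 or less, hence 16/56 ≈ 0.286.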
If n_{1} or n_{2} > 8
Given the convergence to the normal distribution, the following formula is used:
Z = (W_{1} - E_{W}) / √V_{W}
 W_{1}: the sum of the ranks of sample 1
 E_{W}: the Wilcoxon mean
 V_{W}: the Wilcoxon variance
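A minimal Python sketch of this standardization follows. It assumes the usual textbook expressions E_{W} = n_{1}(n + 1)/2 and a tie-corrected variance built from the T_{g} defined earlier; the function names and the data are hypothetical.

```python
from collections import Counter
from math import sqrt

def mid_rank_sum(sample, pooled):
    """Sum of the ranks of `sample` within `pooled`, using mid-ranks for ties."""
    total = 0.0
    for v in sample:
        below = sum(1 for x in pooled if x < v)
        equal = sum(1 for x in pooled if x == v)
        total += below + (equal + 1) / 2
    return total

def wilcoxon_z(sample_1, sample_2):
    """Standardized rank sum of sample 1 (normal approximation, tie-corrected)."""
    pooled = sample_1 + sample_2
    n1, n2, n = len(sample_1), len(sample_2), len(pooled)
    w1 = mid_rank_sum(sample_1, pooled)
    mean_w = n1 * (n + 1) / 2
    # T_g is the size of each group of equal values; groups of size 1 contribute 0.
    tie_term = sum(t**3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    return (w1 - mean_w) / sqrt(var_w)

# Hypothetical scores: 9 and 10 observations, with one tie (the two 20s).
group_a = [12, 15, 17, 18, 20, 22, 25, 27, 30]
group_b = [14, 16, 19, 20, 23, 24, 26, 28, 29, 31]
print(round(wilcoxon_z(group_a, group_b), 3))
```

A negative result means that the rank sum of sample 1 lies below its mean under the null hypothesis.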
For the Mann-Whitney test
If n_{1} and n_{2} <= 8
The same exact tables of Wilcoxon-Mann-Whitney are used, with the same lookup procedure: the table for the size of the smaller sample, the column for the size of the second sample, the row for the W value of sample 1, and the practical value at their intersection. With the same example (samples of 3 and 5 individuals, smaller W value of 11), the practical value is again 0.286.
If n_{1} or n_{2} > 8
Given the convergence to the normal distribution, the following formula is used:
Z = (U_{1} - E_{U}) / √V_{U}
 U_{1}: the sum of the Mann-Whitney ranks of sample 1
 E_{U}: the Mann-Whitney mean
 V_{U}: the Mann-Whitney variance
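Because U_{1} and W_{1} differ only by a constant, the two standardized values coincide. The sketch below illustrates this with the figures of the example table, used here only for the arithmetic since those samples are small. It assumes the usual textbook expressions E_{W} = n_{1}(n + 1)/2, E_{U} = n_{1}n_{2}/2 and, without ties, V_{W} = V_{U} = n_{1}n_{2}(n + 1)/12; the function name `z_scores` is hypothetical.

```python
from math import isclose, sqrt

def z_scores(n1, n2, w1, u1):
    """Return (Z from W1, Z from U1) using the no-tie mean and variance."""
    n = n1 + n2
    variance = n1 * n2 * (n + 1) / 12      # shared by W and U when there are no ties
    z_w = (w1 - n1 * (n + 1) / 2) / sqrt(variance)
    z_u = (u1 - n1 * n2 / 2) / sqrt(variance)
    return z_w, z_u

# Example table: n1 = 3, n2 = 4, W1 = 11, U1 = 5.
z_w, z_u = z_scores(3, 4, 11, 5)
print(isclose(z_w, z_u))  # True
```

This is why the remaining steps apply to both tests alike.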
Step 5: Critical value
The critical value is the same for the Wilcoxon and Mann-Whitney tests. There are two ways to obtain it, depending on the number of individuals per sample.
n_{1} and n_{2} <= 8
In this case, the practical value has been read from the Wilcoxon or Mann-Whitney table. It is compared with the chosen risk level α, depending on the direction of the test:
| Test type | Critical value |
|---|---|
| Bilateral | α/2 |
| Left unilateral | α |
| Right unilateral | 1 - α |
n_{1} or n_{2} > 8
In this case, given the convergence to a normal distribution, the critical value is computed from the standard normal distribution. It depends on the direction of the test and is obtained by applying the function NORM.S.INV to the following probability:
| Test type | Critical value |
|---|---|
| Bilateral | α/2 or 1 - α/2 |
| Left unilateral | α |
| Right unilateral | 1 - α |
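Outside a spreadsheet, the same critical values can be obtained from the inverse of the standard normal distribution. The sketch below uses `NormalDist().inv_cdf` from the Python standard library as the counterpart of NORM.S.INV; the function name `critical_values` and its direction labels are hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha, direction):
    """Critical value(s) of the standard normal law for a given test direction."""
    inv = NormalDist().inv_cdf
    if direction == "bilateral":
        return inv(alpha / 2), inv(1 - alpha / 2)
    if direction == "left":
        return (inv(alpha),)
    if direction == "right":
        return (inv(1 - alpha),)
    raise ValueError("direction must be 'bilateral', 'left' or 'right'")

print([round(c, 2) for c in critical_values(0.05, "bilateral")])  # [-1.96, 1.96]
print([round(c, 3) for c in critical_values(0.05, "left")])       # [-1.645]
print([round(c, 3) for c in critical_values(0.05, "right")])      # [1.645]
```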
Step 6: Calculating the p-value
The p-value is used to evaluate the risk level of the test. Since the rank transformation “normalizes” the data, the p-value of the bilateral test is obtained via the formula:
p-value = 2 * (1 - NORM.S.DIST(ABS(practical value), TRUE))
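The same calculation can be sketched in Python, with `NormalDist().cdf` from the standard library playing the role of the cumulative standard normal function. The function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Bilateral p-value of a standardized practical value z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p_value(1.96), 3))   # 0.05
print(round(two_sided_p_value(-1.96), 3))  # 0.05
```

The absolute value makes the result the same whichever sample is taken as sample 1.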
Step 7: Interpretation
| Test direction | Result | Statistical conclusion | Practical conclusion |
|---|---|---|---|
| Bilateral | Practical value < critical value at α/2, or practical value > critical value at 1 - α/2 | We reject H0 | The 2 distributions are different |
| Right unilateral | Practical value > critical value | We reject H0 | Sample 1 has larger values than sample 2 |
| Left unilateral | Practical value < critical value | We reject H0 | Sample 1 has smaller values than sample 2 |
| Result | Statistical conclusion | Practical conclusion |
|---|---|---|
| p-value > α | We retain H0 | Our 2 data series cannot be distinguished: identical or close distributions remain plausible |
| p-value ≤ α | We reject H0 | Our 2 data series are statistically different; the p-value is the probability of observing a gap at least this large if H0 were true |
Note that, in general, for samples of fewer than 8 observations, the p-value will often be greater than α. This indicates only that the number of observations is too small for the test to detect a difference reliably.
Source
1 – F. Wilcoxon (1945) – Individual comparisons by ranking methods
2 – H. B. Mann, D. R. Whitney (1947) – On a test of whether one of two random variables is stochastically larger than the other
S. Tufféry (2005) – Data Mining and decisionmaking statistics
P. Capéraà, B. Van Cutsem (1988) – Methods and models in nonparametric statistics
H. J. Motulsky (2002) – Biostatistics, an intuitive approach
R. Rafiq (2008) – Population comparison, nonparametric tests
R. Ramousse, M. Le Berre, L. Le Guelte (1996) – Introduction to statistics