The Wilcoxon-Mann Whitney test allows you to compare the performance levels of two samples.

## Introduction

The Wilcoxon-Mann Whitney test is arguably the best-known non-parametric test. Strictly speaking, these are 2 tests, but each can be derived from the other, which is why we speak of the Wilcoxon-Mann Whitney test. Historically, the test published by Frank Wilcoxon in 1945 [1] preceded that of Henry Mann and his student Donald Ransom Whitney, published in 1947 [2]. The difference lies in the fact that, for the Wilcoxon test, one chooses a reference sample, which by convention is the one with the smallest number of individuals.

The Wilcoxon and Mann Whitney tests rely on the same principles and the same types of calculations. Most software offers both, but some offer only one of them. For this reason, we present both methods.

## The principle

Like most non-parametric tests, this test can compare data of any type that can be ordered (ordinal qualitative, continuous…). Each value is assigned a rank, and it is these ranks, rather than the raw values, that are compared to determine whether the distributions of the 2 samples are similar.

Let's take an example: 4 jurors have rated 2 products by giving each of them a score. The table is as follows:

| | Juror 1 | Juror 2 | Juror 3 | Juror 4 |
|---|---|---|---|---|
| Product 1 | 4 | 9 | 11 | Missing score |
| Product 2 | 6 | 8 | 10 | 12 |

The question the Wilcoxon-Mann Whitney test can answer in this case is: is one of the 2 products rated significantly better than the other?

## Step 1: Assumptions

The Wilcoxon-Mann Whitney test compares the distributions of the ranked data. In the case of a bilateral (two-sided) test, the hypotheses are:

H0: The 2 distributions are identical

H1: The 2 distributions are different

A unilateral (one-sided) test can also be carried out, with:

H0: The 2 distributions are identical

H1: The values of sample 1 tend to be smaller than those of sample 2 (left unilateral test)

or

H1: The values of sample 1 tend to be larger than those of sample 2 (right unilateral test)

## Step 2: Calculate the sum of the ranks

Like most non-parametric tests, the Wilcoxon-Mann Whitney test relies on the sum of ranks. A new variable is introduced, equal to the sum of the ranks of each sample. This has 2 consequences:

• Under H0, the distribution of this rank sum is symmetrical regardless of the initial distribution of the data. Thanks to this rank transformation, it also approaches a normal law as the samples grow.
• The impact of outliers is reduced, or even cancelled.

### 2.1 Identify the rank of each value

The rank of each value is obtained by comparing it with the other values. There are two ways to calculate it:

| Wilcoxon Test | Mann Whitney Test |
|---|---|
| The raw rank of the value within the pooled data of the 2 samples | The number of values in the other sample that are smaller than it |

The difficulty arises with ties (ex-aequo values). In this case, the mid-rank method is used: tied values all receive the average of the ranks they occupy.

For example:

• If 2 equal values take the 8th and 9th places, both are given rank 8.5.
• If 3 equal values take the 10th, 11th and 12th places, all three are given rank 11.
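The mid-rank rule can be sketched in a few lines of Python. `mid_ranks` is a hypothetical helper written for illustration and is not a library function. It ranks one pooled list of values, with 1 for the smallest.

```python
def mid_ranks(values):
    """Rank every value in `values` (1 = smallest); tied values all receive
    the average of the positions they occupy (mid-rank method)."""
    # Indices of the values, sorted by value
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the end of the run of values equal to the one at `start`
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) are ranks start+1..end+1: average them
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Two values sharing the 2nd and 3rd places both get rank 2.5
print(mid_ranks([3, 5, 5, 7]))  # [1.0, 2.5, 2.5, 4.0]
```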

### 2.2 Calculating the sum of the ranks per sample

Once the set of ranks is assigned, the sum of the ranks is calculated for each of the two samples.

Taking the table above, we get the following results:

| | Wilcoxon Test | Mann Whitney Test |
|---|---|---|
| Sample 1 | Ranks 1, 4, 6, so W1 = 11 | Ranks 0, 2, 3, so U1 = 5 |
| Sample 2 | Ranks 2, 3, 5, 7, so W2 = 17 | Ranks 1, 1, 2, 3, so U2 = 7 |
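Both rank sums can be computed directly from the jury data, leaving out Product 1's missing score. This is a minimal sketch: `rank_sums` is an illustrative helper, not a library function. The treatment of a Mann Whitney tie with the other sample (counted as 1/2) is an assumed convention, because the text above does not cover that case.

```python
def rank_sums(sample1, sample2):
    """Return (W1, W2, U1, U2) for two samples.

    W: sum of the ranks within the pooled data (ties get the mid-rank).
    U: for each value, count the values of the other sample that are smaller
       (a tie with the other sample counts as 1/2, an assumed convention).
    """
    pooled = sample1 + sample2

    def pooled_rank(v):
        below = sum(1 for x in pooled if x < v)
        equal = sum(1 for x in pooled if x == v)  # includes v itself
        return below + (equal + 1) / 2            # mid-rank of the tied block

    def mw_rank(v, other):
        return sum(1 for x in other if x < v) + 0.5 * sum(1 for x in other if x == v)

    w1 = sum(pooled_rank(v) for v in sample1)
    w2 = sum(pooled_rank(v) for v in sample2)
    u1 = sum(mw_rank(v, sample2) for v in sample1)
    u2 = sum(mw_rank(v, sample1) for v in sample2)
    return w1, w2, u1, u2


# Jury example, with Product 1's missing score left out
product1 = [4, 9, 11]
product2 = [6, 8, 10, 12]
print(rank_sums(product1, product2))  # (11.0, 17.0, 5.0, 7.0)
```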

Note that the results of the Mann Whitney test can be derived from those of the Wilcoxon test, whose ranks are simpler to calculate:

$$U_1 = W_1 - \frac{n_1 (n_1 + 1)}{2}$$

$$U_2 = W_2 - \frac{n_2 (n_2 + 1)}{2}$$
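Applied to the jury example (n1 = 3 scores for Product 1, since its 4th score is missing, and n2 = 4), the conversion gives back the Mann Whitney sums. The last equality, U1 + U2 = n1 n2, is a standard identity included here only as a cross-check.

```latex
U_1 = 11 - \frac{3 \times 4}{2} = 5, \qquad
U_2 = 17 - \frac{4 \times 5}{2} = 7, \qquad
U_1 + U_2 = 12 = n_1 n_2 = 3 \times 4
```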

## Step 3: Calculate the averages and variance

Under the null hypothesis, the 2 distributions are the same. We therefore look at where U and W lie relative to their average under H0. If U or W is much smaller or much larger than this average, we can conclude that the 2 distributions are different.

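For the average

$$E_W = \frac{n_1 (n + 1)}{2} \qquad E_U = \frac{n_1 n_2}{2}$$

with:

n1 and n2: sizes of samples 1 and 2

n = n1 + n2

For the variance

When there are no ties:

$$V_W = V_U = \frac{n_1 n_2 (n + 1)}{12}$$

When there are ties (ex-aequo values) in the pooled data of the 2 samples, the variance must be adjusted:

$$V_W = V_U = \frac{n_1 n_2}{12} \left[ (n + 1) - \frac{\sum_g \left(T_g^3 - T_g\right)}{n (n - 1)} \right]$$

with Tg: the number of observations sharing the tied value g. If, for example, we have 2 values of 6, then Tg = 2.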
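The following is a minimal sketch of these quantities. It assumes the standard textbook formulas: E(W1) = n1(n + 1)/2 and E(U1) = n1 n2/2. The common variance is n1 n2/12 × [(n + 1) - Σ(Tg³ - Tg)/(n(n - 1))], which reduces to n1 n2 (n + 1)/12 when there are no ties. `null_moments` is a hypothetical helper name.

```python
from collections import Counter


def null_moments(sample1, sample2):
    """Mean of W1, mean of U1 and their common variance under H0
    (standard formulas, including the tie correction)."""
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    mean_w1 = n1 * (n + 1) / 2
    mean_u1 = n1 * n2 / 2
    # Sum of (Tg^3 - Tg) over each group g of tied values in the pooled data;
    # it is 0 when there are no ties (every Tg = 1)
    tie_term = sum(t ** 3 - t for t in Counter(sample1 + sample2).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return mean_w1, mean_u1, variance


print(null_moments([4, 9, 11], [6, 8, 10, 12]))  # (12.0, 6.0, 8.0)
```

The variance is the same for W1 and U1 because U1 differs from W1 only by the constant n1(n1 + 1)/2.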

## Step 4: Practical value of Wilcoxon and Mann Whitney

### For the Wilcoxon test

#### If n1 and n2 <= 8

The exact Wilcoxon-Mann Whitney tables are used:

1. Choose the table corresponding to the size of the smaller sample.
2. Find the column corresponding to the number of individuals in the second sample.
3. Find the row corresponding to the W value of sample 1.
4. At the intersection, read the practical value of Wilcoxon-Mann Whitney.

Example: we have a sample of 3 individuals and another of 5. The W value of the smaller sample is 11, which gives a practical value of 0.286.
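The tables contain the exact distribution of W under H0. When there are no ties, every set of n1 ranks drawn from 1 to n1 + n2 is equally likely for sample 1, so the left-tail probability P(W1 ≤ w) can also be obtained by enumeration. This sketch assumes no ties; `exact_left_prob` is a hypothetical helper name.

```python
from itertools import combinations
from math import comb


def exact_left_prob(n1, n2, w_obs):
    """P(W1 <= w_obs) under H0, assuming no ties: every set of n1 ranks
    drawn from 1..n1+n2 is equally likely for sample 1."""
    n = n1 + n2
    favourable = sum(
        1 for ranks in combinations(range(1, n + 1), n1) if sum(ranks) <= w_obs
    )
    return favourable / comb(n, n1)


print(round(exact_left_prob(3, 5, 11), 3))  # 0.286
```

For the example above, 16 of the 56 possible rank sets give W ≤ 11, which is the 0.286 read from the table. Applied to the jury data (n1 = 3, n2 = 4, W1 = 11), the same function returns 15/35, about 0.429. Because the number of combinations grows quickly, enumeration is only practical for small samples like those the tables cover.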

#### If n1 or n2 > 8

Given the convergence towards the normal law, the following formula is used:

$$Z_W = \frac{W_1 - E_W}{\sqrt{V_W}}$$

where:

• W1: the sum of the ranks of sample 1
• EW: the Wilcoxon average
• VW: the Wilcoxon variance
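A minimal sketch of this normal-approximation practical value follows. It assumes no ties, so VW = n1 n2 (n + 1)/12. `wilcoxon_z` is a hypothetical helper name. The jury data is used only to show the arithmetic, because its sample sizes fall below the threshold of 8.

```python
from math import sqrt


def wilcoxon_z(w1, n1, n2):
    """Normal-approximation practical value for the Wilcoxon rank sum W1 (no ties)."""
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2        # EW under H0
    var_w = n1 * n2 * (n + 1) / 12   # VW under H0, without tie correction
    return (w1 - mean_w) / sqrt(var_w)


# Jury data, only to show the arithmetic (n1 = 3, n2 = 4)
print(round(wilcoxon_z(11, 3, 4), 4))  # -0.3536
```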

### For the Mann Whitney test

#### If n1 and n2 <= 8

The exact Wilcoxon-Mann Whitney tables are used:

1. Choose the table corresponding to the size of the smaller sample.
2. Find the column corresponding to the number of individuals in the second sample.
3. Find the row corresponding to the U value of sample 1.
4. At the intersection, read the practical value of Wilcoxon-Mann Whitney.

Example: we have a sample of 3 individuals and another of 5. The U value of the smaller sample is 5 (that is, W = 11 minus 3 × 4/2), which gives the same practical value of 0.286.

#### If n1 or n2 > 8

Given the convergence towards the normal law, the following formula is used:

$$Z_U = \frac{U_1 - E_U}{\sqrt{V_U}}$$

where:

• U1: the Mann Whitney statistic of sample 1 (the sum of its Mann Whitney ranks)
• EU: the Mann Whitney average
• VU: the Mann Whitney variance
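Because U1 = W1 - n1(n1 + 1)/2, the statistic and its average are shifted by the same constant, and the variance is unchanged. The Mann Whitney practical value is therefore identical to the Wilcoxon one. The sketch below checks this, assuming no ties and the standard formulas EU = n1 n2/2 and VU = n1 n2 (n + 1)/12. Both function names are hypothetical.

```python
from math import isclose, sqrt


def mann_whitney_z(u1, n1, n2):
    """Normal-approximation practical value for the Mann Whitney statistic U1 (no ties)."""
    mean_u = n1 * n2 / 2                   # EU under H0
    var_u = n1 * n2 * (n1 + n2 + 1) / 12   # VU under H0, same as VW
    return (u1 - mean_u) / sqrt(var_u)


def wilcoxon_z(w1, n1, n2):
    """Same quantity computed from the Wilcoxon rank sum W1, for comparison."""
    n = n1 + n2
    return (w1 - n1 * (n + 1) / 2) / sqrt(n1 * n2 * (n + 1) / 12)


# Convert W1 into U1, then compare the two practical values
w1, n1, n2 = 11, 3, 4
u1 = w1 - n1 * (n1 + 1) / 2
print(isclose(mann_whitney_z(u1, n1, n2), wilcoxon_z(w1, n1, n2)))  # True
```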

## Step 5: Critical value

The critical value is the same for the Wilcoxon and Mann Whitney tests. There are two ways to determine it, depending on the number of individuals per sample.

### n1 and n2 <= 8

In this case, the practical value has been read from the Wilcoxon or Mann Whitney table. It is compared with the chosen risk α, according to the direction of the test:

| Test type | Critical value |
|---|---|
| Bilateral | α/2 |
| Left unilateral | α |
| Right unilateral | 1 - α |

### n1 or n2 > 8

In this case, because the statistic converges to a normal distribution, the critical value is calculated from the standard normal law. It depends on the direction of the test and is obtained with the Excel function NORM.S.INV:

| Test type | Critical value |
|---|---|
| Bilateral | NORM.S.INV(α/2) or NORM.S.INV(1 - α/2) |
| Left unilateral | NORM.S.INV(α) |
| Right unilateral | NORM.S.INV(1 - α) |
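Outside Excel, the same critical values come from the inverse cumulative distribution of the standard normal law. The sketch below uses Python's standard-library `statistics.NormalDist().inv_cdf`, which plays the role of NORM.S.INV. `critical_values` and its direction labels are hypothetical names chosen for this example.

```python
from statistics import NormalDist


def critical_values(alpha, direction):
    """Normal-law critical value(s) for a given risk alpha and test direction."""
    std = NormalDist()  # standard normal law, like Excel's NORM.S.INV
    if direction == "bilateral":
        return std.inv_cdf(alpha / 2), std.inv_cdf(1 - alpha / 2)
    if direction == "left":
        return std.inv_cdf(alpha)
    if direction == "right":
        return std.inv_cdf(1 - alpha)
    raise ValueError("direction must be 'bilateral', 'left' or 'right'")


print([round(v, 3) for v in critical_values(0.05, "bilateral")])  # [-1.96, 1.96]
print(round(critical_values(0.05, "right"), 3))                   # 1.645
```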

## Step 6: Calculating the p-value

The p-value is used to evaluate the risk level of the test. Because the rank transformation makes the statistic approximately normal, the p-value of the bilateral test is obtained with the formula:

P-value = 2 * (1 - NORM.S.DIST(ABS(practical value), TRUE))
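The same bilateral p-value can be computed with Python's standard library, where `statistics.NormalDist().cdf` is the cumulative distribution of the standard normal law. This is a sketch that assumes the practical value is already a normal-approximation Z score. `bilateral_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist


def bilateral_p_value(z):
    """Two-sided p-value of a normal-approximation practical value z;
    mirrors 2 * (1 - NORM.S.DIST(ABS(z), TRUE)) in Excel."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Jury data: z = (11 - 12) / sqrt(8), about -0.354
p = bilateral_p_value(-0.3536)
print(round(p, 3))
```

With the jury data, the p-value is far above the usual 5% risk, so no difference between the 2 products would be concluded.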

## Step 7: Interpretation

| Test direction | Result | Statistical conclusion | Practical conclusion |
|---|---|---|---|
| Bilateral | Practical value < critical value α/2, or practical value > critical value 1 - α/2 | We reject H0 | The 2 distributions are different |
| Right unilateral | Practical value > critical value | We reject H0 | Sample 1 has larger values than sample 2 |
| Left unilateral | Practical value < critical value | We reject H0 | Sample 1 has smaller values than sample 2 |

| Result | Statistical conclusion | Practical conclusion |
|---|---|---|
| p-value > α | We retain H0 | No significant difference is detected: our 2 data series are considered identical or close |
| p-value ≤ α | We reject H0 | Our data series are statistically different, with a risk of being wrong equal to the p-value |
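The decision rules of the first table can be sketched for a normal-approximation practical value. This sketch assumes the critical values come from the standard normal law, as in the large-sample case. `reject_h0` and its direction labels are hypothetical names.

```python
from statistics import NormalDist


def reject_h0(z, alpha=0.05, direction="bilateral"):
    """Apply the decision rules of the table to a normal-approximation
    practical value z. Returns True when H0 is rejected."""
    std = NormalDist()
    if direction == "bilateral":
        return z < std.inv_cdf(alpha / 2) or z > std.inv_cdf(1 - alpha / 2)
    if direction == "right":   # H1: sample 1 has larger values
        return z > std.inv_cdf(1 - alpha)
    if direction == "left":    # H1: sample 1 has smaller values
        return z < std.inv_cdf(alpha)
    raise ValueError("direction must be 'bilateral', 'right' or 'left'")


print(reject_h0(-0.3536))                 # False: retain H0
print(reject_h0(1.8, direction="right"))  # True: 1.8 > 1.645
```

For a bilateral test, this rule gives the same decision as comparing the bilateral p-value with α.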

Note that, for samples of fewer than 8 individuals, the p-value will often be greater than α. This simply indicates that the number of observations is too small to guarantee the reliability of the results.
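One way to see this is to compute the smallest p-value an exact bilateral test can reach. The sketch assumes no ties and the common convention of doubling the one-sided tail probability. The most extreme outcome, with sample 1 holding all the lowest ranks, is 1 of C(n1 + n2, n1) equally likely arrangements. `min_bilateral_p` is a hypothetical helper name.

```python
from math import comb


def min_bilateral_p(n1, n2):
    """Smallest p-value an exact bilateral test can reach without ties:
    twice the probability of the single most extreme rank arrangement."""
    return 2 / comb(n1 + n2, n1)


for n1, n2 in [(3, 3), (3, 4), (4, 4), (5, 5)]:
    print(n1, n2, round(min_bilateral_p(n1, n2), 4))
```

With 3 individuals per sample, the smallest reachable p-value is 0.1. In that case H0 can never be rejected at α = 5%, however different the two samples look.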

## Source

1 – F. Wilcoxon (1945) – Individual comparisons by ranking methods

2 – H. B. Mann, D. R. Whitney (1947) – On a test of whether one of two random variables is stochastically larger than the other

S. Tufféry (2005) – Data Mining and decision-making statistics

P. Capéraà, B. Van Cutsem (1988) – Methods and models in non-parametric statistics

H. J. Motulsky (2002) – Biostatistics, an intuitive approach

R. Rafiq (2008) – Population comparison, non-parametric tests

R. Defoamer, Mr. Leo, L. The Guelte (1996) – Introduction to statistics