Analysis of variance (ANOVA) compares the means of K samples.

## Introduction

ANOVA (analysis of variance) analyzes the differences between several samples whose response is quantitative (a speed, a pressure…).

It has many applications. It answers the question: "Is at least one group of values different from the others?" With this test we can:

• compare the means of different populations: for example, we want to compare the number of defects generated on several lots to see whether the actions implemented have paid off.
• analyze the effect of a qualitative variable on a continuous variable: for example, we want to know whether changing a part improves the performance of the machine. We then have 2 groups of quantitative measurements (the number of good pieces per hour, for example) to compare.

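Why call a test that compares means an analysis of variance?

Because the test decides whether the means differ by comparing variances, and every one of these variances is calculated from means: the variance between the group means is set against the variance within the groups.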

## The principle

ANOVA compares the mean of each group with the general mean. We calculate:

• Intergroup variance, called explained variance: the difference between the mean of each group and the general mean.
• Intragroup variance, called residual variance: the difference between the value of each individual and the mean of its group.
• Total variance: the difference between the value of each individual and the general mean.
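These three quantities are linked: the total sum of squared deviations splits exactly into a between-group part and a within-group part. Written with the names used in the steps that follow (SST, SSE and TSS), and taking $x_i$ for an individual value, $\mu_{\text{Group}}$ for the mean of its group, $\mu_{\text{General}}$ for the general mean and $n_k$ for the size of a group, the identity reads:

$$
\underbrace{\sum_{i}\left(x_i-\mu_{\text{General}}\right)^2}_{TSS}
=\underbrace{\sum_{\text{groups}} n_k\left(\mu_{\text{Group}}-\mu_{\text{General}}\right)^2}_{SST}
+\underbrace{\sum_{\text{groups}}\ \sum_{i\,\in\,\text{group}}\left(x_i-\mu_{\text{Group}}\right)^2}_{SSE}
$$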

The ANOVA is presented in the form of a table summarizing all the results of the calculations.

| Source of variance | Sum of squares | DOF | Mean square | F | p-value |
|---|---|---|---|---|---|
| Treatment | SST | K - 1 | MST | $\frac{MST}{MSE}$ | p-value |
| Error | SSE | n - K | MSE | | |
| Total | TSS | n - 1 | | | |

K is the number of samples to compare and n the total number of individuals.

## Step 1: Hypotheses

ANOVA generalizes the comparison of means to several populations. The alternative hypothesis is therefore always non-directional (a two-sided test on the means). The null and alternative hypotheses are:

H0 : μ1 = μ2 = … = μK

H1 : at least 2 means are different

## Step 2: Calculate the sum of squares of the treatment – SST

We start by calculating the sum of squares of the treatment, which measures how far the mean of each group lies from the general mean. If this value is large, the variability between the group means is large, which leads us to reject the null hypothesis. In other words, the samples are different.

Each sample contributes one term, and the SST is the sum of these terms over all the samples:

$$SST = \sum_{\text{groups}} n_k \left(\mu_{\text{Group}} - \mu_{\text{General}}\right)^2$$

With:

• μGroup : mean of each group
• μGeneral : mean of all the values, all groups combined
• nk : size of each sample
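As an illustration, the SST can be computed in a few lines of Python. This is a minimal sketch: the three samples and the names `groups`, `general_mean` and `sst` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical data: K = 3 samples of 3 measurements each (not from the article).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}

# General mean: mean of all the values, all groups combined.
all_values = [x for values in groups.values() for x in values]
general_mean = mean(all_values)

# Each sample contributes n_k * (group mean - general mean)^2.
sst = sum(
    len(values) * (mean(values) - general_mean) ** 2
    for values in groups.values()
)

print(f"General mean = {general_mean}, SST = {sst}")
```

Here the group means are 5, 7 and 9 around a general mean of 7, so SST = 3·4 + 3·0 + 3·4 = 24.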

## Step 3: Calculate the sum of squares of the Error – SSE

The sum of squares of the residuals (the error) represents the deviations of the values within their own group. The question is whether the values of each group are clustered around the group mean or show a lot of variability. If there is a lot of variability between the values of the same group, the difference between the groups is not clear-cut. Conversely, if there is little variability within the groups (low SSE) and large differences between the groups (high SST), we can reject the null hypothesis with a high degree of confidence.

The SSE is a single value whatever the number of samples: the squared deviations are summed within each group, then over all the groups. The calculation formula is as follows:

$$SSE = \sum_{\text{groups}} \ \sum_{i \,\in\, \text{group}} \left(x_i - \mu_{\text{Group}}\right)^2$$

With:

• xi : the values of the same group
• μGroup : mean of the group in question
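The same idea can be sketched in Python for the SSE. The data and the variable names below are hypothetical and serve only as an illustration.

```python
from statistics import mean

# Hypothetical data: three samples of three measurements each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]

sse = 0.0
for values in groups:
    group_mean = mean(values)
    # Squared deviations of each value from the mean of its own group.
    sse += sum((x - group_mean) ** 2 for x in values)

print(f"SSE = {sse}")
```

Each group contributes 1 + 0 + 1 = 2, so SSE = 6: the values sit close to their own group mean.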

## Step 4: Deduce the Total Sum of Squares – TSS

TSS is the sum of SSE and SST. It is the total variability of our samples:

$$TSS = SST + SSE = \sum_{i=1}^{n} \left(x_i - \mu_{\text{General}}\right)^2$$

With:

• xi : the values of all the samples
• μGeneral : mean of all the values, all groups combined
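A quick way to check a calculation is to compute the TSS directly from the individual values and compare it with SST + SSE. The sketch below does this on hypothetical data; the names are illustrative only.

```python
from statistics import mean

# Hypothetical data: three samples of three measurements each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
all_values = [x for values in groups for x in values]
general_mean = mean(all_values)

# Direct calculation: deviations of every value from the general mean.
tss = sum((x - general_mean) ** 2 for x in all_values)

# The two components, calculated separately.
sst = sum(len(v) * (mean(v) - general_mean) ** 2 for v in groups)
sse = sum((x - mean(v)) ** 2 for v in groups for x in v)

# Both routes must agree, up to floating-point rounding.
assert abs(tss - (sst + sse)) < 1e-9
print(f"TSS = {tss}, SST + SSE = {sst + sse}")
```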

## Step 5: The number of degrees of freedom

The number of degrees of freedom reflects the amount of independent information available for each source of variability. In our case, there are three:

• Number of dof for SST: dofSST = K – 1
• Number of dof for SSE: dofSSE = n – K
• Number of dof for TSS: dofTSS = n – 1

As a reminder:

• K: the number of samples
• n: the total number of individuals

## Step 6: Calculate the mean squares

The mean squares are the sums of squares SST and SSE scaled by their degrees of freedom, which makes them comparable. They are obtained by dividing each sum of squares by its dof:

• MST = SST / dofSST
• MSE = SSE / dofSSE
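The degrees of freedom and the mean squares fit in one small helper. This is a sketch: the function name `mean_squares` and the input values are hypothetical.

```python
def mean_squares(sst, sse, k, n):
    """Return (MST, MSE) for k samples and n individuals in total."""
    dof_sst = k - 1   # degrees of freedom of the treatment
    dof_sse = n - k   # degrees of freedom of the error
    if dof_sst <= 0 or dof_sse <= 0:
        raise ValueError("need at least 2 samples and more individuals than samples")
    return sst / dof_sst, sse / dof_sse


# Hypothetical values: SST = 24, SSE = 6, K = 3 samples, n = 9 individuals.
mst, mse = mean_squares(24.0, 6.0, k=3, n=9)
print(f"MST = {mst}, MSE = {mse}")
```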

## Step 7: Practical value

The test statistic, called the practical value here, is the ratio between the explained variability and the residual variability, each corrected by its degrees of freedom. The larger this ratio, the more pronounced the difference between the groups. It is calculated with the following formula:

$$F = \frac{MST}{MSE}$$

A one-way ANOVA gives a single practical value for all the groups compared.
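Steps 2 to 7 can be chained into one function that goes from the raw samples to the practical value. The sketch below assumes a one-way design; the function name `practical_value` and the data are hypothetical.

```python
from statistics import mean


def practical_value(groups):
    """Return the ratio MST / MSE for a list of samples (one list per group)."""
    all_values = [x for values in groups for x in values]
    n, k = len(all_values), len(groups)
    general_mean = mean(all_values)

    sst = sum(len(v) * (mean(v) - general_mean) ** 2 for v in groups)
    sse = sum((x - mean(v)) ** 2 for v in groups for x in v)

    mst = sst / (k - 1)   # treatment mean square
    mse = sse / (n - k)   # error mean square
    return mst / mse


# Hypothetical data: three samples of three measurements each.
samples = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_value = practical_value(samples)
print(f"F = {f_value}")
```

With these samples MST = 12 and MSE = 1, so the practical value is 12. The function divides by MSE without a guard, so it fails if every group has zero internal variability.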

## Step 8 : Critical value

The practical value follows a Fisher (F) distribution with dofSST and dofSSE degrees of freedom. Because only large values of the ratio indicate a difference, the test on this distribution is one-sided, right-tailed. Choose the desired α value, usually 5%, then determine the critical value either from F tables or in Excel with the inverse function F.INV(1 – α; dofSST; dofSSE).

## Step 9 : Calculate the P-Value

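To assess the significance of the test, we compute the p-value, which is the right-tail probability of the F distribution beyond the practical value. In Excel: F.DIST.RT(Practical value; dofSST; dofSSE), named FDIST in older versions.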
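Outside Excel, the critical value of Step 8 and the p-value of Step 9 can both be obtained from the right tail of the F distribution. The sketch below uses only the Python standard library: it evaluates the regularized incomplete beta function with a continued fraction, then finds the critical value by bisection. All function names are hypothetical, the inputs (F = 12, dof 2 and 6) are illustrative, and a statistics library would normally replace this code in practice.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f_value, dof_sst, dof_sse):
    """p-value: probability that an F(dof_sst, dof_sse) variable exceeds f_value."""
    if f_value <= 0.0:
        return 1.0
    x = dof_sse / (dof_sse + dof_sst * f_value)
    return reg_inc_beta(dof_sse / 2.0, dof_sst / 2.0, x)


def f_critical(alpha, dof_sst, dof_sse):
    """Critical value c such that the right-tail probability beyond c is alpha."""
    low, high = 0.0, 1.0
    while f_right_tail(high, dof_sst, dof_sse) > alpha:
        high *= 2.0   # widen the bracket until the tail is below alpha
    for _ in range(200):
        mid = (low + high) / 2.0
        if f_right_tail(mid, dof_sst, dof_sse) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical inputs: practical value 12 with dofSST = 2 and dofSSE = 6.
p_value = f_right_tail(12.0, 2, 6)
critical = f_critical(0.05, 2, 6)
print(f"p-value = {p_value:.4f}, critical value = {critical:.3f}")
```

For these inputs the p-value is 0.008 and the critical value is about 5.14, which matches the usual F table for α = 5% with 2 and 6 degrees of freedom.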

## Step 10: Interpretation

| Test direction | Result | Statistical conclusion | Practical conclusion |
|---|---|---|---|
| Two-sided | Practical value ≥ Critical value | We reject H0 | At least two samples have means that differ at the given level of risk α. |
| | Practical value < Critical value | We retain H0 | The samples have means that do not differ significantly at the given level of risk α. |

| Result | Statistical conclusion | Practical conclusion |
|---|---|---|
| p-value > α | We retain H0 | The data series show no statistically significant difference at the level of risk α. |
| p-value < α | We reject H0 | The data series are statistically different: a difference at least this large would occur with a probability equal to the p-value if H0 were true. |
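The two decision rules can be written as a small function. This is a sketch with a hypothetical name, `conclude`, and hypothetical inputs. The case where the p-value is exactly equal to α is not covered by the rule above, so treating it as "retain H0" is an assumption of this example.

```python
def conclude(practical_value, critical_value, p_value, alpha=0.05):
    """Return the statistical conclusion given by each of the two rules."""
    by_critical_value = (
        "reject H0" if practical_value >= critical_value else "retain H0"
    )
    # Assumption: a p-value exactly equal to alpha is treated as "retain H0".
    by_p_value = "reject H0" if p_value < alpha else "retain H0"
    return by_critical_value, by_p_value


# Hypothetical results: practical value 12, critical value 5.143, p-value 0.008.
decision = conclude(12.0, 5.143, 0.008)
print(decision)
```

Both rules describe the same test, so they lead to the same conclusion except possibly at the exact boundary.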

## Step 11: Significance of the difference

A last indicator, ω² (omega squared), measures the size of the difference between the samples. It is calculated in the following way:

$$\omega^2 = \frac{SST - (K - 1)\,MSE}{TSS + MSE}$$

According to the grid of Keppel (1991):

• 0.01 < ω² < 0.06 : the difference is low
• 0.06 < ω² < 0.15 : the difference is moderate
• ω² > 0.15 : the difference is high
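The indicator and the grid can be sketched in Python, assuming the usual one-way definition ω² = (SST – (K – 1)·MSE) / (TSS + MSE). The function names and input values are hypothetical. The grid does not say how to label values at or below 0.01 or values exactly on a threshold, so the choices made in `keppel_label` are assumptions.

```python
def omega_squared(sst, sse, k, n):
    """Omega squared for k samples and n individuals (assumed one-way definition)."""
    mse = sse / (n - k)
    tss = sst + sse
    return (sst - (k - 1) * mse) / (tss + mse)


def keppel_label(omega2):
    """Classify omega squared with the thresholds 0.01, 0.06 and 0.15."""
    if omega2 > 0.15:
        return "high"
    if omega2 > 0.06:
        return "moderate"
    if omega2 > 0.01:
        return "low"
    return "below the grid"   # assumption: the grid gives no label here


# Hypothetical values: SST = 24, SSE = 6, K = 3 samples, n = 9 individuals.
w2 = omega_squared(24.0, 6.0, k=3, n=9)
label = keppel_label(w2)
print(f"omega squared = {w2:.3f} ({label})")
```

With these values ω² = 22 / 31, about 0.71, which the grid classes as a high difference.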
