Introduction
Independence of the data is a prerequisite for a regression study. Independence means that a variable is not autocorrelated: it does not evolve as a function of time, but varies randomly, or at least depends on something other than time. In mathematical terms, one must check that the residuals behave randomly.
In addition to visually inspecting a residual plot (the differences between the regression model and the actual values over time), one performs a hypothesis test called the Durbin-Watson test.
Note that for this test to be valid, the prediction model must include a constant term (an intercept).
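To make the notion of residuals concrete, here is a minimal sketch that fits a straight line with a constant term by ordinary least squares and returns the differences between the actual values and the model. The function name `fit_line_residuals` and the sample data are assumptions made for this illustration only.

```python
def fit_line_residuals(x, y):
    """Fit y = a0 + a1 * x by least squares and return (a0, a1, residuals)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx              # slope
    a0 = mean_y - a1 * mean_x   # constant term (intercept)
    # Residual = actual value minus model value. The opposite sign
    # convention would work just as well for a test based on squares.
    residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    return a0, a1, residuals


# Hypothetical data: a time index and an observed series.
time_index = [1, 2, 3, 4, 5, 6]
observed = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
a0, a1, residuals = fit_line_residuals(time_index, observed)
```

Because the model contains a constant term, the residuals sum to zero up to rounding error, which is a quick sanity check on the fit.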
The principle
This method is due to J. Durbin and G. S. Watson (1950). It is based on a study of the autocorrelation of the residuals: the sum of the squared differences between each residual and the previous one (at time t-1) is divided by the sum of the squared residuals (the residuals being the discrepancies between the prediction model and the actual values).
Step 1: Hypotheses
Let ρ be the first-order autocorrelation of the residuals. The following hypotheses are posed:
H0: ρ = 0
H1: ρ ≠ 0
Step 2: Practical value
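The practical value d (the test statistic) measures the autocorrelation of the residuals. It is calculated with the following formula:
d = \frac{\sum_{t=2}^{T} (e_{t} - e_{t-1})^{2}}{\sum_{t=1}^{T} e_{t}^{2}}
where e_{t} is the error, also called the residual, i.e. the difference between the prediction model and the actual value at time t, and T is the number of observations.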
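As a sketch of this calculation, assuming the usual definition of the statistic (the sum of squared differences between successive residuals, divided by the sum of squared residuals), d can be computed as follows. The function name `durbin_watson` and the two residual series are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def durbin_watson(residuals):
    """Return d = sum((e_t - e_{t-1})^2) / sum(e_t^2) for time-ordered residuals."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero; d is undefined")
    # Pair each residual with its predecessor: (e_1, e_2), (e_2, e_3), ...
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    return numerator / denominator


# Hypothetical residuals, ordered in time.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips at every step
persistent = [1.0, 1.0, -1.0, -1.0]   # neighbours mostly share a sign

d_alternating = durbin_watson(alternating)  # 12 / 4 = 3.0
d_persistent = durbin_watson(persistent)    # 4 / 4 = 1.0
```

Residuals that alternate in sign push d above 2, while residuals that stay on the same side for several steps pull it below 2.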
Step 3: Critical value
The creators of this test tabulated critical values that depend on the number of observations, the number of explanatory variables and the chosen level of risk. Reading this table gives the two critical values dl and du, which lie between 0 and 2 and delimit the regions in which the practical value can fall.
Step 4: Interpretation
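The interpretation rule is unusual. The test is always two-sided. By construction, the practical value d lies between 0 and 4; since d is approximately 2(1 - r), where r is the estimated first-order autocorrelation of the residuals, a value close to 2 indicates no autocorrelation. The rules of interpretation are as follows: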
Result | Statistical Conclusion | Practical Conclusion
du < d < 4 - du (ρ = 0) | We retain H0 | There is no autocorrelation; the data are independent.
d < dl (ρ > 0) or d > 4 - dl (ρ < 0) | We reject H0 | There is autocorrelation; the data are not independent.
dl < d < du or 4 - du < d < 4 - dl | Uncertainty | The test is inconclusive; adjust the level of risk, or treat the presence of autocorrelation as undetermined.
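The decision rules above can be sketched as a small function. The name `interpret_durbin_watson` and the bounds used in the example (dl = 1.1, du = 1.4) are placeholders, not values taken from the Durbin-Watson table; in practice dl and du must be read from the table for the data at hand.

```python
def interpret_durbin_watson(d, dl, du):
    """Map a practical value d and critical values dl, du to a conclusion."""
    if not (0 <= dl <= du <= 2):
        raise ValueError("expected 0 <= dl <= du <= 2")
    if not (0 <= d <= 4):
        raise ValueError("d must lie between 0 and 4")
    if d < dl:
        return "reject H0: positive autocorrelation"
    if d > 4 - dl:
        return "reject H0: negative autocorrelation"
    if du < d < 4 - du:
        return "retain H0: no autocorrelation"
    # Everything else falls in one of the two uncertainty zones.
    return "inconclusive"


# Placeholder critical values, for illustration only.
DL, DU = 1.1, 1.4
for d in (0.5, 1.2, 2.0, 2.8, 3.5):
    print(d, "->", interpret_durbin_watson(d, DL, DU))
```

Values that fall exactly on a boundary are treated as inconclusive here, which is a design choice of this sketch rather than a rule stated in the text.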