There are many kinds of data to study, and they are not all studied with the same tools. Understanding the different types of data makes it possible to:
- Identify the type of statistical test suited to the analysis.
- Identify the level of complexity and performance of the mathematical tools to be applied.
Qualitative data (attribute)
Qualitative data (commonly called attribute data) contain values that express a quality or a state, for which an average or a limit cannot be calculated. They answer not the question “how much?” but “what kind?”. Their values are defined beforehand and can be:
- A brand: Ford, Peugeot…
- A color: blue, black…
- A judgement: good/not good, small/large…
The arithmetic operations that can be performed on this type of variable are few: they are limited to counting the observations in each modality (absolute frequencies) and calculating the relative frequencies.
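As a minimal sketch of these two operations, the Python snippet below counts a qualitative variable by modality and derives the relative frequencies. The sample of brands is made up for illustration, and the variable names are arbitrary.

```python
from collections import Counter

# Hypothetical sample of a qualitative variable (car brand).
brands = ["Ford", "Peugeot", "Ford", "Ford", "Peugeot"]

# Absolute frequencies: number of observations per modality.
absolute = Counter(brands)

# Relative frequencies: share of each modality in the sample.
relative = {brand: count / len(brands) for brand, count in absolute.items()}

# The most frequent modality (the statistical mode).
most_frequent = absolute.most_common(1)[0][0]

print(absolute)
print(relative)
print(most_frequent)
```

Note that nothing here adds or averages the values themselves: only the counts are numeric.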
What to do with qualitative data?
Statistically speaking, qualitative data are more complex to deal with than others.
The most effective approach, where possible, is to transform the qualitative variable into a quantitative variable:
- If the variable we want to study is a color, we can transform the qualitative aspect “blue, red, green” into a quantitative one via the wavelength of the color.
- If we are dealing with the occurrence of a defect, we can transform “good/not good” into the dimension of the defect.
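The color case can be sketched as a simple lookup. The wavelengths below are assumed representative values, picked inside the commonly quoted bands for each color; they are illustrative and not taken from the source, and the function name is hypothetical.

```python
# Approximate representative wavelengths in nanometres (assumed values,
# chosen inside the commonly quoted band for each color).
WAVELENGTH_NM = {"blue": 470, "green": 530, "red": 680}

def to_wavelength(color: str) -> int:
    """Map a color name to its assumed representative wavelength in nm."""
    return WAVELENGTH_NM[color]

# Hypothetical observations of the qualitative variable "color".
observed = ["blue", "red", "green", "blue"]
wavelengths = [to_wavelength(c) for c in observed]

# Once numeric, the values can be ordered and averaged.
mean_nm = sum(wavelengths) / len(wavelengths)
print(sorted(wavelengths), mean_nm)
```

In practice the wavelength would be measured rather than looked up; the table only stands in for that measurement.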
Qualitative data and variables are of two types: nominal and ordinal.
A variable is called nominal qualitative when its values are elements of a non-hierarchical category. In other words, its elements cannot be arranged in a logical gradation, according to a natural hierarchy. Nominal qualitative data can therefore only be described through modalities between which there is no order relationship.
Color, for example, is a variable of nominal qualitative type: its values are names (green, yellow, black, red, …) and no hierarchy applies between the modalities listed (under no circumstances can we write yellow > red or green = black).
An ordinal qualitative variable has all the properties of the nominal qualitative variable, plus the ability to rank individuals relative to one another according to the value of their character. In other words, the individuals of the population studied can be arranged in a logical gradation, according to a natural hierarchy, for the character retained. The operations allowed on the ordinal qualitative scale are the counts by modality (absolute frequencies, relative frequencies and mode) and, in addition, the median.
Example: The Comfort
The variable “comfort level of a house” is of the ordinal qualitative type: its values are names (mediocre, medium, good, very good), and a hierarchy exists between these modalities, yet the level of comfort cannot be measured reliably. There is no “comfort meter” or unit of measurement for the very subjective parameter “comfort”. The ordinal character of the variable nevertheless allows us to write good > mediocre or medium < very good.
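The comparisons and the median allowed on an ordinal scale can be sketched as follows. The survey of five houses is hypothetical, and encoding the modalities by their rank is one possible choice among others.

```python
from statistics import median_low

# Ordered modalities of the "comfort level" variable, lowest to highest.
LEVELS = ["mediocre", "medium", "good", "very good"]
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical survey of five houses.
observations = ["good", "mediocre", "very good", "medium", "good"]

# Order comparisons are meaningful on an ordinal scale.
good_beats_mediocre = RANK["good"] > RANK["mediocre"]
medium_below_very_good = RANK["medium"] < RANK["very good"]

# The median is the middle observation once sorted by rank.
# median_low always returns an actual data point, never an average.
ranks = [RANK[o] for o in observations]
median_level = LEVELS[median_low(ranks)]
print(good_beats_mediocre, medium_below_very_good, median_level)
```

The ranks serve only for ordering: the distance between “medium” and “good” is not defined, so averaging the ranks would not be meaningful.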
Quantitative data
Quantitative data or variables contain numerical values that refer to a recognized unit of measure. For this reason, they are sometimes referred to as metric variables. The size, the weight, the surface, the distance, the income, the age, the turnover or even the population (in the sense of the number of inhabitants) are quantitative variables.
All arithmetic operations, simple and complex, are applicable to quantitative variables: from counts (absolute frequencies) and percentage calculations (relative frequencies), through the mean, median and standard deviation, up to numerical modeling.
Example: Rent of a house
Beyond the qualification of a rent (cheap, correct, expensive or very expensive), which makes it an ordinal qualitative variable, rent remains a variable that can be measured objectively in a recognized unit of measure: the price expressed in euros per month, or in euros per month per m². It can be summed, its mean and standard deviation can be calculated, its values can be grouped into classes, and it can even be modeled.
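A short sketch of these operations on rents: the mean, the standard deviation and a grouping into classes. The rent values and the class thresholds are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical monthly rents in euros.
rents = [225.0, 337.5, 410.0, 520.0, 780.0]

average = mean(rents)
spread = stdev(rents)  # sample standard deviation

def rent_class(rent: float) -> str:
    """Group a rent into an ordinal class; thresholds are illustrative."""
    if rent < 300:
        return "cheap"
    if rent < 500:
        return "correct"
    if rent < 700:
        return "expensive"
    return "very expensive"

classes = [rent_class(r) for r in rents]
print(average, spread, classes)
```

Grouping into classes turns the quantitative variable back into an ordinal qualitative one, which loses information: the reverse transformation is not possible from the classes alone.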
Richer and open to a much larger number of mathematical tools, quantitative data can be classified into two sub-groups according to their scale of measurement: interval and ratio.
The interval scale relates to data that refer to constant units of measurement but whose zero point is fixed arbitrarily and in no way corresponds to the absence of the phenomenon.
Example: The temperature
The unit of measurement of temperature is constant once the reference system is defined (Celsius or Fahrenheit), and the zero is entirely arbitrary. In the Celsius (°C) system, zero corresponds to the freezing temperature of water, whereas in the Fahrenheit (°F) system, zero corresponds to the solidification temperature of a mixture of ice, water and ammonium chloride (Fahrenheit, 1724). Because temperature is quantitative, a relationship can nevertheless be established between the two systems: °F = 1.8 × °C + 32 and, conversely, °C = (°F − 32) / 1.8. Neither 0 °C nor 0 °F corresponds to an absence of temperature. Even though an absolute zero exists (0 K = −273.15 °C), the lowest temperature theoretically possible, at which molecular and atomic motion is reduced to a minimal energy state, temperature expressed in °C or °F remains a variable of the interval scale.
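The two conversion formulas can be written directly in code. The sketch below also checks what an arbitrary zero implies: differences between temperatures convert by a constant factor, but ratios do not survive the change of system. The function names are arbitrary.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """°F = 1.8 × °C + 32"""
    return 1.8 * c + 32

def fahrenheit_to_celsius(f: float) -> float:
    """°C = (°F − 32) / 1.8"""
    return (f - 32) / 1.8

# Differences are preserved up to the constant factor 1.8.
diff_c = 20 - 10
diff_f = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)

# Ratios are not preserved: 20 °C is not "twice as hot" as 10 °C.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(diff_c, diff_f, ratio_c, ratio_f)
```

The ratio is 2 in Celsius but about 1.36 in Fahrenheit for the same two temperatures, which is why statements of proportion are not meaningful on an interval scale.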
The interval scale, in addition to classical arithmetic operations, allows for most statistical calculations: arithmetic mean, standard deviation, correlation coefficient, variance, covariance, etc. However, it does not allow the calculation of the geometric mean or the coefficient of variation.
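One way to see why the coefficient of variation is excluded is to compute it for the same readings in two systems. The sketch below uses hypothetical readings and assumes the usual definition (sample standard deviation divided by the mean).

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical temperature readings in °C, then the same readings in °F.
celsius = [10.0, 15.0, 20.0]
fahrenheit = [1.8 * c + 32 for c in celsius]

cv_c = coefficient_of_variation(celsius)
cv_f = coefficient_of_variation(fahrenheit)
print(cv_c, cv_f)
```

The two results differ although the physical readings are identical: the statistic depends on where the arbitrary zero sits, so it carries no meaning on an interval scale.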
Apart from temperature, many other variables belong to the interval scale. Among these are the Richter scale for measuring the magnitude of earthquakes and the measurement of time by our Gregorian calendar.
Unlike the interval scale, the ratio scale is characterized by proportionality between the measured values: there is a direct and constant mathematical relationship between them. The ratio scale also has a unique and universal zero. All variables expressed in the International System of Units (SI, ISO 1000) belong to the ratio scale of measurement: this is the case of lengths, surfaces, weights and head counts, as well as the measurement of time in SI units, and of all variables resulting from the combination of at least two SI units, such as speed or population density. Zero is universal and means the absence of the quantity measured, and each non-zero measured value can be expressed as a multiple of any other non-zero measured value.
Example: The weight
It can be said that a person weighing 90 kg is twice as heavy as a person weighing 45 kg, or that a rent of €337.50/month is 1.5 times a rent of €225/month (that is, 50% higher).
The ratio scale has all the properties and levels of information of other scales plus the immense advantage of lending itself to absolutely any arithmetic and statistical operations that may exist.
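As a small check of what a universal zero allows, the sketch below computes the ratios from the weight and rent examples and verifies that the weight ratio is unchanged by a change of unit. The kilogram-to-pound factor is an approximate value introduced here for illustration.

```python
KG_TO_LB = 2.20462  # approximate conversion factor, pounds per kilogram

weights_kg = (90.0, 45.0)
weights_lb = tuple(w * KG_TO_LB for w in weights_kg)

# The ratio between two weights does not depend on the unit used.
ratio_kg = weights_kg[0] / weights_kg[1]
ratio_lb = weights_lb[0] / weights_lb[1]

# Ratio between the two rents, in euros per month.
rent_ratio = 337.50 / 225.00
print(ratio_kg, ratio_lb, rent_ratio)
```

Changing the unit multiplies every value by the same factor and leaves the zero in place, so the proportions survive, unlike on the interval scale.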
Quantitative variables are also distinguished as discrete or continuous. A variable is said to be discrete when it takes a finite or countable number of values. In other words, the transition from one modality to the next is abrupt, without continuity or gradual progression.
Example: The number of inhabitants
The number of inhabitants of a country or a city is a discrete quantitative variable on the ratio scale. The discrete nature of the variable follows from the indivisible character of the basic element, namely the inhabitant: the set of values that the variable “number of inhabitants” can take is the set of natural numbers ℕ. It is therefore not possible to write that a city has 12283.18 inhabitants.
A continuous variable, unlike a discrete variable, can take an uncountably infinite number of values. There are thus no distinct modalities, or rather there is an infinity of them, because between any two given values every intermediate value is possible.
Example: The temperature
The variable “temperature” is a continuous quantitative variable on the interval scale. It can take an infinity of values whatever the bounds. For example, between 10 and 12 °C, the variable can take any of the countless measurable values: 10.007 °C, 11.11 °C or even 11.9999 °C, if the measurement can achieve that precision.
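The contrast between the two examples can be sketched with two simple checks, one for a discrete head count and one for a continuous reading inside an interval. Both function names are hypothetical helpers written for this illustration.

```python
def is_valid_inhabitant_count(value: float) -> bool:
    """A head count is discrete: it must be a non-negative whole number."""
    return value >= 0 and float(value).is_integer()

def is_between(value: float, low: float, high: float) -> bool:
    """A continuous variable may take any real value within an interval."""
    return low <= value <= high

count_ok = is_valid_inhabitant_count(12283)
count_bad = is_valid_inhabitant_count(12283.18)
readings_ok = [is_between(t, 10, 12) for t in (10.007, 11.11, 11.9999)]
print(count_ok, count_bad, readings_ok)
```

A floating-point number has finite precision, so code can only approximate a continuous variable; the limit is the precision of the measurement, as the text notes.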