An important purpose of statistics is to study how objects vary. Variables therefore play a central role in statistics, and the characteristics of a variable can considerably affect how the data are processed statistically.
Overview of data and variables
In statistics, data is a collection of raw information about the elements of one or more objects. In general, this collection is arranged and/or organized according to certain rules or methods.
The two main components of data are the elements and their characteristics. For a given element, each of its characteristics has its own value.
For example, a meat product taken from the production line at 9:30 AM has a red color and a soft texture, contains 31% protein, has a pH of 6.2 and a stiffness of 1.45 kG/m², is evaluated as safe, weighs 153 g, ...
The values of a given characteristic generally change from element to element. The term “variable” is therefore used in statistics to denote a characteristic of an element. The values of all the variables form the core of the data.
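As an illustration only, the meat product example can be written as a small record in Python: one element, with each characteristic stored as a variable name and its value. The field names are hypothetical and chosen just for this sketch.

```python
# One element (a meat product) described by its characteristics.
# Field names are illustrative; the values come from the example above.
sample = {
    "sampling_time": "9:30 AM",
    "color": "red",
    "texture": "soft",
    "protein_percent": 31,
    "pH": 6.2,
    "stiffness_kG_per_m2": 1.45,
    "safety": "safe",
    "weight_g": 153,
}

# Each key is a variable; each value is that variable's value for this element.
variables = list(sample.keys())

# Variables whose values are stored as numbers in this record.
numeric_variables = [
    name for name, value in sample.items()
    if isinstance(value, (int, float))
]
```

A data set would then be a collection of such records, one per element, all sharing the same variables.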
Quantitative and qualitative variables
- A variable is quantitative when its values are numerical and can be ordered, ranked, counted, measured, or calculated. Mass, length, water content, and preservation time are examples of quantitative variables.
- A variable is qualitative when its values can be placed into distinct categories. Examples are the color of an object, the gender of a person, and the safety of a product.
In some cases this classification is only relative, and a variable may change from one type to the other. Consider the following examples.
- Using a scale to evaluate a sensory attribute of a food product: from qualitative to quantitative.
- If the E. coli content of a food product is below 5 mg⁻¹, its safety level is classified as good: from quantitative to qualitative.
Sometimes, to facilitate data analysis, the values of a qualitative variable are coded as numbers (for example, male is coded as “0” and female as “1”). Such a variable (gender, for example) still cannot be classified as quantitative, because the codes are only labels and arithmetic on them has no meaning in itself.
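The two conversions described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `safety_level`, the label "not good" for products at or above the threshold, and the sample values are all assumptions made for the example.

```python
# Quantitative -> qualitative: classify safety from a measured E. coli content.
# The threshold of 5 follows the example in the text; "not good" is an assumed label.
def safety_level(e_coli_content, threshold=5.0):
    return "good" if e_coli_content < threshold else "not good"

contents = [1.2, 4.9, 5.0, 7.3]  # hypothetical measurements
levels = [safety_level(c) for c in contents]
# levels -> ['good', 'good', 'not good', 'not good']

# Qualitative -> numeric codes: the codes are labels, not quantities.
GENDER_CODES = {"male": 0, "female": 1}

genders = ["female", "male", "female"]  # hypothetical observations
coded = [GENDER_CODES[g] for g in genders]
# coded -> [1, 0, 1]

# The mean of the codes is only the share of "female" entries;
# it is not a "mean gender", so the variable remains qualitative.
share_female = sum(coded) / len(coded)
```

Note that the thresholding step discards information: once the content is recorded only as "good" or "not good", the original measurement cannot be recovered.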
Levels of measurement
According to Stanley Smith Stevens (1946 and 1951), all measurements can be classified into four levels of measurement, or measurement scales, named “nominal”, “ordinal”, “interval”, and “ratio”.
- At the nominal level, all data are classified into mutually exclusive (non-overlapping) categories. Each category is represented by a name or a label. Some examples of this type of data are color (its values: red, green, blue, ...), gender (male, female), and material (steel, wood, plastic, paper, ...). Data of this type cannot be ranked or put in order. Among nominal data there is a special type, binary data, which has only two values: yes-no, right-wrong, success-failure.
- Values at the ordinal level can be ranked or put in a certain order. The strength of a taste in a sensory evaluation is an example of ordinal data: none, slight, moderate, strong, extreme. Another example is the ability of a worker: superior, good, average, poor. However, precise differences between the values do not exist or have no meaning.
- Interval data are quantitative, so their values are expressed as numbers and precise differences between values exist. However, the ratio between two values has no meaning. Another property of interval data is that there is no “true zero”. Temperature on the Celsius scale is an example of this type of data: differences between temperatures exist and are meaningful, but the ratio between 100°C and 50°C is meaningless; furthermore, 0°C is only a reference point and does not mean that there is no heat at 0°C. Another example of interval data is the calendar date.
- Most numerical parameters in science and technology are on the ratio scale. The ratio between two values of such data is meaningful: 6 m is twice as long as 3 m. In general, there is a true zero at this level: in the classical picture, a particle at 0 K (kelvin) has zero kinetic energy.
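The difference between the interval and ratio levels can be checked numerically with the temperature example. The sketch below is illustrative only; the helper `celsius_to_kelvin` is introduced here and uses the standard offset of 273.15.

```python
# Interval vs ratio scale, illustrated with temperature.
def celsius_to_kelvin(t_c):
    return t_c + 273.15

t1_c, t2_c = 50.0, 100.0

# Differences are meaningful on both scales and have the same size (50 degrees).
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# The Celsius ratio is 2.0, but it depends on the arbitrary zero point,
# so it does not mean "twice as hot".
ratio_c = t2_c / t1_c

# On the Kelvin scale, which has a true zero, the ratio is about 1.15.
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)
```

Shifting the zero point leaves differences unchanged but changes ratios, which is why ratios are only interpretable on a scale with a true zero.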
Discrete variable - Continuous variable
When the values of a variable can be counted, the variable is discrete. The number of defects in a coffee sample is a discrete variable. In contrast, a continuous variable can take any value within a given interval, so the number of its possible values is infinite; the pH of fruit juice is an example.
In practice, a discrete variable can sometimes be treated as continuous to facilitate data analysis. Conversely, because of the limits of measurement devices and methods, the number of values of a continuous variable that can actually be recorded is countable. In this case, the continuous variable is “discretized”.
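Discretization by a measuring instrument can be illustrated as follows. The sketch assumes a hypothetical pH meter that displays two decimal places; the "true" pH values are invented for the example.

```python
# A hypothetical pH meter that displays two decimal places can only
# report values on a grid of step 0.01, so the continuous pH is
# recorded as a discrete reading.
DECIMALS = 2

def record(true_value, decimals=DECIMALS):
    # Round the underlying value to the resolution of the instrument.
    return round(true_value, decimals)

true_values = [3.4512, 3.4549, 3.4581]  # invented "true" pH values
readings = [record(v) for v in true_values]
# readings -> [3.45, 3.45, 3.46]

# Between pH 3.00 and pH 4.00 the instrument can show only a finite,
# countable set of values: 3.00, 3.01, ..., 4.00.
possible_readings = 10 ** DECIMALS + 1
```

Three different underlying values collapse into two distinct readings here, which shows how the recorded values of a continuous variable become countable in practice.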