The values of a variable are often repeated. In many cases we are interested not only in the values themselves but also in how many elements take each value. To study this, we examine the frequency distribution of the data.
Frequency & Table of frequency
Let's examine the pH values in Example 1, shown in Table 1.
| Sample | pH |
|---|---|
| 0814091 | 6,0 |
| 0814092 | 6,0 |
| 0814093 | 6,1 |
| 0814101 | 6,0 |
| 0814102 | 5,9 |
| 0814103 | 6,0 |
| 0814104 | 6,1 |
| 0814111 | 6,1 |
| 0814112 | 6,0 |
| 0814113 | 5,9 |
We see that the pH value "6,1" appears 3 times. We say that 3 is the frequency of "6,1". Hence:
"The frequency of a value is the number of times this value appears."
The frequency of value `i` (sometimes called the absolute frequency) is denoted `f_i`.
If we tabulate the frequency of each pH value in Table 1, we obtain Table 2, called a table of frequency.
| pH | Frequency |
|---|---|
| 5,9 | 2 |
| 6,0 | 5 |
| 6,1 | 3 |
Note that each variable has its own table of frequency.
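As an illustration, the counting that leads from Table 1 to Table 2 can be sketched in Python. This is only a minimal sketch: the name `ph_values` is chosen here for the example, the readings are stored as text so that equal values compare exactly, and decimal points replace the decimal commas used in the tables.

```python
from collections import Counter

# pH readings from Table 1, in sample order (decimal points, not commas).
ph_values = ["6.0", "6.0", "6.1", "6.0", "5.9",
             "6.0", "6.1", "6.1", "6.0", "5.9"]

# Counter maps each distinct value to the number of times it appears (f_i).
frequency = Counter(ph_values)

# Print the table of frequency in increasing order of pH.
for value in sorted(frequency):
    print(value, frequency[value])
```

The printed rows should match Table 2.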
Relative frequency
Another important form of frequency is relative frequency. It is defined as the ratio of the frequency to the number of elements in the investigated set.
| `"Relative frequency"="Frequency"/"Number of elements"` | (1) |
or:
| `p_i=f_i/n` | (2) |
For example, for value 6,0 of pH, we have:
`p_(6,0)=f_(6,0)/n=5/10=0,5`
So we can extend the table of frequency by adding a column for relative frequency (Table 3).
| pH | Frequency | Relative frequency |
|---|---|---|
| 5,9 | 2 | 0,2 |
| 6,0 | 5 | 0,5 |
| 6,1 | 3 | 0,3 |
Note that the relative frequencies sum to 1:
| `sum_i p_i = 1` | (3) |
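Formulas (2) and (3) can be checked with a short Python sketch. The dictionary `frequency` below simply restates Table 2, and exact fractions are used (an implementation choice made here, not something the text prescribes) so that the sum in (3) is tested without rounding error.

```python
from fractions import Fraction

# Absolute frequencies f_i from Table 2 (keys use decimal points).
frequency = {"5.9": 2, "6.0": 5, "6.1": 3}

# n is the number of elements of the investigated set, here 10.
n = sum(frequency.values())

# Formula (2): p_i = f_i / n, kept as exact fractions.
relative = {value: Fraction(f, n) for value, f in frequency.items()}

for value, p in relative.items():
    print(value, frequency[value], float(p))

# Formula (3): the relative frequencies add up to 1.
total = sum(relative.values())
print("sum of p_i:", total)
```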
Partitioning
Let's examine the value of weight in Example 1. We obtain Table 4.
| Sample | Weight (g) |
|---|---|
| 0814091 | 252,3 |
| 0814092 | 251,4 |
| 0814093 | 251,8 |
| 0814101 | 252,1 |
| 0814102 | 253,0 |
| 0814103 | 252,0 |
| 0814104 | 251,6 |
| 0814111 | 251,1 |
| 0814112 | 252,8 |
| 0814113 | 251,5 |
We see that all the weight values are different. If we built the table of frequency as in the previous case, every value would have frequency 1, which is not informative.
Instead, we separate the weight values into 4 groups:
[251,0 - 251,5], [251,5 - 252,0], [252,0 - 252,5] and [252,5 - 253,0].
We then extend the definition of frequency as follows:
"The frequency of a group is the number of elements belonging to this group."
Relative frequency is extended in the same way.
With this method, we obtain the table of frequency for weight in Table 5.
| Weight (g) | Frequency | Relative frequency |
|---|---|---|
| 251,0 - 251,5 | 2 | 0,2 |
| 251,5 - 252,0 | 3 | 0,3 |
| 252,0 - 252,5 | 3 | 0,3 |
| 252,5 - 253,0 | 2 | 0,2 |
Left-side rule
When we partition a numerical variable, we apply the left-side rule: the interval `a - b` consists of the values `x` such that `a<= x< b`. For example, the value 252,0 belongs to group [252,0 - 252,5] and not to group [251,5 - 252,0]. The only exception is the last group, which also includes its upper bound: 253,0 belongs to group [252,5 - 253,0].
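The partitioning of the weights, including the left-side rule and its exception for the last group, can be sketched in Python as follows. The helper `group_index` and the variable names are hypothetical, introduced only for this example; `Decimal` is used so that values lying exactly on a boundary, such as 252,0, are compared without floating-point rounding.

```python
from decimal import Decimal

# Weights from Table 4 in grams (decimal points instead of decimal commas).
weights = [Decimal(w) for w in (
    "252.3", "251.4", "251.8", "252.1", "253.0",
    "252.0", "251.6", "251.1", "252.8", "251.5",
)]

# Boundaries of the 4 groups used in Table 5.
edges = [Decimal(e) for e in ("251.0", "251.5", "252.0", "252.5", "253.0")]


def group_index(x, edges):
    """Return the index of the group containing x under the left-side rule."""
    last = len(edges) - 2  # index of the last group
    for i in range(last + 1):
        lower, upper = edges[i], edges[i + 1]
        # a <= x < b, except that the last group also keeps its upper bound.
        if lower <= x < upper or (i == last and x == upper):
            return i
    raise ValueError(f"{x} lies outside the partition")


# Frequency of a group = number of elements belonging to it.
counts = [0] * (len(edges) - 1)
for w in weights:
    counts[group_index(w, edges)] += 1

for i, f in enumerate(counts):
    print(f"{edges[i]} - {edges[i + 1]}", f, f / len(weights))
```

With the data of Table 4, the counts should reproduce the frequency column of Table 5.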
Cumulative frequency
The cumulative frequency `ff_i` of a value `x_i` is the number of elements whose values are less than or equal to `x_i`. The relative cumulative frequency of this value is the ratio of `ff_i` to the number of elements `n`.
Example 2: We separate 50 candies into groups based on their weights. The result is shown in Table 6.
| Weight (g) | Candies |
|---|---|
| 4,3 | 9 |
| 4,4 | 22 |
| 4,5 | 14 |
| 4,6 | 5 |
| Total | 50 |
From Table 6 we can build the table of cumulative frequency using the definition above, first as running totals of candies and then in the usual form (Table 8).
| Weight (g) | Cumulative candies |
|---|---|
| ≤ 4,3 | 9 |
| ≤ 4,4 | 31 |
| ≤ 4,5 | 45 |
| ≤ 4,6 | 50 |
| Weight (g) | Cumulative frequency |
|---|---|
| 4,3 | 9 |
| 4,4 | 31 |
| 4,5 | 45 |
| 4,6 | 50 |
We can extend the concept of "cumulative" to relative frequency. For the candies, this gives Table 9.
| Weight (g) | Cumulative frequency | Relative cumulative frequency |
|---|---|---|
| 4,3 | 9 | 0,18 |
| 4,4 | 31 | 0,62 |
| 4,5 | 45 | 0,90 |
| 4,6 | 50 | 1,00 |
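The cumulative columns for the candies can be reproduced with a running sum. The sketch below assumes the counts of Table 6 are listed in increasing order of weight; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import accumulate

# Table 6: candy weights in grams (decimal points) and the number of candies.
weights = ["4.3", "4.4", "4.5", "4.6"]
candies = [9, 22, 14, 5]

n = sum(candies)  # total number of candies, here 50

# Cumulative frequency ff_i: running total of the frequencies up to x_i.
cumulative = list(accumulate(candies))

# Relative cumulative frequency: ff_i / n, kept as exact fractions.
relative_cumulative = [Fraction(ff, n) for ff in cumulative]

for w, ff, r in zip(weights, cumulative, relative_cumulative):
    print(w, ff, f"{float(r):.2f}")
```

The last cumulative frequency always equals `n`, so the last relative cumulative frequency is 1.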