Introduction to single factor experiments
Consider the general case of a single factor experiment, carried out to study the effect of a factor A with `a` levels. To keep the design simple, we perform the same number of runs, `n`, at each level, so the experiment consists of `an` runs. In a Completely Randomized Design, these `an` runs are performed in random order.
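The randomization step can be sketched in a few lines of Python. This is only an illustration of one way to draw a random run order: the function name `random_run_order` and the fixed seed are assumptions introduced here, not part of the method itself.

```python
import random

def random_run_order(a, n, seed=None):
    """Return the a*n planned runs as (level, replicate) pairs in random order."""
    # Planned runs, listed level by level: (1, 1), (1, 2), ..., (a, n).
    planned = [(i, j) for i in range(1, a + 1) for j in range(1, n + 1)]
    rng = random.Random(seed)  # the seed only makes the example reproducible
    order = planned[:]         # copy, so the planned list stays untouched
    rng.shuffle(order)
    return order

# Illustrative design: a = 5 levels, n = 4 runs per level (20 runs in total).
order = random_run_order(a=5, n=4, seed=1)
for position, (level, replicate) in enumerate(order, start=1):
    print(f"run {position:2d}: level {level}, replicate {replicate}")
```

In practice the seed would normally be omitted, so that each experiment gets its own run order.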
For each run we obtain a value `y_(ij)` of the response `Y`, where `i` is the level and `j` is the run within that level. Each level `i` of factor A yields `n` values `y_(ij)`, one per run, so there are `an` values of `Y` in total, as shown in Table 1.
| Run | Level 1 | Level 2 | . . . | Level `i` | . . . | Level `a` |
|---|---|---|---|---|---|---|
| 1 | `y_(11)` | `y_(21)` | . . . | `y_(i1)` | . . . | `y_(a1)` |
| 2 | `y_(12)` | `y_(22)` | . . . | `y_(i2)` | . . . | `y_(a2)` |
| . . . | . . . | . . . | . . . | . . . | . . . | . . . |
| `j` | `y_(1j)` | `y_(2j)` | . . . | `y_(ij)` | . . . | `y_(aj)` |
| . . . | . . . | . . . | . . . | . . . | . . . | . . . |
| `n` | `y_(1n)` | `y_(2n)` | . . . | `y_(i n)` | . . . | `y_(an)` |
| Mean | `bar y_1` | `bar y_2` | . . . | `bar y_i` | . . . | `bar y_a` |
The example below will illustrate these notions.
Example 4
A fish ball manufacturer would like to improve the firmness of the product by adding soybean protein. To study the effect of this ingredient on firmness and sensory quality, an experiment was carried out with 5 levels of soybean protein content: 10%, 14%, 18%, 22% and 26% (5 treatments). Each level is run 4 times, so the experiment consists of 20 runs. These runs are performed in random order, for example 7, 15, 2, 11, 19, 4, 5, 18, 6, ...
The firmness measured in these runs is presented in Table 2.
| Protein content (%) | 10 | 14 | 18 | 22 | 26 |
|---|---|---|---|---|---|
| Run 1 | 181 | 194 | 204 | 210 | 216 |
| Run 2 | 179 | 197 | 208 | 214 | 214 |
| Run 3 | 182 | 201 | 205 | 210 | 212 |
| Run 4 | 183 | 195 | 206 | 209 | 215 |
| Mean | 181,3 | 196,8 | 205,8 | 210,8 | 214,3 |
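The level means of Table 2 can be recomputed with a short script. The sketch below simply averages the four runs of each level; the variable names are chosen here for illustration, and Python prints a decimal point where this text uses a decimal comma.

```python
from statistics import mean

# Firmness data from Table 2: protein content (%) -> values of the 4 runs.
firmness = {
    10: [181, 179, 182, 183],
    14: [194, 197, 201, 195],
    18: [204, 208, 205, 206],
    22: [210, 214, 210, 209],
    26: [216, 214, 212, 215],
}

# Mean of each level (bar y_i) and overall mean of all 20 runs (bar y).
level_means = {level: mean(values) for level, values in firmness.items()}
grand_mean = mean(y for values in firmness.values() for y in values)

for level, m in level_means.items():
    print(f"{level}% protein: mean firmness = {m:.2f}")
print(f"overall mean = {grand_mean:.2f}")
```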
Analysis of variance for single factor experiments
The variation of response `Y` is due to:
- effect of factor A
- random errors
The variation of `Y` due to factor A is characterized by:
| `SS_A=nsum_(i=1)^a (bar y_i-bar y)^2` | (2) |
(`SS` stands for sum of squares)
The variation of `Y` due to random errors is characterized by:
| `SS_E=sum_(i=1)^a sum_(j=1)^n (y_(ij)-bar y_i)^2` | (3) |
Total sum of squares `SS_T` is defined by:
| `SS_T=sum_(i=1)^a sum_(j=1)^n (y_(ij)-bar y)^2` | (4) |
It can be shown that:
| `SS_T=SS_A+SS_E` | (5) |
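Identity (5) can be checked numerically. The sketch below applies definitions (2), (3) and (4) directly to the firmness data of Table 2; it is a numerical check on one data set, not a proof, and the variable names are illustrative.

```python
# Firmness data from Table 2, one list per level of protein content.
data = [
    [181, 179, 182, 183],  # 10 %
    [194, 197, 201, 195],  # 14 %
    [204, 208, 205, 206],  # 18 %
    [210, 214, 210, 209],  # 22 %
    [216, 214, 212, 215],  # 26 %
]

a = len(data)     # number of levels
n = len(data[0])  # runs per level (the same for every level)

level_means = [sum(level) / n for level in data]
grand_mean = sum(sum(level) for level in data) / (a * n)

# Formula (2): variation between the level means.
ss_a = n * sum((m - grand_mean) ** 2 for m in level_means)
# Formula (3): variation of the runs around their own level mean.
ss_e = sum((y - m) ** 2 for level, m in zip(data, level_means) for y in level)
# Formula (4): variation of all runs around the overall mean.
ss_t = sum((y - grand_mean) ** 2 for level in data for y in level)

print(f"SS_A = {ss_a}, SS_E = {ss_e}, SS_T = {ss_t}")
print("SS_T - (SS_A + SS_E) =", ss_t - (ss_a + ss_e))
```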
We also define:
| `MS_A=(SS_A)/(a-1)` | (6) |
| `MS_E=(SS_E)/(a(n-1))` | (7) |
(`MS` stands for mean square)
In formulae (6) and (7), the denominators of `MS_A` and `MS_E` are the degrees of freedom `df_A=a-1` and `df_E=a(n-1)` respectively.
It can also be shown that, when factor A has no effect on `Y` (the null hypothesis), the ratio:
| `F=(MS_A)/(MS_E)` | (8) |
follows the Fisher (`F`) distribution with `df_A` and `df_E` degrees of freedom.
`F` therefore compares the variation due to factor A with the variation due to random errors: the larger `F` is, the less plausible it is that the differences between level means come from random errors alone.
Because ANOVA is a form of hypothesis testing, we decide between these two sources of variation by comparing `F_o` (the value of `F` calculated from the data) with a critical value `F`*.
This is a one-sided test whose rejection region lies to the right of `F`*. For a significance level `alpha`:
`F`*`=F_(alpha,df_A,df_E)=F_(alpha,a-1,a(n-1))`
If:
- `F_o>F`*: we conclude that factor A affects `Y`.
- `F_o<F`*: we conclude that the effect of factor A on `Y` is not statistically significant.
When ANOVA is carried out with software, the results are usually presented in the form of Table 3.
| Source of variation | Degrees of freedom | `SS` | `MS` | `F_o` | `F`* | `p` value |
|---|---|---|---|---|---|---|
| Treatment | `a-1` | `SS_A` | `MS_A` | `(MS_A)/(MS_E)` | `F_(alpha,a-1,a(n-1))` | |
| Error | `a(n-1)` | `SS_E` | `MS_E` | | | |
| Total | `an-1` | `SS_T` | | | | |
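The first five columns of such a table follow directly from formulae (6), (7) and (8). The sketch below builds them from two sums of squares; the function name `anova_table` is an assumption made for this illustration, and the inputs are the values obtained by applying formulae (2) and (3) to the data of Table 2. The `F`* and `p` value columns are left out because they require the `F` distribution (tables or a statistics package).

```python
def anova_table(ss_a, ss_e, a, n):
    """Rows of the single factor ANOVA table: (source, df, SS, MS, F_o)."""
    df_a, df_e = a - 1, a * (n - 1)
    ms_a = ss_a / df_a  # formula (6)
    ms_e = ss_e / df_e  # formula (7)
    f_o = ms_a / ms_e   # formula (8)
    return [
        ("Treatment", df_a, ss_a, ms_a, f_o),
        ("Error", df_e, ss_e, ms_e, None),
        ("Total", df_a + df_e, ss_a + ss_e, None, None),
    ]

# Sums of squares of the Table 2 data (a = 5 levels, n = 4 runs per level).
rows = anova_table(ss_a=2794.0, ss_e=69.75, a=5, n=4)

print(f"{'Source':<10}{'df':>4}{'SS':>10}{'MS':>10}{'F_o':>10}")
for source, df, ss, ms, f_o in rows:
    ms_text = "" if ms is None else f"{ms:.3f}"
    f_text = "" if f_o is None else f"{f_o:.3f}"
    print(f"{source:<10}{df:>4}{ss:>10.2f}{ms_text:>10}{f_text:>10}")
```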
Numerical calculation
Manual analysis of variance by the procedure above is tedious, and rounding errors can propagate when formulae (2), (3) and (4) are used, because every deviation is computed from a mean that has already been rounded. To simplify the calculation and improve its accuracy, `SS_T` and `SS_A` are calculated in another way.
To simplify the formulae, let:
| `y_(i*)=sum_(j=1)^n y_(ij)` | (9) |
| `y_(* *)=sum_(i=1)^a sum_(j=1)^n y_(ij)` | (10) |
Then `SS_T` and `SS_A` are calculated by the formulae:
| `SS_T=sum_(i=1)^a sum_(j=1)^n y_(ij)^2\ -\ (y_(* *)^2)/(an)` | (11) |
| `SS_A=1/n sum_(i=1)^a y_(i*)^2\ -\ (y_(* *)^2)/(an)` | (12) |
| with : | `CF=y_(* *)^2/(an)` | (13) |
`CF` is called the correction factor. `SS_E` is then obtained by difference from (5): `SS_E=SS_T-SS_A`.
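Formulae (9) to (13) translate almost line by line into code. The sketch below is one possible transcription, applied to the firmness data of Table 2; the function name `sums_of_squares` is introduced here for illustration only.

```python
def sums_of_squares(data):
    """SS_T, SS_A and SS_E from totals, following formulae (9) to (13)."""
    a, n = len(data), len(data[0])
    level_totals = [sum(level) for level in data]  # y_(i*), formula (9)
    grand_total = sum(level_totals)                # y_(* *), formula (10)
    cf = grand_total ** 2 / (a * n)                # correction factor, formula (13)
    ss_t = sum(y ** 2 for level in data for y in level) - cf  # formula (11)
    ss_a = sum(t ** 2 for t in level_totals) / n - cf         # formula (12)
    ss_e = ss_t - ss_a                                        # from formula (5)
    return ss_t, ss_a, ss_e

# Firmness data from Table 2, one list per level of protein content.
data = [
    [181, 179, 182, 183],
    [194, 197, 201, 195],
    [204, 208, 205, 206],
    [210, 214, 210, 209],
    [216, 214, 212, 215],
]

ss_t, ss_a, ss_e = sums_of_squares(data)
print(f"SS_T = {ss_t}, SS_A = {ss_a}, SS_E = {ss_e}")
```

With integer data, the totals and the sums of squares are exact, so rounding enters only through the divisions by `n` and `an`.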
The example below will illustrate these notions.
Example
To study the effect of preferred colour on human intelligence, an experiment was conducted with 3 colours A, B and C. For each colour, 10 persons were tested and their `IQ` was measured. The results are shown in Table 4.
| Colour A | Colour B | Colour C |
|---|---|---|
| 102 | 89 | 51 |
| 88 | 100 | 76 |
| 106 | 92 | 90 |
| 93 | 76 | 117 |
| 98 | 64 | 103 |
| 104 | 104 | 64 |
| 90 | 66 | 64 |
| 103 | 98 | 50 |
| 99 | 90 | 89 |
| 92 | 82 | 67 |
This is a single factor experiment studying the effect of the factor "preferred colour" on the response `IQ`. The factor is studied at 3 levels (`a=3`) with 10 runs per level (`n=10`), so the experiment consists of 30 runs.
For a manual analysis, we extend Table 4 with the column totals to obtain Table 5.
| | Colour A | Colour B | Colour C | `sum` |
|---|---|---|---|---|
| | 102 | 89 | 51 | |
| | 88 | 100 | 76 | |
| | 106 | 92 | 90 | |
| | 93 | 76 | 117 | |
| | 98 | 64 | 103 | |
| | 104 | 104 | 64 | |
| | 90 | 66 | 64 | |
| | 103 | 98 | 50 | |
| | 99 | 90 | 89 | |
| | 92 | 82 | 67 | |
| `y_(i*)` | 975 | 861 | 771 | 2607 |
| `y_(i*)^2` | 950.625 | 741.321 | 594.441 | 2.286.387 |
| `sum_(j=1)^n y_(ij)^2` | 95.427 | 75.857 | 63.877 | 235.161 |
| Mean | 97,5 | 86,1 | 77,1 | |
From Table 5, we obtain:
`y_(* *)=sum_(i=1)^a sum_(j=1)^n y_(ij)=2607`
`sum_(i=1)^a sum_(j=1)^n y_(ij)^2=235.161`
`sum_(i=1)^a y_(i*)^2=2.286.387`
`CF=y_(* *)^2/(an)=2607^2/(3xx10)=226.548,3`
`SS_T=sum_(i=1)^a sum_(j=1)^n y_(ij)^2\ -\ (y_(* *)^2)/(an)=235.161-226.548,3=8612,7`
`SS_A=1/n sum_(i=1)^a y_(i*)^2\ -\ (y_(* *)^2)/(an)=(2.286.387)/10-226.548,3=2090,4`
`SS_E=SS_T-SS_A=8612,7-2090,4=6522,3`
`MS_A=(SS_A)/(a-1)=(2090,4)/(3-1)=1045,2`
`MS_E=(SS_E)/(a(n-1))=(6522,3)/(3xx(10-1))=241,567`
`F_o=(MS_A)/(MS_E)=(1045,2)/(241,567)=4,327`
At a confidence level of 95% (significance level `alpha=0,05`), the critical value of `F` is:
`F`*`=F_(alpha,a-1,a(n-1))=F_(0,05,2,27)=3,354`.
Because `F_o>F`* (4,327 > 3,354), we conclude that preferred colour has a statistically significant effect on `IQ`.
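The whole calculation can be reproduced with the sketch below, which uses only the Python standard library. One step goes beyond the text and should be read as an assumption of this example: because `df_A=2` here, the upper tail of the `F` distribution has the closed form `P(F>x)=(1+(2x)/(df_E))^(-df_E/2)`, which gives both the critical value and the `p` value without tables. This shortcut holds only for `df_A=2`; other cases need `F` tables or a statistics package.

```python
# IQ data from Table 4, one list per preferred colour.
iq = [
    [102, 88, 106, 93, 98, 104, 90, 103, 99, 92],  # colour A
    [89, 100, 92, 76, 64, 104, 66, 98, 90, 82],    # colour B
    [51, 76, 90, 117, 103, 64, 64, 50, 89, 67],    # colour C
]
a, n = len(iq), len(iq[0])
df_a, df_e = a - 1, a * (n - 1)

# Sums of squares from totals, formulae (11) to (13), then (5).
cf = sum(map(sum, iq)) ** 2 / (a * n)
ss_t = sum(y * y for level in iq for y in level) - cf
ss_a = sum(sum(level) ** 2 for level in iq) / n - cf
ss_e = ss_t - ss_a

# Mean squares and observed F, formulae (6) to (8).
f_o = (ss_a / df_a) / (ss_e / df_e)

# Closed forms valid ONLY when df_A = 2 (assumption of this sketch).
assert df_a == 2
alpha = 0.05
f_crit = df_e / 2 * (alpha ** (-2 / df_e) - 1)  # solves P(F > x) = alpha
p_value = (1 + 2 * f_o / df_e) ** (-df_e / 2)   # P(F > F_o)

print(f"SS_T = {ss_t:.1f}, SS_A = {ss_a:.1f}, SS_E = {ss_e:.1f}")
print(f"F_o = {f_o:.3f}, F* = {f_crit:.3f}, p = {p_value:.4f}")
print("significant" if f_o > f_crit else "not significant")
```

The decision can be read from either column: `F_o>F`* and `p<alpha` express the same rejection rule.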