Overview
Consider two populations in which the random variable `X` has means `mu_1` and `mu_2` and standard deviations `sigma_1` and `sigma_2`. From them we draw two samples of sizes `n_1` and `n_2`, with sample means `bar x_1` and `bar x_2` and sample standard deviations `s_1` and `s_2`.
In general, we would like to compare the means of these populations at confidence level `(1-alpha)`, that is, at significance level `alpha`.
So the null hypothesis is Ho : `mu_1=mu_2` (8)
Depending on the context, the alternative hypothesis can be `mu_1!=mu_2`, `mu_1< mu_2`, or `mu_1>mu_2`.
Comparing two means, variances known
When the variances `sigma_1^2` and `sigma_2^2` are known, the test statistic is:
`z=(bar x_1-bar x_2)/sqrt((sigma_1^2/n_1)+(sigma_2^2/n_2))` (9)
Under Ho, this test statistic follows the standard normal distribution `N(0,1)`.
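As a minimal sketch of equation (9), the statistic and its critical value can be computed with Python's standard library. The function names and the sample numbers below are assumptions made for illustration only; they do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic of equation (9): population standard deviations are known."""
    return (mean1 - mean2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def z_critical(alpha, two_sided=True):
    """Critical value of N(0,1) at significance level alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


# Hypothetical data, chosen only to exercise the functions
z = two_sample_z(mean1=52.0, mean2=50.0, sigma1=4.0, sigma2=3.0, n1=40, n2=50)
z_star = z_critical(0.05)

# For the two-sided alternative, Ho is rejected when |z| exceeds the critical value
print(f"z = {z:.3f}, critical value = {z_star:.3f}, reject Ho: {abs(z) > z_star}")
```

For a one-sided alternative, `z_critical(alpha, two_sided=False)` places the whole of `alpha` in one tail, and the sign of `z` is compared with the direction of the alternative.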
Comparing two means, variances unknown
Large sample
When the variances `sigma_1^2` and `sigma_2^2` are unknown but the sample sizes are large, we can use a test statistic similar to (9) in which each `sigma` is replaced by the corresponding `s`:
`z=(bar x_1-bar x_2)/sqrt(s_1^2/n_1+s_2^2/n_2)` (10)
This test statistic also follows, approximately, the standard normal distribution `N(0,1)`.
Small sample
We have to distinguish two cases.
Standard deviations of populations are equal
If we know that the standard deviations of the two populations are equal (even though their common value is unknown), we define the pooled variance by:
`s_p^2=((n_1-1)s_1^2+(n_2-1)s_2^2)/(n_1+n_2-2)` (11)
The test statistic is then:
`t=(bar x_1-bar x_2)/sqrt(s_p^2(1/n_1+1/n_2))` (12)
This test statistic follows Student's t distribution with `nu=n_1+n_2-2` degrees of freedom.
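Equations (11) and (12) can be sketched in a few lines of Python. The function name `pooled_t` and the input values are hypothetical, used only to show the order of the calculation; the critical value still has to be read from a Student's t table for the returned degrees of freedom.

```python
from math import sqrt


def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """Return (t, nu, sp2) for the equal-variance case, equations (11) and (12)."""
    nu = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / nu
    t = (mean1 - mean2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, nu, sp2


# Hypothetical summary statistics, not taken from the text
t, nu, sp2 = pooled_t(mean1=10.0, mean2=8.0, s1=2.0, s2=2.0, n1=5, n2=5)
print(f"pooled variance = {sp2:.3f}, t = {t:.3f}, degrees of freedom = {nu}")
```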
Standard deviations of populations are unequal
In this case, the test statistic is:
`t=(bar x_1-bar x_2)/sqrt(s_1^2/n_1+s_2^2/n_2)` (13)
This test statistic also follows, approximately, Student's t distribution, with the number of degrees of freedom calculated by:
`nu=(s_1^2/n_1+s_2^2/n_2)^2 / ( (s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1) )` (14)
When `nu` is not an integer, we round it to the nearest integer (in order to use the percentage point table).
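The unequal-variance case, equations (13) and (14), can be sketched as follows. The function name `welch_t` and the sample numbers are assumptions for illustration; rounding to the nearest integer follows the rule stated above.

```python
from math import sqrt


def welch_t(mean1, mean2, s1, s2, n1, n2):
    """Return (t, nu) for the unequal-variance case, equations (13) and (14)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, nu


# Hypothetical summary statistics, not taken from the text
t, nu = welch_t(mean1=10.0, mean2=8.0, s1=3.0, s2=1.0, n1=6, n2=11)
nu_table = round(nu)  # nearest integer, for use with a percentage point table
print(f"t = {t:.3f}, nu = {nu:.2f}, table row used: {nu_table}")
```

When both samples have the same size and the same standard deviation, equation (14) gives `nu=n_1+n_2-2`, the same value as in the equal-variance case.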
Matched pairs test
When comparing means, there are cases in which the data `x_(1i)` and `x_(2i)` of the two samples are matched, or paired, in some way. For example, to study the effect of a preservation method, we compare the weight loss of treated and untreated fruits during preservation. The sample sizes are therefore equal: `n_1=n_2=n`.
In these cases, hypothesis testing is carried out as follows:
- For each pair of data, calculate `d_i=x_(1i)-x_(2i)`
- The `n` values of `d_i` have a mean `bar d` and a standard deviation `s_d`.
- Let `delta=mu_1-mu_2` (the difference between the means of the two populations).
- State the null hypothesis as Ho : `delta=0`.
- The alternative hypothesis can be `delta !=0`, `delta< 0`, or `delta>0`, depending on the context.
- Use the test statistic:
`K=bar d/(s_d/sqrt(n))` (15)
- If the sample is large, `K` approximately follows the standard normal distribution. If the sample is small, `K` follows Student's t distribution with `n-1` degrees of freedom.
- The remaining steps are the same as in other types of hypothesis testing.
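The steps above can be sketched in Python as follows. The function name `paired_statistic` and the weight-loss values are hypothetical, invented only to illustrate equation (15); `statistics.stdev` computes the sample standard deviation with the `n-1` denominator.

```python
from math import sqrt
from statistics import mean, stdev


def paired_statistic(x1, x2):
    """Return (K, degrees of freedom) for paired data, equation (15)."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same size")
    d = [a - b for a, b in zip(x1, x2)]  # differences d_i = x_1i - x_2i
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation of the differences
    return d_bar / (s_d / sqrt(n)), n - 1


# Hypothetical weight losses for five matched pairs of fruits
untreated = [5.1, 4.8, 6.0, 5.5, 5.2]
treated = [4.6, 4.9, 5.4, 5.0, 4.9]

K, nu = paired_statistic(untreated, treated)
print(f"K = {K:.3f}, degrees of freedom = {nu}")
```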
Example
Table 1 presents a comparison of the calorie content of two types of biscuits.
| | Biscuit A | Biscuit B |
|---|---|---|
| Sample size | 8 | 10 |
| Mean (kcal) | 325 | 295 |
| Standard deviation (kcal) | 34 | 26 |
Is the calorie content of the two types of biscuits really different at the 95% confidence level?
In this case, we have to compare the mean calorie contents `mu_A` and `mu_B` of the two types of biscuits A and B. The hypothesis test proceeds as follows.
- Pair of hypotheses:
  - Ho : `mu_A=mu_B`
  - Ha : `mu_A!=mu_B`
- Because the samples are small and we have no information about the standard deviations of these types of biscuit, the test statistic is:
`t=(bar x_A-bar x_B)/sqrt(s_A^2/n_A+s_B^2/n_B)`
- This test statistic follows Student's t distribution with the number of degrees of freedom:
`nu=(s_A^2/n_A+s_B^2/n_B)^2/ ((s_A^2/n_A)^2/(n_A-1)+(s_B^2/n_B)^2/(n_B-1)) = (34^2/8+26^2/10)^2/ ((34^2/8)^2/(8-1)+(26^2/10)^2/(10-1)) = 12.89`
which is rounded to `nu=13`.
- This is a two-sided test and the probability density function of Student's t distribution is even, so with `alpha=0.05` the critical value of the test statistic is `t"*"=t_(0.025, 13)=2.1604`.
- From the data:
`t_o=(325-295)/sqrt(34^2/8+26^2/10)=2.06`
- Because `|t_o|< t"*"`, we cannot reject Ho.
- Therefore, at the 95% confidence level, the data do not show that the calorie contents of biscuits A and B differ.
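The arithmetic of this example can be checked with a short Python sketch that applies equations (13) and (14) to the values in Table 1. The variable names are chosen here for illustration, and the critical value 2.1604 is copied from the text rather than computed.

```python
from math import sqrt

# Summary statistics from Table 1
n_a, mean_a, s_a = 8, 325.0, 34.0
n_b, mean_b, s_b = 10, 295.0, 26.0

v_a = s_a**2 / n_a  # s_A^2 / n_A
v_b = s_b**2 / n_b  # s_B^2 / n_B

# Equation (13): test statistic for unequal, unknown standard deviations
t_o = (mean_a - mean_b) / sqrt(v_a + v_b)

# Equation (14): degrees of freedom, then rounded to the nearest integer
nu = (v_a + v_b) ** 2 / (v_a**2 / (n_a - 1) + v_b**2 / (n_b - 1))
nu_table = round(nu)

# Two-sided critical value for alpha = 0.05 and 13 degrees of freedom, as quoted in the text
t_star = 2.1604
reject = abs(t_o) > t_star

print(f"t_o = {t_o:.2f}, nu = {nu:.2f} (table row {nu_table}), reject Ho: {reject}")
# prints: t_o = 2.06, nu = 12.89 (table row 13), reject Ho: False
```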