The mean is an important quantity in statistics: it characterizes the central tendency of data. Tests on the mean are therefore widely applied to numerical data.
Introduction
Consider a population in which the random variable `X` has mean `mu` and standard deviation `sigma`. We draw a sample of size `n`, whose mean is `bar x` and whose standard deviation is `s`. We have to decide on a statement about the relation between the mean `mu` and a value `a`, at confidence level `1-alpha` (or significance level `alpha`).
In this case, the null hypothesis has the form:
Ho : `mu=a` (3)
Depending on the context, the alternative hypothesis can be `mu!=a`, `mu>a`, or `mu<a`.
Example: Assume that the protein content of acceptable milk is 3,7 %. To evaluate the quality of the milk produced by company C, 10 samples were taken and analyzed. The result: the mean protein content is 3,4 %. Is this milk acceptable at a confidence level of 95 %?
We recognize that:
- The population is all the milk produced by company C. The variable of interest `X` is the protein content of this company's milk.
- We don't know the mean `mu` of `X` in this population.
- We take a sample of size `n=10` to analyze. The result shows that the mean of `X` in this sample is 3,4 %.
- Now we have to decide the relation between `mu` and the acceptable value 3,7 %.
So the pair of hypotheses can be:
Ho : `mu=3,7`
Ha : `mu< 3,7`
There are four main cases:
- `sigma` is known, `n` is large
- `sigma` is known, `n` is small, `X` is normally distributed
- `sigma` is unknown, `n` is large
- `sigma` is unknown, `n` is small, `X` is normally distributed
Tests on the mean, variance known
When `sigma` is known, the test statistic is:
`z=(bar x-a)/(sigma/sqrt(n))` (4)
`z` measures the distance between the sample mean and the value stated in the null hypothesis, in units of the standard error `sigma/sqrt(n)`. The larger `|z|` is, the greater this distance and the stronger the evidence against the null hypothesis.
When the sample is large, or when it is small but `X` is normally distributed, `z` follows the standard normal distribution `N(0,1)` under the null hypothesis.
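As an illustration, the test with known `sigma` can be sketched in Python using only the standard library. The function name `z_test`, its parameters, and the value `sigma = 0.5` used in the call are assumptions made for this sketch: the milk example above gives no population standard deviation, and we also assume that `X` is normally distributed because the sample is small.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, a, sigma, n, alpha=0.05, alternative="two-sided"):
    """Test Ho: mu = a when the population standard deviation is known.

    Returns the test statistic z and whether Ho is rejected at level alpha.
    """
    z = (x_bar - a) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal distribution N(0, 1)

    if alternative == "greater":      # Ha: mu > a, rejection region on the right
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":       # Ha: mu < a, rejection region on the left
        reject = z < std_normal.inv_cdf(alpha)
    elif alternative == "two-sided":  # Ha: mu != a, rejection region in both tails
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject


# Protein example with a hypothetical sigma = 0.5 (not given in the text).
z, reject = z_test(x_bar=3.4, a=3.7, sigma=0.5, n=10, alternative="less")
print(f"z = {z:.3f}, reject Ho: {reject}")
```

The three branches correspond to the three possible forms of the alternative hypothesis; only the position of the rejection region changes, not the statistic itself.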
Tests on the mean, variance unknown
Because `sigma` is unknown, we replace it with the sample standard deviation `s`. In this case, the test statistic is:
`t=(bar x-a)/(s/sqrt(n))` (5)
Concerning the probability distribution of the test statistic, there are two cases:
- If the sample is large, we use the standard normal distribution.
- If the sample is small (but `X` is normally distributed), we use Student's distribution with `n-1` degrees of freedom.
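The choice of reference distribution across the four main cases can be summarized in a small helper. This is only a sketch: the function name `reference_distribution` is hypothetical, and the threshold `large_n=30` is a common rule of thumb that we assume here, since the text does not say where a "large" sample begins.

```python
def reference_distribution(sigma_known, n, x_is_normal, large_n=30):
    """Return the distribution of the test statistic under Ho.

    large_n is an assumed rule-of-thumb threshold for a "large" sample.
    """
    large = n >= large_n
    if not large and not x_is_normal:
        # Small sample from a non-normal population: outside the four cases.
        raise ValueError("none of the four main cases applies")
    if sigma_known or large:
        return "standard normal"
    # sigma unknown, small sample, X normally distributed
    return f"Student's t with {n - 1} degrees of freedom"


print(reference_distribution(sigma_known=False, n=10, x_is_normal=True))
```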
Example
The average calcium content of powdered milk is 1,2 %. Company C claims that the calcium content of its product is higher than that value. A sample of 10 boxes is checked for verification. The mean and standard deviation of this sample are 1,4 % and 0,4 %, respectively. Can we accept the company's claim at a confidence level of 95 %?
In this case, the population is all the powdered milk produced by company C, and the variable of interest `X` is the calcium content of the powdered milk (in %). The sample size is 10, the sample mean of `X` is 1,4, and the sample standard deviation of `X` is 0,4. We would like to know the relation between the population mean of `X` and 1,2.
The hypothesis test is then carried out in the following steps.
Tested parameter
Based on the context, the tested parameter is the mean calcium content of the powdered milk produced by company C, denoted `Ca`.
Constructing the pair of hypotheses
To verify the claim of company C, the pair of hypotheses is:
- Ho : `Ca=1,2`
- Ha : `Ca>1,2`
Determining significance level `alpha`
The confidence level is 95 %, so `alpha=0,05`.
Test statistic and its distribution
Because the population standard deviation `sigma` is unknown, the test statistic is:
`t=(bar x-a)/(s/sqrt(n))`
The sample size is 10: this is a small sample, so, assuming `X` is normally distributed, the test statistic follows Student's distribution with `n-1=9` degrees of freedom.
Determining `t`*
Based on Ha, the rejection region lies to the right of `t`*, so `t`*`=t_(alpha,n-1)`.
Using the percentage point table of Student's distribution: `t`*`=t_(0,05,9)=1,833`
Calculating `t_o`
`t_o=(1,4-1,2)/((0,4)/sqrt(10))=1,58`
Comparing `t`* and `t_o`
We find that `t_o< t`*, so `t_o` does not fall in the rejection region: we cannot reject Ho.
Conclusion
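Because we cannot reject Ho, the sample does not provide sufficient evidence, at the 5 % significance level, that the calcium content of the powdered milk produced by company C is higher than the ordinary level of 1,2 %.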
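The steps of this example can be checked numerically with a short Python sketch. Since the standard library has no Student's distribution, the sketch integrates the density with Simpson's rule and finds `t`* by bisection instead of reading it from a table. The helper names `t_pdf`, `t_cdf` and `t_critical`, the number of integration steps and the bisection bounds are all choices made for this illustration, not part of the method described above.

```python
import math


def t_pdf(u, df):
    """Density of Student's distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus the integral of the density from 0 to t (Simpson's rule)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Right-tail critical value t_(alpha, df), by bisection (assumes alpha < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Data of the example (calcium content in %)
x_bar, s, n = 1.4, 0.4, 10
a, alpha = 1.2, 0.05

t_o = (x_bar - a) / (s / math.sqrt(n))  # observed test statistic
t_star = t_critical(alpha, n - 1)       # critical value for Ha: Ca > 1.2
reject_h0 = t_o > t_star                # rejection region is to the right of t*

print(f"t_o = {t_o:.2f}, t* = {t_star:.3f}, reject Ho: {reject_h0}")
```

Up to rounding, the computed values should agree with `t_o` and the table value of `t`* used in the example, and the decision not to reject Ho is the same.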