Testing the independence of two attributes

To better understand this type of hypothesis test, consider the following example.

Example

A marketing researcher would like to know the relationship between consumers' preferred products and their income. Two kinds of product are considered: meat product and fruit product. Consumer income is classified into 3 categories: high, intermediate, and low. We would like to know whether food preference depends on consumer income, at a confidence level of 95 % (significance level `alpha=0,05`).

The survey results are summarized in Table 1, known as a contingency table (also called a crosstab or two-way table).

Table 1 Summary of survey on food preference of groups classified by income
	High income	Intermediate income	Low income	Total
Meat product	18	42	58	118
Fruit product	42	28	12	82
Total	60	70	70	200

From Table 1, we know that 200 consumers participated in the survey, of which:

118 consumers prefer meat product (proportion 0,59 or 59%),
70 consumers belong to the group of intermediate income,
42 consumers in the group of high income prefer fruit product,
58 consumers in the group of low income prefer meat product,
. . . . .

Hence, apart from the last column (Total) and the last row (Total), the value `x_(ij)` in a cell of the contingency table is the number of elements that have both the value of the row variable (row `i`) and the value of the column variable (column `j`). For example, 42 is the number of consumers in the intermediate income group who prefer meat product.
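As a rough illustration of how the totals and the 59% proportion are read off the contingency table, the sketch below stores the observed counts of Table 1 in a nested dictionary and sums them. The variable names (`observed`, `row_totals`, `column_totals`, `meat_share`) are chosen here for illustration only and do not come from the original text; decimals use a point, as Python requires.

```python
# Observed counts x_ij from Table 1 (rows: preferred product, columns: income group).
observed = {
    "Meat product": {"High": 18, "Intermediate": 42, "Low": 58},
    "Fruit product": {"High": 42, "Intermediate": 28, "Low": 12},
}

# Row totals: number of consumers preferring each product.
row_totals = {product: sum(cells.values()) for product, cells in observed.items()}

# Column totals: number of consumers in each income group.
column_totals = {}
for cells in observed.values():
    for group, count in cells.items():
        column_totals[group] = column_totals.get(group, 0) + count

# Grand total: all consumers in the survey.
grand_total = sum(row_totals.values())

# Overall proportion of consumers who prefer meat product.
meat_share = row_totals["Meat product"] / grand_total

print(row_totals)
print(column_totals)
print(grand_total, meat_share)
```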

For this hypothesis testing, the null hypothesis is:

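Ho : All income groups have the same food preference (food preference is independent of income).

This means that, in every income group, 59% of the consumers prefer meat product and 41% prefer fruit product.

Based on this hypothesis, we construct Table 2 with the expected values `e_(ij)` in the cells. Each value is the row total multiplied by the column total and divided by the overall total; for example `e_(11)=(118 xx 60)/200=35,4`. These values conform to Ho.

Table 2 Expected number of consumers for each preferred food under Ho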
	High income	Intermediate income	Low income	Total
Meat product	35,4	41,3	41,3	118
Fruit product	24,6	28,7	28,7	82
Total	60	70	70	200
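The values in Table 2 can be reproduced with a few lines of code. The sketch below assumes the usual rule for expected counts under independence (row total times column total, divided by the grand total); the list layout and the name `expected` are illustrative choices, not part of the original text.

```python
# Observed counts from Table 1: rows are meat and fruit product,
# columns are high, intermediate and low income.
observed = [
    [18, 42, 58],
    [42, 28, 12],
]

row_totals = [sum(row) for row in observed]              # per product
column_totals = [sum(col) for col in zip(*observed)]     # per income group
grand_total = sum(row_totals)

# Expected count e_ij under Ho: (row total * column total) / grand total.
expected = [
    [r_total * c_total / grand_total for c_total in column_totals]
    for r_total in row_totals
]

for row in expected:
    print([round(value, 1) for value in row])
```

Each row of `expected` sums to the same row total as in Table 1, and each column to the same column total, which is a quick check that the table was built correctly.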

The test statistic for this hypothesis test is:

`chi^2=sum_(i=1)^r sum_(j=1)^c ((x_(ij)-e_(ij))^2)/e_(ij)`

(23)

This test statistic follows a chi-square distribution with `nu=(c - 1)(r - 1)` degrees of freedom, in which `c` and `r` are the numbers of columns and rows of the contingency table respectively (excluding the totals). Here `nu=(3 - 1)(2 - 1)=2`.

This is a one-sided hypothesis test with the rejection region to the right of the critical value and `alpha=0,05`.

Hence `chi^2`*`=chi_(0,05,2)^2=5,991` (the upper 5 % percentage point of the chi-square distribution with 2 degrees of freedom).
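The critical value is normally read from a chi-square table, but for this particular case it can also be checked by hand. The sketch below relies on a property that holds only for 2 degrees of freedom: the right-tail probability is `exp(-x/2)`, so the critical value is `-2 ln(alpha)`. For other degrees of freedom this shortcut does not apply, and a table or a statistics library (for instance `scipy.stats.chi2.ppf`) would be needed instead.

```python
import math

alpha = 0.05           # significance level
rows, columns = 2, 3   # size of the contingency table, totals excluded

degrees_of_freedom = (columns - 1) * (rows - 1)

# Closed form valid ONLY for 2 degrees of freedom:
# P(X > x) = exp(-x / 2)  =>  x = -2 * ln(alpha)
if degrees_of_freedom != 2:
    raise ValueError("this shortcut only applies to 2 degrees of freedom")
critical_value = -2.0 * math.log(alpha)

print(degrees_of_freedom, round(critical_value, 3))
```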

Based on the survey data:

`chi_o^2=(18-35,4)^2/(35,4)+(42-41,3)^2/(41,3)+(58-41,3)^2/(41,3)+(42-24,6)^2/(24,6)`
`+(28-28,7)^2/(28,7)+(12-28,7)^2/(28,7)=37,359`

Because `chi_o^2>chi^2`*, we reject Ho.
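The calculation and the decision can be sketched in code as follows. The critical value 5.991 is taken from the text; the p-value line is an addition for illustration and uses the closed form `exp(-x/2)`, which is valid only for 2 degrees of freedom. All names are illustrative.

```python
import math

# Observed counts from Table 1 (rows: meat, fruit; columns: high, intermediate, low).
observed = [
    [18, 42, 58],
    [42, 28, 12],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under Ho, as in Table 2.
expected = [
    [r_total * c_total / grand_total for c_total in column_totals]
    for r_total in row_totals
]

# Chi-square statistic: sum of (x_ij - e_ij)^2 / e_ij over all cells.
chi_square = sum(
    (x - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for x, e in zip(obs_row, exp_row)
)

critical_value = 5.991                    # chi-square, alpha = 0.05, 2 degrees of freedom
reject_h0 = chi_square > critical_value   # right-tailed test

# Right-tail probability; this closed form holds only for 2 degrees of freedom.
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 3), reject_h0, p_value)
```

The statistic far exceeds the critical value, so the p-value is correspondingly far below 0,05.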

The two attributes "food preference" and "income" are not independent. This means that the preferred food depends on the income of the consumers.

This web page was last updated on 03 December 2018.