Linear regression is widely used because it is simple, easy to apply, and fits many phenomena well. This page considers the simple case of two variables.
Consider two variables `X` and `Y` and `n` pairs of data (`x_i`, `y_i`). We need to find the two coefficients `a` and `b` of the line `y=ax+b` that satisfies the least squares condition (Fig. 1).
Fig. 1 Illustration of linear regression
Using (2), `SS_E` is calculated as:
| `SS_E=sum_(i=1)^n (y_i-ax_i-b)^2` | (6) |
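As a minimal sketch of equation (6), `SS_E` can be computed directly from the data for any candidate line. The function name `ss_e` and the three sample points are illustrative assumptions and do not come from the page.

```python
def ss_e(xs, ys, a, b):
    """Sum of squared errors of the line y = a*x + b, following equation (6)."""
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points that lie exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

print(ss_e(xs, ys, 2.0, 1.0))  # the line passes through every point: 0.0
print(ss_e(xs, ys, 2.0, 0.0))  # every residual equals 1: 3.0
```

The least squares condition asks for the pair (`a`, `b`) that makes this value as small as possible.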
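`SS_E` is smallest where its partial derivatives with respect to `a` and `b` are both zero, so the coefficients `a` and `b` are determined by the system of equations: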
| `(partial SS_E)/(partial a)=-2sum_(i=1)^n (y_i-ax_i-b)x_i=0` | (7) |
| `(partial SS_E)/(partial b)=-2sum_(i=1)^n (y_i-ax_i-b)=0` | (8) |
Rearranging this system of equations gives:
| `nb+asum_(i=1)^n x_i=sum_(i=1)^n y_i` | (9) |
| `bsum_(i=1)^n x_i+asum_(i=1)^n x_i^2=sum_(i=1)^n x_iy_i` | (10) |
Solving the system of equations (9) and (10), we get:
| `a=(nsum xy-sum x sum y)/(nsum x^2-(sum x)^2)` | (11) |
| `b=(sum y sum x^2-sum x sum xy)/(n sum x^2-(sum x)^2)` | (12) |
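Formulae (11) and (12) translate almost line for line into code. The sketch below is one possible implementation in plain Python. The function name `fit_line` and the test points are assumptions made for illustration, and the guard against a zero denominator is an addition that the derivation above does not discuss.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b, per (11) and (12)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Common denominator of (11) and (12); it is zero when all x are equal
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return a, b


# Hypothetical points on y = 2x + 1; the fit should recover a = 2 and b = 1
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # 2.0 1.0
```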
Example
The heights and weights of 10 students are shown in Table 1.
| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Height (m) | 1,57 | 1,62 | 1,58 | 1,64 | 1,74 | 1,68 | 1,71 | 1,60 | 1,66 | 1,64 |
| Weight (kg) | 55 | 72 | 52 | 65 | 82 | 79 | 78 | 66 | 78 | 71 |
Let `X` denote height and `Y` denote weight. To find the linear regression equation for `X` and `Y`, we calculate the components of equations (11) and (12). The results are shown in Table 2.
| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | `sum` |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `x` | 1,57 | 1,62 | 1,58 | 1,64 | 1,74 | 1,68 | 1,71 | 1,60 | 1,66 | 1,64 | 16,44 |
| `y` | 55 | 72 | 52 | 65 | 82 | 79 | 78 | 66 | 78 | 71 | 698 |
| `x^2` | 2,4649 | 2,6244 | 2,4964 | 2,6896 | 3,0276 | 2,8224 | 2,9241 | 2,56 | 2,7556 | 2,6896 | 27,0546 |
| `xy` | 86,35 | 116,64 | 82,16 | 106,6 | 142,68 | 132,72 | 133,38 | 105,6 | 129,48 | 116,44 | 1152,05 |
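The sums in the last column of Table 2 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the heights are written with decimal points, as Python requires.

```python
# Data from Table 1 (heights in metres, weights in kilograms)
heights = [1.57, 1.62, 1.58, 1.64, 1.74, 1.68, 1.71, 1.60, 1.66, 1.64]
weights = [55, 72, 52, 65, 82, 79, 78, 66, 78, 71]

sum_x = sum(heights)
sum_y = sum(weights)
sum_x2 = sum(x * x for x in heights)
sum_xy = sum(x * y for x, y in zip(heights, weights))

# Round to the precision used in Table 2 to hide floating-point noise
print(round(sum_x, 2), sum_y, round(sum_x2, 4), round(sum_xy, 2))
```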
Using formulae (11) and (12), the coefficients `a` and `b` are:
`a=((10xx1152,05)-(16,44xx698))/((10xx27,0546)-(16,44)^2)=166,593`
`b=((698xx27,0546)-(16,44xx1152,05))/((10xx27,0546)-(16,44)^2)=-204,079`
Therefore, the relationship between the height and weight of the 10 students can be represented by the formula:
`y=166,593x-204,079`
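Once the coefficients are known, the line can be used to estimate a weight from a height. The sketch below assumes the rounded coefficients from the example, and `predict_weight` is a hypothetical helper name. Estimates for heights outside the observed range of 1.57 m to 1.74 m are not supported by these data.

```python
# Rounded coefficients from the worked example
a = 166.593
b = -204.079


def predict_weight(height_m):
    """Estimated weight in kg for a height in metres, using y = a*x + b."""
    return a * height_m + b


# A least squares line passes through the point of means (1.644 m, 69.8 kg)
print(round(predict_weight(1.644), 1))  # 69.8

# Estimate for a height inside the observed range
print(round(predict_weight(1.70), 1))  # 79.1
```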
This web page was last updated on 03 December 2018.