Instructions for doing simple regression in Excel
(1) Go to Tools—Addins
in menu, and make sure that the box in front of Analysis ToolPak
is checked.
(2) Go to Tools—Data
Analysis—Regression in menu.
(3) You will have to
indicate where the Yvariable (dependent variable) is located, e.g., d4:d13;
you will also have to indicate where the Xvariable (independent variable) is
located, e.g., b4:b13.
Input Y Range: d4:d13
Input X Range: b4:b13
Optional:
(a)
Check the box in front of Confidence Level 95% (95% is the default so
you do not have to change this). This
will give you a 95% confidence interval for the b_{o} and b_{1 }coefficients. If you do not check this, the program will
run but you will not get the confidence interval.
(b) If
you would like to see the residuals (also known as deviations), check the box
in front of Residuals. [You can
also do Residual Plots.]
You have to indicate where you want the regression output to
appear. You will probably want the
output to appear either on the same page or on another worksheet. If you want the output to appear on the same
page, then check the circle in front of
Example: A bookstore
wants to determine whether there is a relationship between shelf space and
number of books sold.
Shelf Space (in feet) (X)
Number of Books Sold
1 14
2 18
3 22
4 25
5 30
6 34
7 40
8 43
9 50
10 54
SUMMARY
OUTPUT 













Regression Statistics 






Multiple
R 
0.997115226 






0.994238773 





Adjusted

0.99351862 





Standard
Error 
1.09336844 





Observations 
10 












ANOVA 







df 
SS 
MS 
F 
Significance F 

Regression 
1 
1650.436364 
1650.436364 
1380.593156 
3.0194E10 

Residual 
8 
9.563636364 
1.195454545 



Total 
9 
1660 












Coefficients 
Standard Error 
t Stat 
Pvalue 
Lower 95% 
Upper 95% 
Intercept 
8.4 
0.746912838 
11.24629216 
3.50917E06 
6.677614793 
10.12238521 
X
Variable 1 
4.472727273 
0.120375903 
37.15633399 
3.0194E10 
4.195139762 
4.750314783 
This regression is very significant; the Fvalue is
1380.59. If the Xvariable explains very
little of the Yvariable, you should get an Fvalue that is 1 or less. In this case, the explained variation (due to
regression = explained by the Xvariable) is 1,380.59 times greater than the
unexplained (residual) variation. The probability of getting the sample
evidence (the X and Y input data), are something more extreme, if the X and Y are unrelated (that is the Ho)
is .00000000030194. In other words, it
is almost impossible to get this kind of data as a result of chance. This is why we reject the null hypothesis.
The X Variable 1 coefficient (slope term) is the b_{1};
in the above problem it is 4.47 (rounded).
The regression equation is:
Sales = 8.4 +4.47 (shelf space).
In theory, if no shelf space is assigned to the book (book
must be ordered from catalog), you will sell 8.4 copies. Every foot of shelf space will increase sales
by 4.47 books.
The correlation coefficient,
r, is +.997. It is almost a perfect 1. To get 1, all the points would have to be on
a line and all the residuals (deviations) would be 0.
The coefficient of determination, r^{2}, is
99.4%. There is very little unexplained
variation (in fact, the MSE, mean square error is only 1.195).
Another way to test the regression for significance is to
test the b_{1} term (slope term which shows the effect of X on Y). This
is done via a ttest. The tvalue is
37.156 and this is very, very significant.
The probability of getting a b_{1 }of this magnitude if Ho is
true (the null hypothesis for this test is that B_{1} = 0, i.e., the X
variable has no effect on Y) is
3.0194E10
or .00000000030194. Note that
this is the same sig. level we got before for the Ftest. Indeed, the two tests give exactly the same
results.