Ordinary Least Squares Regression Problems

Project description

Please put the answers next to the questions in the Word document that I uploaded in the Order Tracking Area.

There is also a problem at the very end (SAS output provided) that you should answer just by giving the steps for reproducing the SAS output by hand.

Assignment #4: Problem Set for Ordinary Least Squares Regression (50 points)

This assignment will be made available in both pdf and Microsoft docx format. Answers should be typed into the docx file, saved, and converted into pdf format for submission into Blackboard. Color your answers in green so that they can be easily distinguished from the questions themselves.

Throughout this assignment keep all decimals to four places, i.e. X.xxxx.

Any computation that involves “the log function”, denoted by log(x), always means the natural log function (shown as ln() on a calculator). Use a log function other than the natural logarithm only if you are given a specific base.

When stating the null and alternate hypotheses in any statistical test in PREDICT 410, we should always state these hypotheses in terms of the model parameters, i.e. the model coefficients denoted by the betas.

Model 1: Let’s consider the regression model, which we will refer to as Model 1, given by

Y = 10,000 + 150*X1 + 25*X1^2 + 60*X2 (M1).

(1) (2 points) Is this a “linear” regression model? Why or why not?

(2) (4 points) How do we interpret this model? Hint: how does a one-unit change in X1 or X2 affect the estimated value for Y? State the interpretation for both X1 and X2.
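One way to explore the hint numerically is to evaluate M1 at neighboring values of X1 and look at the differences. The sketch below is only an illustration: the data set name m1_change, the variable names, and the choice of X1 = 0 to 3 with X2 held at 0 are assumptions, not part of the assignment.

```sas
/* Illustration only: tabulate the change in Y for a one-unit step in X1 under M1. */
/* Data set and variable names are hypothetical.                                   */
data m1_change;
  x2 = 0;                          /* held fixed; the differences do not depend on x2 */
  do x1 = 0 to 3;
    y      = 10000 + 150*x1       + 25*x1**2       + 60*x2;
    y_next = 10000 + 150*(x1 + 1) + 25*(x1 + 1)**2 + 60*x2;
    delta  = y_next - y;           /* algebraically 175 + 50*x1 */
    output;
  end;
run;

proc print data=m1_change noobs;
run;
```

Because of the squared term, delta changes with the starting value of X1, whereas a one-unit step in X2 always moves Y by 60.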

(3) Consider the Analysis of Variance (ANOVA) table from fitting this model to a sample of 50 observations.

a. (4 points) Compute the R-squared and adjusted R-squared values for this regression model.

b. (2 points) Compute the estimate of the Mean Square Error (MSE).

c. (4 points) Perform the overall F-test for this model, i.e. state the null and alternate hypotheses, compute the test statistic for the overall F-test, and make a decision to “reject” or “fail to reject” the null hypothesis. Test the statistical significance of the overall F-test using a critical value for alpha=0.05 from Table A.4 on page 376 in Regression Analysis By Example.
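The quantities in parts a through c all follow from the sums of squares and degrees of freedom in an ANOVA table. The sketch below shows the arithmetic in a SAS data step. The sums of squares (800 and 200) are placeholders chosen for illustration, NOT the values from the assignment's ANOVA table, and the data set and variable names are hypothetical. FINV is used here as a software stand-in for the Table A.4 lookup.

```sas
/* Illustration only: ss_reg and ss_err are placeholder values, not the assignment's. */
data anova_stats;
  n      = 50;                     /* sample size stated in the question        */
  p      = 3;                      /* predictors in M1: X1, X1^2, X2            */
  ss_reg = 800;                    /* placeholder regression sum of squares     */
  ss_err = 200;                    /* placeholder error sum of squares          */

  ss_tot = ss_reg + ss_err;        /* corrected total sum of squares            */
  df_reg = p;
  df_err = n - p - 1;

  r2     = ss_reg / ss_tot;
  adj_r2 = 1 - (ss_err / df_err) / (ss_tot / (n - 1));
  mse    = ss_err / df_err;
  f_stat = (ss_reg / df_reg) / mse;

  f_crit = finv(0.95, df_reg, df_err);   /* critical value for alpha = 0.05     */
  reject = (f_stat > f_crit);            /* 1 = reject H0, 0 = fail to reject   */

  format r2 adj_r2 mse f_stat f_crit 12.4;
run;

proc print data=anova_stats noobs;
run;
```

Replacing the placeholders with the table's values gives numbers to compare against the hand calculation, kept to four decimals as required.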

Model 2: Now let’s consider an alternate regression model, which we will refer to as Model 2, given by

Y = 9,750 + 145*X1 + 75*X2 (M2).

(4) Consider the ANOVA table from fitting this model to the same sample of 50 observations that we used to fit M1.

a. (4 points) Compute the R-squared and adjusted R-squared values for this regression model.

b. (2 points) Compute the estimate of the Mean Square Error (MSE).

c. (4 points) State the hypotheses and compute the test statistic for the overall F-test.

(5) Now let’s consider M1 and M2 as a pair of models. We want to decide which model we should use as our final model. Here are some concepts to help us make that decision.

a. (2 points) What is the definition of a nested model?

b. (2 points) Does M1 nest M2 or does M2 nest M1?

c. (2 points) Based on any of the metrics or statistics that you have computed in Questions #3 and #4, which model should we prefer (M1 or M2) and why?

d. (10 points) Perform an F-test for nested models and determine if we should choose M1 or M2. State the hypotheses that we will be testing, compute the test statistic, and test the statistical significance using a critical value for alpha=0.05 from Table A.4 on page 376 in Regression Analysis By Example.
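The F-test for nested models compares the error sums of squares of the full and reduced models. A minimal sketch of the arithmetic follows. The two error sums of squares (200 and 260) are placeholders for illustration, NOT values from the assignment's ANOVA tables, and all data set and variable names are hypothetical.

```sas
/* Illustration only: sse_full and sse_red are placeholder values. */
data nested_f;
  n        = 50;                   /* same sample for both models               */
  k_full   = 3;                    /* predictors in the full model              */
  k_red    = 2;                    /* predictors in the reduced model           */
  sse_full = 200;                  /* placeholder error SS, full model          */
  sse_red  = 260;                  /* placeholder error SS, reduced model       */

  df_num = k_full - k_red;         /* number of coefficients being tested       */
  df_den = n - k_full - 1;         /* error degrees of freedom, full model      */

  f_stat  = ((sse_red - sse_full) / df_num) / (sse_full / df_den);
  f_crit  = finv(0.95, df_num, df_den);          /* alpha = 0.05                */
  p_value = 1 - probf(f_stat, df_num, df_den);

  format f_stat f_crit p_value 12.4;
run;

proc print data=nested_f noobs;
run;
```

An F statistic above the critical value favors keeping the extra term in the full model; otherwise the reduced model is adequate.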

(6) In Ordinary Least Squares (OLS) Regression we assume that the response variable is normally distributed with mean XB and variance sigma^2, i.e. Y ~ N(XB, sigma^2).

a. (2 points) How do we estimate sigma^2?

b. (6 points) What are two diagnostic checks of model goodness-of-fit that we perform in order to assess this distributional assumption?
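As a rough illustration of how residual-based checks can be produced in SAS, the sketch below fits a model to simulated data and draws a normal Q-Q plot of the residuals and a plot of residuals against predicted values. These are two commonly used checks, not necessarily the ones the course has in mind. The data are simulated, and the data set names, seed, and error standard deviation are assumptions made only for this example. The Root MSE that PROC REG reports is the square root of MSE = SSE/(n - p - 1), the usual estimate of sigma^2.

```sas
/* Illustration only: simulated data, not the assignment's sample of 50. */
data sim;
  call streaminit(410);            /* arbitrary seed for reproducibility        */
  do i = 1 to 50;
    x1   = rand('uniform') * 10;
    x2   = rand('uniform') * 20;
    x1sq = x1**2;
    y    = 10000 + 150*x1 + 25*x1sq + 60*x2 + rand('normal', 0, 30);
    output;
  end;
run;

/* Fit the model and save residuals and predicted values */
proc reg data=sim;
  model y = x1 x1sq x2;
  output out=resids r=resid p=pred;
run;
quit;

/* Check 1: normal Q-Q plot (with normality tests) of the residuals */
proc univariate data=resids normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;

/* Check 2: residuals against predicted values, looking for patterns or funnels */
proc sgplot data=resids;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;
```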

BINGO BONUS (10 POINTS):

Run the following SAS program:

/* Read the 25 monthly observations of steam use and average temperature */
data TEMPFILE;
  label Y = "Monthly use of steam (in pounds)";
  label X8 = "Average temperature (F)";
  input Y X8;
  cards;
10.98 35.3
11.13 29.7
12.51 30.8
8.4 58.8
9.27 61.4
8.73 71.3
6.36 74.4
8.5 76.7
7.82 70.7
9.14 57.5
8.24 46.4
12.19 28.9
11.88 28.1
9.57 39.1
10.94 46.8
9.58 48.5
10.09 59.3
8.11 70
6.83 70
8.88 74.5
7.68 72.1
8.47 58.1
8.86 44.6
10.36 33.4
11.08 28.6
;
run;

/* Fit the simple linear regression of Y on X8 */
proc reg data=TEMPFILE;
  model Y = X8;
run;
quit;

The following output should be generated from the SAS code:

BONUS WORK:

Using the above data, calculate all of the numbers in this table BY HAND. In other words, apply the regression formulas given in the text to generate the regression parameters and ANOVA table values given above. Here are the rules:

• Once you calculate a number, let me know where it is on the above table (e.g. “this is the INTERCEPT PARAMETER ESTIMATE”). If it’s easier for me to follow, then you will get more points!

• If a value is looked up from a table, tell me where you got the value (e.g. “page xxx from the textbook”, “internet web site that has F tables”, “Excel”, etc.)

• I plan to spend 5 minutes on each student's bonus problem. If I can’t find your answers (to my satisfaction) in that brief amount of time, you won’t get the points.

• I will give partial credit, so get as many points as you can.

• Grading extra work like this is time-consuming. I don’t have time to go back and give you points that I overlooked the first time. Make it clear where everything is!

• In other words: ALL SALES ARE FINAL !?
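The hand calculation itself has to be shown step by step, but the intermediate sums can be cross-checked in SAS. The sketch below applies the standard simple linear regression formulas to the same 25 observations and names each result after the PROC REG label it should correspond to. The data set names (steam, sums, by_hand) and variable names are hypothetical, and this is a checking aid under those assumptions rather than a substitute for the written work.

```sas
/* Illustration only: cross-check of the by-hand formulas. Names are hypothetical. */
data steam;
  input Y X8 @@;                   /* @@ reads several observations per line    */
  cards;
10.98 35.3  11.13 29.7  12.51 30.8  8.4 58.8  9.27 61.4
8.73 71.3  6.36 74.4  8.5 76.7  7.82 70.7  9.14 57.5
8.24 46.4  12.19 28.9  11.88 28.1  9.57 39.1  10.94 46.8
9.58 48.5  10.09 59.3  8.11 70  6.83 70  8.88 74.5
7.68 72.1  8.47 58.1  8.86 44.6  10.36 33.4  11.08 28.6
;
run;

/* Basic sums: n, means, and corrected sums of squares and cross products */
proc sql;
  create table sums as
  select count(*)                                as n,
         mean(X8)                                as xbar,
         mean(Y)                                 as ybar,
         css(X8)                                 as sxx,
         css(Y)                                  as syy,
         sum(X8*Y) - count(*)*mean(X8)*mean(Y)   as sxy
  from steam;
quit;

data by_hand;
  set sums;
  b1       = sxy / sxx;                    /* X8 Parameter Estimate             */
  b0       = ybar - b1*xbar;               /* Intercept Parameter Estimate      */
  ss_model = b1 * sxy;                     /* Model Sum of Squares              */
  ss_total = syy;                          /* Corrected Total Sum of Squares    */
  ss_error = ss_total - ss_model;          /* Error Sum of Squares              */
  df_error = n - 2;                        /* Error DF (Model DF is 1)          */
  ms_model = ss_model / 1;                 /* Model Mean Square                 */
  mse      = ss_error / df_error;          /* Error Mean Square                 */
  f_value  = ms_model / mse;               /* F Value                           */
  p_f      = 1 - probf(f_value, 1, df_error);        /* Pr > F                  */
  root_mse = sqrt(mse);                    /* Root MSE                          */
  dep_mean = ybar;                         /* Dependent Mean                    */
  coef_var = 100 * root_mse / ybar;        /* Coeff Var                         */
  r_square = ss_model / ss_total;          /* R-Square                          */
  adj_rsq  = 1 - (1 - r_square)*(n - 1)/df_error;    /* Adj R-Sq                */
  se_b1    = sqrt(mse / sxx);              /* X8 Standard Error                 */
  se_b0    = sqrt(mse * (1/n + xbar**2/sxx));        /* Intercept Standard Error */
  t_b1     = b1 / se_b1;                   /* X8 t Value                        */
  t_b0     = b0 / se_b0;                   /* Intercept t Value                 */
  p_b1     = 2 * (1 - probt(abs(t_b1), df_error));   /* X8 Pr > |t|             */
  p_b0     = 2 * (1 - probt(abs(t_b0), df_error));   /* Intercept Pr > |t|      */
run;

proc print data=by_hand noobs;
  format _numeric_ 12.4;
run;
```

In a simple linear regression the F Value equals the square of the X8 t Value, which gives a quick consistency check on the hand arithmetic.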