xav big data discussion 3 and replies 1

Discuss any of the following topics from the textbook:

“ISLR” An Introduction to Statistical Learning, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, http://www-bcf.usc.edu/~gareth/ISL/index.html

3 Linear Regression

3.1 Simple Linear Regression

3.1.1 Estimating the Coefficients

3.1.2 Assessing the Accuracy of the Coefficient Estimates

3.1.3 Assessing the Accuracy of the Model

3.2 Multiple Linear Regression

3.2.1 Estimating the Regression Coefficients

3.2.2 Some Important Questions

3.3 Other Considerations in the Regression Model

3.3.1 Qualitative Predictors

3.3.2 Extensions of the Linear Model

3.3.3 Potential Problems

3.4 The Marketing Plan

3.5 Comparison of Linear Regression with K-Nearest Neighbors


Words: 250

Also provide replies to the 3 student posts below, each in 150 words.

chaitanya – One very important question is how variables are related. For example, we could ask about the relationship between people’s weights and heights, or study time and test scores, or two animal populations. Regression is a set of techniques for estimating relationships, and we’ll focus on them for the next two chapters. In this chapter, we’ll focus on finding one of the simplest types of relationship: linear. This process is unsurprisingly called linear regression, and it has many applications. For example, we can relate the force needed to stretch a spring to the distance that the spring stretches, or explain how many transistors the semiconductor industry can pack into a circuit over time. Despite its simplicity, linear regression is an incredibly powerful tool for analysing data. While we’ll focus on the basics in this chapter, the next chapter will show how just a few small tweaks and extensions can enable more complex analyses.

Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X1, X2, ..., Xp is linear. True regression functions are never exactly linear, so the linear model is an approximation.
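To make the linearity assumption concrete, here is a minimal sketch of fitting Y as a linear function of two predictors by least squares. The function names (solve_linear_system, fit_multiple_regression) and the data are made up for illustration, and solving the normal equations directly is just one simple way to do it, not necessarily how the textbook or any library does it.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations act on both.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bp] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 is the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from Y = 1 + 2*X1 - 3*X2 with no noise.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefficients])
```

Because the made-up data are exactly linear, the fit recovers the generating coefficients up to rounding. With real data the fitted coefficients would only approximate the underlying relationship.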

By examining the second equation for the estimated slope β̂1, we see that since the sample standard deviations sx and sy are positive quantities, the slope has the same sign as the correlation coefficient r. This coefficient, which is always between −1 and 1, measures how strongly x is linearly related to y and whether the trend is positive or negative. Figure 3.2 illustrates different correlation strengths. The square of the correlation coefficient, r², is never negative and is called the coefficient of determination. As we’ll see later, it is also equal to the proportion of the total variability that’s explained by a linear model. As an extremely crucial remark, correlation does not imply causation! We devote the entire next page to this point, which is one of the most common sources of error in interpreting statistics.
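The relationship between the slope, r, and r² can be checked numerically. The sketch below assumes the slope equation referred to above has the standard form slope = r × sy / sx (the post does not show it), and it uses made-up study-time and test-score numbers.

```python
from statistics import mean, stdev

# Made-up study-time (hours) and test-score data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [52.0, 55.0, 61.0, 64.0, 70.0, 71.0]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)

# Sample correlation coefficient r.
n = len(x)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / (s_x * s_y)

# Slope written in terms of r (assumed form), and the matching intercept.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

# Proportion of total variability explained by the fitted line.
fitted = [intercept + slope * xi for xi in x]
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
explained = 1 - ss_residual / ss_total

print(f"r = {r:.4f}, slope = {slope:.4f}")
print(f"r^2 = {r * r:.4f}, explained proportion = {explained:.4f}")
```

The two numbers on the last line should agree up to rounding, which is the "proportion of variability explained" reading of r² for a simple linear fit.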

In particular, the residual is defined to be yi − ŷi: the distance from the original data point to the predicted value on the line. You can think of it as the error left over after the model has done its work. This difference is shown graphically in Figure 3.5. Note that the residual yi − ŷi isn’t quite the same as the noise ε! We’ll talk a little more about analyzing residuals (and why this distinction matters) in the next chapter. If our model is doing a good job, then it should explain most of the difference from ȳ, and the first term should be bigger than the second term. If the second term is much bigger, then the model is probably not as useful.
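A small numeric sketch of residuals follows. The post does not show the equation whose "first term" and "second term" it mentions, so this assumes the usual split of the total sum of squares around ȳ into an explained part and a residual part. The height and weight numbers are made up.

```python
from statistics import mean

# Made-up height (cm) and weight (kg) pairs, for illustration only.
x = [150.0, 160.0, 165.0, 170.0, 180.0, 185.0]
y = [52.0, 58.0, 63.0, 65.0, 75.0, 77.0]

# Least-squares line through the data.
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # y_i - y_hat_i

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_explained = sum((fi - y_bar) ** 2 for fi in fitted)  # assumed "first term"
ss_residual = sum(e ** 2 for e in residuals)            # assumed "second term"

print("residuals:", [round(e, 3) for e in residuals])
print(f"total = {ss_total:.3f}")
print(f"explained = {ss_explained:.3f}, residual = {ss_residual:.3f}")
```

For a least-squares line with an intercept, the explained and residual parts add up to the total, so comparing them shows how much of the spread around ȳ the line accounts for.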

kranthi – Linear regression is a technique used to model the relationships between observed variables. The idea behind simple linear regression is to “fit” the observations of two variables into a linear relationship between them (James et al., 2017). Graphically, the task is to draw the line that is “best-fitting” or “closest” to the points (xi, yi), where xi and yi are observations of the two variables which are expected to depend linearly on each other.

Regression is a common process used in many applications of statistics in the real world. There are two main types of applications:

Predictions: After a series of observations of variables, regression analysis gives a statistical model for the relationship between the variables (Seal, 1997). This model can be used to generate predictions: given two variables x and y, the model can predict values of y given future observations of x. This idea is used to predict variables in countless situations, e.g. the outcome of political elections, the behavior of the stock market, or the performance of a professional athlete.

Correlation: The model given by a regression analysis will often fit some kinds of data better than others. This can be used to analyze correlations between variables and to refine a statistical model to incorporate further inputs: if the model describes certain subsets of the data points very well, but is a poor predictor for other data points, it can be instructive to examine the differences between the different types of data points for a possible explanation. This type of application is common in scientific tests, e.g. of the effects of a proposed drug on the patients in a controlled study.

urmila – Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple variables. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. (Schneider, Hommel, & Blettner, 2010)

Regression analysis uses a model that describes the relationships between the dependent variables and the independent variables in a simplified mathematical form. There may be biological reasons to expect a priori that a certain type of mathematical function will best describe such a relationship, or simple assumptions have to be made that this is the case.

Linear regression is used to study the linear relationship between a dependent variable Y (blood pressure) and one or more independent variables X (age, weight, sex). The dependent variable Y must be continuous, while the independent variables may be either continuous (age), binary (sex), or categorical (social status).
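The three kinds of independent variable mentioned above (a numeric one such as age, a two-level one such as sex, and a multi-level one) have to be turned into numbers before a linear model can use them. The sketch below shows one common coding; the function name encode_patient, the level names, and the choice of baseline are all hypothetical.

```python
def encode_patient(age, sex, status, status_levels=("low", "middle", "high")):
    """Turn one patient's predictors into a numeric row for a linear model.

    Hypothetical coding: sex becomes a single 0/1 column, and status is
    expanded into indicator (dummy) columns with the first level as baseline.
    """
    row = [float(age)]                           # numeric: used as-is
    row.append(1.0 if sex == "female" else 0.0)  # two levels: one 0/1 indicator
    for level in status_levels[1:]:              # k levels: k - 1 indicators
        row.append(1.0 if status == level else 0.0)
    return row


# Made-up patients, for illustration only.
patients = [
    (34, "female", "low"),
    (51, "male", "middle"),
    (67, "female", "high"),
]
for patient in patients:
    print(patient, "->", encode_patient(*patient))
```

Leaving out one level as the baseline keeps the indicator columns from duplicating the intercept. Each remaining coefficient is then read as a difference from that baseline level.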

The initial judgment of a possible relationship between two continuous variables should always be made on the basis of a scatter plot (scatter diagram). Performing a linear regression makes sense only if the relationship is linear. Other methods must be used to study nonlinear relationships. (Schneider, Hommel, & Blettner, 2010)

Univariable linear regression studies the linear relationship between the dependent variable Y and a single independent variable X. The linear regression model describes the dependent variable with a straight line that is defined by the equation Y = a + b × X, where a is the y-intercept of the line, and b is its slope. (Schneider, Hommel, & Blettner, 2010)

First, the parameters a and b of the regression line are estimated from the values of the dependent variable Y and the independent variable X with the aid of statistical methods. The regression line enables one to predict the value of the dependent variable Y from that of the independent variable X. (Schneider, Hommel, & Blettner, 2010)
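As a rough sketch of that estimation step, the code below computes a and b for the line Y = a + b × X with the ordinary least-squares formulas and then predicts Y for a new X. The post does not say which statistical method is meant, so least squares is an assumption here. The function name fit_line and the age and blood-pressure values are made up.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares estimates (a, b) for the line Y = a + b * X."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


# Made-up age (years) and systolic blood pressure (mmHg) values.
age = [30.0, 40.0, 50.0, 60.0, 70.0]
pressure = [118.0, 124.0, 131.0, 135.0, 142.0]

a, b = fit_line(age, pressure)
predicted = a + b * 55.0  # predicted Y for a new X inside the observed range
print(f"a = {a:.2f}, b = {b:.3f}, predicted Y at X = 55: {predicted:.2f}")
```

The prediction is made for an X inside the range of the observed data. Predicting far outside that range relies on the line continuing to hold where there are no observations to support it.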