Simple Linear Regression Analysis

By Issa Bass


Good, reliable business decision making is always founded on a clear knowledge of how a change in one variable affects the other variables that are in one way or another associated with it.
How will commercial banks react to a change in the interest rate by the Federal Reserve?
How does that change affect our mortgages?
How does an increase in the price of gas affect the volume of cars sold in the economy?
How do changes in the prices of a given input affect the cost of the output?

Regression analysis is the part of statistics that analyzes the relationship between quantitative variables. It helps predict how one variable reacts when a related variable changes.
The objective here is to determine how the predicted, or dependent, variable y (the variable to be estimated) reacts to variations in the predictor, or independent, variables.

The first step should be to determine whether there is any relationship between the independent and dependent variables, and if there is any, how important it is.
The covariance, the coefficient of correlation and the coefficient of determination can establish this relationship and measure its strength. But these alone do not help make accurate predictions of how variations in the independent variables affect the dependent variable.
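To make these three measures concrete, the sketch below computes them in plain Python for the gas price and quantity figures tabulated later in this article. The function names are illustrative, not taken from any package, and the sample (n - 1) form of the covariance is an assumption.

```python
import math

# Price per gallon (x) and quantity sold in thousands (y), from the article's table.
price = [1.62, 1.667, 1.69, 1.70, 1.72, 1.73, 1.736, 1.74,
         1.75, 1.755, 1.756, 1.77, 1.767, 1.756, 1.77]
quantity = [159, 160, 163, 166, 167, 167, 168, 167,
            167.9, 168.9, 169, 169, 170, 171, 172]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1 (assumed form)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson coefficient of correlation r, always between -1 and 1."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y))


cov_xy = sample_covariance(price, quantity)
r = correlation(price, quantity)
r_squared = r ** 2  # coefficient of determination for a one-predictor model

print(f"covariance = {cov_xy:.4f}")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

A value of r close to 1 or -1 indicates a strong linear relationship, but none of these numbers says how much y changes for a given change in x. That is what the regression equation adds.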
The objective of regression analysis is to build a mathematical model that will help make accurate predictions about the impact of changes in those variables.
In most cases, more than one independent variable can cause variations in a dependent variable.

Example: more than one factor can explain the changes in the volume of cars sold by a given car maker. Among other factors, we can name the price of the cars, the gas mileage, the warranty, the comfort, the reliability, the population growth, the competing companies, and so on. But these factors do not contribute equally to the variation of the dependent variable. So in some cases, it is more beneficial to concentrate on one factor than to analyze all the competing factors.

When building a regression model, if more than one independent variable is being considered, the analysis is called a multiple regression analysis; if only one independent variable is being considered, it is a simple linear regression.
In our quest for that model, we will start with the techniques that enable us to find the relatedness between two variables.

2.1 Simple Linear Regression (or first-order linear model)

Simple regression analysis is a bivariate regression in that it involves only two variables: the independent and the dependent variable. The model we will attempt to build is a simple linear equation that expresses the relationship between the two variables.
We will attempt to build a model that enables us to predict the volume of gas demanded by a given community when the price of that commodity varies.
The first step in building the model is, as always, gathering and organizing the data.

The following table summarizes the price of gas and the quantity sold over a period of 15 years in a given city. The model we are about to build is intended to help predict the quantity of gas that will be sold at a given price.

Year       Price per gallon ($)   Quantity sold (in 1,000 gallons)
Year 1     1.62                   159
Year 2     1.667                  160
Year 3     1.69                   163
Year 4     1.70                   166
Year 5     1.72                   167
Year 6     1.73                   167
Year 7     1.736                  168
Year 8     1.74                   167
Year 9     1.75                   167.9
Year 10    1.755                  168.9
Year 11    1.756                  169
Year 12    1.77                   169
Year 13    1.767                  170
Year 14    1.756                  171
Year 15    1.77                   172

The first thing to note is that in this case the independent variable x is the price of gas (since it explains the variations of the quantity sold) and the dependent variable y is the quantity sold.
The equation we are looking for will be in the form of $y = f(x)$.
A scatter plot can help visualize the relationship between the two variables x and y.

Using Minitab:

[Figure: scatter plot of quantity sold (y) against price per gallon (x), with a fitted line]

The vertical distance between the line and each point is called the error of prediction.

The equation we are about to derive from the data will determine the line that passes through the dots representing the combinations of price and quantity.
The equation $y = f(x)$ will yield two points of interest: the slope of the line and the intercept of the line.
So $f(x)$ will be of the form $y = ax + b$, where a is the slope of the line and b is the y intercept.

In statistics, the letter most commonly used to represent the slope and the intercept is the Greek letter $\beta$.
With $\beta_0$ representing the y intercept and $\beta_1$ the slope of the line, the equation becomes $y = \beta_0 + \beta_1 x$.

If the independent variable is known with certainty and it is the only variable that can affect the dependent variable y, the model will generate an exactly predictable output. In that case, the model is called a deterministic model and it takes the form:

$y = \beta_0 + \beta_1 x$

But in most cases, the independent variable is not the only factor affecting y, so the value of y will not always equal the value generated by the equation for a given x.
That is why an error term is added to the deterministic model to take the uncertainty into account. The probabilistic model is the deterministic component plus a random error:

$y = \beta_0 + \beta_1 x + \varepsilon$

where $\varepsilon$ represents the error term.
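The difference between the two models can be illustrated with a short simulation. This is a minimal sketch: the coefficient values, the noise level and the function names are hypothetical choices made for illustration, and the error term is assumed to be normally distributed with mean zero, which the article does not specify.

```python
import random


def deterministic(x, beta0, beta1):
    """Deterministic model: y is fully determined by x."""
    return beta0 + beta1 * x


def probabilistic(x, beta0, beta1, sigma, rng):
    """Probabilistic model: deterministic part plus a random error term."""
    epsilon = rng.gauss(0.0, sigma)  # assumed: normal error with mean 0
    return deterministic(x, beta0, beta1) + epsilon


rng = random.Random(42)                 # fixed seed so the run is repeatable
beta0, beta1, sigma = 20.0, 85.0, 1.5   # hypothetical parameter values

for x in (1.65, 1.70, 1.75):
    exact = deterministic(x, beta0, beta1)
    observed = probabilistic(x, beta0, beta1, sigma, rng)
    print(f"x = {x:.2f}  deterministic y = {exact:.2f}  observed y = {observed:.2f}")
```

With the same x, the deterministic model always returns the same y, while the probabilistic model returns a value scattered around it. Real data behave like the second case, which is why the parameters have to be estimated rather than read off directly.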

Least Squares Method

To determine the equation of the model, what we are looking for is the values of $\beta_0$ and $\beta_1$. The method used for that purpose is called the least squares method.

As mentioned earlier, the vertical distance between each point and the line is called the error of prediction. What we are looking for is the line that generates the smallest sum of squared errors of prediction, in other words the least squares regression line.
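Stated as an optimization problem, the criterion picks the intercept and slope that minimize the sum of the squared errors of prediction. The derivation below is a standard sketch; the name SSE for that sum and the index notation are assumed conventions, not notation preserved from the article.

```latex
\begin{aligned}
SSE(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \\
\frac{\partial SSE}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0 \\
\frac{\partial SSE}{\partial \beta_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
\end{aligned}
```

Setting both partial derivatives to zero gives two linear equations in the two unknowns, and solving them simultaneously yields the slope and the intercept of the fitted line.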
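The equations for $\beta_1$ and $\beta_0$ are obtained from the following formula:

$\beta_1 = \frac{SS_{xy}}{SS_{xx}}$

In other words,

$\beta_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$

which can be rewritten as:

$\beta_1 = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$

The y intercept is obtained from the following equation:

$\beta_0 = \bar{y} - \beta_1 \bar{x}$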
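The slope can be computed either from deviations about the means or from raw sums, and the two forms are algebraically equivalent. The sketch below implements both, plus the intercept, in plain Python so the equivalence can be confirmed numerically. The function names and the four-point data set are made up for illustration; the formulas assumed are the standard least squares ones (slope = SSxy / SSxx, intercept = mean of y minus slope times mean of x).

```python
def slope_from_deviations(x, y):
    """Slope as sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    return ss_xy / ss_xx


def slope_from_raw_sums(x, y):
    """Same slope, computed from raw sums (the 'computational' form)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    return (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)


def intercept(x, y, slope):
    """Intercept: the fitted line passes through the point (x_bar, y_bar)."""
    n = len(x)
    return sum(y) / n - slope * sum(x) / n


# Small made-up data set, chosen only so the arithmetic is easy to follow.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # lies exactly on y = 1 + 2x

b1 = slope_from_deviations(xs, ys)
b0 = intercept(xs, ys, b1)
print(b1, slope_from_raw_sums(xs, ys), b0)
```

In code, the deviation form is usually the safer choice: the raw-sum form subtracts two large, nearly equal numbers and can lose precision when the x values are close together.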

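Now that we have the formulas for the parameters of the equation, we can build the model for the gas price and quantity data.

We will need to add columns to the table: the deviations of x and y from their means, the product of those deviations, and the squared deviation of x.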

Year      Price per    Quantity   x - x̄          y - ȳ          (x - x̄)(y - ȳ)   (x - x̄)²
          gallon (x)   sold (y)
Year 1    1.62         159        -0.108466667   -7.986666667    0.866287111     0.011765018
Year 2    1.667        160        -0.061466667   -6.986666667    0.429447111     0.003778151
Year 3    1.69         163        -0.038466667   -3.986666667    0.153353778     0.001479684
Year 4    1.7          166        -0.028466667   -0.986666667    0.028087111     0.000810351
Year 5    1.72         167        -0.008466667    0.013333333   -0.000112889     7.17E-05
Year 6    1.73         167         0.001533333    0.013333333    2.04E-05        2.35E-06
Year 7    1.736        168         0.007533333    1.013333333    0.007633778     5.68E-05
Year 8    1.74         167         0.011533333    0.013333333    0.000153778     0.000133018
Year 9    1.75         167.9       0.021533333    0.913333333    0.019667111     0.000463684
Year 10   1.755        168.9       0.026533333    1.913333333    0.050767111     0.000704018
Year 11   1.756        169         0.027533333    2.013333333    0.055433778     0.000758084
Year 12   1.77         169         0.041533333    2.013333333    0.083620444     0.001725018
Year 13   1.767        170         0.038533333    3.013333333    0.116113778     0.001484818
Year 14   1.756        171         0.027533333    4.013333333    0.110500444     0.000758084
Year 15   1.77         172         0.041533333    5.013333333    0.208220444     0.001725018
Totals                                                           2.129193333     0.0257157
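The added columns and their totals can be reproduced with a few lines of Python, which is a convenient check on the hand calculation. This is a sketch using only the standard library; the variable names are illustrative.

```python
# Price per gallon (x) and quantity sold in thousands (y), from the table above.
price = [1.62, 1.667, 1.69, 1.70, 1.72, 1.73, 1.736, 1.74,
         1.75, 1.755, 1.756, 1.77, 1.767, 1.756, 1.77]
quantity = [159, 160, 163, 166, 167, 167, 168, 167,
            167.9, 168.9, 169, 169, 170, 171, 172]

n = len(price)
x_bar = sum(price) / n
y_bar = sum(quantity) / n

# One row per year: deviations, their product, and the squared x deviation.
rows = []
for year, (x, y) in enumerate(zip(price, quantity), start=1):
    dx, dy = x - x_bar, y - y_bar
    rows.append((year, x, y, dx, dy, dx * dy, dx ** 2))

ss_xy = sum(row[5] for row in rows)   # total of the cross-product column
ss_xx = sum(row[6] for row in rows)   # total of the squared-deviation column

for year, x, y, dx, dy, cross, sq in rows:
    print(f"Year {year:2d} {x:6.3f} {y:6.1f} {dx:13.9f} {dy:13.9f} {cross:13.9f} {sq:13.9f}")
print(f"Totals: SSxy = {ss_xy:.9f}, SSxx = {ss_xx:.9f}")

slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Only the two totals and the two means are needed to obtain the slope and the intercept; the individual rows are printed simply to allow a line-by-line comparison with the table.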

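For a deterministic model, the totals of the last two columns give the slope, and the means of x and y give the intercept:

$\beta_1 = \frac{2.129193333}{0.0257157} \approx 82.797$

$\beta_0 = \bar{y} - \beta_1 \bar{x} \approx 166.986667 - 82.7973 \times 1.728467 \approx 23.874$

so the fitted line is

$y = 23.874 + 82.797x$

We could also have obtained these values with Excel or Minitab.

How much gas should we expect to sell if the price per gallon becomes $2?
All we need to do is replace x by 2 to obtain:

$y = 23.874 + 82.797(2) = 189.468$

Since the quantity is expressed in thousands, if the price of gas per gallon becomes $2, the predicted quantity sold is about 189,468 gallons.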
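The substitution can be checked in code. The sketch below assumes the coefficients rounded to three decimals, intercept 23.874 and slope 82.797, which follow from the table totals (2.129193333 / 0.0257157 for the slope); the function and constant names are illustrative. Because the quantities are recorded in thousands of gallons, the result is multiplied by 1,000.

```python
INTERCEPT = 23.874   # rounded estimate derived from the table totals
SLOPE = 82.797       # rounded estimate derived from the table totals


def predict_quantity(price_per_gallon):
    """Predicted quantity sold, in thousands of gallons, from the fitted line."""
    return INTERCEPT + SLOPE * price_per_gallon


predicted_thousands = predict_quantity(2.00)
predicted_gallons = predicted_thousands * 1000
print(f"Predicted quantity at $2.00: {predicted_gallons:,.0f} gallons")
```

A price of $2 lies outside the observed range of $1.62 to $1.77, so this prediction is an extrapolation and deserves more caution than one made inside that range.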


About the author
Issa Bass is the managing editor of SixSigmaFirst. He can be reached at issa@sixsigmafirst.com
