Simple Linear Regression Analysis
By
Issa Bass
Good business decision-making is founded on a clear understanding of how a change in one variable affects the other variables associated with it.
How will commercial banks react to a change in the interest rate by the
Federal Reserve?
How does that change affect our mortgages?
How does an increase in the price of gas affect the volume of cars sold in
the economy?
How do changes in the prices of a given input affect the cost of the output?
Regression analysis is the branch of statistics that analyzes the relationship between quantitative variables. It helps predict how one variable responds when a related variable changes.
The objective is to determine how the predicted, or dependent, variable y (the variable to be estimated) responds to variations in the predictor, or independent, variables.
The first step is to determine whether there is any relationship between the independent and dependent variables and, if there is, how strong it is.
The covariance, the coefficient of correlation, and the coefficient of determination can detect this relationship and measure its strength. But these measures alone do not tell us by how much the dependent variable changes when the independent variables vary, so they cannot be used to make accurate predictions.
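As a sketch of how these three measures are computed, the Python snippet below applies them to the 15-year gas price and quantity data used later in this article. The function name `association_measures` is hypothetical; the covariance uses the sample divisor n - 1 (one common convention), and the coefficient of determination is taken as the square of the correlation coefficient, which holds for a simple linear regression.

```python
from math import sqrt
from statistics import mean

# Gas price per gallon (x) and quantity sold in thousands (y), Years 1-15,
# copied from the data table in this article.
prices = [1.62, 1.667, 1.69, 1.70, 1.72, 1.73, 1.736, 1.74,
          1.75, 1.755, 1.756, 1.77, 1.767, 1.756, 1.77]
quantities = [159, 160, 163, 166, 167, 167, 168, 167,
              167.9, 168.9, 169, 169, 170, 171, 172]


def association_measures(xs, ys):
    """Return (sample covariance, correlation r, determination R^2)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    covariance = ss_xy / (n - 1)      # sample covariance (n - 1 divisor)
    r = ss_xy / sqrt(ss_xx * ss_yy)   # Pearson coefficient of correlation
    return covariance, r, r ** 2      # R^2 = r^2 in simple regression


cov, r, r_squared = association_measures(prices, quantities)
print(f"covariance={cov:.6f}  r={r:.4f}  R^2={r_squared:.4f}")
```

A positive r close to 1 indicates a strong positive linear association, but, as noted above, none of these numbers says how many extra gallons are sold per dollar of price change.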
The objective of regression analysis is to build a mathematical model that makes accurate predictions about the impact of such variations.
In most cases, more than one independent variable can cause variations in a dependent variable.
For example, several factors can explain changes in the volume of cars sold by a given car maker: among others, the price of the cars, gas mileage, the warranty, comfort, reliability, population growth, and the competing companies. But these factors do not contribute equally to the variation of the dependent variable, so in some cases it is more useful to concentrate on one factor than to analyze all of the competing factors.
When a regression model considers more than one independent variable, the analysis is called multiple regression; when it considers only one independent variable, the analysis is called simple linear regression.
In our quest for that model, we will start with the techniques that measure the relationship between two variables.
2.1 Simple Linear Regression (or first-order linear model)
Simple regression analysis is a bivariate regression in that it involves only two variables: the independent variable and the dependent variable. The model we will build is a simple linear equation that describes the relationship between the two variables.
We will build a model that predicts the volume of gas demanded by a given community when the price of gas varies.
As always, the first step in building the model is gathering and organizing the data.
The following table summarizes the price of gas and the quantity sold over a period of 15 years in a given city. The model we are about to build is intended to help predict the quantity of gas that will be sold at a given price.
| Year | Price per gallon ($) | Quantity sold (thousands of gallons) |
|---|---|---|
| Year 1 | 1.62 | 159 |
| Year 2 | 1.667 | 160 |
| Year 3 | 1.69 | 163 |
| Year 4 | 1.70 | 166 |
| Year 5 | 1.72 | 167 |
| Year 6 | 1.73 | 167 |
| Year 7 | 1.736 | 168 |
| Year 8 | 1.74 | 167 |
| Year 9 | 1.75 | 167.9 |
| Year 10 | 1.755 | 168.9 |
| Year 11 | 1.756 | 169 |
| Year 12 | 1.77 | 169 |
| Year 13 | 1.767 | 170 |
| Year 14 | 1.756 | 171 |
| Year 15 | 1.77 | 172 |
Note first that in this case the independent variable x is the price of gas (since it explains the variations in the quantity sold), and the dependent variable y is the quantity sold.
The equation we are looking for will be of the form \( y = f(x) \).
A scatter plot can help visualize the relationship between the two
variables x and y.
[Figure: scatter plot of quantity sold (y) versus price per gallon (x), produced using Minitab]
The vertical distance between the regression line and each point is called the error of prediction (or residual).
The equation we are about to derive from the data determines the line that best fits the points representing the combinations of price and quantity.
The equation will yield two parameters of interest: the slope of the line and its intercept. So \( \hat{y} \) will be of the form \( \hat{y} = ax + b \), where a is the slope of the line and b is the y-intercept.
In statistics, the slope and the intercept are most commonly represented by the Greek letter \( \beta \), with \( \beta_0 \) representing the y-intercept and \( \beta_1 \) the slope of the line. Therefore:
\[ \hat{y} = \beta_0 + \beta_1 x \]
If the independent variable is known with certainty and it is the only variable that can affect the dependent variable Y, the model will generate an exact, predictable output. In that case, the model is called a deterministic model and takes the form:
\[ Y = \beta_0 + \beta_1 x \]
But in most cases, the independent variable is not the only factor affecting Y, so the value of Y will not always equal the value generated by the equation for a given x.
That is why an error term is added to the deterministic model to account for this uncertainty. The equation for the probabilistic model is:
\[ Y = \beta_0 + \beta_1 x + \varepsilon \]
Or, for each observation i:
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Where \( \varepsilon \) represents the error term.
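The difference between the two models can be illustrated with a small simulation. This is a sketch under stated assumptions: the values of `BETA_0`, `BETA_1`, and a normally distributed error term with standard deviation `SIGMA` are hypothetical choices for illustration, not estimates from the article's data.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA_0 = 24.0
BETA_1 = 83.0
SIGMA = 1.5  # assumed standard deviation of the normal error term


def deterministic(x):
    """Deterministic model: the same x always gives the same Y."""
    return BETA_0 + BETA_1 * x


def probabilistic(x, rng):
    """Probabilistic model: Y = beta_0 + beta_1 * x + epsilon."""
    epsilon = rng.gauss(0.0, SIGMA)  # random error term
    return deterministic(x) + epsilon


rng = random.Random(42)  # fixed seed so the run is reproducible
x = 1.75
print("deterministic:", deterministic(x), deterministic(x))
print("probabilistic:", [round(probabilistic(x, rng), 2) for _ in range(3)])
```

Repeated calls to the deterministic model return identical values, while the probabilistic model scatters around that value; on average the error term cancels out.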
Least Squares Method
To determine the equation of the model, we need the values of \( \beta_0 \) and \( \beta_1 \).
The method used for that purpose is called the least squares method.
As mentioned earlier, the vertical distance between each point and the line is called the error of prediction. What we are looking for is the line that minimizes the sum of the squared errors of prediction,
\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
in other words, the least squares regression line.
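The slope \( \beta_1 \) is obtained from the following formula:
\[ \beta_1 = \frac{SS_{xy}}{SS_{xx}} \]
In other words:
\[ SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
where \( \bar{x} \) and \( \bar{y} \) are the sample means of x and y. So \( \beta_1 \) can be rewritten as:
\[ \beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
The y-intercept \( \beta_0 \) is obtained from the following equation:
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]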
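These formulas translate directly into code. The sketch below (standard-library Python; the helper names `least_squares` and `sse` are hypothetical) computes \( \beta_0 \) and \( \beta_1 \) for the gas data, then checks numerically that nudging either coefficient increases the sum of squared errors, which is the property that defines the least squares line.

```python
from statistics import mean

# Price per gallon (x) and quantity sold in thousands (y), from the data table.
prices = [1.62, 1.667, 1.69, 1.70, 1.72, 1.73, 1.736, 1.74,
          1.75, 1.755, 1.756, 1.77, 1.767, 1.756, 1.77]
quantities = [159, 160, 163, 166, 167, 167, 168, 167,
              167.9, 168.9, 169, 169, 170, 171, 172]


def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx        # slope: SSxy / SSxx
    b0 = y_bar - b1 * x_bar   # intercept: y_bar - b1 * x_bar
    return b0, b1


def sse(b0, b1, xs, ys):
    """Sum of squared errors of prediction for y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


b0, b1 = least_squares(prices, quantities)
best = sse(b0, b1, prices, quantities)

# Moving either coefficient away from the least squares values
# should only increase the sum of squared errors.
for db0, db1 in [(0.5, 0.0), (-0.5, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    assert sse(b0 + db0, b1 + db1, prices, quantities) > best
print(f"b0={b0:.4f}  b1={b1:.4f}  SSE={best:.4f}")
```

Because the SSE is a quadratic function of the two coefficients with a single minimum, any such perturbation can only make the fit worse.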
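Now that we have the formulas for the parameters of the equation, we can build the model for gas price and quantity.
We need to add columns to the data table for \( x - \bar{x} \), \( y - \bar{y} \), their product, and \( (x - \bar{x})^2 \), using the sample means \( \bar{x} = 25.927/15 \approx 1.728467 \) and \( \bar{y} = 2504.8/15 \approx 166.986667 \).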
| Year | Price per gallon (\(x\)) | Quantity sold (\(y\)) | \(x - \bar{x}\) | \(y - \bar{y}\) | \((x - \bar{x})(y - \bar{y})\) | \((x - \bar{x})^2\) |
|---|---|---|---|---|---|---|
| Year 1 | 1.62 | 159 | -0.108466667 | -7.986666667 | 0.866287111 | 0.011765018 |
| Year 2 | 1.667 | 160 | -0.061466667 | -6.986666667 | 0.429447111 | 0.003778151 |
| Year 3 | 1.69 | 163 | -0.038466667 | -3.986666667 | 0.153353778 | 0.001479684 |
| Year 4 | 1.7 | 166 | -0.028466667 | -0.986666667 | 0.028087111 | 0.000810351 |
| Year 5 | 1.72 | 167 | -0.008466667 | 0.013333333 | -0.000112889 | 7.17E-05 |
| Year 6 | 1.73 | 167 | 0.001533333 | 0.013333333 | 2.04E-05 | 2.35E-06 |
| Year 7 | 1.736 | 168 | 0.007533333 | 1.013333333 | 0.007633778 | 5.68E-05 |
| Year 8 | 1.74 | 167 | 0.011533333 | 0.013333333 | 0.000153778 | 0.000133018 |
| Year 9 | 1.75 | 167.9 | 0.021533333 | 0.913333333 | 0.019667111 | 0.000463684 |
| Year 10 | 1.755 | 168.9 | 0.026533333 | 1.913333333 | 0.050767111 | 0.000704018 |
| Year 11 | 1.756 | 169 | 0.027533333 | 2.013333333 | 0.055433778 | 0.000758084 |
| Year 12 | 1.77 | 169 | 0.041533333 | 2.013333333 | 0.083620444 | 0.001725018 |
| Year 13 | 1.767 | 170 | 0.038533333 | 3.013333333 | 0.116113778 | 0.001484818 |
| Year 14 | 1.756 | 171 | 0.027533333 | 4.013333333 | 0.110500444 | 0.000758084 |
| Year 15 | 1.77 | 172 | 0.041533333 | 5.013333333 | 0.208220444 | 0.001725018 |
| Totals | | | | | 2.129193333 | 0.0257157 |

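The worksheet columns and their totals can be reproduced with a short script. This is a sketch that recomputes the table above from the raw data instead of entering the deviations by hand; the variable names are illustrative, and `rows` holds one tuple per year in the same column order as the table.

```python
from statistics import mean

# Raw data from the table: price per gallon (x) and quantity sold (y).
years = range(1, 16)
prices = [1.62, 1.667, 1.69, 1.70, 1.72, 1.73, 1.736, 1.74,
          1.75, 1.755, 1.756, 1.77, 1.767, 1.756, 1.77]
quantities = [159, 160, 163, 166, 167, 167, 168, 167,
              167.9, 168.9, 169, 169, 170, 171, 172]

x_bar, y_bar = mean(prices), mean(quantities)

# Each row: year, x, y, x - x_bar, y - y_bar, product, squared x deviation.
rows = []
for year, x, y in zip(years, prices, quantities):
    dx, dy = x - x_bar, y - y_bar
    rows.append((year, x, y, dx, dy, dx * dy, dx ** 2))

for row in rows:
    print("Year{:<3d} {:6.3f} {:6.1f} {:13.9f} {:13.9f} {:13.9f} {:12.9f}".format(*row))

ss_xy = sum(row[5] for row in rows)  # total of the (x - x_bar)(y - y_bar) column
ss_xx = sum(row[6] for row in rows)  # total of the (x - x_bar)^2 column
print(f"Totals: {ss_xy:.9f} {ss_xx:.7f}")
```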
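Substituting the column totals into the formulas gives the deterministic model:
\[ \beta_1 = \frac{2.129193333}{0.0257157} \approx 82.797 \]
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} = 166.986667 - \beta_1 \times 1.728467 \approx 23.874 \]
\[ \hat{y} = 23.874 + 82.797x \]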
We could also have used Excel or Minitab to compute these coefficients.
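How much gas should we expect to sell if the price per gallon becomes $2?
Since \( \hat{y} = 23.874 + 82.797x \), all we need to do is replace x by 2:
\[ \hat{y} = 23.874 + 82.797 \times 2 = 189.468 \]
Because quantities are expressed in thousands of gallons, if the price of gas becomes $2 per gallon, the model predicts that approximately 189,468 gallons will be sold.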
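The prediction step can be wrapped in a small function. This sketch hard-codes the rounded coefficients \( \beta_0 \approx 23.874 \) and \( \beta_1 \approx 82.797 \) computed from the table totals; the function name `predict_quantity` is hypothetical.

```python
# Rounded least squares coefficients from the gas price-quantity data.
B0 = 23.874   # intercept
B1 = 82.797   # slope (thousand gallons per dollar)


def predict_quantity(price_per_gallon):
    """Predicted quantity sold, in thousands of gallons, at a given price."""
    return B0 + B1 * price_per_gallon


for price in (1.70, 1.75, 2.00):
    q = predict_quantity(price)
    print(f"${price:.2f}/gallon -> {q:.3f} thousand gallons (~{q * 1000:,.0f} gallons)")
```

Note that the observed prices range from $1.62 to $1.77, so the $2 prediction extrapolates beyond the data used to fit the line.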
About the author
Issa Bass is the managing
editor of SixSigmaFirst. He can be reached at
issa@sixsigmafirst.com