A Numerical Example of Multiple Linear Regression by Hand

What is Multiple Linear Regression?

A linear regression model expresses the linear dependence of one variable on one or more independent variables. A simple linear regression model has only one independent variable; it is also called a bivariate or two-variable regression model. An example is the dependence of consumption on disposable income.

A multiple linear regression model expresses the linear dependence of one variable on two or more independent variables. In other words, the dependent variable is expressed as a linear function of more than one independent variable. It is sometimes also called a multivariate regression model, although that term strictly refers to models with more than one dependent variable. For example, crop yield depends on rainfall, temperature, sunshine, fertilizer, etc. A k-variable multiple linear regression model can be written as:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i

  • Y_i: Dependent variable
  • \beta_0: Intercept term
  • \beta_1, \beta_2, \dots, \beta_k: Coefficients of independent variables
  • X_{1i}, X_{2i}, \dots, X_{ki}: Independent variables
  • u_i: Error term

To understand multiple linear regression and its interpretation, we consider an example of a 3-variable linear regression model, in which the dependent variable is a linear function of only two explanatory variables. It is written as:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i

A Numerical Example of Multiple Linear Regression

The following table provides data on monthly sales revenue in thousands of USD (Y_i), a price index for all products sold in a given month (X_{1i}), and advertising expenditure in thousands of USD (X_{2i}).

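| Obs. | Sales | Price | Advert |
|------|-------|-------|--------|
| 1    | 73.2  | 5.69  | 1.3    |
| 2    | 71.8  | 6.49  | 2.9    |
| 3    | 62.4  | 5.63  | 0.8    |
| 4    | 67.4  | 6.22  | 0.7    |
| 5    | 89.3  | 5.02  | 1.5    |
| 6    | 70.3  | 6.41  | 1.3    |
| 7    | 73.2  | 5.85  | 1.8    |
| 8    | 86.1  | 5.41  | 2.4    |
| 9    | 81    | 6.24  | 0.7    |
| 10   | 76.4  | 6.2   | 3      |
| 11   | 76.6  | 5.48  | 2.8    |
| 12   | 82.2  | 6.14  | 2.7    |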
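Before doing any arithmetic by hand, it can help to key the data in and confirm the column totals. The following is a minimal Python sketch; the variable names `sales`, `price`, and `advert` and the helper `mean` are illustrative choices, not part of the original exercise.

```python
# Data keyed in from the table above (12 monthly observations).
sales = [73.2, 71.8, 62.4, 67.4, 89.3, 70.3, 73.2, 86.1, 81.0, 76.4, 76.6, 82.2]  # Y, USD thousand
price = [5.69, 6.49, 5.63, 6.22, 5.02, 6.41, 5.85, 5.41, 6.24, 6.20, 5.48, 6.14]  # X1, price index
advert = [1.3, 2.9, 0.8, 0.7, 1.5, 1.3, 1.8, 2.4, 0.7, 3.0, 2.8, 2.7]             # X2, USD thousand

n = len(sales)


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Print the total and the mean of each column.
for name, column in [("sales", sales), ("price", price), ("advert", advert)]:
    print(f"{name:6s} sum = {sum(column):8.2f}   mean = {mean(column):7.4f}")
```

The totals should agree with the values used in the solution: 909.9 for sales, 70.78 for price, and 21.9 for advertising.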

Answer the following questions.

  1. Estimate and interpret the following model: Sales_i = \beta_0 + \beta_1 Price_i + \beta_2 Advert_i + u_i
  2. Compute and interpret the multiple coefficient of determination and the multiple standard error of estimate.
  3. Test the significance of the regression coefficients and state whether their signs agree with the underlying theory.
  4. Predict the sales revenue at the mean price and mean advertising expenditure.

Solution

1. Estimate and interpret the following model:

Sales_i = \beta_0 + \beta_1 Price_i + \beta_2 Advert_i + u_i

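The OLS estimates can be obtained from the following formulas, in which lowercase letters denote deviations from sample means (for example, x_1 = X_1 - \bar{X}_1 and y = Y - \bar{Y}).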

\hat{\beta}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}

\hat{\beta}_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2
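The three formulas translate almost directly into code. The sketch below is one possible transcription in Python; the function name `ols_two_regressors` and its argument names are hypothetical, and the small data set is made up so that the exact answer is known in advance.

```python
def ols_two_regressors(s_x1y, s_x2y, s_x1x2, s_x1x1, s_x2x2, y_bar, x1_bar, x2_bar):
    """Return (b0, b1, b2) from deviation-form sums and the sample means."""
    denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    if denom == 0:
        raise ValueError("the two regressors are perfectly collinear")
    b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / denom
    b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / denom
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


# Hypothetical toy data generated exactly by Y = 1 + 2*X1 + 3*X2 (no error term).
X1 = [0.0, 1.0, 0.0, 1.0]
X2 = [0.0, 0.0, 1.0, 1.0]
Y = [1.0, 3.0, 4.0, 6.0]

n = len(Y)
x1_bar, x2_bar, y_bar = sum(X1) / n, sum(X2) / n, sum(Y) / n

# Deviations from the sample means.
x1 = [v - x1_bar for v in X1]
x2 = [v - x2_bar for v in X2]
y = [v - y_bar for v in Y]

b0, b1, b2 = ols_two_regressors(
    sum(a * b for a, b in zip(x1, y)),
    sum(a * b for a, b in zip(x2, y)),
    sum(a * b for a, b in zip(x1, x2)),
    sum(a * a for a in x1),
    sum(a * a for a in x2),
    y_bar, x1_bar, x2_bar,
)
print(b0, b1, b2)  # 1.0 2.0 3.0
```

Because the toy data contain no error term, the formulas recover the generating coefficients exactly. The denominator check guards against perfectly collinear regressors, for which the formulas are undefined.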

The following identities can be used to find the values in the above formulas.

\sum x_1 y = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n}

\sum x_2 y = \sum X_2 Y - \frac{\sum X_2 \sum Y}{n}

\sum x_1 x_2 = \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n}

\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n}

\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n}

\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}
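Each identity follows from expanding the product of deviations. As a sketch for the first one, write x_1 = X_1 - \bar{X}_1 and y = Y - \bar{Y} (the deviation notation assumed here) and use \sum X_1 = n\bar{X}_1 and \sum Y = n\bar{Y}:

```latex
\begin{aligned}
\sum x_1 y &= \sum (X_1 - \bar{X}_1)(Y - \bar{Y}) \\
           &= \sum X_1 Y - \bar{Y} \sum X_1 - \bar{X}_1 \sum Y + n \bar{X}_1 \bar{Y} \\
           &= \sum X_1 Y - n \bar{X}_1 \bar{Y} - n \bar{X}_1 \bar{Y} + n \bar{X}_1 \bar{Y} \\
           &= \sum X_1 Y - n \bar{X}_1 \bar{Y}
            = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n}
\end{aligned}
```

The other identities follow in the same way; using the same variable for both factors gives the sum-of-squares versions.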

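| Obs. | Y      | X_1     | X_2   | X_1 Y   | X_2 Y   | X_1 X_2 | X_1^2   | X_2^2   | Y^2     |
|------|--------|---------|-------|---------|---------|---------|---------|---------|---------|
| 1    | 73.2   | 5.69    | 1.3   | 416.508 | 95.16   | 7.397   | 32.3761 | 1.69    | 5358.24 |
| 2    | 71.8   | 6.49    | 2.9   | 465.982 | 208.22  | 18.821  | 42.1201 | 8.41    | 5155.24 |
| 3    | 62.4   | 5.63    | 0.8   | 351.312 | 49.92   | 4.504   | 31.6969 | 0.64    | 3893.76 |
| 4    | 67.4   | 6.22    | 0.7   | 419.228 | 47.18   | 4.354   | 38.6884 | 0.49    | 4542.76 |
| 5    | 89.3   | 5.02    | 1.5   | 448.286 | 133.95  | 7.53    | 25.2004 | 2.25    | 7974.49 |
| 6    | 70.3   | 6.41    | 1.3   | 450.623 | 91.39   | 8.333   | 41.0881 | 1.69    | 4942.09 |
| 7    | 73.2   | 5.85    | 1.8   | 428.22  | 131.76  | 10.53   | 34.2225 | 3.24    | 5358.24 |
| 8    | 86.1   | 5.41    | 2.4   | 465.801 | 206.64  | 12.984  | 29.2681 | 5.76    | 7413.21 |
| 9    | 81     | 6.24    | 0.7   | 505.44  | 56.7    | 4.368   | 38.9376 | 0.49    | 6561    |
| 10   | 76.4   | 6.2     | 3     | 473.68  | 229.2   | 18.6    | 38.44   | 9       | 5836.96 |
| 11   | 76.6   | 5.48    | 2.8   | 419.768 | 214.48  | 15.344  | 30.0304 | 7.84    | 5867.56 |
| 12   | 82.2   | 6.14    | 2.7   | 504.708 | 221.94  | 16.578  | 37.6996 | 7.29    | 6756.84 |
| Sum  | 909.9  | 70.78   | 21.9  | 5349.56 | 1686.54 | 129.343 | 419.768 | 48.79   | 69660.4 |
| Mean | 75.825 | 5.89833 | 1.825 | 445.796 | 140.545 | 10.7786 | 34.9807 | 4.06583 | 5805.03 |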

\sum x_1 y = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n} = 5349.556 - \frac{(70.78)(909.9)}{12} = -17.3375

\sum x_2 y = \sum X_2 Y - \frac{\sum X_2 \sum Y}{n} = 1686.54 - \frac{(21.9)(909.9)}{12} = 25.9725

\sum x_1 x_2 = \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n} = 129.343 - \frac{(70.78)(21.9)}{12} = 0.1695

\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 419.7682 - \frac{5009.8084}{12} = 2.2841

\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 48.79 - \frac{479.61}{12} = 8.8225

\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 69660.39 - \frac{827918.01}{12} = 667.2225
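The six deviation sums can be cross-checked against the raw data. This is a minimal Python sketch; the helper name `dev_sum` is hypothetical, and it applies the same shortcut identity as the hand calculation, so small differences from the hand figures would come only from rounding.

```python
sales = [73.2, 71.8, 62.4, 67.4, 89.3, 70.3, 73.2, 86.1, 81.0, 76.4, 76.6, 82.2]  # Y
price = [5.69, 6.49, 5.63, 6.22, 5.02, 6.41, 5.85, 5.41, 6.24, 6.20, 5.48, 6.14]  # X1
advert = [1.3, 2.9, 0.8, 0.7, 1.5, 1.3, 1.8, 2.4, 0.7, 3.0, 2.8, 2.7]             # X2


def dev_sum(a, b):
    """Sum of cross-deviations via the shortcut sum(a*b) - sum(a)*sum(b)/n."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n


s_x1y = dev_sum(price, sales)
s_x2y = dev_sum(advert, sales)
s_x1x2 = dev_sum(price, advert)
s_x1x1 = dev_sum(price, price)    # passing the same list twice gives a sum of squares
s_x2x2 = dev_sum(advert, advert)
s_yy = dev_sum(sales, sales)

for label, value in [
    ("sum x1*y", s_x1y), ("sum x2*y", s_x2y), ("sum x1*x2", s_x1x2),
    ("sum x1^2", s_x1x1), ("sum x2^2", s_x2x2), ("sum y^2", s_yy),
]:
    print(f"{label:10s} {value:10.4f}")
```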

Plugging these values into the OLS formulas, we get:

\hat{\beta}_1 = \frac{(-17.3375)(8.8225) - (25.9725)(0.1695)}{(2.2841)(8.8225) - (0.1695)^2} = \frac{-152.9601 - 4.4023}{20.1515 - 0.0287} = -7.82

\hat{\beta}_2 = \frac{(25.9725)(2.2841) - (-17.3375)(0.1695)}{(2.2841)(8.8225) - (0.1695)^2} = \frac{59.3238 + 2.9387}{20.1515 - 0.0287} = 3.094

\hat{\beta}_0 = 75.825 - (-7.82)\bar{X}_1 - (3.094)\bar{X}_2

\hat{\beta}_0 = 75.825 + 7.82(5.9) - 3.094(1.825)

\hat{\beta}_0 = 116.31

Estimated Regression Equation:

\hat{Y}_i = 116.31 - 7.82X_1 + 3.094X_2
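The fitted values and residuals implied by this equation are not tabulated, but they are needed for the sum of squared residuals used in the next part of the solution. The sketch below computes them in Python from the rounded coefficients, so the total should match the hand figure of 451.286 only approximately; the variable names are illustrative.

```python
sales = [73.2, 71.8, 62.4, 67.4, 89.3, 70.3, 73.2, 86.1, 81.0, 76.4, 76.6, 82.2]  # Y
price = [5.69, 6.49, 5.63, 6.22, 5.02, 6.41, 5.85, 5.41, 6.24, 6.20, 5.48, 6.14]  # X1
advert = [1.3, 2.9, 0.8, 0.7, 1.5, 1.3, 1.8, 2.4, 0.7, 3.0, 2.8, 2.7]             # X2

# Rounded estimates from the estimated regression equation.
b0, b1, b2 = 116.31, -7.82, 3.094

# Fitted value and residual for every observation.
fitted = [b0 + b1 * p + b2 * a for p, a in zip(price, advert)]
residuals = [y - f for y, f in zip(sales, fitted)]

# Sum of squared residuals.
ssr = sum(e ** 2 for e in residuals)

print("obs   sales   fitted  residual")
for obs, (y, f, e) in enumerate(zip(sales, fitted, residuals), start=1):
    print(f"{obs:3d}  {y:6.1f}  {f:7.3f}  {e:8.3f}")
print(f"SSR = {ssr:.3f}")
```

With exact OLS coefficients the residuals sum to zero; with the rounded coefficients used here the sum is only close to zero.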

2. Compute and interpret the multiple coefficient of determination and the multiple standard error of estimate

Multiple Coefficient of Determination

Formula 1 (ESS is the explained sum of squares and TSS is the total sum of squares):

R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{215.937}{667.223} = 0.323

Formula 2:

R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_1 y + \hat{\beta}_2 \sum x_2 y}{\sum y_i^2} = \frac{(-7.82)(-17.3375) + (3.094)(25.9725)}{667.223} = \frac{215.938}{667.223} = 0.323
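Formula 2 needs only the deviation sums and the two slope estimates, which makes it easy to check. The Python sketch below rebuilds the deviation sums from the raw totals in the worksheet table and also evaluates the equivalent form R^2 = 1 - SSR/TSS, which is not written out above; variable names are illustrative.

```python
n = 12

# Deviation sums rebuilt from the raw totals in the worksheet table.
s_x1y = 5349.556 - 70.78 * 909.9 / n
s_x2y = 1686.54 - 21.9 * 909.9 / n
tss = 69660.39 - 909.9 ** 2 / n      # total sum of squares, sum of y^2

# Rounded slope estimates from the estimated regression equation.
b1, b2 = -7.82, 3.094

ess = b1 * s_x1y + b2 * s_x2y        # explained sum of squares (Formula 2 numerator)
ssr = tss - ess                      # sum of squared residuals

r2 = ess / tss
r2_alt = 1 - ssr / tss               # same quantity, written in terms of SSR

print(f"ESS = {ess:.3f}, SSR = {ssr:.3f}, TSS = {tss:.4f}")
print(f"R^2 = {r2:.4f} (alternative form: {r2_alt:.4f})")
```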

Multiple Standard Error of Estimate

SER = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - 3}} = \sqrt{\frac{SSR}{n - 3}} = \sqrt{\frac{451.286}{12 - 3}} = \sqrt{50.1428} = 7.0811
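Here SSR is the sum of squared residuals, and the divisor n - 3 is the number of observations minus the three estimated parameters (intercept and two slopes). A small Python sketch of the calculation follows; the function name `standard_error_of_regression` is hypothetical.

```python
import math


def standard_error_of_regression(ssr, n_obs, n_params):
    """Square root of SSR divided by the residual degrees of freedom."""
    dof = n_obs - n_params
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    return math.sqrt(ssr / dof)


# SSR = 451.286, n = 12 and 3 estimated parameters, as in the text.
ser = standard_error_of_regression(451.286, 12, 3)
print(f"SER = {ser:.4f}")
```

The SER is measured in the units of the dependent variable, so a value of about 7.08 means that the typical residual is roughly USD 7.08 thousand of monthly sales revenue.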

3. Test the significance of regression coefficients and tell whether the signs are according to the underlying theory.

Standard error and t-value of \hat{\beta}_1

\mathrm{Var}(\hat{\beta}_1) = \frac{\sum x_2^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \, \sigma^2

\mathrm{Var}(\hat{\beta}_1) = \frac{8.8225}{(2.2841)(8.8225) - 0.02873} \times 50.1428 = 21.98

se(\hat{\beta}_1) = \sqrt{\mathrm{Var}(\hat{\beta}_1)} = \sqrt{21.98} = 4.688

t_{\hat{\beta}_1} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{-7.82}{4.688} = -1.668

Standard error and t-value of \hat{\beta}_2

\mathrm{Var}(\hat{\beta}_2) = \frac{\sum x_1^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \, \sigma^2

\mathrm{Var}(\hat{\beta}_2) = \frac{2.2841}{(2.2841)(8.8225) - 0.02873} \times 50.1428 = 5.69

se(\hat{\beta}_2) = \sqrt{\mathrm{Var}(\hat{\beta}_2)} = \sqrt{5.69} = 2.385

t_{\hat{\beta}_2} = \frac{\hat{\beta}_2}{se(\hat{\beta}_2)} = \frac{3.094}{2.385} = 1.297
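The standard errors and t-values can be reproduced from the summary numbers, with \sigma^2 replaced by its estimate SER^2 = 50.1428. The Python sketch below also compares each t-value with a critical value. The 5% significance level is an assumption made here for illustration (the text does not state one), and 2.262 is the two-tailed critical value of Student's t with n - 3 = 9 degrees of freedom taken from a standard t-table.

```python
import math

# Summary values taken from the worked solution above.
s_x1x1, s_x2x2, s_x1x2 = 2.2841, 8.8225, 0.1695
sigma2_hat = 50.1428      # SER squared, the estimate of sigma^2
b1, b2 = -7.82, 3.094

# Assumption: 5% two-tailed critical value of t with 9 degrees of freedom.
T_CRIT = 2.262

denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2

# Each slope variance uses the OTHER regressor's sum of squares in the numerator.
se_b1 = math.sqrt(s_x2x2 / denom * sigma2_hat)
se_b2 = math.sqrt(s_x1x1 / denom * sigma2_hat)

results = {}
for name, b, se in [("price", b1, se_b1), ("advert", b2, se_b2)]:
    t = b / se
    significant = abs(t) > T_CRIT
    results[name] = (se, t, significant)
    print(f"{name:6s} se = {se:.3f}  t = {t:+.3f}  significant at 5%: {significant}")
```

Under this assumed significance level, neither slope coefficient is statistically different from zero.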

4. Predict the sales revenue at mean price and mean advertisement expenditures.

The estimated regression equation is:

\hat{Y}_i = 116.31 - 7.82X_1 + 3.094X_2

Substituting the mean values \bar{X}_1 = 5.9 and \bar{X}_2 = 1.83 gives:

\hat{Y}_i = 116.31 - 7.82(\bar{X}_1) + 3.094(\bar{X}_2)

\hat{Y}_i = 116.31 - 7.82(5.9) + 3.094(1.83)

\hat{Y}_i = 116.31 - 46.14 + 5.66

\hat{Y}_i = 75.83
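An OLS regression with an intercept passes through the point of means, so the prediction at the mean price and mean advertising expenditure should reproduce the mean of sales, 75.825, up to rounding in the coefficients. A short Python sketch of that check follows; the function name `predict_sales` is hypothetical.

```python
# Rounded coefficients from the estimated regression equation.
b0, b1, b2 = 116.31, -7.82, 3.094


def predict_sales(price_index, advertising):
    """Predicted sales revenue (USD thousand) from the estimated equation."""
    return b0 + b1 * price_index + b2 * advertising


# Sample means computed from the column totals of the worksheet table.
n = 12
x1_bar = 70.78 / n
x2_bar = 21.9 / n
y_bar = 909.9 / n

y_hat_at_means = predict_sales(x1_bar, x2_bar)
print(f"prediction at means = {y_hat_at_means:.3f}, mean of sales = {y_bar:.3f}")
```

The small gap between the two numbers comes from using coefficients rounded to two or three decimals.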

Regression Output Results Summary

\hat{Y}_i = 116.31 - 7.82X_1 + 3.094X_2

se = (27.9,\; 4.688,\; 2.385)

t = (4.17,\; -1.668,\; 1.297)

R^2 = 0.323,\quad SER = 7.08
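The summary reports se = 27.9 and t = 4.17 for the intercept without showing the calculation. One standard textbook expression for the intercept variance in the two-regressor model, stated here as an assumption since the text does not give it, is \mathrm{Var}(\hat{\beta}_0) = \left[\frac{1}{n} + \frac{\bar{X}_1^2 \sum x_2^2 + \bar{X}_2^2 \sum x_1^2 - 2\bar{X}_1 \bar{X}_2 \sum x_1 x_2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}\right] \sigma^2. The Python sketch below evaluates it with the summary values from the solution.

```python
import math

n = 12
x1_bar, x2_bar = 70.78 / n, 21.9 / n              # sample means of price and advertising
s_x1x1, s_x2x2, s_x1x2 = 2.2841, 8.8225, 0.1695   # deviation sums from the solution
sigma2_hat = 50.1428                              # SER squared
b0 = 116.31                                       # rounded intercept estimate

denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2

# Intercept variance (assumed textbook formula, see the lead-in).
var_b0 = (
    1 / n
    + (x1_bar ** 2 * s_x2x2 + x2_bar ** 2 * s_x1x1 - 2 * x1_bar * x2_bar * s_x1x2) / denom
) * sigma2_hat

se_b0 = math.sqrt(var_b0)
t_b0 = b0 / se_b0
print(f"se(b0) = {se_b0:.2f}, t = {t_b0:.2f}")
```

The result agrees with the reported standard error and t-value of the intercept to the precision shown in the summary.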

Interpretation of Regression Results

Interpretation of Coefficients and their significance

The regression results show that if the price index of goods sold increases by 1 unit, sales revenue decreases by about USD 7.82 thousand, holding advertising expenditure constant. If advertising expenditure increases by USD 1 thousand, sales revenue increases by about USD 3.09 thousand, holding the price index constant. The intercept implies sales revenue of USD 116.31 thousand when both the price index and advertising expenditure are zero. This value has no economic meaning: sales revenue is P*Q, so if P = 0, revenue must be zero, and a price index of zero lies far outside the observed range of 5.02 to 6.49.

The absolute t-value of \hat{\beta}_1 is 1.668, which is less than its critical value (for example, 2.262 for a two-tailed test at the 5% level with n - 3 = 9 degrees of freedom); therefore, we fail to reject the null hypothesis \beta_1 = 0 and conclude that the price index has a statistically insignificant effect on sales revenue.

The t-value of \hat{\beta}_2, 1.297, is also less than the same critical value, so we fail to reject the null hypothesis \beta_2 = 0 and conclude that advertising expenditure has no statistically significant impact on sales revenue.

The signs of the slope coefficients agree with the underlying theory: a higher price reduces the demand for the goods and therefore sales revenue (negative sign), while more advertising expenditure raises sales revenue (positive sign). However, the impact of both variables on sales revenue is statistically insignificant.

Interpretation of Multiple R^2

The R^2 value of 0.323 shows that about 32.3% of the variation in sales revenue is explained by the price index and advertising expenditure together. The remaining 67.7% of the variation is unexplained, which suggests that other factors affecting sales revenue should be considered.
