A Numerical Example of Multiple Linear Regression by Hand

Multiple Linear Regression

What is Multiple Linear Regression?

The linear regression model shows the linear dependence of one variable on one or more independent variables. A simple linear regression model consists of the linear dependence of one variable on only one independent variable. It is also called a bivariate or two-variable regression model. An example is the dependence of consumption on disposable income.

A multiple linear regression model consists of the linear dependence of one variable on two or more independent variables. In other words, in multiple linear regression, a dependent variable is expressed as a linear function of more than one independent variable. It is also called a multivariate regression model. For example, crop yield depends on rainfall, temperature, sunshine, fertilizer, etc. A k-variable multiple linear regression model can be written as:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i

  • Y_i: Dependent variable
  • \beta_0: Intercept term
  • \beta_1, \beta_2, \dots, \beta_k: Coefficients of independent variables
  • X_{1i}, X_{2i}, \dots, X_{ki}: Independent variables
  • u_i: Error term

To understand multiple linear regression and its interpretation, we consider an example of a 3-variable linear regression model, in which the dependent variable is a linear function of only two explanatory variables. It is written as:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i

A Numerical Example of Multiple Linear Regression

The following table provides data on monthly sales revenue in thousands of USD (Y_i), the price index for all products sold in a given month (X_{1i}), and advertising expenditure in thousands of USD (X_{2i}).

Obs.   Sales (Y)   Price (X1)   Advert (X2)
1      73.2        5.69         1.3
2      71.8        6.49         2.9
3      62.4        5.63         0.8
4      67.4        6.22         0.7
5      89.3        5.02         1.5
6      70.3        6.41         1.3
7      73.2        5.85         1.8
8      86.1        5.41         2.4
9      81          6.24         0.7
10     76.4        6.2          3
11     76.6        5.48         2.8
12     82.2        6.14         2.7

Answer the following questions.

  1. Estimate and interpret the following model: Sales_i = \beta_0 + \beta_1 Price_i + \beta_2 Advert_i + u_i
  2. Compute and interpret the multiple coefficient of determination and the multiple standard error of estimate.
  3. Test the significance of the regression coefficients and state whether their signs agree with the underlying theory.
  4. Predict the sales revenue at means.

Solution

1. Estimate and interpret the following model:

Sales_i = \beta_0 + \beta_1 Price_i + \beta_2 Advert_i + u_i

The OLS estimates can be obtained by the following formulas.

\hat{\beta}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}

\hat{\beta}_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2

The following identities can be used to find the values in the above formulas.

\sum x_1 y = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n}

\sum x_2 y = \sum X_2 Y - \frac{\sum X_2 \sum Y}{n}

\sum x_1 x_2 = \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n}

\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n}

\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n}

\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}

Obs.   Y        X1        X2      X1Y       X2Y       X1X2      X1^2      X2^2      Y^2
1      73.2     5.69      1.3     416.508   95.16     7.397     32.3761   1.69      5358.24
2      71.8     6.49      2.9     465.982   208.22    18.821    42.1201   8.41      5155.24
3      62.4     5.63      0.8     351.312   49.92     4.504     31.6969   0.64      3893.76
4      67.4     6.22      0.7     419.228   47.18     4.354     38.6884   0.49      4542.76
5      89.3     5.02      1.5     448.286   133.95    7.53      25.2004   2.25      7974.49
6      70.3     6.41      1.3     450.623   91.39     8.333     41.0881   1.69      4942.09
7      73.2     5.85      1.8     428.22    131.76    10.53     34.2225   3.24      5358.24
8      86.1     5.41      2.4     465.801   206.64    12.984    29.2681   5.76      7413.21
9      81       6.24      0.7     505.44    56.7      4.368     38.9376   0.49      6561
10     76.4     6.2       3       473.68    229.2     18.6      38.44     9         5836.96
11     76.6     5.48      2.8     419.768   214.48    15.344    30.0304   7.84      5867.56
12     82.2     6.14      2.7     504.708   221.94    16.578    37.6996   7.29      6756.84
Sum    909.9    70.78     21.9    5349.56   1686.54   129.343   419.768   48.79     69660.4
Mean   75.825   5.89833   1.825   445.796   140.545   10.7786   34.9807   4.06583   5805.03

\sum x_1 y = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n} = 5349.556 - \frac{(70.78)(909.9)}{12} = -17.3375

\sum x_2 y = \sum X_2 Y - \frac{\sum X_2 \sum Y}{n} = 1686.54 - \frac{(21.9)(909.9)}{12} = 25.98

\sum x_1 x_2 = \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n} = 129.343 - \frac{(70.78)(21.9)}{12} = 0.1695

\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 419.7682 - \frac{5009.8084}{12} = 2.2841

\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 48.79 - \frac{479.61}{12} = 8.8225

\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 69660.39 - \frac{827918.01}{12} = 667.2225
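The six deviation sums above can be checked with a few lines of Python. This is only a sketch using the standard library; the helper name `dev_cross` and the dictionary keys are illustrative choices, not part of the original worked example.

```python
# Data from the table above (sales and advertising in thousands of USD).
sales = [73.2, 71.8, 62.4, 67.4, 89.3, 70.3, 73.2, 86.1, 81.0, 76.4, 76.6, 82.2]
price = [5.69, 6.49, 5.63, 6.22, 5.02, 6.41, 5.85, 5.41, 6.24, 6.20, 5.48, 6.14]
advert = [1.3, 2.9, 0.8, 0.7, 1.5, 1.3, 1.8, 2.4, 0.7, 3.0, 2.8, 2.7]


def dev_cross(a, b):
    """Deviation-form cross-product: sum(a*b) - sum(a)*sum(b)/n."""
    n = len(a)
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / n


# Lowercase names follow the text: x1 = price, x2 = advert, y = sales,
# all measured as deviations from their means.
sums = {
    "x1y": dev_cross(price, sales),
    "x2y": dev_cross(advert, sales),
    "x1x2": dev_cross(price, advert),
    "x1x1": dev_cross(price, price),
    "x2x2": dev_cross(advert, advert),
    "yy": dev_cross(sales, sales),
}

for name, value in sums.items():
    print(f"sum {name}: {value:.4f}")
```

Run this way, \sum x_2 y comes out at about 25.9725, slightly below the rounded 25.98 used in the text; the difference does not change the estimated coefficients at the three-decimal precision reported here.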

Plugging these values in OLS formula, we get

\hat{\beta}_1 = \frac{(-17.3375)(8.8225) - (25.98)(0.1695)}{(2.2841)(8.8225) - (0.1695)^2} = \frac{-152.9601 - 4.40361}{20.15 - 0.02873} = -7.82

\hat{\beta}_2 = \frac{(25.98)(2.2841) - (-17.3375)(0.1695)}{(2.2841)(8.8225) - (0.1695)^2} = \frac{59.3409 + 2.9387}{20.15 - 0.02873} = 3.094

\hat{\beta}_0 = 75.825 - (-7.82)\bar{X}_1 - (3.094)\bar{X}_2

\hat{\beta}_0 = 75.825 + 7.82(5.9) - 3.094(1.825)

\hat{\beta}_0 = 116.31

Estimated Regression Equation:

\hat{Y}_i = 116.31 - 7.82X_1 + 3.094X_2
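As a rough check on the arithmetic, the OLS formulas can be evaluated directly from the deviation sums reported in the text. The variable names below are illustrative, and the inputs are the rounded values from the text, so the last digit may differ slightly from the hand calculation.

```python
# Deviation sums as reported in the text (rounded).
s_x1y, s_x2y, s_x1x2 = -17.3375, 25.98, 0.1695
s_x1x1, s_x2x2 = 2.2841, 8.8225

# Sample means, from the column sums and n = 12.
n = 12
y_bar = 909.9 / n
x1_bar = 70.78 / n
x2_bar = 21.9 / n

# Common denominator of both slope formulas.
det = s_x1x1 * s_x2x2 - s_x1x2 ** 2

b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / det   # slope on price
b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / det   # slope on advertising
b0 = y_bar - b1 * x1_bar - b2 * x2_bar         # intercept

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.3f}")
```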

2. Compute and interpret the multiple coefficient of determination and the multiple standard error of estimate

Multiple Coefficient of Determination

Formula 1:

R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{215.937}{667.223} = 0.323

Formula 2:

R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_1 y + \hat{\beta}_2 \sum x_2 y}{\sum y_i^2} = \frac{(-7.82)(-17.3375) + (3.094)(25.98)}{667.223} = 0.323

Multiple Standard Error of Estimate

SER = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - 3}} = \sqrt{\frac{SSR}{n - 3}} = \sqrt{\frac{451.286}{12 - 3}} = \sqrt{50.1428} = 7.0811
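The same two quantities can be reproduced from the deviation sums, assuming the rounded coefficients from Part 1. In this sketch `ess` is the explained sum of squares and `ssr` is the sum of squared residuals, following the text's notation, so that TSS = ESS + SSR.

```python
import math

n = 12          # observations
k = 2           # slope coefficients (price, advertising)

# Rounded estimates and deviation sums from the text.
b1, b2 = -7.82, 3.094
s_x1y, s_x2y = -17.3375, 25.98
tss = 667.2225  # total sum of squares, sum of y^2 in deviation form

ess = b1 * s_x1y + b2 * s_x2y   # explained sum of squares (Formula 2)
ssr = tss - ess                 # sum of squared residuals
r2 = ess / tss
sigma2_hat = ssr / (n - k - 1)  # estimated error variance, n - 3 degrees of freedom
ser = math.sqrt(sigma2_hat)

print(f"R^2 = {r2:.3f}, SER = {ser:.2f}")
```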

3. Test the significance of regression coefficients and tell whether the signs are according to the underlying theory.

Standard error and t-value of \hat{\beta}_1

\mathrm{Var}(\hat{\beta}_1) = \frac{\sum x_2^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \, \sigma^2

\mathrm{Var}(\hat{\beta}_1) = \frac{8.8225}{(2.2841)(8.8225) - 0.02873} \times 50.1428 = 21.98

se(\hat{\beta}_1) = \sqrt{\mathrm{Var}(\hat{\beta}_1)} = \sqrt{21.98} = 4.688

t_{\hat{\beta}_1} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{-7.82}{4.688} = -1.668

Standard error and t-value of \hat{\beta}_2

\mathrm{Var}(\hat{\beta}_2) = \frac{\sum x_1^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \, \sigma^2

\mathrm{Var}(\hat{\beta}_2) = \frac{2.2841}{(2.2841)(8.8225) - 0.02873} \times 50.1428 = 5.69

se(\hat{\beta}_2) = \sqrt{\mathrm{Var}(\hat{\beta}_2)} = \sqrt{5.69} = 2.385

t_{\hat{\beta}_2} = \frac{\hat{\beta}_2}{se(\hat{\beta}_2)} = \frac{3.094}{2.385} = 1.297
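The standard errors and t-values of both slopes can be sketched in Python from the quantities already computed. The critical value `T_CRIT = 2.262` is an assumption added here for illustration: it is the two-tailed 5% critical value of the t-distribution with n - 3 = 9 degrees of freedom, and the text itself does not state a significance level.

```python
import math

# Rounded inputs from the text.
s_x1x1, s_x2x2, s_x1x2 = 2.2841, 8.8225, 0.1695
sigma2_hat = 50.1428          # estimated error variance, SSR / (n - 3)
b1, b2 = -7.82, 3.094

# Assumed: two-tailed 5% critical value of t with 9 degrees of freedom.
T_CRIT = 2.262

det = s_x1x1 * s_x2x2 - s_x1x2 ** 2

se_b1 = math.sqrt(s_x2x2 / det * sigma2_hat)
se_b2 = math.sqrt(s_x1x1 / det * sigma2_hat)
t_b1 = b1 / se_b1
t_b2 = b2 / se_b2

for name, se, t in [("b1", se_b1, t_b1), ("b2", se_b2, t_b2)]:
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: se = {se:.3f}, t = {t:.3f} ({verdict} at the assumed 5% level)")
```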

4. Predict the sales revenue at mean price and mean advertisement expenditures.

The estimated regression equation is:

\hat{Y}_i = 116.31 - 7.82X_1 + 3.094X_2

Substituting the mean values of X_1 and X_2, that is, \bar{X}_1 = 5.9 and \bar{X}_2 = 1.83, gives:

\hat{Y}_i = 116.31 - 7.82(\bar{X}_1) + 3.094(\bar{X}_2)

\hat{Y}_i = 116.31 - 7.82(5.9) + 3.094(1.83)

\hat{Y}_i = 116.31 - 46.14 + 5.66

\hat{Y}_i = 75.83
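A small helper makes the prediction step reusable for other price and advertising values. The function name `predict_sales` is hypothetical, and the coefficients are the rounded estimates from Part 1. Because an OLS line with an intercept passes through the sample means, the prediction at the means should match the mean of sales (75.825) up to rounding.

```python
# Rounded OLS estimates from the text.
B0, B1, B2 = 116.31, -7.82, 3.094


def predict_sales(price_index, advert_spend):
    """Predicted monthly sales revenue in thousands of USD."""
    return B0 + B1 * price_index + B2 * advert_spend


n = 12
x1_bar = 70.78 / n   # mean price index
x2_bar = 21.9 / n    # mean advertising expenditure
y_bar = 909.9 / n    # mean sales revenue

y_hat_at_means = predict_sales(x1_bar, x2_bar)
print(f"prediction at means = {y_hat_at_means:.2f}, mean of sales = {y_bar:.3f}")
```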

Regression Output Results Summary

\hat{Y}_i = 116.31 - 7.82X_1 + 3.094X_2

se = (27.9,\; 4.688,\; 2.385)

t = (4.17,\; -1.668,\; 1.297)

R^2 = 0.323,\quad SER = 7.08
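The standard error of the intercept (27.9) is reported in the summary without a derivation. The sketch below assumes the standard textbook variance formula for the intercept in a two-regressor model, Var(\hat{\beta}_0) = [1/n + (\bar{X}_1^2 \sum x_2^2 + \bar{X}_2^2 \sum x_1^2 - 2\bar{X}_1 \bar{X}_2 \sum x_1 x_2) / ((\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2)] \sigma^2, and uses the rounded values from the text.

```python
import math

n = 12
x1_bar = 70.78 / n
x2_bar = 21.9 / n

# Rounded inputs from the text.
s_x1x1, s_x2x2, s_x1x2 = 2.2841, 8.8225, 0.1695
sigma2_hat = 50.1428
b0 = 116.31

det = s_x1x1 * s_x2x2 - s_x1x2 ** 2

# Variance of the intercept estimate in the two-regressor model.
var_b0 = (
    1 / n
    + (x1_bar ** 2 * s_x2x2 + x2_bar ** 2 * s_x1x1
       - 2 * x1_bar * x2_bar * s_x1x2) / det
) * sigma2_hat

se_b0 = math.sqrt(var_b0)
t_b0 = b0 / se_b0

print(f"se(b0) = {se_b0:.1f}, t = {t_b0:.2f}")
```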

Interpretation of Regression Results

Interpretation of Coefficients and their significance

The regression results show that if the price index of sold goods increases by 1 unit, then sales revenue will decrease by about USD 7.8 thousand, holding advertising expenditure constant. If advertising expenditure increases by USD 1000, then sales revenue will increase by about USD 3.1 thousand (USD 3,094), holding the price index constant. The intercept value shows that sales revenue is USD 116.3 thousand if the price index of sold goods and advertising expenditure are both zero. This has no meaningful economic interpretation: sales revenue is P*Q, so if P = 0, then sales must be zero.

The absolute t-value of \hat{\beta}_1 is 1.668, which is less than its critical value (about 2.262 for a two-tailed test at the 5% level with 12 - 3 = 9 degrees of freedom); therefore, we fail to reject the null hypothesis and conclude that the sales price index has a statistically insignificant effect on sales revenue.

The t-value of \hat{\beta}_2 is 1.297, which is also less than its critical value, so we fail to reject the null hypothesis and conclude that advertising expenditure has no statistically significant impact on sales revenue.

The signs of the slope coefficients agree with the underlying theory: an increase in the sales price decreases the demand for the good, leading to a decrease in sales revenue, and more expenditure on advertising leads to more sales revenue. However, the impact of both variables on sales revenue is statistically insignificant.

Interpretation of Multiple R^2

The R^2 value of 0.323 shows that about 32.3% of the variation in sales revenue is explained by the sales price index and advertising expenditure together. The remaining 67.7% of the variation is left unexplained by the model, which suggests that other factors should also be considered.
