Coefficient of Determination R2

Goodness of Fit

So far, we have no way of measuring how well the explanatory or independent variable, X, explains the dependent variable, Y. In other words, we must know how well the OLS regression line fits the data. Are the observations tightly clustered around the regression line, or are they spread out? Two most commonly used measures of how well the OLS regression line fits the data are:

  1. Standard Error of Estimate of Regression (SEE or SER)
  2. Coefficient of Determination, R2

Standard Error of Estimate or Regression (SEE or SER)

Standard error of estimate of regression is a measure of goodness of fit. It is the standard deviation of the values of actual Y around the regression line. SER is a measure of the spread of the observations around the regression line, measured in the units of the dependent variable. It is square root of the variance of residuals. The closer the sample values to regression line smaller will be the standard error of regression. 

\hat{\sigma}_u = SER = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}} = \sqrt{\frac{\sum \hat{u}_i^{\,2}}{n - 2}} = \sqrt{\frac{RSS}{n - 2}}

Coefficient of Determination, r^2

The coefficient of determination is a measure of goodness of fit of regression line. It tells us how accurately sample regression line fits the data. It measures the percentage of the total variation of the dependent variable Y that can be explained by the independent variable, X. It ranges between 0 to 1.

Some Important Points About r^2

  • r^2 can never be negative because sum of deviations is always positive.
  • It ranges between 0 to 1.
  • If all sample values lie exactly on the sample regression line than r^2 would be equal to 1.
  • If r^2 is equal to zero than none of the variation in Y is explained by our model (independent variables).
  • The higher the r^2, the greater is the goodness of fit of the estimated sample regression line.
  • The scatter the data, the lower the r^2 value.

Formula of r^2

R^2 = \frac{Explained Sum of Squares (ESS)}{Total Sum of Squares (TSS)}

Understanding r^2 Through Venn Diagram

Coefficient of Determination

In the figure the circle Y represent variations in dependent variable, and the circle X represents variation in the explanatory variable X. The overlap of the two circles (the shaded area) indicates the extent to which the variation in Y is explained by the variation in X (say, via an OLS regression). The greater the extent of the overlap, the greater the variation in Y is explained by X. The r^2 is simply a numerical measure of this overlap. As you can the overlap increases as we move from top left bottom right. When there is no overlap then r^2 is equal to zero. It means none of variation in Y is due to X. If the overlap is complete than r^2 is equal to 1 which is the rare case.

Types of Variations in Regression

Types of Variations in Regression

The figure shows that total variation in Y from mean is divided into two components. Portion of variation of Y explained by X variables known as explained variation. Portion of variation of Y not explained by X variables known as unexplained variation. These variations are explained in detailed below.

Total Variation or Total Sum of Squares (TSS)

Total variation or total sum of squares (TSS) is the sum of squared deviations of Y from its mean. In fact, it is the sum of explained variation (ESS) and unexplained variation (RSS). TSS has n-1 degrees of freedom since we lose 1 df in computing the sample mean, hat is,

\displaystyle TSS=\sum (Y_i-\bar{Y})^2

TSS= ESS+ RSS

\sum (Y_i-\bar{Y})^2 = \sum (\hat{Y}_i-\bar{Y})^2 + \sum (Y_i-\hat{Y}_i)^2

Explained Variation or Explained Sum of Squares (ESS)

Explained variation or explained sum of squares or regression sum of square is the variation in dependent variable Y that is explained by our regression model. It is the sum of squared deviation of the predicted Y from its average. ESS has degrees of freedom 1. It is written as:

\text{ESS}=\sum_{i=1}^{n}\left(\hat{Y}_i-\bar{Y}\right)^2

Unexplained Variation or Sum of Squared Residuals (RSS)

Unexplained variation or residual sum of square (RSS) is the variation in Y that is not explained by our regression model. It is the sum of squared difference between actual Y and estimated Y. RSS has n-2 degrees of freedom in two variable case with intercept. It is written as:

\text{RSS}=\sum_{i=1}^{n}\left(Y_i-\hat{Y}_i\right)^2

Degrees of Freedom (df)

Degrees of freedom (df) refers to the total number of independent observations out of a total of n observations. It is calculated by the difference between total observations (n) and total number of parameters (k), df=n-k.

Suggestions for further reading:

Share this article
Facebook
Twitter
LinkedIn
WhatsApp

Leave a Reply

Your email address will not be published. Required fields are marked *

Capital Budgeting

What is Capital Budgeting? Capital Budgeting is the process of evaluating and selecting long-term investment projects that involve significant capital expenditures such as purchasing new machinery, expanding production facilities, or launching new products. It compares costs and benefits of projects to determine its long-term profitability. Some of the most common

Read More »

Cost Benefit Analysis

What is Cost Benefit Analysis (CBA)? Cost Benefit Analysis (CBA) is a process that’s used to determine the profitability of a project by estimating and comparing its costs and benefits measured in monetary terms after adjusting for the time value of money. Objectives of Cost Benefit Analysis To determine the

Read More »

Measures of Poverty and Inequality

In this post we will discuss measures of poverty and inequality. What is Poverty? Poverty can be defined in two ways, absolute poverty and relative poverty. In ordinary language when we ask about poverty, we often mean absolute poverty. But what it is? United Nations define poverty as: A condition

Read More »

Money, Functions and Types

What is Money? Money can be defined as any asset or commodity that is widely accepted for the exchange of goods and services. The word “Money” is derived from the Latin word “Moneta”. According to Mankiw: Money is the stock of assets that can be readily used to make transactions.

Read More »