Regression in Excel: equation, examples. Linear regression

Regression analysis is a statistical research method that allows you to show the dependence of a parameter on one or more independent variables. In the pre-computer era, its use was rather difficult, especially when it came to large amounts of data. Today, learning how to build a regression in Excel, you can solve complex statistical problems in just a couple of minutes. The following are specific examples from the field of economics.

Types of Regression

This concept itself was introduced into mathematics by Francis Galton in 1886. Regression happens:

  • linear
  • parabolic;
  • exponential;
  • exponential;
  • hyperbolic;
  • indicative;
  • logarithmic.

Example 1

Consider the task of determining the dependence of the number of quitting team members on the average salary at 6 industrial enterprises.

Task. Six enterprises analyzed the average monthly salary and the number of employees who left on their own. In tabular form we have:

A

B

C

1

X

The number of resigned

Salary

2

y

30,000 rubles

3

1

60

35,000 rubles

4

2

35

40,000 rubles

5

3

20

45,000 rubles

6

4

20

50,000 rubles

7

5

fifteen

55,000 rubles

8

6

fifteen

60,000 rubles

For the task of determining the dependence of the number of workers quitting on the average salary at 6 enterprises, the regression model has the form of the equation Y = 0 + 1 x 1 + ... + k x k , where x i are the influencing variables, a i are the regression coefficients, ak - number of factors.

For this task, Y is the indicator of employees quitting, and the influencing factor is the salary, which is denoted by X.

Using the features of the Excel processor

Excel regression analysis should be preceded by applying built-in functions to existing tabular data. However, for these purposes, it is better to use the very useful add-on “Analysis Package”. To activate it you need:

  • go to the “Parameters” section from the “File” tab;
  • in the window that opens, select the line "Add-ons";
  • Click on the “Go” button located at the bottom right of the “Management” line;
  • put a tick next to the name “Analysis Package” and confirm your actions by clicking “OK”.

If everything is done correctly, the right button will appear on the right side of the "Data" tab, located above the Excel worksheet.

Linear Regression in Excel

Now that you have at hand all the necessary virtual tools for performing econometric calculations, we can begin to solve our problem. For this:

  • click on the button “Data Analysis”;
  • in the window that opens, click on the "Regression" button;
  • in the tab that appears, enter the range of values ​​for Y (the number of employees quitting) and for X (their salary);
  • Confirm your actions by clicking the "Ok" button.

As a result, the program will automatically fill in a new sheet of the tabular processor with regression analysis data. Note! In Excel, you can independently set the place that you prefer for this purpose. For example, it can be the same sheet where the values ​​of Y and X are located, or even a new book specifically designed to store such data.

Analysis of the regression results for the R-square

In Excel, the data obtained during the processing of the data of this example are of the form:

regression in Excel

First of all, you should pay attention to the value of the R-square. It represents the coefficient of determination. In this example, R-squared = 0.755 (75.5%), i.e., the calculated model parameters explain the relationship between the considered parameters by 75.5%. The higher the value of the coefficient of determination, the selected model is considered more applicable for a specific task. It is believed that it correctly describes the real situation with an R-square value above 0.8. If the R-squared is <0.5, then such a regression analysis in Excel cannot be considered reasonable.

Coefficient analysis

The number 64.1428 shows what the value of Y will be if all the variables xi in the model we are considering are reset to zero. In other words, it can be argued that other factors not described in a particular model also influence the value of the analyzed parameter.

The next coefficient of -0.16285, located in cell B18, shows the weight of the influence of the variable X on Y. This means that the average monthly salary of employees within the framework of the model in question affects the number of those who quit with a weight of -0.16285, i.e., the degree of its influence is completely small. A “-” sign indicates that the coefficient is negative. This is obvious, since everyone knows that the higher the salary at the enterprise, the less people express a desire to terminate an employment contract or quit.

Multiple regression

This term refers to the equation of communication with several independent variables of the form:

y = f (x 1 + x 2 + ... x m ) + ε, where y is the resultant attribute (dependent variable), and x 1 , x 2 , ... x m are the attribute factors (independent variables).

Parameter Estimation

For multiple regression (MR), it is carried out using the least squares method (least squares). For linear equations of the form Y = a + b 1 x 1 + ... + b m x m + ε, we construct a system of normal equations (see below)

multiple regression

To understand the principle of the method, consider the two-factor case. Then we have the situation described by the formula

regression coefficient

From here we get:

Excel Regression Equation

where σ is the variance of the corresponding attribute reflected in the index.

MNCs are applicable to the MR equation on a standardized scale. In this case, we obtain the equation:

linear regression in Excel

in which t y , t x 1, ... t xm - standardized variables for which the average values ​​are 0; β i - standardized regression coefficients, and standard deviation - 1.

Note that all β i in this case, they are specified as normalized and centralized; therefore, their comparison with each other is considered correct and permissible. In addition, it is customary to drop out factors by discarding those with the lowest βi values.

Task using the linear regression equation

Suppose there is a table of price dynamics for a specific product N over the past 8 months. It is necessary to decide on the feasibility of acquiring his lot at a price of 1850 rubles / t.

A

B

C

1

month number

month name

product price N

2

1

January

1750 rubles per ton

3

2

February

1755 rubles per ton

4

3

March

1767 rubles per ton

5

4

April

1760 rubles per ton

6

5

May

1770 rubles per ton

7

6

June

1790 rubles per ton

8

7

July

1810 rubles per ton

9

8

August

1840 rubles per ton

To solve this problem, in the Excel processor, you need to use the Data Analysis tool, which is already known in the above example. Next, select the section "Regression" and set the parameters. It must be remembered that in the field “Input Interval Y” a range of values ​​should be entered for the dependent variable (in this case, the price of the goods in specific months of the year), and in the “Input interval X” for the independent (month number). Confirm the action by clicking "Ok". On a new sheet (if so indicated) we obtain data for regression.

Using them, we construct a linear equation of the form y = ax + b, where the parameters a and b are the row coefficients with the name of the month number and the Y-intersection coefficients and rows from the sheet with the results of regression analysis. Thus, the linear regression equation (SD) for problem 3 is written as:

Product price N = 11.714 * month number + 1727.54.

or in algebraic notation

y = 11.714 x + 1727.54

Results Analysis

To determine whether the obtained linear regression equation is adequate, the multiple correlation coefficients (CMC) and determination are used, as well as the Fisher test and the Student criterion. In the Excel table with the regression results, they appear under the names multiple R, R-square, F-statistics and t-statistics, respectively.

KMK R makes it possible to evaluate the tightness of the probabilistic relationship between the independent and dependent variables. Its high value indicates a fairly strong connection between the variables “Number of the month” and “The price of the goods N in rubles per 1 ton”. However, the nature of this relationship remains unknown.

The square of the coefficient of determination R 2 (RI) is a numerical characteristic of the fraction of the total spread and shows the spread of which part of the experimental data, i.e. values ​​of the dependent variable corresponds to the linear regression equation. In the problem under consideration, this value is 84.8%, i.e., statistical data are described with a high degree of accuracy by the obtained SD.

F-statistics, also called the Fisher criterion, is used to assess the significance of a linear relationship, refuting or confirming the hypothesis of its existence.

The value of t-statistics (Student's criterion) helps to evaluate the significance of the coefficient with an unknown or free term of a linear dependence. If the value of the t-criterion> t cr , then the hypothesis of the insignificance of the free term of the linear equation is rejected.

In the problem under consideration, for the free term, using the Excel tools, it was found that t = 169.20903, and p = 2.89E-12, that is, we have zero probability that the true hypothesis about the insignificance of the free term will be rejected. For the coefficient with the unknown, t = 5.79405, and p = 0.001158. In other words, the probability that the correct hypothesis about the insignificance of the coefficient with the unknown will be rejected is 0.12%.

Thus, it can be argued that the obtained linear regression equation is adequate.

The task of the appropriateness of buying a block of shares

Multiple regression in Excel is performed using the same Data Analysis tool. Consider a specific application.

The management of NNN must decide on the feasibility of purchasing a 20% stake in MMM. The cost of the package (joint venture) is 70 million US dollars. NNN specialists collected data on similar transactions. It was decided to evaluate the value of the block of shares by such parameters, expressed in millions of US dollars, as:

  • accounts payable (VK);
  • annual turnover (VO);
  • accounts receivable (VD);
  • value of fixed assets (SOF).

In addition, the enterprise wage arrears parameter (V3 P) in thousands of US dollars is used.

Solution using Excel spreadsheet processor

First of all, you need to create a table of source data. It has the following form:

how to build regression in excel

Further:

  • Call the "Data Analysis" window;
  • select the section "Regression";
  • in the window "Input interval Y" enter the range of values ​​of dependent variables from column G;
  • Click the red arrow icon to the right of the “Input Interval X” window and select the range of all values ​​from columns B, C, D, F on the sheet.

Mark the item “New worksheet” and click “Ok”.

Get a regression analysis for this task.

regression examples in Excel

Learning the results and conclusions

“We collect” from the rounded data presented above on the sheet of the Excel spreadsheet processor, the regression equation:

SP = 0.103 * SOF + 0.541 * VO - 0.031 * VK + 0.405 * VD + 0.691 * VZP - 265.844.

In a more familiar mathematical form, it can be written as:

y = 0.103 * x1 + 0.541 * x2 - 0.031 * x3 + 0.405 * x4 + 0.691 * x5 - 265.844

The data for MMM JSC are presented in the table:

SOF, USD

VO, USD

VK, USD

VD, USD

VZP, USD

Joint venture, USD

102.5

535.5

45,2

41.5

21.55

64.72

Substituting them into the regression equation, they get a figure of 64.72 million US dollars. This means that the shares of MMM are not worth buying, as their value of $ 70 million is quite overpriced.

As you can see, the use of the Excel processor and the regression equation made it possible to make an informed decision regarding the feasibility of a very specific transaction.

Now you know what regression is. The examples in Excel discussed above will help you in solving practical problems from the field of econometrics.

Source: https://habr.com/ru/post/K2440/


All Articles