Methods of mathematical statistics. Regression analysis

Pearson began using the term "multiple regression analysis" in work dating back to 1908. He illustrated it with the example of a real estate agent. The agent kept records of a wide range of characteristics for each building sold, and the sale results were then used to determine which factor had the greatest influence on the transaction price.

Analyzing a large number of transactions gave interesting results. The final price was influenced by many factors, which sometimes led to paradoxical conclusions and even to obvious outliers, where a house with high initial potential sold for an unexpectedly low price.

A second example of such an analysis is the work of a human resources specialist tasked with determining employee compensation. The difficulty was that the task required not distributing a fixed amount to everyone, but matching each payment strictly to the specific work performed. Because many such problems have practically the same solution, they called for more detailed study at the mathematical level.

Within mathematical statistics, a significant place is given to the branch known as regression analysis, which brings together the practical methods used to study dependencies that fall under the concept of regression. Such relationships are observed between data obtained in the course of statistical studies.

Among the many problems it addresses, regression analysis has three main objectives: choosing the general form of the regression equation; constructing estimates of the unknown parameters that enter the regression equation; and testing statistical hypotheses about the regression. When studying the relationship between a pair of quantities obtained from experimental observations as a series of pairs (x1, y1), ..., (xn, yn), one relies on regression theory and assumes that the quantity Y follows a certain probability distribution while the other quantity, X, remains fixed.
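The three objectives above can be illustrated with a minimal sketch in Python. It assumes the simplest possible form for the regression equation, a straight line y = b0 + b1 * x, estimates the two parameters by ordinary least squares, and computes a t statistic for the hypothesis that the slope is zero. The function names and the sample data are hypothetical and chosen only for illustration; the article itself does not prescribe a particular model or estimation method.

```python
import math


def fit_simple_linear(xs, ys):
    """Least-squares estimates (b0, b1) for the assumed model y = b0 + b1 * x."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope estimate
    b0 = y_mean - b1 * x_mean    # intercept estimate
    return b0, b1


def slope_t_statistic(xs, ys, b0, b1):
    """t statistic for the hypothesis 'slope == 0' (n - 2 degrees of freedom)."""
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # Residual sum of squares around the fitted line.
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: the statistic is unbounded unless the slope is zero.
        return math.inf if b1 != 0 else 0.0
    return b1 / std_err


# Hypothetical sample: x is set in advance, y is observed with noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear(xs, ys)
t = slope_t_statistic(xs, ys, b0, b1)
print(f"intercept={b0:.3f} slope={b1:.3f} t={t:.1f}")
```

A large absolute value of the t statistic, compared against the t distribution with n - 2 degrees of freedom, would count as evidence against the hypothesis that Y does not depend linearly on X.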

The result Y depends on the value of the variable X, and this dependence can follow various laws. The accuracy of the results is affected by the nature of the observations and the purpose of the analysis. The experimental model rests on assumptions that are simplified but plausible. The main condition is that X is a controlled quantity: its values are set before the experiment.

If the experiment instead uses pairs of uncontrolled values (X, Y), the regression analysis is carried out in the same way, but correlation analysis methods are used to interpret the results, since what is being studied is the relationship between two random variables. Methods of mathematical statistics are not an abstract topic: they find application in many fields of human activity.
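When both quantities are random, one common tool of correlation analysis is the sample (Pearson) correlation coefficient. The sketch below is a minimal illustration of it, not a method taken from the article; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient between paired observations."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)
```

The coefficient lies between -1 and 1. Values near either end indicate a nearly linear relationship, while values near 0 indicate the absence of a linear one, which does not rule out other kinds of dependence.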

In the scientific literature, the term "linear regression analysis" is widely used for the method described above. The variable X is called the regressor or predictor, and the dependent variable Y is also called the criterion variable. This terminology reflects only the mathematical dependence between the variables, not a causal relationship.

Regression analysis is one of the most common methods for processing the results of a wide variety of observations. It is used to study physical and biological dependencies and is applied in economics and engineering, among many other areas. Analysis of variance, design of experiments, and multivariate statistical analysis are closely related to it.

Source: https://habr.com/ru/post/G201/

