Statistical modeling: methods, description, application

The assumptions embodied in a statistical model describe a set of probability distributions, some of which are assumed to adequately approximate the distribution from which a particular data set is sampled. The probability distributions inherent in statistical models are what distinguish them from other, non-statistical, mathematical models.

Connection with mathematics

Statistical modeling is rooted primarily in mathematics. A statistical model is usually specified by mathematical equations that relate one or more random variables and, possibly, other non-random variables. Thus, a statistical model is a “formal representation of a theory” (Herman Adèr, quoting Kenneth Bollen).

All statistical hypothesis tests and all statistical estimates are derived from statistical models. More generally, statistical models are part of the foundation of statistical inference.

Statistical Modeling Methods

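Informally, a statistical model can be thought of as a statistical assumption (or a set of statistical assumptions) with a certain property: the assumption allows us to calculate the probability of any event. As an example, consider a pair of ordinary six-sided dice. We will study two different statistical assumptions about the dice.

The first statistical assumption is that, for each die, every face (1 through 6) comes up with probability 1/6. This assumption constitutes a statistical model, because with the assumption alone we can calculate the probability of any event; for example, the probability that both dice show 5 is 1/6 × 1/6 = 1/36. The alternative statistical assumption is that, for each die, the face 5 comes up with probability 1/8 (because the dice are weighted). This assumption does not constitute a statistical model: it gives the probability that both dice show 5 (1/8 × 1/8 = 1/64), but the probabilities of the other faces are unknown, so with the assumption alone we cannot calculate the probability of every event.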
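The property "we can calculate the probability of any event" can be checked by direct enumeration. The sketch below assumes the usual fair-dice reading of the first assumption (each face has probability 1/6 and the two dice are independent); the names `FACE_PROB` and `event_probability` are illustrative, not part of the source.

```python
from fractions import Fraction
from itertools import product

# Assumption (illustrative): each face of each die has probability 1/6,
# and the two dice are independent.
FACE_PROB = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(event):
    """Sum the probabilities of all outcomes (a, b) for which event(a, b) is true."""
    return sum(
        FACE_PROB[a] * FACE_PROB[b]
        for a, b in product(FACE_PROB, repeat=2)
        if event(a, b)
    )

# Any event on the pair of dice can be expressed as a predicate on (a, b).
p_double_five = event_probability(lambda a, b: a == 5 and b == 5)
p_sum_seven = event_probability(lambda a, b: a + b == 7)
print(p_double_five, p_sum_seven)
```

If only the probability of one face were known, the table `FACE_PROB` could not be filled in, and the same enumeration would be impossible for most events.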

In the above example, with the first assumption, calculating the probability of an event is easy. In some other examples, though, the calculation can be difficult or even impractical (for example, it might require millions of years of computation). For an assumption to constitute a statistical model, such difficulty is acceptable: the calculation does not need to be practicable, only theoretically possible.

Model Examples

Suppose we have a population of schoolchildren whose ages are uniformly distributed. A child's height will be stochastically related to age: for example, knowing that a child is 7 years old affects the probability that the child is 5 feet tall (about 152 cm). We could formalize this relationship in a linear regression model, for example: height_i = b0 + b1·age_i + ε_i, where b0 is the intercept, b1 is the parameter by which age is multiplied to obtain a prediction of height, ε_i is the error term, and i identifies the child. This implies that height is predicted by age, with some error.

A valid model must be consistent with all the data points. Thus, a straight line (height_i = b0 + b1·age_i) cannot be the equation for a model of the data unless it fits all the data points exactly, that is, unless all the data points lie perfectly on the line. The error term ε_i must be included in the equation so that the model is consistent with all the data points.

To perform statistical inference, we first need to assume some probability distribution for the ε_i. For example, we might assume that the ε_i are independent and identically distributed Gaussian variables with zero mean. In this case, the model has 3 parameters: b0, b1, and the variance of the Gaussian distribution.
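A minimal sketch of the three-parameter model follows. It simulates heights from a line plus Gaussian error and then recovers b0, b1 and the error variance by ordinary least squares. The "true" values `B0`, `B1`, `SIGMA`, the age range and the sample size are assumptions made up for illustration; they are not data from the source.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical "true" parameters, chosen only for illustration.
B0, B1, SIGMA = 80.0, 6.0, 5.0  # intercept (cm), slope (cm/year), error s.d. (cm)

# Ages drawn uniformly; heights follow the linear model with Gaussian error.
ages = [random.uniform(6, 16) for _ in range(500)]
heights = [B0 + B1 * a + random.gauss(0, SIGMA) for a in ages]

n = len(ages)
mean_age = sum(ages) / n
mean_height = sum(heights) / n

# Ordinary least squares estimates of the slope and the intercept.
sxx = sum((a - mean_age) ** 2 for a in ages)
sxy = sum((a - mean_age) * (h - mean_height) for a, h in zip(ages, heights))
b1_hat = sxy / sxx
b0_hat = mean_height - b1_hat * mean_age

# Unbiased estimate of the error variance (n - 2 degrees of freedom).
residuals = [h - (b0_hat + b1_hat * a) for a, h in zip(ages, heights)]
var_hat = sum(r ** 2 for r in residuals) / (n - 2)

print(b0_hat, b1_hat, var_hat)
```

The residuals play the role of the error term: no straight line passes through every simulated point, yet the model with an error term is consistent with all of them.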

General description

A statistical model is a special class of mathematical model. What distinguishes a statistical model from other mathematical models is that it is non-deterministic, which is what makes it suitable for modeling statistical data. Thus, in a statistical model specified via mathematical equations, some of the variables do not have specific values but instead have probability distributions; that is, some of the variables are stochastic. In the above example, ε is a stochastic variable; without that variable, the model would be deterministic.

Statistical models are often used even when the physical process being modeled is deterministic. For example, coin tossing is, in principle, a deterministic process; yet it is commonly modeled as stochastic (via a Bernoulli process).
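As a small illustration of treating coin tossing as a Bernoulli process, the sketch below draws independent 0/1 outcomes with a fixed success probability. The function name `bernoulli_process`, the probability 0.5 and the number of tosses are assumptions chosen for the example.

```python
import random

def bernoulli_process(p, n, rng):
    """Return n independent 0/1 draws, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(42)  # seeded generator for reproducibility
tosses = bernoulli_process(p=0.5, n=10_000, rng=rng)  # 1 = heads, 0 = tails
heads_fraction = sum(tosses) / len(tosses)
print(heads_fraction)
```

The model says nothing about the mechanics of the toss; it only specifies the distribution of outcomes, which is all that statistical inference about the coin requires.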

Parametric Models

Parametric models are the most commonly used statistical models. Regarding semi-parametric and non-parametric models, Sir David Cox said: “Typically, they include fewer assumptions about the structure and shape of the distribution, but usually contain strong assumptions about independence.” Like the other models discussed here, semi-parametric and non-parametric models are also widely used in statistical modeling.

Multilevel Models

Multilevel models (also known as hierarchical linear models, nested data models, mixed models, random coefficient models, random-effects models, random parameter models, or split-plot designs) are statistical models of parameters that vary at more than one level. An example is a model of student performance that contains measures for individual students as well as measures for the classrooms in which the students are grouped. These models can be seen as generalizations of linear models (in particular, linear regression), although they can also be extended to nonlinear models. They became much more popular once sufficient computing power and software became available.

Multilevel models are especially suitable for research designs where the data for participants are organized at more than one level (i.e., nested data). The units of analysis are usually individuals (at a lower level) who are nested within contextual or aggregate units (at a higher level). While the lowest level of data in multilevel models is usually an individual, repeated measurements of individuals can also be examined. Multilevel models thus provide an alternative to univariate or multivariate analysis of repeated measures, and individual differences in growth curves can be examined. In addition, multilevel models can be used as an alternative to ANCOVA, where scores on the dependent variable are adjusted for covariates (e.g., individual differences) before testing treatment differences. Multilevel models can analyze these experiments without the assumption of homogeneity of regression slopes that ANCOVA requires.

Multilevel models can be used on data with many levels, although two-level models are the most common. The dependent variable must be examined at the lowest level of analysis.
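A two-level structure can be sketched with simulated data: students (level 1) nested in classrooms (level 2), with a random intercept per classroom. The sketch below then estimates the two variance components with the classical one-way ANOVA (method of moments) estimators for a balanced design. All numeric values and names (`MU`, `TAU`, `SIGMA`, `N_CLASSES`, `N_STUDENTS`) are hypothetical, and dedicated mixed-model software would normally use likelihood-based estimation instead.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical values, for illustration only.
MU, TAU, SIGMA = 70.0, 4.0, 8.0  # grand mean, classroom s.d., student s.d.
N_CLASSES, N_STUDENTS = 200, 25  # balanced design: same class size everywhere

classes = []
for _ in range(N_CLASSES):
    u_j = rng.gauss(0, TAU)  # level-2 (classroom) random intercept
    scores = [MU + u_j + rng.gauss(0, SIGMA) for _ in range(N_STUDENTS)]
    classes.append(scores)   # level-1 (student) scores for this classroom

class_means = [mean(c) for c in classes]
grand_mean = mean(class_means)  # equals the overall mean in a balanced design

# Mean squares within and between classrooms.
ss_within = sum((y - m) ** 2 for c, m in zip(classes, class_means) for y in c)
ms_within = ss_within / (N_CLASSES * (N_STUDENTS - 1))
ss_between = N_STUDENTS * sum((m - grand_mean) ** 2 for m in class_means)
ms_between = ss_between / (N_CLASSES - 1)

# Method-of-moments estimates of the variance components.
sigma2_hat = ms_within                              # student-level variance
tau2_hat = (ms_between - ms_within) / N_STUDENTS    # classroom-level variance
icc_hat = tau2_hat / (tau2_hat + sigma2_hat)        # intraclass correlation

print(sigma2_hat, tau2_hat, icc_hat)
```

The intraclass correlation summarizes how much of the total variance sits at the classroom level, which is one common way to judge whether a multilevel model is needed at all.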

Model selection

Model selection is the task of choosing a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing data set is considered. However, the task can also involve designing experiments so that the data collected are well suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice (Occam's razor).
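The source does not name a selection criterion. As one possible choice, the sketch below uses the Akaike information criterion (AIC), which rewards goodness of fit and penalizes the number of parameters, to compare two candidate models on simulated data: a constant mean and a straight line. The data-generating values and the helper `gaussian_aic` are assumptions made for the example.

```python
import math
import random

def gaussian_aic(rss, n, k):
    """AIC for a Gaussian-error model, dropping constants shared by all candidates."""
    return n * math.log(rss / n) + 2 * k

rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 1.5 * xi + rng.gauss(0, 1.0) for xi in x]  # hypothetical data
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Candidate 1: constant mean (parameters: mean, variance).
rss_const = sum((yi - my) ** 2 for yi in y)

# Candidate 2: straight line (parameters: intercept, slope, variance).
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
rss_line = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

aic = {
    "constant": gaussian_aic(rss_const, n, k=2),
    "line": gaussian_aic(rss_line, n, k=3),
}
best = min(aic, key=aic.get)  # lower AIC is preferred
print(aic, best)
```

Because the simulated data have a strong slope, the extra parameter of the line is easily justified here; with candidates of similar fit, the penalty term would favor the simpler model, in the spirit of Occam's razor.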

Konishi & Kitagawa state: “Most statistical inference problems can be considered problems associated with statistical modeling.” Similarly, Cox said: “How a subject problem is translated into a statistical model is often the most important part of the analysis.”

Model selection may also refer to the problem of selecting a few representative models from a large set of computational models for the purpose of decision making or optimization under uncertainty.

Graphical Models

A graphical model, also called a probabilistic graphical model (PGM) or structured probabilistic model, is a probabilistic model in which a graph expresses the conditional dependence structure between random variables. Graphical models are commonly used in probability theory, statistics (especially Bayesian statistics), and machine learning.
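To make the role of the graph concrete, here is a minimal sketch of a three-node directed graphical model (a small Bayesian network) with edges rain -> sprinkler, rain -> wet and sprinkler -> wet. The joint distribution factorizes along the edges, and a conditional probability is obtained by enumeration. The variables and all probability values are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical conditional probability tables for the directed graph
# rain -> sprinkler, rain -> wet, sprinkler -> wet.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {
    True: {True: 0.01, False: 0.99},   # P(sprinkler | rain=True)
    False: {True: 0.40, False: 0.60},  # P(sprinkler | rain=False)
}
P_WET = {  # P(wet=True | rain, sprinkler)
    (True, True): 0.99, (True, False): 0.80,
    (False, True): 0.90, (False, False): 0.00,
}

def joint(rain, sprinkler, wet):
    """Joint probability, factorized along the edges of the graph."""
    p_w = P_WET[(rain, sprinkler)]
    return (P_RAIN[rain]
            * P_SPRINKLER[rain][sprinkler]
            * (p_w if wet else 1.0 - p_w))

booleans = [True, False]
total = sum(joint(r, s, w) for r, s, w in product(booleans, repeat=3))
p_wet = sum(joint(r, s, True) for r, s in product(booleans, repeat=2))
p_rain_given_wet = sum(joint(True, s, True) for s in booleans) / p_wet
print(total, p_wet, p_rain_given_wet)
```

The graph reduces the number of values that must be specified: each node needs only a table conditioned on its parents, not on every other variable.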

Econometric models

Econometric models are statistical models used in econometrics. An econometric model specifies the statistical relationships that are believed to hold between the various economic quantities pertaining to a particular economic phenomenon. An econometric model can be derived from a deterministic economic model by allowing for uncertainty, or from an economic model that is itself stochastic. However, it is also possible to use econometric models that are not tied to any specific economic theory.
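As an illustration of deriving an econometric model from a deterministic one, consider a hypothetical linear consumption function, in which consumption C depends on income Y. The example and its symbols (a, b, the time index t, the error term ε_t) are assumptions introduced here, not taken from the source. Allowing for uncertainty amounts to adding a stochastic error term:

```latex
% Deterministic economic model (hypothetical consumption function)
C = a + b\,Y

% Econometric version: the same relation, observed at times t = 1, ..., T,
% with a zero-mean stochastic error term
C_t = a + b\,Y_t + \varepsilon_t, \qquad \mathbb{E}[\varepsilon_t] = 0
```

With a distributional assumption on ε_t, the parameters a and b can then be estimated from observed data in the same way as b0 and b1 in the regression example above.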

Source: https://habr.com/ru/post/F32816/

