Statistical significance: definition, concept, significance, regression equations and hypothesis testing

Statistics has long been an integral part of life, and people encounter it everywhere. Statistics are used to draw conclusions about which diseases are common and where, and about what is most in demand in a particular region or among a certain segment of the population. Even the political programs of candidates for government are built on statistical data. Retail chains rely on it when purchasing goods, and manufacturers are guided by it when shaping their offers.

Statistics plays an important role in society and affects each individual, even in small things. For example, if statistics show that most people in a particular city or region prefer dark clothes, it will be extremely difficult to find a bright yellow floral coat in local shops. But what quantities make up this data? For example, what is “statistical significance”? What exactly does this term mean?

What is it?

Statistics as a science combines many different quantities and concepts. One of them is “statistical significance”. A result is called statistically significant when it is very unlikely to have arisen by chance alone, that is, when the probability of seeing a different outcome is negligible.

For example, 9 out of 10 people wear rubber boots when they go mushroom picking in an autumn forest the morning after a rainy night. The likelihood that 8 of them would turn up in canvas loafers instead is negligible. In this deliberately simplified example, the figure of 9 out of 10 is the statistically significant indicator: a pattern too strong to be explained by chance.

Developing this practical example further, shoe stores buy larger quantities of rubber boots towards the end of the summer season than at other times of the year. In this way, a statistically significant indicator affects ordinary life.

Of course, complex calculations, such as predicting the spread of viruses, take a large number of variables into account. But the essence of identifying a significant indicator in statistical data is the same, regardless of the complexity of the calculations and the number of variables.

How to calculate?

Statistical significance is calculated using equations, so in this case everything is decided by mathematics. The simplest version of the calculation is a chain of arithmetic steps involving the following parameters:

  • two sets of results obtained from surveys or from objective data, for example the amounts spent on purchases, denoted a and b;
  • the sample size of each group, n;
  • the share in the combined (pooled) sample, p;
  • the standard error, SE.

The next step is to determine the overall test statistic, t, and compare its value with 1.96. The number 1.96 is the critical value that bounds the central 95% of the distribution. Strictly speaking, it comes from the standard normal distribution, which Student's t-distribution approaches for large samples.

[Figure: formula for the simple calculation]
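
The formula image from the original article is not reproduced in this text. A standard two-proportion test that matches the parameters listed above can be written as follows. This is an assumption about what the figure showed, with p_1 and p_2 denoting the shares observed in the two groups and n_1 and n_2 the group sizes:

```latex
p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}, \qquad
SE = \sqrt{p\,(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad
t = \frac{p_1 - p_2}{SE}
```

If the absolute value of t exceeds 1.96, the difference between the two groups is treated as significant at the 5% level.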

A question that often arises is how n and p differ. An example makes this clear. Suppose we are calculating the statistical significance of the difference in loyalty to a product or brand between men and women.

In this case, the letter designations will be the following:

  • n is the number of respondents;
  • p is the share of respondents satisfied with the product, that is, the number of satisfied respondents divided by n.

The number of women interviewed is denoted n1, and the number of men n2. The indices “1” and “2” attached to p have the same meaning.

Comparing the test statistic with the critical values from Student's t tables is what determines whether the result is statistically significant.
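
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: it assumes the pooled two-proportion form of the test, and the function name and the survey numbers are hypothetical.

```python
from math import sqrt


def two_proportion_test(satisfied_1, n_1, satisfied_2, n_2):
    """Return (t, SE, pooled p) for a pooled two-proportion test.

    Assumed form of the test; the argument names are hypothetical.
    """
    p_1 = satisfied_1 / n_1          # share satisfied in group 1
    p_2 = satisfied_2 / n_2          # share satisfied in group 2
    p = (satisfied_1 + satisfied_2) / (n_1 + n_2)   # pooled share
    se = sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))    # standard error
    t = (p_1 - p_2) / se             # test statistic
    return t, se, p


# Hypothetical survey: 120 of 200 women and 90 of 200 men are satisfied.
t, se, p = two_proportion_test(120, 200, 90, 200)
significant = abs(t) > 1.96          # 5% significance level
print(f"pooled p = {p:.3f}, SE = {se:.4f}, t = {t:.2f}, significant: {significant}")
```

With these made-up numbers t comes out at about 3.0, which exceeds 1.96, so the difference between the groups would be treated as significant at the 5% level.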

What is meant by verification?

The results of any mathematical calculation can always be checked; children are taught this in elementary school. It is logical to assume that since statistical indicators are obtained through a chain of calculations, they can be checked too.

However, verifying statistical significance is not only a matter of mathematics. Statistics deals with a large number of variables and probabilities, and not all of them can be calculated. To return to the rubber boots example from the beginning of the article: the statistical reasoning that shop buyers rely on can be upset by dry, hot weather that is untypical for autumn. As a result, fewer people will buy rubber boots, and retail outlets will suffer losses. A mathematical formula, of course, cannot foresee a weather anomaly. This is what is called an “error”.

It is precisely the probability of such errors that is taken into account when checking the calculated significance. The check considers the calculated indicators, the accepted significance levels, and the assumptions known as hypotheses.

What is the significance level?

The “level” is one of the main criteria of statistical significance. It is used in both applied and practical statistics. It is a value that expresses the probability of possible deviations or errors.

The level is used to identify differences between samples and to establish whether those differences are significant or merely random. The concept has not only numerical values but also conventional interpretations that explain how each value should be read. The level itself is found by comparing the result with a tabulated critical value, which reveals how significant the differences are.

Put simply, the significance level is the permissible probability of error in the conclusions drawn from the statistical data.

What significance levels are used?

In practice, three basic significance levels are used for the probability of error.

The first level is the 5% threshold. The probability of error does not exceed 5%, so the conclusions drawn from the statistical study can be relied on with 95% confidence.

The second level is the threshold of 1%. Accordingly, this figure means that you can be guided by the data obtained during statistical calculations with a confidence of 99%.

The third level is 0.1%. With this value, the probability of an error is equal to a fraction of a percent, that is, errors are practically eliminated.
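
Each of these levels corresponds to a critical value that the test statistic must exceed. The sketch below is an illustration under stated assumptions: it uses the standard normal distribution from Python's standard library and a two-sided test, and the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-sided critical value of the standard normal for level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# The three significance levels named in the text: 5%, 1% and 0.1%.
for alpha in (0.05, 0.01, 0.001):
    print(f"significance level {alpha:.1%}: |t| must exceed {critical_value(alpha):.2f}")
```

The 5% level reproduces the familiar 1.96. The stricter levels require a larger test statistic before a difference is accepted as significant.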

What is a hypothesis in statistics?

Errors are divided into two kinds, depending on whether the null hypothesis is accepted or rejected. A hypothesis is an assumption about what lies behind a set of survey results, other data, or statements. In other words, it is a description of the probability distribution of whatever is being studied.

In simple calculations there are two hypotheses: the null and the alternative. The null hypothesis assumes that there are no fundamental differences between the samples being compared, and the alternative is its exact opposite. That is, the alternative hypothesis assumes that there is a significant difference between the samples.
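
For a comparison of two groups, such as the men and women in the loyalty example, the pair of hypotheses can be written compactly. The symbols p_1 and p_2 for the true shares in each group are assumed notation, not taken from the original article:

```latex
H_0 : p_1 = p_2 \qquad \text{versus} \qquad H_1 : p_1 \neq p_2
```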

What are the mistakes?

Errors in statistics depend directly on which hypothesis is accepted as true. They fall into two types:

  • the first type arises when the null hypothesis is rejected although it is actually true;
  • the second arises when the null hypothesis is accepted although the alternative is actually true.

An error of the first type is called a false positive and occurs quite often in all areas where statistics are used. An error of the second type is called a false negative.
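
Both kinds of error can be observed in a small simulation. The sketch below is only an illustration: it assumes the pooled two-proportion test at the 5% level, and the group size, true shares, trial count, and function names are all hypothetical choices.

```python
import random
from math import sqrt


def is_significant(k_1, n_1, k_2, n_2, critical=1.96):
    """Pooled two-proportion test: True if the null hypothesis is rejected."""
    p = (k_1 + k_2) / (n_1 + n_2)
    se = sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))
    if se == 0:                      # all answers identical: nothing to test
        return False
    return abs(k_1 / n_1 - k_2 / n_2) / se > critical


def rejection_rate(p_1, p_2, n=200, trials=2000, seed=0):
    """Share of simulated surveys in which the null hypothesis is rejected."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        k_1 = sum(rng.random() < p_1 for _ in range(n))
        k_2 = sum(rng.random() < p_2 for _ in range(n))
        rejections += is_significant(k_1, n, k_2, n)
    return rejections / trials


# Null hypothesis true (equal shares): every rejection is a false positive.
false_positive_rate = rejection_rate(0.5, 0.5)
# Alternative true (shares differ): every non-rejection is a false negative.
false_negative_rate = 1 - rejection_rate(0.5, 0.6)
print(f"false positives: {false_positive_rate:.3f}, false negatives: {false_negative_rate:.3f}")
```

When the groups really are identical, the share of false positives should land close to the chosen 5% level. The share of false negatives depends on how large the real difference is and on the sample size.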

Why do we need regression in statistics?

The statistical significance of a regression shows how well a model of dependencies calculated from the data corresponds to reality. It also makes it possible to judge whether the factors taken into account are sufficient for drawing conclusions.

The significance of a regression is determined by comparing the calculated result with the critical values listed in Fisher's tables, or by using analysis of variance. Regression indicators are important in complex statistical studies and calculations that involve a large number of variables, random data, and likely changes.
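
As a rough illustration of the comparison with Fisher's tables, the sketch below computes the F statistic for a simple linear regression with one predictor. The data are made up, the function name is hypothetical, and the critical value 5.32 is the tabulated 5% point of the F distribution with 1 and 8 degrees of freedom.

```python
def regression_f_statistic(xs, ys):
    """F statistic for a least-squares line fitted to paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation explained by the line versus variation left in the residuals.
    ss_regression = sum((intercept + slope * x - mean_y) ** 2 for x in xs)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # One predictor: 1 degree of freedom for the model, n - 2 for the residuals.
    return (ss_regression / 1) / (ss_residual / (n - 2))


# Made-up data: ten observations that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

F_CRITICAL_1_8 = 5.32                # tabulated 5% critical value for F(1, 8)
f_value = regression_f_statistic(xs, ys)
print(f"F = {f_value:.1f}, regression significant: {f_value > F_CRITICAL_1_8}")
```

If the calculated F exceeds the tabulated value, the regression is considered significant at the 5% level. A different number of observations or predictors requires a different row and column of the table.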

Source: https://habr.com/ru/post/C3812/

