The confidence interval is a concept from statistics. It is a range, calculated from sample data, that is used to estimate an unknown parameter with a stated degree of reliability. This is most easily illustrated by an example.
Suppose you want to investigate some random variable, for example, the server's response time to a client's request. Each time a user types the address of a particular site, the server responds in a different amount of time, so the response time under investigation is random. The confidence interval lets you determine bounds for a parameter of this variable, such as its mean: it can then be stated with 95% confidence that the mean server response time lies in the range we calculated.
Or suppose you need to find out how many people know about the company's brand. Once the confidence interval is calculated, it will be possible to say, for example, that with 95% probability the proportion of consumers who know about this brand is in the range from 27% to 34%.
Closely related to this term is the confidence probability (confidence level). It represents the probability that the desired parameter falls within the confidence interval. The width of the range depends on this value: the higher the confidence probability, the wider the confidence interval, and vice versa. Usually it is set to 90%, 95% or 99%; 95% is the most popular choice.
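The effect of the confidence probability on the width of the range can be seen from the critical values of the standard normal distribution. The following is a minimal sketch in Python, assuming a two-sided interval and a normally distributed estimate; the helper name critical_value is hypothetical and used only for illustration.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    # Split the remaining probability equally between the two tails.
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.99):
    # The half-width of the interval is proportional to this multiplier.
    print(f"{level:.0%}: {critical_value(level):.3f}")
```

The multipliers come out at roughly 1.645, 1.960 and 2.576, so moving from 90% to 99% makes the interval noticeably wider for the same data.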
The width of the interval is also influenced by the variance of the observations and by the sample size: a larger variance widens the interval, while a larger sample narrows it. The calculation is based on the assumption that the trait under investigation follows the normal distribution law, also known as the Gauss law. Under this law, a continuous random variable has a bell-shaped probability density that is fully determined by its mathematical expectation and variance. If the assumption of a normal distribution turns out to be wrong, the estimate may be incorrect.
First, let's figure out how to calculate the confidence interval for the mathematical expectation. Two cases are possible here: the variance (the degree of variation of a random variable) may be known or unknown. If it is known, the confidence interval is calculated using the following formula:
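x̄ - t * σ / sqrt(n) <= α <= x̄ + t * σ / sqrt(n), where
x̄ is the sample mean,
α is the mathematical expectation of the trait being estimated,
t is a parameter found from the table of the Laplace function (standard normal distribution) for the chosen confidence probability,
sqrt(n) is the square root of the total sample size,
σ is the standard deviation (the square root of the variance).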
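The known-variance case can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name mean_ci_known_sigma and the response-time figures are hypothetical, and t is taken from the standard normal distribution via statistics.NormalDist instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the standard deviation is known."""
    # t is the two-sided critical value of the standard normal distribution.
    t = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = t * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical data: 100 response times with mean 250 ms and known sigma 40 ms.
low, high = mean_ci_known_sigma(250.0, 40.0, 100)
print(f"{low:.2f} ms <= mean <= {high:.2f} ms")
```

With these made-up numbers the interval is about 242.16 ms to 257.84 ms. Quadrupling the sample size would halve the half-width, because n enters the formula under a square root.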
If the variance is unknown, it can be estimated from the sample, provided we know all the observed values of the trait. To do this, use the following formula:
σ² = (x²)av - (x̄)², where
(x²)av is the average value of the squares of the investigated trait,
(x̄)² is the square of the average value of this trait.
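The variance formula above translates directly into code. The sketch below is in Python; the function name variance_from_moments and the sample values are hypothetical. Note that this form divides by n, so it gives the variance of the observed values themselves; the unbiased estimate divides the sum of squared deviations by n - 1 instead.

```python
def variance_from_moments(values):
    """Variance as the mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return mean_of_squares - mean ** 2


# Hypothetical measurements, chosen only to illustrate the arithmetic.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance_from_moments(data))
```

For this data the mean is 5, the mean of the squares is 29, and the variance is 29 - 25 = 4.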
The formula for the confidence interval changes slightly in this case:
x̄ - t * s / sqrt(n) <= α <= x̄ + t * s / sqrt(n), where
x̄ is the sample mean,
α is the mathematical expectation of the trait being estimated,
t is a parameter found from the Student distribution table, t = t (ɣ; n-1), where ɣ is the confidence probability and n-1 is the number of degrees of freedom,
sqrt(n) is the square root of the total sample size,
s is the sample standard deviation (the square root of the sample variance).
Consider the following example. Suppose that, based on the results of 7 measurements, the mean of the studied trait was found to be 30 and the sample variance 36. We need to find a confidence interval that contains the true value of the measured parameter with a probability of 99%.
First, we determine t from the Student distribution table: t = t (0.99; 7-1) = 3.71. With the sample mean x̄ = 30 and the standard deviation s = sqrt(36) = 6, the formula above gives:
x̄ - t * s / sqrt(n) <= α <= x̄ + t * s / sqrt(n)
30 - 3.71 * 6 / sqrt(7) <= α <= 30 + 3.71 * 6 / sqrt(7)
21.587 <= α <= 38.413
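The arithmetic of this example can be checked with a short Python sketch. The function name mean_ci_student is hypothetical, and the value t = 3.71 is passed in as read from the Student distribution table, since the Python standard library does not provide it; the square root of the variance 36 gives s = 6.

```python
from math import sqrt


def mean_ci_student(sample_mean, sample_variance, n, t_value):
    """Confidence interval for the mean when the variance is estimated."""
    # s is the standard deviation, i.e. the square root of the variance.
    s = sqrt(sample_variance)
    half_width = t_value * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Values from the example: mean 30, variance 36, n = 7, t(0.99; 6) = 3.71.
low, high = mean_ci_student(30.0, 36.0, 7, 3.71)
print(f"{low:.3f} <= a <= {high:.3f}")
```

The printed bounds are 21.587 and 38.413, matching the result obtained by hand.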
A confidence interval for the variance can be calculated both when the mathematical expectation is known and when it is unknown and only the unbiased point estimate of the variance is available. We will not give the formulas here, since they are rather complex and can easily be found online if needed.
We only note that the confidence interval is conveniently calculated using Excel or an online confidence interval calculator.