Information entropy: definition of the concept, properties, and systems

The information entropy of a value is defined via the negative logarithm of its probability mass function. Thus, when a data source produces a low-probability value (that is, when a low-probability event occurs), the event carries more “information” (“surprise”) than when the source produces a high-probability value.

The amount of information conveyed by each event defined in this way becomes a random variable whose expected value is the information entropy. Typically, entropy refers to disorder or uncertainty, and the definition used in information theory is directly analogous to the one used in statistical thermodynamics. The concept of information entropy was introduced by Claude Shannon in his 1948 article, “A Mathematical Theory of Communication”. This is where the term “Shannon information entropy” comes from.
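In symbols, this is the standard definition restated (b is the logarithm base, 2 for bits):

```latex
% Self-information of a single outcome x with probability p(x):
\[ I(x) = -\log_b p(x) \]
% Entropy of a discrete random variable X is the expected self-information:
\[ H(X) = \mathbb{E}[I(X)] = -\sum_{i=1}^{n} p(x_i)\,\log_b p(x_i) \]
```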

Graph of information entropy.

Definition and system

The basic model of a data transmission system consists of three elements: a data source, a communication channel, and a receiver. As Shannon put it, the “fundamental problem of communication” is for the receiver to identify what data was generated by the source from the signal it receives over the channel. Entropy provides an absolute limit on the shortest possible average code length for lossless compression of the source data. If the source entropy is less than the capacity of the communication channel, the data it generates can be reliably transmitted to the receiver (at least in theory, possibly neglecting practical considerations such as the complexity of the system required to transmit the data and the time the transmission may take).
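As an illustration (a minimal sketch, not from the original article): the snippet below computes the entropy of a hypothetical four-symbol source and compares it with an assumed channel capacity in bits per symbol; the probabilities and the capacity value are invented for the example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical source: four symbols with assumed probabilities.
source_probs = [0.5, 0.25, 0.125, 0.125]
H = entropy_bits(source_probs)      # 1.75 bits per symbol

channel_capacity = 2.0              # assumed channel capacity, bits per symbol
print(f"Source entropy: {H:.2f} bits/symbol")
print("Reliable transmission possible in principle:", H < channel_capacity)
```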

Information entropy is usually measured in bits (also called “shannons”), or sometimes in “natural units” (nats) or decimal digits (called “dits”, “bans”, or “hartleys”). The unit of measurement depends on the base of the logarithm used to define entropy.
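For example, the entropy of one and the same distribution can be expressed in any of these units simply by changing the logarithm base (a small sketch with an arbitrary example distribution):

```python
import math

def entropy(probs, base):
    """Entropy of a discrete distribution in the units given by the log base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]           # arbitrary example distribution

print(entropy(probs, 2))            # 1.5    bits (shannons)
print(entropy(probs, math.e))       # ~1.040 nats
print(entropy(probs, 10))           # ~0.452 hartleys (bans, dits)
```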


Properties and logarithm

The logarithm of the probability distribution is useful as a measure of entropy because it is additive for independent sources. For example, the entropy of a fair coin toss is 1 bit, and the entropy of m independent tosses is m bits. In a straightforward representation, log2(n) bits are needed to encode a variable that can take one of n values if n is a power of 2. If these values are equally likely, the entropy (in bits) is equal to that number. If one of the values is more likely than the others, observing that value is less informative than observing some less common outcome. Conversely, rarer events provide more information when they are observed.
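A short worked check of these numbers, following the definition above:

```latex
% Fair coin: two outcomes with probability 1/2 each.
\[ H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit} \]
% m independent fair tosses: probabilities multiply, so logarithms add.
\[ H(X_1,\dots,X_m) = \sum_{k=1}^{m} H(X_k) = m \text{ bits} \]
% n equally likely values:
\[ H = -\sum_{i=1}^{n} \tfrac{1}{n}\log_2\tfrac{1}{n} = \log_2 n \]
```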

Since less likely events are observed less frequently, the net result is that the entropy (regarded as average information) obtained from unevenly distributed data is always less than or equal to log2(n). Entropy is zero when the outcome is fully determined.
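A minimal sketch illustrating both claims on a made-up four-outcome variable (the specific probabilities are assumptions chosen for the example):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 4
uniform       = [0.25, 0.25, 0.25, 0.25]   # all outcomes equally likely
skewed        = [0.7, 0.1, 0.1, 0.1]       # one outcome dominates
deterministic = [1.0, 0.0, 0.0, 0.0]       # the outcome is certain

print(math.log2(n))                 # 2.0    -- the upper bound log2(n)
print(entropy_bits(uniform))        # 2.0    -- equals the bound
print(entropy_bits(skewed))         # ~1.357 -- strictly below the bound
print(entropy_bits(deterministic))  # 0.0    -- zero when one result is determined
```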

Shannon's information entropy quantifies these considerations when the probability distribution of the source data is known. The meaning of the observed events (the meaning of the messages) does not matter in the definition of entropy. Entropy takes into account only the probability of observing a particular event, so the information it encapsulates concerns the underlying probability distribution, not the significance of the events themselves. The properties of information entropy remain as described above.

Shannon's formula

Information theory

The basic idea of information theory is that the more a person already knows about a topic, the less new information can be obtained about it. If an event is very likely, its occurrence is not surprising and therefore provides little new information. Conversely, if the event is improbable, its occurrence is much more informative. Consequently, the information content is an increasing function of the inverse of the event's probability (1/p).
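Written out, the information content of an event with probability p is, in bits:

```latex
\[ I(p) = \log_2\frac{1}{p} = -\log_2 p \]
% Shannon's formula for entropy is the probability-weighted average of this quantity:
\[ H = \sum_{i} p_i \log_2\frac{1}{p_i} = -\sum_{i} p_i \log_2 p_i \]
```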

Now, if several events can happen, entropy measures the average information content you can expect to obtain when one of them occurs. This means that rolling a die has more entropy than tossing a coin, because each outcome of the die roll is less likely than each outcome of the coin toss.
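A quick numerical comparison (a sketch assuming a fair six-sided die and a fair coin):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = [1/2] * 2    # fair coin: two equally likely outcomes
die  = [1/6] * 6    # fair six-sided die: six equally likely outcomes

print(entropy_bits(coin))   # 1.0    bit
print(entropy_bits(die))    # ~2.585 bits, i.e. log2(6) -- more than the coin
```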


Features

Thus, entropy is a measure of the unpredictability of a state or, equivalently, of its average information content. For an intuitive understanding of these terms, consider the example of a political poll. Typically, such polls are conducted because the results of, for example, an election are not yet known.

In other words, the results of the poll are relatively unpredictable, and actually conducting it and studying the data provides some new information; these are just different ways of saying that the a priori entropy of the poll results is high.

Now consider the case where the same poll is conducted a second time shortly after the first. Since the result of the first poll is already known, the outcome of the second poll can be predicted well, and its results should not contain much new information; in this case, the a priori entropy of the second poll's result is small compared to that of the first.


Coin toss

Now consider the example of a coin toss. If the probability of tails equals the probability of heads, the entropy of the coin toss is as high as it can be for an event with two outcomes, which makes it a standard example of the information entropy of a system.

This is because it is impossible to predict the outcome of a coin toss ahead of time: if we have to choose, the best we can do is predict that the coin will land tails, and this prediction will be correct with probability 1/2. Such a coin toss has one bit of entropy, since there are two possible outcomes that occur with equal probability, and learning the actual outcome yields one bit of information.

By contrast, a toss of a coin with tails on both sides and no heads has zero entropy, since the coin will always land on that side and the result can be perfectly predicted.
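Both cases follow from the entropy of a two-outcome variable; a small sketch (the function name binary_entropy is ours, not from the article):

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that lands tails with probability p."""
    if p in (0.0, 1.0):
        return 0.0          # the outcome is certain, so there is no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 -- fair coin: one bit of entropy per toss
print(binary_entropy(1.0))  # 0.0 -- two-tailed coin: perfectly predictable
```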


Conclusion

If a compression scheme is lossless, that is, the entire original message can always be restored by decompressing, then the compressed message carries the same amount of information as the original but uses fewer characters. That is, it carries more information, or higher entropy, per character. This means that the compressed message has less redundancy.

Roughly speaking, Shannon's source coding theorem states that a lossless compression scheme cannot, on average, compress messages so that they carry more than one bit of information per bit of the compressed message, but any rate of less than one bit of information per bit can be achieved with a suitable coding scheme. The entropy of a message, in bits per symbol, multiplied by its length, is a measure of how much information the message contains in total.
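As a rough illustration (a sketch, not from the original article): estimate the per-character entropy of a message from its character frequencies, multiply by its length, and compare with the size of a lossless (zlib) compression. The message text below is made up, and the frequency-based estimate ignores dependencies between characters, so a real compressor can do noticeably better than this estimate.

```python
import math
import zlib
from collections import Counter

message = b"information entropy measures the average information per symbol " * 20

# Zeroth-order entropy estimate from empirical character frequencies.
counts = Counter(message)
n = len(message)
H = -sum((c / n) * math.log2(c / n) for c in counts.values())

print(f"Estimated entropy : {H:.3f} bits/character")
print(f"Entropy * length  : {H * n / 8:.0f} bytes of information (estimate)")
print(f"Original size     : {n} bytes")
print(f"zlib-compressed   : {len(zlib.compress(message, 9))} bytes")
```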

Source: https://habr.com/ru/post/F13879/

