Pseudo-random number: methods of obtaining, advantages and disadvantages

A pseudo-random number is a number produced by a deterministic generator rather than by a genuinely random process. A pseudo-random number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate those of sequences of random numbers. The sequence a PRNG generates is not truly random, because it is completely determined by an initial value, called the seed, which may itself include truly random values. Although sequences closer to truly random can be produced with hardware random number generators, pseudo-random number generators are important in practice for their speed and reproducibility.

Application

PRNGs are central to applications such as simulation (for example, the Monte Carlo method), electronic games (for example, procedural generation) and cryptography. Cryptographic applications require that the output not be predictable from earlier outputs, so they need more elaborate algorithms that do not inherit the linearity of simpler PRNGs.
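As a rough illustration of the Monte Carlo use case, the sketch below estimates pi by sampling points in the unit square with a seeded PRNG. The function name `estimate_pi` and the sample count are illustrative choices, not part of any standard recipe.

```python
import random

def estimate_pi(samples, seed):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeded PRNG, so the run is reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point lies inside the quarter circle
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000, seed=0))
```

Because the generator is seeded, repeating the call with the same arguments gives the same estimate, which is the reproducibility that makes PRNGs convenient for simulation.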

Requirements

Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is needed to be confident that a PRNG generates numbers close enough to random to suit the intended use.

John von Neumann cautioned against mistaking a PRNG for a truly random generator and joked that "anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."

Using

A PRNG can be started from an arbitrary initial state, using a seed. It always produces the same sequence when initialized with that state. The period of a PRNG is defined as the maximum, over all starting states, of the length of the repetition-free prefix of the sequence. The period is bounded by the number of states, which is usually measured in bits. Because the potential period doubles with each bit of state added, it is easy to build PRNGs with periods long enough for many practical applications.
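The reproducibility described above can be seen with Python's standard `random` module. This is only a small sketch; the helper name `first_outputs` and the seed values are made up for the example.

```python
import random

def first_outputs(seed, count=5):
    """Return the first `count` outputs of a generator started from `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(count)]

run_a = first_outputs(seed=2024)
run_b = first_outputs(seed=2024)
print(run_a == run_b)  # same seed, same sequence

# Upper bound on the period for a state of n bits: 2**n, doubling per extra bit.
for n_bits in (8, 9, 10):
    print(n_bits, 2 ** n_bits)
```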

Large randomization schedules

If a PRNG's internal state contains n bits, its period can be no longer than 2^n results, and may be much shorter. For some PRNGs, the period length can be calculated without walking through the whole period. Linear feedback shift registers (LFSRs) are usually chosen to have periods of exactly 2^n - 1.

Linear congruential generators have periods that can be calculated by factoring. Although PRNGs repeat their results after they reach the end of their period, a repeated result does not imply that the end of the period has been reached, since the internal state may be larger than the output; this is particularly obvious with PRNGs that have a one-bit output.
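To make the LFSR period concrete, here is a minimal sketch of a 4-bit LFSR whose nonzero states form a single cycle of length 2^4 - 1 = 15. The tap positions (the two lowest bits) and the helper names are choices made for this example, not taken from the text.

```python
def lfsr4_step(state):
    """Advance a 4-bit Fibonacci LFSR by one step."""
    feedback = (state ^ (state >> 1)) & 1   # XOR of the two lowest bits
    return (state >> 1) | (feedback << 3)   # shift right, feed the new bit in on top

def cycle_length(step, start):
    """Count steps until the state first returns to `start`."""
    state, count = step(start), 1
    while state != start:
        state = step(state)
        count += 1
    return count

print(cycle_length(lfsr4_step, 0b0001))  # 15
```

The all-zero state maps to itself, which is why the maximal period is 2^n - 1 rather than 2^n.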
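One standard way factoring enters is the Hull-Dobell theorem, which is not named in the text above and is brought in here as an assumption: an LCG x -> (a*x + c) mod m with c != 0 has full period m exactly when c is coprime to m, a - 1 is divisible by every prime factor of m, and a - 1 is divisible by 4 whenever m is. The sketch below checks this on a toy modulus and also shows a one-bit output repeating long before the state does. All parameter values are illustrative.

```python
from math import gcd

def prime_factors(n):
    """Return the set of prime factors of n by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, c, m):
    """Hull-Dobell conditions for an LCG with nonzero increment c."""
    return (gcd(c, m) == 1
            and all((a - 1) % p == 0 for p in prime_factors(m))
            and (m % 4 != 0 or (a - 1) % 4 == 0))

def lcg_period(a, c, m, seed=0):
    """Measure the period by brute force; None if the seed never recurs."""
    x = seed
    for n in range(1, m + 1):
        x = (a * x + c) % m
        if x == seed:
            return n
    return None

print(has_full_period(5, 3, 16), lcg_period(5, 3, 16))  # True 16
print(has_full_period(3, 3, 16), lcg_period(3, 3, 16))  # False 8

# One-bit output (top bit of a 4-bit state): it repeats at once, the state does not.
x, states = 0, []
for _ in range(4):
    states.append(x)
    x = (5 * x + 3) % 16
print(states, [s >> 3 for s in states])  # [0, 3, 2, 13] [0, 0, 0, 1]
```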

Possible mistakes

The defects exhibited by flawed PRNGs range from unnoticeable (and unknown) to very obvious. An example is the RANDU random number algorithm, which was used on mainframes for decades. It was seriously flawed, but its inadequacy went undetected for a very long time.
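The RANDU flaw can be checked directly. The sketch below assumes the usual definition of RANDU, x(k+1) = 65539 * x(k) mod 2^31, which the text above does not spell out. Since 65539 = 2^16 + 3, squaring the multiplier gives the relation x(k+2) = 6 * x(k+1) - 9 * x(k) mod 2^31, so every output is a fixed linear combination of the previous two.

```python
M = 2 ** 31

def randu(x):
    """One step of RANDU (assumed definition: multiplier 65539, modulus 2**31)."""
    return (65539 * x) % M

def check_relation(seed, steps=1000):
    """Verify x[k+2] == 6*x[k+1] - 9*x[k] (mod 2**31) along the sequence."""
    x0, x1 = seed, randu(seed)
    for _ in range(steps):
        x2 = randu(x1)
        if x2 != (6 * x1 - 9 * x0) % M:
            return False
        x0, x1 = x1, x2
    return True

print(check_relation(seed=1))  # True
```

This strong dependence between consecutive triples is the kind of defect that simple single-value statistics would not reveal.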

Number Generator Operation

In many fields, research that relied on random sampling, Monte Carlo simulations, or other PRNG-based methods was much less reliable than it could have been, as a result of using low-quality PRNGs. Even today, caution is sometimes required, as evidenced by the warning given in the International Encyclopedia of Statistical Science (2010).

Success Case Study

As an illustration, consider the widely used Java programming language. As of 2017, Java still relied on a linear congruential generator (LCG) for its PRNG.
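For reference, the following is a Python sketch of the LCG recurrence documented for `java.util.Random`: a 48-bit state, multiplier 0x5DEECE66D, increment 0xB, and outputs taken from the high bits of the state. The class and method names here are hypothetical, and this is an illustration of the recurrence rather than a drop-in replacement for the Java class.

```python
MASK48 = (1 << 48) - 1
MULTIPLIER = 0x5DEECE66D
INCREMENT = 0xB

class JavaStyleLcg:
    """Sketch of a 48-bit LCG in the style documented for java.util.Random."""

    def __init__(self, seed):
        # The seed is scrambled by XOR with the multiplier and cut to 48 bits.
        self.state = (seed ^ MULTIPLIER) & MASK48

    def next_bits(self, bits):
        """Advance the state and return its top `bits` bits (1 to 32)."""
        self.state = (self.state * MULTIPLIER + INCREMENT) & MASK48
        value = self.state >> (48 - bits)
        if value >= 1 << 31:          # reinterpret as a signed 32-bit integer
            value -= 1 << 32
        return value

    def next_int(self):
        return self.next_bits(32)

gen = JavaStyleLcg(42)
print([gen.next_int() for _ in range(3)])
```

Only the upper bits of the state are exposed because the low-order bits of a power-of-two-modulus LCG have very short periods.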

History

The first PRNG to avoid major problems while still running fairly quickly was the Mersenne Twister (discussed below), published in 1998. Other high-quality PRNGs have been developed since then.

Generation Description

In the second half of the 20th century, the standard class of algorithms used for PRNGs was the linear congruential generator. The quality of LCGs was known to be inadequate, but better methods were not available. Press et al. (2007) described the result as follows: "If all scientific papers whose results are in doubt because of [LCGs and related] were to disappear from library shelves, there would be a gap on each shelf about as big as your fist."

A major advance in the construction of pseudo-random generators was the introduction of techniques based on linear recurrences over the two-element field; such generators are related to linear feedback shift registers. They served as the basis for later pseudo-random number generators.

In particular, the 1997 invention of the Mersenne Twister avoided many of the problems of earlier generators. The Mersenne Twister has a period of 2^19937 − 1 iterations (≈ 4.3 × 10^6001). It is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and at the time of its introduction it ran faster than other statistically sound generators.

In 2003, George Marsaglia introduced the family of xorshift generators, also based on a linear recurrence. Such generators are extremely fast and, combined with a non-linear operation, pass rigorous statistical tests.

In 2006, the WELL family of generators was developed. WELL generators in some ways improve on the quality of the Mersenne Twister, which has a very large state space and recovers very slowly from states that contain a large number of zeros.
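The size of the Mersenne Twister period can be checked with a short logarithm calculation. The sketch below also uses the standard MT19937 layout of 624 32-bit state words with 31 bits unused, which is an assumption brought in from the algorithm's usual description rather than from the text above.

```python
import math

# log10(2**19937); subtracting 1 does not change the leading digits.
log10_period = 19937 * math.log10(2)
exponent = math.floor(log10_period)
mantissa = 10 ** (log10_period - exponent)
print(f"2^19937 - 1 is roughly {mantissa:.1f} x 10^{exponent}")

# Assumed MT19937 layout: 624 words of 32 bits, 31 of those bits unused.
state_bits = 624 * 32 - 31
print(state_bits)  # 19937
```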
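A minimal sketch of a 32-bit generator of this kind follows. The shift amounts 13, 17 and 5 are a commonly quoted triple for the 32-bit variant and are an assumption here, since the text above gives no parameters. This plain version is purely linear; it omits the non-linear step mentioned above.

```python
MASK32 = 0xFFFFFFFF

def xorshift32(state):
    """One step of a 32-bit xorshift generator (state must be nonzero)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state

x = 1
for _ in range(3):
    x = xorshift32(x)
    print(x)
```

Starting from state 1, the first step yields 270369. A zero state must be avoided because it maps to itself.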

Cryptography

A PRNG suitable for cryptographic applications is called a cryptographically secure PRNG (CSPRNG). The requirement for a CSPRNG is that an adversary who does not know the seed has only a negligible advantage in distinguishing the generator's output sequence from a random sequence. In other words, while a PRNG is only required to pass certain statistical tests, a CSPRNG must pass all statistical tests that are restricted to polynomial time in the size of the seed.

Although a proof of this property is beyond the current state of computational complexity theory, strong evidence can be provided by reducing a problem that is assumed to be hard, such as integer factorization, to the problem of breaking the CSPRNG. In general, years of review may be required before an algorithm can be certified as a CSPRNG.

It has been shown to be likely that the NSA inserted an asymmetric backdoor into the NIST-certified pseudo-random number generator Dual_EC_DRBG.
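In practice, application code usually does not implement a CSPRNG itself but draws from one provided by the platform. As a small sketch, Python's standard `secrets` module exposes the operating system's secure source; the variable names are illustrative.

```python
import secrets

key = secrets.token_bytes(16)       # 16 unpredictable bytes, e.g. for a key
roll = secrets.randbelow(6) + 1     # integer in 1..6 from the secure source
token = secrets.token_hex(8)        # 8 random bytes rendered as 16 hex characters

print(len(key), roll, token)
```

Unlike a seeded simulation generator, these values are not meant to be reproducible.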

Pseudo Random Number Algorithms

Most PRNG algorithms produce sequences that are uniformly distributed according to any of several tests. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a high-quality PRNG from a truly random sequence. In this setting, the distinguisher knows that either a known PRNG algorithm was used (but not the state with which it was initialized) or a truly random process was used, and has to tell the two apart.

The security of most cryptographic algorithms and protocols that use PRNGs rests on the assumption that it is infeasible to distinguish the use of a suitable PRNG from the use of a truly random sequence. The simplest examples of this dependency are stream ciphers, which most often work by XORing the plaintext of a message with the output of a PRNG to produce the ciphertext. Designing cryptographically adequate PRNGs is extremely difficult because they must meet additional criteria. The size of its period is an important factor in the cryptographic suitability of a PRNG, but not the only one.
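The stream cipher idea, combining the plaintext with PRNG output by XOR, can be sketched as below. Note the loud assumption: `random.Random` is used only to keep the example short and is not cryptographically secure, so this toy must not be used to protect real data. The function name `xor_with_keystream` is hypothetical.

```python
import random

def xor_with_keystream(data: bytes, seed: int) -> bytes:
    """XOR each byte with a PRNG-generated keystream byte (toy, insecure)."""
    rng = random.Random(seed)  # NOT a CSPRNG; for illustration only
    return bytes(b ^ rng.randrange(256) for b in data)

message = b"attack at dawn"
ciphertext = xor_with_keystream(message, seed=1234)
recovered = xor_with_keystream(ciphertext, seed=1234)
print(recovered == message)  # True: XOR with the same keystream undoes itself
```

Encryption and decryption are the same operation, so anyone who can predict the keystream can read the message, which is why the generator's unpredictability matters.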

Pseudo random numbers

An early computer-based PRNG, proposed by John von Neumann in 1946, is known as the middle-square method. The algorithm is as follows: take any number, square it, extract the middle digits of the result as the "random number", then use that number as the seed for the next iteration. For example, squaring the number 1111 gives 1234321, which can be written as 01234321, an 8-digit number being the square of a 4-digit number. This gives 2343 as the "random" number. Repeating the procedure gives 4896 as the next result, and so on. Von Neumann used 10-digit numbers, but the process was the same.
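The worked example above can be reproduced with a few lines of code. This is a minimal sketch for 4-digit numbers; the function name `middle_square` and the `digits` parameter are illustrative.

```python
def middle_square(x, digits=4):
    """Square x, pad to 2*digits digits, and return the middle `digits` digits."""
    squared = str(x * x).zfill(2 * digits)   # e.g. 1234321 -> "01234321"
    start = digits // 2
    return int(squared[start:start + digits])

x = 1111
for _ in range(2):
    x = middle_square(x)
    print(x)  # 2343, then 4896
```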

The disadvantages of the "middle square"

The problem with the middle-square method is that all sequences eventually repeat themselves, some very quickly, such as 0000. Von Neumann was aware of this, but he found the approach sufficient for his purposes and was worried that mathematical "fixes" would merely hide errors rather than remove them.
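How quickly a middle-square sequence falls into repetition can be measured by counting distinct values before the first repeat. The sketch below does this for 4-digit numbers; the helper names are hypothetical.

```python
def middle_square(x, digits=4):
    """Square x, pad to 2*digits digits, and return the middle `digits` digits."""
    squared = str(x * x).zfill(2 * digits)
    start = digits // 2
    return int(squared[start:start + digits])

def values_before_repeat(seed, digits=4):
    """Count the distinct values produced before any value recurs."""
    seen, x = set(), seed
    while x not in seen:
        seen.add(x)
        x = middle_square(x, digits)
    return len(seen)

print(values_before_repeat(0))     # 1: 0000 maps to itself
print(values_before_repeat(2500))  # 1: 2500 squared is 06250000, middle 2500
print(values_before_repeat(1111))
```

With only 10,000 possible 4-digit states, no seed can produce more than 10,000 values before repeating, and many do far worse.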

The essence of the generator

Von Neumann judged hardware random number generators unsuitable: if they did not record the output they generated, it could not be checked for errors later. If they did record their output, they would exhaust the limited computer memory then available, and with it the computer's ability to read and write numbers. If the numbers were written to cards, they would take far longer to write and read. On the ENIAC computer he was using, the middle-square method generated pseudo-random numbers several hundred times faster than reading numbers in from punched cards.

The middle-square method has since been supplanted by more elaborate generators.

Groundbreaking method

A recent innovation is to combine the middle square with a Weyl sequence. This method produces high-quality output over a long period, which makes it one way to obtain good pseudo-random numbers.
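A rough Python sketch of this combination is shown below: a 64-bit value is squared, a Weyl sequence term (a running multiple of an odd constant) is added, and the two 32-bit halves are swapped so that the middle of the square becomes the output. The class name and the particular constant are assumptions for illustration; the constant is one commonly quoted for this method.

```python
MASK64 = (1 << 64) - 1

class MiddleSquareWeyl:
    """Sketch of a middle-square generator driven by a Weyl sequence."""

    def __init__(self, s=0xB5AD4ECEDA1CE2A9):  # odd constant (assumed example value)
        self.x = 0   # value that gets squared
        self.w = 0   # Weyl sequence accumulator
        self.s = s

    def next_u32(self):
        self.x = (self.x * self.x) & MASK64            # square, keep 64 bits
        self.w = (self.w + self.s) & MASK64            # advance the Weyl sequence
        self.x = (self.x + self.w) & MASK64            # mix it into the square
        self.x = ((self.x >> 32) | (self.x << 32)) & MASK64  # swap the halves
        return self.x & 0xFFFFFFFF                     # 32-bit output

gen = MiddleSquareWeyl()
print([hex(gen.next_u32()) for _ in range(3)])
```

The Weyl term keeps the state from collapsing to zero or to a short cycle, which is the failure mode of the plain middle-square method.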

Source: https://habr.com/ru/post/E21561/

