A Bayesian network (also called a belief network, decision network, Bayes model, or probabilistic directed acyclic graphical model) is a type of statistical model that represents a set of variables and their conditional dependencies through a directed acyclic graph (DAG).
For example, a Bayesian network may represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
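A toy illustration of such a calculation (all numbers here are invented for the sake of the example): suppose a disease has a 1% prior prevalence, and a symptom appears in 90% of sick patients but only 5% of healthy ones. A minimal sketch in Python:

    # Toy example with made-up numbers: P(disease | symptom) via Bayes' theorem.
    p_d = 0.01              # prior probability of the disease
    p_s_given_d = 0.90      # probability of the symptom given the disease
    p_s_given_not_d = 0.05  # probability of the symptom without the disease

    # Total probability of observing the symptom.
    p_s = p_s_given_d * p_d + p_s_given_not_d * (1 - p_d)

    # Posterior probability of the disease given the symptom.
    p_d_given_s = p_s_given_d * p_d / p_s
    print(p_d_given_s)  # about 0.154

Even with a strongly indicative symptom, the low prior keeps the posterior modest; a Bayesian network performs this kind of updating across many interrelated variables at once.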
Efficiency
Efficient algorithms exist that perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (such as speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
Essence
Formally, Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters, or hypotheses.
Bayesian Network Example
Two events can cause the grass to be wet: an active sprinkler or rain. Rain also has a direct effect on the use of the sprinkler (namely, when it rains the sprinkler is usually inactive). This situation can be modeled using a Bayesian network.
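A minimal sketch of this network in Python, using nested dictionaries as conditional probability tables; the specific probabilities are illustrative values, not measurements:

    # Each variable is boolean (True/False).
    P_rain = {True: 0.2, False: 0.8}

    # P(Sprinkler | Rain): when it rains, the sprinkler is usually off.
    P_sprinkler = {
        True:  {True: 0.01, False: 0.99},   # given Rain=True
        False: {True: 0.40, False: 0.60},   # given Rain=False
    }

    # P(GrassWet | Sprinkler, Rain)
    P_grass_wet = {
        (True, True):   {True: 0.99, False: 0.01},
        (True, False):  {True: 0.90, False: 0.10},
        (False, True):  {True: 0.80, False: 0.20},
        (False, False): {True: 0.00, False: 1.00},
    }

    def joint(rain, sprinkler, grass_wet):
        """Joint probability as the product of the local conditional tables."""
        return (P_rain[rain]
                * P_sprinkler[rain][sprinkler]
                * P_grass_wet[(sprinkler, rain)][grass_wet])

The DAG structure is implicit in which variables each table conditions on: Rain has no parents, Sprinkler depends on Rain, and GrassWet depends on both.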
Modeling
Since a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, it can be used to update knowledge about the state of a subset of variables when other variables (the evidence variables) are observed. This process is called probabilistic inference.
The posterior provides a universal sufficient statistic for detection applications, when choosing values for a subset of variables so as to minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems.
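Continuing the sprinkler sketch above (and reusing its tables and joint function), a posterior query such as "how likely is rain, given that the grass is wet?" can be answered by brute-force enumeration of the joint distribution; this is a sketch of the semantics, not an efficient algorithm:

    from itertools import product

    def rain_given_wet():
        """P(Rain=True | GrassWet=True) by brute-force enumeration."""
        numerator = sum(joint(True, s, True) for s in (True, False))
        denominator = sum(joint(r, s, True)
                          for r, s in product((True, False), repeat=2))
        return numerator / denominator

    print(rain_given_wet())  # about 0.36 with the tables above

Observing wet grass raises the probability of rain from its prior of 0.2 to roughly 0.36; this is exactly Bayes' theorem applied automatically through the network structure.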
Inference methods
The most common methods for exact inference are: variable elimination, which eliminates (by integration or summation) the non-observed, non-query variables one by one by distributing the sum over the product; clique tree propagation, which caches the computation so that many variables can be queried at one time and new evidence can be propagated quickly; and recursive conditioning and AND/OR search, which trade space for time and match the efficiency of variable elimination when enough space is used. The "distribute the sum over the product" idea behind variable elimination is sketched below.
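A minimal sketch of that idea, reusing the tables from the sprinkler example above: summing out Sprinkler inside the product avoids ever enumerating the full joint table.

    def p_wet_by_elimination(wet):
        """P(GrassWet=wet) computed as
        sum_r P(r) * [ sum_s P(s | r) * P(wet | s, r) ]:
        the inner sum eliminates Sprinkler once per value of Rain
        instead of once per full joint assignment."""
        total = 0.0
        for r in (True, False):
            factor = sum(P_sprinkler[r][s] * P_grass_wet[(s, r)][wet]
                         for s in (True, False))
            total += P_rain[r] * factor
        return total

    print(p_wet_by_elimination(True))  # about 0.45 with the tables above

On three variables the saving is trivial, but in larger networks pushing sums inward is what keeps exact inference feasible when the graph is sparse.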
All of these methods have complexity that is exponential in the network's treewidth. The most common approximate inference algorithms are mini-bucket elimination, loopy belief propagation, generalized belief propagation, and variational methods.
Specifying the network
In order to fully specify a Bayesian network, and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution of X conditional on X's parents.
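In symbols, the network then represents the joint distribution as the product of these local conditional distributions, where pa(X_i) denotes the set of parents of node X_i:

    P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{pa}(X_i)\right)

The joint function in the sprinkler sketch above is precisely this product for the three-variable case.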
The distribution of X conditional on its parents can take any form. It is common to work with discrete or Gaussian distributions, since this simplifies calculations. Sometimes only constraints on the distribution are known; the principle of maximum entropy can then be used to determine the single distribution with the greatest entropy subject to those constraints.
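As a standard illustration of that principle (stated here as a sketch, with f_i denoting the constrained feature functions, c_i their target expectations, and \lambda_i the associated Lagrange multipliers): maximizing entropy under expectation constraints,

    \max_{p} \; -\sum_x p(x)\,\log p(x)
    \quad \text{subject to} \quad
    \sum_x p(x)\,f_i(x) = c_i, \qquad \sum_x p(x) = 1,

yields an exponential-family solution

    p(x) \;\propto\; \exp\Big(\sum_i \lambda_i f_i(x)\Big).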
Similarly, in the specific context of a dynamic Bayesian network, the conditional distribution for the hidden state's temporal evolution is commonly specified to maximize the entropy rate of the implied stochastic process.
Direct maximization of the likelihood (or of the posterior probability) is often complex in the presence of unobserved variables. This is especially true for a Bayesian decision network.
Classic approach
A classical approach to this problem is the expectation-maximization (EM) algorithm, which alternates between computing expected values of the unobserved variables conditional on the observed data, and maximizing the complete likelihood (or posterior) under the assumption that the previously computed expected values are correct. Under mild regularity conditions, this process converges to maximum likelihood (or maximum posterior) values of the parameters.
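A toy sketch of this alternation (the two-coin model and all numbers here are invented for illustration): each row of ten coin tosses comes from one of two coins of unknown bias, and which coin produced which row is unobserved.

    # Toy EM: two coins with unknown biases; the coin used for each row
    # is a hidden variable. Mixing weights are assumed equal (0.5 each)
    # to keep the sketch short.
    heads = [9, 8, 2, 1, 7, 3]  # heads observed in each row of 10 tosses
    n = 10

    theta = [0.6, 0.4]  # initial guesses for the two biases
    for _ in range(50):
        # E-step: expected responsibility of coin 0 for each row,
        # taking the current parameter estimates as correct.
        resp = []
        for h in heads:
            like0 = theta[0] ** h * (1 - theta[0]) ** (n - h)
            like1 = theta[1] ** h * (1 - theta[1]) ** (n - h)
            resp.append(like0 / (like0 + like1))
        # M-step: re-estimate the biases from the expected counts.
        w0 = sum(resp)
        w1 = len(heads) - w0
        theta[0] = sum(r * h for r, h in zip(resp, heads)) / (w0 * n)
        theta[1] = sum((1 - r) * h for r, h in zip(resp, heads)) / (w1 * n)

    print(theta)  # approaches roughly [0.8, 0.2] on this data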
A more fully Bayesian approach to the parameters is to treat them as additional unobserved variables, compute the full posterior distribution over all nodes given the observed data, and then integrate out the parameters. This approach can be expensive and lead to models of large dimension, making classical parameter-estimation approaches more tractable.
In the simplest case, a Bayesian network is specified by an expert and then used to perform inference. In other applications, the task of defining the network is too complex for humans. In that case, the structure of the Bayesian network and the parameters of the local distributions must be learned from data.
Alternative method
An alternative method of structural learning uses optimization-based search. This requires a scoring function and a search strategy. A common scoring function is the posterior probability of the structure given the training data, such as BIC or BDeu.
The time required by an exhaustive search returning a structure that maximizes the score is superexponential in the number of variables. A local search strategy makes incremental changes aimed at improving the score of the structure; a sketch follows below. Friedman and colleagues considered using mutual information between variables to find a candidate structure: they restrict the set of candidate parents to k nodes and search exhaustively among them.
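A minimal sketch of such a local search, assuming a scoring function score(dag, data) (for instance BIC) is available; both that function and its signature are placeholders introduced for illustration, not an established API:

    def hill_climb(variables, data, score, max_iters=100):
        """Greedy structure search: toggle single parent->child edges
        and keep each change that improves the score and preserves
        acyclicity, until no single change helps."""
        dag = {v: set() for v in variables}  # node -> set of parents

        def is_acyclic(d):
            seen, done = set(), set()
            def visit(v):
                if v in done:
                    return True
                if v in seen:
                    return False  # back edge found: cycle
                seen.add(v)
                ok = all(visit(p) for p in d[v])
                done.add(v)
                return ok
            return all(visit(v) for v in d)

        def toggle(child, parent):
            if parent in dag[child]:
                dag[child].remove(parent)
            else:
                dag[child].add(parent)

        best = score(dag, data)
        for _ in range(max_iters):
            improved = False
            for child in variables:
                for parent in variables:
                    if parent == child:
                        continue
                    toggle(child, parent)
                    keep = False
                    if is_acyclic(dag):
                        s = score(dag, data)
                        if s > best:
                            best, improved, keep = s, True, True
                    if not keep:
                        toggle(child, parent)  # undo the change
            if not improved:
                break
        return dag

Real implementations also consider edge reversals and use decomposable scores so that only the changed node's local score is recomputed; both are omitted here for brevity.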
A particularly fast method for exact Bayesian network learning is to cast the problem as an optimization problem and solve it using integer programming. Acyclicity constraints are added to the integer program (IP) during solving in the form of cutting planes. Such methods can handle problems with up to 100 variables.
Scaling to larger problems
Solving problems with thousands of variables requires a different approach. One is to first sample an ordering of the variables and then find the optimal network structure with respect to that ordering. This implies working in the search space of possible orderings, which is convenient because it is smaller than the space of network structures. Multiple orderings are then sampled and evaluated. This method has been reported to be the best available in the literature when the number of variables is huge.
Another method consists of focusing on the subclass of decomposable models, for which the maximum likelihood estimates have a closed form. It is then possible to discover a consistent structure for hundreds of variables.
Learning Bayesian networks with bounded treewidth is necessary to allow exact, tractable inference, since the worst-case inference complexity is exponential in the treewidth k (under the exponential time hypothesis). Yet, as a global property of the graph, bounded treewidth considerably increases the difficulty of the learning process. In this context, K-trees can be used for effective learning.
Development
The development of a Bayesian belief network often begins with the creation of a DAG G such that X satisfies the local Markov property with respect to G. Sometimes this is a causal DAG. The conditional probability distribution of each variable given its parents in G is then estimated. In many cases, in particular when the variables are discrete, if the joint distribution of X is the product of these conditional distributions, then X is a Bayesian network with respect to G.
The Markov blanket of a node is the set of nodes consisting of its parents, its children, and its children's other parents. The Markov blanket renders the node independent of the rest of the network, and knowing the joint distribution over the blanket is sufficient to calculate the node's distribution. X is a Bayesian network with respect to G if every node is conditionally independent of all other nodes given its Markov blanket.
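As a small sketch, the Markov blanket can be read directly off the DAG; here, as in the structure-search sketch above, the graph is a mapping from each node to the set of its parents:

    def markov_blanket(dag, node):
        """Parents, children, and the children's other parents of a node."""
        parents = set(dag[node])
        children = {v for v, ps in dag.items() if node in ps}
        co_parents = {p for c in children for p in dag[c]} - {node}
        return parents | children | co_parents

For the sprinkler network above, the Markov blanket of Sprinkler is {Rain, GrassWet}: Rain is its parent, GrassWet is its child, and GrassWet's other parent is again Rain.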