Simulation of a neural network

Dear habocoobschestvuyu, decided to share with you my experience in the study of the neural network of the Boltzmann Machine, made in the student year.

In Russia there was very little information on this topic. Even the head of our department could not help me with the material. The benefit of our university was in a single international base, and there was an opportunity to take advantage of international experience. In particular, most of it was found in the literature of the University of Oxford. In fact, this article is a collection of information from various sources, reinterpreted and presented in a fairly clear language, I think. I hope someone will be interested. Once it forced me not to sleep at night.
So let's get started.

Hopfield network

Among the various configurations of artificial neural networks (NS) there are such, when classifying them according to the principle of learning, strictly speaking, neither learning with a teacher [1] nor learning without a teacher [2] is suitable. In such networks, the weights of synapses are calculated only once before the start of the network operation based on information about the data being processed, and all network training is reduced to just this calculation. On the one hand, the presentation of a priori information can be regarded as a teacher’s help, but on the other hand, the network actually just remembers the samples before real data arrives at its entrance and cannot change its behavior, therefore, talking about a link of feedback with the “world "(Teacher) is not necessary. Of the networks with similar logic, the most well-known is the Hopfield network, which is commonly used to organize associative memory. That is what we consider.
The block diagram of the Hopfield network is shown in the figure below. It consists of a single layer of neurons, the number of which is simultaneously the number of inputs and outputs of the network. Each neuron is connected by synapses with all other neurons, and also has one input synapse, through which the signal is input. Output signals, as usual, are formed on axons.
')

The problem solved by this network as an associative memory, as a rule, is formulated as follows. A certain set of binary signals (images, sound digitization, other data describing certain objects or characteristics of processes), which are considered exemplary, is known. The network must be able to select (“remember” by partial information) the corresponding sample (if any) or “give a conclusion” that the input data do not correspond to any of the samples from an arbitrary non-ideal signal applied to its input. In general, any signal can be described by the vector X = {xi: i = 0 ... n-1}, n is the number of neurons in the network and the dimension of the input and output vectors. Each element xi is either +1 or -1. Denote the vector describing the k-th pattern by Xk, and its components, respectively, xik, k = 0 ... m-1, m is the number of samples. When the network recognizes (or “remembers”) a sample based on the data presented to it, its outputs will contain it, that is, Y = Xk, where Y is the vector of output network values: Y = {yi: i = 0 ,. ..n-1}. Otherwise, the output vector does not coincide with any exemplary.
If, for example, the signals are some images, then by displaying the data from the network output graphically, you can see a picture that completely coincides with one of the exemplary (if successful) or “free improvisation” of the network (in case of failure).

At the network initialization stage, the weights of synapses are set as follows [3] [4]:

(one)
Here, i and j are indices of, respectively, presynaptic and postsynaptic neurons; xik, xjk are the i-th and j-th elements of the k-th sample vector.
The network operation algorithm is as follows (p is the iteration number):
1. An unknown signal is sent to the network inputs. In fact, it is entered directly by setting axon values:
Yi (0) = xi, i = 0 ... n-1 (2)
therefore, the designation on the network diagram of the input synapses in an explicit form is purely conditional. Zero in the bracket to the right of yi means zero iteration in the network loop.
2. Calculate the new state of neurons.

(3)
and new values of axons

(four)

Fig. 4 Activation Functions
where f - activation function in the form of a jump, shown in Figure (4)

3. Check whether the output values of axons changed during the last iteration. If yes, go to step 2, otherwise (if the outputs are stabilized), end. At the same time, the output vector is a sample that is best combined with the input data.
As mentioned above, sometimes the network can not conduct recognition and outputs a non-existent image. This is due to the problem of limited network capabilities. For the Hopfield network, the number of memorized images m should not exceed a value approximately equal to 0.15 • n. In addition, if the two images A and B are very similar, they will probably cause cross associations in the network, that is, presentation of the vector A to the network inputs will result in the appearance of vector B at its outputs and vice versa.
Also for a complete understanding of the functioning of the network it is necessary to mention the Hebbian rule.
The learning rule for the Hopfield network is based on research by Donald Hebb (D.Hebb, 1949), who suggested that the synaptic connection connecting the two neurons will increase if both neurons in the process of learning are consistently excited or inhibited. A simple algorithm that implements such a learning mechanism is called the Hebbian rule. Consider it in detail.

Let a training sample of images be given xa, a = 1..p. It is required to construct the process of obtaining the matrix of connections W, such that the corresponding neural network will have the images of the training sample as stationary states (the values of the thresholds of neurons T are usually set equal to zero).
In the case of a single training image, Hebb's rule leads to the required matrix:
Wij = ξi * ξj (5)
Let us show that the state S = x is stationary for the Hopfield network with the indicated matrix. Indeed, for any pair of neurons i and j, the energy of their interaction in the state x reaches its minimum possible value.
Eij = - (1/2) xi xj xi xj = -1/2. (6)
In this case, - total energy is equal to E = - (1/2) N2, which corresponds to the global minimum.
An iterative process can be used to memorize other images:
Wij (a) = Wij (a-1) + ξi (a) + ξj (a), W (0) = 0, a = 1..p (7)
which leads to a full matrix of connections in the form of Hebb:

(eight)
The stability of the totality of images is not as obvious as in the case of a single image. A number of studies show that a neural network trained according to Hebb's rule can, on average, with a large network size N, store no more than p »0.14 N different images. Resilience can be shown for a set of orthogonal patterns when

(9)
In this case, for each state xa, the product of the total input of the ith neuron hi by the amount of its activity Si = xia is positive, therefore the state xa itself is a state of attraction (stable attractor):

(ten)
Thus, the Hebba rule ensures the stability of the Hopfield network on a given set of relatively few orthogonal patterns.

Synchronous and asynchronous activation option

Consider the behavior of the network in time with a fixed matrix of weights. Let at the moment t = 0 we betrayed the state vector some value. There are two options for the future process of the network: synchronous and asynchronous. The synchronous version of the work is that all neurons simultaneously change their state at time t + 1 according to the state of the network at time t. In the case of asynchronous operation at time t + 1, only one neuron changes its state, at the time t + 2 some other neuron changes according to the network state at time t + 1, etc. (each time a neuron is chosen randomly). In any case, over time, the network somehow changes its state. It is argued that under the conditions imposed by us on the matrix of weights, the network will come to a stationary state after some finite time, that is. It is also stated that this stationary state S is achieved and is the same regardless of the synchronism of the work of neurons.

Boltzmann Machines (Probabilistic Network)

One of the major drawbacks of the Hopfield network is the tendency for the output signal to “stabilize” at a local rather than a global minimum. It is desirable that the network should find deep minima more often than shallow ones, and that the relative probability of a network moving to one of two different minima depends only on the ratio of their depths. This would allow controlling the probabilities of obtaining specific output vectors by changing the profile of the energy surface of the system by modifying the link weights.
The idea of using “thermal noise” to get out of local minima and increase the likelihood of falling into deeper minima belongs to S. Kirpatriku. Based on this idea, an annealing simulation algorithm has been developed.
We introduce some parameter t - an analogue of the level of thermal noise. Then the probability of activity of a certain neuron k is determined on the basis of the Boltzmann probability function:
k = 1 / (1 + exp (-k / t)) (11)
where t is the level of thermal noise in the network; EC - the sum of the weights of the connections of the k-th neuron with all currently active neurons. The curve of the change in the probability of the activity of the kth neuron is shown in Figure 5. As t decreases, the fluctuations in the activity of the neuron decrease; when t = 0, the curve becomes threshold.

The change in the probability of neuron activity depending on the parameter t.
Statisticians call such networks random Markov fields. The network using an annealing simulation algorithm for training is called the Boltzmann machine in honor of the Austrian physicist L. Boltzmann, one of the creators of statistical mechanics. We formulate the problem of learning a probabilistic network, in which the probability of a neuron's activity is calculated by formula (11). Let for each set of possible input vectors it is required to obtain with a certain probability all admissible output vectors. In most cases, this probability is close to zero. The procedure for learning the Boltzmann machine comes down to performing the following alternating steps:
1) Submit the input vector to the network input and fix the output (as in the training procedure with the teacher). Provide networks with the opportunity to get closer to the state of thermal equilibrium:
a) assign the state of each neuron with probability pk (see formula (7.2)) to value one (active neuron) and with probability 1-pk - zero (not active neuron);
b) reduce the parameter t, go to a).
In accordance with the Hebbian rule, to increase the weight of the connection between active neurons by δ. Repeat these steps for all pairs of training sample vectors.
2) Submit to the input network input vector without fixing the output vector. Repeat a), b). Reduce the weight of the connection between active neurons by δ.
The resulting change in the weight of the connection between two randomly taken neurons at a certain learning step will be proportional to the difference between the probabilities of the activity of these neurons at the 1st and 2nd steps. When repeating steps 1 and 2, this difference tends to zero.
The second way of learning - learning without a teacher - is a preliminary calculation of the coefficients in the matrix of weights. These calculations are based on a preliminary knowledge of the conditions of the problem being solved.

Friends, this is all I could remember and find in my storerooms.
In the next article I want to describe the already applied aspect of this issue, namely the development of a program using the Boltzmann Machine algorithm. The decision will be on the task of a salesman.
Thank you all for your attention.
Really looking forward to comments and questions.

Source: https://habr.com/ru/post/188368/

All Articles

Simulation of a neural network

More articles: