
Neural networks. Part 1. Basics of artificial neural networks

Good day to you, dear Habr community.

First, a small disclaimer. The previous post in this community reviewed the basics of artificial neural networks. I dealt with this topic while writing my master's thesis and read a fair amount of literature along the way, so I would like to add to that post and continue the story of what a neural network is, how it works on the inside, how it solves problems, and so on.
Let me note right away that I am no guru in this field; I know it (or rather knew it, as quite some time has passed) only as deeply as I needed to build, train, and use a working neural network for digit recognition. The subject of my study was the structure of a neural network for character recognition, specifically the relationship between the number of neurons in the hidden layer and the complexity of the input data (the number of characters to be recognized).

UPD: This text is essentially a summary of the literature I read. It was not written by me personally, at least this part.
UPD2: Most likely there will be no continuation of this series, because the user stepan_ovchinnikov, who is the caretaker of this blog, believes there is no point in writing here what can already be read in the numerous literature on neural networks. So sorry.
Perhaps this first part will overlap somewhat with the previous post by user Kallisto, but I think it is worth examining the structure of an artificial neuron in more detail; I have something to add, and, besides, I want to write a complete, self-contained series of posts about neural networks that does not rely on what has already been written. I hope you will find this material useful.


Biological prototype of the neuron

The first attempt to create and study artificial neural networks was the paper by W. McCulloch and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity" (1943), which formulated the basic principles of artificial neurons and neural networks. Although this work was only a first step, many of the ideas described in it remain relevant today.

Artificial neural networks are inspired by biology: they consist of elements whose functionality resembles most functions of a biological neuron. These elements can be organized in ways that may correspond to the anatomy of the brain, and they exhibit many properties inherent in the brain. For example, they can learn from experience, generalize from previous cases to new ones, and extract essential features from input data that contains redundant information.

The central nervous system has a cellular structure. Its unit is the nerve cell, the neuron. A neuron consists of a body and processes that connect it with the outside world (Fig. 1.1). The processes through which a neuron receives excitation are called dendrites. The process through which a neuron transmits excitation is called the axon, and each neuron has exactly one. Dendrites and axons have a rather complicated branching structure. The junction between the axon of the excitation-source neuron and a dendrite is called a synapse. The main function of the neuron is to transfer excitation from the dendrites to the axon, but signals arriving from different dendrites can affect the signal in the axon. The neuron fires if the total excitation exceeds a certain threshold value, which itself varies within certain limits; otherwise no signal is sent down the axon and the neuron does not respond to the excitation. This basic scheme has many complications and exceptions, yet most neural networks model only these simple properties.


(Figure 1.1) - Model of a biological neuron

The neuron, then, has a small set of basic properties that artificial models try to reproduce.

There are two approaches to building artificial neural networks. The informational approach: it does not matter what mechanisms underlie the operation of an artificial neural network; it matters only that, when solving problems, the information processes in the network are similar to the biological ones. The biological approach: what matters in modeling is full biological plausibility, and for this the operation of the biological neuron must be studied in detail.

The intensity of the signal that a neuron receives (and hence the possibility of its activation) depends strongly on the activity of its synapses. Each synapse has a length, and special chemicals transmit the signal across it. Donald Hebb, one of the most respected researchers of the nervous system, postulated that learning consists primarily in changing the "strength" of synaptic connections. For example, in Pavlov's classic experiment, a bell rang each time just before the dog was fed, and the dog quickly learned to associate the ring of the bell with food. The synaptic connections between the areas of the cerebral cortex responsible for hearing and for the salivary glands strengthened, so that the sound of the bell alone began to make the dog salivate.

Thus, built from a very large number of very simple elements (each of which computes a weighted sum of its input signals and, if that sum exceeds a certain level, transmits a binary signal), the brain is capable of solving extremely complex tasks. A minimal sketch of such a threshold element, together with a Hebbian-style weight update, is shown below.
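The sketch below is a loose illustration in Python, not anything prescribed by the text: threshold_neuron is a McCulloch-Pitts-style element, hebbian_update is one simple reading of Hebb's postulate, and all names and numbers are assumptions of mine.

```python
# A minimal threshold element: weighted sum of inputs, binary output
# once the sum exceeds a threshold (a McCulloch-Pitts-style unit).
def threshold_neuron(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# One simple reading of Hebb's postulate: connections whose input is
# active when the neuron fires are strengthened (eta is a learning rate).
def hebbian_update(weights, inputs, output, eta=0.1):
    return [w + eta * x * output for w, x in zip(weights, inputs)]

weights = [0.5, 0.9, 0.7]
print(threshold_neuron([1, 0, 1], weights, threshold=1.0))  # -> 1 (0.5 + 0.7 > 1.0)
print(threshold_neuron([0, 1, 0], weights, threshold=1.0))  # -> 0 (0.9 < 1.0)
```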

Artificial Neuron

An artificial neuron mimics, to a first approximation, the properties of a biological neuron. The input of an artificial neuron receives a set of signals, each of which is the output of another neuron. Each input is multiplied by a corresponding weight, analogous to synaptic strength, and all the products are summed to determine the neuron's activation level. Figure 1.2 shows a model that implements this idea. Although networks differ widely, almost all of them are based on this configuration. A set of input signals, denoted x1, x2, ..., xn, is fed to the artificial neuron. These inputs correspond to the signals arriving at the synapses of a biological neuron. Each signal is multiplied by a corresponding weight w1, w2, ..., wn and goes to the summing unit, denoted Σ. Each weight corresponds to the "strength" of a single biological synaptic connection. The summing unit, which corresponds to the body of the biological cell, algebraically combines the weighted inputs, producing the output NET.

(Figure 1.2) - Artificial neuron as a first approximation

This description can be represented by the following formula:

$$\text{NET} = w_0 + \sum_{i=1}^{n} w_i x_i$$

where w0 is the bias;
wi is the weight of the i-th input;
xi is the output of the i-th neuron connected to this one;
n is the number of neurons feeding into the given neuron.

The signal w0, called the bias, plays the role of a threshold value, a shift. It allows the origin of the activation function to be shifted, which in turn speeds up learning. The bias is added to every neuron; it is trained like all the other weights, and its peculiarity is that it is connected to a constant +1 signal rather than to the output of a previous neuron. A sketch of this computation follows below.
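Here is a minimal sketch of the NET computation under the definitions above; the function name net_input and the sample numbers are my own, purely for illustration:

```python
def net_input(inputs, weights, bias):
    """NET = w0 + sum of w_i * x_i: the weighted inputs plus the bias term."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

# Equivalently, the bias is an ordinary weight attached to a constant +1 input.
x = [0.2, -0.5, 0.1]
w = [0.4, 0.3, -0.8]
print(net_input(x, w, bias=0.6))  # 0.6 + 0.08 - 0.15 - 0.08 = 0.45
```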

The resulting signal NET is then usually processed by an activation function F, producing the neuron's output signal OUT (Fig. 1.3).


(Figure 1.3) - Artificial neuron with activation function

If the activation function narrows the range of variation of NET so that, for any value of NET, OUT belongs to a certain finite interval, then F is called a squashing function. The logistic, or "sigmoid," function is often used in this role. Mathematically it is expressed as follows:

$$\text{OUT} = F(\text{NET}) = \frac{1}{1 + e^{-\text{NET}}}$$
The main advantage of this function is that it has a simple derivative, which can be expressed through the function itself as OUT(1 − OUT), and that it is differentiable along the whole x-axis. The graph of the function looks as follows (Fig. 1.4).


(Figure 1.4) - Type of sigmoidal activation function

The function amplifies weak signals and prevents large signals from saturating the output, as the sketch below illustrates.
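As a small sketch (the function names are assumptions, not anything from the text), the logistic activation and its conveniently simple derivative look like this:

```python
import math

def sigmoid(net):
    """Logistic activation: squashes any NET into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(net):
    """The derivative is expressed through the function itself: OUT * (1 - OUT)."""
    out = sigmoid(net)
    return out * (1.0 - out)

# The slope is steepest near NET = 0 (weak signals are amplified) and
# flattens out for large |NET| (strong signals saturate gently).
for net in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(net, round(sigmoid(net), 4), round(sigmoid_derivative(net), 4))
```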

Another frequently used function is the hyperbolic tangent. It is similar in shape to the sigmoid and is often used by biologists as a mathematical model of nerve-cell activation. It has the form

$$\text{OUT} = \tanh(\text{NET}) = \frac{e^{\text{NET}} - e^{-\text{NET}}}{e^{\text{NET}} + e^{-\text{NET}}}$$
Like the logistic function, the hyperbolic tangent is S-shaped, but it is symmetric about the origin, and at NET = 0 the output signal is OUT = 0 (Fig. 1.5). The graph shows that, unlike the logistic function, this function takes values of both signs, which is a very useful property for some types of networks.


(Figure 1.5) - Type of activation function - hyperbolic tangent
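A quick numerical sketch (again, the names are illustrative only) shows the symmetry about the origin and the outputs of both signs, in contrast to the logistic function:

```python
import math

def tanh_activation(net):
    """Hyperbolic tangent activation: squashes NET into (-1, 1)."""
    return math.tanh(net)

# tanh(0) == 0 and tanh(-x) == -tanh(x): the output takes both signs.
for net in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(net, round(tanh_activation(net), 4))
```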

The model of an artificial neuron considered here ignores many properties of the biological neuron. For example, it does not take into account time delays, which affect the dynamics of the system: input signals produce an output immediately. Despite this, artificial neural networks built from such neurons exhibit properties inherent in the biological system.

References:
1. F. Wasserman. Neural Computing: Theory and Practice. Russian translation by Yu. A. Zuev and V. A. Tochenov, 1992.
2. I. V. Zaentsev. Neural Networks: Basic Models. A manual for the "Neural Networks" course.

Source: https://habr.com/ru/post/40137/

