
Neural networks: Lecture 2 (+ example in PHP).

A set of neurons connected to one another in some way is called an artificial neural network, or simply a neural network.
The rule by which neurons are connected into a network is called the structure, or topology, of the network.

A group of neurons that are not connected to each other but are connected to other neurons is called a layer.
Networks come in two kinds: single-layer and multi-layer.

Simple perceptron


A simple perceptron consists of a single neuron (one layer) with n inputs and a threshold activation function.
Because the network output is either +1 or -1, such a perceptron is well suited to the problem of classifying two classes.


If the output is +1, the input vector belongs to class I; otherwise it belongs to class II.
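
As a rough sketch (not the code from the article's archive; the function name perceptronOutput and the numbers are my own illustrative choices), the output of such a perceptron can be computed in PHP like this:

```php
<?php
// Output of a simple perceptron: the sign of the weighted sum of the inputs.
// $w - array of n weights, $x - array of n input values.
function perceptronOutput(array $w, array $x)
{
    $sum = 0.0;
    foreach ($w as $i => $wi) {
        $sum += $wi * $x[$i];      // weighted sum of the inputs
    }
    return ($sum >= 0) ? 1 : -1;   // threshold (sign) activation: +1 or -1
}

// +1 means the input vector belongs to class I, -1 to class II.
echo perceptronOutput(array(0.5, -0.2), array(1.0, 2.0)); // prints 1
```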

First, at step 0 of training, the weights of the perceptron's inputs are set randomly.
The essence of learning is the adjustment of these weights.

To carry out the training procedure for any neural network, including a simple perceptron, a training sample is formed beforehand. It consists of vectors called training vectors, and each training vector has two parts:
  1. The values that are fed to the inputs.
  2. The value that, from our point of view, should appear at the network output when the components of the first part are fed to the network input.

General form of a training vector: (x1, x2, ..., xn, {+1, -1}).

Generally speaking, the second part may be empty; in that case the learning is said to be unsupervised (without a teacher).
If it is present, the learning is supervised (with a teacher).
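
For illustration, a supervised training sample for a perceptron with two inputs could be stored in PHP like this (made-up data and field names, not taken from the article's archive):

```php
<?php
// Each training vector has two parts: 'x' - the input values,
// 'd' - the desired output (+1 for class I, -1 for class II).
$trainingSet = array(
    array('x' => array( 0.5,  1.0), 'd' =>  1),
    array('x' => array( 1.0,  0.3), 'd' =>  1),
    array('x' => array(-0.7, -0.4), 'd' => -1),
    array('x' => array(-0.2, -1.1), 'd' => -1),
);
```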

The vectors of the training sample are fed to the network input one after another, and the weights Wi, i = 1, ..., n, are adjusted during the training procedure according to the vectors presented.

For a simple perceptron the procedure is as follows (a PHP sketch of the whole loop is given after this list).
  1. Feed in the components of the first part of a training-sample vector Xp = (X1, ..., Xn), p = 1, ..., P, where P is the number of vectors in the training sample and p is the index of the current vector. At this step the network produces the output y(Xp).
  2. Compare the network output with the desired value:
    y(Xp) =? d(Xp),
    - d(Xp) is the desired value,
    - y(Xp) is the value produced by the network.
    If y(Xp) == d(Xp) (i.e. as required), then p = p + 1 and go to step 1.
    Otherwise go to step 3.
  3. New value of the i-th weight: Wi(t) = Wi(t-1) + d(Xp) * Xi, where Xi is the i-th component of Xp.
    Then p = p + 1 and go to step 1.
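
Below is a minimal PHP sketch of this loop, mentioned in the list above. It is my own code following steps 0-3, not the original archive; it repeats over the sample until there are no errors or an epoch limit is reached:

```php
<?php
// Trains a simple perceptron on $trainingSet (format as in the sketch above):
// each element is array('x' => array of n inputs, 'd' => desired output +1/-1).
function trainPerceptron(array $trainingSet, $n, $maxEpochs = 100)
{
    $w = array_fill(0, $n, 0.0);                  // step 0: initial weights (zero here; random also works)

    for ($epoch = 0; $epoch < $maxEpochs; $epoch++) {
        $errors = 0;
        foreach ($trainingSet as $vector) {       // step 1: feed the next training vector
            $x = $vector['x'];
            $d = $vector['d'];

            $sum = 0.0;
            for ($i = 0; $i < $n; $i++) {
                $sum += $w[$i] * $x[$i];
            }
            $y = ($sum >= 0) ? 1 : -1;            // network output y(Xp)

            if ($y != $d) {                       // step 2: compare with the desired value d(Xp)
                for ($i = 0; $i < $n; $i++) {
                    $w[$i] += $d * $x[$i];        // step 3: Wi(t) = Wi(t-1) + d(Xp) * Xi
                }
                $errors++;
            }
        }
        if ($errors == 0) {                       // every vector classified correctly
            break;
        }
    }
    return $w;
}
```

With the sample from the previous sketch, trainPerceptron($trainingSet, 2) should settle on a weight vector that separates the two classes after a few passes.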


Generally speaking, the procedure is complete once all the vectors have been presented.
Two cases are possible:
- the training sample is not large enough to train the network;
- training finishes well before the end of the sample.

The convergence theorem (Novikov):
If there exists a set of weights W* with which a simple perceptron separates the two classes, then the proposed algorithm converges in a finite number of steps to some solution, which does not necessarily coincide with W*.

Proof.
Due to the use of the sign function in the definition of the neuron, we may assume that ||W*|| = 1.
We introduce the cosine of the angle between the current weight vector and W*:

cos φ = (W, W*) / ||W||.

(1): (W(t+1), W*) = (W(t), W*) + d(X) * (X, W*).
Since W* is an exact solution, |(W*, X)| >= δ > 0.

Whenever a learning step takes place, the current weight vector W(t) has misclassified the current vector X; since W* classifies X correctly, d(X) * (X, W*) = |(X, W*)| >= δ, and therefore (1) >= (W(t), W*) + δ.

||W(t+1)||^2 = ||W(t)||^2 + ||X||^2 + 2 * d(X) * (X, W(t)).

By the same logic, if a learning step occurs then d(X) * (X, W(t)) < 0, so ||W(t+1)||^2 <= ||W(t)||^2 + M^2,

where M is the radius of the n-dimensional ball inside which all vectors of the training sample lie.
Since there are finitely many of them, such a ball exists.

After t steps we have the inequalities:
(W(t), W*) >= (W(0), W*) + t * δ,

||W(t)|| <= (||W(0)||^2 + t * M^2)^(1/2).

Hence cos φ >= ((W(0), W*) + t * δ) / (||W(0)||^2 + t * M^2)^(1/2), which grows without bound as t increases; but a cosine cannot exceed 1.

The resulting contradiction shows that there is some t_max after which the weights no longer change, which means the learning algorithm converges in a finite number of steps.

The theorem is proven.

Remark 1: If we set W(0) = 0, then the cosine bound gives t_max = (M / δ)^2.
Clearly, t_max is larger the greater the spread of the sample vectors and the smaller the distance between the classes.
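
For instance, under the illustrative assumption that all training vectors fit inside a ball of radius M = 10 and the separation margin is δ = 0.5, this bound guarantees at most (10 / 0.5)^2 = 400 weight updates.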

Remark 2: The existence of W* is an essential assumption.
If the classes are not linearly separable, a simple perceptron is not sufficient to solve the problem.


A classic example of a task that a simple perceptron cannot solve is XOR.
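
This is easy to check with the hypothetical trainPerceptron() sketch given earlier (a constant third input equal to 1 is added here as a bias, which is my own addition and not part of the lecture): no matter how many epochs are allowed, at least one vector remains misclassified, because no single straight line separates the two classes.

```php
<?php
// XOR as a training sample: the desired output is +1 when the two inputs differ.
// The third component of every 'x' is a constant 1 acting as a bias input.
$xorSet = array(
    array('x' => array(0, 0, 1), 'd' => -1),
    array('x' => array(0, 1, 1), 'd' =>  1),
    array('x' => array(1, 0, 1), 'd' =>  1),
    array('x' => array(1, 1, 1), 'd' => -1),
);
$w = trainPerceptron($xorSet, 3, 1000); // the error count never drops to zero
```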

UPD: An example implementation of a simple perceptron, written in PHP5.
In the example the weight matrix is already given.
The perceptron reports whether it was given a square or a straight line at its input...

So as not to clutter the post with the example, I put it in a separate archive.
Or look here.

Source: https://habr.com/ru/post/40659/

