
A method for non-iterative training of a single-layer feedforward network with a linear activation function

This article contains almost no code: it presents the theory of a neural-network training method that I have been developing for the last six months. I plan to implement the method in the next article.

The prospects for non-iterative learning of neural networks are significant: it is potentially the fastest way to train a network. I want to begin this series of articles on non-iterative learning with the simplest case, one with nothing left to simplify away: a single-layer feedforward network with a linear activation function, i.e. a weighted adder. The error function for one neuron is given as:

$$f_{loss}(W) = \sum_{i=1}^{n} \left[ y_i - \left( \sum_{j=1}^{m} w_j \cdot x_{ij} \right) \right]^2$$


Here $W = \{w_1, \dots, w_m\}$ is the weight vector, $m$ is the number of inputs of the neural network, and $n$ is the size of the training sample, which consists of pairs: the ideal output value $y_i$ for each neuron and the input vector $x_i$. It is also worth noting that each neuron can be trained separately.
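
Equivalently, if we collect the input vectors into an $n \times m$ matrix $X$ and the ideal outputs into a vector $y$ (a compact restatement, not notation used in the derivation below), the same error is

$$f_{loss}(W) = \lVert y - XW \rVert^2 = (y - XW)^\top (y - XW).$$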

The network will be trained if $f_{loss}(W) \rightarrow \min$, i.e. if the error is minimal.
Given that the activation function is linear and the error function is quadratic, such a function has no maximum, so the condition $\frac{\partial f_{loss}(W)}{\partial w_k} = 0$ is a minimum condition. Let us first find this derivative and set it equal to zero.

$$\frac{\partial f_{loss}(W)}{\partial w_k} = -2 \cdot \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m} w_j \cdot x_{ij} \right) \cdot x_{ik} = 0$$
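
The transformations needed here are straightforward: dividing both sides by $-2$, expanding the inner bracket, and moving the term with $y_i$ to the right-hand side gives

$$\sum_{i=1}^{n} \left( \sum_{j=1}^{m} w_j \cdot x_{ij} \right) \cdot x_{ik} = \sum_{i=1}^{n} y_i \cdot x_{ik}$$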


After swapping the order of summation on the left-hand side, we get:

$$\sum_{j=1}^{m} \left( w_j \cdot \sum_{i=1}^{n} x_{ij} \cdot x_{ik} \right) = \sum_{i=1}^{n} x_{ik} \cdot y_i$$


Here $k$ is the index of the equation in the system, $k = 1, \dots, m$.

To complete the training we need to compute the weight vector $W$. It is easy to see that the last expression, written out for each $k$, is a system of linear algebraic equations (SLAE) with respect to $W$. To solve this system I chose Cramer's method (Gaussian elimination works faster, but it is not as illustrative). Each neuron weight can then be written as:

$$w_j = \frac{\det(A_j)}{\det(A)}; \qquad A = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mm} \end{pmatrix}; \qquad B = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}; \qquad a_{kj} = \sum_{i=1}^{n} x_{ij} \cdot x_{ik}; \qquad b_k = \sum_{i=1}^{n} y_i \cdot x_{ik}$$


Here the matrix $A_j$ is the matrix $A$ in which the $j$-th column is replaced by the vector $B$. This completes the training of one neuron; since the neurons are not interconnected in any way, they can be trained in parallel, independently of each other.
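
Although the full implementation is left for the next article, the recipe above fits in a few lines of NumPy. This is only a sketch under assumptions: the function name `train_neuron`, the variable names, and the use of `np.linalg.det` for Cramer's rule are illustrative choices, not the final implementation.

```python
import numpy as np

def train_neuron(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Non-iteratively compute the weights of one linear neuron.

    X -- (n, m) matrix of input vectors, one training example per row.
    y -- (n,) vector of ideal outputs for this neuron.
    Returns the weight vector W of length m.
    """
    n, m = X.shape
    # a_kj = sum_i x_ij * x_ik, i.e. A = X^T X (an m x m matrix)
    A = X.T @ X
    # b_k = sum_i y_i * x_ik, i.e. B = X^T y (a vector of length m)
    B = X.T @ y

    det_A = np.linalg.det(A)
    W = np.empty(m)
    for j in range(m):
        # A_j is A with its j-th column replaced by B (Cramer's rule)
        A_j = A.copy()
        A_j[:, j] = B
        W[j] = np.linalg.det(A_j) / det_A
    return W

# Since the neurons are independent, a layer with several outputs is
# trained by calling train_neuron once per output column.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = X @ np.array([0.5, -1.0])   # targets generated from known weights
print(train_neuron(X, y))        # recovers approximately [0.5, -1.0]
```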

P.S. If you have any comments on the article, please write them; I am always glad to receive constructive criticism.

Source: https://habr.com/ru/post/332936/

