Algorithm for training a multilayer neural network by error backpropagation (Backpropagation)

Neural networks have been covered on Habr many times, but today I would like to introduce readers to the algorithm for training a multilayer neural network by error backpropagation and to present an implementation of this method.

I should say up front that I am not an expert in neural networks, so I welcome constructive criticism, comments and additions from readers.

Theoretical part

This material assumes familiarity with the basics of neural networks, but I think it is possible to bring the reader up to speed without a long detour through the theory. For those hearing the phrase "neural network" for the first time, I suggest thinking of a neural network as a weighted directed graph whose nodes (neurons) are arranged in layers. Each node of one layer is connected to all nodes of the previous layer. In our case, such a graph has an input layer and an output layer, whose nodes play the role of inputs and outputs respectively. Each node (neuron) has an activation function, which computes the signal at the node's output. There is also the notion of a bias: a node whose output is always 1. In this article we consider supervised learning, that is, training "with a teacher", in which the network is presented with a sequence of training examples together with the correct responses.

As with most neural networks, our goal is to train the network so as to strike a balance between its ability to respond correctly to the inputs used during training (memorization) and its ability to produce correct results for inputs that are similar, but not identical, to those used in training (generalization). Training a network by error backpropagation involves three stages: feeding data to the input and propagating it forward toward the outputs, computing and backpropagating the corresponding error, and adjusting the weights. After training, the network only has to take data at its input and propagate it toward the outputs. So while training can be a rather long process, computing results with a trained network is very fast. In addition, there are numerous variations of backpropagation designed to speed up the learning process.

It is also worth noting that a single-layer neural network is significantly limited in the input patterns it can learn, while a multilayer network (with one or more hidden layers) does not have this limitation. Next, a standard backpropagation neural network is described.

Architecture

Figure 1 shows a multilayer neural network with one layer of hidden neurons (elements of Z).

The output neurons (labeled $Y$) and the hidden neurons may have a bias (as shown in the figure). The bias of output neuron $Y_k$ is denoted $w_{0k}$, and the bias of hidden neuron $Z_j$ is denoted $v_{0j}$. These biases act as weights on connections coming from neurons whose output is always 1 (in Figure 1 they are shown explicitly, but usually they are only implied). In addition, the arrows in Figure 1 show the flow of information during the forward phase, from inputs to outputs. During learning, the error signals propagate in the opposite direction.
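To make the remark about biases concrete, here is a minimal sketch showing that a bias is just one more weight, attached to a unit that always outputs 1. The function names `netInputWithBias` and `netInputAugmented` are hypothetical and do not come from the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Net input written with an explicit bias term: bias + sum_i x[i] * w[i].
double netInputWithBias(const std::vector<double>& x,
                        const std::vector<double>& w,
                        double bias) {
    assert(x.size() == w.size());
    double sum = bias;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * w[i];
    }
    return sum;
}

// The same net input with the bias folded into the weight vector:
// wAug[0] is the weight on a constant unit whose output is always 1,
// wAug[i + 1] is the weight on input x[i].
double netInputAugmented(const std::vector<double>& x,
                         const std::vector<double>& wAug) {
    assert(wAug.size() == x.size() + 1);
    double sum = 1.0 * wAug[0];  // the constant unit contributes 1 * weight
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * wAug[i + 1];
    }
    return sum;
}
```

Both functions return the same value when `wAug` is the bias followed by the ordinary weights, which is why the bias can be trained with the same update rule as any other weight.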

Algorithm Description

The algorithm presented below applies to a neural network with one hidden layer, which is adequate for most applications. As mentioned earlier, training includes three stages: feeding the training data to the network inputs, backpropagating the error, and adjusting the weights. During the first stage, each input neuron $X_i$ receives a signal and broadcasts it to each of the hidden neurons $Z_1, \ldots, Z_p$. Each hidden neuron then computes the result of its activation function (network function) and sends its signal $z_j$ to all output neurons. Each output neuron $Y_k$, in turn, computes the result of its activation function $y_k$, which is simply the output signal of that neuron for the given input. During learning, each output neuron compares the computed value $y_k$ with the target value $t_k$ provided by the teacher, which determines the error for the given input pattern. From this error, the term $\delta_k$ is computed. It is used to propagate the error from $Y_k$ to all units of the previous layer (the hidden neurons connected to $Y_k$), and later to change the weights of the connections between the output and hidden neurons. Similarly, $\delta_j$ is computed for each hidden neuron $Z_j$. Although there is no need to propagate the error to the input layer, $\delta_j$ is used to change the weights of the connections between the hidden and input neurons. After all the $\delta$ terms have been determined, the weights of all connections are adjusted simultaneously.

Notation:

The network training algorithm uses the following notation:

$x$: input training vector, $x = (x_1, \ldots, x_i, \ldots, x_n)$.

$t$: vector of target output values provided by the teacher, $t = (t_1, \ldots, t_k, \ldots, t_m)$.

$\delta_k$: weight-correction term for $w_{jk}$, corresponding to the error of output neuron $Y_k$; also the error information of neuron $Y_k$ that is propagated to the hidden-layer neurons connected to $Y_k$.

$\delta_j$: weight-correction term for $v_{ij}$, corresponding to the error information propagated from the output layer to hidden neuron $Z_j$.

$\alpha$: learning rate.

$X_i$: input neuron with index $i$. For input neurons, the input and output signals are the same: $x_i$.

$v_{0j}$: bias of hidden neuron $j$.

$Z_j$: hidden neuron $j$. The total input to hidden unit $Z_j$ is denoted $z\_in_j$: $z\_in_j = v_{0j} + \sum_i x_i v_{ij}$. The output signal of $Z_j$ (the result of applying the activation function to $z\_in_j$) is denoted $z_j$: $z_j = f(z\_in_j)$.

$w_{0k}$: bias of output neuron $k$.

$Y_k$: output neuron $k$. The total input to output unit $Y_k$ is denoted $y\_in_k$: $y\_in_k = w_{0k} + \sum_j z_j w_{jk}$. The output signal of $Y_k$ (the result of applying the activation function to $y\_in_k$) is denoted $y_k$: $y_k = f(y\_in_k)$.

Activation function

The activation function in the backpropagation algorithm should have several important properties: it should be continuous, differentiable and monotonically non-decreasing. Moreover, for computational efficiency, its derivative should be easy to compute. Often the activation function also saturates. One of the most commonly used activation functions is the binary sigmoid, with range (0, 1), defined as:

$$f_1(x) = \frac{1}{1 + \exp(-x)}, \qquad f_1'(x) = f_1(x)\,[1 - f_1(x)]$$

Another widely used activation function is the bipolar sigmoid, with range (-1, 1), defined as:

$$f_2(x) = \frac{2}{1 + \exp(-x)} - 1, \qquad f_2'(x) = \frac{1}{2}\,[1 + f_2(x)]\,[1 - f_2(x)]$$
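As a quick sanity check on the two sigmoids, the sketch below implements them as free functions, with the binary sigmoid as 1 / (1 + exp(-x)) and the bipolar sigmoid as 2 / (1 + exp(-x)) - 1, and compares their closed-form derivatives against a central finite difference. The free-function names and the helper `numericDerivative` are illustrative assumptions, not part of the article's class hierarchy.

```cpp
#include <cassert>
#include <cmath>

// Binary sigmoid: f(x) = 1 / (1 + exp(-x)), values in (0, 1).
double binarySigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Its derivative expressed through the function value: f'(x) = f(x) * (1 - f(x)).
double binarySigmoidDerivative(double x) {
    const double f = binarySigmoid(x);
    return f * (1.0 - f);
}

// Bipolar sigmoid: f(x) = 2 / (1 + exp(-x)) - 1, values in (-1, 1).
double bipolarSigmoid(double x) { return 2.0 / (1.0 + std::exp(-x)) - 1.0; }

// Its derivative: f'(x) = 0.5 * (1 + f(x)) * (1 - f(x)).
double bipolarSigmoidDerivative(double x) {
    const double f = bipolarSigmoid(x);
    return 0.5 * (1.0 + f) * (1.0 - f);
}

// Hypothetical helper: central finite-difference approximation of f'(x).
double numericDerivative(double (*f)(double), double x, double h = 1e-5) {
    assert(h > 0.0);
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```

Both derivatives reuse the already computed function value, which is the "easily found derivative" property mentioned above: no extra call to `exp` is needed during the backward pass.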

Learning algorithm

The learning algorithm is as follows:

Step 0.

Initialize the weights (the weights of all connections are set to small random values).

Step 1.

While the stopping condition is false, perform steps 2–9.

Step 2.

For each pair {data, target value}, perform steps 3–8.

Forward propagation of data from inputs to outputs:

Step 3.

Each input neuron $X_i$ ($i = 1, \ldots, n$) sends the received signal $x_i$ to all neurons in the next (hidden) layer.

Step 4.

Each hidden neuron $Z_j$ ($j = 1, \ldots, p$) sums its weighted incoming signals:

$$z\_in_j = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$$

and applies the activation function:

$$z_j = f(z\_in_j)$$

It then sends the result to all units of the next (output) layer.

Step 5.

Each output neuron $Y_k$ ($k = 1, \ldots, m$) sums its weighted incoming signals:

$$y\_in_k = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$$

and applies the activation function, computing the output signal:

$$y_k = f(y\_in_k)$$
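The forward phase (steps 3–5) can be sketched as two applications of the same "weighted sum plus bias, then activation" routine. This is a minimal illustration assuming the binary sigmoid and plain `std::vector` storage; the names `layerForward` and `forward` and the weight layout `weights[i][j]` (from unit i to unit j) are assumptions, not the article's classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // weights[i][j]: from unit i to unit j of the next layer

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One layer: out[j] = f(bias[j] + sum_i in[i] * weights[i][j]).
Vector layerForward(const Vector& in, const Matrix& weights, const Vector& bias) {
    assert(weights.size() == in.size());
    Vector out(bias.size());
    for (std::size_t j = 0; j < bias.size(); ++j) {
        double net = bias[j];
        for (std::size_t i = 0; i < in.size(); ++i) {
            assert(weights[i].size() == bias.size());
            net += in[i] * weights[i][j];
        }
        out[j] = sigmoid(net);
    }
    return out;
}

// Steps 3-5: inputs x -> hidden signals z -> output signals y.
// v, v0 are the input-to-hidden weights and hidden biases;
// w, w0 are the hidden-to-output weights and output biases.
Vector forward(const Vector& x, const Matrix& v, const Vector& v0,
               const Matrix& w, const Vector& w0) {
    const Vector z = layerForward(x, v, v0);  // step 4
    return layerForward(z, w, w0);            // step 5
}
```

After training, this forward pass is all that is needed to query the network, which is why a trained network answers quickly.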

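Backpropagation of the error:

Step 6.

Each output neuron $Y_k$ ($k = 1, \ldots, m$) receives the target value $t_k$, the output value that is correct for the given input signal, and computes the error term:

$$\delta_k = (t_k - y_k)\, f'(y\_in_k)$$

It also computes the amount by which the weight of the connection $w_{jk}$ will change:

$$\Delta w_{jk} = \alpha\, \delta_k\, z_j$$

In addition, it computes the bias correction:

$$\Delta w_{0k} = \alpha\, \delta_k$$

and sends $\delta_k$ to the neurons in the previous layer.

Step 7.

Each hidden neuron $Z_j$ ($j = 1, \ldots, p$) sums the incoming errors (from the neurons in the next layer):

$$\delta\_in_j = \sum_{k=1}^{m} \delta_k w_{jk}$$

and computes its error term by multiplying the result by the derivative of the activation function:

$$\delta_j = \delta\_in_j\, f'(z\_in_j)$$

It also computes the amount by which the weight of the connection $v_{ij}$ will change:

$$\Delta v_{ij} = \alpha\, \delta_j\, x_i$$

In addition, it computes the bias correction:

$$\Delta v_{0j} = \alpha\, \delta_j$$

Step 8. Updating the weights.

Each output neuron $Y_k$ ($k = 1, \ldots, m$) updates the weights of its connections with the bias unit and the hidden neurons ($j = 0, \ldots, p$):

$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$$

Each hidden neuron $Z_j$ ($j = 1, \ldots, p$) updates the weights of its connections with the bias unit and the input neurons ($i = 0, \ldots, n$):

$$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$$

Step 9.

Check the stopping condition.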
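Steps 3–8 for a single training pair can be put together as follows. This is a minimal sketch, assuming the binary sigmoid (so that f'(in) = f(in)(1 - f(in))) and a plain-struct network; the struct `Network` and the function `trainPattern` are hypothetical names and are unrelated to the class design shown in the practical part.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Hypothetical container for a network with one hidden layer.
struct Network {
    Matrix v;   // v[i][j]: weight from input i to hidden neuron j
    Vector v0;  // v0[j]: bias of hidden neuron j
    Matrix w;   // w[j][k]: weight from hidden neuron j to output neuron k
    Vector w0;  // w0[k]: bias of output neuron k
};

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Runs steps 3-8 for one pair (x, t) with learning rate alpha.
// Returns the squared error of the outputs computed before the update.
double trainPattern(Network& net, const Vector& x, const Vector& t, double alpha) {
    const std::size_t n = x.size(), p = net.v0.size(), m = net.w0.size();
    assert(net.v.size() == n && net.w.size() == p && t.size() == m);

    // Steps 3-5: forward pass.
    Vector z(p), y(m);
    for (std::size_t j = 0; j < p; ++j) {
        double in = net.v0[j];
        for (std::size_t i = 0; i < n; ++i) in += x[i] * net.v[i][j];
        z[j] = sigmoid(in);
    }
    for (std::size_t k = 0; k < m; ++k) {
        double in = net.w0[k];
        for (std::size_t j = 0; j < p; ++j) in += z[j] * net.w[j][k];
        y[k] = sigmoid(in);
    }

    // Step 6: error terms of the output neurons.
    Vector deltaK(m);
    double squaredError = 0.0;
    for (std::size_t k = 0; k < m; ++k) {
        const double diff = t[k] - y[k];
        deltaK[k] = diff * y[k] * (1.0 - y[k]);
        squaredError += diff * diff;
    }

    // Step 7: error terms of the hidden neurons, using the old weights w.
    Vector deltaJ(p);
    for (std::size_t j = 0; j < p; ++j) {
        double in = 0.0;
        for (std::size_t k = 0; k < m; ++k) in += deltaK[k] * net.w[j][k];
        deltaJ[j] = in * z[j] * (1.0 - z[j]);
    }

    // Step 8: all weights are updated only after every delta is known.
    for (std::size_t k = 0; k < m; ++k) {
        net.w0[k] += alpha * deltaK[k];
        for (std::size_t j = 0; j < p; ++j) net.w[j][k] += alpha * deltaK[k] * z[j];
    }
    for (std::size_t j = 0; j < p; ++j) {
        net.v0[j] += alpha * deltaJ[j];
        for (std::size_t i = 0; i < n; ++i) net.v[i][j] += alpha * deltaJ[j] * x[i];
    }
    return squaredError;
}
```

Note that the hidden error terms are computed before any weight is changed; updating `w` first would make step 7 use already modified weights, which is not what the algorithm prescribes.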

The stopping condition can be either that the total squared error at the network output falls below a predetermined threshold during training, or that a given number of iterations has been performed. The algorithm is based on a method called gradient descent. The gradient of a function (here the function is the error, and its parameters are the weights of the connections in the network) gives the direction in which the function increases most rapidly; taken with the opposite sign, it gives the direction of fastest decrease, and this is the direction in which the weights are adjusted.
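The two stopping conditions described above can be combined in one outer loop. The sketch below is only an illustration: `trainUntil`, `TrainingResult` and the callback `runEpoch` are hypothetical names, and `runEpoch` is assumed to perform one pass over all training pairs (steps 2–8) and return the total squared error.

```cpp
#include <cassert>
#include <functional>

struct TrainingResult {
    int epochs;        // number of epochs actually run
    double lastError;  // error returned by the last epoch
    bool converged;    // true if the error threshold was reached
};

// Repeats runEpoch until its returned error drops to maxError or below,
// or until maxEpochs epochs have been run, whichever happens first.
TrainingResult trainUntil(const std::function<double()>& runEpoch,
                          double maxError, int maxEpochs) {
    assert(maxError >= 0.0 && maxEpochs > 0);
    TrainingResult result{0, 0.0, false};
    while (result.epochs < maxEpochs) {
        result.lastError = runEpoch();
        ++result.epochs;
        if (result.lastError <= maxError) {
            result.converged = true;
            break;
        }
    }
    return result;
}
```

As a toy usage example, gradient descent on the one-parameter error E(w) = (w - 3)^2 can be passed in as a lambda that performs `w -= alpha * 2 * (w - 3)` and returns the new error: each call moves `w` against the gradient, and the loop stops once the error is small enough or the epoch budget runs out.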

Selection of initial weights and biases

Random initialization. The choice of initial weights affects whether the network reaches a global (or only a local) minimum of the error, and how quickly it gets there. The weight update for a connection between two neurons depends on the derivative of the activation function of the neuron in the later layer and on the activation of the neuron in the earlier layer. It is therefore important to avoid initial weights that make the activation or its derivative zero. The initial weights also should not be too large, or the input to each hidden or output neuron is likely to fall into the region where the derivative of the sigmoid is very small (the saturation region). On the other hand, if the initial weights are too small, the input to the hidden or output neurons will be close to zero, which also leads to very slow learning. The standard procedure is to initialize the weights with random values in the interval (-0.5, 0.5). The values can be positive or negative, since the final weights obtained after training can be of either sign.

Nguyen-Widrow initialization. The following simple modification of the standard initialization procedure leads to faster learning. The weights of the connections between the hidden and output neurons, as well as the biases of the output layer, are initialized as in the standard procedure, with random values from the interval (-0.5, 0.5).

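We introduce the notation:

$n$: number of input neurons

$p$: number of hidden neurons

$\beta$: scaling factor: $\beta = 0.7\, p^{1/n}$

The procedure consists of the following simple steps.
For each hidden neuron $Z_j$ ($j = 1, \ldots, p$):
initialize its weight vector (connections from the input neurons):

$$v_{ij}(\text{old}) = \text{random number in } (-0.5,\ 0.5)$$

compute

$$\|v_j(\text{old})\| = \sqrt{\sum_{i=1}^{n} v_{ij}(\text{old})^2}$$

reinitialize the weights:

$$v_{ij} = \frac{\beta\, v_{ij}(\text{old})}{\|v_j(\text{old})\|}$$

set the bias value:

$$v_{0j} = \text{random number in } (-\beta,\ \beta)$$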
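The Nguyen-Widrow procedure for the hidden layer can be sketched as below, assuming the scaling factor beta = 0.7 * p^(1/n), initial weights drawn uniformly from (-0.5, 0.5) and biases drawn uniformly from (-beta, beta), as in Fausett's description of the method. The names `HiddenLayerInit` and `nguyenWidrow` are hypothetical, and the random generator is passed in by the caller so that results can be reproduced.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct HiddenLayerInit {
    std::vector<std::vector<double>> v;  // v[i][j]: weight from input i to hidden neuron j
    std::vector<double> v0;              // v0[j]: bias of hidden neuron j
    double beta;                         // scaling factor 0.7 * p^(1/n)
};

// n is the number of input neurons, p the number of hidden neurons.
HiddenLayerInit nguyenWidrow(std::size_t n, std::size_t p, std::mt19937& rng) {
    assert(n > 0 && p > 0);
    const double beta =
        0.7 * std::pow(static_cast<double>(p), 1.0 / static_cast<double>(n));
    std::uniform_real_distribution<double> weightDist(-0.5, 0.5);
    std::uniform_real_distribution<double> biasDist(-beta, beta);

    HiddenLayerInit init{
        std::vector<std::vector<double>>(n, std::vector<double>(p, 0.0)),
        std::vector<double>(p, 0.0), beta};

    for (std::size_t j = 0; j < p; ++j) {
        // Draw the "old" weights and accumulate the squared norm of the vector.
        double normSquared = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            init.v[i][j] = weightDist(rng);
            normSquared += init.v[i][j] * init.v[i][j];
        }
        const double norm = std::sqrt(normSquared);
        // Rescale so that the weight vector of neuron j has length beta.
        // The guard only skips the degenerate case of an all-zero draw.
        if (norm > 0.0) {
            for (std::size_t i = 0; i < n; ++i) init.v[i][j] *= beta / norm;
        }
        init.v0[j] = biasDist(rng);
    }
    return init;
}
```

After this procedure every hidden neuron starts with a weight vector of the same length beta but a random direction, and a bias spread over (-beta, beta).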

Practical part

I'll start with the implementation of the neuron itself. I decided to represent the neurons of the input layer by the base class, and the hidden and output neurons by decorators of the base class. In addition, each neuron stores information about its outgoing and incoming connections, and holds an activation function by composition.

Neuron interface

#include <vector>

/**
 * Neuron base class.
 * Represents a basic element of a neural network, a node in the net's graph.
 * There are several ways to create an object of type Neuron; different constructors suit
 * different situations.
 */
template <typename T>
class Neuron
{
public:
    /**
     * A default Neuron constructor.
     * - Description: Creates a Neuron; general purposes.
     * - Purpose: Creates a Neuron, linked to nothing, with a Linear network function.
     * - Prerequisites: None.
     */
    Neuron( ) : mNetFunc( new Linear ), mSumOfCharges( 0.0 ) { }

    /**
     * A Neuron constructor based on NetworkFunction.
     * - Description: Creates a Neuron; mostly designed to create output neurons.
     * @param inNetFunc - a network function which produces the neuron's output signal;
     * - Purpose: Creates a Neuron, linked to nothing, with a specific network function.
     * - Prerequisites: The existence of a NetworkFunction object.
     */
    Neuron( NetworkFunction * inNetFunc ) : mNetFunc( inNetFunc ), mSumOfCharges( 0.0 ) { }

    Neuron( std::vector<NeuralLink<T> *>& inLinksToNeurons, NetworkFunction * inNetFunc ) :
        mNetFunc( inNetFunc ),
        mLinksToNeurons( inLinksToNeurons ),
        mSumOfCharges( 0.0 ) { }

    /**
     * A Neuron constructor based on a layer of Neurons.
     * - Description: Creates a Neuron; mostly designed to create input and hidden neurons.
     * @param inNeuronsLinkTo - a vector of pointers to Neurons which represents a layer;
     * @param inNetFunc - a network function which produces the neuron's output signal;
     * - Purpose: Creates a Neuron, linked to every Neuron in the provided layer.
     * - Prerequisites: The existence of std::vector<Neuron *> and NetworkFunction.
     */
    Neuron( std::vector<Neuron *>& inNeuronsLinkTo, NetworkFunction * inNetFunc );

    virtual ~Neuron( );

    virtual std::vector<NeuralLink<T> *>& GetLinksToNeurons( ) { return mLinksToNeurons; }
    virtual NeuralLink<T> * at( const int& inIndexOfNeuralLink ) { return mLinksToNeurons[ inIndexOfNeuralLink ]; }

    virtual void SetLinkToNeuron( NeuralLink<T> * inNeuralLink ) { mLinksToNeurons.push_back( inNeuralLink ); }

    virtual void Input( double inInputData ) { mSumOfCharges += inInputData; }
    virtual double Fire( );
    virtual int GetNumOfLinks( ) { return mLinksToNeurons.size( ); }
    virtual double GetSumOfCharges( );
    virtual void ResetSumOfCharges( ) { mSumOfCharges = 0.0; }
    virtual double Process( ) { return mNetFunc->Process( mSumOfCharges ); }
    virtual double Process( double inArg ) { return mNetFunc->Process( inArg ); }
    virtual double Derivative( ) { return mNetFunc->Derivative( mSumOfCharges ); }

    virtual void SetInputLink( NeuralLink<T> * inLink ) { mInputLinks.push_back( inLink ); }
    virtual std::vector<NeuralLink<T> *>& GetInputLink( ) { return mInputLinks; }

    virtual double PerformTrainingProcess( double inTarget );
    virtual void PerformWeightsUpdating( );

    virtual void ShowNeuronState( );

protected:
    NetworkFunction * mNetFunc;
    std::vector<NeuralLink<T> *> mInputLinks;
    std::vector<NeuralLink<T> *> mLinksToNeurons;

    double mSumOfCharges;
};

// Decorator for output-layer neurons: forwards most calls to the wrapped Neuron.
template <typename T>
class OutputLayerNeuronDecorator : public Neuron<T>
{
public:
    OutputLayerNeuronDecorator( Neuron<T> * inNeuron ) { mOutputCharge = 0; mNeuron = inNeuron; }
    virtual ~OutputLayerNeuronDecorator( );

    virtual std::vector<NeuralLink<T> *>& GetLinksToNeurons( ) { return mNeuron->GetLinksToNeurons( ); }
    virtual NeuralLink<T> * at( const int& inIndexOfNeuralLink ) { return ( mNeuron->at( inIndexOfNeuralLink ) ); }
    virtual void SetLinkToNeuron( NeuralLink<T> * inNeuralLink ) { mNeuron->SetLinkToNeuron( inNeuralLink ); }
    virtual double GetSumOfCharges( ) { return mNeuron->GetSumOfCharges( ); }

    virtual void ResetSumOfCharges( ) { mNeuron->ResetSumOfCharges( ); }
    virtual void Input( double inInputData ) { mNeuron->Input( inInputData ); }
    virtual double Fire( );
    virtual int GetNumOfLinks( ) { return mNeuron->GetNumOfLinks( ); }

    virtual double Process( ) { return mNeuron->Process( ); }
    virtual double Process( double inArg ) { return mNeuron->Process( inArg ); }

    virtual double Derivative( ) { return mNeuron->Derivative( ); }

    virtual void SetInputLink( NeuralLink<T> * inLink ) { mNeuron->SetInputLink( inLink ); }
    virtual std::vector<NeuralLink<T> *>& GetInputLink( ) { return mNeuron->GetInputLink( ); }

    virtual double PerformTrainingProcess( double inTarget );
    virtual void PerformWeightsUpdating( );
    virtual void ShowNeuronState( ) { mNeuron->ShowNeuronState( ); }

protected:
    double mOutputCharge;
    Neuron<T> * mNeuron;
};

// Decorator for hidden-layer neurons: forwards most calls to the wrapped Neuron.
template <typename T>
class HiddenLayerNeuronDecorator : public Neuron<T>
{
public:
    HiddenLayerNeuronDecorator( Neuron<T> * inNeuron ) { mNeuron = inNeuron; }
    virtual ~HiddenLayerNeuronDecorator( );

    virtual std::vector<NeuralLink<T> *>& GetLinksToNeurons( ) { return mNeuron->GetLinksToNeurons( ); }
    virtual void SetLinkToNeuron( NeuralLink<T> * inNeuralLink ) { mNeuron->SetLinkToNeuron( inNeuralLink ); }
    virtual double GetSumOfCharges( ) { return mNeuron->GetSumOfCharges( ); }

    virtual void ResetSumOfCharges( ) { mNeuron->ResetSumOfCharges( ); }
    virtual void Input( double inInputData ) { mNeuron->Input( inInputData ); }
    virtual double Fire( );
    virtual int GetNumOfLinks( ) { return mNeuron->GetNumOfLinks( ); }
    virtual NeuralLink<T> * at( const int& inIndexOfNeuralLink ) { return ( mNeuron->at( inIndexOfNeuralLink ) ); }

    virtual double Process( ) { return mNeuron->Process( ); }
    virtual double Process( double inArg ) { return mNeuron->Process( inArg ); }

    virtual double Derivative( ) { return mNeuron->Derivative( ); }

    virtual void SetInputLink( NeuralLink<T> * inLink ) { mNeuron->SetInputLink( inLink ); }
    virtual std::vector<NeuralLink<T> *>& GetInputLink( ) { return mNeuron->GetInputLink( ); }

    virtual double PerformTrainingProcess( double inTarget );
    virtual void PerformWeightsUpdating( );

    virtual void ShowNeuronState( ) { mNeuron->ShowNeuronState( ); }

protected:
    Neuron<T> * mNeuron;
};

The neural link interface is shown below. Each link stores a weight and a pointer to a neuron:

Neural link interface

template <typename T>
class Neuron;

template <typename T>
class NeuralLink
{
public:
    NeuralLink( ) :
        mWeightToNeuron( 0.0 ),
        mNeuronLinkedTo( 0 ),
        mWeightCorrectionTerm( 0 ),
        mErrorInformationTerm( 0 ),
        mLastTranslatedSignal( 0 ) { }

    NeuralLink( Neuron<T> * inNeuronLinkedTo, double inWeightToNeuron = 0.0 ) :
        mWeightToNeuron( inWeightToNeuron ),
        mNeuronLinkedTo( inNeuronLinkedTo ),
        mWeightCorrectionTerm( 0 ),
        mErrorInformationTerm( 0 ),
        mLastTranslatedSignal( 0 ) { }

    void SetWeight( const double& inWeight ) { mWeightToNeuron = inWeight; }
    const double& GetWeight( ) { return mWeightToNeuron; }

    void SetNeuronLinkedTo( Neuron<T> * inNeuronLinkedTo ) { mNeuronLinkedTo = inNeuronLinkedTo; }
    Neuron<T> * GetNeuronLinkedTo( ) { return mNeuronLinkedTo; }

    void SetWeightCorrectionTerm( double inWeightCorrectionTerm ) { mWeightCorrectionTerm = inWeightCorrectionTerm; }
    double GetWeightCorrectionTerm( ) { return mWeightCorrectionTerm; }

    void UpdateWeight( ) { mWeightToNeuron = mWeightToNeuron + mWeightCorrectionTerm; }

    double GetErrorInFormationTerm( ) { return mErrorInformationTerm; }
    void SetErrorInFormationTerm( double inEITerm ) { mErrorInformationTerm = inEITerm; }

    void SetLastTranslatedSignal( double inLastTranslatedSignal ) { mLastTranslatedSignal = inLastTranslatedSignal; }
    double GetLastTranslatedSignal( ) { return mLastTranslatedSignal; }

protected:
    double mWeightToNeuron;
    Neuron<T> * mNeuronLinkedTo;
    double mWeightCorrectionTerm;
    double mErrorInformationTerm;
    double mLastTranslatedSignal;
};

Each activation function inherits from an abstract class and implements the function itself and its derivative:

Activation function interface

#include <cmath>

class NetworkFunction
{
public:
    NetworkFunction( ) { }
    virtual ~NetworkFunction( ) { }

    virtual double Process( double inParam ) = 0;
    virtual double Derivative( double inParam ) = 0;
};

class Linear : public NetworkFunction
{
public:
    Linear( ) { }
    virtual ~Linear( ) { }

    virtual double Process( double inParam ) { return inParam; }
    // The derivative of the identity function f(x) = x is 1
    // (the flattened listing returned 0 here).
    virtual double Derivative( double /* inParam */ ) { return 1; }
};

class Sigmoid : public NetworkFunction
{
public:
    Sigmoid( ) { }
    virtual ~Sigmoid( ) { }

    // Binary sigmoid: f(x) = 1 / (1 + exp(-x)).
    virtual double Process( double inParam ) { return ( 1 / ( 1 + std::exp( -inParam ) ) ); }
    // f'(x) = f(x) * (1 - f(x)).
    virtual double Derivative( double inParam ) { return ( this->Process( inParam ) * ( 1 - this->Process( inParam ) ) ); }
};

class BipolarSigmoid : public NetworkFunction
{
public:
    BipolarSigmoid( ) { }
    virtual ~BipolarSigmoid( ) { }

    // Bipolar sigmoid: f(x) = 2 / (1 + exp(-x)) - 1.
    virtual double Process( double inParam ) { return ( 2 / ( 1 + std::exp( -inParam ) ) - 1 ); }
    // f'(x) = 0.5 * (1 + f(x)) * (1 - f(x)).
    virtual double Derivative( double inParam ) { return ( 0.5 * ( 1 + this->Process( inParam ) ) * ( 1 - this->Process( inParam ) ) ); }
};

The neuron factory is responsible for creating neurons:

Neuron factory interface

template <typename T>
class NeuronFactory
{
public:
    NeuronFactory( ) { }
    virtual ~NeuronFactory( ) { }

    virtual Neuron<T> * CreateInputNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc ) = 0;
    virtual Neuron<T> * CreateOutputNeuron( NetworkFunction * inNetFunc ) = 0;
    virtual Neuron<T> * CreateHiddenNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc ) = 0;
};

template <typename T>
class PerceptronNeuronFactory : public NeuronFactory<T>
{
public:
    PerceptronNeuronFactory( ) { }
    virtual ~PerceptronNeuronFactory( ) { }

    // Input neurons are plain Neuron objects.
    virtual Neuron<T> * CreateInputNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc )
    {
        return new Neuron<T>( inNeuronsLinkTo, inNetFunc );
    }

    // Output and hidden neurons are Neuron objects wrapped in the matching decorator.
    virtual Neuron<T> * CreateOutputNeuron( NetworkFunction * inNetFunc )
    {
        return new OutputLayerNeuronDecorator<T>( new Neuron<T>( inNetFunc ) );
    }

    virtual Neuron<T> * CreateHiddenNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc )
    {
        return new HiddenLayerNeuronDecorator<T>( new Neuron<T>( inNeuronsLinkTo, inNetFunc ) );
    }
};

The neural network itself stores pointers to neurons organized by layers (at the moment the pointers are kept in plain vectors, which ought to be replaced with layer objects), and holds an abstract neuron factory and a network training algorithm.

Neural network interface

template <typename T> class TrainAlgorithm;
template <typename T> class Hebb;             // forward declarations required by
template <typename T> class Backpropagation;  // the friend declarations below

/**
 * Neural network class.
 * An object of this type represents a neural network of one of several types:
 * - Single layer perceptron;
 * - Multiple layers perceptron.
 *
 * There are several training algorithms available as well:
 * - Perceptron;
 * - Backpropagation.
 *
 * How to use this class:
 * To use a neural network, create an instance of this class, specifying
 * the number of input neurons, output neurons, hidden layers and neurons in the hidden layers.
 * You can also specify the type of neural network by passing a string with its name; otherwise
 * MultiLayerPerceptron will be used. (The training algorithm can be changed via public calls.)
 *
 * Once the neural network has been created, all you have to do is set the biggest MSE allowed
 * at the end of the training phase (or skip this step, in which case mMinMSE will be set to 0.01)
 * and train the network by providing training data with target results.
 * Afterwards you can obtain the net response by feeding the net with data.
 */
template <typename T>
class NeuralNetwork
{
public:
    /**
     * A Neural Network constructor.
     * - Description: A template constructor. T is the data type all the nodes will operate with.
     *   Creates a neural network from:
     * @param inInputs - an integer argument - number of input neurons of the new network;
     * @param inOutputs - an integer argument - number of output neurons of the new network;
     * @param inNumOfHiddenLayers - an integer argument - number of hidden layers, default is 0;
     * @param inNumOfNeuronsInHiddenLayers - an integer argument - number of neurons in the hidden layers
     *   (note that every hidden layer has the same number of neurons), default is 0;
     * @param inTypeOfNeuralNetwork - a const char * argument - the type of neural network to create.
     *   The values may be:
     *   <UL>
     *     <LI>MultiLayerPerceptron;</LI>
     *     <LI>Default is MultiLayerPerceptron.</LI>
     *   </UL>
     * - Purpose: Creates a neural network for solving some interesting problems.
     * - Prerequisites: The template parameter has to be picked based on your input data.
     */
    NeuralNetwork( const int& inInputs,
                   const int& inOutputs,
                   const int& inNumOfHiddenLayers = 0,
                   const int& inNumOfNeuronsInHiddenLayers = 0,
                   const char * inTypeOfNeuralNetwork = "MultiLayerPerceptron" );

    ~NeuralNetwork( );

    /**
     * Public method Train.
     * - Description: Method for training the network.
     * - Purpose: Trains the network, so that the weights on the links are adjusted to solve the problem.
     * - Prerequisites:
     * @param inData - a vector of vectors with data to train with;
     * @param inTarget - a vector of vectors with target data;
     *   - the number of data samples and target samples has to be equal;
     *   - the data and targets have to be in the order you want the network to learn them.
     */
    bool Train( const std::vector<std::vector<T> >& inData,
                const std::vector<std::vector<T> >& inTarget );

    /**
     * Public method GetNetResponse.
     * - Description: Method for getting the response of the net by feeding it with data.
     * - Purpose: By calling this method you make the network evaluate the response for you.
     * - Prerequisites:
     * @param inData - a vector of data to feed the net with.
     */
    std::vector<int> GetNetResponse( const std::vector<T>& inData );

    /**
     * Public method SetAlgorithm.
     * - Description: Setter for the training algorithm of the net.
     * - Purpose: Can be used to change the training algorithm dynamically.
     * - Prerequisites:
     * @param inTrainingAlgorithm - an already created object of type TrainAlgorithm.
     */
    void SetAlgorithm( TrainAlgorithm<T> * inTrainingAlgorithm ) { mTrainingAlgoritm = inTrainingAlgorithm; }

    /**
     * Public method SetNeuronFactory.
     * - Description: Setter for the factory which makes neurons for the net.
     * - Purpose: Can be used to change the neuron factory dynamically.
     * - Prerequisites:
     * @param inNeuronFactory - an already created object of type NeuronFactory.
     */
    void SetNeuronFactory( NeuronFactory<T> * inNeuronFactory ) { mNeuronFactory = inNeuronFactory; }

    /**
     * Public method ShowNetworkState.
     * - Description: Prints the current state to the standard output: the weight of every link.
     * - Purpose: Can be used for monitoring how the weights change during training.
     * - Prerequisites: None.
     */
    void ShowNetworkState( );

    /**
     * Public method GetMinMSE.
     * - Description: Returns the biggest MSE allowed at the end of the training phase.
     * - Purpose: Can be used for getting the MSE threshold at which training stops.
     * - Prerequisites: None.
     */
    const double& GetMinMSE( ) { return mMinMSE; }

    /**
     * Public method SetMinMSE.
     * - Description: Setter for the biggest MSE allowed at the end of the training phase.
     * - Purpose: Can be used for setting the MSE threshold at which training stops.
     * - Prerequisites:
     * @param inMinMse - double value, the biggest MSE allowed at the end of the training phase.
     */
    void SetMinMSE( const double& inMinMse ) { mMinMSE = inMinMse; }

    /**
     * Friend class.
     */
    friend class Hebb<T>;

    /**
     * Friend class.
     */
    friend class Backpropagation<T>;

protected:
    /**
     * Protected method GetLayer.
     * - Description: Getter for a layer by its index.
     * - Purpose: Can be used by the inner implementation to access the network's layers.
     * - Prerequisites:
     * @param inInd - an integer index of the layer.
     */
    std::vector<Neuron<T> *>& GetLayer( const int& inInd ) { return mLayers[inInd]; }

    /**
     * Protected method size.
     * - Description: Returns the number of layers in the network.
     * - Purpose: Can be used by the inner implementation to get the number of layers.
     * - Prerequisites: None.
     */
    unsigned int size( ) { return mLayers.size( ); }

    /**
     * Protected method GetOutputLayer.
     * - Description: Returns the output layer.
     * - Purpose: Can be used by the inner implementation to access the units of the output layer.
     * - Prerequisites: None.
     */
    std::vector<Neuron<T> *>& GetOutputLayer( ) { return mLayers[mLayers.size( ) - 1]; }

    /**
     * Protected method GetInputLayer.
     * - Description: Returns the input layer.
     * - Purpose: Can be used by the inner implementation to access the input layer.
     * - Prerequisites: None.
     */
    std::vector<Neuron<T> *>& GetInputLayer( ) { return mLayers[0]; }

    /**
     * Protected method GetBiasLayer.
     * - Description: Returns the vector of biases.
     * - Purpose: Can be used by the inner implementation to access the vector of biases.
     * - Prerequisites: None.
     */
    std::vector<Neuron<T> *>& GetBiasLayer( ) { return mBiasLayer; }

    /**
     * Protected method UpdateWeights.
     * - Description: Updates the weights of every link between the neurons.
     * - Purpose: Can be used by the inner implementation to update the weights of the links.
     * - Prerequisites: None, but it only makes sense when called during the training phase.
     */
    void UpdateWeights( );

    /**
     * Protected method ResetCharges.
     * - Description: Resets the data the neurons received during a training iteration.
     * - Purpose: Can be used by the inner implementation to reset the neurons' data between iterations.
     * - Prerequisites: None, but it only makes sense when called during the training phase.
     */
    void ResetCharges( );

    /**
     * Protected method AddMSE.
     * - Description: Changes the MSE during the training phase.
     * - Purpose: Can be used by the inner implementation to accumulate the MSE during training.
     * - Prerequisites:
     * @param inPortion - a double amount of MSE to be added.
     */
    void AddMSE( double inPortion ) { mMeanSquaredError += inPortion; }

    /**
     * Protected method GetMSE.
     * - Description: Getter for the MSE value.
     * - Purpose: Can be used by the inner implementation to access the MSE value.
     * - Prerequisites: None.
     */
    double GetMSE( ) { return mMeanSquaredError; }

    /**
     * Protected method ResetMSE.
     * - Description: Resets the MSE value.
     * - Purpose: Can be used by the inner implementation to reset the MSE value.
     * - Prerequisites: None.
     */
    void ResetMSE( ) { mMeanSquaredError = 0; }

    NeuronFactory<T> * mNeuronFactory;              /*!< Member which is responsible for creating neurons @see SetNeuronFactory */
    TrainAlgorithm<T> * mTrainingAlgoritm;          /*!< Member which is responsible for the way the network is trained @see SetAlgorithm */
    std::vector<std::vector<Neuron<T> *> > mLayers; /*!< Inner representation of the neural network */
    std::vector<Neuron<T> *> mBiasLayer;            /*!< Container for biases */
    unsigned int mInputs, mOutputs, mHidden;        /*!< Number of inputs, outputs and hidden units */
    double mMeanSquaredError;                       /*!< Mean Squared Error, which changes on every training iteration */
    double mMinMSE;                                 /*!< The biggest Mean Squared Error at which training stops */
};

And finally, the interface of the class responsible for training the network:

Training algorithm interface

template <typename T>
class NeuralNetwork;

template <typename T>
class TrainAlgorithm
{
public:
    virtual ~TrainAlgorithm( ) { }
    virtual double Train( const std::vector<T>& inData, const std::vector<T>& inTarget ) = 0;
    virtual void WeightsInitialization( ) = 0;
protected:
};

template <typename T>
class Hebb : public TrainAlgorithm<T>
{
public:
    Hebb( NeuralNetwork<T> * inNeuralNetwork ) : mNeuralNetwork( inNeuralNetwork ) { }
    virtual ~Hebb( ) { }
    virtual double Train( const std::vector<T>& inData, const std::vector<T>& inTarget );
    virtual void WeightsInitialization( );
protected:
    NeuralNetwork<T> * mNeuralNetwork;
};

template <typename T>
class Backpropagation : public TrainAlgorithm<T>
{
public:
    Backpropagation( NeuralNetwork<T> * inNeuralNetwork );
    virtual ~Backpropagation( ) { }
    virtual double Train( const std::vector<T>& inData, const std::vector<T>& inTarget );
    virtual void WeightsInitialization( );
protected:
    void NguyenWidrowWeightsInitialization( );
    void CommonInitialization( );
    NeuralNetwork<T> * mNeuralNetwork;
};

All code is available on github: Sovietmade / NeuralNetworks

In conclusion, I would like to note that the field of neural networks is still far from fully explored: again and again we see on Habr reports of new achievements and amazing new developments by researchers in this field. For my part, this article was a first step in mastering this fascinating technology, and I hope someone will find it useful.

References:

The training algorithm for the neural network was taken from an amazing book:
Laurene V. Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms and Applications".

Source: https://habr.com/ru/post/198268/
