Hi, Habr! I recently started getting acquainted with Google's deep learning library TensorFlow, and I wanted to implement a Kohonen self-organizing map as an experiment. So I decided to build it using the standard functionality of this library. This article explains what a Kohonen self-organizing map is and how its learning algorithm works, and then gives an example implementation and the results.
About self-organizing maps
To begin with, let's figure out what a Self-Organizing Map, or simply SOM, is. A SOM is an artificial neural network trained with unsupervised learning. In self-organizing maps, neurons are placed at the nodes of a lattice, usually one- or two-dimensional. Every neuron of this lattice is connected to all nodes of the input layer.
A SOM transforms a continuous input space into a discrete output space.
SOM learning algorithm
Network training consists of three main processes: competition, cooperation, and adaptation. All the steps of the SOM learning algorithm are described below.
Step 1: Initialization. For all synaptic weight vectors $\mathbf{w}_j = [w_{j1}, w_{j2}, \ldots, w_{jm}]^T$, $j = 1, 2, \ldots, l$, where $l$ is the total number of neurons and $m$ is the dimension of the input space, choose random values from -1 to 1.
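For example, a minimal NumPy sketch of this step (the lattice size and input dimension here are illustrative) might look like this:

```python
import numpy as np

m = 3              # dimension of the input space (e.g. an RGB color)
dim = 20           # the lattice is dim x dim neurons
l = dim * dim      # total number of neurons

# one weight vector of size m per neuron, initialized uniformly in [-1, 1]
w = np.random.uniform(-1, 1, size=(l, m))
```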
Step 2: Sampling. Draw a vector $\mathbf{x} = [x_1, x_2, \ldots, x_m]^T$ from the input space.
Step 3: Search for the winning neuron, or the competition process. Find the best-matching (winning) neuron $i(\mathbf{x})$ at step $n$, using the minimum Euclidean distance criterion (which is equivalent to the maximum of the scalar products $\mathbf{w}_j^T \mathbf{x}$): $$i(\mathbf{x}) = \arg\min_j \lVert \mathbf{x}(n) - \mathbf{w}_j \rVert, \quad j = 1, 2, \ldots, l.$$
The Euclidean distance $d_j$ is defined as: $$d_j = \lVert \mathbf{x} - \mathbf{w}_j \rVert = \sqrt{\sum_{i=1}^{m} (x_i - w_{ji})^2}.$$
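Continuing the NumPy sketch above, the competition step can be written as follows (x here is just a sample input vector):

```python
# w and m as defined in the initialization sketch above
x = np.random.uniform(-1, 1, size=(m,))        # a sample input vector

# Euclidean distance from x to every weight vector
distances = np.sqrt(np.sum((w - x) ** 2, axis=1))

# index of the winning neuron (best matching unit)
win_index = np.argmin(distances)
```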
Step 4: The cooperation process. The winning neuron is located at the center of a topological neighborhood of "cooperating" neurons. The key question is: how do we define the topological neighborhood of the winning neuron? For convenience, denote it $h_{j,i(\mathbf{x})}$, centered on the winning neuron $i(\mathbf{x})$. The topological neighborhood must be symmetric about its maximum point, reached at $d_{j,i} = 0$, where $d_{j,i}$ is the lateral distance between the winning neuron $i$ and the neighboring neuron $j$. A typical example satisfying this condition is the Gaussian function: $$h_{j,i(\mathbf{x})} = \exp\left(-\frac{d_{j,i}^2}{2\sigma^2}\right),$$
where $\sigma$ is the effective width of the neighborhood. The lateral distance is defined as $d_{j,i} = |j - i|$ in the one-dimensional case and as $d_{j,i}^2 = \lVert \mathbf{r}_j - \mathbf{r}_i \rVert^2$ in the two-dimensional case, where $\mathbf{r}_j$ determines the position of the excited neuron $j$ and $\mathbf{r}_i$ the position of the winning neuron $i$ (in a two-dimensional lattice these are the x and y coordinates of the neuron in the lattice).
Graph of the topological neighborhood function for various values of $\sigma$.
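As a sketch, the neighborhood values for a two-dimensional lattice can be computed like this (continuing the NumPy example; the value of sigma is illustrative):

```python
# dim and win_index as in the sketches above
sigma = 3.0                                    # effective width of the neighborhood

# (x, y) position of every neuron in the dim x dim lattice
positions = np.array([[i, j] for i in range(dim) for j in range(dim)], dtype=float)

# squared lateral distances d^2 between each neuron and the winner
d_sq = np.sum((positions - positions[win_index]) ** 2, axis=1)

# Gaussian topological neighborhood h_{j,i(x)}
h = np.exp(-d_sq / (2 * sigma ** 2))
```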
The SOM is characterized by a shrinking topological neighborhood during training. This can be achieved by decreasing $\sigma$ according to the formula: $$\sigma(n) = \sigma_0 \exp\left(-\frac{n}{\tau_1}\right),$$
where $\tau_1$ is a time constant, $n$ is the training step, and $\sigma_0$ is the initial value of $\sigma$.
Plot of $\sigma(n)$ during training.
At the end of the training phase, the function $h_{j,i}$ should cover only the nearest neighbors. The figures below show graphs of the topological neighborhood function for a two-dimensional lattice.
From this figure it is clear that at the beginning of the training, the topological neighborhood covers almost the entire lattice.
At the end of training, it narrows to the nearest neighbors.
Step 5: The adaptation process. The adaptation process changes the synaptic weights of the network. The change of the weight vector of neuron $j$ in the lattice can be expressed as: $$\Delta \mathbf{w}_j = \eta(n)\, h_{j,i(\mathbf{x})}(n)\, (\mathbf{x} - \mathbf{w}_j),$$
where $\eta(n)$ is the learning-rate parameter. As a result, we obtain the formula for the updated weight vector at time $n + 1$: $$\mathbf{w}_j(n + 1) = \mathbf{w}_j(n) + \eta(n)\, h_{j,i(\mathbf{x})}(n)\, (\mathbf{x}(n) - \mathbf{w}_j(n)).$$
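Continuing the NumPy sketches above, the adaptation step then becomes a single vectorized update (the learning-rate value here is illustrative):

```python
# w, x and h as defined in the sketches above
eta = 0.1                                      # learning-rate parameter eta(n)

# w_j(n+1) = w_j(n) + eta(n) * h_{j,i(x)}(n) * (x(n) - w_j(n))
w += eta * h[:, np.newaxis] * (x - w)
```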
The SOM learning algorithm also recommends decreasing the learning-rate parameter with the step number: $$\eta(n) = \eta_0 \exp\left(-\frac{n}{\tau_2}\right),$$ where $\tau_2$ is another time constant of the SOM algorithm.
Plot of $\eta(n)$ during training.
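Both decay schedules are simple exponentials; as a sketch:

```python
import numpy as np

def sigma(n, sigma_0, tau_1):
    # effective neighborhood width at training step n
    return sigma_0 * np.exp(-n / tau_1)

def eta(n, eta_0, tau_2):
    # learning-rate parameter at training step n
    return eta_0 * np.exp(-n / tau_2)
```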
After updating the weights, we return to step 2, and so on.
Heuristics of the SOM learning algorithm
Network training consists of two stages. The self-organization stage can take up to 1000 iterations, and possibly more. The convergence stage is required for fine-tuning the feature map; as a rule, the number of iterations sufficient for it can be 500 times the number of neurons in the network. Heuristic 1: the initial value of the learning-rate parameter is best chosen close to $\eta_0 = 0.1$, with $\tau_2 = 1000$; however, $\eta(n)$ should not fall below 0.01. Heuristic 2: the initial value $\sigma_0$ should be set approximately equal to the radius of the lattice, and the constant $\tau_1$ as: $$\tau_1 = \frac{1000}{\log \sigma_0}.$$
At the convergence stage, the change of $\sigma(n)$ should be stopped.
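As a worked example of these heuristics (the lattice size is illustrative):

```python
import numpy as np

dim = 20                            # a 20 x 20 lattice
sigma_0 = dim / 2.0                 # heuristic 2: roughly the lattice radius -> 10.0
tau_1 = 1000.0 / np.log(sigma_0)    # -> about 434
eta_0, tau_2 = 0.1, 1000.0          # heuristic 1
```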
Implementing SOM with Python and TensorFlow
Now let's move on from theory to a practical implementation of a self-organizing map using Python and TensorFlow.
To begin, let's create the SOMNetwork class along with the TensorFlow operations that initialize all the constants of the algorithm:
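A minimal sketch of such a class is shown below, assuming TensorFlow 1.x. The attribute and method names (x, n, training_op) follow the training loop shown later; the constructor arguments and their defaults are illustrative:

```python
import numpy as np
import tensorflow as tf


class SOMNetwork:
    def __init__(self, input_dim, dim=20, sigma=None, learning_rate=0.1, tau2=1000, dtype=tf.float32):
        # heuristic 2: initial neighborhood width is roughly the lattice radius
        if sigma is None:
            sigma = dim / 2.0
        # constants of the algorithm
        self.learning_rate = tf.constant(learning_rate, dtype=dtype, name='learning_rate')
        self.sigma = tf.constant(sigma, dtype=dtype, name='sigma')
        self.tau1 = tf.constant(1000 / np.log(sigma), dtype=dtype, name='tau1')
        self.tau2 = tf.constant(tau2, dtype=dtype, name='tau2')
        # placeholders: the input vector x and the iteration number n
        self.x = tf.placeholder(dtype=dtype, shape=[input_dim], name='input')
        self.n = tf.placeholder(dtype=dtype, name='iteration')
        # synaptic weights, initialized uniformly in [-1, 1]
        self.w = tf.Variable(
            tf.random_uniform([dim * dim, input_dim], minval=-1, maxval=1, dtype=dtype),
            name='weights')
        # (x, y) position of every neuron in the dim x dim lattice
        positions = np.array([[i, j] for i in range(dim) for j in range(dim)], dtype=np.float32)
        self.positions = tf.constant(positions, dtype=dtype, name='positions')

    def __competition(self):
        # competition: index of the winning neuron (minimum Euclidean distance)
        distance = tf.sqrt(tf.reduce_sum(tf.square(self.x - self.w), axis=1))
        return tf.argmin(distance, axis=0)

    def training_op(self):
        win_index = self.__competition()
        win_pos = tf.gather(self.positions, win_index)
        # cooperation: squared lateral distances to the winner and the Gaussian neighborhood
        d_sq = tf.reduce_sum(tf.square(self.positions - win_pos), axis=1)
        sigma_n = self.sigma * tf.exp(-self.n / self.tau1)
        h = tf.exp(-d_sq / (2 * tf.square(sigma_n)))
        # adaptation: w <- w + eta(n) * h * (x - w)
        eta_n = self.learning_rate * tf.exp(-self.n / self.tau2)
        delta = eta_n * tf.expand_dims(h, 1) * (self.x - self.w)
        return tf.assign_add(self.w, delta)
```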
As a result, we have implemented the SOM and obtained the training_op operation, with which we can train our neural network by passing it the input vector and the step number at each iteration. Below is the graph of TensorFlow operations built with TensorBoard.
The TensorFlow operation graph.
Testing the program
To test the program, we will feed three-dimensional vectors to the input. Such a vector can be interpreted as a color with three components. Let's create an instance of our SOM network and an array of random vectors (colors):
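One possible setup, continuing the class sketch above (the lattice size and the number of training samples are chosen to match the experiment below):

```python
# a 20 x 20 map that learns 3-component color vectors
som = SOMNetwork(input_dim=3, dim=20, learning_rate=0.1)

# 200 000 random colors with components in [0, 1]
test_data = np.random.uniform(0, 1, size=(200000, 3))
```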
```python
training_op = som.training_op()
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    for i, color_data in enumerate(test_data):
        if i % 1000 == 0:
            print('iter:', i)
        sess.run(training_op, feed_dict={som.x: color_data, som.n: i})
```
After training, the network converts the continuous input color space into a discrete map: when a given color is fed in, neurons from the single map region corresponding to that color are always activated (the one neuron whose weights best match the supplied vector fires). For a demonstration, the synaptic weight vectors of the neurons can be rendered as a color gradient image.
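For example, with matplotlib the learned weight matrix can be reshaped into an image like this (a sketch; it assumes the 20x20 network above and should be run while the training session is still open):

```python
import matplotlib.pyplot as plt

weights = sess.run(som.w)            # shape: (20 * 20, 3)
image = weights.reshape(20, 20, 3)   # one RGB pixel per neuron of the lattice

plt.imshow(image.clip(0, 1))         # clip, since the weights start out in [-1, 1]
plt.axis('off')
plt.show()
```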
The figure shows the weight map for a network of 20x20 neurons after 200,000 training iterations:
Map of weights at the beginning of training (left) and at the end of training (right).
Weight map for a network of 100x100 neurons after 350,000 training iterations.
Conclusion
As a result, a self-organizing map was created, and an example of training it on input vectors consisting of three components was shown. Vectors of any dimension can be used to train the network. The algorithm can also be converted to batch mode; in that case, the order in which the input data is presented to the network does not affect the final form of the feature map, and there is no need to decrease the learning-rate parameter over time.