Unsupervised learning or “go there, I don’t know where, find it, I don’t know what”

Expert systems, neural networks, predicate calculus, Horn clauses, convergence theorems...
I don’t know about you, but all this machinery delights me. How wonderful that computers (taught, of course, by programmers armed with serious mathematics) can at least sometimes approach a human in decision-making skill. They do this especially well when a person is ready to teach them.

In other words, the more formalized the knowledge a computer must master, the better AI methods work. See, for example, the previous post about chess.

Unsupervised learning, learning without a teacher
However, there are situations, quite a lot of them, in which no one knows the correct answer. It is not even clear what would count as an answer, and the task itself is not fully defined. There is only data, and something useful has to be extracted from it.
You must agree that such a task is far more interesting and leaves room for imagination.

Such tasks are usually formulated as automatic classification (clustering) tasks. The input is a large set of separate objects (for example, web pages or patients’ case histories); the output is a tree of groups in which those objects are neatly sorted onto shelves.

Nonsense, you say? What can be done when neither the task nor the criterion for evaluating the result is clear? Nevertheless, there are methods for solving such problems, and they are applied quite effectively.

One of these methods is Kohonen self-organizing maps (SOM, Kohonen neural network, Kohonen data clustering algorithm).

First, all data are represented as points in a multidimensional space. Each point is described by a set of coordinates along the axes, and there are usually many of them (hundreds or thousands). Our task is to identify groups, connections, and patterns in this unwieldy multidimensional cloud.
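To make "objects as points" concrete, here is a minimal sketch of one way to turn short texts into coordinate vectors: one axis per distinct word, with the word count as the coordinate. The article does not say how its objects are encoded, so this word-count scheme, the function name `to_vectors`, and the sample texts are assumptions chosen purely for illustration.

```python
from collections import Counter

def to_vectors(documents):
    """Turn each text into a point with one coordinate per distinct word."""
    tokenized = [doc.lower().split() for doc in documents]
    # The sorted vocabulary fixes the order of the axes.
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        # Coordinate along each axis = how often that word occurs in the text.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

# Hypothetical sample objects (stand-ins for web pages or case histories).
docs = [
    "chess engine search",
    "patient fever cough",
    "chess opening search search",
]
vocabulary, vectors = to_vectors(docs)
for doc, vector in zip(docs, vectors):
    print(vector, "<-", doc)
```

Even three tiny texts already need seven axes, which hints at why real collections end up with hundreds or thousands of coordinates.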

The idea of the method arose by analogy with the human brain. By one account, the human cerebral cortex is a flat sheet of approximately 1 square meter, crumpled and tucked into the skull. As a result, some areas of the cortex lie right next to each other even though their “flat coordinates” are very far apart. Kohonen suggested (this is neither confirmed nor refuted) that proximity between points of the originally flat cortex has some deeper meaning at the level of a person’s knowledge and thinking.

The self-organizing map method
How does the method work? There are data points in a space. The Kohonen network of neurons is placed in the same space.
The data points are visited one by one. For each point, the nearest network node is found, and the whole network is pulled toward that point. The neuron closest to the data point moves the most; the farther a neuron is from this best one, the less it moves.

As a result, the entire network creeps quite briskly toward the clusters of points and spreads out among them, covering the multidimensional data with a flat map.
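The update described above can be sketched in a few dozen lines of plain Python. This is a minimal illustration, not the code of the demo program mentioned later. It assumes a small rectangular grid of neurons, a Gaussian falloff of the pull with distance measured along the grid, and a learning rate and radius that shrink linearly over time. All of these are common choices rather than anything the article specifies. The names `train_som` and `quantization_error` are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {grid position: weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Each neuron has a fixed grid position and a movable point in data space.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            decay = 1 - step / total_steps           # goes from 1 down to 0
            lr = lr0 * decay                         # assumed linear schedule
            radius = max(radius0 * decay, 0.5)       # neighbourhood shrinks too
            # Best-matching unit: the neuron nearest to the data point.
            bmu = min(weights, key=lambda node: sq_dist(weights[node], x))
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                # The farther a neuron is from the best one, the smaller the pull.
                pull = lr * math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (x[i] - w[i])
            step += 1
    return weights

def quantization_error(weights, data):
    """Mean distance from each data point to its nearest neuron."""
    return sum(math.sqrt(min(sq_dist(w, x) for w in weights.values()))
               for x in data) / len(data)

# Two synthetic 2D clusters (made-up data for the demonstration).
rng = random.Random(1)
data = ([[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)] +
        [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(50)])

untrained = train_som(data, epochs=0)   # epochs=0 returns the initial map
trained = train_som(data)
print("error before:", quantization_error(untrained, data))
print("error after: ", quantization_error(trained, data))
```

The mean distance from a data point to its nearest neuron is one simple way to see that the map has moved toward the data; other quality measures exist.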

For example, for two-dimensional and three-dimensional problems it looks like this (training is deliberately not run to completion, at which point the map would collapse into 2-3 clumps):

Analysis and Opportunities
Once the process is over, the map is “unfolded” and the resulting groups of data can be analyzed (there are many methods and metrics for this).

If you then pick a target attribute within the found clusters (for example, determining a person’s sex from a set of facial features), you can examine in more detail how each feature, and each group of features, affects the result: clustering turns into a tool for analysis and decision making.
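One simple form of such an analysis is to look at how the target attribute is distributed inside each cluster. The sketch below assumes the clustering has already produced a cluster id per object. The ids, the labels, and the function name `attribute_share_by_cluster` are made up for illustration.

```python
from collections import Counter, defaultdict

def attribute_share_by_cluster(cluster_ids, labels):
    """For every cluster, return the share of each target label inside it."""
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1
    return {
        cluster: {label: n / sum(c.values()) for label, n in c.items()}
        for cluster, c in counts.items()
    }

# Hypothetical result of clustering eight objects, plus their known labels.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2]
labels = ["f", "f", "m", "m", "m", "m", "f", "f"]

shares = attribute_share_by_cluster(cluster_ids, labels)
for cluster in sorted(shares):
    print(cluster, shares[cluster])
```

A cluster dominated by one label suggests that the features which define that cluster carry information about the target attribute.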

The resulting classification can be made more or less detailed depending on the requirements. You can adjust the number of clusters or set a closeness threshold for individual clusters, obtaining a multilevel classification with any degree of detail.
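The threshold idea can be illustrated with a small sketch: cluster centers closer than a given threshold are merged, and raising the threshold yields coarser levels of the classification. The merge rule used here (join any two centers within the threshold, tracked with a union-find structure) is an assumption. The article does not say which rule it has in mind, and the centers are made-up numbers.

```python
import math

def merge_by_threshold(points, threshold):
    """Group indices of points linked by chains of distances <= threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)   # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical cluster centers, e.g. neuron positions after training.
centers = [(0, 0), (1, 0), (5, 5), (6, 5), (20, 20)]

for threshold in (0.5, 1.5, 8, 25):
    print(threshold, merge_by_threshold(centers, threshold))
```

Running the same centers through increasing thresholds gives nested groupings, which is one way to read "multilevel classification".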

Implementations and demonstrations
There is a demonstration program that trains a Kohonen network on sets of points in three-dimensional space.

The data points are distributed over regions of space defined as ellipsoidal clouds, each given by a center and three radii. The configuration and source data are set in XML. For clarity, the program displays a three-dimensional view and three projections onto the coordinate planes. It automatically generates a video of the latest training session.
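For readers who want to build similar test data themselves, here is a minimal sketch that fills an ellipsoid, given a center and three radii, with random points. It is not the demo program's code and does not read its XML format. The uniform filling, the rejection-sampling approach, and the names `ellipsoid_cloud` and `inside` are assumptions made for this example.

```python
import random

def ellipsoid_cloud(center, radii, n, seed=0):
    """Return n points spread uniformly inside an axis-aligned ellipsoid."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        # Draw from the cube [-1, 1]^3 and keep only points inside the unit ball.
        u = [rng.uniform(-1, 1) for _ in range(3)]
        if sum(c * c for c in u) <= 1:
            # Stretch the unit ball by the radii and shift it to the center.
            points.append(tuple(ci + ri * ui
                                for ci, ri, ui in zip(center, radii, u)))
    return points

def inside(point, center, radii):
    """Check the ellipsoid equation, with a small tolerance for rounding."""
    return sum(((p - c) / r) ** 2
               for p, c, r in zip(point, center, radii)) <= 1 + 1e-9

# Hypothetical cloud: center and three radii, as in the description above.
center, radii = (10.0, 0.0, -5.0), (3.0, 1.0, 0.5)
cloud = ellipsoid_cloud(center, radii, n=200)
print(len(cloud), all(inside(p, center, radii) for p in cloud))
```

Several such clouds with different centers give a three-dimensional data set on which a Kohonen map can be trained and watched.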

Download sample program

Written using OpenCV (Download description)

Kohonen networks as one of the data analysis tools are also present in a variety of specialized software packages.

Methods of data analysis that work without a teacher and without clear grouping criteria are widely used for feature selection, dimensionality reduction, and data mining.

What to read on the topic:
In Russian
1. Kohonen Networks
2. Cluster analysis
In other languages
3. www.cis.hut.fi/teuvo
4. www.shef.ac.uk/psychology/gurney/notes/l7/l7.html
5. www.len.ro/work/ai/som-neural-networks
6. www.samhill.co.uk/kohonen

Source: https://habr.com/ru/post/51372/
