Good day!
I would like to share with you a implementation in MATLAB of a density algorithm for spatial data clustering with the presence of noise - DBSCAN (Density Based Spatial Clustering of Applications with Noise).
Features
The DBSCAN algorithm was proposed by Martin Esther, Hans-Peter Kriegel, and colleagues in 1996 as a solution to the problem of splitting (initially spatial) data into clusters of arbitrary shape. Most algorithms that produce a flat partition create clusters that are close to spherical in shape, since they minimize the distance of the documents to the center of the cluster. DBSCAN authors experimentally showed that their algorithm is able to recognize clusters of various shapes.
main idea
The idea underlying the algorithm is that inside each cluster there is a typical density of points (objects), which is noticeably higher than the density outside the cluster, as well as the density in areas with noise below the density of any of the clusters. More precisely, for each point of the cluster its neighborhood of a given radius must contain at least a certain number of points, this number of points is specified by a threshold value. For a detailed presentation of the principal features of the algorithm, it is necessary to introduce a number of definitions - which you can read in more detail in the excellent tutorial [1] on pages 197-200, where this algorithm is also presented in general form. I suggest you pay your attention directly to the implementation of the ideas proposed by the authors.
Input / Output Information
The input information is a cont matrix of dimension nx 4. The coordinates of the points in question are located in the first two columns, the other two columns are filled with 0.
As a result of the algorithm's operation, columns 3 and 4 of the cont matrix are filled: the 3rd column shows whether the point was processed; The fourth column is responsible for belonging to a particular class from 1 to s-1. If the fourth column contains -1, then this point was related to noise.
')
Implementation in the MATLAB package
n = length(cont(:,1));
I apologize for the incorrect highlighting of the MATLAB code, and could not deal with it on HABRAHABR.
Work results
Let the initial information for the algorithm is a set of points (see the figure below) containing, in addition to 3 clearly distinguished groups, the points that must be attributed to noise.

After the operation of the algorithm, we obtain the following clustering pattern:

As you can see, the algorithm identified 3 groups and noise (blue scatter in the image), which was to be expected! The reader's attention also wants to draw the fact that the algorithm coped well with the task of extracting points located along a certain curve, which allows, when properly used, to use this algorithm for the task of selecting the contours of objects in an image ... but this is a completely different story!
Literature
Automatic processing of texts in natural language and computational linguistics: studies. allowance / Bolshakova E.I., Klyshinsky E.S., Lande D.V., Noskov A.A., Peskova O.V., Yagunova E.V. - M .: MIEM, 2011. - 272 p.