📜 ⬆️ ⬇️

Regular network based point clustering


In this article I will consider two algorithms, the first one is clustering directly, the second one is building a cluster contour in the form of a convex polygon, an applied task for improved perception of the result obtained.


Clustering


In general, the comics with this area is rather superficial, so it is quite expected that such an algorithm exists for a long time and is somehow named, if anyone knows, please enlighten.

Input data


For the algorithm to work effectively, the following condition should be met:
R <2 ^ k, or modify the algorithm.
')
Algorithm

  1. Comparison of coordinates of points with regular network cells.
    Xr = x >> k
    Yr = y >> k, where k is the grid coefficient, >> is the bitwise offset. Due to the absence of a division operation, this step yields a good speed gain. The result of this step will be a dictionary of the form:
    [network cell coordinates]: [links to points related to this cell]
  2. Building clusters in one pass over the points:
    • The next point is taken if it is not tied to the cluster, a new cluster is created and the point is added to it, the link to the cluster is affixed to the point itself.
    • A list of cells is taken into it: a cell with a dot and its neighbors.
    • All points in the cells from the list are searched.
    • There is a distance between the current point and the one being checked from the list, if the distance is <= R then two cases are possible.
      • the checked point does not enter the cluster yet => the point is entered into the same cluster where the current point is located
      • the checked point already belongs to some cluster other than the current => a cluster with a large number of points absorbs a cluster with a smaller one.


The result of the work will be a set of clusters in each of which the distance between the points does not exceed R, like this:

Or more clearly as follows:


Points of one cluster are marked with one color.

Circuit


It works, but you want to look at the result in a more attractive form, the easiest way, to select a clatter with an outline, preferably in the form of a convex polygon.
The main tool is a three-point vector multiplication:
public static int CrossLinePoint(ClusterPoint l1, ClusterPoint l2, ClusterPoint p) { return (pX - l2.X) * (l2.Y - l1.Y) - (pY - l2.Y) * (l2.X - l1.X); } 

If the result of the function is less than 0, then the bend at the middle point goes clockwise, if it is greater than 0, then against, if it is 0, there is no bend. The value itself is not important, only the sign is important.

The algorithm consists of only two steps.

Selection of a descriptive contour

  1. The center of mass of the cluster is located, without taking into account the weighting factors. Those. all coordinates are simply added, separately for x and y, and divided by the number of points in the cluster, the obtained coordinates [Xc; Yc] and will be the center of mass of the cluster.
  2. The three points farthest from the center of mass of the points that form the initial contour are searched for.
  3. The remaining points are searched, and for each entry in the initial contour is checked, if the point enters it, nothing happens, if outside, it becomes part of the contour.
  4. When adding a point to the contour, the array of contour points is sorted, the angle for the sort is taken from the polar coordinates of the point, where the center of coordinates is the center of mass of the cluster.
  5. The result will be a broken contour describing the points included in the cluster, and the points in the contour are already ordered clockwise relative to the center of mass.

Function to get the angle from the polar coordinates:
 private float GetPolarAngle(ClusterPoint p) { float a = 0, x = pX - center.X, y = pY - center.Y; if (x > 0 && y >= 0) { a = (float)Math.Atan(y / x); } else if (x > 0 && y < 0) { a = (float)(Math.Atan(y / x) + 2 * Math.PI); } else if (x < 0) { a = (float)(Math.Atan(y / x) + Math.PI); } else if (x == 0 && y > 0) { a = (float)(Math.PI * .5); } else if (x == 0 && y < 0) { a = (float)(Math.PI * 1.5); } else if (x == 0 && y == 0) { a = 0; } return (float)(a * 180 / Math.PI); } 

Here center is the center of mass of the cluster.

Sorting by angle is needed to draw the points of the contour in the correct order, and to determine the entry / non-entry point in the contour, when building the contour. If you leave points in random order, the algorithm will not work.

Result of work:

The green dot is the center of mass.
Still not very nice, so go to the last step.

Convex contour construction

The simplest step, in which the points are already ordered clockwise, makes a walk through the points and throws out of the contour those in which the bend goes counterclockwise. To determine the bend, use the above CrossLinePoint function.
Result of work:


Testing


Graphics speed of the algorithm.

The field has a constant size, the density of the points increases:

The field increases in size, the density of the points is about the same:


Processing time is highly dependent on the initial data. If you set a large distance between points in a cluster on a field with a high density of points, time increases by several times, which is logical, since the number of compared points increases. For example, on the graphs, the speed of work is three times different. The first shows that for 20,000 points it took 140 seconds, and in the case where the density does not increase just 40 seconds for the same 20,000 points.
If you were engaged in a similar task and you have your own measurements, I will be glad to see the results in the comments, to compare the effectiveness of the algorithms.

For testing, contour construction was not used. In view of the non-optimization of the contour search code, the time for its construction is several times longer than the time for finding the clusters themselves.

What's next


The algorithm works, but only for a set of points with coordinates, the next step will be an attempt to modify to highlight areas on continuous images (drawings, photographs, etc.).

Implementation


If someone is interested in the algorithm, there is a program and source code. Win / C #
Win x86
Sources

In addition to the cluster mappings shown in the article in the form of points and in the form of points with a bounding contour of lines, the program can also display in the form of closed curves and a shaded area of ​​closed curves.


Uses standard GDI + tools.

Source: https://habr.com/ru/post/138185/


All Articles