Combinatorial algorithms: combination index, partitioning into subsets

Short preface

Combinatorial algorithms are used quite often. On the Internet you can find a lot of information regarding combinatorial algorithms. However, the Russian-language Internet, basically, gives the simplest tasks of continuous enumeration (generation) of combinatorial objects in a cycle. For example:

Example

//   3  52 for (int i1 = 0; i1 < 50; ++i1) for (int i2 = i1+1; i2 < 51; ++i2) for (int i3 = i2+1; i3 < 52; ++i3) // ...

Combination index

Each combination, permutation, placement, and other combinatorial objects can be associated with an index — this is the number in which it appears when iterating through this algorithm.

Here we will look at a more complex task, the solutions of which I haven’t found in Runet (however, I’ll give one link , but that formula is clearly incorrect) - based on the combination itself (in this case, a set of three numbers), find its index.

There is a brute force "in the forehead." We turn on the counter of combinations, we take the algorithm above and until we reach the desired option, we sort out everything. This option has a very large time complexity, so this option will be discarded.
Suppose there are numbers i1, i2, i3 at the input. First of all, we need to arrange them in ascending order (note that the generation algorithm itself, given above, always displays them in an ordered form, although the very notion of “combination” implies an arbitrary order).

Put, for definiteness, after ordering i1 = 3, i2 = 7, i3 = 12.
This means that we “went through” all the combinations, where i1 was equal to 0, 1 and 2.
For i1 = 0, we went through C (2, 51) combinations, since the indices i1, i2 run through all the combinations of 2 out of 51 numbers.
For i1 = 1, we passed C (2, 50) combinations.
For i1 = 2, we passed C (2, 49) combinations.
Total we passed the SUM {from n = 0 to n = i1-1) C (2, 51-n}.
With i1 = 3, we will consider the combinations that we went through, running over the index i2 (and it starts with i2 = i1 + 1 = 4).
With i2 = 4, C (1, 47) combinations passed, with i2 = 5 C (1, 46) combinations passed, with i2 = 6 C (1, 45) combinations passed.
By complete analogy, with i2 = 7, we consider the combinations that the i3 index ran.
We get the general formula:
The desired combination index = SUM {from n = 0 through i1-1} C (2, 51-n) + SUM {from n = i1 + 1 through i2-1} C (1, 51-n) + SUM {from n = i2 + 1 through i3-1} C (0, 51-n).
')

Index of partitioning into subsets

In combinatorics, there is a more complex object - splitting into subsets. For example, splitting a 52-element set into three subsets, consisting, respectively, of, say, 2, 3, and 47 elements, where the order of the elements within each subset is unimportant. (By the way, a combination of 2 out of 52 is just a special case of splitting into two subsets of 2 and 50 elements, respectively).

A typical generation algorithm is that we generate combinations of 2 of 52, and for each such combination in a nested loop we generate combinations of 3 of 50, and check that the indices (i3, i4, i5) in the nested combination do not match the indices (i1, i2) in external combination:

C ++ code

 //   for (int i1 = 0; i1 < 51; ++i1) for (int i2 = i1+1; i2 < 52; ++i2) //   for (int i3 = 0; i3 < 50; ++i3) if (i3 != i1 && i3 != i2) for (int i4 = i3+1; i4 < 51; ++i4) if (i4 != i1 && i4 != i2) for (int i5 = i4+1; i5 < 52; ++i5) if (i5 != i1 && i5 != i2) // ...

Again, each such partition has its own index.
It turns out that our algorithm for finding the combination index can be modified to find the partition index.
It should be noted that the indices in the “external combination” i1, i2 must be ordered among themselves, and the indices i3, i4, i5 among themselves, but independently of the first two.
It should also be noted that in the “nested combination” we are essentially working with a 50-element set, but the indices i3, i4, i5 need to be “shifted” in a certain way so that they change not from 0 to 51, but from 0 to 49:

C ++ code

 if (i3 >= i1) --i3; if (i3 >= i2) --i2; //   i4, i5

(so to speak, we cut out the indices i1, i2 from our 52-element set and then shift the remaining set to close the holes, while the indexes i3-i5 are shifted).
It should be borne in mind that inside each "external" combination we have exactly C (3, 50) "nested" combinations.
Then the algorithm will be reduced to the following:
CALCULATION INDEX (i1, i2 of 52) * NUMBER OF COUNTINGS_PO_3_IZ_50 + CALCULATION INDEX (i3, i4, i5 of 50 after the shift of indices).

Reduction of algorithms to constant complexity

Immediately I should note that a “error” occurs in the formula if, for example, i1 = 0 (we assume the sum for n = from 0 to -1) or i2 = 1 (we count the sum from 1 to 0). In fact, in such cases, the corresponding amount should be taken equal to zero, and the result will be correct.
I should also pay attention to the time complexity of our algorithms: they have linear complexity, provided that C is considered to be constant time. This is much better than brute force.
But in fact, for a fixed dimension of the base set (in our case 52), nothing prevents us from reducing the algorithm to constant complexity. For this, we apply the knowledge of mathematics and analytically calculate all sums.
For example, the number of combinations of C (2, K), if "open" it in the form of the formula K! / ((K-2)! * 2!), Is reduced to a polynomial of the 2nd degree. And since we have it under the sum sign, then we can apply the formulas for the sum of the Nth powers of the numbers of the natural series (see Wikipedia ) and get a single polynomial of the 3rd degree. Which obviously has a constant temporal complexity. (And the “mistake” that I gave at the beginning of the subtitle doesn’t manifest itself at all, the formula remains correct).
Again, this is for a fixed dimension of the base set . However, I am sure that using metaprogramming, you can “teach” the compiler to do this work for you.

Sample code for index splitting into 2, 3, 47

 int get_split_2_3_47_index(int i1, int i2, int i3, int i4, int i5) { //    2  52,   C(3, 50) int offset = ( (52*51 - (51-i1)*(52-i1)) / 2 + (i2 - i1 - 1) ) * 19600; // ""   ,    0...49 if (i3 >= i1) --i3; if (i3 >= i2) --i3; if (i4 >= i1) --i4; if (i4 >= i2) --i4; if (i5 >= i1) --i5; if (i5 >= i2) --i5; //      3 // 0: //   n = 0  i3-1  (2, 49-n) // =   m = 50-i3  49 (m * (m-1) / 2) offset += (19600 - ( ((49-i3)*(50-i3)*(99-2*i3)) / 6 - ((49-i3)*(50-i3)) / 2 ) / 2 ); // 1: //   n = i3+1  i4-1  (1, 49-n) offset += (((48-i3)*(49-i3) - (49-i4)*(50-i4)) / 2); // 2: //   n = i4+1  i5-1 (1) offset += (i5 - i4 - 1); return offset; }

Afterword

In my poker simulator (Texas Hold'em), I wanted to pre-calculate and store the winning probabilities for all possible combinations of my hand cards (2 pieces) and all the flop cards (3 cards on the table). This corresponds to the division of the 52-set into 2, 3, 47 subsets.
Calculated and saved.
But when it came time to read the data from a file for a particular combination, I really didn’t want to calculate something for a long time or search in a gigabyte file. Now I just define the offset in the file and read directly what I need.

In general, I would divide combinatorial algorithms into such classes:

Generation of combinatorial objects in a single cycle (everything is simple, I gave examples);
The transition to the next (or previous) combinatorial object, knowing the previous one (a kind of forward / backward iterator, to put it in terms of C ++) - here you can mention the textbook T. I. Fedoriaeva, where the constant time complexity algorithm for permutations is given, and other examples in runet can be found, but only for permutations - and it would be interesting to see similar algorithms for other combinatorial objects;
Finding the index of the object. There are no examples at all, except for Fedoryaeva’s allowance, with linear complexity, and only for permutation;
Finding an object by index.

It would be interesting to get a reference book on combinatorial algorithms for all combinatorial objects, including permutations, combinations, placements, splitting into subsets, combinations with repetitions, placing with repetitions.

Source: https://habr.com/ru/post/247807/

All Articles