Finding a Cartesian product using LINQ

Question: how to find the Cartesian product of an arbitrary number of sequences using LINQ?

For a start, let's make sure we know what this is about. I will denote sequences as ordered sets: {a, b, c, d...} Cartesian product of two sequences S1 and S2 is a sequence of all possible ordered pairs such that their first element is from S1 and the second is from S2. So, for example, if you have two sequences {a, b} and {x, y, z} , then their Cartesian product looks like {{a, x}, {a, y}, {a, z}, {b, x}, {b, y}, {b, z}} .

For simplicity, suppose that S1 and S2 consist of elements of the same type. Of course, we can define the Cartesian product of a sequence of lines with a sequence of numbers as a sequence of tuples (string, int), but later it will be difficult to generalize because the C # type system, in particular, does not work in the best way with tuples of arbitrary length.

LINQ has an operator specifically for calculating Cartesian products: in the “method syntax” it is SelectMany , in the “query syntax” it is a query with two “from” expressions:

var s1 = new [ ] { a, b } ; var s2 = new [ ] { x, y, z } ; var product = from first in s1 from second in s2 select new [ ] { first, second } ;

Of course, we can generalize the Cartesian product to more than two sequences. The Cartesian product of n sequences {S1, S2, ... , Sn} is the sequence of all possible n-element sequences that have the first element of S1, the second of S2, and so on.
')
This definition lacks a trivial case: what is the Cartesian product of zero sequences? Let it then consist of a sequence containing a single element: an empty sequence, that is, { { } } .

Note that this gives a logical definition of the Cartesian product of one sequence. The Cartesian product of a sequence containing a single sequence (say, {{a, b}} ) is the sequence of all possible single-element sequences, where the first (and only) element of {a, b} . Thus, the Cartesian product {{a, b}} is {{a}, {b}} .

With LINQ, you can create a Cartesian product of any number of sequences simply if you know how much you will work with :

var product = from first in s1 from second in s2 from third in s3 select new [ ] { first, second, third } ;

But what if you do not know at the stage of writing the code, how many sequences will you have? That is, how to write the function body:

public static IEnumerable < IEnumerable < T >> CartesianProduct < T > ( this IEnumerable < IEnumerable < T >> sequences )

Well, there is a reason to use induction; this idea almost never fails when working with recursively defined data structures.

If sequences contains zero sequences, the job is done: we simply return { { } } .

How to find the Cartesian product of two sequences, say again {a, b} and {x, y, z} ? We start by counting the Cartesian product of the first sequence. Let the inductive hypothesis be that we somehow did it, so we know the answer: {{a}, {b}} . How to connect {{a}, {b}} with {x, y, z} to get the total product?

Let us go back in search of inspiration to the original definition of the Cartesian product of two sequences. Cartesian product {{a}, {b}} and {x, y, z} are random {{{a}, x}, {{a}, y}, {{a}, z}, {{b}, x}, {{b}, y}, {{b},z}} , which is nonetheless seductively close to what we want to receive. We don't just want to find the Cartesian product {{a}, {b}} and {x, y, z} , creating a sequence containing {a} and x , but no, instead we need to calculate the Cartesian product by appending x to {a} to get {a, x} ! In other words, by concatenating {a} and {x} .

In terms of code: let us have an old Cartesian product, say {{a}, {b}} . We want to connect it with the sequence {x, y, z} :

var newProduct = from old in oldProduct from item in sequence select old. Concat ( new [ ] { item } } ;

And now we have a safe induction transition. If oldProduct is any Cartesian product, then we can calculate its combination with another sequence to get a new Cartesian product.

For every fireman: does this method work with induction backing? Yes. If we want to combine the Cartesian product { { } } with the sequence {{a}, {b}} , we merge { } with {a} and { } with {b} to get {{a}, {b}} .

Let's put it all together:

static IEnumerable < IEnumerable < T >> CartesianProduct < T > ( this IEnumerable < IEnumerable < T >> sequences ) { // : IEnumerable < IEnumerable < T >> result = new [ ] { Enumerable. Empty < T > ( ) } ; foreach ( var sequence in sequences ) { var s = sequence ; // // : SelectMany, result = from seq in result from item in s select seq. Concat ( new [ ] { item } ) ; } return result ; }

It works fine, but if you want, you can make a little more beautiful. Essentially we use a battery here. Consider an example easier, say, the summation of a list of integers. One way to do this is to say “Let the battery first be zero. A new battery is obtained from the old one by adding the current element to the old battery. ” If we have the starting value of the battery and in some way we can get the new value from the old and from the current sequence element, then we can use the useful Aggregate method. It takes the starting value of the battery and the function that takes the last value and the current item and returns the new value of the battery. At the output of the method - the total value of the battery.

In this case, we will start with an empty product as a battery, and each time we will “add” to it by combining the current sequence with the product of the previous ones. At each step of the algorithm, the accumulator is equal to the Cartesian product of the sequences already scanned.

static IEnumerable < IEnumerable < T >> CartesianProduct < T > ( this IEnumerable < IEnumerable < T >> sequences ) { IEnumerable < IEnumerable < T >> emptyProduct = new [ ] { Enumerable. Empty < T > ( ) } ; return sequences. Aggregate ( emptyProduct, ( accumulator, sequence ) => from accseq in accumulator from item in sequence select accseq. Concat ( new [ ] { item } ) ) ; }

And finally, a few words for those who understand. Remember, the result of a LINQ query is a query that will provide the results on demand, not the results directly . When we build this battery, we generally do not calculate the Cartesian product. We are building a large complex query that, when launched, will produce a Cartesian product. The request is built vigorously, but will be carried out lazily.

From translator

Eric walked around in his post the specific name of the idiom that he used, namely convolution . However, on this topic on Habrahabr there were already posts. The interested person can find an excellent translation “Catamorphism in F #” .

The same problem, much less verbose, but with the same algorithm, can be solved on F #. Instead of LINQ, list comprehensions will fit well into the code - one of the remarkable features traditionally inherent in functional languages. To achieve greater performance, it is necessary to stick an element to the list from the head, with an operator (::) . In this case, to achieve the classical form of the Cartesian product, the initial sequence must be turned over before work begins.

In sum, the convolution, pipelines and list expressions will give such a beautiful code:

let product seqs = List.rev seqs |> List.fold ( fun acc cur -> [ for seq in acc do for item in cur do yield item::seq]) [[]]

Source: https://habr.com/ru/post/99128/

All Articles

Finding a Cartesian product using LINQ

From translator

More articles: