
Parallel Extensions for .NET 3.5

The number of cores in processors grows year by year, yet many programs still only know how to use one of them. In this short note I want to talk about an addition to the System.Threading library called Parallel Extensions. This addition lets you run tasks, at a high level of abstraction, on all available cores/processors.

This article is only a brief introductory overview of Parallel Extensions. At the end of the article you will also find links to resources that cover the topic in more detail.

If you are interested, feel free to dive under the cut.

A bit of history


The Parallel Extensions (PE) library, a joint project of the .NET team and Microsoft Research, first saw the light of day on November 29, 2007. It is designed so that developers can take advantage of modern multi-core architectures without getting bogged down in labor-intensive thread management. Programs written with the library automatically use all available cores in the system. If a program is run on an old single-core computer, execution simply happens sequentially, with almost no loss in performance. Thus, PE exposes all the advantages of multi-core hardware while preserving performance on single-core systems.
The last update of the library came out in June 2008. It currently has the status of a Community Technology Preview and will most likely be included in .NET 4.0.

Library Composition


PE consists of three main components:

  1. The Task Parallel Library (TPL) — tasks and parallel loop/invoke primitives such as Parallel.For and Parallel.Invoke;
  2. PLINQ (Parallel LINQ) — parallel execution of LINQ queries, exposed through AsParallel();
  3. Coordination Data Structures — a set of thread-safe collections and synchronization primitives.

I want to warn you right away that this library does not solve thread-safety issues for your objects. If you use non-thread-safe objects, you must make sure that each object is accessed by only one thread at a time — see the sketch below.
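A minimal sketch of my own (not from the article) of one way to do that: keep the non-thread-safe object, such as a List<int>, behind a lock so that only one thread touches it at a time.

using System.Collections.Generic;

public class GuardedList
{
    private readonly List<int> _items = new List<int>(); // List<int> itself is not thread-safe
    private readonly object _sync = new object();

    public void Add(int value)
    {
        lock (_sync) // only one thread at a time enters this block
        {
            _items.Add(value);
        }
    }

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }
}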

What we will do


As an example, I propose taking the search for prime numbers from 2 up to a given limit. We will check every number in this interval with the following static method:

public class Prime
{
    /// <summary>
    /// Determines whether the specified candidate is prime.
    /// </summary>
    /// <param name="candidate">The candidate.</param>
    /// <returns>
    /// <c>true</c> if the specified candidate is prime; otherwise, <c>false</c>.
    /// </returns>
    public static bool IsPrime(int candidate)
    {
        for (int d = 2; d <= candidate / 2; d++)
        {
            if (candidate % d == 0)
                return false;
        }
        return candidate > 1;
    }
}

For convenience, we will create an interface:

interface IPrimeFinder
{
    /// <summary>
    /// Finds primes up to the specified limit.
    /// </summary>
    /// <param name="limit">The limit.</param>
    /// <returns>List of primes</returns>
    List<int> Find(int limit);
}


Reference calculation


As a reference, we will take a simple for loop from 2 to limit:

public List<int> Find(int limit)
{
    var result = new List<int>();

    for (int i = 2; i < limit; i++)
    {
        if (Prime.IsPrime(i))
            result.Add(i);
    }

    return result;
}

If we run the program and search for all prime numbers up to 200,000, we will see a CPU usage graph that looks something like this:

CPU load

The processor works at only half capacity. Let's try to fix that.

Parallel.For


The big advantage of this library is that turning code into multi-threaded code requires a minimum of changes. In our case, the Parallel.For() method takes three parameters: the start of the interval, the end, and the delegate to execute. Note that the modification does not touch the body of the loop:

public List<int> Find(int limit)
{
    var result = new List<int>();

    Parallel.For(2,     // start from
                 limit, // up to
                 i =>   // action on the loop variable
                 {
                     if (Prime.IsPrime(i))
                         result.Add(i);
                 });

    return result;
}

The changes are minimal, but let's look at the CPU usage graph:

Full cpu load

Both cores are used, and the execution time dropped almost in half (one cell on the graph is 5 seconds). It is worth noting that in this example elements are added to the List<T> from different threads and it happens to work, but List<T> gives no thread-safety guarantees: if we tried to traverse the list with foreach while it is being filled, we would get an error. All synchronization work falls on the user.
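Here is a hedged variation of the Parallel.For example above (my own sketch, not from the article): the shared list is guarded with a lock, so concurrent Add calls cannot corrupt it.

public List<int> FindSafe(int limit)
{
    var result = new List<int>();
    var sync = new object();

    Parallel.For(2, limit, i =>
    {
        if (Prime.IsPrime(i))
        {
            lock (sync) // serialize access to the non-thread-safe List<int>
            {
                result.Add(i);
            }
        }
    });

    return result;
}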

AsParallel()


LINQ has become one of my favorite innovations in .NET. It lets you collapse many constructs into a single line. Here is what our example looks like written with LINQ:

Enumerable.Range(2, limit - 1).Where(Prime.IsPrime).ToList();

Short and clear. To make this query parallel, you only need to add one call, AsParallel():

Enumerable.Range(2, limit - 1).AsParallel().Where(Prime.IsPrime).ToList();

Everything written after AsParallel() is executed on parallel threads, using all the processors. As you can see, in the LINQ case no changes to the substance of the code were required at all. Moreover, this variant is safer in terms of threading: here PLINQ synchronizes execution itself. You can also use aggregate functions, as in the sketch below.
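For example, a parallel aggregate can count the primes without materializing the list at all. This is my own sketch; Where() and Count() here operate on the parallel query returned by AsParallel():

int primeCount = Enumerable.Range(2, limit - 1)
                           .AsParallel()
                           .Where(Prime.IsPrime)
                           .Count();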

Parallel.Invoke()


Parallel.Invoke() is mainly used to run a set of dissimilar tasks in parallel. In our case we split the numbers of the range into 10 parts and run the calculation for each part as a parallel task. To do this, we create an additional helper method that takes the index of the subrange to calculate, the upper bound, and the number of subranges:

private static List<int> CheckRange(int n, int limit, int factor)
{
    var local_result = new List<int>();
    for (int i = n * (limit / factor); i < (n + 1) * (limit / factor); i++)
    {
        if (Prime.IsPrime(i))
            local_result.Add(i);
    }
    return local_result;
}

In our case, Parallel.Invoke() takes an array of delegates as its argument. The new code looks like this:

public List<int> Find(int limit)
{
    var result = new List<int>();
    Parallel.Invoke(() => result.AddRange(CheckRange(0, limit, 10)),
                    () => result.AddRange(CheckRange(1, limit, 10)),
                    () => result.AddRange(CheckRange(2, limit, 10)),
                    () => result.AddRange(CheckRange(3, limit, 10)),
                    () => result.AddRange(CheckRange(4, limit, 10)),
                    () => result.AddRange(CheckRange(5, limit, 10)),
                    () => result.AddRange(CheckRange(6, limit, 10)),
                    () => result.AddRange(CheckRange(7, limit, 10)),
                    () => result.AddRange(CheckRange(8, limit, 10)),
                    () => result.AddRange(CheckRange(9, limit, 10)));

    return result;
}

Of course, this time we changed the code significantly, but such a situation is not typical. A more compact way of building that array of delegates is sketched below.
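Since Parallel.Invoke() accepts an array of delegates, the ten lambdas can also be built in a loop instead of being written out by hand. This is a sketch of my own, with a lock added around the shared list as an extra precaution:

public List<int> Find(int limit)
{
    const int parts = 10;
    var result = new List<int>();
    var sync = new object();
    var actions = new Action[parts];

    for (int n = 0; n < parts; n++)
    {
        int part = n; // copy the loop variable so each lambda captures its own value
        actions[part] = () =>
        {
            var local = CheckRange(part, limit, parts);
            lock (sync) { result.AddRange(local); } // guard the shared list
        };
    }

    Parallel.Invoke(actions);
    return result;
}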

Some statistics


For clarity, here is the execution-time chart:

Execution time

And the CPU usage graph (stretched out a bit):

CPU load

On the load graph, the patterns of the three methods that use parallel computation and of the reference one are clearly visible.

Even with a small amount of computation, when the processors are not fully loaded, using PE gives an almost twofold speed-up. There is a small overhead on the first access to the library. The choice of a particular parallelization approach has practically no effect on speed.
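The article does not show how the timings were collected; a simple Stopwatch-based sketch like the one below (my own assumption, with a hypothetical Measure helper) is enough to reproduce such numbers:

using System;
using System.Collections.Generic;
using System.Diagnostics;

static class Benchmark
{
    // Runs the given search once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Func<List<int>> find)
    {
        var sw = Stopwatch.StartNew();
        find();
        sw.Stop();
        return sw.Elapsed;
    }
}

// Usage (the finder implementations are the ones discussed above):
// Console.WriteLine("for loop:     {0}", Benchmark.Measure(() => referenceFinder.Find(200000)));
// Console.WriteLine("Parallel.For: {0}", Benchmark.Measure(() => parallelFinder.Find(200000)));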

Give two!


  1. Parallel Extensions download page
  2. Example Sources : Look4Prime


Is that all?!


No, not all. Here are links for more in-depth study:


Source: https://habr.com/ru/post/45732/

