Parallel fast sorting on Haskell and how hard it turned out to write

Note Perev .: This is a translation of the story of how difficult it was to write a parallel quick sort (quicksort) on Haskell. The original article was written in 2010, but it seems to me that it is still instructive and in many ways relevant.

There are many examples of how Haskel makes simple problems difficult. Probably the most famous of them is the sieve of Eratosthenes, which is easy to write in any imperative language, but it is so difficult to write on Haskell that almost all the decisions that have been taught at universities and used in research over the past 18 years have been wrong. Melissa O'Neill drew attention to their inconsistency in her important scientific work The Real Sieve of Eratosthenes . It provides an excellent description of what is wrong with the old approaches, and how to correct them. Melissa’s decision was to use the priority queue to implement the sieve. The correct solution turned out to be 10 times longer than a much simpler solution on F # and as much as 100 times longer than the original disfigured Haskell algorithm.

Today, quick sorting is a new sieve of Eratosthenes. And the programmers on Haskell again bypassed the inability of the language to express this algorithm by disfiguring the latter . The new version is slower by orders of magnitude, but it can easily be written on Haskell.

qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

This code is completely at odds with the essence of the present quick sort algorithm, which makes it so efficient (see Tony Hoare's original quick sorting article in 1962 ). Namely, rearranging the array without additional memory allocation [in-place partitioning using swaps].
')
Faced with the challenge of writing a general purpose parallel sorting on Haskell, Jim Apple (who writes Haskell's Ph.D. at the University of California, Davis, UC Davis) launched the case by writing the following code:

 import Data.HashTable as H import Data.Array.IO import Control.Parallel.Strategies import Control.Monad import System exch air = do tmpi <- readArray ai tmpr <- readArray ar writeArray ai tmpr writeArray ai tmpi bool abc = if c then a else b quicksort arr lr = if r <= l then return () else do i <- loop (l-1) r =<< readArray arr r exch arr ir withStrategy rpar $ quicksort arr l (i-1) quicksort arr (i+1) r where loop ijv = do (i', j') <- liftM2 (,) (find (>=v) (+1) (i+1)) (find (<=v) (subtract 1) (j-1)) if (i' < j') then exch arr i' j' >> loop i' j' v else return i' find pfi = if i == l then return i else bool (return i) (find pf (fi)) . p =<< readArray arr i main = do [testSize] <- fmap (fmap read) getArgs arr <- testPar testSize ans <- readArray arr (testSize `div` 2) print ans testPar testSize = do x <- testArray testSize quicksort x 0 (testSize - 1) return x testArray :: Int -> IO (IOArray Int Double) testArray testSize = do ans <- newListArray (0,testSize-1) [fromIntegral $ H.hashString $ show i | i <- [1..testSize]] return ans

This algorithm uses parallel Haskell " strategies ". This concept was designed to give programmers on Haskell more control over parallelization, but it turned out that in the only available implementation memory flows and nobody managed to get it to work in this code: Jim's solution contains a multithreading error [concurrency], due to which returns incorrect results in almost every call.

Then Peaker proposed his solution on Haskell:

 import Data.Array.IO import Control.Monad import Control.Concurrent bool t _f True = t bool _t f False = f swap arr ij = do (iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j) writeArray arr i jv writeArray arr j iv parallel fg bg = do m <- newEmptyMVar forkIO (bg >> putMVar m ()) fg >> takeMVar m sort arr left right = when (left < right) $ do pivot <- read right loop pivot left (right - 1) (left - 1) right where read = readArray arr sw = swap arr find n pred i = bool (find n pred (ni)) (return i) . pred i =<< read i move op di pivot = bool (return op) (sw (d op) i >> return (d op)) =<< liftM (/=pivot) (read i) loop pivot oi oj op oq = do i <- find (+1) (const (<pivot)) oi j <- find (subtract 1) (\idx cell -> cell>pivot && idx/=left) oj if i < j then do sw ij p <- move op (+1) i pivot q <- move oq (subtract 1) j pivot loop pivot (i + 1) (j - 1) pq else do sw i right forM_ (zip [left..op-1] [i-1,i-2..]) $ uncurry sw forM_ (zip [right-1,right-2..oq+1] [i+1..]) $ uncurry sw let ni = if left >= op then i + 1 else right + i - oq nj = if right-1 <= oq then i - 1 else left + i - op let thresh = 1024 strat = if nj - left < thresh || right - ni < thresh then (>>) else parallel sort arr left nj `strat` sort arr ni right main = do arr <- newListArray (0, 5) [3,1,7,2,4,8] getElems arr >>= print sort (arr :: IOArray Int Int) 0 5 getElems arr >>= print

This solution also turned out to be bugs. First, it contains a more subtle bug of multithreading [concurrency], which leads to incorrect results only occasionally. Picker fixed this bug in the following code:

 import System.Time import System.Random import Data.Array.IO import Control.Monad import Control.Concurrent import Control.Exception import qualified Data.List as L bool t _ True = t bool _ f False = f swap arr ij = do (iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j) writeArray arr i jv writeArray arr j iv background task = do m <- newEmptyMVar forkIO (task >>= putMVar m) return $ takeMVar m parallel fg bg = do wait <- background bg fg >> wait sort arr left right = when (left < right) $ do pivot <- read right loop pivot left (right - 1) (left - 1) right where read = readArray arr sw = swap arr find n pred i = bool (find n pred (ni)) (return i) . pred i =<< read i move op di pivot = bool (return op) (sw (d op) i >> return (d op)) =<< liftM (/=pivot) (read i) swapRange px x nx y ny = if px x then sw xy >> swapRange px (nx x) nx (ny y) ny else return y loop pivot oi oj op oq = do i <- find (+1) (const (<pivot)) oi j <- find (subtract 1) (\idx cell -> cell>pivot && idx/=left) oj if i < j then do sw ij p <- move op (+1) i pivot q <- move oq (subtract 1) j pivot loop pivot (i + 1) (j - 1) pq else do sw i right nj <- swapRange (<op) left (+1) (i-1) (subtract 1) ni <- swapRange (>oq) (right-1) (subtract 1) (i+1) (+1) let thresh = 1024000 strat = if nj - left < thresh || right - ni < thresh then (>>) else parallel sort arr left nj `strat` sort arr ni right timed act = do TOD beforeSec beforeUSec <- getClockTime x <- act TOD afterSec afterUSec <- getClockTime return (fromIntegral (afterSec - beforeSec) + fromIntegral (afterUSec - beforeUSec) / 1000000000000, x) main = do let n = 1000000 putStrLn "Making rands" arr <- newListArray (0, n-1) =<< replicateM n (randomRIO (0, 1000000) >>= evaluate) elems <- getElems arr putStrLn "Now starting sort" (timing, _) <- timed $ sort (arr :: IOArray Int Int) 0 (n-1) print . (L.sort elems ==) =<< getElems arr putStrLn $ "Sort took " ++ show timing ++ " seconds"

This solution really works on small input arrays, but increasing the size of the array to 1,000,000 elements leads to stack overflow. Two attempts were made to analyze this bug, here and here , but both turned out to be wrong. In fact, this is a bug in the getElems function of the standard Haskell library, which overflows the stack on large arrays.

Oddly enough, fixing a few more bugs, apparently, led to the implementation of the world's first parallel general sorting, written in Haskell. Moreover, the final solution on Haskell is only about 55% slower than the equivalent solution on F #. Be careful, this solution requires the latest version of GHC, which was released a few weeks ago ( comment re: article 2010, so the reader has nothing to worry about ).

First comments on the original article.

Ganesh Sittampalam:

Congratulations on learning how to do fork and synchronize in Haskell!

Jon Harrop (original author):

Congratulations on checking your theory that it will be "trivial" ...

Source: https://habr.com/ru/post/317348/

All Articles

Parallel fast sorting on Haskell and how hard it turned out to write

First comments on the original article.

More articles: