Road to Haskeller #26 - Parallelism

Last Edited: 8/31/2024

This blog post introduces how we can achieve parallelism with sparks in Haskell.

Haskell & Sparks

Parallelism and Determinism

So far, we have been manually creating threads that are run by Haskell Execution Contexts (HECs) in a non-deterministic order. Because of this non-deterministic parallelism, we needed to set up communication and synchronization with mutexes, channels, semaphores, and STM to ensure that conflicts never occur on any system. However, the number of such variables can quickly grow as the program becomes more complex.

Instead, we can use deterministic parallelism, where we describe which parts of the program can be evaluated in parallel and in what order, so that conflicts cannot arise and no communication or synchronization is needed. We then delegate the task of creating the appropriate threads to the runtime system (RTS), and the program produces the same result regardless of how those threads are scheduled. The unit of computation that we want the RTS to convert into a thread is called a spark, and we can program the strategy for how sparks are spawned in our Haskell code so that the RTS can handle them and convert them into threads appropriately.

Strategies

In the Haskell module Control.Parallel.Strategies, a strategy for spawning sparks is expressed with the type Strategy: a function that takes a value and returns an Eval computation describing how (and whether) to spark and evaluate it.

-- you need to install the module using the command 
-- "cabal install parallel" in the command line
import Control.Parallel.Strategies
 
type Strategy a = a -> Eval a

Eval is a monad so that strategies are composable (we can build larger strategies by combining smaller ones) and so that evaluation is kept clearly separated from side effects like IO, just like STM. Depending on the strategy, the way a value of type a is turned into an Eval a differs. The following are some of the strategies available in the module.

-- spawn a spark so the argument can be evaluated in parallel
rpar :: Strategy a
 
-- evaluate the argument to weak head normal form (WHNF) before continuing
rseq :: Strategy a
 
-- fully evaluate the argument to normal form (NF) before continuing
rdeepseq :: NFData a => Strategy a
 
-- run the Eval computation and extract the result
runEval :: Eval a -> a
 
-- Usage
runEval $ do
    a <- rpar (f x) -- f x can be evaluated in parallel with the rest
    b <- rseq (f y) -- f y is evaluated to WHNF before return
    return (a, b)

While rpar (run in parallel) sparks f x so that it can be evaluated in parallel with everything else, rseq (run sequentially) forces the evaluation of f y to WHNF before the computation continues. As a result, return (a, b) runs only after f y has been evaluated, but not necessarily after f x has. When later code needs a value to be evaluated before it is used, we can use rseq or rdeepseq to force that evaluation. Notice how strategies let us express this without setting up any communication or synchronization inside f.
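
To make the difference concrete, here is a small self-contained sketch. It defines its own fib purely as work for the sparks (the same fib reappears in the examples below): the rpar/rpar version returns the pair immediately, possibly while both calls are still running, while the rpar/rseq version returns only after the second component has been evaluated.

import Control.Parallel.Strategies
 
-- an intentionally slow function, used only to give the sparks real work
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
 
-- both components are sparked: the pair is returned immediately,
-- possibly while both fib calls are still being evaluated in parallel
bothPar :: (Int, Int)
bothPar = runEval $ do
    a <- rpar (fib 30)
    b <- rpar (fib 31)
    return (a, b)
 
-- the first component is sparked, the second is forced to WHNF here,
-- so return happens only after fib 31 has finished
parThenSeq :: (Int, Int)
parThenSeq = runEval $ do
    a <- rpar (fib 30)
    b <- rseq (fib 31)
    return (a, b)
 
main :: IO ()
main = do
    print bothPar
    print parThenSeq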

Examples

Let’s look at some examples of how strategies for spawning sparks can be composed to understand the mechanisms.

rparPair :: Strategy (a, b)
rparPair (a, b) = do
    a' <- rpar a
    b' <- rpar b
    return (a', b')
 
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
 
main :: IO ()
main = do
    let (a, b) = runEval . rparPair $ (fib 20, fib 30)
    putStrLn $ "Tuple: " ++ show (a, b)

The above example illustrates how we can create a new strategy for tuples out of predefined strategies. We can also create a higher-order strategy that takes any pair of strategies and applies them to the components, as shown below.

evalPair :: Strategy a -> Strategy b -> Strategy (a, b)
evalPair sa sb (a, b) = do
    a' <- sa a
    b' <- sb b
    return (a', b')
 
main :: IO ()
main = do
    let (a, b) = runEval $ evalPair rseq rseq (fib 20, fib 30)
    putStrLn $ "Tuple: " ++ show (a, b)

We can make this more readable with the using combinator, where x `using` s is equivalent to runEval (s x).

-- using :: a -> Strategy a -> a
-- x `using` s = runEval (s x)
 
main :: IO ()
main = do
    let (a, b) = (fib 20, fib 30) `using` evalPair rseq rseq 
    putStrLn $ "Tuple: " ++ show (a, b)

When working with lists, we often use map to apply a function to every element and build a new list. By default, the elements of the resulting list are evaluated sequentially, on demand. Using parBuffer, however, the elements are sparked and evaluated in parallel, with a rolling buffer of n sparks kept ahead of the point where the list is being consumed.

parBuffer :: Int -> Strategy a -> Strategy [a]
 
doubleList :: Num a => [a] -> [a]
doubleList xs = map (\x -> 2*x) xs `using` parBuffer 100 rseq
-- rseq evaluates each sparked element to WHNF
-- `using` parBuffer 100 rseq is the only part needed for parallelization!!
 
main :: IO ()
main = do
    let xs = doubleList [1..1000]
    let total = foldl' (+) 0 xs -- needs import Data.List (foldl')
    putStrLn $ "Sum: " ++ show total

There are many other strategy combinators in the module, so I recommend checking out the official documentation if you are interested. At first, it might not seem intuitive, but as you continue experimenting with it you will begin to understand the concept and appreciate how simple it makes achieving parallelism.
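
Two of those combinators, as a quick sketch reusing the fib from the earlier examples: parList applies a strategy to every element of a list in parallel, and parMap combines that with map.

-- parList :: Strategy a -> Strategy [a]
-- parMap  :: Strategy b -> (a -> b) -> [a] -> [b]
 
fibsA :: [Int]
fibsA = map fib [20..30] `using` parList rseq
 
fibsB :: [Int]
fibsB = parMap rseq fib [20..30]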

Spark Lifecycle

Now that we understand how to set up strategies for spawning sparks, let's look at how the sparks spawned by our program are handled and converted into threads by the RTS. First, the RTS decides whether a spark requested by our program is worth creating at all. A spark is not worth creating if its expression has already been evaluated to WHNF or if the number of sparks waiting for evaluation in the spark pool exceeds the limit; such sparks are called dud and overflowed, respectively, and are never created.

[Diagram: Spark Lifecycle]

After a spark is created successfully, the RTS tries to convert it into a thread. Before doing so, it checks whether the sparked expression has already been evaluated somewhere else or is no longer needed at all, given laziness. Sparks in those situations are called fizzled; they are garbage collected (GC'd) and will not be converted. Once a spark is converted successfully, a HEC runs it and produces the evaluated result. The diagram above summarizes the lifecycle of sparks.
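
To see how the sparks of your own program fare, you can ask the RTS for statistics. A rough recipe (the -N4 value is just an example):

-- run a build made with -threaded and -rtsopts, with statistics enabled:
--   ./Main +RTS -N4 -s
-- the output includes a SPARKS line reporting how many sparks were
-- converted, overflowed, dud, GC'd, and fizzled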

Disadvantage of Sparks

While it may be tempting to always reach for a spark-spawning strategy because of its simplicity and ease of use, there are some disadvantages. Because sparks are so easy to spawn, you may end up creating so many that most of them become overflowed, fizzled, or GC'd. This can make the program significantly slower than a single-threaded program.
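
A common way to keep the number of sparks reasonable is to coarsen the granularity, for example by sparking whole chunks of a list instead of individual elements. A minimal sketch using parListChunk from the same module (the chunk size 100 is arbitrary):

-- one spark per chunk of 100 elements, instead of one per element
doubleListChunked :: Num a => [a] -> [a]
doubleListChunked xs = map (\x -> 2*x) xs `using` parListChunk 100 rseq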

Additionally, having the RTS convert a spark into a thread takes more time (~20ms) than manually creating a thread, so in certain cases you are essentially trading speed for convenience. To choose the best approach to parallelism for your particular scenario, you should carefully implement and measure the candidate solutions in the production environment.

Exercises

This is an exercise section where you can test your understanding of the material introduced in the article. I highly recommend solving these questions by yourself after reading the main part of the article. You can click on each question to see its answer.

Resources