Road to C Programmer #13 - Multithreading

Last Edited: 8/19/2024

This blog introduces the basics of multithreading in C.

C Multithreading

In English, the words "concurrency" and "parallelism" are often used interchangeably, but they mean different things in the programming world, which can confuse many beginner programmers. I was one of those programmers until I heard a beautiful quote from Rob Pike (2012) that perfectly summarizes the difference between them:

Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.

Parallel execution requires multiple processors (or cores) running parts of a program simultaneously, whereas concurrent execution only requires that multiple parts of a program be in progress at once, whether on one processor or several. The importance of this distinction and the benefits of each will hopefully become clearer as we discuss how to accomplish them.

Multithreading

To accomplish either concurrent or parallel programming, programmers need to consider how to split a program's work into multiple threads of execution and merge the results from those threads. This is what multithreaded programming is about: coming up with ways to create multiple threads and orchestrate them to successfully execute a program, allowing it to run either concurrently or in parallel (depending on the program and resources available). Let’s look at the simplest example of multithreading in C.

print.c
#include <stdio.h>
#include <pthread.h>
 
void *print (void *input) {
    int *value = (int *) input;
    for (int i = 0; i < 1000000000; i++) {}  /* busy loop to simulate work */
    printf("%d\n", *value);
    return NULL;
}
 
int main () {
    pthread_t thread1;
    pthread_t thread2;
 
    int value1 = 3;
    int value2 = 5;
 
    pthread_create(&thread1, NULL, print, (void *) &value1);
    pthread_create(&thread2, NULL, print, (void *) &value2);
 
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
 
    return 0;
}

The above code creates two threads, thread1 and thread2, that run the print function via the pthread_create function, and joins the threads back to the main thread after execution using the pthread_join function. The pthread_create function takes four arguments: a pointer to a pthread_t that receives the new thread's ID, a pointer to thread attributes (such as the stack size and address for the thread; NULL means default attributes), the function to run in the thread, and a single void * argument passed to that function.

By creating both threads before joining either of them, the two threads can run in parallel if the computational resources are available. When I measured the time for the execution of the above program by compiling it with gcc -o print print.c -pthread and using time ./print, I got around 1.3 seconds. You should try this yourself as well. You can see the improvement in speed by measuring the time of execution for the code below.

print.c
int main () {
    pthread_t thread1;
    pthread_t thread2;
 
    int value1 = 3;
    int value2 = 5;
 
    pthread_create(&thread1, NULL, print, (void *) &value1);
    pthread_join(thread1, NULL);
 
    pthread_create(&thread2, NULL, print, (void *) &value2);
    pthread_join(thread2, NULL);
 
    return 0;
}

The only change from the previous code is the timing of joining. Because the first thread is joined before the second is created, the two threads run one after another rather than in parallel. I got around 2 seconds when measuring the execution of the above. Again, you should try it yourself to see the impact of parallel execution.

Race Condition

There are times when we want to manipulate the same variable across multiple threads, as in the following example.

#include <stdio.h>
#include <pthread.h>
 
int value;
 
void *increment (void *arg) {
    for (int i = 0; i < 1000000000; i++) {
        value += i;  /* unsynchronized read-modify-write on a shared variable */
    }
    return NULL;
}
 
int main () {
    pthread_t thread1;
    pthread_t thread2;
 
    value = 0;
 
    pthread_create(&thread1, NULL, increment, NULL);
    pthread_create(&thread2, NULL, increment, NULL);
 
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
 
    printf("Result: %d\n", value);
 
    return 0;
}

Each thread adds i to value 1 billion times, which should result in 1808348672 after the execution of both threads if every increment succeeded (the true sum overflows a 32-bit int and wraps around to this value). However, running this will print an unpredictable number smaller than 1808348672. The reason is that both threads read, increment, and store value at the same time, which can cause one thread to overwrite the increment made by the other.

Let’s walk through a concrete example of what this means. Suppose value is 0 and both threads read it at the same time, each getting 0. Thread 1 happens to be faster and successfully stores 1 back into value. However, thread 2 does not know that thread 1 has already incremented value from 0 to 1, so it also increments its stale 0 to 1 and stores that result, instead of incrementing value from 1 to 2. Thread 1’s update is lost.

When the correctness of a program depends on the timing of its execution, we say that a race condition has occurred. The above is a typical example of a race condition, where multiple threads access the same variable without synchronization.

Mutex

One way of dealing with a race condition is to use a mutex (short for mutual exclusion), which allows us to restrict an operation to one thread at a time. Conceptually, a mutex works like a key: the thread that grabs it first locks the other threads out of the critical section (the code between the lock and the unlock) until it finishes and releases the lock. By doing this, we can ensure that the shared variable is updated by one thread at a time.

#include <stdio.h>
#include <pthread.h>
 
int value;
 
pthread_mutex_t mutex;
 
void *increment (void *arg) {
    for (int i = 0; i < 1000000000; i++) {
        pthread_mutex_lock(&mutex);    /* only one thread at a time past this point */
        value += i;
        pthread_mutex_unlock(&mutex);  /* release so the other thread can proceed */
    }
    return NULL;
}
 
int main () {
    pthread_t thread1;
    pthread_t thread2;
 
    value = 0;
 
    pthread_mutex_init(&mutex, NULL);
 
    pthread_create(&thread1, NULL, increment, NULL);
    pthread_create(&thread2, NULL, increment, NULL);
 
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
 
    pthread_mutex_destroy(&mutex);
 
    printf("Result: %d\n", value);
 
    return 0;
}

We create a pthread_mutex_t variable, which we initialize before and destroy after the execution of the threads. Then, we place pthread_mutex_lock and pthread_mutex_unlock where we want each thread to enter and leave the critical section. When you run the above, you are guaranteed to get 1808348672 every time.

However, you might also notice that the algorithm is now significantly slower. This is because the increments are fully serialized, and there are an additional 2 billion lock and unlock operations on top of the actual work. (It is even slower than just running the for loop twice without multithreading. You can test that out yourself.) So, what’s the point of all this? If you think carefully about the problem, you might notice that addition is associative and commutative, meaning the order and grouping of the additions do not matter. Hence, we can accumulate into a local variable in each thread simultaneously, and then update the shared variable one thread at a time.

void *increment (void *arg) {
    int local = 0;
    for (int i = 0; i < 1000000000; i++) {
        local += i;  /* no lock needed: local is private to this thread */
    }
    pthread_mutex_lock(&mutex);
    value += local;  /* a single locked update per thread */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

On my computer, the previous algorithm took a whopping 24 seconds to run, while the updated version took only around 1.3 seconds. (Two for loops without multithreading took around 4 seconds.) Hence, multithreading is only great if you carefully analyze your program and apply the techniques appropriately. Also, there are many other facilities around multithreading for handling different kinds of communication between threads (condition variables, barriers, detached threads, semaphores, etc.), which I might cover in future articles. (I highly recommend the video series from CodeVault, Unix Threads in C, if you are interested.)

Is Non-Parallel Concurrency Always Bad?

The above example illustrates the importance of distinguishing between parallel and non-parallel concurrent executions, as one can run faster by utilizing multiple processors while the other cannot and may even be slower than a single-threaded program. Should non-parallel concurrent execution then be avoided at all costs? I’m not an expert, but as far as I know, there are cases where concurrency is still valuable even without parallelism, such as in networking and logging, where the task inherently involves waiting on or monitoring events in real time. It is also essential for operating systems, which need to run multiple applications at the same time with limited computational resources.

[NOTE]: I covered semaphores and deadlocks in the article Road to Haskeller #24 - Semaphores. If you are following the Haskell series and want to learn more, I recommend checking it out.

Resources