Road to C Programmer #8

Memory Fragmentation in Heap

While the stack can guard against memory fragmentation by knowing the types of variables and when they will be deleted, the heap cannot do so because we are not sure when dynamically allocated memory will be freed up. This can make it difficult to fit large data contiguously. So the question becomes: how can we store a large array in a fragmented heap?

Linked List

The solution is a linked list. We can assign a pointer pointing to the next value of an array to each element of an array so that we can fit data separately in a fragmented heap. When we want to move to the next element, we simply move to the address that the current element is pointing to, instead of moving to the adjacent address. We can program this using a struct.

typedef struct {
    int data;
    struct Node *next;
} Node;
 
Node *head = NULL;

We call an individual element of a linked list a Node. When we define a linked list like this, we can store each node in a separate place.

Insertion and Deletion

Not only that the linked list can solve the problem of fragmented memory in heap, but it also excels at insertion and deletion of an element compared to an array.

When inserting an element in the middle of an array, you need to shift every element to the right of the index where we want to insert the new element. Similarly, when deleting an element in the middle of an array, you need to shift every element to the right of the index where we want to delete the element. This is not a big deal when the array is small, but it starts to become problematic as the array grows larger.

However, when we want to insert an element into a linked list, you just need to take the node at that index, make it point to a new node, and make the new node point to where the original node was previously pointing. Similarly, when deleting a node, we just need to change the node previously pointing to the node we want to delete to point to the next node. There is no shifting of elements involved in a linked list, which makes it a superior data structure.

Code Implementation

Let's start with printing the linked list.

// Display a linked list
void printLinkedList (Node *head) {
    Node *current = head;
    while (current != NULL) {
        printf("%d -> ", current -> data);
        current = current -> next;
    }
    printf("\n");
    return;
}

You have to traverse the linked list using pointers as shown above. We can set up a function for prepending a node to the head of the list like this:

// Prepend a node to a linked list
Node *prependNode(Node *head, int data) {
    Node *newNode = malloc(sizeof(Node));
 
    // Handle malloc failure
    if (newNode == NULL) {
        return NULL;
    }
    newNode->data = data;
 
    if (head == NULL) {
        head = newNode;
        newNode->next = NULL;
    } else {
        newNode->next = head;
        head = newNode;
    }
 
    return head;
}

We can also write functions for insertion and deletion, as discussed earlier.

// Insert a node into a linked list
int insertNode(Node *head, int data, int pos) {
    // Set up a new node
    Node *newNode = malloc(sizeof(Node));
    if (newNode == NULL) {
        return 0;
    }
    newNode->data = data;
 
    // Traverse the list until we find the specified position
    Node *current = head;
    while (current != NULL && pos != 1) {
        current = current->next;
        pos--;
    }
 
    // If the specified position was not found, return 0
    if (pos != 1 || current == NULL) {
        free(newNode);
        return 0;
    }
 
    // Change the pointers
    newNode->next = current->next;
    current->next = newNode;
    return 1;
}
 
// Remove a node from a linked list
int removeNode(Node **head, int pos) {
    Node *current = *head;
    Node *prev = NULL;
 
    // If removing head, set head to the next node
    if (pos == 0) {
        if (current == NULL) {
            return 0;
        }
        *head = current->next;
        free(current);
        return 1;
    }
 
    // Traverse the list until we find the specified position
    while (current != NULL && pos != 0) {
        prev = current;
        current = current->next;
        pos--;
    }
 
    // If the specified position was not found, return 0
    if (pos != 0 || current == NULL) {
        return 0;
    }
 
    // Set the pointer of the previous node to the next node
    prev->next = current->next;
    free(current);
    return 1;
}

Finally, we must not forget to set up the cleanup function.

void freeNodes(Node *head) {
    Node *current = head;
    while (current != NULL) {
        Node *cleaned = current;
        current = current->next;
        free(cleaned);
    }
    return;
}

Always Be Critical

If you read the above and thought a linked list is a great data structure, you might want to be more critical while reading something. When I told you a linked list is superior to an array in certain situations, I intentionally left out an important and obvious disadvantage of a linked list: it needs significantly more memory to store data because of pointers, and I did not use any measurement to prove that it is faster to perform insertion and deletion.

In reality, a linked list tends to be significantly slower than an array because a linked list requires us to perform a linear search when traversing for insertion and deletion, while an array doesn't. Bjarne Stroustrup, inventor of C++, points out that the linear search for linked lists dominates completely and is much worse than shifting elements of an array. Moreover, because a linked list can be stored in a fragmented heap, you have a higher risk of a cache miss, which can make any operation slower.

ArrayList

Given that, we can create a new structure called ArrayList, which stores a pointer to the array and information like the length and available space for the array, allowing it to reallocate when capacity is reached. This allows us to store the array contiguously and increase the chance of a cache hit while reallocating when necessary to avoid overwriting memory that it shouldn't write to.

typedef struct {
    int *items;
    int capacity;
    int length;
} ArrayList;

Having access to the length is great as we can avoid accessing values out of range, and we often want to know the length of the array for iterations. This was not possible with an array by itself.

Again, Be Critical

The ArrayList is better theoretically, but it does not mean that it works better in all cases. There is no silver bullet in this world. There will be some cases where a linked list is more performant. When you work on real projects, make sure you benchmark multiple solutions and choose the right data structure with sufficient evidence.

[Fun Fact] Arrays in Python

In scripting languages like Python and JavaScript, instead of compiling the code into machine code, they run scripts interpreters to simulate the desired behavior written in the scripts. Hence, when you declare a list in a script, the interpreters will generate a data structure behind the scenes to simulate the behavior of lists.

In Python, when you declare a list, it generates an array of object pointers instead of creating an array or array list. Hence, when you access an element of a list, it first looks for the pointer stored in that index and prints out the object it is pointing to. This is why a list in Python can store objects of different types. JavaScript has a very interesting way of simulating a list, but that is something for a future article.

Exercises

From this article, there will be an exercise section where you can test your understanding of the material introduced in the article. I highly recommend solving these questions by yourself after reading the main part of the article. You can click on each question to see its answer.

Resources

Alessandro Stamatto. 2012. Bjarne Stroustrup: Why you should avoid Linked Lists. YouTube.
Core Dumped. 2024. ARRAYLIST VS LINKEDLIST. YouTube.

Q1: Define a function push that pushes an element to an ArrayList.