Road to C++ Programmer #15

In the article on container adaptors, Road to C++ Programmer #12 - Container Adaptors, we briefly touched on the concept of the heap data structure. Now that we have covered various tree structures, we are ready to discuss heaps, which are used to implement priority queues. (The heap data structure is unrelated to the heap, the region of memory used for dynamic allocation.)

Heaps

A simple BST requires every node in the left subtree to be smaller, and every node in the right subtree to be larger, than the parent node. This speeds up search by narrowing down the candidate nodes with each comparison. However, sometimes you only need the minimum or maximum value, as in a priority queue. In such cases, we can use a min heap or a max heap, which lets us access the minimum or maximum element in $O(1)$ time.
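For comparison, the standard library's std::priority_queue adaptor from the container adaptors article is itself backed by a heap: a max heap by default, or a min heap when given std::greater<int> as its comparator. The sketch below uses a hypothetical helper name, popAllFromMinHeap, to show the $O(1)$ top() access and the ascending order produced by repeated pops.

```cpp
#include <functional>  // std::greater
#include <queue>       // std::priority_queue
#include <vector>

// Hypothetical helper: pushes every value into a standard-library min heap,
// then pops them all. std::priority_queue is a max heap by default; passing
// std::greater<int> as the comparator puts the smallest element on top.
std::vector<int> popAllFromMinHeap(const std::vector<int>& values) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;
    for (int v : values) {
        minHeap.push(v);  // O(log n) per push
    }

    std::vector<int> out;
    while (!minHeap.empty()) {
        out.push_back(minHeap.top());  // top() reads the minimum in O(1)
        minHeap.pop();                 // pop() removes it in O(log n)
    }
    return out;
}
```

For example, popAllFromMinHeap({5, 3, 8, 1, 4}) returns {1, 3, 4, 5, 8}.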

For a binary tree to qualify as a heap, it must follow two rules. First, each parent node must be smaller than or equal to (for a min heap) or larger than or equal to (for a max heap) both of its children, so that the root node always holds the minimum or maximum element. Second, the tree must be complete: every level except possibly the last is completely filled, and the nodes on the last level are filled from left to right. New elements are therefore always added at the next free position on the lowest level, which keeps the tree balanced with a height of $O(\log(n))$.

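#include <algorithm>  // std::copy
#include <iostream>   // std::cout, std::endl
#include <utility>    // std::swap
#include <vector>
using namespace std;

class MinHeap {
public:
    int size;          // number of elements currently stored
    int capacity;      // maximum number of elements
    vector<int> heap;  // array representation of the complete binary tree

    MinHeap(int capacity) : size(0), capacity(capacity) {
        heap.resize(capacity);
    }

    // Index helpers for a 0-based array layout
    int parent(int i) { return (i - 1) / 2; }
    int left(int i) { return 2 * i + 1; }
    int right(int i) { return 2 * i + 2; }

    // Operations defined out of class in the sections below
    void insert(int n);
    int extractMin();
    void heapify(int i);
    void fromArray(int arr[], int arr_size);
};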

The code above is the skeleton of a min heap. Instead of linking individual nodes with pointers, we typically store a heap in a vector. This works because a heap is always a complete binary tree: since nodes on the same level need no particular ordering and are simply added from left to right, the nodes can be laid out level by level with no gaps, and the minimum (or maximum) always sits at index 0. While it is possible to store a BST in a vector, an unbalanced BST would leave many empty slots. The parent, left, and right methods return the indices of the parent, left child, and right child of the node at index i, respectively. For example, the children of index 1 are at indices 3 and 4, and the parent of both is index 1.
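To make the index arithmetic concrete, here is a small sketch of a hypothetical helper, isMinHeap, that checks the min-heap property of a plain std::vector<int> laid out the same way (root at index 0, parent of index i at (i - 1) / 2). It is not part of the MinHeap class; it simply compares every non-root element with its parent.

```cpp
#include <cstddef>  // std::size_t
#include <vector>

// Hypothetical helper: returns true if `a` satisfies the min-heap property
// under the 0-based array layout, where the parent of index i (i > 0)
// is at index (i - 1) / 2.
bool isMinHeap(const std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        std::size_t parent = (i - 1) / 2;
        if (a[parent] > a[i]) {
            return false;  // this child is smaller than its parent
        }
    }
    return true;  // every child is >= its parent (empty and 1-element vectors pass)
}
```

For instance, {1, 3, 2, 7, 4} passes, while {1, 3, 2, 0} fails because the 0 at index 3 is smaller than its parent, the 3 at index 1.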

Insertion

When inserting an element into the heap, we must consider two cases: the heap has already reached its capacity, or the new element violates the heap property. In the first case, the implementation below prints an error message and returns without inserting (a growable heap could instead enlarge the vector). In the second case, we repeatedly swap the new element with its parent until it reaches the correct position, a process often called sifting up. The following is the implementation of insertion for a min heap.

void MinHeap::insert(int n) {
    // Case 1: the heap is full
    if (size == capacity) {
        cout << "Heap is full. Please allocate more memory." << endl;
        return;
    }

    // Insert the new element at the first free slot (the end of the array)
    int ind = size;
    heap[ind] = n;
    size++;

    // Case 2: sift up, swapping with the parent while the parent is larger
    while (ind != 0 && heap[parent(ind)] > heap[ind]) {
        swap(heap[ind], heap[parent(ind)]); // std::swap
        ind = parent(ind);
    }
}

As shown, insertion is relatively simple for heaps. Since the while loop may move the new element all the way from the lowest level up to the root, the time complexity of insertion scales with the height of the heap, which is $O(\log(n))$ because the tree is complete.
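The sketch below isolates the sift-up step so it can be traced by hand. SiftUpHeap is a hypothetical, stripped-down class that assumes a growable std::vector with push_back instead of the fixed capacity used by MinHeap, so it has no "heap is full" case.

```cpp
#include <cstddef>  // std::size_t
#include <utility>  // std::swap
#include <vector>

// Hypothetical minimal min heap showing only insertion (sift-up).
// Assumes a growable vector rather than a fixed capacity.
class SiftUpHeap {
public:
    std::vector<int> heap;

    void insert(int n) {
        heap.push_back(n);  // place the new element at the end
        std::size_t ind = heap.size() - 1;
        // Swap with the parent while the parent is larger than the new element.
        while (ind != 0 && heap[(ind - 1) / 2] > heap[ind]) {
            std::swap(heap[ind], heap[(ind - 1) / 2]);
            ind = (ind - 1) / 2;  // move up one level
        }
    }
};
```

Inserting 5, 3, 8, and 1 in that order leaves the array as {1, 3, 8, 5}: the 1 starts at index 3, swaps with the 5 at index 1, then swaps with the 3 at the root.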

Deletion

Deletion in the context of heaps usually refers to extracting the root node, as heaps are often used to implement priority queues, where the highest-priority element is removed. When removing the root, we replace it with the last element in the heap to maintain the tree’s shape. However, this replacement may violate the heap property, which we fix by calling a heapify function (explained shortly). Below is the extractMin function, which removes the root node.

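int MinHeap::extractMin() {
    // Empty heap: report the error and return a sentinel value
    if (size == 0) {
        cout << "This heap is currently empty." << endl;
        return -1;
    }

    // Single element: nothing to restore after removing it
    if (size == 1) {
        size--;
        return heap[0];
    }

    // Move the last element to the root, shrink, then restore the heap property
    int root = heap[0];
    heap[0] = heap[size - 1];
    size--;
    heapify(0);

    return root;
}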

If the heap is empty, we print an error message and return -1. Note that -1 is only a sentinel: the caller cannot distinguish it from a -1 actually stored in the heap. If there is only one element, we decrement the size and return the root. Otherwise, we replace the root with the last element and call heapify on the root to restore the heap property. The heapify function is implemented below.

void MinHeap::heapify(int i) {
    int l = left(i);
    int r = right(i);
    int smallest = i;

    // Identify the smallest among node i and its left and right children
    if (l < size && heap[l] < heap[smallest]) {
        smallest = l;
    }
    if (r < size && heap[r] < heap[smallest]) {
        smallest = r;
    }

    // If node i is not the smallest, swap it with the smallest child and keep heapifying from there
    if (smallest != i) {
        swap(heap[i], heap[smallest]);
        heapify(smallest);
    }
}

The heapify function is also straightforward: it compares the node at index i with its children and, if either child is smaller, swaps the node with the smaller child. If the node is already smaller than or equal to both children, the heap property is satisfied. Otherwise, we recursively apply heapify at the child's position until the violation is resolved. Since each recursive call moves one level down the tree, heapify performs at most as many swaps as the height of the tree, so both heapify and extractMin have a time complexity of $O(\log(n))$.
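Because the recursive call in heapify is the last thing it does, the same logic can also be written as a loop. The sketch below is such an iterative variant (a swapped-in technique, not the article's recursive version) on a std::vector<int>, together with a hypothetical free-function extractMin that uses back() and pop_back() in place of the size counter. It assumes the vector is non-empty whenever extractMin is called.

```cpp
#include <cstddef>  // std::size_t
#include <utility>  // std::swap
#include <vector>

// Iterative sift-down: same comparisons as the recursive heapify,
// but loops instead of recursing, so it needs no extra stack frames.
void siftDown(std::vector<int>& heap, std::size_t size, std::size_t i) {
    while (true) {
        std::size_t l = 2 * i + 1;
        std::size_t r = 2 * i + 2;
        std::size_t smallest = i;
        if (l < size && heap[l] < heap[smallest]) smallest = l;
        if (r < size && heap[r] < heap[smallest]) smallest = r;
        if (smallest == i) return;  // heap property restored
        std::swap(heap[i], heap[smallest]);
        i = smallest;  // continue one level down
    }
}

// Hypothetical free-function extractMin: removes and returns the minimum.
// Precondition (assumed): `heap` is a non-empty min heap.
int extractMin(std::vector<int>& heap) {
    int root = heap[0];
    heap[0] = heap.back();  // move the last element to the root
    heap.pop_back();
    if (!heap.empty()) {
        siftDown(heap, heap.size(), 0);
    }
    return root;
}
```

Starting from {1, 3, 8, 5}, calling extractMin four times returns 1, 3, 5, and 8, in that order.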

Array to Heap

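In some scenarios, we may want to build a heap directly from an existing array by rearranging its elements in place rather than inserting them one by one. To do this, we call heapify on every non-leaf node in reverse order, starting from the last non-leaf node and ending at the root. Leaf nodes can be skipped because a single node trivially satisfies the heap property.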

void MinHeap::fromArray(int arr[], int arr_size) {
    if (arr_size > capacity) {
        cout << "Size is bigger than the capacity." << endl;
        return;
    }
    size = arr_size;
    copy(arr, arr + arr_size, heap.begin()); // std::copy

    // Heapify every non-leaf node, from the last one back to the root
    int lst = arr_size / 2 - 1; // last non-leaf node; -1 if there is none
    for (int i = lst; i >= 0; i--) {
        heapify(i);
    }
}

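The code above builds a heap from an array. The fromArray function uses std::copy to copy the array's contents into the heap, then heapifies all non-leaf nodes in reverse order. Although a single heapify call can cost up to $O(\log(n))$, the total time is $O(n)$: about half of the nodes are leaves and need no work, and most of the remaining nodes sit near the bottom of the tree, where heapify travels only a few levels. More precisely, there are at most $\lceil n / 2^{h+1} \rceil$ nodes at height $h$, each costing $O(h)$, so the total is $\sum_{h=0}^{\lfloor \log_2(n) \rfloor} \lceil n / 2^{h+1} \rceil \, O(h) = O(n)$.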
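The following sketch applies the same bottom-up procedure to a std::vector<int> (an assumption in place of the raw array and fixed capacity), using hypothetical free functions siftDownAt and buildMinHeap that mirror heapify and fromArray.

```cpp
#include <utility>  // std::swap
#include <vector>

// Recursive sift-down over the first n elements of `a`, mirroring heapify.
void siftDownAt(std::vector<int>& a, int n, int i) {
    int l = 2 * i + 1;
    int r = 2 * i + 2;
    int smallest = i;
    if (l < n && a[l] < a[smallest]) smallest = l;
    if (r < n && a[r] < a[smallest]) smallest = r;
    if (smallest != i) {
        std::swap(a[i], a[smallest]);
        siftDownAt(a, n, smallest);
    }
}

// Bottom-up heap construction, mirroring fromArray: O(n) overall.
void buildMinHeap(std::vector<int>& a) {
    int n = static_cast<int>(a.size());
    // Start at the last non-leaf node (n / 2 - 1) and work back to the root;
    // for n < 2 the loop body never runs.
    for (int i = n / 2 - 1; i >= 0; --i) {
        siftDownAt(a, n, i);
    }
}
```

On {9, 4, 7, 1, 2, 6, 3}, this calls siftDownAt on indices 2, 1, and 0 and produces {1, 2, 3, 4, 9, 6, 7}. The result is a valid min heap, but not a sorted array.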

Heap Sort

The property that gives a heap $O(1)$ access to its minimum or maximum value is useful not only for implementing a priority queue but also for sorting arrays efficiently. In the article Road to C Programmer #12 - Sorting Algorithms, we introduced various sorting algorithms and their complexities, including selection sort, which repeatedly selects the minimum element from the unsorted right subarray and appends it to the sorted left subarray. Selection sort runs in $O(n^2)$ time because it performs a linear search over up to $n$ elements, $n$ times.

Instead of a linear search, we can convert the array into a min heap (using the fromArray method) and repeatedly extract the minimum element (using the extractMin method) in $O(\log(n))$ time, writing each extracted element to the next position of the sorted output. This approach is called heap sort. Building the heap takes $O(n)$ and the $n$ extractions take $O(n \log(n))$, so the overall time complexity is $O(n \log(n))$. This matches the average case of quicksort, and unlike quicksort, whose worst case is $O(n^2)$, heap sort stays $O(n \log(n))$ even in the worst case.
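The C++ standard library already ships heap algorithms in <algorithm>, so heap sort can be sketched in two calls. Note that this version uses a max heap (the default with operator<) rather than a min heap: std::make_heap builds the heap in $O(n)$, and std::sort_heap repeatedly moves the current maximum to the end of the shrinking heap range, so the sorted result grows from the back of the same vector in $O(n \log(n))$. The function name stdHeapSort is hypothetical.

```cpp
#include <algorithm>  // std::make_heap, std::sort_heap
#include <vector>

// Hypothetical wrapper: in-place heap sort using standard heap algorithms.
void stdHeapSort(std::vector<int>& v) {
    // Rearrange v into a max heap (default comparator is operator<): O(n).
    std::make_heap(v.begin(), v.end());
    // Repeatedly move the current maximum to the end of the heap range,
    // leaving v sorted in ascending order: O(n log n).
    std::sort_heap(v.begin(), v.end());
}
```

For example, {5, 2, 9, 1, 5, 6} becomes {1, 2, 5, 5, 6, 9}.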

Conclusion

The heap achieves a time complexity of $O(\log(n))$ for both insertion and deletion and $O(1)$ for accessing the minimum or maximum value, all with relatively simple rules, which makes it well suited to applications like priority queues and heap sort. As a challenge, I recommend implementing a max heap on your own.

Exercises

Starting with this article, each article includes an exercise section where you can test your understanding of the material it introduces. I highly recommend solving these questions on your own after reading the main part of the article.

Q1: Define the printHeap method that prints out the hierarchy of elements in the heap.

Q2: Define the heapSort function that performs heap sort on an array.

Resources

CoffeeBeforeArch. 2019. C++ Data Structures: Min-Heaps. YouTube.
Inside code. 2021. Heaps, heapsort, and priority queues - Inside code. YouTube.