Road to C++ Programmer #11 - Unordered Associative Containers

Last Edited: 10/8/2024

The blog post introduces the associative containers like unordered sets and maps in C++.

C++ Set Map

In the last article, we covered some sequence containers, where elements are stored sequentially. In this article, we will cover unordered associative containers, where elements are not sorted like in traditional associative containers.

Unordered Sets

As the name suggests, an unordered set does not maintain order by default but still enforces uniqueness, meaning all elements must be unique.

#include <unordered_set>
 
int main() {
    
    unordered_set<int> s;
    s.insert(1);
    s.insert(3);
    s.insert(2);
    s.insert(2);
 
    cout << "s:";
    for (auto i: s) {
        cout << " " << i;
    }
    cout << endl;
    // => s: 1 3 2
 
    return 0;
}

As observed, the interface is very similar to set, but the order of elements is not preserved. By sacrificing ordering, unordered sets allow for faster insertions and look ups.

Unordered Maps

As the name suggests, an unordered map is a map without default key ordering.

#include <unordered_map>
 
int main () {
    unordered_map<string, int> m;
    m["a"] = 1;
    m["b"] = 2;
    m.insert(pair<string, int>("c", 3)); // "pair" is an object for key-value pair
 
    cout << "m:" << endl;
    for (auto i: s) {
        cout << " " << i.first << ": " << i.second << endl;
    }
    // => m: 
    //    a: 1
    //    b: 2
    //    c: 3
 
    return 0;
}

As seen, the interface is largely the same as with map, but the ordering of elements is not maintained. This trade-off allows for faster insertions and lookups.

Hash Tables

Since Binary Search Trees (BSTs) prioritize maintaining the order of elements, they are not the best option for implementing unordered sets and unordered maps. Instead, there is an alternative data structure that sacrifices ordering for faster lookups—hash tables. Accessing an array at a specific index is O(1)O(1), but the challenge is that we often don’t know the index of the element we want to look up. Hash tables solve this by using a hash function to convert a key into a number, then performing a modulo operation to map that number to an index.

C++ Hashing

The example above demonstrates a hash table. The hash function uses the ASCII table to convert a string into a number, then performs a modulo operation with a divisor of 11 to determine the index where the string will be stored. When looking up a key, we can pass the key to the same hash function to pinpoint the index where the key is stored. For unordered sets, we only store the keys, and for unordered maps, we store key-value pairs.

Collisions

Ideally, each key would map to a unique index, but in practice, it’s common for two keys to end up with the same index—this is called a collision. One simple way to handle collisions is to create a linked list at each index. When looking up a key, you use the hash function to find the index, then traverse the linked list to locate the element. In the worst case, the time complexity of looking up an element in a hash table is O(n)O(n), but this can be mitigated by using a better hash function and a larger divisor.

#include <iostream>
using namespace std;
 
class Node {
    public:
        int data;
        Node* next;
 
    Node(int data): data(data), next(nullptr) {};
};
 
class HashTable {
    public:
        Node* table[11];
 
        int insert(int data) {
            Node* new_node = new Node(data);
            int index = hash(data);
 
            if (table[index] == nullptr) {
                table[index] = new_node;
                return 1;
            };
 
            Node* p_current = table[index];
            while (p_current != nullptr) {
                if (p_current->data == data) {
                    return 0;
                }
                if (p_current->next == nullptr) {
                    p_current->next = new_node;
                    return 1;
                }
                p_current = p_current->next;
            }
            
            return 0;
        };
 
        void print() {
            for (int i = 0; i < 11; i++) {
                Node* p_current = table[i];
                if (p_current == nullptr) {
                    continue;
                }
                cout << "Index: " << i << ", Value: ";
                while (p_current != nullptr) {
                    cout << " -> " << p_current->data;
                    p_current = p_current->next;
                }
                cout << endl;
            }
        };
 
    HashTable() : divisor(11) {
        for (int i = 0; i < 11; i++) {
                table[i] = nullptr;
            }
    };
 
    ~HashTable() {
        for (int i = 0; i < 11; i++) {
            Node* p_current = table[i];
            while (p_current != nullptr) {
                Node* next_node = p_current->next;
                delete p_current;
                p_current = next_node;
            }
        }
    };
 
    private:
        int divisor;
 
        int hash(int data) {
            return data % divisor;
        };
};
 
int main () {
 
    HashTable ht;
    ht.insert(5);
    ht.insert(10);
    ht.insert(16);
    ht.insert(121);
    ht.print();
    // =>
    // Index: 0, Value:  -> 121
    // Index: 5, Value:  -> 5 -> 16
    // Index: 10, Value:  -> 10
 
    return 0;
}

The example above is a partial implementation of a hash table in C++. For simplicity, the keys are restricted to integers, and the hash function directly performs a modulo operation on the key with a divisor of 11. This demonstrates how unordered sets and maps can be built with hash tables. As a challenge, I recommend implementing search and delete functionality, or creating a class template with different hash functions and collision-handling strategies.

Conclusion

In this article, we introduced unordered associative containers available in the C++ Standard Library. Associative containers store keys in ascending order by default because they are implemented with BSTs, whereas unordered associative containers remove ordering for performance gains, as they are implemented with hash tables. However, it is important to note that hash tables can result in many collisions slowing down the time for insertions and lookups depending on the implementation.

Resources