Road to C++ Programmer #17 - Search Algorithms

Last Edited: 12/18/2024

The blog post introduces various search algorithms in C++.

Search

Searching is one of the most fundamental and significant operations in data structures, allowing us to find specific data within a collection of data. When we open Google and search for a website, log in to the website, or look at our purchase history, searching is involved every step of the way. I have mentioned some search algorithms already, but in this article, we will properly cover some basic search algorithms in arrays and graphs.

Searching in Array

The array is the most elementary data structure that stores a collection of data contiguously, and finding the best search algorithm for an array is critical in improving performance. Here, I introduce 2 search algorithms: Linear Search and Binary Search.

The most intuitive search algorithm is linear search, where we start from the leftmost element and keep moving to the right until we find the specified data. It can be written as a simple for loop that increments the index one by one and checks if the value stored in the index is the same as the specified data. (The reverse direction is also considered linear search.) The time complexity of linear search is O(n)O(n).

Linear search is hard to beat except when the array is sorted. The assumption that the array is sorted gives rise to a more efficient way of searching: binary search. Binary search starts from the middle and checks if the element is smaller or larger than the specified data. If the element is smaller, we can cut the search space into the left half since we already know that the right half is larger than the specified data, and vice versa. We continue cutting the search space in half until we find the specified data.

The binary search tree (BST) and its variants are tree data structures that aim to keep the elements organized so that the left subtree and right subtree are always smaller and larger than the parent, respectively. This ensures that binary search can be performed. As we computed in the article on BST, Road to C++ Programmer #10 - Associative Containers, the time complexity of binary search is O(log(n))O(log(n)), which is better than O(n)O(n). The benefit of using a tree structure is the better time complexity of insertion and deletion (O(log(n))O(log(n))) compared to an array (O(n)O(n)), but it takes more memory, and memory is not guaranteed to be contiguous.

Searching in Graph

A graph is a data structure consisting of vertices and pointers (or edges) pointing to other vertices. A linked list and a tree are special kinds of graphs with certain properties like hierarchy and a strict number of children. As graphs can flexibly represent any relationships (transportation, social networks, chemical structures, etc.), searching in graphs has tremendous significance.

Searching in a graph differs from searching in an array because a graph may have multiple or no paths to reach the destination vertex from the starting vertex. The objective is to find a path or an optimal path by traversing the graph. The multiple edges at each vertice require the algorithm to decide which edge to explore first or determine a strategy for choosing edges. In this article, we will cover two main strategies: Depth-First Search and Breadth-First Search.

The breadth-first search, or BFS for short, investigates all the vertices on the same level—meaning all the vertices that are the same distance from the starting vertex—before moving on to the vertices at the next level. Hence, we first check all the children, then move on to the grandchildren, and great-grandchildren, and so on until we find the destination vertex. We can imagine that this approach is guaranteed to terminate and will follow the shortest path to the destination vertex in a finite graph when visited vertices are appropriately tracked. The following is the implementation of BFS for AdjacencyList.

#include <queue>
#include <unordered_set>
 
using namespace std;
 
int BFS(AdjacencyList al, int start, int goal) {
    queue<int> q;
    unordered_set<int> explored;
    q.push(start);
    explored.insert(start);
    while (!q.empty()) {
        int front = q.front();
        cout << "To explore: " << front << endl;
        if (front == goal) {
            cout << "Found!" << endl;
            return 1;
        }
        for (list<int>::iterator it = al.li[front].begin(); it != al.li[front].end(); ++it) {
            int i = *it;
            if (explored.find(i) == explored.end()) {
                q.push(i);
                explored.insert(i);
            }
        }
        q.pop();
    }
    cout << "Not Found:(" << endl;
    return 0;
};
 
int main() {
    AdjacencyList al(5);
    al.addEdge(0, 1);
    al.addEdge(0, 2);
    al.addEdge(1, 3);
    al.addEdge(2, 4);
    BFS(al, 2, 6); // => Not Found:(
    BFS(al, 0, 4); // => Found!
    return 0;
}

To explore the vertices on the same level before moving to the next, we can utilize a queue that enforces FIFO. In cases where the graph is undirected or cyclic, we need to keep track of all the vertices already visited, which can be done using an unordered set. Since every vertice and every edge are explored in the worst case, the time complexity is O(V+E)O(|V| + |E|), where VV is the number of vertices and EE is the number of edges. The space complexity is O(V)O(V) because both the queue and the unordered set are expected to store all the vertices in the worst case.

Another approach is to explore one branch at a time until the destination vertex is reached. This is called depth-first search, or DFS. We can implement it using a stack instead of a queue to perform DFS. (The implementation of DFS for AdjacencyList is the question for the exercise.) The time and space complexities of DFS are O(V+E)O(|V| + |E|) and O(V)O(V), respectively. However, DFS tends to use less space than BFS because only the vertices on the current path are stored in the stack, whereas BFS stores all the vertices at the previous level in the queue. However, DFS does not guarantee termination or the shortest path to the destination vertex, making BFS the generally preferred option when space is abundant.

Conclusion

We have examined search algorithms in the context of different data structures and the complexities of these algorithms. We observed the impact of choosing a data structure suitable for the specific task at hand, as discussed in previous articles in this series. Even though binary search tends to be faster, maintaining a sorted array or a BST can be computationally or spatially costly. Similarly, BFS is not always preferred, as DFS tends to store fewer vertices in the stack. There are many other graph traversal algorithms suited for various problems, so we will continue exploring other graph traversal algorithms in future articles.

Exercises

From this article, there will be an exercise section where you can test your understanding of the material introduced in the article. I highly recommend solving these questions by yourself after reading the main part of the article. You can click on each question to see its answer.