Road to C Programmer #9 - Buffer Overflow

Null-Terminated Strings

When we write a string literal in C, the compiler automatically appends a null character, \0 (a byte with the value 0), to mark the end of the string. Hence, to store 5 characters, we need a buffer of size 6.

int main () {
    char buffer[6];
 
    buffer[0] = '1'; // single quotes: a char, not a string
    buffer[1] = '2';
    buffer[2] = '3';
    buffer[3] = '4';
    buffer[4] = '5';
    buffer[5] = '\0'; // or 0
 
    return 0;
}

This is a compact way of implementing strings because the length does not have to be stored alongside the characters (a length field typically takes 4 or 8 bytes). However, this is also where a significant security vulnerability comes from.
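To make the extra byte visible, here is a minimal sketch comparing the storage a string literal needs with the length that strlen reports. The helper names literal_storage and literal_length are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes of storage used by the literal "12345". */
size_t literal_storage(void) {
    char text[] = "12345";      /* the compiler sizes the array for 5 chars + '\0' */
    assert(text[5] == '\0');    /* the terminator sits right after the last char */
    return sizeof text;         /* counts the terminator */
}

/* Hypothetical helper: length of the same literal as seen by strlen. */
size_t literal_length(void) {
    char text[] = "12345";
    return strlen(text);        /* stops at '\0' and does not count it */
}
```

Calling literal_storage() gives 6 while literal_length() gives 5, which is exactly the one-byte difference described above.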

Buffer Overflow

Let's see what happens if we forget to add the null terminator.

#include <stdio.h>
#include <string.h>
 
int main() {
    char buffer[6];
 
    buffer[0] = '1';
    buffer[1] = '2';
    buffer[2] = '3';
    buffer[3] = '4';
    buffer[4] = '5';
    // buffer[5] = '\0'; // or 0
 
    printf("len: %zu", strlen(buffer)); // does not work properly: no \0 in buffer
 
    return 0;
}

The strlen call above does not work properly: it keeps reading until it finds \0, which was never written into the buffer, so it may read past the end of the array (undefined behavior). A similar problem occurs when we copy a larger string into a buffer with strcpy. It keeps copying characters until it sees \0 in the source, so memory outside of the buffer gets overwritten. This is called a buffer overflow, and attackers can exploit it for all kinds of malicious activity.
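One way to avoid running off the end is to search for the terminator only within the known size of the buffer. The sketch below uses the standard memchr function for that; the helper names is_terminated and bounded_length are assumptions made for this example, not standard library functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a '\0' exists within the first `size` bytes. */
bool is_terminated(const char *buf, size_t size) {
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;  /* never looks past buf[size - 1] */
}

/* Hypothetical helper: string length if a terminator is found, else `size`. */
size_t bounded_length(const char *buf, size_t size) {
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', size);
    return end ? (size_t)(end - buf) : size;
}
```

For a fully initialized 5-byte array holding '1' through '5' with no terminator, is_terminated returns false and bounded_length returns 5, without ever reading outside the array.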

Buffer Overflow Attack

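Let's say our main function takes a string as user input, like the following:

#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[]) {
    // argc : Argument count
    // argv : Argument values (argv[1] is the first user-supplied argument)
    if (argc < 2) return 1; // no input given
 
    char buffer[500];
    strcpy(buffer, argv[1]); // vulnerable: the length of argv[1] is never checked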
    printf("%s", buffer);
 
    return 0;
}

It simply takes a string from the user, copies it into a buffer of size 500, and prints that buffer. Because strcpy never checks the size of the destination, passing more characters than the buffer can hold overwrites the memory next to it, which can include the function's return address. Hence, an attacker can inject a string that overflows into the return address and makes it point to malicious code embedded in the same string (gaining super-user access to files, stealing information, etc.). I recommend the video from Computerphile linked below if you are interested in the details.
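The idea can be modeled safely without touching a real stack. The sketch below is only a simulation under assumed sizes: a single array stands in for a stack frame, with an 8-byte "buffer" region followed by a 4-byte slot playing the role of the return address. The names fake_frame and unchecked_copy are hypothetical, and real stack layouts depend on the compiler and platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUFFER_SIZE = 8, SLOT_SIZE = 4 };  /* assumed sizes, for illustration only */

/* Simulated frame: bytes[0..7] is the "buffer", bytes[8..11] the "return slot". */
struct fake_frame {
    unsigned char bytes[BUFFER_SIZE + SLOT_SIZE];
};

/* Copies input without checking it against BUFFER_SIZE, the way strcpy would.
   Returns how many bytes spilled past the "buffer" into the "return slot". */
size_t unchecked_copy(struct fake_frame *frame, const char *input) {
    size_t len = strlen(input);
    assert(len <= sizeof frame->bytes);  /* keeps the simulation itself in bounds */
    memcpy(frame->bytes, input, len);
    return len > BUFFER_SIZE ? len - BUFFER_SIZE : 0;
}
```

Copying "AAAAAAAABBBB" into a frame spills 4 bytes, so the simulated return slot ends up holding "BBBB", a value chosen entirely by whoever supplied the input.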

Exploiting a buffer overflow, which null-terminated strings in C make easy to introduce, is called a buffer overflow attack, and C programmers have to guard against it.

Countermeasures

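Luckily, there are countermeasures we can implement to handle user input strings safely. For example, instead of using strcpy, we can use strncpy, which writes at most a given number of characters, so the copy stays inside the buffer. Note that strncpy does not add a null terminator when the input is too long, so we add one ourselves.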

#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[]) {
    if (argc < 2) return 1; // no input given
 
    char buffer[500];
    strncpy(buffer, argv[1], 499); // copy at most 499 characters, even if no \0 is reached
    buffer[499] = '\0'; // add null terminator at the end of the buffer
    printf("%s", buffer);
 
    return 0;
}
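Another option, sketched here as an alternative rather than as part of the original example, is snprintf, which always terminates the destination (for a non-zero size) and reports how many characters the full string needed. The helper name copy_checked is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, always leaving dst terminated.
   Returns true if the whole string fit, false if it had to be truncated. */
bool copy_checked(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* snprintf writes at most dst_size - 1 characters plus '\0' and returns
       the length the full output would have had. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With a 4-byte destination and the input "12345", copy_checked stores "123" and returns false, so the caller can tell that the input was cut short instead of silently losing data.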

However, realistically speaking, it is hard to remember to apply these countermeasures every single time, just as it is hard to manage heap memory safely. This is why many programming languages today implement strings differently, and why many programmers choose something other than C.
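As a rough illustration of what "differently" can mean, the sketch below stores the length next to the characters so that every write can be checked against a known capacity. It is written in C only for comparison and does not reproduce the string type of any particular language; the names sized_text, sized_text_set, and TEXT_CAPACITY are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TEXT_CAPACITY = 16 };  /* assumed fixed capacity, for illustration only */

/* A string that carries its own length instead of relying on '\0'. */
struct sized_text {
    size_t length;
    char data[TEXT_CAPACITY];
};

/* Stores `length` bytes from `bytes`. Returns 0 on success, or -1 (leaving
   the target unchanged) if the input does not fit. */
int sized_text_set(struct sized_text *t, const char *bytes, size_t length) {
    assert(t != NULL && bytes != NULL);
    if (length > TEXT_CAPACITY) {
        return -1;                  /* rejected before any byte is written */
    }
    memcpy(t->data, bytes, length);
    t->length = length;
    return 0;
}
```

The trade-off is the one mentioned earlier: each string pays for an extra length field, but an oversized input is rejected up front instead of overwriting neighboring memory.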

Exercises

Starting with this article, there will be an exercise section where you can test your understanding of the material introduced in the article. I highly recommend solving these questions by yourself after reading the main part of the article. You can click on each question to see its answer.

Resources