Database Basics #1 - Getting Started

Last Edited: 4/3/2025

This blog post introduces the fundamental concepts like DBMS, relational data model, and SQL.

DevOps

In this DevOps series, we have been discussing various concepts and DevOps tools like networking, Linux, Git, and so on. Lately, we're focusing on Kubernetes for scaling applications, but we have been ignoring the gigantic elephant in the room, the database. The database is essential for almost all applications, yet we're only briefly mentioned it previously. Therefore, before we discuss further on how to effectively scale our applications, we will cover the basics of databases in this database series.

Database Management Systems

The simplest way to create a database is by using CSV or TXT files to store data on disk and accessing them via a script. Using system calls, we can read and write to files via the operating system. However, this naive approach is slow due to the intensive I/O operations and is inefficient and not scalable. It requires reading the entire file for every query. Furthermore, there's no system in place to ensure data integrity, which is crucial for data accuracy and consistency, and security. In other words, anyone can edit any part of a TXT file anyhow concurrently, potentially breaking the logic.

Thus, we often need tailored software to manage databases properly, which are database management systems (DBMSs). DBMSs support the definition, creation, querying, updating, and administration of databases based on a data model, an abstraction of how we define data. DBMSs ensure data integrity by enforcing a schema, which is the description of a particular collection of data given the data model. Examples of DBMSs include SQLite, MySQL, and PostgreSQL, which we often refer to simply as "databases." The database is the storage organized by a DBMS, and the set of DBMS, database, and data model is known as a database system. Different DBMSs have different implementations of different data models, which is an important consideration when choosing a DBMS based on the use case.

Relational Data Model

Though many data models are available, the most commonly chosen is the relational data model, derived from mathematical notions of relations. The relational data model defines a database's relations (tables) of multiple attributes (columns), whose tuples (rows) represent the entities (data points). A relation contains a primary key to identify a single tuple, and a foreign key can be defined as an attribute of a tuple to refer to the tuple of another relation, defining the relationships. The following illustrates how the relational data model looks like.

DB Relations

The relational data model allows us to express different objects and their relationships flexibly. Also, relations are abstractions, separate from the physical representations of data. This means a DBMS can organize data in storage however it chooses, or can use different physical schemas, behind the programming interface based on relations for users and applications to manipulate the database. The following illustrates how a database system based on the relational data model enforces this data independence.

DB Schemas

Applications may only interact with the predefined external schema that enforces security to the logical schema defined based on the relational data model. Then, the DBMS is responsible for handling the implementations of the physical schema based on the logical schema, providing physical independence and enabling optimizations. Other data models can achieve data independence to some extent and are more suited for specific applications, though the relational data model excels in this regard and is robust. Hence, we will primarily discuss the relational data model, though we will briefly touch on other data models as well.

SQL

The physical independence of the relational data model not only enables optimizations by the DBMS but also provides an opportunity to standardize the programming language to work with logical and external schemas between different DBMSs with different physical schemas. This led to the birth of structured query language (SQL). Due to the dominance of relational database systems, SQL is one of the most used programming languages in the world.

Though DBMS implementations are important to consider when writing queries and optimizing applications and system design, most use cases do not require delving into the complexity behind the abstractions for incremental performance gains. Hence, we can start by learning the basics of interacting with a database using SQL from an application's perspective, which remains mostly the same across different DBMSs.

To get started with learning SQL, we can set up PostgreSQL in a Docker container by first pulling the image with docker pull postgres and instantiating the container with docker run --name pg -e POSTGRES_PASSWORD=mypsecretpassword -d postgres. Then, we can access the container via a terminal with docker exec -it pg bash and access the database with psql -U postgres. We can run SELECT 1; to see if the environment is set up correctly.

Conclusion

In this article, we covered what DBMSs and relational data models are, and why we need DBMSs and prefer the relational data model for efficiency, data integrity, flexibility, security, and durability. We also discussed how logical independence, brought about by the relational data model, has led to SQL. From the next article, we will dive deeper into SQL and how we can perform basic operations with it.

Resources