Overview

Storing data on a hard disk is necessary to ensure durability. However, working directly with the disk significantly hurts performance. Therefore, like much other software, databases place a memory layer (the buffer pool) in front of the disk to reduce disk I/O requests.

What is a buffer pool?

The buffer pool is the memory layer in front of the disk. It is a mechanism for avoiding a disk access every time data is needed.

Note: the OS also has a buffer pool, known as the page cache (or buffer cache). Different implementation, but similar concept and purpose.

When we want to read data, the database engine requests it from the buffer pool first. If the data is already in the buffer pool, it is returned immediately. If not, the engine reads it from disk, stores it in the buffer pool, and then returns it.
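
A minimal sketch of that read path, assuming a hypothetical `disk` object with a `read_page` method; eviction is only stubbed out here, since page replacement is covered later:

```python
# Illustrative buffer pool read path (a sketch, not a real engine).
class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk          # anything exposing read_page(page_id) -> page
        self.pages = {}           # page_id -> in-memory page contents

    def get_page(self, page_id):
        # Cache hit: return the in-memory copy immediately.
        if page_id in self.pages:
            return self.pages[page_id]
        # Cache miss: make room if needed, read from disk, keep a copy, return it.
        if len(self.pages) >= self.capacity:
            self.evict_one()
        page = self.disk.read_page(page_id)
        self.pages[page_id] = page
        return page

    def evict_one(self):
        # Placeholder: a real pool picks a victim with LRU/Clock (see below).
        self.pages.pop(next(iter(self.pages)))
```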

Why not use the OS buffer pool?

Database buffer pools are inspired by the OS. The OS already has its own buffer pool, so why does a database need to reinvent the wheel?

  • The database will allocate a significant amount of memory for caching pages because it recognizes that disk I/O is one of the most resource-intensive operations. On the other hand, the operating system is designed to optimize for various types of applications and may not prioritize disk I/O in the same way.
  • Databases want to implement their own algorithms to optimize the buffer pool (prefetching, page replacement algorithms, etc.).

When is data written to disk? How is durability ensured?

Data is written to the buffer pool first, and the modified page is marked as dirty. The database flushes the dirty page to disk only when certain conditions are met:

  • The page is evicted from the buffer pool (for example, when the pool is full)
  • A certain amount of time has passed since the page was last modified

To ensure durability, the database writes a log record first; this is called write-ahead logging (WAL). Note: because the write-ahead log is appended sequentially, it is much faster than random writes. This technique is used by most software that needs to guarantee durability.
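
A rough sketch of that ordering, with hypothetical `log_file` and `pool` objects (the `pages`/`dirty` attributes are made up for illustration); the only point is that the log record reaches durable storage before the data page is modified:

```python
import os

def apply_write(log_file, pool, page_id, new_value):
    # 1. Append the change to the write-ahead log (cheap sequential write).
    log_file.write(f"SET page={page_id} value={new_value}\n".encode())
    # 2. Force the log record to durable storage before touching the page.
    log_file.flush()
    os.fsync(log_file.fileno())
    # 3. Only now modify the page in the buffer pool and mark it dirty.
    #    The dirty page itself can be flushed later (eviction, checkpoint, timer),
    #    because the log is enough to redo the change after a crash.
    pool.pages[page_id] = new_value
    pool.dirty.add(page_id)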

[Figure: database storage]

Page replacement algorithm

Because memory is limited, we cannot keep all data in the buffer pool. A page replacement algorithm decides which page to evict when the buffer pool is full. The hottest data should stay in the pool so that disk I/O is reduced as much as possible.

This is called page replacement. There are many algorithms for it; I will explain the most common ones.

Database      Page replacement algorithm
MySQL         Variant of LRU (midpoint insertion strategy)
PostgreSQL    Clock
SQLite        LRU (Least Recently Used)

LRU (Least Recently Used)

This is the most commonly used cache replacement policy, and it is very simple: when we read data, we move it to the head of the list; when we need to evict, we remove the entry at the tail of the list.

To make that movement cheap, LRU is generally implemented with a doubly linked list, usually paired with a hash map so lookups stay O(1).
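
A compact sketch using Python's OrderedDict, which behaves like the hash-map-plus-linked-list combination described above (class and method names here are just illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.entries:
            return None
        # Accessing an entry moves it to the most recently used end ("head").
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict from the "tail": the least recently used entry.
            self.entries.popitem(last=False)
```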

Clock

An LRU built on a linked list avoids copying data, but it still has to relink nodes on every access. To remove even that overhead, some databases use Clock, a simple approximation of LRU. Pages live in a circular buffer, each with a reference bit that is set whenever the page is accessed, and a pointer (the clock hand) walks around the buffer when the pool is full and a victim is needed. If the hand lands on a page whose bit is set, it clears the bit and moves on (a "second chance"); if the bit is already clear, that page is evicted. Clock is very simple, but it does not evict exactly the least recently used page; it is just a cheap approximation.
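
A sketch of that second-chance idea with a circular array of frames and one reference bit per frame (illustrative only, not PostgreSQL's actual clock-sweep code, which also uses usage counters):

```python
class ClockCache:
    def __init__(self, capacity):
        self.frames = [None] * capacity   # each slot holds (key, value) or None
        self.ref_bits = [0] * capacity    # "was this frame used recently?"
        self.index = {}                   # key -> slot number
        self.hand = 0

    def get(self, key):
        slot = self.index.get(key)
        if slot is None:
            return None
        self.ref_bits[slot] = 1           # mark as recently used; the hand does not move
        return self.frames[slot][1]

    def put(self, key, value):
        if key in self.index:             # already cached: update in place
            slot = self.index[key]
            self.frames[slot] = (key, value)
            self.ref_bits[slot] = 1
            return
        # Walk the hand until we find an empty frame or one whose reference bit
        # is 0, clearing bits along the way (the "second chance").
        while self.frames[self.hand] is not None and self.ref_bits[self.hand] == 1:
            self.ref_bits[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.frames)
        # Evict the current occupant (if any) and reuse the frame.
        if self.frames[self.hand] is not None:
            del self.index[self.frames[self.hand][0]]
        self.frames[self.hand] = (key, value)
        self.ref_bits[self.hand] = 1
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % len(self.frames)
```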

Midpoint Insertion Strategy

Prefetching (Read-Ahead)

Prefetching is a technique that reads data from disk asynchronously, before it is actually requested, to improve I/O performance. I will not explain it in detail here (see MySQL's read-ahead documentation). The problem it creates for plain LRU: prefetched data may never be used at all. Moving it straight to the top of the LRU list is not fair, because it can evict genuinely hot data from the cache.

How does the Midpoint Insertion Strategy solve the problem?

Therefore, MySQL customizes LRU by splitting the list into a new (young) sublist and an old sublist. When a page is read for the first time, it is inserted at the head of the old sublist (the midpoint). If it is accessed again, it is promoted to the young sublist. Pages that stop being accessed drift from the young sublist toward the old one and are eventually evicted from its tail.

By default, 5/8 of the buffer pool list is the young sublist and 3/8 is the old sublist.


The key advantage of the midpoint insertion strategy is that it protects frequently accessed data. Because hot pages live in the young sublist, prefetched pages that enter at the midpoint cannot push them out of the cache.
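
A toy version of the two-sublist idea, built on the LRU sketch above; the split ratio and names are illustrative, and MySQL's real implementation (innodb_old_blocks_pct, innodb_old_blocks_time) is more involved:

```python
from collections import OrderedDict

class MidpointLRU:
    def __init__(self, capacity, old_fraction=3 / 8):
        self.old_capacity = max(1, int(capacity * old_fraction))
        self.young_capacity = capacity - self.old_capacity
        self.young = OrderedDict()   # hot pages, most recently used last
        self.old = OrderedDict()     # newly read / prefetched pages

    def get(self, key):
        if key in self.young:
            self.young.move_to_end(key)
            return self.young[key]
        if key in self.old:
            # Second access: promote the page from the old to the young sublist.
            value = self.old.pop(key)
            self._insert_young(key, value)
            return value
        return None

    def put(self, key, value):
        # First access (including prefetch) lands in the old sublist, so it can
        # only evict other cold pages, never the hot ones in the young sublist.
        self.old[key] = value
        self.old.move_to_end(key)
        if len(self.old) > self.old_capacity:
            self.old.popitem(last=False)

    def _insert_young(self, key, value):
        self.young[key] = value
        if len(self.young) > self.young_capacity:
            # Demote the coldest young page back to the old sublist.
            demoted_key, demoted_value = self.young.popitem(last=False)
            self.put(demoted_key, demoted_value)
```

Usage follows the read flow described above: call put() when a page first arrives from disk or read-ahead, and get() on every later access; only the second access promotes a page into the young sublist.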