Overview

Although I often use a database to store data, I think I know it well enough. After reading the book, I realized I don’t know much about databases. So I wrote this series of articles to learn more about databases. This series of articles will help you understand how databases work.

A page is a block of data read or written in a single I/O disk operation. That means the disk always reads or writes a page, even if you only update 1 byte of data. Page size is usually 4KB.

Note: B+Tree is optimized for that.

Database using the same concept to store data. A page is a data unit. Page size is usually going to be between 4-16KB. That depends on the database.

DatabasePage size
Mysql16KB
Postgresql8KB
SQLite1-64 KB (configurable)
Oracle2 KB, 4 KB, 8 KB, 16 KB, or 32 KB (configurable)

Fixed-size pages

The first time I heard about the database, I thought that database store data like this:

+--------+--------+--------+--------+---------------+
| HEADER | CELL 1 | CELL 2 | CELL N | FREE SPACE... |
+--------+--------+--------+--------+---------------+

The beginning is the page header. After that is cell data, and the end of the page is free space. This is one of the most straightforward page layouts. It is called fixed-size pages. But it is not used in the real world. The advantages of this layout are simple, easy to implement, and management. But it has a lot of disadvantages:

  • The cost of compaction is high. Because we need to move all data to the beginning of the page. (compact is a process to reclaim free space data fragmented on the page)
  • Wasted space: Cell data is a fixed size, even if the amount of data stored in them is much smaller. This can be a particular issue in databases with many small records or where space is at a premium.

So we need a better page layout.

Slotted pages

Page header store at the beginning of the page. It contains information about the page, such as the page type (root, leaf, …) and the start and end of the free space on the page. Typically, this header occupies around 100 bytes of space and may contain additional data such as a magic number, version number, checksum for data integrity, and bit flags to indicate features like compression or sibling page IDs.

Something like that:

type PageHeader {
    // Page type
    PageType uint16
    // Page number
    PageNumber uint32
    // Page checksum
    Checksum uint32
    // Page flags
    Flags uint16
    // Page free space
    FreeSpace uint16
    // Page number of the next page
    NextPage uint32
    // Page number of the previous page
    PrevPage uint32
    // start the free space on the page
    FreeSpaceStart uint16
    // end of the free space on the page
    FreeSpaceEnd uint16
    // Page number of the overflow page
    OverflowPage uint32
    ...
}

Insert data

When we insert data into the page, we must find a free space to store it. If FreeSpaceStart - FreeSpaceEnd is enough. Data will add to the end of free space. FreeSpaceStart and FreeSpaceEnd will be updated. If not, a new page will be created to store data.

Note: A page split will happen in a B+tree index that uses slotted pages.

Remove data

When we remove data from the page, Ptr will be set to null. FreeSpaceStart and FreeSpaceEnd will not be updated. Space is not reclaimed, and data will be fragmented, but the page will be marked as can-compact and waiting for compact. Compact command depends on the database. In Mysql, we can use OPTIMIZE TABLE to compact the page. In Postgresql, that is VACUUM.

Note: Some databases have mechanisms to track and reuse the space. But fragmentation still happens and still needs optimization.

slotted_page.svg

Overflow pages

The page size is limited, But what will happen if data is too large to fit on a page? Text, BLOB,… data types are examples of data that can be too large to fit in a single slot. In this case, we need to use overflow pages.

The overflow page is a linked list of pages. Each page in the list contains data and a pointer to the next page in the list.

Overflow pages are used when a data record is too large to fit entirely within a single-slotted page. The system might try to fit as much of the data as possible on the current page and then store the remaining portion in one or more overflow pages, depending on the size of the remaining portion.

Note: Unlike page split, the overflow page is not a B+tree operation. It is a database operation.

Because for optimizing database operation, the Overflow page is a linked list page fixed-size instead of a big page.

BufferPools

References: