How do indexes make databases read faster?
The database is just a collection of records. These records are serialized and stored on the disk. The way the records are serialized and stored on the disk depends on the database engine.
How does the database read from the disk?
A disk is always read in Blocks, and a block is typically 8kb big. When any disk IO happens, even requesting one byte, the entire block is read in memory, and the byte is returned from it.
Say we have a "users" table with 100 rows, with each row being 200B long. If the block size is 600B, we can read 600/200 = 3 rows in one disk read.
To find all users with age = 23, we will have to read the entire table row by row and filter out the relevant documents; we will have to read the entire table. In one shot, we read 3 rows so that it would take 100/3 = 34 disk/block reads to answer the query.
Let's see how indexes make this faster.
An index is a small referential table with two columns: the indexed value and the row ID. So an index on the age column will have two columns age (4 bytes) and row ID (4 bytes).
Each entry in the index is 8 bytes long, and the total size of the index will be 100 * 8 = 800 bytes. When stored on the disk, the index will require 2 disk blocks.
When we want to get users with age == 23, we will first read the entire index, taking 2 disk IOs and filtering out relevant row IDs. Then make disk reads for the relevant rows from the main table. Thus we read 2 blocks to load the index and only relevant blocks have the actual data.
In our example, it comes out to be 2 + 2 = 4 block reads. So, we get an 8x boost in performance using indexes.
Note: This is a very crude example of how fetch happens with indexes; there are a lot of optimizations that I have not talked about.
Here's the video of my explaining this in-depth 👇 do check it out
In this video, we discuss how indexes make a database operate faster. While discussing that, we dive deep into how the data is read from the disk, how indexes are structured, serialized, and stored on the disk, and finally, how exactly data is quickly read using the right set of indexes.
Outline:
- 00:00 How is a table stored on the disk?
- 03:14 Reading bytes from the disk
- 05:21 Reading the entire table from the disk
- 08:33 Evaluating a simple query without an index
- 10:00 Basics of Database Indexes
- 13:24 Evaluating a simple query with index
You can also
- Subscribe to the YT Channel Asli Engineering
- Download the notes
- Listen to this on the go on Spotify
Thank you so much for reading 🖖 If you found this helpful, do spread the word about it on social media; it would mean the world to me.
You can also follow me on your favorite social media LinkedIn, and Twitter.
Yours truly,
Arpit
Until next time, stay awesome :)
I teach a course on System Design where you'll learn how to intuitively design scalable systems. The course will help you
- become a better engineer
- ace your technical discussions
- get you acquainted with a massive spectrum of topics ranging from Storage Engines, High-throughput systems, to super-clever algorithms behind them.
I have compressed my ~10 years of work experience into this course, and aim to accelerate your engineering growth 100x. To date, the course is trusted by 500+ engineers from 9 different countries and here you can find what they say about the course.
Together, we will build some of the most amazing systems and dissect them to understand the intricate details. You can find the week-by-week curriculum and topics, benefits, testimonials, and other information here https://arpitbhayani.me/masterclass.
Senior Software Engineer at Intuit
2yAccording to the video you mentioned, when we find the age of a user with indexing, it first looks for all relevant ids from the index table and then searches in the main table. If all the relevant ids are the first id of every 34 blocks, does that mean it would take 34 seconds?
Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022
2yInformative👍💯
Azure Data Engineer
2yInsightful.