60% of the tables I have recently designed do not have the standard auto-increment ID column. Although counter-intuitive, this has given me a significant performance boost thanks to smaller table sizes (data plus indexes), particularly where we paginate far more than we do pointed reads and updates. It required me to drop the ORM, and I happily took that tradeoff for better query control and performance. This design choice was made only after understanding how a relational database (MySQL in my case) stores data and evaluates queries; it is definitely not a one-size-fits-all decision. So, whenever you are designing a schema, always optimize for the most frequent queries; even if that requires a composite primary key, it may be worth it. Also, knowing a bit about database internals always helps. #AsliEngineering #Databases #SystemDesign
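A minimal sketch of the idea, using SQLite for portability (the table and column names are hypothetical, not from the post). In MySQL/InnoDB the primary key also clusters the rows on disk, so a composite key matching the pagination pattern keeps range scans cheap and drops the extra id column from every index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical feed table: rows are read mostly via paginated
# range scans per user, so (user_id, created_at) serves as the
# primary key instead of an auto-increment id.
cur.execute("""
    CREATE TABLE feed_items (
        user_id    INTEGER NOT NULL,
        created_at INTEGER NOT NULL,
        payload    TEXT,
        PRIMARY KEY (user_id, created_at)
    )
""")

rows = [(1, t, f"item-{t}") for t in range(100)]
cur.executemany("INSERT INTO feed_items VALUES (?, ?, ?)", rows)

# The pagination query walks the primary key directly: no
# secondary-index lookup, no surrogate id inflating the index.
page = cur.execute(
    "SELECT created_at, payload FROM feed_items "
    "WHERE user_id = ? AND created_at < ? "
    "ORDER BY created_at DESC LIMIT 10",
    (1, 50),
).fetchall()
print(page[0])  # (49, 'item-49') -- newest item before timestamp 50
```

The tradeoff is exactly the one the post names: this favors the frequent paginated read over pointed lookups by a single id.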
If you don't use the id column, how do you use foreign keys?
It would be really good if you could share the use case in which you opted for a non-auto-increment ID column.
You still require a unique ID, either a GUID or a computed key. Interestingly, I will try your approach and see how it performs.
Composite PKs are more useful than they seem. I have been using (tenant_id, id) style keys on Postgres (so I can still use UUIDs easily), as we always read/write data for a single tenant only. This keeps open possibilities like partitioning on tenant_id, or even sharding on it (using something like Citus or our own shard-router code). It almost eliminates the noisy-neighbour issue, at least from a query-plan perspective, since every index essentially contains tenant_id.
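A rough sketch of that multi-tenant pattern, again using SQLite for portability (table and column names are hypothetical; on Postgres the same (tenant_id, id) key would also be the partitioning/sharding key):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical multi-tenant table: tenant_id leads the primary
# key, so every per-tenant query matches the key's prefix and
# never touches other tenants' rows.
cur.execute("""
    CREATE TABLE orders (
        tenant_id TEXT NOT NULL,
        id        TEXT NOT NULL,   -- e.g. a UUID per row
        amount    INTEGER,
        PRIMARY KEY (tenant_id, id)
    )
""")

for tenant in ("acme", "globex"):
    for i in range(5):
        cur.execute("INSERT INTO orders VALUES (?, ?, ?)",
                    (tenant, str(uuid.uuid4()), i * 10))

# Every query carries the tenant_id prefix, matching the PK order.
count, total = cur.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders WHERE tenant_id = ?",
    ("acme",),
).fetchone()
print(count, total)  # 5 100
```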
I've been working in the opposite direction. The data (tires) I work with is very hierarchical -- you have a brand, a model, and then the size within that model. The old data model had an auto-incrementing id for brand, but model was set up with a dependent, incremental id based on the brand id, making it a composite key. Generally, sizes can be identified by a manufacturer's number, but because there are "white label" tires that get sub-branded, it isn't a unique identifier. Thus, the composite key of brand id, line id, and manufacturer's number is the primary key. In the new model, I still keep the old keys as unique keys, but I've moved the primary key to a single auto-incrementing id column. You keep the redundancy-prevention guarantee, gain the ability to query on either key set (allowing simpler queries in some scenarios), get better insertion/deletion performance due to the linear nature of the auto-incrementing id, and only pay a few extra bytes per record. Really, it's all about knowing the data. Not every dataset is best modeled with a single-column primary key. Not every dataset is best modeled with a composite primary key.
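The pattern described above -- a surrogate primary key plus the old composite key demoted to a UNIQUE constraint -- can be sketched like this (SQLite for portability; column names are illustrative, not the commenter's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Surrogate auto-increment primary key; the old composite key
# (brand, line, manufacturer's number) survives as a UNIQUE
# constraint, so duplicate rows are still rejected.
cur.execute("""
    CREATE TABLE tire_sizes (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        brand_id INTEGER NOT NULL,
        line_id  INTEGER NOT NULL,
        mfr_num  TEXT NOT NULL,
        UNIQUE (brand_id, line_id, mfr_num)
    )
""")

cur.execute("INSERT INTO tire_sizes (brand_id, line_id, mfr_num) "
            "VALUES (1, 1, 'MFR-100')")

# Re-inserting the same composite key violates the UNIQUE
# constraint even though the surrogate id would differ.
try:
    cur.execute("INSERT INTO tire_sizes (brand_id, line_id, mfr_num) "
                "VALUES (1, 1, 'MFR-100')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```

You can now join and look up by the compact integer id, while the unique index preserves the old key's integrity guarantee.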
Perfectly said, Arpit Bhayani. Designing the database with a forecast of how much data, and what kind of data, will come helps you design a proper table structure, but as you said, no one design fits all. Going for a surrogate key (auto-increment, no business value) versus a composite primary key is always a debatable topic. If my table with a composite primary key is expected to have millions of rows, the index backing the composite key can grow to the point where CRUD performance degrades badly. In that case, it is much better to use a simple integer ID primary key, whose index stays compact, and enforce uniqueness with the necessary database constraints. Follow Vishal Jaiswal, PMP® for learning database concepts, building tricks, and cracking database technical interviews.
Many devs go for ORMs by default. But what I have experienced, even as a front-end dev, is that the queries get incrementally slower as the size of the database increases.
But a composite primary key can result in larger index sizes, since composite indexes can be bigger than single-column ones, which also increases storage and I/O operations, especially in write-heavy workloads. How do you handle this complexity and the potential performance issues?
100% agree with you here; this is something I too discovered while designing for tightly related pointed queries.
Sys Design for SDE-1s: https://arpitbhayani.me/sys-design
Sys Design for SDE-2s+: https://arpitbhayani.me/course
Redis Internals: https://arpitbhayani.me/redis-internals

I keep writing and sharing my practical experience and learnings every day, so if you resonate, then follow along. I keep it no fluff.

no-fluff engineering - youtube.com/c/ArpitBhayani
first principle series - https://arpitbhayani.me/first
my knowledge base: https://arpitbhayani.me/blogs
bookshelf: https://arpitbhayani.me/bookshelf
papershelf: https://arpitbhayani.me/papershelf