Mory Kaba’s Post

Data Engineer | Skilled in Data Processing, Big Data Technologies & Cloud Infrastructure | Python | Spark | SQL | Terraform | AWS | Azure

Database Normalization for Data Engineering: From No Normalization to 3NF

What is Database Normalization?
Database normalization is the process of organizing data to reduce redundancy and improve integrity by breaking large, complex tables into smaller, well-structured ones. It works by applying a series of rules called Normal Forms (NFs). These rules remove undesirable dependencies within a table so that each fact is stored in only one place, which leads to fewer update anomalies and data inconsistencies. In practice, normalization means splitting tables into smaller ones and linking them with foreign key constraints. The most common Normal Forms are 1NF, 2NF, and 3NF, though more advanced forms (such as BCNF, 4NF, and 5NF) exist for specific use cases.

Why is Normalization Important for Data Engineers?
As a data engineer, understanding normalization is crucial whether you model your data as One Big Table (OBT), a star schema, a snowflake schema, or a Data Vault. Each of these approaches applies a different degree of normalization:
• OBT (One Big Table) is the least normalized, often only in 1NF, and combines fact and dimension data into a single wide table.
• Star schemas typically keep fact tables normalized (often in 3NF) for data integrity, while their dimension tables are deliberately denormalized and often only reach 2NF. Snowflake schemas normalize those dimensions further into separate sub-dimension tables, usually reaching 3NF.
• Data Vault is highly normalized, splitting data into hubs, links, and satellites, and can follow even stricter rules.

Understanding these differences lets you choose the right normal form for your use case, balancing complexity against performance:
1. Unnormalized Form (UNF): Contains duplicate data and repeating groups, and is not divided into separate tables.
2. First Normal Form (1NF): Every column holds atomic (indivisible) values, there are no repeating groups, and each record is unique.
3. Second Normal Form (2NF): The table is in 1NF and has no partial dependencies. Every non-key attribute depends on the entire primary key, not on a subset of it. This only matters when the primary key is composite.
4. Third Normal Form (3NF): The table is in 2NF and has no transitive dependencies. Non-key attributes depend only on the primary key, not on other non-key attributes.
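The 2NF and 3NF rules above are stated in terms of functional dependencies: a set of columns X determines a column Y when each combination of X values maps to exactly one Y value. The minimal Python sketch below checks for both kinds of violation. It assumes a hypothetical order-lines table keyed by (order_id, product_id). All column names, rows, and function names are invented for illustration. Sample data can only refute a dependency, never prove it. In a real model, dependencies come from business rules. The transitive check is also simplified and ignores candidate keys other than the primary key.

```python
from itertools import combinations

# Hypothetical order lines already in 1NF; primary key is the composite (order_id, product_id).
ROWS = [
    {"order_id": 1, "product_id": "P1", "product_name": "Keyboard", "customer_id": "C1", "customer_city": "Paris", "quantity": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Mouse", "customer_id": "C1", "customer_city": "Paris", "quantity": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Keyboard", "customer_id": "C2", "customer_city": "Dakar", "quantity": 1},
    {"order_id": 3, "product_id": "P2", "product_name": "Mouse", "customer_id": "C1", "customer_city": "Paris", "quantity": 5},
    {"order_id": 4, "product_id": "P1", "product_name": "Keyboard", "customer_id": "C3", "customer_city": "Paris", "quantity": 3},
]
KEY = ("order_id", "product_id")
NON_KEY = ["product_name", "customer_id", "customer_city", "quantity"]


def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows, i.e. every
    combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        dependent = tuple(row[c] for c in rhs)
        # setdefault stores the first dependent seen; a different one later is a violation.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """2NF violations: non-key columns determined by a proper subset of a composite key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for column in non_key:
                if holds(rows, subset, [column]):
                    found.append((subset, column))
    return found


def transitive_dependencies(rows, non_key):
    """Simplified 3NF check: one non-key column determining another non-key column."""
    return [
        (a, b)
        for a in non_key
        for b in non_key
        if a != b and holds(rows, [a], [b])
    ]


if __name__ == "__main__":
    print("partial:", partial_dependencies(ROWS, KEY, NON_KEY))
    print("transitive:", transitive_dependencies(ROWS, NON_KEY))
```

In this sample, customer_city appears both as a partial dependency (through order_id) and as a transitive one (through customer_id). That is why it takes two steps to move it out: first into an orders table for 2NF, then into a customers table for 3NF.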
Let's look at a simple example of how to take a table from no normalization to 3NF:
#dataengineering #datamodelling #analytics #databases #dataanalytics
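Below is a minimal Python sketch of taking a table from no normalization to 3NF. It is not the author's original example. The orders, products, customers, and function names are all hypothetical. Each step is a plain transformation:
• to_1nf explodes a repeating group of items into one atomic row per order line.
• to_2nf moves columns that depend on only part of the composite key into their own tables.
• to_3nf moves customer_city, which depends on customer_id rather than on the order, into a customers table.
The rejoin function rebuilds the 1NF rows by key lookups, which stand in for foreign-key joins, to show that the decomposition loses no information.

```python
# Hypothetical unnormalized (UNF) orders: "items" is a repeating group packed into one field.
UNF_ORDERS = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Paris", "items": "P1:Keyboard:2, P2:Mouse:1"},
    {"order_id": 2, "customer_id": "C2", "customer_city": "Dakar", "items": "P1:Keyboard:1"},
    {"order_id": 3, "customer_id": "C1", "customer_city": "Paris", "items": "P2:Mouse:5"},
]


def to_1nf(orders):
    """Explode the repeating group so every column holds a single atomic value.
    The result has one row per order line, keyed by (order_id, product_id)."""
    rows = []
    for order in orders:
        for item in order["items"].split(","):
            product_id, product_name, quantity = item.strip().split(":")
            rows.append({
                "order_id": order["order_id"],
                "customer_id": order["customer_id"],
                "customer_city": order["customer_city"],
                "product_id": product_id,
                "product_name": product_name,
                "quantity": int(quantity),
            })
    return rows


def project(rows, columns):
    """Keep only `columns` and drop duplicate rows, preserving first-seen order."""
    distinct = {}
    for row in rows:
        values = tuple(row[c] for c in columns)
        distinct.setdefault(values, dict(zip(columns, values)))
    return list(distinct.values())


def to_2nf(rows):
    """Remove partial dependencies on the composite key (order_id, product_id)."""
    return {
        "orders": project(rows, ["order_id", "customer_id", "customer_city"]),  # depend on order_id only
        "products": project(rows, ["product_id", "product_name"]),  # depend on product_id only
        "order_items": project(rows, ["order_id", "product_id", "quantity"]),  # depend on the full key
    }


def to_3nf(tables):
    """Remove the transitive dependency order_id -> customer_id -> customer_city."""
    orders = tables["orders"]
    return {
        "customers": project(orders, ["customer_id", "customer_city"]),
        "orders": project(orders, ["order_id", "customer_id"]),  # customer_id now acts as a foreign key
        "products": tables["products"],
        "order_items": tables["order_items"],
    }


def rejoin(tables):
    """Rebuild the 1NF rows; each dict lookup stands in for a foreign-key join
    (a KeyError would mean a dangling reference)."""
    customers = {c["customer_id"]: c for c in tables["customers"]}
    orders = {o["order_id"]: o for o in tables["orders"]}
    products = {p["product_id"]: p for p in tables["products"]}
    rows = []
    for item in tables["order_items"]:
        order = orders[item["order_id"]]
        rows.append({
            "order_id": item["order_id"],
            "customer_id": order["customer_id"],
            "customer_city": customers[order["customer_id"]]["customer_city"],
            "product_id": item["product_id"],
            "product_name": products[item["product_id"]]["product_name"],
            "quantity": item["quantity"],
        })
    return rows


if __name__ == "__main__":
    rows_1nf = to_1nf(UNF_ORDERS)
    tables = to_3nf(to_2nf(rows_1nf))
    for name, table in tables.items():
        print(name, table)
    print("lossless:", rejoin(tables) == rows_1nf)
```

After the 3NF step, each customer's city is stored once in the customers table rather than once per order line. Changing C1's city becomes a single-row update instead of a change to every order line that mentions C1.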