Normalization and Normal Forms (1NF, 2NF, 3NF)

RMAG news

Introduction

Normalization is a systematic approach to organizing data in a database to reduce redundancy and improve data integrity. The process involves decomposing a table into smaller, related tables without losing data. This article will explain the concepts of normalization and the different normal forms (1NF, 2NF, 3NF), providing clear illustrations and examples to help students understand these concepts thoroughly.

What is Normalization?

Normalization involves structuring a relational database in a way that minimizes redundancy and dependency by organizing fields and table relations. The primary goals of normalization are to:

Eliminate redundant data.
Ensure data dependencies make sense.
Reduce the potential for anomalies during data operations (insertion, update, deletion).

Normal Forms

Normal forms are a series of guidelines that a relational database must follow to be considered normalized. Each normal form builds on the previous one, creating a series of increasingly stringent rules.

First Normal Form (1NF)

A table is in the First Normal Form if:

All the values in a table are atomic (indivisible).
Each column contains values of a single type.
Each column contains unique values.
The order in which data is stored does not matter.

Example of 1NF

Consider a table that stores information about students and their courses:

StudentID
StudentName
Courses

1
John Doe
Math, Science

2
Jane Smith
History, Math

This table is not in 1NF because the Courses column contains multiple values. To convert it to 1NF, we need to ensure that each column contains atomic values:

StudentID
StudentName
Course

1
John Doe
Math

1
John Doe
Science

2
Jane Smith
History

2
Jane Smith
Math

Second Normal Form (2NF)

A table is in the Second Normal Form if:

It is in 1NF.
All non-key attributes are fully functionally dependent on the primary key.

This means that there should be no partial dependency of any column on the primary key. In other words, all columns must depend on the entire primary key.

Example of 2NF

Consider the following table that stores information about students, courses, and instructors:

StudentID
CourseID
StudentName
CourseName
InstructorName

1
101
John Doe
Math
Dr. Smith

1
102
John Doe
Science
Dr. Jones

2
101
Jane Smith
Math
Dr. Smith

2
103
Jane Smith
History
Dr. Brown

This table is in 1NF but not in 2NF because StudentName depends only on StudentID and CourseName, InstructorName depend only on CourseID, not on the combination of StudentID and CourseID. To convert it to 2NF, we decompose the table into two tables:

Students Table:
| StudentID | StudentName |
|———–|————-|
| 1 | John Doe |
| 2 | Jane Smith |

Courses Table:
| CourseID | CourseName | InstructorName |
|———-|————|—————-|
| 101 | Math | Dr. Smith |
| 102 | Science | Dr. Jones |
| 103 | History | Dr. Brown |

Enrollment Table:
| StudentID | CourseID |
|———–|———-|
| 1 | 101 |
| 1 | 102 |
| 2 | 101 |
| 2 | 103 |

Third Normal Form (3NF)

A table is in the Third Normal Form if:

It is in 2NF.
There are no transitive dependencies.

A transitive dependency occurs when a non-key column is dependent on another non-key column.

Example of 3NF

Consider the following table:

StudentID
CourseID
CourseName
InstructorName
InstructorOffice

1
101
Math
Dr. Smith
Room 101

1
102
Science
Dr. Jones
Room 102

2
101
Math
Dr. Smith
Room 101

2
103
History
Dr. Brown
Room 103

This table is in 2NF but not in 3NF because InstructorOffice is dependent on InstructorName, which is not a key. To convert it to 3NF, we decompose it further:

Students Table:
| StudentID | StudentName |
|———–|————-|
| 1 | John Doe |
| 2 | Jane Smith |

Courses Table:
| CourseID | CourseName | InstructorName |
|———-|————|—————-|
| 101 | Math | Dr. Smith |
| 102 | Science | Dr. Jones |
| 103 | History | Dr. Brown |

Instructors Table:
| InstructorName | InstructorOffice |
|—————-|——————|
| Dr. Smith | Room 101 |
| Dr. Jones | Room 102 |
| Dr. Brown | Room 103 |

Enrollment Table:
| StudentID | CourseID |
|———–|———-|
| 1 | 101 |
| 1 | 102 |
| 2 | 101 |
| 2 | 103 |

Summary

Normalization is an essential process in database design that aims to reduce redundancy and ensure data integrity. By following the rules of normalization and moving through the different normal forms (1NF, 2NF, 3NF), we can create a well-structured database that minimizes data anomalies and supports efficient data operations. Understanding and applying these principles is fundamental for anyone involved in database design and management.