CrazyCloud - Fotolia

Database normalization in MySQL: Four quick and easy steps

Normalization helps in achieving resource optimization with improvements in MySQL database design. Learn about database normalization in MySQL

Database normalization was introduced as a procedure by Edgar Frank Codd, a computer scientist at IBM, in his paper A relational model of data for large shared data banks in 1970.

Database normalization is a process by which an existing schema is modified to bring its component tables into compliance through a series of progressive normal forms.

It focuses on ridding developers and their projects of the “spreadsheet syndrome”, which refers to the tendency of developers to squeeze as much information as possible into as few tables as possible.

Earlier, due to the notions of spreadsheets and how data was managed in them, developers continued to design MySQL databases with the same mind frame.

Today, this method is not considered a smart way to design MySQL databases since tables, when designed with the spreadsheet syndrome, call for constant redesigning for every small change to the database.

Benefits of database normalization in MySQL

Reduced usage of storage space by intelligently categorizing data is one of the many benefits database normalization lends to MySQL.

It aids in better, faster, stronger searches as it entails fewer entities to scan in comparison with the earlier searches based on mixed entities. Data integrity is improved through database normalization as it splits all the data into individual entities yet also builds strong linkages with the related data.

Mike Hillyer, a technical writer in MySQL AB (now Oracle Corporation), said: “The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key: the whole key and nothing but the key. With this goal come the benefits in the form of reduced redundancies, fewer anomalies and improved efficiencies.”

Normalize your data easily

The following example will illustrate how database normalization helps achieve a good design in MySQL.

The table below presents data that needs to be captured in the database:

Title Author Bio ISBN Subject Pages Publisher
Beginning MySQL Database Design and Optimization

Chad Russell,

Jon Stephens

Chad Russell is a programmer and system administrator who owns his own internet hosting company. Jon Stephens is a member of the MySQL AB documentation team. 90593324 MySQL Database Design 520 Apress

In the example shown above, a lot of storage space will be wasted if any one criterion (author or publisher) is considered as the identification key, therefore database normalization is essential.

It is a step-by-step process that cannot be carried out haphazardly. The following steps will help in attaining database normalization in MySQL.

Step 1: Create first normal form (1NF)

The database normalization process involves getting data to conform to progressive normal forms, and a higher level of database normalization cannot be achieved unless the previous levels have been satisfied. First normal form is the basic level of database normalization.

For 1NF, ensure that the values in each column of a table are atomic; which means they are unique, containing no sets of values. In our case, Author and Subject do not comply.

One method for bringing a table into 1NF is to separate the entities contained in the table into separate tables. In our case, this would result in Book, Author, Subject and Publisher tables.

Book’s table:

ISBN

Title

Pages

1590593324

Beginning MySQL Database Design and Optimization

520

Author’s table:                                                                                         

Author_ID

First Name

Last Name

1

Chad

Russell

2

Jon

Stephens

3

Mike

Hilyer

Subject’s table:

Subject _ID

Last_name

1

Russell

2

Stephens

Publisher’s table:

     Publisher_ID

Name

Address

City

State

Zip

1

Apress

2   580, Ninth street, station 219

Berkeley

California

94710

Step 2: Define relationships

Three types of relations can be established:

  • One-to-(Zero or)-one (Example: marriage)
  • One-to-(Zero or)-many (Example: kids)
  • Many-to-many (Example: Facebook)

The Book’s table may have many to many relations with the Author’s table.

Author’s table may have many books and a book may have more than one author.

The Book’s table may have many to many relations with the Subject table.

The books may fit in many subjects and the subjects may have many books.

Many-to-many relations have to be presented by “link” tables

Book Author table:

ISBN

Subject_ID

1590593324

1

1590593324

2

Book_Subject table:

ISBN

Subject_ID

1590593324

1

1590593324

2

One-to-many in our example will be Books to Publisher. Each book has only one Publisher but one Publisher may have many books.

We can achieve one-to-many relationships with a foreign key. A foreign key is a mechanism in database management systems (DBMS) that defines relations and creates constraints between data segments. It is not possible to review what is not related to the specific book. It is not possible to have a book without an author or publisher.

When deleting a publisher, all the related books may need to be deleted along with the reviews of those books. The authors would not need to be deleted.

The foreign key is introduced in the table that represents the “many”, pointing to the primary key on the “one” table. Since the Book table represents the many portion of the one-to-many relationship, the primary key value of the Publisher as in a Publisher_ID column as a foreign key is added.

  • ISBN

    Title

    Pages

    Publisher_ID

    1590593324

    Beginning MySQL Database Design and Optimization

    520

    1

Step 3: Make second normal form (2NF)

Second normal form (2NF) cuts down the tautological/superfluous data in a table by selecting it, putting it in new tables and by establishing relations among them.

In database normalization, 2NF is about the relations between the composite key columns and non-key columns. That means the non-key columns have to depend on the whole composite key.

Here, the primary key is composite to eliminate the possibility of the same person writing more than one review of the book. Reviewer URL is dependent only on the Reviewer ID which is only part of the whole composite key.

This table does not comply with the 2NF:

ISBN

Reviewer ID

Summary

Reviewer_URL

1590593324

3

A great book!

http://www.openwin.org

Step 4: Third Normal Form (3NF)

This requires that all columns depend directly on the primary key. Tables violate the 3NF when one column depends on another column which, in turn, depends on the primary key – a transitive dependency.

In the publisher table, the City and State are dependent on the zip code, not the Publisher_ID.

Publisher_ID

Name

Address

City

State

Zip

1

Apress

2580, Ninth street, station 219

Berkeley

California

94710

To comply with 3NF we have to move these outside the publisher’s table:

Zip

City

State

94710

Berkeley

California

Through the process of database normalization, we bring our schema’s tables into conformance with progressive normal forms. As a result, the tables each represent a single entity – a book, an author or a subject, for example – and we benefit from decreased redundancy, fewer anomalies and improved efficiency.


Ronen Baram is a MySQL sales consultant at Oracle for Australia/New Zealand markets. As part of his job, he meets with clients all over Australia and New Zealand, as well as the rest of APAC. Baram has 15 years of IT experience and has a special interest in Linux OS.

(This tutorial is based on a presentation given by Ronen Baram at Oracle Develop Conference held in Hyderabad this year. Compiled by Sharon D’Souza.)

This was last published in May 2011

CW+

Features

Enjoy the benefits of CW+ membership, learn more and join.

Read more on Database management

Join the conversation

2 comments

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

If using the data model I presented, this could return some interesting results for certain data sets.
SELECT
zip
,GROUP_CONCAT(DISTINCT CONCAT_WS(' ',c.city,s.state)) combo
,COUNT(DISTINCT CONCAT_WS('_',c.city_id,s.state_id)) combo_count
FROM address a
LEFT JOIN city c ON (a.city_id =c.city_id )
LEFT JOIN state s ON (a.state_id=s.state_id)
GROUP BY a.zip
HAVING combo_count>1;
Cancel
One issue with the address is that the ZipCode cannot be used as the key. One city may have multiple zip codes and one zip code may have multiple cities/ towns within.
Cancel

-ADS BY GOOGLE

SearchCIO

SearchSecurity

SearchNetworking

SearchDataCenter

SearchDataManagement

Close