Continued.
Previous parts:
1-3 ,
4-6 ,
7-910. Normalization of databases
The guidelines for properly designing
relational databases are set out in the relational data model. They are grouped into 5 groups, which are called
normal forms . The first normal form represents the lowest level of database normalization. The fifth level represents the highest level of normalization.
Normal forms are
recommendations for database design. You are not required to adhere to all five normal forms when designing databases. However, it is recommended to normalize the database to some extent because this process has several significant advantages in terms of efficiency and ease of handling your database.
- In the normalized database structure, you can produce complex data samples with relatively simple SQL queries.
- Data integrity Normalized database allows you to securely store data.
- Normalization prevents the appearance of redundant stored data . Data is always stored in only one place, which makes it easy to insert, update and delete data. There is an exception to this rule. Keys, by themselves, are stored in several places because they are copied as foreign keys to other tables.
- Scalability is the ability of a system to cope with future growth. For a database, this means that it must be able to work quickly when the number of users and data volumes increase. Scalability is a very important feature of any database model for RDBMS.
')
Here are some of the main points that are associated with the
normalization of databases :
- Arrange data into logical groups or sets.
- Finding links between data sets. You have already seen examples of one-to-many and many-to-many connections.
- Minimize data redundancy.
A very small number of databases follow all five normal forms provided in the relational data model. Usually databases are normalized to the second or third normal form. The fourth and fifth forms are rarely used. Therefore, I limit myself to tell you only about the first three.
11. The first normal form (1NF)
The first normal form says that a database table is a representation of the
essence of your system that you are creating. Examples of entities: orders, customers, booking tickets, hotel, product, etc. Each entry in the database represents one instance of the entity. For example, in the customer table, each entry represents one customer.
Primary key
Rule: each table has a primary key consisting of the smallest possible number of fields.As you know, a primary key can consist of several fields. For example, you can choose a first and last name as the primary key (and hope that this combination will always be unique). It will be a much better choice social number. Insurance as a primary key, since This is the only field that uniquely identifies a person.
Even better, when there is no obvious candidate for the title of the primary key, create a
surrogate primary key in the form of a numerical auto-increment field.
Atomicity
Rule: the fields do not have duplicates in each record and each field contains only one value.Take, for example, the site of car collectors, where every collector can register his cars. The table below stores information about registered cars.

Horizontal data duplication is bad practice.
With this design option, you can save only five cars and if you have less than 5, then you are wasting free space in the database for storing empty cells.
Another example of poor design practice is storing multiple values ββin a cell.

Multiple values ββin one cell.
The correct solution in this case would be to allocate cars into a separate table and use a foreign key that refers to this table.
The order of the entries should not matter.
Rule: the order of table entries should not matter.You may be inclined to use the order of entries in the customer table to determine which of the customers has registered first. For these purposes, you better create the date and time fields for customer registration. The order of the records will inevitably change when customers are deleted, changed or added. That is why you should never rely on the order of records in a table.
In the next part, we consider the second normal form (2NF).
12. The second normal form.
In order for the database to be normalized according to the second normal form, it must be normalized according to the first normal form. The second normal form is related to data redundancy.
Data redundancy.
Rule: fields with a non-primary key should not be dependent on the primary key.May sound a little abstruse. But this means that you should store in the table only data that is directly related to it and not related to another entity. Following the second normal form is a matter of finding data that is often duplicated in table entries and which may belong to another entity.

Duplication of data among the records in the store.
The table above may belong to a company that sells cars and has several stores in the Netherlands.
If you look at this table, you will see multiple examples of data duplication among the records. The
brand field could be allocated to a separate table. As well as the
type field (model), which could also be highlighted in a separate table, which would have many-to-one relationships with the
brand table because a brand may have different models.
The
store column contains the name of the store where the machine is currently located.
A Store is an obvious example of data redundancy and a good candidate for a single entity that should be linked to a car table
by a foreign key relationship .
Below is an example of how you can model a database for cars, avoiding data redundancy.

In the example above, the
car table has a foreign key β a reference to the
type and
store tables. The brand column has disappeared because the brand has an implicit link through the
type table. When there is a link to type, there is a link to brand, too. type is owned by brand.
Data redundancy has been substantially eliminated from our database model. If you are picky enough, then you may not be satisfied with this decision. What about the
country_of_origin field in the
brand table?
So far there are no duplicates because there are only four brands from different countries. The attentive database developer should highlight the names of countries in a separate table
country .
And even now you should not be satisfied with the result because you could also separate the
color field into a separate table.
How strictly you approach the creation of your tables is up to you and depends on the specific situation. If you plan to store a huge number of car units in the system and you want to be able to search by color (color), then it would be a wise decision to select colors in a separate table so that they are not duplicated.
There is another case where you may want to highlight colors in a separate table. If you want to allow company employees to enter data about new cars you want them to have, it is possible
to choose the color of the car from a predefined list. In this case, you will want to store all possible colors in your database. Even
if there are no machines with such a color yet , you will want these colors to be present in the database so that workers can select them. This is definitely the case when you need to highlight the colors in a separate table.
13. The third normal form.
The third normal form is associated with
transitive dependencies . Transitive dependencies between database fields exist when the values ββof non-key fields depend on the values ββof other non-key fields. For a database to be in third normal form, it must be in second normal form.
Transitive dependencies.
Rule: there can be no transitive dependencies between fields in a table.
The client table (my clients are players of the German and French football teams) below contains transitive dependencies.

In this table, not all fields depend solely on the primary key. There is a separate link between the postal_code field and the fields of the city (city) and province (province). In the Netherlands, both the city and the province are determined by the postal code, zip code. Thus, there is no need to store the city and province in the client table. If you know the postal code, then you already know the city and province.
Such transitive dependencies should be avoided if you want your database model to be in third normal form.
In this case, the elimination of transitive dependencies from the table can be achieved by removing the city and province fields from the table and storing them in a separate table containing the postal code (primary key), the name of the province and the name of the city. Obtaining a combination of a postal code-city-province for an entire country can be a very non-trivial task. That is why such tables are often sold.
Another example for applying a third normal form is the (too) simple example of an online store's order table below.

VAT (value added tax) is a percentage that is added to the product price (19% in this table). This means that the value of total_ex_vat can be calculated from the value of total_inc_vat and vice versa. You should store one of these values ββin a table, but not both. You must assign the task of calculating total_inc_vat from total_ex_vat or vice versa on the program that uses the database.
The third normal form says that you should not store data in a table that can be obtained from other (non-key) table fields. Especially in the example of the customer table, following a third normal form requires either a large amount of work or the acquisition of a commercial version of the data for such a table.
The third normal form is not always used when designing databases. When developing a database, you should always compare the benefits of a higher normal form versus the amount of work required to apply the third normal form and maintain the data in that state. In the case of the client table, I personally would prefer not to normalize the table to the third normal form. In the last example with VAT, I would use the third normal form. Storing data that is
reproducible from existing data
is usually a bad idea.