📜 ⬆️ ⬇️

Normalization of relationships. First and second normal forms

Foreword


The normalization of relations (tables) is one of the fundamental parts of the theory of relational databases. Normalization is intended to get rid of redundancy in the relationship and modify their structure so that the process of working with them is not burdened by various extraneous difficulties. If you ignore this approach, the design efficiency rapidly decreases, which, together with other such liberties, can lead to critical consequences.

It is useful for any specialist, by the nature of his activity, one way or another connected with the design of relational databases, to understand and be able to normalize relations. And with this post I would like to start a small series of publications devoted to normal forms, aimed at giving those Habrahabr readers, who for various reasons have not yet mastered this topic, the ability to easily fill this gap in knowledge.

The article is not intended to provide a detailed and accurate statement of the principles of normalization, since this is obviously not possible within the framework of a blog due to the large amounts of information required for publication with this approach. In addition, for this purpose there is a large amount of literature written by excellent experts. My task, I believe, is to popularly demonstrate and explain the basic principles.
')

Terms Used


An attribute is a property of some entity. Often called a table field.
An attribute domain is the set of valid values ​​that an attribute can take.
A tuple is a finite set of interrelated valid attribute values ​​that together describe an entity (a row of a table).
A relation is a finite set of tuples (table).
A relationship diagram is a finite set of attributes that define an entity. In other words, it is a table structure consisting of a specific set of fields.
Projection is a relation obtained from a given by deleting and (or) rearranging some attributes.
The functional dependency between attributes (attribute sets) X and Y means that for any valid set of tuples in this respect: if two tuples match on the value of X, then they match on the value of Y. For example, if the value of the attribute "Company Name" is Canonical Ltd , then the value of the “Headquarters” attribute in such a tuple will always be Millbank Tower, London, United Kingdom. Designation: {X} -> {Y}.

First normal form


The relationship is in the first normal form (abbreviated 1NF) if all its attributes are atomic, that is, if none of its attributes can be divided into simpler attributes that correspond to some other properties of the described entity.

We will call the original relation the main one, and the value of a non-atomic attribute - subordinate.

In order to normalize the original relation, the attributes of which are non-atomic, it is necessary to combine the schemes of the main and subordinate relations. In addition, if, for example, the table corresponding to the unnormalized relation is already contained in the database and is filled with information, the task is complicated by the fact that the value of a non-atomic attribute can in turn contain several tuples.

It is necessary to clarify the above with an example. Consider a relationship that has the attributes "Employee ID", "Full Name", "Position", "Projects". Obviously, one employee can work on several projects. Suppose that a project is described by an identifier, a name and a date of delivery.
Employee codeFull namePositionProjects
oneIvanov Ivan IvanovichProgrammerID: 123; Name: Steam boiler control system; Delivery date: 09/30/2011
ID: 231; Name: PS for control and notification of the maximum permissible concentration of various gases in the room; Delivery date: 11/30/2011
ID: 321; Name: Face Recognition Module for the protection system; Delivery date: 12/01/2011
It is easy to see that not all the attributes of this relationship are atomic (indivisible). In particular, the “Projects” attribute can be divided into three simpler attributes: “Project code”, “Name”, “Date of delivery”, and the value of this attribute for an employee Ivan Ivanovich Ivanov contains several tuples — information about three projects.

Note: from a certain point of view, the “full name” attribute can also be considered non-atomic, and in this case it should also be divided into simpler ones, such as “Last Name”, “First Name”, “Middle Name”.

Now it is time to consider the relationship normalization algorithm to 1NF.
  1. Create a new relationship whose scheme will be obtained by merging the main and subordinate schemes of the original relationship into one.
  2. For each tuple of the original relation, include in the new as many lines as there are tuples in the subject relation of this tuple.
  3. Fill in the values ​​of the attributes of the new relationship that match the attributes of the subordinate relationship.
  4. Fill the lines of the new relationship with the values ​​of the atomic attributes of the source.
Apply this algorithm to the above relation. The scheme of the new relationship will consist of 6 attributes: “Employee ID”, “Full Name”, “Position”, “Project ID”, “Name”, “Delivery Date”. For one single tuple of a given relation, we add three new lines to the new one for each project (by the number of tuples in the subordinate relation). Now you can fill in the values ​​of separated attributes with tuples from the subordinate relation. Then we transfer to each of these lines the values ​​of atomic attributes: “Employee ID”, “Full Name”, “Position” (as you guessed, all three lines will contain the same values ​​of these attributes).

The result will look like this:
Employee codeFull namePositionProject codeTitleDate of delivery
oneIvanov Ivan IvanovichProgrammer123Steam Boiler Control System09/30/2011
oneIvanov Ivan IvanovichProgrammer231PS for monitoring and notification of the exceedances of MPC of various gases in the room11/30/2011
oneIvanov Ivan IvanovichProgrammer321Face Recognition Module for Security System12/01/2011

Second normal form


It is clear that the relation in 1NF may also be redundant. To eliminate it is the second normal form. But before proceeding to its description, you must first identify the shortcomings of the first.

Let the original relation contain information about the supply of certain goods and their suppliers.
Supplier CodeCityCity statusProduct codeamount
oneMoscow20one300
oneMoscow202400
oneMoscow203100
2Yaroslavltenfour200
3Stavropolthirtyfive300
3Stavropolthirty6400
fourPskov157100
It is known in advance that in this respect the following functional dependencies are contained:
{{Supplier Code, Product Code} -> {Quantity},
{Supplier Code} -> {City},
{Supplier Code} -> {Status},
{City} -> {Status}}

Primary key in relation to: {Supplier Code, Product Code}.

Obviously, the relationship is redundant: it describes two entities - the supply and the supplier. In this regard, the following anomalies occur:
The physical meaning of the redundancy of the original relationship lies in the fact that it describes not one entity, but two - the supply and the supplier .

To eliminate these anomalies, it is necessary to break the original relation into a projection:
  1. The first one should include the primary key and all non-key attributes that are clearly dependent on it.
  2. The remaining projections (in this case, it is one) will include non-key attributes that depend on the primary key implicitly , along with the part of the primary key on which these attributes depend explicitly.
As a result, two relationships will be obtained:
Supplier CodeProduct codeamount
oneone300
one2400
one3100
2four200
3five300
36400
four7100
The first relation now corresponds to the following functional dependencies:
{Supplier Code, Product Code} -> {Quantity}
Supplier CodeCityCity status
oneMoscow20
2Yaroslavlten
3Stavropolthirty
fourPskov15
The second relation is:
{{Supplier Code} -> {City},
{Supplier Code} -> {Status},
{City} -> {Status}}

Such a partition eliminated the anomalies described above: you can add information about the supplier who has not yet delivered the product, delete the delivery information without deleting information about the supplier, and easily update the information if the supplier moved to another city.

Now we can formulate the definition of the second normal form, to which, most likely, the reader has already been able to guess independently: the relation is in the second normal form (abbreviated 2NF) if and only if it is in the first normal form and each of its non-key attributes is irreducibly dependent on primary key.

Literature


For a more in-depth and thorough study of this topic, we recommend the book “Introduction to Database Systems” by Chris J. Data, on the basis of which this article was written.

Source: https://habr.com/ru/post/129195/


All Articles