📜 ⬆️ ⬇️

Normalization of relationships. Six normal forms

In this topic I will touch on 6 normal forms and methods for bringing tables into these forms.

The process of designing a database using the NF method is iterative and consists in the sequential transfer of the relation from 1NF to NF of a higher order according to certain rules. Each next NF is limited to a certain type of functional dependencies and the elimination of the corresponding anomalies when performing operations on database relations, as well as preserving the properties of previous NFs.

Terms Used


An attribute is a property of some entity. Often called a table field.

An attribute domain is the set of valid values ​​that an attribute can take.
')
A tuple is a finite set of interrelated valid attribute values ​​that together describe an entity (a row of a table).

A relation is a finite set of tuples (table).

A relationship diagram is a finite set of attributes that define an entity. In other words, this is a table structure consisting of a specific set of fields.

Projection is a relation obtained from a given by deleting and (or) rearranging certain attributes.

The functional relationship between the attributes (attribute sets) X and Y means that for any valid set of tuples in this respect: if two tuples match the value of X, then they match the value of Y. For example, if the value of the attribute "Company Name" is Canonical Ltd , then the value of the Headquarters attribute in such a tuple will always be Millbank Tower, London, United Kingdom. Designation: {X} -> {Y}.

The normal form is a requirement imposed on the structure of tables in the theory of relational databases to eliminate redundant functional dependencies between attributes (table fields) from the database.

The method of normal forms (NF) consists in gathering information about the objects of solving the problem within one relationship and the subsequent decomposition of this relationship into several interrelated relations based on the procedures for normalizing relations.

The goal of normalization is to eliminate redundant data duplication, which is the cause of anomalies that have arisen when adding, editing and deleting tuples (rows of a table).

Anomaly is such a situation in the database table, which leads to a contradiction in the database or significantly complicates the processing of the database. The reason is unnecessary duplication of data in the table, which is caused by the presence of functional dependencies on non-key attributes.

Anomalies-modifications are manifested in the fact that a change in one data may entail viewing the entire table and a corresponding change in some records of the table.

Anomalies-deletions - when deleting a tuple from the table, information that is not directly connected with the record being deleted may disappear.

Anomalies-additions occur when the information in the table cannot be placed until it is complete, or the insertion of a record requires additional viewing of the table.

First normal form


A relation is in 1NF, if all its attributes are simple, all domains used must contain only scalar values. There should be no repetitions of rows in the table.

For example, there is a table "Cars":

FirmModels
BMWM5, X5M, M1
NissanGT-R

Violation of the normalization of 1NF occurs in BMW models, because One cell contains a list of 3 elements: M5, X5M, M1, i.e. it is not atomic. Let's translate the table to 1NF:
FirmModels
BMWM5
BMWX5M
BMWM1
NissanGT-R

Second normal form


A relation is in 2NF if it is in 1NF and every non-key attribute is irreducibly dependent on the Primary Key (PC).

Irreducibility means that the potential key does not contain a smaller subset of attributes, from which this functional dependence can also be derived.

For example, given the table:

ModelFirmPriceA discount
M5BMW5,500,000five%
X5MBMW6000000five%
M1BMW2.5 millionfive%
GT-RNissan5,000,000ten%

The table is in the first normal form, but not in the second. The price of the car depends on the model and the company. Discount depends on the company, that is, the dependence on the primary key is incomplete. This is corrected by decomposition into two relationships in which non-key attributes depend on the PC.

ModelFirmPrice
M5BMW5,500,000
X5MBMW6000000
M1BMW2.5 million
GT-RNissan5,000,000

FirmA discount
BMWfive%
Nissanten%


Third normal form


The relationship is in 3NF when it is in 2NF and each non-key attribute is non-transitively dependent on the primary key. Simply put, the second rule requires you to take out all non-key fields, the contents of which can refer to several entries in a table in separate tables.

Consider the table:

ModelScorePhone
BMWReal-auto87-33-98
AudiReal-auto87-33-98
NissanNext-Auto94-54-12


The table is in 2NF, but not in 3NF.
In relation to the attribute "Model" is the primary key. Cars do not have personal phones, and the phone depends solely on the store.
Thus, in relation to the following functional dependencies exist: Model → Store, Shop → Phone, Model → Phone.
Dependency Model → Phone is transitive, therefore, the relationship is not in 3NF.
As a result of the separation of the original relationship, two relations are obtained that are in 3NF:


Real-auto 87-33-98
Real-auto 87-33-98
Next-Auto 94-54-12
ModelScore
BMWReal-auto
AudiReal-auto
NissanNext-Auto


Beuys-Codd Normal Form (NFBC) (a particular form of the third normal form)


The definition of 3NF is not quite suitable for the following relationships:
1) the relationship has two or more potential keys;
2) two or more potential keys are composite;
3) they intersect, i.e. have at least one attribute.

For relationships that have one potential key (primary), the NFBK is 3NF.

The relation is in the NFBK, when each non-trivial and left-irreducible functional dependence has a potential key as a determinant.

Suppose we consider a relationship that represents data on parking a day:

Parking numberStart timeEnd timeRate
one09:3010:30Thrifty
one11:0012:00Thrifty
one2:00 pm3:30 pmStandard
210:0012:00Premium V
212:002:00 pmPremium V
23:00 pm18:00Premium A

The tariff has a unique name and depends on the selected parking and the availability of benefits, in particular:

Thus, the following composite primary keys are possible: {Parking number, Start time}, {Parking number, End time}, {Rate, Start time}, {Rate, End time}.

The ratio is in 3NF. The requirements of the second normal form are satisfied, since all attributes are included in one of the potential keys, and there are no non-key attributes in relation. There are also no transitive dependencies, which meets the requirements of the third normal form. However, there is a functional dependence Tariff → Parking number, in which the left part (determinant) is not a potential key of the relationship, that is, the relationship is not in Boyes-Codd normal form.

The disadvantage of this structure is that, for example, by mistake, the “Thrifty” tariff can be attributed to booking a second parking, although it can only apply to the first parking.

It is possible to improve the structure by decomposing a relationship into two and adding an attribute. It has benefits , obtaining relations satisfying NFBK (the attributes included in the primary key are underlined.):

Tariffs
RateParking numberIt has benefits
ThriftyoneYes
StandardoneNot
Premium A2Yes
Premium V2Not

Booking
RateStart timeEnd time
Thrifty09:3010:30
Thrifty11:0012:00
Standard2:00 pm3:30 pm
Premium V10:0012:00
Premium V12:002:00 pm
Premium A3:00 pm18:00

Fourth normal form


A relation is in 4NF if it is in the NFBK and all nontrivial multivalued dependencies are in fact functional dependencies on its potential keys.

In relation to R (A, B, C), there is a multi-valued relationship RA -> -> RB if and only if the set of values ​​of B, corresponding to a pair of values ​​of A and C, depends only on A and does not depend on C.

Suppose that restaurants produce different types of pizza, and restaurant delivery services operate only in certain areas of the city. The composite primary key of the corresponding relationship variable includes three attributes: {Restaurant, Pizza Type, Delivery Area}.

This relationship variable does not correspond to 4NF, since there is the following multivalued relationship:
{Restaurant} → {Type of pizza}
{Restaurant} → {Delivery Area}

That is, for example, when adding a new type of pizza, you will have to make one new tuple for each delivery area. A logical anomaly is possible, in which only certain delivery areas from the areas served by the restaurant will correspond to a certain type of pizza.

To prevent anomalies, you need to decompose the relationship by placing independent facts in different relationships. In this example, decompose into {Restaurant, Kind of Pizza} and {Restaurant, Delivery Area}.

However, if we add an attribute functionally dependent on a potential key to the original relationship variable, for example, the price taking into account the cost of delivery ({Restaurant, Pizza Type, Delivery Area} → Price), then the resulting ratio will be in 4NF and it cannot be decomposed without losses.

Fifth normal form


Relations are in 5NF if it is in 4NF and there are no complex dependent connections between attributes.
If “Attribute depends on“ Attribute_2 ”, and“ Attribute_2 ”in turn depends on“ Attribute_3 ”, and“ Attribute_3 ”depends on“ Attribute_1 ”, then all three attributes are necessarily included in one tuple.

This is a very strict requirement that can only be met under additional conditions. In practice, it is difficult to find an example of the implementation of this requirement in its pure form.

For example, some table contains three attributes "Supplier", "Product" and "Buyer". Buyer1 purchases several Products from Supplier1. Buyer 1 purchased a new Product from the Supplier2. Then, by virtue of the above requirement I, the supplier1 is obliged to supply the Purchaser1 with the same new Product, and the Supplier2 must supply the Purchaser 1, with the exception of the new Goods, the entire product line of the Supplier1. This does not happen in practice. The buyer is free in his choice of goods. Therefore, to eliminate this difficulty, all three attributes are spread across different relationships (tables). After selecting three new relationships (Supplier, Product and Buyer), you need to remember that when retrieving information (for example, About Buyers and Products), you need to connect all three relations in the request. Any combination of the combination of two relationships from the sin will inevitably lead to the extraction of incorrect (incorrect) information. Some DBMS are equipped with special mechanisms that eliminate the extraction of unreliable information. Nevertheless, the general recommendation should be followed: the database structure should be structured in such a way as to avoid the use of 4NF and 5NF.

The fifth normal form is focused on working with dependent compounds. These dependent connections between the three attributes are very rare. Dependent connections between four, five or more attributes are almost impossible to specify.

Domain Key Normal Form


A relationship variable is in DK / NF if and only if each constraint imposed on it is a logical consequence of domain constraints and key constraints imposed on a given relationship variable.
Domain Restriction - a restriction that requires you to use values ​​from a specific domain for a particular attribute. The restriction is essentially the task of specifying a list (or a logical equivalent of a list) of valid values ​​of a type and declaring that the specified attribute has a given type.

Key constraint is a constraint stating that some attribute or combination of attributes is a potential key.

Any relationship variable that is in DK / NF is necessarily in 5NF. However, not any relationship variable can be brought to DK / NF.

Sixth normal form


A relation variable is in the sixth normal form if and only if it satisfies all nontrivial dependencies of the connection. It follows from the definition that a variable is in 6NF if and only if it is irreducible, that is, it cannot be subjected to further decomposition without loss. Each relationship variable that is in 6NF is also in 5NF.

The idea of ​​"decomposition to the end" was put forward before the start of research in the field of historical data, but did not find support. However, for historical databases, the maximum possible decomposition allows you to combat redundancy and simplifies maintaining the integrity of the database.

For historical databases, U_ operators are defined, which unpack the relations by the specified attributes, perform the corresponding operation, and package the result obtained. In this example, the connection of the projections of the relationship should be made using the operator U_JOIN.

Workers
Tab.â„–TimePositionHome address
657501-01-2000: 10-02-2003locksmithLenin St., 10
657511-02-2003: 15-06-2006locksmith22 Sovetskaya St.
657506/16/2006: 05/03/2009brigadier22 Sovetskaya St.

The variable “Employees” is not in 6NF and can be decomposed into the variable “Employees 'Positions” and “Employers' Home Addresses”.

Employees positions
Tab.â„–TimePosition
657501-01-2000: 10-02-2003locksmith
657506/16/2006: 05/03/2009brigadier

Employees' home addresses
Tab.â„–TimeHome address
657501-01-2000: 10-02-2003Lenin St., 10
657511-02-2003: 15-06-200622 Sovetskaya St.

Literature


For a more in-depth and thorough study of the topic considered, the book “Introduction to Database Systems” by Chris J. Data is recommended, on the basis of which this article was written.

Source: https://habr.com/ru/post/254773/


All Articles