📜 ⬆️ ⬇️

The second normal form in the examples

I will not retell here everything I know about normal forms and I am not going to write an exhaustive introduction on relational algebra and discrete mathematics, for this it is better to open a textbook. Rather, I will try to explain in simple words why all this is needed and give examples.

What is the second normal form or 2NF? So that the three-year-old child understood ...
First, let's look at the goals pursued by normalization. Under the cut a few terms from the discrete.

The purpose of the reduction to the first normal form (1NF) is to enable the construction of statements about the data using the first-order predicate logic formulas. In practice, this makes it possible to construct sample queries according to conditions, since the values ​​are of the same nature.

For example, if the relationship 'Family' has the attribute 'Children', we can easily compare the two strings 'Vasya' and 'Anya' and determine their lexicographical order. But comparing the lines 'Vasya' and 'Anya, Sasha' and determining their order is meaningless, and trying to solve this problem without decomposition will either lead to the introduction of rules for comparing scalars and lists, or to complicate the formula language. The first-order predicate logic is not enough here. Therefore, 1NF requires that all values ​​be simple values ​​from a domain.
')
That is, the first NF deals with the structure of the values ​​of records.

The second (and third) NF already deals with keys and dependencies in the schema. We list its goals with explanations.

  1. The main goal of bringing to the second normal form is the desire to get rid of the redundancy of data storage and, as a consequence, to avoid anomalies of modification of these data (anomalies of change, insertion and deletion)
  2. Second in order, but not by value, the goal of normalization in 2NF is to split the data model into separate relations as much as possible so that they can be combined and used in expressions in new ways not originally intended.
  3. Minimize efforts to change the scheme if necessary. The fewer dependencies within the schema, the fewer changes it will need when changing the data model.
  4. The clarity of the scheme for the user. How to keep all the data in one large table, it is easier to present the data as several related and logically separated relationships. It is easier to read, perceive, design and maintain. In the end, any data model starts on a blackboard or paper in the form of circles, blocks and lines that children and programmers love to draw.


For example, we have a scheme

R = { '', ' -', ' ' } 


where the primary key is the 'Identifier' , and the alternative key is the 'Name of the SD-disk' . This scheme is in 2NF, since the non-key attribute 'Group Name' depends only on the key and does not depend on a subset of the attributes of this key (which are not present, see below).

The relationship scheme has 2NF if any non-key attribute depends only on the key, and does not depend on a subset of its attributes.

In general, it is possible to raise the question of the 2NF mismatch only if there are composite keys in the scheme. Schemes with simple keys as in the example always have 2NF. This scheme is just an example of such a case, since both keys (and this is the 'Identifier' and 'The name of the SD-disk' ) are simple, and the subsets of the attributes of these keys are empty.

The 2NF mismatch is shown in the diagram.

 R = { ' ', ' -', ' ', ' ', '' } 


The same song can be included in several discs, albums of the same name with songs of the same name from different groups, such as tributes, are also theoretically possible. Therefore, the key will be {'Group name', 'CD title', 'Song name'} . At the same time, the attributes Author of Words and Composer depend on the set of attributes {'Group Name', 'Song Name'} . This is a violation of 2NF.

The consequence of this model is the redundancy of storing the values ​​of the 'Word Author' and 'Composer' attributes for each CD-ROM which includes a song. In the field of music, these values ​​do not change, but in other domain areas, a change in such redundant data can lead to modification anomalies and a contradictory state of the database.

Another consequence is that songs that have not yet been released on CDs, but simply broadcast by radio or released on other media, do not fit the specified data scheme. Accordingly, we will not be able to add a new song to the database until it is released on the CD. This is an example of insertion anomaly.

Similarly, if we want to remove any disc from the database, we will have to lose the information about the authors and composers of all the songs that are included only in this disc, because in this model it is not possible to provide information about the author and the composer, if the song is not included in any diabetes. This is an example of a deletion anomaly.

To avoid such anomalies and remove redundancy, we need to break the scheme, that is, to decompose, into two schemes:

 R1 = { ' ', ' ', ' ' } R2 = { ' ', ' ', '', '' } 


Both schemes have 2NF, R1 - because it has no non-key attributes, and R2 - because 'Author' and 'Composer' depend on the key {'Group Name', 'Song Name'} and do not depend (functionally) on any of the attributes ' The name of the group or the name of the song .

We illustrate with another example. We have parts in warehouses. All this is presented in the table of the form.

 ---------------------------------------------- |  |  |  | _ | ==================---------------------------- 


The key in this scheme is the pair {'Detail', 'Warehouse'} , but 'Warehouse Address' functionally depends only on the 'Warehouse' attribute, that is, on a subset of the key attributes. Therefore, the 2NF requirements are not met. What is bad? First, the warehouse address will be duplicated for all parts in the warehouse (redundancy) and if the address changes, all these records will need to be changed to preserve integrity (deletion anomalies may occur). Secondly, if there are no parts in the warehouse, then we are not able to store the address of the warehouse, since the scheme does not envisage such a situation. Therefore, the addition of a new warehouse is impossible (insert anomaly), and the removal of parts from the warehouse means that we will lose information about its address (deletion anomaly).

That's all.
I hope it was clear, I went to deal with 3NF!

Source: https://habr.com/ru/post/113260/


All Articles