šŸ“œ ā¬†ļø ā¬‡ļø

Basics of relational algebra

Relational algebra is based on set theory and is the basis of the logic of database operation.
When I first studied the database and SQL device, a preliminary acquaintance with relational algebra greatly helped further knowledge to properly fit in my head, and I will try to make this article have a similar effect.

So if you are going to start your training in this area or you just became interested, I ask under the cat.


Relational database

')
To begin with, we introduce the concept of a relational database in which we will perform all the actions.

A relational database is a set of relations containing all the information that should be stored in the database. In this definition, we are interested in the term relation, but for the time being we will leave it without a strict definition.
Better imagine a table of products.

PRODUCTS table
IDNAMECOMPANYPRICE
123CookiesLLC ā€The Dark Sideā€190
156TeaLLC ā€The Dark Sideā€60
235PineapplesJSC ā€Fruitsā€100
623TomatoesLLC ā€Vegetablesā€130


A table consists of 4 rows, a row in a table is a tuple in relational theory. The set of ordered tuples is called a relation.
Before we define the relationship, we introduce another term - the domain. Domains in relation to a table are columns.

For clarity, we now introduce a strict definition of a relationship.

Let N be the sets D1, D2, .... Dn (domains), the relation R over these sets is the set of ordered N-tuples of the form <d1, d1, ... dn>, where d1 belongs to D1 and so on. The sets D1, D2, .. Dn are called the domains of the relation R.
Each element of a tuple is the value of one of the attributes corresponding to one of the domains.

Keys in a relationship

In relation to the requirement is that all tuples must differ. To uniquely identify a tuple, there is a primary key. The primary key is an attribute or a set of the minimum number of attributes that uniquely identifies a particular tuple and does not contain additional attributes.
It is understood that all attributes in the primary key must be necessary and sufficient to identify a specific tuple, and the exclusion of any of the attributes in the key will make it insufficient for identification.
For example, in such a table, the key will be a combination of attributes from the first and second columns.

DRIVERS table

COMPANYDRIVER
LLC ā€The Dark Sideā€Vladimir
LLC ā€The Dark Sideā€Michael
JSC ā€Fruitsā€Ruslan
LLC ā€Vegetablesā€Vladimir


It can be seen that there can be several drivers in an organization, and in order to unambiguously identify a driver, it is necessary to have a value from the column ā€œName of organizationā€ and from ā€œDriver’s nameā€. Such a key is called composite.

In a relational database, tables are interrelated and correlated with each other as main and subordinate. The connection of the main and subordinate tables is carried out through the primary key (primary key) of the main table and the foreign key (foreign key) of the subordinate table.
A foreign key is an attribute or set of attributes that is the primary key in the main table.

This preparatory theory will be enough to become familiar with the basic operations of relational algebra.

Relational algebra operations


The basic eight operations of relational algebra were proposed by E. Codd .

The first half of operations is similar to the same operations on sets. Some operations can be expressed through other operations. Consider most of the operations with examples.

For understanding, it is important to remember that the result of any operation of algebra on relations is another relation, which can then also be used in other operations.
Create another table, which is useful to us in the examples.

table SELLERS

IDSELLER
123OOO ā€œDartā€
156OJSC ā€Vedroā€
235CJSC ā€œVegetable Baseā€
623OJSC ā€Firmā€

We agree that in this table ID is the foreign key associated with the primary key of the PRODUCTS table.

To begin, consider the simplest operation - the name of the relationship. Its result will be the same relationship, that is, performing the PRODUCTS operation, we will receive a copy of the PRODUCTS relationship.

Projection

A projection is an operation in which attributes are selected from a relationship only from specified domains, that is, only necessary columns are selected from a table, and if there are several identical tuples, then only one instance of such a tuple remains in the resulting ratio.
For example, let's make a projection on the PRODUCTS table by selecting from it ID and PRICE.

Operation syntax:
Ļ€ (ID, PRICE) PRODUCTS

As a result of this operation, we obtain the relation:
IDPRICE
123190
15660
235100
623130


Sample

A sample is an operation that selects a set of rows in a table that satisfy specified conditions. The condition can be any logical expression.
For example, let's make a selection from a table with a price greater than 90.

Operation syntax:
σ (PRICE> 90) PRODUCTS

IDNAMECOMPANYPRICE
123CookiesLLC ā€The Dark Sideā€190
235PineapplesJSC ā€Fruitsā€100
623TomatoesLLC ā€Vegetablesā€130


In the sample condition, we can use any logical expression. Let's make another sample with a price of more than 90 and product ID less than 300:

σ (PRICE> 90 ^ ID <300) PRODUCTS

IDNAMECOMPANYPRICE
123CookiesLLC ā€The Dark Sideā€190
235PineapplesJSC ā€Fruitsā€100


Let's combine the projection and sampling operators. We can do this because any of the operators returns a relation as a result and also uses the relation as arguments.
From the table with products we will select all companies selling products that are cheaper than 110.

Ļ€ COMPANY σ (PRICE <100) PRODUCTS

COMPANY
LLC ā€The Dark Sideā€
JSC ā€Fruitsā€


Multiplication

Multiplication or Cartesian product is an operation performed on two relations, as a result of which we obtain a relationship with all domains from two initial relations. Tuples in these domains will be all possible combinations of tuples from the initial relationship. The example will be clearer.

We obtain the Cartesian product of the PRODUCTS and SELLERS tables.
Operation syntax:

PRODUCTS Ɨ SELLERS
You may notice that these two tables have the same domain ID. In this situation, domains with the same name get a prefix in the form of the name of the corresponding relationship, as shown below.
For brevity, we multiply not complete relations, but samples with the condition ID <235

(the same tuples are highlighted)
PRODUCTS.IDNAMECOMPANYPRICESELLERS.IDSELLER
123CookiesLLC ā€The Dark Sideā€190123OOO ā€œDartā€
156TeaLLC ā€The Dark Sideā€60156OJSC ā€Vedroā€
123CookiesLLC ā€The Dark Sideā€190156OJSC ā€Vedroā€
156TeaLLC ā€The Dark Sideā€60123OOO ā€œDartā€


For an example of using this operation, let us imagine the need to choose sellers with prices less than 90. Without a product, you would first need to get product IDs from the first table, then use the second table to get the necessary SELLER names, and using the product you will receive such a query:

Ļ€ (SELLER) σ (RODUCTS.ID = SELLERS.ID ^ PRICE <90) PRODUCTS Ɨ SELLERS

As a result of this operation, we obtain the relation:

SELLER
OJSC ā€Vedroā€


Compound and natural compound

The join operation is the reverse of the projection operation and creates a new relation from the two already existing ones. A new relation is obtained by concatenating the tuples of the first and second relations, and the concatenations are subjected to relations in which the values ​​of the given attributes match. In particular, if you combine the PRODUCTS and SELLERS relationships, these attributes will be the attributes of the ID domains.

Also, for clarity, you can imagine the connection as the result of two operations. First, the product of the original tables is taken, and then from the obtained relation, we make a selection with the condition of equality of attributes from the same domains. In this case, the condition is equality of PRODUCTS.ID and SELLERS.ID.

Let's try to connect the relationship PRODUCTS and SELLERS and get the relationship.

PRODUCTS.IDNAMECOMPANYPRICESELLERS.IDSELLER
123CookiesLLC ā€The Dark Sideā€190123OOO ā€œDartā€
156TeaLLC ā€The Dark Sideā€60156OJSC ā€Vedroā€
235PineapplesJSC ā€Fruitsā€100235CJSC ā€œVegetable Baseā€
623TomatoesLLC ā€Vegetablesā€130623OJSC ā€Firmā€


A natural connection gets a similar relationship, but if we have a correctly configured scheme in the database (in this case, the primary key of the PRODUCTS ID table is associated with the foreign key of the SELLERS ID table), then in the resulting relationship there is only one domain ID.

Operation syntax:
PRODUCTS ā‹ˆ SELLERS;

It turns out this attitude:

PRODUCTS.IDNAMECOMPANYPRICESELLER
123CookiesLLC ā€The Dark Sideā€190OOO ā€œDartā€
156TeaLLC ā€The Dark Sideā€60OJSC ā€Vedroā€
235PineapplesJSC ā€Fruitsā€100CJSC ā€œVegetable Baseā€
623TomatoesLLC ā€Vegetablesā€130OJSC ā€Firmā€


Intersection and subtraction.

The result of the intersection will be a relation consisting of tuples that are fully included in both relations.
The result of the subtraction will be a relation consisting of tuples that are tuples of the first relation and are not tuples of the second relation.
These operations are similar to the same operations on sets, so I think there is no need to describe them in detail.

Information sources



I will be grateful for the reasoned comments

Source: https://habr.com/ru/post/145381/


All Articles