Basics of relational algebra

Relational algebra is based on set theory and is the basis of the logic of database operation.
When I first studied the database and SQL device, a preliminary acquaintance with relational algebra greatly helped further knowledge to properly fit in my head, and I will try to make this article have a similar effect.

So if you are going to start your training in this area or you just became interested, I ask under the cat.

Relational database

')
To begin with, we introduce the concept of a relational database in which we will perform all the actions.

A relational database is a set of relations containing all the information that should be stored in the database. In this definition, we are interested in the term relation, but for the time being we will leave it without a strict definition.
Better imagine a table of products.

PRODUCTS table

ID	NAME	COMPANY	PRICE
123	Cookies	LLC ”The Dark Side”	190
156	Tea	LLC ”The Dark Side”	60
235	Pineapples	JSC ”Fruits”	100
623	Tomatoes	LLC ”Vegetables”	130

A table consists of 4 rows, a row in a table is a tuple in relational theory. The set of ordered tuples is called a relation.
Before we define the relationship, we introduce another term - the domain. Domains in relation to a table are columns.

For clarity, we now introduce a strict definition of a relationship.

Let N be the sets D1, D2, .... Dn (domains), the relation R over these sets is the set of ordered N-tuples of the form <d1, d1, ... dn>, where d1 belongs to D1 and so on. The sets D1, D2, .. Dn are called the domains of the relation R.
Each element of a tuple is the value of one of the attributes corresponding to one of the domains.

Keys in a relationship

In relation to the requirement is that all tuples must differ. To uniquely identify a tuple, there is a primary key. The primary key is an attribute or a set of the minimum number of attributes that uniquely identifies a particular tuple and does not contain additional attributes.
It is understood that all attributes in the primary key must be necessary and sufficient to identify a specific tuple, and the exclusion of any of the attributes in the key will make it insufficient for identification.
For example, in such a table, the key will be a combination of attributes from the first and second columns.

DRIVERS table

COMPANY	DRIVER
LLC ”The Dark Side”	Vladimir
LLC ”The Dark Side”	Michael
JSC ”Fruits”	Ruslan
LLC ”Vegetables”	Vladimir

It can be seen that there can be several drivers in an organization, and in order to unambiguously identify a driver, it is necessary to have a value from the column “Name of organization” and from “Driver’s name”. Such a key is called composite.

In a relational database, tables are interrelated and correlated with each other as main and subordinate. The connection of the main and subordinate tables is carried out through the primary key (primary key) of the main table and the foreign key (foreign key) of the subordinate table.
A foreign key is an attribute or set of attributes that is the primary key in the main table.

This preparatory theory will be enough to become familiar with the basic operations of relational algebra.

Relational algebra operations

The basic eight operations of relational algebra were proposed by E. Codd .

Union
Intersection
Subtraction
Cartesian product
Sample
Projection
Compound
Division

The first half of operations is similar to the same operations on sets. Some operations can be expressed through other operations. Consider most of the operations with examples.

For understanding, it is important to remember that the result of any operation of algebra on relations is another relation, which can then also be used in other operations.
Create another table, which is useful to us in the examples.

table SELLERS

ID	SELLER
123	OOO “Dart”
156	OJSC ”Vedro”
235	CJSC “Vegetable Base”
623	OJSC ”Firm”

We agree that in this table ID is the foreign key associated with the primary key of the PRODUCTS table.

To begin, consider the simplest operation - the name of the relationship. Its result will be the same relationship, that is, performing the PRODUCTS operation, we will receive a copy of the PRODUCTS relationship.

Projection

A projection is an operation in which attributes are selected from a relationship only from specified domains, that is, only necessary columns are selected from a table, and if there are several identical tuples, then only one instance of such a tuple remains in the resulting ratio.
For example, let's make a projection on the PRODUCTS table by selecting from it ID and PRICE.

Operation syntax:
π _{(ID, PRICE)} PRODUCTS

As a result of this operation, we obtain the relation:

ID	PRICE
123	190
156	60
235	100
623	130

Sample

A sample is an operation that selects a set of rows in a table that satisfy specified conditions. The condition can be any logical expression.
For example, let's make a selection from a table with a price greater than 90.

Operation syntax:
σ _{(PRICE> 90)} PRODUCTS

ID	NAME	COMPANY	PRICE
123	Cookies	LLC ”The Dark Side”	190
235	Pineapples	JSC ”Fruits”	100
623	Tomatoes	LLC ”Vegetables”	130

In the sample condition, we can use any logical expression. Let's make another sample with a price of more than 90 and product ID less than 300:

σ _{(PRICE> 90 ^ ID <300)} PRODUCTS

ID	NAME	COMPANY	PRICE
123	Cookies	LLC ”The Dark Side”	190
235	Pineapples	JSC ”Fruits”	100

Let's combine the projection and sampling operators. We can do this because any of the operators returns a relation as a result and also uses the relation as arguments.
From the table with products we will select all companies selling products that are cheaper than 110.

π _COMPANY σ _{(PRICE <100)} PRODUCTS

COMPANY

LLC ”The Dark Side”

JSC ”Fruits”

Multiplication

Multiplication or Cartesian product is an operation performed on two relations, as a result of which we obtain a relationship with all domains from two initial relations. Tuples in these domains will be all possible combinations of tuples from the initial relationship. The example will be clearer.

We obtain the Cartesian product of the PRODUCTS and SELLERS tables.
Operation syntax:

PRODUCTS × SELLERS
You may notice that these two tables have the same domain ID. In this situation, domains with the same name get a prefix in the form of the name of the corresponding relationship, as shown below.
For brevity, we multiply not complete relations, but samples with the condition ID <235

(the same tuples are highlighted)

PRODUCTS.ID	NAME	COMPANY	PRICE	SELLERS.ID	SELLER
123	Cookies	LLC ”The Dark Side”	190	123	OOO “Dart”
156	Tea	LLC ”The Dark Side”	60	156	OJSC ”Vedro”
123	Cookies	LLC ”The Dark Side”	190	156	OJSC ”Vedro”
156	Tea	LLC ”The Dark Side”	60	123	OOO “Dart”

For an example of using this operation, let us imagine the need to choose sellers with prices less than 90. Without a product, you would first need to get product IDs from the first table, then use the second table to get the necessary SELLER names, and using the product you will receive such a query:

π _(SELLER) σ _{(RODUCTS.ID = SELLERS.ID ^ PRICE <90)} PRODUCTS × SELLERS

As a result of this operation, we obtain the relation:

SELLER

OJSC ”Vedro”

Compound and natural compound

The join operation is the reverse of the projection operation and creates a new relation from the two already existing ones. A new relation is obtained by concatenating the tuples of the first and second relations, and the concatenations are subjected to relations in which the values of the given attributes match. In particular, if you combine the PRODUCTS and SELLERS relationships, these attributes will be the attributes of the ID domains.

Also, for clarity, you can imagine the connection as the result of two operations. First, the product of the original tables is taken, and then from the obtained relation, we make a selection with the condition of equality of attributes from the same domains. In this case, the condition is equality of PRODUCTS.ID and SELLERS.ID.

Let's try to connect the relationship PRODUCTS and SELLERS and get the relationship.

PRODUCTS.ID	NAME	COMPANY	PRICE	SELLERS.ID	SELLER
123	Cookies	LLC ”The Dark Side”	190	123	OOO “Dart”
156	Tea	LLC ”The Dark Side”	60	156	OJSC ”Vedro”
235	Pineapples	JSC ”Fruits”	100	235	CJSC “Vegetable Base”
623	Tomatoes	LLC ”Vegetables”	130	623	OJSC ”Firm”

A natural connection gets a similar relationship, but if we have a correctly configured scheme in the database (in this case, the primary key of the PRODUCTS ID table is associated with the foreign key of the SELLERS ID table), then in the resulting relationship there is only one domain ID.

Operation syntax:
PRODUCTS ⋈ SELLERS;

It turns out this attitude:

PRODUCTS.ID	NAME	COMPANY	PRICE	SELLER
123	Cookies	LLC ”The Dark Side”	190	OOO “Dart”
156	Tea	LLC ”The Dark Side”	60	OJSC ”Vedro”
235	Pineapples	JSC ”Fruits”	100	CJSC “Vegetable Base”
623	Tomatoes	LLC ”Vegetables”	130	OJSC ”Firm”

Intersection and subtraction.

The result of the intersection will be a relation consisting of tuples that are fully included in both relations.
The result of the subtraction will be a relation consisting of tuples that are tuples of the first relation and are not tuples of the second relation.
These operations are similar to the same operations on sets, so I think there is no need to describe them in detail.

Information sources

Basics of using and designing databases - V. M. Ilyushechkin
lecture course Introduction to Databases - Jennifer Widom, Stanford University

_{I will be grateful for the reasoned comments}

Source: https://habr.com/ru/post/145381/

All Articles