
Tales of the DBMS

Introduction


Often, when the term “DBMS” is used, what is actually meant is a relational DBMS (hereinafter the two terms are treated as synonyms). This is primarily because most DBMSs on the market today are relational. The relational model organizes data in the form of two-dimensional tables, and its implementations are based on the work of Edgar Codd 1 . The relational model is both good and bad: good because it is simple to implement, bad from the point of view of working with object-oriented programming languages.

The first tale. "Objects in the database"


To solve the problem of storing objects in a database, a whole class of new systems was created: object-oriented DBMSs (and, as an intermediate step, object-relational DBMSs). The need for them stems from the fact that object-oriented programming (OOP) is currently the dominant programming paradigm, and its basic concepts are objects and classes. Object-oriented programming is a programming methodology based on representing a program as a collection of objects, each of which is an instance of a particular class, with the classes forming an inheritance hierarchy [Booch]. OOP is aimed at building large software systems developed by teams of programmers. It allows you to create software objects that correspond to domain objects and to reuse those objects when developing other programs, which significantly speeds up the creation of new products.

An object has state, behavior, and identity; the structure and behavior of similar objects define their common class; the terms “class instance” and “object” are interchangeable [Booch]. The properties most important in the context of interaction with a DBMS are state (the cumulative value of the object's attributes) and identity. “Identity is that property of an object which distinguishes it from all other objects” [Khoshafian].
Khoshafian and Copeland point out the following: “In most programming and database languages, names are used to distinguish temporary objects, thereby mixing addressability and identity. Most database systems distinguish persistent objects by a key attribute, thereby mixing identity and data value.”
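To make the distinction concrete, here is a small, hypothetical Java sketch: two objects that are “equal” by their key attribute (a value-based comparison, the way a database would make it) are nevertheless two distinct identities in memory.

    // Hypothetical example: value-based equality vs. object identity.
    public class Account {
        private final long id;   // key attribute: the database's stand-in for identity
        private double balance;  // part of the object's state (its value)

        public Account(long id, double balance) {
            this.id = id;
            this.balance = balance;
        }

        // Equality keyed on the "identifying" attribute, as most
        // relational mappings define it.
        @Override
        public boolean equals(Object o) {
            return o instanceof Account && ((Account) o).id == this.id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }

        public static void main(String[] args) {
            Account a = new Account(42L, 100.0);
            Account b = new Account(42L, 999.0);
            System.out.println(a.equals(b)); // true:  same key, "equal" as data values
            System.out.println(a == b);      // false: two distinct object identities
        }
    }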

A number of the problems associated with handling objects in a database were solved in object-oriented DBMSs, for example in Caché 2 . Such DBMSs are not yet in widespread use, however, so a solution that uses an object-oriented DBMS as its data store should be regarded as yet another reinvented wheel rather than an industry standard.

The second tale. "Business logic"


I constantly come across apologists for the “everything in the database” approach. The essence of the approach is that the database is used not only for its intended purpose, namely storing structured data and selecting and modifying it, but also as a place to aggressively push business logic into. The latter binds data and logic together inside a single DBMS; it is like corking a genie into a bottle and waiting for the moment it breaks free and smashes everything around it. The trend of integrating business logic into the database appeared in the mid-90s, when numerous refactorings 3 of existing software were required.

So as not to be unsubstantiated, I will give the main arguments in favor of the “everything in the database” approach (a small sketch illustrating some of them follows the list):
  1. Security. It is considered much safer to grant rights to a stored procedure and protect the data-processing logic than to grant rights to tables and then try to enforce data integrity with triggers.
  2. Speed. Stored procedures execute in the address space of the DBMS, close to the stored data, which in turn reduces the system's response time.
  3. Specialization of stored-procedure languages. Such languages are tailored to working with the data in the database, which again has a positive effect on the speed of the stored procedures.
  4. Localization of changes. The system's logic is changed in a single place and, in most cases, without recompiling the client software and reinstalling it afterwards.
  5. Traffic minimization. Traffic volume drops because only the result of the stored procedure is returned to the user, not the raw data.
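As an illustration of arguments 2 and 5, here is a minimal, hypothetical JDBC sketch. The connection URL, the credentials, and the get_customer_orders procedure are assumptions, and whether a procedure can return a result set this way depends on the specific driver:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class StoredProcedureCall {
        public static void main(String[] args) throws Exception {
            // The URL, credentials, and procedure name are illustrative assumptions.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/shop", "app", "secret");
                 CallableStatement cs = con.prepareCall("{call get_customer_orders(?)}")) {
                cs.setLong(1, 42L);
                // The filtering and aggregation live next to the data; only the
                // final result set crosses the network (arguments 2 and 5).
                try (ResultSet rs = cs.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("order_no"));
                    }
                }
            }
        }
    }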
Arguing with these assertions is pointless, since they are basically true. But let us sharpen the question: is it still necessary to keep business logic in the database when using a three-tier architecture with a full-fledged application server? While building such an infrastructure is not always justified for a startup, for corporate use there are practically no alternatives, and the industry heavyweights naturally offer middle-tier solutions of their own.

With the above in mind, I want to recall one of the rules of application development:
“The most important thing is to concentrate the business logic in a single tier. Which tier to choose depends on the specific task.”
- Sergey Kiselev, co-founder and consultant on technologies and development, I-Point Rus Research and Production Company LLC.

Pushing business logic down to the database level concentrates it in a place where you lose control over performance, scaling, and portability, and where you depend, to a large extent, on the database vendor.
“If you move the application logic into the database, you tie yourself to a specific database, or, more precisely and even worse, to a specific version of the DBMS. That is a loss of flexibility. At the same time, if the implementation is genuinely cross-platform, moving the logic around within it is no problem at all, and managing it is a well-understood task.”
- Victor Gamov, leading programmer, educational and scientific center MIIT-Expert.

Another problem, in my opinion the main one, is the difficulty it adds to a number of stages of developing and maintaining a software product, such as debugging and testing. The “everything in the database” approach also greatly complicates refactoring.
“Such logic is vendor-specific, hard to debug, barely amenable to refactoring, and testable, at best, through static analysis. Still, there are probably a dozen arguments for business logic on the DBMS side, among them the pursuit of performance or the need for clever management of locks on database table rows. In those cases all the listed disadvantages become the unpleasant consequences of a compromise, and the developers, in turn, become the victims of those disadvantages.”
- Andrey Gura, project manager, Yandex.

Of course, the decision about where to place business logic should be driven by the task at hand, starting from the fact that the same data-processing steps can be implemented both in a stored procedure, directly at the DBMS level, and in the application.
“All business logic can be divided into two parts. One part is 'data maintenance', the other is 'implementation of the applied task'. It is more logical to keep the first in the DBMS, next to the data, and the second on the application server. By 'data maintenance' I mean maintaining data consistency from the point of view of the applied task, as well as functions that assemble applied, calculated values from the primary data.”
- Vladimir Bormotov, system administrator, ZAO BioHimMak.

In any case, making the specific decision rests with the developers, who, like sappers, must understand the weight of every step they take in this matter.

The third tale. "A waterfall of connections"


If you look inside most CMS 4 systems, then despite differences in coding style and developer qualifications you will notice one common trait: the way the developers work with connections to the DBMS.

Imagine a small web project used by several hundred people and a typical process of fetching content from the database, and you get a depressing picture. Often a connection is created for each user request, the data is fetched, and the connection is closed. This seemingly tidy process actually means that when an unreasonably large number of consumers arrives at rush hour, the DBMS denies them service.

The error described above is the “connection thrashing” anti-pattern [Tate]. For example, several objects may each need to connect to the database several times in the course of a business process to fetch and save data, and creating each of those connections can take more time than the transaction itself. For competent work with the DBMS it is enough to use the ConnectionPool pattern.
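A minimal, hypothetical sketch of the anti-pattern in Java (the connection parameters and the articles table are assumptions):

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    public class NaiveContentLoader {
        // Anti-pattern: every call opens, uses, and discards its own connection.
        // Under peak load the handshake and authentication cost dominates, and
        // the DBMS eventually runs out of connection slots and refuses service.
        public List<String> loadTitles(String url, String user, String pass) throws SQLException {
            try (Connection con = DriverManager.getConnection(url, user, pass);
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT title FROM articles")) {
                List<String> titles = new ArrayList<>();
                while (rs.next()) {
                    titles.add(rs.getString(1));
                }
                return titles; // the connection is closed and thrown away here
            }
        }
    }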

The essence of the pattern is to control the number of connections to the database (as a rule, the number of connections a DBMS will serve is limited), which simplifies obtaining data. To get a database connection, you call the getPooledConnection method on the object implementing the ConnectionPool pattern; the result is a ready-to-use connection to the database, and when you have finished working with the database you must release the connection, returning it to the ConnectionPool.

Note that the ConnectionPool maintains the chosen number of connections at all times, which naturally minimizes the time needed to obtain data.
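Below is a minimal, illustrative sketch of such a pool in Java, built on a fixed-size blocking queue. Apart from getPooledConnection, which the text above mentions, the names are assumptions, and a production system would use a proven pool implementation rather than this one:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ConnectionPool {
        private final BlockingQueue<Connection> pool;

        public ConnectionPool(String url, String user, String pass, int size) throws SQLException {
            pool = new ArrayBlockingQueue<>(size);
            // Open the fixed set of connections up front, so callers never pay
            // the connection-establishment cost on the hot path.
            for (int i = 0; i < size; i++) {
                pool.add(DriverManager.getConnection(url, user, pass));
            }
        }

        // Blocks until a connection is free, which caps the concurrent load
        // placed on the DBMS.
        public Connection getPooledConnection() throws InterruptedException {
            return pool.take();
        }

        // Callers must return the connection here instead of closing it.
        public void release(Connection con) {
            pool.add(con);
        }
    }

A caller wraps its work between getPooledConnection and release (ideally in a try/finally block) instead of opening a fresh connection every time.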

The described mechanism allows system capacity to be managed more efficiently. With this approach the overall throughput of the system, expressed in requests served per unit of time, becomes much more predictable, since it no longer degrades for lack of memory [Ch].

Conclusion


In many companies the speed of software development, not its quality, is the top priority, and although that is wrong, there is no getting away from it. Since most of the work involves storing and processing data in a DBMS, companies move from their own “bicycles” to proven ORM frameworks.

LITERATURE


[Booch] Booch G. Object-Oriented Analysis and Design with Applications in C++. Binom / Nevsky Dialect, 1998, 560 pp.
[Khoshafian] Khoshafian S., Copeland G. Object Identity. SIGPLAN Notices, vol. 21 (11), November 1986, p. 406.
[Fa] Fowler M. Refactoring: Improving the Design of Existing Code. SPb: Symbol-Plus, 2004, 432 pp., ill.
[Ki] Kerievsky J. Refactoring to Patterns. M.: Williams, 2006, 400 pp., ill.
[Tate] Tate B. Bitter Java (A Programmer's Library series). SPb: Piter, 2003, 333 pp.
[Ch] Chernousov A.V., Massel L.V. IT infrastructure of system research in power engineering and its implementation technology // Modeliuvannia ta informatsiini tekhnolohii (Modeling and Information Technologies). Kyiv, 2006.

_________
1 Edgar Codd was a British scientist and the founder of relational database theory. In 1970 he published “A Relational Model of Data for Large Shared Data Banks”, which is considered the first work on the relational data model.
2 InterSystems Caché official website www.intersystems.com/cache
3 Refactoring (reorganization) is the process of completely or partially rewriting a computer program or other material in order to improve the readability of the code and the overall internal structure of its components, while fully and accurately preserving the original meaning and behavior (except when an error, i.e. incorrect behavior, is fixed in the course of refactoring) [Fa, Ki].
4 CMS (Content Management System) is a computer program or system used to provide and organize the collaborative process of creating, editing, and managing text and multimedia documents (content). This content is usually treated as unstructured data of the applied problem, as opposed to the structured data normally managed by a DBMS. Here we are considering entry-level web-oriented CMSs.

Source: https://habr.com/ru/post/70906/

