
Stored functions: pros and cons

Using the DBMS's stored functions to implement business logic, or part of it, has always been a stumbling block: on one side of the barricades stand the DBAs and database programmers, on the other, the backend developers.
At the risk of incurring the anger of both camps, I will still sum up the pros and cons and share my thoughts on when to write code in stored functions and when to move it out of the database.




Let's start with the arguments against:

Scattering of business logic


This is actually not a problem of the DBMS or of stored functions as a tool; it is a problem of their misuse. A database programmer may be tempted to describe the entire logic of an action in a stored function, since, after all, the data is right at hand. If the programmer gives in to that temptation and the manager does not object, the project may later suffer from the narrowness of the interface with the external system (for example, the application server): new parameters have to be added, the logic grows more complicated, and so on. It can even lead to the appearance of "duplicate" stored functions with slightly different functionality.

The poverty of built-in DBMS languages


There is some truth to this. The traditional languages for writing stored functions (PL/SQL, T-SQL, PL/pgSQL) are quite primitive compared to modern general-purpose languages. It is worth noting, though, that stored functions can be written in more advanced languages, such as Java in Oracle or Python in PostgreSQL.
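For example, here is a minimal sketch of a PostgreSQL stored function written in Python (it assumes the plpython3u extension is available on the server; the function and its logic are purely illustrative):

    -- Assumption: the PL/Python extension is installed and available.
    CREATE EXTENSION IF NOT EXISTS plpython3u;

    -- A hypothetical function that normalizes a phone number using Python's
    -- string handling instead of PL/pgSQL string functions.
    CREATE OR REPLACE FUNCTION normalize_phone(raw text)
    RETURNS text AS $$
        digits = ''.join(ch for ch in raw if ch.isdigit())
        return digits[-10:] if len(digits) >= 10 else digits
    $$ LANGUAGE plpython3u;

    SELECT normalize_phone('+7 (912) 345-67-89');  -- returns '9123456789'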

Non-portability of stored functions


This refers to the incompatibility of the procedural-language dialects of different DBMSs. Cross-platform support, by contrast, is fine: the DBMS itself runs on different operating systems and architectures, and its built-in languages are independent of the external platform. Here again the decision depends on the specifics of the project. If the project is meant to be replicated and you do not control the platform (the classic example is a CMS), then you need portability and stored functions will only add headaches. If the project is unique, or its deployments are standardized (for example, across the branches of one company), then the incompatibility between DBMSs can be forgotten.

Lack of the necessary skills in the team and the high cost of the relevant specialists


This is, in my opinion, the most serious argument against using stored functions. Everything depends on the scale of the project. Roughly speaking, stored code on the DBMS side is justified in medium-to-large enterprise projects. For smaller projects the game is not worth the candle. For extremely high-load projects, an architecture built on stored functions and an RDBMS will run into scaling problems; there you need a storage and processing approach tailored to the specific task.

Now the pros:

Speed


When processing even small amounts of data in an external application, we spend extra time transmitting it over the network and converting it into the format we need. Besides, the data-processing algorithms in the DBMS are already built in, debugged, and tested, so your programmers have no reason to practice reinventing the wheel.
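A crude sketch of the difference (table and column names are invented):

    -- In the DBMS: one round trip, only the totals cross the network.
    SELECT category, sum(amount) AS total
    FROM   orders
    GROUP  BY category;

    -- In the client: every row of the table crosses the network just so
    -- the application can add the numbers up itself.
    --   SELECT category, amount FROM orders;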

Hiding the data structure


As a software system grows and evolves, its data schema can and should change. A well-designed programming interface built on stored functions lets you change the schema without changing the code of the external applications (of which there may be several). This naturally implies a separation of roles: developers who work with the database and know its structure, and developers of external applications who only need to know the provided API. When dynamic SQL is used on the application side, additional layers of database abstraction, various ORMs, are introduced to achieve the same separation.
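A minimal sketch of the idea (all names are invented): the external applications keep calling the same function while the schema changes underneath it:

    -- Version 1: the customer's name lives in a single column.
    CREATE OR REPLACE FUNCTION customer_name(p_id int)
    RETURNS text AS $$
        SELECT name FROM customers WHERE id = p_id;
    $$ LANGUAGE sql STABLE;

    -- After a refactoring the name is split into two columns. Only the
    -- function body changes; its signature, and therefore every external
    -- application that calls it, stays exactly the same.
    CREATE OR REPLACE FUNCTION customer_name(p_id int)
    RETURNS text AS $$
        SELECT first_name || ' ' || last_name FROM customers WHERE id = p_id;
    $$ LANGUAGE sql STABLE;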

Flexible access rights management


It is good practice to restrict the database user under which the client application connects so that it has no rights to read or modify any objects directly; it can only execute the functions granted to it. This way you tightly control which actions are available to the client, reducing the likelihood of data-integrity problems caused by a bug in the client application.
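In PostgreSQL terms this could look roughly like the following (role, table, and function names are invented; SECURITY DEFINER makes the function run with its owner's privileges rather than the caller's):

    -- The application role owns nothing and cannot touch tables directly.
    CREATE ROLE app_client LOGIN;
    REVOKE ALL ON ALL TABLES IN SCHEMA public FROM app_client;

    -- The function runs with its owner's rights, so the client can perform
    -- exactly this operation and nothing else.
    CREATE OR REPLACE FUNCTION transfer(p_from int, p_to int, p_amount numeric)
    RETURNS void AS $$
    BEGIN
        UPDATE accounts SET balance = balance - p_amount WHERE id = p_from;
        UPDATE accounts SET balance = balance + p_amount WHERE id = p_to;
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;

    GRANT EXECUTE ON FUNCTION transfer(int, int, numeric) TO app_client;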

Lower probability of SQL injection


With dynamic SQL, the client program sends SQL commands to the DBMS as strings assembled in code. When building those strings, the programmer must be extremely careful to rule out any unintended modification of the SQL command. With stored functions, the SQL on the application side is usually static and looks like a simple function call whose parameters are passed not inside strings but through placeholders (:variable) via the bind mechanism. Of course, this does not rule out SQL injection completely (after all, inside the stored function it is still possible to concatenate a string parameter into the text of a dynamically executed query), but it significantly reduces its probability.
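The same discipline applies inside the stored function itself when dynamic SQL cannot be avoided. A PL/pgSQL sketch (the users table is hypothetical):

    CREATE OR REPLACE FUNCTION find_user(p_name text)
    RETURNS SETOF users AS $$
    BEGIN
        -- UNSAFE: concatenating the parameter into the query text keeps
        -- the injection hole open even inside a stored function:
        --   RETURN QUERY EXECUTE
        --       'SELECT * FROM users WHERE name = ''' || p_name || '''';

        -- SAFE: the value goes through a bind placeholder ($1) and never
        -- becomes part of the SQL text itself.
        RETURN QUERY EXECUTE 'SELECT * FROM users WHERE name = $1'
        USING p_name;
    END;
    $$ LANGUAGE plpgsql;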

SQL reuse


By implementing the data-handling logic in the stored layer, we get the familiar hierarchical model of SQL code reuse.
With dynamic SQL, query reuse is difficult.
For example, suppose there is a system A built on stored functions and a system B built on dynamic SQL. Both have a function get_price that returns the price of a product. In case A it is a stored function or a view; in case B it is, say, a Java method that runs an SQL query via JDBC. Now comes the task of computing the total value of the goods in stock. In case A we join get_price directly into the query that lists the goods in stock (if get_price is a view or an SQL-language function, as in PostgreSQL, the optimizer inlines it, so a single query quickly computes the sum).
In case B there are two options: either iterate over a cursor with the list of goods in stock and call get_price n times (which means shipping the whole result set over the network to the client), or forget about reuse and write a subquery duplicating the one already written in get_price. Both options are bad.
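To make case A concrete, it might look roughly like this in PostgreSQL (the tables and pricing logic are invented; a simple SQL-language function like this one can be inlined by the planner):

    -- Reusable pricing logic, written once as an SQL-language function.
    CREATE OR REPLACE FUNCTION get_price(p_product_id int)
    RETURNS numeric AS $$
        SELECT base_price * (1 - discount)
        FROM   products
        WHERE  id = p_product_id;
    $$ LANGUAGE sql STABLE;

    -- Total value of goods in stock: get_price is reused inside the query
    -- itself, and the planner can fold everything into a single statement.
    SELECT sum(get_price(s.product_id) * s.quantity) AS stock_value
    FROM   stock s;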

Simpler SQL debugging


Debugging is simpler than in a heterogeneous mix of external code plus SQL.
In systems with dynamic SQL (any ORM), even the simple task of locating the problematic piece of SQL can be difficult.
Semantic and syntactic verification of SQL at compile time.
The ability to profile functions and hunt for bottlenecks.
The ability to trace a live, already-running system.
Automatic dependency control: when an entity's definition changes, dependent entities are invalidated (see the sketch below).
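As a small illustration of dependency control (names invented): PostgreSQL refuses a change that would break a dependent view, while Oracle would mark the view invalid:

    CREATE TABLE products (id int, price numeric);
    CREATE VIEW expensive AS SELECT id FROM products WHERE price > 100;

    -- The DBMS knows the view depends on this column and blocks the change
    -- instead of silently breaking callers. PostgreSQL reports roughly:
    ALTER TABLE products DROP COLUMN price;
    -- ERROR: cannot drop column price of table products
    --        because other objects depend on it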

When to write business logic in the database?


When processing speed is important

Processing data directly where it is stored often gives a significant increase in processing speed. It enables optimizations such as aggregation at the storage level, where the data does not even travel from the storage array to the DBMS server, let alone to the client.

When data integrity and consistency are important

In stored functions with explicit transaction and lock management, it is easier to ensure data integrity and the atomicity of operations. All of this can, of course, be implemented outside the database, but that is a separate, large piece of work.
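A sketch of the idea in PL/pgSQL (the accounts table is hypothetical; SELECT ... FOR UPDATE makes the balance check and the update atomic with respect to concurrent callers):

    CREATE OR REPLACE FUNCTION withdraw(p_account int, p_amount numeric)
    RETURNS void AS $$
    DECLARE
        v_balance numeric;
    BEGIN
        -- Row-level lock: concurrent withdrawals from the same account
        -- queue up here instead of racing past the balance check.
        SELECT balance INTO v_balance
        FROM   accounts
        WHERE  id = p_account
        FOR UPDATE;

        IF v_balance < p_amount THEN
            RAISE EXCEPTION 'insufficient funds on account %', p_account;
        END IF;

        UPDATE accounts SET balance = balance - p_amount WHERE id = p_account;
    END;
    $$ LANGUAGE plpgsql;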

The data has a complex but well-established structure

Flat, weakly interconnected structures often do not need the full wealth of processing tools a DBMS offers; for them you can use ultra-fast key-value stores and in-memory caching.
Intricately organized, highly connected hierarchical and network structures are a clear sign that your RDBMS knowledge will come in handy!
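For hierarchical data, for instance, a single recursive query can walk an entire tree inside the database (table and column names are invented):

    -- All direct and indirect subordinates of employee 42, in one statement.
    WITH RECURSIVE subordinates AS (
        SELECT id, name, manager_id
        FROM   employees
        WHERE  manager_id = 42
      UNION ALL
        SELECT e.id, e.name, e.manager_id
        FROM   employees e
        JOIN   subordinates s ON e.manager_id = s.id
    )
    SELECT * FROM subordinates;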

When to move the code out of the database?


Working with external data

If the nature of the system is such that more data arrives for processing from outside (from sensors, from other systems) than is stored in the database, then many of the database's advantages as a programming platform are lost. It turns out to be easier to process the incoming data outside and save the result to the database than to first push everything into the database and process it there. This follows the same principle we discussed earlier for data already stored in the database: process data as close to its source as possible.

Complex algorithms

Sophisticated or highly optimized number-crunching algorithms are best written in more suitable languages. The built-in RDBMS languages are powerful in the sense of being high-level, but they are not flexible, and they pay for it with high overhead.

Highload

In extremely heavily loaded systems, the conventional approaches to transaction serialization and server-cluster synchronization become a bottleneck. Such systems tend to be built on unique solutions to specific problems; universal, feature-rich RDBMSs are often too slow under loads of hundreds of thousands of concurrent transactions per second.

The conclusion is that there is no clear-cut algorithm. Each time, the decision rests with the architects and the manager, and on it depends whether the project gets bogged down in race conditions and inconsistent NoSQL data, in performance problems and debugging of ORM-generated queries, or runs into the scaling limits of a DBMS built around stored functions. So make the right decisions :)

Source: https://habr.com/ru/post/210920/

