Database Development Solutions

Good day! In the process of project development one has to deal with the problem of database performance, since the volume of data is growing, and willy-nilly, problem areas come up.

This post describes not very successful solutions that are invisible in the early stages of the project. And solutions that can further significantly increase productivity.

The post is more designed for more experienced developers who are already looking for solutions, so I will be brief.

Select in the resulting fields

Many developers use the following construction when writing queries

SELECT T.Column1, (SELECT MAX(Column) FROM Table2) as Column3, T.Column2 FROM Table1 T

')
The fact is that for each row from Table1, a “SELECT MAX (Column) FROM Table2” is sampled, which requires additional resources for a new sample. In this case, the performance will fall as with the increase in the amount of data in Table1 and Table2.

Acceptable option.

 SELECT T.Column1, T2.Column3, T.Column2 FROM Table1 T JOIN (SELECT Max(Column) as Column3 FROM Table2) T2 ON 1 = 1

In this case, the selection from Table2 will be performed once, and the result will be substituted for each row from Table1.

Replacing Left Join with Union

There are situations when you need to collect data from several tables of the same type within the meaning of the table and often resort to the following solution.

 SELECT M.Id, COALESCE(P1.Sum, 0) + COALESCE(P2.Sum, 0) + COALESCE(P3.Sum, 0) + ... + COALESCE(PN.Sum, 0) as Sum FROM MainTable M LEFT JOIN PayTable1 P1 ON P1.Id = M.Id LEFT JOIN PayTable2 P2 ON P2.Id = M.Id LEFT JOIN PayTable3 P3 ON P3.Id = M.Id . . . LEFT JOIN PayTableN PN ON PN.Id = M.Id

This approach has several problems:
1. Multiplications of the PayTableN tables are made, and as the tables grow, the performance will drop even if there is little data.
2. The need to use LEFT JOIN, that in many DBMS indexes do not work.

You can improve performance by dropping the multiplication operation in favor of joining and using JOIN instead of LEFT JOIN.

 SELECT R.Id, SUM(R.Sum) as Sum FROM ( SELECT M.Id, P1.Sum FROM MainTable M JOIN PayTable1 P1 ON P1.Id = M.Id UNION ALL SELECT M.Id, P2.Sum FROM MainTable M JOIN PayTable2 P2 ON P2.Id = M.Id UNION ALL SELECT M.Id, P3.Sum FROM MainTable M JOIN PayTable3 P3 ON P3.Id = M.Id . . . UNION ALL SELECT M.Id, PN.Sum FROM MainTable M JOIN PayTableN PN ON PN.Id = M.Id ) R GROUP BY R.Id

Moreover, such a construction is easier to maintain, i.e. when adding new PayTableN tables, you need to add the corresponding block in only one revenge.

Using scalar operations in the join condition

There are situations when in the join condition it is necessary to use additional operations on the field, in which case the index is not used

 SELECT T1.*, T2.* FROM Table1 T1 JOIN Table2 T2 ON T2.Id = T1.Id AND COALESCE(T2.Column, 0) = COALESCE(T1.Column, 0)

Possible solutions:
1. Make the condition WHERE, but this option does not always give a positive result, because The optimizer could do this for you.

 SELECT T1.*, T2.* FROM Table1 T1 JOIN Table2 T2 ON T2.Id = T1.Id WHERE COALESCE(T2.Column, 0) = COALESCE(T1.Column, 0)

2. Divide the request into two.

 SELECT T1.*, T2.* FROM Table1 T1 JOIN Table2 T2 ON T2.Id = T1.Id AND T2.Column = T1.Column UNION SELECT T1.*, T2.* FROM Table1 T1 JOIN Table2 T2 ON T2.Id = T1.Id AND T2.Column IS NULL AND T1.Column IS NULL

Requests using With

The WITH operator of course helps to make the query more understandable and structured, but many do not quite understand how it works and make the following errors.

 WITH MainSubQuery AS ( SELECT * FROM Table1 ), SubQuery AS ( --   ,    ,    SELECT * FROM Table2 ) SELECT * FROM MainSubQuery M LEFT JOIN SubQuery Q1 ON Q1.id = M.id AND Q1.Param = 1 LEFT JOIN SubQuery Q2 ON Q2.id = M.id AND Q2.Param = 2 LEFT JOIN SubQuery Q3 ON Q3.id = M.id AND Q3.Param = 3 LEFT JOIN SubQuery Q4 ON Q4.id = M.id AND Q4.Param = 4

The problem is that many people think that the SubQuery subquery is executed once, and then only the result will be taken and substituted where needed.
But in fact, it turns out the following:

 SELECT * FROM (SELECT * FROM Table1) M LEFT JOIN (SELECT * FROM Table2) Q1 ON Q1.id = M.id AND Q1.Param = 1 LEFT JOIN (SELECT * FROM Table2) Q2 ON Q2.id = M.id AND Q2.Param = 2 LEFT JOIN (SELECT * FROM Table2) Q3 ON Q3.id = M.id AND Q3.Param = 3 LEFT JOIN (SELECT * FROM Table2) Q4 ON Q4.id = M.id AND Q4.Param = 4

The problem can be solved by resorting to temporary tables, i.e. The result of a heavy query is preliminarily placed in a temporary table and then used. This method is also effective if NICKNAMEs from the IBM DB2 world are used as the initial data set.

 INSERT Session.MainSubQuery SELECT * FROM Table1; INSERT Session.SubQuery SELECT * FROM Table2; SELECT * FROM Session.MainSubQuery M LEFT JOIN Session.SubQuery Q1 ON Q1.id = M.id AND Q1.Param = 1 LEFT JOIN Session.SubQuery Q2 ON Q2.id = M.id AND Q2.Param = 2 LEFT JOIN Session.SubQuery Q3 ON Q3.id = M.id AND Q3.Param = 3 LEFT JOIN Session.SubQuery Q4 ON Q4.id = M.id AND Q4.Param = 4;

Escort

Next, we turn from machine productivity to the productivity of human labor. It will be about how to facilitate the routine work of supporting and maintaining databases.

In the process of developing a project, it is necessary to re-create procedures, functions, tables and other database objects, but
Not every DBMS has a CREATE OR REPLACE PROCEDURE [FUNCTION, TABLE, VIEW, TRIGER, ...] construct and you have to
updates constantly monitor DROPs, ALTERs, etc. in each database in different ways.

The solution to this issue may be to enter your own service procedures, for example:

 CALL[EXECUTE] DropProcedure(SchemaName, ProcedureName) CALL[EXECUTE] DropTable(TableName, TableName) CALL[EXECUTE] DropView(SchemaView, ProcedureView)

etc.

And then we already create the creation of the corresponding objects, regardless of whether they were created earlier or not.

But with this approach, there is also a problem associated with access rights to objects, i.e. when deleting and creating an object, the rights of groups to objects flies, since This is a new object for the DBMS.

Solutions may be different, for example:
Maintain the rights issue script up to date and download after each update (inconvenient).
Do not use group policy at all (not good).

Quite a convenient and effective solution is to enter a pair of service procedures, where one will be launched at the beginning of the update, and the second at the end.

 --   CALL[EXECUTE] StoreRights --    --  CALL[EXECUTE] RestoreRights

The next question concerns the maintenance of the database structure of the MS SQL DBMS, namely in connection with the extremely inconvenient method of commenting tables, columns. Which in turn is done rarely or not at all done.

Example:

  EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'' , @level0type=N'SCHEMA',@level0name=N'main', @level1type=N'TABLE',@level1name=N'TextTable', @level2type=N'COLUMN',@level2name=N'id';

You can also create a set of service procedures for wrappers.

  EXEC Service.CommentOnTable N'SchemaName', N'TableName', N'TableComment'; EXEC Service.CommentOnColumn N'SchemaName', N'TableName', N'ColumnName', N'ColumnComment'; EXEC Service.CommentOnProcedure N'SchemaName', N'TableName', N'ProcedureComment';

So already shorter, more informative and more enjoyable to work.

Of course, this set of service procedures is not limited, you can think of a lot of things.

All the best of pleasant work!

Source: https://habr.com/ru/post/234721/

All Articles