📜 ⬆️ ⬇️

Work on hardware errors on the SQL server side and the benefits of load tests

Some of our users began to report that reports sometimes return a value in excess of 100% to show data growth.

At the same time, it turned out that the service delivering data from the equipment sometimes makes gaps in the values.
Where the server read the hardware directly, instead of values, it inserted NULL, and where it read through SNMP, it inserted 0.

That is, a series of counter values ​​was: 4, 10, 20, NULL, NULL, 31, 0, 0, 0, 50, and now 4, 10, 20, 20, 20, 31, 31, 31, 31, 50
')
Probably it would be possible to approximate the data, but it suits the stackers, and our business is to satisfy the customers.
What to do became clear, the question is only on which side what to correct.



The reporters themselves decided not to touch them - there are a lot of them, therefore, in addition to repairing the reading service, we must also correct the data in the database itself.

In the beginning, because they complained about one report, the solution was simple - to fix the data that participated in the report. All the analysis could not be done, because the customers (due to the volumes) could not send their bases, so the decision was made on the basis of logs.

The first solution was obvious and simple.

for MSSQL 2008:

Update curr set curr.dev_counter_color=coalesce(curr.dev_counter_color, prev.dev_counter_color) from device_counter curr left join device_counter prev on curr.dev_id = prev.dev_id and curr.dev_counter_color is null and prev.dev_counter_date = (select max(sub.dev_counter_date) from device_counter sub where sub.dev_counter_date < curr.dev_counter_date and dev_counter_color is not null); 


for Oracle:
 merge into DEVICE_COUNTER t using ( select ROWID as rid, last_value(dev_counter_duplex ignore nulls ) over (partition by dev_id order by dev_counter_date, dev_counter_id) as new_cnt_value from DEVICE_COUNTER t ) v on (t.rowid = v.rid) when matched then update set dev_counter_duplex = new_cnt_value 


And half a year later, as a user appeared with a base of tens of thousands of devices with data for several years (about five million records), he began complaining that he could not wait for the end of the database upgrade.

At the same time, we received a letter from the technical report of those who decided to offer to repair the base with all 57 counters.

It would seem that you fix the query column by column and the end of it.

When we still pulled the database from the client, it turned out that the request runs on one column for almost two minutes on an average virtual server, and a weak laptop client complaining about the update speed, according to our calculations, did all 57 sql of queries in ~ 52 hours! ..

As usual, we recommend our clients to work with a technique whose parameters are specified in the technical requirements, but is it really that bad in relational languages?

I had to remember about the good old cursor, - the task itself is simple - to go through several million entries and if there is something to edit, then edit.

If the data skips were in all columns, then a simple pass with an update of all rows takes about the same time as the original version - 38 minutes versus 92 in the relational approach ...

And only the fact that the passes are rare - reduced the upgrade to 2! minutes is for 22,000 lines of 3 million.

The final TSQL code. The number of fields reduced to 2 so that the code does not look so loud:
 BEGIN DECLARE @updated int = 0; DECLARE @cur_dev_id int = NULL, @cur_id int = NULL DECLARE @cur_counter_total_color int = NULL, @cur_counter_total_mono int = NULL; DECLARE @next_dev_id int = NULL, @next_id int = NULL DECLARE @next_counter_total_color int = NULL, @next_counter_total_mono int = NULL; DECLARE UPDCURSOR CURSOR for select dev_id, dev_counter_id, dev_counter_total_color, dev_counter_total_mono from device_counter d order by dev_id, dev_counter_date OPEN UPDCURSOR FETCH NEXT FROM UPDCURSOR into @cur_dev_id, @cur_id, @cur_counter_total_color; WHILE @@FETCH_STATUS = 0 BEGIN FETCH NEXT FROM UPDCURSOR INTO @next_dev_id, @next_id , @next_counter_total_color, @next_counter_total_mono IF @@FETCH_STATUS = 0 AND @cur_dev_id = @next_dev_id AND ( ((@next_counter_total_color IS NULL AND @cur_counter_total_color IS NOT NULL) OR (@next_counter_total_color = 0 AND (@cur_counter_total_color > 0 OR @cur_counter_total_color Is NULL ))) OR ((@next_counter_total_mono IS NULL AND @cur_counter_total_mono IS NOT NULL) OR (@next_counter_total_mono = 0 AND (@cur_counter_total_mono > 0 OR @cur_counter_total_mono Is NULL ))) ) BEGIN SELECT @cur_counter_total_color = (CASE WHEN @next_counter_total_color IS NULL OR @next_counter_total_color = 0 THEN @cur_counter_total_color ELSE @next_counter_total_color END), @cur_counter_total_mono = (CASE WHEN @next_counter_total_mono IS NULL OR @next_counter_total_mono = 0 THEN @cur_counter_total_mono ELSE @next_counter_total_mono END); SET @updated = @updated + 1 UPDATE device_counter SET dev_counter_total_color = @cur_counter_total_color, dev_counter_total_mono = @cur_counter_total_mono WHERE CURRENT OF UPDCURSOR END ELSE BEGIN SELECT @cur_counter_total_color = @next_counter_total_color, @cur_counter_total_mono = @next_counter_total_mono; END SET @cur_dev_id = @next_dev_id END CLOSE UPDCURSOR DEALLOCATE UPDCURSOR END 


I didn’t invent anything especially in this article, just practicing the real SQL developer requires to avoid using the cursor, but as this example showed, if there is an ugly alternative solution and it works better, then you need to use it ...

And it is especially important to do not only unit tests but load tests at the highest possible volumes.

Source: https://habr.com/ru/post/276835/


All Articles