
How I improved the performance of SSRS reports

Backstory


Being a not-quite-middle developer in a Kharkiv office, I suddenly received a LinkedIn invitation to move to Poland and work for a big bank. After much deliberation I agreed, telling myself it would be a very rewarding experience. Working for a large corporation is not just programming experience; it is also two, or even three, years of solid resume entries, plus +100 to suit-wearing skill. (Not quite, but more on that another time.) It turned out that the amount of crap code in Poland is considerably higher than in Ukraine, and the average level of a senior developer, at least among those I have met here, does not hold up against my Ukrainian colleagues.

So, having arrived at the new job, after three months of sorting out the routine and finding a project for me, I was handed a tool that generates SSRS reports. To say that nobody here gave a damn about any principles or advice from the world's leading engineers would be to say nothing at all. But something had to be done, and there was nobody else to blame.

Given:


A .NET console application, which:

- runs on the standard Windows Task Scheduler;
- downloads two txt files from FTP;
- parses their structure;
- joins the data together;
- shoves it into the database;
- the end result is a single table with about 100 (mostly string) columns, to which roughly 3000 rows are added every day, and which does not even have an IDENTITY id column.

SQL Server hosting the database, in which there are:

- a damn huge table with a pile of columns and rows;
- normalization that died long ago;
- a couple of views;
- supporting tables such as user information, logs and so on;
- a very important detail: the big table stored two types of records that had the same columns but needed to be matched against each other. For example, transaction information from the buyer's side and for the house being bought; the street, the price and everything else are identical, and only a few fields formed the key by which a house and a buyer could be paired with each other. Producing these pairs was the main task of the report (see the sketch below).
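
A minimal sketch of that matching, with entirely hypothetical table and column names (the real schema is not shown in the article): both record types sit in the same wide table and are paired on the handful of fields that make up the key.

 -- Hypothetical sketch: pair buyer rows with house rows inside the one wide table
 -- using the few columns that form the matching key.
 SELECT b.Street, b.Price, /* ... */ h.Price AS HousePrice
 FROM BigDataTable b
 JOIN BigDataTable h
     ON  h.KeyField1 = b.KeyField1
     AND h.KeyField2 = b.KeyField2
 WHERE b.RecordType = 'Buyer'
   AND h.RecordType = 'House'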
SSRS reports, which:

- query the database directly, selecting data from the views for a given date range. You can generate a report for one day or for a whole month. For example:

SELECT a.col1, a.col2, ..., a.col100 FROM vw_Data_With_User_Info a WHERE a.TimeStamp >= @StartDate AND a.TimeStamp <= @EndDate 


Additional conditions:

- No access to UAT;
- The DEV database contains only a month of test data; there is no access to real data;
- No rights to use the Profiler.

Task


- The report for one day takes about 3 minutes to load; it needs to take about 40 seconds;
- The report for a month takes about an hour to load, or does not load at all; just fix it however you can.

Solutions:


It should be noted that I had never even read about SSRS before, so perhaps genuinely good programmers can already see why it performs so badly, but it took me about a month to reach the solution.

Attempt 1

First, the most logical solution was to reduce the number of columns in the report. After all, nobody can sensibly review 100 columns of data for 3000 records, even grouped by company and user. We talked with the customer, worked out which columns were really needed, and it turned out that their number could be cut to 16 completely painlessly. The Lite report was born.
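
For illustration, the Lite query was essentially the same selection narrowed to the agreed columns; a sketch with placeholder column names:

 SELECT a.col1, a.col2, ..., a.col16
 FROM vw_Data_With_User_Info a
 WHERE a.TimeStamp >= @StartDate AND a.TimeStamp <= @EndDate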
The result of attempt 1: Report_Lite for one day takes 40 seconds to 1 minute 20 seconds, depending on the server's mood and the alignment of the stars. For a month, half an hour to an hour. The customers quieted down for a while, content with the result.

Attempt 2

Over the next month I gradually tried to reduce the report loading time by putting the views and queries in order. For example, I moved all the calculations, together with the logic for matching a client to a house, into the console application, since nobody had yet set any speed requirements for it. After that a couple of columns were added to the table, MatchingKey among them, and the rate of report generation on UAT dropped sharply. I should also note that over that month the generation time slowly crept from 1 minute (on average) to 1:30. Then the newly added column started to bother me. I saw that for the records inserted before my release MatchingKey was NULL, which naturally forced the query to crawl through almost a million records without finding a match. On top of that, I noticed that the view was executed first, without any restriction on date and time, and only then was the requested period selected from it. Fixing these two things should, by my logic, have significantly improved report generation; indeed it more or less worked on the DEV server, fitting into 40 seconds, but it had no effect on UAT.
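
One way to apply the date filter inside, rather than on top of the finished view, is an inline table-valued function; a minimal sketch, assuming hypothetical table and column names (the article does not show the real schema):

 -- Hypothetical sketch: the date filter is applied to the base table before the join,
 -- instead of being applied to the output of a view that has already read everything.
 CREATE FUNCTION fn_Data_With_User_Info (@StartDate DATETIME, @EndDate DATETIME)
 RETURNS TABLE
 AS
 RETURN
 (
     SELECT d.col1, d.col2, /* ... */ u.UserName
     FROM BigDataTable d
     JOIN UserInfo u ON u.UserId = d.UserId
     WHERE d.TimeStamp >= @StartDate AND d.TimeStamp <= @EndDate
 )

The report query would then select from fn_Data_With_User_Info(@StartDate, @EndDate) instead of filtering the view afterwards.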

The result of attempt 2: virtually invisible.

Attempt 3 (success and universal acclaim)

After I disappointed my superiors with attempt two, I received specific instructions: drop everything and work only on performance. The first idea was a COMPLETE REWORK OF THE APPLICATION. I wanted to split the big table into two, mirroring the source files, and combine them into a third by id, move all calculations entirely into the .NET app, remove as many groupings from the reports as possible and, in general, do everything as the Pasta God commands. I spent a week on this, but when I got to the data-storage part I started experimenting with tables and queries, and it turned out there was practically no difference in query speed between selecting from the view and joining two separate tables; at least nothing that would explain a report loading for a whole minute. For 3000 records the SQL query ran in 3 to 5 seconds. So the problem must be in grouping the data on the client side? We remove all groupings, a clean report without even sorting: minus a couple of seconds of generation time. So grouping accounts for a couple of seconds, the SQL query for 3 to 5. Where does almost a minute go?
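
A minimal sketch of the two-table experiment, with hypothetical table names: each record type lives in its own table and the two are joined by the matching key, instead of being matched inside one huge table.

 -- Hypothetical sketch: the two record types in separate tables, joined by MatchingKey.
 SELECT b.col1, b.col2, /* ... */ h.col1 AS house_col1
 FROM BuyerRecords b
 JOIN HouseRecords h ON h.MatchingKey = b.MatchingKey
 WHERE b.TimeStamp >= @StartDate AND b.TimeStamp <= @EndDate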

The question is not rhetorical, and by rights such experiments should have been carried out before attempt 1, but we are where we are. We continue the experiment and start playing with the queries from the report. By that time I had already asked the admins for the right to use Profiler. I opened Profiler, but since I had no real experience with it, I did not look very carefully. Then at some point I got tired of typing @StartDate and @EndDate every time; it is annoying to click through two calendars every half a minute. So, in the report without grouping, I went into the query designer and wrote '2014-06-11' instead of @StartDate and @EndDate. Bang, 3 seconds to generate! How so? Probably caching; I cleared the cache everywhere and ran it again. 3 seconds. It cannot be. Now with grouping: 5 seconds. You have got to be kidding me.
After half an hour of head-scratching and digging through Profiler, I finally found the creature that had been tormenting me for a month, and this poor project for six.

Result


It turned out that if the report contains a query such as:

 SELECT * FROM Table WHERE time >= @StartDate AND time <= @EndDate 


On the SQL side, it will be executed like this:

 EXEC sp_executesql N'SELECT * FROM Table WHERE time >= @StartDate AND time <= @EndDate',
     N'@StartDate datetime, @EndDate datetime',
     '2014-06-11', '2014-06-11'


And a request like that takes about 1:30 to execute. Whereas when I substituted the values directly into the query, what ran was not dynamic SQL but a plain query.
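
For comparison, the hard-coded variant from the experiment above reaches the server as an ordinary ad-hoc batch and finishes in a few seconds:

 SELECT * FROM Table WHERE time >= '2014-06-11' AND time <= '2014-06-11'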

As a result, I was able to convince the client to switch to stored procedures, and now the daily report loads in 6 seconds and the monthly one in 1 minute.
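
A minimal sketch of such a procedure, with illustrative names rather than the real ones from the project; the report dataset then calls the procedure instead of embedding the query:

 -- Hypothetical sketch: the report calls a stored procedure
 -- instead of sending its query through sp_executesql.
 CREATE PROCEDURE rpt_DailyData
     @StartDate DATETIME,
     @EndDate DATETIME
 AS
 BEGIN
     SET NOCOUNT ON;
     SELECT a.col1, a.col2 /* ... the 16 Lite columns */
     FROM vw_Data_With_User_Info a
     WHERE a.TimeStamp >= @StartDate AND a.TimeStamp <= @EndDate
 END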

Conclusions




PS I realize that this is an article not so much about how to solve a problem as about how not to solve one. But I googled a lot on this subject and never found tips like these, or any hint that reports could serve up a dish like this.

PS2 I have a couple of guesses as to why such a query runs for so long, but I will be very grateful if someone who knows the exact reason explains it to me.

UPD1: I have taken the complaints about this article into account and tried to fix all the wording that drew criticism.

UPD2: Today, in a new report that had already been converted to a stored procedure, I ran into the problem again. This call:

 exec rOtex_Mis_DailyHighLevelReport @StartDate='2014-06-01 00:00:00',@EndDate='2014-06-30 00:00:00' 

took 6 minutes when run from the report, and 1 second when run by hand. So I went to Google with a specific query, “SSRS stored procedure slow”, and got a sensible answer. Shame on my clumsy googling; the right answer was lying right on the surface:
stackoverflow.com/questions/2283943/fast-query-runs-slow-in-ssrs
The thing is, this is a well-known SQL Server problem related to parameter sniffing, as comrades BelAnt and microuser rightly pointed out to me, and the workaround is pretty ugly and, as often happens with crap code, simple. Here is an example from my stored procedure:

 @StartDate DATE, @EndDate DATE
 AS
 BEGIN
     -- copy the parameters into local variables and filter on those instead,
     -- so the optimizer cannot sniff the actual parameter values
     DECLARE @Start DATETIME
     DECLARE @End DATETIME
     SET @Start = @StartDate
     SET @End = @EndDate


And it works.

Conclusions 2.0


- google properly, damn it

UPD3: After several more tests it turned out that the stored procedures were not such a brilliant idea either. It is enough to add OPTION (RECOMPILE) at the end of the query, as comrade BelAnt advised. I will not write conclusions here; if you have no brain, conclusions will not help you anyway :(
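
A sketch of that variant on the example query from above; OPTION (RECOMPILE) makes SQL Server build a fresh plan for the actual parameter values on every execution:

 SELECT * FROM Table
 WHERE time >= @StartDate AND time <= @EndDate
 OPTION (RECOMPILE)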

Source: https://habr.com/ru/post/230939/

