Start
One accounting and reporting system supported by our company began to grow very quickly in the amount of stored data. The system is written in PHP using the framework Yii2. Initially, reports were built through the library PhpSpreadsheet, which came to replace, has long become deprecated, PhpExcel.
Among the different types of reporting was one very large - in fact, the complete set of all data stored in the database should be unloaded into one excel-table. At the initial stage, there were no problems, but when the volume began to exceed many hundreds of thousands of records, the unloading script began to fall off at the timeout limit.
To begin with, we raised this very limit and began to look for ways to solve the problem. But the temporary solution was not long enough - the problem with the time limit turned into a problem with the memory limit. The server was thrown with "RAM" and generally removed the memory_limit for this particular operation. Very soon, users started complaining again about runtime errors. I had to remove the time limit for the full report. But to sit and watch a dozen minutes on the screen with a load indicator is not much fun. In addition, sometimes the report was needed “here and now,” and every minute spent on its formation turned out to be critical. Experiments with environment settings have been stopped, the back of the head has been scratched, and the code has been optimized.
Finding a solution
The first thing that was done was the reporting script put into the background process, and the user watches the progress through the progress bar. The background execution of tasks was implemented through the queue mechanism using Redis for storage. Work in the system does not stop, you can do other tasks and periodically return to the page with the report - see if the file is ready. Once the file is formed, the user is offered a link to download. But, as mentioned above, sometimes the file was required "immediately", and increasing usability did not solve this problem. Meanwhile, the amount of data continued to grow, and the time to build the file reached 79 minutes! This is completely unacceptable, especially considering that reporting is one of the fundamentals of the functionality of this system. No, all the other parts worked like clockwork, but this fly in the ointment spoiled the overall impression.
')
First results
Sit down again for the code analysis. The first thing that was tested was the process of selecting data from the database. But requests have already been optimized as much as possible. Although the longest request was a terrible sample with five or six calls to the monstrous FIAS, but it worked out in 2-5 seconds. The weak point was not he, but the formation of the exelnik file. Attempts to optimize this process. Starting from caching in redis, to distortions such as the formation of separate small “exellers” in parallel threads with subsequent gluing into one file. But the result was always the same: a problem with time turned into a problem with memory and vice versa. There was no golden middle ground, only flowing from one extreme to another. After a certain amount of data, the library's resource consumption began to grow exponentially and it was not possible to win. PhpSpreadsheet - not suitable for large files. As a result, it was decided to change the library. As an option - writing your analogue to form Excel-files.
Analysis and tool selection
We didn’t hurry with writing bicycles, but first we analyzed the existing solutions. Of the possible options are only interested in box / spout. Quickly rewrote the module using this library. As a result, the full report turned out in 145 seconds. Let me remind you that the latest tests with PhpSpreadsheet - 79 minutes, and here 2.5 minutes! Conducted testing: increased the amount of data 2 times. The report was generated in 172 seconds. The difference is awesome. Of course, the library does not have all the same functions as PhpSpreadsheet, but in this case the minimum set of tools is enough, since the speed of work is critical.
Extension for Yii2
The final decision was designed as an extension for Yii2. Maybe someone will come in handy. The extension allows you to upload any data set from GridView to excel while maintaining filtering and sorting. It uses yii / queue and box / spout as dependencies. It makes sense to use the extension to form really large files, well, at least 50,000 lines =) At the moment, the module that has become the basis for the extension is able to cope with a load of almost 600,000 lines.
Link to github:
Yii2 ExcelReport Extension
Thanks for attention!