When developing an electronic document management system, it was necessary to implement functions for exporting data in popular formats. In particular, in Microsoft Excel format. Export requirements were fairly simple — export data with a minimum of formatting, i.e. no merged cells, font games, etc. Export Formats XLSX and Excel XML.

In this case I will tell about
Excel XML .
So, in any system operating with tabular data, a wound or later there is a need to export data. Export objectives are different:
- data transmission in tabular form
- data analysis acc. systems
- charting
- building pivot tables, etc.
Based on the goals, a reasonable conclusion was asked that when exporting data, you should try to save or specify the type of data in the respective columns so that the application that opens the file does not try to apply formatting at its discretion. To put it another way, dates must be dates, numbers must be numbers, lines must be strings.
')
Task
Technical requirements formulated trivially:
- Arrange the export mechanism as a standalone class.
- Implement a set of functions in a class for writing cell values and a number of
- Ability to work with unlimited data.
Making the export mechanism in the form of an autonomous class is an obvious requirement, the fulfillment of which will allow in the future to share the class with other developers and use it in new projects.
Implementing a set of functions in a class for writing cell values and a row is the main requirement, which implies the creation of functions for writing cell values of the specified types and the ability to write a finished row to a file.
Ability to work with an unlimited amount of data - of course, the export class itself cannot be responsible for the recorded volume, but it must provide functions for writing data to disk and freeing up RAM for the next piece of data.
In addition to the requirements described, it was necessary to add service functions:
Implementation
First of all, when creating a class, I check the final file name and request the number of columns and rows. The file must have the correct name, and the folder in which it will be saved must exist. Everything is as usual.
The Excel XML format allows you to save information about the user who created it in the file; therefore, when creating the header, I record the name of the organization, the information about the user and the date the file was created.
public function writeDocumentProperties($organization = null, $user = null) { fwrite($this->file, '<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">'); if (!is_null($user)) { fwrite($this->file, '<Author>'.$user->description.'</Author>'); fwrite($this->file, '<LastAuthor>'.$user->description.'</LastAuthor>'); } $dt = new Datetime(); $dt_string = $dt->format('Ymd\TH:i:s\Z'); fwrite($this->file, '<Created>'.$dt_string.'</Created>'); fwrite($this->file, '<LastSaved>'.$dt_string.'</LastSaved>'); if (!is_null($organization)) fwrite($this->file, '<Company>'.$organization->name.'</Company>'); fwrite($this->file, '<Version>12.00</Version>'); fwrite($this->file, '</DocumentProperties>'); }
However, it is in this function that entities of the workflow system are used - organization (organization) and user (user). Replacing these entities with, say, string values is not a problem.
The most interesting in the title is information about styles. In Excel XML, they are implemented very conveniently, so I simply create a table with styles for rows, date / time, and hyperlinks.
public function writeStyles() { fwrite($this->file, '<Styles>');
The preparatory work is finished, you can go to the data record. Opening a worksheet is just a couple of tags, just at this moment information about the number of columns and rows is used.
public function openWorksheet() { fwrite($this->file, '<Worksheet ss:Name="Export">'); fwrite($this->file, strtr('<Table ss:ExpandedColumnCount="{col_count}" ss:ExpandedRowCount="{row_count}" x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15">', array('{col_count}'=>$this->colCount, '{row_count}'=>$this->rowCount))); }
But the recording of rows is a more interesting process. The class should work quickly and process an unlimited amount of data, because there can be thousands of records or even a million! If you want speed, work with memory, if you want an unlimited amount of data, work with a disk. To reconcile the requirements, implemented the resetRow and flushRow functions.
The first one clears the current row, after which it can again be filled with data, and the second one writes the current row to an open file on the disk. Sharing them helps balance the speed and amount of memory used.
public function resetRow() { $this->currentRow = array(); } public function flushRow() { fwrite($this->file, implode('', $this->currentRow)); unset($this->currentRow); }
Each cell is recorded by a function corresponding to the data type, namely appendCellxxx, where xxx is the data type. Valid data types are Num, String, Real, DateTime, Date, Time, Link. An example of a function for recording a numeric value:
public function appendCellNum($value) { $this->currentRow[] = '<Cell><Data ss:Type="Number">'.$value.'</Data></Cell>'; }
After recording all the data, it remains to close the worksheet and the workbook.
Application
The use of the described class is based on the export of data using the
CArrayDataProvider provider. However, assuming that the volume of exported data can be very large, a special iterator
CDataProviderIterator is used , which
iterates over the returned data by 100 records (you can specify a different number of records).
public function exportExcelXML($organization, $user, &$filename) { $this->_provider = new CArrayDataProvider(); Yii::import('ext.AlxdExportExcelXML.AlxdExportExcelXML'); $export = new AlxdExportExcelXML($filename, count($this->_attributes), $this->_provider->getTotalItemCount() + 1); $export->openWriter(); $export->openWorkbook(); $export->writeDocumentProperties($organization, $user); $export->writeStyles(); $export->openWorksheet();
In my case, each row is written to disk, which, so far, is quite acceptable, but in the future, it may require changes. For example, it would be reasonable to save not every row, but every ten or even a hundred rows at a time. Then the export rate will increase.
Speed
By the way, I learned from my own experience how important it is to assume the possibility of the existence of large amounts of data during a batch operation, such as export.
Initially, I tried to export data using a
CActiveDataProvider , which required about 240 seconds when exporting 1000 records! Changing the query to use
CArrayDataProvider reduced the export time of 1000 entries to 0.5 seconds!
Especially for this publication measured export figures.
Exported
1,626 records with
9 attributes that represent information about closed incidents (see
ITSM ).
The original view of the exported table
(sorry, the picture disappears after publication)
Result
(sorry, the picture disappears after publication)
Export figures
The size of the final file:
1 312 269Compressed file size:
141,762Elapsed time: approximately
0.5 seconds
Anyone interested, can get the source code of my class
AlxdExportExcelXML for free. Only you need to remember to correct the function
writeDocumentProperties , in order to get rid of the entities of the organization and user document management system, or use your similar entities with appropriate properties.