We continue to talk about the technologies used in our
email-marketing service Pechkin-mail.ru . One of the key tasks of any service related to customer data is to download this data to the service. For Pechkin, it is very important to quickly and easily download the address databases containing email addresses, first names, last names and other additional data.

What to use as a tool?
We used Microsoft Excel as the base standard used when importing address databases. This is explained simply:
- This is a standard tool that at the basic level is owned by 100% of computer users. Moreover, in business, it is a de facto corporate standard and is used even if on working Mac or Linux computers.
- Virtually all CRM, CMS, cloud or desktop systems are exported to Excel or CSV, which is simply converted to XLS or XLSX format.
- It is also known that “90% of software errors are half a meter from the monitor”. No offense will be told to ordinary users, but we must take into account the most basic level of training and those. support for an explanation is enough to say “Download an Excel file” and not to explain the procedure for preparing the file in the correct format.
The problem of users when importing address databases was removed. But there is already a problem directly developing.
Our pain as developers
Excel is not an open-source development, but a proprietary solution. The data format, especially in newer versions after 2007 (xlsx), is non-trivial. PHP is used at Pechkin, so we began searching for libraries that will allow us to solve this problem. But here we encountered the problem that a number of libraries do not allow reading xlsx:
- php-spreadsheetreader reads a variety of formats (.xls, .ods AND .csv)
- PHP-ExcelReader (xls only)
- PHP_Excel_Reader (xls only)
- PHP_Excel_Reader2 (xls only)
- Xls file reader commercial and xls only
- SimpleXLSX From the description is able to read xlsx, however, the author refers only to xls
- PHP Excel Explorer Commercial and xls only
')
The
PHPExcel library attracted our attention. We used it several years ago in the
SMS24X7.ru sms-mailing service . Petya Sokolov (
Petr_Sokolov ), our talented developer, wrote a wrapper for this library, correcting a number of its flaws and bugs.
The library is certainly interesting and developed. But for Pechkin, it became impossible to use it in a couple of years, when both we and our customers grew up - its disastrous demands on resources and the huge time of parsing files. For example, there are frequent cases of loading on the service of address databases> 100,000 lines with a complex structure. And if the file is already 500,000 lines and “weighs” more than 30MB?
And then we let go ...
In the process of searching, we stumbled upon the
libxl commercial library, seeing the
results of the “handicraft benchmark” at Stackoverflow.
The library is written in C ++, and thanks to the excellent object-oriented extension for PHP from
Ilia Alshanetsky , it is easy to learn and integrate (for example, it took about 3 hours to rewrite our current solution from PHPExcel to LibXL). What is very cool, considering that, unfortunately, there is no documentation from the developer, and it is necessary to use the Reflection extension.
The installation process is very simple.
cd /usr/local/src/ wget http://libxl.com/download/libxl.tar.gz tar zxfv libxl.tar.gz cd libxl-3.5.4.1/ ln -s include_c include cd ../ wget https://github.com/iliaal/php_excel/archive/master.zip unzip master.zip cd php_excel-master/ ./configure --with-excel --with-libxl-libdir=../libxl-3.5.4.1/lib64 --with-libxl-incdir=../libxl-3.5.4.1/include_c make make test make install
As a result of the compilation, you will receive the excel.so file in the / usr / lib / php5 / 20090626 / folder. Now it’s enough to create the file /etc/php5/conf.d/excel.ini with the contents.
extension=excel.so
Check whether the module is installed and restart the web server.
php -m | grep excel service lighttpd restart
In the code, everything is also very simple. You load the file and read the necessary cells. For example, like this:
$doc = new ExcelBook(); $doc->loadFile('example.xls'); for($r=$sheet->firstRow();$r<=$sheet->lastRow();$r++){ for($c=$sheet-> firstCol();$c<=$sheet-> lastCol();$c++){ echo ': '.$r.' : '.$c.' : '.$sheet->read($r,$c).'<br />'; } }
Performance Results
The absence of the need for RAM (in the process of downloading the file and reading it) was pleasantly pleased.

And here is the increase in the speed of downloading the excel file and its reading on various sizes of address bases.

These tests were carried out on xlsx files with N subscribers in one hundred hundred with email. The real address bases are even bigger and more complex and the advantage in speed and memory consumption looks even more significant.
The cost of the library is $ 199 for a developer license, but believe me, it's worth it. Of course, we recommend to anyone who is faced with the problem of importing Excel files into their service.