📜 ⬆️ ⬇️

Converting multi-page xls / xslx to csv with PHPExcel

In my projects, it was often necessary to collect data from different sources in a CSV format, and so far I did not need to receive data from several pages of xls tables, I just had a simple fgetcsv () / fputcsv (). But the day came when the task “to get data from all pages of the document” was set before me. And, as usual, I began to look for a ready-made solution in order not to build my “bicycle”. But, unfortunately, I didn’t find exactly what I needed: there was a similar solution that displayed a multi-page document on the screen, but another library was used, which, as I understood, did not support the xslx format (Excel 2007 +). A little more searching for other options, I realized that it was a bad job and decided to deal with the library myself. Combining some tips on working with the PHPExcel library into one, I got the following script. So let's get started.

First we need PHPExcel itself. Immediately, I note that the library is perfectly installed through the composer, but there is no explicitly full version of the library anywhere indicated. I selected version 1.8 with the selection method with the addition of the “inaccuracy” attribute.

In my composer.json, which I added to the “require-dev” block: {}, I got this entry:

"require-dev": {
"phpoffice/phpexcel": "~1.8"
},

')
At the moment, version 1.8.1 has been installed. Since the PHPExcel library inherits SPL, which is in PHP, starting with version 5.3, instead of the standard walkthroughs of the array of strings and document cells using foreach (), I decided to use Iterators.

We connect the library, load the document and define some initial data:

 /** Include PHPExcel */ include_once '../Classes/PHPExcel.php'; $callStartTime = microtime(true); $tmpFileName = microtime(true); $format = 'Ym-d'; // Load PHPExcel object $objPHPExcel = PHPExcel_IOFactory::load('multipage.xls'); 

Next, we get an Iterator of pages, for which we don’t need to know their number:

 $sheetsIterator = $objPHPExcel->getWorksheetIterator(); 

The iterator is bypassed using this simple construction:

 while( $sheetsIterator->valid()) { $pageNumber = $sheetsIterator->key(); $pageContent = $sheetsIterator->current(); $sheetsIterator->next(); 

Similarly, the lines and cells of the document were obtained and bypassed How nice to get data from the cell, I found here, in Habré, in the article Universal reading cells in PHPExcel . I will not describe in detail all the checks, who have a desire - you can read in this article.

Writing to a CSV file, I produce the same through this library like this.

 // Create new object to write converted data and separate documents sheets $csvPagePhpExcel = new PHPExcel(); // HERE Add Data to Object // Creating CSV writer Object and save data to file $objWriter = PHPExcel_IOFactory::createWriter($csvPagePhpExcel, 'CSV'); $objWriter->save($currentTmpFileName); 

Filling the object to write to the file I will show below in the full version of the script. The only thing I can add is: if you need to write dates again in xls / xlsx documents and specify explicit formatting , you can use the following construct when preparing the PHPExcel object:

  if ($isDate) { $csvPagePhpExcel->getActiveSheet()->getStyle($cellIterator->key().$rowIterator->key())->getNumberFormat()->setFormatCode(PHPExcel_Style_NumberFormat::FORMAT_DATE_YYYYMMDD2); $csvPagePhpExcel->getActiveSheet()->setCellValue($cellIterator->key().$rowIterator->key(), $cellValue); } else { $csvPagePhpExcel->getActiveSheet()->setCellValue($cellIterator->key().$rowIterator->key(), $cellValue); } 

Where the format is specified using the PHPExcel_Style_NumberFormat :: FORMAT_DATE_YYYYMMDD2 constant . In this case, it is the yyyy-mm-dd format, which you can immediately use in the MySql query.

By the way, here are all available library constants:

 /* Pre-defined formats */ const FORMAT_GENERAL = 'General'; const FORMAT_TEXT = '@'; const FORMAT_NUMBER = '0'; const FORMAT_NUMBER_00 = '0.00'; const FORMAT_NUMBER_COMMA_SEPARATED1 = '#,##0.00'; const FORMAT_NUMBER_COMMA_SEPARATED2 = '#,##0.00_-'; const FORMAT_PERCENTAGE = '0%'; const FORMAT_PERCENTAGE_00 = '0.00%'; const FORMAT_DATE_YYYYMMDD2 = 'yyyy-mm-dd'; const FORMAT_DATE_YYYYMMDD = 'yy-mm-dd'; const FORMAT_DATE_DDMMYYYY = 'dd/mm/yy'; const FORMAT_DATE_DMYSLASH = 'd/m/y'; const FORMAT_DATE_DMYMINUS = 'dm-y'; const FORMAT_DATE_DMMINUS = 'd-m'; const FORMAT_DATE_MYMINUS = 'm-y'; const FORMAT_DATE_XLSX14 = 'mm-dd-yy'; const FORMAT_DATE_XLSX15 = 'd-mmm-yy'; const FORMAT_DATE_XLSX16 = 'd-mmm'; const FORMAT_DATE_XLSX17 = 'mmm-yy'; const FORMAT_DATE_XLSX22 = 'm/d/yy h:mm'; const FORMAT_DATE_DATETIME = 'd/m/yh:mm'; const FORMAT_DATE_TIME1 = 'h:mm AM/PM'; const FORMAT_DATE_TIME2 = 'h:mm:ss AM/PM'; const FORMAT_DATE_TIME3 = 'h:mm'; const FORMAT_DATE_TIME4 = 'h:mm:ss'; const FORMAT_DATE_TIME5 = 'mm:ss'; const FORMAT_DATE_TIME6 = 'h:mm:ss'; const FORMAT_DATE_TIME7 = 'i:s.S'; const FORMAT_DATE_TIME8 = 'h:mm:ss;@'; const FORMAT_DATE_YYYYMMDDSLASH = 'yy/mm/dd;@'; const FORMAT_CURRENCY_USD_SIMPLE = '"$"#,##0.00_-'; const FORMAT_CURRENCY_USD = '$#,##0_-'; const FORMAT_CURRENCY_EUR_SIMPLE = '[$EUR ]#,##0.00_-'; 

As a result, I got a script that writes each page of the document in a separate csv-file, while receiving the correct cell values ​​regardless of the availability of external data and formats the date in a suitable format.

Here it is:

 <?php /** Include PHPExcel */ include_once '../Classes/PHPExcel.php'; $callStartTime = microtime(true); // Load PHPExcel object $objPHPExcel = PHPExcel_IOFactory::load('multipage.xls'); // Get all document sheets $sheetsIterator = $objPHPExcel->getWorksheetIterator(); $tmpFileName = microtime(true); // Date format ready to import in SQL database $format = 'Ym-d'; while( $sheetsIterator->valid()) { $currentTmpFileName = "/tmp/{$tmpFileName}_sheet_{$sheetsIterator->key()}.csv"; echo $sheetsIterator->key() . '<hr />'; // Get current sheet rows $rowIterator = $sheetsIterator->current()->getRowIterator(); // Create new object to write converted data and separate documents sheets $csvPagePhpExcel = new PHPExcel(); while ($rowIterator->valid()) { // Get Cells from current Rows $cellIterator = $rowIterator->current()->getCellIterator(); echo '<br />' . $rowIterator->key() .'-'; while ($cellIterator->valid()) { $cellValue = $cellIterator->current()->getCalculatedValue(); //check is date if(PHPExcel_Shared_Date::isDateTime($cellIterator->current())) { $cellValue = date($format, PHPExcel_Shared_Date::ExcelToPHP($cellValue)); } //for incorrect formulas take old value if((substr($cellValue,0,1) === '=' ) && (strlen($cellValue) > 1)){ $cellValue = $cellIterator->current()->getOldCalculatedValue(); } $currentCellNum = PHPExcel_Cell::columnIndexFromString($cellIterator->key()); echo $cellIterator->key() . '(' . $currentCellNum . ') => ' . $cellValue; $csvPagePhpExcel->getActiveSheet()->setCellValue($cellIterator->key().$rowIterator->key(), $cellValue); $cellIterator->next(); } $rowIterator->next(); } // Creating CSV writer Object and save data to file $objWriter = PHPExcel_IOFactory::createWriter($csvPagePhpExcel, 'CSV'); $objWriter->save($currentTmpFileName); // clearing trash $csvPagePhpExcel->__destruct(); unset($csvPagePhpExcel); $objWriter = ''; unset($objWriter); $sheetsIterator->next(); } $callEndTime = microtime(true); $callTime = $callEndTime - $callStartTime; echo $callTime; 

At the stage of testing, all values ​​are displayed on the screen, then who does not need to naturally output can be removed.

Criticism, additions and corrections are welcome. Thank you all, and I will be very happy if my article will help someone and shorten a few hours of work.

Source: https://habr.com/ru/post/269051/


All Articles