
Honest generation of DOCX files in PHP. Part 2

Hello, dear Habr community!
Let's continue the story of generating DOCX files with PHP.

What awaits us today:

If you haven't read the first part yet, I recommend starting there. Everyone else, read on below the cut.

Again


But first things first. Since the previous article was published, it has collected a fair number of comments, some emotional and some to the point, and the PHPDocx project on GitHub has several forks. All this suggests that the topic is quite relevant. Still, some developers miss the very essence of my approach, which is to use inheritance: the generator class must inherit from ZipArchive. Look, if you don't want to use inheritance, install PHP 5.4 and use traits, after all! Either way is incomparably better than routing every call through a single property:
$this->archive->open( … );
$this->archive->addFile( … );
$this->archive->close();
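For comparison, here is a minimal sketch of the inheritance variant. The generator *is* a ZipArchive, so the archive calls go straight to `$this`. The class name `PackageWriter` and its `write()` method are hypothetical. Only standard ZipArchive methods are used.

```php
<?php
// Sketch of the inheritance approach: the generator IS a ZipArchive,
// so archive methods are called on $this instead of $this->archive.
// PackageWriter and write() are hypothetical names for illustration.
class PackageWriter extends ZipArchive
{
    /** Creates (or overwrites) $filename and stores each name => contents pair in it. */
    public function write(string $filename, array $parts): bool
    {
        if ($this->open($filename, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
            return false;
        }
        foreach ($parts as $name => $contents) {
            $this->addFromString($name, $contents);
        }
        // close() is what actually writes the archive to disk
        return $this->close();
    }
}

// Usage: write a one-part package to a temporary location
$file = sys_get_temp_dir() . '/package_' . uniqid() . '.zip';
$writer = new PackageWriter();
$ok = $writer->write($file, ['word/document.xml' => '<w:document/>']);
```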

Why generate DOCX in PHP at all? Some developers don't understand why it is needed. My motivation was the ability to save a web page in Word format; personally, I use my class to save Yandex.Metrica reports as DOCX. User seriyPS asked why I split the text by lines. I did it assuming the text is a field from the database, where a line break marks a new paragraph. For clarity, we won't do this anymore: split the text into paragraphs yourself.
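If you do want the old behavior, splitting is easy to do outside the library. Here is a small sketch that treats every line break as a paragraph boundary and drops blank lines. The function name `split_paragraphs` is hypothetical. Each resulting string could then be passed to `assign()` one by one.

```php
<?php
// Splits a text field (e.g. a value from the database) into paragraphs:
// every line break is a paragraph boundary, blank lines are dropped.
// split_paragraphs is a hypothetical helper, not part of PHPDocx.
function split_paragraphs(string $text): array
{
    // \R matches any line break sequence: \r\n, \n or \r
    $lines = preg_split('/\R/u', $text);
    // Trim each line, drop the empty ones, and reindex the result
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

$paragraphs = split_paragraphs("First line\r\nSecond line\n\nThird line");
```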
In addition, our generator should have the most convenient API possible, and I think I managed that. The API consists of only three methods: the constructor, assign, and create.
Enough talk. Let's get started.

What's new


First, I significantly reworked the code from the previous article and turned it into a full-featured open-source library (links at the end). Now, point by point:

1. The OfficeDocument and WordDocument classes


As we have already seen, the files the MS Office document needs as a whole are stored in the archive root, while the word/ folder contains the files specific to MS Office Word. The solution suggests itself: make a common class for MS Office documents and a subclass specifically for Word documents.
Here is the structure straight away:

// Common class for all MS Office documents
class OfficeDocument extends ZipArchive {
    public function __construct( $filename, $template_path = '/template/' ) { /* … */ }
    protected function add_rels( $filename, $rels, $path = '' ) { /* … */ }
    protected function pparse( $replace, $content ) { /* … */ }
}

// Class for MS Word documents
class WordDocument extends OfficeDocument {
    public function __construct( $filename, $template_path = '/template/' ) { /* … */ }
    // Public methods that make up the API
    public function assign( $content = '', $return = false ) { /* … */ }
    public function create() { /* … */ }
}

Why? It leaves room for the future, when we will generate MS Excel files with an XlsxDocument class.
Now let's take the internals apart.

2. Building relationships dynamically


Inside a docx file there are two relationship files, _rels/.rels and word/_rels/document.xml.rels, which attach files to the document. A file that is not described in these structures is simply dead weight inside the docx, so you can even hide information in a docx this way. In the constructors we build arrays describing the relationships between the XML parts. Here, for example, are the relationships for the MS Office document:

// Relationships of the MS Office document
$this->rels = array_merge( $this->rels, array(
    'rId3' => array(
        'http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties',
        'docProps/app.xml'
    ),
    'rId2' => array(
        'http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties',
        'docProps/core.xml'
    ),
) );

Each file is identified by its "rIdN" key. The app.xml and core.xml files are static, so we simply pack them into the archive with the add_rels method, which at the same time generates the XML file describing the relationships:

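// Adds the files listed in $rels to the archive and writes the relationships file
protected function add_rels( $filename, $rels, $path = '' ){
    // Opening of the relationships XML
    $xmlstring = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">';
    // Walk through all relationships
    foreach( $rels as $rId => $params ){
        // Source file: taken from the template folder unless $params[2] gives an explicit path
        $pathfile = empty( $params[2] ) ? $this->path . $path . $params[1] : $params[2];
        // Add the file to the archive
        if( $this->addFile( $pathfile , $path . $params[1] ) === false )
            die( 'Could not add file ' . $path . $params[1] );
        // Describe the relationship
        $xmlstring .= '<Relationship Id="' . $rId . '" Type="' . $params[0] . '" Target="' . $params[1] . '"/>';
    }
    $xmlstring .= '</Relationships>';
    // Write the relationships file into the archive
    $this->addFromString( $path . $filename, $xmlstring );
}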
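To see the XML-building half of add_rels on its own, here is a standalone sketch. It takes the same `rId => [type, target]` array and returns the relationships document without touching any archive. The name `build_rels_xml` is hypothetical. The sketch also escapes attribute values with `htmlspecialchars`, which the original method does not do.

```php
<?php
// Builds a relationships document from an array shaped like $this->rels:
// rId => [relationship type URI, target path]. Hypothetical helper;
// unlike add_rels it escapes attribute values and adds no files.
function build_rels_xml(array $rels): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
         . '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">';
    foreach ($rels as $rId => $params) {
        $xml .= sprintf(
            '<Relationship Id="%s" Type="%s" Target="%s"/>',
            htmlspecialchars($rId, ENT_QUOTES | ENT_XML1, 'UTF-8'),
            htmlspecialchars($params[0], ENT_QUOTES | ENT_XML1, 'UTF-8'),
            htmlspecialchars($params[1], ENT_QUOTES | ENT_XML1, 'UTF-8')
        );
    }
    return $xml . '</Relationships>';
}

$xml = build_rels_xml([
    'rId1' => [
        'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument',
        'word/document.xml',
    ],
]);
```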

Note that add_rels is defined in OfficeDocument and used by both classes, OfficeDocument and WordDocument, because a docx file contains two relationship files describing dependencies. This is where the OOP approach I proposed pays off; the approach suggested by VolCh definitely doesn't fit here.
The result is a typical _rels/.rels like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>

We generate word/document.xml ourselves and link it in dynamically. I hope dynamic relationship building is clear now. On to inserting images.
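The article does not show how the accumulated paragraphs become word/document.xml. Here is one plausible sketch, under the assumption that the body XML is simply wrapped in a `w:document`/`w:body` envelope. The function name `wrap_document` is hypothetical. The namespace URIs are the standard WordprocessingML ones. `r:` and `wp:` are declared because the image fragment uses `r:embed` and `wp:inline`.

```php
<?php
// Wraps accumulated paragraph XML into a complete document.xml.
// Hypothetical helper: the real create() method may assemble it differently.
function wrap_document(string $bodyXml): string
{
    return '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        . '<w:document'
        . ' xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
        . ' xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"'
        . ' xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">'
        . '<w:body>' . $bodyXml . '</w:body>'
        . '</w:document>';
}

$doc = wrap_document('<w:p><w:r><w:t>Hello</w:t></w:r></w:p>');
```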

Learning to embed images


First, here is an XML fragment, found by experiment, that has to be inserted into document.xml to get an image in a Word document:

<w:p w:rsidR="000E3348" w:rsidRDefault="00CD6FED">
  <w:r>
    <w:rPr>
      <w:noProof/>
      <w:lang w:eastAsia="ru-RU"/>
    </w:rPr>
    <w:drawing>
      <wp:inline distT="0" distB="0" distL="0" distR="0">
        <wp:extent cx="{WIDTH}" cy="{HEIGHT}"/>
        <wp:effectExtent l="19050" t="0" r="0" b="0"/>
        <wp:docPr id="2" name="Picture 2"/>
        <wp:cNvGraphicFramePr>
          <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
        </wp:cNvGraphicFramePr>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
              <pic:nvPicPr>
                <pic:cNvPr id="0" name="image.jpg"/>
                <pic:cNvPicPr/>
              </pic:nvPicPr>
              <pic:blipFill>
                <a:blip r:embed="{RID}">
                  <a:extLst>
                    <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
                      <a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" val="0"/>
                    </a:ext>
                  </a:extLst>
                </a:blip>
                <a:stretch>
                  <a:fillRect/>
                </a:stretch>
              </pic:blipFill>
              <pic:spPr>
                <a:xfrm>
                  <a:off x="0" y="0"/>
                  <a:ext cx="{WIDTH}" cy="{HEIGHT}"/>
                </a:xfrm>
                <a:prstGeom prst="rect">
                  <a:avLst/>
                </a:prstGeom>
                <a:noFill/>
                <a:ln>
                  <a:noFill/>
                </a:ln>
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>

We need to replace {RID} with the identifier of the attached image and fill in {WIDTH} and {HEIGHT}.
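The placeholder substitution is done by `pparse`, whose body the article does not show. Below is a minimal sketch of the same idea under a hypothetical name, `fill_template`. It uses `strtr`, which replaces every key in a single pass, so a substituted value is never rescanned for placeholders. Text values should be escaped before substitution so that characters like `&` or `<` do not break document.xml.

```php
<?php
// Replaces every {KEY} placeholder in $template with its value.
// fill_template is a hypothetical stand-in for pparse, not its actual code.
function fill_template(string $template, array $values): string
{
    // strtr with an array does all replacements in one pass
    return strtr($template, array_map('strval', $values));
}

// Numeric placeholders, as in the image fragment
$ext = fill_template('<a:ext cx="{WIDTH}" cy="{HEIGHT}"/>', ['{WIDTH}' => 952500, '{HEIGHT}' => 476250]);
// Text must be XML-escaped before it is substituted
$p = fill_template('<w:t>{TEXT}</w:t>', ['{TEXT}' => htmlspecialchars('Tom & Jerry', ENT_QUOTES | ENT_XML1, 'UTF-8')]);
```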
A single API method, assign, is responsible for inserting both images and text:

public function assign( $content = '', $return = false ){
    // If $content is a path to an existing file, insert it as an image; otherwise as text
    if( is_file( $content ) ){
        // Load the image paragraph template
        $block = file_get_contents( $this->path . 'image.xml' );
        list( $width, $height ) = getimagesize( $content );
        $rid = "rId" . count( $this->word_rels ) . 'i';
        $this->word_rels[$rid] = array(
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image",
            "media/" . $content,
            // Source path of the image file to pack into the archive
            $content
        );
        $xml = $this->pparse( array(
            '{WIDTH}'  => $width * $this->px_emu,
            '{HEIGHT}' => $height * $this->px_emu,
            '{RID}'    => $rid,
        ), $block );
    }
    else{
        // Load the text paragraph template
        $block = file_get_contents( $this->path . 'p.xml' );
        $xml = $this->pparse( array(
            '{TEXT}' => $content,
        ), $block );
    }
    // Either return the generated XML or append it to the document body
    if( $return ) return $xml;
    else $this->content .= $xml;
}

Anyone reading the code will notice that the method uses a tricky unit system called English Metric Units (EMU); you can read about it on the English Wikipedia. In short, you get EMU from px by multiplying by a constant. Wikipedia gives this constant as 12,700, but I found experimentally that it is 8,625: with that multiplier the picture was displayed pixel for pixel. Strictly speaking, 12,700 is the number of EMUs in one point (1/72 inch); since one inch is 914,400 EMU, a pixel is 914,400 / DPI EMU.
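The conversion itself is simple arithmetic. One inch is 914,400 EMU, so one pixel is 914,400 / DPI EMU. That gives 9,525 at 96 DPI and 12,700 at 72 DPI, where a pixel equals a point. For comparison, the empirical 8,625 corresponds to roughly 106 DPI (914,400 / 8,625 ≈ 106). The helper name and the 96 DPI default below are assumptions for illustration.

```php
<?php
// Converts pixels to English Metric Units for a given resolution.
// 1 inch = 914400 EMU, so one pixel = 914400 / $dpi EMU.
// px_to_emu and the 96 DPI default are illustrative assumptions.
function px_to_emu(int $px, int $dpi = 96): int
{
    // Integer division; exact for common DPI values such as 72 and 96
    return intdiv($px * 914400, $dpi);
}

$w96 = px_to_emu(100);      // 952500
$w72 = px_to_emu(100, 72);  // 1270000
```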
And, of course, we add the image file itself to the relationship structure:

$rid = "rId" . count( $this->word_rels ) . 'i';
$this->word_rels[$rid] = array(
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image",
    "media/" . $content,
    // Source path of the image file to pack into the archive
    $content
);
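The same registration step can be pulled out into a small function. This sketch keeps the article's ID scheme, "rId" + current count + "i", and the `[type, target, source]` layout of `word_rels`. The helper name `add_image_rel` is hypothetical. One deliberate difference: the original reuses the source path as is, while this sketch uses `basename()`, so every image lands directly in word/media/. The trade-off is that two images with the same file name in different folders would collide.

```php
<?php
// Registers an image relationship in an array laid out like word_rels:
// rId => [type URI, target inside word/, source path on disk].
// Hypothetical helper; uses basename() for the target, unlike the original.
function add_image_rel(array &$rels, string $sourcePath): string
{
    // Same ID scheme as in assign(): "rId" + current count + "i"
    $rid = 'rId' . count($rels) . 'i';
    $rels[$rid] = [
        'http://schemas.openxmlformats.org/officeDocument/2006/relationships/image',
        'media/' . basename($sourcePath),
        $sourcePath,
    ];
    return $rid;
}

$rels = [];
$first = add_image_rel($rels, 'img/photo.jpg');
$second = add_image_rel($rels, 'img/logo.png');
```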


As a result


In the end, we have a complete library. It can now be used like this:

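// Include the library
include 'PHPDocx_0.9.2.php';
// Create a document; the argument is the output file name (example)
$w = new WordDocument( "document.docx" );
// Ways to call assign:
/******************************
$w->assign( 'text' );                            // add a text paragraph
$w->assign( 'image.png' );                       // add an image
$xml = $w->assign( 'image.png', true );          // return the XML instead of appending it
$w->assign( $w->assign( 'image.png', true ) );   // pass the returned XML back in
******************************/
$w->assign( 'image.jpg' );
$w->assign( 'Sample paragraph text.' );
$w->create();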
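To check what actually ended up in a generated package, you can read its relationships back. The sketch below opens a docx, reads _rels/.rels, and returns the declared targets keyed by Id. The function name `package_targets` is hypothetical; it relies only on ZipArchive and SimpleXML. The usage part builds a tiny test package so that the example runs without the library.

```php
<?php
// Returns the relationship targets declared in a package's _rels/.rels,
// keyed by Id. Hypothetical helper for inspecting a generated docx.
function package_targets(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return [];
    }
    $xml = $zip->getFromName('_rels/.rels');
    $zip->close();
    if ($xml === false) {
        return [];
    }
    $targets = [];
    foreach (simplexml_load_string($xml)->Relationship as $rel) {
        $targets[(string) $rel['Id']] = (string) $rel['Target'];
    }
    return $targets;
}

// Usage: build a minimal package in a temp file and inspect it
$path = sys_get_temp_dir() . '/check_' . uniqid() . '.docx';
$zip = new ZipArchive();
$zip->open($path, ZipArchive::CREATE);
$zip->addFromString('_rels/.rels', '<?xml version="1.0" encoding="UTF-8"?>'
    . '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    . '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    . '</Relationships>');
$zip->close();
$targets = package_targets($path);
```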

That's basically it.
Next on the roadmap: generating tables.
References:
PHPDocx on GitHub.
PHPDocx project page.
Download the source.

Source: https://habr.com/ru/post/140012/

