
Converting HTML with images to DOC in PHP by hand

The article "Not Very Fair DOC File Generation in PHP" described how to generate a DOC file as MHT (MIME HTML) with the help of a third-party library. Today I will describe my own way of generating files in this format. My method has the following advantages:

1) In OpenOffice, both the text and the pictures are readable.
2) In Word, the file opens in the normal document view rather than full screen.
3) The script accepts HTML and immediately sends the DOC file for download.

On top of that, you will see how to convert plain HTML to MHT by hand. If errors come up, they will be easier to track down in your own code.
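To make the MHT idea concrete, here is a minimal sketch of such a file: a MIME header, one HTML part delimited by a boundary, and the closing boundary. The helper name `buildMinimalMht`, the boundary `example_boundary`, and the UTF-8 charset are assumptions made for illustration. The sketch relies on PHP's built-in `quoted_printable_encode()` instead of the hand-written escaping used in the rest of this article.

```php
<?php
// Hypothetical helper (not part of the article's script): builds a minimal
// multipart/related document that contains a single HTML part.
function buildMinimalMht(string $html, string $boundary = 'example_boundary'): string
{
    $lines = [
        'MIME-Version: 1.0',
        'Content-Type: multipart/related; boundary="' . $boundary . '"',
        '',                                   // blank line ends the top-level headers
        '--' . $boundary,                     // start of the HTML part
        'Content-Type: text/html; charset="utf-8"',
        'Content-Transfer-Encoding: quoted-printable',
        '',                                   // blank line ends the part headers
        quoted_printable_encode($html),       // "=" becomes "=3D", long lines are wrapped
        '',
        '--' . $boundary . '--',              // closing boundary
    ];
    return implode("\r\n", $lines) . "\r\n";
}

$mht = buildMinimalMht('<html><body><p class="x">Hello</p></body></html>');
echo $mht;
```

Every additional part (an image, a file list) would follow the same pattern: a `--boundary` line, the part's headers, a blank line, then the encoded body.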
Let's start with the function that sends the DOC file for download and works in all browsers and over all protocols (I had problems with this):

/* Send the HTTP headers that make the browser download the file */
function send_download($filename, $charset = 'cp1251')
{
    header($_SERVER['SERVER_PROTOCOL'] . ' 200 OK');
    header('Content-Type: application/msword; charset=' . $charset);
    /* ereg() no longer exists in PHP 7+, so the browser check uses preg_match() only */
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $isMSIE = preg_match('@MSIE ([0-9]\.[0-9]{1,2})@', $ua);
    header('Content-Disposition: attachment; filename="' . $filename . '"');
    if ($isMSIE) {
        /* Internet Explorer needs these headers to download over HTTPS */
        header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
        header('Pragma: public');
    } else {
        header('Pragma: no-cache');
    }
}




Now let's move on to generating the DOC file itself. For this we create a form that submits HTML with pictures; the pictures are hosted on our website.

<form action="getFile.php" method="POST">
    <textarea name="text" rows="10" cols="60"><?=str_replace(array('<', '>'), array('&lt;', '&gt;'), '   <b></b><img src="/images/logo.gif">');?></textarea>
    <input type="submit" value="HTML TO DOC"/>
</form>


We will embed the images as base64. For that we create a callback function:

/* Callback: embed one image into the document as a base64 MIME part */
function prepareImage($matches)
{
    global $IMAGES, $IMAGE_NAMES, $IMAGE_COUNT, $gldir, $SITE;
    /* $matches[2] is the src path; [1] and [3] are the attributes around it */
    $imgfile = $_SERVER['DOCUMENT_ROOT'].'/'.$matches[2];
    $imgbinary = file_get_contents($imgfile);
    /* Absolute URL of the original image */
    $url = $SITE.$matches[2];
    $data = chunk_split(base64_encode($imgbinary));
    $IMAGE_COUNT++;
    /* Extension is everything after the last dot */
    $ext = substr($matches[2], strrpos($matches[2], '.') + 1);
    $imgName = 'images'.$IMAGE_COUNT.'.'.$ext;
    /* A MIME part: boundary, headers, blank line, body */
    $IMAGES .= '
--doc_file_part_na_habrahabr
Content-Location: '.$gldir.'images/'.$imgName.'
Content-Transfer-Encoding: base64
Content-Type: image/'.$ext.'

'.$data.'
';
    $pr1 = $matches[1];
    $pr2 = $matches[3];
    $IMAGE_NAMES .= '
 <o:File HRef=3D"'.$imgName.'"/>';
    return '<v:imagedata src=3D"'.$gldir.'images/'.$imgName.'" o:href=3D"'.$url.'"/></v:shape><![endif]--><![if !vml]><span style=3D"mso-ignore:vglayout"><img border=3D0 src=3D"'.$gldir.'images/'.$imgName.'" alt=3DHaut v:shapes=3D"_x0000_i1057" '.$pr1.' '.$pr2.'></span><![endif]>';
}
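The image part produced by the callback can also be sketched without globals or file access, which makes the structure easier to test. The helper names `buildImagePart` and `imageMimeType` are hypothetical and not part of the article's script. The sketch also maps the extension to a MIME type, on the assumption that a `.jpg` file should be labeled with the registered type `image/jpeg` rather than `image/jpg`.

```php
<?php
// Hypothetical helper: wraps raw image bytes in one MIME part.
// chunk_split() breaks the base64 text into 76-character lines ending in CRLF.
function buildImagePart(string $binary, string $location, string $mimeType, string $boundary): string
{
    return '--' . $boundary . "\r\n"
        . 'Content-Location: ' . $location . "\r\n"
        . "Content-Transfer-Encoding: base64\r\n"
        . 'Content-Type: ' . $mimeType . "\r\n"
        . "\r\n"                              // blank line separates headers from body
        . chunk_split(base64_encode($binary), 76, "\r\n");
}

// Hypothetical helper: maps a file extension to a MIME type.
function imageMimeType(string $path): string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $map = [
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'png'  => 'image/png',
        'gif'  => 'image/gif',
    ];
    // Unknown extensions fall back to a generic binary type.
    return $map[$ext] ?? 'application/octet-stream';
}

// Usage with in-memory bytes instead of a real file:
$part = buildImagePart('GIF89a', 'file:///C:/AF22D505/images/images1.gif',
    imageMimeType('/images/logo.gif'), 'doc_file_part_na_habrahabr');
echo $part;
```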


I apologize in advance for the code being written as plain functions with all data stored in global variables: it dates from the time when I was just starting out with PHP. Next we create a function that prepares the text so that it is at least readable in OpenOffice.

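/* Prepare the HTML for the quoted-printable MIME part */
function xml_entities($text, $charset = 'cp1251')
{
    global $SITE;
    /* Replace every <img> tag with markup that points to an embedded image */
    $text = preg_replace_callback(
        '/<img([a-zA-Z0-9:\/\.\-\?=_&\s;"]*)src="([a-zA-Z\d\s:\/\.\-\?=_&]*)"([a-zA-Z0-9:\/\.\-\?=_&\s;"]*)>/',
        "prepareImage",
        $text
    );
    /* Make links absolute by prepending the site address */
    $text = preg_replace('/href="/', 'href=3D"'.$SITE, $text);
    /* Escape "=" as =3D (3D is the hex code of "=" in quoted-printable) */
    $text = preg_replace('/=(?=[^3])/', '=3D', $text);
    $text = preg_replace('/\s?=\s?"/', '=3D"', $text);
    /* Encode non-ASCII characters as entities so the text is readable in OpenOffice */
    $text = htmlentities($text, ENT_COMPAT, $charset);
    /* htmlentities() also escaped the markup itself, so restore it */
    $fi = array("&quot;", "&amp;", "&#039;", "&lt;", "&gt;");
    $re = array('"', "&", "'", "<", ">");
    return str_replace($fi, $re, $text);
}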
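The `=3D` escaping done by hand above is the quoted-printable encoding of the `=` character. As a point of comparison, here is a small sketch using PHP's built-in `quoted_printable_encode()` and `quoted_printable_decode()`. The sample HTML string and the `example.com` URL are made up for illustration.

```php
<?php
// Sample markup (an assumption for this sketch, not taken from the article).
$html = '<a href="http://example.com/?a=1">link</a>';

// Every "=" is escaped as "=3D"; other printable ASCII is left untouched.
$encoded = quoted_printable_encode($html);
echo $encoded, "\n";   // <a href=3D"http://example.com/?a=3D1">link</a>

// Decoding restores the original string exactly.
$decoded = quoted_printable_decode($encoded);
var_dump($decoded === $html);   // bool(true)
```

One difference worth knowing: the lookahead `(?=[^3])` in the hand-written version leaves an `=` followed by `3` untouched, so an unquoted attribute such as `width=300` would likely be misread by a quoted-printable decoder, while the built-in function escapes every `=`. It also wraps lines longer than 76 characters with soft line breaks.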




Now the main script:

global $SITE;
/* Site address, prepended to links and image URLs */
$SITE = 'http://pihpi.ru';

function htmlToDoc($name, $html, $charset = 'cp1251')
{
    global $IMAGES, $IMAGE_NAMES, $IMAGE_COUNT, $gldir, $SITE;
    $nameFile = $name.'.doc';
    $IMAGE_COUNT = 0;
    $IMAGE_NAMES = '';
    $IMAGES = '';
    $gldir = 'file:///C:/AF22D505/';
    /* doc_file_part_na_habrahabr is the boundary that separates the parts
       of the MIME document. Headers and body of each part are divided by
       a blank line. */
    $head = 'MIME-Version: 1.0
Content-Type: multipart/related; boundary="doc_file_part_na_habrahabr"

--doc_file_part_na_habrahabr
Content-Location: '.$gldir.$nameFile.'
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="windows-1251"

<html xmlns:o=3D"urn:schemas-microsoft-com:office:office"
xmlns:w=3D"urn:schemas-microsoft-com:office:word"
xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=3DContent-Type
content=3D"text/html; charset=3Dwindows-1251">
<meta name=3DProgId content=3DWord.Document>
<meta name=3DGenerator content=3D"Microsoft Word 11">
<meta name=3DOriginator content=3D"Microsoft Word 11">
<link rel=3DFile-List href=3D"filelist.xml">
<!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:View>Print</w:View>
  <w:GrammarState>Clean</w:GrammarState>
  <w:ValidateAgainstSchemas/>
  <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
  <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
  <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
  <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
 </w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:LatentStyles DefLockedState=3D"false" LatentStyleCount=3D"156">
 </w:LatentStyles>
</xml><![endif]-->
<style>
<!--
 /* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
 {mso-style-parent:"";
 margin:0cm;
 margin-bottom:.0001pt;
 mso-pagination:widow-orphan;
 font-size:12.0pt;
 font-family:"Tahoma";
 mso-fareast-font-family:"Tahoma";}
@page Section1
 {size:595.3pt 841.9pt;
 margin:18.0pt 19.3pt 18.0pt 18.0pt;
 mso-header-margin:35.4pt;
 mso-footer-margin:35.4pt;
 mso-paper-source:0;}
div.Section1
 {page:Section1;}
-->
</style>
<!--[if gte mso 10]>
<style>
 /* Style Definitions */
table.MsoNormalTable
 {mso-style-name:"\041E\0431\044B\0447\043D\0430\044F \0442\0430\0431\043B\=
0438\0446\0430";
 mso-tstyle-rowband-size:0;
 mso-tstyle-colband-size:0;
 mso-style-noshow:yes;
 mso-style-parent:"";
 mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
 mso-para-margin:0cm;
 mso-para-margin-bottom:.0001pt;
 mso-pagination:widow-orphan;
 font-size:10.0pt;
 font-family:"Tahoma";
 mso-ansi-language:#0400;
 mso-fareast-language:#0400;
 mso-bidi-language:#0400;
 width:100%;
 }
td.br1{
 border:1px solid black;
 }
</style>
<![endif]-->
</head>
<body>';
    $end = '
</body>
</html>
';
    $html = xml_entities($html, $charset);
    /* File list: the xml part that enumerates the files of the document */
    $fileList = '
--doc_file_part_na_habrahabr
Content-Location: '.$gldir.'filelist.xml
Content-Transfer-Encoding: quoted-printable
Content-Type: text/xml; charset="utf-8"

<xml xmlns:o=3D"urn:schemas-microsoft-com:office:office">
 <o:MainFile HRef=3D"../'.$nameFile.'"/>
 '.$IMAGE_NAMES.'
 <o:File HRef=3D"filelist.xml"/>
</xml>
';
    $content = $head.$html.$end.$IMAGES.$fileList.'--doc_file_part_na_habrahabr--';
    send_download($nameFile);
    echo $content;
    exit();
}


And finally, the conversion of HTML to DOC in action:

if (isset($_POST['text'])) {
    /* Remove backslashes (such as those added by magic quotes) from the submitted text */
    htmlToDoc('article', str_replace('\\', '', $_POST['text']));
}


You can see the script in action at the following link:

http://pihpi.ru/getFile.php

A few words about LibreOffice:

The text is displayed, but the pictures, alas, are not. After reading up on MIME I tried referencing the images through Content-ID, but that did not work. Either I am missing something, or LibreOffice does not support MIME HTML at all.

Source: https://habr.com/ru/post/168977/

