I'd like to share a story that happened to me. I think it is quite interesting and may help someone with a similar problem. I'll say up front: this is my first post on Habr :-) It turned out a bit long, so I collected the main conclusions separately at the end of the article.
So, the task:
There is a portal to which Microsoft Word documents are uploaded in DOC format. They must be processed before they are published. Exactly how is not important, so to keep things simple let's take the following algorithm:
- Create a new document.
- Insert data from the source (insert file).
- Save the resulting file instead of the original.
I'll say right away: a separate server running Win2k3 + Office + Apache + PHP5 is dedicated to DOC files. Perhaps someone in the comments will tell me how to do the same manipulations on FreeBSD :-)
The first question: why is this processing needed?
Trial and error showed that this is the minimum sequence of actions that has to be performed on every document.
There are two reasons for this:
- First, when a file is inserted, only its contents are inserted (text, formatting, pictures), and the macros are left behind. This gets rid of malicious code that could otherwise end up in your Normal.dot :-) Of course, if you need to keep macros, this option will not work, but most tasks do not need them.
- Second, if the source file is write-protected, you cannot modify it. A new document does not inherit this protection.
The second question: how was the document processing implemented?
Several generations of handlers were built by trial and error. Each time, a program that worked fine at first began to produce errors after a while, and processing stopped. When the stops became too frequent, I had to admit that the previous solution had failed and invent a new one.
So.
First generation: "mainscript.exe"
This program launched Word with command-line options: one was the name of the file to process, the other the name of the processing macro. After 2 minutes the program killed the Word process it had launched. The program itself was run via CGI.
The drawbacks:
- the program always ran for exactly 2 minutes
- both the program and Word froze frequently, leaving a pile of hung winword.exe and mainscript.exe processes.
Second generation: plain PHP with COM/OLE
While the first generation was implemented before my time, the second was my addition on top of it. I wrote a simple script that performs the same actions as the macro and replaced one script with the other. Locally everything worked fine: processing an average file took 15 seconds (instead of 2 minutes). The script was moved to the server and tested: the same 15 seconds. We put it into production, and...
Word instances crash, hang, slow down... Exceptions are thrown constantly. On top of that, the PHP scripts hang for longer than max_execution_time and only come back once the winword.exe process is killed. The causes were unclear. Google turned up an article in which Microsoft warns that Office is not intended for server-side use and that its behavior in my situation is unpredictable. It also recommended against running more than one winword process.
No sooner said than done:
Third generation: exclusive access to winword.exe
Each new processing request checks whether Word is already running. The check can be done through the database or a pid file; I used the database. The value of a flag field increases with each request. If many requests have accumulated, the process is assumed to have hung, and the magic program killword.exe is launched. It bluntly kills every process named winword.exe.
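To make the flag idea concrete, here is a minimal sketch. It keeps the counter in a plain file instead of the database table used on the real server, and the function names, the threshold of 5 and the 'run' / 'wait' / 'kill' return values are all assumptions made for illustration. It also assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical threshold: how many requests may pile up before Word is presumed hung.
const MAX_WAITING_REQUESTS = 5;

/**
 * Increments the flag on every request and decides what the caller should do:
 * 'run'  - nobody is using Word, go ahead;
 * 'wait' - Word is busy, try again later;
 * 'kill' - too many requests piled up, kill Word and take the slot.
 */
function claimWordSlot(string $flagFile, int $maxWaiting = MAX_WAITING_REQUESTS): string
{
    $fp = fopen($flagFile, 'c+'); // create if missing, do not truncate
    if ($fp === false) {
        throw new RuntimeException("Cannot open flag file: $flagFile");
    }
    flock($fp, LOCK_EX);          // serialize concurrent requests
    $count = (int) stream_get_contents($fp);

    if ($count === 0) {
        $decision = 'run';
        $count = 1;
    } elseif ($count >= $maxWaiting) {
        $decision = 'kill';
        $count = 1;               // this request owns the slot after the kill
    } else {
        $decision = 'wait';
        $count++;
    }

    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) $count);
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);

    return $decision;
}

/** Resets the flag once processing has finished. */
function releaseWordSlot(string $flagFile): void
{
    file_put_contents($flagFile, '0', LOCK_EX);
}
```

A caller would run the processing on 'run', back off on 'wait', and launch killword.exe before processing on 'kill'.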
Everything seemed to work fine, but over time the script's run time grew, and killword had to be launched more and more often. Constant exceptions...
Fourth generation: a separate handler
A painstaking search for the causes of the errors gave a very interesting result. After a file is opened or saved, control returns to the PHP script, but the COM server stays busy for a while. If the next command is sent to the COM server at that moment, we get an exception. After that, all you can do is kill Word and start over.
The problem was solved by simply copying the file to a temporary folder. It turned out that Word scans the folder for temporary files, or something of the sort. Over time the number of files in the source folder had grown, and the script began to fail.
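The copy step could look roughly like the sketch below. The function name stageInCleanDir is hypothetical; the target name tmp1.doc is the one used by the handler later in this post. The sketch empties the working folder first, so that Word always opens the file from a nearly empty directory. It assumes PHP 7 or later.

```php
<?php
/**
 * Empties $workDir, copies the uploaded file into it as tmp1.doc
 * and returns the path of the copy.
 */
function stageInCleanDir(string $sourcePath, string $workDir): string
{
    if (!is_dir($workDir) && !mkdir($workDir, 0777, true)) {
        throw new RuntimeException("Cannot create working folder: $workDir");
    }

    // Remove leftovers from previous runs so the folder stays small.
    foreach (scandir($workDir) as $name) {
        $path = $workDir . DIRECTORY_SEPARATOR . $name;
        if ($name !== '.' && $name !== '..' && is_file($path)) {
            unlink($path);
        }
    }

    $target = $workDir . DIRECTORY_SEPARATOR . 'tmp1.doc';
    if (!copy($sourcePath, $target)) {
        throw new RuntimeException("Cannot copy $sourcePath to $target");
    }

    return $target;
}
```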
Overall, this last scheme proved to be the most reliable. Errors such as “out of memory” still pop up occasionally, but this is quite rare and remains a mystery :-)
So:
DOC handler operation scheme
The handler consists of two files: tmanager.php and handler.php
1. tmanager.php
This is a daemon that runs continuously (or is started periodically by the task scheduler). Every n seconds it checks a database table for the next processing request (the queue).
If there is a request, it performs the following actions:
- Delete all files from temporary folders (including %SYSTEM_ROOT%\TEMP)
- Kill all Word processes (killword.exe)
- Copy source file to temporary folder
- Run handler.php (via curl)
- Copy the resulting file to a shared folder.
- Clean up temporary files and unnecessary processes.
- Remove entry from DB queue
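The sequence above can be sketched as a single function that receives each step as a callable. This is only an outline under stated assumptions: the function name processRequest, the keys of $steps and the shape of $request are hypothetical, and removing a failed request from the queue as well is my assumption, not something the list specifies. It assumes PHP 7 or later.

```php
<?php
/**
 * Runs one queued request through the steps listed above.
 * $request: ['id' => ..., 'source' => path of the uploaded file] (hypothetical shape).
 * $steps:   callables keyed by hypothetical names, one per step.
 * Returns true if the handler reported success.
 */
function processRequest(array $request, array $steps): bool
{
    $steps['cleanTemp']();                          // delete files from temporary folders
    $steps['killWord']();                           // kill all Word processes
    $staged = $steps['stage']($request['source']);  // copy the source to the temp folder
    $ok = $steps['runHandler']($staged);            // run handler.php
    if ($ok) {
        $steps['publish']($request);                // copy the result to the shared folder
    }
    $steps['cleanTemp']();                          // clean up temporary files...
    $steps['killWord']();                           // ...and leftover processes
    $steps['dequeue']($request['id']);              // remove the entry from the queue
    return $ok;
}
```

On the server described here, killWord would launch killword.exe. On a stock Windows machine, `exec('taskkill /F /IM winword.exe')` would be a possible stand-in.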
2. handler.php
This is the handler itself. It is kept in a separate file in case it hangs: if it does not return a result within some time, tmanager.php launches killword.
Here is its simplified code:

    <?php
    try {
        // Create a new, empty Word document through COM.
        $doc = new COM("Word.Document");
        // Insert only the contents of the uploaded file; macros are left behind.
        $doc->Range->InsertFile('E:\\tmp\\tmp1.doc');
        // Save the result under a new name.
        $doc->SaveAs('E:\\tmp\\tmp2.doc');
        $doc = null;
    } catch (Exception $e) {
        // '3' tells the caller that processing failed.
        print '3';
        $doc = null;
        die();
    }
    // '0' signals success.
    print 0;
    ?>
tmanager.php passes the file to handler.php for processing. The handler processes it and creates a new file. If handler.php hangs, tmanager.php simply kills winword.
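On the tmanager.php side, the call to the handler might look like the sketch below. It relies on the handler printing 0 on success and 3 on an exception, as in the code above. The function names, the URL and the timeout value are assumptions for illustration, and the sketch assumes PHP 7 or later with the curl extension.

```php
<?php
/**
 * Interprets the handler's output: '0' means success,
 * anything else (including '3' or no response at all) means failure.
 */
function handlerSucceeded($body): bool
{
    return is_string($body) && trim($body) === '0';
}

/**
 * Requests handler.php over HTTP and gives up after $timeoutSeconds,
 * so a hung Word instance cannot block the manager forever.
 */
function callHandler(string $url, int $timeoutSeconds): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds); // upper bound for the whole request
    $body = curl_exec($ch);                             // false on timeout or connection error
    return handlerSucceeded($body);
}
```

A usage example with a hypothetical URL: `if (!callHandler('http://localhost/handler.php', 60)) { /* launch killword.exe */ }`.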
Conclusions
- Do not open files sent to you directly; insert them into a new document instead.
- Do not run multiple Winword.exe processes - they may conflict.
- Do not open or save files in a folder that contains a large number of files: Word will try to scan them and slow down as a result.
- Separate the process that works with COM from the main process. This lets you take action when the COM process hangs.
Update: I found the link to the article on server-side automation. Since I last read it, it has been translated into Russian and has grown a bit.