
Preview documents in a Python program

One of the systems I work with stores .doc files in its database.

I wondered whether I could embed a viewer for these files in my own program that works with that database.





For some reason, the usual answer to this kind of problem is to launch MS Word with the file name on the command line. That approach is, to put it mildly, not very safe: the .doc file may contain macros, or it may not be a .doc at all but a file specially crafted by an attacker. It is therefore better to use the dedicated viewer object that Office provides. It is more secure, because it cannot do anything with the document other than display it.



And if we do not limit ourselves to the .doc format, then as a bonus we can view attached documents in any other format for which a standard viewer is registered in Windows.

Looking ahead: everything worked out with the help of PyWin32. Admittedly, along the way I unexpectedly had to compile my own package to support the required COM interface, but there were no casualties.



So, what do we know?



  1. According to MSDN, the system contains viewers that implement the standard IPreviewHandler interface; the interface is declared in the ShObjIdl.h header.
  2. You can check whether a viewer is registered for a particular file extension: if the registry has a key HKEY_CLASSES_ROOT\<extn>\shellex\{8895b1c6-b41f-4c1c-a562-0d564250836f} (where "<extn>" is the file extension with its leading dot, i.e. ".doc", ".pdf" and so on) and that key has a default value, then this value is the CLSID of the corresponding component.
  3. All registered viewers are listed under the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\PreviewHandlers.
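The check from the second item can be sketched in Python with the standard winreg module. This is only a minimal sketch: the helper names preview_handler_key and find_preview_handler_clsid are made up for illustration, and the registry read works on Windows only, so winreg is imported inside the function.

```python
# GUID that names the preview handler subkey (the IID of IPreviewHandler).
PREVIEW_HANDLER_GUID = "{8895b1c6-b41f-4c1c-a562-0d564250836f}"


def preview_handler_key(extension):
    """Build the HKEY_CLASSES_ROOT subkey that holds the viewer CLSID.

    `extension` may be given with or without the leading dot.
    """
    if not extension.startswith("."):
        extension = "." + extension
    return extension.lower() + "\\shellex\\" + PREVIEW_HANDLER_GUID


def find_preview_handler_clsid(extension):
    """Return the CLSID string of the registered viewer, or None.

    Hypothetical helper: reads the default value of the subkey (Windows only).
    """
    import winreg  # imported here so the module still loads on other systems

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT,
                            preview_handler_key(extension)) as key:
            value, _value_type = winreg.QueryValueEx(key, "")
    except OSError:
        return None  # no such key, or the key has no default value
    return value or None
```

On a Windows machine, find_preview_handler_clsid(".pdf") would return the CLSID string to pass on when creating the viewer, or None when nothing is registered for that extension.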


The procedure:



  1. There is a document on disk.
  2. From the file extension we find the CLSID and use it to create a viewer object.
  3. The object is initialized with our file, either by name or through an IStream stream (more on this later).
  4. We tell the object which window to display in by calling its SetWindow method. This requires a window handle, which is no problem: Qt widgets have a winId() method for that.
  5. To start viewing, we call the object's DoPreview method.
  6. If the window is resized, SetRect must be called so that the view is resized accordingly.
  7. When the viewer is no longer needed, we call Unload on it.
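The calls made on the viewer once it has been created and initialized can be sketched as a small wrapper. This is an illustration rather than the final script: PreviewSession and RecordingHandler are made-up names, and the handler is any object that exposes the four IPreviewHandler methods mentioned above, so the sketch runs without COM.

```python
class PreviewSession:
    """Drive an already initialized viewer object through its lifecycle."""

    def __init__(self, handler, hwnd, rect):
        self.handler = handler
        # Tell the viewer which window to draw in, then start rendering.
        self.handler.SetWindow(hwnd, rect)
        self.handler.DoPreview()

    def resize(self, rect):
        # Keep the preview in sync with the host window.
        if self.handler is not None:
            self.handler.SetRect(rect)

    def close(self):
        # Release the document; calling close() twice is harmless.
        if self.handler is not None:
            self.handler.Unload()
            self.handler = None


class RecordingHandler:
    """Stand-in for a real viewer: it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def SetWindow(self, hwnd, rect):
        self.calls.append("SetWindow")

    def DoPreview(self):
        self.calls.append("DoPreview")

    def SetRect(self, rect):
        self.calls.append("SetRect")

    def Unload(self):
        self.calls.append("Unload")


fake = RecordingHandler()
session = PreviewSession(fake, hwnd=0, rect=(0, 0, 640, 480))
session.resize((0, 0, 800, 600))
session.close()
session.close()  # second call does nothing
print(fake.calls)
```

Keeping Unload in one place makes it harder to forget when the user switches to another file.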



Now we need to figure out the easiest way in Python (did I mention that my program is written in Python?) to create a component from its CLSID.

Stack Overflow advises installing PyWin32 for this kind of thing. OK, let's try.



    C:\>pip3 install pywin32
    Collecting pywin32
      Could not find a version that satisfies the requirement pywin32 (from versions: )
    No matching distribution found for pywin32

What the…? What do you mean, "could not find a version"?

Back to Google. Aha, I need to install "pypiwin32" instead, because, as its description says:

Pypiwin32 is a repackaging of pywin32 to use sane packaging tools (namely wheels). Its repackaged by the BDFL of the Twisted project. If you use pip, or virtualenvs (and you should be using pip and virtualenvs, if you are not, start), use pypiwin32.



 C:\>pip3 install pypiwin32 


Phew, it installed. That is a big step done!

Now to check that it works, I write a small script:



    #!/usr/bin/python3
    # -*- coding: utf-8 -*-

    import pythoncom
    import pywintypes

    # CLSID of the Adobe PDF viewer
    adobe = pywintypes.IID('{DC6EFB56-9CFA-464D-8880-44885D7DC193}')
    # Despite the name, this is the IID of the IPreviewHandler interface
    CLSID_IPreviewHandler = '{8895B1C6-B41F-4C1C-A562-0D564250836F}'

    iid = pywintypes.IID(CLSID_IPreviewHandler)
    handler = pythoncom.CoCreateInstance(
        adobe,
        None,
        pythoncom.CLSCTX_LOCAL_SERVER,
        iid)
    print(handler)


This creates one of the viewers available on the system, namely the one for Adobe PDF. It only creates the object and does nothing else with it. If that works, we can start calling its methods.



I run it and get the surprise I mentioned:



    Traceback (most recent call last):
      File "C:\Projects\pytest\w1.py", line 19, in <module>
        iid)
    TypeError: There is no interface object registered that supports this IID


That is, it created the viewer but could not return it: it has, you see, no registered interface object that supports this IID.



In some respects I agree with it. Python needs to know which methods the created COM object has before it can let a script call them. Normally the IDispatch interface provides this information, but this object does not implement it.



So what to do? Googling the text of the error message, I find an answer from the package's developer, Mark Hammond:

> The document PythonCOM.html says that this is done using a "pyd" module that is imported. [...] is accessed in this manner a C or C++ module must be created specifically for that interface?

Exactly. Note however that the IDispatch [...] this is true.

> If this is needed, is there somewhere that I can see an example of the code for that module? If not, how do I tell Python about the interface object associated with the IID?

There are a number of examples in the win32com sources. The most recent set are in the "internet" and "axcontrols" directory.

Also there are 2 options for generating the C code. One is to use "makegw" that comes with win32com - it takes a .h file that has been itself generated from an IDL file [...] code. But its not very flexible. There is also SWIG, which is far more flexible, but probably not [...] .H file generated from an IDL, then check out "makegw" and the samples I mentioned (which themselves where generated with makepy)


In short, he suggests doing it in good old C. Headers, a compiler, a linker: exactly what I had hoped to avoid by moving to Python. He also suggests taking examples from the package sources. I downloaded the sources, and they did come in handy later.



So there are two options for generating the code.





As for SWIG, I found the Habr article "Python, Modules, SWIG, Windows" by mclander, where everything looks nice, easy and cool. I downloaded SWIG and tried to figure it out, but could not get it working on the first attempt; makegw, however, did work.



makegw is a module with essentially one function, which has to be called with the right parameters: the path to the source header (here ShObjIdl.h from the Windows SDK) and the name of the desired interface. So I wrote a script.



mk.py



    import win32com.makegw.makegw

    inc = "C:/Program Files (x86)/Windows Kits/10/Include/10.0.14393.0/um/"
    h = inc + "ShObjIdl.h"
    win32com.makegw.makegw.make_framework_support(h, "IPreviewHandler")


The script ran and produced two files, PyIPreviewHandler.cpp and PyIPreviewHandler.h. Looking into the .cpp file, I see the following:



    // *** The input argument hwnd of type "__RPC__in HWND" was not processed ***
    //     Please check the conversion function is appropriate and exists!
    __RPC__in HWND hwnd;
    PyObject *obhwnd;
    // @pyparm <o Py__RPC__in HWND>|hwnd||Description for hwnd


    // *** The input argument prc of type "__RPC__in const RECT *" was not processed ***
    //     Please check the conversion function is appropriate and exists!
    __RPC__in const RECT prc;
    PyObject *obprc;
    // @pyparm <o Py__RPC__in const RECT>|prc||Description for prc


That is, makegw could not work out, and did not even try to, what constructs such as "__RPC__in HWND" and "__RPC__in const RECT *" mean, and it warned about exactly that.



Trying to compile this would have been pointless, and I did not feel like fixing it by hand either, so I tried to work around the problem by replacing these constructs with single-word equivalents.



I took ShObjIdl.h, pulled the declaration of the IPreviewHandler interface out into a separate file, and changed the parameter types.



preview.h
    #include "rpc.h"
    #include "rpcndr.h"
    #include "windows.h"
    #include "ole2.h"

    //#define __RPC__in

    #ifndef __IPreviewHandler_INTERFACE_DEFINED__
    #define __IPreviewHandler_INTERFACE_DEFINED__

    /* interface IPreviewHandler */
    /* [uuid][object] */

    #include "prtypes.h"

    EXTERN_C const IID IID_IPreviewHandler;

    MIDL_INTERFACE("8895b1c6-b41f-4c1c-a562-0d564250836f")
    IPreviewHandler : public IUnknown
    {
    public:
        virtual HRESULT STDMETHODCALLTYPE SetWindow(
            /* [in] */ HWND hwnd,
            /* [in] */ CRECTPTR prc) = 0;

        virtual HRESULT STDMETHODCALLTYPE SetRect(
            /* [in] */ CRECTPTR prc) = 0;

        virtual HRESULT STDMETHODCALLTYPE DoPreview( void) = 0;

        virtual HRESULT STDMETHODCALLTYPE Unload( void) = 0;

        virtual HRESULT STDMETHODCALLTYPE SetFocus( void) = 0;

        virtual HRESULT STDMETHODCALLTYPE QueryFocus(
            /* [out] */ HWNDPTR phwnd) = 0;

        virtual HRESULT STDMETHODCALLTYPE TranslateAccelerator(
            /* [in] */ MSGPTR pmsg) = 0;
    };

    #endif


The new types are declared in a separate file.



prtypes.h

    typedef const RECT *CRECTPTR;
    typedef const MSG *CMSGPTR;
    typedef MSG *MSGPTR;
    typedef HWND *HWNDPTR;
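The hand edit of the parameter types boils down to a text substitution, so it could also be scripted. The sketch below is only an illustration of that idea, not something I actually ran against ShObjIdl.h: the function name simplify_parameter_types is made up, and the table contains just the two annotated types quoted in the makegw warnings above, so it would have to be extended for the remaining methods.

```python
import re

# Annotated parameter types and the single-word names that replace them.
# Only the two pairs seen in the makegw warnings are listed (assumption:
# other methods need further entries).
TYPE_SUBSTITUTIONS = [
    ("__RPC__in const RECT *", "CRECTPTR "),
    ("__RPC__in HWND", "HWND"),
]


def simplify_parameter_types(header_text):
    """Replace annotated types in header text with single-word equivalents."""
    for annotated, simple in TYPE_SUBSTITUTIONS:
        # Allow any amount of whitespace between the tokens of the type,
        # because the header may pad or wrap the declaration differently.
        pattern = r"\s*".join(re.escape(token) for token in annotated.split())
        header_text = re.sub(pattern, lambda match, s=simple: s, header_text)
    return header_text


sample = (
    "virtual HRESULT STDMETHODCALLTYPE SetWindow(\n"
    "    /* [in] */ __RPC__in HWND hwnd,\n"
    "    /* [in] */ __RPC__in const RECT *prc) = 0;\n"
)
simplified = simplify_parameter_types(sample)
print(simplified)
```

The longer pattern is listed first so that a shorter one can never eat part of it.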


Accordingly, I changed the header name in the script. At the same time I turned off generation of the gateway object: I am going to obtain an already implemented interface from an external library rather than implement it in Python, so I do not need the gateway that would handle that.



mk.py

    import win32com.makegw.makegw

    win32com.makegw.makegw.make_framework_support(
        "preview.h", "IPreviewHandler", bMakeGateway=0)


I run it:



    C:\Projects\pytest>python mk.py
    IPreviewHandler


Now the package has to be built. After studying the Python documentation (here and here), I find out that writing a setup.py script is both necessary and sufficient. You probably knew that already, but this is my first time building a package. Here is what I came up with:



    #!/usr/bin/env python
    from distutils.core import setup, Extension

    pypacks = "C:/Python/Lib/site-packages/"
    wdkinc = "C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.14393.0\\"
    wdklib = "C:\\Program Files (x86)\\Windows Kits\\10\\Lib\\10.0.14393.0\\"
    pywinsrc = "C:/Projects/Source/pywin32-221/"

    example_module = Extension(
        '_preview',
        sources=['PyIPreviewHandler.cpp', 'prtypes.cpp'],
        include_dirs=[wdkinc + "ucrt",
                      pywinsrc + "com/win32comext/shell/src",
                      pypacks + "win32/include",
                      pypacks + "win32com/include"],
        library_dirs=[wdklib + "ucrt\\x86",
                      pypacks + "win32/libs",
                      pypacks + "win32com/libs"],
    )

    setup(
        name='preview',
        version='0.1',
        author="My",
        description="""Simple swig example from docs""",
        ext_modules=[example_module],
        py_modules=["preview"],
    )


I already had the Windows SDK (more precisely the WDK, but that makes no real difference) and Visual Studio Community 2017 installed, and I was curious whether setup.py would find them. It found the compiler by itself, but the path to the SDK had to be specified explicitly.



    C:\Projects\pytest>python.exe setup.py build_ext --inplace >err.txt
    error: command 'D:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\cl.exe' failed with exit status 2


Of course it did not build, but I had not expected it to build on the first try anyway. The errors are clear enough:



PyIPreviewHandler.cpp(46): error C3861: 'PyObject_AsCRECTPTR': identifier not found





Earlier I defined new data types; now I need functions that convert these types between Python objects and C. Searching the PyWin32 sources, I found the functions PyObject_AsRECT, PyObject_FromRECT and so on: in short, everything I needed. I had to edit the generated .cpp file to use them.



Before:



    CRECTPTR prc;
    PyObject *obprc;
    ...
    if (bPythonIsHappy && !PyObject_AsCRECTPTR( obprc, &prc )) bPythonIsHappy = FALSE;
    ...
    PyObject_FreeCRECTPTR(prc);


After:



    RECT prc;
    PyObject *obprc;
    ...
    if (bPythonIsHappy && !PyObject_AsRECT( obprc, &prc )) bPythonIsHappy = FALSE;
    ...
    //PyObject_FreeCRECTPTR(prc);


And so on for the rest; fortunately IPreviewHandler does not have many methods. The conversion functions themselves, however, had to be copied from the sources into a prtypes.cpp file, because the PyWin32 libraries do not include them.



prtypes.cpp
    #include "shell_pch.h"
    #include "prtypes.h"

    BOOL PyObject_AsMSG( PyObject *obpmsg, MSG *msg )
    {
        PyObject *obhwnd;
        return PyArg_ParseTuple(obpmsg, "Oiiii(ii)", &obhwnd, &msg->message, &msg->wParam, &msg->lParam, &msg->time, &msg->pt.x, &msg->pt.y)
            && PyWinObject_AsHANDLE(obhwnd, (HANDLE *)&msg->hwnd);
    }

    PyObject *PyObject_FromMSG(const MSG *msg)
    {
        if (!msg) {
            Py_INCREF(Py_None);
            return Py_None;
        }
        return Py_BuildValue("Niiii(ii)", PyWinLong_FromHANDLE(msg->hwnd), msg->message, msg->wParam, msg->lParam, msg->time, msg->pt.x, msg->pt.y);
    }

    BOOL PyObject_AsRECT( PyObject *ob, RECT *r)
    {
        return PyArg_ParseTuple(ob, "iiii", &r->left, &r->top, &r->right, &r->bottom);
    }

    PyObject *PyObject_FromRECT(const RECT *r)
    {
        if (!r) {
            Py_INCREF(Py_None);
            return Py_None;
        }
        return Py_BuildValue("iiii", r->left, r->top, r->right, r->bottom);
    }
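One detail follows from PyObject_AsRECT: on the Python side a rectangle is a 4-tuple read as (left, top, right, bottom). Qt's QRect.getRect() returns (x, y, width, height) instead. The two coincide only when the rectangle starts at the origin, which is the case for a widget's own rect() but not for an arbitrary geometry. A tiny conversion helper (the name to_win32_rect is made up for illustration) makes the difference explicit:

```python
def to_win32_rect(x, y, width, height):
    """Convert Qt-style (x, y, width, height) to (left, top, right, bottom)."""
    return (x, y, x + width, y + height)


# A rectangle at the origin is unchanged, so widget.rect().getRect()
# can be passed to SetWindow/SetRect as it is.
print(to_win32_rect(0, 0, 640, 480))    # (0, 0, 640, 480)

# An offset rectangle is different in the last two values.
print(to_win32_rect(10, 20, 100, 50))   # (10, 20, 110, 70)
```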




But now it compiles without pulling in the rest of the sources. It compiles, but it does not link.



    LINK : error LNK2001: unresolved external symbol PyInit__preview
    build\temp.win32-3.6\Release\_preview.cp36-win32.lib : fatal error LNK1120: 1 unresolved externals


It feels as if there is something they forgot to tell me. PyInit_xxx looks like the standard name of a module initialization function; the only question is what should go into it and how to register the interface. I had to dig through the PyWin32 sources again and work out what a complete build needs. By analogy with the functions I found there, I added my own PyInit_xxx.



    #include "PythonCOMRegister.h" // For simpler registration of IIDs
    ...
    static struct PyMethodDef preview_methods[] = {{NULL}};

    PyObject *PyInit__preview(void)
    {
        static PyModuleDef _preview_def = {
            PyModuleDef_HEAD_INIT,
            "_previewer",
            "Preview Handler Interface",
            -1,
            preview_methods
        };
        PyObject *module = PyModule_Create(&_preview_def);
        PyCom_RegisterClientType(&PyIPreviewHandler::type, &IID_IPreviewHandler);
        return module;
    }


Now the build produced the file _preview.cp36-win32.pyd (and at this point Stirlitz realized that the leading underscore was unnecessary). I install the resulting package.



 C:\Projects\pytest>python.exe setup.py install 


To check, I simply add import _preview after import pywintypes in the same test script.



The whole script:

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-

    import pythoncom
    import pywintypes
    import _preview

    adobe = pywintypes.IID('{DC6EFB56-9CFA-464D-8880-44885D7DC193}')
    CLSID_IPreviewHandler = '{8895B1C6-B41F-4C1C-A562-0D564250836F}'

    iid = pywintypes.IID(CLSID_IPreviewHandler)
    handler = pythoncom.CoCreateInstance(
        adobe,
        None,
        pythoncom.CLSCTX_LOCAL_SERVER,
        iid)
    print(handler)




I run it and get:



    C:\Projects\Python\test>python wincom.py
    <PyIPreviewHandler at 0x00817770 with obj at 0x00745FFC>


Well, it works: the object is created.



All that remains is to put the result to its intended use. To test different document formats, I wrote a script using QFileSystemModel and QTreeView from PyQt5: a file system tree on the left and a preview of the selected file on the right.







The script is below. It is simple enough that I will not go through it line by line. I will only say that, unlike many examples on the Internet that use IPreviewHandler, I do not read the file into memory. Instead I either let the viewer open the file directly through the IInitializeWithFile interface (if it has one), or create a stream with the standard WinAPI function SHCreateStreamOnFileEx (which, it turns out, PyWin32 also supports) and pass that stream to the IInitializeWithStream interface. Every viewer is required to implement at least one of these two interfaces.
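The fallback order can be isolated in a small helper. This is a sketch only: initialize_handler and its callable parameters are made up here so that the logic can be run without COM. In the real script the two callables would wrap QueryInterface plus Initialize on IInitializeWithFile and on IInitializeWithStream respectively.

```python
def initialize_handler(init_with_file, init_with_stream, path):
    """Try file-based initialization first, then fall back to a stream.

    Both arguments are callables that take the path and raise an exception
    when the viewer does not support that way of being initialized.
    Returns the name of the method that worked, or None if neither did.
    """
    attempts = (("file", init_with_file), ("stream", init_with_stream))
    for name, init in attempts:
        try:
            init(path)
        except Exception:
            continue  # interface missing or Initialize failed: try the next
        return name
    return None


def unsupported(path):
    """Simulates a viewer that lacks the requested interface."""
    raise NotImplementedError("interface not available")


def accepts(path):
    """Simulates a successful Initialize call."""
    return None


print(initialize_handler(accepts, unsupported, "report.doc"))      # file
print(initialize_handler(unsupported, accepts, "report.pdf"))      # stream
print(initialize_handler(unsupported, unsupported, "report.xyz"))  # None
```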



filepreview.py
    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    import sys  # needed at module level: previewIndex uses sys.exc_info()

    import pythoncom, win32comext
    import win32comext.propsys.propsys as propsys
    import win32comext.shell.shell as shellext
    import pywintypes
    import _preview  # registers the IPreviewHandler interface with pythoncom

    from PyQt5.QtCore import *
    from PyQt5.QtWidgets import *

    CLSID_IPreviewHandler = '{8895B1C6-B41F-4C1C-A562-0D564250836F}'
    iid = pywintypes.IID(CLSID_IPreviewHandler)


    class PreviewWin(QWidget):
        def __init__(self, parent=None):
            super().__init__(parent)
            self.handler = None
            self.isFirst = True
            self.topLay = QHBoxLayout(self)
            self.splitter = QSplitter(self)
            self.topLay.addWidget(self.splitter)
            # File system tree on the left
            self.model = QFileSystemModel(self)
            self.model.setRootPath(QDir.currentPath())
            self.tree = QTreeView(self.splitter)
            self.tree.setModel(self.model)
            cur = self.model.index(QDir.currentPath())
            self.tree.setCurrentIndex(cur)
            self.tree.expand(cur)
            # Empty widget on the right that hosts the preview
            self.view = QWidget()
            self.splitter.addWidget(self.tree)
            self.splitter.addWidget(self.view)
            self.tree.clicked.connect(self.previewIndex)
            self.tree.setColumnWidth(0, 200)
            self.setWindowState(Qt.WindowMaximized)

        def resizeEvent(self, event):
            super().resizeEvent(event)
            if self.handler:
                self.handler.SetRect(self.view.rect().getRect())

        def previewIndex(self, index):
            try:
                # Release the previous viewer, if any
                if self.handler:
                    self.handler.Unload()
                    self.handler = None
                if not index.isValid():
                    return
                filePath = QDir.toNativeSeparators(self.model.filePath(index))
                ext = self.model.fileInfo(index).suffix()
                # Look up the viewer CLSID registered for this extension
                regPath = "HKEY_CLASSES_ROOT\\." + ext + "\\shellex\\" + CLSID_IPreviewHandler
                sets = QSettings(regPath, QSettings.NativeFormat)
                if not sets.contains("."):
                    return
                classId = sets.value(".")
                if not classId:
                    return
                self.handler = pythoncom.CoCreateInstance(
                    classId, None, pythoncom.CLSCTX_LOCAL_SERVER, iid)
                if not self.handler:
                    return
                STGM_READ = 0
                # First try to initialize the viewer by file name
                try:
                    iwfile = self.handler.QueryInterface(propsys.IID_IInitializeWithFile)
                except Exception:
                    iwfile = None
                if iwfile:
                    try:
                        iwfile.Initialize(filePath, STGM_READ)
                    except Exception:
                        iwfile = None
                # Otherwise fall back to a stream
                if not iwfile:
                    try:
                        iwstream = self.handler.QueryInterface(propsys.IID_IInitializeWithStream)
                    except Exception:
                        print(str(sys.exc_info()[1]))
                        iwstream = None
                    if iwstream:
                        iis = shellext.SHCreateStreamOnFileEx(filePath, STGM_READ, 0, False)
                        if iis:
                            iwstream.Initialize(iis, STGM_READ)
                        else:
                            return
                    else:
                        print("Can't initialize preview for", filePath)
                        return
                r = self.view.rect().getRect()
                self.handler.SetWindow(self.view.winId(), r)
                self.handler.DoPreview()
                self.handler.SetFocus()
            except Exception:
                print(str(sys.exc_info()[1]))


    if __name__ == '__main__':
        app = QApplication(sys.argv)
        w = PreviewWin()
        w.show()
        sys.exit(app.exec_())




All the files are on GitHub.

Source: https://habr.com/ru/post/344086/


