
MS Excel and Python integration

Good afternoon, dear readers.

In today's article I would like to look in as much detail as possible at integrating Python with MS Excel. Such questions may arise, for example, when building an online reporting system that has to export its results to a commonly used format, or in other tasks. I will also show the reverse integration, i.e. how to use a function written in Python in Excel, which can also be useful for automating reports.

Working with MS Excel files in Python


There are two ways to work with Excel files from Python:

  1. Using libraries such as xlrd, xlwt, xlutils or openpyxl
  2. Working with a COM object

Let's look at both approaches in more detail. As an example, we will use a ready-made Excel file: first we read the data from the first cell, and then we write it to the second one. Such simple examples are enough for a first acquaintance.

Using libraries


The first method is quite simple and well documented. For example, there is an excellent article describing how to work with xlrd, xlwt and xlutils, so here I will give only a short piece of code that uses them.

To get started, load the necessary libraries, open the xls file for reading and select the desired data sheet:

    import xlrd, xlwt
    # open the xls file for reading
    rb = xlrd.open_workbook('../ArticleScripts/ExcelPython/xl.xls', formatting_info=True)
    # select the first sheet
    sheet = rb.sheet_by_index(0)

Now let's see how to read values from the cells we need:

    # read the value of cell A1
    val = sheet.row_values(0)[0]
    # read all rows of the sheet
    vals = [sheet.row_values(rownum) for rownum in range(sheet.nrows)]

As you can see, reading the data is not difficult. Now let's write it to another file. To do this, create a new Excel workbook with a new sheet:

    wb = xlwt.Workbook()
    ws = wb.add_sheet('Test')

We will write the previously obtained data into the new file and save the changes:

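    # write the value of A1 into cell A1 of the new sheet
    ws.write(0, 0, val)
    # copy the first column of the source sheet into column B
    for i, rec in enumerate(vals):
        ws.write(i, 1, rec[0])
    # save the new file
    wb.save('../ArticleScripts/ExcelPython/xl_rec.xls')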

The example shows that xlrd is responsible for reading the data and xlwt for writing it, so an existing workbook cannot be modified in place: it has to be copied into a new one (the xlutils package provides this copying). In addition, these libraries work only with the xls format (Excel 97-2003) and do not support the xlsx format used by Excel 2007 and later.

To work with the xlsx format, you need the openpyxl library. To demonstrate it, let's repeat the steps shown for the previous libraries.

First, load the library and select the desired workbook and worksheet:

    import openpyxl
    wb = openpyxl.load_workbook(filename='../ArticleScripts/ExcelPython/openpyxl.xlsx')
    sheet = wb['test']

As the listing shows, this is not difficult. Now let's see how to read the data:

    # read the value of cell A1
    val = sheet['A1'].value
    # read the values of the range A1:A2 (each row is a tuple of cells)
    vals = [v[0].value for v in sheet['A1:A2']]

Unlike the previous libraries, openpyxl lets you address cells and ranges by their names (A1, A1:A2), which is convenient and makes the program easy to read.
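To see what name-based addressing returns without needing the article's openpyxl.xlsx file, here is a small sketch on an in-memory workbook. The sheet title `test` and the sample values are assumptions for illustration. It shows that a range name such as `'A1:A2'` yields a tuple of row tuples, which is why the listing takes `v[0]` from each row.

```python
import openpyxl

# Build a workbook in memory instead of loading the article's file
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = 'test'

# Address single cells by name (sample values are made up)
sheet['A1'] = 10
sheet['A2'] = 20

# A range name returns a tuple of rows, each row being a tuple of cells
rows = sheet['A1:A2']
vals = [row[0].value for row in rows]

print(len(rows), len(rows[0]))  # 2 rows, 1 cell in each
print(vals)
```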

Now let's see how to write and save the data:

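    # write the value into cell B1
    sheet['B1'] = val
    # write the range values into column B, starting from row 1 (openpyxl indices start at 1)
    for i, rec in enumerate(vals, start=1):
        sheet.cell(row=i, column=2).value = rec
    # save the changes
    wb.save('../ArticleScripts/ExcelPython/openpyxl.xlsx')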

The example shows that writing is just as easy. You can also see that, besides cell names, openpyxl can address cells by their numeric row and column indices.
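A quick way to check how index-based access relates to names is the sketch below, again on an in-memory workbook rather than the article's file (the cell contents are made up). It shows that `cell(row=1, column=2)` and `sheet['B1']` are the same cell, and that openpyxl rejects a row index of 0 because its numbering starts at 1.

```python
import openpyxl

# An in-memory workbook stands in for the article's openpyxl.xlsx
wb = openpyxl.Workbook()
sheet = wb.active

# Index-based and name-based access reach the same cell
sheet.cell(row=1, column=2).value = 'set via indices'
same = sheet['B1'].value

# openpyxl numbers rows and columns from 1, so row=0 is rejected
try:
    sheet.cell(row=0, column=2)
    zero_row_allowed = True
except ValueError:
    zero_row_allowed = False

# Write a list down column C, starting from row 1
vals = [10, 20]
for i, rec in enumerate(vals, start=1):
    sheet.cell(row=i, column=3).value = rec
column_c = [sheet['C1'].value, sheet['C2'].value]

print(same, zero_row_allowed, column_c)
```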

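Unlike xlwt, openpyxl can save changes back into the workbook it loaded, as the example above does. A limitation to keep in mind is that openpyxl does not read every kind of object an Excel file can contain, and its documentation warns that unsupported items are lost when an existing file is opened and saved under the same name.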
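Whether openpyxl can write changes back into the file it loaded is easy to check with a round trip. The sketch below uses a hypothetical file name, `report.xlsx`, in a temporary folder: it creates a workbook, reopens it, edits it, saves it under the same name and reads it back.

```python
import os
import tempfile

import openpyxl

# Hypothetical file name in a temporary folder, used only for this demo
path = os.path.join(tempfile.mkdtemp(), 'report.xlsx')

# Create and save an initial workbook
wb = openpyxl.Workbook()
wb.active['A1'] = 'original'
wb.save(path)

# Reopen the same file, change a cell and save it under the same name
wb = openpyxl.load_workbook(path)
sheet = wb.active
sheet['B1'] = 'edited'
wb.save(path)

# Load it once more to confirm both cells are present
check = openpyxl.load_workbook(path).active
print(check['A1'].value, check['B1'].value)
```

This only covers plain cell values; files with objects openpyxl does not read deserve a test on a copy first.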

As shown above, this approach needs four libraries for more or less complete work with Excel files, which is not always convenient. In addition, you may need access to VBA (for example, for some further processing), and these libraries will not give you that.

Still, these libraries are simple and convenient for quickly creating and formatting Excel files, but if you need more features, the next subsection is for you.

Working with a COM object


In my reports I prefer the second method: working with an Excel file through a COM object using the win32com library (part of the pywin32 package). Its advantage is that you can do anything with the file that Excel itself can do through VBA. The flip side is that it requires Windows with Excel installed.

We illustrate this with the same problem as the previous examples.

First, load the required library and create a COM object.

    import win32com.client
    Excel = win32com.client.Dispatch("Excel.Application")

Through the Excel object we now have access to all the features of VBA. First, let's open a workbook and select its active sheet:

    wb = Excel.Workbooks.Open(u'D:\\Scripts\\DataScience\\ArticleScripts\\ExcelPython\\xl.xls')
    sheet = wb.ActiveSheet

Let's get the value of the first cell and of a range:

    # read the value of cell A1
    val = sheet.Cells(1, 1).Value
    # read the values of the range A1:A2, cell by cell
    vals = [r.Value for r in sheet.Range("A1:A2")]
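Iterating over a range makes one COM call per cell, which becomes slow for large ranges. A common alternative is to read `Range(...).Value` once: for a multi-cell range pywin32 returns a tuple of row tuples, and for a single cell a plain value. The sketch below flattens that result; the helper name `flatten_range_value` is hypothetical, and the COM line is left as a comment because it needs Windows and Excel.

```python
def flatten_range_value(value):
    """Flatten the tuple of rows returned by Range.Value into a flat list.

    A single-cell range returns a plain scalar, so that case is wrapped in a list.
    """
    if not isinstance(value, tuple):
        return [value]
    return [cell for row in value for cell in row]


# With a live COM session (Windows + Excel) this would be used as:
#   vals = flatten_range_value(sheet.Range("A1:A2").Value)
# Here a literal stands in for what Range("A1:A2").Value returns.
sample = ((1.0,), (2.0,))
print(flatten_range_value(sample))
```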

As you can see, we work here with the same objects and methods as in VBA. This is very convenient if you already have macros and want to reuse them from Python with minimal rewriting.

Now let's see how to write the values we have read:

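    # write the value into cell B1
    sheet.Cells(1, 2).Value = val
    # write the range values into column C (row numbering starts at 1, as in VBA)
    i = 1
    for rec in vals:
        sheet.Cells(i, 3).Value = rec
        i = i + 1
    # save the workbook
    wb.Save()
    # close the workbook
    wb.Close()
    # quit Excel and release the COM object
    Excel.Quit()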
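Writing cell by cell likewise costs one COM call per cell. Assigning a tuple of row tuples to a whole range's `Value` writes everything in one call. Below is a sketch of two small helpers that build the address and the data shape for a single column; the names `as_column` and `column_range` are hypothetical, and the COM line is commented out because it needs Windows and Excel.

```python
def as_column(values):
    """Turn a flat sequence into the tuple-of-rows shape of a one-column range."""
    return tuple((v,) for v in values)


def column_range(col_letter, first_row, count):
    """Build an A1-style address such as 'C1:C2' for `count` cells in one column."""
    return f"{col_letter}{first_row}:{col_letter}{first_row + count - 1}"


vals = [1.0, 2.0]
address = column_range("C", 1, len(vals))
data = as_column(vals)
# With a live COM session this single call would replace the cell-by-cell loop:
#   sheet.Range(address).Value = data
print(address, data)
```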

These operations are also simple to implement. Note also that we saved the changes to the same workbook we opened for reading, which is quite convenient.
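If the script fails before `Excel.Quit()`, Excel keeps running in the background. One way to guard against that is to wrap the session in `try`/`finally`. The sketch below does this with a context manager; the names `excel_workbook` and `app_factory` are hypothetical, and the factory parameter exists only so the logic can be exercised without Excel (by default it calls `win32com.client.Dispatch`). `wb.Close(False)` passes `SaveChanges=False` positionally, since a successful run has already saved.

```python
from contextlib import contextmanager


def _default_factory():
    # Imported lazily so this module can be loaded on machines without pywin32
    import win32com.client
    return win32com.client.Dispatch("Excel.Application")


@contextmanager
def excel_workbook(path, app_factory=_default_factory):
    """Open a workbook via COM and always close it and quit Excel afterwards."""
    excel = app_factory()
    try:
        wb = excel.Workbooks.Open(path)
        try:
            yield wb
            wb.Save()  # reached only if the with-block finished without an error
        finally:
            wb.Close(False)  # SaveChanges=False: a successful run has already saved
    finally:
        excel.Quit()


# Usage on Windows with Excel installed (path from the article):
#   with excel_workbook(r'D:\Scripts\DataScience\ArticleScripts\ExcelPython\xl.xls') as wb:
#       wb.ActiveSheet.Cells(1, 2).Value = 'hello'
```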

An attentive reader will notice that the variable i is initialized to 1 rather than 0, as is customary in Python. This is because we use cell indices as in VBA, where numbering starts at 1, not 0.
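To keep the two notations straight, a small helper can translate the 1-based indices of `Cells(row, column)` into the `A1` names used by `Range`. This is a sketch with a hypothetical name, `cell_address`; openpyxl ships a comparable `get_column_letter` in `openpyxl.utils`.

```python
def cell_address(row, col):
    """Convert 1-based (row, column) indices, as in Cells(i, j), to 'A1' notation."""
    if row < 1 or col < 1:
        raise ValueError("Excel rows and columns are numbered from 1")
    letters = ""
    while col > 0:
        # Bijective base 26: columns run A..Z, then AA, AB, ...
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row}"


print(cell_address(1, 3))  # the cell written by Cells(1, 3)
```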

This completes our look at working with Excel files from Python; now let's move on to the reverse problem.

Calling Python functions from MS Excel


You may already have a function that processes data in Python and need to bring its functionality into Excel. Of course, you could rewrite it in VBA, but why?

There is a great add-in, ExcelPython, for using Python functions in Excel. With it you can call functions written in Python directly from Excel, although you still have to write a small VBA wrapper; all of this is shown below.

So, suppose we have a function written in Python that we want to use:

    def get_unique(lists):
        sm = 0
        for i in lists:
            sm = sm + int(i.pop())
        return sm

It takes a list of lists as input (this is one of the conditions for the function to work in Excel) and returns the sum of the last elements of the inner lists, each converted to int.
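Before wiring the function into Excel, it is worth checking it in plain Python. The call below passes a list of lists shaped like a two-row, two-column range (the sample values are made up). Note that `pop()` removes the last element of each inner list, so the argument is modified in place.

```python
def get_unique(lists):
    # Sum the last element of each inner list, converted to int
    sm = 0
    for i in lists:
        sm = sm + int(i.pop())
    return sm


rows = [[1, 2], [3, '4']]
total = get_unique(rows)
print(total)  # 6
print(rows)   # [[1], [3]]: the last elements have been popped
```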

Save the function in a file named plugin.py and put it in the same folder as the Excel file we will work with.

Now install ExcelPython. It is installed by running an .exe file and causes no difficulties.

When everything is ready, open the test Excel file and bring up the VBA editor (Alt+F11). To use the add-in, connect it via Tools -> References, as shown in the figure:



Now we can write the wrapper function for our Python module plugin.py. It looks like this:

    Function sr(lists As Range)
        On Error GoTo do_error
        Set plugin = PyModule("plugin", AddPath:=ThisWorkbook.Path)
        Set result = PyCall(plugin, "get_unique", PyTuple(lists.Value2))
        sr = WorksheetFunction.Transpose(PyVar(result))
        Exit Function
    do_error:
        sr = Err.Description
    End Function

So, what happens in this function?

First, PyModule loads the module we need. It takes the module name without the extension and the path to the folder that contains it, and returns an object for working with the module.

Then PyCall calls the required function from that module. PyCall takes the following parameters:

  1. The module object obtained in the previous step
  2. The name of the function to call
  3. The arguments passed to the function (in the code above they are packed into a tuple with PyTuple)

The PyTuple function takes values and converts them into a Python tuple object.
Finally, PyVar converts the result returned by the Python function into a type Excel understands.
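For readers who want to see what these steps amount to on the Python side, here is a rough analogue using only the standard library: add a folder to the module search path, import a module by name, then look up a function and call it with a tuple of arguments. This is an illustration of the idea, not ExcelPython's actual implementation; the helper names `py_module` and `py_call` are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def py_module(name, add_path=None):
    """Import a module by name, optionally putting its folder first on sys.path."""
    if add_path and add_path not in sys.path:
        sys.path.insert(0, add_path)
    importlib.invalidate_caches()  # pick up files created after startup
    return importlib.import_module(name)


def py_call(module, func_name, args=()):
    """Look up a function in the module and call it with a tuple of arguments."""
    return getattr(module, func_name)(*args)


# Recreate plugin.py from the article in a temporary folder
folder = tempfile.mkdtemp()
source = (
    "def get_unique(lists):\n"
    "    sm = 0\n"
    "    for i in lists:\n"
    "        sm = sm + int(i.pop())\n"
    "    return sm\n"
)
with open(os.path.join(folder, 'plugin.py'), 'w') as f:
    f.write(source)

plugin = py_module('plugin', add_path=folder)
result = py_call(plugin, 'get_unique', ([[1, 2], [3, 4]],))
print(result)  # 6
```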

Now, to make sure everything works together, let's call our new function from a worksheet in Excel:



As you can see from the figure, everything worked correctly.

Note that this article uses an old version of ExcelPython; a newer version is available on the author's GitHub.

Conclusion


In conclusion, the examples in this article are the simplest possible ones; for a more in-depth study of these methods, I recommend the documentation of the respective packages.

I should also note that these packages are not the only ones: the article does not cover xlsxwriter for generating Excel files, xlwings, which can work with Excel files on the fly, or PyXLL, which provides functionality similar to ExcelPython.

In addition, I tried to bring together material scattered around the web, since such questions come up often on forums, and I think some readers will find it useful to have this kind of “cheat sheet” at hand.

Source: https://habr.com/ru/post/232291/

