📜 ⬆️ ⬇️

Handling Excel Files Using Python

image
At work, I had to face the task of processing xls files with python tools. A little google, I came across several libraries with which you can work with excel files.

Libraries:
- xlrd - allows you to read Excel files
- xlwt - create and populate Excel files
- xlutils - a set of utilities for expanding the capabilities of the previous two libraries
- pyExcelerator - also gives you the opportunity to work with Excel files, but has not been updated for a long time.

For my task I used the first three libraries.
The task was divided into several parts: reading a file with the xls extension; creating a new one and filling it out; create a copy of the file based on the input file; delete the necessary lines in the output file.

Read input file


This task is not very complex. The documentation and examples bundled with xlrd helped solve it quickly.
Code example:
import xlrd
rb = xlrd . open_workbook( 'd:/final.xls' ,formatting_info = True )
sheet = rb . sheet_by_index( 0 )
for rownum in range (sheet . nrows):
row = sheet . row_values(rownum)
for c_el in row:
print c_el

')

Creating a new file and filling it


This task was no more difficult than the previous one. Documentation and examples helped.
Code example:
import xlwt
from datetime import datetime

font0 = xlwt . Font()
font0 . name = 'Times New Roman'
font0 . colour_index = 2
font0 . bold = True

style0 = xlwt . XFStyle()
style0 . font = font0

style1 = xlwt . XFStyle()
style1 . num_format_str = 'D-MMM-YY'

wb = xlwt . Workbook()
ws = wb . add_sheet( 'A Test Sheet' )

ws . write( 0 , 0 , 'Test' , style0)
ws . write( 1 , 0 , datetime . now(), style1)
ws . write( 2 , 0 , 1 )
ws . write( 2 , 1 , 1 )
ws . write( 2 , 2 , xlwt . Formula( "A3+B3" ))

wb . save( 'example.xls' )


Create a copy of the file based on the input file


This problem can be solved in two ways. Option one: open the input file for reading, create a new file and overwrite all the data from one file to another. This solution is not difficult to implement, so it makes no sense to lay out the code sample. Option two: use the xlutils library. There are a lot of interesting and useful things in this library, but xlutils.copy will be interesting for our task.
And so, a sample code to create a file based on input using xlutils.copy:
import xlrd
import xlwt
from xlutils.copy import copy

rb = open_workbook( 'final.xls' ,on_demand = True ,formatting_info = True )
wb = copy(rb)
wb . save( "final_complete.xls" )


Here such small code turned out. In order for it to work, the on_demand = True flag must be set. Thanks to the use of the formatting_info flag, the output file is obtained with the same styles as the input file. For my task, this turned out to be the right option.

Deleting rows by a given condition


To solve this problem, it was decided to use a filter. One option is to rewrite from one file to another, excluding those options that do not fulfill the specified condition. But there is one catch here, if you need to keep the style of the paperwork, then this approach will not work (Unless of course you know the design style in advance and can set it programmatically). The solution to the problem was achieved by using xlutils.filter. Task: to leave in the output Excel file only those records that are contained in the transmitted list.
The code that solves this problem:
from xlutils.filter import GlobReader,BaseFilter,DirectoryWriter,process

myfile = 'final2.xls'
mydir = 'd:/'

class MyFilter (BaseFilter):

goodlist = None

def __init__ ( self ,elist):
self . goodlist = goodlist
self . wtw = 0
self . wtc = 0


def workbook ( self , rdbook, wtbook_name):
self . next . workbook(rdbook, 'filtered_' + wtbook_name)

def row ( self , rdrowx, wtrowx):
pass

def cell ( self , rdrowx, rdcolx, wtrowx, wtcolx):
value = self . rdsheet . cell(rdrowx,rdcolx) . value
if value in self . goodlist:
self . wtc = self . wtc +1
self . next . row(rdrowx,wtrowx)
else :
return
self . next . cell(rdrowx,rdcolx, self . wtc,wtcolx)


data = """somedata1
somedata2
somedata3
somedata4
somedata5
"""


goodlist = data . split( " \n " )

process(GlobReader(os . path . join(mydir,myfile)),MyFilter(goodlist),DirectoryWriter(mydir))


Conclusion


Using a set of three libraries, the tasks were solved. The following was noted: if there are graphic elements (pictures, etc.) in the input Excel file, they are not transferred to the output file. Perhaps by studying these libraries it will be possible to solve this part of the problem.

Links


sourceforge.net/projects/pyexcelerator
www.python-excel.org - for the first three libraries.
groups.google.com/group/python-excel is a group that discusses the use of xlrd, xlwt, and xlutils libraries.

PS I think it would be nice to move this post to a thematic blog.

Source: https://habr.com/ru/post/99923/


All Articles