📜 ⬆️ ⬇️

Preparing .xlsx files: introduction, cell styles

Not so long ago, as a matter of duty, it was necessary to edit MS Office files (first of all, MS Word and MS Excel) using PL / SQL, that is, a language that is not connected to .NET practically in no way. In this regard, there was a problem that in the manual from Microsoft about editing these files, as they say, "hands" did not say practically anything, and the only sane website reference for this case, apparently, has not been updated since 2010. Understanding the need to reduce Putting it all together, what I put together, bit by bit, with Stackoverflow and my own experiments, came almost immediately.

Just want to make a few comments.

The first . Basically, I will talk about what I personally encountered. I have no claims for 100% knowledge of the “inside out” of the format.
')
The second . As many probably know, MS Office 2007 and higher files are an archive that can be opened using any archiver (WinRAR, 7zip, and so on).

Third . “Under the hood” of these files is mostly XML markup, proudly called OOXML or simply OpenXML. Therefore, in principle, Notepad (or, more conveniently, Notepad ++) is enough to understand the principles of editing files with “hands”.

So, let's start with MS Excel format as the most used for generating various reports, unloading from the database and others like them.

When dealing with the .XLSX format, every question “why is it implemented ... is it so?” Should be answered: “that optimization required”. When Microsoft expanded the admissible sheet size to an indecent observer (about a million rows and 16 thousand columns), they understood that there would be maniacs who would score all this with data. And through this, if you approach the data storage "head on", then even the ever-increasing capacity of the PC is not enough under any circumstances. Therefore, it is necessary to make so that the amount of stored data was less than the amount of data in the book. How to do it? Looking ahead: for example, eliminate duplicate rows.

The first file of interest is% file% / xl / workbook.xml. Its main purpose is the manifest, that is, the list of sheets of which our Excel book, in fact, consists of.
This list looks like this:

<sheets> <sheet name="1" sheetId="1" r:id="rId1"/> <sheet name="2" sheetId="2" r:id="rId2"/> <sheet name="3" sheetId="3" r:id="rId3"/> <sheet name="4" sheetId="4" r:id="rId4"/> </sheets> 

This means that there are 4 sheets in the book, and their names are indicated in the name attributes. Each tag must correspond to a file in the folder% file% / xl / worksheets. Excel knows how these files should be called, and when you try to rename them, it considers the entire book damaged.

Even in the folder% file% / xl we are interested in the styles.xml file.

As you might guess, stored here is information about the design of cells. And for the sake of optimization, it is stored in a rather interesting way. The file consists of the following sections:

1. Fonts:

 <fonts count="2" x14ac:knownFonts="1"> <font> <sz val="11"/> <color theme="1"/> <name val="Calibri"/> <family val="2"/> <scheme val="minor"/> </font> <font> <b/> <sz val="11"/> <color theme="1"/> <name val="Calibri"/> <family val="2"/> <charset val="204"/> <scheme val="minor"/> </font> </fonts> 

As you can see, only the unique text styles used in the book are listed here. Each tag is one style. Nested tags are style features such as bold spelling (<b /> tag), size (<sz />), and others.

2. Fill cells:

 <fills count="2"> <fill> <patternFill patternType="none"/> </fill> <fill> <patternFill patternType="gray125"/> </fill> </fills> 

As you can see, the first option is without a fill at all, and the second is a solid fill of the library color “gray125”.

3. Borders:

 <borders count="2"> <border> <left/> <right/> <top/> <bottom/> <diagonal/> </border> <border> <left style="thin"> <color indexed="64"/> </left> <right style="thin"> <color indexed="64"/> </right> <top style="thin"> <color indexed="64"/> </top> <bottom style="thin"> <color indexed="64"/> </bottom> <diagonal/> </border> </borders> 

As you can see, one name here consists of five elements, 4 main borders and diagonal, that is, everything that can be configured through the Excel GUI itself.

4. Cell styles:

 <cellXfs count="4"> <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/> <xf numFmtId="0" fontId="1" fillId="0" borderId="0" xfId="0" applyFont="1" applyAlignment="1"> <alignment wrapText="1"/> </xf> <xf numFmtId="0" fontId="0" fillId="0" borderId="1" xfId="0" applyBorder="1" applyAlignment="1"> <alignment horizontal="center" vertical="center"/> </xf> <xf numFmtId="0" fontId="1" fillId="2" borderId="1" xfId="0" applyFont="1" applyFill="1" applyBorder="1" applyAlignment="1"> <alignment horizontal="center" vertical="center"/> </xf> </cellXfs> 

And here it is necessary to explain in more detail. When we specify the cell style in the sheet file, we will refer to this section. Each <xf /> tag, which is one style, is a collection of links to previous sections, that is, it itself does not contain font declarations, borders, or fills. Consider interesting attributes closer:

  1. numFmtId - specifies the format of the text in the cell (date, currency, number, text, ...). A full list of these types is here.
  2. fontId, fillId, borderId - reference to the font / fill / border (see sections 1, 2 and 3, respectively). Numbering starts at 0.
  3. applyFont, applyFill, applyBorder - an indication that when making this cell, custom font / fill / border is generally used. By default, 0, so you can not specify at all, as seen in the example in the element # 0.
  4. applyAlignment - an indication that the text alignment in the cell will differ from the standard one. The default is 0, but if "1" is specified, then in the parent <xf /> tag
    The <alignment /> tag is embedded, as seen in the example starting with # 1.

Further, the conversation will go about the text data and the actual layout of sheets.

Source: https://habr.com/ru/post/347550/


All Articles