So, we continue the conversation. Just in case, clarify that the beginning is
here .
About the string values and the method of their storage, I have casually mentioned in the first part, and now let's talk in more detail. Imagine that we have a table filled with string data and that it is large. In this case, it is extremely unlikely that all values in it will be unique. Some of them no, no, yes, they will repeat somewhere in different parts of the table. To keep such an array "as is" inside the XML-markup sheet is irrational in terms of PC resources. Therefore, all string values are in a separate file,% file% / xl / sharedStrings.xml. The part of it that interests us looks, let's say, like this:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="4" uniqueCount="3"> <si><t></t></si> <si><t></t></si> <si><t></t></si> </sst>
Notice the attributes of the <sst /> “count” and “uniqueCount” tags: their values differ. The fact is that in the book one of the lines I used twice. At that, attributes are not required, that is, if you remove them, Excel will not give an error, but when you save the file, it will draw again.
Here you can say that here, inside the <si /> tag, you can play with the font settings. To do this, use the filed
mileage system, applied in MS Word (we'll get to it later). It looks like this:
')
<si> <r> <t xml:space="preserve"> </t> </r> <r> <rPr> <b/> </rPr> <t></t> </r> <r> <rPr> <sz val="18"/> </rPr> <t xml:space="preserve"> </t> </r> </si>
Please note: the <si /> root tag in the previous example directly embedded the <t /> tag containing the text. Here it is wrapped with a <r /> tag, i.e. Run; in Russian it is usually called “mileage”. Mileage is, if in a nutshell - a piece of text that has the same style settings.
In this example, the string value contains 3 runs. In order to be more convenient to consider them, I, perhaps, will carry them out by separate sources.
The first: <r> <t xml:space="preserve"> </t> </r>
This run does not contain the <rPr /> section, therefore it uses the style settings of the cell in which it is located. In it another is interesting: attribute
xml: space = "preserve" . The fact is that by default, Excel, Word, cut end spaces from all runs. It may seem that in this case at the junction of the runs there should always be something like this:
Vasya Peter . But from the experience of communicating with the same MS Word, we know that it is not. Because of which? That's just because of xml: space = "preserve".
Second: <r> <rPr> <b/> </rPr> <t></t> </r>
There is no xml attribute: space = "preserve". We do not care what Excel will do with trailing spaces that are not. But there is a <rPr /> block. In principle, you can put in it any font settings that are just in Excel. I did only one, so as not to inflate the scope of the example.
Third: <r> <rPr> <sz val="18"/> </rPr> <t xml:space="preserve"> </t> </r>
And here we have the font settings and the preservation of end spaces.
Well, still a short remark. If there is a need to make a multi-line entry in a cell, then here in the line there will simply be an ordinary hyphen, chr (10). The cell multiline attribute itself is located in the sheet layout file. In a single-line cell, the hyphen character will be ignored. Excel will simply pretend that it is not.
Go to the folder% file% / xl / worksheets. Here, as mentioned above, each sheet contained in the book is represented by an .xml file.
The sheet markup file contains the following key elements (and, importantly, it is desirable to arrange them in that order):
1. Tag
<dimension> . Not required. Serves, as far as I could understand, to indicate to the system the size of the occupied area, that is, how much to draw the rows and columns
2. Tag
</ sheetViews> . Optional, but sometimes useful. I used it to indicate the need to pin the top line: this is useful for large reports. It looks like this:
<sheetViews> <sheetView tabSelected="1" workbookViewId="0"> <pane ySplit="1" topLeftCell="A2" activePane="bottomRight" state="frozen"/> </sheetView> </sheetViews>
Here it is necessary to give an explanation. The actual pinning of the string is the <pane /> tag. And here are the attributes used here:
- ySplit - shows the number of fixed lines . There is a similar attribute xSplit to fix the columns;
- topLeftCell - specifies the left upper cell of the default visible non-fixed area;
- activePane - an indication of the location of the non-fixed area. The tutorials say that this attribute governs which side will be a non-fixed area. True, having tried different values, for some reason I got the same result. As a variant of “by default”, I chose bottomRight for myself;
- state - the state indicator of the pinned area. To simply pin the string, use the value frozen.
3. The
<sheetFormatPr /> tag . Example:
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
We are interested here mainly in the
defaultRowHeight attribute, that is, the default height of the column. Standard, familiar to us option - $ 15. If you assign it, say, $ 30, then the lines for which the height is not specified separately will be 2 times higher. However, in order to apply a value other than the default, you must specify the
customHeight attribute with a value of "true." It looks like this:
<sheetFormatPr defaultRowHeight="30" customHeight="true" x14ac:dyDescent="0.25"/>
4. Tag
</ cols> . It helps to set the width of the columns different from the default. When completed, it looks like this:
<cols> <col min="1" max="2" width="9" customWidth="1"/> <col min="3" max="4" width="16" customWidth="1"/> <col min="5" max="6" width="66" customWidth="1"/> <col min="7" max="8" width="42" customWidth="1"/> </cols>
Nested tags
</ col> do not indicate every single column, as it might seem, but a group of columns running in a row and having a single width.
- The min attribute is the first column of the group;
- The max attribute is the last column of the group;
- The width attribute is the width of the column from the group;
- The customWidth attribute is a flag for applying custom width; without it, the width will still be default;
5. The
<sheetData> tag is, in fact, the zone where the contents of the cells are stored, or, depending on the type, references to it. On average, it looks like this:
<sheetData> <row r="1"> <cr="A1" s="3" t="s"><v>0</v></c> <cr="B1" s="3" t="s"><v>1</v></c> <cr="C1" s="3" t="s"><v>2</v></c> </row> <row r="2"> <cr="A2" s="2"><v>1</v></c> <cr="B2" s="2"><v>37539</v></c> <cr="C2" s="2" t="s"><v>14</v></c> </row> <row r="3"> <cr="A3" s="2"><v>2</v></c> <cr="B3" s="2"><v>33227</v></c> <cr="C3" s="2" t="s"><v>21</v></c> </row> </sheetData>
As you can see, the
<row> tags with the “r” attribute are enclosed in the
<sheetData> tag, denoting the actual (and not the ordinal) line number: you have to remember that, say, line 1 can be empty, and line 2 is not.
“Letters, have you regretted it?” - you ask. "Save Memory" - Microsoft will respond. If we recall the limit of 16 million with a hook of cells, their motivation becomes clear. It turns out that in theory one (!) Extra character in the attribute name can lead to millions of extra characters when reading the entire file.In the
<row /> tag nested tags
<c /> - cells. In the example, it is clear that the basic attributes of the cell are three:
- r is the cell address;
- s - cell style. We recall the first part of this article: in the styles.xml file there is a <cellXfs /> section, in which the registered cell styles are listed. attribute “s” of the tag < /> - just a link to the element of this list, starting with 0;
- t indicates the need to refer to a string value table in the sharedStrings.xml file. If the attribute is specified, we appeal; if not, we write as is what is in the body of the tag. It is noteworthy that when you try to write text into the body of the tag without specifying this attribute, Excel will scold when opening the file, but obediently transfer our phrase to where it belongs (although I recommend not relying on it and not rely on writing strings in sharedStrings.xml);
When I talked about the file sharedStrings.xml, I mentioned that multi-line cells are marked in the layout sheet file. This is done, for example, as follows:
<row r="1" spans="1:9" ht="45" customHeight="1" x14ac:dyDescent="0.3"> <cr="A1" s="2" t="s"><v>3</v></c> </row>
That is, strictly speaking, we need to specify the custom height of the cell and set the flag for applying this custom height.
6. The
<mergeCells /> tag . As we know, Excel has the ability to merge cells. All merged cells on the sheet are listed here. When completed, the tag looks like this:
<mergeCells count="40"> <mergeCell ref="G15:I15"/> <mergeCell ref="E3:F3"/> <mergeCell ref="E4:F4"/> </mergeCells>
As you can see, one merged cell is indicated by one
<mergeCell /> tag with a single ref attribute that defines the merge range.
7. The
<autoFilter /> tag . Filters that our users love to see in reports. In the completed form, the tag looks like this:
<autoFilter ref="A1:N1"/>
It is easy to understand that the “ref” attribute specifies the area occupied by the active filter cells.
Well, "for dessert" in the file are setting the page for printing. Here is an example from one of my files:
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/> <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/>
The
<pageMargins /> tag sets the margins, and the
<pageSetup /> tag
sets the preferred paper settings.
In the comments to the first part there was a request to talk about actually editing this miracle of technology by means of PL / SQL. The next part will be exactly that.