Dissecting .xlsx files: editing a file with PL / SQL

Part 1 . Introduction, styles
Part 2 . Lines, markup
Part 3 Editing through PL / SQL

Good day. The third part of the conversation about the format XLSX arrived. I did not accidentally start from the internal device file. Not understanding where everything is and what it looks like, it is difficult to understand why I did this and that. Besides, now I can make a few comments:

The first. If I have not mentioned any element that you need, create an empty XLSX file, create the necessary element and save. Now you know where to look for the code defining this element.
')
The second. OpenXML allows arbitrary text in the markup, if it does not violate the structure of tags (we will use this very actively). Now illustrate. You can do this:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"> <si><t> </t></si> <si><t> </t></si> $  $ </sst>

And so - you can not:

 <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"> <si><t> </t></si> <si><t> </t></si> <  > </sst>

But to the point. Strictly speaking, there are two possible cases. Either we have a certain template that we need to fill in with data, or we ourselves have to create a file, which is called, from scratch. The first case is simpler, the second is more interesting. But at the same time, both cases require us to have a blank file: since the .XLSX file consists not only of .XML files, creating it with your own hands, alas, will not work.

In general, the methodology I use is largely based on the Alexandria PL / SQL library. The library itself is huge, and if except for the purposes described below, you do not need it, then it is better to implant it selectively.

For a simple substitution in the receipt form of the full name of the subscriber of the funds represented in this library, it should be enough. I, in view of my specifics, had to make an add-on over it. Therefore, if you started reading, I recommend reading it to the end: how to know, maybe my solution will seem more convenient or effective to you. The general algorithm of actions is as follows:

We transform the blank file into BLOB;
Replace the conditional tags in the XML files inside it with our data;
Save the modified BLOB as a new file;
We return the modified file to the user.

Go through these items in more detail. To distinguish between library tools and self-written ones, let’s imagine that I placed the library code in the speculative package lib_utils .

Transformation of the blank file to BLOB

If we have some form available, for example, a receipt, in which the subscriber’s name and amount must be inserted, then everything is easier: we take a ready-made receipt and change the contents of the variable fields to special beacons. There are two basic rules about the text of these beacons - they should not imitate tags and the probability of coincidence in the source text of the document or in the replacement text should be vanishingly small. Otherwise, it all depends on your imagination or habits. I use something like % name% . I will explain why. The sign "%" does not imitate markup, and the probability that there will be a word somewhere, isolated from this side by this sign, is scanty.

But in the case when we do not know in advance what could be in the output file, there will be more work.

First of all, I recommend Excel exploration and designate all the cell styles that we might need (if you don’t need it, it's okay, it's better than if something is not enough). After that, we cut a notebook into the styles and write down the indices of specific styles. So, I made myself a separate style for the heading (gray fill, bold spelling and thin borders) and a separate style for an ordinary line (no fill, regular writing and thin borders).

Then you will have to work not through Excel, but with your hands

The sharedStrings.xml file should look something like this:

 <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"> %strings% </sst>

The sheet1.xml file (assuming that we have the first sheet) should contain the following:

 <sheetViews> <sheetView tabSelected="1" workbookViewId="0"> %attach% </sheetView> </sheetViews> <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/> %colsize% <sheetData> %data% </sheetData> %filter%

The % attach% label is located where the region's attachment tag should be. The % colsize% label is where the tag is located, indicating the width of the occupied columns. This was done so that, say, in the column name the width was appropriate. The % data% label will be replaced with the generated cell markup. Label % filter% - in case you need to build an autofilter.

Save, close - the disc is ready. Next, we need to transform it into a BLOB. To do this, we need the library function lib_utils.get_blob_from_file (just in case I remind you that lib_utils is a function from the library by reference at the beginning of the post). The function takes two parameters: directory and file name. Since this is a bit unobvious, I will explain that the directory means the object DIRECTORY. In our example we will name the directory FILE_DIR . That is, the call will look something like this:

 --  ,         -  v_blobsrc := lib_utils.get_blob_from_file('FILE_DIR', 'src_blank.xlsx');

Replacing tags with custom data

In a simpler case with a receipt form (or a similar case), simply use the function lib_utils.multi_replace . The library will do everything for you.

For a difficult case, I built my design. It is based on a composite handwritten data type, which is a comprehensive description of the contents of an Excel sheet. Since the type is composite, let's go from bottom to top:

 /* :  */ type tp_cell is record(address varchar2(15), style number, val varchar2(4000), lines number default 1); /* :  */ type tp_row is table of tp_cell index by binary_integer; /* :  () */ type tp_table is table of tp_row index by binary_integer; /* :      */ type tp_string is table of varchar2(4000) index by binary_integer; type tp_number is table of number index by binary_integer;

The latter type is not directly involved in the construction of the tp_table , but it will still be needed later. Let me explain the elements of type tp_cell .

address is the actual cell address. Here it is necessary to clarify something. In the comments to the previous post, the results of experiments are laid out, showing that it is not necessary to enter the address when describing a cell. This is true. However, in my opinion, the explicit, as a rule, is better than the implicit.
style - reference to the style index of the cell being described. I decided for myself that all the styles I described will be stored as global constants of the custom_utils package.
val - the contents of the cell.
lines is a multi-line flag. Setting its value to 2 or higher will mean that the contents of the cell imply the transfer, and as we remember, to display it you should increase the height of the cell.

The final signature of my file layout procedure looks like this:

 file_build(i_content tp_table, i_filename varchar2, i_filter number default 0, i_attach number default 0);

The procedure takes the following parameters:

i_content - the contents of the future file.
i_filename is the name of the future file.
i_filter - checkbox for auto filter
i_attach - checkbox of the need to fix the area Since in my case only the first line needs to be fixed, my code generated by this flag will always be the same.

And before starting a detailed analysis of the main procedure, I will lay out auxiliary functions:

Secondary functions

 /*     (  Excel) */ function get_literal(i_number number) return varchar2 is begin --    if i_number < 1 or is_number(i_number) = false then return '#'; -- 1-  elsif i_number > 0 and i_number < 29 then return chr(64 + i_number); -- 2-  else return chr(64 + trunc(i_number / 28))||chr(64 + (i_number - (28 * trunc(i_number / 28)))); end if; end get_literal; /* :    */ function is_number(i_char char) return boolean is begin if (to_number(i_char) = to_number(i_char)) then return true; end if; exception when others then return false; end is_number; /*     */ function array_search(i_source tp_string, i_value varchar2) return number is begin for i in 1 .. i_source.count loop if i_value = i_source(i) then return i; end if; end loop; return -1; end array_search; /*  ""  ( 32767 )   CLOB-    */ procedure clob_append(i_dest in out clob, i_src in varchar2, i_encode in varchar2 default 'utf8') is begin if i_src is not null then if i_dest is null then i_dest := to_clob(convert(i_src, i_encode)); else dbms_lob.write(i_dest, length(convert(i_src, i_encode)), length(i_dest) + 1, convert(i_src, i_encode)); end if; end if; end clob_append;

Just want to explain something. Since there may be a lot of data in the file, it is impossible to work with varchar-like types, they simply are not enough. Therefore you have to sharpen your decision under CLOB. In general, so far nothing supernatural. But - to the point.

Basic procedure

 procedure build_file(i_content tp_table, i_filename varchar2, i_filter number default 0, i_attach number default 0) is v_blobsrc blob; --    BLOB v_blobres blob; --    BLOB c_namesrc constant varchar2(50) := 'src_blank.xlsx'; --  - v_stringarr tp_string; --     v_numarr tp_number; --    v_index number; --     v_clobmarkup clob; --   v_clobstring clob; --  sharedStrings.xml v_clobcolumns clob; --    c_letsize constant number := 3; --  1   .. c_padding constant number := 1; --    v_rowcount number; --  -  v_colcount number; --  -  v_multiline number; --  v_filtertag varchar2(50); --    v_attachtag varchar2(150); --     begin /*      */ for l_row in 1 .. i_content.count loop v_rowcount := l_row; --   DIMENSION /* ,      .  ,         */ v_multiline := 1; for l_col in 1 .. i_content(l_row).count loop if i_content(l_row)(l_col).lines > v_multiline then v_multiline := i_content(l_row)(l_col).lines; end if; end loop; clob_append(v_clobmarkup, '<row r="'||l_row||'"'||case when v_multiline > 1 then ' ht="'||(15 * v_multiline)||'" customHeight="1"' else null end||'>'||chr(10)); /*      */ for l_cells in 1 .. i_content(l_row).count loop v_colcount := l_cells; --   if i_content(l_row)(l_cells).val is null then clob_append(v_clobmarkup, '<cr="'||i_content(l_row)(l_cells).address||'" s="'||i_content(l_row)(l_cells).style||'"><v></v></c>'||chr(10)); else --      if is_number(i_content(l_row)(l_cells).val) then clob_append(v_clobmarkup, '<cr="'||i_content(l_row)(l_cells).address||'" s="'||i_content(l_row)(l_cells).style||'"><v>'||i_content(l_row)(l_cells).val||'</v></c>'||chr(10)); else /*   : "-1" -  ,          */ v_index := array_search(v_stringarr, i_content(l_row)(l_cells).val); if v_index = -1 then v_index := v_stringarr.count + 1; v_stringarr(v_index) := i_content(l_row)(l_cells).val; end if; -- --         clob_append(v_clobmarkup, '<cr="'||i_content(l_row)(l_cells).address||'" s="'||i_content(l_row)(l_cells).style||'" t="s"><v>'||(v_index - 1)||'</v></c>'||chr(10)); end if; end if; /*          */ if v_numarr.count >= l_cells then if length(i_content(l_row)(l_cells).val) > v_numarr(l_cells) then v_numarr(l_cells) := length(i_content(l_row)(l_cells).val); end if; elsif v_numarr.count = l_cells - 1 then v_numarr(l_cells) := length(i_content(l_row)(l_cells).val); end if; end loop; --    clob_append(v_clobmarkup, chr(10)||'</row>'); end loop; --     clob_append(v_clobcolumns, '<cols>'); for l_cnt in 1 .. v_numarr.count loop clob_append(v_clobcolumns, '<col min="'||l_cnt||'" max="'||l_cnt||'" width="'||round((v_numarr(l_cnt) * c_letsize / 2) + c_padding)||'" style="1" customWidth="1"/>'); end loop; clob_append(v_clobcolumns, '</cols>'); --     for l_cnt in 1 .. v_stringarr.count loop clob_append(v_clobstring, '<si><t>'||v_stringarr(l_cnt)||'</t></si>'||chr(10)); end loop; /*    -   */ if i_filter = 1 then v_filtertag := '<autoFilter ref="A1:'||get_literal(v_colcount)||'1"/>'; end if; if i_attach = 1 then v_attachtag := '<pane ySplit="1" topLeftCell="A2" activePane="bottomLeft" state="frozen"/>'; end if; --  v_blobsrc := lib_utils.get_blob_from_file('FILE_DIR', c_namesrc); v_blobres := lib_utils.get_file_from_template(v_blobsrc, lib_utils.t_str_array('%colsize%', '%data%', '%strings%', '%filter%', '%attach%', '<dimension ref="A1:B2"/>'), lib_utils.t_str_array(v_clobcolumns, v_clobmarkup, v_clobstring, v_filtertag, v_attachtag, '<dimension ref="A1:'||get_literal(v_colcount)||v_rowcount||'"/>')); lib_utils.save_blob_to_file('FILE_DIR', i_filename, v_blobres); exception when others then dbms_output.put_line('    Excel: '||sqlerrm); end build_file;

Actually, everything. Well, yes, I want to make a reservation: this procedure is periodically supplemented in accordance with current needs.

Source: https://habr.com/ru/post/348450/

All Articles

Dissecting .xlsx files: editing a file with PL / SQL

Transformation of the blank file to BLOB

Replacing tags with custom data

More articles: