It would seem, why in general it may be necessary to parse XML on the side of the database?
But there may be many reasons, and each of them may have their own. Some, including me, do not hesitate to implement applied logic by means of a database, but for some it seems to be an archaic relic and the usefulness of tools for working with XML in a DBMS, it may seem doubtful to such people. However, I believe that very few people will object to the usefulness of having such an opportunity during the operation phase of an application. For example, the price list of the wholesaler did not accept the application from us — it swirled a mess at the lack of recoding for some items. More than 20k positions in XML - go and figure out where the dog rummaged, what exactly embarrassed the application. Agree, because it would be great to be able to present a list of products listed in XML as a data set that can be connected to a transcoder table, in order to reveal in one fell swoop all positions that do not have transcoding? And there can be many such examples. I have been engaged in supporting an application that integrates with external systems through the exchange of XML messages, and despite the fact that the application did not use the tools provided by Oracle itself, it turned out to be very useful for me and my colleagues with the support of this product.
In this article, I would like to demonstrate how easy and easy it is to parse XML of various degrees of complexity using the tools of the Oracle Database.
I absolutely would not want to touch the DOM parser here. Let me just say that it is, implemented by the DBMS_XMLDOM package. At times, it can be extremely useful to the developer, and it will not be difficult for anyone who has previously encountered DOM parsers from other manufacturers to deal with it.
')
An innovative feature of the oracle is the XMLType type and the means of working with it. This type is part of the XML DB technology, which has been included with the Oracle Database since version 9.2.
The source text of the XML document can be passed to the XMLType constructor as values ​​of types CLOB, BLOB, VARCHAR2, BFILE. Perhaps it is worth noting that BFILE allows you to download a file from the server's file system - not a client, so if our XML is on the client side and it is large enough to be passed as a string in the request, you should probably take care of the possibility of delivering the file from XML content on the server file system.
An example of creating an instance of XMLType, with the content passed in a string:
select XMLType( '<hello-world> <word seq="1">Hello</word> <word seq="2">world</word> </hello-world> ') XML from dual
By creating an instance of XMLType, you can try to take the first tentative steps to parse our XML. The XMLType type implements the Extract method, which, when accepting an XPatch expression, returns an XML fragment that matches this expression. An XML fragment (XML Fragment), unlike a well-formed XML (whellformed XML), allows for the absence of a root element (or, in other words, allowing more than one root element in its composition).
So in the example below, the three expressions return three pieces of XML. The first one returns all occurrences of the word element, the second one returns only its first occurrence, the third one returns a fragment of the text content of the word element, for which the value of the seq attribute is two.
SQL> with demo1 as ( 2 select XMLType( 3 '<hello-world> 4 <word seq="1">Hello</word> 5 <word seq="2">world</word> 6 </hello-world> 7 ') xml 8 from dual 9 ) 10 select t.xml.extract('//word') case1 11 ,t.xml.extract('//word[position()=1]') case2 12 ,t.xml.extract('//word[@seq=2]/text()') case3 13 from demo1 t; CASE1 CASE2 CASE3
Here, I think, it is worth putting a bold emphasis on the fact that in the third case it is the XML fragment that is returned, not the value of this element. Differences will become noticeable only when this value contains wildcard (escaped) characters, such as & amp; & gt ;. In order to get the value of an element, use the extractValue function. Here I will mention that the main XMLType methods are duplicated by SQL functions, or vice versa, the main functions of working with XMLType are implemented in the form of its methods. However, extractValue is an exception. extractValue is represented only as a function. XMLType, unfortunately, does not implement the extractValue method.
SQL> with demo2 as (select xmltype('<a><&hello&></a>') xml from dual) 2 select t.xml.extract('a/text()').getStringVal() case1 3 ,extractValue(t.xml,'a') case2 4 from demo2 t; CASE1 CASE2
Perhaps we should also mention the rules for working with namespaces. Not all intuition leads to a correct understanding of these mechanisms of work. The functions (and method) extract, extractValue, as one of the parameters, accept the namespace description. The namespaces described in this parameter can be used in an XPath expression. And that is what I want to emphasize. Pay attention to the third case. Namespaces in XML and XPatch expressions have different aliases, but they have the same URI, so parsing is successful.
SQL> select extractValue(t.xml,'a') case1 2 ,extractValue(t.xml,'a','xmlns="foo"') case2 3 ,extractValue(t.xml,'y:a/@z:val','xmlns:y="foo" xmlns:z="bar"') case3 4 from (select XMLType('<a xmlns="foo" xmlns:x="bar" x:val="a-val">a-text</a>') XMl from dual) t; CASE1 CASE2 CASE3
So, having learned to extract the values, now we should learn to separate them. Let me remind you that in the first case, for the first example, we tried to select all word elements from XML, and we succeeded; we received two word elements, but we got them in one fragment. In order to represent a fragment containing several root elements as a sequence of fragments, each of which contains one root element, there is a pipelined function XMLSeqence. The function returns XMLSequenceType, which is a table of XMLType values.
If suddenly someone forgot, I will remind that pipeline functions return as though collections, therefore by a call turn around expression of table. The results of these functions are accessed using the virtual column column_value, or the value () expression, and therefore for a table expression (table collection excpression) you should define an alias. If suddenly someone did not know this, I recommend memorizing it as a mantra, understanding will come with time, and then only if necessary.
The simplest example of using XMLSequence is:
SQL> select extractValue(value(t),'b') result 2 from table(XMLSequence(XMLType('<a><b>b1</b><b>b2</b></a>').extract('a/b'))) t; RESULT
I will try to say what is happening here, although I am afraid that in Russian this will turn out to be much more messy and much less clear than in SQL. In the from expression, we first create an instance of XMLType, passing it a string containing the XML text. Next, using the extract method, we extract in one fragment all the elements of b that element a contains. The resulting XML fragment is passed by the parameter to the pipeline function XMLSequence, which is called using the table clause to call which, according to the rules of the grammar. The record set described by this clause is assigned an alias t. In select-list'e we get an instance of the object returned by the table expression t, it has the type XMLType. For each row returned by a table expression, this instance contains one fragment of the source XML element b. Pass this object as a parameter to the extractValue function. The result - on the face.
In fact, everything is not as difficult as it turns out in my presentation. To this it is enough just to get used to it. But brain violence is not yet fully completed. What we did at this stage only works for one XML document. If we have the source code of several XML in the table and we need to parse several of them at once, we will have to remember what left correlation is. Here, too, there is nothing military. This piece is thought up by Oraklom and especially for tabular expressions (table collection expression). The bottom line is that values ​​in a table expression can be used (columns) from the data sets defined in the from expression before (to the left) of the table expression itself. In practice, this does not look so terrible as it is heard:
SQL> with demo3 as(select 1 id, XMLType('<a><b>b1</b><b>b2</b></a>') xml from dual 2 union all select 2 id, XMLType('<a><b>b3</b><b>b4</b></a>') xml from dual) 3 select id xml_id 4 ,extractValue(value(t),'b') result 5 from demo3 s,table(XMLSequence(s.xml.extract('a/b'))) t; XML_ID RESULT
Here, in the table expression t, the xml value of the demo3 table is used. The expression will be calculated for each row of the demo3 table. This is the very thing called such an elaborate word - the left correlation.
The described functionality is quite enough to parse the XML of almost any complexity. These tools can not be parsed, perhaps, only the hierarchically presented data of a knowingly unknown depth of nesting. To parse such structures will have to resort to XSLT, to bring XML to a more readable form. The XSLT transformation is performed by the XMLTransform function, which takes the XMLType of the source document, the second XMLType of the XSL template, and returns the XMLType of the transformation result as the first parameter.
In principle, this theory can be completed. In conclusion, I’ll only demonstrate an example of extracting elements from different levels of XML nesting. For beginners it sometimes causes difficulties.
SQL> with demo4 as( 2 select XMLType( 3 '<master> 4 <id>mater id</id> 5 <details> 6 <detail> 7 <id>detail 1 id</id> 8 <sub_details> 9 <sub_detail> 10 <id>sub_detail 1.1 id</id> 11 </sub_detail> 12 <sub_detail> 13 <id>sub_detail 1.2 id</id> 14 </sub_detail> 15 </sub_details> 16 </detail> 17 <detail> 18 <id>detail 2 id</id> 19 <sub_details> 20 <sub_detail> 21 <id>sub_detail 2.1 id</id> 22 </sub_detail> 23 <sub_detail> 24 <id>sub_detail 2.2 id</id> 25 </sub_detail> 26 </sub_details> 27 </detail> 28 <detail> 29 <id>detail 3 id</id> 30 </detail> 31 </details> 32 </master>' 33 ) xml from dual) 34 select extractValue(s.xml,'master/id') master_id 35 ,extractValue(value(dtl),'detail/id') detail_id 36 ,extractValue(value(subdtl),'sub_detail/id') sub_detail_id 37 from demo4 s 38 ,table(XMLSequence(s.xml.extract('master/details/detail'))) dtl 39 ,table(XMLSequence(value(dtl).extract('detail/sub_details/sub_detail')))(+) subdtl; MASTER_ID DETAIL_ID SUB_DETAIL_ID
As you can see, there is nothing new here. All the same left correlation. The only thing that I would like to draw attention to (+) at the end of the table expression is subdtl. As probably not difficult to guess, it means that you should use an external connection. If we didn’t specify it, we wouldn’t get lines with detail 3.
So, what appeared before our eyes? We have one object type, a relatively limited set of functions, which gives a practically unlimited set of possibilities. I really like this implementation. I am particularly impressed that Oracle corp did not have to align the semantics of their SQL in order to write XML into it. All described features - objects, conveyor functions, table expressions are used by this technology, but not created specifically for it. It turns out that anyone could implement such an implementation. This implementation by the fat line underlines the power and flexibility of the Oracle SQL engine.
On this note, I could have finished, however, the question that I anticipated, and with reproach, does not give me peace. “My friend, in what century do you live in general, did you look at the calendar for a long time? The yard is nearing the end of 2011, the year is now far from being the first production raised at the 11-2 version of the database, and you are still chewing the ninth functional. ” Yes, there is such a sin for me. I know perfectly well that in the 10th version, we introduced a wonderful XMLTable, which completely pushes the functionality that I just described into the background. Allowing it is even easier and no longer forced to parse XML. However, with XMLTable, I still do not have enough experience to say anything beyond the already obvious. Therefore I will confine myself to a simple demonstration.
I will show on the same example:
34 select master_id 35 ,details_id 36 ,sub_details_id 37 from demo4 s 38 ,XMLTable('master' 39 passing (s.xml) 40 columns master_id varchar2 (20) path 'id' 41 ,details XMLType path 'details/detail') mstr 42 ,XMLTable('detail' 43 passing (mstr.details) 44 columns details_id varchar2 (20) path 'id' 45 ,sub_details XMLType path 'sub_details/sub_detail')(+) dtl 46 ,XMLTable('sub_detail' 47 passing (dtl.sub_details) 48 columns sub_details_id varchar2 (20) path 'id')(+) sub_dtl; MASTER_ID DETAILS_ID SUB_DETAILS_ID
It would seem that there are many more letters, a fair question may arise ... what is the profit of innovation? The profit is that the first parameter in the XMLTable is no longer an XPath expression, but an XQuery. So, the union can be made by its means, and not by means of SQL. XMLTable promises to be that same yummy, but, alas, I repeat, I still have nothing to say about it.