
DOM L3 XPath support in Project Spartan

Note from the translator: I am a server-side Java programmer, but it has historically worked out that I work exclusively under Windows. On the team everyone mostly uses a Mac or Linux, but someone has to live-test the web interfaces of our projects under a real IE, and who if not me? So I have been using it for quite a few years both for work and, out of laziness, as my main browser. In my opinion, with each new version starting with the ninth it has become more and more worthy, and Project Spartan promises to be excellent altogether, at least in terms of technology, on an equal footing with the others. I bring to your attention a translation of an article from the developers' blog that gives some grounds for this hope.

Providing compatibility with DOM L3 XPath


Having set ourselves the goal of delivering a truly interoperable and modern web platform in Windows 10, we are constantly working to improve standards support, in particular DOM L3 XPath. Today we would like to tell you how we achieved this in Project Spartan.

A bit of history


Before implementing support for the DOM L3 Core standard and native XML documents in IE9, we provided web developers with the MSXML library through the ActiveX mechanism. In addition to the XMLHttpRequest object, MSXML offered partial support for the XPath query language through a set of proprietary APIs, selectSingleNode and selectNodes. From the point of view of applications using MSXML this approach simply worked, but it did not comply with the W3C standards either for interacting with XML or for working with XPath.

Library authors and website developers had to wrap their XPath calls in order to switch between implementations on the fly. If you search for online tutorials or examples on XPath, you will immediately notice wrappers for IE and MSXML, for example:
// code for IE
if (window.ActiveXObject || xhttp.responseType == "msxml-document") {
    xml.setProperty("SelectionLanguage", "XPath");
    nodes = xml.selectNodes(path);
    for (i = 0; i < nodes.length; i++) {
        document.write(nodes[i].childNodes[0].nodeValue);
        document.write("<br>");
    }
}
// code for Chrome, Firefox, Opera, etc.
else if (document.implementation && document.implementation.createDocument) {
    var nodes = xml.evaluate(path, xml, null, XPathResult.ANY_TYPE, null);
    var result = nodes.iterateNext();
    while (result) {
        document.write(result.childNodes[0].nodeValue);
        document.write("<br>");
        result = nodes.iterateNext();
    }
}

For our new plug-in-free web engine we needed to provide native XPath support.

Evaluation of possible options


We immediately began to evaluate the available options for implementing such support. We could write it from scratch, fully integrate MSXML into the browser, or port System.XML from .NET, but all of these would take too much time. So we decided to start by implementing support for a core subset of XPath while planning for the full standard.

To determine which initial subset of the standard was worth implementing, we used an internal tool that collects statistics on the queries issued by hundreds of thousands of the most popular sites. It turned out that the most common queries fall into a handful of simple patterns.

Each of these patterns corresponds exactly to a CSS selector that can be redirected to the very fast implementation of the CSS selectors API that we already have. Compare for yourself:
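For example, queries of roughly the following shapes map one-to-one onto CSS selectors (the pairs below are illustrative examples, not the exact queries from the telemetry):

// XPath query                      Equivalent CSS selector
// //div                            div
// //*[@id="content"]               [id="content"]
// //span[@class="price"]           span[class="price"]

// Both forms can be answered by the same fast selector engine:
var prices = document.querySelectorAll('span[class="price"]');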

Thus, the first step in implementing XPath support was to write a converter from XPath queries to CSS selectors and to redirect the calls to the selector engine. Having done this, we again used our telemetry to measure the percentage of successfully converted queries and to find out which of the failing ones were the most common.
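A minimal sketch of this idea, written here in page-level JavaScript purely as an illustration of the approach (it is not the engine's actual code, and the two recognized patterns are examples):

// Toy converter for two simple XPath shapes; anything it cannot handle
// is reported so that a fuller XPath engine could take over.
function xpathToCss(xpath) {
    var m;
    // //tagName  ->  tagName
    if ((m = /^\/\/(\w+)$/.exec(xpath))) {
        return m[1];
    }
    // //*[@attr="value"]  ->  [attr="value"]
    if ((m = /^\/\/\*\[@(\w+)="([^"]*)"\]$/.exec(xpath))) {
        return '[' + m[1] + '="' + m[2] + '"]';
    }
    return null; // not a shape this sketch understands
}

function evaluateViaCss(xpath, contextNode) {
    var selector = xpathToCss(xpath);
    if (selector === null) {
        throw new Error('Unsupported XPath expression: ' + xpath);
    }
    // Redirect the query to the existing, fast CSS selector engine.
    return (contextNode || document).querySelectorAll(selector);
}

For instance, evaluateViaCss('//*[@id="content"]') returns the same nodes that the corresponding XPath query would select.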

It turned out that such an implementation covers as much as 94% of queries and immediately makes a great many sites work. Among the queries that failed, the most common were two class-based patterns, both of which rewrite perfectly into an "element.className" selector. By adding this rule we improved coverage to 97% of sites, which means the new engine is practically ready for the modern web.
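As an illustration, such a rule could be added as one more branch to the converter sketched above (again only a sketch; the exact form of the failing queries is an assumption here):

// //tag[contains(@class,"name")]  ->  tag.name
// Note: contains() is a substring test while the CSS class selector matches
// whole class tokens, but for typical markup the two agree.
if ((m = /^\/\/(\w+)\[contains\(@class,\s*"([^"]+)"\)\]$/.exec(xpath))) {
    return m[1] + '.' + m[2];
}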


The result of a telemetry run for XPath queries

Providing support for the remaining 3% of sites


Handling the vast majority of XPath queries by simply converting them to CSS selectors is great, but still not enough, because the rest cannot be implemented the same way. The XPath grammar includes advanced features such as functions, queries over non-element document nodes, and complex predicates. Some authoritative sites (including MDN) suggest that in such cases platforms without adequate native XPath support should use polyfill libraries.

One such library is wicked-good-xpath (WGX), written in pure JS. We ran it against our internal test suite for the XPath specification, and compared to native implementations it showed 91% compatibility as well as very decent performance. So the idea of using WGX for the remaining 3% of sites looked very attractive to us. Moreover, it is an open-source project under the MIT license, which fits nicely with our intention to contribute more and more to open source. However, we had never before used a JavaScript polyfill inside IE to support any web standard.
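On an ordinary web page WGX is normally used the other way round, as a classic polyfill. A minimal sketch, assuming the wgxpath.install() entry point that the open-source project exposes:

// Load wicked-good-xpath (e.g. a bundled copy), then install it only
// where the native API is missing.
if (!document.evaluate && window.wgxpath) {
    wgxpath.install(); // makes document.evaluate and XPathResult available
}

var headings = document.evaluate('//h1', document, null,
                                 XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
for (var node = headings.iterateNext(); node; node = headings.iterateNext()) {
    console.log(node.textContent);
}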

To let WGX do its job without polluting the document's context, we run it in a separate, isolated instance of the JS engine, feeding the query and the necessary data from the page into it and taking the finished result back out. By modifying the WGX code to work in this mode, "cut off" from the document, we immediately improved the display of content on many sites in our new browser.
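The browser does this inside its own engine, but the shape of the contract can be sketched in ordinary JavaScript: the evaluator never touches the live document and only sees data explicitly handed to it. Everything below (the worker file, the message format) is a hypothetical illustration, not Spartan's internals:

// A Web Worker stands in for the separate, isolated engine instance:
// the page serializes what the query needs, the isolated side evaluates it,
// and only plain data comes back.
var worker = new Worker('xpath-sandbox.js'); // hypothetical script wrapping WGX

function evaluateIsolated(xpath, serializedDocument) {
    return new Promise(function (resolve) {
        worker.onmessage = function (e) { resolve(e.data); };
        worker.postMessage({ query: xpath, dom: serializedDocument });
    });
}

// The isolated side never sees `document`; it works only on the snapshot
// it was given and posts back plain results.
evaluateIsolated('//item/price', new XMLSerializer().serializeToString(document))
    .then(function (matches) { console.log(matches); });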

Sites before using WGX

The same sites afterwards. Note the prices and winning ticket numbers that now appear.

However, WGX also has bugs that make its behavior diverge from the W3C specification and from other browsers. We plan to fix them first and then share the patches with the community.

Thus, with some data mining of the web and the help of an open-source library, our new engine gained efficient XPath support in a short time, and users will soon get better support for web standards. You can download the latest Windows 10 Technical Preview and see for yourself. You can also tell us how we did via UserVoice, tweet at us, or comment on the original article.

P.S. from the translator: the trend of JavaScript turning into a language in which the platforms themselves are written is, as they say, plain to see. Take Firefox's Shumway or PDF.js. Now Microsoft, too, is moving its browser, at least in part, to JS.

Source: https://habr.com/ru/post/253595/

