
Monitoring Dynamic XML Documents


At work, while designing a new system for integrating devices that monitor audio/video streams, we faced the task of tracking, accumulating, and later analyzing changes in the devices' state. The devices report their status through a zoo of dynamic XML documents, used mainly to populate a legacy web UI.

To simplify integration, I proposed creating a generic library for storing structured diffs of (almost) arbitrary XML. Because the diffs are stored with the document's structure taken into account, changes in device state can be accumulated very economically, and reports with analytics, charts, etc. can be generated from them. After a week of binge programming, I sketched a working proof of concept, which I want to share in this article.

Creating a document schema


The library uses XSD as its source of information about the document structure. Getting an XSD is very simple: many online services can generate a valid XSD from a sample XML document, and in most cases that is enough.

Next, the generated XSD needs a small modification. For each element of the original XML document that can occur multiple times, add a `monId` attribute to the corresponding XSD `element`. Its value is the name of the attribute that uniquely identifies each instance of the repeated element. For example, suppose we are going to monitor documents of the following type:
<element1>
  <element2 attr1="value1">
    <element3>
      <element4 attr2="value2">value3</element4>
      <element4 attr2="value4">value5</element4>
      <element4 attr2="value6">value7</element4>
    </element3>
  </element2>
  <element2 attr1="value8">
    <element3>
      <element4 attr2="value9">value10</element4>
      <element4 attr2="value11">value12</element4>
    </element3>
  </element2>
</element1>

From the structure of the document it is clear that at least two elements occur multiple times: `element2` and `element4`.
Therefore, `monId` must be added to the corresponding XSD `element` declarations, set to the names of the identifying attributes:
...
<xs:element name="element2" maxOccurs="unbounded" minOccurs="0" monId="attr1">
...
<xs:element name="element4" maxOccurs="unbounded" minOccurs="0" monId="attr2">
...
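To make the role of `monId` concrete, here is a minimal sketch of how such annotations could be read back out of a schema with Go's standard `encoding/xml` decoder. It is not the library's parser, which the article does not show; the names `collectMonIDs`, `xsdNS` and `sampleXSD` are hypothetical, and the schema is a hand-trimmed version of one for the example document above, with attribute and type declarations omitted.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"sort"
	"strings"
)

// xsdNS is the XML Schema namespace the xs: prefix is bound to.
const xsdNS = "http://www.w3.org/2001/XMLSchema"

// sampleXSD is a hypothetical, trimmed schema for the example document,
// with monId added to the two repeated elements.
const sampleXSD = `<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="element1">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="element2" maxOccurs="unbounded" minOccurs="0" monId="attr1">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="element3">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="element4" maxOccurs="unbounded" minOccurs="0" monId="attr2"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>`

// collectMonIDs streams through an XSD and maps every xs:element that carries
// a monId attribute to that attribute's value (the identifying attribute name).
func collectMonIDs(xsd string) (map[string]string, error) {
	ids := make(map[string]string)
	dec := xml.NewDecoder(strings.NewReader(xsd))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return ids, nil
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Space != xsdNS || start.Name.Local != "element" {
			continue // only xs:element start tags are of interest
		}
		var name, monID string
		for _, a := range start.Attr {
			switch a.Name.Local {
			case "name":
				name = a.Value
			case "monId":
				monID = a.Value
			}
		}
		if name != "" && monID != "" {
			ids[name] = monID
		}
	}
}

func main() {
	ids, err := collectMonIDs(sampleXSD)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	names := make([]string, 0, len(ids))
	for n := range ids {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic output order
	for _, n := range names {
		fmt.Printf("%s is identified by %s\n", n, ids[n])
	}
}
```

A token stream is enough here because only the `name` and `monId` attributes of each `xs:element` start tag matter. Nesting is ignored, so two different elements sharing a name would collide in the map; a fuller parser would key them by path.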

How it works


So, the library parses the XSD (for now only a limited subset is supported, enough to handle most automatically generated schemas) and, based on it, creates tables corresponding to the elements of the original document.



Once the internal representation of the document schema is built, each element corresponds to a table in the database. Any change to an element adds a new row to that table, i.e. each row represents an event (add, change, delete, or snapshot). To retrieve the version of the document corresponding to a given timestamp, the library scans all events for each element and reconstructs its state.
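How changes turn into events is not spelled out above. One plausible approach, sketched here under that assumption, is to compare two consecutive versions of a repeated element and match instances by their `monId` attribute: new keys become `add` events, changed text becomes `change`, and missing keys become `delete`. The `Event` type and `diff` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical change record for one instance of a repeated element.
type Event struct {
	Kind  string // "add", "change" or "delete"
	Key   string // value of the identifying (monId) attribute
	Value string // new text content; empty for "delete"
}

// diff compares two versions of a repeated element, each given as a map from
// the monId attribute value to the element's text content, and returns the
// events that turn prev into next. Unchanged instances produce no event.
func diff(prev, next map[string]string) []Event {
	var events []Event
	for key, value := range next {
		old, existed := prev[key]
		if !existed {
			events = append(events, Event{Kind: "add", Key: key, Value: value})
		} else if old != value {
			events = append(events, Event{Kind: "change", Key: key, Value: value})
		}
	}
	for key := range prev {
		if _, still := next[key]; !still {
			events = append(events, Event{Kind: "delete", Key: key})
		}
	}
	// Map iteration order is random; sort by key for a deterministic result.
	sort.Slice(events, func(i, j int) bool { return events[i].Key < events[j].Key })
	return events
}

func main() {
	// element4 instances under element2 attr1="value1", keyed by attr2.
	prev := map[string]string{"value2": "value3", "value4": "value5", "value6": "value7"}
	next := map[string]string{"value2": "value3", "value4": "changed", "value13": "new"}
	for _, e := range diff(prev, next) {
		fmt.Printf("%-6s %s %q\n", e.Kind, e.Key, e.Value)
	}
}
```

With the sample data, `value2` is unchanged and produces no event, so only the differences are stored. This is what makes keyed diffs economical compared with storing every full version.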

Since there can be many events, such reconstruction takes longer and longer over time. That is why a snapshot of each document's current state has to be saved periodically. Elements are then reconstructed not from the very beginning of the document's existence, but from the latest snapshot taken at or before the specified timestamp.

Usage


The library is written in Go and stores documents in PostgreSQL, using libpq as the database driver. In its current state it can only save XML documents and reconstruct them for an arbitrary timestamp.

Usage example
package main

import (
	"btc/data"
	"btc/mon"
	"btc/xmls"
	"database/sql"
	"log"
	"os"
	"time"
)

func install(db *sql.DB) {
	var err error
	if err = mon.Install(db); err != nil {
		log.Fatalf("failed to install data monitor: %s", err)
	}

	var root *xmls.Element
	root, err = xmls.FromFile("tmp/etr.xsd")
	if err != nil {
		log.Fatalf("failed to create xml schema: %s", err)
	}

	schema := mon.NewSchema("etr", "probe ETR-290 checks")
	if err = mon.AddSchema(db, schema, root); err != nil {
		log.Fatalf("failed to install schema: %s", err)
	}

	doc := mon.NewDoc("hw4_172_etr", "etr",
		"http://10.0.30.172/probe/etrdata?inputId=0&tuningSetupId=1", 60, 86400)
	if err = mon.AddDoc(db, doc); err != nil {
		log.Fatalf("failed to add document: %s", err)
	}
}

func commit(db *sql.DB) {
	file, err := os.Open("tmp/etr.xml")
	if err != nil {
		log.Fatalf("failed to open xml doc: %s", err)
	}
	defer file.Close()

	if err = mon.CommitDoc(db, "hw4_172_etr", file, false); err != nil {
		log.Fatalf("failed to commit doc: %s", err)
	}
}

func checkout(db *sql.DB) {
	timestamp, err := time.Parse(time.RFC3339, "2015-12-25T18:26:58+01:00")
	if err != nil {
		log.Fatalf("failed to parse timestamp: %s", err)
	}

	// Writes the document state as of timestamp to stdout.
	if err := mon.CheckoutDoc(db, "hw4_172_etr", timestamp, os.Stdout, " ", " "); err != nil {
		log.Fatalf("failed to checkout doc: %s", err)
	}
}

func main() {
	// NewConfig is not shown in the article; it is defined elsewhere in package main.
	config, err := NewConfig("config.json")
	if err != nil {
		log.Fatalf("failed to load config: %s", err)
	}

	var db *sql.DB
	db, err = data.Open(config.DbConnStr)
	if err != nil {
		log.Fatalf("failed to establish db connection: %s", err)
	}
	defer db.Close()

	// Uncomment install (one-time setup) and commit as needed.
	//install(db)
	//commit(db)
	checkout(db)
}

Source: https://habr.com/ru/post/274131/

