QlikView and its younger brother QlikSense are great BI tools that are quite popular in our country and abroad. Very often, these systems save the "intermediate" results of their work - the data that visualize their "dashboards" - in the so-called "QVD files". Often QVD files are used as the main repository in multi-stage ETL processes built on the basis of Qlik. And then some (for me, for example, I deal with data engineering issues in a company) have a question - is it possible and how to use this data without QlikView / QlikSense? Or another - and what is there and is it “correct”?
QVD is a file format optimized for QlikView / QlikSense (reading information from these applications from writing to these files is much faster than to files of any other format). The structure of this file is undocumented and covered by the "darkness of propriety", there are almost no applications that can work with such files (read and write). In this series of articles I will share my experience and practical knowledge gained: I know how QVD works, I can read it directly and quickly and write to it.
Who will be interested in this information: first of all, those who work with QlikView / QlikSense, as well as those who (like me) would like to use the data stored in QVD files. And, of course, all inquisitive.
Everything that is written in this series is based on my personal experience, which, of course, is not "documentation" or "guarantee" (that your files will be exactly as I described. Or that it will last forever ). I also cannot guarantee that I have sorted out all the cases - there may be some files that will contain something not described by me (if only because I did not come across such options). However, I must note that the information has been verified on a large (several hundred) set of files created by different people from different systems using different versions of QlikView / QlikSense.
And a little bit about how I did it: I started with a simple one - a small inline example, persisting in QVD. Further - the analysis of the binary file, brain efforts, tests and errors. Looking ahead (I’ll talk about this in more detail at the conclusion of the series) I managed to quite effectively read and write medium-sized QVD files (hundreds of gigabytes). The starting point of my journey into the world of QVD was this GitHub , many thanks to the author (I tried to contact him - he is not responding).
What was my goal (besides curiosity and the desire to check the correctness of the data with which QlikView / QlikSense works) - I needed to read the contents of the QVD file, i.e. recreate a relational table based on it. And vice versa, upload the relational table data to QVD so that QlikView can load it correctly.
How do I see this series of articles
The QVD file is created by the QlikView / QlikSense script during data loading into the application's memory (the result of the STORE command) and corresponds to one (relational) QlikView / QlikSense table. It consists of two parts.
Metadata is presented in the form of XML (an example will be given below), the binary part begins immediately after the textual one and consists of two blocks
Thus, for a table with N columns, the file will contain N + 1 binary blocks. All parts of the file are "tightly glued together" and go one after the other without any kind of fillers or "liners".
QVD file contains a lot of metadata - "data about data". It is almost self-sufficient, judge for yourself, here is a short list of what is in the metadata (I will describe them in more detail below):
Metadata is stored in a file in text form and can be seen in any program that can show the file in text form (well, almost any ... in one that is not afraid of large files). Personally, I look at the meta information with the help of more - quite conveniently.
In the following, I will use a test table (I use the QlikView syntax, but I think it will be easy to speculate):
SET NULLINTERPRET =<sym>; tab1: LOAD * INLINE [ ID, NAME 123.12,"Pete" 124,12/31/2018 -2,"Vasya" 1,"John" <sym>,"None" ];
I will give as an example the metadata for this label.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <QvdTableHeader> <QvBuildNo>7314</QvBuildNo> <CreatorDoc></CreatorDoc> <CreateUtcTime>2019-04-03 06:24:33</CreateUtcTime> <SourceCreateUtcTime></SourceCreateUtcTime> <SourceFileUtcTime></SourceFileUtcTime> <SourceFileSize>-1</SourceFileSize> <StaleUtcTime></StaleUtcTime> <TableName>tab1</TableName> <Fields> <QvdFieldHeader> <FieldName>ID</FieldName> <BitOffset>0</BitOffset> <BitWidth>3</BitWidth> <Bias>-2</Bias> <NumberFormat> <Type>0</Type> <nDec>0</nDec> <UseThou>0</UseThou> <Fmt></Fmt> <Dec></Dec> <Thou></Thou> </NumberFormat> <NoOfSymbols>5</NoOfSymbols> <Offset>0</Offset> <Length>42</Length> </QvdFieldHeader> <QvdFieldHeader> <FieldName>NAME</FieldName> <BitOffset>3</BitOffset> <BitWidth>5</BitWidth> <Bias>0</Bias> <NumberFormat> <Type>0</Type> <nDec>0</nDec> <UseThou>0</UseThou> <Fmt></Fmt> <Dec></Dec> <Thou></Thou> </NumberFormat> <NoOfSymbols>5</NoOfSymbols> <Offset>42</Offset> <Length>37</Length> </QvdFieldHeader> </Fields> <Compression></Compression> <RecordByteSize>1</RecordByteSize> <NoOfRecords>5</NoOfRecords> <Offset>79</Offset> <Length>5</Length> </QvdTableHeader>
My experience with QVD shows that the XML structure does not change from file to file.
I will comment on the most important elements of metadata.
QvBuildNo
The build number of the QlikView / QlikSense application that generated the QVD file.
CreatorDoc
Typically contains the name of the QVW file whose script spawned the QVD file. This example is empty, possibly because the Personal Edition was used.
CreateUtcTime
Time to create a QVD file.
SourceCreateUtcTime, SourceFileUtcTime, SourceFileSize, StaleUtcTime
I did not see the files in which these fields would be filled in - to an inquiring mind: maybe some settings are missing?
Tablename
The name of the table in QlikView (see example above).
By the way, the words "field" and "column" are synonymous for me, do not be alarmed if I use both of them (I will try not to do this, but still ...).
Each field in QVD stores information about
Fieldname
Field name (again in terms of QlikView, i.e. with regard to "AS")
BitOffset, BitWidth, Bias
For now, skipping is the information for "decrypting strings", consider in the third part, when you will deal with strings.
Type, nDec, UseThou, Fmt, Dec, Thou
Well conceived (judging by the name), but absolutely useless from the point of view of achieving my goal information (in more detail - in the second part, where we will talk about columns). Why useless? - "Type" tag does not correlate with the type of data stored in the binary part. It cannot restore column type (it would seem - what could be simpler, there is a Type tag!). In 90% of cases, the value of this tag will be the string UNKNOWN ...
In the metadata about the columns there are still such data (in the metadata of the example it is not, apparently, due to the small size)
<Comment></Comment> <Tags> <String>$numeric</String> <String>$integer</String> </Tags>
The comment in comments does not need (by the way, in those files with which I worked, 100% is empty ...).
Tags are also useless (in terms of restoring the structure of the table) information. But one can guess from it what kind of information is stored in the column. I will touch upon the typing in the second part in more detail - when I talk about the columns: this is important. But a little more complicated than I would like.
NoOfSymbols
The number of entries in the binary part related to this column. As we see - in our example it is 5. Very important information for decoding.
Offset
The offset of the data block of this column in bytes relative to the beginning of the binary part of the file. Also very important.
Length
The length of the entire data block of this column in bytes. Note that the binary representation of the column element (table cell) in general has a variable length (string, for example), so the length cannot be calculated, you can only take from this tag (smile).
Compression
Never completed (in the data I worked with). We may not use this option ...
RecordByteSize
The size of the string entry in bytes. All strings are represented in a binary block of strings as a bit index (more on this in the third part), the bit index consists of strings of the same length.
NoOfRecords
The number of rows (in the bit index and in the source table).
Offset
Bit index offset (block with information about strings) in bytes relative to the beginning of the binary part of the file.
Length
The length of the bit index in bytes.
In the metadata about the lines there are still such data (again - a short example does not allow to see everything, but it does allow to understand the complex)
<Lineage> <LineageInfo> <Discriminator>Provider=OraOLEDB.Oracle.1;Persist Security Info=True;Data Source=XXXX;Extended Properties=""</Discriminator> <Statement>LinkTable: LOAD SOURCE_NAME & '_' & SOURCE_ID as SYSKEY, HID_PARTY;SQL SELECT * FROM UNITED_VIEW</Statement> </LineageInfo> <LineageInfo> <Discriminator>Provider=OraOLEDB.Oracle.1;Persist Security Info=True;Data Source=XXXX;Extended Properties=""</Discriminator> <Statement>SQL SELECT * FROM UNITED_VIEW</Statement> </LineageInfo> <LineageInfo> <Discriminator>STORE - \\xxx.ru\mfs\SPECIAL\Qlikview\QVData\LinkTable.qvd (qvd)</Discriminator> <Statement></Statement> </LineageInfo> </Lineage> <Comment></Comment>
I won’t go too far here, it’s pretty clear (the original SELECTs that spawned the table in QlikView), I haven’t figured it out yet (sometimes they split up) ... (except for one, there are no 100% comments (smile)) .
The inquisitive reader has the right to ask here: “Nothing new has been said so far, all of the above can be taken and viewed in the XML header of the QVD file ... This has already been repeatedly written on various Internet sites, what is the novelty?”. That's right - the first part is almost entirely devoted to metadata. But this is not the end.
What's next - in the next part, we will look in detail at the structure of the binary part of a QVD file containing information about the columns (unique values ​​of all columns of the table).
Source: https://habr.com/ru/post/454302/
All Articles