📜 ⬆️ ⬇️

jl-sql: SQL queries by JSON logs on the command line

Nobody is interested in the introduction, so I'll start right away with examples of using


json-pipe-sql
% cat log.json 

 {"type": "hit", "client": {"ip": "127.1.2.3"}} {"type": "hit", "client": {"ip": "127.2.3.4"}} {"type": "hit", "client": {"ip": "127.3.4.5"}} {"type": "hit", "client": {"ip": "127.3.4.5"}} {"type": "hit", "client": {"ip": "127.1.2.3"}} {"type": "click", "client": {"ip": "127.1.2.3"}} {"type": "click", "client": {"ip": "127.2.3.4"}} 

Execute the query:


 % cat log.json | jl-sql 'SELECT client.ip, COUNT(*) AS count WHERE type = "hit" GROUP BY client.ip' 

 {"client":{"ip":"127.1.2.3"},"count":2} {"client":{"ip":"127.2.3.4"},"count":1} {"client":{"ip":"127.3.4.5"},"count":2} 



Short description


As you can see, to execute the query, the jl-sql utility is used, which accepts a stream of JSON objects separated by a newline character ( "\n" ) as input.


It is worth noting here that the utility is based on the jl-sql-api library, on the basis of which you can quite simply implement data processing in an arbitrary format, not limited only to JSON.


The SQL engine supports WHERE , GROUP BY , HAVING , ORDER BY and {LEFT|INNER} JOIN . As for JOIN , there is a restriction on the expression in ON : support for joins is implemented only by the exact correspondence of two fields, the so-called Equi Join :


 SELECT id, @town.name AS townName JOIN `towns.json` ON @towns.id = townId 



Examples


Definition of unique ip-addresses


 % cat log.json | jl-sql 'SELECT DISTINCT client.ip AS ip' 

 {"ip":"127.1.2.3"} {"ip":"127.2.3.4"} {"ip":"127.3.4.5"} 

Counting the number of unique addresses for each group


 % cat log.json | jl-sql 'SELECT type, COUNT(DISTINCT client.ip) AS ipCount GROUP BY type' 

 {"type":"click","ipCount":2} {"type":"hit","ipCount":3} 

Rebuilding facilities


With the help of alias ( AS ), you can not only assign aliases to the fields, but also create complex structures within the object:


 % echo '{"a": 1, "b": 2}' | jl-sql 'SELECT a AS sub.bar.first, b AS sub.foo.second' 

 {"sub":{"bar":{"first":1},"foo":{"second":2}}} 

Delete and modify


In addition to SELECT , DELETE and UPDATE are also supported.


 % cat log.json | jl-sql -b :ip=127.1.2.3 'DELETE WHERE client.ip = :ip' 

 % cat log.json | jl-sql -b :ip=127.1.2.3 'UPDATE SET banned = true WHERE client.ip = :ip' 

These examples also show the use of bindings (the -b / --bind option), more in detail about them is described in the corresponding section .


Work with dates


 % echo '{"ts": "2017-01-18 00:00:00"}' | jl-sql 'SELECT ts - INTERVAL 2 MONTH 3 DAY 4 HOUR 5 MINUTE 6 SECOND AS past' 

The keyword INTERVAL allows you to "add" and "take away" periods of time from a certain date. All this is similar to using INTERVAL in MySQL.


Since JSON does not provide a separate data type for dates, strings are used to save them. The utility understands formats from RFC2822 or ISO 8601 . Others may be used, but the results will be unpredictable.


It should be noted that the system time zone is used for parsing and date manipulation. If this does not suit you, then you can set the TZ environment variable with the desired time zone before running jl-sql .


A more detailed description of working with dates can be found in the documentation on GitHub .


JOIN


For JOIN , at least one more data source is required, in terms of SQL, this is called a “table”, as such sources are regular files:


 % cat banned.json 

 {"ip": "127.1.2.3"} 

 % cat log.json | jl-sql -b :banned=banned.json 'SELECT * INNER JOIN {:banned} ON @banned.ip = client.ip' 

 {"type":"hit","client":{"ip":"127.1.2.3"}} {"type":"hit","client":{"ip":"127.1.2.3"}} {"type":"click","client":{"ip":"127.1.2.3"}} 

In this example, a new concept was introduced - the data source ( @banned ), more details about this and JOIN in general can be found in the corresponding section .




Productivity and consumed resources


The utility is designed for processing large logs. Depending on the type, the request can be executed either in streaming mode or in the mode of using temporary storage . A huge advantage of streaming mode is the ability to perform jl-sql in conjunction, for example, with tail -f for filtering and reformatting logs in real time.



For sorting large volumes of jl-sql uses the system utility sort , which allows the use of more productive native sorting.




Installation


The code is written in JavaScript under Node.js, so the easiest way to install the package is through npm :


 # npm install -g jl-sql 

Attention: Node.js version is required not less than 6.0.0




Dirty details


Sorting


Sorting can be set by the standard ORDER BY expression [{DESC|ASC}] , the sorting is supported by several fields at once, including multidirectional ones.


The unstructured data at the input introduces its own corrections: the field by which sorting is performed may be absent in some objects or may have different types of data in different objects. Therefore, it is advisable to always explicitly specify the data type with the STRING() and NUMBER() functions:


Sort by rows


 ORDER BY STRING(field) 

Sort by numbers


 ORDER BY NUMBER(field) 

If you do not explicitly specify the type of sorting, the utility will try to determine the type according to the rules described here . If the type could not be determined, then the sorting will occur in rows.


Values ​​are converted to a string according to the following rules:


Data typeString representation
linesource line
numberdecimal string
boolean"true" / "false"
null"null"
missing field"" (empty string)
an objectN / A *
arrayN / A *

* - conversion of strings and arrays to a string is not specified, so relying on the order of these values ​​in the total set should not be.


I recommend always explicitly setting the data type. current default behavior can be changed or completely removed as potentially dangerous.




Binding


In order to solve the problem of data screening on the command line, the utility allows you to use binders (substitutions):


 jl-sql -b :type=hit jl-sql 'SELECT client.ip, COUNT(*) AS count WHERE type = :type GROUP BY client.ip' 

Here the option -b :type=hit specifies a substitution with the name :type and the value "hit" . This makes it possible to use conventional shell-level screening for SQL queries.


Similarly, you can use substitutions for the file names used in the JOIN :


 jl-sql -b :towns=towns.json 'SELECT id, @town.name AS townName JOIN {:towns} ON @towns.id = townId' 

This will allow the use of autocomplete file names in your favorite shell.


More specifically about JOIN can be read in the appropriate section JOIN .




JOIN


JOIN support required an introduction to the syntax of a new entity - the name of the data source (you can call it a "table"). The problem here is that in "classic" SQL you can always determine where in the ident1.ident2 identifier ident1.ident2 name of the table, and where is the name of the field. With JSON, everything is more complicated: objects can have different nesting, so without special syntax it is impossible to say exactly what the user had in mind when he entered ident1.ident2 because ident1 can be either a field name or a "table" name.


To resolve this ambiguity, the @ident syntax was @ident , where @ is the prefix indicating that the identifier next to it is the name of the "table".


Let's return to the example from the beginning of the article and analyze it in more detail:


 % cat log.json | jl-sql -b :banned=banned.json 'SELECT * INNER JOIN {:banned} ON @banned.ip = client.ip' 

So let's start from the beginning:


-b :banned=banned.json - create a binding with the name of the source file, this is not an obligatory step, but it allows your shell to auto-complete petey, and also eliminates the need to escape special characters in the path


INNER JOIN {:banned} - here {:banned} is a special syntax for substitution of binding in JOIN . Without the use of binders, this line would look like INNER JOIN `banned.json` . The use of back quotes is mandatory here. otherwise, the dot ( . ) will be interpreted in a special way.


@banned.ip - here @banned is the name of the table. In this case, the name was derived automatically from the name of the binding, but you can explicitly set it via alias: INNER JOIN {:banned} AS @someName , then the call to this table will occur as @someName.ip


I recommend to always use binders, both for files and for regular data. This eliminates many problems.


Currently only two types of JOIN are INNER JOIN : INNER JOIN and LEFT JOIN . If you do not specify a JOIN type in the query, then the INNER JOIN will be used.


Kau has already been said earlier, the expression in ON must have the form ON @table.primary = foreign , that is, the exact correspondence of the field @table.primary (the table to be connected) to the foreign field (the main table).




Comparison operators


For convenience, the operator = (and its alias == ) does not take into account the type of the value and tries to compare the values ​​as liberal as possible, therefore the value of the expression 1 = "1" will be true . In contrast, the === operator also takes into account the type when comparing, so the expression 1 === "1" will already be false . Comparison rules are similar to those used in JavaScript, more details can be found here .


Comparing Objects and Arrays


The behavior of the operators = and === not defined for arrays and objects, so this comparison should be avoided at the moment. In the future, the comparison of objects and arrays will be implemented.




Command line options


 % jl-sql -h 

 Usage: jl-sql [OPTIONS] SQL OPTIONS: -h, --help show this help -I, --ignore-json-error ignore broken JSON -v, --verbose display additional information -B, --sort-in-memory-buffer-length=ROWS save up to ROWS rows for in-memory sort -S, --sort-external-buffer-size=SIZE use SIZE bytes for `sort` memory buffer -T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp -b, --bind=BIND=VALUE+ bind valiable See full documentation at https://github.com/avz/jl-sql 

More detail on interesting options



Links



Thanks for attention.


')

Source: https://habr.com/ru/post/319722/


All Articles