Programming language and database Q: syntax does not matter in the enterprise

There was a need to choose a new tariff plan for the cellular. It became clear for 30 minutes to carry with excel and google-docs that nothing sensible could come of it and db could not do here.

A little thought the hand itself scored "q", since it was the only available on the computer here and now. What I knew about him: that he first and last time launched a year ago, about 30 minutes, for a simple task of parsing and searching through a file.

Next, there will be a lot of q, namely, ascii, a follower of a subset of the languages of APL and Scheme, namely, k and its k-sql extensions, which were reborn into a product named Q - a close link of the language and the database built into it.
')

C:\Users\unknown\Dropbox\j>q KDB+ 3.0 2013.02.06 Copyright (C) 1993-2013 Kx Systems w32/ 2()core 2972MB unknown win-d2om7les24v 192.168.1.2 PLAY 2013.05.07

Some lyrics: I download the report from the operator's site in csv and slightly correct the title:

 ; ;tel;time;-; ;money; ;    ;22.02.2013 20:38:14;79064014328;00:00:13;0;114,9175;0,0000;114,9175;  ;22.02.2013 20:03:49;79094445182;00:12:05;0;114,9175;0,0000;114,9175;    ;22.02.2013 17:04:39;79064014328;00:01:15;0;115,8175;-0,9000;114,9175;    ;22.02.2013 13:18:22;79064014328;00:01:36;0;116,7175;-0,9000;115,8175;    ;22.02.2013 01:35:00;;00:00:00;0;119,3675;-2,6500;116,7175;  ;21.02.2013 23:40:42;*102;00:00:01;0;119,3675;0,0000;119,3675;

Then we read the manual, reading the text file 0: parallelly, you can divide the file into fields by specifying the format of columns and the separator in the left part, if the separator is a list, then the table columns will be named from the first line. We select only the required fields, two times and the table is ready.

Who is bored to read about podgovku data - you can jump immediately to the data analysis .

 q)clog:select tel,time,money from ("SSSTSSSS";enlist ";") 0: `:tel.txt q)clog tel time money --------------------------------- 79064014328 00:00:31.000 0,0000 79263883922 00:02:06.000 0,0000 79064014328 00:01:15.000 -0,9000 79064014328 00:01:36.000 -0,9000 00:00:00.000 -2,6500 *102 00:00:01.000 0,0000 ..

Since q is a vector database, in fact, clog is a dictionary, the column name is a list of values.

 q)clog.money `0,0000`0,0000`-0,9000`-0,9000`-2,6500`0,0000`0,0000`0,0000`0,0000`0,0000`-0,..

Just prepare the data. It can be seen that money is not in a numeric format, it would be necessary to convert it to a number: ssr is an oracle replace. The term $ (cast) deals with various type conversions and type conversions, in this case it reads a number from a string:

each is a map

 {"F"$ssr[string x;",";"."]} each clog.money

Well, write it all in the table using update. There is a small feature. If you use the clog table name, then the result of the update function is a new table with updated values. but you can specify the table name as `clog, then the changes will be saved. We will also make the phone as a string, initially “S” is not a string but a character type.

 q)update string tel, {"F"$ssr[string x;",";"."]} each money from `clog `clog

Almost all words in a given q-sql are normal functions with a small portion of syntactic sugar. They can be used separately, for example, where simply converts the bit string to a list of indices.

comparison works with lists. the result is a bit string from where where it retrieves the indices, and select by these indices extracts the corresponding list elements from the table.

 q)15<40 10 20 30 1011b q)where 15<40 10 20 30 0 2 3

In the file there are both outgoing and incoming calls and payments for different services, I select the lines where the money was written off and there is a phone number and assign it the old name:

 q)clog:select from clog where money<0,not tel like "" q)clog tel time money -------------------------------- "79064014328" 00:01:15.000 -0.9 "79064014328" 00:01:36.000 -0.9 "79064014328" 00:01:33.000 -0.9 "79104652109" 00:01:23.000 -11.9 "79265996349" 00:00:12.000 -5.95 ..

We determine the codes that are used to subsequently classify by operators:

Already a little later, I realized that it was necessary to extract the extraction of code into a function.

 q)gcode:1_ 4# / get code from tel q)gcode each clog.tel "906" "906" "906" "910" "926" ..

 q)distinct gcode each clog.tel "906" "910" "926" ..

there is more sql-like entry with exec. exec is the same select, but which does not return the dictionary of the table, but returns the values or the value of the result of the query or table.

 q)codes:exec distinct gcode each tel from clog q)codes "906" "910" "926" ..

Next, go to the dictionaries, they are described simply <keys>! <values>. I create a dictionary code <> operator.

 q)ops:codes ! `beeline`mts`megafon`beeline`mts`beeline`beeline`mts`moscow q)ops "906"| beeline "910"| mts "926"| megafon "909"| beeline "495"| moscow ..

Many tariffs round up the minute to full, I enter a field for convenience, which will be just a whole number of minutes. I do not save it to the table, I just get the result, because later we will create a view with this field. Time in milliseconds, so I divide by 1000.

 q)update ctime:ceiling (time%1000)%60 from clog tel time money ctime -------------------------------------- "79064014328" 00:01:15.000 -0.9 2 "79064014328" 00:01:36.000 -0.9 2 "79064014328" 00:01:33.000 -0.9 2 "79104652109" 00:01:23.000 -11.9 2 "79265996349" 00:00:12.000 -5.95 1 ..

I create a view with an operator and whole minutes if I wrote t: then I would create a table t. I remind you that update preserves the original columns.

 q)t::update op:ops@gcode each tel, ctime:ceiling (time%1000)%60 from clog q)t tel time money op ctime ---------------------------------------------- "79064014328" 00:01:15.000 -0.9 beeline 2 "79064014328" 00:01:36.000 -0.9 beeline 2 "79064014328" 00:01:33.000 -0.9 beeline 2 "79104652109" 00:01:23.000 -11.9 mts 2 "79265996349" 00:00:12.000 -5.95 megafon 1 ..

I typed everything I needed, save the result t to a file just in case, of course it would be more correct to save the clog and view `t description, but laziness:

 q)save `:t `:t

All that is higher - just preparing the data, now a little more interesting: analysis.

Let's see who called the most, here begins grouping. Grouping is a parameter of the select function, which creates lists for each key occurrence:

 q)select ctime by tel from t tel | ctime .. -------------| --------------------------------------------------------------.. "74956471602"| ,1 .. "79031398210"| 7 3 .. "7903X" | ,2 .. "79064014328"| 2 2 2 2 1 2 1 1 1 3 1 2 2 3 1 1 1 1 2 2 3 3 3 1 3 2 1 1 0 2 1 .. ..

then we execute the functions with the parameter in the form of this list, desc is the inverse sort function, it sorts both the normal list and the table, it is sorted by default in the last column.

 q)desc select sum ctime by tel from t tel | ctime -------------| ----- "79064014328"| 126 "79094445182"| 36 "79652650530"| 30 ..

Noticing that there are a lot of calls to one number, I added the column “favorite number”, a little later I decided to just indicate this in the operator field, I assigned the old view a new name, and “t” now is a new view based on the old one:

 q)t2::update op:ops@gcode each tel, ctime:ceiling (time%1000)%60 from clog q)t::update op:`lub from t2 where tel like "79064014328" q)t tel time money op ctime ---------------------------------------------- "79064014328" 00:01:15.000 -0.9 lub 2 "79064014328" 00:01:36.000 -0.9 lub 2 "79064014328" 00:01:33.000 -0.9 lub 2 "79104652109" 00:01:23.000 -11.9 mts 2 "79265996349" 00:00:12.000 -5.95 megafon 1 ..

Now it's time to think about money, specifically about the tariffs of the megaphone.

Some kind of 3 kopecks, to describe the function is simple:

 q)meg3:{0.03*sum x}

Let's see what is there with the money for each operator:

 q)select meg3 time%1000 by op from t op | time -------| ------ beeline| 111.93 lub | 148.05 megafon| 29.1 moscow | 0.93 mts | 24.45

It is necessary to enter tariff options, if the number is `lub, then divide the price by two and add 30p.

 q)lub:{$[x=`lub;30+y%2;y]} / [op;time]

Actually everything, the function for counting will be the following, here for lub it uses currying:

 q){lub[x] meg3[y]}

Unfortunately, I did not find how to pass the key and the value of the “by” result to a function, so I am making this up as a subquery. Since op and time are not two values of some line from a table as in a normal db, entire lists will be passed to the function (in this case, the list and the list of lists), but the function described by me above expects only two parameters: operator and a list of times, so it’s traversed to use eachboth function, which is denoted as' (quotation mark) in essence, this is zipWith , but without limiting the number of lists. The request, unlike the usual db, at the same time is complicated only by ':

 q)select money:{lub[x] meg3[y]}'[op;time] from select time%1000 by op from t money ------ 259.98 29.1 0.93 24.45

I summarize, here you can write both exec sum and sum exec - the result of exec is summed up or exec will sum up the result - it does not matter:

 q)exec sum {lub[x] meg3[y]}'[op;time] from select time%1000 by op from t 314.46

It is clear how much I would spend by going to this tariff. Now we’ll calculate another, where the minute is rounded up, and farther by seconds. It is necessary to count for each specified time, which I do using each:

 q)mego:sum {1.20+$[x<=60;0;1.20*(x-60)%60]} each q)exec sum {lub[x] mego[y]}'[op;time] from select time%1000 by op from t 258.06

Full code:

 clog:select tel,time,money from ("SSSTSSSS";enlist ";") 0: `:tel.txt {"F"$ssr[string x;",";"."]} each clog.money update string tel, {"F"$ssr[string x;",";"."]} each money from `clog clog:select from clog where money<0,not tel like "" gcode:1_ 4# codes:exec distinct gcode each tel from clog ops:codes ! `beeline`mts`megafon`beeline`mts`beeline`beeline`mts`moscow t2::update op:ops@gcode each tel, ctime:ceiling (time%1000)%60 from clog t::update op:`lub from t2 where tel like "79064014328" meg3:{0.03*sum x} mego:sum {1.20+$[x<=60;0;1.20*(x-60)%60]} each lub:{$[x=`lub;30+y%2;y]} exec sum {lub[x] meg3[y]}'[op;time] from select time%1000 by op from t exec sum {lub[x] mego[y]}'[op;time] from select time%1000 by op from t

Making this text was much more difficult than writing these 14 lines. It is clear that there are no very heavy things for any other base, but it was the ease of use and the obviousness of writing some constructions that made me write it. In the beginning it was a little difficult to switch from normal sql, but after understanding that the table here stores data in lists, and the functions, as a rule, work with almost any built-in data types, it became much clearer. It is the idiomatic simplicity and ease of implementation of this db, and in fact it is a mixture of scheme and APL, which makes it possible to use this tool effectively. Impressions of this are APL and functional functionality shifted towards sql and databases.

The entire database consists of a single q.exe file, ~ 400kb in size. Skeptics will smile after this, but then look at the list of customers for this product http://kx.com/end-user-customers.php .
Play with this can be downloaded here http://kx.com/software-download.php

Source: https://habr.com/ru/post/198912/

All Articles

Programming language and database Q: syntax does not matter in the enterprise

More articles: