
How I measured the evolution of admins into programmers

Recently my friend Karl (name changed) interviewed for a DevOps position and asked me to review his solution to the test task. I read the problem statement and decided it would make a good benchmark, so I expanded the task a bit, wrote my own implementation, and asked my colleague Alex to take a crack at the task as well. When all three versions were ready, I wrote two more comparative versions in C# and sat down to write this article. The task is quite simple, and the applicants are at different stages of the evolution from admin to programmer, which is exactly what I wanted to evaluate.

If you are interested in the dirty details, biased tests and subjective assessments - welcome under the cut.

Task


According to the problem statement, we have text logs of CPU load on servers, and we need to make certain selections from them.

Full task text
Imagine a monitoring system in which 1000+ servers, each with several CPUs, write a load log every minute to a separate file on a dedicated server.
As a result, for 1000 servers with 2 CPUs each, one day produces a directory with 1000 text logs of 2880 entries in the following format:

1414689783 192.168.1.10 0 87
1414689783 192.168.1.11 1 93

The fields in the file mean the following:
timestamp IP cpu_id usage

It is necessary to write a CLI program that takes the name of the log directory as a parameter and allows viewing the load of a specific processor over a time interval.
The program may take arbitrarily long to initialize, but each query must execute in under a second.

The following query commands must be supported:

1. QUERY command - summary server statistics for a time range
Syntax: QUERY IP cpu_id time_start time_end

* Time is specified as YYYY-MM-DD HH:MM

Example:

> QUERY 192.168.1.10 1 2014-10-31 00:00 2014-10-31 00:05
(2014-10-31 00:00, 90%), (2014-10-31 00:01, 89%), (2014-10-31 00:02, 87%), (2014-10-31 00:03, 94%), (2014-10-31 00:04, 88%)

2. LOAD command - average load of the selected processor for the selected server
Syntax: LOAD IP cpu_id time_start time_end

Example:

> LOAD 192.168.1.10 1 2014-10-31 00:00 2014-10-31 00:05
88%

3. STAT command - statistics of all processors for the selected server
Syntax: STAT IP time_start time_end

Example:

> STAT 192.168.1.10 2014-10-31 00:00 2014-10-31 00:05
0: 23%
1: 88%

Any programming languages and third-party utilities may be used.

P.S. The original assignment assumed an interactive program that receives commands from the console after loading. This is not required, and the program may be split into separate parts for loading and for executing queries, i.e. a variant with multiple scripts (init.sh, query.sh, load.sh, etc.) is allowed.

In general, the task is quite transparent and practically suggests using a database, so it is not surprising that all three solutions use SQLite. The auxiliary C# versions I made for speed comparison work differently.
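For context, here is a minimal sketch of what such a SQLite-based solution looks like (table and index names are my own illustration, not taken from any of the participants' code):

import sqlite3

conn = sqlite3.connect('cpu_load.db')  # a file-based DB; ':memory:' is the risky alternative
conn.execute('CREATE TABLE IF NOT EXISTS load_log '
             '(ts INTEGER, ip TEXT, cpu_id INTEGER, usage INTEGER)')
# equality columns first, the range column (ts) last - matches the query pattern
conn.execute('CREATE INDEX IF NOT EXISTS idx_ip_cpu_ts ON load_log (ip, cpu_id, ts)')

def load_file(path):
    # each line looks like: "1414689783 192.168.1.10 0 87"
    with open(path) as f:
        rows = (line.split() for line in f if line.strip())
        conn.executemany('INSERT INTO load_log VALUES (?, ?, ?, ?)',
                         ((int(ts), ip, int(cpu), int(usage))
                          for ts, ip, cpu, usage in rows))
    conn.commit()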

Evaluation


For the finished solutions I evaluated two factors with a 40/60% weighting: speed and code quality. The method of assessing each factor is given below, but neither factor really captures the degree of "admin-ness" or "programmer-ness", so apart from the dry points for speed and quality I introduced a separate subjective scale: "admin solution", "programmer solution", "universal solution". This is in no way a competition or a comparison of the speed of different languages, but rather an assessment of programming approaches.

Speed rating

By the problem statement, a query must execute in under a second, but neither the hardware, nor the number of cores, nor the architecture of the test machine is specified. In my opinion, this implies that the test should run one or two orders of magnitude faster, so as to always stay within the required limit on any reasonable hardware. It should also prompt thoughts about scalability: the task gives an example of 2,880,000 records per day, but in real conditions there may be far more (more servers and cores), and the sampling range may span not days but months or years. This means the ideal solution should not depend on the amount of data and should not consume limited resources in unlimited quantities. Here, uncontrolled use of memory (in-memory tables or storage in arrays in memory) is a minus, not a plus, because a year of data for 10,000 machines with 8 processors is 42,048,000,000 records of at least 10 bytes each, i.e. ~420 GB of data. Unfortunately, I could not test such volumes due to the limitations of the available hardware.
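The arithmetic behind that estimate, for anyone who wants to check it:

# one record per CPU per minute, for a year
records = 10000 * 8 * 60 * 24 * 365   # 42,048,000,000 records
size = records * 10                   # at 10 bytes per record
print(size / 10.0**9)                 # ~420 GB (about 392 GiB)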

To check speed, the Unix time command (the user value) was used; for interactive solutions, internal timers in the programs were used.

Quality rating

By quality I essentially mean universality: universality of use, maintenance and further development. There is little sense in code that works only within strictly defined bounds and cannot go beyond them, process more data, or be adapted to other situations. For example, x86 assembler code might be very fast, but it is not flexible at all, and a simple change like switching to IPv6 addresses could be very painful for it. First of all, handling of incoming parameters was assessed here: non-standard situations, selections that return 0 results, invalid queries. Secondly: the programming language, code style, and the quantity and quality of comments.

Subjective assessment

It is hard to say exactly which parameter determines how far an admin has evolved toward a programmer. Personally, I separate them by this principle: an admin works with tools, while a programmer creates them. The difference is roughly like that between a professional racer and a car mechanic: the mechanic often drives quite well and knows the car inside out, but the racer feels every property built into the car and then some. A good admin knows database performance, understands horizontal and vertical scaling, and uses indexes every day. A programmer can write their own database, use nested trees for it, or convert all the data into a custom format laid out on disk in a clever way for fast access.

If Karl had written a direct data search, citing its superior performance, especially with hashing or fast lookup, it would have meant that he had already become a programmer. But he would most likely have had no chance of getting the DevOps job.

Programs


In total I had 5 programs: three participate in the comparison, and two were written later in C#, only to test some ideas. For convenience, I will call the programs by the names of their authors.

Karl
Github code
Language: Python 2.7
Dependencies: none
Interactive: yes
DB: SQLite, in-memory table

Alex
Github code
Language: Python 2.7
Dependencies: progress, readline
Interactive: yes
DB: SQLite, in-memory table

Nomad1
Github code
Language: Bash
Dependencies: none
Interactive: no
DB: SQLite
Feature: external .db file for work

Nomad2
Github code
Language: C #
Dependencies: mono
Interactive: yes
DB: none
Feature: hash table keyed by IP address

Nomad3
Github code
Language: C #
Dependencies: mono
Interactive: no
DB: none
Feature: specially prepared data

Testing


For testing, a log generator was written, first in bash, then in C++. Three test suites were created: data_norm, data_wide and data_huge, each considerably larger than the last.

Requests were formed on the principle of three kinds per suite: valid (lying within the data), wide (spanning far beyond the data range) and invalid (matching no data at all).


All tests were run 4 times; the first value was discarded and the rest were averaged (to exclude JIT compilation time, cache warm-up and loading from swap). Testing was done on a work computer running Mac OS 10.13.2 with a 2.2 GHz i7, 8 GB RAM and an SSD.
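For the non-interactive programs, the measurement loop looked roughly like this (my reconstruction for illustration; the actual runs used the Unix time command, and this sketch measures wall time instead of user time):

import os, subprocess, time

def bench(cmd, runs=4):
    devnull = open(os.devnull, 'w')
    times = []
    for _ in range(runs):
        start = time.time()
        subprocess.call(cmd, shell=True, stdout=devnull)
        times.append(time.time() - start)
    devnull.close()
    return sum(times[1:]) / (runs - 1)  # discard the first (cold) run, average the rest

# hypothetical usage with one of the CLI-script solutions:
print(bench('./query.sh "10.0.2.23 1 2014-10-31 09:00 2014-10-31 12:00"'))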

Thoughts aloud on how the programs work
DISCLAIMER: "warming up the cache" pulls the data into cache and memory, so we measure not the real response time an operator of the program would see, but the spherical-in-a-vacuum time of returning data from cache and processing it in sqlite/python/C#. It is not scientific, not professional, and useless for anything other than this article. Do not do this in real life!

Unfortunately, the QUERY tests are not very indicative for half of the programs, because printing the output often takes several times longer than the query itself. In the Nomad1 program the output can take hundreds of milliseconds due to very slow string formatting in Bash, while the query itself executes in milliseconds. The Karl program contains an outright measurement error: it reports the time of the internal SQL query for QUERY without the screen output. In my understanding, "command execution time" is the time between entering a command and obtaining the result, so that program received the penalty described below.

It is noteworthy that Karl and Alex, independently of each other, both wrote programs in Python 2.7 using SQLite in interactive mode (first the data is loaded, then commands are accepted). Nomad1 is written in pure bash as a set of CLI scripts and also uses SQLite.

Nomad2 and Nomad3 are interesting for their general approach: in Nomad2, all data is loaded into memory into a hash table keyed by IP. In Nomad3 it is conventionally assumed that the file name is the IP address, and on lookup the program simply reads the file into memory and scans it. Both exist only for speed comparison and do not take part in the quality assessment. Among other things, they are written in C#, which on Unix means mono, with all its quirks. For example, mono32 and mono64 results differ several-fold for the same code, while on Windows under .NET everything runs faster still.
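The idea behind Nomad2, expressed as a sketch in Python rather than C# (the structure is mine, for illustration only): records are grouped in a dict keyed by (ip, cpu_id), with per-key lists sorted by timestamp, so a range query becomes two binary searches.

import bisect
from collections import defaultdict

# (ip, cpu_id) -> list of (ts, usage), sorted by ts after loading
data = defaultdict(list)

def add_record(ts, ip, cpu_id, usage):
    data[(ip, cpu_id)].append((ts, usage))

def finalize():
    for rows in data.values():
        rows.sort()  # sort once; every query afterwards is two binary searches

def query(ip, cpu_id, ts_start, ts_end):
    rows = data.get((ip, cpu_id), [])
    lo = bisect.bisect_left(rows, (ts_start,))     # first entry with ts >= ts_start
    hi = bisect.bisect_right(rows, (ts_end, 101))  # past last ts <= ts_end (usage <= 100)
    return rows[lo:hi]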

Speed results


I will hide the query commands themselves under a spoiler so as not to clutter the post. In the table, each test occupies three rows: the times of the QUERY, LOAD and STAT commands in seconds.

Requests
data_norm / valid:
QUERY 10.0.2.23 1 2014-10-31 09:00 2014-10-31 12:00
LOAD 10.0.2.254 0 2014-10-31 13:10 2014-10-31 20:38
STAT 10.0.1.1 2014-10-31 04:21 2014-10-31 08:51

data_norm / wide:
QUERY 10.0.1.11 0 2014-10-01 09:00 2014-10-31 07:21
LOAD 10.0.2.254 1 2014-10-31 15:55 2014-11-04 10:00
STAT 10.0.1.100 2014-10-31 14:21 2015-01-01 01:01

data_norm / invalid
QUERY 10.0.2.23 1 2015-10-31 09:00 2015-10-31 12:00
LOAD 10.0.2.254 0 2015-10-31 13:10 2015-10-31 20:38
STAT 10.0.1.1 2015-10-31 04:21 2015-10-31 08:51

data_wide / valid:
QUERY 10.0.2.33 0 2014-10-30 09:00 2014-10-31 02:00
LOAD 10.0.0.125 1 2014-10-02 14:04 2014-10-04 20:38
STAT 10.0.1.10 2014-10-07 00:00 2014-10-17 23:59

data_wide / wide:
QUERY 10.0.1.11 1 2014-07-30 09:00 2014-10-01 07:21
LOAD 10.0.0.137 0 2014-10-20 04:12 2015-02-01 00:00
STAT 10.0.3.3 2014-10-20 04:12 2015-02-01 00:00

data_wide / invalid
QUERY 10.0.0.123 1 2015-10-31 09:00 2015-10-31 12:00
LOAD 10.0.0.154 0 2015-10-31 13:10 2015-10-31 20:38
STAT 10.0.0.1 2015-10-31 04:21 2015-10-31 08:51

data_huge / valid:
QUERY 10.0.2.33 0 2014-10-30 09:00 2014-10-31 02:00
LOAD 10.0.0.125 1 2014-10-28 14:04 2014-10-30 20:38
STAT 10.0.1.10 2014-10-28 00:00 2014-10-30 23:59

data_huge / wide:
QUERY 10.0.5.72 0 2014-10-31 09:00 2015-11-03 12:11
LOAD 10.0.0.137 0 2014-10-20 04:12 2015-02-01 00:00
STAT 10.0.3.3 2014-10-20 04:12 2015-02-01 00:00

data_huge / invalid
QUERY 10.0.1.11 1 2014-07-30 09:00 2014-10-01 07:21
LOAD 10.0.0.154 0 2015-10-31 13:10 2015-10-31 20:38
STAT 10.0.0.1 2015-10-31 04:21 2015-10-31 08:51

In total, 135 tests were run (27 per program); the times are shown in the table:
Test                Command   Karl       Alex        Nomad1     Nomad2     Nomad3
data_norm/valid     QUERY     0.008800   0.215300    0.256200   0.002160   0.050200
                    LOAD      0.000440   0.211700    0.007300   0.000130   0.050300
                    STAT      0.000420   0.217800    0.008300   0.000140   0.052600
data_norm/wide      QUERY     0.002640   0.218000    0.716000   0.005000   0.050200
                    LOAD      0.000330   0.212000    0.008000   0.000150   0.005200
                    STAT      0.000630   0.215000    0.008600   0.000320   0.005500
data_norm/invalid   QUERY     0.000063   0.214200    0.007600   0.000008   0.048000
                    LOAD      0.000073   0.209100    0.008300   0.000026   0.053000
                    STAT      0.000065   0.206300    0.008100   0.000034   0.050000
data_wide/valid     QUERY     0.007300   6.237600    1.446000   0.017186   0.167000
                    LOAD      0.005500   6.146500    0.036000   0.001099   0.088000
                    STAT      0.002300   6.151000    0.069000   0.005665   0.126000
data_wide/wide      QUERY     0.006800   6.176600    0.570000   0.008363   0.071000
                    LOAD      0.002100   6.157900    0.039000   0.005818   0.160000
                    STAT      0.024200   6.326100    0.070000   0.005592   0.159000
data_wide/invalid   QUERY     0.000085   6.288100    0.044000   0.000013   0.155000
                    LOAD      0.000110   6.152100    0.040000   0.000040   0.156000
                    STAT      0.000150   6.130400    0.062000   0.000013   0.164000
data_huge/valid     QUERY     0.009107   155.9738    1.401000   0.036806   0.069000
                    LOAD      0.007655   146.5377    0.013300   0.003798   0.066000
                    STAT      0.012858   140.1752    0.026000   0.003751   0.072000
data_huge/wide      QUERY     0.009418   157.1896    1.078000   0.018393   0.072000
                    LOAD      0.013718   148.5435    0.011700   0.000805   0.081000
                    STAT      0.014266   147.9525    0.026000   0.003329   0.077000
data_huge/invalid   QUERY     0.000070   144.7307    0.012000   0.000012   0.054000
                    LOAD      0.000095   158.0090    0.013000   0.000031   0.071000
                    STAT      0.000081   165.6820    0.023000   0.000013   0.081000

I scored the speed results mathematically: for each query and data set I took the order of magnitude (the decimal logarithm of the time in microseconds) and used the order of the fastest solution divided by it as a coefficient. Thus the fastest solution received a coefficient of 1.0, a solution an order of magnitude slower 0.5, and so on. The coefficients for each program were averaged and multiplied by 40:

R = 40 \cdot \frac{1}{n} \sum_{i=1}^{n} \frac{\log_{10} T_{\text{best}}}{\log_{10} T_i + M_i}



For the Karl program, unfortunately, I had to introduce a penalty, since it measured not the running time of the entire QUERY command but only the internal SQL query. I added one order of magnitude (the M_i term) to all non-empty QUERY results, which reduced Karl's score by about 2 points in total.
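For clarity, the scoring as a sketch in Python (my own restatement of the formula above; times are in microseconds and penalties are the per-test M_i terms in orders of magnitude):

import math

def speed_score(times, best, penalties, max_points=40):
    # times[i]: this program's time on test i; best[i]: the fastest program's time on test i
    coeffs = [math.log10(best[i]) / (math.log10(t) + penalties[i])
              for i, t in enumerate(times)]
    return max_points * sum(coeffs) / len(coeffs)

# a program an order of magnitude slower than a 10 us best gets a 0.5 coefficient:
print(speed_score([100.0], [10.0], [0]))  # 20.0 out of 40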

The full version of the table with the results can be seen here.

Results:
Karl: 31/40 (33 without penalty)
Alex: 15/40
Nomad1: 22/40
Nomad2: 39/40
Nomad3: 21/40

Quality results


The speed tests revealed various interesting bugs and pitfalls. I hide them under a spoiler in case you want to write your own program and are sure you would never make such mistakes. The Nomad2 and Nomad3 programs are not considered or scored here.

Errors and omissions
1. Time in UTC. Both Alex and Nomad1 forgot about it, and their results are shifted by 2 hours because the test machine sits in the GMT+2 zone (see the sketch after this list).

2. Indexes. Alex forgot about indexes altogether. Karl created an index over all fields: IP + Timestamp + CPU. This is justified only in the very rare case of searching for a specific Timestamp, whereas by the problem statement we always select by exact IP + CPU and a Timestamp range. This is not critical when the base size is more or less reasonable, but for the _wide and _huge data sets it led to huge memory losses with minimal gain in speed. On the _huge data, the Karl program constantly crashed with "Killed: 9" after exhausting memory and swap.

3. Inclusion of the range boundaries in the calculations. Nomad1 forgot about this, and due to some peculiarities of timestamp conversion in bash his selection sometimes does not include the lower boundary (a corrected version is on GitHub, but it was not included in the tests).

4. Using :memory: tables with an unknown amount of data.
This is an architectural error, and both Karl and Alex made it: they used an in-memory table without asking themselves about the consequences and volumes. As a result, their programs depend heavily on the amount of data and available memory, as the data_huge test shows. In reality such programs either would not work or would work with problems. The ideal option is to estimate the amount of data to be read and choose the database type accordingly.

5. Checking incoming data and errors. Everyone did poorly here: queries to the database are not checked for valid dates, addresses, SQL injection, etc. On an invalid LOAD request Alex gets a divide-by-zero error, Karl prints No data, and Nomad1 has no exception handling at all, so the SQLite error output in a STAT query is simply run through the line-splitting by the | character. No program accepts an IP address of the form 010.00.020.003. Everyone crashed on malformed requests one way or another, but since the tests required 540+ command executions, I did not have the stamina to collect and analyze examples (a validation sketch follows after the list).

6. Rounding the results for LOAD and STAT. Karl did not round anything and printed a number with a decimal point, which is not fatal but does not meet the problem statement. Alex cast the number to INT, dropping the fractional part entirely.
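On point 1, the classic trap is parsing the query boundaries with a local-time function while the log timestamps are in UTC. A minimal illustration (not code from any of the solutions):

import time, calendar

t = time.strptime('2014-10-31 00:00', '%Y-%m-%d %H:%M')
as_local = time.mktime(t)       # interprets the tuple as local time
as_utc = calendar.timegm(t)     # interprets the tuple as UTC
print(int(as_local - as_utc))   # -7200 in a GMT+2 zone: the two-hour shift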
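And on point 5, a sketch of the missing checks (the table name and helpers are hypothetical, for illustration only):

import calendar, socket, time

def valid_ip(s):
    try:
        socket.inet_aton(s)       # rejects outright garbage
        return s.count('.') == 3  # insist on a full dotted quad
    except socket.error:
        return False

def parse_ts(date_s, time_s):
    try:
        return calendar.timegm(time.strptime(date_s + ' ' + time_s, '%Y-%m-%d %H:%M'))
    except ValueError:
        return None               # a malformed date/time instead of a crash

def safe_query(cur, ip, cpu_id, ts_start, ts_end):
    # parameterized query: placeholders rule out SQL injection
    return cur.execute('SELECT ts, usage FROM load_log '
                       'WHERE ip=? AND cpu_id=? AND ts BETWEEN ? AND ?',
                       (ip, cpu_id, ts_start, ts_end)).fetchall()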

All three programs are written in modern and readable programming languages (no VBScript or Brainfuck sighted). The bash code is a bit less readable than the Python versions, but noticeably smaller in size. Alex's code uses the third-party readline and progress libraries; he wrote his own class for Tab auto-completion, and there are separate functions for help, date handling, data reload support and error handling, but the database is not closed on exit. Karl's code uses a class inherited from Cmd, handles exceptions, closes the database on exit, and catches Ctrl-C. Unfortunately, neither has comments (with a couple of minor exceptions).
Alex uses an interesting and more programmer-like approach: he issues the same query for all three commands, and then computes the data for STAT/LOAD in code, without using AVG and GROUP BY. This significantly reduces the amount of code, and the execution speed is about the same as when this work is pushed into the database.
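Roughly, the idea (my reconstruction, not Alex's actual code): one SELECT per command, with the LOAD/STAT arithmetic done in Python rather than in SQL:

def fetch(cur, ip, ts_start, ts_end):
    return cur.execute('SELECT ts, cpu_id, usage FROM load_log '
                       'WHERE ip=? AND ts BETWEEN ? AND ?',
                       (ip, ts_start, ts_end)).fetchall()

def load(rows, cpu_id):
    vals = [u for _, c, u in rows if c == cpu_id]
    # under Python 2 this truncates to int, the same rounding issue noted above
    return sum(vals) / len(vals) if vals else None

def stat(rows):
    per_cpu = {}
    for _, c, u in rows:
        per_cpu.setdefault(c, []).append(u)
    return dict((c, sum(v) / len(v)) for c, v in per_cpu.items())  # per-CPU averages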

Taking into account the described features and a couple of additional quality factors, I scored the programs as follows:

Karl: 35/60
Alex: 40/60
Nomad1: 30/60

Findings


Total points:
Karl: 66/100
Alex: 55/100
Nomad: 52/100

In points and in speed, Karl's solution beat everyone else's, since Alex's solution is not competitive in speed due to the lack of indexes. Interestingly, as soon as I told Alex about the low speed, he said that indexes could be added at line 82 of his code; he had planned it and thought it through, but decided to leave it "for later". Unfortunately, this was already after the programs were submitted and the code was frozen, so the change was impossible.

The Nomad2 and Nomad3 programs scored 39/40 and 21/40 points respectively. Not surprisingly, working with a hash table turned out faster than a database, albeit at a large cost in memory. Working directly with the file system was not very fast, but it should be noted that this option has almost no initialization time, puts minimal load on memory and, by and large, can be used with any amount of pre-prepared data.

The Karl variant, due to its "wide" index, consumed the most memory and fell over even at a data size of 6 GB. Any variant with a :memory: table or hash tables will not work with volumes of 10 GB and up, while the solution with the database in a file is not much slower and scales far better. Unfortunately, bash output put an end to that program's speed.

Working as an interactive application gives a noticeable speed advantage: for the Nomad1 and Nomad3 programs it is clearly visible that even on empty requests, launch alone takes about 10 ms for bash and 50 ms for C#.

Subjective assessment


Now a little subjective reasoning. The particularly nervous may skip it; I remind you that everything written here is my personal opinion and most likely does not coincide with yours.

All three participants used SQLite and did not reinvent the wheel. This is a definite plus, but it also clearly shows that all three solutions are far from pure programming. They do their job, quickly enough, but without any attempt to create their own in-memory database with fast indexing (as in Nomad2) or selections without preloading (as in Nomad3). A step closer to a programmer's solution is Alex's single query with the LOAD/STAT calculations done in code. I also did not see other "programmer satellites" in the code, such as logs, comments, or custom data structures (an IPv4 address is a 32-bit number, and CPU and LOAD fit in single-byte variables!). On the whole the authors did not stop to think about data storage and, by and large, simply converted text files into SQLite's binary format.
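For illustration, the kind of "own data structure" I have in mind (a hypothetical packed record, not taken from any of the solutions): the whole log line fits in 10 bytes.

import socket, struct

# ts (4 bytes) + ip (4 bytes) + cpu_id (1 byte) + usage (1 byte) = 10 bytes
RECORD = struct.Struct('>IIBB')

def pack(ts, ip, cpu_id, usage):
    ip_num = struct.unpack('>I', socket.inet_aton(ip))[0]  # IPv4 as a 32-bit int
    return RECORD.pack(ts, ip_num, cpu_id, usage)

def unpack(buf):
    ts, ip_num, cpu_id, usage = RECORD.unpack(buf)
    return ts, socket.inet_ntoa(struct.pack('>I', ip_num)), cpu_id, usage

print(len(pack(1414689783, '192.168.1.10', 0, 87)))  # 10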

In total, in my opinion, the solutions ranked on the subjective scales as follows:

The most "admin" solution:

1. Nomad1 - both the .import command and feeding data to the sqlite console client instead of a connector/cursor
2. Karl - work with indexes, SQL queries for all operations, GROUP BY, ORDER BY
3. Alex

The most "programmer" solution (perk "he created a new tool"):

1. Alex - good structure, work with the data array when sampling, third-party libraries
2. Karl - code with exceptions, data cleansing
3. Nomad1

The most "universal" solution:

1. Nomad1 - new commands are added as separate query files against the ready database, by analogy with the existing ones; the program does not depend on the amount of data or memory.
2. Alex - a single query produces an array of data, then the code processes it; the file header contains code for working with a file-based database.
3. Karl

All program codes, including the generator, are available on GitHub.

It also includes the commands used to generate the data sets.
If you want to try yourself on a similar test - you are welcome.

P.S. According to the test results, Nomad1 - a programmer with 20 years of experience - scored fewer points on this task than a DevOps and a junior developer. On the other hand, he is also the author of the article, and it would be, ahem, improper to award himself higher points :)

P.P.S. Writing the article and taking the measurements took three working days, more than all the participants together spent on writing and debugging their code. The author's performance as a writer is definitely unsatisfactory.

Source: https://habr.com/ru/post/359284/

