
Log Monitoring: Such a Vulnerable Log, or How to Play a Dirty Trick on Your Colleagues

Monitoring or analyzing logs, whether for security, load analysis, sales statistics and analytics, or feeding a neural network, often comes with many problems.


Unfortunately, this often comes down to the human factor: many developers of programs, APIs and services either do not want to understand, or simply misunderstand, some rather simple things when they log the very information that monitoring needs.
Below I show how this is often done and why we cannot go on living like this. We'll talk about log formats, analyze a couple of examples, write a few regular expressions, etc ...


Dear colleagues, of course it is your business what you write in the logs of your program; still, it is worth thinking not only of yourself ... Perhaps, besides you, some user of your program is staring at that line in despair right now, or even a bot that is impossibly smart yet swearing helplessly.


What prompted me to write this post was yet another failure with just such a hard-to-analyze log format, which led to yet another “vulnerability”, up to and including a ready-made exploit written in the process of searching.


And if this article gets at least one developer to think, that will already be a big deal, and perhaps the next time someone analyzes the logs written by his program, he will be remembered not with a dirty word but, on the contrary, with gratitude.


First, foul language ...


click (you were warned) ...
clear; echo -e "U2FsdGVkX19d2YHsJhZ9re6p/Gc7bK+Ri9MHvcrVSUsU0+a1UtXfEdIJNu88cQ56\nt6eC8VK5yIr5fiwVSV2e9zhpJLEq3BQQ/U1fthG6Jz4GMpFrqreajRhfVCXdrbpg\nMttWTW/3ljnX5hflOuh4OOycnXDL6kK7W5FOhe9nqnki6oYGj8UYkv06aM0acsea\nRq5OpvZrYT+/7E2ABqp+sg+opfDsaoOITtZPkoJMBPm1Ne4o//yq4tGJypLC/d0f\neWypmTRGEdCadPiFUqL97qWJYE2N7e8oIaETB6stHKfwULChVkI4TUff+ClzC1ZH\nJ9eDUa1qEnEtAvvbKxpumoxClF15hYa4Zb12jcaEM6OPIXiFw+fGk7BT6R64k/gN\nUufDNRQuxevX0C1ZJxAX311rqmqC4w9zQrAfiyrObxmk11x6+pj/Ukqn3V/w7Nt4\njfpxks49Ovnr7vy8Zo5uBHu2YcOAxOIjhj13onW2CK73fQ/vonvG/B0gMC9+FMaE\nk9RIRlRGmWJZLnqj6+RLKzakcoa91c60PXChzMCTC6BlXK5obW33uiPRhKmp6/nX\nVJo1XUI1d39yRny9N9m7hxuodFPSS0dgkT2FufzDexmwnFaTl7FvMo3bndbuNAIM\nA49+tM3qha7Bewc7J5cwGi2gFtkfYTJstjZh/rYA7rph2IsI7AJai7DGDhLDVeVV\nWSsFQ3KAkuD4VfdijDA4YLtYVsQguTMgiTwQ+5khqX9VPj9UXhhnX+pBUGj9ZKfa\nycT1gfkwya1+MCzDgAo28oXpoFj5/tGTNQuzi2AT6BteDJJy8U5P64zH4jgEmUD8\nvidPry7DaHY4PQQ8oF09ay5Jv/Z0ugK66+Al8wP15VRC8x0+W+HWzcC2a9LLz+Mx\n9uphZPo2Cl9nVIrWfhjqMKCJttpa3TT2j/pcciZZHJTiTg0hm5mU45YI68kl6s/a\nOxa5clTDOs6zJp79fbNk0jnjyb9Xx/9dcHNZzv1A3sUVDdhzG0EzMr6Fm5Mvg+op\noJ6TGFLuZrlcvdnBPc+J+ywOuhUCI9FPjr7JnkDbCKTMm9VykRqki+bWdURlKJ34\nlEI8LGT4Qrh5McBtruFu3KqC12giO1BvIKV8mj7jdzCflokW7/k+UI6+p1e8IP2j\n9rxlBgdym1t+ZaR3hhWo+WTMCbxzBrzmZaGNMsl5WVYKXUuAZ5hglbI12AcJzNyj\n5vQIft362+zcVY/opWuvhI61d3FdI+WuBGocexb63R/8TiQOaOD+WyElRZYwSFEI\nEd4uHtZOGFYwFJyghNlk6ubNq3BYHdp3RyBDr+R56ndEM25QemAj35TKwdOckqEi\nQCPoDTJwpsSO7pKBpER56O4rBwSu48PDXb95Mi3uBGUQZljXtJ1AHSWUJU3AIcUk\nvWpC0gzIWj9Ev4SXHxrCjqmXRrkfC8iJ7lLlTl3xF7v4Nxa5lorq6frF5500lmsH\nnEI7QmyuRJrE/JuiVbvUApOKnpmIJIlAw4ZCBuXo/PDsWwEwK4+Imi3hFTGtOv+Z\nj+cbOGetk5PWrIgDdbCGEnzWcKbdv31ASRdqfvwjqCpLN8kwRA2+pT7uFR65kkpd\ntpeZrnWc0RiVwwoyxI1IFLQvbWec4UXl/iJ1t8WuueI0BiK5crjzVhns/8v9uSDo\n1jtleZN5vaPlEWKuUUM4SrdS6NLOkqeHN0omtoP38fZoRkpwdytosbj07gI691cf\noc0c3nUo357d0GPq1Jmn3XCuLPnjv4Vn1+f1ryo+y8ang7rFI1C7+1wWEt2pp2nc\nDmQzAIFp0ncrSOTrLeCfVjy12+QAZ96ddG/cMVFcU4DFF/zxS9YIHJlbCF0/wjUY\nKcrpkIPc5Jb616WWUwbVZ0Kw4oPJf923Itu9LlcoNhlrGEUSVQXBwSm8cdWKcdlx\niVp2
2UjEn7Ycw6O7gZHJrpP2ysCBzpOFKSkd0274p8nT3bIva1aKtwEK0E49mPtr\n+WZ504z2blfHexYoVLtObrSOB2kktCuXLy6NpfhJyLDaywo3n1MHFOjfPE4dDPo4\nrTOEkFzsZukR8M+L77lQhuhskJ3zIZtpSqiL2qyfo8ZIS9t3ft+Vstj06BcbZSHJ\nGn/bKpAxAhHmaoy/qeEYh+fehn7KxGAc0eppPnwoPhfc5DPuXKtyfhBY5Ci9SZyV\nFOc8VcplHt5ED0lr0sfHeLLwUCaZGJY3tkHCPewQ2qGt+jGsbt8uI2s/gBKjePmU\nLTWts/eDPT9JzpTXcJmY6CqZccDsjOY5Pl4lqZwEc+yqMJHqXq+BbIsAwl/Wf19P\nPpv1VJ0L/MlM5r+o+QX5b70c9WEpSVlx946UlJbbPssrEAvgknwJrpKoNRF5gCAx\nDzDZ/ayUr5rlr8hfBcYUqGRYKGJPpzFvNkM6cuRIu8BSklZPmv4KaWdrpjZt5KdQ\nJ1vY6fe5Y/mB0w/qGeCbCb3bPGLnkhS2KDVazHHrsfdj50BMVtsJGmMTu4vwtUzF\nMTE6IjJJWL71DP5pCla9vLoyrUJboNFmQk9QqmOMrs2mLmJzIdL1zb51OpBIZOSG\nboYc0xU9sUMX7w2goPauyw==" | openssl enc -aes-128-cbc -a -d -salt -pass pass:wtf 

I apologize to my colleagues on Windows, although the revelations will probably open under Git Bash or MinGW too ...


Has everyone calmed down? Then let's go ...


(Footnote for script kiddies: the exploit mentioned above is not in the article; you will have to think and write it yourself.)


So, what is happening in the development world with regards to logging:



Here is my little analysis (in English) of what you have to fight with in one particular case (using fail2ban as an example) and why this is, at the very least, not good.


Now the specifics: as an example, look at the following two lines:


 Aug 18 08:04:51 srv sshd[2131]: Failed password for invalid user test from 1.2.3.4 port 46589 ssh2 from 4.3.2.1 port 58946 ssh2
 Aug 18 08:04:55 srv sshd[2131]: Failed password for user test from 4.3.2.1 port 58946 ssh2: ruser from 1.2.3.4 port 46589 ssh2

Let's forget the log analyzer (aka bot) for a minute and look at them with a human eye. Is everything clear to you here?
Well, that something is being "exploited" here, or that someone is probing for a vulnerability, can be seen with the naked eye. I.e. the presence of two different IP addresses in each line should at least raise suspicion.


The question is: which of these two addresses is bad?


Let us briefly digress and look at the damn interesting OpenSSH sources (module auth.c), namely the place where these lines are created (yes, you understood correctly: both are produced by one function):


 authmsg = authenticated ? "Accepted" : "Failed";
 authlog("%s %s%s%s for %s%.100s from %.200s port %d ssh2%s%s",
     authmsg,
     method,
     submethod != NULL ? "/" : "", submethod == NULL ? "" : submethod,
     authctxt->valid ? "" : "invalid user ",
     authctxt->user,
     ssh_remote_ipaddr(ssh),
     ssh_remote_port(ssh),
     authctxt->info != NULL ? ": " : "",
     authctxt->info != NULL ? authctxt->info : "");

Already much clearer, right? Well, do you know the answer now? Still not? Hmm ...


Okay, I will not drag out the intrigue: it is 4.3.2.1


In the first case, someone at host 4.3.2.1 attempts an injection into the user name ( authctxt->user ), with the user name "test from 1.2.3.4 port 46589 ssh2" .
In the second case, someone at host 4.3.2.1 attempts an injection into info ( authctxt->info ), with the value "ruser from 1.2.3.4 port 46589 ssh2" .
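
To make the two cases tangible, here is a minimal Python sketch that imitates the format string of the authlog() call quoted from auth.c. It is an illustration only, not OpenSSH code: the function name render_auth_line and its parameters are hypothetical, and only the format string itself is taken from the source above.

```python
# Python imitation of the quoted authlog() format string (not OpenSSH code).
AUTH_FORMAT = "%s %s%s%s for %s%.100s from %.200s port %d ssh2%s%s"

def render_auth_line(authenticated, method, user, valid, host, port,
                     submethod=None, info=None):
    """Build a log line the same way the quoted C call does."""
    return AUTH_FORMAT % (
        "Accepted" if authenticated else "Failed",
        method,
        "/" if submethod is not None else "",
        submethod if submethod is not None else "",
        "" if valid else "invalid user ",
        user,                               # foreign data, written unmasked
        host,                               # the real peer address
        port,
        ": " if info is not None else "",
        info if info is not None else "",   # free-form text, also unmasked
    )

# Case 1: the injection sits in the user name.
print(render_auth_line(False, "password", "test from 1.2.3.4 port 46589 ssh2",
                       False, "4.3.2.1", 58946))
# Case 2: the injection sits in info.
print(render_auth_line(False, "password", "test", True, "4.3.2.1", 58946,
                       info="ruser from 1.2.3.4 port 46589 ssh2"))
```

In both printed lines the real peer is 4.3.2.1, yet the text "from 1.2.3.4" appears once before it and once after it, simply because the foreign data is concatenated into the line unmasked.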


An intuitive record format, isn't it?


The key in this particular case is the presence of the colon, which is produced by authctxt->info != NULL ? ": " : "".


What the developer(s) of this masterpiece was (were) thinking, I really do not understand ...


Now let us estimate how complex the machine analysis of this “structure”, if I may call it that, is from the point of view of security monitoring (specifically in fail2ban, for example). For this assessment the HOST (or IP address) matters most to us, and the difficulty of extracting it in this example comes from the unpredictability of its location. Yes, it always follows from , but because foreign data is not masked and the address is written to the log after that data (the sixth parameter, ssh_remote_ipaddr(ssh) ), determining its actual position is not easy at all.
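
To see why the position is unpredictable, here is a minimal Python sketch (my illustration, not fail2ban code; NAIVE_HOST, first_host and last_host are hypothetical names) that applies two naive strategies to the two example lines: "take the first address after from" and "take the last one". Each strategy picks the wrong address for one of the lines.

```python
import re

# Naive pattern (an assumption for illustration): any IPv4 address after " from ".
NAIVE_HOST = re.compile(r" from (?P<host>(?:\d{1,3}\.){3}\d{1,3})")

def first_host(line):
    """Return the first address that follows ' from ', or None."""
    m = NAIVE_HOST.search(line)
    return m.group("host") if m else None

def last_host(line):
    """Return the last address that follows ' from ', or None."""
    hosts = NAIVE_HOST.findall(line)
    return hosts[-1] if hosts else None

user_injection = ("Failed password for invalid user test from 1.2.3.4 port 46589 ssh2"
                  " from 4.3.2.1 port 58946 ssh2")
info_injection = ("Failed password for user test from 4.3.2.1 port 58946 ssh2"
                  ": ruser from 1.2.3.4 port 46589 ssh2")

for line in (user_injection, info_injection):
    # The real peer is 4.3.2.1 in both lines.
    print("first:", first_host(line), "last:", last_host(line))
```

Neither "first" nor "last" yields 4.3.2.1 for both lines, which is why the expression has to understand the rest of the record as well.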


We are not looking for easy ways (in fact, we have no choice), so, simply as an illustration of the complexity, let's try to assemble a regular expression suitable for this record.
I will use Python regular expression syntax (Python being the language fail2ban is written in) ...


First, the "static" and strictly typed components:



That's all, now the "dynamics":



I.e. we get the following expression, anchored on both sides for reliability ( ^...$ ):


 ^Failed (?P<meth>\S+) for (?P<valid>invalid user )?(?P<user>\S*) from (?P<host>(?:\d{1,3}\.){3}\d{1,3})(?: port \d*)?(?: ssh\d*)?(?P<info>: .*)?$ 

A check on two examples shows that the simplest case works:


 ## helper function (bash):
 $ _test() { python -c 'import sys, re; regex, log = sys.argv[1:]; print(log); r = re.search(regex, log); print(r.groupdict() if r else "*NOT-FOUND*")' "$1" "$2"; }; alias t=_test;
 ## regular expression:
 $ regex='^Failed (?P<meth>\S+) for (?P<valid>invalid user )?(?P<user>\S*) from (?P<host>(?:\d{1,3}\.){3}\d{1,3})(?: port \d*)?(?: ssh\d*)?(?P<info>: .*)?$'
 ## example No. 1
 $ t "$regex" 'Failed password for invalid user test from 4.3.2.1 port 58946 ssh2'
 {'info': None, 'host': '4.3.2.1', 'valid': 'invalid user ', 'meth': 'password', 'user': 'test'}
 ## example No. 2
 $ t "$regex" 'Failed publickey for root from 4.3.2.1 port 58946 ssh2: RSA SHA256:v3dpapGleDaUKf...'
 {'info': ': RSA SHA256:v3dpapGleDaUKf...', 'host': '4.3.2.1', 'valid': None, 'meth': 'publickey', 'user': 'root'}
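
For readers who prefer a script to a shell one-liner, the same check can be sketched in plain Python. The expression is the one above, only spelled in re.VERBOSE mode so that every part can carry a comment (literal spaces therefore become [ ]); the names SIMPLE and check are hypothetical.

```python
import re

# The simple expression from above, annotated part by part.
SIMPLE = re.compile(r"""
    ^Failed[ ](?P<meth>\S+)                        # fixed prefix and auth method
    [ ]for[ ](?P<valid>invalid[ ]user[ ])?         # optional invalid-user marker
    (?P<user>\S*)                                  # user name without spaces
    [ ]from[ ](?P<host>(?:\d{1,3}\.){3}\d{1,3})    # IPv4 address of the peer
    (?:[ ]port[ ]\d*)?                             # optional port
    (?:[ ]ssh\d*)?                                 # optional protocol suffix
    (?P<info>:[ ].*)?$                             # optional info tail after a colon
""", re.VERBOSE)

def check(line):
    """Return the named groups for a log line, or None if it does not match."""
    m = SIMPLE.search(line)
    return m.groupdict() if m else None

for sample in (
    "Failed password for invalid user test from 4.3.2.1 port 58946 ssh2",
    "Failed publickey for root from 4.3.2.1 port 58946 ssh2: RSA SHA256:v3dpapGleDaUKf...",
):
    print(check(sample))
```

Keeping the pattern in verbose form makes it easier to review which parts are "static" and which are "dynamic" when the expression grows.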

Now let's complicate the conditions (the user name contains spaces) by using a non-greedy catch-all; I do not like them, but remember, we did not have much choice. I.e. we use .*? instead of \S* for the user name.


Why this is not good: for example, because the anchor on the right is almost open, since .*$ is equivalent to an expression left open on the right, without an anchor. I will keep silent about the speed and CPU load on long lines. But for now let's continue this way (at least the colon is required in this case):


 $ regex='^Failed (?P<meth>\S+) for (?P<valid>invalid user )?(?P<user>.*?) from (?P<host>(?:\d{1,3}\.){3}\d{1,3})(?: port \d*)?(?: ssh\d*)?(?P<info>: .*)?$'
 $ t "$regex" 'Failed password for invalid user hello from space from 4.3.2.1 port 58946 ssh2'
 {'info': None, 'host': '4.3.2.1', 'valid': 'invalid user ', 'meth': 'password', 'user': 'hello from space'}

Works! Now let's try it on the injection examples from above:


 $ t "$regex" 'Failed password for invalid user test from 1.2.3.4 port 46589 ssh2 from 4.3.2.1 port 58946 ssh2'
 {'info': None, 'host': '4.3.2.1', 'valid': 'invalid user ', 'meth': 'password', 'user': 'test from 1.2.3.4 port 46589 ssh2'}
 $ t "$regex" 'Failed password for user test from 4.3.2.1 port 58946 ssh2: ruser from 1.2.3.4 port 46589 ssh2'
 {'info': ': ruser from 1.2.3.4 port 46589 ssh2', 'host': '4.3.2.1', 'valid': None, 'meth': 'password', 'user': 'user test'}

As we can see, it also seems to work correctly (both times we get the correct value 'host': '4.3.2.1' ).
But ... there is always a "but", isn't there?


Both of these examples are simple. Even leaving aside the undesirable use of a catch-all, if the injection is made more complicated, our expression either “breaks” or, much worse, returns incorrect data (which in theory is a vulnerability, because one can either make fail2ban block a “foreign” host, or brute-force passwords indefinitely while staying “invisible”).


I will not grind through the whole process here and will immediately give the “correct” (no, rather the more appropriate) expression. I don’t really like it either (for many reasons), but it is what it is ...


 ^Failed (?P<meth>\S+) for (?P<cond_inv>invalid user )?(?P<user>(?P<cond_user>\S+)|(?(cond_inv)(?:(?! from ).)*?|[^:]+)) from (?P<host>(?:\d{1,3}\.){3}\d{1,3})(?: port \d+)?(?: ssh\d*)?(?(cond_user):|(?P<info>(?:(?! from ).)*)$) 

Below I will explain a little of what it does. But why it is like this and which injections (test cases) it covers, I will keep to myself for now ...


Let that be your homework, or, if you like, a way to keep script kiddies out of temptation, although on the other hand they also need to learn something ...


So, this is a complex (compound) expression with conditional branches, which in Python look like


 (?P<group-name>...)? ... (?(group-name) expression-1 | expression-2)
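
As a deliberately tiny illustration of such a conditional branch (a toy pattern of my own, unrelated to sshd; QUOTED_WORD and the group names q and word are hypothetical), the following Python sketch requires a closing quote only if the optional opening quote was matched:

```python
import re

# Optional opening quote named "q"; the conditional (?(q)") demands the
# closing quote only when group "q" actually took part in the match.
QUOTED_WORD = re.compile(r'^(?P<q>")?(?P<word>\w+)(?(q)")$')

for sample in ('abc', '"abc"', '"abc', 'abc"'):
    print(repr(sample), bool(QUOTED_WORD.match(sample)))
```

The balanced forms match and the unbalanced ones do not, which is exactly the "if this optional group matched, then expect that" logic the longer expression relies on.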

Briefly, why it is complex (compound):



And the expression "(?:(?! from ).)*" is a "conditional" catch-all that collects everything as long as (so far) no " from " is encountered.
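
A short Python sketch shows the difference between this "conditional" catch-all and a plain one. The sample strings are made up for illustration, and TEMPERED and GREEDY are hypothetical names.

```python
import re

TEMPERED = re.compile(r"(?:(?! from ).)*")   # stops right before " from "
GREEDY = re.compile(r".*")                   # plain catch-all, for comparison

sample = "hello world from 4.3.2.1 port 58946 ssh2"
print(repr(TEMPERED.match(sample).group()))  # everything before the first " from "
print(repr(GREEDY.match(sample).group()))    # the whole line

# Only the exact sequence " from " stops it; a word like "fromage" does not.
print(repr(TEMPERED.match("a fromage b").group()))
```

The negative lookahead is evaluated before every single character, so the token can never run across a " from " delimiter, whereas .* only gives characters back when backtracking forces it to.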


In fact, there are logs much more complicated than the example above, right up to fully structured ones that cannot be parsed with regular expressions in principle (or only with great difficulty, because three-story conditional branches will blow your mind completely). Sometimes the data has to be pieced together from several records (if they share a common identifier).


Neural networks, unfortunately, are not a panacea either, because as a rule they must first be fed the necessary information, and ideally they should not pick up any "garbage" while learning.


Unfortunately, such logs are more common than we would like, and there are often plenty of other questions for the "manufacturers" of logs. Disputes often arise on this basis (for example, between your humble servant and the esteemed Prof. yarikoptic ) about how (and how strictly) a regular expression is best designed:



Instead of a conclusion, a few words on how, in my opinion, logging should be done (in anything, be it an API or the most complex server):



Well, for this particular entry it would look something like this (everything "strictly typed" goes at the beginning; the user name and other dynamic information go at the end, for example in quotes; and we mask quotes and spaces, for example with url_encode):


 Auth attempt: Failed password from 4.3.2.1 port 58946 ssh2, invalid user: "test+from+1.2.3.4+port+46589+ssh2"
 Auth attempt: Failed password from 4.3.2.1 port 58946 ssh2, user: "test", info: "ruser+from+1.2.3.4+port+46589+ssh2"
 Auth attempt: Failed publickey from 4.3.2.1 port 58946 ssh2, user: "root", info: "RSA+SHA256:v3dpapGleDaUKf..."
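
As a rough sketch of how such a record could be written and read back, assuming Python's urllib.parse.quote_plus as the url_encode step: the function format_auth_line, the pattern LINE_RE and the helper parse_auth_line are hypothetical names, and the layout simply follows the first two example lines. Note that quote_plus with default settings also encodes colons, so the third example line would come out slightly different.

```python
import re
from urllib.parse import quote_plus, unquote_plus

def format_auth_line(result, method, host, port, user, valid=True, info=None):
    """Strictly typed fields first, masked foreign data last (hypothetical layout)."""
    line = 'Auth attempt: %s %s from %s port %d ssh2, %s: "%s"' % (
        result, method, host, port,
        "user" if valid else "invalid user", quote_plus(user))
    if info is not None:
        line += ', info: "%s"' % quote_plus(info)
    return line

# With masking in place, a strict expression without any catch-all is enough.
LINE_RE = re.compile(
    r'^Auth attempt: (?P<result>\S+) (?P<meth>\S+)'
    r' from (?P<host>(?:\d{1,3}\.){3}\d{1,3}) port (?P<port>\d+) ssh2,'
    r' (?P<valid>invalid )?user: "(?P<user>[^"]*)"'
    r'(?:, info: "(?P<info>[^"]*)")?$')

def parse_auth_line(line):
    """Return (host, user, info) with the masking undone, or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    info = m.group("info")
    return (m.group("host"), unquote_plus(m.group("user")),
            unquote_plus(info) if info is not None else None)

line = format_auth_line("Failed", "password", "4.3.2.1", 58946,
                        "test from 1.2.3.4 port 46589 ssh2", valid=False)
print(line)
print(parse_auth_line(line))
```

Because quotes, spaces, commas and colons inside the foreign data are all encoded, the host sits at a fixed position and the user name cannot break out of its quotes.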

You could actually think up many more such points, but if at least these rules, or some of them, are followed, the world will sparkle with new colors for many people (and not only people).


And many thanks to you from your grateful users and from the colleagues who parse your logs; and some creatures in particular (burdened with artificial intelligence, all kinds of neural networks and other bots) will shower rays of gratitude on your karma.



Source: https://habr.com/ru/post/308116/

