On a heuristic method for detecting viral injections on sites

! This post was written by RomanL, who does not have enough karma to publish it himself.

I want to describe one way to detect polymorphic viral JavaScript code injected into the pages of websites. The note is aimed at experienced readers who do not need the basics explained and who can find additional information on their own, without links to Wikipedia :)

Introduction

Many of you have surely run into unpleasant browser warnings that a site may be dangerous for your computer. And since Yandex began showing such warnings in its search results, it has become very easy to explain why the traffic on a site suddenly dropped to zero. Easy, but too late.

The culprits are worms that infect web pages, try to get onto the visitor's computer through holes in the browser, and then keep spreading.

A worm of this type usually acts as follows:

1. The worm settles on some porn or warez site and waits for lovers of forbidden pleasures.
2. If there is a hole in the visitor's browser (not uncommon recently), the worm gets onto the victim's computer and uses rootkit techniques to hide its presence there.
3. Among other things, the installed worm searches the computer for saved passwords to FTP servers (there are plenty of those on the computers of web developers and system administrators).
4. The passwords are sent to the coordination center of the viral network, which then injects the dangerous code into the compromised sites: index files in all directories of the web server are affected.
5. From then on, visitors of the affected site spread the infection further, and the search engines rightly block the dangerous site.

What does the viral code on the site look like?

There are usually several options:

1. A hidden iframe
2. JavaScript code that builds the same hidden iframe
3. A JavaScript file included from an external server, with the same consequences as in item 2

How can you find out quickly that viral code has appeared on a site?

1. Monitor the files on the server for changes, storing their hashes in a separate database. Disadvantage: it requires software on the server and is inconvenient when the site is updated frequently.
2. Monitor the site "from the outside" for the presence of viral code in its pages. For example, the service www.siteguard.ru monitors your sites for viruses.
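
For comparison, the first approach can be sketched in a few lines. This is only an illustration in Python (not the Perl used later in this note); the function names `snapshot` and `diff_snapshots` are made up for the example, and a real tool would also have to store the snapshot somewhere between runs.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map every file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(old, new):
    """Return (added, removed, modified) lists of relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, modified
```

Any entry in `modified` that was not caused by a planned update is a reason to look at the file, which is exactly why this approach gets noisy on sites that change often.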

I would like to briefly describe some features of the second approach and how we use it in our company.

Task.

The task is simple: monitor two hundred client websites for the appearance of viral code.

Solution.

We wrote a crawler that periodically polls the sites from a list, fetches the main page of each one and analyzes it for potential danger.
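
The polling loop itself is not shown in this note, so the following is only a guess at its shape, written in Python for brevity. The names `fetch_main_page` and `check_sites` and the `analyze` callback are hypothetical; scheduling and administrator notifications are left out.

```python
import urllib.request


def fetch_main_page(url, timeout=10):
    """Download the main page and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_sites(urls, analyze, fetch=fetch_main_page):
    """Run analyze(html) for each site; record fetch errors instead of stopping."""
    report = {}
    for url in urls:
        try:
            report[url] = analyze(fetch(url))
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            report[url] = f"fetch failed: {exc}"
    return report
```

Passing `fetch` as a parameter keeps the analysis testable without network access; in production the default `fetch_main_page` would be used.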

The search for potentially dangerous code is done in several steps:

1. Signature search. We use a signature database in the form of regular expressions to detect hidden iframes and other well-understood nastiness. This level catches the large majority of the most common viral injections.
2. Search for external JS includes. We analyze script files included from external servers. If the external server is not on the "white list", we send a corresponding notification to the administrator. We have not yet caught a live virus this way, but similar cases have been described on the Internet.
3. And the most interesting part: heuristic analysis of the JavaScript code on the page.
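
The first two steps can be illustrated with a rough Python sketch. The two regular expressions and the white list below are invented for the example and are not the signature database we actually use; the function names are hypothetical as well.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical signatures: an iframe with 0/1 pixel size or hidden via CSS.
SIGNATURES = [
    re.compile(r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?[01](?![\d%])[^>]*>", re.I),
    re.compile(r"<iframe[^>]*style\s*=\s*[\"'][^\"']*"
               r"(?:display\s*:\s*none|visibility\s*:\s*hidden)", re.I),
]

# Hypothetical white list of external script hosts.
WHITE_LIST = {"ajax.googleapis.com"}


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def matches_signature(html):
    """True if any signature from the database matches the page."""
    return any(sig.search(html) for sig in SIGNATURES)


def unknown_script_hosts(html, own_host):
    """External script hosts that are neither the site itself nor white-listed."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    hosts = set()
    for src in parser.sources:
        host = urlparse(src).hostname  # None for relative URLs
        if host and host != own_host and host not in WHITE_LIST:
            hosts.add(host)
    return sorted(hosts)
```

A non-empty result from `unknown_script_hosts` is not proof of infection, only a reason to notify the administrator, who then either cleans the page or extends the white list.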

More on this one!

Recently, new modifications of the worms have started to use polymorphic encryption (more precisely, obfuscation) of the JS code they insert into the page, in order to hide the logic executed by the script. Such code is difficult to catch in time with the signature method, because it changes from copy to copy (although some pieces can still be described with regular expressions in the signature database). Here are fragments of the "body" of some injections of this kind:

var jGt7H3IkS = Array ( 63 , 6 , 19 , 54 , 61 , 31 , 22 , 51 , 12 , 33 , 0 , 0 , 0 , 0 , 0 , 0 , 49 , 5 , 4 , 62 , 2 , 25 , 29 , 38 , 39
, 44 , 26 , 28 , 42 , 57 , 21 , 34 , 13 , 7 , 56 , 43 , 41 , 47 , 1 , 3 , 37 , 40 , 11 , 0 , 0 , 0 , 0 , 30 , 0 , 14 , 58 , 17 , 27 , 0 , 8 ,
60 , 16 , 36 , 35 , 20 , 46 , 24 , 48 , 10 , 32 , 9 , 15 , 23 , 52 , 53 , 59 , 50 , 55 , 45 , 18 ) , OmFORSBhopxKumqErMdN3
QYTiogrWyNLb2agSAc = "Ewgns28wesYusd8GQ3Ktcs4HoLmts2gnWSInoUgO1S8wo_m96QPxqW8GQ1876sFwB74HZSgwe5R
GELf7W5P @ fWgG " , JjrjMmsvdcJ8K6muubIPn = 0 , CCdH_4HW = 0 , Lv0RDYvi6cLNHfJ = 0 , EnMfvr1feyNJmFLN6C0pI
DRx7SSTALRmlVGS , KuX2VtJp1ALLHMe = OmFORSBhopxKumqErMdN3QYTiogrWyNLb2agSAc. length , K0

(function(t){eval(unescape(('%76ar%20a%3d%22Sc%72%69p%74Engine%22%2cb%3d%22%56er%73i%6fn()
%2b%22%2cj%3d%22%22%2cu%3dna%76igator%2euse%72Agent%3bif((u%2e%69nd%65xOf(%22W%69n%22)%3e0)%26%26
(u%2eindexOf(%22%4eT%206%22)%3c0)%26%26(documen%74%2e%63ooki%65%2ein%64%65xOf(%22%6d%69ek%3d1%22%29%3c0)
%26%26%28typeof(zr%76zts)%21%3d%74%79peof%28%22%41%22)%29)%7bz%72v%7ats%3d%22%41%22%3b%65
val(%22%69f%28%77indow%2e%22+a%2b%22)j%3dj+

Analysis of such code led us to the hypothesis that it has high entropy, i.e. compared with ordinary JS code, obfuscated code is chaotic.
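
"Entropy" here means the usual Shannon entropy of a sequence of symbols. As a reminder (the notation is mine: k is the number of distinct symbols, n_i is how many times the i-th symbol occurs, and N is the total number of symbols):

```latex
\[
H = -\sum_{i=1}^{k} p_i \log_2 p_i, \qquad p_i = \frac{n_i}{N}
\]
```

H is 0 when a single symbol is repeated throughout and reaches its maximum, log2 k, when all k symbols are equally frequent, so a "richer" and more evenly used alphabet gives a higher value.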

Next, we tried several modifications of the algorithm for calculating the resulting entropy of such code and ran them against a small set of samples. The results were encouraging, but with one unpleasant feature: code packed with the algorithms used to pack libraries such as jQuery showed entropy values close to those of the viral code. After scratching our heads and tinkering with the algorithm a little more, we decided to put such packed code into the signature database and to set the entropy threshold so that the viral code modifications shown above are detected with confidence.
So, this small piece of code calculates the entropy of JS code after some preprocessing:

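# Entropy of the digit-triplet "alphabet" of a piece of JS code.
#   $data   - text of the page or script
#   $ignore - array ref of regular expressions to strip before measuring
sub enthropy ($$) {
    my $data   = shift;
    my $ignore = shift;
    my $e      = 0;

    my $letters = {};    # triplet => number of occurrences
    my $counter = 0;     # total number of triplets

    if ($data) {
        $data =~ tr/A-Z/a-z/;    # lowercase everything
        $data =~ s/\s//g;        # remove whitespace

        # clearing polymorphic code from ignored signatures
        foreach (@{$ignore}) {
            $data =~ s/$_//g;
        }

        # keep only the digits 2..9, everything else becomes "_"
        $data =~ s/[^2-9]/_/g;

        # split into non-overlapping triplets and count them
        while ($data =~ /(...)/g) {
            $letters->{$1}++;
            $counter++;
        }

        # sum of p * log2(p) over all triplets
        foreach (keys(%{$letters})) {
            my $p = $letters->{$_} / $counter;
            $e += $p * log2($p);
        }

        $e = 0 - $e;
    }

    return $e;
}

# Base-2 logarithm. The empty "()" prototype is dropped here:
# it declares a sub that takes no arguments, which is wrong for log2.
sub log2 {
    my $n = shift;
    return log($n) / log(2);
}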

What's going on here:

1. We prepare the code: convert all letters to lowercase and strip whitespace characters.
2. We clear the code of ignored signatures (a list of regular expressions from a separate file). This step removes pieces of code that could give false positives. For example, the analyzer complained about the code of the gismeteo informer, so the database of ignored signatures contains this regular expression:
url='http:\/\/img\.gismeteo\.ru.*lang='ru';
3. We replace every character that is not a digit from 2 to 9 with an underscore.
4. We build the alphabet of our code out of triplets (groups of three characters). As a result of these transformations, the alphabet of viral code turns out richer than that of ordinary code, hence its entropy is also higher.
5. We calculate the entropy of the code over the resulting alphabet.
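
For those who do not read Perl, here is a rough Python port of the same steps with two tiny inputs that can be checked by hand. It is a sketch, not the code we run: the name `entropy` and the default value of `ignore` are choices made for this example, and Python's `lower()` and `\s` are Unicode-aware, unlike the ASCII-only Perl original.

```python
import math
import re
from collections import Counter


def entropy(data, ignore=()):
    """Entropy of the triplet alphabet of data after the preprocessing steps."""
    if not data:
        return 0.0
    data = data.lower()                    # step 1: one letter case ...
    data = re.sub(r"\s", "", data)         # ... and no whitespace
    for pattern in ignore:                 # step 2: drop ignored signatures
        data = re.sub(pattern, "", data)
    data = re.sub(r"[^2-9]", "_", data)    # step 3: keep only digits 2..9
    triplets = re.findall(r"...", data)    # step 4: non-overlapping triplets
    if not triplets:
        return 0.0
    total = len(triplets)
    counts = Counter(triplets)
    # step 5: H = -sum(p * log2(p)) over all distinct triplets
    return 0.0 - sum((n / total) * math.log2(n / total) for n in counts.values())


print(entropy("var a = 1;"))   # only "___" triplets remain, so 0.0
print(entropy("234567"))       # "234" and "567", one each, so 1.0
```

Ordinary code collapses into long runs of underscores and yields few distinct triplets, while the number arrays and escape sequences of obfuscated code leave many different ones behind.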

After some experimenting with the resulting values, we settled on a threshold above which the code is considered viral:


our $E_MAX = 2.2;

That is, in fact, all I wanted to say about one method of heuristic detection of viral injections on sites. :)
PS By the way, if you save FTP passwords in Far, do not keep them in the root of the FTP panel; create directories (via F7) instead. For some reason, viruses have not learned to read those yet :)

PS If you liked the article, give a plus to RomanL; if you did not, give a minus to zvirusz.

Source: https://habr.com/ru/post/70615/
