Express analysis of malicious files for educational purposes

Express analysis of malicious files, above all obfuscated malicious scripts (fresh, straight from the oven), is one way to get students interested in code [de]obfuscation.
There is no deep, painstaking analysis here with flowcharts of algorithms and execution paths (a la RD control NDV SCC).

In place of an introduction

The artificial examples typical of many textbooks often, unfortunately, kill the desire to understand the subject. It is just as bad when the very first exercise on a new topic is so complex that the student loses the thread of reasoning (there is no experience yet, and nothing is clear yet) and, with it, interest.
With that in mind, and remembering that presenting the material is the teacher's task and responsibility, the wish arose to demonstrate a rapid analysis of obfuscated malicious scripts. Fortunately, getting hold of such "goods" is not much of a problem: a whole industry works regularly on creating and distributing them.
The haul: an impressive selection from quarantine for the previous week and a modest selection of scripts from infected sites (from 2015).
Where to begin? With preliminary preparation of the received "goods".

Step one (preparation)

We unpack all the archives; more precisely, we let an archiver (7zip in this case) try to recognize and unpack them. Why "try"? Because in this collection a file with almost any extension (zip, rar, arj, lzh, ace, exe, uue, etc.) can turn out to be an archive.

A *.zip file is not always a zip file

The collection clearly showed cases where a file carries an extension typical of one archiver while the data inside is packed in another archiver's format. About 50% of the files with archive-type extensions actually contained a rar archive.
The reasoning is clear. A typical user workstation (taken as the target of the attack) usually has an archiver installed that checks the file header as well as the extension, so it will unpack the contents anyway. At the same time, the trick can hinder simple protection systems that handle files based on the extension alone.
Against antiviruses on intermediate nodes, this way of hiding code is probably close to useless.
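
The header check described above can be sketched in a few lines of Python. This is only an illustration, not the tool used for the collection: the function names `sniff_type` and `extension_mismatch` are hypothetical, and the table lists just a handful of well-known file signatures.

```python
import os

# A few well-known leading signatures ("magic bytes"), mapped to the
# extension one would normally expect. The table is deliberately incomplete.
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"Rar!\x1a\x07", "rar"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\x1f\x8b", "gz"),
    (b"\x60\xea", "arj"),
]


def sniff_type(head):
    """Return the archive type suggested by the first bytes, or None."""
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    return None


def extension_mismatch(filename, head):
    """True when the header identifies a known format that differs
    from the one the extension claims."""
    claimed = os.path.splitext(filename)[1].lstrip(".").lower()
    actual = sniff_type(head)
    return actual is not None and actual != claimed
```

For a real file, `head` would be the first few bytes read in binary mode, for example `open(path, "rb").read(8)`.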

We delete all duplicates, because the quantitative picture does not interest us. We keep the list obtained at this stage, or more precisely the file names; we will need it later.
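
One possible way to drop duplicates is to compare content hashes rather than names. The sketch below assumes the samples are already available as (name, bytes) pairs; `drop_duplicates` is a hypothetical helper, and SHA-256 is simply a convenient choice of hash.

```python
import hashlib


def drop_duplicates(samples):
    """Keep the first file name seen for each distinct content.

    `samples` is an iterable of (name, content_bytes) pairs.
    Returns the surviving names in sorted order.
    """
    first_name_by_digest = {}
    for name, content in samples:
        digest = hashlib.sha256(content).hexdigest()
        # setdefault keeps the earlier name when the digest repeats
        first_name_by_digest.setdefault(digest, name)
    return sorted(first_name_by_digest.values())
```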

Step two (sorting), or in the footsteps of Mendeleev

We have a list of files; what next? We divide the files by type:

all identified archives (by now they are just shipping containers) separately;
all compiled executables (analysing them is usually harder and needs more time and a debugger) separately. Only exe and scr files ended up here (plus one exe file with a com extension);
all scripts, each type separately. The collection held only 4 types: js, php, vbs, wsf;
the last group consisted of a single html file. It contained no scripts, so it did not belong in the previous groups.

Again, the quantitative picture (which file types are more numerous) does not interest us. To see where to start, the material has to be put in some order. Remembering Mendeleev, we sort by "mass":

each group of files is sorted by size;
the file names are sorted by length, with extensions discarded (this is where the general file list comes in handy).
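
Sorting names by length without the extension might look like this in Python. The helper name `by_name_length` is hypothetical, and only the last extension is stripped.

```python
import os


def by_name_length(names):
    """Sort file names by the length of the name without its extension."""
    return sorted(names, key=lambda n: len(os.path.splitext(n)[0]))
```

The sort is stable, so names of equal length keep their original relative order, which makes runs of same-length names easy to spot.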

Step three (analysis), or the "peering method"

Applying the "peering method" to a list of files sorted by length:

the leaders stand out immediately: names more than 100 characters long. This is how the malware hides the file extension, since it is practically impossible to see unless you deliberately look for it;
Example of very long names
The act of reconciliation with the attached register of primary documentation on 14.03.2016. Unloaded from 1C Accounting _xlsx
new document from 03/16/2016, the current version for checking and printing on the date checked by antivirus software ok
a significant number of names of the same length also catch the eye; look closely and the structure of the name becomes visible. Perhaps these files have something else in common;
3 letters and 10 numbers
GUA1343958710
RQQ7223899805
VII964085171413
YAF3892579406
among the names 8 and 10 characters long there are many dates in various formats, but this information does not give us much.
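
The "structure of the name" observation can be partly automated by reducing each name to a shape, that is, runs of letters and digits with their lengths, and grouping names that share a shape. This is a minimal sketch; `name_shape` and `group_by_shape` are hypothetical names, and the sample names are the ones listed above.

```python
from collections import defaultdict
from itertools import groupby


def char_class(ch):
    """Classify a character as Letter, Digit or other (X)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "X"


def name_shape(name):
    """Describe a name as runs of character classes, e.g. 'L3D10'."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(name, key=char_class)
    )


def group_by_shape(names):
    """Map each shape to the list of names that have it."""
    groups = defaultdict(list)
    for name in names:
        groups[name_shape(name)].append(name)
    return dict(groups)
```

Applied to the four sample names, three of them share the shape `L3D10`, while `VII964085171413` comes out as `L3D12`.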

Without resorting to a debugger, the following was learned about the compiled executables:

in most cases the icon is replaced with a document icon (doc(x), xls(x), pdf; icons typical of Libre/OpenOffice were never seen) or a multimedia icon (jpeg, mp3, etc.);
some files carried a digital signature (allegedly even from Microsoft), but an invalid one.

Before moving on to the large group of scripts, consider the single html file. It offered nothing really interesting: line feeds removed and a lot of comments added. It is easy to clean up, and the technique the attackers wanted to use is immediately visible.

html

<meta http-equiv='refresh' content='0; url=http://bad.url/' />

The technique is simple, clear and long known.
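
For a whole pile of such pages, the redirect target can be pulled out automatically. The sketch below uses Python's standard `html.parser`, which skips comments by default, so the comment padding does not get in the way. The class and function names are hypothetical, and the parsing of the `content` attribute only covers the simple `delay; url=...` form shown above.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the URLs of <meta http-equiv="refresh"> redirects."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; url=http://..."; take what follows "url="
        _, _, rest = attr.get("content", "").partition(";")
        key, sep, value = rest.strip().partition("=")
        if sep and key.strip().lower() == "url":
            self.targets.append(value.strip().strip("'\""))


def find_meta_refresh(html_text):
    """Return the list of meta-refresh targets found in the markup."""
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    finder.close()
    return finder.targets
```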

Nothing interesting was found in the wsf scripts. After a simple removal of multi-line comments, the same code was left in each (it needs decoding, which was not attempted).

wsf

 <job> <script language="JScript.Encode"> #@~^+goAAA==&JNi ... DAA==^#~@ </script> </job>

JS deobfuscation has been covered on Habr many times, and there is no particular point in repeating it. The collection contained a variety of techniques.

Closer attention to the "3 letters and 10 numbers" group of files proved interesting and illustrative. The files had a similar structure: once the variables were given the same names and a few "cosmetic" transformations were applied (aligning indents, evaluating arithmetic on numbers [so that 1 + 4 + 1 = 2 + 2 + 2 = 3 + 2 + 1 = 6]), the similarity became obvious. The files had evidently been processed by the same obfuscator. This demonstrates that groups of similar files can be picked out of the stream and that, after a thorough analysis of individual files, conclusions can be drawn about the whole group.
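
The "cosmetic" transformations can be approximated with a few regular expressions. The sketch below is an assumption about how such a normalization might be done, not the procedure actually used: it renames identifiers declared with `var` or `function` in order of appearance, folds sums of integer literals, and drops whitespace. All function names are hypothetical. The result is meant only for comparing files and is not guaranteed to be equivalent JavaScript (property names that coincide with declared names are renamed too, and only the first name in a `var a, b` list is picked up).

```python
import re

DECL_RE = re.compile(r"\b(?:var|function)\s+([A-Za-z_$][\w$]*)")
# Group 1: a quoted string literal (left untouched); group 2: an identifier.
TOKEN_RE = re.compile(r"""("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|([A-Za-z_$][\w$]*)""")
SUM_RE = re.compile(r"\b\d+(?:\s*\+\s*\d+)+\b")


def fold_sums(src):
    """Replace runs like `1 + 4 + 1` with `6`, skipping runs that sit
    next to another operator (or a decimal point) on either side."""
    def repl(m):
        before = src[:m.start()].rstrip()[-1:]
        after = src[m.end():].lstrip()[:1]
        if (before and before in "+-*/%.") or (after and after in "*/%."):
            return m.group(0)
        return str(sum(int(n) for n in re.findall(r"\d+", m.group(0))))
    return SUM_RE.sub(repl, src)


def rename_declared(src):
    """Rename declared identifiers to v0, v1, ... in order of declaration."""
    names = {}
    for name in DECL_RE.findall(src):
        names.setdefault(name, f"v{len(names)}")

    def repl(m):
        if m.group(1) is not None:
            return m.group(1)  # string literal: keep as is
        return names.get(m.group(2), m.group(2))
    return TOKEN_RE.sub(repl, src)


def normalize(src):
    """Produce a whitespace-free skeleton suitable for comparing files."""
    return re.sub(r"\s+", "", rename_declared(fold_sums(src)))
```

Two scripts that differ only in variable names, spacing and the way a constant is split into a sum then normalize to the same string, which is enough to cluster the output of one obfuscator.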

Some malware tricks with archives

The changed archive extension has already been noted above.
There are also archives nested inside each other, with or without correct extensions. The nesting depth depends on the author's imagination and serves two possible goals:

hiding from anti-virus scanning (to save resources, antiviruses usually limit the nesting depth of the archives they scan);
provoking the user [even a cautious one] into launching the malicious payload. The user clicks (or presses Enter in a shell) several times to open each nested archive and will, with high probability, mechanically perform the same action on the malicious payload.
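
Walking nested archives with an explicit depth limit can be sketched with Python's standard `zipfile` module. This handles zip containers only (the collection also contained rar and other formats, which would need other tools), and `iter_payloads` is a hypothetical name. Note that `zipfile.is_zipfile` also accepts anything that merely ends with a zip structure, such as office documents or self-extracting archives.

```python
import io
import zipfile


def iter_payloads(data, name="", max_depth=5):
    """Yield (path, content) for every innermost non-zip member.

    Descends into nested zip archives at most `max_depth` levels;
    an archive found below that depth is yielded unopened.
    """
    if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                inner = f"{name}/{info.filename}" if name else info.filename
                yield from iter_payloads(archive.read(info), inner, max_depth - 1)
    else:
        yield name, data
```

The depth limit here plays the same role as in a scanner: everything below it stays unopened, which is exactly the gap deep nesting tries to exploit.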

A simple obfuscation technique based on a compression function turned up in a php file. Besides obfuscating the code and playing hide and seek with antiviruses, the compression also saved a good deal of file size.

php

 eval(gzinflate(...));

A small modification of the code, so that it saves the result instead of executing it, gets you to the content in a few iterations. The task is not difficult and any student can solve it. As a result, the initial 10k turns into a 30k php script. The script contains an authorization block that checks a password hash, but another small modification disables the check (or replaces the hash), and you can see the "real live hacker shellcode" in action: a mini file manager with a convenient interface and the ability to execute entered commands.
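
The same unwrapping can be done without running any PHP at all. The sketch below makes one assumption that the elided snippet above does not confirm: that the argument of `gzinflate` is a `base64_decode('...')` call, a common form of this wrapper. PHP's `gzinflate` reads a raw DEFLATE stream, which corresponds to `zlib.decompress` with `wbits=-15` in Python. The name `peel_layers` is hypothetical.

```python
import base64
import re
import zlib

# Matches eval(gzinflate(base64_decode('...'))); with either quote style.
LAYER_RE = re.compile(
    r"eval\s*\(\s*gzinflate\s*\(\s*base64_decode\s*\(\s*"
    r"(['\"])([A-Za-z0-9+/=\s]+)\1\s*\)\s*\)\s*\)\s*;?"
)


def peel_layers(src, max_layers=20):
    """Decode nested eval(gzinflate(base64_decode(...))) layers
    without executing anything; returns the innermost source found."""
    for _ in range(max_layers):
        match = LAYER_RE.search(src)
        if match is None:
            break
        raw = base64.b64decode(match.group(2))
        # -15 selects a raw DEFLATE stream, which is what gzinflate expects
        src = zlib.decompress(raw, -15).decode("utf-8", errors="replace")
    return src
```

Each iteration replaces the script with the decoded layer, so the loop stops as soon as the text no longer contains the wrapper; `max_layers` is just a guard against endless nesting.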

Source: https://habr.com/ru/post/279717/
