
Inside a virus scanner


Last year I worked on implementing a virus scanner for an antivirus company, oddly enough.
This post is a digest of what I learned, and it tells the Habr community about the internal structure of, oddly enough, an antivirus scanner.
The scanning engine, or scanner, is the foundation of an antivirus package. It is the back end of the antivirus and, as a rule, ships as a DLL, since several programs in the package use the scanner at once.
The graphical shell in this case is only a pretty wrapper for displaying the engine's results. All the useful work is done by the engine in the back end.


Virus Locations



As the name suggests, the engine is used to scan all possible locations of malware, namely:


Scan types


Scanning is divided into two main types: signature-based and heuristic.

Signature-based scanning

Another name is hash scanning. The scanner checks files by comparing their signatures against a dictionary of known signatures.
Usually, an antivirus signature is an MD5 hash (16 bytes) computed from the body of a known virus.
Thus, a file is considered infected if its hash is found in the signature database. To narrow detection down, the hash can be calculated only for exe files, based on the PE header.
This type of scan identifies the specific threat with a high degree of probability and practically without false positives (something heuristic scanning suffers from).
The disadvantages of hash scanning include the inability to detect new viruses that are not yet in the database, as well as vulnerability to polymorphic or encrypted viruses, whose bodies change from copy to copy. This is why the signature database requires regular updates.
Another weakness of hash scanning is its speed. Were it not for Moore's law, no modern computer would be able to finish scanning against such a mass of signatures in a reasonable time.
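
The hash lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not the engine's actual code: the function names (`md5_of_file`, `build_signature_db`, `scan_file`) and the in-memory dictionary used as a signature database are assumptions made for the example, and the file is hashed whole rather than from the PE header.

```python
import hashlib

def md5_of_file(path, chunk_size=64 * 1024):
    """Return the 16-byte MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()

def build_signature_db(samples):
    """Build {digest: detection_name} from (name, body) pairs.

    Hypothetical helper: a real database is shipped by the vendor,
    not computed on the client.
    """
    return {hashlib.md5(body).digest(): name for name, body in samples}

def scan_file(path, signature_db):
    """Return the detection name if the file's hash is known, else None."""
    return signature_db.get(md5_of_file(path))
```

Because the whole body goes into the hash, changing a single byte of the file yields a different digest, which is exactly why polymorphic and encrypted viruses slip past this kind of check.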

Heuristic scan


This method detects a virus by previously known characteristics (heuristics). For example, to detect a boot virus that has written itself into the MBR, the antivirus can read the boot record in two ways: with the WinAPI ReadFile function, and with a direct disk access driver (DDA driver). It then compares the two buffers. If the buffers differ, we can say with high probability that there is not only a boot virus, but one that also hooks WinAPI calls.
This is common practice for rootkits.
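
The comparison step of the boot record check can be sketched like this. The two reader callables are placeholders: in a real scanner one would wrap ReadFile and the other the DDA driver, neither of which is shown here. The names `differing_offsets` and `boot_record_is_suspicious` are hypothetical, as is the assumption of a 512-byte sector.

```python
SECTOR_SIZE = 512  # assumed size of the boot record in bytes

def differing_offsets(api_view, raw_view):
    """Return the byte offsets at which the two views of the sector disagree."""
    if len(api_view) != len(raw_view):
        raise ValueError("both views must cover the same number of bytes")
    return [i for i, (a, b) in enumerate(zip(api_view, raw_view)) if a != b]

def boot_record_is_suspicious(read_via_api, read_via_driver, size=SECTOR_SIZE):
    """Read the boot record through both paths and flag any mismatch.

    `read_via_api` and `read_via_driver` are callables taking a byte count
    and returning bytes; they stand in for the two real read paths.
    """
    return bool(differing_offsets(read_via_api(size), read_via_driver(size)))
```

Passing the readers in as arguments keeps the comparison logic independent of how the bytes are obtained, so it can be exercised with fake readers on any platform.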
Another example of a heuristic scan is searching for traces of a virus in the registry and system directories. As a rule, viruses create a specific set of files and/or registry entries by which they can be identified.
The malware locations mentioned above, namely scanning for traces, scanning cookies, and checking disk boot records, also fall under heuristic scanning.
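
A rough sketch of the trace search, limited to files for simplicity (registry entries are left out). The `trace_db` layout and the `find_traces` name are assumptions for illustration. The rule that every listed trace must be present is a design choice of this sketch, not something the engine is known to do.

```python
from pathlib import Path

def find_traces(root, trace_db):
    """Return {malware_name: [relative paths]} for every entry in `trace_db`
    whose trace files are all present under `root`.

    `trace_db` maps a malware name to the relative paths it is known to create.
    Entries with an empty path list never match.
    """
    root = Path(root)
    hits = {}
    for name, rel_paths in trace_db.items():
        if rel_paths and all((root / p).exists() for p in rel_paths):
            hits[name] = list(rel_paths)
    return hits
```

Requiring the full set lowers the chance of flagging a system where only one commonly named file happens to exist, at the cost of missing partially cleaned infections.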

Scanner Components and Auxiliary Modules



Direct Disk Access Driver

It is needed to bypass rootkits. In an infected system, rootkits are used to cover the traces of the malware's presence. The best way to do this is to hook calls to API functions, in particular those for working with files: CreateFile, ReadFile, etc. When the antivirus scans the system by calling these functions, the rootkit can return FALSE whenever such a call concerns its own files. To get around this, the scanner contains a module for direct sector-by-sector reading from the disk, without using WinAPI.

Black and White lists

They are used to filter out detections that are not actually malicious, so that the antivirus does not raise an alarm on a false positive.
Modern antiviruses store a database of 5 million signatures on average, and quite often a single virus has a dozen signatures. With several thousand system files, it is quite possible that one of them will match some signature. The danger is that the antivirus will delete the file or move it to quarantine, which can bring the whole system down.
Minimizing false positives is a top priority for any antivirus company.
To pass the most prestigious antivirus test, Virus Bulletin, an antivirus must show a 100% detection rate without producing a single false positive.
Whitelist - contains a list of files that do not harm the system but are nevertheless detected by the scanner.
Blacklist - contains a list of viruses that we trust (they also do not harm the system).
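
Whitelist filtering can be sketched as a post-processing step over the scanner's raw hits. The `Detection` record and the `apply_whitelist` function are hypothetical, and keying the whitelist by file hash is an assumption of this example.

```python
from collections import namedtuple

# Hypothetical record describing one raw scanner hit.
Detection = namedtuple("Detection", ["path", "file_hash", "threat_name"])

def apply_whitelist(detections, whitelist):
    """Split raw detections into (reported, suppressed).

    A detection is suppressed when its file hash is in `whitelist`,
    i.e. the file is known to be harmless despite matching a signature.
    """
    reported, suppressed = [], []
    for d in detections:
        (suppressed if d.file_hash in whitelist else reported).append(d)
    return reported, suppressed
```

Keeping the suppressed hits instead of discarding them makes it possible to log how often the whitelist is actually saving the user from a false alarm.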

Unpackers, decryptors

To achieve an acceptable detection level, the scanner must handle exe files compressed or encrypted with an exe packer (for example, UPX). Before calculating the hash, the scanner detects that the file is packed and first passes it to the unpacker. The hash is then calculated on the unpacked result and compared with those in the database.
The second type is ordinary archives: the well-known zip, rar, 7z, etc. The antivirus should also be able to unpack these archives and scan their contents.
The third type is unpacking NTFS ADS (NTFS Alternate Data Streams). In the NTFS file system, an executable file can be disguised as a regular file, for example a text file. An alternate stream of that file will then hold the virus itself.
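
For the ordinary archive case, a minimal sketch using Python's standard `zipfile` module might look like this. Only zip is covered, and the `scan_zip_bytes` name and the dictionary signature database are assumptions. Each member is hashed after decompression, so the signature is matched against the file's real contents.

```python
import hashlib
import io
import zipfile

def scan_zip_bytes(data, signature_db):
    """Hash every file inside a zip archive held in memory.

    Returns {member_name: detection_name} for members whose MD5 digest
    appears in `signature_db` (a {digest: name} mapping).
    """
    hits = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            digest = hashlib.md5(archive.read(info)).digest()
            name = signature_db.get(digest)
            if name is not None:
                hits[info.filename] = name
    return hits
```

The sketch does not recurse into nested archives and places no limit on decompressed size, both of which a production scanner would need to handle.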

Use-cases



Antivirus software uses the engine not only for a full system scan, but also in the following cases:

Source: https://habr.com/ru/post/145948/

