
Last year I worked on the implementation of a virus scanner for, oddly enough, an antivirus company.
This post is a distillation of what I learned, and tells the Habr community about the internal structure of, oddly enough, an antivirus scanner.
The scanning engine, or scanner, is the foundation of the antivirus package. It is the back end of the antivirus and, as a rule, ships as a DLL, since several programs in the package use the scanner at once.
The graphical shell in this case is only a pretty wrapper for displaying the engine's results. All the useful work is done by the engine in the back end.
Virus Locations
From the name it is clear that the engine is used to scan all possible locations of malware, namely:
- Scan arbitrary files and folders, up to entire disks.
- Memory scan. All processes loaded into memory and their DLLs are scanned.
- Scan of boot records (Master Boot Record, MBR).
- Scanning the system for traces of malware. Checking system folders such as %APPDATA% and %WINDIR% for specific files and folders. Scanning the registry, including startup entries and settings, for the same kind of traces.
Types of scan
Scanning is divided into two main types: signature and heuristic.
Signature-based scanning
Another name is hash scanning. The scanner checks files by comparing their signatures against a dictionary.
Usually an antivirus signature is an MD5 hash (16 bytes) computed from the body of a known virus.
Thus, a file is considered infected if its hash is found in the signature database. To narrow detection down, the hash can be calculated only for exe files, based on the PE header.
This type of scan identifies the type of attack with a high degree of probability and with almost no false positives (which is what heuristic scanning suffers from).
The disadvantages of hash scanning include the inability to detect new viruses that are not yet in the database, and vulnerability to polymorphic or encrypted viruses. The signature database therefore requires regular updates.
Another weakness of hash scanning is verification speed. If it were not for Moore's law, no modern computer would be able to finish scanning against such a mass of signatures in a reasonable time.
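The hash lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the engine's actual code: the function names and the dictionary layout (hex MD5 digest mapped to a detection name) are hypothetical, and a real database would be far larger and stored in a more compact form.

```python
import hashlib
from typing import Dict, Optional


def scan_bytes(data: bytes, signatures: Dict[str, str]) -> Optional[str]:
    """Return the detection name if the MD5 of data is a known signature."""
    digest = hashlib.md5(data).hexdigest()
    # A dict lookup is O(1) on average, regardless of database size.
    return signatures.get(digest)


def scan_file(path: str, signatures: Dict[str, str],
              chunk_size: int = 1 << 20) -> Optional[str]:
    """Same check for a file on disk, hashed in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return signatures.get(digest.hexdigest())
```

Because the whole body is hashed, changing a single byte of the file changes the digest, which is exactly why polymorphic and encrypted viruses slip past this kind of check.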
Heuristic scan
A method that detects a virus by previously known characteristics (heuristics). For example, to detect a boot virus written into the MBR, the antivirus can read the boot record in two ways: using the WinAPI ReadFile function, and using the direct disk access driver (DDA driver), and then compare the two buffers. If the buffers differ, then with high probability we can say that not only is there a boot virus, it also hooks WinAPI calls.
This is common practice for rootkits.
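The two-path comparison can be sketched as follows. The two reader callables are hypothetical stand-ins: in a real engine one would wrap ReadFile and the other would call into the direct disk access driver, neither of which is shown here.

```python
from typing import Callable

SECTOR_SIZE = 512  # a classic MBR occupies one 512-byte sector

# Hypothetical reader type: takes a byte count, returns that many bytes of sector 0.
SectorReader = Callable[[int], bytes]


def mbr_views_differ(read_via_api: SectorReader,
                     read_via_driver: SectorReader) -> bool:
    """Return True if the API view and the raw view of the MBR disagree."""
    api_view = read_via_api(SECTOR_SIZE)
    raw_view = read_via_driver(SECTOR_SIZE)
    if len(api_view) != SECTOR_SIZE or len(raw_view) != SECTOR_SIZE:
        raise ValueError("short read: expected a full boot sector")
    # A mismatch suggests something between the caller and the disk is
    # rewriting what the API returns.
    return api_view != raw_view
```

A mismatch on its own says only that the two views disagree; deciding which view is trustworthy is the job of the driver path.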
Another example of a heuristic scan is the search for traces of the virus in the registry and system directories. As a rule, viruses create a specific set of files and/or registry entries by which they can be identified.
Of the scan targets listed above, scanning for traces, scanning cookies, and checking disk boot records all fall under heuristic scanning.
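A trace scan over the file system might look roughly like this. The indicator table, the family names, and the rule "report a family only when all of its indicator files are present" are assumptions made for the sketch; registry checks are left out entirely.

```python
import os
from typing import Callable, Dict, List


def find_traces(indicators: Dict[str, List[str]],
                exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return malware family names whose indicator files are all present.

    indicators maps a (hypothetical) family name to the paths it is known
    to create. Environment variables in paths are expanded first; the
    %VAR% form is only expanded when running on Windows.
    """
    found = []
    for family, paths in indicators.items():
        expanded = [os.path.expandvars(path) for path in paths]
        # An empty indicator list must never count as a match.
        if expanded and all(exists(path) for path in expanded):
            found.append(family)
    return sorted(found)
```

The `exists` parameter is injectable only so the check can be exercised without touching a real disk.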
Scanner Components and Auxiliary Modules
Direct Disk Access Driver
Required to get past rootkits. In an infected system, rootkits are used to cover the traces of the malware's presence. The best way to do this is to hook calls to API functions, in particular those for working with files: CreateFile, ReadFile, etc. When the antivirus scans the system by calling these functions, the rootkit may return FALSE when such a call concerns its own files. To get around this, the scanner contains a module for direct sector-by-sector reading from the disk, without using WinAPI.
Black and White lists
They are used to filter out detections that are not actually malicious, so that the antivirus does not raise an alarm in the event of a false positive.
Modern antiviruses store a database of about 5 million signatures on average, and quite often a single virus has a dozen signatures. It is quite possible that among several thousand system files there will be one that matches a signature. The danger is that the antivirus will delete it or move it to quarantine, which can break the system entirely.
Minimizing false positives is a top priority for any antivirus company.
To pass the most prestigious antivirus test, Virus Bulletin, the antivirus must show a 100% detection rate without issuing a single false positive.
White list - contains files that do not harm the system but are nevertheless detected by the scanner.
Black list - contains viruses that we trust (they also do not harm the system).
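Applied to a batch of raw detections, the two lists act as a post-filter. In the sketch below the `Detection` record, the choice to key the white list by file hash, and the choice to key the black list by detection name are all assumptions; the article does not say how the real lists are keyed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Detection:
    path: str   # file that triggered the detection
    md5: str    # hex MD5 of that file
    name: str   # detection (virus) name from the signature database


def filter_detections(detections: Iterable[Detection],
                      whitelist: Set[str],
                      blacklist: Set[str]) -> List[Detection]:
    """Drop detections suppressed by either list.

    whitelist: hashes of known-harmless files that still match a signature.
    blacklist: detection names considered trusted, as described above.
    """
    return [
        detection for detection in detections
        if detection.md5 not in whitelist and detection.name not in blacklist
    ]
```

Filtering after the scan, rather than skipping files up front, keeps the signature matching itself unchanged and makes suppressed detections easy to log.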
Unpackers, decryptors
To achieve an acceptable detection rate, the scanner must handle exe files compressed with an exe packer (for example, UPX). Before calculating the hash, the scanner detects that the file is packed and first passes it to the unpacker; the hash is then calculated from the unpacked result and compared with the one in the database.
The second type is the well-known archive formats: zip, rar, 7z, etc. The antivirus should also be able to unpack these archives and scan their contents.
The third type is unpacking NTFS ADS (NTFS Alternate Data Streams). In the NTFS file system, an executable can be disguised as a regular file, for example a text file, with the virus itself carried in an alternate stream of that file.
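For the second type, ordinary archives, the unpack-then-hash step can be sketched with Python's standard `zipfile` module. Only zip is covered, the function name and signature dictionary are hypothetical, and a production scanner would also cap the unpacked size and nesting depth to defend against archive bombs.

```python
import hashlib
import io
import zipfile
from typing import Dict


def scan_zip(data: bytes, signatures: Dict[str, str]) -> Dict[str, str]:
    """Hash every member of an in-memory zip; return {member name: detection}."""
    hits = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to hash
            digest = hashlib.md5(archive.read(info)).hexdigest()
            if digest in signatures:
                hits[info.filename] = signatures[digest]
    return hits
```

The point is that the hash is taken over each unpacked member, not over the archive file, whose bytes change with compression settings and with the other files stored alongside.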
Use-cases
Antivirus software uses the engine not only for a full system scan, but also in the following cases:
- Internet protection, namely scanning the cookies of browsers and Flash Player.
For example, Chrome stores cookies in the %localappdata%\Google\Chrome\User Data\Default\ folder, Firefox in %appdata%\Mozilla\Firefox\Profiles, and Internet Explorer in %USERPROFILE%\AppData\Roaming\Microsoft\Windows\Cookies.
The scanner extracts the domain names from the SQL database of cookies for Firefox and Chrome, or from the 3rd line of each cookie for IE, and compares them with a database of harmful sites.
- Email protection. The scanner plugs into the antivirus module responsible for email protection. This could be a plugin for Outlook or Thunderbird whose core job is checking attachments for viruses.
- Context scan of a file or folder. When the user right-clicks a folder or file and selects "Check file ..." in the context menu, the antivirus also calls the engine.
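Returning to the cookie check in the first item, the domain extraction for Firefox and Chrome can be sketched with the standard `sqlite3` module. The table and column names are passed in by the caller because they are assumptions here: Firefox has commonly used table `moz_cookies` with column `host`, and Chrome table `cookies` with column `host_key`, but the schema can differ between browser versions. The matching rule (exact domain or any subdomain) is also an assumption.

```python
import sqlite3
from typing import Set


def cookie_hosts(conn: sqlite3.Connection, table: str, column: str) -> Set[str]:
    """Collect distinct cookie hosts, stripping the leading dot of domain cookies."""
    # Identifiers cannot be bound as SQL parameters, so table and column
    # must come from trusted constants, never from user input.
    rows = conn.execute(f'SELECT DISTINCT "{column}" FROM "{table}"')
    return {row[0].lstrip(".") for row in rows if row[0]}


def harmful_cookie_hosts(conn: sqlite3.Connection, table: str, column: str,
                         bad_domains: Set[str]) -> Set[str]:
    """Return hosts that equal, or are subdomains of, a known harmful domain."""
    return {
        host for host in cookie_hosts(conn, table, column)
        if any(host == bad or host.endswith("." + bad) for bad in bad_domains)
    }
```

In practice the database would be opened from a copy of the profile file, since the browser may hold a lock on the original while it is running.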