One day, after yet another office discussion about which antivirus copes best with detecting new kinds of viruses, I felt like doing some research on the question myself. I won't argue: the question is not new, and many independent groups run their own evaluations (for example, an excellent collection of such ratings is summarized on the Anti-Malware.ru group forum, and I also recommend looking at the most recent chart from VirusBulletin). Still, I very much wanted to do the analysis on my own, especially since an almost ready-made source of data on the subject already exists on the Internet: the wonderful VirusTotal service.
How it was done and what came of it
My first idea was to use a very interesting table, “Top10 file submissions (Yesterday)”, on the Stats page of VirusTotal itself. Polling this table daily for, say, three months would have been quite enough for the purpose. Alas, whether by accident or by design, the table does not work: the values it shows change only about once a month, following some internal logic. A week of correspondence with VirusTotal representatives ended with them acknowledging the “problem”, but, in loose translation, “the priority of fixing it is very low compared with other, more pressing issues”. Such a wonderful service can be forgiven for this, although in their place I would simply hide that block from the statistics page until it is repaired.
I had to take a more complicated route: with the help of Google, wget, grep and a couple of home-made scripts, I collected from the Internet (overwhelmingly from forums, in threads asking to “diagnose” a file or “tell me how to cure it”) 3,630 links to VirusTotal analyses dated from 1 January 2010 onward.
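The original scripts are not shown, so the following is only a minimal Python sketch of the link-collection step, assuming the forum pages have already been downloaded (for example with wget) and read into strings. The function name `extract_vt_links`, the regular expression, and the sample URLs are hypothetical. In particular, the sketch assumes that any URL on a virustotal.com host is a link to an analysis, because the article does not give the exact layout of report addresses.

```python
import re

# Assumption: every URL on a virustotal.com host is treated as an analysis link.
# The real report URL layout is not specified in the article.
VT_LINK = re.compile(r"https?://(?:www\.)?virustotal\.com/[^\s\"'<>]+", re.IGNORECASE)


def extract_vt_links(pages):
    """Return unique VirusTotal URLs from an iterable of page texts, in first-seen order."""
    seen = {}
    for text in pages:
        for url in VT_LINK.findall(text):
            # Drop punctuation that often trails a URL pasted into a forum post.
            seen.setdefault(url.rstrip(".,;)"), None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical page fragments, used only to show the call.
    pages = [
        '<a href="http://www.virustotal.com/report/abc123">scan result</a>',
        "Please help: http://virustotal.com/report/def456.",
    ]
    for link in extract_vt_links(pages):
        print(link)
```

Deduplication matters here because the same analysis is often quoted several times within one forum thread.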
What are these links? People typically bring a virus question to an Internet forum when the sample in question has only recently started spreading. Most often this falls in the first third or the middle of the period in which antivirus companies are actively processing it. If the same samples were analyzed today, the detection results would undoubtedly be better, I think close to 80-90%, but remember that I have only the links (in effect, hash sums), not the virus samples themselves.
That said, the middle of a virus's life cycle is also not a bad moment to analyze for our purpose. The antiviruses that detect a sample at that point will be products with either a very fast response or good heuristic algorithms. Either suits us. The number of antiviruses that detected each sample at that historical moment is shown in the histogram (X-axis: number of antivirus engines that triggered; Y-axis: number of virus samples):

The graph leads to the next inherent issue of virus scanning: false positives. Undoubtedly, in the region of the graph close to zero, alongside genuine viruses that were simply little known at the time, there are many false positives:
- unwarranted suspicions raised by heuristic algorithms,
- packed executables (as Habr user gjf quite rightly pointed out),
- other unintentional errors of antivirus programs.
I see no way to separate the two on the basis of the available data, so I chose the most obvious approach: setting a threshold for false positives. Specifically, starting from 8 detections, a sample is considered malicious with high probability (on the graph these samples are highlighted in red). About 2,600 records met this criterion.
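The histogram and the threshold cut can be sketched in a few lines of Python. This is only an illustration: the `reports` structure (sample identifier mapped to the set of engines that flagged it) and the function names are assumptions of mine, while the threshold of 8 comes from the text above.

```python
from collections import Counter

THRESHOLD = 8  # from the text: 8 or more detections means "likely malicious"


def histogram(reports):
    """Map 'number of engines that flagged a sample' to 'number of such samples'."""
    return Counter(len(engines) for engines in reports.values())


def likely_malicious(reports, threshold=THRESHOLD):
    """Keep only the samples flagged by at least `threshold` engines."""
    return {
        sample: engines
        for sample, engines in reports.items()
        if len(engines) >= threshold
    }


if __name__ == "__main__":
    # Hypothetical data: sample id -> set of engine names that detected it.
    reports = {
        "sample-1": {f"engine{i}" for i in range(12)},
        "sample-2": {"engine0", "engine3"},
    }
    print(sorted(histogram(reports).items()))
    print(sorted(likely_malicious(reports)))
```

A fixed cut like this trades some genuine but little-known viruses for a cleaner sample, which is exactly the compromise described above.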
And now to what all these preliminary steps were for. From the described sample of about 2,600 highly probable “fresh” viruses, I calculated the average detection rate for each antivirus engine participating in VirusTotal.
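As a rough sketch of that calculation, the detection rate of an engine is the share of samples in the filtered set that it flagged. The data layout and the name `detection_rates` are hypothetical, and the sketch assumes that every engine scanned every sample, which the article does not state.

```python
def detection_rates(reports, engines):
    """Return, for each engine, the fraction of samples it detected.

    `reports` maps a sample id to the set of engines that flagged it.
    Assumption: every engine in `engines` scanned every sample.
    """
    total = len(reports)
    if total == 0:
        return {engine: 0.0 for engine in engines}
    return {
        engine: sum(engine in flagged for flagged in reports.values()) / total
        for engine in engines
    }


if __name__ == "__main__":
    # Hypothetical engines and samples, used only to show the call.
    reports = {"s1": {"A", "B"}, "s2": {"A"}, "s3": {"A", "C"}, "s4": {"B"}}
    rates = detection_rates(reports, ["A", "B", "C", "D"])
    for engine, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{engine}: {rate:.0%}")
```

If some engines were absent from some scans, the denominator would have to be counted per engine rather than taken as the total number of samples.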
Disclaimer. The statistics presented here are one random realization built from information freely available on the Internet. Despite the measures taken to eliminate correlations and conditional probabilities, their absence from the material cannot be guaranteed with probability 1.0, because of the way the sample was gathered from the Internet. The statistics reflect the percentage of virus samples detected at a random moment in the early phase of their spread and cannot characterize the final detection quality of any specific product, which is undoubtedly higher. I kindly ask you to take this publication calmly, without holy wars, and to treat it simply as one more “alternative” rating of antivirus products. The decision to publish these results (originally obtained for internal use) was made after a small survey on Habr in Q&A. I am ready for constructive criticism and for additions to the “Notes” column of the table.