Last week, Doctor Web released an antivirus bot for Telegram. As a direct participant in the project, I would like to explain, on behalf of the whole team, why we built this bot, how it works, and whether it is time to abandon desktop antivirus software.

Concept
Last summer Telegram introduced bots and the Telegram Bot API. Chat bots had existed for a long time, but this platform offered so much room for integration experiments that it seemed only the lazy had not built a bot of their own. There are even some rather exotic examples.
Most of the bots we tried were entertaining (IQ tests, sticker rating), informational (a weather forecast, a word translation, the address of the nearest ATM), or both at once, such as bots for finding Indian films. They turned out to be convenient, and the format fascinated us so much that we wanted to use it for our own information desk: a bot that would return a threat description on request. Say the user asks the bot what exactly the antivirus has caught under the name Linux.Encoder.1 and receives a detailed description of the threat in response. But after turning the idea over in our hands for a while, we found obvious flaws:
- Malware descriptions are inconvenient to read in a messenger: the description of the mechanism is often very long, with code samples and many screenshots.
- The scenario itself seemed artificial: a user learns about a threat on their device, opens Telegram, finds the bot, and asks it a question instead of simply googling the threat.
- Different antivirus companies use different threat naming conventions. The user can search for a threat by a different name and not find the necessary information.
Having considered all this, we decided to go a step further and create a bot with genuinely practical functionality: an experimental antivirus bot.
The task seemed both fascinating and useful. The messenger is responsible for traffic encryption and secure data exchange, and Telegram has proven itself there. The security of the device on which Telegram is installed, however, is the user's responsibility, and all the usual social engineering tricks still work. Both a computer and a smartphone can be infected with a Trojan that at best shows tons of ads and at worst turns the device into a lifeless lump of plastic and metal.
We conceived a bot that could check files and links on the fly and warn the user if it detects a threat. When antivirus protection is built into, say, e-mail, the antivirus can sit either on the mail hosting side or on the user's device. The Bot API makes it possible to organize protection differently, in a new paradigm: the bot runs neither on the user's machine nor on the service side, and it depends neither on the operating system nor on the performance of the device. The only requirement is that the Telegram client version supports bots. If a suspicious message arrives in Telegram itself, you can forward it to the bot right away. It is just as convenient to send the bot a dubious link obtained from elsewhere.
Let us say right away that this mechanism is not a full replacement for an antivirus. A bot cannot stop a user from clicking a dangerous link or launching a file; it can only warn about the danger, whereas an antivirus will protect even a careless victim of social engineering who immediately downloads and launches the Trojan. At the same time, Telegram's tech-savvy audience may be interested in an antivirus product that does not restrict their actions in any way but provides information on request. We think of the bot as a research project, and above all we are interested in feedback, which is why you are reading this article here.
Implementation
The bot is built on the Tornado framework, which, like a traffic controller at an intersection, coordinates data flows between the Telegram Bot API and the private API of our Dr.Web services. Initially we took the standard route and used Django. However, Django's model is such that precious time is lost during input and output (reading the request body, sending the response, working with the database, and so on). We ran an experiment with the Siege utility and concluded that this model was unsuitable for efficiently processing thousands of one-off requests.
So we started looking at asynchronous models and chose Tornado (where asynchrony is, in fact, the main feature). At present all of the bot's code is asynchronous, including downloading files, checking links, and even working with the database: when adding a record, the bot does not wait for a response from the server but carries on with other tasks.
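To illustrate the "do not wait for the database" idea, here is a minimal sketch. It uses the standard-library asyncio rather than Tornado, and it is not the bot's actual code: `save_record`, `check_link`, and `handle_message` are hypothetical names, and the sleeps stand in for real network calls.

```python
import asyncio

async def save_record(db_log, record):
    # Stand-in for an asynchronous database write (hypothetical helper).
    await asyncio.sleep(0.01)
    db_log.append(record)

async def check_link(url):
    # Stand-in for a request to a link-checking service (hypothetical helper).
    await asyncio.sleep(0.01)
    return "clean"

async def handle_message(url, db_log, background_tasks):
    # Schedule the database write without waiting for it to finish.
    background_tasks.append(asyncio.create_task(save_record(db_log, {"url": url})))
    # Carry on with the actual check while the write is in flight.
    return await check_link(url)

async def main():
    db_log, background_tasks = [], []
    urls = ["a.example", "b.example"]
    # Both messages are handled concurrently in a single thread.
    verdicts = await asyncio.gather(
        *(handle_message(u, db_log, background_tasks) for u in urls)
    )
    # Let pending writes complete before shutting down.
    await asyncio.gather(*background_tasks)
    return verdicts, db_log

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The handler never blocks on the write, so the event loop is free to serve other requests in the meantime.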

When messages intended for the bot arrive from the Telegram cloud, we need to parse the links in the received text. It is important to avoid discrepancies between how our parser works (that is, which page the bot checks) and how the Telegram parser works (that is, what the user will actually open by clicking in the messenger), so we followed Telegram's own link parsing as closely as possible, using the open-source web version as a reference. Their mechanism is probably not limited to this, though, and questions come up from time to time (for example, in the iOS mobile application the link “test.com:8080” without a protocol is displayed as “test.com : 8080” for the sender but as “Test.com:8080” for the recipient).
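A simplified sketch of extracting links from message text, including links written without a protocol, might look like this. The pattern below is an assumption made for illustration; it is far less thorough than the parser in the Telegram web client and is not the one the bot uses.

```python
import re

# Simplified pattern: optional scheme, dotted host name, optional port and path.
LINK_RE = re.compile(
    r"(?:(?P<scheme>https?)://)?"
    r"(?P<host>(?:[a-z0-9-]+\.)+[a-z]{2,})"
    r"(?P<port>:\d{1,5})?"
    r"(?P<path>/[^\s]*)?",
    re.IGNORECASE,
)

def extract_links(text):
    """Return normalized URLs found in text, defaulting to http:// when no scheme is given."""
    links = []
    for match in LINK_RE.finditer(text):
        scheme = (match.group("scheme") or "http").lower()
        host = match.group("host").lower()  # host names are case-insensitive
        port = match.group("port") or ""
        path = match.group("path") or ""
        links.append(f"{scheme}://{host}{port}{path}")
    return links

if __name__ == "__main__":
    print(extract_links("see test.com:8080 and HTTPS://Example.org/a?b=1"))
```

Normalizing the host to lower case means that “test.com:8080” and “Test.com:8080” resolve to the same URL, whatever a particular client displays.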
Links and files are then processed in several stages: unpacking archives, expanding shortened links, and following redirects. If a link leads to downloadable files, we download them, so the bot can check not only files sent through Telegram but also files behind external links.
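Two of these stages can be sketched with the standard library alone. This is an illustration under stated assumptions, not the bot's code: only ZIP archives are handled, and redirects are looked up in a hypothetical in-memory table (`redirect_map`) instead of being fetched over the network.

```python
import io
import zipfile

def unpack_archive(data, max_files=100):
    """Return {name: bytes} for a ZIP archive, or {'file': data} for anything else."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return {"file": data}
    with zipfile.ZipFile(buffer) as archive:
        # Skip directory entries and cap the number of files to unpack.
        names = [n for n in archive.namelist() if not n.endswith("/")][:max_files]
        return {name: archive.read(name) for name in names}

def resolve_redirects(url, redirect_map, max_hops=10):
    """Follow redirects in a lookup table and return every URL visited, in order."""
    chain = [url]
    while url in redirect_map and len(chain) <= max_hops:
        url = redirect_map[url]
        if url in chain:  # redirect loop, stop here
            break
        chain.append(url)
    return chain

if __name__ == "__main__":
    hops = {"short.example/x": "mid.example/y", "mid.example/y": "final.example/z"}
    print(resolve_redirects("short.example/x", hops))
```

Returning the whole chain rather than only the final URL lets every intermediate address be checked as well.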
To distribute the load on the servers more efficiently, the file and link caches are checked first. After that, the bot passes the baton to various Dr.Web technologies through our internal APIs: the Dr.Web Cloud service, the Scanning Engine antivirus engine, the Link Checker service, and the virus signature databases. Data exchange is asynchronous and multi-threaded, and as the load grows we can add capacity by bringing up new servers and adjusting settings in configuration files: scalability was built into the bot's architecture from the start.
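The cache-first order can be sketched as follows. The cache here is a plain dictionary keyed by the SHA-256 of the file, and the scanners are ordinary callables; both are assumptions made for illustration and say nothing about how the Dr.Web services are actually called.

```python
import hashlib

def check_file(data, cache, scanners):
    """Return (verdict, from_cache) for the given file bytes."""
    key = hashlib.sha256(data).hexdigest()
    if key in cache:
        # Known file: answer from the cache without touching the scanners.
        return cache[key], True
    verdict = "clean"
    for scan in scanners:
        if scan(data) == "infected":
            verdict = "infected"
            break  # one detection is enough
    cache[key] = verdict
    return verdict, False

def toy_signature_scan(data):
    # Hypothetical scanner: flags files that contain a marker string.
    return "infected" if b"EVIL" in data else "clean"

if __name__ == "__main__":
    cache = {}
    print(check_file(b"EVIL payload", cache, [toy_signature_scan]))
    print(check_file(b"EVIL payload", cache, [toy_signature_scan]))
```

The second call for the same content is served from the cache, which is what keeps repeated forwards of a popular file from reaching the scanning servers.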
Finally, the verified materials are returned to the bot, and it sends the results to users, taking into account the restrictions on the frequency of messages from bots that are set by the Telegram Bot API.
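One simple way to respect such limits is to space out messages per chat. The sketch below is an assumption about how this could be done, not a description of the bot: the `Throttle` class is hypothetical, and the one-second interval is an illustrative value rather than a figure quoted from the Bot API.

```python
class Throttle:
    """Spaces out messages so each chat gets at most one per min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._next_allowed = {}  # chat_id -> earliest time the next message may go out

    def delay_for(self, chat_id, now):
        """Return how long to wait before sending to chat_id, and reserve that slot."""
        send_at = max(now, self._next_allowed.get(chat_id, now))
        self._next_allowed[chat_id] = send_at + self.min_interval
        return send_at - now

if __name__ == "__main__":
    throttle = Throttle(min_interval=1.0)
    # Three results for the same chat arrive at once; they go out one second apart.
    print([throttle.delay_for(chat_id=42, now=0.0) for _ in range(3)])
```

Passing the current time in explicitly keeps the logic deterministic; in an asynchronous handler the returned delay would be awaited before the message is sent.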
Users can check links and files either in a private chat (by sending suspicious content to the bot or forwarding messages received from other users) or in a group chat: if you add the bot to a chat, it will check all files and links posted there.
The bot works in two modes: “quiet” and normal. In normal mode, the bot responds to every file or link with a message saying either that the link is safe or that downloading the file is not recommended. In a group chat such behavior can get in the way of the conversation, so we added a “quiet” mode. In this mode the bot speaks up only when a file or link in the chat contains a threat, warning users against a rash click or download. Verification error messages are also sent in “quiet” mode; otherwise a user who received no answer could mistakenly conclude that the link or file had been checked and found safe. You can select the mode with the /mode command.
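The reply rule for the two modes fits in a few lines. In this sketch the mode names follow the text above, while the outcome labels ("safe", "threat", "error") and the function name are hypothetical.

```python
def should_reply(mode, outcome):
    """Decide whether the bot posts a message for a check result.

    mode: "normal" or "quiet"; outcome: "safe", "threat", or "error".
    """
    if mode == "normal":
        return True  # every file or link gets an answer
    # Quiet mode: stay silent on safe content, but always report threats and failed checks.
    return outcome in ("threat", "error")

if __name__ == "__main__":
    for outcome in ("safe", "threat", "error"):
        print(outcome, should_reply("quiet", outcome))
```

Reporting errors even in quiet mode is what prevents silence from being mistaken for a clean verdict.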
As the API evolves, we will introduce new features if they prove useful for our tasks. Not long ago Telegram introduced inline mode, which lets a bot be used without adding it to the chat. So far this mechanism does not allow a file to be sent to the bot for verification, but we are considering using it. In the next updates we plan to make the bot faster and more reliable, and we are closely following user feedback.
A few words about localization (since it is not only my profession but also my passion): our bot can communicate in Russian, English, or German. There were no particular difficulties: we use the gettext library and store localization files in the .po format.
As a rule, all the texts for our products are written in a formal style, so using emoji in resource files was an interesting experience. In OS X they are supported out of the box, in Ubuntu it was enough to add a font to the system (sudo apt-get install ttf-ancient-fonts), and on Windows some tricks were needed so that translators could see emoji in the localization files. We tried inserting emoji into .po files as codes, but not every operating system can render them (for example, users of Windows desktop clients saw text codes instead). There seem to be two reasonable solutions: either pick a .po file editor that displays all emoji, or keep the codes in the files and convert them into emoji on our side. We are leaning toward the second option, but either way the user will never notice any of these torments.
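The second option, keeping codes in the .po files and converting them on the server side, could look roughly like this. The `:name:` code syntax and the two-entry table are assumptions made for the example, not the format used in the bot's resource files.

```python
import re

# Hypothetical mapping from text codes to emoji; a real table would be much longer.
EMOJI_CODES = {
    "warning": "\u26a0\ufe0f",      # warning sign
    "white_check_mark": "\u2705",   # check mark
}
CODE_RE = re.compile(r":([a-z0-9_]+):")

def render_emoji(text):
    """Replace known :code: markers with emoji and leave everything else untouched."""
    return CODE_RE.sub(lambda m: EMOJI_CODES.get(m.group(1), m.group(0)), text)

if __name__ == "__main__":
    print(render_emoji(":warning: The file contains a threat"))
```

Translators then work with plain ASCII in any editor, and unknown codes pass through unchanged, so a typo in a resource file is at least visible in the output.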
Another point we keep in mind during development: the same emoji look different on different devices and are not supported everywhere. Emojipedia helped here: it shows whether the emoji you need exists on various platforms, and you can copy the emoji or its code from there and paste it into a .po file.
And one small snag we ran into: Telegram does not allow the bot to be fully localized. The bot's description and the hints in the input field are always in a single language (in our case, English). We hope a solution will appear in upcoming releases of the Telegram Bot API.
Overall, development, internal testing, and localization took a team of 7 people 3 months. Our colleagues worked on the bot at a relaxed pace alongside their main tasks, so we had enough time to “meditate” on the logic of the bot's work. Load testing is the hardest thing to do in this mode: for the main stress test we invited several dozen employees with Telegram accounts, who fed the bot about a thousand files on an agreed signal. We hope the influx of curious testers from Habr will not knock us out, but if it does, we will add capacity over time, so please do not judge us too harshly.
As far as we know, no one has built antivirus bots before, so there is a wide field for experiments. We will be glad if you share your thoughts and your experience with our bot:
@drwebbot