In the article “What ‘bots of the dark side of the Force’ look for on sites, and why” we reviewed typical examples from the logs of various sites. A much more interesting exercise, however, is a variation on the “radio game” from intelligence work. What it is and how to set it up is described below.
First, the main assumptions. If they do not apply to you, it is better not to waste your time reading further.
So, the main assumptions:
- you are interested in information security or web administration, or you are studying a related specialty;
- you have a little motivation, time and resources that you can spend to feel like an explorer;
- you do not expect to become a super guru overnight, but by developing the partial solutions suggested in this article you can study some of the issues with interest.
A honeypot, in short, is a kind of trap with which a researcher collects material. Information on the varieties and existing solutions, including open-source ones, is easy to find online, so we will not dwell on them.
Let's get down to business:
- we take hosting;
- we take a domain;
- we route all incoming requests to our script;
- we analyze incoming requests and, in addition to accumulating statistics, join the game.
We take hosting
First, decide where the honeypot will be hosted. To lower the barrier to entry, choose shared hosting: it takes care of system administration (installation, optimal configuration, protection and updates) quickly and reasonably cheaply. The servers of hosting companies (the IP ranges of their web servers) are well known and never suffer from a lack of attention from bots.
Those interested can go straight to a VPS/VDS; the main thing is not to get stuck at the server setup stage.
We take the domain
A new domain, as a rule, attracts the attention of bots right away, although “old” domains work just as well. If you use a live domain (site), there may be side effects from redirect errors or excessive load.
By a rough estimate, as of early 2015 a new domain and several months of hosting cost about 1000 rubles.
Route all incoming requests to our script
There are many ways to do this, depending on the web server and on how much control you have over its settings. The simplest option, proposed here, suits a new domain: it does not interfere with any main site and lets us move quickly to what is, in our opinion, the most interesting part.
The simplest option (.htaccess):
RewriteEngine On
RewriteRule .* index.php [L]
All requests are now routed to index.php.
To fine-tune the redirects, especially on a live site, take the time to consult the documentation or articles such as “How mod_rewrite actually works. A guide for those continuing”.
We analyze incoming requests, accumulate statistics
The script that receives the requests implements the following functionality:
- logging selected data from $_SERVER to accumulate statistics;
- searching for patterns (templates) in the incoming data from $_SERVER;
- an efficient mechanism for attaching handlers to particular patterns (on efficiency, see “Non-standard optimization of PHP projects”);
- (for the future) a simplified, lightweight server-side session mechanism.
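The first three items can be sketched as follows. This is only an illustration in Python, not the author's script (which is PHP): the dictionary `server` stands in for `$_SERVER`, and the handler names, the log format and the two sample patterns are assumptions made for the example.

```python
import json
import re
import time

# Fields worth keeping; the names mirror PHP's $_SERVER keys.
LOGGED_KEYS = ("REMOTE_ADDR", "REQUEST_METHOD", "REQUEST_URI",
               "HTTP_USER_AGENT", "HTTP_REFERER", "QUERY_STRING")

def log_request(server, log_path):
    """Append one JSON line with the selected request fields."""
    record = {key: server.get(key, "") for key in LOGGED_KEYS}
    record["time"] = int(time.time())
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record

# Hypothetical handlers: each returns (status, body) for the fake response.
def fake_login_page(server):
    return 200, "<form method='post'><input name='log'><input name='pwd'></form>"

def fake_not_found(server):
    return 404, "Not Found"

# (field, compiled pattern, handler); the first matching rule wins.
# Both patterns are illustrative assumptions, not taken from real logs.
RULES = [
    ("REQUEST_URI", re.compile(r"/wp-login\.php", re.I), fake_login_page),
    ("REQUEST_URI", re.compile(r"\.(bak|sql|zip)$", re.I), fake_not_found),
]

def dispatch(server):
    """Return the response of the first handler whose pattern matches."""
    for field, pattern, handler in RULES:
        if pattern.search(server.get(field, "")):
            return handler(server)
    return fake_not_found(server)
```

A rule table like this keeps the matching cheap: handlers are only called for the request that needs them, and a new bot usually means adding one line to `RULES`.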
Join the game
Finally, we come to the main point. What does the game consist of?
After analyzing the statistics, you choose the bot you want to study. You can try to identify a bot by various signs (IP ranges, scan times, User-Agent, specific URL requests, etc.).
After that, you play along with the bot: by serving the information and files it expects, you document its behavior in full, from scanning to attempts to use exploits, non-standard calls, downloads of specific files, etc.
For example, if a bot expects some CSS file, serve it one; if it then tries to access a specific file, find information about that file online and serve it; if it sends parameters, try to fake the answer, and so on. This is where a lightweight implementation of sessions comes in handy.
Of course, between the first request and a complete chain of answers there may be several iterations involving guesswork and manual searching for information. But this is a battle of minds (you versus the developer of the bot's algorithm), real chess!
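As a rough illustration of that first pass over the statistics, the sketch below groups logged requests by IP address and User-Agent and ranks the clients by activity. It is written in Python and assumes each log record is a dictionary with `$_SERVER`-style keys; the function names are hypothetical.

```python
from collections import defaultdict

def group_by_client(records):
    """Group logged requests by (IP, User-Agent) and collect the URLs each client asked for."""
    clients = defaultdict(list)
    for rec in records:
        key = (rec.get("REMOTE_ADDR", ""), rec.get("HTTP_USER_AGENT", ""))
        clients[key].append(rec.get("REQUEST_URI", ""))
    return clients

def busiest_clients(records, top=5):
    """Return up to `top` clients with the most requests, busiest first."""
    clients = group_by_client(records)
    ranked = sorted(clients.items(), key=lambda item: len(item[1]), reverse=True)
    return ranked[:top]
```

The list of URLs per client is often the most telling sign: a scanner tends to request the same sequence of paths on every site it visits.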
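One possible shape for such lightweight sessions is sketched below in Python. It assumes that bots do not return cookies, so the client is recognized by a fingerprint of its IP address and User-Agent; the class name, the TTL value and the in-memory dictionary are all assumptions for the example. A PHP script on shared hosting would have to persist the same data between requests, for instance in a file.

```python
import hashlib
import time

class BotSessions:
    """Tiny session store keyed by a client fingerprint instead of a cookie."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.sessions = {}

    @staticmethod
    def fingerprint(server):
        """Hash the IP and User-Agent into a stable session key."""
        raw = server.get("REMOTE_ADDR", "") + "|" + server.get("HTTP_USER_AGENT", "")
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def touch(self, server, now=None):
        """Record the request and return the session (a fresh one if unknown or expired)."""
        now = time.time() if now is None else now
        key = self.fingerprint(server)
        session = self.sessions.get(key)
        if session is None or now - session["last_seen"] > self.ttl:
            session = {"step": 0, "history": []}
            self.sessions[key] = session
        session["step"] += 1
        session["history"].append(server.get("REQUEST_URI", ""))
        session["last_seen"] = now
        return session
```

The `step` counter and `history` list let a handler decide which link of the answer chain the bot has reached, for example serving a different reply on the second request than on the first.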
A little hint
To make it harder for bots to detect your analysis, it is advisable (within reasonable limits) to add an element of randomness to the responses. Namely, if your algorithm does not yet know the “correct answer” for the bot, or the request has not been seen before, return a message simulating a server error or an empty file with XX% probability; if the bot tries SQL injection, return a plausible error message from the DBMS or PHP, and so on.
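A minimal Python sketch of this hint follows. The probability value, the decoy texts and the crude SQL-injection check are placeholders chosen for illustration (the article leaves the probability as XX%), and the query string is assumed to be URL-decoded already.

```python
import random

# Hypothetical decoy responses as (status, body); the texts only imitate real errors.
DECOYS = [
    (500, "Internal Server Error"),
    (200, ""),  # an empty file
]
SQL_ERROR = (200, "You have an error in your SQL syntax; check the manual that "
                  "corresponds to your MySQL server version")
# Very rough signs of an injection attempt; an assumption, not a real detector.
SQL_HINTS = ("'", "union select", " or 1=1")

def decoy_response(query, decoy_probability=0.3, rng=random):
    """Pick a response for a request that no handler recognised."""
    lowered = query.lower()
    if any(hint in lowered for hint in SQL_HINTS):
        return SQL_ERROR
    if rng.random() < decoy_probability:
        return rng.choice(DECOYS)
    return 404, "Not Found"
```

Passing a seeded `random.Random` instance as `rng` makes the behavior reproducible while you are debugging a particular chain of answers.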
Instead of a conclusion
Go for it! And may your work do some good.
Anticipating suggestions to publish the finished code right away, here is why that has not been done:
- so as not to hinder the flight of imagination;
- so that students of the relevant specialties and departments do not simply copy and paste (hello to the KSIMU at TSU);
- so as not to make life easier for bot herders, who would immediately filter out novice researchers testing the article's code (had it been provided).