Historically, we have paid far more attention to filtering spam in incoming mail and almost completely forgot about outgoing mail.
When we started to analyze the situation, we ran into the fact that we cannot really say who is sending the “garbage” in our mail traffic, because client addresses are assigned dynamically. SpamAssassin does not help much either (for now), since outgoing spam scores almost two times lower than incoming spam.
So, as a first step, we decided to conduct a small study, which is described below.
Initial data
Our starting point:
- A billing system. In our case it is Abills, but this example can be adapted to any billing system.
- Exim, configured along the lines of almost any of the MySQL-based configs found on the Internet.
- The MySQL DBMS itself. In our case there were two separate servers: one for billing and a second for the statistics that we are going to collect.
What we want
The main task, of course, is to find out which user is “spamming” through us. To do this:
- We find out the IP address that sends the mail.
- By that IP address we find the user who is currently using it.
- We write the information needed for later analysis into a table (login, ip, email_from, email_to, email_time, spam_score).
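The definition of the statistics table is not shown in this note. A minimal sketch of what it might look like follows; only the column names and their order come from the list above, while the types, lengths and the index are assumptions. The order matters because the INSERT used later passes its values positionally.

```sql
-- Hypothetical definition of the statistics table (MySQL).
-- Column names and order follow the list above; types and sizes are assumed.
CREATE TABLE statistics (
    login      VARCHAR(64)  NOT NULL,  -- billing login, or 'unknown'
    ip         VARCHAR(45)  NOT NULL,  -- sender IP as text ($sender_host_address)
    email_from VARCHAR(255) NOT NULL,  -- envelope sender
    email_to   VARCHAR(255) NOT NULL,  -- envelope recipient
    email_time DATETIME     NOT NULL,  -- filled with NOW() on insert
    spam_score INT          NOT NULL,  -- $spam_score_int: spam score multiplied by 10
    KEY idx_login_time (login, email_time)  -- assumed index for per-user reports
);
```

Since Exim's $spam_score_int is the spam score multiplied by ten, an integer column is enough here.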
Point by point:
- The IP address is taken from the Exim variable $sender_host_address.
- Since the dv_calls table in the Abills database contains the current online sessions, we find the user who holds this address with the query:
SELECT concat("login=",user_name) FROM dv_calls WHERE INET_NTOA(framed_ip_address)='${quote_mysql:$sender_host_address}';
Note that the result is returned as a parameter=value pair. In the Exim config it looks like this:
GET_LOGIN = SELECT concat("login=",user_name) FROM dv_calls WHERE INET_NTOA(framed_ip_address)='${quote_mysql:$sender_host_address}';
- This is the macro that we will run while the message is being checked by the anti-spam.
- The actual insertion of the data via INSERT is also performed during the anti-spam check:
ADD_STATISTICS = INSERT INTO statistics VALUES ('$acl_m1','${quote_mysql:$sender_host_address}',\
'${quote_mysql:$sender_address}','${quote_mysql:$acl_m4}',NOW(),$spam_score_int);
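One possible variation on the login lookup, given only as a sketch: wrapping the column in INET_NTOA() makes MySQL convert every row before comparing, so an index on framed_ip_address (if one exists, which is an assumption) cannot be used. Converting the incoming address with INET_ATON() keeps the column bare. The IP literal below is a placeholder for ${quote_mysql:$sender_host_address}.

```sql
-- Same lookup with the conversion moved to the constant side.
-- '192.0.2.10' stands in for the value Exim substitutes at run time.
SELECT CONCAT('login=', user_name)
FROM dv_calls
WHERE framed_ip_address = INET_ATON('192.0.2.10');
```

On a small table of online sessions the difference is likely negligible, so the original form is fine too.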
Exim
Let's briefly go over the Exim configuration:
- Define the two macros:
ADD_STATISTICS = INSERT INTO statistics VALUES ('$acl_m1','${quote_mysql:$sender_host_address}',\
'${quote_mysql:$sender_address}','${quote_mysql:$acl_m4}',NOW(),$spam_score_int);
GET_LOGIN = SELECT concat("login=",user_name) FROM dv_calls WHERE INET_NTOA(framed_ip_address)='${quote_mysql:$sender_host_address}';
- In acl_smtp_rcpt we add, as the very first item:
warn
  hosts = LOCAL_NETS
  set acl_m4 = $local_part@$domain
- This is something of a hack: we want to write both the sender's address and the recipient's address into the table, but at the point where we do this the variables $local_part and $domain are already undefined (I don't know whether this is just my setup or Exim in general, so I am waiting for your comments on this).
- At the very beginning of acl_smtp_data we add the following:
warn
  hosts = LOCAL_NETS
  set acl_m0 = ${lookup mysql{GET_LOGIN}{$value}{login=unknown}}
  set acl_m1 = ${extract{login}{$acl_m0}{$value}{unknown}}
warn
  hosts = LOCAL_NETS
  spam = nobody:true
  set acl_m2 = ${lookup mysql{servers=localhost; ADD_STATISTICS}{$value}{0}}
In the first half of this code we determine the login from the sender's IP address and store it in the variable acl_m1. If we cannot unambiguously determine the client's login, we write unknown (in our case these are messages from servers and monitoring services).
In the second half we run the anti-spam check on mail from all OUR clients. Note the form servers=localhost; ADD_STATISTICS: here we explicitly say that the query must be executed on the local server and not on the billing server. This Exim syntax makes it possible to use any number of different DBMS connections.
Restart Exim and watch the results come in.
Preliminary findings
During one day of running this scheme, we found two users who were hotbeds of spam. They sent 181 messages from fake addresses with an average spam score of 24 (on our anti-spam scale). Since our anti-spam was tuned to a completely different range of scores (50 is a warning, 70 is the cut-off), it naturally let them through.
In the end, organizational conclusions were drawn, and the list of culprits was handed over to the relevant departments for further action (fines, blocking, massacre, etc.).
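How these figures were pulled out of the table is not shown here. A query along the following lines is one way to get per-user totals; it assumes the column names listed earlier, and the one-day window and the row limit are arbitrary choices for illustration.

```sql
-- Per-user totals for the last 24 hours, heaviest senders first.
-- Column names follow the list given earlier; window and limit are assumptions.
SELECT login,
       COUNT(*)                   AS messages,
       COUNT(DISTINCT email_from) AS distinct_senders,  -- many senders hints at forged addresses
       AVG(spam_score)            AS avg_score,
       MAX(spam_score)            AS max_score
FROM statistics
WHERE email_time >= NOW() - INTERVAL 1 DAY
  AND login <> 'unknown'          -- skip server and monitoring messages
GROUP BY login
ORDER BY messages DESC
LIMIT 20;
```

A user with a high message count, many distinct sender addresses and an average score well above the other users is a candidate for a closer look, even if no single message reaches the warning threshold.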
PS
Almost all of the information came from the specification. The idea of working with the DBMS came from this note, although it is about greylisting, which we plan to introduce later.