📜 ⬆️ ⬇️

DIY mail cluster

Immediately make a reservation that the task can be solved in several ways. This is one of the possible.
The article is intended for those who know how Exim and Dovecot are configured and work, and I will not dwell on the basic settings of these services in it.

I hope that someone, after reading the note, will receive the necessary knowledge or ideas to implement their decision.

The task is to build a fault-tolerant service, with mail storage on servers, with access via IMAP.
The cluster will serve the company with approximately 60 branches, each of which has its own 3rd level domain.
')
The main task of the service, continuous access to the mail. Therefore, for storage we will use two geographically separated servers, with synchronization of mail directories.
Both servers will be active, which means that we will distribute the load between the nodes. Some domains will be served by one node, some domains by another. In case of failure of one of the nodes, customers switch to another.
We will use Nginx with the mail module as a front-end for load distribution of client routing. To receive mail, we will use two smtp servers.

Scheme:


STORAGE : Mailbox Store. It consists of two nodes.
Each node is a dedicated server with 2x4TB HDD located on different hostings
DNS: storage-01.domain.ru and storage-02.domain.ru
OS: FreeBSD,
Software: Dovecot, Exim, Postgresql and Nginx

SMTP : Servers processing SMTP traffic, two nodes.
Virtual servers located on different hosting,
DNS: smtp-01.domain.ru and smtp-02.domain.ru
OS: FreeBSD,
Software: Exim, Postgresql

PROXY : Proxy server for user access to the IMAP, POP3, SMTP service.
Virtual server The only non-duplicated link in the cluster, but due to its simplicity, it rises within a few minutes from snapshot.
DNS: mail.domain.ru
OS: FreeBSD,
Software: Nginx

STORAGE.
Dovecot was chosen as MDA since it is able to cluster out of the box. To store mail, the Maildir format was chosen, since deduplication immediately wanted, but more on that below.
Datastors accept mail only from their smtp servers and PROXY. Mail to the world sent by themselves, bypassing the smtp servers. You can hide them altogether, and send outgoing mail through smtp nodes.
Path in the file system to the boxes / usr / mail / domain level 2 / domain level 3 / box /
In the authorization, the full mailbox mail@ldomain.mdomain.ru is used as the login

DB:
Description of tables:
mail table to store mailboxes


ldomain table to describe 3rd level domains

mdomain table for the description of domains of the 2nd level

maps routing table

As I wrote above, I distribute the load (mail domains) across two storage, in the maps table it determines on which of the storage the 3rd level domain is located.

mail=# select * from maps limit 3; id | ldomain_id | mdomain_id | storage1 | storage2 ----+------------+------------+--------------------+--------------- 56 | 56 | 2 | storage-01.domain.ru | storage-02.domain.ru 57 | 57 | 2 | storage-02.domain.ru | storage-01.domain.ru 58 | 58 | 2 | storage-01.domain.ru | storage-02.domain.ru (3 ) 

Based on this table, Exms of storages and smtp nodes will determine where to send letters. And Nginx, to what storazhu connect users.

Creating a database and tables:
  psql -Upgsql template1 ctreate database mail; \q 

 CREATE TABLE mail ( "id" BIGSERIAL PRIMARY KEY, "mailbox" CHARACTER VARYING(32) not null, "password" CHARACTER VARYING(128), "ldomain_id" int NOT NULL, "mdomain_id" int NOT NULL, active BOOLEAN DEFAULT TRUE NOT NULL, CONSTRAINT "mail_ldomain_id_check" CHECK (("ldomain_id" > 0)) ); CREATE TABLE "ldomain" ( "id" BIGSERIAL PRIMARY KEY, "domain" CHARACTER VARYING(32) NOT NULL, "active" BOOLEAN DEFAULT TRUE NOT NULL, CONSTRAINT ldomain_k UNIQUE (domain) ); CREATE TABLE "mdomain" ( "id" BIGSERIAL PRIMARY KEY, "domain" CHARACTER VARYING(32) NOT NULL, "active" BOOLEAN DEFAULT TRUE NOT NULL, CONSTRAINT mdomain_k UNIQUE (domain) ); CREATE TABLE "maps" ( "id" SERIAL PRIMARY KEY, "ldomain_id" int NOT NULL, "mdomain_id" int NOT NULL, "storage1" CHARACTER VARYING(32) NOT NULL, "storage2" CHARACTER VARYING(32) NOT NULL, CONSTRAINT maps_ldomain_k UNIQUE (ldomain_id) ); 


Dovecot
Dovecot performs the MDA function. I will leave the basic setting of Dovecot beyond the scope of this article, I’ll dwell only on those points that are important for connecting it with DB and MTA
 /usr/local/etc/dovecot/dovecot.conf protocols = imap pop3 lmtp #    Exim   LMTP 

 /usr/local/etc/dovecot/dovecot-sql.conf.ext driver = pgsql connect = host=localhost dbname=mail user=mail password=password default_pass_scheme = MD5 iterate_query = \ SELECT mail.mailbox || '@' || ldomain.domain || '.' || mdomain.domain AS user \ FROM mail \ INNER JOIN mdomain ON ( mail.mdomain_id = mdomain.id ) \ INNER JOIN ldomain ON ( mail.ldomain_id = ldomain.id ) password_query = \ SELECT mail.mailbox || '@' || ldomain.domain || '.' || mdomain.domain AS mail, mail.password \ FROM mail \ INNER JOIN mdomain ON ( mail.mdomain_id = mdomain.id ) \ INNER JOIN ldomain ON ( mail.ldomain_id = ldomain.id ) \ WHERE mailbox = '%n' AND \ ldomain.domain || '.' || mdomain.domain = '%d' AND \ mail.active = true AND \ ldomain.active = 'true' user_query = \ SELECT '/usr/mail/' || ldomain.domain || '.' || mdomain.domain || '/' || mail.mailbox AS home \ FROM mail \ INNER JOIN ldomain ON ( mail.ldomain_id = ldomain.id ) \ INNER JOIN mdomain ON ( mail.mdomain_id = mdomain.id ) \ WHERE mail.mailbox = '%n' AND \ ldomain.domain || '.' || mdomain.domain = '%d' 

 /usr/local/etc/dovecot/conf.d/10-auth.conf auth_username_format = %Lu #    mail@ldomain.mdomain.ru !include auth-sql.conf.ext 

 /usr/local/etc/dovecot/conf.d/10-mail.conf mail_location = maildir:/usr/mail/%d/%n/Maildir #      /usr/mail/ 2- / 3- // 


Storage synchronization
Initially, I set up synchronization with the means of Dovecot (dsync) myself, but during operation a very unpleasant problem got out. As it turned out, the problem was related to the type of storage Maildir. Dsync began to fail, to produce copies of letters by fattening the free space on the disks. By that time, I could not transfer all mailboxes to dbox (Dovecot proprietary format), so I had to abandon synchronization via dsync. In general, there were no other complaints about this mechanism.
I had to contact rsync, with a simple script it takes from the database those domains that are serviced by the server on which it starts up and synchronizes their directories to the second server. Correspondingly, on the second server the same script drives its first directories to the first one. Of course, this mechanism is less reliable since rsync runs on a schedule, there is a window between launches in which if the server goes down we lose letters.

the script is launched with two parameters - local_server_name remote_server_name
 #mailrsync.pl storage-01.domain.ru storage-02.domain.ru 

sync script:
 #!/usr/local/bin/perl use DBI; use threads; use Net::Nslookup; use Sys::Hostname; @host = split('\.',hostname); $dbn="mail"; $dbuser="mail"; $dbpass = "password" $curdata=`date +%Y-%m`; chop $curdata; $conn=DBI->connect("DBI:Pg:dbname=$dbn;host=localhost","$dbuser","$dbpass") or die "Cannot connect"; ($localhostname,$remotehost)=@ARGV; $mail_dir = "/usr/mail/"; sub domains { $q = "SELECT ldomain.domain,mdomain.domain,maps.storage1 FROM mail INNER JOIN ldomain on (mail.ldomain_id = ldomain.id) INNER JOIN mdomain on (mail.mdomain_id = mdomain.id) INNER JOIN maps on (maps.ldomain_id=ldomain.id) WHERE maps.storage1='".$localhostname."' AND mail.mailbox ='dir'"; $domain = $conn->prepare($q) or die "Can't prepare statement: $DBI::errstr"; $domain->execute(); while ( my @domain = $domain->fetchrow_array ) { @domains=(@domains,$domain[0].".".$domain[1]); } print "count of domains: ".($#domains + 1)."\n"; $dt = 2; #      $count = ($#domains / $dt ); print "count: ".$count."\n"; $i1 = 0; for ($i2 = 0; $i2< $count; $i2++){ if ($dt > $#domains ){$dt = $#domains ;} print $dt."\n"; print "loop: ".$i2."\n"; foreach $item (@domains[$i1..$m]){ print "in \@domains: ".$mail_dir.$item."\n"; @stack = (@stack,$mail_dir.$item."/"); } push @threads,threads->create(\&sync,\@stack); $i1 = $dt+1; $dt = $dt + 2; @stack=(); } } sub sync { print "sync\n"; foreach $target (@stack){ system(`/usr/local/bin/rsync -H --delete-during -azz -e "/usr/bin/ssh -i /root/.ssh/dovecot_dsa" $target vmail\@$remotehost:$target`); print $target."\n"; } } domains(); foreach $thread (@threads) { $thread->join(); } 


On this with Dovecot, everything.

Exim
we define local domains based on the records in the maps table so that Exim “knows” its domains.
 domainlist LOCAL_DOMAINS = \ ${lookup pgsql{\ SELECT ldomain.domain || '.' || mdomain.domain AS domainname \ FROM ldomain, mdomain,maps \ WHERE ldomain.domain || '.' || mdomain.domain = LOWER('${quote_pgsql:$domain}') \ AND ldomain.active = 'true' \ AND maps.storage1 = 'storage-01.domain.ru' \ AND maps.ldomain_id = ldomain.id}} 


in hostlist relay_from_hosts I specify the addresses of the smtp node and the proxy, I accept mail from them without authorization (clients are authorized to proxy).
  relay_from_hosts = localhost : smtp01.domain.ru : smtp02.domain.ru : mail.domain.ru 

I give incoming mail through LMPT Dovecot. Everything else is standard. Requests to the database to search for boxes and passwords are the same as in the listing for Dovecot

SMTP nodes
The database is the same as on the stack, except that the mail table does not have a password field, because users do not connect to these servers. smtp nodes handle only incoming traffic from the world. The database checks if the box exists, skipping further letters only for existing boxes.

Exim
Standard config, except for the request to determine the route
 ROUTE_LIST = "${lookup pgsql{\ SELECT COALESCE(storage1,'') || ' : ' || COALESCE(storage2,'') \ FROM (\ SELECT storage1,storage2 \ FROM maps \ INNER JOIN ldomain ON ( maps.ldomain_id = ldomain.id ) \ INNER JOIN mdomain ON ( maps.mdomain_id = mdomain.id ) \ WHERE ldomain.domain || '.' || mdomain.domain = '${quote_pgsql:$domain}' \ UNION ALL \ SELECT storage1,storage2 \ FROM co_maps \ INNER JOIN co_domain ON ( co_maps.domain_id = co_domain.id ) \ WHERE co_domain.domain = '${quote_pgsql:$domain}') AS foo}}" 

SQL query pulls the name of the stack for the recipient, then the address of the stack is specified in the router, in the directive route_list. Thus, the letter is sent to the site where the active domain for this mailbox is located.
 begin routers DATASTORE: driver = manualroute domains = DOMAINS transport = remote_smtp condition = MAILS route_list = * ROUTE_LIST no_more 

Requests to the database to search for boxes and passwords are the same as in the listing for Dovecot.

PROXY
Dovecot can act as a proxy, but I chose Nginx, it seemed simpler and clearer in this regard. There was one task, to tell nginx in some way where to send the user.

nginx.conf on PROXY
 cat /usr/local/etc/nginx/nginx.conf worker_processes 1; worker_rlimit_nofile 8192; pid /var/run/nginx.pid; error_log /var/log/nginx-error.log debug; error_log /var/log/nginx-error.log notice; error_log /var/log/nginx-error.log info; events { worker_connections 8192; multi_accept on; use kqueue; } mail { ssl_certificate /usr/local/etc/ssl/proxy.crt; ssl_certificate_key /usr/local/etc/ssl/proxy.key; ssl_session_timeout 5m; xclient off; auth_http storage-01.domain.ru:8185/auth; pop3_capabilities "LAST" "TOP" "USER" "PIPELINING" "UIDL" "RESP-CODES" "EXPIRE" "IMPLEMENTATION"; imap_capabilities "IMAP4" "IMAP4rev1" "UIDPLUS" "IDLE" "LITERAL+" "QUOTA" "LIST-EXTENDED"; smtp_capabilities "SIZE 52428800" "8BITMIME" "PIPELINING" "STARTTLS" "HELP"; server { smtp_auth login plain; listen 25; protocol smtp; proxy on; starttls on; } server { smtp_auth login plain; listen 587; protocol smtp; proxy on; starttls on; } server { listen 110; protocol pop3; proxy on; starttls on; } server { listen 995; protocol pop3; proxy on; starttls on; } server { listen 143; protocol imap; proxy on; starttls on; } server { listen 993; protocol imap; proxy on; starttls on; } } 

Pay attention to the directive auth_http storage-01.domain.ru:8185/auth;
Nginx also works on the stack (on both!), But in the web server mode, with the sole purpose of processing storage-01.domain.ru:8185/auth request
This request in case of successful authorization of the client returns the authorization status, the name of the store and the port of the service
 "Auth-Status", "OK"; "Auth-Server", "storage-01.domain.ru"; "Auth-Port", "143"; 

After that, nginx on PROXY sends a client to the store who returned in response.
One could of course exclude nginx on the stack, but for this, it would be necessary to keep a base with users on PROXY. In general, there could be options.

Below is the Nginx config on the stack, with a perl module for implementing the above.
 worker_processes 4; worker_rlimit_nofile 8192; error_log /var/log/nginx-error.log info; events { worker_connections 8192; multi_accept on; } http { perl_modules perl/lib; perl_require mailauth.pm; perl_require Digest.pm; access_log off; server { listen 8185; ssl_certificate /usr/local/etc/ssl/storage-01.crt; ssl_certificate_key /usr/local/etc/ssl/storage-01.key; ssl_session_timeout 5m; location /auth { perl mailauth::handler; proxy_set_header X-Real-IP $remote_addr; } } } 


mailauth.pm module
 package mailauth; use nginx; use DBI; use Net::Nslookup; use Digest::MD5 qw(md5_hex); $pg_user = "mail"; $pg_pass = "password"; $passhost = "localhost"; $mapshost = "localhost"; our $auth_ok; $protocol_ports->{'pop3'}=110; $protocol_ports->{'imap'}=143; $protocol_ports->{'smtp'}=25; $protocol_ports->{'smtpssl'}=465; sub handler { $r = shift; $Passdbh=DBI->connect("DBI:Pg:dbname=mail;host=$passhost","$pg_user","$pg_pass"); if (!$Passdbh) { $r->header_out("Auth-Status", "OK") ; $r->header_out("Auth-Server", '0.0.0.0'); $r->header_out("Auth-Port", $protocol_ports->{$r->header_in("Auth-Protocol")}); $r->send_http_header("text/html"); return OK; exit; }; $Mapsdbh=DBI->connect("DBI:Pg:dbname=mail;host=$mapshost","$pg_user","$pg_pass"); $auth_ok=0; $mailbox = $r->header_in("Auth-User"); our $get_pass_from_db=$Passdbh->prepare("SELECT password FROM mail INNER JOIN ldomain ON ( mail.ldomain_id = ldomain.id ) INNER JOIN mdomain ON ( mail.mdomain_id = mdomain.id ) WHERE mail.mailbox || '\@' || ldomain.domain || '.' || mdomain.domain = ? "); $get_pass_from_db->execute($mailbox); @row=$get_pass_from_db->fetchrow_array(); $passfromDB=@row[0]; $md5passFromConnect = md5_hex($r->header_in("Auth-Pass")); if ( $passfromDB eq $md5passFromConnect ){ $auth_ok=1; } if ($auth_ok==1){ @domain = split('\@',$mailbox); $get_server_from_maps = $Mapsdbh->prepare( "SELECT storage1 FROM maps INNER JOIN ldomain ON ( maps.ldomain_id = ldomain.id ) \ INNER JOIN mdomain ON ( maps.mdomain_id = mdomain.id ) \ WHERE ldomain.domain || '.' || mdomain.domain = ? " ); $get_server_from_maps->execute(@domain[1]); @row=$get_server_from_maps->fetchrow_array(); $server_from_maps = nslookup(host => $row[0], type => "A"); $r->header_out("Auth-Status", "OK") ; $r->header_out("Auth-Server", $server_from_maps); $r->header_out("Auth-Port", $protocol_ports->{$r->header_in("Auth-Protocol")}); } else { $r->header_out("mail:", $r->header_in("Auth-User")); $r->header_out("Auth-Status", "Invalid login or password") ; } $r->send_http_header("text/html"); return OK; } sub db_fail { $r->header_out("Auth-Status", "OK") ; $r->header_out("Auth-Server", '127.0.0.1'); $r->send_http_header("text/html"); } 1; __END__ 


Adjustment of balancing, and switching to the backup node
Now switching to the backup node occurs in manual mode. Just in the maps table, the value in the storage1 field changes. T.K. all servers are monitored by monitoring so far have been enough.

CONCLUSION
The cluster has been operating for 3 years. During this time, survived several drops of one of the nodes (as a result, this node moved to another DC).

Someone may seem this design complex, confusing and "bicycle". But I want to emphasize that the architecture of this solution was based on the decision to use cheap, "unreliable" hardware. As a result, we have a reliable service with a minimum cost of server rental.

Perhaps the note was not clear enough, and I did not display some important details. In the comments, if they will, I will supplement.

PS The article turned out to be long, so I’ll tell you about the deduplication and another, not mentioned here service management of this cluster in the next article, if this is interesting.

Thanks for attention!

Source: https://habr.com/ru/post/324538/


All Articles