
High Availability FTP Server with AWS S3 Data Storage

Good afternoon, dear readers.
Once again I want to share some experience I have gained. On one of my projects the goal was to set up an FTP server of increased reliability. By increased reliability the following was meant: the data is stored in AWS S3, the FTP service stays available even if the server itself fails, and, optionally, the load can be spread across several servers.
Step one: Install s3fs and mount the S3 bucket as a disk partition.
There are not many options here, or rather just one (correct me if I am wrong): s3fs. The s3fs developers claim on their page that "s3fs is stable and is being used in a number of production environment". There is no point in describing the s3fs installation process in detail here; it is covered in the project documentation. I will dwell only on the really important points.

First, the latest version of s3fs has problems with data synchronization. When you upload a new file to S3, it immediately appears on your server, but if you later change that file on S3, the server still keeps the old version: there is a problem with caching. Attempts to mount the S3 bucket with various options enabling and disabling the cache did not help. After testing various releases of s3fs, a version was found in which this bug does not show up. Download that package, unpack it and install it as described in the Makefile.

For s3fs to work properly, make sure its dependency packages are already installed on the system. To check, you can try to mount the bucket with the command:
#/usr/bin/s3fs mybucket /mnt/mybucket/ -o accessKeyId=XXXXXXXXXXXXX -o secretAccessKey=YYYYYYYYYYYYYYYYY -o allow_other,rw -o readwrite_timeout=120; 
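Passing the keys on the command line works, but they end up visible in the process list. As a hedged alternative (not what was used in the setup described here), s3fs can read credentials from a password file; a minimal sketch, assuming the bucket is still called mybucket and the ACCESS_KEY:SECRET_KEY file format that s3fs expects:

 # create the credentials file once, readable only by root
 echo "XXXXXXXXXXXXX:YYYYYYYYYYYYYYYYY" > /etc/passwd-s3fs
 chmod 600 /etc/passwd-s3fs
 # mount using the credentials file instead of command-line keys
 /usr/bin/s3fs mybucket /mnt/mybucket/ -o passwd_file=/etc/passwd-s3fs -o allow_other,rw -o readwrite_timeout=120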

Step two: Install pure-ftpd.
It would seem there is nothing interesting here: just install it with any package manager. However, pure-ftpd is notable for its paranoia, and before deleting a file it first copies it to a new temporary file. When the file is several gigabytes in size, this procedure takes extra time, and in our case, with the data stored not locally but on S3, it takes quite a lot of time.
To disable the creation of temporary files before deletion, I rebuilt pure-ftpd with the --without-sendfile option. Of course, it would have been better to build a proper package and install it into the system, but I did it in a hurry and did not spend time on that.
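A minimal sketch of such a rebuild from source, assuming a generic pure-ftpd tarball (the version number and download URL are illustrative; the second option, --with-virtualchroot, is explained in the next step):

 # download and unpack the pure-ftpd sources (version is illustrative)
 wget https://download.pureftpd.org/pub/pure-ftpd/releases/pure-ftpd-1.0.36.tar.gz
 tar xzf pure-ftpd-1.0.36.tar.gz
 cd pure-ftpd-1.0.36
 # build without sendfile (no temporary copy before delete, as described above)
 # and with virtualchroot (needed in step three to follow symlinks)
 ./configure --without-sendfile --with-virtualchroot
 make && make install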

Step three: Setting up user permissions.
This is one of the most interesting nuances. According to the customer's requirements, each user's home folder should contain directories that the user either can or cannot write to. If we were dealing with regular disk partitions, we could simply change the owner of a folder. In our case that will not work, since the permissions are inherited from the option with which the partition is mounted (ro or rw); that is, the user can either do everything or only read. However, pure-ftpd has one useful feature: it can "follow" symlinks. To enable it, add another option at build time, --with-virtualchroot. This way we can mount the bucket twice, in read-only and read-write modes, and create links to the mounts in the users' home directories.
 #/usr/bin/s3fs mybucket /mnt/mybucketrw/ -o accessKeyId=XXXXXXXXXXXXX -o secretAccessKey=YYYYYYYYYYYYYYYYY -o allow_other,rw -o readwrite_timeout=120;
 #/usr/bin/s3fs mybucket /mnt/mybucketro/ -o accessKeyId=XXXXXXXXXXXXX -o secretAccessKey=YYYYYYYYYYYYYYYYY -o allow_other,ro -o readwrite_timeout=120;
 #mount | grep s3fs
 s3fs on /mnt/mybucketro type fuse.s3fs (ro,nosuid,nodev,allow_other)
 s3fs on /mnt/mybucketrw type fuse.s3fs (rw,nosuid,nodev,allow_other)

The user directory will look like this:
 #ls -la /mnt/Users/User1/
 lrwxrwxrwx 1 root root 15 Mar 25 09:10 mybucketro/folder1 -> /mnt/mybucketro/folder1
 lrwxrwxrwx 1 root root 15 Mar 25 09:10 mybucketrw/folder2 -> /mnt/mybucketrw/folder2

Now we have given the user read access to the /mnt/mybucketro/folder1 folder and write access to the /mnt/mybucketrw/folder2 folder. At this point we can consider the first requirement (data stored in AWS S3) fulfilled.
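For reference, a minimal sketch of how such per-user links could be created; the exact layout in the listing above is ambiguous after formatting, so the user name and link names here are purely illustrative:

 # create the user's home directory and link it to the two mounts
 mkdir -p /mnt/Users/User1
 ln -s /mnt/mybucketro/folder1 /mnt/Users/User1/folder1   # read-only for the user
 ln -s /mnt/mybucketrw/folder2 /mnt/Users/User1/folder2   # read-write for the user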

Step four: Set up high availability.
Then it was decided to use the good old AWS LoadBalancer and its wonderful HealthCheck.
We open the AWS Console and create a new balancer (I am sure there is no need to repeat the process of creating a balancer; if anything, the documentation is there as a reminder).
For Ping Protocol select TCP, for Ping Port, 21.
That's it: the health of the server will now be judged by the availability of port 21, that is, of our FTP server.
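The same health check can also be configured from the command line; a minimal sketch using the modern AWS CLI, where the load balancer name ftp-elb and the thresholds are assumptions rather than values from the original setup:

 # point the classic ELB health check at the FTP control port
 aws elb configure-health-check \
     --load-balancer-name ftp-elb \
     --health-check Target=TCP:21,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=3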
We create an AMI from our server (on which FTP is already configured and the partitions are mounted). Then, as usual, we create a launch configuration from this AMI and create an auto-scaling group.
When creating the auto-scaling group, we specify our new Load Balancer and the option --health-check-type ELB. In this configuration, if our FTP server "dies", it will be removed and a new working server will be brought up in its place. Since we store all the data on S3, this procedure does us no harm.
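A hedged sketch of these two steps with the current AWS CLI; all names, the AMI ID, instance type, key name, and availability zone are placeholders:

 # launch configuration based on the prepared AMI
 aws autoscaling create-launch-configuration \
     --launch-configuration-name ftp-lc \
     --image-id ami-xxxxxxxx \
     --instance-type t2.small \
     --key-name my-key

 # auto-scaling group tied to the load balancer, with ELB health checks
 aws autoscaling create-auto-scaling-group \
     --auto-scaling-group-name ftp-asg \
     --launch-configuration-name ftp-lc \
     --min-size 1 --max-size 1 --desired-capacity 1 \
     --availability-zones us-east-1a \
     --load-balancer-names ftp-elb \
     --health-check-type ELB \
     --health-check-grace-period 300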
Step five (optional): Configure load balancing and autoscaling.
Load balancing for FTP is far from being as easily solved as, say, load balancing for the web. I ran into this for the first time and, not finding a ready-made free solution, suggested to the customer that we balance the load with DNS.
AWS Route 53 has a weight option for A records: the higher a record's weight, the more often it is returned in responses to clients.
That is, in theory, we can create 5 records with the same weight and thus spread client requests evenly across 5 servers. To automate adding records to AWS Route 53, I wrote two scripts. One to add a record:
instance_up.sh
 #!/bin/bash
 # Adds a weighted A record for this instance to Route 53 on startup.
 zone_id="Z3KU6XBKO52XV4"
 dns_record="example.com."
 instance_dns=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-hostname)
 instance_ip=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
 let number_nodes=$(route53 get $zone_id | grep $dns_record | wc -l)+1
 weight="50"
 id=$(date "+%Y_%m_%d_%H:%M")
 # Do nothing if a record with this IP already exists in the zone.
 route53 get $zone_id | grep $instance_ip > /dev/null
 if [ $? -ne 0 ]; then
     # Re-create the existing records with the common weight...
     route53 get $zone_id | grep $dns_record | awk '{print $4" "$3" "$6" "$7}' | \
         sed 's/id=//' | sed 's/\,//' | sed 's/w=//' | sed 's/)//' | while read i; do
         route53 del_record $zone_id $dns_record A $i
         route53 add_record $zone_id $dns_record A $(echo $i | awk '{print $1" "$2" "$3}') $weight
     done
     # ...and add a record for this instance.
     route53 add_record $zone_id $dns_record A $instance_ip 60 $id $weight
 fi
Another to remove:
instance_down.sh
 #!/bin/bash
 # Removes this instance's A record from Route 53 on shutdown.
 zone_id="Z3KU6XBKO52XV4"
 dns_record="example.com."
 instance_dns=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-hostname)
 instance_ip=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
 let number_nodes=$(route53 get $zone_id | grep $dns_record | wc -l)+1
 weight="50"
 id=$(date "+%Y_%m_%d_%H:%M")
 # Delete the record only if one with this IP actually exists.
 route53 get $zone_id | grep $instance_ip > /dev/null
 if [ $? -eq 0 ]; then
     route53 del_record $zone_id $(route53 get $zone_id | grep $instance_ip | \
         awk '{print $1" "$2" "$4" "$3" "$6" "$7}' | sed 's/id=//' | sed 's/\,//' | sed 's/w=//' | sed 's/)//')
 fi
The scripts use the route53 utility, which comes with the python-boto package.
Both scripts are placed on the server from which we build the AMI, and calls to them are added to the pure-ftpd init script.
Now, when pure-ftpd starts, it adds a new "A" record with its own IP address to AWS Route 53, and deletes it on shutdown.
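As a rough sketch, assuming the scripts live in /usr/local/bin and a classic SysV init script for pure-ftpd (paths and launch options are assumptions, not taken from the original setup), the hooks could look like this:

 # fragment of /etc/init.d/pure-ftpd (paths and options are assumptions)
 start() {
     /usr/sbin/pure-ftpd --daemonize    # start the FTP daemon (other options omitted)
     /usr/local/bin/instance_up.sh      # register this instance in Route 53
 }

 stop() {
     /usr/local/bin/instance_down.sh    # remove this instance from Route 53
     killall pure-ftpd
 }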
All that remains is to add ScaleUp and ScaleDown policies to our auto-scaling group.
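A minimal sketch of such policies with the AWS CLI; the group name, adjustment sizes, and cooldown are assumptions:

 # add one server when scaling up
 aws autoscaling put-scaling-policy \
     --auto-scaling-group-name ftp-asg \
     --policy-name ftp-scale-up \
     --adjustment-type ChangeInCapacity \
     --scaling-adjustment=1 \
     --cooldown 300

 # remove one server when scaling down
 aws autoscaling put-scaling-policy \
     --auto-scaling-group-name ftp-asg \
     --policy-name ftp-scale-down \
     --adjustment-type ChangeInCapacity \
     --scaling-adjustment=-1 \
     --cooldown 300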

That's the whole setup. This configuration has been successfully working on the project for six months already.
If you have any questions, write them in the comments and I will answer if I can. I would also be glad if someone shares their experience of building such systems.

Source: https://habr.com/ru/post/173699/

