AWS: the good, the bad and the ugly

Here at awe.sm, we have used Amazon Web Services (AWS) for hosting from the very beginning. Over the past three years we have learned what works well and what does not, and we have formulated our own set of rules for running a high-availability, high-performance system. In some cases these rules differ from what Amazon recommends.

This post is written for two groups of readers:

People who have heard of Amazon but have not yet had the chance to use it: we describe the advantages and disadvantages of the service that we have run into in our own work.
People who already use AWS: we clarify some of the details and highlight best practices for running high-performance services like ours on Amazon, where continuous operation of the system is the top priority.

It is no exaggeration to say that Amazon has radically changed the economics of launching tech startups. It happened slowly and gradually, but it is now a fact. Nobody realizes how many companies use Amazon EC2 somewhere in their infrastructure until it goes down and half of the Internet seems to stop working. That is not just luck on Amazon's part: they genuinely have a very good product. Everyone uses the service because it has greatly simplified launching applications and services, sharply reducing the knowledge, effort and money needed to get a startup off the ground.

EC2 is a new way to run software.
The first and most important thing to understand about EC2 is that it is not just virtual hosting. It is better to think of it as hiring a part-time system and network administrator. Instead of hiring a highly paid employee to do all of that work and automate everything for you, you pay a little more for each server, and a whole class of problems goes away. Power supply, network topology, hardware costs, incompatible equipment from different vendors, networked storage: back in 2004 you had to think about all of these things yourself (or pay someone else to think about them). With AWS and its rapidly growing list of competitors, you do not need to think about any of this until you want something more.
The main difference, and the main advantage, of EC2 is flexibility. We can launch a new server very quickly: it takes about 5 minutes from the thought "I need new hardware" to the moment you can log in for the first time. This lets us do things that seemed impossible a few years ago, for example:

Upgrades become hardware swaps. When we have a big upgrade, we launch a new server, install all the necessary software and dependencies, copy over the configuration files, and add the server to our load balancer. If everything works, we remove the old servers from the balancer and shut them down; if something goes wrong, we simply switch the balancer back. We can run old and new servers side by side, in whatever numbers and for as long as we need, and then shut down the ones we no longer need, all without buying new equipment.
For some non-critical systems, where an hour of downtime is acceptable, we used an even simpler approach: the server was monitored, and if it had problems, we simply brought up a new host.
We can grow our infrastructure as load increases instead of buying hardware in advance. When load rises, we launch exactly the capacity we need at that moment to handle the current load.
We do not have to estimate in advance how much capacity we might need. When we need a server, we start one; if it cannot handle the load, we launch a more powerful one, or, conversely, a smaller one if that is enough. This is one of the best features of AWS, and it is possible only because provisioning new servers and removing old ones happens almost instantly.
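The upgrade-by-replacement procedure in the first item above can be sketched in a few lines. This is a minimal, provider-agnostic sketch and not awe.sm's actual tooling: `LoadBalancerPool`, `swap_servers` and the `is_healthy` check are hypothetical names, and a real setup would call the load balancer's own API and a real health check instead of editing an in-memory list and calling a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoadBalancerPool:
    """Hypothetical in-memory stand-in for a load balancer's list of backends."""
    backends: List[str] = field(default_factory=list)

    def add(self, host: str) -> None:
        if host not in self.backends:
            self.backends.append(host)

    def remove(self, host: str) -> None:
        if host in self.backends:
            self.backends.remove(host)


def swap_servers(pool: LoadBalancerPool,
                 old_hosts: List[str],
                 new_hosts: List[str],
                 is_healthy: Callable[[str], bool]) -> bool:
    """Put new hosts into rotation next to the old ones and keep them only if healthy.

    Returns True if the swap succeeded (old hosts removed, ready to be shut down),
    or False if it was rolled back (new hosts removed, old hosts untouched).
    """
    # Step 1: run old and new servers side by side behind the balancer.
    for host in new_hosts:
        pool.add(host)

    # Step 2: if every new host passes the (hypothetical) check, retire the old ones.
    if all(is_healthy(host) for host in new_hosts):
        for host in old_hosts:
            pool.remove(host)
        return True

    # Step 3: otherwise "switch the balancer back" by dropping the new hosts.
    for host in new_hosts:
        pool.remove(host)
    return False
```

For example, starting from a pool of `["web-old-1", "web-old-2"]`, a successful `swap_servers` leaves only the new hosts in rotation, while a failed check on any new host leaves the pool exactly as it was. Adding the new hosts before removing the old ones is deliberate: serving capacity never drops during the swap.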

EC2 makes economic sense for startups.
The most obvious economic benefit is that you can start with essentially zero up-front cost. You use the same Amazon account you already use to buy all sorts of unnecessary things online, press a button, and within an hour you are playing with your servers. You pay only for the servers that are running and the disks that are in use, so your starting cost is minimal. This also makes it possible to experiment with hardware: launch 10 times more capacity than you need, run load tests, and then shut it all down until you actually need that much. This is not just a convenience but a revolutionary breakthrough: combined with the other advantages of AWS, this quantitative difference becomes a qualitative one.

As I mentioned, AWS dramatically reduces operational costs. Until 2012, more than two years after we launched the company, we had no dedicated system administrator. That was a mistake: we should have hired at least one person in 2011 or earlier. Today we have just one full-time system administrator, who manages our entire infrastructure of hundreds of servers. That is a very high ratio of machines to people. The effect is amplified by the fact that we do not have to worry about networking, power and much more, and once you get used to that, it is easy to take it for granted.

Of course, because this is more than just hosting, it costs more than regular hosting. But Amazon is working on this and cuts prices periodically: by 18% in October and by 10% in March, and that is this year alone. To save money, you can also use spot instances, which run on spare capacity and cost less than on-demand instances. For long-term use, you can pay in advance for reserved instances and save up to 50%. We at awe.sm are obsessed with reliability and run a lot of redundant hardware, so reserved instances were a big win for us.
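Successive price cuts and discounts compound multiplicatively, which is easy to get wrong when budgeting. A minimal sketch of the arithmetic, assuming a simplified model in which a reservation is just a flat percentage discount on the hourly rate (real reserved pricing also involves an up-front fee and a fixed term, and spot prices fluctuate); the hourly rate in the example is hypothetical, not an actual AWS price:

```python
def cumulative_price_cut(cuts):
    """Combine successive fractional price cuts into one overall cut.

    Cuts compound: after an 18% cut and then a 10% cut you pay
    0.82 * 0.90 = 0.738 of the original price, i.e. 26.2% less (not 28%).
    """
    remaining = 1.0
    for cut in cuts:
        remaining *= 1.0 - cut
    return 1.0 - remaining


def annual_cost(hourly_rate, servers, discount=0.0, hours_per_year=24 * 365):
    """Yearly cost of a fleet running around the clock at an optionally discounted rate.

    `discount` is a simplified flat reduction (e.g. 0.5 for the best-case
    reserved-instance saving mentioned above); it ignores up-front fees.
    """
    return hourly_rate * (1.0 - discount) * servers * hours_per_year
```

For example, with a hypothetical rate of $0.50 per hour, 100 servers running all year cost `annual_cost(0.50, 100)`, i.e. $438,000, on demand, and `annual_cost(0.50, 100, discount=0.50)`, i.e. $219,000, at the best-case 50% reservation discount. The two cuts mentioned above give `cumulative_price_cut([0.18, 0.10])` of about 0.262, a 26.2% reduction rather than 28%.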

EC2 has a number of problems.
This is where the eulogy ends and it is time to take a more critical look at Amazon. While we love EC2 and cannot imagine life without it, it is important to be honest: the road is neither cloudless nor strewn with rose petals. EC2 has serious performance and reliability limitations that you need to be aware of and take into account in your plans.

The first issue is the advertised independence of availability zones, both in their infrastructure and in their failures. AWS services are hosted in several locations around the world, called regions. Each region consists of several availability zones, which in theory are isolated from each other: they are independent data centers with their own network infrastructure, power supply and so on. There are several important facts to keep in mind when using regions and availability zones:

Virtual hardware does not last as long as real physical hardware.
Over three years we have observed that the average lifespan of a virtual machine on EC2 is about 200 days. After that, the chance that the server will be retired rises sharply. The process is also unpredictable: sometimes AWS support notifies us 10 days in advance that a machine will be shut down, and sometimes the notice arrives two hours after the machine has already been shut down. Hardware that disappears without warning is not the biggest problem, since you can easily launch a replacement, but you have to plan for it and invest time in automating the process up front, rather than repeatedly spending time launching replacements by hand.
Your servers must be spread across more than one availability zone, with every necessary service present in each zone. In our experience, an entire zone failing is more likely than a single server failing. So if you put your primary and backup servers in the same availability zone, a problem with the primary will most likely take out the backup as well, because both suffer from the same zone-wide problem. In that case the zone becomes a single point of failure: you will not be able to restore your data from the backup or retrieve your files from the servers, because when a zone has problems you often cannot even see your servers, let alone get data off them.
Failures that affect several zones within a region also happen. So, if you can afford it, use multiple regions as well. The US-East region, the most popular one because it is the oldest and cheapest, had problems in June 2012 and March 2012, and the worst failure, in April 2011, was dubbed the "cloud apocalypse". Our opinion, which may cost us some friends at Amazon, is that entire regions become unstable fairly often, and usually for the same reason. This led us to the following conclusion.
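The placement rule above (never let one zone, and ideally one region, hold every copy of a service) is simple enough to check mechanically. The sketch below is an illustration under assumptions: `place_round_robin` and `shared_failure_domains` are hypothetical helpers, and `ZONES` is just an example list using AWS's region and zone naming style; in practice the resulting placement would be fed to whatever launches your instances.

```python
from itertools import cycle
from typing import Dict, List, Tuple

Zone = Tuple[str, str]  # (region, availability zone)

# Example zones only: two zones in one region plus one zone in a second region.
ZONES: List[Zone] = [
    ("us-east-1", "us-east-1a"),
    ("us-east-1", "us-east-1b"),
    ("us-west-2", "us-west-2a"),
]


def place_round_robin(hosts: List[str], zones: List[Zone]) -> Dict[str, Zone]:
    """Assign hosts to zones in turn so that no single zone holds all of them."""
    return {host: zone for host, zone in zip(hosts, cycle(zones))}


def shared_failure_domains(primary: Zone, backup: Zone) -> List[str]:
    """List the failure domains that a primary/backup pair has in common.

    An empty list means a zone-wide or region-wide outage can take out
    at most one of the two.
    """
    shared = []
    if primary[0] == backup[0]:
        shared.append("region")
        if primary[1] == backup[1]:
            shared.append("zone")
    return shared
```

With these example zones, `place_round_robin(["db-primary", "db-backup", "web-1"], ZONES)` puts the primary in us-east-1a, the backup in us-east-1b and web-1 in us-west-2a. `shared_failure_domains` then reports that the database pair still shares a region, which is exactly the exposure the US-East outages described above would hit.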

To ensure high reliability, stop trusting EBS.
This is where we strongly disagree with Amazon's marketing and its advice. Amazon treats EBS as fundamental to using EC2: you should store all your data on EBS volumes, which you can attach to new servers, and take EBS snapshots to back up your databases and later restore from them. Amazon also wants you to use EBS as the system's root device by using EBS-backed images. EBS caused us several major problems:

I/O performance on EBS is unsatisfactory.
I/O on virtual hardware is much slower than on bare metal, but in our experience EBS is slower still than the local disks of a virtual machine, which Amazon calls ephemeral storage. EBS volumes are essentially network drives, and you should not expect high performance from any network drive. AWS also offers volumes with a guaranteed number of I/O operations per second, but they are expensive enough that they are not an attractive compromise.
EBS fails at the region level, not at the level of a single disk.
In our experience, EBS has two states: either all volumes are available or none of them are. Two of the three region-wide outages described above were caused by EBS problems that started in one zone and spread to the others. If your recovery plan depends on EBS volumes and the outage is at the EBS level, the plan simply will not work; we have run into this several times.
EBS failures are especially painful on Ubuntu. Because EBS is a network drive presented to the operating system as a local hard disk, its failures disrupt the operating system itself, and the consequences for us were severe: as soon as EBS had problems, the whole server with the volume attached became completely unreachable, including functionality that has nothing to do with disk activity.

For these reasons, and because our main goal is maximum uptime, we stopped using EBS entirely about six months ago. It took some time to reimplement the more complex operations, mainly backup and recovery, but it was worth it given the improvement in uptime.

Be careful: other Amazon services also depend on EBS.
Because some Amazon services are built on top of EBS, they become unavailable too when EBS has problems. This applies to the Elastic Load Balancer (ELB), the RDS database service, the Elastic Beanstalk application platform and others.

In our experience, whenever Amazon has serious problems, EBS is almost always affected as well. So if EBS is down and you need to switch your load balancer to another region, you will not be able to, because ELB depends on EBS. You will not be able to launch new servers either, because the Amazon console also runs on EBS. So while we love EC2 and love S3 very much, we do not use any of Amazon's other services.
An added benefit of this approach is that we can easily switch to another provider, since we are not tightly tied to AWS.
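One way to apply this lesson is to write down what each component depends on and ask what a single EBS outage would take down with it. This is a minimal sketch: the EBS edges encode the claims made above (ELB, RDS, Elastic Beanstalk and the console depend on EBS), while application components such as `app-servers` and `self-managed-balancer` are hypothetical, and the dependency map itself has to be maintained by hand.

```python
from typing import Dict, List, Set

# component -> what it depends on (AWS edges per the article; app components hypothetical)
DEPENDS_ON: Dict[str, List[str]] = {
    "ELB": ["EBS"],
    "RDS": ["EBS"],
    "Elastic Beanstalk": ["EBS"],
    "AWS console": ["EBS"],
    "app-servers": ["ELB", "EC2 (instance store)"],
    "cross-region failover": ["AWS console", "ELB"],
    "self-managed-balancer": ["EC2 (instance store)"],
}


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the graph: for each dependency, record which components rely on it.
    dependents: Dict[str, List[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    # Depth-first walk outward from the failed component; the set prevents revisits.
    affected: Set[str] = set()
    stack = [failed]
    while stack:
        current = stack.pop()
        for user in dependents.get(current, []):
            if user not in affected:
                affected.add(user)
                stack.append(user)
    return affected
```

Here `affected_by("EBS", DEPENDS_ON)` flags the ELB-fronted app servers and the cross-region failover plan along with the AWS services themselves, while the self-managed balancer running on plain EC2 with local storage is unaffected. That is the reasoning behind sticking to EC2 and S3.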

Lessons we have learned.
If we were starting awe.sm tomorrow, I would use Amazon without any hesitation. For a startup with a small team and a limited budget, it is exactly what you need to get going quickly. AWS is not a threat; it is neither scary nor bad.

IaaS providers such as Joyent and Rackspace are hot on Amazon's heels: we have good friends at both companies and plan to work with them. When our fleet grows from 100 to 1,000 servers, we will need to diversify our infrastructure across these providers, as well as companies such as Carpathia, which use AWS Direct Connect to offer hosting with low-latency access to AWS, making hybrid cloud infrastructures easier to build.

I hope this information was helpful to you.

Original article: http://blog.awe.sm/2012/12/18/aws-the-good-the-bad-and-the-ugly/
Posted by: Laurie Voss (seldo)

Source: https://habr.com/ru/post/164239/
