
What to do if lightning strikes your Amazon EC2 Instance?

As many of you know, one of Amazon's data centers in Ireland recently lost power; this has already been covered. From talking to colleagues, I gathered that most Amazon EC2 users got off with just downtime, but I was less lucky: not only did my instances stop, but one of my volumes went into the error state.

That would have been fine (there are snapshots, after all), but here is the catch: you cannot detach an EBS volume that is attached to an instance as its root device, at least not through the web console.


So, you have an instance in the stopped state with a volume in the error state attached to it. You should also have received an email with apologies and basic instructions along the lines of “don't worry, just attach the volume restored from the snapshot”. As it turned out, it is not that simple. Below are the steps I had to go through to restore my instance.

A small note: for those who work with Amazon EC2 actively, this may not look like a problem at all. But for me, whose attitude to EC2 is “it works? great!”, it was not easy to figure out what to do and how. So these instructions are mostly for people in the same position who do not want to simply delete the old instance and create a new one.

I wrote to support, but all they could offer was the sympathetic advice to use the Amazon AMI/API Tools.

And even if the "lightning" has spared your server this time, there is no guarantee it will pass you by next time.

So, here is what I did:

1. Restore a volume from the recovery snapshot


The first thing to do is open the AWS Management Console and go to Elastic Block Store -> Snapshots. There should be a snapshot with the description “Recovery snapshot for vol-XXXXXXX”. Create a new volume from it: this will be the restored copy of your volume.
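If the console misbehaves, the same step can be done from the command line once the tools from step 3 are installed. Below is a minimal sketch, assuming ec2-create-volume from the EC2 API tools is on the PATH and the credentials are already configured; restore_volume is a hypothetical helper and the IDs are placeholders. Keep in mind that an EBS volume can only be attached to an instance in the same availability zone, so pass the zone of your instance.

```bash
#!/usr/bin/env bash
# Sketch: create a volume from the recovery snapshot with the EC2 API tools
# instead of the web console.
# Assumptions: ec2-create-volume is on PATH and EC2_CERT / EC2_PRIVATE_KEY
# are already exported; all IDs in the example are placeholders.

# restore_volume SNAPSHOT_ID ZONE [REGION]
# ZONE must be the availability zone of the instance the volume is meant for.
restore_volume() {
    local snapshot_id=${1:-} zone=${2:-} region=${3:-eu-west-1}
    if [ -z "$snapshot_id" ] || [ -z "$zone" ]; then
        echo "usage: restore_volume SNAPSHOT_ID ZONE [REGION]" >&2
        return 2
    fi
    ec2-create-volume --snapshot "$snapshot_id" \
        --availability-zone "$zone" --region "$region"
}

# Example (hypothetical IDs):
#   restore_volume snap-XXXXXXXX eu-west-1a
```

The ID of the new volume appears in the command's output; that is the volume you attach in the final step.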

2. Try to detach the broken volume


What if you're lucky? Here is what support wrote to me:

You can select "Detach Volume" from within the AWS Management Console to detach this volume from your instance. You may need to execute multiple force-detaches if this hangs in a "detaching" state.

If the detach does not succeed either way, you will have to create a new instance and attach the restored volume to it.


Unfortunately, it did not work for me: every operation came back with “Unable to ...”.

3. Use the Amazon AMI/API Tools



So the only way out in this situation is to use the Amazon AMI/API Tools.
You can download them from Amazon (see the useful links at the end).

Since I use Ubuntu, all I needed was:
sudo apt-get install ec2-ami-tools
sudo apt-get install ec2-api-tools


Under Windows it should not be much more complicated.

Next, you need to create access credentials. These are not the same thing as the “Key Pairs” under Network / Security in the Management Console.

To create an access key/certificate pair, go to aws.amazon.com and choose Account -> Security Credentials, then open the X.509 Certificates tab. Click “Create a new Certificate” and save the two files: the certificate (let's call it ec2-cert.pem) and the private key (ec2-key.pem).

Important: do not mix up the key and the certificate. Both have the same .pem extension, but if you swap them, the tools will fail with java.security.cert.CertificateParsingException.
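A quick way to catch a swapped pair before running any tool is to look at the PEM header on the first line of each file. A minimal sketch, assuming both files are PEM-encoded; pem_kind and check_pem_pair are hypothetical helper names made up for this example:

```bash
#!/usr/bin/env bash
# Sketch: tell an X.509 certificate from a private key by its PEM header,
# so a swapped certificate/key pair is caught before the Java tools fail.
# Assumption: both files are PEM-encoded, i.e. the first line is a
# "-----BEGIN ...-----" marker.

# pem_kind FILE -> prints "certificate", "private-key" or "unknown"
pem_kind() {
    local header
    # Read only the first line; strip a trailing CR in case of CRLF files.
    header=$(head -n 1 "$1" 2>/dev/null | tr -d '\r')
    case $header in
        "-----BEGIN CERTIFICATE-----") echo certificate ;;
        # Matches both "BEGIN PRIVATE KEY" and "BEGIN RSA PRIVATE KEY".
        "-----BEGIN"*"PRIVATE KEY-----") echo private-key ;;
        *) echo unknown ;;
    esac
}

# check_pem_pair CERT_FILE KEY_FILE -> non-zero exit if swapped or unrecognized
check_pem_pair() {
    local cert_kind key_kind
    cert_kind=$(pem_kind "$1")
    key_kind=$(pem_kind "$2")
    if [ "$cert_kind" != certificate ] || [ "$key_kind" != private-key ]; then
        echo "expected certificate + private key, got $cert_kind + $key_kind" >&2
        return 1
    fi
}
```

Call check_pem_pair with the certificate first and the key second; a non-zero exit status means the files are swapped or not in the expected format.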

Next, it is convenient (though not required, since the files can also be passed to each command with the -C and -K options) to set environment variables:
export EC2_CERT=~/--/ec2-cert.pem
export EC2_PRIVATE_KEY=~/--/ec2-key.pem
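Before the first real command, it can help to confirm that both variables are set and point at readable files. A small sketch; check_ec2_env is a hypothetical helper, and the commented EC2_URL line is optional and assumes your version of the tools honours that variable to pick the default endpoint (so --region could be dropped):

```bash
#!/usr/bin/env bash
# Sketch: verify the credential variables used by the EC2 command-line tools.
# Assumption: credentials are passed via EC2_CERT / EC2_PRIVATE_KEY as in the
# text; check_ec2_env is a made-up helper name.

check_ec2_env() {
    local var missing=0
    for var in EC2_CERT EC2_PRIVATE_KEY; do
        # ${!var} is bash indirect expansion: the value of the variable named by $var.
        if [ -z "${!var:-}" ]; then
            echo "$var is not set" >&2
            missing=1
        elif [ ! -r "${!var}" ]; then
            echo "$var points to an unreadable file: ${!var}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Optional (assumption): make eu-west-1 the default endpoint.
#   export EC2_URL=https://ec2.eu-west-1.amazonaws.com
```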


After that, you can check that everything was picked up correctly:
$ ec2-describe-regions
REGION eu-west-1 ec2.eu-west-1.amazonaws.com
REGION us-east-1 ec2.us-east-1.amazonaws.com
REGION ap-northeast-1 ec2.ap-northeast-1.amazonaws.com
REGION us-west-1 ec2.us-west-1.amazonaws.com
REGION ap-southeast-1 ec2.ap-southeast-1.amazonaws.com


The problem was in the eu-west-1 region, which is where my instance lives. Let's look at the instances:
$ ec2-describe-instances
RESERVATION ....
INSTANCE ...
BLOCKDEVICE /dev/sda1 vol-20155a49 2011-05-20T14:14:54.000Z
...
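With several instances or volumes, picking the right BLOCKDEVICE line by eye is error-prone. A small filter, assuming the whitespace-separated layout shown above (BLOCKDEVICE, device name, volume ID, attach time); volume_on_device is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: print the volume ID attached at a given device, reading
# ec2-describe-instances output on stdin.
# Assumption: BLOCKDEVICE lines have the form
#   BLOCKDEVICE <device> <volume-id> <attach-time> ...

# volume_on_device DEVICE < describe-instances-output
volume_on_device() {
    awk -v dev="$1" '$1 == "BLOCKDEVICE" && $2 == dev { print $3 }'
}

# Example (hypothetical instance ID; restricting to one instance avoids
# matching the /dev/sda1 of every instance in the region):
#   ec2-describe-instances --region eu-west-1 i-XXXXXXXX | volume_on_device /dev/sda1
```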

Here we need the BLOCKDEVICE line (and, while you are at it, note /dev/sda1: you will need it later), namely "vol-20155a49" (yours will of course be different). Check in the console that this really is the volume that refuses to detach. If it is, only one step is left:
$ ec2-detach-volume --region eu-west-1 vol-20155a49 --force
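Support mentioned that several forced detaches may be needed if the volume hangs in the “detaching” state. A sketch of a retry loop under a few stated assumptions: ec2-describe-volumes prints an ATTACHMENT line only while the volume is still attached or detaching; a failed call just means "try again"; and the attempt count and delay (DETACH_TRIES, DETACH_DELAY, hypothetical variables) are arbitrary illustration values.

```bash
#!/usr/bin/env bash
# Sketch: repeat the forced detach until the volume no longer reports an
# attachment.
# Assumptions: ec2-detach-volume / ec2-describe-volumes are on PATH, and
# ec2-describe-volumes prints an ATTACHMENT line only while the volume is
# attached or detaching. Tries and delay are arbitrary defaults.

# force_detach_until_free VOLUME_ID [REGION]
force_detach_until_free() {
    local volume=$1 region=${2:-eu-west-1}
    local tries=${DETACH_TRIES:-5} delay=${DETACH_DELAY:-30} i status
    for ((i = 1; i <= tries; i++)); do
        # Ignore the exit code here; the state check below decides success.
        ec2-detach-volume --region "$region" "$volume" --force || true
        sleep "$delay"
        # Only trust "no ATTACHMENT line" if the describe call itself succeeded.
        if status=$(ec2-describe-volumes --region "$region" "$volume") &&
           ! grep -q '^ATTACHMENT' <<<"$status"; then
            echo "$volume detached after $i attempt(s)"
            return 0
        fi
        echo "attempt $i: $volume still attached (or status unknown)" >&2
    done
    return 1
}
```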

After that, go to the Management Console and attach the volume restored from the snapshot. This is where you need the device name that came right after BLOCKDEVICE; in my case it was /dev/sda1.

Done! Now you can start the instance :)
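Attaching the volume and starting the instance also have command-line counterparts, ec2-attach-volume and ec2-start-instances. A sketch with placeholder IDs; reattach_root_and_start is a hypothetical helper, it assumes the restored volume is in the instance's availability zone, and in practice you may want to wait until the volume shows as attached before starting.

```bash
#!/usr/bin/env bash
# Sketch: attach the restored volume at the old root device, then start
# the instance.
# Assumptions: ec2-attach-volume / ec2-start-instances are on PATH; the
# volume and instance are in the same availability zone; IDs are placeholders.

# reattach_root_and_start VOLUME_ID INSTANCE_ID [DEVICE] [REGION]
reattach_root_and_start() {
    local volume=${1:-} instance=${2:-} device=${3:-/dev/sda1} region=${4:-eu-west-1}
    if [ -z "$volume" ] || [ -z "$instance" ]; then
        echo "usage: reattach_root_and_start VOLUME_ID INSTANCE_ID [DEVICE] [REGION]" >&2
        return 2
    fi
    # Start the instance only if the attach call succeeded.
    ec2-attach-volume --region "$region" "$volume" -i "$instance" -d "$device" &&
        ec2-start-instances --region "$region" "$instance"
}

# Example (hypothetical IDs):
#   reattach_root_and_start vol-NEWVOLUME i-XXXXXXXX /dev/sda1
```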

To sum up: the Amazon AMI/API Tools are not as complicated as they look (even with 205 utilities in the package), and they can get you out of trouble when the web-based Management Console fails.

May your servers be spared from thunderstorms and fires!

Useful links:
* Instructions for setting up the tools
* How to use keys and certificates
* Download the tools: AMI / API

Source: https://habr.com/ru/post/126729/

