This story happened to me last week, from January 26 to January 31, 2009. Having lived through this wonderfully short stretch of my life, I came to appreciate the importance of simple things, started to believe that “events” really do happen, and grew increasingly disillusioned with people. The tags for the week: RAID, Infobox, backup. Although it all started much earlier...
Part one
In January 2008, I rented a server from Infobox, a St. Petersburg company. Mediocre in performance but relatively cheap, it met my needs at the time as well as it could. The rental included the initial installation of the operating system (FreeBSD, of course) with the partitioning I wanted. The helpful support staff also combined a pair of 120 GB hard drives into a software RAID 1 (mirror). To look after the server, I asked a friend who works as a system administrator in several places at once. He installed a web server and set up all the services, including a full backup of the data into archives twice a day. On my home computer I set up a script that regularly downloaded these archives and stored them in a folder, which I cleaned out from time to time.
You have to agree that, on the whole, everything was set up quite well: RAID 1 + archives on the server + archives on my home computer, which runs around the clock.
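The home-side half of this setup is easy to sketch. The original script isn't shown, so what follows is only a minimal Python sketch under stated assumptions: the server's archives have already been downloaded into a local folder, they are named `*.tar.gz`, and “cleaning out the folder” means keeping only the newest few. The function name `prune_backups` and the default of 14 archives (roughly a week at two backups a day) are hypothetical.

```python
from pathlib import Path


def prune_backups(backup_dir, keep=14, pattern="*.tar.gz"):
    """Delete all but the `keep` newest archives in backup_dir.

    Hypothetical helper: assumes archives match `pattern` and that file
    modification time reflects when each backup was taken.
    Returns the deleted paths, oldest first.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Oldest first, so everything except the last `keep` entries is expendable.
    archives = sorted(Path(backup_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    to_delete = archives[:-keep] if keep > 0 else archives
    for path in to_delete:
        path.unlink()
    return to_delete
```

It would be called after each download, e.g. `prune_backups("/home/me/backups", keep=14)` (the path is illustrative). Rotating by count rather than by age means a long gap in downloads never wipes out the last good copies.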
Right away I moved the CakePHP website from shared hosting to the new server, and later other sites familiar to the Habr audience appeared there, such as MyNotifier, the CodeIgniter site and my homepage, as well as many other projects that have little to do with this story.
Part two
So I lived happily ever after, until in January of this year I decided to upgrade my outdated Ubuntu from 8.04 to 8.10 and, while I was at it, give my desktop a clean start: format the hard drives and install the OS from scratch. This noble deed took place on January 23. There was little point in keeping the accumulated backups: “I'll reinstall the system, set up the script again, and it will collect the archives,” I thought. But life is fast and unpredictable, and over the next couple of days I didn't find much time to set up my new 8.10.
Coming home on the evening of the 26th, I found my Jabber and ICQ contact lists flooded with messages. Everyone was writing that something on my sites wasn't working. It was easy to confirm: open any of the projects and wait half a minute for the page to load with a database error. Deciding the problem was simple, I restarted the misbehaving MySQL, but it didn't help. Moreover, the server responded over SSH at the speed of a turtle, or slightly slower. To make matters worse, my sysadmin friend was at that moment peacefully riding the train from St. Petersburg to Moscow, and my earthly problems were out of his reach.
I contacted Infobox technical support and asked them to restart the server. Thus began my correspondence with them, which currently stands at 53 letters.
The server was restarted, but nothing changed. I then suggested that something had burned out, perhaps a cooler or a hard drive. It turned out to be a hard drive, which the support staff kindly replaced a little over an hour later, starting a background copy from the old drive. It was already night, and after several fruitless attempts to reach the server I went to bed, figuring that the background copying, which was loading the server so heavily, would be finished by morning. But in the morning nothing had changed.
Meanwhile, my administrator arrived in Moscow, and a while later he sent me a log of failed attempts to write to the new hard drive. It looked like this:
...
Jan 27 10:44:44 oowl kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA = 74274048
Jan 27 10:46:14 oowl kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA = 74344960
Jan 27 10:47:05 oowl kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA = 50792319
The next half-day was spent while the Infobox engineers convinced themselves that writing to the new hard drive was impossible and that the drive was faulty. The drive was replaced again, and the background copying started over. By then I had already received the 50th letter from users asking what had happened.
When the server finally began to respond to requests at an acceptable speed, I thought this was the end of an unpleasant story. As it turned out, the adventures were only beginning, because I was in the past! The latest forum posts were dated May 24, 2008. The jobs in MyNotifier confirmed my teleportation. To make sure I wasn't going crazy, I had to look at the calendar. Outside it was winter, while on the server it was already spring, of last year.
After some back and forth with support, I received the following reply:
The server is now running on the hard drive that was in the RAID before the second hard drive crashed. The second hard drive has failed at the physical level; data recovery is impossible. Since May the drives should have been kept in sync, but apparently, because of some error on that same failing drive, this did not happen.
Part three
That's how I was left with nothing: May was blooming on the first hard drive, the second one had “failed at the physical level, data recovery is impossible,” and the backup collection script was not set up on my local computer (remember my move to 8.10?). So I lost the information accumulated over almost a year, including the complete source code of some projects that existed nowhere else.
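In hindsight, the weak spot was that nothing complained when fresh archives stopped arriving at home. What follows is a minimal sketch of such a check. It assumes the same hypothetical `*.tar.gz` folder layout and flags a problem when the folder is empty or its newest archive is older than a threshold. The name `backup_is_stale` and the 24-hour default are assumptions: with backups twice a day, anything older than a day means something is broken. It would not have caught the silently desynchronized mirror itself, only the missing home copies.

```python
import time
from pathlib import Path


def backup_is_stale(backup_dir, max_age_hours=24.0, pattern="*.tar.gz", now=None):
    """Return True if backup_dir has no archives or the newest one is too old.

    Hypothetical helper: uses file modification time as the backup time.
    `now` (Unix seconds) can be passed in for testing.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).glob(pattern)]
    if not mtimes:
        return True  # no archives at all is the worst case
    age_hours = (now - max(mtimes)) / 3600.0
    return age_hours > max_age_hours
```

Run from cron on a machine other than the one being reinstalled, a `True` result could trigger an email or a Jabber message.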
After digging through all of my May correspondence with my administrator, I concluded that nothing had been installed, deleted, or even rebooted in May. The hard drive failure didn't show up in any logs.
Something had to be done, and fast. After phoning some serious data recovery companies, I arranged a visit to the data center and had them hand over the dead drive against a signed receipt. The data center can only be entered from 10 a.m. By 9:30 I was already on the doorstep. Grabbing the still-warm corpse of the hard drive, I rushed it to the intensive care unit for its kind.
Part four
By 10:15 I was describing the situation to the technician. “We'll see,” he grunted, and disappeared into the dark room behind the counter, leaving me to fill out a questionnaire about partition sizes, where the information was located, and what needed to be recovered first. Less than five minutes later he ran out saying: “Are you kidding me?! Bored or something? Why did you bring me a perfectly healthy hard drive?!”
There was an awkward pause. The technician looked at me reproachfully, and I looked back at him, doubting his professional abilities, having already buried the drive in my mind. “That can't be, check again,” I said, not believing my ears. The technician connected the drive to the Windows machine behind the client counter and, using UFS Explorer, showed me its contents: my documents, databases, pictures, and plenty I hadn't asked to see, clearly hoping to be rid of this odd customer.
I took the hard drive home and, oh horror, realized that I simply had nowhere to connect it: I don't own a desktop PC. After calling all my friends, I found that those who didn't have only laptops still had nowhere to plug in a SATA drive. Of course, this would have been no problem for my administrator, but he was in Moscow.
Meanwhile, the angry correspondence with Infobox support continued. As their excuse, they chose this phrase:
We couldn't work with this disk; it's probably a matter of the server's configuration.
They also wrote:
... you can bring the drive back to us, and we will try to copy the information or connect it to your server.
I had no other options, and the next morning I brought the drive back to the data center. Meanwhile, the number of letters asking what was going on had passed 80, and the new drive installed in the leased server was slowly starting to fail as well.
Part five
2009-01-29 11:03:32 <...> Well, we'll transfer the data over the course of the day.
2009-01-29 19:18:28 No copying has been done yet; there were problems with the “old drive”: only the 500 MB root file system could be mounted. We have now managed to mount the /var, /usr and /home partitions, but there are still errors. <...>
And about the rebuilt server, which kept hanging:
The server hung, there was no video signal on the console; we rebooted it, and it now responds to ping <...>
Late the next afternoon, my administrator got in touch and explained where the necessary data was. I immediately passed this information on to technical support.
2009-01-30 17:36:59 Thanks for the information, we will keep you informed.
2009-01-30 21:53:53 Me: What is the status of the process now?
2009-01-30 21:55:57 Engineer: We are trying to copy the data.
Part six
As you can imagine, my patience ran out: two days of “we are trying to copy” had passed. The next morning, Saturday, I was allowed to pick up the hard drive again. Stepping on the gas, I drove to my administrator, who had just returned to St. Petersburg.
Imagine my surprise when he said he had already copied all the data. The only errors came from reading one InnoDB database, which had been badly corrupted by the failure. The rest of the files were extracted without any problems. A reasonable question arises: what was support doing for two days while sending me reports about the recovery process? But let's leave that on the conscience of the engineers, who, by the way, in their “attempts” to read data from the drive, actually wrote to it!
Conclusion
My story has a happy ending. All the data was recovered, and the projects are working. I cancelled the server rental and put my own server into colocation. The money for the rest of the rental term was refunded to me in three stages: first they flatly refused, then they credited 800 rubles to my balance with them, and then, after another letter, they fixed it.
I don't know whether my story will help anyone realize that they need reliable backups, choose a data center carefully, or simply think about the value of information. It taught me a lot, frayed my nerves quite a bit, and gave me invaluable life experience. After a week of correspondence with support, 119 letters from site users, and endless running around, I still gained more than I lost.
Thanks for your attention.