Server problems, July 1

Evernote experienced a series of hardware failures on one of our servers between July 1 and July 4. The failures potentially affected 6,323 users worldwide. As a result, some notes that these users created or edited between July 1 and July 4 were not recorded correctly on Evernote servers. We immediately contacted all affected users by email, and our support team guided them through the data recovery procedure. We also gave every potentially affected user a premium subscription (or extended it by one year for those who were already premium users), both to give them priority access to our technical support if they needed help recovering notes and as partial compensation for the inconvenience.

If you did not receive such an email from us at the beginning of July, the problem did not affect you.

Thanks to the redundancy inherent in how Evernote stores data (copies of notes remained on the hard disk, in email, and in browser history), most affected users were able to recover all of their notes.
We want to assure you that this was a one-time problem. We have significantly improved our alert system and infrastructure redundancy to prevent similar incidents from recurring. We sincerely apologize to our affected users. Even though most of them did not lose any data, they had to read a long and probably alarming email about the problem. We also hope that those who did lose data received enough information about the notes they worked with during those four days to restore or recreate the most important ones.

We received responses from several hundred affected users, and we are extremely grateful for their understanding and continued support. We are writing about this now because inaccurate information about the incident has been circulating on the Internet.

Below are the technical details of what happened.

Each user's data is stored in a “cluster” (shard). A cluster is a pair of servers, a main server and a spare, which provides fault tolerance. If the main server has a problem, the system automatically switches to the second server in the cluster. We currently have 37 clusters; last month's problems were with cluster number 22. Each server stores its data on a fully redundant RAID array. All data is also backed up, both in our data center and in another location. A full copy of your notes is also stored in the software clients for Windows and Mac (as well as in the iPhone and iPad apps for premium users who have activated this feature). This means that any note in Evernote is stored in at least six places: the disk on the main server, its RAID mirror, the spare server of the cluster and its RAID mirror, and the backup copies in the data center and in the off-site backup storage. Most users also have one or two copies in their local clients. This makes data loss in Evernote extremely rare.
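As an illustration only, the count of storage locations described above can be written out in a few lines of Python. The class and function names here (`Copy`, `copies_of_note`) are hypothetical and invented for this sketch; they are not part of any Evernote software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str
    server_side: bool  # True for copies held by the service, False for local clients


def copies_of_note(local_clients: int = 0) -> list[Copy]:
    """List every place one note is stored, following the description in the text."""
    copies = [
        Copy("main server disk", True),
        Copy("main server RAID mirror", True),
        Copy("spare server disk", True),
        Copy("spare server RAID mirror", True),
        Copy("backup in the data center", True),
        Copy("backup in the off-site storage", True),
    ]
    # Desktop or mobile clients that keep a full local copy (usually one or two).
    copies += [Copy(f"local client #{i + 1}", False) for i in range(local_clients)]
    return copies


all_copies = copies_of_note(local_clients=2)
server_side = [c for c in all_copies if c.server_side]
print(len(server_side))  # 6
print(len(all_copies))   # 8
```

The six server-side entries match the "at least six places" in the text; local client copies come on top of that.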

The incident with cluster 22 was caused by an extremely unlikely combination of hardware problems that hit both the main server and the failover mechanism at the same time. In short, the cluster switched back and forth between the two servers for some time, and as a result the records created during that period were erased. All data created before the failure was easy to restore from a backup. The probability of such a sequence of failures recurring is extremely small, but just in case, we modified the fault tolerance mechanism to make sure that data cannot be lost even in the worst-case scenario.
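The post does not describe the failure or the fix in any more detail, so the following is only a toy model of how a failover between two servers can lose recent writes, and of one possible safeguard. Everything in it is an assumption made for illustration: the `Server` and `Cluster` classes, the `replication_ok` flag, and the sequence-number check are hypothetical and are not Evernote's actual mechanism.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.notes = {}    # note_id -> text
        self.last_seq = 0  # sequence number of the last write applied here


class Cluster:
    def __init__(self, guard_failover):
        self.servers = [Server("main"), Server("spare")]
        self.active = 0             # index of the server currently handling requests
        self.seq = 0                # cluster-wide write counter
        self.replication_ok = True  # assumption: a hardware fault can break replication
        self.guard_failover = guard_failover

    def write(self, note_id, text):
        self.seq += 1
        # With healthy replication both servers get the write; otherwise only the active one.
        targets = self.servers if self.replication_ok else [self.servers[self.active]]
        for server in targets:
            server.notes[note_id] = text
            server.last_seq = self.seq

    def failover(self):
        current = self.servers[self.active]
        other = self.servers[1 - self.active]
        # Hypothetical safeguard: never switch to a peer that is missing recent writes.
        if self.guard_failover and other.last_seq < current.last_seq:
            return False
        self.active = 1 - self.active
        return True

    def read(self, note_id):
        return self.servers[self.active].notes.get(note_id)


def simulate(guard):
    cluster = Cluster(guard_failover=guard)
    cluster.write("a", "before failure")
    cluster.replication_ok = False  # the fault begins
    cluster.write("b", "during failure")
    cluster.failover()              # the cluster tries to switch servers
    return cluster.read("a"), cluster.read("b")


print(simulate(guard=False))  # ('before failure', None)
print(simulate(guard=True))   # ('before failure', 'during failure')
```

In the unguarded run, the note written during the fault is missing after the switch, while the note written earlier survives, which mirrors the pattern described above. The guard is deliberately simplistic: a real system also has to handle the case where the active server is truly unusable and staying on it is not an option.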

Source: https://habr.com/ru/post/101419/
