EMC Avamar at CROC Datacenter
This hefty multi-server cabinet is called EMC Avamar. It stands in our data center, handles backups, and does it in a very interesting way.
What's inside the cabinet?
Technically, it is a block of x86 servers; there are currently 10 of them. The architecture is as follows: one node is a spare, one is a control node, and data is written to the remaining 8. Thanks to redundancy (the Hamming code principle, with data spread evenly across the RAIN, a Redundant Array of Independent Nodes), the data survives if any one node fails, and the spare node takes over from the failed one. Overall, only about 50% of each node's capacity holds data directly; the rest, together with the spare node and parity, goes to keeping the data safe. As a result, 200 TB of physical capacity turns into 62.5 TB of usable space.
Each node runs SUSE Linux and specialized proprietary software, the server part of the complex. The nodes are interconnected by internal switches that isolate external backup traffic from internal service traffic.
Each node has 12 disks: 6 hold the primary data and the other 6 mirror them (RAID 1), plus an SSD for the OS.
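The capacity figures above can be walked through as simple arithmetic. This is only a sketch: it assumes all 10 nodes are the same size, and the function and key names are made up for illustration. The article does not itemize the gap between the mirrored capacity and the final 62.5 TB, so attributing it to RAIN parity and system overhead is an assumption.

```python
def capacity_breakdown(raw_tb: float = 200.0, usable_tb: float = 62.5,
                       total_nodes: int = 10, data_nodes: int = 8) -> dict:
    """Walk from raw to usable capacity using the figures quoted in the article."""
    # Only the data nodes store backups (the spare and control nodes do not).
    on_data_nodes = raw_tb * data_nodes / total_nodes
    # RAID 1 inside each node: 6 disks hold data, 6 mirror them.
    after_mirroring = on_data_nodes / 2
    # Not itemized in the article; presumably RAIN parity and system overhead.
    remaining_overhead = after_mirroring - usable_tb
    return {
        "on_data_nodes_tb": on_data_nodes,
        "after_mirroring_tb": after_mirroring,
        "remaining_overhead_tb": remaining_overhead,
        "usable_share": usable_tb / raw_tb,
    }


if __name__ == "__main__":
    for name, value in capacity_breakdown().items():
        print(f"{name}: {value}")
```

With the article's numbers, a little under a third of the raw disk space ends up available for backups.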

Where does backup come from?
The main purpose of EMC Avamar is “hot” backup of production systems from various sources:
- From the “cloud” (our “cloud” runs in the same data center; it contains virtual networks, and you can back up from them).
- From physical servers in other racks.
- From virtual machines and physical servers in the customer's own infrastructure, connected to our data center either by the customer's own cable or over the Internet.

What are the benefits of Avamar?
The special features of the cabinet are:
1. Deduplication. Data is stored in small blocks, and repeated data is stored as references to existing blocks. Suppose you upload 50 text documents that are really different versions of one document, or are based on a single template. During deduplication, the documents are split into a large number of variable-length blocks, and most of these blocks repeat, because each document shares much of its content with the others. Every duplicate block is replaced by a reference that weighs practically nothing. According to the manufacturer, this can shrink backups by up to 500 times. In practice, we see a 15–20 times reduction among our customers.
2. Deduplication at the source, one of the coolest features of this particular hardware and software complex. When your server is backed up, the decision about which pieces actually need to be sent is made not on Avamar after analyzing the data that has arrived, but right on the host itself. This means that the first backup is 100% of the data volume (for example, 2 TB), while the second, third, and later ones are in practice about 0.1%, that is, roughly 2 GB each (in effect, an incremental copy). Backing up a remote office, a huge database, or something similar in about a minute is simply a dream come true.
3. Compatibility with different software, specifically with the major operating systems and applications. Why is this needed? Imagine a production database that handles thousands of transactions per minute. If you copy it naively, the database will change between the start and the end of the copy, and stale, erroneous, and deleted data will end up in the backup. A million transactions can go through in an hour, and you will get a mess that cannot be untangled even by hand. So you need a software agent that takes a snapshot of the database (“freezes” it for the backup) and copies that snapshot. The agent also compresses the data and encrypts it in transit. A cabinet like ours ships with a full set of agents.
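To make the deduplication described in points 1 and 2 more concrete, here is a minimal sketch in Python. It is not Avamar's actual algorithm or protocol: the `split_into_blocks` function, the `BackupServer` class, the window and block sizes, and the use of SHA-256 are all illustrative assumptions. It shows the general idea only: the client cuts data into variable-length blocks, asks the server which block hashes it has never seen, and sends just those blocks.

```python
import hashlib


def split_into_blocks(data: bytes, window: int = 16, divisor: int = 64,
                      min_size: int = 32, max_size: int = 1024) -> list[bytes]:
    """Cut data into variable-length blocks at content-defined boundaries."""
    blocks, start = [], 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue
        # The fingerprint depends only on the last `window` bytes, so the same
        # content tends to produce the same cut points in different files.
        fingerprint = sum(data[i - window + 1:i + 1])
        if fingerprint % divisor == 0 or size >= max_size:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])
    return blocks


class BackupServer:
    """Hypothetical stand-in for the storage side: a hash -> block store."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        return {h for h in hashes if h not in self.blocks}

    def store(self, new_blocks: dict[str, bytes]) -> None:
        self.blocks.update(new_blocks)

    def restore(self, recipe: list[str]) -> bytes:
        # A backup is just an ordered list of block references.
        return b"".join(self.blocks[h] for h in recipe)


def backup(data: bytes, server: BackupServer) -> tuple[list[str], int]:
    """Client side: hash locally, send only blocks the server lacks."""
    pieces = split_into_blocks(data)
    recipe = [hashlib.sha256(p).hexdigest() for p in pieces]
    wanted = server.missing(recipe)
    to_send = {h: p for h, p in zip(recipe, pieces) if h in wanted}
    server.store(to_send)
    return recipe, sum(len(p) for p in to_send.values())


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(20_000))
    changed = original[:15_000] + b"edited" + original[15_006:]
    server = BackupServer()
    for label, data in [("first", original), ("repeat", original),
                        ("after edit", changed)]:
        recipe, sent = backup(data, server)
        assert server.restore(recipe) == data
        print(f"{label}: sent {sent} of {len(data)} bytes")
```

Because the hashing happens on the client, an unchanged file costs only the exchange of hashes, and a small edit costs only the blocks around the edit. A production system would use a stronger rolling fingerprint and much larger blocks; the simple byte sum here is chosen for readability.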
General solution scheme:

What exactly is it compatible with?
System software:
- Microsoft Windows;
- HP-UX;
- IBM AIX;
- Oracle Solaris;
- Novell NetWare;
- SUSE Linux;
- Red Hat Enterprise Linux;
- Apple Mac OS X;
- FreeBSD;
- VMware ESX / ESXi.
Application software:
- Oracle and Oracle RAC;
- Microsoft SQL Server;
- Microsoft SharePoint;
- Microsoft Exchange;
- IBM DB2;
- IBM Lotus Domino, etc.
How can this be used?
- Additional backup. It is offered as a service that keeps a second copy of your backups: given the convenience, the shift of all capital costs to operating costs, the geographic separation, and full automation, it is in great demand for storing almost any data.
- The main backup (with restrictions). This use requires either wide channels or fairly small amounts of data; otherwise you will have to sacrifice recovery speed (after all, 100% of the database will travel back over the backup channel, which can take a very long time for remote production systems).
- The main backup without restrictions. This is a rather unusual solution from CROC. It works like this: your infrastructure runs in your data center, and EMC Avamar is in ours. Backups go to it over your regular Internet channel. We also place one more server in your data center, a “mini-Avamar” in the form of a virtual appliance. The “little one” synchronizes with the “big one” and keeps the latest copies (the ones most relevant for a rollback). Older copies, which are rarely needed for a quick restore, are stored on the main site. You do not need to buy this appliance: it is also paid per use, so all costs are operational. The solution scheme is shown below.
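The recovery-speed concern in the second option, and the reason a local appliance helps in the third, comes down to dividing data volume by channel throughput. Below is a rough estimate for the 2 TB example used earlier. The channel speeds (100 Mbit/s for the Internet link, 10 Gbit/s for a local network) are assumptions chosen for illustration, and protocol overhead is ignored.

```python
TB = 10**12  # bytes, decimal terabyte


def restore_hours(data_bytes: float, channel_bits_per_second: float) -> float:
    """Idealized time to pull a full copy back through a channel, in hours."""
    seconds = data_bytes * 8 / channel_bits_per_second
    return seconds / 3600


if __name__ == "__main__":
    # Assumed speeds: a 100 Mbit/s Internet channel vs. a 10 Gbit/s local link.
    over_internet = restore_hours(2 * TB, 100e6)
    over_local_link = restore_hours(2 * TB, 10e9)
    print(f"Full restore over the Internet channel: {over_internet:.1f} h")
    print(f"Full restore from a local appliance:    {over_local_link:.1f} h")
```

Under these assumptions, the full restore over the Internet channel takes close to two days, while the same restore from a copy held on site takes well under an hour. That gap is why the most recent copies are worth keeping next to the production systems.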

Coming back to the cost of the whole solution on our site: yes, it really is high. But that cost is shared among many customers, and thanks to this “utility” model the cost for each individual customer goes down. Customers' data is completely isolated: you see only your own backups.
Customer interface screenshot
Who uses it, and how?
We have some interesting case studies. Unfortunately, I can't name the companies for now:
- A commercial company that handles frequent financial transactions backs up its Oracle databases with us.
- A large state-owned company keeps backups of virtual machines from its “cloud”.
- An insurance company is testing the solution as its main backup.
Why is it safe and convenient?
- Remote storage. The system sits away from the main infrastructure: it may be in another machine room or in another data center (if the infrastructure is connected to our data center), and in any case not where the production machines are deployed. This significantly reduces the risk of losing all information at once.
- Data is stored on disk. EMC Avamar is a disk array rather than the tape library traditionally used in such cases, so data integrity is higher.
- It is sold as a pay-per-use service, that is, per gigabyte stored. At the end of the month, a report on the volume of data in the customer's account is exported, and that gives the amount to pay. This is the “cloud approach”, and it is convenient: capital costs become operating costs.
- Technical support is outsourced. And this is proper integrator support: not a “FAQ line”, but full handling of working issues until they are resolved.
So, if you need a reliable backup, come to us: we have EMC Avamar and cookies.