A backup system is an important part of any corporate information system. Properly organized, it reliably protects critical data, and in the corporate segment, data availability and integrity are key to business continuity and performance. Today's post is about a backup system with deduplication. Which one of many? You may have guessed already, but let's not get ahead of ourselves.
Implementing a data backup system lets you quickly restore information in a variety of situations, but such systems often have several drawbacks. Here are the traditional problems of corporate backup systems:
- Inefficiency: up to 200% of the data volume is copied every week.
- Slow operation: copy speed is too low.
- Unpredictable recovery time: in practice, it is usually much longer than planned.
- Limited reliability: a high risk that recovery will be impossible (above 10%, according to Gartner).
- Disaster recovery problems: tapes may be lost when they are moved to remote storage.
This is how a traditional backup system works. Today we need new backup and recovery solutions that work effectively in the face of exponential data growth, tougher regulatory requirements, strict SLAs, and shrinking backup windows.
[Figure: Pool Based Replication. For maximum efficiency, Avamar prioritizes and replicates groups in parallel.]
Data deduplication is designed to overcome these problems, and it remains one of the main topics in reducing data protection costs.
Data deduplication technology and its benefits
According to IDC, deduplication is a technology that eliminates duplicate data by keeping a single shared copy, which increases the efficiency of storage capacity use. Simply put, it reduces the amount of stored data by removing repeated data sequences.
Deduplication lets you send less data over the network and store less data. It can work at the file, block, or application level. To further reduce the amount of stored data, deduplication is often combined with compression; together they are called data compaction.
[Figure: How the deduplication algorithm works and what it does.]
In general, the deduplication algorithm searches for duplicate data blocks and replaces each one with a “data pointer”, a reference to a single stored copy of that data object. Deduplication reduces costs by minimizing redundant work when storing and transferring data.
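A minimal Python sketch of the pointer idea, assuming fixed-size 4 KiB blocks and SHA-256 fingerprints (illustrative choices, not a description of any particular product): each unique block is stored once, and the backup itself becomes an ordered list of pointers. The sketch ignores hash collisions, which production systems must account for.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size blocks, keep one copy of each unique block,
    and return the block store plus the pointer list that rebuilds the data."""
    store: dict[str, bytes] = {}   # fingerprint -> single stored copy of the block
    pointers: list[str] = []       # ordered references, one per block
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # Store the block only the first time it is seen; later duplicates
        # become pointers to the existing copy.
        store.setdefault(digest, block)
        pointers.append(digest)
    return store, pointers


def restore(store: dict[str, bytes], pointers: list[str]) -> bytes:
    """Follow the pointers to rebuild the original byte stream."""
    return b"".join(store[p] for p in pointers)


if __name__ == "__main__":
    data = b"A" * 4096 * 3 + b"B" * 4096 + b"A" * 4096
    store, pointers = deduplicate(data)
    print(len(pointers), "blocks referenced,", len(store), "blocks stored")
    # prints: 5 blocks referenced, 2 blocks stored
    assert restore(store, pointers) == data
```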
In backup systems, deduplication helps solve a number of problems and provides the following benefits.
▍ Reducing backup windows
Traditional backups transfer large amounts of data, so it is sometimes hard to fit them into the backup window, and sometimes impossible. Deduplication reduces the amount of data to back up, so backups can complete within the chosen window.
▍ Reducing recovery time and downtime
Storing a smaller amount of backup data on disk instead of tape significantly reduces recovery time. Disks also simplify status checks and other failure-prevention operations.
▍ Reducing storage costs
Deduplication reduces backup data volumes and, accordingly, reduces storage capacity requirements.
▍ A partial solution to the growing volume of corporate data that needs protection
Deduplication allows you to efficiently back up increasing amounts of stored data.
▍ Improving security
Combined with replication, deduplication lets you store copies off-site, eliminating manual handling of tape media and improving security.
▍ Simplifying data protection in organizations with branch offices
Data in remote locations requires centralized protection and recovery. With centralized backup, deduplication simplifies transferring large amounts of backup data over the network to the data center.
In general, deduplication is widely used to help cope with the rising cost of backup infrastructure as data volumes grow, while still meeting backup windows.
Without going into details, the main reasons for using deduplication in a backup system are smaller backups and lower storage capacity requirements, reduced network traffic, and a shorter backup window (backup time). It also becomes possible to run daily full backups, which enables fast one-step recovery.
Backup systems with deduplication can use various backup storage options, including tape drives and libraries, virtual tape libraries, disk arrays, storage systems (including those with built-in deduplication), and cloud backup.
Deduplication drastically reduces the amount of data transferred and stored, eliminating IT bottlenecks and reducing storage costs. On average, it reduces capacity requirements by 10 to 30 times, significantly increasing the speed and reliability of data recovery. Storing only unique data on disk also means that the data can be replicated over WAN links to remote sites for fast disaster recovery.
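A figure like "10 to 30 times" is a deduplication ratio: the logical amount of data protected divided by the physical capacity actually consumed. Converting a ratio into capacity savings is simple arithmetic (savings = 1 - 1/ratio); the small helper below makes that concrete using made-up byte counts, not measurements from any system.

```python
def dedup_stats(logical_bytes: int, physical_bytes: int) -> tuple[float, float]:
    """Return (deduplication ratio, fraction of capacity saved).

    logical_bytes:  amount of data the backups represent
    physical_bytes: amount actually stored after deduplication
    """
    ratio = logical_bytes / physical_bytes
    savings = 1 - physical_bytes / logical_bytes  # equivalently 1 - 1/ratio
    return ratio, savings


# Illustrative ratios: 5-8x is the range quoted below for databases,
# 10-30x the range quoted above for typical backup data.
for r in (5, 8, 10, 20, 30):
    ratio, savings = dedup_stats(logical_bytes=r * 1000, physical_bytes=1000)
    print(f"{ratio:.0f}x -> {savings:.1%} less capacity")
```

Note how the savings flatten out: going from 10x to 30x saves only about 6.7 percentage points more capacity, which is why vendors often quote both the ratio and the percentage.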
To minimize disk space requirements, some storage systems have built-in deduplication that runs “on the fly” (during backup or archiving), so only deduplicated data is written to disk.
Types of deduplication and its effectiveness
There are different types of deduplication, each with its advantages and disadvantages. Inline (on-the-fly) deduplication is performed as data arrives, so redundancy is eliminated before the data is written to storage. This reduces the number of I/O operations and the storage required, but deduplication must be fast enough not to slow down the backup. This type of deduplication is typically performed at the data source.
Post-process deduplication copies data as fast as possible, but it needs additional capacity to hold the data temporarily until it is deduplicated. Sometimes the two methods are combined: inline deduplication runs until the data arrival rate reaches a certain limit, after which the system switches to post-process deduplication.
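The combined scheme can be sketched in Python as below. The class, its `max_inline_rate` threshold, and the use of SHA-256 fingerprints are hypothetical illustrations of the switching logic, not any vendor's implementation. Chunks are deduplicated inline while the arrival rate stays at or below the limit; above it, raw chunks land in a staging area, which is the "additional capacity" post-processing needs.

```python
import hashlib
from collections import deque


class HybridDeduplicator:
    """Hypothetical hybrid scheme: inline deduplication below a rate limit,
    staging plus post-process deduplication above it."""

    def __init__(self, max_inline_rate: float):
        self.max_inline_rate = max_inline_rate  # e.g. MB/s the CPU can keep up with
        self.store: dict[str, bytes] = {}       # deduplicated chunk store
        self.staging: deque = deque()           # temporary capacity for raw chunks

    def _dedup(self, chunk: bytes) -> None:
        # Keep a chunk only if its fingerprint is new.
        self.store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)

    def ingest(self, chunk: bytes, arrival_rate: float) -> str:
        if arrival_rate <= self.max_inline_rate:
            self._dedup(chunk)          # inline: remove redundancy before writing
            return "inline"
        self.staging.append(chunk)      # too fast: write raw data first
        return "staged"

    def post_process(self) -> int:
        """Drain the staging area (for example, after the backup window)."""
        drained = 0
        while self.staging:
            self._dedup(self.staging.popleft())
            drained += 1
        return drained


d = HybridDeduplicator(max_inline_rate=500)
print(d.ingest(b"x" * 10, arrival_rate=300))   # handled inline
print(d.ingest(b"y" * 10, arrival_rate=900))   # staged for later
print(d.post_process(), "chunk(s) post-processed")
```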
[Figure: Comparison of source deduplication and post-process deduplication.]
These compromises exist because deduplication is costly: it consumes computing resources and reduces system performance. You therefore need to consider which costs are unavoidable with each method.
Thus, there are two main deduplication schemes. In the first, data is processed on the client side (“at the source”) and sent to storage only after deduplication. In the second, deduplication takes place on the target storage devices. The first method reduces network load, but it requires installing specialized software on clients and consumes client computing resources. The second requires more powerful and expensive storage systems, and it does not reduce the cost of the initial data transfer.
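The network saving of source-side deduplication comes from a simple exchange: the client fingerprints its chunks locally, asks the target which fingerprints it has never seen, and sends only those chunks. The Python sketch below models that exchange in-process. `BackupServer`, `missing`, and `put` are hypothetical names rather than a real product API. For simplicity the sketch counts only chunk payload bytes and ignores the small cost of sending the fingerprints themselves.

```python
import hashlib


class BackupServer:
    """Target side: holds the chunk index and the stored chunks."""

    def __init__(self):
        self.chunks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        # Report which fingerprints the server has never stored.
        return {d for d in digests if d not in self.chunks}

    def put(self, digest: str, chunk: bytes) -> None:
        self.chunks[digest] = chunk


def source_side_backup(server: BackupServer, chunks: list[bytes]) -> int:
    """Client side: hash locally (client CPU), upload only unknown chunks.
    Returns the number of chunk payload bytes sent over the network."""
    by_digest = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    need = server.missing(list(by_digest))
    for digest in need:
        server.put(digest, by_digest[digest])
    return sum(len(by_digest[d]) for d in need)


server = BackupServer()
day1 = [b"a" * 1000, b"b" * 1000, b"c" * 1000]
day2 = [b"a" * 1000, b"b" * 1000, b"d" * 1000]   # one chunk changed
print(source_side_backup(server, day1))  # first backup: everything is new
print(source_side_backup(server, day2))  # second backup: only the changed chunk
```

With target-side deduplication, by contrast, both days would send all 3,000 bytes over the network and the duplicates would be discarded only on arrival.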
Deduplication can be performed at the source or directly on the backup storage. Where it runs determines how the computational load is distributed, and that load also changes with block size. In any case, keep in mind that deduplication places relatively high demands on system resources.
Deduplication is most effective when data is highly redundant, and when data is copied or saved again after minor changes. In general, unstructured data such as document files, virtual machines, email, and archives deduplicates well: the volume is often reduced 20 to 30 times. Structured data, such as databases, gives a lower ratio (usually 5 to 8 times).
[Figure: Examples of deduplication effectiveness.]
Some hash-based schemes pre-process the data to increase the deduplication ratio. Variable-length block deduplication on the client side significantly reduces backup time, since only unique segments are saved. This method is more effective than traditional fixed-length segment deduplication, where even a small insertion shifts the boundaries of all subsequent blocks, so much of the file is backed up again.
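Why variable-length segments help can be shown with content-defined chunking, sketched here in Python using a polynomial rolling hash over a 16-byte window. A boundary is cut wherever the window's hash has its low 6 bits equal to zero, which gives roughly 64-byte chunks on random data. All parameters are illustrative assumptions, not Avamar's actual segmentation, and production chunkers also enforce minimum and maximum chunk sizes, which this sketch omits. Inserting a single byte at the front destroys every fixed-size block match, while content-defined boundaries re-synchronize right after the edit.

```python
import hashlib
import random

WINDOW = 16            # bytes in the rolling window (illustrative)
MASK = 63              # cut when hash & MASK == 0: ~64-byte average chunks
BASE = 257
MOD = (1 << 31) - 1


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking: boundaries depend only on the last WINDOW
    bytes, so an edit shifts data without moving later boundaries."""
    pow_out = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * pow_out) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                      # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of the new version's chunks already present in the old backup."""
    seen = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in seen for c in new) / len(new)


rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(20_000))
edited = b"X" + original              # one byte inserted at the front

print("fixed-size:", shared_fraction(fixed_chunks(original), fixed_chunks(edited)))
print("variable:  ", shared_fraction(cdc_chunks(original), cdc_chunks(edited)))
```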
[Figure: Data deduplication technology.]
Variable-length segment deduplication, performed on the client device, is implemented in the Dell EMC Avamar backup system.
[Figure: Deduplication in Avamar.]
Avamar: solution architecture
Avamar is a comprehensive hardware and software solution for data backup and recovery that works in both physical and virtual environments. It optimizes data transfer over different types of networks. Recovery is performed in one step. Avamar can be configured to work with specific software (for example, Oracle databases) and with virtual environments.
Dell EMC Avamar, part of the Data Protection Suite product family and integrated with Dell EMC Data Domain storage systems, can be deployed in a variety of ways and used for daily full backups. It supports virtualized and physical environments, enterprise applications, network-attached storage (NAS), PCs, and data protection for remote offices.
Avamar deployment options
| Option | Description |
|---|---|
| Avamar Data Store | The simplest and fastest option: Data Store combines a Dell EMC certified dedicated backup appliance with Avamar deduplication software. This turnkey solution avoids integration complexity. |
| Avamar Business Edition | A dedicated backup deduplication appliance for medium-sized companies, featuring simple management, built-in storage fault tolerance, and optional replication for disaster recovery. |
| Avamar Virtual Edition | Avamar backup software with deduplication, deployed as a virtual appliance in vSphere, Hyper-V, or Azure. |
| Data Domain Integration | Combined with Data Domain deduplication storage systems, Avamar can be used for all backup workloads. |
To create backups faster and more efficiently, Avamar takes a multi-threaded approach with Dell EMC Data Domain Boost software. This reduces network load by 99%, requires 30 times less backup storage capacity, and cuts backup time in half. Image recovery is also up to 30 times faster than with traditional backup.
[Figure: Avamar + Data Domain support different types of workloads.]
The multi-threaded approach with Dell EMC Data Domain Boost software means that multiple parallel streams can be created within a single backup session to speed up operations.
[Figure: Deduplication at the first copy.]
Notably, the same data is never backed up twice. The system splits the data to be backed up into variable-length segments, compresses them, and assigns each segment a unique hash identifier. It then checks whether each segment has already been backed up and copies only the unique ones.
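The per-segment flow can be sketched in Python as below. This is a hypothetical illustration, not Avamar's code or on-disk format. It assumes the segments already come from a variable-length chunker, uses SHA-256 of the raw segment as the identifier, and compresses with `zlib` only the segments that turn out to be new. Combining the two steps like this is the "compaction" described earlier.

```python
import hashlib
import zlib


class SegmentStore:
    """Hypothetical store: one compressed copy per unique segment ID."""

    def __init__(self):
        self.index: dict[str, bytes] = {}   # segment ID -> compressed segment

    def backup(self, segments: list[bytes]) -> tuple[list[str], int]:
        """Return the backup's ordered segment IDs and how many were new."""
        ids, new = [], 0
        for seg in segments:
            # Assumption: the ID is a hash of the raw (uncompressed) segment.
            seg_id = hashlib.sha256(seg).hexdigest()
            if seg_id not in self.index:            # never backed up before
                self.index[seg_id] = zlib.compress(seg)
                new += 1
            ids.append(seg_id)
        return ids, new

    def restore(self, ids: list[str]) -> bytes:
        return b"".join(zlib.decompress(self.index[i]) for i in ids)

    def stored_bytes(self) -> int:
        return sum(len(v) for v in self.index.values())


store = SegmentStore()
segments = [b"hello world " * 100, b"another segment " * 50]
ids1, new1 = store.backup(segments)
ids2, new2 = store.backup(segments + [b"changed " * 30])
print(new1, "new segments on day 1;", new2, "on day 2")
print(store.stored_bytes(), "bytes stored for",
      sum(len(s) for s in segments) * 2 + 240, "logical bytes")
```

Hashing the raw segment rather than its compressed form is a design choice made in this sketch: the identifier then does not depend on the compression library or level, and already-known segments are never compressed at all.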
[Figure: Avamar deployment options.]
The Avamar Data Store combines Dell EMC certified hardware and Avamar backup software with deduplication in a scalable, ready-to-use solution. Data Store eliminates single points of failure using patented Redundant Array of Independent Nodes (RAIN) technology and ensures high availability of all nodes. System and data integrity checks run daily to ensure recoverability.
Key features of Avamar:
- Global client-side deduplication reduces the amount of backup data both on the client side and globally. Daily backup time is reduced by 90%, network bandwidth usage by 99%, and total disk storage capacity by 95%.
- Data encryption in flight and at rest for security.
- Patented RAIN architecture for high availability of Avamar nodes.
- The integrity of the Avamar server and the recoverability of backup data are checked daily.
- Centralized management simplifies backups of remote offices from the data center.
- Fast one-step data recovery: there is no need to restore the latest full backup and then apply the incremental backups.
- Long-term retention in a virtual tape library or on tape.
- Compliance with regulatory requirements.
- Backup of VMware and Microsoft Hyper-V virtual environments. Physical and virtual deployment options for Avamar.
- Multi-threaded integration with Data Domain storage systems. High performance backup and restore for specific applications.
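One-step recovery follows naturally from deduplicated storage. Every backup can be recorded as a complete manifest (file path to content hash) over one shared chunk store. Unchanged files cost only a reference, so any point in time restores in a single pass, with no "full plus incrementals" chain to replay. The Python sketch below illustrates that idea. `ManifestCatalog` and its methods are hypothetical names, and this is not how Avamar stores its catalog internally.

```python
import hashlib


class ManifestCatalog:
    """Hypothetical catalog: each backup is a full manifest over a shared,
    deduplicated store, so every backup is restorable on its own."""

    def __init__(self):
        self.store: dict[str, bytes] = {}        # content hash -> file content
        self.backups: list[dict[str, str]] = []  # one full manifest per backup

    def backup(self, files: dict[str, bytes]) -> int:
        manifest = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            self.store.setdefault(digest, content)  # unchanged files add nothing
            manifest[path] = digest
        self.backups.append(manifest)
        return len(self.backups) - 1             # backup ID

    def restore(self, backup_id: int) -> dict[str, bytes]:
        # One step: look up every file of this backup directly.
        return {p: self.store[d] for p, d in self.backups[backup_id].items()}


catalog = ManifestCatalog()
monday = catalog.backup({"a.txt": b"v1", "b.txt": b"same"})
tuesday = catalog.backup({"a.txt": b"v2", "b.txt": b"same"})
print(catalog.restore(monday))
print(catalog.restore(tuesday))
print(len(catalog.store), "unique contents stored for 2 full backups")
```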
[Figure: Deduplication in Avamar.]
Among the solution's technical features are:
- Flexibility and scalability with virtual proxies for VMware protection.
- A dedicated cache of unique files and blocks, which makes it possible to traverse the file system much faster than traditional backups do.
- Support for most well-known enterprise applications, such as SAP, Oracle, MS SQL and others, as well as for virtual and physical environments.
- Support for CBT technology when restoring virtual machines, which makes it possible to recover large VMs quickly.
- A dedicated vCenter plug-in for management directly from that console.
Why and how can such a system be used?
Avamar usage scenarios
▍1. Virtualized environments
You can back up and restore virtual machines both at the guest OS level and at the VMware vSphere and Microsoft Hyper-V image level. Avamar integrates with VMware vCenter, VMware vCloud Director, VMware vStorage APIs for Data Protection, vSphere Web Client, and Microsoft Hyper-V. There is also a special edition, Avamar Virtual Edition (AVE), a software-only solution for protecting virtual environments.
[Figure: Using Avamar to back up VMs in VMware with storage system snapshots and software agents.]
In a VMware environment, Avamar uses VMware Changed Block Tracking (CBT) to track changed blocks. This speeds up backup and recovery, and intelligent load balancing across multiple proxies increases throughput.
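The principle behind changed block tracking is that every write marks its block as dirty, so the next backup reads only those blocks instead of scanning the whole virtual disk. The toy Python model below illustrates only that principle. `TrackedDisk` is a hypothetical class, and real VMware CBT is maintained by the hypervisor and queried through the vSphere API, not implemented like this.

```python
class TrackedDisk:
    """Toy virtual disk with changed block tracking (illustrative only)."""

    def __init__(self, num_blocks: int, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        # Nothing has been backed up yet, so every block counts as changed.
        self.changed: set[int] = set(range(num_blocks))

    def write(self, block_no: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("writes must cover exactly one block")
        self.blocks[block_no] = data
        self.changed.add(block_no)       # record the change as it happens

    def incremental_backup(self) -> dict[int, bytes]:
        """Return only blocks changed since the previous backup, then reset."""
        delta = {n: self.blocks[n] for n in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(num_blocks=1000, block_size=512)
print(len(disk.incremental_backup()), "blocks in the first (full) backup")
disk.write(7, b"x" * 512)
disk.write(42, b"y" * 512)
print(sorted(disk.incremental_backup()), "changed blocks in the next backup")
```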
[Figure: Benefits of Avamar in protecting VMware virtual environments.]
Avamar provides fast one-step recovery of individual files or complete VM images to the original VM, an alternate VM, or a new VM. If VM images are stored on Data Domain, you can access a virtual machine immediately and then move it to production with vMotion.
[Figure: Backing up guest virtual machines.]
Notably, VMware vSphere Data Protection (VDP), a product for backing up virtual machines on ESXi servers, is built on Avamar and deployed as a virtual appliance. The software is agentless, uses CBT, and can recover not only entire VMs but also individual files through a browser.
Dell EMC Avamar also supports backing up and restoring virtual machines running on VMware vSAN. By integrating Avamar with vRealize Automation (vRA) and vCloud Director (vCD), you can offer data protection services for public, private, and hybrid clouds, as well as for cloud software. The vRA extension and the vCD Data Protection Extension build backup services directly into vRealize Automation and vCloud Director.
Avamar also protects data in private and hosted Microsoft clouds, including Hyper-V and Azure. Administrators assign data protection policies using Microsoft System Center Virtual Machine Manager, and the policies are applied when virtual machine resources are provisioned.
By the way, Data Domain Cloud Tier supports Azure, Amazon, Virtustream, and any S3-compatible devices.
[Figure: Disaster-proof solution: backing up VMs to the cloud.]
[Figure: Cloud disaster recovery architecture.]
Avamar protects each VM using an image-based backup method, reading VM files directly from the Hyper-V file system. This is more efficient than using agents. Cluster Shared Volumes (CSV) allow multiple nodes to access all disks in a cluster.
▍2. Backing up network-attached storage
Using NDMP (Network Data Management Protocol) reduces NAS backup time: you no longer need to repeatedly create lengthy baseline full backups.
Avamar backs up and restores network storage systems using the Avamar NDMP Accelerator node. With this node, a level 0 (full) backup is performed only once, during the first full backup; after that, only level 1 incremental dumps are applied. This significantly reduces backup time and the load on NAS resources, and it simplifies and speeds up recovery.
In addition, meeting backup windows does not impose limits on the number or size of files and volumes. Thanks to multithreading, Avamar can work with large scale-out NAS systems (including Dell EMC Isilon), providing higher throughput when transferring backup data.
Avamar's fast incremental backup architecture also lets it work with HDFS, eliminating the bottlenecks inherent in traditional file-based backup.
▍3. Backing up desktop and laptop data
This option lets end users protect and recover their data themselves, without waiting for IT staff. The system deduplicates data, backs up open files, throttles its CPU usage, and does not interfere with users' work. Backups run automatically during scheduled backup windows whenever the PC is connected to the network.
Users can run backups on demand and quickly restore their data. An intuitive interface and built-in search simplify the process, and recovery is always performed in one step. The Dell EMC Avamar workstation solution is designed for usability and scalability and offers a complete set of tools for protecting data on distributed networks of PCs and laptops. Customers who deploy Dell EMC Avamar gain real benefits from using deduplication to optimize backup and restore procedures.
A pressing problem today is protection against ransomware, which is rapidly gaining ground. According to TrendLabs, the number of known ransomware families grew by 752% in 2016, from 29 in 2015 to 247 by the end of 2016. According to Kaspersky Lab, 11 new ransomware families and 55,679 modifications appeared in Q1 2017 (for comparison, 70,837 modifications appeared in Q2 through Q4 of 2016). Ransomware attacks primarily target the most popular operating systems: Windows on workstations and servers, and Android on mobile devices.
To protect the backup and recovery environment from outside attacks, Dell EMC offers the Isolated Recovery Solution. A backup system embedded in the customer's infrastructure (for example, one based on Dell EMC Data Domain) is separated from the rest of the infrastructure by a special gateway and is connected to it only at certain points in time to perform backups, so the backups remain recoverable.
The main idea of Isolated Recovery, then, is to create an isolated segment of the IT infrastructure that cannot be accessed from outside the network perimeter and to which internal users have only very limited access. The most business-critical information is backed up to this segment.
The isolated segment is connected to the main system through a gateway that opens to create backups at unpredictable points in time, and the stored data is additionally scanned for malicious code. Even if a cyberattack destroys the production environment, Isolated Recovery combined with Data Domain preserves the backups so they can be used to resume work. However, implementation requires a preliminary analysis of the infrastructure, identification of the most critical components, and development of a data recovery strategy.
▍4. Backing up remote office data
The system lets you centralize backup and recovery for remote offices through a single interface.
[Figure: Remote office data protection with Avamar and Data Domain.]
In remote offices, lightweight Avamar software agents can be deployed on servers without any additional hardware on site. Data is then backed up over existing WAN links to the central Avamar Data Store system in the data center.
[Figure: Multisystem Management: centralized management of all Avamar topologies.]
▍5. Business-critical applications
Avamar provides backup and recovery for IBM, Microsoft, Oracle, and SAP applications with high-performance deduplication, as well as management capabilities for application owners. Unlike competing solutions, in addition to traditional image-level backup, it can produce correct, application-consistent backups using dedicated integration modules.
In addition, when backing up specific applications, Avamar uses Dell EMC Data Domain Boost software to send business-critical application data directly to the Data Domain system. By combining this software with deduplication hardware, companies can unify their data protection processes and get a high-performance, scalable backup and recovery solution.
The system is offered in the following versions:
| Product | Dell EMC Avamar | Dell EMC Avamar Virtual Edition | Dell EMC Avamar Business Edition |
|---|---|---|---|
| Description | The Dell EMC Avamar backup system with deduplication provides fast daily full backups and one-step data recovery. | Protects virtual machines in virtual environments and remote offices. Software edition for VMware vSphere, Microsoft Hyper-V, and Azure. | A turnkey backup appliance with integrated data deduplication, designed for medium-sized customers. |
| Backup storage | Dell EMC Data Domain systems, Dell EMC Avamar Data Store | - | Dell EMC Avamar Data Store |
| Supported applications and databases | IBM DB2, IBM Domino (Lotus), Microsoft Exchange Server, Microsoft SQL Server, Oracle, SAP, Sybase, Microsoft SharePoint | - | Microsoft Exchange Server, Oracle, Microsoft SQL Server, IBM DB2, IBM Domino (Lotus), SAP, Microsoft SharePoint, Sybase |
| Virtualization support | Microsoft Hyper-V, VMware, OpenStack | - | - |
| Hypervisors | - | VMware, Microsoft Hyper-V | - |
| Form factor | - | Software | Backup appliance |
| Deduplication capacity | - | - | 3.9 or 7.8 TB |
Finally, a brief comparison of Avamar with another well-known backup and recovery product.
Avamar vs. Veeam
Veeam features:
- Data loss prevention
- Automated Disaster Recovery
- Scale-out backup repository
- "Sandbox on request"
- Instant file level recovery
- Reports
- Monitoring
- Standalone console
- Capacity planning
- Forecasting
- Deduplication within a single backup job
Avamar features:
- Network bandwidth optimization
- Fast recovery
- Multiformat backup
- Deduplication technology
- High reliability servers
- Large capacity storage
- Scaling options
- Guaranteed uptime
- Continuous availability
- Training
- Global deduplication: data is copied only once
Summary
Avamar software is aimed at customers who want to back up and restore virtual environments and workstations. It is suitable for protecting virtual environments, enterprise applications, remote offices, desktops and laptops, and network-attached storage (NAS). Organizations that use VMware vCenter, VMware vCloud Director, VMware vSphere Web Client, Microsoft Hyper-V, or the VMware vSphere APIs for Data Protection can use it to optimize backup and recovery of virtual machines.
Avamar delivers fast backup and recovery: it shortens backup time by storing only daily changes, optimizes network bandwidth, and provides single-step recovery. It is highly reliable thanks to RAID and RAIN technologies, offers different deployment options, and scales to 124 TB with deduplication. Desktop and laptop users can restore their own data through self-service.
Avamar's deduplication capabilities are more advanced than those of similar tools. With global deduplication, data is copied once and stored in the deduplication pool once, regardless of the type or number of external devices or sessions.
For example, Veeam deduplicates only within a single backup job, so the next job copies even duplicate data again. In the HPE backup solution, if two logical devices are created in the system, there is no global deduplication between them.
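The difference between global and per-job deduplication comes down to the scope of the fingerprint index. The Python sketch below counts how many 4 KiB chunks get written when two VM backup jobs share the same OS image. With one global index the shared chunks are stored once; with a fresh index per job they are stored again. The function names and data are hypothetical, and this is not a model of how Veeam, HPE, or Avamar actually index data.

```python
import hashlib


def chunk_ids(data: bytes, size: int = 4096) -> list[str]:
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def stored_chunks(jobs: list[bytes], global_pool: bool) -> int:
    """Count chunks written to storage for a series of backup jobs.

    global_pool=True:  one index shared by every job, client, and session.
    global_pool=False: each job gets its own index (dedup only within a job).
    """
    shared: set[str] = set()
    total = 0
    for job in jobs:
        index = shared if global_pool else set()
        for cid in chunk_ids(job):
            if cid not in index:
                index.add(cid)
                total += 1
    return total


# Two VMs built from the same 10-chunk OS image, each with one unique chunk.
os_image = b"".join(bytes([i]) * 4096 for i in range(10))
job_a = os_image + b"a" * 4096
job_b = os_image + b"b" * 4096
print("per-job index:", stored_chunks([job_a, job_b], global_pool=False))
print("global index: ", stored_chunks([job_a, job_b], global_pool=True))
```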
The benefits of Avamar include support for protecting workstations, dedicated modules for backing up most business applications (Oracle, SQL Server, Exchange, SharePoint, SAP, Sybase, IBM DB2, etc.), and support for protecting both virtual and physical servers.
Its backup and recovery tools are tightly integrated with popular virtualization platforms such as Hyper-V and VMware vSphere. In addition, Avamar can function as part of the Integrated Data Protection Appliance. New product features include cloud support and tiering. Avamar can also provide long-term storage in the cloud, both with and without Data Domain.