EMC ViPR 2.1: Third Platform Data Management

ViPR - an element of the software-defined data center

ViPR implements for the data storage segment roughly the same as VMware did for the server segment — it creates the ability to abstract resources, create pools, and implement automation for the infrastructure. Using VMware APIs, storage pools created in EMC ViPR are presented to VMware vSphere as a simple array. In addition, the ViPR controller provides integration with VMware vStorage API for Storage Awareness (VASA), vCOps, as well as with VMware SDDC management and orchestration tools, vCloud Automation Center and vCenter Operations Manager. Thus, in ViPR, storage management can be carried out as an independent object, which is presented as such in virtual environments of Microsoft and OpenStack, as well as within the software-defined VMware data center.

The main goal of EMC ViPR development was to simplify and reduce the cost of managing existing heterogeneous storage infrastructures, as well as creating a simple data management and data access system in distributed cluster file systems, for example, based on hadopoop clusters, as well as in cloud environments.
The basic functionality of EMC ViPR is freely available without any payment and no time limit. It is represented by the components: ViPR Controller and SolutionPack (M & R - monitoring and reporting). ViPR Services components that support object, block, and HDFS access are licensed separately. To deploy ViPR with basic functionality, a single VMware ESX virtual machine with two processors is sufficient. ViPR Solution Pack needs four more virtual processors.
')
You can deploy ViPR not only on EMC hardware, but also on third-party servers. The platform can be used both to manage the storage infrastructure and to manage data hosted on hadoop clusters. In this case, ViPR is additionally deployed as an agent on a separate node.

EMC ViPR is designed for cloud environments and service providers, as well as for those corporate customers who are moving to the “IT as a service” model and are creating an internal cloud with web access. ViPR is designed based on a globally distributed architecture, which eliminates the need to move large amounts of data across the network. The platform provides horizontal scaling as the number of devices and data volumes grows, eliminates a single point of failure and allows you to build an environment with fully autonomous control and resource allocation.

Management level

ViPR Controller software is designed to simplify the management of storage infrastructure (including heterogeneous) at both local and global levels. If you compare ViPR Controller with classic storage virtualizers, it represents the “Out-Of-Band” solution, because it does not store any data inside it and does not pass through any information flow and is in fact neither storage system nor data storage virtualizer. ViPR Controller deals only with the management (administration) of the storage pool and its associated services. The creation of storage pools and their further assignment to applications occurs through the self-service portal.

ViPR Controller can significantly improve automation functions, in particular, reducing the harm to administration as it virtualizes the underlying storage infrastructure. Storage management functions, such as resource allocation and migration, are abstracted so that the various storage arrays can be managed as a single resource pool from a single console.

At the same time, the corresponding arrays, data protection tools, technological settings, and others are “attached” to each pool. Then each pool corresponds to a given service level of service.

After creating storage pools, they are shared for use by applications. For this is the self-service portal. In it, you can browse the catalog of data storage services and select the service resources that are most suitable for your tasks.

For most traditional storage infrastructures, EMC ViPR will provide only the management layer that performs storage discovery, the creation of virtual storage pools, and the assignment of these pools to applications. At the same time, the management of the entire data exchange remains at the array level.

ViPR Controller supports all types of data access: block, file, object, and access to hadoop clusters (data storage based on a distributed file system - HDFS) using iSCSI, NFS, REST, etc. protocols. At a block level, ViPR can work with SAN zoning (SAN Zoning, Brocade and Cisco switches).

The new version of ViPR Controller adds support for standard disks and a large number of third-party storage arrays thanks to the built-in support or via the OpenStack Cinder plug-in. The full list of native support includes EMC, Hitachi Data Systems solutions (AMS 2100, USP-V, VM HUS and VSP) and NetApp FAS (7-mode only), as well as standard storage systems. When installing OpenStack Cinder, ViPR also supports Dell, HP, and IBM arrays. In fact, ViPR received support from most storage systems available on the market: Dell EqualLogic, HDS (HUS), HP 3PAR (StoreServ), HP Lefthand (StoreVirtual), Huawei T / Dorado, IBM DS8000, IBM Storwize Family / SVC, IBM XIV, LVM (Reference), NetApp, Nexenta, Solaris (ZFS), SolidFire, Zadara Storage and others. A single panel in ViPR 2.0 allows you to automate and standardize the management of your existing storage infrastructure and at the same time implement support for the new, policy-based.

In addition, the new version adds support for standard disks and block data management services based on EMC ScaleIO. ViPR Controller 2.0 began to support converged infrastructures based on VCE Vblock Systems.

Support for EMC arrays has been enhanced through improved integration and administration of EMC VPLEX, EMC RecoverPoint, SRDF, and Data Domain. New features include multi-site data management through spatial storage scaling functions that provide data access, integrity and security. Multi-user functionality has been extended to support geographically distributed storage systems that scale to hundreds of clients in multiple locations in a single namespace. This means that now ViPR's object data management services can work with multiple locations, offering the most advanced spatial replication and spatial distribution features to provide a fundamentally new level of efficiency and performance. The ViPR object data management services offer additional features to meet the requirements of various regulators, as well as support for the EMC Centera CAS (Content Addressable Storage) API. As a result, EMC Centera users can still use the unique long-term storage features found in their applications on any platform supported by ViPR without changing existing software.

Since ViPR Controller is freely available, we can say that EMC is moving towards SRM solutions towards greater openness and accessibility.

Event monitoring

VIPR SolutionPack (Reporting and Monitoring) includes a number of features. For example, visualization of storage storage utilization trends by service levels and virtual storage pools (VSP) with detailed virtual arrays (VSA) is available. It is also possible to visualize trends in the use of VSA by service levels and visualize tendencies in the use of storage resources by tenants. In addition, the system allows you to monitor VIPR-events (warnings, errors, etc.), as well as their presentation for a certain time period.

Data level

In the case of traditional file-and-block-based workloads, the EMC ViPR platform “disengages” and transfers the role of the data tier located in this infrastructure to the underlying array. This model includes the majority of application workloads in the data center, and, according to EMC, such loads will increase by about 70% by 2016. But at the same time there are new application workloads that often work with huge amounts and flows of data and serve thousands or millions of users. These are the so-called “third platform” technologies, which are associated with the wide distribution of big data, mobile systems, social networks and cloud services, and create thousands of times more information than their predecessors, requiring new storage infrastructures

Features of these new applications suggest a completely new architecture. The mandatory requirement of massive scalability requires the use of a simpler approach to storage infrastructure — object data storage. At the same time, access methods also change: traditional protocols (such as NFS and iSCSI) give way to new ones, such as HDFS, which are known as the basis of the Hadoop database. To support these new architectures, object data services are implemented in the EMC ViPR platform.

ViPR's object data services provide access through HDFS and REST-based APIs that are compatible with Amazon S3 and OpenStack Swift, and thanks to this, applications written for these APIs work without any problems. They also support existing EMC Atmos, EMC VNX, and EMC Isilon arrays as a permanent layer, as well as third-party arrays and solutions based on standard servers. At the moment, this list includes about 20 storage lines.

ViPR "sees" objects in the form of files, which allows you to get the performance characteristic of file access, and eliminate the delays inherent in object data storage. In addition, the ViFS HDFS data service allows you to perform local analytics across the entire heterogeneous storage environment. As a result, the extremely time-consuming and resource-intensive task of managing heterogeneous storage environments disappears by itself.

The solution facilitates the transition to the “third platform”, providing an opportunity for coordinated and fully automated management of classic and new storage infrastructures, as well as provides integration with higher-level management and orchestration tools offered by VMware, OpenStack and Microsoft, thanks to which the storage system is seamlessly integrated into the system data center workflows and business processes.

ViPR HDFS data service

Apache Hadoop is a set of utilities, libraries and frameworks for developing and executing distributed programs running on clusters of hundreds and thousands of nodes, and consists of several modules. Hadoop Distributed File System (HDFS) is a distributed file system that writes data to standard servers, providing high aggregate bandwidth for the entire cluster. Hadoop YARN (Yet Another Resource Negotiator) is a resource management platform that is responsible for managing computing resources in clusters and using them by user applications. Hadoop MapReduce is a programming model for processing large data volumes. The Hadoop Ecosystem is an ecosystem of Apache projects such as Pig, Hive, Sqoop, Flume, Oozie, Spark, HBase, Zookeeper, etc., which add value to the project and improve its use.

ViPR HDFS Data Service Architecture

The main components of HDFS are NameNode and DataNode. The first is the central element of HDFS, which serves as a metadata server for the file system. HDFS is managed through a dedicated NameNode server, which hosts file system indexes, and a secondary NameNode, which can generate snapshots of memory structures in, thus preventing damage to the file system and reducing data loss. In HDFS, individual files are divided into blocks of fixed size. These blocks are stored in a cluster on one or more nodes, references to which are stored in DataNodes. DataNode nodes are used to process requests for reading and writing by the direction of the NameNode.

Apache Hadoop YARN - this cluster management technology is a key feature of the second generation of Hadoop and is characterized as a highly scalable distributed operating system for applications focused on working with big data. YARN combines a centralized resource manager, coordinating how an application uses Hadoop system resources with node management agents (Node Manager), which, in turn, monitor the processing of individual cluster nodes. Separating HDFS from MapReduce using YARN makes the Hadoop environment more suitable for productive (transactional) applications that cannot wait for batch jobs to complete.

It is worth noting that the native implementation of Hadoop has a number of limitations, including limited namespace and cluster performance, low file system reliability, support for only one protocol, high storage costs, inefficient processing of small files, outdated architecture, and the lack of enterprise-level features and multi-rentals. Let us dwell on these restrictions.

The namespace of the HDFS file system is managed by a single server and is stored in its memory. Its size is limited by the amount of available memory on the NameNode, and file system performance, in turn, is limited by the performance of the NameNode.

Before Hadoop 2.x, NameNode was a single point of failure. The failure of the NameNode resulted in unavailability of the cluster. Recently, the High Availability option has been added to HDFS, but it has limitations: Hot Standby NameNode cannot actively process requests; in addition, new equipment is needed to support NameNode STAND-BY.
The native HDFS implementation provides support for only one protocol for data access. Object and file access methods are not supported.

By default, HDFS replicates all data blocks three times. This leads to a doubling of storage costs, which becomes extremely redundant, for example, when archiving.
HDFS is inefficient when processing large volumes of small files, because the metadata for each file in the file system must be stored in the memory of one server — in NameNode. For example, a million files consume about 3 GB of RAM.

Since HDFS was designed almost 10 years ago, it was focused on unreliable consumer magnetic hard drives and outdated network infrastructure (1GbE). It was assumed that the bottleneck is the network, not the disk, which is wrong for modern infrastructures.

The HDFS file system lacks enterprise-class features, such as geo-distribution, disaster recovery, consistent snapshots, deduplication, parameter monitoring, etc. Also, multi-rent features that can provide guaranteed data isolation and performance for many companies are not supported. As a result - a lot of isolated clusters with low utilization.

ViPR HDFS data service allows you to get rid of the above restrictions and to make hadoop clusters as close as possible to corporate requirements, regardless of whether they are installed on file servers or / and on ECS. This hadoop-compatible file system (HCFS, Hadoop Compatible File System), which makes it possible to run applications written for Hadoop 2.2 on file arrays and / or on EMS ECS (Elastic Cloud Storage) and managed by ViPR Controller. When the ViPR HDFS client is installed on each cluster node, all requests to the node are processed by the ViPR HDFS data service client (JAR), and the native components are no longer used. The ViPR HDFS data service increases the efficiency, performance and reliability of Hadoop, while providing a number of advantages.

Thus, an ECS device can easily scale to petabyte and exabyte sizes. At the same time, the ViPR data services / ECS architecture allows scaling in performance and storage capacity independently of each other. ECS provides access within a single platform with support for multiple API objects, as well as HDFS access, which makes life easier for application developers. Geo-distributed data protection ensures complete security of information in the event of website failures and in the event of any disasters. Since the data is highly consistent, applications can access it through any ECS site, regardless of where the latest information was recorded.

Erasing coding ensures the efficiency of data storage without compromising their protection or access to it. The ECS storage engine implements the Reed Solomon 12/4 coding erase scheme, in which the block is divided into 12 data fragments and 4 coding fragments. The resulting 16 fragments are distributed between the nodes in the local site. The storage engine can recover the entire block from a minimum of 12 fragments. In addition, ViPR data services / ECS adapts to handle large numbers of both small and large files. Using a technique called box-carting, ECS can perform a large number of user transactions at the same time with very little delay. This allows ECS to support high-performance workloads. ECS is also effective for processing very large files. All nodes can simultaneously process write requests for the same object, and each node can write to a set of three disks.

It is also worth noting that the ViPR HDFS data service allows you to select multiple Hadoop-vendors and combine them for sharing services.

Advanced Packages for Software-Defined Storage Systems

Significant changes also affected the packages for software-defined storage systems EMC - ViPR SRM and Service Assurance (SA) Suite. The updated complexes provide the most visual representation of complex environments with equipment from different suppliers. In addition to supporting a wide range of EMC and third-party platforms, the ViPR SRM package provides improved integration with ViPR and VPLEX, enabling organizations to get new cost-sharing opportunities for implementing IT-as-a-service model beyond the scope of the SLA. ViPR SRM package enhancements also include advanced virtual storage management from the ViPR console. SAS 9.3 implements integration with VMware NSX, which provides deep visualization of the computing and network infrastructure in physical and virtual environments.

The ViPR family of products implements two main functions — resource management virtualization and provision of data access for cloud infrastructures, while the solutions primarily focus on large infrastructures of large data centers.

If the task is to automate the process of allocating disk resources for virtual machines, as well as tracking changes in the configuration of the environment, ViPR Controller is a solution that automates storage from any manufacturer. During the creation of a virtual machine in any virtualization environment, the necessary disks will be immediately allocated along with it. Allocation of resources and their use can be centrally monitored using ViPR SRM, which also supports solutions from many storage vendors. The ViPR product is designed in such a way that you can manage and monitor an environment of any size by parallelizing the task on many virtual machines. To increase the efficiency of the data center, you no longer need an expensive hardware virtualizer, which is located on the data exchange path, adding additional delays to the environment and slowing down the work of applications.

ViPR Data Services provides the ability to create managed cloud storage resources of any type (object, file, block) based on conventional servers with local disks. This solution has impressive scalability and was designed with the possibility of providing cloud storage for rent.

With ViPR Controller, this type of storage can be successfully integrated into a data center that uses traditional storage systems from different manufacturers. Management virtualization will create a single consolidated pool of resource allocation from servers with local disks (DAS), storage area network storage (SAN), and network connection storage (NAS).

For questions contact: emc@muk.ua.

It is worth noting that EMC solutions through a group of companies are now available in Moldova , Georgia , Azerbaijan and Kazakhstan - a distribution contract was recently signed in these countries.

MUK-Service - all types of IT repair: warranty, non-warranty repair, sale of spare parts, contract service

Source: https://habr.com/ru/post/271037/

All Articles