
The rapid growth in the volume of information generated worldwide is talked about everywhere. It usually comes up in connection with network infrastructure, consumer content, search technology and much else. The corporate segment is no different: in most organizations, the amount of stored information keeps multiplying. According to a Forrester Research report, about 85% of the data in corporate systems is static content that will never change. Government and regulatory requirements oblige organizations to keep certain information for several years (for example, records of all customers and completed transactions). As a result, businesses have to spend significant sums on storing this information, investing in servers, storage systems, software and so on.
Another consequence of the growth in stored data is that many companies want to be able to analyze and search all of the information they hold. Beyond a certain point, this becomes a Big Data processing task. Companies therefore look for more cost-effective solutions that are better suited to storing and working with such volumes of information.
An interesting example is Nokia, which recently sold its mobile division to Microsoft. Under the terms of the contract, the Finnish company had to hand over the unit's entire information archive to the new owner. Given the large amount of data, Nokia took a creative approach: it acquired a compact archival storage system, loaded all the necessary information into its database, and then simply shipped the whole system to Microsoft.
Speaking of the growth in stored information, data accumulated from obsolete applications also deserves a mention. As information systems are upgraded, working environments change, new software packages are introduced and database structures are rebuilt. As a result, a large body of information accumulates in forms the organization no longer uses. Because this data often still has to remain available, additional money is spent for years on maintaining equipment and software that is otherwise no longer needed.
About myths
Today, many tend to view archival systems as an outdated or irrelevant technology. For example, backup is thought to be a perfectly good replacement for archiving. In fact, the two are not interchangeable. Unlike a backup, an archive is designed to store information without redundant duplication; it lets you structure and index data, and it provides searchable access to that data with optional encryption and configurable policies. In addition, moving static data to the archive reduces the load on applications and lets you get by with cheaper server clusters and storage systems.
It is also widely believed that an archive is an indiscriminate pile of information that reflects the company's history but is of no use for current or future business problems. However, as mentioned above, there is a trend toward analyzing the data a company has accumulated about its operations over its lifetime. According to Gartner's forecast, by 2017 about 75% of organizations will use their own archive as the primary source of information. Today about 10% do.
The next prejudice against archiving stems from the desire to hide inconvenient information from the regulator: after all, it is much easier to find something in an archive. But there is a downside: fines for failing to provide information requested by the regulator can run to millions of dollars, many times more than the cost of creating an archive.
Speaking of costs, there is an opinion that archiving is expensive. In fact, archival systems provide significant savings, thanks to cheaper storage media, lower support costs and better performance of the main production systems. It is also worth recalling that 2014 set a record for the number of information leaks, and the volume of stolen data increased by 78% compared with 2013. Reputational damage and litigation can cost far more than an archive system with data encryption.
Finally, another argument against building an archive is that an ECM platform provides the same functionality. There are, however, a number of differences. First, an archive is designed to work with structured and unstructured data at the same time, and it is optimized for storing billions of records and documents. Second, as noted above, storing data in the archive is cheaper because the data moves to less expensive media, backups become smaller, and resources on the production system are freed up.
The modern archive
A modern archive system addresses five main tasks:
- Saving data for future use.
- Ensuring constant user access to stored data.
- Ensuring confidentiality of access.
- Reducing the load on production systems by moving static data to the archive.
- Using data retention policies.
Another important property of the archive is that it stores structured and unstructured information in a single database. Naturally, the database should be deployed on a separate, horizontally scalable storage system, so that the archive can be expanded painlessly as the data grows.
One such solution is EMC InfoArchive. It is a combined product: a bundle of storage plus a software platform for archiving and encryption. InfoArchive is also useful when you need to store legacy data from heterogeneous systems in different formats, and for analyzing data lakes. A "data lake" is a repository holding a very large amount of raw data in its original formats, without any hierarchical structure.
Depending on the specific conditions (the amount of structured and unstructured data, the presence and composition of legacy systems supported by the company, the need for analytical tools, the creation of cloud services, and so on), InfoArchive can be built on EMC Isilon, DataDomain, Atmos or Centera. EMC Documentum Dynamic Delivery Services (DDS), which is based on xDB and uses a number of international standards, including the open XML and OAIS (Open Archival Information System) standards, is deployed on the selected storage system.

A distinctive feature of InfoArchive is that all data must be transferred to the system either as SIP (Submission Information Package) packages, in line with the OAIS standard, or as simple XML structures if the customer does not require OAIS compliance. All information stored in InfoArchive can also be accessed via JDBC for later use or recovery in the original application.
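
To make the idea of a submission package more concrete, here is a minimal Python sketch that bundles a descriptor and a data file into a single ZIP. The `build_sip` helper, the file names and the element names (`sip`, `holding`, `record_count`, `record`) are hypothetical. They only illustrate the OAIS idea of shipping content together with descriptive metadata and are not the actual InfoArchive schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sip(holding, records):
    """Return the bytes of a ZIP holding a descriptor and a data file.

    The layout and element names are illustrative assumptions only.
    """
    # Descriptor: what the package is and how many records it carries.
    descriptor = ET.Element("sip")
    ET.SubElement(descriptor, "holding").text = holding
    ET.SubElement(descriptor, "record_count").text = str(len(records))

    # Payload: one <record> element per archived item.
    payload = ET.Element("records")
    for rec in records:
        node = ET.SubElement(payload, "record", id=str(rec["id"]))
        for key, value in rec.items():
            if key != "id":
                ET.SubElement(node, key).text = str(value)

    # Write both XML documents into an in-memory ZIP archive.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("descriptor.xml", ET.tostring(descriptor, encoding="utf-8"))
        archive.writestr("data.xml", ET.tostring(payload, encoding="utf-8"))
    return buffer.getvalue()


# Hypothetical sample data, not taken from any real system.
package = build_sip("invoices-2014", [
    {"id": 1, "customer": "ACME", "total": "120.50"},
    {"id": 2, "customer": "Globex", "total": "89.00"},
])
```

Keeping the descriptor separate from the payload means a receiving system could, in principle, validate the package before reading the data itself.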

Data in relational databases is represented as related tables. When a user requests some information, the application queries the tables, aggregates the responses and presents the result to the user.
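
The contrast with a simple XML structure can be sketched in Python using the standard `sqlite3` module: the join the application would perform at query time is run once, and the result is written as a self-contained XML record that no longer depends on the table layout. The table names, columns and XML element names are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two related tables, as a source application might hold them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'ACME');
    INSERT INTO orders VALUES (10, 1, 120.5), (11, 1, 89.0);
""")

# What the application does at query time: join the related tables.
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# What an archived record could look like: the joined result in one document.
customer = ET.Element("customer", name=rows[0][0])
for _, order_id, total in rows:
    ET.SubElement(customer, "order", id=str(order_id), total=f"{total:.2f}")

xml_text = ET.tostring(customer, encoding="unicode")
print(xml_text)
```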




XML files are used to store, organize and transfer structured information and the metadata of unstructured information. This makes it possible to create an archive that combines data from disparate applications. InfoArchive can search all stored data, apply storage policies, encrypt data on the fly, and control access to particular data and data sets. Regardless of the size of the archive, only one DBMS is used.
System performance depends on the number and configuration of the storage systems, as well as on the configuration of the platform itself. For example, for some clients InfoArchive ingests structured data at up to 2 million records per hour (up to 60 GB per hour). The system can process up to 15,000 search requests per hour; a search for a single document takes 0.5 seconds on average, and a search for records takes 2.5 seconds.
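
For a sense of scale, the hourly figures above can be converted to per-second rates. The short Python calculation below assumes decimal units (1 GB = 10^9 bytes) and that the record and volume peaks describe the same workload, which the source does not state.

```python
RECORDS_PER_HOUR = 2_000_000
BYTES_PER_HOUR = 60 * 10**9   # assumption: decimal gigabytes
SEARCHES_PER_HOUR = 15_000
SECONDS_PER_HOUR = 3600

records_per_second = RECORDS_PER_HOUR / SECONDS_PER_HOUR
megabytes_per_second = BYTES_PER_HOUR / SECONDS_PER_HOUR / 10**6
searches_per_second = SEARCHES_PER_HOUR / SECONDS_PER_HOUR

# Only meaningful if both ingest peaks refer to the same workload.
avg_record_kilobytes = BYTES_PER_HOUR / RECORDS_PER_HOUR / 10**3

print(f"ingest:   {records_per_second:.0f} records/s, {megabytes_per_second:.1f} MB/s")
print(f"search:   {searches_per_second:.2f} requests/s")
print(f"avg size: {avg_record_kilobytes:.0f} KB per record")
```

Under these assumptions the figures work out to roughly 556 records and 16.7 MB per second on ingest, about 4 search requests per second, and an average record size of about 30 KB.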
Data is protected with access policies and with encryption based on EMC RSA KeyManager. InfoArchive can also be integrated with other encryption systems.
Conclusion
Today, archiving systems are being introduced first of all in companies whose most pressing problem is the growing volume of data that must be stored and made available at the request of regulators. These are primarily the financial sector, telecommunications, utilities and the public sector. And as our company's practice shows, medium-sized companies that are actively trying to strengthen their market position are also showing keen interest in archives. This is clear evidence that the information age has arrived.