Unique in its capabilities, the HPE Vertica DBMS easily handles data not only from business transactions but also from machine-to-machine interaction and the Internet of Things, allowing you to control the world of smart devices in real time.
The global economy is entering the era of the Internet of Things and massive machine-to-machine interaction. According to David Jones, senior vice president and general manager of HPE's Information Management and Governance business unit, this means that soon, around 2020, the world will have to process data from 50 billion smart devices and one trillion applications, about 44 ZB in total. There is no doubt that earlier DBMSs, designed to process the transactional data circulating in traditional business applications, will not cope with such a load. They are being replaced by a new generation of DBMSs designed from the start to work with large data volumes and data streams. One of them is HPE Vertica, which can analyze in real time huge amounts of information received from various data “generators”: not only traditional transactional systems, but also Internet of Things sensors and devices, machine-to-machine systems, automated process control systems, websites and other sources.
The brainchild of a world-class DBMS genius
Work on Vertica began in 2005 at Vertica Systems (which became part of Hewlett-Packard in 2011). Its creator is Michael Stonebraker, a professor at the Massachusetts Institute of Technology and winner of the 2014 Turing Award, the most prestigious prize in IT, awarded for "fundamental contributions to the concepts and practices underlying modern database systems." The names of his creations are familiar to every DBMS specialist: Postgres, Ingres, Informix, VoltDB and several others.
Michael Stonebraker built Vertica on the following principles.
- Analytical (non-transactional) data processing: both large, complex analytical queries and short ones must be processed very quickly, in real time.
- The platform architecture is designed for massively parallel data processing without shared resources (the only exception is the network that connects the compute nodes on which data is processed in parallel).
- The platform scales linearly and runs on standard x86 server hardware.
- Support is provided for the standard SQL query language used in relational DBMSs (that is, from the user's point of view, the platform looks like a regular relational DBMS, but performs analytical tasks).
- The platform must support the ACID properties of transactions: atomicity, consistency, isolation and durability.
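As a rough illustration of the shared-nothing, massively parallel principle listed above, here is a minimal Python sketch. It is not Vertica code: the node count, the sample rows and the helper names `segment`, `local_sum` and `parallel_sum` are all hypothetical, and threads merely stand in for separate servers. Each row is assigned to exactly one "node" by hashing its key, every node aggregates only the rows it owns, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NODE_COUNT = 3  # hypothetical cluster size


def segment(rows, node_count=NODE_COUNT):
    """Assign each (device_id, value) row to exactly one node by hashing its key."""
    nodes = [[] for _ in range(node_count)]
    for device_id, value in rows:
        nodes[crc32(device_id.encode()) % node_count].append((device_id, value))
    return nodes


def local_sum(node_rows):
    """Work done by one node: aggregate only the rows that node owns."""
    totals = Counter()
    for device_id, value in node_rows:
        totals[device_id] += value
    return totals


def parallel_sum(rows, node_count=NODE_COUNT):
    """Run local_sum for every node concurrently, then merge the partial results."""
    nodes = segment(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums together
    return dict(merged)


# Made-up sample readings, for illustration only.
rows = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4), ("sensor-c", 1)]
result = parallel_sum(rows)
```

Because all rows with the same key land on the same node, no node needs another node's data while it computes; only the small partial results travel over the network, which matches the exception noted in the list.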
According to Gartner, Vertica is currently the leader among analytical platforms in the number of deployments with data volumes of hundreds of terabytes or more.
A unique DBMS
The Vertica DBMS has four unique properties:
- it is a true columnar DBMS (true column store);
- it supports massively parallel processing (MPP);
- it does so without shared resources (shared nothing);
- it scales out by adding commodity x86_64 servers.
Of course, there are other columnar DBMSs on the market that do not use shared resources, as well as DBMSs that support MPP, but only Vertica has all four of the properties listed.
The first three of Vertica's four unique properties account for its very high performance, which lets a business get the most up-to-date data and analyze information in real time. With MPP, processing is divided among multiple compute nodes, each of which performs its own part of the task. Doing without shared resources avoids architectural bottlenecks such as waiting for access to disk systems. And thanks to the columnar architecture, Vertica automatically optimizes the physical storage of data, that is, the physical data model. This significantly reduces the amount of data transferred in disk read operations, which often slow a DBMS down, and so delivers high performance. Data compression reduces these volumes even further.
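The effect of columnar storage and compression described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Vertica's storage engine: the sample table, the column names and the helper functions are hypothetical, and run-length encoding is used here simply as one well-known compression scheme that suits sorted, low-cardinality columns.

```python
from itertools import groupby

# Hypothetical table of sensor readings, stored row by row.
rows = [
    ("sensor-a", "ok", 20),
    ("sensor-a", "ok", 21),
    ("sensor-a", "ok", 21),
    ("sensor-b", "fail", 35),
    ("sensor-b", "ok", 22),
]
COLUMN_NAMES = ("device", "status", "temperature")


def to_columns(rows, names=COLUMN_NAMES):
    """Pivot row-oriented data into one list per column."""
    return {name: list(values) for name, values in zip(names, zip(*rows))}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)

# An aggregate over one attribute reads a single column, not every whole row.
temperatures = columns["temperature"]
average_temperature = sum(temperatures) / len(temperatures)

# A sorted, low-cardinality column collapses into a handful of runs.
encoded_device = rle_encode(columns["device"])
```

In a row store, the average would require reading all three fields of every row; here only the `temperature` list is touched, and the `device` column shrinks from five values to two runs.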
An important advantage of Vertica is its ease of deployment: it does not need a specialized hardware and software appliance, and it runs well on commodity 64-bit x86 servers under Linux with local hard drives. Note that Vertica is licensed purely as a software product and can be deployed on hardware from any vendor. It is also worth noting that the time spent on administration is minimal.
Another key advantage of this DBMS is that it protects both past and future investments. Since Vertica supports the ANSI SQL-99 standard and the ACID principle, there is no need to retrain staff, and you also avoid the cost of modifying the applications that access the DBMS through SQL. Thanks to its open interfaces, Vertica integrates easily into an existing analytical landscape when it replaces a DBMS previously used for analytical tasks. Notably, a Vertica cluster can be expanded smoothly from terabytes to petabytes of "raw", that is, licensed, data without fundamental changes to its infrastructure.
The licensing rules also help protect investments. Licenses are sold in steps of 1 TB of raw data, so the use of Vertica can be expanded as the volume of information grows (incidentally, this DBMS can be used free of charge if the amount of data being processed does not exceed 1 TB), and what is licensed is the total volume of data analyzed. If, for example, a test or backup cluster is created in addition to the Vertica production cluster, no additional licenses are required (provided, of course, that the total amount of data does not exceed the limit set by the current license).
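The step arithmetic above can be worked through in a short Python sketch. It encodes only what the text states (1 TB steps, free use up to 1 TB) plus assumptions that are not confirmed by the source: that 1 TB means 10^12 bytes, that partial steps are rounded up, and that the free allowance is not deducted from a paid license. The function name `license_steps` is hypothetical.

```python
TB = 10**12  # assumption: 1 TB is counted as 10^12 bytes
FREE_LIMIT_TB = 1  # from the text: up to 1 TB can be used free of charge


def license_steps(raw_bytes):
    """Return the number of 1 TB license steps needed for a raw data volume."""
    if raw_bytes <= FREE_LIMIT_TB * TB:
        return 0  # within the free allowance, no paid steps needed
    # Assumption: a partly used step is rounded up to a whole step.
    return -(-raw_bytes // TB)  # integer ceiling division
```

For example, under these assumptions 2.5 TB of raw data would need three steps, while 0.5 TB would need none.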
Finally, it is important that the Vertica architecture provides very good protection against all kinds of failures. Replication, backup and recovery functions make it possible to build disaster-tolerant configurations with “hot” clusters placed in several geographically separate data centers. Clustering also protects against failures while the system is being scaled out linearly.
Vertica today has more than 3,000 customers. The largest is Facebook, which acquired licenses for 20 PB of data. Another, AT&T, stores about 3.2 PB of data, which arrives from cell towers at a rate of 100 million files per hour. There are customers in Russia as well; the best known are Avito and Yota, which use this DBMS as their central data warehouse. Vertica is also used at Superjob and Gloria Jeans, as well as in various telecommunications companies, banks, retail chains, online stores and transport companies. Uber, famous for its innovations, uses Vertica too.
Such popularity would not have been possible if the DBMS were not so versatile. It copes well not only with homogeneous analytical workloads but also with mixed ones, in which applications aimed at different kinds of analytics run simultaneously, from simple reports to in-depth data exploration and pattern discovery. Data can also be loaded into Vertica while it is being analyzed.
Vertica is tightly integrated with Hadoop and easily handles both structured and semi-structured data, such as various system logs. For media analysis, HPE recommends IDOL, with which Vertica can also be integrated.
Functionality and performance
Vertica can be deployed on physical servers or in a public or private cloud. Recommendations for deploying it in the Amazon and Microsoft clouds have already been prepared, and experience has been accumulated in integrating it with almost all popular analytical platforms and applications. Russia already has certified partners with experience in implementing Vertica, as well as a partner that can train specialists.
The DBMS is managed through a management console, which is included in the delivery and implemented as a web application.
For data arriving in real time, there is a mechanism that loads it directly into RAM, but the amount of RAM is always limited. These restrictions were removed this year thanks to integration with Kafka, an open-source product that acts as a message bus: it processes messages and provides two-way data exchange with Vertica. This makes it possible not only to monitor Internet of Things events but also to respond to them by generating control actions.
Over the coming years, the plan is to ship one new release and five packages with significant improvements (functional updates) each year. Customers who have purchased Vertica technical support will receive all updates free of charge.
As the system develops, most attention will go to improving the functions of the platform core and to expanding the range of analytical tools and models, which will increase Vertica's performance and bring additional capabilities.