HPE Vertica 8 Review (Frontloader)

Hello everyone, and good day. On August 30, HPE officially announced the release of a new version of Vertica. The product has clearly reached maturity: instead of a long list of new functionality, the priority is now extending and optimizing what already exists. There is also a clear push toward integration with products and services in certain areas.

What do I mean?

Clouds

First, there is integration with MS Azure Cloud, which makes it possible to run Vertica in Microsoft's cloud. Lately I see a solid friendship forming between HPE and MS. In addition to Azure, Vertica has expanded its support for Visual Studio and improved the performance of its ADO.NET driver.

The friendship between Vertica and MS definitely pleases me, and I hope it develops further.

Jungle

Second, Vertica continues to bite into the Hadoop world. Earlier versions could only load data in certain formats from HDFS. Vertica then gradually learned to work with the common file formats, such as ORC and Parquet, to attach files as external tables, and later to store its own data in ROS containers directly on HDFS.

The new version significantly speeds up work with HDFS, the metadata catalog, and the parsing of these formats.

It seems to me that being able to act as a part of the Hadoop environment was not enough for Vertica. That is why the new version adds a new type of Vertica licensing, by the number of Hadoop nodes, along with the ability to build a Vertica cluster directly on a Hadoop cluster.

What it looks like:

The idea is that Vertica runs directly in the Hadoop cluster, has direct access to data on HDFS, and also stores its own data on HDFS. In this case, the Vertica cluster is licensed by the number of nodes. HPE managers promise that the license cost will be more favorable, but so far I do not know the price. So we will wait and see.

Where there is Hadoop, there is Spark. The new version adds full support for working with Spark: you can copy data from Spark into Vertica tables, and you can transfer data from Vertica back to Spark.

Integration with Apache Kafka was added back in version 7.2. However, it turned out that many problems got in the way of full-fledged use of the Vertica connector for Kafka. Version 8 ships updated versions of the libraries that work with Kafka. I sincerely hope they close all the problems found and that people stop opening support cases.

Machine learning

Support for machine learning appeared in version 7.2. However, it sat "on the side": it was a separate library and was not fully integrated with Vertica's metadata. Apparently the idea took off, since in the new version machine learning is integrated directly into the server, is available right after installation, is fully present in the metadata layer like everything else, and its functions are part of the standard set. Let us wish Vertica continued growth and learning in this undoubtedly promising direction.

Assorted features

There are surprisingly few new features. Apparently the imagination of Vertica's engineers has finally run dry. From an optimist's point of view this is probably not bad: fewer new features, fewer bugs.

Still, the new version does bring some interesting things:

• The COPY_TABLE function copies a table so that the copy shares its data with the original. Interestingly, once the data changes, each table has its own data set. This is achieved by sharing ROS containers between the two tables. Equally interesting: for licensing purposes, Vertica counts the volume of each table, even though the data of both tables is physically stored only once.
• For SELECT, the TABLESAMPLE keyword has been added to the FROM clause. It returns a specified percentage of the data as a random sample of records.
• The IDLESESSIONTIMEOUT parameter makes it possible to kill sessions that have been hanging for a long time doing nothing. I have long dreamed of such a parameter.
• A new version of the Python API for accessing Vertica has been released. That is always nice, since a lot of people work with Vertica from Python.
• Multi-language support has been added to Text Search. They say it supports text analysis even in Asian languages. I hope they managed to conquer Cyrillic as well.
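
To make the COPY_TABLE point from the list above more concrete, here is a toy Python model of two tables sharing ROS containers. It is only a sketch of the idea, not how Vertica is implemented: the Container and Table classes and the copy_table, licensed_bytes and physical_bytes helpers are hypothetical names invented for this illustration, and the sizes are made up. In Vertica itself the copy is requested with SELECT COPY_TABLE('source', 'target').

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """Stand-in for one immutable ROS container (hypothetical model)."""
    name: str
    size_bytes: int


@dataclass
class Table:
    """Toy table: just a name and the containers it references."""
    name: str
    containers: list = field(default_factory=list)

    def load(self, container):
        # New data lands in a new container referenced by this table only.
        self.containers.append(container)

    def size_bytes(self):
        return sum(c.size_bytes for c in self.containers)


def copy_table(source, target_name):
    # The copy references the same container objects; nothing is duplicated.
    return Table(target_name, list(source.containers))


def licensed_bytes(tables):
    # As described in the article: every table is counted in full.
    return sum(t.size_bytes() for t in tables)


def physical_bytes(tables):
    # Shared containers are stored once, so count each distinct one once.
    unique = {c for t in tables for c in t.containers}
    return sum(c.size_bytes for c in unique)


sales = Table("sales")
sales.load(Container("ros_1", 100))
sales.load(Container("ros_2", 50))

sales_copy = copy_table(sales, "sales_copy")
sales_copy.load(Container("ros_3", 25))  # the tables diverge from here on

print("licensed:", licensed_bytes([sales, sales_copy]))
print("physical:", physical_bytes([sales, sales_copy]))
```

In this model the licensed volume is larger than the physical one, which is exactly the asymmetry noted above: the shared containers are stored once but counted for each table.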

Finally

What I wrote at the beginning I can repeat at the end: the forward movement is mainly in integration with clouds and services. I would like to know more about the licensing of Vertica on Hadoop. I think it is an interesting option for tasks where primary information is collected on Hadoop, ground down there, and then loaded into the Vertica server for further work using its analytical functions and machine learning.

P.S. It is very pleasant that the name of the new version, Frontloader, is consonant with the name of EasyLoader, our product for delivering data to Vertica. It is no less pleasant that right now, just as we are teaching EasyLoader to manage data loading between HDFS and Vertica, the eighth version has expanded the use of Vertica on Hadoop. Right on time, so to speak.

Source: https://habr.com/ru/post/309550/
