
Welcome to HadoopKitchen



We're glad to announce a new initiative that will interest both programmers and a number of other IT specialists: on Saturday, September 27, the first HadoopKitchen meetup will take place at the Mail.Ru Group office in Moscow. Why Hadoop, and what can this meeting offer non-programmers?


The program of the first Hadoop meetup is packed: no fewer than four speakers will present. All of them are excellent professionals with extensive experience that they want to share with the audience. Below the cut you'll find the event schedule and the talk abstracts.

The program of the event:
11:00 Registration and welcome coffee.

12:00 Alexey Filanovsky (Cloudera Certified Developer for Apache Hadoop, Senior Sales Consultant, Oracle) will talk about interesting new features of Hadoop v2. Of course, this will not be a dry enumeration with brief descriptions: Alexey will also analyze different scenarios for using these features and share some examples from practice.

The Hadoop ecosystem is gaining popularity by leaps and bounds: more and more users are adopting it not just for synthetic tests to satisfy their curiosity, but in production enterprise environments. This explains the product's rapid development: more users means more feature requests for the developers. This talk will highlight the main features introduced in Hadoop v2.

13:00 Nikita Makeev (Data Team Lead, IponWeb) will share specialized knowledge on how to extend the capabilities of Hadoop Streaming when working with the modern data formats Avro and Parquet.

MapReduce, Avro, and Parquet without Java. Almost. Hadoop Streaming is a great way to try out Hadoop in particular, and batch processing of large data volumes in general. You hardly need to know Java; it is enough to understand roughly how MapReduce works and to be able to write in any programming language that can process lines of text. Virtually any task that can be solved with MapReduce can also be solved with Hadoop Streaming. The advantages are obvious: ease of development, no staffing problems, and a low barrier to entry.
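To make the "any language that can process lines of text" point concrete, here is a minimal sketch of a Hadoop Streaming mapper in Python (a hypothetical example, not from the talk). Streaming feeds input lines on stdin and expects tab-separated key/value pairs on stdout:

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming word-count mapper (illustrative sketch).
# Hadoop Streaming pipes input splits to this script line by line on
# stdin and collects tab-separated key/value pairs from stdout.
import sys


def mapper(lines):
    """Emit a (word, 1) pair for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


if __name__ == "__main__":
    for pair in mapper(sys.stdin):
        print(pair)
```

A matching reducer would read the shuffled `word\t1` pairs from stdin and sum the counts per word; both scripts are then passed to the streaming jar via its `-mapper` and `-reducer` options.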

One of the most common uses for Hadoop Streaming is processing text logs or other data represented as text. However, formats more complex than plain text are rapidly gaining popularity. Is it possible to keep the ability to process data with scripting languages while still enjoying all the advantages of modern data formats such as Avro and Parquet?

We solve this task with a small amount of Java code and JSON as the glue. As usual, there are nuances, quirks, and some unique pitfalls we have stepped on, which the talk will cover.
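The scripting side of this approach might look like the sketch below. The assumption (mine, not the speaker's) is that a Java input format has already decoded each Avro or Parquet record into one JSON object per line, so the streaming mapper only deals with plain dictionaries; the field name `user_id` is made up for illustration:

```python
#!/usr/bin/env python3
# Illustrative sketch: a streaming mapper that consumes JSON lines
# produced by a (hypothetical) Java Avro/Parquet-to-JSON input format.
import json
import sys


def mapper(lines):
    """Count events per user_id from JSON-encoded records."""
    for line in lines:
        record = json.loads(line)          # one decoded record, as JSON
        yield f"{record['user_id']}\t1"    # 'user_id' is a made-up field


if __name__ == "__main__":
    for pair in mapper(sys.stdin):
        print(pair)
```

The same trick works in reverse: the mapper emits JSON lines, and a Java output format serializes them back into Avro or Parquet on the way out.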

14:00 Maxim Lapan (lead programmer of the Search project, Mail.Ru Group) will tell the fascinating story of how Hadoop clusters are managed at Mail.Ru Group, without glossing over the difficulties the development team faced as the system grew and expanded. The talk will be devoted to the practical side of operating the Hadoop/HBase cluster that the Search Mail.Ru project has used for the past three years. During this time the system has grown from 30 to 400 servers, and storage capacity from 400 TB to 9 PB. Topics to be covered:

15:00 Lunch. War is war, but lunch is on schedule.

From 15:45 to 17:45, in a World Cafe format, everyone will be able to take part in jointly identifying and discussing the most pressing issues of operating Hadoop.

At 18:00, Alexey Gryshchenko (Pivotal Enterprise Architect, EMC Corporation) will present the distinctive features and nuances of the Pivotal HAWQ architecture and talk about its interaction with Hadoop. The talk will cover the following topics:
  1. The current market landscape of solutions implementing a SQL interface over data in HDFS. This topic has been gaining popularity very actively of late, largely due to the spread of Hadoop in the corporate sector. I will briefly cover the major current solutions and the fundamental problems that all such systems face.
  2. The components of the Pivotal HAWQ solution and their interaction with HDFS. Here I will describe in detail which components our database consists of, how they are laid out across the cluster, how they relate to HDFS, and how they store data.
  3. A detailed analysis of the query execution process. Using a simple query as an example, I will describe step by step what happens from the moment a query reaches the system until the data is returned to the client application. I will also briefly discuss the distinctive features of query processing in HAWQ compared to other systems.
  4. Options for accessing custom storage formats on HDFS, as well as various external systems. Here I will talk about the PXF framework and how it can be extended, and give an example of a component I implemented.
  5. Other HAWQ features and the future roadmap. I will talk about using HAWQ for data-mining tasks, and outline the direction in which our platform is developing and what changes to expect.

Be sure to bring an identity document with you: security is strict. You will also need to register.

Source: https://habr.com/ru/post/237131/

