Announcement of Moscow Spark #2

As promised, our event is becoming a regular one: Moscow Spark #2 will take place on July 27! Moscow Spark #1, organized by the Rambler & Co group of companies, drew more than 200 participants, and we hope that the hot weather, which is bound to settle over the Moscow region eventually, will not keep us from gathering as many (or even more) participants this time. What's more, we have found new, interesting speakers.

1. About analytics and silver bullets - Alexander Podsoblyaev (Rambler & Co)

In my talk, I will describe how we relaunched Rambler/Top 100, the tools available on the market, and our experience of moving from a batch architecture to real-time data. I will cover the architecture of the two solutions and their components. I will briefly discuss the specifics of data processing with Python in Hive and the fundamental problems of storing aggregates, and briefly consider the advantages and disadvantages of an alternative approach. We will analyze in detail how to handle changing events using PySpark, ways to work with the various components of the system from PySpark, and the problems that arise along with their solutions. Finally, we will look at the results, the speed of the new system, and some pitfalls.

2. Tensor factorization for recommendations on Spark - Alexey Petrov (Zvooq)

For recommendations, Spark.ML includes an implementation of the ALS algorithm, which performs quite well on most real-world examples. In this talk, I want to present my implementation of the iTALS algorithm on Spark, which generalizes the ALS matrix factorization algorithm to tensors. This algorithm makes it possible to take context into account in recommendations, making them more accurate and flexible. The talk will discuss the results of an experiment comparing ALS and iTALS.

3. Plunge into Catalyst - Pavel Klemenkov (Rambler & Co)

Dataset and DataFrame have become the preferred interfaces for working with Spark, largely thanks to the active development of the Catalyst query optimizer. In this talk, we will look at the motivation behind Spark SQL and understand why it is so critical for PySpark. We will also analyze in detail how Catalyst works internally and how its functionality can be extended.

4. Dynamic resource allocation, or how to live in a dormitory? - Artyom Pichugin (New Professions Lab)

With dynamic resource allocation in Spark, a job can receive additional resources whenever they are available in the free pool. This sometimes lets you use the full power of the cluster and finish computations faster. In this talk, I will describe how dynamic resource allocation made it possible for 30-40 students to work at the same time as a laboratory deadline approached, and to live happily.
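
For readers unfamiliar with the feature, here is a minimal sketch of the Spark properties that usually switch dynamic allocation on. The property names are standard Spark settings, but the values and the helper `to_spark_submit_args` are illustrative assumptions, not the configuration used in the talk.

```python
# Illustrative values only: the announcement does not state the real settings.
DYNAMIC_ALLOCATION_CONF = {
    # Let Spark add and remove executors based on the backlog of pending tasks.
    "spark.dynamicAllocation.enabled": "true",
    # External shuffle service, so shuffle files outlive removed executors.
    "spark.shuffle.service.enabled": "true",
    # Lower and upper bounds on executors per application (assumed numbers).
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Release executors that have been idle this long (assumed value).
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def to_spark_submit_args(conf):
    """Hypothetical helper: render a properties dict as spark-submit arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_spark_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

On a shared cluster, the upper bound is what keeps one user's job from taking the whole pool, while the idle timeout returns unused executors to everyone else.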

The event is free, but registration is required - rambler-co-e-org.timepad.ru/event/533749
Pizza and tea are on us!

Starts at 19:00
Location: Varshavskoye Shosse 9, building 1, entrance 5, the Rambler & Co attic space

Be sure to register and bring your passport so that the business center's security guard will let you in!

Come, it will be interesting!

Source: https://habr.com/ru/post/332546/
