Announcement of Moscow Spark #4

Hello! New year, new Spark, new Moscow Spark! We open the new season of our wonderful event on April 19 at the Rambler & Co Attic. The framework does not stand still, and neither do we: this time we will present a new community website and try out a format with a guest star from abroad.

1. What's new in Spark 2.3? - Pavel Klemenkov, Chief Data Scientist @ Nvidia / Data Wizard @ BigDataTeam

In this talk I will review what are, in my opinion, the three main new features of Apache Spark: continuous streaming, streaming ML, and vectorized UDFs. Using examples, we will look at how continuous streaming differs from micro-batching, how much faster it is, and what restrictions come with it. We will examine a pressing problem for every machine learning specialist: how to roll a model out to production, and how to do it with the new unified Streaming ML interface. Finally, we will look at how the developers overcame what seems to be the last performance pain point of PySpark by vectorizing UDFs.

2. MOOC for Big Data: give everyone a cluster and check the solutions! - Oleg Ivchenko, Assistant @ MIPT / Data Wizard @ BigDataTeam, and Pavel Akhtyamov, Developer Analyst @ Vicman Development / Data Wizard @ BigDataTeam

Last year, our team (BigDataTeam), together with Yandex, launched the Big Data for Data Engineers specialization. What makes this specialization unique is that students' solutions are checked on a real cluster. Launching this infrastructure and integrating it with Coursera turned out to be quite laborious and posed many interesting engineering challenges. We will talk about them in this talk, namely:

1) how to build a Spark cluster with Jupyter inside a Docker container
2) how to embed your own assignment-testing pipeline into Coursera using the LTI interface
3) how to transfer a Jupyter notebook to a production cluster and test it there

3. Apache Spark on Kubernetes the easy way - Dmitry Lakhvich [KrivdaTheTriewe], Senior Research Engineer @ Tookitaki / Data Engineer @ Maximtelecom

One of the innovations of Apache Spark 2.3 is experimental support for Kubernetes in the main branch. In this talk, I will cover the architecture of Kubernetes itself, its deployment and basic setup in a minimal configuration, and the deployment of Apache Spark applications on Kubernetes. We will also look at some subtleties of configuration, and at the question of why we need yet another scheduler and what benefits it brings.

The event is free, but registration is required.

We will provide pizza and tea!
Starts at 19:00
Location: Varshavskoye Shosse (Warsaw Highway) 9, bldg. 1, entrance 5, the Rambler & Co Attic

Be sure to register and bring your passport so that the business center's security guard will let you in!

Come along, it will be interesting!

Source: https://habr.com/ru/post/352772/
