Announcement of Moscow Spark #3

Hello! We have been preparing for a long time and searching for great speakers, and we are finally announcing Moscow Spark #3, which will be held on November 16 in the Attic hall at Rambler & Co! The previous meetup gathered almost 250 people, and we hope to gather at least as many this time. The key theme of this event will be Spark Streaming, a topical and very interesting part of the Apache Spark framework.

1. Spark Streaming and online audience segmentation - Artem Vybornov, lead developer in the audience segmentation department at Rambler & Co

The main goal of our team is audience segmentation. To minimize the time between receiving information about an event and taking it into account, we built a micro-batch data processing pipeline for our ad serving. In this talk you will learn about our experience implementing online segmentation with Spark Streaming, how to guarantee true exactly-once processing, and why we chose not to. You will also find out which tasks are definitely not worth solving with Spark Streaming and which, on the contrary, are ideal for it.

2. Collecting and processing security logs with Spark Streaming in 24/7 mode - Andrey Titov, team lead in the data analysis platform development department at the Infosecurity group of companies

The talk will cover the use of Spark Streaming 1.6.3 as the main engine for collecting and analyzing logs in the Security Operations team at Otkritie Bank. Using logs from various sources, we detect information security incidents and prevent attacks on the bank's infrastructure. We will talk about how we built the ETL process, where and how we store logs, and which databases we use with the Spark platform, as well as the problems we encountered during the project.

3. GeoTrellis: distributed processing of geolocated images on Spark - Grigory Pomadchin, core GeoTrellis engineer at Azavea

Processing GIS data on Spark: preprocessing raster data and then using it for algebraic operations or analytics. The challenges of storing such data and using it efficiently. The talk will cover how and why preprocessing is done, and what implications preprocessing and postprocessing have for building real-time services.

4. Building a recommendation system based on Apache Spark - Nikita Uchitelev, head of the data processing and analytics department at YouDo

In my talk, I will explain how YouDo organizes its data flows and uses them to solve various predictive analytics tasks, from automated fraud detection to generating personal recommendations for users. I will cover which technologies we use for this and what the strengths and weaknesses of Apache Spark are for these tasks. I will focus primarily on the technical solutions for integrating various services with one another, as well as on the architecture of such projects.

The event is free, but registration is required: rambler-co-e-org.timepad.ru/event/604814
Pizza and tea are on us!

Starts at 19:00
Location: Varshavskoye Shosse 9, building 1, entrance 5. Attic, Rambler & Co

Be sure to register and bring your passport so that the business center's security staff will let you in!

Come along, it will be interesting!

Source: https://habr.com/ru/post/341394/