The main goal of our team is audience segmentation. To minimize the delay between receiving an event and taking it into account in ad serving, we built a micro-batch data-processing pipeline. From this talk you will learn about our experience implementing online segmentation with Spark Streaming. You will understand how to guarantee true exactly-once semantics and why we chose not to. You will also find out which tasks are definitely not worth solving with Spark Streaming, and which, on the contrary, are ideal for it.
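The abstract does not include code, but the trade-off it mentions is a common one: instead of true exactly-once delivery, teams often run an at-least-once pipeline and make the sink idempotent, which yields effectively-once results. Here is a minimal, hypothetical sketch of that idea for segment updates; all class and field names are ours, not from the talk:

```python
# Sketch: at-least-once delivery plus an idempotent sink gives
# "effectively once" results without distributed transactions.
# Illustrative only; this is not the talk's actual code.

class IdempotentSegmentSink:
    """Stores user -> segment updates, ignoring replayed events."""

    def __init__(self):
        self.seen_event_ids = set()   # in production: an external store
        self.segments = {}            # user_id -> set of segment names

    def write_batch(self, events):
        """Apply a micro-batch; replays of the same event_id are no-ops."""
        for event in events:
            if event["event_id"] in self.seen_event_ids:
                continue  # duplicate from a retried batch: skip it
            self.seen_event_ids.add(event["event_id"])
            self.segments.setdefault(event["user_id"], set()).add(event["segment"])

batch = [
    {"event_id": "e1", "user_id": "u1", "segment": "sports"},
    {"event_id": "e2", "user_id": "u1", "segment": "travel"},
]
sink = IdempotentSegmentSink()
sink.write_batch(batch)
sink.write_batch(batch)  # simulated replay after a failure: no double-count
print(sorted(sink.segments["u1"]))  # -> ['sports', 'travel']
```

Because replayed batches are no-ops, a failed micro-batch can simply be retried, which is much cheaper than coordinating a true exactly-once commit across the pipeline.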
The talk covers the use of Spark Streaming 1.6.3 as the main engine for collecting and analyzing logs in the Otkritie Bank Security Operations team. Using logs from various sources, we detect information security incidents and prevent attacks on the bank's infrastructure. We will talk about how we built the ETL process, where and how we store logs, which databases we use with the Spark platform, and the problems we encountered along the way.
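The talk does not show its ETL code, but the first step of any log pipeline like this is parsing raw lines into structured records while routing unparseable input to a dead-letter store. A minimal sketch, with an invented log format and field names:

```python
# Sketch of the parse step of a log ETL: turn raw lines into structured
# records and keep unparseable lines in a "dead letter" list for later
# inspection. The log format here is invented for illustration.
import re

LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)

def parse_lines(lines):
    records, dead_letters = [], []
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            records.append(m.groupdict())
        else:
            dead_letters.append(line)  # never silently drop input
    return records, dead_letters

raw = [
    "2017-10-30T12:00:01 fw01 ALERT blocked connection from 10.0.0.5",
    "garbage line",
]
records, dead = parse_lines(raw)
print(records[0]["level"], len(dead))  # -> ALERT 1
```

In a Spark Streaming job this function would typically run per partition inside the micro-batch, with the dead-letter lines written to separate storage so that parser bugs are visible rather than silent data loss.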
The talk is about GIS data processing in Spark: preprocessing raster data and then using it for map algebra or analytics. It covers the problems of storing such data and using it efficiently, how and why preprocessing is done, and what the implications of preprocessing and postprocessing are for building real-time services.
In my talk, I will describe how YouDo organizes its data streams and how they are used to solve various predictive-analytics tasks, from automated fraud detection to generating personalized recommendations for users, what technologies are used for this, and what the strengths and weaknesses of Apache Spark are for these tasks. I will focus primarily on the technical solutions for integrating the various services with one another, as well as on the architecture of such projects.
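The abstract only names its use cases, but the simplest baseline for one of them, personalized recommendations, is an item-to-item co-occurrence count over user histories. A purely illustrative sketch (the talk does not describe YouDo's actual model, and the category names below are invented):

```python
# Sketch of a co-occurrence recommender baseline: recommend task
# categories that most often appear together with a given category
# in user histories. Entirely illustrative, invented data.
from collections import Counter
from itertools import combinations

def cooccurrence_recs(histories, item, top_n=2):
    """Return up to top_n items that co-occur most often with `item`."""
    counts = Counter()
    for history in histories:
        if item not in history:
            continue
        for a, b in combinations(sorted(set(history)), 2):
            if item in (a, b):
                counts[b if a == item else a] += 1
    return [it for it, _ in counts.most_common(top_n)]

histories = [
    {"cleaning", "repair"},
    {"cleaning", "repair", "delivery"},
    {"cleaning", "delivery"},
    {"cleaning", "repair"},
]
print(cooccurrence_recs(histories, "cleaning"))  # -> ['repair', 'delivery']
```

In a Spark setting the same counting is a natural fit for a batch job (pair generation plus `reduceByKey`), which is one reason Spark is a common backbone for this kind of workload.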
Source: https://habr.com/ru/post/341394/