
As we have reported more than once, this year JUG.ru Group decided to look into the future, find out what it takes for two gray boxes to interact with each other, and let a dose of sacred knowledge about Big Data and machine learning into our world: we created the SmartData 2017 conference, which will be held in St. Petersburg on October 21.
Why are we running a big data and machine learning conference? Because we cannot help it. And to bring as many developers as possible into our brotherhood, we are, as usual, opening a free online broadcast from the first conference hall.
So, a free online broadcast from the main hall of SmartData 2017 will begin on October 21, 2017 at 9:30 am Moscow time. Only you, us and the future. This time the broadcast will be available in 2K, so get out your 4K monitors!
A link to the online broadcast of the first track of SmartData 2017 and a brief description of the talks are under the cut.
The first track of the conference, held in the main hall, features:
- Vitaly Khudobakhshov - Name is a feature
- Mikhail Kamalov - Recommender systems: from matrix factorization to deep learning in streaming mode
- Sergey Nikolenko - Deep convolutional networks for object detection and image segmentation
- Dmitry Bugaychenko - From click to forecast and back: Data Science pipelines in Odnoklassniki
- Artem Marinov - Segmenting 600 million users in real time every day
- Alexander Krasheninnikov - Hadoop high availability: Badoo experience
- Ivan Yamshchikov - Neurona: Why did we teach the neural network to write poetry in the style of Kurt Cobain?
During the breaks between talks, when speakers and on-site participants disappear through the looking glass into the discussion zones, we show online viewers what is happening at the conference outside the sessions and conduct interviews with speakers and interesting guests. If you have a question of your own during an interview, post it in the conference Telegram chat. Here is how it looked at JPoint:
9:30 - 10:30 // Opening, an interview with the JUG.ru Group team, opening remarks from the organizers and partners of the conference.
10:30 - 11:20 // Vitaly Khudobakhshov - Name is a feature
Strange as it may seem to an educated person, the probability of being single "depends" on your name. We will talk about love and relationships, or rather about what exactly social network data can tell us about them. It is roughly the same as saying: “The probability of being hit by a car is higher if your name is Seryozha than if your name is Kostya!” Sounds pretty crazy, doesn't it? Or at least unscientific. So we will talk about the most unexpected and counterintuitive observations that can be made by analyzing social network data. Of course, we will not ignore the statistical significance of such observations, the influence of bots, or spurious correlations.
11:40 - 12:30 // Mikhail Kamalov - Recommender systems: from matrix factorization to deep learning in streaming mode
Recommender systems are now actively used both in entertainment (YouTube, Netflix) and in Internet marketing (Amazon, Aliexpress). The talk will therefore discuss the practical aspects of using deep learning, collaborative filtering, content-based filtering and time-based filtering as approaches in recommender systems. It will also cover building hybrid recommender systems and adapting these approaches for online learning on Spark.
12:50 - 13:40 // Sergey Nikolenko - Deep convolutional networks for object detection and image segmentation
Convolutional neural networks have long been the main class of models for image processing. In this talk, we will discuss how networks that recognize individual objects turn into networks that pick objects out from among many others. We will talk about the famous YOLO, about single-shot detectors, and about the line of models from R-CNN to the recently introduced Mask R-CNN.
14:25 - 15:15 // Dmitry Bugaychenko - From click to forecast and back: Data Science pipelines in Odnoklassniki
Machine learning is fun, but for it to work in production, you need to do a lot of boring things. In this talk, we will look at the technologies, algorithms and methods your machine learning needs in order to shine like a diamond set in gold.
As an example, we will consider one complex task: personalization of the news feed. Without going into the details of machine learning, we will talk about data collection (batch and real-time), ETL, and the processing required to obtain a model.
But obtaining a model is not enough, so we will also talk about how to get model-based predictions in a complex, highly loaded distributed environment and how to use them to make decisions.
We will cover the data processing and storage technologies of the Hadoop ecosystem, as well as many other things. The talk will be useful to those who do machine learning not only for fun, but also for profit.
15:35 - 16:25 // Artem Marinov - Segmenting 600 million users in real time every day
Every day, users perform millions of actions on the Internet. FACETz DMP needs to structure this data and segment it to identify user preferences. We will describe how, using Kafka and HBase, we:
• segment 600 million users after moving from MapReduce to real-time processing, and how we made that move;
• process 5 billion events every day;
• store statistics on the number of unique users in a segment during stream processing;
• monitor the impact of changes to segmentation parameters.
16:45 - 17:35 // Alexander Krasheninnikov - Hadoop high availability: Badoo experience
Hadoop infrastructure is a popular solution for tasks such as distributed data storage and processing. Good scalability and a mature ecosystem are attractive and have earned Hadoop a solid place in the infrastructure of many information systems. But the more responsibility is placed on this component, the more important it is to ensure its fault tolerance and high availability. In this talk, we will discuss how to ensure high availability of the components of a Hadoop cluster. We will also cover:
• the “zoo” we deal with;
• why high availability is needed: points of failure in the system and the consequences of failures;
• the tools and solutions available for this;
• our practical implementation experience: preparation, deployment, verification.
The talk will be most useful to those who already use Hadoop and want to deepen their knowledge. The rest of the audience may find it interesting for the architectural solutions used in this software stack.
17:50 - 18:40 // Ivan Yamshchikov - Neurona: Why did we teach the neural network to write poetry in the style of Kurt Cobain?
In 2017, “artificial intelligence” is a phrase you hear everywhere you turn. There are many examples of machine learning and artificial neural networks being applied in business, but in this talk we will discuss the creative possibilities of AI. We will tell you how we made Neurona, Neural Defense and Pianola. We will discuss current challenges in building creative AI and talk about why this is important and interesting.
To wrap up our announcement, here is a quote from a popular movie: “Life on Earth is a mystery. But its components are a technical problem.”
Join now!
Restrictions
- The broadcast is provided as is: we are sure that everything will be fine, but if something does go wrong, please do not judge us too harshly!
- Video recordings. They will be available almost immediately, but only to conference participants who have left feedback. For everyone else, we traditionally post them on the conference YouTube channel 3-4 months later.
- You cannot watch what happens in the other halls, and there will be plenty of interesting things there. Next time, buy a ticket and see everything without restrictions.