
On October 21 in St. Petersburg we are holding SmartData 2017 Piter, a new conference on big and smart data.
Lately everyone has been talking about Big Data, from
schoolchildren to
German Gref. And here a kind of dialectical dualism arises: much is said about the problems of working with big data, but most of that talk is either idle chatter or outright marketing nonsense. Worst of all, people are starting to believe that a few petabytes of "big data" are lying around somewhere, just waiting to be put to use. For advice I turned to Vitaly Khudobakhshov from Odnoklassniki; he holds a similar point of view, judge for yourself:
Big data is not a property of volume or time. What counts as "a lot of data" today will fit on a flash drive in 10 years. What now requires a Hadoop cluster of dozens or even hundreds of nodes will be solvable on a phone in those same 10 years. Big data is, first of all, a new quality: something that cannot be obtained from a smaller data set. In truth, there are not many such examples, but their number keeps growing as the volume and quality of data increase.
Sometimes big data makes life so much easier that there is no need for advanced machine learning techniques to solve a specific problem. Consider an example: a user enters their gender on a social network incorrectly, so we end up with either an unknown gender or some default gender, which is just as bad.
It turns out there is no need for machine learning here, simply because a social network holds so much other information about the user that simple, school-level calculations suffice. For example, take the user's interests and determine by majority vote which of them are more typical of men than of women; or simply take the first and last name, see how many people with that name declared their gender as male or female, and decide on that basis.
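The name-majority heuristic described above really is school-level arithmetic. Here is a minimal sketch in Python; the function names, the sample data, and the `min_votes` threshold are all illustrative, not taken from any real Odnoklassniki pipeline:

```python
from collections import Counter, defaultdict

def build_name_stats(profiles):
    """Count how often each first name occurs with each declared gender.

    `profiles` is an iterable of (first_name, gender) pairs, where gender
    is "m", "f", or None for users who did not declare it.
    """
    stats = defaultdict(Counter)
    for name, gender in profiles:
        if gender is not None:
            stats[name.lower()][gender] += 1
    return stats

def infer_gender(name, stats, min_votes=5):
    """Return the majority gender for a name, or None if evidence is thin."""
    counts = stats.get(name.lower())
    if not counts or sum(counts.values()) < min_votes:
        return None
    gender, _ = counts.most_common(1)[0]
    return gender

profiles = [
    ("Ivan", "m"), ("Ivan", "m"), ("Ivan", "m"), ("Ivan", "f"),
    ("Ivan", "m"), ("Olga", "f"), ("Olga", "f"), ("Olga", "f"),
    ("Olga", "f"), ("Olga", "f"), ("Sasha", None),
]
stats = build_name_stats(profiles)
print(infer_gender("Ivan", stats))   # "m": 4 of 5 declared Ivans are male
print(infer_gender("Sasha", stats))  # None: no declared votes for this name
```

The `min_votes` cutoff is the only subtlety: with a handful of votes the majority is noise, so it is safer to return "unknown" than to guess.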
There is another problem: computations over data of this volume must be performed efficiently. This is why the technologies for collecting and processing such data, like Spark, Hadoop, and Kafka, are also associated with big data.
Conference program
Now back to the conference. With this simple example I want to show the level at which the program of SmartData 2017 Piter, the JUG.ru Group's new conference on big and smart data, will be pitched. The conference will be held on October 21 in St. Petersburg. We will not explain why big data is needed, what can be extracted from it, and why it is all good and useful. Instead, we focus on three aspects:
- Data Science, in terms of the scientific approach;
- Solving practical problems with Big Data and smart data;
- Tooling and solutions for solving problems correctly and quickly.
Data science
Alexei Potapov. To be honest, we are very glad that we managed to attract such a remarkable person to the very first conference: a luminary of science who at one time worked on industrial solutions in the field of computer vision. Among Alexei's talks you can find both those in which he explains complicated things in simple words and those that blow the minds of even the most sophisticated engineers. We, of course, settled on the second kind and are bringing you solid scientific hardcore.
Sergey Nikolenko is a Data Scientist from POMI RAS who works on machine learning and network algorithms; previously he worked in cryptography, theoretical computer science, and algebra. Sergey is preparing a talk on the scientific approach to developing deep convolutional networks for image segmentation.
Practice
Alexander Serbul oversees quality control of integration and implementation at 1C-Bitrix, as well as the areas of AI, deep learning, and big data. He is an architect and developer on the company's projects related to high load and fault tolerance and to the efficient use of 1C-Bitrix clustering technologies in modern cloud services (Amazon Web Services, etc.).
Vitaly Khudobakhshov, a leading analyst at Odnoklassniki, where he deals with various aspects of data analysis, will tell the conference how to cook up Spark with Kotlin properly.
Tatyana Lando: what would big data be without Google? We are currently working to bring in Tatyana Lando, an expert in linguistics and data analysis and an organizer of AINL: Artificial Intelligence & Natural Language; preliminary confirmation has already been received. Changes are possible here. UPD: Tatyana declined, but someone from Google will definitely come.
Vladimir Krasilshchik is a developer at Yandex who has long been spotted "in connection" with big data. This is not Vladimir's first time speaking at our conferences, and each of his talks consistently earns high marks, because they usually have everything: craftsmanship, proper delivery, and even plot twists. If you have not seen Vladimir's talks, I recommend watching one (that talk is simple, as it was designed for students, but it shows how Vladimir works).
Ivan Begtin is Chairman of the Expert Council at the Prosecutor General's Office
and director of ANO Infoculture, which specializes in working with the open, machine-readable data disclosed by the government: ecology, crime statistics, demography, etc. The key point of meeting Ivan is the chance to ask him questions in the discussion area: there is an opinion that in a single conversation he can tell you whether your project idea is worth developing or will not pan out. And this is not fortune telling, but pure analysis.
Tools & Solutions
Tooling is not left out either. In the end, how quickly and conveniently a problem gets solved depends heavily on the toolkit. The developers of
Yandex.Toloka, a service for training machine intelligence, Alexey Milovidov from ClickHouse, and Alexander Sibiryakov from
ScrapingHub have already confirmed their talks. Naturally, these are not all of them: the program is only starting to take shape, there will be three tracks and at least 17 talks, so stay tuned
to the website. Among the more intriguing prospects, we are trying to get someone from PornHub; that is where the real high load and mountains of data live: interests, geography, preferences, and plenty more.
Submit a talk

If you like not only receiving knowledge but also sharing it, take note: now is the time
to submit a talk! Although our strict program committee accepts only really good talks into the program, that same committee helps speakers bring their promising material up to the appropriate level. So even if you do not have much speaking experience but do have an interesting topic, do not be afraid to apply.
And if you do have experience, include links to video recordings of your earlier talks; this will significantly speed up the program committee's decision.
The main requirement: your talk should be useful to other developers. We are interested in talks on the following topics:
- Data and its processing (Spark, Kafka, Storm, Flink)
- Storages (Databases, NoSQL, IMDG, Hadoop, cloud storage)
- Data Science (Machine learning, neural networks, data analysis)
Discussion areas

As at any of our conferences, SmartData will have discussion zones. Surely you know the feeling: you raise your hand to ask a question, the host suddenly announces "time for one last question", the choice of course does not fall on you, and during the break the speaker hides behind the door of the speakers' room.
It is the discussion zones that answer the question "why go to conferences when you can learn everything on the Internet". The answer is simple: to ask your specific question in person. And we have created all the conditions for that: during the long breaks, specially trained volunteers will lead each speaker to a dedicated space with a whiteboard for notes and illustrations, seating, and a chance to grab a cup of coffee. There no one will cut off your questions, comments, and observations. Not a single question will be lost to the schedule anymore.
Registration

The conference program will be filled in gradually, and you can track its current state on the
SmartData 2017 Piter website. Ticket sales are already
open there, and the early bird price is valid for the next two weeks. So it is better to follow the development of the program with a ticket in your pocket :)