Go there, I know not where: in the wake of the SmartData conference

We have had quite a lot of AI / ML / data science conferences lately. Organizers are still searching for a format and conference concepts keep changing, yet about half of the speakers are the same from one event to the next.

The SmartData program committee faced the same task of finding a format, and it is a rather blurry one. Who is the person who does data analysis and/or data processing, and what are they interested in? We got partial answers to these questions from conference participants, but we want more data. So I want to share the picture of an ideal world that has formed so far and invite readers to discuss it in the comments. Help us make a conference that you yourself would want to attend.

Besides questions about your interests and tasks, this post offers two previously unpublished videos of talks from the first conference, a technical trick for writing texts on Habr, and one funny fact about self-driving vehicles.

But let's start with what we learned from the conference attendees.

Data Science vs. Data Engineering

Many participants noted that having talks "about algorithms" and talks "about tools" at one conference seemed strange to them. The industry is young, but it already has specialization: some people set up the pipelines, others use the results of that work, and the two groups have almost no interests in common.

Someone who trains models for, say, video recommendations cares about what is in the data, how much of it there is, how complete it is for each user, and whether bot traffic has been filtered out. But whether all of it comes from Vertica, ClickHouse or somewhere else makes no difference to them, as long as it works. For the person who keeps that system running, the difference matters.

It is not yet clear whether combining these two topics grows or shrinks the conference audience: on the one hand, there is something interesting for a wider range of people; on the other hand, everyone gets many “foreign” talks, and the choice in each time slot shrinks.

Questions: What about you? Is it true that people from data science and data engineering have few interests in common? Is it good that the conference covers both directions, or is it better to focus on one?

Do not rush to answer! If you write a comment before reading to the end, chances are high that you will not come back to the text. But if you do not answer right away, the question will be forgotten. Which of the two is worse? Both. So, for the sake of coherence, I ask questions in each section separately, and at the end they are collected and asked again for the respondents' convenience. And yes, that was the technique for writing texts on Habr.

Since one cannot embrace the boundless, I will speak from the data scientist's side. We will try to cover the data engineer’s side later in a separate post.

More observations from the collected feedback

Frameworks

Anna Veronika Dorogush's talk on CatBoost was received quite favorably by the audience. The chance to catch the developer in the hallway and discuss burning questions is useful too.

CatBoost is a fairly new project (at the time of the talk it was brand new), so it was natural to cover the basics: why it was needed, how it differs from existing boosting implementations, how it performs on the authors' own benchmarks, which tasks it is designed for, and which it is definitely not suited to.

From products with a longer history we would probably expect a different kind of talk. Which kind, by the way?

Questions:
Is it interesting to talk with framework developers? Next time we will try to bring not only Anna but also people like Tianqi Chen (XGBoost) and Guolin Ke (LightGBM). Can you add to the list? What would you ask them?

Foreign speakers

Surprisingly, many reviews complained that there were no foreign (for some reason they wrote “American”) speakers, and that this is very bad. We failed to bring Ng, sorry. We will of course keep trying, but... let's think about who is worth bringing from afar.

Is it interesting to see the founding figures and ask them something? Should we try to bring people like Trevor Hastie / Robert Tibshirani / Jerome Friedman / Geoffrey Hinton? Whether such attempts can succeed at all is a question we will leave aside for now. Do you have something to talk to them about?

People who work on practical tasks rather than research and teaching can be hard to pull away from work; besides, they usually do something highly specialized that interests only a few. For example, I personally would gladly pay for a chance to talk to Ron Kohavi, but who knows what he does without googling? And conversely, there are probably many people I do not know and ought to be ashamed of not knowing. So someone has to tell me their names. Please tell me.

Questions:
Are the founders of the field interesting and familiar to you? Would you find something to ask them in person? Which foreign celebrity did I forget? Which breakthrough projects are worth looking for people from?

Practical cases

Unless you fill the entire schedule with speakers from Yandex and Mail.ru, finding people who successfully solve interesting practical problems turns out to be unexpectedly hard. The average practitioner read a tutorial, applied it as is without adapting it to the task at hand, got some numbers they cannot evaluate, and concluded that there was not enough training data. And now they feel they must share this experience at a conference.

Instead, I would like to find people who are doing something real.

The only question is whether their tasks will be too narrow for most of the audience. For example, my favorite from the SmartData program, Ivan Drokin's talk “No data? No problems! Deep Learning at CGI”, played to a half-empty hall. It is possible, though, that we did not word the title clearly enough and not everyone understood what it was about.

This is a very strong talk, do not grudge it half an hour. It is about how to train a network to recognize parts on a conveyor (a physical conveyor at a factory) without having any prepared image samples of those parts. The parts themselves are not available, but there are models of them from which images can be generated. Why not learn from these images?

Ivan tried several data augmentation techniques that may be useful, or at least inspiring, in other situations. You can add noise from the real cameras working on the shop floor to the generated images the network trains on. You can and should emulate moving light sources (the sun outside the window does not stand still in our coordinate system). Even style transfer algorithms, a seemingly useless thing, can be used to augment artificial data. Pure fire, in short.
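
As a rough illustration of the first two ideas (camera noise and a moving light source), here is a minimal sketch in plain Python. It is not Ivan's code: the function names, the Gaussian noise model, the linear brightness ramp and all parameter values are assumptions invented for this example. A real pipeline would more likely work on NumPy arrays or use an image library, and would sample noise from the actual cameras.

```python
import math
import random

# Hypothetical representation: a grayscale image as rows of floats in [0, 1].
Image = list[list[float]]


def clamp(value: float) -> float:
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def add_sensor_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel.

    Assumption: Gaussian noise is only a crude stand-in for real camera noise.
    """
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in image]


def apply_directional_light(image: Image, angle_rad: float,
                            strength: float = 0.3) -> Image:
    """Multiply the image by a linear brightness ramp.

    The ramp gets brighter in the direction given by angle_rad, which
    imitates a light source sitting on that side of the scene.
    """
    height, width = len(image), len(image[0])
    dx, dy = math.cos(angle_rad), math.sin(angle_rad)
    lit = []
    for y, row in enumerate(image):
        yn = 2.0 * y / max(height - 1, 1) - 1.0  # row position in [-1, 1]
        new_row = []
        for x, p in enumerate(row):
            xn = 2.0 * x / max(width - 1, 1) - 1.0  # column position in [-1, 1]
            gain = 1.0 + strength * (dx * xn + dy * yn)
            new_row.append(clamp(p * gain))
        lit.append(new_row)
    return lit


def augment(image: Image, rng: random.Random) -> Image:
    """Relight from a random direction, then add noise."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    relit = apply_directional_light(image, angle)
    return add_sensor_noise(relit, sigma=0.02, rng=rng)
```

Calling `augment(rendered_image, random.Random(seed))` several times with different seeds would turn one rendered image into several differently lit, noisy training samples.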

But is it really interesting to listen to highly specialized stories from different fields?

Let me ask about one more potential practical topic. Everyone is interested in ~~self-propelled carriages~~ self-driving vehicles, right? We have people in Russia who work on them, and not only at Yandex. But what these people struggle with often has little to do with the magic that we ordinary Muggles imagine. In practice, a car learns to drive in a simulator, which can be GTA (and really, why write your own?). Once the sensor data is parsed, the features are almost the same. The whole problem is in that "almost".

Here is the car driving along, all is well. But put a couple of bags of cement in its trunk and it starts to fail. Are the narrow subtasks and the unexpected, dare I say silly, problems that people run into, these stories about a ~~bag of cement~~, interesting? Because how to clock a billion accident-free hours in the simulator is something they themselves may not want to talk about.

For cases, the plan so far is the most laborious one: we go to meetups and conferences, talk to everyone who might have interesting tasks, sift the elephant through a sieve, and collect informative cases.

Questions:
Which practical areas are especially interesting? Do you want stories about solutions “as a whole” (architecture, choice of technologies, etc.), or are narrow subtasks interesting too?

What was missing entirely, but might be needed

Machine learning contests

We say "contests", we mean "Kaggle". This topic was not represented at all at the first SmartData, yet it is a vast area of activity with its own ecosystem and a huge community of enthusiasts. Typical mistakes in running contests, saving on the Amazon Web Services budget when taking part in them, a meeting with one of the top Kagglers: all of these look like interesting options.

Questions:
Are topics related to solving machine learning problems competitively interesting to you?

Business Cases

What business cases? This is an engineering conference!

Well, this kind: sometimes people do something interesting and non-obvious, and the discovery is not in how they implemented it but in the very formulation of the problem. For example, you happen to have, never mind from where, a corpus of email texts from known "white" senders. Can any value be extracted from it? Sometimes people walk past an opportunity for years, and then someone suddenly notices it. The technical implementation may be at the level of linear regression; that is not where the essence of the find lies.

It seems to me that the industry has a demand for “how could I use / monetize the data I already have”, and there may be interesting stories here. We just need to find the really interesting ones, not the usual kind. In any case, the conference feedback showed there were people who, it turns out, had been waiting for such insights.

Questions:
Do you ever find yourself unsure “what could I do with this data”? Are data analysis problems that someone has solved interesting regardless of the algorithms applied?

The ideal picture of the world

Here I will gather the questions that came up, and I will be glad to get suggestions and advice.

It seems that the program of an interesting ML / AI / data analysis engineering conference should include the following elements:

Data science vs. data engineering

1. Does it make sense for you to combine these topics?

Authors of popular frameworks

2. CatBoost, XGBoost, LightGBM, TensorFlow, who else?
3. What should we ask these people to talk about?

World class stars

4. Is it interesting to see people like Ng / Hastie / Tibshirani / Friedman / Hinton live?
5. Who, in your view, did I forget in the list one line above?
6. Is there a living participant of some famous project you would like to hear?

Practical cases

7. Cases from which areas are particularly interesting?
8. Is it fine if the tasks are quite narrow?

Kaggle

9. Do you need talks specifically dedicated to solving machine learning problems competitively?

Business cases, solving problems with data analysis

10. Should we look for them?

Source: https://habr.com/ru/post/344868/
