
Machine learning course on Coursera from Yandex and HSE

Some time ago we published on Habr a machine learning course by Konstantin Vorontsov from the School of Data Analysis. Readers then suggested that we turn it into a full-fledged course with homework and put it on Coursera.



And today we are happy to say that we have finally been able to fulfill all these wishes. In January, a course organized jointly by Yandex (the School of Data Analysis) and HSE will launch on Coursera. You can sign up now: www.coursera.org/learn/introduction-machine-learning .





Coursera co-founder Daphne Koller in the Yandex office


The course will last seven weeks, which means it is noticeably simplified compared with the two-semester ShAD course. Still, within these seven weeks we tried to include only what will actually be useful in practice, plus the basics that one simply has to know. The result, we believe, is an excellent Russian-language course for a first acquaintance with machine learning.



In addition, we believe that after taking the course a person should have not only the theory in their head but also hands-on skills. That is why all the practical assignments are built around the scikit-learn library (Python). As a result, after completing our course a person will be able to solve data analysis problems on their own, and further development will come easier.



Below you can read more about the course authors and its approximate contents.



About the instructors



The course lecturer is Konstantin Vorontsov. For many years Konstantin Vyacheslavovich has taught the fundamentals of machine learning to students of the ShAD, HSE, MIPT and MSU.



The practical part of the course was prepared by Petr Romov, Anna Kozlova and Evgeny Sokolov, who also gives several lectures. All three work at Yandex (Evgeny and Petr at Yandex Data Factory, Anna in the machine translation department) and use machine learning in their daily work. They keep track of what is happening in the field of data analysis and tried to design the assignments so that completing them would bring maximum benefit to course participants.



Program



Below is a description of the course modules in the form in which they will appear on Coursera when the course starts.



1. Introduction to data analysis and machine learning.

In this module we will discuss the problems that machine learning solves, define a basic set of concepts, and introduce the necessary notation. We will also cover the main Python libraries for working with data (NumPy, Pandas, scikit-learn), which you will need for the practical assignments throughout the course.
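To give a taste of this toolchain, here is a minimal sketch (our own illustration with made-up numbers, not an actual course assignment) of how Pandas and NumPy cooperate on a small table:

```python
import numpy as np
import pandas as pd

# A tiny made-up table: each row is an object, each column a feature.
df = pd.DataFrame({
    "height": [170, 180, 165, 175],
    "weight": [65, 80, 55, 70],
})

# Pandas is convenient for tabular manipulation...
df["bmi"] = df["weight"] / (df["height"] / 100) ** 2

# ...while NumPy works on the underlying numeric arrays.
X = df[["height", "weight"]].to_numpy()
print(X.mean(axis=0))  # per-feature means
```

In the assignments, the same pattern repeats at a larger scale: load data with Pandas, extract a feature matrix, and hand it to scikit-learn.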



2. Logical classification methods.

Logical methods classify objects based on simple rules, which makes them interpretable and easy to implement. Combined into compositions, logical models can solve many problems with high quality. In this module we will study the main class of logical algorithms, decision trees. We will also talk about combining trees into a composition called a random forest.
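As a quick illustration (ours, not a course assignment) of how little code both models take in scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single interpretable tree...
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# ...and a composition of 100 randomized trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(tree.score(X_te, y_te), forest.score(X_te, y_te))
```

On harder datasets than iris, the forest typically holds up noticeably better than a single tree.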



3. Metric classification methods.

Metric methods classify based on similarity, so they can work with data of complex structure: the only requirement is that a distance between objects can be measured. We will study the k-nearest-neighbors method and see how to generalize it to regression problems using kernel smoothing.
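A toy sketch of both ideas (our illustration; note that scikit-learn's distance-weighted kNN regression is a simple relative of kernel smoothing, not full Nadaraya-Watson smoothing):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# One-dimensional objects; the distance is just |a - b|.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_cls = [0, 0, 0, 1, 1]

# Classification by majority vote among the 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y_cls)
print(knn.predict([[3.5]]))  # → [1]

# Regression with distance-based weights: closer neighbors count more.
reg = KNeighborsRegressor(n_neighbors=3, weights="distance").fit(
    X, [0.0, 1.0, 2.0, 3.0, 4.0])
print(reg.predict([[1.5]]))
```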



4. Linear classification methods.

Linear models are one of the most thoroughly studied classes of algorithms in machine learning. They scale easily and are widely used for working with big data. In this module we will study stochastic gradient descent for training linear classifiers, get acquainted with regularization, and discuss some subtleties of working with linear methods.
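A minimal sketch of these ingredients in scikit-learn (our illustration on synthetic data; the `alpha` value here is arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# Feature scaling matters for stochastic gradient descent;
# `alpha` controls the strength of L2 regularization.
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(alpha=1e-3, random_state=0),
)
model.fit(X, y)
print(model.score(X, y))
```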



5. Support vector machine and logistic regression.

Linear methods have several very important subtypes, which are discussed in this module. The support vector machine maximizes the margin between objects and the separating surface, which is closely related to minimizing the risk of overfitting. At the same time it makes it very easy to move to a nonlinear separating surface via the kernel trick. Logistic regression makes it possible to estimate the probabilities of belonging to classes, which turns out to be useful in many applied problems.
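Both properties are easy to see in a small sketch (our illustration on a synthetic "two moons" dataset):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# The RBF kernel lets the SVM draw a nonlinear boundary.
svm = SVC(kernel="rbf", C=1.0).fit(X, y)

# Logistic regression returns class probabilities, not just labels.
logreg = LogisticRegression().fit(X, y)
proba = logreg.predict_proba(X[:1])

print(svm.score(X, y), proba)
```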



6. Classification quality metrics.

In machine learning there is a large number of quality metrics, each of which has its own applied interpretation and measures a specific property of the solution. In this module we will discuss metrics for the quality of binary and multi-class classification, and also consider ways of reducing multi-class problems to two-class ones.
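A small worked example (our own made-up labels) showing how the same predictions look under different metrics:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0]   # ground truth
y_pred  = [0, 1, 1, 1, 0, 0]   # hard predictions
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]  # predicted scores

print(accuracy_score(y_true, y_pred))   # 4 of 6 correct
print(precision_score(y_true, y_pred))  # 2 true positives of 3 predicted
print(recall_score(y_true, y_pred))     # 2 found of 3 actual positives
print(roc_auc_score(y_true, y_score))   # ranking quality of the scores
```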



7. Linear regression.

In this module we will examine linear regression models and discuss their connection with the singular value decomposition of the objects-features matrix.
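The SVD connection can be demonstrated directly: the least-squares solution equals the pseudo-inverse (computed from the SVD) applied to the targets. A sketch on synthetic data, our illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

# scikit-learn's least-squares fit (no intercept, to keep things simple)...
model = LinearRegression(fit_intercept=False).fit(X, y)

# ...matches the solution via the pseudo-inverse, built from the SVD of X.
w_svd = np.linalg.pinv(X) @ y

print(np.allclose(model.coef_, w_svd))  # → True
```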



8. Dimensionality reduction and principal component analysis.

In applied problems there is often a need to reduce the number of features, for example to speed up the models. In this module we will discuss approaches to feature selection, and we will also study principal component analysis, one of the most popular dimensionality reduction methods.
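A sketch of PCA in action (our illustration: we generate 10-dimensional data whose variance actually lives in 2 directions, then recover them):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 objects, 10 features, but almost all variance sits in 2 directions.
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Two components explain nearly all the variance here.
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```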



9. Composition of algorithms.

Combining a large number of models into a composition can significantly improve the final quality, since individual models correct each other's mistakes. In this module we will discuss the basic concepts and problem statements associated with compositions, and consider one of the most common ways of building them: gradient boosting.
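A minimal gradient boosting sketch (our illustration on synthetic data; the hyperparameter values are just typical defaults):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each new shallow tree is fitted to correct the mistakes
# of the current composition.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=0)
gb.fit(X_tr, y_tr)
print(gb.score(X_te, y_te))
```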



10. Neural networks.

Neural networks can find complex nonlinear separating surfaces, which makes them widely used in such difficult tasks as image and speech recognition. In this module we will study multilayer neural networks and how to train them with the backpropagation algorithm. We will also talk about deep neural networks, their architectures and specifics.
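A small sketch (our illustration) of a multilayer network learning a boundary no linear model can draw; scikit-learn's `MLPClassifier` trains via backpropagation under the hood:

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# The "two moons" dataset is not linearly separable.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# One hidden layer of 16 units; training uses backpropagation.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))
```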



11. Clustering and visualization.

This module is devoted to a new class of machine learning problems: unsupervised learning. This refers to situations where one needs to find structure in the data or perform exploratory analysis. In this module we will discuss two such tasks: clustering (finding groups of similar objects) and visualization (mapping objects into two- or three-dimensional space).
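A clustering sketch with k-means (our illustration; note that no labels are used during fitting):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 natural groups; labels are discarded.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.shape)  # → (3, 2)
```

For visualization, the course's 2-D mapping ideas can be tried with e.g. `sklearn.manifold.TSNE` on the same data.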



12. Semi-supervised learning.

Semi-supervised learning refers to a task that lies between supervised learning and clustering: a sample is given in which the value of the target variable is known only for some of the objects. Such situations arise when labeling objects is expensive, while computing features for objects is cheap. In this module we will discuss how semi-supervised learning differs from the problem settings considered earlier, and analyze several approaches to solving it.
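One approach of this kind available in scikit-learn is label spreading over a neighbor graph; a sketch (our illustration, not necessarily one of the methods covered in the module):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Pretend most labels are unknown: -1 marks unlabeled objects.
y_partial = np.full_like(y, -1)
y_partial[::10] = y[::10]  # keep only every tenth label

# Labels propagate through a k-nearest-neighbor graph.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
print((model.transduction_ == y).mean())  # accuracy on all objects
```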



13. Machine learning in applied problems.

In this module we will summarize the course and recall the main stages of solving a data analysis problem. We will also examine several problems from applied areas in order to prepare for the final project.

Source: https://habr.com/ru/post/269175/


