
Experimenting with Azure ML: Classification, Decision Trees

At the end of February this year, an article about the open machine learning course from the Open Data Science community made a splash on Habr. Our MVP, Mikhail_Komarov, decided to experiment and work through the course, implementing some of the algorithms in Azure ML. Below the cut is his walkthrough of part 3 of the course, “Classification, decision trees and the method of nearest neighbors”.



From here on, the author speaks in the first person.

The idea for this material came up while I was working through the “Open Machine Learning Course”. I set myself the following task: “There are reference solutions in Python. Some of them can be ported to Azure ML, and the port should make the most of Azure ML's functionality. How can this be done?”
First of all, I thought about the format: how detailed should the material be to interest you? One of the Azure ML review articles drew these suggestions: more theory and code, and not necessarily step by step.

So, thanks to the open course format, the theory is already covered in the original article. Many thanks to its creators; the Python code is also available here. We can therefore go straight to the essence of the solution in Azure ML.

First, we solve the problem of importing the data as a new dataset. Unfortunately, Azure ML has limited import functionality, so we change the field separator in the files in advance to match standard CSV. Azure ML offers the same dataset among its samples, but we will use the one from the course.
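The separator change can be done with a few lines of Python before uploading. The sketch below is only an illustration: it assumes the source files use a semicolon as the separator, and the sample rows and the function name `convert_delimiter` are hypothetical, not taken from the course data.

```python
import csv
import io


def convert_delimiter(src, dst, src_delimiter=";"):
    """Copy delimited text from src to dst as comma-separated CSV.

    src and dst are text file objects. Returns the number of rows written.
    The semicolon default is an assumption about the input files.
    """
    reader = csv.reader(src, delimiter=src_delimiter)
    writer = csv.writer(dst, lineterminator="\n")
    rows = 0
    for row in reader:
        # csv.writer quotes any field that itself contains a comma
        writer.writerow(row)
        rows += 1
    return rows


# Hypothetical sample standing in for a course data file
raw = "age;education;salary\n39;Bachelors;<=50K\n50;HS-grad, some college;>50K\n"
out = io.StringIO()
n_rows = convert_delimiter(io.StringIO(raw), out)
print(out.getvalue())
```

For real files, the same function can be called with `open(path, newline="")` handles for both the input and the output. Going through the `csv` module rather than a plain string replace keeps fields that already contain commas intact.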



Since the task builds a tree, a tree with parameter tuning, a forest, and a forest with parameter tuning, we create an experiment with exactly that structure.



To avoid wasting time on details and to concentrate on the main thing, I suggest taking my experiment from the gallery as a basis and changing it as needed. If you do not want to create a free account, you can try it without one, simply by removing the Python module.

Details of work in the cloud


I will dwell on a few details that will help you understand how the experiment works in the Azure ML cloud.

1. Data cleaning: missing numeric values are replaced with their median. The module also works for text fields.



2. Create the categorical features. Note that Azure ML automatically expands categorical features into columns of zeros and ones.



3. Since Azure ML has no plain decision tree algorithm, as a first approximation we take the Two-Class Boosted Decision Tree and set the number of trees to one in its parameters.



4. Now let's look at the result in the Evaluate Model module.



5. We get the following set of results.
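For readers who want to see what steps 1 and 2 above do to the data, here is a minimal plain-Python sketch. It is not the Azure ML implementation: the function names, column names, and sample rows are hypothetical and only illustrate median imputation and the expansion of a categorical column into 0/1 indicator columns.

```python
from statistics import median


def fill_missing_with_median(rows, column):
    """Replace None in a numeric column with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows; None marks a missing numeric value
rows = [
    {"age": 25, "education": "Bachelors"},
    {"age": None, "education": "HS-grad"},
    {"age": 41, "education": "Bachelors"},
    {"age": 33, "education": "Masters"},
]

clean = one_hot(fill_missing_with_median(rows, "age"), "education")
for row in clean:
    print(row)
```

The median is computed only from the observed values, so the missing entries do not influence the fill value.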



That is the overall picture, but there are further details, such as parameter selection and cross-validation.

Below you can see the result for the tree.





Details of how parameter selection is implemented can be found in Tune Model Hyperparameters. Do not forget to switch the model itself to Parameter Range mode.
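For comparison with the Python side of the course, parameter selection with cross-validation can be sketched in scikit-learn roughly as follows. This is an analogue, not what Tune Model Hyperparameters runs internally: the synthetic data, the parameter grid, and the choice of 5 folds are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the course data (assumption, not the real dataset)
X, y = make_classification(n_samples=300, n_features=8, random_state=17)

# Hypothetical ranges, playing the role of a parameter range on the model
param_grid = {
    "max_depth": [2, 3, 5, 8],
    "min_samples_leaf": [1, 5, 10],
}

# Each parameter combination is scored by 5-fold cross-validation
search = GridSearchCV(
    DecisionTreeClassifier(random_state=17),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
```

The grid here is an exhaustive sweep over every combination, which is reasonable for a small grid like this one; for larger ranges a random sweep is usually cheaper.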



Thank you for your attention, and may your clouds be light!

Useful links




Author


Mikhail Komarov supports existing systems and implements new ones aimed at improving work efficiency in the corporate segment. Before moving to the large corporate sector, he worked as an IT trainer. He has more than 20 years of experience in IT overall. His interests include virtualization, infrastructure, data analysis, and machine learning. He has been an MVP in Cloud and Datacenter Management since 2011.

I would like to thank Evgeny Grigorenko for his help with the experiment and Elizabeth Shvets for solving general issues and preparing the publication.



We remind you that you can try Azure for free.

Source: https://habr.com/ru/post/328826/

