
Introduction to machine learning and a quick start with Azure ML

This is a translation of an article by Rafal Lukawiecki of Project Botticelli Ltd, a company offering online training and courses on a range of technologies, including machine learning and Power BI. The original article can be found on the Project Botticelli website.
The Azure Machine Learning service is currently in public preview, available to anyone with an Azure account (or at least trial access). If you are wondering why I am so excited about this technology, take a look at my overview article written a month ago, or read on: in this post I will tell you everything.



In short, to perform predictive analytics with Azure Machine Learning, all you need to do is complete the following steps:

  1. Upload, or import online, any current or accumulated data (for example, your customers’ demographics and their total spending)
  2. Build and validate a model (for example, predicting spending based on demographics)
  3. Create a web service that uses your model to make quick predictions in real time (deciding which offer to show a new customer based on their demographics)
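To make the three steps concrete, here is a deliberately tiny sketch in plain Python, not the Azure ML API: the data, column meanings, and the simple least-squares "model" are all invented for illustration.

```python
# Toy illustration of the import / train / predict loop described above.
# Hypothetical historical data: (customer_age, total_spend).
history = [(25, 120.0), (32, 180.0), (47, 260.0), (51, 300.0), (38, 210.0)]

# Step 2: build a model -- here, a least-squares line spend = a * age + b.
n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in history) / sum(
    (x - mean_x) ** 2 for x, _ in history)
b = mean_y - a * mean_x

# Step 3: what the published "web service" would do -- score a new customer.
def predict_spend(age):
    return a * age + b

print(round(predict_spend(40), 1))
```

In Azure ML you never hand-code the model like this; you drag a Train task onto the canvas and the service does the fitting, but the data flow is the same.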

The Azure ML service (also known as project Passau) comprises two conceptual components, Experiments and Web Services, and one development tool, ML Studio. You can invite other people with a Microsoft account (Live ID) to collaborate in your workspaces using ML Studio, and they don’t even need their own paid Azure subscription to work with you.
Experiments can be thought of as data-flow configurations describing what you would like to do with your data and your models. As an Azure ML data scientist, you focus on experiments and can spend all your time in ML Studio: rebuilding experiments, changing parameters, algorithms, and validation criteria, making periodic changes to the data, and so on. ML Studio is a web application, and it looks much like the Azure management portal (as of this writing, mid-2014). The interface is clean and pleasant, and works well not only in IE but also in Firefox and Chrome, albeit with a few caveats; this is only the first preview version, after all.

ML Studio is where you start your work, deciding which data sources to use: your own uploaded datasets, or live data accessible via the Reader mechanism from a web page, OData, SQL Azure, Microsoft Azure, Hive, or Azure blobs. Then you may need to perform some Data Transformations, such as grouping, renaming columns, joins, removing duplicates, or the very useful binning/discretisation operation. There are also other, more interesting transformations, for example the Finite and Infinite Impulse Response (FIR/IIR) filters used in signal processing. These can be applied more broadly to economic data that can be viewed as a complex wave (time series in particular). This is part of the work of detecting seasonality, and often amounts to searching for frequencies, much like musical tones, within those seasonal patterns. In addition, if you are just starting your project and are not quite sure which data columns to include, the Feature Selection filters may be useful, offering a good choice of correlation measures. In practice, however, in later steps you will want to specify the set of columns manually for maximum accuracy.
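To see what the binning/discretisation transformation does under the hood, here is a minimal equal-width version in plain Python; the column (ages) and the bin boundaries are invented for the example, and Azure ML's task offers more strategies than this.

```python
# Equal-width binning: map a numeric value to one of n_bins bin indices.
def bin_value(value, low, high, n_bins):
    """Return a bin index in [0, n_bins - 1] for `value` over [low, high)."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    width = (high - low) / n_bins
    return int((value - low) // width)

# Hypothetical age column, discretised into 4 bins over [0, 80).
ages = [17, 23, 35, 48, 64, 71]
print([bin_value(a, 0, 80, 4) for a in ages])
```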

(Image: Azure ML machine learning tasks)

Now we move on to what you have been waiting for: real machine learning, which means Initializing (defining) a model, Training it with some data, Evaluating the model’s performance and validity, and, if all is well, Scoring the model (making predictions with it). Azure ML offers many algorithms for classification tasks, including Multiclass and Two-Class Decision Forests, Decision Jungles (developed by Microsoft Research), Logistic Regression, Neural Networks, as well as Two-Class Averaged Perceptrons, Bayes Point Machines, Boosted Decision Trees, and Support Vector Machines (SVMs). Clustering uses a variation of the standard K-Means approach. Regressions include Bayesian Linear, Boosted Decision Trees, Decision Forests, of course Linear Regression, Neural Network Regression, and Ordinal and Poisson Regression. And this is only version 1.
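For intuition about the clustering approach mentioned above, here is a bare-bones K-Means sketch in plain Python with invented points; in Azure ML you configure the task rather than hand-code it, and the service's variant is more sophisticated than this.

```python
# Minimal K-Means: alternate assignment and centroid-update steps.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                            + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)]
    return centroids

pts = [(1, 1), (1.5, 2), (5, 7), (6, 8), (1, 0.5), (5.5, 7.5)]
print(kmeans(pts, [(0, 0), (10, 10)]))  # converges to the two point groups
```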

You can apply useful Statistical functions in your experiments, including elementary ones such as computing deviations. Try it yourself: just feed your data into the Descriptive Statistics task and Visualize the results (use the connection points on the tasks). Enjoy the boxplot elements in the resulting visualisations, something that has long been missing from all Microsoft BI tools, even Excel...
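The boxplot elements come from the classic five-number summary that a Descriptive Statistics task computes; here is a quick equivalent using Python's standard library on an invented sample (requires Python 3.8+ for `statistics.quantiles`).

```python
# Five-number summary plus standard deviation for a small sample.
import statistics

data = [2, 4, 4, 5, 7, 9, 11, 12, 15]
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (exclusive method)
print("min", min(data), "Q1", q1, "median", q2, "Q3", q3, "max", max(data))
print("stdev", round(statistics.stdev(data), 2))
```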

One cool example of how Azure ML brings external research into your experiments can be found in the Text Analytics task section. The Named Entity Recognition task lets you process input text (called stories, for example email messages, typed descriptions of incidents, or tweets) and extract named terms from it, automatically classifying them as People, Places, or Organizations. There is also support for the Vowpal Wabbit project, backed by Yahoo and Microsoft Research. You can use it to compute entity hashes on demand. I look forward to more tools and capabilities appearing in this area, since Microsoft obviously has a huge pile of knowledge stored inside Bing.
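The "entity hashes" mentioned above refer to the hashing trick popularised by Vowpal Wabbit: mapping arbitrary tokens into a fixed-size feature space. This is a generic sketch of the idea in Python, not VW's actual hash function, and the tokens and bucket count are invented.

```python
# The hashing trick: any token maps deterministically to one of n_buckets
# feature indices, so the feature space stays fixed-size no matter how
# many distinct entities the text contains.
import hashlib

def feature_index(token, n_buckets=2 ** 10):
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

tokens = ["Seattle", "Microsoft", "Rafal"]
print([feature_index(t) for t in tokens])
```

Collisions (two tokens sharing a bucket) are accepted as a trade-off for bounded memory, which is what makes the technique practical at web scale.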

Deep R language support


And on top of everything else, you can use R inside Azure ML. By my count, Azure ML today ships with about 410 pre-installed packages on top of R 3.1.0 (surprisingly, the latest version). Among the packages are ggplot2 (yes!), plyr and dplyr, car, datasets, Hmisc, MASS, and all the other most frequently used data-mining packages, such as rpart, nnet, survival, boot, and so on.

(Image: which R packages come with Azure ML)

If you want to find the list of packages included in Azure ML, just create a small experiment like mine, shown here, run a bit of R code, and save the resulting CSV to your computer. Column 1 lists all included packages.

What if your favorite R package (for example, ROCR or nleqslv) is not on the list? The documentation may confuse you here. It states that it is not possible "at present" to install your own packages; however, it then describes a workaround that lets you attach your package as a zip file. You can find a description of this approach at this link, which shows how to use install.packages() with a reference to the zip file passed to the Execute R Script task.

The key to understanding why it matters that R is part of Azure ML, in my opinion, is not just that the platform gives you access to the de facto lingua franca of statistics and analytics, but also how fast and painless it makes processing your data. This is especially noticeable given that R itself is not the most convenient tool for data manipulation. So instead of using the venerable RODBC (which is included) inside your R script, you can consider using Azure ML for all the heavy data-processing work (sorry, plyr fans) and passing the data into the R script as an Azure ML Dataset, which becomes available as a native R data frame. The data will magically appear inside your script as an object called dataset. You can add multiple data sources.

I haven’t finished my performance tests yet, but anything that can improve R’s performance on large amounts of data is to be warmly welcomed. Besides, such features look like a clear advantage of a cloud provider over a typical boxed product, and I can imagine Microsoft using a number of tricks to boost performance when Azure datasets are connected to the Azure ML service, even bearing in mind the current 10 GB limit.

(Image: Azure ML API)

With or without R, you can end up with a working experiment that you can use as a building block in your web application. Imagine you have just built a recommender. In Azure ML terms, you have an experiment that uses a Score task (predictions). You decide which of the input ports should serve as the Publish Input of your web service and, correspondingly, what should be the Publish Output. They appear as small green and blue bullets on the outline of the task. You rerun your experiment and use ML Studio to publish it as an Azure ML Web Service. You can now consume the results through the Azure ML REST API, either as a simple web service or as an OData endpoint. This API offers a Request Response Service (RRS) for low-latency, synchronous predictions, and an asynchronous Batch Execution Service (BES) for, say, retraining the model later on fresh data. The API comes with automatically generated sample code that you can simply copy and paste into Python, R, or a C# application, or anywhere else, since it is all just REST and JSON.
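As a hedged sketch of what an RRS call looks like from Python: the URL, API key, and column names below are placeholders (Studio generates real, ready-to-paste sample code for your own service), and the request body follows the general Inputs/GlobalParameters JSON shape.

```python
# Build the JSON body and headers for a synchronous RRS scoring request.
import json

API_KEY = "your-api-key"                                  # placeholder
URL = "https://<region>.services.azureml.net/<service>"   # placeholder

body = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["age", "income"],  # hypothetical input columns
            "Values": [["40", "55000"]],       # one row to score
        }
    },
    "GlobalParameters": {},
}
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + API_KEY,
}
payload = json.dumps(body)
print(payload[:40])
# An actual call would POST `payload` to URL with `headers` (e.g. via
# urllib.request or requests); the JSON response carries the scored labels.
```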

(Image: testing a prediction)

There is a neat little test page that lets you enter the required values for a freshly published service and get a test prediction back.

The service also has extra features aimed at production use, for example preventing Microsoft from automatically updating any of the components (tasks, and so on) of your experiment, since such a change could alter or even break your work. A good decision, Microsoft: this is something no IT specialist supporting web-based systems likes to run into. You can test service updates in staging, and configure security through an API access key.

Cost


How much does it all cost? Bearing in mind that this is preview pricing, it looks very attractive. There are two kinds of charges, per-hour active compute and per-web-service API call, both pro-rated. The hourly rate is lower while you work in ML Studio ($0.38/hour) and a little higher in production through the ML API Service ($0.75/hour). API calls are free while you work in ML Studio, and cost $0.18 per 1,000 predictions in production. If anything, this is an interesting and refreshingly simple pricing model, unlike some others we have seen from Microsoft. I am extremely curious what my development clients will think, because there is a great opportunity here to resell Azure ML as part of your own web application with minimal support effort, without having to build the whole system yourself.
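A back-of-envelope estimate using the preview prices quoted above (the usage figures in the example are, of course, made up):

```python
# Preview pricing quoted in the article, mid-2014.
STUDIO_RATE = 0.38      # $/hour of active compute in ML Studio
API_RATE = 0.75         # $/hour of active compute via the ML API Service
PER_1000_CALLS = 0.18   # $ per 1,000 production predictions

def monthly_cost(studio_hours, api_hours, predictions):
    return (studio_hours * STUDIO_RATE
            + api_hours * API_RATE
            + predictions / 1000 * PER_1000_CALLS)

# e.g. 20 studio hours, 10 production compute hours, 50,000 predictions:
print(round(monthly_cost(20, 10, 50_000), 2))  # 24.1
```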

Where to begin?


Where to begin? Visit azure.microsoft.com, subscribe, and create a workspace under New / Data Services / Machine Learning. Then go to the Dashboard and click the Sign-in to ML Studio link. After looking over the tasks that make up an experiment, I would advise you to pick one of the many samples, make a copy of it, and run it. If it works, follow the steps described above to publish it as your first predictive web service.

And of course, make sure you don’t miss our upcoming videos and articles on this topic: become a member of the site to receive a newsletter packed with information. If you want to get up to speed quickly, take a look at our Data Mining training, especially the modules on data preparation, since those concepts, especially cases and input and output columns, will definitely come in handy when working with Azure ML.

Enjoy learning machine learning!

Announcement! All readers of this translated article are offered a 10% discount on the author’s courses with the code MSBI2014RU! The discount is valid until the end of October 2014.

Additional links


Source: https://habr.com/ru/post/236823/
