Many thanks to Evgeny Grigorenko, Microsoft Student Partner, for his help in preparing this article. The rest of our Azure articles can be found under the azureweek tag.
Let me guess: like me, you have been burning for a couple of months with the idea of a brilliant application. Beyond its core functionality, in an ideal world it simply must have a host of extra features, for example, identifying the user (or cat) from a front-camera photo, or understanding commands in natural language. Or becoming a second how-old.net (which, incidentally, was built precisely on Project Oxford).
But we all know the sad truth. Much of this is possible only with complex machine learning algorithms that we have absolutely no time to study. And that is exactly what stops development, since without such features the application will get lost among its analogues. There is a solution to this problem, and its name is Microsoft Project Oxford. If you want to find out how Microsoft Project Oxford can simplify your life and make your applications truly intelligent, then welcome under the cut.

At the //BUILD conference in April 2015, among many other announcements, Microsoft introduced a new addition to the Microsoft Azure family of cloud services: a project code-named
Project Oxford - a set of ready-made REST APIs that put the full power of computer vision, natural language analysis and speech recognition algorithms into developers' hands in an accessible form, for use in their own applications. It is worth noting that exposing the services as REST APIs means they can be used on absolutely any platform and with your favorite development technologies, not only those proposed by Microsoft.
Project Oxford itself extends the already existing
Azure Machine Learning Gallery with new, highly intelligent solutions. The original idea behind creating the Azure ML Gallery a year ago was to gather in one place machine learning services that are simple to use. You do not need to be an expert mathematician to use them; all that is required is to call the API and not think about the complex mathematical details at all - internally, the services do everything on their own. And Project Oxford fits this idea perfectly.
What does Project Oxford consist of? The project comprises four groups of self-sufficient cloud APIs: the Face APIs, the Computer Vision APIs, the Speech APIs, and, still in closed beta, the Language Understanding Intelligent Service (LUIS).

The
Face APIs include cloud-based algorithms for detecting and recognizing human faces in photographs, namely:
- detection of face boundaries as bounding rectangles, together with additional characteristics such as the coordinates of facial landmarks, head pose, gender and a heuristic estimate of age;
- a wide range of recognition services, covering scenarios such as verifying the similarity of two faces, finding faces similar to a given sample across a series of photographs, automatic grouping of photographs, and identification (recognition) of people based on a previously prepared training set (a minimal sketch of a detection call follows this list).
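To get a feel for the API, here is a minimal sketch of a detection request using Python and the requests library. The endpoint URL and parameter names (returnFaceLandmarks, returnFaceAttributes) follow the documentation of the time and may differ in the current version; the subscription key and image URL are placeholders.

```python
# A minimal sketch of a Face API detection call; endpoint and parameters are
# assumptions based on the documentation of the time, not a definitive reference.
import requests

FACE_DETECT_URL = "https://api.projectoxford.ai/face/v1.0/detect"  # illustrative endpoint
SUBSCRIPTION_KEY = "<your Face API key>"  # placeholder

def detect_faces(image_url):
    """Send an image URL to the Face API and return the parsed JSON response."""
    response = requests.post(
        FACE_DETECT_URL,
        params={
            # Ask for face rectangles plus the extra attributes described above.
            "returnFaceLandmarks": "true",
            "returnFaceAttributes": "age,gender,headPose",
        },
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
        json={"url": image_url},
    )
    response.raise_for_status()
    return response.json()  # a list of faces with rectangle, landmarks and attributes

for face in detect_faces("https://example.com/photo.jpg"):
    rect = face["faceRectangle"]
    attrs = face["faceAttributes"]
    print(rect["left"], rect["top"], attrs["gender"], attrs["age"])
```

Since the response is plain JSON over HTTP, the very same call can be made from any language or platform that has an HTTP client.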
Beyond the standard tasks of finding faces in photos and automatically categorizing a photo gallery in the way Apple iPhoto does, the service can be used in many other scenarios. Just recall modern blockbusters: tracking people's movements through surveillance cameras or automatic authorization as someone approaches a top-secret facility become possible and achievable with the Microsoft Project Oxford Face APIs.
But even if you are not about to star in a spy drama, the services on offer can be just as useful: intelligent content targeting based on the user's gender and age, filtering photos by the faces of people in your contact list - there are as many scenarios as there is imagination.
For everyone who is interested, as well as those who are not yet, I strongly advise taking a look at the additional information on the Face API on the
official website of the project . There, alongside the
documentation , several interesting
demos are presented; besides visualizing the API's text responses to requests, they are a great way to try out the services and understand all their capabilities in a couple of minutes.
how-old.net, which became popular immediately after its announcement, and its recently introduced companion
twinsornot.net were, according to the original plan, also created only as Face API demos for the duration of the //BUILD 2015 conference. Read about the success story of the first service, and try to project it onto the story of your own application,
here .

The most intriguing part of Project Oxford is, without a doubt, the
Language Understanding Intelligent Service, or
LUIS . LUIS gives developers the ability to build natural language understanding models for convenient use in their applications.
Such models can come from several sources. Simple models can be built on top of models that already exist and are successfully used in projects such as Cortana or Bing. If you want your software to understand basic concepts like time, numbers or temperature and respond correctly to a query like "remind me about training at 8 am", the standard models will be enough. If you need to react to more specific requests like "start tracking my run" or "turn on the light", you will have to build your own models, which is also achievable with the tools provided by LUIS.
These models can then be published as a REST API and used on any device and operating system capable of making such calls (a minimal sketch of such a call is shown below). The opportunities LUIS offers are hard to overestimate. A couple of years ago virtual assistants such as Cortana and Siri seemed complicated and unattainable, and now they are within the reach of any developer. Your solution can easily and naturally become truly intelligent and perhaps, in the end, it will even manage to pass the Turing test.
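As an illustration, here is a rough sketch of querying a published LUIS model over plain HTTP with Python's requests library. The endpoint format, application id and subscription key are placeholders based on the closed-beta documentation and may well change.

```python
# A rough sketch of querying a published LUIS model; the endpoint format is
# illustrative and the application id and key are placeholders.
import requests

LUIS_URL = "https://api.projectoxford.ai/luis/v1/application"  # illustrative endpoint
APP_ID = "<your LUIS application id>"           # placeholder
SUBSCRIPTION_KEY = "<your LUIS subscription key>"  # placeholder

def understand(utterance):
    """Send a natural-language utterance to the published model and return its JSON answer."""
    response = requests.get(
        LUIS_URL,
        params={"id": APP_ID, "subscription-key": SUBSCRIPTION_KEY, "q": utterance},
    )
    response.raise_for_status()
    return response.json()

result = understand("remind me about training at 8 am")
print(result.get("intents", []))   # ranked list of recognized intents
print(result.get("entities", []))  # recognized entities, e.g. the time expression "8 am"
```

The application code then only has to map the top-ranked intent and its entities onto an action, which is exactly how a simple voice assistant can be wired together.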
But, unfortunately, a project of this magnitude takes time. Unlike the other Project Oxford services, which are available for use right away, LUIS is still in closed testing. Additional information about the project can be found on the
official page and in the video of
one of the talks from //BUILD 2015, and
here you can join the queue for access to the project. Do not miss the opportunity to dive into the world of natural language understanding and be the first to give these capabilities to your users!

The services of the
Computer Vision APIs group continue the visual analytics theme, but in a completely different way. They specialize in the analysis of arbitrary photos and provide the following wide range of features (a sketch of a combined analysis call follows the list):
- categorization of images, for example food, people, buildings and, of course, cats;
- detection of adult or racy content in photos;
- determination of the dominant colors of an image and of whether it is black and white, clip art or a line drawing;
- text recognition (OCR);
- thumbnail generation based on intelligent analysis of the image composition.
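Here is a hedged sketch of a single "analyze" call that covers several of the features listed above (categories, adult-content scoring, dominant colors and image type) in one request. The endpoint path and the visualFeatures values are assumptions based on the documentation of the time; the key and image URL are placeholders.

```python
# A hedged sketch of a Computer Vision "analyze" request; endpoint and feature
# names are assumptions, check the current documentation before relying on them.
import requests

VISION_ANALYZE_URL = "https://api.projectoxford.ai/vision/v1/analyses"  # illustrative endpoint
SUBSCRIPTION_KEY = "<your Computer Vision key>"  # placeholder

def analyze_image(image_url):
    """Request category, adult-content, color and image-type analysis for an image URL."""
    response = requests.post(
        VISION_ANALYZE_URL,
        params={"visualFeatures": "Categories,Adult,Color,ImageType"},
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
        json={"url": image_url},
    )
    response.raise_for_status()
    return response.json()

analysis = analyze_image("https://example.com/cat.jpg")
print(analysis.get("categories"))                     # e.g. a category such as "animal_cat" with a score
print(analysis.get("adult", {}).get("adultScore"))    # score used for content moderation
print(analysis.get("color", {}).get("dominantColors"))
```

The adult-content scores returned here are exactly what the moderation scenario described further below relies on.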
Frankly, the Computer Vision APIs are my favorite in terms of the number of features offered. Dividing images into categories is probably one of the oldest tasks in computer vision, and you have surely already come up with a dozen scenarios for using it in your project. Not yet? Just try it! Have you ever wanted to add OCR to your application to read receipts, street signs or any other text that surrounds us everywhere in the real world, but thought it would be difficult and expensive? Now it is available and convenient to use as one of the Project Oxford Computer Vision APIs.
But in addition to the well-known technologies described above, the Computer Vision APIs provide many extra features. Consider, for example, a purely technical task such as generating thumbnail images. The task seems simple until you face it in real life. Scaling while keeping the proportions is easy, but as soon as you try to change them, headless bodies, "sky" instead of "a cat against the sky" and other artifacts of incorrect cropping pour out as if from a cornucopia. The Computer Vision API has a solution. It not only hides the technical details of scaling while preserving the maximum quality of the thumbnail, but also tries to determine the significant elements of the scene, which leads to a better choice of cropping borders. In most cases this approach preserves as much of the original content as possible in the generated thumbnail. Just look at the example below: the service was given nothing but the image of the person on top of the mountain.
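A minimal sketch of that smart-thumbnail scenario might look like this; the endpoint path and the smartCropping parameter are assumptions based on the documentation of the time, and the service is expected to return the binary data of the cropped thumbnail.

```python
# A minimal sketch of requesting a smart-cropped thumbnail; endpoint and
# parameter names are assumptions, the key and image URL are placeholders.
import requests

THUMBNAIL_URL = "https://api.projectoxford.ai/vision/v1/thumbnails"  # illustrative endpoint
SUBSCRIPTION_KEY = "<your Computer Vision key>"  # placeholder

def smart_thumbnail(image_url, width=200, height=200):
    """Return the bytes of a width x height thumbnail with smart cropping enabled."""
    response = requests.post(
        THUMBNAIL_URL,
        params={"width": width, "height": height, "smartCropping": "true"},
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
        json={"url": image_url},
    )
    response.raise_for_status()
    return response.content  # binary image data of the generated thumbnail

with open("thumbnail.jpg", "wb") as f:
    f.write(smart_thumbnail("https://example.com/climber.jpg"))
```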
Every site owner who lets users upload images knows the moderation problem. Project Oxford's ability to detect adult and racy content allows you to put this difficult task on the shoulders of the machine. All that is required when a photo is uploaded to your service is to make a parallel call to the Computer Vision APIs and, based on the returned scores for the unwanted-content categories, decide whether the image needs additional human review or whether the user should simply be blocked from publishing it. And if such a solution is not enough and a more sophisticated approach is required, it may be worth looking at a group of related services built on top of the Computer Vision APIs:
Microsoft Content Moderator and
PhotoDNA Cloud Service .
Anyone interested can find more information on the
official site of the project , where, as before,
documentation and convenient
demos are also available.

The
Speech APIs expose the algorithms that have for many years powered the voice services of the Bing search engine and the recently introduced Skype Translator, and that have now found their natural home in Windows 10 in the form of the already well-known virtual assistant Cortana. As you might guess, the Speech APIs include services for converting speech in an audio stream into text and back.
The described functionality hardly needs a special introduction, so let us discuss only a few additional details. First of all, we need to talk about languages, and here things are not in favor of Russian. The speech recognition services currently support only English, German, Spanish, French, Italian and Chinese, but this list is constantly expanding. Text-to-speech, on the other hand, supports a number of additional languages, including Russian, and can therefore be actively used right now. It is also worth noting that the recognition services support streaming with the possibility of returning partial (preliminary) results. This significantly speeds up processing of the incoming audio stream and makes the user interface as responsive as possible.
In addition, the Speech API is the only Project Oxford service that does not require a permanent active Internet connection: the corresponding algorithms are built into the Universal Windows Platform and can be used in your universal applications on all Windows 10 devices offline.
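For online use over REST, the general pattern is two-step: exchange your subscription key for a short-lived access token, then send the audio for recognition. The sketch below only illustrates that pattern; both URLs are deliberate placeholders, and the exact endpoints, query parameters and audio format requirements should be taken from the Speech API documentation.

```python
# A sketch of the two-step speech-to-text pattern (token, then audio upload).
# Both URLs are placeholders, not real endpoints; consult the documentation.
import requests

TOKEN_URL = "https://<speech-token-endpoint>"            # placeholder
RECOGNIZE_URL = "https://<speech-recognition-endpoint>"  # placeholder
SUBSCRIPTION_KEY = "<your Speech API key>"               # placeholder

def recognize(wav_path, language="en-US"):
    """Exchange the subscription key for a token, then POST a WAV file for recognition."""
    token = requests.post(
        TOKEN_URL,
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
    ).text

    with open(wav_path, "rb") as audio:
        response = requests.post(
            RECOGNIZE_URL,
            params={"language": language},
            headers={
                "Authorization": "Bearer " + token,
                "Content-Type": "audio/wav; samplerate=16000",
            },
            data=audio,
        )
    response.raise_for_status()
    return response.json()  # recognition hypotheses with confidence scores

print(recognize("command.wav"))
```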
If, in this age of triumphant globalization, the lack of Russian support is not an insurmountable limitation for you, or you want to find out how the Speech API handles your particular use case, I advise you to visit the
main page of the project for additional information about the solution, the
technical documentation and the more than once recommended interactive
demos .
If you already have a mobile application or website, or you are only planning to create one, think about how Project Oxford could be useful to you personally - I am sure you will find something. With it, your solution will become more modern and stand out from the crowd, and users will be delighted with capabilities and convenience they have not seen before. And most importantly, you will not need to wade through intimidating mathematics or spend ages developing a complex algorithm - no effort at all beyond a couple of lines of code to call the right service. With Project Oxford, using machine learning services is easier than ever.