Ever since automated machine learning (AutoML) tools such as Google AutoML began to appear, experts have been debating whether they are ready for full corporate integration and use. AutoML tools are marketed on the promise that anyone can take on the role of a “data scientist” and create machine learning models that are ready for production use, without the technical education traditionally required.
Although automated machine learning is certainly changing the way enterprises perform data analysis tasks, the technology is not yet ready to put data scientists out of work. One of its main claims is that automatically created models are of similar quality to, and are produced faster than, equivalent models built by a team of data scientists.
Although AutoML models are created faster, they are effective only if the problem they address is stable and repetitive. Most AutoML models work well and achieve consistent quality under these conditions, but the harder the data problem, the more specialist intervention is required to understand what AutoML has produced and turn it into something useful. To understand some of these limitations, let's look at the AutoML process in more detail.
AutoML tools simplify data science work by automating as much as possible with the information available. The process consists of three main steps:
The first stage is feature engineering: “mining” information that improves the performance of the generated models by creating additional fields for the model to learn from. Done by hand, this takes a lot of time, since the data scientist has to identify the relationships between data elements largely manually, develop ways of representing that information as additional data fields that the machine can use for training, and decide whether the data is complete enough to build a model.
This is an important step, because these additional fields very often make the difference between an unsuitable model and an excellent one. AutoML is programmed to use a limited range of feature discovery methods, usually chosen to suit the “average” data problem. This limits the model's ultimate performance, because AutoML cannot draw on the knowledge of a subject matter expert (SME), which may be important for success and which a data scientist can use in their work.
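As a rough illustration of what such hand-designed fields can look like, here is a minimal sketch in plain Python. The transaction records, the field names, and the two derived features (`amount_vs_typical`, `is_night`) are all assumptions made up for this example; they stand in for the kind of knowledge an SME might supply, not for any particular AutoML tool's behavior.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw transactions; every field name here is an assumption.
transactions = [
    {"customer": "a", "amount": 20.0, "time": "2019-04-01T10:15:00"},
    {"customer": "a", "amount": 25.0, "time": "2019-04-02T11:30:00"},
    {"customer": "a", "amount": 400.0, "time": "2019-04-03T03:05:00"},
    {"customer": "b", "amount": 300.0, "time": "2019-04-01T14:00:00"},
    {"customer": "b", "amount": 310.0, "time": "2019-04-02T15:45:00"},
]


def add_expert_features(rows):
    """Return copies of the rows with two hand-designed fields added."""
    # Collect each customer's amounts to estimate their typical spend.
    amounts = {}
    for row in rows:
        amounts.setdefault(row["customer"], []).append(row["amount"])
    typical = {customer: mean(values) for customer, values in amounts.items()}

    enriched = []
    for row in rows:
        hour = datetime.fromisoformat(row["time"]).hour
        enriched.append({
            **row,
            # How large is this payment relative to the customer's average?
            "amount_vs_typical": row["amount"] / typical[row["customer"]],
            # Assumed rule of thumb: activity before 06:00 counts as night.
            "is_night": hour < 6,
        })
    return enriched


features = add_expert_features(transactions)
```

The same 300-unit payment looks ordinary for customer `b` and would look unusual for customer `a`, which is the kind of context a raw `amount` column does not carry. For simplicity the average includes the transaction being scored; a real pipeline would normally use only earlier history.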
Many data problems begin with considerable mental effort in selecting which data to present to the algorithm. Passing all the data you have to the system can result in a poorly fitting model, because the data usually contains many different, often contradictory signals that must be targeted and modeled individually.
This is especially true of fraud detection, where different geographic regions, payment channels and so on exhibit very different types of fraud. Detecting these patterns and designing appropriate data sets to ensure accurate detection is still largely manual work. Applying a general-purpose automated approach to this problem is currently impossible because of the enormous complexity of the task.
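The point about contradictory signals can be sketched with a toy example. The records, region names, and channel names below are invented for illustration; the sketch only shows how a pooled statistic can hide differences that become visible once the data is split into segments that could then be modeled individually.

```python
from collections import defaultdict

# Hypothetical labelled records: (region, channel, is_fraud).
records = [
    ("EU", "card_present", 0), ("EU", "card_present", 0),
    ("EU", "online", 1), ("EU", "online", 0),
    ("US", "online", 1), ("US", "online", 1),
    ("US", "card_present", 0), ("US", "card_present", 0),
]


def split_by_segment(rows):
    """Group fraud labels by (region, channel) so each segment can be modeled separately."""
    segments = defaultdict(list)
    for region, channel, is_fraud in rows:
        segments[(region, channel)].append(is_fraud)
    return dict(segments)


def fraud_rate(labels):
    """Share of records in the list that are labelled as fraud."""
    return sum(labels) / len(labels)


segments = split_by_segment(records)
overall = fraud_rate([is_fraud for _, _, is_fraud in records])
per_segment = {key: fraud_rate(labels) for key, labels in segments.items()}
```

In this made-up data the pooled fraud rate is moderate, while individual segments range from no fraud at all to nothing but fraud. Deciding which splits are meaningful is the part that still relies on a person who knows the domain.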
The next stage is model generation. Models with different configurations are created and trained using the data from the previous stage. This matters because it is almost impossible to use the default configuration for every problem and still get the best results.
At this stage, AutoML systems have an advantage over data scientists, as they can create a huge number of test models in a very short time. However, most AutoML systems aim to be general-purpose and produce only deep neural networks, which may be overkill for many tasks where a simpler model, such as logistic regression or a decision tree, would be more suitable and would still benefit from hyperparameter optimization.
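To make the idea of generating models with different configurations concrete, here is a minimal sketch of a grid search over a simple model: a one-feature logistic regression written in plain Python. The toy data, the grid values, and the helper names (`fit_logistic`, `accuracy`) are assumptions for illustration only; real AutoML systems search far larger spaces with more sophisticated strategies.

```python
import math
from itertools import product

# Toy one-feature data (made up for illustration): larger x tends to mean label 1.
train = [(0.1, 0), (0.4, 0), (0.5, 0), (1.2, 1), (1.5, 1), (1.9, 1)]
valid = [(0.2, 0), (0.6, 0), (1.3, 1), (1.8, 1)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(data, learning_rate, epochs):
    """Train weight w and bias b with plain stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            error = sigmoid(w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


def accuracy(params, data):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    w, b = params
    hits = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


# Hypothetical hyperparameter grid: every combination yields one candidate model.
grid = {"learning_rate": [0.01, 0.1, 1.0], "epochs": [10, 100]}
results = {}
for learning_rate, epochs in product(grid["learning_rate"], grid["epochs"]):
    params = fit_logistic(train, learning_rate, epochs)
    results[(learning_rate, epochs)] = accuracy(params, valid)

best_config = max(results, key=results.get)
```

Each configuration is scored on held-out validation data rather than on the training set, so the comparison reflects how the candidates generalize. Even a model this simple behaves differently across the grid, which is why tuning is worthwhile beyond deep neural networks.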
The final stage is mass performance testing and selection of the best performer. This stage still requires some manual work, not least because it is imperative that the user selects the right model for the task. A fraud risk model that catches 100% of fraud cases but flags every authorization as suspicious is useless.
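The trade-off described above can be sketched as a selection rule. The labels, the three candidate models, and the false positive budget `max_fpr` below are all hypothetical; the sketch assumes the business has decided how many genuine authorizations it can tolerate being flagged, and picks the candidate with the highest recall inside that budget.

```python
# Hypothetical ground truth: 1 = fraud, 0 = genuine authorization.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Hypothetical predictions from three candidate models.
candidates = {
    "flag_everything": [1] * 10,
    "balanced": [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    "flag_nothing": [0] * 10,
}


def recall_and_false_positive_rate(y_true, y_pred):
    """Return (share of fraud caught, share of genuine cases wrongly flagged)."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    false_pos = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return true_pos / positives, false_pos / negatives


def select_model(models, y_true, max_fpr):
    """Pick the highest-recall model whose false positive rate fits the budget."""
    best_name, best_recall = None, -1.0
    for name, predictions in models.items():
        recall, fpr = recall_and_false_positive_rate(y_true, predictions)
        if fpr <= max_fpr and recall > best_recall:
            best_name, best_recall = name, recall
    return best_name


# Assumed budget: at most 20% of genuine authorizations may be flagged.
chosen = select_model(candidates, labels, max_fpr=0.2)
```

Ranking on recall alone would not separate `flag_everything` from `balanced`, since both catch every fraud case in this toy data. The budget is what rules out the model that questions every authorization, and setting that budget is a business decision rather than something the tool can infer from the data.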
In the current manual process, data scientists work with SMEs to understand the data and develop effective descriptive features. This important link between the SME and the data scientist is missing from general-purpose AutoML. As described earlier, the process attempts to generate these features automatically from what the tool can detect in the data, which may be inappropriate and lead to ineffective models. Future AutoML systems must be designed with this and other constraints in mind in order to create high-quality models that meet the standards set by specialists.
AutoML continues to evolve, and the major current AutoML vendors (Google and Microsoft) have made significant improvements. These developments have focused primarily on increasing the speed at which off-the-shelf models are generated, rather than on improving the technology to solve more complex problems (for example, fraud detection and network intrusion detection), where AutoML could go further than a data expert can.
As AutoML solutions continue to evolve and expand, more complex manual processes can be automated. Modern AutoML systems work very well with images and speech, because the domain knowledge needed for these tasks is already built into them. Future AutoML systems will give business users the opportunity to enter their own knowledge to help the machine automatically create very accurate models.
On top of that, complex data pipelines will become more and more streamlined, and adding a wider variety of algorithms to optimize over will further expand the range of problems that citizen data scientists can solve.
Although many data processing tasks will become automated, this will free data scientists to perform custom tasks for the business, further stimulating innovation and allowing businesses to focus on more important areas of revenue generation and growth.
Source: https://habr.com/ru/post/449260/