Good day, Habrahabr! Today Alexander Dmitriev, a business consultant at the IBM Client Center in Moscow, explains what the Watson cognitive system is and how it works. He also answers questions that readers raised after reading other articles on this topic.
Alexander, readers on Habr regularly ask questions that boil down to one: "What is the IBM Watson cognitive system, and how does it work?" Please help us answer it.
Hello. First of all, Watson as a whole is a large set of software packages that use a wide variety of algorithms. Some of these packages are available in the cloud, and some are intended for on-premises deployment. IBM has brought together a variety of analytical modules and built a system that can handle truly huge amounts of data. The system works with both numerical and textual information in various languages, including Russian, which is supported at a level that is still basic but already fairly deep.
When information is processed, relationships and correlations are established between a variety of data, events and facts. One of the system's main tasks is to identify links that are invisible to the naked eye and that are impossible or difficult to detect using standard methods. For example, if an enterprise has a monitoring system that reports thousands of changing parameters per minute, no analyst can analyze that information promptly. The Watson toolkit can. That is IBM Watson as a whole.

I have seen questions about whether Watson is an artificial intelligence. The answer is no, it is not artificial intelligence. It is a kind of amplifier of natural human intelligence: it lets people process information faster, cover large amounts of data and find what the human eye misses.
Stanisław Lem wrote about this in his book "Summa Technologiae": "Man cannot directly compete with Nature: it is too complex for him to take on alone. Figuratively speaking, a person must build between himself and Nature a whole chain of links, in which each successive link will be more powerful than the previous one as an amplifier of the Mind."
How is this done, and why?
There is a top-level analytics system based on Watson, and there are lower-level analytics systems. The latter are essentially search and analytics systems with a particular specialization. They solve applied problems. How does it work? We take a large amount of information on a particular subject in common file formats such as xls and csv and upload it to the cloud. The Watson Analytics system then starts analyzing this information and finds correlations on its own, with minimal operator involvement. This is a small but very important difference from other systems: it is not merely a search over previously loaded data. I emphasize that the system itself analyzes the uploaded information.
What do you mean by "on its own"?
The system is set up so that it looks through all the uploaded data and cleans it up, pointing out technical problems such as format mismatches, stray spaces and missing values. A person discards the outliers that are either errors or grounds for separate consideration, and selects a processing method. The system then analyzes the data, looks for correlations, finds the strongest ones and shows the operator several hypotheses with correlations of, say, 0.3 to 0.8.
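The clean-then-correlate workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Watson Analytics is implemented: the function names, the column names and the sample data are all hypothetical, the Pearson coefficient is used as a stand-in for whatever measures the real system applies, and the 0.3 to 0.8 band is simply the range mentioned in the interview.

```python
import csv
import io
import math
from itertools import combinations

# Hypothetical sample upload: two rows are deliberately broken.
SAMPLE = """load,vibration,power
1,3,2
2,1,4
3,,6
3,2,6
4,5,8
oops,9,9
5,4,10
"""

def load_clean_rows(text):
    """Parse CSV text; keep fully numeric rows, report the line numbers of the rest."""
    reader = csv.DictReader(io.StringIO(text))
    rows, rejected = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({k.strip(): float(v.strip()) for k, v in row.items()})
        except (ValueError, AttributeError):
            # ValueError: empty or non-numeric cell; AttributeError: missing cell
            rejected.append(line_no)
    return rows, rejected

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(vx * vy)

def candidate_correlations(rows, low=0.3, high=0.8):
    """Return column pairs whose |r| falls inside [low, high], strongest first."""
    columns = list(rows[0].keys())
    found = []
    for a, b in combinations(columns, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        if low <= abs(r) <= high:
            found.append((a, b, round(r, 3)))
    return sorted(found, key=lambda item: abs(item[2]), reverse=True)

if __name__ == "__main__":
    rows, rejected = load_clean_rows(SAMPLE)
    print("rejected lines:", rejected)
    for a, b, r in candidate_correlations(rows):
        print(f"hypothesis: {a} ~ {b} (r = {r})")
```

In this sketch the broken rows are only reported, not repaired, which mirrors the division of labor described in the interview: the system points at the problems, and a person decides what to do with them.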
I would loosely call these lower-level tasks. They are designed to simplify and speed up the analyst's work; the system automates the routine operations itself. That is Watson as a system for finding correlations in large data sets. What would take six analysts about a week, the IBM Watson Analytics system does via the cloud in about two hours.
How difficult is it to work with Watson?
I once ran an experiment: I sat people who were reasonably well versed in statistics in front of the system. They were seeing the interface for the first time. After an hour and a half, they were working with it independently and very actively.
Upper-level systems are large, and implementing them takes considerable time (six months or more). Their principle of operation builds on the idea from a Gartner study which states that by 2030 the overall qualification of specialists in most industrial sectors will decrease significantly. One of the reasons can be explained. A specialist who is accustomed to constantly consulting reference information (and whose job requires it) no longer considers it necessary to remember everything that "old school" experts remembered (from the height of the stratosphere to the boiling point of copper). The new generation readily turns to the Internet as a reference and does not keep all the necessary knowledge in their heads. As a result, the specialist becomes, to a certain extent, dependent on machine systems, and the overall level of qualification decreases accordingly. That is the first point.

Second, why do we need such complex systems? Oil producers and many other corporations have a huge problem: passing information on "from generation to generation". For example, a previous team of employees has accumulated an archive of very valuable technical information. The problem is that no one is able to read it, because that takes a huge amount of time. It would take a specialist several years to get through this information, even reading for days on end without food or rest.
So training new employees is a costly problem. Staff turnover at large enterprises can amount to hundreds or thousands of specialists a year. A person leaves, and invaluable experience and knowledge leave with them. How can that experience be passed on? Through the records? We discussed those above.
It turns out that at the level of transnational corporations, with their huge volumes of data, extensive nomenclature and hundreds of thousands of staff, a system is needed that accumulates data in a particular subject area. Ideally, such a system can be used not just as a reference book, but as a reference book that gives advice.
What is the task of the top-level Watson system?
The general Watson toolkit includes a huge variety of analytical packages. From these, the packages needed to process the information by a specific method are selected and assembled. After that, data of any type is loaded into the system: numerical data, meeting minutes, business correspondence and negotiations, communications, contacts, prices, equipment nomenclature, oil industry textbooks, reports for various periods. This may take more than a year, but the result is the corporation's core knowledge base, which can then be actively used.
After that, algorithms are configured to analyze the information, break it down, and extract and build a tree of relevant topics. For the oil industry, these are equipment adjustment, reservoir development, statistics, technology trends and so on. All of this is organized by topic, and a hint system is created. Systems of this kind, developed by IBM, are already operating in a number of corporations, including the Australian company Woodside Petroleum.

This can be illustrated with an example. The chief engineer at an enterprise assigns the task of drilling a well in a reservoir for which relevant data exists. The person given the task queries the system in natural language: "What needs to be done to drill a well in such a formation to such a depth?" The system gives an answer that works as a hint for a specific oil industry problem. Configured for the oil industry, it selects documentation, draws conclusions and "says": this is what was done before, these were the problems encountered, and this is how they were solved. That is the Watson system: it suggests what a person needs to do in a particular case and acts as an assistant.
Can the Watson system work as an oncologist, a meteorologist or someone else?
As an advisor or assistant, yes, of course. IBM Watson is a general-purpose family of products that can be applied in any field. But in each case the system has to be configured to solve specific problems.
In oncology, this means creating a database for a specific disease, for example lung cancer. A huge amount of data is loaded into the system, including anonymized patient case histories. The doctor then asks a question about how to treat a particular patient, and the system gives an answer that takes the person's individual characteristics into account. Watson does not take over the doctor's functions: it is still the doctor who makes the diagnosis and prescribes treatment. But it helps to personalize the treatment, clean errors out of the relevant data and select the best treatment options for this particular patient at this moment.
It is important that the system also checks all data for reliability and errors, since medical data can contain errors too. The problem for doctors (and not only doctors, but modern specialists in general) is that they do not have time to learn everything new. That is not their fault. If there is a lot of work, and a qualified specialist always has plenty, there is simply not enough time for training. As a result, doctors often do not use the most modern methods. The Watson system can offer a new method of treatment, or even several, each with a stated probability of curing the patient and a stated degree of risk to the patient's health. After consulting with the patient or the patient's relatives, the doctor can make a decision based on this data. It is worth emphasizing once again that responsibility lies with the doctor, because the system's answers are advisory. The system helps the physician by providing the most up-to-date information on which methods are suitable for a particular patient.
How does IBM Watson work with natural language? Can the system understand the context of a literary work?
Definitely yes. But the question is, what for? And another question: who needs it, and who pays for it? When a literary work is processed, the text has to be considered in connection with the historical context of the work itself. The system can understand anything if it is given such a task, including the works of O. Henry, whose best translations are by Korney Chukovsky. It must be said that systems that work with language are also configured and trained. In the simplest case, this is plain parsing, that is, stripping unnecessary information from the text. For Watson, it is first of all a matter of building dictionaries for different languages. In any case, the system should be trained with a specific task in mind.
I personally took part in an emotion analysis project. Today, Watson captures the emotional tone of a text. For example, it has learned to detect irony. Here again, it generally comes down to identifying correlations. Irony itself was invented by the ancient Greeks, and it seems that any person recognizes it by certain specific features. If a machine is taught to pick up on these features, it will also learn to identify irony.
I repeat: the capabilities of the system are determined by the relevance of the problem being solved. It is mostly large companies that need IBM Watson's help, and detecting irony in their employees' reports is hardly their first priority (although this probably happens). But if needed, we configure the system for them so that it can determine how users and buyers feel about the companies' brands and products.

For example, in Spain more than two years ago, a large project was carried out to assess users' attitudes toward a brand. The customer was a large company that asked us to analyze attitudes toward it across various sources, including social networks, newspapers and magazines. This was done successfully. In the course of such work, we isolate and analyze false data related to counterfeits, which cast a shadow on the reputation of the original brand. The system is now used by world-famous brands; the project is very successful and helps increase sales efficiency.
In general, Watson solves specific problems. The system can do a great deal; what exactly it does depends on how the task is formulated and what it is aimed at.
A question about the limits of the system's capabilities. Take the same O. Henry as an example: is it possible to set Watson up for literary translation of this author's works, and how long would it take? Suppose a publisher needs this and is willing to pay for it.
The answer is: definitely possible. But I cannot say exactly how long it would take. It is a matter of effort and investment.
Any specialized Watson system, be it a "medic", a "financial analyst" or an "engineer", requires the participation of specialists. In this case, I would recruit teams of the best linguists in the theory and practice of language. Some of the teams would work on dictionaries and idioms, and look for data in the texts and for correlations between Russian and English. Why? A single word in any language can mean many things. Such words would be included in the dictionaries with the widest possible range of their meanings.
After that, the second problem needs to be tackled: loading into the database the texts of the O. Henry translations that are considered the best and most successful. Watson will then use correlation estimates, picking the highest values, to search for suitable words. The system will choose translation options at various levels, from simple to complex (words, phrases, sentences and so on). During this process, expert groups will be needed to train the system further. They will correct and fine-tune the translations so that after such training the Watson system starts producing really good ones. This is how it works: the first translation will not be very good, the second will be better, and after that they will be very good.
A big advantage of Watson is that the system can be tuned through feedback. Without feedback, a system simply loses control. Feedback allows the system to clarify and adjust its main goal dynamically, in the course of its work. In our case, feedback is provided by subject matter experts. If this is an oil company like Woodside, the best experts will mark the system's best, most successful answers, and the system will remember them, gradually improving the quality of its recommendations. This gives Watson another advantage: while most systems become obsolete over time and require rework, this one only gains experience over time and becomes even more powerful.
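The expert feedback loop described above can be illustrated with a deliberately simple sketch. It is not Watson's actual training mechanism, which the interview does not specify: the `FeedbackRanker` class, the additive score adjustment and the `learning_rate` value are all assumptions made for the example. The idea is only that answers marked as successful by experts gradually move up in the ranking.

```python
from collections import defaultdict

class FeedbackRanker:
    """Toy ranker (hypothetical): base relevance plus a bonus learned from expert marks."""

    def __init__(self, learning_rate=0.3):
        self.learning_rate = learning_rate
        self.bonus = defaultdict(float)  # learned adjustment per answer id

    def rank(self, candidates):
        """candidates maps answer id -> base relevance; returns ids, best first."""
        return sorted(
            candidates,
            key=lambda answer_id: candidates[answer_id] + self.bonus[answer_id],
            reverse=True,
        )

    def record_feedback(self, answer_id, approved):
        """An expert approves or rejects an answer; the system remembers the mark."""
        step = self.learning_rate if approved else -self.learning_rate
        self.bonus[answer_id] += step

if __name__ == "__main__":
    # Hypothetical candidate answers with made-up base relevance scores.
    candidates = {"reuse-old-plan": 1.0, "new-casing-method": 0.8, "generic-manual": 0.5}
    ranker = FeedbackRanker()
    print("before feedback:", ranker.rank(candidates))
    ranker.record_feedback("new-casing-method", approved=True)
    ranker.record_feedback("reuse-old-plan", approved=False)
    print("after feedback: ", ranker.rank(candidates))
```

A real system would weigh feedback by expert, topic and context rather than apply a flat bonus, but even this toy version shows the property the interview emphasizes: the ranking changes only through accumulated feedback, so the system improves with use instead of becoming obsolete.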
Another question: are there tasks that Watson cannot solve now under any circumstances?
There is a very important aspect here, the ethical one. Some problems are unsolvable because the issues involved lie beyond the scope of technical systems. Take the self-driving car as an example. Roughly speaking, whom or what should the car hit if a collision cannot be avoided but there is a choice between, say, a wall, an elderly person and a pregnant woman? A human driver will still make a choice. A machine cannot, because this question has not yet been resolved either legally or ethically, so it is simply not yet possible to build this knowledge and these rules of conduct for extreme situations into the car. This is the first class of tasks that cannot yet be solved, because a number of ethical, legal, social and other problems associated with them remain unresolved.

The second class consists of extremely complex technical tasks that require a huge amount of resources. To find out whether such a problem has a solution, you must at least try to solve it. The O. Henry texts are an example. Nobody has done this yet. If you try, it will probably work out, but we cannot say for sure now.
Summing up, I am confident that virtually any task can be solved. If it seems now that a question cannot be answered, after a while someone comes up with an idea that opens up unprecedented opportunities. For example, at one time it was believed that the composition of the stars could not be determined, now or ever. But the spectrograph was invented soon afterwards, and it immediately began to reveal which elements stars contain; helium, for instance, was first detected in the spectrum of the Sun. Over time, people learned to determine the composition of stars very accurately.
Solutions that radically change our view of the world appear regularly. It is difficult to set the boundaries of what is possible and, to be honest, I would not even try.
How do you see IBM Watson in the future?
As mentioned above, Watson in general can be described as a system that helps a person make decisions under great uncertainty. I believe that, like all other systems, it will become significantly cheaper and more universal, and will find use in other areas.
I think it will be a universal hint system that can answer a wide range of questions and that will become a familiar part of our lives. Moreover, it will not respond the way modern search engines do, by simply providing links to sources on the Internet, but will give recommendations, indicating the source of the information and the methods used.