As The Huffington Post writes, 80% of electronic medical records are stored in unstructured form, the so-called "text bubble". And it is not just EHR data that is stored as text: so is a large amount of other medical information, including scientific articles, clinical guidelines, and descriptions of diseases and complaints. Even where such data is partially structured, there are no generally accepted formats for storing it.
Extracting useful knowledge from the "text bubble" is difficult. The simplest algorithms can check a document for the occurrence of certain words or phrases, but this is not enough: details always matter to the doctor. It is not enough to know that the patient has a fever; the doctor also needs to understand the dynamics, for example, "the temperature rises to 39 in the evenings and has lasted four days".
Natural language processing (NLP) technology is capable of extracting valuable information from medical texts and electronic health records. Below we describe how NLP technologies simplify the work of doctors: we will talk about recognizing speech and texts filled with medical terms, and about helping to make clinical decisions.
What is NLP
In fact, the history of NLP stretches back to the earliest days of modern artificial intelligence research. Alan Turing, in his paper "Computing Machinery and Intelligence", named the ability to communicate with a person as the criterion of a machine's "intelligence". Today this is an important, but by no means the only, task that developers of NLP systems solve.
NLP combines a number of technologies (some of them very distant from one another mathematically) that solve algorithmic problems involving natural human language:
- Extracting facts from text (from a simple stop-word search to full syntactic analysis);
- Speech recognition and speech-to-text conversion;
- Text classification;
- Text and speech generation;
- Machine translation;
- Sentiment analysis (opinion mining);
- And so on.
In science fiction, a single supercomputer can often do all of the above. In the cult film "2001: A Space Odyssey", HAL 9000 recognized human speech and visual images and conversed in ordinary language. In practice, all these tasks are highly specialized and are solved by separate algorithms.
And these algorithms (and the technologies underlying them) are constantly progressing. For example, the NLP area "closest" to ordinary users, speech recognition, was until a few years ago based on hidden Markov models. These broke human speech into small components, isolated phonemes, performed statistical analysis, and returned the most likely transcription of what was said. Today developers much more often use neural networks, in particular recurrent neural networks and their variants such as long short-term memory (LSTM).
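The older HMM pipeline can be illustrated with Viterbi decoding: given per-frame acoustic scores, pick the most likely phoneme sequence. Here is a toy sketch; the phonemes, quantized frames, and all probabilities are invented for illustration:

```python
# Toy Viterbi decoder: given per-frame acoustic observations, find the
# most likely phoneme sequence under a simple HMM.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                 V[t - 1][prev][1] + [s])
                for prev in states
            )
            V[t][s] = (prob, path)
    return max(V[-1].values())

states = ("k", "ae", "t")                      # phonemes for "cat"
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k":  {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.3, "t": 0.6},
    "t":  {"k": 0.1, "ae": 0.1, "t": 0.8},
}
# emission probabilities of three quantized acoustic frames
emit_p = {
    "k":  {"f1": 0.7, "f2": 0.2, "f3": 0.1},
    "ae": {"f1": 0.1, "f2": 0.7, "f3": 0.2},
    "t":  {"f1": 0.1, "f2": 0.1, "f3": 0.8},
}

prob, path = viterbi(["f1", "f2", "f3"], states, start_p, trans_p, emit_p)
print(path)  # -> ['k', 'ae', 't']
```

Real recognizers work with continuous acoustic features and vastly larger state spaces, but the decoding idea is the same: the most probable path through the model wins.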
Today, NLP systems are used more and more often: we talk to Siri, communicate with the Google Assistant (Android OS uses an LSTM with CTC) and with car infotainment systems, smart algorithms protect our mail from spam, news aggregators select articles that interest us, and search engines find the information we need for any query.
What tasks NLP solves in medicine
However, NLP systems are useful not only in modern gadgets and online applications. They have been deployed in individual hospitals and medical universities since the early 1990s.
The first NLP application, developed at the University of Utah at the time, was the Special Purpose Radiology Understanding System (SPRUS) for a Salt Lake City clinic. The tool used information from an expert system that matched symptoms to corresponding diagnoses, and parsed radiological text reports (medical protocols interpreting X-ray images).
The program used a semantic parsing technique based on looking up words in a thesaurus. The thesaurus was automatically updated from the diagnostic knowledge base using a specially developed compiler.
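A thesaurus lookup of this kind can be sketched in a few lines. The entries and concept labels below are invented for illustration, not taken from SPRUS:

```python
# Sketch of thesaurus-based semantic tagging: each word of a radiology
# report is looked up in a thesaurus mapping surface forms to concepts.

THESAURUS = {
    "opacity":      "FINDING:pulmonary_opacity",
    "infiltrate":   "FINDING:pulmonary_opacity",   # synonym, same concept
    "cardiomegaly": "FINDING:enlarged_heart",
    "lung":         "ANATOMY:lung",
    "left":         "LOCATION:left",
}

def tag_report(text):
    """Return the list of concepts found in a free-text report."""
    concepts = []
    for word in text.lower().replace(",", " ").replace(".", " ").split():
        if word in THESAURUS:
            concepts.append(THESAURUS[word])
    return concepts

# Note: naive lookup also tags the NEGATED "cardiomegaly" as a finding.
print(tag_report("Left lung infiltrate, no cardiomegaly."))
```

The comment in the last line hints at why pure dictionary lookup is not enough: negation and context are invisible to it, which is exactly the limitation later systems had to overcome.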
Since then, the capabilities of NLP and machine learning in medicine have leaped forward: today the technology simplifies doctors' work with electronic medical records and reduces the frequency of clinical errors by "assisting" in medical decision-making.
Simplifying work with electronic health records (EHRs)
Electronic health records, or EHRs, are the electronic counterparts of the paper charts we are used to. The purpose of an EHR is to simplify document flow and reduce the volume of paperwork. We described in more detail what an EHR is and how it helps control the quality of medical care in one of our past materials.
Even though the introduction of EHRs has made it easier for doctors to work with documents, filling out the records still takes time. According to a study published in Computers Informatics Nursing in 2012, nurses in US hospitals still spend about 19% of their working time filling out electronic records. Yes, this is only a fifth of the working day, but even this figure can be reduced, and the freed-up resources can be devoted to patient care.
According to Nuance Communications president Joe Petro, NLP technology will make this possible. In 2009, Nuance surveyed thousands of US physicians about natural language processing technologies. According to the study, 94% of the doctors interviewed called the introduction of EHRs with NLP an important driver of care quality.
One example of this approach in practice is a service used by the medical staff of the Hudson Valley Heart Center in Poughkeepsie. Using a solution from Nuance Communications, the hospital's nurses dictate discharge summaries from a patient's medical history, record the results of physical examinations, and log data on the course of the disease. The application automatically updates the records in the hospital's EHR system.
Similar solutions are being introduced in Russia. For example, in 2016 the Speech Technology Center began developing Voice2Med, a system for recognizing medical speech and reducing the time needed to fill out reports and medical records. According to the Russian Ministry of Labor and Social Protection, this paperwork currently takes up half of a doctor's working time.
Our NLP solutions
The key task of NLP in medicine is extracting data from text, and this is our focus at DOC+. Our machine learning team employs six people, two of whom work exclusively on NLP. At DOC+, NLP technology is used to label the records on which our EQM quality control system is trained (we wrote about it in a previous article).
The same system underlies our anamnesis bot, which streamlines online consultations. The bot works online and asks the patient to describe their complaints in free form, then extracts the symptoms from the text and reports them to the doctor. Thanks to this, the specialist begins a telemedicine consultation already prepared (we will say more about our anamnesis bot in future posts).
Challenges of developing NLP systems
Developing such systems involves several difficulties. The first is that simple, widely used algorithms and approaches are not enough when working with medical texts. Services that scan text for the presence of certain words and count their frequency to assess "importance" give very limited results in medicine.
When making a diagnosis, it is important for the doctor not only to know that a person had a particular symptom, but also to understand its dynamics and parameters: localization, type of pain, exact values of measured indicators, and so on. Working with medical texts therefore requires more complex algorithms that extract not just words but complete facts about various complaints and symptoms.
From the text "On February 18 my head ached on the left side; in the evening the temperature rose to 39. The next day the area of the headache increased; there was no dizziness", the system should extract structured information about three symptoms:
- Headache: appeared 18.02; localization: left; dynamics: 19.02, increase in area.
- Temperature: 18.02; value: 39 degrees.
- Dizziness: symptom absent.
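A minimal rule-based sketch of this kind of extraction might look as follows. The patterns and attribute names are our own illustration, not the production system:

```python
import re

# Toy structured symptom extraction: pull out presence, localization,
# measured values, and negation from a free-text complaint.

TEXT = ("On February 18 my head ached on the left side; in the evening "
        "the temperature rose to 39. The next day there was no dizziness.")

def extract(text):
    facts = {}
    if re.search(r"head\s*ached|headache", text, re.I):
        m = re.search(r"on the (left|right) side", text, re.I)
        facts["headache"] = {"present": True,
                             "localization": m.group(1).lower() if m else None}
    m = re.search(r"temperature rose to (\d+(?:\.\d+)?)", text, re.I)
    if m:
        facts["temperature"] = {"present": True, "value": float(m.group(1))}
    if re.search(r"no dizziness", text, re.I):
        facts["dizziness"] = {"present": False}
    return facts

print(extract(TEXT))
```

A real system needs far richer grammars (dates, dynamics over time, arbitrary symptom vocabularies), but the output shape, a structured fact per symptom, is the essential point.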
The second challenge is that text processing tools need additional customization to work with highly specialized material. For example, we had to tune the spell checking system, because none of the solutions on the market met our requirements. Spell checkers corrected the word "cough" to "drops" because they had been trained on texts without medical terminology, so we retrained the system on a corpus of medical articles. Such small refinements to classical algorithms have to be made all the time.
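A simple frequency-based corrector in the classical Norvig style shows why the training corpus matters: whatever words dominate the corpus win the correction. The tiny "medical corpus" below is invented:

```python
from collections import Counter

# Minimal Norvig-style spelling corrector. Training the frequency table
# on medical text keeps medical terms in the vocabulary; a general
# corpus would steer corrections toward everyday words instead.
MEDICAL_CORPUS = ("the patient presents with a dry cough and fever "
                  "cough worsens at night sputum is transparent "
                  "no dyspnea cough persists for four days")
WORDS = Counter(MEDICAL_CORPUS.split())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away: deletes, swaps, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    """Pick the known candidate with the highest corpus frequency."""
    candidates = ({word} if word in WORDS else
                  {w for w in edits1(word) if w in WORDS} or {word})
    return max(candidates, key=WORDS.get)

print(correct("caugh"))  # -> cough
```

Swapping `MEDICAL_CORPUS` for general-purpose text is all it takes to reproduce the "cough becomes something else" failure mode described above.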
What our NLP system can do
The solution we have developed currently recognizes 400 terms: symptoms, diagnoses, drug names, and so on. For most symptoms, the system can extract additional properties: localization (abdominal pain to the right of the navel), type (wet cough), color (transparent sputum), the presence of complications, and the values of measurable parameters (temperature, pressure). In addition, it can extract temporal parameters and match them to symptoms, correct typos, and handle different ways of describing the same facts.
Clinical decision support (CDS)
Clinical decision support (CDS) systems provide automated assistance to doctors in making a diagnosis, prescribing treatment, determining drug dosages, and so on. NLP systems supply the necessary medical information, drawing it from scientific papers, test results, medical reference books, and even the words of the patient themselves.
One such solution was developed by IBM: the question-answering system DeepQA, which powers the IBM Watson supercomputer. Here Watson acts as an "NLP search engine" over large databases: it processes doctors' questions and gives them specific answers rather than simply returning internet search results. Watson's technology allowed it to win at Jeopardy! (the American progenitor of the Russian quiz show "Svoya Igra").
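The retrieval step that any such question-answering system needs can be sketched with a toy TF-IDF ranker. The document snippets and scoring below are invented placeholders, not how DeepQA actually works:

```python
import math
from collections import Counter

# Toy retrieval: score reference documents against a doctor's question
# with TF-IDF and return the best match. Snippets are invented.
DOCS = {
    "flu":      "influenza presents with fever chills cough and myalgia",
    "angina":   "angina presents with chest pain radiating to the left arm",
    "appendix": "appendicitis presents with right lower abdominal pain",
}

def tokenize(text):
    return text.lower().split()

def tfidf_score(query, doc, all_docs):
    doc_tokens = Counter(tokenize(doc))
    n = len(all_docs)
    score = 0.0
    for term in tokenize(query):
        # df: number of documents containing the term
        df = sum(1 for d in all_docs.values() if term in tokenize(d))
        if df and term in doc_tokens:
            score += doc_tokens[term] * math.log(n / df)
    return score

def best_doc(query):
    return max(DOCS, key=lambda name: tfidf_score(query, DOCS[name], DOCS))

print(best_doc("patient has chest pain in the left arm"))  # -> angina
```

Production systems add many layers on top of retrieval (candidate answer generation, evidence scoring, confidence estimation), but ranking sources by term overlap weighted by rarity is the common starting point.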
Another example of such technologies is the NLP system created by a team of scientists led by Dr. Harvey J. Murff at the Vanderbilt University Medical Center. The developers taught the algorithm to analyze patients' electronic records and identify conditions that could cause complications after surgery.
The NLP processor indexed entries in the medical records using a scheme based on SNOMED-CT, a systematized, machine-processable medical nomenclature. As output, the system generated an XML file with a "tagged" patient record. Experiments showed that the program correctly categorized most complications: renal failure, for example, was correctly flagged in 82% of cases, and postoperative myocardial infarction in 91% of cases.
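The tagging-and-XML-output step can be sketched as follows. The concept codes here are placeholders, not real SNOMED-CT identifiers:

```python
import xml.etree.ElementTree as ET

# Sketch of tagging a record with concept codes and serializing to XML,
# loosely in the spirit of the system described above.
CONCEPTS = {
    "renal failure":         "C-0001",   # placeholder codes
    "myocardial infarction": "C-0002",
}

def tag_record(text):
    """Build an XML document marking the concepts found in the text."""
    record = ET.Element("record")
    ET.SubElement(record, "text").text = text
    findings = ET.SubElement(record, "findings")
    for phrase, code in CONCEPTS.items():
        if phrase in text.lower():
            ET.SubElement(findings, "finding", code=code).text = phrase
    return ET.tostring(record, encoding="unicode")

print(tag_record("Postoperative course complicated by renal failure."))
```

The appeal of such an output format is that downstream tools (statistics, billing, complication monitoring) can consume the coded findings without ever touching the free text again.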
DOC+ also has its own analogue of a CDS: every action a doctor takes in the application is accompanied by prompts, though for now these are generated by classical rule-based algorithms without machine learning or NLP. But we are working on a new generation of CDS that will read the patient's entire medical history in natural language and use it to prompt the doctor.
Further development of NLP systems
NLP systems will make it possible to work not only with medical records but also with scientific articles and medical standards. Medicine has accumulated a vast body of experience, summarized in clinical guidelines, scientific papers, and other textual sources. It is logical to use this data to train artificial intelligence systems alongside real patient records, in parallel creating a structured knowledge base of medicine intended not for people but for algorithms.
The advantage of such NLP systems is that the results of their work are often easier to interpret, that is, to tie to specific sources. In general, the interpretability of machine learning results is far from a trivial question; it matters both to the scientific community as a whole (the leading international machine learning conference ICML regularly devotes a separate workshop to it) and to developers, especially in projects involving evidence-based medicine. For us, the interpretability requirement makes the task of improving our NLP system even more difficult (and more interesting).
NLP is a promising direction that will raise the quality of medical care to a new level. We plan to actively develop these technologies further and continue to talk about our developments in our blog.
Further reading: useful articles from our blog "Just Ask":