The neural network evaluates the emotional tone of a speaker's 30-second speech fragment. Illustration from the authors' previous scientific work.
In recent years, machine learning has increasingly been used as a diagnostic tool. Existing models can identify words and speech intonations that may indicate depression. However, these models usually work only when the patient answers specific questions from the doctor, for example about their mood, lifestyle, or medical history. In other words, the neural network in this setting does essentially the same job as an ordinary psychotherapist who talks to the patient.
For medicine, a far more useful tool would be a new generation of system that detects depression from arbitrary speech, without a fixed set of questions. In theory, such a system could automatically monitor the mental health of an entire population in real time (all voice traffic) and flag patients for prompt hospitalization. An automatic depression-detection module could also be embedded in mobile apps and games.
This model was developed by scientists at the Massachusetts Institute of Technology, according to MIT News. The paper will be presented at the Interspeech 2018 conference, which will be held September 2-6 in India.
“If you want to deploy models [for detecting depression] in a scalable way ... then you need to minimize the number of restrictions on the data you use. The model should extract information from any ordinary conversation and natural interaction between people,” said Tuka Alhanai, a researcher at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology and lead author of the paper.
The researchers hope the new method will be used to detect signs of depression in natural conversation. For example, the model could underpin mobile applications that monitor a user's text and voice for signs of mental disorders and send alerts. This would be especially useful for people who cannot see a doctor for an initial diagnosis, whether because no doctor is available, because consultations are expensive, or simply because they do not realize they have a mental health problem.
Depression is a serious and dangerous mental illness accompanied by low self-esteem and a loss of interest in life and everyday activities. In some cases, people suffering from it begin to abuse alcohol or other substances.
The key innovation of the technology is its ability to detect patterns that indicate depression and then apply those patterns to new people without any additional information, that is, without prior training on a particular person. “We call this work ‘context-free,’ because you are not imposing any restrictions on the types of questions you are looking for or on the type of answers to those questions,” explains Alhanai.
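Evaluating a model on people it was never trained on usually means splitting the data by speaker rather than by recording, so that nobody appears in both the training set and the test set. The article does not describe the authors' exact evaluation protocol; the Python sketch below only illustrates the general idea, and the function name `speaker_disjoint_split` and the `(speaker_id, features, label)` sample layout are hypothetical.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split (speaker_id, features, label) tuples so that every speaker's
    samples land entirely in either the training set or the test set.

    Hypothetical helper for illustration; not the authors' code."""
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample[0]].append(sample)

    # Shuffle speakers (not individual samples) reproducibly.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [s for spk in speakers if spk not in test_speakers for s in by_speaker[spk]]
    test = [s for spk in speakers if spk in test_speakers for s in by_speaker[spk]]
    return train, test


# Toy data: five speakers, two samples each (features and labels are placeholders).
samples = [(f"p{i}", [0.0], i % 2) for i in range(1, 6) for _ in range(2)]
train, test = speaker_disjoint_split(samples)
```

Shuffling whole speakers, rather than individual recordings, is what prevents the model from being scored on a voice it has already heard during training.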
The neural network was trained using sequence modeling, a technique often used in speech processing. The model is trained on sequences of text and audio data from the questions and answers of people with and without depression. As the sequences accumulate, it gradually picks up general patterns, for example how certain words tend to be paired with different-sounding speech in healthy and depressed people. People with depression may also speak more slowly and pause longer between words. Such textual and acoustic markers of depression have been documented in earlier research. Ultimately, the model itself decides whether the speech shows signs of depression.
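The Python sketch below illustrates, under loudly stated assumptions, the two ingredients described above: per-answer features that mix text and audio (speaking rate, mean pause between words, and the share of words from a small negative-word list), and a recurrent cell that reads the answers one after another and outputs a score. The word list `NEGATIVE_WORDS`, the feature choice, the single-unit Elman cell and its hand-picked weights are all hypothetical; the article does not specify the authors' architecture, and a real model would learn its weights from labeled interviews.

```python
import math

# Hypothetical lexicon, for illustration only.
NEGATIVE_WORDS = {"sad", "low", "down", "tired"}


def answer_features(words, word_times):
    """Features for one answer.

    words: list of transcribed words.
    word_times: list of (start_sec, end_sec) for each word.
    Returns [words_per_second, mean_pause_sec, negative_word_share]."""
    if not words:
        return [0.0, 0.0, 0.0]
    duration = word_times[-1][1] - word_times[0][0]
    rate = len(words) / duration if duration > 0 else 0.0
    # Gap between the end of one word and the start of the next.
    pauses = [max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(word_times, word_times[1:])]
    mean_pause = sum(pauses) / len(pauses) if pauses else 0.0
    negative_share = sum(w.lower() in NEGATIVE_WORDS for w in words) / len(words)
    return [rate, mean_pause, negative_share]


def sequence_score(answers, w_in, w_rec, w_out, bias):
    """Run a one-unit recurrent (Elman) cell over the answers in order and
    squash the final hidden state into a score between 0 and 1."""
    h = 0.0
    for x in answers:
        h = math.tanh(sum(w * xi for w, xi in zip(w_in, x)) + w_rec * h)
    return 1.0 / (1.0 + math.exp(-(w_out * h + bias)))


# Hand-picked, untrained weights: slower speech, longer pauses and more
# negative words push the score up.
W_IN, W_REC, W_OUT, BIAS = [-0.5, 1.0, 2.0], 0.5, 3.0, -1.0

brisk_answer = answer_features(["i", "am", "fine"], [(0.0, 0.3), (0.35, 0.6), (0.65, 1.0)])
slow_answer = answer_features(["i", "feel", "down"], [(0.0, 0.3), (0.8, 1.1), (1.9, 2.3)])
brisk_score = sequence_score([brisk_answer, brisk_answer], W_IN, W_REC, W_OUT, BIAS)
slow_score = sequence_score([slow_answer, slow_answer], W_IN, W_REC, W_OUT, BIAS)
print(brisk_score < slow_score)  # True for these toy inputs
```

The recurrent state `h` is what makes this a sequence model: each answer is interpreted in light of the answers that came before it, rather than scored in isolation.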
The model was tested on a dataset of 142 speech fragments from the Distress Analysis Interview Corpus (audio, text, and video). Its precision was 71% (that is, 29% of the people it flagged as depressed were false positives), and its recall was 83% (it detected 83% of all depressed patients in the sample). In most tests, the model outperformed all previous models for diagnosing depression. The researchers find these preliminary results very encouraging.
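The 71% figure, together with its 29% share of false positives, describes precision (the share of people flagged as depressed who actually were), while the 83% figure describes recall (the share of all depressed people that the model caught). The short Python sketch below shows how both are computed from counts of true positives, false positives and false negatives. The counts are invented for illustration and are not the study's confusion matrix.

```python
def precision(tp, fp):
    """Share of people flagged as depressed who really are depressed."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of all depressed people whom the model flagged."""
    return tp / (tp + fn)


# Invented counts for illustration only; not taken from the study.
tp, fp, fn = 10, 4, 2
p = precision(tp, fp)  # 10 / 14
r = recall(tp, fn)     # 10 / 12
# 1 - p is the share of flagged people who were false positives.
print(f"precision={p:.0%} false positives={1 - p:.0%} recall={r:.0%}")
# prints: precision=71% false positives=29% recall=83%
```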
In a previous scientific article from 2017, the authors described a neural network that recognizes a speaker's mood from the following features:
- voice characteristics;
- a set of words;
- pulse.
The illustration shows the distribution of emotional content in five-second intervals. Negative segments are those where signs of sadness, disgust, anger, fear, or boredom are found. Positive segments contain signs of happiness, interest or enthusiasm.
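Splitting a recording into fixed five-second windows and labeling each one can be sketched as follows. The article does not say how the 2017 network turned its detections into segment labels, so this sketch assumes a hypothetical list of timestamped emotion detections, counts negative and positive emotions per window (using the emotion groups listed above), and marks windows with no majority as neutral.

```python
import math

NEGATIVE = {"sadness", "disgust", "anger", "fear", "boredom"}
POSITIVE = {"happiness", "interest", "enthusiasm"}


def label_segments(detections, total_sec, window_sec=5.0):
    """detections: list of (time_sec, emotion) pairs (hypothetical input format).
    Returns one label per window: 'negative', 'positive' or 'neutral'."""
    n_windows = math.ceil(total_sec / window_sec)
    pos = [0] * n_windows
    neg = [0] * n_windows
    for t, emotion in detections:
        if not 0 <= t < total_sec:
            continue  # ignore detections outside the recording
        i = int(t // window_sec)
        if emotion in NEGATIVE:
            neg[i] += 1
        elif emotion in POSITIVE:
            pos[i] += 1
    labels = []
    for p, n in zip(pos, neg):
        if n > p:
            labels.append("negative")
        elif p > n:
            labels.append("positive")
        else:
            labels.append("neutral")  # tie or no emotional signal
    return labels


# A 30-second fragment yields six five-second windows.
detections = [(1.0, "happiness"), (6.5, "boredom"), (7.0, "sadness"),
              (12.0, "interest"), (14.9, "anger"), (27.0, "fear")]
print(label_segments(detections, total_sec=30))
# prints: ['positive', 'negative', 'neutral', 'neutral', 'neutral', 'negative']
```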
Beyond depression, the researchers plan to train the neural network to recognize other conditions, such as dementia.