
Scientists have created a neural network that recognizes "drunk" messages on Twitter

With the weekend here, it is worth remembering that alcohol and socializing do not always mix well, even for celebrities. Yet many of us repeat the experience again and again. This gave Nabil Hossain and his colleagues at the University of Rochester an interesting idea: they developed a neural network that can recognize Twitter posts written while intoxicated. The resulting model can also determine where the authors of the "drunken" posts were when they wrote them.
MIT Technology Review reported on the work.



To build their neural network, the University of Rochester researchers spent a full year collecting tweets and then kept those that mention alcohol or alcohol-related words, such as "drunk", "beer", "party", and so on. Analysis of about 11,000 posts helped establish whether the author of a message was the one drinking, and whether the tweet was written while the drinking was taking place. This is a fairly large set of tweets for machine learning.
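
The keyword filtering step can be illustrated with a minimal sketch. It is not the researchers' code: the keyword set contains only the three example words quoted above (the full vocabulary is not given here), and the function names are hypothetical.

```python
import re

# Only the example words quoted in the article; the study's real list is assumed to be longer.
ALCOHOL_KEYWORDS = {"drunk", "beer", "party"}


def mentions_alcohol(text: str) -> bool:
    """Return True if any whole word of the tweet is an alcohol keyword."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in ALCOHOL_KEYWORDS for word in words)


def filter_alcohol_tweets(tweets):
    """Keep only the tweets that mention an alcohol keyword."""
    return [tweet for tweet in tweets if mentions_alcohol(tweet)]


# Toy data, made up for illustration.
sample = [
    "Third beer of the night, feeling great",
    "Stuck in traffic again",
    "So DRUNK right now lol",
]
selected = filter_alcohol_tweets(sample)
print(selected)
```

A filter like this only finds candidate tweets. Deciding whether the author is actually drinking at that moment is the harder task, which is what the trained model is for.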

The researchers also set out to determine where users most often write "drunk" tweets.


To tell whether the author of a post was at home, the researchers analyzed the use of specific "home" vocabulary (for example, "sofa" or "bath"). Geolocation data was used whenever it was available. Deciding whether a user is at home or somewhere else requires knowing where home is, and additional algorithms exist for that. A typical approach takes the place from which a user sent their last message between 1:00 am and 6:00 am. Such methods have drawbacks, however, and their accuracy is low.

Hossain and colleagues developed a different approach. They compiled a list of words and phrases most likely to be sent from home, such as "Finally at home!", "in the bath", "on the couch", "in front of the TV" and so on. Tweets containing these phrases served as the initial data set for locating people's homes, and from it the neural network built its own models for recognizing when people are at home. The algorithm examined how a user's home location correlates with other indicators, such as the location of the last tweet of the day, the location from which the user tweets most often, the percentage of tweets sent from a specific location, and so on.
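
As a rough illustration, the three indicators named above could be computed per candidate location as follows. This is a sketch under assumptions, not the authors' implementation: the input format (timestamp plus an opaque location id), the feature names and the function name are all hypothetical, and the classifier that would consume these features is omitted.

```python
from collections import Counter
from datetime import datetime


def location_features(tweets):
    """Compute simple home-location indicators for one user.

    tweets: list of (datetime, location_id) pairs.
    Returns {location_id: {feature_name: value}}.
    """
    if not tweets:
        return {}

    counts = Counter(loc for _, loc in tweets)
    total = len(tweets)

    # Walking the tweets in time order leaves each day's last location in the dict.
    last_of_day = {}
    for ts, loc in sorted(tweets):
        last_of_day[ts.date()] = loc
    last_counts = Counter(last_of_day.values())
    n_days = len(last_of_day)

    top_location = counts.most_common(1)[0][0]
    return {
        loc: {
            "share_of_tweets": counts[loc] / total,
            "is_most_frequent": loc == top_location,
            "share_of_days_as_last_tweet": last_counts[loc] / n_days,
        }
        for loc in counts
    }


# Toy data, made up for illustration.
tweets = [
    (datetime(2016, 3, 1, 9, 0), "cell_A"),
    (datetime(2016, 3, 1, 23, 30), "cell_B"),
    (datetime(2016, 3, 2, 12, 0), "cell_A"),
    (datetime(2016, 3, 2, 22, 0), "cell_B"),
    (datetime(2016, 3, 3, 21, 0), "cell_B"),
]
features = location_features(tweets)
print(features["cell_B"])
```

In this toy data "cell_B" scores highest on all three indicators, so a model trained on tweets like "Finally at home!" would have good reason to pick it as the home location.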

Combining several indicators significantly increased the neural network's accuracy. As a result, Hossain and co-authors claim that they can pinpoint users' home locations to within 100 meters with an accuracy of 80%. According to the authors, this is significantly better than any previous work.

Together, these two methods allowed the team to build a model of when and where people drink. They used it to compare typical drinking patterns in New York City and in suburban Monroe County.

The researchers do this by dividing each area into a grid of 100 x 100 cells and marking the cells in which alcohol-related tweets were sent. This lets them build and compare "heat maps" of alcohol use for each area.
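
The gridding step might look like the sketch below. It assumes a simple rectangular bounding box in latitude and longitude; the box, the sample points and the function names are hypothetical, and a unit square is used only to keep the arithmetic easy to follow.

```python
GRID_SIZE = 100  # cells per side, as described in the article


def cell_index(lat, lon, bounds, grid_size=GRID_SIZE):
    """Map a coordinate to a (row, col) cell, or None if it lies outside bounds.

    bounds = (min_lat, min_lon, max_lat, max_lon)
    """
    min_lat, min_lon, max_lat, max_lon = bounds
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return None
    # Points exactly on the upper edge are clamped into the last cell.
    row = min(int((lat - min_lat) / (max_lat - min_lat) * grid_size), grid_size - 1)
    col = min(int((lon - min_lon) / (max_lon - min_lon) * grid_size), grid_size - 1)
    return row, col


def heat_map(points, bounds, grid_size=GRID_SIZE):
    """Count how many points fall into each cell of the grid."""
    grid = [[0] * grid_size for _ in range(grid_size)]
    for lat, lon in points:
        idx = cell_index(lat, lon, bounds, grid_size)
        if idx is not None:
            grid[idx[0]][idx[1]] += 1
    return grid


# Toy bounding box and points, made up for illustration.
bounds = (0.0, 0.0, 1.0, 1.0)
points = [(0.005, 0.005), (0.995, 0.995), (1.0, 1.0), (2.0, 0.5)]
grid = heat_map(points, bounds)
print(grid[0][0], grid[99][99])
```

Because both areas are cut into the same number of cells, their heat maps can be compared cell by cell even though the areas differ in size.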

They also separate drinking-related tweets sent from home from those sent elsewhere, and they map the points of sale of alcohol in each area. This allows the researchers to study the relationship between the density of tweets sent while intoxicated in different areas and the density of alcohol outlets.

The results make for interesting reading. First, Hossain and his co-authors found that a higher proportion of tweets are alcohol-related in New York than in Monroe County. "One possible explanation is that a crowded city such as New York, with its high density of alcohol outlets, encourages more people to socialize and to drink more because alcohol is so readily available," they say.



Moreover, geolocation data shows that a higher proportion of people drink at home (or within 100 meters of home) in New York than in Monroe County, where most people drink more than a kilometer from home.

Heat maps also reveal interesting patterns. They allow the team to find grid squares of 100 x 100 meters in which at least five alcohol-related tweets were sent. "We believe that such areas are a sign of unusual drinking activity," says Hossain.
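
Selecting such squares amounts to a threshold on per-cell counts. A minimal sketch, assuming each alcohol-related tweet has already been assigned a grid cell (the cell ids and the function name are hypothetical):

```python
from collections import Counter

MIN_TWEETS = 5  # threshold mentioned in the article


def hot_spots(cell_ids, min_tweets=MIN_TWEETS):
    """Return {cell: count} for cells with at least min_tweets tweets."""
    counts = Counter(cell_ids)
    return {cell: n for cell, n in counts.items() if n >= min_tweets}


# Toy data: one (row, col) cell id per alcohol-related tweet.
cells = [(3, 7)] * 6 + [(10, 2)] * 4 + [(50, 50)] * 5
hot = hot_spots(cells)
print(hot)
```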

They also found a correlation between the density of alcohol outlets in an area and the number of tweets indicating that someone is drinking right now. This raises an interesting question about correlation and causation. Does a high density of alcohol outlets make people drink more? Or do drinkers flock to areas where alcohol outlets are dense? Data of this kind alone cannot answer the question.
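
The article does not say which correlation measure was used. As one plausible illustration, the Pearson coefficient between per-cell outlet counts and per-cell tweet counts can be computed as below. The numbers are toy values chosen to be perfectly linear, not results from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy per-cell counts, made up for illustration.
outlets_per_cell = [0, 1, 2, 3, 4]
drinking_tweets_per_cell = [1, 3, 5, 7, 9]
r = pearson(outlets_per_cell, drinking_tweets_per_cell)
print(round(r, 3))
```

A coefficient close to 1 says only that the two densities rise together. It says nothing about which one drives the other.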

However, the advantage of this method is that it is cheap and fast. Other methods of obtaining similar information are extremely expensive and time-consuming.

They usually require carefully selected people to fill in prepared questionnaires, which then have to be carefully analyzed. A neural network trained for this method can even monitor alcohol consumption in real time. "Our results show that tweets can provide detailed information about what is happening in the cities," the researchers say.

There are caveats, of course. Data collected from Twitter is biased, since young people predominate and only a small share of the population uses social networks actively. But other methods of collecting information have biases too. Surveys, for example, generally miss people who do not want to take part in them, such as some immigrants.



Identifying statistical biases is an important part of any method of collecting information.

In the future, the authors of the study want to teach the neural network to determine gender, age, ethnicity and other characteristics from Twitter posts. The scientists believe this will help in studying the effect of alcohol on health. This seemingly frivolous study has considerable practical importance: in the United States alone, 75,000 people a year die because of alcohol abuse. Having a model of alcohol consumption in society will make it possible to identify sensible, low-cost ways of tackling the problem.

Source: https://habr.com/ru/post/280186/

