
How we built a metro breakdown alert service



About a year ago, when the Moscow Metro kept breaking down in random places and surprisingly often, we (dcoder_mm & Irenica) had an idea: build a service that alerts you about breakdowns.

The idea may seem odd to you. It seemed odd to dcoder_mm too, until he got caught in one of these breakdowns himself. Standing on a packed platform for 10 minutes waiting for a train turned out to be unpleasant enough that he resolved never to get caught like that again.

After that incident we decided to check Twitter: does anyone write about incidents in the metro? As it turned out, they do. And the first tweets about that breakdown had appeared 10 minutes before he entered the metro.
From there the idea slowly took shape: collect tweets, look for mentions of metro breakdowns, and notify the user if something turns up.

Admittedly, the idea was then shelved for a long time, but in the fall of 2014 we came back to it: the metro had started breaking down regularly again.

We decided to use SMS for notifications. We could have used something else, of course, even a Twitter bot that aggregates messages about the metro, but SMS has one advantage: it gets through even without a decent internet connection. In the metro there often isn't one, while GSM reception is fine. (When we started building the service, Wi-Fi was not yet available throughout the metro.)

All that was left was the small matter of actually writing the tweet parser and the SMS sender.

For a field test we wrote a simple one-line script that fetched the latest tweets with the #метро (#metro) hashtag through Twitter search (curl search.twitter.com), picked out the relevant ones by keyword, and sent us an SMS.

curl -ssl "https://twitter.com/search?f=realtime&q=%23%D0%BC%D0%B5%D1%82%D1%80%D0%BE&src=typd" | grep -E -o "js-tweet-text tweet-text.*<\/p>" | sed -e 's;[><\"=-]; ;g;s;js tweet text tweet text lang ru data aria label part 0;;g;s;/a;;g;s;\/\(a\|p\); ;g;s;\(.span\|.strong\|class\|twitter\|timeline\|link\|js display url\|invisible|\tco ellipsis\|href\|nofollow\|dir\|ltr\|data\|expanded\|url\|invisible\|tco\|ellipsis\|google&\;utm_medium\|banner&\;utm_campaign\|business_news\|target\|_blank\|title\|atreply\|pretty\| \;\|rel\|s\|?utm_source\|draggable\|false\|alt\|aria\|label\|u\|hidden\|pre\|embedded|\true\|b\|a\|qery\|orce\|hahtag_click\|hahtag\|j\|nav\|emedded\|tre\|&qot\;\|emedded\|tre\|rc\|hh\|qery\|orce\|hhtg_click\|hhtg\|img\);;g;s; ; ;g;s; \; ;;g;s;&qot\;;;g;s;emedded;;g;s;tre;;g;s; \/ ;;g;s;hhtg_click hhtg;;g;s;hh;;g;s;qery oe;;g;s;\/tg\/[a-zA-Z]*;;g;s;tweet;;g;s;text;;g;s;lng;;g;s;nd;;g;s;prt;;g;s;http://intgrm.com\s\/[a-zA-Z]*\/;;g;s;[AZ%az]*;;g;s;[\/_\/:\/?\.@&;…]*;;g;s;^[0-9]*;;g;s;[0-9]\{3,\};;g;s;^\s*;;g;s; *; ;g;s;^[0-9] ;;g;s;# ;#;g;s;[0-9 #]*$;;g' | grep -E -i "||||  |||||||||||| ||||| ||||||||||||||| | | |" | grep -E -i "||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||  |||||||||||-||||1905|||[- ]|[]||||||||||||||||||||||||||||||||||||||||-|  | []|-|  |[][]||  |[][][]|-|  | [][]||  | [][]|[]|  |[][]||-|  | [][]||  |[][][]|[][]|||  ||" | grep -i -v -E " | | |||||||-||||||||||iphone|||||porn|pron|||follow|retweet||| |||| ||||| |||| |||||||||| || | || ||| ||| | |  ||||| || ||||||||||||| |||||||||" | sort | uniq 


Oddly enough, it worked quite well. There were occasional false positives, of course, but the main thing was that an alert arrived for every breakdown of any significance. And the alerts came fast, well before the information showed up in the media or anywhere else.

Next came filtering out the false positives. The most blatant spam, such as ads tagged #metro, never got through because it contained no keywords (such as “accident” or “breakdown”). But that did not help against, for example, reports of metro breakdowns in other cities (a tweet about a breakdown in the St. Petersburg metro does not always mention “Piter”). So we had to introduce a list of "stop words": if a message contains any of them, it is not sent. The list included some station names from other cities, words common in advertising, and so on.
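
The keyword and stop-word logic described above can be sketched roughly as follows. This is only an illustration in Python, not the code the service actually runs: the word lists and the function names (`tokenize`, `should_alert`) are made up for the example, since the real lists are not published here.

```python
import re

# Hypothetical word lists, for illustration only.
KEYWORDS = {"accident", "breakdown", "delay"}
STOP_WORDS = {"piter", "sale", "discount"}


def tokenize(text: str) -> set:
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"\w+", text.lower()))


def should_alert(tweet: str) -> bool:
    """True if the tweet contains a keyword and no stop word."""
    words = tokenize(tweet)
    has_keyword = bool(words & KEYWORDS)
    has_stop_word = bool(words & STOP_WORDS)
    return has_keyword and not has_stop_word
```

With these example lists, an ad tagged #metro is dropped because it has no keyword, and a tweet that mentions both "breakdown" and "Piter" is dropped because of the stop word. Matching whole words rather than substrings is a choice made for this sketch; it keeps a short stop word from firing inside a longer, unrelated word.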

The problem that a single event can produce a couple of dozen tweets, and just as many SMS messages, was solved by brute force: after an alert we simply block all messages about that metro line for a certain time.
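
A minimal sketch of that per-line blocking, assuming each tweet has already been mapped to a line name (how that mapping is done is not covered here). The class name `LineThrottle`, the injectable `clock` parameter, and the 30-minute window in the usage note are assumptions for the example; the actual blocking time is not stated.

```python
import time


class LineThrottle:
    """Suppress repeat alerts about the same metro line for a fixed window."""

    def __init__(self, window_seconds: float, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without waiting
        self._blocked_until = {}  # line name -> time when alerts are allowed again

    def allow(self, line: str) -> bool:
        """Return True and start a new block window if the line is not blocked."""
        now = self.clock()
        if now < self._blocked_until.get(line, float("-inf")):
            return False
        self._blocked_until[line] = now + self.window_seconds
        return True
```

Usage would look like `throttle = LineThrottle(30 * 60)` followed by `if throttle.allow(line): send_sms(...)`, where `send_sms` is a placeholder. In this sketch a suppressed tweet does not extend the window, so a long-running incident produces a fresh alert once the block expires.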

There were other sources of false positives, though: reports of failures on other kinds of transport (but tagged “metro”) and reports of old metro breakdowns (a couple of accounts tweet about the summer 2014 metro tragedy as if it had just happened).

Once everything became more or less stable, we built a web interface where you can sign up with a phone number. People slowly started joining, and naturally, in full accordance with Murphy's law, a few more false positives followed. To keep that from happening again we added pre-moderation: messages come to us first, and if we do not block them within a couple of minutes, they go out to all other users.
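
The pre-moderation step amounts to a hold queue with a veto. Below is a rough Python sketch of the idea under some assumptions: the names `ModerationQueue`, `submit`, `veto` and `release_due` are invented for the example, time is passed in explicitly as seconds, and sending the message to moderators and subscribers is left out.

```python
from dataclasses import dataclass


@dataclass
class PendingAlert:
    text: str
    release_at: float  # time after which the alert goes to all users


class ModerationQueue:
    """Hold each alert for a while so a moderator can veto it."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds
        self._pending = {}  # alert id -> PendingAlert
        self._next_id = 0

    def submit(self, text: str, now: float) -> int:
        """Queue an alert and return its id (moderators would see it right away)."""
        alert_id = self._next_id
        self._next_id += 1
        self._pending[alert_id] = PendingAlert(text, now + self.hold_seconds)
        return alert_id

    def veto(self, alert_id: int) -> bool:
        """Cancel a pending alert; False if it is unknown or already released."""
        return self._pending.pop(alert_id, None) is not None

    def release_due(self, now: float) -> list:
        """Remove and return the texts of alerts whose hold time has passed."""
        due = [i for i, alert in self._pending.items() if alert.release_at <= now]
        return [self._pending.pop(i).text for i in due]
```

A periodic job could call `release_due(time.time())` and send whatever it returns to the subscribers. The trade-off is a delay of `hold_seconds` on every alert in exchange for a chance to stop a false positive.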

We also have statistics. If you are collecting data anyway, it would be a sin not to compute statistics on it. In our case it is simply the number of breakdowns per line over a given period, plus a brief description of each breakdown. We will probably add more later, but first we need to collect more data.
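
Counting breakdowns per line over a period is a small aggregation. A sketch in Python, assuming each recorded breakdown is stored as a `(day, line, description)` tuple; that record layout and the function name `breakdowns_by_line` are assumptions for the example, not the service's real schema.

```python
from collections import Counter
from datetime import date


def breakdowns_by_line(events, start: date, end: date) -> Counter:
    """Count breakdowns per line for events with start <= day <= end.

    `events` is an iterable of (day, line, description) tuples.
    """
    return Counter(line for day, line, _ in events if start <= day <= end)
```

Calling `breakdowns_by_line(events, date(2015, 1, 1), date(2015, 1, 31)).most_common()` would then list the lines for that month, from the most breakdowns to the fewest.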

Anyway, for anyone interested: msk-metro.ru. That's us. Consider the project still in beta, so if you happen to receive an irrelevant SMS, don't worry. We will fix it, and next time that kind of SMS definitely won't arrive.

And all of this runs on a Raspberry Pi under Debian.

Source: https://habr.com/ru/post/257433/

