How to build a chatbot from scrap materials in a day and a half

We are starting to talk about some of the projects from our hackathon. Today's is a bot that sends a student on Facebook several popular tweets containing a word they have just added for study. The result is a kind of micro-tutorial on Chatfuel, a simple and convenient tool for assembling such bots from building blocks.

Let's start with a disclaimer: this text is not about a finished Skyeng service, but about a prototype knocked together in a day and a half. We hope this story will be useful to some readers and simply entertaining to others. Or maybe it will inspire someone to take the idea further; we would be only too happy, since that fits our concept of the “outer contour” of the ecosystem. As practice shows, right after a hackathon all the participants are eager to finish their projects, but then everyday routine rushes in and those desires disappear somewhere...

Idea

Our dictionary has usage examples for words, but they are all artificial: our methodologists prepared them, and they are academic, “correct” examples. In reality people do whatever they like with words, so the idea came up to show examples of real usage “from the wild”. Twitter is a good source: people there write whatever they want, however they want, and they have to get a short message out right away, while it is still on their mind. So the language is alive, nobody thinks about sentence structure, and overall it is the text format closest to real life.
So we came up with a small project that would use our open Skyeng API to track the words a student adds for study and send them popular tweets containing those words in a messenger. We wanted the product to live in a messenger rather than on a separate site or in a separate app, so that the user does not have to go anywhere or install anything; we needed a touchpoint that works all the time and can nudge the user on its own.

The team consisted of two back-end programmers and an analyst with front-end experience. The project accordingly had two parts: a small server application and a chatbot. The general mood was: let's not overstrain ourselves and build something simple; after all, this is a hackathon combined with a corporate party, and you should not forget about having fun.

Twitter

Twitter has a search: you can go to the site, type in words and see tweets that contain them. The search is available both in the Twitter interface and as an API. The search engine is fairly smart, almost like Google: it understands word forms and compound expressions; for example, if you enter cut down, it will find tweets like I will cut this monstrosity down, where the words are separated. To implement our plan, it was enough to call the Twitter API, pass it the words the student had added for study, take tweets from the results and drop them into a messenger.

Obviously, there will be a hundred million tweets with such words, so we need to pick the good ones somehow. The API has a set of query parameters, including the result type (result_type): the most recent tweets, the most popular ones, or a mix of both. We used the popular ones: it does not matter to us that a tweet is old, what matters is that readers liked it. The count parameter defaults to 15; we used 100 for more variety.
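The query described above can be sketched as follows. This is an illustration in Python, not the hackathon code (the server was written in PHP). The endpoint URL assumes the standard search of Twitter's REST API v1.1, which the text does not name, and the function name is hypothetical; only result_type and count come from the article.

```python
from urllib.parse import urlencode

# Assumption: the standard search endpoint of the Twitter REST API v1.1.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(word: str) -> str:
    """Build a search URL for popular tweets containing `word`."""
    params = {
        "q": word,                 # the word (or expression) the student added
        "result_type": "popular",  # we want well-liked tweets, not fresh ones
        "count": 100,              # default is 15; 100 gives more variety
    }
    return f"{SEARCH_URL}?{urlencode(params)}"
```

The request itself would also need authentication headers, which are left out here.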

From those we picked the top three by favorite_count, i.e. by the number of likes (Twitter's own popularity ranking is closed, and in any case it factors in many criteria that are of little use to us, such as the author's follower count, retweets, etc.). Since time was short at the hackathon, we decided to hard-code our parameters this way; there was simply no time to invent anything particularly clever. If desired, they can be changed later, user settings can be added, and so on.
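Picking the three most liked tweets is a simple sort. A minimal sketch in Python, assuming each tweet is a dict parsed from the search response and that the like counter is the favorite_count field of the v1.1 tweet object; the function name is ours.

```python
from typing import Any, Dict, List


def top_liked(tweets: List[Dict[str, Any]], limit: int = 3) -> List[Dict[str, Any]]:
    """Return up to `limit` tweets with the highest number of likes."""
    return sorted(
        tweets,
        key=lambda tweet: tweet.get("favorite_count", 0),  # missing counter counts as 0
        reverse=True,
    )[:limit]
```

If the search returns fewer than three tweets, the function simply returns what there is.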

Server

We took a typical server project template based on the Yii 2 PHP framework and implemented three methods in it.

1. When a user registers (the chatbot sends us an e-mail address and a token), a simple table is created and the list of word meanings taken for study is copied into it. The table is needed because our external API does not return the time when a student added a meaning to their dictionary. Therefore:

2. The second method goes to the dictionary server every half hour and compares the user's local list of meanings with what is stored there. If it finds added meanings, it passes them on; if there are more than three new words, three are picked at random.

3. The third method queries the Twitter API for three tweets per word using the algorithm described above and sends the search results to the chatbot API.
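The comparison in the second method boils down to a set difference plus random sampling. A minimal sketch in Python (the real server was PHP on Yii 2); the function name, the integer IDs and the injectable random generator are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Optional, Set

MAX_NEW_WORDS = 3  # at most three new words are processed per check


def pick_new_meanings(
    local_ids: Set[int],
    remote_ids: Iterable[int],
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return meaning IDs present on the dictionary server but not stored locally.

    If more than MAX_NEW_WORDS meanings were added, pick that many at random.
    """
    rng = rng or random.Random()
    new_ids = sorted(set(remote_ids) - local_ids)
    if len(new_ids) > MAX_NEW_WORDS:
        return rng.sample(new_ids, MAX_NEW_WORDS)
    return new_ids
```

After each check the local table would be updated with the remote list, so the same meaning is not reported twice.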

Bot

Initially we considered delivering tweets via Facebook, Twitter or Telegram. However, to send tweets on Twitter we would have had to build a website where the student could grant permission to send tweets to them. This is a Twitter requirement, and it is all done through an SDK that has to be embedded somewhere; in short, too much for an MVP. We started looking into the Telegram docs and realized that a Telegram bot would have to be programmed from scratch: there were no ready-made solutions, it would have to be tested, bugs would creep in... So we settled on Facebook Messenger. We decided on the architecture and agreed on the database structure and the semantics of the API methods, after which the programmers went off to build their services and the analyst got busy with the bot itself.

We used Chatfuel, a tool that lets you assemble a Facebook Messenger bot without programming (there was an alternative, Sequel, but it did not suit us because it does not allow calling third-party APIs). It has a visual interface in which functionality is organized into blocks, and within each block individual actions are cards processed in order. Chatfuel also has internal variables, which matter to us.

The first two blocks you see when creating a new bot are the default welcome message and the default answer to user input for which no reply is provided. We put the same text in both: a prompt to enter the e-mail used to register with Skyeng.

Our first custom block is e-mail input. Its first card is User input. It has built-in validation, which here checks that the input really is an address and not something else entirely; the address is then saved to an internal variable, and we move on to the second card, Go to block, which takes us to the next block, Send token.

This block calls our external API, which sends a token to the student's e-mail. The first card is a three-second “typing” indicator; it shows the user that the bot is active while the request runs. Alongside it runs a request card, JSON API, which calls a method of our external API and passes it the student's e-mail from the variable.

The third card takes the token as input without validation and saves it to a variable; the fourth is the transition to the next block.

The third block, Subscribe, subscribes the user to the feed. Here again comes a “typing” card (since there is again a request that takes time), followed by JSON API, where we send a request to our new server with the e-mail, the token and the chat ID {chatfuel user ID}, which links the student to this channel. The ID is needed to call Chatfuel's Broadcasting API directly from our new server. On receiving this request, the server fetches the IDs of the word meanings the student is studying via another method of our external API, retrieves the word for each of them from our dictionary, schedules a regular check for updates, and then, when there are any, sends tweets to Facebook itself via the Broadcasting API.

The last block displays these tweets as they arrive. The server calls the Chatfuel API, passing values for the variables {tweet_word} and {tweet_text}, and the bot, on receiving them, immediately sends a message to the user.
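The call to Chatfuel can be sketched as building a single URL. This is an illustration in Python rather than our PHP code. The URL shape (bot ID and user ID in the path, chatfuel_token and chatfuel_block_name in the query, user attributes as extra query parameters) is an assumption based on Chatfuel's public Broadcasting API documentation and should be checked against the current docs; the function name and the block name in the usage example are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_broadcast_url(
    bot_id: str,
    user_id: str,
    token: str,
    block_name: str,
    tweet_word: str,
    tweet_text: str,
) -> str:
    """Build the URL that asks Chatfuel to show a block to one user."""
    query = urlencode({
        "chatfuel_token": token,            # assumed parameter name
        "chatfuel_block_name": block_name,  # assumed parameter name
        "tweet_word": tweet_word,           # fills the {tweet_word} variable
        "tweet_text": tweet_text,           # fills the {tweet_text} variable
    })
    path = f"/bots/{quote(bot_id, safe='')}/users/{quote(user_id, safe='')}/send"
    return f"https://api.chatfuel.com{path}?{query}"
```

For example, build_broadcast_url("BOT", "USER", "TOKEN", "Show tweet", "cap", "Nice cap") produces a URL that the server would then request with POST, once per tweet.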

Pitfalls (where would we be without them)

None of us had worked with Chatfuel before, and we did not know it had a Broadcasting API. So at first we tried to use the RSS block, and the bot's structure was different: it had a block that subscribes the bot to our RSS and a block that picks up entries from the RSS. The plan was to give each user a unique address on our server, publish tweets there as an RSS feed, and then have the bot pick them up from there and push them into the chat.

We sank half a day into this and hooked up a server-side PHP library that builds RSS feeds, but it did not take. Everything worked perfectly with the news RSS feed from vc.ru (we used it for tests), yet the very same feed, in exactly the same form, in the same XML format and with the same HTTP headers, served through our server, was not displayed. Why remains a mystery: maybe because our server was not on https, maybe there is something like a list of trusted sites, but it just did not work. Eventually we found the Broadcasting API in the docs and had an epiphany: why bother with XML when you can simply pass the text as a parameter? After that everything came together quickly, but traces of our RSS torments remain in the server code.

Quite a lot of time went into setting up the server. At first we deployed it on a free service, but it turned out that it did not expose the API address to the outside. We had to ask our admins to give us a spot in our Amazon cloud. But in the end we put the MVP together fairly quickly and still managed to rest and have fun during those two days.

What is not implemented: groundwork for the future

We agreed not to sit up at night and to build a minimal working version. Naturally, a lot was left for later:

- the settings need tuning to deliver the most useful tweets. Initially we wanted a smart filter: we tried cutting retweets and tweets with links (since those are most likely a comment on something rather than a self-contained post), but we did not have time to finish it: Twitter often returned too few results, so not every word got the required number of tweets;

- the logic for detecting new words could be simplified if the external Words API returned the time when a word meaning was taken for study;

- the school has its own service that returns the language proficiency level for each word; if it were added to the external API, each tweet could be parsed to check whether all of its words match the student's level, and the student could be offered only tweets in which they will most likely understand every word;

- other vocabulary from the student's dictionary could be taken into account, which would make for more useful tweets. You can feed Twitter search as many as three words and it will still return something. There are two ways to implement this: either build all possible three-word combinations from the student's dictionary and run them through Twitter search, then see which set works best; or search for words one at a time and then choose the most relevant among the results;

- there is a subtlety: our bot does not know which meaning of a word is used in a tweet, so a student learning the word cap as a piece of headwear receives a tweet about Captain Obvious. We could not do anything about this in two days. The browser extension team has some groundwork on context analysis that could be used in the future.
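Two of the ideas from this list are easy to sketch: the filter that drops retweets and tweets with links, and the enumeration of three-word combinations. This is a Python illustration of unfinished ideas, not code from the project. It assumes tweets are dicts shaped like v1.1 tweet objects, where a retweet carries a retweeted_status field and links are listed under entities.urls; the function names are ours.

```python
from itertools import combinations
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def is_standalone(tweet: Dict[str, Any]) -> bool:
    """True if the tweet is neither a retweet nor contains links."""
    if "retweeted_status" in tweet:
        return False
    urls = tweet.get("entities", {}).get("urls", [])
    return not urls


def filter_standalone(tweets: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only original tweets without links."""
    return [tweet for tweet in tweets if is_standalone(tweet)]


def word_triples(words: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every three-word combination from the student's dictionary."""
    return combinations(sorted(set(words)), 3)
```

The number of triples grows quickly: a dictionary of 50 words already gives 19600 combinations, so the one-word-at-a-time option is likely the more practical of the two.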

You can try the current version of the bot here: m.me/skyengtweets .

We will keep telling you about hackathon projects little by little; meanwhile, a reminder that we are actively looking for great people ready to join our team!

Source: https://habr.com/ru/post/339918/
