What's inside HR? (No anatomy involved)

Hi, Habr! We recently shared an interview with a representative of Robot Vera, the world's first robot recruiter, developed in Russia by the Stafory team. Now it is time for the project's technical details: we asked the team to explain how Vera works.

Interview about Robot Vera.

Over to the author.
Robot Vera was created a little over a year ago, but she has already achieved quite a lot. The world's first robot recruiter saves recruitment specialists 60% of their working time at 300 companies, both in Russia and abroad. In just one year, Vera selected more than 95,000 candidates for companies, conducted 2,300 video interviews, and made more than 1,000,000 calls, and in October 2017 she also won the global startup competition at HR Tech World.

That is a lot of work: Vera searches for suitable resumes among tens of millions of entries on job sites, simultaneously makes tens of thousands of calls to job seekers to offer them vacancies, conducts video interviews, recognizes candidates' emotions, and answers the question “What about the money?” just as well as the standard “What salary is offered?”.

Web side of Vera

Vera is managed through a personal account on robotvera.com, and Microsoft's solutions helped us launch the robot recruiter quickly and keep it running without interruption. The first thing we deployed in the Azure cloud was a Django web server. Because the marketplace has a ready-made solution with simple, clear documentation, we set everything up very quickly. In total, less than an hour passed from the start of work to the moment the site went live.

Over time we switched to standard Ubuntu instances, but in the early stages the ability to deploy a site as a cloud service in just a few minutes was very useful. The availability of marketplace instances with Django preinstalled was another plus. For example, there is a virtual machine image from Bitnami; if you use it, you no longer need to install Django and the server, because everything is already there. All that remains is to create and run your application.

Of course, Bitnami has its own quirks, for example an additional firewall: because of it, the port must be opened not only in Azure but also on the Bitnami virtual machine itself. But this is not difficult and does not take long, especially since the Azure marketplace had already saved us plenty of time.

Teaching Vera

The web server is, of course, not the hardest problem we solve in our product. Since Vera takes on the job of calling candidates and talking with them, she has to talk like a person.

We want our robot recruiter not only to ask the employer's scripted questions and mechanically record the answers, but also to give candidates the information they are interested in, however the question is worded. This is a machine learning task. Hard-coding every possible question from applicants was practically impossible, because there are thousands of variations. The standard question “What kind of salary?” can be phrased in a dozen ways, for example “What about the money?” or “What is the salary?”

It is impossible to foresee every option, so to find the answer closest to the candidate's question we decided to use the word2vec library. In case anyone has forgotten or does not know, this is a technology that processes huge amounts of text and computes vector representations of words. The representation is based on contextual proximity: words that occur in text next to the same words are treated as having similar meanings, and their vectors have close coordinates.
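The idea that words sharing the same neighbours end up with similar vectors can be shown with a minimal sketch. This is not word2vec itself: it simply counts neighbouring words and compares the counts with cosine similarity. The toy sentences, the window size, and the function names are all assumptions made up for the illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for this illustration.
sentences = [
    "the company pays a good salary every month",
    "the company pays a good wage every month",
    "the office address is on the main street",
    "the office location is on the main street",
]

def context_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = context_vectors(sentences)
sim_salary_wage = cosine(vectors["salary"], vectors["wage"])
sim_salary_address = cosine(vectors["salary"], vectors["address"])
print("salary vs wage:   ", sim_salary_wage)
print("salary vs address:", sim_salary_address)
```

In this toy corpus "salary" and "wage" appear between the same words, so they score higher than "salary" and "address", which share no neighbours. A trained word2vec model learns dense vectors instead of raw counts, but it relies on the same intuition.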

Let's take a look at how the model is trained on a couple of sentences; in this case, the words relate to income.

In this example, we initialize an array of sentences, then create a word2vec model and pass the array to it as an argument. We collect all the vectors from the model into the variable X. Next, we create a two-dimensional PCA model (principal component analysis) using the scikit-learn library, call its fit_transform method, and pass our vectors to it.
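To make the projection step more concrete, here is a rough, library-free sketch of what a two-dimensional PCA does to a set of word vectors. It is a hand-rolled stand-in (power iteration with deflation), not the scikit-learn implementation the article relies on, and the four-dimensional vectors are invented for the example rather than taken from a trained model.

```python
import math
import random

# Hypothetical 4-D word vectors, invented for this illustration.
words = ["salary", "wage", "money", "address", "office"]
X = [
    [0.9, 0.1, 0.3, 0.0],
    [0.8, 0.2, 0.4, 0.1],
    [0.7, 0.0, 0.2, 0.2],
    [0.1, 0.9, 0.5, 0.3],
    [0.0, 0.8, 0.1, 0.4],
]

def center(rows):
    """Subtract the per-column mean so the data is centred at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(d)]
    return [[row[k] - means[k] for k in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def top_eigenpair(matrix, iterations=500, seed=0):
    """Approximate the dominant eigenvalue and eigenvector by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract from this matrix
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(2):
        eigenvalue, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(a * b for a, b in zip(row, c)) for c in components]
            for row in centered]

points = pca_2d(X)
for word, (x, y) in zip(words, points):
    print(f"{word:8s} {x:+.3f} {y:+.3f}")
```

Each word ends up as an (x, y) pair that can be handed to a plotting library. The sign of each axis is arbitrary, so only relative positions are meaningful.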

After that, our words can be plotted in vector space using pyplot.

At this stage the words are still scattered, and there is no obvious link between their meaning and their position. This is because the model is barely trained: we passed it only a couple of sentences.

For the robot to communicate properly, we loaded 13 billion words into the model (25,000 books and television scripts, or 150 GB of text) along with descriptions of 100,000 vacancies. This is a huge amount of data, but only after that does the model compute the distance between sentences well enough to select correct answers to users' questions.

To calculate the distance, we use the wmdistance() method. It takes two sentences as arguments, and the model computes the distance between them. For example, the candidate's question “how much do you pay?” has a smaller distance to the word “salary” than to the word “address”.
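The comparison described above can be sketched without a trained model. The snippet below is not gensim's wmdistance(): it implements a relaxed variant of the word mover's distance, in which every word simply travels to its nearest counterpart, and it runs on hand-picked two-dimensional vectors. The vectors, the tiny vocabulary, and the function names are all assumptions for illustration.

```python
import math

# Hypothetical 2-D word vectors, hand-picked for illustration only.
# A trained model would supply vectors with hundreds of dimensions.
WORD_VECTORS = {
    "much": (0.8, 0.3),
    "pay": (0.9, 0.1),
    "salary": (1.0, 0.2),
    "address": (0.1, 0.9),
}

def tokenize(text):
    """Lowercase, strip the question mark, keep only words we have vectors for."""
    words = text.lower().replace("?", "").split()
    return [w for w in words if w in WORD_VECTORS]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def one_way_cost(source, target):
    """Average distance from each source word to its nearest target word."""
    return sum(
        min(euclidean(WORD_VECTORS[s], WORD_VECTORS[t]) for t in target)
        for s in source
    ) / len(source)

def relaxed_wmd(text_a, text_b):
    """Relaxed word mover's distance: the larger of the two one-way costs."""
    a, b = tokenize(text_a), tokenize(text_b)
    if not a or not b:
        return float("inf")  # nothing comparable in one of the texts
    return max(one_way_cost(a, b), one_way_cost(b, a))

question = "how much do you pay?"
dist_salary = relaxed_wmd(question, "salary")
dist_address = relaxed_wmd(question, "address")
best_topic = min(["salary", "address"], key=lambda t: relaxed_wmd(question, t))
print(dist_salary, dist_address, best_topic)
```

With these toy vectors the question lands closer to "salary" than to "address", so an answer about salary would be chosen. The exact word mover's distance solves a transport problem instead of taking nearest neighbours, but the selection logic around it looks the same.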

Given how much data we loaded so that the robot could communicate properly and distances between sentences were computed correctly, our model takes up more than 19 GB of RAM, plus the training corpus, another 41 GB. Again, with Azure we easily picked a suitable machine. Deploying it took a few minutes, after which we built an API on this instance so that our product could use the model.
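The article does not describe the API itself, so the following is only a guess at its general shape: a small HTTP service that keeps the model in memory on one machine and answers JSON requests from the product. The /answer path, the JSON fields, and the keyword-based find_topic function standing in for the real model are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the real model: a keyword lookup.
KEYWORDS = {
    "salary": {"salary", "money", "pay"},
    "address": {"address", "office", "where"},
}

def find_topic(question):
    """Return the topic whose keywords overlap most with the question."""
    words = set(question.lower().replace("?", "").split())
    best = max(KEYWORDS, key=lambda topic: len(words & KEYWORDS[topic]))
    return best if words & KEYWORDS[best] else "unknown"

class AnswerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/answer":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            question = json.loads(self.rfile.read(length))["question"]
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with a 'question' field")
            return
        body = json.dumps({"topic": find_topic(question)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def ask(base_url, question):
    """Client side: what the product would send to the model machine."""
    data = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/answer", data=data,
        headers={"Content-Type": "application/json"},
    )
    # Bypass any system proxy, since the service is local in this sketch.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())

# Port 0 lets the OS pick a free port for the demonstration.
server = HTTPServer(("127.0.0.1", 0), AnswerHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = ask("http://127.0.0.1:%d" % server.server_port, "What about the money?")
print(result)
server.shutdown()
server.server_close()
```

Keeping the heavy model behind a single endpoint means it is loaded into memory once, and the rest of the product only needs to know the URL.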

Teaching Vera in one night

Here is another story where Azure helped us quickly find the right instance. When we ran our pre-ICO, we had to train a chatbot overnight to answer questions about cryptocurrency and blockchain. We took the dialogues from a forum. There are not many full-fledged dialogues there, but still enough for the machine to start saying something more or less coherent.

For example, we ended up with dialogues like these:

- Is mining illegal?
- That depends per view.

- Chinese government banned all of ICOs.
- Please don't panic, just calm for your mother.

- What is your favorite trading exchange?
- I use coinbase.

We trained the bot using the TensorFlow library's Sequence-to-Sequence model, which was originally built for translating text from French to English. Training required a machine with a GPU, so we used an Azure N-Series instance. All training took place in the cloud. Of course, the examples make it clear that the bot made mistakes, and at that point we did not achieve a 100% match between questions and answers. But considering that we spent only one day and one night on development and training, and the bot still managed to joke, we will consider the goal achieved.
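Before a sequence-to-sequence model can be trained on dialogues, each question and reply has to be turned into fixed-length sequences of token ids. The sketch below shows one common way to do that preparation step; it does not use TensorFlow and is not the team's actual pipeline. The special tokens, the maximum length, and the placeholder replies are assumptions for illustration.

```python
import re

PAD, GO, EOS, UNK = "<pad>", "<go>", "<eos>", "<unk>"

# Questions in the spirit of the article; the replies are placeholders.
pairs = [
    ("Is mining illegal?", "It depends on the country."),
    ("What is your favorite trading exchange?", "I use coinbase."),
]

def tokenize(text):
    """Split lowercase text into words and punctuation marks."""
    return re.findall(r"[a-z0-9']+|[?.!,]", text.lower())

def build_vocab(pairs):
    """Assign an integer id to every token, reserving ids for special tokens."""
    vocab = {tok: i for i, tok in enumerate([PAD, GO, EOS, UNK])}
    for question, answer in pairs:
        for tok in tokenize(question) + tokenize(answer):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab, length):
    """Map tokens to ids, truncate to `length`, and pad the remainder."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens][:length]
    return ids + [vocab[PAD]] * (length - len(ids))

def make_example(question, answer, vocab, max_len=8):
    """Build encoder input, decoder input, and decoder target for one pair."""
    answer_tokens = tokenize(answer)[:max_len - 1]
    encoder_input = encode(tokenize(question), vocab, max_len)
    # The decoder sees the reply shifted right by one position (teacher forcing).
    decoder_input = encode([GO] + answer_tokens, vocab, max_len)
    decoder_target = encode(answer_tokens + [EOS], vocab, max_len)
    return encoder_input, decoder_input, decoder_target

vocab = build_vocab(pairs)
examples = [make_example(q, a, vocab) for q, a in pairs]
for enc, dec_in, dec_out in examples:
    print(enc, dec_in, dec_out)
```

The encoder reads the question, while the decoder learns to predict each reply token from the previous ones. This is the same arrangement a translation model uses, which is why a translation architecture can be reused for a chatbot.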

And here you will find an interview with Alexei Kosarev, one of the creators of Robot Vera.

About the author

Vladimir Sveshnikov previously founded a company that handled employment paperwork for workers from CIS countries. Later, he and his partners created Robot Vera. In the fall of 2017, Robot Vera became the first Russian startup to win a world pitch competition at the HR Tech World 2017 conference in Amsterdam.

Source: https://habr.com/ru/post/348808/
