
Education is one of the most important and, at the same time, most underestimated resume fields. It is the first thing employers look at when a young specialist is looking for work. Often it is education that tips the scales in favor of one of the candidates. Finally, some companies look for a specialist with a very specific education, down to a particular faculty of a particular university.
Applicants, for their part, are reluctant to describe their education in the resume. An abbreviation in the education field is still one of the better cases. Often you find just "technical" or "named after Lenin".
Until recently, "education" on hh.ru was a free-text field. This made it impossible to search for candidates properly by this criterion, to present education clearly in a resume, or for us to build statistics useful for the market. So the time had come to help users by creating a directory of universities and normalizing this field.
This article describes how we solved this problem for 11 million resumes and how users reacted.
The overall goal was twofold: first, for new users to choose a university from our directory when creating a resume, and second, for existing users to update their resumes in the same way.
The base of educational institutions was kindly provided by colleagues from Odnoklassniki. While building our directory we reworked it substantially, but the foundation was already there, which greatly sped up our work at the start.
Step 1. Suggestions when filling in the form
First of all, we added drop-down suggestions to the resume creation form, showing the correct and complete names of universities from our directory. After a month and a half, we saw that only 45% of new users chose the university we offered; the rest preferred to leave their own version, even when it coincided exactly with the suggested one! As a result, we received 200 thousand resumes with normalized education, but this figure had to grow by at least an order of magnitude.

Step 2. Mapping
New resumes are good, but for the project to make sense and be useful right away, we had to normalize the existing base, which at that time comprised about 10 million resumes. We therefore decided to map the "education" that users had already entered in free form onto the new directory of universities. We had to keep in mind that users describe education in a resume, to put it mildly, very roughly (just the word "higher" is also a very common option).
For the mapping we used a classical measure of similarity between two texts: cosine similarity. Each text is treated as a vector in the space of terms (the words that make it up). The more times a word occurs in the text, the larger the vector's coordinate along the corresponding axis. The similarity of two texts is simply the cosine of the angle between their vectors in the space of terms.
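The plain, unmodified version of this measure can be sketched as follows. This is only an illustration of textbook cosine similarity over word counts, not our production code; the function name and the whitespace tokenization are assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    # Each distinct word is an axis; the coordinate is the word's count.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())

    # Dot product: only words present in both texts contribute.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("moscow state university", "moscow university"))
```

Identical texts score 1, texts with no words in common score 0, and everything else falls in between.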
Applying this algorithm head-on gave unimpressive results, so we had to make some adjustments.
1. The coordinates of the vector corresponding to a text take only the values {0, 1}: several identical words in the name of an educational institution are a rarity.
2. The space of terms had to be made anisotropic: coordinates along different axes contribute differently to the norm in this space.
There are frequently used words (for example, "state", "technical") that may be either omitted or present in the way a user writes the name of an educational institution, so they should have less influence on the degree of similarity. Conversely, words such as "(named after) Kuybyshev" are more important and make a match more likely. Thus, when determining the level of similarity, the words that make up the texts are divided into several groups that differ in their importance for finding a match.
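A minimal sketch of both adjustments, under stated assumptions: texts become sets of words (coordinates in {0, 1}), and each word gets a weight that enters both the dot product and the norm. The two weight groups, their values and the list of "common" words below are hypothetical and chosen only for illustration; the real grouping was more detailed.

```python
import math

# Hypothetical grouping: frequent, often-omitted words get a low weight.
LOW_WEIGHT_TERMS = {"state", "technical", "university", "institute"}
LOW_WEIGHT = 0.2   # illustrative value
HIGH_WEIGHT = 1.0  # illustrative value


def term_weight(term: str) -> float:
    return LOW_WEIGHT if term in LOW_WEIGHT_TERMS else HIGH_WEIGHT


def weighted_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of binary term vectors under a weighted norm."""
    terms_a = set(text_a.lower().split())
    terms_b = set(text_b.lower().split())
    if not terms_a or not terms_b:
        return 0.0

    # With 0/1 coordinates the weighted dot product is the total weight
    # of the shared words, and each squared norm is the total weight of
    # the words in that text.
    dot = sum(term_weight(t) for t in terms_a & terms_b)
    norm_a = math.sqrt(sum(term_weight(t) for t in terms_a))
    norm_b = math.sqrt(sum(term_weight(t) for t in terms_b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = "kuybyshev state university"
    print(weighted_similarity(query, "kuybyshev university"))
    print(weighted_similarity(query, "kursk state university"))
```

With these weights, a candidate that shares the distinctive word scores much higher than one that shares only the common words "state" and "university", which is the effect the anisotropic norm is meant to achieve.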
Universities. Heritage
Renamed universities are another problem that had to be solved. For example, what was once called a "Pedagogical Institute" is now called a "Pedagogical University". The mapping therefore takes possible homonymy into account. Incidentally, many cities changed their names in the 90s, so within this homonymy handling "Kalinin Pedagogical Institute" should be mapped onto "Tver Pedagogical University". Moreover, employers today mostly know only the modern name of an educational institution.
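One simple way to picture this is to rewrite historic words into their modern equivalents before comparing names. The sketch below assumes a small hand-made table of word replacements; the table contents and the function name are hypothetical, and replacing words one by one is a deliberate simplification.

```python
# Hypothetical table of historic words and their modern equivalents.
HISTORIC_TO_MODERN = {
    "kalinin": "tver",          # city renamed in the 90s
    "institute": "university",  # change of the institution's status
}


def canonicalize(name: str) -> str:
    """Rewrite a name so that old and new spellings compare as equal."""
    tokens = name.lower().split()
    return " ".join(HISTORIC_TO_MODERN.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(canonicalize("Kalinin Pedagogical Institute"))
    print(canonicalize("Tver Pedagogical University"))
```

Both calls produce the same string, so any similarity measure applied afterwards treats the old and the new name as one institution.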
Matching Abbreviations
Matching abbreviations was a separate problem. First, some educational institutions had the same abbreviation at different times: for example, Samara State University is the former KSU (Kuybyshev), while Kursk State University is the current KSU.
Second, educational institutions in different countries often share an abbreviation: for example, BSU is both the I.G. Petrovsky Bryansk State University and the Belarusian State University. To resolve such conflicts, we had to take into account the cities where the institutions are located, their population, and the countries where the resume owners live. The numerous heuristics we used were also a great help with the mapping.
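A rough sketch of such a tie-break, assuming each directory entry carries its city, country and city population. The record layout, the population figures (rounded, purely illustrative) and the priority order of the signals are assumptions for the example, not the actual heuristics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class University:
    name: str
    abbreviation: str
    city: str
    country: str
    city_population: int  # rounded, illustrative figure


# Hypothetical two-entry directory built from the example in the text.
DIRECTORY = [
    University("Bryansk State University", "BSU", "Bryansk", "Russia", 400_000),
    University("Belarusian State University", "BSU", "Minsk", "Belarus", 1_900_000),
]


def resolve_abbreviation(
    abbreviation: str, resume_city: str, resume_country: str
) -> Optional[University]:
    """Pick the most plausible institution for an ambiguous abbreviation."""
    candidates = [u for u in DIRECTORY if u.abbreviation == abbreviation.upper()]
    if not candidates:
        return None

    # Assumed priority: same city, then same country, then larger city.
    def score(u: University):
        return (u.city == resume_city, u.country == resume_country, u.city_population)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(resolve_abbreviation("BSU", "Bryansk", "Russia"))
    print(resolve_abbreviation("BSU", "Gomel", "Belarus"))
```

A resume from Bryansk resolves to the Bryansk university, while a resume from elsewhere in Belarus resolves to the Belarusian one through the country match.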
Mapping result
As a result, we managed to map a little more than half (about 56%) of all higher-education entries in our resumes: 6,989,453 out of 12,510,682. After testing and verification, we decided it was time to show the results to users and study their reaction.
Step 3. Checking the university in the resume
We cannot silently change the name of the institution for the user. Few people would like the system editing their resume on its own, and the directory still contained inaccuracies. We therefore placed a notice, "clarify the name of the educational institution in your resume", on the page with vacancy responses. The result: fewer than 10% of the users who saw it followed the link, so the goal could not be achieved this way. Users were probably sure that everything was fine with their "education" and there was nothing to check.

However, thanks to this notice we saw, first, typical mistakes and, second, a strange pattern: even when we had mapped everything correctly, users still restored their own version, which is perhaps more familiar to them. This was worth keeping in mind for the future.
Overall, in two weeks of operation we received another 150 thousand resumes with correct education. In total, over the 2.5 months the directory had existed, we had 450 thousand mapped resumes, or about 5% of the entire database. Once again the result was unimpressive, so we kept drawing conclusions and planning further steps.
Step 4. How to reach passive users
With suggestions and notices we reached only active users who visit the site. To reach applicants who are not looking for a job right now, we decided to send an email to part of the database of registered applicants. In the letter we wrote that we had made some changes to the education in their resume, which they could either confirm or reject.

The logic behind the letter was as follows:
- if the user does not respond to the letter, the education in the resume remains untouched;
- if the user confirms that we changed the name correctly, the education in the resume is updated to the current version from our directory;
- if the user rejects the proposed option, they are taken to the resume editor, where they can restore the original option.
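The three branches above can be sketched as a small decision function. The names are hypothetical, and the rejection branch is simplified: here it keeps the original text and signals that the resume editor should be opened.

```python
from enum import Enum
from typing import Tuple


class Reply(Enum):
    NO_REPLY = "no_reply"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


def resolve_education(original: str, proposed: str, reply: Reply) -> Tuple[str, bool]:
    """Return (education to keep in the resume, whether to open the editor)."""
    if reply is Reply.CONFIRMED:
        # The user agreed: use the name from the directory.
        return proposed, False
    if reply is Reply.REJECTED:
        # The user disagreed: keep their text and send them to the editor.
        return original, True
    # No reaction: the resume stays exactly as it was.
    return original, False


if __name__ == "__main__":
    print(resolve_education("KSU", "Kursk State University", Reply.CONFIRMED))
```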

We exported all cases where users rejected our version and, based on them, checked the directory once more, making the necessary corrections.
It should be noted that the wording about amending the resume turned out not to be very successful, so to another part of the database we sent letters that described the new directory of universities and asked users to update the name of their university themselves.

A week after the mailing, we had 1,000,052 resumes with education filled in from the directory: a substantial share, but not all. So we continued the mailings, proposing that users update their university and explaining why it is needed and what applicants gain from it. In support of the normalization of universities, we also launched the "Battle of universities" project to encourage users to update their resumes and thereby support their university in an improvised battle. Of course, this project does not claim to be an objective rating of universities, but it has nevertheless made (and continues to make) a certain contribution to the normalization of education.

Just a few days ago, we added English versions of university names (for resumes in English). They do not yet exist for every university, but we will increase their number.
As a result, today 23% of the resumes in the database, about 3.3 million, have normalized education. By the end of the year we plan to reach 30%.
If you have not yet updated your resume, now is the time to do it.
If your university is still not in the directory, write to us about it, and we will add it.
Step 5. Search by university: the very reason all this was started
Since nearly a quarter of all resumes now have normalized education, and this share keeps growing, we have released the first stage of search by university. A recruiter can now find graduates of a particular educational institution simply by clicking on it in any resume, and search filters quickly narrow the selection down to the desired city, professional area, work experience, language skills, desired type of employment, and so on. For employers who know exactly what they want, or who are simply picky (whichever you prefer), finding the right candidates has become much easier. But this is only the beginning.

The normalization of education is only one part of a larger normalization project, which also covers positions, skills, employers and professional areas.
If you have ideas or questions about this project, you are always welcome to share them in the comments.