Superjob Data Science Meetup (report, presentations, video)
Videos, presentations, and a short write-up for those who could not attend and missed the live broadcast.
The Data Science Meetup was held at the Superjob office. About a hundred analysts and developers came to hear the talks, including specialists from Renault, Tinkoff Bank, Eldorado, SAP, VimpelCom, Deloitte, VTB, and other companies. About 500 people watched the live broadcast.
The first talk was given by Dmitry Cojocari, a senior developer at Superjob. He described how the team tackled grouping similar vacancies and then building vacancy search results on the basis of those groups. The goal was to clear near-duplicate vacancies out of the search results efficiently while meeting the requirements of both customers and the business.
In his talk, Dmitry shared practical experience implementing natural language processing algorithms. He also went into the technical details of using the SimHash algorithm and hierarchical clustering. At the end, Dmitry listed the performance metrics the company used to assess the success of the algorithm it had developed.
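As a rough illustration of how SimHash and hierarchical clustering can combine to group near-duplicate vacancies, here is a minimal Python sketch. It is not Superjob's implementation, and several choices are assumptions made for illustration only:

* word 3-gram shingles as features;
* a 64-bit fingerprint built from truncated MD5 hashes;
* a Hamming-distance cutoff of 3 bits.

Grouping uses single-linkage hierarchical clustering cut at that cutoff, which is equivalent to finding connected components of the "close enough" graph (done here with union-find). The function names `shingles`, `simhash`, `hamming` and `group_similar` are hypothetical.

```python
import hashlib
import re


def shingles(text, n=3):
    """Split lowercased text into overlapping word n-grams (assumed features)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def simhash(text, bits=64):
    """SimHash fingerprint: each feature hash votes +1/-1 on every bit position."""
    weights = [0] * bits
    for feature in shingles(text):
        # Take the first 8 bytes of MD5 as a 64-bit feature hash (assumption).
        h = int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if weights[i] > 0:  # a positive vote sum sets the bit
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def group_similar(texts, max_distance=3):
    """Single-linkage clustering cut at max_distance, via union-find."""
    fps = [simhash(t) for t in texts]
    parent = list(range(len(texts)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # O(n^2) pairwise comparison: fine for a sketch, too slow for a full index.
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if hamming(fps[i], fps[j]) <= max_distance:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    vacancies = [
        "Senior Python developer, Moscow, remote work possible",
        "Senior Python developer, Moscow, remote work possible",
        "Accountant needed for a small retail company in Kazan",
    ]
    print(group_similar(vacancies))
```

Here the two identical postings end up in one group and the unrelated one stays alone. On a real vacancy base, one would probably avoid comparing every pair, for example by indexing fingerprints by blocks of bits so that only candidates sharing a block are compared. The shingle size and distance cutoff would also need tuning against labeled duplicates.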
Maxim Savchenko, Head of Model Development at the Center for Competence Studies and Model Development at Sberbank Technologies, spoke about the specifics of applying machine learning to human resource management (HRM) problems, as well as the technical problems and legal constraints involved in developing and deploying such models. The talk also presented the results of pilots conducted at Sberbank. The first was a statistical model for assessing the reliability of candidates in mass recruitment (an antifraud model). The second was a model linking a unit's performance to the professional qualities and actions of its employees and to the unit's organizational characteristics (staffing, labor and executive discipline of employees, qualifications and education, etc.). The input data included information from the HR system, timesheets, candidates' credit histories, and the results of the candidates' entry questionnaire.
Evgeny Grigorenko, a strategic technologies expert at Microsoft, presented his experience of applying machine learning to medical data analysis. He stressed the need to test the model and visualize the results, and noted that neural networks are of limited use here because their results are difficult to justify. Evgeny described building a method that reveals the relationship between the acid-base status of a patient's blood and the condition of a patient in intensive care. As input, he had 16 blood composition parameters for each blood test (intensive care patients are tested twice a day) and data on each patient's condition upon discharge from intensive care. Evgeny described the many methods he tried while searching for a solution and explained how a successful hypothesis was eventually found. The hypothesis was tested for six months and is now successfully used in clinical studies.