This is a continuation of previous publications.
It suddenly turned out that one of the tasks I had to solve about a year ago, namely assessing the “reliability” of a large team, now goes by a very popular name: “HR analytics”. While updating the materials for a new task, I came across an informative blog on HR analytics on the Russian-language internet. In fact, this blog, together with discussions with its author, Edward Babushkin, served as the starting point for revisiting the subject.
This publication is meant more as discussion and analysis than as a set of firm assertions. The question is which approaches and methods are best suited to HR analytics tasks and what can be done by means of R. The ambiguity arises because the object of study is not a natural phenomenon but human behavior, which is not always logical or predictable, especially when moving from the team to the individual.
Different people understand the term “HR analytics” differently, and in the West it is currently one of the hype topics in narrow circles. So, for clarity and to make the solution amenable to an algorithm, we will first limit how the term is interpreted within this publication.
The classic tasks of HR analytics fall into two very different classes: assessing candidates at hiring, and managing the outflow of existing staff.
The first class is of interest to recruitment agencies and to company HR departments. The second class matters only to HR departments (and, of course, to company management).
In essence, the first class of tasks comes down to a “good/bad candidate” assessment at a fixed moment (the meeting with the recruiter) based on historical data. That is, the classifier is built on accumulated data about past events. From a technical point of view, this can be solved with statistical methods, including classical machine learning methods (random forest, neural networks). The main question is choosing the factors well.
But companies are now more interested in tasks of the second class, namely managing staff outflow. And this is the first place where one can slip and the whole undertaking can fail.
HR staff and department heads interested in retaining current staff keep using the terms “model”, “factors”, “accuracy”, “training”, “modeling horizon” and other words picked up from marketers. Large vendors keep delighting us with beautiful pictures and publications about how well everything will go if you use their products. But look at the illustrations in the well-known HR analytics publication “Watson Analytics Use Case for HR: Retaining valuable employees”: in one of the pictures, for example, you will see that what is proposed is a simple decision tree, just in a pretty wrapper. Yet the task has changed considerably!
Let us clarify once more.
The formulation above, namely building a model that predicts employee departures with a given accuracy (at least 75-80%) and that must be trained on historical data, quite clearly shows that the expected end result is a “black box” acting as a binary classifier: “quits / does not quit”. The classifier can be of any kind, from simple logistic regression to random forest and neural networks; the essence of the problem does not change.
The problem is that both the environment and the object of analysis (the employee) change very dynamically. A built model loses accuracy very quickly, and not only with the passage of time but also as the forecast horizon grows. According to the more extensive experience of Western HR analysts, the horizon of more or less reliable forecasting in IT, even for a well-trained model, is currently 1-1.5 months, not the 1-2 years that promotional materials like to claim.
Now recall that the main purpose of forecasting is to be able to take adequate management action to retain an employee. But, again according to various psychological HR studies, at a horizon of 1-2 months it is extremely difficult and expensive to stop an employee from leaving. They are already looking around, have started going to interviews, are mentally ready to leave, are winding down their internal activities and packing their things.
Even if the binary classifier is updated daily and its accuracy adjusted on new historical data, the forecast horizon still cannot be radically extended. The external environment changes very quickly, while new factors have not yet shown up in the data, so there is nothing to learn from. Moreover, the algorithm should also provide information about the optimal management action, and not every machine learning algorithm can do that directly.
Things are not hopeless, though: one just has to look at the task from the other side and turn to the tools of medical researchers, sociologists, and actuaries.
The idea is to use survival analysis, which, among other things, leading Western HR analysts consider one of the most promising approaches. Combined with the Cox proportional hazards model, it lets you work with the probability of departure, build a departure forecast, and analyze the influence of various factors for a particular employee. The key point is that moving from binary classification to a probabilistic description lets you look at the evolution of the entire life cycle of the company and of individual employees, work with trends, and analyze how the probability of departure changes in response to a given management action long before the risk of departure becomes real.
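For readers who have not met these terms, the two objects mentioned above can be written down compactly. The notation is the standard textbook one and is not taken from the original article: T is an employee's tenure until departure, x is a vector of factors for that employee, β is the vector of fitted coefficients, and h_0, S_0 are the baseline hazard and baseline survival function.

```latex
% Survival function: probability that the employee is still with the company after time t
S(t) = \Pr(T > t)

% Hazard: instantaneous rate of departure at time t, given the employee has stayed until t
h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Cox proportional hazards model: factors scale the baseline hazard multiplicatively
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^{\top} x\right)

% Equivalent statement for the survival curve of an employee with factors x
S(t \mid x) = S_0(t)^{\exp\left(\beta^{\top} x\right)}
```

Under this model, exp(β_j) is the hazard ratio for a one-unit change in factor j, which is what makes it possible to reason about the effect of a management action on an individual employee's curve rather than on a yes/no label.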
Using survival curves, you can compare different employees and compute the average tenure, both for the population and for an individual employee. And working with probabilities is a more mature approach than judging everything in black-and-white terms.
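To make "survival curve" and "typical tenure" concrete, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is an illustration only: the article's tool is R (where, to my knowledge, this is usually done with `survfit` from the `survival` package, which the original does not name), the function names are hypothetical, and the tenure data are invented. Employees who are still with the company are treated as censored observations. The median is used as the summary because the mean is not well defined when the curve does not reach zero.

```python
from collections import Counter


def kaplan_meier(durations, left):
    """Kaplan-Meier survival estimate.

    durations: observed tenure of each employee (e.g. in months)
    left:      True if the employee left, False if still employed (censored)
    Returns a list of (time, survival_probability) at each departure time.
    """
    departures = Counter(t for t, e in zip(durations, left) if e)
    exits = Counter(durations)  # everyone who stops being observed at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = departures.get(t, 0)
        if d:
            # Fraction of those still at risk who survive past time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_tenure(curve):
    """First time at which estimated survival drops to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented example data: tenure in months and whether the person has left
durations = [3, 5, 5, 8, 12, 12, 15, 20, 24, 24]
left = [True, True, False, True, True, True, False, True, False, False]

curve = kaplan_meier(durations, left)
for t, s in curve:
    print(f"month {t:>2}: S(t) = {s:.3f}")
print("median tenure:", median_tenure(curve))
```

Computing the same curve separately for different groups, or for the factor profile of one employee under a fitted model, gives the comparison described above.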
An illustrative example shows how survival curves differ depending on whether employees work overtime (the data come from an access control system that records employees' actual presence in the office). “Cap, why work overtime at all, see?” The approach works, and works reasonably well: there is time to talk with the employee and correct the situation.
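When two groups such as "overtime" and "no overtime" are compared, it helps to check that the difference between their curves is more than noise. One common check is the log-rank test. The article does not say which comparison was used, so this is my assumption, not the author's method; in R the equivalent is, as far as I know, `survdiff` from the `survival` package. The sketch below is plain Python, the function name is hypothetical, and the data are invented, not the access control data from the article.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  observed tenure per employee
    events_*: True if the employee left, False if censored
    Returns (chi2, p_value) for a chi-square statistic with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        # Expected departures in group A if both groups shared one hazard
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented data: tenure in months and departure flags for the two groups
overtime_t = [2, 3, 3, 5, 6, 8]
overtime_e = [True, True, True, True, True, False]
regular_t = [6, 9, 12, 14, 18, 18]
regular_e = [True, False, True, True, False, False]

chi2, p_value = logrank_test(overtime_t, overtime_e, regular_t, regular_e)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

With samples this small the result is only indicative; on real data the group sizes and the amount of censoring matter far more than the exact p-value.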
For those who need to predict employee departures, here are a number of links to useful materials on “Survival analysis in R”. They are quite enough to understand the topic and build a useful tool for yourself:
Electronic versions of the books can be found on the internet.
Previous publication: “Crazy Hands”: making Tableau / Qlik out of R and “blue electrical tape”.
Source: https://habr.com/ru/post/347942/