At some point I wondered what would happen if I analyzed job vacancies and compiled a few rankings from them: who pays the most, who is most in demand, and much more.
As a data source I used the well-known HeadHunter. I collected and processed the vacancies for May of this year. Only one month, because the API does not allow retrieving more.
The HeadHunter API has excellent documentation, which is kept in the repository. Requests should be made to the https://api.hh.ru/ domain with the User-Agent header set, preferably of the form _/_ (__) (other User-Agent values sometimes work, but if the server doesn't like something, it will return an error).
The collection logic is very simple, so I implemented it in bash using cURL and jq. However, I want to share a few nuances.
To search for vacancies by various parameters, there is the GET /vacancies endpoint.
curl -A 'irenica (https://irenica.com/)' 'https://api.hh.ru/vacancies'
Search results are split into pages whose size is set by the per_page parameter (20 by default, 100 at most). You can select a specific page with the page parameter (numbering starts from 0).
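For readers who prefer Python to curl, here is a minimal sketch of how such a request could be assembled with the standard library. The function name `build_search_request` is hypothetical, the User-Agent string is simply the one from the curl example above, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://api.hh.ru"
# Same User-Agent value as in the curl example above.
USER_AGENT = "irenica (https://irenica.com/)"


def build_search_request(page=0, per_page=100):
    """Build (but do not send) a GET /vacancies request for one result page."""
    if per_page > 100:
        raise ValueError("per_page cannot exceed 100")
    if page < 0:
        raise ValueError("page numbering starts from 0")
    query = urlencode({"per_page": per_page, "page": page})
    return Request(f"{API_ROOT}/vacancies?{query}", headers={"User-Agent": USER_AGENT})


request = build_search_request(page=2)
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual call.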
The pages field of the service information returned along with the vacancies gives the total number of result pages.
With this, you can easily search through all pages:
    declare -i i=0
    while true; do
        declare url="https://api.hh.ru/vacancies?per_page=100&page=$i"
        declare page="$(curl -A 'irenica (https://irenica.com/)' "$url")"
        # ... process $page ...
        ((i++))
        # .pages holds the total number of result pages
        declare -i totalCount=$(echo "$page" | jq '.pages')
        if ((i >= totalCount)); then
            break
        fi
    done
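The same loop can be sketched in Python as a generator. This is only an illustration: `fetch_page` is a hypothetical callable that takes a zero-based page number and returns the decoded JSON of one search response, and `fake_fetch` is a stand-in so the example runs without touching the network.

```python
def iter_pages(fetch_page):
    """Yield every page of a search result, stopping after the last one."""
    page_number = 0
    while True:
        page = fetch_page(page_number)
        yield page
        page_number += 1
        # "pages" is the total number of result pages reported by the API.
        if page_number >= page["pages"]:
            break


def fake_fetch(page_number):
    """Stand-in for a real HTTP call: three pages with two vacancy IDs each."""
    first = page_number * 2
    return {"pages": 3, "items": [{"id": str(first)}, {"id": str(first + 1)}]}


ids = [item["id"] for page in iter_pages(fake_fetch) for item in page["items"]]
print(ids)
```

Injecting the fetch function keeps the paging logic separate from the HTTP details, so it can be checked offline.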
However, the search results contain only part of the vacancy data. To get everything, you need to make a separate request to an endpoint of the form GET /vacancies/id, where id is the vacancy ID.
Partial vacancy data is in the items field of the search results. First we will collect the vacancy IDs from it:
declare vacanciesIds="$(echo "$page" | jq -r '.items[].id')"
Then we will request complete information about the relevant vacancies separately:
    for vacancyId in $vacanciesIds; do
        declare url="https://api.hh.ru/vacancies/$vacancyId"
        declare vacancy="$(curl -A 'irenica (https://irenica.com/)' "$url")"
        # ... process $vacancy ...
    done
The HeadHunter API has one peculiarity: no matter how many vacancies are found, at most 2000 will be returned. The actual number found is still reported in the found field of the search results, so you can tell for sure whether you received all the requested data or some of it was lost.
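That check can be made explicit with a tiny helper. This is a sketch, not part of the original scripts: the name `lost_vacancies` is hypothetical, and the argument is assumed to be the decoded JSON of one search response.

```python
MAX_RETURNED = 2000  # cap on returned vacancies, as described above


def lost_vacancies(search_page, max_returned=MAX_RETURNED):
    """Return how many found vacancies lie beyond the cap (0 means nothing is lost)."""
    return max(0, search_page["found"] - max_returned)


print(lost_vacancies({"found": 1500}))  # 0
print(lost_vacancies({"found": 2600}))  # 600
```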
To get around this limitation, I came up with the following. When searching, you can specify the time interval in which the vacancies of interest were published (via the date_from and date_to parameters, which accept dates in ISO 8601 format). You can take a small interval and go through all the results piece by piece: the shorter the interval, the fewer vacancies were published within it.
Note that only vacancies published within the last month are returned, so there is no point in setting a wider range.
To iterate over the time range, it is best to represent it as Unix time:
    declare -i startTime=$(date -d '-1 month' +%s)
    declare -i endTime=$(date -d now +%s)
    while ((startTime <= endTime)); do
        # one-hour window
        declare -i intervalEnd=$((startTime + 60*60))
        declare startTimeIso="$(date -d @$startTime +%FT%T)"
        declare intervalEndIso="$(date -d @$intervalEnd +%FT%T)"
        # ...
        declare url="https://api.hh.ru/vacancies?per_page=100&page=$i&date_from=$startTimeIso&date_to=$intervalEndIso"
        # ...
        startTime=$intervalEnd
    done
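The same windowing idea can be sketched in Python. The function name `hourly_windows` is hypothetical and the dates are arbitrary sample values. Unlike the bash loop, this variant clamps the last window to the end of the range.

```python
from datetime import datetime, timedelta


def hourly_windows(start, end, step=timedelta(hours=1)):
    """Yield (date_from, date_to) pairs of ISO 8601 strings covering start..end."""
    current = start
    while current < end:
        window_end = min(current + step, end)
        yield (
            current.isoformat(timespec="seconds"),
            window_end.isoformat(timespec="seconds"),
        )
        current = window_end


# Arbitrary sample range of two and a half hours.
windows = list(hourly_windows(datetime(2018, 5, 1), datetime(2018, 5, 1, 2, 30)))
for date_from, date_to in windows:
    print(date_from, date_to)
```

If a single window still reports more than 2000 vacancies in found, a smaller `step` can be passed for that stretch.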
To collect statistics, the vacancies had to be grouped by certain attributes. Doing this in bash was already problematic, so I used Python.
The logic is nothing special: accumulate the data in an associative array, sort it, and output it to CSV. However, again there are a few nuances.
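Before getting to those, here is a minimal sketch of the accumulate, sort and output steps. It is an illustration rather than the actual script: the function names are hypothetical, the input is assumed to be already reduced to (category, salary) pairs, the sample data is made up, and sorting by average salary is an assumption.

```python
import csv
import io
from collections import defaultdict


def summarize(records):
    """Group (category, salary) pairs; return rows sorted by average salary, highest first."""
    salaries = defaultdict(list)
    for category, salary in records:
        salaries[category].append(salary)
    rows = [
        (name, round(sum(values) / len(values)), min(values), max(values), len(values))
        for name, values in salaries.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def to_csv(rows):
    """Render the summary rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "Salary, average", "Salary, minimum", "Salary, maximum", "Number"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data, not taken from the collected vacancies.
sample = [("Sales", 30000), ("Sales", 50000), ("Security", 35000)]
print(to_csv(summarize(sample)))
```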
Note that the salary is given as two numbers, the minimum and the maximum, and either of them may be absent.
Since the analysis needed a single number, I decided to use the lower bound, and the upper one only if the lower is absent.
    salary = None
    if vacancy['salary']:
        # Take the upper bound first, then let the lower bound override it if present.
        if vacancy['salary']['to']:
            salary = vacancy['salary']['to']
        if vacancy['salary']['from']:
            salary = vacancy['salary']['from']
The salary in a vacancy can be specified in different currencies, each with its own exchange rate. The HeadHunter API has the GET /dictionaries endpoint, which contains all the necessary predefined values. Currency rates are in the currency field. For convenience, it is better to put them into an associative array keyed by the alphabetic currency code:
    import requests

    # Map each currency code to its exchange rate.
    currencies = {}
    dictionaries = requests.get('https://api.hh.ru/dictionaries').json()
    for currency in dictionaries['currency']:
        currencies[currency['code']] = currency['rate']
Now, during processing, it will be easy to convert all salaries into one currency:
salary /= currencies[vacancy['salary']['currency']]
In some vacancies the salary is stated before personal income tax, in others after it. Which one applies is indicated by the gross field: it is true in the first case and false in the second.
I decided to convert all salaries to the after-tax amount:

    if vacancy['salary']['gross']:
        salary -= salary * 0.13
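Put together, the three salary steps (pick one bound, convert the currency, deduct tax) could look like the sketch below. The function name `normalize_salary` and the sample rates are hypothetical. The `salary` argument is assumed to have the same shape as `vacancy['salary']` in the snippets above.

```python
TAX_RATE = 0.13  # personal income tax rate used in the snippet above


def normalize_salary(salary, currencies):
    """Return one after-tax number in the base currency, or None if no salary is given."""
    if not salary:
        return None
    # Prefer the lower bound; fall back to the upper one.
    amount = salary.get("from") or salary.get("to")
    if not amount:
        return None
    # Convert using the rate for the vacancy's currency code.
    amount /= currencies[salary["currency"]]
    if salary.get("gross"):
        amount -= amount * TAX_RATE
    return amount


# Made-up rates, for illustration only.
sample_rates = {"RUR": 1.0, "USD": 0.016}
print(normalize_salary({"from": 100000, "to": 150000, "currency": "RUR", "gross": True}, sample_rates))
print(normalize_salary({"from": None, "to": 160, "currency": "USD", "gross": False}, sample_rates))
```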
Now is the time to show the numbers.
Probably many of those reading this post would like to work remotely. But as we can see, working from home is not yet valued very highly in our country: salaries are much lower and there are significantly fewer vacancies, so applicants have less to choose from.
This is rather strange, because in many professions and many companies (given the nature of the tasks) a person's presence in the office is completely unnecessary. But this is an eternal argument.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Domestic staff | 112536 | 10977 | 130000 | 19 |
Information technology, Internet, telecom | 55225 | 1000 | 300000 | 2828 |
Top management | 47687 | 9474 | 100000 | 23 |
Extraction of raw materials | 46579 | 20000 | 90898 | 80 |
Installation and Service | 45439 | 11874 | 69600 | 9 |
Public service, non-profit organizations | 44911 | 20000 | 90000 | 19 |
Working staff | 44218 | 9499 | 67860 | 37 |
Production | 42388 | 2372 | 100000 | 236 |
Construction, real estate | 39896 | 70 | 110000 | 329 |
Transport, logistics | 37662 | 9490 | 100000 | 223 |
However, there is an even smaller category of vacancies: those open to people with disabilities. And this is completely illogical. Even if most employers do not want remote workers, why do so few of those who are ready for them think about people with disabilities? If you do not care that a person is three time zones away, what difference does it make whether he can walk, for example?
Many of you probably know people with disabilities. So do I, and I wondered how hard it is for them to find a job and what they can count on.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Public service, non-profit organizations | 69675 | 8700 | 90000 | 8 |
Top management | 48705 | 30000 | 82425 | 15 |
Information technology, Internet, telecom | 45321 | 4350 | 200000 | 1050 |
Science, education | 45056 | 3158 | 90000 | 376 |
Purchases | 43591 | 15000 | 80000 | 9 |
Construction, real estate | 42148 | 22 | 250000 | 210 |
Production | 40969 | 10000 | 130500 | 189 |
Accounting, management accounting, finance companies | 36387 | 2610 | 113100 | 125 |
Lawyers | 34308 | 2610 | 160000 | 131 |
Security | 33414 | 22 | 90000 | 178 |
We all start somewhere, namely with a job search without any experience. I decided to assess the situation with positions open to such candidates.
The number of vacancies gives hope for finding a job quickly. I do not know how realistic it is to get the maximum salary, but even the average figures are enough to get by on.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Counseling | 62601 | 1500 | 221850 | 2504 |
Construction, real estate | 55855 | 20 | 949989 | 6455 |
Top management | 50826 | 11310 | 400000 | 111 |
Extraction of raw materials | 38192 | 8000 | 100000 | 328 |
Security | 34617 | 3954 | 100000 | 5844 |
Medicine, Pharma | 34475 | 450 | 200000 | 11776 |
Transport, logistics | 33600 | 500 | 150000 | 8000 |
Science, education | 31426 | 1100 | 124510 | 1660 |
Sales | 30444 | 1 | 350000 | 52566 |
Installation and Service | 30360 | 8264 | 80000 | 381 |
And now the most interesting part: who pays the most? Here all the vacancies found are sorted without any filters.
Of course, it is top management. Who would doubt that.
A curious fact: if you look at the average salary across all the tables, you can see that it does not differ that much.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Top management | 78789 | 150 | 2000000 | 2408 |
Extraction of raw materials | 61699 | 8000 | 180000 | 2302 |
Counseling | 59797 | 1500 | 500000 | 3762 |
Information technology, Internet, telecom | 52777 | 26 | 684804 | 25900 |
Construction, real estate | 48587 | 20 | 949989 | 33229 |
Production | 42007 | 1 | 261000 | 27269 |
Working staff | 41203 | 25 | 200000 | 43079 |
Car business | 38555 | 20 | 824254 | 9269 |
Installation and Service | 38412 | 25 | 180000 | 2390 |
Purchases | 37846 | 50 | 261000 | 2658 |
And here is the easiest way: why study for 5 years if you can just clean an office? Below is the result of filtering the top vacancies by the query "cleaning *".
What if you take a job at several offices and come in for a couple of hours of cleaning in the evening? You could live quite luxuriously that way. Let's call it a life hack.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Top management | 63000 | 40000 | 87000 | 8 |
Marketing, Advertising, PR | 50000 | 50000 | 50000 | 6 |
Extraction of raw materials | 45000 | 45000 | 45000 | 3 |
HR management, trainings | 33246 | 7908 | 87000 | 58 |
Accounting, management accounting, finance companies | 32000 | 30000 | 35000 | 10 |
Security | 31507 | 20000 | 70000 | 6 |
Sales | 29696 | 4737 | 55000 | 159 |
Construction, real estate | 29024 | 413 | 80000 | 73 |
Transport, logistics | 24987 | 10990 | 45000 | 26 |
Car business | 24465 | 7124 | 45000 | 61 |
Finally, I decided to check the number of open positions by city. The top places are not surprising, but further down there are curious and even unexpected entries.
Name | Number |
---|---|
Moscow | 31137 |
St. Petersburg | 11745 |
Minsk | 7608 |
Almaty | 4386 |
Kiev | 3398 |
Yekaterinburg | 3182 |
Novosibirsk | 3097 |
Kazan | 3066 |
Ufa | 2980 |
Nizhny Novgorod | 2876 |
All the code from the article, with improvements and instructions, is available in the repository.
Source: https://habr.com/ru/post/418281/