
Why GitHub won't help you hire developers

One of my current projects involves collecting data from developers' GitHub profiles. GitHub profiles are a difficult data source, so I want to start by listing the problems that come up when you try to evaluate a developer solely by their GitHub contributions.

One common mistake is employers trying to screen candidates by their GitHub profiles. Many people still think you can gauge a developer's abilities by looking at their contributions to open source projects. For example, in the latest batch of job postings on Hacker News, a number of ads ask you to include your GitHub profile in your application.

There are a few good articles on why you shouldn't demand GitHub profiles from candidates. I especially recommend “The Ethics of Unpaid Labor and the OSS Community” and “Why GitHub is not your CV”. Both do an excellent job of explaining why asking about contributions to free projects is problematic when hiring. I'm not going to argue here that it's unethical, or that GitHub is a poor place to showcase projects.

Instead, I want to explain why these profiles are of little practical use.

Sparse data


If you look at the public profile of the best software engineer I've ever worked with, you'll see something like this:



Although he wrote a ton of code at work last year, he published nothing publicly: no public commits, no repositories of his own. He has very few followers. Despite all this, he is still the best developer I've ever had the pleasure of working with.

He's not alone in having a relatively inactive GitHub profile: the vast majority of GitHub users look much the same. To quantify this, I collected the public commits of all users from the GitHub Archive and got the following figures:


This distribution roughly follows a power law (or at least something similar). It means that most users are relatively inactive, while a small number of accounts have hundreds of thousands of commits per year.
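One way to check a “roughly a power law” claim yourself is the standard maximum-likelihood estimate of the exponent. The sketch below is not the author's method; it assumes you already have one activity count per user (commits or followers), uses the continuous approximation, and drops zero counts, since a power law cannot describe users with no activity at all. The function name and the default `x_min=1.0` are illustrative choices.

```python
import math
import random


def power_law_exponent(values: list[float], x_min: float = 1.0) -> float:
    """Estimate alpha for p(x) ~ x**(-alpha), x >= x_min, by maximum likelihood.

    Continuous approximation: alpha = 1 + n / sum(ln(x_i / x_min)).
    Values below x_min (including inactive users with 0) are ignored.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need some values above x_min")
    return 1 + len(tail) / log_sum


# Synthetic check: random.paretovariate(1.5) draws from a density
# proportional to x**-2.5 on x >= 1, so the estimate should land near 2.5.
random.seed(0)
sample = [random.paretovariate(1.5) for _ in range(100_000)]
print(round(power_law_exponent(sample), 2))
```

On real, discrete counts this is only an approximation, and the result depends heavily on the chosen `x_min`.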

As another example, here is a chart of the percentage of GitHub developers with a given number of followers:



Getting commit counts from the GitHub API is difficult [1], so I'm showing follower counts instead, which follow a similar distribution. If you enter your GitHub user ID on the original article page, you'll get a short report showing where you rank.

The upside is that even with 10 followers you can safely say you're in the top 1% of all developers.
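A “top 1%” figure like this is a percentile rank. Here is a minimal sketch of how such a report could compute it, assuming you have one follower count per user in a list and defining “top X%” as the share of users with at least as many followers (the article doesn't say which definition it used). The function names and the toy data are made up for illustration.

```python
from bisect import bisect_left


def top_percent(followers: int, all_counts: list[int]) -> float:
    """Percentage of users with at least `followers` followers.

    A result of 1.0 means the account is in the top 1%.
    """
    ordered = sorted(all_counts)
    # bisect_left finds the first position whose count is >= followers,
    # so everything from there to the end has at least that many.
    at_least = len(ordered) - bisect_left(ordered, followers)
    return 100.0 * at_least / len(ordered)


def share_with_none(all_counts: list[int]) -> float:
    """Percentage of users with zero followers (or zero commits)."""
    return 100.0 * sum(1 for c in all_counts if c == 0) / len(all_counts)


# Made-up toy data for 100 accounts, shaped like a heavy-tailed distribution.
toy = [0] * 88 + [1] * 5 + [3] * 4 + [10, 50, 1560]
print(top_percent(10, toy))  # 3.0
print(share_with_none(toy))  # 88.0
```

On a real dump of tens of millions of accounts, sorting once and reusing the sorted list for every lookup keeps each query to a binary search.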

The downside is that if the overwhelming majority of developers have no data in their public profiles, these profiles can't be used to filter job candidates: 83% have no commits in the last year, and 88% have no followers. That doesn't make them bad developers. They simply have no open source contributions to show off.

GitHub only shows open source contributions


This one is obvious: a public GitHub profile really only shows someone's open source work. The overwhelming majority of software is closed source, and a lack of open source contributions means almost nothing if you've spent your entire career working on proprietary technology.

I think it's most instructive to compare well-known programmers who work on open source with equally well-known programmers who work on closed source. For example, Linus Torvalds has the most followers on GitHub. That's deserved: he created several incredibly successful open source projects, such as Linux and git. On the other hand, John Carmack and Jeff Dean have no GitHub profiles at all, even though both are famous for their work on equally successful closed source projects, such as Doom and Google.

I've always thought that demanding examples of open source commits when hiring developers for a proprietary project is the height of hypocrisy. It's like companies that require references from candidates but forbid their own staff from giving references for former employees. If your company doesn't let people work on open source, it makes no sense to require open source work as a condition of getting hired.

Even setting the hypocrisy aside, judging candidates by their GitHub profiles is questionable when it excludes most developers, including people like John Carmack and Jeff Dean. I think hiring should pass a kind of “Jeff Dean test”: if your requirements for a developer job would rule out someone like Jeff Dean, you're probably doing something wrong.

Most GitHub projects are not impressive


Even among the small fraction of developers who do have projects on GitHub, most of those projects aren't particularly impressive.

These days, many programming courses and universities require students to create GitHub repositories as part of the curriculum. While I fully support teaching new programmers version control, the resulting projects tell me nothing except that someone completed the course. For example, there are about 190,000 GitHub repositories named "datasciencecoursera".

In addition, of the more than 78 million repositories in the GitHub Archive, about 1.1 million are named "hello-world" and 1 million are named "test".
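Figures like these come from tallying repository names. A sketch, assuming each record gives the repository's full name in `owner/name` form (as in the `repo.name` field of GitHub Archive events) and that names are compared case-insensitively; both are my assumptions, not necessarily how the numbers above were produced, and the function name is hypothetical. Because the archive stores events rather than repositories, the sketch deduplicates full names first so that a busy repository isn't counted once per event.

```python
from collections import Counter


def most_common_repo_names(full_names: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count repository names, ignoring the owner part of "owner/name"."""
    # Each repository is counted once, however many events mention it.
    unique_repos = {full.lower() for full in full_names}
    names = (full.split("/", 1)[-1] for full in unique_repos)
    return Counter(names).most_common(n)


# Hypothetical sample of event records; note the repeated alice/hello-world.
sample = [
    "alice/hello-world", "bob/Hello-World", "carol/test",
    "alice/hello-world", "dave/datasciencecoursera", "erin/test",
    "frank/hello-world",
]
print(most_common_repo_names(sample))
# [('hello-world', 3), ('test', 2), ('datasciencecoursera', 1)]
```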

Follower counts measure popularity, not talent


Since there are so many mediocre projects on GitHub, it might seem logical to take seriously only projects with many stars, or profiles with many followers. Even ignoring the fact that this shrinks the candidate pool even further, it's still far from an effective way to assess a developer, because it measures popularity, not talent.

As an example, take a look at the Stichpunk GitHub profile:



This profile has about 1,560 followers, which puts her in the top 0.002% of developers on GitHub. She also has a few fairly popular repositories and appears to work at a large tech company. At first glance, she looks like a fairly respectable developer.

But this profile doesn't belong to a real person. It was created by the writers of the TV series "Silicon Valley" for the episode "Tabs versus Spaces":


Whenever you're tempted to treat GitHub follower counts as a meaningful signal, remember that a fake profile for a character from a single episode of a TV series has more followers than 99.998% of developers, and ranks around 670th in the world.

Likewise, if you look at the most popular repositories by star count, many of them are lists or jokes. Creating such a popular list or joke may not be easy, but it says nothing about a developer's talent.

Many popular projects are, of course, genuinely impressive. I'm not claiming that popularity and quality don't correlate at all. The problem is that making a project popular is a completely different task from writing high-quality code, and it requires a completely different set of skills.

Interviewers do not check GitHub profiles


Not only are GitHub profiles of little use for hiring developers, they also don't do much for developers who are looking for a job.

I've had a number of interviews over the past decade, but as far as I can tell, none of the interviewers looked at my GitHub profile beforehand. In fact, one of them even admitted that he hadn't read my resume, let alone my GitHub profile, which struck me as refreshingly honest.

Judging by other people's accounts, this is quite common. For example, Dan Luu tweeted:

“Despite the hype about how open source helps your career and how github == resume, my code was looked at in only 2 out of 50 interviews.” - Dan Luu

Dan Luu is in the top 1,000 on GitHub by follower count, so my experience isn't just a consequence of my relatively modest GitHub portfolio.

Similarly, a developer writing on hftguy.com checked his GitHub analytics after going through many interviews and found that his projects had been viewed by only one person (who may well have been himself, testing it):

“After a dozen phone interviews (one interviewer per call) and several on-site interviews (4 to 7 interviewers each), my profile was viewed only once. Conclusion: nobody cares about GitHub. No one will look at it.”

Although GitHub profiles are a difficult data source, I still plan to use them in an upcoming project. The idea is that while individual profiles give a sparse, noisy signal, combining information from a large number of users can still reveal some interesting trends.

Footnote 1
GitHub has no API for getting activity statistics for the past year. As far as I can tell, the contribution timeline is served as an SVG (like this), which could be parsed. I almost got this hack working for this article, but ran into CORS restrictions when loading it in the browser. I could have written a proxy for the requests, but that was getting out of hand, so I went with followers instead.
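For the curious, the parsing half of that hack is small. A sketch, assuming the calendar markup of the time, where each day was a `<rect>` carrying `data-count` and `data-date` attributes; GitHub has since changed this markup, so treat those attribute names as an assumption, and the function name as hypothetical. A plain script rather than a browser would not be subject to CORS at all, since CORS is enforced by browsers; the example parses an inline sample instead of fetching anything.

```python
import xml.etree.ElementTree as ET


def total_contributions(svg_text: str) -> int:
    """Sum per-day counts from a contribution-calendar SVG.

    Assumes each day is a <rect> element with a data-count attribute.
    """
    root = ET.fromstring(svg_text)
    total = 0
    for element in root.iter():
        # Strip a namespace prefix such as "{http://www.w3.org/2000/svg}".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "rect" and "data-count" in element.attrib:
            total += int(element.attrib["data-count"])
    return total


sample_svg = """<svg xmlns="http://www.w3.org/2000/svg"><g>
  <rect class="day" data-count="2" data-date="2018-01-01"/>
  <rect class="day" data-count="0" data-date="2018-01-02"/>
  <rect class="day" data-count="5" data-date="2018-01-03"/>
</g></svg>"""
print(total_contributions(sample_svg))  # 7
```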

Source: https://habr.com/ru/post/350912/

