Why there are so few committers to large open source projects in Russia

Throughout my short professional career, I have enjoyed working with large open source frameworks: Lucene, Solr, Hadoop (MapReduce and YARN), Spark, Zeppelin, IPython, and so on. When choosing between developing a proprietary product and building on open source, I always choose open source, for the following reasons:

Jedi development. A Jedi is first and foremost a person who can single-handedly change the fate of the universe (the proverb "one man in the field is no warrior" does not apply to him). Some open source frameworks let you solve complex technical problems simply by deploying ready-made solutions. In theory, you can write your own MapReduce, your own distributed file system, and even your own real-time database. But it will take a lot of time and the result will be worse than the existing solutions.

Writing your own Spark outside the Valley, however, is no longer feasible: the system is simply too complex and requires too many very highly skilled developers. And why write all this when an entire big data stack can be brought up in two days? Terabytes of logs? Cassandra + Spark + Zeppelin. An experienced person can set everything up from ready-made Docker containers in one day.

Apache Spark ships a release with major features about once every three months. That has meant a radical increase in stability, new tools (Spark SQL, DataFrame, GraphX), and a growing number of implemented algorithms (gradient boosting in MLlib). Over a couple of years Solr learned sharding and can therefore work with big data. Hadoop was reborn as YARN. These frameworks gain useful new functionality without any effort on my part, so I can solve the tasks set for me more effectively. In a proprietary product, my life would get easier only if I myself invested heavily in making it easier.

Good documentation. Very few top-level Apache projects have poor documentation. In the Apache Incubator, bad documentation is more common. But even then, because the project is open, it has users who leave traces of their research on StackOverflow. What is usually the first step in a proprietary project, going directly to the author of the code, is the last resort in open source. In two years of working very closely with Spark, I had to write to the dev mailing list only twice.

An established community. In open source, I always have a sense of support and of belonging to a circle that will always help with a well-posed technical question. It feels like having wonderful colleagues around the globe. And they stay with you even if you change companies, as long as you do not change the framework.

Work for yourself. By working with open source you build expertise in it and grow quickly, both in salary and professionally. If you need to change jobs, there are five companies on the market whose technology stack you already know and where you can be useful from day one. You do not need to spend half a year getting up to speed when moving from one proprietary stack to another. It is easier for companies too: they can hire employees who need practically no training.

All of this is a plus for both employees and employers in Russia. And to take advantage of these benefits, there is no need to be a committer. It is enough to be a contributor. For those who do not know, here is briefly how they differ. A contributor is a person who submitted a patch to the project that a committer then merged into master. A committer is a person who has the right (and the duty) to review patches and commit them to master regularly.

The contributor is thus the ideal employee in Russian realities. He knows the project well, since he has used it enough to understand where it can be improved. He was able to improve it. He can build the project from source and edit the source, which means he can always dig into the code and customize it if needed.

Being a contributor is cool: you do not have to pay 300 bucks for Spark certification, and no one will question your competence in the framework.

A committer has more expertise in the project but, more importantly, more power.

He can push through a patch that benefits his employer. He can block a patch that does not. He can shape the direction of the project. But power is not free. He really has to work at building and maintaining his authority: read endless, useless patches, write architectural Google Docs, answer questions. It is almost impossible to do this in your free time, because it is a great deal of work. So the committer does it at the employer's expense. And what does the employer get out of it? Let's look at the list of committers in Spark:
Aaron Davidson (Databricks)
Andrew Or (Databricks)
Andrew Xia (Alibaba)
Andy Konwinski (Databricks)
Ankur Dave (UC Berkeley)
Charles Reiss (UC Berkeley)
Cheng Lian (Databricks)
Davies Liu (Databricks)
Haoyuan Li (UC Berkeley)
Imran Rashid (Cloudera)
Jason Dai (Intel)
Joseph Bradley (Databricks)
Joseph Gonzalez (UC Berkeley)
Josh Rosen (Databricks)
Kay Ousterhout (UC Berkeley)
Mark Hamstra (ClearStory Data)
Matei Zaharia (Databricks, MIT)
Michael Armbrust (Databricks)
Mosharaf Chowdhury (UC Berkeley)
Mridul Muralidharam (Yahoo!)
Nick Pentreath (Mxit)
Patrick Wendell (Databricks)
Prashant Sharma (Imaginea, Pramati, Databricks)
Ram Sriharsha (Hortonworks)
Reynold Xin (Databricks)
Robert Evans (Yahoo!)
Ryan LeCompte (Quantifind)
Sandy Ryza (Cloudera)
Sean McNamara (Webtrends)
Sean Owen (Cloudera)
Shane Huang (National University of Singapore)
Shivaram Venkataraman (UC Berkeley)
Stephen Haberman (Bizo)
Tathagata Das (Databricks)
Thomas Dudziak (Groupon)
Thomas Graves (Yahoo!)
Xiangrui Meng (Databricks)
Yin Huai (Databricks)

Spark originated at UC Berkeley, so we can set all of Berkeley aside. Databricks, the company formed by the founders of Spark, makes money from Databricks Cloud, an analytics tool built on top of Spark. Spark is in effect their main product, so they have to invest in it. Yahoo has always built its infrastructure on open solutions: first it was Hadoop, now Spark. Companies of this kind need committers for the following reasons:

I do not know for sure, but I think Alibaba invests no less in infrastructure than Yahoo. Groupon is smaller, but still. For ClearStory Data, Spark is the main engine.

Intel needs to be sure that Spark runs well on Intel hardware. Cloudera and Hortonworks are vendors of Hadoop (and therefore of Spark). They have to represent not only their own interests but also those of their customers. Companies for which big data and IT are the core business are much more interested in committers. MapR, SAP, Oracle, and IBM are now actively looking for Spark committers (although I do not understand how you can actively search among only 30 people whom everyone knows by name). And they are willing to pay good money. Becoming a Spark committer in the Valley is guaranteed to double your salary, even if it was already high.

There are no companies in Russia willing to pay big money for committers. IT integrators do not have the scale of IBM and SAP, neither in turnover nor in the ambition to shape the development of the industry. They follow the trends set in the Valley. A committer simply cannot benefit them.

Product companies in Russia are either small or sit on a proprietary technology stack. Yandex tries to follow the Google model, where all development is done in-house. As I understand it, this position rests on the idea that in-house development is faster and more effective than any open source once a company can build a critical mass of experienced specialists. I am not familiar with the details of the VKontakte infrastructure, but it is also proprietary. Odnoklassniki does use Spark; why they are not visible in the community, I cannot say.

Thus, in Russia a committer finds it very hard to come out ahead of an ordinary contributor, whether in money or in opportunities.

I consider the absence of companies in Russia that would want to employ committers (and would benefit from doing so) to be the main reason for the low number of committers.

Source: https://habr.com/ru/post/259657/

