In the case of SEMrush, it is pointless to ask "which languages and technologies does the company use": each team there is given maximum autonomy, and what is "common to all" is kept to a minimum. Asking a specific team, however, makes perfect sense.
We learned that one of the projects uses Scala, C++, Spark and ClickHouse. Scala is a non-standard choice in itself, the combination with C++ is rarer still, and the ClickHouse DBMS from Yandex is not the most common choice either, so we decided to ask a few questions about how it all works in practice.
Alexander Morozov answered our questions.
- First, tell us which team you are on at SEMrush and what your position is.
- Backend developer, Maroon team, Traffic Analytics project. We do something like Google Analytics in reverse: using all sorts of clever statistical methods, we compute metrics for sites all over the Internet.
- Why did you decide to use Scala: because you have Spark, or for other reasons? There is an opinion that plain Java is just as good for working with Spark. What does your experience say about that?
- Objectively, of course, because Spark is written in Scala. Java works perfectly well, but once the standard functionality is no longer enough, it is more convenient to add your own modules in Scala. Personally, I don't see how one could easily write implicit classes that extend Spark structures in Java; Scala would have to be pulled into the project anyway.
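To illustrate the implicit-class point, here is a minimal sketch in plain Scala. It is not code from the project: a standard `Seq` stands in for a Spark structure so that the example runs without a cluster, and the names `SeqSyntax`, `KeyedCounts` and `sumByKey` are hypothetical.

```scala
object SeqSyntax {
  // Hypothetical extension: adds a sumByKey method to any Seq of (key, count) pairs.
  // In a real project the wrapped type would be a Spark structure; Seq is an assumption
  // made here so the sketch needs nothing beyond the standard library.
  implicit class KeyedCounts[K](rows: Seq[(K, Long)]) {
    def sumByKey: Map[K, Long] =
      rows.groupBy(_._1).map { case (key, pairs) => key -> pairs.map(_._2).sum }
  }
}

object Demo {
  import SeqSyntax._ // brings the implicit class into scope

  def main(args: Array[String]): Unit = {
    val hits = Seq("a.com" -> 3L, "b.com" -> 1L, "a.com" -> 2L)
    // The call looks like a built-in method, although Seq itself was never modified.
    println(hits.sumByKey)
  }
}
```

The caller only needs the import; the existing type gains the method without inheritance or wrappers at the call site, which is the convenience that has no direct Java equivalent.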
And then, the best documentation is the code. Sometimes you have to look into the Spark source, and there is no way to do that without knowing the language it is written in. A real-life example: defining and registering an SQL dialect so that Spark could insert data into ClickHouse via JDBC. Naturally, this was not documented; I had to dig through the Spark code to understand where the error came from and what needed to be implemented to eliminate it.
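Registering a dialect of this kind might look roughly like the sketch below. It assumes Spark SQL is on the classpath and uses Spark's `JdbcDialect` and `JdbcDialects.registerDialect` API; the object names and the specific type mappings are illustrative assumptions, not the team's actual code.

```scala
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Hypothetical dialect: the mappings below are assumptions chosen for illustration.
object ClickHouseDialectSketch extends JdbcDialect {
  // Spark asks each registered dialect whether it handles a given JDBC URL.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:clickhouse")

  // Map Spark SQL types to ClickHouse column types used when writing.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("String", Types.VARCHAR))
    case LongType    => Some(JdbcType("Int64", Types.BIGINT))
    case IntegerType => Some(JdbcType("Int32", Types.INTEGER))
    case DoubleType  => Some(JdbcType("Float64", Types.DOUBLE))
    case _           => None // fall back to Spark's default mapping
  }

  // ClickHouse accepts backtick-quoted identifiers.
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

object DialectSetup {
  // Call once at application startup, before the first JDBC write.
  def register(): Unit = JdbcDialects.registerDialect(ClickHouseDialectSketch)
}
```

After `DialectSetup.register()` has run, Spark consults the dialect for any JDBC URL that `canHandle` accepts.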
As for the subjective side, it's a long story. I previously worked in telecom, where I picked up a love for actors and became interested in functional programming: I started with Erlang, then Haskell, then Scala. Scala sits at the intersection of OOP and FP, and it has Akka as well, so the choice is obvious :)
- Since few people use Scala in Russia, I'd like to ask: what has the practical experience been like, what is good and what is painful?
- On the whole, I am happy with the conciseness of the code, the speed of development and the number of libraries (including those written in Java). After Erlang, I miss proper pattern matching. It is not exactly painful, but errors can be hard to make sense of. On the one hand, implicits let you write concise and flexible code; on the other, it is sometimes hard to understand what went wrong even at compile time. Then again, I still shudder when I remember the walls of text gcc prints for errors in template classes.
Most of the pain actually comes from purely infrastructural things. A computing cluster is a rather capricious thing. One time a calculation crashes halfway through for lack of memory. Another time one of the disks suddenly runs out of space. Or a node drops out because of network lag in the data center. Bugs in a distributed system are a whole separate conversation. Diagnosing and fixing such problems is quite a chore.
- Both Scala and C++ have a reputation as languages “not for the faint of heart”. How do you live with this combination in one project? Does it turn out much more “hardcore” than a typical one, and is it hard for a junior to get into?
- I think C++ is the harsher of the two. Just debugging a "double free or corruption" error or a memory leak can be quite something. I have had to do all sorts of things, including manually analyzing memory dumps in a hex editor. In Scala there are probably fewer ways to shoot yourself in the foot, although it has plenty of subtleties too, of course.
And for a junior, the hardest part is sometimes not to write C++ as "C with classes" and Scala as "Java with a strange syntax" :)
- Is there a price to pay for such a choice of languages and technologies: everything is technologically powerful, but is it hard to find people for such a project?
- Of course. We have just opened a vacancy for a second backend developer. But since finding people for a not-so-mainstream language is quite a problem, we are not looking for a 100% match. Someone who knows, say, C++ and Scala is quite suitable. Or Scala/Java and Spark, but without C++. The rest we will teach.
- Let's move on to ClickHouse: what were the reasons for choosing it?
- We needed a database capable of storing and processing hundreds of billions of events in reasonable time. ClickHouse is that database.
- Yandex says fine words like “within its narrow niche, ClickHouse has no alternatives”. How well does your practical experience confirm this? Who would you recommend ClickHouse to?
- Well, at one point we seriously considered Vertica. I also had a working prototype of the analytical part of our current system built exclusively on Spark/Hive, but it required a lot of intermediate “manual” steps and, as a result, ran much slower. Let's put it this way: alternatives exist in some form, and you have to weigh the pros and cons for a specific project. ClickHouse is great for projects connected in one way or another with web analytics.
- ClickHouse is a relatively young project (at least if you count from its move to open source). Does that get in the way? Do you feel like “pioneers” who hit all the bumps and file the bug reports?
- Sometimes the lack of seemingly obvious functionality is baffling. Some of it is on the roadmap for the coming quarters. We have run into non-critical bugs, but we were not pioneers: they had all been reported already. On the whole, the core functionality is in place and bugs are fixed fairly quickly.
- Does the fact that ClickHouse was created in Russia affect how you use it? Is there a feeling that its developers are “within arm's reach”? Do you communicate with them?
- ClickHouse has a Russian-language Telegram chat where you can get answers from the developers very quickly. For me it was a pleasant surprise: even with corporate products that have extremely expensive support, diagnosing and solving a problem can turn into a week-long “Re[100500]:” correspondence. Taking this opportunity, I would like to thank the Yandex team in general and Alexey Milovidov personally for their help.
- In your opinion, how much did the “independence of teams” at SEMrush affect your choice of technology stack? Is it what allows you to pick something non-standard?
- I think hardly at all. We did not wake up one day and say to ourselves, “from now on we write in Scala”. That kind of immaturity is unacceptable for a developer. At first the project was exclusively in C++. At some point we wrote a prototype for research that was not used in production. But it turned out to be so quick to write, parallelizing the calculations became so much easier, and development got such a boost that we decided to adopt it. Judging by my previous jobs, the process would be similar in a more “traditional” company: write a prototype, assess the pros and cons, “sell” it to management, as they say, and implement it. Of course, an important condition is ensuring support and further development: several people in the company must be familiar with the stack and able to “pick up” the project.
- Given that the team chose the stack independently, I'd like to understand how the experience of your “neighbors” comes into play. When your stack was taking shape, were its components actively used by other teams at SEMrush, and did that affect the decision?
- Naturally, we are interested in what our neighbors are using. Spark was used by the R&D team, back when we still had one, for its own research and statistics. We picked it up from them. We started studying and adopting ClickHouse together with another team when the question of data storage came up.
- And how actively do you influence others now? Do you often share your Scala and ClickHouse expertise within the company, and does this lead to “oh, we want that too”?
- Another team has started writing in Scala. And ClickHouse has really caught on here, as they say. We actively talk and exchange experience with other teams. It's not that we promote it deliberately, but conversations along the lines of “I hear you have ClickHouse? Tell us about it, we're thinking about it too” come up regularly.
