
"We'll have to write it ourselves." So they sat down and wrote it: the life of the developers of Sbertech's laboratory cluster for super-large data arrays

There is a myth that banks are very rigid organizations with no room for experimentation. To challenge this myth, we held a short interview with Valery Vybornov, head of the department developing the laboratory cluster for super-large data arrays at Sberbank-Technologies. His team is not afraid to use the full power of Scala, Akka, Hadoop, and Spark, and even writes prototypes in Rust.




- Tell us a little about yourself.


- I came to Sbertech almost three years ago and took part in building the Big Data infrastructure. We worked as a single team, together with specialists from Sberbank itself.


What tasks did we have to solve? All of them, including recruiting. There are now 47 people on staff in the department. I hired employees and set up the department's workflow. I took part in building the cluster and developing the pilot and the prototype (that was the first phase of the Big Data build-out).


- Where did you come to Sberbank from?


- From Video International. We did, in essence, the technical side of online advertising: we built an ad server that was used there for some time. I was also involved in building Big Data there and took part in creating a platform that was later spun off and, until recently, was known on the market as Amber Data. It has now been acquired by NMG.


- When someone joins a large company straight into a management position, the question usually arises: what key skills made it possible to reach such heights?


- It is hard to say which mattered more; different skills came in handy to different degrees. First, of course, experience with people — management experience. One way or another, I have been working in managerial positions for quite a long time, more than ten years now. The second thing that helped is that in all those ten years I managed not to lose touch with technology. I have not forgotten how to do things with my own hands.


- Did you do that in your free time? How much time did you spend on programming?


- Let's put it this way: at Sberbank, in the early stages when it was all starting up, up to 80% of my time. Once the department appeared, it gradually dropped to the current level of about twenty percent, or even lower.


- Do you miss programming?


- There is a slight nostalgia. On the other hand, the tasks at the higher level — management tasks — are also quite interesting.


- When you were programming, which tasks did you like most? And you ended up with not just any division, but a Big Data division. Does that have something to do with personal preferences?


- Of course. As the division developed, it became what it is now: the IT Area for Big Data application development.


- What is an IT Area?


- Sbergile is a matrix structure. Vertically there are "tribes"; horizontally there are "IT Areas" and business chapters. In essence, an IT Area is a group of people with the same IT competence who work in different teams.


- When people hear about Sbertech, they are interested in more than specific technologies — which are hard for outsiders to use anyway, since they are all in-house and very particular. Let's discuss the hype topics, including agile. After German Gref's talk, everyone and their dog has heard of it. So how do you feel about it (if at all)?


- This is a complex topic with many interesting problems. What specifically should I talk about?


- What are the prospects of introducing agile in such a large company? People somehow got by with conventional project management before. German Gref speaks in high-level abstractions, but what does this mean for specific people? For a specific developer? For a specific unit manager? In short, we need an insider's view of how to implement agile at scale.


- Specifically, it means several things. For example, debossing: leaders are no longer "bosses" the way they were before.


- "Debossing." Take the boss and remove him. Eliminate!


- Yes :-) That is what is happening now. Although it is probably better to put it differently: IT professionals used to sit separately from the business, and now we sit together with the business. That changes our working conditions and some of our tasks. We work together to understand customer needs and improve products.


- How does it affect the life of a specific developer? Are the rules becoming stricter or, on the contrary, softer — or what?


- Neither stricter nor softer. The degree of responsibility of each team and each individual employee increases. There is no one to ask and no one to pass a decision off to — you make decisions yourself and bear responsibility for them.


If before Sbergile the IT specialists sat in one location, in one office, now even in Moscow they are spread across several offices. If earlier people sat within a department, now they sit in teams together with the business.


- Who is your product manager?


- There is a product owner, and for business projects it is usually a person from the business. He develops the product, prioritizes the backlog, and helps solve the problems the team faces. The idea is that a team is created together with the product, for the product's entire lifetime. Previously, a team was formed only for the duration of a project.


- Does an ordinary person really move from one product to another? How are things with internal mobility? Say you have worked three years on one product — can you somehow move on from there...


- Of course. If a person has reached the limit of growth within the current team, he can initiate a rotation and pick another team.


- Cool. And how are things with people you can learn from — people whose level is an order of magnitude above your current knowledge and skills? So that you progress not only in your career, but also in knowledge.


- Of course, there are experts in all IT Areas. In terms of knowledge, the gap between a Grade 12 expert and a novice Grade 8 developer is exactly what you describe. We try to stimulate such processes in every way at the IT Area level — first of all through sharing knowledge and sharing competencies. Just yesterday there was a meeting on this topic: we decided to launch periodic meetings on the main product phases our IT Area takes part in. We also decided to launch our own internal messaging to stimulate knowledge exchange and, with it, the growth of competencies.


- By the way, how would you feel about holding a couple of meetups in Moscow? Can you find interesting speakers with interesting stories?


- Yes, we view that positively. There will be speakers. Our speakers will take part in conferences such as JBreak and JPoint, among others. We often have internal talks, and their level is such that you could take them to an outside audience.


- Returning to the topic: what does a developer's progression look like in the company? You said Grade 8 is a novice developer — what would elsewhere be called a plain "engineer", without the "senior" or "chief" prefixes. What career can be built from there?


- As competencies grow, one can rise to Grades 9 and 10 — specialists without a managerial burden but with higher competencies. A growing grade means growing compensation. Starting from Grade 11, things get more serious. Grade 11 used to be called a "development manager", that is, a team lead. With the transition to Sbergile everything has changed a bit: now there is debossing, and there are no team leads as such. In effect, these are the people through whom architectural policy is carried out. Their role is very, very large, in the sense that they can tell other employees how to do things so that everything is done correctly. Grade 12 is the coolest specialists — those who can organize and synchronize several teams.


- Is there life after Grade 12?


- There is Grade 13: the lead head of a direction and the expert development strategist. The grades above that are managerial competencies.


- We have discussed Sbergile and organizational issues. Tell us about your department. What do you do?


- We develop applications that do various kinds of data processing on the Hadoop platform: MapReduce, Spark, Hive, and so on. And, of course, machine learning too. We take tasks and turn them into code that runs on these platforms.


- You said you have an application-development IT Area?


- Yes. An IT Area is, in essence, a union of people with the same competencies. It is what a "department" in the Competence Center turns into under Sbergile.


- So you develop applications that automate some business processes?


- These are applications that simply do something for the business or for the infrastructure; they solve various tasks for them. Perhaps the term is somewhat artificial for our team. Historically, we have two large departments in the Competence Center, divided by a very simple criterion: the main development tool. The platforms are the same, but the programming language differs. Mine is Scala, and Vadim Surpin's department uses Java. His unit is called the IT Area of Big Data platform development, and mine is the IT Area of application development. But you have to understand that this division stems from our internal issues, and both IT Areas work on roughly the same tasks. For example, we differ slightly in business clients: I have "security", he has "risks". But again, that boundary is now being erased.


- And which tribe are you in?


- Our employees work in different tribes. A tribe is another axis. I have the infrastructure tribe and corporate business, while Vadim has more of risks and retail, it seems.


- Now I understand where you sit in the common coordinate system: we have talked about the horizontal axis of chapters and about tribes. Now let's discuss a little technology. You said you use Scala everywhere. Why Scala and not Java? Doesn't everyone usually have Java?


- I'm not sure everyone has only Java. There are quite large companies where Scala is one of the main development tools — Tinkoff, QIWI...


- Well, specifically at Sberbank the main tool is Java; it is written into every standard. And yet you went and used Scala. What's going on?


- As a matter of fact, no one demanded that I use Java. As the head of the department at the time, I had to choose a tool that my people would find interesting to use and develop with, and that would let them grow their competencies in this area. Besides, Scala was chosen for many reasons: as a development language, it is very convenient.
Plus, I deliberately took on people who want to grow in this direction: I had the idea of building a Scala development community able to solve our problems.


- Why Scala? What do you like about it? What makes it good for you as a department manager, for example?


- Firstly, it is compatible with Java, and all of Hadoop and Spark run on the JVM. That is a serious requirement that greatly narrows the range of options: it had to be either Java or a JVM-compatible language, and there are not many of those. Clojure, for example, was also interesting to us, but in practice it did not take off because of a peculiarity of ours: it is difficult to write really big applications in it with a large team. In the end, Scala was chosen as the most dynamically developing language that provides the features we need.


- There are also Groovy and Kotlin. What's wrong with them?


- Kotlin, at that point, effectively did not exist yet — it was in its infancy. Groovy existed, but Groovy remains true to its original purpose, a scripting language, and its applicability to large projects raised questions.


- So you chose Scala by elimination, because everything else fell away? Or because Scala has some cool features?


- Cool features. I had been watching it for a long time, starting from my work at Beeline. At my previous job at Video International I considered using it, but back then there was not enough supply on the labor market. After I came to Sberbank, I realized it was time to start, because quite a lot of specialists had appeared by then.


- Have you thought about holding events of your own dedicated to Scala? Inside the company, to carry the word of Scala to the corporate masses — or even to the whole of Russia.


- We are thinking about it and will most likely do it.


- How can a programmer — say, our conditional novice Grade 8 developer — grow once he has been handed Scala? What is best to learn? Maybe some frameworks...


- First, we have quite intensive training on this topic: a full-time course. There are also online courses. We take people who do not know Scala at all but are eager to learn it and have a background in development. Online courses — anything, starting with Coursera — can be taken at the expense of the company's training budget. The basic training, of course, happens on the job: doing tasks, reading documentation, and sharing experience — what your senior colleagues tell you. You come, you are given a task, and you do it while studying in parallel.


- Are there things whose mastery would dramatically help a developer's life? For example, learning some specific technology — a concrete part of Hadoop. All of Hadoop probably cannot be learned, because it is huge.


- Of course. Experience shows that even if a person thoroughly learns something small — such as how a JOIN is done inside Spark — it strongly advances his competence in general, and his standing in his colleagues' eyes.


- I remember your presentation in Innopolis. It was about social graphs.


- Yes, that was a year ago. We had the laboratory cluster project — a multi-stage pilot project on Big Data, when management was deciding whether Big Data was ripe enough to be used here or whether it was necessary to wait. One of the prototypes involved a social graph, and it presented difficulties that the entire team had to tackle: working with an ultra-large graph interactively. The prototype was built, and thanks to it the "Laboratory cluster" project took place and was recognized as a success; everything that is happening now grew out of that. That is the background.


- What is the scale of the problem, and what is the problem itself?


- In terms of technology, the task is simple. There is a social graph: people with connections in social networks. There are several social networks, and there is a matching of people — entities from different social networks — where you need to understand that they are one and the same person.


- What is the business problem?


- Very simple: finding people in social networks in order to deal with problem debts.


- Are we talking only about problem debts, or are there other applications?


- There are many different uses, but when we were solving the problem, the client was the department for work with problem assets. It was a prototype.


- How big is the task? Roughly speaking, do you need to analyze ten people, or the whole of Facebook?


- No — in fact, the whole problem is that the graph is big. There are billions of nodes: a large share of a big social network.


- Did you do cross-joins between different social networks, or did you stay within a single network?


- There were several networks, and we tried to match the data between them.


- It's just that if you use some hypothetical GraphQL on Facebook, you can make queries freely while staying inside it. But here you had to write your own adapter for each social network and reduce everything to one universal form, right?


- No. Here is how it was done: at the start there was one big graph in which each person had several vertices, and each vertex corresponded to a social network. We compare them, connect them, and get one person who is present in several networks.


- And what was primary: did you take Sberbank's existing base and, for each of its elements, mine the social networks, or were the social networks analyzed abstractly for matches?


- The social networks — and Sberbank's base also participated.


- So it "participated" rather than being the source — it was an equal party in this graph?


- Yes.


- What data was collected?


- Public profile data. There was no classified information.


- On the whole, was the prototype successful?


- In general, yes. It was recognized as successful: interactivity was achieved, and a user interface was written. When we showed the result and the interface, everyone said it had turned out great and we should continue. But the continuation has not started yet :-)


- How is it technically implemented?


- A binary file of the graph was formed from the sources, and then that binary data was loaded into RAM via the JVM's Unsafe interface. The point is that in the JVM an array is indexed by an int — two billion elements at most — and our numbers were much higher, so we had to move the data off-heap. Then we brought in Akka and developed our own messaging model, similar to the Bulk Synchronous Parallel model used in Giraph and HANA. We implemented a distributed Dijkstra — the Crauser–Mehlhorn–Meyer–Sanders algorithm — and on that basis achieved very good performance. The task looked roughly like this: given a point A and a point B, find the shortest path interactively (that is, quickly enough). Interactivity was achieved.
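
The off-heap trick mentioned here — working around the JVM's int-indexed array limit with `Unsafe` — can be sketched roughly like this. It is an illustrative reconstruction, not the project's actual code; the class name and layout are assumptions.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// A long-indexed off-heap int array: sidesteps the JVM limit of
// ~2 billion elements per on-heap array by addressing raw memory.
public class OffHeapIntArray implements AutoCloseable {
    private static final Unsafe UNSAFE = getUnsafe();
    private final long base; // start address of the allocated block

    public OffHeapIntArray(long length) {
        base = UNSAFE.allocateMemory(length * Integer.BYTES);
    }

    public void set(long index, int value) {
        UNSAFE.putInt(base + index * Integer.BYTES, value);
    }

    public int get(long index) {
        return UNSAFE.getInt(base + index * Integer.BYTES);
    }

    @Override
    public void close() {
        UNSAFE.freeMemory(base); // off-heap memory is not GC-managed
    }

    private static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        try (OffHeapIntArray a = new OffHeapIntArray(10L)) {
            a.set(3L, 42);
            System.out.println(a.get(3L)); // prints 42
        }
    }
}
```

`ByteBuffer` would be the tamer alternative, but a single buffer is also int-indexed, which is exactly the limit being avoided here.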


- In your talks you usually mention various algorithms: Dijkstra, bidirectional shortest path, and so on. How are they used on this big graph?


- These are all variations on the theme of finding the shortest distance between two points as fast as possible. They are solutions to the same problem.


- How does this help in finding people who are connected to each other?


- By "what is the shortest way they are connected". For example, if we see two points on the graph and the distance between them is one, then most likely it is the same person, because there is a single link. There are many nuances here; the analysts understand them very well.


- The "spaces" on which the graph is built — are they names and surnames, or something else?


- "Spaces"? The graph has vertices and links.


- OK, what is distance in this graph?


- We have edges (links), and each has a weight. The shortest path is the set of edges between two vertices with the lowest total weight.
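
As a reference point for the definition above, a minimal single-threaded Dijkstra over weighted edges looks as follows. This is the textbook version, not the distributed Crauser–Mehlhorn–Meyer–Sanders variant the project used, and the adjacency-matrix representation is purely for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Textbook Dijkstra: w[u][v] is the edge weight from u to v,
// Double.POSITIVE_INFINITY meaning "no edge". Returns, for every
// vertex, the lowest total edge weight of a path from src.
public class Dijkstra {
    public static double[] shortest(double[][] w, int src) {
        int n = w.length;
        double[] dist = new double[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[src] = 0.0;
        // queue entries are {distance, vertex}, ordered by distance
        PriorityQueue<double[]> pq =
                new PriorityQueue<>(Comparator.comparingDouble(e -> e[0]));
        pq.add(new double[]{0.0, src});
        while (!pq.isEmpty()) {
            double[] top = pq.poll();
            double d = top[0];
            int u = (int) top[1];
            if (d > dist[u]) continue; // stale entry, already improved
            for (int v = 0; v < n; v++) {
                double nd = d + w[u][v];
                if (nd < dist[v]) {
                    dist[v] = nd;
                    pq.add(new double[]{nd, v});
                }
            }
        }
        return dist;
    }
}
```

On a graph with edges 0→1 (weight 1), 1→2 (weight 2), and 0→2 (weight 4), the path 0→1→2 with total weight 3 beats the direct edge — exactly the "lowest total weight" criterion described above.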


- How is the weight of an edge determined? What does it consist of?


- It is determined based on, for example, how much we trust the link and what source we received it from. There may be several weight scales, and you may need to search according to different characteristics.


- Are these weights eventually reduced to a single number, or do they remain a vector?


- No, we search for a path and then examine it. We need the path in its entirety. If a task requires collapsing it to a number, that can be done.


- The word "interactive" has already come up several times. Can you spell out what it means?


- It is when a person sits in front of a computer, poses a task, and expects to receive an answer within an acceptable time: a delay measured in seconds, a minute at most. Clearly this differs from how Hadoop usually works — there are batch jobs, large batches that can churn for hours on end. It is important to understand that here we had to depart from Hadoop's main paradigm.


- But does Hadoop answer the requests anyway?


- No, Spark is used here only for data preparation. It makes a binary file, which is pulled into RAM, and from then on the system works on that. In principle, this can also be considered Big Data, but Hadoop is not used there; the main thing used there is Akka.


- Hadoop works as an Internet crawler — do I understand correctly?


- In general, yes. It prepares the data: collects it, processes it...


- It pulls data out of the Internet and puts it into that structure?


- Let's just say that our Hadoop does not pull it from the Internet directly. It was a complex multi-step arrangement involving other automated systems; I cannot reconstruct the exact picture now.


- Who did the crawling?


- A contractor. We did not crawl anything ourselves; it is not quite within our competence. Such questions came up, but in the end we decided to stick to our own specialty: processing Big Data.


- Did any interesting problems come up while implementing all this?


- Of course they did. At first we simply took an off-the-shelf graph database from one large vendor, and it turned out that the wait times there ran into hours.


- Waiting time in which scenario?


- Say an employee of the Problem Assets Department, searching for affiliated companies, saw two points on the graph, wanted to check the connections between them, and set the task of finding the shortest path — and it ran for several hours. Of course, that is no longer interactive. That approach was rejected, and after going through several solutions, we realized we would have to write it ourselves. So we sat down and wrote it. The JVM solution was actually written second: the prototype was written in Rust.


- Why did you move away from Rust?


- First, Hadoop is not written in Rust. And you cannot say we abandoned it completely; in the future we may yet write in it. But writing applications for Hadoop in it is, to put it mildly, inconvenient, because it is not a JVM language. In this case, Rust was used simply because the employees who wrote the prototype were good at Rust.


- Still, you need to receive data from the Hadoop stack and hand it over to Rust. How did you organize the interop?


- The data is prepared by Spark. The Spark application, of course, was not written in any Rust; it was written in Scala. The prepared data is then moved into RAM.


- So they communicated with each other through a file?


- Yes. Spark generated a very large binary file, and then the application — first written in Rust, and later rewritten for the JVM, in Scala with Akka — grabbed that file, sucked it in, and worked on it.
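
The file hand-off can be pictured like this: one side (standing in for the Spark job) writes a flat binary edge list, and the other memory-maps it and reads fields by offset. The record layout — `long src, long dst, double weight` — and the class name are assumptions for illustration, not the project's real format.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Writer/reader for a flat binary edge list: fixed 24-byte records
// (long src, long dst, double weight), big-endian on both sides.
public class EdgeFile {
    static final int RECORD = 8 + 8 + 8;

    public static void write(Path p, long[] src, long[] dst, double[] weight)
            throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(p)))) {
            for (int i = 0; i < src.length; i++) {
                out.writeLong(src[i]);
                out.writeLong(dst[i]);
                out.writeDouble(weight[i]);
            }
        }
    }

    // Random access to the i-th record's weight without parsing the file.
    public static double weightAt(Path p, int i) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.getDouble(i * RECORD + 16); // skip src and dst
        }
    }
}
```

A single `MappedByteBuffer` is itself int-indexed (about 2 GB), so a file of the size discussed in the interview would have to be mapped in several regions or pulled off-heap, as described earlier; that bookkeeping is omitted here.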


- When switching from Rust to Scala and the JVM, did the execution speed change?


- It changed, but only slightly. It is impossible to say with one hundred percent certainty, because the application originally made in Rust did not live very long. It was a proof of concept — to make sure the approach works in principle. Nobody seriously benchmarked it against the finished Scala application. It became clear that if we built an industrial solution to this particular problem, we would still do it in Scala, because we did not yet have people with Rust experience. Even then there was the question of how available such people are on the labor market. Those questions remain today: the language is developing very dynamically, but it is still unclear how accessible this competence is on the market. So we simply agreed that since we needed a quick check, we would do it in Rust, and then redo it so that other people could come and maintain it, in line with our accepted standards for development tools.


- You said you might apply it again in the future. What did you like about it so much that it stuck in your memory among all these experiments?


- It is, in a sense, a continuation of the same branch that C and C++ sit on: systems development. But it is free of most of the significant problems that C and C++ have. Moreover, it lacks a number of significant problems that its more modern competitors, such as Golang, have: better performance, no GC overhead, and language abstractions that let you build large applications effectively. If we suddenly need to do low-level systems development, it will most likely be in Rust.


- Imagine for a moment that there were as many Rust specialists as Scala people. Rust vs Scala: which is better, and when?


- The question does not arise, because there is Hadoop. The main thing for us, after all, is not the development tool but the platform. If Hadoop and Spark are rewritten in Rust, then such a question will indeed come up. But so far that has not happened, so there is no question.


- What came out of the project we are discussing, apart from the prototype? There was some kind of FastGraph. What is it, in a few words?


- Yes, there was an application that quickly searched for the shortest path, and a user interface was written (it was called, I believe, "the problem assets researcher's workstation" — you can ask Vadim Surpin for details). It was delivered as part of the laboratory cluster project, and based on that project the decision was made to build a large industrial cluster. Management decided that the Big Data direction was ripe and needed to be launched.


The prototype under discussion has not yet been developed further. There were attempts to launch it in various places, but for various reasons they did not take off. As far as I understand, the main official reason is other priorities — in favor of the more urgent projects that we are doing now.


- What interesting technical conclusions and solutions did you take away from this whole story?


- We understood that the choice of stack (including Scala as a tool) was made absolutely correctly. The tool is ripe enough for industrial development even in a company as serious and large as Sber. The project we have been discussing is waiting in the wings: as soon as the opportunity arises, we will continue developing it immediately. In essence, we confirmed that our approach was correct.


- Those were business-level conclusions. What about the technical level? You wrote a system that uses Dijkstra's algorithm on social graphs — does it really work?


- Yes. Well, clearly Dijkstra can be implemented in all sorts of ways, but the approach we chose works quite effectively on large graphs.


- And what is that approach?


- The layout of the application, for example. It could have been done differently. We are still asked: why didn't you use GraphX? But GraphX is a batch system, which generally does not produce results interactively. One could have tried tinkering with it until it worked interactively. There were many options; we chose this one, and it took off.


We also designed distribution in from the start, for very, very large graphs. That issue has been postponed for now — the current system works successfully on a single JVM — but the architecture allows running on several JVMs with fairly modest modifications that would not force a fundamental rethink of the design.


- You brought Akka into it. Why do you need Akka, and how has it acquitted itself?


- It is there purely for messaging, to support our message model. So far we have found no problems with it.


- Who exchanges messages with whom?


- The architecture is such that there are workers, each serving some subset of the graph's vertices. The workers communicate with one another to perform distributed computations — for example, the CMMS algorithm we talked about. All the cores of the machine it runs on are used.


- Distribution at what level? At the machine level, or the cluster level?


- Right now, at the machine level: it parallelizes within a single machine. In general, at the cluster level — we simply have not had time for that yet; one project has ended, and the next has not begun.


- Distribution within one machine means you have pinned sixty-four processes to sixty-four cores, or...?


- No, there is one process, but it uses all the cores. It is multi-threaded.


- So you use Java threads? Or do you use Akka and it does everything automatically?


- Of course we do not manage threads ourselves; that all happens at the Akka level.


- I see the term "conical messaging model" in your presentations. What is it?


- Yes, that is the same messaging model. It is a kind of formalization that says how workers must exchange messages in order to organize parallel computation. It is analogous to Bulk Synchronous Parallel, the model used in Giraph and HANA.
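
For readers unfamiliar with BSP, the pattern being compared can be sketched in a few lines: each superstep delivers all messages sent in the previous one, vertex state is updated, and a barrier separates supersteps. This toy version computes hop counts sequentially with plain maps; the real system used Akka actors with one worker per vertex subset, so the sketch is illustrative, not their code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy Bulk Synchronous Parallel loop: the inbox holds messages sent in
// the previous superstep; swapping inboxes plays the role of the
// barrier. Computes hop distance from src on an unweighted graph.
public class BspSketch {
    public static int[] hops(int n, int src, int[][] adj) {
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        Map<Integer, Integer> inbox = new HashMap<>();
        inbox.put(src, 0);
        while (!inbox.isEmpty()) {           // one superstep per iteration
            Map<Integer, Integer> next = new HashMap<>();
            for (Map.Entry<Integer, Integer> m : inbox.entrySet()) {
                int v = m.getKey(), d = m.getValue();
                if (d < dist[v]) {           // message improves the vertex
                    dist[v] = d;
                    for (int u : adj[v]) {
                        next.merge(u, d + 1, Math::min);
                    }
                }
            }
            inbox = next;                    // "barrier": deliver all at once
        }
        return dist;
    }
}
```

In a distributed setting, each worker would run this loop over its own vertex subset, and merging outboxes into the next superstep's inbox is exactly where the messaging layer (Akka, in their case) comes in.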


- Is it a clone of BSP?


- In a sense you could consider it a clone, but in fact we wrote everything from scratch and only glanced at BSP. Later our experts made a comparison and concluded that the two turned out to be different things. It would probably be interesting to implement BSP on our platform and see how it works, but for the same reasons our hands have not yet reached that point.


- You said something about PageRank and MPI. How do they relate to the project?


- Those were experimental things we did as optional extras, to round out the prototype's feature set so we could show it to someone in the future.


- I see. We need to wrap up, so the last question: for the people reading us on Habr, do you have any parting words or wishes? Maybe your team needs people to write Scala? Something like that.


- Yes, we always need smart Scala developers, and there are plenty of interesting projects for gaining experience with Big Data and ML — for example, running machine-learning models in production. But that is a topic for another conversation!


- Since we have already touched on hiring: describe the profile of the developer you would like to see on your team.


- A developer who knows Scala, or one who wants to learn it but already has development experience. Some knowledge of the Big Data tools — Hadoop, Spark — would be welcome. We also need specialists ready to take on user-interface tasks: we use Scala.js.


- Dedicated specialized mathematicians — data scientists: do you need them?


- Yes, of course. Some interesting machine-learning products are starting now — in particular, News Monitoring: highlighting news about organizations of interest to us. Machine-learning specialists would be very useful to us.


- Thank you, Valery! It was a very rich interview, which I hope our readers will enjoy. I hope to see you again at our conferences — JPoint, JBreak, Joker, or SmartData. Come with talks!



Source: https://habr.com/ru/post/350990/

