Introducing the first episode of the podcast about technologies, processes, infrastructure, and people in IT companies (episode zero can be heard and read
here). Today, CTOcast is visiting Kirill Safonov, Technical Director of RuTarget.
Listen to the podcast
A few words about our guest and the company RuTarget:
Kirill Safonov lives and works in St. Petersburg, Russia. He graduated from St. Petersburg State Polytechnic University in 2004 and became a Candidate of Physical and Mathematical Sciences in 2007. From 2001 to 2004 he worked as a C++ developer at Soft-Impact, and from 2004 to 2006 as a programmer at Borland. From 2006 to 2010 he worked at SwiftTeams (from 2010 as CTO). From 2011 to 2013 he was a developer at JetBrains. Since April 2013 he has been CTO of RuTarget.
RuTarget was founded in 2011 by Eugene Light and attracted its first investments at that time. The company develops custom RTB solutions. The RTB infrastructure provided by RuTarget allows agencies and advertisers to buy and sell advertising and data automatically. The best-known solution is the platform built for Segmento (a company closely related to RuTarget), which provides RTB services to advertisers and advertising agencies.
Text version of the podcast (1st part)
Alexander Astapenko: Please explain what RTB (real-time bidding) is and how the interaction between the various participants in this system is organized: advertisers, sites, data providers.
Kirill Safonov: Of course. RTB, or real-time bidding, is a scheme for displaying online advertising in which an auction takes place in real time, while a web page is loading, between the advertising network the page is connected to and several buying participants (RuTarget among them).
The network offers an advertising slot and, most importantly, a user ID. The participants assess how interesting the network's offer is to them, how attractive the user and the site are, and decide whether to show this user a banner from one of the advertising campaigns they run. If the decision is positive, the participant must offer the ad network a bid. The highest bid wins, and the end user sees the winner's banner.
Technically, this is an exchange between the servers of the ad network and those of the buying parties. A request is sent in real time, and the buying party must respond within a short window of 100 milliseconds. The ad network collects the responses, filters them in a certain way, selects the highest bid and displays the banner. Thus, in the RTB scheme, the decision on which banner to display is made while the web page is loading.
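The auction flow described above can be illustrated with a minimal Python sketch. It is not any ad network's actual implementation: the `BidResponse` fields, the bidder names and the filtering rule (drop late or non-positive bids) are assumptions made for illustration, and only the 100 ms window and the "highest bid wins" rule come from the interview.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BidResponse:
    bidder: str        # hypothetical name of the buying party
    price: float       # bid offered for the impression
    latency_ms: float  # time the buying party took to answer


def run_auction(responses: List[BidResponse],
                deadline_ms: float = 100.0) -> Optional[BidResponse]:
    """Return the highest bid among the responses that met the deadline."""
    # Assumed filter: ignore answers that arrived late or carry no positive bid.
    on_time = [r for r in responses
               if r.latency_ms <= deadline_ms and r.price > 0]
    if not on_time:
        return None  # nobody bid in time, so no banner from this auction
    return max(on_time, key=lambda r: r.price)


responses = [
    BidResponse("dsp_a", 1.20, 35.0),
    BidResponse("dsp_b", 2.50, 140.0),  # highest price, but too slow
    BidResponse("dsp_c", 1.75, 60.0),
]
winner = run_auction(responses)
```

Here `dsp_b` offers the most but misses the deadline, so the impression goes to `dsp_c`.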
This scheme is good because advertising works more efficiently for both advertisers and users. Advertisers can show an ad that is relevant to the user and therefore gets a higher percentage of clicks and responses overall. Users, in turn, are shown advertising that matches their intentions or interests. So it is a kind of win-win.
Clearly, the more an auction participant (the buying party) knows about the user, the more efficiently it can spend the budget: bid higher on those who click on ads and not bid on users who are of no interest to it. This is where the mathematics begins: mathematical models that collect information about users, digest it appropriately and build a set of features from which one can understand or predict what a person will do, whether or not they will click on an advertisement.
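As a rough illustration of how a predicted click probability could drive the bid, here is a small sketch. The interview does not say which model RuTarget uses; the logistic model, the feature names, the weights and the "bid the expected value" rule below are all assumptions chosen for simplicity.

```python
import math
from typing import Dict, Optional


def click_probability(features: Dict[str, float],
                      weights: Dict[str, float],
                      bias: float) -> float:
    """Logistic model: sigmoid of a weighted sum of user and site features."""
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def choose_bid(p_click: float,
               value_per_click: float,
               min_probability: float = 0.005) -> Optional[float]:
    """Bid the expected value of the impression, or skip unpromising users."""
    if p_click < min_probability:
        return None  # do not bid on users who are unlikely to click
    return p_click * value_per_click


# Hypothetical weights and features, for illustration only.
weights = {"visited_shop": 2.0, "sports_site": 0.5}
bias = -6.0

p_hot = click_probability({"visited_shop": 1.0, "sports_site": 1.0}, weights, bias)
p_cold = click_probability({}, weights, bias)

bid_hot = choose_bid(p_hot, value_per_click=10.0)
bid_cold = choose_bid(p_cold, value_per_click=10.0)
```

The user with matching features gets a bid proportional to the predicted click probability, while the user the model knows nothing about falls below the threshold and gets no bid.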
Alexander Astapenko: Where does RuTarget, as a product, fit into this chain?
Kirill Safonov: RuTarget is a platform, a DSP (demand-side platform), that is, the buying side, which can connect to advertising networks (in our field they are called SSPs, supply-side platforms, the selling side) and place bids. The system can also collect user data, store it and process it in order to bid as efficiently as possible.
Pavel Pavlov: Who are your customers and how is their interaction with the RuTarget platform organized?
Kirill Safonov: End customers tend to interact with Segmento, which is a kind of business frontend. Segmento runs clients' advertising campaigns, providing them with the required volume of clicks, conversions and so on, and it uses the RuTarget platform to run these campaigns. Say a client comes to Segmento, or Segmento finds a client and signs a contract with them; all the technical work is then performed by RuTarget.
Pavel Pavlov: So it turns out that RuTarget does not interact with the end customer at all?
Kirill Safonov: Not at the moment. However, we plan to open the RuTarget platform to the public, so that users can work with the technology platform directly, for example by connecting to it through an API.
Alexander Astapenko: And who is responsible for shaping the RuTarget product, for example what goes into the API? Are current customers the engine of product development? How does this happen inside RuTarget? Who makes the decisions and controls the process?
Kirill Safonov: We have a product manager, a person who sits between Segmento and RuTarget, accumulates all the feature requests that come into our bug tracker, and keeps track of where the product is heading. On the other side there is the development team, led by me, which analyzes these feature requests and determines when they can be implemented. Then comes the usual coordination and prioritization, and some of the tasks go into the next iteration.
Alexander Astapenko: Still, I want to dig deeper into this topic. Is there a prioritization process for features? A process by which these features get into your product backlog?
Kirill Safonov: On the one hand, there are the incoming features, the requests from the product manager. On the other hand, there is our understanding of how long each one will take, how it fits into the current architecture and what needs to be reworked. Then there is a discussion, and we make a joint decision based on the priority and business importance of the task and on the time it takes to complete. As a result, the task lands earlier or later in the backlog. Then, when we plan an iteration, we take the next chunk of the backlog, select the features that fit into that iteration, and implement them.
Pavel Pavlov: Kirill, you mentioned 100 milliseconds a little earlier. It seems that the requirements for RTB solutions are quite high, for speed in particular. Tell me, how do you manage to meet them, and are there other technical difficulties you have to deal with when developing, building and supporting the platform?
Kirill Safonov: “Technical difficulties” sounds a bit pessimistic; I would call them interesting points. The first is the load combined with the short response time, a unique requirement. We have an incoming stream of about 10-20 thousand requests per second, and in fact we have even less than 100 milliseconds, somewhere between 20 and 30, to make a decision: whether to show the person a banner, which banner, at what bid and so on. We solved this problem by building the architecture to be horizontally scalable, so that we add servers as the load grows. We are now connected to ten local networks of various sizes and work successfully with all of them.
Another point, perhaps not unique but important, is the high availability of the system: it must be up around the clock, with a high availability percentage. When designing the components of the architecture, we have to think about stability and about duplicating components in case some block fails.
Another interesting area is the mathematical models, which should make the most accurate predictions about users from the data we have. This is an open area without any clear boundaries. Here you cannot say whether we meet the requirements or not, because there is no limit to perfection. It is ongoing work: constant reflection, tests, comparisons, rolling out new versions, rolling back, a mathematical game, so to speak, played with models. This is probably the most interesting and least trivial part.
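To give a feel for what "adding servers as the load grows" means in numbers, here is a back-of-the-envelope sketch. Only the 10-20 thousand requests per second figure comes from the interview; the per-server throughput and the spare-capacity headroom are hypothetical values, not RuTarget's real ones.

```python
import math


def servers_needed(peak_rps: float,
                   per_server_rps: float,
                   headroom: float = 0.3) -> int:
    """Servers required to serve peak_rps while keeping spare capacity.

    headroom is the fraction of each server's capacity left unused so that
    a traffic spike or a failed node does not push latency past the deadline.
    """
    usable_rps = per_server_rps * (1.0 - headroom)
    return math.ceil(peak_rps / usable_rps)


# Hypothetical assumption: one bidder node sustains 2,500 requests per second.
low_load = servers_needed(10_000, per_server_rps=2_500)
high_load = servers_needed(20_000, per_server_rps=2_500)
```

With these assumed numbers, doubling the traffic from 10,000 to 20,000 requests per second takes the cluster from 6 to 12 nodes, which is the linear growth a horizontally scalable design aims for.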
Pavel Pavlov: Interesting, I would like to dwell on this a bit. There is such a thing as “data science”, and you are an expert in this field. How much of it is really science? Or is it a “mathematical game”, in your own words? Is it closer to science or to programming? How do you define this niche in modern IT?
Kirill Safonov: We have a data mining department, a few people who develop these models, and what they do is real science: analyzing large amounts of data, building models, running tests. When they say they have an idea that should be effective, we implement it in our system, roll it out for testing on some percentage of requests or advertising campaigns, and check whether it works. Then our guys look at how it went and analyze the logs and the results. If we conclude that the new model or approach is acceptable, we extend it to all the other advertising campaigns. Otherwise we drop it, go back and try something else. So, on the one hand, it is data science in the sense in which everyone understands it: the analysis of big data, the search for patterns, the construction of models. On the other hand, it is a kind of experiment that is constantly running in the system.
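Rolling a new model out to "some percentage of requests or advertising campaigns" is commonly done with deterministic hash bucketing. The sketch below shows that general technique, not RuTarget's actual mechanism; the experiment name, the unit IDs and the 5% share are hypothetical.

```python
import hashlib


def in_experiment(unit_id: str, experiment: str, percent: float) -> bool:
    """Deterministically assign a unit (request, user or campaign) to a test.

    The same unit always lands in the same bucket, so results stay comparable
    across the lifetime of the experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < percent * 100          # percent=5.0 keeps buckets 0..499


def pick_model(unit_id: str, percent: float = 5.0) -> str:
    """Route a small share of traffic to the candidate model."""
    if in_experiment(unit_id, "new_model_v2", percent):
        return "candidate"
    return "baseline"


# Share of 10,000 hypothetical campaign IDs routed to the candidate model.
ids = [f"campaign-{i}" for i in range(10_000)]
candidate_share = sum(pick_model(i) == "candidate" for i in ids) / len(ids)
```

If the logs show the candidate performing well, raising `percent` to 100 extends it to all traffic; setting it to 0 rolls it back.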
Pavel Pavlov: You talked about big data, the level of scaling and load. How did you arrive at such an architecture, at the necessary solution? Was it personal experience from previous projects or the joint work of employees?
Kirill Safonov: My previous experience probably did not help me much, because before that I had worked on either desktop or server projects, but none with such load and requirements. Reading the literature and the internet helped. Quite a lot of articles have appeared recently, as well as products from which you can assemble a system. Talking to other people in this market helped too. Strangely enough, you quite often end up discussing technical issues with engineers from competing or partner companies. That is how you come to understand what actually works, what can be applied and what will pay off.
Pavel Pavlov: Since the solutions came from outside, maybe there are recipes you are ready to share? What technology stack should people look into if they are trying to work in a field with this level of load and reliability?
Kirill Safonov: For big data, we quite predictably use Hadoop and its components, such as HBase and MapReduce jobs. As a high-load frontend, Nginx works very well for us. We are very pleased with it and have hardly customized it. It withstands very heavy loads without any failures. There are also quite a few high-performance NoSQL databases now, with which you can build various caches. I probably cannot give any specific advice, because the information I have comes from the internet, and within two hours you can get a basic idea of what is going on there.
Pavel Pavlov: So, starting from theoretical solutions, you found in practice that this whole stack - Hadoop, NoSQL, Nginx - works and fits well into serious production.
Kirill Safonov: Yes, everything works out.
Pavel Pavlov: Well, and to close the topic of high reliability, can you share your SLA (service-level agreement)? How many nines are there, and what are the reliability requirements for such solutions?
Kirill Safonov: There are, of course, no formal reliability requirements expressed in nines. But as soon as something suddenly breaks, partners from the advertising networks or data providers immediately complain that something is wrong. So the system was designed to be as reliable as possible, so that even if the backend stops responding, the frontend will use a stub to form a response that suits everyone.
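The "frontend answers with a stub when the backend is silent" idea can be sketched as a call with a timeout and a fallback. This is a simplified Python illustration, not RuTarget's code (their frontend is Nginx); the stub contents, the backend functions and the 100 ms timeout used here are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

# Hypothetical "no bid" answer that is still a valid response for the network.
NO_BID_STUB = {"bid": None, "reason": "fallback"}


def answer_request(backend_call, request, executor, timeout_s=0.1):
    """Ask the backend for a decision; fall back to the stub if it is late or fails."""
    future = executor.submit(backend_call, request)
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        future.cancel()     # best effort; a call that is already running keeps going
        return NO_BID_STUB
    except Exception:
        return NO_BID_STUB  # backend errors must not reach the ad network


def healthy_backend(request):
    return {"bid": 1.5, "reason": "model"}


def slow_backend(request):
    time.sleep(0.5)         # simulates a backend that has stopped responding in time
    return {"bid": 9.9, "reason": "model"}


def broken_backend(request):
    raise RuntimeError("backend is down")


with ThreadPoolExecutor(max_workers=3) as pool:
    ok = answer_request(healthy_backend, {"user": "u1"}, pool)
    late = answer_request(slow_backend, {"user": "u1"}, pool)
    failed = answer_request(broken_backend, {"user": "u1"}, pool)
```

In all three cases the caller gets a well-formed answer within the time limit, which is what keeps partners from noticing a backend outage.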
Pavel Pavlov: Judging by your open vacancies, the emphasis is on Java development and technologies. What led to this choice? Is it related to your experience, or to the availability of specialists on the IT market?
Kirill Safonov: It seems to me that Java is a kind of golden mean. On the one hand, it has proven itself as a production language in which you can build high-load, high-speed solutions. On the other hand, it is a fairly simple and widespread language in terms of the skills and level of training required. Plus, we use Hadoop, so client applications and components for Hadoop will be written in Java in any case.
The text version of the podcast will be continued in the coming days.
Subscribe to podcast