Friends, we have agreed with Ontiko to publish the best talks from their conferences on our YouTube channel and share them with you. This way we want not only to spread knowledge but also to help our readers and viewers grow professionally. Here is a selection of the top 15 talks from HighLoad++ 2018.
Tarantool Replication: Configuration and Usage
Georgy Kirichenko, Mail.ru Group
Replication in Tarantool is used to provide high availability through server redundancy or server clustering for load balancing, and can also be used to perform updates. Recent versions of Tarantool have added several features that make it easier to configure and use replication in a cluster.
The talk covers the basic design principles and features of asynchronous replication in Tarantool, with a detailed look at the internal structure of the state vector, vclock. It discusses ways to ensure data consistency, with a focus on new features, and goes over the basic configuration principles, when each of them applies, the most common mistakes, and how to solve problems that arise during configuration and operation.
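To make the vclock idea concrete, here is a minimal Python model, not Tarantool's code. It assumes a vclock is a mapping from replica id to the LSN of the last row from that replica that has been applied locally; the function names `compare`, `missing_ranges` and `apply_row` are hypothetical.

```python
from typing import Dict

# Model of a vclock: replica id -> LSN of the last applied row from that replica.
VClock = Dict[int, int]


def compare(a: VClock, b: VClock) -> str:
    """Partial order of two vclocks: 'equal', 'after', 'before' or 'concurrent'."""
    ids = set(a) | set(b)
    a_covers_b = all(a.get(i, 0) >= b.get(i, 0) for i in ids)
    b_covers_a = all(b.get(i, 0) >= a.get(i, 0) for i in ids)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"
    if b_covers_a:
        return "before"
    return "concurrent"  # each side has rows the other has not seen


def missing_ranges(local: VClock, remote: VClock) -> Dict[int, range]:
    """Per source replica, the LSNs that `local` has applied but `remote` has not."""
    return {
        rid: range(remote.get(rid, 0) + 1, lsn + 1)
        for rid, lsn in local.items()
        if lsn > remote.get(rid, 0)
    }


def apply_row(clock: VClock, replica_id: int, lsn: int) -> bool:
    """Advance the vclock for one incoming row; return False for an already-applied row."""
    if lsn <= clock.get(replica_id, 0):
        return False
    clock[replica_id] = lsn
    return True


if __name__ == "__main__":
    master = {1: 10, 2: 5}
    replica = {1: 7, 2: 5}
    print(compare(master, replica))         # after
    print(missing_ranges(master, replica))  # {1: range(8, 11)}
```

The `concurrent` result is the situation in which two masters have accepted writes independently, which is where data consistency questions come up in master-master setups.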
Technical aspects of Internet blocking in Russia. Problems and Prospects
Philip Kulin, The Deep Forest
Technical details of blocking: how the blocking mechanism is organized today (who, what, where, when and how) and why it is organized this way. Why Roskomnadzor blocks entire networks. What is wrong with the current blocking mechanism from a technical point of view, and in which direction it should move technically within the framework of minimal changes to today's regulations.
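The collateral effect of blocking whole networks can be illustrated with Python's standard `ipaddress` module. The addresses below come from documentation ranges and the function name `collateral` is hypothetical; this sketch says nothing about how the blocking is actually implemented.

```python
import ipaddress
from typing import List


def collateral(blocked_cidr: str, targets: List[str], hosted: List[str]) -> List[str]:
    """Addresses that fall inside a blocked network although they are not the intended targets."""
    net = ipaddress.ip_network(blocked_cidr)
    target_set = {ipaddress.ip_address(t) for t in targets}
    return [
        ip for ip in hosted
        if ipaddress.ip_address(ip) in net and ipaddress.ip_address(ip) not in target_set
    ]


if __name__ == "__main__":
    blocked = "198.51.100.0/24"
    print(ipaddress.ip_network(blocked).num_addresses)  # 256
    print(collateral(blocked, ["198.51.100.10"],
                     ["198.51.100.10", "198.51.100.20", "203.0.113.5"]))  # ['198.51.100.20']
```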
Forecasting online store sales using gradient boosting (LightGBM)
Alexander Alexeytsev, OZON.RU
This is a talk about an automatic warehouse replenishment system. The brain of the system is ML for sales forecasting: framing the problem and choosing a loss function, feature engineering, building the dataset, choosing a model, the pitfalls of LightGBM training, and evaluating the results. The backbone of the system is Spark/Hadoop: daily data delivery and validation, and improving system reliability. The business realities of procurement: choosing a supplier, safety stock, and working to improve suppliers' service levels.
Alexander also talked about using the trained LightGBM models to estimate the price elasticity of demand for goods, to plan marketing campaigns and assess their effect, to obtain different shapes of the demand-vs-price curve for different product types, and much more, all as a "side" effect of the main task.
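One way to read a price elasticity off a trained forecasting model is to perturb the price feature and compare predictions. The sketch below assumes `predict` is any callable mapping a feature dict to expected demand (for example, a thin wrapper around a trained LightGBM model); `toy_demand` is a made-up constant-elasticity curve used only so the result can be checked. This is an illustration, not the method described in the talk.

```python
import math
from typing import Callable, Dict

Features = Dict[str, float]


def price_elasticity(predict: Callable[[Features], float], features: Features,
                     price_key: str = "price", step: float = 0.05) -> float:
    """Log-arc elasticity: d ln(demand) / d ln(price) around the current price."""
    p = features[price_key]
    lower = {**features, price_key: p * (1 - step)}
    upper = {**features, price_key: p * (1 + step)}
    q_lower, q_upper = predict(lower), predict(upper)
    return math.log(q_upper / q_lower) / math.log(upper[price_key] / lower[price_key])


def toy_demand(f: Features) -> float:
    """Hypothetical demand curve q = 100 * p^-1.5, lifted 20% during a promo."""
    return 100.0 * f["price"] ** -1.5 * (1.2 if f["promo"] else 1.0)


if __name__ == "__main__":
    e = price_elasticity(toy_demand, {"price": 200.0, "promo": 1.0})
    print(round(e, 6))  # -1.5
```

A tree ensemble's prediction is piecewise constant in each feature, so with a real gradient-boosting model the `step` has to be large enough to cross split points on price; otherwise the two predictions coincide and the estimate is zero.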
How we work on the stability of our Lua implementation
Anton Soldatov, IPONWEB
IPONWEB has been using Lua to describe business logic for more than 10 years. In 2015, they forked LuaJIT and since then have been working with their own implementation of the language. This component of the technology stack is critical to the business, so special attention has been paid to its stability.
Anton described how they built a test suite for the implementation from scratch and walked through several cases where the tests were powerless against the complexity of the system under test, so that something broke on production servers "suddenly" and "intermittently." The experience they gained while fixing such bugs also applies to working with LuaJIT. Finally, Anton shared the tools and techniques the company uses for debugging.
Where to place row-level security in a high-load project
Alexander Tokarev, DataArt
A talk on where and how best to implement row-level security in a high-load project. Alexander described how they chose a row-level security implementation for a high-load enterprise project (4,000 users, 10,000 concurrent requests, mixed transactional and OLAP load). He analyzed three technologies for implementing row-level security in Oracle DBMS and explained why they chose to enforce security in the database rather than on the application server. He also talked about the choice they made, the problems they ran into, and future plans.
How we made our own Netfilter with Intel DPDK and prefix trees
Alexander Samoilov, Security Code
Linux Netfilter is at the heart of a huge number of firewalls, both open-source and commercial. It is a proven, reliable and, lately, even fairly fast solution. But today, when tens of gigabits of traffic often have to pass through a firewall and the number of filtering rules can exceed a thousand, Linux Netfilter itself becomes the bottleneck.
Alexander talked about how they rewrote the Linux network subsystem and got a fast result: tens of gigabits of stateful and stateless filtering, session tracking, NAT and easy-to-manage routing, with performance independent of the number of filtering rules. They also taught the subsystem to understand the commands of the well-known iproute2 and nftables utilities.
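Prefix trees make rule lookup cost depend on address length rather than on the number of rules. Below is a minimal binary trie doing longest-prefix match on IPv4 destination addresses; it is an illustration of the data structure named in the title under simplifying assumptions, not the speakers' implementation, which has to match many more fields than a single address.

```python
import ipaddress
from typing import List, Optional


class _Node:
    __slots__ = ("children", "action")

    def __init__(self) -> None:
        self.children: List[Optional["_Node"]] = [None, None]
        self.action: Optional[str] = None


class PrefixTrie:
    """Binary trie over IPv4 address bits for longest-prefix-match lookups."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, cidr: str, action: str) -> None:
        net = ipaddress.IPv4Network(cidr)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):  # one trie level per prefix bit
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = _Node()
            node = node.children[b]
        node.action = action

    def lookup(self, addr: str) -> Optional[str]:
        """At most 32 steps, however many rules were inserted."""
        bits = int(ipaddress.IPv4Address(addr))
        node = self.root
        best = node.action
        for i in range(32):
            child = node.children[(bits >> (31 - i)) & 1]
            if child is None:
                break
            node = child
            if node.action is not None:
                best = node.action  # a longer matching prefix wins
        return best


if __name__ == "__main__":
    rules = PrefixTrie()
    rules.insert("0.0.0.0/0", "drop")
    rules.insert("10.0.0.0/8", "accept")
    rules.insert("10.66.0.0/16", "drop")
    print(rules.lookup("10.1.2.3"))   # accept
    print(rules.lookup("10.66.9.9"))  # drop
    print(rules.lookup("8.8.8.8"))    # drop
```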
VShard - horizontal scaling in Tarantool
Vladislav Shpilevoy, Tarantool
Until 2018, the only tool for horizontal scaling of the Tarantool DBMS was Shard, a module that implements sharding, a special case of horizontal scaling. Shard distributes data by a function of the primary key and supports cluster topology changes and rebalancing. However, it has three significant flaws that prevented its use in one important project.
At the beginning of the year, development of the new VShard module, an alternative sharding implementation, was completed. In VShard, rebalancing is performed in stages; you can set an arbitrary sharding function to keep related data together; and the result of the sharding function is stored in each record and is never recalculated. Vladislav spoke about VShard's internal structure, its subsystems and their implementation, with usage examples, and about the new features of VShard 0.2.
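The VShard properties listed above (a sharding function evaluated once, its result stored in each record, and rebalancing in small steps) can be modeled in a few lines of Python. This is a hypothetical model, not VShard's Lua API: `BUCKET_COUNT`, the crc32-based `bucket_id` and the `Cluster` class are assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Tuple

BUCKET_COUNT = 3000  # hypothetical; a real cluster chooses its own value

Row = Tuple[int, str, str]  # (stored bucket id, primary key, payload)


def bucket_id(shard_key: str) -> int:
    """Map a shard key to a bucket in 1..BUCKET_COUNT (crc32 is an assumed choice)."""
    return zlib.crc32(shard_key.encode("utf-8")) % BUCKET_COUNT + 1


class Cluster:
    def __init__(self, replicasets: List[str]) -> None:
        # Buckets start out spread round-robin over the replica sets.
        self.owner: Dict[int, str] = {
            b: replicasets[(b - 1) % len(replicasets)] for b in range(1, BUCKET_COUNT + 1)
        }
        self.data: Dict[str, List[Row]] = {rs: [] for rs in replicasets}

    def insert(self, shard_key: str, pk: str, payload: str) -> None:
        bid = bucket_id(shard_key)                             # computed once...
        self.data[self.owner[bid]].append((bid, pk, payload))  # ...and stored in the row

    def move_bucket(self, bid: int, dst: str) -> None:
        """Move one bucket; rows are selected by their stored bucket id, nothing is rehashed."""
        src = self.owner[bid]
        moving = [row for row in self.data[src] if row[0] == bid]
        self.data[src] = [row for row in self.data[src] if row[0] != bid]
        self.data[dst].extend(moving)
        self.owner[bid] = dst


if __name__ == "__main__":
    cluster = Cluster(["rs1", "rs2"])
    # A customer and its order share the shard key, so they share a bucket.
    cluster.insert("customer:42", "customer:42", "Alice")
    cluster.insert("customer:42", "order:1001", "book")
    bid = bucket_id("customer:42")
    src = cluster.owner[bid]
    cluster.move_bucket(bid, "rs2" if src == "rs1" else "rs1")
    print(cluster.owner[bid] != src)  # True
```

Because both rows use the shard key `customer:42`, they land in the same bucket and move together; moving buckets one at a time is one simple way to picture rebalancing "in stages".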
Migrating BBM's 150M+ users from Oracle to Postgres without downtime
Alvaro Hernandez, OnGres (English report)
BBM (BlackBerry Messenger) is one of the world's largest instant messengers, offering text, voice and video communication to a subscriber base of more than 150 million users. It ran on an on-premise Oracle DBMS. OnGres helped migrate it to PostgreSQL running on GCP, using real-time replication with little or no downtime. Alvaro described in detail the process, pitfalls, techniques, technologies and best practices for migrating from Oracle to PostgreSQL without downtime. Many are interested in such migrations today, but they require strong expertise and close involvement in a process full of hidden difficulties.
High-load distributed control system of a modern NPP
Vadim Podolny, Fizpribor
From this talk you will learn about the new platform for distributed control systems at nuclear power plants and how some of the most complex automation facilities in the world are controlled: real-time control of more than 150 specialized subsystems responsible for various NPP technological processes, more than 100K sensor data sources, up to 500K computed parameters, and 5 types of physical processes.
When something deviates from normal, the whole system turns into a huge DDoS source of useful diagnostic information that interferes with normal control of the facility. You will learn how we “solve” such problems; about the hardware and software architecture of such systems; how backup and replication are provided; why data redundancy and technological diversity are needed; how load management works and how QoS is structured; and what happens if the normal-operation system shuts down, as happened, for example, at Fukushima.
4K streaming platform for a million online viewers
Alexander Tobol, Odnoklassniki
The video service at Odnoklassniki is the second-largest site in RuNet by video views: 600 million views a day. OK's streaming platform now supports professional broadcasts in 4K and streaming from a phone in Full HD, and serves users more than 3 TB/s of traffic.
Alexander talked about:
the 4K video streaming pipeline for millions of online viewers;
content delivery system architecture;
TCP tuning for 4K distribution;
how and why to abandon ffmpeg, and cutting video on the GPU;
what to do when capacity runs out but users keep coming;
problems of streaming over TCP;
the future of video streaming.
Recent changes to the Linux IO stack from a DBA perspective
Ilya Kosmodemyansky, Data Egret
I/O performance issues have been on database administrators' daily agenda for as long as databases have existed. In Linux, probably the most popular operating system for databases, the IO stack has been overhauled over the past few years.
Ilya talked about what is changing, why the IO stack urgently needed improvement, and what this can mean for databases, including how the new NVMe drivers and blk-mq will improve things. As a handy reminder, Ilya offered a PostgreSQL and Linux configuration checklist for getting the most out of the I/O subsystem on new kernels.
FAQ on VKontakte's architecture and operations
Alexey Akulovich, VKontakte
Alexey covered many topics and questions that people "on the outside" often ask.
For example:
The overall architecture of how our servers interact.
Is there “normal” PHP at VKontakte, where and why? What other programming languages are used?
How to deploy code to tens of thousands of servers in seconds.
Fault tolerance of memcached clusters whose servers constantly fail.
Why VKontakte has its own engines (databases), how many there are, and how they live with them.
How a binlog differs from a snapshot, and how to "roll back a DELETE".
How to monitor all of this.
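A common way to relate the two: a snapshot is the full state at some binlog position, and the binlog records every change after it, so the current state is the snapshot plus a replay of the log. Under that assumption, "rolling back a DELETE" can mean replaying the log only up to the position of the DELETE and reading the deleted value back. The entry format and the `replay` function below are hypothetical and do not describe VKontakte's engines.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical binlog entry: (position, operation, key, value)
Entry = Tuple[int, str, str, Optional[str]]


def replay(snapshot: Dict[str, str], snapshot_pos: int, binlog: List[Entry],
           stop_before: Optional[int] = None) -> Dict[str, str]:
    """Rebuild state from a snapshot taken at `snapshot_pos` plus later binlog entries."""
    state = dict(snapshot)
    for pos, op, key, value in binlog:
        if pos <= snapshot_pos:
            continue  # already reflected in the snapshot
        if stop_before is not None and pos >= stop_before:
            break     # stop just before the entry we want to undo
        if op == "set" and value is not None:
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


if __name__ == "__main__":
    snapshot = {"user:1": "alice"}  # taken at binlog position 1
    binlog = [
        (1, "set", "user:1", "alice"),
        (2, "set", "user:2", "bob"),
        (3, "delete", "user:2", None),
        (4, "set", "user:3", "carol"),
    ]
    print(replay(snapshot, 1, binlog))                 # {'user:1': 'alice', 'user:3': 'carol'}
    print(replay(snapshot, 1, binlog, stop_before=3))  # {'user:1': 'alice', 'user:2': 'bob'}
```

One option for undoing the DELETE in the live state is to take `user:2` from the second result and write it back as a new `set` entry, so the log itself stays append-only.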
Facebook DNS
Oleg Obleukhov, Facebook
Oleg talked about how load is balanced at Facebook and what DNS infrastructure has to do with it, how resource records make their way into Facebook's global infrastructure, and how the company uses DNS for dogfooding.
Databases and Kubernetes
Dmitry Stolyarov, Flant
Dmitry shared his experience and used specific examples to explain when it makes sense to run databases (and stateful applications in general) in Kubernetes, and when it is unjustified or even harmful and dangerous.