
A book on data-intensive processing

Hello, dear readers. We rarely write about long-delayed books, that is, works that have yet to be published in the West. But today we want to introduce you to a post from the blog of Martin Kleppmann, who has been working on the fundamental book "Designing Data-Intensive Applications" for several years now.


In a relatively short post, the author manages to set out the main ideas of this voluminous book, outline its target audience, and almost convince us that we should take on the translation. Still, read on and feel free to vote.

At the end of 2012, I wrote the post “Rethinking caching in web apps” on my blog. It runs to almost 4,000 words, which is much longer than popular wisdom says a good post should be. Nevertheless, I still have the impression that it only scratched the surface of the problems that need to be discussed.

So I wondered whether I should write something more substantial, a book, for example. I like writing because it pushes the author to research deeply, think the problem through, and then try to explain it all logically. I work through a problem much better that way than if I just read about it. Or, to put it aphoristically:
Writing is nature's way of letting you know how sloppy your thinking is. - Dick Guindon

What is already out there


I took up this work because the book I wanted to read simply did not exist. I needed a book that explained data systems (the whole field of databases, distributed systems, batch and stream processing, consistency, caching, and indexing) at the right level of complexity. But it turned out that almost all existing books, blog posts, and the like fall into one of the following categories:

  1. Most IT books are how-to guides for a specific technology. They assume that you have been told to use database X or programming language Y, and they show you how to do it. These books are good, but they are of little use when you are trying to decide which tool, X or Y, you actually need. Such books usually focus on the strengths of a particular technology and ignore its weaknesses.

  2. Blogs often compare several technologies, but such posts mostly touch on superficial aspects (performance benchmarks, API, license) and completely ignore the fundamental design of the technology. They are like Top Trumps cards for databases: you cannot draw any deep insights from them.

  3. Textbooks, by contrast, cover the fundamental principles and trade-offs of various technologies, but in doing so they lose touch with reality. As a rule, their authors are academics with extensive research experience in their subject but no practical experience with real software systems. What they set out is often technically correct, yet it may prove useless, or simply confusing, once you start building a real system.

I wanted to write a book that combines the merits of all three categories. This book tells the story of the big ideas behind building data systems and describes the fundamental principles that do not change from one software version to the next. At the same time, it does not lose touch with reality: it explains what works in practice, what does not, and why. It examines tools and products that are already used in practice, compares the fundamental approaches behind them, and helps you understand which technology is best suited to a particular problem.

I wanted to understand not only how to use a particular system, but also how it works under the hood. This is partly intellectual curiosity but, no less importantly, it lets me form a clear picture of what the system is doing. If unexpected behavior arises, or if you want to check what a technology is really capable of, it is extremely useful to have at least a rough idea of what is going on inside.

I discussed these ideas with various people, including O'Reilly staff, and it became clear that I am not the only one who needs such a book. So the idea for the book Designing Data-Intensive Applications was born. You will easily recognize it on the shelf: the cover will feature a cool Indian wild boar.

The book “High-loaded applications. Efficient processing of big data” (sorry for the verbose title; you can just call it "the book with the boar") is still being prepared for release, but an early release has already been posted on the site.

Who is this book for?


If you are a programmer who develops server-side applications (for example, the backend of a web application), then this book is for you. It assumes that you already know how to write a web application and work with a database, and that you want to take your skills further. Maybe you want to work on a complex, scalable system that serves millions of users; maybe you will be dealing with particularly complex and constantly changing data; maybe you want to make some ancient legacy system more flexible.

The book begins with the basics and then works through every level of a database system, with a separate chapter devoted to each. I am not trying to promote any particular architecture or approach, because I am deeply convinced that different situations call for different solutions. Each chapter therefore gives an overview and comparison of different methods, each of which has proved successful in the right circumstances.

Whatever programming language or framework you prefer, this book will suit you. In it you will read about architecture and algorithms, about fundamental principles and practical limitations, and about the reasoning behind design decisions.

None of the ideas presented in the book is truly new; many of them are decades old. All of this has already been said somewhere: at conferences, in research papers, blog posts, code examples, bug trackers, and programmer folklore. However, as far as I know, no one has yet collected, compared, and evaluated all these ideas in one book in this way.

I hope that once you understand what options exist and what the advantages and disadvantages of each are, it will help you grow professionally. By making trade-offs consciously and choosing the right tools, you will build systems that are more reliable and easier to maintain in the long run. That way we will all do our jobs better and write better software.

Source: https://habr.com/ru/post/309106/

