📜 ⬆️ ⬇️

Distributed computing: some theory

Nine years ago, I started “in my spare time” to teach computer science at a university in St. Petersburg. And only relatively recently, to my surprise, I found out that in our universities there are practically no courses with a focus on the problems of distributed computing. And even on Habré this topic is not disclosed sufficiently! It is necessary to correct the situation right now.

I wanted to devote an article or even a series of articles to this topic. But then I decided to post my tutorial on the basics of distributed computing, published this year (read, a small book of 155 pages). The result was a hybrid - an article referring to the book. The book is distributed free of charge and is available in electronic form.

Instead of the prologue. Starting the text of the article, I once again wondered why a programmer needs to know the theoretical foundations of distributed computing. I have repeatedly heard this question (and continue to hear) from students and specialists already working in the field of IT. Indeed, why, for example, to know that “the set of events of a distributed computation is partially ordered rather than linearly”? What, so to speak, is the everyday practical use of this fundamental knowledge?
')
I have to admit that I do not have a ready-made, learned answer, which I can give out without thinking. Therefore, each time you have to strain the convolutions, and each time the answers and arguments are different. And now everything is like for the first time ...

Let's try to start from afar. And to make it clearer - with medicine. Because, if it comes to medical errors, the brain begins to work actively and generate strong indignation: horror, horror, could kill a person. What are they, just what? Do not know what to do?

We all quite naturally expect that before starting any manipulations with the human body, doctors still study its internal structure and working principles. We absolutely disagree with the statement that it is much more important for surgeons to undergo practical courses on cutting and sewing instead of many years of cramming theoretical material about what we have inside and why it is there. So why programmers involved in the development of a system with network interaction (i.e., by now almost any system) do not need to know "what is inside and why is it there"? Why are mistakes in IT perceived with a maximum of light irony? Well, yes, well, a bug. And who does not drink does not make bugs? Call it! No, I'm waiting! Among the requirements for programmers, very often for some reason practical skills in one or another programming language come to the fore. And much to the forefront, completely eclipsing the requirements for understanding the basic concepts, theoretical models, algorithms, in the end ... And the programmers themselves, which is no secret, with the beginning of the conversation "about no one needs a theory" fades like flowers in the desert ... Miracles , is not it…

Let me give you a small statement by L. Lamport on this topic (I tried to translate this statement into Russian just below, not far from the original):

For quite a while, I’ve been disturbed by computer science. What are you supposed to do? It is a program that requires computer programming. If you’re trying to find out where to go, you’ll be able to find out what you’re trying to do.
I think that programmers can learn more about how to think better. Thinking is not the ability to manipulate language; it's the ability to manipulate concepts. Computer science should not be about concepts, not languages.

For quite a long time, I’m worried about too much attention to computer language in IT. As a result of an oversupply of such attention, programmers appear who are experts in C ++, but who are not able to write programs that do what is required of these programs. A typical reaction of IT representatives to this problem is to offer programmers to use another more suitable language (programming, specifications, etc.) instead of / in addition to C ++. In turn, a way out of the situation characteristic of the software development industry is seen in providing programmers with more advanced debugging tools, apparently based on the assumption that you can get good programs simply by setting the monkey at the keyboard and then looking for and correcting errors in its code.
My firm belief is that in order to get high-quality programs you need to teach programmers to think better. The ability to think is not the ability to operate a computer language; it is the ability to operate with concepts. The study of information technology should focus on learning concepts, not languages.

To illustrate how important the “concepts” and “elements of the theory” can be in the construction of distributed systems, let's consider a couple of simple examples. To begin with, a group mailing of e-mail messages between users A, B, C and X. Suppose that user A sends an email to the whole group with the subject “General meeting”. Users B and C respond to the whole group with their messages with the subject “Re: General Meeting”.

In fact, events occur in the following sequence:
  1. The first message sent is from user A.
  2. User B receives it, reads it, and sends a response.
  3. User C receives both messages from A and B and then sends his answer, based on both messages from A and B.

However, due to arbitrary and independent delays in message delivery, some users may see a different sequence of events. For example, according to the scenario shown in the figure below, in the mailbox of user X, messages will be placed in the following order:
  1. Message from user C with the subject “Re: Re: General Meeting”.
  2. Message from User A with the subject “General Meeting”.
  3. Message from User B with the subject “Re: General Meeting”.



Yeah, it turns out the order of receipt of messages, observed by different processes, can be different even for FIFO channels! But what if we want the observed order to be the same everywhere (and at the same time we don’t want to use synchronous messaging)? For example, if we write our transport with the appropriate guarantees. Or do we want to build a fault-tolerant service (replicated state machine), where each replica must process incoming requests in the same order for all replicas so that the replica states do not differ? Question…

Let us now consider another implementation of a distributed system in which processes interact only through the exchange of messages, and each process is involved in turning on / off the lamp with a certain light. Let the first process control a lantern with a red light, the second with a yellow one, and the third with a green one. Such is the traffic light system. In the figure below, the switching on of the process of its lamp is indicated by a rectangle, and the switching off by a vertical line; sending and receiving a message - an arrow. Question: Can processes determine which lights are shining at the same time?



So it turns out that in this implementation of the asynchronous system, the processes will not be able to determine whether the red light was on simultaneously with the yellow one. Maybe yes. Or maybe not ... These things will remain unknown. But it will be known for sure that the red and green lights were at the same time switched on. In other words, it turns out that there is not much point in saying that a particular global state is reached in the course of executing a distributed system! As well as very often it is impossible to say whether any condition (predicate) defined on the set of its global states was fulfilled! Again the question: why?

Our response to Chamberlain . In fact, the answers to these and many other questions related to the operation of asynchronous distributed systems are extremely difficult to put into the framework of a single article. That's why I decided to publish several articles at once in one. More precisely, as stated at the beginning of the article, present your small book on the basics of distributed computing, available in electronic form.

Introduction to Distributed Computing >>

From this book you will learn:


What does the book consist of and how to read it?

In the beginning, I tried in two words to explain what goals were set when writing the book, and how it relates to other literature. This is the introduction to the book. It only takes a little over two pages, so it’s worth reading.

The first section is mostly boltological and is devoted to the "quality" features of distributed systems. If you do not know what a distributed system is and what requirements are imposed on it, then it makes sense to read the first section. If you know that incoming messages to the receiving process may give an outdated idea of ​​the sending process, just as the light emitted by a distant star gives an idea of ​​the past state of the star, then the first four points can be skip :) Separately, it is worth noting p. 1.5 “Interaction in distributed systems”, in which I tried to present several simple tasks that demonstrate the difficulties that can be encountered in the development of distributed systems. We will then solve these problems, armed with theoretical knowledge, so it’s worth getting to know them.

The second section presents the model of distributed computing and the main theory used in the further presentation. In a sense, this section is key. However, one must be ready to work with such terms as “set / subset”, “binary relation”, “equivalence relation”, “order relation”, “linear / partial order”. In this section, you will find evidence of some statements. It seems to me that they should, at a minimum, be overlooked (and better studied) for a deeper understanding of the essential features of the functioning of distributed systems, distinguishing them from systems of other classes.

On the basis of the theory presented above, the third section finally considers more practical things, namely, the various mechanisms of logical clocks. With their help, we can arrange events in one or several sequences that could occur in the system, which allows us to significantly simplify the development of algorithms for distributed systems. Examples of the use of logical clocks for solving the problems formulated in paragraph 1.5 “Interaction in distributed systems” are given.

The fourth section is devoted to the study of the basic distributed mutual exclusion algorithms, built without the use of the usual shared variables. The key ideas of these algorithms are also used to solve many other problems in distributed systems. In addition, their study allows to reveal such important issues as ensuring the security properties and survivability of distributed algorithms. Therefore, this section seems to me very useful for review.

Who is this book focused on?

The material of the book should be considered as an introduction to the issue of distributed computing. She grew out of the academic high school environment, and would certainly be useful if you are just starting to work in this area. If you already have some experience in the development of distributed systems and algorithms, perhaps you will find something new for yourself and share your opinion in the comments. If you have many years of experience behind you and are an expert in this topic, I hope that you will be able to supplement me with your thoughts and thoughts.

What would I like?

I would be glad if the material of the book will be useful and informative for you - there will be time to come back here after reading it and write a “thank you”, I will be grateful. In addition, I would like to collect all the material on the topic, including comments, questions and answers in one place, in order to send everyone here, including new courses of new students. If you can help and supplement the material with your thoughts, your experience, I will be doubly grateful. Read, gain knowledge and use them in your work! Download the book in PDF format right now at the link below:

Introduction to Distributed Computing >>

Successes!
Mikhail Kosyakov

Instead of an epilogue. "Information, unlike resources, is conceived to be shared." Robert Kiyosaki

Source: https://habr.com/ru/post/232361/


All Articles