
Abstract
Threads are a seemingly straightforward adaptation of the now dominant sequential model of computation to concurrent systems. Programming languages require little or no syntactic change to support threads, and operating systems and architectures continually evolve to make their use more efficient. Many technologists seek to use multithreading intensively in software, expecting it to deliver the predicted gains in performance. In this paper, I argue that this is not a good idea. Although the use of threads seems to be a small step from sequential computation, it is in fact a huge step. Threads discard the essential properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly nondeterministic, and the behavior of programs built on them becomes nondeterministic as well. Although many of the techniques that have been studied improve the model by pruning nondeterminism more effectively, I argue that they do not solve the underlying problem. Instead of pruning nondeterminism, we should build models of computation in which the interaction of software components is fully deterministic. Nondeterminism should be introduced explicitly and precisely where it is needed, rather than removed where it is not needed. I argue for the development of coordination languages for composing concurrent components. I believe that such languages will yield programs that are far more reliable and more readily parallelized.
1. Introduction
Everyone knows that programming concurrent systems is difficult, and yet the imperative to program concurrently keeps growing. Many technologists predict that the end of Moore's law will be answered with increasingly parallel computer architectures (multicore and multiprocessor) [15]. If we expect computing performance to keep improving, programs must be able to exploit this parallelism. One possible technical solution is to exploit parallelism in sequential programs automatically, either at the hardware level (for example, through dynamic dispatch) or at the software level through automatic parallelization of sequential programs [6]. Nevertheless, many researchers agree that these techniques are of limited effectiveness and can exploit only modest parallelism. A more natural solution is for the programs themselves to become "more parallel."
If we understand why concurrent programming is so hard, we have a better chance of solving the problem. Sutter and Larus [47] observe: "people are quickly overwhelmed by concurrency and find concurrent code much harder to reason about than sequential code. Even careful people miss possible interleavings among even simple collections of partially ordered operations."
The physical world is highly concurrent, and our very survival depends on our ability to reason about concurrent physical dynamics. The problem is that we have chosen concurrent programming abstractions that do not even remotely resemble the concurrency of the physical world. We have become so used to these computational abstractions that we have forgotten they are not immutable. In this paper, I argue that the difficulties of concurrent programming are a consequence of this choice of abstractions, and that if we are willing to let go of those abstractions, the problems become solvable.
A more optimistic view is offered by Barroso [7], who argues that technology forces will drive both server and desktop computing toward chip multiprocessing (CMP), and that once the technology is mainstream, the programming problems will somehow get solved. But we should not underestimate the difficulty. As Stein puts it, we must "replace the traditional metaphor of a sequence of steps with the notion of a community of interacting entities" [46]. This paper addresses exactly that challenge.
2 Threads
In general-purpose software engineering practice, we have reached the point where one approach to concurrent programming dominates all others: threads. Threads are sequential processes that share memory. They represent the key model of concurrency supported by modern computers, programming languages, and operating systems. Many of today's widely used parallel architectures, such as symmetric multiprocessors (SMP), are direct hardware realizations of the thread abstraction.
Some applications can use threads very effectively, in particular the so-called "embarrassingly parallel" applications, which essentially spawn many independent processes, such as build tools (for example, PVM gmake) or web servers. Because these processes are independent, programming them is relatively easy, and the abstraction really being used is that of processes (which do not share memory) rather than threads. Where such applications do share data, they do so through database abstractions that manage concurrency via transactions. The client side of applications, however, is a different matter. Quoting Sutter and Larus [47] again:
"The world of client applications is not as well structured. Concurrent computations can interact and share data in myriad ways. Heterogeneous code; fine-grained, complicated interactions; and pointer-based data structures make this type of program hard to parallelize."
Threads are, of course, not the only possibility for concurrent programming. In scientific computing, where performance requirements have long demanded concurrency, data-parallel language extensions and message-passing libraries (PVM [23], MPI [39], OpenMP) dominate over threads for parallel programming. Indeed, computer architectures intended for scientific computing often differ significantly from general-purpose architectures; for example, they commonly support vectors and streams at the hardware level. Even so, concurrent programs in this domain remain laborious to write: C and Fortran dominate, despite a long history of much better parallel programming languages. In distributed computing, threads are often an impractical abstraction, because maintaining the illusion of shared memory can be too costly. Even so, we have gone a long way toward building distributed computing mechanisms that emulate multithreaded programming. CORBA and .NET, for example, dominate among distributed object-oriented techniques: software components interact through an intermediary that behaves as if the components were local objects in shared memory. The object-oriented abstraction of the data limits the extent of the shared-memory illusion, so these techniques prove reasonably effective. They make distributed programming resemble multithreaded programming.
Embedded computing also uses concurrency models other than threads. Programmable digital signal processors often have VLIW architectures. Video signal processors typically combine SIMD with VLIW and stream processing. Network processors provide explicit hardware support for streaming data. Yet despite a considerable amount of innovative research, in practice the programming models for these architectures remain primitive: developers write low-level assembly code that exploits specific hardware features, and combine it with C code only where performance is not critical.
An interesting feature of many embedded applications is that reliability and predictability matter far more than expressiveness or performance. Whether this should also hold for general-purpose computing is debatable, but that question lies outside the scope of this paper. I will argue that achieving reliability and predictability with threads is, for many applications, essentially impossible.
3 Threads as a model of computation
In this section, I analyze threads as a model of computation, without reference to particular libraries or languages, and show that the model is fundamentally flawed. In the next section, I examine some of the fixes that have been proposed for the model.
Let N = {0, 1, 2, ...} be the set of natural numbers, and let B = {0, 1} be the set of binary digits (bits). Let B* be the set of all finite sequences of bits, and let B^ω = (N → B) be the set of all infinite sequences of bits (each of which is a function mapping N into B). Further, let B** = B* ∪ B^ω [17]. We will use B** to represent the state of a computing machine together with its inputs (of which there may, potentially, be infinitely many) and its outputs (of which there may also be infinitely many). Let Q denote the set of all partial functions whose domain and codomain are B** (partial functions are functions that may or may not be defined on each element of their domain).
An imperative machine (A, c) is a finite set A ⊂ Q of atomic actions together with a control function c : B** → N. The set A represents the atomic actions (typically the instructions) of the machine, and the function c determines how these actions are sequenced. We assume that A contains a halt instruction h ∈ A with the property that h(b) = b for all b ∈ B**. That is, the halt instruction does not change the state of the machine.
A sequential program of length m ∈ N is a function p : N → A such that p(n) = h for all n ≥ m. That is, a sequential program is a finite sequence of ordinary instructions followed by an infinite sequence of halt instructions. Note that the set of all sequential programs, which we denote by P, is countably infinite.
An execution of such a program is a thread. It begins with an initial state b_0 ∈ B**, which represents the initial state of the machine together with its inputs, and for all n ∈ N,

    b_{n+1} = p(c(b_n))(b_n).    (1)

Here, c(b_n) is the index into the program p of the next instruction p(c(b_n)). That instruction is applied to the state b_n to obtain the next state b_{n+1}. If for some n ∈ N, c(b_n) ≥ m, then p(c(b_n)) = h and the program halts in state b_n (that is, the state never changes again). If the program p halts for all initial states, then p defines a total function in Q; if it halts only for some initial states, then it defines a partial function in Q. (It is easy to show that Q is uncountable: even the subset of constant functions in Q is uncountable, since B** is itself uncountable, as Cantor's diagonal argument shows. Since the set P of all finite programs is countable, not every function in Q can be given by a finite program, so any such sequential machine has limited expressiveness. Turing and Church [48] showed that many different choices of imperative machine (A, c) yield sets of programs P that define exactly the same subset of Q, namely the effectively computable functions.)

Let us now turn to the essence of the sequential model. Given a program and an initial state, the sequence (1) is defined. If the sequence halts, then the function computed by the program is defined. Any two programs p and p' can therefore be compared: they are equivalent if they compute the same partial function, that is, if they halt for the same initial states and for those states yield the same final state. (In the classical theory, programs that never halt are all considered equivalent; this creates serious problems for applying the theory of computation to embedded software, where programs never halt [34].) Such an equivalence theory is essential for any constructive formalism. These essential properties of programs are lost when multiple threads are used.
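Before considering threads, the sequential model above can be made concrete with a toy Java rendering of rule (1). This sketch is purely illustrative; the names, and the reduction of the state to an int array, are inventions of this rendering rather than part of the formal development.

    import java.util.List;
    import java.util.function.Function;
    import java.util.function.UnaryOperator;

    // A toy rendering of execution rule (1): the machine repeatedly asks the
    // control function for the index of the next instruction and applies that
    // instruction to the state. The state b is reduced to an int array here
    // purely for illustration.
    public class SequentialMachine {
        static int[] run(List<UnaryOperator<int[]>> program,   // p : N -> A (the finite prefix before the halt instructions)
                         Function<int[], Integer> control,     // c : state -> index of the next instruction
                         int[] state) {                        // b_0, the initial state and inputs
            while (true) {
                int index = control.apply(state);
                if (index >= program.size()) {
                    return state;                              // p(index) = h: the state never changes again
                }
                state = program.get(index).apply(state);       // b_{n+1} = p(c(b_n))(b_n)
            }
        }
    }

Because each action and the control function are deterministic, the final state (when the loop terminates) is a function of the initial state alone, which is exactly the property exploited by the equivalence argument above.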

Consider two programs p and p' that run in parallel. What we mean by this is that expression (1) is replaced by the following: for all n ∈ N,

    b_{n+1} = p(c(b_n))(b_n)   or   b_{n+1} = p'(c'(b_n))(b_n),    (2)

where c' is the control function that tracks the position of the second program. At each step n, either of the two programs may supply the next atomic action. Let us now check whether a constructive theory of equivalence is obtained. That is, when are two such multithreaded pairs of programs (p1, p2) and (p1', p2') equivalent? A direct extension of the basic theory defines them to be equivalent when all interleavings halt for the same initial states and reach the same final state. The sheer number of possible interleavings makes such equivalence extremely difficult to prove, except in trivial cases, for example when the state B** is partitioned so that the execution of each program does not affect the portions of the state touched by the other programs.
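To get a feel for the combinatorics behind this difficulty, the interleavings can simply be counted; the figures below are standard combinatorics rather than part of the formal development above. Two threads that perform m and m' atomic actions respectively (each thread's own order being fixed) can be interleaved in

    \binom{m + m'}{m} = \frac{(m + m')!}{m! \, m'!}

ways. For two threads of only ten atomic actions each, this already gives \binom{20}{10} = 184{,}756 distinct interleavings, each of which may in principle reach a different final state.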
Thus, when two programs p and p' run in a multithreaded environment, we can no longer reason about their equivalence in isolation. We would need to know about every other thread that might run, and we would have to analyze all possible interleavings. We can only conclude that, with threads, no usable theory of equivalence remains.
Worse still, implementing a multithreaded model of computation correctly is itself extremely difficult. Witness, for example, the subtleties of the Java memory model [41, 24], under which even very simple programs can behave in surprising ways.
The core abstraction of computation expressed by formula (1), on which all widely used programming languages are built, emphasizes deterministic composition of deterministic components: the actions are deterministic, and their sequential composition is deterministic. Sequential execution is, in essence, function composition, a neat model in which deterministic components yield deterministic results.
Threads, by contrast, are wildly nondeterministic, and the programmer's job becomes one of pruning that nondeterminism. We have, of course, developed tools to help: semaphores, monitors, and more advanced threading mechanisms (discussed in the next section). These techniques let the programmer prune away much of the nondeterminism, but the pruning is rarely sufficient.
Let me offer another analogy. Suppose we asked a mechanical engineer to design an internal combustion engine out of iron, cans, and hydrocarbon and oxygen molecules moving randomly under the influence of temperature. Thermodynamics and chemistry tell us that this is possible in theory, but is it practical?
Consider a third analogy. A folk definition of insanity is doing the same thing over and over again and expecting different results. By this definition, we in effect require that programmers of multithreaded systems be insane; were they sane, they could not understand their programs.
I will argue that we can and should build concurrent models of computation that are as deterministic as possible. Nondeterminism should be introduced carefully and only where it is needed, just as it is in sequential programming. Threads take the opposite approach: they make programs absurdly nondeterministic and rely on programming style to constrain that nondeterminism in order to achieve deterministic aims.
4 How bad is it in practice?
I have argued that threads, as a model of computation, are deeply flawed. Yet in practice, many programmers today write multithreaded programs that work.
Is there a contradiction here? In practice, programmers are given tools that prune away much of the nondeterminism. Object-oriented programming, for example, limits the visibility that parts of a program have into one another's state; this effectively partitions the state space B** into disjoint regions. Where programs do operate on shared parts of this state space, semaphores, mutual-exclusion locks, and monitors (objects with mutually exclusive methods) provide mechanisms that programs can use to prune away still more nondeterminism. But in practice, these techniques yield understandable programs only for very simple interactions.
Consider the well-known and widely used Observer pattern [22]. Listing 1 shows a Java implementation with two key methods: a call to setValue() changes a value and triggers a change notification, by calling valueChanged() on every object that has previously registered via addListener().
public class ValueHolder {
    private List listeners = new LinkedList();
    private int value;

    public interface Listener {
        public void valueChanged(int newValue);
    }

    public void addListener(Listener listener) {
        listeners.add(listener);
    }

    public void setValue(int newValue) {
        value = newValue;
        Iterator i = listeners.iterator();
        while (i.hasNext()) {
            ((Listener) i.next()).valueChanged(newValue);
        }
    }
}
Listing 1. Java implementation of the Observer pattern, designed to run in one thread
The code in Listing 1 is not thread safe: if several threads call setValue() or addListener(), the listeners list may be modified while the iterator is traversing it. This triggers an exception and, most likely, an unexpected program termination.
The simplest fix is to add the Java keyword synchronized to the definitions of setValue() and addListener(). The synchronized keyword makes these methods mutually exclusive, turning instances of ValueHolder into monitors and preventing any two threads from executing the methods on the same object at the same time. When a synchronized method is called, the calling thread attempts to acquire an exclusive lock on the object; if another thread holds that lock, the calling thread blocks until the lock is released.
However, this solution is ill-advised, because it can lead to deadlock. Suppose we have an instance a of ValueHolder and an instance b of some other class that implements the Listener interface. That class can do anything in its valueChanged() method, including acquiring a lock on some other monitor. If it blocks waiting for that lock, it keeps holding the lock on the ValueHolder object a. Meanwhile, another thread that already holds the other lock may call addListener() on a. Both threads are now blocked with no hope of ever being released. This kind of potential deadlock lurks in many programs that use monitors.
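As a concrete illustration of this scenario (a hypothetical listener, assuming setValue() and addListener() have been declared synchronized as just described):

    // A sketch of the deadlock described above; the class and method names
    // are invented for illustration.
    public class DeadlockProneListener implements ValueHolder.Listener {
        private final Object otherLock = new Object();

        public void valueChanged(int newValue) {
            // Called from inside a's synchronized setValue(), so the calling
            // thread already holds a's monitor and now waits for otherLock.
            synchronized (otherLock) {
                // ... react to the change ...
            }
        }

        public void register(ValueHolder a) {
            // A second thread holds otherLock here and then waits for a's
            // monitor inside the synchronized addListener(). Each thread now
            // waits for the lock the other one holds: deadlock.
            synchronized (otherLock) {
                a.addListener(this);
            }
        }
    }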
Consider the improved implementation shown in Listing 2. While holding the lock, setValue() copies the listeners list. Since addListener() is synchronized, the copy avoids the concurrent-modification error. The valueChanged() calls are then made outside the synchronized block, which avoids the deadlock.
public class ValueHolder {
    private List listeners = new LinkedList();
    private int value;

    public interface Listener {
        public void valueChanged(int newValue);
    }

    public synchronized void addListener(Listener listener) {
        listeners.add(listener);
    }

    public void setValue(int newValue) {
        List copyOfListeners;
        synchronized (this) {
            value = newValue;
            copyOfListeners = new LinkedList(listeners);
        }
        Iterator i = copyOfListeners.iterator();
        while (i.hasNext()) {
            ((Listener) i.next()).valueChanged(newValue);
        }
    }
}
Listing 2. A Java implementation of the Observer pattern that attempts to be thread-safe.
It is tempting to believe that the problem is now solved, but in fact the code is still incorrect. Suppose two threads call setValue(). One of them will set the value last, and that value will remain in the object's field. But the listeners may be notified of the two changes in the opposite order, and may therefore be left believing that the final value held by the ValueHolder object is the wrong one.
Of course, the pattern can be implemented in Java so that it behaves correctly (I leave this as an exercise for the reader). The point is that even this simple and commonly used design pattern requires a rather intricate analysis of the possible interleavings of calls. I suspect that most multithreaded programs contain such errors. Moreover, I believe these errors have not yet become major obstacles only because today's architectures and operating systems offer only modest parallelism: the cost of a context switch is high, so only a tiny fraction of the possible interleavings of thread instructions ever occurs in practice. My guess is that most general-purpose multithreaded applications in fact contain many concurrency bugs, and that as multicore architectures become commonplace, these bugs will begin to show up as system failures. This is a grim scenario for computer vendors: their next generation of machines would become known as the machines on which many programs crash.
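For illustration, here is one way the exercise might be approached (a sketch, not necessarily the solution the author has in mind): perform the state update and the snapshot of the listener list atomically, and hand the notifications to a single worker thread, so that listeners observe changes in exactly the order the values were committed, while no lock is held during the callbacks. The price is that notifications become asynchronous with respect to setValue().

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SafeValueHolder {
        public interface Listener {
            void valueChanged(int newValue);
        }

        private final List<Listener> listeners = new ArrayList<>();
        private int value;
        // A single worker thread delivers notifications one batch at a time,
        // in the order in which the corresponding setValue() calls committed.
        private final ExecutorService notifier = Executors.newSingleThreadExecutor();

        public synchronized void addListener(Listener listener) {
            listeners.add(listener);
        }

        public synchronized void setValue(int newValue) {
            value = newValue;
            final int committed = newValue;
            final List<Listener> snapshot = new ArrayList<>(listeners);
            // The task is enqueued while the lock is held, so the queue order
            // matches the order in which values were written. The callbacks run
            // on the notifier thread with no lock held, so a listener that takes
            // its own locks cannot deadlock with this object.
            notifier.execute(() -> {
                for (Listener l : snapshot) {
                    l.valueChanged(committed);
                }
            });
        }
    }

(The sketch omits shutdown of the notifier thread and other production concerns.)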
Those same vendors advocate multithreaded programming because they now have hardware support for concurrency and want us to buy it. For example, Intel has been running an active campaign to get leading computer-science curricula to focus on multithreaded programming. If the campaign succeeds, the next generation of programmers will use threads even more intensively, and the next generation of computers will become nearly unusable.
5 Improving the thread model by pruning nondeterminism more aggressively
In this section, I discuss approaches that attack the problem named in the section title: they keep the thread-based model of computation for programmers, but give them more effective mechanisms for pruning the nondeterminism of program behavior.
The first technique is better software engineering process, which is essential for building reliable multithreaded programs; it is not, however, sufficient. A cautionary tale comes from the Ptolemy Project at UC Berkeley. In early 2000, my group began building the Ptolemy II kernel [20], a modeling environment that supports concurrent models of computation. An early goal was to allow concurrent programs to be modified through the GUI while they execute. The difficulty was to ensure that no thread could ever see an inconsistent view of the program structure. The strategy was to use Java threads and monitors.
One of the goals of the project was to test whether effective software engineering practices could be brought into an academic research setting. We developed a process that included a four-level code maturity rating (red, yellow, green, and blue), design reviews, code reviews, nightly builds, regression tests, and automated code coverage metrics [43]. The part of the kernel that guaranteed a consistent view of the program structure was written in early 2000; the design was reviewed to yellow on that scale and the code to green. The reviewers included experts in concurrent programming. We wrote regression tests that achieved 100 percent code coverage. The nightly builds and regression tests ran on a two-processor SMP machine, whose thread behavior differed from that of the single-processor development machines. The Ptolemy II system itself became widely used. No problems were observed until the code deadlocked on April 26, 2004, four years after the start of development.
Our relatively rigorous practice for designing concurrent programs clearly helped to identify and fix many concurrency bugs. But the fact that a bug as serious as a deadlock survived it is alarming. How many more such problems remain? How long must we test before we can be sure there are none? Regrettably, I must conclude that testing cannot reveal all the problems in nontrivial multithreaded code.
There are, of course, seemingly simple rules for avoiding deadlock [32], for example: always acquire locks in the same order. But such a rule is very hard to apply in practice, because no widely used language has a mechanism that makes visible which locks a method call may acquire. One must examine the source code of every method that is called, every method those methods call, and so on. Even if we fixed this language problem by making acquired locks part of a method's signature, the rule would make it very hard to implement symmetric accesses (where interactions can originate at either end). No one has found a way to make programming with mutually exclusive locks anything other than exceedingly difficult to reason about. If programmers cannot understand their code, the code will not be reliable.
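To illustrate the rule itself, and implicitly how much global knowledge it demands, here is a minimal sketch. The explicit ordering key is an invented device for this example; nothing in the cited rule prescribes this particular scheme.

    // Always acquire the two monitors in a fixed global order (here, by an
    // explicit id), so that two concurrent transfers between the same pair of
    // accounts cannot deadlock.
    public class Account {
        private static int nextId = 0;
        private final int id;
        private long balance;

        public Account(long initialBalance) {
            synchronized (Account.class) {
                id = nextId++;
            }
            balance = initialBalance;
        }

        static void transfer(Account from, Account to, long amount) {
            Account first = (from.id < to.id) ? from : to;
            Account second = (from.id < to.id) ? to : from;
            synchronized (first) {
                synchronized (second) {
                    from.balance -= amount;
                    to.balance += amount;
                }
            }
        }
    }

The discipline works only if every code path that ever holds two of these locks follows the same ordering, which is exactly the whole-program knowledge the text says is so hard to obtain.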
One might conclude that the problem lies in how Java realizes threads, and that the synchronized keyword is simply not the best tool for pruning nondeterminism. Indeed, version 5.0 of Java, introduced in 2005, added a number of further mechanisms for synchronizing threads, enriching the programmer's toolkit for pruning nondeterminism. But these mechanisms (semaphores, for example) are still difficult to use, and their use is still very likely to yield incomprehensible programs with subtle, lurking bugs.
Better software engineering process alone will not solve the problem. Another approach that can help is the use of design patterns that have been worked out specifically for concurrent computation [32, 44]. These are indeed an enormous help when the task at hand matches one of the patterns. There are, however, two difficulties. First, implementing a pattern can be complex even with detailed instructions; programmers make mistakes, and there is no technology for automatically verifying that a pattern has been implemented correctly. More importantly, patterns can be difficult to combine: their properties are usually not composable, so nontrivial programs that need more than one pattern are unlikely to be understandable.
A more widely used instance of concurrency patterns is found in databases, particularly in the transaction mechanism. Transactions support speculative, unsynchronized computation on a copy of the data, followed by a commit or an abort; a transaction commits only if no conflict arose during its execution. Transactions can be distributed (as is usual for databases), implemented in software on shared-memory machines [45], or, most interestingly, supported in hardware on shared-memory machines [38]. In the latter case, the technique meshes well with the cache consistency protocols that such machines require anyway. Transactions eliminate unintended deadlocks, but despite recent improvements (see [26]) they remain a highly nondeterministic interaction mechanism. They are well suited to intrinsically nondeterministic situations where, for example, several participants compete for resources; they are not well suited to building deterministic concurrent interactions.
A particularly interesting use of patterns is MapReduce [19], which Google uses for large-scale distributed processing of huge data sets. Whereas most concurrency patterns are fine-grained, addressing shared data structures and synchronized access to them, MapReduce is a framework for building large distributed programs. It was inspired by the higher-order functions of Lisp and other functional languages: the parameters of the framework are pieces of functionality supplied as code, rather than pieces of data.
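A toy sketch of this idea follows; the names and signatures are invented for the example, and this is not Google's MapReduce API. The point is only that the parameters of the framework are functions supplied as code.

    import java.util.List;
    import java.util.Map;
    import java.util.function.BiFunction;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    public class ToyMapReduce {
        // Apply the map function to every input element (possibly in parallel)
        // and combine values that share a key with the reduce function.
        public static <I, K, V> Map<K, V> run(List<I> input,
                                              Function<I, Map.Entry<K, V>> map,
                                              BiFunction<V, V, V> reduce) {
            return input.parallelStream()
                    .map(map)
                    .collect(Collectors.toMap(Map.Entry::getKey,
                                              Map.Entry::getValue,
                                              reduce::apply));
        }
        // Example: run(words, w -> Map.entry(w, 1), Integer::sum)
        // counts how many times each word occurs.
    }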
Patterns can be encapsulated into libraries by experts, as has been done with MapReduce, with the concurrent data structures of Java 5.0, and with STAPL for C++ [1]. This greatly improves the reliability of implementations, but it requires the programmer to have the discipline to perform all concurrent interactions through these libraries. Folding the capabilities of such libraries into languages whose syntax and semantics enforce these constraints could eventually make concurrent programs far easier to build.
Higher-level patterns such as MapReduce pose interesting challenges and opportunities for the designers of programming languages. Such patterns function more as coordination languages than as traditional programming languages. New coordination languages that are compatible with established languages such as Java and C++ are far more likely to gain acceptance than languages that seek to replace them.
A compromise is to extend established programming languages with keywords that support concurrency. This allows most legacy code to be reused where concurrent programming is not needed, while requiring modifications where it is. This strategy is followed by Split-C [16] and Cilk [13], both C-like languages supporting multithreading. In Cilk, the programmer inserts the keywords cilk, spawn, and sync into an otherwise ordinary C program. Cilk aims to support dynamic multithreading in a shared-memory context in a way that incurs cost only when parallelism is actually exploited (so that running on a single processor adds almost no overhead).
A related approach combines language extensions with restrictions on the expressiveness of existing languages in order to obtain more consistent and predictable behavior. For example, the Guava language [5] is a restriction of Java in which unsynchronized objects cannot be accessed from multiple threads. It also distinguishes explicitly between locks that guarantee the integrity of data being read (read locks) and locks that permit safe modification (write locks). These language changes prune away a considerable amount of nondeterminism without much loss of performance, but the risk of deadlock remains.
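The read-lock/write-lock distinction that Guava builds into the language can be illustrated with Java's standard library; the following is an analogy using java.util.concurrent.locks, not the Guava mechanism itself.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class SharedCounter {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private int value;

        public int read() {
            lock.readLock().lock();      // many readers may hold the read lock at once
            try {
                return value;
            } finally {
                lock.readLock().unlock();
            }
        }

        public void increment() {
            lock.writeLock().lock();     // a writer gets exclusive access
            try {
                value++;
            } finally {
                lock.writeLock().unlock();
            }
        }
    }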
Another approach, one that emphasizes the avoidance of deadlock, is known as promises, as realized by Mark Miller in the E programming language (see www.erights.org). Here, instead of blocking to access shared data, a program interacts with a proxy for the data as if it already had the data itself.
Yet another approach leaves the programming languages and concurrency mechanisms untouched and instead relies on formal analysis of programs to identify potential concurrency errors in multithreaded code. This is done, for example, in Blast [28] and in the Intel thread checker. Such tools can greatly help in analyzing programs that are otherwise hard to understand. Less formal techniques, such as the Valgrind family of debugging tools, also help. Nevertheless, both the formal and the informal techniques still require considerable expertise to apply, and they suffer from scalability limitations.
All of the techniques above prune away some of the nondeterminism of the threaded programming model. Nevertheless, the resulting programs remain highly nondeterministic. For applications whose goals are intrinsically nondeterministic (servers, concurrent database access, competition for resources), this may be acceptable. But achieving deterministic aims through nondeterministic means remains hard. Obtaining deterministic concurrent computation requires approaching the problem from a different direction: instead of starting with a highly nondeterministic mechanism such as threads and leaving it to the programmer to prune the nondeterminism, we should start with deterministic mechanisms and introduce nondeterminism only where it is needed. We examine this approach in the following sections.
6 Alternatives to threads
Consider once more the Observer pattern [22] shown in Listings 1 and 2. It is a trivial design pattern, and its implementation ought to be simple; yet, as shown above, implementing it correctly with threads is not easy.
Consider Figure 1, which shows an implementation of this pattern in a rendezvous-based coordination framework in Ptolemy II [20]. The labeled rectangle in the upper left corner is a diagram annotation declaring that this is a CSP-like [29] concurrent program (CSP, Communicating Sequential Processes, is a formalism for describing interaction patterns in concurrent systems) in which each component is a process and interaction takes place by rendezvous. The processes themselves are written in ordinary Java, so this environment can be regarded as a coordination language (one that happens to have a visual syntax). The rendezvous framework in Ptolemy II is based on [2] and includes a Merge block that implements conditional rendezvous. In the diagram, the Merge block specifies that either of the two Value Producers may participate in a rendezvous with the Value Consumer and the Observer. That is, one of two possible three-way rendezvous can occur, and these interactions may repeat in nondeterministic order.

Figure 1. The Observer pattern implemented in a rendezvous-based visual coordination language.
Note first that, once the meaning of the icons is understood, the diagram expresses the Observer pattern clearly. Moreover, everything in this program is deterministic except the explicitly nondeterministic interaction specified by the Merge block. Were that block absent, we would have a program specifying deterministic interactions between deterministic processes.
Deadlock cannot occur here (the absence of cycles in the diagram makes this easy to see). Moreover, the three-way rendezvous ensures that the Value Consumer and the Observer receive the values in the same order. The Observer pattern has become simple, as it should be. A trivial programming problem is trivial again, and we can begin to consider more interesting elaborations. Figure 2 shows the Observer pattern implemented under the PN Director, which realizes a concurrency model called Kahn process networks (PN) [31]. In this model, each icon again represents a process, but instead of interacting by rendezvous, the processes communicate by sending messages through conceptually unbounded queues with blocking reads. In the original PN model [31], the blocking reads guarantee that every network computes a deterministic function. Here the model is augmented with one additional primitive, a nondeterministic merge, which explicitly interleaves streams of messages. Note that this augmented model, with its explicit nondeterminism, is commonly used for embedded software [18].
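The flavor of this model can be conveyed with a small sketch in plain Java (this is not Ptolemy II code; the structure is invented for illustration). Processes communicate only through queues with blocking reads, so, in the absence of a nondeterministic merge, the observer sees values in exactly the order the producer sent them, regardless of how the threads are scheduled.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class PnObserverSketch {
        public static void main(String[] args) {
            BlockingQueue<Integer> toConsumer = new LinkedBlockingQueue<>();
            BlockingQueue<Integer> toObserver = new LinkedBlockingQueue<>();

            Thread producer = new Thread(() -> {
                for (int v = 0; v < 5; v++) {
                    toConsumer.add(v);   // "setValue": send the new value downstream
                    toObserver.add(v);   // and notify the observer, in the same order
                }
            });
            Thread consumer = new Thread(() -> {
                try {
                    for (int i = 0; i < 5; i++) {
                        System.out.println("consumer got " + toConsumer.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            Thread observer = new Thread(() -> {
                try {
                    for (int i = 0; i < 5; i++) {
                        System.out.println("observer notified of " + toObserver.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            consumer.start();
            observer.start();
        }
    }

Threads appear here only as the execution substrate; the programming model the components see consists solely of queues with blocking reads.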
Figure 2. The Observer pattern implemented in a Kahn process network (PN) coordination language.
The model in Figure 2 has all the advantages of the one in Figure 1, plus one more: the Value Consumer need not even explicitly support the Observer pattern, since notifications can simply be queued for later processing. In a threaded implementation, supporting this form of interaction takes real effort.
A third implementation could address the nature of the nondeterminism expressed by the nondeterministic merge; the principles of synchronous languages [8] could be used to give it well-defined behavior. In Ptolemy II, such a model is obtained with the SR Director (synchronous/reactive), which realizes a synchronous model related to Esterel [12], SIGNAL [10], and Lustre [25]. The latter is successfully used to design concurrent, safety-critical software for aircraft control applications [11]; using a threaded programming model for that purpose would be folly.
A fourth implementation could focus on the timing of the nondeterministic events. In Ptolemy II, such a model is provided by the DE Director (discrete events). It gives a timed specification of the interactions with rigorous semantics [33], and is related to hardware description languages such as VHDL and Verilog and to network modeling environments such as Opnet Modeler.
In all four cases (rendezvous, PN, SR, DE), we start with interactions that are fully deterministic; nondeterminism is then introduced carefully, exactly where it is needed. This design style is fundamentally different from the thread style, which starts with a wildly nondeterministic interaction mechanism and tries to prune away the undesired nondeterminism.
The implementations shown in Figures 1 and 2 use Java threads underneath, but the programming model they present is not the thread model. Of the techniques described in the previous section, this is closest in spirit to MapReduce, which also streams data through computations. Unlike MapReduce, however, it rests on a full coordination language that is expressive enough to describe a broad range of interactions; in fact, two different coordination languages appear here, one with rendezvous semantics and one with PN semantics.
This kind of concurrency is by no means new. Component architectures in which data flow between components, sometimes called "actor-oriented" [35], take many forms. Unix pipes resemble PN, although they are more limited in that they do not support cyclic graphs. Message-passing packages such as MPI and OpenMP include facilities for implementing both rendezvous and PN, but in a less structured context that emphasizes expressiveness over determinism of interaction; a naive user of such a package can be badly surprised when nondeterminism shows up in the results. Languages such as Erlang [4] make message passing an integral part of a general-purpose language; languages such as Ada do the same with rendezvous. Functional languages [30] and single-assignment languages also emphasize deterministic computation, but they are less explicitly concurrent, so controlling and exploiting concurrency can be harder. Data-parallel languages likewise emphasize deterministic interaction, but they require fairly low-level rewriting of software.
All of these approaches are part of the solution, but it does not appear that any of them will become mainstream on its own. In the next section, I will argue why.
7 Challenges and opportunities
The fact is that alternatives to threads have been around for a long time, yet threads still dominate. There are many reasons for this; probably the main one is that the core abstraction of computation is stuck in the sequential paradigm, and most widely used programming languages follow that paradigm. Syntactically, threads are either a minor extension of these languages (as in Java) or just an external library. Semantically, they thoroughly destroy the essential determinism of the languages. Regrettably, programmers seem to be guided more by syntax than by semantics. Widely accepted alternatives to threads, such as MPI and OpenMP, share this key feature: they do not change the syntax of the language. Alternatives that replace these languages wholesale with a new syntax (for example, Erlang or Ada) have not achieved broad acceptance and probably never will. Even languages with minimal syntax modifications, such as Split-C and Cilk, remain esoteric.
The conclusion is that we should not try to replace established languages; we should build on them. Building on them with libraries alone, however, is not enough. Libraries offer little structure, no enforcement of patterns, and few composable properties.
I believe that the right answer lies in developing coordination languages. A coordination language does introduce new syntax, but that syntax serves purposes that are orthogonal to those of established languages. Whereas a concurrent general-purpose language such as Erlang or Ada must include syntax for mundane operations (for example, arithmetic expressions), a coordination language need include nothing beyond what is required for coordination, and its syntax can be quite different. The "programs" shown in Figures 1 and 2 use a visual syntax to specify actor-oriented coordination (in which data flow between components rather than control). Although the visual syntax here serves mainly pedagogical purposes, it may ultimately prove as scalable and practical as the widely used parts of UML are for object-oriented design; and even if it does not, it is easy to envision a textual syntax with the same structure.
Coordination languages were invented long ago [40], but they too have repeatedly failed to achieve wide adoption. One reason is the resulting loss of homogeneity in a design. The prevailing ethos in programming-language research holds that any worthwhile language must be general-purpose: it must, at a minimum, be expressive enough to describe its own compiler. And adherents of a language are seen as traitors if they embrace another one. Language wars are religious wars, and few of these religions are polytheistic. Yet UML, for example, is routinely used in conjunction with C++ and Java; programmers are beginning to use more than one language, with complementary features provided by the different languages. The programs in Figures 1 and 2 follow this trend.
Concurrency models with stronger determinism than threads, such as Kahn process networks and CSP, have also been around for a long time. Some have led to programming languages, such as Occam [21] (based on CSP), and some to specialized development environments, such as YAPI [18]. None has had much effect on mainstream programming.
I believe that this can change if concurrency models are used to build coordination languages rather than to replace established languages.
There are many challenges along the way. Designing a good coordination language is no easier than designing a good general-purpose language. For example, it is easy to fall into the trap of an elegant-looking but impractically minimal set of primitives. It is well known that seven primitives suffice to build a general-purpose language [48], but no serious language is built on those primitives alone.
Figure 3 shows two implementations of a simple concurrent computation. The upper program is adapted from an example in [3]: the outputs of data sources 1 and 2 are interleaved in a particular order, reflected in the alternation seen at the Display block. This rather simple functionality is, however, achieved in a complicated way; understanding how the model works is comparable to solving a puzzle. The lower program is much easier to understand: the Commutator block performs a rendezvous with each of its inputs in top-to-bottom order, which yields the same interleaving. A judicious choice of language primitives allows deterministic aims to be expressed simply, directly, and deterministically. The upper model uses a nondeterministic mechanism, which is admittedly more expressive, to achieve a deterministic aim; such use is wasteful.

Figure 3. Two ways to specify a deterministic interleaving using rendezvous.

Of course, coordination languages need mechanisms for scalability and modularity comparable to those of established languages. This can be done. Ptolemy II, for example, has a modern type system at the coordination-language level [49], and it offers preliminary forms of inheritance and polymorphism adapted from object-oriented techniques [35]. Great potential also lies in adapting higher-order functions to coordination languages, which would permit MapReduce-like frameworks to be built at the coordination level; some promising early steps in this direction were taken in [42].
More important, in the long run, is to develop the theory of computation so that it gives a better account of concurrency. Considerable progress has already been made in this direction, but I believe much more can be done. In Section 3, sequential computation was modeled as functions mapping bit sequences into bit sequences. A corresponding concurrent model was introduced in [36]: instead of functions of the form

    f : B** → B**,

concurrent computations are given by functions of the form

    f : (T → B**) → (T → B**).

Here T is a partially or totally ordered set of tags, where the ordering can express time, causality, or more abstract dependency relations. A computation in this model maps an evolving bit pattern into an evolving bit pattern. This basic formulation can be specialized to many concurrent models of computation [9, 14, 37].
8 Conclusion
Concurrency in software is difficult. However, much of the difficulty is a consequence of the abstractions we have chosen and continue to use. The dominant abstraction is threads, and most nontrivial multithreaded programs are hard to understand. The threaded model can be improved by using design patterns, better atomicity (for example, transactions), language constraints, and formal methods. These techniques, however, merely trim away a vast excess of nondeterminism; the model remains, as before, difficult to use.
If we expect concurrent programming to become mainstream, and if we demand reliability and predictability from programs, then we must abandon threads as the programming model. More understandable and predictable models of concurrency should be developed, built on the principle that deterministic aims should be achieved by deterministic means. Nondeterminism should be introduced carefully and explicitly, only where the program needs it. This principle, which seems obvious, is not what the thread model gives us. Threads should be relegated to the engine room of computing, to be used and distributed only by expert providers of the technology.
References
[1] P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger. STAPL: An adaptive, generic parallel C++ library. In Wkshp. on Lang. and Comp. for Par. Comp. (LCPC), pages 193–208, Cumberland Falls, Kentucky, 2001.
[2] F. Arbab. Reo: A channel-based coordination model for component composition. Mathematical Structures in Computer Science, 14(3):329–366, 2004.
[3] F. Arbab. A behavioral model for software components. L'Objet, to appear, 2006.
[4] J. Armstrong, R. Virding, C. Wikstrom, and M. Williams. Concurrent Programming in Erlang. Prentice Hall, second edition, 1996.
[5] D. F. Bacon, R. E. Strom, and A. Tarafdar. Guava: a dialect of Java without data races. ACM SIGPLAN Notices, 35:382–400, 2000.
[6] U. Banerjee, R. Eigenmann, A. Nicolau, and D. Padua. Automatic program parallelization. Proceedings of the IEEE, 81(2):211–243, 1993.
[7] L. A. Barroso. The price of performance. ACM Queue, 3(7), 2005.
[8] A. Benveniste and G. Berry. The synchronous approach to reactive and real-time systems. Proceedings of the IEEE, 79(9):1270–1282, 1991.
[9] A. Benveniste, L. Carloni, P. Caspi, and A. Sangiovanni-Vincentelli. Heterogeneous reactive systems modeling and deployment. In EMSOFT. Springer, 2003.
[10] A. Benveniste and P. L. Guernic. Hybrid dynamical systems theory and the SIGNAL language. IEEE Tr. on Automatic Control, 35(5):525–546, 1990.
[11] G. Berry. Synchronous languages for the development of safety-critical systems. White paper, Esterel Technologies, 2003.
[12] G. Berry and G. Gonthier. The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming, 19(2):87–152, 1992.
[13] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM SIGPLAN Notices, 1995.
[14] J. R. Burch, R. Passerone, and A. L. Sangiovanni-Vincentelli. Notes on agent algebras. Technical Report UCB/ERL M03/38, University of California, November 2003.
[15] M. Creeger. Multicore CPUs for the masses. ACM Queue, 3(7), 2005.
[16] D. E. Culler, A. Dusseau, S. C. Goldstein, A. Krishnamurthy, S. Lumetta, T. v. Eicken, and K. Yelick. Parallel programming in Split-C. In Supercomputing, Portland, OR, 1993.
[17] B. A. Davey and H. A. Priestly. Introduction to Lattices and Order. Cambridge University Press, 1990.
[18] E. A. de Kock, G. Essink, W. J. M. Smits, P. van der Wolf, J.-Y. Brunel, W. Kruijtzer, P. Lieverse, and K. A. Vissers. YAPI: Application modeling for signal processing systems. In 37th Design Automation Conference (DAC'00), pages 402–405, Los Angeles, CA, 2000.
[19] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Sixth Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA, 2004.
[20] J. Eker, J. W. Janneck, E. A. Lee, J. Liu, X. Liu, J. Ludvig, S. Neuendorffer, S. Sachs, and Y. Xiong. Taming heterogeneity — the Ptolemy approach. Proceedings of the IEEE, 91(2), 2003.
[21] J. Galletly. Occam-2. University College London Press, 2nd edition, 1996.
[22] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison Wesley, 1994.
[23] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, B. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine: A User's Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge, MA, 1994.
[24] A. Gontmakher and A. Schuster. Java consistency: nonoperational characterizations for Java memory behavior. ACM Trans. Comput. Syst., 18(4):333–386, 2000.
[25] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data flow programming language LUSTRE. Proceedings of the IEEE, 79(9):1305–1319, 1991.
[26] T. Harris, S. Marlow, S. P. Jones, and M. Herlihy. Composable memory transactions. In ACM Conference on Principles and Practice of Parallel Programming (PPoPP), Chicago, IL, 2005.
[27] H. G. Baker, Jr. and C. Hewitt. The incremental garbage collection of processes. In Proceedings of the Symposium on AI and Programming Languages, volume 12 of ACM SIGPLAN Notices, pages 55–59, 1977.
[28] T. A. Henzinger, R. Jhala, R. Majumdar, and S. Qadeer. Thread-modular abstraction refinement. In 15th International Conference on Computer Aided Verification (CAV), volume 2725 of Lecture Notes in Computer Science, pages 262–274. Springer-Verlag, 2003.
[29] C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8), 1978.
[30] P. Hudak. Conception, evolution, and application of functional programming languages. ACM Computing Surveys, 21(3), 1989.
[31] G. Kahn and D. B. MacQueen. Coroutines and networks of parallel processes. In B. Gilchrist, editor, Information Processing. North-Holland Publishing Co., 1977.
[32] D. Lea. Concurrent Programming in Java: Design Principles and Patterns. Addison-Wesley, Reading, MA, 1997.
[33] E. A. Lee. Modeling concurrent real-time processes using discrete events. Annals of Software Engineering, 7:25–45, 1999.
[34] E. A. Lee. What's ahead for embedded software? IEEE Computer Magazine, pages 18–26, 2000.
[35] E. A. Lee and S. Neuendorffer. Classes and subclasses in actor-oriented design. In Conference on Formal Methods and Models for Codesign (MEMOCODE), San Diego, CA, USA, 2004.
[36] E. A. Lee and A. Sangiovanni-Vincentelli. A framework for comparing models of computation. IEEE Transactions on CAD, 17(12), 1998.
[37] X. Liu. Semantic foundation of the tagged signal model. PhD thesis, EECS Department, University of California, December 20, 2005.
[38] M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289–300, San Diego, California, United States, 1993. ACM Press.
[39] Message Passing Interface Forum. MPI2: A message passing interface standard. International Journal of High Performance Computing Applications, 12(1–2):1–299, 1998.
[40] G. Papadopoulos and F. Arbab. Coordination models and languages. In M. Zelkowitz, editor, Advances in Computers — The Engineering of Large Systems, volume 46, pages 329–400. Academic Press, 1998.
[41] W. Pugh. Fixing the Java memory model. In ACM 1999 Conference on Java Grande, pages 89–98, San Francisco, California, United States, 1999. ACM Press.
[42] H. J. Reekie. Toward effective programming for parallel digital signal processing. Ph.D. Thesis Research Report 92.1, University of Technology, Sydney, 1992.
[43] H. J. Reekie, S. Neuendorffer, C. Hylands, and E. A. Lee. Software practice in the Ptolemy project. Technical Report Series GSRC-TR-1999-01, Gigascale Semiconductor Research Center, University of California, Berkeley, April 1999.
[44] D. C. Schmidt, M. Stal, H. Rohnert, and F. Buschmann. Pattern-Oriented Software Architecture — Patterns for Concurrent and Networked Objects. Wiley, 2000.
[45] N. Shavit and D. Touitou. Software transactional memory. In ACM Symposium on Principles of Distributed Computing, pages 204–213, Ottawa, Ontario, Canada, 1995. ACM Press.
[46] L. A. Stein. Challenging the computational metaphor: Implications for how we think. Cybernetics and Systems, 30(6), 1999.
[47] H. Sutter and J. Larus. Software and the concurrency revolution. ACM Queue, 3(7), 2005.
[48] A. M. Turing. Computability and λ-definability. Journal of Symbolic Logic, 2:153–163, 1937.
[49] Y. Xiong. An extensible type system for component-based design. Ph.D. Thesis, Technical Memorandum UCB/ERL M02/13, University of California, Berkeley, CA 94720, May 1, 2002.