
Gradual programming

Programming is fundamentally an incremental (gradual, step-by-step) process, and the programming languages we use should reflect that. This article discusses several axes along which software models evolve, and asks how future research on the usability of programming languages could shape the design of human-centered programming languages.

Choosing the right problems


What serious problems do the programming languages we use in 2018 have? Which of them, once solved, would have the greatest effect on the next generation of programmers?
If you are interested in this question, I recommend the post “What's next?” by Graydon Hoare (the creator of Rust), as well as Stephen Diehl's post “The near future of programming languages”.
For me, the most appealing aspect of programming language research lies in this question: the tools and theories we develop affect not just one specific area, but potentially everyone who programs. That raises the next question: how on earth are we supposed to know the needs of every programmer on the planet? It is easy to work on language X, based on a new type theory, or on language Y, which has a new feature that interests me personally, but what about all the other programmers?

This is one of the most serious shortcomings of programming languages as a modern field of study. A huge amount of this research is driven by the intuition of the researchers themselves, which is in turn shaped by their particular experience with specific programming tools, languages, environments and so on. Clearly, intuition has taken us quite far, since we have reached our current level, which supports the thesis that smart people tend to have good intuition. Still, I would suggest that the apparent stagnation in the widespread adoption of modern PL research stems first of all from a lack of attention to the end user (in other words, the ordinary programmer). An opinion I have come across several times is that the last big and truly new idea was Prolog.

It seems to me that looking at programming languages (PL) through the lens of human-computer interaction (HCI) is the most important meta-problem in the field today. More than ever, we need to run surveys and interviews, study user experience, and involve sociologists and psychologists, in order to form data-driven hypotheses about the "hard" parts of programming. Programming should be made comfortable not only for those just starting to learn it, but for everyone else too, from gray-haired low-level systems developers to young web developers. Interest in this direction is already forming: for example, the CHI conference hosts a Usability of Programming Languages Special Interest Group, papers such as “Empirical Analysis of Programming Language Adoption” are being published, and working groups on language usability are emerging.

However, even if our knowledge of language usability is still limited, nothing stops us from continuing to work on the key PL problems that we believe will bring tangible results. The manifesto below is based mostly on my personal experience: I have been programming for over ten years and have worked on game development (Lua, C), websites (PHP, JS), high-load / distributed systems (C++, Go, Rust), compilers (Standard ML, OCaml) and data science (Python, R). Over that time I have worked on small scripts, personal projects, open source software, and products at tiny (2 people), small (15 people), medium (500 people) and large (2000+ people) companies, and I now do academic research. I studied programming language theory at Carnegie Mellon University, and today I teach CS242, the programming languages course at Stanford.

I say all this to make one thing clear: even though we need much more data to discuss these problems thoroughly, I have tried to form an informed opinion about problems in modern programming languages that cut across many fields of work and really do occur in the real world. Of course, there is a lot I do not know, so, as usual, I suggest reading this post with a healthy degree of skepticism.

Thinking gradually


I firmly believe the following: programming languages should be designed to match the programming processes that take place in the programmer's head. We should strive to understand exactly how people think about programs, and try to determine which programming processes the human mind grasps intuitively. There are all kinds of interesting questions here, for example:


Simply observing how people program shows that the process is incremental. No one writes an entire program from scratch in one pass, compiles it, ships it immediately, and never opens the code again. Programming is a long slog of trial and error, where the length of the trials and the severity of the errors depend heavily on the specific domain and tools. That is why the ability to inspect output and fast compile times matter so much: for example, being able to change an HTML document and immediately refresh the page to see what happened. Bret Victor discusses this idea in detail in his article “Learnable Programming”.

I call this process “gradual programming”.
I would have used the term "incremental" programming, but incremental computation already has its own established meaning that differs from mine, whereas "gradual" is the term PL enthusiasts already use in this context.

As far as I know, the only recorded use of the term “gradual programming” (apart from this article) is in this publication, where the term is given a slightly different slant. One of its authors is Jeremy Siek, one of the founders of gradual typing.
While paradigms such as imperative or functional programming focus on aspects of our mental model of a program, gradual programming describes the process by which that mental model is formed. In this sense, gradual programming is just... programming; but it seems to me that the new term is justified here, since it helps avoid confusion later on.

In gradual programming, the programmer oversees the co-evolution of two things: 1) a syntactic representation of the program, expressed to the computer in a programming language, and 2) a conceptual representation of the program, held in the programmer's head. At the start of this process the programmer has no syntax at all (an empty file) and usually only a very vague idea of how the final program should work. From there, the programmer keeps taking small steps toward building the components of the program until the final version is ready.

If you program, you have almost certainly gone through this process many times, and you probably recognize it in my description. Usually, though, most of this thinking happens implicitly (that is, inside your head) and is never communicated in any form. To convince ourselves that this gradual process exists, let's walk through an example in detail. Suppose I want to write a program that appends a line of text to a file. In my head I have a model of the program that looks something like this:

    input file = get input
    input line = get input
    write the input line to the end of the input file

Then I decide which language to write the program in; in our case it will be Python. To begin, instead of trying to write the whole program at once, I just take the first line of the model and try to write it in Python as is.

    $ cat append.py
    input_file = input()
    print(input_file)

Here I made several decisions. First, I decided that the input would come from stdin (for simplicity), and used input(), a built-in Python function. I had to come up with a name for this value, input_file, and the name had to follow Python's syntactic conventions. I also added a print statement that was not part of my original program model, but part of a temporary model for debugging my little program. Then I try to run it:

    $ echo "test.txt" | python append.py
    Traceback (most recent call last):
      File "append.py", line 1, in <module>
        input_file = input()
      File "<string>", line 1, in <module>
    NameError: name 'test' is not defined

Oops, I mixed up input() and raw_input(). The problem was not in my program model (I still think about the program exactly as before) but in my “encoding” of it into Python. Correcting my mistake:

    $ cat append.py
    input_file = raw_input()
    input_line = raw_input()
    print(input_file, input_line)
    $ echo "test.txt\ntest" | python append.py
    ('test.txt', 'test')
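The NameError in the first attempt is specific to Python 2, where input() evaluates the text it reads as a Python expression, while raw_input() returns the text unchanged. A minimal Python 3 sketch of that difference follows; the two helper names are hypothetical stand-ins for the Python 2 built-ins, not real functions:

```python
# Hypothetical stand-ins for the two Python 2 built-ins, written in Python 3.
# They take the typed text as an argument instead of reading stdin.

def python2_style_input(text):
    # Python 2's input() evaluated the line as an expression.
    # (eval is used only to illustrate; never eval untrusted input.)
    return eval(text, {})


def python2_style_raw_input(text):
    # Python 2's raw_input() handed the line back as a plain string.
    return text


message = None
try:
    # "test.txt" parses as attribute access on a name called `test`.
    python2_style_input("test.txt")
except NameError as error:
    message = str(error)

file_name = python2_style_raw_input("test.txt")

print(message)
print(file_name)
```

In Python 3 the evaluating input() was removed and raw_input() was renamed to input(), so this particular mix-up no longer arises there.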

Next, I have to figure out how to append a line to the end of the file. In my original mental model this was encapsulated in the phrase “write the input line to the end of the input file”, but now it is time to turn that vague idea into more specific steps that I can easily write in Python. In particular, if I already understand how the file system works, I know that I must first open the file in append mode, write the line, and then close the file.

After some reflection, my mental model begins to look like this:

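    input file = get input
    input line = get input
    file = open the input file in append mode
    write the input line to the file
    close the file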


So now let's “translate” all of this into Python:

    $ cat append.py
    input_file = raw_input()
    input_line = raw_input()
    file = open(input_file, 'a')
    file.write(input_line)
    file.close()
    $ echo "test.txt\ntest" | python append.py
    $ cat test.txt
    test

Success! Again, the purpose of this example was to demonstrate the co-evolution of the syntactic and conceptual models of a program as we work on it. Based on my own programming experience, and on teaching programming to others, I can say that it is a fairly typical example of the thought process many of us follow when we program.

Axes of evolution


The example above showed the gradual nature of the programming process, but shed no light on how we should build tools that fit this process. To simplify the task, let's break the evolution of a program down into several small axes of evolution. Essentially, let's ask ourselves: what useful information about their program do developers learn and understand gradually? Then we can suggest how programming languages could help along each axis separately.

1. Concrete / abstract


A common way to build programs is to start from a specific example that you are trying to implement, and then generalize (or abstract) that example so that it covers a wider set of use cases. Abstraction is a cornerstone of programming languages, usually provided through functions and interfaces. For example, we can turn our script into a function:

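    def append_line(input_file, input_line):
        # Same steps as the script above, now parameterized.
        file = open(input_file, 'a')
        file.write(input_line)
        file.close()

    append_line('test.txt', 'test')
    append_line('test.txt', 'test again')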

However, the vaguer your model is at the start, the harder it is to jump straight to an abstract solution, so this evolution from concrete to abstract is common in practice when working with modern programming languages (again, see the “Create by Abstracting” section of Learnable Programming).

2. Anonymous / named


At the start of the programming process, during the iteration / experimentation stage, it is natural for us as programmers to optimize for the speed of writing code rather than reading it. One form of this optimization is short variable names and anonymous values. For example, a terser take on the first version of our script might look like this:

    s = raw_input()
    f = open(s, 'a')
    f.write(raw_input())
    f.close()

Here the variable names are less informative than before: we use s instead of input_file and f instead of file, and input_line has lost its name altogether. But if it is faster to write, and the script will never be read again, why not? If we later plan to use this script in a larger code base, we can incrementally change the names to more informative ones to keep code reviewers happy. This is another example of a gradual change that is easy to apply in practice and already common among programmers working in modern languages.

3. Imperative / declarative


For a variety of reasons, programmers tend to find straight-line, sequential imperative code a more natural fit for their conceptual program model than functional / declarative code. For example, a simple list transformation will almost certainly start out as a for loop:

    in_l = [1, 2, 3, 4]
    out_l = []
    for x in in_l:
        if x % 2 == 0:
            out_l.append(x * 2)

A more declarative version, by contrast, abstracts the control flow into domain-specific primitives:

    in_l = [1, 2, 3, 4]
    out_l = map(lambda x: x * 2, filter(lambda x: x % 2 == 0, in_l))

The difference between these two approaches is not only stylistic: declarative code is usually much easier to analyze for structure. For example, a map can be trivially parallelized, while a general for loop is much harder to parallelize. Such transformations are most common in languages that support a mixture of imperative and functional code (at a minimum, closures).
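To make the parallelization point concrete, here is a small Python 3 sketch, under the assumption that the mapped function is pure. Because map applies the function to each element independently, the sequential call can be swapped for an executor's map without touching the rest of the code. The function name double and the pool size are illustrative choices, not part of the original example:

```python
from concurrent.futures import ThreadPoolExecutor


def double(x):
    # Pure function: the result depends only on the argument.
    return x * 2


in_l = [1, 2, 3, 4]
evens = [x for x in in_l if x % 2 == 0]

# Sequential, declarative version.
sequential = list(map(double, evens))

# Parallel version: same shape, different map implementation.
# Executor.map yields results in the order of the inputs.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(double, evens))

print(sequential, parallel)
```

The equivalent for loop that appends to a shared list would first have to be analyzed to show that its iterations do not depend on each other.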

4. Dynamically typed / statically typed


The rising popularity of dynamically typed languages over the last 20 years (Python, JavaScript, R, Lua, ...) should be sufficient evidence that people find dynamic typing useful; whichever side of the barricades you are on, the fact remains. Dynamic typing has many advantages (heterogeneous data structures, free duck typing, and so on), but the simplest one is productivity through omission: the types of variables do not need to be known at compile time, so the programmer does not have to spend mental energy on them.

However, types are still extremely useful tools for ensuring correctness and performance, so a programmer may want to gradually add type signatures to an untyped program once they are sure that a variable must have a certain type. This emerging idea, called optional or gradual typing, has already been adopted in Python, JavaScript, Julia, Clojure, Racket, Ruby, Hack and other languages.

For example, our program after rewriting might look like this:

    input_file: String = raw_input()
    input_line: String = raw_input()
    file: File = open(input_file, 'a')
    file.write(input_line)
    file.close()
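In Python 3 itself, a similar partially annotated program can be sketched with the standard type hints. This is only an illustration of mixing typed and untyped code in one file: the helper shout is a hypothetical addition, and the annotations are not enforced at run time, only by an external checker such as mypy.

```python
from typing import TextIO


def append_line(input_file: str, input_line: str) -> None:
    # Annotated: a type checker can verify every call to this function.
    file: TextIO = open(input_file, "a")
    file.write(input_line)
    file.close()


def shout(text):
    # Still unannotated: calls to this helper are only checked dynamically.
    return text.upper() + "\n"
```

The annotated and unannotated functions can call each other freely, which is the property that lets annotations be added one function at a time.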

5. Dynamically deallocated / statically deallocated


Memory management, or lifetimes, can be viewed through the same lens as types. In 2018, every programming language should provide memory safety; the only question is whether deallocation is determined at compile time (as in Rust with its borrow checker) or at run time (as in any language with a garbage collector). Garbage collection is without doubt a big boost to programmer productivity, so it is natural that our initial program model should not have to say how long each value lives before it is deallocated.

However, as before, fine-grained control over the lifetime of a value is still useful for correctness and performance. Ownership and borrowing, as implemented in Rust, can help structure a system to eliminate data races in concurrent programs, and avoid the need for a garbage collector at run time.

Extending our typed example, it might look like this:

    input_file: String = raw_input()
    input_line: String = raw_input()
    file: mut File = open(&input_file, 'a')
    file.write(&input_line)
    file.close()

As far as I know, unlike optional or gradual typing, there has been no work on gradual (optional) memory management (with the exception of this publication).
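Python has no ownership or borrowing, so the following is only a loose analogy rather than gradual memory management: a with block is an opt-in way to tie the lifetime of one resource (here a file handle) to a lexical scope, while every other value in the program is still left to the garbage collector. The temporary path is an illustrative assumption.

```python
import os
import tempfile


def append_line(input_file, input_line):
    # The handle's lifetime is fixed by the block, not by the collector.
    with open(input_file, "a") as file:
        file.write(input_line)
    # Past this point the handle is guaranteed to be closed.
    return file.closed


path = os.path.join(tempfile.mkdtemp(), "test.txt")
closed_after_block = append_line(path, "test\n")
print(closed_after_block)
```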

6. General-purpose / domain-specific


When a programmer starts writing a program, they want every feature of the language to be available, in order to prototype as fast as possible and stay creatively productive. Restricting the language usually does not occur to anyone during development, except perhaps from a coding-style perspective (“which subset of Python should I use?”).

However, a growing wave of high-performance domain-specific languages such as TensorFlow, Halide, Ebb and Weld suggests that if a programmer uses only a small subset of general-purpose programming features (for example, differentiable pure functions), an optimizer can generate significantly more efficient code. From the standpoint of gradualness, this points to a possible future workflow in which the programmer gradually narrows the subset of the language used in a particular part of the program, so that the compiler can provide a much better optimized backend for it.

Concept of gradual programming


Not that these axes are a new idea: the trade-off between static and dynamic typing, say, has been known for a long time. What I wanted to show is that these are not one-off decisions; they may change as the program itself develops. All of these axes most likely 1) change as an individual program evolves, and 2) change in a fine-grained way, for example when typed and untyped code must be mixed within the same system. This is anathema to the “all or nothing” approach that most languages take today: everything must either be typed or be untyped. Everything must either be garbage collected or not be collected at all. This forces programmers into absurd trade-offs, such as switching to an entirely different language ecosystem just to get the benefits of static typing.

In light of this, advancing gradual programming involves the following research process:


Each of these steps requires further investigation. I have given an initial analysis of what I see as the important incremental parts of the programming process, but I have inevitably missed many others. For some of the axes I mentioned (memory management, language specialization), there are still no documented attempts to systematize them at the language level. I think that work on extensible compilation will help speed up the development of language extensions on these fronts.

Even for more well-trodden areas like gradual typing, papers appeared in 2016 whose authors asked “Is Sound Gradual Typing Dead?” (very much alive and feeling fine, thank you for your concern). CircleCI abandoned gradual typing in Clojure after two years. Even though the theory itself is well understood and performance keeps improving, there is little practical knowledge about how programmers interact with optional (gradual) types. Is it easy to write programs with this kind of typing? Are partially typed programs harder to work with than fully typed or untyped ones? Can an IDE solve any of these problems? And so on: we have no answers to these questions.

Another important issue in gradual programming is the choice between inference and annotation. As our compilers get smarter, it becomes easier for the compiler to infer information such as types and lifetimes when the programmer has not specified it explicitly. However, inference engines are far from perfect, and when they fail, every inference-based language feature (as far as I know) requires an explicit annotation from the user, rather than falling back to suitable dynamic checks.

I picture it this way: gradual systems have three modes of operation. Any particular kind of program information (for example, a type) is either explicitly annotated, inferred, or deferred to runtime.
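The three modes can be sketched in Python 3 with standard type hints. The function names are hypothetical, and "inferred" here means inferred by an external static checker such as mypy, since the Python interpreter itself does not check annotations:

```python
from typing import Any


def parse_port(raw: str) -> int:
    # Annotated: the parameter and return types are written out explicitly.
    port = int(raw)  # Inferred: a checker deduces `port: int` on its own.
    return port


def double(value: Any) -> Any:
    # Deferred to runtime: nothing is known statically, so the program
    # performs the type check itself when the call actually happens.
    if not isinstance(value, (int, float)):
        raise TypeError("double() expects a number")
    return value * 2
```

In this sketch the deferred mode is written by hand; the point of a gradual system would be to insert such checks automatically wherever annotation and inference both run out.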

The inference-versus-annotation question is interesting in itself from an HCI (human-computer interaction) point of view: how effectively can people program in a system where a missing type annotation may or may not be inferred? How does that affect usability, performance and correctness? Most likely these questions will become another important research area for gradual programming.

In general, I am excited about the wide range of possibilities that gradual programming methods could open up. As they gain popularity, programmers of all skill levels will be able to benefit from languages that better fit their way of thinking.

Comments can also be sent to the author of the article, or left on Hacker News.

Source: https://habr.com/ru/post/352568/

