📜 ⬆️ ⬇️

A guide for the practitioner how to read scientific articles on programming languages

A week ago, I joked that articles on the principles of POPL programming languages ​​should meet the criterion of “intellectual intimidation” in order to be accepted for publication. Of course, this is not true, but the fact is that articles on programming languages ​​look especially frightening for practitioners (or an academic really works in another field of computer science!). They are chock-full of mathematical symbols and phrases such as “judgments”, “operational semantics” and the like. There are many subtle recording options, but you can basically grasp the essence of the article by learning a few basic concepts. So instead of a story about a regular scientific article, I thought that today I would rather write a short practical guide to deciphering scientific articles on the topic of programming languages. Here I follow Benjamin Pierce's book Types in Programming Languages as an authoritative source.

Syntax


Let's start with the clear: syntax. The syntax says which sentences can be used in the language (that is, which programs we can write). The syntax contains a set of terms that constitute the building blocks. Take the following example from Pierce:

t ::= true false if t then t else t 0 succ t pred t iszero t 

Wherever the symbol t occurs, you can substitute any term. If the article mentions terms, t is often used with a subscript to distinguish between different terms (for example, t1, t2). The set of all terms is often denoted as  tau. It should not be confused with the symbol Twhich is traditionally used to designate types.

In the example above, note that the if itself is not a term (this is a token, in this case, a keyword token). A term is a conditional expression if t then t else t .
')
Terms are expressions that can be calculated, and the final result of the calculation in a well-constructed environment should be a value . Values ​​are a subset of terms. In the example above, the values ​​are true , false , 0 , and the values ​​that can be created by successively applying the operation succ to 0 ( succ 0, succ succ 0, ... ).

Semantics


We want to give some meaning to the terms of the language. This is semantics . There are various ways to determine the meaning of programs. Most often you will find references to operational semantics and denotational semantics . Of these, operational semantics is the most common. It determines the value of the program by setting rules for an abstract machine that calculates terms. Specifications have the form of a set of rules for the calculation - we consider them later. In this worldview, the value (meaning) of the term t is the final state (value) reached by the machine, started with t in its initial state. In denotational semantics , a certain mathematical object (for example, a number or function) in some previously existing semantic area is taken as the term value, and the interpretation function relates the terms of the language to their equivalents in the target area. So we indicate what the term represents (or denotes) in this area.

You may also encounter the fact that the authors mention the operating semantics with a small step (small-step) and the operating semantics with a big step (big-step). As the name implies, this refers to how big a leap an abstract machine makes with each rule applied. In semantics with a small step, terms are rewritten gradually, one small step at a time, until eventually they become values. In semantics with a big step (it is also “natural semantics”) we can move from a term to its final value in one step.

In the semantics record with a small step, you will see something like this: t1 rightarrowt2that should be read so that t1calculated in t2. (Instead of subscripts, strokes can be used, for example, t rightarrowt). This is also known as a judgment . Arrow  rightarrowrepresents one step of the calculation. If you meet t1 rightarrowt2then it means “after repeated use of one-step calculations t1eventually turned into t2".

In semantics with a big step, another arrow is used. So that t Downarrowvmeans term tas a result of the calculation went into v. If semantics with a small step and semantics with a big step are used in the same language, then t rightarrowvand t Downarrowvmean the same thing.

According to the convention, the rules are called CAPITAL letters. And if you're lucky, the authors will add the prefix 'E-' to the calculation rules as an additional hint.

For example:

if true then t 2 else t 3  rightarrowt 2 (E-IFTRUE)

Usually calculation rules are set in the style of inference rules (inference rules). Example (E-IF):

 fract1  rightarrowt1 mathrmif t1  mathrmthen t2  mathrmelse t3 rightarrow mathrmif t1  mathrmthen t2  mathrmelse t3


This should be read as "Taking into account the information in the divider, we can conclude that it is in the denominator." That is, in this particular example: “If t1displayed in t1then in this case  mathrmif t1...displayed in  mathrmif t1....

Types


In a programming language, it is not necessary to have a certain type system (it can be untyped), but most have it.

“The type system is a flexibly controlled syntactic method of proving the absence of certain types of behavior in a program by classifying language expressions according to the types of values ​​they compute” - Pierce, “Types in Programming Languages”.

The colon is used to indicate that a given term belongs to a particular type. For example, t:t. A term is considered to be correctly typed (or typed) if there is such a type for which the expression is true t:t. As we had calculation rules, so here there are typing rules. They are also often defined in the style of output rules, and their names can begin with the prefix 'T-'. For example (T-IF):

 fract1: mathrmBool  t2: mathrmT  t3: mathrmT mathrmif t1  mathrmthen t2  mathrmelse t3  mathrmT


This should be read as “Since term 1 belongs to type Bool, and terms 2 and 3 belong to type T, then the term“ if term 1 then term 2 else term 3 ”will belong to type T”.

In functions (lambda abstractions), we also monitor the types of the argument and the return value. You can annotate related variables by specifying their type, so instead of simple  lambdax.tcan write  lambdax: mathrmT1.t2. The type of lambda abstraction (for a function with one argument) is written as  mathrmT1  rightarrow  mathrmT2. This means that the function takes a type argument.  mathrmT1and returns the result of type  mathrmT2.

In the rules of the withdrawal you will see the sign of deducibility - the operator in the form of a turnstile  vdash. So P vdashQshould be read as "You can derive Q from P" or "P means Q". For example, x: mathrmT1 vdasht2: mathrmT2means "From the fact that x belongs to type T1, it follows that type t2 belongs to type T2."

The binding of variables to types for free variables in a function should be monitored. For this we use the typing context (typing environment). Imagine it as a familiar environment that translates variable names into values, but only here we translate variable names into types. According to the convention, the gamma symbol is used to indicate the typing context (  Gamma). It is often found in scientific articles without explanation, according to the convention. I remembered its meaning by presenting the gallows fixing variables with their types hanging on the rope. But you may have another way! This leads to the frequently encountered triple relationship in the form  Gamma vdasht: mathrmT. It should be read like this: “From the context of typing  Gammait follows that the term t belongs to type T ". Comma Operator Extends  Gammaby adding a new binding on the right (for example,  Gamma,x: mathrmT).

Combining this all, you get the rules that look like this (in this case, to determine the type of lambda abstraction):

 frac Gamma,x: mathrmT1 vdasht2: mathrmT2 Gamma vdash lambdax: mathrmT1.t2: mathrmT1 rightarrow mathrmT2


We decipher the rule: “If (what is indicated in the numerator) from the typing context with the restriction x to T1 it follows that t2 belongs to the type T2, then (the part indicated in the denominator) in the same typing context, the expression  lambdax: mathrmT1.t2belongs to the type  mathrmT1 rightarrow mathrmT2".

Type safety


If Jane Austen had written a book on type systems, she would probably call it “Promotion and Preservation” (in fact, I think you can write many interesting essays with such a title!). A type system is considered “safe” (type-safe) if properly typed terms are never stalled during the calculation. In practice, this means that in the chain of inference rules we will never get stuck somewhere without a final meaning and at the same time without the possibility to apply the rule for further advancement (you will see that the authors use the phrase “stuck” and “be at a dead end” - that is what they mean). To demonstrate that correctly typed terms never end up in a dead end, it suffices to prove the theorems of advancement and preservation.


Sometimes you will see that the authors talk about promotion and conservation, without explicitly indicating why this is important. Now you know why!

Church, Curry, Howard, Hoar


Here are a couple of things that you might encounter and which are interesting to know:


In fact, I'm just an interested outsider who is watching what is happening in the community. If you are reading this as an expert on programming languages, and you know more things to include in the practitioner's cheat sheet, please let us know in the comments!

Source: https://habr.com/ru/post/348874/


All Articles