📜 ⬆️ ⬇️

Idris language overview

Agda is too mainstream!

"Foresight"
image
( source )

There are practically no materials about the Idris language in Russian, I will try to fix it with a brief overview. I will also try to make the text understandable for beginners, non-functional and unfamiliar with dependent types of programmers, and therefore those familiar with such languages ​​may find it easier to rewind to the end or immediately read the official documentation. I do not dare to write a serious introduction to language and theory, but I hope to show what it is all about.

So, Idris is a pure functional general-purpose programming language with dependent types.

Why is this?


For a short answer, Dijkstra’s famous quotation is appropriate; although the approach has changed since then, the goal remains the same:
The one who thinks that the products of programmers are the programs that they write are deeply mistaken. The programmer is obliged to create credible solutions and present them in the form of convincing arguments, and the text of the written program is only accompanying material to which this evidence applies.

To be more precise, there may be several reasons for its creation and use: the author, Edwin Brady, uses it to research application programming with dependent types; Many, like me, like the ability to make precise specifications of programs; I also hope for a significant reduction in the debugging time of application programs through the formal verification of these. And here begs the second quote:
Testing the program can very effectively demonstrate the presence of errors, but hopelessly inadequate to demonstrate their absence.


Static typing and dependent types


One of the ways to classify typed programming languages ​​is to divide them into static and dynamic. With dynamic typing, the types are unknown at compile time, and therefore cannot be checked; it helps to write programs faster, but at the same time adds a class of errors related to typing that are detected during program execution, which is usually not very pleasant. Idris uses static typing, so what’s next is only about it; but in order not to scare people accustomed to dynamic typing, I will immediately note that the type inference is also used, i.e. types can be omitted explicitly if they are obvious to the compiler from the context.
')
Typification can be limited to several primitive types - for example, strings, numbers, floating-point numbers; then all that we can “say” about the type of some value (for example, the function argument, or the value returned by it) is “string” , “number” , “floating-point number” . But usually this is complemented by something like structures: simple C-like structures, classes, (generalized) algebraic data types, etc., and at the same time synonymous with types ( typedef in C, type in Haskell), which already allows you to "say", for example , “Date” , having previously defined the structure, class or type “date”.

This can be supplemented with the phrase “pointer to” , which at the same time allows the use of arrays, but in high-level languages, generic programming (templates in C ++, generics in Java) is more often supported, which already allows you to build for describing types of phrases like “list of dates” , " , " Event tree " . In other words, types can be parameterized by other types.

And, finally, you can parameterize types with values, which makes it possible to say “a list consisting of five lines” , “a message signed by Vasily” ; function types can then take the form “a function that accepts a message signed by a certain user and returns this user’s data” , instead of the more common “function that receives a message and returns user data” , which leads to much more accurate specifications. Moreover, it makes it possible, so to speak, to switch from subject to predicate, to use the principle of “speaking as types”: “5 is greater than 3” , “it is not true that 5 is greater than 3” , “7 is a prime number” , “Vasily received 2 messages . These are the dependent types that are implemented in Idris, as, indeed, in several other languages ​​(Agda, Coq, Epigram, etc.), but Idris is distinguished from them by the desire to be a modern, applied language.

Also this, under conditions of total functional programming, allows using correspondence between computer programs and mathematical proofs ( Curry-Howard correspondence ): the type of function can be, for example, “A is less than B, B is less than B, it means A is less than B” or “the message is sent to some user, this message is available to this user ” ; then the body of the function becomes the proof, which simply has to return the value of the specified type, and checking the proof is the type checking.

Total functional programming


As mentioned, Idris is a pure functional language . But it also supports the optional requirement of the totality property of functions, by which we mean two properties:

  1. Definiteness is everywhere: a function must be defined for any input data. In pure languages, the absence of this property is perhaps the only possibility for a program to “fall” (the others are IO, in particular, the system kills the process due to some of its actions, and possible compiler bugs).
  2. Strict normalization: with recursive function calls, at least one of its arguments must be strictly reduced (structurally), or the function must be productive at each iteration - i.e. return some part of the result, and promise to calculate the next iteration. In the first case, this guarantees the completeness of the function, bypassing the problem of stopping, and in the second, the ability to read the final prefix of any length in a finite time.

This is important, first of all, for building evidence, but the correctness of conventional programs is beneficial, judging by my observations. By the way, the presence of this property is guaranteed that the result of lazy and greedy calculations of this function will be the same, and calculations in Idris are greedy by default, but any argument can be made lazy.

You can enable this option for individual functions, for files, or with the compiler flag.

Examples


The unusual syntax seems to distract attention from the essence of what is being described, and therefore I will not describe the syntax; Code examples are provided for people familiar with the ML family languages, and for the rest, text descriptions are provided before these examples.

Division

Familiar to all the operation, which is in almost all programming languages ​​(or in their standard libraries) - division. But the division by 0 is prohibited, which leads to various controversial decisions:


In languages ​​with dependent types, and in particular in Idris, another option appears: to prohibit division by 0. In other words, division by 0 is prohibited - and we simply make it impossible; the type of the function is then described as “a function that takes two numbers, the second of which is not equal to 0, and returns a number”. If desired, you can add this specification with a property that speaks about the relation between the return value and the function arguments, i.e. "... such that ...", but it is easier to leave it for separate theorems, but for now let us return to the description of the type of the division function: to the usual two arguments, we need to add a third one - the proof that the second argument is not zero.

Now consider the three cases of using division (I remind you that this is a compile time):

  1. The divisor is known: the proof is trivial, since the equivalence of natural numbers is decidable (in other words, the compiler itself sees if the divider is zero or not), and therefore we can, if desired, wrap such a division function into a function with the usual two arguments, substituting the same simple third argument.
  2. The divisor is unknown, but something is known about it that allows you to make sure that it is not zero: for example, that it is greater than a certain non-negative number. Then you need to either write a proof on the spot, or use a function that takes what we have (proof that the number is greater than some non-negative number), and returns what we need (proof that this number is not zero). Such functions can be called both simple functions and lemmas or theorems, but, as already mentioned, in this context they are one and the same.
  3. We ourselves are not sure that the divisor will not be zero (after careful consideration of the situation in an attempt to prove the inequality, the previous case may turn into this): only in this case we really need to enter a check before the division; if the divisor is zero, then use an alternative solution, and otherwise - use the information received and perform the division.


This approach, of course, is not necessary in Idris, but makes the programmer to make sure that he performs the calculations correctly.

And finally, the sample code is the skeleton for the division function:

 -- (total)      -- "==" -    ,  Bool -- "=" -  ,  total div : (n, m: Nat) -> ((m == Z) = False) -> Nat --   0 - ;     --    , ,      div n Z f = FalseElim $ trueNotFalse f --     div n (S k) f = {-todo-}?div_rhs 


Since this is only a skeleton, a meta-variable is used here, div_rhs ; people familiar with Agda already know “holes” (holes), the weaker version of which are meta variables in Idris, and for the rest I will explain: Idris allows you to see its context (variables that are visible from its position) and purpose (which should be constructed / returned from this position), which considerably facilitates the writing of both evidence and programs. This is how it looks in this case:

  n : Nat k : Nat f : S k == 0 = False -------------------------------------- div_rhs : Nat 


It is also possible to fill them semi-automatically: having the goal to return Nat , you can use the (REPL / IDE) “refine” command, passing, for example, the name of the function plus , and if the plus function is able to return the value of the required type, it will be substituted instead of this meta-variable, and for its arguments will be substituted for two new meta. The process is the same as that used for another function - fully automatic replacement of meta-variables, i.e. search for evidence: in some cases, the required evidence can be found automatically.

Types (slightly edited for readability):

 λΠ> :t div div : Nat -> (m : Nat) -> (m == 0 = False) -> Nat 

 λΠ> :t div 1 2 div 1 2 : (2 == 0 = False) -> Nat 

 λΠ> :t div 1 2 refl div 1 2 refl : Nat 

 λΠ> :t div 1 0 refl (,        ) 


Now let's digress from the implementation of the division function and see what this approach gives us. Let's start with the aforementioned correctness: the function works exactly as it should. We can think about a function as we usually think about what it implements (an arithmetic operation, in this case: we cannot divide by zero, but in the process of division no concepts like exceptions are involved, and the result is a number). We can talk about the function in the same way as about the concept it represents, including formally, i.e. writing reasoning code. In the functions that use it, reasoning and conclusions can now be intertwined with calculations and be tested, rather than remain in the head of a programmer or text comments. And the program becomes a construction from correct reasoning and decisions, and not just a recipe for calculations.

Lists

Working with lists (or arrays) is another common task that leads to approximately the same set of controversial decisions and subsequent errors. In particular, we consider a function that returns the nth element of the list: the problem with its implementation is that we can request an element from a position that is not in the list. The usual solutions are to fall with “access violation” or “segmentation fault”, raise an exception (and, possibly, then fall with it), use Maybe.

In languages ​​with dependent types, a couple more possible solutions appear. The first is analogous to solving the problem of dividing by zero, i.e. The last argument of the function becomes the proof that the requested position is less than the length of the list.

Here you can see that different numbers, lists, "if-then-else", etc. defined in the language itself. The natural numbers implemented in the library are replaced with GMP numbers when compiled in C, and such functions as addition and multiplication are replaced with the corresponding GMP functions, which gives both quick calculations in compiled programs and the ability to reason about them. Now take a look at the code:

 data Nat = Z | S Nat data List a = Nil | (::) a (List a) length : List a -> Nat length [] = 0 length (x::xs) = 1 + length xs index : (n : Nat) -> (l : List a) -> (ok : lt n (length l) = True) -> a index Z (x::xs) p = x index (S n) (x::xs) p = index n xs ?indexTailProof index _ [] refl impossible indexTailProof = proof { intros; rewrite sym p; trivial; } 


indexTailProof - proof using tactics, in the style of Coq, but you can (and I like something more) write them in the style of Agda, i.e. more familiar functions (they say, however, that tactics will soon be in Agda).

The second solution is to use not just a list, but a list of fixed length: a vector, as it is called in the context of languages ​​with dependent types. An example was mentioned above: this is what allows you to say "a list of five lines." Also so-called can help us. the final type is a type, the number of different values ​​of which is finite: we can consider it as the same type of natural numbers, but with an upper bound. This type is convenient in particular because it is easier to say “a number less than n” with it than with natural numbers, where it is rather “a number, and it is less than n”. The type of the function of obtaining the nth element of a vector is then read as “a function that accepts a number less than n and a list of n elements of type a and returns an element of type a”.

In this case, the task becomes not to provide evidence, but to construct a number of the required type (ie, less than n). By the way, the logic in the evidence here is different from the classical (is intuitionistic), and consists in constructing evidence, that is, the statement of the statement / type is any value of this type, so some similarities with the previous approach can be traced, but at the same time the approach to writing a program as a whole it changes.

Code:

 data Fin : (n : Nat) -> Type where fZ : Fin (S k) fS : Fin k -> Fin (S k) data Vect : Nat -> Type -> Type where Nil : Vect Z a (::) : (x : a) -> (xs : Vect na) -> Vect (S n) a index : Fin n -> Vect na -> a index fZ (x::xs) = x index (fS k) (x::xs) = index k xs --  -    ,    --   ""  index fZ [] impossible 


So we go further away from clear evidence, but we are building, nevertheless, correct programs.

Well, since we’ve got to the vectors, here’s the function of combining them, often cited as an example to demonstrate dependent types:

 (++) : Vect ma -> Vect na -> Vect (m + n) a (++) [] ys = ys (++) (x::xs) ys = x :: xs ++ ys 


The body is the same as for ordinary lists, but the type is more interesting: pay attention to m + n .

More examples


The above are examples of individual small functions that Idris helps to make correct, but here are a few more important things offhand:



Both the web framework and the library for interacting with the DOM are written on it (by the way, at the time of this writing, the compiler supports compilation in C, LLVM, Java and JavaScript), and all parsers ( one , two ), and it is aimed at system programming. Rewriting programs from Haskell to Idris is quite simple.

Idris is not used in production at the present time - at least, widely (I heard that it is actively used at Potsdam Institute for Climate Impact Research, but this is probably not quite what is usually meant by production). The main technical obstacle is “non-rolling”, which should be solved by running around.

Conclusion


The language is new, and therefore “raw”, but modern, using modern theory and the practice accumulated in other languages.

Allowing yourself to dream a little about the future use of Idris (and the languages ​​that appear after it), you can imagine, in addition to a general increase in the reliability of programs, technical tasks accompanied by formal specifications, or even consisting of them; growth of services like Proof Market ; libraries of physical, mathematical, and any other theories, and languages ​​like Lojban; articles and books as lists of formal specifications of proven theorems; communication built up from automatically checked replicas and question-answer pairs in particular.

Links and books on the topic




PS


Thanks to those on whom this review was tested, for their comments and ideas.

Source: https://habr.com/ru/post/228957/


All Articles