📜 ⬆️ ⬇️

Type inference in programming

When writing algorithms, a situation often arises when a function needs to be called with the same number of arguments, but with different types of arguments. We decide.

A contrived C ++ example:

#include <iostream> #include <string.h> using namespace std; void a(int x) { cout<<"number\n"; } void a(float x) { cout<<"number\n"; } void a(const char *x) { cout<<"string\n"; } template <typename T> void f(void(*g)(T x), T x) { g(x); } int main() { f(a, 1); f(a, 1.0f); f(a, "Alexey"); return 0; } 

The same Javascript code:

 function f(g, x) { g(x) } function a(x) { alert(typeof x) } f(a, 1) f(a, 1.0) f(a, 'Alexey') 

Obviously, the javascript code is more successful. Let's see how this happens.
')

Dynamic approach


In javascript, each variable has a “type” property, which is returned when the typeof operator acts on a variable. This property is stored at runtime, whenever a variable exists. And when a statement is applied to a variable, the function that implements the statement checks the type of the variable. In accordance with the type of some actions are performed. For example, 2 + 2 returns 4, 2 + 'Alexey' returns '2Alexey'. Those. language operators are required to almost always check the types of variables to which they apply. Let's call this approach “dynamic typing”.

The disadvantages of this approach are quite obvious, more precisely, one minus, it is necessary to do additional calculations during the execution of the program.

A plus will be a code in which there is no concept of type in an explicit form, only a sequence of actions, which has a positive effect on readability.

Another advantage will be the code in which you do not need to duplicate a significant part of the logic. This is slightly seen in the example above; the function f is implemented once, but the result of its execution depends on the function passed to it, which implements different logic.

Static approach


The C language requires you to specify the type of the variable and prohibits changing it. As a result, during the execution of the program there are no type checks, operators are uniquely applied to variables. But there are no beautiful mechanisms for generalization. That is: the universal pointer void *, the ability to transfer functions to a pointer to another function. In C ++, which is in some way an extension of the C language, mechanisms such as function polymorphism (overload) and patterns have appeared.

Polymorphism of functions allows you to call 2 functions that do the same thing but with different data types, the same name.

Templates introduce a new type — common to all types. These mechanisms are good in that with their help, without the participation of the programmer, the substitution of functions with the necessary types is performed. Those. if the template of one function involves 2 different types, then at compile time 2 different functions will be created. When calling the original function in the code, the call of the copied function of the desired type will be inserted into the compiled file.

Advantages of the approach: the maximum speed of execution.
Cons: more memory required to perform.

Type inference


In other words, when maximum performance is needed and a static approach is enough, a static approach is more preferable. But when choosing, say, the C ++ language, the readability of the code is lost. After all, for every sneeze you have to specify the type with which the function is called.

For example, quick sort code:

 #include <iostream> #include <string.h> using namespace std; template<class T> void qsort(T** array, int length, int compare(T *a, T *b)) { int left = 0; int right = length-1; T *middle_element; middle_element=array[length/2]; do { while( compare(array[left], middle_element)<0 ) left++; while( compare(array[right], middle_element)>0 ) right--; if(left<=right) { swap(array[left], array[right]); left++; right--; } } while(left<=right); if(right>0) qsort(array, right, compare); if(left<length) qsort(array+left, length-left, compare); } int cmp(char *a, char *b) { return strcmp(a, b); } int main() { char *strings[]={"Alexey", "Borisenko"}; qsort(strings, 2, cmp); return 0; } 

With the complete deletion of types, it only gains in readability:

 #include <iostream> #include <string.h> using namespace std; qsort(array, length, compare) { left = 0; right = length-1; middle_element = array[length/2]; do { while( compare(array[left], middle_element)<0 ) left++; while( compare(array[right], middle_element)>0 ) right--; if(left<=right) { swap(array[left], array[right]); left++; right--; } } while(left<=right); if(right>0) qsort(array, right, compare); if(left<length) qsort(array+left, length-left, compare); } cmp(a, b) { return strcmp(a, b); } main() { strings={"Alexey", "Borisenko"}; qsort(strings, 2, cmp); return 0; } 

From these considerations it was concluded that abandoning types is a good idea. Let's discuss how to implement it.

We distinguish 3 main types:

1) Simple
2) Compound
3) Function

To a simple type we assign all types that do not have other nested (string, numeric, array ...). Composite constructions, such as a heterogeneous array, structure, etc., are related to the composite. The function stands alone because it has arguments that are types themselves.

So, type inference is performed with the following operations:

1) Assignment
2) function call

When inferring a type, the variable cannot change the type. Proof by contradiction:

 f(a) { x=1 if(a<2) x='Alexey' return x } 

In this code, it is necessary to execute the code in order to find out the return type of the function f, which is a contradiction of the static approach.

This implies the need to check the type for each assignment or a function call during compilation of the program.

Type inference from an expression is made quite unambiguously, since operators, both unary and binary, act in a set of a certain type. Thus, the operator "+" when adding two types of int results in an int. The problem arises if you apply a binary operator to variables with different types. Adding an int to a float can be both an int and a float, or it can be an error altogether, since accuracy may be lost.

Thus, when assigning a value to a variable, the match of the type of the variable with the derived type of expression is checked. This already allows you to infer types for code that does not have a function call that takes another function as input. If there is such a call, then you have to figure out which function to substitute as an argument.

Let us consider in more detail the output of the main types.

The simple is obviously included in the rest. Compound output by assignment:

 x={name:"Alexey"} 

And also through the type description in the function arguments with the indication of all the variables used in this type:

 f(user {name}) { } 

The problem with the function output is that it can take another function as an argument and also return a function. In this case, you have to recursively infer the type of the nested function, and only then infer the types of the arguments.

Ending


Based on everything written by me, the author of the article, I write a programming language, which, being dynamic, will eventually become static. The output is already supported by the type at the first level of the function (without the use of closures).

Source: https://habr.com/ru/post/309074/


All Articles