This article explains the difference between statically typed and dynamically typed languages, examines the concepts of "strong" and "weak" typing, and compares the power of type systems in different languages. Recently there has been a clear movement towards stricter and more powerful type systems in programming, so it is important to understand what is meant when people talk about types and typing.
A type is a collection of possible values. An integer can have the values 0, 1, 2, 3, and so on. A boolean can be true or false. You can invent your own type, for example a type GiveFive that allows the values "give" and 5, and nothing else. It is not a string and it is not a number; it is a new, separate type.
Statically typed languages constrain the types of variables: the language can know, for example, that x is an Integer. In that case the programmer is not allowed to write x = true; that is invalid code. The compiler will refuse to compile it, so we cannot even run such code. Another statically typed language may have different expressive power, and no popular type system can express our GiveFive type (though many can express other, more sophisticated ideas).
Dynamically typed languages tag values with types: the language knows that 1 is an integer and 2 is an integer, but it cannot know that the variable x always contains an integer.
The language runtime checks these tags at various points. If we try to add two values, it can check whether they are numbers, strings, or arrays. Then it adds the values, concatenates them, or raises an error, depending on their types.
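The idea that the tag belongs to the value rather than to the variable can be seen directly in Python. This is only a small sketch; the helper name describe is made up for the illustration.

```python
# Values carry their type with them; variable names do not.
x = 1
first_tag = type(x)       # the int tag travels with the value 1
x = "one"
second_tag = type(x)      # the same name now holds a str


def describe(value):
    """Return the name of the run-time type tag attached to a value (hypothetical helper)."""
    return type(value).__name__


print(first_tag.__name__, second_tag.__name__)        # int str
print([describe(v) for v in (2, 2.5, "two", [2])])    # ['int', 'float', 'str', 'list']
```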
Static languages check the types in a program at compile time, before the program ever runs. Any program whose types violate the rules of the language is considered invalid. For example, most static languages will reject the expression "a" + 1 (C is an exception to this rule). The compiler knows that "a" is a string, that 1 is an integer, and that + works only when its left and right operands have the same type. So it does not need to run the program to see that there is a problem. Every expression in a statically typed language has a specific type that can be determined without running the code.
Many statically typed languages require type annotations. The Java function public int add(int x, int y) takes two integers and returns a third integer. Other statically typed languages can infer types automatically. The same add function in Haskell looks like this: add x y = x + y. We do not tell the language the types, but it can work them out for itself: it knows that + works only on numbers, so x and y must be numbers, so the add function takes two numbers as arguments.
This does not make the type system any less "static". Haskell's type system is famous for being static, strict and powerful, and on all these fronts Haskell is ahead of Java.
Dynamically typed languages do not require type annotations, but they do not infer types either. The types of variables are unknown until they hold concrete values at run time. For example, the Python function
def f(x, y): return x + y
can add two integers, concatenate strings or lists, and so on, and we cannot tell which will happen until we run the program. The function f may be called with two strings at one point and with two numbers at another. In that case x and y will hold values of different types at different times. This is why it is said that in dynamic languages values have types, but variables and functions do not. The value 1 is definitely an integer, but x and y can be anything.
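A short run of the same function f makes this concrete. The variable names results and mixed exist only for this sketch.

```python
def f(x, y):
    return x + y


results = [
    f(1, 2),           # integer addition
    f("ab", "cd"),     # string concatenation
    f([1], [2, 3]),    # list concatenation
]
print(results)  # [3, 'abcd', [1, 2, 3]]

# Mixing types is only discovered when the call actually happens.
try:
    f("a", 1)
except TypeError as error:
    mixed = type(error).__name__
print(mixed)  # TypeError
```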
Most dynamic languages raise an error when types are used incorrectly (JavaScript is a well-known exception; it tries to return a value for any expression, even when that makes no sense). With dynamically typed languages, even a simple error such as "a" + 1 can surface in production. Static languages prevent such errors, although the degree of prevention depends, of course, on the power of the type system.
Static and dynamic languages are built on fundamentally different ideas about program correctness. In a dynamic language, "a" + 1 is a valid program: the code will start running and the error will appear at run time. In most statically typed languages, however, the expression "a" + 1 is not a program at all: it will not compile and will not run. It is invalid code, just as the random string of characters !&%^@*&%^@* is invalid code. This additional notion of validity has no equivalent in dynamic languages.
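Python can be used to see both kinds of rejection side by side. The sketch below assumes a helper named classify, invented for this illustration; it uses the built-in compile and eval to separate "refused before running" from "failed while running".

```python
def classify(source):
    """Say whether Python rejects `source` before running it or while running it."""
    try:
        code = compile(source, "<example>", "eval")
    except SyntaxError:
        return "rejected before running"
    try:
        eval(code)
    except TypeError:
        return "accepted, then failed while running"
    return "accepted and ran"


print(classify('"a" + 1'))        # accepted, then failed while running
print(classify("!&%^@*&%^@*"))    # rejected before running
print(classify("1 + 1"))          # accepted and ran
```

In a statically typed language, the first expression would land in the same category as the random characters.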
The concepts of "strong" and "weak" are very ambiguous. Here are some examples of their use:
Sometimes "strong" means "static."
This one is simple, but it is better to use the term "static", because most people use and understand it.
Sometimes "strong" means "does not do implicit type conversion".
For example, JavaScript allows you to write "a" + 1, which can be called "weak typing." But almost all languages provide some level of implicit conversion, for example automatically converting integers to floating-point numbers in an expression like 1 + 1.1. In practice, most people use the word "strong" to draw the line between acceptable and unacceptable conversions. There is no generally accepted line; every such line is imprecise and depends on the opinion of a particular person.
Sometimes "strong" means that it is impossible to circumvent the strict typing rules in a language.
Sometimes "strong" means memory-safe.
C is an example of a memory-unsafe language. If xs is an array of four numbers, then C will happily execute xs[5] or xs[1000], typically returning whatever value happens to be in memory after xs.
Let's stop here. Here is how some languages fare against these definitions. As you can see, only Haskell is consistently "strong" in all respects. Most languages are not so clear-cut.
Language | Static? | Implicit conversions? | Strict rules? | Memory-safe? |
---|---|---|---|---|
C | Strong | It depends | Weak | Weak |
Java | Strong | It depends | Strong | Strong |
Haskell | Strong | Strong | Strong | Strong |
Python | Weak | It depends | Weak | Strong |
JavaScript | Weak | Weak | Weak | Strong |
("It depends" in the "Implicit conversions?" column means that the line between strong and weak depends on which conversions we consider acceptable.)
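Part of the Python row in the table above can be checked directly. This is a minimal sketch; the helper outcome and the dictionary checks are names chosen for the illustration only.

```python
def outcome(thunk):
    """Run thunk; report the result's type name, or the exception's name."""
    try:
        return type(thunk()).__name__
    except Exception as error:
        return type(error).__name__


xs = [1, 2, 3, 4]
checks = {
    "1 + 1.1": outcome(lambda: 1 + 1.1),    # implicit int -> float conversion
    '"a" + 1': outcome(lambda: "a" + 1),    # no implicit str/int conversion
    "xs[5]": outcome(lambda: xs[5]),        # out-of-bounds read is checked
}
for expression, result in checks.items():
    print(f"{expression:10} -> {result}")
```

Python accepts one implicit conversion and refuses another, which is why its entry in that column reads "It depends", while the checked index matches the "Strong" entry for memory safety.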
Often the terms "strong" and "weak" refer to some unspecified combination of the definitions above, and of other definitions not shown here. All this confusion makes the words "strong" and "weak" almost meaningless. When you feel like using these terms, it is better to describe exactly what you mean. For example, you can say that "JavaScript returns a value when a string is added to a number, but Python raises an error." Then we do not waste effort trying to agree on one of the many meanings of the word "strong." Or, even worse, end up with an unresolved misunderstanding caused by terminology.
In most cases, the terms "strong" and "weak" as used on the Internet are vague, poorly defined opinions of particular people. They are used to call a language "bad" or "good", and that opinion is dressed up as technical jargon.
As Chris Smith wrote:
Strong typing: A type system that I like and feel comfortable with.
Weak typing: A type system that worries me or that I am not comfortable with.
Can static types be added to dynamic languages? In some cases, yes. In others, it is difficult or impossible. The most obvious problem is eval and other similar features of dynamic languages. Evaluating 1 + eval("2") in Python gives 3. But what does 1 + eval(read_from_the_network()) give? It depends on what is on the network at run time. If we get a number, the expression is valid. If we get a string, it is not. There is no way to know before the program runs, so the type cannot be analyzed statically.
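The eval problem can be reproduced without a real network. In the sketch below, read_from_the_network is a hypothetical stand-in that simply returns the text it is given, and add_one_to_remote is a name invented for the example.

```python
def read_from_the_network(payload):
    # Hypothetical stand-in: a real version would receive this text from a remote peer.
    return payload


def add_one_to_remote(payload):
    """Evaluate the received text and add 1; the outcome depends on the text."""
    try:
        return 1 + eval(read_from_the_network(payload))
    except TypeError as error:
        return f"TypeError: {error}"


print(add_one_to_remote("2"))      # the text 2 evaluates to an int
print(add_one_to_remote('"2"'))    # the text "2" evaluates to a str
```

The source code of add_one_to_remote is identical in both calls; only the data decides whether the addition is valid.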
The unsatisfactory solution used in practice is to give the expression eval() the type Any, which resembles Object in some object-oriented languages or interface{} in Go: a type that any value satisfies.
Values of type Any are not constrained in any way, so the type system can no longer help us in code that uses eval. Languages that have both eval and a type system must give up type safety every time eval is used.
Some languages have optional or gradual typing: they are dynamic by default but allow you to add some static annotations. Python recently added optional types; TypeScript is a superset of JavaScript with optional types; Flow performs static analysis of plain old JavaScript code.
These languages provide some of the advantages of static typing, but they never provide absolute guarantees, as truly static languages do. Some functions will be statically typed and some will be dynamically typed. The programmer always needs to be aware of the difference and wary of it.
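Python's optional annotations show how partial the guarantee is. The function names below are made up for this sketch; the comments about a static checker describe what a tool such as mypy is expected to report, and the interpreter itself does not enforce the annotations.

```python
from typing import Any


def typed_add(x: int, y: int) -> int:
    # Annotated: a static checker can verify callers of this function.
    return x + y


def untyped_add(x, y):
    # Not annotated: a checker has nothing to verify here.
    return x + y


loose: Any = "a"   # Any switches checking off for this value

# A static checker would flag typed_add("a", "b"), but it accepts
# typed_add(loose, loose). At run time neither call is stopped.
print(typed_add("a", "b"))         # ab
print(typed_add(loose, loose))     # aa
print(typed_add.__annotations__)   # the annotations are stored, not enforced
```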
When statically typed code is compiled, the syntax is checked first, as in any compiler. Then the types are checked. This means that a static language may first complain about a single syntax error and, once it is fixed, complain about 100 type errors. Fixing the syntax error did not create those 100 type errors. The compiler simply could not detect type errors until the syntax was fixed.
Compilers for static languages can usually generate faster code than compilers for dynamic languages. For example, if the compiler knows that the add function takes integers, it can use the CPU's native ADD instruction. A dynamic language will check the types at run time, choosing one of many add operations depending on the types (are we adding integers or floats, or concatenating strings or perhaps lists?), or deciding that an error has occurred and the types do not match. All these checks take time. Dynamic languages use various optimization tricks, for example JIT (just-in-time) compilation, where the code is recompiled at run time once the necessary type information has been gathered. However, no dynamic language can match the speed of carefully written static code in a language like Rust.
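The run-time decision described above can be pictured as a lookup keyed by the operand types. The sketch below is a deliberately simplified, hypothetical model (the table ADD_OPERATIONS and the function choose_add are invented here); it is not how any real interpreter is implemented.

```python
# Hypothetical dispatch table: which "add" a dynamic runtime might pick.
ADD_OPERATIONS = {
    (int, int): "integer addition",
    (float, float): "floating-point addition",
    (int, float): "floating-point addition",
    (float, int): "floating-point addition",
    (str, str): "string concatenation",
    (list, list): "list concatenation",
}


def choose_add(left, right):
    """Inspect the operands' type tags and name the operation to perform."""
    key = (type(left), type(right))
    if key not in ADD_OPERATIONS:
        raise TypeError(f"cannot add {key[0].__name__} and {key[1].__name__}")
    return ADD_OPERATIONS[key]


print(choose_add(1, 2))        # integer addition
print(choose_add("a", "b"))    # string concatenation
```

A static compiler that already knows both operands are integers can skip this lookup entirely, which is where the speed difference comes from.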
Proponents of static type systems point out that without a type system, simple mistakes can cause problems in production. This is, of course, true. Anyone who has used a dynamic language has experienced it.
Proponents of dynamic languages point out that code seems easier to write in such languages. This is definitely true for some kinds of code that we write occasionally, such as the code with eval. It is a debatable choice for everyday work, and here it is worth remembering how vague the word "easy" is. Rich Hickey gave an excellent talk about the word "easy" and its relationship to the word "simple." After watching it you will realize that it is not easy to use the word "easy" correctly. Be wary of "easiness."
The pros and cons of static and dynamic type systems are still poorly understood, but they definitely depend on the language and on the specific problem being solved.
JavaScript tries to keep going, even if that means a meaningless conversion (like "a" + 1 giving "a1"). Python, in turn, tries to be conservative and often raises errors, as it does for "a" + 1.
These are different approaches with different levels of safety, but Python and JavaScript are both dynamically typed languages.
C will happily let a programmer read data from any place in memory, or pretend that a value of one type has a different type, even if that makes no sense and crashes the program.
Haskell will not let you add an integer and a float without an explicit conversion first. C and Haskell are both statically typed, despite such big differences.
There are many variations of dynamic and static languages. Any unqualified statement like "static languages are better than dynamic languages when it comes to X" is almost guaranteed to be nonsense. It may be true for specific languages, but then it is better to say "Haskell is better than Python when it comes to X".
Let's look at two well-known statically typed languages: Go and Haskell. Go's type system has no generic types, that is, types with "parameters" that are other types (this describes Go before version 1.18, which added type parameters). For example, we might want to create our own list type, MyList, that can store any data we need. We want to be able to create a MyList of integers, a MyList of strings, and so on, without changing the source code of MyList. The compiler must enforce the typing: if there is a MyList of integers and we accidentally add a string to it, the compiler must reject the program.
Go was deliberately designed without the ability to define types like MyList. The best you can do is create a MyList of "empty interfaces": MyList can hold objects, but the compiler simply does not know their types. When we take objects out of MyList, we have to tell the compiler their type. If we say "I am taking out a string" but the value is actually a number, there will be a run-time error, just as in dynamic languages.
Go also lacks many other features found in modern statically typed languages (or even in some systems from the 1970s). Go's creators had their reasons for these decisions, but outside opinions on the matter can sometimes be harsh.
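The "list of empty interfaces" situation can be imitated in Python to show where the error ends up. This is only an analogy under stated assumptions: the class below and its get_as method are hypothetical, with get_as playing the role of the "tell the compiler the type" step.

```python
class MyList:
    """A list of values of unknown type, in the spirit of a list of empty interfaces."""

    def __init__(self):
        self._items = []

    def add(self, value):
        self._items.append(value)          # anything is accepted

    def get_as(self, index, expected_type):
        """Take a value out, claiming it has expected_type; checked only at run time."""
        value = self._items[index]
        if not isinstance(value, expected_type):
            raise TypeError(
                f"expected {expected_type.__name__}, found {type(value).__name__}"
            )
        return value


numbers = MyList()
numbers.add(1)
numbers.add("two")                 # nothing stops the accidental string
print(numbers.get_as(0, int))      # 1
try:
    numbers.get_as(1, int)
except TypeError as error:
    print("run-time error:", error)
```

The mistake is made at the add call but reported only at the later get_as call, which is exactly the weakness described above.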
Now compare this with Haskell, which has a very powerful type system. If we define a MyList type, then the type "list of numbers" is simply MyList Integer. Haskell will not let us accidentally add a string to the list, and it will make sure that we do not put an element of the list into a string variable.
Haskell can express much more complex ideas directly in types. For example, Num a => MyList a means "a MyList of values that all belong to the same numeric type." It may be a list of integers, floats, or fixed-precision decimals, but it will never be a list of strings, and this is checked at compile time.
You can write an add function that works with any numeric type. This function will have the type Num a => (a -> a -> a). It means:
a can be any numeric type (Num a =>).
The function takes two arguments of type a and returns a value of type a (a -> a -> a).
One last example. If a function's type is String -> String, then it takes a string and returns a string. But if it is String -> IO String, then it also performs some input/output. This could be disk access, network access, reading from a terminal, and so on.
If a function's type has no IO in it, we know that it performs no I/O. In a web application, for example, you can tell whether a function modifies the database just by looking at its type. No dynamic languages and almost no static languages can do this. It is a feature of languages with the most powerful type systems.
In most languages we would have to dig through the function, all the functions it calls, and so on, trying to find something that modifies the database. This is a tedious process in which it is easy to make a mistake. Haskell's type system can answer the question simply and with a guarantee.
Compare this power with Go, which cannot express the simple idea of MyList, let alone "a function that takes two arguments, both numeric and of the same type, and that performs input/output."
The Go approach makes it easier to write tooling for Go (in particular, the compiler implementation can be simple). In addition, there are fewer concepts to learn. Whether these advantages outweigh the significant limitations is a subjective question. However, one cannot deny that Haskell is harder to learn than Go, that Haskell's type system is much more powerful, and that Haskell can prevent many more kinds of bugs at compile time.
Go and Haskell are such different languages that grouping them into one class of "static languages" can be misleading, even though the term is used correctly. If you compare the practical safety benefits, Go is closer to dynamic languages than to Haskell.
On the other hand, some dynamic languages are safer than some static languages (Python is generally considered much safer than C). When you want to generalize about static or dynamic languages as groups, do not forget the huge number of differences between individual languages.
In more powerful type systems you can specify constraints at a finer level of detail. Here are a few examples; do not worry if the syntax is unfamiliar.
In Go, you can say "the add function takes two integers and returns an integer":
func add(x int, y int) int { return x + y }
In Haskell, you can say "the function accepts any numeric type and returns a number of the same type":
add :: Num a => a -> a -> a
add x y = x + y
In Idris, you can say "the function takes two integers and returns an integer, but the first argument must be less than the second argument":
add : (x : Nat) -> (y : Nat) -> {auto smaller : LT x y} -> Nat
add x y = x + y
If you try to call the function as add 2 1, where the first argument is greater than the second, the compiler will reject the program at compile time. It is impossible to write a program in which the first argument is greater than the second. Few languages have this capability. In most languages such a check happens at run time: we would write something like if x >= y: raise SomeError().
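For comparison, here is the run-time version of that check written out in Python. SomeError is the placeholder exception name used in the text, defined here only so that the sketch runs.

```python
class SomeError(Exception):
    """Placeholder exception, named after the one mentioned in the text."""


def add(x, y):
    # The constraint "x is less than y" is enforced only when the call happens.
    if x >= y:
        raise SomeError(f"expected x < y, got x={x}, y={y}")
    return x + y


print(add(1, 2))  # 3
try:
    add(2, 1)
except SomeError as error:
    print("rejected at run time:", error)
```

The call add(2, 1) is perfectly valid Python source; the problem shows up only if that line is actually executed.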
Haskell has no equivalent of the type in the Idris example above, and Go has no equivalent of either the Haskell example or the Idris example. As a result, Idris can prevent many bugs that Haskell cannot, and Haskell can prevent many bugs that Go will not notice.
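Type systems can be grouped roughly by increasing power:
Languages without generic types cannot express MyList<String>; the best available is a list of values of unknown type, as in the Go example above.
Mainstream languages with generic types (Java and C#) can express MyList<String>.
More powerful type systems are found in newer languages from Mozilla (Rust) and Apple (Swift).
The most powerful type systems (Idris and Agda) can express constraints on values, as in the Idris example above.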
Source: https://habr.com/ru/post/308484/