Why study Clojure?

What makes a programming language good? What qualities should it have? That is hard to answer. Here is one possible definition: a good language does its job well. After all, a language is only a tool in the programmer's hands, and a tool exists to help us in our work; that is why it was created in the first place. Different languages try to solve different problems (with varying success). The design goal of Clojure is to make the programs we write simple and, as a result, to speed up writing and testing them. Most importantly, it aims to reduce the time it takes to understand, change and maintain them.

Clojure rocks?

A warning up front: this article will not contain code snippets that show off how cool Clojure is. There will be no claims like "in X it took 5 lines and in Clojure only 4". That is a terrible criterion for judging a language! I really do not care whether I can write qsort in 2 lines or have to strain my fingers for as many as 5: in real life I will use a library function!

Lambdas surprise no one these days; they are everywhere (well, almost, and sometimes only as of version 8). Collection processing (including parallel), list comprehensions and assorted syntactic sugar are now common in many languages. To be honest, I love articles like that. But such comparisons are completely unsuitable for judging the quality of a language! It is like measuring a language's speed by how fast a program prints "Hello, world!", which only makes sense if we are benchmarking HQ9+. If you think about it, such details matter little for large systems. As a project grows, we care less and less whether we use parentheses or indentation, infix or prefix notation. Nobody worries about an extra line for summing an array any more; problems of a different kind come to the fore.

Complexity

The systems we build are inherently fluid. It would be wonderful if requirements never changed and we could foresee every situation at the very start of development. Alas, in real life we constantly have to extend, rework, improve, rewrite, replace, optimize... The most annoying part is that the complexity of the system only grows over time, constantly and continuously. At the start of development everything is simple and transparent, any change is quick, and there are no "crutches". Beautiful. Over time the picture becomes less rosy. Even the smallest change to the code can potentially trigger an avalanche of changes in the system's behavior. We have to study and analyze the code carefully and try to predict the side effects of every change. Eventually we literally cannot analyze all the possible consequences of our changes.

A person can by nature take in only a limited amount of information at a time. As a project grows, the number of internal connections increases, and most of them are implicit. It gets harder and harder to keep everything we need in our heads. Meanwhile the team grows and changes, and new people no longer know the whole project. Responsibilities get divided, which can lead to even more entanglement. Gradually, our system becomes complex.

How do we deal with this? Maximum regression test coverage, run after every change? Tests are extremely useful, but they are only a safety rope. If the tests fail, something is wrong and we have problems. That treats the symptoms; tests do not eliminate the root of the problem. Strict guidelines and widespread use of patterns? No, the problem is not local difficulties. We simply stop understanding how the components in our code interact, because there are too many implicit connections. Constant refactoring, perhaps? That is no panacea either: the complexity does not grow out of low-level decisions. The problem really has to be tackled comprehensively, and one important ingredient is the right tool. A good programming language should help us write simple and transparent programs.

Simple and easy

But "simple" does not mean "easy". These are different concepts. Rich Hickey (the author of Clojure) even gave a famous talk on the topic, Simple Made Easy. A translation of the slides was published on Habr. Simplicity is an objective notion: it is the absence of complexity, of interleaving and tangling, a small number of connections. "Easy", on the other hand, is very subjective. Is it easy to ride a bike? To win a chess game? To speak German? I do not know German, but that is no reason to say "this language is useless, it is too complicated". It is complicated for me, and only because I simply do not know it.

We are all used to writing a function call as f(x, y). We are used to programming in the OOP style. It is familiar. But easy is not necessarily simple. We just get used to the complexity of some things, start to ignore it and take it for granted. An example function:

 (defn select
   "Returns a set of the elements for which pred is true"
   {:added "1.0"}
   [pred xset]
   (reduce (fn [s k] (if (pred k) s (disj s k)))
           xset xset))

It looks very... weird! You have to spend some time studying the language and mastering its concepts before it becomes easy. But simplicity (or complexity) is constant. However well we learn a tool, the number of its internal dependencies does not change. It does not become simpler or more complex, although it does become easier for us.

A familiar tool may give better results right now, but in the longer run the simpler solution gives the best results.

Side effects

What are the sources of difficulty in our programs? One of them is side effects. We cannot completely do without them, but we can localize them. And the language should help us in this.

Clojure is a functional language; it encourages us to write pure functions. The result of such a function depends only on its input parameters. There is no need to puzzle over "hmm, what happens if I call this one before that one". There is an input, and there is an output. No matter how many times we run the function, its result is the same. This makes testing very easy: there is no need to try different call orders or to recreate (or simulate) the right external state.

Pure functions are easier to analyze; you can literally play with them and see how they behave on live data. Debugging is easier too. We can always reproduce a problem in a pure function: just pass it the inputs that cause the error, because the result does not depend on what ran before. Pure functions are extremely simple, even if they do a lot of work.
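As a small illustration of the point about testing, here is a minimal sketch of a pure function. The name order-total and the shape of the data are made up for this example; they are not from any library.

```clojure
;; Hypothetical pure function: the result depends only on the argument.
(defn order-total
  "Sums :price * :qty over a sequence of order lines."
  [lines]
  (reduce + (map (fn [{:keys [price qty]}] (* price qty)) lines)))

(def lines [{:price 10 :qty 2} {:price 5 :qty 3}])

(order-total lines) ; => 35
(order-total lines) ; => 35, however many times and in whatever order we call it
(order-total [])    ; => 0
```

Checking such a function needs no setup and no mocks: we pass data in and compare what comes out.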

Of course, Clojure supports higher order functions and their composition.

 ((juxt dec inc) 1)            ; => [0 2]
 ((comp str *) 1 2 3)          ; => "6"
 (map (partial * 10) [1 2 3])  ; => (10 20 30)
 (map (comp inc inc) [1 2 3])  ; => (3 4 5)

Clojure is not a pure language, and functions can have side effects. For example, println is a function whose call is an action. What matters is that the whole point of such functions is to interact with the outside world. Writing a value to a file, sending an HTTP request, executing SQL: all of these actions are meaningless without the side effect they produce. That is why it is very useful to keep the two kinds of functions (pure and impure) apart.

But these impure functions hold no state of their own. They only serve as a means of interacting with the outside world. As we will see, Clojure keeps the state of our program behind indirect references.

Immutability

All data structures in Clojure are immutable. There is no way to change an element of a vector. All we can do is create a new vector in which one element differs. A very important point is that Clojure preserves the algorithmic complexity (in time and memory) of all the standard collection operations. Well, almost: instead of O(1) for vectors we get O(log₃₂ N). In practice, even for collections of millions of elements, log₃₂ N does not exceed 5.

This is possible thanks to persistent collections. The idea is that when a structure is "changed", the old and the new version share most of their internal data. The old version remains fully usable; in fact, we keep access to every version of the structure. This is an important point. Of course, versions that are no longer needed are collected by the garbage collector.

 (def a [1 2 3 4 5 6 7 8])
 ; a -> [1 2 3 4 5 6 7 8]
 (def b (assoc a 3 999))
 ; b -> [1 2 3 999 5 6 7 8]
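To get a feel for the log₃₂ N figure, here is a rough sketch that counts how many levels a tree needs if every node has 32 children. This is a simplified model with a made-up function name; the real vector implementation has additional optimizations that are not modelled here.

```clojure
;; Simplified model (an assumption, not Clojure's actual implementation):
;; every node has 32 children and the items sit in the leaves.
(defn levels-needed
  "Number of 32-way levels needed to address n items."
  [n]
  (loop [levels 1
         capacity 32]
    (if (>= capacity n)
      levels
      (recur (inc levels) (* capacity 32)))))

(levels-needed 1000)      ; => 2
(levels-needed 1000000)   ; => 4
(levels-needed 33554432)  ; => 5, since 32^5 = 33554432
```

Integer arithmetic is used on purpose: computing the logarithm with floating point can be off by one at exact powers of 32.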

Out of the box, Clojure supports singly linked lists, vectors, hash maps and red-black trees. There is also an implementation of a persistent queue (for a stack you can use a list or a vector). And everything is immutable. For better performance you can define your own record types.

 (defrecord Color [red green blue])
 (def a (Color. 0.5 0.6 0.7))
 ; a => #user.Color{:red 0.5, :green 0.6, :blue 0.7}

Here we declare a structure with 3 fields. The Clojure compiler will generate a class with 5 fields (2 "extra" ones): one field for metadata (null in our case), 3 fields for the data itself, and one more field for additional keys. Even when we declare a structure with an explicit list of fields to make our program faster, Clojure still lets us attach additional values.

 ; a is the Color record created earlier
 (def b (assoc a :alpha 0.1))
 ; b => #user.Color{:red 0.5, :green 0.6, :blue 0.7, :alpha 0.1}

And yes, there is a special syntax for data structures in Clojure:

 ; vector
 [1 2 3]
 ; hash map
 {:x 1, :y 2}
 ; set
 #{"a" "b" "c"}

State

So, we have pure functions, which define the business logic of our application. We have impure functions, which interact with external systems (sockets, databases, web servers). And there is the internal state of our system, which Clojure keeps behind indirect references.

There are 4 standard reference types:

var - an analogue of thread-local variables; used for contextual data such as the current database connection, the current HTTP request, precision settings for mathematical expressions and the like;
atom - an atomic cell; allows the state to be updated synchronously, but without coordination;
agent - a lightweight analogue of an actor (although in a sense the two are opposites, more on this below); used for asynchronous work with state;
ref - a transactional memory cell; provides synchronous and coordinated work with state.

All global variables (including functions) are stored in vars. This is why they can be redefined "locally".

 (def ^:dynamic *a* 1)
 (println *a*)                     ; => 1
 (binding [*a* 42] (println *a*))  ; => 42

Here we told the compiler that the variable *a* should be dynamic, i.e. stored in a ThreadLocal. Using a ThreadLocal costs some performance, so it is not done for all vars by default. But if necessary, any var can be made dynamic after it has been created (which is often used in tests).
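The thread-local nature of such a binding can be observed directly. The sketch below uses a hypothetical var *level* and a hypothetical helper read-in-new-thread; it starts a plain java.lang.Thread on purpose, because futures and agents carry the caller's bindings over to the worker thread.

```clojure
(def ^:dynamic *level* 1) ; hypothetical dynamic var with root value 1

(defn read-in-new-thread
  "Starts a plain Thread, waits for it, and returns the value of *level* it saw."
  []
  (let [seen (promise)
        t    (Thread. (fn [] (deliver seen *level*)))]
    (.start t)
    (.join t)
    @seen))

;; The binding is visible in the current thread only;
;; the new thread still sees the root value.
(binding [*level* 42]
  [*level* (read-in-new-thread)])
; => [42 1]
```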

In tests, you can replace entire functions.

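 ; a function with side effects;
 ; binding only works on dynamic vars, hence ^:dynamic
 (defn ^:dynamic some-function-with-side-effect [x] ...)
 ; the function under test, which calls the one above
 (defn another-function [x] ...)
 (deftest just-a-test
   ...
   (binding [some-function-with-side-effect (fn [x] ...)] ; mock implementation
     (another-function 123))
   ...)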
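For a version that runs as is, here is a minimal sketch using with-redefs, a standard clojure.core macro that temporarily replaces the root value of a var and does not require the var to be dynamic. The functions send-email! and notify are hypothetical stand-ins invented for this example.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

;; Hypothetical impure function; printing stands in for real I/O.
(defn send-email! [address text]
  (println "sending to" address)
  :sent)

;; Hypothetical function under test.
(defn notify [user]
  (send-email! (:email user) (str "Hello, " (:login user))))

(deftest notify-test
  (let [calls (atom [])]
    ;; Inside the body, send-email! is replaced by a mock that records its arguments.
    (with-redefs [send-email! (fn [address text]
                                (swap! calls conj [address text])
                                :mocked)]
      (is (= :mocked (notify {:login "theuser" :email "theuser@example.com"}))))
    (is (= [["theuser@example.com" "Hello, theuser"]] @calls))))
```

Calling (run-tests) in the same namespace runs the test. Unlike binding, with-redefs changes the var for all threads, so it is best kept to test code.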

All references in Clojure support the deref operation (getting the value). For vars it looks like this:

 ; (var a) is the same as #'a
 (def a 123)
 (println a)            ; => 123
 (println #'a)          ; => #'user/a
 (println (deref #'a))  ; => 123

The cell stores an (immutable) value, but is itself a separate entity. There is special syntax for the deref function, the @ prefix (yes, it is just sugar). Here is an example with an atom.

 (let [x (atom 0)]
   (println @x)   ; => 0
   (swap! x inc)  ; CAS operation
   (println @x))  ; => 1

The swap! function takes an atom and a "mutating" function. The latter receives the current value of the atom and must return the new one. This is where persistent data structures come in very handy. For example, we can keep a vector of a million elements in an atom, and the "mutating" function will still run fast enough for CAS (remember that operations on persistent collections have the same complexity as on ordinary mutable ones). Or we can update a couple of fields in a hash map:

 (def user (atom {:login "theuser" :email "theuser@example.com"}))
 (swap! user assoc :phone "12345")
 ; the same as
 (swap! user (fn [x] (assoc x :phone "12345")))

It is important that the function is pure, because it may be executed several times. We cannot (must not!) write something like:

 (swap! x (fn [x] (insert-new-record-to-db x) (inc x)))
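Why the function may run more than once becomes clearer from a simplified sketch of a compare-and-set retry loop. The name swap-sketch! is made up, and this is an illustration of the idea built on the standard compare-and-set! function, not the actual source of swap!.

```clojure
;; A simplified sketch of the retry loop (swap-sketch! is a made-up name).
(defn swap-sketch!
  [a f]
  (loop []
    (let [old-value @a
          new-value (f old-value)]
      ;; compare-and-set! succeeds only if the atom still holds old-value
      (if (compare-and-set! a old-value new-value)
        new-value
        (recur)))))   ; another thread got there first: call f again

(def counter (atom 0))
(swap-sketch! counter inc)          ; => 1
(swap-sketch! counter #(+ % 10))    ; => 11
```

If f inserted a database record, every retry would insert it again, which is exactly why the update function has to be pure.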

Agents

Agents are for state that is directly tied to side effects. The idea is simple. We have a cell with a queue of functions "attached" to it. The functions are applied one after another to the value stored in the cell, and the result of each function becomes the new value. Everything is computed asynchronously on a separate thread pool.

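 (def a (agent 0))  ; initial value
 (send a inc)
 (await a)          ; wait until the queued action has run
 (println @a)       ; => 1
 (send a (fn [x] (Thread/sleep 100) (inc x)))
 (println @a)       ; => 1, the action is still sleeping
 ; about 100 ms later
 (println @a)       ; => 2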

Agents update their value asynchronously, but we can read an agent's state at any time. Agents can send messages to each other; such a send is held back until the sending agent has updated its own state. In other words, if an action throws an exception, the messages sent from it are never delivered.

 (def a (agent 0))
 (def b (agent 0))
 (send a (fn [x]
           (send b inc) ; queue a message for agent b
           (throw (Exception. "Error"))))
 (println @b) ; -> 0, the message was never sent
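Because agents are asynchronous, reading @b right after send is a race. Here is a sketch of the same experiment that waits for the outcome deterministically, assuming the standard :error-handler option of agent and a promise (named failed here for illustration) to signal that the failing action has finished.

```clojure
(def failed (promise))

;; :error-handler is called when an action throws; with a handler supplied,
;; the agent keeps accepting actions afterwards.
(def a (agent 0 :error-handler (fn [_agent ex] (deliver failed ex))))
(def b (agent 0))

;; This action fails, so its nested send to b is dropped.
(send a (fn [x]
          (send b inc)
          (throw (Exception. "Error"))))

(.getMessage @failed) ; blocks until the handler has run
@b                    ; => 0

;; This action succeeds, so its nested send is released afterwards.
(send a (fn [x] (send b inc) (inc x)))
(await a)             ; the action on a has completed and released its sends
(await b)             ; the inc queued for b has run
[@a @b]               ; => [1 1]
```

When this is run as a standalone script, calling (shutdown-agents) at the end lets the JVM exit.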

This invites a comparison with the actor model. The two are similar, but there are fundamental differences. An agent's state is explicit: at any time you can call deref and get the agent's value. This contradicts the idea of actors, where we can learn the state only indirectly, by sending and receiving messages. With an actor we cannot even be sure that asking for its state will not "accidentally" change it. An agent is absolutely reliable in this respect: its state can only be changed by the functions send and send-off (which differ only in the thread pool that processes our message).

The second key difference is that agents are open to change and to new functionality. The only way to change the behavior of an actor is to rewrite its code. Agents are just references; they have no behavior of their own. We can write a new function and send it to the agent's queue.

Actors try to split the state of our program into small pieces that are easier to break apart or isolate. Updating and reading state are reduced to sending messages. Sometimes this is extremely useful (for example, when running an Erlang program on several nodes). But more often it is not needed, and sometimes it even gets in the way. Agents, for instance, are convenient for holding large amounts of information that must be shared between threads: caches, sessions, intermediate results of mathematical computations, and so on.

For an actor, we fix the set of messages it can respond to (it treats all others as errors). The order of messages also matters, since they can lead to side effects. This is its public contract. For an agent, we fix only the data that can be stored in it and its structure. It is very important to stress that agents do not try to replace actors at all. These are different concepts with different uses.

As mentioned, agents work asynchronously. We can build chains of events (sending messages from agent to agent). But with agents alone we cannot change the state of our program in a coordinated way.

STM

Software transactional memory is one of the key features of Clojure. It is implemented using MVCC. Let us start with an example:

 (def account1 (ref 100))
 (def account2 (ref 0))
 (dosync
   (alter account1 - 30)
   (alter account2 + 30))

We decrease one value and increase the other at the same time. If something goes wrong (an exception is thrown), the entire transaction is rolled back:

 (println @account1) ; => 70
 (println @account2) ; => 30
 (dosync
   (alter account1 * 0)
   (alter account2 / 0)) ; => ArithmeticException
 ; the values have not changed
 (println @account1) ; => 70
 (println @account2) ; => 30
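The same all-or-nothing behavior holds when many threads update the refs at once. A minimal sketch, with a hypothetical transfer! helper and refs named from and to chosen for this example:

```clojure
(def from (ref 100))
(def to   (ref 0))

(defn transfer!
  "Moves amount from one ref to another inside a single transaction."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

;; 100 concurrent transfers of 1 unit each; conflicting transactions are retried.
(let [transfers (doall (repeatedly 100 #(future (transfer! from to 1))))]
  (run! deref transfers))

[@from @to] ; => [0 100]
```

No locks appear in the code, yet no transfer is lost and the total stays at 100. When run as a standalone script, (shutdown-agents) at the end stops the thread pool used by future.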

This is very similar to the usual ACID, but without Durability. On entering a transaction, all refs appear frozen: their values are fixed for the duration of the whole transaction. If, on reading or writing a ref, it turns out that its value has already been changed (another transaction has committed and spoiled things for us), the current transaction is restarted. For this reason there must be no side effects inside a transaction (I/O, work with atoms). And this is where agents come in handy.

 (def a (ref 0))
 (def b (ref 0))
 (def out-agent (agent nil))
 (dosync
   (println "transaction")
   (alter a inc)
   ; the message is queued here but sent only after the commit
   (let [a-value @a
         b-value @b]
     (send-off out-agent (fn [_] (println "a" a-value "b" b-value))))
   (alter b dec))

All messages to agents are held until the transaction commits. In our example, changes to the refs a and b may cause the transaction to restart, so the word "transaction" may be printed several times. But the code inside the agent will run exactly once, after the transaction has completed.

To keep transactions from interfering with each other as much as possible, Clojure refs store a history of values. By default this is only the latest value, but when a conflict occurs (one transaction writes while another reads), the size of the stored history for that particular ref is increased by one (up to 10 values by default). Remember that we store persistent structures in refs, and these share common structural elements. So keeping such a history is very cheap in Clojure in terms of memory.
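The history settings of a ref can be inspected and tuned with standard clojure.core functions. A short sketch (the ref names r and busy are arbitrary):

```clojure
(def r (ref 0))

(ref-min-history r)   ; => 0
(ref-max-history r)   ; => 10
(ref-history-count r) ; => 0, nothing beyond the current value yet

;; Both limits can be set when the ref is created...
(def busy (ref 0 :min-history 2 :max-history 20))
;; ...or changed later.
(ref-max-history r 30)

[(ref-min-history busy) (ref-max-history busy) (ref-max-history r)]
; => [2 20 30]
```

Raising :min-history is a possible tuning step for a ref that is read by long transactions while being written frequently, since it avoids the first few retries needed to grow the history.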

STM transactions do not get in the way of changing our code. There is no need to analyze whether a given ref may be used in the current transaction. All refs are available, and we can add new ones in a way that is completely transparent to existing code. Refs do not interact with each other. With ordinary locks, by contrast, we have to keep track of the lock/unlock order to avoid deadlock.

Under concurrent access, reader transactions do not block each other, just as with a ReadWriteLock. Moreover, writer transactions do not block readers either! Even while a transaction that changes a ref is in progress, we can read its value without blocking.

Agents and STM refs complement each other. The former are not suited to coordinated state changes, the latter do not allow side effects. Used together, they make our programs more transparent and simpler (less tangled) than the "classic" tools (mutexes, semaphores and the like).

Metaprogramming

Many languages now have some kind of metaprogramming facilities: AspectJ for Java, AST transformations for Groovy, decorators and metaclasses for Python, various forms of reflection.

Clojure, as a member of the Lisp family, uses macros for this purpose. With them we can extend the language using the language itself. A macro is an "ordinary" function, with the only difference that it runs while the program is being compiled. The macro receives code that has not been compiled yet, and its result is new code, which the compiler then compiles.

 (defmacro unless [pred a b]
   `(if (not ~pred) ~a ~b))

 (unless (> 1 10)
   (println "1 > 10. ok")
   (println "1 < 10. wat"))

We have created our own control structure (an inverted version of if). All it took was writing a function!
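To see the code a macro produces without running it, we can use the standard macroexpand-1 function. The sketch repeats the unless macro so that it is self-contained; the keywords :small and :big are placeholders.

```clojure
(defmacro unless [pred a b]
  `(if (not ~pred) ~a ~b))

;; macroexpand-1 returns the generated code as data, without evaluating it.
(macroexpand-1 '(unless (> 1 10) :small :big))
; => (if (clojure.core/not (> 1 10)) :small :big)

(unless (> 1 10) :small :big)
; => :small
```

Note that not was resolved to clojure.core/not by the syntax quote, so the expansion keeps working even if the calling namespace defines its own not.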

Macros are used extensively in Clojure. In fact, many of the operators built into the language are actually macros. For example, here is the implementation of or:

 (defmacro or
   ([] nil)
   ([x] x)
   ([x & next]
     `(let [or# ~x]
        (if or# or# (or ~@next)))))

Even defn is just a macro that expands into def and fn. Incidentally, destructuring is also implemented with macros.

 (let [[a b] [1 2]] (+ a b))
 ; expands into something like...
 (let* [vec__123 [1 2]
        a (clojure.core/nth vec__123 0 nil)
        b (clojure.core/nth vec__123 1 nil)]
   (+ a b))

Java recently got try-with-resources. We only had to wait a few years for it, until version 7 of Java. In Clojure, it takes just a few lines:

 ; a simplified version: v is the local name, r is the resource expression
 (defmacro with-open [[v r] & body]
   `(let [~v ~r]
      (try
        ~@body
        (finally (.close ~v)))))

In other languages the situation is better, but still far from ideal. What matters is not whether a particular construct exists in the language, but whether you can add your own. So it is no surprise that, say, pattern matching for Clojure is implemented as a separate library. There is simply no need to put such things into the core of the language; it makes much more sense to implement them as macros. The same goes for monad support, logic programming, advanced error handling and other language extensions. There is even optional static typing!

Not to mention the convenience of creating DSLs. Many have been created for Clojure: for HTML, for HTTP, and so on.

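Clojure (like any Lisp) is homoiconic: a program is written directly in the language's own data structures. A macro therefore works with the AST as ordinary lists, symbols and other values, and building code is no different from building data.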

 (defn do2 [x] (list 'do x x))
 (do2 '(println 1))
 ; => '(do (println 1) (println 1))
 ; which is the same as
 ; => (list 'do (list 'println 1) (list 'println 1))
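Since do2 is an ordinary function over lists, it can be called from a macro or its result passed to eval. A small sketch (the macro name twice is made up for this example):

```clojure
(defn do2 [x] (list 'do x x))

;; A macro is a function over code: this one simply delegates to do2.
(defmacro twice [form]
  (do2 form))

(macroexpand-1 '(twice (print "x"))) ; => (do (print "x") (print "x"))
(with-out-str (twice (print "x")))   ; => "xx"
(eval (do2 '(+ 1 2)))                ; => 3
```

The same function can be unit-tested like any other, by comparing the lists it returns.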

For all their power, macros in Clojure do not hurt the readability of a program (provided, of course, that they are used in moderation). After all, a macro is just a function, and we can always tell unambiguously which function is being used in the current context. For example, if we see the code (dosomething [a b] c), it is easy to find out what stands behind the name dosomething: just look at the top of the file (where other modules are imported). If it is a macro, its semantics are fixed and known. We do not need an advanced IDE to understand this code. Although advanced development environments can, of course, "expand" a macro in place and show what the compiler will turn the code into.

Polymorphism

Clojure has 2 mechanisms for creating polymorphic functions. Initially the language supported only multimethods, a powerful tool that is overkill most of the time. Starting with version 1.2 (the current version is 1.5.1), a new concept was added to the language: protocols.

Protocols are similar to Java interfaces, but cannot inherit each other. Each protocol describes a set of functions.

 (defprotocol IShowable
   (show [this]))
 ; ...
 (map show [1 2 3])

This declares 2 entities: the protocol itself and the function show. It is an ordinary Clojure function which, when called, looks up the most suitable implementation based on the type of its first argument. Separately, we declare the data structures we need and specify the protocol implementation for them.

 (defrecord Color [red green blue]
   IShowable
   (show [this]
     (str "<R" (:red this) " G" (:green this) " B" (:blue this))))

You can also implement a protocol for a third-party type (even a built-in one).

 (extend-protocol IShowable
   String
   (show [this] (str "string " this))
   clojure.lang.IPersistentVector
   (show [this] (str "vector " this))
   Object
   (show [this] "WAT"))

 (show "123")     ; => "string 123"
 (show [1 2 3])   ; => "vector [1 2 3]"
 (show '(1 2 3))  ; => "WAT"

You can add protocol implementations to existing types even without access to their source code. No magical bytecode manipulation or similar tricks are involved. Clojure maintains a global table of type -> implementation; when a protocol method is called, the table is searched by the type of the first argument, taking the type hierarchy into account. So declaring a new implementation for a protocol comes down to updating that global table.
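The effect of adding an entry to that table can be observed with the standard extends? and extenders functions. A minimal sketch with a hypothetical protocol Describable:

```clojure
(defprotocol Describable            ; hypothetical protocol for this sketch
  (describe [this]))

(extend-protocol Describable
  String
  (describe [this] (str "string " this))
  Object
  (describe [this] "something else"))

(extends? Describable String)                ; => true
(extends? Describable clojure.lang.Keyword)  ; => false, no entry for this type yet
(describe :k)                                ; => "something else", found via Object

;; Adding an implementation later only adds one more entry.
(extend-type clojure.lang.Keyword
  Describable
  (describe [this] (str "keyword " (name this))))

(describe :k)                                ; => "keyword k"
(set (extenders Describable))
; => #{java.lang.String java.lang.Object clojure.lang.Keyword}
```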

But sometimes protocols are not enough, for example for double dispatch. In this case (and not only this one) multimethods come in useful. When declaring a multimethod, we specify a special dispatch function. The dispatch function receives the same arguments as the multimethod. The final implementation is then looked up by the value the dispatch function returns. This can be a type, a keyword or a vector. In the case of a vector, the most suitable implementation is found by matching several values at once.

 (defmulti convert
   (fn [obj target-type] [(class obj) target-type]))

 (defmethod convert [String Integer] [x _] (Integer/parseInt x))
 (defmethod convert [String Long] [x _] (Long/parseLong x))
 (defmethod convert [Object String] [x _] (.toString x))
 (defmethod convert [java.util.List String] [x _] (str "V" (vec x)))

 (convert "123" Integer)   ; -> 123
 (convert "123" Long)      ; -> 123
 (convert 123 String)      ; -> "123"
 (convert [1 2 3] String)  ; -> "V[1 2 3]"

Here we declared an abstract function whose implementation is chosen by the type of the first argument and the value of the second (which must be a class). Naturally, Clojure takes the type hierarchy into account when looking for a suitable implementation. Types are convenient, but their hierarchy is rigidly fixed. We can, however, create our own ad-hoc hierarchy of keywords.

 ; define the hierarchy: (derive child parent)
 (derive ::rect ::shape)
 (derive ::square ::rect)
 (derive ::circle ::shape)
 (derive ::triangle ::shape)

 ; a keyword serves as the dispatch function, :type ~ (fn [x] (:type x))
 (defmulti perimeter :type)

 (defmethod perimeter ::rect [x] (* 2 (+ (:h x) (:w x))))
 (defmethod perimeter ::triangle [x] (reduce + ((juxt :a :b :c) x)))
 (defmethod perimeter ::circle [x] (* 2 Math/PI (:r x)))

 (perimeter {:type ::rect, :h 10, :w 3})            ; -> 26
 (perimeter {:type ::square, :h 10, :w 10})         ; -> 40
 (perimeter {:type ::triangle, :a 3, :b 4, :c 5})   ; -> 12
 (perimeter {:type ::shape})                        ; -> throws IllegalArgumentException

Several hierarchies can be declared. As with types, dispatch can be done on several values at once (a vector). When defining your own hierarchies you can even mix keywords and Java types!

 (derive java.util.Map ::collection)
 (derive java.util.Collection ::collection)
 (derive ::tag java.lang.Iterable) ; -> ClassCastException

We can make a type "inherit" from a keyword (but not the other way round). This is useful for creating groups of classes that are open for extension.
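Here is a sketch of how such a group might be used: one multimethod implementation covers every class that falls under the keyword. The multimethod size-of is a hypothetical name for this example; derive, isa?, defmulti and defmethod are standard.

```clojure
(derive java.util.Map ::collection)
(derive java.util.Collection ::collection)

(isa? java.util.HashMap ::collection) ; => true, via java.util.Map
(isa? String ::collection)            ; => false

;; size-of is a hypothetical multimethod that dispatches on the argument's class.
(defmulti size-of class)
(defmethod size-of ::collection [c] (count c))
(defmethod size-of :default [_] :not-a-collection)

(size-of {:a 1 :b 2})                   ; => 2
(size-of [1 2 3])                       ; => 3
(size-of (java.util.ArrayList. [1 2]))  ; => 2
(size-of "abc")                         ; => :not-a-collection
```

Any other class can join the group later with one more derive call, without touching size-of.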

The multimethod system is simple, yet extremely powerful. Protocols usually cover everyday needs, but multimethods can be an excellent way out in difficult and unusual situations.

Common sense

A language is nothing without infrastructure: without a community, a set of libraries, frameworks and all sorts of utilities. One of Clojure's strengths is that it runs on the JVM. Integration with Java (in both directions) is extremely simple. It is no secret that there is a huge number of Java libraries (we will not discuss their quality). All of them can be used directly from Clojure, although the number of native Clojure libraries is already considerable (and constantly growing).

Plugins for Eclipse and IDEA are under active development. For building projects, the Leiningen utility, used by the entire community, has long been the de facto standard. There is a variety of frameworks for building web applications, and there is Immutant (built on JBoss AS7). Immutant supports Ring (the HTTP interface for Clojure), among other things.

Clojure is not limited to the JVM: there is an implementation for the .NET CLR, and there is ClojureScript, a port to JavaScript. Of course, ClojureScript has no multithreading and, as a result, no transactional memory or agents. But all the other language features are available, including persistent data structures, macros, protocols and multimethods. And the integration between ClojureScript and JavaScript is as good and simple as between Clojure and Java (in places even better).

What's next?

From here on everything is simple. We have a tool: a working, reliable one. Not a silver bullet, but quite versatile. Simple. Yes, you may have to spend some time mastering it. Much may seem unusual and strange. But that is only a matter of habit, and you quickly come to see that the whole beauty of the language lies in how organically its individual elements fit together into a single whole.

Clojure is worth getting to know. Definitely. Even if the tool does not suit you for one reason or another, the ideas behind it will prove very useful.

Source: https://habr.com/ru/post/173071/
