One topic has been bothering me for a long time, so I decided to write about it and hear what people think. It concerns the hGetContents function. If you have ever worked with files in Haskell, you know that this function returns the contents of a file (or, more generally, of a stream). Here is a typical example of its use.
    import System.IO

    main = do
        file <- openFile "1.txt" ReadMode
        content <- hGetContents file
        print content
        hClose file
Nothing unusual here: open the file, read the contents, print them to the screen, close the file. What is special about hGetContents (or, more precisely, about Haskell itself) is that it is lazy. The data is not read from the file all at once; it is read only as it is needed. For example, this program
    import System.IO

    main = do
        file <- openFile "1.txt" ReadMode
        content <- hGetContents file
        print $ take 3 content
        hClose file
reads only as much of the file as is needed to produce the first 3 characters.
What can I say, a great feature! It is very convenient to work with the content variable (the contents of the file) while knowing that Haskell will take from it only what is needed and nothing more.
Now consider the following example.
    import System.IO

    main = do
        file <- openFile "1.txt" ReadMode
        content <- hGetContents file
        hClose file
        print content
Here we have simply swapped the last two lines. The result is that an empty string is printed (newer versions of GHC instead fail with a “delayed read on closed handle” error). Why does this happen? Because we are trying to access the contents of the file after the file has already been closed. When hGetContents was called, no data was actually read; the content variable merely kept a reference to the file. Later, when we needed the value of content, it turned out that the file was already closed.
Well, so what, you might think: we just need to use the content variable before closing the file.
The problem is that I see no obvious way to “force” Haskell to evaluate the content variable before the file is closed (if you know one, please write). For example, suppose we need a function that opens a file, reads its contents, parses them into some structure, closes the file, and returns that structure.
    {-# LANGUAGE ScopedTypeVariables #-}
    import System.IO

    parseFile :: (Read a) => String -> IO a
    parseFile fileName = do
        file <- openFile fileName ReadMode
        content <- hGetContents file
        result <- return $ read content  -- lazy: nothing is read or parsed yet
        hClose file
        return result

    main = do
        a :: Int <- parseFile "1.txt"
        print a
It looks like an elementary example, but it DOES NOT WORK, and for the same reason. Passing the contents of the file to read changes nothing, because that call is also put off “for later”. And whatever pure function we apply to content, that alone will not make Haskell read the data from the file; something has to actually demand the value, for example printing it to the screen.
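For what it is worth, here is a minimal sketch of one way the read can apparently be forced before hClose. It assumes that evaluate from Control.Exception is acceptable here, and the name parseFileStrict is hypothetical (it is not part of any library). The idea is that length has to walk the whole string, so evaluating it inside IO pulls all the data out of the file while the handle is still open.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical strict variant of parseFile: forces the whole file
-- to be read before the handle is closed.
parseFileStrict :: Read a => FilePath -> IO a
parseFileStrict fileName = do
    file <- openFile fileName ReadMode
    content <- hGetContents file
    -- length must traverse the entire string, so evaluating it here
    -- reads all of the data while the handle is still open.
    _ <- evaluate (length content)
    hClose file
    -- content is now fully in memory, so parsing it later is safe.
    return (read content)

main :: IO ()
main = do
    writeFile "1.txt" "42"  -- sample input, assumed only for this demo
    a <- parseFileStrict "1.txt" :: IO Int
    print a
```

This is a sketch, not a recommendation: it gives up laziness entirely and loads the whole file into memory. Recent versions of base also seem to provide a strict hGetContents' in System.IO, which may be the simpler choice where it is available.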
You can, of course, place the whole program between openFile and hClose, but what if you need to read the contents of a file, modify them, and write them back to the same file?
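The read-modify-write case can be sketched along the following lines, under the assumption that holding the whole file in memory is acceptable. The helper name rewriteFile is hypothetical. The contents are forced with evaluate inside withFile, so the handle is already closed by the time writeFile opens the same path for writing.

```haskell
import Control.Exception (evaluate)
import Data.Char (toUpper)
import System.IO

-- Hypothetical helper: read a file completely, apply a pure
-- transformation, and write the result back to the same path.
rewriteFile :: (String -> String) -> FilePath -> IO ()
rewriteFile f path = do
    content <- withFile path ReadMode $ \h -> do
        s <- hGetContents h
        -- Force the whole string before withFile closes the handle.
        _ <- evaluate (length s)
        return s
    -- The read handle is closed by now, so the file can be reopened.
    writeFile path (f content)

main :: IO ()
main = do
    writeFile "1.txt" "hello"  -- sample input, assumed only for this demo
    rewriteFile (map toUpper) "1.txt"
    readFile "1.txt" >>= putStrLn
```

Without the evaluate line, the lazy string would escape from withFile unread, and the program would run into the same closed-handle problem as before.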
I thought about this example for a long time, and the feeling that “something is wrong here” would not leave me. Doesn't this violate the purity of Haskell? I wrote a program that is correct in essence, yet it gives an incorrect result. Why should I have to think about the moment at which the runtime will touch this variable, and about what it is doing under the hood at all? This is not Haskell; this is, pardon the expression, some kind of C++.
As a result, I came to the conclusion that this “feature” is not a feature but a bug. If I have not convinced you yet, here are a couple of arguments:
- 1. In lambda calculus there are various strategies for reducing a lambda term: full beta reduction, normal order, call by name (Haskell uses an optimized variant, call by need), and call by value (eager evaluation). There is a theorem stating that a lambda term reduces to the same normal form regardless of the evaluation strategy, provided the reduction terminates at all. That is, any evaluation order (lazy or eager) should produce the same result. True, there are lambda terms whose evaluation loops under an eager strategy but not under a lazy one (working with infinite lists, for example). But if both evaluations terminate, they must give the same result!
(By the way, the opposite case, where the lazy strategy loops but the eager one does not, does not exist. In this sense the lazy strategy is the most “careful” one.)
If we look at our last example and imagine Haskell evaluating it eagerly, the program should work correctly and print the contents of the file. That is, eager evaluation gives one result and lazy evaluation gives another. You may object that a program is not a lambda term and not a pure function. In a sense, that is true. But let's remember why monads were introduced. Was it not to present a program as a pure function that takes the state of the outside world as input and returns a new state as output?
- 2. Still, why does this happen? In Haskell, all functions are pure: the result of a computation depends only on the arguments passed. That is why a computation can be postponed “for later” without changing its result. But when we call hGetContents, the result depends not only on the argument but also on the state of the system at that moment. Do we have the right to postpone a computation that depends on the state of the system? Ideally, the program would have to work like this: open the file, close the file, call print, travel back to the past (when the file was still open), read its contents, return to the future, and display them on the screen.
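The two arguments above can be illustrated with a small sketch that involves no files at all. As far as I know, GHC builds hGetContents on top of unsafeInterleaveIO from System.IO.Unsafe, which postpones an IO action until its result is demanded. The names eagerRead and deferredRead are hypothetical, and an IORef stands in for “the state of the system”.

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- Eager version: the state is read at the point where the read is written.
eagerRead :: IO String
eagerRead = do
    ref <- newIORef "before"
    value <- readIORef ref
    writeIORef ref "after"
    return value

-- Deferred version: the read runs only when the result is demanded,
-- which happens after the state has already been changed.
deferredRead :: IO String
deferredRead = do
    ref <- newIORef "before"
    value <- unsafeInterleaveIO (readIORef ref)
    writeIORef ref "after"
    return value

main :: IO ()
main = do
    eagerRead >>= putStrLn     -- sees the state at the time of the read
    deferredRead >>= putStrLn  -- sees the state at the time of the demand
```

The two functions differ only in when the read is executed, yet the first yields "before" and the second "after". That is exactly the situation from the file example: a computation that depends on the state of the system was postponed, and the state changed in the meantime.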
Why am I writing all this? Please do not get me wrong. I love Haskell very much: for its purity, laziness, functional style, and much more that cannot be put into words. I just suddenly found out that it is not as “pure” as I had thought. I am left with a mixed feeling: it seems to be a good feature, yet somehow the language has become “dirty” because of it.
Maybe I am just being paranoid. I would like to hear your opinion: bug or feature?