Data.Text.Lazy is a nice data type: the code that manages text stays simple, and run-time text processing stays efficient, because text is loaded chunk by chunk from data streams. It is like a BufferedReader in Java.

But in GHC 8.0.1 some file reading functions do not behave correctly.
import Control.Exception (bracket)
import System.IO (IOMode (ReadMode), hClose, openFile, withFile)
import qualified Data.Text.Lazy.IO as LazyText
import qualified Data.Text.Lazy as LazyText

getFileContent1 :: FilePath -> IO String
getFileContent1 fileName = do
  fileContent <- LazyText.readFile fileName
  return $ LazyText.unpack fileContent

-- NOTE: prints the file content, reading it chunk by chunk from `fileName`
-- and writing it to `stdout` chunk by chunk.
-- So this simple code has a nice run-time behaviour.
printFileContent1 :: FilePath -> IO ()
printFileContent1 fileName = do
  c <- getFileContent1 fileName
  putStrLn c

-- NOTE: this seems a correct function,
-- but when executed it always returns an empty file content
getFileContent2 :: FilePath -> IO String
getFileContent2 fileName = do
  withFile fileName ReadMode $ \handle -> do
    fileContent <- LazyText.hGetContents handle
    return $ LazyText.unpack fileContent

-- NOTE: this code prints nothing, due to the error in `getFileContent2`
printFileContent2 :: FilePath -> IO ()
printFileContent2 fileName = do
  c <- getFileContent2 fileName
  putStrLn c
Data.Text.Lazy.IO.readFile is implemented in this way:

readFile :: FilePath -> IO Text
readFile name = openFile name ReadMode >>= hGetContents

Data.Text.Lazy.IO.hGetContents is a function returning the content of the handle chunk by chunk, and closing the handle when all the content is read.

System.IO.withFile is implemented in this way:

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFile name mode = bracket (openFile name mode) hClose
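The lazy closing behaviour of hGetContents described above can be observed with a small sketch. The helper name handleClosedBeforeAndAfter is hypothetical, and the sketch assumes the text package is available; it only asks the handle whether it is closed before and after the whole content has been consumed.

```haskell
import Control.Exception (evaluate)
import qualified Data.Text.Lazy as LazyText
import qualified Data.Text.Lazy.IO as LazyTextIO
import System.IO (IOMode (ReadMode), hIsClosed, openFile)

-- Hypothetical helper (not part of any library): reports whether the
-- handle is closed before and after the lazy content is fully consumed.
handleClosedBeforeAndAfter :: FilePath -> IO (Bool, Bool)
handleClosedBeforeAndAfter fileName = do
  handle <- openFile fileName ReadMode
  -- Returns immediately: no chunk has been read yet.
  content <- LazyTextIO.hGetContents handle
  closedBefore <- hIsClosed handle
  -- Computing the length demands every chunk, up to end of file.
  _ <- evaluate (LazyText.length content)
  closedAfter <- hIsClosed handle
  return (closedBefore, closedAfter)
```

On a non-empty file the first component should be False and the second True: nobody calls hClose explicitly, the handle is closed as a side effect of reaching the end of the content.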
so the getFileContent2 code can be expanded to:

getFileContent3 fileName = do
  bracket (openFile fileName ReadMode) hClose $ \handle -> do
    fileContent <- LazyText.hGetContents handle
    return $ LazyText.unpack fileContent
bracket is one of a family of resource management functions and monads used for acquiring resources and releasing them at the end of an action, in a predictable way, and not whenever the garbage collector arbitrarily decides to. bracket makes the management of scarce resources like file handles, database connections, and so on more robust and predictable.

This code will run correctly:
printFileContent3 fileName = do
  bracket (openFile fileName ReadMode) hClose $ \handle -> do
    fileContent <- LazyText.hGetContents handle
    putStrLn $ LazyText.unpack fileContent
because it will:
- open the file
- read it chunk by chunk, using hGetContents
- print it chunk by chunk, using putStrLn
- close the handle, thanks to the bracket resource finalization action
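The order of the steps above can be made visible with a minimal sketch that records what bracket does. The function traceBracket and the logged strings are made up for illustration; only bracket and the IORef functions come from base.

```haskell
import Control.Exception (bracket)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical demo: logs each phase of a bracket call in order.
traceBracket :: IO [String]
traceBracket = do
  logRef <- newIORef []
  let record msg = modifyIORef' logRef (msg :)
  result <-
    bracket
      (record "acquire" >> return "resource")          -- like openFile
      (\_ -> record "release")                         -- like hClose
      (\r -> record ("use " ++ r) >> return (length r)) -- the action
  record ("result " ++ show result)
  -- Messages were prepended, so reverse them into chronological order.
  reverse <$> readIORef logRef
```

Running traceBracket yields ["acquire","use resource","release","result 8"]: the release step always runs before the caller sees the result, which is fine as long as the result no longer needs the resource.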
printFileContent2 does not run correctly because:
- bracket opens the file
- a lazy evaluation thunk LazyText.unpack <$> hGetContents handle is returned from the getFileContent2 function
- bracket closes the file handle before the thunk is evaluated
- putStrLn in printFileContent2 executes the thunk
- the hGetContents thunk tries to access a closed handle
- hGetContents returns empty content, instead of signaling with a run-time exception that the handle is closed
Then printFileContent2 wrongly assumes that the file is empty, without any compile-time or run-time error.

A test case for the bug is at https://github.com/massimo-zaniboni/ghc_lazy_file_content_error and the bug was reported to the GHC team.
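The sequence of events behind the failure can be reproduced with a toy model that does not touch real files. Everything here is an assumption made for illustration: FakeHandle, lazyGetContents and withFakeFile are hypothetical stand-ins for Handle, hGetContents and withFile, and the model simply hard-codes the "closed handle yields empty content" behaviour described in this post.

```haskell
import Control.Exception (bracket, evaluate)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Toy stand-in for a file handle: True means open, False means closed.
type FakeHandle = IORef Bool

-- Toy stand-in for a lazy hGetContents: the read is deferred until the
-- result is demanded, and a closed handle silently yields empty content.
lazyGetContents :: FakeHandle -> IO String
lazyGetContents handle = unsafeInterleaveIO $ do
  isOpen <- readIORef handle
  return (if isOpen then "file content" else "")

-- Toy stand-in for withFile: open, run the action, always close.
withFakeFile :: (FakeHandle -> IO r) -> IO r
withFakeFile = bracket (newIORef True) (\handle -> writeIORef handle False)

-- Mirrors getFileContent2: the thunk escapes and is forced after the close.
readLazily :: IO String
readLazily = withFakeFile lazyGetContents

-- Forces the content while the handle is still open.
readStrictly :: IO String
readStrictly = withFakeFile $ \handle -> do
  content <- lazyGetContents handle
  _ <- evaluate (length content)
  return content
```

In this model readLazily produces "" and readStrictly produces "file content". The only difference between the two is the moment at which the deferred read is demanded.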
RAII Programming in Haskell
Resource Acquisition Is Initialization (RAII) is a technique for making resource usage predictable.

In Haskell, bracket should be RAII compliant. This implies that bracket must always return its result in a strict way. In this way, when the bracket action is called:
- the resources are allocated,
- the action is executed with maximum priority and predictability,
- the resources are deallocated,
- the result is returned to the caller, completely evaluated, and no further processing involving the resources is required.

This mechanism must also be used for nested bracket actions: the called actions must be executed in a strict way.
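A bracket that returns a completely evaluated result could look roughly like the sketch below. strictBracket and getFileContentStrict are hypothetical names, not functions from base, and the sketch assumes the deepseq and text packages are available. It uses force from Control.DeepSeq to evaluate the result fully before the release action runs.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (bracket, evaluate)
import qualified Data.Text.Lazy as LazyText
import qualified Data.Text.Lazy.IO as LazyTextIO
import System.IO (IOMode (ReadMode), hClose, openFile)

-- Hypothetical variant of bracket: the result of the action is fully
-- evaluated while the resource is still held, then the resource is freed.
strictBracket :: NFData b => IO a -> (a -> IO c) -> (a -> IO b) -> IO b
strictBracket acquire release use =
  bracket acquire release $ \resource -> do
    result <- use resource
    evaluate (force result)

-- Same shape as getFileContent2, but no unevaluated thunk can escape.
getFileContentStrict :: FilePath -> IO String
getFileContentStrict fileName =
  strictBracket (openFile fileName ReadMode) hClose $ \handle ->
    LazyText.unpack <$> LazyTextIO.hGetContents handle
```

The trade-off of this design is that the whole result is held in memory before it is returned, so the chunk by chunk streaming of the lazy version is given up in exchange for predictable resource usage.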
5 Whys
Why do we have the hGetContents error? Because hGetContents is buggy. Why? Because bracket, used inside withFile, does not behave correctly with unevaluated thunks. Why? Because unevaluated thunks do not play nicely with RAII semantics. Because bracket should force a strict evaluation of the result returned by the action, so that the resources are used completely and in a predictable way.

If bracket forces a strict evaluation of its result, then there will be no bug. This code runs correctly:

getFileContent4 :: FilePath -> IO String
getFileContent4 fileName = do
  fileContent <- LazyText.readFile fileName
  return $! LazyText.unpack fileContent
  -- NOTE: `$!` forces the result (to weak head normal form) before it is returned

-- NOTE: prints the file content, reading it chunk by chunk from `fileName`
-- and writing it to `stdout` chunk by chunk.
-- So this simple code has a nice run-time behaviour.
printFileContent4 fileName = do
  c <- getFileContent4 fileName
  putStrLn c