`Data.Text.Lazy` is a nice data type, because it allows both simple text-handling code and efficient run-time text processing: text is loaded chunk by chunk from data streams. It is like a
But in GHC 8.0.1 some file reading functions do not behave correctly.
    import qualified Data.Text.Lazy.IO as LazyText
    import qualified Data.Text.Lazy as LazyText
    import System.IO (IOMode (ReadMode), withFile)

    getFileContent1 :: FilePath -> IO String
    getFileContent1 fileName = do
      fileContent <- LazyText.readFile fileName
      return $ LazyText.unpack fileContent

    -- NOTE: print the file content, reading it chunk by chunk from `fileName`
    -- and writing it to `stdout` chunk by chunk.
    -- So this simple code has a nice run-time behaviour.
    printFileContent1 fileName = do
      c <- getFileContent1 fileName
      putStrLn c

    -- NOTE: this seems a correct function,
    -- but when executed it always returns an empty file content
    getFileContent2 :: FilePath -> IO String
    getFileContent2 fileName = do
      withFile fileName ReadMode $ \handle -> do
        fileContent <- LazyText.hGetContents handle
        return $ LazyText.unpack fileContent

    -- NOTE: this code prints nothing, due to the error in `getFileContent2`
    printFileContent2 fileName = do
      c <- getFileContent2 fileName
      putStrLn c
`Data.Text.Lazy.IO.readFile` is implemented in this way:

    readFile :: FilePath -> IO Text
    readFile name = openFile name ReadMode >>= hGetContents

`Data.Text.Lazy.IO.hGetContents` is a function returning the content of the handle chunk by chunk, and closing the handle when all the content is read.
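The chunked structure described above can be made visible with a small sketch. The helper name `chunkLengths` is hypothetical and not part of the original code; it assumes the `text` package and only uses `hGetContents` and `toChunks` to list the size of each chunk as it is read.

```haskell
import qualified Data.Text as StrictText
import qualified Data.Text.Lazy as LazyText
import qualified Data.Text.Lazy.IO as LazyText
import System.IO (IOMode (ReadMode), openFile)

-- | Hypothetical helper: the length of every chunk of a lazily read file.
-- The returned list is itself lazy: each chunk is read from the handle
-- only when the corresponding list element is demanded.
chunkLengths :: FilePath -> IO [Int]
chunkLengths fileName = do
  handle <- openFile fileName ReadMode
  content <- LazyText.hGetContents handle
  -- A lazy Text is a lazy list of strict Text chunks.
  return $ map StrictText.length (LazyText.toChunks content)
```

Summing the returned list forces every chunk, so the whole file is read and the handle is then closed by `hGetContents` itself.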
`System.IO.withFile` is implemented in this way:

    withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
    withFile name mode = bracket (openFile name mode) hClose
The code of `getFileContent2` can be expanded to:

    getFileContent3 fileName = do
      bracket (openFile fileName ReadMode) hClose $ \handle -> do
        fileContent <- LazyText.hGetContents handle
        return $ LazyText.unpack fileContent
`bracket` is one of a family of resource-management functions and monads that acquire resources and release them at the end of an action in a predictable way, rather than whenever the garbage collector arbitrarily decides to.
`bracket` makes the management of scarce resources such as file handles and database connections more robust and predictable.
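As an illustration of this acquire/use/release discipline, here is a minimal sketch that records the order in which `bracket` runs its three arguments. The function `bracketOrder` and the fake integer "resource" are hypothetical, introduced only for this example.

```haskell
import Control.Exception (bracket)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- | Hypothetical demo: log each phase of a `bracket` call in order.
bracketOrder :: IO [String]
bracketOrder = do
  logRef <- newIORef []
  let record msg = modifyIORef logRef (msg :)
  result <-
    bracket
      (record "acquire" >> return (42 :: Int))              -- acquire the resource
      (\_ -> record "release")                              -- always run at the end
      (\resource -> record "use" >> return (resource + 1))  -- action using it
  record ("result " ++ show result)
  -- Messages were prepended, so reverse them into chronological order.
  reverse <$> readIORef logRef
```

Running `bracketOrder` yields `["acquire","use","release","result 43"]`: the release step runs before the caller ever sees the result.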
This code will run correctly:

    printFileContent3 fileName = do
      bracket (openFile fileName ReadMode) hClose $ \handle -> do
        fileContent <- LazyText.hGetContents handle
        putStrLn $ LazyText.unpack fileContent

because it will:
- open the file
- read it chunk by chunk, using `hGetContents`
- print it chunk by chunk, using `putStrLn`
- close the handle, thanks to the `bracket` resource finalization action
`printFileContent2` does not run correctly because:
- `bracket` opens the file
- a lazy evaluation thunk `LazyText.unpack <$> hGetContents handle` is returned from the action run inside `bracket`
- `bracket` closes the file handle before the thunk is evaluated
- `putStrLn` in `printFileContent2` evaluates the thunk
- the `hGetContents` thunk tries to access a closed handle
- `hGetContents` returns an empty content, instead of signaling with a run-time exception that the handle is closed
- `printFileContent2` wrongly assumes that the file is empty, without any compile-time or run-time error.
A test case for the bug is at https://github.com/massimo-zaniboni/ghc_lazy_file_content_error, and the bug was reported to the GHC team.
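One possible workaround for the failure described above, sketched here under the assumption that loading the whole file into memory is acceptable, is to read with the strict `Data.Text.IO.hGetContents` inside `withFile`. The name `getFileContentStrict` is hypothetical. The strict read finishes before `withFile` closes the handle, so the returned `String` depends only on data already in memory.

```haskell
import qualified Data.Text as StrictText
import qualified Data.Text.IO as StrictText
import System.IO (IOMode (ReadMode), withFile)

-- | Hypothetical variant of `getFileContent2` using strict Text.
getFileContentStrict :: FilePath -> IO String
getFileContentStrict fileName =
  withFile fileName ReadMode $ \handle -> do
    -- The strict hGetContents reads the entire file before returning,
    -- so nothing touches the handle after `withFile` has closed it.
    fileContent <- StrictText.hGetContents handle
    return $ StrictText.unpack fileContent
```

The trade-off is that the chunk-by-chunk streaming behaviour of the lazy version is lost.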
RAII Programming in Haskell

Resource Acquisition Is Initialization (RAII) is a technique for making resource usage predictable.
`bracket` should be RAII compliant. This implies that `bracket` must always return the result in a strict way. In this way, when the bracket action is called:
- the resources are allocated,
- the action is executed with maximum priority, and predictability,
- the resources are deallocated,
- the result is returned to the caller, completely evaluated, and no further processing involving the resources is required.
This mechanism must also be used for nested bracket actions: the called actions must be executed in a strict way.
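The strict behaviour described above can be approximated in user code. The following is only a sketch, not an existing library function: `strictBracket` and `getFileContentForced` are hypothetical names, and the sketch assumes the `deepseq` package for `force`. The result is fully evaluated while the resource is still held.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (bracket, evaluate)
import qualified Data.Text.Lazy as LazyText
import qualified Data.Text.Lazy.IO as LazyText
import System.IO (IOMode (ReadMode), hClose, openFile)

-- | Hypothetical wrapper: like `bracket`, but the result of the action is
-- evaluated to normal form before the resource is released.
strictBracket :: NFData r => IO a -> (a -> IO b) -> (a -> IO r) -> IO r
strictBracket acquire release action =
  bracket acquire release $ \resource -> do
    result <- action resource
    evaluate (force result)

-- | Same body as `getFileContent3`, but with the strict wrapper.
getFileContentForced :: FilePath -> IO String
getFileContentForced fileName =
  strictBracket (openFile fileName ReadMode) hClose $ \handle -> do
    fileContent <- LazyText.hGetContents handle
    return $ LazyText.unpack fileContent
```

Forcing the whole `String` reads every chunk while the handle is still open, so the caller receives the complete content, at the cost of holding it all in memory.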
5 Whys

Why do we have the bug? Because `hGetContents` is buggy. Why? Because `withFile` does not behave correctly with unevaluated thunks. Why? Because unevaluated thunks do not play nicely with RAII semantics. Why? Because `bracket` should force a strict evaluation of the returned action, so that the acquired resources are used completely and in a predictable way.

If `bracket` forces a strict evaluation of its result, then there will be no bug. This code runs correctly:
    getFileContent4 :: FilePath -> IO String
    getFileContent4 fileName = do
      fileContent <- LazyText.readFile fileName
      -- NOTE: `$!` evaluates the result (to weak head normal form)
      -- before returning it
      return $! LazyText.unpack fileContent

    -- NOTE: print the file content, reading it chunk by chunk from `fileName`
    -- and writing it to `stdout` chunk by chunk.
    -- So this simple code has a nice run-time behaviour.
    printFileContent4 fileName = do
      c <- getFileContent4 fileName
      putStrLn c