Saturday, October 22, 2016

RAII Programming in Haskell

I'm using Haskell in production for processing data files, and Data.Text.Lazy is a nice data type, because you can have both simple code managing text, and efficient run-time text processing, because text is loaded chunk by chunk from data streams. It is like a BufferedReader in Java.
But in GHC 8.0.1 some file reading functions do not behave correctly.
import qualified Data.Text.Lazy.IO as LazyText
import qualified Data.Text.Lazy as LazyText

getFileContent1 :: FilePath -> IO String
getFileContent1 fileName = do
  fileContent <- LazyText.readFile fileName
  return $ LazyText.unpack fileContent

-- NOTE: print the file content, reading it chunk by chunk by `fileName`
-- and writing it on `stdout` chunk by chunk.
-- So this simple code, has a nice run-time behaviour.
printFileContent1 fileName = do
  c <- getFileContent1 fileName
  putStrLn c

-- NOTE: this seems a correct function, 
-- but when executed it returns always an empty file content
getFileContent2 :: FilePath -> IO String
getFileContent2 fileName = do
  LazyText.withFile fileName ReadMode $ \handle -> do
    fileContent <- hGetContents handle
    return $ LazyText.unpack fileContent
  
-- NOTE: this code print nothing, due to error on `getFileContent2`
printFileContent2 fileName = do
  c <- getFileContent2 fileName
  putStrLn c
Data.Text.Lazy.IO.readFile is implemented in this way:
readFile :: FilePath -> IO Text
readFile name = openFile name ReadMode >>= hGetContents
Data.Text.Lazy.IO.hGetContents is a function returning the content of the handle chunk by chunk, and closing the handle when all the content is read.
System.IO.withFile is implemented in this way:
 withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
 withFile name mode = bracket (openFile name mode) hClose
so the getFileContent2 code can be expanded to
 getFileContent3 fileName = do
    bracket (openFile fileName ReadMode) hClose $ \handle -> do
      fileContent <- hGetContents handle
      return $ LazyText.unpack fileContent
bracket is one of a series of resource managements functions and monads used for acquiring resources, and releasing them at the end of an action, in a predictable way, and not when the garbage collector arbitrarily decide it. bracket makes management of scarce resources like file handles, database connections, and so on more robust and predictable.
This code will run correctly
 printFileContent3 fileName = do
    bracket (openFile fileName ReadMode) hClose $ \handle -> do
      fileContent <- hGetContents handle
      putStrLn $ LazyText.unpack fileContent
because it will:
  • open the file
  • read it chunk by chunk, using hGetContents
  • print it chunk by chunk, using putStrLn
  • close the handle, thanks to bracket resource finalization action
The code printFileContent2 is not running correctly because:
  • bracket open the file
  • a lazy evaluation thunk LazyText.unpack <$> hGetContents handle is returned from the getFileContent2 function
  • bracket close the file handle before the thunk is evaluated
So from this point:

  • printFileContent2.putStrLn executed the thunk 
  • hGetContents thunk tries to access a closed handle 
  • hGetContents returns an empty content, instead of signaling with a run-time exception that the handle is closed

Then printFileContent2 assumes wrongly that the file is an empty file, without any compile time and run time error.
A test case for the bug is on https://github.com/massimo-zaniboni/ghc_lazy_file_content_error , and the bug was signaled to Ghc team.

RAII Programming in Haskell

Resource Acquisition is Initialization (RAII) is a tecnique for having predictable resource usages.
In Haskell, bracket should be RIIA compliant. This implies that bracket must always return the result in a strict way. In this way when the bracket action is called:
  • the resources are allocated,
  • the action is executed with maximum priority, and predictability,
  • the resources are deallocated,
  • the result is returned to the caller, completely evaluated, and no further processing involving the resources is required,
So the priority is always given to the called bracket action, and not to the caller action. The caller can be arbitrary complex, and long-running, so the called action had to complete its work with greater priority respect the caller, in order to release its allocated resources in a predictable way. In Haskel terms this signifies that when the caller decides it needs the result of the called code, it evaluates the called thunk completely, without intermediate unevaluated thunks. So called thunk evaluation is an atomic action, and it is not intervaled with the partial execution of the caller code.
This mechanism must be used also in case of nested bracket actions: the called actions must be executed in a strict way.

5 Whys

Why we have the hGetContents error? Because hGetContents is buggy. Why? Because bracket used inside withFile does not behave correctly with unevaluated thunks. Why? Because unevaluated thunks do not play nice with RIIA semantic. Because bracket should force a strict evaluation of the returned action, so the used resources are used completely and in a predictable way.
If bracket forces a strict evaluation of its result, then there will be no bug. This code run correctly

getFileContent4 :: FilePath -> IO String
getFileContent4 fileName = do
  fileContent <- LazyText.readFile fileName
  return $! LazyText.unpack fileContent
  -- NOTE: execute in a strict way, thanks to `$!`

-- NOTE: print the file content, reading it chunk by chunk by `fileName`
-- and writing it on `stdout` chunk by chunk.
-- So this simple code, has a nice run-time behaviour.
printFileContent4 fileName = do
  c <- getFileContent4 fileName
  putStrLn c

Friday, October 14, 2016

Idiomatic Haskell Applicative Code

Not Idiomatic Haskell Code

The many1 function code is the same of the Alternative.many function in the Haskell prelude. The scope of the function is applying zero, one or more times its applicative argument.

{-# LANGUAGE ApplicativeDo #-}
import Control.Applicative
class Alternative f => Alternative1 f where

   -- | Zero or more.
   many1 :: f a -> f [a]
   many1 v = many_v
     where
       many_v = some_v <|> pure []
       some_v = (fmap (:) v) <*> many_v

For me, the some_v inner function is hard to read/comprehend.

Idiomatic Haskell Code

We can rewrite some_v in this way

class Alternative f => Alternative2 f where

   -- | Zero or more.
   many2 :: f a -> f [a]
   many2 v = many_v
     where
       many_v = some_v <|> pure []
       some_v = (:) <$> v <*> many_v

some_v is rewritten using the Applicative idiomatic form:

  • we evaluate v and many_v using the specific f Applicative instance rules, and inside the applicative running context
  • we use the two results as arguments of the plain Haskell function (:)
  • we put the final result of (:) function in the f Functor

The majority of Applicative usage patterns follow this syntactic pattern, or its variation pure f <*> argument1 <*> argument2 <*> ... and after some accustomization, it can be comprehended intuitively without thinking too much to the type definitions, and to the details of involved functional combinators. This is the same for the code like case1 <|> case2, where we recognize immediately that <|> is separating two choices, and we understand the meaning without thinking too much to the underling details.

So we have improved the readability of the first version of the code using the common syntactic form (:) <$> ... <*> ..., instead of the less used fmap (:) <*> ... <*> ... form.

ApplicativeDo

This is a variant of many, but with ApplicativeDo syntax.

class Alternative f => Alternative3 f where

   -- | Zero or more.
   many3 :: f a -> f [a]
   many3 v = some_v <|> pure []

     where

       some_v = do
         v' <- v
         v'' <- (some_v <|> pure [])
         return (v':v'')

The ApplicativeDo notation helps understanding the semantic of some_v because all function arguments are explicit. But the code is a little more verbose respect the idiomatic Applicative pattern usage.

Functional Code

We can rewrite many using explicit types, and adding comments. The resulting code is too much verbose, and not intuitively readable, because there are too much low level details about types and functional combinators, hiding its true meaning.

class Alternative f => Alternative4 f where

   -- | Zero or more.
   many4 :: f a -> f [a]
   many4 v = many_v
     where
       -- TYPE: many_v :: f [a]
       many_v = some_v <|> pure []
       -- if `some_v` fails, return empty list,
       -- terminating `some_v/many_v` mutual recursion.

       -- TYPE: some_v :: f [a]
       some_v 
         = let -- TYPE: fa :: a -> ([a] -> [a])
               fa a = \as -> a:as
               -- `a` is the current result,
               -- to concatenate with next results,
               -- supplied in the [2] part.  

               -- TYPE: fm :: f ([a] -> [a])
               fm = fmap fa v
               -- create something nice to combine using
               -- `<*>` `Applicative` combinator,
               -- having as specific types:
               -- `(<*>) :: f ([a] -> [a]) -> f [a] -> f [a]`  
               -- `fmap :: ([a] -> [a]) -> f [a] -> f [a]

            in fm <*> many_v
               -- [2] part:
               -- concatenate the current result with the next results in `many_v`.
               -- This is a mutual recursion between `many_v` and `some_v`.
               -- At some point `many_v` will fail returning the empty list,
               -- and the recursion loop will stop.

Personal Conclusions

Haskell gives a lot of freedom in the way we can express things, but this freedom can harm code readability. Haskell code using Monads and Applicative instances, should follow few known and accepted idiomatic forms, because otherwise understanding the meaning of complex function combinations is not easy.

All complex function combinations in the end are only function calls, but human mind (at least my mind) does not reason easily with second order functions combined toghether in fancy ways. Instead we are able to recognize common syntactic forms like f <$> arg1 <*> arg2, associating immediately to them a semantic meaning, independently from the more complex low level details.