Higher order functions combinators are at the base of Haskell power, but they can make the code harder to comprehend, so some counter measure should be adopted.
Do-Notation vs Applicative-Notation
This post is a literate Haskell document, so we must start with some boilerplate code.
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ApplicativeDo #-}
{-# LANGUAGE Arrows #-}
{-# LANGUAGE DeriveDataTypeable #-}
module Main where
import Control.Applicative
import qualified Data.Attoparsec.Text.Lazy as A
import Data.Char
import Hakyll
main = do putStrLn "A main is mandatory, but not used in practice."
Attoparsec implements a domain specific language (DSL) for describing parsers. This is an example of function using Attoparsec, and the Applicative idiom.
-- | Parse an hexadecimal number like "%A0", and convert to the
-- the corresponding decimal number 10.
-- This function is written using the Applicative class style.
attoHex :: A.Parser Int
attoHex
= toInt <$> (A.char '%' *> hexDigit) <*> hexDigit
where
hexDigit :: A.Parser Int
hexDigit = (((-) (ord '0')) . ord) <$> A.choice [A.digit, A.satisfy (A.inClass "A-F")]
toInt x y = x * 16 + y
What is nice in this code? For sure that there are no low level details about the state of the parser, and the code is only a specification of what we want to parse.
What is not nice in this code? The fact that a novice user must learn some idiomatic use of the Applicative class: toInt <$> ... *> ... <*>
. Is this really necessary?
The same function, written using an ApplicativeDo
form is:
attoHex' :: A.Parser Int
attoHex' = do
A.char '%'
x <- hexDigit
y <- hexDigit
return $ x * 16 + y
where
hexDigit :: A.Parser Int
hexDigit = do
d <- A.choice [A.digit, A.satisfy (A.inClass "A-F")]
return $ (ord d) - (ord '0')
In this form the arguments of functions are explicit, and the code work-flow is a standard top-down processing. Also the internal function hexDigit
is a lot more readable, using explicit arguments, instead of the point-free form.
Higher Order Functions
filter :: (a -> Bool) -> [a] -> [a]
is an higher order function accepting another function as argument. Until there are simple functions like filter
, it is easy to figure out the final meaning of an expression. When we start combining more complex functions it can become difficult.
Applicative class introduce some non trivial higher order function combinators, generalizing a lot of computation patterns involving parallel computations inside a certain context, with a final composition described by a pure function.
We redefine Maybe
, for studying how Applicative instance is supported.
data NMaybe a = NJust a | NNothing
> class Functor f where
> fmap :: (a -> b) -> f a -> f b
This is the standard instance of Maybe
, but defined for NMaybe
.
instance Functor NMaybe where
fmap _ NNothing = NNothing
fmap f (NJust a) = NJust (f a)
Recap this:
> class Functor f => Applicative f where
> pure :: a -> f a
> (<*>) :: f (a -> b) -> f a -> f b
In this case the instance definition for NMaybe
became
instance Applicative NMaybe where
pure = NJust
pure = NJust
is a simple definition, but pure x = NJust x
is a little more explicit, because its parameters are explicits.
NJust f <*> m = fmap f m
This definition is not immediate to understand, because the types are implicit. Types are also a form of documentation, so we try to expand the type of each part of the expression, using an invented notation:
assuming Applicative f
<*> ::: ... -> NMaybe b
-- we are applying the <*> function, and its result is `NMaybe b`.
NJust ::: ... -> NMaybe (a -> b)
-- this is the first argument of `<*>` parent function,
-- with its explicit type.
f ::: a -> b
-- the argument of `NJust` parent function
-- This is a "terminal" argument because, it is not an application of a function,
-- and there are no `...` on its type annotation.
m ::: NMaybe a
-- this is the second argument.
-- Note that the type of the complete type of the parent
-- is derivable substituting to `...` the resulting types of its arguments,
-- so in this case `NMaybe (a -> b) -> NMaybe a -> NMaybe b`
=
fmap ::: ... -> NMaybe b
f ::: NMaybe (a -> b)
m ::: NMaybe a
-- the type is `NMaybe (a -> b) -> NMaybe a -> NMaybe b`
-- and it is the same of the left definition.
This form is too much verbose, but it contains all the types, and we can understand the meaning of the expression only studying the types.
Homomorphism Law
This is law must be respcted from every instance of Applicative
:
pure g <*> pure x = pure (g x)
The law expressed with explicit types is:
assuming Applicative f
<*> ::: ... -> f b
pure ::: ... -> f (a -> b)
g ::: a -> b
pure ::: ... -> f a
x ::: a
=
pure ::: ... -> f b
g ::: ... -> b
x ::: a
Using a pseudo ApplicativeDo notation
do x' <- pure x
return $ g x'
=
do g' <- pure $ g x
return g'
Identity
This law says:
pure id <*> v = v
With explicit types:
assuming Applicative f
(<*>) ::: ... -> f a
pure id ::: f (a -> a)
v ::: f a
=
v ::: f a
Using a pseudo ApplicativeDo:
do v' <- v
return $ id v'
=
v
Composition
This law says:
pure (.) <*> u <*> v <*> w = u <*> (v <*> w)
Disambiguating the associativity (<*> is left associative) the law is
((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
The function composition function (.)
is the classical
> (.) :: (b -> c) -> (a -> b) -> (a -> c)
> (.) f g = \a -> f (g a)
The form with explicit types is:
assuming Applicative f
(<*>) ::: ... -> f c
(<*>) ::: ... -> f (a -> c)
(<*>) ::: ... -> f ((a -> b) -> a -> c)
pure (.) ::: f ((b -> c) -> ((a -> b) -> a -> c))
u ::: f (b -> c)
v ::: f (a -> b)
w ::: f a
=
<*> ::: ... -> f c
u ::: f (b -> c)
<*> ::: ... -> f b
v ::: f (a -> b)
w ::: f a
The pseudo ApplicativeDo is:
do u' :: b -> c
u' <- u
v' :: a -> b
v' <- v
w' :: a
w' <- w
return $ (u' . v') w'
=
do
vw' :: b
vw' <- do w' :: a
w' <- w
v' :: a -> b
v' <- v
return $ v' w'
u' :: b -> c
u' <- u
return $ u' vw'
Lesson Learned
A point-free/applicative expression like
pure (.) <*> u <*> v <*> w = u <*> (v <*> w)
is hard to understand. Showing the intermediate types helps. Expressing it using a do-notation, with explicit names and types for each part of the expression, and with a top-down semantic helps further.
Nested Domain Specific Languages
This is a example of Hakyll code, for generating a static web site.
hakyllDemo :: IO ()
hakyllDemo = hakyllWith defaultConfiguration $ do
-- Dot images
match "images/*.dot" $ do
route $ setExtension "png"
compile $ getResourceLBS >>= traverse (unixFilterLBS "dot" ["-Tpng"])
hakyllWith
execute a Rules
Monad. Rules
is a DSL for generating pages, so apparently it is all very neat and elegant.
match
, route
, and compile
returns Rules
, so we have apparently only a series of statements of the Rules
monad.
Studying the code there is a little surprise: compile
accepts as parameter a Monad of type Compiler
, and then it returns a value of type Rules
. So the part after compile $
is a DSL written not in the Rules
DSL, but in the Compiler
DSL. But without studying the type of compiler
there are no hints of this.
It is the same for route
that accepts a description of routes using Routes
DSL, and then return a Rules
value, that can be embeded in the main hosting Rules
Monad.
Because every Monad is a different DSL with a different semantic, the Haskell syntax should indicate better when we are using a certain type of Monad. So the code can be rewritten in a pseudo Haskell notation like this:
hakyllDemo' :: IO ()
hakyllDemo' = hakyllWith defaultConfiguration (:Rules: do
match "images/*.dot" $ do
route (:Route: setExtension "png"))
compile (::Compile: getResourceLBS >>= traverse (unixFilterLBS "dot" ["-Tpng"]))
)
Code Golfing
Haskell permits many variants for the same expression:
applicativeCall :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall x y = pure (+) <*> x <*> y
applicativeCall' :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall' x y = (+) <$> x <*> y
applicativeCall'' :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall'' x y = liftA2 (+) x y
applicativeCall''' :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall''' x y = do
x' <- x
y' <- y
return $ x' + y'
In case of very small code fragments, the do
notation seems too much verbose respect a liftA2
variant. But in case of real code, with longer functions, maybe a more predictable do
notation should be preferred.
Conclusion
Programs written in Java are usually easy to read and comprehend “in the small”, because Java has a simple and coherent semantic. Maybe in the large they use a lot of complex patterns, and the general architecture of the application is not easy to understand, but in the small every fragment of code is readable.
On the contrary Haskell code can be very hard to comprehend in the small, because every Haskell expression can perform a lot of different things, depending from the context and the implicit types, so it must be studied carefully before understanding its real meaning.
Haskell should use a more uniform syntax and semantic, favouring:
- more explicit types
do
notation with explitic named parts, and clear top-down semantic, instead of clever function compositions- explicit indication of the Monad/Applicative in which the statements are executed
The ideal code should be readable from left to right, from top to bottom, without thinking too much at the low level details of the language semantic.