Thursday, September 29, 2016

Higher Harder Functions

Higher order functions combinators are at the base of Haskell power, but they can make the code harder to comprehend, so some counter measure should be adopted.

Do-Notation vs Applicative-Notation

This post is a literate Haskell document, so we must start with some boilerplate code.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ApplicativeDo #-}
{-# LANGUAGE Arrows             #-}
{-# LANGUAGE DeriveDataTypeable #-}
module Main where
import Control.Applicative
import qualified Data.Attoparsec.Text.Lazy as A
import Data.Char
import Hakyll
main = do putStrLn "A main is mandatory, but not used in practice."

Attoparsec implements a domain specific language (DSL) for describing parsers. This is an example of function using Attoparsec, and the Applicative idiom.

-- | Parse an hexadecimal number like "%A0", and convert to the
--   the corresponding decimal number 10.
--   This function is written using the Applicative class style.
attoHex :: A.Parser Int
attoHex
  = toInt <$> (A.char '%' *> hexDigit) <*> hexDigit
 where

  hexDigit :: A.Parser Int
  hexDigit = (((-) (ord '0')) . ord) <$> A.choice [A.digit, A.satisfy (A.inClass "A-F")]

  toInt x y = x * 16 + y

What is nice in this code? For sure that there are no low level details about the state of the parser, and the code is only a specification of what we want to parse.

What is not nice in this code? The fact that a novice user must learn some idiomatic use of the Applicative class: toInt <$> ... *> ... <*>. Is this really necessary?

The same function, written using an ApplicativeDo form is:

attoHex' :: A.Parser Int
attoHex' = do
  A.char '%'
  x <- hexDigit
  y <- hexDigit
  return $ x * 16 + y
 where

  hexDigit :: A.Parser Int
  hexDigit = do
    d <- A.choice [A.digit, A.satisfy (A.inClass "A-F")]
    return $ (ord d) - (ord '0')

In this form the arguments of functions are explicit, and the code work-flow is a standard top-down processing. Also the internal function hexDigit is a lot more readable, using explicit arguments, instead of the point-free form.

Higher Order Functions

filter :: (a -> Bool) -> [a] -> [a] is an higher order function accepting another function as argument. Until there are simple functions like filter, it is easy to figure out the final meaning of an expression. When we start combining more complex functions it can become difficult.

Applicative class introduce some non trivial higher order function combinators, generalizing a lot of computation patterns involving parallel computations inside a certain context, with a final composition described by a pure function.

We redefine Maybe, for studying how Applicative instance is supported.

data NMaybe a = NJust a | NNothing
> class  Functor f  where
>   fmap        :: (a -> b) -> f a -> f b

This is the standard instance of Maybe, but defined for NMaybe.

instance  Functor NMaybe  where
  fmap _ NNothing       = NNothing
  fmap f (NJust a)      = NJust (f a)

Recap this:

> class Functor f => Applicative f where
>   pure :: a -> f a
>   (<*>) :: f (a -> b) -> f a -> f b

In this case the instance definition for NMaybe became

instance Applicative NMaybe where
  pure = NJust

pure = NJust is a simple definition, but pure x = NJust x is a little more explicit, because its parameters are explicits.

  NJust f <*> m = fmap f m

This definition is not immediate to understand, because the types are implicit. Types are also a form of documentation, so we try to expand the type of each part of the expression, using an invented notation:

assuming Applicative f
  <*> ::: ... -> NMaybe b
  -- we are applying the <*> function, and its result is `NMaybe b`.

    NJust ::: ... -> NMaybe (a -> b)
    -- this is the first argument of `<*>` parent function,
    -- with its explicit type.

      f ::: a -> b
      -- the argument of `NJust` parent function
      -- This is a "terminal" argument because, it is not an application of a function,
      -- and there are no `...` on its type annotation.

    m ::: NMaybe a
    -- this is the second argument.
    -- Note that the type of the complete type of the parent
    -- is derivable substituting to `...` the resulting types of its arguments,
    -- so in this case `NMaybe (a -> b) -> NMaybe a -> NMaybe b`

  =

  fmap ::: ... -> NMaybe b
    f ::: NMaybe (a -> b)
    m ::: NMaybe a
  -- the type is `NMaybe (a -> b) -> NMaybe a -> NMaybe b`
  -- and it is the same of the left definition.

This form is too much verbose, but it contains all the types, and we can understand the meaning of the expression only studying the types.

Homomorphism Law

This is law must be respcted from every instance of Applicative:

pure g <*> pure x = pure (g x)

The law expressed with explicit types is:

assuming Applicative f
  <*> ::: ... -> f b
    pure ::: ... -> f (a -> b)
      g ::: a -> b
    pure ::: ... -> f a
      x ::: a
  =
  pure ::: ... -> f b
    g ::: ... -> b
    x ::: a

Using a pseudo ApplicativeDo notation

do x' <- pure x
   return $ g x'
=
do g' <- pure $ g x
   return g'

Identity

This law says:

pure id <*> v = v

With explicit types:

assuming Applicative f
  (<*>) ::: ... -> f a
    pure id ::: f (a -> a)
    v ::: f a
  =
  v ::: f a

Using a pseudo ApplicativeDo:

do v' <- v
   return $ id v'
=
v

Composition

This law says:

pure (.) <*> u <*> v <*> w = u <*> (v <*> w)

Disambiguating the associativity (<*> is left associative) the law is

((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)

The function composition function (.) is the classical

> (.) :: (b -> c) -> (a -> b) -> (a -> c)
> (.) f g = \a -> f (g a)

The form with explicit types is:

assuming Applicative f
  (<*>) ::: ... -> f c
    (<*>) ::: ... -> f (a -> c)
      (<*>) ::: ... -> f ((a -> b) -> a -> c)
        pure (.) ::: f ((b -> c) -> ((a -> b) -> a -> c))
        u ::: f (b -> c)
      v ::: f (a -> b)
    w ::: f a
  =
  <*> ::: ... -> f c
    u ::: f (b -> c)
    <*> ::: ... -> f b
      v ::: f (a -> b)
      w ::: f a

The pseudo ApplicativeDo is:

do u' :: b -> c
   u' <- u

   v' :: a -> b
   v' <- v

   w' :: a
   w' <- w

   return $ (u' . v') w'

=
do
   vw' :: b
   vw' <- do w' :: a
             w' <- w

             v' :: a -> b
             v' <- v

             return $ v' w'

   u' :: b -> c
   u' <- u

   return $ u' vw'

Lesson Learned

A point-free/applicative expression like

pure (.) <*> u <*> v <*> w = u <*> (v <*> w)

is hard to understand. Showing the intermediate types helps. Expressing it using a do-notation, with explicit names and types for each part of the expression, and with a top-down semantic helps further.

Nested Domain Specific Languages

This is a example of Hakyll code, for generating a static web site.

hakyllDemo :: IO ()
hakyllDemo = hakyllWith defaultConfiguration $ do
  -- Dot images
  match "images/*.dot" $ do
    route   $ setExtension "png"
    compile $ getResourceLBS >>= traverse (unixFilterLBS "dot" ["-Tpng"])

hakyllWith execute a Rules Monad. Rules is a DSL for generating pages, so apparently it is all very neat and elegant.

match, route, and compile returns Rules, so we have apparently only a series of statements of the Rules monad.

Studying the code there is a little surprise: compile accepts as parameter a Monad of type Compiler, and then it returns a value of type Rules. So the part after compile $ is a DSL written not in the Rules DSL, but in the Compiler DSL. But without studying the type of compiler there are no hints of this.

It is the same for route that accepts a description of routes using Routes DSL, and then return a Rules value, that can be embeded in the main hosting Rules Monad.

Because every Monad is a different DSL with a different semantic, the Haskell syntax should indicate better when we are using a certain type of Monad. So the code can be rewritten in a pseudo Haskell notation like this:

hakyllDemo' :: IO ()
hakyllDemo' = hakyllWith defaultConfiguration (:Rules: do
  match "images/*.dot" $ do
    route   (:Route: setExtension "png"))
    compile (::Compile: getResourceLBS >>= traverse (unixFilterLBS "dot" ["-Tpng"]))
)

Code Golfing

Haskell permits many variants for the same expression:

applicativeCall :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall x y = pure (+) <*> x <*> y

applicativeCall' :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall' x y = (+) <$> x <*> y

applicativeCall'' :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall'' x y = liftA2 (+) x y

applicativeCall''' :: NMaybe Int -> NMaybe Int -> NMaybe Int
applicativeCall''' x y = do
  x' <- x
  y' <- y
  return $ x' + y'

In case of very small code fragments, the do notation seems too much verbose respect a liftA2 variant. But in case of real code, with longer functions, maybe a more predictable do notation should be preferred.

Conclusion

Programs written in Java are usually easy to read and comprehend “in the small”, because Java has a simple and coherent semantic. Maybe in the large they use a lot of complex patterns, and the general architecture of the application is not easy to understand, but in the small every fragment of code is readable.

On the contrary Haskell code can be very hard to comprehend in the small, because every Haskell expression can perform a lot of different things, depending from the context and the implicit types, so it must be studied carefully before understanding its real meaning.

Haskell should use a more uniform syntax and semantic, favouring:

  • more explicit types
  • do notation with explitic named parts, and clear top-down semantic, instead of clever function compositions
  • explicit indication of the Monad/Applicative in which the statements are executed

The ideal code should be readable from left to right, from top to bottom, without thinking too much at the low level details of the language semantic.

No comments:

Post a Comment