Hackle's blog
between the abstractions we want and the abstractions we get.

# One recursion for all! Catamorphism step by step

One of the thrills of learning Haskell, is how something can come up out of the blue to completely invalidate my existing knowledge.

For example, I used to feel pretty good about being able to use the likes of map, filter, find and reverse fluently, until I found out all these list functions can be implemented using reduce alone.

And it happened to me again a few days ago, this time about recursion.

## Look mum, I can do recursion!

A number of Haskell tutorials would show us how to write recursion data types, such as this simple Tree.

``````data Tree a = Leaf a | Node (Tree a) (Tree a)
``````

This makes for straightforward pattern matching.

``````showTree :: Show a => Tree a -> String
showTree (Leaf a) = show a
showTree (Node l r) = showTree l ++ " " ++ showTree r

-- cabal ghci
ghci> myTree = Node (Node (Leaf 1) (Leaf 2)) (Node (Leaf 3) (Leaf 4))
ghci> showTree myTree
"1 2 3 4"
``````

So intuitive! And we usually feel pretty good about ourselves at this moment. I remember being a bit drunk with power then :)

As is with real life or Haskell alike, when such feelings hit, quite likely it's NOT truly a time for self-congratulation, no people! Much, much better! It usually means a new landscape is opening up, and curse if we linger on the drunkenness!

## A curious Recursive type

The brilliant people out there tell us: manual recursion is fun for a while, but quickly gets boring. So they devised this one-for-all recursive type,

``````data Rec f = R { unR :: f (Rec f) }
``````

It's a pretty curious type isn't it, being infinitely recursive? Notice it needs an `f` that accepts a type parameter itself (or, `f` is of kind `* -> *`).

Suppose we try to construct a value for `Rec`? I happen to know something of kind `* -> *`.

``````ghci> :t R (Just (R Nothing))
R (Just (R Nothing)) :: Rec Maybe
``````

No problem, a value is constructed, but it's not very meaningful, `R (Just (R Nothing))` hardly contains any data. Should we try kind `* -> * -> *`?

``````ghci> :t R (Left "oh!")
R (Left "oh!") :: Rec (Either String)
``````

This is more meaningful - at least there is an `"oh"` in the data. Supposed we try `Right`?

``````ghci> :t R (Right "oh!")

<interactive>:1:10: error:
• Couldn't match type: [Char]
with: Rec (Either a)
Expected: Rec (Either a)
Actual: String

ghci> :t R (Right (R (Left "oh!")))
R (Right (R (Left "oh!"))) :: Rec (Either String)
``````

`Right String` won't work because `Rec` must be recursive, and `String` does not satisfy the expected `Rec (Either a)`, but we manage to get a valid `R (Right (R (Left "oh!")))` that checks all the boxes.

I don't want to alarm you, but we have effectively made two types `Maybe` and `Either` recursive with this strange `Rec` type! These types are not really known for being recursive.

Let that sink in...

## Back to Tree

Knowing how `Rec` works, let's doctor the `Tree` type a little so it accepts two type parameters.

``````data TreeR a r = LeafR a | NodeR r r deriving (Show)
``````

Note `TreeR` is not recursive by definition, and it allows values like `NodeR 2 2`.

It's obviously a `Functor`, so let's implement it,

``````instance Functor (TreeR a) where
fmap f (LeafR a) = LeafR a
fmap f (NodeR l r) = NodeR (f l) (f r)

ghci> fmap (+ 1) (LeafR 1)
LeafR 1
ghci> fmap (+ 1) (NodeR 1 2)
NodeR 2 3
``````

Compared to the original `Tree`, `TreeR` is strange as it maps over `NodeR` but leaves `LeafR` untouched, for being functorial only in its second parameter `r`, not the first one `a`.

But how do we make `TreeR` recursive? The name `r` should have given it away.

## Recursive TreeR

The combination of `TreeR` and `Rec` gives rise to a recursive `TreeR`,

``````myTreeR :: Rec (TreeR Int)
myTreeR = R (NodeR
(R (NodeR (R (LeafR 1)) (R (LeafR 2))))
(R (NodeR (R (LeafR 3)) (R (LeafR 4)))))
``````

Can you recognise `r` in `TreeR a r`? It's `Rec (TreeR Int)` itself! Which can be expanded to `Rec (TreeR Int (Rec (TreeR Int (Rec ...))))`, Cool!

Now supposed we need to show all the elements of `myTreeR` in order. It's not a straightforward reduce, map or fold, so let's write our own.

``````showTreeR :: Rec (TreeR Int) -> String
showTreeR (R (LeafR a)) = show a
showTreeR (R (NodeR l r)) = showTreeR l ++ " " ++ showTreeR r

ghci> showTreeR myTreeR
"1 2 3 4"
``````

Did you notice how `LeafR` and `NodeR` are handled differently? A `LeafR` has an `Int` inside, so we call `show a`; but `NodeR` would have its branches built as `String`s via recursion. However, they do converge on `String`. This is important.

"Not so bad", but didn't we say the point of `Rec` was to reduce manual recursion? We are back at ground zero. What's all the fuss about?!

Taking a closer look, I do notice `showTreeR` is suspiciously similar to the `Functor` implementation. Sure we can use `fmap`? Throwing in a recursive call to `showTreeR2` itself, we get,

``````showTreeR2 :: Rec (TreeR Int) -> String
showTreeR2 (R r) = case fmap showTreeR2 r of
LeafR a -> show a
NodeR l r -> l ++ " " ++ r

ghci> showTreeR2 myTreeR
"1 2 3 4"
``````

That's truly interesting. The recursion is moved to `fmap showTreeR2 r`, and the case-split does not use recursion at all. Cool again! What's next?

## Lifting

We can make a more useful version `showTreeR3` by lifting the conversion from `TreeR` to `String` to a function `showTreeFlat`.

``````showTreeR3 :: (TreeR Int String -> String) -> Rec (TreeR Int) -> String
showTreeR3 f (R r) = let inner = fmap (showTreeR3 f) r in f inner

showTreeFlat :: TreeR Int String -> String
showTreeFlat (LeafR a) = show a
showTreeFlat (NodeR l r) = l ++ " " ++ r

ghci> showTreeR3 showTreeFlat myTreeR
"1 2 3 4"
``````

The recursion is isolated in `showTreeR3`, which can be used for different algorithms to show a `TreeR`; more importantly, `showTreeFlat` is completely free of recursion. We are getting there!

Note how we arrive at `TreeR Int String -> String` where the `String` is lined up ("converge") with the eventual return type, but not `Int`? If we recall the definition of `data TreeR a r = LeafR a | NodeR r r`, a function of type `TreeR a r -> r` implies it encapsulates `a -> r`, as `show` in `showTreeFlat`. Pleasant hindsight.

## "Genericise"

Can we still make `showTreeR3` more generic? Sure thing. Without changing the implementation, we can add a requirement for `Functor` in order to use `fmap`, and `String` can be made generic,

``````showTreeR4 :: Functor f => (f a -> a) -> Rec f -> a
showTreeR4 f (R r) = let inner = fmap (showTreeR4 f) r in f inner

ghci> showTreeR4 showTreeFlat myTreeR
"1 2 3 4"
``````

This is great, but also very vague!

Fair to say it is pretty high-concept, in particular `Rec f -> a` strongly implies `Rec f` encapsulates some `a`, but it's not visible in the type of `Rec f` at all!

This of course is hinted at with `f a -> a`, but this itself is cryptic, as `Rec f` expands to `f (Rec f (Rec f))`, there is no hint of `a`.

A workable intuition for me: `Rec f` has some data (type unknown) that can be converted to `a`; utilising `f a -> a` we can recursively extract `a` and roll it up into one single `a` for return. (This smells strongly of Monoid but note how it's not necessarily so.)

Of course we realise `showTreeR4` needs to be renamed to fit its now generic purpose. Indeed, it is well-known as `cata`, short for `catamorphism` - collapsing a structure into a single value.

`cata` works for any recursive type, for example, `Either`,

``````resultR :: Rec (Either Bool)
resultR = R (Right (R (Right (R (Left True)))))

exclaim :: Either Bool String -> String
exclaim (Left l) = "it's " ++ show l
exclaim (Right r) = r ++ "!"

ghci> cata exclaim resultR
"it's True!!"
``````

Pretty amazing stuff.

There are other functions that work on recursive data structures such as `ana`, `apo`, `para`, `zygo`... This family of functions are otherwise referred to as, behold, "recursion schemes".

No doubt, recursion schemes do take away the fun of writing recursion manually. But it more than makes up for it, by revealing a fantastic landscape!

## References

This post is basically my study notes for this classic paper Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire and this amazing series by Patrick Thomson, who shared much much more information about recursion schemes.

The `Rec` type is a synonym to the famous `Fix` type. In application it can be abstracted away from the users by utilising generic programming, such as `makeBaseFunctor` in the recursion-schemes library by Edward A. Kmett and co. Of course, this is only possible because the user-provided functions do not need to know about `Rec`. Quite the good abstraction!