lj_chaource: My tutorial on free monads

My tutorial on free monads
Writing a "monad tutorial" is a first rite of passage for a functional programming novice. The novice has been trying to understand monads for a while and finally reached clarity. The rite of passage consists of writing down some ramblings to remind yourself how you achieved clarity - using what analogies and examples, and through what reasoning. These ramblings are to be published in a blog under the title "My monad tutorial".

A second rite of passage is to write an explanation of the "free monad".

At this point, I still have questions about the free monad; but these questions are more abstract and do not detract from a certain amount of clarity that I did reach regarding the free monad. Here goes the "tutorial".
(I heavily use the notation of my previous posts on category theory, and the notions of natural transformation.)

Remind myself what a "monadic functor" is:

Definition. A functor F is monadic if there exist natural transformations η: Id ~> F and μ: F∘F ~> F, and if the monoid laws hold for η as the "unit" and μ as the "multiplication".

A motivation for a "free monad" is that we have a functor F that is not a monad, and we would like to add some structure around F so that the new functor becomes a monad.

Definition (bad). For a functor F, a free F-monad is a monadic functor M_F such that there exists a natural transformation λ: F ~> M_F (the "lifting of F into the F-monad").

This definition is "bad" because it does not specify how the free F-monad should depend on F. Nothing prevents us from saying that the identity monad is the free F-monad for any F, according to this bad definition. But so far I don't have a clear idea about a better definition.

Main theorem. A free F-monad exists for any functor F. The functor M_F is given by the recursive formula

M_F(t) = t + F(M_F(t)) .

Proof. Assuming that F is a functor, we need to prove that the following natural transformations exist:

1) η: Id ~> M_F.
2) μ: M_F∘M_F ~> M_F.
3) λ: F ~> M_F.

We also need to prove that the following properties hold (the standard properties of a monadic functor):
4) (Monoidal right unit) If η is used to map M_F(t) to M_F(M_F(t)), then μ will bring this back to the initial value of type M_F(t). That is, μ ∘ η = Id on M_F.
5) (Monoidal left unit) Take a value of type M_F(M_F(t)), first use μ on it to map it to M_F(t), then use η on it to inject it back into M_F(M_F(t)). This should yield the same value of type M_F(M_F(t)). That is, η ∘ μ = Id on M_F(M_F(t)).
6) (Monoidal asociativity) Take a value of type M_F(M_F(M_F(t))) and map it to M_F(t) by using μ twice. The result is the same regardless of the order of applying the two instances of μ.

For 1). η(t) = t + 0, which is the left variant of M_F.

For 2). Since M_F is defined recursively, we consider the base of induction and the induction step separately. For clarity, we write the definition of M_F as

M_F(t) = t + F(M_F(t)),

where the boldface M_F(t) denotes the functor that we assume to exist by the inductive assumption.

Now, we need to construct a natural transformation
μ_t: M_F(M_F(t)) ~> M_F(t), which is the same as
μ_t: M_F(t) + F(M_F(M_F(t)) ~> M_F(t).

The first term is mapped identically; this is the base of induction.

The second term is mapped using the functor property of F: by inductive assumption, the functor M_F is such that we already have a natural transformation
μ_t: M_F(M_F(t)) ~> M_F(t).
It remains to lift this μ_t into the functor F and obtain

F(μ_t): F(M_F(M_F(t)) ~> F(M_F(t)) .

This last type is the same as 0 + F(M_F(t)), that is, the right variant of the sum t + F(M_F(t)) = M_F(t). Therefore, we have constructed the natural transformation μ as required.

For 3). We need a morphism
λ_t: F(t) ~> M_F(t) = t + F(M_F(t)).
We lift η_t: t -> M_F(t), which was constructed in item 1, into the functor F to obtain

F(η_t): F(t) -> F(M_F(t)).

This is the right variant of M_F(t).

For 4), 5), 6) we need to perform some straightforward calculations using induction and the definition of M_F.
Q.E.D.

What can we say about M_F if F is already a monad?
We can embed F as a subfunctor of M_F using λ, and it turns out we can also extract F back:

Statement. If F is a monadic functor, there exists a natural transformation ("projection") π: M_F ~> F, such that π∘λ = Id (the "identity" as a natural transformation acting on F).

Proof.
We use the definition of M_F. To construct π_t: M_F(t) ~> F(t), we have to map the left variant, t -> F(t), and the right variant,
F(M_F(t)) -> F(t).

For the left variant, the morphism t -> F(t) already exists since F is monadic (it is the η_(F) map for F).

For the right variant, the morphism F(M_F(t)) -> F(t) is constructed using the induction assumption that π: M_F ~> F already exists. Lifting this π into the functor F, we obtain
F(π_t): F(M_F(t)) -> F(F(t)). Now we use the μ_(F) map of F to convert the latter type into F(t).

We have denoted μ_(F) to distinguish it from μ_{(M_F)}. We used the similar notation for η_(F). In this notation, we can write the following property to be used below:
π∘η_{(M_F)} = η_(F). (This is true by construction of η_{(M_F)}.)

It remains to show that injecting F into M_F using λ and then projecting back onto F using π will yield the identity natural transformation on F.
By construction of λ (see item 3 in the proof of the "main theorem"), we have
λ(F) = 0 + F(η), where η_t: t -> M_F(t). If q is any term of type F(t) then λ_t(q) = 0 + F(η_t)q, which is a term of type M_F(t)).
Now, we need to apply π to this term and show that we obtain q again.

Since this term is the right variant in M_F(t)), we need to lift into F the projection π into F and apply to that term. When applied to terms q of type F(t), the transformation π∘λ is equal to

π∘λ = μ_(F)∘F(π)∘F(η_{(M_F)}) =
μ_(F)∘F(π∘η_{(M_F)}) = [ using the property shown above ]
μ_(F)∘F(η_(F)).
Now, since F is a monad, its operations μ_(F) and η_(F) must satisfy the right unit property
μ_(F)∘F(η_(F)) = μ∘F(η) = Id (identity natural transformation acting on F).
Therefore, π∘λ = Id as required.
Q.E.D.

Here is a somewhat nontrivial example of embedding a monadic functor F in its own free F-monad. Consider the functor F = List. It is defined as List(t) = 1 + t*List(t). Particular variants of List(t) are
1+0
0+t*(1+0) = t*1 = t
0+t*(0+t*(1+0)) = t*t*1 = t*t
etc.

The free List-monad is defined as

M_List t = t + (1 + M_List(t)*List(M_List(t))).
So, particular variants of M_List(t) are
t + 0 = t (pure value)
0 + (1 + 0) = 1 (empty list)
0 + (0 + (t+0)*(0 + (1+0)) (a list of length one containing a pure monadic value)
etc.
So, actually the free monadic values are equivalent to a number of nested lists containing some values of t or, perhaps, empty lists. The free List-monad is therefore much more complicated than a List.

Consider a variant t*t of type List t, or in a more familiar syntax, a two-element list [a,b], and let us inject it into the free List-monad and then project back.

To inject, we take the injection map η_{(M_F)}, which maps t -> t + 0, and lift it into the functor List. This will produce essentially [a+0, b+0] of type List[M_List(t)]. So

λ: [a,b] -> [a+0, b+0].

Now we would like to project this back into the List functor. We first use the inductive case for the projection map, which maps t+0 -> η_(F)(t). In the case of List, this produces a one-element list:

π: a+0 -> [a].

Then we lift this into the List functor, to obtain a map List[M_List(t)] -> List(List(t)):

F(π): [a+0,b+0] -> [ [a], [b] ].

Finally, we use the List monad's μ operation, which flattens the list and recovers [a,b], which is identical to the value we started with. This illustrates the property π∘λ = Id on F.

Note that F(η_(F)(t)) acting on [a,b] produces [[a],[b]], which is not the same as η_F(t) acting on [a,b] and producing [[a, b]]. Nevertheless, after applying the "flattening" operation μ, the results are the same.

End of example.

Now, all is well and good, except for two issues that I have not yet figured out.

1. The formula for M_F in the "main theorem" seems to fall out from the blue sky. I have not yet understood how to derive it from some other principle, - for example, from some "initial algebra" or something like that. The difficulty for me is that the free F-monad is specific to the functor F. If the "free monad" were an initial object in the category of monadic functors, I would expect to have a natural transformation from the free monad to any other monadic functor. However, the free F-monad is different for every F, and so there seems to be no universal "free monad" that can map to all possible monadic functors.

2. When F is already monadic, the free F-monad is different from F (and I expected, perhaps wrongly, that M_F should be isomorphic to F when F is already a monad). In fact, M_F is a much "larger" functor that includes all objects such as t + F(t), t+F(t+F(t)), etc., which cannot be isomorphic to F(t).

Here is how we can show that (at least some) monadic functors F are not isomorphic to their free F-monads.

A useful property of a monad is being "layered".

Definition. A functor F is pointed if there exists a natural transformation η: Id ~> F.

Definition. A pointed functor F is layered if there exists a natural transformation ζ: F(t) ~> 1 + t such that ζ∘η (t) = 0 + t. In other words, we can always say whether a value in F(t) is obtained by injecting a value of t, and if so, we can extract that value of t.

Not all monadic functors are layered. For example, the functor F(t) = 1 + t is obviously layered, but Θ(t) = (t -> 0) -> 0 is non-layered.

Statement: Θ(t) = (t -> 0) -> 0 is a monadic functor that is non-layered.
Proof. Θ(t) is a functor since t occurs in the covariant position. Note that the Curry-Howard image of Θ(t) is the double negation of the proposition t. Thus the term η_t : t -> (t->0) -> 0 can be constructed by modus ponens. Also, Θ(Θ(t)) is the quadruple negation, which can be converted to double negation. For Θ(t) to be layered, we need ζ_t: Θ(t) -> 1+t. Since we cannot extract t from the double negation of t, the only term of this type is Θ(t) -> 1. So, if we first inject t into Θ(t) and then apply ζ_t, we will never get t back. This violates the required property of the morphism ζ_t. So Θ(t) is not layered.

Statement: There exist some monadic functors F for which the free F-monad is not isomorphic to F.
Proof. It is clear that the free F-monad is always layered for any F: the definition of M_F is of the form (t + ...), where the term t is obtained precisely as the image of the injection η.

At the same time, we know that there exist some monadic functors F that are non-layered. If M_F were isomorphic to F, they would be both layered or both non-layered. Therefore, for any non-layered monadic functor F, the free F-monad is not isomorphic to F.
Q.E.D.