lj_chaource: Free functors, free applicatives, free monads III

Free functors, free applicatives, free monads III
Part I
Part II

Studying the properties of the "free functor" reveals three basic constructions:

1. "Universal runner". For a type function F (which may fail to be a functor), the free F-functor is the type function Φ(F) with the following properties:

- Φ(F) is a functor

- There is an injection mapping ∀t. F t -> Φ(F)t (this mapping is not a natural transformation because F is not required to be a functor!)

- For any functor G such that we have ∀t. F t -> G t (similarly, this mapping is not a natural transformation), we can produce a natural transformation Φ(F)t ~> G t compatible with the injection mapping, so that the following diagram commutes:

F t -> Φ(F)t ~> G t
 \_____________↗

The last property is the "universal property" of the free F-functor, as such properties are typically called in category theory.

In applications to functional programming, the natural transformation Φ(F)t ~> G t is understood as an "interpreter" or "runner" for the free F-functor. The purpose of the "runner" is to "run a computation" on a free functor value and convert it into a value of a specific functor G, which could be, say, a data structure containing the interesting results of some computation.

The "runner" must be "universal" in the sense that it will work in the same way for any functor G and any transformation F t -> G t. The "universal runner" will produce Φ(F)t ~> G t by going over the structure of Φ(F), which it knows about, but without using any knowledge of the specific structure of G (beyond assuming that G is a functor) or of the mapping F t -> G t. The programmer will write the code of the "universal runner" only once and will be able to use it with any G and any mapping F t -> G t.

The commutative diagram expresses the requirement that the interpretation of a single "F t" must give the same result of type "G t" when we first inject "F t" into Φ(F)t and then interpret using the "universal runner". Without the commutative diagram requirement, nothing prevents us from cheating and saying that Φ(F) should be always equal to, say, the identity functor Id, for every F. Indeed, Id is a functor, and we could pretend that we have built Id by taking F and "adding structure to it". The commutative diagram is the rigorous formulation of the informal English description that Φ(F) is in fact "adding some structure to F". A value of Φ(F)t must contain both F t and some additional structure, and "running" Φ(F)t should be able to reveal the "F t" it contains.

2. "Properties for free." The definition of a functor requires only to have the mapping

fmap : ∀t.∀a. Φ a -> (a -> t) -> Φ t.

Additionally, we need a morphism that injects or "wraps" values of F t into Φ t:

wrap : ∀t. F t -> Φ t.

We now notice that both of these required mappings are of the form

"...something..." -> Φ t.

So we can rewrite them in the "Haskell long form" (i.e. as a representable functor) and obtain the type definition of the free F-functor Φ:

data Φ t where -- this is Haskell for the English phrase "Φ t is the representing object for Ffmap * Wrap, where..."
Ffmap :: ∀a. Φ a -> (a -> t) -> Φ t
Wrap :: F t -> Φ t

In the short notation, this is equivalent to the recursive definition

Φ t = F t + ∃a. Φ a * (a -> t)

To fully "implement" the free functor, we need to define some mappings. First, we need to define "fmap":

fmap : Φ t -> (t -> s) -> Φ s

This is simply the first type constructor, Ffmap, applied to the values (Φ t) and (t -> s).

The injection mapping is the second constructor ("Wrap").

Note that we got these two mappings "for free", simply because we defined the type constructors to give us the required types directly. Of course, even "free" things are not without costs. The cost here is the fact that the constructors perform no computations at all - not even a single function composition is computed when we apply several "fmap"s in a row to a value of Φ t. The burden of computation is put entirely on the "universal runner".

Indeed, here is what happens when we first construct a value of type Φ(F) using "Wrap" on a value of "F a" and then apply two "fmap"s to it, with functions of types a->b and b->c:

First value: Wrap (F a)
After the first fmap: Ffmap ( Wrap (F a) * (a -> b))
After the second fmap: Ffmap ( Ffmap (Wrap (F a) * (a -> b)) * (b -> c)).

The free functor contains a nested data structure that merely holds all the data that has been given to it but performs no computations at all with that data. This data structure can be more easily visualized as a heterogeneous list of the form (F a, a->b, b->c, c->d, ...) Until a "runner" is applied, the free functor merely stores some value of "F a" and various functions a->b, b->c, etc., without ever composing these functions or applying them to any specific values.

It remains to implement the "universal runner". For a given mapping ∀t. F t -> G t, the required natural transformation Φ t ~> G t is constructed by mapping the recursive case Φ a ~> G a and then using the fact that G is a functor:

F t + Φ a * (a -> t) ~> G t + G a * (a -> t) ~> G t

The required diagram commutes by construction: injecting F t into Φ creates the "Wrap" variant, which gets projected immediately onto G without any transformations.

3. "Performance optimization." This rather straightforward definition of the functor Φ has the advantage of being manifestly correct, but also has the drawback that all calculations are deferred until the free functor is given to a "runner". To be sure, we can create a value of Φ t and apply several "fmaps" to it, but the free functor will merely accumulate the given data in a nested data structure. The number of nesting layers is equal to the number of "fmaps" applied to the free F-functor. The "runner" will need to unwrap all these layers, performing a lot of bookkeeping and trivial calculations, which will take O(n) steps, where n is the number of "fmaps" applied.

The definition of the free F-functor can be "optimized" by the trick of replacing the recursive occurrence "∀a. Φ a" by "∀a. F a" (making the type non-recursive!).

Φ t = F t + ∃a. F a * (a -> t)

With this implementation, the value of "F a" stays unchanged while the functions a->t, t->s, etc., get composed when several "fmap"s are applied. This allows us to perform some evaluations inside a value of Φ t before it is ever "run". Thus, we can optimize the performance of the "universal runner". There are no more nested structures; the runner works in O(1) time.

Since the "Wrap" constructor remains explicit, we can also determine (prior to "running" the free functor) whether a given value of Φ t is a "pure wrap", i.e. it has been produced by wrapping a value of "F a" without any "fmap" ever applied to that value. In this case, the "universal runner" can avoid computing any function compositions when it encounters such a "pure wrap" value. This is possibly another optimization for the "universal runner". It comes at the cost of making the "universal runner" more complicated, since it now needs to distinguish the two constructors of Φ. If this is not desired or does not produce any appreciable gain, the "Wrap" constructor can be omitted from the definition of Φ:

Φ t = ∃a. F a * (a -> t)

Now the function wrap : F t -> Φ t will need to be implemented separately. It will inject a value of "F t" and an identity map t->t into Φ t = ∃a. F a * (a -> t) where we just need to set a = t.

It should be noted that the different "levels of optimization" of Φ will produce slightly inequivalent data types. None of these types are "perfect" from the theoretical point of view. For instance, Φ t = F t + ∃a. A a * (a -> t) and Φ t = F t + ∃a. F a * (a -> t) do not satisfy the functor law

fmap id = id

because executing an fmap with an identity will convert a left variant "F t + 0" into a right variant "0 + ∃a. A a * (a -> t)", which is not an identity transformation. At the same time, Φ t = ∃a. A a * (a -> t) does not satisfy the "universal property" because it will fail to commute with a mapping from F t to the free F-functor G t = F t + ∃a. A a * (a -> t). (Injecting F t into Gt should produce the left variant F t + 0, but ∃a. A a * (a -> t) will never produce the left variant.) However, these violations of laws are not observable once we start "running" the free functors and mapping them into some non-free functors.

Exercise: Suppose we have a free functor construction Φ(F)t. Define another construction,

G t = F t + Φ(F)t.

Show that G t does not satisfy the functor law of identity, but would be otherwise a functor. Show that G t also has some of the properties of the free functor: "fmap", "wrap", and the "universal runner" can be defined.

If more optimization for the "universal runner" is desired, we can add more constructors to the definition of Φ. For example, a new constructor could denote the case when a value of Φ t is a "wrap-and-fmap" (wrapped once and fmapped once), etc. Thus there are infinitely many different data types that implement a free F-functor. They are all equivalent in terms of their goal, but offer different performance trade-offs.

We have now thorougly analyzed the free functor using the three reasoning steps ("Universal runner", "Properties for free", and "Performance optimization").
We can now apply the same reasoning steps to the constructions of free applicative functors and free monads.

Free applicatives

A functor A is applicative if there exist two natural transformations:

pure :: ∀t. t ~> A t
distr :: ∀s.∀t. A s * A t ~> A (s*t)

Let us use the constructions 1.-3. towards "deriving" a definition of a "free F-applicative" for a functor F (which is not necessarily already applicative).

The first step is to put the required properties into a representable functor. Now it is not straightforward to express the second property ("distr" stands for "distributivity") in terms of a representable functor, because that property does not have the form "...something..." -> F t. The trick used for this purpose is to replace the type "t" by a function type "s->t", and to notice that the natural transformation
F s * F (s->t) ~> F (s*(s->t))
is equivalent to what is called "ap",
ap :: ∀s.∀t. F s * F (s->t) ~> F t
via modus ponens: s*(s->t) ~> t. (The reverse direction of this equivalence is found by setting s = a and t = a*b, with the natural morphism b ~> (a->a*b), thus obtaining F a * F b -> F a * F(a -> a*b) -> F(a*b).)

Now both natural transformations are in the form "...something..." ~> F t, and we can rewrite them in the form of a representable functor, adding the obligatory "Wrap":

data A t where
Pure :: t -> A t
Ap :: A a * A (a->t) -> A t
Wrap :: F t -> A t

Converting this to the short notation, we get

A t = t + F t + ∃a. A a * A (a->t)

Indeed, all the required properties are directly satisfied by the type constructors. (It remains to define the "universal runner", which we leave as a straightforward exercise.)

Note in passim that the free F-applicative is automatically F-modular:
A x * F y ~> A (x*y).

A more optimized form of the free F-applicative is obtained by replacing the first recursive occurrence of "A a" by "F a" and eliminating the "Wrap" constructor:

A t = t + ∃a. F a * A (a->t).

Indeed, this appears to be the representable translation of a "free F-modular" functor, that is, the functor A that satisfies F x * A y -> A (x*y).

The new definition is simpler and possibly more efficient, since we now have fewer constructors and less wrapping of trivial values.

The price is a more complicated definition of "wrap" and "ap". The "wrap" function is no longer simply the type constructor, but now has to be implemented separately:

wrap = F a -> 0 + F a * (id^a->a + 0)

The implementation of the "ap" function,

ap :: A s * A (s->t) -> A t

becomes more complicated. It is defined by assuming that "ap" already works on a recursively defined instance of A (a->t). Since A t now has two variants, we need to consider two cases:

1. (s+0) * A(s->t) -> A t

Since A is a functor, we can convert s*A (s->t) into A(s*(s->t)) and finally to A t.

2. (0 + ∃a. F a * A (a->s)) * A (s->t) -> A t

By induction, we assume that we can convert A (a->s) * A (s->t) into A (a->t). (This is the distributive property of the applicative functor, which is equivalent to "ap" as we already saw.) Thus we obtain 0 + ∃a. F a * A (a->s), which is the "Ap" variant of A t.

We see that this implementation is a bit more fine-grained: in the first case, we do not use induction (so the operation takes O(1) steps). In the second case, we do not use the functor property of F; so F can actually be a non-functor! The initial value of "F a" stays constant under "ap" and "fmap", just as it does in the free functor. This implementation of free F-applicative combines the applicative property with the construction of the free F-functor.

Note that using the free F-functor instead of F in the construction of the free F-applicative will yield a different, much more cumbersome implementation, with several additional layers that will be discarded by the "universal runner". The present implementation, perhaps unexpectedly, combines the free functor and the free applicative in a concise way.

Can we eliminate the second recursive use of "A" in the definition of the free applicative functor?

Consider the following definition:

A t = t + ∃a. A a * F (a->t)

The functor property holds since A t depends on t in the covariant position (and we assume that F is a functor).

Let us try to implement "ap", which needs to have the type

ap :: A s * A (s->t) -> A t

Since A has two variants, we now need to consider the following two cases:

1. s * A(s->t) -> A t

This generates A t since A is a functor, as before.

2. A s * (0 + ∃a. A a * F (a->s->t)) -> A t

The inductive assumption about "A a" guarantees that we can convert A s * A a ~> A (a*s). Thus we can obtain

∃a. A (a*s) * F (a*s -> t) = ∃b. A b * F (b->t),

which is the second variant of A t.

The two implementations differ in how they handle the inductive case when applying "ap" to the free F-applicative: In the first implementation, functions are composed while the initial value of "F a" remains unchanged. The functor property of F remains unused. In the second implementation, a pair of two types is made but no function compositions are computed. However, the functor property of F is required.

It does not seem possible to eliminate both inductive uses of "A" in the definition of the free applicative functor! The reason is that there is no inductive assumption, and so we have no way of converting F(a)*F(b) into F(a*b) if F is not already applicative.

The "Wrap" constructor can also be retained as another alternative implementation of the free F-applicative. This will somewhat complicate the code since we will now have three cases to work with (and the functor property of F will be again required, unless we implement "fmap" in a special way to mimic the free functor construction). However, we might gain an improved efficiency since we can now recognize a pure injected value of "F a" and avoid computing unnecessary function compositions.

Exercise: Define a "free F-contrafunctor".
Answer: Φ t = ∃a. F a * (t -> a)

To be continued.