lj_chaource: Representable functors and GADTs

Representable functors and GADTs
In functional programming, functors represent "generic data containers". The type F(t) is interpreted as the "data container" that contains data values of type t.

Definition. A functor F is representable if there exists a type r (called the representing object) and a natural isomorphism between the functors r -> t and F(t).

The existence of an isomorphism means the existence of two natural transformations, which we can interpret more visually if we assume that F(t) represents a data type containing an "indexed" table of values of t, such that each value of t is located at a certain "index", while index values are all of type r. The natural transformations are

(r -> t) ~> F t -- create a value of type F(t) that has a value "t" at every "index" value r

and

F t ~> (r -> t) -- given a value of type F(t) and a value of the "index" r, we can extract the value t at that index

Examples:

Polynomial functors are representable only if they have the monic monomial form t*t*...*t. These containers have exactly n values of type t (so the "index" type is isomorphic to the set of integers from 1 to n). All other polynomial functors are not representable. For example, t+1 or c*t are not representable. The "List" functor is not representable either. The reason why t+1 or List(t) cannot be representable is that they can be empty (contain no values of t), while we are required to be able to extract a value of t out of the functor values, for each value of the "index" r.

For the monic monomial F = t*t*...*t, the representing object is r = 1+1+...+1. This is easy to see: for example, (1+1)->t is equivalent to the product (1->t)*(1->t), which in turn is simply equivalent to t*t.

Exp-poly functors can be representable if they are products of functors of the form c -> t for some types c. Co-products such as A+B are not representable. The functor (t -> b) -> b (where "b" is a fixed type) is not representable because we cannot get a value of t out of a value of (t -> b) -> b.

Generally, if F(t) is representable by an object c then we have a naturally defined value of type F(c). This value corresponds to the identity morphism id : c->c by the isomorphism of F(c) and c->c. (This is, of course, a rather weak necessary condition.)

There is a curious formal analogy between taking the representing object of a representable functor and taking the logarithm:

If b->t is denoted by t^b, then b = log (b -> t) / log t.

So, representable functors are those which "have a logarithm that is divisible by log t". This is nonsense but quite suggestive.

Besides the analogy with indexed data containers, the concept of representable functors is useful for understanding generalized data types (GADTs). It turns out that functional languages define algebraic data types (including all polynomial functors and GADTs) using representable functors in disguise.

In Haskell, there are two alternative syntaxes for defining a polynomial data type:

First, one can write a data type definition in the "direct" form,

data T = A Int | B Float | C Bool Int

In the short notation, this is

T = Int + Float + Bool*Int = a + b + c*d -- for suitable constant types a,b,c,d

Second, one can define the same data type using a longer (but equivalent) syntactic form,

data T where
A :: Int -> T
B :: Float -> T
C :: Bool -> Int -> T

This form is used for GADTs, but right now we are going to use it for ordinary algebraic data types.

The two syntactic forms are equivalent in Haskell because "data T = A Int | B Float | C Bool Int" also defines the three constructor functions A, B, C in exactly the same way.

However, the "long form" of the data type looks very much like a product of three types,
(Int -> t)*(Float -> t)*(Bool -> Int -> t).
Let us temporarily consider this type as a functor in the type variable t:

F(t) = (a->t)*(b->t)*(c*d->t) -- for suitable constant types a,b,c,d

Now we notice that the functor F(t) is representable, and its representing object is a + b + c*d - which is precisely the data type being defined.

So, the "long form" specifies the data type as the representing object of the functor F.

Note that in Scala, the "long form" is the only syntax available for algebraic data types. The example above is written in Scala as

sealed trait T
case class A(a:Int) extends T
case class B(b:Float) extends T
case class C(x: Bool, y: Int) extends T

In Scala, we are obliged to provide names A, B, C for all type constructors and names a,b,x,y for every value. In Haskell, we are only obliged to provide names for type constructors. In our short notation, we do not provide any names:

T = Int + Float + Bool*Int

Recursive data types

We can try using this technique for understanding the definitions of recursive data types when written in the long form. Consider the List data type, specialized from a functor t=>List(t) to a constant type List(c), which we will denote List_c. The type List_c is defined recursively,

List_c = 1 + c*List_c

The "long form" for List_c is

data List_c where
Nil :: List_c -- equivalently 1 -> List_c
Cons :: c -> List_c -> List_c

If we now try to rewrite this as a functor, replacing List_c by the type variable "t", we get

F(t) = (1->t)*(c->t->t).

This is actually not a functor but only a profunctor, because "t" occurs both in covariant and contravariant positions. A profunctor that is not actually a functor cannot be representable since it cannot be isomorphic to a functor of the form r -> t for some type r.

The interpretation of Haskell's "long form" type definitions must be different. We are not actually writing a profunctor F(t) shown above. A first hint is that Haskell will not allow a constructor that does not return the parent type. So, for instance, Haskell rejects this:

data B where
P :: B -> Int -- error: the function is not returning B

So, obviously, the last type being returned is treated specially. Let us replace that with the type variable "t" by hand.

Second, if there are any recursive instances of that type, such as the first "List_c" in
Cons :: c -> List_c -> List_c,
the compiler will have to know about those anyway, to generate the proper fixpoint. So let us replace all recursive instances with the type variable "s".

The result is that we translate the definition

data List_c where
Nil :: List_c -- equivalently 1 -> List_c
Cons :: c -> List_c -> List_c

into the type function
F(s,t) = (1 -> t) * (c -> s -> t).
By construction, F(s,t) is a contrafunctor in s and a representable functor in t.
Thus we may ask, what is the functor R such that F(s,t) = R(s) -> t.
The type R(s) is defined as the representing object of F(s,t) with respect to t, and it is easy to see that R(s) is a functor in s since F(s,t) was a contrafunctor in s.

In our example, R(s) = 1 + c*s.

Now, we interpret this data type definition as the fixpoint of R(s) with respect to s.

Fix_sR(s) = Fix_s(1+c*s) = List(c).

Two more examples of the "long form" of the data type definition interpreted as a fixpoint of the representing object:

1. The "natural number" type.

data Nat where
Zero :: Nat
Succ :: Nat -> Nat

This defines the type function F(s,t) = t * (s->t). We find the representing object for this as R(s) = 1+s. The "natural number" type is therefore defined via the following fixpoint,
Nat = Fix_s(1+s).

Informally, Nat = 1 + (1 + (1 + ...)...)

The direct definition in Haskell is
data Nat = Zero | Succ Nat

2. The "binary tree".

data Btree where
Leaf :: c -> Btree
Branch :: Btree -> Btree -> Btree

This defines the type function F(s,t) = (c->t)*(s->s->t). Note that we translate the first two occurrences of "Btree" in "Branch" as the type variable "s" but the last occurrence as the type variable "t". This is because the first two occurrences are recursive while the last is the mandatory return of the type "Btree" that we are in the process of defining.

The representing object for F(s,t) is R(s) = c + s*s. Thus the type is Btree = Fix_s(c+s*s).

The direct definition in Haskell is

data Btree = Leaf c | Branch Btree Btree

Questions:

- Can we define functor types such as List(c) via representable functors? How do we handle the type dependence of the functor?

We can do this if we simply ignore the type parameter in the functor. For example, the definition of a binary tree with a type parameter,

data Btree c where
Leaf :: c -> Btree c
Branch :: Btree c -> Btree c -> Btree c

is interpreted as the specification of a value of F(c,s,t) where the functor F is

F(c,s,t) = (c->t)*(s->s->t).

This functor (in its dependence on t) is representable, and the representing object is R(c,s) = c + s*s. Then we define the functor Btree(c) as the fixpoint Fix_sR(c,s). We can easily show that the result is a functor in c.

- Can GADTs be defined via representable functors in the same way? How can we handle the type dependence and the possible type specialization of GADTs?

GADTs are not covered by our short notation. A simple example of a GADT would be a co-product whose variants are differently parameterized. Consider a functor F(t) defined like this,

F(t) = a + b,

where a,b are constant types. Specific examples of values of F(t) are a+0 and 0+b. Normally, both of these variants have type F(t) -- in this case, they are independent of t. However, a GADT may specify that a+0 has type, say, F(b) while 0+b has type F(a).

In Haskell, this definition will look like this:

data F t where
P :: a -> F b
Q :: b -> F a

We might introduce special short notation for this, for example,

(a+0)^F(b), (0+b)^F(a)

However, I'm not sure at this point how useful this notation will be.