Journal de Chaource ([info]lj_chaource) writes
@ 2016-06-29 01:37:00


Functional programming is physics
After spending a few weeks with category theory calculations (still thinking about how to best write it up), it occurred to me that there is a close analogy between modern functional programming and physics. The analogy goes like this.

A physicist starts with an informal description of an idealized real-world situation. ("A massive cart is rolling without friction on a rigid inclined plane...") The physicist translates this description into the language of algebra, geometry, differential equations, and other mathematical tools used in physics. The physicist then uses the tools of mathematics to solve the problem. The physicist will develop new mathematical tools if necessary.

A (functional) programmer starts with an informal description of an idealized real-world situation. ("A large number of Web browser clients communicate with a Web server to store and retrieve user information...") The programmer first translates this informal description into the language of function calls and data structures. The programmer uses the tools of formal logic, category theory, and type theory to help derive the required program code. If necessary, the programmer will develop new tools (or new programming languages).

Now, this situation is not entirely new or specific to functional programming. Programmers had to do much of the same work before. Translating from natural-language problem description to a more formal specification is not new. However, imperative programming languages do not typically allow the programmer to reason about the program using purely mathematical tools such as category theory. Reasoning about programs was always ad hoc, not based on broad mathematical theories. It is only with the advent of functional programming (and programming languages such as Haskell) that programmers gained the ability to use formal logic, type theory, and category theory during certain essential steps of program design.

Note that programmers do not need to become pure mathematicians, any more than physicists do. In terms of pure mathematics, the typical tools that physicists use (e.g. solutions of some differential equations, or calculations with matrices) are not of much interest. To a mathematician, a physicist appears to be an "engineer" who is merely interested in using whatever tools are good for the job. Similarly, a programmer appears as an engineer to a pure mathematician.

However, a physicist is not an engineer because a physicist is interested not only in solving specific problems, but also in developing the theory of physics so that more problems can be solved more easily. Similarly, functional programmers (in the last 15 years or so) have not only written programs but developed a sizable body of theoretical work, a "theory of programming".

For a taste of this, take a look at the Haskell Wikibook. After an introduction to the syntax and basic constructs of the language, the book seems to drift into theoretical computer science, with chapter headings such as "Monads", "Arrows", "Applicative functors", "Monoids", "Existentially quantified types", and so on, all the way to category theory. The reason the book goes so far into mathematics is not some mathematician's idle wish to translate this or that construction from category theory into Haskell. Rather, it is the fact that Haskell programming, at an advanced level, actually consists of doing some (simple) calculations and reasoning in type theory and category theory. Functors, natural transformations, and other such tools are actively used when programming database applications, Web services, or data science pipelines in Haskell. When I set out to study more advanced Haskell idioms, I have a very similar "immersion" feeling: there is a mathematical theory that allows me to gain a lot of in-depth understanding rather quickly, as long as I work through the definitions and some example calculations.

The application of (intuitionistic) type theory and category theory to functional programming is not interesting to pure mathematicians studying logic or category theory. This situation is quite similar to the fact that many calculations necessary in theoretical physics are not interesting to pure mathematicians. The people who develop the theory behind these calculations (when new bits of theory are required) are not mathematicians but physicists. The science of physics consists of applications of pure mathematics to an idealized world.

Similarly, functional programmers develop useful calculations in type theory and category theory when new calculations are required for practical programming. The science of computing ("computer science") consists of applications of pure mathematics to the idealized world of information processing. Functional programming is the first real breakthrough where a significant chunk of pure mathematics (formal logic and category theory) suddenly found fruitful new applications.

I would compare this to the unexpected applications of representation theory in particle physics. Where before we had a disorganized soup of particles, we have since applied the classification of Lie algebras and their representation theory, and found enlightenment. Where before we had a haphazard bag of programming "tricks" and language "features", we now have a few clear mathematical principles on which we can rely both in designing new programming languages and in writing programs.

Just as mathematics was famously noted for its "unreasonable effectiveness" in physics, logic has been noted for its "unusual effectiveness" in computer science. This is perhaps only a beginning: computer science is very young and has discovered relatively little. But it is going in the same direction as physics.

What is "functional programming"?

The paradigm of "functional programming" rests on four core principles:

1. Programs should be mathematical formulas, or "expressions" containing "named variables". Variables and expressions should be immutable. Expressions can be anonymous (do not necessarily need a name).
2. Each expression and each variable should have a well-defined type (the type can be that of a number, a function, etc.). There are otherwise no restrictions on constructing expressions.
3. The type system and the predefined constructions of the programming language should fully correspond to some intuitionistic logic (propositional with universal quantifiers, 1st order or higher-order, temporal, modal, linear, etc.). The choice of logic is a matter of debate, but there must be a consistent logic behind the programming language, and the language should faithfully reproduce its logic in full (no missing axioms or derivation rules, no extra axioms or derivation rules).
4. The type system should include some category-theoretic constructs (products, co-products, terminal object, functors, natural transformations).

It is not obvious why programming languages should adhere to items 1-4. Many widely used programming languages do not; programming in those languages resembles not physics but automobile repair. At the moment, only Scala and Haskell (along with several Haskell-like languages such as PureScript, Agda, and Idris) adhere to the functional programming principles closely enough to allow the programmer to use mathematical reasoning.

Why are these principles useful? Let us consider each of them in turn.

Everything is an expression

An example of a mathematical expression is

∑_{k=1}^{n} k^2.

This expression represents a function of "n" that can be evaluated to a number if "n" is given as a number.

Mathematics has given us a lot of experience working with expressions like these. For instance, we know that it is useful to be able to substitute some other formula for "n" in this expression. It is also useful to be able to denote this entire expression by a symbol, say "S", and then substitute this "S" into some other expression that uses S as a variable, for example, (1-S)/(1+S). In mathematical texts, this kind of substitution is often written like this:

(1-S)/(1+S), where S = ∑_{k=1}^{n} k^2.

We know that it is very useful if we can restrict substitution to reasonable cases (e.g. that S should be a number when substituted, rather than, say, a geometric surface). At the same time, mathematics does not have any restrictions on the length or complexity of expressions that can be substituted. For instance, the fact that S contains a summation is not a problem - we can still substitute S into any other expression.

There are, however, quite a few programming languages that do not allow us to write expressions of arbitrary complexity. For example, not many programming languages can represent the summation above (the definition of "S") as an expression. In Haskell, this looks like sum (map (\k -> k*k) [1..10]). In Scala, it is (1 to 10).map(k => k*k).sum. But in Python, a lambda may contain only a single expression, so we can write the function (k -> k^2 in this case) inline only when it is simple enough to fit into one expression. This is the kind of arbitrary restriction on expressions that (among other problems) makes Python unsuitable for functional programming.
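
To make the substitution discussion concrete, here is a minimal Haskell sketch (the names s and result are mine, chosen for illustration):

-- The summation bound to the immutable name s:
s :: Integer
s = sum (map (\k -> k*k) [1..10])

-- Substituting s into another expression, as in (1-S)/(1+S):
result :: Double
result = (1 - fromIntegral s) / (1 + fromIntegral s)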

If we think about a mathematical expression, we realize that it is meaningless to say that an expression such as (1-S)/(1+S) can "change" or can "be modified". This expression is, by definition, always the same; we cannot modify it, any more than we can "modify" the number 1. If we need a different expression, such as (2-S)/(2+S), we can simply write it down beside the first one.

A "variable" is also something that cannot be meaningfully "modified" within an expression. Consider the variable S in the expression (1-S)/(1+S). The variable S can be either set to a value or not (yet) set; but once S is set to a value, it cannot be modified within the expression (1-S)/(1+S). The entire history of mathematical experience shows us that this convention (all variables are immutable) makes the most sense. I have never seen a text in mathematics that considers the value of S in an expression such as (1-S)/(1+S) to be something that could change midway, so that, e.g., the "S" in "1-S" is different from the "S" in "1+S". This kind of behavior would have been extremely confusing. So, mathematical "variables" are actually "named constant values". In programming literature, this property is called "immutability".

A programming language that does not allow the programmer to have immutable variables is severely handicapped. The programmer can, of course, take care that nothing is modified, but errors will inevitably occur, and the temptation to modify things (so that the program runs faster) will be impossible to resist.

Among the most popular languages today are Python and Javascript; neither one can enforce immutability of values. A year ago a colleague (who was coding in Javascript) showed me a bug that was very difficult to diagnose. The bug manifested itself in code that looked extremely simple:

var x = { value: 123 };
function f() { return x; }

The function "f" was supposed to always return 123. It did return 123, but only the first time it was called! After the first call, f() would return another value. The reason was that the function was called by a widely popular Javascript library that my colleague also used. This library modified the result of calling f(), which was the mutable variable x. The library code looked innocuous as well:

var y = f();
...
y.value = 10;
...

The programmers who wrote the library surely thought that they were just "reusing" the object "y" for convenience. But the result was that the "x" in my colleague's code held the value 10 after the first call to f(). My colleague spent several hours before we found this. The problem would have been avoided if the library's programmers had not modified values.
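
For contrast, a minimal sketch of the same setup in Haskell, where the analogous mutation is simply impossible to write, since every binding is immutable:

x :: Int
x = 123

f :: Int
f = x

-- There is no way to "modify" x: a second definition such as
--   x = 10
-- in the same scope is a compile-time error ("multiple
-- declarations of x"), and a caller of f can only construct
-- new values, never change x behind f's back.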

Another important feature of a functional programming language is that expressions can be "anonymous", that is, not labeled by a name. When a programmer writes "R = (1-S)/(1+S)", the expression "(1-S)/(1+S)" is labeled with the variable name "R". However, the subexpression "(1-S)" does not bear a name. We take it for granted in mathematics that we can simply write "(1-S)/(1+S)" without having to put a name to each part of the expression. However, this was not always so for programming languages. For instance, assembly language programming is essentially equivalent to putting a name to each and every subexpression, so that instead of "(1-S)/(1+S)" we would have to write something like

R1 = 1
R2 = S
R3 = R1 - R2
R4 = R1 + R2
R5 = R3 / R4
return R5

This madness is very similar to actual assembly language programming. Some people would argue that this kind of programming is "more efficient" because, for instance, we have reused "1" and "S", which makes our program "faster" and uses "less memory". This is true in a technical sense, but the speedup and the memory savings are very small compared with the huge cost of writing and testing this kind of program.

The lack of anonymous expressions is one of the reasons assembly language is so tedious. FORTRAN was the first language that had anonymous subexpressions, and they have been in almost every programming language since then. If we want to take this principle to its logical conclusion, we should allow the entire program to be written as one anonymous expression, with no variable names introduced. Thus, the programming language should allow us to write anonymous function expressions.

For example, we should be able to write a function that takes k and returns k*k. In the syntax of Haskell, this is written as \k -> k*k. In Scala, k => k*k. These expressions are functions without names, and they can be handled as any other expressions: we can give them as arguments to other functions, store them in data structures, etc.
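
A minimal Haskell sketch of anonymous functions treated as ordinary values (the names squares, ops, and applied are mine, for illustration):

-- An anonymous function passed as an argument:
squares :: [Int]
squares = map (\k -> k*k) [1..5]      -- [1,4,9,16,25]

-- Anonymous functions stored in a data structure:
ops :: [Int -> Int]
ops = [\k -> k + 1, \k -> k * 2, \k -> k * k]

-- Applying each stored function to the same argument:
applied :: [Int]
applied = map (\op -> op 3) ops       -- [4,6,9]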

The ability to write anonymous function expressions gives the programmer such a tremendous productivity boost that even extremely conservative standards bodies such as the C++ and Java standardization committees caved in (quite recently) and included anonymous functions in their respective language standards.

Type systems

Most programming languages created recently (F#, Scala, Swift, Rust, etc.), as well as all ML and Haskell dialects, are based on more or less the same type system: algebraic data types with simple polymorphism and Damas-Hindley-Milner type inference. This type system corresponds to a certain flavor of typed lambda calculus. There are some variations and experimental extensions in certain languages, but the core remains the same. It appears that the industry has reached a consensus that, whatever other features a language may have, its type system should at least support Hindley-Milner types.
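
For instance, in GHCi the inferred Hindley-Milner types can be inspected with the :t command, with no annotations written by the programmer (exact output formatting may vary between GHC versions):

ghci> :t map
map :: (a -> b) -> [a] -> [b]
ghci> :t \x -> (x, x)
\x -> (x, x) :: b -> (b, b)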

This core type system corresponds to a restricted version of 1st order intuitionistic logic. If the programming language supports a type system that corresponds to this logic, two benefits are immediately obtained:

- We know that the logic cannot derive a contradiction (mathematicians have already proved that). Deriving a contradiction corresponds quite literally to crashing the program because of invalid or undefined operations (such as dividing a string by an array of colors). So, our programs will never crash due to operations applied to values of the wrong type.

- We can sometimes derive code from types alone. For example, if we need a function that takes an argument of type t and returns a pair of values (t,t), for any type t, then there is only one way of writing code for this function:

x => (x,x)

This can be derived automatically by a constructive decision procedure. Given a function type, this procedure either finds that there are no functions of a given type, or produces the program code for that function. This functionality can be built into a compiler or a programming environment, helping the programmer avoid errors.
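
In Haskell this function and its type look as follows; tools such as Lennart Augustsson's Djinn implement exactly such a decision procedure, producing a term from a type or reporting that no such term exists (the name dup is mine, for illustration):

-- The type t -> (t, t) has essentially one inhabitant:
dup :: t -> (t, t)
dup x = (x, x)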

Category-theoretic tools

One of the features of the Hindley-Milner type system is the presence of so-called algebraic data types. These are motivated by category theory and include the following:

- A type corresponding to the terminal object. This type is written "()" in Haskell and "Unit" in Scala.
- For any two types T and U, the product type T*U and the co-product type T+U.
- For any two types T and U, the type T->U, meaning a function taking a value of type T as its (only) argument and returning a value of type U.
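
A minimal sketch of these constructions in Haskell syntax:

-- Terminal object: the unit type, with the single value ().
u :: ()
u = ()

-- Product type T*U: the pair type.
p :: (Int, String)
p = (1, "one")

-- Co-product type T+U: Either.
c :: Either Int String
c = Left 1          -- or: Right "one"

-- Function type T -> U:
g :: Int -> String
g n = show n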

Haskell and Scala support some more advanced features in their type system:

- For any functor F and any type T, the type F(T), as well as the functor F itself as a higher-kinded type.
- For any natural transformation N, the result of applying N to values of a suitable functor type F(T), as well as N itself as a higher-order (polymorphic) type.

In addition to plain functors, Haskell and Scala can implement functors with extra properties, often formulated via the existence of suitable natural transformations or suitable adjoint functors. For instance, a functor M is a monad if it is equipped with two natural transformations, satisfying certain coherence laws: one from the identity functor to M, and one from M∘M to M.
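
A sketch of this formulation in Haskell (this simplified class is mine, for illustration; the standard Monad class in Haskell's base library is formulated with >>=, which is interdefinable with join):

-- A monad as a functor with two natural transformations:
class Functor m => SimpleMonad m where
  unit :: a -> m a           -- from the identity functor to M
  join :: m (m a) -> m a     -- from M∘M to M

-- The Maybe functor is a monad:
instance SimpleMonad Maybe where
  unit = Just
  join (Just inner) = inner
  join Nothing      = Nothing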

Functors are widely used in functional programming to implement generic data containers. Where a programmer in a traditional language such as Java would write a loop over an array, functional programming represents the array as a functor with suitable properties, so that a loop becomes an expression. Functional programming is almost entirely loop-free, the exception being some rare recursive algorithms that are difficult to express in terms of generic properties of functors.
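
For example, where a Java programmer would write a for-loop over an array, in Haskell the same computation is a single expression built from generic functor operations, and the same "loop body" works through any container:

-- The same operation applied through different functors:
doubledList :: [Int]
doubledList = fmap (* 2) [1, 2, 3]       -- [2,4,6]

doubledMaybe :: Maybe Int
doubledMaybe = fmap (* 2) (Just 21)      -- Just 42

-- A "loop" that sums the squares of a list, as one expression:
sumOfSquares :: [Int] -> Int
sumOfSquares = sum . fmap (\k -> k * k)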

What we need to learn

A physicist needs to learn three things:

- the mathematical theory of physical phenomena
- how to translate plain language descriptions into formal mathematical equations
- how to solve the equations and get the result

A functional programmer needs to learn three things:

- the mathematical theory of information processing
- how to translate plain language descriptions into formal descriptions or mathematical specifications
- how to write code implementing the specifications

Programmers in non-functional languages do not need to learn the first item on this list. This is an advantage for employers who can more easily find Java, Python, or Javascript programmers. However, having to learn mathematics is a significant filter which works to the advantage of the functional programmer. The increased productivity and code quality will be appreciated by the most innovative employers, who will be happy to hire mathematically minded individuals. This is another instance of the "revenge of the nerds".

To summarize:

Physics develops a mathematical theory for the movement of matter and provides a toolbox for solving practical problems in that domain. Functional programming does the same for information processing.

