## The Conception, Evolution, and Application of Functional Programming Languages

Paul Hudak Yale University

Department of Computer Science March 1989

Final Draft^{∗}

**Abstract**

The foundations of functional programming languages are examined from both
historical and technical perspectives. Their evolution is traced through several
critical periods: early work on lambda calculus and combinatory calculus, Lisp,
Iswim, FP, ML, and modern functional languages such as Miranda^{1} and Haskell.

The fundamental premises on which the functional programming methodology stands are critically analyzed with respect to philosophical, theoretical, and prag- matic concerns. Particular attention is paid to the main features that characterize modern functional languages: higher-order functions, lazy evaluation, equations and pattern-matching, strong static typing and type inference, and data abstrac- tion. In addition, current research areas—such as parallelism, non-determinism, input/output, and state-oriented computations—are examined with the goal of pre- dicting the future development and application of functional languages.

**Categories and Subject Descriptors:**D.1.1 [Programming Techniques]: Applicative
(Functional) Programming; D.3.2 [Programming Languages]: Language Classiﬁcations—

applicative languages; data-ﬂow languages; nonprocedural languages; very high- level languages; F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic—lambda calculus and related systems; K.2 [History of Computing]: Software.

**General Terms:** Languages.

**Additional Key Words and Phrases:** referential transparency, higher-order func-
tions, types, lazy evaluation, data abstraction.

**1** **Introduction**

The earliest programming languages were developed with one simple goal in mind: to provide a vehicle through which one could control the behavior of computers. Not

*∗*To appear in ACM Computing Surveys.

1Miranda is a trademark of Research Software Ltd.

surprisingly, these early languages reﬂected fairly well the structure of the underlying machines. Although at ﬁrst blush this seems emminently reasonable, this viewpoint quickly changed, for two very good reasons. First, it became obvious that what was easy for a machine to reason about was not necessarily easy for a human to reason about. Second, as the number of diﬀerent kinds of machines increased, the need arose for a common language with which to program all of them.

Thus from primitive assembly languages (which were at least a step up from raw machine code) there grew a plethora of high-level programming languages, beginning with Fortran in the 50’s. The development of these languages grew so rapidly that by the 80’s they were best characterized by grouping them into “families” that reﬂected a common computation model or programming “style.” Debates over which language or family of languages is best will undoubtedly persist for as long as computers need programmers.

The class of *functional, orapplicative*programming languages, in which computa-
tion is carried out entirely through the*evaluation of expressions, is one such family of*
languages, and debates over its merits have been quite lively in recent years. Are func-
tional languages toys, or are they tools? Are they artifacts of theoretical fantasy, or of
visionary pragmatism? Will they ameliorate our software woes, or merely compound
them? Whatever answers one might have for these questions, one cannot ignore the
signiﬁcant interest current researchers have in functional languages, and the impact
they have had on both the theory and pragmatics of programming languages in general.

Among the claims made by functional language advocates are that programs can be written quicker, are more concise, are “higher level” (resembling more closely tra- ditional mathematical notation), are more amenable to formal reasoning and analysis, and can be executed more easily on parallel architectures. Of course, many of these features touch on rather subjective issues, which is one reason why the debates can be so lively.

It is my purpose in writing this paper to give the reader signiﬁcant insight into the very essence of functional languages and the programming methodology that they support. I will do so by starting with a general discussion of the nature of functional languages, followed by a historical sketch of their development, a summary of the dis- tinguishing characteristics of modern functional languages, and a discussion of current research areas. Through this study I hope to put into perspective both the power and weaknesses of the functional programming paradigm.

**A Note to the Reader:** I have assumed in writing this paper that the reader has a
good understanding of the fundamental issues in programming language design and
use. Although reasonably self-contained, this paper should not be considered as a tu-
torial on functional languages; the reader is referred to either [BW88] or [FH88] as suit-
able textbooks for learning more about modern functional programming techniques,
including the important ideas behind *reasoning* about functional programs. In addi-
tion, I will say very little about how to*implement*functional languages; see [PJ87] for

an introduction to this important area (additional references are given in Sections 2.8 and 6).

Finally, a comment on notation: unless otherwise stated, all examples will be written
in Haskell, a recently proposed functional language standard [HWe88]. However, rather
than explain the syntax all at once, explanations will be given “on the ﬂy” and enclosed
in square brackets [...].^{2}

**1.1** **The Programming Language Spectrum**

*Imperative languages* are characterized as having an *implicit state* that is modiﬁed
(i.e. side-eﬀected) by constructs (i.e. commands) in the source language. As a result,
such languages generally have a notion of *sequencing* (of the commands) to permit
precise and deterministic control over the state. Most, including the most popular,
languages in existence today are imperative.

As an example, the *assignment statement* is a (very common) *command, since its*
eﬀect is to alter the underlying implicit store so as to yield a diﬀerent binding for a
particular variable. Thebegin...endconstruct is the prototypical*sequencer*of com-
mands, as are the well-known*goto statement*(unconditional transfer of control),*con-*
*ditional statement* (qualiﬁed sequencer), and *while loop*(an example of a “structured
command”). With these simple forms we can then, for example, compute the factorial
of the numberx:

n := x;

a := 1;

while n>0 do begin a := a*n;

n := n-1 end;

...

After execution of this program the value of a in the implicit store will contain the desired result.

In contrast,*declarative languages*are characterized as having*no*implicit state, and
thus the emphasis is placed entirely on programming with *expressions* (or*terms). In*
particular,*functional languages*are declarative languages whose underlying model of
computation is the *function*(in contrast to, for example, the *relation* that forms the
basis for logic programming languages.

2Since the Haskell Report is relatively new, some minor changes to the language may occur after this paper has appeared. An up-to-date copy of the Report may be obtained from the author.

In a declarative language state-oriented computations are accomplished by carry-
ing the state around explicitly rather than implicitly, and looping is accomplished via
*recursion*rather than by sequencing. For example, the factorial ofxmay be computed
in the functional language Haskell by:

fac x 1

where fac n a = if n>0 then fac (n-1) (a*n) else a

in which the formal parameters n and a are examples of “carrying the state around
explicitly,” and the recursive structure has been arranged so as to mimic as closely as
possible the looping behavior of the program given earlier. Note that the conditional in
this program is an*expression*rather than*command; i.e. it denotes avalue*(conditional
on the value of the predicate), rather than denoting a sequencer of commands. Indeed
the*value of the program*is the desired factorial, rather than it being found in an implicit
store.

Functional (in general, declarative) programming is often described as expressing
*what*is being computed rather than*how, although this is really a matter of degree. For*
example, the above program may say less about how factorial is computed than the
imperative program given earlier, but is perhaps not as abstract as:

fac x

where fac n = if n==0 then 1 else n * fac (n-1)

[==is the inﬁx operator for equality.] which appears very much like the mathematical deﬁnition of factorial, and is indeed a valid functional program.

Since most languages have expressions, it is tempting to take our deﬁnitions liter- ally and describe functional languages via derivation from conventional programming languages: simply drop the assignment statement and any other side-eﬀecting prim- itives, and there you have it! This, of course, is very misleading. The result of such a derivation is usually far less than satisfactory, since the purely functional subset of most imperative languages is hopelessly weak (although there are important excep- tions, such as Scheme [RCe86]).

Rather than saying, then, what functional languages*don’t*have, it is better to char-
acterize them by the features they *do* have. For modern functional languages, those
features include higher-order functions, lazy evaluation, pattern-matching, and various
kinds of data abstraction — all of these features will be described in detail in this paper.

Functions are treated as ﬁrst-class objects, are allowed to be recursive, higher-order, and polymorphic, and in general are provided with mechanisms that ease their deﬁni- tion and use. Syntactically, modern functional languages have an “equational” look, in which functions are deﬁned using mutually recursive equations and pattern-matching.

This discussion suggests that what is important is the functional programming
*style, in which the above features are manifest, and in which side eﬀects are strongly*
discouraged but not necessarily eliminated. This is the viewpoint taken by the ML com-
munity, for example, and to some extent the Scheme community. On the other hand,
there is a very large contingency of “purists” in the functional programming commu-
nity who believe that purely functional languages are not only suﬃcient for general
computing needs, but also better because of their “purity.” At least a dozen purely
functional languages exist along with their implementations.^{3} The main property that
is lost when side eﬀects are introduced is *referential transparency; this loss in turn*
impairs*equational reasoning, as described below.*

**1.2** **Referential Transparency and Equational Reasoning**

The emphasis on a pure declarative style of programming is perhaps the hallmark of
the functional programming paradigm. The term*referentially transparent*is often used
to describe this style of programming, in which “equals can be replaced by equals.” For
example, consider the (Haskell) expression:

... x+x ...

where x = f a

The function application(f a)may be substituted for any free occurrence ofxin the
scope created by thewhere expression, such as in the subexpressionx+x. The same
cannot generally be said of an imperative language, where one must ﬁrst be sure that
no assignment to x is made in any of the statements intervening between the initial
deﬁnition of x and one of its subsequent uses.^{4} In general this can be quite a tricky
task, for example in the case where procedures are allowed to induce non-local changes
to lexically scoped variables.

Although the notion of referential transparency may seem like a simple idea, the clean equational reasoning that it allows is very powerful, not only for reasoning for- mally about programs, but also informally in writing and debugging programs. A pro- gram in which side-eﬀects are minimized, but not eliminated, may still beneﬁt from equational reasoning, although naturally more care must be taken when applying such reasoning. The degree of care, however, may be much higher than one might think at ﬁrst: most languages that allow “minor” forms of side-eﬀects do not minimize their

3This situation forms an interesting contrast with the logic programming community, where Prolog is often described as declarative (whereas Lisp is usually not), and there are very few pure logic programming languages (and even fewer implementations).

4In all fairness, there are logics for reasoning about imperative programs, such as those espoused by Floyd, Hoare, Dijkstra, Wirth, and others. None of them, however, exploit any notion of referential transparency.

locality lexically—thus any call to any function in any module might conceivably intro- duce a side-eﬀect, in turn invalidating many applications of equational reasoning.

The perils of side-eﬀects are appreciated by the most experienced programmers in
any language, although most are loathe to give them up completely. It remains the
goal of the functional programming community to demonstrate that one can do*com-*
*pletely*without side-eﬀects, without sacriﬁcing eﬃciency or modularity. Of course, as
mentioned earlier, the lack of side-eﬀects is not all there is to the functional program-
ming paradigm—as we shall soon see, modern functional languages rely heavily on
certain other features, most notably higher-order functions, lazy evaluation, and data
abstraction.

**1.3** **Plan of Study**

Unlike many developments in computer science, functional languages have maintained the principles on which they were founded to a surprising degree. Rather than chang- ing or compromising those ideas, modern functional languages are best classiﬁed as embellishments of a certain set of ideals. It is a distinguishing feature of modern func- tional languages that they have so eﬀectively held on to pure mathematical principles in a way shared by very few other languages.

Because of this, one can learn a great deal about functional languages simply by studying their evolution closely. On the other hand, such a study may fail to yield a consistent treatment of any one feature that is common to most functional languages, for it will be fractured into its manifestations in each of the languages as they were historically developed. For this reason I have taken a three-fold approach to our study:

First, in the next section I will provide a historical sketch of the development of functional languages. Starting with the lambda calculus as the prototypical functional language, I will gradually embellish it with ideas as they were historically developed, leading eventually to a reasonable technical characterization of modern functional lan- guages.

In so doing, however, I will postpone detailed discussion of four important concepts—

higher-order functions, lazy evaluation, data abstraction mechanisms, and equations/pattern- matching—which are critical components of all modern functional languages and are best discussed as independent topics. Section 3 is devoted to these important ideas.

Finally, in Section 4 I will discuss more advanced ideas, and outline some critical research areas. Then to round out the paper, in Section 5 I will try to put into perspec- tive some of the limitations of functional languages by examining some of the myths that have accompanied their development.

**2** **The Evolution of Functional Languages**

**2.1** **The Lambda Calculus**

The development of functional languages has been inﬂuenced from time to time by
many sources, but none is as paramount nor as fundamental as the work of Alonzo
Church on the*lambda calculus*[Chu33, Chu41]. Indeed the lambda calculus is usually
regarded as the ﬁrst functional language, although it was certainly not thought of as
programming language at the time, given that there were no computers to run the
programs on! In any case, modern functional languages can be thought of as (non-
trivial) embellishments of the lambda calculus.

It is often thought that the lambda calculus also formed the foundation for Lisp, but this in fact appears not to be the case [McC78]. The impact of the lambda calculus on early Lisp development was minimal, and it has only been very recently that Lisp has begun to evolve more toward lambda calculus ideals. On the other hand, Lisp had a signiﬁcant impact on the subsequent development of functional languages, as will be discussed in Section 2.2.

Church’s work was motivated by the desire to create a*calculus*(informally, a syntax
for terms and set of rewrite rules for transforming terms) that captured one’s intuition
about the behavior of*functions. This approach is counter to the consideration of func-*
tions as, for example, *sets* (more precisely, sets of argument/value pairs), since the
intent was to capture the *computational*aspects of functions. A calculus is a formal
way for doing just that.

Church’s*lambda calculus*was the ﬁrst suitable treatment of the computational as-
pects of functions. Its type-free nature yielded a particularly small and simple calculus,
and it had one very interesting property, capturing functions in their fullest generality:

functions could be applied to themselves. In most reasonable theories of functions as
sets, this is impossible, since it requires the notion of a set containing itself, result-
ing in well-known paradoxes. This ability of self-application is what gives the lambda
calculus its power, since it allows one to gain the eﬀect of recursion without explic-
itly writing a recursive deﬁnition. Despite this powerful ability, the lambda calculus is
*consistent*as a mathematical system—no contradictions or paradoxes arise.

Because of the relative importance of the lambda calculus to the development of functional languages, I will describe it in some detail in the remainder of this section, using modern notational conventions.

**2.1.1** **Pure Untyped Lambda Calculus**

The abstract syntax of the*pure untyped lambda calculus*(a name chosen to distinguish
it from other versions developed later) embodies what are called *lambda expressions,*

deﬁned by:^{5}

*x* *∈* *Id* Identiﬁers

*e* *∈* *Exp* Lambda Expressions

where*e*::*=x* *|e*1*e*2 *|λx.e*

Expressions of the form *λx.e* are called*abstractions, and of the form* *(e*1 *e*2*)* are
called *applications. It is the former that captures the notion of a function, and the*
latter that captures the notion of application of a function. By convention, application
is assume to be left associative, so that*(e*1 *e*2 *e*3*)*is the same as*((e*1*e*2*) e*3*).*

The rewrite rules of the lambda calculus depend on the notion of*substitution*of an
expression*e*1 for all “free” occurrences of an identiﬁer *x* in an expression *e*2, which
we write as *[e*_{1}*/x]e*_{2}.^{6} Most systems, including both the lambda calculus and pred-
icate calculus, that utilize substitution on identiﬁers must be careful to avoid name
conﬂicts. Thus, although the intuition behind substitution is strong, its formal deﬁni-
tion can be somewhat tedious—the reader who is comfortable with his/her intuition
about substitution may skip over the next paragraph, which is included primarily for
completeness.

To understand substitution we must ﬁrst understand the notion of the*free variables*
of an expression*e, which we write asfv(e), and deﬁne by the following simple rules:*

*fv(x)* *= {x}*

*fv(e*_{1} *e*_{2}*)* *=* *fv(e*_{1}*)∪fv(e*_{2}*)*
*fv(λx.e)* *=* *fv(e)− {x}*

We say that *x* *is free in* *e* iﬀ *x* *∈* *fv(e). The substitution* *[e*1*/x]e*2 is then deﬁned
inductively by:

*[e/x*_{i}*]x*_{j}*=*

*e,* if*i=j*
*x*_{j}*,* if*i=j*

*[e*1*/x](e*2 *e*3*)* *=* *([e*1*/x]e*2*) ([e*1*/x]e*3*)*

*[e*_{1}*/x*_{i}*](λx*_{j}*.e*_{2}*)* *=*

*λx*_{j}*.e*2*,* if*i=j*

*λx*_{j}*.[e*_{1}*/x*_{i}*]e*_{2}*,* if*i=j*and*x*_{j}*∈fv(e*_{1}*)*

*λx**k**.[e*1*/x**i**]([x**k**/x**j**]e*2*),* otherwise, where*k=i, k=j,*
and*x*_{k}*∈fv(e*1*)∪fv(e*2*)*

The last rule is the subtle one, since it is where a name conﬂict could occur, and is resolved by making a name change. The following example demonstrates application

5The notation “d *∈* *D” means that**d* is a “typical element” of the set*D, whose elements may be*
distinguished by subscripting. In the case of identiﬁers, we assume that each*x**i*is unique; i.e.,*x**i**=**x**j*if
*i**=**j. The notation “d*::*=**alt1**|**alt2**| · · · |**altn” is standard BNF syntax.*

6In denotational semantics the notation*e[v/x]*is used to denote the function *e** ^{}*that is just like

*e*except that

*e*

^{}*x*

*=*

*v. Our notation of placing the brackets in front of the expression is to emphasize that*

*[v/x]e*is a

*syntactic*transformation on the expression

*e*itself.

of all three rules:

*[y/x]((λy.x)(λx.x)x)* *≡* *(λz.y)(λx.x)y*

To complete the lambda calculus we deﬁne three simple rewrite rules on lambda expressions:

1. *α-conversion (“renaming”):* *λx*_{i}*.e⇔λx*_{j}*.[x*_{j}*/x*_{i}*]e, where* *x*_{j}*∈fv(e).*

2. *β-conversion (“application”):* *(λx.e*1*)e*2*⇔[e*2*/x]e*1.
3. *η-conversion:* *λx.(e x)⇔e, ifx∈fv(e).*

These rules, together with the standard equivalence relation rules for reﬂexivity, sym-
metricity, and transitivity, induce a theory of *convertibility* on the lambda calculus,
which can be shown to be consistent as a mathematical system.^{7} The well-known
Church-Rosser Theorem [CR36] (actually two theorems) is what embodies the strongest
form of consistency, and has to do with a notion of *reduction, which is the same as*
convertibility but restricted so that*β-conversion andη-conversion only happen in one*
direction:

1. *β-reduction:* *(λx.e*_{1}*)e*_{2}*⇒[e*_{2}*/x]e*_{1}.
2. *η-reduction:* *λx.(e x)⇒e, ifx∈fv(e).*

We write “e1^{∗}*⇒* *e*2” if *e*2 can be derived from zero or more *β- or* *η-reductions or* *α-*
conversions; in other words^{∗}*⇒* is the reﬂexive, transitive closure of *⇒*including *α-*
conversions. Similarly,*⇔** ^{∗}* is the reﬂexive, transitive closure of

*⇔*. In summary,

^{∗}*⇒*cap- tures the notion of “reducibility,” and

*⇔*

*captures the notion of “intra-convertibility.”*

^{∗}**Deﬁnition:** A lambda expression is in*normal form*if it cannot be further reduced using
*β- orη-reduction.*

Note that some lambda expressions have*no*normal form, such as:

*(λx. (x x)) (λx. (x x))*

where the only possible reduction leads to an identical term, and thus the reduction process is non-terminating.

Nevertheless, the normal form appears to be an attractive canonical form for a term,
has a clear sense of “ﬁnality” in a computational sense, and is what we intuitively think
of as the*value*of an expression. Obviously we would like for that value to be unique,
and we would like to be able to ﬁnd it whenever it exists. The Church-Rosser Theorems
give us positive results for both of these desires.

7The lambda calculus as we have deﬁned it here is what Barendregt [Bar84] calls the*λKη-calculus,*
and is slightly more general than Church’s original*λK-calculus (which did not include* *η-conversion).*

Furthermore, Church originally showed the consistency of the*λI-calculus [Chu41], an even smaller subset*
(it only allowed abstraction of*x*from*e*if*x*was free in*e). We will ignore the subtle diﬀerences between*
these calculi here—our version is the one most often discussed in the literature on functional languages.

**2.1.2** **The Church-Rosser Theorems**

**Church-Rosser Theorem I:** *If* *e*0 *⇔*^{∗}*e*1 *then there exists an* *e*2 *such that* *e*0^{∗}*⇒* *e*2 *and*
*e*_{1}^{∗}*⇒* *e*_{2}*.*^{8}

In other words, if *e*0 and *e*1 are intra-convertible, then there exists a third term
(possible the same as*e*_{0} or*e*_{1}) to which they can both be reduced.

**Corollary:** *No lambda expression can be converted to two distinct normal forms (ignor-*
*ing diﬀerence due toα-conversion).*

One consequence of this result is that how we arive at the normal form does not matter; i.e. the order of evaluation is irrelevant (this has important consequences for parallel evaluation strategies). The question then arises as to whether or not it is always possible to ﬁnd the normal form (assuming it exists). We begin with some deﬁnitions.

**Deﬁnition:**A*normal-order reduction*is a sequential reduction in which, whenever there
is more than one reducible expression (called a*redex), the left-most one is chosen ﬁrst.*

In contrast, an*applicative-order reduction*is a sequential reduction in which the left-
most*innermost*redex is chosen ﬁrst.

**Church-Rosser Theorem II:** *If* *e*_{0}^{∗}*⇒* *e*_{1} *and* *e*_{1} *is in normal form, then there exists a*
*normal-order reduction frome*0*toe*1*.*

This is a very satisfying result; it says that if a normal form exists, we can always ﬁnd it; i.e. just use normal-order reduction. To see why applicative-order reduction is not always adequate, consider the following example:

applicative-order: normal-order:

*(λx. y) ((λx. x x) (λx. x x))* *(λx. y) ((λx. x x) (λx. x x))*

*⇒* *(λx. y) ((λx. x x) (λx. x x))* *⇒* *y*

*⇒* ...

We will return to the trade-oﬀs between normal- and applicative-order reduction in Section 3.2. For now we simply note that the strongest completeness and consistency results have been achieved with normal-order reduction.

In actuality, one of Church’s (and others’) motivations for developing the lambda calculus in the ﬁrst place was to form a foundation for all of mathematics (in the way that, for example, set theory is claimed to provide such a foundation). Unfortunately, all attempts to extend the lambda calculus suﬃciently to form such a foundation failed to yield a consistent theory. Church’s original extended system was shown inconsistent by the Kleene-Rosser Paradox [KR35]; a simpler inconsistency proof is embodied in what is known as the Curry Paradox [Ros82]. The only consistent systems that have

8Church and Rosser’s original proofs of their theorems are rather long, and many have tried to improve on them since. The shortest proof I am aware of for the ﬁrst theorem is fairly recent, and aptly due to Rosser [Ros82].

been derived from the lambda calculus are much too weak to claim as a foundation for mathematics, and the problem remains open today.

These inconsistencies, although disappointing in a foundational sense, did not slow down research on the lambda calculus, which turned out to be quite a nice model of functions and of computation in general. The Church-Rosser Theorem was an ex- tremely powerful consistency result for a computation model, and in fact rewrite sys- tems completely diﬀerent from the lambda calculus are often described as “possessing the Church-Rosser property” or even anthropomorphically as being “Church-Rosser.”

**2.1.3** **Recursion,***λ-Deﬁnability, and Church’s Thesis*

Another nice property of the lambda calculus is embodied in the following theorem:

**Fixpoint Theorem:** *Every lambda expressione* *has a ﬁxpointe*^{}*such that(e e*^{}*)⇔*^{∗}*e*^{}*.*
**Proof:** Take*e** ^{}*to be

*(Y e)*where

*Y*, known as the

*Y combinator, is deﬁned by:*

*Y* *≡λf .(λx.f (x x))(λx.f (x x))*
Then we have:

*(Y e)* *=* *(λx.e(x x))(λx.e(x x))*

*=* *e((λx.e(x x))(λx.e(x x)))*

*=* *e(Y e)*

This surprising theorem (and equally surprising simple proof) is what has earned
*Y* the name “paradoxical combinator.” The theorem is quite signiﬁcant—it means that
any recursive function may be written non-recursively (and non-imperatively!). To see
how, consider a recursive function*f* deﬁned by:

*f* *≡ · · ·f· · ·*
This could be rewritten as:

*f* *≡* *(λf .* *· · ·f· · ·)f*

where note that the inner occurrence of*f* is now bound. This equation essentially says
that*f* is a ﬁxpoint of the lambda expression*(λf .* *· · ·f· · ·)—but that is exactly what*
*Y* computes for us, so we arrive at the following*non-recursive*deﬁnition for*f*:

*f* *≡* *Y (λf .* *· · ·f* *· · ·)*
As a concrete example, the factorial function:

*fac* *≡* *λn.if(n=*0)*then*1*else(n∗fac(n−*1))
can be written non-recursively as:

*fac* *≡* *Y(λfac. λn.if* *(n=0)then 1 else(n∗fac(n−1)))*

The ability of the lambda calculus to “simulate recursion” in this way is the key to its power, and accounts for its persistence as a useful model of computation. Church recognized this, and is perhaps best expressed in his now famous thesis:

**Church’s Thesis:** *Eﬀectively computable functions from positive integers to positive in-*
*tegers are just those deﬁnable in the lambda calculus.*

This is quite a strong claim! Although the notion of “functions from positive inte-
gers to positive integers” can be formalized very precisely, the notion of “eﬀectively
computable” cannot; thus no proof can be given for the thesis. However, it gained
support from Kleene who in 1936 [Kle36] showed that “λ-deﬁnability” was precisely
equivalent to Gödel and Herbrand’s notions of “recursiveness.” Meanwhile, Turing
had been working on his now famous *Turing Machine* [Tur36], and in 1937 [Tur37]

he showed that “Turing computability” was also precisely equivalent to*λ-deﬁnability.*

These were quite satisfying results.^{9}

The lambda calculus and the Turing machine were to have profound impacts on
programming languages and computational complexity,^{10} respectively, and computer
science in general. This inﬂuence was probably much greater than Church or Turing
could have imagined, which is perhaps not surprising given that computers did not
even exist yet!

In parallel with the development of the lambda calculus, Schönﬁnkel and Curry
were busy founding*combinatory logic. It was Schönﬁnkel [Sch24] who discovered the*
surprising result that any function could be expressed as the composition of only two
simple functions, K and S. Curry proved the consistency of a pure combinatory calculus
[Cur30], and with Feys elaborated the theory considerably [CF58]. Although this work
deserves as much attention from a logician’s point of view as the lambda calculus, and
in fact its origins predate that of the lambda calculus, we will not pursue it further
here, since it did not contribute directly to the development of functional languages
in the way that the lambda calculus did. On the other hand, the combinatory calculus
was to eventually play a surprising role in the*implementation*of functional languages,
beginning with [Tur79] and summarized in [PJ87, Chapter 16].

Another noteworthy attribute of the lambda calculus is its restriction to functions
of one argument. That it suﬃces to consider only such functions was ﬁrst suggested by
Frege in 1893 [vH67] and independently by Schönﬁnkel in 1924 [Sch24]. This restriction
was later exploited by Curry [CF58], who used the notation*(f x y)*to denote*((f x) y),*
and which previously would have been written *f (x, y). This notation has become*
known as “currying,” and *f* is said to be a “curried function.” As we will see, the

9Much later Post [Pos43] and Markov [Mar51] proposed two other formal notions of eﬀective com-
putability, but these also were shown to be equivalent to*λ-deﬁnability.*

10Although the lambda calculus and the notion of*λ-deﬁnability pre-dated the Turing Machine, com-*
plexity theorists latched onto the Turing Machine as their fundamental measure of decidability. This
is probably because of the appeal of the Turing Machine as a*machine, giving it more credibility in the*
emerging arena of electronic digital computers. See [Tra88] for an interesting discussion of this issue.

notion of currying has carried over today as a distinguishing syntactic characteristic of modern functional languages.

There are several variations and embellishments of the lambda calculus that his- torically could be mentioned here, but instead of doing that I will postpone their dis- cussion until the point at which functional languages exhibited similar characteristics.

In this way we can see more clearly the relationship between the lambda calculus and functional languages.

**2.2** **Lisp**

A discussion of the history of functional languages would certainly be remiss if it did not include a discussion of Lisp, beginning with McCarthy’s seminal work in the late 1950’s.

Although lambda calculus is often considered as the foundation of Lisp, by Mc- Carthy’s own account [McC78] the lambda calculus actually played a rather small role.

Its main impact came through McCarthy’s desire to represent functions “anonymously,”

and Church’s *λ-notation was what he chose: a lambda abstraction written* *λx.e* in
lambda calculus would be written(lambda (x) e)in Lisp.

Beyond that, the similarity wanes. For example, rather than use the *Y* combinator
to express recursion, McCarthy invented the *conditional expression,*^{11} with which re-
cursive functions could be deﬁned explicitly (and, arguably, more intuitively). As an
example, the non-recursive factorial function given in the lambda calculus in the last
section would be written recursively in Lisp in the following way:

(define fac (n) (if (= n 0)

1

(* n (fac (- n 1))) ))

This and other ideas were described in two landmark papers in the early 60’s [McC60, McC63], papers that inspired work on Lisp for many years to come.

McCarthy’s original motivation for developing Lisp was the desire for an algebraic list processing language for use in artiﬁcial intelligence research. Although symbolic processing was a fairly radical idea at the time, his aims were quite pragmatic. One of the earliest attempts at designing such a language was suggested by McCarthy, and resulted in FLPL (for Fortran-compiled List Processing Language), implemented in 1958 on top of the Fortran system on the IBM 704 [GHG60]. Over the next few years McCarthy designed, reﬁned, and implemented Lisp. His chief contributions during this period were:

11The conditional in Fortran (essentially the only other programming language in existence at the time)
was a*statement, not an expression, and was for**control, not value-deﬁning, purposes.*

1. The conditional expression and its use in writing recursive functions.

2. The use of lists and higher-order operations over lists such asmapcar.

3. The central idea of a “cons cell,” and the use of garbage collection as a method of reclaiming unused cells.

4. The use of S-expressions (and abstract syntax in general) to represent both pro-
gram and data.^{12}

All four of these features are essential ingredients of any Lisp implementation today, and the ﬁrst three are essential to functional language implementations as well.

A simple example of a typical Lisp deﬁnition is the following one formapcar:

(define mapcar (fun lst) (if (null lst)

nil

(cons (fun (car lst)) (mapcar fun (cdr lst))) )) This example demonstrates all of the points mentioned above. Note that the function fun is passed as an argument to mapcar. Although such higher-order programming was very well known in lambda calculus circles, it was certainly a radical departure from Fortran, and has become one of the most important programming techniques in Lisp and functional programming (higher-order functions are discussed more in Section 3.1). The primitive functionscons,car,cdr, andnull are the well-known operations on lists whose names are still used today. conscreates a new list cell without burdoning the user with explicit storage management; similarly, once that cell is no longer needed a “garbage collector” will come along and reclaim it, again without user involvement.

For example, sincemapcarconstructs a new list from an old one, in the call:

(mapcar fun (cons a (cons b nil)))

the list(cons a (cons b nil))will become garbage after the call and will automat- ically be reclaimed. Lists were to become the paradigmatic data structure in Lisp and early functional languages.

The deﬁnition of mapcar in a modern functional language such as Haskell would appear similarly, except that pattern-matching would be used to “destructure” the list:

mapcar fun [] = []

mapcar fun (x:xs) = fun x : mapcar fun xs

12Interestingly, McCarthy claims that it was the read and print routines that inﬂuenced this notation most! [McC78]

[[]is the null list, and: is the inﬁx operator forcons; also note that function appli- cation has higher precedence than any inﬁx operator.]

McCarthy was also interested in designing a practical language, and thus Lisp had many pragmatic features—in particular, sequencing, the assignment statement, and other primitives that induced side-eﬀects on the store. Their presence undoubtedly had much to do with early experience with Fortran. Nevertheless, McCarthy emphasized the “mathematical elegance” of Lisp in his early papers, and in a much later paper his student Cartwright demonstrated the ease with which one could prove properties about pure Lisp programs [Car76].

Despite its impurities, Lisp had a great inﬂuence on functional language develop-
ment, and it is encouraging to note that modern Lisps (especially Scheme) have returned
more to the purity of the lambda calculus rather than the*ad hocery that plagued the*
Maclisp era. This return to purity includes the “ﬁrst-class” treatment of functions and
the lexical scoping of identiﬁers. Furthermore, the preferred modern style of Lisp pro-
gramming, such as espoused by Abelson and Sussman [ASS85], can be characterized
as being predominantly side-eﬀect free. And ﬁnally, we point out that Henderson’s
Lispkit Lisp [Hen80] is a purely functional version of Lisp that uses an inﬁx, algebraic
syntax.

**2.2.1** **Lisp in Retrospect**

Before continuing our historical development it is helpful to consider some of the design decisions McCarthy made and how they would be formalized in terms of the lambda calculus. It may seem that conditional expressions, for example, are an obvious feature to have in a language, but I think that only reﬂects our familiarity with mod- ern high-level programming languages, most of which have them. In fact the lambda calculus version of the factorial example given in the previous section used a condi- tional (not to mention arithmetic operators) yet most readers probably understood it perfectly, and did not object to my departure from precise lambda calculus syntax!

The eﬀect of conditional expressions can in fact be achieved in the lambda calculus by encoding the true and false values as functions, as well as deﬁning a function to emulate the conditional:

*true* *≡* *λx.λy.x*
*false* *≡* *λx.λy.y*

*cond* *≡* *λp.λc.λa.(p c a)*

In other words,*(condp c a)* *≡* *(ifpthenc* *elsea). One can then deﬁne, for example,*
the factorial function by:

*fac* *≡* *λn.cond(=n*0)1*(∗n (fac(−n*1)))

where*=*is deﬁned by:

*(=n n)* *⇒* *true*

*(=n m)* *⇒* *false,* if*m=n*

where*m*and*n*range over the set of integer constants. However, I am still cheating a
bit by not explaining the nature of the objects*−*,*∗*, 0, 1, etc. in pure lambda calculus
terms. It turns out that they can be represented in a variety of ways, essentially using
functions to simulate the proper behavior, just as for true, false, and the conditional
(for the details, see [Chu41]). In fact*any*conventional data or control structure can be

“simulated” in the lambda calculus; if this were not the case, it would be diﬃcult to believe Church’s Thesis!

However, even if McCarthy knew of these ways to express things in the lambda
calculus (there is reason to believe that he did not), eﬃciency concerns might have
rapidly led him to consider other alternatives anyway, especially since Fortran was the
only high-level programming language anyone had any experience with. In particular,
Fortran functions evaluated their arguments*before*entering the body of the function,
resulting in what is often called a*strict, orcall-by-value, evaluation policy, correspond-*
ing roughly to applicative-order reduction in the lambda calculus. With this strategy
extended to the primitives, including the conditional, one cannot easily deﬁne recur-
sive functions. For example, in the above deﬁnition of factorial all three arguments to
*cond*would be evaluated, including*fac(−n*1), resulting in non-termination.

*Non-strict* evaluation, corresponding to the normal-order reduction that is essen-
tial to the lambda calculus in realizing recursion, was not very well understood at
the time—it was not at all clear how to implement it eﬃciently on a conventional von
Neumann computer—and we would have to wait another 20 years or so before such
an implementation was even attempted.^{13} The conditional expression essentially al-
lowed one to*selectively*invoke normal-order, or non-strict, evaluation. Stated another
way, McCarthy’s conditional, although an expression, was compiled into code that es-
sentially controlled the “reduction” process. Most imperative programming languages
today that allow recursion do just that, and thus even though such languages are of-
ten referred to as strict, they all rely critically on at least one non-strict construct: the
conditional.

**2.2.2** **The Lambda Calculus With Constants**

The conditional expression is actually only one example of very many primitive func-
tions that were included in Lisp. Rather than explain them in terms of the lambda cal-
culus by a suitable encoding (i.e. compilation), it is perhaps better to formally extend
the lambda calculus by adding a set of*constants* along with a set of what are usually

13On the other hand, the*call-by-name*evaluation strategy invented in Algol had very much of a normal-
order reduction ﬂavor. See [Weg68, Wad71] for early discussions of these issues.

called*δ-rules*which state relationships between constants and which eﬀectively extend
the basis set of *α-,* *β-, and* *η-reduction rules. For example, the reduction rules for =*
that I gave earlier (and which are repeated below) are*δ-rules. This new calculus, often*
called the*lambda calculus with constants, can be given a precise abstract syntax:*

*x* *∈* *Id* Identiﬁers

*c* *∈* *Con* Constants

*e* *∈* *Exp* Lambda Expressions

where*e*::*=x* *|c* *|e*1*e*2*|λx.e*
for which various*δ-rules apply, such as the following:*

*(=* 0 0) *⇒* *True*
*(=* 0 1) *⇒* *False*

*· · ·*
*(+*0 0) *⇒* 0
*(+*0 1) *⇒* 1

*· · ·*
*(+*27 32) *⇒* 59

*· · ·*
*(If Truee*1*e*2*)* *⇒* *e*1

*(If Falsee*1*e*2*)* *⇒* *e*2

*· · ·*
*(Car (Cons e*1*e*2*))* *⇒* *e*1

*(Cdr (Cons e*1*e*2*))* *⇒* *e*2

*· · ·*

where =, +, 0, 1,*If,True,False,Cons, etc. are elements ofCon.*

The above rules can be shown to be a*conservative extension*of the lambda calculus,
a technical term that in our context essentially means that convertible terms in the
original system are still convertible in the new, and (perhaps more importantly) incon-
vertible terms in the original system are still inconvertible in the new. In general, care
must be taken when introducing*δ-rules, since all kinds of inconsistencies could arise.*

For a quick and dirty example of inconsistency, we might deﬁne a primitive function
over integers called*broken*with the following*δ-rules:*

*(broken*0) *⇒* 0
*(broken*0) *⇒* 1

which immediately implies that more than one normal form exists for some terms, violating the ﬁrst Church-Rosser theorem (see Section 2.1)!

As a more subtle example, suppose we deﬁne the logical relation*Or*by:

*(Or Truee)* *⇒* *True*
*(OreTrue)* *⇒* *True*
*(Or False False)* *⇒* *False*

Even though these rules form a conservative extension of the lambda calculus, a com-
mitment to evaluate either of the arguments may lead to non-termination, even though
the other argument may reduce to*True. In fact it can be shown that with the above rules*
there does not exist a deterministic sequential reduction strategy that will guarantee
that the normal form*True*will be found for all terms having such a normal form, and
thus the second Church-Rosser property is violated. This version of*Or*is often called
the “parallel or,” since a parallel reduction strategy is needed to implement it properly
(and with which the ﬁrst Church-Rosser Theorem will at least hold). Alternatively, we
could deﬁne a “sequential or” by:

*(Or Truee)* *⇒* *True*
*(Or Falsee)* *⇒* *e*

which can be shown to satisfy both Church-Rosser theorems.

**2.3** **Iswim**

Historically speaking, Peter Landin’s work in the mid 60’s was the next signiﬁcant impe- tus to the functional programming paradigm. Landin’s work was deeply inﬂuenced by that of Curry and Church. His early papers discussed the relationship between lambda calculus and both machines and high-level languages (speciﬁcally Algol 60). In [Lan64]

Landin discussed how one could mechanize the evaluation of expressions, through an abstract machine called the SECD machine; and in [Lan65] he formally deﬁned a non-trivial subset of Algol 60 in terms of the lambda calculus.

It is apparent from his work that Landin regarded highly the expressiveness and purity of the lambda calculus, and at the same time recognized its austerity. Undoubt- edly as a result of this work, in 1966 Landin introduced a language (actually a family of languages) called Iswim (for “If you See What I Mean”), which included a number of signiﬁcant syntactic and semantics ideas [Lan66]. Iswim, according to Landin, “can be looked on as an attempt to deliver Lisp from its eponymous committment to lists, its reputation for hand-to-mouth storage allocation, the hardware dependent ﬂavor of its pedagogy, its heavy bracketing, and its compromises with tradition.” When all is said and done, I think the primary contributions of Iswim, with respect to the development of functional languages, are the following:

1. Syntactic innovations:

(a) The abandonment of preﬁx syntax in favor of inﬁx.

(b) The introduction of*let*and*where*clauses, including a notion of simultaneous
and mutually recursive deﬁnitions.

(c) The use of an “oﬀ-side rule” based on *indentation* rather than separators
(such as commas or semicolons) to scope declarations and expressions. For
example (using Haskell syntax), the program fragment:

e where f x = x a b = 1

cannot be confused with:

e where f x = x a b = 1

and is equivalent to what might otherwise be written as:

e where { f x = x;

a b = 1 }

2. Semantic innovations:

(a) An emphasis on generality. Landin was half-way serious in hoping that the Iswim family could serve as the “next 700 programming languages.” Central to his strategy was the idea of deﬁning a syntactically rich language in terms of a very small but expressive core language.

(b) An emphasis on equational reasoning (i.e. the ability to replace equals with equals). This elusive idea was backed up with four sets of rules for rea- soning about expressions, declarations, primitives, and “problem-oriented”

extensions.

(c) The SECD machine as a simple abstract machine for executing functional programs.

We can think of Landin’s work as extending the “lambda calculus with constants”

deﬁned in the last section so as to include more primitives, each with its own set
of*δ-rules, but more importantly* *let* and *where*clauses, for which it is convenient to
introduce the syntactic category of*declarations:*

*e* *∈* *Exp* Expressions

where*e* ::*= · · · |e* *whered*1 *· · ·* *d**n*

*|letd*1 *· · ·* *d**n* *ine*

*d* *∈* *Decl* Declarations

where*d* ::*=* *x=e*

*|* *x x*1 *· · ·* *x**n**=e*

and for which we then need to add some axioms (i.e. reduction rules) to capture the desired semantics. Landin proposed special constructs for deﬁning simultaneous and mutually recursive deﬁnitions, but we will take a simpler and more general approach

here: We assume that a block of declarations*d*1*· · ·d**n* always is potentially mutually
recursive—if it isn’t, our rules still work:

*(letd*1 *· · ·* *d**n**ine)* *⇒* *(ewhered*1 *· · ·* *d**n**)*
*(x x*_{1} *· · ·* *x*_{n}*=e)* *⇒* *(x=λx*_{1}*.λx*_{2}*.· · ·λx*_{n}*. e)*

*(ewherex*1*=e*1*)* *⇒* *(λx*1*.e)(Y λx*1*.e*1*)*

*(e* *where(x*1*=e*1*)· · ·(x**n**=e**n**))* *⇒* *(λx*1*.e)(Y λx*1*.e*1*)where* *x*2*=(λx*1*.e*2*)(Y λx*1*.e*1*)*

*· · ·*

*x**n**=(λx*1*.e**n**)(Y λx*1*.e*1*)*
These rules are semantically equivalent to Landin’s, but they avoid the need for a tu-
pling operator to handle mutual recursion, and they use the Y combinator (deﬁned in
Section 2.1) instead of an iterative “unfolding” step.

We will call this resulting system the *recursive lambda calculus with constants, or*
just*recursive lambda calculus.*

Landin’s emphasis on expressing*what* the desired result is, as opposed to saying
*how*to get it, and his claim that Iswim’s declarative^{14} style of programming was bet-
ter than the incremental and sequential imperative style, were ideas to be echoed by
functional programming advocates to this day. On the other hand, it took another ten
years before interest in functional languages was to be substantially renewed. One
of the reasons is that there were no decent implementations of Iswim-like languages
around; this reason, in fact, was to persist into the 80’s.

**2.4** **APL**

Iverson’s APL [Ive62], although not a purely functional programming language, is worth
mentioning because its functional subset is an example of how one could achieve func-
tional programming without relying on lambda expressions. In fact Iverson’s design of
APL was motivated out of his desire to develop an*algebraic*programming language for
arrays, and his original work used an essentially functional notation. Subsequent de-
velopment of APL resulted in several imperative features, but the underlying principles
should not be overlooked.

APL was also unique in its goal of succinctness, in that it used a specially designed
alphabet to represent programs—each letter corresponding to one operator. That APL
became popular is apparent in the fact that many keyboards, both for typewriters and
computer terminals, carried the APL alphabet. Backus’ FP, which came after APL, was
certainly inﬂuenced by the APL philosophy, and its abbreviated publication form also
used a specialized alphabet (see the example in Section 2.5). In fact FP has much in
common with APL, the primary diﬀerence being that FP’s fundamental data structure
is the*sequence, whereas in APL it is thearray.*

14Landin actually disliked the term “declarative,” preferring instead “denotative.”

It is worth noting that recent work on APL has revived some of APL’s purely func-
tional foundations. The most notable work is that of Tu [Tu86, TP86], who designed a
language called FAC, for Functional Array Calculator (presumably a take-oﬀ on Turner’s
KRC, Kent Recursive Calculator). FAC is a purely functional language that adopts most
of the APL syntax and programming principles, but in addition has special features to
allow programming with*inﬁnite*arrays; naturally, lazy evaluation plays a major role in
the semantics of the language. Another interesting approach is that of Mullin [Mul88].

**2.5** **FP**

Backus’ FP was one of the earliest functional languages to receive wide-spread atten- tion. Although most of the features in FP are not found in today’s modern functional languages, Backus’ Turing Award lecture [Bac78] in 1978 was one of the most inﬂuen- tial and now most-often cited papers extolling the functional programming paradigm.

It not only stated quite eloquently why functional programming was “good,” but also
quite vehemently why traditional imperative programming was “bad.”^{15} Backus’ coined
the term “word-at-a-time programming” to capture the essence of imperative languages,
showed how such languages were inextricably tied to the “von Neumann machine,” and
gave a convincing argument why such languages were not going to meet the demands
of modern software development. That this argument was being made by the person
who is given the most credit for designing Fortran, and who also had signiﬁcant inﬂu-
ence on the development of Algol, led substantial weight to the “functional thesis.” The
exposure given to Backus’ paper was one of the best things that could have happened
to the ﬁeld of functional programming, which at the time was certainly not considered
main-stream.

Despite the great impetus Backus’ paper gave to functional programming, it is inter- esting to note that in the same paper Backus also said that languages based on lambda calculus would have problems, both in implementation and expressiveness, because the model was not suitably history-sensitive (more speciﬁcally, it did not handle large data structures, such as databases, very easily). With regard to implementation this argument is certainly understandable, because it was not clear how to implement the notion of substitution in an eﬃcient manner, nor was it clear how to structure data in such ways that large data structures could be implemented eﬃciently (both of these issues are much better understood today). And with regard to expressiveness, well, that argument is still a matter of debate today. In any case, these problems were the motivation for Backus’ Applicative State Transition (AST) Systems, in which state is introduced as something on which purely functional programs interact with in a more traditional (i.e. imperative) way.

Perhaps more surprising, and an aspect of the paper that is usually overlooked, Backus had this to say about “lambda-calculus based systems:”

15Ironically, the Turing Award was given to Backus in a large part because of his work on Fortran!

“An FP system is founded on the use of a ﬁxed set of combining forms called
functional forms.*. . .*In contrast, a lambda-calculus based system is founded
on the use of the lambda expression, with an associated set of substitution
rules for variables, for building new functions. The lambda expression (with
its substitution rules) is capable of deﬁning all possible computable func-
tions of all possible types and of any number of arguments. This freedom
and power has its disadvantages as well as its obvious advantages. It is
analogous to the power of unrestricted control statements in conventional
languages: with unrestricted freedom comes chaos. If one constantly in-
vents new combining forms to suit the occasion, as one can in the lambda
calculus, one will not become familiar with the style or useful properties of
the few combining forms that are adequate for all purposes.”

Backus’ argument, of course, was in the context of garnering support for FP, which had a small set of combining forms that were claimed to be suﬃcient for most pro- gramming applications. One of the advantages of this approach is that each of these combining forms could be named with particular brevity, and thus programs become quite compact—this was exactly the approach taken by Iverson in designing APL (see Section 2.4). For example, an FP program for inner product looks like:

Def*IP* *≡(/+)◦(α×)◦Trans*

where*/,* *◦*, and *α*are combining forms called “insert,” “compose,” and “apply-to-all,”

respectively. In a modern functional language such as Haskell this would be written with slightly more verbosity as:

ip l1 l2 = foldl (+) 0 (map2 (*) l1 l2)

[In Haskell an inﬁx operator such as+ may be passed as an argument by surrounding
it in parentheses.] Here foldl is the equivalent of insert (/), and map2 is a two-list
equivalent of apply-to-all (α), thus eliminating the need for *Trans. These functions*
are pre-deﬁned in Haskell, as they are in most modern functional languages, for the
same reason that Backus argues—they are commonly used. If they were not they could
easily be deﬁned by the user. For example, we may wish to deﬁne an inﬁx composition
operator for functions, the ﬁrst of which is binary, as follows:

(f .o. g) x y = f (g x y)

[Note how inﬁx operators may be deﬁned in Haskell; operators are distinguished lexi- cally by their ﬁrst character being non-alphabetic.] With this we can reclaim much of FP’s succinctness in deﬁningip:

ip = foldl (+) 0 .o. map2 (*)

[Recall that in Haskell function application has higher precedence than inﬁx operator application.] It is for this reason, together with the fact that FP’s specialization pre- cluded the generality aﬀorded by user-deﬁned higher-order functions (which is all that combining forms are), that modern functional languages did not follow the FP style. As we shall soon see, certain other kinds of syntactic sugar became more popular instead (such as pattern-matching, list comprehensions, and sections).

Many extensions to FP have been proposed over the years, including the inclusion of strong typing and abstract datatypes [GHW81]. In much more recent work, Backus, Williams and Wimmers have designed the language FL [BWW86], which is strongly (although dynamically) typed, and in which higher-order functions and user-deﬁned datatypes are allowed. Its authors still emphasize the algebraic style of reasoning that is paramount in FP, although it is also given a denotational semantics which is provably consistent with respect to the algebraic semantics.

**2.6** **ML**

In the mid 70’s, at the same time Backus was working on FP at IBM, several research projects were underway in the United Kingdom that related to functional programming, most notably work at Edinburgh. There Gordon, Milner, and Wadsworth had been work- ing on a proof generating system called LCF for reasoning about recursive functions, in particular in the context of programming languages [GMW79]. The system consisted of a deductive calculus called PPλ (polymorphic predicate calculus) together with an interactive programming language called ML, for “meta-language” (since it served as the command language for LCF).

LCF is quite interesting as a proof system, but its authors soon found that ML was
also interesting in its own right, and they proceded to develop it as a stand-alone
functional programming language [GMM*78]. That it was, and still is, called a*functional*
language is actually somewhat misleading, since it has a notion of*references*which are
locations that can be stored into and read from, much as variables are assigned and
read. Its I/O system also induces side-eﬀects, and is not referentially transparent.

Nevertheless, the style of programming that it encourages is still functional, and that is the way it was promoted (the same is somewhat true for Scheme, although to a lesser extent).

More recently a standardization eﬀort for ML has taken place, in fact taking some
of the good ideas of Hope [BMS80] along with it (such as pattern-matching), yielding a
language now being called*Standard ML, or just SML [Mil84, Wik88].*

ML is a fairly complete language—certainly the most practical functional language
at the time it appeared—and SML is even more so. It has higher-order functions, a
simple I/O facility, a sophisticated module system, and even exceptions. But by far the
most signiﬁcant aspect of ML is its*type system, which is manifested in several ways:*

1. It is*strongly*and*statically*typed.

2. It uses*type inference*to determine the type of every expression, instead of relying
on explicit type declarations.

3. It allows*polymorphic* functions and data structures; that is, functions may take
arguments of arbitrary type if in fact the function does not depend on that type
(similarly for data structures).

4. It has user-deﬁned concrete and abstract datatypes (an idea actually borrowed from Hope, and not present in the initial design of ML).

ML was the ﬁrst language to utilize type inference as a semantically integrated compo- nent of the language, and at the same time its type system was richer than any previous statically typed language in that it permitted true polymorphism. It seemed that the best of two worlds had been achieved—not only is making explicit type declarations (a sometimes burdensome task) not required, but in addition a program that successfully passes the type inferencer is guaranteed not to have any type errors. Unfortunately, this idyllic picture is not completely accurate (although it is close!), as we shall soon see.

We shall next discuss the issue of types in a foundational sense, thus continuing our plan of elaborating the lambda calculus to match the language features being discussed.

This will lead us to a reasonable picture of ML’s *Hindley-Milner type system, the rich*
polymorphic type system that was mentioned above, and that was later adopted by
every other statically-typed functional language, including Miranda and Haskell. Aside
from the type system, the two most novel features in ML are its*references*and*modules,*
which are also covered in this section. Discussion of ML’s data abstraction facilities
will be postponed until Section 3.3.

**2.6.1** **The Hindley-Milner Type System**

We can introduce types into the pure lambda calculus by ﬁrst introducing a domain
of basic types, say *BasTyp, as well as a domain of derived types, say* *Typ, and then*
requiring that every expression be “tagged” with a member of *Typ, which we do by*
superscripting, as in “e* ^{τ}*.” The derived type “τ2

*→τ*1” denotes the type of all functions from values of type

*τ*2to values of type

*τ*1, and thus a proper application will have the form “(e

^{τ}_{1}

^{2}

^{→}

^{τ}^{1}

*e*

^{τ}_{2}

^{2}

*)*

^{τ}^{1}.” Modifying the pure lambda calculus in this way, we arrive at the