# Theory

We now turn our attention to theoretical foundations of programming languages and the meaning of code. These foundations are crucial to understanding how languages, programs, and their implementations work.

# Lambda Calculus¶

We start by examining lambda calculus, the mathematical foundation of functional programming, and use it to reason about how to construct abstractions and model computations. Its simplicity allows us to understand every detail about how it works, yet it is general enough to enable the expression of arbitrary computations.

Lambda calculus (also \(\lambda\)-calculus), introduced by Alonzo Church in the 1930s, is a model of computation based on functions. All functions in lambda calculus are anonymous, providing the inspiration for lambda expressions in modern programming languages.

Lambda calculus is composed of only three elements: variables,
function abstraction, and function application. *Function abstraction*
is the process of defining a new function through a lambda
(\(\lambda\)) expression. The following is a context-free grammar
for \(\lambda\)-calculus:

We will use individual letters, such as \(x\) to denote a variable. Function application is left associative and has higher precedence than abstraction, and we will use parentheses where necessary as a result of associativity and precedence. All functions have exactly one parameter, and functions that would otherwise have multiple parameters must be curried.

Since function application is left associative, a sequence of applications such as \(f~g~h\) is equivalent to \(((f~g)~h)\). And since function application has higher precedence than abstraction, abstraction extends as far to the right as possible. Consider the following example:

The \(\lambda x\) introduces a function abstraction, which extends as far right as possible:

Thus, this is equivalent to

Then within the parentheses, the \(\lambda y\) introduces a new abstraction, which now extends as far right as possible, to the point of the existing closing parenthesis:

Finally, within the body of the inner abstraction, we have a sequence of function applications, which are left associative:

Using the syntax of Scheme, the following is a representation of the function above:

```
(lambda (x)
(x (lambda (y)
((x y) z)
)
)
)
```

(This is merely for illustration. Function semantics are different between Scheme and \(\lambda\)-calculus, so using this syntax is not meant to imply an equivalence.)

The following is the identity function:

The function takes in an argument, binds it to the parameter \(x\), and immediately returns it.

Functions themselves are first-class values, so they can be bound to parameters and returned. The following is a function that discards its input and returns the identity function:

Since abstraction extends as far to the right as possible, this is equivalent to the following parenthesization:

As another example, the following function takes in another function as its argument and applies it to the identity function:

In \(\lambda\)-calculus, functions are statically scoped. The result is that in \(\lambda x .~ E\), \(x\) is bound in \(E\), and \(E\) is the scope of \(x\). If the same name is introduced multiple times within nested scopes, then use of the name resolves to the closest abstraction that introduced it. The following illustrates these rules:

The first abstraction introduces the name \(x\), so the scope of \(x\) is the body of the first abstraction. Thus, when \(x\) appears within the second abstraction, it resolves to the parameter of the first abstraction. The second abstraction itself introduces the name \(y\), so use of the name within its body resolves the the associated parameter. Finally, the third abstraction reintroduces the name \(x\), so \(x\) within its body resolves to the closest introduction, i.e. the parameter of the third abstraction.

An unbound variable is allowed to appear in an expression, and such a
variable is called *free*. For example, in \(\lambda y .~ x~ y\),
\(x\) is a free variable in the expression \(x~ y\) but
\(y\) is bound. In \(\lambda x .~ \lambda y .~ x~ y\), both
\(x\) and \(y\) are bound in the expression \(\lambda y
.~ x~ y\). Free variables are useful for reasoning about subexpressions
such as \(\lambda y.~ x~ y\) in isolation without needing to
consider the full context in which the subexpression appears.

In the expression \(\lambda x .~ E\), replacing all occurrences of
\(x\) with another variable \(y\) does not affect the meaning
as long as \(y\) does not occur in \(E\). For example,
\(\lambda y .~ y\) is an equivalent expression of the identity
function. This process of variable replacement is called
\(\alpha\)*-reduction*, and we denote this replacement process
as follows:

The expressions \(\lambda x .~
x\) and \(\lambda y .~ y\) are \(\alpha\)*-equivalent*, and
we denote this equivalence is follows:

In function application, \(\alpha\)-reduction is used to ensure that names are restricted to the appropriate scope. This translation has the same effect as environments in an interpreter. As an example, consider applying the identity function to itself:

First, we apply \(\alpha\)-reduction on the argument to ensure that variables in the argument are distinct from those in the function being applied:

We then replace each occurrence of the parameter with the argument expression in the body of the function being applied. The result is the body itself after this substitution process:

This argument-substituting procedure is called \(\beta\)*-reduction*, and it is similar to the call-by-name argument-passing
convention in programming languages. We denote \(\beta\)-reduction
as follows:

This expression is itself \(\alpha\)-equivalent to the identity
function, and the original expression \((\lambda x.~ x) (\lambda
x.~ x)\) is \(\beta\)*-equivalent* to the identity function since
it \(\beta\)-reduces to the same expression as the identity
function:

As a more complex example, consider the following:

In the first function application, the variable names are already distinct, so no \(\alpha\)-reduction is necessary. We can then apply \(\beta\)-reduction to obtain:

This results in another function application, where the function and argument do share variable names. Applying \(\alpha\)-reduction, we get:

This \(\beta\)-reduces to

Another \(\beta\)-reduction results in

This cannot \(\beta\)-reduce any further, so it is said to be in
*normal form*. The following denotes the full computation:

## Non-Terminating Computation¶

Evaluating an expression in \(\lambda\)-calculus applies \(\beta\)-reduction as long as possible, until the expression is in normal form. Not all evaluations terminate. Consider a function abstraction that applies an argument to itself:

If we apply this to the identity function, we get:

This evaluation terminates, and as expected, we obtain the identity function. Now consider what happens when we apply the original function to itself:

This evaluation never terminates, as reduction continues to produce an expression that is \(\alpha\)-equivalent to the original one.

## Normal-Order Evaluation¶

Function application in \(\lambda\)-calculus is similar to call by name in that the argument is not evaluated before the function is applied. Instead, the argument expression is substituted for the parameter directly in the body. This results in lazy evaluation, where the argument expression is not evaluated unless it is needed. As an example, consider the following:

The argument expression is a non-terminating computation, so if we were to evaluate it prior to substitution, the computation as a whole would not terminate. Instead, \(\lambda\)-calculus specifies that the substitution happens first:

Since the parameter \(y\) does not appear in the body, the argument expression is eliminated once the argument substitution is made. Thus, the computation terminates, and its end result is the identity function.

There is an important distinction between the evaluation process in
\(\lambda\)-calculus and call by name. In the former, function
bodies are reduced to normal form before the function is applied. This
is referred to as *normal-order evaluation*. By contrast, call by name
performs argument substitution before manipulating the body of the
function. The following illustrates normal-order evaluation:

Before the function on the left is applied, its body is reduced, which involves applying the function \(\lambda y.~ y~ y\) to its argument \(x\). This results in the expression \(x\), so the function on the left becomes \(\lambda x.~ x~ x\). This is in normal form, so the function can now be applied to its argument. Further \(\alpha\)- and \(\beta\)-reductions result in the final value of the identity function.

Summarizing the evaluation rules for a function application \(f~x\), we have the following:

Reduce the body of the function \(f\) until it is in normal form \(f_{normal}\).

If a bound-variable name appears in both \(f_{normal}\) and \(x\), then perform \(\alpha\)-reduction on \(x\) so that this is no longer the case 1, obtaining \(x_\alpha\).

Perform \(\beta\)-reduction by substituting \(x_\alpha\) for the parameter of \(f_{normal}\) in the body of the latter. The result of this reduction is the substituted body itself.

Proceed to reduce the substituted body until it is in normal form.

- 1
Our convention is to \(\alpha\)-reduce the argument rather than the function, though the result of evaluation would be equivalent in either case.

If a variable is free in \(f\) but bound in \(x\) or vice versa, then \(\alpha\)-reduction must be applied in step 2 to rename the bound variable. Thus:

## Encoding Data¶

Lambda calculus consists solely of variables and functions, and we can apply \(\beta\)-reduction to substitute functions for variables. However, none of the familiar values exist directly in \(\lambda\)-calculus, such as integers or booleans. It is thus surprising that \(\lambda\)-calculus can model any computational process. We demonstrate this by encoding values as functions.

### Booleans¶

To start with, let us define an abstraction for the booleans \(true\) and \(false\). The only building block we have to work with is functions, and we need to ensure that the functions that represent the two values are not \(\beta\)-equivalent so that we can distinguish between them. There are many ways we can do so, but the one we use is to define \(true\) and \(false\) as functions that take two values and produce either the first or the second value:

The \(=\) sign here means that we take this as a mathematical definition; it does not denote assignment. Since all functions in \(\lambda\)-calculus must take a single argument, the actual definitions of \(true\) and \(false\) are curried. Applying \(true\) to two values results in the first:

Similarly, applying \(false\) to two values yields the second:

We can proceed to define logical operators as follows:

To see how these work, let us apply them to some examples:

Here, we use \(\rightarrow\) on its own to denote some sequence of \(\alpha\)- and \(\beta\)-reductions. Applying \(and\) to \(true\) and any other boolean results in the second boolean, while applying \(or\) to \(true\) and another boolean always results in \(true\). Similarly:

Applying \(and\) to \(false\) and some other boolean always results in \(false\), while applying \(or\) to \(false\) and another boolean results in the second boolean. Finally, \(not\) works as follows:

Applying \(not\) to \(true\) results in \(false\), and vice versa.

We can define a conditional as follows:

If the condition \(p\) is \(true\), then applying \(p\) to \(a\) and \(b\) results in \(a\), since \(true\) selects the first of two values. On the other hand, if \(p\) is \(false\), then applying \(p\) to \(a\) and \(b\) results in \(b\), since \(false\) selects the second of two values.

### Pairs¶

In order to represent structured data, we need an abstraction for a pair of two values. As with booleans, the only mechanism at our disposal is functions, so we need to produce a “container” function that holds the two values within its body.:

The \(pair\) constructor takes two items \(x\) and \(y\) and produces as a result a function that contains \(x\) and \(y\) in its body. Applying \(pair\) to two concrete items \(a\) and \(b\) results in:

In order to obtain the first item \(a\) above, we can substitute \(true\) for \(f\), so that \(f~a~b\) evaluates to \(a\). Similarly, to obtain the second item, we can substitute \(false\) for \(f\). This leads to the following definitions of the \(first\) and \(second\) selectors:

The following demonstrates how selectors work:

We can also define a representation for \(nil\), as well as a predicate to test for \(nil\):

Let us see how the \(null\) predicate works:

With a definition for pairs, we can represent arbitrary data structures. For example, we can represent trees using nested pairs:

### Church Numerals¶

Many representations of numbers are possible in \(\lambda\)-calculus. For example, we can represent natural numbers in unary format, using pairs:

However, the most common representation is the *Church numerals*,
which represents a natural number by how many times it applies a
function to an input:

A number \(n\) is a higher-order function that, given another function \(f\), produces a new function that applies \(f\) to its argument \(n\) times in succession. Using the mathematical notation \(f^k\) to denote the composition of \(f\) with itself \(k\) times, e.g \(f^3 = f \circ f \circ f\), the Church numeral \(n\) is a function that takes \(f\) and produces \(f^n\).

As a concrete example, the \(right\) function above applies the \(second\) function twice to its argument, so we can define it instead as:

The following demonstrates how this works 2:

- 2
To simplify reasoning about the results, we depart from normal-order evaluation for the remainder of our discussion on \(\lambda\)-calculus when reducing expressions. In particular, we do not reduce a function body before applying it. However, applying the resulting expressions would have the same effect as those generated by normal-order evaluation.

By applying \(right\) to a tree with \(c\) as its right subtree, we obtain \(c\).

We can define an increment function as follows:

Given a number, \(incr\) produces a new one that applies a function to an argument one more time than the original number. Thus, where \(n\) turns its input \(f\) into \(f^n\), the result of \(incr~ n\) turns its input \(f\) into \(f^{n+1}\). This is accomplished by first applying \(n~ f\), which is equivalent to \(f^n\), and then applying \(f\) one more time. For example:

We can then define \(plus\) as follows:

This applies the \(incr\) function \(m\) times to \(n\). For example:

We can then use the same strategy to define multiplication:

Here, we perform \(m\) additions of \(n\), starting at zero, resulting in the product of \(m\) and \(n\).

We can define exponentiation using the same pattern. Decrement and subtraction are a little more difficult to define, but are possible. Finally, we need a predicate to determine when a number is zero:

We apply a number to a function that returns \(false\) and a starting value of \(true\). Only if the function is never applied is the result \(true\), otherwise it is \(false\):

## Recursion¶

Church numerals allow us to perform bounded repetition, but in order to express arbitrary computation, we need a mechanism for unbounded repetition. Since \(\lambda\)-calculus only has functions, recursion is a natural mechanism for repetition.

In recursion, a function needs to be able to refer to itself by name. However, in \(\lambda\)-calculus, the only way to introduce a name is as a function parameter. Thus, a recursive function must take itself as input. For example, the following defines a factorial function:

As an analogy, the equivalent form in Python is as follows:

```
>>> factorial = lambda f: (lambda n: 1 if n == 0 else n * f(f)(n-1))
```

In order to actually apply the \(factorial\) function, we need another function that applies its argument to itself:

We can then compute a factorial as follows:

Further evaluation results in the factorial of \(m\). Performing the analogous operation in Python:

```
>>> apply = lambda g: g(g)
>>> apply(factorial)(4)
24
```

The \(apply\) function can be generalized as the following
function in \(\lambda\)-calculus, known as a *fixed-point
combinator* and, by convention, the *Y combinator*:

Applying the Y combinator to a function \(F\) results in:

This allows us to define \(factorial\) more simply. Let us first define a concrete function \(F\):

Notice that this is the same as \(factorial\), except that we have not passed the input function to itself in the recursive application. If we apply the Y combinator to \(F\) and apply the result to a number, we get:

Letting \(factorial~ =~ Y~ F\), we get

Thus, we see that applying the Y combinator to \(F\) results in a recursive \(factorial\) function, and the Y combinator enables us to write recursive functions in a simpler manner.

## Equivalent Models¶

Lambda calculus models functional programming in its simplest and
purest form, and its ability to encode data and perform recursion
demonstrates the power of functional programming. It is not the only
model for computation, however. Perhaps the most famous model is the
*Turing machine*, described by Alan Turing around the same time as
Church’s work on \(\lambda\)-calculus. The Turing model is
imperative at its core, and it is more closely related to the workings
of modern machines than \(\lambda\)-calculus.

Many variants of Turing machines have been defined, but the following is a description of one variant:

A

*tape*device is used for storage, divided into individual cells in a linear layout. Each cell contains a symbol from a finite alphabet. The tape extends infinitely in both left and right directions.A

*head*reads and writes symbols from the tape. It can be moved one step at a time to the right or left.A

*state register*keeps track of the state of the machine. There are a finite number of states the machine can be in, including special start and halt states.A

*table*of instructions specifies what the machine is to do for each combination of state and symbol. Since the sets of states and symbols are finite, the instruction table is also finite. At each step in the computation, the machine looks up the current state and the symbol currently under the head in the table and follows the specified instruction.An

*instruction*can either halt the machine, ending computation, or do the following:Write a specific symbol at the current position of the head.

Move the head either one step to the left or the right.

Go to a specified new state.

Analogizing with imperative programming, each instruction in a Turing
machine can be considered a statement, and each statement transfers
control to a new one in a manner similar to a `goto`

.

Despite the vastly different model of computation, Alan Turing proved
that a Turing machine can solve exactly the same problems as
\(\lambda\)-calculus. This suggests that both models encompass
all of computation, a conjecture formalized in the *Church-Turing
thesis*. The thesis states that a function is computable by a human
following an algorithm if and only if it is computable by a Turing
machine, or equivalently, an expression in \(\lambda\)-calculus.

All known models of computation have been shown to be either
computationally equivalent to or weaker than Turing machines.
Equivalent models are said to be *Turing complete*. A programming
language also defines a model of computation, and all general-purpose
programming languages are Turing complete, whether they follow a
functional paradigm, an imperative one, or an alternative approach.

# Operational Semantics¶

As mentioned previously, *semantics* is concerned with the meaning of
code fragments, as opposed to syntax, which is concerned with their
structure. We have seen that syntax can be formally described with
regular expressions and context-free grammars. Semantics can also be
described formally, and there are a number of approaches.
*Denotational semantics* specifies program behavior using set and
domain theory, with program fragments described as partial functions
over program state. *Axiomatic semantics* is concerned with proving
logical assertions over program state, so it specifies the meaning of
each construct with respect to its effect on these logical assertions.
*Operational semantics* specifies what each computational step does to
the state of a program, and what value is computed in each step.
Operational semantics more closely describes what an interpreter for a
language must perform for each step than denotational or axiomatic
semantics.

In this section, we will examine a form of operational semantics known
as *structured operational semantics*, and more specifically,
*natural* or *big-step* semantics. This form of semantics is
particularly well-suited to implementation in a recursive interpreter.
We specify rules for how the computation evolves for each syntactic
construct in a programming language. We will begin our exploration
with a simple imperative language.

## Language¶

Consider a simple imperative language with variables, integers, booleans, statements, conditionals, and loops. The following is a context-free grammar that describes this language:

In order to avoid ambiguities, arithmetic and boolean expressions are
parenthesized where necessary, and conditionals and loops end with the
`end`

keyword. The `skip`

statement simply does nothing, and it is
equivalent to the empty statement in many languages. It allows us to
write conditionals that do nothing in a branch. Variables consist of
any identifier, and we will use combinations of letters and numbers to
denote them. Any integer can be used as a number literal.

## States and Transitions¶

The *state* of a program consists of a mapping from variables to
values, and we will use the lowercase Greek sigma (\(\sigma\)) to
denote a state. In our simple language, variables only hold integer
values, and the value of a variable \(v\) is specified as
\(\sigma(v)\). In the initial state, the value of each variable is
undefined. We use the notation

to denote a state where the value of the variable \(v\) has value \(n\), but the remaining variables have the same value as in \(\sigma\). Formally, we have

A *transition* denotes the result of a computation:

The left-hand side is the combination of a program fragment \(s\) and an initial state \(\sigma\). The right-hand side consists of a value \(u\) and a new state \(\sigma\). The transition as a whole denotes that \(s\), when computed in the context of state \(\sigma\), results in the value \(u\) and a new state \(\sigma'\). If the computation does not produce a value, then no value appears on the right-hand side:

Similarly, if the computation does not result in a new state, then the state may be elided from the right-hand side:

In big-step operational semantics, a transition may only result in a value and/or state. Program fragments may not appear on the right-hand side of a transition. Thus, a transition specifies the complete result of computing a program fragment.

We specify computation in the form of *transition rules*, also called
*derivation rules*. They have the following general form:

Only transitions may appear at the top or bottom of a rule. The top of
a rule is called the *premise*, and the bottom the *conclusion*. It
should thus be read as a conditional rule: if program fragment
\(s_1\), when computed in state \(\sigma_1\), evaluates to
value \(u_1\) in state \(\sigma_1'\), \(\dots\), and
\(s_k\), when computed in state \(\sigma_k\), evaluates to
\(u_k\) in state \(\sigma_k'\), then fragment \(s\) in
state \(\sigma\) can evaluate to \(u\) in state
\(\sigma'\). If a computation does not affect the state of the
program, then the state may be elided from the right-hand side of a
transition. Similarly, if a computation does not result in a value, as
in the execution of a statement, then the right-hand side of a
transition will not include a value.

A transition rule prescribes how to perform a computation in an interpreter. A particular program fragment \(p\) can be interpreted by finding a transition rule where \(p\) appears in the conclusion and performing the computations listed in the premise of the rule. The results of these smaller computations are then combined as specified in the rule to produce the result of program fragment \(p\). If more than one rule has \(p\) in its conclusion, then the interpreter is free to chose which of the rules to apply. Each computational step in a program applies a transition rule, and a program terminates when no more transition rules can be applied.

## Expressions¶

Expressions are generally used for the values to which they evaluate, and in our language, expressions do not have side effects. As a result, the right-hand side of transitions will not include a new state in most of the rules we define below.

### Arithmetic Expressions¶

An integer literal evaluates to the respective integer in all cases. The transition rule is as follows, where \(n\) denotes an arbitrary integer:

A rule like this, with an empty premise, is called an *axiom*. Axioms
are the starting point of computation, as we will see below.

A variable, denoted by \(v\) below, evaluates to its value as tracked by the state:

The rules for addition, subtraction, and multiplication are as follows:

In evaluating \((a_1 + a_2)\) in state \(\sigma\), if \(a_1\) evaluates to \(n_1\) in \(\sigma\) and \(a_2\) to \(n_2\), then \((a_1 + a_2)\) evaluates to the sum of \(n_1\) and \(n_2\). Similarly for subtraction and multiplication.

The process of evaluating a compound expression results in a derivation tree starting with axioms. For example, consider the evaluation of \(((x + 3) * (y - 5))\), where \(x\) and \(y\) are variables with values 1 and 2, respectively, in state \(\sigma\). The full derivation tree is as follows:

In this tree, we’ve applied transition rules to each subexpression to get from axioms to the conclusion that \(((x + 3) * (y - 5))\) evaluates to -12 in \(\sigma\).

The tree above demonstrates how computation could proceed in an interpreter. The program fragment \(((x + 3) * (y - 5))\) has the form \((a_1 * a_2)\), where \(a_1 = (x + 3)\) and \(a_2 = (y - 5)\). The interpreter would thus apply the rule for multiplication, which in turn requires computing \((x + 3)\) and \((y - 5)\). The former has the form \((a_1 + a_2)\), so the interpreter would apply the rule for addition, which itself requires the computation of \(x\) and \(3\). The former is a variable, so applying the rule for a variable results in the value \(1\), while the latter is an integer literal, which evaluates to the value \(3\) that it represents. Thus, the addition \((x + 3)\) evaluates to the value \(4\). Repeating the same process for the expression \((y - 5)\) results in the value \(-3\), so the full program fragment evaluates to \(-12\).

### Order of Evaluation¶

If expressions may have side effects, then transitions must include a new state, and we need to consider the order of evaluation of operands. The following rule specifies that the left-hand operand of an addition must be evaluated before the right-hand operand:

In this rule, we’ve specified that the first operand is to be evaluated in the original state, while the second operand is to be evaluated in the new state produced by evaluating the first operand. The final state is the new state produced by evaluating the second operand.

If, on the other hand, we choose to allow operands to be evaluated in
either order, but require that they be evaluated in *some* order, we
can introduce a second rule for addition that enables the evaluation
to done in reverse order:

Now, in evaluating \((a_1 + a_2)\), we can apply either rule to get either order of evaluation. Thus, implementations are now free to evaluate operands in either order.

### Boolean Expressions¶

There are two axioms corresponding to boolean expressions:

The following are the rules for comparisons, assuming that expressions have no side effects:

The rules for negation are as follows:

Conjunction can be specified as follows:

Notice that this rule does not short circuit: it requires both operands of \(\textbf{and}\) to be evaluated. If we want short circuiting, we can use the following rules for conjunction instead:

Here, the right-hand side need only be evaluated when the left-hand side is true. An interpreter, upon encountering a conjunction, would evaluate the left-hand operand. If the result is false, the first rule must be applied, but if it is true, then the second rule must apply.

## Statements¶

Statements in imperative programs are generally used for their side effects, so they change the state of the program. In our language, statements do not have a value. In our transition rules below, the right-hand side of a transition will be a new state, representing the state that results from completely executing the statement:

The intended meaning of such a transition is that executing statement \(s\) in state \(\sigma\) terminates in a new state \(\sigma'\). Not all statements terminate; a statement that does not terminate will not yield a final state through any sequence of transition rules.

The \(\textbf{skip}\) statement terminates with no effect on the state:

Assignment produces a new state such that the given variable now has the value of the given expression:

As described in States and Transitions, the notation \(\sigma[v := n]\) denotes a state where variable \(v\) has the value \(n\), but all other variables have the same value is in \(\sigma\). Thus, the assignment \(v = a\) produces a new state where \(v\) has the value that is the result of evaluating \(a\), but the remaining variables are unchanged.

Sequencing ensures that the second statement executes in the new state produced from executing the first:

Conditionals require separate rules for when the predicate is true or false:

If the test evaluates to true, then the first rule applies, executing
the *then* statement. If the test is false, on the other hand, the
second rule applies, executing the *else* statement.

A loop whose predicate is false has no effect:

On the other hand, a loop whose predicate is true has the same effect as executing the body and then recursively executing the loop in the resulting state:

The following demonstrates the execution of the terminating loop \(\textbf{while}~ (x <= 2)~ \textbf{do}~ x = (x + 1)~ \textbf{end}\), with \(x\) having an initial value of 1. Applying a single transition rule for \(\textbf{while}\), along with fully evaluating the predicate and executing one iteration of the body, yields:

Recursively executing the \(\textbf{while}\) produces the following, where we’ve truncated the derivation tree for the predicate and body:

One more recursive execution results in:

This implies that the final state is \(\sigma' = \sigma[x := 3]\), so the result of the \(\textbf{while}\) loop is that \(x\) now has value 3.

As an example of a non-terminating or *divergent* computation,
consider the loop \(\textbf{while}~ \textbf{true}~ \textbf{do}~
\textbf{skip}~ \textbf{end}\). Applying the transition rule for
\(\textbf{while}\) results in:

In order to execute the \(\textbf{while}\) in the premise, we need to recursively apply the same transition rule, producing the same result. This repeats forever, resulting in a divergent computation.

## Examples¶

Operational semantics allows us to reason about the execution of
programs, specify equivalences between program fragments, and prove
statements about programs. As an example, the following rule specifies
an equivalence between two forms of `define`

in Scheme:

In Scheme, an expression produces a value but may also have side
effects, so the right-hand side of a transition includes a new state.
The rule above states that if the expression \((\texttt{define}~
f~ (\texttt{lambda}~ (params)~ body))\) results in a particular value
and new state, then the expression \((\texttt{define}~ (f~
params)~ body)\) evaluates to the same value and new state. Thus, an
interpreter could handle the latter `define`

form by translating it
to the former and proceeding to evaluate the translated form.

As another example, in our simple language above, we can demonstrate that swapping operands in an addition is a legal transformation, since \(x + y\) and \(y + x\) always evaluate to the same value:

## Operational Semantics for Lambda Calculus¶

As another example, we proceed to develop operational semantics for lambda calculus. Computation in \(\lambda\)-calculus does not involve a state that maps variables to values. Thus, transitions have the following form, with neither a state on the left-hand nor on the right-hand side:

An expression \(e\) that is in normal form evaluates to itself:

The following defines whether or not an expression is in normal form:

Here, \(v\) denotes a variable, while \(e\), and \(e_i\) denote arbitrary expressions. A variable is always in normal form, while a function abstraction is in normal form exactly when its body is in normal form. For a function application, if the left-hand side is a variable or application in normal form, then the overall expression is in normal form. On the other hand, if the left-hand side is an abstraction, then a \(\beta\)-reduction can be applied, so the application is not in normal form.

A function abstraction that is not in normal form is evaluated by evaluating its body:

In a function application, a \(\beta\)-reduction involves substituting the parameter of a function for its argument in the body of the function, then evaluating the substituted body. Assuming that no variable names are shared between the function and its argument, the following rule specifies this computation:

The \(subst(body, var, arg)\) transformer performs substitution of an expression \(arg\) for a variable \(var\) in some larger expression \(body\). It can be defined as follows:

A variable is substituted with the argument expression if it is the same as the variable being replaced. Otherwise, substitution has no effect on the variable.

For a function abstraction, if the function’s parameter has the same name as the substitution variable, then all references to that name within the body of the function refer to the parameter rather than the substitution variable. Thus, the body should remain unchanged. On the other hand, if the parameter name is different, then the body itself should recursively undergo substitution.

Finally, applying substitution to a function application recursively applies it to both the function and its argument.

The transition rule above for \(\beta\)-reduction assumes that no \(\alpha\)-reduction is necessary between a function and its argument. However, \(\alpha\)-reduction becomes necessary in the following cases:

The argument contains a bound variable with the same name as a bound or free variable in the function. The following are examples:

\[\begin{split}(\lambda x.~ \lambda y.~ x~ y)~ (\lambda y.~ y)\\ (\lambda x.~ x~ y)~ (\lambda y.~ y)\end{split}\]The function contains a bound variable with the same name as a free variable in the argument. The following is an example:

\[(\lambda x.~ \lambda y.~ x~ y)~ y\]

In the first case, our convention is to apply \(\alpha\)-reduction to the argument, while in the second, we are forced to \(\alpha\)-reduce the function.

Thus, we need to determine the bound and free variables of an expression. We first define \(boundvars(expr)\) to collect the bound variables of an expression \(expr\):

A variable on its own contributes no bound variables. The bound variables of a function application are the union of the bound variables in the function and its argument. The bound variables of a function abstraction are the bound variables of the body plus the parameter of the function itself.

In order to determine the free variables of an expression, we require as input the set of variables that are bound when the expression is encountered. We define \(freevars(bound, expr)\) as follows:

A variable is free if it is not included in the bound set. The free variables of a function application are the union of the free variables in the function and its argument. The free variables of a function abstraction are the free variables in the body, using a bound set that includes the parameter of the abstraction.

We can also define a transformer \(alpha(vars, expr)\) to rename the bound variables in \(expr\) that occur in the set \(vars\):

A variable on its own is not bound, so it should not be renamed. A
function application is renamed by renaming both the function and its
argument. For a function abstraction, if the parameter appears in
\(vars\), we replace it with a new name that is *fresh*, meaning
that it is not used anywhere in the program. This requires applying
substitution to the body, replacing the old variable name with the new
one. We also have to recursively apply renaming to the body, whether
the parameter is replaced or not.

To put this all together, we define a transformer \(beta(func, arg)\) for performing \(\beta\)-reduction when \(func\) is applied to \(arg\):

Here, \(alpha_{arg}(func, arg)\) applies renaming to \(arg\) given the bound and free variables in \(func\), and \(alpha_{func}(func, arg)\) applies renaming to \(func\) given the free variables in \(arg\). The result must be an abstraction of the form \(\lambda v'.~ e'\), so \(beta(func, arg)\) proceeds to substitute \(v'\) for the renamed argument in the body \(e'\).

We can now proceed to write a general transition rule for \(\beta\)-reduction:

Finally, we need a transition rule for a sequence of function applications:

We can apply the rules to derive the following computation for \((\lambda x.~ \lambda y.~ y~ x)~ x~ a\):

Here, we’ve applied the rule for a sequence of function applications, then applied the \(\beta\)-reduction rule to each of the premises. The end result is \(a~ x\).

# Formal Type Systems¶

We now turn our attention to type systems and *type checking*, which
determines whether or not the use of types in a program is correct.
Given a language, we will define rules to determine the type of each
expression in the language. Where the rules do not assign a type for a
particular expression, that expression should be considered erroneous.

We start with a simple language of boolean and integer expressions, parenthesized where necessary to avoid ambiguity:

The two types in this language are \(Int\) for integer expressions
and \(Bool\) for boolean expressions. We use the notation \(t
~:~ T\) to denote that a term \(t\) has the type \(T\). A
statement of the form \(t ~:~ T\) is often called a *typing
relation* or *type judgment*.

The base typing rules assign types to integer and boolean literals:

For more complex expressions, we have derivation rules that are similar
to those in operational semantics, where the top of the rule is the
*premise* and the bottom the *conclusion*. The following is the rule
for addition:

This rule states that if \(t_1\) has type \(Int\), and \(t_2\) has type \(Int\), then the term \((t_1 + t_2)\) also has type \(Int\). Thus, the rule allows us to compute the type of a larger expression from the types of the subexpressions, as in the following derivation:

On the other hand, an expression such as \((\textbf{true} + 1)\) is not well typed: since \(\textbf{true} ~:~ Bool\), the premise in the rule for addition does not hold, so it cannot be applied to derive a type for \((\textbf{true} + 1)\). Since no type can be derived for the expression, the expression does not type check, and it is erroneous.

The following rules for subtraction and multiplication are similar to that of addition:

The rule for comparison requires that the two operands have type \(Int\), in which case the type of the overall expression is \(Bool\):

The rule for conjunction requires that the operands have type \(Bool\), and the resulting expression also has type \(Bool\). Negation similarly requires its operand to have type \(Bool\):

The conditional expression requires the test to have type \(Bool\). However, the only restrictions on the remaining two operands is that they are well typed, and that they both have the same type. For example, an expression such as \((\textbf{if}~ test~ \textbf{then}~ 3~ \textbf{else}~ 5)\) will always produce an integer, regardless of the value of \(test\), while \((\textbf{if}~ test~ \textbf{then}~ false~ \textbf{else}~ true)\) will always produce a boolean. Thus, our typing rule has a type variable \(T\) to represent the type of the last two operands, ensuring that they match:

## Variables¶

Now that we have typing rules for a simple language of booleans and integers, we proceed to introduce variables into the language. For the purposes of typing, we will assume that each variable in a program has a distinct name. As we saw in lambda calculus, we can rename variables if necessary so that this is the case.

We introduce the following syntax for a binding construct to the language:

The semantics of this construct are to replace all occurrences of the given variable in the body of the \(\bf let\) with the variable’s bound value. Thus, an expression such as the following should produce an integer:

On the other hand, the following expression should not type check, since replacing \(x\) with its bound value results in an ill-typed body:

In order to determine whether or not the body of a \(\bf let\) is
well typed, we need a *type context* or *type environment* that keeps
track of the type of the variables that are in scope. The following is
the notation we use for a type environment:

The symbol \(\Gamma\) represents a type environment.

The notation \(x : T \in \Gamma\) denotes that \(\Gamma\) maps the name \(x\) to the type \(T\).

We extend a type environment as \(\Gamma, x : T\), which denotes the type environment that assigns the type \(T\) to \(x\) but assigns all other variables the same type as in \(\Gamma\).

We express a type judgment as \(\Gamma ~\vdash~ t ~:~ T\), which states that the term \(t\) has type \(T\) within the context of the type environment \(\Gamma\).

As indicated by the last point above, type judgments are now made in the context of a type environment that maps variables to their types. If a particular term has the same type regardless of typing environment, then we can elide the environment in a type judgment. For example, the judgment \(\vdash~ \textbf{true} ~:~ Bool\) indicates that \(\bf true\) always has type \(Bool\) within the context of any type environment.

The following are our existing typing rules using the notation of type environments:

We need a rule for typing a variable:

Here, we use \(v\) to denote a variable. The rule states that if the type environment \(\Gamma\) maps the variable \(v\) to type \(T\), then the term consisting of \(v\) itself has type \(T\) within the context of \(\Gamma\).

We can now add a rule for the \(\bf let\) binding construct:

Here, we use \(v\) to denote the name of the variable introduced by the \(\bf let\). The rule states that if the initializer expression is assigned the type \(T_1\) within the context of the original type environment \(\Gamma\), and the body has type \(T_2\) within the context of the original environment extended with the mapping \(v : T_1\), then the overall \(\bf let\) expression also has type \(T_2\). We can use this to derive the type of our first \(\bf let\) example in the context of any type environment:

## Functions¶

Now that we have typing rules for expressions of booleans and integers, we proceed to add functions to our language and introduce rules for computing the types of function abstractions and applications. As in lambda calculus, we will consider functions that take in exactly one argument. A function then has two types that are relevant: the type of the argument to the function, and the type of its return value. We will use the notation \(T_1 \rightarrow T_2\) to denote the type of a function that takes in an argument of type \(T_1\) and returns a value of type \(T_2\).

For simplicity, we will require that the parameter type of a function be explicitly specified. It would also be reasonable to infer the type of the parameter from how it is used in the body, or to deduce the type of the parameter independently each time the function is applied to an argument. The latter would provide a form of parametric polymorphism. However, we will not consider such schemes here.

To allow functions to be defined, with explicit typing of parameters, we extend our language as follows:

We introduce two new expressions, one for function abstraction and one
for function application, borrowing syntax from
\(\lambda\)-calculus. We also introduce types into our grammar,
with \(Int\) and \(Bool\) as the non-function types. A
function type is specified by separating its input and output types by
the *type constructor* \(\rightarrow\). When chained, the type
constructor is right associative, so that \(Int \rightarrow Int
\rightarrow Bool\) is equivalent to \(Int \rightarrow (Int
\rightarrow Bool)\), denoting a function that takes in an \(Int\)
and returns a function with type \(Int \rightarrow Bool\).

As with \(\bf let\), we will assume that parameter names introduced by \(\bf lambda\) expressions are distinct from any other names in the program, knowing that we can always rename variables to ensure that this is the case.

We can now define the typing rule for abstraction as follows:

The rule states that if the body \(t_2\) has type \(T_2\) in the type environment that consists of \(\Gamma\) extended with the mapping \(v : T_1\) for the parameter, then the function as a whole has type \(T_1 \rightarrow T_2\). Thus, the function takes in a value of type \(T_1\) as an argument and returns a value of type \(T_2\).

The following is the rule for application:

This states that if the function has type \(T_2 \rightarrow T_3\), taking in a \(T_2\) and returning a \(T_3\), and the argument has the requisite type \(T_2\), then the application results in the type \(T_3\).

As an example, consider the following program fragment:

We can derive the type of this expression in any type environment as follows:

At the bottom of the derivation, we apply the rule for \(\bf let\), requiring us to compute the type of the variable initializer as well as the type of the body in a type environment where the new variable has its computed type.

To compute the type of the initializer, we apply the rule for abstraction, requiring us to compute the type of the body in a type environment with the function parameter having its designated type of \(Int\). This applies the rule for \(<=\), further requiring computation of types for the variable \(x\) and integer literal \(10\). The body then has the type \(Bool\), so the abstraction has type \(Int \rightarrow Bool\).

We can then compute the type of the body of the \(let\), in a type context where \(f\) has type \(Int \rightarrow Bool\). This requires us to apply the rule for function application, computing the type of both the function and its argument. The function is the variable \(f\), which has type \(Int \rightarrow Bool\) in the type environment. The argument is the integer literal \(3\), which has type \(Int\). Thus, the application is applying an \(Int \rightarrow Bool\) to an \(Int\), resulting in \(Bool\). This is also the type of the \(\bf let\) expression as a whole.

## Subtyping¶

Our working language now has the base types \(Bool\) and \(Int\), as well as function types. Let us extend the language by adding floating-point numbers:

We add a typing rule for floating-point literals:

We would also like to allow operations such as addition on expressions of type \(Float\). We could define a separate rule for adding two \(Float\)s:

However, the combination of this rule and the rule for adding \(Int\)s does not permit us to add a \(Float\) and an \(Int\). Adding more rules for such a combination is not a scalable solution: introducing more numerical types would result in a combinatorial explosion in the number of rules required.

Functions pose a similar problem. If we define a function such as \((\textbf{lambda}~ x : Float .~ (x + 1.0))\), we would like to be able to apply it to an \(Int\) as well as a \(Float\). Conceptually, every integer is also a floating-point number 3, so we would expect such an operation to be valid.

- 3
This may not actually be the case in the implementation, depending on the representation used for the two types. However, it still makes sense semantically that an integer should be allowed where a floating-point number is expected.

Rather than adding more rules to permit this specific case, we
introduce a notion of *subtyping* that allows a type to be used in
contexts that expect a different type. We say that type \(S\) is a
*subtype* of type \(T\) if a term of type \(S\) can be
substituted anywhere a term of type \(T\) is expected. We use the
notation \(S <: T\) to denote that \(S\) is a subtype of
\(T\).

The subtype relation \(<:\) must satisfy the following requirements:

It is

*reflexive*, meaning that for any type \(S\), it must be that \(S <: S\), so that \(S\) is a subtype of itself.It is

*transitive*, so that \(S <: T\) and \(T <: U\) implies that \(S <: U\).

Thus, the subtype relation must be a *preorder*. In many languages,
the subtype relation is also a *partial order*, additionally
satisfying the following:

It is

*antisymmetric*, so that \(S <: T\) and \(T <: S\) implies that \(S = T\).

In our working language, we specify that \(Int\) is a subtype of \(Float\):

To allow our type system to accommodate subtyping, we introduce a new
typing rule, called the *subsumption* rule, to enable a subtype to be
used where a supertype is expected:

This rule states that if the type of term \(t\) has been computed as \(S\), and \(S\) is a subtype of \(T\), then we can also conclude that \(t\) has type \(T\). This allows a function that expects a \(Float\) to be applied to an \(Int\) as well:

### Subtyping and Arithmetic Operators¶

It may be tempting to rewrite the rules for arithmetic operators on numbers to require both operands to be of the \(Float\) type:

However, such a rule always produces a \(Float\) as a result. This precludes us from using the result as an argument to a function that expects an \(Int\) as its argument: it is not the case that \(Float <: Int\), so we cannot use a \(Float\) in a context that requires \(Int\).

Instead, we need to rewrite the rule such that it produces a \(Float\) when at least one of the operands is a \(Float\), but it results in an \(Int\) if both operands are \(Int\)s. More generally, we desire the following, where \(T_1\) and \(T_2\) are the types of the two operands:

Both operands are of numerical type. In our language, this means that they are each of a type that is some subtype of \(Float\). Thus, we require that \(T_1 <: Float\) and \(T_2 <: Float\).

The result is the

*least upper bound*, or*join*, of the two operand types. This means that the result type is the minimal type \(T\) such that \(T_1 <: T\) and \(T_2 <: T\). We use the notation 4 \(T = T_1 \sqcup T_2\) to denote that \(T\) is the join of \(T_1\) and \(T_2\).- 4
The symbols \(\vee\) and \(\cup\) are also commonly used to denote the least upper bound. However, we will stick to \(\sqcup\) to avoid confusion with disjunction and set union.

Since \(S <: S\), it is always the case that \(S = S \sqcup S\). Thus, in our language, we have \(Int = Int \sqcup Int\), \(Float = Float \sqcup Float\), and \(Float = Int \sqcup Float\).

Putting these requirements together, we can define the typing rule for addition as follows:

Subtraction and multiplication can be similarly defined.

### The Top Type¶

Many languages include a \(Top\) type, also written as \(\top\), that is a supertype of every other type in the language. Thus, for any type \(S\), we have:

The \(Top\) type corresponds to the \(Object\) type in many
object-oriented languages. For example, the `object`

type in Python
is a supertype of every other type.

Introducing \(Top\) into our language ensures that a join exists for every pair of types in the language. However, it is not necessarily the case in general that a particular language has a join for every pair of types, even if it has a \(Top\) type.

The existence of a join for each pair of types allows us to loosen the typing rule for conditionals:

Rather than requiring that both branches have exactly the same type, we allow each branch to have an arbitrary type. Since we can always compute the join of the two types in our language, the resulting type of the conditional is the join of the types of the branches.

### Subtyping and Functions¶

In a language with higher-order functions, subtyping is also applicable to function types. There are contexts where it would be semantically valid to accept a function type that is different from the one that is expected. For instance, consider the following higher-order function:

This function takes in another function \(f\) as an argument and then applies \(f\) to an \(Int\). If the actual function provided as an argument had type \(Float \rightarrow Bool\) instead, it would still be semantically valid to invoke it on an \(Int\). Thus, it should be the case that \(Float \rightarrow Bool <: Int \rightarrow Bool\), since the former can be used in contexts that expect the latter.

Now consider another higher-order function:

This new function takes in a function \(f\) and applies it to an \(Int\) to produce a \(Float\). However, if the function we provide as the argument has type \(Int \rightarrow Int\), it would produce an \(Int\); the latter is a valid substitution for a \(Float\), making such an argument semantically valid. Thus, it should also be the case that \(Int \rightarrow Int <: Int \rightarrow Float\).

Putting both cases together, we end up with the following subtyping rule for functions:

A function that accepts parameters of type \(S_1\) accepts more
general argument values than one that accepts type \(T_1\); the
former has a more general *domain* than the latter. Any contexts that
expect to pass a \(T_1\) as an argument would be just as well
served if the function accepts an \(S_1\). Thus, the function type
that accepts an \(S_1\) should be substitutable for the function
type that accepts a \(T_1\).

A function that produces a return value of type \(S_2\) has a more
restricted set of outputs, or *codomain*, than a function that produces
a \(T_2\). Any context that expects a \(T_2\) as output would
be just as well served by an \(S_2\) as output. Thus, the function
type that produces an \(S_2\) should be substitutable for the
function type that produces a \(T_2\).

The subtyping rule permits a *contravariant* parameter type in the
function subtype: it is contravariant since the direction of the
relation \(<:\) is reversed for the parameter types compared to
the relation for the function types. The rule also permits a
*covariant* return type, since the direction of \(<:\) is the same
for the return types and the function types.

Covariant return types often appear in object-oriented languages in a different context, that of overriding a base class’s method, for the same semantic reasons they are valid here. We will discuss covariance and contravariance in object-oriented programming in more detail later.

## Full Typing Rules¶

Putting together all the features we have discussed, the following are the rules for subtyping:

\(Top\):

\[S <: Top\]Numbers:

\[Int <: Float\]Functions:

\[\frac{ T_1 <: S_1 ~~~~~~~~~~ S_2 <: T_2 }{ S_1 \rightarrow S_2 <: T_1 \rightarrow T_2 }\]Subsumption:

\[\frac{ \Gamma ~\vdash~ t ~:~ S ~~~~~~~~~~ S <: T }{ \Gamma ~\vdash~ t ~:~ T }\]

The typing rules for each kind of term are as follows:

Literals:

\[\frac{ }{ \vdash~ IntegerLiteral ~:~ Int }\]\[\frac{ }{ \vdash~ \textbf{true} ~:~ Bool }\]\[\frac{ }{ \vdash~ \textbf{false} ~:~ Bool }\]\[\frac{ }{ \vdash~ FloatingLiteral ~:~ Float }\]Arithmetic:

\[\frac{ \Gamma ~\vdash~ t_1 ~:~ T_1 ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ T_2 ~~~~~~~~~~ T_1 <: Float ~~~~~~~~~~ T_2 <: Float ~~~~~~~~~~ T = T_1 \sqcup T_2 }{ \Gamma ~\vdash~ (t_1 + t_2) ~:~ T }\]\[\frac{ \Gamma ~\vdash~ t_1 ~:~ T_1 ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ T_2 ~~~~~~~~~~ T_1 <: Float ~~~~~~~~~~ T_2 <: Float ~~~~~~~~~~ T = T_1 \sqcup T_2 }{ \Gamma ~\vdash~ (t_1 - t_2) ~:~ T }\]\[\frac{ \Gamma ~\vdash~ t_1 ~:~ T_1 ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ T_2 ~~~~~~~~~~ T_1 <: Float ~~~~~~~~~~ T_2 <: Float ~~~~~~~~~~ T = T_1 \sqcup T_2 }{ \Gamma ~\vdash~ (t_1 * t_2) ~:~ T }\]Comparisons 5:

- 5
We avoid unnecessary conversions in this rule, in light of the fact that many implementations use different representations for integer and floating-point values. Of course, such an implementation would still require a conversion when the operand types are different.

\[\frac{ \Gamma ~\vdash~ t_1 ~:~ T_1 ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ T_2 ~~~~~~~~~~ T_1 <: Float ~~~~~~~~~~ T_2 <: Float }{ \Gamma ~\vdash~ (t_1 <= t_2) ~:~ Bool }\]Logic:

\[\frac{ \Gamma ~\vdash~ t_1 ~:~ Bool ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ Bool }{ \Gamma ~\vdash~ (t_1 ~\textbf{and}~ t_2) ~:~ Bool }\]\[\frac{ \Gamma ~\vdash~ t ~:~ Bool }{ \Gamma ~\vdash~ \textbf{not}~ t ~:~ Bool }\]Conditionals:

\[\frac{ \Gamma ~\vdash~ t_1 ~:~ Bool ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ T_2 ~~~~~~~~~~ \Gamma ~\vdash~ t_3 ~:~ T_3 ~~~~~~~~~~ T = T_2 \sqcup T_3 }{ \Gamma ~\vdash~ (\textbf{if}~ t_1 ~\textbf{then}~ t_2 ~\textbf{else}~ t_3) ~:~ T }\]Variables:

\[\frac{ v : T \in \Gamma }{ \Gamma ~\vdash~ v ~:~ T }\]\(\bf let\):

\[\frac{ \Gamma ~\vdash~ t_1 ~:~ T_1 ~~~~~~~~~~ \Gamma, v : T_1 ~\vdash~ t_2 ~:~ T_2 }{ \Gamma ~\vdash~ (\textbf{let}~ v = t_1 ~ \textbf{in}~ t_2) ~:~ T_2 }\]Function abstraction and application:

\[\frac{ \Gamma, v : T_1 ~\vdash~ t_2 ~:~ T_2 }{ \Gamma ~\vdash~ (\textbf{lambda}~ v : T_1 .~ t_2) ~:~ T_1 \rightarrow T_2 }\]\[\frac{ \Gamma ~\vdash~ t_1 ~:~ T_2 \rightarrow T_3 ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ T_2 }{ \Gamma ~\vdash~ (t_1~ t_2) ~:~ T_3 }\]