We now turn our attention to theoretical foundations of programming
languages and the meaning of code. These foundations are crucial to
understanding how languages, programs, and their implementations work.
We start by examining lambda calculus, the mathematical foundation of
functional programming, and use it to reason about how to construct
abstractions and model computations. Its simplicity allows us to
understand every detail about how it works, yet it is general enough
to enable the expression of arbitrary computations.
Lambda calculus (also \(\lambda\)-calculus), introduced by Alonzo
Church in the 1930s, is a model of computation based on functions. All
functions in lambda calculus are anonymous, providing the inspiration
for lambda expressions in modern programming languages.
Lambda calculus is composed of only three elements: variables,
function abstraction, and function application. Function abstraction
is the process of defining a new function through a lambda
(\(\lambda\)) expression. The following is a context-free grammar
for \(\lambda\)-calculus:
We will use individual letters, such as \(x\), to denote
variables. Function application is left associative and has higher
precedence than abstraction, and we will use parentheses where the
default associativity and precedence do not give the intended
grouping. All functions
have exactly one parameter, and functions that would otherwise have
multiple parameters must be curried.
Since function application is left associative, a sequence of
applications such as \(f~g~h\) is equivalent to \(((f~g)~h)\).
And since function application has higher precedence than abstraction,
abstraction extends as far to the right as possible. Consider the
following example:
\[\lambda x .~ x~ \lambda y .~ x~ y~ z\]
The \(\lambda x\) introduces a function abstraction, which extends
as far right as possible:
\[\lambda x .~ \underline{x~ \lambda y .~ x~ y~ z}\]
Thus, this is equivalent to
\[\lambda x .~ (x~ \lambda y .~ x~ y~ z)\]
Then within the parentheses, the \(\lambda y\) introduces a new
abstraction, which now extends as far right as possible, to the point
of the existing closing parenthesis:
\[\begin{split} &\lambda x .~ (x~ \lambda y .~ \underline{x~ y~ z})\\
=~ &\lambda x .~ (x~ \lambda y .~ (x~ y~ z))\end{split}\]
Finally, within the body of the inner abstraction, we have a sequence
of function applications, which are left associative:
\[\lambda x .~ (x~ \lambda y .~ ((x~ y)~ z))\]
Using the syntax of Scheme, the following is a representation of
the function above:
(lambda (x) (x (lambda (y) ((x y) z))))
(This is merely for illustration. Function semantics are different
between Scheme and \(\lambda\)-calculus, so using this syntax is
not meant to imply an equivalence.)
The following is the identity function:
\[\lambda x .~ x\]
The function takes in an argument, binds it to the parameter
\(x\), and immediately returns it.
Functions themselves are first-class values, so they can be bound to
parameters and returned. The following is a function that discards its
input and returns the identity function:
\[\lambda y .~ \lambda x .~ x\]
Since abstraction extends as far to the right as possible, this is
equivalent to the following parenthesization:
\[\lambda y .~ (\lambda x .~ x)\]
As another example, the following function takes in another function
as its argument and applies it to the identity function:
\[\lambda f .~ f~ \lambda x .~ x\]
In \(\lambda\)-calculus, functions are statically scoped. The
result is that in \(\lambda x .~ E\), \(x\) is bound in
\(E\), and \(E\) is the scope of \(x\). If the same name
is introduced multiple times within nested scopes, then use of the
name resolves to the closest abstraction that introduced it. The
following illustrates these rules:
The first abstraction introduces the name \(x\), so the scope of
\(x\) is the body of the first abstraction. Thus, when \(x\)
appears within the second abstraction, it resolves to the parameter of
the first abstraction. The second abstraction itself introduces the
name \(y\), so use of the name within its body resolves to the
associated parameter. Finally, the third abstraction reintroduces the
name \(x\), so \(x\) within its body resolves to the closest
introduction, i.e. the parameter of the third abstraction.
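As a loose illustration in Python, which is also statically scoped, the
following sketch (our own, not drawn from the expression above) shows a
nested lambda reusing the name x; the inner use of x resolves to the
closest binding:

outer = lambda x: (lambda y: (lambda x: x)(y + 1))(x)
# In the innermost lambda, x resolves to its own parameter, not the outer x.
print(outer(10))   # 11: the inner x is bound to y + 1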
An unbound variable is allowed to appear in an expression, and such a
variable is called free. For example, in \(\lambda y .~ x~ y\),
\(x\) is a free variable in the expression \(x~ y\) but
\(y\) is bound. In \(\lambda x .~ \lambda y .~ x~ y\), both
\(x\) and \(y\) are bound in the expression \(\lambda y
.~ x~ y\). Free variables are useful for reasoning about subexpressions
such as \(\lambda y.~ x~ y\) in isolation without needing to
consider the full context in which the subexpression appears.
In the expression \(\lambda x .~ E\), replacing all occurrences of
\(x\) with another variable \(y\) does not affect the meaning
as long as \(y\) does not occur in \(E\). For example,
\(\lambda y .~ y\) is an equivalent expression of the identity
function. This process of variable replacement is called
\(\alpha\)-reduction, and we denote this replacement process
as follows:
The expressions \(\lambda x .~
x\) and \(\lambda y .~ y\) are \(\alpha\)-equivalent, and
we denote this equivalence as follows:
\[\lambda x.~ x ~=_\alpha~ \lambda y.~ y\]
In function application, \(\alpha\)-reduction is used to ensure
that names are restricted to the appropriate scope. This translation
has the same effect as environments in an interpreter. As an example,
consider applying the identity function to itself:
\[(\lambda x.~ x)~ (\lambda x.~ x)\]
First, we apply \(\alpha\)-reduction on the argument to ensure
that variables in the argument are distinct from those in the function
being applied:
We then replace each occurrence of the parameter with the argument
expression in the body of the function being applied. The result is
the body itself after this substitution process:
This argument-substituting procedure is called \(\beta\)-reduction, and it is similar to the call-by-name argument-passing
convention in programming languages. We denote \(\beta\)-reduction
as follows:
This expression is itself \(\alpha\)-equivalent to the identity
function, and the original expression \((\lambda x.~ x) (\lambda
x.~ x)\) is \(\beta\)-equivalent to the identity function since
it \(\beta\)-reduces to the same expression as the identity
function:
In the first function application, the variable names are already
distinct, so no \(\alpha\)-reduction is necessary. We can then
apply \(\beta\)-reduction to obtain:
Evaluating an expression in \(\lambda\)-calculus applies
\(\beta\)-reduction as long as possible, until the expression is
in normal form. Not all evaluations terminate. Consider a function
abstraction that applies an argument to itself:
\[\lambda x.~ x~ x\]
If we apply this to the identity function, we get:
Function application in \(\lambda\)-calculus is similar to
call by name in that the argument is not
evaluated before the function is applied. Instead, the argument
expression is substituted for the parameter directly in the body. This
results in lazy evaluation, where the argument expression is not
evaluated unless it is needed. As an example, consider the following:
The argument expression is a non-terminating computation, so if we
were to evaluate it prior to substitution, the computation as a whole
would not terminate. Instead, \(\lambda\)-calculus specifies
that the substitution happens first:
Since the parameter \(y\) does not appear in the body, the
argument expression is eliminated once the argument substitution is
made. Thus, the computation terminates, and its end result is the
identity function.
There is an important distinction between the evaluation process in
\(\lambda\)-calculus and call by name. In the former, function
bodies are reduced to normal form before the function is applied. This
is referred to as normal-order evaluation. By contrast, call by name
performs argument substitution before manipulating the body of the
function. The following illustrates normal-order evaluation:
Before the function on the left is applied, its body is reduced, which
involves applying the function \(\lambda y.~ y~ y\) to its
argument \(x\). This results in the expression \(x\), so the
function on the left becomes \(\lambda x.~ x~ x\). This is in
normal form, so the function can now be applied to its argument.
Further \(\alpha\)- and \(\beta\)-reductions result in the
final value of the identity function.
Summarizing the evaluation rules for a function application
\(f~x\), we have the following:
Reduce the body of the function \(f\) until it is in normal
form \(f_{normal}\).
If a bound-variable name appears in both \(f_{normal}\) and
\(x\), then perform \(\alpha\)-reduction on \(x\) so
that this is no longer the case [1], obtaining \(x_\alpha\).
Perform \(\beta\)-reduction by substituting \(x_\alpha\)
for the parameter of \(f_{normal}\) in the body of the latter.
The result of this reduction is the substituted body itself.
Proceed to reduce the substituted body until it is in normal form.
If a variable is free in \(f\) but bound in \(x\) or vice
versa, then \(\alpha\)-reduction must be applied in step 2 to
rename the bound variable. Thus:
Lambda calculus consists solely of variables and functions, and we can
apply \(\beta\)-reduction to substitute functions for variables.
However, none of the familiar values exist directly in
\(\lambda\)-calculus, such as integers or booleans. It is thus
surprising that \(\lambda\)-calculus can model any computational
process. We demonstrate this by encoding values as functions.
To start with, let us define an abstraction for the booleans
\(true\) and \(false\). The only building block we have to
work with is functions, and we need to ensure that the functions that
represent the two values are not \(\beta\)-equivalent so that we
can distinguish between them. There are many ways we can do so, but
the one we use is to define \(true\) and \(false\) as
functions that take two values and produce either the first or the
second value:
The \(=\) sign here means that we take this as a mathematical
definition; it does not denote assignment. Since all functions in
\(\lambda\)-calculus must take a single argument, the actual
definitions of \(true\) and \(false\) are curried. Applying \(true\) to two values results in the first:
Here, we use \(\rightarrow\) on its own to denote some sequence of
\(\alpha\)- and \(\beta\)-reductions. Applying \(and\)
to \(true\) and any other boolean results in the second boolean,
while applying \(or\) to \(true\) and another boolean always
results in \(true\). Similarly:
Applying \(and\) to \(false\) and some other boolean always
results in \(false\), while applying \(or\) to \(false\)
and another boolean results in the second boolean. Finally,
\(not\) works as follows:
If the condition \(p\) is \(true\), then applying \(p\) to
\(a\) and \(b\) results in \(a\), since \(true\)
selects the first of two values. On the other hand, if \(p\) is
\(false\), then applying \(p\) to \(a\) and \(b\)
results in \(b\), since \(false\) selects the second of two
values.
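As an illustration, Python lambdas can mirror these curried
definitions. In the following sketch, the underscored names (true_,
false_, and so on) are ours, chosen to avoid Python keywords, and the
particular formulas for and, or, and not are one standard choice that
matches the behavior described above; the text's own definitions may be
written differently:

true_  = lambda a: lambda b: a        # selects the first of two values
false_ = lambda a: lambda b: b        # selects the second of two values

and_ = lambda p: lambda q: p(q)(p)    # if p is true, result is q; else p (false)
or_  = lambda p: lambda q: p(p)(q)    # if p is true, result is p (true); else q
not_ = lambda p: lambda a: lambda b: p(b)(a)   # swap the two choices
if_  = lambda p: lambda a: lambda b: p(a)(b)   # p selects a or b

to_bool = lambda p: p(True)(False)    # convert back to a Python bool
print(to_bool(and_(true_)(false_)))   # False
print(to_bool(or_(false_)(true_)))    # True
print(to_bool(not_(false_)))          # True
print(if_(true_)(1)(2))               # 1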
In order to represent structured data, we need an abstraction for a
pair of two values. As with booleans, the only mechanism at our
disposal is functions, so we need to produce a “container” function
that holds the two values within its body:
The \(pair\) constructor takes two items \(x\) and \(y\)
and produces as a result a function that contains \(x\) and
\(y\) in its body. Applying \(pair\) to two concrete items
\(a\) and \(b\) results in:
In order to obtain the first item \(a\) above, we can substitute
\(true\) for \(f\), so that \(f~a~b\) evaluates to
\(a\). Similarly, to obtain the second item, we can substitute
\(false\) for \(f\). This leads to the following definitions
of the \(first\) and \(second\) selectors:
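In Python, the same encoding might be sketched as follows; the names
are ours, and first and second select with the Church encodings of
\(true\) and \(false\) as described above:

pair   = lambda x: lambda y: lambda f: f(x)(y)   # hold x and y, await a selector
first  = lambda p: p(lambda a: lambda b: a)      # select with true
second = lambda p: p(lambda a: lambda b: b)      # select with false

p = pair(3)(4)
print(first(p), second(p))   # 3 4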
A number \(n\) is a higher-order function that, given another
function \(f\), produces a new function that applies \(f\) to
its argument \(n\) times in succession. Using the mathematical
notation \(f^k\) to denote the composition of \(f\) with
itself \(k\) times, e.g. \(f^3 = f \circ f \circ f\), the
Church numeral \(n\) is a function that takes \(f\) and
produces \(f^n\).
As a concrete example, the \(right\) function above applies the
\(second\) function twice to its argument, so we can define it
instead as:
Given a number, \(incr\) produces a new one that applies a
function to an argument one more time than the original number. Thus,
where \(n\) turns its input \(f\) into \(f^n\), the result
of \(incr~ n\) turns its input \(f\) into \(f^{n+1}\).
This is accomplished by first applying \(n~ f\), which is
equivalent to \(f^n\), and then applying \(f\) one more time.
For example:
Here, we perform \(m\) additions of \(n\), starting at zero,
resulting in the product of \(m\) and \(n\).
We can define exponentiation using the same pattern. Decrement and
subtraction are a little more difficult to define, but are possible.
Finally, we need a predicate to determine when a number is zero:
We apply a number to a function that returns \(false\) and a
starting value of \(true\). Only if the function is never applied
is the result \(true\); otherwise, it is \(false\):
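A Python sketch of Church numerals follows the same pattern. The
definitions of add, mul, and is_zero below are one standard encoding
consistent with the descriptions above, though the text's formulas may
differ in form; to_int and to_bool are conversion helpers of our own,
used only for inspection:

true_  = lambda a: lambda b: a
false_ = lambda a: lambda b: b

zero  = lambda f: lambda x: x                        # apply f zero times
incr  = lambda n: lambda f: lambda x: f(n(f)(x))     # apply f one extra time
add   = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul   = lambda m: lambda n: lambda f: m(n(f))        # m compositions of f^n
is_zero = lambda n: n(lambda _: false_)(true_)

to_int  = lambda n: n(lambda k: k + 1)(0)
to_bool = lambda p: p(True)(False)

one, two = incr(zero), incr(incr(zero))
print(to_int(add(two)(one)))        # 3
print(to_int(mul(two)(two)))        # 4
print(to_bool(is_zero(zero)), to_bool(is_zero(two)))   # True False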
Church numerals allow us to perform bounded repetition, but in order
to express arbitrary computation, we need a mechanism for unbounded
repetition. Since \(\lambda\)-calculus only has functions,
recursion is a natural mechanism for repetition.
In recursion, a function needs to be able to refer to itself by name.
However, in \(\lambda\)-calculus, the only way to introduce a
name is as a function parameter. Thus, a recursive function must take
itself as input. For example, the following defines a factorial
function:
Further evaluation results in the factorial of \(m\). Performing
the analogous operation in Python:
>>> apply = lambda g: g(g)
>>> apply(factorial)(4)
24
The \(apply\) function can be generalized as the following
function in \(\lambda\)-calculus, known as a fixed-point
combinator and, by convention, the Y combinator:
Notice that this is the same as \(factorial\), except that we have
not passed the input function to itself in the recursive application.
If we apply the Y combinator to \(F\) and apply the result to a
number, we get:
Thus, we see that applying the Y combinator to \(F\) results in a
recursive \(factorial\) function, and the Y combinator enables us
to write recursive functions in a simpler manner.
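The Python demonstration above works because \(factorial\) explicitly
receives itself as an argument. Under Python's eager evaluation, the Y
combinator written directly would loop forever, so the following sketch
uses the eta-expanded variant, often called the Z combinator, to the
same effect; the helper name F is ours:

# Z combinator: a fixed-point combinator usable under eager evaluation.
# The inner lambda v delays the self-application x(x).
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# F takes "the rest of the recursion" as its argument, as in the text.
F = lambda fact: lambda n: 1 if n == 0 else n * fact(n - 1)

factorial = Z(F)
print(factorial(4))   # 24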
Lambda calculus models functional programming in its simplest and
purest form, and its ability to encode data and perform recursion
demonstrates the power of functional programming. It is not the only
model for computation, however. Perhaps the most famous model is the
Turing machine, described by Alan Turing around the same time as
Church’s work on \(\lambda\)-calculus. The Turing model is
imperative at its core, and it is more closely related to the workings
of modern machines than \(\lambda\)-calculus.
Many variants of Turing machines have been defined, but the following
is a description of one variant:
A tape device is used for storage, divided into individual cells
in a linear layout. Each cell contains a symbol from a finite
alphabet. The tape extends infinitely in both left and right
directions.
A head reads and writes symbols from the tape. It can be moved one
step at a time to the right or left.
A state register keeps track of the state of the machine. There
are a finite number of states the machine can be in, including
special start and halt states.
A table of instructions specifies what the machine is to do for
each combination of state and symbol. Since the sets of states and
symbols are finite, the instruction table is also finite. At each
step in the computation, the machine looks up the current state and
the symbol currently under the head in the table and follows the
specified instruction.
An instruction can either halt the machine, ending computation, or
do the following:
Write a specific symbol at the current position of the head.
Move the head either one step to the left or the right.
Go to a specified new state.
Analogizing with imperative programming, each instruction in a Turing
machine can be considered a statement, and each statement transfers
control to a new one in a manner similar to a goto.
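As a concrete illustration of such an instruction table, the following
Python sketch simulates a Turing machine whose tape is a dictionary
from positions to symbols; the example machine, which appends a 1 to a
unary count, and all names are our own:

def run(table, tape, state="start", pos=0, blank="_"):
    # The tape is a dict from position to symbol; missing cells hold the blank.
    while True:
        symbol = tape.get(pos, blank)
        action = table[(state, symbol)]
        if action == "halt":
            return tape
        write, move, state = action          # write a symbol, move, change state
        tape[pos] = write
        pos += 1 if move == "R" else -1

# Example machine: scan right past the 1s, then write one more 1.
table = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "R", "done"),
    ("done",  "_"): "halt",
}
print(run(table, {0: "1", 1: "1"}))          # {0: '1', 1: '1', 2: '1'}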
Despite the vastly different model of computation, Alan Turing proved
that a Turing machine can solve exactly the same problems as
\(\lambda\)-calculus. This suggests that both models encompass
all of computation, a conjecture formalized in the Church-Turing
thesis. The thesis states that a function is computable by a human
following an algorithm if and only if it is computable by a Turing
machine or, equivalently, by an expression in \(\lambda\)-calculus.
All known models of computation have been shown to be either
computationally equivalent to or weaker than Turing machines.
Equivalent models are said to be Turing complete. A programming
language also defines a model of computation, and all general-purpose
programming languages are Turing complete, whether they follow a
functional paradigm, an imperative one, or an alternative approach.
As mentioned previously, semantics is concerned with the meaning of
code fragments, as opposed to syntax, which is concerned with their
structure. We have seen that syntax can be formally described with
regular expressions and context-free grammars. Semantics can also be
described formally, and there are a number of approaches.
Denotational semantics specifies program behavior using set and
domain theory, with program fragments described as partial functions
over program state. Axiomatic semantics is concerned with proving
logical assertions over program state, so it specifies the meaning of
each construct with respect to its effect on these logical assertions.
Operational semantics specifies what each computational step does to
the state of a program, and what value is computed in each step.
Operational semantics describes more closely than denotational or
axiomatic semantics what an interpreter for a language must do at each
step.
In this section, we will examine a form of operational semantics known
as structured operational semantics, and more specifically,
natural or big-step semantics. This form of semantics is
particularly well-suited to implementation in a recursive interpreter.
We specify rules for how the computation evolves for each syntactic
construct in a programming language. We will begin our exploration
with a simple imperative language.
Consider a simple imperative language with variables, integers,
booleans, statements, conditionals, and loops. The following is a
context-free grammar that describes this language:
In order to avoid ambiguities, arithmetic and boolean expressions are
parenthesized where necessary, and conditionals and loops end with the
end keyword. The skip statement simply does nothing, and it is
equivalent to the empty statement in many languages. It allows us to
write conditionals that do nothing in a branch. Variables consist of
any identifier, and we will use combinations of letters and numbers to
denote them. Any integer can be used as a number literal.
The state of a program consists of a mapping from variables to
values, and we will use the lowercase Greek sigma (\(\sigma\)) to
denote a state. In our simple language, variables only hold integer
values, and the value of a variable \(v\) is specified as
\(\sigma(v)\). In the initial state, the value of each variable is
undefined. We use the notation
\[\sigma[v := n]\]
to denote a state where the value of the variable \(v\) has
value \(n\), but the remaining variables have the same value as in
\(\sigma\). Formally, we have
\[\begin{split}\sigma[v := n](w) =
\begin{cases}
n, &\textrm{if}~ v = w\\
\sigma(w), &\textrm{if}~ v \neq w\\
\end{cases}\end{split}\]
A transition denotes the result of a computation:
\[\langle s, \sigma\rangle\Downarrow\langle u, \sigma'\rangle\]
The left-hand side is the combination of a program fragment \(s\)
and an initial state \(\sigma\). The right-hand side consists of a
value \(u\) and a new state \(\sigma'\). The transition as a
whole denotes that \(s\), when computed in the context of state
\(\sigma\), results in the value \(u\) and a new state
\(\sigma'\). If the computation does not produce a value, then no
value appears on the right-hand side:
\[\langle s, \sigma\rangle\Downarrow \sigma'\]
Similarly, if the computation does not result in a new state, then the
state may be elided from the right-hand side:
\[\langle s, \sigma\rangle\Downarrow u\]
In big-step operational semantics, a transition may only result in a
value and/or state. Program fragments may not appear on the right-hand
side of a transition. Thus, a transition specifies the complete result
of computing a program fragment.
We specify computation in the form of transition rules, also called
derivation rules. They have the following general form:
Only transitions may appear at the top or bottom of a rule. The top of
a rule is called the premise, and the bottom the conclusion. It
should thus be read as a conditional rule: if program fragment
\(s_1\), when computed in state \(\sigma_1\), evaluates to
value \(u_1\) in state \(\sigma_1'\), \(\dots\), and
\(s_k\), when computed in state \(\sigma_k\), evaluates to
\(u_k\) in state \(\sigma_k'\), then fragment \(s\) in
state \(\sigma\) can evaluate to \(u\) in state
\(\sigma'\). If a computation does not affect the state of the
program, then the state may be elided from the right-hand side of a
transition. Similarly, if a computation does not result in a value, as
in the execution of a statement, then the right-hand side of a
transition will not include a value.
A transition rule prescribes how to perform a computation in an
interpreter. A particular program fragment \(p\) can be
interpreted by finding a transition rule where \(p\) appears in
the conclusion and performing the computations listed in the premise
of the rule. The results of these smaller computations are then
combined as specified in the rule to produce the result of program
fragment \(p\). If more than one rule has \(p\) in its
conclusion, then the interpreter is free to choose which of the rules
to apply. Each computational step in a program applies a transition
rule, and a program terminates when no more transition rules can be
applied.
Expressions are generally used for the values to which they evaluate,
and in our language, expressions do not have side effects. As a
result, the right-hand side of transitions will not include a new
state in most of the rules we define below.
In evaluating \((a_1 + a_2)\) in state \(\sigma\), if
\(a_1\) evaluates to \(n_1\) in \(\sigma\) and \(a_2\)
to \(n_2\), then \((a_1 + a_2)\) evaluates to the sum of
\(n_1\) and \(n_2\). Similarly for subtraction and
multiplication.
The process of evaluating a compound expression results in a
derivation tree starting with axioms. For example, consider the
evaluation of \(((x + 3) * (y - 5))\), where \(x\) and
\(y\) are variables with values 1 and 2, respectively, in state
\(\sigma\). The full derivation tree is as follows:
In this tree, we’ve applied transition rules to each subexpression to
get from axioms to the conclusion that \(((x + 3) * (y - 5))\)
evaluates to -12 in \(\sigma\).
The tree above demonstrates how computation could proceed in an
interpreter. The program fragment \(((x + 3) * (y - 5))\) has the
form \((a_1 * a_2)\), where \(a_1 = (x + 3)\) and \(a_2 =
(y - 5)\). The interpreter would thus apply the rule for
multiplication, which in turn requires computing \((x + 3)\) and
\((y - 5)\). The former has the form \((a_1 + a_2)\), so the
interpreter would apply the rule for addition, which itself requires
the computation of \(x\) and \(3\). The former is a variable,
so applying the rule for a variable results in the value \(1\),
while the latter is an integer literal, which evaluates to the value
\(3\) that it represents. Thus, the addition \((x + 3)\)
evaluates to the value \(4\). Repeating the same process for the
expression \((y - 5)\) results in the value \(-3\), so the
full program fragment evaluates to \(-12\).
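Each transition rule corresponds to a case in a recursive interpreter.
The following Python sketch evaluates the side-effect-free arithmetic
expressions of our language in a state represented as a dictionary; the
tuple-based representation of expressions is our own:

def eval_arith(expr, sigma):
    # An expression is an int literal, a variable name (str), or (op, a1, a2).
    if isinstance(expr, int):                # a literal evaluates to its value
        return expr
    if isinstance(expr, str):                # a variable evaluates to sigma(v)
        return sigma[expr]
    op, a1, a2 = expr                        # evaluate both operands, then combine
    n1, n2 = eval_arith(a1, sigma), eval_arith(a2, sigma)
    return {"+": n1 + n2, "-": n1 - n2, "*": n1 * n2}[op]

sigma = {"x": 1, "y": 2}
print(eval_arith(("*", ("+", "x", 3), ("-", "y", 5)), sigma))   # -12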
If expressions may have side effects, then transitions must include a
new state, and we need to consider the order of evaluation of
operands. The following rule specifies that the left-hand operand of
an addition must be evaluated before the right-hand operand:
In this rule, we’ve specified that the first operand is to be
evaluated in the original state, while the second operand is to be
evaluated in the new state produced by evaluating the first operand.
The final state is the new state produced by evaluating the second
operand.
If, on the other hand, we choose to allow operands to be evaluated in
either order, but require that they be evaluated in some order, we
can introduce a second rule for addition that enables the evaluation
to be done in reverse order:
Now, in evaluating \((a_1 + a_2)\), we can apply either rule to
get either order of evaluation. Thus, implementations are now free to
evaluate operands in either order.
Notice that this rule does not short circuit: it requires both
operands of \(\textbf{and}\) to be evaluated. If we want short
circuiting, we can use the following rules for conjunction instead:
Here, the right-hand side need only be evaluated when the left-hand
side is true. An interpreter, upon encountering a conjunction, would
evaluate the left-hand operand. If the result is false, the first rule
must be applied, but if it is true, then the second rule must apply.
Statements in imperative programs are generally used for their side
effects, so they change the state of the program. In our language,
statements do not have a value. In our transition rules below, the
right-hand side of a transition will be a new state, representing the
state that results from completely executing the statement:
\[\langle s, \sigma\rangle \Downarrow \sigma'\]
The intended meaning of such a transition is that executing statement
\(s\) in state \(\sigma\) terminates in a new state
\(\sigma'\). Not all statements terminate; a statement that does
not terminate will not yield a final state through any sequence of
transition rules.
The \(\textbf{skip}\) statement terminates with no effect on the
state:
Assignment produces a new state such that the given variable now has
the value of the given expression:
\[\frac{
\langle a, \sigma\rangle \Downarrow n
}{
\langle v = a, \sigma\rangle \Downarrow \sigma[v := n]
}\]
As described in States and Transitions, the notation
\(\sigma[v := n]\) denotes a state where variable \(v\) has
the value \(n\), but all other variables have the same value as in
\(\sigma\). Thus, the assignment \(v = a\) produces a new
state where \(v\) has the value that is the result of evaluating
\(a\), but the remaining variables are unchanged.
Sequencing ensures that the second statement executes in the new state
produced from executing the first:
If the test evaluates to true, then the first rule applies, executing
the then statement. If the test is false, on the other hand, the
second rule applies, executing the else statement.
On the other hand, a loop whose predicate is true has the same effect
as executing the body and then recursively executing the loop in the
resulting state:
The following demonstrates the execution of the terminating loop
\(\textbf{while}~ (x <= 2)~ \textbf{do}~ x = (x + 1)~
\textbf{end}\), with \(x\) having an initial value of 1. Applying a
single transition rule for \(\textbf{while}\), along with fully
evaluating the predicate and executing one iteration of the body,
yields:
This implies that the final state is \(\sigma' = \sigma[x := 3]\),
so the result of the \(\textbf{while}\) loop is that \(x\) now
has value 3.
As an example of a non-terminating or divergent computation,
consider the loop \(\textbf{while}~ \textbf{true}~ \textbf{do}~
\textbf{skip}~ \textbf{end}\). Applying the transition rule for
\(\textbf{while}\) results in:
In order to execute the \(\textbf{while}\) in the premise, we need
to recursively apply the same transition rule, producing the same
result. This repeats forever, resulting in a divergent computation.
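A recursive interpreter for statements follows the same structure,
computing the new state for each construct; the \(\textbf{while}\) case
executes the body and then recurses on the loop, so a divergent loop
simply never returns. The following sketch reuses eval_arith from the
arithmetic sketch above, and the representation and the helper
eval_test are ours:

def eval_test(test, sigma):                  # just enough booleans for the example
    _, a1, a2 = test                         # test has the form ("<=", a1, a2)
    return eval_arith(a1, sigma) <= eval_arith(a2, sigma)

def exec_stmt(stmt, sigma):
    kind = stmt[0]
    if kind == "skip":                       # <skip, sigma> => sigma
        return sigma
    if kind == "assign":                     # <v = a, sigma> => sigma[v := n]
        _, v, a = stmt
        return {**sigma, v: eval_arith(a, sigma)}
    if kind == "seq":                        # run s1, then s2 in the resulting state
        _, s1, s2 = stmt
        return exec_stmt(s2, exec_stmt(s1, sigma))
    if kind == "if":
        _, test, then, els = stmt
        return exec_stmt(then if eval_test(test, sigma) else els, sigma)
    if kind == "while":                      # execute the body, then the loop again
        _, test, body = stmt
        if not eval_test(test, sigma):
            return sigma
        return exec_stmt(stmt, exec_stmt(body, sigma))
    raise ValueError(kind)

loop = ("while", ("<=", "x", 2), ("assign", "x", ("+", "x", 1)))
print(exec_stmt(loop, {"x": 1}))             # {'x': 3}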
Operational semantics allows us to reason about the execution of
programs, specify equivalences between program fragments, and prove
statements about programs. As an example, the following rule specifies
an equivalence between two forms of define in Scheme:
In Scheme, an expression produces a value but may also have side
effects, so the right-hand side of a transition includes a new state.
The rule above states that if the expression \((\texttt{define}~
f~ (\texttt{lambda}~ (params)~ body))\) results in a particular value
and new state, then the expression \((\texttt{define}~ (f~
params)~ body)\) evaluates to the same value and new state. Thus, an
interpreter could handle the latter define form by translating it
to the former and proceeding to evaluate the translated form.
As another example, in our simple language above, we can demonstrate
that swapping operands in an addition is a legal transformation, since
\(x + y\) and \(y + x\) always evaluate to the same value:
\[\begin{split}\frac{
\langle x, \sigma\rangle \Downarrow n_x ~~~
\langle y, \sigma\rangle \Downarrow n_y
}{
\langle(x + y), \sigma\rangle \Downarrow n
} ~~~ &\textrm{where}~ n = n_x + n_y\\\\
\frac{
\langle y, \sigma\rangle \Downarrow n_y ~~~
\langle x, \sigma\rangle \Downarrow n_x
}{
\langle(y + x), \sigma\rangle \Downarrow n
} ~~~ &\textrm{where}~ n = n_x + n_y\end{split}\]
As another example, we proceed to develop operational semantics for
lambda calculus. Computation in \(\lambda\)-calculus does not
involve a state that maps variables to values. Thus, transitions have
the following form, with neither a state on the left-hand nor on the
right-hand side:
\[e_1 \Downarrow e_2\]
An expression \(e\) that is in normal form evaluates to itself:
Here, \(v\) denotes a variable, while \(e\) and \(e_i\)
denote arbitrary expressions. A variable is always in normal form,
while a function abstraction is in normal form exactly when its body
is in normal form. For a function application, if the left-hand side
is a variable or application in normal form, then the overall
expression is in normal form. On the other hand, if the left-hand side
is an abstraction, then a \(\beta\)-reduction can be applied, so
the application is not in normal form.
A function abstraction that is not in normal form is evaluated by
evaluating its body:
In a function application, a \(\beta\)-reduction involves
substituting the argument expression for the function's parameter in
the body of the function, then evaluating the substituted body. Assuming that
no variable names are shared between the function and its argument,
the following rule specifies this computation:
The \(subst(body, var, arg)\) transformer performs substitution of
an expression \(arg\) for a variable \(var\) in some larger
expression \(body\). It can be defined as follows:
A variable is substituted with the argument expression if it is the
same as the variable being replaced. Otherwise, substitution has no
effect on the variable.
For a function abstraction, if the function’s parameter has the same
name as the substitution variable, then all references to that name
within the body of the function refer to the parameter rather than the
substitution variable. Thus, the body should remain unchanged. On the
other hand, if the parameter name is different, then the body itself
should recursively undergo substitution.
Finally, applying substitution to a function application recursively
applies it to both the function and its argument.
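The following Python sketch of subst uses a tuple-based representation
of \(\lambda\)-calculus expressions that is our own: a variable is a
string, an abstraction is ("lam", param, body), and an application is
("app", func, arg):

def subst(body, var, arg):
    # Substitute arg for occurrences of var in body.
    if isinstance(body, str):                      # variable
        return arg if body == var else body
    if body[0] == "lam":                           # abstraction
        _, param, inner = body
        if param == var:                           # var is shadowed: unchanged
            return body
        return ("lam", param, subst(inner, var, arg))
    _, f, a = body                                 # application: recurse on both
    return ("app", subst(f, var, arg), subst(a, var, arg))

# (lambda y. y x) with x replaced by a  ==>  (lambda y. y a)
print(subst(("lam", "y", ("app", "y", "x")), "x", "a"))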
The transition rule above for \(\beta\)-reduction assumes that no
\(\alpha\)-reduction is necessary between a function and its
argument. However, \(\alpha\)-reduction becomes necessary in the
following cases:
The argument contains a bound variable with the same name as a bound
or free variable in the function. The following are examples:
The function contains a bound variable with the same name as a free
variable in the argument. The following is an example:
\[(\lambda x.~ \lambda y.~ x~ y)~ y\]
In the first case, our convention is to apply
\(\alpha\)-reduction to the argument, while in the second, we are
forced to \(\alpha\)-reduce the function.
Thus, we need to determine the bound and free variables of an
expression. We first define \(boundvars(expr)\) to collect the
bound variables of an expression \(expr\):
A variable on its own contributes no bound variables. The bound
variables of a function application are the union of the bound
variables in the function and its argument. The bound variables of a
function abstraction are the bound variables of the body plus the
parameter of the function itself.
In order to determine the free variables of an expression, we require
as input the set of variables that are bound when the expression is
encountered. We define \(freevars(bound, expr)\) as follows:
A variable is free if it is not included in the bound set. The free
variables of a function application are the union of the free
variables in the function and its argument. The free variables of a
function abstraction are the free variables in the body, using a bound
set that includes the parameter of the abstraction.
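Using the same representation as the subst sketch above, boundvars and
freevars might be written as follows:

def boundvars(expr):
    # The set of variables bound by some abstraction in expr.
    if isinstance(expr, str):                       # a variable binds nothing
        return set()
    if expr[0] == "lam":                            # parameter plus body's bound vars
        _, param, body = expr
        return {param} | boundvars(body)
    _, f, a = expr                                  # union over an application
    return boundvars(f) | boundvars(a)

def freevars(bound, expr):
    # The variables of expr that are not in the given bound set.
    if isinstance(expr, str):
        return set() if expr in bound else {expr}
    if expr[0] == "lam":
        _, param, body = expr
        return freevars(bound | {param}, body)
    _, f, a = expr
    return freevars(bound, f) | freevars(bound, a)

expr = ("lam", "x", ("app", "x", "y"))              # lambda x. x y
print(boundvars(expr), freevars(set(), expr))       # {'x'} {'y'}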
We can also define a transformer \(alpha(vars, expr)\) to rename
the bound variables in \(expr\) that occur in the set
\(vars\):
A variable on its own is not bound, so it should not be renamed. A
function application is renamed by renaming both the function and its
argument. For a function abstraction, if the parameter appears in
\(vars\), we replace it with a new name that is fresh, meaning
that it is not used anywhere in the program. This requires applying
substitution to the body, replacing the old variable name with the new
one. We also have to recursively apply renaming to the body, whether
the parameter is replaced or not.
To put this all together, we define a transformer \(beta(func,
arg)\) for performing \(\beta\)-reduction when \(func\) is
applied to \(arg\):
Here, \(alpha_{arg}(func, arg)\) applies renaming to \(arg\)
given the bound and free variables in \(func\), and
\(alpha_{func}(func, arg)\) applies renaming to \(func\) given
the free variables in \(arg\). The result must be an abstraction
of the form \(\lambda v'.~ e'\), so \(beta(func, arg)\)
proceeds to substitute the renamed argument for \(v'\) in the body
\(e'\).
We can now proceed to write a general transition rule for
\(\beta\)-reduction:
We can apply the rules to derive the following computation for
\((\lambda x.~ \lambda y.~ y~ x)~ x~ a\):
\[\cfrac{
\cfrac{
\cfrac{
}{
\lambda y.~ y~ x
\Downarrow
\lambda y.~ y~ x
}
~~~
\cfrac{
}{
beta(\lambda x.~ \lambda y.~ y~ x, x) = \lambda y.~ y~ x
\Downarrow
\lambda y.~ y~ x
}
}{
(\lambda x.~ \lambda y.~ y~ x)~ x
\Downarrow
\lambda y.~ y~ x
}
~~~
\cfrac{
\cfrac{
}{
y~ x
\Downarrow
y~ x
}
~~~
\cfrac{
}{
beta(\lambda y.~ y~ x, a) = a~ x
\Downarrow
a~ x
}
}{
(\lambda y.~ y~ x)~ a
\Downarrow
a~ x
}
}{
(\lambda x.~ \lambda y.~ y~ x)~ x~ a
\Downarrow
a~ x
}\]
Here, we’ve applied the rule for a sequence of function applications,
then applied the \(\beta\)-reduction rule to each of the premises.
The end result is \(a~ x\).
We now turn our attention to type systems and type checking, which
determines whether or not the use of types in a program is correct.
Given a language, we will define rules to determine the type of each
expression in the language. Where the rules do not assign a type for a
particular expression, that expression should be considered erroneous.
We start with a simple language of boolean and integer expressions,
parenthesized where necessary to avoid ambiguity:
The two types in this language are \(Int\) for integer expressions
and \(Bool\) for boolean expressions. We use the notation \(t
~:~ T\) to denote that a term \(t\) has the type \(T\). A
statement of the form \(t ~:~ T\) is often called a typing
relation or type judgment.
The base typing rules assign types to integer and boolean literals:
\[\frac{
}{
IntegerLiteral ~:~ Int
}\]
\[\frac{
}{
\textbf{true} ~:~ Bool
}\]
\[\frac{
}{
\textbf{false} ~:~ Bool
}\]
For more complex expressions, we have derivation rules that are similar
to those in operational semantics, where the top of the rule is the
premise and the bottom the conclusion. The following is the rule
for addition:
\[\frac{
t_1 ~:~ Int ~~~~~~~~~~ t_2 ~:~ Int
}{
(t_1 + t_2) ~:~ Int
}\]
This rule states that if \(t_1\) has type \(Int\), and
\(t_2\) has type \(Int\), then the term \((t_1 + t_2)\)
also has type \(Int\). Thus, the rule allows us to compute the
type of a larger expression from the types of the subexpressions,
as in the following derivation:
\[\cfrac{
\cfrac{
}{
1 ~:~ Int
}
~~~~~~~~~~
\cfrac{
\cfrac{
}{
3 ~:~ Int
}
~~~~~~~~~~
\cfrac{
}{
5 ~:~ Int
}
}{
(3 + 5) ~:~ Int
}
}{
(1 + (3 + 5)) ~:~ Int
}\]
On the other hand, an expression such as \((\textbf{true} + 1)\)
is not well typed: since \(\textbf{true} ~:~ Bool\), the premise
in the rule for addition does not hold, so it cannot be applied to
derive a type for \((\textbf{true} + 1)\). Since no type can be
derived for the expression, the expression does not type check, and it
is erroneous.
The following rules for subtraction and multiplication are similar to
that of addition:
\[\frac{
t_1 ~:~ Int ~~~~~~~~~~ t_2 ~:~ Int
}{
(t_1 - t_2) ~:~ Int
}\]
\[\frac{
t_1 ~:~ Int ~~~~~~~~~~ t_2 ~:~ Int
}{
(t_1 * t_2) ~:~ Int
}\]
The rule for comparison requires that the two operands have type
\(Int\), in which case the type of the overall expression is
\(Bool\):
\[\frac{
t_1 ~:~ Int ~~~~~~~~~~ t_2 ~:~ Int
}{
(t_1 <= t_2) ~:~ Bool
}\]
The rule for conjunction requires that the operands have type
\(Bool\), and the resulting expression also has type \(Bool\).
Negation similarly requires its operand to have type \(Bool\):
\[\frac{
t ~:~ Bool
}{
\textbf{not}~ t ~:~ Bool
}\]
The conditional expression requires the test to have type
\(Bool\). However, the only restrictions on the remaining two
operands are that they are well typed and that they both have the same
type. For example, an expression such as \((\textbf{if}~ test~
\textbf{then}~ 3~ \textbf{else}~ 5)\) will always produce an integer,
regardless of the value of \(test\), while \((\textbf{if}~
test~ \textbf{then}~ \textbf{false}~ \textbf{else}~ \textbf{true})\) will always produce
a boolean. Thus, our typing rule has a type variable \(T\) to
represent the type of the last two operands, ensuring that they match:
\[\frac{
t_1 ~:~ Bool
~~~~~~~~~~
t_2 ~:~ T
~~~~~~~~~~
t_3 ~:~ T
}{
(\textbf{if}~ t_1 ~\textbf{then}~ t_2 ~\textbf{else}~ t_3) : T
}\]
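These rules translate directly into a recursive type checker. The
following Python sketch computes the type of a term, or raises an error
when no rule applies; the tuple representation of terms and the use of
an exception for ill-typed terms are our own choices:

def typeof(t):
    if isinstance(t, bool):                  # check bool before int: True is an int in Python
        return "Bool"
    if isinstance(t, int):
        return "Int"
    op = t[0]
    if op in ("+", "-", "*") and typeof(t[1]) == typeof(t[2]) == "Int":
        return "Int"
    if op == "<=" and typeof(t[1]) == typeof(t[2]) == "Int":
        return "Bool"
    if op == "not" and typeof(t[1]) == "Bool":
        return "Bool"
    if op == "if" and typeof(t[1]) == "Bool" and typeof(t[2]) == typeof(t[3]):
        return typeof(t[2])
    raise TypeError("no typing rule applies to " + repr(t))

print(typeof(("+", 1, ("+", 3, 5))))         # Int
print(typeof(("if", ("<=", 1, 2), 3, 5)))    # Int
# typeof(("+", True, 1)) raises TypeError: no rule applies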
Now that we have typing rules for a simple language of booleans and
integers, we proceed to introduce variables into the language. For the
purposes of typing, we will assume that each variable in a program has
a distinct name. As we saw in lambda calculus, we can rename
variables if necessary so that this is the case.
We introduce the following syntax for a binding construct to the
language:
The semantics of this construct are to replace all occurrences of the
given variable in the body of the \(\bf let\) with the
variable’s bound value. Thus, an expression such as the following
should produce an integer:
\[(\textbf{let}~ x = 3~ \textbf{in}~ (x + 2))\]
On the other hand, the following expression should not type check,
since replacing \(x\) with its bound value results in an ill-typed
body:
\[(\textbf{let}~ x = 3~ \textbf{in}~ \textbf{not}~ x)\]
In order to determine whether or not the body of a \(\bf let\) is
well typed, we need a type context or type environment that keeps
track of the type of the variables that are in scope. The following is
the notation we use for a type environment:
The symbol \(\Gamma\) represents a type environment.
The notation \(x : T \in \Gamma\) denotes that \(\Gamma\)
maps the name \(x\) to the type \(T\).
We extend a type environment as \(\Gamma, x : T\), which denotes
the type environment that assigns the type \(T\) to \(x\)
but assigns all other variables the same type as in \(\Gamma\).
We express a type judgment as \(\Gamma ~\vdash~ t ~:~ T\),
which states that the term \(t\) has type \(T\) within the
context of the type environment \(\Gamma\).
As indicated by the last point above, type judgments are now made in
the context of a type environment that maps variables to their types.
If a particular term has the same type regardless of typing
environment, then we can elide the environment in a type judgment. For
example, the judgment \(\vdash~ \textbf{true} ~:~ Bool\) indicates
that \(\bf true\) always has type \(Bool\) within the context
of any type environment.
The following are our existing typing rules using the notation
of type environments:
\[\frac{
}{
\vdash~ IntegerLiteral ~:~ Int
}\]
\[\frac{
}{
\vdash~ \textbf{true} ~:~ Bool
}\]
\[\frac{
}{
\vdash~ \textbf{false} ~:~ Bool
}\]
\[\frac{
\Gamma ~\vdash~ t_1 ~:~ Int ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ Int
}{
\Gamma ~\vdash~ (t_1 + t_2) ~:~ Int
}\]
\[\frac{
\Gamma ~\vdash~ t_1 ~:~ Int ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ Int
}{
\Gamma ~\vdash~ (t_1 - t_2) ~:~ Int
}\]
\[\frac{
\Gamma ~\vdash~ t_1 ~:~ Int ~~~~~~~~~~ \Gamma ~\vdash~ t_2 ~:~ Int
}{
\Gamma ~\vdash~ (t_1 * t_2) ~:~ Int
}\]
\[\frac{
\Gamma ~\vdash~ t ~:~ Bool
}{
\Gamma ~\vdash~ \textbf{not}~ t ~:~ Bool
}\]
\[\frac{
\Gamma ~\vdash~ t_1 ~:~ Bool
~~~~~~~~~~
\Gamma ~\vdash~ t_2 ~:~ T
~~~~~~~~~~
\Gamma ~\vdash~ t_3 ~:~ T
}{
\Gamma ~\vdash~ (\textbf{if}~ t_1 ~\textbf{then}~ t_2 ~\textbf{else}~ t_3) ~:~ T
}\]
We need a rule for typing a variable:
\[\frac{
v : T \in \Gamma
}{
\Gamma ~\vdash~ v ~:~ T
}\]
Here, we use \(v\) to denote a variable. The rule states that if
the type environment \(\Gamma\) maps the variable \(v\) to
type \(T\), then the term consisting of \(v\) itself has type
\(T\) within the context of \(\Gamma\).
We can now add a rule for the \(\bf let\) binding construct:
Here, we use \(v\) to denote the name of the variable introduced
by the \(\bf let\). The rule states that if the initializer
expression is assigned the type \(T_1\) within the context of the
original type environment \(\Gamma\), and the body has type
\(T_2\) within the context of the original environment extended
with the mapping \(v : T_1\), then the overall \(\bf let\)
expression also has type \(T_2\). We can use this to derive the
type of our first \(\bf let\) example in the context of any type
environment:
\[\cfrac{
\cfrac{
}{
\vdash~ 3 ~:~ Int
}
~~~~~~~~~~
\cfrac{
\cfrac{
x : Int \in x : Int
}{
x : Int ~\vdash~ x ~:~ Int
}
~~~~~~~~~~
\cfrac{
}{
x : Int ~\vdash~ 2 ~:~ Int
}
}{
x : Int ~\vdash~ (x + 2) ~:~ Int
}
}{
\vdash~ (\textbf{let}~ x = 3~ \textbf{in}~ (x + 2)) ~:~ Int
}\]
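In an implementation, the type environment \(\Gamma\) becomes an
explicit parameter of the checker. The following sketch extends the
idea above with variables and \(\bf let\); only the cases needed for
the example are shown, and the representation is again ours:

def typeof_expr(t, gamma):
    # gamma is a type environment: a dict from variable names to types.
    if isinstance(t, int) and not isinstance(t, bool):
        return "Int"
    if isinstance(t, str):                       # variable: v : T must be in gamma
        return gamma[t]
    op = t[0]
    if op == "+" and typeof_expr(t[1], gamma) == typeof_expr(t[2], gamma) == "Int":
        return "Int"
    if op == "let":                              # (let v = t1 in t2)
        _, v, t1, t2 = t
        return typeof_expr(t2, {**gamma, v: typeof_expr(t1, gamma)})
    raise TypeError("no typing rule applies to " + repr(t))

# (let x = 3 in (x + 2)) : Int, in the empty environment
print(typeof_expr(("let", "x", 3, ("+", "x", 2)), {}))   # Int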
Now that we have typing rules for expressions of booleans and
integers, we proceed to add functions to our language and introduce
rules for computing the types of function abstractions and
applications. As in lambda calculus, we will consider functions
that take in exactly one argument. A function then has two types that
are relevant: the type of the argument to the function, and the type
of its return value. We will use the notation \(T_1 \rightarrow
T_2\) to denote the type of a function that takes in an argument of
type \(T_1\) and returns a value of type \(T_2\).
For simplicity, we will require that the parameter type of a function
be explicitly specified. It would also be reasonable to infer the type
of the parameter from how it is used in the body, or to deduce the
type of the parameter independently each time the function is applied
to an argument. The latter would provide a form of parametric
polymorphism. However, we will not consider
such schemes here.
To allow functions to be defined, with explicit typing of parameters,
we extend our language as follows:
We introduce two new expressions, one for function abstraction and one
for function application, borrowing syntax from
\(\lambda\)-calculus. We also introduce types into our grammar,
with \(Int\) and \(Bool\) as the non-function types. A
function type is specified by separating its input and output types by
the type constructor \(\rightarrow\). When chained, the type
constructor is right associative, so that \(Int \rightarrow Int
\rightarrow Bool\) is equivalent to \(Int \rightarrow (Int
\rightarrow Bool)\), denoting a function that takes in an \(Int\)
and returns a function with type \(Int \rightarrow Bool\).
As with \(\bf let\), we will assume that parameter names
introduced by \(\bf lambda\) expressions are distinct from any
other names in the program, knowing that we can always rename
variables to ensure that this is the case.
We can now define the typing rule for abstraction as follows:
The rule states that if the body \(t_2\) has type \(T_2\) in
the type environment that consists of \(\Gamma\) extended with the
mapping \(v : T_1\) for the parameter, then the function as a
whole has type \(T_1 \rightarrow T_2\). Thus, the function takes
in a value of type \(T_1\) as an argument and returns a value of
type \(T_2\).
This states that if the function has type \(T_2 \rightarrow T_3\),
taking in a \(T_2\) and returning a \(T_3\), and the argument
has the requisite type \(T_2\), then the application results in
the type \(T_3\).
As an example, consider the following program fragment:
\[(\textbf{let}~ f = (\textbf{lambda}~ x : Int .~ (x <= 10))~
\textbf{in}~ (f~ 3))\]
We can derive the type of this expression in any type environment as
follows:
\[\cfrac{
\cfrac{
\cfrac{
\cfrac{
x : Int \in x : Int
}{
x : Int ~\vdash~ x ~:~ Int
}
~~~~~~~~
\cfrac{
}{
x : Int ~\vdash~ 10 ~:~ Int
}
}{
x : Int ~\vdash~ (x <= 10) ~:~ Bool
}
}{
\vdash~ (\textbf{lambda}~ x : Int .~ (x <= 10)) ~:~
Int \rightarrow Bool
}
~~~~~~~~
\cfrac{
\cfrac{
f : Int \rightarrow Bool \in f : Int \rightarrow Bool
}{
f : Int \rightarrow Bool ~\vdash~ f ~:~ Int \rightarrow Bool
}
~~~~~~~~
\cfrac{
}{
f : Int \rightarrow Bool ~\vdash~ 3 ~:~ Int
}
}{
f : Int \rightarrow Bool ~\vdash~ (f~ 3) ~:~ Bool
}
}{
\vdash~ (\textbf{let}~ f = (\textbf{lambda}~ x : Int .~ (x <= 10))~
\textbf{in}~ (f~ 3)) ~:~ Bool
}\]
At the bottom of the derivation, we apply the rule for \(\bf
let\), requiring us to compute the type of the variable initializer as
well as the type of the body in a type environment where the new
variable has its computed type.
To compute the type of the initializer, we apply the rule for
abstraction, requiring us to compute the type of the body in a type
environment with the function parameter having its designated type of
\(Int\). This applies the rule for \(<=\), further requiring
computation of types for the variable \(x\) and integer literal
\(10\). The body then has the type \(Bool\), so the
abstraction has type \(Int \rightarrow Bool\).
We can then compute the type of the body of the \(let\), in a type
context where \(f\) has type \(Int \rightarrow Bool\). This
requires us to apply the rule for function application, computing the
type of both the function and its argument. The function is the
variable \(f\), which has type \(Int \rightarrow Bool\) in the
type environment. The argument is the integer literal \(3\), which
has type \(Int\). Thus, the application is applying an \(Int
\rightarrow Bool\) to an \(Int\), resulting in \(Bool\). This
is also the type of the \(\bf let\) expression as a whole.
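The abstraction and application rules extend the same checker with
function types, which the sketch below represents as ("->", T1, T2);
as before, only the cases needed for the example are shown, and the
representation is ours:

def typeof_fun(t, gamma):
    if isinstance(t, int) and not isinstance(t, bool):
        return "Int"
    if isinstance(t, str):
        return gamma[t]
    op = t[0]
    if op == "<=" and typeof_fun(t[1], gamma) == typeof_fun(t[2], gamma) == "Int":
        return "Bool"
    if op == "lambda":                        # (lambda v : T1 . body)
        _, v, t1, body = t
        return ("->", t1, typeof_fun(body, {**gamma, v: t1}))
    if op == "app":                           # apply f : T2 -> T3 to arg : T2
        ftype = typeof_fun(t[1], gamma)
        if ftype[0] == "->" and typeof_fun(t[2], gamma) == ftype[1]:
            return ftype[2]
    if op == "let":
        _, v, t1, body = t
        return typeof_fun(body, {**gamma, v: typeof_fun(t1, gamma)})
    raise TypeError("no typing rule applies to " + repr(t))

# (let f = (lambda x : Int . (x <= 10)) in (f 3)) : Bool
prog = ("let", "f", ("lambda", "x", "Int", ("<=", "x", 10)), ("app", "f", 3))
print(typeof_fun(prog, {}))                   # Bool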
Our working language now has the base types \(Bool\) and
\(Int\), as well as function types. Let us extend the language by
adding floating-point numbers:
\[\begin{split}E ~\rightarrow&~ F\\
F ~\rightarrow&~ FloatingLiteral\\
T ~\rightarrow&~ Float\end{split}\]
We add a typing rule for floating-point literals:
\[\frac{
}{
\vdash~ FloatingLiteral ~:~ Float
}\]
We would also like to allow operations such as addition on expressions
of type \(Float\). We could define a separate rule for adding two
\(Float\)s:
However, the combination of this rule and the rule for adding
\(Int\)s does not permit us to add a \(Float\) and an
\(Int\). Adding more rules for such a combination is not a
scalable solution: introducing more numerical types would result in a
combinatorial explosion in the number of rules required.
Functions pose a similar problem. If we define a function such as
\((\textbf{lambda}~ x : Float .~ (x + 1.0))\), we would like to be
able to apply it to an \(Int\) as well as a \(Float\).
Conceptually, every integer is also a floating-point number [3], so
we would expect such an operation to be valid.
Rather than adding more rules to permit this specific case, we
introduce a notion of subtyping that allows a type to be used in
contexts that expect a different type. We say that type \(S\) is a
subtype of type \(T\) if a term of type \(S\) can be
substituted anywhere a term of type \(T\) is expected. We use the
notation \(S <: T\) to denote that \(S\) is a subtype of
\(T\).
The subtype relation \(<:\) must satisfy the following
requirements:
It is reflexive, meaning that for any type \(S\), it must be
that \(S <: S\), so that \(S\) is a subtype of itself.
It is transitive, so that \(S <: T\) and \(T <: U\)
implies that \(S <: U\).
Thus, the subtype relation must be a preorder. In many languages,
the subtype relation is also a partial order, additionally
satisfying the following:
It is antisymmetric, so that \(S <: T\) and \(T <: S\)
implies that \(S = T\).
In our working language, we specify that \(Int\) is a subtype of
\(Float\):
\[Int <: Float\]
To allow our type system to accommodate subtyping, we introduce a new
typing rule, called the subsumption rule, to enable a subtype to be
used where a supertype is expected:
\[\frac{
\Gamma ~\vdash~ t ~:~ S
~~~~~~~~~~
S <: T
}{
\Gamma ~\vdash~ t ~:~ T
}\]
This rule states that if the type of term \(t\) has been computed
as \(S\), and \(S\) is a subtype of \(T\), then we can
also conclude that \(t\) has type \(T\). This allows a
function that expects a \(Float\) to be applied to an \(Int\)
as well:
\[\cfrac{
\Gamma ~\vdash~ f ~:~ Float \rightarrow Float
~~~~~~~~~~
\cfrac{
\Gamma ~\vdash~ x ~:~ Int
~~~~~~~~~~
Int <: Float
}{
\Gamma ~\vdash~ x ~:~ Float
}
}{
\Gamma ~\vdash~ (f~ x) ~:~ Float
}\]
However, such a rule always produces a \(Float\) as a result. This
precludes us from using the result as an argument to a function that
expects an \(Int\) as its argument: it is not the case that
\(Float <: Int\), so we cannot use a \(Float\) in a context
that requires \(Int\).
Instead, we need to rewrite the rule such that it produces a
\(Float\) when at least one of the operands is a \(Float\),
but it results in an \(Int\) if both operands are \(Int\)s.
More generally, we desire the following, where \(T_1\) and
\(T_2\) are the types of the two operands:
Both operands are of numerical type. In our language, this means that
they are each of a type that is some subtype of \(Float\). Thus,
we require that \(T_1 <: Float\) and \(T_2 <: Float\).
The result is the least upper bound, or join, of the two operand
types. This means that the result type is the minimal type \(T\)
such that \(T_1 <: T\) and \(T_2 <: T\). We use the notation
[4] \(T = T_1 \sqcup T_2\) to denote that \(T\) is the join
of \(T_1\) and \(T_2\).
Since \(S <: S\), it is always the case that \(S = S \sqcup
S\). Thus, in our language, we have \(Int = Int \sqcup Int\),
\(Float = Float \sqcup Float\), and \(Float = Int \sqcup
Float\).
Putting these requirements together, we can define the typing rule for
addition as follows:
Many languages include a \(Top\) type, also written as
\(\top\), that is a supertype of every other type in the language.
Thus, for any type \(S\), we have:
\[S <: Top\]
The \(Top\) type corresponds to the \(Object\) type in many
object-oriented languages. For example, the object type in Python
is a supertype of every other type.
Introducing \(Top\) into our language ensures that a join exists
for every pair of types in the language. However, it is not
necessarily the case in general that a particular language has a join
for every pair of types, even if it has a \(Top\) type.
The existence of a join for each pair of types allows us to loosen
the typing rule for conditionals:
Rather than requiring that both branches have exactly the same type,
we allow each branch to have an arbitrary type. Since we can always
compute the join of the two types in our language, the resulting type
of the conditional is the join of the types of the branches.
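A Python sketch of the subtype relation and join for the base types of
our language, with \(Top\) included, might look like the following; the
function names are ours, and the linear scan works only because our
types form a small, fixed hierarchy:

def is_subtype(s, t):
    # S <: S, Int <: Float, and every type is a subtype of Top.
    return s == t or t == "Top" or (s == "Int" and t == "Float")

def join(t1, t2):
    # Least upper bound: the minimal T with t1 <: T and t2 <: T, scanning
    # candidates from more specific to more general.
    for t in ("Int", "Float", "Bool", "Top"):
        if is_subtype(t1, t) and is_subtype(t2, t):
            return t

print(join("Int", "Int"))      # Int
print(join("Int", "Float"))    # Float
print(join("Bool", "Float"))   # Top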
In a language with higher-order functions, subtyping is also
applicable to function types. There are contexts where it would be
semantically valid to accept a function type that is different from
the one that is expected. For instance, consider the following
higher-order function:
\[(\textbf{lambda}~ f : Int \rightarrow Bool .~ (f~ 3))\]
This function takes in another function \(f\) as an argument and
then applies \(f\) to an \(Int\). If the actual function
provided as an argument had type \(Float \rightarrow Bool\)
instead, it would still be semantically valid to invoke it on an
\(Int\). Thus, it should be the case that \(Float \rightarrow
Bool <: Int \rightarrow Bool\), since the former can be used in
contexts that expect the latter.
Now consider another higher-order function:
\[(\textbf{lambda}~ f : Int \rightarrow Float .~ (f~ 3))\]
This new function takes in a function \(f\) and applies it to an
\(Int\) to produce a \(Float\). However, if the function we
provide as the argument has type \(Int \rightarrow Int\), it would
produce an \(Int\); the latter is a valid substitution for a
\(Float\), making such an argument semantically valid. Thus, it
should also be the case that \(Int \rightarrow Int <: Int
\rightarrow Float\).
Putting both cases together, we end up with the following subtyping
rule for functions:
A function that accepts parameters of type \(S_1\) accepts more
general argument values than one that accepts type \(T_1\); the
former has a more general domain than the latter. Any contexts that
expect to pass a \(T_1\) as an argument would be just as well
served if the function accepts an \(S_1\). Thus, the function type
that accepts an \(S_1\) should be substitutable for the function
type that accepts a \(T_1\).
A function that produces a return value of type \(S_2\) has a more
restricted set of outputs, or codomain, than a function that produces
a \(T_2\). Any context that expects a \(T_2\) as output would
be just as well served by an \(S_2\) as output. Thus, the function
type that produces an \(S_2\) should be substitutable for the
function type that produces a \(T_2\).
The subtyping rule permits a contravariant parameter type in the
function subtype: it is contravariant since the direction of the
relation \(<:\) is reversed for the parameter types compared to
the relation for the function types. The rule also permits a
covariant return type, since the direction of \(<:\) is the same
for the return types and the function types.
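Continuing the sketch above, the function-subtyping rule can be checked
by comparing parameter types contravariantly and return types
covariantly; function types reuse the ("->", T1, T2) representation,
and base types fall back to the is_subtype sketch above:

def is_subtype_fun(s, t):
    # ("->", S1, S2) <: ("->", T1, T2) requires T1 <: S1 and S2 <: T2.
    if isinstance(s, tuple) and isinstance(t, tuple):
        _, s1, s2 = s
        _, t1, t2 = t
        return is_subtype_fun(t1, s1) and is_subtype_fun(s2, t2)
    if isinstance(s, tuple) or isinstance(t, tuple):
        return t == "Top"              # a function type is only a subtype of Top
    return is_subtype(s, t)            # base types, as before

# Float -> Bool <: Int -> Bool (more general parameter type)
print(is_subtype_fun(("->", "Float", "Bool"), ("->", "Int", "Bool")))   # True
# Int -> Int <: Int -> Float (more specific return type)
print(is_subtype_fun(("->", "Int", "Int"), ("->", "Int", "Float")))     # True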
Covariant return types often appear in object-oriented languages in a
different context, that of overriding a base class’s method, for the
same semantic reasons they are valid here. We will discuss
covariance and contravariance
in object-oriented programming in more detail later.