
I’ve been programming in Scheme for a very long time: indeed, it’s one of the first programming languages I learned. Recently, I’ve been hacking on some code with Scheme, and along the way, I got distracted by the “fun” idea of implementing Scheme myself.

So… what does it mean to implement a programming language?

In my case, I want to write a program that, given some well-formed input in that programming language, could compute a corresponding output. In other words: I want to write a program that reads the source language, evaluates the program described in the source, and produces (perhaps by printing) the corresponding output.

I’ll ignore reading and writing for now, and just concentrate on the middle bit — a program that evaluates some input program — and now I’ve turned my problem into another, much more curious one:

What does ‘evaluation’ even mean here?

How do I know my notion of ‘evaluation’ matches that of the creators of the language?

This is where having a defined semantics of our programming language is handy!

Describing Scheme

Scheme’s first description is in the Report on the Algorithmic Language Scheme. It’s essentially a user’s manual of the language as it stood at the time, and it’s interesting to read, if only to see how much the language as it was first defined by Steele and Sussman has changed from then to now. Notably, there’s no formalism here: it’s purely a definition by example. In some circumstances, that’s OK — but that leaves open many possibilities for things to go wrong.

Since the Revised^3 Report on the Algorithmic Language Scheme (R3RS), the language has had a more formal definition: in R3RS, R4RS, R5RS, and R7RS, by a denotational semantics; and in R6RS, by a reduction semantics. I opted to look at the denotational semantics defined in R7RS, and to start an implementation from there. (The semantics defined in R6RS look a little scary — at some point I’d like to come back and write up an explanation of those, too.)

So, armed with a concrete definition of what Scheme programs mean, I can get going, right? Well… maybe.

Aside: Denotational Semantics?

One way of assigning meaning to terms and expressions in a programming language is via denotational semantics. This margin certainly isn’t big enough for a full introduction to them, but Wikipedia’s summary, reproduced below, is a good starting point:

[Denotational semantics is an approach of] formalising the meanings of programming languages by constructing mathematical objects (called denotations) that describe the meanings of expressions from the languages. […] Broadly speaking, denotational semantics is concerned with finding mathematical objects that represent what programs do.

This sounds exactly like what I want: all I would then need to do is to implement the mathematical objects — and then I have a program that can run Scheme programs!

From here on out, I’m going to work through R7RS §7.2, piece by piece, until at the end I’ve got an interpreter!

Aside: Implementing Scheme

I’m going to do this implementation in Haskell, to make the code as close as possible to the specified denotational semantics, and to preserve as much of the semantics as possible, even though this results in Haskell that isn’t always idiomatic. (In some ways we’re very lucky: it’s already very close, and the necessary transformations are surprisingly simple.)

One of the terms I’ll throw around is host environment: this is my way of being clear about where I’m implementing the language, and differentiating that from the language being implemented.

Often, a careful choice of host environment can really help us out as we implement, by letting us take advantage of its abstractions. For example: whilst Lisp-like languages are famously garbage-collected, I won’t need to consider implementing a garbage collector, and, if my host language is garbage-collected, it’s a good bet I get that for free. (Indeed, because the semantics operate at a level where memory management is so thoroughly abstracted-over, garbage collection is effectively immaterial.)

In this case, my host environment is Haskell. I get garbage collection for free! I also get partial application, which is used heavily in these semantics. (I’d love to try this exercise again in another garbage-collected language that supports partial application, but such languages are rare beasts. I’d also like to try this in a real interactive theorem prover, but the one I’m most comfortable with objects to the syntax.)

The Abstract Syntax

The first thing I need is… well, a language: to be precise, a way to talk about objects and constructs in the language. In practice, we don’t usually use the language’s concrete syntax: instead, we define abstract syntax.

The R7RS semantics define an abstract syntax comprising constants \(K\) (including quotations); identifiers \(I\) (that is, variables); and expressions \(E\), which are built from constants, identifiers, procedure calls, lambda-abstractions, one- and two-armed conditionals, and assignments.

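In Haskell, this abstract syntax renders naturally as a datatype. A caveat: the constructor spellings below are my own invention (only the shape comes from the R7RS grammar), I’m eliding the fully-variadic lambda form, and I’ll treat identifiers as bare strings:

type Ide = String               -- identifiers (variables)
data Con {- ... -}              -- constants, including quotations
data Exp
  = Const Con                   -- K: constants
  | Id    Ide                   -- I: variable references
  | App   Exp [Exp]             -- (E0 E*): procedure calls
  | Lambda  [Ide]     [Exp] Exp -- (lambda (I*) Γ* E0)
  | LambdaV [Ide] Ide [Exp] Exp -- (lambda (I* . I) Γ* E0)
  | If    Exp Exp Exp           -- (if E0 E1 E2)
  | If1   Exp Exp               -- (if E0 E1), the one-armed conditional
  | Set   Ide Exp               -- (set! I E)

The Ide synonym will also stand in for the domain \(\Ide\) when we meet environments later.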
All things going well, at this point I can write programs in this weird language — I now have constants, variables, abstraction, application, conditionals, and assignment. But there’s one small problem: I can’t talk about any actual terms. Let’s fix that.

The Domain Equations

Denotational semantics concerns itself with constructing mathematical objects, which it calls domains. R7RS defines perhaps twenty domains — let’s walk through each of them, explaining what each is, noting its name in the spec (as well as the name typically used for objects in that domain), and giving a definition in Haskell.

Natural numbers (\(\NAT\))

The natural numbers, denoted \(\NAT\), usually named \(\nu\) — because what would some formal semantics be without needing some natural numbers? The main places these come in handy are when we deal with applying optional arguments, and whilst dealing with dynamic points. For the sake of simplicity, I’ll just use Haskell’s Int type.

type NAT = Int

Booleans (\(\TRU\))

Booleans, denoted \(\TRU\), defined as either \(\{ \mathit{false}, \mathit{true} \}\) — because, again, what would some formal semantics be without having Boolean true and false? These are mostly used as mutability markers, which I’ll go into more depth about when I dissect memory management. And, again, for the sake of simplicity, I’ll appeal to Haskell’s Bool type.

type TRU = Bool

Values (\(\EXP\); \(\SYM\), \(\CHR\), \(\NUM\), \(\PAI\), \(\VEC\), \(\STR\), \(\MSC\), \(\FUN\))

There’s a raft of value types:

Symbols, denoted \(\SYM\):
data SYM = SYM String deriving (Eq, Show)
Characters, denoted \(\CHR\):
data CHR = CHR Char deriving (Eq, Show)
Numbers, denoted \(\NUM\):

Numbers are well-understood by Scheme — but for now I’ll be ignoring the details of Scheme’s numeric tower, so the definition I’ll use is:

data NUM {- ... -}

I assume we have definitions of a range of type-classes on this — in particular, Eq and Show, which the other value types derive.

Pairs, denoted \(\PAI\):

Defined as \(\LOC \times \LOC \times \TRU\).

They’re one of the fundamental constructs of lots of Lisp-like languages, including Scheme; R7RS §6.4 describes them as “a record structure with two fields”, classically referred to as car and cdr. The semantic definition of a pair gives us knowledge of two locations, and also gives us a mutability marker — by definition, true for mutable objects, and false for immutable objects — which is valuable in cases where the memory model might include read-only data.

data PAI = PAI (LOC, LOC, TRU) deriving (Show)

For convenience, we’ll also define equality: if two pairs point at the same locations, they’re the same pair:

instance Eq PAI where
  (PAI (car1, cdr1, _))  ==  (PAI (car2, cdr2, _)) =
    (car1 == car2) && (cdr1 == cdr2)
Vectors, denoted \(\VEC\):

Defined as \(\arbno{\LOC} \times \TRU\). A list of locations and a mutability marker (defined as true for mutable objects, and false for immutable objects). They don’t come up in the semantics much, and I’ll mostly ignore them, but we can still define their type:

data VEC = VEC ([LOC], TRU) deriving (Eq, Show)
Strings, denoted \(\STR\):

Defined as \(\arbno{\LOC} \times \TRU\), similarly to vectors. I presume the locations store characters, though this isn’t especially well-defined!

data STR = STR ([LOC], TRU) deriving (Eq, Show)
Miscellaneous values, denoted \(\MSC\):

There’s a collection of miscellaneous values floating around in the system; these can be any of \(\fun{false}\), \(\fun{true}\), \(\fun{null}\), \(\fun{undefined}\), or \(\fun{unspecified}\):

data MSC = Mfalse | Mtrue | Mnull | Mundef | Munspec
  deriving (Eq, Show)
Procedure values, denoted \(\FUN\):

Usually named \(\phi\). Defined as \(\LOC \times (\arbno{\EXP} \to \DP \to \EC \to \CC)\). This construction — a tuple of a location and a function — turns out to be a closure, capturing some state out of the environment it was declared in, and arranging to make it available at the point of the procedure call.

data FUN = FUN (LOC, [EXP] -> DP -> EC -> CC)

For convenience as we’re putting this together, instances for Eq and Show would be nice, but we can’t simply derive them due to the function-typed argument. Instead, I’ll assume that the location acts as a sufficiently-unique identifier, and define these instances:

instance Eq FUN where
  (FUN (loc1, _))  ==  (FUN (loc2, _)) =
    (loc1 == loc2)

instance Show FUN where
  show (FUN (loc, _)) = "#<procedure " ++ (show loc) ++ ">"

Finally, we glue all these domains together to give us the space of expressed values, denoted \(\EXP\), usually named \(\epsilon\), possibly with a numeric subscript to distinguish between multiple such values:

\[\begin{equation*} \epsilon\in\EXP = \SYM+\CHR+\NUM+\PAI+\VEC+\STR+\MSC+\FUN \end{equation*}\]

In Haskell, these can all be data constructors:

data EXP
  = EQ_ SYM | EH CHR | ER NUM | EM MSC
  | EEp PAI | EEv VEC | EEs STR | EF FUN
  deriving (Eq, Show)
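Just to see the injections in action, here are a couple of example \(\EXP\) values: Scheme’s #t, and the symbol foo. (The names are mine, and purely illustrative.)

exampleTrue, exampleSym :: EXP
exampleTrue = EM Mtrue          -- Scheme's #t, as a miscellaneous value
exampleSym  = EQ_ (SYM "foo")   -- the symbol `foo'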

Locations, Stores, Environments (\(\LOC\), \(\STO\), \(\ENV\))

Memory management is one of the things left most loosely defined in the semantics. We’re given three domains, \(\LOC\), \(\STO\), and \(\ENV\), and a small handful of primitives that interact with them.

Locations, denoted \(\LOC\):

Usually named \(\alpha\).

Locations provide (part of) an abstraction over the underlying memory management mechanism, whatever that may be. For the purposes of this adventure, I’ll choose a really simple setup: locations are just a (tagged) number, which serves as an index.

data LOC = LOC Int deriving (Eq, Show)

This is where \(\fun{new}\) comes in: in our toy implementation, it finds the next unused slot and returns that. Given the places it’s used and the ways it’s used, I suspect this would be a memory address in a real implementation.

Stores, denoted \(\STO\):

Usually named \(\sigma\). Defined as \(\LOC \to (\EXP \times \TRU)\).

Mapping a location to an expressed value strongly suggests that stores are our abstraction over memory — and, indeed, that’s exactly how they’ll behave.

type STO = LOC -> (EXP, TRU)

Every location maps onto a pair of the value stored there, and a Boolean marker of undefined use [1]. Our \(\fun{new}\) primitive is used to allocate a new location that can store values, which we can then set using \(\fun{update}\). Retrieving a stored value is done by directly invoking the store with the appropriate location.
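To make that concrete, here’s a minimal sketch of how those two primitives could look, given the function-as-store representation above, and reading the Boolean as an “in use” marker (one plausible interpretation of that mysterious flag). These are my guesses at the underscored spellings; the real definitions arrive with the rest of the semantics.

-- Allocation: scan forward from location 0 until we find a slot
-- whose in-use marker is still False.  (Location -1 is never reached.)
new_ :: STO -> LOC
new_ σ = go 0
  where go n | snd (σ (LOC n)) = go (n + 1)
             | otherwise       = LOC n
-- Update: the new store answers with the new value (marked in-use)
-- at the updated location, and defers to the old store elsewhere.
update_ :: LOC -> EXP -> STO -> STO
update_ α ε σ = \α' -> if α' == α then (ε, True) else σ α'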

Environments, denoted \(\ENV\):

Usually named \(\rho\). Defined as \(\Ide \to \LOC\).

Mapping an identifier to a location is similarly suggestive that an environment provides a way of tying a name to a location. Given we also can tie a location to a value, the combination of a store and an environment gives us variables and binding.

type ENV = Ide -> LOC

Our \(\fun{lookup}\) primitive retrieves the mapping from identifier to location held in an environment; and our \(\fun{extends}\) primitive adds some arbitrary number of mappings to an environment, yielding a new environment.
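Sketching these in the same spirit (again, the underscored spellings are assumptions on my part), they fall out almost directly from the spec’s definitions:

-- Look up an identifier: just apply the environment.
lookup_ :: ENV -> Ide -> LOC
lookup_ ρ i = ρ i
-- Extend an environment with parallel lists of identifiers and
-- locations, yielding a new environment.  Like the spec's definition,
-- this is partial: both lists are assumed to be the same length.
extends_ :: ENV -> [Ide] -> [LOC] -> ENV
extends_ ρ []     _      = ρ
extends_ ρ (i:is) (α:αs) =
  extends_ (\i' -> if i' == i then α else ρ i') is αs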

Continuations (\(\EC\), \(\CC\), \(\ANS\))

In my view, continuations are one of the most fascinating parts of Scheme. To quote R7RS,

Whenever a Scheme expression is evaluated there is a continuation wanting the result of the expression. The continuation represents an entire (default) future for the computation.

This continuation concept lets us talk about where in the program we’re up to, and what we need to go on to do once we’re done with the current execution. In some ways, this is like the call-stack, where continuations are the activation frames that let us return control to the caller. But in other ways, it’s much closer to threading the flow of execution through a program. (If “monadic state” is a thought you’re comfortable with, it’s not entirely unlike that.) But this is nothing new for anyone who’s looking to model the state of a program.

What makes this special is that Scheme exposes these continuations as objects in the language itself. They behave somewhat like procedures which, when called, allow execution to go somewhere else.

For programmers familiar with C-like goto, this is in some ways similar: we’re given a mechanism to change from wherever our program is currently executing to somewhere else within the program. However, unlike goto, a continuation preserves enough state that one can jump across scopes or functions, instead of merely within one function.

For those familiar with setjmp and longjmp, again, this is similar in some ways: setjmp grabs a slice of execution state, rolls it up into an object that can be passed around, and then longjmp allows one to unwind back to it. But one can only longjmp to a setjmp while the function that performed it is still live: once that function has returned, the saved state is gone for good. A continuation doesn’t have any such limitation.

Continuations are so phenomenally powerful that R7RS prefaces some compelling examples of call-with-current-continuation — a procedure that lets the program being executed directly interact with (surprise!) the current continuation at the point it occurs — with a droll remark:

If all real uses were as simple as these examples, there would be no need for a procedure with the power of call-with-current-continuation.

In R7RS Scheme, there are two major types of continuations: expression and command continuations.

Expression continuations, denoted \(\EC\):

Usually named \(\kappa\). Defined as \(\arbno{\EXP} \to \CC\).

This is a function that, given any number of expressed values — perhaps those that result from a computation currently underway — proceeds to a command continuation, a possible future to execute within. Objects in domain \(\EC\) are exactly those described above as “wanting the result of the expression”. We capture this in Haskell by using its own lambdas; I’ll sketch a helper for building these just after these definitions.

type EC = [EXP] -> CC
Command continuations, denoted \(\CC\):

Usually named \(\theta\). Defined as \(\STO \to \ANS\).

By mapping a store to an answer, a command continuation captures everything that remains to be done: hand it the current state of the world, and it carries the computation through to a result.

type CC = STO -> ANS
Answers, denoted \(\ANS\):

Bizarrely, answers don’t seem to serve any purpose, except to anchor a command continuation. My guess is they’re here for state threading: they’re passed in at the beginning, sequenced through largely untouched, and fall out the end of an evaluation; but they don’t affect the computation. They do need to be constructed, though, so my chosen Haskell definition is:

data ANS = A deriving (Eq, Show)
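Tying these three domains together: the report also defines a little helper, \(\fun{single}\), which builds an expression continuation that insists on receiving exactly one value. Roughly, in Haskell — with a plain error standing in for the spec’s \(\fun{wrong}\), since we haven’t built real error signalling yet:

-- Build an expression continuation expecting exactly one value.
single :: (EXP -> CC) -> EC
single ψ [ε] = ψ ε
single _ _   = error "wrong number of return values"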

Errors (\(\ERR\))

Errors, denoted \(\ERR\), seem to be a way to handle things going wrong. In cases where the semantics must intentionally generate an error, the value includes a string explaining the error, so we’ll do that too:

data ERR = X String deriving (Show)

Dynamic points (\(\DP\))

Dynamic points, denoted \(\DP\), usually named \(\omega\), are defined as either the tuple \(\FUN \times \FUN \times \DP\), or the single value \(\{ \textit{root} \}\).

The spec tells us:

The dynamic extent of a call […] is defined as follows:

  • The dynamic extent is entered when the execution of the body of the called procedure begins.

  • The dynamic extent is also entered when execution is not within the dynamic extent and a continuation is invoked that was captured (using call-with-current-continuation) during the dynamic extent.

  • [The dynamic extent] is exited when the called procedure returns.

  • It is also exited when execution is within the dynamic extent and a continuation is invoked that was captured while not within the dynamic extent.

This is a weird, twisty construction!

Dynamic points capture some sense of the current call stack, along with the constructors and destructors needed as we move around it.

If we momentarily ignore the complications yielded by call/cc, we have a way to add a hook onto the beginning and the end of a dynamic point: these are the two \(\FUN\) values! The inner \(\DP\) value is there because a procedure call within a dynamic extent enters a new, nested dynamic extent: every dynamic point, other than the root, must occur within another dynamic point.

data DP = P (FUN, FUN, DP) | Proot
  deriving (Eq)
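Purely as an illustration of the shape, here’s a dynamic point one level below the root. The hook procedures are dummies of my own devising; each simply hands no values on to its continuation.

-- A do-nothing hook: pass no values along to the continuation.
dummyHook :: LOC -> FUN
dummyHook α = FUN (α, \_ _ κ -> κ [])
-- One extent deep: a before hook, an after hook, the enclosing point.
examplePoint :: DP
examplePoint = P (dummyHook (LOC 100), dummyHook (LOC 101), Proot)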

The Notation

We also need to introduce some notation that the semantics use. Some of it can be a bit surprising!


Reading

Invoking the semantics

We know that the type of sem_e is:

sem_e  ::  Exp -> ENV -> DP -> EC -> CC

But this implies that, along with the abstract-syntax expression, we need “outermost” values for an environment, a dynamic point, and an expression continuation; and, as this produces a command continuation, we also need a store to feed it, and a way to extract the value the expression produced.

Let’s start with the outermost environment. For now, we’ll use an empty one:

-- (`trace' comes from Debug.Trace.)
emptyEnv :: ENV
emptyEnv i
  = trace ("U bottom: " ++ show i)
          (LOC (-1))

This environment always returns location -1, which I’ll assume is never allocated. (If you read through the definition of new_, you’ll find that’s most definitely the case.) I’ll also toss a debugging trace call here.

Next, we need an outermost dynamic point. This is … actually the easiest part: we can just use Proot.

To construct the outermost expression continuation that we feed into this expression, and which will receive the expression’s evaluation result, we need a few things first.

We know we’ll need a store, so we create a store that knows nothing — one where every location has no value (specifically, the undefined value, marked not-in-use):

emptyStore :: STO
emptyStore _ = (EM Mundef, False)

Next, we need a command-continuation, which is extremely simple — it turns any arbitrary store into an answer. I’ve also put together a simple function that prints out any bindings known.

hopelessC :: CC
hopelessC σ
  = trace ("C: " ++ dumpState σ 0)
          A
  where
    -- Walk locations upward from 0; stop at the first slot whose
    -- in-use marker is still False.
    dumpState σ n
      = let loc = LOC n in
        case σ loc of
          (_, False) -> "[--- end ---]"
          x -> describe loc x ++ "\n   "
            ++ dumpState σ (n + 1)
    describe loc val =
      (show loc) ++ " --> " ++ (show val)

Now we can make a simulacrum of the “outermost” expression continuation, which takes the value returned by the expression we evaluated. We just print that out, using trace.

hopelessK :: EC
hopelessK = \ εs -> trace ("K: " ++ show εs)
                          hopelessC

-- Crappy invoker that returns an `A' but also spits out results
sem_e' :: Exp -> ANS
sem_e' x = sem_e x emptyEnv Proot hopelessK emptyStore
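As a quick smoke test (hypothetical, using the constructor names I assumed back in the abstract-syntax sketch): evaluating a variable reference in the empty environment should trip emptyEnv’s “U bottom” trace; what happens after that depends on the semantic equations themselves.

smokeTest :: ANS
smokeTest = sem_e' (Id "x")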

Conclusion

This post took a while to put together, and is actually just the preamble to another post which is also in the works, about the half-dozen bugs I’ve found in R7RS.
