Build me a LISP

Created 9th February, 2019 07:42 (UTC), last edited 9th February, 2019 10:53 (UTC)

Normally programming language blog posts get started with grammars and syntax and parsing and all sorts of stuff like that. I can't be bothered with all of that, so instead, let's start with the (maybe more interesting aspect) of the actual evaluation of the code. Here we'll build an evaluation engine for LISP like languages in less than 50 lines of C++ — and literate C++ at that!

Lisp uses something called a "cons cell" to store each part of both the program and the data. Let's just say we have something called a cell for now, and we'll come back to defining what it is later on.

1
struct cell;

S-Expressions

A Lisp program is actually just something called an s-expression. It's a fancy term for a list of cells, but there is a special way to interpret these s-expressions. We'll come back to that later on.

For now, we just need to know that s-expressions describe our LISP program, and to run a LISP program we simply evaluate the s-expression.

2
3
#include <vector>
using sexpr = std::vector<cell>;

Cells

Now we can talk about what a cell is. Our simple LISP will only have a single data type, strings, so our cells will either contain a string or an s-expression.

4
5
6
7
#include <string>
#include <variant>
struct cell {
    std::variant<std::string, sexpr> content;

We do need to be able to make cells, and they can be made from either strings or s-expressions.

8
9
10
    cell(std::string s) : content{std::move(s)} {}
    cell(sexpr s) : content{std::move(s)} {}
};

Evaluation

There are three different sorts of data structure that we have so far:

  1. Strings
  2. S-expressions
  3. Cells

Because running our programs is going to be a matter of evaluating these structures, we'll need to consider what "evaluation" means for each of these.

First of all, strings are just themselves.

11
std::string eval(std::string s) { return s; }

The s-expressions are a bit more interesting, but for now we just have to promise to implement them so that we can start with the simpler cells (and because we have to implement the evaluation of the s-expressions in terms of evaluating cells)…

12
std::string eval(sexpr const &);

Evaluating a cell is pretty simple. It's either going to be a string or a s-expression, so depending on what we find in the cell content we can just forward to evaluating whichever one we find.

13
14
15
std::string eval(cell const &c) {
    return std::visit([](auto const &c) { return eval(c); }, c.content);
}

The standard library

Although an s-expression is just a vector of cells, the first item in it is special. To evaluate an s-expression we take the first item in it to be the name of a function that is to be executed, and the remaining s-expression cells are then taken as that function's arguments (some of which may be s-expressions that require evaluation themselves).

So if you're used to writing your programs like this:

foo(bar, baz)

The s-expression form would look more like:

(foo bar baz)

We're going to implement our library in C++, and we're going to look functions up by name. Because the s-expression can have any number of arguments, it makes sense to evaluate them based on a couple of iterators (markers), one to the first argument and one to the end of the s-expression.

16
using marker = sexpr::const_iterator;

Our built in library functions take the two iterators representing their arguments and return a string which is the result of running them.

17
18
#include <functional>
using builtin = std::function<auto(marker, marker)->std::string>;

We now need a simple data structure that will hold these for us and allow look-ups by name.

19
20
#include <map>
std::map<std::string, builtin> library;

Evaluating S-Expressions

The first item in an s-expression is a function name that we're going to call. This means that an empty s-expression is an error.

21
22
23
24
#include <stdexcept>
std::string eval(sexpr const &s) {
    if (s.begin() == s.end()) {
        throw std::domain_error("Empty sexpr");

An alternative design would be to say that an empty s-expression evaluates to an empty string, but this is good for us.

Now, assuming our s-expression contains a first item, we need to fetch the function name. Remember that our s-expression is comprised of cells (which can contain strings or s-expressions) so we need to turn our cell into a string. Luckily we already have a function that does that: eval.

25
26
    } else {
        auto const fname = eval(*s.begin());

Using eval here (rather than just fetching the string out of the cell directly) has some important consequences to what this language can do.

Although we've said so far that the first item in an s-expression has to be the function name, by evaluating the cell to fetch that string we can actually have an s-expression as the first cell! And evaluating that s-expression gives the function name that we will run. Don't worry about this for now, there will be an example at the end.

Now that we have a function name we only need to look it up and call it, and of course take into account the possibility that it might not be found.

27
28
29
30
31
32
33
        if (auto const fp = library.find(fname); fp != library.end()) {
            return fp->second(++s.begin(), s.end());
        } else {
            throw std::invalid_argument{fname + " not in the library"};
        }
    }
}

And that's actually it for the whole evaluation engine. Luckily we have some lines left over to try it out with.

Running a program

34
35
36
#include <iostream>
int main() {
    try {

As with all programming languages we have to make sure to say hello to the whole world.

(cat "Hello" "world!")

Because we didn't bother with a parser we have to write the data structure for the program directly into the C++ source code. Thankfully it's not too hard.

37
38
        using namespace std::string_literals;
        auto const prog = sexpr{"cat"s, "Hello "s, "world!"s};

Of course we didn't put anything into our library yet, so let's make sure that we do at least have an implementation of cat we can call.

39
40
41
42
43
        library["cat"] = [](marker pos, marker end) {
            std::string ret;
            while (pos != end) ret += eval(*pos++);
            return ret;
        };

Now we can run our program and show the result.

44
        std::cout << eval(prog) << std::endl;

If you grab this C++ file you can compile and run it using something like (the file is available as a Gist on Github):

clang++ lisp.cpp -std=c++17 -o lisp
./lisp

And you'll see some output:

Hello world!

Errors…

The very last thing we should do is to print out some sort of error message if a run-time error came up.

45
46
47
48
49
    } catch (std::exception &e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
}

More fun

Remember there was the bit about the evaluation of the lead term in the s-expression? Because the first term can itself be an s-expression you can write a program to generate the function name you want to call.

If you do try this on your machine, try to change the program to the following and see what happens.

auto const prog = sexpr{
    sexpr{"cat"s, "c"s, "a"s, "t"s},
        "Hello "s, "world!"s};

When you run this it will still print "Hello world!" because the s-expression as the first term gets evaluated and produces the name cat which then gets called to produce the output.

Order fun

Remember the type we had for builtin took markers to the s-expressions of the arguments? This means that the argument values are evaluated inside of the library function definition. Look again at the loop in our cat function:

while (pos != end) ret += eval(*pos++);

It would also be a reasonable choice to move that evaluation outside of the library function and instead have our builtin type look like this (a function that takes a vector of strings and produces a string):

using builtin = std::function<auto(std::vector<std::string>)->std::string>;

This difference in where the expressions are evaluated is called Evaluation Strategy and the choice we made above is called normal-order. This is of course the opposite of what you're used to from pretty much every other programming language, which normally use applicative-order.

More commonly, normal-order is referred to as lazy evaluation and applicative-order as eager evaluation.

if isn't (an eager) function

In LISP terms our builtins are all special forms. This is important in order to be able to correctly implement forms like if.

Let's say we want to implement if so that it can be used something like this:

(if (condition) (true-case) (false-case)) -> string

We will always need to evaluate the condition expression, and, depending on what happens, we are for sure going to return either the true-case or the false-case. But more importantly we only want to evaluate one of them.

If we have to implement if in a language that only has eager evaluation then we will evaluate both the true-case and the false-case even if we only return the result of one of them.

But with lazy evaluation we can save ourselves some processing, and this control of the evaluation that we give to the library functions is critical if you want to take this toy language and turn it into something real.

C++ itself is an eager language and this is why we cannot short-circuit in user-defined implementations of && and ||, something that can cause us some pain. But there is hope.