We’re gonna talk about De Bruijn indices—why they’re interesting, and how you’d implement them. We’re going to go over some motivation behind them, and we’ll get to some code that implements an interpreter that uses them.
This post got too long, so this is now part one in a series about De Bruijn indices.
I’m going to be really handwavy in defining things, mostly because these are my notes and I’m not helping you with any homework! Before we get started, we’re going to go over a few basics1: judgements, and untyped lambda calculus.
Judgements are statements that can be made about objects in a language we are discussing, like “is this logical proposition true?” or “does this expression in my C code typecheck?”. We’re going to use some special notation to describe and prove judgements that you see in lots of logic-related papers (not that I’ve read any), and it looks really scientific and impressive, so let’s go over it quickly:
We’re defining a rule that says that if , , all the way up to (the premises) hold, then (the conclusion) must hold. Note that could be zero—that would be a rule that doesn’t need any premises, and would be called an axiom. Here’s a very important example of a real rule written in this fancy notation, where our judgement is “truth”:
This rule is called modus ponens, and it’s one you use every day. (Note that is read “implies” – “if then ”, if you’re used to programming). Since and are metavariables2, let’s substitute them for things you might see in real life:
If we know both propositions above the line hold, then I owe you lots of schmekles.
Sometimes you’ll see propositions that look like , where means “entailment”, and is written “turnstile” if you want to Google it. Here we’re going to use that symbol to say holds if we can use as an assumption. Note that this isn’t quite the same as the implication we used above, because the has meaning in our proof system, and not in the propositions/language we’re describing.
Here’s one example of a rule using , called weakening:
Here we’re saying that if we’re able to prove using as a set of hypotheses, we can still prove using and a possibly unrelated, but unused, proposition as well (if you’ve solved a murder mystery and have a clue you didn’t use, you still did solve the mystery).
Since this weakening thing seems obvious3, we’re going to assume rules similar to this apply for the rest of this blog post, but note that very interesting things called substructural logics appear when you can’t assume rules like weakening.
Untyped lambda calculus
Lambda calculus is a nifty notation invented by a really smart person to represent computation, using the fundamental building block of functions4. -calculus is so powerful because it’s incredibly simple to describe and use, while being expressive enough to represent any arbitrary computation. We don’t generally write code in lambda calculus because doing so is too verbose, not because there aren’t ideas that we can’t express in it.
To describe the (untyped) -calculus, we’re going to use some rules to define what the terms in our languages can look like. We’re going to describe what our terms can be using the judgement
We can read that as “ is a term in the context ”, where is a set of propositions of the form , where the are variables in our language.
Alright, let’s go through the three rules that define what a -calculus term can look like.
This rule is an axiom: if we have a variable in our context somewhere, it must follow that the bare variable is a valid term. (We’ll go through an example of how our context gets filled in ahead). Note that is a real variable in the language and not a metavariable: we cannot just insert whatever we want in place of that while discussing our language; we need to have rules in our language that tell us what we can do with that .
This rule lets us use the functions we build - this applies the argument to the term , and we have to hope that is a function. There aren’t any parentheses around the arguments to the function like there are in most mainstream programming languages; we just use a space to denote function application. Note that we’re already seeing a weakness with having no types: we don’t have any rules to enforce what ’s type can be. As an important side note, in the -calculus we only have functions that take in one argument. Later we’ll see why we don’t need really need functions that take more than one argument, ever.
Okay! This is the rule where the -calculus earns its name. This rule lets us create an anonymous function, also known as a -abstraction. is a function that takes in one argument and executes the body with that value. The premise (top part) makes it clear what is: it’s a valid term, which may or may not contain the variable , but either way it’s a valid term. If contains a free copy of (an not already bound by a -abstraction), then making that an argument to a function should leave the whole abstraction a valid term.
And that’s all there is to -calculus! It’s embarrassingly simple,
and it doesn’t even have any assignment operators, named functions,
bindings, or even numbers: in the purest use of -calculus all of
those things are implementable in terms of just… functions.
Let’s go through some examples. The Python function
or to make the comparison even more obvious
would be written like this in -calculus, if we had a function available:
Now wait a minute, you might be thinking: how does take in what looks like two arguments? Where are the parentheses and commas to make that function call? Didn’t I say we only had one-argument functions?
In the lambda calculus (and in many functional programming languages) we implicitly treat function application as left-associative, so that
and we don’t have to write as many parentheses. When we wrote , we actually meant . So this is still interesting: now looks like a function that takes in one argument, and the result of that function call is another function that must take in one argument?
This is exactly what we’ve done, and this process is called currying: it takes a function that would take two arguments, and creates a function that takes one argument, returns a function that takes the leftover argument, and hands both of these two variables to the body of the original function. So after currying, looks something like
We use currying often5 to make functions a lot simpler, and so that we don’t need to add tuple (product) types yet.
Sidetrack: proving things
So why did we define -calculus terms with the judgement, and not BNF or something like that? Well, using this form we can directly prove things—by stacking rules on top of each other! To show this, let’s prove that our squaring function is a valid -calculus term. With all the parentheses thrown back in (note that parentheses don’t necessarily live in our language so we can throw them in only when things get ambiguous), we’re going to try to prove
for some . is really just the environment containing variable bindings that we start off with, and since we said we were given the function, let’s make our environment just that: , so that we’re actually proving
Proving this using our rules is actually a fun little game of matching patterns. You just write the thing you want to prove at the bottom of a piece of paper, draw a line over it, then find the rule that matches the pattern of your term (this is exactly like pattern matching if you’ve used a good language like Haskell, ML, Rust, Swift, etc). Then just keep going until your proof doesn’t have anything else you can do! Here’s my version:
This is a pretty convincing proof! We broke down the structure of our expression, and found a rule to justify each and every subterm inside our main term. Some fun things to note:
- The proof forks off into little subtrees for each branch of the rule, and we just keep proving things the same way.
- Some of the proof trees (like the one justifying ) are repeated. This is normal—that subterm just happened to appear multiple times. A lot of these proofs get repetitive real fast.
To finish up this episode, we’re going to talk about one annoying thing about this -calculus thing I’ve hyped up, that might trip us up if we were to write an interpreter (and trust me, a similar problem did trip me up once, and I only caught it 4 months into writing a C compiler).
One annoying thing in our little language that can happen is in variable shadowing and capture6. There’s no guidance in our rules on how we pick variables for -abstraction, which can get messy sometimes. We haven’t covered how to actually evaluate functions yet (we’ll be using something called reduction in the next post) so this example may be a little contrived, but try to figure out what this function does:
You can write this in Python and try to test it out:
All this function does is spit back the value we give it, untouched, but it’s confusing to read because there’s two different ’s we’re talking about: the variables inside the bodies of the abstractions are actually referring to different quantities. It helps to color them:
In particular, the red inside the inner -abstraction is particularly confusing, because it’s actually under two different bindings of , and by convention we have decided to pick the innermost one7.
It also doesn’t help that when you squint a little and think about all that programming experience you have, it’s kind of… obvious (?) that the variable names we choose don’t particularly matter (as long as you’re okay with coworkers hating your unreadable code). We could have easily written this as
and I think we’d all agree it’s a little less confusing. This transformation of variable names is a process known as conversion, where we replace all the instances of a bound variable with a different name, to avoid conflicts when performing complicated operations on -calculus terms (and we will definitely be doing some in the future). We might even say that these two representations are equivalent, since they basically mean the same thing:
Wouldn’t it be cool if there was some notation that removed all these variable names and just cut down on all this noise? These two terms look and feel the same, so Shirley there must be some smart math way to represent them the same way, in kind of like a canonical representation? We’ll be using de Bruijn indices as this cool new notation in a future post.
Like any good blog author, I have an intended audience all figured out (?) ↩
These are not your garden variety variables. We’re making this distinction early because we’re going to be discussing languages that have actual variables, and it’s important not to mix up variables in your programming language, with variables in your logic system. ↩
Well, not really. ↩
In contrast to some silliness involving infinite paper tape. ↩
Indeed, in the Haskell standard library, you can find
curry f x y = f (x, y), which is a more elegant way of writing
curry f = \x -> \y -> f (x, y). ↩
As usual, naming things is difficult. ↩
We’ll reformulate -calculus in the future with a few small rules so that this convention doesn’t seem totally arbitrary. ↩