Judging Programming Performance, Programmatically

One aspect that is shared between programming games and programming education is the need to measure progress. In both cases a player/pupil is given some sort of problem description or set of requirements, and they must write a program to satisfy this. If the program meets all the requirements, that’s usually a success (modulo code quality). But if the program doesn’t do what it is supposed to, how do you judge its merit?

When marking programming assessments, educators usually try to judge if the student had the right idea, or if they were working along the right lines. Perhaps fittingly, judging the quality of a part-finished program is a task which needs a human. No-one has yet invented a metric that can judge program quality: it either works or it doesn’t, and if it doesn’t, judging progress is hard.

This is really the reason why there are so few large-scale programming contests. Either the submissions must be manually judged (which doesn’t scale well, at least without crowd-sourcing), or there needs to be some way of automatically scoring/ranking the submissions. But if all you can test is working/not-working, there’s no way to form a fine-grained scale to pick a winner! Thus the programming contests that do exist (like the ICFP contest) tend to be based around an open-ended problem with a score attached, such as writing a control program for a Mars rover that must take a minimum time to find its way to its destination. Thus, the outcome is measured — not the program itself.

So what do programming games do when they want to score your efforts? SpaceChem opts for two easily measured metrics: program speed, and program size. As a game mechanic, this works reasonably well. Trying to speed up your programs really taxes your brain and adds a new dimension to the game. However, in turns of programming education, this is a pretty terrible idea. Not only does it support students’ misconceptions that speed is important, but to speed up programs you must do all sorts of dirty tricks and learn more complicated, intricate ways of programming. The same goes for optimising for program size. So as good a programming game as SpaceChem is, it demonstrates that measuring programs by these simple metrics is a bad idea.

Presenting Performance

Having said that, one really nice thing about SpaceChem’s scores is the way they are presented. Lots of games just present your score and global rank. But being #148502 in the world is not really very exciting, not is it particularly motivating. So SpaceChem presents histograms, showing everyone’s score, and where yours sits:

As well as being statistically sensible (a point estimate for the mean would not tell the full story), there is motivation to improve. The diagram above shows that my solution was clearly taking a lot longer than most people, and it looks like there’s a fairly common solution (the tall blue bar on the left) that I have missed. I wonder if it would work to hand out histograms like this to school/university classes after an assessment?

Programming Games: SpaceChem

In this series on games that involve programming as part of playing, I’ve so far covered RoboRally and LightBot, and this time it’s on to SpaceChem.

SpaceChem doesn’t help its own cause. For starters, it has a dull, misleading name: it may sound like a chemistry game, but it’s got only slightly more relation to chemistry than Angry Birds has to birdwatching. SpaceChem is a programming game, through and through. Its interface is rather, well, austere (see picture below), the tutorial is a bit sparse, and it’s difficulty curve is a lesson in what the function f(x) = x^x looks like. Now that we’ve set out its flaws, we can go on to look at why I believe SpaceChem to be the best programming game yet made.

Programming Model

SpaceChem is rather close to assembly language programming. It has two parallel threads of execution, each with an instruction pointer (called a “waldo”) and a series of instructions that are rather familiar: input, pick up (load), put down (store), output, and later on: conditional execution. The difference between SpaceChem and text-based assembler is that the “memory” is a 2D grid, the instruction pointer moves on a two-dimensional track of your choosing, and the data to be manipulated is represented by chemical atoms which can be bonded together. Here is the solution to one of the early puzzles involving simply shuffling a molecule from the input area (left) to the output area (right):

The blue instruction pointer (the hollow circle) starts at the START node. It then follows the path traced by the blue arrows. Each square can have 0 or 1 arrows and 0 or 1 instructions. The INPUT node will summon an atom to the input area, PICK UP will pick it up, so that the instruction pointer carries the atom around the track, until the PUT DOWN node is reached, and then the OUTPUT node sends it off. The instruction track is a loop, so the program will repeat until enough output atoms have been produced.

Although SpaceChem is primarily a sequence of instructions to be executed, as in RoboRally or LightBot, there is also a strong element of control flow. LightBot has functions and basic recursion, but SpaceChem has an abundance of loops, and in fact has control flow beyond most programs (instructions can usually be executed twice by lying where two control paths intersect). You must consider state and dependency a lot more: INPUT instructions must execute before the PICK UP instruction, so that the atom is ready to be picked up. You must make sure you output old data before attending to the new data: a primitive sort of scope. Problems must be broken down into smaller steps or else you have no hope of solving them. Small components (e.g. a fetch-bond loop) can (and must) be re-used as part of larger solution. The complexity is quickly on a different level to that of LightBot: if LightBot is for kids playing around, SpaceChem is for teenagers or adults who must apply serious brainpower.

Difficulty Wall

SpaceChem shares several traits with LightBot: it has a clearly defined goal (the atoms that you must output), all the state is visible at once (essentially, the grid, and the atoms currently present), and it’s relatively easy to see how the instructions get executed. But SpaceChem is not content to build up a progression of simple puzzles. You soon start using two threads in tandem (one red instruction pointer, one blue), which leads to issues with race hazards, so you have to start using a SYNC instruction to coordinate. You can see the two tracks in action in SpaceChem’s intro video:

You move on from programming one reactor (as seen earlier) to systems of several connected reactors wired together, so that the output of one reactor feeds to the input of another. Therefore large problems must be broken down into several smaller steps, each to be solved by a different reactor:

You can even save the reactors for later use. Code re-use and modularisation! As you progress through the game, you get access to new instructions: detectors for conditional execution, “flip-flops” for state. And soon the problems grow large enough that the eight by ten grid is very limited space, so you must be cleverer about what you are doing. But this really is programming: hardcore clever fiendish programming of the sort that old programmers get all nostalgic about, and that is largely (and thankfully) lost to the ages. Even as a fairly experienced programmer, I found it quite a challenge, and still have yet to complete all the levels! To give you a feel, here’s one of the neater and simpler solutions for a later level:

Bettering SpaceChem

SpaceChem is too hard to give to most students, but it has a nice programming model and visualisation that has a higher ceiling than any other programming game. I feel like there is a great game to be made, suited to a much wider range of students, based on SpaceChem’s core template. SpaceChem Jr, if you like. Dump the atoms to avoid the chemistry link and replace them with farm animals, or sticky balls of goo, or whatever. Remove the explicit INPUT and OUTPUT instructions, make the grid larger (to avoid the cramped space), greatly reduce the difficulty curve, maybe add in re-usable components on the reactor level, and find a better way of doing extra state than SpaceChem (flip-flops) and LightBot 2.0 (colours). I think that could be a real asset for introducing programming in the classroom via games.

SpaceChem is available for Windows, Mac, Linux, and also iOS and Android (though I wouldn’t recommend it on a phone — only for a tablet). There’s a demo if you want to give it a try (demo link is on the right-hand side of this page). The game is only 10 USD if you want the full thing — though if you hurry, it’s currently 50% off on Steam. In the next post I’ll look at a couple of the competitive aspects of SpaceChem and why programming games tend to teach some bad practice.

Programming Games: LightBot

This series of posts looks at games that involve programming as part of playing them. Today: LightBot, a free online flash game.

In my last post, I described RoboRally, a board game involving programming a robot using simple commands like move forward, turn left, and so on. In this post, I’m going to look at LightBot, a computer game involving programming a robot using simple commands like move forward, turn left, and so on. Hmmmm! I don’t know if LightBot was actually copied from RoboRally, but either way: the mechanics are very similar.

The interesting parts of LightBot are, therefore, where it differs from RoboRally. In terms of the programming mechanics, the main addition LightBot makes over RoboRally is the addition of functions/procedures. LightBot makes their use necessary by having problems that require more instructions than you can fit in the normal program execution, thus requiring you to pull out parts into a procedure, and calling that several times. It’s quite a neat, simple addition that adds another concept (although there’s no parameterisation for the functions, which obviously limits them).

The major difference between RoboRally and LightBot are the overall setup. RoboRally is a competitive game in which you must guide your robot around to the next checkpoint, dealing with the board obstacles. (Though I guess you could adapt RoboRally as a solitaire game…) LightBot has an explicit set of challenges that increase in difficulty and progressively introduce new concepts. The design of puzzle games is very similar to ideas in instructional design: you must slowly add new tools to the player’s toolbox and offer simple problems that demonstrate the use of each new tool, before you can then offer a problem that involves combining several of the tools.

The other benefits of LightBot are that the goal of each puzzle is clear, the behaviour of the robot and the correspondence to the instructions is clear, and the state of the world is simple and visible. This makes it easy to understand what the instructions do, and easy to see where you are going wrong. It’s no surprise that many teachers use LightBot as a good introduction to programming, especially for younger pupils.

LightBot and RoboRally both share these advantages of having highly visible state and allow easy understanding of what each instruction does. So why can’t all programming be like this? I think the answer is that programming all too quickly involves too much state to be visualised succinctly. As soon as you have more than a few variables (and maybe a data structure), the state becomes too much to visualise all at once. And when your program gets big enough, you shouldn’t need to visualise the whole state anyway: modularisation is an important part of programming later on, precisely because it allows you to consider one part in isolation of the others. Although that’s not to say that programming games can’t get complex, as we’ll see in our next example.

Programming Games: RoboRally

There are relatively few games that feature programming as a core mechanic. In this post I’m going to explain one such game: RoboRally. For those who know LightBot (which I’ll cover next time), RoboRally is like a non-electronic, multiplayer precursor to LightBot.

Game Mechanics

RoboRally does involve programming, but it’s a board game rather than a computer game:

The game is centred around a group of robots (hence Robo) let loose in a dangerous factory, racing towards a series of checkpoints (hence Rally), while trying to shoot and ram each other as much as possible. Hmmm, this one might be a bit of a boy’s game, but bear with me! Play takes place on a combination of these boards of square grids:

Each robot occupies one square, and the board is littered with environmental hazards: lasers that damage you, pushers that shove you, conveyor belts that move you around and so on:

Your job is to guide your robot from A to B. And how you do this is where the programming comes in. There is a deck of movement cards, with very simple movement primitives: move forward (1, 2, or 3 squares), move backwards (1 square), turn left (90 degrees), turn right (90 degrees) and turn 180. You normally get dealt nine of these cards at the beginning of each round:

You must then select five of these cards to comprise your movement for this turn. Every other player does the same. Then you all reveal your cards, one by one, and execute the moves on them. Here’s an animated example (although here I’m covering up the cards to show which moves I’ve made):

The game has various other features which add challenge. Every unit of damage you incur (from players or board obstacles) reduces your starting hand by one card, which starts to get difficult. When you are close to another robot, you must start to consider that you might get rammed, i.e. pushed one space, which really upsets your planned movement! And you want to try and avoid being shot by other players, or else you’ll soon have very few cards each round.


So what programming concepts does RoboRally feature? The first is sequencing of instructions: your cards take effect one after the other. Aside from the unpredictability of other players’ robots, the game is completely deterministic. Your move cards do exactly what they say, and your robot’s interaction with the board has no element of chance. In fact, it all sounds very easy! But a major complicating factor is that you must plan all five cards without moving your robot. So you must be able to predict where your robot will be after each card, in order to get the next move set up correctly. So you have to be able to mentally execute your moves.

The Learning Curve

When players first start out in RoboRally, the first challenge is just to string together five cards to move to where they want. The mental execution proves a bit tricky, so after the third or fourth card, players end up not quite where they had imagined, and thus the fifth card can see them lurching forward towards a deadly pit! But by executing the moves (and with players watching and correcting each other), they can see the disagreement between their prediction and the actual execution. One nice feature of RoboRally is that all state is visible: your robot’s position and rotation on the board is the only thing to keep track of. And the good thing is that failing is funny, with robots unintentionally running into walls or pits.

After players have mastered the basics of movement, they have to then understand the environmental hazards. Walls are very straightforward: you can’t pass through them. But boards can also feature conveyor belts, which move you forwards (and potentially turn you) if you are standing on them, and pushers which shove you sideways. So players must learn the rules of the board squares, and form a mental model of what will happen, in order to plan their move successfully.

Players soon start learning little tricks, especially when their hand selection becomes limited: a turn left 90 plus a turn 180 is a substitute for the turn right 90 that you desperately needed but weren’t dealt. Sometimes, moving backwards can be a useful way of getting to your goal. Planning eventually becomes almost as easy as you might initially have expected: after all, how hard can programming a deterministic machine be? (Programmers know the answer to that one!) You then have to start factoring in what your opponent on the adjacent square is doing, and planning against it (defensive programming!?).


RoboRally is a great example of a programming game. It has a few flaws as a board game: once one player edges ahead, they tend to stay ahead (while the robots behind shoot each other and get in each other’s way), although you can solve this by playing the capture-the-flag variant instead of plain racing. It can get quite slow with many players, which is especially a problem if you have mixed abilities: each turn proceeds at the pace of the slowest player. But I still think it might be a suitable activity for a small after-school club (though it may run over several sessions). Maybe we can try slotting in an evening game at our teachers’ summer school this year.


I couldn’t end without my favourite bit of RoboRally trivia. Designer Richard Garfield came up with the idea for RoboRally in the mid 1980s and was trying to get it published when he met with a young company called Wizards of the Coast (WotC) in the early 1990s. WotC were interested but felt the game would be quite pricey to produce; they wanted a smaller, cheaper game before RoboRally. Garfield offered them a card game that fit the bill, and WotC bought the card game and RoboRally. That card game was Magic: The Gathering, which quickly became one of the most successful card games ever, and which still has millions of players nearly twenty years after its release. But I prefer RoboRally.