Code highlighting: the lowlights

Syntax highlighting is such a ubiquitous feature in program editors that we often give it very little thought. It has even become a visual signal of program code: you can tell something is code if it is in a fixed-width font and some of the words are consistently coloured. It’s clearly a popular feature, but is it actually helpful?

The latest paper on this (paywalled, alas) is by Hannebauer et al., which I found via Greg Wilson. They set ~400 participants a variety of comprehension and editing tasks and found no difference in correctness between having syntax highlighting on and off during the tasks. So it doesn’t look like it helps with programming.

There are two ways to take this result. One is that since the feature is ineffective, we should stop wasting effort on building it into our IDEs. The other way to view null results is that since it makes no difference either way, we are free to choose based on other considerations. The authors of the paper imply that people seem to like syntax colouring, which may well be for aesthetic reasons. And if it doesn’t get in the way, why not make it look prettier?

The authors end with a suggestion that highlighting syntax keywords may not be the most effective use of colour, and propose a few schemes of their own, such as using colour to show a live git blame display. I’d say the obvious other use of colour is scope highlighting, where a coloured background indicates the extent of each code block. BlueJ has both syntax highlighting and scope highlighting, which can be a bit busy:

When we made the Stride editor, we left out syntax highlighting and just kept the scope highlighting provided by the frame outlines, which seemed less visually noisy:

We haven’t done a study to look at the effect of this scope highlighting. But David Weintrop and Uri Wilensky did something similar when they looked at multiple choice questions shown in text-based form (with syntax highlighting) versus block-based form (which is effectively scope highlighting), and the non-null effects showed a superiority of blocks over text, although the highlighting is not the only difference:

Their paper is available online (unpaywalled).

So although syntax highlighting of tokens does not seem to make a difference, scope highlighting may aid comprehension. (If anyone wants to study this directly, you can toggle syntax and scope highlighting on and off in BlueJ, so be our guest…)

What makes a [computing education] research paper?

Yesterday on Twitter, Jens Moenig had some kind words to say about our journal paper on Stride and complained about its repeated rejection from other journals as a symptom of incorrect criteria for accepting computing education research papers (head to twitter for the full thread):

The core issue was this: the original Stride journal paper was a long detailed description of the design of our Stride editor and the decisions involved, with very minimal evaluation. In general, should this be accepted as a paper?

The case against accepting

Computing education is full of tools. There are lots of block-based editors and beginners’ IDEs and learning assistants and so on. I’ll admit that — even as I work on making new tools — when I come to review a paper with a new tool, I do roll my eyes briefly and wonder if yet another tool is needed. The problem for our field is that we have a lot of tool-makers, but few tool evaluators. There are far fewer researchers (like David Weintrop, for example) who perform detailed comparisons between tools that they did not write themselves. The field has a glut of unevaluated tools, which is surely not helpful for someone wondering which tool to use, and we can’t be sure that any of the tools actually aid learning. In this light, rejecting our paper seems reasonable: yet another paper on a new tool with no evaluation.

The case for accepting

There are two main arguments I see for accepting the paper. One is that the design description itself can be of value. As someone who builds tools, I find it very useful to talk to other designers, like Jens and John Maloney, to find out why they made certain design decisions. I can use their tools — Scratch, Snap, GP — but that doesn’t explain the full story behind the design choices. Jens’ point is that he found it useful to read about our decisions in order to improve the decisions he makes in his own tools. This type of exchange is beneficial for the field — the question then is whether these design descriptions should be considered computing education research papers by themselves, or whether they should go somewhere else (some kind of design journal? or something like a tools paper track?).

The other argument for accepting design by itself is the amount of work. Our paper with design alone was 25-30 pages, which pushes the limit for most journals, and was the summation of three years’ work. A full evaluation would add another year and another 10 pages. Should this be one mega piece of work, or two separate bits of work? It can only be two papers if the first design paper can get accepted by itself. (The counter-argument is that there’s no guarantee the second paper ever appears…)

I will say that there are differences in quality of writing about design. A lot of papers I see on tools fall into the trap of describing technical details which do not generalise (e.g. we used web server X and hooked it up to cloud Y for storage) rather than discussing design decisions and trade-offs and user considerations. They also tend, due to page limits, to have minimal descriptions and pictures of the system as a whole. I had to look quite far back in time for the related work section in the Stride paper, and I can confirm that your paper will outlast your tool, so it needs to be useful to someone who does not have the tool itself. I’ve written in more detail in a previous post about this issue.

Summary

I’m not interested in grousing about one particular paper, but this is an issue that we run into repeatedly in our research. Our team has a lot of expertise in building tools but not much in evaluation. Should we be able to publish our designs as a computing education research paper, or should it always be coupled with an evaluation? It would be much easier for us if we were able to only publish designs, but I’m sympathetic to arguments on both sides.

At the ICER doctoral consortium a few years ago, Sally Fincher and Mark Guzdial ran an exercise asking the students and discussants what computing education research comprised. I said that it was investigations of student learning, and that our tools-only approach was on the periphery. What surprised me was that all the people doing such investigations said that computing education research was tool-building, and their investigations were peripheral. I think perhaps this tension between tools and evaluation is inevitable in our field — but maybe it’s also useful.

Pedagogy of Programming Tools

If you want to teach programming, you have several decisions you need to make. You need to choose:

  • a programming language, such as Java, Python, Javascript, Ruby,
  • a programming environment, which may be something like Notepad + command-line, or a full-blown IDE like Visual Studio,
  • a context, such as making games, media computation, website creation, robotics, and
  • a pedagogical approach, such as what you will teach, in what order, and using which activities.

Not everyone thinks about that last item in terms of explicitly deciding on a pedagogical approach. But as soon as you start making decisions such as “what task do I start with?”, you are implicitly deciding. Do you start with “What is a variable?” or “Here’s how to print ‘Hello World’” or “This is the syntax of a function call”? Do you teach automated testing? Do you start with a blank program or start by modifying an existing program? You have always chosen a pedagogical approach, whether you realise it or not.

What’s interesting about the four items above is that they all interact with each other. The top three clearly so: you can’t write Java in IDLE, for example, and you may find your robot of choice doesn’t support Javascript. But the tool and the language you choose will affect the available pedagogy and vice versa. Programming tools are not pedagogy-neutral. Your tool determines which programming-related activities are easy and which are hard, which in turn will affect how you use the tool to teach.

Code tracing is a useful skill, but doing it in an environment with a debugger that shows variable values step-by-step is much easier than in Notepad+command-line. Parsons problems (where you drag bits of pre-written code into order) are easier in Scratch than in a text editor. BlueJ lets you call methods on objects via the context menu without writing any code, whereas an IDE like IntelliJ does not. It’s useful to understand which pedagogies your tool supports and which it makes difficult when making a choice.

In our latest Greenfoot Live video, my colleague Hamza and I sat down for half an hour to do some Greenfoot programming and talk about pedagogical strategies in Greenfoot: ways you can use it to teach, and what pedagogical approaches we have in mind when designing the tool. I’m quite pleased with how it turned out, and I think it’s worth watching:

Whether you agree with our particular pedagogical philosophies or not, next time you choose a programming language and tool, be aware of which teaching approaches and activities it can support well, and which it will make hard for you to engage in.

Frame-Based Editing: The Paper

Frame-based editing is our work which combines blocks and text-based programming into a single method of editing. Our frame-based language, Stride, is available for use right now in the public releases of both Greenfoot and BlueJ.

Now, our large paper on frame-based editing has been published. You can freely download the individual paper, or the whole special issue with several other interesting-looking papers on block-based programming.

This paper is the canonical description of our frame-based editing work, describing its features and our design choices. I’m also quite pleased with section 13, which tries to explain why structured editing failed to take off, and yet block-based programming became a great success, despite being very similar concepts.

Thanks are due to John Maloney and Jens Moenig who have been very supportive of our work, as has the editor Franklyn Turbak, who did a very thorough job of editing the paper.

Greenfoot Scenarios Back Online

Many years ago, we created the ability for users to upload Greenfoot scenarios to our website, and play them from the website without requiring Greenfoot to be installed locally. This was a great selling point. If you were a learner creating games at home or at school, it allowed you to share the game with friends and family by just giving them a link to the online game.

However, the implementation used Java applets, which have turned out to not be secure. Over time, browsers have one by one dropped support for applets, and slowly it became harder and harder to run the Greenfoot scenarios in a typical web browser. The only technology guaranteed to work in every browser is Javascript. Despite the names, Java and Javascript are not really related, and so converting Java to Javascript is a very difficult technical challenge. But… my colleague Davin has been working on this, and in a surprisingly short time, has got a Java to Javascript converter working for Greenfoot scenarios. It’s currently in beta testing, and details are available on our forum about how to run this.

The Marbles scenario is a good one to try. Remember that you need to press the Run button before playing. (Enlarging this button and making it easier to start playing is one of the items still on our todo list.) Then drag from the gold ball to fire it.

On desktops and laptops, this really just restores functionality that we had several years ago with applets. However, the new advantage of being in Javascript is that it means the scenarios on the website can now run on Android and iOS devices. You can’t develop the games on such devices, but once published to the website, they will be playable on other devices (essentially, touch works like a mouse device as far as the scenario is concerned). This should allow Greenfoot games to be shared widely to friends and family. Once it is finished, this will be publicly available by default, but for now you need to follow our instructions to enable this.

Greenfoot Live

There are many aspects to making a learners’ programming tool successful. You obviously need the tool to be working and useful, but you also need material available on how to use it. Over the past year or two, our team has been very busy with a lot of technical implementation work: having made our Stride editor available, we’re now neck-deep in a rewrite of BlueJ’s GUI from Swing to JavaFX. So we have fallen a bit behind on our efforts to communicate how to teach with Greenfoot, and to engage with teachers who are using it. But this Monday we started a new initiative to try to rectify that: Greenfoot Live. Our plan is that every two weeks, on a Monday at 17:00 UK time (which is UTC+1 over the summer), we will do a live stream of about 30-40 minutes talking about Greenfoot. If you can make it live, great — but if you can’t, you can still watch the recording afterwards.

Our first show involved Michael and me covering how to display text on the screen in Greenfoot using the showText method; along the way we also discuss a few software design choices and run into an exception:
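For anyone who wants the gist without watching the whole video, here is a minimal sketch of the sort of thing showText is for. The ScoreWorld class, its dimensions and the score counter are invented for illustration; showText(text, x, y) itself is the real World method:

```java
import greenfoot.World;

// Invented example: a world that keeps a score counter on screen.
public class ScoreWorld extends World
{
    private int score = 0;

    public ScoreWorld()
    {
        super(600, 400, 1);  // width, height, cell size (arbitrary values)
        showText("Score: " + score, 80, 20);  // draw the initial score
    }

    public void addScore(int points)
    {
        score += points;
        // Showing text at the same position replaces the old text,
        // so calling showText again effectively updates the display.
        showText("Score: " + score, 80, 20);
    }
}
```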

The next live stream will be on Monday 22nd May at 16:00 UTC, available via our Greenfoot Youtube channel.

What Do Novice Programmers Write Literally?

Recently I was asked about use of hex and floating point literals (especially “E” notation) in the Blackbox data set: do beginners use them? I was intrigued enough to knock up a simple program to find out. My method is quite straightforward: I take the latest version of each source file which successfully compiled, run it through a Java lexer and pick out the literals. This gives us about 40 million source files to look at.
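For the curious, here is a rough sketch of the shape of that extraction. This is not the actual analysis code: the real version uses a proper Java lexer and reads from the Blackbox database rather than files on disk, and a regex like the one below will also pick up “literals” inside comments and strings, so treat it only as an illustration of the idea.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

// Crude literal counter: scans .java files under a directory and tallies
// numeric literals found by a regular expression.
public class LiteralCounter
{
    // Hex integer literals (e.g. 0xFF), or decimal/float literals with
    // optional underscores, fraction, E notation and type suffix.
    // Note: leading-dot literals like ".5" are not matched by this pattern.
    private static final Pattern LITERAL = Pattern.compile(
        "\\b0[xX][0-9a-fA-F_]+|\\b\\d[\\d_]*(\\.[\\d_]+)?([eE][+-]?\\d+)?[fFdDlL]?");

    public static void main(String[] args) throws IOException
    {
        Map<String, Integer> counts = new HashMap<>();
        try (Stream<Path> paths = Files.walk(Paths.get(args[0])))
        {
            List<Path> javaFiles = paths
                .filter(p -> p.toString().endsWith(".java"))
                .collect(Collectors.toList());
            for (Path p : javaFiles)
            {
                Matcher m = LITERAL.matcher(Files.readString(p));
                while (m.find())
                    counts.merge(m.group(), 1, Integer::sum);
            }
        }
        // Print the twenty most frequent literals:
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
              .limit(20)
              .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```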

Before we get into the results, here are some predictions I made beforehand about our data (where most users are assumed to be programming novices):

  • Very few users use hex literals
  • Most hex literals are 0xFF or similar bitmasks
  • Almost no-one uses underscores (Java lets you write numbers with underscores, e.g. 1_000_000)
  • Almost no-one uses E notation (and when they do, mainly 1e-6 for epsilon values in floating point comparison)
  • Most floating point values are between 0 and 1

Hexadecimal Integers

Let’s start with hexadecimal integer literals. There were 814,920 hex integer literals, compared to 29,044,559 decimal integer literals, so 2.7% of hex/decimal integer literals were in hex. (I didn’t bother going into octal, but there were a handful of uses; I suspect many of them were an accident.) That is a bit higher than I was expecting, admittedly. In terms of their value, here’s the top five:

  • 0xFF: frequency 89,663
  • 0x0: frequency 52,732
  • 0x30: frequency 16,742
  • 0xF: frequency 16,009
  • 0x1: frequency 13,799

There are two F bitmask values there as predicted. I was a bit surprised by how many zeroes and ones were in there: why write them as hex (0x0) and not just decimal (0)? My guess is that they are working with bitmasks nearby, and out of habit/consistency write the values as hex.
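That guess fits the typical pattern in, for example, image-manipulation exercises. An invented illustration of where 0xFF tends to turn up:

```java
// Invented example of the kind of code where 0xFF bitmasks appear:
// unpacking an ARGB pixel value into its separate colour channels.
static int[] unpackArgb(int pixel)
{
    int alpha = (pixel >> 24) & 0xFF;
    int red   = (pixel >> 16) & 0xFF;
    int green = (pixel >>  8) & 0xFF;
    int blue  = pixel & 0xFF;
    return new int[] { alpha, red, green, blue };
}
```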

Decimal Integers

There’s not too much to say about decimal integer literals, but I will mention the most frequent items. It’s a sequence that runs as you might expect (zero being most frequent, and increasing numbers being less frequent), punctuated by some numbers which testify to computing’s love of powers of two. Most frequent first:

0, 1, 2, 255, 128, 3, 4, 5, 256, 8, 10, 7, 6, 100, 127, 16, 20, 1000, 9, 50

The frequencies fall off roughly like a power law (1 is half as frequent as 0, 2 is half as frequent as 1, and then the tail begins to flatten out).

Underscores

Underscores are a relatively recent addition to Java (added in Java 7) and little known. Indeed, only 692 decimal literals had underscores: 0.002% of all decimal literals. Oddly, 737 hex literals had underscores, which as a proportion is much higher: 0.09%. I suspect this is because underscores and hex literals are both used by more advanced users. Generally, though, our users are clearly not making much use of the underscore feature.

Decimal Floating Point

There were 1,791,915 floating point decimal literals. Of these, only 3,002 used the “E” notation (e.g. 1.15E12): 0.16%. Clearly not a widely used feature. As for their values, the top five were: 1e-3, 1e-8, 1e-6, 1e6, 1e-20. I’d say my prediction about the use for epsilon values was borne out.
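For anyone unfamiliar with the epsilon idiom, this is roughly what such literals are used for (my own illustration, not code from the data set):

```java
// Comparing floating point values for "equality" with a tolerance;
// this is where literals like 1e-6 or 1e-8 typically appear.
static boolean approxEqual(double a, double b)
{
    return Math.abs(a - b) < 1e-6;
}
```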

Regardless of notation, across all floating point decimal literals, the most frequent values were: 1.0, 0.0, 100.0, 3.0, 2.0. Technically, my prediction that most values were between zero and one was almost correct: 47% of values were between zero and one. But really, this is only because 23% of them were zero or one. As a last side note on these literals: 7,130 (0.40%) started with a dot (e.g. “.5”), something we disallowed in Stride due to the awkwardness of parsing it in expressions. But actually we could have banned E notation (also a pain) with less immediate impact.

Hexadecimal Floating Point

If you even knew that hexadecimal floating point notation was a thing in Java, then give yourself a pat on the back. Added in Java 5, they look like “0x1.fe2p5”, where p takes the place of the usual “E” notation because E is of course a valid hex character. I only know about this because we have a parser in BlueJ, which does accept these. I found precisely four uses of this notation, which is probably more than expected.
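To show how the notation reads (my own examples, not values from the data): the digits before the p are a hexadecimal mantissa, and the number after it is the power of two to multiply by.

```java
public class HexFloatDemo
{
    public static void main(String[] args)
    {
        double x = 0x1.8p1;   // hex mantissa 1.8 = 1.5, times 2^1, giving 3.0
        double y = 0x1.0p-3;  // 1.0 times 2^-3, giving 0.125
        System.out.println(x + " " + y);  // prints: 3.0 0.125
    }
}
```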

Limitations

This is a pretty cursory look at literals, with a fairly crude methodology. Note that although we only looked at the latest version of each source file, source files in Blackbox are not independent of each other (e.g. if a teacher gives out a project with a floating point literal, that will show up identically in each student’s copy). For example, the four hex floating point literals all had the same value, suggesting they are not independent. On a related note, I’ve only looked at source files regardless of whether they come from the same user or not, so we’re only measuring source occurrences here, not the number of users who use a particular notation. But I think our N is high enough that individual users cannot tilt the statistics.