Evaluating our Frame-Based Stride Editor

Last week at ICER, Thomas Price presented a paper: “Evaluation of a Frame-based Programming Editor”. I provided some technical assistance with the data recording (and not very proficiently: I was responsible for some lost data), but Thomas and his local co-authors at North Carolina State University (NCSU) did all the hard work.

The study aimed to compare our frame-based Stride editor with text-based programming. Since Greenfoot supports both Java and Stride, it makes for a rather neat experimental design: students can do exactly the same tasks in the same IDE, with only the editor component differing between the two conditions. Thomas & co at NCSU set up such an experiment, with one class of middle-school students doing a Greenfoot exercise in Java, and the other condition doing the same exercise in Stride. We recorded detailed data on their programming activity by re-purposing the code from our Blackbox data recording project.

It’s no surprise to say that the Kent side of the team were hoping for positive results for Stride, as we are personally invested in our new tool. The nice thing about this study is that with the NCSU side doing all the data recording and all the analysis of the results, we managed to minimise any such bias. And as often happens in real experiments, the results do not have a single clear narrative. My one-paragraph gist is as follows:

The students in both conditions rated the activity as low frustration and high satisfaction. No differences in satisfaction were found between Java and Stride, though there may be a ceiling effect. Students in the Stride condition advanced through the task instructions faster than those in the Java condition, and completed more objectives with less idle time. Less time was spent making syntactic edits in Stride than in Java, and less time was spent in Stride with non-compilable code.

Mark Guzdial pointed out on Twitter that, given that the Java students did seem to tail off (spending more time idle and completing fewer of the later objectives), it was quite surprising that there was no difference in frustration, satisfaction or performance.

I wonder if any difference in frustration/satisfaction may also be masked by the ceiling effect: if students in Stride were super-satisfied and unfrustrated at the extremes of the scale, a slight reduction on the Java side may not have been picked up. As for the lack of difference in time, I’m guessing that the time saved by Stride’s syntax benefits is being partially absorbed by struggles with the editor paradigm, such as learning the command keys or editing expressions.

The paper is freely available to read, so if you are interested, take a look.

Measuring Errors

One interesting technical aspect of the paper concerns the measurement of errors. One of our hypotheses was that Stride should eliminate or reduce many common Java errors, such as syntax errors (e.g. mismatched brackets) and forgotten method parameters.

In Greenfoot 2.x and BlueJ 3.x, students must explicitly click the compile button to compile their code, which either succeeds or displays an error. With explicit compilation you get information from the fact that the user clicked compile: they probably think their code is in a state where a compilation should succeed. And if in two conditions users compile with a similar frequency, you can compare the proportions of successful compilations to get an idea of any differences in error rates.
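To make that comparison concrete, here is a minimal Java sketch, assuming each click of the compile button is logged as a single success/failure outcome. The class and method names (`CompileSuccessRate`, `successRate`) are hypothetical and the outcomes are invented; this is not the study's analysis code.

```java
import java.util.List;

// Hypothetical sketch: compare the proportion of successful explicit
// compilations between two conditions. Each Boolean records whether one
// click of the compile button succeeded.
public class CompileSuccessRate {

    // Fraction of compile attempts that succeeded (0.0 if there were none).
    static double successRate(List<Boolean> compileOutcomes) {
        if (compileOutcomes.isEmpty()) {
            return 0.0;
        }
        long successes = compileOutcomes.stream().filter(ok -> ok).count();
        return (double) successes / compileOutcomes.size();
    }

    public static void main(String[] args) {
        // Invented outcomes for illustration only; not data from the study.
        List<Boolean> conditionA = List.of(true, false, true, true);
        List<Boolean> conditionB = List.of(true, false, false, true);
        System.out.printf("Condition A: %.2f%n", successRate(conditionA));
        System.out.printf("Condition B: %.2f%n", successRate(conditionB));
    }
}
```

This prints 0.75 and 0.50. Comparing the two fractions only makes sense under the assumption stated above: that users in both conditions compile with a similar frequency, so each click reflects a similar belief that the code is ready.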

Greenfoot 3.x and BlueJ 4.x have changed this. Now, errors are displayed automatically whenever you stop typing for a few seconds. So we lose the information on whether students think the code is error-free, and we get a lot of spurious errors. Worse, the error rates will differ between Java and Stride for reasons unrelated to students' actual mistakes. For example, if you slowly enter “if (x < 0) {x=1+2;}” in Java, you'll get syntax errors from the moment you start typing until you enter the final closing curly bracket. In Stride, the code will be valid in several intermediate states (e.g. the if-frame will compile successfully once you've entered the condition, before you've begun the body). In that case it is not fair to say that Stride causes fewer errors.
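One way to see why automatic checking skews the counts is to model when the checks actually fire. The sketch below assumes a simple rule: a check runs once no key has been pressed for a fixed pause (an assumed 2 seconds here; the real delay in Greenfoot 3.x may differ). `PauseTriggeredChecks` and `checkTimes` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "check for errors once the user pauses":
// given keystroke timestamps (ms, in ascending order) and an assumed pause
// threshold, return the times at which an automatic error check would fire.
public class PauseTriggeredChecks {

    static List<Long> checkTimes(List<Long> keystrokes, long pauseMillis) {
        List<Long> checks = new ArrayList<>();
        for (int i = 0; i < keystrokes.size(); i++) {
            long fireAt = keystrokes.get(i) + pauseMillis;
            boolean isLast = i == keystrokes.size() - 1;
            // A check fires only if the next keystroke does not arrive before the pause elapses.
            if (isLast || keystrokes.get(i + 1) >= fireAt) {
                checks.add(fireAt);
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        // Two bursts of typing separated by a 5-second gap, with a 2-second threshold.
        List<Long> keys = List.of(0L, 300L, 600L, 5600L, 5900L);
        System.out.println(checkTimes(keys, 2000L)); // [2600, 7900]
    }
}
```

Each check samples whatever half-typed state the code happens to be in at that moment, so how many checks occur, and whether they report errors, depends on the student's typing rhythm as much as on their mistakes.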

Thomas and I struggled to find a sensible way to compare error rates. Obviously there are other measures which we could and did use, such as task progress, or the occurrence of specific kinds of error (e.g. mismatched brackets). But it is still interesting to compare overall error rates. The measure we settled on is "time spent with uncompilable code". It is not perfect, but it seemed a reasonable first attempt, and better than, say, the number of errors per minute or the percentage of successful compilations. I'd be interested to hear of any other such measures, perhaps from studies of professional programmers, whose IDEs also report errors automatically. The result seemed to favour Stride:

The Stride group spent on average 7.45 minutes less time with non-compilable code than the Java group, which spent on average almost half of the activity with non-compilable code.
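As a rough illustration of how a "time spent with uncompilable code" measure can be computed from a log of automatic error checks, the sketch below assumes each check records a timestamp and whether the code compiled, and that this state holds until the next check (or the end of the session); time before the first check is not counted. The `Check` record and `uncompilableMillis` method are hypothetical, and the timeline is invented; this is not the paper's analysis code.

```java
import java.util.List;

// Hypothetical sketch of the "time spent with uncompilable code" measure.
// Each Check records when an automatic error check ran and whether the code
// compiled; the state is assumed to hold until the next check (or session end).
public class UncompilableTime {

    record Check(long timeMillis, boolean compiles) {}

    // Checks must be in time order; returns total milliseconds spent uncompilable.
    static long uncompilableMillis(List<Check> checks, long sessionEndMillis) {
        long total = 0;
        for (int i = 0; i < checks.size(); i++) {
            Check current = checks.get(i);
            long until = (i + 1 < checks.size())
                    ? checks.get(i + 1).timeMillis()
                    : sessionEndMillis;
            if (!current.compiles()) {
                total += until - current.timeMillis();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented timeline for illustration, not data from the study.
        List<Check> checks = List.of(
                new Check(0, true),
                new Check(60_000, false),   // error appears at 1 min
                new Check(150_000, true),   // fixed at 2.5 min
                new Check(400_000, false)); // error appears again, never fixed
        long sessionEnd = 600_000;          // 10-minute session
        long bad = uncompilableMillis(checks, sessionEnd);
        System.out.printf("Uncompilable: %d s (%.0f%% of session)%n",
                bad / 1000, 100.0 * bad / sessionEnd);
    }
}
```

With this made-up timeline the program prints `Uncompilable: 290 s (48% of session)`. Treating the state as constant between checks is a simplification: it misses edits that briefly fix and re-break the code between two checks.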

I was also interested in another of the paper’s minor results. Our Stride editor provides customised help in a number of cases. If you begin to write a variable declaration in Java, get as far as “int”, then go off to look something up (forgetting the variable name), you won’t get a very helpful error message from the Java compiler (“Not a statement”). In Stride, we can give the more specific error “Variable name cannot be blank”. We have only a small number of these customised errors, but we found that four of the top five errors in Stride were our customised versions. This means that Stride can replace the often-unhelpful Java compiler error messages in the most frequent cases, which I think bodes well for assisting novices.
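A minimal sketch of the idea, assuming a hypothetical `VarFrame` record with separate type and name slots (this is not Stride's actual implementation): because a frame-based editor knows which slot is empty, it can report a targeted message before the code ever reaches the Java compiler.

```java
import java.util.Optional;

// Hypothetical sketch: a frame-level check on a variable declaration that
// produces a targeted message before the code reaches the Java compiler.
// VarFrame and check() are illustrative names, not Stride's actual code.
public class VarFrameCheck {

    // A declaration frame with separate slots for the type and the name.
    record VarFrame(String type, String name) {}

    // Returns a customised error message, or empty if the name slot is filled.
    static Optional<String> check(VarFrame frame) {
        if (frame.name() == null || frame.name().isBlank()) {
            // Per the post, javac only reports "Not a statement" for a bare "int".
            return Optional.of("Variable name cannot be blank");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A declaration abandoned after typing only the type.
        System.out.println(check(new VarFrame("int", "")).orElse("OK"));
        // A complete declaration.
        System.out.println(check(new VarFrame("int", "count")).orElse("OK"));
    }
}
```

This prints `Variable name cannot be blank` followed by `OK`. The check works on the editor's structured slots rather than on parsed text, which is what makes the more specific message possible.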

Our current plan is to add Stride to the forthcoming BlueJ 4.x, and to add Blackbox support for recording it, which should give us some interesting data on the use of the language. (It will also provide an interesting opportunity to contrast behaviour between BlueJ 3.x, with explicit compilation, and behaviour in BlueJ 4.x, with automatic error display.) If you have signed up for access to Blackbox, you will also get access to all this data.
