Category Archives: Uncategorized

Stride and Git arrive in BlueJ preview

We’ve just released a preview version of the next major BlueJ release, 4.0.0. It’s version 4.0.0preview2, available for download from the main website. There are several features available in this preview release:

Stride is our blocks-like structured code editor which we added to Greenfoot 3, and we’ve now added it to BlueJ as well. There are lots of details elsewhere on Stride so I won’t reproduce them all here. There is our guide with a simple text+pictures overview of the editor. We recently noticed that a conference talk we gave at last year’s JavaOne made its way online, so you can watch a video of that on YouTube. And Michael has been making a few short videos about Stride over on his blog, which you can watch. We’ve now included two-way conversion from Stride to Java and Java to Stride, so it’s easy to take your existing Java projects and convert them to Stride to get a good feel for the editor.


We’ve also added Git to BlueJ. People have been asking for Git support for years, but previously we only had Subversion and CVS(!) support. We’ve now added Git, with a fairly simple interface that makes it easy to get started with using Git to version control and share BlueJ projects. We’ve got a draft tutorial online for Git in BlueJ. (And CVS support has now been removed.)

Another major change is in the error highlighting. Previously, BlueJ would only show an error when you hit the compile button, and then only one. That’s now changed to match the behaviour of most Java IDEs: errors are shown as a red underline as you write the code, and if there are several errors, they are all highlighted. It will be interesting, once the full release is out, to look at our Blackbox data and see what effect this change has (if any) on programming behaviour in BlueJ.


The last change is one of the most time-consuming but least catchy. We’ve rewritten large parts of BlueJ’s interface (from using Swing to using JavaFX, the newer Java GUI toolkit). Along the way we’ve improved various features, which I’ll talk more about another time. Probably the most noticeable change is that we now support tabbed editors (you can see the tabs in the pictures above), with multiple editor tabs in one window, rather than always having a new window for each editor (another oft-requested feature).

It’s called a preview release because we know it’s not quite finished: there’s still a bit of GUI to improve, some small bugs to iron out and so on. But we think it’s close enough to let everyone have a play with it. If you spot any issues, let us know: in the comments here, by email to, or on the Blueroom teachers site.

Filed under Uncategorized

A case for publishing research software

A major part of research is acquiring and sharing knowledge. This manages not to be as straightforward as it should be for political/business reasons (see: journal publishers, paywalls and open access), but technically it is at least simple. You write a paper consisting of words and pictures, other people download them and read them. Knowledge has been transmitted. Where life gets much more difficult is a newer but fast-growing part of research: sharing software.

One use of software in research is as a tool for doing analysis. This affects all the natural sciences, and there are issues with how to gain credit for producing software (see the new proposed policy on software citation, and the new journal of open source software). But within computer science there is an additional research role for software: sometimes the software is part of the research output. Nowhere is this more apparent than in areas like human-computer interaction research.

Publishing software interface research

The typical form of a modern paper about a new software interface is to provide a description of the interface, followed by an evaluation of the interface with human participants. Thus the research output is two-fold: design and science. Putting these both into a paper might seem sufficient.

The classic software research process. The researchers follow the process on the left, but everyone else only sees the output on the right. The researchers must have the software, but they are not required to share it, only to describe it in the paper.

However, accurately describing an interface design in text is a difficult task — the medium is just ill-suited. (Much like writing about music being compared to dancing about architecture.) It is difficult to describe the function of all interactions with the system: you’d write an endless series of “when the user presses left here they return to the home screen”; something almost akin to the original program code. You’d also need to describe not only the intended interactions but what happens when the user does something wrong. You also can’t use pure text: images are surely necessary to portray an interface. And that’s not to mention emergent properties which affect software’s usability, like the speed of the interface. Ultimately, if you want to understand the design of a software interface, there’s very little substitute for just using the interface.

Research Software Archaeology

Recently, I needed to write a detailed related work section for our work on frame-based editing. One of the challenges of publishing this work is that it is similar to work on the structured editors of the 1980s, which have largely failed to catch on. And additionally, it seems every reviewer knows a different editor, so each one seems to come back with “how is this different to structured editor X that I used in the 1980s?” [1]

So I end up searching for details about the design of 1980s structured editors. If there’s no paper and no software, there’s not really any way to find out about the design. If there is a paper, I hope that it has a reasonably detailed description of the editor (for example, the write-up of the Cornell Program Synthesizer). Regardless, I also try to search for a runnable version of the software. Ha!

There are few editors from that period which are available to run on a modern machine. Some were simply never released, partly because pre-Internet, sharing software was awkward. Some are unavailable due to their age: many of the structured editors were designed for processors or operating systems which are no longer available. So some editors seem to be totally lost — I can’t find any leads on downloading a copy of the Cornell Program Synthesizer, for example. Some other editors have a tantalising binary distribution which often cannot be run: for example, Boxer’s Mac binary.

The ACSE/GENIE editor, alive and running.

I did have one or two successes, such as getting a version of the GENIE editor running in an emulator. And it was a revelation that greatly pushed forward my understanding of old structured editors. By modern standards, they were awful. The papers’ descriptions didn’t make clear how tedious and fiddly the navigation was, how unhelpful the editor was, how awkward it was to deal with errors. Running the software was an absolutely crucial step to comparing our work to theirs. It allowed me to understand the design and critique the editor’s operation for myself, rather than relying on the authors’ incomplete descriptions of their own software.

For all the other editors which I couldn’t run, there are these reviewers asking the perfectly valid question in research: “How does your work relate to previous work X?” And the honest answer is: I don’t know. Perhaps nobody can know any more — the paper wasn’t very detailed and the software is lost in time. This is no way to do research.

The Solution

The solution to all of this is readily apparent: if your software is part of the research output, you must publish the software. And a binary is insufficient; binaries too easily bit rot, refusing to run on modern systems with no way to fix them. Source code is what is needed.

This week Andy Ko made available his Citrus/Barista structured editor from the 2000s. I downloaded and ran it: the binary did run, but it spat out repeated exceptions and I wasn’t sure if that was impairing the functioning of the software. Thankfully, Ko didn’t just publish a binary: he published the source code. For this, I salute him. I went to modify the source code and it turned out not to compile with a modern Java compiler. After some tweaks I got it compiling, and then fixed the exception. Because the code was on GitHub, one accepted pull request later and the software in his repository will now compile and run on a modern machine. This, this is how software research should be.

Published source code for software is crucial to allow later researchers to use, evaluate and compare the software. I fully understand that everyone feels antsy about publishing source code. If I’m honest, the Citrus source code is a bit confusing, somewhat lacking in documentation and the software seems a little rickety. But that’s how research software usually is; my research code for our Blackbox work is the same. I’m not particularly proud of that code, but my recognition that sharing the source is important marginally outstripped my embarrassment at its quality. Research code will almost always be shaky and iffy [2]. It’s usually written by a single person (often not a professional software developer) for a single purpose, so it’s likely to be hacky and not well documented. Let’s all accept that research code is bad, and agree to share anyway.


[1] It’s interesting to note that when the researchers ask how our work compares, they are implicitly asking about the design, not the science. Given that almost all venues will only accept science or design+science, it’s curious that most of the comparison to related work is about comparing the design. This is at least partly because the science quickly outdates in software interfaces. Even if the older editor papers had performed rigorous evaluations (which they almost exclusively did not), the results don’t necessarily persist. If someone told you that editor X had been evaluated as easy to use and as good as text editors, tested on a 25-line text terminal on a 1980s thin-client Unix machine, would you say that was useful in evaluating editor X against a modern editor? Would it even be worth comparing the usability of our editor directly against a 1980s editor? I doubt it; the usefulness of the previous work is more in comparing our design to theirs, not so much our scientific evaluation against theirs.

[2] Given that we make software — BlueJ and Greenfoot — which we encourage people to use, I should point out that they are actually stable, reliable, and fairly well engineered! And open source, to boot. The setup of our research group and funding allows us to do this, making us blessed compared to other researchers. Quality software in research is of course possible, and the preferred option, but we must recognise that it is a rarity, and not let that get in the way of sharing.


When do students program?

We store enough information in our Blackbox data set to look at when most programming activity in BlueJ occurs. Most BlueJ users are students, so this should give us an idea of when most student programming occurs. Methodology notes below, but what you really want is the graph, so here it is, for the USA:


It’s a heatmap: time of day on the X axis, days on the Y axis, red is highest amount of activity, down through orange to white being zero activity (e.g. overnight). A few thoughts:

  • Lunchtime is clearly visible in the data. Most programming takes place during the working day, partly because of scheduled classes. Despite stereotypes of night owl programmers, on average, people don’t program late at night.
  • Not much programming on Mondays. One reason for this is that US federal public holidays mostly fall on Monday, which reduces the amount of activity (including in the evening), but I’m not sure if that completely explains it.
  • No-one programs Friday night or Saturday… but check out that Sunday night my-assignment-is-due panic! At least, I’m guessing that’s the explanation.

Methodology Notes

We could just look at time of day, but that loses a bit too much detail if you average across all days in the week. So it is better to look at times of day across the week, Monday to Sunday. Weekends are different across the world, so I’ve chosen to narrow by country. Rather than try and pick a list of all the Monday-to-Friday-workweek countries, I just looked at the USA, which is by far and away the country from which we get most data. We store enough information to know the user’s timezone, so I am adjusting properly for the multiple timezones and daylight savings.

The number for activity is a count of IDE events recorded in that hour. That is primarily source code edits, but also includes other things like debugger interactions. I think it’s a good proxy for programming activity, though.
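The binning described above can be sketched in a few lines of Java. This is only a minimal illustration, not the actual Blackbox analysis code: the IdeEvent class, its fields and the sample timestamps are all hypothetical, and it assumes each event carries a UTC timestamp plus the user’s time zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.List;

// Hypothetical event: a UTC timestamp plus the user's time zone.
// Illustrative only; this is not the Blackbox schema.
final class IdeEvent {
    final Instant when;
    final ZoneId userZone;

    IdeEvent(String utcTimestamp, String zoneName) {
        this.when = Instant.parse(utcTimestamp);
        this.userZone = ZoneId.of(zoneName);
    }
}

class ActivityHeatmap {
    // Returns a 7x24 grid of event counts: rows are Monday..Sunday,
    // columns are hours 0..23 in each user's local time.
    static int[][] countByLocalHour(List<IdeEvent> events) {
        int[][] grid = new int[7][24];
        for (IdeEvent ev : events) {
            // atZone applies the UTC offset in force at that instant,
            // so daylight saving changes are handled by java.time.
            ZonedDateTime local = ev.when.atZone(ev.userZone);
            int day = local.getDayOfWeek().getValue() - 1; // Monday = 0
            grid[day][local.getHour()]++;
        }
        return grid;
    }

    // Made-up sample data: two New York events either side of the
    // March 2016 clock change, and one Los Angeles event.
    static List<IdeEvent> sampleEvents() {
        return Arrays.asList(
            new IdeEvent("2016-03-07T17:30:00Z", "America/New_York"),
            new IdeEvent("2016-03-14T16:30:00Z", "America/New_York"),
            new IdeEvent("2016-03-07T06:15:00Z", "America/Los_Angeles"));
    }

    public static void main(String[] args) {
        int[][] grid = countByLocalHour(sampleEvents());
        System.out.println("Monday 12:00-12:59: " + grid[0][12]);
        System.out.println("Sunday 22:00-22:59: " + grid[6][22]);
    }
}
```

In this sample, both New York events land in the Monday lunchtime cell even though their UTC times differ by an hour, because the clocks changed between them. The Los Angeles event counts as late Sunday evening local time rather than Monday.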

Other Countries

Two of the next most frequent countries in the data are Germany and the UK. Here’s Germany:


and here’s the UK:


These frequencies are much lower (note the scale adjustments) so the data is noisier, and I suspect the frequencies are low enough that some patterns in the data are caused by individual institutions using BlueJ at particular times (remember that the students in our data are not totally independent: 100 UK students all programming at the same time at a university on Friday morning could noticeably affect the data). Germany has an odd pattern: more programming on Sunday than almost any other day of the week. This might be because there’s not much to do in Germany on a Sunday, but it also hints that maybe more work is done outside classes. Feel free to add your own speculation for any of the patterns above.


Novice Lambda Use in Java

Java provides two ways to easily provide a reference to a function:

button.setOnAction(e -> showAlert("pressed"));
stream2 =;

The first one is generally referred to as a lambda, and the second as a method reference, but I’m going to refer to them both as lambdas for this post.
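To make the two forms concrete, here is a small self-contained sketch. The class, method and variable names are made up for illustration; they are not taken from the Blackbox data or from the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LambdaForms {
    // Arrow form: the parameter is named and the body is written out.
    static List<Integer> lengthsWithLambda(List<String> words) {
        return words.stream().map(w -> w.length()).collect(Collectors.toList());
    }

    // Method reference form: names an existing method instead of writing a body.
    static List<Integer> lengthsWithMethodRef(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("BlueJ", "Stride");
        System.out.println(lengthsWithLambda(words));    // prints [5, 6]
        System.out.println(lengthsWithMethodRef(words)); // prints [5, 6]
    }
}
```

Both methods compute the same result; the method reference is just a shorter way to write a lambda that does nothing except call one existing method.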

In line with the previous post on enum use, this post looks at lambda use in our Blackbox data set (collected from users of the BlueJ beginners’ Java IDE). All data is from the beginning of the project in mid-2013 up until the end of February 2016, a few days ago.

Lambdas are a very recent addition to Java. Java stems from 1996, but lambdas were only introduced in Java 8 in March 2014. Thus it is not surprising that they will not occur very frequently in the data set, as instructors are likely not up to speed on lambdas yet, won’t have had much chance to adjust their course designs, and often treat lambdas as an advanced topic, in contrast to BlueJ’s novice focus. But let’s take a look anyway.

In the data we have 11,666,331 source files which have been successfully compiled at least once. I looked at the most recent successful compilation of each of those source files, to see if they contained a lambda. 6,669 (0.06%) of those source files contained a lambda, with 20,698 lambdas overall. So although lambdas are very rare, the average number used once they are used is three per file. That suggests to me that people find them quite useful, but that they have not gained any traction in education yet.

Number of parameters

For the lambda arrow syntax (i.e. params -> code), I counted the number of parameters on the left-hand side of the arrow:


The reason that 3 and 4 are on there is that there is one instance of each, from 20k lambdas, with no lambdas having more than four parameters. So if the language designers had restricted lambdas to two parameters max, our users would not have noticed.

Haskell programmers, among others, may be a bit confused by the zero-parameter lambda. This is Java’s equivalent of the use of the fun () -> code pattern in OCaml/F# to refer to some code but delay its evaluation until it is called later without any real parameters. Haskell would use a monadic computation of a type like “IO ()” for this purpose.
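A minimal sketch of that delayed-evaluation use, assuming the standard java.util.function.Supplier interface as the target type (the class name, counter and expensive method are hypothetical, purely for illustration):

```java
import java.util.function.Supplier;

class DelayedEvaluation {
    // Counts how many times expensive() has actually run.
    static int evaluations = 0;

    static int expensive() {
        evaluations++;
        return 42;
    }

    public static void main(String[] args) {
        // Creating the zero-parameter lambda does not run its body.
        Supplier<Integer> delayed = () -> expensive();
        System.out.println("Evaluations so far: " + evaluations);

        // The body only runs when the lambda is called.
        System.out.println("Value: " + delayed.get());
        System.out.println("Evaluations so far: " + evaluations);
    }
}
```

The counter stays unchanged when the lambda is created and goes up by one only when get() is called, which is the whole point of the zero-parameter form.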

Styles of use

I categorise lambda style into three mutually exclusive categories:

list.forEach(x -> print(x.toString())); // Expression RHS
list.forEach(x -> {print(x.toString());...}); // Block RHS
// Method reference

Of our 20,698 lambdas, 14,014 (67.7%) used the expression form, 6,161 (29.8%) used the block form, and 523 (2.5%) used the method reference form. I wonder if a bit of this is advertising: I wasn’t aware of the method reference form in Java for a little while after I learnt about lambdas in the language. I use method references whenever possible in my code because I’m used to point-free style in Haskell. However, I suspect it’s more difficult for novices to understand than the lambda form, so I’m not surprised it’s used much less than the arrow form.
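For reference, here is a small runnable sketch showing all three styles side by side. The names are invented for the example, and the method reference line in particular is my own illustration of that form rather than a reconstruction of the original snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LambdaStyles {
    // Applies each of the three styles to the same list and collects the results.
    static List<String> run(List<String> items) {
        List<String> out = new ArrayList<>();
        // Expression RHS: a single expression after the arrow.
        items.forEach(x -> out.add(x.toUpperCase()));
        // Block RHS: braces containing one or more statements.
        items.forEach(x -> { String s = x + "!"; out.add(s); });
        // Method reference: no arrow, just the name of an existing method.
        items.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b"))); // prints [A, B, a!, b!, a, b]
    }
}
```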

Destinations of Lambdas

I’m doing a syntactic analysis here, so I didn’t attempt the complex semantic task of working out which types the lambdas were being compiled into. However, I did look at which methods the lambdas were being passed to. That is, if you have code like:

.map(e -> e.getX())

Then I recorded “map” as the lambda destination. The graph below gives the most popular destination methods for lambdas. The y-axis is purely cosmetic to separate out items with similar frequencies, and I only show frequency 200 upwards because it gets congested below that:


Since there are low numbers of lambdas, some of this is definitely influenced by individual courses appearing in the data. For example, I tracked the addButton method back to a utility class from a particular university, and I think accumulate has a similar story. But the general pattern is pretty apparent: lambdas are used much more for GUI event handlers than for streams. This is not surprising in one sense: existing code is more likely to use event handlers (which pre-date lambdas) than streams (which were introduced alongside lambdas), so GUIs may be the easiest way to add lambdas to existing courses. I’m still surprised at how large the difference appears to be, though. It will be interesting to check back in a year or two to see if this pattern continues to hold.


Novice Enum Use In Java

An enum in Java is a limited set of constant values. For example:

enum Direction { LEFT, RIGHT, UP, DOWN }

Each variable of type Direction can take on only one of those four values. (Or null, alas.)
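A short sketch of what that restriction means in practice, including the null caveat. The DirectionDemo class and its opposite method are hypothetical helpers written for this illustration.

```java
enum Direction { LEFT, RIGHT, UP, DOWN }

class DirectionDemo {
    // Maps each direction to its opposite.
    static Direction opposite(Direction d) {
        switch (d) {
            case LEFT:  return Direction.RIGHT;
            case RIGHT: return Direction.LEFT;
            case UP:    return Direction.DOWN;
            default:    return Direction.UP; // only DOWN remains
        }
    }

    public static void main(String[] args) {
        // The full set of values is fixed and can be enumerated.
        System.out.println(Direction.values().length + " possible values");
        System.out.println(opposite(Direction.LEFT));

        // Names outside the set are rejected at run time.
        try {
            Direction.valueOf("SIDEWAYS");
        } catch (IllegalArgumentException e) {
            System.out.println("SIDEWAYS is not a Direction");
        }

        // null is still allowed by the type system, and switching on it fails.
        Direction none = null;
        try {
            opposite(none);
        } catch (NullPointerException e) {
            System.out.println("null slipped through");
        }
    }
}
```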

I recently got the chance to talk to Andy Stefik, who does really interesting work on evidence-based language design in Quorum (check out this talk of his for more info). He is interested in adding enums to Quorum, but lamented that there is little data about enum use in other languages. He knew of our Blackbox data set (collected from users of the BlueJ beginners’ Java IDE), and asked about enum use in Java in the educational wild. So: here it is. All data is from the beginning of the project in mid-2013 up until the end of February 2016, a few days ago.

Enum Use in Java

On its initial release in 1996, the Java programming language had no support for enums. They were added in Java 5 in 2004 (although of course adoption of new language versions is never instant). This matters when looking at enum use in Java, because although they have been in Java for 12 years, several software projects started before they existed, many course instructors may have trained before enum use was widespread, and so on. So we will probably see less enum use in Java than if they had been present from the outset.

In the data we have 11,666,331 source files which have been successfully compiled at least once. I looked at the most recent successful compilation of each of those source files, to see if they contained an enum. 20,333 (0.2%) of those source files contained an enum (either as top-level declaration, or an inner type), with 22,693 enums overall.

It’s hard to decide what value we would have expected there. Obviously not every type is going to be an enum. If we’d found, say, 10% of all types were enums then that would be weirdly high. I did a quick count on our own BlueJ/Greenfoot code base, which began life before Java 5. We have only used Java 5 features since 2008, so would not have used enums before then. Across 2129 source files we have 70 enums in 61 files, so about 2.9% of our source files have an enum. (I tried searching GitHub to get some data, but a lot of the enum results seemed to be IDE test-cases. Interesting!)

If anything I would expect that enum use should be higher in teaching than professional code, because I expect that educators will deliberately show enums in order to teach the concept, even though they are not used with particularly high frequency in full programs. So I am surprised to see enums used quite so infrequently in Blackbox.

Size of enums

I made a prediction about enums to Andy: that if we plotted the frequency of number of enum values (e.g. four for my direction example at the top of the post) there would be a spike at seven because many examples would be very artificial and use day of the week. I think this is just about confirmed, although not as pronounced as I had expected:


Looks like the most common number of items in an enum is 3 or 4. The data tails off after 10 as you would expect (but omitted from graph above).

Enum Features

Something else that is of interest about enums is how often sub-features of enums are used in Java. I showed above the use of a simple plain enum, but there are several other allowable features:

enum Color
{
    RED(255,0,0), // Constructor arguments
    GRAY(128,128,128) { // Individual body:
        public String toString() { return "gray/grey"; }
    };

    // Group body:
    int red, green, blue;
    private Color(int red, int green, int blue)
    { this.red = red; this.green = green; this.blue = blue; }
}

So: how many of the enums in Blackbox use each of these features? Of the 22,693 enums, 4,597 (20.3%) have a group body, i.e. fields, methods or constructors as well as just the item list. 3,264 (14.4%) have items which use constructors with arguments. Just 190 (0.8%) have an individual body for any of the items.

Names of Enums

A final item of interest is the most popular names for enums, which gives hints as to what they are being used for. Here’s a graph of the top ones. The y-axis is purely cosmetic to separate out items with similar frequencies, and I only show frequency 900 upwards because it gets congested below that:


The left/right/up/down directions are the most frequent use, colours are another use, and omitted from this graph are monday/tuesday/etc on about 700. This isn’t relevant to enums specifically, but I was interested to see the grad/rad/deg/degmin/degminsec pattern. This is not some extra-common pattern you’ve never heard of, but I believe it is instead an artifact of one of the MOOCs which have taken place using BlueJ. If one of their examples uses a particular enum, this gets multiplied up by the number of users, and thus shows up prominently in our data; this probably also explains “border_collie” and “pitbull” at around frequency 600. It’s something we need to be mindful of when analysing the Blackbox data; not all observations in the data set are as independent as one might think.


Diversity in Computer Science, Revisited

The closing keynote of SIGCSE 2016 was given by Karen Ashcraft on the subject of diversity in Computer Science. It’s a perennial topic which I’ve seen discussed many times, but I felt this talk offered some convincing explanations and ways forward. Here’s my understanding of the talk; you may also be interested in Pamela Fox’s more detailed notes.

Ashcraft’s central argument was that professions take on an identity through a figurative practitioner. That is, software development has an identity which comes from the commonly imagined programmer: white, male, quiet, socially awkward, interested in games, comics and sci-fi, etc. This idea that professions have an identity is different from the idea that the workers have an identity that is transferred to the profession. She gave the example of pilots, where the identity of the profession was deliberately re-sculpted from “debonair daredevil adventurer” to “fatherly naval officer” in the 1930s, because passengers were much more likely to want to fly with the latter than the former. The identity of a profession is constructed and can be manipulated independently of who is actually a member of the profession.

Ashcraft had particular criticism for the idea of fixing diversity by “leaning in”: the idea that women should adjust themselves to fit into the male world. (I’m partial to the description of lean-in as “victim blaming”.) She argued that any attempt to solve diversity which emphasises the divide is bound to lose. That is, if you tell women “women can program too”, you’re emphasising the idea that women differ from men when it comes to programming, and are not going to be successful in eliminating the gender division in programming.

Ashcraft’s suggestion was that to increase diversity in a profession, you should not focus on messages about “more women”, but rather emphasise the diversity already present in the profession which is not automatically tied to attributes like gender. For example, let’s take my imagined programmer. Rather than worry about “programmers don’t have to be white males”, we should start with “programmers don’t have to be quiet, or socially awkward, or interested in games”. Building this type of diversity will eventually encompass non-whites and non-males, if we can keep the identity of a profession broad and fluid, rather than pigeon-holed into something very specific and narrow. And this could benefit everyone.

It’s a well-known phenomenon in ergonomics and user-interface design that if you adjust your design for the extreme users, you often improve the design for core users. For example, kitchen tools were given extra-grippy larger handles to help the elderly but it also made holding them easier for everyone else, so it became the default for all tools. A few more examples are given here. (Closer to home, the interesting work of John Maloney, Jens Moenig and Yoshiki Ohshima on keyboard support in their GP blocks language arose from wanting to support blind users.)

I think that this design principle transfers into the diversity argument. Narrow stereotypes for professions harm everyone, not just those who are far outside it. There are already programmers who aren’t quiet, who aren’t socially awkward, who don’t like computer games, who play sports and rock-climb, who don’t chug coffee and code all night, who are not single, who have families. Just because you’re a white male doesn’t mean you aren’t impacted by everything else that comes with a stereotype, that you can’t feel like something of an outsider when you meet few of the criteria associated with a stereotype. For example, it’s been suggested that for programmer types, World of Warcraft (or similar games) is the new golf: the social mechanism outside work where people meet and get ahead. Golf used to exclude women, World of Warcraft excludes non-gamers, or those who don’t have time to play because of family commitments. Exclusion can negatively affect any of us. There is often a shoulder-shrugging among programmers: yeah our diversity sucks, but we don’t really care whether more women enter the profession or not. But diversity and stereotypes can pinch for anyone. It’s worth trying to fix for the benefit of everyone.


Is Teaching Programming Not Enough?

The Atlantic has an interesting take on the programming/computer science/computational thinking debate, with several points for disagreement. I’m going to pull out various quotes from the article, and interleave them with some of my own thoughts.

Who Needs Java or JavaScript Anyway?

“Educators and technology professionals are voicing concerns about the singular focus on coding—for all students, whether learning coding is enough to build computational thinking and knowledge, and for students of color in particular, whether the emphasis on knowing Java and JavaScript only puts them on the bottom rung of the tech workforce.”

That last bit’s a jaw-dropper. A mastery of two of the most popular programming languages in the world would surely not leave you on the bottom rung of the whole tech workforce! I think this quote is a misinformed summary of some of the later comments, and thus a very unfortunate quote to repeat in the byline. Research like that from Sweller’s talk suggests generic cognitive skills like computational thinking cannot be directly taught. Even if they can be instilled, through transferring learning from another domain, programming may well be the best place to transfer from.

The Self-Programming Computer

“The artificial-intelligence system will build the app… Coding might then be nearly obsolete, but computational thinking isn’t going away.”

I can foresee that increasingly powerful programming tools, higher-level languages and code-reuse (e.g. micro services) could reduce the number of programmers needed, but the idea that AI will write the code seems incorrect. I don’t think any efforts to take the programming out of creating computer programs have ever really succeeded. It always ends up with programming again, just at a different level of abstraction.

The Relation to Mathematics

To avert the risk of technical ghettos, all students must have access to an expansive computer-science education with a quality math program… It’s a myth to think that students can simply learn to code and flourish without a minimum level of mathematical sophistication.

This is a perennial point of contention, as to whether mathematics knowledge is required for programming, and/or whether the skills in them correlate. The assumption that they are intrinsically related is a personal bugbear, and I’m not sure there’s strong evidence either way. Not that I’m arguing against the notion that all students should have access to a quality mathematical education!


This issue of privilege is interesting, though. Historically, many programmers are self-taught. This means that programmers are primarily those with access to computers, who are encouraged (or not discouraged) by their families, which means you get fewer women (who are less likely to be encouraged to try it) and fewer people from poorer backgrounds (who are more likely to need to get a part-time job, and thus have less time available for self-teaching). The UK’s gambit of teaching it to everybody has the potential to partially correct this artificial selection. However, it is a very touchy subject to suggest that self-taught programmers are automatically inferior, which is what one interviewee implies:

Further, [Bobb] recommends combining the endless array of out-of-school programs for coding and hackathons with learning opportunities in schools. Brown echoes this point, adding that coding to the uninformed can take many forms, but “you simply cannot learn the analytical skills… without a formal and rigorous education.”

This chimes with the potential disagreement with Sweller’s talk. Is computing special (because you can learn from the computer’s feedback) so that self-teaching can work more effectively than in other disciplines? The legions of self-taught programmers in the workforce surely show that self-teaching must be possible. It may not be the most efficient method to teach yourself, but I think claiming that only formal education can teach the necessary analytical skills for programming is surely incorrect.
