Blackbox: Observing Student Behaviour

What do beginners do when they are learning to program? Where do they get stuck, and how can we improve their learning?

Because people program via a computer, we have the ability to add data recording capabilities to the tool they are using to program, and thus automatically observe their programming behaviour. (Aside: this should also be possible with a word processor to see how different students compose essays — surely someone must have looked at that?) From these observations we can hopefully learn how different students program and where they get stuck, and come up with ways to improve their learning experience.

There have been many previous small local studies that have recorded data from students at a single institution. In a paper published last week at the SIGCSE 2014 conference, we detailed our “Blackbox” project, in which we are collecting data from hundreds of thousands of programming students worldwide, and sharing the data with other programming education researchers. This kind of data is not a panacea for programming education, but we hope that it provides an interesting new, large dataset with which to study some aspects of beginner behaviour. I’ve written more about this project in a previous post, and hopefully the paper provides a more thorough (but still readable) summary of the project so far, along with a few example mini-analyses.

At the end of the SIGCSE conference, we held a workshop to explore how to get started with analysing this kind of data. Here are some relevant points that arose:

  • We looked at some traces of programmer behaviour in the workshop. One participant shared their observation: “This student was not having a great time.” Browsing some of these traces is like watching a classic horror film, as you watch, frustrated, while the protagonist makes all the wrong decisions and heads into all the wrong situations. They’re so close to right… and then a bad error message sends them down the wrong track, and they delete their code and try writing it again. You want to reach back through time, and across the Internet, to give them a hug. And point out how to fix their problem. Looking at these traces shows how much headroom there is to improve students’ learning experience.
  • Analysing this kind of data is difficult. If you want to analyse at large scale, you must be able to do it automatically, with a program. Translating your relatively high-level, abstract research question into a concrete analysis program is hard, and full of non-obvious caveats and assumptions.
  • We are sharing this data with other researchers, and I am confident that this is a good decision. Research surely progresses much faster when teams of researchers communicate with each other and build on each other’s work. If you want to advance the methodology in a domain, you want researchers to have easy access to data. Where feasible, data sharing seems to make as much sense as code sharing. We are now launching a small online community to allow researchers using the Blackbox data to coordinate and share tools and ideas with each other.

I’ll hopefully post a bit more about some simple trends and results in future, time permitting. For now, the paper contains most of what little we have done with the data thus far.

Seven Short Book Reviews

In the middle of last year I started reading non-fiction books quite regularly. Most of the books relevant to computing education have already been covered in previous posts: Teacher Proof by Tom Bennett, Why Don’t Students Like School by Dan Willingham, Mindstorms by Seymour Papert, Visible Learning by John Hattie, and Peer Instruction by Eric Mazur. (Are there many good non-fiction computing books? Seems like the economists and psychologists have managed to produce best-sellers, but not computer scientists.)

Instead, this post features mini-reviews of seven books not relevant to computing education that I’ve read in the past year:

The Signal and the Noise by Nate Silver. At a previous job, we used to play poker at lunchtimes. Once, an argument broke out over whether you judge a decision (raise/fold) on the information you had available at the time, or on the eventual outcome of the hand. Were you right to fold against a high bet when only holding a pair of eights because it seemed unlikely you would win, or wrong because you later would have won when the third eight appeared? I see Silver’s book, which features a chunk on poker, as a long explanation of why the decision must be judged against the available evidence, not the outcome that unfolded later. Most practical outcomes have an element of uncertainty, and Silver argues that people are bad at recognising and accounting for the uncertainty. We may want to predict the election or the weather, or some other outcome of interest. Realistically, we can only offer a probabilistic prediction: 80% chance of rain, 75% chance of Obama’s victory. If it later turns out there was no rain, that does not mean the prediction was flawed. Moreover, any certain prediction — it will definitely rain today, Obama will win the election — is flawed or useless, even if it appears better and more useful than the hazy probabilistic prediction.

The Gamble by John Sides and Lynn Vavreck. The Gamble concerns the determining factors behind the 2012 US election result. It is chock full of data, and offers a relatively unexciting explanation for the election result: there were no game-changing gaffes, no big swings, no canny campaigning. The election was decided by fundamentals (e.g. the economy) and two juggernaut campaigns that cancelled each other out. I find The Gamble an interesting book in the wider scheme of things. I must admit that I would have preferred to read a dramatic blow-by-blow account of the debates and gaffes (binders full of women! that 47% remark!) and so on, but The Gamble seems to be the correct answer: a slow and steady data-based account of the election that gets to the heart of the matter. I guess evidence-based rationalism means that you have to live without unnecessary excitement.

Liar’s Poker and The Big Short, both by Michael Lewis. These are excellent as a pair. Liar’s Poker was written in the ’80s by Lewis after he quit his bond trading job and wrote a tell-all book about Wall Street. It explains how, in the ’80s, mortgages went from simple affairs between the building society (or equivalent) and the borrower, to being traded as bonds on Wall Street. It shows that a lot of traders were immature and unscrupulous, often trading on their customers’ ignorance to make fortunes. Of course, the repackaging and misrating of these mortgage-backed bonds was a trigger for the recent economic crisis, so it was no surprise that Lewis returned to the same area to write The Big Short, which focuses on the oddball few who correctly predicted the meltdown ahead of time and found clever ways to make money off it (more on this in a moment). Several of them were driven close to a personal meltdown, because they could not reconcile their evidence-based prediction that the market would crash heavily with the behaviour and beliefs of all the other traders in the industry, who seemed to be happily sailing their ships over the cliff. It gives you some insight into why, when everyone else is wrong, it is hard to be right.

This pair of books also leads to an interesting observation. In Liar’s Poker, Lewis did not portray brokers well. They mostly appear in the book as a bunch of conniving (although admittedly canny), greedy, immature louts. The book was a fairly damning exposé, and Lewis’ ex-CEO would later claim that it “destroyed his career”. So there Lewis is, 20 years on, trying to write a second book about Wall Street. Who would dare talk to him, knowing all this? NPR asked him in an interview, as described by Tim O’Reilly:

You have to understand, Lewis replied (more or less), that many of those people got into the financial industry after reading [Liar's Poker]. Their big takeaway was how easy it was to make a lot of money without regard to the niceties of creating much value. He finished with the memorable line, “You never know what book you wrote until you know what book people read.”

Bad Pharma by Ben Goldacre. Bad Pharma is an examination of how medical research and development is set up wrongly, providing opportunities and incentives for companies to act against patients’ interests, often with collusion from regulators and doctors. Some of the bad-regulation aspects chime with The Big Short, where regulators seem beholden to the companies they are intended to regulate. It’s an important book and one worth reading, both as a general read and if you are involved in (any kind of) research. The practices to do with authorship on papers and ghost-writing, and the responses by some journal editors, are quite eye-opening.

Marvel Comics: The Untold Story by Sean Howe, and Brick by Brick by David Robertson. Two similar books, one about Marvel Comics and one about Lego. My slight frustration with both books is that I was somewhat hoping for an indulgent read all about the evolution of Technic and Spider-Man, but instead both books focus heavily on the political and business sides of Marvel and Lego, with surprisingly little detail on the actual products of the companies. You read all about the politics that forced a change in the writer of the X-Men, but not much about the plotlines, or the evolution of the art and colouring. You read about the background to the changing CEOs at Lego, but get little description of the differing sets and bricks beyond their general sales figures. Clearly, I was still picking too much from the serious end of things!

You do come away with a sense of how Marvel shot themselves in the foot by driving short-term sales (limited edition covers, cross-overs) over quality or long-term loyalty — and they continued to do this even though they knew it hurt long-term sales. Also interesting is how both companies nearly drove themselves to the point of irrelevance and bankruptcy: Lego were saved by refocusing and getting smarter about their business processes; Marvel were largely saved by their recent adventures into blockbuster films. The Marvel book also has a fairly dark undertone: steadily throughout the book, lots of employees seem to die early, to the point where it no longer seems to be confirmation bias. One illustrator dies at the drawing board of a heart attack, aged just 32. It’s a little like seeing a documentary about the sweatshops where your clothes are made: a hidden human cost to your fun. I suspect there would be similar tales from a book about the computer game industry.

That’s the lot for now! I’ve got a few more books left on my reading pile, but recommendations for others (relevant to computing education or not) are welcome.

Complete Our Research Questionnaire

We are currently running a research questionnaire and we are looking for responses from those who teach programming. You are eligible to take part if you have taught introductory programming at some stage, and you are familiar with Java. Note: you do not have to have taught programming in Java itself. The questionnaire should take around 15 minutes and can be found here:

http://fluidsurveys.com/s/educator/

This is an electronic version of a paper questionnaire that we handed out at ICER 2013. If you completed the questionnaire at ICER, then thanks — there’s no need to fill it in a second time. We’ll keep the questionnaire open until the end of January.

Podcast on Computing in Schools

Last summer I met Freek Leemhuis, a software developer from the Netherlands. I ended up recording a podcast chat with him about the state of computing in UK schools, and a few other computing education topics besides. The podcast has just recently been edited and released — you can listen to it here:

Info: http://devnology.nl/en/podcast/10-content/249-devnology-podcast-043-neil-brown
MP3: http://traffic.libsyn.com/devnology/Devnology_Podcast_043-Neil_Brown.mp3

It’s about 45 mins long, and I think it turned out to have reasonably interesting content.

Blackbox at SIGCSE

This is a brief announcement, for those who will, or may, attend the SIGCSE (Special Interest Group on Computer Science Education) conference in Atlanta in March next year. The programme has now been published and registration is open. I will attend and present two items about our Blackbox data collection project. Firstly, we have a paper accepted, which I’ll be presenting on Thursday afternoon. (I’ll blog again about this nearer the time.)

Secondly, we will be running a workshop on Saturday afternoon showing how to get started with analysing data from the Blackbox project. This should be of interest to those who already have access to the data but haven’t yet gotten started with their analysis, or those who want a more detailed look at what the project has to offer. If you want to attend, make sure you don’t fly back too early on the Saturday: the workshop runs 3–6pm. You can sign up during registration, or on site. (Technically, the programme is still subject to change, but I presume this wouldn’t involve moving the workshop.)

Tracking down a parser problem

In our Blackbox project, we are collecting a lot of data from programming novices as they learn, including the Java source code they are writing. We now have a reasonable amount of data: 5 million successful compilations (and 5 million unsuccessful). I’ve been doing some simple analysis on the data we have collected; in this post I will describe a bug in the analysis that took me a little while to track down.

The analysis is very straightforward. I pull all of the successfully-compiled files out of a disk cache and then process each one (by parsing it and doing my analysis on the resulting syntax tree), then finally pull all of the results together to produce a summary. The analysis runs on a multi-core machine so I run several worker threads in parallel, each running one task at a time until the work queue is empty. I encountered a problem where sometimes the analysis was very slow to finish (or didn’t finish at all). Tracking down a problem that occurs at the end of your hours-long analysis is definitely a frustrating and time-consuming business.
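
In outline, the pipeline looks something like the sketch below (simplified, with illustrative names rather than the real analysis code): a queue of source files and a fixed number of worker threads, each pulling one task at a time until the queue is empty.

import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM_)

-- Sketch of the worker pool: an STM queue of source files and n
-- worker threads, each taking one task at a time until the queue
-- is empty.
runWorkers :: Int -> (FilePath -> IO ()) -> [FilePath] -> IO ()
runWorkers n task files = do
  queue <- newTQueueIO
  mapM_ (atomically . writeTQueue queue) files
  finished <- newTVarIO 0
  let worker = do
        next <- atomically (tryReadTQueue queue)
        case next of
          Just f  -> task f >> worker
          Nothing -> atomically (modifyTVar' finished (+ 1))
  replicateM_ n (forkIO worker)
  -- Block until all n workers have drained the queue and exited.
  atomically (readTVar finished >>= \d -> check (d == n))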

Early on I looked at time spent in garbage collection, and the possibility of a livelock in my worker framework. But eventually I realised that the problem did not lie in the framework or environment; it lay in the processing tasks themselves. Some of my processing tasks were simply taking an incredibly long time to finish. This sounds like an obvious cause, but it was quite surprising. The task was very small: it simply parsed a source file and scanned the resulting tree. There should be no possibility of an infinite loop in there, given that the input is a finite source file. Why would most of these tasks finish near-instantly, while a small handful took near-forever?

Once I realised the problem was in the processing task, I also realised that it was certain inputs that were tying up the worker threads. With enough bad inputs, each worker would eventually be occupied trying to process a difficult input, until my program pretty much ground to a halt. Looking at the difficult inputs gave a clue to the problem. Here is the gist of one problematic input, which was implementing something like a game of twenty questions:

boolean q1 = nextAnswer();
if (q1) {
    return "...";
}
if (!q1) {
    boolean q2 = nextAnswer();
    if (q2) {
        return "...";
    }
    if (!q2) {
        boolean q3 = nextAnswer();
        if (q3) {
            return "...";
        }
        if (!q3) {
            ... // Lots more nested ifs for q4 onwards
        }
    }
}

And so on, with all the if statements nested over twenty levels deep. (Like I said: the data comes from programming novices.) So clearly the problem resided with if statements. Now, parsing if statements in many C-like languages has a problem known as the dangling-else problem — if you have this:

if (a)
if (b)
blah();
else
bleh();

Does that else attach to the first if or the second? The usual answer is that it attaches to the innermost if. However, if you use a classic declarative BNF approach to writing the grammar, you will write an ambiguous grammar. So C and Java rewrite their grammar to be unambiguous. Here’s Java’s rewrite, from the Java Language Specification:

StatementNoShortIf:
    StatementWithoutTrailingSubstatement
    LabeledStatementNoShortIf
    IfThenElseStatementNoShortIf
    WhileStatementNoShortIf
    ForStatementNoShortIf

IfThenStatement:
    if ( Expression ) Statement

IfThenElseStatement:
    if ( Expression ) StatementNoShortIf else Statement

IfThenElseStatementNoShortIf:
    if ( Expression ) StatementNoShortIf else StatementNoShortIf

This rewrite essentially ensures that a StatementNoShortIf will never end in a (non-brace-enclosed) if-without-else, thus removing the ambiguity. And it turns out that the Java parsing library I was using follows this grammar exactly. Effectively, they had copied the Java Language Specification directly into a parsing library, which doesn’t sound unreasonable as a way to ensure that you parse all of Java correctly. Here’s the relevant snippet of code that uses the Parsec parsing combinator library (comments added by me):

ifStmt = do
 tok KW_If -- Consume if keyword
 e <- parens exp -- then parenthesised expression
 -- Then try the following two alternatives in turn.
 -- First, statement with else:
 (try $ do th <- stmtNSI -- Stmt with no short if
           tok KW_Else -- Consume else keyword
           el <- stmt -- Parse another statement
           return $ IfThenElse e th el) <|>
   -- Second alternative: if-without-else:
   (do th <- stmt 
       return $ IfThen e th)

The practical consequence of this code is that every if-without-else has two parse attempts: one that looks for an else afterwards (but must parse the body of the if before it finds out there isn’t one) and one that re-parses the same body, but doesn’t look for an else. By itself, this double parse is tolerable — sure, it takes time to parse the body twice, especially if it’s large, but computers are fast enough that it’s not a massive issue.

That is, until you nest multiple if statements. Our novice programmer nested 24 deep. So we first attempt to parse the outermost if looking for an else. But we also make the same attempt with each of the other 23 nested ifs. The innermost one will fail (our novice isn’t using “else”s), and re-parse. Then the second innermost one will fail and re-parse. Suddenly our parsing will make something like 2^24 attempts to parse the innermost if body. It was this exponential blowup that caused our parsing tasks to run for so very long. One small fix later to avoid parsing the body twice, and the parsing was running in reasonable time on the problematic inputs — effectively going from O(2^N) to O(N) complexity. Sanity restored. (You can now argue about the superiority of LL, LR parsers and so on, if you’re so inclined.)
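
The gist of the fix is to parse the body once and then check for an optional else. As a sketch (reusing the tok/parens/exp/stmt helpers and the IfThen/IfThenElse constructors from the snippet above; this shows the idea, not the library’s actual patch):

ifStmt = do
  tok KW_If                -- Consume if keyword
  e <- parens exp          -- then parenthesised expression
  th <- stmt               -- Parse the body exactly once
  -- Optionally consume an else branch. Each nested ifStmt runs
  -- this same check first, so an else binds to the innermost if,
  -- matching the usual resolution of the ambiguity.
  mEl <- optionMaybe (tok KW_Else >> stmt)
  return $ case mEl of
    Just el -> IfThenElse e th el
    Nothing -> IfThen e th

This assumes tok fails without consuming input when the next token is not else (true of a single-token primitive in Parsec), so optionMaybe needs no try. Note also that the StatementNoShortIf machinery becomes unnecessary: greedily checking for the else encodes the same disambiguation directly in the recursive descent.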

Conclusion

These sorts of articles are usually meant to end with a lesson learned. Take your pick: don’t blindly copy BNF into a parsing library, beware of potential problems in the libraries you are using, have analytics on how long your processing tasks are taking and flag up/kill off outliers, and finally be better at narrowing down the problem than I was (a few print statements at the right points pinpointed the problem, but I went down a few blind alleys first). I am certainly learning that when you have a large dataset like ours, the odd corner cases in your code will be found out. This is both a blessing and a curse: you have to write better code, but at least you have a massive source of input data to test your algorithm on.
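
On the flag-up/kill-off point, GHC’s System.Timeout makes it cheap to give each task a time budget. A minimal sketch (the budget value is arbitrary):

import System.Timeout (timeout)

-- Run one analysis task with a time budget, in seconds.
-- A Nothing result means the task was cut off, flagging the
-- input as an outlier worth inspecting by hand.
runWithBudget :: Int -> IO a -> IO (Maybe a)
runWithBudget seconds task = timeout (seconds * 1000000) task

Had each parsing task been wrapped like this, the pathological inputs would have been flagged at the first timeout rather than stalling the whole run.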

Interface Design

All programs exist within a larger environment, and typically have an interface to interact with that environment. That may be a hardware interface for doing low-level I/O, a “standard library” in languages like C or Java, or a set of built-in commands in an educational environment like Logo, Scratch or Greenfoot. Broadly, there are two different approaches to take when designing an interface: I’ll call them minimalism and convenience. I’ll describe each in turn, with examples.

One way to design an interface is for minimalism: to find the smallest set of orthogonal commands that still lets you do everything the environment supports. For example, in many environments (e.g. POSIX/C), the file-access interface consists of: open, close, seek, tell, read and write. This allows you to accomplish anything you want with files, and it’s hard to provide a simpler interface that can still accomplish all possible tasks. As a non-programming example, some calculators have a power operation (x^y) but no square root button, as the former subsumes the latter (x^0.5 = √x) — and they do not offer a conversion between radians and degrees, just multiplication, division and a button for π.

The other way to design an interface is for convenience. The classic file interface is nice and orthogonal, but the most common way to write a file is simply to overwrite its contents with a string. Some languages, like Haskell, have a function that does exactly that — you give a file path and a string, and it writes the string to the file (effectively doing open-for-overwrite, seek-to-begin, write, close). It overlaps with the classic operations, which are still provided, but it makes it easier to get done what you want. As another example, Excel provides an AVERAGE function for getting the mean of a range, even though it’s trivial to do given the SUM and COUNT functions.
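
For example, in Haskell the two styles look like this (a sketch with a made-up file name; writeFile is in the Prelude, and the handle-based operations live in System.IO):

import System.IO

-- Convenience: one call does open-for-overwrite, write, close.
saveGreeting :: IO ()
saveGreeting = writeFile "greeting.txt" "hello"

-- The same task spelled out via the minimal, orthogonal interface:
saveGreeting' :: IO ()
saveGreeting' = do
  h <- openFile "greeting.txt" WriteMode
  hPutStr h "hello"
  hClose h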

These two approaches are opposite ends of a spectrum, and each interface designer must decide how far to veer towards one or the other. Some orthogonal interfaces can make it very hard to implement common cases (e.g. Java’s file API), while some convenience interfaces pack in so many similar methods and synonyms (e.g. Ruby’s array API) that they become a tangled mess (especially if, in Ruby’s case, you want to duck-type something to act like an array). I’m particularly interested in these decisions in the context of educational environments, such as Greenfoot, where I have a hand in the design.

Education Example: Greenfoot

The original Greenfoot 1.* had these methods for movement: setLocation(x, y), getX() and getY(). It also had setRotation(r) and getRotation(), which rotated your image. This was design by minimalism; these methods are enough to implement any kind of movement you want: up/down/left/right Pacman-style movement, Asteroids-style drifting, directional movement, and so on.

So there you are on your first day of teaching. You load up Greenfoot and want to show some novices how to get your character running across the screen in the direction in which it is pointed. Here’s the code, if you just used Greenfoot’s original built-in movement methods:

setLocation((int)(getX() + 5 * Math.cos(Math.toRadians(getRotation()))),
            (int)(getY() + 5 * Math.sin(Math.toRadians(getRotation()))));

Ouch! Let’s see how many concepts are in that one line:

  • Method calls with no parameters or multiple parameters
  • Method calls with and without return values
  • The general concept of accessors and mutators (aka getters and setters) and their combination
  • Trigonometry, and converting between degrees and radians
  • Nested method calls
  • Casting between numeric types

That’s a lot to take in. Compare this to our move method, which we added more recently:

move(5);

Now we’re down to just one method call, with one parameter and no return. (We also debated building in a no-parameter version, move(), that advanced a default amount, but that seemed too little gain over the one-parameter version.)

In fact, since version 2.0 of Greenfoot, we’ve slowly added more methods in the convenience category: move, turn, turnTowards, isTouching/removeTouching and a few more. Looking back, I think it has mainly been me arguing to add these methods, based on making our beginners’ workshop smoother. Now, designing the software differently purely to make a single workshop easier is a bit bizarre, but my thought has always been that the workshop mirrors most users’ journey into Greenfoot. Generally, allowing people to easily do what they commonly ask for when they start with Greenfoot seems like a good idea.

Everyone wants collision detection, so in 2.3.0 we made it shorter and simpler. In the beginners’ workshop, everyone always asks how to add a score, and my answer has always been that unfortunately it’s not as easy as you’d think. So in Greenfoot 2.4.0, we’re adding new methods so that it can be as easy as you’d think. There’s a limit to how easy we want to make things: at some point users need to learn more complicated concepts to perform more complicated tasks. But I think the right time for that should be a bit later than it was in version 1.*.

With the new methods, a learner’s journey can be better structured. In the upcoming version, if you want a score, you can add it and display it easily. If you want to eat things you run into, that’s also easier. Then, if you want to be able to shoot bullets that destroy asteroids and increase the player’s score, you already have an easy way to do the score and the destroying, but you must use object references (the bullet holds a reference to the rocket that fired it) to add to the score at the appropriate time. But at least it’s a more manageable pathway.

(Early versions of Greenfoot tried to have it both ways, by providing a minimal core interface and then introducing convenience helper classes into beginners’ scenarios, so that the move() method would live in a per-scenario helper class. But ultimately this could cause confusion when beginners moved on and found these convenience methods were “missing” from their next scenario. So gradually, we have steered away from that approach and towards putting convenience methods into the core of Greenfoot.)

Education Example: Scratch

I find Scratch 1.* an interesting example because it had a couple of significant complications in its interface design. Scratch 1.* did not support user-defined methods (a restriction later remedied by Scratch 2.0 and BYOB/Snap). So users were not able to combine several simple commands into a larger command for re-use. This probably biased Scratch somewhat towards convenience, as the users could not add a convenience layer themselves. But at the same time, all blocks in the Scratch interface are accessible in a visual menu down the side of the screen. Thus it is harder to hide a large set of methods than in something like Greenfoot (where you could have a large list available through the documentation). So this might steer Scratch away from having too many blocks available.

You can run a critical eye over Scratch’s interface yourself, but I would say that Scratch mainly ended up with convenience methods. If you look at the list of motion blocks, technically it could be boiled down to a small number of orthogonal methods, but instead they have provided many similar variants. For example, turn anti-clockwise could have been left out (turn clockwise by a negative amount), “set x to” and “set y to” are just special cases of “go to x: y:”, and so on. However, having all these blocks means beginners may need to think less about motion: you just search for a block that does almost exactly what you want, rather than having to think how to combine several smaller operations. (On the other hand, you could argue that having all these blocks makes it more overwhelming for the beginner.)

I can’t finish discussing Scratch without pointing out my favourite convenience block: the “say” block. This block displays a speech bubble coming from a character, with the given text inside it. This single block enables entire classes of Scratch scenarios: stories, interactive text adventures and so on. I think it was probably a minor masterpiece of Scratch’s design.

Summary

Interfaces can be designed for minimalism (with fewer, orthogonal methods) or convenience (with more methods to easily handle the common cases). In general software design, there’s no single right answer. However, I wonder if educational software design has a different set of constraints that means convenience is more often the right answer, to allow learners to get satisfying results earlier on with less cognitive demand. Of course, the meat of programming comes in the combination of methods, but via interface design, the creators of systems like Scratch and Greenfoot can choose the point at which novices need to begin combining methods, and how much can be done with the built-in convenience methods.

(The inspiration for this blog post came from John Maloney, one of the Scratch designers, who shared his observation about these two different ways of approaching interface design.)
