Because people program on a computer, we can add data recording to the tool they are using to program, and thus automatically observe their programming behaviour. (Aside: this should also be possible with a word processor, to see how different students compose essays; surely someone must have looked at that?) From these observations we can hopefully learn how different students program and where they are getting stuck, and come up with ways to improve their learning experience.
There have been many previous small, local studies that recorded data from students at a single institution. In a paper published last week at the SIGCSE 2014 conference, we detailed our “Blackbox” project, in which we are collecting data from hundreds of thousands of programming students worldwide, and sharing the data with other programming education researchers. This kind of data is not a panacea for programming education, but we hope that it provides an interesting, large new dataset with which to study some aspects of beginner behaviour. I’ve written more about this project in a previous post, and hopefully the paper provides a more thorough (but still readable) summary of the project so far, along with a few example mini-analyses.
At the end of the SIGCSE conference, we held a workshop to explore how to get started with analysing this kind of data. Here are some relevant points that arose:
- We looked at some traces of programmer behaviour in the workshop. One participant shared their observation: “This student was not having a great time.” Browsing some of these traces is like watching a classic horror film, as you watch frustrated while the protagonist makes all the wrong decisions, heading into all the wrong situations. They’re so close to right… and then a bad error message sends them down the wrong track, and they delete their code and try writing it again. You want to reach back through time, and across the Internet, to give them a hug. And point out how to fix their problem. Looking at these traces shows how much headroom there is to improve students’ learning experience.
- Analysing this kind of data is difficult. If you want to analyse at large scale, you must be able to do it automatically, with a program. Translating a relatively high-level, abstract research question into a concrete analysis program is hard, and the result is full of non-obvious caveats and assumptions.
- We are sharing this data with other researchers, and I am confident that this is a good decision. Research surely progresses much faster when teams of researchers communicate with each other and build on each other’s work. If you want to advance the methodology in a domain, you want researchers to have easy access to data. Where feasible, data sharing seems to make as much sense as code sharing. We are now launching a small online community to allow researchers using the Blackbox data to coordinate and share tools and ideas with each other.
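To make the point about caveats and assumptions a little more concrete, here is a minimal Python sketch of what happens when a question like "how long do students stay stuck on a compile error?" is turned into a program. Everything in it is hypothetical and for illustration only: the `CompileEvent` record, its fields, the `time_to_fix` function and the `max_gap` threshold are my own assumptions, not the actual Blackbox data format or an analysis from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CompileEvent:
    """Hypothetical record of one compilation; NOT the real Blackbox schema."""
    session_id: int
    timestamp: float  # seconds since some arbitrary origin
    success: bool


def time_to_fix(events: Iterable[CompileEvent], max_gap: float = 600.0) -> List[float]:
    """Seconds from the first failed compile of an episode to the next success.

    Assumptions baked into this "simple" measure:
    - an episode starts at the first failure after a success or session start;
    - episodes never span two sessions;
    - a pause longer than max_gap means the student walked away, so the
      episode in progress is discarded rather than counted as "stuck";
    - episodes that never reach a successful compile are silently dropped.
    """
    durations: List[float] = []
    ordered = sorted(events, key=lambda e: (e.session_id, e.timestamp))
    episode_start: Optional[float] = None
    previous: Optional[CompileEvent] = None
    for event in ordered:
        new_session = previous is None or event.session_id != previous.session_id
        long_pause = (
            previous is not None
            and not new_session
            and event.timestamp - previous.timestamp > max_gap
        )
        if new_session or long_pause:
            episode_start = None  # abandon any episode in progress
        if event.success:
            if episode_start is not None:
                durations.append(event.timestamp - episode_start)
            episode_start = None
        elif episode_start is None:
            episode_start = event.timestamp
        previous = event
    return durations


if __name__ == "__main__":
    trace = [
        CompileEvent(1, 0.0, False),
        CompileEvent(1, 30.0, False),
        CompileEvent(1, 90.0, True),
        CompileEvent(1, 100.0, False),
        CompileEvent(1, 1000.0, True),  # after a long pause: episode discarded
        CompileEvent(2, 5.0, False),    # never fixed: episode dropped
    ]
    print(time_to_fix(trace))
```

Each of the four assumptions in the docstring is a judgment call, and changing any one of them (for example, counting abandoned episodes, or picking a different `max_gap`) would change the numbers that come out. That is the kind of decision that is easy to miss when going from research question to code.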
I’ll hopefully post a bit more about some simple trends and results in future, time permitting. For now, the paper contains most of what little we have done with the data thus far.