When do students program?

We store enough information in our Blackbox data set to look at when most programming activity in BlueJ occurs. Most BlueJ users are students, so this should give us an idea of when most student programming occurs. Methodology notes below, but what you really want is the graph, so here it is, for the USA:

blackbox-times-us

It’s a heatmap: time of day on the X axis, days on the Y axis, red is highest amount of activity, down through orange to white being zero activity (e.g. overnight). A few thoughts:

  • Lunchtime is clearly visible in the data. Most programming takes place during the working day, partly because of scheduled classes. Despite stereotypes of night owl programmers, on average, people don’t program late at night.
  • Not much programming on Mondays. One reason for this is that US federal public holidays mostly fall on Monday, which reduces the amount of activity (including in the evening), but I’m not sure if that completely explains it.
  • No-one programs Friday night or Saturday… but check out that Sunday night my-assignment-is-due panic! At least, I’m guessing that’s the explanation.

Methodology Notes

We could just look at time of day, but that loses a bit too much detail if you average across all days in the week. So it is better to look at times of day across the week, Monday to Sunday. Weekends are different across the world, so I’ve chosen to narrow by country. Rather than try and pick a list of all the Monday-to-Friday-workweek countries, I just looked at the USA, which is by far and away the country from which we get most data. We store enough information to know the user’s timezone, so I am adjusting properly for the multiple timezones and daylight savings.

The number for activity is a count of IDE events recorded in that hour. That is primarily source code edits, but also includes other things like debugger interactions. I think it’s a good proxy for programming activity, though.
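To make the bucketing concrete, here is a minimal sketch of how events could be counted per local day-of-week and hour. This is not the actual Blackbox analysis code: the Event class, its fields and the method name are hypothetical, and I'm assuming each event carries a UTC timestamp plus the user's timezone. The java.time library handles the timezone and daylight saving adjustment.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.List;

public class ActivityHeatmap {
    /** Hypothetical shape of one recorded IDE event: a UTC instant plus the user's timezone. */
    static final class Event {
        final Instant when;
        final ZoneId userZone;

        Event(Instant when, ZoneId userZone) {
            this.when = when;
            this.userZone = userZone;
        }
    }

    /** Returns a 7x24 grid of event counts: rows are Monday..Sunday, columns are local hour 0..23. */
    static int[][] countByDayAndHour(List<Event> events) {
        int[][] grid = new int[7][24];
        for (Event e : events) {
            // Convert to the user's local time; the zone rules cover daylight saving.
            ZonedDateTime local = e.when.atZone(e.userZone);
            int day = local.getDayOfWeek().getValue() - 1; // MONDAY is 1, so it becomes row 0
            grid[day][local.getHour()]++;
        }
        return grid;
    }

    public static void main(String[] args) {
        // Both instants are early Monday morning in UTC, but Sunday evening locally.
        List<Event> events = List.of(
            new Event(Instant.parse("2016-03-07T02:30:00Z"), ZoneId.of("America/New_York")),
            new Event(Instant.parse("2016-03-07T02:45:00Z"), ZoneId.of("America/Los_Angeles")));
        int[][] grid = countByDayAndHour(events);
        System.out.println("Sunday 21:00 (New York event): " + grid[6][21]);
        System.out.println("Sunday 18:00 (Los Angeles event): " + grid[6][18]);
    }
}
```

The point of the example is that the same moment in UTC lands in different cells of the heatmap depending on where the user is, which is why the adjustment matters for a country spanning several timezones.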

Other Countries

Two of the next most frequent countries in the data are Germany and the UK. Here’s Germany:

blackbox-times-de

and here’s the UK:

blackbox-times-uk

These frequencies are much lower (note the scale adjustments) so the data is noisier, and I suspect the frequencies are low enough that some patterns in the data are caused by individual institutions using BlueJ at particular times (remember that the students in our data are not totally independent: 100 UK students all programming at the same time at a university on Friday morning could noticeably affect the data). Germany has an odd pattern: more programming on Sunday than almost any other day of the week. This might be because there’s not much to do in Germany on a Sunday, but it also hints that maybe more work is done outside classes. Feel free to add your own speculation for any of the patterns above.

Novice Lambda Use in Java

Java provides two ways to easily provide a reference to a function:

button.setOnAction(e -> showAlert("pressed"));
stream2 = stream.map(Object::toString);

The first one is generally referred to as a lambda, and the second as a method reference, but I’m going to refer to them both as lambdas for this post.

In line with the previous post on enum use, this post looks at lambda use in our Blackbox data set (collected from users of the BlueJ beginners’ Java IDE). All data is from the beginning of the project in mid-2013 up until the end of February 2016, a few days ago.

Lambdas are a very recent addition to Java. Java stems from 1996, but lambdas were only introduced in Java 8 in March 2014. Thus it is not surprising that they will not occur very frequently in the data set, as instructors are likely not up to speed on lambdas yet, won’t have had much chance to adjust their course designs, and often treat lambdas as an advanced topic, in contrast to BlueJ’s novice focus. But let’s take a look anyway.

In the data we have 11,666,331 source files which have been successfully compiled at least once. I looked at the most recent successful compilation of each of those source files, to see if they contained a lambda. 6,669 (0.06%) of those source files contained a lambda, with 20,698 lambdas overall. So although lambdas are very rare, the average number used once they are used is about three per file. That suggests to me that people find them quite useful, but that they have not gained any traction in education yet.

Number of parameters

For the lambda arrow syntax (i.e. params -> code), I counted the number of parameters on the left-hand side of the arrow:

lambda-params

The reason that 3 and 4 are on there is that there is one instance of each, from 20k lambdas, with no lambdas having more than four parameters. So if the language designers had restricted lambdas to two parameters max, our users would not have noticed.

Haskell programmers, among others, may be a bit confused by the zero-parameter lambda. This is Java’s equivalent of the use of the fun () -> code pattern in OCaml/F# to refer to some code but delay its evaluation until it is called later without any real parameters. Haskell would use a monadic computation of a type like “IO ()” for this purpose.
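A small sketch of that delayed evaluation in Java, using the standard Supplier interface. The class and method names here (DelayedEvaluation, expensive, valueOrDefault) are made up for illustration:

```java
import java.util.function.Supplier;

public class DelayedEvaluation {
    static int evaluations = 0;

    /** Stands in for some costly computation; counts how often it actually runs. */
    static String expensive() {
        evaluations++;
        return "computed";
    }

    /** Only invokes the zero-parameter lambda if the value is actually needed. */
    static String valueOrDefault(boolean needed, Supplier<String> delayed) {
        return needed ? delayed.get() : "skipped";
    }

    public static void main(String[] args) {
        // The lambda body is not run here, because the supplier is never called.
        System.out.println(valueOrDefault(false, () -> expensive()));
        // Here the supplier is called, so expensive() runs once.
        System.out.println(valueOrDefault(true, () -> expensive()));
        System.out.println("expensive() ran " + evaluations + " time(s)");
    }
}
```

Passing expensive() directly as an argument would evaluate it at the call site both times; wrapping it in () -> ... hands over the code itself and leaves the decision to the callee.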

Styles of use

I categorise lambda style into three mutually exclusive categories:

list.forEach(x -> print(x.toString())); // Expression RHS
list.forEach(x -> {print(x.toString());...}); // Block RHS
list.stream().map(Object::toString)... // Method reference

Of our 20,698 lambdas, 14,014 (67.7%) used the expression form, 6,161 (29.8%) used the block form, and 523 (2.5%) used the method reference form. I wonder if a bit of this is advertising: I wasn’t aware of the method reference form in Java for a little while after I learnt about lambdas in the language. I use method references whenever possible in my code because I’m used to point-free style in Haskell. However, I suspect it’s more difficult for novices to understand than the lambda form, so I’m not surprised it’s used much less than the arrow form.
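For anyone who hasn't met all three forms, here is a minimal runnable sketch showing that they are interchangeable for a simple case. The class and method names are just for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

public class LambdaStyles {
    /** Expression RHS: the right-hand side is a single expression. */
    static List<String> expressionForm(List<Integer> xs) {
        return xs.stream().map(x -> x.toString()).collect(Collectors.toList());
    }

    /** Block RHS: the right-hand side is a braced block with an explicit return. */
    static List<String> blockForm(List<Integer> xs) {
        return xs.stream().map(x -> {
            String s = x.toString();
            return s;
        }).collect(Collectors.toList());
    }

    /** Method reference: no parameter is named at all. */
    static List<String> methodReferenceForm(List<Integer> xs) {
        return xs.stream().map(Object::toString).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3);
        System.out.println(expressionForm(xs));
        System.out.println(blockForm(xs));
        System.out.println(methodReferenceForm(xs));
    }
}
```

All three calls print the same list. The method reference version is the shortest, but it is also the one where the reader has to work out for themselves what is being passed to what.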

Destinations of Lambdas

I’m doing a syntactic analysis here, so I didn’t attempt the complex semantic task of working out which types the lambdas were being compiled into. However, I did look at which methods the lambdas were being passed to. That is, if you have code like:

myList.map(e -> e.getX())

Then I recorded “map” as the lambda destination. The graph below gives the most popular destination methods for lambdas. The y-axis is purely cosmetic to separate out items with similar frequencies, and I only show frequency 200 upwards because it gets congested below that:

lambda-dest

Since there are low numbers of lambdas, some of this is definitely influenced by individual courses appearing in the data. For example, I tracked the addButton method back to a utility class from a particular university, and I think accumulate has a similar story. But the general pattern is pretty apparent: lambdas are used much more for GUI event handlers than for streams. This is not surprising in one sense: existing code is more likely to use event handlers (which pre-date lambdas) than streams (which were introduced alongside lambdas), so GUIs may be the easiest way to add lambdas to existing courses. I’m still surprised at how large the difference appears to be, though. It will be interesting to check back in a year or two to see if this pattern continues to hold.
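To give a flavour of what a purely syntactic destination count looks like, here is a deliberately naive sketch. It is not the analysis code I actually used: it is a regular expression that only catches an arrow lambda appearing as the first argument of a call, it ignores method references, and it will be fooled by lambdas inside comments or string literals. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LambdaDestinations {
    // A method name, an opening bracket, then a lambda as the first argument:
    // either a bare parameter (x -> ...) or a bracketed list ((x, y) -> ... or () -> ...).
    private static final Pattern FIRST_ARG_LAMBDA = Pattern.compile(
        "(\\w+)\\s*\\(\\s*(?:\\w+|\\([^()]*\\))\\s*->");

    /** Counts, per method name, how many calls have an arrow lambda as their first argument. */
    static Map<String, Integer> countDestinations(String source) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = FIRST_ARG_LAMBDA.matcher(source);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String source =
            "myList.map(e -> e.getX());\n"
            + "button.setOnAction(e -> showAlert(\"pressed\"));\n"
            + "list.forEach(x -> print(x.toString()));\n";
        System.out.println(countDestinations(source));
    }
}
```

A real analysis would work on a parse tree rather than raw text, but the idea is the same: record the name of the method that the lambda is handed to, and tally the names.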

Novice Enum Use In Java

An enum in Java is a limited set of constant values. For example:

enum Direction { LEFT, RIGHT, UP, DOWN }

Each variable of type Direction can take on only one of those four values. (Or null, alas.)

I recently got the chance to talk to Andy Stefik, who does really interesting work on evidence-based language design in Quorum (check out this talk of his for more info). He is interested in adding enums to Quorum, but lamented that there is little data about enum use in other languages. He knew of our Blackbox data set (collected from users of the BlueJ beginners’ Java IDE), and asked about enum use in Java in the educational wild. So: here it is. All data is from the beginning of the project in mid-2013 up until the end of February 2016, a few days ago.

Enum Use in Java

On its initial release in 1996, the Java programming language had no support for enums. They were added in Java 5 in 2004 (although of course adoption of new language versions is never instant). This matters when looking at enum use in Java, because although they have been in Java for 12 years, several software projects started before they existed, many course instructors may have trained before enum use was widespread, and so on. So we will probably see less enum use in Java than if they had been present from the outset.

In the data we have 11,666,331 source files which have been successfully compiled at least once. I looked at the most recent successful compilation of each of those source files, to see if they contained an enum. 20,333 (0.2%) of those source files contained an enum (either as top-level declaration, or an inner type), with 22,693 enums overall.

It’s hard to decide what value we would have expected there. Obviously not every type is going to be an enum. If we’d found, say, 10% of all types were enums then that would be weirdly high. I did a quick count on our own BlueJ/Greenfoot code base, which began life before Java 5. We have only used Java 5 features since 2008, so would not have used enums before then. Across 2,129 source files we have 70 enums in 61 files, so about 2.9% of our source files have an enum. (I tried searching GitHub to get some data, but a lot of the enum results seemed to be IDE test-cases. Interesting!)

If anything I would expect that enum use should be higher in teaching than professional code, because I expect that educators will deliberately show enums in order to teach the concept, even though they are not used with particularly high frequency in full programs. So I am surprised to see enums used quite so infrequently in Blackbox.

Size of enums

I made a prediction about enums to Andy: that if we plotted the frequency of number of enum values (e.g. four for my direction example at the top of the post) there would be a spike at seven because many examples would be very artificial and use day of the week. I think this is just about confirmed, although not as pronounced as I had expected:

enum-sizes

Looks like the most common number of items in an enum is 3 or 4. The data tails off after 10 as you would expect (but omitted from graph above).

Enum Features

Something else that is of interest about enums is how often sub-features of enums are used in Java. I showed above the use of a simple plain enum, but there are several other allowable features:

enum Color
{
    RED(255,0,0), // Constructor arguments
    GRAY(128,128,128)  // Individual body:
    {  
        public String toString() { return "gray/grey"; }
    },
    BLACK(0,0,0);

    // Group body:
    int red, green, blue;
    private Color(int red, int green, int blue)
    {
        this.red=red;this.green=green;this.blue=blue;
    }
}

So: how many of the enums in Blackbox use each of these features? Of the 22,693 enums, 4,597 (20.3%) have a group body, i.e. fields, methods or constructors as well as just the item list. 3,264 (14.4%) have items which use constructors with arguments. Just 190 (0.8%) have an individual body for any of the items.
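To see what each of those three features buys you, here is a small runnable sketch built around the same Color example. The wrapper class name and the main method are mine, added purely for illustration:

```java
public class EnumFeaturesDemo {
    enum Color {
        RED(255, 0, 0),                 // constructor arguments
        GRAY(128, 128, 128) {           // individual body, overriding a method for this item only
            @Override
            public String toString() { return "gray/grey"; }
        },
        BLACK(0, 0, 0);

        // Group body: fields and a constructor shared by every item.
        final int red, green, blue;

        Color(int red, int green, int blue) {
            this.red = red;
            this.green = green;
            this.blue = blue;
        }
    }

    public static void main(String[] args) {
        System.out.println(Color.RED);             // default toString gives the item's name
        System.out.println(Color.GRAY);            // the individual body's toString is used instead
        System.out.println(Color.GRAY.green);      // field set via the constructor arguments
        System.out.println(Color.values().length); // number of items in the enum
    }
}
```

The group body and constructor arguments tend to go together (the fields need to be set somehow), which fits with those two features having broadly similar frequencies in the data, while the individual body is a much more niche tool.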

Names of Enums

A final item of interest is the most popular names for enums, which gives hints as to what they are being used for. Here’s a graph of the top ones. The y-axis is purely cosmetic to separate out items with similar frequencies, and I only show frequency 900 upwards because it gets congested below that:

enum-popular

The left/right/up/down directions are the most frequent use, colours are another use, and omitted from this graph are monday/tuesday/etc on about 700. This isn’t relevant to enums specifically, but I was interested to see the grad/rad/deg/degmin/degminsec pattern. This is not some extra-common pattern you’ve never heard of, but I believe is instead an artifact of one of the MOOCs which have taken place using BlueJ. If one of their examples uses a particular enum, this gets multiplied up by the number of users, and thus shows up prominently in our data; this probably also explains “border_collie” and “pitbull” at around frequency 600. It’s something we need to be mindful of when analysing the Blackbox data: not all observations in the data set are as independent as one might think.

Diversity in Computer Science, Revisited

The closing keynote of SIGCSE 2016 was given by Karen Ashcraft on the subject of diversity in Computer Science. This is a perennial topic which I’ve seen discussed many times, but I felt this talk offered some convincing explanations and ways forward. Here’s my understanding of the talk; you may also be interested in Pamela Fox’s more detailed notes.

Ashcraft’s central argument was that professions take on an identity through a figurative practitioner. That is, software development has an identity which comes from the commonly imagined programmer: white, male, quiet, socially awkward, interested in games, comics and sci-fi, etc. This idea that professions have an identity is different from the idea that the workers have an identity that is transferred to the profession. She gave the example of pilots, where the identity of the profession was deliberately re-sculpted from “debonair daredevil adventurer” to “fatherly naval officer” in the 1930s, because passengers were much more likely to want to fly with the latter than the former. The identity of a profession is constructed and can be manipulated independently of who is actually a member of the profession.

Ashcraft had particular criticism for the idea of fixing diversity by “leaning in”: the idea that women should adjust themselves to fit into the male world. (I’m partial to the description of lean-in as “victim blaming”.) She argued that any attempt to solve diversity which emphasises the divide is bound to lose. That is, if you tell women “women can program too”, you’re emphasising the idea that women differ from men when it comes to programming, and are not going to be successful in eliminating the gender division in programming.

Ashcraft’s suggestion was that to increase diversity in a profession, you should not focus on messages about “more women”, but rather emphasise the diversity already present in the profession which is not automatically tied to attributes like gender. For example, let’s take my imagined programmer. Rather than worry about “programmers don’t have to be white males”, we should start with “programmers don’t have to be quiet, or socially awkward, or interested in games”. Building this type of diversity will eventually encompass non-whites and non-males, if we can keep the identity of a profession broad and fluid, rather than pigeon-holed into something very specific and narrow. And this could benefit everyone.

It’s a well-known phenomenon in ergonomics and user-interface design that if you adjust your design for the extreme users, you often improve the design for core users. For example, kitchen tools were given extra-grippy larger handles to help the elderly but it also made holding them easier for everyone else, so it became the default for all tools. A few more examples are given here. (Closer to home, the interesting work of John Maloney, Jens Moenig and Yoshiki Ohshima on keyboard support in their GP blocks language arose from wanting to support blind users.)

I think that this design principle transfers into the diversity argument. Narrow stereotypes for professions harm everyone, not just those who are far outside them. There are already programmers who aren’t quiet, who aren’t socially awkward, who don’t like computer games, who play sports and rock-climb, who don’t chug coffee and code all night, who are not single, who have families. Just because you’re a white male doesn’t mean you aren’t impacted by everything else that comes with a stereotype, that you can’t feel like something of an outsider when you meet few of the criteria associated with a stereotype. For example, it’s been suggested that for programmer types, World of Warcraft (or similar games) is the new golf: the social mechanism outside work where people meet and get ahead. Golf used to exclude women, World of Warcraft excludes non-gamers, or those who don’t have time to play because of family commitments. Exclusion can negatively affect any of us. There is often a shoulder-shrugging among programmers: yeah our diversity sucks, but we don’t really care whether more women enter the profession or not. But diversity and stereotypes can pinch for anyone. It’s worth trying to fix for the benefit of everyone.

Is Teaching Programming Not Enough?

The Atlantic has an interesting take on the programming/computer science/computational thinking debate, with several points for disagreement. I’m going to pull out various quotes from the article, and interleave them with some of my own thoughts.

Who Needs Java or JavaScript Anyway?

“Educators and technology professionals are voicing concerns about the singular focus on coding—for all students, whether learning coding is enough to build computational thinking and knowledge, and for students of color in particular, whether the emphasis on knowing Java and JavaScript only puts them on the bottom rung of the tech workforce.”

That last bit’s a jaw-dropper. A mastery of two of the most popular programming languages in the world would surely not leave you on the bottom rung of the whole tech workforce! I think this quote is a misinformed summary of some of the later comments, and thus a very unfortunate quote to repeat in the byline. Research like that from Sweller’s talk suggests generic cognitive skills like computational thinking cannot be directly taught. Even if they can be instilled, through transferring learning from another domain, programming may well be the best place to transfer from.

The Self-Programming Computer

“The artificial-intelligence system will build the app… Coding might then be nearly obsolete, but computational thinking isn’t going away.”

I can foresee that increasingly powerful programming tools, higher-level languages and code-reuse (e.g. micro services) could reduce the number of programmers needed, but the idea that AI will write the code seems incorrect. I don’t think any efforts to take the programming out of creating computer programs have ever really succeeded. It always ends up with programming again, just at a different level of abstraction.

The Relation to Mathematics

“To avert the risk of technical ghettos, all students must have access to an expansive computer-science education with a quality math program… It’s a myth to think that students can simply learn to code and flourish without a minimum level of mathematical sophistication.”

This is a perennial point of contention, as to whether mathematics knowledge is required for programming, and/or whether the skills in them correlate. The assumption that they are intrinsically related is a personal bugbear, and I’m not sure there’s strong evidence either way. Not that I’m arguing against the notion that all students should have access to a quality mathematical education!

Self-Teaching

This issue of privilege is interesting, though. Historically, many programmers are self-taught. This means that programmers are primarily those with access to computers, who are encouraged (or not discouraged) by their families, which means you get fewer women (who are less likely to be encouraged to try it) and fewer of those from poorer backgrounds (who are more likely to need to get a part-time job, and thus have less time available for self-teaching). The UK’s gambit of teaching it to everybody has the potential to partially correct this artificial selection. However, it is a very touchy subject to suggest that self-taught programmers are automatically inferior, which is what one interviewee implies:

“Further, [Bobb] recommends combining the endless array of out-of-school programs for coding and hackathons with learning opportunities in schools. Brown echoes this point, adding that coding to the uninformed can take many forms, but ‘you simply cannot learn the analytical skills… without a formal and rigorous education.’”

This chimes with the potential disagreement with Sweller’s talk. Is computing special (because you can learn from the computer’s feedback) so that self-teaching can work more effectively than in other disciplines? The legions of self-taught programmers in the workforce surely show that self-teaching must be possible. It may not be the most efficient method to teach yourself, but I think claiming that only formal education can teach the necessary analytical skills for programming is surely incorrect.

John Sweller on Cognitive Load Theory and Computer Science Education

The opening keynote at SIGCSE 2016 was given by John Sweller, who invented cognitive load theory and works on issues such as the relation between human memory and effective instructional design. The keynote was purportedly about “Cognitive Load Theory and Computer Science Education”. As one or two people pointed out, the “Computer Science Education” part was a bit lacking, which I’ll come back to. (I should also point out this is a bit hazy, as there was a full conference day before I had a chance to write this; happy to take corrections in the comments.)

Knowledge and Memory

Sweller started by discussing biologically primary versus biologically secondary knowledge. Biologically primary knowledge is that which we have evolved to acquire: how to walk, how to talk, how to recognise faces, etc. None of this is taught, it is all learnt automatically. In contrast, biologically secondary knowledge is the rest, which he argues must be explicitly taught. Algebra, history, science: his rough summary was that everything that is taught in school is biologically secondary.

I won’t go deeply into working memory and long-term memory here, but the relevant parts were that working memory is small and transient (i.e. disappears quickly), whereas long-term memory is almost unlimited and can be retained permanently. Novices can get overwhelmed because everything novel must live in working memory, whereas experts have knowledge stored in long-term memory and so use different processes to tackle the same tasks. One thing I did not know before was that we have two working memories: one auditory and one visual, which means you can potentially achieve benefits by presenting information in a mixed modality, providing they mesh well together.

Visualisations

One issue that came up in the questions was about algorithm visualisation. Algorithm visualisation is something that many people are convinced is useful for understanding programming, but which has rarely, if ever, been shown to be effective for learning. Sweller’s suggestion was that if comparing two states (before/after) is important and non-trivial then it is better to provide two static diagrams for careful comparison, rather than a video which animates the transition from one to the other. My take-away message from this is that visualisations need to be comics, not cartoons.

Experts Use Memory More Than Reasoning

Sweller made the point that experts tend to operate through pattern-matching. Although we may think of programming as carefully constructing logical code paths to fit a given problem, we are often just recognising a problem as similar to one we have solved before, and adjusting our template to fit. More expert programmers just know more patterns (and “patterns” here matches well to the idea of design patterns). The difficult part of programming is thus only when we stray outside our pattern catalogue. This was covered in some work in the 1980s which I’ve previously discussed here (and see also this post).

What May Make Programming Special

The issue of how knowledge is transmitted was contentious. Sweller is against constructivism: he believes the idea that knowledge is best gained through discovery is incorrect, and explicit instruction is always superior. This is where the Computer Science domain becomes important. I can see that for something like algebra, you must be taught the rules, and taught the processes by which to rearrange an equation. You can’t just mess around with equations and hope to learn how they work — because you have no feedback.

But the computer is an interesting beast. It provides an environment which gives you feedback. If you know the basics of how to write a program, you can potentially learn the semantics of a language solely by exploring and discovering. Can is not the same as should, though: explicit instruction may still be faster and more effective. But I think it went unacknowledged in the talk that programming is somewhat different to many other learning experiences, because you have an environment that offers precise, immediate, automatic feedback based on your actions, even if no other humans are involved.

Final Notes

There’s a bunch more content that I haven’t included here for space reasons or because I didn’t remember it clearly enough (didn’t reach long-term memory, I guess!). Terms to google if you want to know more: cognitive load theory, randomness as genesis principle, expertise reversal effect, worked-example effect, split-attention effect.

I liked Sweller’s talk, and I believe that understanding what he called human cognitive architecture is important for education. I think the main issue with the talk is that Sweller is a psychologist, not a computer scientist, and thus there was little consideration given to the potential ways in which computer science may be different. How does having syntax and semantics in code play with working memory for novices? What should explicit instruction look like in programming education? Do different notional machines cause different interactions with our memory? How can we best organise our teaching, and our design of programming languages and tools to minimise cognitive load? Some answers to that arise in Briana Morrison’s work (including the paper later in the day yesterday, not yet listed there), but there is clearly a lot more potential work that could be done in this area.

Book Review: Learner-Centered Design of Computing Education

Mark Guzdial works at Georgia Tech and writes the most prolific and most read blog in computer science education. Thus I was intrigued to read his new book, “Learner-Centered Design of Computing Education: Research on Computing for Everyone”.

The book is a short and accessible read that summarises huge chunks of research, especially the parts about teaching computing to everyone (i.e. non-specialists). One of Guzdial’s most well-known interests is media computation, which is used at Georgia Tech to teach non-majors (i.e. those taking non-CS courses) about programming, and it’s interesting to see how much positive impact it has had. But there’s no favouritism here: the book always has the research to back up its claims, and this is one of the great strengths of the book. The reference section has some 300 references, making it a great place to start getting to grips with research in the area. If I have any complaint with the book it is that it could do with a bit of copy-editing to remove some typos, but it’s not a serious issue.

Computational Thinking

The recent buzz area of computational thinking is covered in the book. One of the key ideas around computational thinking is that of teaching computing in order to teach general problem solving skills. Again backed by research, Guzdial dismisses this idea as unsupported: there never seems to be any transfer from programming into more general problem-solving skills. Programming’s utility would seem to be two-fold: teaching programming can transfer into other programming (i.e. between programming languages), and teaching programming does give an idea of how computers operate and execute. Beyond that, much of the rest is wishful thinking. But in an increasingly computer-centric and computing-reliant world, I believe this is still sufficient to argue for teaching computing to all.

Identity and Community

Several parts of the book remind me of this article I read recently on computing as a pop culture. One interesting aspect of programming is how programming languages live and die. It’s tempting to think that it’s solely to do with suitability for purpose. However, there are all sorts of other factors, such as tool support, library availability, and popularity. Languages do clearly move in fads, and this is just as true in computing education. Pascal is not necessarily any worse today to teach programming than it was twenty or thirty years ago, but it is dead in classrooms. Guzdial relates this to communities of practice: students want to believe they are learning something which leads them towards a community of professionals. For physicists, this might be MATLAB; for programmers this might be Java. This can be frustrating as a teacher: why should a teaching language be discounted just because students (perhaps wrongly) come to see it as a toy language? But this seems to be a danger to languages for use in classrooms, especially at university.

The issue of identity and belonging crops up repeatedly. Guzdial makes the point that for people to become programmers, they need to feel like they belong. If they don’t see people like themselves doing it then they are likely to be put off. This may be minorities not becoming full-time software developers, but it can also affect non-specialists. There’s an anecdote from one of Brian Dorn’s studies where a graphic designer who writes scripts in Photoshop tells of being told that he’s not a real programmer, he’s just a scripter. Technically, programming is programming, but it’s clear that issues of identity and community are often more important than the technical aspects. Programmers reject those they see as non-programmers, non-programmers reject those they see as programmers.

One really interesting case study on belonging as a programmer is that of the Glitch project, which engaged African American students in programming by using game testing as a way in. Several of the participants learned programming and did extra work because they were so engaged, but they would not admit this to anyone else. They constructed cover stories about how they did extra for extra money, or that they were only doing it to play games or earn money, disguising the fact that they were learning computer science.

The issue of identity also resurfaces when Guzdial talks about computing teachers. As in the UK, many US computing teachers come from other subjects: business, maths, science, etc. Many felt that they were not a computing teacher and (in the US) lacked sight of a community of computing teachers for them to join, which demotivated them. I also appreciated the section on how computing teachers’ needs differ from those of software developers. “Teachers need to be able to read and comment on code… We don’t see successful CS teachers writing much code.” I wonder to what extent this is reflected in CS teacher training. The point is also made that teachers need to “know what students are typically going to [get] wrong”, which is something my colleague Amjad Altadmri and I have done some work looking into.

Conclusion

I’d recommend the book to any computing teacher who is interested in learning more about what the research tells us about teaching computing. I could imagine lots of secondary school computing teachers in the UK being interested in this, and a fair few computing university academics who could benefit from reading it. Although there may be little chance of widespread impact, as the book itself describes: “Several studies have supported the hypothesis that CS teachers are particularly unlikely to adopt new teaching approaches” and “Despite being researchers themselves, the CS faculty we spoke to for the most part did not believe that results from educational studies were credible reasons to try out teaching practice.” (I wonder how many of those believe in evidence-based medicine, etc, but not evidence-based teaching.) The book would also be particularly useful to new computing education PhD students who want a head start in getting to grips with previous work in the area. If any of those describe you, I’d recommend picking up a copy.
