The Paper History of Code Reading

How do you read code? It’s such a simple question that the answer seems obvious. You go to your IDE, select the class you’re interested in, and scroll up and down. You may also move around the code using shortcut keys or mouse clicks to jump to the definition or usages of an identifier. There have been many papers over the years on improving the readability of code, but once you go back far enough, you start realising that a lot has changed. In 1986, Rambally tried different colouring schemes for keywords to enhance readability. The paper suggests reasons why this might not have been done before:

From Rambally's 1986 paper "The influence of color on program readability and comprehensibility" (ACM, public link), amusingly only available in black and white.

Say what? Colour screens were becoming available at that time (quick digging shows CGA was 1981, and I believe Turbo Pascal 1.0 used colour in its launch in 1983; there were doubtless similar developments in Unix at that time), but the paper takes for granted that code is read on paper. This is referred to as an issue by Baecker and Marcus:

From Baecker and Marcus's 1986 paper "Design principles for the enhanced presentation of computer program source text" (ACM)

Choice of font being dictated by its photocopyability is a long way from font choice on hi-res screens like Retina displays. This was a different era, and it means that some of the results are not necessarily applicable today. For example, Miara et al. summarise earlier work by Weissman as follows:

From Miara et al.'s 1983 paper "Program Indentation and Comprehensibility" (ACM, public link)

I think it’s safe to say that paper-page boundaries are no longer an issue with program reading!

Because of the paper focus, proposals for code navigation were also quite different. Where now we have things like outline views and jump-to-declaration, 30 years ago programmers were flicking backwards and forwards through piles of paper. So Oman and Cook’s idea of having a table of contents and an index must have been appealing, but the idea now of printing code into a book format is like asking for a hardcopy of Wikipedia or a DVD of a YouTube channel:

From Oman and Cook's 1990 paper "Typographic Style is More than Cosmetic" (ACM)

It was also a different era of programming languages. I’m mainly looking in the mid 80s onward, so programs were entered via terminals not punched cards, but it’s the era of Pascal, FORTRAN and that whippersnapper C, maligned even then:

From Baecker's 1988 paper "Enhancing Program Readability and Comprehensibility with Tools for Program Visualization" (ACM)

This is historically amusing, given that C’s syntax was broadly copied into a very successful set of languages, like C++, Java, C#, JavaScript and so on. The design of C was likely driven not so much by readability as by the Unix philosophy of reducing keystrokes. Just as you had “rm” instead of “remove”, you had “{” instead of “begin”. Languages tend to use shorter words for more frequent concepts, which reduces the effort of speaking (fewer syllables), writing and reading: fewer characters for the hand, and less width to travel for the eye. And what is shorter than one character? (Spoiler: zero characters! Dump those brackets…)

This era was also early in the structured programming era; several papers make mention of GOTO statements making the display of structured programming less effective.

Some of the papers, such as Miara et al.’s aforementioned study of indentation, use ALL CAPITALS MONOSPACE programs, which are a bit horrendous to the modern eye. But the typography ideas were not all prehistoric; some of the pretty-printing suggestions for the paper display of code were fairly advanced. I find it a bit too over the top, but Baecker and Marcus’s proposed display of code was an admirable effort at improvement:

From Baecker and Marcus's 1986 paper "Design principles for the enhanced presentation of computer program source text" (ACM)

The margin notes and boxes are displays of comments in the original source. I think it ended up too busy, and some of the horizontal gaps are too large, but they do things like get rid of the curly brackets, use proportional fonts, keep the code mainly lower case, and even use variable-size brackets for grouping and variable spacing in expressions. Ultimately, it was the move from reading on paper to reading on screen that killed off this avenue of code display: for the next ten to fifteen years, code display was constrained by character terminals and low-resolution graphics systems. With modern displays, though, it’s worth considering some of these ideas again. Finally, we should beware citations of studies from this era when talking about screen-based reading; the paper-based results may well not transfer to screens.

The holes in computing education research

Sally Fincher (the head of the computing education research group I work in) has published a very interesting article (originally in CACM, public version available here) about the lack of good research in computing education, and her worries that many of our efforts to push computing into schools lack backup from researchers on how best to actually teach computing (and programming in particular):

What is resolutely held common [between teaching computing/programming literacy and] traditionally formulated literacy is that these approaches are unleashed on classrooms, often whole school districts, even into the curriculum of entire countries—with scant research or evaluation. And without carrying the teachers. If we are to teach computing in schools we should go properly equipped. Alongside the admirable energy being poured into creating curricular and associated classroom materials, we need an accompanying set of considered and detailed programs of research, to parallel those done for previous literacies.

I really enjoyed the article, which challenges some of the rest of our approaches to delivering and researching computing education — I highly recommend reading it.

Having read it, you might well wonder why there is so little research into school programming education. There are several reasons for this. One is that computing education research is a small field even within computing, and compared to something like literacy research it is tiny indeed. Another problem is that until recently, computing education researchers had no compelling reason to look at teaching introductory programming to all school-age students, because nobody was trying to deliver such teaching; instead the researchers were mainly focused on the status quo of teaching programming to self-selecting undergraduates. (As an additional practical issue, it is famously hard to get funding to research computing education; computing research funding organisations haven’t generally viewed it as an important area, and it was considered too niche for education research funding organisations.)

There is a further issue in teaching programming. Many concepts in programming remain reasonably static (e.g. variables, functions, collections, iteration, recursion), but all around them there is innovation and evolution. You have people toying with different syntaxes, different representations (blocks, frames, text, etc), different frameworks to find out what might work better for teaching programming, with scant rigorous evaluation. For example, we know teaching syntax can be tricky, so a few researchers study better ways to teach syntax while others — like the Scratch team or our team — just try to build new tools that eliminate syntax as a consideration. It’s a tough field in which to stand still and take stock. But Fincher is right that it is very important to properly evaluate teaching approaches and tools.

I know that when I tell teachers I am a computing education researcher, they are usually interested in what research has to say, and often have very sensible questions: What language should I teach in? What order should I cover the concepts? Is blocks-based programming good for students long-term? How should you best manage the transition between blocks and text? Some of these are at least partly personal preference, but we have so little research on them that I am rarely able to give a good answer. The true answer is usually: no-one really knows, and my work makes it harder by trying to build new tools instead of evaluating existing tools.

Although it’s not just the educational part of computing that lacks these answers: a teacher may ask whether static or dynamic languages are best for teaching, while a professional developer may have the exact same question for professional use, and the research there is almost as bare.

I remember during my PhD, where I was creating new abstractions/libraries for concurrent programming, a researcher from another discipline asked how I would evaluate the new abstraction: would I get a bunch of programmers together and get some of them programming with an existing library, and some with mine and see who performed better according to some metric? I remember scoffing, and thinking: no-one does that. But now I realise that is a large problem with the study of programming. Similar to economists, many computer scientists are so convinced that our discipline is purely theoretical and abstract, that we forget that a good part of the discipline is distinctly anthropological. How can we study the programming of machines with so little consideration for the role of the programmer? Educationally and professionally, why do we build so many programming tools and languages without ever stopping to ask: do they actually help?

We’re looking to hire a web developer

Computing At School is an organisation supporting and promoting the delivery of computing in UK schools. Since 2012, our team at the University of Kent has maintained the CAS Community web site. We do this alongside our regular research work on BlueJ, Greenfoot and associated projects. This has led to our time being stretched quite thinly, and unfortunately it is the CAS web site that has suffered a bit, with little active development in the past months. We’re looking to rectify this by hiring a web developer to focus their time on the CAS site. The link to apply is here:

http://www11.i-grasp.com/fe/tpl_kent01.asp?newms=jj&id=39211&newlang=1

Why it might appeal: This is a good opportunity to further your technical skills and take the initiative to make identifiable contributions and really add something to a website used by thousands of teachers to share resources and improve the teaching of our own discipline: computer science. You will work at a university with the combined benefits of a large employer (big campus with lots of amenities) and a small team. We offer a non-commercial, relaxed environment, flexible on working hours, with no bureaucracy (no timesheets or tedious project management), but retaining a core dedication to getting the job done well. We are based in Canterbury, a pleasant historical city (but the students keep it lively) only a few miles from the coast and only one hour’s train journey from London.

Canterbury campus in summer, original photo by Stephen Train, CC BY-NC 2.0

Logistics: Closing date for applications is 26th April 2015, currently two weeks away, with interviews set for 5th May. The job is intended to be full-time, although we would also consider part-time working if that suited a candidate’s circumstances. Ideally, we would like a candidate to work in our offices in Canterbury, but we can also consider (UK-based) remote working with regular Canterbury meetings. The job is grade 6 in the university’s structure, which means a salary of 26–30k. It’s initially a one-year fixed term position, as we currently only have funding available for one year. More details on the post are available via the above link.

Technical skills: The website is written in Ruby on Rails, with a smattering of client-side JavaScript/AJAX. We’re looking for what some call a “full stack developer”, although the stack is quite small. We’d ideally like someone who is familiar with the technical aspects of Rails and a little AJAX development, able to SSH into a Linux server and poke a cron job or log file, but also able to design decent enough webpages when needed, and to talk to users and other stakeholders about improvements to the site. But we realise that you never get a perfect candidate: we at least need someone who can program well, is familiar with the basics of web development, and can quickly pick up whatever bits of Ruby on Rails and JavaScript they need, while also being able to converse with users and fellow developers.

Other skills: The developer will work as part of a small team, primarily with Michael Kölling and me. We’ll decide priorities together, and I can advise on the structure of the site code and plan architecture of new features and so on, but we are looking for someone who can work effectively on their own, taking charge of the website and making a real difference to it and its users.

If you have any questions, feel free to contact me (nccb@kent.ac.uk). Please pass this on to friends or colleagues who might be interested.

Greenfoot 3, at SIGCSE

Most of the BlueJ/Greenfoot team will be headed to Kansas City next week, for the SIGCSE 2015 conference. We’ll be doing a Raspberry Pi demo at 3pm on Thursday in the exhibit hall, and we’ll be presenting a new paper on our Blackbox data on Saturday morning at 9am in room 2502A (more on that in a future post). But the event we are most excited about is our Greenfoot event, 5:30pm on Friday night in room 2502B, where we plan to demo and launch a public preview of Greenfoot version 3.0.0.

What’s New In Greenfoot 3?

We’ve continued to refine the existing parts of Greenfoot: we’ve added generics to the right places in our Greenfoot API, and we’ve switched to automatic compilation, so the Compile button will be a thing of the past. By far the biggest development in Greenfoot 3 is that we have added a totally new editor (which will sit alongside the existing editor), with a new way to edit programs. We’re calling it frame-based editing: roughly speaking, it is a hybrid of block-based editors like Scratch, and text-based editors like Greenfoot’s Java editor. The intention is that it takes the best parts of block-based editing (easy manipulation by dragging, avoidance of syntax errors) but marries them with the best parts of text-based editing (keyboard control, less tedious dragging for program creation and expression manipulation, easier management of longer programs).

The new Greenfoot 3 editor. The look is not too far different from Greenfoot 2, but how you use it is quite different.

I’ll be posting a lot more details about the new editor after SIGCSE (for now, we’re busy focusing on testing the release ahead of the conference), but if you’re coming to SIGCSE: do come join us on Friday night to take a look. There’s even free food and drinks!

How to spell program

The word program has various meanings. The common meaning is this one, from the Oxford English Dictionary (OED):

An advance notice describing any formal proceedings, as an entertainment, a course of study, etc.

This word is spelled “program” in the US, and “programme” in the UK: one OED example usage is “The dance programme featured four works”. Fine. However, the word program also has a special meaning in our domain: a computer program, which is programmed by programmers. Over to OED again:

Noun: A series of coded instructions and definitions which when fed into a computer automatically directs its operation in performing a particular task.

Verb: To write a computer program.

Let’s be clear: this sense of the word is now spelt program, not programme, even in the UK. Even the OED admits it (in its definition of the verb above, and the note on the noun: “Now usu. in form program”). But that doesn’t stop various UK organisations from trying the British spelling:

  • Telegraph, Nov 2013: “This has discouraged software developers from writing programmes for Android”.
  • UK government, this week (Dec 2014): “In schools, a new GCSE in computer science [will cover] the most up-to-date issues including writing code, designing programmes…”.
  • The Guardian have recently got the hang of it — using programme in 2005 (“one programme can infringe many different patents at once”), but updating to program by 2012 (“Coderdojo inspires kids to program”).

Why does this matter? It bothers me because I’m a nitpicking pedant, but I think it’s also a culture signifier. Using phrases like “to programme a computer” or “writing a computer programme” shows that the person has never actually been involved with programming, or else they would realise that all programmers have adopted the US spelling (the US being quite big in computing, apparently). It always suggests someone writing about something they don’t know much about. So, if you want to avoid this, update your style guides: program, not programme.

Expressive Whitespace

Do you know the operator precedence rules in the programming languages that you use? Given an expression like this one in Java:

x/6+5&8>>2-4!=8

Can you say how it will be parsed? (Never mind that the semantics may be meaningless; the parser doesn’t care about semantics.) Generally, programming environments give very little help in improving readability of expressions; the expression will be displayed exactly as above. You do get bracket-match highlighting if you have brackets (and some would argue that you should always bracket expressions for extra clarity). But maybe the display of expressions can be further improved in other ways.
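For reference, the grouping can be worked out mechanically. The following is a minimal sketch in Python (not Java, purely for brevity) of a precedence-climbing parser that adds back all of the implied brackets. The names `PRECEDENCE`, `tokenize`, `parse` and `parenthesise` are made up for this illustration, and the table covers only a subset of Java's left-associative binary operators (no `>>>`, `&&`, `||`, unary operators or assignment).

```python
import re

# Binding strength for a subset of Java's binary operators (higher binds tighter).
PRECEDENCE = {
    "*": 10, "/": 10, "%": 10,
    "+": 9, "-": 9,
    "<<": 8, ">>": 8,
    "<": 7, ">": 7, "<=": 7, ">=": 7,
    "==": 6, "!=": 6,
    "&": 5, "^": 4, "|": 3,
}

# Two-character operators are listed first so they win over their one-character prefixes.
TOKEN = re.compile(r"\s*(>>|<<|<=|>=|==|!=|[-+*/%&^|<>()]|\w+)")


def tokenize(text):
    """Split an expression into operator, bracket and identifier/number tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens):
    """Parse an identifier, a number, or a bracketed sub-expression."""
    token = tokens.pop(0)
    if token == "(":
        inner = parse(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return token


def parse(tokens, min_prec=0):
    """Precedence climbing: returns a str leaf or a nested (op, left, right) tuple."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Every operator in the table is left-associative, so the right-hand
        # side may only absorb operators that bind strictly tighter.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


def parenthesise(node):
    """Write the tree back out with every grouping made explicit."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({parenthesise(left)}{op}{parenthesise(right)})"


print(parenthesise(parse(tokenize("x/6+5&8>>2-4!=8"))))
# prints (((x/6)+5)&((8>>(2-4))!=8))
```

The `&` ends up at the root of the tree, with a `!=` comparison as its right-hand operand, which is probably not what most readers would guess at first glance.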

Designers tend to use whitespace for grouping items — for example, grouping related columns in tables. And in fact, a lot of programmers do tend to omit the whitespace around high precedence (tightly bound) operators while putting it in around lower precedence operators. You are much more likely to see this:

dist * Math.sqrt(x*x + y*y)

Than this equally spaced version:

dist * Math . sqrt (x * x + y * y)

Or this devilishly spaced version:

dist*Math . sqrt(x * x+y * y)

If the rule that most people follow is simply to have whitespace inversely proportional to the precedence of the operator, surely we could automate this in a programming editor.

Dynamically varying whitespace

Consider this expression, without spaces:

expression-unspaced

Using the parse tree of this expression, we can assign smaller whitespace to operators nearer the leaves (higher precedence operators which bind more tightly) than operators nearer the root (lower precedence operators):

expression-spaced-tree

It is now more obvious at a glance how the expression should be read, compared to the unspaced original. Not only is the display clearer, but if the editor takes charge of putting the whitespace in expressions, the user can save keystrokes by never having to insert spaces in expressions in the first place. They can enter the unspaced version above, and the editor displays it as the second version automatically.
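As a rough sketch of that layout rule (in Python, with whole space characters standing in for the finer-grained widths a graphical editor could use), the padding around each operator can be derived from how far it sits above the leaves of the parse tree. The tuple representation of the tree, the function names `height` and `render`, and the choice of "height minus one" spaces are all assumptions made for this illustration; brackets are not handled.

```python
def height(node):
    """Leaves have height 0; an operator sits one above its tallest operand."""
    if isinstance(node, str):
        return 0
    _, left, right = node
    return 1 + max(height(left), height(right))


def render(node):
    """Lay out the expression, padding each operator according to its height."""
    if isinstance(node, str):
        return node
    op, left, right = node
    # Operators directly above the leaves get no padding; each level closer
    # to the root gets one more space on either side.
    pad = " " * (height(node) - 1)
    return f"{render(left)}{pad}{op}{pad}{render(right)}"


# Parse tree for a*b+c*d==e, written out by hand as (operator, left, right) tuples.
tree = ("==", ("+", ("*", "a", "b"), ("*", "c", "d")), "e")
print(render(tree))
# prints a*b + c*d  ==  e
```

The user would type `a*b+c*d==e` with no spaces at all, and the editor would take care of the rest.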

Choosing the amount of space

One design question is whether the width of whitespace is solely determined by absolute operator precedence (i.e. plus always has the same amount of whitespace around it) or whether it is determined by the relative precedence of the operator in the chosen expression. In the absolute case, the complete expression “2 * 3” will be spaced differently to “2 + 3”, which to me seems odd. In the dynamic case, you get spacing readjustment: if you take “2+3” and add “*4” on the end, the + will get more space added to reflect that its dynamic precedence has changed. That is:

expression-23

Becomes:

expression-234

While re-spacing as you edit is visually disturbing, I think this is the right thing to do: the addition of *4 has changed the semantics of the existing expression (it is not “(2+3)*4” but rather “2+(3*4)”) so respacing it to reflect the altered semantics seems correct.
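The difference between the two policies can be sketched in a few lines of Python. This is a simplification that only handles bracket-free expressions: the relative policy here ranks each operator's precedence among the precedence levels actually present in the expression, rather than walking a full parse tree. The `PRECEDENCE` and `ABSOLUTE_PAD` tables and the function names are hypothetical, chosen only for illustration.

```python
import re

PRECEDENCE = {"*": 3, "/": 3, "+": 2, "-": 2, "==": 1}  # higher binds tighter
ABSOLUTE_PAD = {3: 0, 2: 1, 1: 2}  # fixed number of spaces per precedence level


def split(expr):
    """Break a bracket-free expression into operands and binary operators."""
    return re.findall(r"==|[-+*/]|\w+", expr)


def lay_out(tokens, pad_for):
    """Join tokens, surrounding each operator with pad_for(operator) spaces."""
    parts = []
    for token in tokens:
        if token in PRECEDENCE:
            pad = " " * pad_for(token)
            parts.append(pad + token + pad)
        else:
            parts.append(token)
    return "".join(parts)


def space_absolute(expr):
    """Each operator always gets the same padding, whatever surrounds it."""
    return lay_out(split(expr), lambda op: ABSOLUTE_PAD[PRECEDENCE[op]])


def space_relative(expr):
    """Padding depends on which other precedence levels appear in the expression."""
    tokens = split(expr)
    levels = sorted({PRECEDENCE[t] for t in tokens if t in PRECEDENCE}, reverse=True)
    # The tightest level present gets no padding, the next one space, and so on.
    return lay_out(tokens, lambda op: levels.index(PRECEDENCE[op]))


print(space_absolute("2*3"), "|", space_absolute("2+3"))        # 2*3 | 2 + 3
print(space_relative("2+3"), "|", space_relative("2+3*4"))      # 2+3 | 2 + 3*4
```

Under the absolute policy the two stand-alone expressions are spaced differently; under the relative policy the `+` only gains its extra space once the `*4` arrives.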

Longer expressions

There is a limit to how long an expression this technique can make readable — our earlier terrible expression:

expression-long-unspaced

Becomes a bit better when space is varied:

expression-long-spaced

But I think in that case, there’s only so much that spacing can do — you really should add some brackets.

Summary

I’ve discussed a way that programming editors could be smarter about whitespace when displaying expressions: dynamically varying the whitespace around operators based on their relative precedence in the expression. This is likely to be included in a new editor we’re currently working on for our Greenfoot system. I note that this scheme does vary the width of spaces in your editor, which may upset some users. But while fixed-width spacing is useful for aligning the left-hand edges of lines of code, I’m less convinced that it matters within the line of code.

Addendum: a colleague points me to this work which mentions a similar system for mathematical equations (page 6). It’s interesting that this idea has been implemented in mathematics but not yet caught on in programming.

The dark at the end of the funnel

In a Q&A session last week, Facebook founder Mark Zuckerberg talked about the problem of getting more women into Computer Science (CS). He referred to the vicious circle of trying to encourage more female participation in CS:

You need to start earlier in the funnel so that girls don’t self-select out of doing computer science education, but at the same time one of the big reasons why today we have this issue is that there aren’t a lot of women in the field today.

The funnel or pipeline is this idea that you only get trained developers by educating them; if you want more graduate developers you need to get them in an educational pipeline at an earlier age so that they will take computing degrees. This September, England began its Computing adventure, with boys and girls required to study Computing (which includes CS and programming) from ages 5–14. We’ll let you know how “filling the funnel” turns out. There are definitely problems of attracting and retaining women in CS that originate in education; Mark Guzdial has a good blog post about this that I won’t repeat here.

However, this is not solely an issue with the education system (though that would be a familiar narrative — work force not as we would like it? Must be the fault of schools and universities). The pipeline or funnel doesn’t just need filling by shoving lots of 5 year old girls in one end and waiting for the hordes of female developers to swim out of the other end into an idyllic tech industry pool. Zuckerberg mentions that the lack of women in the industry forms a vicious cycle. This is not a problem at the education end of the funnel.

As this Fortune article describes, the industry is not welcoming to women. The Anita Borg Institute found that women’s quit rates were double those of men. Not to mention issues like maternity leave. The pool at the end of the pipeline is leaking, and for good reason. So the vicious cycle is not simply an accident of history; the women that are in the industry tend to leave. There are several reasons for this, some of which are identity and culture in the industry.

Gamer Identity and Culture

You may well have seen press coverage of the recent “#gamergate” mess. Despite their cover story about ethics in games journalism, #gamergate was started as a way to deliberately target women in the games industry and hound and harass them until they quit or worse:

The 4channers express their hatred and disgust towards [Quinn]; they express their glee at the thought of ruining her career; they fantasize about her being raped and killed. They wonder if all the harassment will drive her to suicide, and only the thought of 4chan getting bad publicity convinces some of them that this isn’t something they should hope for.

How do you explain that to the young women you are inviting to join the pipeline? Come learn to program and how to make games — and try to ignore the fact that a terrorist movement was begun in order to hound your gender out of the industry. It’ll be fun!

Some games writers have focused on the gamergate idiocy as being related to identity and gaming going mainstream. Back in the 80s and 90s, a group of people, mainly young white men, who felt excluded from a masculine sports-centred culture, found solace in making gaming their identity. (And during that period, programming was well linked to gaming, as home PCs like the Spectrum, Commodore, etc were amenable to games and to programming.) As more and more people got into games, the original gamers slowly redefined their identity. Sure, lots of people played games, but no true gamer played Candy Crush. Subcultures formed, each looking down on someone else. Call of Duty players sneered at casual gamers. Older gamers scoffed at Call of Duty players. And so on.

(This is not that unusual in media fandom: music lovers have been the same way for generations (a classic example being mods vs rockers). This article by Arthur Chu directly compares the anti-disco movement to #gamergate, describing how a perception of losing majority status can lead to reactionary rage.)

Programmer Identity and Culture

Now, the games industry isn’t the same thing as the tech industry, but the two clearly overlap: programmers work in both industries. And the identity issues are paralleled in the tech industry. This article by Carlos Bueno nicely sums up how programmer identity is important in the Silicon Valley tech industry. One tale of non-conformism:

[The interview candidate] was dressed impeccably in a suit… I stole a glance to a few of the people from my team who had looked up when he walked in. I could sense the disappointment. It’s not that we’re so petty or strict about the dress code that we are going to disqualify him for not following an unwritten rule, but we know empirically that people who come in dressed in suits rarely work out well for our team. He was failing the go-out-for-a-beer test and he didn’t even know it…

And another:

Again Max Levchin: “PayPal once rejected a candidate who aced all the engineering tests because for fun, the guy said that he liked to play basketball. That single sentence lost him the job.”

These are not issues of job performance, and not a simple gender issue. These are issues of identity and culture (see also the rise of the “brogrammer”). I think that programmers mirror gamers in this aspect. We built a culture where we subtly redefined what characteristics are important until it fitted only the people who we thought it should. As Bueno puts it:

We’ve created a make-believe cult of objective meritocracy, a pseudo-scientific mythos to obscure and reinforce the belief that only people who look and talk like us are worth noticing.

I’m sure many people have worked in programming offices (or sat in programming classes) where they felt excluded if they did not have anything to say about the latest sci-fi series or talk about last night’s DOTA 2 game or whatever. (This problem occurs in several industries, but that’s no justification not to fix it in your own.) It’s not usually a malicious thing, but as with the “go for a beer test” in the earlier quote, companies often assume that new hires must bend to the culture, rather than bending the culture to fit new hires. It’s not exclusively a gender issue, but women and minorities tend to be hit harder by it.

These problems get buried under the idea that programmers have created this wonderful meritocracy, where if you can code well, you will succeed. Programming skill is what really matters. (Despite evidence to the contrary: is the highest-paid or highest-status person in a tech company the best programmer? Does it actually help that much in your career?) And thus programmers tend to believe the reverse, too: if you didn’t succeed, it’s because you couldn’t code well. When the #ghcmanwatch participants suggested that women should just “be better”, that surely arose from this meritocratic world view.

Summary

Computing education is currently making moves to put more women into “the pipeline” (aka “the funnel”) so that we might get more computing graduates. But it’s a tough sell when the end of the pipeline is not a desirable destination:

The only people who can alter that are those who are already in the tech industry, by making sure that the work environment is more welcoming and nurturing to all. That’s a day-to-day, office-by-office battle. A two-fold approach is needed: making the work place more inviting for women, and getting more women into CS during school and university. Of course culture is just one gender issue (Microsoft’s CEO made headlines and backtracked over his equal pay comments) but it’s one that everyone in the industry can help to address.
