Is Teaching Programming Not Enough?

The Atlantic has an interesting take on the programming/computer science/computational thinking debate, with several points for disagreement. I’m going to pull out various quotes from the article, and interleave them with some of my own thoughts.

Who Needs Java or JavaScript Anyway?

“Educators and technology professionals are voicing concerns about the singular focus on coding—for all students, whether learning coding is enough to build computational thinking and knowledge, and for students of color in particular, whether the emphasis on knowing Java and JavaScript only puts them on the bottom rung of the tech workforce.”

That last bit’s a jaw-dropper. A mastery of two of the most popular programming languages in the world would surely not leave you on the bottom rung of the whole tech workforce! I think this quote is a misinformed summary of some of the later comments, and thus a very unfortunate quote to repeat in the byline. Research like that from Sweller’s talk suggests generic cognitive skills like computational thinking cannot be directly taught. Even if they can be instilled, through transferring learning from another domain, programming may well be the best place to transfer from.

The Self-Programming Computer

“The artificial-intelligence system will build the app… Coding might then be nearly obsolete, but computational thinking isn’t going away.”

I can foresee that increasingly powerful programming tools, higher-level languages and code-reuse (e.g. micro services) could reduce the number of programmers needed, but the idea that AI will write the code seems incorrect. I don’t think any efforts to take the programming out of creating computer programs have ever really succeeded. It always ends up with programming again, just at a different level of abstraction.

The Relation to Mathematics

To avert the risk of technical ghettos, all students must have access to an expansive computer- science education with a quality math program… It’s a myth to think that students can simply learn to code and flourish without a minimum level of mathematical sophistication.

This is a perennial point of contention, as to whether mathematics knowledge is required for programming, and/or whether the skills in them correlate. The assumption that they are intrinsically related is a personal bugbear, and I’m not sure there’s strong evidence either way. Not that I’m arguing against the notion that all students should have access to a quality mathematical education!

Self-Teaching

This issue of privilege is interesting, though. Historically, many programmers are self-taught. This means that programmers are primarily those with access to computers, who are encouraged (or not discouraged) by their families, which means you get less women (who are less likely to be encouraged to try it) or those from poorer backgrounds (who are more likely to need to get a part-time job, and thus have less time available for self-teaching). The UK’s gambit of teaching it to everybody has the potential to partially correct this artificial selection. However, it is a very touchy subject to suggest that self-taught programmers are automatically inferior, which is what one interviewee implies:

Further, [Bobb] recommends combining the endless array of out-of-school programs for coding and hackathons with learning opportunities in schools. Brown echoes this point, adding that coding to the uninformed can take many forms, but “you simply cannot learn the analytical skills… without a formal and rigorous education.”

This chimes with the potential disagreement with Sweller’s talk. Is computing special (because you can learn from the computer’s feedback) so that self-teaching can work more effectively than in other disciplines? The legions of self-taught programmers in the workforce surely shows that self-teaching must be possible. It may not be the most efficient method to teach yourself, but I think claiming that only formal education can teach the necessary analytical skills for programming is surely incorrect.

John Sweller on Cognitive Load Theory and Computer Science Education

The opening keynote at SIGCSE 2016 was given by John Sweller, who invented cognitive load theory and works on issues such as the relation between human memory and effective instructional design. The keynote was purportedly about “Cognitive Load Theory and Computer Science Education”. As one or two people pointed out, the “Computer Science Education” part was a bit lacking, which I’ll come back to. (I should also point out this is a bit hazy as there was a full conference day before I had chance to write this — happy to take corrections in the comments.)

Knowledge and Memory

Sweller started by discussing biologically primary versus biologically secondary knowledge. Biologically primary knowledge is that which we have evolved to acquire: how to walk, how to talk, how to recognise faces, etc. None of this is taught, it is all learnt automatically. In contrast, biologically secondary knowledge is the rest, which he argues must be explicitly taught. Algebra, history, science: his rough summary was that everything that is taught in school is biologically secondary.

I won’t go deeply into working memory and long-term memory here, but the relevant parts were that working memory is small and transient (i.e. disappears quickly), whereas long-term memory is almost unlimited and can be retained permanently. Novices can get overwhelmed because everything novel must live in working memory, whereas experts have knowledge stored in long-term memory and so use different processes to tackle the same tasks. One thing I did not know before was that we have two working memories: one auditory and one visual, which means you can potentially achieve benefits by presenting information in a mixed modality, providing they mesh well together.

Visualisations

One issue that came up in the questions was about algorithm visualisation. Algorithm visualisation is something that many people are convinced is useful for understanding programming, but which has rarely, if ever, been shown to be effective for learning. Sweller’s suggestion was that if comparing two states (before/after) is important and non-trivial then it is better to provide two static diagrams for careful comparison, rather than a video which animates the transition from one to the other. My take-away message from this is that visualisations need to be comics, not cartoons.

Experts Use Memory More Than Reasoning

Sweller made the point that experts tend to operate through pattern-matching. Although we may think of programming as carefully constructing logical code paths to fit a given problem, we are often just recognising a problem as similar to one we have solved before, and adjusting our template to fit. More expert programmers just know more patterns (and “patterns” here matches well to the idea of design patterns). The difficult part of programming is thus only when we stray outside our pattern catalogue. This was work covered in some work in the 1980s which I’ve previously discussed here (and see also this post).

What May Make Programming Special

The issue of how knowledge is transmitted was contentious. Sweller is against constructivism: he believes the idea that knowledge is best gained through discovery is incorrect, and explicit instruction is always superior. This is where the Computer Science domain becomes important. I can see that for something like algebra, you must be taught the rules, and taught the processes by which to rearrange an equation. You can’t just mess around with equations and hope to learn how they work — because you have no feedback.

But the computer is an interesting beast. It provides an environment which gives you feedback. If you know the basics of how to write a program, you can potentially learn the semantics of a language solely by exploring and discovering. Can is not the same as should, though: explicit instruction may still be faster and more effective. But I think it went unacknowledged in the talk that programming is somewhat different to many other learning experiences, because you have an environment that offers precise, immediate, automatic feedback based on your actions, even if no other humans are involved.

Final Notes

There’s a bunch more content that I haven’t included here for space reasons or because I didn’t remember it clearly enough (didn’t reach long-term memory, I guess!). Terms to google if you want to know more: cognitive load theory, randomness as genesis principle, expertise reversal effect, worked-example effect, split-attention effect.

I liked Sweller’s talk, and I believe that understanding what he called human cognitive architecture is important for education. I think the main issue with the talk is that Sweller is a psychologist, not a computer scientist, and thus there was little consideration given to the potential ways in which computer science may be different. How does having syntax and semantics in code play with working memory for novices? What should explicit instruction look like in programming education? Do different notional machines cause different interactions with our memory? How can we best organise our teaching, and our design of programming languages and tools to minimise cognitive load? Some answers to that arise in Briana Morrison’s work (including the paper later in the day yesterday, not yet listed there), but there is clearly a lot more potential work that could be done in this area.

Book Review: Learner-Centered Design of Computing Education

Mark Guzdial works at Georgia Tech and writes the most prolific and most read blog in computer science education. Thus I was intrigued to read his new book, “Learner-Centered Design of Computing Education: Research on Computing for Everyone”.

The book is a short and accessible read, that summarises huge chunks of research, especially the parts about teaching computing to everyone (i.e. non-specialists). One of Guzdial’s most well-known interests is media computation, which is used at Georgia Tech to teach non-majors (i.e. those taking non-CS courses) about programming, and it’s interesting to see how much positive impact it has had. But there’s no favouritism here: the book always has the research to back up its claims, and this is one of the great strengths of the book. The reference section has some 300 references, making it a great place to start getting to grips with research in the area. If I have any complaint with the book it is that it could do with a bit of copy-editing to remove some typos, but it’s not a serious issue.

Computational Thinking

The recent buzz area of computational thinking is covered in the book. One of the key ideas around computational thinking is that of teaching computing in order to teach general problem solving skills. Again backed by research, Guzdial dismisses this idea as unsupported: there never seems to be any transfer from programming into more general problem-solving skills. Programming’s utility would seem to be two-fold: teaching programming can transfer into other programming (i.e. between programming languages), and teaching programming does give an idea of how computers operate and execute. Beyond that, much of the rest is wishful thinking. But in an increasingly computer-centric and computing-reliant world, I believe this is still sufficient to argue for teaching computing to all.

Identity and Community

Several parts of the book remind me of this article I read recently on computing as a pop culture. One interesting aspect of programming is how programming languages live and die. It’s tempting to think that it’s solely to do with suitability for purpose. However, there are all sorts of other factors, such as tool support, library availability, and popularity. Languages do clearly move in fads, and this is just as true in computing education. Pascal is not necessarily any worse today to teach programming than it was twenty or thirty years ago, but it is dead in classrooms. Guzdial relates this to communities of practice: students want to believe they are learning something which leads them towards a community of professionals. For physicists, this might be MATLAB; for programmers this might be Java. This can be frustrating as a teacher: why should a teaching language be discounted just because students (perhaps wrongly) come to see it as a toy language. But this seems to be a danger to languages for use in classrooms, especially at university.

The issue of identity and belonging crops up repeatedly. Guzdial makes the point that for people to become programmers, they need to feel like they belong. If they don’t see people like themselves doing it then they are likely to be put off. This may be minorities not becoming full-time software developers, but it can also affect non-specialists. There’s an anecdote from one of Brian Dorn’s studies where a graphic designer who writes scripts in Photoshop tells of being told that he’s not a real programmer, he’s just a scripter. Technically, programming is programming, but it’s clear that issues of identity and community are often more important than the technical aspects. Programmers reject those they see as non-programmers, non-programmers reject those they see as programmers.

One really interesting case study on belonging as a programmer is that of the Glitch project, which engaged African American students in programming by using game testing as a way in. Several of the participants learned programming and did extra work because they were so engaged — but they would not admit this to anyone else. They constructed cover stories about how they did extra for extra money, or that they were only doing it to play games or learn money, disguising the fact that they were learning computer science.

The issue of identity also resurfaces when Guzdial talks about computing teachers. As in the UK, many US computing teachers come from other subjects: business, maths, science, etc. Many felt that they were not a computing teacher and (in the US) lacked sight of a community of computing teachers for them to join, which demotivated them. I also appreciated the section on how computing teachers’ needs differ from those of software developers. “Teachers need to be able to read and comment on code… We don’t see successful CS teachers writing much code.” I wonder to what extent this is reflected in CS teacher training. The point is also made that teachers need to “know what students are typically going to wrong”, which is something my colleague Amjad Altamdri and I have done some work looking into.

Conclusion

I’d recommend the book to any computing teacher who is interested in learning more about what the research tells us about teaching computing. I could imagine lots of secondary school computing teachers in the UK being interested in this, and a fair few computing university academics who could benefit from reading it. Although there may be little chance of widespread impact, as the book itself describes: “Several studies have supported the hypothesis that CS teachers are particularly unlikely to adopt new teaching approaches” and “Despite being researchers themselves, the CS faculty we spoke to for the most part did not believe that results from educational studies were credible reasons to try out teaching practice.” (I wonder how many of those believe in evidence-based medicine, etc, but not evidence-based teaching.) The book would also be particularly useful to new computing education PhD students who want a head start in getting to grips with previous work in the area. If any of those describe you, I’d recommend picking up a copy.

Greenfoot and Blackbox at SIGCSE 2016

This year, the SIGCSE (Special Interest Group in Computer Science Education) conference is in Memphis, Tennessee, at the beginning of March. With the early registration deadline of February 2nd fast approaching, I thought it’s worth quickly highlighting our events.

On Friday evening, we’ll be delivering a workshop on Greenfoot, using our new Stride language. Stride is a halfway house between blocks-based programming like in Scratch, Snap! & co, and full text programming in languages like Java or Python. We officially released Stride in Greenfoot 3.0 towards the end of last year, so we’re looking forward to having the opportunity to now deliver workshops using Stride. The workshop is at 7pm on Friday night. The committee like to see sign-ups in advance, so if you are thinking of attending, it would help us if you sign up now rather than once you arrive.

Stride Editor
Stride Editor

Also related to our work on Stride, I’ll be taking part in a panel discussion (proposal available here) on Friday morning at 10:45, entitled “Future Directions of Block-based Programming”. I’m very excited by the panel we’ve put together, where I’ll be joining: Jens Moenig, creator of Snap! and now working on GP, a new block-based language; David Anthony Bau, one of the leads of Pencil Code, an impressive blocks-based editor; and David Weintrop, who has done some excellent work looking at block-based programming (see 2015 section here). I’m looking forward to talking about the important details of designing and implement block-based and block-like editors, and how the design affects the way they are used in teaching.

Our other event is not focused on Greenfoot, but instead on our Blackbox data collection project. We’re holding a BoF (Birds of a Feather) event on Thursday evening at 5:30pm where we hope to get together with other researchers interested in educational data collection and analysis to talk about the data set and data collection platform that we provide in BlueJ.

I’m also looking forward to attending and seeing all the other interesting sessions. Glancing at the program, the paper sessions on CS Ed Research and Program Design look interesting for me. If you have any other suggestions on what to attend, feel free to put them in the comments.

Adventures in Graphing

One of my favourite parts of research, probably because I don’t get to do it often, is trying to come up with a graph or similar visualisation to display some results. You often have a lot of choice in how you could visualise some data; the art is in trying to find the best way to convey the interesting aspect of the results so that it’s obvious to the reader’s gaze. I thought it might be interesting to demonstrate this through an example. In one strand of our work, we look at frequency of different programming mistakes in our Blackbox data set. We’ve recently added a location API for the data so as a descriptive statistic, I wanted to see if the frequency differed by geography. To that end, I produced a table of mistake (A-R) versus continent:

The data is already normalised by total number of mistakes for each continent, so each column will sum to 1. One option would be simply to use the table to portray the data. This is often the best option for single-column data, or for small tables (e.g. 4×4). But it’s too difficult to get any sense of the pattern from that table. So we need a graph of some sort. The particular challenge with this data is that although it’s only a two-dimensional table, both dimensions are categorical, with no obvious ordering available.

Since each column sums to one, we could use a stacked bar chart:

I don’t like this option, though, for several reasons. 18 different stacked bars is too visually noisy, and we still need to add labels, too. Stacked bar charts also prevent comparison: is there much of a difference in the proportion of C (the light blue bar, third from bottom) between areas? It’s too hard to tell when stacked.

If stacked doesn’t work, it’s tempting to try a clustered bar chart, with the bars for each area alongside each other instead. That looks like this:

This time, you can see the difference between the proportion of C between areas. But… it’s still too noisy. It’s too difficult to take in a glance, and annoying fiddly to look at each cluster in turn and check the proportions. You actually get a reasonable sense of difference between mistakes (A-R) rather than between areas. You could thus try to cluster in the other dimension, giving you six clusters of eighteen rather than eighteen clusters of six:

A first obvious problem is that the colours repeat. It sounds like a simple fix to just pick more colours, but picking eighteen distinct colours would be very difficult. And even then, comparing one set of eighteen bars to another to look for differences is just too fiddly.

So maybe we should forget the bars. An alternative is to use line graphs. Now, technically, you shouldn’t use line graphs for this data because a line typically indicates continuous data rather than categorical. But I think this should be considered a strong guideline rather than absolute rule; sometimes with difficult data, it’s about finding the most reasonable (or “least worst”) approach when nothing is ideal. So here’s a line graph with mistakes on the X and proportions on the Y:

Not really very clear; any differences at a given point are obscured by the lines rapidly merging again before and after. Also, I don’t like the mistakes on the X axis like this, as it does strongly suggest some sort of relation between adjacent items. Maybe it would help to make it radial:

Ahem. Let’s just forget I considered it making it radial, alright? Looking back at the previous line graph, I said one issue was that the lines jump around too much, merging after a difference. We could try to alleviate this by sorting the mistakes by overall frequency (note the changed X axis):

Once the labels get added, it’s probably the one I like most so far. You at least get a sense, for the most frequent mistakes, of some difference in the pattern between the continents (one line per continent).

At this point, we’ve exhausted the bar and line graph possibilities, with either mistake or continent on the X axis, a different bar/line for the other variable, and proportion always on the Y axis. A final possibility I’m going to consider is a scatter graph. In this case, we can use continent as the X axis, and use the mistake label as the point itself on the scatter graph, with the proportion again on the Y:

At first glance, it’s a bit of a mess. Several of the letters overlap. But as it happens, I’m less interested in these. Minor differences in the low frequency mistakes are probably not that interesting for this particular data. The difference between the most frequent mistakes is, I think, more clearly displayed in this graph than in any of the others.

There is one finishing touch to add. One issue with normalising the data by total frequency for that continent is that it obscures the fact that we have quite different amounts of data for each continent. Happily, this scatter graph gives us the opportunity to scale the size of the points to match how much data we have for that continent. (There are two ways to do this: scale the height of each character by amount of data, or the area of each character by amount of data. Having tried both, the latter seemed to be a better choice.) That leaves us with this:

Easy to see where most of our data comes from! I may yet add some colour, but really the final decision is: is this interesting enough to include in a paper, or should I just scrap my work on this? Such is research.

Greenfoot: Frames, Pi, and Events

Lots of positive Greenfoot developments recently, which seemed worth summarising. Firstly, we released preview2 of Greenfoot 3.0.0, our new version of Greenfoot which includes our new frame-based editor (a hybrid of block- and text-based programming). The download of 3.0.0-preview2 is now available. Our intention is that this is a feature-complete preview, and we’re now just fixing bugs and optimising ahead of the final release (likely to be November or so).

Our other big development is that BlueJ and Greenfoot are now included in the default image of the official Raspbian Linux distribution for Raspberry Pi. A few weeks ago we made it into the main repository, which means that on any Raspbian, you can now run: sudo apt-get update && sudo apt-get install bluej greenfoot and have both tools available in the Programming menu in the GUI. But being included in the default image means that anyone installing Raspbian from now on will automatically already have BlueJ and Greenfoot installed. We’re really pleased to have our software so easily available of course. Thanks must go to the Raspberry Pi folks who supported this move, and my colleague Fabio Hedayioglu who worked on some optimisations to help our performance on the Raspberry Pi. BlueJ in particular comes bundled with libraries for interfacing with hardware via the GPIO pins.

We also have a set of events coming up which packs all my travel for the second half of the year into three weeks. I will be attending the Blocks and Beyond workshop on 22 October in Atlanta, which leads into a trip to the JavaOne conference in San Francisco where we will be teaching Greenfoot at a kids event (open to the public) on 24 October and talking about Greenfoot 3 on 26 October in the main conference.

After that there is the CAS Scotland conference on 7 November in Dundee where I’ll be doing a session on Greenfoot 3, and we will also be at the WiPSCE conference in London on 9–11 November, presenting our paper “Frame-Based Editing: Easing the Transition from Blocks to Text-Based Programming” on the Tuesday — when there is a cheaper day rate available for teachers. If you’re a UK teacher, it’s well worth considering attending if you can get out of school that day.

Improving Blocks with Keyboard Support

The Blocks and Beyond workshop will take place in Atlanta on October 22. It’s shaping up to be an interesting workshop, based around discussion of the submitted papers. Our own contribution is a position paper, entitled “Lack of Keyboard Support Cripples Block-Based Programming“. It’s a brief summary of the design philosophy which has informed our design of Greenfoot 3 (which we’re busy working on): the mouse-centric nature of existing block-based programming systems (Scratch, Snap!, etc) hampers their ability to scale up to larger programs, which in turn puts a ceiling on their use. Our gambit is that if we can remove this restriction, there might no longer be any reason to switch back to using text-based programming languages.

Greenfoot 3 Editor
The Greenfoot 3 Editor — public preview available

The paper (accepted version, copyright IEEE) is freely available, and only 3 pages, so rather than reproduce the full argument here, I suggest you take a look at the paper — and if you want to discuss it, or see similar work, then why not attend Blocks and Beyond?

Better Mathematics Through Types

I’m a huge fan of strong static types; it’s one of the reasons I like Haskell so much. Recently I’ve been toying with some F#, which has a typing feature I’m really enjoying: units of measure. I want to explain a few examples of where units of measure can prevent bugs. To do this, let’s start off with some simple F# code for a car moving around the screen. It has a direction and a speed, and you can steer and speed up/slow down using the keys. Here’s some basic code to do this:

type Key = Up | Down | Left | Right
let isKeyDown (key : Key) : bool = ...

type Position = {x : float32; y : float32}

let mutable pos = {x = 0.0f; y = 0.0f}
let mutable direction = 0.0f
let mutable curSpeed = 1.0f
let acc = 0.5f
let turnSpeed = 4.0f

let update (frameTime : float32) =
  if isKeyDown Up then
    curSpeed <- curSpeed + acc
  elif isKeyDown Down then
    curSpeed <- max 0.0f (curSpeed - acc)

  if isKeyDown Left then
    direction <- direction - turnSpeed
  elif isKeyDown Right then
    direction <- direction + turnSpeed

  pos <- {x = pos.x + cos(direction) * curSpeed * frameTime;
          y = pos.y + direction * curSpeed * frameTime}

This code looks fairly reasonable, and it compiles and runs. It’s also riddled with mistakes, which we can eliminate through the use of units of measure. To start with, we need to define a few units:

type [<Measure>] second
type [<Measure>] pixel
type [<Measure>] radian

Units of measure allow easy tagging of numeric types with units. The units are really just arbitrary names. The compiler doesn’t know second means, it just knows that second is distinct from radian. I think most people use single letters for brevity, but I like full names for clarity. You can use them to type a variable like so:

let mutable direction : float32<radian> = 0.0f

But actually this is insufficient; the compiler will complain that you are assigning 0.0f (a literal with no units) to direction, which has units. Much easier is to only type the literal and let the type inference do the rest:

let mutable direction = 0.0f<radian>

Typing our initial variables is pretty straightforward. Positions are in pixels, and we know from school that speed is distance over time, acceleration is distance over time squared:

type Position = { x : float32<pixel>; y : float32<pixel>}

let mutable pos = {x = 0.0f<pixel>; y = 0.0f<pixel>}
let mutable direction = 0.0f<radian>
let mutable curSpeed = 1.0f<pixel/second>
let acc = 0.5f<pixel/second^2>
let turnSpeed = 4.0f<radian/second>

This alone is enough to show up all the errors in our code. Let’s start with:

  if isKeyDown Up then
    curSpeed <- curSpeed + acc

The units of curSpeed is pixel/second, and we’re trying to add acc[eleration], which has units pixel/second^2. This is a subtle bug. To explain: as you may know, frames in a game or simulation don’t realistically run at completely fixed frame rates. They tend to vary between machines because different machines can maintain different maximum frame rates, and they even vary on the same machine while the program is running because other processes may interrupt to use the CPU. If I add acc to my speed every frame, then on a desktop doing roughly 60 frames per second (FPS) I’ll have a car that accelerates twice as fast as on a laptop doing 30FPS, which would make for quite a different game. The units of measure system has pointed this bug out. We shouldn’t just add the acceleration, we need to multiply the acceleration by the frame duration:

  if isKeyDown Up then
    curSpeed <- curSpeed + acc * frameTime

The same type of bug occurs in a few other places, such as here:

  if isKeyDown Left then
    direction <- direction - turnSpeed

Similar to above, turnSpeed is radians per second, but directions is radians, so we must again multiply by frameTime.

Standard Functions

Here’s another bug, where I have missed out a call to the sin function:

  pos <- {x = pos.x + cos(direction) * curSpeed * frameTime;
          y = pos.y + direction * curSpeed * frameTime}

Previously, the compiler was happy because all the variables were plain float32. Now, I’m trying to add pos.y [pixel] to direction * curSpeed * frameTime [radian*pixel]. Looking at it, the problem is the missing sin call. So we can add that in:

  pos <- {x = pos.x + cos(direction) * curSpeed * frameTime;
          y = pos.y + sin(direction) * curSpeed * frameTime}

However, we still get an error on these lines, complaining that we’ve attempted to pass float32<radian> to a function which takes plain float32. Hmmm. The problem here is that the standard library functions don’t have any units specified on their parameters or return values, so cannot pass a quantity with units to any standard library function. This is a bit of an ugly effect of units of measure arriving after the language’s initial release. One fix is that each time you call sin/cos/tan/atan2/sqrt/oh-my-god-it-goes-on-and-on you have to cast away the units, like so:

  pos <- {x = pos.x + cos(float32 direction) * curSpeed * frameTime;
          y = pos.y + sin(float32 direction) * curSpeed * frameTime}

Eugh. Firstly: this is annoying to have to do every time, especially if you use a particular standard library function a lot. Secondly: this is losing type safety. What if direction wasn’t in radians — which should be an error — but we’re masking the error by casting away the units? Much better is to add your own typed thin wrappers around the functions:

let cosr (t : float32<radian>) : float32 = cos (float32 t)
let sinr (t : float32<radian>) : float32 = sin (float32 t)

(Is there already an F# library that does this for all the standard numeric functions? Seems like it would be a sensible thing to do.) Now we can fix our code in a type-safe manner:

  pos <- {x = pos.x + cosr(direction) * curSpeed * frameTime;
          y = pos.y + sinr(direction) * curSpeed * frameTime}

You may question why you can’t pass a float32 to a function expecting float32<radian>. My understanding is that the designers chose float32 to represent a unitless measure, which is not the same as unit-indifferent. Think atheist vs agnostic: float32 is not unit-agnostic in modern F#, it’s unit-atheist. The output of my cosr function really is unitless, so it is specifically [unitless] float32, not [I was just too lazy to give a unit] float32. Even if you are using units of measure in your code, you will still have functions and variables that are unitless, and using units in their place is an error. For example, an acosr (i.e. inverse cosine) function would take a (unitless) float32 and give back a float32<radian>. It should, and would, be an error to pass in a float32<radian> as the parameter.

This also means units of measure aren’t a subtype: much of their advantage comes from the fact that you can’t substitute a plain float32 and a float32<whatever> without using an explicit cast. They’re more like an easy way to clone numeric types.

Units of Measure: Summary

I really like units of measure. I’ve used types for similar purpose in Haskell in the past, but the way that units of measure effortlessly wrap the existing numerical types make it very easy in F#. If you are doing anything with any arithmetic, not just geometry, units of measure can spot a lot of bugs and provide a lot of automatic safety. Units of measure really come into their own because they track units through arithmetic expressions. Units of measure are not just about avoiding confusing feet with metres; it’s more about not accidentally adding an office space rent cost per day (pounds/metre^2/day) to a cost per day (pounds/day) without remembering to multiply the first one by the desired floor area. Think how much easier your secondary school maths/physics (and later calculus) would have been if you had had a automated system like this, checking the units in all your equations.

One final note: this post is more computing than education, but you may be interested in this previous post (and its comments) on the role of types in programming education.

Thoughts on F Sharp

F# is an attempt to bring a functional programming language (based on ML) to the .NET framework. It seems to have Microsoft blessing, with support out of the box in Visual Studio 2015 (now free for personal use). I thought I’d give some initial impressions of the language, mainly coming from a Haskell perspective. I am happy to be corrected, as I’m still quite new to the language.

I quite like F#, but the main impression I get is that whatever I want to do, there’s two or more separate ways to do it, and no obvious way to choose between them. F# is, very clearly, the result of stitching together two programming systems: the Object-Oriented Programming [OOP] of .NET and the Functional Programming [FP] of the ML language. Much like Java post-generics (with its “int” and “Integer”) or Windows 8 with its tablets-meets-desktop, systems like this often have quite glaring joins. And especially if you come from neither a .NET nor an ML background, you are just left a bit bamboozled as to which you should be using. Where there is a reasonable set of rules to choose between int and Integer in Java, in F# it often just seems to come down to preference or convention. This multiplicity of approaches is especially evident in the type system.

Types

Let’s start with lists. You can specify a list’s type in F# using the postfix syntax “int list” or the prefix syntax “List<int>”. Both mean the same thing and there’s no obvious reason to choose one or the other (this StackOverflow answer mentions a suggestion to use prefix for all except four types, but with no clear justification). Lists have some properties which can be accessed directly, like “myList.Head”. But there’s also a function you can call using “List.head myList” for the same purpose. The property is probably an OOP thing, the function is FP, but as a programmer who just wants to write some code, what’s the difference? The dividing line seems to vary depending on the origin of what you’re using: a C# class is going to have lots of member functions, an F# library will have lots of external functions, and your code will seemingly always end up with a mess of the two. When writing your own code, it’s often not clear which way you should lean, either.

The Class System

F# allows you to write classes which override other .NET classes, which makes for a potentially great marriage of functional programming with existing [OOP] .NET libraries. In the classes you define in F#, there’s multiple ways to define fields:

type MyClass2 =
  let readOnlyPrivateValue = 1
  let mutable readWritePrivateValue = 2
  [<DefaultValue>] val mutable private readWritePrivateValueNoInit : int
  [<DefaultValue>] val mutable readWritePublicValueNoInit : int
  member this.ReadOnlyPublicValue = 5
  member private this.ReadOnlyPrivateValue = 6
  member val ReadWritePublicValue = 7 with get, set
  member val private ReadWritePrivateValue = 8 with get, set

This StackOverflow answer suggests when to use each, but I was still left puzzled. I’m slowly getting the hang of it: let when you have an initial value, val when you don’t, mutable when you want to change it, but I’m not much wiser as to when to use member vs let/val — are they semantically equivalent if I’m not customising the get/set? For example, I found this code in another StackOverflow answer which does a “memberwise clone”. Will that clone my “let mutable” fields or just my “member” fields?

To define a member method you use a similar member syntax, but with brackets for any parameters. You also need an object reference, which you can see is also required for some field declarations but not others. It took me a while to understand the syntax:

type MyClass() =
  member MyClass.GetHundredA() = 100
  member x.GetHundredB() = 100
  member y.GetHundredC() = 100
  member whateveryoulikeiguess.GetHundredD() = 100

I thought the part before the dot was some reference to the class I was making the method for, hence I ended up using the class name (like the first member, above) because that made sense to me, and it compiled fine (not sure it should!). I saw people using “x” instead and thought it was a keyword. It turns out the first part, before the dot, is a name declaration for the “this” concept. So in the last method above, “whateveryoulikeiguess” defines a name to refer to what you would use “this” for in Java or C++. I’m not sure yet why they didn’t just swallow a keyword and always use “this”, and it still strikes me as a pretty weird syntax.

Data kinds

There’s many ways to define a data structure with a couple of members. If you want an int and string value pair, you could use:

type aPair = int * string
type anADT = Combo of int * string
type aRecord = {theInt : int; theString : string}
type aClass(i : int, s : string) =
  member this.TheInt = i
  member this.TheString =s
type aStruct =
   struct 
      val theInt : int
      val theString: string
   end 

Quite the variety! Sometimes there’s a good reason to choose one or the other, but quite often you just stick your finger in the air and guess. (I tend towards the second one because it’s what I’d use in Haskell, but I suspect the third is the best choice.) Suddenly Java with its everything-is-a-class philosophy seems sensibly restrained.

Ignorance is Dangerous Bliss

F# has a feature shared with Haskell where it tries to warn you if you are discarding a return value. Let’s say you have some function to write a file, which takes a path, some content and returns a boolean for success:

// F#:
let writeFile (path : string) (content : string) : bool = ...
-- Haskell:
writeFile :: String -> String -> IO Bool

If you call this function without assigning the return value or otherwise using it, F# and Haskell will both give you a warning (I think Haskell may require a compiler flag to turn the warning on) that the return value is discarded:

// F#, gives warning:
writeFile "C:/log.txt" "Bad stuff happened" 
-- Haskell, gives warning:
writeFile "C:/log.txt" "Bad stuff happened" 

You can suppress this using the ignore/void functions respectively:

// F#, no warning:
ignore (writeFile "C:/log.txt" "Bad stuff happened")
-- Haskell, no warning:
void (writeFile "C:/log.txt" "Bad stuff happened")

But if you mess up and miss out a parameter, you get quite different results in each language:

// F#, missed parameter: compiles fine, no warning:
ignore (writeFile "C:/log.txt")
-- Haskell, missed parameter: compiler type error:
void (writeFile "C:/log.txt") 

I’ll avoid an in-depth discussion of type systems here, but Haskell’s type system (in particular, monads) saves you here from a mistake, while F# happily ignores the return value, which is a partially applied function that does nothing. Lesson: use ignore with great care!

Misc Notes

A few more brief notes:

  • I know this one comes from ML, but whoever came up with expressing a triple type as “int * string * float” needs to be put back in the box with all the other mathematicians. Haskell’s equivalent syntax “(Int, String, Float)” is much nicer and easier to remember.
  • I got caught out at one point by bracketing pattern matches. I had a type “type Position = Position (int * int)”. I wanted to pull the values out using:

    let Position (x, y) = calculatePos()
    printfn "Position: %d, %d" x y
    // Error above: x and y not found

    I’m not sure what F# thinks that code is doing, but it doesn’t give a compile error on the let binding. Turns out you need more brackets to achieve what I was aiming for:

    let (Position (x, y)) = calculatePos()
    printfn "Position: %d, %d" x y
  • I remember, many years back, when I learnt Java after using C++, it was very refreshing to not worry about declaration locations. No header files, no needing to have forward declarations of classes before using them — you just put each type in its own file and off you go. Haskell cannot have cycles in its module dependencies; if module A imports B then B cannot import A, even indirectly — but within each module, declaration order doesn’t matter. Thus it felt like a bit of a backwards step in F# to start worrying about declaration order within a file; you can’t use any function or value before you have declared it in the file.
  • So far, I find writing F# unexpectedly similar to writing modern Java. A lot of the streams or lambda code I’d now write in Java 8 is very similar to what I find myself writing in F# with Seq and fun. If Java were to add algebraic data types (aka discriminated unions) with pattern matching, and syntactic sugar for currying, the core of two languages don’t seem like they would be that far apart. I guess both are part of the recent convergence/smashing together of OOP and FP, but coming at the problem from different sides.

The Good

I’ve mainly talked about the bad and the ugly instead of the good here, but I do actually like F#! There’s several aspects that improve on Haskell. The convention to use the |> operator is good, meaning you write “source |> operation1 |> operation2 …” which I find more readable than Haskell’s conventional “operation2 . operation1 $ source”. Similarly, I prefer <<, or more often >>, as function composition rather than “.”. Records are nice and easy to use; if you know Haskell, you’ll understand what a relief writing myRecord.myField is. Similarly, the ability to stick printf/trace statements anywhere in your code is very welcome.

I’d still take Haskell over F# as a functional language given a free choice (for its purity and type system), but as a way to bridge the gap between FP and OOP (with a recognised VM and thus lots of libraries behind it), F# is a good language. I’ve complained a lot about the joins in the FP/.NET aspects here, but with most of them I can see the language designers’ quandaries which led to the compromise. I suspect you can’t get much better than this when trying to bolt FP on to an OOP framework and still retain easy interoperability. I should look at Scala again at some point as a comparison, but at least in .NET land, F# seems like a good choice of language, if you can just find a set of conventions that keep your code nice and tidy when you so often have multiple approaches to choose from.

Flash Boys

Flash Boys is a book by Michael Lewis about high frequency trading and exploits of the [US] stock market. Lewis writes great books, and seems to have a knack for finding the interesting people to write about in any given scenario. I’d recommend all the other books of his I’ve read, especially Moneyball and The Big Short. He usually writes about finance, but Flash Boys is about the intersection of finance and computing, which is why I’m highlighting it here. From my point of view, Flash Boys is about what happens when a human system becomes an algorithmic computer system.

Way back when, the stock market was a totally human process, with clients instructing stockbrokers to buy or sell, and deals carried out between people on the trading floor. Gradually, technology was used to support this process, until now the transactions are carried out purely electronically, with stockbroker machines passing orders to the stock exchange machine, which puts together buyers and sellers into transactions. This has various consequences, which are covered in Flash Boys, which I wanted to highlight here.

Too Fast For Humans

The market now moves so fast that there is no opportunity for microscopic human oversight. Machines are programmed to execute client orders, or to use an algorithm to try to make profit in the market, but with microseconds (or less) per trade, even the machine’s controllers are at the mercy of its actions. Knight Capital famously lost 440 million dollars in 45 minutes due to computer error, and that could be seen as a generously long time over which the error occurred. With little opportunity to correct matters after the fact (unless you have sufficient influence to get the stock exchange to rollback your mistake), it is increasingly important for your machines and algorithms to be correct.

Having said that, one interesting detail seemed to be that on a microscopic level, the market is actually very inactive. As I understand, the machines don’t generally sit there trading continuously in stocks against each other. The market settles to an equilibrium, and it’s only when new information is received that it makes sense to trade. Thus the activity is still driven by human input: when a buyer wants to buy, probably because a person somewhere has issued an order, the trading activity kicks in before (see speed, below) and after the human-triggered order, then stops.

Speed Matters…

One of the themes in Flash Boys is how being the fastest earns you money. It is less about being fast to access a single stock exchange, but more about being fastest between two sources of information. If you know a large order for, say, Apple shares is incoming from a buyer, but you can get to the stock exchange before them, you can buy some Apple shares at the current price and then sell them on to the buyer for slightly more. Similarly, if you see the price for a stock increase in New York, and you can get that information to Chicago fast enough, you can again buy in Chicago, and thus buy shares which are almost immediately worth more than you paid for them. There are tales in Flash Boys of building the straightest fibre optic link possible (time being distance over speed, and speed being limited by the speed of light, less distance is key) and moving around machine rooms to be closest to the exit. Lewis characterises these practices as a tax on the real buyers and sellers in the market: the high frequency traders who perform this racing between exchanges are making money at the direct expense of those slower than them in the market.

…But Doesn’t Have To

For many technological dodges, there are technological solutions. A group that Lewis focuses on try to build an exchange immune to many of these practices by increasing the latency to their exchange. By slowing everyone’s access down, they can eliminate the advantages gained by spotting price differences between that exchange and others.

Some elements of trading really are an information war. Certain regulations, or just certain programmed behaviours, mean that a small order for a share being placed in Chicago is likely to be a sign of a subsequent order arriving in New York (if you want to buy lots of one share, you may have to visit many exchanges to find a buyer). Deterministic behaviours offer opportunities for exploitation by other players in the market; but also offer guarantees of reproducible behaviour.

A Black Box

One running theme is how little many of the players in the stock market seemed to know about the implications of the electronic system underlying it. Some stockbrokers and investors were having their orders exploited by high frequency traders for months or years before they figured out what was going on. There was a complete lack of human oversight of how the system as a whole worked, partly due to regulations that prevented certain information being public, and partly because there was a lack of technical understanding of how the system worked. Lewis describes how the technical staff went from being subservient to the brokers, to being the ones highly sought after to provide a market advantage.

The Trial

This theme of technical ignorance permeates into the trial of one programmer, Sergey Aleynikov, accused of stealing program code. Whether or not you consider him guilty, it is troubling how little the investigator, prosecutor and expert witnesses seemed to know about relatively basic technical concepts. The investigator apparently only arrested the programmer on the suggestion/order of Goldman Sachs. The part where the investigator found it suspicious that the programmer had put the code in a subversion repository is a face palm moment. (If, like me, you have your technical hat on by default and don’t immediately see the problem, recall the dictionary definition of subversion to see the investigator’s view. Interestingly, even a search for define:subversion on google won’t offer the dictionary definition, and will only show you links to download Subversion.) The trial was still rumbling on in the past few months, with Aleynikov convicted a second time (having been convicted and acquitted once already), and then acquitted a second time.

Summary

Wikipedia points out several people claim that Lewis’s book gives an inaccurate and overly-negative view on high frequency trading; I don’t know enough details to judge. But as a book looking at what happens when a human system becomes a machine system, I found it fascinating. A recommended read if you want to consider the effects of computerisation on society. It’s also interesting to wonder what would be different in the book if all the people featured had had some computing education; what would have turned out differently? The trial might have reached a different verdict, the people may have sussed the stock exchange problems a little earlier, the stock trading rules may have been written slightly differently.