John Sweller on Cognitive Load Theory and Computer Science Education

The opening keynote at SIGCSE 2016 was given by John Sweller, who invented cognitive load theory and works on issues such as the relation between human memory and effective instructional design. The keynote was purportedly about “Cognitive Load Theory and Computer Science Education”. As one or two people pointed out, the “Computer Science Education” part was a bit lacking, which I’ll come back to. (I should also point out that my recollection is a bit hazy, as there was a full conference day before I had a chance to write this; I’m happy to take corrections in the comments.)

Knowledge and Memory

Sweller started by discussing biologically primary versus biologically secondary knowledge. Biologically primary knowledge is that which we have evolved to acquire: how to walk, how to talk, how to recognise faces, etc. None of this is taught; it is all learnt automatically. In contrast, biologically secondary knowledge is the rest, which he argues must be explicitly taught. Algebra, history, science: his rough summary was that everything that is taught in school is biologically secondary.

I won’t go deeply into working memory and long-term memory here, but the relevant parts were that working memory is small and transient (i.e. its contents disappear quickly), whereas long-term memory is almost unlimited and can retain knowledge permanently. Novices can get overwhelmed because everything novel must live in working memory, whereas experts have knowledge stored in long-term memory and so use different processes to tackle the same tasks. One thing I did not know before was that we have two working memories: one auditory and one visual, which means you can potentially achieve benefits by presenting information in a mixed modality, provided the two modes mesh well together.

Visualisations

One issue that came up in the questions was about algorithm visualisation. Algorithm visualisation is something that many people are convinced is useful for understanding programming, but which has rarely, if ever, been shown to be effective for learning. Sweller’s suggestion was that if comparing two states (before/after) is important and non-trivial then it is better to provide two static diagrams for careful comparison, rather than a video which animates the transition from one to the other. My take-away message from this is that visualisations need to be comics, not cartoons.

Experts Use Memory More Than Reasoning

Sweller made the point that experts tend to operate through pattern-matching. Although we may think of programming as carefully constructing logical code paths to fit a given problem, we are often just recognising a problem as similar to one we have solved before, and adjusting our template to fit. More expert programmers just know more patterns (and “patterns” here matches well to the idea of design patterns). The difficult part of programming thus arises only when we stray outside our pattern catalogue. This was covered in some work from the 1980s which I’ve previously discussed here (and see also this post).
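To make the “adjusting our template” idea concrete, here is a minimal sketch in Python. It is an illustration of the general idea, not an example from the talk: the function names and the choice of the familiar “accumulator” pattern are assumptions made purely for this sketch.

```python
# Illustrative only: one remembered template (the accumulator loop)
# and a second problem solved by adjusting that template.

def total(values):
    result = 0                 # 1. start from a neutral value
    for v in values:           # 2. visit each item in turn
        result = result + v    # 3. fold the item into the running result
    return result

def largest(values):
    # Same three-step shape; only the starting value and the
    # combining step change. Assumes values is non-empty.
    result = values[0]
    for v in values[1:]:
        if v > result:
            result = v
    return result
```

Someone who already holds the first function as a pattern arguably only has to work out the two changed lines of the second, rather than reasoning about the whole loop from scratch.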

What May Make Programming Special

The issue of how knowledge is transmitted was contentious. Sweller is against constructivism: he believes the idea that knowledge is best gained through discovery is incorrect, and explicit instruction is always superior. This is where the Computer Science domain becomes important. I can see that for something like algebra, you must be taught the rules, and taught the processes by which to rearrange an equation. You can’t just mess around with equations and hope to learn how they work — because you have no feedback.

But the computer is an interesting beast. It provides an environment which gives you feedback. If you know the basics of how to write a program, you can potentially learn the semantics of a language solely by exploring and discovering. Can is not the same as should, though: explicit instruction may still be faster and more effective. But I think it went unacknowledged in the talk that programming is somewhat different to many other learning experiences, because you have an environment that offers precise, immediate, automatic feedback based on your actions, even if no other humans are involved.
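As a small sketch of what this kind of exploration might look like, consider a learner who knows basic Python syntax and wants to find out whether assigning a list copies it. The experiment below is hypothetical (the function names are made up for illustration), but the feedback it relies on comes entirely from running the code.

```python
# Hypothetical exploration: probing what list assignment means,
# using nothing but the feedback from running the program.

def aliasing_experiment():
    a = [1, 2, 3]
    b = a          # Question: does this copy the list, or share it?
    b.append(4)
    return a       # If a changed too, then a and b are the same list.

def copy_experiment():
    a = [1, 2, 3]
    b = list(a)    # Second guess: does list() make an independent copy?
    b.append(4)
    return a       # If a is unchanged, the copy was independent.

print(aliasing_experiment())  # prints [1, 2, 3, 4]
print(copy_experiment())      # prints [1, 2, 3]
```

The learner gets a precise answer within seconds and without a teacher, although nothing here tells them why the language behaves this way, which is where explicit instruction may still help.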

Final Notes

There’s a bunch more content that I haven’t included here for space reasons or because I didn’t remember it clearly enough (didn’t reach long-term memory, I guess!). Terms to google if you want to know more: cognitive load theory, randomness as genesis principle, expertise reversal effect, worked-example effect, split-attention effect.

I liked Sweller’s talk, and I believe that understanding what he called human cognitive architecture is important for education. I think the main issue with the talk is that Sweller is a psychologist, not a computer scientist, and thus there was little consideration given to the potential ways in which computer science may be different. How does having syntax and semantics in code play with working memory for novices? What should explicit instruction look like in programming education? Do different notional machines cause different interactions with our memory? How can we best organise our teaching, and our design of programming languages and tools, to minimise cognitive load? Some answers to these questions arise in Briana Morrison’s work (including the paper later in the day yesterday, not yet listed there), but there is clearly a lot more potential work that could be done in this area.

4 thoughts on “John Sweller on Cognitive Load Theory and Computer Science Education”

  1. Neil, thanks for the link to my work. What I can say at this point is that empirical evidence does show that some of the well-known principles (effects) of cognitive load theory that have been demonstrated in other STEM disciplines don’t necessarily replicate exactly in computer science (intro programming). My argument is that CS IS different, but we don’t yet know all the ways it’s different or why it’s different. My hypothesis is that the cognitive load imposed on a learner in the beginning phase of learning programming is so high that all the effects don’t necessarily hold. And until a learner creates a schema in long-term memory for a specific programming construct, it doesn’t do them much good to move on to another construct. This supports Anthony Robins’ Learning Edge Momentum Theory – which I hope will be the final chapter of my dissertation.

  2. One group that I think *may* disagree with Sweller are instructors who use POGIL (process oriented guided inquiry learning). In contrast to POGIL, Sweller supported the “worked example” concept, where a question is posed and then followed by a complete answer walkthrough. In giving an answer, an instructor is arguably imposing some particular way of seeing the problem, which will work for some learners and not for others.

    The important letter in POGIL is the “G” and the question of how much guidance is given. If I understand correctly, strict POGIL adherents should never give an “OK, here’s how you do it” answer, but should instead set further inquiry-based tasks to provide clarification.

    I don’t use POGIL, but I believe it was developed in Chemistry, which in some ways is a field closer to algebra than CS, and POGIL studies have shown greater retention of concepts.

    Finally, I think Sweller’s support of worked examples is orthogonal to his points on cognitive load: both inquiry learning and worked examples should reflect an understanding of cognitive load.

    p.s. I found it ironic that a presenter who spoke in detail about using different kinds of media in instruction used nothing but text slides in his talk.
