Binary is an Implementation Detail

Binary is a topic that people strongly associate with computing. After all, computing is the only area where binary is actually of any use: it’s ours, and ours alone. Binary is often one of the very first topics we teach students. But is it actually important? I would argue that binary is not very important at all for the computer science/software end of the subject (as opposed to the electronics/hardware end), and that we should probably not bother to teach binary to computing beginners.

My post is in part triggered by some of the discussions surrounding the teaching of Computing in Key Stage 3 and 4 in the UK (i.e. ages 11-16), but I want to make it clear that I am making an academic (see the sidebar!) argument about the discipline, not attacking particular specifications or syllabuses, which are in general moving in the right direction in the UK (hurrah!). I also realise that teachers must teach what the syllabus requires, but this will hopefully be food for thought.

The Arguments For Binary

Below is a list of arguments I’ve heard for why binary is important and fundamental enough in computing to require teaching early on, along with my counter-arguments to each. See if you are convinced!

It’s important to know that all data is represented as binary numbers. I agree that it is important to realise that computing is ultimately all numbers. Understanding that text is represented as numbers is useful, and so is the notion that pointers and references are ultimately stored as some form of number. But the fact that these underlying numbers are binary is an afterthought: if the numbers were stored as denary (aka decimal) then it wouldn’t matter.
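As a small illustration of text being numbers, here is a sketch in C that turns a string of digits into an integer using only arithmetic on character codes. The helper names `digit_value` and `parse_digits` are made up for this example, and it assumes the input contains digits only. Nothing in it depends on how the numbers are stored underneath.

```c
/* Characters are stored as small integers. The C standard guarantees that
   the codes for '0'..'9' are consecutive, so subtracting '0' gives the
   digit's value, whatever base the hardware uses to store it. */
int digit_value(char c) {
    return c - '0';
}

/* Hypothetical helper: parse a string of decimal digits using only
   arithmetic on character codes. Assumes s contains digits only and
   that the result fits in an int. */
int parse_digits(const char *s) {
    int total = 0;
    for (; *s != '\0'; s++) {
        total = total * 10 + digit_value(*s);
    }
    return total;
}
```

Calling `parse_digits("2024")` yields the integer 2024, and the reasoning is entirely in denary.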

You need to know that data is limited by the number of bits available. Most languages use limited-size integers as the common numeric type, and it is important to know that the storage range is limited. But that again doesn’t require knowing binary: the limits -128 to 127 are a bit odd without knowing binary, but the principle of limited storage is the same as if the limits were -99 to +99.
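The idea of a limited range can be shown without mentioning bits at all. The sketch below uses the limits that C publishes in `<limits.h>`; the function name `fits_in_signed_char` is an invention for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* True if v can be stored in a signed char on this platform.
   SCHAR_MIN and SCHAR_MAX come from <limits.h>; with the usual 8-bit
   char they are -128 and 127, but the check works whatever they are. */
bool fits_in_signed_char(long v) {
    return v >= SCHAR_MIN && v <= SCHAR_MAX;
}
```

The check would read exactly the same if the limits happened to be -99 and +99.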

You need to understand what happens if your calculation overflows. Some would argue that it’s important to know what happens if you add 1 to the maximum integer (e.g. add 1 to a signed byte holding the value 127), for which you would need to know binary. Let’s take C as an example: C is generally recognised as being about the lowest level language above assembler, so if any language cares about these sorts of overflows, you’d think it would be C. So consider this snippet of C:

signed char x = 127;
x = x + 1;

Assume that the char type is 8 bits. What is the value of x after this statement? Well, the C specification is, in a sense, quite clear on this: it’s not telling. Strictly speaking, x is promoted to int before the addition, so it is the conversion of 128 back to signed char that is out of range, and the result of that conversion is implementation-defined. Do the same with an int holding INT_MAX, though, and the answer is: anything. Not only is the result of the calculation undefined, your entire program’s behaviour is undefined: the program is allowed to exit because of the overflow. (For more information on why this is actually useful for optimisation, see here.) To be fair, Java actually does define what happens (a byte wraps round to -128), but I think ultimately C has the right idea: if your program is overflowing an integer, it’s doing something wrong. So the only reason to know what happens is to help diagnose a bug in your program. And even then, you can understand that it wraps round from the upper limit to the lower limit without knowing binary (same as if 99 + 1 wrapped to -99).
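If overflow is a bug to be caught rather than a behaviour to rely on, one option is to check the range before adding. The following is a minimal sketch of that idea for plain `int`; `checked_add` is a hypothetical helper, not a standard library function, and it reasons only about the upper and lower limits.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b, storing the sum in *out and returning true, unless the
   sum would fall outside the range of int, in which case it returns
   false and leaves *out untouched. The range check happens before the
   addition, so the overflowing addition is never actually performed. */
bool checked_add(int a, int b, int *out) {
    if (b > 0 && a > INT_MAX - b) return false;  /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false;  /* would go below INT_MIN */
    *out = a + b;
    return true;
}
```

A caller can treat a `false` return as the signal that something has gone wrong, which is all a beginner needs to know about the situation.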

Binary helps explain bytes. Bytes are used throughout computing, in particular when you need to operate with files (which are a long list of bytes) or exchange network communications (also a long list of bytes). At this point, it becomes useful to understand the concept of bytes. For working with existing binary file specs (e.g. reading in JPEG) you need to know some binary (and probably bit-twiddling — see further down), and by all means teach it for this purpose. But interacting with existing binary file specs is generally an advanced topic, not something you should do as an early topic. If all you need to do is write text or numbers to a file then you still don’t need to know binary, you just need an API that can write integers or strings to files rather than byte buffers — such APIs are available in pretty much any language above C.
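Even C’s standard I/O library has formatted functions that read and write text and numbers without the caller touching a byte buffer. Here is a rough sketch; the record layout (a name followed by a score) and the function names are assumptions made for the example, and names are assumed to contain no spaces.

```c
#include <stdio.h>

/* Writes one "name score" record as a line of text.
   Returns 1 on success, 0 on failure. */
int write_record(FILE *f, const char *name, int score) {
    return fprintf(f, "%s %d\n", name, score) > 0;
}

/* Reads one record back. name must have room for 32 chars
   (31 characters plus the terminator). Returns 1 if both fields were read. */
int read_record(FILE *f, char name[32], int *score) {
    return fscanf(f, "%31s %d", name, score) == 2;
}
```

The caller deals in strings and integers throughout; how they end up as bytes on disk is left to the library.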

You can explain logical operations. Binary allows various logic operations: AND, OR, XOR, NOT. You can’t explain what these do without binary, if you’re using them as bitwise operations on bytes. Obviously, the notions of AND, OR and NOT are useful in computing, but on booleans. You can understand the logic operations on booleans without knowing any binary, and operating on booleans is the major application of these operations. I can’t think of any applications (besides things like bit-twiddling; see the next paragraph) where using these operations on integers in a high-level language is necessary. Searching a handy large Java code-base for bitwise AND reveals a few uses for bit-twiddling when interacting with older APIs that use bit-masks, and one use that should use modulo arithmetic instead.
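To make the distinction concrete, here is a sketch of AND, OR and NOT used purely on booleans, alongside a guess at the kind of case where modulo reads better than a bitwise AND. Both functions and the entry rule itself are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical rule: a visitor may enter with a ticket or a staff badge,
   but never if banned. Only logical operators on booleans are used. */
bool may_enter(bool has_ticket, bool is_staff, bool is_banned) {
    return (has_ticket || is_staff) && !is_banned;
}

/* Testing for odd numbers with modulo rather than a bitwise AND.
   Comparing against 0 keeps it correct for negative n, where n % 2 is -1. */
bool is_odd(int n) {
    return n % 2 != 0;
}
```

Neither function requires the reader to picture the bits of any value.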

Bit twiddling. You can bit-twiddle using bitmasks and so on. While this can be a fun topic, it’s ultimately bad programming practice and is mostly gone from high-level programming. It’s useful to know eventually in case you need to deal with lower-level APIs, but it’s not useful for beginners.
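One common alternative to a bitmask is a record with one named boolean per option. The sketch below assumes a made-up set of file-opening options; the struct, its fields and the validity rule are all illustrative.

```c
#include <stdbool.h>

/* Hypothetical file-opening options, one named boolean per option
   instead of one bit per option in an integer mask. */
struct open_options {
    bool read;
    bool write;
    bool append;
};

/* An option set is usable if it asks for at least one kind of access,
   and appending only makes sense when writing. */
bool options_valid(struct open_options o) {
    if (o.append && !o.write) return false;
    return o.read || o.write;
}
```

Each option is read and set by name, so there are no masks to get wrong.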

Shift operations and optimisations. Some bit-based operations were historically faster than their arithmetic equivalents: shift-right was a fast division by powers of two (with AND as the fast modulo), and XOR could be used for fast swaps and fast zeroing. Nowadays, modern compilers perform these optimisations for you, so all you are doing in a high-level language is obfuscating the meaning of the code: again, bad practice.
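For comparison, here are the plain arithmetic forms next to the historical tricks. This is only a sketch with illustrative function names; for unsigned values the two forms give the same results, so the choice between them is about readability.

```c
/* Plain arithmetic: states the intent directly. */
unsigned divide_by_8(unsigned n)    { return n / 8; }
unsigned remainder_by_8(unsigned n) { return n % 8; }

/* The historical "fast" versions, shown only for comparison. */
unsigned divide_by_8_shift(unsigned n)   { return n >> 3; }
unsigned remainder_by_8_mask(unsigned n) { return n & 7u; }

/* Swap with a temporary rather than the XOR trick. */
void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

A reader can see at a glance what `n / 8` means; `n >> 3` needs a moment of decoding.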

It helps you understand the processor. Understanding a modern CPU is a lost cause (a subject that could fill another blog post), and I think this shows that binary fits in electronics and hardware courses (building adder circuits and so on), not as an early topic in mainstream computing. Telescopes, and all that (which Wikiquote claims is not actually a Dijkstra quote…).

It helps you understand networking. If you are studying networking down to the level of wire protocols and checksums then understanding binary is very important. But courses have to be quite specialised before they descend to this level: modern computing courses on networking focus more on TCP/IP and protocols at the higher levels of the network stack than on the lowest levels such as Ethernet.

It helps you understand floating point. Understanding the dangers of floating point is important. For example, it is important to know not to compare two floating-point numbers for equality, but that doesn’t require binary, simply an understanding that floating-point numbers have limited storage and are thus imprecise. Understanding why 0.1 cannot be stored exactly (it is a recurring fraction in binary) is something that does benefit from knowing binary. But it’s something that I believe is quite hard for students to understand even then: you don’t just need to understand integer binary, you need to understand binary digits beyond the decimal (binary!) point. I think I’d happily gloss over that issue rather than properly and fully explain it to beginners.
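The practical lesson, comparing within a tolerance instead of testing for equality, can be sketched as follows. The function `nearly_equal` is a hypothetical helper, and the right tolerance depends on the size of the numbers involved; a fixed absolute tolerance is an assumption made to keep the example short.

```c
#include <stdbool.h>

/* True if a and b differ by no more than tolerance. The absolute value
   is computed by hand to keep the sketch free of <math.h>. */
bool nearly_equal(double a, double b, double tolerance) {
    double diff = a - b;
    if (diff < 0) diff = -diff;
    return diff <= tolerance;
}
```

For instance, adding 0.1 ten times and comparing the total to 1.0 with `nearly_equal(total, 1.0, 1e-9)` behaves as a beginner would expect, with no need to explain where the tiny error came from.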

Summary

In summary, I believe that binary has its uses in some places in computing, but in general its worth in modern computing is vastly over-rated, and it does not need to be introduced much before university. Binary is not a fundamental topic that underpins computing: for the most part, it’s an implementation detail.

2 responses to “Binary is an Implementation Detail”

  1. Rob St. Amant

    See if you are convinced!

    Excellent. I’m convinced. Eventually computer science students will need to know binary, but I agree that early on it’s not as important as other topics.

  2. “it is important to realise that computing is ultimately all numbers”

    Well, provided you do not mean it *literally*. Ultimately, computing is a physical, mechanical process. There are no more numbers “in” a machine than there are letters, colors, or logical values or whatnot; there is no data, no information. The so-called theory of information is not about information but about physical states; about *forms*. When we write down an operation like “false or true” and get the right result, all happens *as if* a meaningful operation had been performed. The machine is so designed that there is an *analogy* between its functioning and operations, information, meaning which all are in our minds. Just like with an abacus [https://en.wikipedia.org/wiki/Abacus]. Computing is all *metaphor*.

    As long as we propagate such metaphors as if they were literal truths, mindlessly I’d say, people including ourselves can go on believing mad ideas about machines… and ourselves. Like the ones that drive the ideologies underlying so-called AI or the cognitive sciences, in particular computationalism. There is no information in a machine, still less information processing or conception: it is all interpretation by programmers and users, in our “heads”. A computer is (like) a super abacus.
    “it is important to be aware computing is ultimately all metaphor”

    Otherwise, yes, binary is not that important. Rather, it will be truly unimportant the day we replace floats with decimal fractions.

    Denis
