The ugliest beautiful line of code

Some program code is considered beautiful, often because it accomplishes a great deal just by combining a few simple primitives. One such line in C is the loop that copies one string (const char* src) into another (char* dest), using only programming language primitives:

while (*dest++ = *src++);

(For those who don’t know C very well, “*src++” reads the value that src points to, then increments the src pointer. “*dest++ =” stores that value at the location dest points to, then increments the dest pointer. The result of the expression is the copied value, and as long as this value is non-zero, the loop continues. Because C strings end in a zero, usually called a null terminator, this loop keeps copying from src to dest until the zero has been copied.)
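To make that explanation concrete, here is a rough sketch of the same loop with each step written on its own line. The name copy_string is my own invention for illustration, not a standard function, and like the one-liner it simply assumes that dest is big enough and does not overlap src.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the one-line loop, spelled out step by step.
 * Assumes dest has room for src plus its terminator, and that the
 * two buffers do not overlap. */
static void copy_string(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);

    for (;;) {
        char c = *src;      /* read the character src points to */
        src++;              /* advance src */
        *dest = c;          /* store it where dest points */
        dest++;             /* advance dest */
        if (c == '\0') {    /* the terminator has just been copied */
            break;
        }
    }
}
```

Calling copy_string(buf, "hello") on a char buf[8] leaves the five letters in buf, followed by the terminating zero.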

The code does have an elegance to it. But the more I think about it, the more I realise that this one line manages to contain many of the flaws in the C language:

  • The expressions have both a prefix and a postfix operator on the same item, which makes precedence very hard to guess. Is that (*dest)++ or *(dest++)? Both are reasonable guesses, but you can’t judge it easily from the syntax. (It is *(dest++): postfix operators bind more tightly than prefix ones. See also “const char * p”: is that “const (char * p)” or “(const char) * p”? It is the latter, a pointer to const char.)
  • The loop body is a single semi-colon. This can cause problems for beginners, who instinctively put a semi-colon at the end of every line. For example, what does this code do?
    int n = 0;
    while (*dest++);
    {
        n++;
    }
    
  • The loop condition has a side effect, which is why no body is needed. In general, side effects in loop conditions and if conditions are a source of bugs: they make refactoring trickier, and a side effect is easier to overlook in a condition than in ordinary code. Not to mention that assignments in conditions are easily confused with equality checks. What do these two loops do: “while (*dest++ = 0);” vs “while (*dest++ == 0);”? (The first stores a single zero and stops at once, because the assigned value is zero. The second writes nothing and scans forward past zero bytes until it reads a non-zero one.)
  • The condition uses C’s ability to have any scalar value (numbers and pointers alike) as a condition, with non-zero meaning true. This is another source of bugs, as you won’t notice if you forget to add “== 0” or “< n” on the end of a condition.
  • Perhaps most crucially, this code works because C’s strings are null-terminated. Unless you can guarantee that src reaches its null terminator within the size of the dest array (not src!), this is a buffer overflow that will write beyond the end of the dest buffer and trash whatever follows it in memory (your stack, if dest is a local array) until src happens to finish. Oh, and if dest points into the src string somewhere after its start, the copy overwrites the terminator before src ever reaches it, so it’s an overflowing infinite loop.
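For contrast, here is a minimal sketch of a more verbose copy that avoids several of the flaws listed above: the condition has no side effects, the comparisons are explicit, and the loop never writes past the end of dest. The name copy_bounded and its true/false return convention are assumptions of mine for illustration, not a standard function, and it still assumes the two buffers do not overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dest, writing at most dest_size
 * bytes including the terminator. Returns true if all of src fitted,
 * false if it had to be truncated. Assumes the buffers do not overlap. */
static bool copy_bounded(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL);

    if (dest_size == 0) {
        return false;               /* no room even for the terminator */
    }

    size_t i = 0;
    while (i < dest_size - 1 && src[i] != '\0') {
        dest[i] = src[i];           /* the only side effects are in the body */
        i++;
    }
    dest[i] = '\0';                 /* always terminate the result */

    return src[i] == '\0';          /* true only if we reached the end of src */
}
```

It is several times longer than the one-liner, which is rather the point: the bounds check and the termination are visible instead of implied.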

So what lesson should be taken from this, apart from the fact that C is a dangerous language? I wonder if we should beware any such terse, elegant code. I’ve written some wonderfully brief code in Haskell before, and usually had to rewrite it so that it is easier to follow the next time I look at it. A lot of people mock Java’s verbosity, but I at least find it reasonably rare that I have to deal with Java code that is too clever for its own good.

One response to “The ugliest beautiful line of code”

  1. I wonder if the problem with the cleverness is that our languages can’t always express the cleverness in an understandable way.
