Me Code Pretty Some Day

A voice in the wilderness

The Problem

1949: As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. It was on one of my journeys between the EDSAC room and the punching equipment that [...] the realization came over me with full force that a good part of the remainder of my life was going to be spent finding errors in my own programs.
—Maurice Wilkes
1959: The Supply Department [...] has been constructing a computer simulation of [the] supply system. This effort has been only partially successful. It is now recommended that work on this project be discontinued.
—Alan McDougall
1975: The tar pit of software engineering will continue to be sticky for a long time to come.
—Fred Brooks
1991: The construction of new software [...] is an unexpectedly hard problem. It is perhaps the most difficult problem in engineering today, and has been recognized as such for more than 15 years. It is often referred to as the "software crisis". It has become the longest continuing "crisis" in the engineering world, and it continues unabated.
—Winston Royce
1996: The Federal Aviation Administration has squandered 15 years and at least half a billion dollars on a new air traffic control system that is still years from completion and already obsolete.
—The New York Times
1997: Programming is costly and unpredictable [...], and the resulting code is often less than 100% reliable.
—Bjarne Stroustrup
2005: We waste billions of dollars each year on entirely preventable mistakes.
—Why Software Fails, IEEE Spectrum
2010: The software crisis isn't over—we're now in year 45
—Douglas Crockford, Canadian University Software Engineering Conference keynote address
2012: Billion-Dollar Flop: Air Force Stumbles on Software Plan
—The New York Times

As this litany of fail illustrates, we are not very good at writing software. This is a huge problem

money—billions—wasted
time lost
effort for naught

and—insult to injury—when we can't make our software work, we still have to solve the original problem by other means, or do without.

Given all this, you would think that people would be focused on two big questions

what is the problem?
what are we doing about it?

I posit that there are three groups of people involved in software development

management
academics
programmers

and part of the problem is that these three groups have very different ideas about the problem. In brief

Who           What is the problem?   What are we doing about it?
management    process                more process
academics     technology             research
programmers   problem?               what problem?

Let's take a look at how these three groups approach the software problem.

Management

Management typically thinks that a software development problem is a process problem. This is understandable: management's job is to create and administer process. They see the problem in terms of their own job function.

For example, in 1997, the United States General Accounting Office (GAO) investigated software development problems in the Federal Aviation Administration's Advanced Automation System. They issued a report titled Immature Software Acquisition Processes Increase FAA System Acquisition Risks, where they write

Software quality is governed largely by the quality of the processes involved in developing or acquiring, and maintaining it.

I have worked in many organizations of varying sizes. My experience is that the amount of process that management creates depends not on the size, scope, or challenges presented by the software, but simply on the size of the organization: bigger organizations have more process.

Now, there's nothing wrong with this. Organizations need process in order to function, and organizations need process in order to develop software. A well-run software shop will typically have product requirements, and functional specifications, and design documents, and version control, and a defect database, and test plans, and project reviews; and yet, for all that, there's always something missing. Can you see what's missing? Here, maybe a picture will help

                  Software Development Process

requirements                                    acceptance criteria
    functional spec                         system test
        design documents                functional test
            version control         bug tracking
                           \   |   /
                         -- here be --
                         -- dragons --
                           /   |   \

There, in the middle. That uncharted region. There is no process for actually writing software. Code. You know, that stuff that the programmers type into their terminals.

In fairness, there can't be any process for writing software, because writing software is a fundamentally creative activity. You can't do it by rote, or by the book. Or, if you can—if the software that you are writing is so repetitive and formulaic that you can write it by following a fixed procedure—then eventually someone will reduce it to a formula, and then you will come into work one morning and find that you have been replaced by a report generator, or possibly an intern with a spreadsheet.

Because there is no process for writing software, software is invisible to management. It's a bit like a black hole: they can see all around it, but they can't see the thing itself. In particular, management never reads the software that their own staff writes, and they don't know if it is good or bad.

Because there is no process for writing software, programmers are free to write whatever code they please, in whatever way they please. The quality of that code is dependent solely on the skill and dedication of each individual programmer. The results are highly variable. I would place the center of the quality distribution somewhere between bad and horrible, with a tail extending up to good and a substantial fraction that is catastrophic.

                  Distribution of Software Quality

         |                      *
         |                *           *
         |           *                    *
         |        *                          *
Quantity |      *                              *
of       |    *                                   *
Code     |   *                                        *
         |  *                                              *
         | *                                                           *
         |*                                                                            *
         +-----------------------------------------------------------------------------------
         Catastrophic   Horrible       Bad            OK             Good           Excellent

                                       Quality of Code ->

When a software project is in trouble, management does the one thing that it knows how to do: it creates process; it improves process; it administers process more carefully. In many cases, the end result is a better process for writing bad code.

In order for management to improve the quality of the software produced by their organization, they would somehow have to cause programmers to write different code than they otherwise would have, and that virtually never happens, because nothing that management does to process ever reaches the uncharted region where there be dragons: nothing ever reaches actual code.

Academics

The second big group involved in software is the academics. These are professors of Computer Science at universities, and people doing research in industry. I divide the efforts of academics into two broad areas

infrastructure
technique

Infrastructure

By infrastructure, I mean computers, and the operating systems that manage them, and the programming languages that we use to program them. Over the last half century, the computing infrastructure available to us has grown enormously—exponentially.

You might think that powerful computers would ease our software problems, but in fact, they seem more to enable and amplify them:

To err is human; to really foul things up requires a computer.
—Bill Vaughan, 1969

Dijkstra puts the matter down to hardware

When we had no computers, we had no programming problem either. When we had a few computers, we had a mild programming problem. Confronted with machines a million times as powerful, we are faced with a gigantic programming problem.
—Edsger W. Dijkstra

but the software infrastructure is just as important. If we were still programming in machine language, we could never have software failures of the magnitude that we see today: humans simply can't write that much machine language. In order to have a million-line, billion-dollar software catastrophe, you first have to write a million lines of code, and that is only possible with powerful, modern operating systems and programming languages. Advances in computers are like advances in warfare: they allow us to do more damage.

Technique

By technique, I mean the things that programmers do when they write code: the nuts-and-bolts of getting the job done.

In the early days, there was some low-hanging fruit here, and academics made real contributions to the art. For example, it was Dijkstra who famously observed that undisciplined use of GOTO statements was a problem, and suggested avoiding them. This led to structured programming, and genuine improvements in software quality.

But there is only so much low-hanging fruit. The basics of programming language design were laid down in the 1960s; more advanced techniques, like object-oriented programming, in the 1970s. Database access, networking, and inter-process communication are now well established technologies. Academics thrive on new questions and new research, and the questions that they investigate today are far beyond the work of ordinary programmers.

For example, the May 2013 issue of acmqueue, a publication of the Association for Computing Machinery (ACM), has an article titled Structured Deferral: Synchronization via Procrastination. It looks like an interesting article; I may even read it some day. But the only reason I found it is that Slashdot reported it, and it looks to me as if the only reason that Slashdot took notice is that the author motivates his problem with a somewhat humorous example: a database of animals in a zoo, one of which is Schrödinger's cat. Of the 40-odd comments on the Slashdot story, only one or two address the substance of the article; the rest are cat jokes or discussions of quantum mechanics. Ordinary programmers are simply not dealing with this kind of thing.

There are still new things being developed that affect technique and the work that programmers do, but they are not coming from the academy, and they are not helping us write better software. Two that come to mind are XML and PHP.

XML

Extensible Markup Language (XML) is a format for representing data. It was developed by the World Wide Web Consortium (W3C) to facilitate data interchange between disparate systems. XML is not a very good data format

The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.
—Jérôme Siméon and Philip Wadler, The Essence of XML

Nonetheless, XML has crept into all manner of software, complicating things wherever it goes.

XML is not the solution. XML is the problem.
—Steven McDougall

PHP

PHP is a programming language, and it is an unmitigated disaster. PHP was not designed by academics, nor even by programmers, but by people who profess not to be programmers, and to not like programming.

The PHP language is profoundly flawed on every level. It actively impedes the development of good software. Nonetheless, PHP is in widespread and growing use. Huge tracts of software are being written in this nightmare language, all of it unavoidably bad.

The academics have neglected their garden, and it is now being choked by weeds.

Programmers

Pace the GAO, software quality is governed largely by the skill and care of the programmers who write it. For the most part, that skill is lacking, that care is wanting, and the resulting software is abysmal.

Is it really that bad?

Indulge me in an extended metaphor.

Suppose you had a musician—a composer—and they were not very good. In fact, they were really pretty bad. When you looked at their composition, you saw problems at every level. The handwriting was close to illegible; the notation was sloppy and incorrect. Time signatures were screwed up; key signatures were absent or wrong. They didn't seem to have any concept of melody, or harmony, or rhythm. Dynamics and phrasing were random, at best. Voicing and instrumentation were...bizarre. The high level structure—verses, chorus, repeats—was a mess. And on and on, up to the broadest scope of their work, which, when you finally stepped back and looked at it, turned out to be fundamentally incoherent. Really bad.
But hey, everybody has to start somewhere, right? Perhaps they could improve: with instruction; with study; with practice.
But then suppose that you listened to them perform, and after a while, you started to realize that their basic attitude toward their instrument is that they are going to beat it into submission!!!
At that point...at that point, you might begin to despair of the possibilities for improvement. You might begin to think that this person was not meant to be a musician.

Most commercially written software falls somewhere on the spectrum from really bad...to utter despair. But don't take my word for it:

looking at "average" pieces of code can make me cry. The structure is appalling, and the programmers clearly didn't think deeply about correctness, algorithms, data structures, or maintainability.
—Bjarne Stroustrup, The Problem with Programming

In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt.
—Blair P. Houghton

Five Easy Pieces

Programming is a creative activity. That means that there cannot be any fixed procedure or set of rules for writing good software, any more than there can be a procedure for painting a good painting, or rules for composing a good song.

But there can be rules for not writing bad software. Rules that should be followed by default, and only broken for good cause. Dijkstra's rule to avoid GOTOs is a good example.

We would like for programmers to write good software, but current practice in the field is so awful that if we could just get programmers to stop writing bad software—to stop doing things that are actively bad—it would be a great help.

Here are five things—five easy things—that programmers can do to improve their code

left brace
vertical alignment
short subroutines
small namespaces
functional programming

Left brace

Left brace means that the opening brace of a control structure goes on its own line, indented to the same level as the keyword that introduces the control structure. Right brace means that the opening brace goes at the end of the same line as the keyword. These snippets illustrate the difference

Left brace

for (...)
{
    if (...)
    {
        while (...)
        {
        }
    }
    else
    {
        if (...)
        {
        }
    }
}

Right brace

for (...) {
    if (...) {
        while (...) {
        }
    } else {
        if (...) {
        }
    }
}

This might seem a trivial matter, but left brace is overwhelmingly better than right brace. The reason is that the vertical alignment of the opening and closing braces guides the eye, making it easy to see where blocks begin and end, and how they nest.

With right brace, the opening and closing braces are offset diagonally. This makes it harder to identify blocks, and harder still when nested structures put their own braces on that same diagonal.

Vertical alignment

This is the data analog of left brace. When you have a series of statements—often assignment statements—that do substantially the same thing, insert spaces to make things line up vertically. Observe

Aligned

$name   ->{title   } = $form->{title         };
$name   ->{first   } = $form->{first_name    };
$name   ->{last    } = $form->{last_name     };
$address->{address1} = $form->{address1      };
$address->{address2} = $form->{address2      };
$address->{city    } = $form->{city          };
$address->{state   } = $form->{state         };
$address->{zip     } = $from->{zip           };
$phone  ->{home    } = $form->{home_phone    };
$phone  ->{business} = $form->{business_phone};
$phone  ->{cell    } = $form->{cell_phone    };

Unaligned

$name->{title} = $form->{title};
$name->{first} = $form->{first_name};
$name->{last} = $form->{last_name};
$address->{address1} = $form->{address1};
$address->{address2} = $form->{address2};
$address->{city} = $form->{city};
$address->{state} = $form->{state};
$address->{zip} = $from->{zip};
$phone->{home} = $form->{home_phone};
$phone->{business} = $form->{business_phone};
$phone->{cell} = $form->{cell_phone};

The aligned version shows the parallel structure of the code at a glance.

The unaligned version obscures the structure of the code. The programmer must wade through it, parsing each line in turn, while fending off the visual confusion created by the surrounding lines. In practice, a programmer encountering a wad of code like this is likely to skim over it, acquiring only a vague idea of what it does, and emerging with little more than hope that it is correct.

The aligned version also makes the typo ($from where $form was meant) stand out visually.

Short subroutines

Subroutines should be short.

The length of a subroutine depends on what it does, so any rule concerning subroutine length must necessarily be approximate and flexible. Some authors will cite an upper bound, such as 50 lines of code. Another approach is captured by the maxim

Each subroutine should do one thing well.

which limits the functionality of a subroutine to "one thing".

For me, the issue is strictly visual: the entire subroutine must fit on one screen. The reason is simple: I can't keep track of what a subroutine does unless I can see the whole thing at once. In particular, if I have to scroll to find a matching brace, I'm lost.

This has the perhaps surprising consequence that the length of my subroutines depends on my display technology. When I coded on punch-cards, my subroutines could sprawl across two sheets of 14"x11" fan-fold line-printer paper—over 100 lines. When I coded on 25-line character terminals, my subroutines had to fit on that 25-line screen. Now that I have big graphics terminals, they can push 40 or 50 lines, if necessary.

Of course, those are upper limits. In well written code, most subroutines are shorter than that, with the bulk typically falling in a range of perhaps 5 to 30 lines.

A sure sign that a subroutine has become too long is when the programmer has to leave a trail of breadcrumbs to find their way back out at the end.

void foo()
{
    ...

    for (...)
    {
        ...

        while (...)
        {
            ...

            ... hundreds of lines later ...

        }   // end of big while loop

    }   // end of big for loop

}   // end of subroutine foo

Small namespaces

This is the data analog of short subroutines. Just as subroutines divide code into small, isolated pieces, namespaces segregate variables into small, isolated groups.

Namespaces

reduce variable name conflicts
reduce the number of variables that the programmer has to think about at once
limit the region of code within which a variable is known

Beginning programmers sometimes put all their variables into global namespace; in response, some teachers forbid their students to use globals. This is well intentioned, but misses the point. The problem isn't that globals are bad: after all, some things really are global

#include <time.h>

time_t T0;  /* program start time: genuinely global; main() sets T0 = time(NULL) */

The problem is having too many names in a single namespace. A student who revises

int a, b, c, d, e, i, ii, i2, n, nm, mm, x, xy, xxy, xz, z1, z2;

int main(void)
{
    ...
}

to

int main(void)
{
    int a, b, c, d, e, i, ii, i2, n, nm, mm, x, xy, xxy, xz, z1, z2;
    ...
}

hasn't really improved the situation. There are still too many names in a single namespace.

The solution is to divide the variables among multiple namespaces. In most programming languages, each subroutine creates a namespace for its local variables.

int a, b, c;

int main(void)
{
    int d, e;
    ...
}

int foo()
{
    int i, ii, i2;
    ...
}

int bar()
{
    int n, nm, mm;
    ...
}
...

Now we have just a few globals, and each subroutine has just the few variables that it needs. (If every subroutine needs access to all 17 variables, then you have bigger problems than poor namespace management.)

Just as subroutine length is limited by display technology, namespace population is limited by biological working memory: the number of things that a human can keep track of at once. The capacity of working memory is usually given as seven, plus or minus two; that is, five to nine items. In well written code, most namespaces will respect this limit.

Functional programming

Functional programming is an alternative to imperative programming. Imperative programming is about telling the computer what to do. Functional programming is about expressing the result that you want. Functional programming is a different way of thinking about code, and it takes a bit of a mind shift. Here are some examples.

Suppose we want to round a number to the nearest integer. First, we remember the rule: if the first digit after the decimal point is less than 5, round down; otherwise, round up.

In the imperative style, we take that rule and translate it directly to code: we do what the rule tells us to do

double rounded;
if (x - floor(x) < 0.5)
    rounded = floor(x);
else
    rounded = floor(x) + 1;

In the functional style, we stop thinking about procedure—stop thinking about doing things—and think instead about the result we want, and how to express it. This leads to

double rounded = floor(x + 0.5);

Functional programming—computing with expressions, rather than procedures—can often simplify and clarify code. How far you can go with functional programming depends partly on how much support your language provides for it. About all you get from C is the conditional operator (?:). Perl provides map and grep, which can be used to good effect in many places.
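For a sense of what the conditional operator buys you in C, here is a small sketch that picks the larger of two ints both ways; the function names are hypothetical. The conditional operator turns the if/else into an expression, so the result can be returned, or used to initialize a variable, directly.

```c
/* Imperative style: branch, then assign inside each branch. */
int larger_imperative(int a, int b)
{
    int larger;
    if (a > b)
        larger = a;
    else
        larger = b;
    return larger;
}

/* Functional style: the conditional operator makes the choice an expression. */
int larger_functional(int a, int b)
{
    return a > b ? a : b;
}
```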

Suppose we want to scan a directory and make a list of all the files (but not subdirectories) that it contains. We could write

my @files;
for my $name (<$dir/*>)
{
    push @files, $name if -f $name;
}

but with the grep operator, we can reduce that to

my @files = grep { -f } <$dir/*>;

If we want a table of files and sizes, we could write

my %size;
for my $name (<$dir/*>)
{
    -f $name or next;
    $size{$name} = (stat $name)[7];
}

With the map operator, this becomes

my %size = map { $_ => (stat)[7] } grep { -f } <$dir/*>;

Functional programming has several advantages. The first, obviously, is code size. In the examples above, a one-line expression replaces a four- or five-line loop. This is typical, and every time you do it, your code becomes smaller and simpler.

Less obvious—but perhaps more important—the functional style saves the programmer from having to read and interpret control structures. Reading control structures is hard, because you have to hold the execution state in your head as you emulate the operations that the computer will carry out when it executes the code. Functional programming lifts that burden from the programmer.

Five really easy pieces

I want to emphasize that these things are not hard to do.

Left brace and vertical alignment are strictly cosmetic issues: they only concern the appearance of code. You don't have to revise any code; you don't even have to think about what the code does. All you have to do is push the symbols around on the screen until they line up.

Short subroutines and small namespaces do require you to think about code. In particular, they require you to write subroutines. This takes thought and effort, yes, but surely writing subroutines is within the capabilities of a competent programmer.

Real functional programming—with higher-order functions, and closures, and recursion—can be complex, and abstract, and is not well supported by common programming languages. But that's not what I'm talking about here. All I'm suggesting is that programmers not write loops and conditionals in places where an expression and an assignment would serve.

Pieces on the ground

I've been in this business for 30 years now. I've worked in many different places. I've written a lot of code; I've read a lot of code; I've fixed a lot of code. And everywhere I've been, I've seen the same thing: almost no one codes like this.

The vast majority—80%, 90%, 95%—of programmers code right brace.

No one pays any attention to vertical alignment. Programs are littered with inscrutable wads of code, where the highest priority seems to be conserving whitespace.

Subroutines sprawl across hundreds of lines of code; scores of variables are thrown into global namespace—or into those huge subroutines.

I have, on occasion, tried to get other programmers to improve their practice. I've worked with programmers, and for programmers; I've supervised programmers. I've been in code reviews; I've drafted coding standards. I've tried to lead by example; to persuade; to require.

These efforts have been uniformly unsuccessful. Many programmers cannot be made to understand the issue. Code is code; what's the problem? Others insist, more or less forcefully, on their prerogative to code as they please, and they please to do it their way, not mine. The few who are responsive are those who follow these practices of their own accord, and then I'm preaching to the choir.

It's depressing. It's discouraging. And...it's part of the problem.

Part of the problem

The problem—the problem—is that we humans are not very good at writing software, and a big part of that problem is that we programmers are not very good at writing code. It doesn't matter how good your process is, or how advanced your technology is: if your code is a horrid mess, then your software is not going to work.

As the Stroustrup quote above suggests, a typical badly written program has problems on many levels. But if programmers won't observe good practice in the most elementary matters—decent formatting and minimally competent subroutine structure—what hope is there of addressing deeper issues? Code matters.

What is the problem?

In view of all this, we need to return to the question that we started with

what is the problem?

Why do programmers insist—and persist—in writing such wretched code?

Over the years, I have discovered that people usually have reasons for the things that they do. Their actions are rarely arbitrary; even things that seem perverse often make sense once you understand their objectives, and interests, and priorities.

The standard example of ancient nonsense—the debate about angels on pinheads—makes sense once you realize that theologians were not discussing whether five or eighteen would fit, but whether a pin could house a finite or an infinite number.
—S. J. Gould, "Wide Hats and Narrow Minds"

With this in mind, let's take another look at our Five Easy Pieces, and see if we can understand why so many programmers code the way they do.

The Imperative Mood

Functional programming asks the programmer to compute with expressions, rather than control structures. Most programmers seem to prefer control structures; many programs are positively overrun with them.

I went looking for an example of this, and the Code Snippet Of the Day on The Daily WTF immediately yielded The Truth of the Matter.

Code Snippet Of the Day (CodeSOD) features interesting and usually incorrect code snippets taken from actual production code in commercial and/or open source software projects.

The task is to flip a bit. The snippet shows four lines of imperative code that don't work

if (showOptionsButton == true)
    showOptionsButton = false;
if (showOptionsButton == false)
    showOptionsButton = true;

A better solution is one line of functional code that does

showOptionsButton = !showOptionsButton;

It is easy to look down on the programmer who wrote the faulty code, but it is very difficult for humans to reason correctly about the operation of control structures. The programmer's real fault is not that he wrote an incorrect if statement, but that he chose imperative code over functional code in the first place. Worse, if we read the article, we find that he stood by his imperative code and refused the functional code, even when the problem was pointed out to him.
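Tracing the snippet shows why it fails: the second if tests the value that the first if has just written. The sketch below wraps both versions in C functions so the difference can be checked directly. The function names are hypothetical, and writing it in C with <stdbool.h> is an assumption; the snippet's original language is not stated here.

```c
#include <stdbool.h>

/* The snippet as written. When the input is true, the first test sets it
 * to false, and the second test then sees false and sets it back to true.
 * The result is true for every input. */
bool toggle_imperative(bool showOptionsButton)
{
    if (showOptionsButton == true)
        showOptionsButton = false;
    if (showOptionsButton == false)
        showOptionsButton = true;
    return showOptionsButton;
}

/* The expression version: the result is simply the negation of the input. */
bool toggle_functional(bool showOptionsButton)
{
    return !showOptionsButton;
}
```

Called with either true or false, toggle_imperative returns true, while toggle_functional returns the opposite of its input.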

I think what we are seeing here is a version of the paradox of the active user. As described by Jakob Nielsen, this is

a well-known phenomenon in user interface design: people are more motivated to start using things than to take the initial time to learn about them [...]
[This] is a paradox because users would save time in the long term by taking some initial time to optimize the system and learn more about it. But that's not how people behave in the real world [...]

Programmers are subject to the same paradox: they would rather write code than think about the problem. Some programming courses caution against this:

The sooner you start coding, the longer it takes.

but their advice is evidently as ineffective as mine.

A programmer who wants to be active tends naturally towards imperative programming, and away from functional programming.

In order to write functional code, you first have to think about the problem, and the result that you want, and how to express it. That's not what the active programmer wants to do. The active programmer wants to write code. Now.

If you write imperative code, you don't have to think about anything. You just walk through the problem in your head, and translate your thoughts straight into code: "If it's true, set it to false; if it's false, set it to true".

As the example above shows, getting it right is harder than that, but the active programmer is not driven by a need to get it right. The active programmer is driven by a need to be doing something, and writing imperative code satisfies that need.

Run-on sentences

The rule that subroutines should be short is widely known, widely accepted, and widely disregarded. I've spent more of my life than I care to think about crawling through subroutines that run on for hundreds—sometimes thousands—of lines.

It appears that many programmers begin at main, and just keep typing, producing one long unbroken stream of code. When it gets too long to keep track of, they start adding banner comments, like mile markers along a highway.

int main(void)
{
   /*
    * Do the foo thing
    */
    x = 1;
    y = x - 2;
    z++;

   /*
    * Do the bar thing
    */
    w = 42;
    x = substr(z, 3, y);
    z--;

   /*
    * Do the baz thing
    */
    xx = x + 1;
    yy = y - 1;
    zz = xx * yy;

   /*
    * Are we there yet?
    */
    ...
    ...
    ...

   /*
    * Wake me when it's over...
    */
    ...
    ...
    ...

The problem here is not the underlying program structure. If the task at hand is to carry out a long list of actions, one after another, then the program should carry out those actions, one after another. But the code shown above snatches defeat from the jaws of victory.

If, as the code grew, the programmer started writing subroutines, instead of banner comments, then they would arrive at the vastly superior

void main()
{
    foo();
    bar();
    baz();
    if (there_yet())
        wake_me();
}

void foo()
{
    x = 1;
    y = x - 2;
    z++;
}

void bar()
{
    w = 42;
    x = substr(z, 3, y);
    z--;
}

void baz()
{
    xx = x + 1;
    yy = y - 1;
    zz = xx * yy;
}

int there_yet() { ... }

void wake_me()  { ... }

With subroutines, each action is isolated and encapsulated in its own small block of code, and—Bonus!—we get a short, comprehensible main to give us an overview of the whole process.

The subroutine names become our mile markers, so we can dispense with the banner comments, and the size of the program increases only by the size of the new main routine. True, we're probably going to need some parameters, and some return values, but that's part of what subroutines do for you: they create data interfaces between different pieces of code.
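To make the data-interface point concrete, here is a minimal sketch of baz() from the example above, rewritten to take its inputs as parameters and return its result instead of reading and writing globals. The int types and the new signature are assumptions; the original example never declares its variables.

```c
/* baz(), rewritten so that its inputs arrive as parameters and its
   result leaves as a return value, instead of flowing through global
   variables. The int types are an assumption. */
int baz(int x, int y)
{
    int xx = x + 1;     /* xx and yy are now local to baz() */
    int yy = y - 1;

    return xx * yy;     /* the caller decides where the result goes */
}
```

With this signature, main would call it as zz = baz(x, y), so the overview also shows what each step consumes and produces. For example, baz(1, -1) returns (1 + 1) * (-1 - 1) = -4.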

Namespaces very much come along for the ride with subroutines. If you have one big subroutine, then all the variables needed by all the code in that subroutine are going to end up in that subroutine's namespace, or, worse, in the global namespace. When you divide the code into subroutines, each subroutine has just the variables that it needs, and the namespace populations become manageable.

So, long subroutines have many problems, and short subroutines have many advantages. And writing subroutines isn't even that hard. Why then do programmers persist in writing these monstrosities? I think that long subroutines are run-on sentences.

Little children write run-on sentences because they haven't yet learned that there is more to writing than transcribing their words. Good writing has grammar, sentences, paragraphs, chapters: it has structure.

In just the same way, programmers write long subroutines because they don't understand that there is more to programming than transcribing their thoughts. Good software has syntax, statements, subroutines, modules: it has structure.

The programmers who write long subroutines aren't doing the fundamental work of programming: they aren't creating structure by which code can be partitioned and organized. They are just transcribing their thoughts, one after another, stream-of-consciousness. They are snatching defeat from the jaws of victory.

Cats

Left brace and vertical alignment are the most trivial, yet also the most profound, of these issues. Trivial, because they don't affect the function of code, only its appearance. Yet profound, because of what they reveal about programmers.

A subroutine can have any number of lines, and a namespace any number of variables, but brace position is binary: either you code left brace or right brace. As it happens, most programmers code right brace. I have known many programmers, and asked many of them why, and gotten many different answers, none entirely satisfactory.

Some programmers say that it is arbitrary and they don't care. But if you then suggest that they code left brace, they won't do it. What's more, if programmers were truly indifferent to brace style, then you might imagine that a good fraction (half?) would code left brace; yet very few do. So there is more to it than that.

Some programmers say that it is arbitrary and they do care, and since it is arbitrary, their way is as valid as any other, and no one can tell them otherwise. This would seem to be a principled stand, but the principle that they are asserting is their right to make an arbitrary choice, which is not very illuminating.

Many programmers cite authority of one sort or another. They code right brace because that is the way they were taught, or the way it was shown in their textbook, or for conformance to some coding standard, or for consistency with existing code. All these responses merely raise the question: why do all those authorities code right brace?

The only substantive answer I have ever received is that it saves a line on the screen. If you code left brace, then the opening brace takes up an entire line by itself; if you code right brace, then the opening brace goes on the same line as the keyword.

Left brace

for (...)
{
    ...
}

Right brace

for (...) {
    ...
}   # One line shorter!

This could account for why textbooks are written right brace. Authors have page breaks and page counts to contend with; saving one line per keyword helps with both. But books are not programs: they have different purposes and constraints.

A program in a text book differs from a real program.
—Bjarne Stroustrup

Programmers who ape every aspect of the examples in their textbooks are drawing the wrong lessons.

Saving lines

In real programs, coding right brace to save lines on the screen seems senseless.

Back in the day, people coded on 25x80 character-mode terminals. Saving one of those 25 precious lines was at least a tenable reason for coding right brace. Not dispositive—I coded left brace on 25-line terminals—but tenable. Today, everyone has mega-pixel screens that easily display 50 or 60 lines of code. Coding right brace on a screen that size—sacrificing readability to save lines—is a false economy.

More than that, a modern screenful of code is more than you should be looking at at once—more than you should need to look at at once. If you are coding right brace in order to squeeze even more lines onto the screen in the hopes of being able to understand what you are writing, then you have bigger problems than an inferior brace style. Code like that desperately needs to be reorganized, and abstracted, and—especially—broken up into smaller subroutines (see Run-on sentences, above).

The trouble with braces

To understand all this, we need to remember that people are rarely arbitrary. People make decisions for reasons, and we can understand their decisions if we understand their reasons.

The trouble with braces begins with programmers who write stream-of-consciousness imperative code. The resulting software is overrun with sprawling control structures.

If you pull the braces over to the left, then you can see the control structures. Upon inspection, you will find that most of them are either unnecessary, incorrect, or incoherent. With this understanding, you can fix what is broken and remove what is useless. You will then have left-brace code that is smaller, simpler, cleaner, and more correct than the original right-brace code. The Programmers' Stone shows an example of this process; scroll down to the section headed The Quality Plateau to see it.

Ignoring code

The substantive reason that programmers give for coding right brace is to avoid putting the opening brace on a separate line. But they aren't doing this so that they can fit more code on the screen. They are doing this so that they can ignore the code on the screen.

If they coded left brace, then the structure of their code would be manifest on the screen, and they would see that their code has been overrun by big sprawling control structures, and looking at those big sprawling control structures is viscerally unpleasant—even for the programmers who write them.

Coding right brace gives the programmer a way to compact—squish—the software. With the braces on the right, and the extra lines closed up, it looks less like code, and more like a paragraph of English prose. It becomes dense and unreadable.

This sounds bad, but to programmers who code right brace, it's not a bug, it's a feature. They don't want to read the code, or think about the code, or understand the code. Squishing the code into an unreadable wad of text on the screen keeps it from taking up space in their heads. It makes it easier to ignore it and move on.

The universal unwillingness to maintain any kind of vertical alignment stems from this same underlying desire to ignore code. When things are aligned on the screen, then it is visually obvious that the code does have some structure, even if that structure is nothing more than a series of assignment statements.
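As a small illustration, here are the same three assignments written unaligned and aligned. The struct and its fields are hypothetical, chosen to echo the 25x80 terminals mentioned above. Both functions compile to the same thing; only the second lets the reader see the structure.

```c
/* Hypothetical settings for a 25x80 terminal. */
struct terminal
{
    int rows;
    int columns;
    int tab_width;
};

/* Unaligned: the assignments run together into one blob. */
struct terminal defaults_unaligned(void)
{
    struct terminal t;
    t.rows = 25;
    t.columns = 80;
    t.tab_width = 8;
    return t;
}

/* Aligned: the same statements, but the "=" column shows at a glance
   that this is a list of field/value pairs. */
struct terminal defaults_aligned(void)
{
    struct terminal t;
    t.rows      = 25;
    t.columns   = 80;
    t.tab_width = 8;
    return t;
}
```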

The programmer does not want to be burdened by having to see, or think about, that structure. By leaving everything unaligned, the programmer obscures the structure of the code. The code appears as a compact blob on the screen. It doesn't have any apparent structure. It doesn't take up any space in their head. They can ignore it and move on.

One place where this desire to ignore code becomes unmistakable is when software crosses a language boundary—for example, when a Perl program contains an embedded SQL query.

The programmer apprehends that their task is to write a Perl program, and they will write the Perl code with some reasonable formatting and indentation. But the programmer does not recognize the SQL as part of the program. Rather, the SQL is just some stuff that the Perl code sends to the database. So when they get to the SQL, they write it in one continuous stream, without even line breaks, let alone any kind of formatting. The result is something like

sub GetUser
{
    my($DBH, $userID) = @_;
    my @user = $DBH->selectrow_array("SELECT first_name, last_name, address_1, address_2, city, state, zip, phone FROM users INNER JOIN status USING(id) WHERE id = ? AND status = 'active'", undef, $userID);
    return @user;
}

In deference to your browser, I have limited the SQL in this example to 200 characters. I have seen SQL like this that goes on for many hundreds of characters, wrapping around and around into a solid block of text when viewed in a text editor.

I have broken out and formatted SQL like that (so that I could understand it and work on it), only to see the original programmer knock it back to a single line of text the next time they edited the file. They desperately do not want to see or think about the code that they are writing. It takes only the slightest of excuses—some code in a different language—for them to completely abandon any kind of formatting, and just type out solid blocks of text.
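For illustration, here is the query from GetUser broken out one clause per line. This sketch embeds it in C rather than Perl; in C, adjacent string literals are joined by the compiler, so the database still receives a single string. The GET_USER_SQL name is hypothetical, and the ? placeholder assumes a database driver that supports bound parameters.

```c
/* The GetUser query, laid out one clause per line. The compiler
   concatenates the adjacent literals into one string; the "\n"s keep
   the query readable if it is ever printed or logged. */
const char GET_USER_SQL[] =
    "SELECT first_name, last_name,\n"
    "       address_1, address_2,\n"
    "       city, state, zip, phone\n"
    "  FROM users\n"
    "  INNER JOIN status USING (id)\n"
    " WHERE id = ?\n"
    "   AND status = 'active'\n";
```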

Covering your waste

If you have cats, and you watch cats, one thing that you can see is that most of what cats do is stereotyped behavior. In other words, cats do not understand the things that they do in the way that humans do. Instead, cats have a collection of more or less hardwired behaviors—routines. From moment to moment, their behavior consists of selecting a routine that is appropriate to the time and circumstance, and then carrying out that routine.

Now is a time to sleep. Now is a time to lick myself (not clean: the cat does not understand that it is cleaning itself). Now is a time to hunt. Now is a time to eat. And so on.

One thing that cats do is cover their waste. This is adaptive: a cat that does not cover its waste advertises its presence in the area to both predators and prey. As with all things, cats do not understand what they are doing when they cover their waste, but if you watch cats, you can see that they have a routine for doing this, and you can see how this routine works.

First, the cat eliminates. Then, the cat turns around and sniffs at the place where it eliminated. Then it turns around again and kicks at the ground behind it. Then it turns around and sniffs again. The cat repeats this sequence: turn and sniff; turn and kick; turn and sniff; turn and kick, until the smell goes away. Then it moves on.

All these programmers who code right brace, and who won't align anything, are acting like cats covering their waste. They kick at the code until it doesn't smell too bad, and then they move on.

Notes

someone will reduce it to a formula

probably a programmer

black hole

pit of despair?

exponentially

see: Moore's law

Huge tracts of software

One small consolation is that it is mostly GUI code. Should we care?

when nested structures put their own braces on that same diagonal

I find the 4-way else configuration especially confusing:

        }
    } else {
        if (...) {
        }
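For comparison, here is a minimal sketch of the same configuration written left brace, inside a hypothetical classify() function. Each brace sits on its own line, directly above or below the brace it matches, so the end of the inner if, the end of the outer if, and the start of the else block are three separate, visible events.

```c
/* Hypothetical classifier, written left brace.
   Returns 2 if both a and b are positive, 1 if only b is, 0 otherwise. */
int classify(int a, int b)
{
    if (a > 0)
    {
        if (b > 0)
        {
            return 2;       /* both positive */
        }
    }
    else
    {
        if (b > 0)
        {
            return 1;       /* only b positive */
        }
    }

    return 0;               /* every other combination */
}
```

classify(1, 1) returns 2, classify(-1, 1) returns 1, and any other combination returns 0.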

typo

I have found bugs in production code simply by aligning statements to flush out typos.
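A hypothetical sketch of the kind of code where this pays off: with the right-hand sides aligned, every line should mirror its own left-hand side, so a slip such as out.depth = src->height * factor breaks the pattern and is easy to spot. The struct and function names are illustrative only.

```c
/* Hypothetical box dimensions. */
struct box
{
    int width;
    int height;
    int depth;
};

/* Scale every dimension by factor. Aligned, each line repeats the same
   field name on both sides, so a mismatched field stands out when you
   scan down the columns. */
struct box scale_box(const struct box *src, int factor)
{
    struct box out;
    out.width  = src->width  * factor;
    out.height = src->height * factor;
    out.depth  = src->depth  * factor;
    return out;
}
```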

looking for an example

because you can't make this stuff up

it is very difficult for humans to reason correctly about the operation of control structures

That's one reason that we build computers to execute them for us.

all the variables needed by all the code in that subroutine are going to end up in that subroutine's namespace

In most languages, variables can be localized to blocks within a subroutine, but programmers who aren't writing subroutines probably aren't localizing their variables, either.
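In C99 and later, for example, a variable declared inside a block or a for statement exists only within it. The function below is a hypothetical sketch: the loop counter i is confined to the loop, and the temporary square to the loop body, so neither clutters the rest of the function.

```c
/* Hypothetical example: sum the squares of v[0..n-1].
   Each temporary is declared in the innermost block that uses it. */
long sum_of_squares(const int *v, int n)
{
    long total = 0;

    for (int i = 0; i < n; i++)            /* i exists only in the loop */
    {
        long square = (long)v[i] * v[i];   /* square exists only in this block */
        total += square;
    }

    /* Neither i nor square is in scope here. */
    return total;
}
```

For example, given the array {1, 2, 3}, it returns 1 + 4 + 9 = 14.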

either you code left brace or right brace

It's actually more complicated than that. The Wikipedia article on Indent style documents a dozen different ways to arrange braces. But a simple left brace-right brace distinction will suffice for this discussion.

Back in the day, people coded on 25x80 character-mode terminals

Way back in the day, people coded on punch cards, but that mostly predated curly-brace languages.

cats

I have cats.

Steven W. McDougall / resume / swmcd@theworld.com / 2013 June 08