The Theory of a Program

This is part two of a three-part series on decisions. Here are part one and part three.

Main Ideas

Managing decisions—retaining their rationales and their history, and making future decisions that are consistent with past decisions—is the bottleneck in large software projects.
It’s easier to write code than read it because you don’t have access to those rationales.
People and time are the dimensions along which a decision may break down; by extension, documentation is useful to distant people, or the same people in a distant time.
Most tech companies manage decisions by sharding knowledge of their context across engineers.

One of my favorite essays on Software Engineering is “How to Build Good Software,” by Li Hongyi. I agree with most of the points it makes, but my favorite is that “[t]he main value in software is not the code produced, but the knowledge accumulated by the people who produced it.” The author elaborates:

To make progress, you need to start with a bunch of bad ideas, discard the worst, and evolve the most promising ones. Apple, a paragon of visionary design, goes through dozens of prototypes before landing on a final product. The final product may be deceptively simple; it is the intricate knowledge of why this particular solution was chosen over its alternatives that allows it to be good.

This knowledge continues to be important even after the product is built. If a new team takes over the code for an unfamiliar piece of software, the software will soon start to degrade.

I later discovered that this idea predates Li’s essay. Turing Award winner Peter Naur wrote a longer exploration of it in “Programming as Theory Building” in 1985 (it’s a long quote, but all of it is relevant):

A main claim of the Theory Building View of programming is that an essential part of any program, the theory of it, is something that could not conceivably be expressed, but is inextricably bound to human beings. It follows that in describing the state of the program it is important to indicate the extent to which programmers having its theory remain in charge of it….The building of the program is the same as the building of the theory of it by and in the team of programmers. During the program life a programmer team possessing its theory remains in active control of the program, and in particular retains control over all modifications. The death of a program happens when the programmer team possessing its theory is dissolved. A dead program may continue to be used for execution in a computer and to produce useful results. The actual state of death becomes visible when demands for modifications of the program cannot be intelligently answered. Revival of a program is the rebuilding of its theory by a new programmer team.

The extended life of a program according to these notions depends on the taking over by new generations of programmers of the theory of the program. For a new programmer to come to possess an existing theory of a program it is insufficient that he or she has the opportunity to become familiar with the program text and other documentation. What is required is that the new programmer has the opportunity to work in close contact with the programmers who already possess the theory, so as to be able to become familiar with the place of the program in the wider context of the relevant real world situations and so as to acquire the knowledge of how the program works and how unusual program reactions and program modifications are handled within the program theory.

In short, the essence of a computer program is not its source code, but a theory of what the computer ought to be doing for certain users and how.

I think that Li and Naur are describing the same issue I wrestled with in my prior essay on decisions. What their writing reveals is the extent to which managing these decisions (retaining their rationales and their history, and making future decisions that are consistent with past decisions) becomes the bottleneck in a software project.

On reflection, I now believe that many of the difficulties in software engineering reduce to this problem. For example, many others¹ have written about how code is easier to write than read, and I think this framing lays the cause bare: it’s transparently much easier to reason (an almost automatic process) and make your own decisions than to deduce the private reasoning behind another person’s decisions. If the other person’s reasoning was informed by experience you don’t have and can’t easily get, the work might be worth it, but then that’s the hardest reasoning to infer. You don’t know what’s motivating it.

Another example is “Software Engineering is what you get when you take programming and add people and time”². Or, equivalently (as I’ll explain): “write documentation for your future self as much as for others”³. People and time (i.e. “others” and “your future self”) are the axes along which a decision might break down. A decision that worked for one person may not work for another, or circumstances may change and a decision may no longer be useful. People and time are, by extension, where a past decision may be questioned and therefore where documentation may usefully provide an explanation. If you subtract people and time from software engineering, you remove the possibility that a decision might fail.

Wait, Can’t We Just Document Our Decisions Then?

Documentation is essential but necessarily has two flaws. First, documentation is always incomplete. Each decision made while developing a piece of software may have an arbitrarily complex rationale, and a piece of software is composed of almost innumerably many decisions. Second, each new line of documentation adds to the amount that must be read by an engineer looking to understand the software. Even if e.g. a new team member doesn’t need to know about a particular decision, extending the documentation by recording that decision will further dilute whatever information they do need.

If not through documentation, how does complex software stay alive and functional? Most tech companies solve this problem by documenting the most recent, most important decisions and storing the rest (per “Programming as Theory Building”) in their employees’ brains. That is, simply knowing and understanding the decisions made in the company’s codebase is a big part of software engineers’ jobs at large tech companies, especially for senior engineers. As I’ll discuss in part three, this accounts for some of the organizational dynamics that tech companies tend to have.

Epilogue: Can You Have a Big Project Without Big Headcount?

A single skilled engineer can store an enormous amount of information about a software project in their mind (cf. Dwarf Fortress). The risk is that if the engineer leaves the project, all of that information is lost. Tech companies try to avoid this by storing that knowledge redundantly across multiple engineers, and an explicit goal of many engineering managers is to distribute knowledge of their team’s codebase among their engineers. But a natural question remains: is there any way to build a complicated piece of software that outlives its designer and isn’t supported by a big tech recruiting team?

Another Turing Award winner, Fred Brooks, discussed a version of this question in his 1986 essay “No Silver Bullet”:

How much of what software engineers now do is still devoted to the accidental, as opposed to the essential? Unless it is more than 9/10 of all effort, shrinking all the accidental activities to zero time will not give an order of magnitude improvement. Therefore it appears that the time has come to address the essential parts of the software task, those concerned with fashioning abstract conceptual structures of great complexity.

In it, Brooks discusses a variety of ideas for managing software complexity, but finds only a few promising prospects. However, one of those, “Requirements refinement and rapid prototyping”, I find promising myself. As far as I can tell, this is the approach that yet another Turing Award winner, Leslie Lamport, aims to take with TLA+. I have no experience with it myself, but I know it’s already gaining adoption at Amazon.

It makes intuitive sense to me that such an approach would help. While I drew a sharp distinction between “making decisions” and “solving problems” in part one to make a point, the line is honestly somewhat blurry, especially in software. If you choose an inefficient algorithm that becomes a bottleneck for a few users, or if you handle a particular corner case poorly, did you make a creative decision that renders your product a bad fit for those users, or did you fail to solve a problem? I think many software teams make these decisions incidentally; they implement something simple and then deal with the results as they arise. Their projects then become very complicated; decisions ping-pong between different trade-offs and sharp edges (exacerbated by changing user demands, but that’s probably unavoidable). By forcing engineers to enumerate, formally, some properties that they expect their software to have, rather than allowing them to make decisions ad hoc, I can see how TLA+ (or similar tools; I know about Alloy) would greatly restrict a product’s decision space, which may mean more stable products.

Concretely, how much of senior engineers’ minds consists of knowledge like “we did it this way because our other approaches created UX traps/operational problems”? If tools like TLA+ catch on, software projects might require less of this flavor of institutional knowledge, because the TLA+ checker would prevent many bad ideas from getting released in the first place. I’m excited to see whether tools like TLA+ catch on and what effect they have on software team dynamics.
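To make “a checker that rejects bad ideas before release” a little more concrete, here is a minimal sketch in Python. It is not TLA+ syntax and not how the TLA+ tools are implemented; it only illustrates the general idea of writing an expectation down as an invariant and then exhaustively exploring every reachable state of a small model. The toy system (two workers incrementing a shared counter in separate read and write steps) and all names in it are hypothetical, chosen for illustration.

```python
from collections import deque

# Hypothetical toy system: two workers each increment a shared counter
# in two steps (read, then write). State = (counter, workers), where each
# worker is a (pc, local_copy) pair.
INITIAL = (0, (("read", 0), ("read", 0)))


def next_states(state):
    """Yield every state reachable by letting one worker take one step."""
    counter, workers = state
    for i, (pc, local) in enumerate(workers):
        if pc == "read":
            # Copy the shared counter into the worker's local variable.
            stepped, new_counter = ("write", counter), counter
        elif pc == "write":
            # Write back the (possibly stale) local copy plus one.
            stepped, new_counter = ("done", local), local + 1
        else:
            continue  # this worker has finished
        yield new_counter, workers[:i] + (stepped,) + workers[i + 1:]


def invariant(state):
    """The written-down expectation: once both workers finish, counter == 2."""
    counter, workers = state
    all_done = all(pc == "done" for pc, _ in workers)
    return not all_done or counter == 2


def check(initial, next_states, invariant):
    """Breadth-first search over reachable states.

    Returns a trace (list of states) ending in a violation, or None.
    """
    parents = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parents[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parents:
                parents[nxt] = state
                queue.append(nxt)
    return None


counterexample = check(INITIAL, next_states, invariant)
if counterexample is not None:
    for step in counterexample:
        print(step)
```

The search finds an interleaving in which both workers read before either writes, so one increment is lost. The relevant point for this essay is that the expectation lives in `invariant` as an explicit, checkable statement, and does not depend on someone remembering why the increment had to be atomic.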

¹ I don’t remember where I first heard this (I want to say Coding Horror, but if so, I couldn’t find it). But I found at least one blog post as well as tweets by @kathytafel and @recipromancer and others, as well as the odd HN comment.

² I first recall hearing this at Google, and sure enough it’s in Chapter 1 of “Software Engineering at Google”, though rewritten slightly:

This suggests the difference between software engineering and programming is one of both time and people.

It’s since circulated a bit in casual discussion on the Internet.

³ This is likewise in “Software Engineering at Google,” Chapter 10:

Most of the documentation an engineer at Google writes comes in the form of code comments…Tricks in code should be avoided, in any case, but good comments help out a great deal when you’re staring at code you wrote two years ago, trying to figure out what’s wrong.

It also shows up in a few blog posts on design docs, and is often tweeted:

Hey developers - document your damn code, if not for yourself, then for the people that have to follow. - @lonnieezell

I found similar tweets by @Crell, @franconchar, @danigrrl and others.