Steve's Tech Talk : Principle of Least Astonishment

Principle of Least Astonishment

I've been writing code for more than 25 years, if you count my initial foray as an Apple ][ coder as a teen (and I do). In my career, I've written a great deal of code (probably in the millions of lines at this point), and I've seen many changes in the practices of writing code, most for the better.

One change that is happening for the worse, I think, is API management. As software and software projects grow, APIs are getting huge. I've seen add-on product APIs that are more extensive than entire operating system APIs.

In today's market, time to develop an application is paramount. One needs to be able to build an application quickly (there are other priorities, but I'll talk about those some other time). If you are working with Someone Else's Code, you need to be able to get up and running with it quickly and be able to use it effectively.

This is where POLA (Principle of Least Astonishment) comes in. POLA means that code should be written so that it astonishes its clients as little as possible. There are several ways that a client can be astonished by your code:

Poor Nomenclature

This is probably the number one problem in designing code. How do you name something so that the name indicates what it will do and how it will do it? Natural language is rife with naming issues and ambiguities, but in most conversations we can interact and ask questions; a client reading your code usually can't ask you anything. If you have good documentation, you might be able to work with that, but how often is that the case? Name your functions, classes, methods, parameters, etc. well and you will have a good start.

Hungarian notation is not a substitute for good nomenclature. Consider the following C function declaration:

char *strcpy(char *lpszS1, char *lpszS2);

This is correct (ignoring that I have left out the ANSI const), but heinous in at least four ways:
1. strcpy is a horrible name. strcopy is marginally better, and stringcopy (or StringCopy or stringCopy or string_copy) is better still. One might argue for CopyString as best, since it matches the grammar of English, the language it is named in (i.e., verb-object).
2. lpszS1 and lpszS2, while correct Hungarian names, don't tell me at all what they're used for, so as a client I have to start asking myself whether the argument order is meant to match an assignment statement or is from/to. The traditional UNIX names dest and src are way better, although I might lean toward dst, since it is a consistent abbreviation form (see below).
3. Hungarian notation detracts from readability. It is unsightly at best, and once you "get used to it," it is human nature to ignore it. (If you doubt this and have kids, try an experiment: leave a lot of toys out, but put away or hide some. Two weeks later, hide another set and reintroduce the hidden ones. Chances are they won't miss the ones that were hidden, but they will certainly notice when those come out again.)
4. Hungarian notation is a deterrent to refactoring: if a variable's type changes, having to rename the variable is an impediment, and if you don't rename it, you now have misleading code, which is worse.

Don't abbreviate if you don't have to. One of Ken Thompson's major regrets in creating UNIX is the spelling of creat(), the call for making a file. The older excuses for abbreviation (disk quotas, time and effort spent typing, intentional obfuscation to protect intellectual property) have all been obviated by modern tools and equipment (cheap storage, IntelliSense, implementation and data hiding in OOP languages, and obfuscation tools).
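Applying those points to the strcpy declaration might look something like the sketch below: a verb-object name, parameters named for their roles and ordered like an assignment (destination first), and the const that the example above left out. CopyString is a hypothetical name, not a standard library function; the body simply does what strcpy does.

```c
/* Hypothetical, more readable counterpart to strcpy:
   - verb-object name (CopyString)
   - dst/src describe the role of each argument, in assignment order
   - const on src documents that the source is never modified
   Copies src, including its terminating '\0', into dst and returns dst. */
char *CopyString(char *dst, const char *src)
{
    char *start = dst;
    while ((*dst++ = *src++) != '\0')
        ;   /* copy each character until the terminator has been copied */
    return start;
}
```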

Don't be overly cute or obscure. I am most guilty of this because I love language. I like to find exactly the right word for representing an action, but when I choose a word which is only in the quotidian vocabulary of an elite few, I'm doing my audience a disservice.

Be specific. For example, I hate methods named Insert. I prefer the suite InsertBefore, Prepend, InsertAfter, and Append, even if they are all implemented with a private Insert.
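Here is a minimal sketch of that idea in C, assuming a hypothetical fixed-capacity list of ints (IntList and LIST_CAPACITY are invented for illustration). The general-purpose Insert is static, so it stays private to the file, while the exported functions each say exactly where the value goes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LIST_CAPACITY 16   /* hypothetical fixed capacity, to keep the sketch small */

typedef struct {
    int items[LIST_CAPACITY];
    size_t count;
} IntList;

/* Private, general-purpose insert: shifts later elements right and stores
   value at index. Returns false if the list is full or index is out of range. */
static bool Insert(IntList *list, size_t index, int value)
{
    if (list == NULL || list->count >= LIST_CAPACITY || index > list->count)
        return false;
    memmove(&list->items[index + 1], &list->items[index],
            (list->count - index) * sizeof list->items[0]);
    list->items[index] = value;
    list->count++;
    return true;
}

/* Public, specific entry points built on the private Insert. */
bool Prepend(IntList *list, int value)
{
    return Insert(list, 0, value);
}

bool Append(IntList *list, int value)
{
    return list != NULL && Insert(list, list->count, value);
}

/* index must name an existing element. */
bool InsertBefore(IntList *list, size_t index, int value)
{
    return list != NULL && index < list->count && Insert(list, index, value);
}

bool InsertAfter(IntList *list, size_t index, int value)
{
    return list != NULL && index < list->count && Insert(list, index + 1, value);
}
```

A caller never has to wonder whether "insert at 2" means before or after the element currently at 2; the name answers the question.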

Inconsistency

Be consistent. You don't have to be a fascist in your organization. Just get consensus on how you're going to name things and how they're going to operate.

Be consistent in argument usage. If you're going to model an assignment statement (see above) for one set of routines, do it across your entire suite. UNIX stdio is notoriously bad at this. fgets(char *s, int n, FILE *stm) puts the FILE at the end; fprintf(FILE *fp, const char *fmt, ...) puts it at the beginning. In my mind, the FILE should go first in all cases. In fprintf() it can't go last, because the variable arguments have to come last, but taking fprintf() as the model and changing the others makes sense. The name starts with f, so make the first argument a FILE.
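One way to retrofit that rule without touching stdio is a thin layer of wrappers. The names FileGetString and FilePutString below are hypothetical, not part of any standard library; they only reorder the arguments so that the FILE * always comes first, the way fprintf() has it.

```c
#include <stdio.h>

/* Hypothetical wrappers: every routine in the suite takes the FILE * first. */

/* Same behavior as fgets(buffer, size, stream). */
char *FileGetString(FILE *stream, char *buffer, int size)
{
    return fgets(buffer, size, stream);
}

/* Same behavior as fputs(text, stream). */
int FilePutString(FILE *stream, const char *text)
{
    return fputs(text, stream);
}
```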

Be consistent about behaviors. If one object's version of Parse restores the file pointer seek position on error, they all should.
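As a sketch of what such a shared contract could look like, assume a hypothetical suite of Parse routines (ParseInt is invented here) that all promise the same thing: on failure, the stream is left exactly where it was before the call. The position is saved with ftell() and restored with fseek().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parser. Suite-wide contract: on failure, the stream's
   position is restored to where it was when the call began. */
bool ParseInt(FILE *stream, int *out)
{
    long start = ftell(stream);
    if (start < 0)
        return false;   /* unseekable stream: we could not honor the contract */

    int value;
    if (fscanf(stream, "%d", &value) != 1) {
        fseek(stream, start, SEEK_SET);   /* restore the seek position on error */
        return false;
    }
    *out = value;
    return true;
}
```

If every other Parse routine in the suite follows the same rule, a client can try one parser, fall back to another on failure, and never have to ask which ones leave the stream in a half-consumed state.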

Side Effects

Don't put hidden side effects in your code as a "service" (like "fixing" bad inputs). You will have to live with those side effects for the life of your product or beyond.
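A small illustration, assuming a hypothetical setter (Volume and SetVolumePercent are invented for this sketch): rather than quietly clamping 150 to 100 as a "service," the function rejects the bad input and leaves its state unchanged, so the caller finds out about the mistake.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state object for the example. */
typedef struct {
    int percent;   /* valid range: 0..100 */
} Volume;

/* Rejects out-of-range input instead of silently "fixing" it.
   Returns false and leaves *v unchanged on bad input. */
bool SetVolumePercent(Volume *v, int percent)
{
    if (v == NULL || percent < 0 || percent > 100)
        return false;
    v->percent = percent;
    return true;
}
```

Had this clamped instead, every client that ever passed 150 would come to depend on the clamp, and you'd be stuck supporting it.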

Overloading Behaviors

Don't make functions that perform two separate and completely different tasks with one function/method.

The Win32 API is notoriously bad in this regard. There are a number of calls that follow the general form of:
int GetMumbleData(LPMUMBLEDATA lpData, int nBytes);
which return an error code on failure. Unless, of course, you pass in NULL for the data pointer. Then it returns the number of bytes needed for MumbleData. Yuck. For Pete's sake, it's just an API. Make another call like this:
int GetMumbleDataSize();

There's an additional problem. Let's say that you're a cavalier programmer who doesn't check malloc()'s return value for NULL (shame on you), and you wrote this code:

int size = GetMumbleData(NULL, 0);
LPMUMBLEDATA myMumbleData = (LPMUMBLEDATA)malloc(size);
int error = GetMumbleData(myMumbleData, size);
if (error != NOERROR)
    HandleHeinousError(error, "Getting mumble");

If malloc() returns NULL, error will be set to the number of bytes needed for a MUMBLEDATA and will then be treated as an error, for which there is probably no meaningful error code.

Nope - way better to have an API that does what it says. Then, heaven forbid, if you don't check for NULL, GetMumbleData should return a correct error code for an invalid pointer.
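Here's a minimal sketch of the split version, reusing the post's hypothetical GetMumbleData and GetMumbleDataSize names. The MUMBLEDATA layout, the error-code values, and the LoadMumbleData helper are all invented for illustration. Note that LoadMumbleData deliberately repeats the careless caller's mistake of not checking malloc(), yet still gets a meaningful error code instead of a byte count.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical data type and error codes standing in for the Win32-style example. */
typedef struct { int id; } MUMBLEDATA, *LPMUMBLEDATA;
enum { NOERROR = 0, ERROR_INVALID_POINTER = 1, ERROR_BUFFER_TOO_SMALL = 2 };

/* One job: report how many bytes a MUMBLEDATA needs. */
int GetMumbleDataSize(void)
{
    return (int)sizeof(MUMBLEDATA);
}

/* One job: fill in the caller's buffer. NULL is always an error, never a size query. */
int GetMumbleData(LPMUMBLEDATA lpData, int nBytes)
{
    if (lpData == NULL)
        return ERROR_INVALID_POINTER;
    if (nBytes < GetMumbleDataSize())
        return ERROR_BUFFER_TOO_SMALL;
    lpData->id = 42;   /* placeholder contents */
    return NOERROR;
}

/* Caller pattern: even without a NULL check after malloc(), a failed
   allocation surfaces as ERROR_INVALID_POINTER, not as a byte count.
   On success the caller owns *out and must free() it. */
int LoadMumbleData(LPMUMBLEDATA *out)
{
    int size = GetMumbleDataSize();
    LPMUMBLEDATA data = (LPMUMBLEDATA)malloc((size_t)size);
    int error = GetMumbleData(data, size);
    if (error != NOERROR) {
        free(data);   /* free(NULL) is a no-op */
        return error;
    }
    *out = data;
    return NOERROR;
}
```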

In the olden days, there were storage limitations and library limitations and so on, so some organizations put strict limits on the number of exported APIs and got around these edicts by doing the Win32 thing. If you're in this position (and I'd seriously question the motivation for being required to limit the number of calls to an arbitrary number), at least define a macro to make it appear that you've done the API correctly.
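A sketch of that macro workaround, assuming the organization permits only the single overloaded export. The types and error codes are hypothetical; the point is that the size query gets its own name in client code even though it still compiles down to the NULL trick.

```c
#include <stddef.h>

/* Hypothetical type and error codes for the example. */
typedef struct { int id; } MUMBLEDATA, *LPMUMBLEDATA;
#define NOERROR 0
#define ERROR_BUFFER_TOO_SMALL 2

/* The one permitted export, overloaded Win32-style:
   a NULL pointer turns it into a size query. */
int GetMumbleData(LPMUMBLEDATA lpData, int nBytes)
{
    if (lpData == NULL)
        return (int)sizeof(MUMBLEDATA);
    if (nBytes < (int)sizeof(MUMBLEDATA))
        return ERROR_BUFFER_TOO_SMALL;
    lpData->id = 42;   /* placeholder contents */
    return NOERROR;
}

/* Macro that gives clients the separate call they should have had. */
#define GetMumbleDataSize() GetMumbleData(NULL, 0)
```

This doesn't fix the underlying ambiguity (a NULL from malloc() is still read as a size request), but at least the client's code says what it means.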

Bad Layering

I see a good API or object model as one that is layered. The first, outermost layer should be the simplest and least granular. If a novice starts using your API, what's the path of least resistance to getting it working in the most common usage cases?

The second layer should include ways of changing/controlling/specifying the behavior of your code, and not only should it be trivial to implement the first layer in terms of this one, it damn well better be.

The third layer should include ways to override, extend, and otherwise hack with the operational model. The second layer should be straightforward, if not trivial, to implement in terms of this layer, and it should actually be implemented that way.

In many cases, I stop here: three layers are enough for 90% of cases. If they aren't, I add a fourth layer, which is never public and is the most granular. This layer, if it exists, handles all the primitive operations that the other layers use.

Here's how I conceptually model these layers in an OOP implementation:
1. A public default constructor that sets reasonable default parameters, plus a small to moderate number of public methods that perform actions.
2. Alternate public constructors that control the most obvious start-up parameters, plus public methods or properties for controlling behaviors.
3. Protected virtual methods, callback routines, interfaces, events, etc.
4. Private methods or linkage to private libraries.

Now consider this: if I've done my job right, 90% of my clients are going to have all their needs met by layers 1 and 2. Another 9% or so will be satisfied with layer 3. Any clients who still aren't satisfied mean that I've missed a key feature, I've botched the operational model, or they're using the wrong tool for the job. In addition, 90% of my work should be done in layers 3 and 4, 9% in layer 2, and 1% in layer 1.
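C has no constructors or protected methods, so the sketch below is only a loose translation of the four layers, using a hypothetical word-counting API (every name here is invented). Layer 1 uses defaults, layer 2 takes an options struct, layer 3 accepts a callback that redefines what separates words, and layer 4 is a static primitive that is never exported. Each layer is implemented in terms of the one beneath it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Layer 3 hook: lets clients redefine what separates words. */
typedef bool (*SeparatorFn)(char c, void *context);

/* Layer 4 (private): the primitive counting loop. */
static size_t CountTokens(const char *text, SeparatorFn isSeparator, void *context)
{
    size_t count = 0;
    bool inWord = false;
    for (; *text != '\0'; text++) {
        if (isSeparator(*text, context)) {
            inWord = false;
        } else if (!inWord) {
            inWord = true;   /* first character of a new word */
            count++;
        }
    }
    return count;
}

/* Layer 3: override the operational model with a callback. */
size_t CountWordsWithSeparator(const char *text, SeparatorFn isSeparator, void *context)
{
    if (text == NULL || isSeparator == NULL)
        return 0;
    return CountTokens(text, isSeparator, context);
}

/* Layer 2: options that control behavior, built on layer 3. */
typedef struct {
    const char *separators;   /* characters that end a word */
} WordCountOptions;

static bool IsInSet(char c, void *context)
{
    /* strchr would match the terminator itself, so exclude '\0' explicitly */
    return c != '\0' && strchr((const char *)context, c) != NULL;
}

size_t CountWordsEx(const char *text, const WordCountOptions *options)
{
    if (options == NULL || options->separators == NULL)
        return 0;
    return CountWordsWithSeparator(text, IsInSet, (void *)options->separators);
}

/* Layer 1: the path of least resistance, with sensible defaults, built on layer 2. */
size_t CountWords(const char *text)
{
    WordCountOptions defaults = { " \t\n" };
    return CountWordsEx(text, &defaults);
}
```

A novice only ever needs CountWords; someone counting comma-separated fields reaches for CountWordsEx; only the rare client with an unusual definition of a word needs the callback.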

Published Monday, February 27, 2006 10:17 AM by Steve Hawley
