Estimated reading time: 20 to 30 minutes
By Luís Möllmann (luism6n@gmail.com)
Special thanks to Alexandre Salle, Pietro Menna and other colleagues for the valuable discussions on this theme.
What is the purpose of writing errors when programming? And how can we help people using our code by writing good errors? This is a summary of a few discussions I had on the subject and a short survey on how this matter is handled in the Go language.
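Let's imagine you're writing code and you call a function that might return an error. What can you do with that error? There are two cases (ignoring the error is not an option!):
1. The code can act on the error at runtime: retry, stop reading input, take another branch, and so on.
2. The code can't do anything about it, so the error is passed up or logged for a person to deal with.
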
With that in mind, we can think about what a good error offers the caller:
1. Information the code can act on at runtime. For example, a `TemporaryError` informs the caller that they can retry the operation, an `EndOfFileError` informs the caller that they've reached the end of input, and so on. And,
2. Information that helps the person debugging the program understand what went wrong.

Notice these two traits mirror the two cases listed before. It's important to distinguish what the code can do about the error from what the IT guy can do about it, since these are very different enterprises. The next two sections go through these two traits in more detail.
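As a minimal sketch of the first trait, here is how a caller might branch on the error type at runtime. The `TemporaryError` and `EndOfFileError` types and the `readChunk` function are hypothetical, written only for this illustration; they do not come from any real library.

```go
package main

import (
	"errors"
	"fmt"
)

// TemporaryError (hypothetical) tells the caller that retrying may succeed.
type TemporaryError struct{ Op string }

func (e *TemporaryError) Error() string { return e.Op + ": temporary failure" }

// EndOfFileError (hypothetical) tells the caller that the input is exhausted.
type EndOfFileError struct{}

func (e *EndOfFileError) Error() string { return "end of file" }

// readChunk is a made-up operation: it fails temporarily on the first
// attempt, yields one chunk on the second, and then reports end of input.
func readChunk(attempt int) (string, error) {
	switch attempt {
	case 0:
		return "", &TemporaryError{Op: "read"}
	case 1:
		return "data", nil
	default:
		return "", &EndOfFileError{}
	}
}

// readAll decides what to do next based only on the type of the error.
func readAll() ([]string, error) {
	var chunks []string
	for attempt := 0; attempt < 10; attempt++ {
		chunk, err := readChunk(attempt)
		switch err.(type) {
		case nil:
			chunks = append(chunks, chunk)
		case *TemporaryError:
			continue // actionable: try again
		case *EndOfFileError:
			return chunks, nil // actionable: stop reading
		default:
			// Not actionable here: add context and pass it up.
			return chunks, fmt.Errorf("read failed: %v", err)
		}
	}
	return chunks, errors.New("too many attempts")
}

func main() {
	chunks, err := readAll()
	fmt.Println(chunks, err)
}
```

The caller never inspects the error message here; the type alone carries what the code needs to decide.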
Suppose you're calling a function called `Compute()`, which is a remote procedure call (RPC). Many things can go wrong. The stack trace for this call may look like this:
Level | Function Call |
---|---|
RPC | Compute() |
HTTP | Get() |
TCP | Recv() |
IP | OS level |
… | … |
If `Compute()` receives an error that originated in a corrupt IP packet or a failed DNS lookup, the caller of `Compute()` shouldn't be expected to recover from it. Just imagine writing scientific computing code and having `if` statements to deal with network issues. Yikes. It is the job of `Compute()` to distinguish which errors are recoverable and which ones are not. The caller can't deal with every single error from the lower layers, because the branching factor makes this undesirable or even infeasible.
Each function should summarize the errors from lower layers and give the caller only the information that is meaningful for answering questions like: What can I do about this error here in the code? Is this error actionable at runtime?
This relates to point 1 mentioned in the beginning of this text.
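To make the summarizing idea concrete, here is a sketch under stated assumptions. The low-level error values, the `httpGet` stand-in and `ErrUnavailable` are all invented for the example, and this `Compute` takes the failure to simulate as a parameter only so the sketch is runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// Low-level failures, as a hypothetical transport layer might report them.
var (
	errDNSLookup = errors.New("could not resolve host")
	errTimeout   = errors.New("connection timed out")
	errCorrupt   = errors.New("corrupt IP packet")
)

// ErrUnavailable is the only condition this Compute exposes for runtime
// decisions: the service can't be reached right now, so retrying may help.
var ErrUnavailable = errors.New("service unavailable")

// httpGet stands in for the lower layers and returns the simulated failure.
func httpGet(failure error) (int, error) {
	if failure != nil {
		return 0, failure
	}
	return 42, nil
}

// Compute collapses the many lower-layer errors into two outcomes the
// caller can tell apart: ErrUnavailable, or a generic unrecoverable error.
func Compute(failure error) (int, error) {
	result, err := httpGet(failure)
	switch {
	case err == nil:
		return result, nil
	case err == errTimeout:
		return 0, ErrUnavailable
	default:
		return 0, fmt.Errorf("Compute() failed: %v", err)
	}
}

func main() {
	for _, failure := range []error{nil, errTimeout, errDNSLookup, errCorrupt} {
		_, err := Compute(failure)
		switch {
		case err == nil:
			fmt.Println("ok")
		case err == ErrUnavailable:
			fmt.Println("retry later")
		default:
			fmt.Println("give up:", err)
		}
	}
}
```

The caller has one branch for the recoverable case and one for everything else, no matter how many distinct failures exist in the layers below.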
Let's change perspectives. You're not writing code anymore, but debugging code or looking at system logs. In this case, you want the error messages to be complete. You want to see every error, from the first layer that failed all the way up to the main routine of the program.
Imagine you're debugging the program that uses `Compute()` and you see this message:
Compute() failed: Connectivity error
You'll certainly be frustrated. Compare this to:
Compute() failed: Connectivity error: HTTP request failed: GET http://loclhost:8080: Could not resolve host: loclhost
Now you've found the culprit.
Doesn't this feel very different from what the code wants to see? It is now the responsibility of the error to carry all of its underlying causes, every useful detail. Notice, however, that the information "could not resolve host" does not help the program recover, but is extremely helpful for the programmer debugging it. Therefore, it is hidden in the message, not in an error code or error type.
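One way such a complete message can come about is for each layer to prepend its own context to the error it received. The sketch below assumes three hypothetical functions, `resolve`, `get` and `compute`, standing in for the layers; the message text is taken from the example above.

```go
package main

import "fmt"

// resolve stands in for the DNS layer and always fails in this sketch.
func resolve(host string) error {
	return fmt.Errorf("Could not resolve host: %s", host)
}

// get stands in for the HTTP layer; it prepends the request it was making.
func get(url, host string) error {
	if err := resolve(host); err != nil {
		return fmt.Errorf("HTTP request failed: GET %s: %v", url, err)
	}
	return nil
}

// compute stands in for the RPC layer; it prepends its own summary.
func compute() error {
	if err := get("http://loclhost:8080", "loclhost"); err != nil {
		return fmt.Errorf("Connectivity error: %v", err)
	}
	return nil
}

func main() {
	if err := compute(); err != nil {
		fmt.Printf("Compute() failed: %v\n", err)
	}
}
```

Every layer contributes one segment, so the final line reads from the most abstract description down to the root cause.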
To see if I could map those hypotheses to real applications, I sampled Go code. I went to the GitHub monthly trending Go repositories and downloaded (`go get`) about 20 projects and their dependencies. Thanks to Go's ubiquitous `if err ...` statements, I could randomly sample projects for points where errors are encountered. I used this shell command:
This recursively finds all `.go` files starting in the current directory, shuffles them, takes the first 100 and finds in them the lines containing "`if err `" (mind the space). Then it prints the match and the 5 subsequent lines. Phew. If you run this in your `$GOPATH/src` folder you'll see output that looks like this:
I ran this command many times, scanning the output. The goal was to see what happened when an error was found in code. I didn't do proper statistics, but after reading many of those snippets, it seemed to me the handling could be grouped into a few categories. You can run the above command and see if you find the same categories. I recommend doing so before reading any further, to avoid confirmation bias. You may also disagree about how the sampling was done, or think that the command was just plain wrong. In any case, I found the handling to fall into these four classes, ordered from most common to least common:
1. `return err`. Can't do anything nor add information, so just return.
2. `return fmt.Errorf(..., err)`. Can't do anything, but debug information is prepended to the underlying error.
3. `log.Fatalf(..., err)`, `log.Errorf(..., err)`, etc. This seems to be most common in source files for executables: places like `main.go` files, files with the same name as their parent folder, or files in a `cmd` folder.
4. `return newTypedError(message, err)`. The underlying error is wrapped in a new type of error, raising the level of abstraction. This seemed surprisingly uncommon.

I've also selected some examples which I thought were representative of the two uses for errors proposed here.
The following are code snippets to illustrate the concepts of cases 1 and 2, which I distinguished at the beginning of this text. They were all collected by the code sampling technique I mentioned, but I've made the formatting a bit nicer. The first comment tells you where to find the code.
This is the caller's perspective of case number 1. I considered here cases in which the code takes action due to the error.
Here, branching occurs on a special error type:
In this example, the code turns on a flag and continues processing:
This is the error's perspective of case number 1. I considered it abstracting when an error variable is collapsed into a single kind of error, when more than one bit of information is summarized in one error type, or when errors are reinterpreted before they're returned to the caller.
Below, a special condition receives a name:
This time an error is reinterpreted:
And, here, potentially many types of errors are collapsed into one type:
This is the caller's perspective of case number 2. Plenty of examples of simply logging the error were found. In these cases, the error message is exposed to whoever is looking at the terminal output. Notice how, even without context, you can see the errors seem to be "non-actionable": things related to hardware failure, invalid input, hard network problems, failed system calls, etc. The code can't recover from this, but the programmer sitting in the chair can plug in a network cable, optimize loops, fix the syntax error, and so on.
This is the error's perspective of case number 2. These were cases where the underlying error was not made available to the caller, but its information was appended to the error message. Again, you can notice patterns similar to the aforementioned ones. These errors are irrecoverable, so a generic error type is returned. The caller can't do anything, and the underlying cause belongs in log messages, not in the caller's code.
This is related to runtime decision making. When either a generic error is returned (the `return fmt.Errorf(..., err)` cases) or no error is returned (`return nil`), the information of the underlying error is collapsed into one bit. The caller of this function has two cases to distinguish: either an error occurred or it didn't.
When we define error types in Go (or exception types in Java, or special return values in C, etc.), we are giving the caller more information. A function that can return two kinds of error gives the caller three possible outcomes: errors of the first type, errors of the second type and no errors at all.
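The three outcomes can be sketched as follows. `ErrNotFound`, `ErrBusy` and `lookup` are hypothetical names chosen for the example; the two kinds of error are modelled as exported error values, which is one of several ways to expose them in Go.

```go
package main

import (
	"errors"
	"fmt"
)

// Two hypothetical kinds of error that lookup can return.
var (
	ErrNotFound = errors.New("not found")
	ErrBusy     = errors.New("busy")
)

// lookup is a made-up function whose result depends only on the key.
func lookup(key string) (string, error) {
	switch key {
	case "missing":
		return "", ErrNotFound
	case "locked":
		return "", ErrBusy
	default:
		return "value of " + key, nil
	}
}

// describe maps each of the three outcomes to a different action.
func describe(key string) string {
	_, err := lookup(key)
	switch err {
	case nil:
		return "use the value"
	case ErrNotFound:
		return "create the entry"
	case ErrBusy:
		return "retry later"
	default:
		return "unexpected: " + err.Error()
	}
}

func main() {
	for _, key := range []string{"a", "missing", "locked"} {
		fmt.Println(key, "->", describe(key))
	}
}
```

Had `lookup` returned only a generic error, `describe` would be left with two branches instead of three.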
Go has the special trait that errors are values. This allows for flexible error handling techniques. The standard library has many ways of creating and exposing errors to its users: using variables, types, methods, anonymous functions, etc. The principle is still the same: to convey the relevant information about the error to the calling code at the appropriate level of abstraction.
As we can see from the code samples, sometimes information is hidden from the caller. When we wrap the underlying error by doing `return fmt.Errorf(..., err)`, we are hiding the real cause of the error from the caller. Notice, however, that we're not hiding it from the person debugging the output of the code. They still see the underlying error in the final message, because that might be useful for debugging.
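A short sketch of this hiding, assuming a hypothetical `DNSError` type and made-up `lookup` and `fetch` functions. The cause is formatted with the `%v` verb, which keeps only its text. Since Go 1.13, `fmt.Errorf` also accepts `%w`, which would keep the cause inspectable through `errors.As`; that is the opposite design choice.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// DNSError is a hypothetical low-level error type.
type DNSError struct{ Host string }

func (e *DNSError) Error() string { return "could not resolve host: " + e.Host }

// lookup is a made-up lower layer that always fails with a *DNSError.
func lookup(host string) error { return &DNSError{Host: host} }

// fetch flattens the cause into text with %v, so the caller receives a
// generic error whose message still mentions the cause.
func fetch(host string) error {
	if err := lookup(host); err != nil {
		return fmt.Errorf("fetch failed: %v", err)
	}
	return nil
}

func main() {
	err := fetch("loclhost")

	// The calling code can no longer reach the *DNSError.
	var dnsErr *DNSError
	fmt.Println("caller can inspect cause:", errors.As(err, &dnsErr))

	// The person reading the logs still sees it.
	fmt.Println("message mentions cause:", strings.Contains(err.Error(), "loclhost"))
}
```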
When returning errors to the caller, it's important to distinguish the two kinds of decision making developers go through: at runtime and at debug time. Drawing on well-known themes in computer science, such as information hiding and abstraction, we must be careful not to conflate the two. Information that is useless to the caller (though possibly useful to someone debugging) should be hidden inside the error message, and only the information necessary for runtime decision making should be exposed.