Error Messages in Terraform

One of the changes in Terraform v0.12 is improved error messages. To my surprise, a slide containing one of these error messages drew quite a round of applause during the HashiConf 2018 keynote, so I thought I'd write a little about how they work.

Error: Unsupported Attribute

  on example.tf line 12, in resource "aws_security_group" "example":
  12:   description = local.wrong.foo
    |-----------------
    | local.wrong is "nope"

This value does not have any attributes.

There's a lot of information packed into these new error messages. Not all messages include all of these details, but the above example shows a situation where the full set of possible context is included.

Diagnostics

The in-memory model for a message like this is a tfdiags.Diagnostic value. Since Terraform deals with error messages from a variety of different sources that have different capabilities, tfdiags.Diagnostic is an interface type with a few different concrete implementations. The definition of the interface is as follows, in Terraform v0.12:

type Diagnostic interface {
	Severity() Severity
	Description() Description
	Source() Source
	FromExpr() *FromExpr
}

type Severity rune
const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

type Description struct {
	Summary string
	Detail  string
}

// (we'll look at Source and FromExpr in a later section)

The simplest possible diagnostic has a severity and a summary, with its detail string empty. This minimal sort of diagnostic would render as a one-liner:

Warning: Something non-fatal happened

In practice, only legacy messages that have yet to be updated have this minimal structure. There are lots of error messages in Terraform, and as of this writing we've not yet updated all of them.

For updated messages, a minimal diagnostic is a severity, a summary, and a detail message. The summary is intended to be a short, concise name for the problem that is searchable and memorable once you've seen it a few times. The detail message contains one or more paragraphs of more detailed information, ideally also including suggested resolution steps.

This simple form of message is used for errors that are "global" in nature, where no particular configuration construct is responsible for the problem. It renders like this:

Error: No configuration files

Apply requires configuration to be present. Applying without a configuration
would mark everything for destruction, which is normally not what is desired.
If you would like to destroy everything, run 'terraform destroy' instead.

Internally, this sort of message is represented as a "sourceless" diagnostic, which can be constructed with the tfdiags.Sourceless factory function. This uses a simple implementation of tfdiags.Diagnostic built into the tfdiags package.
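To make the sourceless case concrete, here is a minimal, self-contained sketch of how such a diagnostic could be modeled and rendered. It reuses the Severity and Description definitions quoted earlier, but the sourceless struct, the Sourceless function body, and the render helper are illustrative stand-ins written for this post, not the actual tfdiags code.

```go
package main

import "fmt"

type Severity rune

const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

type Description struct {
	Summary string
	Detail  string
}

// Diagnostic is a trimmed-down version of the interface shown earlier,
// leaving out the Source and FromExpr methods for brevity.
type Diagnostic interface {
	Severity() Severity
	Description() Description
}

// sourceless is a hypothetical implementation with no source location.
type sourceless struct {
	severity Severity
	summary  string
	detail   string
}

func (d sourceless) Severity() Severity { return d.severity }

func (d sourceless) Description() Description {
	return Description{Summary: d.summary, Detail: d.detail}
}

// Sourceless is a stand-in for the factory function mentioned in the text.
func Sourceless(severity Severity, summary, detail string) Diagnostic {
	return sourceless{severity: severity, summary: summary, detail: detail}
}

// render is an assumed helper: it produces the one-line form when the
// detail is empty, and adds the detail paragraph otherwise.
func render(d Diagnostic) string {
	label := "Error"
	if d.Severity() == Warning {
		label = "Warning"
	}
	desc := d.Description()
	out := fmt.Sprintf("%s: %s\n", label, desc.Summary)
	if desc.Detail != "" {
		out += "\n" + desc.Detail + "\n"
	}
	return out
}

func main() {
	fmt.Print(render(Sourceless(Warning, "Something non-fatal happened", "")))
	fmt.Print(render(Sourceless(Error, "No configuration files",
		"Apply requires configuration to be present.")))
}
```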

Returning Diagnostics

In Go programs it is conventional for a function that might fail to return an error result which is non-nil if a problem occurred, leading to the following usage pattern.

if err := doSomething(); err != nil {
    // Handle error
}

This pattern usually results in zero or one errors. Multiple errors can potentially be returned using special implementations of error that wrap other errors, but such patterns are not common. This approach also doesn't allow for returning warnings.

In Terraform we now use a slight variant of this convention for functions that can fail with user-facing error messages. Instead of error, we return a value of type tfdiags.Diagnostics, which is a named type for []tfdiags.Diagnostic that adds some additional helper methods.

Since a function will often accumulate multiple diagnostics before returning, a slightly different usage pattern emerges:

var diags tfdiags.Diagnostics

moreDiags := doSomething()
diags = diags.Append(moreDiags) // always append, in case there are warnings
if moreDiags.HasErrors() {
    return diags
}

// ... and then other operations that may further add to diags

// Finally, we return any accumulated diagnostics, which will still be nil
// in the happy path.
return diags

This pattern is used primarily in codepaths that deal with user input or which otherwise produce messages that directly address the user. The normal error pattern is used in cases where the error must be handled in some way by calling code, so we can continue to use the usual patterns for recognizing specific error types, etc. Functions that are at the inflection point between user-facing errors and internal errors are responsible for translating error return values into user-friendly diagnostics.
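The accumulation pattern shown above can be exercised end to end with a small stand-alone program. This is only a sketch: the Diagnostic struct is simplified to two fields, Append accepts only another Diagnostics value, and checkName, checkCount, and validate are hypothetical functions invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

type Severity rune

const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

// Diagnostic is a simplified stand-in for the real interface type.
type Diagnostic struct {
	Severity Severity
	Summary  string
}

// Diagnostics is a named slice type that adds helper methods.
type Diagnostics []Diagnostic

// Append returns diags extended with more. A nil receiver stays nil
// when there is nothing to add, which preserves the "happy path".
func (diags Diagnostics) Append(more Diagnostics) Diagnostics {
	return append(diags, more...)
}

// HasErrors reports whether any diagnostic has Error severity.
func (diags Diagnostics) HasErrors() bool {
	for _, d := range diags {
		if d.Severity == Error {
			return true
		}
	}
	return false
}

// checkName is hypothetical: it only ever produces a warning.
func checkName(name string) Diagnostics {
	if name != strings.ToLower(name) {
		return Diagnostics{{Severity: Warning, Summary: "Name is not lowercase"}}
	}
	return nil
}

// checkCount is hypothetical: it produces an error for negative counts.
func checkCount(count int) Diagnostics {
	if count < 0 {
		return Diagnostics{{Severity: Error, Summary: "Invalid count"}}
	}
	return nil
}

func validate(name string, count int) Diagnostics {
	var diags Diagnostics

	moreDiags := checkName(name)
	diags = diags.Append(moreDiags) // always append, in case there are warnings
	if moreDiags.HasErrors() {
		return diags
	}

	moreDiags = checkCount(count)
	diags = diags.Append(moreDiags)
	if moreDiags.HasErrors() {
		return diags
	}

	return diags // still nil if nothing was reported
}

func main() {
	for _, d := range validate("Web", -1) {
		fmt.Printf("%c: %s\n", d.Severity, d.Summary)
	}
}
```

Note that the warning from the first step survives alongside the later error, which is the reason for appending before checking HasErrors.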

Diagnostics must sometimes be returned through functions that use the standard Go error pattern, and so the diagnostics package offers a mechanism to wrap a set of diagnostics that includes at least one error in an error value, and later to unwrap that error and recover the original diagnostics:

func returnsError() error {
    if diags := doSomething(); diags.HasErrors() {
        return diags.Err()
    }
    return nil
}

func producesDiagnostics() tfdiags.Diagnostics {
    var diags tfdiags.Diagnostics

    if err := returnsError(); err != nil {
        diags = diags.Append(err)
    }

    return diags
}

The diags.Append function can recognize error wrappers produced by diags.Err and unpack them to recover the original diagnostics. It also recognizes some other common patterns for bundling multiple errors in a single error value.

This wrapping and unwrapping is non-ideal, but it was a pragmatic way to propagate diagnostics up the call stack with full fidelity even when some functions on the stack have not been updated to use diagnostics. Such updates may not even be possible in cases where control passes through a third-party library.

Source Location Information

For any diagnostic message that relates to a construct in a configuration file, it's important to include source location information to help the user quickly identify the cause.

The concept of diagnostics actually originally came from HCL 2, which has its own diagnostics model used to report parsing and evaluation errors. The tfdiags package has an implementation of tfdiags.Diagnostic that wraps an HCL diagnostic value and uses it to implement the Source() method, returning a Source value:

type Source struct {
	Subject *SourceRange
	Context *SourceRange
}

type SourceRange struct {
	Filename   string
	Start, End SourcePos
}

type SourcePos struct {
	Line, Column, Byte int
}

The Subject and Context fields of Source indicate two source code ranges related to the diagnostic. Subject is a small range indicating directly the problematic construct, while Context is an optional broader range that may be important to understand the problem.

Terraform uses both of these ranges to render the source snippet portion of a diagnostic message:

  on example.tf line 12, in resource "aws_security_group" "example":
  12:   description = local.wrong.foo

The Context range decides which lines are included in the snippet. If Context is nil, Subject is also used as the context. The Subject range is then shown underlined in the message to indicate specifically which part of the rendered source code is problematic.

Terraform retains in memory the source code of each configuration file loaded from disk, and so the rendered snippet is simply constructed of slices from that buffer.

Terraform will also, where possible, show the header of the top-level block that contains the Subject range, which is resource "aws_security_group" "example" in this case. This is implemented using a helper function in HCL itself, where it can search the syntax tree to find the outermost block that contains a given source range. This portion of the rendered message may be omitted if HCL is unable to find such a block.

Approximate Source Location Information

In most cases, configuration-related problems are reported from within HCL itself or from a configuration-specific portion of Terraform, and so precise source location information is available.

An exception to this rule is any error that is raised from within a provider plugin. Providers are given only the final result values from decoding the configuration, and so they don't have direct access to source code constructs or source location information.

To address this, tfdiags has a special concept of contextual diagnostics, which are diagnostic values that are incomplete on initial construction but that can be completed by a caller adding additional context. When a provider wishes to indicate a problem in a particular configuration attribute, it will return a contextual diagnostic with the attribute path, and then Terraform Core will resolve that by reverse-engineering the source location information from the configuration syntax tree objects.

It's not always possible to recover an exact source location from a path within the configuration value because configuration decoding is lossy. In that case, Terraform will make a best effort to find a suitable source construct to use, ultimately falling back on the containing configuration block itself if nothing more precise can be found.
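The best-effort fallback can be illustrated with a toy resolver. The block type, its Attrs map keyed by dotted paths, and the resolve function are all hypothetical. Terraform Core works against real configuration syntax tree objects, so treat this only as a sketch of the "most precise match, else the containing block" idea.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceRange is reduced to a file name and line span for this sketch.
type SourceRange struct {
	Filename           string
	StartLine, EndLine int
}

// block is a hypothetical, much-simplified view of a configuration
// block: its own range plus the ranges of whatever nested constructs
// could still be located after decoding.
type block struct {
	Range SourceRange
	Attrs map[string]SourceRange // keyed by dotted path, e.g. "ingress.0.from_port"
}

// resolve tries the full attribute path first, then successively
// shorter prefixes, and finally falls back to the block itself.
func resolve(b block, path []string) SourceRange {
	for n := len(path); n > 0; n-- {
		if rng, ok := b.Attrs[strings.Join(path[:n], ".")]; ok {
			return rng
		}
	}
	return b.Range
}

func main() {
	b := block{
		Range: SourceRange{Filename: "example.tf", StartLine: 10, EndLine: 20},
		Attrs: map[string]SourceRange{
			"description": {Filename: "example.tf", StartLine: 12, EndLine: 12},
			"ingress":     {Filename: "example.tf", StartLine: 14, EndLine: 18},
		},
	}

	// Exact match on the attribute.
	fmt.Println(resolve(b, []string{"description"}))
	// No range recorded for the nested path, so the nearest prefix wins.
	fmt.Println(resolve(b, []string{"ingress", "0", "from_port"}))
	// Nothing matches, so the whole block is reported.
	fmt.Println(resolve(b, []string{"tags", "Name"}))
}
```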

Showing Variable Values

The final interesting feature of the diagnostic messages is the contextual information about the values of various variables when the problem occurred:

    |-----------------
    | local.wrong is "nope"

This again exploits information from HCL's own diagnostic values, which are exposed via the FromExpr() method on tfdiags.Diagnostic, returning a pointer to a FromExpr value:

type FromExpr struct {
	Expression  hcl.Expression
	EvalContext *hcl.EvalContext
}

If the FromExpr() result is non-nil, the fields of this struct give access to the expression that was being evaluated when the problem occurred (Expression) and the evaluation context that was active (EvalContext).

This additional information is particularly useful for errors in expressions that are evaluated multiple times, such as the expressions within a for expression:

  foo = [for x in aws_instance.servers: x.vpc_security_group_ids[0]]

If an error occurs while evaluating x.vpc_security_group_ids[0], the EvalContext is what will tell us the value of x, hopefully allowing us to figure out which of the instances caused the problem.

Terraform uses the hcl.Expression object to discover which variables are mentioned in the offending expression, which it then includes in the rendered message.
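As a rough illustration of that last step, the sketch below looks up each variable mentioned by an expression in an evaluation context and prints it in the style of the earlier example. The exprInfo type is a hypothetical stand-in that replaces hcl.Expression and hcl.EvalContext with a plain list of names and a map of already-formatted values, so none of this reflects the real HCL API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// exprInfo is a hypothetical stand-in for the FromExpr data: the names
// the offending expression refers to, and the values that were in scope.
type exprInfo struct {
	Variables []string          // references found in the expression
	Values    map[string]string // evaluation context: name -> printable value
}

// renderValues lists only the variables the expression mentions, so
// unrelated values in the context do not clutter the message.
func renderValues(info *exprInfo) string {
	if info == nil {
		return "" // the diagnostic did not come from an expression
	}
	names := append([]string(nil), info.Variables...)
	sort.Strings(names) // stable output regardless of discovery order

	var lines []string
	for _, name := range names {
		if val, ok := info.Values[name]; ok {
			lines = append(lines, fmt.Sprintf("    | %s is %s", name, val))
		}
	}
	if len(lines) == 0 {
		return ""
	}
	return "    |-----------------\n" + strings.Join(lines, "\n") + "\n"
}

func main() {
	info := &exprInfo{
		Variables: []string{"local.wrong"},
		Values: map[string]string{
			"local.wrong": `"nope"`,
			"local.other": "12",
		},
	}
	fmt.Print(renderValues(info))
}
```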

At the time of writing this, the value-printing portion of the message renderer is still rather rudimentary. In later releases, we might enhance it with some additional heuristics to recognize common naming conventions in resource types so that it can, for example, also show the id value of an object that is involved in an error, even if the id itself was not used in the expression.

Summary

Terraform's diagnostic messages rely on gathering up various contextual information to use in message rendering. The goal is to include as much information as possible to allow you to figure out the cause of the problem and potential solutions.

Since native Go error values do not provide sufficient context, Terraform defines its own tfdiags.Diagnostics type for transporting diagnostic messages up the call stack. But to interoperate with other systems that follow the usual Go patterns, diagnostics can be wrapped up in an error and unpacked later.

HCL adopted this diagnostics pattern before Terraform did. HCL's diagnostics model is specific to HCL, while Terraform's is generalized to solve a variety of different challenges that arise from Terraform's architecture and its history.

The rendering of diagnostics in the output is likely to evolve over time as we get more experience with different error situations. The diagnostic model may also be exposed in a machine-readable way to text editor integrations to show problems directly in the editor.