One of the changes in Terraform v0.12 is improved error messages. To my surprise, a slide containing one of these error messages drew quite a round of applause during the HashiConf 2018 keynote, so I thought I'd write a little about how they work.
```
Error: Unsupported Attribute

  on example.tf line 12, in resource "aws_security_group" "example":
  12:   description = local.wrong.foo
    |-----------------
    | local.wrong is "nope"

This value does not have any attributes.
```
There's a lot of information packed into these new error messages. Not all messages include all of these details, but the above example shows a situation where the full set of possible context is included.
Diagnostics
The in-memory model for a message like this is a `tfdiags.Diagnostic` value. Since Terraform deals with error messages from a variety of different sources that have different capabilities, `tfdiags.Diagnostic` is an interface type with a few different concrete implementations. The definition of the interface is as follows, in Terraform v0.12:
```go
type Diagnostic interface {
	Severity() Severity
	Description() Description
	Source() Source
	FromExpr() *FromExpr
}

type Severity rune

const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

type Description struct {
	Summary string
	Detail  string
}

// (we'll look at Source and FromExpr in a later section)
```
The simplest possible diagnostic has a severity and a summary, with its detail string empty. This minimal sort of diagnostic would render as a one-liner:
```
Warning: Something non-fatal happened
```
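As a rough illustration, the following self-contained Go sketch implements a trimmed copy of the interface above and renders it. Only `Severity()` and `Description()` are kept; `Source()` and `FromExpr()` are left out. The `simpleDiag` type and `render` function are hypothetical names invented for this sketch, not Terraform's code, and the output layout is an assumption. It shows how an empty `Detail` collapses the message to a one-liner.

```go
package main

import (
	"fmt"
	"strings"
)

// Severity mirrors the rune-based severity type described above.
type Severity rune

const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

// Description holds the short summary and the optional longer detail.
type Description struct {
	Summary string
	Detail  string
}

// Diagnostic is a trimmed-down version of the interface, without Source
// and FromExpr, which this sketch does not need.
type Diagnostic interface {
	Severity() Severity
	Description() Description
}

// simpleDiag is a hypothetical concrete implementation.
type simpleDiag struct {
	severity Severity
	desc     Description
}

func (d simpleDiag) Severity() Severity       { return d.severity }
func (d simpleDiag) Description() Description { return d.desc }

// render formats a diagnostic as a one-line header, followed by the detail
// text (after a blank line) only when a detail is present.
func render(d Diagnostic) string {
	label := "Error"
	if d.Severity() == Warning {
		label = "Warning"
	}
	desc := d.Description()
	var b strings.Builder
	fmt.Fprintf(&b, "%s: %s", label, desc.Summary)
	if desc.Detail != "" {
		fmt.Fprintf(&b, "\n\n%s", desc.Detail)
	}
	return b.String()
}

func main() {
	d := simpleDiag{severity: Warning, desc: Description{Summary: "Something non-fatal happened"}}
	fmt.Println(render(d)) // Warning: Something non-fatal happened
}
```

Severity and description sit behind interface methods rather than in a plain struct. That lets different concrete types be rendered by the same code.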
In practice, only legacy messages that have yet to be updated have this minimal structure. There are lots of error messages in Terraform, and as of this writing we've not yet updated all of them.
For updated messages, a minimal diagnostic is a severity, a summary, and a detail message. The summary is intended to be a short name for the problem that is searchable and memorable once you've seen it a few times. The detail message contains one or more paragraphs of further information, ideally including suggested resolution steps.
This simple form of message is used for errors that are "global" in nature, where no particular configuration construct is responsible for the problem, and render like this:
```
Error: No configuration files

Apply requires configuration to be present. Applying without a configuration
would mark everything for destruction, which is normally not what is desired.
If you would like to destroy everything, run 'terraform destroy' instead.
```
Internally this sort of message is represented as a "sourceless" diagnostic, which can be constructed with the `tfdiags.Sourceless` factory function. This uses a simple implementation of `tfdiags.Diagnostic` built into the `tfdiags` package.
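A factory like `tfdiags.Sourceless` can be sketched as a function that takes a severity, a summary, and a detail, and returns an unexported type whose source ranges are always nil. The signature below is an assumption for illustration. The trimmed `Source` and `SourceRange` types are simplified stand-ins for the fuller ones discussed under source location information.

```go
package main

import "fmt"

type Severity rune

const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

type Description struct {
	Summary string
	Detail  string
}

// SourceRange and Source are trimmed stand-ins for the real location types.
type SourceRange struct{ Filename string }
type Source struct{ Subject, Context *SourceRange }

type Diagnostic interface {
	Severity() Severity
	Description() Description
	Source() Source
}

// sourceless is a hypothetical stand-in for the package's built-in
// implementation: it stores only severity, summary, and detail.
type sourceless struct {
	severity Severity
	summary  string
	detail   string
}

func (d sourceless) Severity() Severity { return d.severity }
func (d sourceless) Description() Description {
	return Description{Summary: d.summary, Detail: d.detail}
}

// Source always reports no location: both ranges are nil.
func (d sourceless) Source() Source { return Source{} }

// Sourceless builds a "global" diagnostic with no source location
// (assumed signature, for illustration only).
func Sourceless(severity Severity, summary, detail string) Diagnostic {
	return sourceless{severity: severity, summary: summary, detail: detail}
}

func main() {
	d := Sourceless(Error, "No configuration files",
		"Apply requires configuration to be present.")
	fmt.Printf("%c %s\n", d.Severity(), d.Description().Summary)
}
```

A renderer can check for nil `Subject` and `Context` to decide whether to print a source snippet at all.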
Returning Diagnostics
In Go programs it is conventional for a function that might fail to return an `error` result which is non-nil if a problem occurred, leading to the following usage pattern:
```go
if err := doSomething(); err != nil {
	// Handle error
}
```
This pattern usually results in zero or one errors. Multiple errors can potentially be returned using special implementations of `error` that wrap other errors, but such patterns are not common. This approach also doesn't allow for returning warnings.
In Terraform we now use a slight variant of this convention for functions that can fail with user-facing error messages. Instead of `error`, we return a value of type `tfdiags.Diagnostics`, which is a named type for `[]tfdiags.Diagnostic` that adds some additional helper methods.
Since a function will often accumulate multiple diagnostics before returning, a slightly different usage pattern emerges:
```go
var diags tfdiags.Diagnostics

moreDiags := doSomething()
diags = diags.Append(moreDiags) // always append, in case there are warnings
if moreDiags.HasErrors() {
	return diags
}

// ... and then other operations that may further add to diags

// Finally, we return any accumulated diagnostics, which will still be nil
// in the happy path.
return diags
```
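To make the accumulation pattern concrete, here's a self-contained sketch of a named slice type with `Append` and `HasErrors` helpers. The method names follow the ones used above, but this is a simplified stand-in: here `Append` only accepts another `Diagnostics` value, and `doSomething` and `doSomethingElse` are hypothetical steps.

```go
package main

import "fmt"

type Severity rune

const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

// diag is a hypothetical minimal diagnostic: just a severity and summary.
type diag struct {
	Severity Severity
	Summary  string
}

// Diagnostics is a named slice type, so a nil value means "no problems".
type Diagnostics []diag

// Append adds the given diagnostics and returns the extended slice,
// in the style of the built-in append.
func (diags Diagnostics) Append(more Diagnostics) Diagnostics {
	return append(diags, more...)
}

// HasErrors reports whether any diagnostic has error severity; warnings
// alone do not count.
func (diags Diagnostics) HasErrors() bool {
	for _, d := range diags {
		if d.Severity == Error {
			return true
		}
	}
	return false
}

// doSomething is a hypothetical step that succeeds with a warning.
func doSomething() Diagnostics {
	return Diagnostics{{Warning, "Deprecated argument"}}
}

// doSomethingElse is a hypothetical step that fails.
func doSomethingElse() Diagnostics {
	return Diagnostics{{Error, "Invalid value"}}
}

func run() Diagnostics {
	var diags Diagnostics

	moreDiags := doSomething()
	diags = diags.Append(moreDiags) // always append, to keep warnings
	if moreDiags.HasErrors() {
		return diags
	}

	// Later steps may add further diagnostics, including errors.
	diags = diags.Append(doSomethingElse())

	// Return everything accumulated; this is nil if nothing was reported.
	return diags
}

func main() {
	diags := run()
	fmt.Println(len(diags), diags.HasErrors()) // 2 true
}
```

A warning-only result gives `false` from `HasErrors()`, so the caller carries on while keeping the warning in `diags` for later display.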
This pattern is used primarily in codepaths that deal with user input or which otherwise produce messages that directly address the user. The normal `error` pattern is used in cases where the error must be handled in some way by calling code, so we can continue to use the usual patterns for recognizing specific error types, etc. Functions that are at the inflection point between user-facing errors and internal errors are responsible for translating `error` return values into user-friendly diagnostics.
Diagnostics must sometimes be returned through functions that use the standard Go `error` pattern. To support this, the diagnostics package offers a mechanism to wrap a set of diagnostics that includes at least one error inside an `error` value, and then later to unwrap that error and recover the original diagnostics:
```go
func returnsError() error {
	if diags := doSomething(); diags.HasErrors() {
		return diags.Err()
	}
	return nil
}

func producesDiagnostics() tfdiags.Diagnostics {
	var diags tfdiags.Diagnostics
	if err := returnsError(); err != nil {
		diags = diags.Append(err)
	}
	return diags
}
```
The `diags.Append` method can recognize error wrappers produced by `diags.Err` and unpack them to recover the original diagnostics. It also recognizes some other common patterns for bundling multiple errors in a single `error` value.
This wrapping and unwrapping is not ideal, but it was a pragmatic way to propagate diagnostics up the call stack in full fidelity even when some functions on the stack have not been updated to use diagnostics. Such updates may not even be possible in cases where control passes through a third-party library.
Source Location Information
For any diagnostic message that relates to a construct in a configuration file, it's important to include source location information to help the user quickly identify the cause.
The concept of diagnostics originated in HCL 2, which has its own diagnostics model used to report parsing and evaluation errors. The `tfdiags` package has an implementation of `tfdiags.Diagnostic` that wraps an HCL diagnostic value and uses it to implement the `Source()` method, returning a `Source` value:
```go
type Source struct {
	Subject *SourceRange
	Context *SourceRange
}

type SourceRange struct {
	Filename   string
	Start, End SourcePos
}

type SourcePos struct {
	Line, Column, Byte int
}
```
The `Subject` and `Context` fields of `Source` indicate two source code ranges related to the diagnostic. `Subject` is a small range that directly indicates the problematic construct. `Context` is an optional broader range that may be important for understanding the problem.
Terraform uses both of these ranges to render the source snippet portion of a diagnostic message:
```
  on example.tf line 12, in resource "aws_security_group" "example":
  12:   description = local.wrong.foo
```
The `Context` range decides which lines are included in the snippet. If `Context` is nil then `Subject` is also used as the context. The `Subject` range is then underlined in the message to indicate specifically which part of the rendered source code is problematic.
Terraform retains in memory the source code of each configuration file loaded from disk, and so the rendered snippet is simply constructed of slices from that buffer.
Terraform will also, where possible, show the header of the top-level block that contains the `Subject` range, which is `resource "aws_security_group" "example"` in this case. This is implemented using a helper function in HCL itself, which searches the syntax tree to find the outermost block that contains a given source range. This portion of the rendered message may be omitted if HCL is unable to find such a block.
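The block-header lookup amounts to a containment search over the file's top-level blocks. This sketch uses a hypothetical `block` type holding a header string and a half-open byte range. It returns the header of the top-level block whose range fully contains the subject, or `false` when none does, in which case the header line would be left out. HCL's real helper walks its own syntax tree rather than a flat list like this.

```go
package main

import "fmt"

// byteRange is a half-open [Start, End) span of byte offsets in a file.
type byteRange struct{ Start, End int }

func (r byteRange) contains(other byteRange) bool {
	return r.Start <= other.Start && other.End <= r.End
}

// block is a hypothetical top-level block: its rendered header and extent.
type block struct {
	Header string
	Range  byteRange
}

// outermostBlockHeader returns the header of the top-level block containing
// subject. Top-level blocks don't overlap, so the first match is the only one.
func outermostBlockHeader(blocks []block, subject byteRange) (string, bool) {
	for _, b := range blocks {
		if b.Range.contains(subject) {
			return b.Header, true
		}
	}
	return "", false
}

func main() {
	blocks := []block{
		{"locals", byteRange{0, 40}},
		{`resource "aws_security_group" "example"`, byteRange{41, 120}},
	}
	if h, ok := outermostBlockHeader(blocks, byteRange{58, 73}); ok {
		fmt.Printf("in %s:\n", h)
	}
}
```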
Approximate Source Location Information
In most cases, configuration-related problems are reported from within HCL itself or from a configuration-specific portion of Terraform, and so precise source location information is available.
An exception to this rule is any error that is raised from within a provider plugin. Providers are given only the final result values from decoding the configuration, and so they don't have direct access to source code constructs or source location information.
To address this, `tfdiags` has a special concept of contextual diagnostics, which are diagnostic values that are incomplete on initial construction but that can be completed by a caller adding additional context. When a provider wishes to indicate a problem in a particular configuration attribute, it will return a contextual diagnostic with the attribute path, and then Terraform Core will resolve that by reverse-engineering the source location information from the configuration syntax tree objects.
It's not always possible to recover an exact source location from a path within the configuration value because configuration decoding is lossy. In that case, Terraform will make a best effort to find a suitable source construct to use, ultimately falling back on the containing configuration block itself if nothing more precise can be found.
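A contextual diagnostic can be sketched as a diagnostic that carries an attribute path instead of a source range, plus an elaboration step run by the caller. Here a hypothetical map from dotted attribute paths to source ranges stands in for the configuration syntax tree. The sketch tries the full path, then successively shorter prefixes, since decoding may have lost the precise construct. As a last resort it falls back to the range of the containing block. The names and the map-based lookup are assumptions for illustration; Terraform Core works from the actual HCL body.

```go
package main

import (
	"fmt"
	"strings"
)

type SourceRange struct {
	Filename  string
	StartLine int
}

// attributeDiag is a hypothetical contextual diagnostic: a provider fills in
// Summary and Path, and Subject stays nil until the caller elaborates it.
type attributeDiag struct {
	Summary string
	Path    []string // e.g. ["settings", "0", "timeout"]
	Subject *SourceRange
}

// elaborate resolves Path against known source ranges, trying the longest
// prefix first and falling back to the enclosing block's range. It returns
// a completed copy and leaves the receiver unchanged.
func (d attributeDiag) elaborate(ranges map[string]SourceRange, block SourceRange) attributeDiag {
	for n := len(d.Path); n > 0; n-- {
		if rng, ok := ranges[strings.Join(d.Path[:n], ".")]; ok {
			d.Subject = &rng
			return d
		}
	}
	d.Subject = &block
	return d
}

func main() {
	ranges := map[string]SourceRange{
		"settings":   {"main.tf", 4},
		"settings.0": {"main.tf", 5},
	}
	block := SourceRange{"main.tf", 1}
	d := attributeDiag{Summary: "Invalid timeout", Path: []string{"settings", "0", "timeout"}}
	fmt.Println(d.elaborate(ranges, block).Subject.StartLine) // 5
}
```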
Showing Variable Values
The final interesting feature of the diagnostic messages is the contextual information about the values of various variables when the problem occurred:
```
    |-----------------
    | local.wrong is "nope"
```
This again exploits information from HCL's own diagnostic values, which are exposed via the `FromExpr()` method on `tfdiags.Diagnostic`, returning a pointer to a `FromExpr` value:
```go
type FromExpr struct {
	Expression  hcl.Expression
	EvalContext *hcl.EvalContext
}
```
If the `FromExpr()` result is non-nil, the fields of this struct give access to the expression that was being evaluated when the problem occurred (`Expression`) and the evaluation context that was active (`EvalContext`).
This additional information is particularly useful for errors in expressions that are evaluated multiple times, such as the expressions within a `for` expression:
```hcl
foo = [for x in aws_instance.servers: x.vpc_security_group_ids[0]]
```
If an error occurs while evaluating `x.vpc_security_group_ids[0]`, the `EvalContext` is what will tell us the value of `x`, hopefully allowing us to figure out which of the instances caused the problem.
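To see why capturing the evaluation context helps, here's a sketch that evaluates the equivalent of the `for` expression above over some hypothetical instance data. When an element's list is empty, the error records the value bound to `x` at that moment, playing the role that `EvalContext` plays in `FromExpr`. The `instance` and `evalError` types and the map-based context are simplifications invented for this sketch, not HCL's `hcl.EvalContext`.

```go
package main

import "fmt"

type instance struct {
	ID               string
	SecurityGroupIDs []string
}

// evalError is a hypothetical error that remembers the variables in scope
// when evaluation failed, loosely like FromExpr carrying an EvalContext.
type evalError struct {
	Summary string
	Vars    map[string]instance
}

func (e *evalError) Error() string { return e.Summary }

// firstGroupIDs mimics [for x in servers: x.vpc_security_group_ids[0]].
func firstGroupIDs(servers []instance) ([]string, *evalError) {
	var out []string
	for _, x := range servers {
		if len(x.SecurityGroupIDs) == 0 {
			return nil, &evalError{
				Summary: "Invalid index",
				Vars:    map[string]instance{"x": x}, // snapshot of x
			}
		}
		out = append(out, x.SecurityGroupIDs[0])
	}
	return out, nil
}

func main() {
	servers := []instance{
		{ID: "i-aaa", SecurityGroupIDs: []string{"sg-1"}},
		{ID: "i-bbb"},
	}
	if _, err := firstGroupIDs(servers); err != nil {
		fmt.Printf("%s (x.ID is %q)\n", err.Summary, err.Vars["x"].ID)
	}
}
```

Without the captured `x`, the error would only say that some index was invalid, leaving the user to guess which of the instances was at fault.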
Terraform uses the `hcl.Expression` object to discover which variables are mentioned in the offending expression, which it then includes in the rendered message.
At the time of writing, the value-printing portion of the message renderer is still rather rudimentary. In later releases, we might enhance it with some additional heuristics to recognize common naming conventions in resource types so that it can, for example, also show the `id` value of an object that is involved in an error, even if the `id` itself was not used in the expression.
Summary
Terraform's diagnostic messages rely on gathering up various contextual information to use in message rendering. The goal is to include as much information as possible to allow you to figure out the cause of the problem and potential solutions.
Since native Go `error` values do not provide sufficient context, Terraform defines its own `tfdiags.Diagnostics` type for transporting diagnostic messages up the call stack. But to interoperate with other systems that follow the usual Go patterns, diagnostics can be wrapped up in an `error` and unpacked later.
HCL adopted this diagnostics pattern before Terraform did. HCL's diagnostics model is specific to HCL, while Terraform's is generalized to solve a variety of different challenges that arise from Terraform's architecture and its history.
The rendering of diagnostics in the output is likely to evolve over time as we get more experience with different error situations. The diagnostic model may also be exposed in a machine-readable way to text editor integrations to show problems directly in the editor.