Terraform has long had a terraform validate subcommand, which performs basic validation: whether each reference matches a corresponding declaration, whether each resource type used is known to the provider it's supposed to belong to, and so on.

However, until Terraform v0.12, Terraform Core had no way to see the schema for each resource type. It would simply pass a data structure representing the raw configuration constructs over to the provider plugin and have the provider logic validate it.

There were several significant problems with that design. The first was that the form of configuration passed to the provider was so far removed from the original configuration constructs that its representation was ambiguous. Consider the following contrived aws_instance configuration:

resource "aws_instance" "example" {
  instance_type = "t2.micro"
  ami           = "ami-abcd"

  tags = {
    Name = "foo"
  }

  ebs_block_device {
    device_name = "xda1"
    volume_size = 8
  }
}

Terraform v0.11 and earlier applied a straightforward mapping from the configuration constructs to a Go dynamic data structure, which would produce something like the following in this case:

map[string]interface{}{
    "instance_type": "t2.micro",
    "ami":           "ami-abcd",
    "tags": map[string]interface{}{
      "Name": "foo",
    },
    "ebs_block_device": []interface{}{
        map[string]interface{}{
            "device_name": "xda1",
            "volume_size": 8,
        },
    },
}

Since Terraform Core did not know what configuration structure the provider expected, it was forced to apply general rules to produce a data structure like the above and then pass it over to the provider plugin for validation.

An important loss in this translation is any record of where in the source code each construct came from. If the provider detects a problem it cannot produce a good error message, because it has no idea where the problematic input came from.

A more subtle problem is that the mapping is ambiguous. Terraform produced a very similar data structure for a map attribute as for a nested block, even though the two constructs have quite different meanings in the provider schema: a nested block is more like a "struct" type in a programming language, whose shape is predefined and whose values the user merely populates, whereas a map attribute is freeform, allowing the user to choose both the keys and the values.

This led to users discovering loopholes in the validation that would allow certain incorrect patterns to pass through validation undetected:

locals {
  block_devices = [
    {
      device_name = "xda1"
      volume_size = "8"
    },
  ]
}

resource "aws_instance" "example" {
  instance_type = "t2.micro"
  ami           = "ami-abcd"

  tags = {
    Name = "foo"
  }

  # INCORRECT USAGE: ebs_block_device is a nested block, not an attribute
  ebs_block_device = "${local.block_devices}"
}

Terraform Core would again apply its simple mapping to this, producing a structure identical to the one above, which would then pass validation and give the impression of allowing dynamically-generated nested blocks. Unfortunately this was not a perfect loophole: the representations of nested blocks and maps are not fully compatible, so users would often find that a configuration like the above appeared to work at first, only to trap them later when the abstraction leaked:

# (now using an input variable instead of a local value)
variable "block_devices" {
    type = "list"
}

resource "aws_instance" "example" {
  instance_type = "t2.micro"
  ami           = "ami-abcd"

  tags = {
    Name = "foo"
  }

  # INCORRECT USAGE: ebs_block_device is a nested block, not an attribute
  ebs_block_device = "${var.block_devices}"
}

As soon as a caller of this module attempted to pass in a block device list derived from something not known until apply time, a very strange error would appear:

aws_instance.example: ebs_block_device: should be a list

This error is very confusing, because var.block_devices is a list. It arose because the configuration triggered one of the subtle differences between nested blocks and map values: a map value can be "unknown" during the plan phase, while nested blocks cannot. Since unknown values should never appear in that position, there was no code in the provider to handle that case, and so it fell through into the general type-checking codepath and produced a misleading error.

The correct error here would be to report that ebs_block_device is a nested block type rather than an attribute, but neither Terraform Core nor the provider had enough information on its own to report it this way.

Schema in Terraform Core

For Terraform v0.12 we addressed this with a significant architectural change: Terraform Core now requires the provider to produce a manifest of all of the resource types it supports and the configuration constructs they support. This new requirement is the original reason for the plugin protocol change for Terraform v0.12.

The wire protocol for this is essentially a protobuf serialization of some of the types in Terraform's configschema package:

message Schema {
    message Block {
        int64 version = 1;
        repeated Attribute attributes = 2;
        repeated NestedBlock block_types = 3;
    }

    message Attribute {
        string name = 1;
        bytes type = 2;
        string description = 3;
        bool required = 4;
        bool optional = 5;
        bool computed = 6;
        bool sensitive = 7;
    }

    message NestedBlock {
        enum NestingMode {
            INVALID = 0;
            SINGLE = 1;
            LIST = 2;
            SET = 3;
            MAP = 4;
        }

        string type_name = 1;
        Block block = 2;
        NestingMode nesting = 3;
        int64 min_items = 4;
        int64 max_items = 5;
    }

    // The version of the schema.
    // Schemas are versioned, so that providers can upgrade a saved resource
    // state when the schema is changed.
    int64 version = 1;

    // Block is the top level configuration block for this schema.
    Block block = 2;
}

This representation of schema is, roughly speaking, a subset of the schema representation used internally within the provider SDK. It includes only the features required to interpret the configuration constructs into a predictable shape for the provider to validate and process.

In particular, it distinguishes between attributes and nested blocks, which addresses the problem described above. Terraform Core itself now verifies that the configuration constructs match the schema, performing value type conversions as necessary, and passes to the provider a value that conforms exactly to the given schema for any further validation of specific attribute values.

Because Terraform Core has access to the full-fidelity configuration objects, it can include an exact source location in the error message for any problem it finds.

Validating References

A second significant problem with the architecture in Terraform v0.11 and prior is that it only accounted for validation of configuration blocks. Resource type schemas are also relevant in validating references, which is where a resource instance is used in an expression elsewhere in configuration:

output "base_url" {
  value = "https://${aws_instance.example.public_ip}:8443/"
}

Terraform v0.11 could verify that aws_instance.example refers to a resource declared elsewhere in the module, but without access to the resource type schema Terraform Core could not do much with the .public_ip attribute access.

To deal with this, Terraform v0.11 deferred this particular check until the plan step, by which point the provider had returned a strange "flatmap" representation of the planned final state of the resource instance:

{
  "ami": "ami-abcd",
  "ebs_block_device.#": "1",
  "ebs_block_device.14353453.device_name": "xda1",
  "ebs_block_device.14353453.volume_size": "8",
  "instance_type": "t2.micro",
  "public_ip": "(a special unknown value sigil)",
  "tags.%": "1",
  "tags.Name": "foo"
}

Terraform v0.11 could therefore consult this description, see that there is indeed a public_ip key, and accept the expression as valid.

Unfortunately this approach was lossy and created another ambiguity: Terraform Core couldn't tell the difference between an attribute not being set for a particular instance and it not being part of the schema at all, so it would always return a generic error message like this:

output.base_url: Resource 'aws_instance.example' does not have attribute
'public_ip' for variable 'aws_instance.example.public_ip'

Terraform v0.12 uses its new access to the resource type schema to check references like these too, catching problems during the validate step rather than the plan step and allowing unassigned exported attributes to evaluate to null rather than returning the generic error.

Access to the schema also allows Terraform to produce more precise and more helpful error messages in several cases. For example, given the following (admittedly contrived) output declaration:

output "volume_size" {
  value = aws_instance.example.ebs_block_device.volume_size
}

Terraform Core can now detect that ebs_block_device is a nested block type represented as a set of objects, and so produce a specific error message stating exactly that, along with a suggestion for a different way to write this:

Error: Cannot index a set value

  on example.tf line 21, in output "volume_size":
  21:   value = aws_instance.example.ebs_block_device.volume_size
    |-----------------
    | aws_instance.example.ebs_block_device is a set of object

Block type "ebs_block_device" is represented by a set of objects, and set
elements do not have addressable keys. To find elements matching specific
criteria, use a "for" expression with an "if" clause.

Admittedly, this particular situation requires a complex enough change that the error message is not as helpful as it might be, but it at least identifies the exact problem and gives some keywords for finding the solution, which is to rewrite this as a for expression matching by device name:

output "volume_size" {
  value = [
    for v in aws_instance.example.ebs_block_device:
    v.volume_size if v.device_name == "xda1"
  ][0]
}

Future Uses of Schema

While the initial motivation for giving Terraform Core access to schemas was for validation, this is also a building block for several other interesting features that we plan to explore more in forthcoming releases.

The most obvious next step is to offer a Terraform CLI command to export the schemas for a particular configuration, so that text editor integrations can offer precise autocomplete suggestions; today such integrations often ship with their own internal database of arguments and blocks covering only a subset of providers. All of the necessary data is available in Terraform Core, but as with any external machine-readable format we want to take the time to design it well, to minimize the need for breaking changes to it in future.

A more interesting area we want to explore in later releases is to make better use of the "sensitive" flag that providers can set in order to propagate the sensitive trait through expressions via static analysis, so that any value derived from a sensitive value is also considered sensitive. This will not be a trivial thing to implement since it would be the first example of global static analysis in Terraform, but it should allow sensitive values to be used more safely in Terraform.