Evolving the Terraform Language

The most significant goal for Terraform 0.12 was to take what we've learned about Terraform use-cases over the years and revamp the Terraform language to be a better foundation for expressing those use-cases. This article is an attempt to capture some of the design goals and design process that went into these language improvements, and to argue in particular that the new features added are a natural extension of what came before, and not a departure from the original language goals as some have claimed.

The essence of the Terraform language

The word that has most consistently been applied to the Terraform language from the outset is "declarative". Declarative programming is a paradigm in which we describe a desired outcome rather than a specific process or algorithm to achieve that outcome. A Terraform operation itself has side-effects (creating, updating, or deleting remote objects) but they come not from the program itself but from the plan.

Terraform's plan step evaluates the input program (which we call the "configuration", since it seems more appropriate for our audience) to obtain a description of the desired result, and compares that with the saved state to find any differences. Terraform Core then, with the help of providers, generates a set of create, update, and delete actions which can be performed to converge on the desired result.

"Declarative language" is not the opposite of "programming language". In this context, "declarative" is better contrasted with "imperative", which is a more common programming paradigm where the programmer describes a sequence of steps that imply a particular result.

Although declarative languages do not specify a sequence of steps, they can still model computation. In Terraform's language, we model computation through expressions that transform a result attribute from one resource to populate an argument of another.
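As a minimal sketch of that idea, written in the Terraform 0.12 syntax, the following configuration derives a subnet's address range from the address range of its VPC. The resource names and the specific CIDR values are illustrative assumptions:

```hcl
resource "aws_vpc" "example" {
  cidr_block = "10.1.0.0/16"
}

resource "aws_subnet" "example" {
  # A plain reference to a result attribute of another resource.
  vpc_id = aws_vpc.example.id

  # An expression that transforms an attribute of the VPC to populate
  # an argument of the subnet (10.1.1.0/24 for the VPC range above).
  cidr_block = cidrsubnet(aws_vpc.example.cidr_block, 8, 1)
}
```

No ordering is stated anywhere in this configuration. Terraform infers from the references that the VPC must exist before the subnet can be created.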

Prior to Terraform 0.12 the computation features were limited and inconsistent, as a result of having evolved gradually to meet real use-cases rather than having been designed holistically. Terraform 0.12 retains this declarative computation essence, but equips the developer with more consistent behaviors and some new features that aim to help with declarative composition.

Design goals

Before embarking on the design of any particular new features, we thought it important to agree on some general design principles so that the resulting language would feel self-consistent, predictable, and intuitive. Language design is always subjective, and every design decision comes with tradeoffs. Design principles help us to prioritize different possible solutions to each design problem.

The following are the most important design principles that inform the design of the evolved Terraform language:

Prioritize the reader over the writer: Code is generally read far more times than it is written. Code will often stick around longer than the person who originally wrote it.
Explicit is better than implicit: You shouldn't need to be a Terraform expert to understand the intent of a Terraform configuration written by someone else. Implicit behavior is obvious only to those who know it is there. It's better to write a little more code so that your intent is clearer to the next reader.
Simple things should be simple, and complex things should be possible: The earliest uses of Terraform were relatively-simple descriptions of specific, isolated infrastructure. As the language has grown, it's become increasingly common to describe more complicated systems with many inter-connected components. We want to retain the core simplicity of the language for more straightforward situations, but also allow for more complex scenarios without sacrificing readability and clarity.

There were of course numerous other, smaller principles at play as we evaluated and designed each language feature, but I consider the three above to be the most pervasive and most important ones. Pragmatism requires that we sometimes make tradeoffs that don't emphasize our principles, and time will tell whether the decisions we made live up to these principles in practice.

In the remaining sections, I'll discuss some of the specific new features or changes to the Terraform language in v0.12 and how they fit in with the design principles.

First-class Expressions

In Terraform's earliest versions, its language was based on libucl, combined with a simple template language within UCL strings.

Eventually libucl was replaced with HCL primarily because linking a C library into Terraform made build and distribution complicated, but with it came some trivial syntax adjustments such as separating arguments with newlines rather than with semicolons. HCL was still used only for the overarching structure though, and the simple template language within strings grew over time to include basic computation through operators and functions.

This template language was eventually separated from Terraform to create HIL, a string-interpolated expression language that is crucial in Terraform's ability to describe larger systems that span across platforms through composition of individual resources.

However, as Terraform configurations grew to make greater use of compound data structures like lists and maps, the separation between the structural language HCL and the expression language HIL became pretty rough. The syntax for building a list was different depending on whether you were writing a constant list in HCL or a dynamic list in HIL:

  list_argument = ["a", "b", "c"]
  list_argument = "${concat(var.other_list, list("b", "c"))}"

The idea of a string interpolation returning a non-string result was also highly counter-intuitive, leading to lots of confusion for those who were new to Terraform and unfamiliar with these quirks.

One of the earliest decisions was to merge the structural and expression languages together to unify this syntax. The two most significant implications are that the same list-construction syntax is used everywhere and that the string interpolation syntax is now needed only for actually combining strings together:

  list_argument = concat(var.other_list, ["b", "c"])

We think this new form is easier to read and understand, particularly for those who are less familiar with the Terraform language. The expression syntax uses punctuation that will be familiar to those who are comfortable with a number of different popular general-purpose languages.
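To illustrate the second implication, here is a small sketch of where interpolation is still needed and where it is not. The variable and local value names are hypothetical, chosen only for this example:

```hcl
variable "environment" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

locals {
  # Interpolation is still the way to build one string from several parts.
  name_prefix = "web-${var.environment}"

  # A reference on its own needs no quotes or ${ ... } wrapper,
  # and the value keeps its list type.
  all_subnet_ids = var.subnet_ids
}
```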

This particular decision was barely a tradeoff at all. It did lead to some other knock-on effects, however. In particular, because braces are now used both for nested blocks and for map expressions, the new language must now be more particular about the distinction between arguments and nested blocks, where before the language would usually figure out what the user meant:

  map_argument = {
    "a" = "b"
  }
  nested_block {
    a = "b"
  }

This particular change will require an ongoing effort to correct mistakes in documented examples that the old language accommodated over the years, and to improve the precision of documentation that was previously unclear about whether a particular construct is an argument or a nested block.

A single number type

The expression language in Terraform 0.11 and earlier had two numeric types -- integer and float -- and would perform arithmetic operations differently depending on which was used. However, the Terraform language does not generally use explicit type declarations, and so it was often unpredictable whether a particular number would be understood as an integer or a float, particularly after passing through other layers that may convert it implicitly.

After first attempting to define some predictable behavior for integer vs. float and conversions between them, we came to realize that there isn't really a strong need for this distinction in the Terraform language. Integers and floating point numbers are used in different situations as a performance tradeoff in traditional software, but performance at that level is irrelevant in Terraform since any minor difference is dwarfed by the time spent waiting for remote APIs to respond to requests.

Instead, we concluded that the most straightforward answer was to eliminate the distinction and have a single number type. Remote APIs often do specify a machine numeric type though, so we needed a number system that was at least a superset of 64-bit integers and 64-bit floats so that the full range of both of these types could be expressed.

Terraform v0.12 uses an arbitrary-precision floating point implementation, allowing much higher range and precision than either of the standard machine types. This means that arithmetic operations always behave in a consistent way, and the results can instead be converted to machine types just in time to be passed to remote APIs, detecting and reporting any range errors at that point.

An important consequence of this is that the division operator now always performs floating-point division. To perform integer division, the floor function must be used on the result:

  int_argument = floor(a / b)

This is an example of "explicit is better than implicit". Integer division is used for practical reasons in computing, but conventional arithmetic is with real numbers and so that is the most familiar default. The inclusion of the floor call makes it explicit that something unusual is going on.
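A short sketch of the difference, using literal operands so the results are easy to check (the local value names are hypothetical):

```hcl
locals {
  ratio = 5 / 2        # 2.5: the division is not truncated
  whole = floor(5 / 2) # 2: truncation is requested explicitly
}
```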

`for` expressions

A common struggle with more complicated configurations in Terraform 0.11 was when one resource required a list or map argument value in a slightly different form than was produced by some other resource attribute, but the Terraform language lacked any robust mechanisms for projecting one collection to another, element-by-element.

The Terraform language already had numerous operators and functions for manipulating scalar values, and so the more limited handling of collections was unfortunate, and often led to very convoluted workarounds that significantly hurt readability.

A few of the options we considered to improve on this situation were:

Continue to add more built-in functions that perform some specialized transform from one list to another, continuing the trend started by formatlist of having both a scalar form and a list form of each transform function.
Extend the expression language to support passing expressions to functions as data and then implement generic functions like map(list, expr) and filter(list, expr).
Add list- and map-comprehension syntax to combine various collection manipulations together into a single construct and support both lists and maps with similar syntax.

The first option was eliminated quite quickly: when we wrote out a list of the functions we might add, it grew very large while still leaving various use-cases unmet.

The decision between the other two was harder, and indeed the second option was in the lead for a while. Ultimately, though, we felt that it would be confusing to overload normal function syntax to create these "special" functions. We also felt that list comprehensions would be familiar to Python users, and that our syntax looked enough like a for-each loop in other languages for most readers to make a good guess at what it does even if they have not seen such a construct before.

The final result was a pair of constructs that can both map and filter at once, and can also (in the case of map comprehensions) group by key. In the interests of keeping simple things simple, we tend to avoid using computer science jargon in our user model, so these constructs are instead called "for expressions":

  list_argument = [for x in var.other_list: upper(x)]
  map_argument  = {for inst in aws_instance.example: inst.availability_zone => inst.id...}

The group-by-key syntax using ... here is, in the end, not ideal: it's perhaps a little too on the "arcane" side. This is a downside of packing so much functionality into a single construct, but I expect this particular grouping feature won't be as commonly-used as the other for expression capabilities in practice.
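The following sketch shows filtering and grouping side by side. It uses a hypothetical literal list of objects in place of real resource instances, so that the results can be stated in comments:

```hcl
locals {
  # Hypothetical input data, standing in for values that would normally
  # come from variables or resource attributes.
  instances = [
    { id = "i-aaa", zone = "us-east-1a" },
    { id = "i-bbb", zone = "us-east-1b" },
    { id = "i-ccc", zone = "us-east-1a" },
  ]

  # Map and filter at once: the "if" clause drops elements, and the
  # expression before it transforms the ones that remain.
  # Result: ["I-AAA", "I-CCC"]
  upper_ids_in_1a = [for inst in local.instances : upper(inst.id) if inst.zone == "us-east-1a"]

  # Group by key: without "...", two elements producing the same key
  # would be an error. With it, values sharing a key are collected
  # into a list.
  # Result: { "us-east-1a" = ["i-aaa", "i-ccc"], "us-east-1b" = ["i-bbb"] }
  ids_by_zone = { for inst in local.instances : inst.zone => inst.id... }
}
```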

Those familiar with Python's comprehension syntax will also note that we've inverted its usual ordering: in the Terraform language the for keyword leads the construct, for two reasons. Subjectively, this ordering looks more like a for loop in an imperative programming language, and is thus hopefully familiar to more readers. More practically, having the declaration of the temporary variable(s) first creates an opportunity for a text editor extension to potentially offer auto-complete features when working with those symbols.

Resources as objects

The final significant change in Terraform v0.12 is that whole resource instances can now be manipulated as object values within the language. As with for expressions, the treatment of resource instances as a special opaque construct in expressions made it hard to apply other language features to resources, and thus tended to lead to the inclusion of bespoke features that worked only for resources, such as the "splat syntax":

  instance_ids = aws_instance.example.*.id

In Terraform v0.11, .* is not actually an operator, but rather the language runtime sees that whole reference as a single construct and looks into the state to find all of the instances of aws_instance.example and extracts their ids. This special construct was necessary because aws_instance.example alone was not a valid expression, and so it could not be combined with other operators like the index operator:

  instance_id = aws_instance.example[count.index].id

Instead, we required some rather-convoluted expressions building on the splat syntax, such as the following equivalent of the above:

  instance_id = aws_instance.example.*.id[count.index]

Once this was combined with other language features, the result inevitably degenerated into unreadable gibberish, confusing to any future maintainer of the code.

By making aws_instance.example a normal value in Terraform v0.12, the more-familiar index syntax above can be used, and resources can also be used with other normal language features like functions and for expressions.
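As a sketch of what this enables, the configuration below passes the resource to a function, indexes it, and iterates over it with a for expression. The variable name, the output names, and the instance settings are assumptions made for illustration:

```hcl
variable "ami_id" {
  type = string
}

resource "aws_instance" "example" {
  count         = 3
  ami           = var.ami_id
  instance_type = "t2.micro"
}

output "instance_count" {
  # With count set, aws_instance.example is a list of objects,
  # so it can be passed to an ordinary function.
  value = length(aws_instance.example)
}

output "first_instance_id" {
  # The index operator applies directly to the resource.
  value = aws_instance.example[0].id
}

output "private_ips_by_id" {
  # A for expression can iterate over the instances like any other list.
  value = { for inst in aws_instance.example : inst.id => inst.private_ip }
}
```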

Conversely, we've seen many users over the years confused that this splat syntax looks like an operator but yet only works in that one context. We also took the opportunity in 0.12 to generalize the splat operator to work with any list value, as these users had expected. Now it truly is an operator, and can be thought of as a shorthand for for expressions in the common situation of extracting an attribute value from each object in a list of objects:

  instance_ids = aws_instance.example.*.id
  instance_ids = [for inst in aws_instance.example: inst.id]
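The same equivalence holds for a list that has nothing to do with resources. A minimal sketch, using a hypothetical list of user objects:

```hcl
locals {
  users = [
    { name = "alice", role = "admin" },
    { name = "bob", role = "dev" },
  ]

  # Both expressions produce ["alice", "bob"].
  names_splat = local.users.*.name
  names_for   = [for user in local.users : user.name]
}
```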

To be honest, if this construct hadn't previously existed in the language it probably wouldn't have passed our design principles, since it's questionable whether it passes the "explicit is better than implicit" test. This is an example of pragmatism: the construct was very well established and often used in the existing language, so removing it entirely was not a practical option. Instead, we sought to properly define its behavior and generalize it.

Conclusion

My goal with the above was both to show how we evaluated and prioritized features in the Terraform language for v0.12 and to illustrate that these new features do not represent a paradigm shift or regression towards imperative programming.

The essence of the Terraform language is retained, and these new features just fill in some gaps where the original language design fell short of its goals. We expect that this will be the last significant shake-up of the Terraform language for the foreseeable future, but in the event that any new language-level features are considered, we'll use the same design principles to evaluate and refine them.