The Future of the Terraform Language

A few months ago the Terraform team released Terraform CLI v1.0 and, with it, published the Terraform v1.0 Compatibility Promises. The promises describe in detail what turns out to be a rather complicated problem in practice, because the product that we think of as "Terraform" is built from several autonomous parts.

However, a key intent of the promises, details aside, is that if someone writes a module for Terraform v1.0 then that module should continue to behave in a manner consistent with the author's intent, without modification, in all future v1.x versions.

The Terraform team actually has a stronger goal than that, though, because we're assuming that in practice there can never be an "ecosystem break" for the Terraform language ever again. Terraform v0.12 was our main opportunity to get a "do-over" of a lot of the accumulated accidental design of the Terraform language. That investment does seem to have paid off, but it came at the cost of a lot of upgrade work on the part of shared module authors and provider developers.

Given this, it seems likely to me (though this is not a guarantee) that any subsequent "Terraform CLI v2.0" will be motivated by breaking changes to the CLI workflow — that is, the commands we run to realize configuration changes against real infrastructure — and not by breaking changes to the Terraform language.

We still have lots of hopes and dreams for the Terraform language itself though, and inevitably some of those goals will come into conflict with the existing language design in ways that will require redefining some existing syntax. Because of that, we've left ourselves some room in the Terraform v1.0 design so that we can keep moving forward cautiously, rather than becoming totally hamstrung by the language as defined in Terraform v1.0.

Reserved Words and Extension Points

A common conflict when evolving most programming languages is any situation where externally-defined syntax coexists with language-defined syntax in a way that requires the language design to reserve certain words or symbols as "special", and thus unavailable for use by external definitions.

In Go, for example, we cannot define a local variable called func because in many situations it would then be ambiguous whether the word func represents that variable or whether it's the beginning of a function literal. The word func is therefore a reserved word, unavailable for use as an identifier.

The Terraform language has a few similar situations. Because it is a domain-specific language with a particular focus (rather than a general-purpose language), we can more easily make use of context to resolve ambiguity, and so there aren't any words that are generally reserved in all situations. There are, however, some specific situations where built-in language features coexist with external definitions in an ambiguous way:

Because referring to other managed resources is very common in most practical Terraform modules, Terraform doesn't require any particular reference prefix to indicate that what follows is a managed resource address, and instead treats any unrecognized top-level symbol as the name of a managed resource type.
For example, if using the hashicorp/aws provider then we might refer to a particular VPC as aws_vpc.example, where aws_vpc is one of the resource types defined in that provider. However, if a provider were to offer a managed resource type named data then a reference to data.example wouldn't be understood as a reference to a resource of that type, because Terraform reserves data as a namespace to contain all of the data resources defined in the current module.
In order to achieve a "flatter" aesthetic for common definitions, Terraform reserves some particular argument names in certain block types as so-called "meta-arguments", which are extracted and interpreted by Terraform itself rather than being passed through to a provider or other external extension point.
For example, the for_each meta-argument is defined for resource, data, and module blocks as a way to systematically expand a single resource definition into multiple resource instances based on the elements of a map or set.
If a provider were to define a resource type with its own argument for_each then the meta-argument of that name would mask it, and thus writing for_each = var.example inside the configuration of a resource of that type would not cause that value to reach the provider as part of the resulting resource configuration object.
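
To make those two situations concrete, here is a small sketch of a module that exercises both of them. It assumes the hashicorp/aws provider, and the variable name, availability zone names, and CIDR values are made up purely for illustration:

```hcl
# Illustrative only: assumes the hashicorp/aws provider. All names and
# CIDR values below are invented for this example.

variable "subnet_cidrs" {
  type        = map(string)
  description = "Availability zone name => CIDR block"
  default = {
    "us-west-2a" = "10.0.1.0/24"
    "us-west-2b" = "10.0.2.0/24"
  }
}

# "data" is reserved, so this declares a data resource, and references to
# it must go through the data. namespace (see the output below).
data "aws_availability_zones" "available" {
  state = "available"
}

# aws_vpc is not a reserved symbol, so Terraform treats aws_vpc.example
# as a reference to a managed resource of that type.
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "example" {
  # Meta-argument: interpreted by Terraform itself and never passed
  # through to the provider.
  for_each = var.subnet_cidrs

  # Ordinary arguments: passed through to the provider.
  vpc_id            = aws_vpc.example.id
  availability_zone = each.key
  cidr_block        = each.value
}

output "available_zone_names" {
  value = data.aws_availability_zones.available.names
}
```

Nothing in the syntax of the aws_subnet block distinguishes for_each from vpc_id; the difference lies entirely in which names Terraform has reserved for itself in that context.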

These two are the main (but not the only) areas of conflict we're anticipating for future language evolution. Referenceable objects are a particular concern because many new features we've considered in the past have called for introducing new reserved symbol names to contain objects of a new kind. We add meta-arguments less often (we can typically use the already-reserved lifecycle block type as a container for many new additions) but we did already see the addition of for_each in Terraform v0.12, and so we can see that there will occasionally be demand to add new meta-arguments.
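
As a reminder of what the lifecycle container looks like in practice, here is a minimal sketch, again assuming the hashicorp/aws provider and an invented CIDR value. Because lifecycle is already reserved inside resource blocks, new settings nested within it can't collide with provider-defined argument names:

```hcl
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"

  # Everything inside "lifecycle" is handled by Terraform itself, so only
  # the single name "lifecycle" needs to be reserved at the top level of
  # the resource block.
  lifecycle {
    prevent_destroy = true
    ignore_changes  = [tags]
  }
}
```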

Opt-in Language Editions

I mentioned above that this general category of concern is common to almost all programming languages, which is advantageous to us because we can use other languages as prior art to learn about and evaluate different strategies for managing change over time.

Although the details do vary between languages, a common theme that's emerged across many language ecosystems is the idea of specifying at a granular level which variation of a language the author is intending to use. The appropriate granularity for that declaration varies between languages, but common choices are per-individual-file (allowing potentially many language variations in the same codebase) or per-package, where "package" might variously be called "package", "module", "library", etc in different languages.

For example:

During the 2000s, Microsoft de-facto introduced "doctype sniffing" as a way to allow continued evolution of HTML while keeping their rendering engine compatible with existing content written against its quirks.
Although that started as a vendor-specific idea unique to Microsoft Internet Explorer, other vendors ended up following a similar strategy and eventually the HTML 5 specification adopted this approach as official, specifying that all HTML 5 documents must begin with the <!DOCTYPE html> preamble in order to opt in to standard-compliant processing, rather than the vendor-specific quirks enabled by default.
This mechanism only permits two different "editions" of the language (assuming we consider "vendor-specific quirks" as a funny sort of edition), and so this only served as a pragmatic way to allow moving forward to a new standards-based approach once, and would not generalize to allow future evolution.
The web platform typically favors explicit feature sniffing and "polyfills" as its ongoing approach to evolution. Custom Elements do introduce a conflict between HTML's "reserved words" (defined in the specification) and a user extension point, although that set of features comes with the intent that many existing elements and newly-defined elements can be understood, in a sense, as predefined "custom elements". (And indeed, some of them are actually implemented that way in browsers.)
The Perl community learned that several of the language's "magical" behaviors were commonly the root cause of bugs and misunderstandings, and so adopted the "strict" pragma as a way for developers to opt in to stricter interpretation of particular features on a per-module basis.
The strict pragma in particular allows opting in to three different orthogonal sets of new behaviors, effectively creating eight "editions" of the Perl language which display different interpretations of the same syntax elements.
Perl uses this idea of "pragmas" (really: modules which, on import, modify settings inside the compiler) for various other opt-in behaviors too, and so this can be a workable approach for later modifications to the language.
The Python community has used a bit of a hybrid model, where new features can start off as opt-in but have typically become mandatory in later releases. Unfortunately, Python has become a go-to "negative example" for language evolution due to the large number of opt-in-becomes-mandatory changes in the Python 3.0 release, but I think it's still interesting to consider its design for opting in to these "future features".
PEP-236 introduced the idea of "future statements", which are in practice an entirely new language construct for enabling features, but which are intentionally designed to appear as normal package import statements so that they can be parsed and executed as such in prior Python versions.
For example, writing from __future__ import print_function near the start of a Python module will (prior to Python 3.0) cause Python to present print as a predefined function rather than as a special statement. Doing so makes most typical uses of the print statement invalid syntax, but in return gets the benefits described in PEP-3105.
Python's model illustrates one way to introduce backward-incompatible changes as opt-in for some period and then to later make those mandatory, on the assumption that developers will use that opt-in period to prepare for the later mandatory feature. In practice, that approach didn't really work for Python 3.0 because even now, with the Python 2.x series unsupported, there are still many codebases not yet compatible with Python 3.
The Python community's experience is what leads me to the assumption that we will likely never be able to use mandatory breaking changes as a means to move forward in Terraform, even if we waited until a hypothetical future Terraform v2.0 to do it. Terraform is arguably already in a less-extreme version of that same situation, with various existing users still sticking with Terraform v0.11 due to not being able to prioritize that upgrade process. We are only fortunate that we did that transition earlier in Terraform's adoption curve.
The Go team published its own 1.0 Compatibility Promises (as you might guess, that idea was the main model for Terraform making such promises) and was initially in a situation similar to Terraform's today, where there wasn't yet a clear story about what future evolution outside of the specification guaranteed by those promises would look like.
For a long time various possible future breaking changes were placed into a hypothetical "Go 2" bucket, though I think the Go team always had at least some sense that there wouldn't really be a big breaking Go 2.0 styled after Python 3.0, and indeed they've developed that position further in the intervening years.
The current design for breaking evolution of the Go language is firstly to minimize it as much as possible: the Go team considers it a last resort, preferring a backward-compatible enhancement where possible.
But in situations where that's unavoidable, the Go team adopted a model where each Go Module can specify which Go version it was written to target, by writing a statement like go 1.18 in the go.mod dependency metadata file. For Go, there is no strong distinction made between the language specification and the reference implementation of that specification, and so we can say that "The Go 1.18 language" is the language as described in the language specification associated with Go 1.18, which is implemented in the 1.18 release of the reference implementation published by the Go team. The conformity to that specification by other implementations of the specification is up to the maintainers of those implementations.
An interesting characteristic of Go's approach as compared to the others I've discussed here is that it's presented as more of a "hint" than as a hard constraint. If I try to use a go 1.18 module with Go 1.17, the Go toolchain will not reject that module outright but will instead attempt to compile it anyway, and it will only mention that version mismatch if compilation fails. This means that Go 1.17 should generally be able to successfully compile Go 1.18 modules that don't use generics (for example), but will fail on attempting to use a module that does use the new syntax.
The other side of that compromise is that any new features must not redefine existing syntax. That is, a new feature must be designed such that it will always produce a compile error if interpreted under an older version of the specification, rather than succeeding but behaving in some way other than what the Go 1.18 specification calls for.
The Rust project approached its v1.0 with a similar set of goals as Go and Terraform, but framed it as Stability as a Deliverable, including the principle of "stability without stagnation".
Rust did inevitably encounter changes that required small amounts of incompatibility with some existing code, and addressed that concern with Editions. (That is a very specific use of a term I've been using in this article in the more general sense of different minor variations of the same language.)
Rust Editions, in a sense, decouple language changes from the compiler versions that support them. The Rust project makes compiler releases on a regular schedule and has the goal that anyone should be able to always take the latest release and be confident that it will understand code written against any edition that compiler supports.
Furthermore, each "crate" (the main Rust packaging primitive) selects its intended edition independently of all others in the program, so the decision to adopt a new edition is generally private to the maintainers of a particular crate, except of course that it will require callers of that crate to use a newer compiler version than they might've been before.
The per-crate selection of editions does limit the kinds of changes that the editions mechanism would be appropriate for. Editions alone can't model breaking changes to the standard library, for example, because there's only one copy of the standard library linked into each program.
In return though, the Rust team can largely focus on only one release stream of the compiler and toolchain, because there's no systemic reason for someone to want to keep using an old minor release of the compiler. Instead, if they encounter a bug in an old compiler release they can typically just update all the way to latest and be confident their existing code will still work, saving any edition updates for a more convenient time.

What all of these strategies have in common is that they effectively split a single language into multiple variants, with different components of an overall program each specifying which variant they intend to use.

The main differences seem to be in the processes and community expectations built around that basic idea. The Python community uses the multiple variants as a form of deprecation cycle, but typically with the intent of eventually removing the older variants. Rust embraces multiple variants as the normal course of business, committing to support stabilized features indefinitely while also using a related opt-in mechanism (unstable features) to allow for developing unproven new features in the main branch without committing to support them.

Terraform Language Editions

After reviewing the tradeoffs made by these and other language communities with similar goals, the Terraform team's current intent is to move forward with a scheme similar to Rust's, where we would aim to roll up collections of new features which include small compatibility breaks into new language editions which each module can select independently of others in the same configuration.

Terraform already has an idea of experimental language features, which itself took inspiration from Rust's model of opt-in unstable features. These each effectively create a new variant of the Terraform language with some different features, and give us the freedom to iterate on those new features in the main branch while being clear that they are not yet final.

We chose to follow Rust's lead most closely here because we have some similar goals of being able to converge on a "release-train-like" model where new features may in principle arrive at any time and will be included in whatever release comes next. We also have a related goal of creating a separation between the CLI behavior (the available subcommands and their behavior) and the language itself, which makes it convenient to have a way to talk about variations of the Terraform language independently of specific Terraform CLI versions that happened to introduce them.

In preparation for Terraform v1.0 we created room in the Terraform language for selecting a language edition, which comes in the form of a new argument inside the terraform block type, called language:

terraform {
  language = TF2021
}

This TF2021 keyword is the so-far-unspoken name of the Terraform language edition established by the Terraform v1.0 release. If you don't write a language argument at all then your module implicitly selects TF2021.

In current versions of Terraform this argument is only a stub: it just returns an error if you select any edition other than TF2021, because that's the only edition currently defined. However, having that specialized error message in place means that if we introduce a new edition in a later release then users of older Terraform versions will get good, actionable feedback to upgrade, rather than a generic error message about an unrecognized argument.

If we do use this mechanism to introduce a new language edition in future, we expect it to work in a similar way to the existing experiments argument, which effectively sets some internal flags inside Terraform's in-memory representation of a Terraform module. Other parts of the Terraform language implementation can then use those flags to branch into different codepaths.
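
For comparison, here is a sketch of how the two arguments sit side by side in a module's terraform block. The experiment name shown is one that was offered in some releases around v1.0; I'm using it only as an illustration, and whether a given Terraform release accepts it (or any experiment at all) depends on that release:

```hcl
terraform {
  # Selects the language edition for this module only. TF2021 is the
  # only edition currently defined, so this is the same as omitting it.
  language = TF2021

  # Opts this module only into an experimental language feature.
  # Assumption: this experiment name is valid in the Terraform release
  # you are running; releases that have concluded the experiment will
  # report an error instead.
  experiments = [module_variable_optional_attrs]
}
```

Both settings are scoped to the module that declares them, which is what would allow a single configuration to mix modules written against different variants of the language.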

This idea will remain hypothetical for the foreseeable future, because we aren't actively working on any language changes that need a new edition. Any new feature we design would only use a new edition if we conclude that the benefit outweighs the cost, and that will always be a case-by-case decision rather than something we can generalize.

Automatic Edition Migrations

Another aim we have for editions is that it should be very easy to adopt a new edition for an existing module, to the extent of having an automatic process for doing so.

For that to be possible, it's important that each new edition be a superset of those that came before it. Upgrading a module to a new edition should never cost you any features, but may require a new syntax to access some pre-existing features.

In anticipation of that requirement, Terraform v1.0 also includes some language features that can remove ambiguity between reserved words and externally-defined keywords, as described earlier.

The first is an alternative way to refer to managed resources: in Terraform v1.0 and later, resource.aws_instance.example is equivalent to aws_instance.example, and similarly for any other resource type name that doesn't conflict with a reserved symbol name.
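
As a short sketch of that equivalence, assuming the hashicorp/aws provider and a placeholder AMI ID, the two outputs below refer to the same attribute of the same resource:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # placeholder, not a real AMI ID
  instance_type = "t3.micro"
}

# The usual shorthand form.
output "instance_id" {
  value = aws_instance.example.id
}

# The explicit form, which remains unambiguous even if the resource type
# name were to collide with a reserved symbol.
output "instance_id_explicit" {
  value = resource.aws_instance.example.id
}
```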

The second is a special syntax for forcing particular arguments or blocks to be handled as externally-defined, even if their names overlap with meta-arguments in the same context. For example, suppose we were adding for_each today as a feature of a new edition, and some existing provider had already defined an argument named for_each in one of its resource types. Anyone adopting the new language edition while using that provider could use the "escaping block" syntax (a special reserved block type named _, the underscore character) to tell Terraform to ignore the usual special meaning of that keyword:

resource "happycloud_thing" "example" {
  _ {
    # Because this is inside the special escaping
    # block "_", Terraform will just pass it through
    # to the provider.
    for_each = "hello!"
  }
}

We intend to use these two new mechanisms only as a last resort. When designing a new feature we'll always aim to select new reserved words that are unlikely to conflict with commonly-used providers and modules, but we can't see into all providers and modules in order to guarantee there won't be conflicts, and so these two escaping mechanisms ensure that an automatic edition upgrade tool can always have some way to get the same effect as the module previously had, though admittedly with a less-ideal syntax.

For Example: a `convert` function

While designing the new type constraints mechanism for Terraform v0.12, I designed as part of it a special function convert which would bring the same type-constraint-conversion functionality Terraform has for input variables, but make it available for use locally within expressions inside a module:

locals {
  settings = convert(
    yamldecode(file("${path.module}/settings.yaml")),
    object({
      name   = string
      memory = number
      tags   = map(string)
    }),
  )
}

The above would force the data structure from settings.yaml to conform to the given object type constraint, which both validates its overall structure (returning an error if there's no name set, for example) and forces type conversions if e.g. the YAML file were written in such a way that memory initially decoded as a string containing decimal digits.

This function didn't make it into Terraform v0.12 because it involves a blend of normal expressions (the first argument) and the special expression syntax Terraform uses for type constraints, which is currently accepted only in the type argument of a variable block.

Terraform v0.11 already had functions named list and map that would conflict with the type constructor functions of the same name in type constraint expressions. In addition, keywords like string and number would appear as new reserved symbols, potentially conflicting with an (admittedly unlikely) provider defining resource types named exactly string or number.

An earlier version of Terraform removed the now-unnecessary list and map functions (prior to the v1.0 Compatibility Promises), so adding string, number, bool, and any to the set of reserved symbol names is the remaining blocker for offering this function as I'd originally designed it.

If we decided that this feature were important enough to warrant making a new language edition to support it (which remains to be seen!) then the automatic edition upgrade tool would need to detect references to resource types that have conflicting names, and rewrite them to use the unambiguous resource. prefix. For example, if we had the following contrived module written for TF2021 as the input:

resource "string" "example" {
}

output "id" {
  value = string.example.id
}

The automatic edition upgrade tool would need to notice first that this module is currently targeting TF2021 (implied by making no explicit edition selection), and therefore search for references whose first component is one of the newly-reserved keywords, rewriting to a result like the following:

terraform {
  # (INVALID: just a hypothetical future edition name)
  language = TF2048
}

resource "string" "example" {
}

output "id" {
  value = resource.string.example.id
}

This rewritten version still has the same meaning and effect as the original, but the module author could then choose to make use of the convert function that was not previously available to them when targeting TF2021.

We'll see!

Much of what I've written about above remains hypothetical until we actually concretely plan the first non-default edition. What I've shown here represents my high-level sense of how we might use these language features we've reserved, but this isn't intended as a commitment to do exactly what I described here or to establish a new language edition at any particular time.

It's likely that for the moment we'll decline possible language enhancements that we can't realize without breaking compatibility, because I have a sense that editions should be few and far between, and that they deserve a fair amount of due consideration and overhead to offset the inconvenience of breaking upgrades, even if they are mostly automated.

However, I think it's important that we do have these tools in our toolbox to allow us to, as the Rust community puts it, stabilize without stagnating. With the architectural stability work we did in the Terraform v0.12, v0.13, v0.14, and v0.15 releases we're mostly happy with Terraform's overall structure through the ongoing v1.x releases, and that will hopefully give us the opportunity to invest in more end-user-impactful improvements in the future, which includes new language features where the new capability is impactful enough to warrant additional language complexity. An important prerequisite for that will be a sustainable process for turning use-cases into proposals and then evaluating those proposals.

I'll probably have more to say about this in future if we do move towards establishing a new language edition, but for now this set of ideas is just in our back pockets for later work.