I spent much of last year working on Terraform Stacks, which aims to address several problems I've long wanted to solve in Terraform around describing larger systems that have multiple components spread across multiple deployment environments.

My most significant contribution was designing and implementing the new DSL for describing a Terraform stack, and that's what this article is about.

What is a Stack, anyway?

Current Terraform does a reasonable job of planning and applying changes to single instances of relatively straightforward systems, but has historically delegated "bigger picture" problems -- such as deploying multiple copies of the same infrastructure to create a series of deployment environments -- to external orchestration tools.

That means that a variety of different patterns have emerged for managing the additional complexity that appears in bigger systems, each with its own pros and cons. A very common question for someone new to Terraform is which of these many patterns they should follow, and why that one is better than the others.

Some years ago now I started thinking about ways that Terraform could be more opinionated about this situation and in return offer a more full-featured experience for managing larger systems. This involved considering all of the different patterns and finding the best bits across all of them while minimizing the disadvantages.

Our typical recommended approach for multiple instances of the same infrastructure in Terraform has been to write a root module that consists only of a backend configuration, zero or more provider blocks, and one or more module blocks calling to shared modules that describe the actual infrastructure. Each new deployment environment can then have its own copy of this root module with different values for those three elements, while the shared modules represent what all of the environments should have in common.
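
A minimal sketch of one such per-environment root module might look like the following, where the backend details, region, and module source are all placeholders rather than anything from a real system:

terraform {
  # State storage specific to this one environment.
  backend "s3" {
    bucket = "example-tfstate-production"
    key    = "main/terraform.tfstate"
    region = "us-east-1"
  }
}

# Provider configuration specific to this one environment.
provider "aws" {
  region = "us-east-1"
}

# The shared module describes what all of the environments have in common.
module "main" {
  source = "../modules/main"

  environment   = "production"
  instance_size = "large"
}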

I have often heard from people I've suggested this to that it "feels wrong", and I understand that sentiment: even though the details differ between these root modules, and so this isn't really "repeating yourself", the common boilerplate sure makes it feel like something is being repeated.

There are two specific annoyances with the current Terraform CLI design that make things particularly awkward:

  • The state storage configuration (i.e. the backend block) lives in the same configuration artifact as the infrastructure being described, which is very awkward when you intend to instantiate that infrastructure multiple times, each with its own state storage.

  • Terraform CLI doesn't offer any way to declare a set of variable values that "sticks to" a particular instance of a configuration in a systematic way. Instead, those building automation around Terraform need to be careful to always specify the appropriate input variable values for the instance of the infrastructure that's currently under consideration.

The most visible difference with stacks, then, is that a traditional "root module" is split into two parts:

  • The stack configuration describes the desired infrastructure and the provider configurations required to interact with it. A stack configuration declares input variables in a similar way to a traditional root module, so it can still allow for variations between instances, but with the differences expressed as expressions based on those input variables rather than by writing a separate root module for each instance.

  • A deployment configuration deals with the settings required for individual deployments. In Terraform Cloud's incarnation of stacks it's implied that state storage is in Terraform Cloud and so the deployment configuration focuses just on per-deployment input variables, but this would also be the appropriate home for state storage configuration once stacks are available outside of Terraform Cloud.

This article is focused primarily on the stack configuration part of this, since the deployment configuration details were designed and implemented by others and so that'll be their story to tell! This also happens to be the part of the project whose source code is already visible in the BUSL-licensed hashicorp/terraform repository, under the internal/stacks directory, and so you can study the implementation details of what I'm describing if you wish.

The official writing on Terraform Stacks focuses on the 2D matrix of deployments and components, and so one way to answer the question "What is a Stack?" is to use that metaphor. But since I'm focused primarily on stack configurations here, deployments are not in scope and I'll instead be talking about the makeup of a stack configuration.

At the time I'm writing this, Stacks is in a private preview phase. Anything I've described here is subject to change before final release, based on feedback from the preview participants. This is just a taste of the rough shape of things.

Stack Components

A "stack" can be planned and applied in a similar way to a root module in today's Terraform, particularly if you've adopted the recommendation of having a root module that consists only of provider and module blocks.

However, modules in traditional Terraform are really just namespaces, not real objects in their own right. Terraform mostly ignores module boundaries when it's building its execution graphs, so for example it's possible for a dependency chain to flow out of a module and back in again. That is a useful ability for certain abstractions, but it has also been a significant constraint on a number of different potential Terraform language features in the past.
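
To make that concrete, here's a hypothetical traditional-language example (module names, variables, and outputs invented purely for illustration) that is valid today precisely because module boundaries are not dependency boundaries:

# Each module call refers to an output of the other, so a dependency
# chain leaves one module and re-enters it. Terraform accepts this as
# long as the individual resources inside ./a and ./b don't form a
# cycle, because the execution graph is built from resources rather
# than from module calls.
module "a" {
  source  = "./a"
  peer_id = module.b.early_id
}

module "b" {
  source     = "./b"
  network_id = module.a.network_id
}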

With that in mind, a stack configuration is not "just" a root module with module blocks; instead, it's written in a separate language (still based on HCL) whose primary object type is called "component".

A component is in many ways the same as a call to a module, but a component is always planned or applied in its entirety before Terraform proceeds with any other components that depend on it. That means a dependency chain cannot pass out of and back into the same component, but in return the execution model is simpler and can better support features that have historically been challenging to implement for Terraform modules.

Here's what a component declaration looks like:

component "example" {
  source = "./example"

  inputs = {
    upstream_thingy = component.upstream.thingy
  }
  providers = {
    aws = provider.aws.example
  }
}

Compared to module blocks in the traditional Terraform language, the most visible difference is that the input variable values are provided as a single inputs argument whose value is an object, rather than as a separate argument per input variable. This makes it possible to construct that object dynamically in various ways, or to produce the entire object in another component and assign it across wholesale in situations where that makes sense.
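
For example, assuming the stacks language retains the traditional language's built-in functions, an object produced by one component can be passed along wholesale with just a few overrides; the component.network.app_settings reference and the surrounding names here are hypothetical:

component "app" {
  source = "./app"

  # The inputs object is an ordinary expression, so it can be assembled
  # from other values, such as a settings object exported by an
  # upstream component plus a couple of overrides of our own.
  inputs = merge(component.network.app_settings, {
    environment = var.environment
  })

  providers = {
    aws = provider.aws.example
  }
}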

Another difference you might notice is the providers argument. As we'll see in the next section, the stack configuration language takes quite a different approach to provider configurations, treating references to them as normal values rather than as a special separate concept, and so provider.aws.example is a direct reference to such a provider configuration. It's assigned to the key aws, one of the provider configurations that this ./example module requires, using the provider reference syntax familiar from the traditional Terraform language, because this block is effectively a bridge from the stack world into the modules world.

A common theme in the stack language design is fixing the historical mistake of mixing built-in arguments with those defined by the module author or provider developer. The inputs argument also achieves that: all of the top-level arguments in a component block are stack language builtins, while the variable names defined by the module author are segregated as attributes of the inputs object. That means new arguments for component blocks added in future releases cannot conflict with any existing input variable name.

Components can refer to each other in a similar way to how resources refer to one another in the traditional Terraform language, and once again Terraform uses those references to infer a suitable execution order. The implementation of this is quite different from the traditional Terraform language runtime, since I was able to benefit from many lessons learned over my nearly a decade of working on Terraform so far!

Provider Configurations

Another significant difference in the stacks language is how provider configurations work.

In the traditional Terraform language, provider configurations are a very special kind of object that often gets implicitly bound to resources, and when explicitly bound it's done using a special mechanism that's only for provider configurations and is totally separate from Terraform's normal expression language.

In the stacks language, however, a reference to a provider configuration is just another kind of value, so for example you can build a map of configurations for a particular provider and then use it with for_each. Building on that, provider blocks in the stacks language support for_each themselves, constructing such a mapping automatically, much like for_each in a resource or module block in the traditional language:

variable "aws_regions" {
  type = map(object({
    enabled = bool
  }))
}

provider "aws" "regions" {
  for_each = var.aws_regions

  config {
    region = each.key # the map keys of var.aws_regions are region names
  }
}

component "per_region" {
  for_each = {
    for name, cfg in var.aws_regions :
    name => provider.aws.regions[name]
    if cfg.enabled
  }
  source = "./per-region"

  inputs = {
    # ...
  }
  providers = {
    aws = each.value
  }
}

removed {
  for_each = {
    for name, cfg in var.aws_regions :
    name => provider.aws.regions[name]
    if !cfg.enabled
  }
  from = component.per_region[each.key]

  source = "./per-region"
  providers = {
    aws = each.value
  }
}

This allows dynamically specifying a set of regions to use, which can differ for each instance of the stack configuration. Since a component calls a traditional module, that capability currently ends at component blocks: the modules inside are still constrained by the traditional language. Hopefully a future language edition will be able to extend this concept into individual modules too.

This does unfortunately raise a challenge, which is a big part of why this hasn't been implemented in Terraform so far: what happens when someone removes an element from var.aws_regions?

Terraform still needs the provider configuration for a region in order to plan to destroy existing objects associated with it, and that's the reason for the extra enabled flag associated with each region: it allows removing the component.per_region instance separately from removing the corresponding provider.aws.regions instance, by first setting the enabled flag to false. The removed block tells Terraform how to choose which provider configuration to use for destroying each instance of the component.

This is not an ideal design by any means, but was the best compromise I found so far to solve this problem and finally unblock the implementation of dynamic provider instantiation.
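
Concretely, retiring a region becomes a two-step change to the value of var.aws_regions, shown here in variable-definitions syntax with example region names:

# Step 1: keep the region's provider configuration but disable it, so
# that the next plan removes component.per_region["eu-west-1"] and can
# destroy its objects using provider.aws.regions["eu-west-1"], as
# directed by the removed block.
aws_regions = {
  "us-east-1" = { enabled = true }
  "eu-west-1" = { enabled = false }
}

# Step 2: once that destroy has been applied, delete the entry
# entirely, which also removes the provider configuration instance.
aws_regions = {
  "us-east-1" = { enabled = true }
}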

Embedded Stacks

Although stack configurations are primarily analogous to root modules, I expect that some teams will want to factor out shared abstractions at the stack configuration level too, and so stack configurations support stack blocks for embedding another stack in its own namespace, similar to how traditional Terraform modules can use module blocks to call other modules.

One possible design, for example, would be to write a stack configuration that encapsulates setting up a private network topology in Amazon EC2 across multiple regions, and then reuse it from many different stacks that each use the described network for their own purposes:

stack "network" {
  source  = "app.terraform.io/example-org/network/aws"
  version = "~> 1.0.0"

  inputs = {
    # (imagine that provider.aws.regions is the same one
    # declared in the previous example above)
    provider_insts = provider.aws.regions
  }
}

The shared stack configuration at app.terraform.io/example-org/network/aws can declare that it expects to be passed a map of instances of the AWS provider, since the stacks language allows treating provider references as regular values:

required_providers {
  # The following declares that the short name "aws" in this
  # stack configuration means the "hashicorp/aws" provider.
  aws = {
    source = "hashicorp/aws"
  }
}

variable "provider_insts" {
  # This variable accepts a map of references to configurations
  # of the `hashicorp/aws` provider, because that's what we
  # declared "aws" to mean above.
  type = map(providerconfig(aws))
}
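
Within the shared stack, that map can then fan out across per-region components. Here's a minimal sketch of the component "vpc" block that the output example below assumes; the ./vpc module itself is hypothetical:

component "vpc" {
  source   = "./vpc"
  # One instance of this component per provider configuration we were given.
  for_each = var.provider_insts

  inputs = {
    # ...
  }
  providers = {
    aws = each.value
  }
}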

Stack configurations also have output values and local values with behavior very similar to those in the module language. However, output values in the stacks language have explicit type declarations just like input variables; that's been a long-requested feature for the traditional Terraform language too, and will hopefully follow there eventually. For example:

output "private_subnets" {
  # Map from provider configuration key to availability zone to
  # object describing a subnet.
  type = map(map(object({
    id         = string
    cidr_block = string
  })))

  # This assumes a component "vpc" block with
  # for_each = var.provider_insts that declares
  # a subnet for each availability zone it can
  # find in the target region.
  value = {
    for k, vpc in component.vpc : k => {
      for az, subnet in vpc.subnets : az => {
        id         = subnet.id
        cidr_block = subnet.cidr_block
      }
      if !subnet.public
    }
  }
}

What's Next?

The various Terraform teams at HashiCorp are still iterating on different parts of the stacks features in response to preview feedback. My work on it is largely done for now, so I can't actually say what's next, but I hope the above was an interesting and useful overview of one part of the new functionality that's under active development.