I spent much of last year working on Terraform Stacks, which aims to address several problems I've long wanted to solve in Terraform around describing larger systems that have multiple components spread across multiple deployment environments.
My most significant contribution was designing and implementing the new DSL for describing a Terraform stack, and that's what this article is about.
What is a Stack, anyway?
Current Terraform does a reasonable job of planning and applying changes to single instances of relatively straightforward systems, but has historically delegated "bigger picture" problems -- such as deploying multiple copies of the same infrastructure to create a series of deployment environments -- to external orchestration tools.
That means that a variety of different patterns have emerged for managing the additional complexity that appears in bigger systems, each with its own pros and cons. A very common question for someone new to Terraform is which of these many patterns they should follow, and why that one is better than the others.
Some years ago now I started thinking about ways that Terraform could be more opinionated about this situation and in return offer a more full-featured experience for managing larger systems. This involved considering all of the different patterns and finding the best bits across all of them while minimizing the disadvantages.
Our typical recommended approach for multiple instances of the same infrastructure in Terraform has been to write a root module that consists only of a backend configuration, zero or more `provider` blocks, and one or more `module` blocks calling to shared modules that describe the actual infrastructure. Each new deployment environment can then have its own copy of this root module with different values for those three elements, while the shared modules represent what all of the environments should have in common.
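For concreteness, here's a minimal sketch of what one such per-environment root module might look like. The module source, bucket name, and input variable names here are all hypothetical:

```hcl
# prod/main.tf: one copy of this boilerplate per environment,
# varying only in the backend settings and input values.

terraform {
  backend "s3" {
    bucket = "example-tfstate-prod" # hypothetical bucket name
    key    = "app/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

module "app" {
  source = "../modules/app" # shared module describing the real infrastructure

  environment    = "prod"
  instance_count = 4
}
```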
I have often heard from people to whom I've suggested this that it "feels wrong", and I understand that sentiment: even though the details differ from one root module to the next, and so this isn't really "repeating yourself", the common boilerplate sure feels like something is being repeated.
There are two specific annoyances with the current Terraform CLI design that make things particularly awkward:
- The state storage configuration (i.e. the `backend` block) is presented in the same configuration artifact as the infrastructure being described, which is very awkward when you intend to instantiate the described infrastructure multiple times, each with its own state storage.
- Terraform CLI doesn't offer any way to declare a set of variable values that "sticks to" a particular instance of a configuration in a systematic way. Instead, those building automation around Terraform need to be careful to always specify the appropriate input variable values for the instance of the infrastructure that's currently under consideration.
The most visible difference with stacks, then, is that a traditional "root module" is split into two parts:
- The stack configuration describes the desired infrastructure and the provider configurations required to interact with it. A stack configuration declares input variables in a similar way to a traditional root module, so it can still allow for variations between instances, but with the differences expressed as expressions based on those input variables rather than by writing a separate root module for each instance.
- A deployment configuration deals with the settings required for individual deployments; a sketch of what this might look like follows below. In Terraform Cloud's incarnation of stacks it's implied that state storage is in Terraform Cloud, and so the deployment configuration focuses just on per-deployment input variables, but this would also be the appropriate home for state storage configuration once stacks are available outside of Terraform Cloud.
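Purely as an illustration (the exact syntax here is assumed, and the deployment and variable names are hypothetical), a deployment configuration might look something like this, with one block per deployment:

```hcl
deployment "staging" {
  inputs = {
    environment    = "staging"
    instance_count = 1
  }
}

deployment "production" {
  inputs = {
    environment    = "production"
    instance_count = 4
  }
}
```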
This article is focused primarily on the stack configuration part of this, since the deployment configuration details were designed and implemented by others and so that'll be their story to tell! This also happens to be the part of the project whose source code is already visible in the BUSL-licensed `hashicorp/terraform` repository, under the `internal/stacks` directory, and so you can study the implementation details of what I'm describing if you wish.
The official writing on Terraform Stacks focuses on the 2D matrix of deployments and components, and so one way to answer the question "What is a Stack?" is to use that metaphor. But since I'm focused primarily on stack configurations here, deployments are not in scope and I'll instead be talking about the makeup of a stack configuration.
At the time I'm writing this, Stacks is in a private preview phase. Anything I've described here is subject to change before final release, based on feedback from the preview participants. This is just a taste of the rough shape of things.
Stack Components
A "stack" can be planned and applied in a similar way to a root module in
today's Terraform, particularly if you've adopted the recommendation of
having a root module that consists only of provider
and module
blocks.
However, modules in traditional Terraform are really just namespaces, not real objects in their own right. Terraform mostly ignores module boundaries when it's building its execution graphs, so for example it's possible for a dependency chain to flow out of a module and back in again. That is a useful ability for certain abstractions, but it has also been a significant constraint on a number of different potential Terraform language features in the past.
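To illustrate that (with hypothetical resource and module names), the following is valid in the traditional language even though the dependency chain leaves `module.network` and then re-enters it, because Terraform builds its graph from individual resources rather than from whole modules:

```hcl
# Traditional Terraform language

resource "aws_internet_gateway" "main" {
  # Depends on an output of the module...
  vpc_id = module.network.vpc_id
}

module "network" {
  source = "./network"

  # ...while the module also depends on the resource above,
  # so the dependency chain flows out of the module and back in.
  gateway_id = aws_internet_gateway.main.id
}
```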
With that in mind, rather than "just" using a root module with `module` blocks as a stack configuration, a stack configuration is written in a separate language (still based on HCL) whose primary object type is called "component".
A component is in many ways the same as a call to a module, but components are always planned or applied in their entirety before Terraform proceeds with other components that depend on them. That means it isn't possible for a dependency chain to pass out of and back into the same component, but in return the execution model is simpler and can better support other features that have been historically challenging to implement in Terraform modules.
Here's what a component declaration looks like:
component "example" { source = "./example" inputs = { upstream_thingy = component.upstream.thingy } providers = { aws = provider.aws.example } }
Compared to `module` blocks in the traditional Terraform language, the most visible difference is that the input variable values are provided as a single argument `inputs` that has an object assigned to it, rather than a separate argument per input variable. This makes it possible to construct that object dynamically in various ways, or to produce the entire object in another component and just assign it across wholesale in situations where that makes sense.
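For example, since `inputs` is just an object-typed expression, I'd expect something like the following to work. `merge` here is the standard Terraform function of that name, while `var.common_settings` and the attribute names are hypothetical:

```hcl
component "example" {
  source = "./example"

  # Combine a shared settings object with component-specific
  # values, instead of repeating each input variable one by one.
  inputs = merge(var.common_settings, {
    upstream_thingy = component.upstream.thingy
  })
}
```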
Another difference you might notice is in the `providers` argument. As we'll see in the next section, the stack configuration language takes quite a different approach to provider configurations, allowing references to them to be treated as normal values rather than as a special separate concept, and so `provider.aws.example` is a direct reference to such a provider configuration. It's being assigned to the key `aws`, which is one of the provider configurations that the `./example` module requires, written using the provider reference syntax familiar from the traditional Terraform language: this block is effectively a bridge from stack world into modules world.
A common theme in the stack language design is fixing the historical mistake of mixing built-in arguments with those defined by the module author or provider developer. The `inputs` argument also achieves that: all of the top-level arguments in a `component` block are stack language builtins, while the variable names defined by the module author are segregated as attributes of the `inputs` object. That means new arguments for `component` blocks added in future releases cannot conflict with any existing input variable name.
Components can refer to each other in a similar way to how resources refer to one another in the traditional Terraform language, and once again Terraform uses those references to infer a suitable execution order. The implementation of this is quite different from that of the traditional Terraform language runtime, since I was able to benefit from many lessons learned in my nearly-a-decade working on Terraform so far!
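As a small example (the component names, module paths, and output name here are hypothetical), a reference from one component to another looks like this, and is what tells Terraform to plan and apply `component.network` in its entirety first:

```hcl
component "network" {
  source = "./network"

  inputs = {
    cidr_block = "10.0.0.0/16"
  }
}

component "cluster" {
  source = "./cluster"

  inputs = {
    # Referring to component.network's output values makes this
    # component depend on it, so Terraform orders them accordingly.
    subnet_ids = component.network.subnet_ids
  }
}
```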
Provider Configurations
Another significant difference in the stacks language is how provider configurations work.
In the traditional Terraform language, provider configurations are a very special kind of object that often gets implicitly bound to resources, and when explicitly bound it's done using a special mechanism that's only for provider configurations and is totally separate from Terraform's normal expression language.
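For reference, this is the traditional mechanism being described: the `providers` meta-argument in a `module` block, where `aws.west` (an example alias) is a special provider reference rather than an ordinary expression:

```hcl
# Traditional Terraform language

provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

module "example" {
  source = "./example"

  # This special mapping syntax exists only for provider
  # configurations; aws.west is not a normal value and
  # can't be constructed dynamically.
  providers = {
    aws = aws.west
  }
}
```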
In the stacks language, however, a reference to a provider configuration is just another kind of value, so for example you can make a map of configurations for a particular provider and then use it with `for_each`. And then, building on that, the stacks language also provides a `for_each` for the provider configurations themselves, where such a mapping gets constructed automatically, similar to `for_each` in a `resource` or `module` block in the traditional language:
variable "aws_regions" { type = map(object({ enabled = bool })) } provider "aws" "regions" { for_each = var.aws_regions config { region = each.value } } component "per_region" { for_each = { for name, cfg in var.aws_regions : name => provider.aws.regions[name] if cfg.enabled } source = "./per-region" inputs = { # ... } providers = { aws = each.value } } removed { for_each = { for name, cfg in var.aws_regions : name => provider.aws.regions[name] if !cfg.enabled } from = component.per_region[each.key] source = "./per-region" providers = { aws = each.value } }
This allows dynamically specifying a set of regions to use, which can differ for each instance of the stack configuration. Since a component calls a traditional module, currently that capability ends at `component` blocks and the modules inside are still constrained by the traditional language, but hopefully a future language edition will be able to extend this concept into individual modules too.
This does unfortunately raise a challenge, which is a big part of why this hasn't been implemented in Terraform so far: what happens when someone removes an element from `var.aws_regions`?
Terraform still needs the provider configuration for a region in order to plan to destroy existing objects associated with it, and that's the reason for the extra `enabled` flag associated with each region: it allows removing the `component.per_region` instance separately from removing the corresponding `provider.aws.regions` instance, by first setting the `enabled` flag to `false`. The `removed` block tells Terraform how to choose which provider configuration to use for destroying each instance of the component.
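Concretely, decommissioning a region under this design is a two-step process. The following shows the input variable value across two successive plan/apply rounds (the region names are just examples):

```hcl
# Step 1: mark the region as disabled, but keep its entry so that
# provider.aws.regions["eu-west-2"] still exists for the destroy plan.
aws_regions = {
  us-east-1 = { enabled = true }
  eu-west-2 = { enabled = false } # was previously true
}

# Step 2: once the first change has been applied and the component
# instance destroyed, remove the entry (and its provider) entirely.
aws_regions = {
  us-east-1 = { enabled = true }
}
```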
This is not an ideal design by any means, but was the best compromise I found so far to solve this problem and finally unblock the implementation of dynamic provider instantiation.
Embedded Stacks
Although stack configurations are primarily analogous to root modules, I expect that some teams will also want to factor out shared abstractions at the stack configuration level too, and so stack configurations support `stack` blocks for embedding another stack in its own namespace, in a similar way to how traditional Terraform modules can use `module` blocks to call other modules.
One possible design, for example, would be to write a stack configuration that encapsulates setting up a private network topology in Amazon EC2 across multiple regions, and then reuse that across many different stacks that could then use the described network for different purposes.
stack "network" { source = "app.terraform.io/example-org/network/aws" version = "~> 1.0.0" inputs = { # (imagine that provider.aws.regions is the same one # declared in the previous example above) provider_insts = provider.aws.regions } }
The shared stack configuration at `app.terraform.io/example-org/network/aws` can declare that it expects to be passed a map of instances of the AWS provider, since the stacks language allows treating provider references as regular values:
```hcl
required_providers {
  # The following declares that the short name "aws" in this
  # module means the "hashicorp/aws" provider.
  aws = {
    source = "hashicorp/aws"
  }
}

variable "provider_insts" {
  # This variable accepts a map of references to configurations
  # of the `hashicorp/aws` provider, because that's what we
  # declared "aws" to mean above.
  type = map(providerconfig(aws))
}
```
Stack configurations also have output values and local values with behavior very similar to those in the module language. However, output values in the stacks language have explicit type declarations just like input variables, which was a long-requested feature in the traditional Terraform language that will hopefully follow there eventually too.
output "private_subnets" { # Map from provider configuration key to availability zone to # object describing a subnet. type = map(map(object({ id = string cidr_block = string }))) # This assumes a component "vpc" block with # for_each = var.provider_insts that declares # a subnet for each availability zone it can # find in the target region. value = { for k, vpc in component.vpc : k => { for az, subnet in vpc.subnets : az => { id = subnet.id cidr_block = subnet.string } if !subnet.public } } }
What's Next?
The various Terraform teams at HashiCorp are still iterating on different parts of the stacks features in response to preview feedback. My work on it is largely done for now, so I can't actually say what's next, but I hope the above was an interesting and useful overview of one part of the new functionality that's under active development.