When first using Terraform, users usually begin by learning the syntax for requesting a single remote object, often the venerable aws_instance, both because it is popular and because it is relatively simple:
    resource "aws_instance" "example" {
      ami           = "ami-abc1234"
      instance_type = "t2.micro"
    }
At this stage of learning, the terms resource, instance, and object can seem totally interchangeable. This single configuration block is introduced by the keyword resource, it creates a single object in AWS, and (confusingly) its type name contains the word "instance", though in a different sense from the one Terraform itself uses.
It usually isn't long before users discover the special count argument, which allows the same configuration block to generate multiple similar objects. At this point we introduce Terraform's idea of instance, which is used to distinguish the single configuration block from each of the possibly-multiple things it describes.
I expect that for many users, the difference in meaning between "instance" and "object" remains obscure even after many years of use, because we intentionally minimize the exposure of the distinction in the Terraform UI. If you have used the special create_before_destroy lifecycle mode then you have indirectly benefitted from the distinction between instance and object in that brief period during terraform apply where both a new object and an old object exist for the same instance because the previous object (the "deposed" object, in Terraform's terms) has not yet been destroyed.
My main goal in writing this article is to describe the motivations for the new for_each feature that will serve as an alternative to count in a forthcoming version of Terraform, but this is also an excuse to describe some of Terraform's other related concepts in perhaps a slightly more theoretical way than we tend to engage with them day-to-day.
This is certainly not required reading for Terraform users, and indeed part of Terraform's mission is to (as much as possible) worry about these details so you don't have to. With that said, I know that some people enjoy theory for theory's sake and others have learning styles that prefer to build the practice on the theory, so I hope that both of those audiences will find this a useful overview. It may also prove useful for someone considering contributing code changes to Terraform Core itself.
Terraform Configuration as a Function
Before getting into the details, I think it's worth introducing my mental model of Terraform Configuration, since the rest of this article will assume and build upon it.
We often describe Terraform's language as declarative, which is distinguished from imperative. By this, we mean that Terraform configurations describe a desired result rather than the individual steps required to produce that result.
To put that in more practical terms, we can think of a Terraform configuration as a function which returns not actual infrastructure (whatever that might mean) but instead a data structure describing what ought to exist.
This characterization is not completely honest, though: the data structure it returns does include some additional information which implies an ordering of any steps required to achieve the result, as statements of the form "A is required by B". The Terraform language syntax is designed to allow those relationships to usually be defined implicitly, but you can also list them explicitly using the depends_on argument within certain blocks.
Another way this model falls short of reality is that the full data structure describing all desired objects cannot be evaluated all at once. The most obvious situation illustrating that fact is when an identifier assigned by the remote system must be used as part of the configuration of some other object:
    resource "aws_security_group" "example" {
      name = "https_server"

      ingress {
        from_port   = 443
        to_port     = 443
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
      }
    }

    resource "aws_instance" "example" {
      ami           = "ami-abc1234"
      instance_type = "t2.micro"

      vpc_security_group_ids = [aws_security_group.example.id]
    }
With this said, I think a more reasonable mental model is to say that the Terraform configuration is a function that returns a data structure containing other functions, each one defining a resource. In a conventional functional or imperative programming language, we might express the same thing in a different notation:
    function all_infrastructure() {
      return {
        resources: [
          {
            type: "aws_security_group",
            name: "example",
            config: function () {
              return {
                name: "https_server",
                ingress: [{
                  from_port: 443,
                  to_port: 443,
                  protocol: "tcp",
                  cidr_blocks: ["0.0.0.0/0"],
                }],
              };
            },
          },
          {
            type: "aws_instance",
            name: "example",
            depends_on: ["aws_security_group.example"],
            config: function (sg) {
              return {
                ami: "ami-abc1234",
                instance_type: "t2.micro",
                vpc_security_group_ids: [sg.id],
              };
            },
          },
        ],
      };
    }
Notice that the config function for the aws_instance resource takes an extra argument giving the final result of creating or updating the security group and uses it as part of its return value. The Terraform language infers the need for that extra requirement automatically by detecting the reference, but I made it explicit here to help illustrate that at a theoretical level each resource's configuration is itself a function that builds a configuration in terms of other objects in the configuration.
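To make the "function that returns functions" model a little more concrete, here is a minimal JavaScript sketch of how such a structure could be evaluated in dependency order. The names evaluateAll and fakeApply are hypothetical helpers invented for this illustration, and fakeApply merely stands in for a provider call. Terraform's real graph walk is considerably more involved than this loop.

```javascript
// Hypothetical sketch: evaluate each resource's config function only after
// everything it depends on has produced a result.
function evaluateAll(resources, fakeApply) {
  const results = new Map(); // resource address -> resulting object
  const pending = [...resources];
  while (pending.length > 0) {
    // Find any resource whose dependencies have all been resolved already.
    const next = pending.findIndex((r) =>
      (r.depends_on || []).every((addr) => results.has(addr))
    );
    if (next === -1) {
      throw new Error("dependency cycle or reference to unknown resource");
    }
    const [r] = pending.splice(next, 1);
    const deps = (r.depends_on || []).map((addr) => results.get(addr));
    const config = r.config(...deps);
    results.set(`${r.type}.${r.name}`, fakeApply(r, config));
  }
  return results;
}

// The instance is listed first on purpose: ordering comes from depends_on,
// not from the order of declaration.
const resources = [
  {
    type: "aws_instance",
    name: "example",
    depends_on: ["aws_security_group.example"],
    config: (sg) => ({ ami: "ami-abc1234", vpc_security_group_ids: [sg.id] }),
  },
  {
    type: "aws_security_group",
    name: "example",
    config: () => ({ name: "https_server" }),
  },
];

// Stand-in for a provider (an assumption): echoes the config, invents an id.
const fakeApply = (r, config) => ({ ...config, id: `fake-${r.type}` });

const results = evaluateAll(resources, fakeApply);
console.log(results.get("aws_instance.example").vpc_security_group_ids);
```

Even though the instance appears first in the list, the security group is evaluated first, and its invented id is what the instance's config function receives.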
Implied Side-effects
Configuration is only one part of the story, of course. Terraform's task, after executing your configuration "functions", is to determine which actions must be taken in order to reach the described result.
Terraform's repertoire of actions derives from the core set of verbs commonly offered for objects in REST APIs: Create, Update, and Delete. The specific implementation of each of these actions depends on the resource type and is ultimately decided by the provider, but how does Terraform decide which actions are required in the first place?
Terraform maintains a sidecar data structure which is simply called the "state".
The first time a user runs terraform apply, there isn't yet any state, which is equivalent to the state being empty. In this simple case, the only reasonable action for each resource in the configuration is to create the objects it requests. Terraform delegates to a provider to determine what exactly a "create" operation entails, which we could consider to be an impure function that takes a data structure describing the result of the resource's configuration function, performs various side-effects to create the requested object, and returns another data structure describing the object that was created:
    // Pseudo-code for creating an aws_security_group object.
    // ec2_sdk is a stand-in for whatever client library the provider uses.
    function create_aws_security_group(config) {
      const sg = ec2_sdk.create_security_group({
        name: config.name,
        ingress_rules: config.ingress.map((x) => ({
          from_port: x.from_port,
          to_port: x.to_port,
          protocol: x.protocol,
          cidr_blocks: x.cidr_blocks,
        })),
        // etc, as required by the EC2 SDK
      });
      return {
        id: sg.id,
        name: config.name,
        ingress: config.ingress,
        // (and any other attributes defined for this resource type)
      };
    }
The result of this function is a data structure that is logically a superset of the given configuration data structure, filling in any values that were determined as a result of the side-effect(s). This object is then recorded in the "state" for next time.
On a subsequent run of Terraform, the state lets Terraform know that there's already a remote object representing the requested instances, and Terraform will ask the provider to check for differences between the previous state and the configuration. If none are found then no action is required. If a change is detected then the provider will decide whether it can make that change in-place (via an update) or whether the object must be replaced (a destroy followed by a create, or vice-versa).
Assuming an in-place update is possible, the provider supplies another impure function to handle it:
    function update_aws_security_group(prior_state, config) {
      result = ec2_sdk.update_security_group({
        id: prior_state.id
        // etc, etc
      });
      return {
        id: prior_state.id,
        name: config.name,
        ingress: config.ingress
        // etc, etc
      };
    }
If we consider the absence of state to be a state in itself, we can say that actions in Terraform generalize as a function which takes both a configuration and a prior state (which might be null) and returns a new state. In order to show plans for approval this is actually two steps in practice, which can be thought of abstractly as two functions:
    function plan_change(prior_state, config) {
      // compare prior_state and config, decide what the new
      // object will look like (planned_state) and whether this
      // change requires a destroy+create rather than a single
      // update.
      return { planned_state, replace_required };
    }

    function apply_change(prior_state, planned_state) {
      // call to a remote API to move from prior_state to
      // planned_state, and then build new_state to describe
      // the result.
      return new_state;
    }
The distinction between planned_state and new_state is subtle, and is possible only because the Terraform language has a special feature: certain values within planned_state will be marked by the provider as "unknown", meaning that their concrete values will not be known until apply time. The closest analog to this in an imperative programming language would be a promise for a value determined during apply_change.
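As a rough illustration of how a planned state can differ from the resulting new state, the following JavaScript sketch uses a sentinel value to stand in for "unknown". The names UNKNOWN, planCreate and applyCreate are hypothetical and invented for this example, as is the fake id value. Real providers mark unknown values through Terraform's own type system, not with a symbol like this.

```javascript
// Sentinel standing in for a value that is only "known after apply".
const UNKNOWN = Symbol("unknown");

// Planning a create: everything from config is known, but the id will be
// chosen by the remote system, so for now it can only be a placeholder.
function planCreate(config) {
  return { ...config, id: UNKNOWN };
}

// Applying: the (fake) remote API call yields the concrete id, which
// replaces the placeholder to produce the new state.
function applyCreate(plannedState, remoteCreate) {
  const created = remoteCreate(plannedState);
  return { ...plannedState, id: created.id };
}

const planned = planCreate({ name: "https_server" });
// The arrow function is a stand-in for a remote API; "sg-0123" is made up.
const newState = applyCreate(planned, () => ({ id: "sg-0123" }));

console.log(planned.id === UNKNOWN); // true
console.log(newState.id);            // sg-0123
```

Everything that was known at plan time is carried through unchanged. Only the placeholder is replaced, which is what allows a plan to be shown for approval before any side-effects happen.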
The distinction between create, update, and delete here is now more implicit: if prior_state is null then we are creating, while if config is null then we are deleting. This gives us the same three operations but implemented through a common pipeline. In practice, the providers themselves still use separate functions to apply each action type due to an abstraction provided by the plugin SDK, but the plan step is shared.
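The rule for recovering the action from the two inputs can be written down directly. This is only a sketch: chooseAction is a hypothetical helper, and the "no-op" result for two null inputs is an assumption added so that the function is total.

```javascript
// Hypothetical helper: name the action implied by which inputs are null.
function chooseAction(priorState, config) {
  if (priorState === null && config === null) return "no-op";
  if (priorState === null) return "create";
  if (config === null) return "delete";
  // Both present: the provider still has to diff them, and the outcome may
  // be no change at all, an in-place update, or a replace.
  return "update";
}

console.log(chooseAction(null, { name: "https_server" }));   // create
console.log(chooseAction({ id: "sg-0123" }, null));          // delete
console.log(chooseAction({ id: "sg-0123" }, { name: "x" })); // update
```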
Correlating Between Configuration and State
In the sequence above I intentionally skipped a crucial detail: how does Terraform know which blocks in the configuration correspond to which objects recorded in the state?
Each instance has an identity that persists from one run to the next, which is used in both the configuration and the state. Because these identifiers are crucial to Terraform's planning mechanism, the Terraform language defines a concise representation of them in the form of a resource address. This is a bit of a misnomer since the syntax actually identifies a specific instance rather than a resource as a whole, but in our examples so far resources and instances have been one-to-one because we've not been using count:
aws_security_group.example
aws_instance.example
If we were to update the aws_instance resource to include count = 2, the set of addresses in this configuration would change to reflect that resource and instance are no longer one-to-one:
    resource "aws_instance" "example" {
      count = 2

      ami           = "ami-abc1234"
      instance_type = "t2.micro"

      vpc_security_group_ids = [aws_security_group.example.id]

      tags = {
        Name = "example-${count.index}"
      }
    }
aws_security_group.example
aws_instance.example[0]
aws_instance.example[1]
Terraform must use an additional instance index to uniquely identify each of the two instances that the resource "aws_instance" "example" block now represents. With count, these indices are always consecutive integers from zero to the count value minus one.
When count is in use, we can say that the configuration-building function implied by this block now takes an additional argument giving that instance index:
    {
      type: "aws_instance",
      name: "example",
      depends_on: ["aws_security_group.example"],
      config: function (idx, sg) {
        return {
          ami: "ami-abc1234",
          instance_type: "t2.micro",
          vpc_security_group_ids: [sg.id],
          tags: {
            Name: "example-" + idx,
          },
        };
      },
    },
Now Terraform will execute this configuration "function" once for each index, and ask the provider to compare with the prior state of the corresponding instance. The index is used as part of the identifier to correlate configuration with state, so on subsequent runs Terraform can distinguish these two objects in the state and ensure the correct result from the configuration function is compared to the correct instance state.
If we increase to count = 3 and re-run Terraform, it will see that there is no aws_instance.example[2] in the state and know that it must be created, while the existing instances zero and one should remain unchanged. Likewise, if we decrease to count = 1, Terraform will treat that as intent to delete the object represented by aws_instance.example[1], while leaving instance zero unchanged.
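The comparison described here amounts to a set difference over instance indices. The following JavaScript sketch shows the idea under simplifying assumptions: planCount is a hypothetical helper, the state is reduced to a plain list of indices, and retained instances would in reality still be diffed against their configuration.

```javascript
// Hypothetical sketch: compare the instance indices present in state with
// the indices 0..count-1 implied by the configuration.
function planCount(resourceAddr, stateIndices, count) {
  const wanted = Array.from({ length: count }, (_, i) => i);
  const addr = (i) => `${resourceAddr}[${i}]`;
  return {
    create: wanted.filter((i) => !stateIndices.includes(i)).map(addr),
    delete: stateIndices.filter((i) => i >= count).map(addr),
    keep: wanted.filter((i) => stateIndices.includes(i)).map(addr),
  };
}

// State currently holds instances [0] and [1].
const grow = planCount("aws_instance.example", [0, 1], 3);
const shrink = planCount("aws_instance.example", [0, 1], 1);

console.log(grow.create);   // [ 'aws_instance.example[2]' ]
console.log(shrink.delete); // [ 'aws_instance.example[1]' ]
```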
So far so good! We can successfully correlate between configuration and state to produce a suitable set of actions. This approach has an important limitation though, which we will see in the following section.
More Complex Interactions
Although count was originally intended simply as a means to construct multiple similar objects for purposes like horizontal scaling, users constructing reusable modules found themselves with more complex needs. Someone in the community (sadly, I've lost track of who) realized that the count feature can be set to a non-constant expression in order to express relationships in a more intuitive way.
A straightforward example is using a list of values to both choose a number of instances and set one or more unique properties for each instance using a single input variable:
    variable "instance_names" {
      type = list(string)
      default = [
        "foo",
        "bar",
        "baz",
      ]
    }

    resource "aws_instance" "example" {
      count = length(var.instance_names)

      ami           = "ami-abc1234"
      instance_type = "t2.micro"

      vpc_security_group_ids = [aws_security_group.example.id]

      tags = {
        Name = var.instance_names[count.index]
      }
    }
Because count is a special argument handled by Terraform itself, Terraform can evaluate this expression before it evaluates the rest of the configuration for the resource. This resource's implied definition function still takes a single index each time, with Terraform using the length of the list of names to decide how many times to call the function, which might now look like this:
    {
      type: "aws_instance",
      name: "example",
      depends_on: ["count.index", "var.instance_names", "aws_security_group.example"],
      config: function (idx, names, sg) {
        return {
          ami: "ami-abc1234",
          instance_type: "t2.micro",
          vpc_security_group_ids: [sg.id],
          tags: {
            Name: names[idx],
          },
        };
      },
    },
The result is three instances numbered with indices zero, one and two, each identical other than the Name tag. This sort of usage is attractive because it serves to de-emphasize the numeric indices assigned by Terraform and focus instead on the one-to-one relationship between the names and the instances.
This pattern runs into trouble if a new element is added into the middle of the list of names, though:
    variable "instance_names" {
      type = list(string)
      default = [
        "foo",
        "boop",
        "bar",
        "baz",
      ]
    }
The instances and their names are still connected only indirectly via their indices, and so this change is understood by Terraform not as adding a new name to the list but rather as renaming the instances with indexes one and two, and then adding a new instance with index three.
Assuming these pet names have some meaning to the user, the result is confusing: what was formerly instance "bar" is now instance "boop", and to make matters worse there is still an instance "bar" but it is the one that was initially called "baz".
This situation is particularly problematic when the list values are used to populate arguments that cannot be updated in-place. In this case, Terraform will needlessly destroy and re-create existing instances rather than just adding one new one.
Because this issue arises only on subsequent updates, this problem has emerged as a bit of a "trap": users will write modules that use this pattern, but find out too late that they actually can't change the list without destroying an important remote object. Working around this requires the use of Terraform plumbing commands that can, if not used with care, cause data loss.
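The index-shifting effect is easy to reproduce outside of Terraform. The following JavaScript sketch pairs each index with the name at that position, before and after the insertion. The helper diffByIndex is hypothetical, and whether an "update" here would really be an in-place change or a destroy and re-create depends on the argument involved.

```javascript
// Hypothetical sketch: with count, instance i is always paired with names[i],
// so inserting into the middle of the list shifts every later pairing.
function diffByIndex(oldNames, newNames) {
  const changes = [];
  const n = Math.max(oldNames.length, newNames.length);
  for (let i = 0; i < n; i++) {
    const from = i < oldNames.length ? oldNames[i] : null;
    const to = i < newNames.length ? newNames[i] : null;
    if (from === to) continue; // this index is unchanged
    const action = from === null ? "create" : to === null ? "delete" : "update";
    changes.push({ index: i, action, from, to });
  }
  return changes;
}

const changes = diffByIndex(
  ["foo", "bar", "baz"],
  ["foo", "boop", "bar", "baz"]
);
for (const c of changes) {
  console.log(`[${c.index}] ${c.action}: ${c.from} -> ${c.to}`);
}
// [1] update: bar -> boop
// [2] update: baz -> bar
// [3] create: null -> baz
```

One added name produces three planned changes, two of which touch instances the user did not intend to modify.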
Custom Instance Keys with for_each
The key problem (pun not intended!) with this pattern is that the relationships between indices and configuration values are decided by Terraform itself, and always forced to be based on indices. In the real world, the situation is rarely this clean, since remote systems tend not to themselves track objects by index. This creates a mismatch between Terraform's own tracking mechanism and the one used by the remote system or by the user.
This problem would be solved if Terraform allowed the user to customize the correspondences between configuration values and instances. The for_each feature aims to achieve that by generalizing the idea of instance indices to instead be instance keys, and then allowing the user to define which keys are used on a per-resource basis.
    resource "aws_instance" "example" {
      for_each = {for n in var.instance_names: n => n}

      ami           = "ami-abc1234"
      instance_type = "t2.micro"

      vpc_security_group_ids = [aws_security_group.example.id]

      tags = {
        Name = each.key
      }
    }
This more complex expression in the for_each argument is a new Terraform 0.12 feature for projecting between collection values. Those familiar with functional languages or certain imperative languages might recognize this as a map comprehension, though in the Terraform language we simply call it a "for expression". The effect of this expression is to construct a map from the list variable, with the same result as if the user had hand-written the following map:
    for_each = {
      foo = "foo"
      bar = "bar"
      baz = "baz"
    }
Whereas count forces the instance keys to be consecutive integers starting at zero, for_each allows the user to select arbitrary strings as instance keys, leading to a new variant of resource address:
aws_instance.example["bar"]
aws_instance.example["baz"]
aws_instance.example["foo"]
When for_each is used, instead of using count.index to select an index we can use each.key and each.value to reference keys and values from the given map, which in this case are both equal. Continuing our functional pseudo-code, we might consider this new block to be similar to the following:
    {
      type: "aws_instance",
      name: "example",
      depends_on: ["each.key", "aws_security_group.example"],
      config: function (key, sg) {
        return {
          ami: "ami-abc1234",
          instance_type: "t2.micro",
          vpc_security_group_ids: [sg.id],
          tags: {
            Name: key,
          },
        };
      },
    },
As with count, the for_each argument is handled by Terraform itself as a special case. Terraform evaluates the expression and then behaves as if it were calling this configuration function once per element in the resulting collection, passing in the key and value for each element if requested.
Most importantly, though: the Terraform state uses the keys from the collection to identify each instance, so introducing our new name boop into the list simply establishes a new instance with that key:
aws_instance.example["bar"]
aws_instance.example["baz"]
aws_instance.example["boop"]
aws_instance.example["foo"]
This resource's definition does not refer to any individual indices in the list of names — indeed, the conversion to map discards any sense of ordering — and so adding a new name to the list simply declares a new instance, leaving all of the existing instances untouched.
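To see why keyed instances avoid the earlier trap, the same before-and-after lists can be compared by key instead of by position. This JavaScript sketch uses a hypothetical helper, diffByKey, and treats an instance as unchanged whenever its key is present on both sides, which holds in this example because the only per-instance value is the key itself.

```javascript
// Hypothetical sketch: with for_each, instances are tracked by key, so the
// comparison is between sets of keys rather than between list positions.
function diffByKey(oldNames, newNames) {
  // Similar in spirit to the for expression {for n in names: n => n}.
  const toMap = (names) => Object.fromEntries(names.map((n) => [n, n]));
  const oldKeys = Object.keys(toMap(oldNames));
  const newKeys = Object.keys(toMap(newNames));
  return {
    create: newKeys.filter((k) => !oldKeys.includes(k)),
    delete: oldKeys.filter((k) => !newKeys.includes(k)),
    unchanged: newKeys.filter((k) => oldKeys.includes(k)),
  };
}

const plan = diffByKey(
  ["foo", "bar", "baz"],
  ["foo", "boop", "bar", "baz"]
);
console.log(plan.create); // [ 'boop' ]
console.log(plan.delete); // []
```

The position at which boop was inserted has no effect on the result. Only the membership of the key set matters.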
Other for_each Types
The example in the previous section showed what we expect will be the most common situation where for_each is set to a map value, allowing the user to concisely set both a string key and a potentially-complex value for each instance.
The for_each argument will also accept list values, in which case it is essentially a more readable form of count where the instances are still identified by numeric indices:
    resource "aws_instance" "example" {
      for_each = var.instance_names

      ami           = "ami-abc1234"
      instance_type = "t2.micro"

      vpc_security_group_ids = [aws_security_group.example.id]

      tags = {
        Name  = each.value
        Index = each.key
      }
    }
The above has all of the same caveats as using count, so it should be used sparingly, but may be useful in rare situations where the ordering of the instances is significant.
More interesting for the simple use-case we've used in this article is to use a value whose type is a set of strings:
    variable "instance_names" {
      type = set(string)
      default = [
        "foo",
        "bar",
        "baz",
      ]
    }

    resource "aws_instance" "example" {
      for_each = var.instance_names

      ami           = "ami-abc1234"
      instance_type = "t2.micro"

      vpc_security_group_ids = [aws_security_group.example.id]

      tags = {
        Name  = each.value
        Index = each.key
      }
    }
By changing the type of var.instance_names to be set(string) rather than list(string) we tell Terraform that the ordering of these elements is not significant and that each value must be distinct, which is the same set of constraints as for the instance keys themselves. This allows us to use the variable directly as the for_each expression without the gotcha of using numeric index keys.
In fact, internally Terraform treats a set of strings here as if it were the for expression we saw in our first example above, constructing a map where the key and value for each element are equal. A set of strings is, therefore, just a shorthand for this common case.
Conclusion
The for_each feature was too large to fit in the initial Terraform 0.12 release along with all of the other significant language changes, but the 0.12 development process did include a lot of groundwork for this feature such as making sure the state serialization format can deal with both integer and string instance keys.
We plan to complete the feature in a minor release in the v0.12 series, though the timeline for that will depend on how much post-release work is required for the other changes coming in 0.12.0. I think this will be a big help for anyone writing re-usable modules that create abstractions, and will be a logical extension of the other expression-level improvements in 0.12.0.