Resources in Terraform support the special arguments `count` and `for_each`, which both serve as declarations that a particular resource configuration should be repeatedly evaluated to produce multiple instances of that single resource.
```hcl
resource "null_resource" "example1" {
  count = 2

  triggers = {
    index = count.index
  }
}

resource "null_resource" "example2" {
  for_each = {
    a = "foo"
    b = "bar"
  }

  triggers = {
    key   = each.key
    value = each.value
  }
}
```
If we run `terraform plan` against the above configuration, we can see that each of those resource blocks expands out into two resource instances:
```
Terraform will perform the following actions:

  # null_resource.example1[0] will be created
  + resource "null_resource" "example1" {
      + id       = (known after apply)
      + triggers = {
          + "index" = "0"
        }
    }

  # null_resource.example1[1] will be created
  + resource "null_resource" "example1" {
      + id       = (known after apply)
      + triggers = {
          + "index" = "1"
        }
    }

  # null_resource.example2["a"] will be created
  + resource "null_resource" "example2" {
      + id       = (known after apply)
      + triggers = {
          + "key"   = "a"
          + "value" = "foo"
        }
    }

  # null_resource.example2["b"] will be created
  + resource "null_resource" "example2" {
      + id       = (known after apply)
      + triggers = {
          + "key"   = "b"
          + "value" = "bar"
        }
    }

Plan: 4 to add, 0 to change, 0 to destroy.
```
The instances each receive an instance key which uniquely identifies them. The ones created by `count` have numeric instance keys corresponding to the `count.index` value inside the block body, while the ones created by `for_each` have string instance keys that correspond with the keys of the given map and are exposed to expressions as `each.key`.
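As a rough illustration of the two kinds of instance key, here's a small self-contained Go sketch. The `InstanceKey`, `IntKey`, and `StringKey` names are modeled loosely on ideas in Terraform's internals, but this is invented illustration code, not Terraform's actual implementation:

```go
package main

import "fmt"

// InstanceKey uniquely identifies one instance of a repeated resource.
// count produces integer keys; for_each produces string keys.
type InstanceKey interface{ keyString() string }

type IntKey int
type StringKey string

func (k IntKey) keyString() string    { return fmt.Sprintf("[%d]", int(k)) }
func (k StringKey) keyString() string { return fmt.Sprintf("[%q]", string(k)) }

// InstanceAddr renders an instance address like null_resource.example1[0]
// or null_resource.example2["a"].
func InstanceAddr(resourceAddr string, key InstanceKey) string {
	return resourceAddr + key.keyString()
}

func main() {
	fmt.Println(InstanceAddr("null_resource.example1", IntKey(0)))
	fmt.Println(InstanceAddr("null_resource.example2", StringKey("a")))
}
```

Running this prints the two address forms shown in the plan output above.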
In today's Terraform, this repetition ability is a special power of resources and isn't available for any other Terraform construct. While `dynamic` blocks do allow for a similar sort of repetition with `for_each` for nested blocks, that mechanism is rather different in that it serves only as a macro for generating the configuration data to be passed to the provider, not as something Terraform itself tracks.
For a long time now we've wanted to extend this repetition ability to Terraform modules, but that has been a rather more complicated feature than we'd like. I've recently been doing some exploration of ways we might implement that, and so this article is documenting both a summary of what makes this problem a complicated one and also of a potential solution I've found which seems like a promising technical direction to build on.
How Resource Repetition Works
An important capability of both `count` and `for_each` is that their respective expressions can refer to other objects in the configuration. This is important to allow for the expression of more complex intents, like "for each virtual machine, create a DNS record", which might look something like this:
```hcl
variable "virtual_machine_names" {
  type = set(string)
}

resource "example_virtual_machine" "vm" {
  for_each = var.virtual_machine_names

  # each.key here is one of the values from var.virtual_machine_names
  name = each.key

  # (...other virtual machine arguments...)
}

resource "example_dns_record" "vm" {
  for_each = example_virtual_machine.vm

  type = "A"

  # each.key here is the key associated with one virtual machine, which
  # in turn is one of the values from var.virtual_machine_names
  name = "${each.key}.example.com"

  # each.value here refers to one virtual machine instance, as an object
  value = each.value.ip_address
}
```
This capability creates an interesting constraint: Terraform builds a dependency graph to determine a correct order to evaluate all of the expressions in a configuration, and because the `count` and `for_each` expressions can contain references themselves, that means that the `resource` blocks are what must participate in the dependency graph, not the potentially-many instances that result from repeating them.
To accommodate that, the graph Terraform builds for planning is one of static resource blocks rather than of the individual instances created from them. It's for that reason that an expression like `aws_instance.foo[0]` creates a dependency on `aws_instance.foo` as a whole, not on instance zero specifically.
When Terraform walks the dependency graph, it visits each configuration object in turn; when it visits a resource block it checks to see if `count` or `for_each` is set, evaluates them if so, and then produces a separate "subgraph" representing the resulting instances, which is conceptually nested inside the graph node representing the resource.
Inside Terraform Core, this process of generating a subgraph for a node during
the graph walk is called dynamic expansion. The nested nodes don't exist
at all during initial graph construction: the nested subgraph is created
immediately after Terraform visits the containing node, at which point the
dynamic expansion logic can make use of all of the information learned so
far during the graph walk. That includes the values for other configuration objects that the `count` and `for_each` expressions depend on.
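The dynamic expansion step can be sketched in simplified form. Here, a hypothetical `node` type (invented for illustration, standing in for Terraform's real graph node implementations) expands into instance addresses only once its `for_each` value has been resolved during the walk:

```go
package main

import (
	"fmt"
	"sort"
)

// node is a stand-in for a resource's graph node. Its forEach value is
// unknown at graph construction time and is filled in during the walk,
// once the expressions it depends on have been evaluated.
type node struct {
	addr    string
	forEach map[string]string // nil means no repetition: one instance, no key
}

// dynamicExpand produces the instance addresses for the node's subgraph.
// It can only run at visit time, after forEach has been resolved.
func (n *node) dynamicExpand() []string {
	if n.forEach == nil {
		return []string{n.addr}
	}
	keys := make([]string, 0, len(n.forEach))
	for k := range n.forEach {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for a stable result
	instances := make([]string, len(keys))
	for i, k := range keys {
		instances[i] = fmt.Sprintf("%s[%q]", n.addr, k)
	}
	return instances
}

func main() {
	n := &node{addr: "null_resource.example2"}
	n.forEach = map[string]string{"a": "foo", "b": "bar"} // resolved during the walk
	for _, inst := range n.dynamicExpand() {
		fmt.Println("visit", inst)
	}
}
```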
These subgraphs can potentially have their own inter-dependencies between nodes, but in practice for resource dynamic expansion the resulting graph is just one node per resource instance and no edges at all, because the instances of a single resource are not permitted to depend on one another.
Terraform then performs a nested graph walk of this nested graph, which must
complete before Terraform will consider the containing node to be "done" and
move on to visiting its dependents. During Terraform's plan phase, the action
taken on visiting each of the nodes representing resource instances is to
call into the corresponding provider to produce the planned action for that
particular resource instance; to do so, it recieves the result of evaluating
all of the resource-type-specific arguments included inside the resource
block.
Because `count.index`, `each.key`, and `each.value` are available to the
expressions inside a resource block, they can be used to create differences
in the final configuration values for each of the instances of a particular
resource, which means that each instance has its own object value representing
the configuration which is passed to the provider for planning. The provider
doesn't know that these objects are multiple instances of the same resource,
since each instance is processed separately from all others from a provider's
perspective.
For the example configuration shown above then, the graph traversal will visit the nodes in the following order and take these actions:
1. Visit `var.virtual_machine_names` and evaluate the expression given for that variable in the calling module, saving the result for later use.
2. Visit `example_virtual_machine.vm` and evaluate its `for_each` expression, using the result for dynamic expansion to produce a subgraph. The subgraph has no dependency edges, so all of the nodes in it can be visited concurrently:
   - Visit `example_virtual_machine.vm["a"]`, evaluating its configuration with `each.key` set to `"a"` and then passing the result to the `example` provider for planning.
   - Visit `example_virtual_machine.vm["b"]`, evaluating its configuration with `each.key` set to `"b"` and then passing the result to the `example` provider for planning.
3. Visit `example_dns_record.vm` and evaluate its `for_each` expression, using the result for dynamic expansion to produce a subgraph. The subgraph has no dependency edges, so all of the nodes in it can be visited concurrently:
   - Visit `example_dns_record.vm["a"]`, evaluating its configuration with `each.key` set to `"a"` and `each.value` set to the object produced by planning `example_virtual_machine.vm["a"]`, and then passing the result to the `example` provider for planning.
   - Visit `example_dns_record.vm["b"]`, evaluating its configuration with `each.key` set to `"b"` and `each.value` set to the object produced by planning `example_virtual_machine.vm["b"]`, and then passing the result to the `example` provider for planning.
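That traversal can be simulated with a short sketch. The `walkOrder` helper is invented for illustration and simply emits one action label per step, in the dependency order described above:

```go
package main

import "fmt"

// walkOrder simulates the plan-time traversal for the VM/DNS example:
// the variable first, then each resource block expanding into one
// planned instance per element of the input set.
func walkOrder(vmNames []string) []string {
	actions := []string{"evaluate var.virtual_machine_names"}
	for _, name := range vmNames {
		actions = append(actions, fmt.Sprintf("plan example_virtual_machine.vm[%q]", name))
	}
	// example_dns_record.vm's for_each refers to the planned VM objects,
	// so these instances can only be planned after the VMs above.
	for _, name := range vmNames {
		actions = append(actions, fmt.Sprintf("plan example_dns_record.vm[%q]", name))
	}
	return actions
}

func main() {
	for _, a := range walkOrder([]string{"a", "b"}) {
		fmt.Println(a)
	}
}
```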
This process works because all of the instances of a particular resource are configured by the same set of expressions, and thus they are all required to have the same static dependencies. Therefore a single node can represent the resource as a whole, safe in the knowledge that it can represent the collective dependencies of all of the instances that will be created just in time during dynamic expansion. The instances don't need to produce any dependency edges themselves because they are embedded within a node that already declared dependencies for them all.
How Terraform modules work
In Terraform, modules are a code reuse and namespacing construct. Modules can be installed from remote sources such as the public registry, or they can just refer to nearby directories in the local filesystem.
Each module can define its own set of objects whose names are independent of names in other modules. That means two modules in a configuration can both define a resource with the same type and name without any problem, and furthermore a configuration can call the same module multiple times with each of them declaring its own separate set of resource instances.
For the sake of example here, let's focus on modules in the local filesystem and imagine we have the following directory structure:
```
example1/
  main.tf
example2/
  main.tf
```
The `example1/main.tf` file might include both a direct resource definition and a call to the module represented by the `example2` directory:
```hcl
resource "null_resource" "example" {
  triggers = {
    val = "foo"
  }
}

module "example2" {
  source = "../example2"

  vals = ["bar", "baz"]
}

output "ids" {
  value = concat(
    [null_resource.example.id],
    module.example2.ids,
  )
}
```
The `example2/main.tf` file can then in turn declare objects belonging to that module:
```hcl
variable "vals" {
  type = set(string)
}

resource "null_resource" "example" {
  for_each = var.vals

  triggers = {
    val = each.key
  }
}

output "ids" {
  value = [for r in null_resource.example : r.id]
}
```
Both `example1` (the root module) and `example2` contain a `null_resource` resource named `example`, but Terraform distinguishes them by assigning each one an absolute resource address:

- `null_resource.example` for the one in the root module.
- `module.example2.null_resource.example` for the one in the child module.
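The structure of these absolute addresses — a possibly-empty chain of module steps followed by the resource itself — can be sketched like this. This is a deliberate simplification; Terraform's real `addrs` package is considerably richer:

```go
package main

import "fmt"

// modulePath is the chain of module call names leading from the root.
// An empty path represents the root module itself.
type modulePath []string

// resourceAddr renders an absolute resource address such as
// module.example2.null_resource.example.
func (p modulePath) resourceAddr(typeName, name string) string {
	prefix := ""
	for _, call := range p {
		prefix += "module." + call + "."
	}
	return prefix + typeName + "." + name
}

func main() {
	root := modulePath{}
	child := modulePath{"example2"}
	fmt.Println(root.resourceAddr("null_resource", "example"))
	fmt.Println(child.resourceAddr("null_resource", "example"))
}
```

This prints the two addresses listed above, showing how the same local name stays unambiguous across modules.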
A key detail about how modules work in Terraform today is that they are, once installed, solely a mechanism for namespace management and they do not appear directly in the graph themselves. The individual input variables and outputs are included in the graph to bridge between the namespaces, but the module itself does not appear directly.
This "flattening" of modules when constructing the graph has two interesting benefits:

- Terraform can start working on a resource in a module as soon as the individual module variables it depends on are ready, rather than waiting for all of the module's variables to become ready. This can therefore result in improved concurrency for modules that have many different resources that are not themselves directly interconnected.
- It's possible for one input variable of a module to depend on one of the outputs of that same module, as long as the output doesn't itself depend (directly or indirectly) on the input variable in question. This is not a common need, but it can be useful in module composition in situations where a matrix of connections needs to be made.
For example, I used this capability in my shared module `terraformnet/vpc-region/aws`, where the `modules/peering-mesh` helper module uses module calls that mutually refer to one another in order to declare a mesh of VPC peering connections, which is possible only because the output value `outgoing_connection_ids` does not depend on the input variable `other_region_connections`.
As Terraform walks the graph, it maintains a separate evaluation context per module so that expression evaluation within each node can be performed in the context of the right module. The graph walk logic asks each node which module path it belongs to and then passes it the appropriate evaluation context for that module. Some of the data that the evaluation contexts refer to is module-agnostic global data, such as the state data structure, so the different evaluation contexts all cooperate to update different portions of such structures as appropriate.
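A rough model of that arrangement — per-module evaluation contexts wrapping one shared, mutex-guarded global structure — might look like the following. All of the type and method names here are invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// sharedState stands in for the module-agnostic global data, such as
// the state data structure, that all evaluation contexts cooperate on.
type sharedState struct {
	mu      sync.Mutex
	planned map[string]string // absolute address -> planned value
}

// evalContext evaluates expressions in the namespace of one module,
// writing results into the shared structure under that module's prefix.
type evalContext struct {
	modulePrefix string // e.g. "" for root, "module.example2." for a child
	shared       *sharedState
}

func (c *evalContext) record(localAddr, value string) {
	c.shared.mu.Lock()
	defer c.shared.mu.Unlock()
	c.shared.planned[c.modulePrefix+localAddr] = value
}

func main() {
	shared := &sharedState{planned: map[string]string{}}
	rootCtx := &evalContext{modulePrefix: "", shared: shared}
	childCtx := &evalContext{modulePrefix: "module.example2.", shared: shared}

	// The two same-named resources land in different parts of the
	// shared structure, so they never collide.
	rootCtx.record("null_resource.example", "foo")
	childCtx.record("null_resource.example", "bar")
	fmt.Println(len(shared.planned), "entries")
}
```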
Repetition of Modules
The long-desired capability of supporting `count` and `for_each` inside `module` blocks requires combining the two behaviors described above. Unfortunately, as implemented today these behaviors are not compatible, because repetition of resources requires expansion of a graph node and yet modules are not represented by a single graph node.
Although Terraform v0.12 laid some groundwork for the surface-level manifestations of multi-instance modules, such as supporting module instance addresses like `module.foo["bar"]`, the internal processing in Terraform Core still assumes that one module call in configuration results in one module instance, and that aside from resources the other constructs that can be declared within modules are likewise one-to-one with their configuration constructs.
Since the v0.12 release, we've been evaluating some different design approaches for continuing that work to make Terraform Core ready to support repetition of modules alongside repetition of resources.
As a motivating example for the discussion in this section, let's modify our two modules from above to do the repetition at the module level instead of at the nested resource level.
In `example1/main.tf`:
```hcl
resource "null_resource" "example" {
  triggers = {
    val = "foo"
  }
}

module "example2" {
  source = "../example2"

  for_each = toset(["bar", "baz"])

  val = each.key
}

output "ids" {
  value = concat(
    [null_resource.example.id],
    [for mi in module.example2 : mi.id],
  )
}
```
...and in `example2/main.tf`:
```hcl
variable "val" {
  type = string
}

resource "null_resource" "example" {
  triggers = {
    val = var.val
  }
}

output "id" {
  value = null_resource.example.id
}
```
Notice that now the `example2` module contains only a single-instance resource, and it's the module call itself that has `for_each` set. To express that difference in resource instance address notation, we can say that our earlier version had `module.example2.null_resource.example["bar"]`, but this new version instead has `module.example2["bar"].null_resource.example`, where the instance key has moved to attach to the module call step rather than to the resource step at the end.
Attempt 1: Hierarchical Dynamic Expansion
Modules are a more troublesome object to implement repetition for because, unlike resources, they are containers for other objects; repeating a module implies repeating all of the objects inside.
When I first started searching for design approaches to solve this problem, I initially tried to directly apply the resource repetition approach to modules by introducing modules into the graph as containers with subgraphs, which would result in a more tree-like structure of recursively expanding graph nodes.
This approach can produce a correct operation ordering in this simple case, but it loses the two benefits described earlier of flattening all modules into a single graph. In particular, it becomes impossible to represent the situation where one input variable depends on another output value of the same module: it would cause the module node to depend on itself, creating a dependency cycle.
When using `count` and `for_each` on a module it's inevitable that the objects inside the module would need to block on the evaluation of those expressions, so module expansion requires a certain amount of compromise on the concurrency issue, but the capability of a module to reference itself is a trickier proposition: while situations where it arises are relatively exceptional, when it does arise there is often no convenient alternative.
Another reason this approach gave me pause is the dramatic increase in complexity of the graph traversal: it can already be very hard to debug when things aren't working right, and multi-level recursive expansion would literally add another dimension of complexity.
After completing this initial investigation I had to divert my attention elsewhere for a while, so I just drew a figurative line under the above results and made a note that we might have some difficult design compromises ahead for the module repetition feature.
Attempt 2: Leaf Object Dynamic Expansion
After some time away from the problem, I got some inspiration to take another attempt this week after some other work led me to a useful observation: no matter how many times you repeat a particular module, all of the instances must always have the same set of configuration objects inside due to the static nature of Terraform's module graph.
To put that another way: in our example above we can see that the expansion of `module.example2` produces two subgraphs whose sets of nodes and topologies are always identical to one another. It's only the data flowing between those objects that varies, as a result of references to `count.index`, `each.key`, or `each.value`.
That then in turn led me to realize that, in a plan graph where each node represents some object in the configuration, we don't actually need to create a separate set of nodes for each instance of a module. Instead, we can create a single graph node representing, say, the `null_resource.example` resource inside the `example2` module above, and then have its dynamic expansion take care of noticing that it's contained within a repeated module.
If we apply that same principle to the graph nodes of all objects that can appear inside modules, we can avoid an arbitrarily-nested hierarchy of subgraphs and instead just let the leaf nodes expand to as many instances as necessary to cover every combination of calling module instances.
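The key operation here is a cross product: each leaf node expands to one instance per combination of containing module instance and its own repetition keys. A minimal sketch, with an invented `expandLeaf` helper:

```go
package main

import "fmt"

// expandLeaf returns one instance address per combination of a
// containing module instance and one of the resource's own keys.
func expandLeaf(moduleInstances, resourceKeys []string, resource string) []string {
	var out []string
	for _, mod := range moduleInstances {
		for _, key := range resourceKeys {
			out = append(out, mod+resource+key)
		}
	}
	return out
}

func main() {
	mods := []string{`module.example2["bar"].`, `module.example2["baz"].`}
	keys := []string{`[0]`, `[1]`} // e.g. from count = 2 on the resource
	for _, addr := range expandLeaf(mods, keys, "null_resource.example") {
		fmt.Println(addr)
	}
}
```

A single graph node for the resource can run this expansion at visit time, producing all four instances without any nested subgraph hierarchy.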
This approach requires two major changes to the graph shape:
- Each module call gets a graph node that represents the evaluation of its `count` or `for_each` expression, but does not include any of the contents of the module, or any of its input variables.
- All node types representing objects that can belong to modules must have dynamic expansion behavior, whereas in current Terraform only resource nodes do.

All nodes representing objects in a module must depend on the node representing the module itself, so the dynamic expand behavior can know how many instances of the containing module(s) there are and expand accordingly.
As before, we inevitably introduce new dependency edges representing the references in the `count` and `for_each` expressions, slightly constraining concurrency whenever they are used, but this would cause no meaningful change to the processing of modules with neither argument set.
Note also that the individual variables and outputs for the module still appear as distinct nodes in the top-level graph, and so they can have dependencies between them individually and thus we avoid breaking the ability for a module's inputs to refer to its own outputs.
A crucial part of this approach is that all of the graph nodes must be aware of the repetitions of all of the modules they are contained within. In our example above we only have one level of nested module, but Terraform allows arbitrary nesting depth in principle, and so the expansion can potentially be exponential. (The Terraform team recommends against using grandchild modules, and recommends module composition instead, but module nesting must still work for situations where it really is the best option.)
This need to track expansion of containing modules suggests that repetition would become a more cross-cutting concern, visible to many more components. The current implementation of resource repetition is largely contained within the node types that represent resources, but extrapolating that to apply to many more node types would lead to a lot of code duplication and careful coordination. Can we handle expansion in a more distributed way while ensuring it stays consistent across all node types?
The Instance Expander object
In an attempt to answer "yes" to the question posed in the last section, I've designed an abstraction currently called an "instance expander", which has three responsibilities:
1. To collect information about the `count` or `for_each` usage of modules and resources as it gradually becomes available during the graph walk.
2. To use that gathered information to determine the fully-expanded set of instances for a given module or resource, including any nested expansion.
3. To determine what values should be set for `each.key`, `each.value`, and `count.index` when evaluating argument expressions for a particular module or resource instance.
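Those responsibilities can be sketched as a small mutex-guarded struct. This is a deliberately simplified stand-in for the real `Expander` — it handles only `for_each` on module calls at a single level of nesting, and all names are invented for illustration:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// expander is a simplified sketch of the instance expander idea: it
// gathers repetition decisions during the walk and answers expansion
// queries later. (Not Terraform's actual implementation.)
type expander struct {
	mu         sync.Mutex
	moduleKeys map[string][]string // module call name -> sorted for_each keys
}

func newExpander() *expander {
	return &expander{moduleKeys: map[string][]string{}}
}

// setModuleForEach records the keys of a repeated module call once its
// for_each expression has been evaluated during the graph walk.
func (e *expander) setModuleForEach(call string, keys []string) {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	e.mu.Lock()
	defer e.mu.Unlock()
	e.moduleKeys[call] = sorted
}

// expandModule returns one address per instance of the named module.
func (e *expander) expandModule(call string) []string {
	e.mu.Lock()
	defer e.mu.Unlock()
	out := make([]string, len(e.moduleKeys[call]))
	for i, k := range e.moduleKeys[call] {
		out[i] = fmt.Sprintf("module.%s[%q]", call, k)
	}
	return out
}

func main() {
	ex := newExpander()
	ex.setModuleForEach("example2", []string{"baz", "bar"})
	for _, addr := range ex.expandModule("example2") {
		fmt.Println(addr)
	}
}
```

The mutex matters because, in a real graph walk, many node visits would be registering and querying expansions concurrently.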
I implemented the above encapsulated in a single Go struct type that uses a mutex to ensure that it will always remain self-consistent despite many concurrent readers and writers during a Terraform graph walk. If plumbed fully into Terraform, the callers of `Expander` methods would be spread across different graph node implementations and other subsystems, but for the sake of demonstration I'll write some simpler programs that perform the same steps as sequential code.
First, let's reproduce the situation from our most recent example above, that has module `example2` with `for_each` as a set of two strings:
```go
ex := instances.NewExpander()

// We'll create some address objects to make things easier to read
// below. These would be dynamic from the configuration in normal use,
// but we're using constants here for simplicity's sake.
nullResourceExample := addrs.Resource{ // resource "null_resource" "example"
	Mode: addrs.ManagedResourceMode,
	Type: "null_resource",
	Name: "example",
}
moduleCallExample2 := addrs.ModuleCall{ // identifies the module "example2" config block
	Name: "example2",
}
moduleExample2 := addrs.RootModule.Child("example2") // absolute path to example2 module

// Call the following when visiting null_resource.example from the
// root module.
for _, moduleAddr := range ex.ExpandModule(addrs.RootModule) {
	// The root module is always a singleton, so there will only be one
	// iteration here with moduleAddr set to the root module.
	ex.SetResourceSingle(moduleAddr, nullResourceExample)
}

// The node representing null_resource.example in the root module
// would then dynamic expand, producing only a single instance in this case.
for _, resourceAddr := range ex.ExpandResource(addrs.RootModule, nullResourceExample) {
	fmt.Printf("plan %s\n", resourceAddr)
}

// Call the following when visiting module.example2.
ex.SetModuleForEach(
	addrs.RootModuleInstance, moduleCallExample2,
	map[string]cty.Value{
		"bar": cty.StringVal("bar"),
		"baz": cty.StringVal("baz"),
	},
)

// We'd then visit var.val in the example2 module, but the main visit
// doesn't do anything because we need to dynamic expand to evaluate
// the variable for all of the module instances it belongs to.
// ....

// Call the following for dynamic expansion of var.val. We can ask the
// expander to expand the example2 module here because its repetition was
// already set with SetModuleForEach above.
for _, moduleAddr := range ex.ExpandModule(moduleExample2) {
	varAddr := moduleAddr.InputVariable("val")
	fmt.Printf("evaluate %s\n", varAddr)
}

// Call the following when visiting null_resource.example in the example2
// module.
for _, moduleAddr := range ex.ExpandModule(moduleExample2) {
	// After we've noted that there's no for_each or count set on this
	// resource, we must register that for each containing module instance.
	// (We need to do this in a loop because if for_each or count _were_
	// set then we'd need to evaluate the expression separately for each
	// module instance in case it depends on an input variable.)
	ex.SetResourceSingle(moduleAddr, nullResourceExample)
}

// The node representing null_resource.example in the example2 module
// would then dynamic expand, visiting each of the expanded instances.
for _, resourceAddr := range ex.ExpandResource(moduleExample2, nullResourceExample) {
	fmt.Printf("plan %s\n", resourceAddr)
}

// We'd then visit output.id in the example2 module, but again the
// main visit doesn't do anything.
// ....

// Call the following for dynamic expansion of output.id.
for _, moduleAddr := range ex.ExpandModule(moduleExample2) {
	outputAddr := moduleAddr.OutputValue("id")
	fmt.Printf("evaluate %s\n", outputAddr)
}

// Finally we visit the root module output.ids, and again the main visit
// doesn't do anything.
// ...

// Call the following for dynamic expansion of the root output.ids.
for _, moduleAddr := range ex.ExpandModule(addrs.RootModule) {
	outputAddr := moduleAddr.OutputValue("ids")
	fmt.Printf("evaluate %s\n", outputAddr)
}
```
The above is using types from Terraform's `addrs` package, which is how Terraform internally represents addresses like `null_resource.example` given in the configuration. The string representations of these are the forms that would be written in the configuration. The result of running the above is the expected sequence of operations for our example configuration:
```
plan null_resource.example
evaluate module.example2["bar"].var.val
evaluate module.example2["baz"].var.val
plan module.example2["bar"].null_resource.example
plan module.example2["baz"].null_resource.example
evaluate module.example2["bar"].output.id
evaluate module.example2["baz"].output.id
evaluate output.ids
```
Things get more interesting when we have repeated objects inside other repeated objects. Consider for example the case where `count = 2` were set on the resource in the `example2` module. That means our registration during the visit of that node changes slightly in the above program:
```go
// Call the following when visiting null_resource.example in the example2
// module.
for _, moduleAddr := range ex.ExpandModule(moduleExample2) {
	// Here we'd evaluate the count expression in the context of each
	// of our expanded module addresses. For our example here it's
	// hard-coded to 2, so we'd set them all the same.
	ex.SetResourceCount(moduleAddr, nullResourceExample, 2)
}

// The node representing null_resource.example in the example2 module
// would then dynamic expand, visiting each of the expanded instances.
for _, resourceAddr := range ex.ExpandResource(moduleExample2, nullResourceExample) {
	fmt.Printf("plan %s\n", resourceAddr)
}
```
With that change in place, we can now see the effect of the extra level of expansion:
```
plan null_resource.example
evaluate module.example2["bar"].var.val
evaluate module.example2["baz"].var.val
plan module.example2["bar"].null_resource.example[0]
plan module.example2["bar"].null_resource.example[1]
plan module.example2["baz"].null_resource.example[0]
plan module.example2["baz"].null_resource.example[1]
evaluate module.example2["bar"].output.id
evaluate module.example2["baz"].output.id
evaluate output.ids
```
This exponential expansion effect would be increased for each additional level of module nesting, assuming that all of the module calls produce the same number of instances. In practice though, since both `count` and `for_each` can derive from the input variables of the calling module, the expansion count under each instance is likely to vary.
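The worst case is easy to quantify: the number of leaf instances is the product of the instance counts at each level of nesting. A trivial sketch (the helper name is invented):

```go
package main

import "fmt"

// totalInstances multiplies the per-level instance counts together,
// showing how nested repetition compounds multiplicatively.
func totalInstances(countsPerLevel []int) int {
	total := 1
	for _, n := range countsPerLevel {
		total *= n
	}
	return total
}

func main() {
	// Two module instances, each containing a resource with count = 2.
	fmt.Println(totalInstances([]int{2, 2})) // 4
}
```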
We can also simulate the effect of another related common request for modules: the ability to disable an entire module by setting its `count` to zero or its `for_each` to an empty collection:
```go
// Call the following when visiting module.example2.
ex.SetModuleForEach(
	addrs.RootModuleInstance, moduleCallExample2,
	map[string]cty.Value{}, // Like for_each = {}
)
```
This produces the following output:
```
plan null_resource.example
evaluate output.ids
```
Although we still visited all of the nodes representing the objects in the `example2` module, the `ExpandModule` and `ExpandResource` calls all returned an empty set of instances, so we took no visible action for any of them and they are effectively all disabled. Huzzah! 🎉
What's Next?
As a further test of the viability of this approach I've written Pull Request #23462, which includes both the `Expander` code itself and an initial integration of it into Terraform Core as a partial replacement of the existing implementation of `count` and `for_each` for resources, by just forcing all of the modules to be registered as singletons for now.
Terraform Core would still need quite a lot of work to fully implement `count` and `for_each` for modules using this "expander" technique, including the following:
- Terraform's graph walker logic assumes that nodes belong to module instances rather than modules, which is a distinction without a difference today because modules are always singletons, but implementing the above technique will require it to be clear that the plan walk nodes belong to modules themselves, not yet expanded.
- The graph node implementations representing resources are the only ones that currently know how to "dynamic expand", so we'd need to add similar capabilities to the node types representing all of the other objects that can belong to a module, including input variables, output values, local values, and provider configurations.
- The treatment of module expansion during the apply phase still needs more thought, because Terraform currently implements that by recording the result of all of the resource expansions in the plan so that it can start with already-expanded resource instances in the apply step. In order to implement expansion of all of the other node types during the apply step, we'd either need to record additional information in the plan about how the modules and module objects expanded, or add some additional graph nodes to the apply graph in order to give an opportunity to re-decide and register the expansions with the `Expander` for correct operation.
Along the way here I realized that merging my currently-draft PR would add a suitable building block for another oft-requested feature: `depends_on` for modules. My PR above adds a new graph node representing the decision about repetition of a module, and that node could potentially also serve to represent `depends_on`. If the Terraform team decides to move forward with something like that PR, an implementation of module `depends_on` might be a nice side-benefit!
We'll need to think through the tradeoffs of this design approach first though, because we wouldn't want to merge the PR and then find it's an evolutionary dead-end with regard to `count` and `for_each`. We'll see!