It's so common a question now that it's essentially a trope: why does Terraform have its own language, rather than just using some general-purpose language to generate a data structure and operating on that?

There are various answers to that question, though many of them come down to personal preference. For example, when I see folks ask this question in busy forums I see lots of responses that argue that it's better for the description of your infrastructure to look like a manifest of what should exist, rather than the instructions for building such a manifest.

However, I'm sure there are just as many people who would assert that they find it easier to read their particular favorite general-purpose imperative programming language than the custom Terraform language syntax. And I'm not here to tell them they are wrong: as with many questions of this sort, it ultimately comes down to your personal taste and what other technology you are most familiar or comfortable with.

However, there is one feature of the Terraform language that doesn't get a whole lot of attention but is actually pretty crucial to Terraform providing a key feature: terraform plan.

What Terraform does when it plans

For those not so familiar with Terraform, I suppose I should quickly describe what terraform plan actually is. One of Terraform's innovations over its competitors when we first built it (a "we" that didn't include me quite yet, but still) is its ability to propose a set of actions to take based on the difference between configuration and existing state, and then actually take those actions.

This is subtly different than the "dry run" feature commonly seen in other software that addresses similar problems. Rather than behaving as if it's taking the actions but just stubbing out the side-effects, Terraform's plan phase is in a sense writing a program to achieve the desired state. When you apply the plan, Terraform then runs that program, ensuring along the way that it'll either do what the plan said it would do or generate an error explaining why it can't (e.g. because the outside world changed in the meantime).

A classic problem with "dry run" functionality is that often operations are interdependent. For example, we might need to create an AWS VPC and then use the ID of that VPC to configure a number of AWS subnets. But until we actually create the VPC, we don't know what that ID will be, and so the configuration for the subnet is incomplete.

To create a plan in spite of that constraint, Terraform still evaluates all of the configured arguments for each of your resources, but internally it uses placeholders for values it doesn't know yet. In the internals of the Terraform language runtime we call those unknown values, although in the UI they appear as (known after apply); these two ideas are equivalent.

The result of planning, then, is an overview of all of the high-level resource actions Terraform intends to take, and a partial description of the inputs to those operations, using unknown value placeholders where necessary.

"Unknown value" placeholders in other languages

I'm sure at this point some readers are making a connection between this idea of unknown values and the related idea of "promises" or "futures". These mechanisms are available in other languages, sometimes just as a library and other times more deeply integrated, and indeed allow a value to serve as a placeholder for a value which will become known later.

If we think about some JavaScript-like pseudocode, we might represent my example above, of subnet configuration needing a VPC id, like this:

let vpc = createVpc({cidrBlock: '10.1.0.0/16'});
let subnets = vpc.then(vpc => {
  let subnets = {
    'us-west-2a': '10.1.1.0/24',
    'us-west-2b': '10.1.2.0/24',
  };
  return Object.entries(subnets).map(([az, cidr]) => createSubnet({
    vpcId: vpc.id,
    cidrBlock: cidr,
    availabilityZone: az,
  }));
});

This does indeed address the problem of not being able to construct the subnet objects until the VPC is created, but this particular design also means that we can't even see the general structure of the subnet configuration until we're ready to resolve the vpc promise.

There are some other possible designs here which could avoid that problem. For example, perhaps we could call the then callback with an object such that vpc.id is itself a promise, and thus we'd still be able to see what properties that object has, just not what concrete value vpcId has in particular. That would be totally sufficient to produce a description of a planned set of actions just like Terraform does, assuming that these createVpc and createSubnet functions were writing these objects into some implied global data structure that would serve a similar purpose to a Terraform configuration.

However, the other part of Terraform's promise about plans is that it will either use the configuration it showed during planning or it'll produce an error saying why it can't. In particular, Terraform should not silently send the provider some other configuration entirely.

And that is, unfortunately, where the promise-based design falls down: there's no guarantee that running the program again with a concrete value for vpc.id will produce the same result. For example, the program could make use of something non-deterministic, like Math.random, or it could intentionally deviate its behavior based on whether vpc.id is a promise:

let vpc = createVpc({cidrBlock: '10.1.0.0/16'});
let subnets = vpc.then(vpc => {
  let subnets = {
    'us-west-2a': '10.1.1.0/24',
    'us-west-2b': '10.1.2.0/24',
  };
  return Object.entries(subnets).map(([az, cidr]) => createSubnet({
    vpcId: vpc.id,
    cidrBlock: vpc.id.then ? cidr : '10.1.5.0/24',
    availabilityZone: az,
  }));
});

Now of course this is a very contrived situation: it would be self-defeating to intentionally write a program that lies about what it's going to do. However, it's also possible to introduce this sort of non-determinism by mistake, by relying on behavior that isn't guaranteed. For example, if I were determining these CIDR prefixes by fetching data from a remote API rather than hard-coding them, the remote API's response might change between the plan and the apply.

How does Terraform avoid that situation?

Unknown values are opaque to the program

The key difference between unknown values and promises is that unknown values are not part of the explicit programming model at all. Instead, they are hidden inside the language runtime and handled automatically as part of expression evaluation.

Let's consider a Terraform configuration roughly equivalent to the example from the previous section.

resource "aws_vpc" "example" {
  cidr_block = "10.1.0.0/16"
}

resource "aws_subnet" "example" {
  for_each = {
    "us-west-2a" = "10.1.1.0/24"
    "us-west-2b" = "10.1.2.0/24"
  }

  vpc_id            = aws_vpc.example.id
  cidr_block        = each.value
  availability_zone = each.key
}

There is nothing we could write in this configuration that would allow the result to vary based on whether aws_vpc.example.id is currently known or not. Instead, the Terraform language runtime automatically computes a suitable unknown value result for any operation on an unknown value.
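As a rough illustration of that propagation, here is a minimal JavaScript sketch. It is not how Terraform's runtime is actually implemented: the Unknown class, the lift helper, and the upper and join functions are all hypothetical names invented for this example.

```javascript
// Hypothetical placeholder for a value that is not known until apply.
class Unknown {
  constructor(type) {
    this.type = type; // type constraint, e.g. "string"
  }
}

// Wraps an ordinary function so that any unknown argument yields an
// unknown result of the declared result type, without calling fn at all.
function lift(resultType, fn) {
  return (...args) =>
    args.some((arg) => arg instanceof Unknown)
      ? new Unknown(resultType)
      : fn(...args);
}

// Two "language functions" as a configuration author would see them.
const upper = lift("string", (s) => s.toUpperCase());
const join = lift("string", (sep, a, b) => [a, b].join(sep));

// During planning the VPC id is unknown, and so is anything derived from it.
const planned = join("-", "subnet", upper(new Unknown("string")));

// During apply the same expression is evaluated with a real id.
const applied = join("-", "subnet", upper("vpc-0abc"));
```

The point of the sketch is that the unknown check lives in lift, on the runtime side. The functions the author calls never get a chance to branch on whether their arguments are known.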

Unknown values carry type constraint information along with them, so the language runtime can still (where context allows) detect type-related errors even though the values aren't known. Writing aws_vpc.example.id.foo will generate an error even if we don't know the id value yet, because the provider told us that the id attribute is a string, and it's never valid to access an attribute on a string.

But what we cannot do is replicate the second example from the previous section, which (intentionally or not) returned a different value for cidr_block once the VPC ID became known:

  # There is no "is_known" function like this in the Terraform language
  cidr_block = is_known(aws_vpc.example.id) ? "10.1.5.0/24" : each.value
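The type checking mentioned earlier in this section, where aws_vpc.example.id.foo is rejected even while the id is unknown, could be sketched along these lines. This is an illustration only: Unknown, typeOf, and getAttr are hypothetical names, and the "any" type is an assumption made to keep the example short.

```javascript
// Hypothetical placeholder carrying only a type constraint.
class Unknown {
  constructor(type) {
    this.type = type;
  }
}

// Returns the type name of a known or unknown value.
function typeOf(value) {
  return value instanceof Unknown ? value.type : typeof value;
}

// Attribute access is checked against the type, so it fails the same way
// whether or not the concrete value is known yet.
function getAttr(value, name) {
  if (typeOf(value) !== "object") {
    throw new TypeError(
      `can't access attribute "${name}" on a ${typeOf(value)}`
    );
  }
  // An attribute of an unknown object is itself unknown.
  return value instanceof Unknown ? new Unknown("any") : value[name];
}

// Accessing .foo on an unknown string is an error during planning.
let message = null;
try {
  getAttr(new Unknown("string"), "foo");
} catch (err) {
  message = err.message;
}
```

Note that the sketch exposes the type of a value to the check but never whether the value is known, which is the same restriction that rules out an is_known function.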

Dealing with side-effects

The use of unknown values is part of a broader effort within Terraform to make the result either deterministic or explicitly unknown during planning.

More generally, the Terraform language draws ideas primarily from functional programming rather than imperative programming, because that avoids dealing with side-effects.

There are some side-effects lurking in our first JavaScript-like example above: those createVpc and createSubnet functions presumably register some data into a hidden global data structure that ultimately represents the configuration. That means it might not even be possible to predict correctly which objects will exist in the final outcome, let alone what their values might be:

let vpc = createVpc({cidrBlock: '10.1.0.0/16'});
let subnets = vpc.then(vpc => {
  let subnets = {
    'us-west-2a': '10.1.1.0/24',
    'us-west-2b': '10.1.2.0/24',
  };
  if (vpc.id.includes('z')) {
    return Object.entries(subnets).map(([az, cidr]) => createSubnet({
      vpcId: vpc.id,
      cidrBlock: vpc.id.then ? cidr : '10.1.5.0/24',
      availabilityZone: az,
    }));
  } else {
    return [];
  }
});

Again this is a contrived example just to make the point succinctly, but notice that the above is using a conditional check against the final VPC ID value to decide whether to call createSubnet at all. Until we know what vpc.id concretely evaluates to, there's no way to know how many subnets should be in the final configuration.

Because the Terraform language draws from functional programming principles, Terraform can always re-evaluate the same expression once it's gathered more information and know that the result will always be strictly a more complete version of what it learned on the previous evaluation.

However, Terraform does still ultimately need to know at planning time how many subnets are being declared, and so that's why we have the pragmatic compromise of rejecting an unknown result in for_each:

Error: Invalid for_each argument

The "for_each" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.

In today's Terraform language, we treat the declaration of a resource block as a funny sort of "side-effect". This doesn't necessarily need to be true: Terraform could potentially just report that it plans to create some undetermined number of aws_subnet.example instances, but we intentionally made this an error because we concluded that a plan that can't even tell you how many objects will be created is not a particularly useful plan.
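A minimal sketch of that rule, assuming a hypothetical Unknown placeholder class and a hypothetical expandInstances function (neither is a real Terraform API), might look like this:

```javascript
// Hypothetical placeholder for a value that is not known until apply.
class Unknown {
  constructor(type) {
    this.type = type;
  }
}

// Expands a for_each-style map into instance keys during planning.
function expandInstances(forEach) {
  if (forEach instanceof Unknown) {
    // The number of instances can't be predicted, so refuse to plan.
    throw new Error("Invalid for_each argument");
  }
  // In this sketch only the keys must be known; the element values
  // may still be placeholders.
  return Object.keys(forEach).sort();
}

// Known keys with a partly unknown value: planning can proceed.
const keys = expandInstances({
  "us-west-2a": "10.1.1.0/24",
  "us-west-2b": new Unknown("string"),
});

// A wholly unknown map: planning is rejected.
let rejected = false;
try {
  expandInstances(new Unknown("map"));
} catch (err) {
  rejected = err.message === "Invalid for_each argument";
}
```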

With that said, you can see a different variant of this decision for nested dynamic blocks or for expressions involving unknown values: in that case, Terraform will allow the number of results to be unknown during planning. This was a tradeoff that favors flexibility at the expense of a less complete plan.

Either behavior can be argued to be the better one, but in any case both of them depend on the concept of unknown values, in order for Terraform to recognize the difference between a known number of objects vs. an unknown number of objects.

Unpredictable functions

The Terraform team is typically pretty pragmatic about the language, as long as the pragmatism doesn't create any situations where an expression would be likely to produce a different known value during plan than apply.

One key example of this is the handful of "unpredictable" functions, which produce different values on each call due to reliance on constantly-changing external state: uuid, timestamp, and bcrypt.

Terraform deals with these by just making them return unknown values of the appropriate type during planning. This makes their results behave similarly to exported attributes from resources not created yet: the final value will be determined only at apply time.
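Here is a small sketch of that idea for a timestamp-like function. The Unknown class, the makeTimestamp factory, and the injectable clock are all assumptions made for illustration, and the output format is simply whatever JavaScript's toISOString produces rather than Terraform's own format.

```javascript
// Hypothetical placeholder for a value that is not known until apply.
class Unknown {
  constructor(type) {
    this.type = type;
  }
}

// Builds a timestamp-like function bound to the current phase.
// The clock is injectable so the apply-time result can be demonstrated.
function makeTimestamp(phase, now = () => new Date()) {
  return () =>
    phase === "plan"
      ? new Unknown("string") // never commit to a value that would change
      : now().toISOString();
}

// While planning, the result is only a typed placeholder.
const duringPlan = makeTimestamp("plan")();

// While applying, the real clock (fixed here) supplies the value.
const duringApply = makeTimestamp("apply", () => new Date(0))();
```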

There is, unfortunately, a historical exception to this rule: the file function (and some other similar functions) reads the content of a file on disk and returns it as a string. If you change the content of that file after planning but before applying then you'll defeat Terraform's best efforts to remain deterministic between plan and apply, but Terraform will still catch it and fail with an error rather than silently applying the changed value:

╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for null_resource.example to include new values
│ learned so far during apply, provider "registry.terraform.io/hashicorp/null"
│ produced an invalid new value for .triggers["content"]: was
│ cty.StringVal("Hello world!\n"), but now cty.StringVal("Hello world!\nUh-oh!").
│ 
│ This is a bug in the provider, which should be reported in the provider's own
│ issue tracker.
╵

It's unfortunate that Terraform mistakenly blames the provider for this, but sadly this is caught by the same safety check that catches a provider failing to meet the expected contract and so Terraform has to make a guess at who to blame here. The important thing, though, is that Terraform detected it and stopped before sending this changed value to any remote API.
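The kind of safety check involved can be sketched as a comparison between the planned value and the final one, where an unknown placeholder accepts any value of its type but every known value must match exactly. This is only a conceptual illustration, with Unknown and conforms as hypothetical names, not the real check in Terraform Core.

```javascript
// Hypothetical placeholder for a value that is not known until apply.
class Unknown {
  constructor(type) {
    this.type = type;
  }
}

// Returns true if `final` is a valid completion of `planned`.
function conforms(planned, final) {
  if (planned instanceof Unknown) {
    // A placeholder can resolve to any value of the promised type.
    return typeof final === planned.type;
  }
  if (planned !== null && typeof planned === "object") {
    if (final === null || typeof final !== "object") return false;
    const keys = Object.keys(planned);
    return (
      keys.length === Object.keys(final).length &&
      keys.every((k) => k in final && conforms(planned[k], final[k]))
    );
  }
  // Anything known during planning must be unchanged at apply time.
  return planned === final;
}

// An unknown id resolving to a concrete string is fine.
const okResult = conforms(
  { vpc_id: new Unknown("string"), cidr_block: "10.1.1.0/24" },
  { vpc_id: "vpc-0abc", cidr_block: "10.1.1.0/24" }
);

// A file whose content changed between plan and apply is not.
const changedResult = conforms(
  { triggers: { content: "Hello world!\n" } },
  { triggers: { content: "Hello world!\nUh-oh!" } }
);
```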

The file function is, I think, a historical mistake. It dates back to very early Terraform releases before we had a strong conception of what promises the Terraform language ought to be allowing for, and by the time we realized, it was too late to treat it as a true unpredictable function because it would break many existing configurations. In practice it seems pretty rare for users to encounter this hypothetical situation though, and so I feel satisfied with the tradeoff of retaining it in the language.

Why doesn't Terraform use an existing language?

So returning to the original question: why doesn't Terraform use an existing language? So far, I've not yet encountered an existing language runtime which contains the building blocks Terraform uses in order to keep the plan-related promises.

I hope I've already made a convincing argument that Terraform's planning feature depends on having a language free of programmer-controlled side-effects. I expect some would argue that this promise is not actually so important, because if you just write your configuration code correctly then it'll work out, but I'm describing here the motivations that led to Terraform's current design, not intending to argue for or against them.

There are various languages which are a good fit for deterministically constructing configuration based on expressions, which like Terraform draw from functional and/or logic programming rather than imperative programming ideas. For example, Jsonnet is a great little language for concisely generating data structures within the JSON data model, and CUE allows for fitting provided expressions to a set of constraints to produce a validated data structure.

These languages seem like they could be the basis of the Terraform language based on their lack of side-effects, but as far as I know they don't allow embedders enough control over evaluation to create an equivalent of Terraform's unknown value resolution behavior. CUE does have a sense of a type constraint which is conceptually similar to an unknown value of a particular type, which makes it the most promising I've seen as a potential building-block for the Terraform language given a suitable integration point, but it's also relatively new and so didn't exist at the time we created the Terraform language.

With that said then, so far the best compromise seems to have been to use another language to generate Terraform language code. You can use any language you like to describe the Terraform language data structure, and then feed that result into Terraform to evaluate it. This approach does have an important caveat, though: it's not possible for the program generating the configuration to itself react to the results of creating new objects. Instead, it can only generate instructions telling Terraform how to propagate data from one object to another, which Terraform will then handle safely using unknown values as described earlier.
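As a small sketch of that approach, the VPC and subnet example from earlier could be generated from JavaScript as Terraform language JSON. The variable names here are mine, and the output would need to be saved to a file with a .tf.json suffix for Terraform to read it.

```javascript
// Data the generating program knows up front.
const subnets = {
  "us-west-2a": "10.1.1.0/24",
  "us-west-2b": "10.1.2.0/24",
};

// The configuration as a plain data structure in the JSON syntax.
const config = {
  resource: {
    aws_vpc: {
      example: { cidr_block: "10.1.0.0/16" },
    },
    aws_subnet: {
      example: {
        for_each: subnets,
        // A reference for Terraform to resolve later, not a value
        // that this program can read or branch on.
        vpc_id: "${aws_vpc.example.id}",
        cidr_block: "${each.value}",
        availability_zone: "${each.key}",
      },
    },
  },
};

// Contents for a file such as main.tf.json.
const output = JSON.stringify(config, null, 2);
console.log(output);
```

Notice that the generator only ever sees the string "${aws_vpc.example.id}". The real id never flows back into this program, which is exactly the caveat described above.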

I've heard of a number of folks doing this sort of pre-generation using Jsonnet as the authored language, using it to describe a data structure that Terraform would understand as the JSON variant of the Terraform language. I also personally, prior to joining the Terraform team while still a direct Terraform user, wrote several specialized tools for turning higher-level descriptions of infrastructure into Terraform language JSON, although the modern Terraform language now allows achieving most of what I did there within Terraform itself, without any pre-generation, because those were needs that fit well within the Terraform language design constraints.

After seeing the relative popularity of this practice of generating the Terraform language from another language, and building on the good work done by AWS for CloudFormation, HashiCorp is now developing Terraform-CDK, which is a framework tailored to using general-purpose imperative programming languages in order to generate Terraform language JSON. It's still pretty early days, but it does show that it's possible to offer a robust mechanism for writing configuration in general-purpose languages while still relying on the Terraform language runtime to keep the promises for planning.

In the meantime though, I hope the above illustrates that the existence of the Terraform language is founded in a practical design tradeoff and not just the whims of someone who thought it might be fun to write a programming language. Terraform's domain-specific language is tailored to Terraform's specific needs and it's not a simple matter to just swap it out for some other language runtime. The JSON variant of the language has long been intended as a mechanism for the sort of programmatic generation that Terraform-CDK now provides, so I'm excited to see how that grows alongside the Terraform language.

If you're aware of an existing embeddable language that offers mechanisms corresponding to the needs I describe above, then I'd love to hear about it! Such a language could in principle exist alongside the current Terraform language, as long as its evaluation model is compatible enough. For now though, the Terraform language runtime is an indivisible part of Terraform Core and isn't going anywhere.