CUE as a new language for OpenTofu

OpenTofu currently uses its own domain-specific language, built in terms of HashiCorp's HCL, as the means for module authors to describe the infrastructure that should exist in their target environments.

At various points over the years people have asked whether OpenTofu or its predecessor could directly support some other similarly-designed language as an alternative way to write a module. There are lots of existing attempts at this which work as preprocessors that then generate the main OpenTofu language, but these questions (and this article) are about having another language be directly supported by OpenTofu, so that there isn't any additional indirection where an author is using one language to generate another.

(Note that this is not about supporting general-purpose languages such as TypeScript or Python. That's also a common question with some interesting details, but if that's what you're looking for then I'd recommend you consider using Pulumi instead of OpenTofu, since that product already addresses that problem well.)

Because of these common requests -- and also, to be honest, because I just find the question interesting in itself! -- I've researched a number of different potential alternative languages in the past, and unfortunately have always found some drawback that makes it not a good fit for OpenTofu.

One of the earliest alternative languages I considered was CUE, but it was very young when I first looked at it, and so the language was not yet complete enough for practical use and its library API (for using CUE as an embedded language in other Go programs) was aimed at too high a level of abstraction for OpenTofu's needs.

CUE has come quite a long way in the meantime though, so I thought it would be interesting to revisit it and see if the situation has changed. This is also a pretty good time to consider alternative languages for OpenTofu, since we're getting started on a redesign of OpenTofu's internals that should, among other benefits, hopefully better isolate OpenTofu's semantic layer from its surface language(s).

A CUE Primer

The CUE website has a number of guides on the concepts behind its language and so I won't spend a lot of words on that here, but there are some top-level ideas that the rest of this article relies on, so I'll attempt a short summary.

For my purposes here, the most important part is the letter "U" in "CUE", which stands for "unify". I think this is by far the most significant idea that any CUE author needs to understand.

"Unification" is a fancy word for the idea of taking two definitions of the same item and producing a new value that somehow represents the meaning of both inputs at once. The notation for unification is the & operator, which has the values to be unified as its operands.

For example, the type constraint string can unify with the concrete value "hello" to produce the concrete value "hello", because that value is a string: string & "hello" produces "hello".

A more complicated example is unifying two "struct" values (CUE's closest equivalent to OpenTofu's object types) by combining their fields together: { a: "foo" } and { b: "bar" } can unify to { a: "foo", b: "bar" }.

Conversely, a failure to unify is the primary way that CUE can be used for validation-related use-cases: bool and "hello" cannot unify at all, whereas successfully unifying a concrete value with a value that describes a schema affirms that the concrete value conforms to the schema. CUE represents such failures as a special value called "bottom", spelled _|_, and so the error gets recorded as part of the data structure in the place where it occurred rather than being returned through a side-channel as in many other languages.
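A minimal sketch of those three cases in one CUE file, with arbitrary field names chosen only for illustration:

```cue
// Scalar: a type constraint unified with a conforming concrete value.
scalar: string & "hello" // "hello"

// Structs: the fields of both operands are combined.
merged: {a: "foo"} & {b: "bar"} // {a: "foo", b: "bar"}

// Conflict: no value is both a bool and the string "hello",
// so this field evaluates to bottom (_|_) and is reported as an error.
conflict: bool & "hello"
```

The error stays attached to the conflict field, while the other two fields still evaluate normally.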

OpenTofu actually has some similar theoretical foundations itself, though they are considered to be implementation details rather than part of the programming model, and are implemented directly as logic in Go rather than abstractly in a domain-specific language as in CUE.

For example, a subset of the schema for an older, simpler version of aws_vpc in the hashicorp/aws provider could be written in CUE syntax something like this:

{
  // Arguments that the configuration author is allowed to set
  cidr_block:                       string
  instance_tenancy:                 string | *null
  enable_dns_support:               bool | *true
  enable_dns_hostnames:             bool | *false
  enable_classiclink:               bool | *false
  enable_classiclink_dns_support:   bool | *false
  assign_generated_ipv6_cidr_block: bool | *false
  tags:                             { [string]: string } | *{}

  // Arguments whose values are decided by the provider
  arn:                 string
  id:                  string
  main_route_table_id: string
  ipv6_association_id: string | null
  ipv6_cidr_block:     string | null
}

That (non-concrete) value can be unified with a concrete aws_vpc object from OpenTofu state to determine whether it conforms to the schema, and also to substitute default values (using the syntax like *true above) where the given struct value does not include those fields.

For example, given this very minimal input...

{
  cidr_block: "192.168.0.0/16"
}

...unifying that with the schema above would produce:

{
  cidr_block:                       "192.168.0.0/16"
  instance_tenancy:                 null
  enable_dns_support:               true
  enable_dns_hostnames:             false
  enable_classiclink:               false
  enable_classiclink_dns_support:   false
  assign_generated_ipv6_cidr_block: false
  tags:                             {}
  arn:                              string
  id:                               string
  main_route_table_id:              string
  ipv6_association_id:              string | null
  ipv6_cidr_block:                  string | null
}

Notice that some of the fields still contain non-concrete constraints like string instead of concrete values like "hello". This has a similar meaning to the idea of "unknown values" in the OpenTofu language: it's a placeholder for a value that we don't fully know yet, describing whatever we do know about it. The unification operation I described above is quite similar to the work OpenTofu does to produce the "proposed new value" to send to a provider for planning purposes.
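The same refinement can be seen in miniature. The sketch below uses a made-up two-field schema rather than the real provider schema. Declaring the vpc field a second time unifies the two declarations, so a default can be overridden and a placeholder such as string can later be narrowed to a concrete value:

```cue
// First declaration: a default and a not-yet-known value.
vpc: {
    enable_dns_support: bool | *true
    id:                 string
}

// Second declaration of the same field: CUE unifies it with the first.
vpc: {
    enable_dns_support: false          // explicit value wins over the default
    id:                 "vpc-a1b2c3d4" // illustrative ID
}

// Expected result:
// vpc: {
//     enable_dns_support: false
//     id:                 "vpc-a1b2c3d4"
// }
```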

The other very important idea about CUE for the sake of this example is that, unlike in the OpenTofu language, the values described in the input are also the values you can refer to from other expressions. You can refer to anything you've defined elsewhere.

Here's a simple example demonstrating that:

greeting: "Hello"
name:     "Martin"
message:  "\(greeting), \(name)!"

message can refer directly to the greeting and name fields defined at the same level. In the OpenTofu language the symbols you can refer to are only indirectly related to the input configuration, mediated using rules decided by OpenTofu itself, so something like the above would typically involve defining some Local Values -- an OpenTofu-level concept rather than an HCL-level concept -- with references between them.

What it means to use CUE for OpenTofu

There are lots of different ways to think about using a different source language for defining an OpenTofu module. I already mentioned above that preprocessors are not what I'm talking about here, but there's also another possibility I want to rule out before we start:

In principle one could try to map HCL's syntax-agnostic information model onto CUE's concepts, so that applications like OpenTofu would continue thinking in terms of HCL's concepts even though the source syntax is CUE, similar to how HCL already defines a mapping to JSON's concepts.

However, HCL's abstraction here was designed for languages that either have a very similar evaluation model to HCL's own native syntax (of which there are currently no interesting examples) or languages that have no expression evaluation concept of their own at all and so HCL can impose its own, like with JSON where HCL says that JSON strings correspond to HCL string templates.

CUE is not a viable target for this abstraction because it has its own separate evaluation model that is not compatible with HCL's. The best we could do is effectively the same as preprocessing: evaluate the CUE program to obtain a concrete data structure, and then use that data structure as the input to HCL. Most notably, that would mean that CUE expressions could not refer to any dynamic data from HCL's scope, because CUE evaluation would happen before HCL's expression evaluation.

Therefore I'm only really interested in a model where OpenTofu interacts directly with CUE in a similar way to how it interacts with HCL today, where the CUE program is the definition of the module and OpenTofu expects the CUE program to produce a final value to send to the provider, rather than producing another level of expressions to evaluate through HCL.

Making CUE meet OpenTofu's expectations

Although the OpenTofu language and CUE have some similar theoretical foundations, the fact that CUE treats the input program as the data that's available in references, rather than having a separate symbol table as HCL does, means that OpenTofu would need to approach CUE evaluation a little differently than HCL evaluation.

Specifically, OpenTofu must introduce additional data into the CUE program by actually modifying that program to include the additional data.

For the sake of my experiment here, I decided to slightly extend CUE using its "attribute" syntax, which was intended for this very purpose of describing how data in CUE relates to data in some other language/format/system. For example, let's consider this relatively-simple but realistic description of some AWS EC2 network resources:

_base_cidr_block: string @input(base_cidr_block)
_subnets: {
    [string]: close({
        number:   int
        tag_name: string
    })
} @input(subnets)
_tags: { [string]: string } @input(tags)

vpc: {
    cidr_block: _base_cidr_block
    tags:       _tags
} @resource(aws_vpc.main)

subnets: {
    for n, s in _subnets {
        (n): {
            cidr_block: cidrsubnet(vpc.cidr_block, 4, s.number)
            vpc_id:     vpc.id
            tags: {
                _tags
                Name: s.tag_name
            }
        }
    }
} @resource(aws_subnet.main[*])

vpc_id: vpc.id @output(vpc_id)

subnet_ids: {
    for k, s in subnets {
        (k): s.id
    }
} @output(subnet_ids)

Notice that some of the fields are annotated with attributes starting with @input, @resource, and @output. I intend for these to have a similar meaning as variable, resource/data/ephemeral, and output blocks respectively in the current OpenTofu language, but here I'm leaning into CUE's "data-first" design instead of HCL's "structure-first" design.

These extra attributes allow OpenTofu to identify parts of the data structure that it should interact with, while everything else is just arbitrary data that's private to the CUE program.

Instead of passing in a symbol table with symbols like var.subnets, OpenTofu would handle input variables by behaving as if it was modifying this program with additional "unify" operations on every field that's annotated with an @input attribute, like this:

_base_cidr_block: string & "192.168.0.0/16" @input(base_cidr_block)
_subnets: {
    [string]: close({
        number:   int
        tag_name: string
    })
} & {
    foo: {
        number:   1
        tag_name: "Foo"
    }
    bar: {
        number:   2
        tag_name: "Bar"
    }
} @input(subnets)
_tags: { [string]: string } & {
    Environment: "PROD"
} @input(tags)

After the unification operator has been applied, _base_cidr_block refers to the concrete value "192.168.0.0/16", and so on. Therefore CUE can propagate that value into the definition of vpc, producing the same value as this program:

vpc: {
    cidr_block: "192.168.0.0/16"
    tags: {
        Environment: "PROD"
    }
} @resource(aws_vpc.main)

The other helpful modification OpenTofu can make is to find each field annotated with a @resource and modify its expression to be unified with a value derived from the provider schema. I'm going to use an even more simplified subset of the real provider schema here just to keep these examples relatively terse:

vpc: {
    cidr_block: _base_cidr_block
    tags:       _tags
} & close({
    id: string
    cidr_block: string
    tags: { [string]: string } | *{}
}) @resource(aws_vpc.main)

subnets: {
    for n, s in _subnets {
        (n): {
            cidr_block: cidrsubnet(vpc.cidr_block, 4, s.number)
            vpc_id:     vpc.id
            tags: {
                _tags
                Name: s.tag_name
            }
        }
    }
} & {
    [string]: close({
        id: string
        vpc_id: string
        cidr_block: string
        tags: { [string]: string } | null
    })
} @resource(aws_subnet.main[*])

One somewhat-arbitrary decision I made here is that using @resource with an address that ends in [*] has a similar effect to for_each in the current OpenTofu language, and so whatever it's annotating is expected to be a map from instance keys to configuration objects rather than just a single configuration object. Therefore the @resource(aws_vpc.main) field gets annotated with just the aws_vpc schema directly, while @resource(aws_subnet.main[*]) gets annotated with a slightly more elaborate schema that represents the map-of-objects structure.

(I'm writing out the modified CUE programs as source code here just for exposition purposes, but in my prototype implementation of this they really exist only in memory as a modified abstract syntax tree, so the end-user never needs to see these somewhat-ugly expressions.)

Asking the CUE runtime to evaluate this modified program, annotated with both input values and provider schemas, produces the following value:

{
    // (the underscore-prefixed names are removed in CUE's default
    // value presentation because by convention they are unexported
    // fields.)
    vpc: {
        id:         string
        cidr_block: "192.168.0.0/16"
        tags: {
            Environment: "PROD"
        }
    }
    subnets: {
        bar: {
            id:         string
            cidr_block: "192.168.32.0/20"
            vpc_id:     string
            tags: {
                Environment: "PROD"
                Name:        "Bar"
            }
        }
        foo: {
            id:         string
            cidr_block: "192.168.16.0/20"
            vpc_id:     string
            tags: {
                Environment: "PROD"
                Name:        "Foo"
            }
        }
    }
    vpc_id: string
    subnet_ids: {
        bar: string
        foo: string
    }
}

As a final postprocessing step we can use the OpenTofu-specific attributes again to extract the sub-trees of this data structure that are relevant to OpenTofu's goals:

// (this notation is just some debug output from my prototype, showing
// the evaluated value associated with each declaration.)
@resource(aws_vpc.main) is {
    id:         string
    cidr_block: "192.168.0.0/16"
    tags: {
        Environment: "PROD"
    }
}
@resource(aws_subnet.main) is {
    bar: {
        id:         string
        cidr_block: "192.168.32.0/20"
        vpc_id:     string
        tags: {
            Environment: "PROD"
            Name:        "Bar"
        }
    }
    foo: {
        id:         string
        cidr_block: "192.168.16.0/20"
        vpc_id:     string
        tags: {
            Environment: "PROD"
            Name:        "Foo"
        }
    }
}
@output(vpc_id) is string
@output(subnet_ids) is {
    bar: string
    foo: string
}

What we have here is essentially the same as what the current OpenTofu language produces during the validation phase: resource instance objects that conform to the resource type schemas, with unknown values as placeholders for values that won't be known until either the plan phase or apply phase, and similar placeholder values for outputs, derived from those resource values.

A real implementation could therefore send each of these three resource instance objects (aws_vpc.main, aws_subnet.main["bar"], and aws_subnet.main["foo"]) to the provider's ValidateManagedResourceConfig function to make sure that the value also respects any additional validation rules that can't be expressed in CUE, similar to how providers normally catch validation problems that can't be caught by HCL/OpenTofu alone.
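Some mistakes would already be caught by unification with the closed schema, before any provider is involved. The following sketch reuses the simplified aws_vpc schema from above under the hypothetical name _schema; the fields ok, typo, and wrong_type are likewise invented for illustration:

```cue
_schema: close({
    id:         string
    cidr_block: string
    tags:       {[string]: string} | *{}
})

// Conforms: every field is known to the schema and has the right type.
ok: _schema & {
    cidr_block: "192.168.0.0/16"
}

// Rejected: "cidr_blok" is not a field of the closed schema.
typo: _schema & {
    cidr_blok: "192.168.0.0/16"
}

// Rejected: a number cannot unify with the string constraint.
wrong_type: _schema & {
    cidr_block: 42
}
```

Rules that depend on the remote system, or on relationships the schema cannot describe, would still need the provider's own validation.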

So far so good! But next we need to deal with an additional problem: side-effects and the dependencies between them.

Gradual Evaluation with Dependencies

During the planning phase, the OpenTofu language runtime visits and evaluates each resource instance configuration in a dependency-respecting order, typically inferring dependencies automatically based on the references between resource instances.

For each resource instance, it calls the provider's PlanManagedResourceChange operation to allow the provider to run some arbitrary logic to decide how to merge the prior state with the desired state implied by the configuration, producing the planned new value. When one resource instance refers to another, it's the planned new value that actually gets populated into HCL's symbol table, so that downstream resource instances can incorporate values that the provider added to the upstream resource instance's object.

The apply phase is essentially the same except that it also uses ApplyManagedResourceChange to cause changes to the real infrastructure, and then that function's value propagates downstream to other resource instances instead.

Because CUE does not have a separate symbol table from the source program, again we need a slightly different strategy for CUE. This step is the main place where things fell apart in my first experiment with incorporating CUE, but this is also an area where some things have changed in our favor in the meantime.

In particular, CUE actually has its own subsystem that gradually performs side-effects in a dependency-respecting fashion and propagates data between them, called flow. Unfortunately, the current form of this relies on being embedded in the CUE codebase, and so an external caller like OpenTofu cannot follow this strategy today. Someone from the CUE team has indicated that they intend to expose the underlying building-block eventually though, so the remainder of this is a hypothetical design that I've not been able to verify using a prototype, but the approach in the "flow" package is similar enough that I'm optimistic that it should work.

Recalling how I previously incorporated input values and resource type schemas into the program, you might already have guessed what comes next: OpenTofu must continue gradually modifying the program with additional "unify" operations, one resource instance at a time until they've all been evaluated. Each modification adds information needed to evaluate downstream resource instances.

For example, after applying the changes for aws_vpc.main, OpenTofu could modify the expression for the vpc field to include another unify operation, this time with the final object returned by the provider:

vpc: {
    cidr_block: _base_cidr_block
    tags:       _tags
} & close({
    id: string
    cidr_block: string
    tags: { [string]: string } | *{}
}) & {
    id:         "vpc-a1b2c3d4"
    cidr_block: "192.168.0.0/16"
    tags: {
        Environment: "PROD"
    }
} @resource(aws_vpc.main)

If OpenTofu then asked CUE to evaluate this modified program, it would incorporate the VPC ID that was now returned by the provider and propagate it downstream, allowing the resources and output values to update to the following:

@resource(aws_vpc.main) is {
    id:         "vpc-a1b2c3d4"
    cidr_block: "192.168.0.0/16"
    tags: {
        Environment: "PROD"
    }
}
@resource(aws_subnet.main) is {
    bar: {
        id:         string
        cidr_block: "192.168.32.0/20"
        vpc_id:     "vpc-a1b2c3d4"
        tags: {
            Environment: "PROD"
            Name:        "Bar"
        }
    }
    foo: {
        id:         string
        cidr_block: "192.168.16.0/20"
        vpc_id:     "vpc-a1b2c3d4"
        tags: {
            Environment: "PROD"
            Name:        "Foo"
        }
    }
}
@output(vpc_id) is "vpc-a1b2c3d4"
@output(subnet_ids) is {
    bar: string
    foo: string
}

After this, OpenTofu has enough information to create and apply the final plan for both aws_subnet.main["bar"] and aws_subnet.main["foo"], and can likewise modify their expressions with an additional unification operation:

subnets: {
    for n, s in _subnets {
        (n): {
            cidr_block: cidrsubnet(vpc.cidr_block, 4, s.number)
            vpc_id:     vpc.id
            tags: {
                _tags
                Name: s.tag_name
            }
        }
    }
} & {
    [string]: close({
        id: string
        vpc_id: string
        cidr_block: string
        tags: { [string]: string } | null
    })
} & {
    // Note that this time we have two separate objects to return,
    // because this is a multi-instance resource.
    bar: {
        id:         "subnet-abc123"
        cidr_block: "192.168.32.0/20"
        vpc_id:     "vpc-a1b2c3d4"
        tags: {
            Environment: "PROD"
            Name:        "Bar"
        }
    }
    foo: {
        id:         "subnet-def789"
        cidr_block: "192.168.16.0/20"
        vpc_id:     "vpc-a1b2c3d4"
        tags: {
            Environment: "PROD"
            Name:        "Foo"
        }
    }
} @resource(aws_subnet.main[*])

...and then that's all of the resource instances dealt with and one final evaluation should leave us with concrete state for each resource instance and so concrete output values derived from those:

@resource(aws_vpc.main) is {
    id:         "vpc-a1b2c3d4"
    cidr_block: "192.168.0.0/16"
    tags: {
        Environment: "PROD"
    }
}
@resource(aws_subnet.main) is {
    bar: {
        id:         "subnet-abc123"
        cidr_block: "192.168.32.0/20"
        vpc_id:     "vpc-a1b2c3d4"
        tags: {
            Environment: "PROD"
            Name:        "Bar"
        }
    }
    foo: {
        id:         "subnet-def789"
        cidr_block: "192.168.16.0/20"
        vpc_id:     "vpc-a1b2c3d4"
        tags: {
            Environment: "PROD"
            Name:        "Foo"
        }
    }
}
@output(vpc_id) is "vpc-a1b2c3d4"
@output(subnet_ids) is {
    bar: "subnet-abc123"
    foo: "subnet-def789"
}

The execution engine can then return these output values to the caller, and its work is complete!

Comparisons with the current OpenTofu Language implementation

Aside from the fact that not all of the needed functionality is currently exposed to outside callers, it now seems like CUE has sufficient features to be used as part of an evaluation and execution model like OpenTofu's.

The main "trick" to this is that whereas OpenTofu's runtime maintains several separate data structures -- a table of input variable values, a table of provider schemas, the "state" containing the results for each resource instance -- for CUE we effectively store all of that data inside the CUE program itself, by modifying it in-place and repeatedly re-evaluating it.

With that comes a performance concern, though: HCL is intentionally designed to allow its calling application to split the input program into small parts that can be evaluated in isolation, whereas for CUE the programming model effectively requires re-evaluating the entire program over and over as new information is added to it. I don't have the practical experience to gauge how much this actually hurts, but incremental evaluation is already recorded as a concern for CUE's own "flow" engine, which follows a very similar implementation strategy to what I sketched for OpenTofu above.

On the other hand, re-evaluating the entire program every time (or, ideally in future, re-evaluating a subset that's affected by a change) gives authors a lot more flexibility in how they can structure things, vs. OpenTofu's highly prescriptive structure. In most of my sketching here I used a relatively flat structure that resembles a current typical OpenTofu module with all of the declaration attributes on top-level fields, but in principle those declaration attributes could appear at arbitrary points in the program, and be nested inside one another:

vpc: {
    cidr_block: _base_cidr_block
    tags:       _tags

    // NOTE: Annotating the `id` field directly makes it the definition
    // of the `vpc_id` output as well, instead of declaring a separate
    // field for the output somewhere else.
    id: string @output(vpc_id)
} @resource(aws_vpc.main)

When I tried this in practice I had some troubles with the attributes not being propagated consistently during unification, and so in certain shapes of configuration the @output(vpc_id) attribute got silently dropped when unifying with the final value for the resource instance. I'm not sure if that's a bug in the CUE evaluator or if I just don't understand well enough the rules for attributes under unification.

What's missing?

So far I've mainly focused on the commonalities between the current OpenTofu language and this hypothetical CUE-based alternative, but there are still a few remaining details that CUE doesn't seem to have an answer to yet:

OpenTofu uses a concept called "marks" from the cty type system that HCL uses to represent various details about the provenance of a value, such as whether it was derived from a sensitive value or from an ephemeral value.
As far as I can tell, CUE doesn't have any mechanism quite like that. I don't know how the concepts of sensitivity and ephemerality would be implemented for modules written in CUE.
OpenTofu is also beginning to make use of another cty concept called "capsule types" in the new evaluator prototype, as a way to pass references to OpenTofu-specific objects like providers through expressions as a special kind of value.
As far as I can tell, CUE's type system is closed and so doesn't offer any similar way for the calling application to pass opaque references to its own objects through CUE evaluation. I can certainly understand why that would be omitted though, since it would need to be possible to define where each of these new types appears in the overall type lattice and define custom unification rules for them. The equivalent mechanisms in cty are quite complex and awkward to use.
OpenTofu has a set of built-in functions that modules can rely on. Many of these are general-purpose enough that CUE already has its own equivalents, such as JSON parsing/encoding, but OpenTofu also has a few that are a little more specific to OpenTofu's domain or that directly expose information from OpenTofu's language runtime, so we'd probably want some way to introduce custom functions written in Go.
There does not appear to currently be any way to do that. All of the Go-implemented functions available in CUE live inside the CUE codebase and are implemented in terms of unexported APIs, so OpenTofu cannot currently define its own functions.
(Observant readers might've noticed that my earlier examples used a cidrsubnet function that doesn't actually exist in CUE. I cheated with that: the real program I was using for prototyping uses "192.168.\(s.number*16).0/20" instead as a placeholder, just like folks used to do in Terraform before it had CIDR calculation functions!)
Finally, as noted earlier, the facilities for detecting dependencies between values are currently not exposed in the public API, though I expect that will change eventually. Dependency detection is crucial for OpenTofu's behavior because it needs to propagate the results of side-effects, so this is a show-stopper for now.
The "flow" engine also works at the ADT level rather than the AST level as I used in my prototype. The AST API serves as a good enough substitute for experimentation, but it would be a lot less clunky to work at the semantic layer, so hopefully more of that functionality will be exposed through the public cue.Value abstraction eventually.
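For reference, the string-interpolation placeholder mentioned above can be written as a small self-contained program. The subnet numbers and names are the ones used in the earlier examples; the reduced shape of the subnets field is an assumption made to keep the sketch short. This shortcut only gives valid results for this particular layout, a /16 split into /20 blocks with numbers 0 through 15:

```cue
_subnets: {
    foo: {number: 1, tag_name: "Foo"}
    bar: {number: 2, tag_name: "Bar"}
}

subnets: {
    for n, s in _subnets {
        // Stand-in for cidrsubnet("192.168.0.0/16", 4, s.number)
        (n): cidr_block: "192.168.\(s.number*16).0/20"
    }
}

// Expected result:
// subnets: {
//     foo: cidr_block: "192.168.16.0/20"
//     bar: cidr_block: "192.168.32.0/20"
// }
```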

Overall though, I think the path to this hypothetical future is considerably clearer than when I first investigated this several years ago.

What's next?

I'm not intending to pursue this any further at least until the dependency API and a more complete representation of the ADT are exposed in CUE's public API, since the dependency detection is crucial and while AST-based processing can work in simple cases it's unlikely to be robust in more complicated programs, such as those where the predefined symbols like string are shadowed by local declarations.
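As a contrived illustration of that shadowing problem, with field names invented for this sketch, a field named string changes what a later reference to string means. A tool that only inspects identifier names in the AST could easily misread it:

```cue
// A regular field whose name matches a predeclared identifier.
string: int

// Within this scope the reference below resolves to the field above,
// not to the predeclared string type, so port must be an int.
port: string & 8080
```

Resolving references correctly in cases like this is the kind of work that is better left to CUE's own semantic layer than reimplemented on top of the syntax tree.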

The proposed new internal architecture for OpenTofu will hopefully make it more plausible to support different source languages in future, and possibly even allow mixing them in the same program, but I expect we won't prioritize that for now since we've got plenty of work to do just to get HCL-based modules working to the same extent as they work in today's runtime.

With that said then: this article is mainly just some notes for my own future reference so I can hopefully pick up where I left off as the CUE team continues to expose more functionality in the public API. We'll see how it goes!