Terraform's primary function is to compare the current state of remote objects against a desired state written down in configuration and then, in case of any detected differences, plan one or more actions to change the remote system to match the desired state.

Terraform maintains its own "state" artifact because it serves as the binding between resource instance addresses within Terraform and whatever identification scheme the remote system uses. But since the intent is to make decisions based on the real remote system state, rather than Terraform's own cache of it, Terraform typically uses the Terraform state that resulted from the previous apply only as input to the step of "refreshing" to make the in-memory state match the remote system.

In an ideal world, once a team is using Terraform they will change Terraform-managed objects only using Terraform itself, and so that refreshing step serves only to read back the very same information Terraform already knew. In practice though, there are various reasons why Terraform-managed objects might change outside of Terraform:

  • The remote system architecture isn't a perfect fit for the system you're building on top of it, and so there are objects that are partially managed by Terraform but also partially managed by some other system.

    This is a common use-case for Terraform's ignore_changes workaround, which lets you tell Terraform to consider the remote system state as always correct for a particular argument, and so not to propose any updates when the configuration doesn't match.

  • The provider for a particular resource type exposes some attributes that change dynamically at runtime, independently of Terraform changes.

    These are often not super useful to expose, but some providers expose them either out of a sense of completeness or because the resource type schemas are programmatically generated from remote API schemas and are thus automatically exhaustive of all possible attributes, even if they are not directly useful in Terraform.

  • In an exceptional situation or due to a mistake, you've modified the remote system directly using its admin console or similar tool, and so now Terraform needs to react to those updates. This scenario is commonly known as "drift", and is typically best kept to a minimum in order to reduce confusion within a team.
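The ignore_changes workaround mentioned in the first point above might look something like this, reusing the aws_instance example that appears later in this article (other arguments elided):

resource "aws_instance" "example" {
  tags = {
    Name = "cms-app-server"
  }

  lifecycle {
    # Treat whatever value the remote system currently has for "tags"
    # as correct, and never plan an update to change it.
    ignore_changes = [tags]
  }
}

With that lifecycle block in place, Terraform still refreshes the tags value but no longer proposes an update when the configuration doesn't match.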

Terraform has always dealt with the situations above by updating its in-memory record of object state before making any plans, so the plan will be derived from the updated object rather than the last known result. However, Terraform typically didn't explicitly mention that it was doing that aside from a nondescript note that it was "Refreshing", and so situations like the above would often lead to Terraform proposing an action that doesn't have a corresponding change to the given configuration, and doesn't have any other apparent explanation in Terraform's output.

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_instance.example will be updated in-place
  ~ resource "aws_instance" "example" {
      tags = {
        ~ Name = "blorp" -> "cms-app-server"
      }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Terraform does know what the previous result and the refresh result contain, though. Historically it just discarded the previous result immediately after refreshing, and thus threw away the explanation of the underlying cause of an unprompted action like this.

We've been gradually working towards having Terraform retain both the previous result and the refresh result, and so from Terraform v0.15.4 onwards Terraform will report any changes it detected before proposing the actions required to undo them, adding the missing information needed to explain why Terraform is proposing those changes.

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the
last "terraform apply":

  # aws_instance.example has been changed
  ~ resource "aws_instance" "example" {
      tags = {
        ~ Name = "cms-app-server" -> "blorp"
      }
    }

────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_instance.example will be updated in-place
  ~ resource "aws_instance" "example" {
      tags = {
        ~ Name = "blorp" -> "cms-app-server"
      }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Terraform can't know why that object has changed outside of Terraform, but it can at least tell you that it did, and then leave you to decide what (if anything) to do about it. Perhaps the change outside of Terraform was intentional, and so it's better to update the configuration to match it; or maybe the change was made in error, and so you'd want to apply the plan to undo it and restore the original value.
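For example, if the renaming of the Name tag to "blorp" shown above turned out to be intentional, accepting it would just be a matter of updating the configuration to match, after which the next plan would propose no action:

resource "aws_instance" "example" {
  tags = {
    # Updated to match the change made outside of Terraform.
    Name = "blorp"
  }
}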

Tracking Multiple States

In Terraform's original design there was only one "state" artifact, which at any given moment held the most up-to-date representation of either the current state or the desired state, depending on the phase.

During the refresh phase Terraform would replace the objects saved after the last run with the objects resulting from the refresh operation, and then during the planning phase Terraform would replace those refresh results with a best approximation of the desired state represented in the configuration, using placeholders for any values that wouldn't be known until after apply. If you then applied the plan, Terraform would update the state one more time to represent the results of each action in the plan, and then serialize the resulting state to the configured state storage to use as one of the inputs to the next run.

In a recent major release we addressed some long-standing ordering problems between refreshing and planning by merging the two phases together. Modern Terraform now performs both operations in a single graph walk: for each resource instance visited, it first refreshes the state to match the remote object and then immediately creates a plan for that resource instance based on the result.

This change meant that for the first time Terraform Core needed to track two different "state" artifacts at once: both the result of refreshing, which is what the plan is based on, and the approximation of the result of the proposed actions which must be taken into account when planning other downstream objects.

Terraform v0.15.0, then, has both the "refreshed state" and the "proposed state", and updates them both with each resource instance visited. The planned actions are therefore the difference between the refreshed state and the proposed state.

However, detecting changes outside of Terraform requires tracking a third state artifact: the state as it existed before refreshing, which we'll call the "previous run state". This one is interesting in that Terraform largely treats it as read-only, but it does still need to make adjustments to it in order to upgrade stored objects to the latest resource type schema, or else Terraform would need to present the automatic upgrade changes as part of the detected changes, and that would be both confusing and also technically difficult given that Terraform only has the latest schema information for each resource type.

Terraform's planning phase therefore now produces the proposed new state, the refreshed state, and the previous run state, with the latter two being the data source for the "Objects have changed outside of Terraform" message.

This first pass of the feature meets the basic need of reporting changes that Terraform has always seen and reacted to but previously didn't mention. However, an important consequence of implementing it so late in Terraform's life is that there are inevitably decisions elsewhere that assume Terraform will keep quiet about these changes, and so for the moment this mechanism is a little noisier than we'd like. I'll discuss some of these in the remaining sections, along with some early ideas of what we might do about them.

Drift vs. Normalization

One of the key responsibilities of a Terraform provider is to understand the normalization rules of a particular remote system and then decide, for any given change, whether the new value is materially different from the old value.

A common example of this is case insensitivity: Terraform by default considers two strings as equal only if they contain the same sequence of Unicode characters, and so without a provider's help it'll assume that a change of case represents a material change that must be handled somehow. But we assume that a provider has knowledge of the design of a particular API and so it can detect case-only changes to a case-insensitive argument and let Terraform know that the change is immaterial, and so to avoid planning an update to "fix" it.

We typically informally describe an immaterial change as "normalization", and a material change as "drift". Not all material changes are necessarily drift, but that's not important for what we're discussing in this section.

Unfortunately, due to a historical design quirk of the current Terraform SDK, the provider's rules about which changes are merely normalization are applied only during planning, and not during refreshing. Therefore the first time you make a plan after creating an object, you might see Terraform do the following:

  • Detect that your mixed-case argument value has now become all lowercase, and report that as a change outside of Terraform.

  • But then, because the provider signals that case changes are immaterial during planning, Terraform will just keep the all-lowercase value as the new desired state and thus not plan to "fix" it.
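As a concrete illustration, consider a hypothetical provider for a case-insensitive DNS service (the resource type name here is invented for the sake of example):

resource "example_dns_record" "www" {
  # The remote API stores record names in lowercase, so after the
  # first apply the remote system will report "www.example.com".
  name = "WWW.Example.COM"
}

On the next plan, a provider built with the current SDK would report the lowercased name as a change made outside of Terraform, but would then plan no update to "fix" it, because its planning logic knows the two spellings are equivalent.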

Hopefully a future version of the SDK will correct this and apply the same normalization rules in all cases (both in refresh and immediately after apply), so that the provider can stick with whatever form you wrote in your configuration as long as what it finds in the remote system is just a normalized version of what you wrote. If so, providers built with that newer SDK will keep quiet about normalization detected during refresh, removing one element of noise from the current output.

Unused Attributes

As I mentioned in my introduction above, some resource types export attributes that Terraform can't directly control but that are still tracked by the remote system in response to activities elsewhere. For example, in an autoscaling system we typically use Terraform to configure the desired range of instance counts, but the actual number of instances is decided by the remote system itself, and so that information is irrelevant to Terraform even though the remote API might include it in service of other use-cases.
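In Terraform configuration terms, that autoscaling example might look something like the following; the resource type and attribute names are invented for illustration:

resource "example_autoscaling_group" "app" {
  # Arguments Terraform manages: the allowed range of instance counts.
  min_size = 2
  max_size = 10

  # The provider might also export a read-only attribute such as
  # "current_size", which the remote system updates on its own as
  # it scales, independently of any Terraform change.
}

Each time the remote system scales, the next plan would then report the new current_size as a change made outside of Terraform, even though nothing in the module refers to it.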

Because Terraform has historically just quietly written these arguably-useless values into its state and moved on, some providers quite reasonably included them as exported attributes "just in case", either manually out of a sense of completeness or automatically as a result of mechanically deriving the Terraform schema from the remote system's schema.

Terraform itself doesn't know the meaning of any of this data, so it will currently just dutifully report those changes as made outside of Terraform, but as humans we know these are not interesting from a Terraform standpoint and thus their inclusion is just noise.

There is an argument here that anything you can hypothetically make use of in an expression somewhere else in your module is fair game to have changes reported against it, because those changes might have knock-on effects elsewhere in the module.

However, if your module includes no expressions which refer to that attribute then there can be no downstream effect. Ideally then, Terraform would recognize when a particular attribute is neither set in configuration nor used elsewhere in the module and exclude it from the change report.

It turns out that this is easier said than done, because there are lots of ways in the Terraform language to indirectly use an attribute in a way that Terraform can't prove via static analysis alone. Here's one relatively-simple example:

output "example" {
  value = aws_instance.example
}

In this case the module has exported the entire aws_instance.example object, and so it would take some pretty complex analysis to follow that reference into the parent module and understand that module.foo.example.private_ip is a reference to only the private_ip attribute of that instance. Not impossible by any means, but far more inference than the Terraform language has ever needed before.
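By contrast, if the module exported only the specific attribute its caller needs, the reference would be trivially visible to static analysis:

output "example_private_ip" {
  value = aws_instance.example.private_ip
}

With only references like this one, Terraform could in principle see that no expression anywhere refers to, say, the instance's tags, and so filter changes to them out of the report.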

Hopefully we can find a good compromise here to detect unused attributes with good enough accuracy to be useful for filtering out most noise. Perhaps also this feature will inspire developers working on new resource types to consider excluding attributes that change independently of the desired state, since any actual use of those values in a Terraform module would typically cause the module to become non-convergent, which is usually undesirable.

Legacy Terraform SDK Oddities

A final quirk is one that's been haunting us for a while now, which I discussed previously in Tech Debt in Terraform's Plugin SDK.

As noted over there, what we now call the "SDK" was originally designed as just a library inside Terraform itself, because at that time the providers were built into the same distribution package as Terraform CLI. This helper library was therefore tightly coupled with both Terraform Core and with the providers that we implemented in terms of it, without any intentionally-designed interface boundary between them. That was justified because Terraform was still pretty young and evolving quickly, and because any time we changed Terraform Core we'd typically change the helper library to match in the same commit.

Over time, though, far too much provider code has been implemented against that library for us to make any significant changes to its design, and so ever since Terraform v0.12 (which addressed several long-standing design problems in Terraform Core) the SDK has been considerably mismatched with Terraform Core's assumptions. We also split providers out into separate packages in Terraform v0.10, which means we can no longer treat Terraform Core, the SDK, and the providers together as a single monorepo.

As a pragmatic compromise for this, Terraform Core and Terraform CLI both include some accommodations for the legacy SDK, including some specific allowances in the UI layer which renders plans to hide some differences that aren't really differences at all, but rather just historical design quirks of the SDK.

We preserved those workarounds for the new report of detected changes, but in Terraform v0.15.4 we didn't implement the workarounds at the right layer, and so in some situations Terraform will report that there are changes outside of Terraform but then not actually show what they are, because the component responsible for rendering the changes dutifully hid the oddities.

At the time I'm writing this we have a fix for this merged and ready to include in the forthcoming Terraform v0.15.5.

Onward

With Terraform being quite an old product at this point, there are very few changes we can make to its existing behavior that don't run afoul of at least some quirks from elsewhere. We do try to contain those as much as possible, but in cases like this we prefer to ship an initial version of a feature sooner, rather than holding back over several releases until we can address every possible situation.

I hope that despite the noise this feature adds for some long-standing configurations and providers, it'll still be helpful to folks new to Terraform who aren't equipped with the experience of understanding how Terraform responds to external changes. And, of course, I hope we'll be able to improve the change detection to take more information into account so we can focus only on changes that are relevant, thus making it more useful for complex situations too.