Chapter 3
State — The Most Important File You've Never Thought About
If you stop reading this series after one chapter, make it this one. State management is the line between "I tried Terraform once" and "I run Terraform in production." Almost every painful Terraform incident — corrupted infrastructure, lost resources, drift nobody can explain — traces back to a state file mishandled.
This chapter does three things. First, it explains what state actually is and why Terraform needs it. Second, it shows the four ways local state goes wrong. Third, it walks through the modern solution — an S3 backend with native locking, the 2026 default for AWS teams.
What state actually is
Open the terraform.tfstate file from chapter 2 again. You're looking at JSON that records, for every resource Terraform manages: the resource type, the local name, every attribute (including AWS-assigned things like ARNs and IDs), and metadata like the Terraform version that created it.
This file is Terraform's memory of the world. Without it, when you run terraform plan, Terraform has no way to know:
- What does it currently manage? Is that S3 bucket in your account something Terraform created, or did someone make it manually?
- What was the last known state? Has someone changed it since the last
apply? - What's the dependency graph? If you delete a resource, what else needs to be updated to compensate?
You might think: "Why doesn't Terraform just query AWS to figure this out?" Two reasons.
First, it would be impossibly slow. A medium-sized infrastructure has hundreds of resources. Terraform would have to make hundreds of API calls just to figure out where it stands before doing anything. State lets it remember the last-known-good and only diff against actual reality when needed.
Second, ambiguity. AWS has, say, 20 S3 buckets in your account. Three of them are Terraform-managed; the rest are not. Without state, Terraform can't tell them apart. It might try to delete a bucket you created for a different purpose, or worse, leave a Terraform-managed bucket alone because it can't recognize it.
State is the link between the abstract resources you wrote in HCL (aws_s3_bucket.first) and the concrete, AWS-side objects (a bucket with a specific ARN). Lose that link and Terraform is functionally blind.
Four ways local state goes wrong
Right now, your state file is on your laptop. That's fine for chapter 2's solo learning exercise. The moment you involve a team or any kind of automation, it falls apart in predictable ways.
1. Two people, two state files
You and a teammate clone the same Terraform repo. You run terraform apply Monday morning to add a security group rule. The state file on your laptop now records that resource. Your teammate runs terraform apply Monday afternoon to change a tag — but their state file doesn't know about your security group rule. Terraform plans to delete it, because as far as their state knows, it doesn't exist.
The first person to apply after a teammate makes a change destroys that teammate's work.
2. State in git
The seemingly obvious fix — "just commit the state file to git!" — is the second-worst option. State files contain every attribute of every resource, in plaintext. That includes things you do not want in git history: connection strings, IAM policy documents, sometimes secrets injected by providers. Even if you scrub it once, the next apply regenerates it. Git is wrong for this.
It also doesn't solve the concurrency problem — two applys racing each other still corrupt the same file.
3. Laptop loss / rebuild
Your laptop dies. Or you wipe it. The state file is gone. Your AWS account still has the infrastructure, but Terraform has no record of which resources are "its" anymore. Recovery is technically possible (terraform import, painfully, one resource at a time) but unpleasant for anything beyond a handful of resources.
4. CI/CD has nowhere to keep state
You set up a GitHub Actions pipeline to run terraform apply on merges to main. Where does that pipeline put the state file? On the runner? Runners are ephemeral. Commit it back to the repo? See problem #2. There's no clean answer with local state.
The solution to all four problems is the same: don't keep state on disk. Keep it in a shared, versioned, locked, encrypted place that everyone — humans and CI — reads from and writes to.
Remote backends
Terraform's mechanism for "store state somewhere else" is called a backend. Several backends exist (Azure Blob, GCS, Terraform Cloud), but on AWS the canonical answer is S3.
What you want from a state backend, regardless of vendor:
| Property | Why it matters |
|---|---|
| Durability | Losing state is catastrophic |
| Versioning | If today's apply corrupts state, yesterday's version saves you |
| Locking | Prevents two applys from racing each other |
| Encryption | State contains attribute data you don't want leaked |
| Access control | Only the right humans + CI service should read it |
S3 gives you all five. Until recently, getting locking on S3 required a second AWS resource (a DynamoDB table). As of Terraform 1.10, S3 supports native state locking using S3 conditional writes. You no longer need DynamoDB. If a tutorial tells you otherwise, it was written before late 2024 and is now out of date.
The bootstrap problem
There's a chicken-and-egg issue: how do you Terraform an S3 bucket whose purpose is to store the state of Terraform? You can't, not in the same configuration that uses it as a backend.
The conventional solution is bootstrapping in a separate, tiny Terraform configuration. We'll do this in two stages.
Stage 1: bootstrap the state bucket (with local state)
Make a new directory terraform-state-bootstrap/ with this main.tf:
terraform {
required_version = ">= 1.10"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "us-east-1"
}
resource "aws_s3_bucket" "tf_state" {
bucket = "tf-state-yourname-2026"
# Once this bucket holds state, you do NOT want to accidentally delete it.
lifecycle {
prevent_destroy = true
}
}
resource "aws_s3_bucket_versioning" "tf_state" {
bucket = aws_s3_bucket.tf_state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
bucket = aws_s3_bucket.tf_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
resource "aws_s3_bucket_public_access_block" "tf_state" {
bucket = aws_s3_bucket.tf_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}Three things to notice:
lifecycle { prevent_destroy = true }— this is a guardrail. Even if someone runsterraform destroyhere later, Terraform will refuse to delete the bucket. Tripwire.- Versioning enabled — every state write becomes a new S3 version. If today's state is corrupted, yesterday's version is one click away.
- Encryption + public access block — state files contain attribute data you don't want leaked. Both are non-negotiable in any production setup.
Apply this with the four-command loop from chapter 2:
cd terraform-state-bootstrap
terraform init
terraform plan
terraform applyThe bucket exists. The bootstrap config's own state still lives locally, in terraform-state-bootstrap/terraform.tfstate. That's fine — you'll touch this directory once a quarter at most, and the resources here are tiny.
Stage 2: migrate your real config to use the new bucket
Go back to your terraform-first-deploy/ directory from chapter 2. Edit main.tf and add a backend block inside the terraform block:
terraform {
required_version = ">= 1.10"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "tf-state-yourname-2026"
key = "first-deploy/terraform.tfstate"
region = "us-east-1"
encrypt = true
use_lockfile = true # native S3 locking, Terraform 1.10+
}
}The five backend attributes:
bucket— the bucket you just created.key— the path inside that bucket where this config's state lives. Best practice: namespace by project, likefirst-deploy/terraform.tfstate.region— where the bucket is.encrypt— server-side encryption on the state object. Yes, the bucket is already encrypted, but belt-and-suspenders.use_lockfile = true— this is the new flag. It tells Terraform to use S3-native conditional writes for locking. Without this flag (or on Terraform <1.10), you'd need a DynamoDB table.
Now re-init the directory:
terraform init -migrate-stateTerraform notices the backend changed, asks you to confirm migrating the local state to S3, and after you confirm, uploads the existing terraform.tfstate to the bucket. Your local state file becomes a stub.
Run terraform plan to confirm it sees the same world it did before. You should get No changes.
Locking in action
To see the lock work, open two terminals in the terraform-first-deploy/ directory.
In terminal 1:
terraform plan -lock-timeout=60sWhile that's running, in terminal 2 quickly run:
terraform planTerminal 2 should fail almost immediately:
Error: Error acquiring the state lock
Error message: ConditionalCheckFailedException: ...
Lock Info:
ID: ...
Path: tf-state-yourname-2026/first-deploy/terraform.tfstate
Operation: OperationTypePlan
Who: you@your-laptop
Terraform refuses to start a second operation because someone else already holds the lock. This is the property that prevents two teammates from corrupting state by applying simultaneously.
Recovering from a bad apply
S3 versioning is the recovery mechanism. Suppose you do something dumb and your state ends up in an inconsistent place. Open the AWS console, navigate to the bucket → Show versions, find the previous version of terraform.tfstate, restore it (or download it, then re-upload as the current version).
Then run terraform plan to verify the restored state matches reality. If it doesn't, you may have to do terraform refresh or, in the worst case, terraform import the affected resources.
The point is: state restore is a real recovery path that exists because we set up versioning at bootstrap. If you skipped that step, you're stuck.
Going deeper. State files contain provider-injected attribute data, including some sensitive values (RDS passwords, IAM access keys, etc.) when you provision those resources via Terraform. Encryption-at-rest is necessary but not sufficient. Production teams add KMS-managed keys and tighter IAM policies on the state bucket. Some go further and refuse to provision secrets through Terraform at all — they create the secret out-of-band in Secrets Manager and look it up by ARN. We'll touch on this in chapter 5.
What you should walk away with
- State is Terraform's memory of the world. Lose it and Terraform is blind.
- Local state breaks the moment a second person or a CI pipeline shows up. Migrate to remote state before either of those happens.
- On AWS, the 2026 default is S3 with native locking via
use_lockfile = true. DynamoDB-based locking is deprecated. - Versioning + encryption +
prevent_destroyon the state bucket are non-negotiable. They're the difference between "we lost an afternoon" and "we lost the company."
In chapter 4 we tackle the next thing that breaks: the single main.tf file. Adding a VPC, a subnet, an EC2 instance to the bucket-only config we have today turns it into a 200-line monolith — and we'll see why that's a problem worth fixing structurally rather than ignoring.
Next up — Chapter 4: Modules — When Your Single main.tf Becomes the Enemy. We refactor the running example into reusable modules and meet the AWS community modules that save you from writing your own VPC from scratch.