James's Ramblings

Terraform tips from an expert

I felt like sharing some tips on the trickier bits of Terraform that I have picked up over thousands of hours of using the tool, making (and learning from) many mistakes, and seeing loads of different environments…

With software architecture, simpler is generally better. The simplest design that meets requirements now and in the near future means easier-to-understand code, which in turn means fewer bugs, quicker releases, and quicker learning of codebases; all things that equate to generally lower costs and shorter time to market. Terraform is no different.

I want to talk about Terraform modules and states, as I see this going wrong very often and there’s a very simple design that will meet almost all requirements.

Terraform Modules

Terraform modules are reusable collections of resources. For cloud providers, those resources are things like virtual machines, networks, and storage buckets.

There are two reasons, in my opinion, to use Terraform modules:

  1. You want to create many of a set of resources with a specific purpose and some minor variation. I will coin three subtypes here: service modules, lego modules, and layer modules.

  2. You want to hide complexity so that someone less knowledgeable and/or skilled can use it without having to be an expert.

When used incorrectly, Terraform modules can make an absolute mess of an environment, one that is very costly, time-consuming, and boring to fix. When building something new and somewhat complex, it’s best to build it without modules first to better understand how to divide everything up; this applies particularly to lego modules.

Most publicly available Terraform modules fall into the trap of being too generic and therefore too complex, so I avoid them. Unless you audit (and mirror) them or inherently trust the publisher, they’re also a security liability.

Service Modules

A service module is a single module that encapsulates all the resources (and sub-modules) required to run a service. In the root state for a service, I don’t have anything declared except a single module block that references (usually) one service module. Injected into the service module (as variables) are things that will change for the service as it is promoted between environments, such as CIDR blocks and names.

This is a great pattern as it’s really easy to reap the benefits and hard to get wrong. I tend to keep service modules in the same repository where the application code lives, so everything can be reverted at once. However, keeping them externally can also work.
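A minimal sketch of what such a root might look like, assuming a local module path and variable names invented purely for illustration (orders-service, environment, name_prefix, and vpc_cidr are all hypothetical). Only the values that change between environments are passed in:

```hcl
# Root configuration for the prod environment of a hypothetical "orders"
# service. Everything else is declared inside the service module.
module "orders_service" {
  # Hypothetical local path: the module lives next to the application code.
  source = "../../modules/orders-service"

  # Only values that differ as the service is promoted between environments.
  environment = "prod"
  name_prefix = "orders-prod"
  vpc_cidr    = "10.20.0.0/16"
}
```

Under this assumption, another environment would be a near-identical root with different values, so reviewing a promotion means reviewing a handful of lines.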

Lego Modules

Lego (or composable) modules are very generic blocks of infrastructure that will be used many times across an environment. Examples include a VPC module or a best-practice private S3 bucket (this includes several resources).
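To make the S3 example concrete, here is a hedged sketch of what a private bucket lego module might contain, assuming version 4 or later of the AWS provider (where bucket settings are separate resources). The file path, variable, and output names are illustrative, and this is one reasonable baseline rather than a definitive best-practice list:

```hcl
# modules/private-s3-bucket/main.tf (hypothetical lego module)

variable "name" {
  description = "Globally unique bucket name."
  type        = string
}

resource "aws_s3_bucket" "this" {
  bucket = var.name
}

# Block every form of public access to the bucket.
resource "aws_s3_bucket_public_access_block" "this" {
  bucket = aws_s3_bucket.this.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Keep previous object versions so accidental overwrites can be recovered.
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt objects at rest with KMS (the AWS-managed key, as none is specified).
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

output "bucket_id" {
  value = aws_s3_bucket.this.id
}
```

A caller only supplies name, and there are no conditionals inside, which keeps the module easy to read.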

Deciding exactly which resources a lego module should contain is something of an art form. Common mistakes include:

  • Encapsulating too few resources, creating unnecessary complexity and abstraction.
  • Encapsulating too many resources, leading to hard-to-read/maintain messes with conditions everywhere.

Lego modules should be kept in an external repository, version pinned, and versioned with a scheme such as semantic versioning. Either a Terraform module registry (with a version constraint) or a Git source pinned to a tag with ?ref= can be used. Without version pinning (and careful reading of Terraform plans), there is a risk of costly mistakes.
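A sketch of both pinning styles. The registry address, Git URL, module names, and version numbers are placeholders, not real modules:

```hcl
# Registry source: "version" pins an exact release of the module.
module "vpc" {
  source  = "app.terraform.io/example-org/vpc/aws" # hypothetical private registry
  version = "1.4.2"

  # The VPC module's own inputs (CIDR block, name, ...) would go here.
}

# Git source: "//private-s3-bucket" selects a subdirectory of the repository
# and "?ref=v2.1.0" pins a tag. Git sources do not accept a "version" argument.
module "artifacts_bucket" {
  source = "git::https://github.com/example-org/terraform-modules.git//private-s3-bucket?ref=v2.1.0"

  name = "example-org-artifacts" # hypothetical bucket name
}
```

Upgrading then becomes an explicit change to version or ref, which shows up in code review and in the next plan.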

Terraform States and Layer Modules

State is a JSON file in which Terraform records the modules and resources it manages and their real-world attributes. Most environments have more than one state file. This is another area where things commonly go wrong:

  • Single huge state: Increases risk (blast radius) of change, slows down Terraform plans/applies, and could be a security risk (secrets can exist in state).

  • Too many states: Becomes too difficult to maintain due to drift.

Rules for Terraform States:

Prefer one state file unless:

  1. You’re crossing a trust boundary.
  2. terraform plan/apply becomes too slow.
  3. Resources will be managed by a different team. Each team should have its own state.

Layer Modules

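Layer modules solve race conditions when creating resources that don’t become healthy immediately. Terraform orders resources by their dependencies, but a resource can be reported as created before it is actually ready to use (for example, a Kubernetes cluster whose API is not yet reachable), so in some cases layers are needed to avoid failed applies.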

  • Each layer can live in its own module.
  • Apply each layer in turn using -target to prevent race conditions. This approach reduces the number of states required, lets a single plan check for drift across all infrastructure, and allows a single terraform destroy to remove everything.
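A minimal sketch of a layered root, assuming three hypothetical layer modules (network, cluster, apps) and hypothetical output names. The -target commands are shown as comments because they run from the shell, not from the configuration:

```hcl
# Single root, single state, split into hypothetical layers. Later layers
# consume earlier layers' outputs, so Terraform also knows the dependency order.
module "layer_1_network" {
  source = "./layers/network"
}

module "layer_2_cluster" {
  source     = "./layers/cluster"
  subnet_ids = module.layer_1_network.private_subnet_ids
}

module "layer_3_apps" {
  source       = "./layers/apps"
  cluster_name = module.layer_2_cluster.cluster_name
}

# First build, one layer at a time, so each is healthy before the next starts:
#   terraform apply -target=module.layer_1_network
#   terraform apply -target=module.layer_2_cluster
#   terraform apply
# Afterwards, a plain `terraform plan` checks drift across every layer, and a
# single `terraform destroy` removes everything.
```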

A Rant About Makefiles and Runtime Variables

Makefiles, runtime variables, and runtime backend configuration may seem cool, but they add unnecessary complexity. They don’t solve any problem that can’t be solved more easily by other means.

A junior engineer should be able to run terraform init, terraform plan, and terraform apply on day one of their career (in a sandbox account). Wrapping the setup in these tools makes a codebase harder to understand, especially for less experienced engineers. Keeping everything in files keeps things simple.
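As a hedged illustration of keeping everything in files, the backend can be declared statically in the repository rather than passed with -backend-config at init time. An S3 backend is assumed purely as an example, and the bucket, key, and region values are placeholders:

```hcl
# backend.tf: committed alongside the rest of the root configuration, so a
# plain `terraform init` works with no extra flags, scripts, or Makefile targets.
terraform {
  backend "s3" {
    bucket = "example-org-terraform-state" # placeholder bucket name
    key    = "orders-service/prod/terraform.tfstate"
    region = "eu-west-2"
  }
}
```

Backend blocks cannot reference input variables, which is often why people reach for runtime backend configuration; writing the literal values into each root avoids the need.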