Abhishek Sah

Terraform on CI - Part 2

June 07, 2025

This blog is part 2 of a series. In the last post, we looked at the benefits of running Terraform on CI. This post goes into the details of the setup and zooms in on the challenges you can face while getting Terraform to run on CI.

Typically, Terraform is used to manage infrastructure in the cloud through a native provider such as the AWS provider. Terraform can also drive other tools to perform various automation tasks, such as Helm for installing and managing resources in Kubernetes clusters and Ansible for VM orchestration. You may need more providers depending on your infrastructure requirements, and each provider has its own setup caveats. We will cover some of these provider setups.
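For reference, a module that manages both AWS resources and Helm releases in a Kubernetes cluster might declare its providers like this (the version constraints are illustrative):

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.59.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.11"
    }
  }
}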

Terraform Version

Let’s start with the Terraform version itself. Locking and standardizing a Terraform version for everyone is a crucial step to avoid provider and state compatibility issues. Terraform version changes can affect your modules, and if a state file is written with a Terraform version newer than the agreed-upon locked version, older versions may refuse to read it.

From the docs:

In general, Terraform will continue to work with a given state file across minor version updates. For major or minor releases, Terraform will update the state file version if required, and give an error if you attempt to run an older version of Terraform using an unsupported state file version.

State file issues are nastier to deal with than other minor issues, such as some backend flags being deprecated. Such issues need to be dealt with on a case-by-case basis. If you can run terraform init and terraform plan in all modules without any error, you have chosen the right version for the current IaC codebase.
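If you ever need to check which version last wrote a given state file, it is recorded in the state file itself; an abridged example:

{
  "version": 4,
  "terraform_version": "1.6.0",
  ...
}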

Cloud provider access

Terraform will need to talk to the cloud providers from CI. Proper authentication mechanisms must be in place for the plan and apply commands to run without any access errors. To understand this in detail, I’ll take the example of AWS, but the idea remains the same for other providers. I will use GitHub Actions to demonstrate how to run Terraform on a CI/CD pipeline.

When an infra admin runs Terraform from a local machine to make infra changes, they typically use their own identity (an AWS SSO profile, for example) to authenticate with AWS. The AWS Terraform provider can pick up auth configuration from several sources with a pre-defined priority order. We can use any of these mechanisms to authenticate Terraform on CI.

Setup

The first step involves creating an AWS directory (~/.aws) on the CI runner and an AWS configuration file that the Terraform AWS provider can use. Inside this file, we reference a role that has access to create/destroy/change all kinds of infra in your AWS account.

This is what the role policy should look like:

{
  "Statement": [
    {
      "Action": "*",
      "Effect": "Allow",
      "Resource": "*",
      "Sid": ""
    }
  ],
  "Version": "2012-10-17"
}

We have given */* permissions to this role. However, we can also restrict access by allowing selective actions on specific resources, providing better control over infrastructure and costs.
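If you want tighter scoping, a hedged sketch of a narrower policy could allow only the services your modules actually manage (the service list below is purely illustrative):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ScopedInfraAccess",
      "Effect": "Allow",
      "Action": ["s3:*", "dynamodb:*", "ec2:Describe*"],
      "Resource": "*"
    }
  ]
}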

Let’s call this IAM role core-tf-runner-role.

We assume this role in CI to provision infrastructure in our AWS account. We will need to allow the GitHub runners to assume this role, which we can achieve via a trust relationship. AWS has a nice blog post on how to do exactly that: it involves configuring an OIDC identity provider inside the AWS account, which enables the use of IAM roles with short-term credentials and allows the GitHub runner to assume core-tf-runner-role.
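For reference, the trust policy on core-tf-runner-role typically ends up looking like the sketch below (the account ID, audience, and repository filter are placeholders you would adapt; the AWS blog post walks through the full setup):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCNT_ID:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:<<your-org>>/<<your-iac-repo>>:*"
        }
      }
    }
  ]
}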

[Figure: Conceptual flow]

On CI, we need to create a configuration file in the format below to give Terraform an AWS identity. The AWS profile assumes the role we created above.

Example Config file:

[profile core_aws_account]
role_arn = arn:aws:iam::dev-acnt-id:role/core-tf-runner-role
credential_source = Environment
region = us-west-1

We have multiple ways to make this file available on CI, such as pre-baking it into the CI image, importing it as a GitHub Actions step, or using shell magic, among others. For the sake of simplicity, I assume that we are creating this file from a script. Here is a demonstrative GitHub Actions code block that describes the approach.

- name: create aws directory and a config file
  run: mkdir ~/.aws && touch ~/.aws/config

- name: Render AWS configs
  run: |
    <<your script to load the Config file inside ~/.aws/config

- name: Setup terraform
  uses: hashicorp/setup-terraform@v3
  with:
    terraform_version: '1.6.0' # locking the version

With this setup ready, we can run commands inside Terraform modules on the GitHub Actions CI job.

Below is an example Terraform module that uses the AWS profile we just set up. Here, the state file is in the same AWS account where we are creating other infrastructure resources.

//terraform.tf
terraform {
  required_version = "~> 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.59.0"
    }
  }

  backend "s3" {
    bucket              = "<<bucket-name>>"
    key                 = "<<state path>>"
    region              = "us-west-1"
    profile             = "core_aws_account"   //access provided via ~/.aws/config file
    use_legacy_workflow = false
  }
}

//provider.tf
provider "aws" {
  region  = var.region
  profile = "core_aws_account"
}

//main.tf
resource "aws_s3_bucket" "example" {
  bucket = "my-tf-test-bucket"

  tags = {
    Name        = "My bucket"
    Environment = "Dev"
  }
}

Multi AWS Account setup

Suppose your modules need connectivity to multiple AWS accounts. In that case, it’s best to create one tf-runner role per AWS account for better control and management, e.g., prod-tf-runner-role, security-tf-runner-role, dev-tf-runner-role. The config file expands correspondingly.

[profile dev_aws_account]
role_arn = arn:aws:iam::dev-acnt-id:role/dev-tf-runner-role
credential_source = Environment
region = us-west-1

[profile prod_aws_account]
role_arn = arn:aws:iam::prod-acnt-id:role/prod-tf-runner-role
credential_source = Environment
region = us-west-1

[profile security_aws_account]
role_arn = arn:aws:iam::security-acnt-id:role/security-tf-runner-role
credential_source = Environment
region = us-west-1

We can establish the same trust relationship with GitHub in each role, but that can be tedious to manage. We can “DRY” it further by creating an intermediate role that can assume all these runner roles and can itself be assumed by the GitHub Actions runner. I’ll call this intermediate role “core-tf-runner-role” here.

Conceptually:

[Figure: Conceptual flow of role chaining]

To achieve this flow, we will need to add a trust relationship between core-tf-runner-role and other runner roles.

// create this trust relationship with all account runner roles
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCNT_ID:role/core-tf-runner-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Additionally, we need to allow the core-tf-runner-role to assume other roles by attaching an appropriate policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": [
        "arn:aws:iam::dev-acnt-id:role/dev-tf-runner-role",
        "arn:aws:iam::prod-acnt-id:role/prod-tf-runner-role",
        "arn:aws:iam::security-acnt-id:role/security-tf-runner-role"
      ]
    }
  ]
}

With this setup, we can now use the AWS config file to handle the last leg of role chaining.

[Figure: Role chaining flow]
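Once the runner has assumed core-tf-runner-role, a quick way to sanity-check the full chain is to resolve an identity through one of the chained profiles:

aws sts get-caller-identity --profile dev_aws_account   # should report the dev-tf-runner-role identity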

To recap, we have set up the following:

  1. Created runner roles per AWS account with */* permissions.
  2. Created core-tf-runner-role, which can assume the runner roles in the various accounts.
  3. Allowed GitHub Actions to assume core-tf-runner-role via the OIDC IdP flow.
  4. Created an AWS config file on CI so the Terraform AWS provider can authenticate.

The GitHub action file is where the role chaining happens.

- name: create aws directory and a config file
  run: mkdir ~/.aws && touch ~/.aws/config

- name: Render AWS configs
  run: |
    <<your script to load the Config file inside ~/.aws/config

- name: Setup terraform
  uses: hashicorp/setup-terraform@v3
  with:
    terraform_version: '1.6.0' # locking the version

- name: Assume runner role
  uses: aws-actions/configure-aws-credentials@v2
  with:
    role-to-assume: arn:aws:iam::AWS_ACNT_ID:role/core-tf-runner-role
    audience: audString
    aws-region: us-west-1
# next steps will run plan / apply commands

CI Workflow Design

Now that we have authentication sorted out, let’s design the CI triggers. A well-designed workflow should:

  1. Run the terraform plan on all pull requests
  2. Run the terraform apply when the pull request is merged
  3. Handle multiple Terraform module changes in a single pull request
  4. Provide clear output for review in all actions

We will also need a runner with appropriate network connectivity to the various AWS accounts and VPCs. For brevity, let’s keep the runner setup out of scope. Let’s consolidate all of this into a GitHub Actions workflow file.

Terraform Plan on Pull Requests

name: Terraform Plan

on:
  pull_request: # run this wf on PRs on this repo
  push:
    branches:
      - main

jobs:
  terraform-plan:
    runs-on: internal-runner # A runner with proper networking setup
    steps:
      - uses: actions/checkout@v3

      - name: Create AWS config
        run: |
          mkdir -p ~/.aws
          cat > ~/.aws/config << EOF
          [profile dev_aws_account]
          role_arn = arn:aws:iam::dev-acnt-id:role/dev-tf-runner-role
          credential_source = Environment
          region = us-west-1

          [profile prod_aws_account]
          role_arn = arn:aws:iam::prod-acnt-id:role/prod-tf-runner-role
          credential_source = Environment
          region = us-west-1

          [profile security_aws_account]
          role_arn = arn:aws:iam::security-acnt-id:role/security-tf-runner-role
          credential_source = Environment
          region = us-west-1
          EOF

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: '1.6.0' # your desired version

      - name: Assume Terraform runner role
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::AWS_ACNT_ID:role/core-tf-runner-role # allowed to assume via OIDC IdP flow
          audience: audString
          aws-region: us-west-1

      - name: Get changed files # relative to main branch
        id: changed-files
        uses: tj-actions/changed-files@v40

      - name: Create directory to store plan out file
        run: |
          mkdir -p artifact

      - name: Run PLAN on changed leaf directories # custom script
        run: |
          python3 change_detection.py ${{ steps.changed-files.outputs.all_changed_files }}

      - name: Upload Artifact
        if: github.ref == 'refs/heads/main' # upload artifact only when branch is merged to main
        uses: actions/upload-artifact@v4
        with:
          name: terraform-pr-artifact
          path: artifact

This workflow does several important things:

  1. Assumes the core tf-runner role via the OIDC IdP flow (short-term credentials).
  2. Sets up Terraform at a particular version.
  3. Finds all changed files relative to the target branch (main).
  4. Runs a script that takes the list of changed files as input and:
     - detects the changed Terraform modules in the PR (in my example, the leaf directories),
     - runs terraform init for each changed module using the -chdir flag (https://developer.hashicorp.com/terraform/cli/commands#switching-working-directory-with-chdir),
     - runs terraform plan and saves the plan output to the artifact directory.
  5. Uploads the plan-out files as a GitHub Actions artifact so a separate workflow can use them.

Here is the change_detection.py script:

import sys
import subprocess
import json
import utils
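# utils: repo-specific helpers for filtering changed .tf files and mapping them to module directories (a rough sketch follows after this script)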


def main():
    tf_files_changed = utils.get_tf_files(sys.argv)
    changed_directories = utils.get_changed_directories(tf_files_changed)
    paths_references = {}

    # print the changed leaf
    for item in changed_directories:
        print(f"Detected changed directories: {item}")

    for item in changed_directories:
        print(
            f"\n===============================Processing: {item}===============================\n"
        )
        result = subprocess.run(["terraform", f"-chdir={item}", "init"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                text=True)
        print(result.stdout)
        result.check_returncode()  # raises if non zero exit code

        plan_file_name = "_".join(item.split("/"))
        tf_plan_cmd = [
            "terraform", f"-chdir={item}", "plan", "-out", f"{plan_file_name}"
        ]
        result = subprocess.run(tf_plan_cmd,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                text=True)
        print(result.stdout)
        result.check_returncode()  # raises if non zero exit code

        #move plan file to artifact directory
        result = subprocess.run(
            ["mv", f"{item}/{plan_file_name}", f"artifact/{plan_file_name}"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True)
        print(result.stdout)
        result.check_returncode()  # raises if non zero exit code
        paths_references[plan_file_name] = item

    with open('artifact/path_ref.json', 'w') as fp:
        json.dump(paths_references, fp)


if __name__ == "__main__":
    main()

The code is mostly self-explanatory. I have used a few helper functions from a utils module; the actual implementation will depend on how you have organized the IaC repository. We save the plan-out files as a GitHub artifact so we can apply the exact same changes in a later stage. GitHub Artifacts is one way to save and reuse the plan file; you can use other methods to manage the plan output file.
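For illustration only, here is a rough sketch of what those helpers might look like, assuming the changed file paths are passed as arguments and each leaf directory is a module (your utils will differ based on repo layout):

import os

def get_tf_files(argv):
    # argv[0] is the script name; the rest are the changed file paths from the CI step
    return [f for f in argv[1:] if f.endswith((".tf", ".tfvars"))]

def get_changed_directories(tf_files_changed):
    # deduplicate to the set of directories (leaf modules) that contain changed files
    return sorted({os.path.dirname(f) for f in tf_files_changed if os.path.dirname(f)})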

Terraform Apply

The admins can decide how they want to trigger the apply; it could be manual or automated. Here is an example using a manual GitHub workflow dispatch, which takes the workflow run ID of the plan stage shown earlier as input.

name: Terraform APPLY

on:
  workflow_dispatch:
    inputs:
      plan_workflow_id:
        type: string
        required: true
        description: PLAN workflow run id

jobs:
  terraform-apply:
    runs-on: internal-runner #A runner with proper networking setup
    steps:
      - uses: actions/checkout@v3

      - name: Create AWS config
        run: |
          mkdir -p ~/.aws
          cat > ~/.aws/config << EOF
          [profile dev_aws_account]
          role_arn = arn:aws:iam::dev-acnt-id:role/dev-tf-runner-role
          credential_source = Environment
          region = us-west-1

          [profile prod_aws_account]
          role_arn = arn:aws:iam::prod-acnt-id:role/prod-tf-runner-role
          credential_source = Environment
          region = us-west-1

          [profile security_aws_account]
          role_arn = arn:aws:iam::security-acnt-id:role/security-tf-runner-role
          credential_source = Environment
          region = us-west-1
          EOF

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: '1.6.0'

      - name: Assume Terraform runner role
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::AWS_ACNT_ID:role/core-tf-runner-role # allowed to assume via OIDC IdP flow
          audience: audString
          aws-region: us-west-1
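
      # Assumption: the plan artifact uploaded by the plan workflow must be fetched
      # before apply.py can read artifact/path_ref.json and the plan files.
      # actions/download-artifact@v4 can download from another workflow run when
      # given its run-id and a token with read access to that run's artifacts.
      - name: Download plan artifact from the plan run
        uses: actions/download-artifact@v4
        with:
          name: terraform-pr-artifact
          path: artifact
          run-id: ${{ inputs.plan_workflow_id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}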

      - name: APPLY plan out files
        id: terraform_apply
        run: |
          python3 apply.py

Here is a similar script that runs the apply command. It reads each plan file in the artifact, moves it to its respective module directory, and applies them one by one.

"""This module reads the artifact contents and moves
the plan files to proper leaf directories and applies
the plan out files one by one"""
import subprocess
import json

path_ref_filepath = 'artifact/path_ref.json'

def move_plan_out_to_proper_dir(plan_file_name, path):
    plan_mv_cmd = ["mv", f'artifact/{plan_file_name}', f'{path}/plan.out']
    result = subprocess.run(plan_mv_cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True)
    print(result.stdout)
    result.check_returncode()


def run_terraform_init(path):
    result = subprocess.run(["terraform", f"-chdir={path}", "init"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True)
    print(result.stdout)
    result.check_returncode()


def show_plan_file(path):
    result = subprocess.run(["terraform", "show", "plan.out"],
                            cwd=path,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True)
    print(result.stdout)
    result.check_returncode()


def run_terraform_apply(path):
    result = subprocess.run(
        ["terraform", f"-chdir={path}", "apply", "plan.out"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True)
    print(result.stdout)
    result.check_returncode()


def main():
    with open(path_ref_filepath) as file:
        file_contents = file.read()

    successful_apply = []
    path_ref = json.loads(file_contents)

    for plan_file_name in path_ref:
        path = path_ref[plan_file_name]
        print(
            f"\n===============================Processing: {path}===============================\n"
        )

        move_plan_out_to_proper_dir(plan_file_name=plan_file_name, path=path)
        run_terraform_init(path=path)
        show_plan_file(path=path)
        run_terraform_apply(path=path)

        successful_apply.append(path)

    print("apply successful in these directories:", successful_apply)


if __name__ == "__main__":
    main()

Once this job succeeds, the pipeline will have applied the changes to each changed module, and the job output will show the logs from each Terraform run.

In conclusion, you can use multiple approaches to model your CI workflow; the above is one example. A centralized Terraform execution environment enables a collaborative infrastructure workflow for engineering teams. Shared, locked state files ensure that changes to common resources are applied serially, and running everything through CI ensures everyone on the team uses the same tooling, avoiding configuration drift. The GitOps model scales with your team and allows faster collaboration on the infrastructure code.

Best Practices and Learnings

Based on our experience running Terraform on CI, here are some best practices and learnings:

Module structure

Organize your Terraform code into small, logical modules that can be updated without affecting other modules (a single reason to change). This separation enables running plan and apply in parallel across modules in different CI jobs and reduces the cascading effect of infrastructure changes.

terraform/
  ├── networking/         # VPC, subnets, etc.
  ├── compute/            # EC2, ASGs, etc.
  ├── database/           # RDS, DynamoDB, etc.
  ├── observability/      # CloudWatch, Grafana, etc.
  └── iam/                # IAM roles and policies

Handle Terraform state locking

When multiple jobs run in parallel and multiple developers work on infrastructure, state locking becomes essential. We store our Terraform state in S3 buckets; with the S3 backend, locking is provided via a DynamoDB table.

terraform {
  backend "s3" {
    bucket         = "terraform-state-bucket"
    key            = "path/to/state/file"
    region         = "us-west-1"
    profile        = "core_aws_account"
    dynamodb_table = "terraform-state-lock"   # table name is illustrative; enables state locking
  }
}

Separate environments with workspaces or directories

For multi-environment setups, use directories and sub-directories per environment. Keeping environments separate makes testing easier, builds confidence in changes, and makes debugging and rollback simpler.

terraform/
  ├── dev/
  │   ├── networking/
  │   └── kubernetes/
  ├── prod/
  │   ├── networking/
  │   └── kubernetes/

Use variables for cross-account resource references

When resources in one account need to reference resources in another account, use variables and data sources to facilitate this connection. It reduces confusion in resource naming and clearly expresses intent.

# In account A
output "vpc_id" {
  value = aws_vpc.main.id
}

# In account B
variable "account_a_vpc_id" {
  description = "VPC ID from Account A"
}

data "aws_vpc" "from_account_a" {
  provider = aws.account_a
  id       = var.account_a_vpc_id
}
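Note that provider = aws.account_a in the data source above assumes a matching provider alias is configured in account B’s module. A minimal sketch, reusing a profile name from the config file shown earlier (the mapping of profile to account is an assumption):

# In account B: provider alias used by the data source above
provider "aws" {
  alias   = "account_a"
  region  = var.region
  profile = "dev_aws_account"   # assumed: the profile that maps to Account A
}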

Security considerations

  • Limit the permissions of the CI role to only what’s necessary; */* should be used cautiously.
  • Consider manual approval for sensitive changes.
  • Be cautious with Terraform outputs, which can leak sensitive values into CI logs (see the snippet below).
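For example, marking an output as sensitive keeps its value from being printed in plan/apply logs (the resource reference below is illustrative):

output "db_endpoint" {
  value     = aws_db_instance.example.endpoint
  sensitive = true
}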

Multi Cloud Infra

  • If your infrastructure spans multiple cloud providers, the approach of role chaining and change detection remains the same.
  • Module organization becomes a key aspect of keeping the pipeline running smoothly.
  • You may need to repeat a similar role-chaining setup for each cloud provider; this is easier if the provider supports a similar short-term credentials approach.

Conclusion

Running Terraform on CI offers numerous benefits for infrastructure management:

  • Standardized environments for all Terraform runs
  • Improved collaboration through PR reviews
  • Version control and history for all infrastructure changes
  • Reduced manual toil and human error
  • Better onboarding experience for new team members
  • Fewer knowledge silos
  • A solid base for your IaC to scale as the team/company grows

By setting up proper authentication, designing the workflow well, and following best practices, you can create a robust infrastructure-as-code pipeline that scales with your organization. This approach democratizes infrastructure changes while maintaining security and control over critical resources.

The journey to effective Terraform in CI may require an initial investment in setup. However, the long-term benefits of increased productivity, reliability, and team satisfaction make the effort worthwhile.

In part 3, we will explore how to use Ansible and Terraform to manage a fleet of Virtual machines.


Written by Abhishek Sah