Provider API Limits

To reduce the number of provider API calls made by large Terraform configurations, consider the following strategies:

Data sources

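Use data sources to look up existing infrastructure instead of creating duplicate resources.
Declare each lookup once and reference it wherever it is needed. Every data source is read from the provider on each plan, so unnecessary ones add API calls.

Here is an example that uses a data source: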

data "aws_ami" "example" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  owners = ["099720109477"]
}

resource "aws_instance" "example" {
  ami           = data.aws_ami.example.id
  instance_type = "t2.micro"

  # ...
}

In this example, the aws_ami data source fetches the ID of the latest Ubuntu 20.04 AMI from AWS, and the aws_instance resource uses that ID to create an instance. Any number of resources can reference data.aws_ami.example.id; the lookup is performed once per plan, no matter how many resources use it.

Used this way, data sources help keep a configuration within the provider's rate limits.

Caching

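Terraform records the last known attributes of every managed resource in its state file. The state is not a cache of API responses: by default, plan and apply still refresh each resource from the provider.
Terraform has no built-in support for external caches such as Redis or Memcached. A layer like that would have to be a custom proxy in front of the provider API, outside of Terraform.
The same applies to a content delivery network (CDN): provider API calls are authenticated requests that a CDN does not normally cache.

Example configuration that keeps state in a local file and caps the number of retries the AWS provider makes: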

terraform {
  backend "local" {
    path = "terraform.tfstate"
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

# Provider blocks belong at the top level, not inside the terraform block.
provider "aws" {
  region = "us-west-2"

  # Cap how many times the provider retries a throttled or failed API call.
  # The AWS provider decides internally which errors are retryable.
  max_retries = 2
}

Dependencies

Correct dependencies reduce the number of failed API calls.
If Terraform tried to create a subnet before its VPC, the API call would fail.
It would then have to create the subnet again once the VPC exists, costing two API calls: one failed and one successful.
Terraform automatically infers dependencies between resources from the attributes they reference.
For example, if the ID of a VPC (aws_vpc.example.id) is used when creating a subnet, Terraform knows that the subnet depends on the VPC.
When a dependency is not visible through attribute references, use the depends_on argument to declare it explicitly.
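
The two kinds of dependency described above can be sketched as follows. This is a minimal illustration rather than a complete network setup: the CIDR ranges, resource names, and AMI ID are placeholder assumptions.

```hcl
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16" # assumed example range
}

resource "aws_subnet" "example" {
  # Referencing aws_vpc.example.id creates an implicit dependency,
  # so Terraform creates the VPC before the subnet.
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.1.0/24" # assumed example range
}

resource "aws_internet_gateway" "example" {
  vpc_id = aws_vpc.example.id
}

resource "aws_instance" "example" {
  ami           = "ami-0c94855ba95c574c8" # placeholder AMI ID
  instance_type = "t2.micro"
  subnet_id     = aws_subnet.example.id

  # No attribute of the gateway is referenced here, so the ordering
  # has to be stated explicitly.
  depends_on = [aws_internet_gateway.example]
}
```

Prefer implicit references where they exist; depends_on is only needed when the relationship does not show up in any attribute.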

Use Workspaces

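Terraform workspaces let you keep a separate state file for each environment (such as dev, staging, and prod) from a single configuration. Each plan or apply refreshes only the resources in the selected workspace's state, which keeps the API requests for different environments apart.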
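
As a rough sketch, a configuration can read the name of the selected workspace through terraform.workspace and vary its settings per environment. The workspace names, instance sizes, and AMI ID below are assumptions for illustration. Workspaces themselves are created and selected on the command line, for example with terraform workspace new dev and terraform workspace select dev.

```hcl
# Hypothetical per-environment settings keyed by workspace name.
locals {
  instance_types = {
    dev     = "t2.micro"
    staging = "t2.small"
    prod    = "t2.medium"
  }
}

resource "aws_instance" "app" {
  ami = "ami-0c94855ba95c574c8" # placeholder AMI ID

  # terraform.workspace is the name of the currently selected workspace.
  # Fall back to the smallest size for any workspace not listed above.
  instance_type = lookup(local.instance_types, terraform.workspace, "t2.micro")

  tags = {
    Environment = terraform.workspace
  }
}
```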

Splitting Up Your Infrastructure

It may be beneficial to split your infrastructure into smaller, more manageable configurations, each with its own state. A terraform plan or terraform apply then refreshes only the resources in that one state, which reduces the number of API calls made by any single command.
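
One possible way to connect split configurations is the terraform_remote_state data source, which reads another configuration's outputs from its state file instead of querying the provider. The sketch below assumes two directories, network and app, both using the local backend. The file names, paths, and the aws_subnet.example resource in the network configuration are hypothetical.

```hcl
# --- network/outputs.tf (hypothetical file in the network configuration) ---
# Assumes aws_subnet.example is defined elsewhere in this configuration.
output "subnet_id" {
  value = aws_subnet.example.id
}

# --- app/main.tf (hypothetical file in a separate application configuration) ---
data "terraform_remote_state" "network" {
  backend = "local"

  config = {
    path = "../network/terraform.tfstate" # assumed relative path
  }
}

resource "aws_instance" "app" {
  ami           = "ami-0c94855ba95c574c8" # placeholder AMI ID
  instance_type = "t2.micro"

  # Read from the network configuration's state, not from the AWS API.
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```

With this layout, planning the app configuration does not refresh the network resources at all.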

Parallelism Option

Use the -parallelism option with commands such as terraform apply or terraform plan. It controls the maximum number of concurrent operations as Terraform walks the resource graph. The default is 10.

If you're experiencing issues with rate limits with your AWS provider, you might want to lower this number to limit the number of concurrent operations interacting with the AWS API.

Example:

terraform apply -parallelism=5

This command limits Terraform to 5 concurrent operations.

Please note that reducing the parallelism will slow down the Terraform operation because fewer operations are happening at once, but it can help avoid rate limits on the provider side.

However, -parallelism is a mitigation for when you are already hitting rate limits. The long-term solution is to optimize your Terraform configurations and to ask your cloud provider whether the rate limits can be raised.

Use Resource Targeting Judiciously

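Terraform's -target=ADDRESS option (for example, terraform apply -target=aws_instance.example) restricts an operation to the named resource and the resources it depends on. This limits how many resources Terraform reads and changes in one run, and therefore the number of API calls. Terraform's documentation recommends targeting for exceptional situations rather than routine use, so treat it as a temporary measure.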

Leverage Provisioners Wisely

Provisioners are a last resort for running scripts or other actions on the local machine or on a remote machine. Used carelessly, they can cause problems, including extra API calls that count toward rate limits. The best practice is to do as much as possible with Terraform resources and data sources and to avoid provisioners where you can.

If you must use provisioners, here are a few ways you can optimize their usage to minimize API calls:

Avoid Unnecessary Provisioners: Only use provisioners if there's no other way to achieve what you want. For example, to install software on an EC2 instance, consider using something like user data or a configuration management tool like Ansible or Chef, rather than a remote-exec provisioner.
Use the self Object: Inside a provisioner block, use the self object to reference the attributes of the resource being created. This avoids the need for additional data sources, which can result in extra API calls.
Decide How Failures Are Handled: By default, a failed creation-time provisioner marks the resource as tainted, and the next apply destroys and recreates it, which costs additional API calls. Setting on_failure = continue tells Terraform to ignore the provisioner error and carry on instead. It does not retry the provisioner, so use it only when the failure is safe to ignore.
Use Local Exec Where Possible: The local-exec provisioner runs a command on the machine running Terraform. If the task can be done there, local-exec avoids opening an SSH or WinRM connection to the resource.

Here's an example:

resource "aws_instance" "example" {
  ami           = "ami-0c94855ba95c574c8"
  instance_type = "t2.micro"

  provisioner "local-exec" {
    command = "echo ${self.private_ip} > file.txt"
  }
}

In this example, the self object supplies the instance's private IP directly from the resource, so no separate data source lookup is needed. The local-exec provisioner writes the IP to a file on the local system, whereas a remote-exec provisioner would have to open an SSH or WinRM connection to the instance.
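
The recommendation to prefer user data over a remote-exec provisioner might look like the sketch below. It assumes the AMI is a Debian or Ubuntu image with cloud-init; the AMI ID and the nginx package are placeholders chosen for illustration.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c94855ba95c574c8" # placeholder; assumed to be a Debian/Ubuntu image
  instance_type = "t2.micro"

  # The script runs on the instance at first boot.
  # Terraform never has to connect to the instance to install software.
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
  EOF
}
```

Because the script is passed as part of the instance creation request, it adds no provisioner step and no extra connection after the instance is created.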

Do Not Refresh State

You can skip the state refresh when running terraform plan by passing the -refresh=false flag. Here's an example:

terraform plan -refresh=false

Skipping the refresh phase reduces the number of API calls to your provider. The refresh operation reconciles the state recorded in the state file with the real-world resources, which involves API calls to fetch the current status of every resource in the state.

However, there are significant disadvantages to doing this:

Outdated State: If you choose not to refresh the state before planning, your state file may be outdated, which means your plan won't accurately reflect what changes Terraform will make. This can lead to unexpected results when you apply the plan.
Inconsistencies: There might be inconsistencies between your local state and the actual state of resources in the provider. For instance, someone might have manually changed a resource in the cloud console, or an automated process might have done so. Skipping the refresh phase would mean Terraform is unaware of these changes.
Drift Detection: One of the benefits of running a terraform plan is to identify drift, or changes made to infrastructure outside of Terraform. If you skip the refresh, you will not detect this drift.

In general, it's best to let Terraform refresh the state so that it can accurately calculate what changes need to be made. There are specific situations where -refresh=false is worth considering, for example in a CI/CD pipeline where you are certain no out-of-band changes have been made and you need to save time or avoid hitting rate limits. Even then, use it with care and a clear understanding of the implications.