The Day Our CI Pipeline Cost $300 in Idle Resources: How We Implemented Auto-Scaling Runners

Faced with a sudden $300 cloud bill attributed to idle CI runners over a single weekend, we restructured our infrastructure to use ephemeral auto-scaling runners, cutting costs by 90%.

The email hit my inbox on a Monday morning in early March 2026. It wasn't a bug report from the security team or a deployment failure notification; it was a billing alert from AWS. We had a budget threshold set at $500 for the month, and we were already at 60% of that limit just ten days in. I opened the Cost Explorer dashboard, expecting to see a misconfigured database instance or a runaway data transfer fee. Instead, the culprit was staring me in the face: EC2 Compute.

Specifically, three t3.xlarge instances running in our development environment. These were our self-hosted GitHub Actions runners, humming along happily. The problem? It was 8:00 AM on a Monday, and the last commit to the dev branch was Friday at 6:00 PM. For roughly 62 hours, these machines had been up around the clock, doing absolutely nothing but burning cash. The total for that idle weekend was just over $300.

We had fallen into the classic trap of convenience. To avoid the "cold start" latency of spinning up new virtual machines for every build, we had configured our runners as persistent resources. They were always on, ready to grab a job the moment a developer pushed code. While this improved developer experience slightly—saving maybe two minutes per build—the financial reality was unsustainable.

Diagnosing the Always-On Fallacy

I brought the data to our weekly DevOps sync. The team was skeptical at first. We justified the always-on approach because our build process was heavy, involving Docker compositions and integration tests that required significant RAM. We feared that if we spun up runners on-demand, the lag would break our flow.

However, the numbers didn't lie. I pulled the logs from our CI platform. Out of 720 hours in a month, our runners were actively executing jobs for less than 40 hours. That is a utilization rate of roughly 5.5%. We were essentially paying for a taxi to wait outside our apartment 24/7 just in case we decided to go to the corner store once a week.
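The utilization figure is easy to check by hand. A minimal sketch of the arithmetic, taking the 40 busy hours as an upper bound (the constant names are only illustrative):

```python
HOURS_IN_MONTH = 720  # 30 days x 24 hours, the figure used above
BUSY_HOURS = 40       # upper bound on hours spent actually running jobs


def utilization(busy_hours: float, total_hours: float) -> float:
    """Fraction of provisioned time spent doing useful work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return busy_hours / total_hours


rate = utilization(BUSY_HOURS, HOURS_IN_MONTH)
idle_hours = HOURS_IN_MONTH - BUSY_HOURS
print(f"utilization: {rate:.1%}, idle hours per runner: {idle_hours}")
```

Put the other way around, each runner sat idle for at least 680 of the 720 hours we paid for.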

We needed a shift from static infrastructure to an event-driven model. The goal was to implement a system where infrastructure exists only for the duration of the job. If a build takes 15 minutes, the runner should live for 16 minutes (15 for the build, 1 for startup/teardown).

Architecture: From Static to Ephemeral

We decided to move away from static EC2 instances managed by a cron job and toward a fully ephemeral architecture using AWS Auto Scaling Groups and a custom controller. We did not use AWS Fargate for this specific workload (our general reasoning on serverless containers is in "AWS Fargate vs. Self-Managed Kubernetes: Why We Chose Serverless Containers for Our Startup") because we needed raw Docker socket access for our containerized builds, which is currently easier to manage on standard EC2 instances.

The architecture hinged on a simple loop:

1. The CI platform queues a job.
2. A webhook or scaling policy detects the queue depth.
3. AWS Auto Scaling Group launches a fresh instance.
4. The instance registers itself as a runner.
5. The job executes.
6. Upon completion, the runner deregisters and terminates immediately.

The critical component here is step 6, the immediate termination. In a standard scaling scenario, an Auto Scaling Group usually keeps an instance alive for a cooldown period or a set minimum lifecycle. For CI, that wastes money. We needed the instance to die the second the runner process exited after the job finished.


Implementing the Lifecycle Hook

To achieve this immediate shutdown, we utilized a User Data script that runs on the instance boot. This script is the heart of the ephemeral runner logic. Instead of the runner service running in the background indefinitely, we wrapped it in a foreground process that blocks the instance from terminating until the job is done.

We set up the Auto Scaling Group with a desired capacity of 0. When a workflow triggers, a Lambda function (driven by a webhook from our CI provider) increases the desired capacity to 1.
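A minimal sketch of what such a Lambda function could look like, not our exact code. It assumes the webhook is GitHub's workflow_job event delivered through an API Gateway style payload, and that the group is called "ci-runners"; both are illustrative assumptions. Webhook signature verification is left out for brevity.

```python
import json

ASG_NAME = "ci-runners"  # hypothetical Auto Scaling Group name


def wanted_capacity(current: int, max_size: int) -> int:
    """One more instance for the newly queued job, capped at the group's MaxSize."""
    return min(current + 1, max_size)


def scale_up(asg_client, group_name: str) -> int:
    """Raise the group's desired capacity by one and return the resulting value."""
    groups = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"Auto Scaling Group {group_name!r} not found")
    group = groups[0]
    target = wanted_capacity(group["DesiredCapacity"], group["MaxSize"])
    if target != group["DesiredCapacity"]:
        asg_client.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=target,
            HonorCooldown=False,  # a queued job should not wait out a cooldown
        )
    return target


def handler(event, context):
    """Lambda entry point; assumes the webhook JSON arrives in event["body"]."""
    payload = json.loads(event.get("body") or "{}")
    # The workflow_job webhook reports action "queued" while a job waits for a runner.
    if payload.get("action") != "queued":
        return {"statusCode": 200, "body": "ignored"}

    import boto3  # imported lazily so the helpers above can be exercised without AWS

    target = scale_up(boto3.client("autoscaling"), ASG_NAME)
    return {"statusCode": 200, "body": f"desired capacity set to {target}"}
```

Passing the client into scale_up keeps the scaling decision testable with a stub, and capping at MaxSize means a burst of webhooks cannot launch more instances than the group allows.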

Here is where the specificity matters: we used t3.xlarge spot instances to reduce costs further, but we acknowledged the risk of interruption. For our test suite, if a spot instance was reclaimed, we simply let the job fail and retry on a new one.

The configuration relied heavily on infrastructure as code. Managing the state of these scaling policies and IAM roles locally on a laptop would have been a nightmare. We relied on a remote Terraform backend (see "Terraform State Files: Why Remote Backend Isn't Optional for Teams") to ensure that the whole team was working from the same infrastructure definition, preventing drift between the code repo and the actual AWS environment.

The User Data script looked something like this conceptually:

Install the runner binary.
Configure the runner with a registration token (retrieved from AWS Secrets Manager).
Start the runner in a "one-shot" mode.
Once the process exits, execute shutdown -h now.

This ensured that no matter what happened—success, failure, or timeout—the resource would deallocate itself.
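The steps above can be sketched as a small wrapper. This is an illustrative version rather than our actual User Data script: it assumes the runner binary is already installed at a hypothetical path on the AMI, that the registration token has already been fetched, and that the runner is the GitHub Actions one, whose config.sh accepts --ephemeral for one-shot mode.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

RUNNER_DIR = "/opt/actions-runner"  # hypothetical install path baked into the AMI


def github_runner_steps(repo_url: str, token: str) -> List[List[str]]:
    """Commands to register a one-shot runner and then process a single job."""
    return [
        ["./config.sh", "--unattended", "--ephemeral",
         "--url", repo_url, "--token", token],
        ["./run.sh"],  # with --ephemeral this exits after one job
    ]


def run_once(
    steps: Sequence[Sequence[str]],
    shutdown: Sequence[str] = ("shutdown", "-h", "now"),
    timeout_s: Optional[float] = None,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Run each step in order, then always power the machine off."""
    try:
        for step in steps:
            # check=True aborts at the first failing step; timeout_s bounds a hung job.
            run(list(step), check=True, cwd=RUNNER_DIR, timeout=timeout_s)
    finally:
        # Reached on success, failure, and timeout alike.
        run(list(shutdown), check=False)
```

On a real instance the script would end with something like run_once(github_runner_steps(url, token), timeout_s=3600), where url, token, and the one-hour limit are placeholders. Two caveats apply: the shutdown only terminates the instance if its instance-initiated shutdown behavior is set to terminate, and an Auto Scaling Group normally replaces an instance that disappears in order to hold its desired capacity, so the desired capacity presumably has to drop back to 0 as well.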

Security: When Ephemeral Means Safer

As a security researcher, I saw an immediate side benefit of this approach that had nothing to do with cost. Persistent runners are a security liability. If an attacker compromises a build script or injects malicious code into a pipeline, they have a persistent beachhead into your VPC. They can pivot, scrape secrets, or mine cryptocurrency as long as that runner stays online.

With ephemeral runners, the attack surface is constantly resetting. Even if a job is compromised, the attacker has a very narrow window to act, often just minutes. Furthermore, because the runners are destroyed and recreated from a golden image (AMI) every time, we ensure that no manual changes or "temporary fixes" survive from one job to the next.

However, this model requires strict IAM hygiene. The runner needs permission to pull code, push artifacts, and access databases during tests. It is tempting to assign a broad "CI-Role" with the AmazonS3FullAccess and AmazonRDSFullAccess managed policies to avoid debugging permission errors. We resisted this. We scoped the IAM role down to specific prefixes in S3 and specific DB clusters.
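To make the S3 half of that scoping concrete, here is a sketch that builds a policy document limited to one prefix. The bucket name and prefix are made up for illustration, and the database side is not covered.

```python
import json


def scoped_ci_policy(bucket: str, prefix: str) -> dict:
    """IAM policy document allowing object access under a single S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadWriteArtifactsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Sid": "ListOnlyThatPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                # ListBucket is granted on the bucket itself, narrowed by a condition.
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# "example-ci-artifacts" and "builds/dev" are hypothetical names.
print(json.dumps(scoped_ci_policy("example-ci-artifacts", "builds/dev"), indent=2))
```

Compared with a full-access policy, a compromised job holding this role can only touch objects under that one prefix.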

We also had to address container security. Since our runners spawn Docker containers, we ensured the Docker daemon was not exposed to the public internet and that our builds never ran as the root user inside the containers. The principle of least privilege applies even to disposable infrastructure. If you are unsure why this matters, read "Why Should You Never Run Docker Containers as Root User?"; it is an attack vector often overlooked in the rush to ship code.

The Financial Reality Check

Two weeks after implementing the auto-scaling runners, I checked the dashboard again. The visual difference was stark. Instead of a solid block of green usage bars representing 24/7 uptime, the graph now looked like a comb—sparse spikes of activity corresponding exactly to our commits.

The previous month, before the change, our CI infrastructure cost us roughly $450. The following month, the bill was $42.

That isn't a typo. We cut our costs by over 90%.
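For anyone who wants to check the math, a quick sketch using the two monthly totals above:

```python
before = 450.0  # approximate monthly CI cost with always-on runners, in USD
after = 42.0    # monthly CI cost with ephemeral runners, in USD

saved = before - after
reduction = saved / before
print(f"saved ${saved:.0f} per month, a {reduction:.1%} reduction")
# prints: saved $408 per month, a 90.7% reduction
```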

We did incur a latency penalty. The first job of the morning now takes about 90 seconds longer due to the time required for the AWS API to launch the instance and the runner to register. For the team, this was a non-issue. Developers realized that saving roughly $400 a month was worth waiting an extra minute or so for the first build of the day. Subsequent jobs in a queue often land on instances that are already warming up, so the penalty is amortized.

More importantly, the "billing shock" is gone. We no longer fear long weekends. We no longer hesitate to spin up resource-heavy integration tests because we know the meter stops running the moment the test passes.

CI Infrastructure Must Be Disposable

The transition to ephemeral runners taught us a valuable lesson about cloud maturity. Treating build servers as pets—naming them, keeping them alive, nursing them when they get slow—is an artifact of on-premise thinking. In the cloud, infrastructure should be treated as cattle.

The financial savings are the headline, but the operational improvement is the real story. We don't patch runners anymore; we update the AMI. We don't clear disk space; we terminate the instance. The system heals itself.

If you are staring at a cloud bill right now wondering why your development environment is costing as much as production, look at your CI pipeline. Turn off the always-on servers. Embrace the cold start. Your finance team will thank you, and your security posture will likely improve as a side effect.