- Initial thoughts
- 1. The right EC2 instance at the right price
- 2. Scripting the GitLab runner installation and configuration
- 3. Deploying the auto-stopping architecture with Terraform
- Further reading
Initial thoughts
In GitLab CI: The Majestic Single Server Runner, we found that a single server runner outperforms a Kubernetes cluster with equivalent node specifications up to approximately 200 simultaneously requested jobs! That is beyond the typical daily load of most software teams. Equally important, with 40 queued jobs or fewer, the single server runner is twice as fast, a scenario that covers even the busiest days for most teams.
As demonstrated in GitLab Runners: Which Topology for Fastest Job Execution?, single-server executors deliver the fastest job execution times.
This article will help you deploy this no-compromise runner on AWS at a reasonable price, thanks to multiple optimizations. Parts of it apply to any cloud, public or private.
The deployment is automated and optimized as much as possible:
- Infrastructure is provisioned with Terraform
- A spot instance is used
- EC2 is stopped at night and on weekends
- EC2 boot script (re)installs everything and registers to GitLab
- The runner is tagged with a few useful EC2 characteristics
1. The right EC2 instance at the right price
An AWS spot instance is a cost-effective option that allows you to leverage spare EC2 capacity at a discounted price. By choosing spot instances, you can significantly reduce your Amazon EC2 costs. Since our deployment is automated and downtime is not critical, opting for spot instances is an optimal choice for cost optimization.
To fully utilize the capabilities of a single server runner while keeping costs reasonable, it is essential to select an EC2 instance with a local NVMe SSD disk. These instances carry a 'd' in their name, indicating local instance storage.
When choosing an EC2 instance, the following conditions should be considered:
- The instance should have the 'd' letter to indicate NVMe local disk support.
- It should be available in our usual region.
- The CPU specifications should match our usage requirements. For Java/JavaScript CI/CD workloads, about 1 core per parallel job is a good rule of thumb; here we choose 16 vCPUs for 20 parallel jobs.
- The spot price should be reasonable (a quick way to check is sketched below).
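To compare current prices before choosing, you can query the spot price history with the AWS CLI. A minimal sketch, assuming AWS CLI v2 is configured; the region and instance types are examples to adapt:
# Latest Linux spot prices for a few NVMe-equipped candidates
aws ec2 describe-spot-price-history \
  --region us-east-1 \
  --instance-types r5d.4xlarge c5ad.4xlarge c6id.4xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --query 'SpotPriceHistory[*].[InstanceType,AvailabilityZone,SpotPrice]' \
  --output table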
For the purpose of this article, we have selected the r5d.4xlarge instance type. At the time of writing, the spot price for this instance in us-east-1 is approximately $370/month. That might seem high.
But compared to the monthly cost of a development team, this price is relatively low. We can optimize costs further by automatically stopping the EC2 instance outside of working hours using daily CloudWatch executions. Since it is a local disk instance, its state is lost every day, but we have nothing to lose except some cache, which can be warmed up with a scheduled pipeline every morning.
Let's calculate the cost: $0.5045/hour x 12 open daily hours x 21 open days per month = $127/month, bringing the cost well below the already acceptable full-time price. To put it into perspective, this represents an 85% discount compared to running the same instance full-time on-demand ($841/month).
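If you want to redo this back-of-the-envelope estimate with your own hourly rate and schedule, a one-liner is enough (the values below are the ones used above):
# spot rate ($/h) x open hours/day x open days/month
awk 'BEGIN { printf "monthly cost: $%.0f\n", 0.5045 * 12 * 21 }'
# -> monthly cost: $127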
2. Scripting the GitLab runner installation and configuration
To streamline the process of deploying the EC2 instance, we will create a script that can be used as the user_data to bootstrap the server anytime it (re)boots. This script will handle the installation of Docker, the GitLab Runner, and the configuration required to connect to the GitLab instance.
The script is designed to handle reboots and stop/start actions, which may result in the deletion of local disk data on the NVMe EC2 instance.
Key features of this updated script:
- AWS CloudWatch Integration: Automatic installation and configuration of the CloudWatch agent to send logs (/var/log/user-data.log, /var/log/syslog) and system metrics (CPU, memory, disk, network) to AWS CloudWatch for centralized monitoring
- Enhanced Logging: All script operations are logged to /var/log/user-data.log with timestamps and detailed execution traces (set -x)
- Containerd Support: In addition to Docker, the script now manages containerd with proper bind mounts, ensuring better container runtime isolation
- GitLab Runner Optimizations: Includes performance feature flags (FF_TIMESTAMPS, FF_USE_FASTZIP, ARTIFACT_COMPRESSION_LEVEL=fastest, CACHE_COMPRESSION_LEVEL=fastest) to speed up job execution
- Improved Error Handling: Validates NVMe disk presence and verifies GitLab Runner registration success with explicit error messages
- Fast Node Manager (fnm): Pre-installs fnm for efficient Node.js version management in CI/CD pipelines
- Robust Mount Architecture: Uses bind mounts from /mnt/nvme-* to standard locations (/var/lib/docker, /var/lib/containerd, /gitlab), providing better disk organization and maintainability
Make sure to modify the following variables at the start of the script according to your specific requirements:
aws-ec2-init-nvme-and-gitlab-runner.sh
#!/bin/bash
#
### Script to initialize a GitLab runner on an existing AWS EC2 instance with NVME disk(s)
#
# - script is not interactive (can be run as user_data)
# - will reboot at the end to perform NVME mounting
# - first NVME disk will be used for GitLab cache
# - last NVME disk will be used for Docker and containerd (if only one NVME, the same will be used without problem)
# - robust: on each reboot and stop/start, disks are mounted again (but data may be lost after a stop followed by a start, since NVME storage is ephemeral)
# - runner is tagged with multiple instance data (public dns, IP, instance type...)
# - works with a single spot instance
# - should work even with multiple ones in a fleet, with same user_data (not tested for now)
#
# /!\ There is no prerequisite, except these needed variables:
MAINTAINER=zenika
GITLAB_URL=https://gitlab.com/
GITLAB_TOKEN=XXXX # https://gitlab.com/groups/ZenikaIT/-/runners
RUNNER_NAME=majestic-runner-v2026
# Enable verbose logging (set -x shows all executed commands)
set -x
exec > >(tee -a /var/log/user-data.log)
exec 2>&1
echo "========================================"
echo "GitLab Runner EC2 Initialization - $(date)"
echo "Instance: $(ec2-metadata --instance-type | cut -d ' ' -f 2) / $(ec2-metadata --instance-id | cut -d ' ' -f 2)"
echo "========================================"
echo "\n=== Installing CloudWatch Agent ==="
wget -q https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
rm amazon-cloudwatch-agent.deb
# Configure CloudWatch Agent to send logs and metrics (compact JSON)
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json >/dev/null <<'CWCONFIG'
{
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{"file_path": "/var/log/user-data.log", "log_group_name": "/aws/ec2/gitlab-runner", "log_stream_name": "{instance_id}/user-data", "timestamp_format": "%Y-%m-%d %H:%M:%S"},
{"file_path": "/var/log/syslog", "log_group_name": "/aws/ec2/gitlab-runner", "log_stream_name": "{instance_id}/syslog", "timestamp_format": "%b %d %H:%M:%S"}
]
}
}
},
"metrics": {
"metrics_collected": {
"cpu": {"measurement": ["cpu_usage_idle","cpu_usage_iowait","cpu_usage_user","cpu_usage_system","cpu_usage_steal"], "metrics_collection_interval": 60, "resources": ["*"]},
"mem": {"measurement": ["mem_used_percent","mem_available","mem_total","mem_used"], "metrics_collection_interval": 60},
"swap": {"measurement": ["swap_used_percent","swap_used","swap_free"], "metrics_collection_interval": 60},
"disk": {"measurement": ["disk_used_percent","disk_free","disk_total","disk_used"], "resources": ["*"], "metrics_collection_interval": 60},
"diskio": {"measurement": ["diskio_read_bytes","diskio_write_bytes","diskio_reads","diskio_writes"], "resources": ["*"], "metrics_collection_interval": 60},
"net": {"measurement": ["bytes_sent","bytes_recv","packets_sent","packets_recv"], "resources": ["*"], "metrics_collection_interval": 60}
}
}
}
CWCONFIG
# Start CloudWatch Agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
-a fetch-config \
-m ec2 \
-s \
-c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
echo "β
CloudWatch logs: /aws/ec2/gitlab-runner"
# prepare docker (re)install
echo "\n=== Installing Docker & GitLab Runner ==="
sudo apt-get update -qq
sudo apt-get -y install apt-transport-https ca-certificates curl gnupg lsb-release sysstat
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt-get update -qq
# Install Fast Node Manager (fnm)
curl -fsSL https://fnm.vercel.app/install | bash
# install gitlab runner
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
sudo apt-get -y install gitlab-runner
echo "β
GitLab Runner: $(gitlab-runner --version | head -n1)"
# create NVME initializer script
echo "\n=== Creating NVME Initializer ==="
cat <<EOF >/home/ubuntu/nvme-initializer.sh
#!/bin/bash
#
# To be run on each fresh start, since NVME disks are ephemeral
# so first start, start after stop, but not on reboot
# inspired by https://stackoverflow.com/questions/45167717/mounting-a-nvme-disk-on-aws-ec2
#
set -x
exec >> /var/log/user-data.log 2>&1
echo "=== NVME Initializer - \$(date) ==="
lsblk -b --output=NAME,SIZE,TYPE,MOUNTPOINT
# get NVME disks bigger than 100GB (a smaller root disk may also be present, depending on instance type)
NVME_DISK_LIST=\$(lsblk -b --output=NAME,SIZE | grep "^nvme" | awk '{if(\$2>100000000000)print\$1}' | sort)
echo "Found NVME disks: \$NVME_DISK_LIST"
# there may be 1 or 2 NVME disks, then we split (or not) the mounts between GitLab cache and Docker/containerd runtime
export NVME_GITLAB=\$(echo "\$NVME_DISK_LIST" | head -n 1)
export NVME_CONTAINER=\$(echo "\$NVME_DISK_LIST" | tail -n 1)
echo "NVME_GITLAB=/dev/\$NVME_GITLAB NVME_CONTAINER=/dev/\$NVME_CONTAINER"
if [ -z "\$NVME_GITLAB" ]; then
echo "β ERROR: No NVME disk found!"
exit 1
fi
# format disks if not already formatted (mkfs.xfs refuses to overwrite an existing filesystem)
sudo mkfs -t xfs /dev/\$NVME_GITLAB || echo "Already formatted"
if [ "\$NVME_GITLAB" != "\$NVME_CONTAINER" ]; then
sudo mkfs -t xfs /dev/\$NVME_CONTAINER || echo "Already formatted"
fi
# Mount NVME disks on /mnt/nvme-*
# - If 1 disk: everything goes on the same disk (NVME_GITLAB == NVME_CONTAINER)
# - If 2 disks: gitlab cache on first, docker+containerd runtime on second
sudo mkdir -p /mnt/nvme-gitlab /mnt/nvme-runtime
sudo mount /dev/\$NVME_GITLAB /mnt/nvme-gitlab
sudo mount /dev/\$NVME_CONTAINER /mnt/nvme-runtime
# Create service directories and bind mount to standard locations
sudo mkdir -p /mnt/nvme-gitlab/gitlab-cache /gitlab
sudo mount --bind /mnt/nvme-gitlab/gitlab-cache /gitlab
sudo mkdir -p /mnt/nvme-runtime/docker /var/lib/docker
sudo mount --bind /mnt/nvme-runtime/docker /var/lib/docker
sudo mkdir -p /mnt/nvme-runtime/containerd /var/lib/containerd
sudo mount --bind /mnt/nvme-runtime/containerd /var/lib/containerd
# reinstall Docker and containerd (their data directories may have been wiped)
sudo apt-get -y reinstall docker-ce docker-ce-cli containerd.io docker-compose-plugin
echo "\n=== Mounted volumes ==="
df -h | grep -E '(Filesystem|nvme|gitlab|docker|containerd)'
echo "β
NVME initialization successful"
EOF
# set NVME initializer script as startup script
sudo tee /etc/systemd/system/nvme-initializer.service >/dev/null <<EOS
[Unit]
Description=NVME Initializer
After=network.target
[Service]
ExecStart=/home/ubuntu/nvme-initializer.sh
Type=oneshot
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOS
sudo chmod 744 /home/ubuntu/nvme-initializer.sh
sudo chmod 664 /etc/systemd/system/nvme-initializer.service
sudo systemctl daemon-reload
sudo systemctl enable nvme-initializer.service
sudo systemctl start nvme-initializer.service
sudo systemctl status nvme-initializer.service
# tail -f /var/log/syslog
### Runner registration comes last, so its appearance on the GitLab side confirms the whole process succeeded
echo -e "\n=== Registering GitLab Runner ==="
echo "gitlab-runner ALL=(ALL) NOPASSWD:ALL" | sudo tee -a /etc/sudoers
echo "Runner: $RUNNER_NAME"
# FF_NETWORK_PER_BUILD to fix a DinD error, from https://forum.gitlab.com/t/since-docker-update-docker-ce-5-29-0-0-1-debian-12-bookworm-cicd-dind-errors-fatal-no-host-or-port-found/131377/4
sudo gitlab-runner register --name "$RUNNER_NAME" --url "$GITLAB_URL" --token "$GITLAB_TOKEN" --executor "docker" --docker-image "ubuntu:22.04" --docker-volumes "/gitlab/:/host/" --custom_build_dir-enabled=true --docker-privileged --docker-pull-policy "if-not-present" --env "FF_NETWORK_PER_BUILD=true" --non-interactive --env "FF_TIMESTAMPS=true" --env "FF_USE_FASTZIP=true" --env "ARTIFACT_COMPRESSION_LEVEL=fastest" --env "CACHE_COMPRESSION_LEVEL=fastest"
if [ $? -eq 0 ]; then
echo "β
GitLab Runner registered successfully!"
else
echo "β GitLab Runner registration FAILED!"
exit 1
fi
# bind docker socket (to avoid docker-in-docker service)
# sudo gitlab-runner register --name "$RUNNER_NAME" --url "$GITLAB_URL" --token "$GITLAB_TOKEN" --executor "docker" --docker-image "ubuntu:22.04" --docker-volumes "/var/run/docker.sock:/var/run/docker.sock" --docker-volumes "/gitlab/custom-cache/:/host/" --custom_build_dir-enabled=true --docker-privileged --docker-pull-policy "if-not-present" --non-interactive
# to unregister :
# sudo gitlab-runner unregister --name "$(curl --silent http://169.254.169.254/latest/meta-data/public-hostname)"
# replace "concurrent = 1" with "concurrent = 20"
sudo sed -i '/^concurrent /s/=.*$/= 20/' /etc/gitlab-runner/config.toml
# replace "check_interval = 0" with "check_interval = 2"
sudo sed -i '/^check_interval /s/=.*$/= 2/' /etc/gitlab-runner/config.toml
### from https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4036#note_1083142570
# replace "/cache" technical volume with one mounted on disk to avoid cache failure when several jobs in parallel
# this could also have been a mounted Docker volume: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/1151#note_1019634818 but that is not faster with 2 different NVMe disks (gitlab + docker)
sudo sed -i 's#"/cache"#"/gitlab/cache:/cache"#' /etc/gitlab-runner/config.toml
sudo systemctl restart gitlab-runner
sudo systemctl status gitlab-runner --no-pager
echo "\n========================================"
echo "π GitLab Runner initialization COMPLETED!"
echo "========================================"
3. Deploying the auto-stopping architecture with Terraform
To deploy the architecture quickly, we use Terraform: the deployment process is automated, and the infrastructure is up and running in minutes.
Before we proceed, please ensure that you have an existing VPC created as a prerequisite. You can refer to the examples provided in the official GitHub repo for guidance on creating the VPC.
Key improvements in this updated Terraform configuration:
- EC2 Fleet instead of single spot instance: Uses aws_ec2_fleet with multiple instance types and availability zones for maximum availability and cost optimization
- CloudWatch integration: Creates a dedicated log group (/aws/ec2/gitlab-runner) that works with the CloudWatch agent installed by the bootstrap script
- IAM instance profile: Allows the EC2 instance to send logs and metrics to CloudWatch without hardcoded credentials
- Launch template architecture: Separates instance configuration from fleet management, making updates easier
- Multi-instance-type strategy: Tries multiple NVMe-equipped instance types (c5ad.4xlarge, c6id.4xlarge, g4ad.4xlarge, c5d.4xlarge) across 3 AZs for better spot availability
- Pure Terraform scheduler: Replaces an external module with an inline Python Lambda function for stop/start scheduling, reducing dependencies and improving maintainability
- Cost optimization: Uses the lowestPrice allocation strategy to always select the cheapest available spot instance
Here is the gitlab-runner.tf file that contains the Terraform configuration:
################################################################################
# GitLab Runner EC2 Fleet (multi-AZ, multi-instance-type for better availability)
################################################################################
# CloudWatch Log Group for runner logs
resource "aws_cloudwatch_log_group" "runner_logs" {
name = "/aws/ec2/gitlab-runner"
retention_in_days = 7
}
resource "aws_security_group" "in-ssh-out-all" {
name = "in-ssh-out-all"
vpc_id = module.vpc.vpc_id
ingress {
cidr_blocks = [
"0.0.0.0/0"
]
from_port = 22
to_port = 22
protocol = "tcp"
} // Terraform removes the default rule
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Launch template with common configuration for all instances
resource "aws_launch_template" "gitlab-runner" {
name_prefix = "gitlab-runner-"
image_id = "ami-03446a3af42c5e74e" # Ubuntu 24.04 LTS amd64, build on 2025-12-12. From https://cloud-images.ubuntu.com/locator/ec2/
key_name = "my-key"
user_data = filebase64("aws-ec2-init-nvme-and-gitlab-runner.sh")
iam_instance_profile {
name = aws_iam_instance_profile.runner_instance.name
}
network_interfaces {
associate_public_ip_address = true
security_groups = [aws_security_group.in-ssh-out-all.id]
delete_on_termination = true
}
tag_specifications {
resource_type = "instance"
tags = merge(
local.tags,
{
Name = "steroid-runner"
Scheduled = "working-hours"
}
)
}
tag_specifications {
resource_type = "volume"
tags = merge(
local.tags,
{
Name = "steroid-runner-volume"
}
)
}
}
# EC2 Fleet with multiple instance types and AZs for maximum availability
resource "aws_ec2_fleet" "gitlab-runner" {
target_capacity_specification {
default_target_capacity_type = "spot"
total_target_capacity = 1 # Only 1 instance needed
spot_target_capacity = 1
}
# Try all instance types across all AZs (cheapest first)
# Priority 01/2026 x86_64 (spot average €/month, % discount vs on-demand):
# c5ad.4xlarge: €249/mo avg (-48%) < c6id.4xlarge: €266/mo avg (-52%) < g4ad.4xlarge: €269/mo avg (-54%) < c5d.4xlarge: €288/mo avg (-46%)
launch_template_config {
launch_template_specification {
launch_template_id = aws_launch_template.gitlab-runner.id
version = "$Latest"
}
# c5ad.4xlarge: €249/mo avg spot (32GB RAM, 600GB NVMe SSD, -48% vs on-demand €478)
override {
instance_type = "c5ad.4xlarge"
subnet_id = module.vpc.public_subnets[0]
priority = 1.0
}
override {
instance_type = "c5ad.4xlarge"
subnet_id = module.vpc.public_subnets[1]
priority = 1.1
}
override {
instance_type = "c5ad.4xlarge"
subnet_id = module.vpc.public_subnets[2]
priority = 1.2
}
# c6id.4xlarge: €266/mo avg spot (32GB RAM, 950GB NVMe SSD, -52% vs on-demand €558)
override {
instance_type = "c6id.4xlarge"
subnet_id = module.vpc.public_subnets[0]
priority = 1.3
}
override {
instance_type = "c6id.4xlarge"
subnet_id = module.vpc.public_subnets[1]
priority = 1.4
}
override {
instance_type = "c6id.4xlarge"
subnet_id = module.vpc.public_subnets[2]
priority = 1.5
}
# g4ad.4xlarge: €269/mo avg spot (64GB RAM, 600GB NVMe SSD, -54% vs on-demand €590)
override {
instance_type = "g4ad.4xlarge"
subnet_id = module.vpc.public_subnets[0]
priority = 1.6
}
override {
instance_type = "g4ad.4xlarge"
subnet_id = module.vpc.public_subnets[1]
priority = 1.7
}
override {
instance_type = "g4ad.4xlarge"
subnet_id = module.vpc.public_subnets[2]
priority = 1.8
}
# c5d.4xlarge: €288/mo avg spot (32GB RAM, 400GB NVMe SSD, -46% vs on-demand €532)
override {
instance_type = "c5d.4xlarge"
subnet_id = module.vpc.public_subnets[0]
priority = 1.9
}
override {
instance_type = "c5d.4xlarge"
subnet_id = module.vpc.public_subnets[1]
priority = 2.0
}
override {
instance_type = "c5d.4xlarge"
subnet_id = module.vpc.public_subnets[2]
priority = 2.1
}
}
spot_options {
allocation_strategy = "lowestPrice" # Strict price priority (respects override order)
instance_interruption_behavior = "terminate"
instance_pools_to_use_count = 1 # Only use the cheapest pool at a time
}
terminate_instances = true
terminate_instances_with_expiration = false
valid_until = "2030-01-01T00:00:00Z"
replace_unhealthy_instances = true
type = "maintain" # Maintain target capacity
tags = merge(
local.tags,
{
Name = "steroid-runner-fleet"
Scheduled = "working-hours"
}
)
}
################################################################################
# Stop/Start scheduler with pure Terraform (EventBridge + Lambda)
################################################################################
# Lambda function to stop/start the fleet by modifying its capacity
resource "aws_lambda_function" "scheduler" {
function_name = "runner-scheduler"
role = aws_iam_role.scheduler_lambda.arn
handler = "index.handler"
runtime = "python3.12"
timeout = 60
filename = data.archive_file.scheduler_lambda.output_path
source_code_hash = data.archive_file.scheduler_lambda.output_base64sha256
environment {
variables = {
FLEET_ID = aws_ec2_fleet.gitlab-runner.id
}
}
}
# Create Lambda deployment package
data "archive_file" "scheduler_lambda" {
type = "zip"
output_path = "${path.module}/.terraform/scheduler-lambda.zip"
source {
content = <<-EOF
import boto3
import os

ec2 = boto3.client('ec2')

def handler(event, context):
    action = event.get('action', 'stop')
    fleet_id = os.environ.get('FLEET_ID')
    print(f"Fleet ID: {fleet_id}")
    print(f"Action: {action}")
    if not fleet_id:
        return {'statusCode': 400, 'body': 'FLEET_ID environment variable not set'}
    # Get current fleet status
    try:
        fleet_response = ec2.describe_fleets(FleetIds=[fleet_id])
        if not fleet_response['Fleets']:
            return {'statusCode': 404, 'body': f'Fleet {fleet_id} not found'}
        fleet = fleet_response['Fleets'][0]
        current_target = fleet['TargetCapacitySpecification']['TotalTargetCapacity']
        print(f"Current target capacity: {current_target}")
    except Exception as e:
        print(f"Error describing fleet: {e}")
        return {'statusCode': 500, 'body': f'Error: {str(e)}'}
    # Modify fleet capacity based on action
    try:
        if action == 'stop':
            # Set capacity to 0 to terminate instances
            print("Setting fleet capacity to 0")
            ec2.modify_fleet(
                FleetId=fleet_id,
                TargetCapacitySpecification={'TotalTargetCapacity': 0}
            )
            return {'statusCode': 200, 'body': 'Fleet capacity set to 0 (instances will terminate)'}
        elif action == 'start':
            # Set capacity to 1 to launch instance
            print("Setting fleet capacity to 1")
            ec2.modify_fleet(
                FleetId=fleet_id,
                TargetCapacitySpecification={'TotalTargetCapacity': 1}
            )
            return {'statusCode': 200, 'body': 'Fleet capacity set to 1 (instance will launch)'}
        else:
            return {'statusCode': 400, 'body': f'Unknown action: {action}'}
    except Exception as e:
        print(f"Error modifying fleet: {e}")
        return {'statusCode': 500, 'body': f'Error: {str(e)}'}
EOF
filename = "index.py"
}
}
# CloudWatch Log Group for Lambda
resource "aws_cloudwatch_log_group" "scheduler_lambda" {
name = "/aws/lambda/${aws_lambda_function.scheduler.function_name}"
retention_in_days = 7
}
# EventBridge rule to stop runner nightly
resource "aws_cloudwatch_event_rule" "stop_runner" {
name = "stop-runner-nightly"
description = "Stop runner at 18:00 UTC every day"
schedule_expression = "cron(0 18 ? * * *)"
}
resource "aws_cloudwatch_event_target" "stop_runner" {
rule = aws_cloudwatch_event_rule.stop_runner.name
target_id = "StopRunnerLambda"
arn = aws_lambda_function.scheduler.arn
input = jsonencode({
action = "stop"
})
}
resource "aws_lambda_permission" "allow_eventbridge_stop" {
statement_id = "AllowExecutionFromEventBridgeStop"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.scheduler.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.stop_runner.arn
}
# EventBridge rule to start runner daily on working days
resource "aws_cloudwatch_event_rule" "start_runner" {
name = "start-runner-daily"
description = "Start runner at 06:00 UTC Monday-Friday"
schedule_expression = "cron(0 6 ? * MON-FRI *)"
}
resource "aws_cloudwatch_event_target" "start_runner" {
rule = aws_cloudwatch_event_rule.start_runner.name
target_id = "StartRunnerLambda"
arn = aws_lambda_function.scheduler.arn
input = jsonencode({
action = "start"
})
}
resource "aws_lambda_permission" "allow_eventbridge_start" {
statement_id = "AllowExecutionFromEventBridgeStart"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.scheduler.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.start_runner.arn
}
By default, the runner starts at 06:00 UTC on weekdays (Monday-Friday) and stops at 18:00 UTC every day. Adjust the start_schedule and stop_schedule cron variables (UTC) according to your requirements.
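To validate the scheduler without waiting for the cron, you can invoke the Lambda by hand. A sketch with AWS CLI v2 (the raw payload flag is required there); the function name comes from the Terraform above:
# Force an immediate stop (sets fleet capacity to 0)
aws lambda invoke \
  --function-name runner-scheduler \
  --cli-binary-format raw-in-base64-out \
  --payload '{"action": "stop"}' \
  response.json && cat response.json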
Here is the iam.tf file that defines the IAM roles and policies:
################################################################################
# IAM Roles and Policies for GitLab Runner Infrastructure
################################################################################
# Attach AWS managed policy for CloudWatch Agent metrics
resource "aws_iam_role_policy_attachment" "runner_cloudwatch_agent" {
role = aws_iam_role.runner_instance.name
policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}
# IAM role for EC2 instances to send logs to CloudWatch
resource "aws_iam_role" "runner_instance" {
name = "gitlab-runner-instance-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
})
}
# IAM policy for CloudWatch Logs
resource "aws_iam_role_policy" "runner_cloudwatch_logs" {
name = "gitlab-runner-cloudwatch-logs"
role = aws_iam_role.runner_instance.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogStreams"
]
Resource = "arn:aws:logs:*:*:log-group:/aws/ec2/gitlab-runner:*"
},
{
Effect = "Allow"
Action = [
"ec2:DescribeTags"
]
Resource = "*"
}
]
})
}
# Instance profile to attach IAM role to EC2
resource "aws_iam_instance_profile" "runner_instance" {
name = "gitlab-runner-instance-profile"
role = aws_iam_role.runner_instance.name
}
################################################################################
# IAM for Lambda Scheduler
################################################################################
# IAM role for Lambda execution
resource "aws_iam_role" "scheduler_lambda" {
name = "runner-scheduler-lambda-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}]
})
}
# IAM policy for Lambda to manage EC2 instances
resource "aws_iam_role_policy" "scheduler_lambda" {
name = "runner-scheduler-lambda-policy"
role = aws_iam_role.scheduler_lambda.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "arn:aws:logs:*:*:*"
},
{
Effect = "Allow"
Action = [
"ec2:DescribeFleets",
"ec2:ModifyFleet"
]
Resource = "*"
}
]
})
}
And the variables.tf file to customize your deployment:
variable "region" {
description = "Cluster region"
default = "eu-west-1" # Ireland - adjust to your preferred region
}
variable "local_aws_profile" {
description = "local AWS profile used for provisioning"
default = "zenika"
}
variable "client" {
description = "Client"
default = "gitlab-runner"
}
variable "stop_schedule" {
description = "Cron expression to stop the runner (UTC timezone)"
type = string
default = "cron(0 18 ? * MON-SUN *)" # 19h-20h Paris time
}
variable "start_schedule" {
description = "Cron expression to start the runner (UTC timezone)"
type = string
default = "cron(0 06 ? * MON-FRI *)" # 7h-8h Paris time, weekdays only
}
Once you have created and adapted the configuration, follow these steps:
- Run terraform init to initialize the Terraform configuration.
- Run terraform apply to apply the configuration and deploy the infrastructure.
With these commands, Terraform will handle the deployment process, and your autonomous architecture will be up and running in no time.
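For reference, a full run from a workstation might look like this; the profile name and variable value are examples to adapt:
export AWS_PROFILE=zenika          # profile defined in variables.tf
terraform init                     # download providers and modules
terraform plan -var 'region=eu-west-1' -out runner.tfplan
terraform apply runner.tfplan      # deploy the runner infrastructure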
Illustrations generated locally by DiffusionBee using FLUX.1-schnell model
Further reading
Git / GitLab
- GitLab Runners: Which Topology for Fastest Job Execution?
- Efficient Git Workflow for Web Apps: Advancing Progressively from Scratch to Thriving
- Forget GitKraken, Here are the Only Git Commands You Need
- A Python Script Displaying Latest Pipelines in a Group's Projects
- A Python Script Calculating DORA Metrics
- Deploy a Majestic Single Server Runner on AWS
- The Majestic Single Server Runner
- YAML Modifications: Tackling the Feedback Loop Problem
- 15+ Tips for Faster Pipelines
- 10+ Best Practices to Avoid Widespread Anti-Patterns
- Pages per Branch: The No-Compromise Hack to Serve Preview Pages
- Jobs Attributes Sorter: A Python Script for Consistent YAML Files
- Runners Topologies: Pros and Cons
Kubernetes
- A Convenient Variable Substitution Mechanism for Kustomize
- Why Managed Kubernetes is a Viable Solution for Even Modest but Actively Developed Applications
- From Your Docker-Compose File to a Cluster with Kompose
- A Pragmatic Kubectl Aliases Collection
- Web Application on Kubernetes: A Tutorial to Observability with the Elastic Stack
- NGINX Ingress Controller: 10+ Complementary Configurations for Web Applications
- Awesome Maintained Links You Will Keep Using Next Year
- Managed Kubernetes: Our Dev is on AWS, Our Prod is on OVHCloud
- How to Deploy a Cost-Efficient AWS/EKS Cluster Using Terraform
- How to Deploy a Secured OVHCloud Managed Cluster Using Terraform
- FinOps EKS: 10 Tips to Reduce the Bill up to 90% on AWS Managed Clusters
Miscellaneous
- Every Developer Should Review Code β Not Just Seniors
- Future-Proof Tech Blogging: Understanding AI's Core Traits
This article was enhanced with the assistance of an AI language model to ensure clarity and accuracy in the content, as English is not my native language.