DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

# From 400 Alerts/Night to 8: The SRE Playbook That Saved My Team’s Sanity

3 min read
Lessons in Testing, Performance, and Legacy Systems from /dev/mtl 2025
7 min read
A Complete Production-Ready Checklist for Smooth, Safe Deployments
1 min read
Utility Sector Outage Prep with Load Tests
8 min read
Rightsizing Kubernetes Requests with the In-Place Vertical Pod Autoscaler
2 comments
3 min read
AWS Security Series: AWS Access Key is Compromised. Now What? An Incident Response Playbook.
3 min read
Bash Scripting for Non-Coders
37 min read
What is performance engineering: A Gatling take
8 min read
A practical guide to observability TCO and cost reduction
6 comments
13 min read
The Lie of the Global Average: Why Taming Complex SLIs Requires Bucketing
6 min read
How AI-Powered Observability Actually Changes Life For CIOs
5 min read
Reverse Proxy in Docker with Nginx and Automatic SSL
7 min read
The Hidden Currency of Tech Leadership: The Resilience Loop
1 min read
Building an Air-gapped Hardened Kubernetes Cluster with Kubespray
3 min read
End-to-End DevSecOps Project (Movies Finder)
2 min read
AWS Multi-Account Guardrails: A Complete Blueprint for Secure, Automated Cloud Governance
9 min read
What Engineers Can Learn From the Cloudflare Outage (November 2025)
4 min read
EKS Standard vs. EKS Auto Mode: The Evolutionary Leap in Kubernetes Operations
8 comments
6 min read
Rightsizing Kubernetes Requests with the In-Place Vertical Pod Autoscaler
6 comments
3 min read
Vendor Tools & Reliability — Lessons from the 2025 Cloud Outages
3 min read
USRE: Unifying DevOps, SRE, Security & Compliance for the Next Generation of SaaS
7 min read
How to Cut AWS Costs and Maintain Reliability Without a FinOps Team
3 min read
The Hidden Failure Pattern Behind the AWS, Azure and Cloudflare Outages of 2025
3 min read
Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced
4 min read
Map a Kubernetes cluster with one command
1 min read