DEV Community

Shamsher Khan (Shamz)


You Blocked docker.sock. Your Containers Are Still Not Safe.

I spent the last two weeks building out a full runtime escape lab — five attack scenarios, automated defense scripts, Falco rules, the works. Scenario 1 (docker.sock mounting) already has its own deep dive on DZone. Everyone knows that one.

But scenarios 3 and 4 are what actually kept me up at night. Not because they're loud and dramatic. Because they're quiet. They pass the checks your team runs. They look normal in docker inspect. And they hand an attacker a path to the host.

This post covers both. One is an audit blind spot. The other is a two-container escalation chain. Neither requires --privileged.

The Audit Blind Spot: CAP_SYS_ADMIN



Here's the thing most Docker security checklists do: they scan for Privileged: true. If the flag is false, they move on. Green checkmark. Threat neutralized.

But CAP_SYS_ADMIN alone — without --privileged — gives a container almost everything privileged mode does. Mount filesystems. Manipulate namespaces. In some kernel configurations, escape to the host entirely. And it shows up in the audit as just another capability in a list. Not as a red flag.

This is what I call the audit gap. It's the single capability that consistently slips through automated scans because scanners are tuned to look for the Privileged boolean, not for what individual capabilities can actually do.
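One way to see the gap from inside a running container is to decode the effective capability mask the kernel reports for the process. The sketch below assumes a Linux /proc filesystem and relies on CAP_SYS_ADMIN being capability number 21 in the kernel headers. The function name `has_cap_sys_admin` is made up for illustration and is not part of the lab.

```shell
# has_cap_sys_admin: succeeds if a hex capability mask has CAP_SYS_ADMIN
# (bit 21 in linux/capability.h) set.
# With no argument it reads CapEff for the current process from /proc.
# Hypothetical helper name; not one of the lab scripts.
has_cap_sys_admin() {
  mask=${1:-$(awk '/^CapEff:/ {print $2}' /proc/self/status)}
  [ $(( (0x$mask >> 21) & 1 )) -eq 1 ]
}

# Example with explicit masks (no container needed):
#   has_cap_sys_admin 00000000a82425fb && echo "SYS_ADMIN present"
#   has_cap_sys_admin 00000000a80425fb || echo "SYS_ADMIN absent"
```

The second mask is the value Docker commonly reports for a container started with its default capability set. The first is the same mask with bit 21 added, which is what --cap-add=SYS_ADMIN would be expected to produce.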

What it actually looks like

# This is what your scanner sees
docker inspect my-container | jq '.[] | {Privileged: .HostConfig.Privileged}'
# Output: { "Privileged": false }   <-- Scanner says: all clear

# This is what's actually running
docker inspect my-container | jq '.[] | .HostConfig.CapAdd'
# Output: ["SYS_ADMIN"]             <-- Scanner doesn't flag this

Why it matters in production

In my lab (Scenario 3 of Lab 09), I ran a container with only --cap-add=SYS_ADMIN. No privileged flag. No other dangerous capabilities. From inside that container, I was able to mount the host's /etc directory and read credential files directly. The container passed every Privileged: false check along the way.

This isn't theoretical. FUSE filesystems, certain monitoring agents, and some CI tooling legitimately request SYS_ADMIN. It's in production configs right now, and most teams have no idea what it enables.

The three-line audit that actually catches it

# Don't just check Privileged. Check capabilities.
docker ps -q | xargs docker inspect --format \
  '{{.Name}}: Privileged={{.HostConfig.Privileged}} CapAdd={{.HostConfig.CapAdd}}'

# Flag anything with SYS_ADMIN, SYS_PTRACE, or SYS_MODULE
# These three are the ones that cross the line from "useful" to "escape vector"

That's it. Three lines. Run it against your production containers right now. If you see SYS_ADMIN in the output, you have a conversation to have with whoever owns that container.
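The flagging step described in the comments can be automated with a small filter. This is only a sketch: it assumes the exact "Name: Privileged=... CapAdd=[...]" line format printed by the audit command above, and `flag_risky_caps` is a hypothetical helper name. It also flags ALL, which is my addition to the three capabilities named here, since --cap-add=ALL includes them.

```shell
# flag_risky_caps: reads audit lines on stdin and prints only those that
# are privileged or that add SYS_ADMIN, SYS_PTRACE, SYS_MODULE, or ALL.
# Matching is case-insensitive and also catches the CAP_-prefixed spelling.
# Hypothetical helper name; not one of the lab scripts.
flag_risky_caps() {
  grep -iE 'Privileged=true|CapAdd=\[[^]]*(SYS_ADMIN|SYS_PTRACE|SYS_MODULE|ALL)'
}

# Usage (requires a running Docker daemon):
#   docker ps -q | xargs docker inspect --format \
#     '{{.Name}}: Privileged={{.HostConfig.Privileged}} CapAdd={{.HostConfig.CapAdd}}' \
#     | flag_risky_caps
```

The function exits non-zero when nothing matches, so it can gate a CI step: a clean environment prints nothing.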

The Escalation Chain: Host Mounts + docker.sock



This one is more subtle, and it's the pattern that worries me most for real-world environments. It's not a single misconfiguration. It's two containers working together — not by design, but because an attacker can chain them.

How the chain works

Scenario 4 in the lab demonstrates this with two containers:

Container A has a read-only bind mount that exposes /etc from the host. Legitimate use case: an application that needs to read host configuration. Totally normal setup.

docker run -d --name app-container \
  -v /etc:/host-etc:ro \
  ubuntu:22.04 sleep infinity

Container B has access to docker.sock. Also common: CI tools, monitoring agents, Portainer. The socket is there for a reason.

docker run -d --name monitor \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ubuntu:22.04 sleep infinity

Neither container alone is the vulnerability. Container A can't create new containers. Container B can't read host files (it doesn't have a bind mount to /etc). But an attacker who controls Container B can use the Docker API through the socket to create a new container that mounts the same paths Container A has access to — and make it privileged.

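# From inside Container B (has docker.sock).
# Assumes a docker CLI is available there; the stock ubuntu:22.04 image
# does not ship one, so install it first or call the API over the socket.
docker run --privileged \
  -v /etc:/stolen-etc \
  -v /var/run/docker.sock:/var/run/docker.sock \
  alpine cat /stolen-etc/shadow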

The chain is: socket access → create privileged container → mount host paths → full credential access. No single container had all the permissions. The attacker assembled them.

Why this is hard to catch

Each container, in isolation, passes a standard audit. Container A has a bind mount — auditors see it, note it as "read-only, accepted." Container B has the socket — it's flagged in some scans, but it's a monitoring tool, so it gets an exception. The combination is what's dangerous, and nobody audits combinations.

What detection actually looks like

The defense script I built for this scenario generates a Falco rule that watches for the pattern specifically — a socket-mounted container spawning a new privileged container with host path mounts. Here's the core logic:

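# Field syntax follows the upstream Falco default rules; mount lookups
# are keyed by the host source path, not the path inside the container.
- rule: Escalation Chain Detected
  desc: >
    A new privileged container started with sensitive host paths
    mounted, as happens when a container with docker.sock access
    spawns one. Likely escalation chain.
  condition: >
    evt.type in (execve, execveat) and
    proc.vpid = 1 and
    container.id != host and
    container.privileged = true and
    (container.mount.dest[/etc] != "N/A" or
     container.mount.dest[/root] != "N/A" or
     container.mount.dest[/var/run*] != "N/A")
  output: >
    Escalation chain: privileged container started with host mounts
    (container=%container.name image=%container.image.repository)
  priority: CRITICAL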

This catches the pattern, not just the individual container. That's the shift in thinking that makes the difference.

The Hands-On Lab

All five scenarios — including the two I covered here — are in the open-source lab repository. Each scenario has:

  • demo.sh — runs the attack so you can see it
  • defense.sh — generates the detection artifacts (Falco rules, audit scripts)
  • validate.sh — verifies your defenses actually work
  • cleanup.sh — tears everything down cleanly

git clone https://github.com/opscart/docker-security-practical-guide
cd docker-security-practical-guide/labs/09-runtime-escape

# Run Scenario 3 (CAP_SYS_ADMIN audit gap)
cd scenario-3-sys-admin
chmod +x *.sh && ./demo.sh

# Run Scenario 4 (host mount escalation chain)
cd ../scenario-4-host-mount
chmod +x *.sh && ./demo.sh

Works on Docker Desktop (macOS/Windows) and Linux. The README has Docker Desktop-specific notes for the parts that behave differently on the VM layer.

What to actually do about it

Two things. Both take five minutes.

First, run the capability audit from above against your environment. Look for SYS_ADMIN, SYS_PTRACE, SYS_MODULE. If you find them, trace back to why they're there. Half the time, nobody remembers.

Second, audit for the escalation chain pattern: any container with docker.sock mounted in the same environment as containers with host path bind mounts. If both exist, even if they're unrelated services, the attack surface is there.
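A rough way to audit for that combination is to list every running container's mounts and check whether socket holders and host-path bind mounts coexist. The sketch below assumes one input line per container in the form "NAME TYPE:SOURCE TYPE:SOURCE ..." and host paths without spaces. `find_chain_candidates` is a hypothetical helper, not one of the lab scripts.

```shell
# find_chain_candidates: reads "NAME TYPE:SOURCE ..." lines on stdin, one
# per container, and reports when some container bind-mounts docker.sock
# and some container bind-mounts another host path.
# Returns 0 and prints both groups when the combination exists, 1 otherwise.
find_chain_candidates() {
  awk '
    {
      for (i = 2; i <= NF; i++) {
        if ($i !~ /^bind:/) continue            # skip volumes and tmpfs
        if ($i ~ /\/docker\.sock$/) socks = socks " " $1
        else hosts = hosts " " $1 "(" substr($i, 6) ")"
      }
    }
    END {
      if (socks == "" || hosts == "") exit 1
      print "docker.sock holders:" socks
      print "host bind mounts:" hosts
    }'
}

# Usage (requires a running Docker daemon):
#   docker ps -q | xargs docker inspect --format \
#     '{{.Name}}{{range .Mounts}} {{.Type}}:{{.Source}}{{end}}' \
#     | find_chain_candidates
```

Any output is a prompt for review rather than proof of compromise: it only shows that the two ingredients of the chain are present in the same environment.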

The lab repo has everything: github.com/opscart/docker-security-practical-guide
