“Those who forget history are doomed to repeat it — usually in production.”
This document traces the evolution of containers, focusing not on hype, but on what actually broke, why it broke, and what attackers really abused.
- Multi-user, time-sharing systems
- Single shared kernel
- Processes trusted each other far more than they should have
Unix assumed cooperation, not hostility. As soon as multi-user systems appeared, it became clear that:
- One bad process could snoop, starve, or interfere with others
- Process separation ≠ security boundary
- Treating process separation as isolation
  → If you can run code, you can usually hurt neighbors.
- Running untrusted workloads on shared hosts without MAC/LSM
- World-writable directories enabling persistence and tampering
There are no famous CVEs here — because the threat model itself was flawed.
Security failure was architectural, not patch-level.
Takeaway:
If your security model assumes “well-behaved users,” attackers will politely disagree.
- Changed the apparent root directory for a process
- Provided filesystem isolation only
- No process isolation
- No network isolation
- No privilege reduction
chroot was never meant to be a sandbox — it was a convenience feature.
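A minimal Go sketch of what the call actually does, with a made-up jail path: it swaps the filesystem root and nothing else.

```go
package main

import (
	"log"
	"os"
	"syscall"
)

func main() {
	// Requires CAP_SYS_CHROOT, i.e. typically root.
	if err := syscall.Chroot("/srv/jail"); err != nil { // hypothetical jail directory
		log.Fatalf("chroot: %v", err)
	}
	// Classic mistake: without this chdir, the old root stays reachable.
	if err := os.Chdir("/"); err != nil {
		log.Fatalf("chdir: %v", err)
	}
	// Same UID, same kernel, same network, same capabilities as before the call.
	log.Printf("uid=%d: filesystem view changed, privileges did not", os.Getuid())
}
```

Everything a privileged process could do before the call, it can still do after it.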
- Using chroot as the only sandbox
- Leaving /proc, /dev, or sensitive mounts accessible
- Running privileged processes inside chroot
Root inside a poorly configured chroot often equals root outside it.
Breakouts are almost always misconfiguration-driven, not chroot bugs.
Takeaway:
chroot answers “where am I?”, not “what am I allowed to do?”
- Separate kernels
- Strong hardware-backed isolation
- Clear security boundary
- Heavy
- Slow to boot
- Resource-hungry
- Unpatched hypervisors or guest tools
- Over-permissive device passthrough
- Exposed management planes (vCenter, ESXi APIs)
- CVE-2015-3456 (VENOM) – QEMU floppy controller escape class
- CVE-2017-4902 – VMware SVGA escape-related vulnerability
VMs fail rarely, but when they do, the blast radius is massive.
Takeaway:
VMs trade speed for isolation — containers do the opposite.
Containers are not magic. They are carefully composed kernel features.
Namespaces isolate views of system resources.
- PID Namespace
  - Each container sees its own PID tree
  - PID 1 semantics matter (signal handling, zombie reaping)
- Mount Namespace
  - Independent mount tables
  - HostPath mounts can silently destroy isolation
- Network Namespace
  - Separate interfaces, routes, firewall rules
  - Misconfigurations enable lateral movement
- IPC Namespace
  - Isolates shared memory and message queues
- UTS Namespace
  - Hostname isolation (mostly cosmetic)
- User Namespace
  - Maps container root to unprivileged host UID
  - One of the strongest hardening features
  - Also one of the most underused
- Cgroup Namespace
  - Mostly reduces information leakage
Security truth:
Namespaces reduce visibility — they do not enforce trust boundaries.
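A minimal sketch of how a runtime asks for namespaces, assuming a Linux host with unprivileged user namespaces enabled: clone flags plus a UID/GID mapping, which is the container-root-to-unprivileged-UID trick from the user namespace bullet above.

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// The child sees itself as PID 1 and UID 0, yet holds no host privileges.
	cmd := exec.Command("/bin/sh", "-c", "echo pid=$$ uid=$(id -u)")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// New user, PID, UTS, and mount namespaces for the child.
		Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWPID |
			syscall.CLONE_NEWUTS | syscall.CLONE_NEWNS,
		// Map "root" inside the namespace to the current unprivileged user outside.
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}},
	}
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```

Nothing here grants the child new power on the host; it only narrows what the child can see, which is the point of the visibility-not-trust line above.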
Cgroups control resource consumption:
- CPU
- Memory
- PIDs
- I/O
- Prevents denial-of-service
- Limits blast radius
- v1: fragmented, confusing
- v2: unified, predictable
Security role:
Availability protection, not confidentiality or integrity.
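A minimal sketch against the cgroup v2 file interface, assuming a unified hierarchy at /sys/fs/cgroup, permission to create a child group, and a made-up group name. Everything here is an availability limit; nothing about it hides or protects data.

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	cg := "/sys/fs/cgroup/demo" // hypothetical child group
	if err := os.MkdirAll(cg, 0o755); err != nil {
		log.Fatal(err)
	}
	// Cap memory at 128 MiB and the group at 64 PIDs: a DoS ceiling, not a secrecy boundary.
	limits := map[string]string{
		"memory.max": "134217728",
		"pids.max":   "64",
	}
	for file, value := range limits {
		if err := os.WriteFile(filepath.Join(cg, file), []byte(value), 0o644); err != nil {
			log.Fatal(err)
		}
	}
	// Writing "0" to cgroup.procs moves the writing process itself into the group.
	if err := os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte("0"), 0o644); err != nil {
		log.Fatal(err)
	}
}
```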
Capabilities split root into fine-grained powers:
- CAP_SYS_ADMIN – the “god capability”
- CAP_NET_ADMIN
- CAP_SYS_PTRACE
- CAP_DAC_OVERRIDE
A “privileged container” is basically root with better marketing.
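One way to see what a workload really holds: read the effective capability mask from /proc/self/status. A sketch; inside a default container the mask is a short list, with --privileged it is effectively everything.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if !strings.HasPrefix(s.Text(), "CapEff:") {
			continue
		}
		hexMask := strings.TrimSpace(strings.TrimPrefix(s.Text(), "CapEff:"))
		mask, err := strconv.ParseUint(hexMask, 16, 64)
		if err != nil {
			log.Fatal(err)
		}
		// CAP_SYS_ADMIN is capability number 21; holding it undoes most other restrictions.
		fmt.Printf("CapEff=%s CAP_SYS_ADMIN=%v\n", hexMask, mask&(1<<21) != 0)
	}
}
```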
- Seccomp filters syscalls per workload
- Dramatically reduces exploitability
- Requires application awareness
Security trade-off:
Tighter profiles = safer, but risk of breaking apps.
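A deliberately blunt sketch of that trade-off using seccomp strict mode (the prctl constants are copied from the kernel headers). Strict mode allows only read, write, _exit and sigreturn, far less than the Go runtime needs, so the process below is killed almost immediately: the "tight profile breaks the app" failure in miniature. Real runtimes use filter mode with a per-workload BPF allow-list instead.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

const (
	prSetSeccomp      = 22 // PR_SET_SECCOMP from <linux/prctl.h>
	seccompModeStrict = 1  // SECCOMP_MODE_STRICT from <linux/seccomp.h>
)

func main() {
	// Seccomp applies per thread; pin this goroutine to its OS thread.
	runtime.LockOSThread()
	fmt.Println("before seccomp: any syscall allowed")
	if err := syscall.Prctl(prSetSeccomp, seccompModeStrict, 0, 0, 0); err != nil {
		panic(err)
	}
	// The next syscall outside {read, write, _exit, sigreturn} gets the process SIGKILLed,
	// including the runtime's own housekeeping and a normal exit.
	fmt.Println("after seccomp: almost nothing is allowed, not even a clean exit")
}
```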
- SELinux/AppArmor (LSMs) provide mandatory access control
- Enforced even for root
- Only as good as the policy
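A quick check of whether any MAC policy confines the current process, sketched via the generic LSM attribute file (AppArmor and SELinux both surface their label there). "unconfined" means root is still just root.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// The process's MAC label: an SELinux context or an AppArmor profile,
	// depending on which LSM is enforcing. Empty or "unconfined" means
	// the policy is not restricting this process at all.
	b, err := os.ReadFile("/proc/self/attr/current")
	if err != nil {
		fmt.Println("no LSM label available:", err)
		return
	}
	fmt.Println("LSM label:", strings.TrimRight(string(b), "\x00\n"))
}
```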
- OverlayFS enables image layering
- Bind mounts enable config/secrets injection
Danger zone:
- Mounting /var/run/docker.sock
- Mounting /proc, /sys, or host root paths
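Why the socket mount tops the danger zone: anyone who can open /var/run/docker.sock can drive the Docker Engine API, and the Engine runs as root on the host. A minimal standard-library sketch that lists every container on the node from "inside" a container:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
)

func main() {
	// Speak HTTP over the mounted Unix socket; the hostname in the URL is ignored.
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", "/var/run/docker.sock")
			},
		},
	}
	// List every container on the host, running or not.
	resp, err := client.Get("http://docker/containers/json?all=1")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```

Creating a new privileged container through the same socket is one POST away, so this mount is effectively root on the node.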
- No user namespaces
- HostPID / HostNetwork / HostIPC
- Privileged containers
- No seccomp / no LSM
- Excessive capabilities
- CVE-2016-5195 (Dirty COW)
- CVE-2022-0847 (Dirty Pipe)
Takeaway:
Containers share a kernel. Kernel bugs matter a lot.
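Easy to verify: the sketch below prints the same kernel version on the host and inside any container on that host, because there is only one kernel to report.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// /proc/version comes from the one kernel every container shares.
	// Run it on the host and in a container: the output is identical,
	// so a kernel bug like Dirty COW or Dirty Pipe is reachable from both.
	v, err := os.ReadFile("/proc/version")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(v))
}
```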
- First “real” containers
- System-container model (OS-like)
- Low-level, complex
- Treating LXC as VM-equivalent
- Weak UID/GID mappings
- Broad privileges
Reality:
Most serious issues come from the kernel or runtime, not LXC itself.
- Simple CLI
- Image distribution
- Immutable artifacts
- Privileged containers
- Mounting docker.sock
- Running as root
- Mutable tags (latest)
- Untrusted registries
- CVE-2019-14271 – docker cp breakout class
- CVE-2019-5736 – runc overwrite escape
Takeaway:
Docker made containers easy — and made insecure defaults popular.
- Image spec
- Runtime spec
- Behavior standardization, not security guarantees
- runc – executes containers (high-risk)
- containerd – lifecycle & images
- CRI-O – Kubernetes-focused
- Exposed runtime sockets
- Outdated runc/containerd
- No image signing or provenance (digest pinning sketched below)
- CVE-2020-15257 – containerd-shim escape class
- CVE-2024-21626 – runc FD leak breakout
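One small piece of the missing provenance story, sketched with a made-up artifact name and a placeholder digest: pin what you deploy to a content hash instead of a mutable tag. Real pipelines add signature verification (e.g. Sigstore) on top.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	// Hypothetical inputs: an exported image tarball and the digest recorded at
	// build time. A tag like :latest can silently change; a digest cannot.
	const expected = "0000000000000000000000000000000000000000000000000000000000000000"
	f, err := os.Open("app-image.tar") // hypothetical artifact
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}
	got := hex.EncodeToString(h.Sum(nil))
	if got != expected {
		log.Fatalf("digest mismatch: got %s", got)
	}
	fmt.Println("digest verified:", got)
}
```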
- Scheduling
- Self-healing
- Declarative control
Security moves from host-level to API-driven cluster control.
- RBAC
- ServiceAccount tokens
- Kubelet exposure
- etcd protection
- East-west traffic
- Cluster-admin everywhere
- Auto-mounted tokens
- No NetworkPolicy
- Privileged pods + hostPath
- Insecure etcd
- CVE-2018-1002105 – API server privilege escalation
- CVE-2021-25741 – kubelet subPath host access
Truth:
Most Kubernetes breaches are API abuse, not container escapes.
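What that API abuse looks like from inside a pod, sketched with only the standard in-cluster defaults (the auto-mounted token path and kubernetes.default.svc): no escape, just a credential and an HTTP client. What it can read is decided entirely by RBAC.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Auto-mounted into every pod unless automountServiceAccountToken is disabled.
	token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		log.Fatal(err)
	}
	// For brevity this sketch skips CA verification; the real CA sits next to the token as ca.crt.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	req, err := http.NewRequest("GET", "https://kubernetes.default.svc/api/v1/namespaces/default/pods", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+string(token))
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```

If cluster-admin is bound too broadly, the same few lines read Secrets instead of pod lists.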
- Containers are processes
- Isolation is conditional
- Defaults are unsafe
- Perimeter → workload
- Static → ephemeral
- Single exploit → chained abuse
- Build-time: minimal images, signing, SBOMs
- Deploy-time: non-root, seccomp, admission policies (sketched after this list)
- Runtime: syscall, file, network, process monitoring
- Platform: node + control plane hardening
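The deploy-time admission layer, sketched as a simplified policy check rather than a real webhook (the PodSpec struct here is a stand-in, not the Kubernetes API type): reject the risky defaults this document keeps flagging before they ever reach a node.

```go
package main

import "fmt"

// A deliberately simplified model of a pod spec; real admission controllers
// (validating webhooks, Pod Security Admission) see the full Kubernetes object.
type PodSpec struct {
	Privileged  bool
	RunAsRoot   bool
	HostPID     bool
	HostPathMnt bool
}

// admit encodes the deploy-time rules from the list above.
func admit(p PodSpec) error {
	switch {
	case p.Privileged:
		return fmt.Errorf("privileged containers are not allowed")
	case p.RunAsRoot:
		return fmt.Errorf("containers must run as non-root")
	case p.HostPID:
		return fmt.Errorf("hostPID shares the host's process namespace")
	case p.HostPathMnt:
		return fmt.Errorf("hostPath mounts bypass filesystem isolation")
	}
	return nil
}

func main() {
	fmt.Println(admit(PodSpec{Privileged: true})) // rejected
	fmt.Println(admit(PodSpec{}))                 // admitted (nil)
}
```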
Prevention fails silently. Detection fails loudly — and that’s good.
- Containers aren’t strong isolation
- RBAC misconfig is worse than kernel CVEs
- Runtime visibility is mandatory
- Security is configuration, not tooling
Containers trade isolation strength for speed and scale.
Good security is not about trusting that trade —
it’s about monitoring it continuously.