Answer: Container isolation is enforced by multiple independent kernel subsystems, not a single boundary:
- Namespaces restrict visibility (what a process can see)
- cgroups restrict resource consumption (what a process can exhaust)
- Capabilities restrict privileged operations
- Seccomp restricts kernel attack surface (syscalls)
- LSMs enforce mandatory access control
Isolation strength is therefore emergent and configuration-dependent. Any misconfiguration in one layer weakens the whole model.
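As a rough illustration of how configuration-dependent this is, here is a minimal Go sketch that spawns a child in new namespaces: each layer is a separate clone() flag, and omitting one silently weakens the model (Linux-only; needs root or a user namespace; cgroups, capabilities, and seccomp are separate steps not shown here).

```go
// Minimal sketch: namespace isolation is opt-in per clone() flag.
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh") // illustrative child process
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Each flag is one independent isolation layer; none is
		// implied by the others.
		Cloneflags: syscall.CLONE_NEWPID | // new PID view
			syscall.CLONE_NEWNS | // new mount table
			syscall.CLONE_NEWUTS | // new hostname
			syscall.CLONE_NEWNET, // new network stack
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```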
Answer: Because containers do not virtualize the kernel. All containers share:
- The same syscall interface
- The same kernel memory
- The same kernel scheduler
A single kernel privilege escalation (e.g., Dirty Pipe) can allow:
- Container → host compromise
- Cross-container impact
- Node-wide breach in Kubernetes
This is fundamentally different from VM-based isolation.
Answer: Namespaces only affect perception, not authority:
- They hide objects (PIDs, mounts, interfaces)
- They do not prevent privileged kernel actions
If a process gains sufficient privileges (e.g., CAP_SYS_ADMIN), namespaces become largely bypassable.
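A hedged sketch of that bypass: if /proc exposes the host's PID 1 and the process holds CAP_SYS_ADMIN, a single setns(2) call joins the host's network namespace (paths illustrative; assumes golang.org/x/sys/unix).

```go
// Sketch: with CAP_SYS_ADMIN and a visible /proc/1, namespaces stop
// being a boundary. Joining the host network namespace is one call.
package main

import (
	"fmt"
	"runtime"

	"golang.org/x/sys/unix"
)

func main() {
	// setns affects the calling thread only, so pin it.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// /proc/1/ns/net is the host's network namespace whenever the
	// host PID namespace (or its /proc) is visible to this process.
	fd, err := unix.Open("/proc/1/ns/net", unix.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	// Requires CAP_SYS_ADMIN over the target namespace.
	if err := unix.Setns(fd, unix.CLONE_NEWNET); err != nil {
		panic(err)
	}
	fmt.Println("now in the host network namespace")
}
```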
Answer: A common exploit chain:
- Container runs with CAP_SYS_ADMIN or in privileged mode
- Writable hostPath mount exists
- Attacker remounts host filesystem
- Modifies host binaries or runtime files
- Achieves persistent host compromise
Mount namespaces are therefore one of the highest-risk kernel interfaces in container environments.
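To make the chain concrete, this sketch shows the single call at its center: inside a privileged container, mounting the host's block device (device path hypothetical) needs nothing beyond CAP_SYS_ADMIN (assumes golang.org/x/sys/unix).

```go
// Sketch of step 3 of the chain above: one mount(2) call turns an
// "isolated filesystem view" into read-write access to the host root.
package main

import (
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// The mount target must exist inside the container's view.
	if err := os.MkdirAll("/mnt/host", 0o755); err != nil {
		panic(err)
	}
	// Requires CAP_SYS_ADMIN; /dev/sda1 is a hypothetical host disk.
	if err := unix.Mount("/dev/sda1", "/mnt/host", "ext4", 0, ""); err != nil {
		panic(err)
	}
	// From here: modify host binaries, cron entries, kubelet config...
}
```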
Answer: User namespaces introduce:
- Complex UID/GID mappings
- Filesystem ownership complications
- Compatibility issues with legacy software and NFS
Despite being a major security improvement, operational friction has limited adoption.
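For reference, a minimal sketch of the mapping mechanics using Go's stdlib: in-container root maps to an unprivileged host range (the host ID range is illustrative; mapping an arbitrary range requires root or newuidmap/newgidmap).

```go
// Sketch: a user-namespaced child where "root" inside the namespace
// is an unprivileged UID on the host.
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/usr/bin/id")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUSER,
		// UID 0 inside == UID 100000 outside: an exploit that yields
		// container root still lands unprivileged on the host.
		UidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: 100000, Size: 65536},
		},
		GidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: 100000, Size: 65536},
		},
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```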
Answer: Attackers can:
- Trigger repeated OOM kills to disrupt workloads
- Exhaust PID limits to cause node instability
- Abuse CPU shares to starve critical services
These are availability attacks, not escapes, but can be leveraged for lateral movement or incident masking.
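The corresponding mitigations are plain cgroup v2 control files. A stdlib-only sketch, with group name and limits illustrative (needs root and a cgroup2 mount at /sys/fs/cgroup):

```go
// Sketch: capping PIDs and memory for a cgroup v2 group.
package main

import (
	"os"
	"path/filepath"
)

func main() {
	cg := "/sys/fs/cgroup/demo" // hypothetical group
	if err := os.MkdirAll(cg, 0o755); err != nil {
		panic(err)
	}
	// pids.max blunts fork bombs; memory.max bounds OOM blast radius.
	limits := map[string]string{
		"pids.max":   "256",
		"memory.max": "268435456", // 256 MiB
	}
	for file, val := range limits {
		if err := os.WriteFile(filepath.Join(cg, file), []byte(val), 0o644); err != nil {
			panic(err)
		}
	}
}
```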
Answer: Because cgroups:
- Do not restrict privileges
- Do not isolate memory access
- Do not prevent kernel exploitation
They reduce blast radius but do not prevent compromise.
Answer: CAP_SYS_ADMIN is a catch-all capability that includes:
- Mounting filesystems
- Namespace manipulation
- Kernel tuning interfaces
Most historical container escapes rely on this capability.
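One concrete hardening step is removing CAP_SYS_ADMIN from the bounding set before the workload starts, as in this sketch (assumes golang.org/x/sys/unix; the caller needs CAP_SETPCAP):

```go
// Sketch: dropping CAP_SYS_ADMIN from the bounding set. The drop is
// irreversible for this process tree: even a setuid binary exec'd
// later cannot regain the capability.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	if err := unix.Prctl(unix.PR_CAPBSET_DROP, uintptr(unix.CAP_SYS_ADMIN), 0, 0, 0); err != nil {
		panic(err)
	}
	fmt.Println("CAP_SYS_ADMIN removed from bounding set")
}
```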
Answer: Because UID ≠ privilege in Linux.
A non-root process with dangerous capabilities can still:
- Mount filesystems
- Reconfigure networking
- Abuse kernel interfaces
Capabilities, not UID, define what a process can do.
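You can verify this directly: a process's effective privileges live in its capability sets, readable from /proc. A stdlib-only sketch that checks two dangerous bits (the bit positions are the kernel's capability numbers):

```go
// Sketch: read CapEff from /proc/self/status and test specific bits.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(string(data), "\n") {
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		caps, err := strconv.ParseUint(hex, 16, 64)
		if err != nil {
			panic(err)
		}
		// CAP_NET_ADMIN = 12, CAP_SYS_ADMIN = 21
		fmt.Println("CAP_NET_ADMIN:", caps&(1<<12) != 0)
		fmt.Println("CAP_SYS_ADMIN:", caps&(1<<21) != 0)
	}
}
```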
Answer: Most kernel exploits require:
- Specific syscalls
- Specific argument patterns
By blocking those syscalls entirely, seccomp can:
- Break exploit primitives
- Convert RCE into DoS
- Force attackers into harder chains
This is exploit prevention, not just detection.
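A sketch of what such a filter looks like in practice, using github.com/seccomp/libseccomp-golang (requires libseccomp installed; the syscall list is illustrative, not a vetted profile):

```go
// Sketch: allow by default, deny specific escape/exploit primitives.
package main

import (
	"fmt"

	seccomp "github.com/seccomp/libseccomp-golang"
)

func main() {
	filter, err := seccomp.NewFilter(seccomp.ActAllow)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"mount", "umount2", "unshare", "setns", "keyctl"} {
		sc, err := seccomp.GetSyscallFromName(name)
		if err != nil {
			continue // syscall unknown on this kernel/arch
		}
		// EPERM (1) instead of SIGKILL keeps failures observable.
		if err := filter.AddRule(sc, seccomp.ActErrno.SetReturnCode(1)); err != nil {
			panic(err)
		}
	}
	if err := filter.Load(); err != nil {
		panic(err)
	}
	fmt.Println("filter loaded; mount(2) now returns EPERM")
}
```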
Answer: Because default profiles:
- Are generic
- Allow many legacy syscalls
- Optimize for compatibility over minimalism
High-security workloads require tailored profiles.
Answer: LSMs restrict:
- File access beyond DAC
- Process interactions
- Capability usage in context
They can block attacker actions after container compromise.
Answer: Because LSM policies:
- Are hard to write
- Break applications silently
- Require deep workload knowledge
Security teams often trade enforcement for operability.
Answer: OverlayFS operates across:
- Host filesystem
- Container layers
- Copy-on-write logic
Bugs here often lead to host filesystem access or corruption.
Answer: Because mounts are:
- Mutable at runtime
- Often writable
- Frequently misconfigured
Image layers are static; mounts are dynamic attack surfaces.
Answer: Because:
- Containers allow attacker-controlled code execution
- Shared kernel magnifies impact
- Exploits are often reliable and fast
Kernel patch latency directly maps to container risk.
Answer: Because successful exploitation often depends on:
- Available syscalls
- Capabilities
- LSM policies
- Kernel config hardening
Defense-in-depth can break exploit chains.
Answer: Linux was optimized for:
- Performance
- Multi-tenancy efficiency
- Backward compatibility
Security isolation was layered incrementally, not designed upfront.
Answer: When:
- Running untrusted tenant code
- High-value secrets share the node
- Regulatory isolation requirements exist
VMs or hardware isolation are more appropriate.
Answer: In practice:
- Enable user namespaces
- Drop all unnecessary capabilities
- Enforce seccomp profiles
- Enable SELinux/AppArmor
- Patch kernels aggressively
Together, these drastically reduce real-world exploitability.
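Most of this checklist maps onto a Kubernetes container SecurityContext. A sketch using k8s.io/api/core/v1 types (values are a reasonable baseline, not a universal policy):

```go
// Sketch: the hardening checklist above as a container SecurityContext.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func boolPtr(b bool) *bool { return &b }

func main() {
	sc := &corev1.SecurityContext{
		// Drop everything; add back only what the workload proves it needs.
		Capabilities: &corev1.Capabilities{
			Drop: []corev1.Capability{"ALL"},
		},
		RunAsNonRoot:             boolPtr(true),
		AllowPrivilegeEscalation: boolPtr(false),
		ReadOnlyRootFilesystem:   boolPtr(true),
		// Enforce the runtime's seccomp profile as a floor.
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault,
		},
	}
	fmt.Printf("%+v\n", sc)
}
```

Note that user namespaces are configured at the pod level (hostUsers: false in the pod spec), not in the container SecurityContext.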
Answer: Without standardization, container behavior was tightly coupled to Docker’s implementation, creating:
- Vendor lock-in
- Inconsistent security guarantees
- Opaque runtime behavior
OCI introduced explicit contracts for image format and runtime behavior, making container execution auditable, portable, and analyzable across platforms.
Answer: The OCI runtime spec defines how isolation must be applied, including:
- Namespace configuration
- cgroup application
- Mount semantics
- Capability dropping
It does not guarantee security, but it eliminates ambiguity, which is critical for:
- Threat modeling
- Runtime hardening
- CVE impact assessment
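To see what "explicit contract" means, here is a sketch of the isolation knobs as runtime-spec Go structs, using github.com/opencontainers/runtime-spec/specs-go (a subset of fields; values illustrative):

```go
// Sketch: every namespace and capability is declared, so configs can
// be diffed and audited. The JSON output is (part of) a config.json.
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	spec := specs.Spec{
		Process: &specs.Process{
			// Explicit capability sets: nothing is implicit.
			Capabilities: &specs.LinuxCapabilities{
				Bounding:  []string{"CAP_NET_BIND_SERVICE"},
				Effective: []string{"CAP_NET_BIND_SERVICE"},
				Permitted: []string{"CAP_NET_BIND_SERVICE"},
			},
			NoNewPrivileges: true,
		},
		Linux: &specs.Linux{
			Namespaces: []specs.LinuxNamespace{
				{Type: specs.PIDNamespace},
				{Type: specs.MountNamespace},
				{Type: specs.NetworkNamespace},
				{Type: specs.UserNamespace},
			},
		},
	}
	out, _ := json.MarshalIndent(spec, "", "  ")
	fmt.Println(string(out))
}
```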
Answer: Because runc:
- Runs with elevated privileges
- Directly configures namespaces, cgroups, mounts, and capabilities
- Executes the container’s initial process
Any flaw in runc can collapse all higher-level isolation, which is why runc CVEs often lead to container escape class vulnerabilities.
Answer: runc operates:
- Outside the container
- With host-level privileges
- At container creation time
A successful exploit can therefore impact the host and all containers on the node, not just a single workload.
Answer: Because containerd is a lifecycle manager, not an execution engine. It:
- Pulls and unpacks images
- Manages snapshots and metadata
- Delegates execution to runc
The actual isolation enforcement still occurs in runc and the kernel.
Answer: Although safer than runc, containerd:
- Becomes a high-value control-plane component
- Manages container lifecycle and state
- Exposes APIs that, if reachable, can allow container manipulation
Misconfiguration or exposure of containerd APIs can lead to host-level impact without kernel exploitation.
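A sketch of why reachable APIs matter: anyone who can open the containerd socket can enumerate and manipulate every container in a namespace, with no kernel exploit involved (assumes github.com/containerd/containerd; the socket path is the default).

```go
// Sketch: socket access == lifecycle control over all containers.
package main

import (
	"context"
	"fmt"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// "k8s.io" is the namespace kubelet uses via CRI.
	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")
	containers, err := client.Containers(ctx)
	if err != nil {
		panic(err)
	}
	for _, c := range containers {
		fmt.Println(c.ID()) // from here: exec, delete, reconfigure...
	}
}
```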
Answer: CRI-O:
- Implements only what Kubernetes requires
- Removes Docker legacy features
- Reduces code paths and attack surface
However, it still relies on runc, so kernel and runtime risks remain.
Answer: CRI expands the trust boundary by inserting:
- kubelet
- CRI
- runtime
Any weakness or misbehavior in this chain can affect all workloads on a node, making runtime integrity critical.
Q29. Why are runtime sockets (e.g., docker.sock, containerd.sock) considered critical security risks?
Answer: Because access to runtime sockets allows:
- Creating privileged containers
- Mounting host filesystems
- Escaping namespace boundaries without exploits
In practice, mounting docker.sock is equivalent to granting root on the host.
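A quick self-check for this misconfiguration is to scan the mount table from inside the container, as in this stdlib-only sketch (socket names are the common defaults):

```go
// Sketch: detect runtime sockets exposed inside a container.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(string(data), "\n") {
		for _, sock := range []string{"docker.sock", "containerd.sock", "crio.sock"} {
			if strings.Contains(line, sock) {
				// Any hit means root-equivalent control of the host runtime.
				fmt.Println("exposed runtime socket:", line)
			}
		}
	}
}
```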
Answer: Registries introduce:
- Supply chain trust assumptions
- Remote code ingestion at scale
- Dependency confusion risks
A compromised registry or image can affect entire fleets, not just individual containers.
Answer: Because mutable images:
- Break provenance guarantees
- Undermine incident response
- Allow silent behavior changes
Immutability enables deterministic forensics and rollback.
Answer: The two classes differ in what they exploit:
- Kernel CVEs exploit shared execution primitives
- Runtime CVEs exploit privileged orchestration logic
Runtime CVEs often require less sophistication and are more reliable in real environments.
Answer: Because runtimes:
- Run on every node
- Are identical across clusters
- Sit below orchestration layers
A single unpatched runtime CVE can enable mass compromise.
Answer: High-signal indicators include:
- Unexpected container creation or deletion
- Runtime socket access from containers
- Mount or namespace syscalls during runtime
- Execution of shells or debugging tools in production containers
Answer: Standardization improves consistency and observability, but it also concentrates risk.
Security therefore depends on:
- Aggressive patching
- Minimal runtime exposure
- Strong runtime monitoring