Kubernetes Complete Guide: Pods, Deployments, RBAC & Production Patterns
Kubernetes (K8s) is one of the most important skills in modern DevOps and SRE. I have used it daily in production — managing clusters on AWS EKS and Azure AKS, responding to incidents, tuning HPA policies, and designing multi-AZ high-availability architectures. This guide is written from that real-world experience, not just from reading documentation.
Originally built by Google based on their internal system Borg, Kubernetes was open-sourced in 2014 and has since become the industry standard for running containerised applications at scale. If you are preparing for a Senior DevOps, Platform Engineer, or SRE interview, a deep, practical understanding of Kubernetes is non-negotiable.
What is Kubernetes and Why Does It Exist?
Before Kubernetes, running Docker containers at scale exposed a fundamental problem: Docker solved packaging, but not orchestration. What happens when a container crashes? How do you spread containers across 50 servers? How do you update 100 running containers without downtime? How do you handle a traffic spike at 2am?
Kubernetes answers all of these questions. It is an orchestration platform — a system that manages containers across a cluster of machines. You declare what you want (3 replicas of my API, always running, with 512MB RAM each) and Kubernetes makes it happen and maintains it — even when servers fail, containers crash, or traffic doubles.
Kubernetes Architecture
A Kubernetes cluster has two types of machines: the control plane and worker nodes. Understanding this architecture is the first question in most K8s interviews.
Control Plane Components
- API Server (kube-apiserver) — The single entry point for all operations. Every kubectl command, every CI/CD tool, every internal component — all talk to the API server. It validates requests and writes to etcd.
- etcd — A distributed key-value store that holds all cluster state. If etcd is lost without backup, the cluster is lost. In production, run etcd on at least 3 nodes for high availability. Back it up daily.
- Scheduler (kube-scheduler) — Watches for unscheduled Pods and assigns them to nodes based on resource availability, node selectors, affinity rules, and taints/tolerations.
- Controller Manager (kube-controller-manager) — Runs dozens of control loops. The Deployment controller watches for desired replica count and creates/deletes ReplicaSets. The ReplicaSet controller watches for desired pod count. Each controller does one job and does it continuously.
Worker Node Components
- kubelet — The agent on every worker node. Receives Pod specs from the API server and ensures the described containers are running and healthy. Reports node and pod status back to the control plane.
- kube-proxy — Maintains iptables or IPVS rules for Service routing. Enables any Pod to reach any Service IP, regardless of which node the target Pod is on.
- Container Runtime — The software that actually runs containers. containerd is the standard runtime in modern clusters (Docker support via dockershim was removed in K8s 1.24).
Core Workload Objects
Pod — The Atomic Unit
A Pod is the smallest deployable unit. It contains one or more containers that share a network namespace (they talk to each other via localhost) and storage volumes. In practice, most Pods run a single container. The sidecar pattern (Istio's Envoy proxy, Vault's secret injector) is the main case for multi-container Pods.
Pods are ephemeral. They are not self-healing. If a Pod dies, it stays dead unless a controller (like a Deployment) creates a replacement. Never create bare Pods in production — always use a Deployment or StatefulSet.
Deployment — For Stateless Applications
A Deployment manages a ReplicaSet (a set of identical Pods). You declare the number of replicas and the container image. The Deployment controller ensures that many pods are always running. It also handles rolling updates — replacing Pods gradually so there is no downtime — and rollbacks when a new version fails.
StatefulSet — For Stateful Applications
StatefulSets are like Deployments but with three additional guarantees: stable Pod names (pod-0, pod-1, pod-2 — never random), stable per-Pod PersistentVolumeClaims that survive rescheduling, and ordered start/stop (pod-0 starts before pod-1, pod-1 stops before pod-0). Use StatefulSets for PostgreSQL, MySQL, Kafka, Redis Cluster, Cassandra, Elasticsearch — any workload where instance identity matters.
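A minimal sketch of what those guarantees look like in a manifest — the `postgres` name and sizes are illustrative, not a production-tuned config:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres          # headless Service gives stable DNS: postgres-0.postgres, ...
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:          # each Pod gets its own PVC that survives rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The `volumeClaimTemplates` block is what makes the storage per-Pod and sticky: `postgres-0` always reattaches to the `data-postgres-0` claim, even after being rescheduled to another node.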
DaemonSet — One Pod Per Node
DaemonSets ensure exactly one Pod runs on every node (or a subset). Used for: log collectors (Fluentd, Filebeat), monitoring agents (Prometheus Node Exporter), network plugins (Calico CNI), and security agents (Falco). When a new node joins the cluster, the DaemonSet Pod is automatically scheduled on it.
Networking — Services and Ingress
Kubernetes networking is the most common area where engineers get stuck. There are four distinct communication problems to understand:
- Container-to-container within a Pod — Shared network namespace. Use localhost.
- Pod-to-Pod across nodes — Every Pod gets a unique cluster IP. Pods can reach each other directly. The CNI plugin (Calico, Cilium, Flannel) implements this flat network.
- Pod-to-Service — Services provide a stable virtual IP and DNS name. kube-proxy routes traffic from the Service IP to healthy Pod endpoints. Even as Pods are replaced, the Service IP stays constant.
- External-to-Cluster — LoadBalancer Services or Ingress controllers route external traffic in.
Service Types
- ClusterIP (default) — Internal-only virtual IP. Use for microservice-to-microservice communication. DNS: servicename.namespace.svc.cluster.local
- NodePort — Exposes the service on a static port (30000–32767) on every node. Useful for development. Not for production — exposes node IPs directly.
- LoadBalancer — Provisions a cloud load balancer (AWS ELB, Azure LB). One load balancer per service — gets expensive quickly for many services.
- Ingress — L7 HTTP/HTTPS router. One Ingress controller (nginx, AWS ALB, Traefik) handles routing for all services based on hostname and URL path rules. SSL termination included. This is the right solution for most production HTTP workloads.
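For reference, here is what the default ClusterIP case looks like — a sketch with a hypothetical `api` service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP            # default; internal-only virtual IP
  selector:
    app: api                 # routes to Pods carrying this label
  ports:
    - port: 80               # Service port: api.<namespace>.svc.cluster.local:80
      targetPort: 8080       # containerPort on the backing Pods
```

Note the indirection: clients talk to port 80 on the stable Service name, while kube-proxy forwards to port 8080 on whichever healthy Pods currently match the selector.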
Real YAML Examples
Production Deployment
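A sketch of a production-style Deployment — the `api` name, image, and resource numbers are placeholders to adjust for your workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # never drop below desired capacity during a rollout
      maxSurge: 1            # add one new Pod at a time
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.4.2
          ports:
            - containerPort: 8080
          resources:
            requests:          # what the scheduler reserves
              cpu: 250m
              memory: 256Mi
            limits:            # exceeding the memory limit -> OOMKilled (exit 137)
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
      terminationGracePeriodSeconds: 30
```

Requests drive scheduling; limits drive OOM behaviour. Setting a memory limit without a CPU limit is a common production pattern: CPU throttling is survivable, memory overcommit is not.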
Ingress with TLS
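A sketch of a TLS-terminating Ingress for the nginx ingress controller. The cert-manager annotation assumes cert-manager is installed with a `letsencrypt-prod` ClusterIssuer — otherwise create the `api-tls` Secret yourself:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod  # assumes cert-manager is installed
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls      # certificate + key stored as a TLS Secret
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api      # ClusterIP Service in front of the Deployment
                port:
                  number: 80
```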
HPA with Custom Metrics
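A sketch using the `autoscaling/v2` API, combining CPU with a custom Pods metric. The `http_requests_per_second` metric name is hypothetical — it must be exposed through a custom metrics adapter such as prometheus-adapter:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # scale out when average CPU exceeds 70% of requests
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # served by a custom metrics adapter
        target:
          type: AverageValue
          averageValue: "100"        # target ~100 req/s per Pod
```

When multiple metrics are listed, the HPA computes a desired replica count for each and takes the highest — so either CPU pressure or request volume can trigger a scale-out.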
Health Probes — The Most Misunderstood Feature
Health probe misconfiguration is the single most common cause of the production incidents I have investigated. Get these wrong and you will have either CrashLoopBackOff (liveness probe too aggressive) or traffic routed to broken Pods (readiness probe missing).
For slow-starting applications, set failureThreshold: 30 and periodSeconds: 10 on the startup probe. This allows up to 300 seconds (5 minutes) for startup before the liveness probe takes over. Without this, JVM apps frequently enter CrashLoopBackOff on first deploy.
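Put together, a probe configuration along those lines looks like this (container spec fragment; paths and port are illustrative):

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # 30 failures x 10s period = up to 300s to start
  periodSeconds: 10
livenessProbe:           # suppressed until the startup probe succeeds
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # restart the container after ~30s of failures
readinessProbe:          # failing = removed from Service endpoints, not restarted
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
```

The key distinction: a failing liveness probe restarts the container; a failing readiness probe only stops traffic to it. Conflating the two is how "too aggressive liveness" turns a slow dependency into a restart storm.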
Security and RBAC
Role-Based Access Control (RBAC) is the primary security mechanism inside a Kubernetes cluster. The principle of least privilege applies everywhere: a Pod should only have access to the resources it actually needs, and nothing more.
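A minimal least-privilege sketch: a namespaced Role that can only read Pods and their logs, bound to a hypothetical `app-sa` ServiceAccount in a `prod` namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: prod
rules:
  - apiGroups: [""]                     # "" = the core API group
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]     # read-only; no create/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: prod
subjects:
  - kind: ServiceAccount
    name: app-sa
    namespace: prod
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Prefer Role/RoleBinding (namespaced) over ClusterRole/ClusterRoleBinding unless the subject genuinely needs cluster-wide access — and verify grants with `kubectl auth can-i get pods --as=system:serviceaccount:prod:app-sa`.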
Debugging Kubernetes in Production
This is the section that separates senior engineers from juniors in interviews. When a pod is broken in production, you need a systematic, fast debugging approach — not random kubectl commands.
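A systematic first pass usually looks like this (placeholders like `<pod>` and the `prod` namespace are illustrative; these need a live cluster to run):

```shell
kubectl get pods -n prod -o wide                    # status, restart count, node placement
kubectl describe pod <pod> -n prod                  # events: FailedScheduling, probe failures, OOMKilled
kubectl logs <pod> -n prod --previous               # output of the last crashed container
kubectl get events -n prod --sort-by=.lastTimestamp # cluster-level picture, newest last
kubectl exec -it <pod> -n prod -- sh                # shell into the running container
kubectl get endpoints <service> -n prod             # empty endpoints = readiness probe failing
kubectl top pods -n prod                            # needs metrics-server; spot memory near limits
```

The order matters: `get` tells you the symptom (Pending, CrashLoopBackOff, Running-but-unready), `describe` and `logs --previous` tell you the cause, and only then do you reach for `exec`.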
Interview Q&A
- Q: A Pod is in CrashLoopBackOff. How do you debug it?
  A: Run kubectl logs --previous to see the last crash output, and kubectl describe pod to see the exit code. Exit code 137 = OOMKilled, exit code 1 = application error.
- Q: A Pod is stuck in Pending. What do you check?
  A: Check the kubectl describe pod events for "FailedScheduling" to see why — typically insufficient resources, an unsatisfied node selector or affinity rule, or a taint without a matching toleration.
- Q: How do you achieve a zero-downtime deployment?
  A: Set maxUnavailable: 0 and maxSurge: 1 in the RollingUpdate strategy. Configure a proper readiness probe so the new Pod only joins the Service endpoints when it is actually ready to serve traffic. Set a preStop hook (sleep 5) to allow in-flight requests to complete before the container is terminated. Set terminationGracePeriodSeconds long enough for graceful shutdown. Test with a canary deployment first using a second Deployment with 1 replica pointing to the new image.
- Q: What is a PodDisruptionBudget (PDB)?
  A: A PDB limits how many Pods can be evicted at once during voluntary disruptions such as node drains and upgrades. minAvailable: 2 ensures at least 2 Pods are always running. Set PDBs for every production Deployment with more than 1 replica.
☸️ Explore Kubernetes on the Interactive Mind Map
See how Kubernetes connects to Docker, Helm, ArgoCD, Prometheus, AWS, and more — with real commands and 5 interview Q&As per tool.
Open Interactive Mind Map
Kubernetes runs on cloud infrastructure. See how to deploy and manage EKS clusters on AWS →, and automate the whole thing with Terraform →
📩 Get Free DevOps Interview Notes
Cheat sheets, real commands, interview Q&As — free.
No spam · Follow @master.devops for daily tips