AWS DevOps Roadmap: EKS, IAM, VPC, IRSA & Real Interview Questions
AWS is the dominant cloud platform in DevOps, with the widest service catalogue and the largest ecosystem of tools and integrations. At a top enterprise, I work with AWS daily — provisioning EKS clusters with Terraform, configuring IRSA for pod-level IAM authentication, designing multi-AZ VPC architectures, and debugging IAM permission issues. This guide covers the AWS services and concepts that appear most in DevOps interviews.
Core AWS Services Every DevOps Engineer Must Know
- EKS (Elastic Kubernetes Service) — Managed Kubernetes. AWS runs the control plane (API server, etcd); you run the data plane through managed node groups, self-managed nodes, or Fargate.
- EC2 (Elastic Compute Cloud) — Virtual machines. The foundation of most AWS workloads. Know the common instance families and when to use each: t3 (burstable), m5 (general purpose), c5 (compute optimised), r5 (memory optimised).
- IAM (Identity and Access Management) — Authentication and authorisation for all AWS APIs. Every action in AWS goes through IAM.
- VPC (Virtual Private Cloud) — Isolated virtual network. Every production workload lives in a VPC with private subnets.
- S3 (Simple Storage Service) — Object storage. Terraform state, build artifacts, static assets, log archives, and data lake storage.
- RDS (Relational Database Service) — Managed SQL databases. Multi-AZ for HA, Read Replicas for read scaling.
- ELB (Elastic Load Balancing) — Application Load Balancer (ALB) for L7 HTTP/HTTPS routing. Network Load Balancer (NLB) for high-throughput L4 TCP/UDP traffic.
- CloudWatch — Metrics, logs, alarms, and dashboards. The primary observability tool in AWS environments.
- Lambda — Serverless functions. Event-driven processing without managing servers.
VPC Architecture — Production Design
Every production AWS workload should live in a properly designed VPC. The standard multi-AZ architecture:
- 3 public subnets (one per AZ) — Internet-facing resources only: NAT Gateways, Application Load Balancers, bastion hosts.
- 3 private subnets (one per AZ) — Application layer: EC2 instances, EKS worker nodes, Lambda functions. No direct internet access.
- 3 database subnets (one per AZ) — Database layer: RDS, ElastiCache. No route to the internet at all, inbound or outbound.
- NAT Gateway in each public subnet — Allows private subnet resources to initiate outbound internet connections (for package downloads, AWS API calls) without accepting inbound connections.
- Internet Gateway attached to VPC — Required for public subnet resources to communicate with the internet.
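The layout above can be sketched as a small CIDR plan. This is a minimal illustration using Python's standard `ipaddress` module, not a provisioning script. The VPC range `10.0.0.0/16`, the `/20` subnet size, the AZ names, and the `plan_subnets` helper are all assumptions chosen for the example.

```python
import ipaddress

# Default route target per tier, following the design described above.
DEFAULT_ROUTE = {
    "public": "internet gateway",
    "private": "NAT gateway in the same AZ",
    "database": None,  # no route to the internet
}

def plan_subnets(vpc_cidr, azs, tiers=("public", "private", "database"), new_prefix=20):
    """Carve one equal-sized subnet per (tier, AZ) pair out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of non-overlapping blocks
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan

# Hypothetical VPC range and AZ names.
plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"])
for (tier, az), cidr in plan.items():
    print(f"{tier:<9} {az}  {str(cidr):<15} default route: {DEFAULT_ROUTE[tier]}")
```

A /16 split into /20 blocks yields 16 subnets, so the nine used here leave room for additional tiers or AZs later.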
IAM — Identity and Access Management
IAM is one of the most complex AWS services, and misconfigured permissions are a common source of security vulnerabilities. Apply the principle of least privilege everywhere: users, roles, and services should have exactly the permissions they need and nothing more.
Key IAM Concepts
- IAM User — Long-term credentials for a human or non-federated service. Avoid creating IAM Users for applications — use IAM Roles instead.
- IAM Role — Temporary credentials assumed by services, applications, or users. No long-term access keys. A Lambda function, EC2 instance, or EKS pod assumes a role to get temporary credentials.
- IAM Policy — JSON document defining allowed/denied actions on specific resources. Attach to users, groups, or roles.
- SCP (Service Control Policy) — Organisation-level guardrails that cap what member accounts in an AWS Organisation can do, regardless of their IAM policies. SCPs never grant permissions; they only limit them. Use them to restrict the root user in member accounts, limit regions, or mandate encryption.
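To make "least privilege" concrete, here is a minimal sketch that builds an identity policy granting read-only access to a single S3 bucket prefix. The bucket name, the prefix, and the `s3_read_only_policy` helper are hypothetical; only the policy document structure and the `s3:ListBucket` / `s3:GetObject` actions come from IAM itself.

```python
import json

def s3_read_only_policy(bucket, prefix=""):
    """Build an identity policy that allows reading objects under one bucket prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                # This sketch does not restrict listing to the prefix.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # GetObject is an object-level action, so it targets object ARNs.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }

# Hypothetical bucket and prefix.
policy = s3_read_only_policy("example-build-artifacts", "releases/")
print(json.dumps(policy, indent=2))
```

The policy names two specific actions and two specific resources instead of `s3:*` on `*`, which is the pattern interviewers usually look for when they ask about least privilege.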
IRSA — IAM Roles for Service Accounts
IRSA (IAM Roles for Service Accounts) is one of the most important EKS security features. Without it, the typical way to give a Kubernetes pod AWS API access is to attach an IAM role to the EC2 node through its instance profile, which gives every pod on that node the same permissions. This violates least privilege.
IRSA solves this by federating Kubernetes Service Accounts with IAM Roles using OIDC. Each pod gets its own IAM role with precisely the permissions it needs. The mechanism:
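- Every EKS cluster has an OIDC issuer URL. You register that issuer as an IAM OIDC identity provider in the account (a one-time step per cluster).
- An IAM Role is created with a trust policy that allows sts:AssumeRoleWithWebIdentity from that provider, restricted to one namespace and Service Account.
- The Service Account is annotated with the IAM Role ARN (annotation key eks.amazonaws.com/role-arn).
- When the Pod starts, the EKS Pod Identity Webhook injects the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables and mounts a projected Service Account token. The AWS SDK exchanges that token for temporary credentials through sts:AssumeRoleWithWebIdentity, so no access keys are stored anywhere.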
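The two documents involved in the steps above are the role's trust policy and the annotated Service Account. The sketch below generates both as plain dictionaries. The account ID, issuer ID, namespace, Service Account name, and role name are placeholders, and the helper functions are hypothetical; in practice these documents are usually produced by Terraform or eksctl.

```python
import json

def irsa_trust_policy(account_id, oidc_issuer_url, namespace, service_account):
    """Trust policy letting exactly one Kubernetes Service Account assume the role."""
    # The provider ARN and the condition keys use the issuer URL without its scheme.
    issuer = oidc_issuer_url.split("://", 1)[-1]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Pin the role to a single namespace and Service Account.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }

def service_account_manifest(name, namespace, role_arn):
    """Service Account carrying the annotation the webhook looks for."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }

# Placeholder values for illustration only.
ISSUER = "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
trust = irsa_trust_policy("111122223333", ISSUER, "payments", "s3-reader")
sa = service_account_manifest(
    "s3-reader", "payments", "arn:aws:iam::111122223333:role/payments-s3-reader"
)
print(json.dumps(trust, indent=2))
print(json.dumps(sa, indent=2))
```

The `sub` condition is what enforces pod-level isolation: without it, any Service Account in the cluster could assume the role.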
S3 vs EBS vs EFS — When to Use Each
- S3 (Object Storage) — Effectively unlimited capacity, 99.999999999% (11 nines) durability, accessed via HTTP API. Use for: build artifacts, Terraform state, static website assets, log archives, data lake, backups. Not a POSIX filesystem; FUSE adapters such as s3fs can mount a bucket, but they are slow and offer only partial filesystem semantics.
- EBS (Elastic Block Store) — Block storage with low latency and high IOPS. A volume lives in one AZ and is normally attached to a single EC2 instance at a time; Kubernetes exposes it to a pod as a ReadWriteOnce PersistentVolume. Use for: OS volumes, databases that need block storage (Postgres, MySQL running on EC2), Kubernetes PersistentVolumes for StatefulSets.
- EFS (Elastic File System) — Managed NFS filesystem. Can be mounted by thousands of EC2 instances and pods simultaneously. Use for: shared content (CMS media, machine learning datasets), configuration files shared across instances, ReadWriteMany Kubernetes PersistentVolumes.
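The comparison above reduces to two questions: does the workload talk to storage over an HTTP API or through a mounted filesystem, and must several instances mount the same volume at once? The sketch below encodes that rule of thumb. The `choose_storage` function and its parameter names are hypothetical, and real decisions also weigh cost, throughput, and latency.

```python
def choose_storage(interface, shared=False):
    """Return 'S3', 'EBS', or 'EFS' for a workload.

    interface: 'object' for HTTP API access, 'filesystem' for a mounted volume.
    shared: True when many instances or pods must mount the same volume at once.
    """
    if interface == "object":
        return "S3"
    if interface == "filesystem":
        return "EFS" if shared else "EBS"
    raise ValueError(f"unknown interface: {interface!r}")

print(choose_storage("object"))                   # e.g. build artifacts, Terraform state
print(choose_storage("filesystem"))               # e.g. Postgres data volume on one node
print(choose_storage("filesystem", shared=True))  # e.g. CMS media read by many pods
```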
RDS — Multi-AZ vs Read Replicas
This is a very common AWS interview question, and the answer requires understanding the purpose of each:
- Multi-AZ — High availability within a region. A standby instance in a different AZ receives synchronous replication. On primary failure, RDS automatically promotes the standby, typically within 1-2 minutes, and the database endpoint stays the same. In a Multi-AZ DB instance deployment the standby is not accessible for reads; it exists purely for failover. Use in all production environments.
- Read Replica — Read scalability. Asynchronous replication to one or more read-only instances, so replicas can lag slightly behind the primary. Applications can route read queries to replicas, reducing load on the primary. Replicas can be in the same AZ, a different AZ, or even a different region. Can be promoted to a standalone database if needed.
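The practical difference shows up in application code: Multi-AZ failover is transparent because the endpoint does not change, while Read Replicas only help if the application sends reads to them. Below is a minimal routing sketch under that assumption. The `EndpointRouter` class and the endpoint hostnames are hypothetical, and the SELECT check is deliberately naive.

```python
import itertools

class EndpointRouter:
    """Send writes to the primary endpoint and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary

# Hypothetical endpoint names.
router = EndpointRouter(
    primary="orders-db.example.internal",
    replicas=["orders-db-ro-1.example.internal", "orders-db-ro-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders WHERE id = 42"))
print(router.endpoint_for("UPDATE orders SET status = 'paid' WHERE id = 42"))
```

Because replication to replicas is asynchronous, reads that must see a write made moments earlier should still go to the primary.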
Interview Q&A