Topic directory

Browse InfraTales by operating problem.

Use topic hubs to navigate the publication like a working technical library rather than a flat archive.

How this is organized

Topics map to operating surfaces, not AWS service names.

Most AWS content is organized by service: "here's an S3 tutorial, here's a Lambda guide." That's fine for learning individual services. It's useless when you're trying to make an architecture decision that spans six services and three teams.

InfraTales organizes by the problem you're solving. If you're deciding how a system should be shaped, that's architecture. If you're figuring out why your deploy pipeline keeps breaking, that's DevOps. If you're trying to cut your AWS bill without breaking production, that's cost optimization. The service names show up inside the articles - they're not the organizing principle.

Each topic hub below collects articles, patterns, and case studies around one operating surface. Some have deep libraries already. Others are building out. Here's what each one covers.

AWS Architecture

Where you go when the question is "how should this system be shaped?" VPC design, multi-account strategy, service boundaries, data flow patterns. Not service tutorials - architecture decisions with trade-offs, failure modes, and cost implications baked in.

If you're doing a design review or evaluating whether your current architecture will hold at 10x scale, start here.

DevOps and Platform Engineering

CI/CD pipelines, IaC patterns, developer platforms, deployment strategies. The mechanics that let teams ship without breaking things. CDK vs Terraform decisions, blue-green vs canary, and what your deploy pipeline should actually look like when it's not a demo.

This is where you'll find the CodeDeploy, CodePipeline, and GitHub Actions content.

Security and Reliability

IAM that actually follows least-privilege (not just claims to), failure isolation patterns, KMS encryption decisions, WAF rules that do something useful, and recovery plans that work at 2am. The stuff that matters when something breaks and the on-call engineer is you.

If your security review is coming up or you just had an incident, this is your starting point.

Cost Optimization

Not just "turn off unused instances." Architecture-level cost decisions, FinOps patterns, Reserved Instance math that doesn't lie, and the specific places where optimization breaks production reliability. Real numbers, real trade-offs.

Every cost article includes a breakdown by service so you can compare against your own bill.

AI and GenAI Infrastructure

Production AI isn't a notebook running on a GPU. It's RAG pipelines, vector stores, model serving, LLM cost management, and the operational reality around inference workloads. Bedrock, SageMaker, OpenSearch Serverless - the infrastructure side that most AI content ignores.

Content building out - first articles planned for Q2 2026.

Observability and Operations

Monitoring that catches problems before users do. Logging that's actually searchable during an incident. Alerting that doesn't wake you up for things that can wait. CloudWatch, X-Ray, OpenTelemetry, and the operational discipline that makes systems legible under stress.

Content building out - first articles planned for Q2 2026.

How the taxonomy works

Clean public hubs, tighter editorial control.

Every article belongs to exactly one primary hub and carries a small set of public facets. Internal tags stay internal and carry format, funnel, series, and refresh metadata.

Public hub tags

AWS Architecture, DevOps and Platform Engineering, AI and GenAI Infrastructure, Security and Reliability, Observability and Operations, Cost Optimization, Case Studies and Failure Breakdowns, Opinion and Architecture Reviews.

Public facet tags

Use narrowly and consistently: serverless, kubernetes, terraform, cdk, networking, iam, finops, genai, observability, and multi-region.

Internal tags

Reserve internal tags for content type, series, lifecycle, and intent so the public taxonomy stays readable and scalable.
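
The tagging rules above could be checked mechanically before an article is published. What follows is a minimal sketch, not InfraTales tooling: the `ArticleTags` class, the `validate` and `public_tags` helpers, and the cap of three facets are all hypothetical (the taxonomy only says "a small set"). The hub and facet names are copied from the lists above, and the internal tag kinds follow the "format, funnel, series, and refresh" wording.

```python
from dataclasses import dataclass, field

# Public hub and facet names, copied from the taxonomy lists above.
PUBLIC_HUBS = {
    "AWS Architecture",
    "DevOps and Platform Engineering",
    "AI and GenAI Infrastructure",
    "Security and Reliability",
    "Observability and Operations",
    "Cost Optimization",
    "Case Studies and Failure Breakdowns",
    "Opinion and Architecture Reviews",
}
PUBLIC_FACETS = {
    "serverless", "kubernetes", "terraform", "cdk", "networking",
    "iam", "finops", "genai", "observability", "multi-region",
}
# Internal tag kinds, per "format, funnel, series, and refresh metadata".
INTERNAL_KINDS = {"format", "funnel", "series", "refresh"}

# Assumption: the taxonomy says "a small set"; 3 is an illustrative cap.
MAX_FACETS = 3


@dataclass
class ArticleTags:
    """Hypothetical container for one article's tags."""
    primary_hub: str
    facets: list[str] = field(default_factory=list)
    internal: dict[str, str] = field(default_factory=dict)


def validate(tags: ArticleTags) -> list[str]:
    """Return a list of taxonomy problems; an empty list means the tags pass."""
    problems = []
    if tags.primary_hub not in PUBLIC_HUBS:
        problems.append(f"unknown primary hub: {tags.primary_hub!r}")
    unknown_facets = sorted(set(tags.facets) - PUBLIC_FACETS)
    if unknown_facets:
        problems.append(f"unknown facets: {', '.join(unknown_facets)}")
    if len(tags.facets) > MAX_FACETS:
        problems.append(f"too many facets: {len(tags.facets)} > {MAX_FACETS}")
    unknown_kinds = sorted(set(tags.internal) - INTERNAL_KINDS)
    if unknown_kinds:
        problems.append(f"unknown internal tag kinds: {', '.join(unknown_kinds)}")
    return problems


def public_tags(tags: ArticleTags) -> list[str]:
    """Only the hub and facets are reader-facing; internal tags stay internal."""
    return [tags.primary_hub, *tags.facets]


article = ArticleTags(
    primary_hub="Cost Optimization",
    facets=["finops", "serverless"],
    internal={"format": "case-study", "refresh": "quarterly"},
)
print(validate(article))     # []
print(public_tags(article))  # ['Cost Optimization', 'finops', 'serverless']
```

Keeping internal tags in a separate mapping, rather than mixing them into one flat list, makes it hard to leak editorial metadata into the public taxonomy by accident.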