Lambda vs Fargate vs ECS — the napkin decision tree
My team has been running the same forty-minute architecture-review meeting for a year. New service, three engineers in the room, the question is Lambda, Fargate, or ECS, and we re-derive the answer from first principles every single time.
I got tired of it. So I drew a four-question decision tree on a napkin at lunch last week and made everyone agree to use it. Notes here so it lives somewhere besides the napkin, which I have lost.
The four questions, in order
1. Does this workload have to be ready in under 100 ms cold?
If yes (an interactive user-facing path, or real-time event processing where 200 ms of tail latency tanks the experience), you can't rely on Lambda for the cold path. Node cold starts typically land around 200 ms; Java without SnapStart takes a second or two. SnapStart helps for Java, and AWS has since extended it to Python 3.12+ and .NET 8+, but Node doesn't have it.
For cold-tolerant workloads (async jobs, scheduled tasks, webhooks, anything on a queue), Lambda is the right default.
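Question 1 is easier to answer with your own numbers than with blog-post figures. On invocations that hit a cold start, Lambda's plain-text REPORT log line carries an `Init Duration` field. So a minimal sketch like this one gives you the cold-start rate and the worst init time. It assumes you've exported those lines from CloudWatch Logs one message per line, without a timestamp prefix.

Some caveats:
- `Init Duration` only covers the init phase, so treat the worst value as a lower bound on cold-path latency.
- SnapStart functions report a restore duration instead, which this sketch ignores.
- The function names and the 100 ms default are illustrative.

```python
import re

# Matches the init-phase time Lambda appends to REPORT lines on cold starts,
# e.g. "... Max Memory Used: 60 MB\tInit Duration: 180.50 ms".
# Assumes the plain-text log format (not JSON) and one message per line.
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_start_summary(log_lines):
    """Count invocations and cold starts; report the slowest init phase."""
    invocations = 0
    inits_ms = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue  # skip START/END lines and application output
        invocations += 1
        match = INIT_RE.search(line)
        if match:
            inits_ms.append(float(match.group(1)))
    return {
        "invocations": invocations,
        "cold_starts": len(inits_ms),
        "worst_init_ms": max(inits_ms, default=0.0),
    }


def cold_path_fits(summary, budget_ms=100.0):
    """True if even the slowest observed init phase stays inside the budget."""
    return summary["worst_init_ms"] <= budget_ms
```

If `cold_path_fits` comes back False on real traffic, question 1 is answered.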
2. Will this run for more than 15 minutes at a stretch?
Lambda has a hard 15-minute execution cap. It was raised from five minutes in 2018 and hasn't moved since, so don't plan around it changing. If the workload is a long-running batch process, ML training, video encoding, or anything else whose per-invocation budget genuinely exceeds 15 minutes, Lambda's out.
Fargate or ECS for those.
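For question 2, the useful input is the longest job you've actually seen, not the average. Here is a small sketch that assumes you have a list of historical run times in seconds. The 900-second limit is Lambda's documented maximum timeout. The 50% headroom default is an arbitrary safety margin of mine, not an AWS number.

```python
LAMBDA_MAX_TIMEOUT_S = 15 * 60  # Lambda's maximum function timeout: 900 s


def fits_lambda_timeout(job_durations_s, headroom=0.5):
    """True if the slowest observed job, padded by `headroom`, fits the cap.

    headroom=0.5 means the slowest job plus 50% must still finish within
    900 seconds, i.e. the slowest job can take at most 600 seconds.
    """
    if not job_durations_s:
        raise ValueError("need at least one observed duration")
    slowest = max(job_durations_s)
    return slowest * (1 + headroom) <= LAMBDA_MAX_TIMEOUT_S
```

A pipeline that occasionally takes 12 minutes fails this check at the default headroom. That's deliberate: a job that brushes the cap today has little room to grow.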
3. Do you need a specific runtime environment that Lambda doesn't ship?
Lambda ships a fixed set of managed runtimes on an Amazon Linux base (Amazon Linux 2 or Amazon Linux 2023, depending on the runtime). Some workloads need a system library Lambda doesn't have, a specific FFmpeg build, or a particular Docker base image. You can get there with Lambda container images, but the friction is usually enough that Fargate or ECS is the better move. A custom kernel module is a different case: neither Lambda nor Fargate lets you load one, so that means ECS on EC2.
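When question 3 is in doubt, it can be quicker to ask the runtime than to read about it. The sketch below is a throwaway handler, meant to be deployed on the same managed runtime you'd use for the real service. It reports which executables are on `PATH` and what `/etc/os-release` says the base image is. The `binaries` event key is a made-up input for this probe, not anything Lambda defines.

```python
import shutil


def handler(event, context):
    """Throwaway Lambda handler that reports what the runtime image ships.

    event["binaries"] is a hypothetical input: a list of executables the
    workload needs (for example "ffmpeg"). Each maps to its path or None.
    """
    found = {name: shutil.which(name) for name in event.get("binaries", [])}
    os_release = {}
    try:
        with open("/etc/os-release") as f:
            for line in f:
                key, sep, value = line.strip().partition("=")
                if sep:
                    os_release[key] = value.strip('"')
    except OSError:
        pass  # image without /etc/os-release; report None below
    return {"binaries": found, "os": os_release.get("PRETTY_NAME")}
```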
4. Are you running enough hours per month that "always on" is cheaper than "pay per request"?
This is the only question with real math behind it. The rule of thumb on my team: if a service is busy more than about 40% of the hours in a month, an always-on Fargate task ends up cheaper than Lambda at typical request rates. Below that, Lambda's pay-per-use pricing wins. AWS publishes a pricing calculator; we don't trust it without checking it against our own request-pattern data.
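Question 4's math fits in a few lines. This sketch compares one always-on Fargate task against Lambda billed for the same busy time. The assumptions are worth stating up front:
- The prices are placeholder us-east-1 on-demand x86 list prices as I understand them at time of writing. Check the current pricing pages.
- The model ignores the free tier, Savings Plans, and the extra Fargate tasks you'd need to absorb concurrency peaks.
- Pairing a 1 GB Lambda with a 0.5 vCPU / 1 GB task is my assumption. Use whatever shapes your load test says are equivalent.

```python
HOURS_PER_MONTH = 730  # the month length AWS pricing examples use
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600

# Placeholder prices (USD, us-east-1, x86, on-demand). Assumptions for
# illustration only; check the current Lambda and Fargate pricing pages.
LAMBDA_GB_SECOND = 0.0000166667
LAMBDA_PER_REQUEST = 0.20 / 1_000_000
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445


def lambda_monthly(busy_fraction, mem_gb, avg_duration_s):
    """Lambda cost when billed compute time is `busy_fraction` of the month.

    1.0 means the equivalent of one execution environment busy all month.
    """
    busy_s = busy_fraction * SECONDS_PER_MONTH
    requests = busy_s / avg_duration_s
    return busy_s * mem_gb * LAMBDA_GB_SECOND + requests * LAMBDA_PER_REQUEST


def fargate_monthly(vcpu, mem_gb, tasks=1):
    """Cost of `tasks` Fargate tasks running every hour of the month."""
    hourly = vcpu * FARGATE_VCPU_HOUR + mem_gb * FARGATE_GB_HOUR
    return tasks * hourly * HOURS_PER_MONTH


def breakeven_busy_fraction(lambda_mem_gb, avg_duration_s, vcpu, task_mem_gb):
    """Busy fraction above which one always-on Fargate task is cheaper."""
    # Lambda cost is linear in busy time, so the break-even is a ratio.
    lambda_full_month = lambda_monthly(1.0, lambda_mem_gb, avg_duration_s)
    return fargate_monthly(vcpu, task_mem_gb) / lambda_full_month
```

Consider those placeholder inputs with a 100 ms average invocation. There, `breakeven_busy_fraction(1.0, 0.1, 0.5, 1.0)` comes out at about 37%. At one-second invocations, request charges shrink and it moves to about 41%, close to the 40% rule of thumb.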
One follow-up that sits outside the four: do you need to control instance shape (CPU-to-memory ratio, GPU access, specific networking) more tightly than Fargate exposes? If yes, ECS on EC2.
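The follow-up question usually comes down to whether the shape you want exists on Fargate at all. This sketch assumes the Linux task-size table as I understand AWS documents it (Windows tasks have a narrower set) and that Fargate offers no GPUs. Anything it returns False for is a candidate for ECS on EC2. Verify the table against the current Fargate docs before relying on it.

```python
def _gb_range(lo, hi, step):
    """Whole-GB sizes from lo to hi inclusive."""
    return set(range(lo, hi + 1, step))


# vCPU -> allowed memory (GB) for Linux Fargate tasks, per my reading of
# the documented table at time of writing.
FARGATE_MEMORY_GB = {
    0.25: {0.5, 1, 2},
    0.5: _gb_range(1, 4, 1),
    1: _gb_range(2, 8, 1),
    2: _gb_range(4, 16, 1),
    4: _gb_range(8, 30, 1),
    8: _gb_range(16, 60, 4),
    16: _gb_range(32, 120, 8),
}


def fits_fargate(vcpu, memory_gb, needs_gpu=False):
    """True if the task shape exists on Fargate; False points at ECS on EC2."""
    if needs_gpu:
        return False  # Fargate does not offer GPUs
    return memory_gb in FARGATE_MEMORY_GB.get(vcpu, set())
```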
The flowchart, in three sentences
If you need instance-level control → ECS on EC2. Otherwise, if the workload is cold-start sensitive, runs > 15 min, needs a runtime Lambda doesn't ship, or is busy more than ~40% of the month → Fargate. Else Lambda.
That's it. Four questions, three sentences, one decision.
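Here is the same tree as a function, in case you'd rather keep it in a runbook than on the wall. It's a sketch: the `Workload` fields are hypothetical names for the four questions plus the instance-shape follow-up. The 40% threshold is the team rule of thumb from question 4, not an AWS figure.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical fields: one per question, plus the instance follow-up.
    needs_sub_100ms_cold: bool    # Q1: must be ready in < 100 ms cold
    max_run_minutes: float        # Q2: longest single run
    needs_custom_runtime: bool    # Q3: runtime or libraries Lambda lacks
    busy_fraction: float          # Q4: share of the month spent busy
    needs_instance_control: bool  # GPU, CPU:memory ratio, networking


def pick_compute(w: Workload, busy_threshold: float = 0.40) -> str:
    """Walk the napkin tree and return the compute choice."""
    if w.needs_instance_control:
        return "ECS on EC2"
    rules_out_lambda = (
        w.needs_sub_100ms_cold
        or w.max_run_minutes > 15
        or w.needs_custom_runtime
        or w.busy_fraction > busy_threshold
    )
    return "Fargate" if rules_out_lambda else "Lambda"
```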
Where this breaks
The decision tree assumes you're already inside AWS. If you're greenfield and could go to a different cloud, that's a bigger conversation. The tree also assumes the workload is a service. For batch workloads, AWS Batch (which itself schedules jobs onto EC2, Fargate, or EKS) is the obvious answer, and it isn't in the tree because the question doesn't come up the same way.
It also doesn't address EKS. We use EKS at our company for the workloads where Kubernetes is a hard requirement from elsewhere in the org. That's a different decision than the compute one. If you're standing up a new service and asking "should I use Kubernetes" without an external constraint forcing the answer, the answer is no.
The meeting that doesn't happen anymore
The point of the tree isn't that I think compute decisions are simple. It's that they're not the most expensive decision in the architecture, and they deserve five minutes of room, not forty.
Our team's architecture reviews are now back to spending the bulk of the hour on the actual hard parts — data model, failure modes, observability — instead of relitigating Lambda vs ECS for the fifth time this quarter.
Print the tree. Tape it to the conference room wall. Get the time back.