Step-level checkpointing
Each step is an atomic transaction. When step 7 fails only step 7 retries — the tokens spent on steps 1-6 are never wasted.
AI fails unpredictably. APIs crash, context windows overflow, LLMs get rate limited. Wrap code in functions that checkpoint, wait, and offload without extra infrastructure.

For anyone who is building multi-step AI agents, I highly recommend building it on top of Inngest, the traceability it provides is super useful, plus you get timeouts & retries for free.
01
Stay in your codebase
Add automatic retries to code, whilecontrolling flow during surprise spikes.
02
Serverless-first
Run long running agents on any runtime,even serverless.
03
Agent-native observability
Treat all model calls like first-class events.See everything without extra tooling.
FirstFunctionTo FullProduction
FirstFunction
To FullProduction
Install the SDK. Serve one HTTP endpoint. Wrap any async function.
Quick StartDX humans love, structure agents grok.
Each step is an atomic transaction. When step 7 fails only step 7 retries — the tokens spent on steps 1-6 are never wasted.
Every model call is captured: prompt, response, token count, latency, and cost — automatically. Offloads the wait so serverless functions don't burn compute.
Rate-limit by any identifier. One user's burst can't saturate your OpenAI quota and degrade everyone else. Priority queuing for premium tiers, built in.
Pause mid-workflow for review or approval — hours or days. State maintained automatically. No polling, no cron, no database hacks.
LLM chains run for minutes. Batch pipelines run for hours. Inngest functions run to completion across any host — no 30-second limits.
One command. Live traces, step inspection, event replay, and every prompt/response pair — completely offline, before a single token is spent.
Inngest orchestrates AI workflows by invoking your functions via HTTP between steps. You write workflows as normal async functions and wrap logic in step.run(). Inngest handles retry logic, state, and scheduling between steps — no extra queues, workers, or stateful backends required.
Quick-start guideInngest handles LLM rate limits through built-in throttling and concurrency controls. You can cap simultaneous LLM calls, set per-user or per-tenant rate limits, and queue excess requests rather than dropping them. This prevents hitting provider rate limits at scale without custom infrastructure.
When an agentic workflow fails mid-execution, only the failed step retries — not the entire workflow. Inngest tracks completed steps and resumes from the point of failure. No work is duplicated and no state is lost.
Inngest works with serverless platforms by invoking functions via HTTP, so they run on any platform that serves HTTP requests. step.ai.infer offloads LLM inference to Inngest's infrastructure, pausing your function during the request so you don't pay for idle serverless execution time.
Yes, Inngest's Dev Server runs locally and provides full step-by-step execution traces, the ability to replay runs, and re-trigger functions — all before deploying to production.
Yes. Inngest supports workflows that run for hours or days. Functions can pause indefinitely — waiting for human input, external events, or slow inference — and resume exactly where they left off with no timeout constraints on workflow duration.