JamJet Benchmarks — Agent Framework Overhead Comparison

Framework orchestration overhead — JamJet vs LangGraph vs raw LLM call.
All runners make the same LLM call through the same client. The difference is pure framework tax.

llama3.2 · Ollama · Apple M-series · 2026-03-08

Model: llama3.2 · Endpoint: http://localhost:11434/v1 (Ollama) · Runs: 20 (+3 warmup)

Framework        mean (ms)   median    p95      p99     stdev   overhead
Raw (baseline)      947.2     943.7    970.3    972.2     9.9
JamJet 0.1.1        948.6     948.2    959.0    964.2     6.0    +1.4 ms
LangGraph           944.0     943.0    953.8    961.1     8.1    -3.2 ms

Note: All three runners are within measurement noise (~1 ms of each other). JamJet's in-process executor adds no observable overhead over a raw LLM call.

qwen3:8b (thinking mode) · Ollama · Apple M-series · 2026-03-08

Model: qwen3:8b · Endpoint: http://localhost:11434/v1 (Ollama) · Runs: 15 (+3 warmup)

Framework        mean (ms)   median     p95       p99      stdev   overhead
Raw (baseline)     8429.5    8303.4    8940.3    9427.6    352.3
JamJet 0.1.1      10140.1   10139.1   10487.0   10519.5    285.1   +1710.6 ms
LangGraph         11902.9   11923.3   12761.8   12823.5    551.7   +3473.3 ms

Note: qwen3:8b generates variable-length chain-of-thought. High stdev dominates — overhead numbers reflect token generation variance, not framework overhead.
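The summary columns in both tables can be reproduced from a list of per-run latencies. A minimal sketch using only the standard library; the percentile here uses the nearest-rank method, which may differ slightly from whatever interpolation `bench_single_call.py` actually uses:

```python
import statistics

def summarize(latencies_ms):
    """Reduce raw per-run latencies (ms) to mean/median/p95/p99/stdev."""
    xs = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile; the benchmark script may interpolate instead.
        k = max(0, min(len(xs) - 1, round(p / 100 * len(xs)) - 1))
        return xs[k]

    return {
        "mean": statistics.fmean(xs),
        "median": statistics.median(xs),
        "p95": pct(95),
        "p99": pct(99),
        "stdev": statistics.stdev(xs),
    }

def overhead_ms(framework_mean, baseline_mean):
    # e.g. JamJet 948.6 - Raw 947.2 = +1.4 ms on llama3.2
    return framework_mean - baseline_mean
```

The "overhead" column is simply each runner's mean minus the raw baseline's mean, which is why it can go slightly negative when the spread is wider than the true framework cost.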

Vertex AI (Gemini 2.0 Flash) — plan-and-execute agent

End-to-end run: JamJet @task + @tool on Vertex AI's OpenAI-compatible endpoint. Plan-and-execute research agent: plan, execute the planned steps, then synthesize.

Model:        gemini-2.0-flash-001
Provider:     Vertex AI (GCP)
Strategy:     plan-and-execute
Wall-clock:   41,811 ms
Total tokens: 10,961
Est. cost:    $0.00191
Step                        Latency (ms)   Prompt tokens   Compl tokens   Total tokens
plan — Gemini Flash                2,641              96            104            200
step 1 execution                   1,324             129             90            219
step 2 execution                   1,413             127            110            237
step 3 execution                   1,228             132            103            235
step 4 execution                   1,986             664            186            850
step 5 execution                   1,290             280            100            380
synthesize — Gemini Flash          3,050             124            153            277
TOTAL (12 calls)                  41,811           6,121          4,840         10,961
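The cost estimate follows directly from the token totals and per-million-token rates. A sketch of that arithmetic; the rates below are assumptions for illustration only, not published Vertex AI pricing, so verify against the current price list before relying on them:

```python
# Assumed per-1M-token rates (illustrative only; check current Vertex AI pricing).
PROMPT_RATE_PER_M = 0.075      # USD per 1M prompt tokens (assumption)
COMPLETION_RATE_PER_M = 0.30   # USD per 1M completion tokens (assumption)

def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Linear token-based cost estimate: tokens * rate / 1M, per token class."""
    return (prompt_tokens * PROMPT_RATE_PER_M
            + completion_tokens * COMPLETION_RATE_PER_M) / 1_000_000

# Run totals from the table: 6,121 prompt + 4,840 completion tokens.
cost = estimate_cost_usd(6_121, 4_840)
```

Completion tokens dominate the bill here despite being fewer, since output tokens are typically priced several times higher than input tokens.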
Integration — 2 env vars, no custom client
export OPENAI_BASE_URL="https://us-central1-aiplatform.googleapis.com/..."
export OPENAI_API_KEY=$(gcloud auth print-access-token)

# Then just use @task/@tool as normal
@task(model="google/gemini-2.0-flash-001", tools=[web_search])
async def research(question: str) -> str:
    """Research assistant — search first, then summarize."""

Methodology

All benchmarks measure wall-clock time per call. Each framework makes the identical LLM call through the same OpenAI-compatible client — what we measure is framework orchestration overhead.

  • Raw (baseline) — bare openai.OpenAI().chat.completions.create() call
  • JamJet — Workflow.run_sync() in-process executor
  • LangGraph — StateGraph.compile().invoke() with a single node
# Reproduce locally (Ollama)
export OPENAI_API_KEY="ollama"
export OPENAI_BASE_URL="http://localhost:11434/v1"
export MODEL_NAME="llama3.2"
git clone https://github.com/jamjet-labs/jamjet-benchmarks
cd jamjet-benchmarks/benchmarks
pip install -r requirements.txt
python bench_single_call.py --json results/my-run.json
  • Warmup runs excluded from measurements
  • Each timed run is independent — no shared state
  • Benchmarks run sequentially to avoid contention
  • Hardware: Apple M-series, 16GB RAM, Ollama local
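The timing loop described by the bullets above can be sketched as follows. `run_once` is a placeholder for any of the three runner variants (raw client call, `Workflow.run_sync()`, or a compiled single-node `StateGraph`); the real harness lives in `bench_single_call.py`:

```python
import time

WARMUP = 3   # warmup runs, excluded from measurements
RUNS = 20    # independent timed runs (15 for the qwen3:8b table)

def bench(run_once):
    """Time run_once sequentially: warm up, then record one sample per call."""
    for _ in range(WARMUP):
        run_once()                      # not measured
    samples = []
    for _ in range(RUNS):
        t0 = time.perf_counter()
        run_once()                      # each timed run is independent
        samples.append((time.perf_counter() - t0) * 1000.0)  # ms
    return samples
```

Running the three variants back-to-back in one process (rather than in parallel) avoids contention on the local Ollama server, which is what the "run sequentially" bullet refers to.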