Research · Working paper

BatchDAG: LLM-Planned Execution Graphs for Scalable Ad-Hoc Analysis Over Enterprise Data

Anupreet Walia · Brevian.ai · 2026

Abstract. Large language models excel at analyzing individual documents but break down when users ask exhaustive, cross-entity analytical questions over enterprise-scale datasets, for example “Did our account executives open every meeting with effective discovery questions?” asked across 50,000 recorded meetings. The single-agent tool-calling paradigm fails at this scale for three compounding reasons: context window overflow, loss of per-entity attribution in global top-N retrieval, and linear wall-clock time growth from sequential tool calls. We present BatchDAG, a system in which an LLM generates a typed directed acyclic graph (DAG) of operations (SQL queries, semantic searches, in-memory transforms, parallel fan-outs, and single-shot analyses) that a deterministic execution engine then evaluates with topological-wave parallelism. Steps exchange structured JSON rows, never prose summaries, which enables proper joins, filters, and grouping. A key optimization, entity-aware batching, groups input rows by their logical entity before fan-out, reducing LLM calls by up to 47× while preserving per-entity attribution. BatchDAG has been deployed in production at Brevian.ai, processing analytical queries over corpora of 50,000+ meetings and 3,000+ sales opportunities in under 60 seconds.

Up to 47× fewer LLM calls via entity-aware batching

<60s per query over 50K+ meetings

98.8% valid-DAG planning rate (300 calls)

77% transcript evidence rate (vs 46–60% baselines)

The problem

Tool-augmented LLM agents (ReAct, Toolformer, LangChain, LlamaIndex) perform well on queries targeting individual entities or small document sets. But enterprise users increasingly ask exhaustive, cross-entity analytical queries that require processing hundreds to thousands of entities, each needing its own retrieval, contextual analysis, and per-entity attribution: “analyze every deal to see if security insurance was covered,” or “for each meeting, check if the key stakeholder negotiated on price.” These expose three limits of the single-agent loop: context window overflow, loss of per-entity attribution under global top-N search, and linear wall-clock growth.

The approach

BatchDAG decomposes the problem into two phases: an LLM planner that generates a typed DAG of operations from a natural-language query, and a deterministic execution engine that evaluates the DAG with topological-wave parallelism, structured data flow, and entity-aware batching. Each step is one of six typed operations (sql, search, transform, fan_out, analyze, compare), four of which require zero LLM calls during execution. Only fan_out and analyze invoke the model, and entity-aware batching minimizes that cost by grouping rows by their logical entity (meeting, deal, account) before fan-out.
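
The topological-wave evaluation described above can be illustrated with a minimal Python sketch. It is not BatchDAG's engine: the Step fields, the function names topological_waves and execute, the run_step callback, and the example plan are all assumptions made for illustration. Only the six operation type names and the fact that fan_out and analyze are the model-invoking ones come from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# The six operation types named in the text; everything else here is hypothetical.
STEP_TYPES = {"sql", "search", "transform", "fan_out", "analyze", "compare"}
LLM_STEP_TYPES = {"fan_out", "analyze"}  # per the text, only these invoke the model


@dataclass
class Step:
    id: str
    type: str
    depends_on: list = field(default_factory=list)


def topological_waves(steps):
    """Group step ids into waves; each step's dependencies sit in earlier waves."""
    by_id = {s.id: s for s in steps}
    for s in steps:
        if s.type not in STEP_TYPES:
            raise ValueError(f"unknown step type: {s.type}")
        for dep in s.depends_on:
            if dep not in by_id:
                raise ValueError(f"{s.id} depends on unknown step {dep}")
    done, waves, remaining = set(), [], list(steps)
    while remaining:
        ready = [s for s in remaining if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("cycle detected: plan is not a DAG")
        waves.append([s.id for s in ready])
        done.update(s.id for s in ready)
        remaining = [s for s in remaining if s.id not in done]
    return waves


def execute(steps, run_step, max_workers=8):
    """Run the steps of each wave concurrently, one wave after another.

    run_step(step, inputs) receives the rows of the step's dependencies
    and returns a list of row dicts.
    """
    by_id = {s.id: s for s in steps}
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in topological_waves(steps):
            futures = {
                sid: pool.submit(
                    run_step,
                    by_id[sid],
                    {d: results[d] for d in by_id[sid].depends_on},
                )
                for sid in wave
            }
            for sid, future in futures.items():
                results[sid] = future.result()
    return results


# Hypothetical plan: two independent reads, a join, a fan-out, a final analysis.
plan = [
    Step("meetings", "sql"),
    Step("snippets", "search"),
    Step("joined", "transform", ["meetings", "snippets"]),
    Step("per_meeting", "fan_out", ["joined"]),
    Step("summary", "analyze", ["per_meeting"]),
]
waves = topological_waves(plan)
llm_steps = [s.id for s in plan if s.type in LLM_STEP_TYPES]
```

In this toy plan the two read steps share the first wave and can run in parallel, while each later step waits for the wave it depends on. A plan containing a cycle is rejected before anything executes, which is one simple way a planner's output could be checked for validity.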

Key contributions

A typed DAG formalism for decomposing ad-hoc analytical queries into composable operations with structured inter-step data flow.
An entity-aware batching algorithm achieving up to 47× reduction in LLM calls versus row-level batching.
A goal-based planning prompt architecture that outperforms both exhaustive-rules and few-shot example approaches.
A production deployment report on cost, latency, and correctness over enterprise-scale data (50K+ meetings, 3K+ opportunities).
A controlled evaluation showing automatically generated DAG pipelines match expert-designed baselines, with superior provenance and 27% fewer hallucinations via structured intermediates.
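
The entity-aware batching named in the contributions above can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the function names, the max_rows_per_batch budget, the choice to pack several whole entities into one batch, and the toy data are all hypothetical, and the row-level baseline is assumed here to mean one call per row.

```python
from collections import defaultdict


def row_level_batches(rows, batch_size=1):
    """Assumed baseline: chunk rows in arrival order, ignoring their entity."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def entity_aware_batches(rows, entity_key, max_rows_per_batch=50):
    """Group rows by entity, then pack whole entities into batches.

    An entity is never split across batches; an entity larger than the
    budget simply gets a batch of its own.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[entity_key]].append(row)

    batches, current, size = [], [], 0
    for entity_rows in groups.values():
        if current and size + len(entity_rows) > max_rows_per_batch:
            batches.append(current)
            current, size = [], 0
        current.extend(entity_rows)
        size += len(entity_rows)
    if current:
        batches.append(current)
    return batches


# Toy data: 6 meetings with 5 retrieved snippet rows each, interleaved.
rows = [{"meeting_id": f"m{i % 6}", "snippet": f"s{i}"} for i in range(30)]

baseline_calls = len(row_level_batches(rows, batch_size=1))
entity_batches = entity_aware_batches(rows, "meeting_id", max_rows_per_batch=10)
entity_calls = len(entity_batches)
```

With these toy numbers the baseline issues 30 calls and the entity-aware version issues 3. That ratio is a property of the made-up data and budget, not a reproduction of the reported 47× figure. Because every row keeps its meeting_id and each meeting lands in exactly one batch, a model response can still be attributed to a specific entity.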

Why structured intermediates matter

Steps pass structured JSON rows between them, never prose summaries. This is the single most important architectural decision in BatchDAG. When intermediate results were summarized in natural language, downstream steps hallucinated data and lost attribution. Structured rows are less expressive but fully composable: they support real database-style joins, filters, and grouping, and they preserve the provenance chain from source data to final answer.
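
A small Python sketch shows why rows compose where prose does not. The row shapes, field names, and helper functions below are hypothetical stand-ins for the outputs of a sql step and a fan_out step; they are not taken from BatchDAG.

```python
from collections import defaultdict

# Hypothetical output of a sql step: one row per meeting.
meetings = [
    {"meeting_id": "m1", "rep": "Ana", "deal_id": "d1"},
    {"meeting_id": "m2", "rep": "Ana", "deal_id": "d2"},
    {"meeting_id": "m3", "rep": "Raj", "deal_id": "d3"},
]

# Hypothetical output of a fan_out step: one verdict row per meeting,
# carrying the quoted evidence alongside the judgment.
verdicts = [
    {"meeting_id": "m1", "asked_discovery": True,
     "evidence": "What problem are you solving?"},
    {"meeting_id": "m2", "asked_discovery": False, "evidence": None},
    {"meeting_id": "m3", "asked_discovery": True,
     "evidence": "How do you handle this today?"},
]


def inner_join(left, right, key):
    """Database-style inner join of two lists of row dicts on a shared key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_by(rows, key):
    """Group row dicts by the value of one field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


joined = inner_join(meetings, verdicts, "meeting_id")
missed = [row for row in joined if not row["asked_discovery"]]
rate_by_rep = {
    rep: sum(r["asked_discovery"] for r in rep_rows) / len(rep_rows)
    for rep, rep_rows in group_by(joined, "rep").items()
}
```

Every joined row still carries its meeting_id, deal_id, and evidence, so an aggregate such as rate_by_rep can be traced back to the individual meetings behind it. A prose summary of the verdicts would give a downstream step nothing to join or filter on.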

Working paper. arXiv submission pending.