
When professionals across functional units use GenAI agents equipped with tools to query data sources and knowledge bases, the reliability of output is subject to cumulative degradation across multiple stages. Each stage in the agent's pipeline introduces a probability of error, and because these stages are sequential, failures cascade — meaning overall system reliability is roughly the product of per-stage accuracy. An agent that is 90% accurate at each of six stages delivers a correct end-to-end result only ~53% of the time.
This taxonomy traces the lifecycle of an agentic request from formulation through final output, identifying six distinct failure modes. Critically, most of these failures are silent — the agent produces output that looks plausible but is subtly wrong, making traceability and auditability essential design considerations rather than afterthoughts.
Stage: Request formulation by the user
The professional making the request often omits details that are "common sense" to a human colleague but not inferable by an AI agent. This includes implicit assumptions about time periods, business units, metric definitions, data scope, or output format. Because the agent lacks this shared organizational context, it may construct queries to data sources or knowledge bases with missing or incorrect parameters, leading to retrieval of wrong data, files, or document chunks from the very first step.
Why it matters most: This is the highest-leverage failure point. An ambiguous request propagates errors through every downstream stage, and no amount of downstream precision can compensate for a fundamentally misunderstood intent.
Examples:
Mitigation approaches:
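One illustrative mitigation is to validate each request against a required-context schema before dispatching any query, and to ask a clarifying question when fields are missing rather than guessing defaults. The field names below are hypothetical; real schemas are organization-specific:

```python
# Hypothetical required-context fields; real request schemas vary by organization.
REQUIRED_FIELDS = ["time_period", "business_unit", "metric_definition", "data_scope"]

def missing_context(request_params: dict) -> list:
    """Return the required fields the user's request left unspecified."""
    return [f for f in REQUIRED_FIELDS if not request_params.get(f)]

def formulate_or_clarify(request_params: dict) -> dict:
    """Ask a clarifying question instead of silently filling gaps with defaults."""
    gaps = missing_context(request_params)
    if gaps:
        return {"action": "ask_user", "missing": gaps}
    return {"action": "dispatch_query", "params": request_params}
```

The key design choice is that missing context blocks execution entirely: the agent surfaces the gap to the user instead of inventing a plausible default that would silently corrupt every downstream stage.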
Stage: Agent planning and tool orchestration
Even when the request is well-understood, there may be ambiguity in the sequence of actions the agent should take. The agent must decide which tools to invoke, in what order, and how to handle conditional logic that depends on intermediate results. It may select an incorrect sequence, skip a necessary validation step, combine tool outputs inappropriately, or fail to branch correctly when an intermediate result changes the required approach.
Why it's hard: This is fundamentally a planning problem. Current LLM-based agents handle simple linear workflows adequately but struggle with conditional branching, especially when the branching conditions depend on data only available mid-execution. The risk is not just inefficiency — the agent follows a plausible-looking but incorrect path, and the user has no visibility into that routing decision.
Examples:
Mitigation approaches:
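One mitigation is to make the plan an explicit, auditable artifact and lint it before execution. A minimal sketch, assuming a hypothetical step schema in which validation steps declare which tool outputs they cover:

```python
# Illustrative plan linter: flag synthesis steps that consume tool output
# no validation step has covered. The step schema here is an assumption.
def plan_issues(steps: list) -> list:
    issues, validated = [], set()
    for step in steps:
        if step.get("kind") == "validate":
            validated.update(step.get("covers", []))
        if step.get("kind") == "synthesize":
            for dep in step.get("uses", []):
                if dep not in validated:
                    issues.append(f"{step['name']} uses unvalidated output: {dep}")
    return issues

plan = [
    {"name": "fetch_revenue", "kind": "tool"},
    {"name": "check_revenue", "kind": "validate", "covers": ["fetch_revenue"]},
    {"name": "write_summary", "kind": "synthesize", "uses": ["fetch_revenue"]},
]
```

Beyond catching skipped validation steps, an explicit plan representation gives the user visibility into the routing decision that would otherwise remain hidden.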
Stage: Fetching data from sources (structured and unstructured)
This failure mode is most pronounced with unstructured text and less common with well-structured databases. It manifests in two related but distinct ways.
Missing or incomplete metadata at the file level. Enterprise file systems are filled with invoices, reports, contracts, and other documents that lack consistent metadata — no standardized naming conventions, no reliable date tags, no entity or department labels, or metadata that has drifted out of sync with actual content. When an agent uses grep-like search or file-system traversal to locate the right document, missing metadata means it is essentially pattern-matching on filenames, folder paths, or sparse text snippets. This frequently leads to retrieval of the wrong file entirely — an invoice from the wrong vendor, a report from the wrong period, or a draft mistaken for a final version — with no signal to the agent that it has the wrong document in hand.
Loss of context at the chunk level. Even when the correct document is located, RAG systems that chunk documents for embedding may strip away essential context. Section headers, document titles, table captions, and surrounding paragraphs that clarify scope and meaning are often lost at chunk boundaries. The agent then retrieves a passage that appears relevant based on semantic similarity but is missing the qualifiers that would reveal it applies to a different business unit, time period, or product line.
Examples:
Mitigation approaches:
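A common mitigation for chunk-level context loss is to enrich each chunk with its document title and section header before embedding, so scope qualifiers survive chunking. A minimal sketch; the bracketed prefix format is an illustrative choice, not a standard:

```python
def contextualize_chunks(doc_title: str, sections: list) -> list:
    """Prepend document title and section header to every chunk so that
    qualifiers like business unit or time period reach the embedding model.
    `sections` is a list of (header, [chunk, ...]) pairs."""
    return [f"[{doc_title} > {header}] {chunk}"
            for header, chunks in sections
            for chunk in chunks]
```

With this enrichment, a passage such as "Revenue grew 8%." embeds and retrieves as "[FY2024 Annual Report > EMEA Segment] Revenue grew 8%.", preserving the scope that a bare chunk would lose.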
Stage: Parsing and extracting specific values from retrieved content
Even when the correct document or chunk has been retrieved, extracting precise data points — particularly from unstructured formats — is error-prone. Numbers embedded in PDF tables, figures, footnotes, or complex layouts are especially vulnerable. Document processing pipelines may misparse table structures, confuse row-column alignment, misread OCR'd text, or fail to handle merged cells, multi-page tables, and nested headers.
Why it's dangerous: Table extraction from PDFs remains genuinely brittle even with modern tooling. The agent returns a number with full confidence, but it may be from the wrong row, the wrong table, or a misread digit — and nothing in the output signals this to the user.
Examples:
Mitigation approaches:
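One lightweight safeguard is to cross-check parsed table rows against totals printed in the document itself; a mismatch signals misaligned columns or a misread digit. A sketch, assuming the stated total was extracted alongside the row:

```python
def row_total_ok(row_values: list, stated_total: float, tol: float = 0.01) -> bool:
    """Sanity-check a parsed table row against the total printed in the
    document. A failure means the extraction should be flagged, not trusted."""
    return abs(sum(row_values) - stated_total) <= tol
```

This catches only arithmetic-inconsistent misparses, but it converts a silent error into a loud one, which is the core goal at this stage.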
Stage: Interpreting the meaning of extracted values
Even when extraction succeeds mechanically, the agent may misinterpret the semantic meaning of the data. This is particularly common with numbers in unstructured text, where the same figure might represent different things depending on context — units (thousands vs. millions), time frames (annual vs. quarterly), accounting treatments (GAAP vs. non-GAAP), or measurement definitions that vary across documents or departments.
Examples:
Mitigation approaches:
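One partial mitigation is to normalize scale qualifiers before any downstream arithmetic, so values are compared like with like. A minimal sketch with an intentionally small lexicon; real documents need far richer handling (currency symbols, "$ in thousands" table headers, locale-specific formats):

```python
import re

# Illustrative scale lexicon; an assumption, not an exhaustive list.
SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

def normalize_amount(text: str):
    """Resolve qualifiers like '4.2 million' into an absolute number.
    Returns None when no scale word is found, forcing an explicit decision
    rather than a silent unit mismatch."""
    m = re.search(r"([\d][\d,\.]*)\s*(thousand|million|billion)s?", text, re.I)
    if not m:
        return None
    value = float(m.group(1).replace(",", ""))
    return value * SCALES[m.group(2).lower()]
```

Returning None for unqualified figures is deliberate: an explicit gap is easier to audit than a number silently interpreted at the wrong scale.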
Stage: Synthesizing outputs from multiple sources into a coherent response
This failure mode emerges when each individual retrieval and extraction step succeeds in isolation, but the agent incorrectly combines outputs drawn from different sources. The resulting synthesis may mix data from incompatible time periods, different organizational scopes, inconsistent methodologies, or mismatched definitions — producing an answer that is internally incoherent even though each component is individually correct.
Why it's distinct: This is not a failure of any single step but a failure of integration. It is especially insidious because every source citation checks out on inspection; the error lies in the act of combining them.
Examples:
Mitigation approaches:
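One mitigation is to tag every extracted fact with its scope metadata and refuse to combine facts whose tags conflict. A sketch with illustrative keys; real deployments would derive these from their own metadata model:

```python
def scope_conflicts(facts: list, keys: tuple = ("period", "scope", "basis")) -> list:
    """Before combining values from different sources, verify they share the
    same time period, organizational scope, and accounting basis.
    Returns a list of human-readable conflicts (empty means compatible)."""
    conflicts = []
    for k in keys:
        values = {f[k] for f in facts if k in f}
        if len(values) > 1:
            conflicts.append(f"{k}: {sorted(values)}")
    return conflicts
```

A non-empty result should block synthesis and surface the conflict to the user, since every individual citation would otherwise check out on inspection.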
Unlike traditional software, which fails loudly with explicit errors, these failure modes produce output that looks reasonable and reads confidently. The mitigation strategy cannot rely solely on making the agent more capable — it must also make the agent's reasoning and sourcing transparent, so that a human can audit the chain from request to final output.
If each of the six stages operates at some accuracy rate, overall reliability is approximately the product of those rates. This framing makes explicit that even modest per-stage improvements yield significant gains in end-to-end reliability — and conversely, that a single weak stage can undermine an otherwise strong pipeline.
| Per-Stage Accuracy | End-to-End Reliability (6 stages) |
| --- | --- |
| 99% | ~94% |
| 95% | ~74% |
| 90% | ~53% |
| 85% | ~38% |
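The table values follow directly from the product rule; a few lines of Python reproduce them:

```python
# End-to-end reliability as the product of per-stage accuracies (here, 6 equal stages).
def end_to_end(per_stage: float, stages: int = 6) -> float:
    return per_stage ** stages

for p in (0.99, 0.95, 0.90, 0.85):
    print(f"{p:.0%} per stage -> {end_to_end(p):.0%} end to end")
```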
Errors detected at stages 4 through 6 — extraction failures, semantic misinterpretations, and composition mismatches — represent some of the most valuable signal available for improving the system over time. Rather than treating each failed interaction as an isolated incident, organizations should aggregate this feedback and route it back into earlier stages of the pipeline.
Why stages 4–6 specifically? These are the stages where errors are most likely to be caught during human review, because they produce concrete, verifiable outputs — a number, a label, a synthesized conclusion — that a domain expert can evaluate against their own knowledge. Earlier stages (retrieval, routing) produce intermediate artifacts that are harder to audit in isolation. Late-stage corrections therefore serve as a natural quality signal for the entire upstream chain.
How feedback should flow:
Implementation considerations:
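As a sketch of how such feedback might be structured, the correction record and routing table below are illustrative assumptions, not a standard: the idea is that a correction caught at stage 4, 5, or 6 names the upstream stage it implicates, so fixes accumulate where the root cause lives.

```python
from dataclasses import dataclass

# Illustrative feedback record; field names are assumptions for this sketch.
@dataclass
class CorrectionEvent:
    request_id: str
    detected_stage: int    # 4, 5, or 6: extraction, interpretation, synthesis
    suspected_stage: int   # upstream stage the correction implicates
    wrong_value: str
    corrected_value: str
    source_doc: str

# Hypothetical mapping from implicated stage to the improvement queue it feeds.
STAGE_QUEUES = {1: "prompt_templates", 2: "planner_evals", 3: "retrieval_index",
                4: "parser_fixtures", 5: "unit_lexicon", 6: "synthesis_checks"}

def route(event: CorrectionEvent) -> str:
    """Send a late-stage correction to the upstream queue that should learn from it."""
    return STAGE_QUEUES[event.suspected_stage]
```

For example, a misread table figure caught during human review (detected at stage 4) that traces back to a wrong document being retrieved would be routed to the retrieval index's improvement queue rather than the parser's.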