CyDRA: Diagnosis-Guided Zero-Shot Prompting for Text-to-Cypher Generation

ICML 2026 Workshop - Failure Modes in Agentic AI: Website

Abstract

Execution accuracy alone is a terminal-score metric: it reveals that an LLM-generated Cypher query is wrong, but not why or in what systematic way. We present CyDRA, a Cypher Diagnostic Rule Alignment framework that operationalizes Text-to-Cypher failure as a taxonomy of 11 structurally grounded error categories covering graph pattern construction, aggregation, deduplication, and return projection, each with deterministic triggering conditions derived from AST-lite comparison against gold queries. This taxonomy serves simultaneously as a trace-level diagnostic and as the input to a mitigation strategy: CyDRA distills offline failure signal into a frozen natural-language prompt policy that requires no retrieved examples. On the CypherBench benchmark across seven held-out graph domains, CyDRA improves execution accuracy by 8.92 percentage points on average over a zero-shot baseline across five LLMs, whereas a documentation-only policy lacking diagnostic evidence yields no consistent gain, offering controlled evidence of both effective and ineffective intervention directions. We offer this framework as a case study in how failure taxonomies with operational definitions can close the loop from trace-level diagnostics to verifiable prompt interventions.

Dayeon Shin
Dayeon Shin
M.S. Student

Connecting artificial intelligence and mathematics, in both directions.

Donghun Lee
Donghun Lee
Associate Professor

Connecting artificial intelligence and mathematics, in both directions.