Posted: 2025-04-13 20:03:23 UTC

This article contains some claims that remain unverified. While much of the content may be accurate, exercise care when relying on this information.
Status
Last Updated: 2025-04-13 20:04:14 UTC
Verified By: Rollup News
This paper introduces QuestBench, a benchmark for evaluating whether LLMs can pinpoint the single, crucial question needed to solve logic, planning, or math problems. It does so by framing underspecification as a Constraint Satisfaction Problem (CSP). The study finds that models struggle with logic and planning despite succeeding in math.
Framing underspecification as 1-sufficient Constraint Satisfaction Problems enables rigorous testing of whether LLMs can identify the information they need.
QuestBench objectively measures how well LLMs ask clarifying questions, revealing low accuracy (40-50%) in the more complex logic and planning domains.
Correlating accuracy with CSP factors suggests LLMs use domain-specific strategies for information gathering.
LLMs struggle with logic and planning tasks, achieving only 40-50% accuracy.
The core task is identifying the single, crucial question needed to solve a reasoning problem.
Models have difficulty identifying the right question to ask, even when they can solve the fully specified version of the problem.
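
To make the "1-sufficient CSP" framing above concrete, here is a minimal sketch in Python. It is not the benchmark's own code. It assumes a simplified model in which each constraint is a rule of the form "this variable can be computed once these other variables are known", and a question is "sufficient" if learning that one variable makes the target computable. The names `derivable`, `sufficient_questions`, and the pricing example are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumption: a rule (output, inputs) means the output variable can be
# computed as soon as every input variable has a known value.
Rule = Tuple[str, FrozenSet[str]]


def derivable(known: Set[str], rules: List[Rule]) -> Set[str]:
    """Forward-chain over the rules and return every computable variable."""
    closure = set(known)
    changed = True
    while changed:
        changed = False
        for output, inputs in rules:
            if output not in closure and inputs <= closure:
                closure.add(output)
                changed = True
    return closure


def sufficient_questions(
    known: Set[str], rules: List[Rule], target: str, askable: Iterable[str]
) -> List[str]:
    """Return the askable variables whose value alone would unlock the target."""
    if target in derivable(known, rules):
        return []  # the problem is already fully specified
    return sorted(
        v
        for v in askable
        if v not in known and target in derivable(known | {v}, rules)
    )


# Hypothetical toy problem: total depends on price and quantity,
# and price depends on base and tax. The tax value is missing.
RULES: List[Rule] = [
    ("total", frozenset({"price", "quantity"})),
    ("price", frozenset({"base", "tax"})),
]
KNOWN = {"base", "quantity"}
CANDIDATES = ["tax", "shipping", "quantity"]

questions = sufficient_questions(KNOWN, RULES, "total", CANDIDATES)
print(questions)
```

In this toy problem only `tax` qualifies: `quantity` is already known, and `shipping` does not appear in any rule that leads to `total`. A model evaluated in this style would be scored on whether it picks such a variable to ask about, rather than on whether it can compute the answer once everything is given.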