QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? - Rollup News

Posted: 2025-04-13 20:03:23 UTC

Rohan Paul (@rohanpaul_ai)

#Logic

#Mathematics

#LLMs

#Planning

#ConstraintSatisfactionProblem

#QuestBench

#ReasoningTasks

#InformationNeeds

Read With Caution

This article contains some claims that remain unverified. While much of the content may be accurate, exercise care when relying on this information.

Verification Details

Status

In Progress

Last Updated

2025-04-13 20:04:14 UTC

Verified By

Rollup News

TL;DR

This paper introduces QuestBench, a benchmark that evaluates whether LLMs can pinpoint the single, crucial question needed to solve an underspecified logic, planning, or math problem. It does so by framing underspecification as a Constraint Satisfaction Problem (CSP). The study reports that models struggle with logic and planning despite succeeding in math.

Key Impact Areas

Framing underspecification as 1-sufficient Constraint Satisfaction Problems enables rigorous testing of whether LLMs can identify the information they need.

QuestBench objectively measures how well LLMs ask clarifying questions, revealing low accuracy (40-50%) in complex logic/planning domains.

Correlating accuracy with CSP factors suggests LLMs use domain-specific strategies for information gathering.
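
To make the "1-sufficient" framing concrete, here is a minimal Python sketch. It assumes that a problem counts as 1-sufficient when learning the value of exactly one unknown variable is enough to determine the target. The rule format, the variable names, and the helpers `derivable` and `sufficient_questions` are hypothetical illustrations, not taken from the paper or its code.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical rule format: (output_variable, input_variables).
# If every input is known, the output can be computed.
Rule = Tuple[str, Tuple[str, ...]]


def derivable(known: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Forward-chain over the rules until no new variable can be determined."""
    result = set(known)
    changed = True
    while changed:
        changed = False
        for out, inputs in rules:
            if out not in result and all(v in result for v in inputs):
                result.add(out)
                changed = True
    return result


def sufficient_questions(
    known: Set[str], rules: List[Rule], target: str, askable: Set[str]
) -> List[str]:
    """Return the askable variables whose value alone would make the target derivable."""
    if target in derivable(known, rules):
        return []  # nothing to ask: the problem is already fully specified
    return sorted(
        v
        for v in askable
        if v not in known and target in derivable(known | {v}, rules)
    )


# Toy problem (assumed for illustration): total depends on price and qty,
# and price depends on base and tax. Only base and qty are given.
rules: List[Rule] = [
    ("total", ("price", "qty")),
    ("price", ("base", "tax")),
]
questions = sufficient_questions(
    known={"base", "qty"},
    rules=rules,
    target="total",
    askable={"tax", "color"},
)
print(questions)
```

In this toy setup, asking for `tax` unlocks `price` and then `total`, while asking for `color` unlocks nothing, so a model that asks about `color` would be scored as wrong under this kind of check.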

Challenges

LLMs struggle with logic and planning tasks, achieving only 40-50% accuracy.

Identifying the single, crucial question needed to solve reasoning tasks.

Models have difficulty identifying the right question to ask, even when they can solve the fully specified version of the problem.
