A Benchmark for Evaluating LLMs' Ability to Request Missing Information in Math Problems

Code: https://github.com/frinkleko/LLM-Fail-to-Acquire-Context

Xinjie Shen ([email protected]) Georgia Institute of Technology 10 June 2025


Welcome to the first blog in the Fragile LLM series! In this post, we'll explore two intriguing challenges faced by Large Language Models (LLMs):

<aside> 💡

  1. LLMs tend to answer directly when crucial context is missing, and their performance degrades significantly as a result.
  2. LLMs may fall into a special type of hallucination, treating an assumption as given and then answering directly when crucial context is missing, which leads to completely wrong answers. </aside>

Introduction

Large Language Models (LLMs) have transformed many fields with their impressive capabilities in understanding and generating language. However, a common frustration arises when LLMs provide mismatched implementations or make uncontrolled assumptions, leading to unexpected and undesirable results. This challenge often occurs when LLMs encounter incomplete or ambiguous inquiries, which users may unconsciously provide due to their interaction habits and expectations.

To investigate this, we've developed a benchmark to evaluate how well LLMs can identify missing context in math problems and actively seek out additional information. This benchmark focuses on mathematical problems, where precise conditions are essential for accurate solutions.

Benchmark

In recent years, the application of LLMs in solving mathematical problems has gained significant attention. Numerous datasets and specialized models have emerged, accompanied by rich, human-verified benchmarks. We adopt a high-quality, verified dataset collection to build our benchmark.

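Here's an example record, showing the original question, the extracted condition, and the incomplete question with that condition removed: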

{
    "original_question": "Consider a function $f(x)$ defined over $\\mathbb{R}$ such that for any real number $x$, it satisfies $f(x) = f(x - 2) + 3$, and $f(2) = 4$. Find the value of $f(6)$.",
    "condition": "f(x) = f(x - 2) + 3",
    "incomplete_question": "Consider a function $f(x)$ defined over $\\mathbb{R}$ such that $f(2) = 4$. Find the value of $f(6)$.",
    "answer": "10"
}
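Working the example through shows why the extracted condition matters. With the recurrence $f(x) = f(x - 2) + 3$ available, applying it twice starting from $f(2) = 4$ gives the listed answer:

```latex
\begin{align*}
f(4) &= f(2) + 3 = 4 + 3 = 7, \\
f(6) &= f(4) + 3 = 7 + 3 = 10.
\end{align*}
```

In the incomplete question only $f(2) = 4$ remains. That single value places no constraint on $f(6)$, so the problem has no determined answer, and the appropriate response is to ask for the missing relation rather than to assume one.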

Key Properties

The resulting benchmark has the following key properties:

  1. Condition Extraction: Each question includes a critical condition necessary for solving the problem.
  2. Unsolvable without Condition: The modified question cannot be answered correctly without the extracted condition.
  3. Ground Truth Answers: Each question has a verified answer for reliable evaluation.
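The properties above can be partly sanity-checked at the record level. The following is a minimal sketch, not the benchmark's actual tooling: the helper name `check_record` and the substring-based checks are assumptions made for illustration, using the field names from the example record. It only verifies surface structure (the condition appears in the original question, is absent from the incomplete one, and an answer is present). Whether a question is truly unsolvable without the condition still needs human or model verification.

```python
# Field names taken from the example record; the checks themselves are hypothetical.
REQUIRED_FIELDS = ("original_question", "condition", "incomplete_question", "answer")

EXAMPLE = {
    "original_question": (
        r"Consider a function $f(x)$ defined over $\mathbb{R}$ such that for any "
        r"real number $x$, it satisfies $f(x) = f(x - 2) + 3$, and $f(2) = 4$. "
        r"Find the value of $f(6)$."
    ),
    "condition": "f(x) = f(x - 2) + 3",
    "incomplete_question": (
        r"Consider a function $f(x)$ defined over $\mathbb{R}$ such that "
        r"$f(2) = 4$. Find the value of $f(6)$."
    ),
    "answer": "10",
}


def check_record(record: dict) -> list[str]:
    """Return a list of surface-level problems; an empty list means none found."""
    problems = []

    # Property 3 (surface form): every field, including the answer, is non-empty.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing or empty field: {field}")
    if problems:
        return problems

    condition = record["condition"]

    # Property 1 (surface form): the condition is literally part of the original.
    if condition not in record["original_question"]:
        problems.append("condition does not appear in original_question")

    # Property 2 (surface form): the condition was actually removed.
    if condition in record["incomplete_question"]:
        problems.append("condition still appears in incomplete_question")

    return problems


if __name__ == "__main__":
    print(check_record(EXAMPLE))
```

A literal substring match is deliberately strict: it would flag a record whose condition was paraphrased when extracted, which may or may not be desirable depending on how the conditions were written.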

Experiment