NovelQA

A long-context benchmark of 2,305 questions over novels averaging more than 200K tokens, covering question answering, retrieval, and reasoning.


NovelQA is a long-context evaluation benchmark built around full-length novels. I led the design of the benchmark during 2023-2024, focusing on tasks that require models to retain, retrieve, and reason over information distributed across very long narratives.

The benchmark contains 2,305 questions and is designed to test several dimensions of long-context ability rather than reduce performance to a single metric. It aims to capture whether a model can locate relevant evidence, maintain narrative consistency, and answer questions whose supporting information appears far apart in the source text.

Paper

NovelQA Online Testbench