Ruoxi Ning

ICLR 2025

NovelQA

A long-context benchmark with 2,305 questions over novels averaging 200K+ tokens, covering question answering, retrieval, and reasoning.

VLM-Lens

Probes and test cases for Qwen-VL and InternLM, with a focus on color naming, language gaps, and modality gaps in vision-language models.

GLoRE

A unified logical reasoning benchmark and evaluation pipeline across datasets such as LogiQA, ReClor, FOLIO, AR-LSAT, ProofWriter, and RuleTaker.

Latinate-Germanic Ratio

A linguistically grounded metric for genre formality, built from 40K etymology entries and Brown-family corpora, that correlates strongly with existing formality scores.
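As a rough illustration of the idea, the sketch below assumes the ratio is simply the number of Latinate-origin tokens divided by the number of Germanic-origin tokens in a text. That definition, the function name `latinate_germanic_ratio`, and the tiny `TOY_ETYMOLOGY` lexicon are all assumptions made for this example; the project's actual formula, tokenization, and 40K-entry resource may differ.

```python
from collections import Counter

# Hypothetical toy lexicon for illustration only; it stands in for the
# much larger etymology resource described above.
TOY_ETYMOLOGY = {
    "commence": "latinate",
    "obtain": "latinate",
    "assistance": "latinate",
    "inquiry": "latinate",
    "begin": "germanic",
    "get": "germanic",
    "help": "germanic",
    "ask": "germanic",
}


def latinate_germanic_ratio(tokens, etymology):
    """Return Latinate count / Germanic count for a list of tokens.

    Tokens missing from the lexicon are ignored. This is an assumed
    definition of the ratio, not necessarily the project's own.
    """
    counts = Counter(etymology.get(token.lower()) for token in tokens)
    latinate = counts["latinate"]
    germanic = counts["germanic"]
    if germanic == 0:
        # No Germanic tokens: the ratio is undefined, so report infinity
        # when any Latinate token is present and 0.0 otherwise.
        return float("inf") if latinate else 0.0
    return latinate / germanic


formal = "Commence the inquiry and obtain assistance".split()
casual = "Begin to ask and get help".split()
print(latinate_germanic_ratio(formal, TOY_ETYMOLOGY))
print(latinate_germanic_ratio(casual, TOY_ETYMOLOGY))
```

Under this assumed definition, a text dominated by Latinate vocabulary scores higher than one dominated by Germanic vocabulary, which is the intuition behind using the ratio as a formality signal.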
