GLoRE

A unified logical reasoning benchmark and evaluation pipeline across datasets such as LogiQA, ReClor, FOLIO, AR-LSAT, ProofWriter, and RuleTaker.


GLoRE is a logical reasoning project focused on evaluating model performance consistently across a diverse set of benchmarks. I contributed to the benchmark integration and the evaluation pipeline, which together cover LogiQA, ReClor, FOLIO, AR-LSAT, ProofWriter, and RuleTaker.

The project's main value lies in unifying comparisons across reasoning settings that are usually studied in isolation: normalizing the datasets into one evaluation pipeline makes it easier to analyze a model's strengths, weaknesses, and transfer patterns in logical reasoning.
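To make the idea of a unified pipeline concrete, here is a minimal sketch of how heterogeneous benchmarks can be normalized into one schema and scored with a single evaluation loop. All names, fields, and adapters below are illustrative assumptions for this page, not GLoRE's actual API.

```python
# Hypothetical sketch of a unified evaluation pipeline.
# The schema and adapter functions are illustrative assumptions,
# not GLoRE's real interfaces.
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    # Common schema that every dataset is normalized into.
    context: str
    question: str
    options: List[str]
    answer: int  # index of the gold option


def adapt_multiple_choice(raw: dict) -> Example:
    # Assumed ReClor/LogiQA-style record with a "label" gold index.
    return Example(raw["context"], raw["question"], raw["answers"], raw["label"])


def adapt_entailment(raw: dict) -> Example:
    # Assumed ProofWriter/RuleTaker-style true/false entailment record,
    # mapped onto the same multiple-choice schema.
    return Example(raw["theory"], raw["question"], ["True", "False"],
                   0 if raw["answer"] else 1)


def evaluate(model: Callable[[Example], int],
             examples: Iterable[Example]) -> float:
    # Accuracy computed under the shared schema, so scores are
    # directly comparable across datasets.
    examples = list(examples)
    correct = sum(model(ex) == ex.answer for ex in examples)
    return correct / len(examples)
```

A trivial baseline that always picks the first option can then be scored on any adapted dataset with `evaluate(lambda ex: 0, examples)`, which is the kind of apples-to-apples comparison the unified pipeline is meant to enable.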
