# VLM-Lens

Probes and test cases for Qwen-VL and InternLM, with a focus on color naming, language gaps, and modality gaps in vision-language models.


VLM-Lens studies the behavior of vision-language models through targeted probes and evaluation cases. My contribution focused on building and analyzing examples for systems such as Qwen-VL and InternLM.

The project emphasizes several failure modes that are especially interesting from a linguistic and interpretability perspective: color-naming behavior, cross-lingual inconsistencies, and gaps between visual and textual competence (the modality gap).
