Latinate-Germanic Ratio
A linguistically grounded metric for genre formality, built from roughly 40K etymology entries and Brown-family corpora, that correlates strongly with existing formality scores.
Latinate-Germanic Ratio is a computational linguistics project on formality and lexical origin. I led the work during 2022-2023, using roughly 40K etymology entries together with Brown-family corpora to build a genre-sensitive formality signal.
The project explores whether lexical origin can serve as a meaningful quantitative cue for style and register. The resulting metric showed strong correlation with existing formality scores, while also remaining interpretable from a linguistic perspective.
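To make the idea concrete, here is a minimal sketch of how such a ratio could be computed for a tokenized text. It is an illustration under stated assumptions, not the project's actual implementation: the tiny `ETYMOLOGY` table, the function name `latinate_germanic_ratio`, and the choice of a plain Latinate-to-Germanic count ratio are all hypothetical stand-ins for the roughly 40K-entry resource and whatever formula the project used.

```python
from collections import Counter

# Hypothetical toy etymology table (assumption); the project itself
# drew on roughly 40K etymology entries.
ETYMOLOGY = {
    "begin": "germanic",
    "commence": "latinate",
    "help": "germanic",
    "assist": "latinate",
    "ask": "germanic",
    "inquire": "latinate",
    "buy": "germanic",
    "purchase": "latinate",
}


def latinate_germanic_ratio(tokens, etymology=ETYMOLOGY):
    """Return (Latinate count) / (Germanic count) for the given tokens.

    Tokens missing from the etymology table are ignored. Returns None
    when no Germanic token is found, since the ratio is undefined then.
    This formula is an assumed definition for illustration only.
    """
    counts = Counter(
        etymology[token.lower()]
        for token in tokens
        if token.lower() in etymology
    )
    germanic = counts["germanic"]
    if germanic == 0:
        return None
    return counts["latinate"] / germanic


# Made-up example sentences, not corpus data.
formal = "we commence and assist then inquire and help".split()
casual = "we begin and help then ask and purchase".split()

print(latinate_germanic_ratio(formal))  # 3 Latinate vs 1 Germanic
print(latinate_germanic_ratio(casual))  # 1 Latinate vs 3 Germanic
```

Computed per genre over a corpus, values like these could then be compared against an existing formality score. Because each value traces back to the origin labels of individual words, the signal stays easy to inspect.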