Unsupervised Discovery of Gendered Language through Latent-Variable Modeling


Studying to what degree the language we use is gender-specific has long been an area of interest in sociolinguistics. Studies have explored, for instance, the speech of male and female characters in film, or the gendered language used when describing male versus female politicians. In this paper, we aim not merely to analyze this phenomenon qualitatively, but to quantify the degree to which the language used to describe men and women differs, and, moreover, whether it differs in a positive or negative way. We propose a novel generative latent-variable model, which we train on a large corpus, that jointly represents adjective (or verb) choice and its sentiment given the natural gender of the head (or dependent) noun. We find significant differences between how male and female nouns are described, and these differences are in line with common gender stereotypes: for instance, positive adjectives used to describe women are more likely to relate to a person's body than adjectives used to describe men.
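To make the joint factorization described above more concrete, here is a toy sketch. It is not the paper's parameterization. It assumes a simple categorical model in which a latent sentiment s is drawn given the noun's gender g, and the adjective a is then drawn given both, so that p(a, s | g) = p(s | g) p(a | s, g). The adjectives, the table names `P_SENT` and `P_ADJ`, and every probability are hypothetical illustrations, not estimates from any corpus.

```python
# Toy sketch of a latent-sentiment model: p(a, s | g) = p(s | g) * p(a | s, g).
# ALL adjectives and numbers below are hypothetical illustrations,
# not parameters from the paper or estimates from any corpus.

SENTIMENTS = ("positive", "negative", "neutral")
ADJECTIVES = ("kind", "cruel", "tall")

# Hypothetical prior over the latent sentiment given the noun's gender: p(s | g).
P_SENT = {
    "male": {"positive": 0.5, "negative": 0.3, "neutral": 0.2},
    "female": {"positive": 0.4, "negative": 0.4, "neutral": 0.2},
}

# Hypothetical emission table p(a | s, g), keyed by (gender, sentiment).
# Each row sums to 1 over ADJECTIVES.
P_ADJ = {
    ("male", "positive"): {"kind": 0.9, "cruel": 0.0, "tall": 0.1},
    ("male", "negative"): {"kind": 0.0, "cruel": 0.9, "tall": 0.1},
    ("male", "neutral"): {"kind": 0.1, "cruel": 0.1, "tall": 0.8},
    ("female", "positive"): {"kind": 0.8, "cruel": 0.0, "tall": 0.2},
    ("female", "negative"): {"kind": 0.0, "cruel": 0.9, "tall": 0.1},
    ("female", "neutral"): {"kind": 0.1, "cruel": 0.1, "tall": 0.8},
}


def joint(adj: str, sentiment: str, gender: str) -> float:
    """p(a, s | g) = p(s | g) * p(a | s, g)."""
    return P_SENT[gender][sentiment] * P_ADJ[(gender, sentiment)][adj]


def marginal(adj: str, gender: str) -> float:
    """p(a | g): sum the latent sentiment out of the joint."""
    return sum(joint(adj, s, gender) for s in SENTIMENTS)


def posterior(adj: str, gender: str) -> dict[str, float]:
    """p(s | a, g) by Bayes' rule: the joint divided by the marginal."""
    z = marginal(adj, gender)
    return {s: joint(adj, s, gender) / z for s in SENTIMENTS}


if __name__ == "__main__":
    for g in ("male", "female"):
        print(g, {a: round(marginal(a, g), 3) for a in ADJECTIVES})
    print(posterior("tall", "male"))
```

Summing out s gives the marginal p(a | g), the quantity one would compare across genders. The posterior p(s | a, g) shows how the latent sentiment behind an observed adjective can be inferred. In a trained model, these tables would be estimated from corpus data rather than set by hand.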

In Proceedings of the Annual Meeting of the Association for Computational Linguistics.