Partiality and Misconception: Investigating Cultural Representativeness in Text-To-Image Models

Lili Zhang February 23, 2024 July 24, 2025

Text-to-image (T2I) models enable users worldwide to create high-definition, realistic images from text prompts, and the underrepresentation and potential misinformation in these images have raised growing concerns. However, few existing works examine cultural representativeness, in particular whether the generated content fairly and accurately reflects global cultures. Combining automated and human methods, we investigate this issue quantitatively along multiple dimensions and evaluate three prevailing T2I models (DALL-E v2, Stable Diffusion v1.5 and v2.1). By introducing the attributes of cultural cluster and subject, we provide a fresh interdisciplinary perspective on bias analysis. We also present the benchmark dataset UCOGC, which encompasses authentic images of unique cultural objects from global clusters.