Unpacking Translation Effects. Influences of Target Language Choice on Topic Modeling in Multilingual Environments

Machine translation is widely used in communication science to consolidate multilingual texts into a single language, not least for exploratory clustering approaches such as multilingual topic modeling. However, it remains unclear how the choice of target language affects topic modeling results. This study examines these effects by comparing two strategies: (a) consolidating texts into one of the original document languages and (b) translating all texts into an intermediary language not present in the dataset under study. To assess the effects, we use a corpus of parallel United Nations texts in Russian and German (N = 3,760). We compare structural topic modeling results obtained after translating the Russian texts into German, chosen as the original target language, with those obtained after translating the entire corpus into English as an intermediary language. The two approaches are compared in terms of feature overlap, topical prevalence, and topical content. The findings show that intermediary-language translation yields a more symmetrical topic distribution and higher overlap in top words, but significantly reduces vocabulary size compared to consolidation into the original language. The results are replicated on a second, bilingual journalistic corpus (N = 434) and validated across different numbers of topics. Finally, we discuss best practices for target language selection in multilingual topic modeling and situate them within recent developments in computational communication science.

Ozornina, N. & Haim, M. (2026). Unpacking Translation Effects. Influences of Target Language Choice on Topic Modeling in Multilingual Environments. Medien & Kommunikationswissenschaft, 74(1), 71-87. https://doi.org/10.5771/1615-634X-2026-1-71

This paper is available with open access.