Assessing the potential reproducibility and replicability of research in Computational Communication Science

When it comes to reproducibility and replicability, some attributes of Computational Communication Science (CCS) as a distinct subfield, such as its use of large-scale media (content) data and complex computational data processing and analysis pipelines, can pose particular challenges (Breuer & Haim, 2024; van Atteveldt et al., 2019). Previous research on open science practices in communication science in general and CCS in particular has highlighted some of those challenges. For example, Haim and Jungblut (2023) document generally low levels of data and code availability across Communication Science as a whole. By comparison, Chan et al. (2024) find a higher data and code sharing rate of about 47% for CCS studies published in the journal Computational Communication Research. However, they were only able to computationally reproduce 20% of these studies, mainly due to incomplete sharing of materials and restricted access to data.
Building on these findings, we used a mixed-methods approach combining a semi-automated systematic literature review with manual content analysis to assess the potential reproducibility and replicability of CCS research. We compiled 22,375 English-language articles (2010–2021) from top Communication journals via Web of Science and filtered them with CCS-specific keywords, arriving at a sample of 6,556 articles. We then manually labeled 1,000 of these articles as empirical CCS papers or not and used these labels to train a Naïve Bayes classifier (83% accuracy). The model identified 2,551 publications as empirical CCS papers, from which we drew a random sample of 500, stratified by publication year. Due to unavailability and misclassification, 24 articles had to be removed, leaving a final sample of N = 476 articles. Two trained coders analyzed these using a two-level codebook covering (1) article characteristics and (2) methodological details per reported study (e.g., data types, collection and analysis methods, data and code sharing; Krippendorff's α ≥ .8 across all categories).
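The semi-automated screening step can be sketched as follows. This is a minimal, self-contained illustration of a multinomial Naïve Bayes text classifier with add-one smoothing, the model family named above; the toy abstracts, labels, tokenization, and class names are invented for this example and do not reflect the actual features or training data used in the study.

```python
import math
from collections import Counter

def tokenize(text):
    """Very simple whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, y in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        n = len(labels)
        self.log_priors = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for tok in tokenize(doc):
                # Add-one smoothing avoids zero probabilities for unseen words.
                score += math.log(
                    (self.word_counts[c][tok] + 1) / (self.totals[c] + v)
                )
            scores[c] = score
        return max(scores, key=scores.get)

# Invented toy training data: abstracts labeled as empirical CCS or not.
train_docs = [
    "automated content analysis of social media data with supervised machine learning",
    "topic modeling of news articles collected via a platform api",
    "a theoretical essay on the history of journalism studies",
    "qualitative interviews exploring newsroom ethics",
]
train_labels = ["empirical_ccs", "empirical_ccs", "other", "other"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
pred = clf.predict("machine learning analysis of large scale social media content")
# pred is "empirical_ccs" for this toy query
```

In the study itself, a classifier of this kind was trained on the 1,000 manually labeled articles and then applied to the remaining keyword-filtered corpus to identify the empirical CCS publications from which the final sample was drawn.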
In terms of data, media content stands out as the most widely used type, featuring in about 56% of studies in our sample. This broad category includes news articles, images, audio, video, and, importantly, also social media posts. The dominance of media content highlights a key challenge for replicability in CCS, as such data is typically proprietary and usually distributed via commercial platforms with access restrictions. The second most used data type, self-report data such as surveys and interviews, accounts for about 31% of studies. While less susceptible to access restrictions, self-reports raise privacy and ethics concerns that can prevent data sharing.
Data collection methods largely mirror the aforementioned data types, with database downloads (26%), API access (25%), and surveys (22%) being the most common. Analytical approaches in CCS are similarly diverse, with content analysis (28%), network analysis (21%), and regression (18%) being most dominant.
A critical finding from our analysis is that nearly 90% of CCS publications do not publicly share their data. Only around 6% provide full datasets, typically through public repositories or supplementary materials. This scarcity of data sharing presents a major barrier to reproducibility and transparency. Legal and ethical concerns, especially regarding proprietary media and sensitive self-report or trace data, may be reasons for this (Akdeniz et al., 2023). Similarly, code sharing remains limited, with over 90% of publications not providing any code and only about 8% sharing their full code.
Overall, our findings echo broader concerns about reproducibility and replicability in the social sciences and suggest additional challenges specific to CCS. To address these challenges, awareness and intrinsic motivation may not be enough (Xu & Zhang, 2024). Adopting the principle of being “as open as possible, as closed as necessary” (Horizon Europe Open Science Policy) could help researchers balance openness with privacy, for example, by sharing processed data or restricting access to sensitive raw data (van Atteveldt et al., 2020). In addition, institutionalized measures, such as requirements by journals and conferences, should be discussed to further increase reproducibility and replicability in CCS.

Knöpfle, P., Breuer, J., & Haim, M. (2026, March). Assessing the potential reproducibility and replicability of research in Computational Communication Science. Presented at the 71st Annual Conference of the DGPuK, Dortmund.