Mario Haim · Academia


Today's study of digital journalism entails a plethora of new forms of research data. For example, traditional survey data can be accompanied by large-scale trace data, full-text availability allows for transnational comparative analyses of news articles, and social-media engagement opens up sophisticated insights into news consumption patterns around the globe. This makes it possible to adequately capture diverse online phenomena, such as multi-channel dissemination, segmented audiences, or social-media fragmentation, both between national media markets and within them.

Yet, despite significant developments in data capturing and open-science endeavors (Dienlin et al., 2020; Haim, 2020; van Atteveldt et al., 2020), such new forms of combined and large-scale datasets also pose a series of challenges to our field's analytics. That is, common procedures, statistics, and coefficients of survey, observational, or content-analytic methods cannot simply be applied to such data for a wide variety of reasons (e.g., validity, reliability, assumption violations, unknown populations). Moreover, open-science principles of pre-registration and sharing (data, materials, trained models) to foster replication may violate privacy or copyright law, particularly when it comes to trace data or news content, or pose considerable ethical challenges. Consequently, such new forms of data are easily kept behind closed doors. Empirical digital journalism research thus lacks guidance not only for adequate validation and analyses but also for adequate sharing and replicability of modern data.

This special issue of Digital Journalism invites scholars to present, provide, and discuss best practices of analyses while implementing open-science principles for the study of online news. While we are open to a wide range of submissions, contributions could, for example, …

- investigate ways to analyze and compare multilingual news content
- discuss pathways to empirically describe comparisons of news consumption across time, settings, levels, or media systems
- simulate outcomes of different approaches to measure news-use fragmentation
- introduce or compare validity measures for third-party datasets on online news
- rethink ways to conduct and report combined manual and automated content analyses
- rethink ways to conduct and report combinations of survey and trace data
- propose modes of comparable analyses for the study of engagement data
- enrich or combine various openly available data sources

To also make this special issue a reference example for open science, we provide an extensive reference dataset for the study of online news, specifically put together for this special issue (Puschmann & Haim, 2020). Authors of accepted proposals will be invited to develop their outline and hypotheses and to register them with the guest editors. Authors then submit their complete manuscript, which, together with the pre-registered materials, will undergo full blind review in accordance with the journal’s peer-review procedure.

We have put together the unique useNews dataset (Puschmann & Haim, 2020). The dataset can (but does not have to) serve as a reference dataset for submissions. It consists of data from three sources.

First, original survey data from the Reuters Digital News Reports of both 2019 and 2020 (Newman et al., 2019, 2020) provides online news outlets used by at least 10 percent of respondents for each of 12 countries (i.e., Australia, Austria, Brazil, Germany, Japan, the Netherlands, Norway, Romania, South Korea, Spain, the UK, and the US) in both years; this selection subsumes 9 languages, a broad variety of global regions and media systems, and is accompanied by well-known variables from the report, such as sociodemographic information, political orientation, or willingness to pay.
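The 10-percent selection rule can be illustrated with a minimal sketch. It assumes one set of outlet names per respondent and ignores survey weights; the function name, the outlet names, and the toy data are hypothetical and do not reflect the actual structure of the survey files.

```python
from collections import Counter


def outlets_above_threshold(responses, threshold=0.10):
    """Return outlets named by at least `threshold` of respondents.

    `responses` holds one set of outlet names per respondent
    (an assumed, simplified layout; survey weights are ignored).
    """
    n = len(responses)
    if n == 0:
        return []
    # Count each outlet at most once per respondent.
    counts = Counter(outlet for used in responses for outlet in set(used))
    return sorted(o for o, c in counts.items() if c / n >= threshold)


# Hypothetical toy data: 20 respondents, made-up outlet names.
toy_responses = (
    [{"outlet_a", "outlet_b"}, {"outlet_a", "outlet_b"}]
    + [{"outlet_a"}] * 3
    + [{"outlet_c"}]
    + [set()] * 14
)

# outlet_a: 5 of 20, outlet_b: 2 of 20, outlet_c: 1 of 20 respondents.
selected = outlets_above_threshold(toy_responses)
print(selected)
```

In this toy example, only outlets reaching the 10-percent mark within the (hypothetical) country sample are retained.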

Second, news articles were collected through the MediaCloud API, an open-source platform for media analysis jointly provided by the MIT Center for Civic Media and the Berkman Klein Center for Internet & Society at Harvard University: 1.74 million articles published by 76 of these news outlets between August 2018 and August 2019, and 1.25 million articles published by 81 different news sources between August 2019 and August 2020. The data include the publication date, author, and topical keywords as well as the articles’ textual content as document-term matrices (DTMs; full texts are unavailable due to legal provisions), which enable analyses of the articles’ textual content.
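To illustrate what kind of analysis remains possible without full texts, the following sketch compares articles by the cosine similarity of their term counts. It assumes a DTM stored as a sparse mapping from article to term counts; this layout, the article identifiers, and the terms are hypothetical and not the dataset's actual file format.

```python
import math


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two sparse term-count rows of a DTM."""
    shared_terms = set(doc_a) & set(doc_b)
    dot = sum(doc_a[t] * doc_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in doc_a.values()))
    norm_b = math.sqrt(sum(v * v for v in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


# Hypothetical toy DTM: rows are articles, columns are terms (sparse dicts).
toy_dtm = {
    "article_1": {"election": 2, "vote": 1},
    "article_2": {"election": 1, "vote": 2},
    "article_3": {"football": 3},
}

sim_related = cosine_similarity(toy_dtm["article_1"], toy_dtm["article_2"])
sim_unrelated = cosine_similarity(toy_dtm["article_1"], toy_dtm["article_3"])
print(round(sim_related, 3), sim_unrelated)
```

Because a DTM discards word order, bag-of-words approaches such as this one, dictionary counts, or topic models are the kind of analyses that such data support.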

Third, for each individual article URL, an array of engagement metrics is included from CrowdTangle, a subsidiary of Facebook. Specifically, useNews includes aggregate numerical engagement data on the number of likes, shares, and comments, as well as reactions of love, wow, haha, sad, and angry.
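As a minimal sketch of how such article-level metrics might be summarized, the following example sums engagement per outlet. The record layout and the field names (`outlet`, `likes`, `shares`, and so on) are assumptions for illustration only; the numbers are made up.

```python
from collections import defaultdict

# Metric names follow the description above; the keys are assumed.
METRICS = ("likes", "shares", "comments", "love", "wow", "haha", "sad", "angry")


def total_engagement_by_outlet(articles):
    """Sum each engagement metric over all articles of an outlet."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for article in articles:
        outlet_totals = totals[article["outlet"]]
        for metric in METRICS:
            # Missing metrics are counted as zero.
            outlet_totals[metric] += article.get(metric, 0)
    return dict(totals)


# Hypothetical toy records, one per article URL.
toy_articles = [
    {"outlet": "outlet_a", "likes": 10, "shares": 2, "comments": 1, "angry": 4},
    {"outlet": "outlet_a", "likes": 5, "shares": 1, "haha": 3},
    {"outlet": "outlet_b", "likes": 7, "comments": 2, "sad": 1},
]

engagement = total_engagement_by_outlet(toy_articles)
print(engagement["outlet_a"]["likes"], engagement["outlet_b"]["sad"])
```

Raw sums like these are not comparable across outlets of different sizes, so any real analysis would likely normalize them, for example per article.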

Special Issue in Digital Journalism, 11(2).

M. Haim & C. Puschmann (2023). Analytical Advances through Open Science: Employing a Reference Dataset to Foster Best-Practice Data Validation, Analysis, and Reporting. Digital Journalism.

This paper is available open access, meaning you can read or download it here.
