A Framework for Ensuring Reproducibility in LLM-Assisted Research
- Peer-Reviewed Presentations
- 2026
- English
The use of large language models (LLMs) is increasing rapidly across scientific disciplines. From refining research questions and conducting efficient literature searches to summarizing prior findings, assisting in study design, programming experiments (including generating stimulus materials), processing and analyzing data, and supporting the reporting and dissemination of results, LLMs have become an integral part of the modern research cycle. Scientists are increasingly embracing LLMs for their potential to save time, reduce effort, and enhance the efficiency and quality of the scientific process. However, these new possibilities also come with challenges, particularly regarding transparency and systematic documentation. Technical barriers, such as frequent LLM updates, and the absence of clear guidelines on how to report LLM use in a research context contribute to inconsistent practices, which threaten the reproducibility of scientific results.
Our group, consisting of researchers from the social and behavioral sciences, approaches this topic from an open-science perspective. In recent years, the scientific community has invested considerable effort into increasing the transparency and reproducibility of research, particularly in response to the so-called replication crisis. Practices such as preregistration, registered reports, and the sharing of data, materials, and analysis code have thus become much more common in many empirical fields and are now increasingly integrated into research workflows, supported by both infrastructure and cultural incentives. However, the integration of LLMs into these workflows has not yet been accompanied by comparable standards for documentation and reporting. As a result, research that incorporates LLMs at any stage of the research cycle may be more difficult to trace and reproduce.
In this talk, we explore the underexamined intersection between LLM usage and reproducibility. We present a practical framework for systematically documenting LLM usage across the entire research cycle, aligning the level of documentation with the extent to which LLM use may shape research findings and their broader implications - including risks of bias, misinterpretation, or flawed conclusions. For instance, using LLMs to extract information from documents requires more detailed documentation than using them only for stylistic text editing, as the former directly affects the data, their analysis, and the interpretation of results. To demonstrate the applicability and usefulness of our framework, we present use cases drawn from our own scientific practice, illustrating how LLMs were used, how their influence should be documented, and what implications this has for reproducibility.
With this contribution, we aim to initiate a dialogue between applied research in the social and behavioral sciences and the research software engineering community. On the one hand, we clarify what researchers should document about LLM usage and how best to do so. On the other hand, we aim to discuss what developers of LLM-based tools and APIs could do to support this endeavor - such as enabling persistent settings files, interaction logs, or metadata export functionalities. We believe that fostering such an exchange can help to lower the barriers to reproducible research practices in LLM-assisted workflows and strengthen transparency and trust in AI-supported research.
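To make the idea of interaction logs concrete, the following is a minimal sketch of what a tool-side logging helper could look like. All field names and the function itself (`log_llm_interaction`) are illustrative assumptions, not an existing API or an agreed-upon schema: each record captures the model identifier and version, the generation parameters, the prompt, and a hash of the response, and is appended to a JSON Lines file that could later be shared alongside data and analysis code.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_llm_interaction(log_path, model, model_version, parameters, prompt, response):
    """Append one LLM interaction record to a JSON Lines log file.

    Hypothetical schema for illustration only; no community standard
    for such logs exists yet.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "model_version": model_version,
        "parameters": parameters,  # e.g. temperature, seed, max tokens
        "prompt": prompt,
        # Hash the response so the log stays compact but verifiable
        # against archived model outputs.
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

# Example: document a hypothetical data-extraction call
entry = log_llm_interaction(
    "llm_interactions.jsonl",
    model="example-model",
    model_version="2026-01-15",
    parameters={"temperature": 0.0, "seed": 42},
    prompt="Extract the sample size from the attached abstract.",
    response="N = 250",
)
```

Appending one record per call, rather than overwriting a single file, preserves the full interaction history even when a study involves many prompts over time.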
Frank, M., Schroeders, U., Breuer, J., Einsiedler, J., Jankowsky, K., Haim, M., & Schönbrodt, F. (3/2026). A Framework for Ensuring Reproducibility in LLM-Assisted Research. Presented at the 6th conference for Research Software Engineering in Germany (deRSE26), Stuttgart.