LLM Guidelines for SE
Guidelines for Empirical Studies in Software Engineering involving LLMs.
This website hosts community guidelines for reporting empirical studies in software engineering that involve large language models (LLMs). In addition to our motivation and scope, we present a first taxonomy of LLM study types and corresponding guidelines. The guidelines are meant as a starting point for further discussion in the community, with the aim of developing a common understanding of how to conduct and report empirical studies involving LLMs. The project was initiated by a position paper and by discussions during the ISERN 2024 meeting and the 2nd Copenhagen Symposium on Human-Centered Software Engineering AI. To contribute to the guidelines, you can open an issue or create a pull request in our GitHub repository. To cite our guidelines, you can refer to the arXiv version:
@misc{baltes2025guidelinesempiricalstudiessoftware,
  title={Guidelines for Empirical Studies in Software Engineering involving Large Language Models},
  author={Sebastian Baltes and Florian Angermeir and Chetan Arora and Marvin Muñoz Barón and Chunyang Chen and Lukas Böhme and Fabio Calefato and Neil Ernst and Davide Falessi and Brian Fitzgerald and Davide Fucci and Marcos Kalinowski and Stefano Lambiase and Daniel Russo and Mircea Lungu and Lutz Prechelt and Paul Ralph and Christoph Treude and Stefan Wagner},
  year={2025},
  eprint={2508.15503},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2508.15503},
}
Project Coordinators:
- Sebastian Baltes, University of Bayreuth (Germany)
- Stefan Wagner, Technical University of Munich (Germany)
Team:
- Florian Angermeir, fortiss (Germany) and Blekinge Institute of Technology (Sweden)
- Chetan Arora, Monash University (Australia)
- Marvin Muñoz Barón, Technical University of Munich (Germany)
- Lukas Böhme, HPI, University of Potsdam (Germany)
- Fabio Calefato, University of Bari (Italy)
- Chunyang Chen, Technical University of Munich (Germany)
- Neil Ernst, University of Victoria (Canada)
- Davide Falessi, University of Rome Tor Vergata (Italy)
- Brian Fitzgerald, Lero and University of Limerick (Ireland)
- Davide Fucci, Blekinge Institute of Technology (Sweden)
- Marcos Kalinowski, Pontifical Catholic University of Rio de Janeiro (Brazil)
- Stefano Lambiase, Aalborg University in Copenhagen (Denmark)
- Mircea Lungu, IT University of Copenhagen (Denmark)
- Lutz Prechelt, Free University of Berlin (Germany)
- Paul Ralph, Dalhousie University (Canada)
- Daniel Russo, Aalborg University in Copenhagen (Denmark)
- Christoph Treude, Singapore Management University (Singapore)