LLM Guidelines for SE

Guidelines for Empirical Studies in Software Engineering involving LLMs.

This website hosts community guidelines for reporting empirical studies in software engineering that involve large language models (LLMs). Besides our motivation and scope, we present a first taxonomy of LLM study types and corresponding guidelines. These guidelines are meant as a starting point for further discussion in the community, with the aim of developing a common understanding of how we should conduct and report empirical studies involving LLMs. The project was initiated by a position paper as well as discussions during the ISERN 2024 meeting and the 2nd Copenhagen Symposium on Human-Centered Software Engineering AI. To contribute to the guidelines, you can open an issue or create a pull request in our GitHub repository. To cite our guidelines, you can refer to the arXiv version:

@misc{baltes2025guidelinesempiricalstudiessoftware,
      title={Guidelines for Empirical Studies in Software Engineering involving Large Language Models}, 
      author={Sebastian Baltes and Florian Angermeir and Chetan Arora and Marvin Muñoz Barón and Chunyang Chen and Lukas Böhme and Fabio Calefato and Neil Ernst and Davide Falessi and Brian Fitzgerald and Davide Fucci and Marcos Kalinowski and Stefano Lambiase and Daniel Russo and Mircea Lungu and Lutz Prechelt and Paul Ralph and Christoph Treude and Stefan Wagner},
      year={2025},
      eprint={2508.15503},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2508.15503}, 
}

Project Coordinators:

Team: