Evaluation Guidelines for Empirical Studies involving LLMs
Towards community guidelines for empirical studies in software engineering involving LLMs.
This website hosts a draft of community guidelines for empirical studies in software engineering involving LLMs. We present a first taxonomy of study types and corresponding guidelines.
The current draft is based on a position paper as well as discussions at the ISERN 2024 meeting and the 2nd Copenhagen Symposium on Human-Centered Software Engineering AI. To contribute to the guidelines, you can open an issue or a pull request in our GitHub repository.
Workstream Leads:
- Sebastian Baltes, University of Bayreuth (Germany)
- Stefan Wagner, Technical University of Munich (Germany)
Team:
- Marvin Muñoz Barón, Technical University of Munich (Germany)
- Lukas Böhme, HPI, University of Potsdam (Germany)
- Fabio Calefato, University of Bari (Italy)
- Neil Ernst, University of Victoria (Canada)
- Davide Falessi, University of Rome Tor Vergata (Italy)
- Brian Fitzgerald, Lero and University of Limerick (Ireland)
- Davide Fucci, Blekinge Institute of Technology (Sweden)
- Marcos Kalinowski, Pontifical Catholic University of Rio de Janeiro (Brazil)
- Stefano Lambiase, University of Salerno (Italy)
- Mircea Lungu, IT University of Copenhagen (Denmark)
- Margaret-Anne Storey, University of Victoria (Canada)
- Christoph Treude, Singapore Management University (Singapore)