LLMs as Subjects (S4)

Description

In empirical studies, data is collected from participants through methods such as surveys, interviews, or controlled experiments. LLMs can serve as virtual subjects by simulating human behavior and interactions. If LLMs can generate responses that approximate those of human participants, they could be valuable for research involving user interactions, collaborative coding environments, and software usability assessments (Zhao, Habule, and Zhang 2025). To achieve this, prompt engineering techniques are widely employed; for instance, the Personas Pattern (Kong et al. 2023) involves tailoring LLM responses to align with predefined profiles or roles that emulate specific user archetypes. To serve as virtual subjects, generated responses should be indistinguishable from human-produced texts, consistent with the attitudes and sociodemographic information of the conditioning context (e.g., junior vs. senior developers), naturally aligned with the form, tone, and content of the simulated scenario, and reflective of the patterns in relationships between ideas, demographics, and behavior observed in comparable human data (Argyle et al. 2022).
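A minimal sketch of the Personas Pattern described above: the model is conditioned on a predefined profile so its answers emulate a target archetype (here, a junior developer). The chat-message layout follows the common system/user convention; the profile fields and wording are illustrative assumptions, not taken from Kong et al. (2023).

```python
def build_persona_prompt(persona: dict, question: str) -> list[dict]:
    """Return chat messages that condition an LLM on a persona profile."""
    system = (
        f"You are {persona['name']}, a {persona['experience']} "
        f"{persona['role']}. Answer survey questions in character, "
        f"consistent with this background: {persona['background']}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# Hypothetical junior-developer profile (e.g., for a junior vs. senior contrast).
junior = {
    "name": "Alex",
    "role": "software developer",
    "experience": "junior (one year of experience)",
    "background": "works mainly in Python and has never led a code review",
}

messages = build_persona_prompt(
    junior, "How confident are you reviewing concurrency code?"
)
```

The same survey question can then be posed to several such profiles, and the conditioned responses compared against answers from comparable human participants.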

Example(s)

Xu et al. (2024) compiled a list of ways LLMs can support social science research, some of which transfer to empirical SE research. For example, LLMs can emulate human responses and behaviors in simulated interviews and focus groups (Gerosa et al. 2024). Similarly, Bano, Gunatilake, and Hoda (2025) investigated biases in LLM-generated candidate profiles in SE recruitment processes; they found biases favoring male candidates, lighter skin tones, and slim physiques, particularly for senior roles. LLMs may also be able to simulate end-user feedback and behavior in usability studies, identifying usability issues and offering suggestions for improvement based on predefined user personas.
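The usability scenario above can be sketched as collecting simulated feedback from several predefined personas on the same task. `query_llm` is a hypothetical placeholder for a real persona-conditioned model call; it is stubbed here so the sketch runs without any API access.

```python
def query_llm(persona: str, task: str) -> str:
    # Placeholder: a real study would send a persona-conditioned prompt
    # to an LLM and return its free-text usability feedback.
    return f"[{persona}] feedback on: {task}"

# Illustrative personas and task, chosen for this sketch.
personas = ["novice end user", "power user", "screen-reader user"]
task = "Complete checkout using the redesigned payment form"

# Gather one simulated response per persona for later analysis.
feedback = {p: query_llm(p, task) for p in personas}
for text in feedback.values():
    print(text)
```

In an actual study, the collected responses would be coded for usability issues in the same way human feedback is, and validated against a human sample where possible.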

References

Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Michael Rytting, and David Wingate. 2022. “Out of One, Many: Using Language Models to Simulate Human Samples.” CoRR abs/2209.06899. https://doi.org/10.48550/ARXIV.2209.06899.

Bano, Muneera, Hashini Gunatilake, and Rashina Hoda. 2025. “What Does a Software Engineer Look Like? Exploring Societal Stereotypes in LLMs.” In 47th IEEE/ACM International Conference on Software Engineering: Software Engineering in Society, ICSE-SEIS 2025, Ottawa, ON, Canada, April 27 - May 3, 2025, 173–84. IEEE. https://doi.org/10.1109/ICSE-SEIS66351.2025.00023.

Gerosa, Marco Aurélio, Bianca Trinkenreich, Igor Steinmacher, and Anita Sarma. 2024. “Can AI Serve as a Substitute for Human Subjects in Software Engineering Research?” Autom. Softw. Eng. 31 (1): 13. https://doi.org/10.1007/S10515-023-00409-6.

Kong, Aobo, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, and Xin Zhou. 2023. “Better Zero-Shot Reasoning with Role-Play Prompting.” CoRR abs/2308.07702. https://doi.org/10.48550/ARXIV.2308.07702.

Xu, Ruoxi, Yingfei Sun, Mengjie Ren, Shiguang Guo, Ruotong Pan, Hongyu Lin, Le Sun, and Xianpei Han. 2024. “AI for Social Science and Social Science of AI: A Survey.” Inf. Process. Manag. 61 (2): 103665. https://doi.org/10.1016/J.IPM.2024.103665.

Zhao, Chenguang, Meirewuti Habule, and Wei Zhang. 2025. “Large Language Models (LLMs) as Research Subjects: Status, Opportunities and Challenges.” New Ideas in Psychology 79: 101167. https://doi.org/10.1016/j.newideapsych.2025.101167.