Research
This page provides an overview of my previous and ongoing research projects, organized into seven main research areas. Many of the projects we work on focus on software analytics, that is, processing, analyzing, and visualizing software engineering data to monitor, govern, and improve software development processes and tools. We are further interested in interdisciplinary research and methodological aspects of empirical software engineering. We are convinced that thoroughly analyzing and understanding the state of practice is an essential first step towards improving how software is developed. Our vision is that software engineering becomes more evidence-based, which is only possible if academic researchers provide actionable insights on topics that are relevant to practitioners. You can find more information on my personal notion of empirical software engineering in my research statement and in my inaugural lecture at the University of Bayreuth, which is available online.
Research Areas
- Data-driven decision making in software engineering
- Developer experience and tool support
- Human-centric software engineering
- Mining software repositories
- Meta-scientific issues in software engineering research
- Behavioral studies of software developers
- Interdisciplinary research
Data-driven decision making in software engineering
Projects in this research area include empirical studies and tool prototypes built to support data-driven decision making in software projects. A central theme is test flakiness: Across a series of studies conducted at SAP HANA, we analyzed the impact of timeouts, studied how test and environmental complexity relate to flakiness, and, more recently, investigated the flakiness of LLM-generated tests and whether flaky tests can be classified using only their test code. Other work in this area includes a tailored visualization of company-wide software service dependencies used to guide service deprecation, and our global Pandemic Programming study, which provides data-driven recommendations to support developers during forced remote work.
- Configuring Agentic AI Coding Tools: An Exploratory Study
- On the Flakiness of LLM-generated Tests for Industrial and Open-Source Database Management Systems
- Flaky Tests in a Large Industrial Database Management System: An Empirical Study of Fixed Issue Reports for SAP HANA
- On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents
- Can We Classify Flaky Tests Using Only Test Code? An LLM-Based Empirical Study
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA
- Taming Timeout Flakiness: An Empirical Study of SAP HANA
- Visually Analyzing Company-wide Software Service Dependencies: An Industrial Case Study
- Pandemic Programming: How COVID-19 Affects Software Developers and How Their Organizations Can Help
Developer experience and tool support
Developer experience encompasses how the interplay of people (e.g., developers or other stakeholders) and the environment (e.g., processes, tools, culture) positively or negatively affects activities along the software development lifecycle. This broad area touches “micro” aspects such as developer tooling, including AI assistants, but also “macro” aspects such as documentation ecosystems and organizational practices. Recent work includes a critical review of developers’ trust in AI assistants and a study on context engineering for AI agents in open-source software. Related aspects are developer productivity, wellbeing, and satisfaction, all of which are difficult to describe and operationalize accurately. Other papers in this area present novel tool prototypes, for example, to better integrate cost transparency into cloud applications, support automated query reformulation, or link documentation and source code.
- Configuring Agentic AI Coding Tools: An Exploratory Study
- On the Need to Rethink Trust in AI Assistants for Software Development: A Critical Review
- Context Engineering for AI Agents in Open-Source Software
- On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents
- User Misconceptions of LLM-Based Conversational Programming Assistants
- A Penny a Function: Towards Cost Transparent Cloud Programming
- Automated Query Reformulation for Efficient Search Based on Query Logs from Stack Overflow
- Pandemic Programming: How COVID-19 Affects Software Developers and How Their Organizations Can Help
- Round-Trip Sketches: Supporting the Lifecycle of Software Development Sketches from Analog to Digital and Back
- Linking Sketches and Diagrams to Source Code Artifacts
- RegViz: Visual Debugging of Regular Expressions
Human-centric software engineering
Software is developed with and for a wide range of stakeholders, and understanding how human factors shape software development, and how development practices affect people, is central to this research area. Our work spans a broad range of topics: We critically reviewed developers’ trust in AI assistants, contributed to the Copenhagen Manifesto on human-centered generative AI in software engineering, and helped develop the Software Infrastructure Attitude Scale (SIAS) for measuring professionals’ attitudes toward technical infrastructure. We have also studied job satisfaction and turnover intentions of software professionals, examined cognitive capability and personality as predictors of coding performance, and investigated diversity and inclusion in software engineering, including ageism and gender bias in the industry.
- Operationalizing Ethics for AI Agents: How Developers Encode Values into Repository Context Files
- On the Need to Rethink Trust in AI Assistants for Software Development: A Critical Review
- How Does Cognitive Capability and Personality Influence Problem Solving in Coding Interview Puzzles?
- User Misconceptions of LLM-Based Conversational Programming Assistants
- Staying or Leaving? How Job Satisfaction, Embeddedness and Antecedents Predict Turnover Intentions of Software Professionals
- The Software Infrastructure Attitude Scale (SIAS): A Questionnaire Instrument for Measuring Professionals’ Attitudes Toward Technical and Sociotechnical Infrastructure
- Making Software Development More Diverse and Inclusive: Key Themes, Challenges, and Future Directions
- UX Debt: Developers Borrow While Users Pay
- Generative AI in Software Engineering Must Be Human-Centered: The Copenhagen Manifesto
- “STILL AROUND”: Experiences and Survival Strategies of Veteran Women Software Developers
- Challenges for Inclusion in Software Engineering: The Case of the Emerging Papua New Guinean Society
- Is 40 the new 60? How Popular Media Portrays the Employability of Older Software Developers
- Towards a Theory of Software Development Expertise
Mining software repositories
Projects in this research area analyze data from software repositories, including version control systems, issue tracking systems, and developer Q&A forums, with the goal of identifying patterns or deriving actionable recommendations. Recent work includes studying context engineering for AI agents in open-source software and applying information-theoretic methods to detect unusual source code changes. Earlier work studied links in commit messages, developer use of GitHub Discussions, search behavior on Stack Overflow, code duplication, and attribution practices for reused code snippets. Several of those studies are based on SOTorrent, a dataset for studying the evolution of Stack Overflow posts that we maintained between 2017 and 2020 (see also the SOTorrent Project Page).
- Configuring Agentic AI Coding Tools: An Exploratory Study
- Context Engineering for AI Agents in Open-Source Software
- Information-Theoretic Detection of Unusual Source Code Changes
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA
- Taming Timeout Flakiness: An Empirical Study of SAP HANA
- 18 Million Links in Commit Messages: Purpose, Evolution, and Decay
- GitHub Discussions: An Exploratory Study of Early Adoption
- Characterizing Search Activities on Stack Overflow
- On the Diversity and Frequency of Code Related to Mathematical Formulas in Real-World Java Projects
- Code Duplication on Stack Overflow
- An Annotated Dataset of Stack Overflow Post Edits
- Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects
- SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets
- SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts
- (No) Influence of Continuous Integration on the Commit Activity in GitHub Projects
- Attribution Required: Stack Overflow Code Snippets in GitHub Projects
Meta-scientific issues in software engineering research
As empirical software engineering matures as a discipline, meta-scientific questions about how to conduct and communicate empirical research become increasingly important. An early focus was sampling: We studied the effectiveness and ethical implications of different strategies for sampling software developers, and later published a critical review of sampling in software engineering research. We also addressed secondary research, arguing that literature reviews in software engineering are still quite rudimentary compared to those in other disciplines. Our paper The Silent Scientist examines why software engineering research often fails to reach practitioners. More recently, we investigated the challenges software engineering researchers face when trying to publish interdisciplinary work, and we are developing evaluation guidelines for empirical studies involving LLMs (see also the project website).
- The Silent Scientist: When Software Research Fails to Reach Its Audience
- Not Real or Too Soft? On the Challenges of Publishing Interdisciplinary Software Engineering Research
- Towards Evaluation Guidelines for Empirical Studies involving LLMs
- Teaching Literature Reviewing for Software Engineering Research
- Paving the Way for Mature Secondary Research: The Seven Types of Literature Review
- Sampling in Software Engineering Research: A Critical Review and Guidelines
- Worse Than Spam: Issues In Sampling Software Developers
Behavioral studies of software developers
Traces of software developers’ behavior can be found in repositories and artifacts (see Mining software repositories), but behavior can also be studied through lab and field studies, online surveys, and interviews. Recent work in this area includes studies on how cognitive capability and personality influence problem-solving in coding tasks, large-scale surveys on job satisfaction and turnover intentions of software professionals, the development of the SIAS questionnaire instrument for measuring attitudes toward software infrastructure, and an analysis of how code comments affect the perceived helpfulness of Stack Overflow posts. Earlier work studied how developers debug performance bugs in a pair programming setting, how they use sketches and diagrams in their daily work, and how they reference documentation resources.
- How Does Cognitive Capability and Personality Influence Problem Solving in Coding Interview Puzzles?
- Staying or Leaving? How Job Satisfaction, Embeddedness and Antecedents Predict Turnover Intentions of Software Professionals
- The Software Infrastructure Attitude Scale (SIAS): A Questionnaire Instrument for Measuring Professionals’ Attitudes Toward Technical and Sociotechnical Infrastructure
- The Influence of Code Comments on the Perceived Helpfulness of Stack Overflow Posts
- Contextual Documentation Referencing on Stack Overflow
- Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects
- Navigate, Understand, Communicate: How Developers Locate Performance Bugs
- Sketches and Diagrams in Practice
Interdisciplinary research
Interdisciplinary research in software engineering can take several forms. It can mean applying concepts from other disciplines, such as psychology or information theory, to software engineering problems. It can also mean working with researchers from other disciplines on problems rooted in software engineering or in another discipline. Finally, it can mean conducting a study focused on software engineering whose results are then picked up in other disciplines.
- Information-Theoretic Detection of Unusual Source Code Changes
- Applying Information Theory to Software Evolution
- From Full-fledged ERP Systems Towards Process-centric Business Process Platforms
- Pandemic Programming: How COVID-19 Affects Software Developers and How Their Organizations Can Help
- Towards a Theory of Software Development Expertise
- Constructing Urban Tourism Space Digitally: A Study of Airbnb Listings in Two Berlin Neighborhoods
- Visual Analysis and Coding of Data-Rich User Behavior
- VisualCues: Visually Explaining Source Code in Computer Science Education
- CodeBasket: Making Developers’ Mental Model Visible and Explorable