MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Mon 28 Apr 2025 14:10 - 14:20 at 214 - AI for SE (1) Chair(s): Diego Elias Costa

In code review, generating structured and relevant comments is crucial for identifying code issues and facilitating accurate code changes that ensure an efficient code review process. Well-crafted comments not only streamline the code review itself but are also essential for subsequent tasks like code refinement, where the code is modified to satisfy the input review comment. Although various AI-based approaches aimed to automate comment generation, their effectiveness remains limited by the quality of the training data. Existing code review datasets are often noisy and unrefined, posing limitations to the learning potential of AI models and hindering the automation process.

To address these challenges, we propose a curation pipeline designed to enhance the quality of the largest publicly available code review dataset. We begin by establishing an evaluation framework, incorporating specific criteria and categories to assess study the initial quality of the dataset. Using a large language model (LLM)-driven approach, we then apply our curation pipeline to refine the dataset. A comparative analysis of the newly curated dataset, based on the same evaluation framework, demonstrates substantial improvements in the clarity and conciseness of the comments. Additionally, we assess the impact of the curated dataset on automating downstream tasks, specifically comment generation and code refinement. Our findings show that the curated dataset leads to enhanced model performance in generating more accurate comments. Curated comments are also more useful as they lead to more accurate code refinement.

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada) change

14:00 - 15:30
AI for SE (1)Technical Papers / Data and Tool Showcase Track / Registered Reports at 214
Chair(s): Diego Elias Costa Concordia University, Canada
14:00
10m
Talk
Combining Large Language Models with Static Analyzers for Code Review Generation
Technical Papers
Imen Jaoua DIRO, Université de Montréal, Oussama Ben Sghaier DIRO, Université de Montréal, Houari Sahraoui DIRO, Université de Montréal
Pre-print
14:10
10m
Talk
Harnessing Large Language Models for Curated Code Reviews
Technical Papers
Oussama Ben Sghaier DIRO, Université de Montréal, Martin Weyssow Singapore Management University, Houari Sahraoui DIRO, Université de Montréal
Pre-print
14:20
10m
Talk
SMATCH-M-LLM: Semantic Similarity in Metamodel Matching With Large Language Models
Technical Papers
Nafisa Ahmed Polytechnique Montreal, Hin Chi Kwok Hong Kong Polytechnic University, Mohammad Hamdaqa Polytechnique Montréal, Wesley Assunção North Carolina State University
14:30
10m
Talk
How Effective are LLMs for Data Science Coding? A Controlled Experiment
Technical Papers
Nathalia Nascimento Pennsylvania State University, Everton Guimaraes Pennsylvania State University, USA, Sai Sanjna Chintakunta Pennsylvania State University, Santhosh AB Pennsylvania State University
Pre-print
14:40
10m
Talk
Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot
Technical Papers
Daniele Bifolco University of Sannio, Pietro Cassieri University of Salerno, Giuseppe Scanniello University of Salerno, Massimiliano Di Penta University of Sannio, Italy, Fiorella Zampetti University of Sannio, Italy
Pre-print
14:50
10m
Talk
Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation
Technical Papers
Chunhua Liu The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam University of Melbourne
15:00
5m
Talk
Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks
Technical Papers
Kyi Shin Khant The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam University of Melbourne
15:05
5m
Talk
RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering
Data and Tool Showcase Track
Samuel Abedu Concordia University, Laurine Menneron CESI Graduate School of Engineering, SayedHassan Khatoonabadi Concordia University, Emad Shihab Concordia University
15:10
5m
Talk
How do Copilot Suggestions Impact Developers' Frustration and Productivity?
Registered Reports
Emanuela Guglielmi University of Molise, Venera Arnaoudova Washington State University, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Rocco Oliveto University of Molise, Simone Scalabrino University of Molise
15:15
5m
Talk
Exploring the Lifecycle and Maintenance Practices of Pre-Trained Models in Open-Source Software Repositories
Registered Reports
Matin Koohjani Concordia University, Diego Elias Costa Concordia University, Canada