MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

This program is tentative and subject to change.

Mon 28 Apr 2025 15:00 - 15:05 at 214 - AI for SE (1)

Learning-based techniques, especially advanced pre-trained code models, have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training approaches may not fully optimize model performance, as they typically involve learning from randomly shuffled training data. Recent work shows that Curriculum Learning (CL) can improve NLP task performance through incremental learning based on difficulty level. Yet, the effectiveness of CL in automated SE tasks remains largely unexplored. In this study, we explore the potential of CL to improve the performance of a pre-trained code model (CodeT5). We assess the effectiveness of CL for fine-tuning the CodeT5 model on two SE tasks: code clone detection and code summarization. We explore two code metrics, code length and cyclomatic complexity, to determine difficulty levels. Our empirical study on the CodeXGLUE benchmark shows that CL has little impact on model performance for both SE tasks. Surprisingly, model performance saturates after training on only the first quarter of the training data. These results suggest that CL may not provide benefits for improving model performance on SE tasks; this may be due to the model's learning capacity. Future work should further explore various CL strategies with different code models across a wider range of SE tasks for a holistic understanding.
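
To make the setup concrete, below is a minimal sketch of the kind of difficulty-ordered ("easy-to-hard") curriculum the abstract describes: rank training examples by a code metric and present them in stages, easiest first. The code_length proxy, the curriculum_order helper, the stage-based schedule, and the toy data are illustrative assumptions for this sketch, not the authors' implementation; the paper also considers cyclomatic complexity as a difficulty metric.

# Minimal sketch of an easy-to-hard curriculum over (code, label) pairs.
# Names, staging scheme, and toy data are illustrative assumptions, not
# the paper's exact setup; cyclomatic complexity could replace code_length.

def code_length(example):
    # Difficulty proxy: number of whitespace-separated tokens in the code.
    return len(example["code"].split())

def curriculum_order(dataset, difficulty=code_length, n_stages=4):
    """Sort examples by difficulty and yield them stage by stage,
    easiest stage first (one common curriculum-learning schedule)."""
    ranked = sorted(dataset, key=difficulty)
    stage_size = max(1, len(ranked) // n_stages)
    for s in range(n_stages):
        start = s * stage_size
        # The last stage absorbs any remainder from uneven division.
        end = len(ranked) if s == n_stages - 1 else start + stage_size
        yield ranked[start:end]

if __name__ == "__main__":
    toy = [
        {"code": "if a and b or not c:\n    ...", "label": 1},
        {"code": "return x", "label": 0},
        {"code": "for i in range(n):\n    total += i", "label": 0},
    ]
    for stage, batch in enumerate(curriculum_order(toy, n_stages=3), 1):
        print(f"stage {stage}: {[ex['code'][:20] for ex in batch]}")

A fine-tuning loop would then train on each stage in turn (or on the cumulative data up to each stage, another common variant) rather than on a randomly shuffled mix.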

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30: AI for SE (1) at 214
14:00
10m
Talk
Combining Large Language Models with Static Analyzers for Code Review Generation
Technical Papers
Imen Jaoua (DIRO, Université de Montréal), Oussama Ben Sghaier (DIRO, Université de Montréal), Houari Sahraoui (DIRO, Université de Montréal)
14:10
10m
Talk
Harnessing Large Language Models for Curated Code Reviews
Technical Papers
Oussama Ben Sghaier (DIRO, Université de Montréal), Martin Weyssow (Singapore Management University), Houari Sahraoui (DIRO, Université de Montréal)
14:20
10m
Talk
SMATCH-M-LLM: Semantic Similarity in Metamodel Matching With Large Language Models
Technical Papers
Nafisa Ahmed (Polytechnique Montréal), Hin Chi Kwok (Hong Kong Polytechnic University), Mohammad Hamdaqa (Polytechnique Montréal), Wesley Assunção (North Carolina State University)
14:30
10m
Talk
How Effective are LLMs for Data Science Coding? A Controlled Experiment
Technical Papers
Nathalia Nascimento (Pennsylvania State University), Everton Guimaraes (Pennsylvania State University), Sai Sanjna Chintakunta (Pennsylvania State University), Santhosh AB (Pennsylvania State University)
14:40
10m
Talk
Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot
Technical Papers
Daniele Bifolco (University of Sannio), Pietro Cassieri (University of Salerno), Giuseppe Scanniello (University of Salerno), Massimiliano Di Penta (University of Sannio), Fiorella Zampetti (University of Sannio)
Pre-print
14:50
10m
Talk
Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation
Technical Papers
Chunhua Liu (The University of Melbourne), Hong Yi Lin (The University of Melbourne), Patanamon Thongtanunam (The University of Melbourne)
15:00
5m
Talk
Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks
Technical Papers
Skylar Kyi Shin Khant (The University of Melbourne), Hong Yi Lin (The University of Melbourne), Patanamon Thongtanunam (The University of Melbourne)
15:05
5m
Talk
RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering
Data and Tool Showcase Track
Samuel Abedu (Concordia University), Laurine Menneron (CESI Graduate School of Engineering), SayedHassan Khatoonabadi (Concordia University), Emad Shihab (Concordia University)