MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Mon 28 Apr 2025 17:10 - 17:15 at 215 - Software evolution and analysis Chair(s): Mauricio Verano Merino

A central task in software maintenance is understanding and developing code changes. For example, given a natural-language description of a function’s desired new behavior, a (human or AI) agent might be asked to generate the edits to that function that implement it; likewise, given a set of edits to a function, an agent might be asked to generate an updated description of that function’s new workings. There is therefore an incentive to train neural models for change-related tasks. Motivated by this, we offer a new, “natural”, large dataset of coupled changes to code and documentation, mined from actual high-quality GitHub projects, in which each sample represents a single commit where the code and the associated docstring were changed together. We present the methodology for gathering the dataset, along with some sample tasks, challenging but realistic, where our dataset provides opportunities for both learning and evaluation. We find that current models (specifically Llama-3.1 405B and Mixtral 8x22B) do find these maintenance-related tasks challenging.

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
Software evolution and analysis
Data and Tool Showcase Track / Technical Papers / Industry Track at 215
Chair(s): Mauricio Verano Merino Vrije Universiteit Amsterdam
16:00
10m
Talk
50 Years of Programming Language Evolution through the Software Heritage looking glass
Technical Papers
Adèle Desmazières Sorbonne Université, Roberto Di Cosmo Inria, France / University of Paris Diderot, France, Valentin Lorentz Inria Foundation
16:10
10m
Talk
It Works (only) on My Machine: A Study on Reproducibility Smells in Ansible Scripts
Technical Papers
Ghazal Sobhani Dalhousie University, Israat Haque Dalhousie University, Tushar Sharma Dalhousie University
Pre-print
16:20
10m
Talk
Are the Majority of Public Computational Notebooks Pathologically Non-Executable?
Technical Papers
Waris Gill Virginia Tech, Muhammad Ali Gulzar Virginia Tech, Tien Nguyen Virginia Tech
Pre-print
16:30
10m
Talk
Understanding Test Deletion in Java Applications
Technical Papers
Suraj Bhatta North Dakota State University, Frank Kendemah North Dakota State University, Ajay Jha North Dakota State University
Pre-print
16:40
10m
Talk
A Public Benchmark of REST APIs
Technical Papers
Alix Decrop University of Namur, Sara Eraso University of Valle, Xavier Devroey University of Namur, Gilles Perrouin Fonds de la Recherche Scientifique - FNRS & University of Namur
Pre-print
16:50
5m
Talk
What Do Contribution Guidelines Say About Software Testing?
Technical Papers
Pre-print
16:55
5m
Talk
Measuring InnerSource Value
Industry Track
17:00
5m
Talk
CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java Repositories
Data and Tool Showcase Track
Kaihang Jiang University of Waterloo, Bihui Jin University of Waterloo, Pengyu Nie University of Waterloo
17:05
5m
Talk
EvoChain: A Framework for Tracking and Visualizing Smart Contract Evolution
Data and Tool Showcase Track
Ilham Qasse Reykjavik University, Mohammad Hamdaqa Polytechnique Montréal, Björn Þór Jónsson Reykjavik University
17:10
5m
Talk
CoDocBench: A Dataset for Code-Documentation Alignment in Software Maintenance
Data and Tool Showcase Track
Kunal Suresh Pai UC Davis, Prem Devanbu University of California at Davis, Toufique Ahmed IBM Research
Pre-print
17:15
5m
Talk
RefExpo: Unveiling Software Project Structures through Advanced Dependency Graph Extraction
Data and Tool Showcase Track
Vahid Haratian Bilkent University, Pouria Derakhshanfar JetBrains Research, Vladimir Kovalenko JetBrains Research, Eray Tüzün Bilkent University
17:20
5m
Talk
HyperAST: Incrementally Mining Large Source Code Repositories
Data and Tool Showcase Track
Quentin Le Dilavrec TU Delft, Netherlands, Andy Zaidman Delft University of Technology
Pre-print