MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Mon 28 Apr 2025 11:10 - 11:20 at 214 - Defects, bugs, and issues Chair(s): Minhaz Zibran

Just-in-time defect prediction (JIT DP) leverages machine learning to identify defect-prone code commits, enabling quality assurance (QA) teams to allocate resources more efficiently by focusing on commits that are most likely to contain defects.

Although JIT defect prediction techniques have introduced notable improvements in terms of predictive accuracy, they are still susceptible to misclassification errors such as false positives and false negatives. This can lead to wasted resources or undetected defects, a particularly critical concern when QA resources are limited.

To mitigate these challenges and preserve the practical utility of JIT defect prediction tools, it is essential to estimate the reliability of the predictions, i.e., to compute confidence scores. Such scores can help practitioners identify the predictions that are most likely to be correct and thus prioritize them efficiently.

A simple approach to computing confidence scores is to extract, alongside each prediction, the corresponding prediction probabilities and use them as indicators of confidence. However, for these probabilities to reliably serve as confidence scores, the predictive model must be well-calibrated. This means that the prediction probabilities must accurately represent the true likelihood of each prediction being correct.
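
As a point of reference, here is a minimal sketch (not the authors' code; the model, feature matrix, and commit-identifier names are illustrative) of how prediction probabilities from a scikit-learn-style JIT DP classifier could be used as confidence scores to rank commits for inspection:

import numpy as np

def prioritize_commits(model, X_commits, commit_ids):
    # P(defective) from the classifier; column 1 is the positive class.
    defect_probs = model.predict_proba(X_commits)[:, 1]
    # Treat the probability of the predicted-defective class as a confidence
    # score and inspect the most confident defect predictions first.
    flagged = [i for i, p in enumerate(defect_probs) if p >= 0.5]
    ranked = sorted(flagged, key=lambda i: defect_probs[i], reverse=True)
    return [(commit_ids[i], defect_probs[i]) for i in ranked]

This prioritization is only as good as the probabilities themselves, which is exactly where calibration matters.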

Miscalibration, which is common in modern machine learning models, distorts probability scores so that a model's prediction probabilities do not align with the actual likelihood of those predictions being correct, leading to poor prioritization and resource allocation.
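
For reference, the standard formal notion behind this (from the general calibration literature, not specific to this paper): a classifier with predicted label $\hat{Y}$, true label $Y$, and associated confidence $\hat{P}$ is perfectly calibrated when

$P(\hat{Y} = Y \mid \hat{P} = p) = p, \quad \forall p \in [0, 1]$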

Despite its importance, model calibration has been largely overlooked in JIT defect prediction. In this study, we evaluate the calibration of several state-of-the-art JIT defect prediction techniques to determine whether and to what extent they exhibit poor calibration. Furthermore, we assess whether post-calibration methods can improve the calibration of existing JIT defect prediction models.
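
The abstract does not name the post-calibration methods evaluated; as an illustration only, here is a sketch of two widely used ones (Platt scaling and isotonic regression) applied via scikit-learn's CalibratedClassifierCV, with synthetic stand-in data in place of real commit features:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for commit-level features (e.g., size, diffusion, history);
# a real JIT DP dataset would replace this.
X, y = make_classification(n_samples=2000, n_features=14, weights=[0.9],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

base = RandomForestClassifier(n_estimators=100, random_state=0)
# method="sigmoid" is Platt scaling; method="isotonic" gives isotonic regression.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)[:, 1]  # calibrated P(defective)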

Our experimental analysis reveals that all evaluated JIT DP models exhibit some level of miscalibration, with Expected Calibration Error (ECE) ranging from 7% to 35%. Furthermore, post-calibration methods do not consistently improve the calibration of these JIT DP models.
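
For readers unfamiliar with the metric, a minimal sketch of how Expected Calibration Error is typically computed with equal-width confidence bins (the bin count and binning scheme here are illustrative choices, not necessarily those used in the paper):

import numpy as np

def expected_calibration_error(y_true, probs, n_bins=10):
    # probs: predicted probability of the positive (defective) class.
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)  # confidence in predicted label
    correct = (preds == y_true).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # |bin accuracy - bin average confidence|, weighted by bin size
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece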

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Defects, bugs, and issues (Data and Tool Showcase Track / Technical Papers / Registered Reports) at 214
Chair(s): Minhaz Zibran Idaho State University
11:00
10m
Talk
Learning from Mistakes: Understanding Ad-hoc Logs through Analyzing Accidental Commits
Technical Papers
Yi-Hung Chou University of California, Irvine, Yiyang Min Amazon, April Wang ETH Zürich, James Jones University of California at Irvine
Pre-print
11:10
10m
Talk
On the calibration of Just-in-time Defect Prediction
Technical Papers
Xhulja Shahini paluno - University of Duisburg-Essen, Jone Bartel University of Duisburg-Essen, paluno, Klaus Pohl University of Duisburg-Essen, paluno
11:20
10m
Talk
An Empirical Study on Leveraging Images in Automated Bug Report Reproduction
Technical Papers
Dingbang Wang University of Connecticut, Zhaoxu Zhang University of Southern California, Sidong Feng Monash University, William G.J. Halfond University of Southern California, Tingting Yu University of Connecticut
11:30
10m
Talk
It’s About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software (Technical Track Distinguished Paper Award)
Technical Papers
Shrey Tiwari Carnegie Mellon University, Serena Chen University of California, San Diego, Alexander Joukov Stony Brook University, Peter Vandervelde University of California, Santa Barbara, Ao Li Carnegie Mellon University, Rohan Padhye Carnegie Mellon University
Pre-print
11:40
10m
Talk
Enhancing Just-In-Time Defect Prediction Models with Developer-Centric Features
Technical Papers
Emanuela Guglielmi University of Molise, Andrea D'Aguanno University of Molise, Rocco Oliveto University of Molise, Simone Scalabrino University of Molise
11:50
10m
Talk
Revisiting Defects4J for Fault Localization in Diverse Development Scenarios
Technical Papers
Md Nakhla Rafi Concordia University, An Ran Chen University of Alberta, Tse-Hsun (Peter) Chen Concordia University, Shaohua Wang Central University of Finance and Economics
12:00
5m
Talk
Mining Bug Repositories for Multi-Fault Programs
Data and Tool Showcase Track
Dylan Callaghan Stellenbosch University, Bernd Fischer Stellenbosch University
12:05
5m
Talk
HaPy-Bug - Human Annotated Python Bug Resolution Dataset
Data and Tool Showcase Track
Piotr Przymus Nicolaus Copernicus University in Toruń, Poland, Mikołaj Fejzer Nicolaus Copernicus University in Toruń, Jakub Narębski Nicolaus Copernicus University in Toruń, Radosław Woźniak Nicolaus Copernicus University in Toruń, Łukasz Halada University of Wrocław, Poland, Aleksander Kazecki Nicolaus Copernicus University in Toruń, Mykhailo Molchanov Igor Sikorsky Kyiv Polytechnic Institute, Ukraine, Krzysztof Stencel University of Warsaw
Pre-print File Attached
12:10
5m
Talk
SPRINT: An Assistant for Issue Report Management
Data and Tool Showcase Track
Ahmed Adnan, Antu Saha William & Mary, Oscar Chaparro William & Mary
Pre-print
12:15
5m
Talk
Identifying and Replicating Code Patterns Driving Performance Regressions in Software Systems
Registered Reports
Denivan Campos University of Molise, Luana Martins University of Salerno, Emanuela Guglielmi University of Molise, Michele Tucci University of L'Aquila, Daniele Di Pompeo University of L'Aquila, Simone Scalabrino University of Molise, Vittorio Cortellessa University of L'Aquila, Dario Di Nucci University of Salerno, Rocco Oliveto University of Molise