MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

This program is tentative and subject to change.

Mon 28 Apr 2025 14:30 - 14:40 at 214 - AI for SE (1)

The adoption of Large Language Models (LLMs) for code generation in data science offers substantial potential for enhancing tasks such as data manipulation, statistical analysis, and visualization. However, the effectiveness of LLMs in this domain has not been thoroughly evaluated, limiting their reliable use in real-world data science workflows. This paper presents a controlled experiment that empirically assesses the performance of four leading LLM-based AI assistants—Microsoft Copilot (GPT-4 Turbo), ChatGPT (o1-preview), Claude (3.5 Sonnet), and Perplexity Labs (Llama-3.1-70b-instruct)—on a diverse set of data science coding challenges sourced from the StrataScratch platform. Using the Goal-Question-Metric (GQM) approach, we evaluated each model's effectiveness across task types (Analytical, Algorithm, Visualization) and varying difficulty levels. Our findings show that ChatGPT and Claude lead in success rate, with ChatGPT excelling in analytical and algorithm tasks. Hypothesis testing reveals that only ChatGPT and Claude maintain effectiveness above a 60% assertiveness baseline, while no model reaches significance at a 70% threshold, underscoring their strengths in specific areas but also their limitations when held to higher standards. Efficiency analysis indicates no significant differences in execution times among models for analytical tasks, although Claude and Copilot demonstrate more stable performance, whereas ChatGPT and Perplexity exhibit higher variability. This study provides a rigorous, performance-based evaluation framework for LLMs in data science, equipping practitioners with insights for selecting models tailored to specific task demands and setting empirical standards for future AI assessments beyond basic performance measures.
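
The abstract does not name the statistical test behind the 60% and 70% baseline comparisons; a minimal sketch of how such a check could be run, assuming a one-sided binomial test on per-task success counts, is shown below (the counts are hypothetical placeholders, not the paper's data):

    from scipy.stats import binomtest

    # Hypothetical success counts per model: (solved tasks, attempted tasks).
    # Placeholder values for illustration only, NOT the paper's reported results.
    results = {
        "ChatGPT": (41, 50),
        "Claude": (38, 50),
        "Copilot": (31, 50),
        "Perplexity": (29, 50),
    }

    for model, (successes, attempts) in results.items():
        for baseline in (0.60, 0.70):
            # One-sided test of H0: success rate <= baseline
            # against H1: success rate > baseline.
            test = binomtest(successes, attempts, p=baseline, alternative="greater")
            verdict = "exceeds" if test.pvalue < 0.05 else "does not exceed"
            print(f"{model}: {verdict} the {baseline:.0%} baseline (p = {test.pvalue:.3f})")

Under this reading, a model "maintains effectiveness above the baseline" when the one-sided p-value falls below 0.05.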

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30 AI for SE (1) at 214
14:00
10m
Talk
Combining Large Language Models with Static Analyzers for Code Review Generation
Technical Papers
Imen Jaoua (DIRO, Université de Montréal), Oussama Ben Sghaier (DIRO, Université de Montréal), Houari Sahraoui (DIRO, Université de Montréal)
14:10
10m
Talk
Harnessing Large Language Models for Curated Code Reviews
Technical Papers
Oussama Ben Sghaier (DIRO, Université de Montréal), Martin Weyssow (Singapore Management University), Houari Sahraoui (DIRO, Université de Montréal)
14:20
10m
Talk
SMATCH-M-LLM: Semantic Similarity in Metamodel Matching With Large Language Models
Technical Papers
Nafisa Ahmed (Polytechnique Montréal), Hin Chi Kwok (Hong Kong Polytechnic University), Mohammad Hamdaqa (Polytechnique Montréal), Wesley Assunção (North Carolina State University)
14:30
10m
Talk
How Effective are LLMs for Data Science Coding? A Controlled Experiment
Technical Papers
Nathalia Nascimento (Pennsylvania State University), Everton Guimaraes (Pennsylvania State University), Sai Sanjna Chintakunta (Pennsylvania State University), Santhosh AB (Pennsylvania State University)
14:40
10m
Talk
Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot
Technical Papers
Daniele Bifolco (University of Sannio), Pietro Cassieri (University of Salerno), Giuseppe Scanniello (University of Salerno), Massimiliano Di Penta (University of Sannio), Fiorella Zampetti (University of Sannio)
Pre-print
14:50
10m
Talk
Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation
Technical Papers
Chunhua Liu (The University of Melbourne), Hong Yi Lin (The University of Melbourne), Patanamon Thongtanunam (The University of Melbourne)
15:00
5m
Talk
Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks
Technical Papers
Skylar Kyi Shin Khant (The University of Melbourne), Hong Yi Lin (The University of Melbourne), Patanamon Thongtanunam (The University of Melbourne)
15:05
5m
Talk
RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering
Data and Tool Showcase Track
Samuel Abedu (Concordia University), Laurine Menneron (CESI Graduate School of Engineering), SayedHassan Khatoonabadi (Concordia University), Emad Shihab (Concordia University)