MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

This program is tentative and subject to change.

Mon 28 Apr 2025 16:30 - 16:35 at 215 - LLMs for Code

This paper introduces PandasPlotBench, a human-curated dataset designed to evaluate how effectively language models assist in visual data exploration. Our benchmark focuses on generating code that visualizes tabular data, such as a Pandas DataFrame, from natural language instructions, complementing current evaluation tools and expanding their scope. The dataset includes 175 unique tasks. Our experiments assess several leading Large Language Models (LLMs) across three visualization libraries: Matplotlib, Seaborn, and Plotly. We show that shortening task descriptions has a minimal effect on plotting performance, which allows user interfaces to accept concise input without sacrificing functionality or accuracy. We also find that while LLMs perform well with popular libraries such as Matplotlib and Seaborn, they still struggle with Plotly, highlighting an area for improvement. We hope that the modular design of our benchmark will broaden future studies on generating visualizations. Our benchmark is available online: https://huggingface.co/datasets/JetBrains-Research/plot_bench. The code for running the benchmark is also available: https://github.com/JetBrains-Research/PandasPlotBench.
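For readers who want to try the dataset, the sketch below shows one way to load it with the Hugging Face datasets library and inspect a task. Only the dataset ID is taken from the link above; the split and field names are not documented here, so the code prints them rather than assuming a schema.

    from datasets import load_dataset  # pip install datasets

    # Load PandasPlotBench from the Hugging Face Hub.
    # The dataset ID comes from the paper's link; if the repository
    # defines named configurations, pass one as the second argument.
    bench = load_dataset("JetBrains-Research/plot_bench")
    print(bench)  # shows the available splits, features, and row counts

    # Inspect the first task of the first split (175 tasks in total).
    first_split = next(iter(bench.values()))
    task = first_split[0]
    print(task.keys())  # the task's fields, e.g. instruction and data

Each task pairs tabular data with a plotting instruction; the full evaluation harness lives in the GitHub repository linked above.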

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30 at 215 - LLMs for Code
16:00
10m
Talk
How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning
Technical Papers
Fabio Salerno Delft University of Technology, Ali Al-Kaswan Delft University of Technology, Maliheh Izadi Delft University of Technology
16:10
10m
Talk
Can LLMs Generate Higher Quality Code Than Humans? An Empirical Study
Technical Papers
Mohammad Talal Jamil Lahore University of Management Sciences, Shamsa Abid National University of Computer and Emerging Sciences, Shafay Shamail Lahore University of Management Sciences
16:20
10m
Talk
Prompt Engineering or Fine-Tuning: An Empirical Assessment of LLMs for Code
Technical Papers
Jiho Shin York University, Clark Tang, Tahmineh Mohati University of Calgary, Maleknaz Nayebi York University, Song Wang York University, Hadi Hemmati York University
16:30
5m
Talk
Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
Data and Tool Showcase Track
Timur Galimzyanov JetBrains Research, Sergey Titov JetBrains Research, Yaroslav Golubev JetBrains Research, Egor Bogomolov JetBrains Research
Pre-print
16:35
5m
Talk
SnipGen: A Mining Repository Framework for Evaluating LLMs for Code
Data and Tool Showcase Track