MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

This program is tentative and subject to change.

Mon 28 Apr 2025 16:30 - 16:35 at 215 - LLMs for Code

This paper introduces PandasPlotBench, a human-curated dataset designed to evaluate how effectively language models assist in visual data exploration. Our benchmark focuses on generating code that visualizes tabular data, such as a Pandas DataFrame, from natural language instructions, complementing current evaluation tools and expanding their scope. The dataset includes 175 unique tasks. Our experiments assess several leading Large Language Models (LLMs) across three visualization libraries: Matplotlib, Seaborn, and Plotly. We show that shortening task descriptions has a minimal effect on plotting performance, which allows user interfaces to accept concise input without sacrificing functionality or accuracy. We also find that while LLMs perform well with popular libraries such as Matplotlib and Seaborn, they still struggle with Plotly, highlighting an area for improvement. We hope that the modular design of our benchmark will broaden future studies on generating visualizations. Our benchmark is available online: https://huggingface.co/datasets/JetBrains-Research/plot_bench. The code for running the benchmark is also available: https://github.com/JetBrains-Research/PandasPlotBench.
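For readers who want to try the dataset, the sketch below shows one way to load it with the Hugging Face datasets library and inspect a task. Only the dataset ID is taken from the link above; the split and field names are not documented here, so the code prints them rather than assuming a schema.

    from datasets import load_dataset  # pip install datasets

    # Load PandasPlotBench from the Hugging Face Hub.
    # The dataset ID comes from the paper's link; if the repository
    # defines named configurations, pass one as the second argument.
    bench = load_dataset("JetBrains-Research/plot_bench")
    print(bench)  # shows the available splits, features, and row counts

    # Inspect the first task of the first split (175 tasks in total).
    first_split = next(iter(bench.values()))
    task = first_split[0]
    print(task.keys())  # the task's fields, e.g. instruction and data

Each task pairs tabular data with a plotting instruction; the full evaluation harness lives in the GitHub repository linked above.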

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30 at 215 - LLMs for Code
16:00
10m
Talk
How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning
Technical Papers
Fabio Salerno Delft University of Technology, Ali Al-Kaswan Delft University of Technology, Maliheh Izadi Delft University of Technology
16:10
10m
Talk
Can LLMs Generate Higher Quality Code Than Humans? An Empirical Study
Technical Papers
Mohammad Talal Jamil Lahore University of Management Sciences, Shamsa Abid National University of Computer and Emerging Sciences, Shafay Shamail Lahore University of Management Sciences
16:20
10m
Talk
Prompt Engineering or Fine-Tuning: An Empirical Assessment of LLMs for Code
Technical Papers
Jiho Shin York University, Clark Tang, Tahmineh Mohati University of Calgary, Maleknaz Nayebi York University, Song Wang York University, Hadi Hemmati York University
16:30
5m
Talk
Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
Data and Tool Showcase Track
Timur Galimzyanov JetBrains Research, Sergey Titov JetBrains Research, Yaroslav Golubev JetBrains Research, Egor Bogomolov JetBrains Research
Pre-print
16:35
5m
Talk
SnipGen: A Mining Repository Framework for Evaluating LLMs for Code
Data and Tool Showcase Track