MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

This program is tentative and subject to change.

Mon 28 Apr 2025 16:00 - 16:10 at 215 - LLMs for Code

Code language models, while widely popular, are often trained on unsanitized source code gathered from across the Internet. Previous work revealed that pre-trained models can remember the content of their training data and regurgitate it through data extraction attacks. Due to the large size of current models, only a few entities have the resources to pre-train them. Fine-tuning, however, requires fewer resources and is increasingly used by both small and large entities for its effectiveness on specialized data. Such small, curated fine-tuning datasets may contain sensitive information or proprietary assets. In this study, we attack both pre-trained and fine-tuned code language models to investigate the extent of data extractability. We first develop a custom benchmark to assess the vulnerability of both pre-training and fine-tuning samples to extraction attacks. Our findings reveal that 54.9% of extractable pre-training data could be retrieved from StarCoder2-15B, whereas this number decreased to 23.5% after fine-tuning. This indicates that fine-tuning reduces the extractability of pre-training data. However, compared to larger models, fine-tuning smaller models increases their vulnerability to data extraction attacks on fine-tuning data. Given the potential sensitivity of fine-tuning data, this can lead to more severe consequences. Lastly, we manually analyzed 2000 extractable samples before and after fine-tuning. We found that data carriers and licensing information are the data categories most likely to be memorized by pre-trained and fine-tuned models, while licensing information is the most likely to be forgotten after fine-tuning.


Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada) change

16:00 - 17:30
16:00
10m
Talk
How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning
Technical Papers
Fabio Salerno Delft University of Technology, Ali Al-Kaswan Delft University of Technology, Netherlands, Maliheh Izadi Delft University of Technology
16:10
10m
Talk
Can LLMs Generate Higher Quality Code Than Humans? An Empirical Study
Technical Papers
Mohammad Talal Jamil Lahore University of Management Sciences, Shamsa Abid National University of Computer and Emerging Sciences, Shafay Shamail LUMS, DHA, Lahore
16:20
10m
Talk
Prompt Engineering or Fine-Tuning: An Empirical Assessment of LLMs for Code
Technical Papers
Jiho Shin York University, Clark Tang, Tahmineh Mohati University of Calgary, Maleknaz Nayebi York University, Song Wang York University, Hadi Hemmati York University
16:30
5m
Talk
Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
Data and Tool Showcase Track
Timur Galimzyanov JetBrains Research, Sergey Titov JetBrains Research, Yaroslav Golubev JetBrains Research, Egor Bogomolov JetBrains Research
Pre-print
16:35
5m
Talk
SnipGen: A Mining Repository Framework for Evaluating LLMs for Code
Data and Tool Showcase Track