How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning
Code language models, while widely popular, are often trained on unsanitized source code gathered from across the Internet. Previous work revealed that \textit{pre-trained} models can remember the content of their training data and regurgitate it through data extraction attacks. Due to the large size of current models, only a few entities have the resources for pre-training such models. However, fine-tuning requires fewer resources and is increasingly used by both small and large entities for its effectiveness on specialized data. Such small, curated fine-tuning datasets might contain sensitive information or proprietary assets. In this study, we attack \textit{both} pre-trained and fine-tuned code language models to investigate the extent of data extractability. We first develop a custom benchmark to assess the vulnerability of both pre-training and fine-tuning samples to extraction attacks. Our findings reveal that 54.9% of extractable pre-training data could be retrieved from StarCoder2-15B, whereas this proportion decreased to 23.5% after fine-tuning. This indicates that fine-tuning reduces the extractability of \textit{pre-training} data. However, fine-tuning \textit{smaller} models increases their vulnerability to data extraction attacks on \textit{fine-tuning} data compared to larger models. Given the potential sensitivity of fine-tuning data, this can lead to more severe consequences. Lastly, we manually analyzed 2,000 extractable samples before and after fine-tuning. We found that data carriers and licensing information are the data categories most likely to be memorized by pre-trained and fine-tuned models, while the latter is the most likely to be forgotten after fine-tuning.
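To illustrate the kind of attack studied here, the sketch below shows a minimal prefix-based extraction probe: prompt a causal code model with the first tokens of a training sample and check whether its greedy continuation reproduces the true suffix. The checkpoint name, prefix/suffix lengths, and the exact-match criterion are illustrative assumptions, not the paper's exact setup.
\begin{verbatim}
# Minimal sketch of a prefix-based data extraction attack
# (checkpoint, prefix/suffix lengths, and matching rule are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigcode/starcoder2-15b"  # swap in a fine-tuned checkpoint
PREFIX_TOKENS = 64                     # assumed prompt length
SUFFIX_TOKENS = 64                     # assumed continuation length to check

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def is_extractable(sample: str) -> bool:
    """Prompt the model with a prefix of a training sample and check
    whether its greedy continuation reproduces the true suffix."""
    ids = tokenizer(sample, return_tensors="pt").input_ids[0]
    if ids.numel() < PREFIX_TOKENS + SUFFIX_TOKENS:
        return False
    prefix = ids[:PREFIX_TOKENS].unsqueeze(0).to(model.device)
    true_suffix = tokenizer.decode(
        ids[PREFIX_TOKENS:PREFIX_TOKENS + SUFFIX_TOKENS]
    )
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=SUFFIX_TOKENS,
                             do_sample=False)
    generated_suffix = tokenizer.decode(out[0, PREFIX_TOKENS:])
    # Exact-match criterion (assumed); looser similarity metrics also work.
    return generated_suffix.strip() == true_suffix.strip()
\end{verbatim}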