MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Mon 28 Apr 2025 14:20 - 14:30 at 214 - AI for SE (1) Chair(s): Diego Costa

Metamodel matching is a crucial step in defining transformation rules within model-driven engineering, as it identifies correspondences between different metamodels and lays the foundation for effective transformations. Current techniques face significant challenges due to syntactic and structural heterogeneity. To address this, matching techniques often employ semantic similarity to identify correspondences. Traditional semantic matchers, however, rely on ontology matching tools or lexical databases, which can struggle when metamodels use different terminologies or have different hierarchical structures. Inspired by the contextual understanding capabilities of Large Language Models (LLMs), we explore their applicability (specifically GPT-4) as semantic matchers and as alternatives to baseline methods for metamodel matching. However, metamodels can be large, and providing one in a single prompt can overwhelm an LLM and reduce accuracy. We therefore propose prompting LLMs with fragments of the source and target metamodels, identifying correspondences through an iterative process. The fragments included in each prompt are selected based on an initial mapping derived from the definitions of the metamodels' elements. In experiments on 10 metamodel matching cases, our LLM-based approach markedly improves matching accuracy, achieving an average F-measure of 91% and greatly outperforming both the baseline and hybrid approaches, whose best average F-measures are 29% and 74%, respectively. Moreover, our approach surpasses single-prompt LLM-based matching, which has an average F-measure of 80%, by approximately 11 percentage points.
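The fragment-and-iterate workflow described in the abstract can be sketched roughly as follows. This is an illustrative Python sketch only: the seed-mapping heuristic (textual similarity of element definitions via `difflib`), the neighborhood-based fragmenting, and the `ask_llm` callback are all assumptions made here for demonstration, not the authors' GPT-4-based implementation.

```python
from difflib import SequenceMatcher

def initial_mapping(source_defs, target_defs, threshold=0.5):
    """Seed mapping: pair each source element with the target element
    whose natural-language definition is most similar (above a threshold)."""
    mapping = {}
    for s_name, s_def in source_defs.items():
        best, best_score = None, threshold
        for t_name, t_def in target_defs.items():
            sim = SequenceMatcher(None, s_def.lower(), t_def.lower()).ratio()
            if sim > best_score:
                best, best_score = t_name, sim
        if best is not None:
            mapping[s_name] = best
    return mapping

def fragment(defs, anchor, neighbors):
    """Build a small metamodel fragment: the anchor element plus its
    structural neighbors (e.g. elements linked by references)."""
    names = [anchor, *neighbors.get(anchor, [])]
    return {n: defs[n] for n in names if n in defs}

def iterative_match(source_defs, target_defs, s_neighbors, t_neighbors, ask_llm):
    """For each seed pair, prompt the LLM with only the two fragments
    (instead of the whole metamodels) and collect the correspondences
    it confirms."""
    correspondences = {}
    for s_name, t_name in initial_mapping(source_defs, target_defs).items():
        s_frag = fragment(source_defs, s_name, s_neighbors)
        t_frag = fragment(target_defs, t_name, t_neighbors)
        correspondences.update(ask_llm(s_frag, t_frag))
    return correspondences
```

In this sketch, `ask_llm` stands in for a prompt to GPT-4 that presents the two fragments and asks which elements correspond; keeping each prompt to a fragment pair is what avoids overwhelming the model with a full metamodel.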

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
AI for SE (1): Technical Papers / Data and Tool Showcase Track / Registered Reports at 214
Chair(s): Diego Costa Concordia University, Canada
14:00
10m
Talk
Combining Large Language Models with Static Analyzers for Code Review Generation
Technical Papers
Imen Jaoua DIRO, Université de Montréal, Oussama Ben Sghaier DIRO, Université de Montréal, Houari Sahraoui DIRO, Université de Montréal
Pre-print
14:10
10m
Talk
Harnessing Large Language Models for Curated Code Reviews
Technical Papers
Oussama Ben Sghaier DIRO, Université de Montréal, Martin Weyssow Singapore Management University, Houari Sahraoui DIRO, Université de Montréal
Pre-print
14:20
10m
Talk
SMATCH-M-LLM: Semantic Similarity in Metamodel Matching With Large Language Models
Technical Papers
Nafisa Ahmed Polytechnique Montréal, Hin Chi Kwok Hong Kong Polytechnic University, Mohammad Hamdaqa Polytechnique Montréal, Wesley Assunção North Carolina State University
14:30
10m
Talk
How Effective are LLMs for Data Science Coding? A Controlled Experiment
Technical Papers
Nathalia Nascimento Pennsylvania State University, Everton Guimaraes Pennsylvania State University, USA, Sai Sanjna Chintakunta Pennsylvania State University, Santhosh AB Pennsylvania State University
Pre-print
14:40
10m
Talk
Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot
Technical Papers
Daniele Bifolco University of Sannio, Pietro Cassieri University of Salerno, Giuseppe Scanniello University of Salerno, Massimiliano Di Penta University of Sannio, Italy, Fiorella Zampetti University of Sannio, Italy
Pre-print
14:50
10m
Talk
Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation
Technical Papers
Chunhua Liu The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam The University of Melbourne
15:00
5m
Talk
Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks
Technical Papers
Kyi Shin Khant The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam The University of Melbourne
15:05
5m
Talk
RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering
Data and Tool Showcase Track
Samuel Abedu Concordia University, Laurine Menneron CESI Graduate School of Engineering, SayedHassan Khatoonabadi Concordia University, Emad Shihab Concordia University
15:10
5m
Talk
How do Copilot Suggestions Impact Developers' Frustration and Productivity?
Registered Reports
Emanuela Guglielmi University of Molise, Venera Arnaoudova Washington State University, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Rocco Oliveto University of Molise, Simone Scalabrino University of Molise
15:15
5m
Talk
Exploring the Lifecycle and Maintenance Practices of Pre-Trained Models in Open-Source Software Repositories
Registered Reports
Matin Koohjani Concordia University, Diego Costa Concordia University, Canada