RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering (MSR 2025 - Data and Tool Showcase Track)

Who

Samuel Abedu, Laurine Menneron, SayedHassan Khatoonabadi, Emad Shihab

Track

MSR 2025 Data and Tool Showcase Track

When

Mon 28 Apr 2025 15:05 - 15:10 at 214 - AI for SE (1) Chair(s): Diego Elias Costa

Abstract

Software repositories contain a wealth of data about the software development process, such as source code, documentation, issue tracking, and commit histories. However, accessing and extracting meaningful insights from these data is time-consuming and requires technical expertise, posing challenges for software practitioners, especially non-technical stakeholders like project managers. Existing solutions, such as software engineering chatbots leveraging LLMs, have demonstrated significant limitations in retrieving relevant data to answer user questions. In this paper, we introduce RepoChat, a web-based tool designed to answer repository-related questions by synergizing LLMs with knowledge graphs. RepoChat operates in two steps: (1) the Data Ingestion step, where it collects repository metadata, such as commits, issues, files, and users, and constructs a knowledge graph from it; and (2) the Interaction step, where it takes the user's natural language question, translates it into graph queries using an LLM, executes these queries against the knowledge graph, and generates a user-friendly response to the question using the query results as context. We evaluate RepoChat by conducting a user study in which participants asked a series of repository-related questions representing common developer intents. RepoChat achieved an accuracy of 90%, correctly answering 36 out of 40 questions, demonstrating its effectiveness in accurately retrieving relevant information to answer users' questions. RepoChat is available at https://repochattool.streamlit.app, and its source code is accessible on GitHub at https://github.com/sabedu/repositoryChat.
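The two steps described in the abstract can be illustrated with a minimal sketch. This is not RepoChat's implementation: the sample metadata, the edge-list graph, the relation names (`AUTHORED`, `MODIFIED`), and every function name below are hypothetical, and the LLM translation step is replaced by a small regex-based stand-in so the example runs offline.

```python
import re

# Hypothetical repository metadata; a real ingestion step would collect it from GitHub.
SAMPLE_METADATA = {
    "commits": [
        {"sha": "a1", "author": "alice", "files": ["app.py"]},
        {"sha": "b2", "author": "bob", "files": ["app.py", "README.md"]},
        {"sha": "c3", "author": "alice", "files": ["utils.py"]},
    ],
}


def ingest(metadata):
    """Step 1 (Data Ingestion): build a toy knowledge graph as (source, relation, target) edges."""
    edges = []
    for commit in metadata["commits"]:
        edges.append((commit["author"], "AUTHORED", commit["sha"]))
        for path in commit["files"]:
            edges.append((commit["sha"], "MODIFIED", path))
    return edges


def translate_question(question):
    """Stand-in for the LLM (assumption): map a question to a structured graph query, or None."""
    match = re.search(r"how many commits did (\w+) make", question, re.IGNORECASE)
    if match:
        return {"relation": "AUTHORED", "source": match.group(1), "target": None}
    match = re.search(r"which commits modified ([\w.]+)", question, re.IGNORECASE)
    if match:
        return {"relation": "MODIFIED", "source": None, "target": match.group(1)}
    return None


def execute(edges, query):
    """Run the query against the graph; a None field acts as a wildcard."""
    return [
        (source, relation, target)
        for source, relation, target in edges
        if relation == query["relation"]
        and query["source"] in (None, source)
        and query["target"] in (None, target)
    ]


def answer(edges, question):
    """Step 2 (Interaction): translate the question, execute the query, phrase a response."""
    query = translate_question(question)
    if query is None:
        return "Sorry, I could not translate that question into a graph query."
    rows = execute(edges, query)
    if query["relation"] == "AUTHORED":
        return f"{query['source']} made {len(rows)} commit(s)."
    shas = ", ".join(sorted(source for source, _, _ in rows))
    return f"Commits that modified {query['target']}: {shas or 'none'}."


if __name__ == "__main__":
    graph = ingest(SAMPLE_METADATA)
    print(answer(graph, "How many commits did alice make?"))
    print(answer(graph, "Which commits modified app.py?"))
```

In this sketch, the response is derived only from the rows returned by the graph query, which mirrors the abstract's description of using the query results as context. A production system would presumably use a graph database and a real LLM in place of the edge list and regex rules.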

Samuel Abedu

Concordia University

Canada

Laurine Menneron

CESI Graduate School of Engineering

France

SayedHassan Khatoonabadi

Concordia University, Montreal

Canada

Emad Shihab

Concordia University, Montreal

Canada

Session Program

Mon 28 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	AI for SE (1): Technical Papers / Data and Tool Showcase Track / Registered Reports at 214 Chair(s): Diego Elias Costa Concordia University, Canada

14:00 10m Talk		Combining Large Language Models with Static Analyzers for Code Review Generation Technical Papers Imen Jaoua DIRO, Université de Montréal, Oussama Ben Sghaier DIRO, Université de Montréal, Houari Sahraoui DIRO, Université de Montréal Pre-print
14:10 10m Talk		Harnessing Large Language Models for Curated Code Reviews Technical Papers Oussama Ben Sghaier DIRO, Université de Montréal, Martin Weyssow Singapore Management University, Houari Sahraoui DIRO, Université de Montréal Pre-print
14:20 10m Talk		SMATCH-M-LLM: Semantic Similarity in Metamodel Matching With Large Language Models Technical Papers Nafisa Ahmed Polytechnique Montreal, Hin Chi Kwok Hong Kong Polytechnic University, Mohammad Hamdaqa Polytechnique Montreal, Wesley Assunção Johannes Kepler University Linz
14:30 10m Talk		How Effective are LLMs for Data Science Coding? A Controlled Experiment (Technical Track Distinguished Paper Award) Technical Papers Nathalia Nascimento Pennsylvania State University, Everton Guimaraes Pennsylvania State University, Sai Sanjna Chintakunta Pennsylvania State University, Santhosh AB Pennsylvania State University Pre-print
14:40 10m Talk		Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot Technical Papers Daniele Bifolco University of Sannio, Pietro Cassieri University of Salerno, Giuseppe Scanniello University of Salerno, Massimiliano Di Penta University of Sannio, Italy, Fiorella Zampetti University of Sannio, Italy Pre-print
14:50 10m Talk		Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation Technical Papers Chunhua Liu The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam University of Melbourne
15:00 5m Talk		Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks Technical Papers Kyi Shin Khant The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam University of Melbourne
15:05 5m Talk		RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering Data and Tool Showcase Track Samuel Abedu Concordia University, Laurine Menneron CESI Graduate School of Engineering, SayedHassan Khatoonabadi Concordia University, Montreal, Emad Shihab Concordia University, Montreal
15:10 5m Talk		How do Copilot Suggestions Impact Developers' Frustration and Productivity? Registered Reports Emanuela Guglielmi University of Molise, Venera Arnaoudova Washington State University, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Rocco Oliveto University of Molise, Simone Scalabrino University of Molise
15:15 5m Talk		Exploring the Lifecycle and Maintenance Practices of Pre-Trained Models in Open-Source Software Repositories Registered Reports Matin Koohjani Concordia University, Diego Elias Costa Concordia University, Canada Pre-print