RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering
Software repositories contain a wealth of data about the software development process, such as source code, documentation, issue tracking, and commit histories. However, accessing and extracting meaningful insights from these data is time-consuming and requires technical expertise, posing challenges for software practitioners, especially non-technical stakeholders like project managers. Existing solutions, such as software engineering chatbots leveraging LLMs, have demonstrated significant limitations in retrieving relevant data to answer user questions. In this paper, we introduce RepoChat, a web-based tool designed to answer repository-related questions by combining LLMs with knowledge graphs. RepoChat operates in two steps: (1) the Data Ingestion step, where it collects repository metadata, such as commits, issues, files, and users, and constructs a knowledge graph from it; and (2) the Interaction step, where it takes the user's natural-language question, translates it into graph queries using an LLM, executes these queries against the knowledge graph, and generates a user-friendly response to the question using the query results as context. We evaluate RepoChat by conducting a user study in which participants asked a series of repository-related questions representing common developer intents. RepoChat achieved an accuracy of 90%, correctly answering 36 out of 40 questions, demonstrating its effectiveness in accurately retrieving relevant information to answer users' questions. RepoChat is available at https://repochattool.streamlit.app, and its source code is accessible on GitHub at https://github.com/sabedu/repositoryChat.
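The two-step flow described above can be illustrated with a minimal sketch. It is not RepoChat's implementation: a small in-memory graph stands in for the knowledge graph, a hard-coded function (`fake_translate`) stands in for the LLM that turns a question into a graph query, and every name here (`KnowledgeGraph`, `ingest`, `answer`, the `AUTHORED` and `MODIFIED` relations) is hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy stand-in for a graph database (assumption, not RepoChat's store)."""
    nodes: dict = field(default_factory=dict)  # node_id -> {"label": ..., props}
    edges: list = field(default_factory=list)  # (src_id, relation, dst_id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def match(self, src_label, relation, dst_label):
        """Return (src_id, dst_id) pairs for edges that fit the pattern."""
        return [
            (s, d)
            for s, r, d in self.edges
            if r == relation
            and self.nodes[s]["label"] == src_label
            and self.nodes[d]["label"] == dst_label
        ]


def ingest(commits):
    """Step 1 (Data Ingestion): build a graph from commit metadata."""
    graph = KnowledgeGraph()
    for c in commits:
        graph.add_node(c["sha"], "Commit", message=c["message"])
        graph.add_node(c["author"], "User")
        graph.add_edge(c["author"], "AUTHORED", c["sha"])
        for path in c["files"]:
            graph.add_node(path, "File")
            graph.add_edge(c["sha"], "MODIFIED", path)
    return graph


def fake_translate(question):
    """Hypothetical placeholder for the LLM: question -> graph pattern."""
    if "file" in question.lower():
        return ("Commit", "MODIFIED", "File")
    return ("User", "AUTHORED", "Commit")


def answer(question, graph, translate=fake_translate):
    """Step 2 (Interaction): translate, run the query, format a reply."""
    pattern = translate(question)
    rows = graph.match(*pattern)
    # Summarise by source node; a real system would pass `rows` to an LLM
    # as context and let it write the user-facing response.
    counts = Counter(src for src, _ in rows)
    return ", ".join(f"{name}: {n}" for name, n in counts.most_common())


if __name__ == "__main__":
    demo = [
        {"sha": "a1", "author": "alice", "message": "fix bug", "files": ["x.py"]},
        {"sha": "b2", "author": "bob", "message": "add docs", "files": ["x.py", "y.md"]},
        {"sha": "c3", "author": "alice", "message": "refactor", "files": ["y.md"]},
    ]
    print(answer("Who authored the most commits?", ingest(demo)))
```

Separating the translation step from query execution, as in this sketch, means the language model never has to hold the repository data itself; it only needs to produce a query, and the answer is grounded in whatever the graph returns.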