MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

The International Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we call upon everyone interested to apply their tools to a common data set. The challenge dares researchers and practitioners to bravely put their mining tools and approaches to the test.

One of the secret ingredients behind the success of the International Conference on Mining Software Repositories (MSR) is its annual Mining Challenge, in which MSR participants can showcase their techniques, tools, and creativity on a common data set. In true MSR fashion, this data set is a real data set contributed by researchers in the community, solicited through an open call. There are many benefits to sharing a data set for the MSR Mining Challenge. The selected challenge proposal explaining the data set will appear in the MSR 2025 proceedings, and challenge papers using the data set will be required to cite either the challenge proposal or an existing paper by the researchers about the selected data set. Furthermore, the authors of the data set will join the MSR 2025 organizing committee as Mining Challenge (co-)chair(s), who will manage the reviewing process (e.g., recruiting a Challenge PC, managing submissions, and making review assignments). Finally, it is not uncommon for challenge data sets to feature in MSR and other publications well after the edition of the conference in which they appear!

If you would like to submit your data set for consideration for the 2025 MSR Mining Challenge, prepare a short proposal (1-2 pages plus appendices, if needed) containing the following information:

  1. Title of data set.
  2. High-level overview:
    • Short description, including what types of artifacts the data set contains.
    • Summary statistics (how many artifacts of different types).
  3. Internal structure:
    • How are the data structured and organized?
    • (Link to) schema, if applicable.
  4. How to access:
    • How can the data set be obtained?
    • What are the recommended ways to access it? Include examples of specific tools, shell commands, etc., if applicable.
    • What skills, infrastructure, and/or credentials would challenge participants need to effectively work with the data set?
  5. What kinds of research questions do you expect challenge participants could answer?
  6. A link to a (sub)sample of the data for the organizing committee to peruse (e.g., via GitHub, Zenodo, or Figshare).

Submissions must conform to the IEEE conference proceedings template, as specified in the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options).

The first task of the authors of the selected proposal will be to prepare the Call for Challenge Papers, which outlines the expected content and structure of submissions, as well as the technical details of how to access and analyze the data set. This call will be published on the MSR website on September 2nd. By making the challenge data set available by late summer, we hope that many students will be able to use the challenge data set for their graduate class projects in the Fall semester.

Important dates:

Deadline for proposals: August 19, 2024

Notification: August 26, 2024

Call for Challenge Papers Published: September 2, 2024

More details to follow …