Improving Data Science Competitions to Advance Mental Health Research
In the past three years, the Child Mind Institute has led several highly successful data science competitions, attracting participation from over 9,000 teams of researchers, data scientists, and those interested in mental health. Now, researchers from the Centers for Data Analytics, Innovation, and Rigor (DAIR) and the Strategic Data Initiatives (SDI) have published new recommendations for designing and promoting successful data science competitions.
The paper, published in the peer-reviewed journal Nature Mental Health, offers guidance and highlights challenges for structuring competitions to promote scientific advancement focused on brain health data. Accompanying the publication, the team has released a Data Science Competition Organizer Checklist to help researchers organize more effective competitions in the brain and mental health fields.
“Data science competitions offer a powerful way to crowdsource innovative solutions and multidisciplinary expertise,” says Gregory Kiar, PhD, director of the DAIR Center. “However, simply making data publicly available doesn’t guarantee meaningful participation or scientifically useful outcomes. Our goal was to provide a roadmap for maximizing both the inclusivity and scientific impact of these competitions.”
The paper addresses a fundamental source of tension in data science competitions: participants are typically rewarded for achieving the highest scores on performance metrics rather than for generating scientifically meaningful insights. The researchers recommend that, whenever possible, organizers design datasets and evaluation criteria that minimize the risk of participants exploiting structural biases to boost their scores.
It is also critical to recognize participants who contribute valuable discussions and interpretations through “medals” or other reward systems. This way, organizers can leverage participant contributions to directly advance the scientific question and to drive meaningful dialogue that deepens our understanding of mental health and informs future research and clinical applications.
In addition, competition organizers should consider and seek to remove barriers that prevent equitable participation. This includes recognizing underrepresented groups in data science and offering educational materials that support skill development.
“The availability of big data doesn’t automatically lead to engagement,” explains Arianna Zuanazzi, PhD, author and Open Science and Research Collaboration specialist. “Organizers need to deliberately plan for inclusivity, from recruiting diverse participants to ensuring datasets themselves don’t reinforce existing healthcare and other systemic biases.”
The Child Mind Institute has been at the forefront of supporting open science practices that have made such competitions possible and impactful. Prior competitions have featured data from the Healthy Brain Network, with support from Kaggle, the California Department of Health Care Services, Dell Technologies, NVIDIA, and the Stavros Niarchos Foundation. The Data Science Competition Organizer Checklist is freely available on the Open Science Framework for researchers and organizations looking to design their next competition.