Sage Bionetworks / DREAM Breast Cancer Prognosis Challenge

Last year at the Sage Congress, we announced a contest to create a computational model that more accurately predicts breast cancer survival than existing models.

It was called the Sage Bionetworks / DREAM Breast Cancer Prognosis Challenge, and we built it on the following foundation:

•    Training data set: genomic and clinical data from 2000 women diagnosed with breast cancer (the METABRIC data set)
•    Data access and analysis tools: Synapse
•    Compute resources: each participant provided with a standardized virtual machine donated by Google
•    Model scoring: models submitted to Synapse for scoring on a real-time leaderboard

This Challenge was open source and encouraged code-sharing to forge innovative computational models. The standardized, shared computational infrastructure enabled participants to use code submitted by others in their own model building. The winning code must also be reproducible.

To select the winning model, we’re using a brand new data set derived from approximately 180 breast cancer samples, with data generation funded by Avon. The winning model will be the one that, having been trained on METABRIC data, predicts survival most accurately when applied to this new data set.

The winner gets, in addition to peer acclaim, the right to submit a pre-approved article about his/her winning model to Science Translational Medicine.

In all, 354 participants from 35 countries registered for the Challenge. Throughout the model-training period, more than 1700 models were submitted to the leaderboard on Synapse for scoring, and the code for every model on the leaderboard was accessible to all to use in evolving new models.

Each participant was invited to choose up to 5 of the models they had generated and submit them for final scoring against the METABRIC data set. The top two teams were recognized at the November 2012 DREAM conference.

•    Top Scoring Team: Attractor Metagenes
–    Team Members: Wei-Yi Cheng, Tai-Hsien Ou Yang, and Dimitris Anastassiou
–    Affiliation: Center for Computational Biology and Bioinformatics and Department of Electrical Engineering, Columbia University
•    Second Best Scoring Team: PittTransMed
–    Team Members: Dr. Songjian Lu, Dr. Chunhui Cai, Ms. Hatice Ulku Osmanbeyoglu, Ms. Lujia Chen, Dr. Roger Day, Dr. Gregory Cooper and Dr. Xinghua Lu
–    Affiliation: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh

Their models, whether the same ones or different ones, are now being tested against the newly generated Oslo-Val data set, and a single winner will publish in STM and be invited to speak at the Congress.

For final scoring, 46 individuals or teams submitted models, so in total there were about 170 final models to score against METABRIC and then against Oslo-Val.

We’ll be announcing the final winner soon. To keep up with this and other Sage Bionetworks news, follow us on Twitter.