The algorithm's success rate is higher than a human scientist's, in part because it's analyzing data from failed experiments, otherwise known as "dark reactions." Often, these sit in laboratory notebooks, accessible only to the scientist who conducted the original experiment. But the team from Haverford College has taken a different approach, digitizing thousands of successful and failed reactions to create a vast, publicly accessible repository. Associate Professor of Chemistry Joshua Schrier broke down the properties of each experiment, while fellow Associate Professor of Chemistry Alexander Norquist worked on the machine-learning algorithm.
As Nature explains, the team has been focusing on reactions that yield crystalline products, made by mixing and heating a set of reagents in a solvent. Specifically, this involved materials called vanadium selenites -- compounds of vanadium, selenium and oxygen. While examining their notes, the researchers predicted new reactions based on their years of scientific experience. But the algorithm was able to look deeper, spotting underlying patterns that might not be obvious to the human brain.
The numbers back up this hypothesis; the algorithm, when tested, was able to generate a crystalline product in 89 percent of roughly 500 cases. The researchers, meanwhile, were successful 78 percent of the time. "Leveraging unpublished data in an unbiased way by machine learning models can lead to invaluable predictions," says Harvard Professor of Chemistry and Chemical Biology Alán Aspuru-Guzik. "In particular, the authors show that non-trivial correlations and predictions can arise from laboratory notebook data that can accelerate new materials discovery."
Such thinking could change the way scientific discoveries are reported. At the moment, researchers often limit their papers to the materials and processes that produced a successful compound. The multitude of failures is left out. "There could have been a hundred total reactions that went into the development or the refinement of the conditions in order to give those specific reactions," Norquist explains. "I think about the failures as the bit of the iceberg that's underwater -- we only ever see the top."
The team's database is available online as the Dark Reactions Project. The hope is that other scientists will share their failed attempts, improving the dataset and the machine-learning algorithm's predictions.