The research team trained their machine learning model on 250,000 molecular graphs, which represent a molecule's structure as atoms connected by bonds. The researchers then had the model generate molecules, find the best base molecules to build on, and design new molecules with improved properties. They found that the model completed these tasks more effectively than other systems designed to automate the drug design process.
First, when the model was tasked with generating new, valid molecules, every molecule it created turned out to be valid. That is particularly important because producing invalid molecules is a major shortcoming of other automation systems: of the systems the researchers compared their model to, the best had a validity rate of only 43.5 percent. Second, when the model was told to find the best base molecule, known as a lead molecule, that is both highly soluble and easily synthesized, it again outperformed other systems. The best candidate molecule generated by the model scored 30 percent higher on those two desired properties than the best option produced by more traditional systems. Lastly, when the model was told to modify 800 molecules to improve those properties while keeping them similar in structure to the lead molecule, it succeeded around 80 percent of the time, creating new, similarly structured molecules that scored higher on both properties than the original molecules did.
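The figure of around 80 percent in the last task is a success rate over the set of modified molecules. As a rough illustration of how such a rate could be tallied, the sketch below assumes each result is recorded as an original score, a new score, and a structural-similarity value. The `Result` fields, the `success_rate` function, the 0.4 similarity threshold, and the sample numbers are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One modified molecule (hypothetical record layout)."""
    original_score: float  # property score of the starting molecule
    new_score: float       # property score of the modified molecule
    similarity: float      # structural similarity to the original, 0.0 to 1.0


def success_rate(results, min_similarity=0.4):
    """Fraction of results that improved the score while staying similar.

    min_similarity is an assumed cutoff, not a value from the article.
    """
    if not results:
        return 0.0
    successes = sum(
        1
        for r in results
        if r.new_score > r.original_score and r.similarity >= min_similarity
    )
    return successes / len(results)


# Made-up sample data, for illustration only.
sample = [
    Result(1.0, 1.5, 0.6),   # improved and similar: counts as a success
    Result(0.2, 0.1, 0.8),   # similar but scored lower: failure
    Result(-0.5, 0.9, 0.2),  # improved but too different: failure
    Result(0.0, 0.3, 0.4),   # improved and at the cutoff: success
    Result(2.0, 2.4, 0.7),   # improved and similar: success
]

rate = success_rate(sample)
print(f"{rate:.0%} of modifications succeeded")
```

Requiring both conditions at once reflects the task as described: a modification only counts if the new molecule scores higher and still resembles the molecule it started from.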
Going forward, the research team will test the model on other pharmaceutical properties and work to make a model that can function with limited amounts of training data. The research will be presented next week at the International Conference on Machine Learning.