MIT researchers automate drug design with machine learning

Their model can generate molecules that could be used for therapeutics.

Developing and improving medications is typically a long and involved process. Chemists build and tweak molecules, sometimes aiming to create a new treatment for a specific disease or symptom, other times working to improve a drug that already exists. The work takes a lot of time and expert knowledge, and attempts often end with a drug that doesn't work as hoped. Now researchers at MIT are using machine learning to automate this process. "The motivation behind this was to replace the inefficient human modification process of designing molecules with automated iteration and assure the validity of the molecules we generate," Wengong Jin, a PhD student in MIT's Computer Science and Artificial Intelligence Laboratory, said in a statement.

The research team trained their machine learning model on 250,000 molecular graphs, which are detailed representations of a molecule's structure in which atoms are the nodes and the bonds between them are the edges. The researchers then gave the model three tasks: generate molecules, find the best base molecules to build off of and design new molecules with improved properties. They found that their model completed these tasks more effectively than other systems designed to automate the drug design process.
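For readers unfamiliar with the term, a molecular graph can be sketched in a few lines of Python. This is only an illustration of the general idea, not the researchers' data format: the `MolecularGraph` class, its method names and the ethanol example are all assumptions made here for clarity, and hydrogen atoms are left out for brevity.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class MolecularGraph:
    """Hypothetical minimal molecular graph: atoms are nodes, bonds are edges."""
    atoms: list[str]                                   # element symbol per node
    bonds: dict[tuple[int, int], int] = field(default_factory=dict)  # (i, j) -> bond order

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        if i == j:
            raise ValueError("an atom cannot bond to itself")
        # Store each undirected edge once, with the smaller index first.
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i: int) -> list[int]:
        """Indices of atoms directly bonded to atom i."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))

    def bond_order_sum(self, i: int) -> int:
        """Total bond order used by atom i."""
        return sum(order for (a, b), order in self.bonds.items() if i in (a, b))


# Heavy-atom skeleton of ethanol: C-C-O
ethanol = MolecularGraph(atoms=["C", "C", "O"])
ethanol.add_bond(0, 1)
ethanol.add_bond(1, 2)
print(ethanol.neighbors(1))
```

Storing bonds as edges with an order (single, double and so on) is what lets a program reason about a molecule's structure rather than treating it as a picture.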

First, when the model was tasked with generating new molecules, every one it created turned out to be valid. That's particularly important since producing invalid molecules is a major shortcoming of other automation systems: of the others the researchers compared their model to, the best had only a 43.5 percent validity rate. Second, when the model was told to find the best base molecule (known as a lead molecule) that is both highly soluble and easily synthesized, it again outperformed other systems. The best candidate molecule generated by their model scored 30 percent higher on those two desired properties than the best option produced by more traditional systems. Lastly, when the model was told to modify 800 molecules to improve them for those properties while keeping them similar in structure to the lead molecule, it succeeded around 80 percent of the time, creating new, similarly structured molecules that scored higher for those two properties than the original molecules did.
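To give a sense of what a "validity rate" measures, here is a rough Python sketch. The article does not say how the researchers checked validity, so the rule used here (no atom may have more bonds than its usual maximum valence) is a simplifying assumption, as are the `MAX_VALENCE` table, the function names and the three sample molecules. Real chemistry toolkits apply more thorough checks.

```python
# Assumed maximum valences for a few common elements.
# Simplified: ignores charges, aromaticity and unusual oxidation states.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1}


def is_valid(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, bond_order)."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        # Reject self-bonds and bonds that point at atoms that do not exist.
        if i == j or not (0 <= i < len(atoms)) or not (0 <= j < len(atoms)):
            return False
        used[i] += order
        used[j] += order
    # Every atom must be a known element and stay within its valence limit.
    return all(
        sym in MAX_VALENCE and used[k] <= MAX_VALENCE[sym]
        for k, sym in enumerate(atoms)
    )


def validity_rate(molecules):
    """Fraction of (atoms, bonds) pairs that pass the valence check."""
    if not molecules:
        return 0.0
    return sum(is_valid(a, b) for a, b in molecules) / len(molecules)


generated = [
    (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),                    # ethanol skeleton
    (["C", "O"], [(0, 1, 2)]),                                    # formaldehyde skeleton
    (["O", "C", "C", "C"], [(0, 1, 1), (0, 2, 1), (0, 3, 1)]),    # oxygen with three bonds
]
rate = validity_rate(generated)
print(f"{rate:.1%} of generated molecules pass the valence check")
```

In this toy sample the third molecule fails because its oxygen atom has three bonds, so two of the three molecules count as valid.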

Going forward, the research team will test the model on other pharmaceutical properties and work to make a model that can function with limited amounts of training data. The research will be presented next week at the International Conference on Machine Learning.