MIT Research News' Journal
Friday, July 6th, 2018
| Time |
Event |
| 12:00a |
Automating molecule design to speed up drug development
Designing new molecules for pharmaceuticals is primarily a manual, time-consuming process that’s prone to error. But MIT researchers have now taken a step toward fully automating the design process, which could drastically speed things up and produce better results.
Drug discovery relies on lead optimization. In this process, chemists select a target (“lead”) molecule with known potential to combat a specific disease, then tweak its chemical properties for higher potency and other factors.
Often, chemists use expert knowledge and conduct manual tweaking of molecules, adding and subtracting functional groups — atoms and bonds responsible for specific chemical reactions — one by one. Even if they use systems that predict optimal chemical properties, chemists still need to do each modification step themselves. This can take hours for each iteration and may still not produce a valid drug candidate.
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Department of Electrical Engineering and Computer Science (EECS) have developed a model that better selects lead molecule candidates based on desired properties. It also modifies the molecular structure needed to achieve a higher potency, while ensuring the molecule is still chemically valid.
The model takes molecular structure data as input and directly creates molecular graphs: detailed representations of a molecular structure, with nodes representing atoms and edges representing bonds. It breaks those graphs down into smaller clusters of valid functional groups, which it uses as “building blocks” to more accurately reconstruct and better modify molecules.
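To make the graph representation concrete, here is a minimal Python sketch. It is not the researchers' code: the class name `MolecularGraph` and its methods are hypothetical, and hydrogens are left implicit for brevity.

```python
from collections import Counter


class MolecularGraph:
    """Atoms are nodes and bonds are edges labeled with a bond order."""

    def __init__(self):
        self.atoms = {}   # atom index -> element symbol
        self.bonds = {}   # frozenset({a, b}) -> bond order

    def add_atom(self, element):
        index = len(self.atoms)
        self.atoms[index] = element
        return index

    def add_bond(self, a, b, order=1):
        if a == b or a not in self.atoms or b not in self.atoms:
            raise ValueError("a bond must join two distinct existing atoms")
        self.bonds[frozenset((a, b))] = order

    def neighbors(self, index):
        # Every bond containing `index` contributes its other endpoint.
        return sorted(
            next(iter(pair - {index})) for pair in self.bonds if index in pair
        )

    def formula(self):
        # Heavy-atom formula only, since hydrogens are implicit in this sketch.
        counts = Counter(self.atoms.values())
        return "".join(
            f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items())
        )


# Ethanol's heavy-atom skeleton: C-C-O.
ethanol = MolecularGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
```

Storing each bond under an unordered pair keeps the graph undirected, which matches how a chemical bond has no direction.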
“The motivation behind this was to replace the inefficient human modification process of designing molecules with automated iteration and assure the validity of the molecules we generate,” says Wengong Jin, a PhD student in CSAIL and lead author of a paper describing the model that’s being presented at the 2018 International Conference on Machine Learning in July.
Joining Jin on the paper are Regina Barzilay, the Delta Electronics Professor at CSAIL and EECS, and Tommi S. Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science in CSAIL, EECS, and at the Institute for Data, Systems, and Society.
The research was conducted as part of the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium between MIT and eight pharmaceutical companies, announced in May. The consortium identified lead optimization as one key challenge in drug discovery.
“Today, it’s really a craft, which requires a lot of skilled chemists to succeed, and that’s what we want to improve,” Barzilay says. “The next step is to take this technology from academia to use on real pharmaceutical design cases, and demonstrate that it can assist human chemists in doing their work, which can be challenging.”
“Automating the process also presents new machine-learning challenges,” Jaakkola says. “Learning to relate, modify, and generate molecular graphs drives new technical ideas and methods.”
Generating molecular graphs
Systems that attempt to automate molecule design have cropped up in recent years, but their problem is validity. Those systems, Jin says, often generate molecules that are invalid under chemical rules, and they fail to produce molecules with optimal properties. This essentially makes full automation of molecule design infeasible.
These systems run on linear notations of molecules, called “simplified molecular-input line-entry systems,” or SMILES, where long strings of letters, numbers, and symbols represent individual atoms or bonds that can be interpreted by computer software. As the system modifies a lead molecule, it expands its string representation symbol by symbol — atom by atom, and bond by bond — until it generates a final SMILES string with higher potency of a desired property. In the end, the system may produce a final SMILES string that seems valid under SMILES grammar, but is actually invalid.
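The gap between string grammar and chemical validity can be illustrated with a toy example. The sketch below is a heavily simplified assumption, not the SMILES specification or any system described in the paper: it handles only the atoms C, N, and O, the bond symbols `=` and `#`, and parenthesized branches, and it judges validity with a small maximum-valence table.

```python
# Typical maximum valences (a simplification that ignores charges
# and aromaticity).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}
BOND_ORDER = {"=": 2, "#": 3}


def parse_toy_smiles(smiles):
    """Parse a tiny SMILES-like subset into (atoms, bonds).

    bonds is a list of (i, j, order). Raises ValueError on a syntax error.
    """
    atoms, bonds = [], []
    stack = []       # atoms to return to when a branch closes
    prev = None      # index of the atom the next atom will bond to
    pending = 1      # order of the next bond
    for ch in smiles:
        if ch in MAX_VALENCE:
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            if prev is None:
                raise ValueError("branch before any atom")
            stack.append(prev)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            prev = stack.pop()
        else:
            raise ValueError(f"unsupported symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return atoms, bonds


def chemically_valid(atoms, bonds):
    """True if no atom uses more bond order than its maximum valence."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[k] <= MAX_VALENCE[el] for k, el in enumerate(atoms))


# Acetic acid: parses cleanly and respects valence.
acetic = parse_toy_smiles("CC(=O)O")
# Also parses cleanly, but the second carbon ends up with five bonds.
overloaded = parse_toy_smiles("CC(C)(C)(C)C")
```

Both strings pass the syntactic parse, yet only the first survives the valence check, which is the kind of failure a symbol-by-symbol generator can run into.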
The researchers solve this issue by building a model that runs directly on molecular graphs, instead of SMILES strings, which can be modified more efficiently and accurately.
Powering the model is a custom variational autoencoder — a neural network that “encodes” an input molecule into a vector, which is basically a storage space for the molecule’s structural data, and then “decodes” that vector to a graph that matches the input molecule.
In the encoding phase, the model breaks down each molecular graph into clusters, or “subgraphs,” each of which represents a specific building block. Such clusters are automatically constructed using a common machine-learning concept called tree decomposition, where a complex graph is mapped into a tree structure of clusters, “which gives a scaffold of the original graph,” Jin says.
Both the scaffold tree structure and the molecular graph structure are encoded into their own vectors, where molecules are grouped together by similarity. This makes finding and modifying molecules an easier task.
In the decoding phase, the model reconstructs the molecular graph in a “coarse-to-fine” manner, much like gradually increasing the resolution of a low-resolution image to create a more refined version. It first generates the tree-structured scaffold, and then assembles the associated clusters (nodes in the tree) together into a coherent molecular graph. This ensures the reconstructed molecular graph is an exact replication of the original structure.
For lead optimization, the model can then modify lead molecules based on a desired property. It does so with the aid of a prediction algorithm that scores each molecule with a potency value of that property. In the paper, for instance, the researchers sought molecules with a combination of two properties: high solubility and synthetic accessibility.
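A rough sketch of this kind of decomposition, assuming the clusters are ring systems plus individual non-ring bonds. That choice, the function names, and the brute-force ring test are illustrative assumptions rather than the authors' implementation, and fused rings are merged into a single cluster here for simplicity.

```python
from collections import defaultdict, deque


def _reachable(adj, start, goal, skipped_bond):
    """Breadth-first search from start to goal that ignores one bond."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj[node]:
            if nxt in seen or frozenset((node, nxt)) == skipped_bond:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return False


def decompose(edges):
    """Split a molecular graph into ring clusters and non-ring bond clusters."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    ring_adj = defaultdict(set)
    clusters = []
    for a, b in edges:
        if _reachable(adj, a, b, frozenset((a, b))):
            # Endpoints stay connected without this bond, so it lies on a ring.
            ring_adj[a].add(b)
            ring_adj[b].add(a)
        else:
            clusters.append(frozenset((a, b)))  # non-ring bond: its own cluster

    # Each connected component of ring bonds becomes one ring cluster.
    seen = set()
    for start in list(ring_adj):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in ring_adj[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        clusters.append(frozenset(component))
    return clusters


def shared_atom_links(clusters):
    """Pairs of cluster indices that share at least one atom."""
    return [
        (i, j)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
        if clusters[i] & clusters[j]
    ]


# A six-membered ring (atoms 0-5) with one substituent (atom 6).
ring_with_substituent = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6)]
clusters = decompose(ring_with_substituent)
links = shared_atom_links(clusters)
```

The links connect clusters that overlap in an atom; a spanning tree over those links would play the role of the scaffold in this simplified picture.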
Given a desired property, the model optimizes a lead molecule by using the prediction algorithm to modify its vector — and, therefore, structure — by editing the molecule’s functional groups to achieve a higher potency score. It repeats this step for multiple iterations, until it finds the highest predicted potency score. Then, the model finally decodes a new molecule from the updated vector, with modified structure, by compiling all the corresponding clusters.
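The iterative update can be sketched as plain gradient ascent on the vector. This is an assumption about the general flavor of the optimization, not the method from the paper: `predicted_score` is a hypothetical stand-in for a learned property predictor, the two-dimensional vector is a toy, and decoding the result back into a molecule is omitted.

```python
def predicted_score(z):
    """Hypothetical predictor whose score peaks at the vector (1.0, -2.0)."""
    return -((z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2)


def numerical_gradient(f, z, eps=1e-5):
    """Central-difference estimate of the gradient of f at z."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        up[i] += eps
        down = list(z)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def optimize_latent(f, z0, step=0.1, iterations=100):
    """Repeatedly nudge the vector in the direction that raises the score."""
    z = list(z0)
    for _ in range(iterations):
        grad = numerical_gradient(f, z)
        z = [zi + step * gi for zi, gi in zip(z, grad)]
    return z


lead_vector = [0.0, 0.0]
improved_vector = optimize_latent(predicted_score, lead_vector)
```

Using a numerical gradient keeps the sketch independent of how the predictor is built; a real system would more likely differentiate the predictor directly.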
Valid and more potent
The researchers trained their model on 250,000 molecular graphs from the ZINC database, a collection of 3-D molecular structures available for public use. They tested the model on tasks to generate valid molecules, find the best lead molecules, and design novel molecules with increased potencies.
In the first test, the researchers’ model generated 100 percent chemically valid molecules from a sample distribution, compared to SMILES models that generated 43 percent valid molecules from the same distribution.
The second test involved two tasks. First, the model searched the entire collection of molecules to find the best lead molecule for the desired properties, solubility and synthetic accessibility. In that task, the model found a lead molecule with a 30 percent higher potency than traditional systems. The second task involved modifying 800 molecules for higher potency while keeping them structurally similar to the lead molecule. In doing so, the model created new molecules, closely resembling the lead’s structure, averaging a more than 80 percent improvement in potency.
The researchers next aim to test the model on more properties, beyond solubility, which are more therapeutically relevant. That, however, requires more data. “Pharmaceutical companies are more interested in properties that fight against biological targets, but they have less data on those. A challenge is developing a model that can work with a limited amount of training data,” Jin says. | | 11:25a |
J-PAL North America’s Education, Technology, and Opportunity Innovation Competition announces inaugural partners
J-PAL North America, a research center at MIT, has announced that it will partner with two leading education technology nonprofits to test promising models to improve learning, as part of the inaugural round of the Education, Technology, and Opportunity Innovation Competition.
Launched at MIT this past year, J-PAL North America’s Education, Technology, and Opportunity Innovation Competition supports education leaders in using randomized evaluations to generate evidence on how technology can improve student learning, particularly for students from disadvantaged backgrounds.
J-PAL North America’s inaugural competition partners are the Family Engagement Lab, an education technology nonprofit that aims to promote effective at-home learning opportunities, and the Western Governors University Center for Applied Learning Science, an online innovation lab that seeks to improve student performance in math.
“We’re excited to partner with Family Engagement Lab and Western Governors University to develop randomized evaluations that can help us better understand the potential for technology to meaningfully improve education outcomes,” says Philip Oreopoulos, professor of economics at the University of Toronto and co-chair of the J-PAL Education, Technology, and Opportunity Initiative. “Technology presents an exciting opportunity to deliver promising new and novel approaches at scale. But with so many innovative programs out there, it’s crucial that researchers and practitioners work together to test, identify, and improve upon effective programs and understand their mechanisms.”
Family Engagement Lab will partner with J-PAL North America to develop an evaluation of FASTalk (Families and Schools Talk), a multilingual digital messaging platform that helps prekindergarten through grade 5 teachers engage with hard-to-reach parents.
Family engagement in the learning process has been found to improve student outcomes. However, it can be challenging for teachers to connect with their students’ families and ensure that classroom learnings are reinforced by learning activities at home. For some households, language barriers make teacher-to-parent communication particularly challenging.
Through the FASTalk platform, teachers can send learning tips and activities to the child’s caregiver in their home language. Message content is aligned to the curriculum and academic calendar. Moreover, the content is pre-scheduled and automatically sent out to reduce demands on teacher time. The platform also supports a two-way dialogue between teachers and parents to facilitate ongoing, reciprocal communication.
"We’re thrilled to be a winner of the J-PAL Education, Technology, and Opportunity Innovation Competition,” says Elisabeth O’Bryon, co-founder and head of research for Family Engagement Lab. “Understanding the impact of FASTalk is crucial as we work towards our goal of developing a scalable, evidence-based family engagement program that meaningfully supports teachers, families, and students. It is an amazing opportunity to have J-PAL’s support to design and implement a randomized evaluation of FASTalk to evaluate the student-level effects of teachers sending curriculum-aligned learning activities to families."
Western Governors University’s (WGU) Center for Applied Learning Science (CALS) seeks to develop scalable models to improve student learning in math. Almost 64 percent of adult learners start at WGU with limited math proficiency, and survey results indicate that many WGU students feel anxiety around their math ability and aptitude.
WGU is partnering with J-PAL North America to rigorously evaluate a suite of four adaptive online interventions that support mathematical thinking and reasoning. The interventions aim to help students cultivate a math mindset, offer concrete strategies to reduce math anxiety, and utilize concept mapping to increase comprehension of key math concepts.
Beyond its population of 95,000 online learners, WGU seeks to understand whether these interventions can improve math and overall academic performance at community colleges.
“We are honored to be named partners with J-PAL. CALS is focused on using scientific principles to develop and evaluate ed-tech products that improve student learning,” says Jason Levin, vice president of institutional research at WGU. “J-PAL’s mission of using rigorous science to improve outcomes for disadvantaged groups is perfectly aligned with what we do at CALS. We look forward to learning from the network of experience that J-PAL provides. Together with J-PAL we hope to make a big impact for students.”
J-PAL North America will work with these two organizations to build evidence around how technology can improve learning. Despite rapid innovation and substantial investment in education technology, there is little rigorous research to help decision-makers understand which uses of education technology are truly helping students learn.
J-PAL North America is a regional office of the Abdul Latif Jameel Poverty Action Lab. J-PAL was established in 2003 as a research center at MIT’s Department of Economics within the School of Humanities, Arts, and Social Sciences. Since then, it has built a global network of affiliated professors based at over 50 universities and regional offices in Africa, Europe, Latin America and the Caribbean, North America, South Asia, and Southeast Asia. J-PAL North America was established with support from the Alfred P. Sloan Foundation and the Laura and John Arnold Foundation and works to improve the effectiveness of social programs in North America through three core activities: research, policy outreach, and capacity building. J-PAL North America’s education technology work is supported by the Laura and John Arnold Foundation and the Overdeck Family Foundation. | | 2:00p |
Project to elucidate the structure of atomic nuclei at the femtoscale
The Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science User Facility, has selected 10 data science and machine learning projects for its Aurora Early Science Program (ESP). Set to be the nation’s first exascale system upon its expected 2021 arrival, Aurora will be capable of performing a quintillion calculations per second, making it 10 times more powerful than the fastest computer that currently exists.
The Aurora ESP, which commenced with 10 simulation-based projects in 2017, is designed to prepare key applications, libraries, and infrastructure for the architecture and scale of the exascale supercomputer. Researchers in the Laboratory for Nuclear Science’s Center for Theoretical Physics have been awarded funding for one of the projects under the ESP. Associate professor of physics William Detmold, assistant professor of physics Phiala Shanahan, and principal research scientist Andrew Pochinsky will use new techniques developed by the group, coupling novel machine learning approaches and state-of-the-art nuclear physics tools, to study the structure of nuclei.
Shanahan, who began as an assistant professor at MIT this month, says that the support and early access to frontier computing that the award provides will allow the group, for the first time, to study the possible interactions of dark matter particles with nuclei starting from our fundamental understanding of particle physics. That work will provide critical input for experimental searches aiming to unravel the mysteries of dark matter while simultaneously giving insight into fundamental particle physics.
“Machine learning coupled with the exascale computational power of Aurora will enable spectacular advances in many areas of science,” Detmold adds. “Combining machine learning to lattice quantum chromodynamics calculations of the strong interactions between the fundamental particles that make up protons and nuclei, our project will enable a new level of understanding of the femtoscale world.” | | 2:00p |
Kirigami-inspired technique manipulates light at the nanoscale
Nanokirigami has taken off as a field of research in the last few years; the approach is based on the ancient arts of origami (making 3-D shapes by folding paper) and kirigami (which allows cutting as well as folding) but applied to flat materials at the nanoscale, measured in billionths of a meter.
Now, researchers at MIT and in China have for the first time applied this approach to the creation of nanodevices to manipulate light, potentially opening up new possibilities for research and, ultimately, the creation of new light-based communications, detection, or computational devices.
The findings are described today in the journal Science Advances, in a paper by MIT professor of mechanical engineering Nicholas X. Fang and five others. Using methods based on standard microchip manufacturing technology, Fang and his team used a focused ion beam to make a precise pattern of slits in a metal foil just a few tens of nanometers thick. The process causes the foil to bend and twist itself into a complex three-dimensional shape capable of selectively filtering out light with a particular polarization.

Previous attempts to create functional kirigami devices have used more complicated fabrication methods that require a series of folding steps and have been primarily aimed at mechanical rather than optical functions, Fang says. The new nanodevices, by contrast, can be formed in a single folding step and could be used to perform a number of different optical functions.
For these initial proof-of-concept devices, the team produced a nanomechanical equivalent of specialized dichroic filters that can filter out circularly polarized light that is either “right-handed” or “left-handed.” To do so, they created a pattern just a few hundred nanometers across in the thin metal foil; the result resembles pinwheel blades, with a twist in one direction that selects the corresponding twist of light.
The twisting and bending of the foil happens because of stresses introduced by the same ion beam that slices through the metal. When ion beams with low dosages are used, many vacancies are created, and some of the ions end up lodged in the crystal lattice of the metal, pushing the lattice out of shape and creating strong stresses that induce the bending.
“We cut the material with an ion beam instead of scissors, by writing the focused ion beam across this metal sheet with a prescribed pattern,” Fang says. “So you end up with this metal ribbon that is wrinkling up” in the precisely planned pattern.
“It’s a very nice connection of the two fields, mechanics and optics,” Fang says. The team used helical patterns to separate out the clockwise and counterclockwise polarized portions of a light beam, which may represent “a brand new direction” for nanokirigami research, he says.
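The idea of separating a beam into its two circular components can be illustrated with Jones vectors, a standard way of writing polarization states as pairs of complex amplitudes. The sketch below models an ideal filter only, not the nanokirigami device itself; the function names are hypothetical, and which state is called "left-handed" or "right-handed" depends on the sign convention, so the two states are given neutral labels.

```python
import math

# Jones vectors (Ex, Ey) for the two circular polarization states.
# Assigning "left" or "right" to each is convention-dependent, so the
# names below are deliberately neutral.
S = 1 / math.sqrt(2)
CIRCULAR_A = (S, 1j * S)
CIRCULAR_B = (S, -1j * S)
LINEAR_X = (1, 0)


def inner(u, v):
    """Complex inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def transmitted_fraction(beam, passed_state):
    """Fraction of the beam's power passed by an ideal filter for passed_state."""
    power = inner(beam, beam).real
    return abs(inner(passed_state, beam)) ** 2 / power


same_handedness = transmitted_fraction(CIRCULAR_A, CIRCULAR_A)
opposite_handedness = transmitted_fraction(CIRCULAR_B, CIRCULAR_A)
linear_input = transmitted_fraction(LINEAR_X, CIRCULAR_A)
```

An ideal filter passes all of the matching circular state, none of the opposite one, and half the power of a linearly polarized beam, since linear light is an equal mix of the two circular states.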

The technique is straightforward enough that, with the equations the team developed, researchers should now be able to calculate backward from a desired set of optical characteristics and produce the needed pattern of slits and folds to produce just that effect, Fang says.
“It allows a prediction based on optical functionalities” to create patterns that achieve the desired result, he adds. “Previously, people were always trying to cut by intuition” to create kirigami patterns for a particular desired outcome.
The research is still at an early stage, Fang points out, so more research will be needed on possible applications. But these devices are orders of magnitude smaller than conventional counterparts that perform the same optical functions, so these advances could lead to more complex optical chips for sensing, computation, or communications systems or biomedical devices, the team says.
For example, Fang says, devices to measure glucose levels often use measurements of light polarization, because glucose molecules exist in both right- and left-handed forms, which interact differently with light. “When you pass light through the solution, you can see the concentration of one version of the molecule, as opposed to the mixture of both,” Fang explains, and this method could allow for much smaller, more efficient detectors.
Circular polarization is also a method used to allow multiple laser beams to travel through a fiber-optic cable without interfering with each other. “People have been looking for such a system for laser optical communications systems” to separate the beams in devices called optical isolators, Fang says. “We have shown that it’s possible to make them in nanometer sizes.”
The team also included MIT graduate student Huifeng Du; Zhiguang Liu, Jiafang Li (project supervisor), and Ling Lu at the Chinese Academy of Sciences in Beijing; and Zhi-Yuan Li at the South China University of Technology. The work was supported by the National Key R&D Program of China, the National Natural Science Foundation of China, and the U.S. Air Force Office of Scientific Research. |
|