MIT Research News' Journal
 

Monday, August 5th, 2019

    1:30p
    A new way to block unwanted genetic transfer

    We receive half of our genes from each biological parent, so there’s no avoiding inheriting a blend of characteristics from both. Yet, for single-celled organisms like bacteria that reproduce by splitting into two identical cells, injecting variety into the gene pool isn’t so easy. Random mutations add some diversity, but there’s a much faster way for bacteria to reshuffle their genes and confer evolutionary advantages like antibiotic resistance or pathogenicity.

    Known as horizontal gene transfer, this process permits bacteria to pass pieces of DNA to their peers, in some cases allowing those genes to be integrated into the recipient’s genome and passed down to the next generation.

    The Grossman lab in the MIT Department of Biology studies one class of mobile DNA, known as integrative and conjugative elements (ICEs). While ICEs contain genes that can be beneficial to the recipient bacterium, there’s also a catch — receiving a duplicate copy of an ICE is wasteful, and possibly lethal. The biologists recently uncovered a new system by which one particular ICE, ICEBs1, blocks a donor bacterium from delivering a second, potentially deadly copy.

    “Understanding how these elements function and how they're regulated will allow us to determine what drives microbial evolution,” says Alan Grossman, department head and senior author on the study. “These findings not only provide insight into how bacteria block unwanted genetic transfer, but also how we might eventually engineer this system to our own advantage.”

    Former graduate student Monika Avello PhD ’18 and current graduate student Kathleen Davis are co-first authors on the study, which appeared online in Molecular Microbiology on July 30.

    Checks and balances

    Although plasmids are perhaps the best-known mediators of horizontal transfer, ICEs not only outnumber plasmids in most bacterial species, they also come with their own tools to exit the donor, enter the recipient, and integrate themselves into the recipient’s chromosome. Once the donor bacterium makes contact with the recipient, the machinery encoded by the ICE can pump the ICE DNA from one cell to the other through a tiny channel.

    For horizontal transfer to proceed, there are physical barriers to overcome, especially in so-called Gram-positive bacteria, which, despite being less widely studied than their Gram-negative counterparts, boast thicker cell walls. According to Davis, the transfer machinery essentially has to “punch a hole” through the recipient cell. “It’s a rough ride and a waste of energy for the recipient if that cell already contains an ICE with a specific set of genes,” she says.

    Sure, ICEs are “selfish bits of DNA” that persist by spreading themselves as widely as possible, but in order to do so they must not interfere with their host cell’s ability to survive. As Avello explains, ICEs can’t just disseminate their DNA “without certain checks and balances.”

     “There comes a point where this transfer comes at a cost to the bacteria or doesn't make sense for the element,” she says. “This study is beginning to get at the question of when, why, and how ICEs might want to block transfer.”

    The Grossman lab works in the Gram-positive Bacillus subtilis, and had previously discovered two mechanisms by which ICEBs1 could prevent redundant transfer before it becomes lethal. The first, cell-cell signaling, involves the ICE in the recipient cell releasing a chemical cue that prohibits the donor’s transfer machinery from being assembled. The second, immunity, initiates if the duplicate copy is already inside the cell, and prevents the replicate from being integrated into the chromosome.

    However, when the researchers tried eliminating both fail-safes simultaneously, rather than reinstating ICE transfer as they expected, the bacteria still managed to obstruct the duplicate copy. ICEBs1 seemed to have a third blocking strategy, but what might it be?

    The third tactic

    In this most recent study, the researchers identified the mysterious blocking mechanism as a type of “entry exclusion,” whereby the ICE in the recipient cell encodes molecular machinery that physically prevents the second copy from breaching the cell wall. Scientists had observed other mobile genetic elements capable of exclusion, but this was the first time anyone had witnessed this phenomenon for an ICE from Gram-positive bacteria, according to Avello.

    The Grossman lab determined that this exclusion mechanism comes down to two key proteins. Avello identified the first protein, YddJ, which is expressed by ICEBs1 in the recipient bacterium and forms a “protective coating” on the outside of the cell, blocking a second ICE from entering.

    But the biologists still didn’t know which piece of transfer machinery YddJ was blocking, so Davis performed a screen and various genetic manipulations to pinpoint YddJ’s target. YddJ, it turned out, was obstructing another protein called ConG, which likely forms part of the transfer channel between the donor and recipient bacteria. Davis was surprised to find that, while Gram-negative ICEs encode a protein that’s quite similar to ConG, the Gram-negative equivalent of YddJ is markedly different.

    “This just goes to show that you can’t assume the transfer machinery in Gram-positive ICEs like ICEBs1 is the same as that of the well-studied Gram-negative ICEs,” she says.

    The team concluded that ICEBs1 must have three different mechanisms to prevent duplicate transfer: the two they’d previously uncovered plus this new one, exclusion.

    Cell-cell signaling allows a cell to spread the word to its neighbors that it already has a copy of ICEBs1, so there’s no need to bother assembling the transfer machinery. If this fails, exclusion kicks in to physically block the transfer machinery from penetrating the recipient cell. If that proves unsuccessful and the second copy enters the recipient, immunity will initiate and prevent the second copy from being integrated into the recipient’s chromosome.

    “Each mechanism acts at a different step, because none of them alone are 100 percent effective,” Grossman says. “That’s why it’s helpful to have multiple mechanisms.”
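
    To see why layering pays off, suppose (purely for illustration; the study reports no such numbers) that each mechanism on its own blocked 90 percent of redundant transfer events. Acting independently, in sequence, the three together would let through only 0.1 × 0.1 × 0.1, or about 0.1 percent.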

    They don’t know all the details of this transfer machinery just yet, he adds, but they do know that YddJ and ConG are key players.

    “This initial description of the ICEBs1 exclusion system represents the first report that provides mechanistic insights into exclusion in Gram-positive bacteria, and one of only a few mechanistic studies of exclusion in any conjugation system,” says Gary Dunny, a professor of microbiology and immunology at the University of Minnesota who was not involved in the study. “This work is significant medically because ICEs can carry ‘cargo’ genes such as those conferring antibiotic resistance, and it is also of importance to our basic understanding of horizontal gene transfer systems and how they evolve.”

    As researchers continue to probe this blocking mechanism, it might be possible to leverage ICE exclusion to design bacteria with specific functions. For instance, they could engineer the gut microbiome and introduce beneficial genes to help with digestion. Or, one day, they could perhaps block horizontal gene transfer to combat antibiotic resistance.

    “We had suspected that Gram-positive ICEs might be capable of exclusion, but we didn’t have proof before this,” Avello says. Now, researchers can start to speculate about how pathogenic Gram-positive species might control the movement of ICEs throughout a bacterial population, with possible ramifications for disease research.

    This work was funded by research and predoctoral training grants from the National Institute of General Medical Sciences of the National Institutes of Health.

    11:59p
    Automating artificial intelligence for medical decision-making

    MIT computer scientists are hoping to accelerate the use of artificial intelligence to improve medical decision-making, by automating a key step that’s usually done by hand — and that’s becoming more laborious as certain datasets grow ever-larger.

    The field of predictive analytics holds increasing promise for helping clinicians diagnose and treat patients. Machine-learning models can be trained to find patterns in patient data to aid in sepsis care, design safer chemotherapy regimens, and predict a patient’s risk of having breast cancer or dying in the ICU, to name just a few examples.

    Typically, training datasets consist of many sick and healthy subjects, but with relatively little data for each subject. Experts must then find just those aspects — or “features” — in the datasets that will be important for making predictions.

    This “feature engineering” can be a laborious and expensive process. But it’s becoming even more challenging with the rise of wearable sensors, because researchers can more easily monitor patients’ biometrics over long periods, tracking sleeping patterns, gait, and voice activity, for example. After only a week’s worth of monitoring, experts could have several billion data samples for each subject.  
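
    For a rough sense of scale (the sampling rate here is an assumption, not a figure from the study): an acoustic accelerometer sampling at about 10 kilohertz produces 10,000 samples per second × 604,800 seconds in a week, or roughly 6 billion samples per subject.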

    In a paper being presented at the Machine Learning for Healthcare conference this week, MIT researchers demonstrate a model that automatically learns features predictive of vocal cord disorders. The features come from a dataset of about 100 subjects, each with about a week’s worth of voice-monitoring data and several billion samples — in other words, a small number of subjects and a large amount of data per subject. The dataset contains signals captured from a small accelerometer sensor mounted on subjects’ necks.

    In experiments, the model used features automatically extracted from these data to classify, with high accuracy, patients with and without vocal cord nodules. These are lesions that develop in the larynx, often because of patterns of voice misuse such as belting out songs or yelling. Importantly, the model accomplished this task without a large set of hand-labeled data.

    “It’s becoming increasingly easy to collect long time-series datasets. But you have physicians that need to apply their knowledge to labeling the dataset,” says lead author Jose Javier Gonzalez Ortiz, a PhD student in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). “We want to remove that manual part for the experts and offload all feature engineering to a machine-learning model.”

    The model can be adapted to learn patterns of any disease or condition. But the ability to detect the daily voice-usage patterns associated with vocal cord nodules is an important step in developing improved methods to prevent, diagnose, and treat the disorder, the researchers say. That could include designing new ways to identify and alert people to potentially damaging vocal behaviors.

    Joining Gonzalez Ortiz on the paper is John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering and head of CSAIL’s Data Driven Inference Group; Robert Hillman, Jarrad Van Stan, and Daryush Mehta, all of Massachusetts General Hospital’s Center for Laryngeal Surgery and Voice Rehabilitation; and Marzyeh Ghassemi, an assistant professor of computer science and medicine at the University of Toronto.

    Forced feature-learning

    For years, the MIT researchers have worked with the Center for Laryngeal Surgery and Voice Rehabilitation to develop and analyze data from a sensor that tracks subjects’ voice usage during all waking hours. The sensor is an accelerometer mounted on a node that sticks to the neck and is connected to a smartphone. As the person talks, the smartphone gathers data from the displacements in the accelerometer.

    In their work, the researchers collected a week’s worth of this data — called “time-series” data — from 104 subjects, half of whom were diagnosed with vocal cord nodules. For each patient, there was also a matching control, meaning a healthy subject of similar age, sex, occupation, and other factors.

    Traditionally, experts would need to manually identify features that may be useful for a model to detect various diseases or conditions. That helps prevent a common machine-learning problem in health care: overfitting. That’s when, in training, a model “memorizes” subject data instead of learning just the clinically relevant features. In testing, those models often fail to discern similar patterns in previously unseen subjects.

    “Instead of learning features that are clinically significant, a model sees patterns and says, ‘This is Sarah, and I know Sarah is healthy, and this is Peter, who has a vocal cord nodule.’ So, it’s just memorizing patterns of subjects. Then, when it sees data from Andrew, who has a new vocal usage pattern, it can’t figure out if those patterns match a classification,” Gonzalez Ortiz says.

    The main challenge, then, was preventing overfitting while automating manual feature engineering. To that end, the researchers forced the model to learn features without subject information. For their task, that meant capturing all moments when subjects speak and the intensity of their voices.

    As their model crawls through a subject’s data, it’s programmed to locate voicing segments, which comprise only roughly 10 percent of the data. For each of these voicing windows, the model computes a spectrogram, a visual representation of the spectrum of frequencies varying over time, which is often used for speech processing tasks. The spectrograms are then stored as large matrices of thousands of values.
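
    A minimal sketch of this stage follows (Python; the one-second frame length, the energy-percentile voicing detector, and all function names are illustrative assumptions, not the paper's method):

        import numpy as np
        from scipy.signal import spectrogram

        def voiced_spectrograms(acc_signal, fs, frame_s=1.0, energy_pct=90):
            """Keep only high-energy (voiced) frames of a neck-accelerometer
            signal and return one spectrogram matrix per frame."""
            frame_len = int(frame_s * fs)
            n_frames = len(acc_signal) // frame_len
            frames = acc_signal[:n_frames * frame_len].reshape(n_frames, frame_len)

            # Crude voicing detector (an assumption): frames above the 90th
            # percentile of RMS energy are treated as speech, which keeps
            # roughly 10 percent of the data.
            rms = np.sqrt((frames ** 2).mean(axis=1))
            voiced = frames[rms > np.percentile(rms, energy_pct)]

            # scipy returns (frequencies, times, Sxx); keep the matrix Sxx,
            # a frequencies-by-time grid of spectral power.
            return [spectrogram(f, fs=fs, nperseg=256)[2] for f in voiced]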

    But those matrices are huge and difficult to process. So, an autoencoder — a neural network optimized to generate efficient data encodings from large amounts of data — first compresses the spectrogram into an encoding of 30 values. It then decompresses that encoding into a separate spectrogram.  

    Basically, the model must ensure that the decompressed spectrogram closely resembles the original spectrogram input. In doing so, it’s forced to learn the compressed representation of every spectrogram segment input over each subject’s entire time-series data. The compressed representations are the features that help train machine-learning models to make predictions.  
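
    That description maps onto a standard reconstruction-trained autoencoder. Here is a minimal sketch (PyTorch; the layer widths, flattened input size, and training snippet are assumptions for illustration, and only the 30-value bottleneck comes from the article):

        import torch
        from torch import nn

        class SpectrogramAutoencoder(nn.Module):
            def __init__(self, n_inputs):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(n_inputs, 512), nn.ReLU(),
                    nn.Linear(512, 30),           # the 30-value encoding
                )
                self.decoder = nn.Sequential(
                    nn.Linear(30, 512), nn.ReLU(),
                    nn.Linear(512, n_inputs),     # reconstructed spectrogram
                )

            def forward(self, x):
                code = self.encoder(x)            # compressed features
                return self.decoder(code), code

        # Training pushes the reconstruction toward the input, so the 30
        # values must capture each spectrogram's salient structure.
        model = SpectrogramAutoencoder(n_inputs=129 * 40)  # flattened size (assumed)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        batch = torch.randn(8, 129 * 40)          # stand-in spectrogram batch
        recon, code = model(batch)
        loss = nn.functional.mse_loss(recon, batch)
        opt.zero_grad()
        loss.backward()
        opt.step()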

    Mapping normal and abnormal features

    In training, the model learns to map those features to “patients” or “controls.” Patients will have more abnormal voicing patterns than will controls. In testing on previously unseen subjects, the model similarly condenses all spectrogram segments into a reduced set of features. Then, it’s majority rules: If the subject has mostly abnormal voicing segments, they’re classified as patients; if they have mostly normal ones, they’re classified as controls.
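
    The voting step itself is simple. A minimal sketch (Python; the per-segment abnormality scores, names, and 0.5 cutoffs are assumed for illustration):

        import numpy as np

        def classify_subject(segment_scores, abnormal_cutoff=0.5):
            """Majority vote over one subject's voicing segments.
            segment_scores: per-segment abnormality scores in [0, 1],
            produced by a classifier over the 30-value features."""
            abnormal = np.asarray(segment_scores) > abnormal_cutoff
            return "patient" if abnormal.mean() > 0.5 else "control"

        # Example: 7 of 10 segments score as abnormal -> "patient".
        print(classify_subject([0.9, 0.8, 0.7, 0.6, 0.9, 0.8, 0.7, 0.2, 0.1, 0.3]))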

    In experiments, the model performed as accurately as state-of-the-art models that require manual feature engineering. Importantly, the researchers’ model performed accurately in both training and testing, indicating it’s learning clinically relevant patterns from the data, not subject-specific information.

    Next, the researchers want to monitor how various treatments — such as surgery and vocal therapy — impact vocal behavior. If patients’ behaviors move from abnormal to normal over time, they’re most likely improving. They also hope to use a similar technique on electrocardiogram data, which is used to track muscular functions of the heart.

