MIT Research News' Journal
 

Tuesday, August 9th, 2016

    12:00a
    Protecting privacy in genomic databases

    Genome-wide association studies, which try to find correlations between particular genetic variations and disease diagnoses, are a staple of modern medical research.

    But because they depend on databases that contain people’s medical histories, they carry privacy risks. An attacker armed with genetic information about someone — from, say, a skin sample — could query a database for that person’s medical data. Even without the skin sample, an attacker who was permitted to make repeated queries, each informed by the results of the last, could, in principle, extract private data from the database.

    In the latest issue of the journal Cell Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Indiana University at Bloomington describe a new system that permits database queries for genome-wide association studies but reduces the chances of privacy compromises to almost zero.

    It does that by adding a little bit of misinformation to the query results it returns. That means that researchers using the system could begin looking for drug targets with slightly inaccurate data. But in most cases, the answers returned by the system will be close enough to be useful.

    And an instantly searchable online database of genetic data, even one that returned slightly inaccurate information, could make biomedical research much more efficient.

    “Right now, what a lot of people do, including the NIH, for a long time, is take all their data — including, often, aggregate data, the statistics we’re interested in protecting — and put them into repositories,” says Sean Simmons, an MIT postdoc in mathematics and first author on the new paper. “And you have to go through a time-consuming process to get access to them.”

    That process involves a raft of paperwork, including explanations of how the research enabled by the repositories will contribute to the public good, which requires careful review. “We’ve waited months to get access to various repositories,” says Bonnie Berger, the Simons Professor of Mathematics at MIT, who was Simmons’s thesis advisor and is the corresponding author on the paper. “Months.”

    Bring the noise

    Genome-wide association studies generally rely on genetic variations called single-nucleotide polymorphisms, or SNPs (pronounced “snips”). A SNP is a variation of one nucleotide, or DNA “letter,” at a specified location in the genome. Millions of SNPs have been identified in the human population, and certain combinations of SNPs can serve as proxies for larger stretches of DNA that tend to be conserved among individuals.

    The new system, which Berger and Simmons developed together with Cenk Sahinalp, a professor of computer science at Indiana University, implements a technique called “differential privacy,” which has been a major area of cryptographic research in recent years. Differential-privacy techniques add a little bit of noise, or random variation, to the results of database searches, to confound algorithms that would seek to extract private information from the results of several tailored, sequential searches.

    The amount of noise required depends on the strength of the privacy guarantee — how low you want to set the likelihood of leaking private information — and the type and volume of data. The more people whose data a SNP database contains, the less noise the system needs to add; essentially, it’s easier to get lost in a crowd. But the more SNPs the system records, the more flexibility an attacker has in constructing privacy-compromising searches, which increases the noise requirements.
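    To make the crowd-versus-flexibility trade-off concrete, here is a minimal sketch of the standard Laplace mechanism applied to a hypothetical release of per-SNP minor-allele frequencies. It is not the authors' code: the function names, the choice of allele frequencies as the released statistic, and the "replace one person" notion of neighboring databases are assumptions made for illustration. The noise scale is the statistic's sensitivity divided by the privacy parameter ε, so it shrinks as the cohort grows and grows with the number of SNPs released.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noise_scale(num_individuals: int, num_snps: int, epsilon: float) -> float:
    """Laplace scale b = sensitivity / epsilon for releasing num_snps
    minor-allele frequencies (a hypothetical release format).

    Replacing one person changes each SNP's minor-allele count by at most 2,
    so each frequency count / (2n) moves by at most 1/n; summed over all
    released SNPs, the L1 sensitivity is num_snps / n.
    """
    sensitivity = num_snps / num_individuals
    return sensitivity / epsilon

def private_frequencies(genotypes: list[list[int]], epsilon: float,
                        rng: random.Random) -> list[float]:
    """genotypes[i][j] is person i's minor-allele count (0, 1, or 2) at SNP j."""
    n, m = len(genotypes), len(genotypes[0])
    b = noise_scale(n, m, epsilon)
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    return [f + laplace_noise(b, rng) for f in freqs]
```

    Doubling the number of individuals halves the noise scale, while doubling the number of released SNPs doubles it; a smaller ε (a stronger guarantee) increases it as well.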

    The researchers considered two types of common queries. In one, the user asks for the statistical correlation between a particular SNP and a particular disease. In the other, the user asks for a list of the SNPs in a particular region of the genome that correlate best with a particular disease.

    In the first case, the system returns a widely used measure of correlation called a p-value. Here, the p-value would be modified — augmented or reduced by some random factor — in order to ensure privacy.
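    One simple way to perturb a p-value while keeping a formal guarantee is to add Laplace noise to the underlying test statistic and then convert the noisy statistic into a p-value; because the conversion only post-processes the noisy number, it does not weaken the guarantee. The sketch below assumes a 1-degree-of-freedom allelic chi-square test on a hypothetical 2x2 table of allele counts, and it takes the statistic's `sensitivity` as a caller-supplied input rather than deriving it (the correct bound depends on the cohort sizes). It illustrates the idea and is not necessarily how the paper perturbs its p-values.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def allelic_chi_square(case_minor: int, case_major: int,
                       ctrl_minor: int, ctrl_major: int) -> float:
    """Pearson chi-square statistic (1 df, no continuity correction)
    for a 2x2 table of allele counts in cases vs. controls."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin means the table carries no association signal
    return total * (a * d - b * c) ** 2 / denom

def p_value_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

def private_p_value(counts: tuple[int, int, int, int], sensitivity: float,
                    epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism on the statistic, followed by post-processing.

    `sensitivity` is an assumed, caller-supplied bound on how much one
    person can change the statistic; it is not derived here.
    """
    stat = allelic_chi_square(*counts)
    noisy = stat + laplace_noise(sensitivity / epsilon, rng)
    # Clamping and converting to a p-value only post-process the noisy value,
    # so they do not weaken the privacy guarantee.
    return p_value_1df(max(noisy, 0.0))
```

    For example, 30 minor and 70 major alleles among cases against 10 and 90 among controls give a true statistic of 12.5; the private version reports a p-value computed from that statistic plus Laplace noise.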

    In the second case, the system has some chance of returning not exactly the top-scoring SNPs in a given region, but several of them plus one or two lower-scoring ones. To calculate the probability that a given SNP will make it into the results, the researchers use a measure called the Hamming distance, which indicates how far away a lower-scoring SNP is from the one that it’s replacing. This turns out to yield more useful results than relying on the p-value. Finding an efficient algorithm for calculating Hamming distances on the fly is one of the system’s chief innovations.
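    The selection step can be illustrated with the exponential mechanism, a standard differential-privacy tool for choosing among candidates. In this sketch each SNP is scored by a precomputed distance (a hypothetical `distances` dictionary), read here as the number of individuals whose data would have to change for the SNP to enter the true top k; SNPs already at the top have distance 0. Computing those distances efficiently is the part the article highlights and is not attempted here. The sketch also assumes the distance has sensitivity 1 and splits the privacy budget evenly across the k picks.

```python
import math
import random

def private_top_k(distances: dict[str, int], k: int, epsilon: float,
                  rng: random.Random) -> list[str]:
    """Select k SNP ids with the exponential mechanism.

    distances[snp] is an assumed, precomputed count of individuals whose
    data would have to change for that SNP to enter the true top k
    (0 for SNPs already in it). Treating that score as having sensitivity 1,
    each round samples a SNP with weight exp(-eps_round * distance / 2).
    """
    remaining = dict(distances)
    eps_round = epsilon / k  # split the budget evenly across the k picks
    chosen: list[str] = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        d_min = min(remaining.values())
        # Shifting by d_min rescales every weight by the same factor (so the
        # sampling distribution is unchanged) and keeps the largest weight at 1.
        weights = [math.exp(-eps_round * (remaining[s] - d_min) / 2) for s in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen
```

    Because weights decay exponentially with distance, SNPs just outside the true top k are occasionally swapped in, while SNPs far from qualifying almost never are; raising ε pushes the output toward the exact top-k list.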

    Ironing out differences

    The system’s other chief innovation is that it corrects for a problem common in population genetics called population stratification. “The standard example is that a particular SNP is closely linked to being lactose intolerant,” Simmons explains. “Let’s say that people in East Asia are more likely to be lactose intolerant than someone in, say, Northern Europe. But also Northern Europeans tend to be taller than people from East Asia. A naive method would suggest that this particular SNP has an effect on height, but it’s really a false correlation.”

    The researchers’ algorithm assumes that the largest variations in a given population are the results of differences between subpopulations, filters those differences out, and homes in on the ones that remain.
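    The filtering idea resembles a standard principal-component correction in the style of EIGENSTRAT: find the axis along which individuals’ genotypes vary most, assume it reflects ancestry, and project it out of both the genotypes and the trait before measuring correlations. The pure-Python sketch below removes only the single top axis, found by power iteration, and runs on a toy two-group population modeled on the lactose example; the function names and the one-component choice are assumptions, not the authors’ implementation.

```python
import math
import random

Matrix = list[list[float]]

def center_columns(rows: Matrix) -> Matrix:
    """Subtract each column's mean so every SNP (column) has mean zero."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]

def top_axis(xc: Matrix, iters: int = 200, seed: int = 0) -> list[float]:
    """Unit vector over individuals along the largest axis of genotype
    variation, found by power iteration on xc @ xc.T."""
    n, m = len(xc), len(xc[0])
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(xc[i][j] * v[i] for i in range(n)) for j in range(m)]  # xc.T @ v
        u = [sum(xc[i][j] * w[j] for j in range(m)) for i in range(n)]  # xc @ w
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break
        v = [x / norm for x in u]
    return v

def remove_axis(vec: list[float], axis: list[float]) -> list[float]:
    """Subtract vec's projection onto a unit-norm axis."""
    dot = sum(a * b for a, b in zip(vec, axis))
    return [a - dot * b for a, b in zip(vec, axis)]

def pearson(x: list[float], y: list[float], tol: float = 1e-9) -> float:
    """Pearson correlation; returns 0.0 when either vector has ~zero spread."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx, dy = [a - mx for a in x], [b - my for b in y]
    sx = math.sqrt(sum(a * a for a in dx))
    sy = math.sqrt(sum(b * b for b in dy))
    if sx < tol or sy < tol:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / (sx * sy)

def adjusted_correlations(genotypes: Matrix, phenotype: list[float]) -> list[float]:
    """Correlate each SNP with the phenotype after removing the top axis
    of genotype variation from both."""
    xc = center_columns(genotypes)
    axis = top_axis(xc)
    y_bar = sum(phenotype) / len(phenotype)
    y_adj = remove_axis([y - y_bar for y in phenotype], axis)
    return [pearson(remove_axis([row[j] for row in xc], axis), y_adj)
            for j in range(len(xc[0]))]

if __name__ == "__main__":
    # Toy data: first four people from one group, last four from another.
    # SNPs 0 and 1 track group membership; SNP 2 varies within groups.
    genotypes = [[2, 2, 0], [2, 2, 1], [2, 2, 0], [2, 2, 1],
                 [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1]]
    height = [1.0, 1.5, 1.0, 1.5, 0.0, 0.5, 0.0, 0.5]
    print("naive SNP 0:", pearson([r[0] for r in genotypes], height))
    print("adjusted:", adjusted_correlations(genotypes, height))
```

    In the toy data, SNP 0 correlates with height only through group membership, while SNP 2 carries a genuine within-group effect; after the top axis is removed, the first correlation disappears and the second survives.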

    “Since Homer’s attack in 2008, the biomedical community has been debating to what extent and to whom genomic and phenotypic databases should be made accessible,” says Jean-Pierre Hubaux, a professor of computer science at the École Polytechnique Fédérale de Lausanne, referring to a paper by Nils Homer, then a graduate student at the University of California at Los Angeles, on determining whether a given person’s genetic data is present in a database. “In parallel, Cynthia Dwork and other computer scientists have developed the concept of differential privacy, the theory of which is now well-understood. The authors of this paper make a crucial contribution, because they provide concrete examples of how differential privacy can be used to protect the privacy of genome-wide association studies in heterogeneous human populations. Hopefully, this will encourage the biomedical community to test this promising approach at large scale and, if it’s successful, define best practices and develop related tools.”

    6:20p
    Faculty at MIT and beyond respond forcefully to an article critical of Suzanne Corkin

    On August 7, 2016, the New York Times Magazine published “The Brain That Couldn’t Remember,” an article adapted from the forthcoming book “Patient H.M.: A Story of Memory, Madness, and Family Secrets,” by Luke Dittrich. The article is highly critical of the late Suzanne Corkin, who was a professor emerita of neuroscience until her death on May 24.

    In response to the article, more than 200 members of the international scientific community — most from outside MIT — have signed a letter in support of Corkin and her research with the amnesic patient Henry Molaison.

    What follows is a statement by James DiCarlo, the Peter de Florez Professor of Neuroscience and head of the Department of Brain and Cognitive Sciences.

    In “The Brain That Couldn’t Remember,” three allegations are made against Professor Suzanne Corkin, who died on May 24. Professors John Gabrieli and Nancy Kanwisher at MIT have examined evidence in relation to each allegation, and, as detailed below, have found significant evidence that contradicts each allegation. In our judgment, the evidence below rebuts each claim.

    1. Allegation that research records were or would be destroyed or shredded.

    We believe that no records were destroyed and, to the contrary, that Professor Corkin worked in her final days to organize and preserve all records. Even as her health failed (she had advanced cancer and was receiving chemotherapy), she instructed her assistant to continue to organize, label, and maintain all records related to Henry Molaison. The records currently remain within our department.

    Assuming that the interview is accurately and fully reported by Mr. Dittrich, we cannot explain why Professor Corkin made the comments reported in the article. This may have been related to tensions between the author and Professor Corkin because she had turned down his request to examine Mr. Molaison’s confidential medical and research records.

    Regardless, the critical point is not what was said in an interview, but rather what actions were actually taken by Professor Corkin. The actions were to preserve the records.

    2. Allegation that the finding of an additional lesion in left orbitofrontal cortex was suppressed.

    The public record is clear that Professor Corkin communicated this discovery of an additional lesion in Mr. Molaison to both scientific and public audiences. This factual evidence contradicts any allegation that the finding was suppressed.

    The original scientific report (Nature Communications, 2014) of the post-mortem examination of Mr. Molaison’s brain included this information in the most prominent and widely read portion of the report, the abstract.

    In addition, Professor Corkin herself disseminated this information in public forums, including a 2014 interview, posted on MIT News and subsequently elsewhere online, in which she said: “We discovered a new lesion in the lateral orbital gyrus of the left frontal lobe. This damage was also visible in the postmortem MRI scans. The etiology of this lesion is presently unknown; future histological studies will clarify the cause and timeframe of this damage. Currently, it is unclear whether this lesion had any consequence for H.M.’s behavior.”

    3. Allegation that there was something inappropriate in the selection of Tom Mooney as Mr. Molaison’s guardian.

    In her book “Permanent Present Tense” (2013), Professor Corkin describes precisely the provenance of Mr. Molaison’s guardianship (page 201).

    Briefly, in 1974 Mr. Molaison and his mother (who was in failing health; his father was deceased) moved in with Lillian Herrick, whose first husband was related to Mr. Molaison’s mother. Mrs. Herrick is described as caring for Mr. Molaison until 1980, when she was diagnosed with advanced cancer, and Mr. Molaison was admitted to a nursing home founded by her brother.

    In 1991, the Probate Court in Windsor Locks, Connecticut, appointed Mrs. Herrick’s son, Tom Mooney, as Mr. Molaison’s conservator. (Mr. Mooney is referred to as “Mr. M” in the book because of his desire for privacy.) This family took an active interest in helping Mr. Molaison and his mother, and was able to help place him in the nursing home that took care of him.

    Mr. Dittrich provides no evidence that anything untoward occurred, and we are not aware of anything untoward in this process. Mr. Dittrich identifies some individuals who were genetically closer to Mr. Molaison than Mrs. Herrick or her son, but it is our understanding that this family took in Mr. Molaison and his mother, and took care of Mr. Molaison for many years. Mr. Mooney was appointed conservator by the local court after a valid legal process, which included providing notice of a hearing and appointment of counsel to Mr. Molaison.

    Journalists are absolutely correct to hold scientists to very high standards. I — and over 200 scientists who have signed a letter to the editor in support of Professor Corkin — believe she more than achieved those high standards. However, the author (and, implicitly, the Times) has failed to do so.

    James J. DiCarlo MD, PhD
    Peter de Florez Professor of Neuroscience
    Head, Department of Brain and Cognitive Sciences
    Investigator, McGovern Institute for Brain Research
    Massachusetts Institute of Technology

