MIT Research News' Journal
 
[Most Recent Entries] [Calendar View]

Thursday, November 1st, 2012

    Time Event
    4:00a
    Predicting what topics will trend on Twitter
    Twitter’s home page features a regularly updated list of topics that are “trending,” meaning that tweets about them have suddenly exploded in volume. A position on the list is highly coveted as a source of free publicity, but the selection of topics is automatic, based on a proprietary algorithm that factors in both the number of tweets and recent increases in that number.

    At the Interdisciplinary Workshop on Information and Decision in Social Networks at MIT in November, Associate Professor Devavrat Shah and his student Stanislav Nikolov will present a new algorithm that can, with 95 percent accuracy, predict which topics will trend an average of an hour and a half before Twitter’s algorithm puts them on the list — and sometimes as much as four or five hours before.

    The algorithm could be of great interest to Twitter, which could charge a premium for ads linked to popular topics, but it also represents a new approach to statistical analysis that could, in theory, apply to any quantity that varies over time: the duration of a bus ride, ticket sales for films, maybe even stock prices.

    Like all machine-learning algorithms, Shah and Nikolov’s needs to be “trained”: it combs through data in a sample set — in this case, data about topics that previously did and did not trend — and tries to find meaningful patterns. What distinguishes it is that it’s nonparametric, meaning that it makes no assumptions about the shape of patterns.

    Let the data decide

    In the standard approach to machine learning, Shah explains, researchers would posit a “model” — a general hypothesis about the shape of the pattern whose specifics need to be inferred. “You’d say, ‘Series of trending things … remain small for some time and then there is a step,’” says Shah, the Jamieson Career Development Associate Professor in the Department of Electrical Engineering and Computer Science. “This is a very simplistic model. Now, based on the data, you try to train for when the jump happens, and how much of a jump happens.

    “The problem with this is, I don’t know that things that trend have a step function,” Shah explains. “There are a thousand things that could happen.” So instead, he says, he and Nikolov “just let the data decide.”

    In particular, their algorithm compares changes over time in the number of tweets about each new topic to the changes over time of every sample in the training set. Samples whose statistics resemble those of the new topic are given more weight in predicting whether the new topic will trend or not. In effect, Shah explains, each sample “votes” on whether the new topic will trend, but some samples’ votes count more than others’. The weighted votes are then combined, giving a probabilistic estimate of the likelihood that the new topic will trend.

    In Shah and Nikolov’s experiments, the training set consisted of data on 200 Twitter topics that did trend and 200 that didn’t. In real time, they set their algorithm loose on live tweets, predicting trending with 95 percent accuracy and a 4 percent false-positive rate.

    Shah predicts, however, that the system’s accuracy will improve as the size of the training set increases. “The training sets are very small,” he says, “but we still get strong results.”

    Keeping pace

    Of course, the larger the training set, the greater the computational cost of executing Shah and Nikolov’s algorithm. Indeed, Shah says, curbing computational complexity is the reason that machine-learning algorithms typically employ parametric models in the first place. “Our computation scales proportionately with the data,” Shah says.

    But on the Web, he adds, computational resources scale with the data, too: As Facebook or Google add customers, they also add servers. So his and Nikolov’s algorithm is designed so that its execution can be split up among separate machines. “It is perfectly suited to the modern computational framework,” Shah says.

    In principle, Shah says, the new algorithm could be applied to any sequence of measurements performed at regular intervals. But the correlation between historical data and future events may not always be as clear cut as in the case of Twitter posts. Filtering out all the noise in the historical data might require such enormous training sets that the problem becomes computationally intractable even for a massively distributed program. But if the right subset of training data can be identified, Shah says, “It will work.”

    “People go to social-media sites to find out what’s happening now,” says Ashish Goel, an associate professor of management science at Stanford University and a member of Twitter’s technical advisory board. “So in that sense, speeding up the process is something that is very useful.” Of the MIT researchers’ nonparametric approach, Goel says, “it’s very creative to use the data itself to find out what trends look like. It’s quite creative and quite timely and hopefully quite useful.”

    EVENT: As part of the MIT News at Noon program, Devavrat Shah will speak about this research at the MIT Museum on Friday, Nov. 9, from 12:10 to 12:50 p.m. Learn more
    6:00p
    Sitting still or going hunting: Which works better?
    For the kinds of animals that are most familiar to us — ones that are big enough to see — it’s a no-brainer: Is it better to sit around and wait for food to come to you, or to move around and find it? Larger animals that opt to sit around aren’t likely to last long.

    But for bacteria out in the ocean, the question is a far more complicated one.

    Oceanographers have long assumed that because turbulence distributes nutrients uniformly through the water, and because the ability of tiny organisms to move around is insignificant compared to this turbulence, there was no reason for such creatures to move at all. Sea-dwelling bacterial life, they believed, should consist just of static feeders.

    That view has now been upended by research conducted by Roman Stocker, an associate professor in MIT’s Department of Civil and Environmental Engineering, and John R. Taylor, a former MIT postdoc who is now a lecturer in applied mathematics and theoretical physics at Cambridge University. It turns out that swimmers and passive feeders each have some advantages — but also pay some costs — in food gathering, depending on how fast the swimmers swim and how strong the turbulence is.

    The results, based on a computer model that for the first time considers nutrient competition by bacteria in a turbulent flow, have just been published in Science.

    Until now, most studies of microorganism behavior have taken place in static environments, such as a test tube or Petri dish, without regard to fluid flow. “Marine bacteria have mostly been studied in isolation from the motion of the seawater they live in,” Stocker says. These new computer simulations, which model both how water flows and how bacteria in that water behave, have now made it possible to study the foraging of bacteria in dynamic environments, similar to the turbulent waters they naturally inhabit.

    “We’re working at the interface between microbiology and fluid dynamics,” Stocker says. The new work has produced the first study of how this turbulent environment affects the choice of bacterial feeding strategy.

    While it’s true that ocean currents and turbulence are orders of magnitude faster than anything that even the most energetic microorganisms can achieve, it turns out that these creatures’ motions are nevertheless very important, Stocker and Taylor found. That’s because the microenvironment a bacterium occupies — the microliters of water surrounding it and representing its foraging ground — is transported en masse in those flows, so even small rates of swimming can make a big difference in accessing food.


    Numerical simulations were used to study the ability of swimming bacteria to consume nutrients dissolved in the seawater. Here, a patch of nutrients is released at the start of the simulation, which quickly becomes distorted by the turbulent motion of the water. Bacteria start out uniformly distributed, but quickly cluster around the nutrient filaments. Colors show the concentration of bacteria in blue, green and yellow, in order of increasing concentration. Red shows high initial concentrations of nutrients.

    Although swimming toward a food source can confer some advantage in the amount of food consumed, it also carries a penalty: Swimming uses energy, requiring more food. “As you swim faster, you do better in terms of food uptake, but you also pay a price,” Stocker says.

    Because of that tradeoff, the researchers found, for any given intensity of turbulence there is an optimal swimming speed; anything faster than this produces diminishing returns.

    “What we did was to quantify the cost-benefit trade-off,” Stocker says. “We considered the pluses and minuses, and found that the swimming costs matter.” In general, the results showed, the optimal swimming speed for marine bacteria is about 60 micrometers per second, which is “in very nice agreement with what’s observed for the actual swimming speed of many marine bacteria,” Stocker adds.

    Turbulence stirs nutritious material through the water in much the same way as it stirs cream into a cup of coffee: First forming filaments, then mixing them away. “Something similar happens — on a tiny scale — when bursts of organic matter enter gently moving water,” Taylor says. “The swirls of organic matter are easily accessed by swimming bacteria, which surround and absorb it.”

    But the swimmers’ advantage is short-lived. In turbulent seawater, after several seconds, nutrients are uniformly distributed — meaning there’s as much available for the stationary microbes as there is for the swimmers.

    Most of this organic matter is derived from phytoplankton, tiny organisms that capture sunlight and convert it into carbohydrates. In fact, phytoplankton in the ocean are responsible for about as much photosynthesis as all of Earth’s terrestrial plant matter combined, so understanding their role in the global carbon cycle is crucially important for understanding everything from the food chains that sustain the world’s fisheries to the complexities of global climate change.

    As part of their analysis, Stocker and Taylor conducted detailed simulations of exactly how a patch of nutrient-rich material, derived from dying phytoplankton or other sources, disperses in turbulent water. What starts out as a compact blob “gets stretched out, stirred, and explodes into a tangled web of filaments,” Stocker says.

    That’s when the swimming advantage is greatest: The filaments provide lots of nutrient-rich zones to swim toward. But then the window of opportunity closes. “Eventually it all gets homogenized,” he says. “Once it’s mixed, swimming makes no more sense”.

    How fast that happens depends on the strength of turbulence. Stocker and Taylor found that very high or very low turbulence provides the least advantage for the swimmers. But for a range in between, they had a substantial edge over their inert relatives.

    The basic understanding of these microscopic processes could have large-scale importance, Stocker says. As the Earth’s climate warms, turbulence might become weaker in the ocean, which could affect which species dominate the competition for nutrients at the base of the food chain. Species composition, in turn, can dictate how much carbon is channeled to organisms higher in the food chain, shaping the productivity of a marine environment.

    Thomas Kiørboe, a professor at the Technical University of Denmark and head of its Centre for Ocean Life, says this paper is “significant and thought-provoking” and “provides a new view on how small-scale physics in the ocean may determine both the behavior of bacteria and the biogeochemistry of the oceans.” These processes, he says, “play a major role in the global carbon budget, and thus have large implications for the global climate.”

    Kiørboe, who was not involved in this research, adds that it “provides new, significant insights by demonstrating that small-scale turbulence — contrary to former beliefs — may increase nutrient uptake and hence, growth” of marine bacteria.

    The research was partly supported by grants from the National Science Foundation.

    << Previous Day 2012/11/01
    [Calendar]
    Next Day >>

MIT Research News   About LJ.Rossia.org