MIT Research News' Journal
 

Monday, June 20th, 2016

    12:00a
    Parallel programming made easy

    Computer chips have stopped getting faster. For the past 10 years, chips’ performance improvements have come from the addition of processing units known as cores.

    In theory, a program on a 64-core machine would be 64 times as fast as it would be on a single-core machine. But it rarely works out that way. Most computer programs are sequential, and splitting them up so that chunks of them can run in parallel causes all kinds of complications.

    In the May/June issue of the Institute of Electrical and Electronics Engineers’ journal Micro, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new chip design they call Swarm, which should make parallel programs not only much more efficient but easier to write, too.

    In simulations, the researchers compared Swarm versions of six common algorithms with the best existing parallel versions, which had been individually engineered by seasoned software developers. The Swarm versions were between three and 18 times as fast, but they generally required only one-tenth as much code — or even less. And in one case, Swarm achieved a 75-fold speedup on a program that computer scientists had so far failed to parallelize.

    “Multicore systems are really hard to program,” says Daniel Sanchez, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science, who led the project. “You have to explicitly divide the work that you’re doing into tasks, and then you need to enforce some synchronization between tasks accessing shared data. What this architecture does, essentially, is to remove all sorts of explicit synchronization, to make parallel programming much easier. There’s an especially hard set of applications that have resisted parallelization for many, many years, and those are the kinds of applications we’ve focused on in this paper.”

    Many of those applications involve the exploration of what computer scientists call graphs. A graph consists of nodes, typically depicted as circles, and edges, typically depicted as line segments connecting the nodes. Frequently, the edges have associated numbers called “weights,” which might represent, say, the strength of correlations between data points in a data set, or the distances between cities.

    Graphs crop up in a wide range of computer science problems, but their most intuitive use may be to describe geographic relationships. Indeed, one of the algorithms that the CSAIL researchers evaluated is the standard algorithm for finding the fastest driving route between two points.

    Setting priorities

    In principle, exploring graphs would seem to be something that could be parallelized: Different cores could analyze different regions of a graph or different paths through the graph at the same time. The problem is that with most graph-exploring algorithms, it gradually becomes clear that whole regions of the graph are irrelevant to the problem at hand. If, right off the bat, cores are tasked with exploring those regions, their exertions end up being fruitless.

    Of course, fruitless analysis of irrelevant regions is a problem for sequential graph-exploring algorithms, too, not just parallel ones. So computer scientists have developed a host of application-specific techniques for prioritizing graph exploration. An algorithm might begin by exploring just those paths whose edges have the lowest weights, for instance, or it might look first at those nodes with the lowest number of edges.
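    Exploring lowest-weight edges first is exactly what Dijkstra’s shortest-path algorithm does, using a priority queue to pick the next node to expand. A minimal sketch in Python (the graph and its weights are made up for illustration):

```python
import heapq

def dijkstra(graph, source):
    """Explore a weighted graph in priority order: always expand the
    node with the smallest tentative distance from the source."""
    dist = {source: 0}
    queue = [(0, source)]          # (priority, node), lowest first
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue               # stale entry; a shorter path already won
        for neighbor, weight in graph[node].items():
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(queue, (new_d, neighbor))
    return dist

# Edge weights might represent driving distances between cities.
graph = {
    "A": {"B": 5, "C": 2},
    "B": {"A": 5, "D": 3},
    "C": {"A": 2, "B": 1, "D": 7},
    "D": {},
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 2, 'D': 6}
```

    Swarm, in effect, moves this kind of priority queue into hardware and lets many cores draw tasks from it at once.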

    What distinguishes Swarm from other multicore chips is that it has extra circuitry for handling that type of prioritization. It time-stamps tasks according to their priorities and begins working on the highest-priority tasks in parallel. Higher-priority tasks may engender their own lower-priority tasks, but Swarm slots those into its queue of tasks automatically.

    Occasionally, tasks running in parallel may come into conflict. For instance, a task with a lower priority may write data to a particular memory location before a higher-priority task has read the same location. In those cases, Swarm automatically backs out the results of the lower-priority tasks. It thus maintains the synchronization between cores accessing the same data that programmers previously had to worry about themselves.
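    In software terms, the rollback rule can be sketched as follows (the names and data structure here are invented for illustration; Swarm does this in dedicated hardware): each memory location remembers the time stamp of its last writer, and a read by an earlier-stamped task flags any later-stamped writer for rollback.

```python
class SpeculativeStore:
    """Toy sketch of timestamp-ordered conflict detection. Earlier time
    stamps mean higher priority."""
    def __init__(self):
        self.last_writer = {}    # address -> time stamp of last write

    def write(self, address, ts):
        self.last_writer[address] = ts

    def read(self, address, ts):
        writer_ts = self.last_writer.get(address)
        # A later-stamped (lower-priority) task wrote here before this
        # earlier-stamped task read it: the writer must be backed out.
        if writer_ts is not None and writer_ts > ts:
            return ("abort_writer", writer_ts)
        return ("ok", writer_ts)

store = SpeculativeStore()
store.write(0x100, ts=5)                     # lower-priority task writes first
assert store.read(0x100, ts=2) == ("abort_writer", 5)   # conflict detected
assert store.read(0x200, ts=2) == ("ok", None)          # untouched location
```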

    Indeed, from the programmer’s perspective, using Swarm is pretty painless. When the programmer defines a function, he or she simply adds a line of code that loads the function into Swarm’s queue of tasks. The programmer does have to specify the metric — such as edge weight or number of edges — that the program uses to prioritize tasks, but that would be necessary, anyway. Usually, adapting an existing sequential algorithm to Swarm requires the addition of only a few lines of code.

    Keeping tabs

    The hard work falls to the chip itself, which Sanchez designed in collaboration with Mark Jeffrey and Suvinay Subramanian, both MIT graduate students in electrical engineering and computer science; Cong Yan, who did her master’s as a member of Sanchez’s group and is now a PhD student at the University of Washington; and Joel Emer, a professor of the practice in MIT’s Department of Electrical Engineering and Computer Science, and a senior distinguished research scientist at the chip manufacturer Nvidia.

    The Swarm chip has extra circuitry to store and manage its queue of tasks. It also has a circuit that records the memory addresses of all the data its cores are currently working on. That circuit implements something called a Bloom filter, which crams data into a fixed allotment of space and answers yes/no questions about its contents. If too many addresses are loaded into the filter, it will occasionally yield false positives — indicating “yes, I’m storing that address” — but it will never yield false negatives.
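    The no-false-negatives guarantee is easy to see in a toy software Bloom filter (an illustrative sketch, not Swarm’s hardware circuit; the bit-array size and hash scheme are arbitrary choices):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: a fixed-size bit array plus k hash positions.
    Membership queries may return false positives, never false negatives."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions from a single strong hash of the item.
        digest = hashlib.sha256(str(item).encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i : 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add(0x7F001A20)                     # record a memory address
assert bf.might_contain(0x7F001A20)    # no false negatives
```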

    The Bloom filter is one of several circuits that help Swarm identify memory access conflicts. The researchers were able to show that time-stamping makes synchronization between cores easier to enforce. For instance, each data item is labeled with the time stamp of the last task that updated it, so tasks with later time stamps know they can read that data without bothering to determine who else is using it.

    Finally, all the cores occasionally report the time stamps of the highest-priority tasks they’re still executing. If a core has finished tasks that have earlier time stamps than any of those reported by its fellows, it knows it can write its results to memory without courting any conflicts.
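    This commit rule can be sketched in a few lines (a software illustration with invented names; Swarm computes it in hardware): a finished task is safe to commit once its time stamp precedes every time stamp still executing on any core.

```python
def committable(finished_timestamps, in_flight_per_core):
    """Return the finished tasks that are safe to commit: those whose
    time stamps precede every task still executing on any core."""
    # The earliest time stamp any core is still working on.
    horizon = min(in_flight_per_core)
    return [ts for ts in finished_timestamps if ts < horizon]

# Each core reports the earliest time stamp it is still executing.
in_flight = [17, 9, 23, 14]
finished = [3, 8, 12, 20]
print(committable(finished, in_flight))  # [3, 8]
```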

    “I think their architecture has just the right aspects of past work on transactional memory and thread-level speculation,” says Luis Ceze, an associate professor of computer science and engineering at the University of Washington. “‘Transactional memory’ refers to a mechanism to make sure that multiple processors working in parallel don’t step on each other’s toes. It guarantees that updates to shared memory locations occur in an orderly way. Thread-level speculation is a related technique that uses transactional-memory ideas for parallelization: Do it without being sure the task is parallel, and if it’s not, undo and re-execute serially. Sanchez’s architecture uses many good pieces of those ideas and technologies in a creative way.”

    1:00p
    Winds of change?

    China has an opportunity to massively increase its use of wind power — if it properly integrates wind into its existing power system, according to a newly published MIT study.

    The study forecasts that wind power could provide 26 percent of China’s projected electricity demand by 2030, up from 3 percent in 2015. Such a change would be a substantial gain in the global transition to renewable energy, since China produces the most total greenhouse gas emissions of any country in the world.

    But the projection comes with a catch. China should not necessarily build more wind power in its windiest areas, the study finds. Instead, it should build more wind turbines in areas where they can be more easily integrated into the operations of its existing electricity grid.

    “Wind that is built in distant, resource-rich areas benefits from more favorable physical properties but suffers from existing constraints on the operation of the power system,” states Valerie Karplus, an assistant professor at the MIT Sloan School of Management, director of the Tsinghua-MIT China Energy and Climate Project, and a member of the MIT Energy Initiative. Those constraints include greater transmission costs and the cost of “curtailment,” when available wind power is not used.

    The paper, “Integrating wind into China’s coal-heavy electricity system,” is appearing in Nature Energy. In addition to Karplus, the authors are Michael R. Davidson, a graduate student in MIT’s Joint Program on the Science and Policy of Global Change and the MIT Institute for Data, Systems, and Society; Da Zhang, a postdoc in MIT’s Joint Program on the Science and Policy of Global Change; and Weiming Xiong and Xiliang Zhang of Tsinghua University. Karplus and Zhang are the corresponding authors of the paper, and lead an MIT-Tsinghua collaboration focused on managing energy and climate change in China.

    Co-existing with coal

    While China has invested heavily in renewable energy sources in recent years, more investment in the sector will be needed if the country is to meet its pledge of having 20 percent of its energy consumption come from non-fossil fuel sources by the year 2030, as part of the Paris climate agreement of 2015.

    While several previous studies have evaluated China’s wind-energy potential based on the country’s natural environment, the MIT study is the first to examine how wind energy could expand based on simulations of China’s power system operations.

    When operational constraints are considered, the MIT team found, China may only be able to use 10 percent of the physical potential for wind power cited in their analysis and other studies. Nevertheless, even harnessing that 10 percent would be enough for wind power to provide the study’s estimated 26 percent of electricity by 2030.

    A key challenge the study identifies is integrating wind power into a system that has traditionally been geared toward consumption of coal. Wind power, being intermittent, currently requires flexibility in the operation of the electricity system to ensure wind can be used when it is available.

    That, in turn, requires flexibility in the delivery of electricity from coal-fired power plants, which accounted for over 70 percent of electricity generated in China in 2015. However, China has regulations determining high minimum output levels for many coal-powered electricity plants, to ensure the profitability of those plants. Reducing these requirements and creating more flexible generation schedules for coal would create more space for wind power.

    “Renewable energy plays a central role in China’s efforts to address climate change and local air quality,” Da Zhang explains. “China plans to substantially increase the amount of wind electricity capacity in the future, but its utilization — and ultimately its contribution to these environmental goals — depends on whether or not integration challenges can be solved.”

    New policies possible?

    As the researchers see it, new policies can help create the conditions for increased use of wind power — but may be difficult to implement. As Davidson notes, “establishing regulatory structures and policy incentives to capture these benefits will be difficult in China because of legacy institutions.”

    And as Karplus adds, current regulations have been designed to ensure profitability for power producers, rather than making them compete to lower costs. “Existing policies prioritize sharing benefits equally among participants rather than facing strict price competition,” she says. “As electricity demand growth has slowed in recent years, the limited size of the pie means sharper conflicts between wind and coal.”

    To be sure, as Karplus notes, government planners in China have been experimenting with energy markets that do not rely strictly on the quota system for coal power, instead encouraging competition for long-term contracts to deliver coal-based electricity while creating additional markets for flexible operation.

    Such market mechanisms could prove beneficial to renewable energy sources, principally wind and solar power. As Karplus concludes: “Our work shows the value of continuing these reforms, including introducing markets and relaxing the administrative constraints … for China's ability to utilize its present and future wind capacity to the fullest.”

    At MIT, the research was funded by a consortium of founding sponsors of the MIT-Tsinghua China Energy and Climate Project, supported through the MIT Energy Initiative: Eni, the French Development Agency (AFD), ICF, and Shell. At Tsinghua University, researchers received separate support from government and industry sources. The MIT-Tsinghua China Energy and Climate Project is part of the MIT Joint Program on the Science and Policy of Global Change.

    4:00p
    Analog computing returns

    A transistor, conceived of in digital terms, has two states: on and off, which can represent the 1s and 0s of binary arithmetic.

    But in analog terms, the transistor has an infinite number of states, which could, in principle, represent an infinite range of mathematical values. Digital computing, for all its advantages, leaves most of transistors’ informational capacity on the table.

    In recent years, analog computers have proven to be much more efficient at simulating biological systems than digital computers. But existing analog computers have to be programmed by hand, a complex process that would be prohibitively time consuming for large-scale simulations.

    Last week, at the Association for Computing Machinery’s conference on Programming Language Design and Implementation, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory and Dartmouth College presented a new compiler for analog computers, a program that translates between high-level instructions written in a language intelligible to humans and the low-level specifications of circuit connections in an analog computer.

    The work could help pave the way to highly efficient, highly accurate analog simulations of entire organs, if not organisms.

    “At some point, I just got tired of the old digital hardware platform,” says Martin Rinard, an MIT professor of electrical engineering and computer science and a co-author on the paper describing the new compiler. “The digital hardware platform has been very heavily optimized for the current set of applications. I want to go off and fundamentally change things and see where I can get.”

    The first author on the paper is Sara Achour, a graduate student in electrical engineering and computer science, advised by Rinard. They’re joined by Rahul Sarpeshkar, the Thomas E. Kurtz Professor and professor of engineering, physics, and microbiology and immunology at Dartmouth.

    Sarpeshkar, a former MIT professor and currently a visiting scientist at the Research Lab of Electronics, has long studied the use of analog circuits to simulate cells. “I happened to run into Rahul at a party, and he told me about this platform he had,” Rinard says. “And it seemed like a very exciting new platform.”

    Continuities

    The researchers’ compiler takes as input differential equations, which biologists frequently use to describe cell dynamics, and translates them into voltages and current flows across an analog chip. In principle, it works with any programmable analog device for which it has a detailed technical specification, but in their experiments, the researchers used the specifications for an analog chip that Sarpeshkar developed.

    The researchers tested their compiler on five sets of differential equations commonly used in biological research. On the simplest test set, with only four equations, the compiler took less than a minute to produce an analog implementation; with the most complicated, with 75 differential equations, it took close to an hour. But designing an implementation by hand would have taken much longer.

    Differential equations are equations that include both mathematical functions and their derivatives, which describe the rate at which the function’s output values change. As such, differential equations are ideally suited to describing chemical reactions in the cell, since the rate at which two chemicals react is a function of their concentrations.

    According to the laws of physics, the voltages and currents across an analog circuit need to balance out. If those voltages and currents encode variables in a set of differential equations, then varying one will automatically vary the others. If the equations describe changes in chemical concentration over time, then varying the inputs over time yields a complete solution to the full set of equations.

    A digital circuit, by contrast, needs to slice time into thousands or even millions of tiny intervals and solve the full set of equations for each of them. And each transistor in the circuit can represent only one of two values, instead of a continuous range of values. “With a few transistors, cytomorphic analog circuits can solve complicated differential equations — including the effects of noise — that would take millions of digital transistors and millions of digital clock cycles,” Sarpeshkar says.
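    That time-slicing is easy to see in a forward-Euler integration of a generic mass-action reaction, in which two species are consumed as they combine (a textbook illustration, not one of the researchers’ benchmark systems):

```python
def simulate_reaction(a0, b0, k, dt, steps):
    """Forward-Euler integration of the mass-action system
        dA/dt = dB/dt = -k * A * B
    A digital solver must advance the state one small time slice
    `dt` at a time, re-evaluating the rate at every step."""
    a, b = a0, b0
    for _ in range(steps):
        rate = k * a * b          # reaction rate depends on concentrations
        a -= rate * dt
        b -= rate * dt
    return a, b

a, b = simulate_reaction(a0=1.0, b0=0.5, k=2.0, dt=1e-4, steps=10_000)
# Both concentrations only decrease, and the difference a - b is conserved
# because the two species are consumed in equal amounts.
assert 0 < b < a < 1.0 and abs((a - b) - 0.5) < 1e-9
```

    An analog circuit encoding the same equations needs no such stepping; the physics of the circuit evolves the solution continuously.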

    Mapmaking

    From the specification of a circuit, the researchers’ compiler determines what basic computational operations are available to it; Sarpeshkar’s chip includes circuits that are already optimized for types of differential equations that recur frequently in models of cells.

    The compiler includes an algebraic engine that can redescribe an input equation in terms that make it easier to compile. To take a simple example, the expressions a(x + y) and ax + ay are algebraically equivalent, but one might prove much more straightforward than the other to represent within a particular circuit layout.
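    The equivalence is purely algebraic, but the two forms demand different resources: a(x + y) takes one multiplication, while ax + ay takes two. A quick numerical check:

```python
# Two algebraically equivalent forms of the same expression. They compute
# identical values, but a compiler may prefer one over the other:
# a*(x + y) needs one multiplier, a*x + a*y needs two.
def factored(a, x, y):
    return a * (x + y)

def distributed(a, x, y):
    return a * x + a * y

for args in [(2.0, 3.0, 4.0), (0.5, -1.0, 7.0)]:
    assert factored(*args) == distributed(*args)
```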

    Once it has a promising algebraic redescription of a set of differential equations, the compiler begins mapping elements of the equations onto circuit elements. Sometimes, when it’s trying to construct circuits that solve multiple equations simultaneously, it will run into snags and will need to backtrack and try alternative mappings.

    But in the researchers’ experiments, the compiler took between 14 and 40 seconds per equation to produce workable mappings, which suggests that it’s not getting hung up on fruitless hypotheses.

    “‘Digital’ is almost synonymous with ‘computer’ today, but that’s actually kind of a shame,” says Adrian Sampson, an assistant professor of computer science at Cornell University. “Everybody knows that analog hardware can be incredibly efficient — if we could use it productively. This paper is the most promising compiler work I can remember that could let mere mortals program analog computers. The clever thing they did is to target a kind of problem where analog computing is already known to be a good match — biological simulations — and build a compiler specialized for that case. I hope Sara, Rahul, and Martin keep pushing in this direction, to bring the untapped efficiency potential of analog components to more kinds of computing.”
