MIT Research News' Journal

Making better decisions when outcomes are uncertain

Markov decision processes are mathematical models used to determine the best courses of action when both current circumstances and future consequences are uncertain. They’ve had a huge range of applications — in natural-resource management, manufacturing, operations management, robot control, finance, epidemiology, scientific-experiment design, and tennis strategy, just to name a few.

But analyses involving Markov decision processes (MDPs) usually make some simplifying assumptions. In an MDP, a given decision doesn’t always yield a predictable result; it could yield a range of possible results. And each of those results has a different “value,” meaning the chance that it will lead, ultimately, to a desirable outcome.

Characterizing the value of a given decision requires collecting empirical data, which can be prohibitively time-consuming, so analysts usually just make educated guesses. That means, however, that the MDP analysis doesn’t guarantee the best decision in all cases.

In the Proceedings of the Conference on Neural Information Processing Systems, published last month, researchers from MIT and Duke University took a step toward putting MDP analysis on more secure footing. They show that, by adopting a simple trick long known in statistics but little applied in machine learning, it’s possible to accurately characterize the value of a given decision while collecting much less empirical data than had previously seemed necessary.

In their paper, the researchers described a simple example in which the standard approach to characterizing probabilities would require the same decision to be performed almost 4 million times in order to yield a reliable value estimate.

With the researchers’ approach, the decision would need to be performed only 167,000 times. That’s still a big number, except perhaps in the context of a server farm processing millions of web clicks per second, where MDP analysis could help allocate computational resources. In other contexts, the work at least represents a big step in the right direction.

“People are not going to start using something that is so sample-intensive right now,” says Jason Pazis, a postdoc at the MIT Laboratory for Information and Decision Systems and first author on the new paper. “We’ve shown one way to bring the sample complexity down. And hopefully, it’s orthogonal to many other ways, so we can combine them.”

Unpredictable outcomes

In their paper, the researchers also report running simulations of a robot exploring its environment, in which their approach yielded consistently better results than the existing approach, even with more reasonable sample sizes — nine and 105. Pazis emphasizes, however, that the paper’s theoretical results bear only on the number of samples required to estimate values; they don’t prove anything about the relative performance of different algorithms at low sample sizes.

Pazis is joined on the paper by Jonathan How, the Richard Cockburn Maclaurin Professor of Aeronautics and Astronautics at MIT, and by Ronald Parr, a professor of computer science at Duke.

Although the possible outcomes of a decision may be described according to a probability distribution, the expected value of the decision is just the mean, or average, value of all outcomes. In the familiar bell curve of the so-called normal distribution, the mean defines the highest point of the bell.
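To make the idea concrete, here is a minimal Python sketch built around a made-up decision with three possible results. It computes the expected value exactly, as a probability-weighted mean, and then estimates it from simulated samples. That second step mirrors what an analyst must do when the probabilities are unknown and have to be learned from data. The outcome table and the function names are illustrative assumptions, not taken from the paper.

```python
import random
from statistics import mean
from typing import List, Tuple

# Hypothetical decision: each possible result is (probability, value).
OUTCOMES: List[Tuple[float, float]] = [(0.7, 1.0), (0.2, 0.5), (0.1, 0.0)]


def expected_value(outcomes: List[Tuple[float, float]]) -> float:
    """Exact expected value: the probability-weighted mean of the outcome values."""
    return sum(p * v for p, v in outcomes)


def sampled_value(outcomes: List[Tuple[float, float]], n: int, rng: random.Random) -> float:
    """Estimate the expected value by averaging n simulated results of the decision."""
    probs = [p for p, _ in outcomes]
    values = [v for _, v in outcomes]
    return mean(rng.choices(values, weights=probs, k=n))


rng = random.Random(0)
exact = expected_value(OUTCOMES)  # 0.7*1.0 + 0.2*0.5 + 0.1*0.0, i.e. about 0.8
for n in (10, 1_000, 100_000):
    print(n, sampled_value(OUTCOMES, n, rng), "vs exact", exact)
```

With only a handful of samples, the estimate can land noticeably off the exact value. The gap typically shrinks as `n` grows, which is why characterizing values empirically can demand so much data.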

The trick the researchers’ algorithm employs is called the median of means. If you have a bunch of random values, and you’re asked to estimate the mean of the probability distribution they’re drawn from, the natural way to do it is to average them. But if your sample happens to include some rare but extreme outliers, averaging can give a distorted picture of the true distribution. For instance, if you have a sample of the heights of 10 American men, nine of whom cluster around the true mean of 5 feet 10 inches, but one of whom is a 7-foot-2-inch NBA center, straight averaging will yield a mean that’s off by about an inch and a half.
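Here is the arithmetic behind that example. It assumes that the nine typical men are exactly 5 feet 10 inches (70 inches) tall and that the center is 7 feet 2 inches (86 inches):

```latex
\bar{h} = \frac{9 \times 70 + 86}{10} = \frac{716}{10} = 71.6\ \text{in},
\qquad \bar{h} - 70 = 1.6\ \text{in}
```

A single outlier therefore pulls the average up by 1.6 inches, which is roughly the inch and a half cited above.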

With the median of means, you instead divide your sample into subgroups, take the mean (average) of each of those, and then take the median of the results. The median is the value that falls in the middle, if you arrange your values from lowest to highest.
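Below is a minimal Python sketch of the median of means, applied to the height example. It keeps the same assumption: nine men at exactly 70 inches and one at 86. The function name and the choice of five groups of two are illustrative. The groups are formed from consecutive samples, which assumes the samples arrive in random order. The paper's method for choosing the optimal group size is not reproduced here.

```python
from statistics import mean, median
from typing import Sequence


def median_of_means(samples: Sequence[float], num_groups: int) -> float:
    """Split samples into equal consecutive groups, average each, return the median.

    Any leftover samples that do not fill a complete group are ignored.
    """
    if not 1 <= num_groups <= len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")
    size = len(samples) // num_groups
    group_means = [mean(samples[i * size:(i + 1) * size]) for i in range(num_groups)]
    return median(group_means)


# Heights in inches: nine men at 5'10" (70 in) and one at 7'2" (86 in).
heights = [70] * 9 + [86]

print(mean(heights))                # straight average, pulled up by the outlier
print(median_of_means(heights, 5))  # group means 70, 70, 70, 70, 78 -> median 70
```

The outlier contaminates only one group, and the median discards that group's inflated mean. With a single group, the function reduces to the straight average. With one sample per group, it becomes the plain median.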

Value proposition

The goal of MDP analysis is to determine a set of policies — or actions under particular circumstances — that maximize the value of some reward function. In a manufacturing setting, the reward function might measure operational costs against production volume; in robot control, it might measure progress toward the completion of a task.

But a given decision is evaluated according to a much more complex measure called a “value function,” which is a probabilistic estimate of the expected reward from not just that decision but every possible decision that could follow.
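One standard way to compute a value function is value iteration, shown here as a hedged Python sketch. It repeatedly applies the Bellman backup, which scores each action by its expected immediate reward plus the discounted value of every state the action might lead to. The two-state MDP, the discount factor `gamma`, and the function names are hypothetical. The sketch also assumes that the outcome probabilities are known, whereas the setting described in this article is one in which they must be estimated from data.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP: state -> action -> list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes: List[Tuple[float, str, float]], values: Dict[str, float], gamma: float) -> float:
    """Expected immediate reward plus discounted value of wherever the action leads."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(mdp: Transitions, gamma: float = 0.9, tol: float = 1e-10) -> Dict[str, float]:
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        new_values = {
            state: max(q_value(outcomes, values, gamma) for outcomes in actions.values())
            for state, actions in mdp.items()
        }
        if max(abs(new_values[s] - values[s]) for s in mdp) < tol:
            return new_values
        values = new_values


def greedy_policy(mdp: Transitions, values: Dict[str, float], gamma: float = 0.9) -> Dict[str, str]:
    """A policy: in each state, pick the action with the highest backed-up value."""
    return {
        state: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for state, actions in mdp.items()
    }


toy: Transitions = {
    "idle": {
        "wait": [(1.0, "idle", 0.0)],
        "work": [(0.8, "done", 1.0), (0.2, "idle", 0.0)],  # task may fail and need retrying
    },
    "done": {"stay": [(1.0, "done", 0.0)]},
}

values = value_iteration(toy)
policy = greedy_policy(toy, values)
print(values, policy)
```

In this toy setup, waiting earns nothing, while working finishes the task with probability 0.8. The greedy policy therefore chooses `work` in the `idle` state, and that state's value settles near 0.8 / 0.82, or about 0.976.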

The researchers showed that, with straight averaging, the number of samples required to estimate the mean value of a decision is proportional to the square of the range of values that the value function can take on. Since that range can be quite large, so is the number of samples. But with the median of means, the number of samples depends instead on the variance of what’s called the Bellman operator, which is usually much smaller. The researchers also showed how to calculate the optimal size of the subsamples in the median-of-means estimate.
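A standard textbook calculation illustrates why replacing the squared range with a narrower measure of spread can matter so much. This is not the paper's own bound. For a straight average, Hoeffding's inequality gives a sample count that grows with the square of the range. A Chebyshev-plus-median argument gives a median-of-means count that grows with the variance instead. The constants and the example numbers below are assumptions chosen for illustration.

```python
import math
from typing import Tuple


def hoeffding_samples(value_range: float, eps: float, delta: float) -> int:
    """Samples needed for a straight average of values spanning `value_range`
    to land within eps of the mean with probability at least 1 - delta (Hoeffding)."""
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * eps ** 2))


def median_of_means_samples(std_dev: float, eps: float, delta: float) -> Tuple[int, int]:
    """Textbook median-of-means sizing, returned as (num_groups, group_size).

    Chebyshev: each group mean is within eps w.p. >= 3/4 once group_size >= 4*sigma^2/eps^2.
    Hoeffding on group failures: the median misses by eps w.p. <= exp(-num_groups/8).
    """
    num_groups = math.ceil(8 * math.log(1 / delta))
    group_size = math.ceil(4 * std_dev ** 2 / eps ** 2)
    return num_groups, group_size


# Hypothetical setting: values may span a range of 100, but their standard deviation is only 1.
plain = hoeffding_samples(value_range=100.0, eps=0.1, delta=0.05)
groups, size = median_of_means_samples(std_dev=1.0, eps=0.1, delta=0.05)
print("straight average:", plain)
print("median of means:", groups, "groups of", size, "=", groups * size)
```

Here the straight-average count runs into the millions, while the median-of-means count stays under ten thousand. The variance can never exceed a quarter of the squared range, and the median-of-means constants are larger. As a result, the median of means pays off only when the actual spread is much narrower than the worst-case range.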

“The results in the paper, as with most results of this type, still reflect a large degree of pessimism because they deal with a worst-case analysis, where we give a proof of correctness for the hardest possible environment,” says Marc Bellemare, a research scientist at the Google-owned artificial-intelligence company DeepMind. “But that kind of analysis doesn’t need to carry over to applications. I think Jason’s approach, where we allow ourselves to be a little optimistic and say, ‘Let’s hope the world out there isn’t all terrible,’ is almost certainly the right way to think about this problem. I’m expecting this kind of approach to be highly useful in practice.”

The work was supported by the Boeing Company, the U.S. Office of Naval Research, and the National Science Foundation.

Engaged neighbors

MIT faculty, students, and alumni consistently find creative ways to apply their knowledge to local contexts, collaborating with partners around the globe to make the world a better place. Arguably, no MIT exchange is more fertile than the one the School of Architecture and Planning (SA+P) enjoys with Mexico.

Recently, students and scholars from SA+P have worked with local collaborators to plan mixed-use developments around mass transit hubs in Mexico City and to design novel solutions for Baja wineries faced with chronic drought. And here in Cambridge, Massachusetts, Mexican artist and social activist Pedro Reyes spent a semester in residence at the MIT Center for Art, Science, and Technology (CAST), where he explored potential collaborative projects with MIT faculty and taught a course that invited students to consider the human cost of unbridled technology.

The Institute’s synergy with Mexico will be celebrated on March 23 at the MIT Better World campaign event at the Four Seasons Hotel in Mexico City. Part of an ongoing celebration of MIT’s global culture that has already sponsored gatherings in New York, San Francisco, Hong Kong, London, Tel Aviv, and Los Angeles, the Mexico City event will feature economist Pedro Aspe Armella PhD ’78, cellist and author Carlos Prieto ’58, faculty members Miho Mazereeuw and Paulo Lozano SM ’98, PhD ’03, and master’s in city planning candidate Carlos Sainz Caccia, as well as MIT President L. Rafael Reif.

The event will reflect the spirit of the many innovative collaborations between MIT and Mexico, from urban initiatives around housing, transportation, and the environment, to scholarly exchanges on many other topics. Through these engagements, including the recent SA+P activities described below, faculty, students, and alumni are tackling pressing global issues.

Researching transportation in Mexico City

Last spring, P. Christopher Zegras, an associate professor in the Department of Urban Studies and Planning (DUSP), co-taught a graduate practicum in which students traveled to Mexico City to study the potential of linking real estate development to public transportation networks. This development paradigm, known as transit-oriented development, requires designers to engage with multiple stakeholders (government officials, local businesses, residents, transit authorities, and private developers) to help create sustainable mixed-use communities around mass transit stations.

Zegras’s students integrated their classroom knowledge into the realities of Mexico’s economy, geography, and politics, crafting solutions that made sense — and could be implemented — in that particular time, space, and culture.

“The good news is that there are concrete actions that can be taken,” says Onésimo Flores PhD ’13, who co-taught the course as a visiting lecturer at MIT. Flores is now CEO of Conecta Cuatro, a company that promotes innovative technology ventures to tackle urban and transportation problems in Mexico.

The client and sponsor for the practicum was Grupo Prodi, a private developer and operator of multimodal transit stations in Mexico City. “It’s clear that transit-oriented development can be the key to improving quality of life in urban areas not only in Mexico City, but also in cities like Monterrey and Guadalajara,” says José Miguel Bejos, Grupo Prodi CEO.

Designing sustainable agriculture in Baja

Sheila Kennedy, a professor in the Department of Architecture, brought her Architecture Design Core Studio III students to the Valle de Guadalupe in Baja, a wine region severely affected by drought and climate change. “I wanted to place the students at the intersection between places of production and the limits of natural resources,” says Kennedy. “The students needed to have a strategy for their designs, their materials, and where those materials were sourced. They couldn’t just rely on standard sustainable solutions like green roofs.”

After a weeklong visit to Baja, with host and sponsor Bodegas F. Rubio Winery, the students returned to campus to develop novel architectural and site designs that could help wineries survive and thrive in a challenging climate. One student suggested using solar chimneys to take advantage of the sharp nightly drops in temperature, storing cool air and releasing it during the day. Another designed a temporary winery that would exist just as long as there was water and could then be dismantled — with its materials repurposed or recycled — when the water ran out.

Second-year master’s student Anne Graziano came to Baja intending to work on dew collection. But after she saw the morning fog roll in each day from the nearby Pacific Ocean, she returned to MIT to design an undulating series of walls made of MIT’s Fog Harvesting Mesh, a finely woven metal textile that can increase fog collection yields fivefold.

“I love the idea that architecture can create something functional but also communicate a sense of place and create shade and shared spaces,” says Graziano. “Because of the landscape and the culture and the community I encountered there, I was able to imagine a winery whose walls provide the water with which the wine will be made.”

Engaging technology through an artist’s perspective

While working and studying in Mexico has stimulated innovation by MIT scholars, the Institute has also inspired others who call Mexico home. “I don’t think there’s any other place in the world that has such a density of creativity” as MIT, says Reyes, the Mexican artist and activist who is the Dasha Zhukova Distinguished Visiting Artist at CAST. “Every person I meet there is inspiring.” Reyes’ ongoing appointment was the first in a series endowed by a $1 million gift to CAST from Russian-American philanthropist, entrepreneur, and art collector Dasha Zhukova.

Last fall, Reyes and visiting lecturer Carla Fernández, a Mexican fashion designer, taught a course titled “The Reverse Engineering of Warfare: Challenging Techno-Optimism and Reimagining the Defense Sector (an Opera for the End of Times).” Reyes encouraged his students to consider the effects of automation on global employment, reexamine the influence of the military sector on U.S. government policies, and imagine retooling the U.S. military complex to combat global climate change.

“My Latin culture leads me to place more importance on human interaction than on technology,” says Reyes. “I’m more interested in replacing robots with humans than in replacing humans with robots. But that doesn’t make me any less of a tech lover. You can see great potential in technology, but also recognize that technology needs to be applied with care. That critical dialectic is a vital part of MIT culture.”