MIT Research News' Journal
 

Wednesday, December 13th, 2017

    10:00a
    Students launch products that help users harness their superpowers

    Even superheroes need products to enhance their powers. Thor has a hammer. Wonder Woman has the lasso of truth. Batman has his suit. On Monday evening, teams of mechanical engineering students unveiled new products with their own power-extending capabilities.

    The students of 2.009 (Product Engineering Processes) have spent the semester designing and developing product prototypes centered on this year’s theme: “Super!” Products ranged from the fun (a game blind and sighted people can play together) to the life-saving (a real-time system for search and rescue teams) to the life-changing (a wearable device that minimizes the effects of tremors in Parkinson’s patients).

    Months of ideating, modeling, and testing culminated in this week’s final presentation. Eight teams of students presented to a capacity crowd in MIT’s Kresge Auditorium and received a rock-star welcome. Brandishing pompoms in every color of the rainbow, the crowd of 1,200 cheered as 140 students revealed their prototypes.

    The 2.009 final presentation has become a signature annual event at MIT. For the past 22 years, professor of mechanical engineering David Wallace has been at the helm of the class, which serves as a capstone for seniors. He leads a team of more than a hundred dedicated teaching assistants, course instructors, and support staff to ensure students leave the class with an understanding of how products are created and launched.

    He also makes sure students have a lot of fun along the way. That was abundantly clear during Monday’s event, which kicked off with an organ performance of “Despacito” and featured a live band, lots of confetti, and of course, a top-hat-wearing Professor Wallace.

    Throughout the event, each team of students had seven minutes to pitch and demonstrate its product. Teams walked through each product’s unique features as well as their proposed business models. At the conclusion of every presentation, students answered questions from the audience in Kresge Auditorium and on social media.

    The live demonstrations gave each team an opportunity to show off the “superpowers” its product could give users. The silver team demonstrated its wearable thermal danger detection device, FireSense. The yellow team simulated a search and rescue effort on MIT’s campus, providing a live demonstration of its product, Coordinate. Students from the green team invited MIT Office of Government and Community Relations Co-Director Paul Parravano, who is blind, to play their game platform Tatchi, which allows users to play games that are not visually dependent.

    One presentation that deeply moved the audience was Animo, a wearable device that helps manage tremors in people living with Parkinson’s disease. The purple team enlisted the help of local business owner Michael Wackell. In a live demonstration on stage, Wackell traced a spiral shape with a marker, first without the device around his wrist, then again while wearing Animo. His tremors made tracing the spiral incredibly difficult, but with the vibrating device on, he was able to trace the spiral neatly. When asked by his daughter, who was in the audience, how he felt when he used Animo for the first time, Wackell responded, “I felt like I was myself again, like how I felt before my diagnosis.”

    The products designed by this year’s students can help a diverse range of people — from professional musicians to bricklayers and people with hearing loss — unlock their inner superpowers. A brief summary of each product presented at the 2.009 final presentation follows:

    Silver Team: FireSense
    FireSense is a wearable, thermal danger detection device designed to help firefighters determine whether opening a door is safe during a fire. It lights up in green, yellow, or red to indicate the likelihood of thermal danger behind the door. FireSense also vibrates when it detects dangerous conditions to provide redundant nonvisual feedback.

    Red Team: Blink
    Blink is an assistive eyewear system that empowers people with advanced neurodegenerative diseases to control their homes simply by blinking. The wearer blinks to select options from a custom cascading auditory menu. These commands are relayed from the Blink app to a smart home assistant, such as Google Home or Amazon Alexa, to trigger any smart device.

    Green Team: Tatchi
    Tatchi is a strategy game platform that relies not on vision but on a combination of memory, hearing, and touch. It allows users to play games that are not visually dependent and helps promote social interaction between blind and sighted players.

    Pink Team: Volti
    Volti is a hands-free music page-turner that allows musicians to play without interruption. The turner does not require special preloading or setup of the pages, and is an elegant art piece in its own right.

    Blue Team: Robin
    Robin is a discreet wearable device that provides real-time feedback to help those who are hard of hearing adjust their speaking volume. It uses two distinct vibration patterns to notify the user when he or she is speaking too loudly or too softly.

    Yellow Team: Coordinate
    Coordinate is a three-device system that provides real-time feedback for search-and-rescue teams, optimizing a lifesaving operation. The three devices — a search module, lead module, and command module — are designed for three distinct roles within a search-and-rescue mission.

    Purple Team: Animo
    Animo is a wearable device that uses vibration therapy to reduce tremors in Parkinson’s patients. It automatically tunes its vibration pattern to each patient’s needs.

    Orange Team: Rhino
    Rhino is a masonry tool with an impact hammer attachment that allows masons to remove mortar for repointing quickly, accurately, and safely. The product substantially reduces the amount of harmful silica dust generated during the brick-removal process.

    11:59p
    Computer systems predict objects’ responses to physical forces

    Josh Tenenbaum, a professor of brain and cognitive sciences at MIT, directs research on the development of intelligence at the Center for Brains, Minds, and Machines, a multiuniversity, multidisciplinary project based at MIT that seeks to explain and replicate human intelligence.

    Presenting their work at this year’s Conference on Neural Information Processing Systems, Tenenbaum and one of his students, Jiajun Wu, are co-authors on four papers that examine the fundamental cognitive abilities that an intelligent agent requires to navigate the world: discerning distinct objects and inferring how they respond to physical forces.

    By building computer systems that begin to approximate these capacities, the researchers believe they can help answer questions about what information-processing resources human beings use at what stages of development. Along the way, the researchers might also generate some insights useful for robotic vision systems.

    “The common theme here is really learning to perceive physics,” Tenenbaum says. “That starts with seeing the full 3-D shapes of objects, and multiple objects in a scene, along with their physical properties, like mass and friction, then reasoning about how these objects will move over time. Jiajun’s four papers address this whole space. Taken together, we’re starting to be able to build machines that capture more and more of people’s basic understanding of the physical world.”

    Three of the papers deal with inferring information about the physical structure of objects, from both visual and aural data. The fourth deals with predicting how objects will behave on the basis of that data.

    Two-way street

    Something else that unites all four papers is their unusual approach to machine learning, a technique in which computers learn to perform computational tasks by analyzing huge sets of training data. In a typical machine-learning system, the training data are labeled: Human analysts will have, say, identified the objects in a visual scene or transcribed the words of a spoken sentence. The system attempts to learn what features of the data correlate with what labels, and it’s judged on how well it labels previously unseen data.

    In Wu and Tenenbaum’s new papers, the system is trained to infer a physical model of the world — the 3-D shapes of objects that are mostly hidden from view, for instance. But then it works backward, using the model to resynthesize the input data, and its performance is judged on how well the reconstructed data matches the original data.
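
    Conceptually, this reconstruct-and-compare training signal can be pictured as an encoder that summarizes each image into a compact scene description and a decoder that re-renders the image from that summary, with the system scored only on how faithful the re-rendering is. The Python sketch below illustrates the idea in broad strokes; the layer sizes, tensor shapes, and class names are placeholders for illustration, not the researchers’ published architecture.

```python
# Minimal sketch of the reconstruct-and-compare training signal described above.
# The encoder, decoder, and tensor shapes are illustrative placeholders, not the
# authors' actual model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnalysisBySynthesis(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # "Analysis": infer a compact description of the scene from the image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # "Synthesis": resynthesize the input image from that description.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):
        scene_code = self.encoder(image)   # inferred "model of the world"
        return self.decoder(scene_code)    # reconstructed input

model = AnalysisBySynthesis()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.rand(8, 3, 64, 64)          # stand-in batch of input images

# The training signal uses no human labels: the loss is simply how well the
# inferred scene description explains the original image.
optimizer.zero_grad()
loss = F.mse_loss(model(images), images)
loss.backward()
optimizer.step()
```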

    For instance, using visual images to build a 3-D model of an object in a scene requires stripping away any occluding objects; filtering out confounding visual textures, reflections, and shadows; and inferring the shape of unseen surfaces. Once Wu and Tenenbaum’s system has built such a model, however, it rotates it in space and adds visual textures back in until it can approximate the input data.

    Indeed, two of the researchers’ four papers address the complex problem of inferring 3-D models from visual data. On those papers, they’re joined by four other MIT researchers, including William Freeman, the Perkins Professor of Electrical Engineering and Computer Science, and by colleagues at DeepMind, ShanghaiTech University, and Shanghai Jiao Tong University.

    Divide and conquer

    The researchers’ system is based on the influential theories of the MIT neuroscientist David Marr, who died in 1980 at the tragically young age of 35. Marr hypothesized that in interpreting a visual scene, the brain first creates what he called a 2.5-D sketch of the objects it contained — a representation of just those surfaces of the objects facing the viewer. Then, on the basis of the 2.5-D sketch — not the raw visual information about the scene — the brain infers the full, three-dimensional shapes of the objects.

    “Both problems are very hard, but there’s a nice way to disentangle them,” Wu says. “You can do them one at a time, so you don’t have to deal with both of them at the same time, which is even harder.”
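
    In code, that decomposition can be pictured as two separate modules chained together: one maps the image to viewer-facing depth and silhouette maps (a stand-in for Marr’s 2.5-D sketch), and a second maps that sketch to a full 3-D occupancy grid, never looking back at the raw pixels. The layer sizes, voxel resolution, and module names below are assumptions made for illustration, not the published network.

```python
# Illustrative two-stage decomposition in the spirit of Marr's 2.5-D sketch.
# Layer sizes and the voxel resolution are placeholders, not the published network.
import torch
import torch.nn as nn

class ImageTo25D(nn.Module):
    """Stage 1: image -> viewer-facing surfaces (depth + silhouette maps)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # channel 0: depth, channel 1: silhouette
        )

    def forward(self, image):
        return self.net(image)

class SketchTo3D(nn.Module):
    """Stage 2: 2.5-D sketch -> full 3-D occupancy (voxel) grid."""
    def __init__(self, voxels=32):
        super().__init__()
        self.voxels = voxels
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 64 * 64, 1024), nn.ReLU(),
            nn.Linear(1024, voxels ** 3), nn.Sigmoid(),
        )

    def forward(self, sketch):
        occupancy = self.net(sketch)
        return occupancy.view(-1, self.voxels, self.voxels, self.voxels)

image = torch.rand(1, 3, 64, 64)   # a single RGB image
sketch = ImageTo25D()(image)       # solve the first, easier problem...
shape = SketchTo3D()(sketch)       # ...then the second, never touching raw pixels
print(shape.shape)                 # torch.Size([1, 32, 32, 32])
```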

    Wu and his colleagues’ system needs to be trained on data that include both visual images and 3-D models of the objects the images depict. Constructing accurate 3-D models of the objects depicted in real photographs would be prohibitively time consuming, so initially, the researchers train their system using synthetic data, in which the visual image is generated from the 3-D model, rather than vice versa. The process of creating the data is like that of creating a computer-animated film.

    Once the system has been trained on synthetic data, however, it can be fine-tuned using real data. That’s because its ultimate performance criterion is the accuracy with which it reconstructs the input data. It’s still building 3-D models, but they don’t need to be compared to human-constructed models for performance assessment.
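
    Roughly, the recipe has two phases: supervised pretraining on synthetic renders, where the true shape is known by construction, followed by self-supervised fine-tuning on real photographs, judged only on reconstruction. In the sketch below, the model object with its predict_shape and render methods, and the particular loss functions, are hypothetical stand-ins; only the overall recipe comes from the description above.

```python
# Sketch of the two training phases described above: supervised pretraining on
# synthetic renders, then self-supervised fine-tuning on real photographs.
# The `model` object, its predict_shape/render methods, and the loss choices are
# hypothetical stand-ins for illustration.
import torch.nn.functional as F

def pretrain_step(model, synthetic_image, ground_truth_voxels, optimizer):
    """Phase 1: synthetic data, where the true 3-D shape is known by construction."""
    predicted_voxels = model.predict_shape(synthetic_image)
    loss = F.binary_cross_entropy(predicted_voxels, ground_truth_voxels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def finetune_step(model, real_image, optimizer):
    """Phase 2: real photos, judged only on how well the inferred shape re-renders."""
    predicted_voxels = model.predict_shape(real_image)
    rerendered = model.render(predicted_voxels)  # project the shape back to an image
    loss = F.mse_loss(rerendered, real_image)    # no human-made 3-D model required
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```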

    In evaluating their system, the researchers used a measure called intersection over union, which is common in the field. On that measure, their system outperforms its predecessors. But a given intersection-over-union score leaves a lot of room for local variation in the smoothness and shape of a 3-D model. So Wu and his colleagues also conducted a qualitative study of the models’ fidelity to the source images. Of the study’s participants, 74 percent preferred the new system’s reconstructions to those of its predecessors.
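
    Intersection over union itself is simple to compute: it is the volume two shapes share, divided by the total volume they jointly cover, so identical shapes score 1 and non-overlapping shapes score 0. A minimal version for voxelized shapes might look like the following; the grid resolution and threshold are arbitrary choices for the example.

```python
# Intersection over union for voxelized shapes: shared volume divided by total
# covered volume. Grid size and threshold here are arbitrary example choices.
import numpy as np

def voxel_iou(pred, target, threshold=0.5):
    """IoU between a predicted occupancy grid and a reference grid."""
    pred_occ = pred >= threshold
    target_occ = target >= threshold
    intersection = np.logical_and(pred_occ, target_occ).sum()
    union = np.logical_or(pred_occ, target_occ).sum()
    return intersection / union if union > 0 else 1.0

# Example: two 32^3 grids that mostly agree.
a = np.zeros((32, 32, 32))
a[8:24, 8:24, 8:24] = 1
b = np.zeros((32, 32, 32))
b[10:24, 8:24, 8:24] = 1
print(round(voxel_iou(a, b), 3))  # 0.875
```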

    All that fall

    In another of Wu and Tenenbaum’s papers, on which they’re joined again by Freeman and by researchers at MIT, Cambridge University, and ShanghaiTech University, they train a system to analyze audio recordings of an object being dropped, to infer properties such as the object’s shape, its composition, and the height from which it fell. Again, the system is trained to produce an abstract representation of the object, which, in turn, it uses to synthesize the sound the object would make when dropped from a particular height. The system’s performance is judged on the similarity between the synthesized sound and the source sound.
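
    Structurally, this mirrors the visual systems: an encoder maps the recording to a handful of physical properties, and a synthesizer turns those properties back into sound so the match can be scored. The sketch below shows only the inference half, with made-up feature sizes and property heads; it is not the researchers’ model.

```python
# Illustrative encoder from audio features to the physical properties mentioned
# above (shape, material, drop height). Feature sizes and heads are invented.
import torch
import torch.nn as nn

class SoundToProperties(nn.Module):
    def __init__(self, n_shapes=10, n_materials=5, n_features=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.shape_head = nn.Linear(64, n_shapes)        # which object shape
        self.material_head = nn.Linear(64, n_materials)  # what it is made of
        self.height_head = nn.Linear(64, 1)              # how far it fell

    def forward(self, spectrogram_features):
        h = self.backbone(spectrogram_features)
        return self.shape_head(h), self.material_head(h), self.height_head(h)

# In the full pipeline, these inferred properties would drive a sound synthesizer,
# and the system would be judged on how closely the synthesized audio matches the
# original recording rather than against human-provided labels.
shape_logits, material_logits, height = SoundToProperties()(torch.rand(1, 128))
```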

    Finally, in their fourth paper, Wu, Tenenbaum, Freeman, and colleagues at DeepMind and Oxford University describe a system that begins to model humans’ intuitive understanding of the physical forces acting on objects in the world. This paper picks up where the previous papers leave off: It assumes that the system has already deduced objects’ 3-D shapes.

    Those shapes are simple: balls and cubes. The researchers trained their system to perform two tasks. The first is to estimate the velocities of balls traveling on a billiard table and, on that basis, to predict how they will behave after a collision. The second is to analyze a static image of stacked cubes and determine whether they will fall and, if so, where the cubes will land.

    Wu developed a representational language he calls scene XML that can quantitatively characterize the relative positions of objects in a visual scene. The system first learns to describe input data in that language. It then feeds that description to something called a physics engine, which models the physical forces acting on the represented objects. Physics engines are a staple of both computer animation, where they generate the movement of clothing, falling objects, and the like, and scientific computing, where they’re used for large-scale physical simulations.
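
    The physics engine’s role is easiest to see in a toy example: given object states recovered from an image, it simply rolls the scene forward in time under mechanical rules. The stand-in below simulates frictionless balls bouncing off the rails of a table (ball-to-ball collisions are left out for brevity); it is a deliberately simplified illustration, not the engine or scene XML representation used in the papers.

```python
# Toy stand-in for the "physics engine" stage: given object states recovered from
# an image (positions and velocities), roll the scene forward in time. This is a
# deliberately simplified 2-D billiard model without ball-to-ball collisions, not
# the engine or scene representation used in the papers.
import numpy as np

def step_billiards(positions, velocities, dt=0.01, table=(1.0, 0.5)):
    """Advance frictionless balls one time step, bouncing them off the rails."""
    positions = positions + velocities * dt
    for axis, limit in enumerate(table):
        hit_rail = (positions[:, axis] < 0) | (positions[:, axis] > limit)
        velocities[hit_rail, axis] *= -1               # elastic reflection
        positions[:, axis] = np.clip(positions[:, axis], 0, limit)
    return positions, velocities

# Object states as an inferred scene description might supply them: two balls.
pos = np.array([[0.2, 0.25], [0.8, 0.25]])
vel = np.array([[0.5, 0.0], [-0.3, 0.1]])
for _ in range(100):                                   # one simulated second
    pos, vel = step_billiards(pos, vel)
print(pos)                                             # predicted final positions
```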

    After the physics engine has predicted the motions of the balls and boxes, that information is fed to a graphics engine, whose output is, again, compared with the source images. As with the work on visual discrimination, the researchers train their system on synthetic data before refining it with real data.

    In tests, the researchers’ system again outperformed its predecessors. In fact, in some of the tests involving billiard balls, it frequently outperformed human observers as well.

    "The key insight behind their work is utilizing forward physical tools — a renderer, a simulation engine, trained models, sometimes — to train generative models," says Joseph Lim, an assistant professor of computer science at the University of Southern California. "This simple yet elegant idea combined with recent state-of-the-art deep-learning techniques showed great results on multiple tasks related to interpreting the physical world."

