MIT Research News' Journal
 

Thursday, September 20th, 2018

    12:00a
    Reducing false positives in credit card fraud detection

    Have you ever used your credit card at a new store or location only to have it declined? Has a sale ever been blocked because you charged a higher amount than usual?

    Consumers’ credit cards are declined surprisingly often in legitimate transactions. One cause is that fraud-detecting technologies used by a consumer’s bank have incorrectly flagged the sale as suspicious. Now MIT researchers have employed a new machine-learning technique to drastically reduce these false positives, saving banks money and easing customer frustration.

    Using machine learning to detect financial fraud dates back to the early 1990s and has advanced over the years. Researchers train models to extract behavioral patterns from past transactions, called “features,” that signal fraud. When you swipe your card, the card pings the model and, if the features match fraud behavior, the sale gets blocked.

    Behind the scenes, however, data scientists must dream up those features, which mostly center on blanket rules for amount and location. If any given customer spends more than, say, $2,000 on one purchase, or makes numerous purchases in the same day, they may be flagged. But because consumer spending habits vary, even within individual accounts, these models are sometimes inaccurate: A 2015 report from Javelin Strategy and Research estimates that only one in five fraud predictions is correct and that the errors can cost banks $118 billion in lost revenue, as declined customers then refrain from using that credit card.
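
    For illustration, a hand-written blanket rule of the kind described above might look like the sketch below; the thresholds and field names are hypothetical, not taken from any bank's system.

        # Illustrative sketch of a hand-written "blanket" fraud rule.
        # Thresholds and field names are hypothetical.
        def flag_transaction(amount, purchases_today, home_country, purchase_country):
            """Return True if the transaction should be flagged as suspicious."""
            if amount > 2000:                     # unusually large single purchase
                return True
            if purchases_today > 10:              # unusually many purchases in one day
                return True
            if purchase_country != home_country:  # purchase outside the card's home country
                return True
            return False

        # A $2,500 purchase made at home gets flagged purely on amount, even if
        # spending that much is perfectly normal for this particular customer.
        print(flag_transaction(2500, 2, "US", "US"))  # True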

    The MIT researchers have developed an “automated feature engineering” approach that extracts more than 200 detailed features for each individual transaction — say, whether a user was physically present for a purchase, or the average amount spent on certain days at certain vendors. By doing so, it can better pinpoint when a specific card holder’s spending habits deviate from the norm.

    Tested on a dataset of 1.8 million transactions from a large bank, the model reduced false positive predictions by 54 percent over traditional models, which the researchers estimate could have saved the bank 190,000 euros (around $220,000) in lost revenue.

    “The big challenge in this industry is false positives,” says Kalyan Veeramachaneni, a principal research scientist at MIT’s Laboratory for Information and Decision Systems (LIDS) and co-author of a paper describing the model, which was presented at the recent European Conference on Machine Learning. “We can say there’s a direct connection between feature engineering and [reducing] false positives. … That’s the most impactful thing to improve accuracy of these machine-learning models.”

    The paper’s co-authors are lead author Roy Wedge, a former researcher in the Data to AI Lab at LIDS; James Max Kanter ’15, SM ’15; and Santiago Moral Rubio and Sergio Iglesias Perez of Banco Bilbao Vizcaya Argentaria.

    Extracting “deep” features

    Three years ago, Veeramachaneni and Kanter developed Deep Feature Synthesis (DFS), an automated approach that extracts highly detailed features from any data, and decided to apply it to financial transactions.

    Enterprises will sometimes host competitions where they provide a limited dataset along with a prediction problem such as fraud. Data scientists develop prediction models, and a cash prize goes to the most accurate model. The researchers entered one such competition and achieved top scores with DFS.

    However, they realized the approach could reach its full potential if trained on several sources of raw data. “If you look at what data companies release, it’s a tiny sliver of what they actually have,” Veeramachaneni says. “Our question was, ‘How do we take this approach to actual businesses?’”

    Backed by the Defense Advanced Research Projects Agency’s Data-Driven Discovery of Models program, Kanter and his team at FeatureLabs — a spinout commercializing the technology — developed an open-source library for automated feature extraction, called Featuretools, which was used in this research.

    The researchers obtained a three-year dataset provided by an international bank, which included granular information about transaction amount, times, locations, vendor types, and terminals used. It contained about 900 million transactions from around 7 million individual cards. Of those transactions, around 122,000 were confirmed as fraud. The researchers trained and tested their model on subsets of that data.

    In training, the model looks for patterns across transactions and cards that match cases of fraud. It then automatically combines all the different variables it finds into “deep” features that provide a highly detailed look at each transaction. From the dataset, the DFS model extracted 237 features for each transaction. Those represent highly customized variables for card holders, Veeramachaneni says. “Say, on Friday, it’s usual for a customer to spend $5 or $15 at Starbucks,” he says. “That variable will look like, ‘How much money was spent in a coffee shop on a Friday morning?’”
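
    As a rough illustration of what computing one such feature looks like — not the Featuretools pipeline used in the paper — the pandas sketch below derives a per-card “coffee shop on a Friday morning” average from raw transactions; the column names are hypothetical.

        # Illustrative sketch: deriving one "deep" feature per card holder with pandas.
        # Column names (card_id, amount, timestamp, vendor_type) are hypothetical;
        # the paper's system generates features like this automatically.
        import pandas as pd

        transactions = pd.DataFrame({
            "card_id":     [1, 1, 1, 2],
            "amount":      [4.75, 5.25, 60.00, 12.00],
            "timestamp":   pd.to_datetime(["2018-09-07 08:10", "2018-09-14 08:05",
                                           "2018-09-14 19:30", "2018-09-14 08:20"]),
            "vendor_type": ["coffee_shop", "coffee_shop", "restaurant", "coffee_shop"],
        })

        # "Average amount spent in a coffee shop on a Friday morning," per card.
        friday_morning_coffee = (
            (transactions["vendor_type"] == "coffee_shop")
            & (transactions["timestamp"].dt.dayofweek == 4)   # Friday
            & (transactions["timestamp"].dt.hour < 12)        # morning
        )
        feature = transactions[friday_morning_coffee].groupby("card_id")["amount"].mean()
        print(feature)  # card 1 -> 5.00, card 2 -> 12.00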

    It then creates, for each account, an if/then decision tree over those features, with branches that do and don’t point to fraud. When a new transaction is run through the decision tree, the model decides in real time whether or not the transaction is fraudulent.
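
    A minimal sketch of that last step, using a generic off-the-shelf decision tree rather than the model from the paper, with made-up feature values:

        # Minimal sketch: train a decision tree on extracted features, then score
        # a new transaction in real time. This is a generic scikit-learn tree with
        # made-up data, not the bank's or the paper's production model.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        # Each row: deep features for one past transaction (e.g., deviation from the
        # card's usual coffee-shop spend, distance in miles from the last purchase).
        X_train = np.array([[0.1, 1.2], [0.2, 0.8], [5.0, 300.0], [4.5, 250.0]])
        y_train = np.array([0, 0, 1, 1])  # 0 = legitimate, 1 = fraud

        tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

        new_transaction = np.array([[4.8, 280.0]])
        print(tree.predict(new_transaction))  # [1] -> block the sale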

    Pitted against a traditional model used by a bank, the DFS model generated around 133,000 false positives versus 289,000 — about 54 percent fewer incidents. That, along with fewer false negatives — actual fraud that went undetected — could save the bank an estimated 190,000 euros, the researchers calculate.
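
    For reference, the 54 percent figure follows directly from those two counts:

        # Relative reduction in false positives from the comparison above.
        baseline, dfs = 289_000, 133_000
        print((baseline - dfs) / baseline)  # ~0.54, i.e., about 54 percent fewer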

    Stacking primitives

    The backbone of the model consists of creatively stacked “primitives,” simple functions that take two inputs and give an output. For example, calculating an average of two numbers is one primitive. That can be combined with a primitive that looks at the time stamp of two transactions to get an average time between transactions. Stacking another primitive that calculates the distance between two addresses from those transactions gives an average time between two purchases at two specific locations. Another primitive could determine if the purchase was made on a weekday or weekend, and so on.
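
    A sketch of the stacking idea, with hypothetical stand-ins for the primitives described above (these are not the library’s own functions):

        # Illustrative sketch of stacking simple "primitives" into a richer feature.
        # These functions are hypothetical stand-ins, not the library's primitives.
        from datetime import datetime

        def mean(a, b):
            """Primitive: average of two numbers."""
            return (a + b) / 2.0

        def hours_between(t1, t2):
            """Primitive: elapsed time between two transaction time stamps, in hours."""
            return abs((t2 - t1).total_seconds()) / 3600.0

        def is_weekend(t):
            """Primitive: was the purchase made on a weekend?"""
            return t.weekday() >= 5

        # Stacked feature: average time between purchases at two specific locations,
        # built by feeding the time-difference primitive into the averaging primitive.
        t1, t2 = datetime(2018, 9, 14, 8, 5), datetime(2018, 9, 14, 9, 40)
        t3, t4 = datetime(2018, 9, 15, 8, 10), datetime(2018, 9, 15, 10, 0)
        avg_gap_hours = mean(hours_between(t1, t2), hours_between(t3, t4))
        print(round(avg_gap_hours, 2), is_weekend(t3))  # 1.71 hours; second pair falls on a Saturday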

    “Once we have those primitives, there is no stopping us for stacking them … and you start to see these interesting variables you didn’t think of before. If you dig deep into the algorithm, primitives are the secret sauce,” Veeramachaneni says.

    One important feature the model generates, Veeramachaneni notes, combines the distance between two consecutive purchase locations with whether each purchase was made in person or remotely. If someone buys something in person at, say, the Stata Center and, a half hour later, buys something else in person 200 miles away, the probability of fraud is high. But if one of the purchases was made through a mobile phone, the fraud probability drops.
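
    A hedged sketch of such a check; the threshold and function names are hypothetical:

        # Illustrative check combining distance with the purchase channel.
        # The 300 mph threshold and the function name are hypothetical.
        def implausible_travel(distance_miles, minutes_apart, both_in_person):
            """Flag two purchases that one person could not plausibly have made."""
            if not both_in_person:
                return False                       # a remote purchase requires no travel
            implied_speed_mph = distance_miles / (minutes_apart / 60.0)
            return implied_speed_mph > 300

        print(implausible_travel(200, 30, True))   # True  -> strong fraud signal
        print(implausible_travel(200, 30, False))  # False -> one purchase was remote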

    “There are so many features you can extract that characterize behaviors you see in past data that relate to fraud or nonfraud use cases,” Veeramachaneni says.

    12:00p
    A game changer takes on cricket’s statistical problem

    Jehangir Amjad has done something few people can: He found a way to combine his favorite sport with his work. A longtime cricket enthusiast and player, he’s currently tackling an important statistical problem in the game — how to declare a winner when a match must end prematurely, due to weather or other circumstances. Given cricket’s global popularity, and the fact that matches can last for several hours, it’s a problem of great interest to fans and players alike.

    For Amjad, it’s also a project that incorporates his passion for operations research. And the Laboratory for Information and Decision Systems (LIDS) was the perfect place for him to explore it.

    Amjad took a circuitous path to MIT. Born and raised in Pakistan, he received a scholarship to complete his last two years of high school at the Red Cross Nordic United World College in Norway. Along with the school’s 200 other students, who came from over 100 countries, he studied, made personal and professional connections, and learned how to live with people of many different cultures during his time there. He then returned home to teach for a year (following in the footsteps of his parents, who are both professors), before attending Princeton University for a bachelor's in electrical engineering.

    He graduated in 2010 and, assuming he was finished with school, went to Microsoft to be a product manager. After several years there, though, he felt restless. Having found himself increasingly drawn to data science and machine learning since starting at Microsoft, he says he figured he could either stay in the tech industry and learn more about these fields on the job, or “go back to school to master the mathematical nuances of this field.” He chose academics and came to MIT in 2013 as a graduate student in the Operations Research Center. There, he collaborated frequently with LIDS students and researchers, under the supervision of MIT Professor Devavrat Shah.

    Because Shah is also a cricket fan, he and Amjad had been discussing the cricket problem for years, although Amjad didn’t land on his research project immediately. In fact, the theory that he is now applying to the cricket problem — robust synthetic control — is mostly used in economics, health policy, and political science. But because all of his work is interdisciplinary, he was able to see how to connect the method to the game. “A lot of what we train on [at LIDS] is the methods, but the applications are and should be very diverse,” Amjad says.

    The current standard for international cricket games is the Duckworth-Lewis-Stern (DLS) method, created by British statisticians in the mid-1990s, which determines the winner when a game has to be called early. Amjad views this as a forecasting problem.

    “We aren’t just interested in predicting what the final score would be; we actually project out the entire trajectory for every ball, we project out what might happen on average,” he says.

    In collaboration with Shah and Vishal Misra, a professor of computer science at Columbia University, Amjad has used the robust synthetic control method to propose a solution to the forecasting problem, which has also led to a target-revision algorithm akin to the Duckworth-Lewis-Stern method. Having back-tested their cricket results on many games, they are confident in the approach. They are currently comparing it to DLS, he says, and planning “what statistical argument we can make so that we can hopefully convince people that we have a viable alternative.”

    Broadly, synthetic control is a statistical method for evaluating the effects of an intervention. In many cases, the intervention is the introduction of a new law or regulation.

    “Let’s say that 10 years ago, Massachusetts introduced a new labor law, and you wanted to study the impact of that law,” Amjad explains. “This theory says you can use a data-driven approach to come up with a synthetic Massachusetts, one that mimics Massachusetts as well as possible before the law was in place, so that you can then project what would have happened in Massachusetts had this law not been introduced.”

    This creates a useful comparison point to the real Massachusetts, where the law has been in place. Placing the two side-by-side — the synthetic Massachusetts data and the real Massachusetts data — gives a sense of the law’s impact.

    Amjad and his collaborators have developed a robust generalization of the classical method, known as Robust Synthetic Control. When a problem is examined this way, limited and missing data do not become insurmountable obstacles. Instead, these sorts of difficulties can be accommodated, which is especially useful in the social sciences, where there may not be many common data points available.

    Continuing his example, he says, “the method is about using data about other states … to construct a synthetic unit. So, specifically, coming up with a synthetic Massachusetts that ends up being 20 percent like New York, 10 percent Wyoming, 5 percent something else — coming up with a weighted average of those. And those weights are essentially what is known as the synthetic control because now you’ve fixed those weights and you’re going to project that out into the future to say, ‘This is what would have happened had the law not been introduced.’”
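
    A minimal sketch of how such weights can be fit — classical synthetic control, with made-up data and without the robustness step (de-noising and handling missing entries) that Amjad’s variant adds:

        # Minimal sketch of classical synthetic control: find non-negative weights
        # over "donor" states, summing to one, that best reproduce the treated
        # state's pre-intervention trajectory. Data are made up; the robust variant
        # additionally de-noises the data and handles missing entries.
        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(0)
        donors_pre = rng.normal(size=(10, 3))        # 10 pre-law periods, 3 donor states
        true_w = np.array([0.6, 0.3, 0.1])
        treated_pre = donors_pre @ true_w + rng.normal(scale=0.01, size=10)

        def pre_period_error(w):
            return np.sum((treated_pre - donors_pre @ w) ** 2)

        result = minimize(
            pre_period_error,
            x0=np.full(3, 1 / 3),
            bounds=[(0, None)] * 3,
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
            method="SLSQP",
        )
        weights = result.x
        print(np.round(weights, 2))                  # roughly [0.6, 0.3, 0.1]

        # Fix the weights and project them onto post-law donor data to obtain the
        # "synthetic" counterfactual for the treated state.
        donors_post = rng.normal(size=(5, 3))
        synthetic_post = donors_post @ weights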

    Eventually, as research continues and more data become available to add to the synthetic unit, the accuracy of the results should improve, he says.

    Amjad has used robust synthetic control in this more traditional way, as well. One of his other projects has been a collaboration with a team at the University of Washington on a study of alcohol and marijuana use to assess whether various laws have, over time, affected their sale and use. Another example he mentions as being a particularly good fit is any situation where a randomized control trial isn’t possible, such as studying the effect of distributing international aid in a crisis. Here, the moral and ethical implications of denying certain people aid make it impossible to use a randomized trial. Instead, observational studies are in order.

    “You [the researcher] can’t control who gets the treatment and who doesn’t,” he says, but the results of it can be watched, recorded, and studied. As his work evolves, he’s also looking towards the future, thinking about time series forecasting and imputation.

    “My work has converged on imputation and forecasting methods, whether it’s synthetic control or just pure time-series analysis,” he says.

    This intersection is an emerging field of study. Econometricians have historically used small data sets and classical statistics to solve problems, but modern machine learning offers methods that instead use large amounts of data to do approximate inference. Combining the two approaches makes it possible to explore both the why of a problem and the prediction itself.

    “You care both about the explanatory power and the predictive power, using these algorithms,” Amjad says. “These are designed for a larger scale, where you can still be prescriptive as well as predictive.” Elections forecasting is just one important example of the areas in which this work could be put to use.

    Having defended his thesis earlier this year, Amjad is now a lecturer in machine learning at MIT’s Computer Science and Artificial Intelligence Laboratory. He says he is grateful for his time at LIDS — and all of the inspirational individuals he’s met and the groundbreaking ideas he’s come across here.

    “The biggest lesson of my PhD is that it’s a journey,” he says. “LIDS is very accepting of you breaking the norm. They let people wander. And what that really helps you with is to understand that you can deal with ambiguity. If there is a problem that I don’t know about, I may never be able to completely solve it, but that won’t prevent me from thinking about it in a systematic way to hope to solve some parts of it.”

    12:40p
    Recognizing the partially seen

    When we open our eyes in the morning and take in that first scene of the day, we don’t give much thought to the fact that our brain is processing the objects within our field of view with great efficiency and that it is compensating for a lack of information about our surroundings — all in order to allow us to go about our daily functions. The glass of water you left on the nightstand when preparing for bed is now partially blocked from your line of sight by your alarm clock, yet you know that it is a glass.

    This seemingly simple human ability to recognize partially occluded objects — occlusion here meaning one object in 3-D space blocking another from view — has been a complicated problem for the computer vision community. Martin Schrimpf, a graduate student in the DiCarlo lab in the Department of Brain and Cognitive Sciences at MIT, explains that machines have become increasingly adept at recognizing whole items quickly and confidently, but when something covers part of an item from view, it becomes much harder for the models to recognize it accurately.

    “For models from computer vision to function in everyday life, they need to be able to digest occluded objects just as well as whole ones — after all, when you look around, most objects are partially hidden behind another object,” says Schrimpf, co-author of a paper on the subject that was recently published in the Proceedings of the National Academy of Sciences (PNAS).

    In the new study, he says, “we dug into the underlying computations in the brain and then used our findings to build computational models. By recapitulating visual processing in the human brain, we are thus hoping to also improve models in computer vision.”

    How are we as humans able to do this everyday task repeatedly, without putting much thought or energy into it, identifying whole scenes quickly and accurately after ingesting just pieces? Researchers in the study started with the human visual cortex as a model for how to improve the performance of machines in this setting, says Gabriel Kreiman, an affiliate of the MIT Center for Brains, Minds, and Machines. Kreiman is a professor of ophthalmology at Boston Children’s Hospital and Harvard Medical School and was lead principal investigator for the study.

    In their paper, "Recurrent computations for visual pattern completion," the team showed how they developed a computational model, inspired by physiological and anatomical constraints, that was able to capture the behavioral and neurophysiological observations during pattern completion. In the end, the model provided useful insights towards understanding how to make inferences from minimal information.
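
    As a loose illustration of the general idea — a recurrent loop letting a feedforward recognizer refine its internal state over several time steps when the input is incomplete — the toy sketch below should not be read as the architecture from the paper:

        # Toy sketch of adding recurrence on top of a feedforward recognizer so the
        # network can iteratively "complete" its representation of an occluded input.
        # This is an illustration of the general idea, not the model from the paper.
        import torch
        import torch.nn as nn

        class RecurrentRecognizer(nn.Module):
            def __init__(self, n_features=128, n_classes=10, steps=4):
                super().__init__()
                self.encoder = nn.Sequential(                 # feedforward stage
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                    nn.Linear(16 * 4 * 4, n_features),
                )
                self.recurrent = nn.Linear(n_features, n_features)  # feedback loop
                self.readout = nn.Linear(n_features, n_classes)
                self.steps = steps

            def forward(self, x):
                h = torch.relu(self.encoder(x))
                for _ in range(self.steps):                   # iterative refinement
                    h = torch.relu(self.recurrent(h) + h)
                return self.readout(h)

        model = RecurrentRecognizer()
        occluded_images = torch.randn(8, 1, 28, 28)           # stand-in for occluded inputs
        print(model(occluded_images).shape)                   # torch.Size([8, 10])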

    Work for this study was conducted at the Center for Brains, Minds and Machines within the McGovern Institute for Brain Research at MIT.

    2:06p
    Plug-and-play technology automates chemical synthesis

    Designing a new chemical synthesis can be a laborious process with a fair amount of drudgery involved — mixing chemicals, measuring temperatures, analyzing the results, then starting over again if it doesn’t work out.

    MIT researchers have now developed an automated chemical synthesis system that can take over many of the more tedious aspects of chemical experimentation, freeing up chemists to spend more time on the more analytical and creative aspects of their research.

    “Our goal was to create an easy-to-use system that would allow scientists to come up with the best conditions for making their molecules of interest — a general chemical synthesis platform with as much flexibility as possible,” says Timothy F. Jamison, head of MIT’s Department of Chemistry and one of the leaders of the research team.

    This system could cut the amount of time required to optimize a new reaction, from weeks or months down to a single day, the researchers say. They have patented the technology and hope that it will be widely used in both academic and industrial chemistry labs.

    “When we set out to do this, we wanted it to be something that was generally usable in the lab and not too expensive,” says Klavs F. Jensen, the Warren K. Lewis Professor of Chemical Engineering at MIT, who co-led the research team. “We wanted to develop technology that would make it much easier for chemists to develop new reactions.”

    Former MIT postdoc Anne-Catherine Bédard and former MIT research associate Andrea Adamo are the lead authors of the paper, which appears in the Sept. 20 online edition of Science.

    Going with the flow

    The new system makes use of a type of chemical synthesis known as continuous flow. With this approach, the chemical reagents flow through a series of tubes, and new chemicals can be added at different points. Other processes such as separation can also occur as the chemicals flow through the system.

    In contrast, traditional “batch chemistry” requires performing each step separately, and human intervention is required to move the reagents along to the next step.

    A few years ago, Jensen and Jamison developed a continuous flow system that can rapidly produce pharmaceuticals on demand. They then turned their attention to smaller-scale systems that could be used in research labs, in hopes of eliminating much of the repetitive manual experimentation needed to develop a new process to synthesize a particular molecule.

    To achieve that, the team designed a plug-and-play system with several different modules that can be combined to perform different types of synthesis. Each module is about the size of a large cell phone and can be plugged into a port, just as computer components can be connected via USB ports. Some of the modules perform specific reactions, such as those catalyzed by light or by a solid catalyst, while others separate out the desired products. In the current system, five of these components can be connected at once.

    The person using the machine comes up with a plan for how to synthesize a desired molecule and then plugs in the necessary modules. The user then tells the machine what reaction conditions (temperature, concentration of reagents, flow rate, etc.) to start with. For the next day or so, the machine uses a general optimization program to explore different conditions and ultimately to determine which conditions generate the highest yield of the desired product.
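
    A hedged sketch of what such a loop might look like in software; the module-facing function and the plain random search below are hypothetical stand-ins, since the article does not detail the actual optimization program:

        # Hedged sketch of an automated condition-screening loop. The function
        # run_reaction_and_measure_yield() is a hypothetical stand-in for the hardware
        # interface, and plain random search stands in for the system's optimizer.
        import random

        def run_reaction_and_measure_yield(temperature_c, concentration_m, flow_rate_ml_min):
            """Hypothetical placeholder: run one flow experiment, return yield in percent."""
            # In the real system this would drive the plugged-in modules and analyze
            # the product stream; here it is just a made-up response surface.
            return max(0.0, 90
                            - 0.05 * (temperature_c - 70) ** 2
                            - 40 * (concentration_m - 0.5) ** 2
                            - 2 * (flow_rate_ml_min - 1.0) ** 2)

        best = {"yield": -1.0}
        for _ in range(200):                                  # roughly a day of unattended runs
            conditions = {
                "temperature_c":    random.uniform(25, 120),
                "concentration_m":  random.uniform(0.1, 1.0),
                "flow_rate_ml_min": random.uniform(0.2, 2.0),
            }
            measured = run_reaction_and_measure_yield(**conditions)
            if measured > best["yield"]:
                best = {"yield": measured, **conditions}

        print(best)  # the conditions that gave the highest observed yield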

    Meanwhile, instead of manually mixing chemicals together and then isolating and testing the products, the researcher can go off to do something else.

    “While the optimizations are being performed, the users could be talking to their colleagues about other ideas, they could be working on manuscripts, or they could be analyzing data from previous runs. In other words, doing the more human aspects of research,” Jamison says.

    Rapid testing

    In the new study, the researchers created about 50 different organic compounds, and they believe the technology could help scientists more rapidly design and produce compounds that could be tested as potential drugs or other useful products. This system should also make it easier for chemists to reproduce reactions that others have developed, without having to reoptimize every step of the synthesis.

    “If you have a machine where you just plug in the components, and someone tries to do the same synthesis with a similar machine, they ought to be able to get the same results,” Jensen says.

    The researchers are now working on a new version of the technology that could take over even more of the design work, including coming up with the order and type of modules to be used. 

    The research was funded by the Defense Advanced Research Projects Agency (DARPA).

