Data Center Knowledge | News and analysis for the data center industry - Industr's Journal
 
Tuesday, August 8th, 2017

    Time Event
    4:01a
    IBM Says It Has Beat Facebook’s AI Server Scaling Record

    Today IBM announced the availability of the beta version of its Distributed Deep Learning software it says has demonstrated “a leap forward in deep learning performance.”

    Deep learning is a form of AI that relies on the application of “artificial neural networks” inspired by the biological neural networks of human and animal brains. Its focus is on giving computers the ability to “understand” the contents of digital images, videos, audio recordings and the like in much the same way that people do.

    Much of the potential for deep learning remains unfulfilled, however, because the logistics of processing the great amount of data required for a system’s “deep level training” make it a slow process that can take days or even weeks. Accuracy is another issue contributing to the time factor, as the system needs to be trained multiple times to reach the desired results. Higher accuracy on each pass means fewer times the computer must be “retrained” until it gets it right.

    Reducing the time factor has been difficult because merely adding more compute power, with faster processors and more of them, doesn’t speed things up as much as one would expect. As the number of “learner” processors increases, the computation time per learner decreases as expected, but the amount of communication time per learner stays constant, so communication takes up a growing share of every training step.

    In other words, bottlenecks get in the way.
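
    As a rough illustration of that bottleneck, the sketch below models one training step as compute time that shrinks with the number of learners plus a fixed communication cost per learner. It is a toy model, not IBM's method: the function names and the values of 100 time units of compute and 0.5 units of communication are hypothetical and chosen only to show the trend.

```python
def step_time(t_compute, t_comm, n_learners):
    """Time for one training step in a toy model.

    t_compute: compute time on a single learner (hypothetical units)
    t_comm:    communication time per learner, assumed constant
    """
    # A single learner has nobody to talk to, so it pays no communication cost.
    comm = t_comm if n_learners > 1 else 0.0
    return t_compute / n_learners + comm


def scaling_efficiency(t_compute, t_comm, n_learners):
    """Achieved speedup divided by the ideal (linear) speedup."""
    speedup = step_time(t_compute, t_comm, 1) / step_time(t_compute, t_comm, n_learners)
    return speedup / n_learners


# Hypothetical numbers: efficiency falls as learners are added,
# because the fixed communication cost dominates the shrinking compute time.
for n in (1, 4, 16, 64, 256):
    print(n, round(scaling_efficiency(100.0, 0.5, n), 3))
```

    In this model the efficiency drops steadily as learners are added, which is the diminishing return the article describes. Real clusters behave in more complicated ways, since communication cost depends on the interconnect and the algorithm used.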

    “Successful distributed deep learning requires an infrastructure in which the hardware and software are co-optimized to balance the computational requirements with the communication demand and interconnect bandwidth,” IBM explained in a research paper. “In addition, the communication latency plays an important role in massive scaling of GPUs (over 100). If these factors are not kept under control, distributed deep learning can quickly reach the point of diminishing return.”

    This has kept most deep learning projects limited to single-server implementations. It’s also where the research and new software IBM unveiled today come into play. The company has learned how to speed up the process with more accurate results.

    “Most popular deep learning frameworks scale to multiple GPUs in a server, but not to multiple servers with GPUs,” Hillery Hunter, director of systems acceleration and memory at IBM Research, wrote in a blog post. “Specifically, our team wrote software and algorithms that automate and optimize the parallelization of this very large and complex computing task across hundreds of GPU accelerators attached to dozens of servers.”

    In tests of the software, IBM researchers achieved record-low communication overhead and 95 percent scaling efficiency when deploying the Caffe deep learning framework on a cluster of 64 IBM Power systems with four Nvidia Tesla P100-SXM2 GPUs connected to each, for a total of 256 GPUs. This beat the previous best scaling efficiency of 89 percent, demonstrated by Facebook AI Research using smaller learning models and data sets, which reduced complexity.
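
    To put the efficiency figures in perspective, the short sketch below converts a scaling efficiency into an effective speedup over a single GPU, using the common definition of efficiency as achieved speedup divided by GPU count. The server and GPU counts and the two percentages come from the text above. Applying the 89 percent figure to the same 256 GPUs is an assumption made only for comparison, and the function name is hypothetical.

```python
def effective_speedup(num_gpus, efficiency):
    """Speedup over one GPU, given scaling efficiency = speedup / num_gpus."""
    return num_gpus * efficiency


servers = 64          # IBM Power systems in the test cluster
gpus_per_server = 4   # Tesla P100 GPUs attached to each system
total_gpus = servers * gpus_per_server

# 95 percent is the efficiency IBM reported for this cluster.
print(total_gpus, effective_speedup(total_gpus, 0.95))

# Assumption for comparison only: the earlier 89 percent figure
# applied to the same number of GPUs.
print(total_gpus, effective_speedup(total_gpus, 0.89))
```

    Under these assumptions, each percentage point of efficiency on a 256-GPU cluster is worth roughly two and a half GPUs of useful work.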

    In addition, the tests produced a record image recognition accuracy of 33.8 percent for a neural network trained on a data set of 7.5 million images, besting the previous accuracy record of 29.8 percent posted by Microsoft.

    “My team in IBM Research has been focused on reducing these training times for large models with large data sets,” Hunter wrote. “Our objective is to reduce the wait-time associated with deep learning training from days or hours to minutes or seconds, and enable improved accuracy of these AI models. To achieve this, we are tackling grand-challenge scale issues in distributing deep learning across large numbers of servers and GPUs.”

    Hunter and her team have certainly made a big start in speeding up the process — completing the test in only seven hours.

    “Microsoft took 10 days to train the same model,” she said, referring to the previous industry record. “This achievement required we create the distributed deep learning code and algorithms to overcome issues inherent to scaling these otherwise powerful deep learning frameworks.”

    A beta version, or technical preview, of the code Big Blue developed around the test — IBM Research Distributed Deep Learning software — became available today in IBM PowerAI 4.0, making the cluster scaling feature available to developers using deep learning for training AI models.

    “We expect that by making this DDL feature available to the AI community, we will see many more higher accuracy runs as others leverage the power of clusters for AI model training,” Hunter said.

    12:00p
    Switch Signals Legal Action on Data Center Design Patents Coming

    Switch, the Las Vegas-based data center provider that has for years touted a long list of its founder and CEO Rob Roy’s pending and issued data center design patents, is beefing up its legal team to go after companies it says have infringed on its intellectual property.

    Roy has “more than 350 issued and pending patent claims,” the company says, crediting the CEO with designing everything from power and cooling systems to the meeting-room interiors in its data centers. Switch is now signaling that it’s prepared to pursue legal action against companies it alleges have copied some of his inventions.

    “Today many companies copy these designs in their data centers, and while Switch is flattered, we are also ramping up our IP legal team to address those that are infringing on our patents,” Sam Castor, the company’s executive VP of policy and deputy general counsel, said in a statement last month.

    A licensing deal Switch announced today may be a way to avoid legal conflict for one of the companies he was referring to. Schneider Electric, the French energy management and automation giant that’s also one of the biggest suppliers of data center infrastructure equipment, agreed to license the design of Switch’s hot-aisle containment and cooling system, which prevents hot air that comes out of servers from mixing with cold air produced by the facility’s cooling system, making the system more efficient.

    Switch may have developed a unique way to implement hot-aisle containment, but the overall concept has been in widespread use by the data center industry for years. Keeping hot and cold air from mixing is a generally accepted best practice in designing energy efficient data centers.

    But Schneider licensed nearly 270 patents that describe Switch’s approach, according to Adam Kramer, executive VP of strategy at Switch. With that many patents, there are bound to be elements that make its particular implementation unique.

    In its announcement of the deal, Switch did not say why Schneider licensed the intellectual property. In a phone interview with Data Center Knowledge, Kramer said only, “They want to use this IP.”

    It is possible that Schneider is licensing technology it has been using in its products already. It is a €25 billion corporation with vast engineering resources it could tap to develop its own implementation of the hot-aisle-containment concept. We’ve reached out to Schneider for clarification and will update this story when we get a response.

    In a statement, Schneider’s senior VP of data center systems, Chris Hanley, said the license will “clear the way for us to incorporate Switch’s innovative hot-aisle containment and cooling technologies, which will complement Schneider’s product offerings and efficiencies.”

    The hot-aisle containment and cooling system it licensed is called Switch T-SCIF (Thermal Separate Compartment in Facility). According to Kramer, key among the 265 patents licensed are:

    • Integrated Wiring System and Thermal Shield Support Apparatus for a Data Center: U.S. Patent Number: 8,072,780
    • Air Handling Control System for a Data Center: U.S. Patent Number: 8,180,495
    • Data Center Air Handling Unit: U.S. Patent Number: 8,469,782
    • Electronic Equipment Data Center or Co-Location Facility Designs and Method of Making and Using the Same: U.S. Patent Number: 8,523,643
    • Data Center Facility Design Configuration: U.S. Patent Number: 9,198,331
    • Electronic Equipment Data Center and Server Co-Location Facility Configurations and Method of Using the Same: U.S. Patent Number: 9,622,389
    • Air Handling Unit With A Canopy Thereover For Use with a Data Center and Method of Using the Same: U.S. Patent Number: 9,693,486

    Switch said it has had a licensing program since 2016. Its first licensee was NV Energy, the Berkshire Hathaway-owned Nevada utility.

    6:13p
    Report: IBM Tries to Block Former Exec from Joining AWS

    Brought to you by IT Pro

    IBM is suing a former senior manager for violating a non-compete agreement as he starts a new job at Amazon Web Services (AWS) three months after leaving his role as CIO for transformation and operations at IBM.

    According to a report by Westfair Online, IBM sued Jeff S. Smith last week in White Plains, NY, demanding that he repay $1.7 million in stock bonuses. Smith had worked at IBM since 2014.

    IBM argues that in starting his job at AWS on Monday, he would “inevitably be involved in decision-making about how best to compete against IBM and would inevitably disclose or use IBM trade secrets.” The lawsuit also alleges that Smith had shared inside information with AWS CEO Andy Jassy while he was working for IBM, wiping his company phone and tablet to make it impossible to detect communications, the report said.

    The judge, who blocked Smith from starting his job at AWS on Aug. 1 until a full hearing could be scheduled, amended the order to allow him to begin work Monday in “listen and learn mode” for training purposes.

    According to IBM, Smith signed a non-compete agreement in which he agreed not to work for a competitor for one year. He notified IBM in June of his plans to start work at AWS in August.

    IBM has asked the court to bar Smith from working for AWS until May 2, 2018. A hearing is scheduled for Aug. 21.

    In June, AWS won a temporary restraining order to prevent a former executive from joining Smartsheet, a collaboration software provider, only to drop the lawsuit a week later.
