Data Center Knowledge | News and analysis for the data center industry
 

Tuesday, May 16th, 2017

    12:00p
    NVIDIA CEO: AI Workloads Will “Flood” Data Centers

    During a keynote at his company’s big annual conference in Silicon Valley last week, NVIDIA CEO Jensen Huang took several hours to announce the chipmaker’s latest products and innovations, but also to drive home the inevitability of the force that is Artificial Intelligence.

    NVIDIA is the top maker of GPUs used in computing systems for Machine Learning, currently the part of the AI field where most action is happening. GPUs work in tandem with CPUs, accelerating the processing necessary to both train machines to do certain tasks and to execute them.
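
    To make that division of labor concrete, here is a minimal sketch (not from the article) using PyTorch: the CPU orchestrates the program, while the heavy matrix math behind one training step and one inference call is offloaded to a GPU when one is available. The model, batch size, and data are placeholders chosen for illustration.

        # Minimal sketch: the same model runs on CPU or GPU; moving tensors to a
        # CUDA device offloads the heavy math to the GPU when one is present.
        import torch
        import torch.nn as nn

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.CrossEntropyLoss()

        # One training step: compute a loss, backpropagate, update the weights.
        inputs = torch.randn(32, 128, device=device)          # placeholder batch
        targets = torch.randint(0, 10, (32,), device=device)  # placeholder labels
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

        # Inference: no gradients needed, so it is far cheaper than training.
        with torch.no_grad():
            predictions = model(inputs).argmax(dim=1)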

    “Machine Learning is one of the most important computer revolutions ever,” Huang said. “The number of [research] papers in Deep Learning is just absolutely explosive.” (Deep Learning is a class of Machine Learning algorithms where innovation has skyrocketed in recent years.) “There’s no way to keep up. There is now 10 times as much investment in AI companies since 10 years ago. There’s no question we’re seeing explosive growth.”

    While AI and Machine Learning together make up one of Gartner’s top 10 strategic technology trends for 2017, most other trends on the list – such as conversational systems, virtual and augmented reality, Internet of Things, and intelligent apps – are accelerating in large part because of advances in Machine Learning.

    “Over the next 10 years, virtually every app, application and service will incorporate some level of AI,” Gartner fellow and VP David Cearley said in a statement. “This will form a long-term trend that will continually evolve and expand the application of AI and machine learning for apps and services.”

    No Longer Just for Hyper-Scalers

    Growth in Machine Learning means a “flood” of AI workloads is headed for the world’s data center floors, Huang said. Up until now, the most impactful production applications of Deep Learning have been developed and deployed by a handful of hyper-scale cloud giants – such as Google, Microsoft, Facebook, and Baidu – but NVIDIA sees the technology starting to proliferate beyond the massive cloud data centers.

    “AI is just another kind of computing, and it’s going to hit many, many markets,” Ian Buck, the NVIDIA VP in charge of the company’s Accelerated Computing unit, told Data Center Knowledge in an interview. While there’s no doubt that Machine Learning will continue growing as a portion of the total computing power inside cloud data centers, he expects to see it in data centers operated by everybody in the near future — from managed service providers to banks. “It’s going to be everywhere.”

    In preparation for this flood, data center managers need to answer some key basic questions: Will it make more sense for my company to host Deep Learning workloads in the cloud or on-premises? Will it be a hybrid of the two? How much of the on-prem infrastructure will be needed for training Deep Learning algorithms? How much of it will be needed for inference? If we’ll have a lot of power-hungry training servers, will we go for maximum performance or give up some performance in exchange for higher efficiency of the whole data center? Will we need inference capabilities at the edge?

    Cloud or On-Premises? Probably Both

    Today, many companies large and small are in early research phases, looking for ways Deep Learning can benefit their specific businesses. One data center provider that specializes in hosting infrastructure for Deep Learning told us most of their customers hadn’t yet deployed their AI applications in production.

    This drives demand for rentable GPUs in the cloud, which Amazon Web Services, Microsoft Azure, and Google Cloud Platform are happy to provide. By using their services, researchers can access lots of GPUs without having to spend a fortune on on-premises hardware.

    “We’re seeing a lot of demand for it [in the] cloud,” Buck said. “Cloud is one of the reasons why all the hyper-scalers and cloud providers are excited about GPUs.”

    A common approach, however, is combining some on-premises systems with cloud services. Berlin-based AI startup Twenty Billion Neurons, for example, synthesizes video material to train its AI algorithm to understand the way physical objects interact with their environment. Because those videos are so data-intensive, twentybn uses an on-premises compute cluster at its lab in Toronto to handle them, while outsourcing the actual training workloads to cloud GPUs in a Cirrascale data center outside San Diego.

    Read more: This Data Center is Designed for Deep Learning

    Cloud GPUs are also a good way to start exploring Deep Learning for a company without committing a lot of capital upfront. “We find that cloud is a nice lubricant to getting adoption up for GPUs in general,” Buck said.

    Efficiency v. Performance

    If your on-premises Deep Learning infrastructure will do a lot of training – the computationally intensive process of teaching neural networks tasks like speech and image recognition – prepare for power-hungry servers with many GPUs on every motherboard. That means higher power densities than most of the world’s data centers have been designed to support (we’re talking up to 30kW per rack).
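
    As a rough illustration of how those densities add up, here is a back-of-the-envelope sketch. The per-server figures are assumptions for illustration (300 W GPUs, eight per server, and a guessed host overhead), not vendor specifications:

        # Back-of-the-envelope rack density estimate (assumed, illustrative numbers).
        GPU_WATTS = 300            # e.g. a 300 W training GPU
        GPUS_PER_SERVER = 8
        HOST_OVERHEAD_WATTS = 800  # assumed CPUs, memory, fans, storage per server

        server_watts = GPUS_PER_SERVER * GPU_WATTS + HOST_OVERHEAD_WATTS  # 3,200 W
        servers_per_rack = 9       # assumed for this example

        rack_kw = servers_per_rack * server_watts / 1000
        print(f"Estimated rack draw: {rack_kw:.1f} kW")  # ~28.8 kW, near the 30 kW figure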

    Read more: Deep Learning Driving Up Data Center Power Density

    However, it doesn’t automatically mean you’ll need the highest-density cooling infrastructure possible. Here, the tradeoff is between performance and the number of users, or workloads, the infrastructure can support simultaneously. Maximum performance means the highest-power GPUs money can buy, but that’s not necessarily the most efficient way to go.

    NVIDIA’s latest Volta GPUs, expected to hit the market in the third quarter, deliver maximum performance at 300 watts, but if you slash the power in half you will still get 80 percent of the number-crunching muscle, Buck said. “If you back off power a little bit, you still maintain quite a bit of performance. It means I can up the number of servers in a rack and max out my data center. It’s just an efficiency choice.”
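
    A quick sketch of the tradeoff Buck describes, using the figures from his example (300 W per GPU at full performance, roughly 80 percent of the performance at half the power) and an assumed rack power budget for the GPUs:

        # Fixed rack power budget for GPUs (assumed figure for illustration).
        RACK_GPU_POWER_BUDGET_W = 30_000

        full   = {"watts": 300, "relative_perf": 1.0}   # maximum performance
        capped = {"watts": 150, "relative_perf": 0.8}   # half power, ~80% performance

        for label, gpu in (("max performance", full), ("power-capped", capped)):
            gpus_per_rack = RACK_GPU_POWER_BUDGET_W // gpu["watts"]
            rack_throughput = gpus_per_rack * gpu["relative_perf"]
            print(f"{label}: {gpus_per_rack} GPUs per rack, relative throughput {rack_throughput:.0f}")

        # max performance: 100 GPUs per rack, relative throughput 100
        # power-capped:    200 GPUs per rack, relative throughput 160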

    What about the Edge?

    Inferencing workloads – in which trained neural networks apply what they’ve learned to new data – require fewer GPUs and less power, but they have to perform extremely fast. (Alexa wouldn’t be much fun to use if it took even 5 seconds to respond to a voice query.)

    Inferencing servers are not particularly difficult to handle on-premises, but one big question for the data center manager is how close they have to be to where input data originates. If your corporate data centers are in Ashburn, Virginia, but your Machine Learning application has to provide real-time suggestions to users in Dallas or Portland, chances are you’ll need some inferencing servers in or near Dallas and Portland to make the experience actually feel real-time. If your application has to do with public safety – analyzing video data at intersections to help navigate autonomous vehicles, for example – it’s very likely that you’ll need some inferencing horsepower right at those intersections.
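
    One way to reason about this is a simple latency budget. The numbers below are assumptions for illustration, not measurements, but they show why shaving wide-area round trips tends to matter more than shaving model execution time:

        # Illustrative latency budget for a "real-time" response (assumed numbers).
        TARGET_MS = 100        # rough end-to-end budget users perceive as instant
        inference_ms = 20      # model execution on an inferencing server
        remote_rtt_ms = 70     # assumed WAN round trip to a distant data center
        local_rtt_ms = 5       # assumed round trip to a metro-local or edge server

        print("remote:", inference_ms + remote_rtt_ms, "ms of", TARGET_MS)  # 90 ms, little headroom
        print("local: ", inference_ms + local_rtt_ms, "ms of", TARGET_MS)   # 25 ms, comfortable margin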

    “Second Era of Computing”

    Neither shopping suggestions on Amazon.com (one of the earliest production uses of Machine Learning) nor Google search predictions were written out as sequences of specific if/then instructions by software engineers, Huang said, referring to the rise of Machine Learning as a “second era of computing.”

    And it’s growing quickly, permeating all industry verticals, which means data center managers in every industry have some homework to do.

    3:52p
    HPE Rolls Out The Machine Prototype, Its Version of the Future of Computing

    Hewlett Packard Enterprise unveiled its answer to the problems that may arise in the near future as the size of datasets that need to be analyzed outgrows the capabilities of the fastest processors. That answer is essentially a single massive pool of memory, with a single address space.

    A prototype version of The Machine the company introduced Tuesday has 160 terabytes of memory across 40 physical nodes interconnected by a high-performance fabric.
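
    Some quick arithmetic on those figures (treating a terabyte as 10^12 bytes):

        # Quick arithmetic on the prototype's shared memory pool.
        total_tb = 160
        nodes = 40
        print(total_tb / nodes, "TB of fabric-attached memory per node")  # 4.0 TB

        total_bytes = total_tb * 10**12
        print(f"{total_bytes:.3e} bytes in a single address space")       # 1.600e+14 bytes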

    Importantly, The Machine is not powered by Intel processors, which dominate the data center and high-performance computing market, but by an ARM System-on-Chip designed by Cavium. While nobody can predict the future of computing, Intel’s role in HPE’s version of that future appears to be a lot smaller than it is today.

    The core idea behind HPE’s new architecture – “the largest R&D program in the history of the company” – is shifting the focus from the processor to memory. From the press release:

    “By eliminating the inefficiencies of how memory, storage and processors interact in traditional systems today, Memory-Driven Computing reduces the time needed to process complex problems from days to hours, hours to minutes, minutes to seconds – to deliver real-time intelligence.”

    The philosophy is similar to the one behind in-memory computing systems for Oracle or SAP HANA databases. Holding all the data in memory theoretically makes computing faster because data doesn’t have to be shuffled between storage and memory.

    This means one of the biggest engineering challenges in creating these systems is designing the fabric that interconnects CPUs to memory in a way that avoids bottlenecks.

    It’s a similar challenge to the one that faces engineers who work on another answer to the reportedly looming disconnect between data volume and processing muscle: offloading big processing jobs from the CPU to a big pool of GPUs.

    The Machine prototype uses photonics to interconnect components.

    HPE expects the size of shared memory to scale in the future “to a nearly limitless pool of memory – 4,096 yottabytes,” which is 250,000 times the size of all digital data that exists today, according to the company:

    “With that amount of memory, it will be possible to simultaneously work with every digital health record of every person on earth; every piece of data from Facebook; every trip of Google’s autonomous vehicles; and every data set from space exploration all at the same time.”
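
    Taking the company’s own numbers at face value (and treating yotta- and zetta- as decimal prefixes), the claim implies an estimate of roughly 16 zettabytes for all digital data today:

        # Sanity check of the company's figures, using decimal SI prefixes.
        YB = 10**24   # bytes in a yottabyte
        ZB = 10**21   # bytes in a zettabyte

        future_pool_bytes = 4096 * YB
        implied_data_today = future_pool_bytes / 250_000
        print(implied_data_today / ZB, "zettabytes")  # ~16.4 ZB implied for today's digital data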

    4:36p
    Microsoft Faulted Over Ransomware While Shifting Blame to NSA

    Dina Bass (Bloomberg) — There’s a blame game brewing over who’s responsible for the massive cyberattack that infected hundreds of thousands of computers. Microsoft Corp. is pointing its finger at the U.S. government, while some experts say the software giant is accountable too.

    The attack started Friday and has affected computers in more than 150 countries, including severe disruptions at Britain’s National Health Service. The hack used a technique purportedly stolen from the U.S. National Security Agency to target Microsoft’s market-leading Windows operating system. It effectively takes the computer hostage and demands a $300 ransom, to be paid in 72 hours with bitcoin.

    Microsoft President and Chief Legal Officer Brad Smith blamed the NSA’s practice of developing hacking methods to use against the U.S. government’s own enemies. The problem is that once those vulnerabilities become public, they can be used by others. In March, thousands of leaked Central Intelligence Agency documents exposed vulnerabilities in smartphones, televisions and software built by Apple Inc., Google and Samsung Electronics Co.

    The argument that it’s the NSA’s fault has merit, according to Alex Abdo, staff attorney at the Knight First Amendment Institute at Columbia University. Still, he said Microsoft should accept some responsibility.

    “Technology companies owe their customers a reliable process for patching security vulnerabilities,” he said. “When a design flaw is discovered in a car, manufacturers issue a recall. Yet, when a serious vulnerability is discovered in software, many companies respond slowly or say it’s not their problem.”

    Microsoft released a patch for the flaw in March after hackers stole the exploit from the NSA. But some organizations didn’t apply it, and others were running older versions of Windows that Microsoft no longer supports. In what it said was a “highly unusual” step, Microsoft also agreed to provide the patch for older versions of Windows, including Windows XP and Windows Server 2003.

    In 2014, Microsoft ended support for the highly popular Windows XP, released in 2001 and engineered beginning in the late 1990s, arguing that the software was out of date and wasn’t built with modern security safeguards. The company had already supported it longer than it normally would because so many customers still used it, even though the effort was proving costly. Security patches would still be available for clients with older machines, but only if they paid for custom support agreements.

    But with Microsoft making an exception this time and providing the patch free to XP users, it may come under pressure to do the same the next time it issues a critical security update. (These are the most important patches, which the company recommends users install immediately.) That could saddle the company with the XP albatross for years past the point when it hoped to be free of maintaining the software. The precedent may affect other software sellers too.

    “They’re going to end up going above and beyond and some vendors are going to start extending support for out-of-support things that they haven’t done before,” said Greg Young, an analyst at market research firm Gartner Inc. “That’s going to become a more common practice.”

    On Monday, private-sector sleuths found a clue about who is potentially responsible for the WannaCrypt attack. A researcher from Google posted on Twitter that an early version of WannaCrypt from February shared some of the same programming code as malicious software used by the Lazarus Group, the alleged North Korean government hackers behind the destructive attack on Sony Corp. in 2014 and the theft of $81 million from a Bangladesh central bank account at the New York Fed last year. Others subsequently confirmed the Google researcher’s work.

    On its own, the shared code is little more than an intriguing lead. Once malicious software is in the wild, it is commonly reused by hacking groups, especially nation-states trying to leave the fingerprints of another country. But in this case, according to Kaspersky Lab, the shared code was removed from the versions of WannaCrypt that are currently circulating, which reduces the likelihood of such a ‘false flag’ attempt at misdirection. Some security researchers speculated that if the perpetrators were North Korean, the goal may have been to cause a widespread internet outage to coincide with this weekend’s latest missile test.

    As for Microsoft, some intelligence agency experts questioned its NSA criticism, saying it’s unreasonable for the company to ask governments to stop using its products as a way to attack and monitor enemies.

    “For Microsoft to say that governments should stop developing exploits to Microsoft products is naive,” said Brian Lord, a managing director at PGI Cyber and former deputy director at the Government Communications Headquarters, one of the U.K.’s intelligence agencies. “To keep the world safe these things have to be done.”

    He said that intelligence agencies tended to be good and responsible stewards of the hacks and exploits they develop. “Occasionally mistakes happen,” he added.

    –With assistance from Gerrit De Vynck and Jeremy Kahn

    6:45p
    A Little TLC Goes a Long Way Toward Staff Retention

    Good help is often hard to find and even harder to keep. That’s why it makes sense for you to make consistent efforts to increase positive attitudes and retention by taking a few extra morale-building steps.

    When you and your organization are effective talent magnets, good employees stick around rather than look for work elsewhere.

    That combination—getting and sustaining a top talent pool—is at least one of the top two or three concerns on the minds of CIOs today. So, how can you attract and hire the best talent, engage their commitment and loyalty, and retain them for longer periods of time?

    Here are some tips from Brian Carlsen, a speaker and consultant with St. Aubin, Haggerty & Associates, Inc., and co-author of “Attract, Engage and Retain Top Talent.”

    Notice them. Think of the supervisors and staff who work for you. Which of them are easy to work with and have the potential to grow? Who exhibits the qualities you need in a technology team member? When you identify people like this, tell them how valuable they are to you and the organization. This is how to become a career mentor.

    Mentor them. When you know you want to keep someone, be creative about ways to keep him or her around. Make it a point to learn their interests and career goals, then help them pursue those goals. For example, in the DSC Logistics data center, one person showed potential and wanted to go back to school for computer science. The company rearranged his schedule so he could work part-time and on certain nights, allowing him to keep working while earning his degree. This helped DSC Logistics hang onto him as he continued his education.

    Invest and re-invest. Before any large-scale process or technology change, ask yourself: How does this affect my staff? According to a SearchDataCenter.com article, “Staffing Often Overlooked During Platform Migration,” the data center of San Mateo County in California pulled off a migration that saved it $500,000 in the first year. The data center then reinvested much of that money in its staff by retraining current workers and bringing in new people, including a systems programmer.

    Connect them. US Oncology is the nation’s leading healthcare services network dedicated exclusively to cancer treatment and research. Its management team reduced voluntary IT turnover from 25 percent to 10 percent in just one year through a broad range of initiatives taken in response to an employee engagement survey. Several of those initiatives are designed to connect technology staff to their colleagues, their leaders, and the greater organization and its customers. For example, the department formed action committees to help staff connect across groups and focus on retention, and the company holds new-employee luncheons to talk about the culture and mission of the organization. IT people are often far removed from the organization’s mission of advancing cancer care in America, so it’s important to explain why their work matters to that mission.

    Manage the interface. Your most talented data center personnel share the outlook of other good performers: they want a manager who shows them respect, helps them remove barriers, and gets out of the way so they can get the work done. One obstacle to job satisfaction your data center staff may frequently encounter is abrasive interactions with technology specialists in other units, and they may or may not be equipped to handle those interactions well. While you may not enjoy this part of the job, smoothing that interface may be one of the most important things you do during the day.

    Communicate. People want to know what is going on. Data center staff may feel they are working in isolation, yet they want to be connected. You can never communicate enough; when communication is absent, employees may think you are holding something back. Communicate through group meetings and one-on-one. The more you do this with your staff, the more it builds trust in both directions: yours in them and theirs in you.

    Thank them. An easy thing to do (but a hard thing to remember) is to simply thank people. When people are appreciated, they often give extra effort and say more positive things about their work and their colleagues. In some organizations, the cultural norm is to tell people when they are doing something wrong, and stay silent when there is no problem. Magnetic power is created when you tell someone that he or she is doing something right. Easy, high-gain ways to show your appreciation include sending frequent, sincere thank you notes, talking about the good things that people are doing, and acknowledging your staff publicly—internally and externally.

    A focus on any one of these areas can add strength to your magnetic pull on mission-critical employees. Your team wants to work with good leaders and colleagues who appreciate them, treat them with respect and help them be more fulfilled in their work life. A well-intended effort to retain and engage your current staff members may have a more positive impact than you can imagine.

