Data Center Knowledge | News and analysis for the data center industry
Wednesday, May 28th, 2014

8:00a | Google Using Machine Learning to Boost Data Center Efficiency
Google is using machine learning and artificial intelligence to wring even more efficiency out of its mighty data centers.
In a presentation today at Data Centers Europe 2014, Google’s Joe Kava said the company has begun using a neural network to analyze the oceans of data it collects about its server farms and to recommend ways to improve them. Kava is the Internet giant’s vice president of data centers.
In effect, Google has built a computer that knows more about its data centers than even the company’s engineers. The humans remain in charge, but Kava said the use of neural networks will allow Google to reach new frontiers in efficiency in its server farms, moving beyond what its engineers can see and analyze.
Google already operates some of the most efficient data centers on earth. Using artificial intelligence will allow Google to peer into the future and model how its data centers will perform in thousands of scenarios.
In early usage, the neural network has been able to predict Google’s Power Usage Effectiveness with 99.96 percent accuracy. Its recommendations have led to efficiency gains that appear small, but can lead to major cost savings when applied across a data center housing tens of thousands of servers.
Why turn to machine learning and neural networks? The primary reason is the growing complexity of data centers, a challenge for Google, which uses sensors to collect hundreds of millions of data points about its infrastructure and its energy use.
“In a dynamic environment like a data center, it can be difficult for humans to see how all of the variables interact with each other,” said Kava. “We’ve been at this (data center optimization) for a long time. All of the obvious best practices have already been implemented, and you really have to look beyond that.”
Enter Google’s ‘Boy Genius’
Google’s neural network was created by Jim Gao, an engineer whose colleagues have given him the nickname “Boy Genius” for his prowess analyzing large datasets. Gao had been doing cooling analysis using computational fluid dynamics, which uses monitoring data to create a 3D model of airflow within a server room.
Gao thought it was possible to create a model that tracks a broader set of variables, including IT load, weather conditions, and the operations of the cooling towers, water pumps and heat exchangers that keep Google’s servers cool.
“One thing computers are good at is seeing the story in the data, so Jim took the information we gather in the course of our daily operations and ran it through a model to help us make sense of complex interactions that his team may not otherwise have noticed,” Kava said. “After some trial and error, Jim’s models are now 99.96% accurate in predicting PUE under a given set of conditions, which then helps him find new ways to optimize our operations. Those small tweaks add up to significant savings in both energy and money.”
A graph showing how the projections by Google’s neural network tool aligned with actual PUE readings.
How it Works
Gao began working on the machine learning initiative as a “20 percent project,” a Google tradition of allowing employees to spend a chunk of their work time exploring innovations beyond their specific work duties. Gao wasn’t yet an expert in artificial intelligence. To learn the fine points of machine learning, he took a course from Stanford University Professor Andrew Ng.
Neural networks mimic how the human brain works, allowing computers to adapt and “learn” tasks without being explicitly programmed for them. Google’s search engine is often cited as an example of this type of machine learning, which is also a key research focus at the company.
“The model is nothing more than a series of differential calculus equations,” Kava explained. “But you need to understand the math. The model begins to learn about the interactions between these variables.”
Gao’s first task was crunching the numbers to identify the factors that had the largest impact on the energy efficiency of Google’s data centers, as measured by PUE. He narrowed the list down to 19 variables and then designed the neural network, a machine learning system that can analyze large datasets to recognize patterns.
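Google has not published how its model is actually built, but the general technique described here, a small neural-network regressor trained on operational telemetry to predict PUE, can be sketched in a few lines. The following is an illustration only: scikit-learn and synthetic data stand in for Google’s real 19 variables and internal tooling, and the model shape is an assumption, not Google’s implementation.

```python
# Minimal sketch: predict PUE from operational telemetry with a small
# neural network. Synthetic data stands in for Google's 19 real variables.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_samples, n_features = 5000, 19          # 19 variables, as in the article

# Fake telemetry: IT load, outside air temperature, cooling-tower speed, etc.
X = rng.normal(size=(n_samples, n_features))
# Fake PUE: a nonlinear mix of a few variables plus noise, centered near 1.1
y = (1.1 + 0.05 * np.tanh(X[:, 0]) + 0.03 * X[:, 1] * X[:, 2]
     + 0.01 * rng.normal(size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(scaler.transform(X_train), y_train)

predicted = model.predict(scaler.transform(X_test))
mean_error = np.mean(np.abs(predicted - y_test) / y_test)
print(f"Mean relative PUE prediction error: {mean_error:.2%}")
```

Once trained, a model like this can be fed hypothetical operating conditions, such as set points, pump speeds and load levels, to estimate which combination is predicted to yield the lowest PUE, which is essentially the “thousands of scenarios” modeling described above.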

11:00a | Admin Error Brings Down Joyent’s Ashburn Data Center
Joyent, a San Francisco-based provider of high-performance cloud infrastructure services, saw one of its data centers go down Tuesday as a result of an error made by an administrator. The company had to reboot all servers in its US-East-1 data center, located in Ashburn, Virginia.
The provider has not released information on what exactly caused the outage, but is promising a “full postmortem.” In a forum post on Hacker News, Joyent CTO Bryan Cantrill wrote that the company would be providing the information “as soon as we reasonably can.”
Cloud outages sting more than others
Outages of service provider data centers cause a lot more damage than enterprise data center outages do because they host infrastructure for many companies instead of one. Cloud data center outages are especially painful because each physical server may be a host to multiple customers’ virtual compute nodes.
Another service provider, Internap, which offers cloud hosting services, experienced three outages at its New York City data centers during the past two weeks. The company did not say how many customers the outages affected overall, but at least 20 companies were affected by one of the incidents.
Internap’s problems were caused by electrical equipment failure. This kind of outage is different from Joyent’s: Internap’s happened at the facilities layer of the stack, while Joyent’s incident happened at the IT administration level.
‘Fat finger’ shouldn’t hurt so much
While human error was at fault, Joyent’s system ideally would have been built to withstand such errors. “While the immediate cause was operator error, there are broader systemic issues that allowed a fat finger to take down a data center,” Cantrill wrote, adding that the company would be improving software and operational procedures to prevent such incidents from happening in the future.
Joyent does not plan to discipline the administrator that made the error, Cantrill told The Register, explaining that the company was more interested in learning from the incident than punishing people.
Joyent provides public and private cloud infrastructure services for companies that need more computing horsepower than the mainstream Infrastructure-as-a-Service providers, such as Amazon Web Services, can offer.
In addition to the Ashburn data center, brought online in February 2012, its cloud infrastructure lives in data centers in San Francisco, Las Vegas and Amsterdam.

12:00p | QTS Gets Three Data Centers Open-IX-Certified
QTS Realty Trust has secured Open-IX certification for three of its data centers, joining the multi-company effort to create a distributed member-governed Internet exchange system in North America.
Open-IX is a non-profit industry group made up of data center and network service providers, as well as providers of Internet-based products and content. Its member list includes representatives from Google, Twitter, Microsoft, Netflix and Comcast, to name a few. Data center providers that participate include Rackspace, CyrusOne, Vantage, CoreSite, Digital Realty Trust and DuPont Fabros, among others.
Counterweight to Equinix, Telx
The non-profit was formed to provide a viable alternative to the biggest Internet exchanges in North America, nearly all of which are commercially operated and controlled by a handful of companies, including Equinix, Telx and Verizon Terremark. Equinix is far ahead of all others in terms of the number of companies trading Internet traffic through its exchanges.
To participate in Open-IX, data center providers and exchange operators have to satisfy criteria set by the organization. The goal is to standardize exchanges across multiple sites and, hopefully, to attract enough peering members to the community to enable the distributed exchange to compete with the likes of Equinix and Telx.
QTS joins group of certified firms
QTS is the latest data center provider to have its facilities certified, following certification announcements by DuPont Fabros, CyrusOne, Continuum, Digital Realty and EvoSwitch.
Only two exchanges have been certified so far, both of them European companies that have recently expanded into the U.S., in parallel with Open-IX efforts. LINX (London Internet Exchange) has received certification for its LINX NoVA exchange in Ashburn, Virginia, and AMS-IX (Amsterdam Internet Exchange) has gotten exchange infrastructure in New York City certified.
QTS received the Open-IX blessing for data centers in Atlanta, Georgia; Suwanee, Georgia (an Atlanta suburb); and Richmond, Virginia. The stamp of approval means the facilities can now host Open-IX-certified Internet exchanges.
“Achieving OIX-2 certification for three QTS data centers demonstrates our commitment to provide our customers with best-in-class, carrier-neutral Internet connectivity and interconnection options at our data centers,” Jim Reinhart, the company’s chief operations officer, said. “We fully support OIX’s efforts to foster the development of critical technical and operating standards for the data center industry, ultimately for the benefit of all Internet users.”

12:30p | Address Cloud Demand with a “Pre-flight” Checklist, Increase Dynamic Scalability
Shashi Mysore is director of product management at Eucalyptus, where his responsibilities include overseeing all aspects of the product life cycle, from strategy to execution. You can find him on LinkedIn at Shashi Mysore.
Not long ago, organizations could comfortably forecast demand for applications by analyzing activity levels and applying acute knowledge of user populations. But the rapid rise of mobile devices, along with the video streaming, social media and “freemium” games they enable, has altered how individuals engage with software, creating intense and variable demand. Businesses can no longer target a single endpoint or a set number of users. Instead, they must be equipped to scale operations in response to rapidly changing conditions. Overall, gauging cloud demand has become a significant and growing challenge.
Unfortunately, much of today’s IT architecture was designed for a different time, when companies only had to accommodate static and predictable workloads. As a result, dev/test and IT teams sometimes have to pick their poison, and either over-provision resources for a worst-case scenario that may not even materialize, or under-provision them and invite issues with run-time performance and general availability.
Among the many tools that can be used to ensure scalability of infrastructure, applications and services are hybrid cloud platforms and automated instances. Conducting a “pre-flight checklist” of cloud capabilities can help prepare for potentially overwhelming demand.
Video Streaming, Games, and Social Media: The Changing Faces of Application Demand
Applications once targeted at small populations are now designed to capture and sustain the attention of thousands, even millions, of users. Take video streaming, for example: the most recent Ericsson Mobility Report (PDF) found that video was the fastest-growing component of global mobile data traffic, accounting for 35 percent in 2013 and on track to top the 50 percent mark by 2019. Popular streaming services such as YouTube, Netflix and Hulu regularly have to scale to meet exceptional demand.
Moreover, the bandwidth requirements and social sharing components of streaming video together make for heavy, dynamic workloads. The April 2014 season debut of “Game of Thrones” on HBO created what the network called “overwhelming demand” that took its HBO Go application temporarily offline. Such interest in video is hardly atypical: on a given day, Netflix may account for almost one-third of U.S. internet traffic.
Games and social media, including platforms such as Twitter and LinkedIn, as well as enterprise collaboration tools present similar scalability challenges. There are more than 250 million social gaming users on Facebook alone. Likewise, prominent mobile games such as “Flappy Bird” and “Clash of Clans” have used a mix of incentive systems and networking effects to reach enormous numbers of users in remarkably little time.
Pre-flight Checklist: Preparing Cloud-based IT Architectures for Viral Demand
Organizations need to prepare themselves in the face of such unpredictable, dynamic demand. Business strategy is increasingly predicated on providing seamless online experiences, with zero latency and constant uptime. Companies are integrating public cloud capabilities, from infrastructure-as-a-service to cloud-delivered software, with existing architectures to create hybrid deployments that facilitate easy access to compute, storage and networking resources. At the same time, this setup affords control over data and dedicated hardware performance from on-premises machines as needed.
Still, some companies opt for a strictly private or public cloud at first and only make the journey to hybrid once demand has become a real issue. More specifically, larger organizations may start with a private deployment to ensure security, while startups may choose public services to minimize infrastructure management. Either way, it is a good idea for businesses to think about scalability early on, since applications can take off overnight.
With that in mind, here are several core things that should be considered when getting started with the cloud. Think of it as the aforementioned “pre-flight checklist” that ensures you are not caught off-guard by surging demand:
- IaaS and API compatibility – not all IaaS is equal, with some services being black boxes that are not amenable to application portability or compatible with leading APIs. Make sure that the selected IaaS can accommodate changes in cloud utilization.
- Licensing and use of open source software – using open source software is a good way to keep costs down (in turn freeing up resources to address demand) and get access to new features. Mixing open source and commercial solutions promotes flexibility and freedom from vendor lock-in, making it easier to build a cloud that meets specific business requirements.
- Instance automation – infrastructure, software and platforms can be automated for superior consistency and speed. If a change needs to be made in the testing environment, for example, then it can be done on the fly without taking up too much time or attention.
- Ample caching – today’s applications process considerable amounts of data and content. Using a multi-tiered caching system leads to fast load times and high levels of performance (a minimal sketch follows this list).
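To make the caching item concrete, here is a minimal sketch of a two-tier cache: a small, fast in-process tier backed by a larger shared tier. Everything here is illustrative; in practice the shared tier would usually be a system such as Redis or Memcached rather than a Python dictionary, and the class and names below are hypothetical.

```python
import time

class TwoTierCache:
    """Sketch of multi-tiered caching: a small in-process tier backed by
    a larger, slower shared tier (hypothetical example)."""

    def __init__(self, local_capacity=1024, ttl_seconds=60):
        self.local = {}                 # tier 1: in-process dict
        self.shared = {}                # tier 2: stand-in for Redis/Memcached
        self.local_capacity = local_capacity
        self.ttl = ttl_seconds

    def get(self, key):
        # Check the fast local tier first.
        entry = self.local.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        # Fall back to the shared tier and promote the value locally.
        entry = self.shared.get(key)
        if entry and entry[1] > time.time():
            self._store_local(key, entry[0])
            return entry[0]
        return None                     # cache miss: caller hits the database

    def set(self, key, value):
        self.shared[key] = (value, time.time() + self.ttl)
        self._store_local(key, value)

    def _store_local(self, key, value):
        if len(self.local) >= self.local_capacity:
            self.local.pop(next(iter(self.local)))   # evict oldest insert
        self.local[key] = (value, time.time() + self.ttl)

cache = TwoTierCache()
cache.set("user:42:profile", {"name": "Ada"})
print(cache.get("user:42:profile"))
```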
The stakes are high for setting up efficient, scalable cloud architectures. Supporting applications with dynamic user populations and erratic workloads is not just a task for the IT department but a core competency for the entire company. With careful consideration of how to align a hybrid cloud with business requirements, organizations can stay one step ahead of changing demand.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.

2:00p | Data Center Jobs: ViaWest
At the Data Center Jobs Board, we have two new job listings from ViaWest, which is seeking a Data Center Engineer in Richardson, Texas, and a Data Center Manager in North Las Vegas, Nevada.
The Data Center Engineer is responsible for monitoring the building’s HVAC, mechanical and electrical systems, performing preventive maintenance, site surveys, and replacement of electrical and mechanical equipment, reading and interpreting blueprints, engineering specifications, project plans and other technical documents, performing operation, installation and servicing of peripheral devices, and assisting with equipment start-ups, repairs and overhauls. To view full details and apply, see job listing details.
The Data Center Manager is responsible for developing and maintaining positive relationships with clients; overseeing the scheduling, maintenance and monitoring of all heating, ventilating, air conditioning, water, electric and other systems to ensure efficient operation; inspecting the facility and generating inspection reports; cultivating productive and proactive working relationships with property management and other tenants in order to jointly resolve issues; building a strong team of technical experts to maintain infrastructure; and managing facilities staff to deliver expected service levels to the client within the prescribed budget. To view full details and apply, see job listing details.
Are you hiring for your data center? You can list your company’s job openings on the Data Center Jobs Board, and also track new openings via our jobs RSS feed.

2:30p | 5 Reasons Modern Data Centers Use Environmental Sensors
Even as the modern data center grows in usage, administrators are still tasked with developing an even more efficient environment. Organizations are growing, and new demands around the data center platform create direct efficiency challenges that must be overcome.
Here’s the important part: data center demand will only continue to increase. With more users, cloud, and data, the data center model will need to adapt to ever-changing business demands.
A big piece of powerful data center management revolves around the utilization of environmental sensors. According to researchers at Gartner:
- Sensors can help prevent overcooling, undercooling, electrostatic discharge, corrosion and short circuits.
- Sensors help organizations to reduce operational costs, defer capital expenditures, improve uptime, and increase capacity for future growth.
- Sensors provide environmental monitoring and alert managers to potential problems like the presence of water, smoke, and open cabinet doors.
- Sensors can save you up to four percent in energy costs for every degree of upward change in the baseline temperature, known as a set point.
There’s a great saying that now directly impacts the data center platform: You can’t manage what you don’t measure.
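As a rough illustration of the four-percent-per-degree figure cited above, the short sketch below estimates cooling savings from raising the set point. The compounding assumption and the dollar figures are hypothetical examples, not numbers from Gartner or Raritan.

```python
def estimated_cooling_savings(annual_cooling_cost, degrees_raised,
                              savings_per_degree=0.04):
    """Rough estimate of savings from raising the cooling set point,
    applying the ~4%-per-degree rule of thumb and compounding the
    reduction for each degree of increase (an assumption)."""
    remaining = annual_cooling_cost * (1 - savings_per_degree) ** degrees_raised
    return annual_cooling_cost - remaining

# Hypothetical example: $500,000 annual cooling bill, set point raised 3 degrees.
print(f"${estimated_cooling_savings(500_000, 3):,.0f} estimated annual savings")
```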
In this eBook from Raritan, we quickly see exactly why the use of intelligent environmental sensors can positively impact data center efficiency. The five key reasons include:
- Save on Cooling by Confidently Raising Data Center Temperatures
- Ensure Uptime by Monitoring Airflow and Air Pressure to and from Racks
- Maintain Cabinet Security with Contact Closure Sensors
- Improve Data Center Uptime by Receiving Environment Alerts
- Make Strategic Decisions on Environmental Designs and Modifications
Visibility into your data center is critical to keeping it healthy. Utilizing environmental sensors gives you a proactive view into exactly how well your data center is operating. Download this whitepaper today to see the diverse nature of environmental sensors, what they can see, and how they can directly improve your data center efficiency.

5:30p | HP Rolls Out Government Flavor of Helion Private Cloud Services
HP announced a new set of cloud infrastructure services targeted at government agencies.
Helion Managed Private Cloud for Public Sector includes hardware, software and ongoing management services. Helion is HP’s new $1 billion cloud services initiative that includes Infrastructure-as-a-Service and Platform-as-a-Service offerings.
The U.S. federal government is a major opportunity for cloud service providers because federal agencies are mandated to use cloud instead of in-house infrastructure whenever possible. HP’s announcement also comes at a time when agencies are faced with a looming deadline for ensuring all cloud service providers they contract with are FedRAMP-compliant. HP is one of only 12 providers that have been FedRAMP-certified.
FedRAMP is a pre-screening program for cloud providers to ensure their infrastructure meets the government’s security requirements. The deadline for all agencies to use only FedRAMP-compliant providers is June 5.
“With a robust hybrid portfolio of enterprise cloud services already in place for commercial clients, our priority has been making sure that they are available to meet government demands,” said Stacy Cleveland, director of global practices for HP’s public sector enterprise services division.
HP’s managed private cloud allows agencies to act as IT brokers by accessing a web-based portal to manage consumption and monitor resources, allowing chargeback of costs to departments and business units. The fully engineered solution provisions a cloud environment in either a client-owned or a third-party data center. The wider Helion initiative includes a roll-out of cloud services across 20 of HP’s own data centers around the world over the next 18 months.
HP is no stranger to federal procurement regulations and red tape, and its Virtual Private Cloud has already received a FedRAMP provisional Authority to Operate (pATO), which allows any agency to rely on HP’s pATO and grant its own ATO without conducting duplicative assessments. The offering meets compliance needs for FedRAMP, FISMA High, HIPAA and the Defense Information Systems Agency Enterprise Cloud Service Broker (DISA ECSB) Impact Level 5.
Amazon’s GovCloud, IBM and Microsoft also have FedRAMP provisional authority, and Oracle earned it just last week.
The Level 5 authorization from DISA allows HP Helion to host sensitive information systems, creating an easy path for the cloud solutions into defense and intelligence agency deals as well.

6:30p | Latisys Beefs Up Security With Partners Alchemy and AlienVault
High-profile data security breaches, such as Target’s infamous holiday-season fiasco and the most recent eBay break-in, bring the issue of security to public attention, but thousands of non-household-name enterprises struggle with the security of their IT infrastructure every day.
Cloud infrastructure services have created an opportunity for service providers to offer security management as part of their proposition to enterprise customers. Most of these providers turn to partners who specialize in security to gain such capabilities.
The latest example is Latisys, an Infrastructure-as-a-Service-oriented provider that has teamed up with Alchemy and AlienVault to beef up its security software suite for clients. Alchemy provides the core security brains, while AlienVault provides real-time threat data, all consolidated under Latisys’ Threat and Compliance Management Services.
Wendy Nather, security research director at 451 Research, said such partnerships were a wise move for companies for which security is not a core competency. “Latisys is really smart to bring on a Managed Security Services Provider security consultancy like Alchemy because it’s very hard and human-resource intensive to build this,” she said.
Making security real-time
Latisys already offers standard firewall, VPN, and intrusion detection and prevention services, and has had a relationship with Alert Logic, a provider of security solutions for cloud and hosting services. However, Autumn Salama, director of solutions management at Latisys, said the new partnership brings in an event correlation engine that was not previously available to the company’s customers.
“There’s a lot of stuff out there that just checks the box,” Salama said. “Security is a lot more than that. The importance of a holistic solution and bringing it all back together and layering the expertise atop of that is important. The customer doesn’t have to do the monitoring if they don’t want to. They’re using industry-recognized technology, and Alchemy is running that stuff for the customer and they can monitor it too.”
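Neither Latisys nor Alchemy has published how the correlation engine works, but the general technique, grouping related events from many sources within a sliding time window and alerting when a threshold is crossed, can be sketched briefly. All names, event shapes and thresholds below are hypothetical.

```python
from collections import defaultdict, deque

# Sketch of event correlation: flag a source that generates too many
# failed-login events across monitored systems within a short window.
WINDOW_SECONDS = 300        # correlate events within a 5-minute window
THRESHOLD = 10              # alert after 10 related events (hypothetical)

events_by_source = defaultdict(deque)   # source IP -> event timestamps

def ingest(event):
    """event is a dict like {'type': ..., 'source_ip': ..., 'ts': ...}."""
    if event["type"] != "failed_login":
        return None
    window = events_by_source[event["source_ip"]]
    window.append(event["ts"])
    # Drop timestamps that have aged out of the correlation window.
    while window and event["ts"] - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= THRESHOLD:
        return {"alert": "possible brute force",
                "source_ip": event["source_ip"],
                "events_in_window": len(window)}
    return None

# Usage: feed normalized events from firewalls, VPN concentrators and servers.
for i in range(12):
    alert = ingest({"type": "failed_login", "source_ip": "203.0.113.7",
                    "ts": 1700000000 + i})
    if alert:
        print(alert)
```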
No good checklist for security
Nather said just about every company was trying to figure out exactly what level of security they needed. “There isn’t a good checklist except for PCI (Payment Card Industry Data Security Standard), so many are turning that into a checklist.
“One bump in the road that service providers and MSSPs understand is that they can only do so much to secure customers. Half of the security is how they use the technology. If they’re not patching, the provider can’t do much about it if they get breached.”