Data Center Knowledge | News and analysis for the data center industry
Tuesday, February 2nd, 2016
1:00p | Three Pillars of Modern Data Center Operations
Modern enterprise data centers are some of the most technically sophisticated business operations on earth. Ironically, they are also often bastions of inefficiency, with equipment utilization well below 10 percent and 30 percent of the servers in those facilities comatose (drawing electricity but performing no useful information services). The operators of these facilities also struggle to keep pace with rapid changes in deployments of computing equipment.
These problems have led to much attention being paid to improving data center management. While almost every enterprise data center has taken steps to improve its operations, virtually all are much less efficient, much more costly, and far less flexible than they could be. Those failings ultimately prevent data centers from delivering maximum business value to the companies that own them.
Well-managed data centers use what I call the three pillars of modern data center operations: tracking, procedures, and physical principles.
Tracking
Running a data center requires accurate real-time measurements of temperature, humidity, and airflow, as well as detailed inventories of equipment characteristics, vintage, and performance. Most Data Center Infrastructure Management (DCIM) tools deliver this information using sensors spread throughout each facility. DCIM software often requires customization to be most effective in any particular application, but it has become much more sophisticated over time.
The most advanced facilities use radio-frequency identification (RFID) technology to tag each piece of IT equipment, physical tags that tie each server to a particular spot in the rack, and over-the-network tracking of equipment status. Whenever equipment is moved, RFID readers help update its status in the central tracking database, and when equipment conditions change, the devices update the central database with the new information over the network.
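To make that concrete, here is a minimal sketch of how an RFID read event might update a central inventory record. The schema, field names, and in-memory store are illustrative assumptions, not any particular DCIM product’s API.

```python
# Minimal sketch of RFID-driven inventory updates (hypothetical schema and names).
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Asset:
    rfid_tag: str          # ID read by the rack-level RFID reader
    model: str
    rack: str              # rack the tag is currently associated with
    slot: int
    last_seen: datetime

# In production this would be the DCIM database; a dict stands in here.
inventory: dict[str, Asset] = {}

def on_rfid_read(tag: str, rack: str, slot: int, model: str = "unknown") -> None:
    """Called whenever a reader sees a tag: create or relocate the asset record."""
    now = datetime.now(timezone.utc)
    asset = inventory.get(tag)
    if asset is None:
        inventory[tag] = Asset(tag, model, rack, slot, now)
    else:
        asset.rack, asset.slot, asset.last_seen = rack, slot, now

# Example: a server is racked in A3, then later moved to B1.
on_rfid_read("TAG-0042", rack="A3", slot=12, model="1U web server")
on_rfid_read("TAG-0042", rack="B1", slot=7)
print(inventory["TAG-0042"])
```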
DCIM software is like the dashboard of a car, which gives information on vehicle speed and engine temperature in real time. Many data center managers mistakenly think that once they have a DCIM tool, they have all they need to manage their facilities, but nothing could be further from the truth. Such tools are necessary (because they offer a detailed picture of the current status of the data center) but they are not sufficient.
Procedures
Because the equipment in data centers is constantly changing, sometimes in unpredictable ways, well-managed facilities need well-defined and empirically grounded procedures for design, deployment, maintenance, and decommissioning of computing and infrastructure equipment. That means procedures based on best practices as defined by Lawrence Berkeley National Laboratory, The Green Grid, Open Compute Project, ITI, the TBM Council, and others.
RFID tracking tools and over-the-network data collection (as described above) make procedures for accurate inventory tracking much easier to implement. DCIM sensor measurements can make it easier to define operational procedures for the data center. In the most interesting case, sensor data can be combined with machine learning algorithms to automate some data center operations, thus simplifying the design of human procedures.
Such procedures are like simple rules that drivers use to maintain safety, like “if you see a pedestrian, slow down” or “turn into the skid.”
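The same kind of rule can be written down directly against DCIM sensor data. Below is a minimal sketch of one such procedure, with an assumed inlet-temperature limit and a hypothetical alerting action standing in for whatever a real operations team would actually do; per-rack state is omitted for brevity.

```python
# Minimal sketch of codifying an operational procedure as a rule over sensor data.
# The threshold and the action are illustrative, not vendor or standards guidance.
from collections import deque

INLET_LIMIT_C = 27.0        # assumed upper bound for rack inlet temperature
CONSECUTIVE_SAMPLES = 3     # require a sustained excursion, not a single spike

recent = deque(maxlen=CONSECUTIVE_SAMPLES)   # state for one rack, kept simple here

def raise_cooling_alert(rack: str, temp_c: float) -> None:
    # Placeholder: in practice this would open a ticket or adjust cooling setpoints.
    print(f"ALERT: {rack} inlet at {temp_c:.1f} C for {CONSECUTIVE_SAMPLES} samples")

def on_sensor_sample(rack: str, inlet_temp_c: float) -> None:
    """Apply a simple 'if hot, act' procedure, analogous to 'if pedestrian, slow down'."""
    recent.append(inlet_temp_c)
    if len(recent) == CONSECUTIVE_SAMPLES and all(t > INLET_LIMIT_C for t in recent):
        raise_cooling_alert(rack, max(recent))

for t in (25.8, 27.4, 27.9, 28.1):
    on_sensor_sample("rack-B1", t)
```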
Physical Principles
The last of the three pillars involves applying knowledge of the physical laws, engineering designs, and technological constraints affecting reliable delivery of power and cooling in the facility. Because data centers are constantly changing, and because of the complexity of air and heat flows, it is essential to apply engineering simulation tools to both data center design and operations. That means taking the information from tracking tools and incorporating it into software that simulates airflow, power distribution, and heat transfer.
The best of these tools rely in part on sophisticated Computational Fluid Dynamics software, extensive libraries of the power and airflow characteristics of thousands of different kinds of IT equipment, and visual analyzers to simply and accurately predict the effects of changes in IT deployments. These computer models need to be calibrated with real measurements from a data center to ensure they accurately characterize the facility’s operations, but once calibrated, they can be used to predict the effects of changes in IT equipment configurations on airflow, temperature, efficiency, reliability, available capacity, and cost without having to actually move or install that equipment.
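Calibration itself can be illustrated with a toy example. The sketch below fits a single bias correction so that model predictions line up with sensor measurements; real CFD calibration adjusts far more than one parameter, so treat this only as a picture of the idea.

```python
# Minimal sketch of calibrating a thermal model against sensor data.
# The one-parameter bias "model" and the numbers are illustrative assumptions.

measured_c  = [24.1, 25.6, 27.3, 26.0]   # rack inlet temps from DCIM sensors
predicted_c = [23.0, 24.8, 26.1, 25.2]   # same points from the uncalibrated model

def rmse(offset: float) -> float:
    """Root-mean-square error after applying a bias correction to the model."""
    return (sum((m - (p + offset)) ** 2 for m, p in zip(measured_c, predicted_c))
            / len(measured_c)) ** 0.5

# Brute-force search over a simple bias correction, a stand-in for real calibration.
best_offset = min((x / 100 for x in range(-300, 301)), key=rmse)
print(f"bias correction: {best_offset:+.2f} C, residual RMSE: {rmse(best_offset):.2f} C")
```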
When properly used, such engineering simulation tools are like the headlights of a car, showing clearly what’s on the road ahead. They show the costs and risks of operational plans and are just as important as careful tracking and appropriate procedures for proper management of data center operations.
The three pillars, taken together, constitute the most reliable means of delivering business value from the data center. No modern data center manager should be without them.
About the author: Jonathan Koomey is a Research Fellow at the Steyer-Taylor Center for Energy Policy and Finance at Stanford University and is one of the leading international experts on the energy use and economics of data centers.
Sign up here for his upcoming online course, called Modernizing Enterprise Data Centers for Fun and Profit, which is starting May 2.
The course teaches you how to turn your data centers into cost-reducing profit centers. It provides a road map for businesses to improve the business performance of information technology (IT) assets, drawing upon real-world experiences from industry-leading companies like eBay and Google. For firms just beginning this journey, it describes concrete steps to get started down the path of higher efficiency, improved business agility, and increased profits from IT.
5:29p | How Data Center Trends Are Forcing a Revisit of the Database
Ravi Mayuram is Senior Vice President of Products and Engineering at Couchbase.
Data centers are like people: no two are alike, especially now. A decade of separating compute, storage, and even networking services from the hardware that runs them has left us with x86 pizza boxes stacked next to, or connected with, 30-year-old mainframes. And why not? Much of the tough work is done by software tools that define precisely how and when hardware is to be used.
From virtual machines to software-defined storage and network functions virtualization, these layers of abstraction fuse hardware components into something greater and easier to control.
That’s a startling change from the early days of data center design, when the industry’s major players poured billions into data centers that were to be the factories of a then-burgeoning digital economy. Their rise gave birth to standards committees that defined how space and power were to be used, ensuring that major suppliers wouldn’t accidentally build a server too tall or too wide to find a home.
Today, in 2016, disaggregated services running on cheap equipment accomplish as much or more than the mighty machines of old. They also occupy less space and form natural connections to external resources accessed in the cloud.
Our approach to infrastructure has morphed in response, and yet databases remain largely unchanged — processing jobs in the same lowest common denominator fashion as they always have and wasting compute and storage resources in the process. Data center operators are handling far too much information for this to go on much longer.
A Data Center Deluge
Oddly, most of the industry is well aware that over-provisioning is a problem. A study last year estimated the value of idle servers — digital sentries standing by, doing nothing — at north of $30 billion. In a similar report from last September, The Wall Street Journal uncovered a facility that had more than 1,000 unused machines powered on and ready for work that would never come. How can we justify such an astounding waste of capital when the need for data center efficiency has never been greater?
Information is moving to and through data centers with unmatched volume and variety, and at a velocity never before seen. Cisco tracks the changes in its Visual Networking Index (VNI). The latest figures put Internet-based traffic at 2 exabytes per day last year, equivalent to 40 percent of the words ever spoken by human beings since the dawn of existence. Cisco sees that total more than doubling, to 5.5 exabytes per day, by 2019.
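As a rough check on what those figures imply, growing from 2 to 5.5 exabytes per day over the four years from 2015 to 2019 works out to roughly 29 percent compound growth per year:

```python
# Implied annual growth rate from the Cisco VNI figures cited above (2015 -> 2019).
start_eb_per_day, end_eb_per_day, years = 2.0, 5.5, 4
cagr = (end_eb_per_day / start_eb_per_day) ** (1 / years) - 1
print(f"implied growth: {cagr:.1%} per year")   # roughly 29% per year
```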
Without the tools for software-defining and deploying data center resources efficiently, we’d have no choice but to rely on brute force over-provisioning to handle all that information without suffering downtime. That we still need sheer muscle for most database services makes these crucial systems a bottleneck. Disaggregation is key to solving the problem.
Scaling to Workload, Not Infrastructure
Databases handle three kinds of functions. Data services are core to the system and define the schema used to store information; index services categorize data for fast retrieval; and query services extract it according to defined parameters. Most systems handle many different types of requests at once.
The difficulty comes in how databases leverage hardware: it’s all blunt force, with requests distributed equally across the infrastructure. There is no accounting for which systems would be better suited to handling an I/O-intensive data service, no mechanism to run a memory-intensive index service on a separate system, and no provision to manage compute-intensive queries in a distinct, optimized cluster of machines. Database platforms don’t enjoy the separation of compute, storage, and network services that brings efficiency to modern data centers. We need to change that; we need systems that can scale database services to different subsystems independently and on demand.
Interestingly, this is as much a problem for the majority of NoSQL databases as it is for all relational systems. Inefficient NoSQL databases may even have it worse because of how often they’re paired with massively distributed infrastructure. With no way to assign queries to different nodes, jobs collide with each other, consuming massive amounts of memory and compute in a bare-knuckles fight for resources. Solving for that is like disaster planning, which is why we still have so much over-provisioning in the data center.
Yet it doesn’t need to be this way. Multi-dimensional database scaling was born from the same spirit that brought us virtualization, software-defined networking, and other disaggregation techniques that have transformed the data center for the better. In its simplest form, multi-dimensional scaling is software-defined workload optimization wherein administrators assemble the compute, memory, and storage needed for their workload characteristics. Systems are then optimally provisioned to avoid idling while maintaining the elasticity required to handle spikes when they occur.
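A minimal sketch of that idea follows, using hypothetical node-pool shapes and names; it is not any vendor’s actual configuration API, only an illustration of sizing each service class to its dominant resource and scaling it independently.

```python
# Minimal sketch of multi-dimensional scaling: assign each database service to a
# node pool shaped for its dominant resource. Pool shapes and names are hypothetical.
from dataclasses import dataclass

@dataclass
class NodePool:
    name: str
    nodes: int
    vcpus: int       # per node
    memory_gb: int   # per node
    ssd: bool

CLUSTER = {
    "data":  NodePool("data",  nodes=6, vcpus=8,  memory_gb=32,  ssd=True),   # I/O heavy
    "index": NodePool("index", nodes=2, vcpus=8,  memory_gb=128, ssd=True),   # memory heavy
    "query": NodePool("query", nodes=3, vcpus=32, memory_gb=32,  ssd=False),  # compute heavy
}

def scale(service: str, extra_nodes: int) -> None:
    """Scale one service independently, without touching the other pools."""
    CLUSTER[service].nodes += extra_nodes

scale("query", 2)   # absorb a reporting spike without re-sizing data or index nodes
print({name: pool.nodes for name, pool in CLUSTER.items()})
```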
Think of it as scaling according to the needs of the workload — and optimizing for every piece of available hardware in the process — rather than designing to the lowest-common-denominator limits of the infrastructure. In a world that’s awash with data, that’s a change that can’t come soon enough.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
7:04p | Google Open Sources Data Center Load Balancing Software
By The VAR Guy
Google has open-sourced another internal software project. This one, called Seesaw, is a load-balancing platform that is based on Linux. It’s now available under an Apache 2.0 license.
To be sure, load balancing does not top most people’s lists of the most romantic or interesting IT solutions. But it’s an essential component of modern networking. Load-balancing software helps servers exchange information efficiently and make optimal use of the data pipelines available to them — while also preventing network overloads or other potential problems.
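As a point of reference, the core idea can be sketched in a few lines. The least-connections policy below is a generic illustration of what load-balancing software does, not a description of Seesaw’s design or API.

```python
# Generic least-connections load balancer sketch; illustrative only, not Seesaw.
class LeastConnectionsBalancer:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}   # open connections per backend

    def pick(self) -> str:
        """Send the next request to the backend with the fewest open connections."""
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        """Record that a connection to this backend has closed."""
        self.active[backend] -= 1

lb = LeastConnectionsBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(5)])   # requests spread across the least-busy backends
```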
Read more: Why Should Data Center Operators Care about Open Source?
Seesaw, as Google explained, was developed internally to meet the company’s load-balancing needs, after engineers determined that no good existing solution was available. It’s written using Google’s Go programming language.
Google says Seesaw was designed to provide easy management and the ability to automate configuration changes. That makes it ideal for large enterprises that need a flexible load-balancing solution.
Plenty of load-balancing platforms are already available on the market, but many of them are linked to particular hardware. By providing an enterprise-grade, vendor-neutral package for load-balancing, Google has open-sourced another networking and data center solution that could see widespread use throughout the enterprise world.
This first ran at http://thevarguy.com/open-source-application-software-companies/google-open-sources-seesaw-network-load-balancing
7:15p | Email Gets Customers Hooked on Cloud – at Least in Microsoft’s Case
By The WHIR
While not a major talking point in Microsoft’s fiscal second-quarter earnings report and conference call last week, email services seem to be one of the major drivers of cloud services for the company.
According to the report, server products and cloud services revenue grew 10 percent, at least partially due to customers already relying on Microsoft as their email provider.
As Reuters’ Sarah McBride wrote, “For companies already relying on Microsoft Exchange and Outlook for sending and receiving email, information technology managers say, turning to the same company to handle that data in the cloud seems like a logical move.” Examples include the University of Wisconsin, Madison, whose transition to cloud services started with email and continued to Office 365 because of its existing investment.
Outlook has enormous traction. Some 30 million iOS and Android active devices run it, paving the way for Exchange email.
Being able to provide different services provides natural avenues for customers to expand the extent of their services. “If you’re an Office 365 customer, you have data in Office 365, you want to build applications that tap into that data, then you use Azure,” Microsoft CEO and director Satya Nadella said in the conference call Thursday. “If you deploy Exchange online, you have Azure AD and then the natural expansion from there is the full EMS suite with device management for all your mobile devices and also Advance Threat Protection and what have you.”
Microsoft’s revenue for its public cloud, Azure, grew 140 percent in the quarter, according to Nadella, and those customers were likely already consumers of the Microsoft ecosystem. He said, “Azure customers who also purchase Office 365 consume eight times more Azure than other customers. More than 70 percent of the Fortune 500 have at least two different Microsoft cloud offerings.”
Get customers signed up for one cloud service, and others are bound to follow.
This first ran at http://www.thewhir.com/web-hosting-news/email-gets-customers-hooked-on-cloud-offerings-at-least-in-microsofts-case
7:50p | HPC Virtualization: Three Key Considerations
In the past, many would say that you simply can’t virtualize high-performance computing workloads. They require dedicated sets of resources, the workloads themselves are very heavy, and many architectures never took virtualization into consideration.
Today, that view is quite different. Virtualization technologies allow HPC and other workloads to use resources more efficiently and to scale further by bursting into the cloud. For example, VMs are a convenient way to package and deploy scientific applications across heterogeneous systems. Applications can be packaged with their required libraries and support programs, including (perhaps) a distributed file system that would otherwise be difficult or impossible to install without special privileges.
From a vHPC perspective, there are a number of key aspects to consider when creating a virtual HPC architecture.
The Hypervisor
The modern hypervisor has come a long way. So, if you’re in the HPC world and are still skeptical about creating a vHPC cluster, consider the following. It’s important to explain what it means to run compute-intensive code on a modern hypervisor. Thanks to paravirtualization and related optimizations, such a workload runs essentially on the bare-metal architecture. Yes, there is another level in the memory hierarchy, but virtualization technologies have shown that this generally does not introduce performance issues, given hardware support from chip vendors. In fact, the continuing development of hardware acceleration techniques for virtualization is another major point to consider:
- Advancements around CPU virtualization
- Optimized memory virtualization
- Modern I/O virtualization techniques
These all help to reduce the overheads for applications running virtualized HPC workloads.
Resource and Data Control
Once workloads move to a virtualized environment, VM abstraction offers additional benefits beyond the ability to bring your own software onto the cluster. Separating workloads into multiple VMs can add value as well:
- For organizations centralizing multiple groups onto a cluster or for teams with per-project data security issues (for example in a Life Sciences environment where access to genomic data may need to be controlled and restricted to specific researchers), VM abstraction offers security separation that isn’t available in traditional HPC environments. In those environments, the batch scheduler schedules jobs based on available compute resources, placing jobs from different teams within the same OS instance. Using multiple batch queues for separation results in lower cluster utilization and therefore isn’t a good approach.
- In bare-metal environments, running multiple users’ jobs within the same OS instance can result in more than just data leakage. If jobs disrupt the OS (fill /tmp, crash daemons, etc.), those failures can affect other, unrelated jobs. VM abstraction can protect a user’s jobs from failures caused by other workloads.
- Visibility into the resource layer is critical. Not only can you create cost-based scenarios for what end users need to operate, you also enable better security and data controls in your environment. For example, being able to archive a VM is an easy way of ensuring that the exact software environment used for a given HPC workload is saved. Similarly, for academic and other institutions concerned about the reproducibility of their scientific research, or about subsequent auditing of research results, being able to save and later restore the exact software environment used during the research can be very important.
The Cloud
Virtualization and cloud computing play a big part in creating a powerful vHPC cluster. Imagine an ecosystem where commercial, government, and pharma platforms can dynamically scale their HPC workloads into a controlled cloud environment. Using these technologies, organizations are able to create private cloud environments with self-service portals. These portals then allow entire research groups to effectively check out a pre-configured vHPC cluster that has been sized precisely to their requirements. Now, imagine being able to scale this private cloud architecture into a hybrid cloud model.
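To make the idea of a cluster “sized precisely to their requirements” concrete, here is a minimal sizing sketch; the node shape and the sizing rule are illustrative assumptions, not any real portal’s logic.

```python
# Minimal sketch of sizing a vHPC cluster request from workload requirements.
# The virtual node shape and the sizing rule are illustrative assumptions.
import math

NODE_VCPUS, NODE_MEMORY_GB = 32, 256   # assumed shape of one virtual compute node

def size_cluster(total_cores: int, mem_per_core_gb: float) -> dict:
    """Return the smallest node count that satisfies both CPU and memory needs."""
    by_cpu = math.ceil(total_cores / NODE_VCPUS)
    by_mem = math.ceil(total_cores * mem_per_core_gb / NODE_MEMORY_GB)
    nodes = max(by_cpu, by_mem)        # whichever resource is the binding constraint
    return {"nodes": nodes, "vcpus": nodes * NODE_VCPUS, "memory_gb": nodes * NODE_MEMORY_GB}

# Example: a research group asking for 512 cores with 8 GB of memory per core.
print(size_cluster(total_cores=512, mem_per_core_gb=8))
```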
Is it time to virtualize your HPC workloads? Maybe. The power of the cloud, the ability to replicate data, and the need to process ever more information make virtualization a very real option for HPC workloads. Remember, the nature of information and the ability to quantify it quickly will only continue to evolve. The digital world is ever-expanding, and more parallel applications are being deployed to compute very complex processes. Fears around overhead, system stability, and even management should be put to rest, as virtualization technologies have come to a point where HPC systems can directly benefit from this type of architecture. Remember, the idea is to make your systems run more optimally and help your IT functions better align with the goals of your organization.
9:37p | LinkedIn Designs Own 100G Data Center Switch
Following the example set by other web-scale data center operators, such as Google and Facebook, the infrastructure engineering team behind the professional social network LinkedIn has designed its own data center networking switch to replace networking technology supplied by the major vendors, saying the off-the-shelf products were inadequate for the company’s needs.
LinkedIn has successfully tested its first-generation switch, which consists of contract design manufacturer hardware, merchant silicon, and the company’s home-baked Linux-based software, and plans to deploy it at scale for the first time in its upcoming data center in Oregon, in an Infomart Datacenters facility, which will also be the first site to use LinkedIn’s own data center design.
LinkedIn’s data center switch, called Pigeon, is a 3.2Tbps (32 by 100G) platform that can be used as a leaf or a spine switch. The architecture is based on the Tomahawk 3.2Tbps merchant silicon. It runs a Linux OS.

Base architecture of Pigeon, LinkedIn’s first 100G data center switch (Image: LinkedIn)
Google was the first web-scale, or cloud, data center operator to create its own switches. Facebook got into designing its own networking technology several years ago.
After introducing Wedge, its first 40G top-of-rack data center switch, in 2014, Facebook rolled out its own data center switching fabric and an aggregation switch, called Six Pack. Last year, the company announced it had designed its first 100G switch, which it plans to start deploying at scale in the near future.
Read more: With its 100-Gig Switch, Facebook Sees Disaggregation in Action
Companies like Facebook, Google, Microsoft, and Amazon have built global data center infrastructure of unprecedented scale, finding along the way that the technology available on the market often doesn’t meet their needs and creating solutions in-house that work better for them.
LinkedIn’s user base has been growing rapidly, and the company is starting to face similar challenges. It went from 55 million members at the end of 2009 to nearly 400 million as of the third quarter of last year, according to Statista. Its 10MW data center lease with Infomart earned it a spot on the list of 10 companies that leased the most data center space last year.
Read more: Who Leased the Most Data Center Space in 2015?
 Infomart’s Portland data center in Hillsboro, Oregon (Photo: Infomart Data Centers)
Its network latency problems started three years ago, and after spending some time trying to address them, the engineers realized that they would have to build a data center networking solution from scratch.
“We were not scaling our network infrastructure to meet the demands of our applications – high speed, high availability, and fast deployments,” Zaid Ali Kahn, LinkedIn’s director of global infrastructure architecture and strategy, said in a blog post. “We knew we needed greater control of features at the network layer, but we hit a roadblock on figuring out how.”
The team traced its latency problems to microbursts of traffic. These were difficult to detect because commercial switch vendors don’t expose the buffers inside third-party merchant silicon chips. Visibility into merchant silicon became one of the design goals for Pigeon.
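Assuming a switch that does expose per-port buffer occupancy, flagging microbursts can be as simple as the sketch below; the thresholds and sample data are illustrative, not LinkedIn’s actual method.

```python
# Sketch of flagging microbursts from per-port buffer-occupancy samples, assuming
# the switch exposes that telemetry (the visibility goal described above).
def find_microbursts(samples, capacity_bytes, threshold=0.8, max_len=3):
    """Return (start, end) index pairs where occupancy briefly exceeds the threshold."""
    bursts, start = [], None
    for i, occ in enumerate(samples):
        hot = occ > threshold * capacity_bytes
        if hot and start is None:
            start = i
        elif not hot and start is not None:
            if i - start <= max_len:          # short-lived spike = microburst
                bursts.append((start, i - 1))
            start = None
    return bursts

occupancy = [2_000, 3_000, 14_500, 15_800, 2_500, 2_200, 15_200, 2_100]
print(find_microbursts(occupancy, capacity_bytes=16_000))   # [(2, 3), (6, 6)]
```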
LinkedIn’s data center network vendors were also slow to address software bugs and built features into their products the company didn’t need. Some of those features also had bugs LinkedIn had to address.
The engineers also wanted to use Linux-based automation tools, such as Puppet and Chef, and more modern monitoring and logging software. Finally, it was simply too expensive to scale switching software licenses and support. All these concerns echo the reasons other web-scale data center operators have given for turning to custom technology, designed in-house.