Data Center Knowledge | News and analysis for the data center industry
Monday, June 22nd, 2015
12:00p
DCIM Implementation – the Challenges
This is Part 4 of our five-part series on the many decisions an organization needs to make as it embarks on the DCIM purchase, implementation, and operation journey. The series is produced for the Data Center Knowledge DCIM InfoCenter.
In Part 1 we gave an overview of the promises, the challenges, and the politics of DCIM. Read Part 1 here.
In Part 2 we described the key considerations an organization should keep in mind before starting the process of selecting a DCIM solution. Read Part 2 here.
In Part 3 we weighed DCIM benefits against its direct, indirect, and hidden costs. Read Part 3 here.
The first three parts of this series examined vendor promises, purchasing guidelines, and potential benefits of DCIM. However, while it all may look good on the whiteboard, actual implementation may not be quite as simple as the vendor’s sales teams would suggest. Existing facilities, and especially older ones, tend to have lower energy efficiency and also far less energy monitoring. In this part we will review some of the challenges of retrofitting an operating data center, as well as some of the considerations for incorporating a DCIM system into a new design.
Facility Systems Instrumentation
Virtually all data centers have Building Management Systems (BMS) to supervise the operation of primary facility components. These generally monitor the status of the electrical power chain and its subsystems, including utility feeds, switchboards, automatic transfer switches, generators, UPS units, and downstream power distribution panels. They are also connected to cooling system components. However, in many cases, BMS systems are not very granular in the amount and type of data they collect. In some cases, the information is limited to very basic device status (on-off) and alarm conditions.
Therefore, these sites are prime candidates for reaping the potential benefits of DCIM. In order for DCIM systems to gather and analyze energy usage information, they require remotely readable energy metering. Unfortunately, some data centers have no real-time utility energy metering at all and can only estimate their total energy usage from the monthly utility bill. While this has been the de facto practice for some sites in the past, it does not provide enough discrete data (or sometimes any data) about where the energy is used or about facility efficiency. More recently, DCIM (and some BMS) systems have been designed to measure and track far more granular information from all of these systems. However, the typical bottleneck is the lack of energy meters in the power panels of these older facilities, or the lack of remotely pollable internal temperature and other sensors within older cooling equipment such as CRAC/CRAH units or chillers.
Retrofitting energy metering and environmental sensors is one of the major impediments to DCIM adoption. This is especially true in sites with lower levels of power and cooling redundancy. Metering requires the installation of current transformers (CTs) to measure current and potential transformers (PTs) to measure voltage. Although “snap-on” CTs do not require disconnecting a conductor to install, OSHA has more recently restricted so-called “hot work” on energized panels, which may require shutting down some systems to perform the electrical work safely. And of course, in the mission-critical data center world, “shutdown” is simply not in the vernacular. So, in addition to securing funding, internal support, and resources for a DCIM project, this type of potentially disruptive retrofit work requires management approval and cooperation between the facility and IT domains, an inherent bottleneck in many organizations.
Basic Facility Monitoring: Start With PUE
At its most elementary level, a DCIM system should display real-time data and historic trends and provide annualized reporting of Power Usage Effectiveness (PUE). This involves installing energy metering hardware at the point of utility handoff to the facility and, at a minimum, also collecting IT energy usage (typically at the UPS output). However, for maximum benefit, other facilities-related equipment (chillers, CRAH/CRACs, pumps, cooling towers, etc.) should have energy metering and environmental monitoring sensors installed. This allows DCIM to provide in-depth analysis and optimization of cooling infrastructure performance, as well as early failure warnings and predictive maintenance functions.
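To make the arithmetic concrete, here is a minimal Python sketch of the PUE calculation a DCIM system performs from two meter readings: total facility energy at the utility handoff and IT energy at the UPS output. The meter values and function name are illustrative assumptions, not any particular product’s API.

    # Minimal sketch: deriving PUE from two energy meter readings.
    # The values and names below are hypothetical; a real DCIM system
    # polls these meters continuously and trends the result over time.

    def compute_pue(total_facility_kwh, it_load_kwh):
        """PUE = total facility energy / IT equipment energy."""
        if it_load_kwh <= 0:
            raise ValueError("IT load must be a positive number")
        return total_facility_kwh / it_load_kwh

    # Example: 1,500 kWh at the utility handoff and 900 kWh at the UPS
    # output over the same interval yields a PUE of about 1.67.
    print(round(compute_pue(total_facility_kwh=1500.0, it_load_kwh=900.0), 2))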
Whitespace: IT Rack-Level Power Monitoring
While metering total IT energy at the UPS output is the simplest and most common way to derive PUE readings, it does not provide any insight into how IT energy is used. This is a key function necessary to fulfill the promised holistic view of the overall data center, not just the facility. However, compared to the facility equipment, the number of racks (and IT devices), and therefore the number of required sensors, is far greater. The two areas that have received the most attention at the rack level are power/energy metering and environmental sensors. The two most common places to measure rack-level power/energy are at the floor-level PDU (with branch circuit monitoring) or at metered PDUs within the rack (intelligent power strips, some of which can even meter per outlet to track energy used by each IT device).
From a retrofit perspective, if the floor-level PDU is not already equipped with branch-circuit current monitoring, adding CTs to each individual cable feeding the racks is subject to the same “hot-work” restrictions as any other electrical work, another impediment to implementation. However, another method of measuring rack-level IT equipment power, used for many years, is the installation of metered rack power distribution units (rack power strips). This normally avoids any hot work, since the rack PDUs plug into existing receptacles. While installing a rack PDU does require briefly disconnecting the IT equipment to replace a non-metered power strip, it can be far less disruptive than the shutdown of a floor-level PDU, since it can be done one rack at a time (and if the IT hardware is equipped with dual power supplies, may not require shutting down the IT equipment at all). While this is also true for A-B redundant floor-level PDUs, some people are more hesitant to do so, in case some servers do not have their dual-feed A-B power supply cords correctly plugged in to the matching A-B PDUs.
The rack-level PDU also commonly uses TCP/IP (SNMP), so it can connect via existing cabling and the existing network. However, while this avoids the need to install specialized cabling to each rack, it is not without cost. Network cabling positions are an IT resource, as are network ports on an expensive production switch. The most cost-effective option may be to add a low-cost 48-port switch for each row to create a dedicated network, which can also be isolated for additional security.
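Because most metered rack PDUs expose their readings over SNMP, a DCIM collector can poll them across that dedicated network. Below is a minimal sketch using the Python pysnmp library; the PDU address, community string, and OID are placeholders, since the actual object identifiers come from each vendor’s MIB.

    # Minimal sketch: polling a metered rack PDU over SNMP with pysnmp.
    # The address, community string, and OID are placeholders; real
    # object identifiers come from the PDU vendor's MIB.
    from pysnmp.hlapi import (
        getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
        ContextData, ObjectType, ObjectIdentity,
    )

    PDU_ADDRESS = "10.0.0.50"             # hypothetical PDU on the dedicated network
    COMMUNITY = "public"                  # read-only community string (placeholder)
    POWER_OID = "1.3.6.1.4.1.99999.1.1"   # placeholder OID for a power reading

    error_indication, error_status, error_index, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData(COMMUNITY, mpModel=1),     # SNMP v2c
            UdpTransportTarget((PDU_ADDRESS, 161)),
            ContextData(),
            ObjectType(ObjectIdentity(POWER_OID)),
        )
    )

    if error_indication:
        print("SNMP error:", error_indication)
    elif error_status:
        print("SNMP error:", error_status.prettyPrint())
    else:
        for var_bind in var_binds:
            print(" = ".join(x.prettyPrint() for x in var_bind))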
1:00p
Portworx Raises $8.5M to Solve Docker Storage Headaches
Aiming to address one of the biggest pain points of running applications in Docker containers in production, a startup called Portworx came out of stealth today with a technology that automates provisioning of storage resources for containerized applications.
The startup has raised $8.5 million from the venture capital firm Mayfield Fund, with additional investment from Michael Dell, founder and CEO of the IT giant. It made the announcement in conjunction with this week’s DockerCon in San Francisco.
While developers like the way Docker containers simplify the application development lifecycle, from coding to staging to production in the data center, provisioning infrastructure for multi-container applications remains a complex, tedious process that often leads to unintended consequences, Gou Rao, Portworx co-founder and CTO, said.
“Containers are just not production-ready,” he said. And the problems are predominantly with storage and networking.
Portworx is not the first startup for Rao and his co-founder Murli Thirumale, the company’s CEO. The two were also co-founders of Ocarina Networks, a storage-optimization company Dell acquired in 2010. Prior to that they founded Net6, an application-delivery solutions firm they sold to Citrix in 2007.
Data center infrastructure is generally not container-aware, which is what Portworx is trying to change. Today, infrastructure provisioning is done separately, outside of the application. You specify what infrastructure resources the application will need, provision them, and then deploy the application.
Portworx’s software solution caters natively to the infrastructure needs of the containerized application while understanding “data center genetics,” Rao explained. By data center genetics he means software that knows which servers have more flash, which have more SATA drives, and which are dense with CPU cores or memory.
The company’s software installs on commodity servers. It clusters storage and carves out virtual storage volumes for a container wherever it is scheduled to run.
As the container writes data to the volume, the software replicates the data synchronously for high availability. It also provides storage functionality like snapshots and cloning.
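To illustrate the general idea of synchronous replication, here is a conceptual Python sketch in which a write is acknowledged only after every replica confirms it; the classes and volume layout are purely hypothetical, not Portworx’s actual code or API.

    # Conceptual sketch of synchronous replication: the volume acknowledges
    # a write only after every replica has stored it. This is a generic
    # illustration, not Portworx's actual implementation.

    class Replica:
        def __init__(self, name):
            self.name = name
            self.blocks = {}

        def write(self, offset, data):
            self.blocks[offset] = data    # stand-in for a durable write
            return True

    class ReplicatedVolume:
        def __init__(self, replicas):
            self.replicas = replicas

        def write(self, offset, data):
            # Synchronous: every replica must confirm before we acknowledge.
            if not all(r.write(offset, data) for r in self.replicas):
                raise IOError("a replica failed; write not acknowledged")
            return "ack"

    volume = ReplicatedVolume([Replica("node-a"), Replica("node-b")])
    print(volume.write(0, b"container data"))    # -> ack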
“It automates storage provisioning, and it actually provides storage implementation as well,” Rao said. “Our focus is on VMware type of experience to the customer, as opposed to do-it-yourself OpenStack kind of experience.”
Portworx plans to have a “developer playground” for its solution available in July. The company is also working with several beta customers. It plans the first full release over the next six to nine months.
The company will use a subscription model, and the solution will be available both for on-premises deployments on commodity bare-metal x86 hardware and on Amazon’s EC2 cloud.
3:00p
HP and CommScope Partner to Blend DCIM and ITSM
CommScope has partnered with HP to add HP’s Converged Management Consulting Services into the iTRACS data center infrastructure management (DCIM) platform.
The two companies have had an ongoing relationship, and this latest venture formalizes it. HP has partnered with a handful of vendors to couple its expertise in facilities operations with best-of-breed tools and DCIM suites.
HP’s Converged Management is a framework for integrating DCIM with IT service management (ITSM). It’s about providing the bigger picture regarding data center capacity which, in turn, can help in making better business decisions. HP’s consulting services include workshops, roadmap creation, design management for architecture across IT and facilities, service management, and implementation.
HP is both a large customer and large partner, said Jay Williams, VP of sales and new market development for CommScope. “One of the key things that this relationship does is continue to enhance the relationship corporately,” he said. “For HP, it gives them a next level of access to a customer, a next level of support to existing customers and more target customers.”
From an iTRACS perspective, the deal helps the company continue to expand what it wants to do globally, said William Bloomstein, director of strategic and solutions marketing. It enhances the joint effort of both companies, with iTRACS acting as DCIM provider and HP offering management consulting services to existing and future customers.
Bloomstein said iTRACS is seeing a lot of activity across a range of customers, from small companies to large global brands. “We’ve got an avenue with HP where we can fulfill a multitude of needs,” he said. “It continues the advantage that CommScope has in place, with dedicated resources around the world to drive better opportunities.”
Enhanced HP infrastructure management solutions can optimize the capacity, availability and efficiency of infrastructure. “Customers will be able to get a wider variety of solutions through a single vendor, streamlining a lot of things from a products and usage perspective,” said Bloomstein. “Having our [DCIM] solution deployed through HP will give customers visibility into new solutions, and the ability to take existing solutions and enhance them.”
DCIM is an ongoing process for customers. Deployments aren’t static, in that customers often extend into measuring other things and finding new actionable insights. “It’s not just about the data center, it’s about the relationship between data center and the business,” said Bloomstein. “It’s really an easy-to-do business because the purchase process and deployment get streamlined. There are a lot of synergies.”
3:30p
Exploding Pinto Better Than an IT Hero in Your Data Center
Steve Francis is Founder and Chief Product Officer of LogicMonitor.
In a recent issue of The New Yorker, Malcolm Gladwell published an interesting article called “The Engineer’s Lament”. Gladwell revisits the 1970s Ford incident, in which the top-selling Pinto exploded, culminating in the indictment of the Ford Motor Company for reckless homicide. The author discusses the variety of perceptions that arose after the accident and how the viewpoint of the public differed drastically from those of Ford engineers and the National Highway Traffic Safety Administration (NHTSA).
The public saw an isolated “worst-case outcome” that resulted in catastrophic deaths, believing that engineering adaptations could have reduced the risk of these events.
The engineers saw cars that were performing within specifications. The Pinto’s share of fatal fires (1.9 percent) was exactly the same as its share of the cars on the road (1.9 percent). Its construction, design, and incident rates were all very similar to its competitors’: marginally better in some areas, marginally worse in others.
So, why did the Pinto become the poster car for flawed design? I suggest it was because people were looking to make a name for themselves – to become heroes. Journalists, attorneys and politicians heard about the edge cases (such as a Pinto exploding into flames when hit by a van at 50 miles per hour), then investigated, found more edge cases, and decided that something must be done. The result, though well meaning, actually diverted resources from an approach that could have had a far greater impact on automobile safety, and would probably have had zero impact on the cases that triggered the publicity.
So what does this have to do with your IT infrastructure?
This scenario illustrates a trap that even the best executives can fall into: focusing on heroic actions instead of best engineering practices. We all want to reward and recognize team members that go above and beyond expectations, but it’s sometimes hard to differentiate between the good and not-so-good: those activities that are heroic because of an issue that could not have been foreseen, and those that are heroic-seeming reactions that actually divert resources from more meaningful work.
Here’s an example of the latter. After I left a company to move elsewhere, I heard about a staff member who used to report to me being commended at an all-company meeting. This person drove 90 minutes in the middle of the night to reboot a misbehaving server in the data center and recovered from the resulting outage. While it’s great that he was willing to do that, it is not at all heroic. First, the data center had remote-hands capability (staff that can be used for exactly these kinds of tasks). Why not call them and have the server rebooted in five minutes (thus reducing the duration of the outage by 95 percent)? Why wasn’t every server reachable out-of-band, by ILOM (Integrated Lights Out Management) cards or console, or able to be hard-cycled by managed power strips? Under those circumstances, the staff member could have rebooted the server from the comfort of his own bedroom. Lastly, why did the failure of a single server cause an outage in the first place? That’s another issue that should be investigated.
Another example is when IT administrators perform a lot of work to achieve a big result that has no impact on the business. For example, tuning kernel IO schedulers and reducing logged messages may improve CPU efficiency by a few percentage points. However, if all the systems use less than 50 percent of CPU, but storage latency is high and no one has tuned mount options, then the heroic work to tune CPU was a misplaced effort, regardless of how much it improved things.
It’s far more valuable for a company to avoid situations that require heroics in the first place by making sure the IT infrastructure works. Since not all projects are worth unlimited budgets, it’s not realistic to plan for 100 percent uptime for everything. Instead, focus on the most common causes of service disruption, and devise reasonable plans for dealing with them.
If the budget and other constraints given to an IT department are limited, do not expect every application design to tolerate rare but catastrophic events, such as someone hitting the Emergency Power Off button in the data center. Just as importantly, before lauding someone as a hero, understand whether the problem addressed was something that basic engineering would have already solved, or something that a rational assessment of priorities would have argued should not be done at all.
IT operations is a team sport, and the best teams will be adding value regularly – not providing opportunities for heroism.
5:00p
New IBM Tools Aim to Prep Docker Containers for Enterprise
IBM today unveiled technologies and services designed to make Docker containers robust enough to run even the most mission-critical of enterprise applications. Big Blue made the announcement in conjunction with the second annual DockerCon conference that kicks off in San Francisco Monday.
Specifically, IBM is making available elastic scaling and auto-recovery tools, private overlays, load balancing, and automated routing capabilities. It is also adding support for Docker containers to Active Deploy to make sure there is no downtime when updating Docker applications, as well as Docker support in its log analytics, performance monitoring, and other IBM life-cycle management tools for IT operations.
Finally, IBM is also adding support for persistent storage, Docker image and vulnerability scanning, and the ability to access a variety of IBM Bluemix services running on the company’s implementation of the open source Cloud Foundry Platform-as-a-service environment. IBM will also resell the Docker Trusted Registry, an implementation of Docker Hub that runs on premise.
Angel Diaz, vice president of cloud architecture and technology at IBM, said IBM is providing all the traditional operations tools that IT organizations need to deploy Docker containers on physical servers without having to rely on virtual machine software in between to provide a framework for IT management and security.
“Layering containers on top of virtual machines is like adding more bloat to something that is already pretty bloated,” said Diaz. “The real value comes when you run containers on bare metal servers.”
Diaz said Docker containers not only consume less memory than virtual machines when running on bare-metal servers, but the utilization rates of those servers are also orders of magnitude better, resulting in fewer servers deployed to support any given application workload.
IBM containers, he added, essentially provide a foundation for what IBM sees as a Cloud 2.0 movement. The first phase of the cloud was primarily about saving money. The second phase is more about enabling developers to more rapidly create applications that add business value in a way that IT operations teams can easily and consistently deploy with a minimal amount of friction.
Ultimately, IBM envisions an IT world where Docker containers can move seamlessly between open, heterogeneous cloud computing environments. But before any of that can happen, Diaz said, IT operations teams clearly need access to the DevOps tooling required to both manage and secure Docker containers that will soon be running at unprecedented levels of scale across the entire enterprise.
5:00p
Docker Intros Container SDN to Automate Network Provisioning
The amount of manual labor required to specify the way Docker containers in a multi-container application interconnect and how those connections use the physical network infrastructure underneath has been one of the largest barriers to adoption of the startup’s technology.
This is why a new software-defined network for containers is front and center in the latest release of the platform Docker announced this morning at its sold-out second annual DockerCon event in San Francisco, the startup’s home base.
The big addition in Docker 1.7 is an “SDN stack that takes multi-host networking all the way down to the container,” David Messina, the company’s VP of marketing, said. “Our goal is to move distributed applications forward.”
Essentially, it automates network provisioning for an application that consists of multiple Docker containers running on multiple host servers in a data center or multiple VMs in a cloud. It creates an automated virtual network topology where containers identify each other through a domain name system (DNS) and communicate over an IP infrastructure.
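From an application’s point of view, that DNS-based discovery simply means connecting to a peer container by name. The short Python sketch below assumes a hypothetical service named "db" listening on port 5432 on the same container network.

    # Minimal sketch: reaching a peer container by service name. The name
    # "db" and port 5432 are hypothetical; on a multi-host container
    # network the name resolves via DNS to the peer container's address.
    import socket

    def connect_to_service(name="db", port=5432, timeout=5.0):
        ip = socket.gethostbyname(name)   # DNS lookup on the container network
        sock = socket.create_connection((ip, port), timeout=timeout)
        print("connected to %s at %s:%d" % (name, ip, port))
        return sock

    if __name__ == "__main__":
        connect_to_service().close()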
Docker gained the container SDN technology through its acquisition of a startup called SocketPlane in March. SocketPlane’s open source networking technology, built on the Open vSwitch virtual switch, integrates with Docker’s container management platform.
The SDN code is part of Docker Engine, the Docker runtime. All that’s required from the developer is to define the container images the application uses and relationships between them using the company’s Compose tool.
Swarm, the server clustering part of the platform, schedules the containers to run on different hosts. Through APIs, Swarm calls into the networking stack to define the network topology.
The Docker platform doesn’t configure the network itself. The company is partnering with numerous vendors to make their solutions compatible with the container SDN.
Initial partners that have created Docker networking plugins include Cisco, VMware, Midokura, Nuage Networks, Microsoft, Weaveworks, and Calico.
Docker has also made some updates to the orchestration tools Compose and Swarm. Aiming to give operations staff more flexibility with the way they deploy Docker applications, it has added the ability to replace the backend piece of Swarm with Mesosphere, the popular clustering system. This capability is currently in beta.
Another option that’s coming in the near future is the ability to replace Swarm’s backend with Amazon Web Services’ EC2 Container Service, or ECS.
5:00p
Docker, CoreOS, Others Form Vendor-Agnostic Linux Container Project
A potential war for control of the future direction of Linux containers appears to have been averted following an agreement between Docker and most of the leading players driving the development of containers to create a standard that will enable images to be shared across multiple container formats.
Vendors backing the new Open Container Project standard, to be overseen by the Linux Foundation, include Amazon Web Services, CoreOS, Docker, Google, Microsoft, HP, IBM, Intel, Red Hat, and VMware, among others. Investment banking giant Goldman Sachs announced its support for the initiative as well.
Announced today at the DockerCon conference in San Francisco, the letter of intent signed by all the major parties represents a major advance in the sense that OCP isolates end users from potential battles between vendors over whether Docker containers or the Rocket containers created by CoreOS are the best way forward, CoreOS CEO Alex Polvi said.
Specifically, OCP has pledged that the standard will not be bound to higher-level constructs such as a particular client or orchestration stack, will not be tightly associated with any particular commercial vendor or project, and will be portable across a wide variety of operating systems, hardware, CPU architectures, and clouds.
The OCP image format will be backwards compatible with both the Docker image format and appc, the CoreOS-created format. OCP will also make an effort to harmonize the multiple Linux container projects that already exist. For example, the maintainers of Docker’s libcontainer project will become the lead maintainers for OCP, joined by two prominent maintainers of the appc project, which CoreOS recently spun off as an independent entity.
“Users will be able to share images across container formats without having to worry about getting locked into a specific vendor,” Polvi said. “This is a win-win for everybody.”
Scott Johnston, senior vice president of product for Docker, said this approach will enable the industry to innovate in a way that doesn’t wind up tearing the IT house asunder.
“It’s really about fostering a community,” Johnston said. “There’s no need to fragment the industry.”
However, CoreOS will continue to make its case that Rocket containers are more secure and easier to compose inside of multiple management frameworks than Docker containers, Polvi said.
At its core, OCP incorporates both draft specifications and existing Docker code around an image format and container runtime, along with concepts from the Application Container spec that CoreOS created to define how to build containerized applications when it first launched Rocket.
Within three months, the parties aim to complete the process of creating the project, migrating code, and publishing a draft specification, building on technology donated by Docker.
Polvi said that in the case of OCP, Docker deserves credit for exercising a considerable amount of leadership. For Docker’s part, Johnston credits CoreOS for coming up with the Application Container spec concept.
In the meantime, Polvi said he personally wished more end-user IT organizations like Goldman would join OCP to help balance out what for the moment is a vendor-heavy standardization effort.