Data Center Knowledge | News and analysis for the data center industry
 

Thursday, April 14th, 2016

    9:00a
    Vapor IO’s Re-envisioned Data Centers One Step Closer to Reality

    The basic principle is simple: if you take the heat exhausts from a cluster of servers and point them toward a central cylinder, like a covey of quail, then run a chimney pipe up from that axis, you create a convection current. As hot air rises, cooler air rushes in to replace it. This principle is the central tenet of Vapor IO’s not-all-that-radical data center cooling architecture, which the company claims could drive PUE ratios down by close to 10 percent, from an average of about 1.7 to 1.6.
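
    For perspective, PUE is the ratio of total facility power to the power consumed by the IT equipment alone. The quick back-of-the-envelope calculation below is our own illustration, not Vapor IO’s math, assuming a fixed 1 MW IT load; it shows what moving from a 1.7 to a 1.6 PUE would mean in practice.

        # Back-of-the-envelope PUE comparison (illustrative only; not Vapor IO's figures).
        # PUE = total facility power / IT equipment power.
        IT_LOAD_KW = 1000.0      # assume a fixed 1 MW IT load for illustration
        PUE_BEFORE = 1.7         # industry average cited above
        PUE_AFTER = 1.6          # figure Vapor IO says its design can reach

        total_before = IT_LOAD_KW * PUE_BEFORE       # 1,700 kW drawn from the utility
        total_after = IT_LOAD_KW * PUE_AFTER         # 1,600 kW
        overhead_before = total_before - IT_LOAD_KW  # 700 kW of cooling/power-chain overhead
        overhead_after = total_after - IT_LOAD_KW    # 600 kW

        print(f"Total facility power falls {1 - total_after / total_before:.1%}")   # ~5.9%
        print(f"Non-IT overhead falls {1 - overhead_after / overhead_before:.1%}")  # ~14.3%

    In other words, the headline ratio shrinks modestly, but the energy spent on everything other than compute shrinks considerably more.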

    For this principle to be put into practice in the world’s data centers, both old and new, nearly everything about their architecture, layout, and construction has to be reconsidered. On Wednesday, Vapor IO took a critical step toward making that reconsideration happen on a grander scale, with the official announcement that it has joined the Open Data Center Alliance.

    “As we are all aware, data centers are extremely complex, and both capitally and operationally expensive,” stated Vapor IO CEO Cole Crawford in a note to Datacenter Knowledge. “ODCA represents a powerful organization that, I think, complements other community-based organizations dedicated to commoditizing an overly complex and often disjointed environment.”

    The Venturi Effect

    The original purpose of the ODCA, as we reported in 2011, was to concentrate the buying power of multiple organizations on infrastructure components that best suit a broad range of customers. Commoditization is one very effective approach to making these components both more affordable and less proprietary, as Facebook’s efforts with the Open Compute Project have proven. Cole Crawford was the founding executive director of that project’s foundation.

    At an industry conference last September, Crawford demonstrated a simulation of the airflow in two data centers, one employing his company’s vapor chambers and the other using the standard hot aisle/cold aisle layout. In explaining the physics (in a manner Richard Feynman would have appreciated), Crawford asked attendees to imagine having to evacuate the water from a bottle by blowing into a straw. “I don’t care how strong your lungs are,” he said, “it wouldn’t happen.

    “In thermodynamics, air moves exactly the same way as water,” he continued. The formal name for this motion is the Venturi effect: when a fluid or gas moves through a constricted pipe, its velocity rises and its static pressure correspondingly falls. More fluid or gas rushes in from behind to fill the partial vacuum.
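
    To make the physics concrete, here is a small, self-contained calculation of our own (the duct sizes and inlet velocity are assumed figures, not Vapor IO’s numbers) combining the continuity equation with Bernoulli’s principle:

        # Illustrative Venturi-effect arithmetic with assumed numbers.
        # Continuity (A1*v1 = A2*v2) plus Bernoulli gives the static-pressure drop
        # when air speeds up through a constriction.
        RHO_AIR = 1.2        # kg/m^3, approximate density of air
        A1, A2 = 0.50, 0.25  # duct cross-sections in m^2; the constriction is half the area
        V1 = 2.0             # assumed inlet air velocity, m/s

        v2 = V1 * A1 / A2                     # continuity: 4.0 m/s in the constriction
        dp = 0.5 * RHO_AIR * (v2**2 - V1**2)  # Bernoulli: static-pressure drop in pascals
        print(f"Velocity rises to {v2:.1f} m/s; static pressure drops by {dp:.1f} Pa")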

    “Thermodynamics shows us that hot air likes to rise, and that pulling air, as opposed to pushing air, is far more efficient,” the CEO told us Wednesday. “By eliminating head pressure, we can move more air at less cost, while increasing rack density and the delta-T (the difference between hot aisle and cold aisle temperatures), which equates to massive OpEx savings.”
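
    The delta-T point follows from the basic heat-removal relation Q = m_dot * c_p * delta_T: for a fixed heat load, a wider gap between hot-aisle and cold-aisle temperatures means less air has to be moved, and therefore less fan energy spent. A minimal sketch, with assumed numbers of our own:

        # Illustrative only: how a wider delta-T reduces the airflow a fan must move.
        # Q = m_dot * c_p * delta_T, where Q is the heat load and m_dot is the air mass flow.
        CP_AIR = 1.006     # kJ/(kg*K), specific heat of air near room conditions
        AIR_DENSITY = 1.2  # kg/m^3, approximate

        def airflow_m3_per_s(heat_load_kw: float, delta_t_k: float) -> float:
            """Volumetric airflow needed to carry away heat_load_kw at a given delta-T."""
            mass_flow = heat_load_kw / (CP_AIR * delta_t_k)  # kg/s
            return mass_flow / AIR_DENSITY                   # m^3/s

        rack_kw = 10.0  # assumed rack heat load
        for delta_t in (10.0, 15.0, 20.0):  # hot-aisle minus cold-aisle temperature, K
            print(f"delta-T {delta_t:4.0f} K -> {airflow_m3_per_s(rack_kw, delta_t):.2f} m^3/s")
        # Doubling delta-T halves the airflow (and roughly the fan energy) for the same load.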

    The Southbound Interface

    In Vapor IO’s design, heated air is helped along upward by a fan. If layout were the only element of vapor chamber design, releasing it to open source might not seem like a big deal. But there is another critical element that makes the company’s enrollment in the ODCA much more significant: the monitoring and automation interface.

    [Image: Vapor IO OpenDCRE]

    Last month, Vapor IO released version 1.2 of its Open Data Center Runtime Environment (OpenDCRE). Borrowing a phrase from the realm of SDN, the company describes OpenDCRE as including a “southbound interface”: a RESTful API that gives resource monitoring and workload orchestration tools real-time access to current statistics for every device in the rack. As Crawford confirmed for us, Vapor IO racks will follow the Open Rack standard.
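
    Vapor IO publishes OpenDCRE’s API documentation separately; the sketch below is our own illustration of how an orchestration tool might poll such a RESTful southbound interface. The host name and endpoint paths are assumptions made for illustration, not necessarily the routes OpenDCRE actually exposes.

        # Hypothetical sketch of polling a RESTful southbound interface like OpenDCRE.
        # The host name and endpoint paths are illustrative assumptions; consult the
        # OpenDCRE API reference for the real routes.
        import requests

        BASE_URL = "http://rack-controller.example.internal:5000/opendcre/1.2"  # assumed

        def scan_devices() -> dict:
            """Ask the rack controller which boards and devices it can see."""
            resp = requests.get(f"{BASE_URL}/scan", timeout=5)
            resp.raise_for_status()
            return resp.json()

        def read_temperature(board_id: str, device_id: str) -> float:
            """Read one temperature sensor; the path layout here is an assumption."""
            resp = requests.get(f"{BASE_URL}/read/temperature/{board_id}/{device_id}", timeout=5)
            resp.raise_for_status()
            return float(resp.json()["temperature_c"])

        if __name__ == "__main__":
            inventory = scan_devices()
            print(f"Rack controller reports {len(inventory.get('boards', []))} boards")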

    “OpenDCRE is available for anyone to leverage as a 21st century, open source alternative to the gratuitous differentiation offered by existing data center management products for rack-scale control and automation,” Crawford told Datacenter Knowledge, with a bit of trademark bravado. “We’ve seen both adoption and innovation for what people intend to do with OpenDCRE, and based on what we’re hearing, it will be disruptive on many levels of the data center.”

    Orchestrating Racks

    One of those orchestrators to which OpenDCRE will be connected is Mesosphere’s DCOS — the groundbreaking, container-based, multi-server workload orchestration platform which has attracted the earnest attention of the communications industry.

    “I am really excited about the work that they (Vapor IO) are doing,” said Mesosphere CMO Matt Trifiro, in an e-mail to Datacenter Knowledge, “building self-contained and highly-efficient data center ‘modules’ that start to move us toward our vision of the entire data center behaving as one giant computer.”

    Trifiro pointed us toward an “unprecedented” future where very granular workloads — conceivably, individual microservices — could be orchestrated and scheduled for deployment on specific servers based, in part, upon their up-to-the-minute physical conditions.

    “One of the most interesting aspects of Vapor IO’s work,” said Trifiro, “is they are building hardware and software interfaces that provide new sources of telemetry from the hardware, which highly automated systems like Mesosphere DCOS will use to make real-time scheduling decisions that optimize for business outcomes. For example, DCOS can ingest real-time temperature and power information from the cluster, and use that data to make scheduling decisions that optimize for cooling or power constraints.”
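
    Trifiro’s example, a scheduler that weighs live temperature and power data when placing work, can be sketched in a few lines. The snippet below is our illustration of the general idea, not Mesosphere DCOS code; the telemetry format and thresholds are assumptions.

        # Illustrative only: a toy placement filter that prefers cooler, less power-
        # constrained racks, in the spirit of the DCOS example described above.
        # The telemetry format and thresholds are assumptions, not Mesosphere's API.
        from dataclasses import dataclass
        from typing import List, Optional

        @dataclass
        class RackTelemetry:
            rack_id: str
            inlet_temp_c: float    # current cold-aisle inlet temperature
            power_draw_kw: float   # current draw against the rack's power budget
            power_budget_kw: float

        def pick_rack(candidates: List[RackTelemetry], task_kw: float,
                      max_inlet_c: float = 27.0) -> Optional[RackTelemetry]:
            """Return the coolest rack that can absorb the task's power draw, or None."""
            eligible = [r for r in candidates
                        if r.inlet_temp_c <= max_inlet_c
                        and r.power_draw_kw + task_kw <= r.power_budget_kw]
            return min(eligible, key=lambda r: r.inlet_temp_c, default=None)

        racks = [
            RackTelemetry("rack-01", 24.5, 11.0, 15.0),
            RackTelemetry("rack-02", 29.0, 6.0, 15.0),   # too hot; filtered out
            RackTelemetry("rack-03", 23.0, 14.5, 15.0),  # no power headroom for a 1 kW task
        ]
        print(pick_rack(racks, task_kw=1.0))  # -> rack-01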

    “Data centers today not only have a gargantuan amount of data to collect,” said Vapor IO’s Crawford, “but that data is, at best, delivered in near real time and through proprietary interfaces. Additionally, data center operators and staff have to ensure uptime, but when the pager goes off at 2 in the morning, it’s fair to say we’re likely not operating at 100 percent. The role of the sysadmin has matured into a more proactive DevOps role. OpenDCRE makes this maturation possible for data center operators as well.

    “Imagine designing self-healing physical environments,” he continued, “that work in concert with self-healing IT environments. This is the power of OpenDCRE.”

     

     

    9:00a
    Why Reliable Power Protection is Critical for Data Center Operators

    John Collins is a Product Line Manager for Eaton.

    You may not know it, but last month the tech industry celebrated World Backup Day, encouraging both consumers and professionals to back up their important data. The occasion served as a good reminder for data center professionals that protecting critical data also means having the right power protection strategy in place, so that data center downtime doesn’t translate into lost revenue for their businesses.

    But not everyone took notice. In fact, it’s somewhat surprising that many operators consider reliable power protection to be low on their list of priorities, even though it can have major implications for data loss. During the course of operation, power sags, surges and outages are unavoidable, and more than capable of damaging valuable IT equipment and cutting off access to important data. Because of this, it’s essential that data center operators incorporate a robust power protection solution into their overall data center design strategies.

    This article will provide an introductory overview of why comprehensive power protection is critical to ensuring continuous uptime in the data center. Additionally, we’ll look at an example of how one data center operator, ByteGrid, recently implemented a comprehensive power management and monitoring solution to help ensure reliability and reduce the risk of downtime in its facility.

    Why Power Protection Matters

    No company can afford to leave its IT assets unprotected from power issues. Here are just a few of the reasons why:

    • Even short outages can be trouble. Losing power for as little as 1/50 of a second can trigger events that may keep IT equipment unavailable for anywhere from 15 minutes to many hours. This downtime can be enormously costly for the business. Some experts believe the U.S. economy loses between $200 billion and $570 billion a year due to power outages and other disturbances.
    • Utility power isn’t clean. Electrical power can legally vary widely enough to cause significant problems for IT equipment. Under current U.S. standards, for example, delivered voltage may run as much as 5.7 percent above or 8.3 percent below nominal. That means a utility service promising 208 volts can actually deliver anywhere from about 191 to 220 volts (see the short calculation after this list).
    • The problems and risks are intensifying. Today’s storage systems, servers and network devices use components so miniaturized that they falter and fail under power conditions earlier-generation equipment easily withstood.
    • Generators and surge suppressors aren’t enough. Generators can keep systems operational during a utility outage, but they take time to start up and provide no protection from power spikes and other electrical disturbances. Surge suppressors help with power spikes but can’t help with power loss, under-voltage, or brownout conditions.
    • Availability is everything. Once, IT played a supporting role in the enterprise. These days it’s absolutely central to how most companies compete and win. When IT systems are down, core business processes quickly come to a standstill.
    • Power costs must be managed. The cost of power and cooling has spiraled out of control in recent years. Data center managers are typically held responsible for achieving high availability while simultaneously reducing power costs. Highly efficient UPS systems can help with this goal, and products are available today that were not an option even a few years ago.
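
    To make the voltage tolerance above concrete, here is the short calculation referenced in the list, using the percentages cited there (our arithmetic, for illustration):

        # Quick check of the tolerance figures cited above for a nominal 208 V service.
        NOMINAL_V = 208.0
        OVER_PCT = 0.057   # up to 5.7 percent above nominal
        UNDER_PCT = 0.083  # up to 8.3 percent below nominal

        low = NOMINAL_V * (1 - UNDER_PCT)
        high = NOMINAL_V * (1 + OVER_PCT)
        print(f"Legal delivery window: {low:.0f} V to {high:.0f} V")  # roughly 191 V to 220 V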

    This should serve as a sobering reminder to data center managers that they simply cannot afford to go without reliable backup power as part of a comprehensive power protection strategy. Now we’ll look at how one data center operator realized the critical nature of power management as it grew its multi-tenant facility, and took the necessary steps to implement it.

    ByteGrid Case Study

    When ByteGrid, a leading provider of compliant hosting solutions, began planning a major expansion of its Cleveland Technology Center (CTC), the city’s largest multi-tenant facility, the company embraced a lofty aspiration: to deliver customized data center solutions with the highest level of compliance. The company pledged to tailor the exact space each of its customers desires, rather than the one-size-fits-all approach many other multi-tenant facilities take. To achieve this, ByteGrid’s project leaders knew they needed a robust power management solution within the company’s highly secure, 333,215-square-foot Tier III space.

    Understanding the importance of reliable power protection, ByteGrid installed two uninterruptible power systems (UPSs), each consisting of two 750 kW units, which the company expected to lower total cost of ownership (TCO) through energy-saving technology while increasing power system reliability.

    Conclusion

    Data center downtime is unacceptable for any operator, which is why reliable backup power and power management must be part of the data center design strategy, whether an operator is building a new facility or expanding an existing one. ByteGrid is a strong example of a company that understood the importance of reliable power protection and took the steps needed to make its vision a reality. Operators reviewing their own power management needs should consider ByteGrid’s example, and weigh new power protection capabilities that can lower costs and reduce the risk of costly outages in their own facilities.

    Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.

    7:37p
    DigitalOcean Gets $130M Credit Facility to Stretch Cloud Infrastructure
    By Talkin’ Cloud

    Infrastructure as a service provider DigitalOcean announced on Thursday that it has secured a $130 million credit facility to purchase equipment that will support its global expansion.

    DigitalOcean has seen rapid growth over the past two years: its registered customer base has grown from 253,000 to 708,000 users, who have launched more than 13 million cloud servers.

    Since November, DigitalOcean has opened another data center, bringing its total to 12, with one more to go live next month in India, according to Forbes. The company is also on track to launch a storage product in December, which DigitalOcean CEO Ben Uretsky calls the “next generation” of its product.

    “These financing transactions contribute to our goal of building the next generation cloud for software developers,” Uretsky said. “We’ll be releasing new products and features in the upcoming months that will enable larger production environments to scale on our cloud infrastructure.”

    KeyBanc Capital Markets led the financing, while other banks participating in the deal include Barclays Bank PLC, Pacific Western Bank, East West Bank, Opus Bank, Webster Bank and HSBC Bank USA.

    “We are delighted with the outcome of our credit facility. It complements the $83 million Series B equity financing that we closed in June 2015 and our strong cash flows and balance sheet in pursuing long-term growth opportunities,” Brian Cohen, Chief Financial Officer of DigitalOcean said in a statement.

    Original article appeared at http://talkincloud.com/cloud-computing-funding-and-finance/digitalocean-gets-130m-credit-facility-stretch-cloud-infrastruct

    8:43p
    Google Reimburses Cloud Clients After Massive Google Compute Engine Outage
    By Talkin’ Cloud

    Google is reimbursing Google Compute Engine users up to 25 percent of their monthly charges after an outage that impacted instances across all regions on Monday.

    The outage lasted 18 minutes and did not affect Google App Engine, Google Cloud Storage, or other Google Cloud Platform products. While 18 minutes may not sound like much, in the cloud world it is. And because the outage impacted multiple regions, clients couldn’t fail over to another region to mitigate its impact.
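
    For context, the arithmetic below (ours, and not a statement of Google’s actual SLA terms) shows what an 18-minute, all-region outage works out to against a 30-day month:

        # Our arithmetic for context: the availability impact of an 18-minute outage
        # over a 30-day month. Illustrative only; not a statement of Google's SLA.
        OUTAGE_MINUTES = 18
        MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

        downtime_share = OUTAGE_MINUTES / MINUTES_PER_MONTH
        print(f"Downtime share: {downtime_share:.4%}")            # about 0.0417%
        print(f"Monthly availability: {1 - downtime_share:.4%}")  # about 99.9583%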

    According to a lengthy and apologetic post mortem on the Google Cloud Platform status page on Wednesday, the issue began when engineers removed an unused GCE IP block from its network configuration and instructed its systems to propagate the new configuration across the network.

    “By itself, this sort of change was harmless and had been performed previously without incident. However, on this occasion our network configuration management software detected an inconsistency in the newly supplied configuration,” Google VP of engineering Benjamin Treynor Sloss said.

    “In attempting to resolve this inconsistency the network management software is designed to ‘fail safe’ and revert to its current configuration rather than proceeding with the new configuration. However, in this instance a previously-unseen software bug was triggered, and instead of retaining the previous known good configuration, the management software instead removed all GCE IP blocks from the new configuration and began to push this new, incomplete configuration to the network.”

    To read more about the specifics and the timeline of the outage, check out Google’s post mortem which goes into more detail.

    According to Google, its engineering teams will be working over the next several weeks on a “broad array of prevention, detection and mitigation systems intended to add additional defense.”

    “It is our intent to enumerate all the lessons we can learn from this event, and then to implement all of the changes which appear useful,” he said, noting that “there are already 14 distinct engineering changes planned spanning prevention, detection and mitigation.”

    Original article appeared at http://talkincloud.com/cloud-computing/google-reimburses-cloud-clients-after-massive-google-compute-engine-outage

    10:33p
    Rackspace Calls on Software Devs to Break CPU Performance Barriers

    Various laws of physics have conspired to prevent the performance-doubling strategies of processor builders from being carried out indefinitely. Two months ago, Intel began a campaign of letting the server manufacturing world down gently. At the annual ISSCC conference in San Francisco, Intel Executive Vice President Bill Holt admitted to attendees that, beyond the 7 nm lithography process, engineers would need to resort to outright quantum physics if the company expects to maintain the same performance improvements it delivered when Moore’s Law was moving along swimmingly.

    The OpenPOWER Foundation brings together industry engineers from Google, Micron Technology, NVIDIA, Samsung, and of course IBM, whose 24-core, 14 nm Power9 processors are on track for the second half of 2017. Even put together, these organizations may be as short on quantum physicists as Intel. So in a presentation at the annual OpenPOWER Summit last week, Rackspace Distinguished Engineer Aaron Sullivan made a suggestion that would have sounded outlandish if the alternative on the table didn’t involve striking some kind of bargain with quarks.

    Stated simply, Sullivan suggested that the consortium seek out Python developers to work together, in their spare time, on breaking through the performance barrier.

    A Whole New Drawing Board

    “Today, developers are really not where we need them. We’ve done this to ourselves in the industry with things like Java and Python,” said Sullivan. Showing a diagram depicting the relative disparity between digital logic (the raw programming that determines the schematics of integrated circuits) and scripting (the types of jobs that produce Web pages), he continued, “today, most of our developers live in the world of scripting and very abstract languages.”

    It was evidence for a much broader case: as data center performance requirements continue to scale as though physics were not standing in the way, many are coming to believe that engineers’ attention should shift from “cramming” more components onto integrated circuits (to borrow Gordon Moore’s phrase) to improving the performance of the workloads those ICs process.

    It’s a case that’s been made in the pages of Datacenter Knowledge before. Wrote AppliedMicro Vice President John Williams last November, “The future of the data center is a broad set of solutions using cost-effective, energy-efficient processors.” One of those solutions, Williams wrote, could be a workload accelerator — a class of hardware designed specifically to improve the performance of software.

    That’s exactly what Rackspace’s Sullivan is alluding to. IBM currently produces what it calls the Coherent Accelerator Processor Interface (CAPI) for its Power8 processors. Not unlike leveraging GPUs for highly parallel math operations, CAPI enables highly parallel operations to be offloaded from the CPU to an FPGA, which is programmable in a way that’s somewhat more similar to writing software than to architecting integrated circuits.
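
    To put the offload idea in software terms, the sketch below shows the general dispatch pattern: route a parallel-friendly operation to an accelerator when one is present, and fall back to the CPU otherwise. This is our illustration of the concept, not CAPI code; the fpga_accel module named here is hypothetical.

        # Conceptual sketch of the accelerator-offload pattern described above.
        # The "fpga_accel" module is hypothetical and stands in for whatever vendor
        # library would expose the device; this is not CAPI or IBM code.
        import numpy as np

        try:
            import fpga_accel  # hypothetical accelerator binding; not a real package
            HAVE_ACCELERATOR = True
        except ImportError:
            HAVE_ACCELERATOR = False

        def dot_product(a: np.ndarray, b: np.ndarray) -> float:
            """Offload to the accelerator if available, else compute on the CPU."""
            if HAVE_ACCELERATOR:
                return fpga_accel.dot(a, b)  # hypothetical call
            return float(np.dot(a, b))       # plain CPU fallback

        x = np.random.rand(1_000_000)
        y = np.random.rand(1_000_000)
        print(f"dot product: {dot_product(x, y):.2f}")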

    Experience Bottlenecks

    A bit more similar, but not much, said Kurt Marko, veteran consultant and analyst with MarkoInsights, in an interview with Datacenter Knowledge. While some software devs may understand the basics of how instructions are translated into Boolean logic and down into microscopic gates, their experience would not translate to an ability to design FPGAs.

    “He (Sullivan) was talking about how CPU architectures enable hardware accelerators for specific applications — GPUs being one of them, but FPGAs being the notable, other category,” explained Marko, who was in attendance Wednesday. “Where he was going with that Python developer comment was: When you develop FPGAs, they’ve got their own design language. . . Until you can make that hardware customization easy enough for Python developers, it’s going to have limited appeal.”

    OpenPOWER competitor Intel has been thinking along much the same lines as CAPI, as its acquisition last year of FPGA producer Altera made evident. What CPU makers are realizing is that the people who run workloads on servers can still perceive overall server performance to be increasing. The problem at hand is how to shift the burden, maybe slightly, maybe significantly, from IC engineers to IT professionals. (The path Marko foresees for making this shift happen is the subject of his blog post for Diginomica published Thursday.)

    As Sullivan explained to the OpenPOWER Summit, the education and experience necessary to become a traditional hardware engineer simply won’t scale to the same breadth as that of a traditional software developer. And only a small subset of software devs work with the lower elements of the stack, at the operating system and driver level.

    “We have very clever developers, and we love what they do because they make our lives more convenient and more interesting,” said Sullivan. “And those entrepreneurs are Python developers. So what if your world mostly consists of this sort (Python devs) and you want to retool?”

    Road Closed Ahead

    As Marko confirmed for us, Sullivan wasn’t trying to draw any correlation between FPGAs and Python. Rather, his point was that scripters make up the bulk of today’s developer mindset, a point he made in what Marko called a “mirthful” way. It’s those devs to whom FPGAs will somehow need to appeal in order to attract the depth of contribution necessary to make workload performance improve the way it did at the dawn of the multicore era.

    “The industry needs to have better design tools for building customizable hardware circuits that implement specific software features in hardware,” said Marko. “The thing that I’m seeing, from Google, IBM, OpenPOWER, and Intel too, is that Moore’s Law is wheezing. It’s not keeping up. The days of getting software improvements just by brute-force performance improvement of a generic CPU are slowing down drastically.

    “The way to keep up with that historical performance curve,” he continued, “isn’t to try and just wring out faster, general-purpose CPUs. It’s to apply the hardware to more specific problem domains, so that you’re actually accelerating specific functions that are time-consuming in software.”

    In closing his Wednesday speech, Aaron Sullivan recalled that every successful effort at bringing new developers to the table for Linux improvements has come through the distribution of more productive tools for open source work. But even those tools were served up with an extra helping of evangelism.

    “You want to get to the moon? You’ve got to bring a lot of people from different disciplines together; you give them a big, crazy challenge; you tell them how awesome it would be if we got there. And it’s not just the shuttle you build; you’ve got to build the platform underneath it as well.”

