Data Center Knowledge | News and analysis for the data center industry
 

Friday, March 11th, 2016

    1:00p
    Facebook Data Centers: Huge Scale at Low Power Density


    This month, we focus on the open source data center. From innovation at every physical layer of the data center coming out of Facebook’s Open Compute Project to the revolution in the way developers treat IT infrastructure that’s being driven by application containers, open source is changing the data center throughout the entire stack. This March, we’ll zero in on some of those changes to get a better understanding of the pervasive open source data center.

    When Facebook was launching the first data center it designed and built on its own, the first of now several Facebook data centers in rural Oregon, Jason Taylor, the company’s VP of infrastructure, expected at least a little fallout from the new power distribution design that was deployed there.

    Instead of a centralized UPS plant in a separate room behind the doors of the main data hall, Facebook had battery cabinets sitting side by side with IT racks, ready to push 48V DC power to the servers at a moment’s notice. What made him nervous was that the servers needed 12V DC power, and the mechanism that switched between two different combinations of voltage and current had to work like a Swiss watch if you didn’t want to fry some gear.

    “I would have expected at least some fallout,” Taylor said in an interview. But the system was tested many times in the first couple of years in Prineville – both intentionally and unintentionally – and, eventually, he learned to stop worrying and [insert the rest of the cliché].

    Facebook later open sourced some of the innovations in data center design that were introduced in Prineville through the Open Compute Project, an initiative started by the social networking giant to bring some open source software ethos to IT hardware, power, and cooling infrastructure.

    Better Efficiency Through Disaggregation

    One of the interesting aspects of Facebook’s data center designs is that the company has been able to scale tremendously without increasing power density. Many data center industry experts predicted several years ago that the overall amount of power per rack would grow – a forecast that for the most part has not materialized.

    “Rather than targeting 20kW per rack or 15kW per rack, we actually targeted about 5.5kW per rack,” Taylor said. “We understood that the low power density on racks was just fine.”


    Jason Taylor, Facebook’s VP of infrastructure, speaking at the Open Compute Summit in March 2016 in San Jose, California (Photo: Yevgeniy Sverdlik)

    One big reason Facebook has been able to keep its data centers low-density is that its infrastructure and software teams have been willing to completely rethink their methods on a regular basis. This, coupled with advances in processors and networking technology, has resulted in new levels of efficiency that enabled Facebook to do more with less.

    One of the most powerful concepts that resulted from this kind of rethinking is disaggregation, or looking at an individual component of a switch or a server as the basic infrastructure building block – be it CPU, memory, disk, or a NIC – not the entire box.

    Disaggregation in Action

    An example that demonstrates just how powerful disaggregation can be is the way the backend infrastructure that populates a Facebook user’s news feed is set up. Until sometime last year, Multifeed, the name of the news feed backend, consisted of uniform servers, each with the same amount of memory and CPU capacity.

    The query engine that pulls data for the news feed, called Aggregator, uses a lot of CPU power. The storage layer it pulls data from keeps it in memory, so it can be delivered faster. This layer is called Leaf, and it taxes memory quite heavily.

    The previous version of a Multifeed rack contained 20 servers, each running both Aggregator and Leaf. To keep up with user growth, Facebook engineers continued adding servers and eventually realized that while CPUs on those servers were being heavily utilized, a lot of the memory capacity was sitting idle.

    To fix the inefficiency, they redesigned the way Multifeed works – the way the backend infrastructure was set up and the way the software used it. They designed separate servers for Aggregator and Leaf functions, the former with lots of compute, and the latter with lots of memory.

    This resulted in a 40 percent efficiency improvement in the way Multifeed used CPU and memory resources. The infrastructure went from a CPU-to-RAM ratio of 20:20 to 20:5 or 20:4 – a 70 to 80-percent reduction in the amount of memory that needs to be deployed.
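
    For a rough sense of how that ratio translates into hardware savings, here is a back-of-envelope sketch in Python. Only the 20:20 versus 20:5 or 20:4 server ratios come from the article; the per-server RAM figure is an assumption for illustration.

        RACK_SERVERS = 20        # servers per Multifeed rack (from the article)
        RAM_PER_LEAF_GB = 128    # assumed RAM in a memory-heavy (Leaf-class) server

        def rack_memory_gb(leaf_servers, ram_per_leaf_gb=RAM_PER_LEAF_GB):
            """RAM that must be deployed in one rack, counting only the memory-heavy boxes."""
            return leaf_servers * ram_per_leaf_gb

        before = rack_memory_gb(leaf_servers=RACK_SERVERS)   # uniform design: all 20 servers carry full RAM
        after_5 = rack_memory_gb(leaf_servers=5)             # disaggregated: 20 Aggregators plus 5 Leaf servers
        after_4 = rack_memory_gb(leaf_servers=4)             # ... or 4 Leaf servers

        print(f"20:5 ratio -> {1 - after_5 / before:.0%} less memory deployed")   # 75%
        print(f"20:4 ratio -> {1 - after_4 / before:.0%} less memory deployed")   # 80%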

    Network – the Great Enabler

    According to Taylor, this Leaf-Aggregator model, which is now also used for search and many other services, wouldn’t have been possible without the huge increases in network bandwidth Facebook has been able to enjoy.

    “A lot of the most interesting stuff that’s happening in software at large scale is really being driven by the network,” he said. “We’re able to make these large long-term software bets on the network.”

    Today, servers and switches in Facebook data centers are interconnected with 40-Gig links – up from 1 Gig links from the top-of-rack switch to the server just six years ago. New Facebook data centers being built today will use 100-Gig connectivity, thanks to the latest Wedge 100 switch the company designed and announced earlier this year.

    “As of January of next year, everything will be 100 Gig,” Taylor said.

    Read more: With its 100-Gig Switch, Facebook Sees Disaggregation in Action

    With that amount of bandwidth, having memory next to CPU is becoming less and less important. You can split the components and optimize for each individual one, without compromises.

    “Locality is starting to become a thing of the past,” Taylor said. “The trend in networking over the last six years is too big to ignore.”
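
    As a rough illustration of the bandwidth arithmetic behind that statement, the sketch below computes how long it takes just to move a chunk of data at the link speeds mentioned above. The 100 MB chunk size is an arbitrary assumption, and in-network switching latency is ignored.

        CHUNK_MB = 100   # assumed size of a remote data fetch, purely illustrative

        for gbps in (1, 40, 100):
            seconds = (CHUNK_MB * 8) / (gbps * 1000)   # MB -> megabits, Gbps -> megabits per second
            print(f"{gbps:>3} Gbps link: {seconds * 1000:6.1f} ms to move {CHUNK_MB} MB")

        # 1 Gbps:  ~800 ms -- data on another server is painfully far away
        # 40 Gbps:  ~20 ms -- fetching from a Leaf server over the network is routine
        # 100 Gbps: ~8 ms  -- splitting CPU and memory across boxes becomes practical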

    Disaggregation Keeps Density Down

    Disaggregation has also helped keep overall power density in Facebook data centers low.

    Some compute-heavy racks, such as the ones populated with web servers, can be between 10kW and 12kW per rack. Others, such as the ones packed with storage servers, can be about 4.5kW per rack.

    As long as the overall facility averages out to about 5.5kW per rack, it works, Taylor said.
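
    The arithmetic behind that averaging is straightforward. Using the per-rack figures quoted above (with 11kW taken as the midpoint of the 10-12kW range), the sketch below estimates what share of racks can run at the compute-heavy end before the facility overshoots its target.

        WEB_KW = 11.0       # midpoint of the 10-12kW range for compute-heavy racks
        STORAGE_KW = 4.5    # storage racks
        TARGET_KW = 5.5     # facility-wide average Taylor cites

        # Share f of compute-heavy racks that keeps the blended average at the target:
        #   f * WEB_KW + (1 - f) * STORAGE_KW = TARGET_KW
        f = (TARGET_KW - STORAGE_KW) / (WEB_KW - STORAGE_KW)
        print(f"~{f:.0%} of racks can run at the compute-heavy end")   # about 15%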

    One of the disaggregation extremes Facebook has gone to recently is designing storage servers specifically for rarely accessed user content, such as old photos, and building separate facilities next to its primary data centers optimized just for those servers.

    The “cold storage racks are unbelievably cold,” Taylor said, referring to the amount of power they consume. They are at 1 to 1.5kW per rack, he said.

    As a result, it now takes 75 percent less energy to store and serve photos people dig out of their archives to post on a Thursday with a #tbt tag than it did when those photos were stored in the primary data centers.
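
    A quick sanity check on that figure, treating per-rack power as a stand-in for energy per photo served (an assumption made purely for illustration):

        COLD_KW = 1.25     # midpoint of the 1-1.5kW cold storage range quoted above
        PRIMARY_KW = 5.5   # average rack in a primary Facebook data center

        print(f"~{1 - COLD_KW / PRIMARY_KW:.0%} less power per rack")   # about 77%, roughly in line with the 75 percent figure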

    As it looks for greater and greater efficiency, Facebook continues to re-examine and refine the way it designs software and the infrastructure that software runs on.

    The concept of disaggregation has played a huge role in helping the company scale its infrastructure and increase its capacity without increasing the amount of power it requires, but disaggregation at that scale would not have been possible without the rapid progress in data center networking technology of recent years.

    9:51p
    Worldwide Server Revenue, Shipments Up in 2015
    By WindowsITPro

    According to Gartner, overall 2015 server shipments worldwide increased by 9.9 percent alongside an increase of 10.1 percent in server revenues.

    The latest report from Gartner, focused on the fourth quarter of 2015, shows just one company taking a year-over-year hit on both revenue and server shipments.

    Hewlett Packard Enterprise (HPE), the newly formed company from the split of HP into separate enterprise and consumer companies, showed a 2.2 percent drop in server revenues compared to the same period in the previous year. Gartner also shows a 2.6 percent drop in server shipments between the same two periods.

    The reasons for HPE’s drops, according to Gartner:

    The primary decline in HPE’s server shipments can be attributed to a global weakness in Windows-based x86 servers, while the decline in revenue was driven mostly by a drop in RISC/Itanium Unix server sales for the period.

    The only other company to see a decrease in the fourth quarter was Dell, with just a 0.3 percent drop in server hardware shipments.

    Among the other companies in the report, Cisco experienced 20.2 percent revenue growth, although it does not appear in the corresponding top-five list for shipments.

    Inspur Electronics and Huawei saw the biggest increases in global server shipments, growing 53.3 percent and 27 percent, respectively.

    Regionally, Asia/Pacific led in server shipment growth with 20.1 percent, followed by North America (8.5 percent) and Western Europe (4.3 percent).

    Gartner’s report indicates that Facebook, Microsoft and Google are having a significant impact on server shipments:

    “The real growth driver for the quarter in terms of absolute value was the Other Vendors category,” said Jeffrey Hewitt, research vice president at Gartner. “This collection of unspecified vendors that includes original design manufacturers (ODMs), like Quanta and Wistron, contributed over $750 million in revenue and over 170,000 server unit shipments for the period. This demonstrates that the growth of hyperscale data centers, like those of Facebook, Google and Microsoft, continues to be the leading contributor to physical server increases globally.”

    Remember – what we call the cloud is actually data sitting on servers somewhere in the world – so hardware remains a critical link in the ecosystem.

    This first ran at http://windowsitpro.com/hardware/report-server-revenue-shipments-increased-worldwide-4th-qtr-2015

    10:17p
    Benchmark Test Pits Docker’s Swarm Against Kubernetes

    Cloud admins have not yet reached consensus on which container orchestration platform is best. But Docker has started pushing Swarm hard by promoting new benchmark tests that show major performance advantages vis-à-vis Google’s competing Kubernetes cluster management solution.

    Swarm and Kubernetes are part of the burgeoning ecosystem of container orchestration platforms that have arisen in recent years to help admins manage container-based cloud environments. Other popular solutions in the same niche include Amazon ECS, Apache Mesos, and OpenShift.

    As Docker noted in a recent blog post, market share is currently split relatively evenly between Swarm and Kubernetes. A survey showed that they are the most popular container orchestration platforms, with Swarm enjoying only a slight lead. (Amazon ECS comes in a close third.)

    Swarm Versus Kubernetes

    But Docker also suggested that those results are surprising. According to the company, Swarm, its home-grown cluster management solution, clearly beats out the competition.

    To make that point, Docker commissioned benchmark testing by a third-party consultant, Jeff Nickoloff. The tests compared Swarm and Kubernetes performance and response time at different levels of scale.
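
    For context, the operations such benchmarks typically time are starting one more container and listing what is already running. The minimal sketch below times those two operations against a plain local Docker daemon rather than a Swarm or Kubernetes cluster, and the image name is just an example.

        import subprocess
        import time

        def timed(cmd):
            """Run a CLI command and return its wall-clock duration in seconds."""
            start = time.perf_counter()
            subprocess.run(cmd, check=True, capture_output=True)
            return time.perf_counter() - start

        # Time to schedule and start a throwaway container, and to list containers.
        start_latency = timed(["docker", "run", "--rm", "alpine", "true"])
        list_latency = timed(["docker", "ps", "--all", "--quiet"])

        print(f"container start: {start_latency:.2f}s, container list: {list_latency:.2f}s")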

    The full results are detailed in Docker’s blog post. But the major finding was that Swarm delivers performance that is about five times faster overall. Under certain circumstances – namely, when the cluster workload reached 100 percent – Swarm was as much as 98 times faster than Kubernetes. (Yes, that’s 98 times faster, or 9800 percent.)

    Docker chalked the differences up to what it describes as a simpler, more streamlined design for Swarm. In contrast, it said, Kubernetes is “overly complex and [needs] teams of cloud engineers to implement and manage it day to day.”

    To be sure, the fact that the benchmark tests were commissioned by Docker raises some questions about how neutral they were. In addition, the scale of the environment that Nickoloff used for testing – 30,000 containers running on 1,000 nodes – may be different from what many organizations would consider a large-scale cluster. That means it’s unclear whether Swarm would outpace Kubernetes in all real-world situations as well as it did in the benchmark tests.

    Still, the message Docker wants to send is clear. The company is pushing organizations to adopt its cluster management solution rather than going with a third-party tool. That is notable because Kubernetes can manage Docker containers just as well as Swarm can. Kubernetes is not a threat to Docker’s main product, which is containers. But Docker hopes enterprises will adopt more than Docker containers. It wants them to use Docker products at all levels of the cloud stack.

    Expect this to remain an important part of Docker’s strategy as the company’s software and business model continue to mature.

    This first ran at http://talkincloud.com/cloud-computing-and-open-source/container-wars-benchmark-test-pits-dockers-swarm-platform-against-ku

