Data Center Knowledge | News and analysis for the data center industry

Tuesday, March 7th, 2017

    1:00p
    How to Survive a Cloud Meltdown

    One of the biggest questions following Amazon’s cloud outage last week was whether you can use the world’s biggest cloud provider and still avoid downtime when the provider has a major outage – a rare but recurring occurrence. If you can, how do you do it? And if there is a way to do it, why isn’t everybody doing it?

    The answer to the first question is clearly “yes.” While lots of websites and other services delivered over the internet took a hit – by one analyst firm’s estimate, the outage collectively cost AWS customers in the S&P 500 and the financial services industry alone hundreds of millions of dollars – many did not.

    There are several potential answers to the “how” question, while the reason why everyone isn’t using those methods appears to be mostly cost.

    The ways to avoid going down along with the cloud provider are essentially different ways to build redundancy into your system. You can keep multiple copies of stored objects and virtual machines in multiple data centers located in different regions and use a database that spans multiple data centers, Adam Alexander, senior cloud architect at the cloud management firm RightScale, told Data Center Knowledge via email.

    More on the incident: AWS Outage that Broke the Internet Caused by Mistyped Command

    The ways to implement this include using multiple regions by the same cloud provider, using multiple cloud providers (Microsoft Azure or Google Cloud Platform in addition to AWS, for example), or using a mix of cloud services and your own data centers, either on-premise or leased from a colocation provider. You can choose to spend the time and money to set this architecture up on your own or you can outsource the task to one of the many service providers that help companies do exactly that.
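    As a rough illustration of the first option – keeping redundant copies of stored objects in more than one region of the same provider – here is a minimal Python sketch using boto3. The bucket names and regions are hypothetical placeholders, and a real deployment might instead lean on S3’s built-in cross-region replication; the point is simply that every object exists in two regions and reads fail over automatically.

        # Minimal sketch of multi-region object redundancy, assuming boto3 is
        # installed and AWS credentials are configured. Bucket names and
        # regions are hypothetical placeholders.
        import boto3
        from botocore.exceptions import BotoCoreError, ClientError

        PRIMARY = {"region": "us-east-1", "bucket": "example-assets-use1"}
        SECONDARY = {"region": "us-west-2", "bucket": "example-assets-usw2"}

        def put_object(key, body):
            """Write the object to both regions so either copy can serve reads."""
            for target in (PRIMARY, SECONDARY):
                s3 = boto3.client("s3", region_name=target["region"])
                s3.put_object(Bucket=target["bucket"], Key=key, Body=body)

        def get_object(key):
            """Read from the primary region, failing over to the secondary."""
            for target in (PRIMARY, SECONDARY):
                s3 = boto3.client("s3", region_name=target["region"])
                try:
                    return s3.get_object(Bucket=target["bucket"], Key=key)["Body"].read()
                except (BotoCoreError, ClientError):
                    continue  # this region is unreachable or erroring; try the next
            raise RuntimeError("object unavailable in all configured regions")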

    You can also use a caching service from a CDN provider like Cloudflare, which stores redundant copies of the data you keep in Amazon’s S3 service – the cloud storage service that was the culprit in the AWS outage – in its own data centers. This, in effect, outsources redundancy completely (at least for storage).

    Companies that used S3 “behind” Cloudflare did not lose access to that data during the incident, Cloudflare CTO John Graham-Cumming said in an interview. “That’s quite a common configuration for us,” he said. “We can deal with any kind of outage like this.”
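    What a CDN cache buys you during an origin outage is, conceptually, the ability to keep serving the last good copy of an object. Below is a minimal sketch of that “serve stale on origin failure” behavior, with a hypothetical in-memory cache standing in for the CDN edge and fetch_from_origin standing in for a request to S3.

        # Rough sketch of "serve stale content when the origin is down" -- the
        # behavior that kept S3-backed sites behind a CDN readable during the
        # outage. fetch_from_origin is a hypothetical callable that raises on failure.
        import time

        class StaleServingCache:
            def __init__(self, fetch_from_origin, ttl_seconds=300):
                self.fetch = fetch_from_origin
                self.ttl = ttl_seconds
                self.store = {}  # key -> (fetched_at, body)

            def get(self, key):
                now = time.time()
                cached = self.store.get(key)
                if cached and now - cached[0] < self.ttl:
                    return cached[1]              # fresh cache hit
                try:
                    body = self.fetch(key)        # normal path: refresh from the origin
                    self.store[key] = (now, body)
                    return body
                except Exception:
                    if cached:
                        return cached[1]          # origin is down: serve the stale copy
                    raise                         # nothing cached, nothing to serve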

    Asked why all cloud users don’t have some sort of redundant, fault-tolerant scheme in place, Graham-Cumming cited cost and complexity. “It’s not necessarily easy to do that,” he said. “There’s a financial cost to it.” One of the most attractive characteristics of cloud services is the pay-for-what-you-use model. If you have redundant VMs and multiple copies of data, your cloud bill can easily skyrocket, and that’s before you take the cost of setting up the automatic failover mechanism into account.

    The extra cost of cloud resiliency includes the cost of “storing additional copies of your data in another location, maintaining standby compute resources to handle disasters, and additional in-network bandwidth required to keep the two locations in sync,” Philip Williams, principal architect at Rackspace, wrote in a blog post last week.
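    As a back-of-the-envelope illustration of how those line items add up, here is a sketch with entirely hypothetical quantities and unit prices; the only point is that every component of redundancy scales with the size of the deployment.

        # Back-of-the-envelope redundancy cost sketch. All quantities and prices
        # are hypothetical placeholders, not any provider's actual rates.
        storage_tb = 50               # size of the extra copy of the data
        standby_vms = 20              # warm standby compute in the second location
        monthly_sync_tb = 10          # cross-region traffic to keep copies in sync

        price_per_tb_month = 25.0     # $/TB-month of storage (assumed)
        price_per_vm_month = 70.0     # $/month per standby VM (assumed)
        price_per_tb_transfer = 20.0  # $/TB of inter-region bandwidth (assumed)

        extra_cost = (storage_tb * price_per_tb_month
                      + standby_vms * price_per_vm_month
                      + monthly_sync_tb * price_per_tb_transfer)
        print(f"Estimated extra monthly cost of redundancy: ${extra_cost:,.0f}")
        # With these assumed figures: roughly $2,850/month on top of the primary
        # environment, before the engineering cost of building and testing failover.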

    And cost is already a big deal for companies using cloud services at any significant scale. Managing cost was the most frequently cited challenge among mature cloud users in this year’s State of the Cloud survey by RightScale. Along with security, spend is one of the two top challenges cloud users report.

    While 85 percent of enterprise respondents to RightScale’s cloud survey said they had multi-cloud strategies in place, it’s unclear whether they have those strategies for resiliency or simply use different cloud services for different purposes. For example, a company could use AWS for Infrastructure-as-a-Service for testing and development and Google’s Platform-as-a-Service for a production website: two different clouds being used for completely different things while technically amounting to a multi-cloud strategy.

    Cloudflare’s Graham-Cumming said multi-cloud strategies for the purpose of resiliency are on the rise, however. “More and more customers are looking for a multi-cloud solution,” he said. And resiliency isn’t the only reason they’re interested. It’s partly to avoid outages, but many companies also want to avoid being locked into a single cloud provider. “That’s going to become more and more of a trend.”

    Helping that trend along is the recent rise of application containers and container orchestration tools, such as Docker, Kubernetes, and Mesosphere’s DC/OS, designed with the aim of making applications independent of the type of infrastructure they run on. If the promise of portable workloads is truly realized, multi-cloud strategies will be a whole lot easier to execute.

    “Outages happen; that’s a fact,” Zac Smith, CEO of the cloud upstart Packet, said via email. “However, there’s a huge silver lining in the promise of workload portability and the power of open source orchestration tools like Kubernetes and others. In short, my advice is: If you’re moving to the cloud or are already there, make sure you’re portable and that you control your own orchestration.”

    See also: No Shortage of Twitter Snark as AWS Outage Disrupts the Internet

    Correction: A previous version of this article incorrectly used Salesforce’s cloud CRM as an example of a cloud service used by respondents to RightScale’s State of the Cloud survey. The CRM is a Software-as-a-Service solution, and RightScale specifically excluded SaaS from its survey data.

    4:00p
    Evaluating Predictive Analytics for Flash Arrays   

    Andre Franklin is Senior Product Marketing Manager for Nimble Storage.

    Flash storage is rapidly becoming the norm in the enterprise data center. Since flash inherently delivers high speed, buyers are starting to evaluate vendors on criteria other than speed and table stakes features. Predictive analytics is among the capabilities being regularly evaluated as it promises far more than raw storage speed. That said, predictive analytics is relatively new in the long history of storage, which makes it difficult to effectively evaluate the features and benefits of any given vendor’s predictive analytics capabilities.

    What makes the challenge even more difficult is that predictive analytics doesn’t yet have a widely agreed upon definition. And when vendors themselves disagree on what a technology is, it’s not a surprise that customers likewise won’t know how to evaluate it.

    So, for the moment, in the storage context, let’s agree on three things that predictive analytics is NOT:

    1. A re-labeling or re-naming of age-old common storage capabilities, such as capacity reporting and projections, reporting from hard fault sensors, or analysis of log files
    2. A reactive system that responds to issues after the fact
    3. A tool that assumes most issues are rooted in the storage system itself (research across thousands of issues and thousands of users revealed that application slowdowns are blamed on storage most of the time, but that storage is actually at fault less than half the time)

    Predictive Analytics and the Maytag Repairman

    Years ago, Maytag ran a television ad campaign built around the Maytag repairman. Maytag was well known as a manufacturer of washing machines in those days. Unlike the stereotypical repairman rushing from job to job, the Maytag man was, unintentionally, the least busy man in the neighborhood. Through no fault of his own, he had little work to do, because Maytag washing machines never broke down.

    Clearly, the goal of all IT departments is perfect reliability and effortless management. Put another way, modern IT departments aspire to run “Maytag” systems, where nothing ever goes wrong. Of course, IT staff are not going to sit by the phone (or their workstations) all day waiting for trouble tickets that never come in; instead, they can be reassigned to more strategic tasks.

    Realistically, the complexities of the modern data center make the Maytag dream impossible without something that can tame a level of complexity the Maytag repairman never faced.

    That “something” is predictive analytics.

    What Is Predictive Analytics for Infrastructure?

    True predictive analytics eliminates the stress of managing infrastructure. It uses machine learning and data science to predict, diagnose, and prevent problems, drawing on data collected from the infrastructure stacks of a large base of customers. This keeps IT infrastructure running optimally, while also enabling IT to anticipate future needs.

    Customers evaluating the predictive analytics capabilities of flash storage products should understand and evaluate the following capabilities:

    1. Ability to proactively predict and prevent problems
    2. Global visibility so that data and analytics can take the complete environment into account and correlate it with data collected from all other environments
    3. An analytics-based support experience that is vastly beyond what would have been possible otherwise

    Predict and Prevent

    If you can predict, you should be able to prevent. The two should go hand-in-hand for IT infrastructure. But in order to reach this level of functionality, IT infrastructure should get smarter over time. Once the system encounters a complicated problem, it should learn from the experience so the same issue doesn’t arise again.

    A vendor may state that its flash storage environment can calculate when a drive or array will reach capacity, based on how much a customer is consuming per week. But projecting how long it will take to fill up is easy. Yes, technically it’s predictive. But what’s the true benefit? These types of capabilities are rudimentary and have existed for more than 30 years.
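    For context, that kind of “prediction” is little more than linear extrapolation, as in the sketch below; the utilization figures are hypothetical.

        # Naive capacity projection of the kind the article calls rudimentary:
        # fit the recent growth rate and extrapolate to the date the array fills.
        # The sample figures are hypothetical.
        capacity_tb = 100.0
        weekly_used_tb = [62.0, 63.5, 65.2, 66.8, 68.1]   # last five weekly samples

        growth_per_week = (weekly_used_tb[-1] - weekly_used_tb[0]) / (len(weekly_used_tb) - 1)
        weeks_left = (capacity_tb - weekly_used_tb[-1]) / growth_per_week

        print(f"Average growth: {growth_per_week:.2f} TB/week")
        print(f"Projected weeks until full: {weeks_left:.0f}")
        # Useful, but it only reacts to a single trend line -- it says nothing about
        # why usage is growing or what else in the stack is about to misbehave.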

    The ultimate goal is to achieve a truly self-healing infrastructure. Whether compute, network or storage, the problem should be identified, and often resolved, with no human intervention required and before the business is adversely affected.

    Global Visibility

    A common limitation in predictive analytics capabilities is the inability to look across the entire environment and across other customer environments when analyzing an issue. In today’s complex data center infrastructures, that’s a crippling limitation, because IT is left with the typical, frustrating circle of finger pointing: whoever made the network switch blames storage, the storage vendor blames the network, and the cycle goes on, ad infinitum.

    Take, for instance, a storage solution that only looks at itself. As mentioned above, recent research showed that 46 percent of performance-related issues originate in the storage environment. But what about the other 54 percent? Configuration errors, interoperability problems and poor IT practices are among the many other culprits.

    Instead of trying to track down the offender, it’s much easier to be handed a diagnosis of the problem with actionable insights, or better yet, automated adjustments. When predictive analytics examines granular data across the infrastructure stack – network, storage and compute – the pain of firefighting and troubleshooting is dramatically reduced. It becomes even more powerful when a particular environment can be compared against other environments to determine whether it is operating correctly. As a result, IT can focus on more strategic directives that drive business success.
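    One simple way to picture that cross-environment comparison is a fleet-wide baseline check: flag an environment whose metric sits far outside the distribution observed across all other environments. A minimal sketch with hypothetical latency figures:

        # Minimal sketch of comparing one environment against a fleet-wide baseline.
        # The latency figures are hypothetical; a real system would correlate many
        # metrics collected across the whole installed base.
        from statistics import mean, stdev

        fleet_read_latency_ms = [1.1, 0.9, 1.3, 1.0, 1.2, 1.1, 0.8, 1.0]  # other environments
        this_environment_ms = 4.7

        baseline = mean(fleet_read_latency_ms)
        spread = stdev(fleet_read_latency_ms)
        z_score = (this_environment_ms - baseline) / spread

        if z_score > 3:
            print(f"Anomalous: {this_environment_ms} ms is {z_score:.1f} std devs above the fleet")
        else:
            print("Within the range observed across comparable environments")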

    A Transformed Support Experience

    When predictive analytics are utilized appropriately, the result is a support experience unlike any other.

    Customer satisfaction hinges on the support experience. That’s why it’s imperative for businesses to get out in front of it. For years, the model has been that the customer calls the vendor when a problem persists. Technology has advanced too far, too fast for that model to remain acceptable. Companies should call customers to pre-empt problems, not the other way around.

    IT shouldn’t have to spend countless hours troubleshooting, and businesses shouldn’t have to waste time talking to level 1 and level 2 customer support before finally being sent to level 3. Companies can’t afford to go days without an answer. Yet, many vendors apparently believe this is the way life is supposed to be. The right predictive analytics system should be able to automate level 1 and 2 support and troubleshooting entirely.

    There are plenty of “me too” claims coming out of the flash market. But in many cases, these predictive capabilities are very limited in functionality and add little value to the customer.

    The data center should be experiencing the return of the Maytag spirit. Predictive analytics done right delivers better performance and reliability, and ultimately a better customer experience, from firefighting to support engagements. In short, these capabilities should act like a built-in IT admin of sorts, or better yet, IT’s own Maytag repairman.

    Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Penton.

    Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
    5:30p
    How Losses of Federal Open Data Could Affect IT and Businesses

    Brought to you by IT Pro

    Open government data is important to businesses, developers, researchers, scientists and others, who use it to map new markets, products, trends and patterns across the United States. That is why the recent deletion of federal open data content from the open.whitehouse.gov website has some people concerned.

    Since the Trump administration came into office on Jan. 20, all the Obama-era data on the open.whitehouse.gov website has been removed. Most of that data has been archived and is now available through the National Archives and Records Administration by searching and downloading files, which makes it less convenient to use. The major repository for U.S. government data, data.gov, remains online, however, and it is the storehouse for most open government data.

    Joshua New, a policy analyst for the Center for Data Innovation, a nonprofit, nonpartisan research institute, told WindowsITPro that while he doesn’t think the removal of the data from the White House site was malicious, he is concerned it may set a precedent with the new administration.

    “Open data was such a strong theme of the previous administration,” said New, noting that publishing it was standard operating procedure for the government under President Obama. The deletion of the data under the Trump administration is worrisome because it could signal that open data isn’t seen as important to the new president, he said.

    Alex Howard, the deputy director of The Sunlight Foundation, a nonpartisan, nonprofit open technology advocacy group, has a different view about the removal of the open data from the White House site. Howard told WindowsITPro that the archiving of the data from the site is part of a normal transition from one administration to another and that this particular development is not necessarily worrisome.

    “There is a strong argument that it would be in the public interest for the Trump administration to have left all of the Obama-era data on the site and then added to that data as more becomes available,” he said. That didn’t happen, though. “What we are concerned about is whether the Trump administration is going to keep providing open data … or whether they might limit access to it in the future. The biggest risk to open government data content is political and not technical.”

    Presently, the U.S. is still awaiting the approval of an open data law for the nation. The OPEN Data Act was approved in December by the U.S. Senate but the House left for its holiday recess before voting on the bill, leaving it in limbo.

    “Open data is not legally binding until Congress makes it so,” said New. “I’m not going to attribute it to malice that all the open.whitehouse.gov data has been removed, but I’m willing to bet that this is going to be more and more common the longer we go without strong leadership on this issue.”

    It is worth noting that “there is always turnover on government websites” as information is updated, replaced, added or removed, said New, but the removal of the data without any comments by the administration and without any replacement is worrisome.

    Open data refers to government data sets that are published online in machine-readable, open formats and made available for free use by commercial businesses, individuals, developers, researchers and anyone else.
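    As an illustration of what “machine readable and free to use” means in practice, here is a brief Python sketch that queries the data.gov catalog; it assumes the catalog exposes a CKAN-style search API at the URL below, so the endpoint and response fields should be treated as assumptions rather than a documented contract.

        # Sketch of programmatic access to open government data. Assumes data.gov's
        # catalog exposes a CKAN-style package_search endpoint; fields are assumptions.
        import requests

        resp = requests.get(
            "https://catalog.data.gov/api/3/action/package_search",
            params={"q": "small business lending", "rows": 5},
            timeout=30,
        )
        resp.raise_for_status()

        for dataset in resp.json()["result"]["results"]:
            print(dataset.get("title"), "|", dataset.get("organization", {}).get("title"))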

    “Anyone we talk to about open data is supportive of it,” said New. “It generates tremendous amounts of value and innovation for companies.”

    New said he is concerned that the removal of the data from the White House site could mean that the largest federal open data clearinghouse, Data.gov, which has a much deeper repository of information, could be threatened in the future. 

    “If Data.gov would go blank, that would be a malicious act, that would be a smoking gun,” said New. He wrote about his concerns recently in an op-ed piece in The Hill.

    The issue should be discussed and resolved by the new administration to end the concerns, he said. “This is an issue we need to resolve before all of this comes to a halt,” said New. “Businesses are built on data. Losing data whether it’s on Data.gov or not is super-concerning.”

    Charles King, principal analyst at research firm Pund-IT, said the removal of the open data from the White House website is something that should be watched carefully by businesses, developers and other users of the data.

    “The free exchange of information is necessary for a working democracy but it’s also a critical resource for scientists and research organizations inside and outside of government, as well as thousands of businesses,” said King. “President Trump is also reversing the eight years of pioneering open information progress achieved by the Obama administration.”

    For years, individuals and groups both in and out of government have been working to archive information before it is sequestered, said King. “Concerned businesses and IT organizations should lend their support to these efforts, vocally denounce the administration’s policies and do whatever they can to reverse them. Cheering or ignoring the situation is the equivalent of embracing a new Dark Age.”

    Another IT analyst, Rob Enderle, principal of Enderle Group, said he has mixed feelings about the removal of the data from the White House website because some government data is useful and some is biased depending on what agencies want to publicize.

    “It is a cheap source of information but unless you can adjust for any bias, the accuracy of it can vary widely both between and within sources,” said Enderle. “Personally, the choice should be accurate data or no data. Since government data often can’t be adequately validated I’d prefer that got fixed before it was made widely available.”

    This article originally appeared on IT Pro.

    6:09p
    AMD Paves Road Back to Data Center with High-Performance Naples SoCs

    AMD seemed to have all but conceded the server market to Intel after its strong push into the data center with the ARM architecture a few years ago eventually withered. Now the company is back at it, this time returning to x86 and targeting the high end of the market.

    The company on Tuesday previewed its upcoming Naples CPU, designed for cloud and dedicated corporate data centers. Naples is a System on Chip, or SoC, meaning it integrates multiple devices in a single package, including processor cores, memory controllers, and connectivity.

    In a statement, Forrest Norrod, senior VP and general manager of AMD’s Enterprise, Embedded, and Semi-Custom business unit, said the preview marked “the first major milestone in AMD re-asserting its position as an innovator in the data center and returning choice to customers in high-performance server CPUs.”

    The announcement comes on the eve of the Open Compute Summit, which kicks off Wednesday in Santa Clara, California. AMD plans to unveil the details of its new data center strategy at the event, so stay tuned for more coverage by Data Center Knowledge later this week.

    For now, here are the core features of AMD Naples:

    • A highly scalable, 32-core System on Chip (SoC) design, with support for two high-performance threads per core
    • High memory bandwidth, with eight channels of memory per “Naples” device; in a two-socket server, support for up to 32 DIMMs of DDR4 on 16 memory channels, delivering up to 4 terabytes of total memory capacity (see the arithmetic check after this list)
    • A complete SoC with fully integrated, high-speed I/O supporting 128 lanes of PCIe 3.0, negating the need for a separate chipset
    • A highly optimized cache structure for high-performance, energy-efficient compute
    • AMD Infinity Fabric coherent interconnect for two “Naples” CPUs in a two-socket system
    • Dedicated security hardware
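
    The memory figures in the list above are internally consistent, as this quick arithmetic check shows; the per-DIMM size is the implied assumption.

        # Quick consistency check of the quoted Naples memory figures.
        sockets = 2
        channels_per_socket = 8
        dimms_per_channel = 2            # implied by 32 DIMMs on 16 channels
        dimm_size_tb = 0.128             # 128 GB per DIMM, the implied assumption

        channels = sockets * channels_per_socket     # 16 memory channels
        dimms = channels * dimms_per_channel         # 32 DIMMs
        total_tb = dimms * dimm_size_tb              # 4.096 TB, i.e. "up to 4 TB"

        print(f"{channels} channels, {dimms} DIMMs, {total_tb:.3f} TB total")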

