Data Center Knowledge | News and analysis for the data center industry
Tuesday, March 1st, 2016
Alluxio: Open Source Tech Making Baidu’s Data Centers Faster 
This month, we focus on the open source data center. From innovation at every physical layer of the data center coming out of Facebook’s Open Compute Project to the revolution in the way developers treat IT infrastructure that’s being driven by application containers, open source is changing the data center throughout the entire stack. This March, we’ll zero in on some of those changes to get a better understanding of the pervasive open source data center.
In this day and age, it is all but impossible to run a successful internet business without putting the data you accumulate to work. Until about a year ago, Baidu, the web company behind the largest Chinese-language search engine and the country’s answer to Google, had a major technology problem on its hands.
The queries Baidu product managers ran against its databases took hours to complete because of the huge amount of data stored in the company’s data centers. Baidu needed a solution, and its engineers were given the goal of creating an ad-hoc query engine that would manage petabytes of data and finish queries in 30 seconds or less.
The first step was to get rid of MapReduce, the open source distributed computing framework that is part of Apache Hadoop and that has for years been the most popular framework for batch analytics. Google, the creator of MapReduce, stopped using it years ago because it couldn’t handle the amounts of data the company needed it to handle, replacing it with a new proprietary framework called Cloud Dataflow.
Read more: Google’s MapReduce Divorce Does Not Mean End of Hadoop is Near
Baidu switched to Spark SQL, a query engine that lets you run SQL queries against Apache Spark, the open source Big Data processing engine that replaces the batch analytics approach of MapReduce and Hadoop with real-time, or stream, analytics. It helped, but not to the extent the company had hoped.
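For readers unfamiliar with Spark SQL, here is a minimal sketch of what such an ad-hoc query looks like, using the Spark 1.x-era Python API that was current when this was written. The table name and Parquet path are hypothetical; this is an illustration, not anything from Baidu’s actual stack:

```python
# Minimal Spark SQL sketch (Spark 1.x API); paths and table names are made up.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="adhoc-query")   # connect to the Spark cluster
sqlContext = SQLContext(sc)                # entry point for Spark SQL

# Load a distributed dataset and expose it to SQL as a temporary table
logs = sqlContext.read.parquet("hdfs:///warehouse/query_logs")
logs.registerTempTable("query_logs")

# Run an ad-hoc aggregation across the cluster
top = sqlContext.sql(
    "SELECT query, COUNT(*) AS hits "
    "FROM query_logs GROUP BY query ORDER BY hits DESC LIMIT 10"
)
top.show()
```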
Spark SQL queries were about four times faster, but each query still took about 10 minutes to complete. Upon closer inspection, the team found that it wasn’t CPU behavior that slowed the process down. The problem was the network, and, more specifically, the way the query engine used the network to access stored data. Baidu’s data lives in multiple data centers, and to run a query, the system would have to transfer data between sites, stressing the networks and causing big delays.
In-Memory Performance With Any Storage
To solve this problem, Baidu turned to a relatively young open source project born at the University of California, Berkeley. At the time, the project was called Tachyon. The software turns a set of disparate storage systems into a single virtual storage pool accessed through a single API. More importantly for Baidu, it makes that virtual pool behave the way an in-memory system does: data being processed sits in the memory of the clustered servers processing it, which makes the process dramatically faster.
The project was recently renamed Alluxio, which is also the name of a venture capital-backed startup founded by one of its original creators. The company’s founder and CEO, Haoyuan Li, was one of the founding contributors to Spark, which also came out of the AMPLab at UC Berkeley, where he was a PhD candidate at the time.
AMPLab is a Berkeley research hub that focuses on computing challenges presented by modern-day Big Data analytics and distributed hyperscale systems. Ion Stoica, AMPLab’s co-director and co-creator of Spark, was one of Li’s PhD advisors.
Last year, Alluxio received $7.5 million in funding from Andreessen Horowitz, one of Silicon Valley’s most prominent venture capital firms. Since then, the startup has attracted distributed computing experts from Google, VMware, Palantir, and Carnegie Mellon University, as well as other AMPLab participants.
Switching to Alluxio as the underlying storage management layer for its query engine worked for Baidu. A query that took between 100 and 150 seconds to complete using Spark SQL alone now takes 10 to 15 seconds, according to a recently published case study, and that’s for data stored remotely. Similar queries across data stored on local Alluxio nodes take up to five seconds.
To get that in-memory performance, Alluxio moves frequently accessed, or hot, data from the underlying storage systems to the computing nodes it runs on.
Baidu is now in the process of deploying Alluxio more broadly, gradually moving more and more of its workloads into Alluxio clusters. Some of the first workloads to move are systems for serving images to online users and for offline image analysis. Currently, images for each system sit on separate storage systems. Alluxio will enable Baidu to store them on a single system, where they can be accessed for both online serving and offline analysis. The company expects this change to cut its development and operation costs substantially.
Storage Needs a Revolution
Baidu isn’t the only high-profile case study for Alluxio. British banking giant Barclays runs it as part of the stack used to build machine learning and data-driven applications. The open source project has enjoyed endorsements and investments from giants like China’s Alibaba Group, IBM, Intel, EMC, and its subsidiary Pivotal.
One of its most attractive features is the ability to present a mix of storage resources underneath, whether on-premises or in the public cloud, physical or virtual, spinning disk or SSD, as a single storage resource to the compute layer via one API. It does for storage what Apache Mesos and the Datacenter Operating System by the startup Mesosphere do for compute: Mesos abstracts disparate computing resources in the data center, presenting them to the application as a single computer.
Another important capability in Alluxio is tiered storage. Users can, for example, assign the top tier to the in-memory storage layer it creates on the compute side, the second tier to flash arrays, and the third tier to disk drives. Different workloads have different performance requirements, and many users may be satisfied with the performance and cost of regular disk storage in some cases while using the faster in-memory capabilities in others. Alluxio unifies it all into a single system.
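As a rough illustration of how such tiers can be declared, the snippet below sketches a worker configuration with a memory, an SSD, and a disk tier. The paths and quotas are invented for the example, and the exact property names may vary between Alluxio releases, so treat this as an assumption-laden sketch rather than a recipe:

```
# Illustrative alluxio-site.properties snippet; paths and quotas are hypothetical.
alluxio.worker.tieredstore.levels=3
alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
alluxio.worker.tieredstore.level0.dirs.quota=16GB
alluxio.worker.tieredstore.level1.alias=SSD
alluxio.worker.tieredstore.level1.dirs.path=/mnt/ssd
alluxio.worker.tieredstore.level1.dirs.quota=200GB
alluxio.worker.tieredstore.level2.alias=HDD
alluxio.worker.tieredstore.level2.dirs.path=/mnt/hdd
alluxio.worker.tieredstore.level2.dirs.quota=2TB
```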
It’s not limited to Spark. Any framework can access data on any storage system, Li said.
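As a hedged sketch of what that access looks like from Spark’s Python API, the example below reads a file through Alluxio. The master hostname and file path are placeholders, and it assumes the Alluxio client library is on Spark’s classpath:

```python
# Reading data through Alluxio from Spark (Spark 1.x-era API); the hostname and
# path are hypothetical, and the Alluxio client jar must be on the classpath.
from pyspark import SparkContext

sc = SparkContext(appName="alluxio-read")

# 19998 is Alluxio's default master port; the alluxio:// scheme routes reads
# through Alluxio's virtual file system instead of going straight to HDFS.
lines = sc.textFile("alluxio://alluxio-master:19998/datasets/events.log")
print(lines.count())
```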
He declined to go into detail about his startup’s business plan, saying Alluxio the company was still in stealth mode. “We have a direction,” he said. “We cannot share it for now.”
At the moment, all the focus is on making the open source technology better. Spark revolutionized computation frameworks, while Mesos is revolutionizing the way data center resources are managed, Li said. “We’re missing revolution in the storage layer.”
Half-Baked Government Consolidation Causes Cybersecurity Headaches: Report  By The WHIR
Consolidation and modernization processes that should improve the cybersecurity of federal IT departments are actually doing the opposite while transitions remain incomplete, according to research released Tuesday by SolarWinds.
The third annual SolarWinds Federal Cybersecurity Survey also reveals that foreign governments have caught up with careless or untrained insiders as a security threat. Foreign governments and insiders were each cited as the top threat by almost half (48 percent) of the 200 IT and IT security professionals surveyed by SolarWinds, most of them in federal government and military positions. Ten percent more federal IT professionals consider foreign governments a top threat than did in 2015, while concern about insiders dropped five percent.
A reflection of this change is a growing concern about the sophistication of attacks, with 44 percent saying it has increased agency vulnerability. By contrast, only 26 percent said the same of attack volume, and 24 percent said end-user policy violations are an increasing vulnerability.
Read more: Ten Key Figures from Latest Progress Report on US Government IT Reform
Consolidation and modernization processes are increasing IT security challenges, according to 48 percent of respondents, with 48 percent saying the transitions are incomplete, 46 percent blaming the complexity of enterprise management tools, and 44 percent pointing to a lack of familiarity with new systems. Cloud adoption is seen as increasing challenges by 35 percent.
“As federal IT departments move through the process of consolidation and modernization, the complexity of IT environments increases significantly and the responsibility of managing both legacy infrastructure and upgraded systems places a considerable burden on IT pros,” said Mav Turner, director of product strategy at SolarWinds. “When completed, consolidation and modernization projects will provide more efficient and secure environments, but this isn’t going to happen overnight, so additional attention must be given to securing environments against threats no matter where they originate.”
One in five federal IT professionals said consolidation and modernization have decreased IT security challenges. Replacing legacy software and replacing legacy hardware are each seen as a benefit by over half of that group, while 42 percent said simplified administration and management are decreasing challenges.
Read more: IBM, HPE: Government Cloud Security Process Broken
While the reported obstacles to IT security are the usual mix of factors like internal environment complexity and competing priorities, led by budget constraints, the number of respondents blaming their budgets has dropped by over 10 percent since 2014. This both indicates how seriously federal organizations are treating cybersecurity and suggests that the funds are there for service providers to help those organizations meet the challenge.
Other results from the survey confirm that budgets are increasing, and the organizational challenge will be selecting the right security tools. Solutions designed for the federal government, like CenturyLink’s recently launched FedRAMP compliant IaaS offering, could be widely adopted if they can persuade federal clients that the transitions will be completed securely.
Smart card and common access card solutions that provide two-factor authentication are considered the most valuable security product by those surveyed, followed by identity and access management tools. The mean number of security products used is 5.35, with significant variation between products in both number of deployments and perceived effectiveness.
Market Connections Inc.’s director of research services said it is a positive sign that 28 percent of respondents feel more secure, even as 38 percent noted an increase in IT security incidents.
This first ran at http://www.thewhir.com/web-hosting-news/half-baked-government-consolidation-causes-cybersecurity-headaches-report
Ask These Five Questions for Data Protection Peace of Mind
Eran Farajun is Executive Vice President of Asigra.
The growing complexity of today’s enterprise computing environment means critical corporate data is stored in increasingly fragmented and heterogeneous infrastructures. Ensuring all this decentralized data is backed up in case of breach or disaster is a major cause of anxiety for both business executives and senior IT professionals.
That’s because comprehensive data protection is really not core to most people’s jobs – most of you have other things to worry about, and you just hope and pray that the systems you’ve implemented have backed up your data and will recover it in case of a disaster. But you’ve got your fingers crossed because you’re really not that confident that they will.
According to Jason Buffington, principal analyst for data protection at ESG, improving data backup and recovery systems has been a top-five IT priority and area of investment for the past several years. That’s because continually evolving computing infrastructures and production platforms are forcing companies to reexamine their data protection strategies. “When an organization goes from 30 percent virtualized to 70 percent, or from on-premises email servers to Office 365 in the cloud, these evolutions to your infrastructure drive the need to redefine your data protection strategy,” says Buffington. “Legacy approaches for data protection can’t protect all of the data in these more complex environments.”
It’s Complicated
How concerned should you be about your existing data protection solution? Let’s explore the complexity of today’s average computing environment to find out.
Chances are good you have multiple virtualization technologies operating within your infrastructure, including VMware, Hyper-V, and KVM. You may have a data protection solution for one of your hypervisors, or maybe two. But in a data loss event, you’ll lose the data in VMs that you haven’t protected. The same is true of Docker containers. It’s certainly possible, but not trivial, to protect the data in containers. However, if you haven’t deployed a data protection system specifically for them, your data in containers isn’t backed up and won’t be recoverable.
If you’re using convenient, cloud-based apps like Google Docs, Salesforce.com, and Office 365, then you probably know that you can pay the vendor to back up your data. But if the system goes down, as it did for Office 365 in late January, the backup that you’re paying for and counting on could be in the same data center — and maybe on the same servers – that suffered the outage. Then you’re stuck trying to perform data recovery from dead equipment.
Up to 70 percent of enterprise employees use endpoint devices such as laptops, smartphones, and tablets to access corporate data. While these devices may contain some of an organization’s most critical data, implementing a comprehensive data protection plan for multiple endpoints running multiple OSs isn’t child’s play. What happens at your company when a laptop is lost or stolen? Do you have a way to retrieve that data if the device hasn’t been backed up recently? Does your current data protection system geo-locate the missing device and have the ability to perform a remote wipe? If not, no wonder you don’t feel confident about your data protection strategy.
“Today’s data center is anything but simple,” says Marc Staimer, president of Dragon Slayer Consulting. “There are so many different kinds of data being created every day, on myriad platforms, operating systems, in containers, in the cloud, and they each have their own means of data protection. It makes data protection seem like the Wild Wild West, a chaotic free-for-all. And most people don’t realize how bad it is until they have a small data loss event. And then the worrying begins: ‘If I wasn’t able to recover that data, what else isn’t protected?’”
Protecting Data in Complex Environments
The traditional approach to protecting data across multiple applications and disparate platforms is to deploy multiple point solutions. This can quickly lead to significant business impacts.
First, there’s cost: The more data protection systems you run, the more you will pay in licensing. Also, in a complex environment, it’s inevitable you’ll have overlapping stores of the same data, so you’ll end up paying to protect it multiple times. This may not seem like a significant pain point – until you are forced to recover that same data multiple times, overwriting earlier recoveries and destroying any new data that’s been added to that recovery. Management of multiple systems is another consideration: keeping current with multiple trainings, methodologies, fixes, patches, updates, and upgrades just adds more cost and complexity for IT. And it’s inherently difficult to recover data from a patchwork of data protection solutions, lengthening Recovery Time Objectives (RTOs) and challenging Recovery Point Objectives (RPOs), and slowing business continuity and disaster recovery efforts after outages.
“Organizations now must have their critical production workloads and critical data available immediately after a data loss event. In today’s ‘no downtime’ world, organizations large and small need to explore a comprehensive data protection solution,” says Staimer. “Businesses need instant-on access to their data, and that requires not just backup, but also replication.”
The complexity of today’s environments requires a more comprehensive approach to data protection, one that converges both backup and replication technologies into a single, easily managed solution. A comprehensive solution backs up data from any source – whether in the data center, the cloud, a virtual environment, or on an endpoint device – and stores this data not only locally but also remotely in the cloud to ensure full data replicability in case of a natural disaster, hardware failure, data breach, or malicious attack.
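To make the local-plus-cloud idea concrete, here is a minimal, generic Python sketch of the pattern, not any particular vendor’s product: it archives a directory to local backup storage and replicates the same archive to a cloud object store. The bucket, paths, and prefix are hypothetical, and it assumes the boto3 library is installed and AWS credentials are configured.

```python
# Generic sketch of keeping both a local and an off-site (cloud) copy of a backup.
# All names and paths are placeholders; boto3 and AWS credentials are assumed.
import shutil
from datetime import datetime

import boto3


def backup_and_replicate(source_dir, local_backup_dir, bucket, prefix):
    stamp = datetime.utcnow().strftime("%Y%m%dT%H%M%S")

    # 1. Local copy: compress the source directory onto on-premises backup storage.
    archive_path = shutil.make_archive(
        f"{local_backup_dir}/backup-{stamp}", "gztar", source_dir
    )

    # 2. Remote replica: upload the same archive to cloud object storage.
    s3 = boto3.client("s3")
    s3.upload_file(archive_path, bucket, f"{prefix}/backup-{stamp}.tar.gz")
    return archive_path


# Hypothetical usage:
# backup_and_replicate("/var/lib/app", "/backups", "dr-backups-example", "app")
```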
“Searching for a single technology that offers both a rigorous on-premises data protection solution and the ability to easily replicate data stored in the cloud can force a reconsideration of vendors,” says Buffington. “Couple that with the fact that primary data is growing over 25 percent year over year, that data protection storage is growing 40 percent annually, and that data protection budgets are only growing four to six percent yearly, and it means that you can’t keep doing what you’re doing because it doesn’t work anymore.”
A Vendor Checklist
For confidence that your data is fully protected, you may need to update your existing data protection to a comprehensive backup and replication technology. Don’t be embarrassed to ask for help: If it were simple to deploy such a solution, you’d have already done it. Bringing in a data protection specialist can give you the confidence to uncross your fingers.
Here are some questions to ask vendors as you narrow your search:
- Does the proposed data protection solution provide protection of data residing in both physical and virtual servers (including containers)?
- Can I protect all forms of data from multiple sources across the enterprise in a centralized data repository?
- In the event of a disaster, how quickly can I access critical applications and resume business operations?
- Can I use the proposed solution to back up and replicate data off site to a secure third-party location for disaster recovery purposes?
- Does the solution support security protocols, such as NIST FIPS 140-2 Certification, which is mandatory for regulatory compliance in some industries?
What are the top data protection requirements for your business? What other questions would you ask?
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
IT Innovators: Gamification Company Bets on Containers  By WindowsITPro
Bunchball, a company that offers gamification as a service to help drive engagement, customer loyalty and more for organizations like Applebee’s, Salesforce and Ford Canada, recently decided to transform its infrastructure from a manually configured platform to an auto-scaling, all-container solution.
The company had relied on virtual machines in the past, but faced some obstacles due to VMs’ relatively static nature. The company wanted to make production changes in a more timely manner, explains Joe Schneider, DevOps engineer for Bunchball. “It used to take up to two months, but now we can make these changes in a week,” he says.
Containers, which package an application’s files and its dependencies, are a more portable option than VMs, which include all operating system files. The challenge of finding a more flexible and portable solution prompted the Bunchball team to reconfigure its infrastructure. The team started by focusing on its main app and then slowly transitioned new features piece by piece. “We realized we couldn’t run our app the way we’re running it forever,” Schneider says. “The plumbing became a lot more dynamic, and we had to rethink how to have the compute layer talk to the data layer,” Schneider adds. “With a container app, we could have things up and running in minutes.”
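As a generic illustration of that portability argument, and not Bunchball’s actual setup, a container image declares only the application and its dependencies on top of a shared base image; the operating system kernel comes from the host. The service, port, and commands here are hypothetical:

```dockerfile
# Hypothetical Node.js service image: application files and dependencies only;
# there is no full guest operating system the way a VM image would carry one.
FROM node:4

WORKDIR /app

# Bake the app's dependencies into the image
COPY package.json .
RUN npm install --production

# Add the application code itself
COPY . .

EXPOSE 8080
CMD ["node", "server.js"]
```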
The best approach for Bunchball’s needs, according to Schneider, involved a slow transition to containers. “We had to keep the old stuff running while we were building a new airplane right next to the one already in flight,” Schneider says, explaining that he and his team would make a new piece of infrastructure and make changes one piece at a time. “There were really no drastic changes all at once,” he says.
Schneider says that getting the easiest parts done first was instrumental to the team’s success. This approach gave the team extra time to build the experience needed to tackle the more difficult challenges more efficiently. He warns that if you recreate your applications all at once with containers, you’ll have to recreate all the plumbing on the spot and automate it, which could turn out to be much more work than anticipated.
The move also required some careful and strategic planning when it came to internal talent. Assigning certain individuals to become dedicated experts helped build leadership and train other employees as needed. “We really adopted a mentor approach, coupled with ongoing training and online documentation that was always available to reference, which worked fairly well,” Schneider says.
Another important lesson learned, according to Schneider, was to keep expectations in line. “A lot of the hype around doing orchestration with containers is that you get higher resource utilization,” he says. “Well, that won’t happen for a long time; it’ll probably be at least a year before you see savings,” he says. He emphasizes that improvements will come with time and reminds individuals interested in a similar mission to keep the bigger picture in mind. “In the end, it’ll most certainly be worth it,” Schneider says.
Renee Morad is a freelance writer and editor based in New Jersey. Her work has appeared in The New York Times, Discovery News, Business Insider, Ozy.com, NPR, MainStreet.com, and other outlets. If you have a story you would like profiled, contact her at renee.morad@gmail.com.
The IT Innovators series of articles is underwritten by Microsoft, and is editorially independent.
This first ran at http://windowsitpro.com/it-innovators/it-innovators-gamification-company-places-its-bets-containers
Cisco Expands Cloud and Hyperconverged Infrastructure Play
Cisco made two major data center announcements Tuesday, unveiling an agreement to acquire IT orchestration startup CliQr Technologies and rolling out a new line of hyperconverged infrastructure systems. Both the acquisition and the new product line leverage Cisco’s existing data center technologies.
CliQr, the San Jose, California-based startup Cisco is buying for $260 million cash, has integrated its orchestration platform with Cisco’s Application Centric Infrastructure software, its flagship data center network automation technology, and Unified Computing System, its pre-integrated compute, networking, and virtualization IT package.
The new hyperconverged product line, called HyperFlex, adds a storage management layer to UCS, which turns it from a converged infrastructure solution into a hyperconverged one. Both converged and hyperconverged infrastructure concepts are attempts to unify disparate data center hardware into a single easy-to-manage environment, with the main difference being that hyperconverged systems add Software Defined Storage, which turns clusters of commodity x86 servers into automated storage systems using software.
Read more: Why Hyperconverged Infrastructure is so Hot
Cisco’s HyperFlex consists of UCS and hyperconverged infrastructure software by a company called Springpath.
CliQr gives Cisco a piece of technology that helps customers switch faster to IT environments that take advantage of cloud infrastructure – public, private, or hybrid. It orchestrates the underlying IT resources, regardless of type, to support the applications at hand.
A user sets up a single application profile, and CliQr ensures it can be deployed in a way that’s consistent with the IT department’s access control and security policies.
The platform is already integrated with UCS and ACI, but Cisco plans to integrate it further across its data center technology portfolio.
Read more: Five Myths about Hyperconverged Infrastructure
Colocation Data Center News Roundup: IO, Sentinel, CoreSite
IO to Build Three-Story Data Center in Phoenix
 Data center modules now manufactured by BaseLayer inside an IO data center. (Photo: IO)
IO is planning to expand data center capacity in its native Phoenix market. The company has bought nine acres of land close to its existing data center there, where it plans to build a three-story facility that will be gradually populated by its famous data center modules, fabricated locally.
The company bought the land in February and plans to start construction in late 2016, according to a press release.
The modules will be produced by BaseLayer, which is the second of two companies the former IO was split into last year. BaseLayer is a technology company, producing and designing modular data centers and data center management software, while IO is continuing as the data center colocation provider.
Sentinel Kicks Off Big North Carolina Data Center Expansion
 On-site electrical substation at Sentinel’s North Carolina data center. (Photo: Sentinel Data Centers)
Sentinel Data Centers has started construction on the second phase of its Durham, North Carolina, data center. The project will add 120,000 square feet total, with 50,000 square feet of data center space and 10MW of power.
The 100,000-square-foot first phase of the facility is nearing capacity. Sentinel has already pre-leased some of the space in the future second phase, expected to come online this summer.
The company has been touting the low cost of operating data centers in North Carolina, highlighting the state’s sales tax exemption on IT purchases by data center tenants and power rates below $0.042 per kWh.
CoreSite Scores another Big Tenant in Santa Clara
 CoreSite’s SV4 data center in Santa Clara, California. (Photo: CoreSite)
CoreSite Realty Corp. announced that a Fortune 500 company has pre-leased the entire second phase of a big new data center it is building on its Silicon Valley data center campus in Santa Clara.
The customer, whose name CoreSite did not disclose, has agreed to take the entire 80,000 square feet of data center space in the 230,000-square-foot SV7 building’s second phase, which the data center provider expects to complete by July. About half of the similarly sized first phase has been pre-leased too.
Silicon Valley has been an especially hot data center market recently, with a constant shortage of supply and plenty of demand. This is the third recent announcement of a big customer win in Santa Clara by CoreSite. The company said last year it was also building a separate 140,000-square-foot data center for a single customer.
CoreSite hasn’t disclosed any names of the companies it signed the recent leases with, but a report by the commercial real estate company North American Data Centers said at least two of them were Uber and Amazon. Uber, the report said, signed for 4MW of capacity, and Amazon Web Services signed for 130,000 square feet of data center space.
Rackspace Shifts 90 Employees Away from Public Cloud Department  By The WHIR
Rackspace is in the process of re-assigning 90 of its employees who work in its public cloud department to faster growing areas of the company, like private and hybrid cloud.
According to a report by the San Antonio Business Journal on Tuesday, it is undetermined whether these employees will be laid off, but Rackspace said that the company regularly shuffles employees, which it calls Rackers, to “fast-growing areas” of its business “and may from time to time eliminate some roles in areas” it chooses to reduce investment. The company has more than 6,000 employees.
Rackspace said it is placing employees from public cloud marketing and engineering into private and hybrid cloud computing departments in preparation for a slowdown in new signups for its OpenStack public cloud service as more new public cloud workloads head toward AWS and Azure.
Read more: What’s Behind Rackspace’s Private OpenStack Cloud Partnership With Red Hat
In an email to The WHIR, a Rackspace spokesperson said: “At Rackspace, we regularly align Rackers to fast-growing areas of our business and may from time to time eliminate some roles in areas where we choose to reduce our investment. We help Rackers, whose roles are eliminated, try and find new roles within the company and many do so. We anticipate that our 6,000-plus Racker workforce will continue to grow this year.”
The public cloud market has been unkind to companies that challenge AWS and Azure, with Verizon being the latest firm to duck out of the running by shuttering its public cloud service. In the last year, Rackspace has shifted its focus to partnerships, such as its recent partnership with Red Hat, which help it offer clients a hybrid cloud solution. In October, Rackspace began offering support for AWS, noting increased customer demand for such a service.
Rackspace CEO Taylor Rhodes told investors on a recent earnings call that its OpenStack private cloud is growing in the “high double digits.”
Despite the restructuring, Rackspace told investors that it expects its workforce to grow this year.
This first ran at http://www.thewhir.com/web-hosting-news/rackspace-shifts-90-employees-away-from-public-cloud-department