Data Center Knowledge | News and analysis for the data center industry - Industr's Journal
Wednesday, September 9th, 2015
| Time |
Event |
| 11:00a |
Cloudera Aims to Replace MapReduce With Spark as Default Hadoop Framework Looking to tie the Apache Spark in-memory computing framework much closer to Apache Hadoop, Cloudera today announced it is leading an effort to make Spark the default data processing framework for Hadoop.
While IT organizations will be able to continue to layer other data processing frameworks on top of Hadoop clusters, the One Platform Initiative is making a case to essentially replace MapReduce with Spark as the default data processing engine, Matt Brandwein, director of product marketing for Cloudera, said.
Most IT organizations consider MapReduce to be a fairly arcane programming tool. For that reason, many have adopted any number of SQL engines as mechanisms for querying Hadoop data.
Google publicly announced last year that it had stopped using MapReduce because it was inadequate for its purposes, replacing it with its own framework called Dataflow. The company launched Dataflow as a beta cloud service earlier this year.
When it comes to building analytics applications that reside on top of Hadoop, the Spark framework has been enjoying a fair amount of momentum.
Brandwein noted that there are at least 50 percent more active Spark projects than there are Hadoop projects. The One Platform Initiative would in effect formalize what is already rapidly becoming a de facto standard approach to building analytics applications on Hadoop.
“We want to unify Apache Spark and Hadoop,” he said. “We already have over 200 customers running Apache Spark on Hadoop.”
Cloudera, claimed Brandwein, has five times more engineering resources dedicated to Spark than other Hadoop vendors and has contributed over 370 patches and 43,000 lines of code to the open source stream analytics project. Cloudera also led the integration of Spark with YARN for shared resource management on Hadoop, as well as integration efforts involving SQL frameworks such as Impala, messaging systems such as Kafka, and data ingestion tools such as Flume.
The long-term goal, said Brandwein, is to make it possible for Spark jobs to scale simultaneously across multi-tenant clusters with over 10,000 nodes, which will require significant improvements in Spark reliability, stability, and performance.
Cloudera, he added, is also committed to making Spark simpler to manage in enterprise production environments and ensuring that Spark Streaming supports at least 80 percent of common stream processing workloads. Finally, Cloudera will look to improve Spark Streaming performance in addition to opening up those real-time workloads to higher-level language extensions.
Exactly how much support for this initiative Cloudera has remains to be seen. The company, for example, has long-standing relationships with both Intel and Oracle. The rest of the IT industry at this juncture appears to be more committed to the Hadoop distribution put forward by Cloudera’s rival Hortonworks. | | 12:00p |
Optimizing Cloud Resources: the Requirements and the User In a cloud environment, administrators are still using physical resources to deliver their workloads to the end points. These resources may be located at a nearby data center or somewhere offsite. The most important fact to remember is that these resources must be properly watched over and managed because they are finite. As mentioned earlier, poor resource provisioning will result in a Band-Aid effect, where administrators simply pump more RAM, storage or bandwidth into an environment without fixing the original issue: improper cloud resource balancing.
When deploying a cloud-ready data center, engineers must plan out their environment and properly size as well as balance their resources. This means understanding the following components:
- Current user count. The only way to properly size and balance a system is to establish the number of users that will be accessing the cloud infrastructure immediately. This can be a department, a corporate division or an entire branch office. By understanding the immediate need of the cloud, administrators are able to plan for baseline requirements. When user count is established, plans can begin for proper resource provisioning. Here, RAM, CPU, storage and WAN requirements are calculated based on the number of users accessing the environment at any given time, and the workloads that they will be launching.
- Future user count. One of the most important planning phases in any cloud environment is forecasting for future usage. This means working directly with business partners to understand organizational demands for growth and expansion. If an administrator knows that there will be a new acquisition around the corner, they will size their cloud environment for growth. This could mean having a spare blade chassis available for more users, or having additional resources prepared for a spike in user count. This also means planning for capacity needs. For example, if a cloud-based storage controller is purchased only for “now” demands, future usage spikes could potentially cripple performance for any user trying to access the workload. When forecasting for the future, it’s important to size every component in the cloud environment properly. This way, as user counts increase, administrators are able to balance the additional users equally among available resources.
- WAN requirements. The ability to quickly and efficiently deliver workloads over the WAN will be crucial to the success of a cloud deployment. Special considerations must be made depending on the environment. Some organizations will have multiple different links connecting their cloud environment for proper load-balancing and HA. Although each environment will have its own needs, there is a good set of best practices which can be followed for a respective site type:
  - Major cloud datacenter: This is a central cloud computing environment with major infrastructure components. Hundreds or even thousands of users would be connecting to this type of environment. It can host major workload operations where workers from all over the world would connect and receive their data. The requirements here involve very high bandwidth and very low latency.
    - Recommendations: MPLS, optical circuits, or carrier Ethernet services.
  - Branch cloud datacenter: This is usually a smaller, but still sizeable cloud environment. This infrastructure would be used to house secondary, but still vital cloud systems. Here, administrators may be working with a few cloud-delivered workloads which need to be distributed to a smaller number of users. In this type of datacenter, requirements call for moderate bandwidth availability with the possible need for low latency.
    - Recommendations: MPLS or a carrier Ethernet service.
  - Small cloud datacenter for DR or testing: This is a small cloud datacenter with only a few components. Many times small distributed datacenters are used for testing and development or for smaller DR purposes. Requirements in this environment call for low bandwidth but may still need low latency and the option for mobility.
    - Recommendations: MPLS over T1/DSL, broadband wireless options, or Internet VPNs.
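The sizing steps described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor formula: the per-user figures, the concurrency fraction, the headroom factor and the growth rate are all hypothetical inputs that an administrator would replace with measured values from their own environment.

```python
import math
from dataclasses import dataclass


@dataclass
class PerUserProfile:
    """Hypothetical per-user resource demand; replace with measured values."""
    ram_gb: float
    vcpus: float
    storage_gb: float
    wan_mbps: float


def size_environment(users, profile, concurrency=1.0, headroom=0.0):
    """Return aggregate requirements for a given user count.

    concurrency: assumed fraction of users active at the same time (0 to 1).
    headroom: assumed extra fraction reserved for spikes (0.25 = 25 percent).
    """
    active = math.ceil(users * concurrency)
    scale = active * (1.0 + headroom)
    return {
        "active_users": active,
        "ram_gb": math.ceil(scale * profile.ram_gb),
        "vcpus": math.ceil(scale * profile.vcpus),
        # Storage is provisioned for every user, not only the active ones.
        "storage_gb": math.ceil(users * (1.0 + headroom) * profile.storage_gb),
        "wan_mbps": math.ceil(scale * profile.wan_mbps),
    }


def forecast_users(current_users, annual_growth, years):
    """Compound-growth forecast of the user count (growth rate is an assumption)."""
    return math.ceil(current_users * (1.0 + annual_growth) ** years)


if __name__ == "__main__":
    profile = PerUserProfile(ram_gb=4, vcpus=0.5, storage_gb=50, wan_mbps=1.5)
    today = size_environment(200, profile, concurrency=0.5, headroom=0.25)
    future_users = forecast_users(200, annual_growth=0.5, years=2)
    later = size_environment(future_users, profile, concurrency=0.5, headroom=0.25)
    print("current:", today)
    print("forecast users:", future_users)
    print("future:", later)
```

Running the same calculation for both the current and the forecast user count makes the gap between "now" demands and future demands explicit, which is the gap the spare chassis or extra storage capacity is meant to cover.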
Remember, your data center must adapt to new kinds of technologies including mobility, consumerization, and now IoT. It’s critical to create interconnected environments capable of sharing resources to help the user, and the business, be most productive. New kinds of link aggregation services, user optimizations, and even virtual technologies are directly impacting how we control major data center points and remote branch locations as well. The key point to understand is that it’s becoming easier to control these environments. Cloud computing brings distribution of data. It’s up to the administrator and the data center to properly control this data and optimize the delivery.
Be ready for user spikes. Be ready for new challenges around workload and application delivery. Most of all, be ready for a new kind of cloud architecture that’s designed to optimize resources and the overall user experience. We’re moving towards an age where automation and orchestration help drive many data center and cloud components. But that still means that you must properly plan and align your physical resources. Poor data center resource utilization can take down even the best cloud strategy. Ensure that your technology solutions and your business are always aligned. | | 1:00p |
Google Partners with CDNs to Lower Cloud Prices for Users Who Cache Content At “The Edge” Google has partnered with four Content Delivery Network providers to slash bandwidth costs for users of its cloud infrastructure services. Customers who use CloudFlare, Fastly, Highwinds, or Level 3 CDN services together with Google Cloud Platform will pay less for in-region egress traffic from their cloud environments.
Put simply, if you use CloudFlare’s CDN, for example, to serve files from your Google cloud VMs to customers in one of the regions covered by the partnership, your data transport costs out of the Google data center that hosts the VMs will be lower than usual.
The move appears to be about sharing the burden of moving increasingly large content and web application files from data centers to users more evenly between Google and companies that serve that content and those apps.
CDNs store frequently accessed files close to population centers where those files are in high demand, reducing the cost of data transport and improving performance for the users. Google has a highly distributed data center network (70-plus Points of Presence in 33 countries), but CDNs have much wider geographic reach.
If a Google cloud customer serves content from a Google data center to a user in a different state, Google has to pay to move that data across a long distance to a local Internet Service Provider that serves that user. But if the customer pays a CDN provider to store copies of that content in a data center that’s already on that ISP’s network, Google doesn’t have to move the data over that long distance every time the user asks for it.
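The arithmetic behind that argument can be sketched as follows. The function below is an illustration only: the per-GB rates and the cache hit ratio are hypothetical inputs, not Google's published prices, and the CDN provider's own fees are left out of the comparison.

```python
def monthly_egress_cost(total_gb, cache_hit_ratio, standard_rate, cdn_rate):
    """Compare origin egress cost with and without a CDN in front.

    total_gb: data delivered to end users in a month.
    cache_hit_ratio: assumed fraction served from the CDN cache (0 to 1).
    standard_rate, cdn_rate: hypothetical prices per GB for regular egress
    and for discounted egress to the CDN.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    without_cdn = total_gb * standard_rate
    # Only cache misses have to travel from the cloud data center to the CDN.
    origin_gb = total_gb * (1.0 - cache_hit_ratio)
    with_cdn = origin_gb * cdn_rate
    return {
        "without_cdn": without_cdn,
        "origin_gb": origin_gb,
        "with_cdn": with_cdn,
    }


if __name__ == "__main__":
    # Placeholder numbers chosen for easy arithmetic, not real pricing.
    print(monthly_egress_cost(1000, 0.75, 0.5, 0.25))
```

The saving comes from two sources in this model: fewer gigabytes leave the origin because cached files are served at the edge, and the gigabytes that do leave are billed at the lower rate.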
The program is called CDN Interconnect. “CDN Interconnect’s special egress pricing should encourage the best practice of regularly distributing content originating from Cloud Platform out to the edge close to your end-users,” Ofir Roval, product manager for Google Cloud Platform, wrote in a blog post.
Google will provide private network links between its cloud data centers and the CDNs.
Storing content at “the edge” of the internet is becoming increasingly important as users watch more and more video online, and as companies rely more and more on business applications delivered as web services. Demand for this content and services in markets far removed from the traditional “tier-one” metros, such as New York, Silicon Valley, or Dallas, is on the rise, driving rapid build-out of data centers in those tier-two markets, where content providers and CDNs cache popular files and exchange traffic with long-haul carriers and ISPs, or “eyeball” networks.
Users who want to take advantage of Google’s egress traffic discounts have to work with their CDN providers to deploy CDN Interconnect for their applications. The CDNs will provide lists of locations that apply. | | 3:00p |
Three Tips for Surviving Today’s Complex Data Landscape John Whittaker is Executive Director of Information Management for Dell Software.
The volume of data is growing exponentially as various channels and sources create large and complex data sets. As the variety of data grows, more vendors have emerged on the scene with solutions to help manage, analyze and ultimately turn that data into value. Amid the chaos of an increasingly complex big data landscape, organizations hope to bring sense to various types of data, especially unstructured data. At the same time, traditional data sets still remain a foundational piece of business environments. Thus, understanding how to maintain and leverage both structured and unstructured data from new sources has become a challenge.
With these changes, database administrators (DBAs) are forced to adapt to handling these growing sets of data and data sources, while continuing to operate efficiently and productively. It’s vital for DBAs to keep in mind a few key tips in order to acquire and implement the new skills necessary to perform in this changing data ecosystem.
Make Structured Data a Priority
According to the results of a Unisphere Research survey of 300 database administrators (DBAs) and others charged with managing corporate data, structured data in relational database management systems (RDBMS) remains the foundation of the information infrastructure in most companies. While the industry is constantly discussing the volume of unstructured data as big data sources explode, it’s vital to keep in mind that structured data still plays the primary role within organizations. In fact, according to the survey, structured data still makes up 75 percent of data under management for more than two-thirds of organizations, with nearly one-third of organizations not yet actively managing unstructured data at all. For DBAs managing data across RDBMS technologies, even with the growth of big data and the role it plays in today’s industry landscape, insights from structured data are driving key operational processes, so it still needs to remain top of mind.
Integrate Analytics to Predict Business Change
At the same time, preparing for the continued growth of big data is key to an organization’s successful information management strategy. With the complexity of today’s information infrastructure, remaining ahead of the big data trajectory will help organizations generate effective results and remain competitive. Predictive analytics has become an increasingly relevant need as CIOs and their database management teams come face-to-face with increasingly complicated data environments.
The need to support new analytical use cases is the most important factor driving adoption of new database management systems, according to the Unisphere survey. Using data analytics to make sense not only of previous events but also of future situations is critical for database administrators to gain the knowledge necessary to keep their organizations one step ahead.
Leverage the Proper Tools
Staying successful in an industry that continues to advance means obtaining and integrating the right tools. Even with the escalation of big data, according to the Unisphere survey, Oracle and Microsoft SQL Server – often called the most traditional databases – remain the most common platforms organizations use to support key company data. As the industry shifts to make room for the growth of big data entering the industry scene, it’s critical to recognize which tools enable a productive and efficient database management team.
The rise of other databases, including Hadoop and NoSQL, fills a necessary role in managing the influx of unstructured data. However, many companies have not yet matured to the point of implementing them within their data management plans, and default to the proven traditional database systems. Creating an effective and results-oriented information management landscape means understanding your environment and integrating the various tools and skill sets needed to make sense of all areas of the ecosystem – traditional or modern. Doing so will help businesses focus on producing insights that improve operations and drive revenue.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library. | | 4:56p |
Microsoft Confirms Acquisition of Cloud Application Security Firm Adallom
This article originally appeared at The WHIR
Microsoft has acquired cloud application security company Adallom, the companies announced Tuesday. Adallom’s SaaS identity solution will be integrated into Microsoft’s security portfolio to enhance its capabilities on-premise and across multiple clouds. Several reports have pegged the deal around the $250 million mark.
The acquisition was initially reported in July as a $320 million purchase, shortly after Microsoft acquired Aorato, another Israeli security company.
Cloud access security broker technology from Adallom will be available for Office 365 and the Enterprise Mobility Suite, including Microsoft Advanced Threat Analytics. The solution also works with Salesforce, Box, Dropbox, ServiceNow, Ariba, and other popular cloud applications. Its integration with Dropbox was announced in June.
Adallom was founded in 2012 by Assaf Rappaport, Ami Luttwak, and Roy Reznik, and blogs on the announcement from both companies seem to imply that the whole team will be integrated into Microsoft along with Adallom’s software offering.
Microsoft has been busy with cloud-related announcements, including the launch of new Azure VMs last week, a new Federal Aviation Administration contract for cloud services, and the IT preview of SharePoint Server 2016 in August.
Microsoft Ventures entered into a partnership a year ago to provide tools and funding for cybersecurity startups in Israel. When PayPal acquired Israeli cybersecurity startup Cyactive in March, it also announced the establishment of a cybersecurity facility in the country, which has become a hotbed for new companies in online and network security. BlackBerry did the same a month later when it acquired WatchDox.
This first ran at http://www.thewhir.com/web-hosting-news/microsoft-confirms-acquistion-of-cloud-application-security-firm-adallom | | 5:43p |
Better Insight into Cooling Efficiency Can Help Defer Data Center Expansion Because many IT organizations have no real way of knowing how well the cooling systems in their data center are actually working, there is a tendency to compensate by acquiring more data center space than necessary. In fact, Cliff Federspiel, president and CTO of Vigilent, an efficiency tooling vendor, said that as much as 40 percent of data center capacity is wasted because IT organizations are too conservative when assessing capabilities of their existing cooling systems.
At the Data Center World conference this month in National Harbor, Maryland, Federspiel will present on using wireless sensors to identify areas where additional data center capacity can be brought online without exceeding the cooling capacity of the existing facility.
“When it comes to the data center, the big four issues are network, space, power, and cooling,” said Federspiel. “The issue that many organizations get hung up on is cooling.”
In addition to saving money, most IT organizations are trying to move more compute capacity to the edges of their networks. The end result is a significant shift in terms of how systems are distributed around the data center, which in turn can change the thermodynamics of the entire facility. As a result, Federspiel said, more IT organizations need to make sure they understand how the cooling systems are actually working.
While cooling systems may not always get the same level of attention that power tends to get inside the data center, implementing an analytics application can pay for itself in one to two years. Add to that the amount of time it takes to bring on additional data center capacity, and understanding the airflow throughout a data center facility quickly becomes the difference between putting a hole in the capital budget and maximizing the value of existing IT investments.
For more information, sign up for Data Center World National Harbor, which will convene in National Harbor, Maryland, on September 20-23, 2015, and attend Cliff’s session titled “Eliminating Cooling Capacity Roadblocks to IT Expansion.” | | 7:17p |
Microsoft Back in Court over Emails in Dublin Data Center Microsoft is back in court today over US government access to customer emails stored in its Dublin, Ireland, data center.
The battle with US law enforcement officials in a district court is the next step in the process that started last year, when a magistrate judge ordered the company to turn over the data. Microsoft said at the time that the magistrate judge’s ruling was an expected and necessary step in the fight to make sure the case doesn’t set precedent that would make it common for US authorities’ jurisdiction over US companies to extend to data they store overseas.
The emails in question reportedly belong to a person suspected of drug trafficking.
Microsoft is enjoying support from a number of large US tech companies and civil liberties groups, including the American Civil Liberties Union and the Electronic Frontier Foundation, according to news reports. ACLU papers supporting the company were filed in the appeal, Bloomberg reported.
The case is being viewed as a landmark one, which will decide whether or not a US company with data centers overseas is legally obligated to comply with law-enforcement requests for access to data stored in those facilities on foreign land, which are also governed by data privacy laws in the countries they are located in.
There are international treaties in place for cases like this, requiring the US government to cooperate with foreign governments to ensure it receives the information it seeks in ways that are compliant with the foreign laws. But the US argues that the process would take too long, according to a report by the BBC.
Irish authorities said they would expedite the process if the US made such a request.
Companies like Microsoft, which serve customers around the world from globally distributed data center infrastructure, view requests for customer data stored overseas by the US or any other government as erosive to customer trust in their services and therefore bad for business. |
|