Resource Sharing Unleashes Performance Storms on the Data Center

Jagan Jagannathan is the founder and chief technology officer of Xangati.

We have all experienced the good and the bad in the world of shared computing. We share files on a server, we share a network for sending and receiving email, and we share resources when a number of people establish and participate in a web conference. Today's data centers share more and more resources as well, improving return on investment because capacity is better utilized. However, while high capacity utilization is generally good, it can lead to the data-center equivalent of users standing by the printer waiting for their printout to emerge.

Caught in the storm

When critical resources are shared to their capacity limits, shared computing environments can suffer spontaneous contention "storms" that impact application performance and drag down end-user productivity. At Xangati, we talk about "performance storms," likening them to stormy weather that comes up seemingly out of nowhere and can disappear just as quickly, leaving a path of destruction behind. A performance storm in the computing environment leaves your service-level agreements in ruins. Wreaking havoc on the varied cross-silo shared resources in the data center, these storms can entangle multiple objects: virtual machines, storage, hosts, servers of all kinds, and applications. For example, you can experience:
Time ticks away

One brutal reality of these storms is their extreme brevity; many contention storms surge and subside within a matter of seconds. This short window in which to capture information about a storm can severely hamper an IT organization's ability to track down its root cause. Often, the IT staff simply shrug their shoulders, concluding that the only remediation is to wait and see whether it happens again. Many management solutions, at best, identify only the effects of storms. The more daunting challenge is to perform a root-cause analysis. Three challenges complicate the problem:
The challenge of scaling

Knowing what is happening on the network at any precise moment is critical to uncovering the causality of behaviors and interactions between objects. That capability usually requires the scalability to track hundreds of thousands of objects on a second-by-second basis. In that environment, agent-less technology gives you a decided edge, because technologies built on a multitude of agents simply do not scale. Even so, it is no small feat to penetrate the innards of a complex infrastructure and track down the source of a random, possibly seconds-long anomaly that can wreak havoc on performance. Once the anomaly passes, after all, there is nothing left to examine. Ultimately, proper remediation comes down to deploying scalable technology for performance assurance and applying split-second responsiveness to quickly identify and eliminate any issue that could otherwise lead to significant performance loss and, worst of all, a poor end-user experience.
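To make the idea of second-by-second tracking concrete, here is a minimal, illustrative sketch (not Xangati's actual technology) of flagging short-lived contention spikes across many monitored objects. The object names, thresholds, and sample feed are hypothetical assumptions used only to show the shape of the problem: keep a small rolling baseline per object and flag per-second samples that jump far above it.

```python
from collections import defaultdict, deque
import statistics

# Illustrative sketch only: flag "storm"-like spikes in per-second metric
# samples for many objects, using a short rolling baseline per object.
# Names and thresholds below are hypothetical assumptions, not a
# description of any vendor's implementation.

WINDOW = 60          # seconds of history kept per object
SPIKE_FACTOR = 3.0   # flag samples this many times above the rolling mean

history = defaultdict(lambda: deque(maxlen=WINDOW))

def observe(obj_id: str, second: int, value: float) -> bool:
    """Record one per-second sample; return True if it looks like a spike."""
    window = history[obj_id]
    spike = False
    if len(window) >= 10:  # need some baseline before judging
        baseline = statistics.fmean(window)
        if baseline > 0 and value > SPIKE_FACTOR * baseline:
            spike = True
    window.append(value)
    return spike

# Example: a hypothetical storage-latency series for one virtual machine,
# quiet for 30 seconds and then spiking briefly before subsiding.
samples = [(t, 5.0) for t in range(30)] + [(30, 40.0), (31, 55.0), (32, 6.0)]
for t, latency_ms in samples:
    if observe("vm-42:storage-latency", t, latency_ms):
        print(f"possible contention storm at t={t}s, latency={latency_ms}ms")
```

Even this toy version hints at the scaling challenge the article describes: doing the same bookkeeping every second for hundreds of thousands of objects, without per-host agents, is where the real engineering effort lies.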