Why Working Sets May Be Working Against You

Pete Koehler is an Engineer for PernixData.

Lack of visibility into how information is being used can be extremely problematic in any data center, resulting in poor application performance, excessive operational costs, and over-investment in infrastructure hardware and software.

One of the biggest mysteries in modern data centers is the “working set,” which refers to the amount of data that a process or workflow uses in a given time period. Many administrators find it hard to define, let alone understand and measure, how working sets impact data center operations.

Virtualization helps by providing an ideal control plane for visibility into working set behavior, but hypervisors tend to present data in ways that can be easily misinterpreted, which can create more problems than it solves.

So how can data center administrators get the working set information they need in a manner that is most useful for proper planning, design, operations, and management?

What Is It?

For all practical purposes, working sets are the data most commonly accessed from persistent storage. But that simple explanation leaves a handful of terms that are difficult to qualify and quantify. How recent must an access be to count? Does “amount” mean reads, writes, or both? What happens when the same data is written over and over again?

Determining a working set’s size helps administrators understand the behaviors of workloads for better design, operations, and optimization. For the same reason administrators pay attention to compute and memory demands, it is also important to understand storage characteristics like working sets. Understanding and accurately calculating working sets can have a profound effect on the consistency of a data center. Have you ever heard about a real workload performing poorly or inconsistently on a tiered storage array, hybrid array, or hyper-converged environment?
This is because all of these architectures are extremely sensitive to right-sizing the caching layer. Failing to account accurately for the working set sizes of production workloads is a common reason for such issues. To explore this more, let’s review a few traits associated with working sets:
A simplified visual interpretation of the data activity that would define a working set might look like the figure below.
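The definition of a working set as the data a workload touches in a given time period can also be sketched in code. The following is a minimal illustration, not a method taken from any particular product: the `IORecord` type, the `working_set_bytes` function, the 4 KiB block size, and the sample trace are all assumptions made for the example. It counts each distinct block once, however many times it is rewritten, and lets the caller decide whether writes count toward the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IORecord:
    """One I/O from a hypothetical per-VM trace (illustrative type)."""
    timestamp: float  # seconds since the start of the trace
    block: int        # logical block address that was touched
    is_write: bool    # True for a write, False for a read


def working_set_bytes(trace, start, end, block_size=4096, include_writes=True):
    """Bytes covered by the distinct blocks touched in the window [start, end).

    Assumption: a block that is read or written repeatedly counts once.
    """
    touched = {
        r.block
        for r in trace
        if start <= r.timestamp < end and (include_writes or not r.is_write)
    }
    return len(touched) * block_size


# Hypothetical trace: block 10 is read and later rewritten.
trace = [
    IORecord(0.0, 10, False),
    IORecord(1.0, 11, False),
    IORecord(2.0, 10, True),    # rewrite of block 10, counted once
    IORecord(3.0, 12, True),
    IORecord(65.0, 99, False),  # falls outside the first 60-second window
]

reads_and_writes = working_set_bytes(trace, 0, 60)
reads_only = working_set_bytes(trace, 0, 60, include_writes=False)
print(reads_and_writes)  # 12288 (3 distinct blocks * 4096 bytes)
print(reads_only)        # 8192  (2 distinct blocks * 4096 bytes)
```

Changing the window bounds or the `include_writes` flag changes the answer, which is exactly why the questions about time period, reads versus writes, and rewritten data have to be settled before a working set size means anything.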
If a working set is always related to a period of time, then how can it ever be defined? A workload often has a period of activity followed by a period of rest. This is sometimes referred to as the “duty cycle.” A duty cycle might be the pattern that shows up after a day of activity on a mailbox server, an hour of batch processing on a SQL server, or 30 minutes of compiling code. Over a larger period of time, the duty cycles of a VM might look something like the figure below.

Working sets can be defined at whatever time increment is desired, but the goal in calculating a working set is to capture, at a minimum, one or more duty cycles of each individual workload.

Classic Methods for Calculating Working Sets

There are various ways that administrators have attempted to measure working sets, all of which are ineffective for various reasons. These include:
As you can see, these old strategies do not hold up well and still leave the administrator without a real answer. A data center architect deserves better when factoring this element into the design or optimization of an environment.

A New Approach

The hypervisor is the ideal control plane for measuring many things. Take storage I/O latency as an example. What matters is not the latency a storage array advertises, but the latency the VM actually sees. So why not extend the functionality of the hypervisor kernel so that it provides insight into working set data on a per-VM basis?

By understanding and presenting storage characteristics such as block sizes in a way never previously possible, you can understand, on a per-VM basis, the key elements necessary to calculate working set sizes. Furthermore, you can estimate working sets for each individual VM in a vSphere cluster, and/or estimate them for VMs on a per-host basis.

Once working set sizes have been established, many doors open for better design and optimization of an environment. Here are some examples of what can be achieved:
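One such use, checking whether each host's caching layer is large enough for the VMs it runs, can be sketched as follows. This is an illustration under stated assumptions rather than a description of any product: the function names, VM and host names, and all figures are hypothetical, and simply summing per-VM working sets assumes that the VMs share no data and that their duty cycles overlap.

```python
def host_working_sets(vm_working_set_gb, vm_to_host):
    """Sum per-VM working set estimates (in GB) for each host.

    Assumption: working sets of different VMs do not overlap,
    so a plain sum is a reasonable upper estimate per host.
    """
    totals = {}
    for vm, size_gb in vm_working_set_gb.items():
        host = vm_to_host[vm]
        totals[host] = totals.get(host, 0.0) + size_gb
    return totals


def undersized_hosts(host_totals, cache_gb_per_host):
    """Hosts whose combined working set exceeds their cache capacity."""
    return sorted(
        host
        for host, needed_gb in host_totals.items()
        if needed_gb > cache_gb_per_host.get(host, 0.0)
    )


# Hypothetical estimates and placement, for illustration only.
vm_working_set_gb = {"sql01": 120.0, "mail01": 80.0, "build01": 40.0}
vm_to_host = {"sql01": "esx-a", "mail01": "esx-a", "build01": "esx-b"}
cache_gb_per_host = {"esx-a": 150.0, "esx-b": 100.0}

totals = host_working_sets(vm_working_set_gb, vm_to_host)
print(totals)                                       # {'esx-a': 200.0, 'esx-b': 40.0}
print(undersized_hosts(totals, cache_gb_per_host))  # ['esx-a']
```

In this made-up case, `esx-a` would need either more cache or fewer cache-hungry VMs, while `esx-b` has headroom that might otherwise have been bought twice.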
Summary

Understanding and accurately accounting for working set sizes can make the difference between a successful design, implementation, and operation of the data center and an environment that leaves you with erratic performance and dissatisfied application owners and users. Accommodating working set sizes correctly will not only help with predictable application delivery, but may also yield significant cost savings by avoiding overprovisioning of data center resources.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.