I’m finally getting off the schneid to write an entry that I hope will benefit the community.  I’ll do my best to make sure more of these follow.

A recent experience with a customer leads me into this discussion of performance optimization options when running an HP EVA 4x00/6x00/8x00 Storage Array.  As you may already know, these storage arrays are commonly sold for their ease of implementation, management and use.  From that standpoint they are great; however, if you need to understand how things are performing, your options are limited.  I’ll dig deeper into the performance monitoring tools in the future, but for this post I’m going to describe a simple change on the Microsoft Windows-based hosts attached to the SAN that may help increase overall performance.

The HP EVA 4x00/6x00/8x00 Storage Array is an Asymmetric Active/Active (AAA) Storage Array, which simply means both storage controllers are active and able to accept I/O requests, but only one controller manages a given vDisk or LUN.  That managing controller is the one that will ultimately have to service all I/O requests for the vDisks it owns.  From a write perspective, this is what it is, and there are strategies for optimizing that traffic.  On read activity, however, it can cause unnecessary performance degradation on very active arrays because of how a read miss is handled.  I’ll explain what I mean…

An EVA 4x00/6x00/8x00 has a back-side channel for communication between the two controllers called the Mirror Port.  I believe this is nothing more than a point-to-point Fibre Channel link, and each controller has two or three of these depending on the array version.  All communication between the controllers must occur over these ports, and it is possible to saturate the link with write cache mirroring and proxy read traffic.  We are going to focus on the proxy reads this time around and maybe look at the data writes at a later time.

A proxy read is a situation where a read comes into the controller that does NOT own a vDisk or LUN and the data requested is not currently in that proxy controller’s cache.  This is considered a Read Miss and a request is made to the owning controller to read the data into the non-owning controller’s cache.  Reading this data into cache will force the owning controller to push the data across the Mirror Port to the proxy controller.  While this is by design, it is not ideal because if your Mirror Port is under heavy utilization already this will just make things worse.  As a result we would like to see most or all read requests always come into the owning controller.  Fortunately, we have a thing called Asymmetric Logical Unit Access (ALUA) that EVAs support and can be utilized by the HP MPIO DSM on a Windows host to intelligently direct read traffic to the owning controller.

Now the downside is that use of this ALUA feature is not enabled by default.  In fact, the default for Windows hosts is to leverage all paths and both controllers, essentially a round-robin approach to path balancing that effectively sends half of all read traffic to the non-owning controller.  To make use of ALUA you must enable ALB (Adaptive Load Balancing) on each Microsoft Windows host running the HP MPIO DSM.  This can be done with the HP DSM CLI or the HP DSM Manager MMC plugin.
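
To make the "half of all read traffic" point concrete, here is a minimal Python sketch that counts how many reads land on the non-owning controller under each policy.  Everything in it is a made-up model for illustration (the controller names "A" and "B", the policy labels, the function name); it is not how the HP MPIO DSM is implemented, and it treats every proxied read as a cache miss, which is the worst case.

```python
from itertools import cycle

# Hypothetical two-controller model; names are illustrative only.
CONTROLLERS = ("A", "B")

def count_proxy_reads(read_owners, policy):
    """Count reads that arrive at a controller that does not own the vDisk.

    read_owners: one entry per read, naming the controller that owns the
                 vDisk being read ("A" or "B").
    policy:      "round_robin" alternates controllers regardless of ownership;
                 "alb" always sends the read to the owning controller.
    """
    rotation = cycle(CONTROLLERS)
    proxied = 0
    for owner in read_owners:
        if policy == "round_robin":
            target = next(rotation)
        elif policy == "alb":
            target = owner
        else:
            raise ValueError(f"unknown policy: {policy}")
        if target != owner:
            # Worst case: each of these is a read miss that must cross the Mirror Port.
            proxied += 1
    return proxied

# 1000 reads against a vDisk owned by controller A.
reads = ["A"] * 1000
print("round robin:", count_proxy_reads(reads, "round_robin"))
print("ALB:        ", count_proxy_reads(reads, "alb"))
```

In this toy model round robin proxies every other read, while the owner-preferring policy proxies none, which is the behavior the ALB setting is meant to get you closer to.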

 

Now this setting helped eliminate approximately 30-40% of all mirror port traffic for one customer running a very active storage array, and it had an immediate effect on reducing I/O latency.  The benefits will likely vary, and you may see no visible effect if your array is not under heavy utilization.

One way to monitor where you stand on performance and compare your stats before and after the change is to run the EVAPerf utility, usually installed on your HP CommandView server.  Launch EVAPerf, which will open a Command Prompt, and type:

EVAperf vdg -cont

This gives you a live view of the stats for your EVA Disk Groups; the Mirror Port MB/s figure in particular tells you how active your mirror port currently is.  I have been told that anything over 280 MB/s is approaching a threshold where ill effects can start to appear.  I have personally seen this reach upwards of 400 MB/s, and significant I/O latency came along with it.  This can show up as lagging or locked VMs in a VMware vSphere 4 environment and very poorly performing Microsoft SQL Server 2008 database servers.
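
If you jot down the Mirror Port MB/s readings before and after the change, a few lines of Python are enough to compare them.  This is only a sketch: it assumes you have already copied the readings into plain lists of numbers (it does not parse EVAPerf output), the sample values are invented for illustration, and the 280 MB/s figure is the rule of thumb I was given, not a published HP limit.

```python
# Rule-of-thumb warning level from the text above; not an official HP figure.
MIRROR_PORT_WARN_MBPS = 280.0

def summarize_mirror_port(samples_mbps, threshold=MIRROR_PORT_WARN_MBPS):
    """Return peak, mean, and the number of samples above the threshold."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    return {
        "peak": max(samples_mbps),
        "mean": sum(samples_mbps) / len(samples_mbps),
        "over_threshold": sum(1 for s in samples_mbps if s > threshold),
    }

def percent_reduction(before_mean, after_mean):
    """Percentage drop from before_mean to after_mean."""
    return (before_mean - after_mean) / before_mean * 100.0

# Hypothetical readings in MB/s, typed in by hand from the EVAPerf display.
before = [310.0, 400.0, 290.0, 250.0]
after = [200.0, 240.0, 180.0, 160.0]

b = summarize_mirror_port(before)
a = summarize_mirror_port(after)
print("before:", b)
print("after: ", a)
print("mean reduction: %.1f%%" % percent_reduction(b["mean"], a["mean"]))
```

Taking the readings at comparable times of day matters more than the arithmetic, since mirror port load follows whatever the hosts happen to be doing.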

Hopefully this will help someone troubleshooting an HP EVA Storage Array or planning a new implementation.  Comments and corrections are always welcome.  We are all part of the same team, so let’s figure out how to make things work better.