Recently I have been working with a few different clients architecting and designing High Availability WMB solutions. A few things have stood out to me that are worth sharing and considering.
We need to define and agree upon what HA means to the stakeholders
What does this thing do?
What is more pertinent and relevant for messaging solution specialists is to know, understand and think in business terms and design the appropriate solution. An engineer would not blueprint a bridge, a building or an airplane without a thorough, detailed and documented understanding of the purpose the solution serves. Tinkering with technology is the crazed favorite pastime of many a technology enthusiast. It is more than worth the tradeoff to seek the input of a business-driven technologist over that of a technology enthusiast. This applies strongly when designing HA messaging solutions, as most of these systems make up the core of business services delivery.
What do you mean by that?
Some of the best communication I recall is from a mentor who would often simply say, "What do you mean by that?" Wow, how simple and yet how effective that principle is in communication. Communication must be a strong suit for the person designing a messaging technology solution.
HA means something different to every one of my customers. In some instances it simply means there is a way to recover from a failure and to apply maintenance with little to no downtime. Then when we talk further they agree that the system can be down for eight hours and recover. On the other end of the spectrum are clients with large, real-time messaging at the core of their business, which represents most of my clients; with them we start talking about 9's of availability.
Most often this quickly turns into "Five 9's MUST be achieved." Then I begin to explain that this really equates to Continuous Availability, further break down the definition in terms of downtime per year, per day and per week, and begin to discuss the architecture.
Five 9's (99.999%) allows 5 minutes 15.36 seconds of downtime per year, or 25.92 seconds per 30-day month.
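To make those numbers concrete, here is a minimal Python sketch of the downtime arithmetic. The function name `downtime_seconds` and the list of availability levels are my own illustration, and the sketch assumes a 365-day year and a 30-day month.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(availability_percent, days):
    """Allowed downtime, in seconds, over a period of `days` days."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * days * SECONDS_PER_DAY


def as_minutes_seconds(seconds):
    """Format a number of seconds as 'M min S.SS s'."""
    minutes, remainder = divmod(seconds, 60)
    return f"{int(minutes)} min {remainder:.2f} s"


# Illustrative availability levels (assumed, not taken from any SLA).
for percent in (99.0, 99.9, 99.99, 99.999):
    per_year = downtime_seconds(percent, 365)
    per_month = downtime_seconds(percent, 30)
    per_day = downtime_seconds(percent, 1)
    print(
        f"{percent}%: {as_minutes_seconds(per_year)} per year, "
        f"{per_month:.2f} s per 30-day month, {per_day:.3f} s per day"
    )
```

For 99.999% this works out to 315.36 seconds per year, which is the 5 minutes 15.36 seconds quoted above, and 25.92 seconds per 30-day month.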
After this discussion these requirements are sometimes reduced. This is very important; no, it is critical to any discussion about availability. This will drive the rest of the design.
A chain is only as strong as its weakest...
You know the trite but true. Therefore a high availability messaging solution is only as highly available as its least available component! The driving principle here is to think about every place across the wire, across the LAN, across the C-Sharp and the Worldwide Area Network that a message has to traverse in its short and simple life. It has to stay alive long enough to be tested, translated and transformed, secured, packaged and delivered to its final destination, to that place in messaging bliss where a message becomes a transaction that once lived but is now part of the bigger universe of business services, the transactions that mean real money to real owners. So if that little bit of bytes finds its way onto one component, one little NIC that goes nac and no longer transmits a data pulse... guess what? That little message just did not make it to heaven, no matter what kind of quad engine turbo jet processor it was rocketed on.
The moral of the story? Each component which the message crosses must be available for the designated number of nines specified. MQ clustering, WMB multi-instance, Red Hat Clustering, HACMP, Tivoli, Veritas Clustering, etc. provide the supporting infrastructure availability to meet the success criteria and satisfy the SLA, but not if there is one component which is not highly available and redundant.
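The weakest-link point can be sketched with a little arithmetic. Assuming every component on the message path must be up for the message to get through, and assuming the components fail independently, the end-to-end availability is the product of the individual availabilities. The component names and figures below are hypothetical, chosen only to illustrate the effect.

```python
from math import prod


def chain_availability(component_availabilities):
    """End-to-end availability of components in series.

    Assumes every component is required and failures are independent.
    """
    return prod(component_availabilities)


# Hypothetical message path: four five-9's components and one weak link.
path = {
    "NIC": 0.99999,
    "switch": 0.99999,
    "queue manager": 0.99999,
    "broker": 0.99999,
    "single non-redundant server": 0.99,
}

end_to_end = chain_availability(path.values())
weakest = min(path, key=path.get)

print(f"End-to-end availability: {end_to_end:.6f}")
print(f"Weakest component: {weakest} ({path[weakest]})")
```

Under these assumptions the chain can never be more available than its weakest component, and in practice it comes out slightly worse, because every additional hop multiplies in its own small chance of failure.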
Stakeholders need to know and understand that messaging is one of many boxes and wires involved in delivering messages on to heavenly blissful passage where a message transforms into revenue.
Clear and Accurate Measurements
Meters or Yards?
Now when the stakes are high we simply do not calibrate on an ambiguous system of measurement! No sir, this is why football is played with an agreed upon definition of success. This is why the guys in stripes bring out the chains to measure the fractions of an inch that make the difference between success and failure on third and long. So why would we not agree upon standards for measuring the availability of critical business services and of the underlying technologies which ensure the delivery of said services? Does losing a transaction here and there REALLY matter? If so, then how much? How much does it cost to design a CA messaging solution versus mitigating those losses in some other manner?
There is actually such a thing as availability math. I am not sure if they are teaching this in C.S. classes, but I am sure they should be.
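One small piece of that math, as a hedged sketch: if redundant copies of a component fail independently (a strong assumption, since real clusters often share storage, power or network), the group is down only when every copy is down at once. The function names, the `max_copies` safety cap and the 99% starting figure are my own illustration.

```python
def redundant_availability(single, copies):
    """Availability of `copies` redundant instances in parallel.

    Assumes independent failures: the group is unavailable only
    when all copies are unavailable at the same time.
    """
    return 1 - (1 - single) ** copies


def copies_needed(single, target, max_copies=100):
    """Smallest number of redundant copies that meets `target`."""
    if not (0 < single <= 1 and 0 < target < 1):
        raise ValueError("single must be in (0, 1] and target in (0, 1)")
    for copies in range(1, max_copies + 1):
        if redundant_availability(single, copies) >= target:
            return copies
    raise ValueError("target not reachable within max_copies")


# Illustrative question: how many independent 99% instances
# would it take to reach five 9's?
print(copies_needed(0.99, 0.99999))
```

This is the kind of calculation that turns "Five 9's MUST be achieved" into a cost conversation, because every extra copy is extra hardware, licensing and operations.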
The rest of the Design Story
So now that the required level of availability has been defined, we can begin to actually design a system that meets the criteria. Once the design is complete and costed, some customers will further relax their 9's and thereby reduce their costs. They will also make intelligent and informed decisions about which transactions really need this level of availability and what percentage can live on a simpler architecture.
Service is the name of the game
At the end of the day we all need to remember it is about service: quality of service, assurance of service delivery and reliability of service. These are the lifeblood of the companies that those of us who wear the messaging hat serve. It's what getting that tiny bit of bytes across the wire to service bliss is all about.
How to?
Next we can look at the how-to: the actual design and implementation of a WMB HA solution, including the options.


