The digest of current topics on Continuous Availability. More than Business Continuity Planning.
BCP tells you how to recover from the effects of downtime.
CA tells you how to avoid the effects of downtime.
In this issue:
Availability – The Five-Legged Stool
As we point out in our availability seminars, there are five fundamental requirements to be met by any system for which availability is an issue. They are redundancy, dispersion, isolation, failover, and testing.
Redundancy is the obvious requirement. If a component in the system fails, there must be an equivalent component to take its place. Redundancy goes beyond hardware and software. It extends to people. If the one person in the know is unavailable when needed, your system may well be down.
To protect against major disasters, the redundant components should be dispersed over sufficient distances to ensure that no common event will take down all of them. Technologies such as clusters, RAID, and virtualization do not provide dispersion. Furthermore, the redundant components should be isolated. A fault in one should not create a fault in another (redundant power grids are a good example).
Redundant components are ineffective if one cannot fail over quickly to a backup component. Failover time, which can range from hours to seconds, is the key factor in the degree of availability that is achieved. Failover must also be reliable. This can be assured only by periodic testing of failover, a process often given short shrift because of its cost and risk.
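The effect of failover time on availability can be illustrated with a little arithmetic. The following is a minimal sketch, not a formal availability model: it assumes that every failure costs exactly one failover interval of downtime, and the failure rate and failover times used are illustrative figures rather than measurements.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def availability(failures_per_year: float, failover_seconds: float) -> float:
    """Fraction of a year the service is up.

    Assumption: each failure causes downtime equal to one failover interval.
    """
    downtime = failures_per_year * failover_seconds
    return 1.0 - downtime / SECONDS_PER_YEAR


# Illustrative comparison: two failures a year with different failover times.
for label, seconds in [("4 hours", 4 * 3600), ("5 minutes", 300), ("3 seconds", 3)]:
    print(f"failover in {label}: availability = {availability(2, seconds):.7f}")
```

With the failure rate held constant, shortening failover from hours to seconds is what moves the result from roughly three nines to better than six.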
We focus on all of these requirements in our availability seminars with theory, practice, and case studies. Check out our seminar synopses on our web site, and give us a call to schedule an educational experience.
Dr. Bill Highleyman, Managing Editor
A large, international packaged-tour operator found that its booking system was being overloaded with rafts of customer queries before the customers made final transactions. A typical customer investigates several travel arrangements, such as airfares, hotels, and car rentals, before making a booking decision. The ratio of these queries to completed bookings is called the look-to-book ratio.
The tour operator solved its overload problem by moving to an asymmetric, active/active configuration that allows it to process queries efficiently with Windows query nodes while dedicating its large NonStop transaction-processing systems to the execution of the final booking transactions. Database changes made to the booking node are replicated to query nodes that provide independent query processing.
In addition, the booking and query nodes can interchange roles to eliminate planned downtime and to provide continuous availability and disaster tolerance in the event of a system fault or a data-center failure.
Equally important is that the system is now easily expandable by adding Windows query nodes. Expansion is simple and economical.
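The division of labor described above can be sketched in a few lines. This is an illustrative toy, not the tour operator's implementation: the class and method names are hypothetical, replication is shown as an immediate in-memory copy (a real system would use a data-replication engine), and the look-to-book ratio is taken to mean queries per completed booking.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    db: dict = field(default_factory=dict)


class AsymmetricCluster:
    """One booking node takes all updates; query nodes serve all lookups."""

    def __init__(self, booking_node, query_nodes):
        self.booking_node = booking_node
        self.query_nodes = list(query_nodes)
        self.looks = 0  # queries served
        self.books = 0  # bookings completed

    def book(self, booking_id, details):
        # Updates go only to the booking node ...
        self.booking_node.db[booking_id] = dict(details)
        # ... and are then replicated to every query node (simplified).
        for node in self.query_nodes:
            node.db[booking_id] = dict(details)
        self.books += 1
        return self.booking_node.name

    def query(self, booking_id):
        # Round-robin across query nodes; the booking node is never touched.
        node = self.query_nodes[self.looks % len(self.query_nodes)]
        self.looks += 1
        return node.name, node.db.get(booking_id)

    def add_query_node(self, node):
        # Expansion: seed the new node with current data, then put it in rotation.
        node.db.update(self.booking_node.db)
        self.query_nodes.append(node)

    def look_to_book(self):
        if self.books == 0:
            raise ValueError("no bookings yet")
        return self.looks / self.books


cluster = AsymmetricCluster(Node("booking"), [Node("query-1"), Node("query-2")])
cluster.book("B1", {"tour": "Rome"})
print(cluster.query("B1"))
```

Because queries never reach the booking node, query capacity can grow by calling `add_query_node` without changing the transaction path.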
In Part 1 of this series, we reviewed various concepts of availability. We pointed out that systems can be highly available, exhibiting minutes of downtime per outage, or continuously available with seconds of downtime per outage. If a backup data center is provided in order to continue operations following a disaster of some sort, the backup data center can provide disaster recovery or disaster tolerance. With disaster recovery, IT services can be restored, though it may take days or weeks. With disaster tolerance, IT services continue uninterrupted following a disaster.
Fundamental to all highly available and continuously available architectures is data replication. Such availability requires redundancy. It is data replication that maintains redundant databases in synchronization so that an up-to-date database copy is immediately available following an outage.
There is a wide variety of data-replication technologies in use today. They support active/passive systems, in which the backup system stands by passively, ready to take over if it is needed. Also supported are active/active systems, in which all systems are actively involved in the application.
Part 2 of this series explores the various replication technologies and their strengths and weaknesses.
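The basic idea of keeping a standby database synchronized can be shown with a change log that the standby replays. This is a minimal sketch of one generic approach for an active/passive pair, not a description of any particular replication product. The `Primary` and `Standby` classes are hypothetical, and the log is an in-memory list rather than a durable queue.

```python
class Primary:
    """Active system: applies each change and records it in an ordered log."""

    def __init__(self):
        self.db = {}
        self.log = []  # entries are (sequence number, key, value)

    def write(self, key, value):
        self.db[key] = value
        self.log.append((len(self.log) + 1, key, value))


class Standby:
    """Passive system: replays log entries it has not yet applied."""

    def __init__(self):
        self.db = {}
        self.applied = 0  # sequence number of the last change applied

    def catch_up(self, log):
        for seq, key, value in log[self.applied:]:
            self.db[key] = value
            self.applied = seq
        return self.applied


primary, standby = Primary(), Standby()
primary.write("seat-12A", "reserved")
primary.write("seat-12B", "open")
standby.catch_up(primary.log)
print(standby.db == primary.db)
```

Any change written after the last `catch_up` call exists only on the primary, which is the replication lag that determines how much data could be lost if the primary failed at that moment.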
Since 1989, the Disaster Recovery Journal (DRJ) has sponsored the semiannual Spring World and Fall World conferences dedicated to Business Continuity and Disaster Recovery (BC/DR). Its 45th conference, Fall World 2011, will be held the week of September 11th at the Sheraton San Diego Hotel and Marina in San Diego, California. The body of the three-day agenda (Monday, September 12th, to Wednesday, September 14th) includes nine unopposed general sessions and 24 breakout sessions.
Several pre-conference and post-conference courses are also scheduled. Pre-conference courses are held all day on Saturday, September 10th, and conclude on Sunday morning, September 11th. Post-conference courses are one- to three-day courses held on Wednesday afternoon, all day Thursday, and Friday morning, September 14th to September 16th. In addition, several workshops are scheduled for Sunday afternoon and Tuesday afternoon. Courses and examinations for certain BCP (Business Continuity Planning) certifications are offered.
DRJ’s Fall 2011 conference continues twenty-three years of distinction in the fields of Business Continuity and Disaster Recovery. Lasting for a week with informational sessions, workshops, certification exam preparation, and qualifying exams, it is the premier educational event for BC/DR professionals.
A major problem in active/passive systems is configuration drift. Typically, both the active production system and the passive standby system must be identical. Should the production system fail, applications may not execute properly or may not run at all on the standby system if its configuration is different from that of the production system.
Configuration drift is one of the major causes of failover faults. It is aggravated by the fact that testing failover is so risky and expensive that exercising failover is often given short shrift by companies. Configuration errors are not detected until the standby system fails to come up.
An important means of preventing this situation is a facility that keeps the software configurations of the two systems synchronized. One such tool is HP NonStop AutoSYNC from Carr Scott Software, Inc.
AutoSYNC synchronizes NonStop Guardian and OSS files whether they are quiescent or active. AutoSYNC is extensible via the use of triggers that can invoke scripts or programs following a successful synchronization task.
AutoSYNC makes full use of the availability and scalability capabilities of the HP NonStop system.
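Independent of any particular product, configuration drift between a production system and its standby can be detected by comparing file fingerprints. The sketch below is a generic illustration and does not represent how AutoSYNC works. The function names are hypothetical, and it assumes that both configurations are reachable as ordinary directory trees.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to the SHA-256 of its contents."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def drift(production: dict[str, str], standby: dict[str, str]) -> dict[str, list[str]]:
    """Report files that differ between two fingerprint maps."""
    shared = production.keys() & standby.keys()
    return {
        "missing_on_standby": sorted(production.keys() - standby.keys()),
        "extra_on_standby": sorted(standby.keys() - production.keys()),
        "changed": sorted(k for k in shared if production[k] != standby[k]),
    }


# Usage (paths are placeholders):
# report = drift(fingerprint(Path("/prod/config")), fingerprint(Path("/standby/config")))
```

Running such a comparison on a schedule surfaces differences before a failover, rather than when the standby fails to come up.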
Sign up for your free subscription at http://www.availabilitydigest.com/signups.htm
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
The Availability Digest is published monthly. It may be distributed freely. Please pass it on to an associate.
Managing Editor - Dr. Bill Highleyman firstname.lastname@example.org.
© 2011 Sombers Associates, Inc., and W. H. Highleyman