The digest of current topics on Continuous Processing Architectures. More than Business Continuity Planning.
BCP tells you how to recover from the effects of downtime.
CPA tells you how to avoid the effects of downtime.
In this issue:
Browse through our Useful Links.
Check our article archive for complete articles.
Sign up for your free subscription.
Join us on our Continuous Availability Forum.
Is “Active/Active” Getting Old?
A colleague recently passed along comments from others suggesting that interest in active/active architectures has passed its sell-by date. I find that strange. The need for 24/7 operation of critical systems gets stronger every day. Continuous availability is moving from a competitive advantage to an absolute requirement in many applications.
Perhaps it’s a matter of terminology. “Active/Active” is an architecture. “Continuous Availability” is the benefit. But what is continuous availability? In our view, if your system has no perceptible downtime, that is “continuous availability.” And active/active architectures are the predominant way today to achieve that level of continuous availability.
We find it curious that in this day of round-the-clock global commerce, the use of active/active technology is still in its infancy. It is practiced in the HP NonStop world, the hub of critical systems. IBM has its Parallel Sysplex systems. Products also exist today to extend this technology to the Linux and Windows worlds. Why is this not happening? After all, the infrastructure cost is about the same as that of active/backup and cluster technologies.
Join us on our thread, Is “Active/Active” Getting Old?, on our LinkedIn Continuous Availability Forum to give us your insight on where continuous availability is going.
Dr. Bill Highleyman, Managing Editor
Wolfgang Breidbach and his colleagues may well be the fathers of active/active systems. They implemented their configuration over twenty years ago. An earlier Availability Digest article described Bank-Verlag’s system as using transaction replication. In a recent, interesting discussion on the Defining Active/Active thread of our LinkedIn Continuous Availability Forum, Wolfgang provided significant additional detail about his approach.
Bank-Verlag is responsible for the production of debit cards for the German banks. The technology of the mid-1980s was to simply keep debit-card data on the magnetic stripe of the card, with later batch updates of the customer accounts. There was no online verification of a debit-card transaction against the customer’s account.
This system worked fine until a TV investigative report showed how easy it was to counterfeit these cards. As a consequence, Bank-Verlag implemented an online debit-card processing application on an IBM System/370 so that a debit-card transaction could be checked against the corresponding customer account before it was authorized. Later, for uptime reasons, Bank-Verlag switched to a Tandem system and had to migrate the IBM database and applications to the Tandem without denying debit-card service to its customers. Thus was born active/active.
Today, Bank-Verlag performs this function on a pair of NonStop NS 16000s using transaction replication in an active/active configuration.
Déjà vu. Just four months ago, in our April, 2010, issue, we related Google’s experience with incomplete documentation that took down their entire Google Apps data center for two and a half hours. A major Singapore bank, DBS Bank, has just had a repeat of that experience. Only in this case, its systems were down for up to nine hours during a busy banking day. Gone were its online banking, its ATMs and credit-card services, and its back office systems.
The problem was caused by an IBM employee who directed operations staff to use an outdated procedure to perform maintenance on a disk-storage system. The correct procedure had yet to be documented.
The bank compounded the problem by waiting too long to dust off its business continuity plan. By the time the bank convened its disaster-recovery team, the crisis was almost over.
Since 1989, the Disaster Recovery Journal (DRJ) has sponsored the semiannual Spring World and Fall World conferences dedicated to Business Continuity and Disaster Recovery (BC/DR). Its 43rd conference, Fall World 2010, will be held the week of September 18th at the Sheraton San Diego Hotel and Marina in San Diego, California. The body of the three-day agenda (Monday, September 20th, to Wednesday, September 22nd) includes nine unopposed general sessions and 24 breakout sessions.
Several pre-conference and post-conference courses are also scheduled. Pre-conference courses are held all day on Saturday, September 18th, and conclude on Sunday morning, September 19th. Post-conference courses are one- to three-day courses held on Wednesday afternoon, all day Thursday, and Friday morning, September 22nd to September 24th. In addition, several workshops are scheduled for Sunday afternoon and Tuesday afternoon. Courses and examinations for certain BCP (Business Continuity Planning) certifications are offered.
DRJ’s Fall 2010 conference continues twenty-one years of distinction in the fields of Business Continuity and Disaster Recovery. Lasting for a week with informational sessions, workshops, certification exam preparation, and qualifying exams, it is the premier educational event for BC/DR professionals.
The ftServer from Stratus Technologies is a hardware-based, fault-tolerant server for running Windows and Linux applications. Its scalability and recoverability were measured in June 2010 by Principled Technologies, Inc.
Under a test commissioned by Stratus and NEC to assess ftServer’s scalability and its resilience to catastrophic events, Principled Technologies stressed the ftServer incrementally by adding virtual CPUs (vCPUs) to a single VMware virtual machine (VM). At peak load, one of the two redundant servers (memory, processor, I/O subsystem, disks, and all) was pulled from the chassis to measure the ftServer’s recovery time from such a catastrophic failure.
The results showed performance measured in tens of thousands of orders per minute. Scalability was reasonably linear up to four vCPUs. With eight vCPUs running, the induced massive fault caused by removing one of the redundant processors resulted in virtually no performance degradation and no loss of application data or its integrity.
The test further showed that the lockstep, hardware-based, fault-tolerant approach used by the ftServer imposed no overhead on the system during normal redundant operation.
Sign up for your free subscription at http://www.availabilitydigest.com/signups.htm
Would You Like to Sign Up for the Free Digest by Fax?
Simply print out the following form, fill it in, and fax it to:
+1 908 459 5543
The Availability Digest may be distributed freely. Please pass it on to an associate.
Managing Editor - Dr. Bill Highleyman firstname.lastname@example.org.
© 2010 Sombers Associates, Inc., and W. H. Highleyman