The Availability Digest

The articles you read in the Availability Digest result from years of experience in researching and writing a variety of technical documents and marketing content. It’s what we do best, and we provide our services to others who value high-quality content created by IT specialists. Ask us about

• articles • white papers • case studies • web content • manuals • specifications • patent disclosures

System Architecture and Risk Analysis - It's What We Do Best

Let’s be realistic. Availability costs money. Organizations may determine their requirements for system availability based on expectations of performance, cost, and downtime avoidance; but it is cost that usually is the primary driver. Rarely do budgetary decisions take into consideration the revenue losses, bad press, and customer dissatisfaction incurred when an existing system goes down and provides no service at all. Those are the most damaging costs, and some companies never recover.

We at the Availability Digest appreciate costs – both the expenses of operating a business and the revenues lost when system budgets are too frugal. Helping a company balance those two considerations is the service we bring as availability specialists. With our years of experience in custom software development and IT consulting, we are adept at discussing our analyses with staff that possess varying degrees of technical proficiency.

First things first. Our goal is to help you architect your systems to provide the proper uptime and data protection for your individual applications. To do so, we begin with a risk analysis. During this study, we determine the importance of each application to those who use it or for whom it is essential. How much downtime is permissible (the RTO, or Recovery Time Objective)? How much data loss is permissible (the RPO, or Recovery Point Objective)? How is lost data reconstructed?

Based on our risk assessment, we categorize your applications. For instance:

Non-critical applications – The loss of these applications will have no serious effect on any company activity. The application could be down for days, and days of data could be lost. Examples of such applications are statistical reporting applications and business forecasting applications.
Task-critical applications – If these applications should go down, certain employees in the company will be inconvenienced and may have to resort to manual procedures to complete their tasks. Lost data can be restored manually from paper copies. Hours of downtime and data loss are acceptable. Certain tasks may be suspended until the application is restored. Examples are accounts payable and accounts receivable.
Business-critical applications – The loss of these applications will prevent critical business functions from being performed. Downtimes of an hour or less are acceptable, with data loss measured in minutes. Payroll is an example of such an application.
Mission-critical applications – If these applications go down, important business functions that must be continuous will be impacted. Downtime can be tolerated for minutes with seconds of data loss. Customer call centers, customer-facing web sites, and health care applications are examples.
Absolutely critical applications – These applications can tolerate no downtime and virtually no data loss. Recovery time and data loss should be measured in seconds. In some cases, no data loss is acceptable. Examples are 911 systems, factory control systems, and large Electronic Funds Transfer systems.
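
As a rough illustration of how the categories above might be assigned from an application's RTO and RPO, consider the following minimal Python sketch. The numeric cutoffs, the names `RTO_LIMITS` and `RPO_LIMITS`, and the `classify` function are our assumptions for illustration only; the categories above are described in orders of magnitude (days, hours, minutes, seconds), and a real risk analysis would set the boundaries per organization. The sketch places an application in the most demanding category required by either objective.

```python
from datetime import timedelta

# Categories ordered from most demanding to least demanding.
CATEGORIES = [
    "absolutely critical",
    "mission-critical",
    "business-critical",
    "task-critical",
    "non-critical",
]

# Hypothetical upper bounds for the first four categories. Anything above
# the last bound falls into the final category (non-critical).
RTO_LIMITS = [timedelta(minutes=1), timedelta(minutes=15),
              timedelta(hours=1), timedelta(days=1)]
RPO_LIMITS = [timedelta(seconds=5), timedelta(minutes=1),
              timedelta(hours=1), timedelta(days=1)]


def _tier(value, limits):
    """Return the index of the first limit that the value does not exceed."""
    for index, limit in enumerate(limits):
        if value <= limit:
            return index
    return len(limits)


def classify(rto, rpo):
    """Pick the most demanding category implied by either the RTO or the RPO."""
    return CATEGORIES[min(_tier(rto, RTO_LIMITS), _tier(rpo, RPO_LIMITS))]


if __name__ == "__main__":
    # A reporting application that can be down, and lose data, for days.
    print(classify(timedelta(days=3), timedelta(days=2)))          # non-critical
    # An application needing recovery within 45 minutes, losing at most 10 minutes of data.
    print(classify(timedelta(minutes=45), timedelta(minutes=10)))  # business-critical
```

Taking the stricter of the two objectives is a design choice: an application that tolerates hours of downtime but only seconds of data loss still needs the data protection of the more demanding category.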

Once our risk analysis is complete, the next step is to architect the appropriate systems for each application. We will base our suggestions on your current systems and how they can be rearchitected to maximize availability, minimize data loss, build on your existing operational knowledge and experience, and limit the need for new equipment and staff. Architectures may include:

Non-critical applications – A single server with magnetic-tape backup. Following an outage, a new server may have to be provisioned, the database loaded from magnetic tape, the production applications brought up, and the system tested.
Task-critical applications – Applications are run on a production server, with a cold backup server available that is probably doing other work. Database backup is performed with virtual tape. Upon a production-system outage, the applications on the backup server will have to be shut down, the database loaded from virtual tape, the production applications brought up, and the system tested.
Business-critical applications – Applications are run on a production system with a warm backup. Applications are loaded onto the backup system but do not have the database opened. Production data is replicated in real time with a data-replication engine. In the event of an outage, the backup applications mount the backup database, and the backup system is tested and then continues operations.
Mission-critical applications – Applications are run on a production server with a hot backup. Applications are loaded onto the backup system, and they have the database opened for read/write activity. In the event of an outage, the backup system is tested and takes over production operations.
Absolutely critical applications – Applications are run in an active/active architecture. Two (or more) nodes are actively processing transactions for the same application. They each have their own local application databases, and the databases in the application network are kept synchronized via real-time data replication. Should a node fail, all transactions are simply routed to surviving nodes.
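
The architectures above can be summarized as a lookup from category to configuration and recovery steps. The following Python sketch is illustrative only: the `Architecture` class, the `ARCHITECTURES` table, and the `runbook` function are hypothetical names of ours, and the step wording simply paraphrases the descriptions above. It shows how the recovery procedure shrinks as criticality rises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    configuration: str
    recovery_steps: tuple  # ordered actions taken after an outage


# Hypothetical table paraphrasing the architectures described in the text.
ARCHITECTURES = {
    "non-critical": Architecture(
        "single server with magnetic-tape backup",
        ("provision a new server if needed",
         "load the database from magnetic tape",
         "bring up the production applications",
         "test the system"),
    ),
    "task-critical": Architecture(
        "production server with cold backup and virtual-tape database backup",
        ("shut down the applications on the backup server",
         "load the database from virtual tape",
         "bring up the production applications",
         "test the system"),
    ),
    "business-critical": Architecture(
        "production system with warm backup and real-time data replication",
        ("mount the backup database",
         "test the backup system",
         "continue operations on the backup system"),
    ),
    "mission-critical": Architecture(
        "production server with hot backup, database open for read/write",
        ("test the backup system",
         "take over production operations"),
    ),
    "absolutely critical": Architecture(
        "active/active nodes synchronized by real-time data replication",
        ("route all transactions to surviving nodes",),
    ),
}


def runbook(category):
    """Return the numbered recovery steps for a category."""
    steps = ARCHITECTURES[category].recovery_steps
    return [f"{number}. {step}" for number, step in enumerate(steps, start=1)]


if __name__ == "__main__":
    for line in runbook("task-critical"):
        print(line)
```

Fewer recovery steps generally mean a shorter recovery time, which is why the more critical categories justify the more expensive configurations.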

The applications must be rearchitected in a controlled manner. As part of our consulting responsibilities, we will monitor the rearchitecting of each application. Our agreement with you includes the assurance that the required RTOs and RPOs can be met. Of particular importance is the control of updates in active/backup configurations to prevent configuration drift, which might preclude a successful failover. Of equal importance is automating active/backup failovers to the extent possible and providing extensive operational training to ensure successful failovers. All are competencies that we offer.

Availability costs money. So does system downtime. For every application and for every system, there is an affordable balance between the two. Determining the proper balance for your company is what risk assessment and system architecture are all about, and it’s what we do best.
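
One simple way to picture this balance is to add the yearly cost of an architecture to the expected yearly cost of its downtime and compare the totals. The Python sketch below does exactly that. Every figure in it is a made-up placeholder rather than data from any real system, and the model (architecture cost plus downtime hours times cost per hour) is a deliberate simplification that ignores data loss, reputation damage, and other factors a full risk analysis would weigh.

```python
def total_annual_cost(architecture_cost, downtime_hours, downtime_cost_per_hour):
    """Yearly architecture cost plus the expected yearly cost of downtime."""
    return architecture_cost + downtime_hours * downtime_cost_per_hour


def cheapest(options, downtime_cost_per_hour):
    """Return the option name with the lowest total annual cost.

    options maps a name to (annual architecture cost, expected downtime hours per year).
    """
    return min(
        options,
        key=lambda name: total_annual_cost(*options[name], downtime_cost_per_hour),
    )


# Hypothetical figures chosen only to illustrate the trade-off.
OPTIONS = {
    "cold backup": (20_000, 24.0),
    "warm backup": (60_000, 2.0),
    "active/active": (150_000, 0.05),
}

if __name__ == "__main__":
    for hourly_cost in (1_000, 10_000, 100_000):
        print(hourly_cost, cheapest(OPTIONS, hourly_cost))
    # 1000 cold backup
    # 10000 warm backup
    # 100000 active/active
```

With these placeholder numbers, the cheapest choice shifts from a cold backup to active/active as the hourly cost of downtime grows, which is the balance described above.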

For further information and price quotations, contact Dr. Bill Highleyman at billh@availabilitydigest.com.
