IBM DataPower Operations Dashboard v1.0.6.0


High Availability (HA), Resiliency and Disaster Recovery (DR) Implementation

There are multiple methods for implementing DPOD HA/DR planning and configuration. The appropriate method depends on the customer's requirements, implementation and infrastructure.


Terminology

Node state/mode - a DPOD node can be in one of three states: Active (powered on and performing monitoring activities), Inactive (powered off, not performing monitoring activities) or DR Standby (powered on but not performing monitoring activities).

Primary node - a DPOD installation that actively monitors DataPower instances under normal circumstances (Active state).

Secondary node - a DPOD installation identical to the primary (in virtualized environments, the same image as the primary node) in DR Standby or Inactive state.

3rd party DR software - a software tool that helps detect that the primary node's state has changed from active to inactive, and that initiates the process of launching the secondary node as the active node.



DPOD Scalability vs. HA/DR

DPOD supports installing multiple DPOD nodes for scalability to support high throughput in case of high rate of transactions per second (TPS). However, this does not provide a solution for HA/DR requirements.

For simplicity, this document assumes that only one DPOD node is installed, but the same scenarios and considerations apply to multi-node installations.

Important HA/DR considerations

Consult your BCP/DR/system/network administrators and address the following questions before selecting the HA/DR implementation method(s) to use with DPOD:

1. For large installations, DPOD can capture vast volumes of data. Replicating that much data for DR purposes may consume significant network bandwidth, and may incur 3rd party storage replication license costs.

Is it cost effective to replicate DPOD data or is it acceptable to launch another instance of DPOD with configuration replication only?


2. The software used for the Active/Passive scenario:

Does DPOD in your case run on a virtual infrastructure such as VMware, where you can use VMware vMotion or Active/Passive cluster management tools that can help identify a failure and relaunch DPOD on a different cluster member?


3. You are expected to have Active/Passive software or another mechanism in place to identify when a DPOD server node becomes inactive and to launch a new one on an active cluster member.

Do you have such a tool (DR software)?


4. When launching a new DPOD instance on the backup cluster member:

Will the new instance keep the same network configuration of the primary instance (for example: IP Address, DNS, NTP, LDAP, SMTP) or will the configuration change?


5. Some DataPower architecture solutions (Active/Passive or Active/Active) affect DPOD configuration. If a DataPower IP address changes, your DPOD configuration may need to change as well.

Does your DataPower architecture use an active/passive deployment? If so - will the passive DataPower have the same IP addresses when it switches to active?

Common scenarios for implementing DPOD HA/DR

Scenario A: Active/Passive - DPOD's IP Address remains the same

Assumptions:

  1. The customer has DataPower appliances deployed using either an Active/Passive, Active/Standby or Active/Active configuration. All DataPower appliances in any of these configurations have unique IP addresses.
  2. DPOD node is installed once and is configured to monitor all DataPower appliances (active, standby and passive).
  3. All DPOD network services (NTP, SMTP, LDAP etc.) have the same IP addresses even after failover (otherwise a post configuration script is required to be run by the DR software).
  4. The customer has storage replication capabilities to replicate DPOD disks based on the disks’ replication policy described above.
  5. The customer has a 3rd party software tool or scripts that can:
    • Identify unavailability of the primary DPOD server node.
    • Launch a passive DPOD node using the same IP address as the primary one (usually on a different physical hardware).

  6. The passive DPOD node is not running during normal business operation, since disk replication is required and it has the same IP address as the primary DPOD node.

During a disaster:

  1. The customer's DR software should identify a failure of the DPOD primary node (e.g. by pinging its access IP, sampling the user interface URL, or both).
  2. The customer's DR software should launch the passive DPOD node using the same IP address as the failed primary server node (or change the IP address if not already configured that way).
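The two detection steps above can be sketched as a small health-check script. This is a minimal sketch only: the IP address, console URL and the relaunch action are illustrative placeholders, not real DPOD defaults, and the actual relaunch is performed by the customer's virtualization or cluster tooling.

```shell
#!/bin/sh
# Hedged sketch: one way the customer's DR software might detect a failed
# primary DPOD node (Scenario A, steps 1-2). The IP address and console URL
# below are illustrative placeholders, not real DPOD defaults.
DPOD_IP="10.0.0.10"               # primary DPOD access IP (example)
DPOD_UI="https://${DPOD_IP}/"     # web console URL (placeholder)

dpod_primary_alive() {
    # Check 1: ICMP reachability of the access IP
    ping -c 2 -W 2 "$DPOD_IP" >/dev/null 2>&1 || return 1
    # Check 2: the web console answers over HTTPS
    curl -k -s -o /dev/null --max-time 5 "$DPOD_UI" >/dev/null 2>&1 || return 1
    return 0
}

if dpod_primary_alive; then
    echo "primary OK - nothing to do"
else
    # Scenario A step 2: relaunch the passive node with the SAME IP address,
    # typically via the virtualization/cluster tooling (environment-specific).
    echo "primary DOWN - launch the passive DPOD node with IP ${DPOD_IP}"
fi
```

In practice this check would run on a schedule (cron or the DR tool's own poller), with the relaunch action wired to the environment's cluster manager.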

DPOD will be available in the following way:

  • As the passive DPOD server has the same IP, all DataPower appliances will be able to access it.
  • Since all DataPower appliances will have the same IP addresses - DPOD can continue to sample them.
  • Since the passive DPOD server has the same IP address as the primary one, access to DPOD console will be with the same URL.

Scenario B: Active/Passive – DPOD's IP Address changes

Assumptions:

  1. The customer has DataPower appliances deployed using either an Active/Passive or Active/Stand-by configuration. All DataPower appliances in any of these configurations have unique IP addresses.
  2. DPOD server is installed once and is configured to monitor all DataPower appliances (active, standby and passive). 
  3. All DPOD network services (NTP, SMTP, LDAP etc.) have the same IP addresses even after failover (otherwise a post configuration script is required to be run by the DR software).
  4. The customer has storage replication capabilities to replicate DPOD disks based on the disks’ replication policy described above.
  5. The customer has a 3rd party software tool or scripts that can:
    • Identify unavailability of the primary DPOD server.
    • Launch a passive DPOD server using a different IP address than the primary one (usually on a different physical hardware).

  6. The passive DPOD server is not running during normal business operation, since disk replication is required.

During a disaster:

  1. The customer's DR software should identify a failure of the DPOD primary server (e.g. by pinging its access IP, sampling the user interface URL, or both).
  2. The customer's DR software should launch the passive DPOD server using a different IP address than the failed primary server (or change the IP address if not already configured that way).
  3. The customer's DR software should execute a command/script to change DPOD's IP address.
  4. The customer's DR software should change the DNS name for the DPOD server's web console to reference an actual IP address or use an NLB in front of both DPOD web consoles.
  5. The customer's DR software should disable all DPOD log targets, update DPOD host aliases and re-enable all log targets in all DataPower devices. This is done by invoking a REST API to DPOD. See "refreshAgents" API under Devices REST API.
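Step 5 is a single REST call that DR scripts can issue. The sketch below only composes the call: the host name, credentials and URL path are hypothetical placeholders, so consult the Devices REST API reference for the exact "refreshAgents" endpoint and authentication scheme.

```shell
#!/bin/sh
# Hedged sketch of Scenario B, step 5: ask DPOD to disable, update and
# re-enable the log targets and host aliases on all monitored DataPower
# devices via the "refreshAgents" API. Host, credentials and URL path below
# are hypothetical placeholders.
DPOD_HOST="dpod-passive.example.com"            # the newly active node (placeholder)
DPOD_CRED="dr_operator:changeit"                # API credentials (placeholder)
REFRESH_PATH="/op/api/devices/refreshAgents"    # hypothetical path - verify in docs

build_refresh_call() {
    # Compose (and log) the call before executing it; DR runbooks are often
    # rehearsed in dry-run mode first.
    echo "curl -k -s -u ${DPOD_CRED} -X POST https://${DPOD_HOST}${REFRESH_PATH}"
}

build_refresh_call
```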

DPOD will be available in the following way:

  • Although the passive DPOD server has a different IP address, all the DataPower appliances will still be able to access it since their internal host aliases pointing to DPOD will be replaced (step 5 above).
  • As all DataPower appliances retain the same IP addresses, the passive DPOD server that has just become active can continue to sample them.
  • Although the passive DPOD server has a different IP, all users can access DPOD’s web console because its DNS name has been changed or it is behind an NLB (step 4 above).

Scenario C: Active/Standby – 2 DPOD separate installations

Assumptions:

  1. The customer has DataPower appliances deployed using either an Active/Passive or Active/Stand-by configuration. All DataPower appliances in any of these configurations have unique IP addresses.
  2. Two DPOD servers are installed (requires DPOD version 1.0.5+); one operates as the active node and the other as the standby node. After installing the standby DPOD node, it must be configured as a standby node. See the "makeStandby" API under DR REST API.
  3. Both DPOD servers should have the same environment name. The environment name is set by the customer during DPOD software deployment or during an upgrade, and is visible at the top navigation bar (circled in red in the image below):



  4. When the DPOD node is in DR Standby mode, a message is shown next to the environment name in the Web Console. A refresh (F5) may be required to reflect recent changes if the makeStandby API has just been executed, or when the DPOD status has changed from active to standby or vice versa. See the image below:



  5. The customer is expected to configure each DPOD node to monitor all DataPower devices (active, standby and passive). Since DPOD v1.0.5, a REST API may be used to add a new DataPower device to DPOD without using the UI (see Devices REST API). As both servers are up, no configuration or data replication exists in this scenario. In particular, the customer must add the DataPower instances to the standby DPOD server and set the agents for each device from the Device Management page in the web console (or by using the Devices REST API). Setting up the devices on the standby DPOD server will not make any changes to the monitored DataPower devices (no log targets, host aliases or configuration changes will be made).
  6. All DPOD network services (NTP, SMTP, LDAP etc.) have the same IP addresses even after failover (otherwise a post configuration script is required to be run by the DR software).
  7. The customer has a 3rd party software tool or scripts that can:
    • Identify unavailability of the primary DPOD node.
    • Change the state of the secondary node (which is in standby state) to Active state.
  8. The standby DPOD server can remain online, as disk replication is not required.
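The one-time "makeStandby" step from assumption 2 can be scripted as below. This is a sketch under stated assumptions: the host and URL path are hypothetical placeholders, and the real endpoint is documented under "DR REST API".

```shell
#!/bin/sh
# Hedged sketch of Scenario C, assumption 2: after installing the second DPOD
# node, mark it as a standby node with the "makeStandby" API (DR REST API).
# The host and URL path are hypothetical placeholders - verify in the docs.
STANDBY_HOST="dpod-b.example.com"
MAKE_STANDBY_PATH="/op/api/dr/makeStandby"   # hypothetical path
DRY_RUN=1                                    # set to 0 to actually issue the call

make_standby() {
    cmd="curl -k -s -X POST https://${STANDBY_HOST}${MAKE_STANDBY_PATH}"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$cmd"     # log only; useful while rehearsing the DR runbook
    else
        $cmd
    fi
}

make_standby
```

After the call, a refresh (F5) of the standby node's web console should show the DR Standby message next to the environment name, as described above.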

During a disaster:

  1. The customer's DR software should identify a failure of the DPOD primary server (e.g. by pinging its access IP, sampling the user interface URL, or both).
  2. The customer's DR software should enable the standby DPOD server by calling the "standbyToActive" API (see DR REST API). This API points DPOD's log targets and the host aliases of the monitored devices to the standby server and enables most timer-based services (Reports, Alerts, etc.) on the secondary server.
  3. The customer's DR software should change the DNS name for the DPOD server's web console to reference an actual IP address or use an NLB in front of both DPOD web consoles.
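The three disaster steps can be sketched as one script. This is illustrative only: the host names and URL path are hypothetical placeholders, the real "standbyToActive" endpoint lives under "DR REST API", and the DNS/NLB change in step 3 is environment-specific.

```shell
#!/bin/sh
# Hedged sketch of the Scenario C disaster steps: detect the primary failure,
# then promote the standby via "standbyToActive" (DR REST API). Host names
# and the URL path are hypothetical placeholders.
PRIMARY_HOST="dpod-a.example.com"
STANDBY_HOST="dpod-b.example.com"
ACTIVATE_PATH="/op/api/dr/standbyToActive"   # hypothetical path - verify in docs

failover_if_needed() {
    # Step 1: does the primary web console still answer?
    if curl -k -s -o /dev/null --max-time 5 "https://${PRIMARY_HOST}/" 2>/dev/null; then
        echo "primary healthy - no failover"
        return 0
    fi
    # Step 2: promote the standby; this repoints log targets and host aliases
    # of the monitored devices and enables timer-based services on it.
    echo "promote standby: curl -k -s -X POST https://${STANDBY_HOST}${ACTIVATE_PATH}"
    # Step 3: the DNS/NLB switch for the web console is environment-specific.
}

failover_if_needed
```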

DPOD will be available in the following way:

  • Although the standby DPOD server has a different IP address, all the DataPower appliances will still be able to access it since their internal host aliases pointing to DPOD will be replaced (step 2 above).
  • As all DataPower appliances retain the same IP addresses - DPOD can continue to sample them.
  • Although the secondary DPOD node has a different IP, all users can access DPOD’s web console because its DNS name has been changed or it is behind an NLB (step 3 above).
  • Note - data captured by the originally active DPOD node will not be available on the standby node!

 In a "Return to Normal" scenario:

  1. Right after re-launching the primary server, call the "standbyToInactive" API (see DR REST API) to disable the standby server.
  2. Call the "activeBackToActive" API (see DR REST API) to re-enable the primary server - this will point DPOD's log targets and host aliases on the monitored devices back to the primary DPOD server.
  3. The customer's DR software should change the DNS name for the DPOD server's web console to reference an actual IP address or use an NLB in front of both DPOD web consoles.
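The "Return to Normal" sequence can be sketched as follows. The host names and URL paths are hypothetical placeholders; the real "standbyToInactive" and "activeBackToActive" endpoints are documented under "DR REST API", and the calls are echoed here rather than executed so the runbook can be rehearsed safely.

```shell
#!/bin/sh
# Hedged sketch of the Scenario C "Return to Normal" sequence. Hosts and URL
# paths are hypothetical placeholders - see "DR REST API" for the real ones.
PRIMARY_HOST="dpod-a.example.com"
STANDBY_HOST="dpod-b.example.com"

return_to_normal() {
    # 1. Right after relaunching the primary, disable the standby server.
    echo "curl -k -s -X POST https://${STANDBY_HOST}/op/api/dr/standbyToInactive"
    # 2. Re-enable the primary; this points log targets and host aliases on
    #    the monitored devices back to the primary DPOD server.
    echo "curl -k -s -X POST https://${PRIMARY_HOST}/op/api/dr/activeBackToActive"
    # 3. The DNS/NLB change for the web console is environment-specific.
}

return_to_normal
```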

Scenario D: Limited Active/Active – 2 DPOD separate installations

Assumptions:

  1. The customer has DataPower appliances deployed using either an Active/Passive, Active/Active or Active/Standby configuration. All DataPower appliances in any of these configurations have unique IP addresses.
  2. Two DPOD servers are installed (both v1.0.5+), and both operate as active nodes.
  3. The two DPOD servers should have different environment names. The environment name is set by the customer during DPOD software deployment, and is visible at the top navigation bar (circled in red in the image below):



  4. Both DPOD servers are configured separately to monitor all DataPower devices (active, standby and passive). Since DPOD v1.0.5, a REST API may be used to add a new DataPower device to DPOD without using the UI (see Devices REST API). As both servers are up, no configuration replication can exist in this scenario.
  5. The customer adds the DataPower devices to the second DPOD server and sets the agents for each device from the Device Management page in the web console (or by using the Devices REST API). Setting up the devices on the second DPOD server will not make any changes to the monitored DataPower devices (no log targets, host aliases or configuration changes will be made). The customer is expected to replicate all configurations and definitions for each installation; DPOD replicates neither data nor configurations/definitions.
  6. All DPOD network services (NTP, SMTP, LDAP etc.) have different IP addresses.
  7. Since the two installations are completely independent and no data is replicated, data may become inconsistent: one installation may capture information while the other is shut down for maintenance.
  8. Each DPOD installation creates 2 log targets for each domain. If one DataPower device is connected to 2 DPOD installations, each domain will therefore need 4 log targets. Since DataPower has a limitation of ~1000 log targets starting with firmware 7.6, the customer must take care not to reach this log target limit.
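The log target bound in assumption 8 is simple arithmetic; the domain count below is an arbitrary example, not a recommendation.

```shell
#!/bin/sh
# Worked example for assumption 8: each DPOD installation creates 2 log
# targets per monitored domain, so a device feeding 2 DPOD installations
# needs 4 per domain, against a ~1000 log target limit (firmware 7.6+).
DOMAINS=200        # example number of domains on one DataPower device
DPOD_NODES=2       # number of DPOD installations monitoring the device
LIMIT=1000         # approximate DataPower log target limit

targets=$((DOMAINS * 2 * DPOD_NODES))
echo "log targets needed: ${targets}"
if [ "$targets" -gt "$LIMIT" ]; then
    echo "WARNING: exceeds the ~${LIMIT} log target limit"
fi
```

With these example numbers a device stays under the limit, but roughly 250 domains on a device monitored by two DPOD installations would reach it.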

During a disaster:

  1. The customer's DR software should identify a failure of one of the DPOD servers (e.g. by pinging its access IP, sampling the user interface URL, or both).

DPOD will be available in the following way:

  • Since both DPOD servers are active and already configured on all DataPower appliances, the appliances continue sending data to the surviving server without any reconfiguration.
  • As all DataPower appliances retain the same IP addresses, the surviving DPOD server can continue to sample them.
  • Users can access the surviving DPOD server's web console directly, via a DNS name change, or via an NLB in front of both DPOD web consoles.
  • Note - data captured by the failed DPOD server will not be available on the surviving server!

 In a "Return to Normal" scenario:

  1. Relaunch the failed DPOD server; since both installations are independent active nodes, no DR API calls are required.
  2. If the DNS name or NLB configuration for the web console was changed during the disaster, the customer's DR software should restore it so users can reach both DPOD web consoles again.

Backups

 To improve product recovery, an administrator should perform regular backups as described in the backup section.

 

 
