High Availability and Disaster Recovery

IBM DataPower Operations Dashboard v1.0.20.x

There are multiple ways to configure HA/DR in DPOD, depending on the customer's requirements, implementation, and infrastructure.

In addition, it is highly recommended to schedule a periodic backup of the DPOD server(s) using the Backup Utility.

Terminology

  • DPOD Installation: A single All-in-One DPOD server, or a set of DPOD servers installed as a Cell Environment. See Deployment Scenarios for more details.

  • DPOD Installation State/Mode can be one of the following:

    • Active: Powered on, performing monitoring activities.

    • Inactive: Powered off, not performing any monitoring activities.

    • DR Standby: Powered on, but not performing monitoring activities.

  • Primary DPOD Installation: A DPOD installation that is performing monitoring activities under normal circumstances (in Active state).

  • Secondary DPOD Installation: A DPOD installation, identical to the Primary Installation, that can be in Active, Inactive, or DR Standby state.

  • 3rd party DR software: A software tool that assists in the process of identifying a failure in the DPOD primary installation, and initiating the process of changing the secondary installation state to Active.
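
As a minimal sketch, the three installation states above can be modeled as an enumeration that records whether the server is powered on and whether it performs monitoring activities. The class and property names are illustrative only and are not part of DPOD.

```python
from enum import Enum


class InstallationState(Enum):
    """Illustrative model of the DPOD installation states.

    Each value is a (powered_on, monitoring) pair taken from the
    terminology list; the names are hypothetical, not a DPOD API.
    """

    ACTIVE = (True, True)        # powered on, performing monitoring
    INACTIVE = (False, False)    # powered off, not monitoring
    DR_STANDBY = (True, False)   # powered on, but not monitoring

    @property
    def powered_on(self) -> bool:
        return self.value[0]

    @property
    def monitoring(self) -> bool:
        return self.value[1]


if __name__ == "__main__":
    for state in InstallationState:
        print(state.name, state.powered_on, state.monitoring)
```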

Important HA/DR Considerations

Before selecting an HA/DR scenario, address the following questions:

  1. Do you plan to use shared storage (which is replicated between the DPOD installations at the disk/storage level)?
    Note that in large installations, DPOD can capture vast volumes of data (tens of terabytes), so the replication may consume significant resources, and may incur 3rd party storage replication license costs, especially between data centers.

  2. Do you plan to use virtual cluster management tools like VMware vMotion that can relaunch the same DPOD installation on a different cluster member or data center in case of a failure?

  3. Do you plan to use 3rd party DR software or another mechanism to identify a failure in the DPOD primary installation, and launch the secondary installation?

  4. Do you plan to use the same network configuration (e.g., IP address, DNS, NTP, LDAP, SMTP) for the DPOD secondary installation as for the primary one?

Scenario A: Active/Passive

Characteristics: active/passive, shared storage, high data consistency

  1. The storage must be shared (replicated at the disk/storage level in both directions) between the primary and the secondary DPOD installations.
    Note that in large installations, DPOD can capture vast volumes of data (tens of terabytes), so the replication may consume significant resources, and may incur 3rd party storage replication license costs, especially between data centers.

  2. The primary DPOD installation is in Active state (powered on) under normal circumstances, and is configured to monitor all DataPower gateways, along with the required reports, alerts, maintenance plans, roles, system parameters, etc.

  3. The secondary DPOD installation is in Inactive state (powered off) under normal circumstances, since storage replication is active.

  4. Once 3rd party DR software or custom scripts identify a failure in the primary installation (e.g., by sampling access to a network port or sampling the user interface URL):

    1. The primary installation should be powered off (set to Inactive) if not already in this state.

    2. The secondary installation should be powered on (set to Active) with the same resources as the primary installation, and attached to the same shared storage, typically on different physical server(s) or in a different data center.

    3. If the secondary installation is launched with a different IP address from the primary one, the 3rd party DR software should also apply the following changes:

      1. Change the configuration of network services (e.g., NTP, SMTP, LDAP) in the DPOD operating system, if it differs in the secondary data center.

      2. Execute a command/script to change the DPOD network address.

      3. Reconfigure the DPOD log targets on the DataPower gateways to send data to the new DPOD IP address using the refreshAgents REST API.

      4. Unless an NLB is used in front of the Web Consoles of both DPOD installations, the DNS record for the DPOD installation’s Web Console should be changed to reference the secondary DPOD installation IP address, or users need to be informed to use the secondary DPOD installation IP address for accessing the DPOD Web Console.

    4. The secondary installation is now available with all the configurations and history of data.

  5. Once the situation is back to normal, the same procedure may be applied for switching back to the primary DPOD installation.
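
The failure detection mentioned in step 4 (sampling a network port or the user interface URL) could be sketched in a custom script as follows. This is only an illustration using the Python standard library: the function names, the timeouts, and the rule of requiring several consecutive failed probes before failing over are assumptions and are not part of DPOD or of any specific DR software.

```python
import socket
import urllib.error
import urllib.request


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def url_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, so it is reachable; judge by status code.
        return err.code < 500
    except (urllib.error.URLError, OSError, ValueError):
        return False


def should_fail_over(results: list[bool], threshold: int = 3) -> bool:
    """Return True only if the last `threshold` probes all failed.

    Requiring consecutive failures is an assumed safeguard against
    switching installations because of a single transient error.
    """
    recent = results[-threshold:]
    return len(recent) == threshold and not any(recent)
```

A monitoring loop would append the result of each probe to a list and start the failover procedure only when `should_fail_over` returns True.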

 

Scenario B: Limited Active/Active

Characteristics: active/active, no shared storage, moderate data consistency

  1. Both the primary and the secondary DPOD installations are in Active state (powered on) under normal circumstances.

  2. Both DPOD installations should be identical, with the same version, the same deployment profile(s) and the same allocated resources.

  3. The DPOD installations must have different 4-character environment names (set during the DPOD software deployment and visible at the top navigation bar of the Web Console), so they can concurrently monitor the same gateways.

  4. Both DPOD installations are configured to monitor all DataPower gateways, along with the required reports, alerts, maintenance plans, roles, system parameters, etc.
    The reports and alerts of the secondary installation should be configured so they are not actually published via email/Syslog/WS (see System Parameters List), and the maintenance plans should be disabled.
    Currently, this configuration must be performed manually. A future version of DPOD will allow exporting and importing configurations between installations so this process can be automated.

  5. Each DPOD installation is completely independent, and no data is replicated between the DPOD installations.

  6. The data flows simultaneously from the DataPower gateways to both DPOD installations, and both installations simultaneously sample the gateways, execute reports and alerts, etc.

  7. Important: Since no data is replicated between the DPOD installations, data inconsistency may occur, as one installation may not be capturing data while it is not available (due to a failure, during server or network maintenance, or due to misconfiguration). This might affect the displayed data as well as reports and alerts.

  8. Once 3rd party DR software or custom scripts identify a failure in the primary installation (e.g., by sampling access to a network port or sampling the user interface URL):

    1. The reports and alerts of the secondary installation should be configured to be published via email/Syslog/WS (see System Parameters List), and the maintenance plans should be enabled (optional).

    2. Unless an NLB is used in front of the Web Consoles of both DPOD installations, the DNS record for the DPOD installation’s Web Console should be changed to reference the secondary DPOD installation IP address, or users need to be informed to use the secondary DPOD installation IP address for accessing the DPOD Web Console.

    3. The secondary installation is now available with all the configurations and its own history of data (which might be different from the primary one).

  9. Once the situation is back to normal:

    1. The reports and alerts of the secondary installation should be configured so they are not actually published via email/Syslog/WS (see System Parameters List), and the maintenance plans should be disabled.

    2. Unless an NLB is used in front of the Web Consoles of both DPOD installations, the DNS record for the DPOD installation’s Web Console should be changed to reference the primary DPOD installation IP address, or users need to be informed to use the primary DPOD installation IP address for accessing the DPOD Web Console.

    3. The primary installation is now available with all the configurations and its own history of data (which might be different from the secondary one).

    4. The data gathered throughout the disaster period in the secondary installation cannot be synced back to the primary installation.
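
The switches that change in Scenario B during failover and failback can be summarized in a small sketch. It only models the desired settings described in steps 8 and 9; it does not call DPOD. The class, field, and function names are hypothetical, and applying these settings in a real installation is done through the system parameters and DNS changes described above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScenarioBSwitches:
    """Hypothetical record of the settings toggled in Scenario B."""

    publish_reports_and_alerts: bool  # secondary publishes via email/Syslog/WS
    maintenance_plans_enabled: bool   # maintenance plans on the secondary
    console_points_to: str            # "primary" or "secondary" (DNS record)


# Normal circumstances: the secondary monitors but stays silent.
NORMAL = ScenarioBSwitches(False, False, "primary")


def on_primary_failure(current: ScenarioBSwitches,
                       enable_maintenance_plans: bool = True) -> ScenarioBSwitches:
    """Settings after a failure of the primary installation (step 8)."""
    return replace(
        current,
        publish_reports_and_alerts=True,
        # Enabling maintenance plans is optional according to the text.
        maintenance_plans_enabled=enable_maintenance_plans,
        console_points_to="secondary",
    )


def on_back_to_normal(current: ScenarioBSwitches) -> ScenarioBSwitches:
    """Settings once the situation is back to normal (step 9)."""
    return replace(
        current,
        publish_reports_and_alerts=False,
        maintenance_plans_enabled=False,
        console_points_to="primary",
    )
```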

 

 

Scenario C: Active/Standby

Characteristics: active/standby, no shared storage, low data consistency

  1. The primary DPOD installation is in Active state (powered on) under normal circumstances.

  2. The secondary DPOD installation is in DR Standby state (also powered on) under normal circumstances.
    Once it is installed, it must be configured to run in DR Standby state using the makeStandby REST API. This state is reflected in the top navigation bar of the Web Console.

  3. Both DPOD installations should be identical, with the same version, the same deployment profile(s) and the same allocated resources.

  4. Both DPOD installations must have the same 4-character environment names (set during the DPOD software deployment and visible at the top navigation bar of the Web Console).

  5. Both DPOD installations are configured to monitor all DataPower gateways, along with the required reports, alerts, maintenance plans, roles, system parameters, etc.
    In the secondary DPOD installation, this configuration does not change anything on the monitored DataPower gateways, and reports, alerts, etc. do not execute, since the installation is configured in DR Standby state.
    Currently, this configuration must be performed manually. A future version of DPOD will allow exporting and importing configurations between installations so this process can be automated.

  6. No data is replicated between the DPOD installations.

  7. The data flows from the DataPower gateways only to the active DPOD installation, and this installation is also responsible for sampling the gateways, executing reports and alerts, etc.

  8. Important: Since no data is replicated between the DPOD installations, data will be inconsistent, as only one installation is capturing data at any given moment. This will affect the displayed data as well as reports and alerts.

  9. Once 3rd party DR software or custom scripts identify a failure in the primary installation (e.g., by sampling access to a network port or sampling the user interface URL):

    1. The primary installation should be powered off (set to Inactive) if not already in this state.

    2. The secondary installation should be configured to run in Active state using the standbyToActive REST API.
      This will reconfigure the DPOD log targets on the DataPower gateways to send data to the secondary DPOD installation and will trigger the execution of reports, alerts, etc.

    3. Unless an NLB is used in front of the Web Consoles of both DPOD installations, the DNS record for the DPOD installation’s Web Console should be changed to reference the secondary DPOD installation IP address, or users need to be informed to use the secondary DPOD installation IP address for accessing the DPOD Web Console.

    4. The secondary installation is now available with all the configurations, but without any history of data. Only new data will be captured.

  10. Once the situation is back to normal:

    1. The primary installation should be powered on (set to Active) if not already in this state.

    2. The secondary installation should be configured to run in DR Standby state using the standbyToInactive REST API.
      This will stop the execution of reports, alerts, etc.

    3. The primary installation should be configured to run in Active state using the activeBackToActive REST API.
      This will reconfigure the DPOD log targets on the DataPower gateways to send data to the primary DPOD installation.

    4. Unless an NLB is used in front of the Web Consoles of both DPOD installations, the DNS record for the DPOD installation’s Web Console should be changed to reference the primary DPOD installation IP address, or users need to be informed to use the primary DPOD installation IP address for accessing the DPOD Web Console.

    5. The primary installation is now available with all the configurations and its own history of data (which is different from the secondary one).

    6. The data gathered throughout the disaster period in the secondary installation cannot be synced back to the primary installation.
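
The order of operations in steps 9 and 10 of Scenario C can be sketched as two ordered step lists. The REST API names (standbyToActive, standbyToInactive, activeBackToActive) are taken from the text above; how they are invoked (URL, authentication, payload) is not covered here, so the sketch delegates each step to a caller-supplied `execute` callback, which is a hypothetical placeholder along with the other names in the code.

```python
from typing import Callable, List, Tuple

# (target, action): target is "primary", "secondary" or "dns".
Step = Tuple[str, str]


def failover_steps() -> List[Step]:
    """Ordered steps after a failure of the primary installation."""
    return [
        ("primary", "power off"),
        ("secondary", "standbyToActive"),
        ("dns", "point Web Console to secondary"),
    ]


def failback_steps() -> List[Step]:
    """Ordered steps once the situation is back to normal."""
    return [
        ("primary", "power on"),
        ("secondary", "standbyToInactive"),
        ("primary", "activeBackToActive"),
        ("dns", "point Web Console to primary"),
    ]


def run(steps: List[Step], execute: Callable[[str, str], None]) -> None:
    """Run the steps strictly in order; stop on the first exception.

    `execute` is a hypothetical callback that performs one step, for
    example by calling the DPOD REST API or the DNS management tool.
    """
    for target, action in steps:
        execute(target, action)


if __name__ == "__main__":
    # Dry run: print the failover steps instead of executing them.
    run(failover_steps(), lambda target, action: print(f"{target}: {action}"))
```

Keeping the step order in data rather than in scattered script logic makes it easier to review the sequence against the procedure and to run it as a dry run first.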

 

 

Copyright © 2015 MonTier Software (2015) Ltd.