IBM DataPower Operations Dashboard v1.0.10.0


The System Health dashboard provides an overview of the health state of all devices, based on user-defined metrics.

The dashboard consists of multiple cards, each representing the health of a single device. Cards can optionally be divided into device groups.

In addition, the dashboard displays the DPOD health state.

Metrics

  • The health of each device is based on several user-defined metrics.
  • A metric is a criterion and a set of thresholds that determine whether the state of that criterion is good or bad.
  • Metrics are based on the Alerts feature of DPOD. The user can define which alerts are part of the system health.
  • The following System Health metrics are defined by default:
    • Devices CPU Metric
    • Devices Memory Metric
    • Devices Load Metric
    • Devices Fan Metric
    • Devices Temperature Metric
    • Devices Voltage Metric
    • Devices Space Encrypted Metric
    • Devices Space Temp Metric
    • Devices Space Internal Metric
    • System Errors Metric
  • The user can define whether a metric is part of the system health by selecting the "System Health Metric" option.
    • The user can edit the metric settings from [Manage → Alert → Setup Alerts → Edit Alert].
  • In addition, there is an internal, non-editable metric named "Device Availability", which samples the device's availability.
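To illustrate the idea of a metric as a criterion plus thresholds, here is a minimal, hypothetical Python sketch. It assumes a numeric sample where higher values are worse (for example CPU usage), inclusive thresholds, and separate warning and error thresholds. `HealthMetric`, `SampleState`, and the threshold values are illustrative names and numbers, not DPOD's data model or defaults.

```python
from dataclasses import dataclass
from enum import Enum


class SampleState(Enum):
    GOOD = "Good"
    WARNING = "Warning"
    ERROR = "Error"


@dataclass(frozen=True)
class HealthMetric:
    """Hypothetical metric: a criterion plus warning and error thresholds.

    Assumes a numeric sample where higher values are worse (e.g. CPU usage %)
    and that reaching a threshold triggers it (inclusive comparison).
    """
    name: str
    warning_threshold: float
    error_threshold: float

    def classify(self, value: float) -> SampleState:
        # Check the stricter (error) threshold first so a value above both is an Error.
        if value >= self.error_threshold:
            return SampleState.ERROR
        if value >= self.warning_threshold:
            return SampleState.WARNING
        return SampleState.GOOD


if __name__ == "__main__":
    # Threshold values are illustrative only, not DPOD defaults.
    cpu = HealthMetric("Devices CPU Metric", warning_threshold=70.0, error_threshold=90.0)
    for value in (35.0, 75.0, 95.0):
        print(value, cpu.classify(value).value)
```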


Assumptions:

  • The existing alerts infrastructure (with a few additional fields) will provide all the data and logic needed to decide whether a sample should be alerted on and whether it is considered an error, a warning, or good.
  • The alerts mechanism will be the only source of current and future metrics. New fields: "Is alert used as health metric", warning threshold, and damage points.
  • Every health metric must be based on a detailed investigation screen.
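The assumptions above extend each alert with three new fields. A minimal sketch of such an alert record, and of selecting the alerts that take part in the system health, might look like the following. `AlertDefinition` and `health_metrics` are hypothetical names, and the field types (a numeric warning threshold, integer damage points) are assumptions rather than DPOD's schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AlertDefinition:
    """Hypothetical alert record extended with the three new health fields."""
    name: str
    is_health_metric: bool    # "Is alert used as health metric"
    warning_threshold: float  # assumed numeric; the existing alert threshold marks an error
    damage_points: int        # assumed to count toward a device's Damage Points Threshold


def health_metrics(alerts: Iterable[AlertDefinition]) -> List[AlertDefinition]:
    """Keep only the alerts flagged as System Health metrics, preserving their order."""
    return [alert for alert in alerts if alert.is_health_metric]


if __name__ == "__main__":
    alerts = [
        AlertDefinition("Devices CPU Metric", True, 70.0, 2),
        AlertDefinition("Hypothetical Non-Health Alert", False, 0.0, 0),
    ]
    print([a.name for a in health_metrics(alerts)])
```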

Prerequisites:

Metric:

Device Health Settings:

  • For each device, the user can define whether the device is displayed in the System Health dashboard, as well as its Damage Points Threshold and Total Warnings Threshold.
  • For each device, the user can set thresholds and damage points per health metric (see device health settings).
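The per-device settings above can be pictured as a small record. The following is a hypothetical sketch: `DeviceHealthSettings`, `MetricOverride`, and the rule that a per-device value replaces the metric-wide default when set are assumptions for illustration, not DPOD's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class MetricOverride:
    """Hypothetical per-device override of one health metric's settings."""
    warning_threshold: Optional[float] = None
    damage_points: Optional[int] = None


@dataclass
class DeviceHealthSettings:
    """Hypothetical per-device System Health settings."""
    device: str
    damage_points_threshold: int
    total_warnings_threshold: int
    show_in_dashboard: bool = True
    overrides: Dict[str, MetricOverride] = field(default_factory=dict)

    def damage_points_for(self, metric: str, default: int) -> int:
        # Assumption: a per-device value, when set, replaces the metric-wide default.
        override = self.overrides.get(metric)
        if override is not None and override.damage_points is not None:
            return override.damage_points
        return default

    def warning_threshold_for(self, metric: str, default: float) -> float:
        # Same fallback rule as damage_points_for, applied to the warning threshold.
        override = self.overrides.get(metric)
        if override is not None and override.warning_threshold is not None:
            return override.warning_threshold
        return default
```

For example, `DeviceHealthSettings("hypothetical-device", 10, 5, overrides={"Devices CPU Metric": MetricOverride(damage_points=3)})` would use 3 damage points for the CPU metric and fall back to the metric-wide value for every other metric.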

Device Group Settings:

System Parameters (DB) - default values for:

  • "System Health Dashboard Sample Time Range (min.)": defaults to 5 minutes.
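This parameter sets the length of the "last X minutes" part of each device card (see Device Card). A minimal sketch of deriving the five time parts from it follows. It assumes the four 15-minute parts cover exactly the 60 minutes before the current time and that the last-X-minutes part overlaps the newest 15-minute part; `card_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # half-open interval [start, end)


def card_windows(now: datetime, sample_range_min: int = 5) -> Tuple[Window, List[Window]]:
    """Return the last-X-minutes window and the four 15-minute windows (oldest first).

    sample_range_min stands in for the "System Health Dashboard Sample Time
    Range (min.)" parameter, whose documented default is 5 minutes.
    """
    if sample_range_min <= 0:
        raise ValueError("sample range must be a positive number of minutes")
    last_x = (now - timedelta(minutes=sample_range_min), now)
    quarter = timedelta(minutes=15)
    start = now - 4 * quarter  # one hour back
    parts = [(start + i * quarter, start + (i + 1) * quarter) for i in range(4)]
    return last_x, parts
```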

Device Card:

  • A single device card includes:
    • The health states of the past hour, divided into 5 parts: the last X minutes (X is set by a System Parameter) and 4 parts of 15 minutes each. Each part displays one of the following states, each with its own icon:
      • Good
        • Last X minutes and each 15-minute part: no errors or warnings were found in the metric samples.
      • Warning
        • Last X minutes and each 15-minute part: a warning exists.
      • Error
        • Last X minutes and each 15-minute part:
          • An error exists, or
          • No errors were found but some warnings exist, and either:
            • the total damage points of all metrics exceed the damage points the device can sustain (its Damage Points Threshold), or
            • the total number of warnings exceeds the number of warnings the device can sustain (its Total Warnings Threshold).
      • No metric samples
        • Last X minutes: not applicable.
        • Each 15-minute part:
          • No metric samples were found, or
          • All "Device Availability" metric samples are Error.
      • Critical (the card's background color is also red)
        • Last X minutes:
          • No metric samples were found, or
          • All "Device Availability" metric samples are Error, or
          • The last "Device Availability" metric sample is Error.
        • Each 15-minute part: not applicable.


  • Clicking a device card should navigate to the Device Health Dashboard.
  • Device Health Dashboard (drill-down)
    • A series of charts that display the device's metric values over time, one chart per metric.
      • Each chart is of the "Scatter" type, divided into 4 parts (each part represents 15 minutes).
      • Each point should be shown in the color of its state (green for Good, red for Error, etc.) and display its value in a tooltip.
      • Points should overlap slightly so the display is compact and looks like a thick line made of points.
      • Clicking a metric chart takes the user to one of the product's existing analytics dashboards for further investigation.
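The device card state rules listed above can be sketched in Python as follows. This is a hypothetical illustration, not DPOD's implementation. It assumes that samples arrive in chronological order and already carry a Good/Warning/Error state, that each warning sample adds its metric's damage points, that "exceeds" means strictly greater than, and that Error takes precedence over Warning and Warning over Good. `Sample`, `State`, `part_state`, and `last_x_state` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Mapping, Sequence

AVAILABILITY = "Device Availability"


class State(Enum):
    GOOD = "Good"
    WARNING = "Warning"
    ERROR = "Error"
    NO_SAMPLES = "No metric samples"
    CRITICAL = "Critical"


@dataclass(frozen=True)
class Sample:
    """One metric sample; `state` is expected to be GOOD, WARNING or ERROR."""
    metric: str
    state: State


def _error_warning_good(samples: Sequence[Sample], damage_points: Mapping[str, int],
                        damage_points_threshold: int, total_warnings_threshold: int) -> State:
    """Rules shared by both kinds of part: Error, then Warning, then Good."""
    if any(s.state is State.ERROR for s in samples):
        return State.ERROR
    warnings = [s for s in samples if s.state is State.WARNING]
    if not warnings:
        return State.GOOD
    # Assumption: each warning sample adds its metric's damage points.
    damage = sum(damage_points.get(s.metric, 0) for s in warnings)
    if damage > damage_points_threshold or len(warnings) > total_warnings_threshold:
        return State.ERROR
    return State.WARNING


def _availability(samples: Sequence[Sample]) -> List[Sample]:
    return [s for s in samples if s.metric == AVAILABILITY]


def part_state(samples: Sequence[Sample], damage_points: Mapping[str, int],
               damage_points_threshold: int, total_warnings_threshold: int) -> State:
    """State of one 15-minute part."""
    avail = _availability(samples)
    if not samples or (bool(avail) and all(s.state is State.ERROR for s in avail)):
        return State.NO_SAMPLES
    return _error_warning_good(samples, damage_points,
                               damage_points_threshold, total_warnings_threshold)


def last_x_state(samples: Sequence[Sample], damage_points: Mapping[str, int],
                 damage_points_threshold: int, total_warnings_threshold: int) -> State:
    """State of the last-X-minutes part."""
    avail = _availability(samples)
    # "All availability samples are Error" implies the last one is Error,
    # so checking the last sample also covers that condition.
    if not samples or (bool(avail) and avail[-1].state is State.ERROR):
        return State.CRITICAL
    return _error_warning_good(samples, damage_points,
                               damage_points_threshold, total_warnings_threshold)
```

Passing the samples of the last X minutes to `last_x_state`, and the samples of each 15-minute part to `part_state`, yields the five states shown on a card.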







