The System Health dashboard provides an overview of the health state of all devices, based on user-defined metrics.

The dashboard is composed of multiple cards, each representing the health of one device. The cards can optionally be divided into device groups.

In addition, the dashboard displays the DPOD health state based on Internal Health Alerts.

Metrics

  • The health of each device is based on several user-defined metrics, for example the CPU of the device.
  • A metric is a criterion and a set of thresholds that together define whether the state of the device in that aspect is Good, Warning, or Error.
  • Metrics are based on the Alerts subsystem of DPOD:
    • The user can define which alerts are part of the System Health by selecting the "System Health Metric" option under [Manage → Alert → Setup Alerts → Edit Alert].
    • Each alert can be used as a simple alert, as a System Health metric, or both. Using an alert both for alerting and as a System Health metric is recommended, because it ensures that the System Health dashboard precisely reflects the alerts that were sent.
  • The following System Health metrics are defined by default:
    • Devices CPU Metric
    • Devices Memory Metric
    • Devices Load Metric
    • Devices Fan Metric
    • Devices Temperature Metric
    • Devices Voltage Metric
    • Devices Space Encrypted Metric
    • Devices Space Temp Metric
    • Devices Space Internal Metric
    • System Errors Metric
    • Device Availability Metric - This is an internal metric, based on the "Device Resources Monitoring" option selected at the device level, that checks whether the device is available.
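
As an illustration of how a criterion and a set of thresholds can map a measured value to one of the three states, here is a minimal Python sketch. It is not DPOD code: the `Metric` class, the `metric_state` function, the threshold values, and the "higher value is worse" comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical model of a System Health metric (not a DPOD type)."""
    name: str
    warning_threshold: float  # assumed: values at or above this are Warning
    error_threshold: float    # assumed: values at or above this are Error


def metric_state(metric: Metric, value: float) -> str:
    """Classify a measured value as Good, Warning, or Error.

    Assumes a higher value is worse, as with CPU or memory usage.
    """
    if value >= metric.error_threshold:
        return "Error"
    if value >= metric.warning_threshold:
        return "Warning"
    return "Good"


# Example thresholds are invented for illustration only.
cpu = Metric("Devices CPU Metric", warning_threshold=70.0, error_threshold=90.0)
for sample in (35.0, 75.0, 95.0):
    print(sample, metric_state(cpu, sample))
```

A metric where a lower value is worse (free space, for example) would need the comparisons reversed.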

...

  • Each device may have a total warnings threshold, which sets the health of that device to Error when the number of metrics in the Warning state exceeds that threshold within a specific time period (see Device Health Settings and Configuring Monitored Gateways).
  • Each device may also have a warning damage points threshold, which sets the health of that device to Error when the sum of the damage points of all metrics in the Warning state exceeds that threshold within a specific time period (see Device Health Settings and Configuring Monitored Gateways).
    • Each System Health metric may be assigned damage points, which should reflect the severity of that warning.
  • For each device, the user can set thresholds and damage points per health metric. These override the default thresholds and damage points defined at the System Health metric level (see Device Health Settings and Configuring Monitored Gateways).
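
The two escalation rules and the per-device override can be sketched as follows. This is a simplified illustration under stated assumptions, not the DPOD implementation: the names `MetricResult`, `device_health`, and `effective_damage_points` are hypothetical, the time period is ignored (all results are taken to fall within it), and the rules that any metric in Error makes the device Error and any metric in Warning makes it at least Warning are assumed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MetricResult:
    """Hypothetical per-metric result for one device."""
    name: str
    state: str              # "Good", "Warning", or "Error"
    damage_points: int = 0  # severity weight used when the state is Warning


def effective_damage_points(metric_default: int,
                            device_override: Optional[int] = None) -> int:
    """A device-level value, when set, overrides the metric-level default."""
    return metric_default if device_override is None else device_override


def device_health(results: Iterable[MetricResult],
                  total_warnings_threshold: Optional[int] = None,
                  damage_points_threshold: Optional[int] = None) -> str:
    """Combine metric states into one device state (time period ignored)."""
    results = list(results)
    # Assumption: a single metric in Error puts the device in Error.
    if any(r.state == "Error" for r in results):
        return "Error"
    warnings = [r for r in results if r.state == "Warning"]
    if not warnings:
        return "Good"
    # Rule 1: too many metrics in Warning escalates the device to Error.
    if (total_warnings_threshold is not None
            and len(warnings) > total_warnings_threshold):
        return "Error"
    # Rule 2: too many accumulated damage points escalates to Error.
    if (damage_points_threshold is not None
            and sum(r.damage_points for r in warnings) > damage_points_threshold):
        return "Error"
    return "Warning"


sample = [
    MetricResult("Devices CPU Metric", "Warning", effective_damage_points(3)),
    MetricResult("Devices Memory Metric", "Warning", effective_damage_points(2, 4)),
    MetricResult("Devices Fan Metric", "Good"),
]
print(device_health(sample, total_warnings_threshold=2, damage_points_threshold=6))
```

In this example only two metrics are in Warning, which does not exceed the warnings threshold of 2, but their damage points (3 + 4) exceed the threshold of 6, so the sketch reports Error.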

Devices Display Options

  • For each device, the user can define whether the device is displayed in the System Health dashboard.
  • The user may define device groups:
    • Each device group has a name and a display order.
    • Devices are assigned to one or more device groups, with a defined display order.
    • For example: Production, Non-production.
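
One possible way to picture the grouping is sketched below in Python. The data layout, the device names, and the `dashboard_layout` function are hypothetical, and the sketch assumes that each device has a separate display order inside every group it belongs to.

```python
# Hypothetical data: group name -> display order of the group.
groups = {"Non-production": 2, "Production": 1}

# Hypothetical assignments: (device, group, display order inside that group).
memberships = [
    ("dp-prod-2", "Production", 2),
    ("dp-prod-1", "Production", 1),
    ("dp-test-1", "Non-production", 1),
    ("dp-prod-1", "Non-production", 2),  # a device may be in several groups
]


def dashboard_layout(groups, memberships):
    """Return groups in display order, each with its devices in display order."""
    layout = []
    for group in sorted(groups, key=groups.get):
        members = sorted(
            (order, device) for device, g, order in memberships if g == group
        )
        layout.append((group, [device for _, device in members]))
    return layout


for group, devices in dashboard_layout(groups, memberships):
    print(group, devices)
```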

...