The System Health dashboard provides an overview of the health state of all devices, based on user-defined metrics.

The dashboard is composed of multiple cards, each representing the health of a device, optionally divided into device groups.

In addition, the dashboard displays the DPOD health state based on Internal Health Alerts.

Metrics

  • The health of each device is based on several user-defined metrics, for example, the device CPU.
  • A metric is basically a criterion and a set of thresholds that together define whether the state of the device in that aspect is one of: Good / Warning / Error.
  • Metrics are based on the Alerts subsystem of DPOD:
    • The user can define which alerts are part of the System Health by selecting the "System Health Metric" option under [Manage → Alert → Setup Alerts → Edit Alert].
    • Each alert can be used as a simple alert, as a System Health metric, or both. Using an alert both for alerting and as a System Health metric is recommended, since it ensures that the System Health dashboard precisely reflects the alerts that were sent.
  • The following System Health metrics are defined by default:
    • Devices CPU Metric
    • Devices Memory Metric
    • Devices Load Metric
    • Devices Fan Metric
    • Devices Temperature Metric
    • Devices Voltage Metric
    • Devices Space Encrypted Metric
    • Devices Space Temp Metric
    • Devices Space Internal Metric
    • System Errors Metric
  • The user may edit the settings of each metric, including whether it is part of the System Health, from [Manage → Alert → Setup Alerts → Edit Alert].
  • In addition, there is an internal Device Availability Metric, named "Device Availability", which is not editable. Its purpose is to sample the device availability, based on the "Resources Monitoring" option selected at the device level, which checks whether the device is available or not.

Assumptions:

  • The existing alerts infrastructure (with a few additional fields) will provide all the data and logic needed to decide whether a sample should be alerted, and whether it is considered Error, Warning, or Good.
  • The alerts mechanism will be the only source of current and future metrics. New alert fields: "Is alert used as health metric", warning threshold, and damage points.
  • Every health metric must be based on a detailed investigation screen.
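The assumptions above describe alerts that carry three new fields and whose samples evaluate to Good, Warning, or Error. The following is a minimal sketch of that idea, not DPOD's implementation. It assumes a numeric threshold check where higher values are worse, and it treats the existing alert threshold as the Error threshold. The names `AlertDefinition`, `classify_sample`, and `health_metrics` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HealthState(Enum):
    GOOD = "Good"
    WARNING = "Warning"
    ERROR = "Error"


@dataclass
class AlertDefinition:
    """Hypothetical alert record extended with the proposed health fields."""
    name: str
    error_threshold: float                     # assumption: existing alert threshold means Error
    is_health_metric: bool = False             # new field: "Is alert used as health metric"
    warning_threshold: Optional[float] = None  # new field: warning threshold
    damage_points: int = 0                     # new field: damage points


def classify_sample(alert: AlertDefinition, value: float) -> HealthState:
    """Classify one sampled value; assumes higher values are worse."""
    if value >= alert.error_threshold:
        return HealthState.ERROR
    if alert.warning_threshold is not None and value >= alert.warning_threshold:
        return HealthState.WARNING
    return HealthState.GOOD


def health_metrics(alerts: List[AlertDefinition]) -> List[AlertDefinition]:
    """Keep only the alerts flagged as System Health metrics."""
    return [a for a in alerts if a.is_health_metric]
```

For example, a hypothetical CPU alert with a warning threshold of 75 and an error threshold of 90 would classify a sample of 80 as Warning. Alerts without the health flag stay plain alerts and are filtered out of the dashboard.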

Prerequisites:

Metric:

Device Health Settings:

  • For each device, the user can define whether the device is displayed in the System Health dashboard, as well as its Damage Points Threshold and Total Warnings Threshold.
  • For each device, the user can set thresholds and damage points per health metric (see device health settings).
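Device Health Settings combine per-metric damage points with a Damage Points Threshold and a Total Warnings Threshold, but this section does not spell out how they are aggregated. The sketch below assumes one plausible rule, not necessarily DPOD's: damage points of every metric in Error are summed, metrics in Warning are counted, and the device is in Error when the damage total reaches the Damage Points Threshold, otherwise in Warning when the warning count reaches the Total Warnings Threshold. The function `device_health` and its parameters are hypothetical.

```python
from enum import Enum
from typing import Iterable, Tuple


class HealthState(Enum):
    GOOD = "Good"
    WARNING = "Warning"
    ERROR = "Error"


def device_health(
    metric_results: Iterable[Tuple[HealthState, int]],
    damage_points_threshold: int,
    total_warnings_threshold: int,
) -> HealthState:
    """Hypothetical aggregation of per-metric states into one device state.

    metric_results holds one (state, damage_points) pair per health metric,
    where damage_points is the per-device value set for that metric.
    """
    total_damage = 0
    total_warnings = 0
    for state, damage_points in metric_results:
        if state is HealthState.ERROR:
            total_damage += damage_points  # assumed: only Error metrics add damage
        elif state is HealthState.WARNING:
            total_warnings += 1            # assumed: each Warning metric counts once
    if total_damage >= damage_points_threshold:
        return HealthState.ERROR
    if total_warnings >= total_warnings_threshold:
        return HealthState.WARNING
    return HealthState.GOOD
```

Under this rule, raising a device's Damage Points Threshold makes it tolerate more failing metrics before its card turns to Error.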

Device Group Settings:

System Parameters (DB) - default values for:

  • "System Health Dashboard Sample Time Range (min.)" - defaults to 5 minutes.
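The sample time range parameter bounds which samples the dashboard considers. A minimal sketch, assuming samples are (timestamp, value) pairs and that the dashboard uses the newest sample inside the window. `samples_in_range` and `latest_value` are hypothetical helpers, and how DPOD treats an empty window is not specified here.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Mirrors the default of "System Health Dashboard Sample Time Range (min.)".
DEFAULT_SAMPLE_TIME_RANGE_MIN = 5

Sample = Tuple[datetime, float]  # hypothetical (timestamp, value) pair


def samples_in_range(
    samples: List[Sample],
    now: datetime,
    range_minutes: int = DEFAULT_SAMPLE_TIME_RANGE_MIN,
) -> List[Sample]:
    """Return the samples taken within the last `range_minutes` minutes."""
    window_start = now - timedelta(minutes=range_minutes)
    return [s for s in samples if window_start <= s[0] <= now]


def latest_value(
    samples: List[Sample],
    now: datetime,
    range_minutes: int = DEFAULT_SAMPLE_TIME_RANGE_MIN,
) -> Optional[float]:
    """Value of the newest sample in the window, or None if the window is empty."""
    recent = samples_in_range(samples, now, range_minutes)
    if not recent:
        return None
    return max(recent, key=lambda s: s[0])[1]
```

With the 5-minute default, a sample taken 10 minutes ago is ignored, while samples from 4 and 1 minutes ago are kept and the newer one is reported.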

...