The System Health dashboard provides an overview of the health state of all devices, based on health metrics.

The dashboard is composed of multiple cards, each displaying the health of one device. Cards can optionally be divided into groups.

In addition, the dashboard displays the DPOD health state.

Assumptions:

  • The existing alerts infrastructure (extended with a few additional fields) will provide all the data and logic needed to decide whether a sample should trigger an alert, and whether it is considered an error, a warning, or good.
  • The alerts mechanism will be the only source of current and future health metrics. New fields: "Is alert used as health metric", warning threshold, and damage points.
  • Every health metric must be backed by a detailed investigation screen.
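The per-sample decision described in the assumptions above can be sketched as follows. This is a minimal illustration, not the product implementation. The names `HealthMetricAlert` and `classify_sample` are hypothetical. The sketch assumes that higher values are worse (as with CPU usage), that a sample at or above the alert threshold is an error, and that a sample at or above the new warning threshold is a warning.

```python
from dataclasses import dataclass
from enum import Enum


class HealthState(Enum):
    GOOD = "good"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class HealthMetricAlert:
    """Hypothetical alert definition extended with the proposed new fields."""
    name: str
    alert_threshold: float         # existing threshold; assumed: value >= threshold is an error
    warning_threshold: float       # new field; assumed: value >= threshold is a warning
    damage_points: int             # new field; points charged when the metric is in error
    is_health_metric: bool = True  # new field: "Is alert used as health metric"


def classify_sample(alert: HealthMetricAlert, value: float) -> HealthState:
    """Classify one sample, assuming higher values are worse (e.g. CPU %)."""
    if value >= alert.alert_threshold:
        return HealthState.ERROR
    if value >= alert.warning_threshold:
        return HealthState.WARNING
    return HealthState.GOOD


cpu = HealthMetricAlert("Devices CPU Metric", alert_threshold=90.0,
                        warning_threshold=75.0, damage_points=10)
print(classify_sample(cpu, 50.0).value)  # good
print(classify_sample(cpu, 80.0).value)  # warning
print(classify_sample(cpu, 95.0).value)  # error
```

Metrics where lower values are worse would need the comparisons reversed. That is left out here to keep the sketch short.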

...

  • List of default System Health metrics:
    • Devices CPU Metric
    • Devices Memory Metric
    • Devices Load Metric
    • Devices Fan Metric
    • Devices Temperature Metric
    • Devices Voltage Metric
    • Devices Space Encrypted Metric
    • Devices Space Temp Metric
    • Devices Space Internal Metric
    • System Errors Metric
  • The user can define whether a metric is part of the system health by selecting the "System Health Metric" option.
    • The user can edit the metric settings from [Manage → Alert → Setup Alerts → Edit Alert].
  • In addition, there is a non-editable internal metric called "Device Availability", whose purpose is to sample device availability.
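A minimal sketch of how the metric list and the "System Health Metric" option might be modeled. The names `MetricSetting`, `set_system_health_metric` and `active_health_metrics` are hypothetical. The sketch also assumes that every default metric starts enabled and that the internal "Device Availability" metric always counts toward system health and rejects edits.

```python
from dataclasses import dataclass
from typing import List

# Default System Health metrics as listed in this specification.
DEFAULT_HEALTH_METRICS = [
    "Devices CPU Metric",
    "Devices Memory Metric",
    "Devices Load Metric",
    "Devices Fan Metric",
    "Devices Temperature Metric",
    "Devices Voltage Metric",
    "Devices Space Encrypted Metric",
    "Devices Space Temp Metric",
    "Devices Space Internal Metric",
    "System Errors Metric",
]


@dataclass
class MetricSetting:
    """Hypothetical per-metric setting backing the "System Health Metric" option."""
    name: str
    system_health_metric: bool = True  # assumption: default metrics start enabled
    editable: bool = True              # False for internal metrics


def default_settings() -> List[MetricSetting]:
    settings = [MetricSetting(name) for name in DEFAULT_HEALTH_METRICS]
    # Internal metric: always part of system health and never editable by the user.
    settings.append(MetricSetting("Device Availability", editable=False))
    return settings


def set_system_health_metric(settings: List[MetricSetting], name: str, enabled: bool) -> None:
    """Toggle the option, as the user would from Edit Alert."""
    for setting in settings:
        if setting.name == name:
            if not setting.editable:
                raise PermissionError(f"{name} is an internal metric and cannot be edited")
            setting.system_health_metric = enabled
            return
    raise KeyError(name)


def active_health_metrics(settings: List[MetricSetting]) -> List[str]:
    return [s.name for s in settings if s.system_health_metric]


settings = default_settings()
set_system_health_metric(settings, "Devices Fan Metric", False)
print(len(active_health_metrics(settings)))  # 10
```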

Device Health Settings:

  • For each device, the user can define whether the device is displayed in the System Health dashboard, as well as its Damage Points Threshold and Total Warnings Threshold.
  • For each device, the user can set thresholds and damage points per health metric. (TODO: add link to device health settings)
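A sketch of how the per-device settings might combine into a single device health state. The aggregation rule is an assumption and is not taken from this specification. Under that rule, the damage points of all metrics in an error state are summed and compared with the Damage Points Threshold, and the number of metrics in a warning state is compared with the Total Warnings Threshold. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HealthState(Enum):
    GOOD = "good"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class DeviceHealthSettings:
    """Per-device settings described in this section (field names are hypothetical)."""
    display_in_dashboard: bool
    damage_points_threshold: int
    total_warnings_threshold: int


@dataclass
class MetricResult:
    """State of one health metric on one device, with the device's damage points for it."""
    metric: str
    state: HealthState
    damage_points: int


def device_health(settings: DeviceHealthSettings,
                  results: List[MetricResult]) -> Optional[HealthState]:
    """Assumed rule: summed damage decides ERROR, the warning count decides WARNING."""
    if not settings.display_in_dashboard:
        return None  # device is not shown on the System Health dashboard
    damage = sum(r.damage_points for r in results if r.state is HealthState.ERROR)
    warning_count = sum(1 for r in results if r.state is HealthState.WARNING)
    if damage >= settings.damage_points_threshold:
        return HealthState.ERROR
    if warning_count >= settings.total_warnings_threshold:
        return HealthState.WARNING
    return HealthState.GOOD


settings = DeviceHealthSettings(True, damage_points_threshold=20, total_warnings_threshold=2)
results = [
    MetricResult("Devices CPU Metric", HealthState.ERROR, 10),
    MetricResult("Devices Fan Metric", HealthState.WARNING, 5),
    MetricResult("Devices Memory Metric", HealthState.WARNING, 5),
]
print(device_health(settings, results).value)  # warning
```

Here the single CPU error contributes 10 damage points, which is below the threshold of 20. The two warnings reach the Total Warnings Threshold, so the device is reported as a warning.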

Device Group Settings:

  • For each device, the user can define the device group and the display order. (TODO: add link to device group settings)
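This sketch assumes each card carries an optional group name and a numeric display order. Under that assumption, the dashboard layout could be computed roughly as follows. `DeviceCard`, `layout_cards` and the device names are hypothetical. Ordering groups alphabetically and placing ungrouped devices last are also assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DeviceCard:
    device: str
    group: Optional[str]  # None when no group is assigned (groups are optional)
    display_order: int


def layout_cards(cards: List[DeviceCard]) -> List[Tuple[Optional[str], List[str]]]:
    """Group cards and order them; the group ordering rule is an assumption."""
    by_group: Dict[Optional[str], List[DeviceCard]] = defaultdict(list)
    for card in cards:
        by_group[card.group].append(card)
    layout = []
    # Assumption: named groups sorted alphabetically, ungrouped devices last.
    for group in sorted(by_group, key=lambda g: (g is None, g or "")):
        # Within a group, sort by display order, then by device name as a tie-breaker.
        members = sorted(by_group[group], key=lambda c: (c.display_order, c.device))
        layout.append((group, [c.device for c in members]))
    return layout


cards = [
    DeviceCard("idg-prod-2", "Production", 2),
    DeviceCard("idg-prod-1", "Production", 1),
    DeviceCard("idg-lab", None, 1),
    DeviceCard("idg-test", "Test", 1),
]
print(layout_cards(cards))
# [('Production', ['idg-prod-1', 'idg-prod-2']), ('Test', ['idg-test']), (None, ['idg-lab'])]
```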

System Parameters (DB) - default values for:

  • "System Health Dashboard Sample Time Range (min.)" - defaults to 5 minutes.
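The following sketch shows how the sample time range parameter might be applied when selecting samples for the dashboard. It is a minimal example with two assumptions. First, a plain dictionary stands in for the System Parameters table in the DB. Second, the window is inclusive at both ends. The function name `samples_in_range` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Mapping, Tuple

SAMPLE_RANGE_PARAM = "System Health Dashboard Sample Time Range (min.)"

# Hypothetical in-memory stand-in for the System Parameters table in the DB.
SYSTEM_PARAMETERS = {SAMPLE_RANGE_PARAM: 5}


def samples_in_range(samples: List[Tuple[datetime, float]], now: datetime,
                     params: Mapping[str, int] = SYSTEM_PARAMETERS) -> List[Tuple[datetime, float]]:
    """Keep samples whose timestamp falls within the configured window ending at `now`."""
    minutes = params.get(SAMPLE_RANGE_PARAM, 5)  # fall back to the documented default
    window_start = now - timedelta(minutes=minutes)
    return [(ts, value) for ts, value in samples if window_start <= ts <= now]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
samples = [
    (now - timedelta(minutes=2), 40.0),
    (now - timedelta(minutes=5), 55.0),
    (now - timedelta(minutes=6), 90.0),
]
print(len(samples_in_range(samples, now)))  # 2
```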

...