The Agents screen is accessible by logging into the Web Console and navigating to [Manage→Internal Manage→DPOD HealthAgents].

DPOD agents are responsible for collecting data (either actively or passively) from monitored devices and storing it in the Store. This screen allows you to verify that the agents are up and running.

...

The Agent Status section of the screen is a set of 3 widgets, each displaying data related to a different set of DPOD agents: Syslog, WS-M and Resources. The agents are monitored internally by a KeepAlive service. For a list of all system services and an explanation of KeepAlive processing, see the section on System Services Management.

Each collector agent is displayed in a colored box (see below). Click an agent's name to open its details in the Agent's Detail view.

...

Detail | Description | Desired State
General Health Status

The general health status of the agent is relayed by the color of the box wrapping its details.

  • Green – The agent is running and is ready to receive and store syslog messages (Keepalive checks were successful).
  • Yellow – Syslog records or Keepalive checks did not arrive in the last 3 minutes, or the number of dropped records is greater than 0.
  • Red – Syslog records or Keepalive checks did not arrive in the last 10 minutes or more.
  • Grey – Device/Service Resources agents only. No monitored devices were added yet, or device/service resource monitoring wasn't requested for any device (this status does not indicate any problem).
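
The color rules above can be sketched as a small classification function. This is only an illustration, not DPOD's implementation: the function name, its parameters, the `Health` enum, and the choice to let Red take precedence over Yellow are all assumptions made for the example.

```python
from enum import Enum


class Health(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"
    GREY = "grey"


YELLOW_AFTER_SECONDS = 3 * 60   # no records in the last 3 minutes
RED_AFTER_SECONDS = 10 * 60     # no records in the last 10 minutes or more


def classify_agent_health(seconds_since_last_record, dropped_records,
                          is_resources_agent=False, has_monitored_devices=True):
    """Map the documented color rules onto a single status value (hypothetical helper)."""
    # Grey applies only to Device/Service Resources agents with nothing to monitor.
    if is_resources_agent and not has_monitored_devices:
        return Health.GREY
    # Assumption: Red is checked first, so it wins over Yellow.
    if seconds_since_last_record >= RED_AFTER_SECONDS:
        return Health.RED
    if seconds_since_last_record > YELLOW_AFTER_SECONDS or dropped_records > 0:
        return Health.YELLOW
    return Health.GREEN
```

For example, an agent whose last record is 30 seconds old but which has dropped 5 records would be classified as Yellow under these assumptions.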

Possible agent issues:

  • A monitored device whose system time is not synced with DPOD's system time may send Syslog records with a "future" timestamp, which will cause the agent's health status to be Yellow or Red.
  • The agent service is down – Syslog records and Keepalive records will not be processed, causing the agent's status to change to Yellow or Red.
  • The Keepalive service is down and the Syslog agent did not receive any records from monitored devices – this will cause the agent's status to change to Yellow or Red.
Green - the agent is healthy.



Date / Time
The timestamp of the last successful record processed by the agent (Syslog record or Keepalive check). A delay of over a minute may suggest a performance problem.
Note: It is important to verify that the system time is synchronized correctly when reading these values.
< 3 minutes
Msg. Rate

This is the total number of messages of all types (syslog, WS-M, and keepalive messages) processed by this agent in the last 10 minutes.

Verify:

  • That the number is greater than 1. If it isn’t, either agents are down or the network is down.
  • For Syslog agents: the number should reflect the expected throughput of raw log records received from all monitored devices in the last 10 minutes.
  • For WS-M agents: if WS-M recording is enabled, the number should reflect the message throughput for the recording service/domain in the last 10 minutes.
  • If this value is more than 500,000, consider redistributing traffic to other agents in order to optimize performance.

You may redirect syslog traffic from one agent to another by assigning a domain to a specific agent. For more details see Adding Monitored Gateways.

1 < value < 500,000

(if there are any monitored devices)

Dropped Msgs
(For Syslog and WS-M agents)
This is the total number of syslog or WS-M messages that were sent from the monitored devices but were not processed by the DPOD agent in the last 10 minutes.

Dropped messages usually indicate that the agent cannot keep up with the load; consider redistributing traffic to other agents.
0
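
The desired states listed in this table can be combined into one check, sketched below. The function and parameter names are hypothetical, and the inputs are assumed to be values read off the Agents screen by hand or by your own tooling; nothing here is a DPOD API.

```python
def check_desired_state(minutes_since_last_record, msg_count_10min,
                        dropped_10min, has_monitored_devices=True):
    """Return a list of findings; an empty list means every desired state is met."""
    findings = []
    # Date / Time: desired state is "< 3 minutes".
    if minutes_since_last_record >= 3:
        findings.append("last record is 3 or more minutes old")
    # Msg. Rate: desired state is "1 < value < 500,000", if there are monitored devices.
    if has_monitored_devices:
        if msg_count_10min <= 1:
            findings.append("message count too low: agents or network may be down")
        elif msg_count_10min >= 500_000:
            findings.append("message count too high: consider redistributing traffic")
    # Dropped Msgs: desired state is 0.
    if dropped_10min > 0:
        findings.append("messages were dropped: agent may not keep up with the load")
    return findings
```

Calling `check_desired_state(1, 12000, 0)` returns an empty list, while a non-zero dropped count adds a single finding about dropped messages.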

If you encounter any problems, see the Troubleshoot & Support links for agent status troubleshooting.

...

Graph | Details | Desired State
Files Process Pending
The graph depicts the number of large payloads waiting to be processed. A value higher than 1000 indicates a high load on WS-M subscription.
WS-M usage should be avoided until this folder is cleared by the system or you clear it manually.
Only a few files displayed

Channel Utilization

The graph depicts stream processing usage in percentage. Each colored graph denotes a different agent.

Verify that all agents use less than 80% of their stream processing capacity. If usage goes above 80%, data coming in from collector agents might be lost.
See how to Troubleshoot & Support the issue.

Under 80% for all agents
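
As an illustration of the 80% rule, the sketch below flags agents whose stream processing usage is at or above the limit. The function name, the dictionary input, and the agent names in the usage note are hypothetical; the values are assumed to be read from the Channel Utilization graph.

```python
UTILIZATION_LIMIT_PERCENT = 80.0


def agents_over_limit(utilization_by_agent, limit=UTILIZATION_LIMIT_PERCENT):
    """Return the names of agents that are not under the limit, sorted by name."""
    return sorted(
        name for name, percent in utilization_by_agent.items()
        if percent >= limit
    )
```

With hypothetical readings of `{"SyslogAgent-1": 42.0, "SyslogAgent-2": 85.5}`, only `SyslogAgent-2` is returned.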

Agents Free Memory

The graph depicts the collector agents’ free memory over time, where each agent is denoted in a different color.

When an agent's free memory is too low, you might encounter performance problems. See the Troubleshoot & Support links for this issue.

Verify that each agent has at least 30-40 Mil free.

...

The following information is displayed for each message:

Column | Description
Device | The device emitting the message. Click on the device name to view the device in the Raw Messages view.
Domain | The domain for this message
Category | Always montier-ka
Severity | Always debug
Time | Timestamp for the Keep-Alive message
Direction | N/A
Object Type | N/A
Object Name | N/A
Trans. ID | N/A
Client IP | This will always be the originating host, so 0.0.0.0
Message | Keep-Alive message text.

Agent Statistics

Graph | Description
Message Rate (per sec) | This widget displays a graph of the number of messages per second going through this agent over the last 24 hours.
Dropped Syslog Messages | This graph shows the number of messages dropped by the agent. Note: This value is cumulative; the agent resets it to zero only after a restart.
Channels Utilization | The graph depicts stream processing usage of the current agent in percentage.
Free Memory | The graph depicts the agent's free memory over time.
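
Because the dropped-messages value is cumulative and returns to zero only when the agent restarts, per-interval drops have to be derived from consecutive readings. The sketch below shows one way to do that; the function name and the idea of sampling the counter at regular intervals are assumptions for illustration, and a decrease between two samples is interpreted as a restart.

```python
def drops_per_interval(cumulative_samples):
    """Convert cumulative dropped-message readings into per-interval counts."""
    deltas = []
    previous = None
    for current in cumulative_samples:
        if previous is not None:
            if current >= previous:
                deltas.append(current - previous)
            else:
                # The counter went down: assume the agent restarted and the
                # counter started again from zero.
                deltas.append(current)
        previous = current
    return deltas
```

For readings of 0, 5, 5, 12, 3, 4 the per-interval drops are 5, 0, 7, 3, 1, where the fourth interval spans an assumed restart.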


Reporting Domains (24 hrs.)

...