IBM DataPower Operations Dashboard v1.0.8.5

Note: A more recent version of DPOD is available. See DPOD Documentation for the latest documentation.

Agents

The Agents screen is accessible by logging into the Web Console and navigating to [Manage → Internal Health → Agents].

DPOD agents are responsible for collecting data (either actively or passively) from monitored devices and storing it in the Big Data Store. This screen allows you to verify that the agents are up and running.

The screen shows two sections:

  1. Agent Status - Showing the state of all the agents
  2. Agent Processing Status - Showing streaming and memory consumption data

Agent Status

The Agent Status section of the screen consists of three widgets, each displaying data for a different set of DPOD agents: Syslog, WS-M and Resources. The agents are monitored internally by a KeepAlive service. For a list of all system services and an explanation of KeepAlive processing, see System Services Management.

Each collector agent is displayed in a colored box. Click an agent's name to open its details in the Agent's Detail view.



The following details are presented for each agent, together with the desired state of each:

General Health Status

The general health status of the agent is indicated by the color of the box surrounding its details:

  • Green – The agent is running and is ready to receive and store syslog messages (Keepalive checks were successful).
  • Yellow – No Syslog records or Keepalive checks arrived in the last 3 minutes, OR the number of dropped records is greater than 0.
  • Red – No Syslog records or Keepalive checks arrived for 10 minutes or more.
  • Grey - Device/Service Resources agents only. No monitored devices were added yet, or device/service resource monitoring was not requested for any device (this status does not indicate a problem).

Possible agent issues:

  • A monitored device whose system time is not synchronized with DPOD's system time may send Syslog records with a "future" timestamp, which will cause the agent's health status to turn Yellow or Red.
  • The agent service is down - Syslog records and Keepalive records will not be processed, causing the agent's status to change to Yellow or Red.
  • The Keepalive service is down - if the Syslog agent also receives no records from the monitored devices, the agent's status will change to Yellow or Red.

Desired state: Green - the agent is healthy.



Date / Time

The timestamp of the last record successfully processed by the agent (Syslog record or Keepalive check). A delay of over a minute may suggest a performance problem.
Note: When reading these values, make sure the system time is synchronized correctly.

Desired state: < 3 minutes
Msg. Rate

This is the total number of messages of all types (syslog, WS-M and keepalive messages) processed by this agent in the last 10 minutes.

Verify:

  • That the number is greater than 1. If it isn't, either the agents are down or the network is down.
  • For Syslog agents: the number should reflect the expected throughput of raw log records received from all monitored devices in the last 10 minutes.
  • For WS-M agents: if WS-M recording is enabled, the number should reflect the message throughput for the recording service/domain over 10 minutes.
  • If this value is more than 500,000, consider redistributing traffic to other agents to optimize performance.

You can redirect syslog traffic from one agent to another by assigning a domain to a specific agent. For more details, see Adding Monitored Devices.

Desired state: 1 < value < 500,000 (if there are any monitored devices)

Dropped Msgs (Syslog and WS-M agents only)

This is the total number of syslog or WS-M messages that were sent from the monitored devices but were not processed by the DPOD agent in the last 10 minutes.

Dropped messages usually indicate that the agent cannot keep up with the load. Consider redistributing traffic to other agents.

Desired state: 0

If you encounter any problems, see how to Troubleshoot agent status issues.
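The color rules and desired states listed above can be read as a small decision function. The following Python sketch assumes you have already collected the values shown in an agent's box, for example by reading them off the screen. `AgentSnapshot`, `health_color` and `desired_state_issues` are hypothetical names and are not part of DPOD. The boundary handling at exactly 3 and 10 minutes is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AgentSnapshot:
    """Hypothetical container for the values shown in one agent's box."""
    last_record: Optional[datetime]   # "Date / Time": newest Syslog record or Keepalive check
    msg_rate_10min: int               # "Msg. Rate": messages processed in the last 10 minutes
    dropped_10min: int                # "Dropped Msgs": messages dropped in the last 10 minutes
    resources_agent: bool = False     # True for Device/Service Resources agents
    resources_monitored: bool = True  # False if no device has resource monitoring requested


def health_color(agent: AgentSnapshot, now: datetime) -> str:
    """Map a snapshot to the documented Green/Yellow/Red/Grey rules (boundaries assumed)."""
    if agent.resources_agent and not agent.resources_monitored:
        return "Grey"  # nothing to monitor yet; not a problem
    if agent.last_record is None or now - agent.last_record >= timedelta(minutes=10):
        return "Red"   # no records or Keepalive checks for 10 minutes or more
    if now - agent.last_record >= timedelta(minutes=3) or agent.dropped_10min > 0:
        return "Yellow"
    return "Green"


def desired_state_issues(agent: AgentSnapshot, now: datetime,
                         has_monitored_devices: bool = True) -> List[str]:
    """List every detail that is outside its documented desired state."""
    issues = []
    if agent.last_record is None or now - agent.last_record >= timedelta(minutes=3):
        issues.append("Date / Time is not within the last 3 minutes")
    if has_monitored_devices and not 1 < agent.msg_rate_10min < 500_000:
        issues.append("Msg. Rate is outside 1 < value < 500,000")
    if agent.dropped_10min != 0:
        issues.append("Dropped Msgs is not 0")
    return issues


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    lagging = AgentSnapshot(last_record=now - timedelta(minutes=5),
                            msg_rate_10min=1200, dropped_10min=0)
    print(health_color(lagging, now))          # Yellow
    print(desired_state_issues(lagging, now))  # ['Date / Time is not within the last 3 minutes']
```

Checking the rules in order from Grey to Green lets the most severe applicable color win. This matches the descriptions, because Red (10 minutes) is a stricter form of the Yellow (3 minutes) condition.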

Agent Processing Status

The bottom of the screen displays the agent processing status graphs, which show the processing state of the agents.

Files Process Pending

The graph depicts the number of large payloads waiting to be processed. A value higher than 1000 indicates a high load on the WS-M subscription.
Avoid WS-M usage until this folder is cleared by the system or until you clear it manually.

Desired state: only a few files displayed

Channel Utilization

The graph depicts stream processing usage as a percentage. Each colored line represents a different agent.

Verify that all agents use less than 80% of their stream processing capacity. If usage goes above 80%, data coming in from collector agents might be lost. See how to Troubleshoot the issue.

Desired state: under 80% for all agents

Agents Free Memory

The graph depicts the collector agents’ free memory over time, with each agent shown in a different color.

When an agent's free memory is too low, you might encounter performance problems. See how to Troubleshoot the issue.

Desired state: each agent has at least 30-40 Mil free.
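The three graphs each have a documented threshold. The Python sketch below checks one reading from each graph against those thresholds. The function name and the agent names are hypothetical. The free-memory threshold uses the lower end of "30-40 Mil" in the same units the graph displays; that unit interpretation is an assumption.

```python
from typing import Dict, List

FILES_PENDING_HIGH = 1000  # "A value higher than 1000 indicates a high load"
CHANNEL_LIMIT_PCT = 80.0   # desired state: under 80% for all agents
MIN_FREE_MEMORY = 30.0     # lower end of "30-40 Mil", in the graph's own units (assumption)


def processing_warnings(files_pending: int,
                        channel_pct: Dict[str, float],
                        free_memory: Dict[str, float]) -> List[str]:
    """Check one reading of each processing-status graph against its desired state."""
    warnings = []
    if files_pending > FILES_PENDING_HIGH:
        warnings.append(f"Files Process Pending = {files_pending}: avoid WS-M usage until cleared")
    for agent, pct in sorted(channel_pct.items()):
        if pct >= CHANNEL_LIMIT_PCT:
            warnings.append(f"{agent}: channel utilization {pct:.0f}%, incoming data may be lost")
    for agent, free in sorted(free_memory.items()):
        if free < MIN_FREE_MEMORY:
            warnings.append(f"{agent}: free memory {free:g} is below {MIN_FREE_MEMORY:g}")
    return warnings


if __name__ == "__main__":
    # Hypothetical readings for two agents.
    for w in processing_warnings(12,
                                 {"syslog-agent-1": 45.0, "wsm-agent-1": 91.0},
                                 {"syslog-agent-1": 120.0, "wsm-agent-1": 25.0}):
        print(w)
```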

Agent's Detail View

When you click an agent name on the Agent Status screen, DPOD opens the agent's details in a single-agent details view.

The Agent Details view is composed of 4 widgets (for syslog and WS-M Agents) or 3 widgets (for Resources Agents).

Agent Details

The agent details widget displays the following information for the agent:

Detail | Description
IP | The IP address on which this agent runs
DNS | The DNS name of the agent (if set)
Port | The port the agent is listening on
Keepalive | On / Off state of the keepalive service for this agent
Dropped Msgs (10 mins) | How many messages were dropped by the agent in the last 10 minutes
Message Rate (10 min) | Number of messages handled by the agent in the last 10 minutes
Newest Message | Timestamp of the latest message received by this agent

Recent Keep-Alive Messages

This widget displays a table with details of recent keep-alive messages received by this agent. Scanning this table for changes in message frequency can help detect issues.

The following information is displayed for each message:

Column | Description
Device | The device emitting the message. Click the device name to view the device in the Raw Messages view.
Domain | The domain for this message
Category | Always montier-ka
Severity | Always debug
Time | Timestamp of the Keep-Alive message
Direction | N/A
Object Type | N/A
Object Name | N/A
Trans. ID | N/A
Client IP | Always 0.0.0.0 (the originating host)
Message | Keep-Alive message text
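One way to scan the Time column for changes in frequency is to compare each interval between consecutive keep-alive messages with the typical interval. The following Python sketch is a heuristic of our own and is not a DPOD feature. This page does not state the keep-alive period, so the median-based baseline and the `factor` of 2 are assumptions. `keepalive_gaps` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple


def keepalive_gaps(times: List[datetime],
                   factor: float = 2.0) -> List[Tuple[datetime, datetime]]:
    """Return consecutive Keep-Alive pairs spaced more than `factor` x the median spacing."""
    ordered = sorted(times)
    if len(ordered) < 3:
        return []  # too few messages to establish a typical interval
    pairs = list(zip(ordered, ordered[1:]))
    spacing = [(later - earlier).total_seconds() for earlier, later in pairs]
    typical = median(spacing)
    return [pair for pair, gap in zip(pairs, spacing) if gap > factor * typical]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    # One message per minute, then nothing between 12:03 and 12:08.
    seen = [start + timedelta(minutes=m) for m in (0, 1, 2, 3, 8, 9)]
    for earlier, later in keepalive_gaps(seen):
        print(f"gap: {earlier:%H:%M} -> {later:%H:%M}")  # gap: 12:03 -> 12:08
```

The median is used instead of the mean so that a single long gap does not inflate the baseline it is compared against.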

Agent Statistics

Graph | Description
Message Rate (per sec) | This widget displays a graph of the number of messages per second going through this agent over the last 24 hours.
Dropped Syslog Messages | This graph shows the number of messages dropped by the agent. This value is cumulative; the agent resets it to zero only when it restarts.
Channels Utilization | The graph depicts the stream processing usage of the current agent, as a percentage.
Free Memory | The graph depicts the collector agent's free memory over time.
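The Dropped Syslog Messages value is cumulative and resets to zero only on restart. The number of drops in a time window is therefore the sum of the counter's increases, and any decrease marks a restart. The Python sketch below computes that sum from readings taken off the graph. `drops_in_window` is a hypothetical helper. Drops that occurred between the last reading before a restart and the restart itself cannot be recovered this way.

```python
from typing import List


def drops_in_window(samples: List[int]) -> int:
    """Sum the increases of a cumulative drop counter that resets to 0 on agent restart."""
    total = 0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter went down, so the agent restarted and counted again from zero.
            total += current
    return total


if __name__ == "__main__":
    # Readings taken from the graph; the agent restarted between the 12 and the 3.
    print(drops_in_window([0, 5, 5, 12, 3, 4]))  # 16
```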


Reporting Domains (24 hrs.)

This widget lists the domains that reported in the preceding 24 hours.
The list can be used to identify a device that has dropped off the monitoring list.

Column | Description
Device Name | Name of the reporting device
Domain Name | Name of the reporting domain
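To find devices that have dropped off, compare the device/domain pairs you expect to report against the pairs listed in this widget. The Python sketch below does this with a set difference. The expected list is something you maintain yourself, and the device names in the example are hypothetical.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (device name, domain name)


def missing_reporters(expected: Iterable[Pair], reporting: Iterable[Pair]) -> List[Pair]:
    """Return expected (device, domain) pairs absent from the Reporting Domains list."""
    return sorted(set(expected) - set(reporting))


if __name__ == "__main__":
    expected = [("idg-prod-1", "default"), ("idg-prod-1", "payments"), ("idg-prod-2", "default")]
    reporting = [("idg-prod-1", "default"), ("idg-prod-2", "default")]
    print(missing_reporters(expected, reporting))  # [('idg-prod-1', 'payments')]
```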

Recent Resources Messages

This widget is displayed only for the Device and Service Resources agents. It lists recent resource messages in a table.
Resource messages are status messages in which a monitored device reports its resource consumption.

For each resource message, the table displays the following details:

Column | Description
Device ID | ID of the device this resource message relates to
Device Name | Name of the device this resource message relates to
Load Time | Timestamp when the load sampling was taken
Load | Load sampling value
Memory Time | Timestamp when the memory sampling was taken
Used Memory | Memory used sampling value
Total Memory | Total memory for the device
Total Memory % | Percentage of total memory used at sampling time
CPU Time | Timestamp when the CPU usage sampling was taken
CPU | CPU usage (%) at sampling time
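Total Memory % is described as the percentage of total memory used at sampling time. This implies the relation 100 × Used Memory / Total Memory, assuming both columns are reported in the same unit. The small Python sketch below applies that relation. The helper name and the sample values are hypothetical.

```python
def total_memory_percent(used_memory: float, total_memory: float) -> float:
    """Share of total memory in use at sampling time, as a percentage."""
    if total_memory <= 0:
        raise ValueError("total_memory must be positive")
    return 100.0 * used_memory / total_memory


if __name__ == "__main__":
    # Hypothetical sample: 3,000 of 8,000 units in use (both columns in the same unit).
    print(total_memory_percent(3000, 8000))  # 37.5
```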

Monitored Devices (24 hrs.)

This widget is displayed only for the Device and Service Resources agents. It lists the devices monitored by DPOD in the preceding 24 hours.
The list can be used to identify a device that has dropped off the monitoring list.

