Federated architecture best fits customers that execute high loads (thousands of transactions per second) in their gateways, where the vast majority of the transactions are executed on-premise.
The cell environment implements the federated architecture by distributing DPOD's Store and DPOD's processing (using DPOD's agents) across different federated servers.
The cell environment has two main components:

- DPOD Cell Manager - the central server that runs DPOD's Web Console and manages the cell.
- DPOD Federated Cell Members (FCMs) - additional servers that run DPOD's agents and Store nodes, collecting and processing data from the monitored devices.
The following diagram describes the Cell Environment:
The following procedure describes the process of establishing a DPOD cell environment.
From | To | Ports (Defaults) | Protocol | Usage
---|---|---|---|---
DPOD Cell Manager | Each Monitored Device | 5550 (TCP) | HTTP/S | Monitored device administration management interface
DPOD Cell Manager | DNS Server | 53 (TCP/UDP) | DNS | DNS services. Static IP addresses may be used instead.
DPOD Cell Manager | NTP Server | 123 (UDP) | NTP | Time synchronization
DPOD Cell Manager | Organizational mail server | 25 (TCP) | SMTP | Send reports by email
DPOD Cell Manager | LDAP | 389 / 636 (SSL), 3268 / 3269 (SSL) (TCP) | LDAP | Authentication & authorization. Can be over SSL.
DPOD Cell Manager | Each DPOD Federated Cell Member | 443 (TCP) | HTTP/S | Communication (data + management)
DPOD Cell Manager | Each DPOD Federated Cell Member | 9300-9305 (TCP) | ElasticSearch | ElasticSearch communication (data + management)
DPOD Cell Manager | Each DPOD Federated Cell Member | 22 (TCP) | SSH | Root access is needed for the cell installation and for occasional admin operations.
NTP Server | DPOD Cell Manager | 123 (UDP) | NTP | Time synchronization
Each Monitored Device | DPOD Cell Manager | 60000-60003 (TCP) | TCP | SYSLOG data
Each Monitored Device | DPOD Cell Manager | 60020-60023 (TCP) | HTTP/S | WS-M payloads
Users IPs | DPOD Cell Manager | 443 (TCP) | HTTP/S | DPOD's Web Console
Admins IPs | DPOD Cell Manager | 22 (TCP) | SSH | SSH access
Each DPOD Federated Cell Member | DPOD Cell Manager | 443 (TCP) | HTTP/S | Communication (data + management)
Each DPOD Federated Cell Member | DPOD Cell Manager | 9200, 9300-9400 (TCP) | ElasticSearch | ElasticSearch communication (data + management)
Each DPOD Federated Cell Member | DNS Server | 53 (TCP/UDP) | DNS | DNS services
Each DPOD Federated Cell Member | NTP Server | 123 (UDP) | NTP | Time synchronization
NTP Server | Each DPOD Federated Cell Member | 123 (UDP) | NTP | Time synchronization
Each Monitored Device | Each DPOD Federated Cell Member | 60000-60003 (TCP) | TCP | SYSLOG data
Each Monitored Device | Each DPOD Federated Cell Member | 60020-60023 (TCP) | HTTP/S | WS-M payloads
Admins IPs | Each DPOD Federated Cell Member | 22 (TCP) | SSH | SSH access
DPOD Cell Manager | Each DPOD Federated Cell Member | 60000-60003 (TCP) | TCP | Syslog keep-alive data
DPOD Cell Manager | Each DPOD Federated Cell Member | 60020-60023 (TCP) | HTTP/S | WS-M keep-alive data
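Once the servers are provisioned and the firewall rules are in place, basic connectivity between the cell components can be verified with a simple port check before the federation step. The following is a minimal sketch using nc (netcat); the IP address is hypothetical and should be replaced with the actual cell member address:

# Hypothetical address - replace with your actual cell member IP
nc -zv 172.18.100.36 443      # HTTP/S (data + management)
nc -zv 172.18.100.36 9300     # ElasticSearch communication
nc -zv 172.18.100.36 22       # SSH (root access for federation)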
Install DPOD as described in one of the following installation procedures:
As described in the prerequisites section, the cell manager should have two network interfaces.
In this mode, the user is prompted to choose the IP address for the Web Console. Choose the IP address of the external network interface.
After DPOD installation is complete, the user should execute the following operating system performance optimization script:
/app/scripts/tune-os-parameters.sh
The following section describes the installation process of a single Federated Cell Member (FCM). Repeat the procedure for every FCM installation.
Important! The initial installation (until the federation process is executed) of both the Cell Manager and the Cell Members is a standard All-In-One standalone DPOD installation. In order for the initial installation to complete successfully, all prerequisites for DPOD installation should be met, as described in Hardware and Software Requirements and Prepare Pre-Installed Operating System (including the 3 disk drives).
As described in the prerequisites section, the federated cell member should have two network interfaces. When installing DPOD, the user is prompted to choose the IP address for the Web Console - this should be the IP address of the external network interface (although the FCM does not run the Web Console service).
/app/scripts/tune-os-parameters.sh
Reboot the server for the performance optimizations to take effect.
The cell member is usually a "bare metal" server with NVMe disks to maximize server throughput.
Each of the Store's logical nodes (services) will be bound to specific physical processors, disks and memory using NUMA (Non-Uniform Memory Access) technology.
The default cell member configuration assumes 6 NVMe disks which will serve 3 Store logical nodes (2 disks per node).
The following OS mount points should be configured by the user before federating the DPOD cell member into the cell environment.
We highly recommend using LVM (Logical Volume Manager) to allow flexible storage for future needs.
Empty cells in the following table should be completed by the user, based on their specific hardware:
Store Node | Mount Point Path | Disk Bay | PCI Slot Number | Disk Serial | Disk OS Path | NUMA node (CPU #) |
---|---|---|---|---|---|---|
2 | /data2 | | | | | |
2 | /data22 | | | | | |
3 | /data3 | | | | | |
3 | /data33 | | | | | |
4 | /data4 | | | | | |
4 | /data44 | | | | | |
In order to identify the disk OS path (e.g. /dev/nvme0n1), the disk serial and the disk NUMA node, use the following commands:
Identify all NVMe Disks installed on the server
lspci -nn | grep NVM

Expected output:

5d:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
5e:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
ad:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
ae:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
c5:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
c6:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
Locate disk's NUMA node
Use the disk PCI slot listed in the previous command's output to identify the NUMA node (the first disk's PCI slot is 5d:00.0).
lspci -s 5d:00.0 -v

Expected output:

5d:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe P4500 (prog-if 02 [NVM Express])
        Subsystem: Lenovo Device 4712
        Physical Slot: 70
        Flags: bus master, fast devsel, latency 0, IRQ 93, NUMA node 1
        Memory at e1310000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at e1300000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI-X: Enable+ Count=129 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [180] Power Budgeting <?>
        Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [270] Device Serial Number 55-cd-2e-41-4f-89-0f-43
        Capabilities: [2a0] #19
        Capabilities: [2d0] Latency Tolerance Reporting
        Capabilities: [310] L1 PM Substates
        Kernel driver in use: nvme
        Kernel modules: nvme
From the Flags line of the command output we can identify the NUMA node (Flags: bus master, fast devsel, latency 0, IRQ 93, NUMA node 1).
Identify the NVMe disk's path
Use the disk PCI slot listed in the previous command's output to identify the disk's block device path.
ls -la /sys/dev/block | grep 5d:00.0

Expected output:

lrwxrwxrwx. 1 root root 0 Nov  5 08:06 259:4 -> ../../devices/pci0000:58/0000:58:00.0/0000:59:00.0/0000:5a:02.0/0000:5d:00.0/nvme/nvme0/nvme0n1
Use the last part of the device path (nvme0n1) as input for the following command:

nvme list | grep nvme0n1

Expected output:

/dev/nvme0n1 PHLE822101AN3P2EGN SSDPE2KE032T7L 1 3.20 TB / 3.20 TB 512 B + 0 B QDV1LV45
The disk's path is /dev/nvme0n1
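The steps above can be repeated per disk, or gathered with a short helper script. The following is a minimal sketch (not part of DPOD) that lists each NVMe controller's PCI slot, block device path, serial and NUMA node, assuming lspci, nvme-cli and standard sysfs paths are available:

#!/bin/bash
# Hedged sketch: summarize PCI slot, block device, serial and NUMA node for every NVMe disk.
for pci in $(lspci -nn | grep -i 'Non-Volatile memory controller' | awk '{print $1}'); do
    dev=$(ls -la /sys/dev/block | grep "$pci" | awk -F'/' '{print $NF}')   # e.g. nvme0n1
    numa=$(cat "/sys/bus/pci/devices/0000:$pci/numa_node")                 # NUMA node of the controller
    serial=$(nvme list | grep "$dev" | awk '{print $2}')                   # serial column of 'nvme list'
    echo "PCI $pci  device /dev/$dev  serial $serial  NUMA node $numa"
done

Use the output to fill in the remaining columns of the mount point table.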
The following is an example of a completed table:

Store Node | Mount Point Path | Disk Bay | PCI Slot Number | Disk Serial | Disk OS Path | NUMA node (CPU #) |
---|---|---|---|---|---|---|
2 | /data2 | 1 | 2 | PHLE822101AN3PXXXX | /dev/nvme0n1 | 1 |
2 | /data22 | 2 | | | /dev/nvme1n1 | 1 |
3 | /data3 | 4 | | | /dev/nvme2n1 | 2 |
3 | /data33 | 5 | | | /dev/nvme3n1 | 2 |
4 | /data4 | 12 | | | /dev/nvme4n1 | 3 |
4 | /data44 | 13 | | | /dev/nvme5n1 | 3 |
Create the LVM physical volume, volume group, logical volume and XFS file system for each disk. For example, for the two disks of Store node 2:

pvcreate -ff /dev/nvme0n1
vgcreate vg_data2 /dev/nvme0n1
lvcreate -l 100%FREE -n lv_data vg_data2
mkfs.xfs -f /dev/vg_data2/lv_data

pvcreate -ff /dev/nvme1n1
vgcreate vg_data22 /dev/nvme1n1
lvcreate -l 100%FREE -n lv_data vg_data22
mkfs.xfs /dev/vg_data22/lv_data
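The same pattern repeats for the remaining disks. The following is a minimal sketch that runs the commands for all six disks in a loop; the device-to-volume-group mapping is taken from the example table above and must be adjusted to match your own completed table:

#!/bin/bash
# Hedged sketch: create PV, VG, LV and XFS file system for each data disk.
# The mapping below is an example - adjust it to match your completed table.
declare -A disk_to_vg=(
    [/dev/nvme0n1]=vg_data2   [/dev/nvme1n1]=vg_data22
    [/dev/nvme2n1]=vg_data3   [/dev/nvme3n1]=vg_data33
    [/dev/nvme4n1]=vg_data4   [/dev/nvme5n1]=vg_data44
)
for dev in "${!disk_to_vg[@]}"; do
    vg=${disk_to_vg[$dev]}
    pvcreate -ff "$dev"
    vgcreate "$vg" "$dev"
    lvcreate -l 100%FREE -n lv_data "$vg"
    mkfs.xfs -f "/dev/$vg/lv_data"
done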
Add the new file systems to the /etc/fstab file:
/dev/vg_data2/lv_data    /data2    xfs    defaults    0 0
/dev/vg_data22/lv_data   /data22   xfs    defaults    0 0
/dev/vg_data3/lv_data    /data3    xfs    defaults    0 0
/dev/vg_data33/lv_data   /data33   xfs    defaults    0 0
/dev/vg_data4/lv_data    /data4    xfs    defaults    0 0
/dev/vg_data44/lv_data   /data44   xfs    defaults    0 0
Create directories for the new data mount points
mkdir -p /data2 /data22 /data3 /data33 /data4 /data44
This example does not include other mount points needed, as described in Hardware and Software Requirements.
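Mount the new file systems so they can be verified. A minimal sketch (mount -a mounts everything listed in /etc/fstab):

# Mount all file systems defined in /etc/fstab
mount -a

Then verify that each logical volume is mounted on its expected mount point: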
# lsblk
NAME                     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1                  259:0    0  2.9T  0 disk
└─vg_data2-lv_data       253:6    0  2.9T  0 lvm  /data2
nvme1n1                  259:5    0  2.9T  0 disk
└─vg_data22-lv_data      253:3    0  2.9T  0 lvm  /data22
nvme2n1                  259:1    0  2.9T  0 disk
└─vg_data3-lv_data       253:2    0  2.9T  0 lvm  /data3
nvme3n1                  259:2    0  2.9T  0 disk
└─vg_data33-lv_data      253:5    0  2.9T  0 lvm  /data33
nvme4n1                  259:4    0  2.9T  0 disk
└─vg_data44-lv_data      253:7    0  2.9T  0 lvm  /data44
nvme5n1                  259:3    0  2.9T  0 disk
└─vg_data4-lv_data       253:8    0  2.9T  0 lvm  /data4
Install the NUMA control utility (numactl):

yum install numactl
Most Linux-based operating systems use a local firewall service (e.g. iptables / firewalld).
Since the OS of a Non-Appliance Mode DPOD installation is provided by the user, it is the user's responsibility to allow the needed connectivity to and from the server.
Make sure the connectivity detailed in the Network Ports Table is allowed by the OS local firewall service.
When using the DPOD Appliance mode installation for the cell manager, the local OS firewall service is handled by the cell member federation script.
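For example, on a cell member whose OS uses firewalld, the inbound ports could be opened with commands similar to the following. This is a minimal sketch only; the exact port list must match the Network Ports Table and your agent configuration:

# Hedged firewalld sketch - adjust the ports to your environment
firewall-cmd --permanent --add-port=22/tcp           # SSH (admins and cell manager)
firewall-cmd --permanent --add-port=443/tcp          # HTTP/S data + management
firewall-cmd --permanent --add-port=9300-9305/tcp    # ElasticSearch communication
firewall-cmd --permanent --add-port=60000-60003/tcp  # SYSLOG data from monitored devices
firewall-cmd --permanent --add-port=60020-60023/tcp  # WS-M payloads from monitored devices
firewall-cmd --reload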
In order to federate and configure the cell member, run the following script on the cell manager once per cell member.
For instance, to federate two cell members, the script should be run twice (on the cell manager): first with the IP address of the first cell member, and then with the IP address of the second cell member.
Important: The script should be executed using the OS root user.
/app/scripts/configure_cell_manager.sh -a <internal IP address of the cell member> -g <external IP address of the cell member>

For example:

/app/scripts/configure_cell_manager.sh -a 172.18.100.34 -g 172.17.100.33
Example of a successful execution:

/app/scripts/configure_cell_manager.sh -a 172.18.100.36 -g 172.17.100.35
2018-10-22_16-13-16 INFO Cell Configuration
2018-10-22_16-13-16 INFO ===============================
2018-10-22_16-13-18 INFO
2018-10-22_16-13-18 INFO Log file is : /installs/logs/cell_manager_configuration-2018-10-22_16-13-16.log
2018-10-22_16-13-18 INFO
2018-10-22_16-13-18 INFO Adding new cell member with the following configuration :
2018-10-22_16-13-18 INFO Cell member internal address 172.18.100.36
2018-10-22_16-13-18 INFO Cell member external address 172.17.100.35
2018-10-22_16-13-18 INFO Syslog agents using TCP ports starting with 60000
2018-10-22_16-13-18 INFO Wsm agents using TCP ports starting with 60020
2018-10-22_16-13-18 INFO
2018-10-22_16-13-18 INFO During the configuration process the system will be shut down, which means that new data will not be collected and the Web Console will be unavailable for users.
2018-10-22_16-13-18 INFO Please make sure the required network connectivity (e.g. firewall rules) is available between all cell components (manager and members) according to the documentation.
2018-10-22_16-13-18 INFO
2018-10-22_16-13-20 INFO Please choose the IP address for the cell manager server internal address followed by [ENTER]:
2018-10-22_16-13-20 INFO 1.) 172.18.100.32
2018-10-22_16-13-20 INFO 2.) 172.17.100.31
1
2018-10-22_16-14-30 INFO Stopping application ...
2018-10-22_16-15-16 INFO Application stopped successfully.
root@172.18.100.36's password:
2018-10-22_16-21-41 INFO Cell member configuration ended successfully.
2018-10-22_16-21-45 INFO Stopping application ...
2018-10-22_16-22-31 INFO Application stopped successfully.
2018-10-22_16-22-31 INFO Starting application ...
Note that the script writes two log files, one on the cell manager and one on the cell member. The log file names are mentioned in the script's output.
Example of a failed execution:

/app/scripts/configure_cell_manager.sh -a 172.18.100.36 -g 172.17.100.35
2018-10-22_16-05-03 INFO Cell Configuration
2018-10-22_16-05-03 INFO ===============================
2018-10-22_16-05-05 INFO
2018-10-22_16-05-05 INFO Log file is : /installs/logs/cell_manager_configuration-2018-10-22_16-05-03.log
2018-10-22_16-05-05 INFO
2018-10-22_16-05-05 INFO Adding new cell member with the following configuration :
2018-10-22_16-05-05 INFO Cell member internal address 172.18.100.36
2018-10-22_16-05-05 INFO Cell member external address 172.17.100.35
2018-10-22_16-05-05 INFO Syslog agents using TCP ports starting with 60000
2018-10-22_16-05-05 INFO Wsm agents using TCP ports starting with 60020
2018-10-22_16-05-05 INFO
2018-10-22_16-05-05 INFO During the configuration process the system will be shut down, which means that new data will not be collected and the Web Console will be unavailable for users.
2018-10-22_16-05-05 INFO Please make sure the required network connectivity (e.g. firewall rules) is available between all cell components (manager and members) according to the documentation.
2018-10-22_16-05-05 INFO
2018-10-22_16-05-06 INFO Please choose the IP address for the cell manager server internal address followed by [ENTER]:
2018-10-22_16-05-06 INFO 1.) 172.18.100.32
2018-10-22_16-05-06 INFO 2.) 172.17.100.31
1
2018-10-22_16-05-09 INFO Stopping application ...
2018-10-22_16-05-58 INFO Application stopped successfully.
root@172.18.100.36's password:
2018-10-22_16-06-46 ERROR Starting rollback
2018-10-22_16-06-49 WARN Issues found that may need attention !!
2018-10-22_16-06-49 INFO Stopping application ...
2018-10-22_16-07-36 INFO Application stopped successfully.
2018-10-22_16-07-36 INFO Starting application ...
In case of a failure, the script will try to roll back the configuration changes it made, so the problem can be fixed before the script is rerun.
The DPOD cell member uses NUMA (Non-Uniform Memory Access) technology. The default cell member configuration binds DPOD's agents to CPU 0 and the Store's nodes to CPU 1.
If the server has 4 CPUs, the user should edit the service files of Store nodes 2 and 3 and change the CPU binding to 2 and 3 respectively.
To identify the number of CPUs installed on the server, use the NUMA utility:
numactl -s

Example output for a 4 CPU server:

policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12
cpubind: 0 1 2 3
nodebind: 0 1 2 3
membind: 0 1 2 3
The service files are located in the directory /etc/init.d/ with the names MonTier-es-raw-trans-Node-2 and MonTier-es-raw-trans-Node-3.
For node MonTier-es-raw-trans-Node-2:

OLD VALUE: numa="/usr/bin/numactl --membind=1 --cpunodebind=1"
NEW VALUE: numa="/usr/bin/numactl --membind=2 --cpunodebind=2"

For node MonTier-es-raw-trans-Node-3:

OLD VALUE: numa="/usr/bin/numactl --membind=1 --cpunodebind=1"
NEW VALUE: numa="/usr/bin/numactl --membind=3 --cpunodebind=3"
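The change can be made with any text editor, or with a sed one-liner similar to the following. This is a hedged sketch that assumes the numa= line appears exactly as shown above; back up the files before editing:

# Back up the service files, then update the NUMA binding in place
cp /etc/init.d/MonTier-es-raw-trans-Node-2 /etc/init.d/MonTier-es-raw-trans-Node-2.bak
cp /etc/init.d/MonTier-es-raw-trans-Node-3 /etc/init.d/MonTier-es-raw-trans-Node-3.bak
sed -i 's/--membind=1 --cpunodebind=1/--membind=2 --cpunodebind=2/' /etc/init.d/MonTier-es-raw-trans-Node-2
sed -i 's/--membind=1 --cpunodebind=1/--membind=3 --cpunodebind=3/' /etc/init.d/MonTier-es-raw-trans-Node-3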
After a successful execution, you will be able to see the new federated cell member in the Manage → System → Nodes page.
For example, after federating a cell member, the page should look as follows:
Also, the new agents will be shown in the agents list in the Manage → Internal Health → Agents page.
For example, if the cell manager has two agents and there is a federated cell member with additional four agents, the page will show six agents:
It is possible to assign an entire monitored device, or just a specific domain, to the federated cell member's agents.
To configure a monitored device or a specific domain, follow the instructions in Adding Monitored Devices.