...
- The DPOD cell manager and the federated cell members must run the same DPOD version (the minimum version is v1.0.8.5).
- The DPOD cell manager can be installed in either Appliance Mode or Non-Appliance Mode with the Medium Load architecture type, as detailed in the Hardware and Software Requirements. The manager server can be either virtual or physical.
- The DPOD federated cell member (FCM) should be installed in Non-Appliance Mode with the High_20dv (High Load) architecture type, as detailed in the Hardware and Software Requirements.
- Each cell component (manager / FCM) should have two network interfaces:
  - External interface: for DPOD users to access the Web Console, and for communication between DPOD and the monitored gateways.
  - Internal interface: for communication between internal DPOD components (should be a 10Gb Ethernet interface).
- Network ports should be opened in the network firewall as detailed below:
...
- The following software packages (RPMs) should be installed: iptables, iptables-services, numactl, bc
- The following software packages (RPMs) are recommended for system maintenance and troubleshooting, but are not required: telnet client, net-tools, iftop, tcpdump, bc, pciutils
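Before you continue, you can check which of the required RPMs are still missing on a cell member by querying each one with rpm -q. The sketch below is a hypothetical helper (missing_rpms is not part of DPOD). It prints only the names of the packages that rpm -q does not report as installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): print each package name from the
# argument list that is not installed according to `rpm -q`.
missing_rpms() {
  local pkg
  for pkg in "$@"; do
    # rpm -q exits non-zero when the package is not installed
    rpm -q "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
  done
}
```

For example, `missing_rpms iptables iptables-services numactl bc` lists only the packages that still need to be installed with yum install.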
Installation
DPOD Installation
...
Empty cells in the following table should be completed by the user, based on their specific hardware:
Store Node | Mount Point Path | Disk Bay | PCI Slot Number | Disk Serial | Disk OS Path | NUMA node (CPU #) |
---|---|---|---|---|---|---|
2 | /data2 | | | | | |
2 | /data22 | | | | | |
3 | /data3 | | | | | |
3 | /data33 | | | | | |
4 | /data4 | | | | | |
4 | /data44 | | | | | |
How to Identify Disk OS Path and Disk Serial
- To identify which of the server's NVMe disk bays is bound to which CPU, use the hardware manufacturer's documentation.
  Also, write down each disk's serial number by visually inspecting the disk. To identify the disk OS path (e.g. /dev/nvme0n1) and the disk serial, install the NVMe disk utility software provided by the hardware supplier. For example, for Intel-based NVMe SSD disks, install "Intel® SSD Data Center Tool" (isdct).
To identify the disk OS path (e.g. /dev/nvme0n1), disk serial and disk NUMA node, use the following commands.
Identify all NVMe disks installed on the server
```
lspci -nn | grep NVM
```
Expected output:
```
5d:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
5e:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
ad:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
ae:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
c5:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
c6:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
```
Locate disk's NUMA node
Use the disk PCI slot listed in the previous command's output to identify the NUMA node (the first disk's PCI slot is 5d:00.0):
```
lspci -s 5d:00.0 -v
```
Expected output:
```
5d:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe P4500 (prog-if 02 [NVM Express])
    Subsystem: Lenovo Device 4712
    Physical Slot: 70
    Flags: bus master, fast devsel, latency 0, IRQ 93, NUMA node 1
    Memory at e1310000 (64-bit, non-prefetchable) [size=16K]
    Expansion ROM at e1300000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI-X: Enable+ Count=129 Masked-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [150] Virtual Channel
    Capabilities: [180] Power Budgeting <?>
    Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [270] Device Serial Number 55-cd-2e-41-4f-89-0f-43
    Capabilities: [2a0] #19
    Capabilities: [2d0] Latency Tolerance Reporting
    Capabilities: [310] L1 PM Substates
    Kernel driver in use: nvme
    Kernel modules: nvme
```
The NUMA node appears in the Flags line of the command output (Flags: bus master, fast devsel, latency 0, IRQ 93, NUMA node 1). In this example, the disk is bound to NUMA node 1.
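You can run the two steps above for every NVMe controller in one pass. The following is a minimal sketch that assumes two things: `lspci -nn | grep NVM` lists only the NVMe disks (as in the example output), and `lspci -v` reports a "NUMA node" value in the Flags line. The function name nvme_numa_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): print "<PCI slot><TAB><NUMA node>"
# for every NVMe controller reported by lspci.
nvme_numa_nodes() {
  local slot node
  # The first field of each `lspci -nn` line is the PCI slot, e.g. 5d:00.0
  for slot in $(lspci -nn | grep NVM | awk '{print $1}'); do
    # Extract the number from "... NUMA node 1" in the verbose output
    node=$(lspci -s "$slot" -v | sed -nE 's/.*NUMA node ([0-9]+).*/\1/p')
    printf '%s\t%s\n' "$slot" "${node:-unknown}"
  done
}
```

The helper prints "unknown" when lspci shows no NUMA node for a slot. This usually happens on single-socket servers, where the kernel does not report NUMA affinity.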
Identify the NVMe disk path
Use the disk's PCI slot to identify its block device path:
```
ls -la /sys/dev/block | grep 5d:00.0
```
Expected output:
```
lrwxrwxrwx. 1 root root 0 Nov 5 08:06 259:4 -> ../../devices/pci0000:58/0000:58:00.0/0000:59:00.0/0000:5a:02.0/0000:5d:00.0/nvme/nvme0/nvme0n1
```
Use the last part of the device path (nvme0n1) as input for the following command (the nvme utility is provided by the nvme-cli package):
```
nvme list | grep nvme0n1
```
Expected output:
```
/dev/nvme0n1 PHLE822101AN3P2EGN SSDPE2KE032T7L 1 3.20 TB / 3.20 TB 512 B + 0 B QDV1LV45
```
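In this example, the disk's OS path is /dev/nvme0n1 and its serial number (the second column of the nvme list output) is PHLE822101AN3P2EGN.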
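You can also read the mapping from PCI slot to disk OS path and serial number directly from sysfs, which avoids parsing ls -la output. This sketch rests on assumptions. It assumes the kernel exposes each NVMe controller under /sys/bus/pci/devices/<domain:slot>/nvme/nvmeX/, with a serial attribute and namespace directories named nvmeXnY (as in the device path shown above). With native NVMe multipath enabled, namespace directories can be named differently (e.g. nvme0c0n1), so verify the result against nvme list. The function name nvme_disk_for_slot is hypothetical. Its optional second argument exists only so it can be pointed at a copy of the sysfs tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): for a PCI slot such as 5d:00.0,
# print "<disk OS path><TAB><controller serial>" for each namespace found.
nvme_disk_for_slot() {
  local slot="$1" sysfs_root="${2:-/sys}"
  local ctrl_dir ns_dir serial
  # PCI device directories are named <domain>:<slot>, e.g. 0000:5d:00.0
  for ctrl_dir in "$sysfs_root"/bus/pci/devices/*:"$slot"/nvme/nvme*; do
    [ -d "$ctrl_dir" ] || continue            # glob matched nothing
    # The serial attribute is space-padded; trim leading/trailing whitespace
    serial=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$ctrl_dir/serial" 2>/dev/null)
    for ns_dir in "$ctrl_dir"/nvme*n*; do
      [ -d "$ns_dir" ] || continue
      printf '/dev/%s\t%s\n' "$(basename "$ns_dir")" "$serial"
    done
  done
}
```

For example, `nvme_disk_for_slot 5d:00.0` should print the same disk OS path and serial number as the ls and nvme list steps above.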
- Correlate the disk bay number and the visually identified disk serial number with the output of the disk tool (or of the commands above) to identify each disk's OS path.
Example for Mount Points and Disk Configurations
Store Node | Mount Point Path | Disk Bay | PCI Slot Number | Disk Serial | Disk OS Path | NUMA node (CPU #) |
---|---|---|---|---|---|---|
2 | /data2 | 1 | 2 | PHLE822101AN3PXXXX | /dev/nvme0n1 | 1 |
2 | /data22 | 2 | | | /dev/nvme1n1 | 1 |
3 | /data3 | 4 | | | /dev/nvme2n1 | 2 |
3 | /data33 | 5 | | | /dev/nvme3n1 | 2 |
4 | /data4 | 12 | | | /dev/nvme4n1 | 3 |
4 | /data44 | 13 | | | /dev/nvme5n1 | 3 |
Example for LVM Configuration
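```
# Store node 2, first disk: /data2 on /dev/nvme0n1
pvcreate -ff /dev/nvme0n1
vgcreate vg_data2 /dev/nvme0n1
lvcreate -l 100%FREE -n lv_data vg_data2
mkfs.xfs -f /dev/vg_data2/lv_data

# Store node 2, second disk: /data22 on /dev/nvme1n1
pvcreate -ff /dev/nvme1n1
vgcreate vg_data22 /dev/nvme1n1
lvcreate -l 100%FREE -n lv_data vg_data22
mkfs.xfs -f /dev/vg_data22/lv_data

# Repeat the same four commands for the remaining disks (vg_data3, vg_data33, vg_data4, vg_data44)
```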
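The LVM example above covers only the two disks of Store node 2, and the same four commands are repeated for every disk. The sketch below uses a hypothetical helper, print_lvm_commands, which is not part of DPOD. It reads "<mount point> <disk OS path>" pairs from standard input and prints the commands for each pair without running them. It follows the vg_<mount point name> / lv_data naming of the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): read "<mount point> <disk OS path>"
# pairs from stdin and PRINT (not run) the LVM and XFS commands for each disk,
# following the vg_<name>/lv_data naming used in the example above.
print_lvm_commands() {
  local mount_point device vg
  while read -r mount_point device; do
    # Skip blank lines and comment lines
    case "$mount_point" in ''|'#'*) continue ;; esac
    vg="vg_${mount_point#/}"            # /data22 -> vg_data22
    printf 'pvcreate -ff %s\n' "$device"
    printf 'vgcreate %s %s\n' "$vg" "$device"
    printf 'lvcreate -l 100%%FREE -n lv_data %s\n' "$vg"
    printf 'mkfs.xfs -f /dev/%s/lv_data\n' "$vg"
  done
}
```

Feed it the Mount Point Path and Disk OS Path columns of your completed table, for example `printf '%s\n' '/data2 /dev/nvme0n1' '/data22 /dev/nvme1n1' | print_lvm_commands`. Review the printed commands, and only then run them (for instance by piping the reviewed output to bash). The commands destroy any existing data on the listed disks, which is why the helper only prints them.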
/etc/fstab file:
```
/dev/vg_data2/lv_data    /data2    xfs  defaults  0 0
/dev/vg_data22/lv_data   /data22   xfs  defaults  0 0
/dev/vg_data3/lv_data    /data3    xfs  defaults  0 0
/dev/vg_data33/lv_data   /data33   xfs  defaults  0 0
/dev/vg_data4/lv_data    /data4    xfs  defaults  0 0
/dev/vg_data44/lv_data   /data44   xfs  defaults  0 0
```
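Each mount point should appear in /etc/fstab only once. Below is a minimal sketch for catching duplicated entries before you mount the file systems. It uses a hypothetical helper, fstab_duplicate_mounts, which skips comment lines and compares the second field (the mount point) of every entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): print every mount point that appears
# more than once in an fstab-format file (default: /etc/fstab).
fstab_duplicate_mounts() {
  local fstab="${1:-/etc/fstab}"
  awk '
    /^[ \t]*#/ { next }                 # skip comment lines
    NF >= 2 { count[$2]++ }             # field 2 is the mount point
    END { for (mp in count) if (count[mp] > 1) print mp }
  ' "$fstab"
}
```

Empty output means no mount point is listed twice.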
Example for the Final Configuration for 3 Store Nodes
Note |
---|
This example does not include the other required mount points described in the Hardware and Software Requirements. |
```
# lsblk
NAME                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1               259:0    0  2.9T  0 disk
└─vg_data2-lv_data    253:6    0  2.9T  0 lvm  /data2
nvme1n1               259:5    0  2.9T  0 disk
└─vg_data22-lv_data   253:3    0  2.9T  0 lvm  /data22
nvme2n1               259:1    0  2.9T  0 disk
└─vg_data3-lv_data    253:2    0  2.9T  0 lvm  /data3
nvme3n1               259:2    0  2.9T  0 disk
└─vg_data33-lv_data   253:5    0  2.9T  0 lvm  /data33
nvme4n1               259:4    0  2.9T  0 disk
└─vg_data44-lv_data   253:7    0  2.9T  0 lvm  /data44
nvme5n1               259:3    0  2.9T  0 disk
└─vg_data4-lv_data    253:8    0  2.9T  0 lvm  /data4
```
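To confirm that every Store mount point is actually mounted with XFS (the file system created in the LVM example), you can use findmnt from util-linux. The sketch below assumes this; check_store_mounts is a hypothetical name. The list of mount points passed to it should come from your own table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): report whether each given mount point
# is currently mounted with an xfs file system. Returns non-zero if any is not.
check_store_mounts() {
  local mp fstype rc=0
  for mp in "$@"; do
    # -n: no header line, -o FSTYPE: print only the file system type;
    # nothing is printed when $mp is not a mount point
    fstype=$(findmnt -n -o FSTYPE "$mp" 2>/dev/null)
    if [ "$fstype" = "xfs" ]; then
      printf '%s\tOK\n' "$mp"
    else
      printf '%s\tNOT MOUNTED AS XFS (%s)\n' "$mp" "${fstype:-not mounted}"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `check_store_mounts /data2 /data22 /data3 /data33 /data4 /data44`.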
Preparing Local OS Based Firewall
Most Linux-based operating systems use a local firewall service (e.g. iptables or firewalld).
Since the OS of a Non-Appliance Mode DPOD installation is provided by the user, it is the user's responsibility to allow the required connectivity to and from the server.
Make sure that the connectivity detailed in the Network Ports Table is allowed by the OS local firewall service.
Note |
---|
When the cell manager is installed in DPOD Appliance Mode, the local OS-based firewall service is handled by the cell member federation script. |
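If the server uses iptables (iptables and iptables-services are among the required RPMs), you can add rules with iptables -I INPUT ... -j ACCEPT and persist them with service iptables save. The sketch below only prints those commands for the TCP ports or port ranges you pass to it. print_iptables_rules is hypothetical, and the port values in the usage example are placeholders, so take the actual ports from the Network Ports Table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): print iptables commands that accept
# inbound TCP traffic on each given port or "FIRST-LAST" port range.
print_iptables_rules() {
  local ports
  for ports in "$@"; do
    # iptables writes port ranges as FIRST:LAST
    printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "${ports/-/:}"
  done
  # With iptables-services installed, this saves the running rules to
  # /etc/sysconfig/iptables so they survive a reboot
  printf 'service iptables save\n'
}
```

For example, `print_iptables_rules 60000-60003` prints a rule for the placeholder range 60000:60003, followed by the save command. On servers that use firewalld instead, the equivalent is `firewall-cmd --permanent --add-port=60000-60003/tcp` followed by `firewall-cmd --reload`.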
Cell Member Federation
To federate and configure a cell member, run the following script on the cell manager, once per cell member.
For example, to federate two cell members, run the script twice on the cell manager: the first time with the IP addresses of the first cell member, and the second time with the IP addresses of the second cell member.
Important: The script should be executed as the OS root user. During the configuration process the application is stopped, so new data is not collected and the Web Console is unavailable to users.
```
/app/scripts/configure_cell_manager.sh -a <internal IP address of the cell member> -g <external IP address of the cell member>
```
For example:
```
/app/scripts/configure_cell_manager.sh -a 172.18.100.34 -g 172.17.100.33
```
Example for a Successful Execution
```
/app/scripts/configure_cell_manager.sh -a 172.18.100.34 -g 172.17.100.33
2018-10-01_00-31-56 INFO Cell Configuration
2018-10-01_00-31-56 INFO ===============================
2018-10-01_00-31-58 INFO
2018-10-01_00-31-58 INFO Log file is : /installs/logs/cell_manager_configuration-2018-10-01_00-31-56.log
2018-10-01_00-31-58 INFO
2018-10-01_00-31-58 INFO Adding new cell member with the following configuration :
2018-10-01_00-31-58 INFO Cell member internal address 172.18.100.34
2018-10-01_00-31-58 INFO Cell member external address 172.17.100.33
2018-10-01_00-31-58 INFO Syslog agents using TCP ports starting with 60000
2018-10-01_00-31-58 INFO Wsm agents using TCP ports starting with 60020
2018-10-01_00-31-59 INFO
2018-10-01_00-31-59 INFO Please choose the IP address for the cell manager server internal address followed by [ENTER]:
2018-10-01_00-31-59 INFO 1.) 172.18.100.32
2018-10-01_00-31-59 INFO 2.) 172.17.100.31
1
2018-10-01_00-32-31 INFO Stopping application ...
2018-10-01_00-33-22 INFO Application stopped successfully.
root@172.18.100.34's password:
2018-10-01_00-37-24 INFO Cell member configuration ended successfully.
2018-10-01_00-37-29 INFO Stopping application ...
2018-10-01_00-38-17 INFO Application stopped successfully.
2018-10-01_00-38-17 INFO Starting application ...
2018-10-01_00-40-14 INFO Application started successfully.
```
Note that the script writes two log files: one on the cell manager and one on the cell member. The log file names are shown in the script's output.
Example for a Failed Execution
```
/app/scripts/configure_cell_manager.sh -a 172.18.100.34 -g 172.17.100.33
2018-10-01_00-11-43 INFO Cell Configuration
2018-10-01_00-11-43 INFO ===============================
2018-10-01_00-11-45 INFO
2018-10-01_00-11-45 INFO Log file is : /installs/logs/cell_manager_configuration-2018-10-01_00-11-43.log
2018-10-01_00-11-45 INFO
2018-10-01_00-11-45 INFO Adding new cell member with the following configuration :
2018-10-01_00-11-45 INFO Cell member internal address 172.18.100.34
2018-10-01_00-11-45 INFO Cell member external address 172.17.100.33
2018-10-01_00-11-45 INFO Syslog agents using TCP ports starting with 60000
2018-10-01_00-11-45 INFO Wsm agents using TCP ports starting with 60020
2018-10-01_00-11-45 INFO
2018-10-01_00-11-45 INFO Please choose the IP address for the cell manager server internal address followed by [ENTER]:
2018-10-01_00-11-46 INFO 1.) 172.18.100.32
2018-10-01_00-11-46 INFO 2.) 172.17.100.31
1
2018-10-01_00-12-17 INFO Stopping application ...
2018-10-01_00-13-09 INFO Application stopped successfully.
root@172.18.100.34's password:
2018-10-01_00-14-15 ERROR Starting rollback
2018-10-01_00-14-19 WARN Issues found that may need attention !!
2018-10-01_00-14-20 INFO Starting application ...
2018-10-01_00-17-36 INFO Application started successfully.
```
If the script fails, it tries to roll back the configuration changes it made, so that the problem can be fixed before the script is run again.
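To investigate a failed run, start with the log file named in the script's output (for example /installs/logs/cell_manager_configuration-2018-10-01_00-11-43.log). The sketch below uses a hypothetical helper, show_cell_log_issues. It picks the newest cell_manager_configuration-*.log in the log directory and prints its ERROR and WARN lines. It assumes the log file uses the same "<timestamp> <LEVEL> <message>" line format as the console output shown above. That is an assumption, and the log may contain more detail than this filter shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): print the newest cell manager
# configuration log in LOG_DIR, followed by its ERROR and WARN lines.
show_cell_log_issues() {
  local log_dir="${1:-/installs/logs}"
  local logs=("$log_dir"/cell_manager_configuration-*.log)
  if [ ! -e "${logs[0]}" ]; then
    echo "no cell_manager_configuration-*.log file in $log_dir" >&2
    return 1
  fi
  # File names embed a YYYY-MM-DD_HH-MM-SS timestamp and the glob expands
  # in sorted order, so the last match is the newest run
  local latest="${logs[${#logs[@]}-1]}"
  printf '%s\n' "$latest"
  grep -E ' (ERROR|WARN) ' "$latest"
}
```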
Install NUMA Software
```
yum install numactl
```
Cell Member Federation Post Steps
NUMA Configuration
The DPOD cell member uses NUMA (Non-Uniform Memory Access). The default cell member configuration binds DPOD's agents to CPU 0 and the Store nodes to CPU 1.
If the server has 4 CPUs, edit the service files of Store nodes 2 and 3 and change their bind CPU to 2 and 3, respectively.
Identifying NUMA Configuration
To identify the number of CPUs installed on the server, use the NUMA utility:
```
numactl -s
```
Example output for a server with 4 CPUs:
```
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12
cpubind: 0 1 2 3
nodebind: 0 1 2 3
membind: 0 1 2 3
```
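To turn the numactl -s output into a single number, count the node IDs on its nodebind line. A result of 4 corresponds to the 4-CPU case described above. This is a minimal sketch, and numa_node_count is a hypothetical name. numactl -s reports the policy of the current shell, so a shell that is already restricted to some nodes would report fewer. numactl --hardware, whose first line reads "available: N nodes", reflects the hardware instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): print the number of NUMA nodes
# listed on the "nodebind:" line of `numactl -s`.
numa_node_count() {
  # "nodebind: 0 1 2 3" has one label field plus one field per node
  numactl -s | awk '/^nodebind:/ { print NF - 1 }'
}
```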
Alter Syslog Agents
The service files are located in the /etc/init.d/ directory and their names start with MonTier-SyslogAgent- (there should be 4 service files).
In each service file, search for the string "numa" and make sure the numa variable is defined as follows:
```
numa="/usr/bin/numactl --membind=0 --cpunodebind=0"
/bin/su -s /bin/bash -c "/bin/bash -c 'echo \$\$ >${FLUME_PID_FILE} && exec ${numa} ${exec}......
```
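To check all syslog agent service files at once, compare each file's first numa= line with the expected definition above. The sketch below does this. check_syslog_agent_numa is a hypothetical helper, and its directory argument defaults to /etc/init.d.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD): verify that every
# MonTier-SyslogAgent-* service file binds to NUMA node 0.
check_syslog_agent_numa() {
  local dir="${1:-/etc/init.d}" f line rc=0
  local expected='numa="/usr/bin/numactl --membind=0 --cpunodebind=0"'
  for f in "$dir"/MonTier-SyslogAgent-*; do
    [ -f "$f" ] || continue
    # First line that defines the numa variable, surrounding spaces trimmed
    line=$(grep -m1 -E '^[[:space:]]*numa=' "$f" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    if [ "$line" = "$expected" ]; then
      printf '%s\tOK\n' "$(basename "$f")"
    else
      printf '%s\tCHECK: %s\n' "$(basename "$f")" "${line:-no numa= line found}"
      rc=1
    fi
  done
  return "$rc"
}
```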
Alter Store Nodes 2 and 3 (Optional: only if the server has 4 CPUs)
The service files are located in the /etc/init.d/ directory and are named MonTier-es-raw-trans-Node-2 and MonTier-es-raw-trans-Node-3.
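The following is a sketch of the edit. It assumes, without confirmation from this section, that the Store node service files bind with "--membind=<n> --cpunodebind=<n>" in the same way as the syslog agent numa line shown earlier. Check the actual line in each file before using it. set_numa_bind is a hypothetical helper, not a DPOD tool. It rewrites both values to the requested node and keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DPOD). ASSUMPTION: the service file binds
# with "--membind=<n> --cpunodebind=<n>", like the syslog agent example.
# Rewrites both values to NODE and keeps a backup copy as FILE.bak.
set_numa_bind() {
  local file="$1" node="$2"
  case "$node" in
    ''|*[!0-9]*) echo "NUMA node must be a number" >&2; return 1 ;;
  esac
  sed -i.bak -E "s/--membind=[0-9]+ --cpunodebind=[0-9]+/--membind=${node} --cpunodebind=${node}/" "$file"
}
```

Under that assumption, the 4-CPU case would be `set_numa_bind /etc/init.d/MonTier-es-raw-trans-Node-2 2` followed by `set_numa_bind /etc/init.d/MonTier-es-raw-trans-Node-3 3`.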
...