...

  • DPOD federated cell member (FCM) should be installed in Non-appliance Mode with High_20dv architecture type, as detailed in the Hardware and Software Requirements.
  • The following software packages (RPMs) are recommended for system maintenance and troubleshooting, but are not required: telnet client, net-tools, iftop, tcpdump, bc, pciutils

Installation

DPOD Installation

...

Store Node | Mount Point Path | Disk Bay | PCI Slot Number | Disk Serial | Disk OS Path | NUMA node (CPU #)
2 | /data2 | | | | |
2 | /data22 | | | | |
3 | /data3 | | | | |
3 | /data33 | | | | |
4 | /data4 | | | | |
4 | /data44 | | | | |

...

  1. To identify which of the server's NVMe disk bays is bound to which of the CPUs, use the hardware manufacturer's documentation.
    Also, write down each disk's serial number by visually inspecting the disk.

  2. To identify the disk OS path (e.g. /dev/nvme0n1), the disk serial and the disk NUMA node, use the following commands.
    Alternatively, the disk OS path and the disk serial can be read with the NVMe disk utility software provided by the hardware supplier. For example: for Intel-based NVMe SSD disks, install "Intel® SSD Data Center Tool" (isdct).
    Example output of the Intel SSD DC tool:

      Code Block
      themeRDark
      isdct show -intelssd

      - Intel SSD DC P4500 Series PHLE822101AN3PXXXX -

      Bootloader : 0133
      DevicePath : /dev/nvme0n1
      DeviceStatus : Healthy
      Firmware : QDV1LV45
      FirmwareUpdateAvailable : Please contact your Intel representative about firmware update for this drive.
      Index : 0
      ModelNumber : SSDPE2KE032T7L
      ProductFamily : Intel SSD DC P4500 Series
      SerialNumber : PHLE822101AN3PXXXX

    1. Identify all NVMe disks installed on the server

      Code Block
      themeRDark
      lspci -nn | grep NVM

      expected output :

      5d:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
      5e:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
      ad:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
      ae:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
      c5:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]
      c6:00.0 Non-Volatile memory controller [0108]: Intel Corporation Express Flash NVMe P4500 [8086:0a54]


    2. Locate the disk's NUMA node
      Use the disk PCI slot listed by the previous command to identify the NUMA node (the first disk PCI slot is 5d:00.0)

      Code Block
      themeRDark
      linenumberstrue
      lspci  -s 5d:00.0 -v
      
      expected output :
      
      5d:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe P4500 (prog-if 02 [NVM Express])
              Subsystem: Lenovo Device 4712
              Physical Slot: 70
              Flags: bus master, fast devsel, latency 0, IRQ 93, NUMA node 1
              Memory at e1310000 (64-bit, non-prefetchable) [size=16K]
              Expansion ROM at e1300000 [disabled] [size=64K]
              Capabilities: [40] Power Management version 3
              Capabilities: [50] MSI-X: Enable+ Count=129 Masked-
              Capabilities: [60] Express Endpoint, MSI 00
              Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
              Capabilities: [100] Advanced Error Reporting
              Capabilities: [150] Virtual Channel
              Capabilities: [180] Power Budgeting <?>
              Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
              Capabilities: [270] Device Serial Number 55-cd-2e-41-4f-89-0f-43
              Capabilities: [2a0] #19
              Capabilities: [2d0] Latency Tolerance Reporting
              Capabilities: [310] L1 PM Substates
              Kernel driver in use: nvme
              Kernel modules: nvme

      Line number 8 of the command output shows the NUMA node (Flags: bus master, fast devsel, latency 0, IRQ 93, NUMA node 1)

    3. Identify the NVMe disk's path
      Use the disk PCI slot listed by the previous commands to identify the disk's block device path

      Code Block
      themeRDark
      ls -la /sys/dev/block |grep  5d:00.0
      
      expected output :
      lrwxrwxrwx. 1 root root 0 Nov  5 08:06 259:4 -> ../../devices/pci0000:58/0000:58:00.0/0000:59:00.0/0000:5a:02.0/0000:5d:00.0/nvme/nvme0/nvme0n1


      Use the last part of the device path (nvme0n1) as input for the following command :

      Code Block
      themeRDark
      nvme -list |grep nvme0n1
      
      expected output :
      
      /dev/nvme0n1     PHLE822101AN3P2EGN   SSDPE2KE032T7L                           1           3.20  TB /   3.20  TB    512   B +  0 B   QDV1LV45


      The disk's path is  /dev/nvme0n1


  3. Use the disk bay number and the disk serial number (visually identified) and correlate them with the output of the disk tool to identify the disk OS path.
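
The lookups in step 2 can be chained so that each NVMe disk is reported on one line. The following is only a minimal sketch, not part of the product: the helper names (nvme_pci_slots, numa_node_of, block_device_of, report_nvme_disks) are hypothetical, and it assumes that pciutils is installed and that the command outputs have the format shown in the examples above.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not part of DPOD.

# Read `lspci -nn` output on stdin, print the PCI slot of each NVMe controller.
nvme_pci_slots() {
    awk '/Non-Volatile memory controller/ { print $1 }'
}

# Read `lspci -s <slot> -v` output on stdin, print the NUMA node number.
numa_node_of() {
    sed -n 's/.*NUMA node \([0-9][0-9]*\).*/\1/p'
}

# Read `ls -la /sys/dev/block` lines on stdin, print whole-disk names
# (e.g. nvme0n1); partition entries such as nvme0n1p1 are skipped.
block_device_of() {
    awk -F/ '$NF ~ /^nvme[0-9]+n[0-9]+$/ { print $NF }'
}

# Print one line per NVMe disk: PCI slot, NUMA node and disk OS path.
report_nvme_disks() {
    local slot node dev
    for slot in $(lspci -nn | nvme_pci_slots); do
        node=$(lspci -s "$slot" -v | numa_node_of)
        dev=$(ls -la /sys/dev/block | grep "$slot" | block_device_of)
        echo "$slot NUMA=$node /dev/$dev"
    done
}
```

Calling report_nvme_disks on the server prints the PCI slot, NUMA node and OS path of each disk, which can then be matched against the disk bay and serial number noted in step 1. The disk serial is not covered by this sketch.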
Example for Mount Points and Disk Configurations
Store Node | Mount Point Path | Disk Bay | PCI Slot Number | Disk Serial | Disk OS Path | NUMA node (CPU #)
2 | /data2 | 1 | 2 | PHLE822101AN3PXXXX | /dev/nvme0n1 | 1
2 | /data22 | 2 | | | /dev/nvme1n1 | 1
3 | /data3 | 4 | | | /dev/nvme2n1 | 2
3 | /data33 | 5 | | | /dev/nvme3n1 | 2
4 | /data4 | 12 | | | /dev/nvme4n1 | 3
4 | /data44 | 13 | | | /dev/nvme5n1 | 3

Example for LVM Configuration

...

For instance, to federate two cell members, the script should be run twice (in the cell manager) - first time with the IP address of the first cell member, and second time with the IP address of the second cell member.


Important: The script should be executed using the OS root user.

Code Block
themeRDark
/app/scripts/configure_cell_manager.sh -a <internal IP address of the cell member> -g <external IP address of the cell member>
For example: /app/scripts/configure_cell_manager.sh -a 172.18.100.34 -g 172.17.100.33
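
When several cell members are federated, it can help to generate the command lines first and review them before running each one. The following is a minimal sketch under assumptions: the helper name federation_commands and the "internal,external" pair notation are hypothetical, and the addresses are the ones used in the examples on this page.

```shell
#!/bin/bash
# Hypothetical helper: print the federation command for each cell member.
# Each argument is an "internal,external" IP address pair.
federation_commands() {
    local pair
    for pair in "$@"; do
        # ${pair%,*} is the part before the comma, ${pair#*,} the part after it.
        echo "/app/scripts/configure_cell_manager.sh -a ${pair%,*} -g ${pair#*,}"
    done
}

# Two cell members, so two commands are printed.
federation_commands 172.18.100.34,172.17.100.33 172.18.100.36,172.17.100.35
```

The sketch only prints the commands. Each one still has to be run in the cell manager as the OS root user, one at a time, because the script is interactive.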
Example for a Successful Execution
Code Block
themeRDark
/app/scripts/configure_cell_manager.sh -a 172.18.100.36 -g 172.17.100.35

2018-10-22_16-13-16 INFO Cell Configuration
2018-10-22_16-13-16 INFO ===============================
2018-10-22_16-13-18 INFO 
2018-10-22_16-13-18 INFO Log file is : /installs/logs/cell_manager_configuration-2018-10-22_16-13-16.log
2018-10-22_16-13-18 INFO 
2018-10-22_16-13-18 INFO Adding new cell member with the following configuration :
2018-10-22_16-13-18 INFO Cell member internal address 172.18.100.36
2018-10-22_16-13-18 INFO Cell member external address 172.17.100.35
2018-10-22_16-13-18 INFO Syslog agents using TCP ports starting with 60000
2018-10-22_16-13-18 INFO Wsm agents using TCP ports starting with 60020
2018-10-22_16-13-18 INFO 
2018-10-22_16-13-18 INFO During the configuration process the system will be shut down, which means that new data will not be collected and the Web Console will be unavailable for users.
2018-10-22_16-13-18 INFO Please make sure the required network connectivity (e.g. firewall rules) is available between all cell components (manager and members) according to the documentation.
2018-10-22_16-13-18 INFO 
2018-10-22_16-13-20 INFO Please choose the IP address for the cell manager server internal address followed by [ENTER]:
2018-10-22_16-13-20 INFO 1.) 172.18.100.32
2018-10-22_16-13-20 INFO 2.) 172.17.100.31
1
2018-10-22_16-14-30 INFO Stopping application ...
2018-10-22_16-15-16 INFO Application stopped successfully.
root@172.18.100.36's password: 
2018-10-22_16-21-41 INFO Cell member configuration ended successfully.
2018-10-22_16-21-45 INFO Stopping application ...
2018-10-22_16-22-31 INFO Application stopped successfully.
2018-10-22_16-22-31 INFO Starting application ...

Note that the script writes two log files, one in the cell manager and one in the cell member. The log file names are mentioned in the script's output.

Example for a Failed Execution
Code Block
themeRDark
/app/scripts/configure_cell_manager.sh -a 172.18.100.36 -g 172.17.100.35

2018-10-22_16-05-03 INFO Cell Configuration
2018-10-22_16-05-03 INFO ===============================
2018-10-22_16-05-05 INFO 
2018-10-22_16-05-05 INFO Log file is : /installs/logs/cell_manager_configuration-2018-10-22_16-05-03.log
2018-10-22_16-05-05 INFO 
2018-10-22_16-05-05 INFO Adding new cell member with the following configuration :
2018-10-22_16-05-05 INFO Cell member internal address 172.18.100.36
2018-10-22_16-05-05 INFO Cell member external address 172.17.100.35
2018-10-22_16-05-05 INFO Syslog agents using TCP ports starting with 60000
2018-10-22_16-05-05 INFO Wsm agents using TCP ports starting with 60020
2018-10-22_16-05-05 INFO 
2018-10-22_16-05-05 INFO During the configuration process the system will be shut down, which means that new data will not be collected and the Web Console will be unavailable for users.
2018-10-22_16-05-05 INFO Please make sure the required network connectivity (e.g. firewall rules) is available between all cell components (manager and members) according to the documentation.
2018-10-22_16-05-05 INFO 
2018-10-22_16-05-06 INFO Please choose the IP address for the cell manager server internal address followed by [ENTER]:
2018-10-22_16-05-06 INFO 1.) 172.18.100.32
2018-10-22_16-05-06 INFO 2.) 172.17.100.31
1
2018-10-22_16-05-09 INFO Stopping application ...
2018-10-22_16-05-58 INFO Application stopped successfully.
root@172.18.100.36's password: 
2018-10-22_16-06-46 ERROR Starting rollback
2018-10-22_16-06-49 WARN Issues found that may need attention !!
2018-10-22_16-06-49 INFO Stopping application ...
2018-10-22_16-07-36 INFO Application stopped successfully.
2018-10-22_16-07-36 INFO Starting application ...

In case of a failure, the script will try to roll back the configuration changes it made, so the problem can be fixed before rerunning it.
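
To find what went wrong, the log file named in the script's output can be filtered for warnings and errors. The following is a minimal sketch: the helper name log_issues is hypothetical, and it assumes the log file uses the same "<timestamp> <LEVEL> <message>" line format as the console output shown above.

```shell
#!/bin/bash
# Hypothetical helper: print WARN and ERROR entries from a log read on stdin.
# Assumes the level is the second whitespace-separated field of each line.
log_issues() {
    awk '$2 == "ERROR" || $2 == "WARN"'
}

# Example (the log path is the one printed after "Log file is :"):
#   log_issues < /installs/logs/cell_manager_configuration-2018-10-22_16-05-03.log
```

Matching on the second field rather than searching the whole line avoids false hits on INFO messages that merely contain the word "ERROR".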

Cell Member Federation Post Steps

NUMA configuration

A DPOD cell member uses NUMA (Non-Uniform Memory Access) technology. The default cell member configuration binds DPOD's agent to CPU 0 and the Store's nodes to CPU 1.
If the server has 4 CPUs, edit the service files of Store nodes 2 and 3 and change the bound CPU to 2 and 3 respectively.

Identifying NUMA Configuration


To identify the number of CPUs installed on the server, use the NUMA utility:

Code Block
themeRDark
numactl -s

Example output for 4 CPU server :

policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 
cpubind: 0 1 2 3
nodebind: 0 1 2 3
membind: 0 1 2 3
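
The check can also be scripted by counting the entries on the nodebind line. The following is a minimal sketch: the helper name count_numa_nodes is hypothetical, and it assumes the numactl -s output format shown above, where each NUMA node corresponds to one CPU.

```shell
#!/bin/bash
# Hypothetical helper: count the NUMA nodes listed on the "nodebind:" line
# of `numactl -s` output read on stdin.
count_numa_nodes() {
    # NF counts the "nodebind:" label too, so subtract one.
    awk '/^nodebind:/ { print NF - 1 }'
}

# Usage on the server (requires numactl):
#   if [ "$(numactl -s | count_numa_nodes)" -eq 4 ]; then
#       echo "4 CPUs found: Store nodes 2 and 3 should be rebound"
#   fi
```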
Alter Syslog agents

The service files are located in the directory /etc/init.d/ and have the name prefix MonTier-SyslogAgent- (there should be 4 service files).

Search each service file for the string "numa" and make sure the numa variable is defined as follows:

Code Block
themeRDark
numa="/usr/bin/numactl --membind=0 --cpunodebind=0"

/bin/su -s /bin/bash -c "/bin/bash -c 'echo \$\$ >${FLUME_PID_FILE} && exec ${numa} ${exec}......
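
The same check can be run over all the Syslog agent service files at once. The following is a minimal sketch: the helper name has_numa_binding is hypothetical, and it assumes the numa variable is written exactly in the format shown above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if service file $1 binds both memory and CPU
# to NUMA node $2, using the numa variable format shown above.
has_numa_binding() {
    grep -q "numa=\"/usr/bin/numactl --membind=$2 --cpunodebind=$2\"" "$1"
}

# Usage: list any Syslog agent service file that is not bound to node 0.
#   for f in /etc/init.d/MonTier-SyslogAgent-*; do
#       has_numa_binding "$f" 0 || echo "check $f"
#   done
```

The helper only reports whether a file matches. Any file it flags should be edited by hand.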


Alter Store's Node 2 and 3 (OPTIONAL - only if the server has 4 CPUs)

The service files are located in the directory /etc/init.d/ and are named MonTier-es-raw-trans-Node-2 and MonTier-es-raw-trans-Node-3.

...