DAOS Command Fails with "Transport layer mercury error" on CentOS 7.9

by NorthernLights2003, from LinuxQuestions.org (#6NE3F)

Hello,

Running daos cont create on my DAOS setup fails with "Transport layer mercury error" (DER_HG, -1020). The full error output and the details of my setup are below:

Command and Error Message:

[root@client2 ~]# daos cont create tank --label mycont
external ERR # [5323.920594] mercury->msg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/na/na_ofi.c:3047
# na_ofi_msg_send(): fi_tsend() failed, rc: -2 (No such file or directory)
external ERR # [5323.921055] mercury->hg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/mercury_core.c:2727
# hg_core_forward_na(): Could not post send for input buffer (NA_NOENTRY)
hg ERR src/cart/crt_hg.c:1104 crt_hg_req_send_cb(0x2a5c8c0) [opc=0x1020004 (DAOS) rpcid=0x18ea69b600000000 rank:tag=0:0] RPC failed; rc: DER_HG(-1020): 'Transport layer mercury error'
mgmt ERR src/mgmt/cli_mgmt.c:882 dc_mgmt_pool_find() tank: failed to get PS replicas from 1 servers, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:198 dc_pool_choose_svc_rank() 00000000:tank: dc_mgmt_pool_find() failed, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:503 dc_pool_connect_internal() 00000000:tank: cannot find pool service: DER_HG(-1020): 'Transport layer mercury error'
ERROR: daos: DER_HG(-1020): Transport layer mercury error
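
To make the failure chain in the output above easier to read, here is a minimal sketch that pulls out the three layers involved: the libfabric call and its return code, the Mercury NA error, and the DAOS error code. The function name `summarize` and the regular expressions are my own illustration and are tuned only to the log lines quoted in this post; they are not part of DAOS or Mercury.

```python
import re

# Excerpt of the log lines quoted in this post.
LOG = """\
# na_ofi_msg_send(): fi_tsend() failed, rc: -2 (No such file or directory)
# hg_core_forward_na(): Could not post send for input buffer (NA_NOENTRY)
hg ERR src/cart/crt_hg.c:1104 crt_hg_req_send_cb() RPC failed; rc: DER_HG(-1020): 'Transport layer mercury error'
ERROR: daos: DER_HG(-1020): Transport layer mercury error
"""


def summarize(log: str) -> dict:
    """Extract the libfabric, Mercury NA, and DAOS error identifiers from a log."""
    # e.g. "fi_tsend() failed, rc: -2 (No such file or directory)"
    rc = re.search(r"(\w+)\(\) failed, rc: (-?\d+) \(([^)]*)\)", log)
    # e.g. "(NA_NOENTRY)"
    na = re.search(r"\((NA_[A-Z_]+)\)", log)
    # e.g. "DER_HG(-1020)"; duplicates are collapsed
    der = re.findall(r"(DER_[A-Z_]+)\((-?\d+)\)", log)
    return {
        "libfabric_call": rc.group(1) if rc else None,
        "libfabric_rc": int(rc.group(2)) if rc else None,
        "libfabric_msg": rc.group(3) if rc else None,
        "na_error": na.group(1) if na else None,
        "daos_errors": sorted({(name, int(code)) for name, code in der}),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

Read bottom-up, the summary suggests that DER_HG is only the outermost wrapper and that the original failure is the fi_tsend() call in the OFI layer.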

Environment Details:

DAOS Version: daos-2.0.3-5.el7.x86_64
DAOS Client Version: daos-client-2.0.3-5.el7.x86_64
Libfabric Version: libfabric-1.15.1-1.el7.x86_64
Mercury Version: mercury-2.1.0~rc4-9.el7.x86_64
CentOS Version: CentOS 7.9
Fabric Interface: enp0s3

Additional Information:

[root@server ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bd:95:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 564sec preferred_lft 564sec
    inet6 fe80::e25:a2fd:9904:a8ac/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bb:cb:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.104/24 brd 192.168.56.255 scope global noprefixroute dynamic enp0s8
       valid_lft 370sec preferred_lft 370sec
    inet6 fe80::8d85:5b39:5f73:6e0b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
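
To cross-check which interface carries the address used in access_points, here is a small sketch that maps interface names to IPv4 addresses from `ip addr` text. The helper names `ipv4_by_interface` and `interface_for` are hypothetical, and the sample is a trimmed copy of the output above; it only handles the layout shown in this post.

```python
import re

# Trimmed copy of the `ip addr` output from the server.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    inet 192.168.56.104/24 brd 192.168.56.255 scope global noprefixroute dynamic enp0s8
"""


def ipv4_by_interface(ip_addr_output: str) -> dict:
    """Map each interface name to the IPv4 addresses listed under it."""
    result = {}
    current = None
    for raw in ip_addr_output.splitlines():
        line = raw.strip()
        # Interface header, e.g. "2: enp0s3: <BROADCAST,...>"
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        # IPv4 line, e.g. "inet 10.0.2.15/24 ..." (inet6 lines do not match)
        inet = re.match(r"^inet\s+(\d+\.\d+\.\d+\.\d+)/\d+", line)
        if inet and current is not None:
            result[current].append(inet.group(1))
    return result


def interface_for(address: str, ip_addr_output: str):
    """Return the interface that owns `address`, or None if no interface has it."""
    for name, addresses in ipv4_by_interface(ip_addr_output).items():
        if address in addresses:
            return name
    return None


if __name__ == "__main__":
    print(interface_for("10.0.2.15", SAMPLE))
```

With the output above, the access point address 10.0.2.15 resolves to enp0s3, the same interface set as fabric_iface in the server configuration; enp0s8 carries 192.168.56.104.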

I have also included the DAOS server, control, and agent configuration files below for reference.

DAOS Server file
## default: daos_server
name: daos_server
#
#
## Access points
## Immutable after running "dmg storage format".
#
## To operate, DAOS will need a quorum of access point nodes to be available.
## Must have the same value for all agents and servers in a system.
## Hosts can be specified with or without port. The default port that is set
## up in port: will be used if a port is not specified here.
#
## default: hostname of this node
access_points: ['10.0.2.15']
#
#
## Default control plane port
#
## Port number to bind daos_server to. This will also be used when connecting
## to access points, unless a port is specified in access_points:
#
## default: 10001
port: 10001
#
#
## Transport credentials specifying certificates to secure communications
#
transport_config:
# # In order to disable transport security, uncomment and set allow_insecure
# # to true. Not recommended for production configurations.
  allow_insecure: true
#
# # Location where daos_server will look for Client certificates
  client_cert_dir: /etc/daos/certs/clients
# # Custom CA Root certificate for generated certs
  ca_cert: /etc/daos/certs/daosCA.crt
# # Server certificate for use in TLS handshakes
  cert: /etc/daos/certs/server.crt
# # Key portion of Server Certificate
  key: /etc/daos/certs/server.key
provider: ofi+sockets
socket_dir: /var/run/daos_server
nr_hugepages: 4096
control_log_mask: DEBUG
control_log_file: /tmp/daos_server.log
helper_log_file: /tmp/daos_admin.log
engines:
-
  targets: 8
  nr_xs_helpers: 0
  fabric_iface: enp0s3
  fabric_iface_port: 31316
  log_mask: INFO
  log_file: /tmp/daos_engine_0.log
  env_vars:
  - CRT_TIMEOUT=30

  scm_mount: /mnt/daos0
  scm_class: ram
  scm_size: 8

DAOS Control file

# default: daos_server
name: daos_server

# Default destination port to use when connecting to hosts in the hostlist.
# default: 10001
port: 10001

# Hostlist, a comma separated list of addresses (hostnames or IPv4 addresses).
# default: ['localhost']
hostlist: ['10.0.2.15']

## Transport Credentials Specifying certificates to secure communications

transport_config:
# # In order to disable transport security, uncomment and set allow_insecure
# # to true. Not recommended for production configurations.
  allow_insecure: true
#
# # Custom CA Root certificate for generated certs
  ca_cert: /etc/daos/certs/daosCA.crt
# # Admin certificate for use in TLS handshakes
  cert: /etc/daos/certs/admin.crt
# # Key portion of Admin Certificate
  key: /etc/daos/certs/admin.key

DAOS Agent file
# default: daos_server
name: daos_server

# Management server access points
# Must have the same value for all agents and servers in a system.
# default: hostname of this node
access_points: ['10.0.2.15']

# Force different port number to connect to access points.
# default: 10001
port: 10001

## Transport Credentials Specifying certificates to secure communications
#
transport_config:
# # In order to disable transport security, uncomment and set allow_insecure
# # to true. Not recommended for production configurations.
  allow_insecure: true
#
# # Custom CA Root certificate for generated certs
  ca_cert: /etc/daos/certs/daosCA.crt
# # Agent certificate for use in TLS handshakes
  cert: /etc/daos/certs/agent.crt
# # Key portion of Agent Certificate
  key: /etc/daos/certs/agent.key
# Use the given directory for creating unix domain sockets
#
# NOTE: Do not change this when running under systemd control. If it needs to
# be changed, then make sure that it matches the RuntimeDirectory setting
# in /usr/lib/systemd/system/daos_agent.service
#
# default: /var/run/daos_agent
#runtime_dir: /var/run/daos_agent

# Full path and name of the DAOS agent logfile.
# default: /tmp/daos_agent.log
log_file: /tmp/daos_agent.log

# Manually define the fabric interfaces and domains to be used by the agent,
# organized by NUMA node.
# If not defined, the agent will automatically detect all fabric interfaces and
# select appropriate ones based on the server preferences.
#
#fabric_ifaces:
#-
# numa_node: 0
# devices:
# -
# iface: ib0
# domain: mlx5_0
# -
# iface: ib1
# domain: mlx5_1
#-
# numa_node: 1
# devices:
# -
# iface: ib2
# domain: mlx5_2
# -
# iface: ib3
# domain: mlx5_3
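
The config comments say access_points must have the same value for all agents and servers in a system. As a rough cross-check of that, here is a minimal sketch that compares name, port, and access_points between two config texts. It is a simple line scan rather than a real YAML parser, and the names `top_level_settings` and `mismatches` are hypothetical; the sample strings are shortened copies of the server and agent files above.

```python
import ast

# Settings that should agree between the server and agent files.
KEYS = ("name", "port", "access_points")

SERVER = """\
name: daos_server
access_points: ['10.0.2.15']
port: 10001
provider: ofi+sockets
"""

AGENT = """\
# default: daos_server
name: daos_server
access_points: ['10.0.2.15']
port: 10001
"""


def top_level_settings(config_text: str) -> dict:
    """Collect the KEYS entries from `key: value` lines, skipping comments.

    Limitation: this matches by key name only and ignores nesting.
    """
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key not in KEYS or not value:
            continue
        try:
            # Lists like ['10.0.2.15'] and integers are valid Python literals.
            settings[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Bare strings such as daos_server stay as text.
            settings[key] = value
    return settings


def mismatches(server_text: str, agent_text: str) -> dict:
    """Return {key: (server_value, agent_value)} for every key that differs."""
    server = top_level_settings(server_text)
    agent = top_level_settings(agent_text)
    return {
        key: (server.get(key), agent.get(key))
        for key in KEYS
        if server.get(key) != agent.get(key)
    }


if __name__ == "__main__":
    print(mismatches(SERVER, AGENT) or "name, port and access_points match")
```

For the files posted here the three settings agree, so a mismatch between the server and agent files does not appear to be the cause.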

Any assistance or insights into resolving this issue would be greatly appreciated. Thank you!