Installation Manual
Revision: c0ec5115f
Date: Thu Nov 16 2023
©2023 NVIDIA Corporation & affiliates. All Rights Reserved. This manual or parts thereof may not be
reproduced in any form unless permitted by contract or by written permission of NVIDIA Corporation.
Trademarks
Linux is a registered trademark of Linus Torvalds. PathScale is a registered trademark of Cray, Inc.
Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc.
SUSE is a registered trademark of SUSE LLC. NVIDIA, CUDA, GPUDirect, HPC SDK, NVIDIA DGX,
NVIDIA Nsight, and NVLink are registered trademarks of NVIDIA Corporation. FLEXlm is a registered
trademark of Flexera Software, Inc. PBS Professional, and Green Provisioning are trademarks of Altair
Engineering, Inc. All other trademarks are the property of their respective owners.
2 Introduction 11
2.1 What Is NVIDIA Base Command Manager? . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 What OS Platforms Is It Available For? . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 What Architectures Does It Run On? . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 What Features Are Supported Per OS And Architecture? . . . . . . . . . . . . . . . 12
2.1.4 What OS Platforms Can It Be Managed From? . . . . . . . . . . . . . . . . . . . . . 12
2.2 Cluster Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
11 Burning Nodes 97
11.1 Test Scripts Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
11.2 Burn Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
11.2.1 Mail Tag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11.2.2 Pre-install And Post-install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11.2.3 Post-burn Install Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11.2.4 Phases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11.2.5 Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11.3 Running A Burn Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.3.1 Burn Configuration And Execution In cmsh . . . . . . . . . . . . . . . . . . . . . . . 99
11.3.2 Writing A Test Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
11.3.3 Burn Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
11.4 Relocating The Burn Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
11.4.1 Configuring The Relocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
11.4.2 Testing The Relocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
• If the cluster has already been installed, tested, and configured, but only needs to be configured
now for a new network, then the administrator should only need to look at Chapter 6. Chapter 6
lays out how to carry out the most common configuration changes that usually need to be done to
make the cluster work in the new network.
• For administrators that are very unfamiliar with clusters, reading the introduction (Chapter 2)
and then the more detailed installation walkthrough for a bare metal installation (Chapter 3, sec-
tions 3.1, 3.2, and 3.3) is recommended. Having carried out the head node installation, the ad-
ministrator can then return to this quickstart chapter (Chapter 1), and continue onward with the
quickstart process of regular node installation (section 1.3).
• The configuration and administration of the cluster after it has been installed is covered in the
BCM Administrator Manual. The Administrator Manual should be consulted for further background
information as well as guidance on cluster administration tasks, after the introduction (Chapter 2)
of the Installation Manual has been read.
1. The BIOS of the head node should have the local time set.
2. The head node should be booted from the BCM installation DVD or from a USB flash drive with
the DVD ISO on it.
3. The option: Start Base Command Manager Graphical Installer, or Start Base Command
Manager Text Installer, should be selected in the text boot menu. The graphical installation
is recommended, and brings up the GUI installation welcome screen. The text installer provides a
minimal ncurses-based version of the GUI installation.
Only the GUI installation is discussed in the rest of this quickstart for convenience.
4. At the GUI welcome screen, the Start installation button should be clicked.
5. At the Bright Computing Software License screen, the acceptance checkbox should be
ticked. Next should then be clicked.
6. At the Linux base distribution screen, the acceptance checkbox should be ticked. Next should
then be clicked.
7. At the Hardware Info screen, the detected hardware should be reviewed. If additional kernel
modules are required, then the administrator should go back to the Kernel Modules screen. Once
all the relevant hardware (Ethernet interfaces, hard drive and DVD drive) is detected, Next should
be clicked.
8. At the Installation source screen, the DVD drive containing the BCM DVD, or the USB flash
drive containing the DVD ISO, should be selected, then Next clicked.
9. At the General cluster settings screen, one or more nameservers and one or more domains
can be set, if they have not already been automatically filled. The remaining settings can usually
be left as is.
10. At the Workload management screen, an HPC workload manager can be selected. The choice can
be made later on too, after BCM has been installed.
11. For the Network topology screen, a Type 1 network is the most common.
12. For the Head node settings screen, the head node is given a name and a password.
13. For the Compute nodes settings screen, naming and numbering settings for the compute nodes can be configured.
14. For the BMC configuration screen, the use of IPMI/iLO/DRAC/CIMC/Redfish BMCs is
configured. Adding an IPMI/iLO/DRAC/CIMC/Redfish network is needed in order to configure
IPMI/iLO/DRAC/CIMC/Redfish interfaces in a different IP subnet, and is recommended.
15. At the Networks screen, the network parameters for the head node should be entered for the inter-
face facing the network named externalnet:
• If using DHCP on that interface, the parameters for IP Address, Netmask and Gateway as
suggested by the DHCP server on the external network can be accepted.
• If not using DHCP on that interface, static values should be put in instead.
The network parameters for externalnet that can be set include the IP address, netmask, and gateway.
The network externalnet corresponds to the site network that the cluster resides in (for example,
a corporate or campus network). The IP address details are therefore the details of the head node
for a type 1 externalnet network (figure 3.10). A domain name should be entered to suit the local
requirements.
16. For the Head node interfaces screen, the head node network interfaces are assigned networks
and IP addresses. The assigned values can be reviewed and changed.
17. At the Compute node interfaces screen, the compute node interfaces are assigned networks and
IP addresses. The assigned values can be reviewed and changed.
18. At the Disk layout screen, a drive should be selected for the head node. The installation will be
done onto this drive, overwriting all its previous content.
19. At the Disk layout Settings screen, the administrator can modify the disk layout for the head
node by selecting a pre-defined layout.
For hard drives that have less than about 500GB of space, the XML file
master-one-big-partition.xml is used by default. For hard drives that have about 500GB or
more of space, the XML file master-standard.xml is used by default.
These default layouts may be fine-tuned by editing the XML partitioning definition during this
stage. The “max” setting in the XML file means that the rest of the drive space, whatever the
leftover space is, is used up for the associated partition.
There are also other layout templates available from a menu.
20. At the Additional software screen, extra software options can be chosen for installation if these
were selected for the installation ISO. The extra software options are:
• CUDA
• Ceph
• OFED stack
21. The Summary screen should be reviewed. A wrong entry can still be fixed at this point. The Next
button then starts the installation.
22. The Deployment screen should eventually complete. Clicking on Reboot reboots the head node.
2. Once the machine is fully booted, a login should be done as root with the password that was
entered during installation.
3. A check should be done to confirm that the machine is visible on the external network. Also, it
should be checked that the second NIC (i.e. eth1) is physically connected to the external network.
4. If the parent distribution for BCM is RHEL or SUSE, then registration (Chapter 5) should usually
be done.
6. The head node software should be updated via its package manager (yum, dnf, apt, zypper) so
that it has the latest packages (sections 11.2 -11.3 of the Administrator Manual).
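For example, on a head node running a RHEL-derived distribution, a minimal sketch of such an
update is:
[root@basecm10 ~]# yum update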
2. The BIOS of regular nodes should be configured to boot from the network. The regular nodes
should then be booted. No operating system is expected to be on the regular nodes already. If
there is an operating system there already, then by default, it is overwritten by a default image
provided by the head node during the next stages.
3. If everything goes well, the node-installer component starts on each regular node and a certificate
request is sent to the head node.
If a regular node does not make it to the node-installer stage, then it is possible that additional
kernel modules are needed. Section 5.8 of the Administrator Manual contains more information on
how to diagnose problems during the regular node booting process.
4. To identify the regular nodes (that is, to assign a host name to each physical node), several options
are available. Which option is most convenient depends mostly on the number of regular nodes
and whether a (configured) managed Ethernet switch is present.
Rather than identifying nodes based on their MAC address, it is often beneficial (especially in
larger clusters) to identify nodes based on the Ethernet switch port that they are connected to. To
allow nodes to be identified based on Ethernet switch ports, section 3.9 of the Administrator Manual
should be consulted.
If a node is unidentified, then its node console displays an ncurses message to indicate it is an un-
known node, and the net boot keeps retrying its identification attempts. Any one of the following
methods may be used to assign node identities when nodes start up as unidentified nodes:
a. Identifying each node on the node console: To manually identify each node, the “Manually
select node” option is selected for each node. The node is then identified manually by selecting
a node-entry from the list and then choosing the Accept option. This option is easiest when
there are not many nodes. It requires being able to view the console of each node and key-
board entry to the console.
b. Identifying nodes using cmsh: In cmsh the newnodes command in device mode (page 245,
section 5.4.2 of the Administrator Manual) can be used to assign identities to nodes from the
command line. When called without parameters, the newnodes command can be used to
verify that all nodes have booted into the node-installer and are all waiting to be assigned an
identity.
c. Identifying nodes using Base View: The node identification resource (page 249, section 5.4.2
of the Administrator Manual) in Base View automates the process of assigning identities so
that manual identification of nodes at the console is not required.
Example
To verify that all regular nodes have booted into the node-installer:
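A minimal sketch of such a check from cmsh, where newnodes run without parameters lists the
nodes that are still waiting to be assigned an identity:
[root@basecm10 ~]# cmsh -c 'device; newnodes'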
Example
Once all regular nodes have been booted in the proper order, the order of their appearance on the
network can be used to assign node identities. To assign identities node001 through node032 to
the first 32 nodes that were booted, the following commands may be used:
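A sketch of the assignment, assuming the -s (save) and -n (node range) options of the newnodes
command described in section 5.4.2 of the Administrator Manual:
[root@basecm10 ~]# cmsh
[basecm10]% device
[basecm10->device]% newnodes -s -n node001..node032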
5. Each regular node is now provisioned and eventually fully boots. In case of problems, section 5.8
of the Administrator Manual should be consulted.
6. Optional: To configure power management, Chapter 4 of the Administrator Manual should be con-
sulted.
7. To update the software on the nodes, a package manager is used to install to the node image
filesystem that is on the head node.
The node image filesystem should be updated via its package manager (yum, dnf, apt, zypper) so
that it has the latest packages (sections 11.4 -11.5 of the Administrator Manual).
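For example, for the default image default-image on a RHEL-derived head node, a minimal sketch
of the update is:
[root@basecm10 ~]# yum --installroot=/cm/images/default-image update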
(a) NVIDIA GPU hardware should be detected on the nodes that use it. This is true for NVIDIA
GPU units (separate from the nodes) as well as for on-board NVIDIA GPUs. The lspci
command can be used for detection. For example, for a GPU used by node001:
Example
[root@basecm10 ~]# ssh node001 lspci | grep NVIDIA
00:07.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40c] (rev a1)
(b) For AMD CPUs that have a GPU integrated with the CPU, the chip can similarly be
identified with lscpu:
Example
[root@basecm10 ~]# ssh node001 lscpu | grep "Model name:"
Model name: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
The AMD chips can then be checked against the list of AMD chips with AMD GPUs, as listed
at https://www.amd.com/en/support/kb/release-notes/rn-prorad-lin-18-20
(a) Details of AMD GPU software installation are given in section 7.4.
(b) For NVIDIA GPUs, assuming the GPU is on the regular node node001, and that the hardware
is supported by CUDA 12.1, then software installation is carried out at the head node as
follows:
i. The software components are installed for the head node itself with:
[root@basecm10 ~]# yum install cuda12.1-toolkit cuda12.1-sdk
ii. Components are installed into the image used by the nodes that have the GPUs, for ex-
ample the image default-image, with:
[root@basecm10 ~]# yum --installroot=/cm/images/default-image install cuda-driver cuda-dcgm
iii. The nodes with GPUs can then simply be rebooted to compile the CUDA drivers as the
node boots, and to start the CUDA driver up:
[root@basecm10 ~]# cmsh -c 'device; reboot -n node001..node015'
Further details on the basic installation of CUDA for NVIDIA GPUs are given in section 9.
Running cm-wlm-setup on the head node starts up a TUI configuration. An NVIDIA GPU can be configured for Slurm using the
Setup (Step by Step) option for Slurm (section 7.3.2 of the Administrator Manual).
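A minimal sketch of starting the tool on the head node:
[root@basecm10 ~]# cm-wlm-setup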
After configuring the WLM server, WLM submission and WLM client roles for the nodes of the
cluster, a screen that asks if GPU resources should be configured is displayed (figure 1.1):
Following through brings up a GPU device settings configuration screen (figure 1.2):
Figure 1.2: Slurm With cm-wlm-setup: GPU Device Settings Configuration Screen
The help text option in the screen gives hints based on the descriptions at https://slurm.schedmd.
com/gres.conf.html, and also as seen in Slurm’s man (5) gres.conf.
Figure 1.2 shows 2 physical GPUs on the node being configured. The type is an arbitrary string for
the GPU, and each CPU core is allocated an alias GPU device.
The next screen (figure 1.3) allows the NVIDIA CUDA MPS (Multi-Process Service) to be config-
ured:
Figure 1.3: Configuring An NVIDIA GPU For Slurm With cm-wlm-setup: MPS Settings Configuration
Screen
The help text for this screen gives hints on how the fields can be filled in. The number of GPU
cores (figure 1.3) for a GPU device can be set.
The rest of the cm-wlm-setup procedure can then be completed.
The regular nodes that had a role change during cm-wlm-setup can then be rebooted to pick up the
workload manager (WLM) services. A check via the cmsh command ds should show what nodes
need a restart.
Example
If, for example, the range from node001 to node015 needs to be restarted to get the WLM services
running, then it could be carried out with:
Example
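A sketch of such a restart, reusing the reboot command shown earlier:
[root@basecm10 ~]# cmsh -c 'device; reboot -n node001..node015'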
More on these attributes can be found in the man pages (man 5 gres.conf).
NVIDIA configuration for Slurm and other workload managers is described in further detail in
section 7.5 of the Administrator Manual.
The “Hello World” helloworld.cu script from section 8.5.4 of the User Manual can be saved in
a user’s directory, and then compiled for a GPU with nvcc:
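A minimal sketch of compiling and submitting, assuming the CUDA toolkit and Slurm environment
modules are named cuda12.1/toolkit and slurm (the exact module names may differ per version):
[auser@basecm10 ~]$ module load cuda12.1/toolkit slurm
[auser@basecm10 ~]$ nvcc helloworld.cu -o helloworld
[auser@basecm10 ~]$ sbatch --gres=gpu:1 --wrap="./helloworld"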
The output from submission to a node with a GPU can then be seen:
More about Slurm batch scripts and GPU compilation can be found in Chapter 8 of the User Man-
ual.
Users can use the module command to switch the environment to the appropriate Python version.
For example, to switch to Python 3.9:
[root@basecm10 ~]# python -V
Python 3.6.8
[root@basecm10 ~]# module load python39
[root@basecm10 ~]# python -V
Python 3.9.10
If the change is carried out correctly, then support is not available for Python-related bugs, but is
available for BCM-related features.
• This manual, the Installation Manual, has more details and background on the installation of
the cluster in the next chapters.
• The Upgrade Manual describes upgrading from earlier versions of NVIDIA Base Command
Manager.
• The User Manual describes the user environment and how to submit jobs for the end user.
• The Cloudbursting Manual describes how to deploy the cloud capabilities of the cluster.
• The Developer Manual has useful information for developers who would like to program with
BCM.
• The Machine Learning Manual describes how to install and configure machine learning capa-
bilities with BCM.
• Red Hat Enterprise Linux Server and some derivatives, such as Rocky Linux
– Versions 8.x
– Versions 9.x.
• SLES versions:
• Ubuntu versions:
Typically BCM is installed either on a head node, or on a pair of head nodes in a high availability
configuration. Images are provisioned to the non-head nodes from the head node(s). The OS for the
images is by default the same as that of the head node(s), but the OS used for the images can be changed
later on (section 11.6 of the Administrator Manual).
At the time of writing of this section in September 2023, the OS platforms listed all support NVIDIA
AI Enterprise (https://docs.nvidia.com/ai-enterprise/index.html), with the exception for now of
RHEL9 and derivatives.
• on the x86_64 architecture that is supported by Intel and AMD 64-bit CPUs
• Base View (section 2.4 of the Administrator Manual): a GUI which conveniently runs on modern
desktop web browsers, and therefore on all operating system versions that support a modern
browser. This includes Microsoft Windows, MacOS and iOS, and Linux.
• cmsh (section 2.5 of the Administrator Manual): an interactive shell front end that can be accessed
from any computing device with a secured SSH terminal access
The head node is the most important machine within a cluster because it controls all other devices,
such as compute nodes, switches and power distribution units. Furthermore, the head node is also the
host that all users (including the administrator) log in to in a default cluster. The head node is typically
the only machine that is connected directly to the external network and is usually the only machine in a
cluster that is equipped with a monitor and keyboard. The head node provides several vital services to
the rest of the cluster, such as central data storage, workload management, user management, DNS and
DHCP service. The head node in a cluster is also frequently referred to as the master node.
Often, the head node is replicated to a second head node, frequently called a passive head node. If
the active head node fails, the passive head node can become active and take over. This is known as a
high availability setup, and is a typical configuration (Chapter 17 of the Administrator Manual) in BCM.
A cluster normally contains a considerable number of non-head, or regular nodes, also referred to
simply as nodes. The head node, not surprisingly, manages these regular nodes over the network.
Most of the regular nodes are compute nodes. Compute nodes are the machines that will do the heavy
work when a cluster is being used for large computations. In addition to compute nodes, larger clusters
may have other types of nodes as well (e.g. storage nodes and login nodes). Nodes typically install
automatically through the (network bootable) node provisioning system that is included with BCM.
Every time a compute node is started, the software installed on its local hard drive is synchronized
automatically against a software image which resides on the head node. This ensures that a node can
always be brought back to a “known state”. The node provisioning system greatly eases compute node
administration and makes it trivial to replace an entire node in the event of hardware failure. Software
changes need to be carried out only once (in the software image), and can easily be undone. In general,
there will rarely be a need to log on to a compute node directly.
In most cases, a cluster has a private internal network, which is usually built from one or multiple
managed Gigabit Ethernet switches, or made up of an InfiniBand or Omni-Path fabric. The internal net-
work connects all nodes to the head node and to each other. Compute nodes use the internal network for
booting, data storage and interprocess communication. In more advanced cluster setups, there may be
several dedicated networks. It should be noted that the external network—which could be a university
campus network, company network or the Internet—is not normally directly connected to the internal
network. Instead, only the head node is connected to the external network.
Figure 2.1 illustrates a typical cluster network setup.
Most clusters are equipped with one or more power distribution units. These units supply power to
all compute nodes and are also connected to the internal cluster network. The head node in a cluster can
use the power control units to switch compute nodes on or off. From the head node, it is straightforward
to power on/off a large number of compute nodes with a single command.
3
Installing NVIDIA Base Command Manager
This chapter describes in detail the installation of NVIDIA Base Command Manager onto the head node
of a cluster. Sections 3.1 and 3.2 list hardware requirements and supported hardware. Section 3.3 gives
step-by-step instructions on installing BCM from a DVD or USB drive onto a head node that has no
operating system running on it initially, while section 3.4 gives instructions on installing onto a head
node that already has an operating system running on it.
Once the head node is installed, the other, regular, nodes can (network) boot off the head node
and provision themselves from it with a default image, without requiring a Linux distribution DVD
or USB drive themselves. Regular nodes normally have any existing data wiped during the process of
provisioning from the head node, which means that a faulty drive can normally simply be replaced by
taking the regular node offline, replacing its drive, and then bringing the node back online, without
special reconfiguration. The details of the network boot and provisioning process for the regular nodes
are described in Chapter 5 of the Administrator Manual.
The installation of software on an already-configured cluster running BCM is described in Chapter 11
of the Administrator Manual.
• 80GB diskspace
• 2 Gigabit Ethernet NICs (for the most common Type 1 topology (section 3.3.9))
Recommended hardware requirements for larger clusters are discussed in detail in Appendix B.
Other brands are also expected to work, even if not explicitly supported.
Other brands are also expected to work, although not explicitly supported.
Other brands with the same SNMP MIB mappings are also expected to work, although not explicitly
supported.
• iDRAC
• IPMI 1.5/2.0
• CIMC
• Redfish v1
3.2.6 GPUs
• NVIDIA Tesla with latest recommended drivers
• NVIDIA GeForce and other older generations are mostly supported. The BCM support team can
be consulted for details.
• NVIDIA DGX servers and workstations are supported for Ubuntu 20.04 at the time of writing of
this section (March 2023).
The GPUs that are NVIDIA-certified for AI Enterprise are listed in the support matrix at https://docs.
nvidia.com/ai-enterprise/latest/product-support-matrix/index.html#support-matrix.
3.2.7 RAID
Both software and hardware RAID are supported. Fake RAID is not regarded as a serious production option
and is supported accordingly.
Special steps for installation from a bootable USB device: If a bootable USB device is to be used, then
the instructions within the BCM ISO, in the file README.BRIGHTUSB should be followed to copy the ISO
image over to the USB device. After copying the ISO image, the MD5 checksum should be validated
to verify that the copied ISO is not corrupt. This is important, because corruption is possible in subtle
ways that may affect operations later on, and in ways that are difficult to uncover.
The ISO Boot menu offers a default option of booting from the hard drive, with a countdown to
starting the hard drive boot. To install BCM, the countdown should be interrupted by selecting the
option of “Start Base Command Manager Graphical Installer” instead.
Selecting the option allows kernel parameter options to be provided to the installer.
Default kernel parameter options are provided so that the administrator can simply press the enter
key to go straight on to start the installer, and bring up the welcome screen (section 3.3.2).
1. a setting for the external network interface that is to be used. For example: eth0 or eth1.
2. a setting for the network configuration of the external network, to be explained soon. The network
configuration option can be built either using static IP addressing or with DHCP.
3. a setting for the password, for example secretpass, for the login to the cluster manager that is
about to be installed.
Example
netconf=eth0:static:10.141.161.253,10.141.255.254,255.255.0.0:secretpass
Example
netconf=eth0:dhcp:secretpass
A remote installation can alternatively be carried out later on without setting netconf, by using
the text mode installer to set up networking (section 3.5), or by using GUI mode installer Continue
remotely option (figure 3.3).
An administrator who would like to simply start installation can click on the Start installation
button at the left side of the screen.
The agreement is carried out by ticking the checkbox and clicking on the Next button.
A similar screen after that asks the user to agree to the Base Distribution EULA. This is the end user
license agreement for the distribution (RHEL, Rocky, Ubuntu and so on) that is to be used as the base
upon which BCM is to run.
• Load config: allows an existing configuration file to be loaded and used by the installation. This
option is available only during the first few screens.
• Show config: allows any already loaded configuration file to be displayed. A default configuration
is loaded at the start, with values that may suit the cluster already. However, the defaults
are not expected to be optimal, and may not even work for the actual physical configuration.
• Continue remotely: allows the administrator to leave the console and access the cluster from a
remote location. This can be useful for administrators who, for example, prefer to avoid working
inside a noisy cold data center. If Continue remotely is selected, then addresses are displayed
on the console screen, for use with a web browser or SSH, and the console installation screen is
locked.
• Back: if not grayed out, the Back option allows the administrator to go back a step in the installa-
tion.
Changes to the modules to be loaded can be entered by reordering the loading order of modules, by
removing modules, and adding new modules. Clicking the + button opens an input box for adding a
module name and optional module parameters (figure 3.5). The module can be selected from a built-in list;
it can be automatically extracted from a .deb or .rpm package; or an available .ko kernel module
file can simply be selected from the filesystem.
A module can also be blacklisted, which means it is prevented from being used, by clicking on the
button. This can be useful when replacing one module with another.
Clicking Next then leads to the “Hardware info” overview screen, described next.
Figure 3.6: Hardware Overview Based On Hardware Detection Used For Loading Kernel Modules
Clicking Next in the Hardware Info screen leads to the Installation source configuration screen,
described next.
The administrator must select the correct device to continue the installation.
Optionally, a media integrity check can be set.
Clicking on the Next button starts the media integrity check, if it was set. The media integrity check
can take about a minute to run. If all is well, then the “Cluster settings” setup screen is displayed, as
described next.
• Cluster name
• Administrator email: Where mail to the administrator goes. This need not be local.
• Time zone
• Time servers: The defaults are pool.ntp.org servers. A time server is recommended to avoid
problems due to time discrepancies between nodes.
• Environment modules: Traditional Tcl modules are set by default. Lmod is an alternative.
If no workload manager is selected here, then it can be installed later on, after the cluster installation
without the workload manager has been done. Details on installing a workload manager later on are
given in Chapter 7 on workload management of the Administrator Manual.
The default client slot number that is set depends on the workload manager chosen.
• If PBS Professional or OpenPBS is selected as a workload management system, then the number
of client slots defaults to 1. After the installation is completed the administrator should update the
value in the pbsproclient role to the desired number of slots for the compute nodes.
• For all other workload management systems, the number of client slots is determined automati-
cally.
The head node can also be selected for running jobs, thereby acting as an additional compute node.
This can be a sensible choice on small clusters if the head node can spare such resources.
Clicking Next on this screen leads to the Network topology screen.
A type 1 network: has its nodes connected on a private internal network. This is the default net-
work setup. In this topology, a network packet from a head or regular node destined for any
external network that the cluster is attached to, by default called Externalnet, can only reach the
external network by being routed and forwarded at the head node itself. The packet routing for
Externalnet is configured at the head node.
A type 1 network is the most common and simple way to run the cluster. It means that the head
node provides DHCP and PXE services (during pre-init stage node booting only) to a secondary,
isolated network for the worker nodes, segregating the cluster traffic. The external (typically a
corporate) network is then only used to provide access to the head node(s) for management.
One limitation is that broader network access must be provided through routing or via a proxy,
should anyone outside of the cluster network need to access a node.
A type 2 network: has its nodes connected via a router to a public network. In this topology,
a network packet from a regular node destined for outside the cluster does not go via the head
node, but uses the router to reach a public network.
Packets between the regular nodes and the head node however still normally go directly to each
other, including the DHCP/PXE-related packets during pre-init stage node booting, since in a
normal configuration the regular nodes and head node are on the same network.
Any routing for beyond the router is configured on the router, and not on the cluster or its parts.
Care should be taken to avoid conflicts between the DHCP server on the head node and any
existing DHCP server on the internal network, if the cluster is being placed within an existing
corporate network that is also part of Internalnet (there is no Externalnet in this topology).
Typically, in the case where the cluster becomes part of an existing network, there is another router
configured and placed between the regular corporate machines and the cluster nodes to shield
them from effects on each other.
A type 2 network does not isolate the worker nodes from the network above it. Instead, each node
remains reachable through the main data plane. This is useful for clusters hosting services, such
as a web portal, avoiding the use of proxies.
A type 3 network: has its nodes connected on a routed public network. In this topology, a network
packet from a regular node, destined for another network, uses a router to get to it. The head node,
being on another network, can only be reached via a router too. The network the regular nodes are
on is called Internalnet by default, and the network the head node is on is called Managementnet
by default. Any routing configuration for beyond the routers that are attached to the Internalnet
and Managementnet networks is configured on the routers, and not on the clusters or its parts.
A consequence of using a router in the type 3 configuration is that the communication between
the head node and the regular nodes is via OSI layer 3. So, OSI layer 2 used by DHCP is not
directly supported. However, DHCP/PXE packets still need to be exchanged between the head
and regular nodes during pre-init node boot. The usual way to relay the packets is using a DHCP
relay agent. Configuration of a DHCP relay agent is outside the scope of BCM configuration, and
is typically done by the network administrator or the router vendor.
For a type 2 network, a DHCP relay agent may also be needed if the regular nodes are spread
across several subnets.
Selecting the network topology helps decide the predefined networks on the Networks settings
screen later (figure 3.16). Clicking Next here leads to the Head node settings screen, described next.
Clicking Next leads to the Compute node settings screen, described next.
By default therefore, the first compute node takes the name node001, the second compute node
takes the name node002, and so on.
If the administrator confirms that the nodes are to use BMCs (Baseboard Management Controllers)
that are compatible with IPMI, iLO, CIMC, iDRAC, or Redfish, then the BMC network options appear.
By default, for the compute nodes, the BMC is automatically configured.
For a Type 1 network, the head node BMC is often connected to an ethernet segment that has the
external network running on it, while the BMCs on the compute nodes are normally connected to an
ethernet segment that has the internal network on it.
Once a network associated with the ethernet segment is chosen, it means that further BMC-related
networking values can be set for the BMCs.
A new Layer 3 IP subnet can be created for BMC interfaces.
The BMC interface can be configured as a shared physical interface with an already existing network
interface. However this can in some cases cause problems during early system BIOS checks. A dedicated
physical BMC interface is therefore recommended.
If a BMC is configured, then the BMC password is set to a random value. Retrieving and changing a
BMC password is covered in section 3.7.2 of the Administrator Manual. BMC configuration is discussed
further in section 3.7 of the Administrator Manual.
Clicking Next leads to the Networks screen, described next.
3.3.13 Networks
The Networks configuration screen (figure 3.16) displays the predefined list of networks, based on the
selection of network topology and BMC networks made in the earlier screens.
The Networks configuration screen allows the parameters of the network interfaces to be configured
via tabs for each network. In addition to any BMC networks:
For a type 1 setup, an external network and an internal network are always defined.
For a type 2 setup, an internal network is defined but no external network is defined.
For a type 3 setup, an internal network and a management network are defined.
• The settings for externalnet correspond to the details of the head node external network interface.
• The settings for internalnet correspond to the details of how the compute nodes are configured.
• The settings for a BMC network correspond to the details of how the BMC is connected.
Additional custom networks can be added in the Networks configuration screen by clicking on the
+ button.
Clicking Next in this screen validates all network settings. Invalid settings for any of the defined
networks cause an alert to be displayed, explaining the error. A correction is then needed to proceed
further. Settings may of course be valid, but incorrect—the validation is merely a sanity check. It may
be wise for the cluster administrator to check with the network specialist that the networks that have
been configured are set up as really intended.
If all settings are valid, then the Next button brings the installation on to the Head node interfaces
screen, described next.
If a BMC network is to be shared with a regular network, then an alias interface is shown too. In
figure 3.17 an alias interface, ens3:ipmi, is shown.
Interfaces can be created or removed.
Dropdown selection allows the proposed values to be changed. It is possible to swap network inter-
faces with dropdown selection.
Clicking the Next button brings the installation on to the Compute node interfaces screen, de-
scribed next.
The boot interface BOOTIF is used to pick up the image for the node via node provisioning.
The IP offset is used to calculate the IP address assigned to a regular node interface. The nodes are
conveniently numbered in a sequence, so their interfaces are typically also given a network IP address
that is in a sequence on a selected network. In BCM, interfaces by default have their IP addresses
assigned to them sequentially, in steps of 1, starting after the network base address.
The default IP offset is 0.0.0.0, which means that the node interfaces by default start their range at
the usual default values in their network.
With a modified IP offset, the point at which addressing starts is altered. For example, a different
offset might be desirable when no IPMI network has been defined, but the nodes of the cluster do have
IPMI interfaces in addition to the regular network interfaces. If a modified IP offset is not set for one of
the interfaces, then the BOOTIF and ipmi0 interfaces get IP addresses assigned on the same network by
default, which could be confusing.
However, if an offset is entered for the ipmi0 interface, then the assigned IPMI IP addresses start
from the IP address specified by the offset. That is, each modified IPMI address takes the value:
address that would be assigned by default + IP offset
Example
Taking the case where BOOTIF and IPMI interfaces would have IP addresses on the same network with
the default IP offset:
Then, on a cluster of 10 nodes, a modified IPMI IP offset of 0.0.0.20 means:
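As a sketch, assuming node001's BOOTIF interface is assigned 10.141.0.1 by default on the internal
network: with the default IP offset of 0.0.0.0, the BOOTIF and ipmi0 ranges would both start at the
usual default addresses on that network, while a modified IPMI IP offset of 0.0.0.20 gives node001's
ipmi0 interface the address 10.141.0.1 + 0.0.0.20 = 10.141.0.21, so that on a cluster of 10 nodes the
ipmi0 addresses run from 10.141.0.21 to 10.141.0.30.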
Clicking the Next button brings the installation on to the Disk layout screen, described next.
It is used to set the partitioning layouts. The partitioning layout XML schema is described in detail
in Appendix D of the Administrator Manual.
The administrator must set the disk partitioning layout for the head node and regular nodes with
the two options: Head node disk layout and Compute nodes disk layout.
• The icon can be clicked to allow a custom partitioning layout specification to be added:
– as a file
– from the default template, for use as a starting point for a specification
• Partitioning layouts can be edited with the icon. This brings up a screen (figure 3.21) that
allows the administrator to view and change layout values within the layout’s configuration XML
file:
• The head node partitioning layout is the only installation setting that cannot easily be changed
after the completion (section 3.3.20) of installation. It should therefore be decided upon with care.
• By default, BCM mounts filesystems on the head node with ACLs set and extended attributes set.
• The XML schema allows the definition of a great variety of layouts in the layout’s configuration
XML file:
Example
1. for a large cluster or for a cluster that is generating a lot of monitoring or burn data, the
default partition layout partition size for /var may fill up with log messages because log
messages are usually stored under /var/log/. If /var is in a partition of its own, as in the
default head node partitioning layout presented when the hard drive is about 500GB or more,
then providing a larger size of partition than the default for /var allows more logging to take
place before /var is full. Modifying the value found within the <size></size> tags associated
with that partition in the XML file modifies the size of the partition that is to be installed. This
can be conveniently done from the front end shown in figure 3.21.
2. the administrator could specify the layout for multiple non-RAID drives on the head node
using one <blockdev></blockdev> tag pair within an enclosing <device></device> tag pair for
each drive.
3. For non-boot partitions, it is possible to set up LUKS encrypted disk partitions on head and
regular nodes. Scrolling through the Partition Properties column of figure 3.21, and tick-
ing the Enable encryption checkbox, makes the LUKS configuration parameters available
(figure 3.22):
The parameters can be left at their default values to set up an encrypted partition.
If setting parameters, then there are some existing fields to set the more common parame-
ters. Settings for less-common parameters that have no existing fields can be specified and
appended to the field with the Additional Parameters: setting.
The settings are automatically stored in the XML specification for the disk layout and can be
viewed there by selecting the XML Output tab.
How a cluster administrator applies this configured disk encryption to a node that is booting
up is covered in Appendix D.17 of the Administrator Manual.
Clicking Next on the Disk layout screen leads to the Additional software screen, described next.
Clicking Next on the Additional software screen leads to the Summary screen, described next.
3.3.19 Summary
The Summary screen (figure 3.24), summarizes some of the installation settings and parameters config-
ured during the previous stages.
3.3.20 Deployment
The Deployment screen (figure 3.25) shows the progress of the deployment. It is not possible to navigate
back to previous screens once the installation has begun. The installation log can be viewed in detail by
clicking on Install log.
The Reboot button restarts the machine. Alternatively, the head node can be set to automatically
reboot when deployment is complete.
During the reboot, the BIOS boot order may need changing, or the DVD may need to be removed, in
order to boot from the hard drive on which BCM has been installed.
After rebooting, the system starts and presents a login prompt. The cluster administrator can log in
as root using the password that was set during the installation procedure.
The cluster should then be updated with the latest packages (Chapter 11 of the Administrator Manual).
After the latest updates have been installed, the system is ready to be configured.
• The installation configuration may conflict with what has already been installed. The problems
that arise can always be resolved, but an administrator that is not familiar with BCM should be
prepared for troubleshooting.
With the release of BCM version 9.2, using the head node installer Ansible collection is the method
for performing add-on installations.
Aside: Ansible can also be used with BCM once NVIDIA Base Command Manager is installed. This
integration is described in section 16.10 of the Administrator Manual.
• An Ansible module is code, usually in Python, that is executed by Ansible to carry out Ansible
tasks, usually on a remote node. The module returns values.
• An Ansible playbook is a YAML file. The file declares a configuration that is to be executed (“the
playbook is followed”) on selected machines. The execution is usually carried out over SSH, by
placing modules on the remote machine.
• Traditionally, official Ansible content was obtained as a part of milestone releases of Ansible
Engine (the Red Hat version of Ansible for the enterprise).
• Since Ansible version 2.10, the official way to distribute content is via Ansible content collections.
Collections are composed of Ansible playbooks, modules, module utilities and plugins. The col-
lection is a formatted set of tools used to achieve automation with Ansible.
• https://github.com/Bright-Computing/bright-installer-ansible/tree/main/playbooks
contains additional documentation and example playbooks.
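As a rough sketch of how the collection might be obtained and a playbook run (the installation
method and playbook name here are illustrative assumptions; the repository above contains the
actual documentation and playbooks):
[root@basecm10 ~]# ansible-galaxy collection install git+https://github.com/Bright-Computing/bright-installer-ansible.git
[root@basecm10 ~]# ansible-playbook -i inventory playbooks/example-playbook.yml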
3.5 Enabling Remote Browser-Based Installation Via The Text Mode Installer
When carrying out an installation as in section 3.3, the installer is normally run on the machine that is to
be the head node of the cluster. A text mode installer is presented as an alternative to the GUI installer
(figure 3.1).
The text mode installer is a very minimal installer compared with the GUI installer. The GUI instal-
lation is therefore usually preferred.
However, in some cases the GUI installation can fail to start, for example if X is not working
correctly on the head node.
A way to still run a GUI installation is then to first run the text mode installer, and use it to run the
Remote Install option from its main menu (figure 3.26):
This then sets up network connectivity, and provides the cluster administrator with a remote URL
(figure 3.27):
A browser that is on a machine with connectivity to the head node can then use the provided remote
URL. This then brings up the GUI installer within the browser.
An alternative to running the text mode installer to obtain the remote URL is to use the netconf
kernel parameter instead. Details on configuring this are given in section 3.3.1.
4
Licensing NVIDIA Base Command Manager
This chapter explains how an NVIDIA Base Command Manager license is viewed, verified, requested,
and installed. The use of a product key to activate the license is also explained.
Typically, for a new cluster that is purchased from a reseller, the cluster may have BCM already set
up on it.
BCM can be run with a temporary, or evaluation license, which allows the administrator to try it out.
This typically has some restrictions on the period of validity for the license, or the number of nodes in
the cluster.
• The default evaluation license comes with the online ISO download. The ISO is available for
product key owners via http://customer.brightcomputing.com/Download. The license that is
shipped with the ISO is a 2-node license, suitable for a head node and up to one compute node.
• A custom evaluation license can be set up by the NVIDIA sales team and configured for an agreed-
upon number of nodes and validity period.
• The Easy8 evaluation license is a license available via the customer portal account for signed-up
customers. It can be used with up to 16 GPUs for up to a year. The total number of nodes in the
evaluation cluster cannot exceed 16.
In contrast to an evaluation license, there is the full license. A full license is almost always a
subscription license. Installing a full license allows the cluster to function without the restrictions of an
evaluation license. The administrator therefore usually requests a full license, and installs it. This
normally only requires the administrator to obtain the product key and run the request-license command (section 4.3) on the head node.
The preceding takes care of the licensing needs for most administrators, and the rest of this chapter
can then usually conveniently be skipped.
Administrators who would like a better background understanding on how licensing is installed
and used in BCM can go on to read the rest of this chapter.
CMDaemon can run only with an unexpired evaluation or unexpired full license. CMDaemon is the
engine that runs BCM, and is what is normally recommended for further configuration of the cluster.
Basic CMDaemon-based cluster configuration is covered in Chapter 3 of the Administrator Manual.
Any BCM installation requires a license file to be present on the head node. The license file details the
attributes under which a particular BCM installation has been licensed.
Example
• the “Licensee” details, which include the name of the organization, form an attribute of the license
file that specifies the condition that only the specified organization may use the software
• the “Licensed nodes” attribute specifies the maximum number of nodes that the cluster manager
may manage. Head nodes are also regarded as nodes for this attribute.
• the “Expiration date” of the license is an attribute that sets when the license expires. It is some-
times set to a date in the near future so that the cluster owner effectively has a trial period. A
new license with a longer period can be requested (section 4.3) after the owner decides to continue
using the cluster with BCM
A license file can only be used on the machine for which it has been generated and cannot be changed
once it has been issued. This means that to change licensing conditions, a new license file must be issued.
The license file is sometimes referred to as the cluster certificate, or head node certificate, because it is
the X509v3 certificate of the head node, and is used throughout cluster operations. Its components are
located under /cm/local/apps/cmd/etc/. Section 2.3 of the Administrator Manual has more information
on certificate-based authentication.
Example
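A sketch of such a check from cmsh, with illustrative values (the exact fields and layout depend on
the BCM version and the license):
[root@basecm10 ~]# cmsh -c 'main; licenseinfo'
License Information
------------------------ ------------------------------------------------
Licensee                 /C=US/ST=California/L=San Jose/O=Example Org/CN=Example Cluster
Licensed nodes           100
Expiration date          31/Dec/2023
Node count               4
MAC address              00:0C:29:xx:xx:xx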
The license shown in the preceding example allows 100 nodes to be used.
The license is tied to a specific MAC address, so it cannot simply be used elsewhere. For convenience,
the Node Count field in the output of licenseinfo shows the current number of nodes used.
Example
but further information cannot be obtained using Base View or cmsh, because these clients themselves
obtain their information from the cluster management daemon.
In such a case, the verify-license utility allows the troubleshooting of license issues.
Example
Example
3. Using verify-license with the verify option: checks the validity of the license:
• If the license is valid, then no output is produced and the utility exits with exit-code 0.
• If the license is invalid, then output is produced indicating what is wrong. Messages such as these
are then displayed:
– if the license is due to expire in more than that number of months, then the verify-license
command returns nothing.
– if the license is due to expire in less than that number of months, then the verify-license
command returns the date of expiry
• If a number value is not set for monthsleft, then the value is set to 12 by default. In other words,
the default value means that if the license is due to expire in less than 12 months, then the date of
expiry of the license is displayed.
Example
[root@basecm10 etc]# date
Tue Sep 19 14:55:16 CET 2023
[root@basecm10 etc]# verify-license monthsleft
Cluster Manager License expiration date: 31/Dec/2023
[root@basecm10 etc]# verify-license monthsleft=3
[root@basecm10 etc]# verify-license monthsleft=4
Cluster Manager License expiration date: 31/Dec/2023, time remaining 14w 5d
4.2.3 Using The versioninfo Command To Verify The BCM Software Version
The license version should not be confused with the BCM software version. The license version is a
license format version that rarely changes between cluster manager version releases. Thus a cluster
can have a license with version 7.0, which was the license format introduced during NVIDIA Base
Command Manager 7.0 (also known as Bright Cluster Manager 7.0), and have a software version 8.1.
The version of a cluster can be viewed using the versioninfo command, which can be run from
the main mode of cmsh as follows:
Example
[root@basecm10 ~]# cmsh
[basecm10]% main
[basecm10->main]% versioninfo
Version Information
------------------------ ----------------------------------------------------
Cluster Manager 9.2
CMDaemon 2.2
CMDaemon Build Index 151494
CMDaemon Build Hash fc86e6036f
Database Version 36249
• Evaluation product key: An evaluation license is a temporary license that can be installed via an
evaluation product key. The evaluation product key is valid for a maximum of 3 months from a
specified date, unless the account manager approves a further extension.
If a cluster has BCM installed on it, then a temporary license to run the cluster can be installed with
an evaluation product key. Such a key allows the cluster to run with defined attributes, such
as a certain number of nodes and features enabled, depending on what was agreed upon with the
account manager. The temporary license is valid until the product key expires, unless the account
manager has approved further extension of the product key, and the license has been re-installed.
DVD downloads of BCM from the BCM website come with a built-in license that overrides any
product key attributes. The license is valid for a maximum of 3 months from the download date.
An evaluation product key allows the user to download such a DVD, and the built-in license then
allows 2-node clusters to be tried out. Such a cluster can comprise 1 head node and 1 compute
node, or comprise 2 head nodes.
• Subscription product key: A subscription license is a license that can be installed with a subscription
product key. The subscription product key has some attributes that decide the subscription
length and other settings for the license. At the time of writing (September 2017), the subscription
duration is a maximum of 5 years from a specified date.
If a cluster has BCM installed on it, then a subscription license to run the cluster can be installed
with a subscription product key. Such a key allows the cluster to run with defined attributes, such
as a certain number of nodes and features enabled, depending on what was agreed upon with the
account manager. The subscription license is valid until the subscription product key expires.
• Hardware lifetime product key: This is a legacy product key that is supported for the hardware
lifetime. It is no longer issued.
If the product key has been used on the cluster already, then it can be retrieved from the CSR file
(page 52) with the command:
cm-get-product-key
• to register the key using the Bright Computing customer portal (section 4.3.9) account.
The following terminology is used when talking about product keys, locking, licenses, installation,
and registration:
• activating a license: A product key is obtained from any BCM (re)seller. It is used to obtain and
activate a license file. Activation means that Bright Computing records that the product key has
been used to obtain a license file. The license obtained by product key activation permits the
cluster to work with particular settings. For example, the subscription period, and the number of
nodes. The subscription start and end date cannot be altered for the license file associated with
the key, so an administrator normally activates the license file as soon as possible after the starting
date in order to not waste the subscription period.
• locking a product key: The administrator is normally allowed to use a product key to activate a
license only once. This is because a product key is locked on activation of the license. A locked state
means that the product key cannot activate a new license—it is “used up”.
An activated license only works on the hardware that the product key was used with. This could
obviously be a problem if the administrator wants to move BCM to new hardware. In such a
case, the product key must be unlocked. Unlocking is possible for a subscription license via the
customer portal (section 4.3.9). Unlocking an evaluation license, or a hardware lifetime license, is
possible by sending a request to the account manager at Bright Computing to unlock the product
key. Once the product key is unlocked, then it can be used once again to activate a new license.
• license installation: License installation occurs on the cluster after the license is activated and is-
sued. The installation is done automatically if possible. Sometimes installation needs to be done
manually, as explained in the section on the request-license script (page 51). The license can
only work on the hardware it was specified for. After installation is complete, the cluster runs
with the activated license.
• product key registration: Product key registration occurs on the customer portal (section 4.3.9) ac-
count when the product key is associated with the account.
There are three options to use the product key to get the license:
1. Direct WWW access: If the cluster has access to the WWW port, then a successful completion of
the request-license command obtains and activates the license. It also locks the product key.
• Proxy WWW access: If the cluster uses a web-proxy, then the environment variable
http_proxy must be set before the request-license command is run. From a bash prompt
this is set with:
export http_proxy=<proxy>
where <proxy> is the hostname or IP address of the proxy. An equivalent alternative is to set and
activate the ScriptEnvironment directive (page 863 of the Administrator Manual), which is a
CMDaemon directive (page 845 of the Administrator Manual).
2. Off-cluster WWW access: If the cluster does not have access to the WWW port,
but the administrator does have off-cluster web-browser access, then the point at
which the request-license command prompts “Submit certificate request to
http://licensing.brightcomputing.com/licensing/index.cgi ?” should be answered
negatively. The CSR (Certificate Signing Request) data generated is then conveniently displayed
on the screen as well as saved in the file /cm/local/apps/cmd/etc/cluster.csr.new. The
cluster.csr.new file may be taken off-cluster and processed with an off-cluster browser.
The CSR file should not be confused with the private key file, cluster.key.new, created shortly
beforehand by the request-license command. In order to maintain cluster security, the private
key file must, in principle, never leave the cluster.
At the off-cluster web-browser, the administrator may enter the cluster.csr.new content in a
web form at:
http://licensing.brightcomputing.com/licensing
A signed license text is returned. At Bright Computing the license is noted as having been acti-
vated, and the product key is locked.
The signed license text received by the administrator is in the form of a plain text certificate. As
the web form response explains, it can be saved directly from most browsers. Cutting and pasting
the text into an editor and then saving it is possible too, since the response is plain text. The saved
signed license file, <signedlicense>, should then be put on the head node. If there is a copy of
the file on the off-cluster machine, the administrator should consider wiping that copy in order to
reduce information leakage.
The command:
install-license <signedlicense>
installs the signed license on the head node, and is described further on page 53. Installation
means the cluster now runs with the activated certificate.
3. Fax or physical delivery: If no internet access is available at all to the administrator, the CSR data
may be faxed or sent as a physical delivery (postal mail print out, USB flash drive/floppy disk) to
any BCM reseller. A certificate will be faxed or sent back in response, the license will be noted by
Bright Computing as having been activated, and the associated product key will be noted as being
locked. The certificate can then be handled further as described in option 2.
Example
Contacting http://licensing.brightcomputing.com/licensing/...
License granted.
License data was saved to /cm/local/apps/cmd/etc/cluster.pem.new
Install license ? [Y/n] n
Use "install-license /cm/local/apps/cmd/etc/cluster.pem.new" to install the license.
• If the old head node is not able to run normally, then the new head node can have the head node
data placed on it from the old head node data backup.
• If the old head node is still running normally, then the new head node can have data placed on it
by a cloning action run from the old head node (section 17.4.8 of the Administrator Manual).
• a user with a subscription license can unlock the product key directly via the customer portal
(section 4.3.9).
• a user with a hardware license almost always has the license under the condition that it expires
when the hardware expires. Therefore, a user with a hardware license who is replacing the hard-
ware is almost always restricted from a license reinstallation. Users without this restriction may
request the account manager at Bright Computing to unlock the product key.
Using the product key with the request-license script then allows a new license to be requested,
which can then be installed by running the install-license script. The install-license script may
not actually be needed, but it does no harm to run it afterwards, just in case.
• The full drive image can be copied onto a blank drive and the system will work as before.
– then after the installation is done, a license can be requested and installed once more using
the same product key, using the request-license command. Because the product key is
normally locked when the previous license request was done, a request to unlock the product
key usually needs to be sent to the account manager at Bright Computing before the license
request can be executed.
– If the administrator wants to avoid using the request-license command and having to type
in a product key, then some certificate key pairs must be placed on the new drive from the
old drive, in the same locations. The procedure that can be followed is:
1. in the directory /cm/local/apps/cmd/etc/, the following key pair is copied over:
* cluster.key
* cluster.pem
Copying these across means that request-license does not need to be used.
2. The admin.{pem|key} key pair files can then be placed in the directory /root/.cm/cmsh/.
Two options are:
* the following key pair can be copied over:
· admin.key
· admin.pem
or
* a fresh admin.{pem|key} key pair can be generated instead via a cmd -b option:
Example
The subsequent times that the same product key is used: If a license has become invalid, a new
license may be requested. On running the command request-license for the cluster, the administrator
is prompted on whether to re-use the existing keys and settings from the existing license:
Example
• If the existing keys are kept, a pdsh -g computenode reboot is not required. This is because these
keys are X509v3 certificates issued from the head node. For these:
– Any node certificates (section 5.4.1 of the Administrator Manual) that were generated using the
old certificate are therefore still valid and so regenerating them for nodes via a reboot is not
required, allowing users to continue working uninterrupted. On reboot new node certificates
are generated and used if needed.
– User certificates (section 6.4 of the Administrator Manual) also remain valid, but only while
CMDaemon is not restarted. In any case, they become invalid on reboot with a new license,
since they are not regenerated automatically. It is therefore advised to install a permanent
license as soon as possible, or alternatively, not to bother creating user certificates until a
permanent license has been set up for the cluster.
• If the existing keys are not re-used, then node communication ceases until the nodes are rebooted.
If there are jobs running on BCM nodes, they cannot then complete.
After the license is installed, verifying the license attribute values is a good idea. This can be done
using the licenseinfo command in cmsh, or by selecting the License info menu option from within
the Partition base window in Base View’s Cluster resource (section 4.1).
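For example, a sketch from cmsh (the licenseinfo command is assumed to be run from the main mode):
[root@basecm10 ~]# cmsh -c "main; licenseinfo"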
• Request support, including with a non-activated key. This is for versions of BCM prior to
version 10.
For BCM version 10 and later, the corresponding support request link is at https://enterprise-support.nvidia.com/s/
The --auto-attach option allows a system to update its subscription automatically, so that the sys-
tem ends up with a valid subscription state.
If the head node has no direct connection to the internet, then an HTTP proxy can be configured as
a command line option. The subscription-manager man pages give details on configuring the proxy
from the command line.
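As an illustrative sketch only (proxy host, port, and credentials are placeholders), the proxy and registration can be combined on one command line:
[root@basecm10 ~]# subscription-manager register --username <rhn-user> --password <rhn-password> \
--proxy proxy.example.com:3128 --auto-attach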
A valid subscription means that, if all is well, the RHEL server RPMs repository (rhel-6-server-
rpms or rhel-7-server-rpms) is enabled, and that RPMs can be picked up from that repository.
For some RHEL7 packages, the RHEL7 extras repository has to be enabled in a similar manner. The
option used is then --enable rhel-7-server-extras-rpms.
A list of the available repositories for a subscription can be retrieved using:
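For example, a sketch using the standard subscription-manager tooling (the repository name is the one given above):
[root@basecm10 ~]# subscription-manager repos --list
[root@basecm10 ~]# subscription-manager repos --enable rhel-7-server-extras-rpms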
After registration, the yum subscription-manager plugin is enabled. This means that yum can now
be used to install and update from the Red Hat Network repositories.
After the software image is registered, the optional and extras RPMs repository must be enabled
using, for RHEL7 systems:
After registration, the yum subscription-manager plugin is enabled within the software image.
This means that yum can now be used to install and update the software image from the Red Hat Net-
work repositories.
The e-mail address used is the address that was used to register the subscription with Novell. When
logged in on the Novell site, the activation code or registration code can be found at the products
overview page after selecting “SUSE Linux Enterprise Server”.
After registering, the SLES and SLE SDK repositories are added to the repository list and enabled.
The defined repositories can be listed with:
[root@basecm10 ~]# zypper lr
The e-mail address is the address used to register the subscription with Novell. When logged in on
the Novell site, the activation code or registration code can be found at the products overview page after
selecting “SUSE Linux Enterprise Server”.
When running the registration command, warnings about the /sys or /proc filesystems can be ig-
nored. The command tries to query hardware information via these filesystems, but these are empty
filesystems in a software image, and only fill up on the node itself after the image is provisioned to the
node.
Instead of registering the software image, the SLES repositories can be enabled for the
default-image software image with:
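A minimal sketch of this, assuming the head node's own zypper configuration is simply copied into the image (the standard zypper paths are assumed):
[root@basecm10 ~]# cp -a /etc/zypp/repos.d/*.repo /cm/images/default-image/etc/zypp/repos.d/
[root@basecm10 ~]# cp -a /etc/zypp/services.d/*.service /cm/images/default-image/etc/zypp/services.d/
[root@basecm10 ~]# cp -a /etc/zypp/credentials.d/* /cm/images/default-image/etc/zypp/credentials.d/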
The copied files should be reviewed. Any unwanted repositories, unwanted service files, and un-
wanted credential files, must be removed.
The repository list of the default-image software image can be viewed with the chroot option, -R,
as follows:
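For example, a sketch assuming zypper's -R (--root) option:
[root@basecm10 ~]# zypper -R /cm/images/default-image lr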
6.2 Method
A cluster consists of a head node, say basecm10, and one or more regular nodes. The head node of the
cluster is assumed to face the internal network (the network of regular nodes) on one interface, say eth0.
The external network leading to the internet is then on another interface, say eth1. This is referred to as
a type 1 configuration in this manual (section 3.3.9).
Typically, an administrator gives the head node a static external IP address before actually connect-
ing it up to the external network. This requires logging into the physical head node with the vendor-
supplied root password. The original network parameters of the head node can then be viewed and set.
For example for eth1:
# cmsh -c "device interfaces basecm10; get eth1 dhcp"
yes
Other external network parameters can be viewed and set in a similar way, as shown in table 6.1. A
reboot implements the networking changes.
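For instance, a minimal sketch of switching eth1 from DHCP to a static address (the address shown is only a placeholder, and the dhcp property is assumed to be settable in the same way it can be viewed; the property names follow table 6.1):
# cmsh -c "device interfaces basecm10; set eth1 dhcp no; set eth1 ip 192.0.2.10; commit"
# reboot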
Table 6.1: External Network Parameters And How To Change Them On The Head Node

IP*
    IP address of head node on eth1 interface
    view: cmsh -c "device interfaces basecm10; get eth1 ip"
    set:  cmsh -c "device interfaces basecm10; set eth1 ip address; commit"

baseaddress*
    base IP address (network address) of network
    view: cmsh -c "network get externalnet baseaddress"
    set:  cmsh -c "network; set externalnet baseaddress address; commit"

broadcastaddress*
    broadcast IP address of network
    view: cmsh -c "network get externalnet broadcastaddress"
    set:  cmsh -c "network; set externalnet broadcastaddress address; commit"

netmaskbits
    netmask in CIDR notation (number after "/", or prefix length)
    view: cmsh -c "network get externalnet netmaskbits"
    set:  cmsh -c "network; set externalnet netmaskbits bitsize; commit"

gateway*
    gateway (default route) IP address
    view: cmsh -c "network get externalnet gateway"
    set:  cmsh -c "network; set externalnet gateway address; commit"

nameservers*,**
    nameserver IP addresses
    view: cmsh -c "partition get base nameservers"
    set:  cmsh -c "partition; set base nameservers address; commit"

searchdomains**
    name of search domains
    view: cmsh -c "partition get base searchdomains"
    set:  cmsh -c "partition; set base searchdomains hostname; commit"

timeservers**
    name of timeservers
    view: cmsh -c "partition get base timeservers"
    set:  cmsh -c "partition; set base timeservers address; commit"

* If address is set to 0.0.0.0 then the value offered by the DHCP server on the external network is accepted.
** Space-separated multiple values are also accepted for these parameters when setting the value for address or hostname.

6.3 Terminology
A reminder about the less well-known terminology in the table:
• netmaskbits is the netmask size, or prefix-length, in bits. In IPv4's 32-bit addressing, this can be up
to 31 bits, so it is a number between 1 and 31. For example: networks with 256 (2^8) addresses (i.e.
with host addresses specified with the last 8 bits) have a netmask size of 24 bits. They are written
in CIDR notation with a trailing "/24", and are commonly spoken of as "slash 24" networks.
• baseaddress is the IP address of the network the head node is on, rather than the IP address of
the head node itself. The baseaddress is specified by taking netmaskbits number of bits from the
IP address of the head node. Examples:
– A network with 256 (2^8) host addresses: This implies the first 24 bits of the head node's
IP address are the network address, and the remaining 8 bits are zeroed. This is specified
by using “0” as the last value in the dotted-quad notation (i.e. zeroing the last 8 bits). For
example: 192.168.3.0
– A network with 128 (2^7) host addresses: Here netmaskbits is 25 bits in size, and only the
last 7 bits are zeroed. In dotted-quad notation this implies “128” as the last quad value (i.e.
zeroing the last 7 bits). For example: 192.168.3.128.
When in doubt, or if the preceding terminology is not understood, then the values to use can be calcu-
lated using the head node’s sipcalc utility. To use it, the IP address in CIDR format for the head node
must be known.
When run using a CIDR address value of 192.168.3.130/25, the output is (some output removed for
clarity):
# sipcalc 192.168.3.130/25
# sipcalc -b 192.168.3.130/25
7.2 Shorewall
Package name: shorewall
In NVIDIA Base Command Manager 10, Shorewall is managed by CMDaemon, in order to han-
dle the automation of cloud node access. Restarting Shorewall can thus also be carried out within the
services submode (section 3.13 of the Administrator Manual), on the head node. For example, for a head
node basecm10, the cmsh session to carry out a restart of Shorewall might be:
[basecm10->device[basecm10]->services[shorewall]]% restart
restart Successfully restarted service shorewall on: basecm10
System administrators who need a deeper understanding of how Shorewall is implemented should
be aware that Shorewall does not really run as a daemon process. The command to restart the service
therefore does not stop and start a shorewall daemon. Instead it carries out the configuration of netfilter
through implementing the iptables configuration settings, and then exits. It exits without leaving a
shorewall process up and running, even though service shorewall status shows it is running.
A restart of CMDaemon makes the change take effect, and takes care of opening the firewall on port
8082 for CMDaemon, by adding a line to the Shorewall rules file. The original port 8081 remains
open, but CMDaemon no longer listens on it.
The status of ports used by the cluster manager can be listed with:
[root@basecm10 ~]# cm-cmd-ports -l
type http https firewall rule path
-------------- ------ ------- --------------- -------------------------------------------------------
image 8080 8082 /cm/images/default-image/cm/local/apps/cmd/etc/cmd.conf
image 8080 8082 True /cm/local/apps/cmd/etc/cmd.conf
node-installer 8082 /cm/node-installer/scripts/node-installer.conf
7.2.3 Clear And Stop Behavior In service Options, bash Shell Command, And cmsh Shell
To remove all rules, for example for testing purposes, the clear option should be used from the Unix
shell. This then allows all network traffic through:
shorewall clear
Administrators should be aware that in the Linux distributions supported by BCM, the service
shorewall stop command corresponds to the Unix shell shorewall stop command, and not to the
Unix shell shorewall clear command. The stop option for the service and shell blocks network traffic
but allows a pre-defined minimal safe set of connections, and is not the same as completely remov-
ing Shorewall from consideration. The stop options discussed so far should not be confused with the
stop option in the cmsh shell, which behaves differently.
This situation is indicated in the following table:
Correspondence Of Stop And Clear Options In Shorewall Vs cmsh
iptables rules      Service                   Unix Shell        cmsh shell
keep a safe set:    service shorewall stop    shorewall stop    no equivalent
clear all rules:    no equivalent             shorewall clear   stop shorewall
7.3 Compilers
The BCM repositories provide convenient RPM and .deb packages for several compilers that are popular
in the HPC community. All of those may be installed through yum, zypper, or apt (section 11.2 of the
Administrator Manual) but (with the exception of GCC) require an installed license file to be used.
7.3.1 GCC
Package name: gcc-recent for RHEL and derivatives, and SLES. cm-gcc for Ubuntu
The GCC suite that the distribution provides is also present by default.
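For example, a sketch of installing the more recent GCC alongside the distribution GCC on RHEL and derivatives (zypper install gcc-recent for SLES, and apt install cm-gcc for Ubuntu):
[root@basecm10 ~]# yum install gcc-recent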
Packages In The Intel Compiler Suite Versions For RHEL And Derivatives, SLES, And Ubuntu
2018 2019 2020
intel-compiler-common-2018 intel-compiler-common-2019 intel-compiler-common-2020
intel-cc-2018 intel-cc-2019 intel-cc-2020
intel-daal-2018 intel-daal-2019 intel-daal-2020
intel-daal-2018-32 intel-daal-2019-32 intel-daal-2020-32
intel-fc-2018 intel-fc-2019 intel-fc-2020
intel-gdb-2018 intel-gdb-2019 intel-gdb-2020
intel-icx-2020
intel-ipp-2018 intel-ipp-2019 intel-ipp-2020
intel-ipp-2018-32 intel-ipp-2019-32 intel-ipp-2020-32
intel-ipp-2018-devel intel-ipp-2019-devel intel-ipp-2020-devel
intel-ipp-2018-devel-32 intel-ipp-2019-devel-32 intel-ipp-2020-devel-32
intel-itac-2018 intel-itac-2019 intel-itac-2020
intel-mkl-2018 intel-mkl-2019 intel-mkl-2020
intel-mkl-2018-32 intel-mkl-2019-32 intel-mkl-2020-32
intel-mpi-2018 intel-mpi-2019 intel-mpi-2020
intel-openmp-2018 intel-openmp-2019 intel-openmp-2020
intel-openmp-2018-32 intel-openmp-2019-32 intel-openmp-2020-32
intel-tbb-2018 intel-tbb-2019 intel-tbb-2020
NVIDIA Base Command Manager 10 provides x86_64 packages for the 2018, 2019, and 2020 versions
of the Intel compiler suites. These are for RHEL and derivatives, for SLES, and for Ubuntu, except for
the following packages and distributions:
• The 2018 version of the Intel compiler suite is not supported for RHEL8 and derivatives, and is
also not supported for Ubuntu 20.04. Therefore, for the 2018 suite, packages for these distributions
are not available.
• The 2019 version of the Intel compiler suite is not supported for Ubuntu 20.04. Therefore, for the
2019 suite, a package for this distribution is not available.
Typically the compiler suite includes the Intel Fortran (indicated by fc) and Intel C++ compilers
(part of the C compiler package, indicated by cc). 32-bit compilers are included in the intel-cc-<year>
and intel-fc-<year> packages.
For the other packages, a 32-bit version is sometimes available separately. The 32-bit packages have
package names ending in “-32”.
Both the 32-bit and 64-bit versions can be invoked through the same set of commands. The modules
environment (section 2.2 of the Administrator Manual) provided when installing the packages can be
loaded accordingly, to select one of the two versions. For the C++ and Fortran compilers, the 64-bit
and 32-bit versions are provided as modules with names beginning with intel/compiler/64 and intel/compiler/32
respectively.
The Intel compiler can be accessed by loading the compiler modules under intel/compiler/64 or
intel/compiler/32. The following commands can be used to run the Intel compilers:
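For example, a sketch using the 64-bit module and the classic Intel compiler drivers (icc, icpc, and ifort are assumed here):
[root@basecm10 ~]# module load intel/compiler/64
[root@basecm10 ~]# icc hello.c -o hello        # Intel C compiler
[root@basecm10 ~]# icpc hello.cpp -o hello     # Intel C++ compiler
[root@basecm10 ~]# ifort hello.f90 -o hello    # Intel Fortran compiler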
A short summary of a package can be shown using, for example: “yum info intel-fc-<year>”.
The compiler packages require a license, obtainable from Intel, and placed in /cm/shared/licenses/
intel.
Full documentation for the Intel compilers is available at http://software.intel.com/en-us/
intel-compilers/.
In the following example the license file is copied into the appropriate location, the C/C++ compiler
is installed, and a modules environment (section 2.2 of the Administrator Manual) is loaded for use in
this session by the root user. Furthermore, the modules environment is added for regular use by the
root user with “module initadd”:
Example
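A sketch of that sequence, assuming the 2020 C/C++ suite and a hypothetical license file name:
[root@basecm10 ~]# cp l_CXXXXXXXX.lic /cm/shared/licenses/intel/
[root@basecm10 ~]# yum install intel-cc-2020
[root@basecm10 ~]# module load intel/compiler/64
[root@basecm10 ~]# module initadd intel/compiler/64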
How to load modules for use and regular use by non-root users is explained in section 2.2.3 of the
Administrator Manual.
BCM provides the latest free version of FLEXlm (version 9.5). However, Intel provides a more recent
version, which is needed for more recent compilers.
The free version is described in this section.
For the Intel compilers a FLEXlm license must be present in the /cm/shared/licenses tree.
For workstation licenses, i.e. a license which is only valid on the head node, the presence of the
license file is typically sufficient.
However, for floating licenses, i.e. a license which may be used on several machines, possibly simul-
taneously, the FLEXlm license manager, lmgrd, must be running.
The lmgrd service serves licenses to any system that is able to connect to it through the network.
With the default firewall configuration, this means that licenses may be checked out from any machine
on the internal cluster network. Licenses may be installed by adding them to /cm/shared/licenses/
lmgrd/license.dat. Normally any FLEXlm license starts with the following line:
SERVER hostname MAC port
Only the first FLEXlm license that is listed in the license.dat file used by lmgrd may contain a
SERVER line. All subsequent licenses listed in license.dat should have the SERVER line removed. This
means in practice that all licenses except the first one listed in license.dat start with a line:
DAEMON name /full/path/to/vendor-daemon
The DAEMON line must refer to the vendor daemon for a specific application. For Intel the vendor
daemon (called INTEL) must be installed from the flexlm-intel package.
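As an illustration only, a license.dat serving an Intel license might begin as follows (the hostname, MAC address, port, and vendor daemon path are placeholders):
SERVER head001 0A1B2C3D4E5F 28518
DAEMON INTEL /cm/shared/licenses/lmgrd/INTEL
# ...the FEATURE/INCREMENT lines from the Intel-issued license follow...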
Installing the flexlm package adds a system account lmgrd to the password file. The account is not
assigned a password, so it cannot be used for logins. The account is used to run the lmgrd process. The
lmgrd service is not configured to start up automatically after a system boot, but can be configured to
do so with:
chkconfig lmgrd on
The lmgrd service is started manually with:
service lmgrd start
The lmgrd service logs its transactions and any errors to /var/log/lmgrd.log.
To install the packages, the instructions from AMD should be followed. These instructions describe
configuring the AMD driver repository and then installing the driver. The instructions are at
https://docs.amd.com/en/latest/deploy/linux/quick_start.html
at the time of writing (October 2023).
The installation must be done in the image, which for a RHEL image uses a chroot into the image,
and uses a bind mount to have some special filesystem directories (/proc, /sys, and similar) be available
during the package installation. This is needed for the DKMS installation.
Bind mounting the filesystems and then chrooting is a little tedious, so the cm-chroot-sw-img utility
(page 534 of the Administrator Manual) is used to automate the job.
The following session output illustrates the procedure for Rocky 9.2, with much text elided:
mounted /cm/images/default-image/dev
mounted /cm/images/default-image/dev/pts
mounted /cm/images/default-image/proc
mounted /cm/images/default-image/sys
mounted /cm/images/default-image/run
...
The amdgpu-install package can then be installed, which installs the ROCm stack with it. After
installation, exiting from the chroot automatically unmounts the bind mounts:
[root@basecm10:/]# urldomain=https://repo.radeon.com
[root@basecm10:/]# urlpath=/amdgpu/5.7.1/rhel/9.2/main/x86_64/amdgpu-install-5.7.50701-1664922.el9.noarch.rpm
[root@basecm10:/]# yum install $urldomain$urlpath
...
[root@basecm10:/]# amdgpu-install --usecase=rocm
...
[root@basecm10:/]# exit
umounted /cm/images/am/dev/pts
umounted /cm/images/am/dev
umounted /cm/images/am/proc
umounted /cm/images/am/sys
umounted /cm/images/am/run
The nodes that are to use the driver should then be set to use the new image, and should be rebooted:
Example
root@basecm10 ~# cmsh
[basecm10]% device use node001
[basecm10->device[node001]]% set softwareimage am
[basecm10->device*[node001*]]% commit
[basecm10->device[node001]]% reboot node001
Normal nodes without the AMD GPU also boot up without crashing if they are set to use this image,
but will not be able to run OpenCL programs.
root@basecm10:~# cmsh
[basecm10]% softwareimage
[basecm10->softwareimage]% clone default-image am
[basecm10->softwareimage*[am*]]% commit
[basecm10->softwareimage[am]]%
[notice] basecm10: Started to copy: /cm/images/default-image -> /cm/images/am (117)
...
[notice] basecm10: Initial ramdisk for image am was generated successfully
[basecm10->softwareimage[am]]% quit
To install the packages, the instructions from AMD should be followed. These instructions describe
configuring access to the AMD driver repository, before picking up the driver. The instructions are at
https://docs.amd.com/en/latest/deploy/linux/quick_start.html
at the time of writing (October 2023).
The configuration must be done in the image. For an Ubuntu image a chroot can be done into the
image with the help of the cm-chroot-sw-img utility (page 534 of the Administrator Manual). This uses
a bind mount to have the /proc, /sys, and other special directories be available during the package
installation (section 11.4 of the Administrator Manual).
The following session output illustrates the driver installation procedure, with much text elided.
The am image directory is entered with the chroot utility
Example
and the instructions on configuring access to the AMD driver repository are followed.
The AMD GPU installer package can be picked up from under https://repo.radeon.com/
amdgpu-install/. There are several installer versions available. Using the most recent one is usually
best.
The first part of a URL to the package can be defined as:
Example
root@basecm10:~# URLubuntu=https://repo.radeon.com/amdgpu-install/23.20/ubuntu
The second part of the URL to the package can be defined according to the Ubuntu version used,
and according to what is available. The package can then be retrieved, for example:
• for Ubuntu 20.04 with:
root@basecm10:~# URLfocal=/focal/amdgpu-install_5.7.50700-1_all.deb
root@basecm10:~# wget $URLubuntu$URLfocal
or for Ubuntu 22.04 (jammy) with:
root@basecm10:~# URLjammy=/jammy/amdgpu-install_5.7.50700-1_all.deb
root@basecm10:~# wget $URLubuntu$URLjammy
The nodes that are to use the driver should then be set to use the new image, and should be rebooted:
Normal nodes without an AMD GPU also boot up without crashing if they are set to use this image,
but they will not be able to run OpenCL programs.
1. The procedure begins with cloning the default-image to an image that is to be the AMD GPU
image, such as, for example, am.
root@basecm10:~# cmsh
[basecm10]% softwareimage
[basecm10->softwareimage]% clone default-image am
[basecm10->softwareimage*[am*]]% commit
[basecm10->softwareimage[am]]%
[notice] basecm10: Started to copy: /cm/images/default-image -> /cm/images/am (117)
...
[notice] basecm10: Initial ramdisk for image am was generated successfully
[basecm10->softwareimage[am]]% quit
• install DKMS
basecm10:/ # zypper install dkms
basecm10:/ # zypper clean --all
• add the Perl dependency repository
basecm10:/ # domainURL=https://download.opensuse.org
basecm10:/ # perlSLESpath=/repositories/devel:languages:perl/SLE_15/devel:languages:perl.repo
basecm10:/ # zypper addrepo $domainURL$perlSLESpath
• install the AMD GPU install tool
basecm10:/ # URLradeon=https://repo.radeon.com
basecm10:/ # slepath=/amdgpu-install/22.10/sle/15/amdgpu-install-22.10.50100-1.noarch.rpm
basecm10:/ # zypper install $URLradeon$slepath
• and to install the ROCm driver and software:
basecm10:/ # amdgpu-install --usecase=rocm
basecm10:/ # exit
umounted /cm/images/am/dev/pts
umounted /cm/images/am/dev
umounted /cm/images/am/proc
umounted /cm/images/am/sys
umounted /cm/images/am/run
3. The nodes that are to use the driver should then be set to use the new image, and should be
rebooted:
Example
root@basecm10 ~# cmsh
[basecm10]% device use node001
[basecm10->device[node001]]% set softwareimage am
[basecm10->device*[node001*]]% commit
[basecm10->device[node001]]% reboot node001
Normal nodes without the AMD GPU also boot up without crashing if they are set to use this
image, but they will not be able to run OpenCL programs.
This is due to an AMD GPU driver installation bug, where the library, which is placed in a directory
of the form /opt/rocm-*/lib, is not linked up during installation.
A workaround is to set up the link manually. This is done in the chroot environment, in the relevant
image, by creating a .conf file under /etc/ld.so.conf.d with the path to the library.
In the following example, the path is configured for an Ubuntu 22.04 image:
Example
After the configuration file has been placed, the ldconfig command is run, still within chroot, to
link the library in the image(s).
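A minimal sketch of the workaround, assuming a hypothetical ROCm 5.7.1 path inside the am image:
[root@basecm10 ~]# cm-chroot-sw-img /cm/images/am
...
[root@basecm10:/]# echo "/opt/rocm-5.7.1/lib" > /etc/ld.so.conf.d/rocm.conf
[root@basecm10:/]# ldconfig
[root@basecm10:/]# exit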
7.5.1 Installation
After the head node has been installed, the Intel OPA Software Stack can be installed by executing the
following commands on the head node:
The yum command installs the package containing the OPA stack itself, as well as the installation
scripts required for installing and configuring the kernel drivers. These are automatically placed under
a subdirectory named after the OPA stack version.
/cm/local/apps/intel-opa/<version>/bin/intel-opa-install.sh -h
For the software images, the OPA stack can be configured and deployed for each software image as
follows:
The OPA MTU size is not changed by BCM during installation. An Intel recommendation earlier (Oc-
tober 2017) in https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/
fabric-products/Intel_OP_Performance_Tuning_UG_H93143_v3_0.pdf, in section 6.2 was:
OPA on the other hand can support MTU sizes from 2048B (2K) up to 8192B (8KB) for verbs or PSM 2 traffic.
Intel recommends you use the 8KB MTU default for RDMA requests of 8KB or more.
8
The NVIDIA HPC SDK
The NVIDIA HPC software development kit (https://developer.nvidia.com/hpc-sdk) is a suite of
compilers, libraries, and other tools for HPC.
Features include:
• The NVIDIA HPC SDK C, C++, and Fortran compilers that support GPU acceleration of HPC
modeling and simulation applications with standard C++ and Fortran, OpenACC directives, and
CUDA.
• GPU-accelerated math libraries that maximize performance on common HPC algorithms, and op-
timized communications libraries that enable standards-based multi-GPU and scalable systems pro-
gramming.
• Performance profiling and debugging tools that simplify porting and optimization of HPC appli-
cations
• Support for ARM, OpenPOWER, and x86-64 CPUs, as well as NVIDIA GPUs, running Linux.
Example
The preceding output was what was available at the time of writing (April 2023). The output can be
expected to change.
A browser-based way to check the cm-nvhpc versions and CUDA availability situation for BCM
versions, distributions and architecture is to use cm-nvhpc as a string in the distributed packages list for
BCM at https://support.brightcomputing.com/packages-dashboard
The nvhpc environment module is the standard HPC SDK, and provides an OPENMPI 3.x library by
default.
The byo tag is an abbreviation for ’bring-your-own’, and means that the general compiler environ-
ment for C, C++, and Fortran is not set.
The nompi tag implies that paths to the MPI binaries and MPI libraries that come with cm-nvhpc are
not set, so that no MPI library is used from the package. An external MPI library can then be used with
the nvhpc-nompi compiler.
The nvhpc-hpcx environment module sets up the HPC-X library environment. This is an alternative
to the default OpenMPI 3.x library that the nvhpc module provides.
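For example, a sketch of selecting one of these environments (nvc and nvfortran are the HPC SDK compiler drivers; the module names are as described above):
[root@basecm10 ~]# module load nvhpc
[root@basecm10 ~]# nvc --version
[root@basecm10 ~]# module switch nvhpc nvhpc-hpcx    # use the HPC-X MPI environment instead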
8.3 Viewing Installed Available CUDA Versions, And The Running CUDA
Version
The installed available CUDA versions for nvhpc can be viewed with:
Example
Example
• nvhpc cluster-wide:
• nvhpc for a specific user on a specific head or compute node, as specified by hostname -s:
The second configuration file overwrites any settings set with ${NVHPC_ROOT}/compilers/bin/localrc.
If the ${NVHPC_ROOT}/compilers/bin/localrc.$(hostname -s) configuration file exists, then a
${HOME}/localrc.$(hostname -s) is ignored.
9
CUDA For GPUs
The optional CUDA packages should be deployed in order to take advantage of the computational
capabilities of NVIDIA GPUs. The packages may already be in place, and ready for deployment on the
cluster, depending on the particular BCM software that was obtained. If the CUDA packages are not in
place, then they can be picked up from the BCM repositories, or a local mirror.
cuda11.7-visual-tools
cuda11.8-visual-tools
cuda12.0-visual-tools shared CUDA visual toolkit
cuda12.1-visual-tools
cuda12.2-visual-tools
cuda11.7-sdk
cuda11.8-sdk
cuda12.0-sdk shared CUDA software development kit
cuda12.1-sdk
cuda12.2-sdk
The packages of type shared in the preceding table should be installed on the head nodes of a cluster
using CUDA-compatible GPUs. The packages of type local should be installed to all nodes that access
the GPUs. In most cases this means that the cuda-driver and cuda-dcgm packages should be installed
in a software image (section 2.1.2 of the Administrator Manual).
If a head node also accesses GPUs, then the cuda-driver and cuda-dcgm packages should be in-
stalled on it, too.
For packages of type shared, the particular CUDA version that is run on the node can be selected
via a modules environment command:
Example
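A sketch only; the exact module names vary per installation and are best listed first:
[root@basecm10 ~]# module avail cuda
[root@basecm10 ~]# module load cuda12.2/toolkit    # assumed module name for the CUDA 12.2 toolkit
[root@basecm10 ~]# nvcc --version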
CUDA packages that the cluster administrator normally does not manage: As an aside, there are
also the additional CUDA DCGM packages:
The preceding DCGM packages are installed in BCM, because CMDaemon uses them to manage
NVIDIA Tesla GPUs. Tesla drivers normally work for the latest CUDA version, and may therefore
not (yet) support the latest GeForce GPUs.
CUDA Package That The Cluster Administrator May Wish To Install For CUDA Programming
CUB is a CUDA programming library that developers may wish to access. It is provided by the package
cm-cub-cuda, from the Machine Learning (cm-ml) repository.
Example
If the hardware is not detected by the kernel already, then the administrator should reassess the
situation.
• Only after CUDA package installation has taken place, and after rebooting the node with the GPU,
are GPU details visible using the sysinfo command:
Example
running sysinfo on node001, which is where the GPU is, via cmsh on the head node, while cuda-dcgm is
not yet ready:
Example
running sysinfo on node001, which is where the GPU is, via cmsh on the head node, after cuda-dcgm is
ready:
• Cross compilation of CUDA software is generally not a best practice due to resource consumption,
which can even lead to crashes.
– If, despite this, cross compilation with a CPU is done, then the cuda-driver package should
be installed on the node on which the compilation is done, and the GPU-related services on
the node, such as:
* cuda-driver.service
* nvidia-persistenced.service
* cuda-dcgm.service
should be disabled.
shows that one of the dependencies of the cuda-driver package in RHEL8 (not in RHEL9) is the
freeglut-devel package, so for RHEL8 and derivatives it should be installed on a node that accesses
a GPU. If the CUDA SDK source is to be compiled on the head node (with the head node not access-
ing a GPU, and with the cuda-driver package not installed) then the freeglut, freeglut-devel, and
libXi-devel packages should be installed on the head node.
The cuda-driver package is used to compile the kernel drivers which manage the GPU. Therefore,
when installing cuda-driver with yum, several other X11-related packages are installed too, due to pack-
age dependencies.
The cuda*-sdk packages can be used to compile libraries and tools that are not part of the CUDA
toolkit, but used by CUDA software developers, such as the deviceQuery binary (section 9.3).
The cuda-xorg package is optional, and contains the driver and libraries for an X server.
Example
For example, on a cluster where (some of) the nodes access GPUs, but the head node does not access a
GPU, the following commands can be issued on the head node to install the CUDA 12.2 packages using
YUM:
[root@mycluster ~]# yum install cuda12.2-toolkit cuda12.2-sdk cuda12.2-visual-tools
[root@mycluster ~]# yum --installroot=/cm/images/default-image install cuda-driver cuda-dcgm
The --installroot option installs to the image used by the nodes. Here the image used by the
nodes is assumed to be default-image. To ensure the software is installed from the image to the nodes,
the imageupdate command can be run from within cmsh for the appropriate nodes.
Example
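A sketch of such a session (the node name is a placeholder):
[root@mycluster ~]# cmsh
[mycluster]% device use node001
[mycluster->device[node001]]% imageupdate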
The status of the driver and some event messages can be seen with:
Example
May 24 07:04:01 node001 systemd[1]: Starting build, install and load the nvidia kernel modules...
May 24 07:04:15 node001 cuda-driver[1816]: compile nvidia kernel modules
May 24 07:06:09 node001 cuda-driver[1816]: loading kernel modules
May 24 07:06:12 node001 cuda-driver[1816]: create devices
May 24 07:06:12 node001 cuda-driver[1816]: [ OK ]
May 24 07:06:12 node001 systemd[1]: Finished build, install and load the nvidia kernel modules.
If there is a failure in compiling the CUDA module, it is usually indicated by a message saying
“Could not make module”, “NVRM: API mismatch:”, or “Cannot determine kernel version”. Such
a failure typically occurs because compilation is not possible due to missing the correct kernel develop-
ment package from the distribution. Section 9.2 explains how to check for, and install, the appropriate
missing package.
Example
make clean
Executing: /tmp/cuda12.1/bin/x86_64/linux/release/alignedTypes
[/tmp/cuda12.1/bin/x86_64/linux/release/alignedTypes] - Starting...
GPU Device 0: "Volta" with compute capability 7.0
Another method to verify that CUDA is working, is to build and use the deviceQuery command on
a node accessing one or more GPUs. The deviceQuery command lists all CUDA-capable GPUs that a
device can access, along with several of their properties (some output elided):
Example
The CUDA user manual has further information on how to run compute jobs using CUDA.
Example
make clean
make (may take a while)
[oclBandwidthTest] starting...
/tmp/opencl/OpenCL/bin/linux/release/oclBandwidthTest Starting...
Running on...
Tesla V100-SXM3-32GB
Quick Mode
ModulePath "/usr/lib64/xorg/modules/extensions/nvidia"
ModulePath "/usr/lib64/xorg/modules/extensions"
ModulePath "/usr/lib64/xorg/modules"
The following dynamic module loading line may need to be added to the Module section of the X con-
figuration:
Load "glx"
The following graphics device description lines need to be replaced in the Device section of the X con-
figuration:
Driver "nvidia"
The BusID line may need to be replaced with the ID shown for the GPU by the lspci command.
Example
Section "ServerLayout"
Identifier "Default Layout"
Screen 0 "Screen0" 0 0
InputDevice "Keyboard0" "CoreKeyboard"
EndSection
Section "Files"
ModulePath "/usr/lib64/xorg/modules/extensions/nvidia"
ModulePath "/usr/lib64/xorg/modules/extensions"
ModulePath "/usr/lib64/xorg/modules"
EndSection
Section "Module"
Load "glx"
EndSection
Section "InputDevice"
Identifier "Keyboard0"
Driver "kbd"
Option "XkbModel" "pc105"
Option "XkbLayout" "us"
EndSection
Section "Device"
Identifier "Videocard0"
Driver "nvidia"
BusID "PCI:14:0:0"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Videocard0"
DefaultDepth 24
SubSection "Display"
Viewport 0 0
Depth 24
EndSubSection
EndSection
Using GPUs for an X server as well as for general computing with CUDA can have some is-
sues. These are considered in https://nvidia.custhelp.com/app/answers/detail/a_id/3029/~/
using-cuda-and-x.
Example
Example
After it is installed, the node on which the installation is done must be rebooted.
Running the diagnostic after the reboot should display output similar to:
Example
A successful installation of the BCM OFED software stack (section 10.2) onto a running cluster con-
sists of the BCM OFED package installation, as well as then running an installation script. The vendor
and version number installed can then be found in /etc/cm-ofed. Further installation details can be
found in /var/log/cm-ofed.log.
10.2 Mellanox OFED Stack Installation Using The BCM Package Repository
Package names: mlnx-ofed54, mlnx-ofed56, mlnx-ofed57, mlnx-ofed58, mlnx-ofed59,
mlnx-ofed23.04, mlnx-ofed23.07
The Mellanox stacks are installed and configured by BCM in an identical way as far as the adminis-
trator is concerned. In this section (section 10.2):
<vendor-ofedVersion>
is used to indicate where the administrator must carry out a substitution. For Mellanox, the substitution
is one of the following:
These stacks are currently supported by the NVIDIA Base Command Manager 10-supported distri-
butions (RHEL and derivatives, SLES, and Ubuntu), as determined by the compatibility matrices in the
downloads pages accessible from https://network.nvidia.com/support/mlnx-ofed-matrix/.
For most use cases it usually makes sense to get the most recent supported stack.
• The mlnx-ofed49 stack is an LTS release, aimed mainly at supporting older hardware. The stack
may be useful for one of the following:
More recent distributions, such as RHEL9, Ubuntu 22.04, SLES15sp4, no longer support the
mlnx-ofed49 stack.
• The mlnx-ofed54 and mlnx-ofed59 packages are also LTS releases.
Each stack version needs to be matched to a firmware version associated with the OFED device
used. OFED devices used, ConnectX, BlueField, and others, must also be matched along with their
firmware version via the downloads pages accessible from the URL https://network.nvidia.com/
support/mlnx-ofed-matrix/. Deviating from compatible versions is not supported.
Details on the compatibility of older stack versions can also be found via that URL.
Returning to the subject of OFED package installation via the package manager: For example,
a yum install command indicated by:
yum install <vendor-ofedVersion>
means that the installation of the BCM OFED package is executed with one of these corresponding yum
install commands, for example:
yum install mlnx-ofed59
10.2.1 Installing The OFED Stack Provided By The BCM Repository Vendor Package
Running the package manager command associated with the distribution (yum install, zypper up,
apt install), unpacks and installs or updates several packages and scripts. For example:
yum install <vendor-ofedVersion>
However, it does not carry out the installation and configuration of the driver itself due to the funda-
mental nature of the changes it would carry out. The script:
<vendor-ofedVersion>-install.sh
can be used after the package manager installation to carry out the installation and configuration of the
driver itself. By default, a pre-built driver installation is carried out. An alternative is to compile the
driver from source.
• On the head node, the default distribution OFED software stack can be replaced with the vendor
OFED software stack made available from the BCM repository, by using the script’s head option,
-h:
[root@basecm10~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-ins\
tall.sh -h
A reboot is recommended after the script completes the install, to help ensure the new image is
cleanly used by the head node.
• For a software image, for example default-image, used by the regular nodes, the default distri-
bution OFED software stack can be replaced with the vendor OFED software stack made available
from the BCM repository, by using the script’s software image option, -s:
[root@basecm10~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-ins\
tall.sh -s default-image
If the distribution kernel is updated on any of these head or regular nodes after the vendor OFED
stack has been installed, then the vendor OFED kernel modules made available from the BCM repository
must be reinstalled. This can be done by running the installation scripts again, which replaces the kernel
modules again, along with all the other OFED packages.
• On the head node, a vendor OFED software stack can be built from the BCM repository, with:
[root@basecm10~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-ins\
tall.sh -h -b
• Similarly, for the software image, for example default-image, used by the regular nodes, the
vendor OFED can be built from the BCM repository, with:
[root@basecm10~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-ins\
tall.sh -s default-image -b
The OFED Stack provided by BCM can be removed by appending the -r option to the appropriate
-h or -s option. Removing the packages from a head node or software image can lead to package
dependency breakage, and software not working any more. So using the “-r” option should be done
with caution.
10.2.2 Upgrading Kernels When The OFED Stack Has Been Provided By The BCM
Repository Vendor Package—Reinstallation Of The OFED Stack
For all distributions, as explained in the preceding text, a vendor OFED stack is installed and config-
ured via the script <vendor-ofedVersion>-install.sh. OFED reinstallation may be needed if the kernel is
upgraded.
In Ubuntu
If the OFED stack is installed from the distribution or vendor OFED .deb packages, then the DKMS
(Dynamic Kernel Module System) framework makes upgraded vendor OFED kernel modules available
at a higher preference than the distribution OFED kernel modules for a standard distribution kernel.
If there is a kernel upgrade that causes unexpected behavior from the vendor OFED package, then the
cluster administrator can still configure the distribution OFED for use by setting the distribution OFED
kernel module as the preferred kernel module. So no kernel-related packages need to be excluded from
vendor OFED upgrades or kernel upgrades. Typically, Ubuntu clusters can have a package update (apt
upgrade) carried out, with no explicit changes needed to take care of the OFED stack.
• In Red Hat-based systems, the /etc/yum.conf file must be edited. In that file, in the line that
starts with exclude, the kernel and kernel-devel packages need to be removed, so that they
are no longer excluded from updates.
• In SUSE, the kernel-default and kernel-default-devel packages must be unlocked. The
command:
zypper removelock kernel-default kernel-default-devel
• yum update—or for SUSE zypper up—updates the packages on the head node.
• To update the packages on the regular nodes the procedure outlined in section 11.3.3 of the
Administrator Manual is followed:
– The packages on the regular node image (for example, default-image) are updated ac-
cording to distribution:
* in Red Hat-based systems as follows:
yum --installroot=/cm/images/default-image update
* or in SLES as follows:
zypper --root=/cm/images/default-image up
* or in Ubuntu as follows, using the cm-chroot-sw-img tool (page 534 of the Adminis-
trator Manual):
root@basecm10:~# cm-chroot-sw-img /cm/images/default-image
root@basecm10:~# apt update; apt upgrade #upgrade takes place in image
...
root@basecm10:~# exit #get out of chroot
– The kernelversion setting for the regular node image, which in this example is the de-
fault default-image, can be updated as follows:
Example
[root@basecm10 ~]# cmsh
[basecm10]% softwareimage
[basecm10->softwareimage]% use default-image
[basecm10->softwareimage[default-image]]% set kernelversion 3.10.0-327.3.1.el7.x86_64
[basecm10->softwareimage[default-image*]]% commit
This ensures that the updated kernel is used after reboot. Tab-completion in the set
kernelversion line prompts for the right kernel from available options.
3. A reboot of the head and regular nodes installs the new kernel.
4. Configuring and installing the vendor OFED stack driver for the new kernel is done by running
the script <vendor-ofedVersion>-install.sh as before, as follows:
• For a stack that is on the head node, the compilation should be done together with the -h
option:
[root@basecm10~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-ins\
tall.sh -h
• For a software image used by the regular nodes, for example default-image, the compilation
should be done together with the -s option:
[root@basecm10~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-ins\
tall.sh -s default-image
• For an image that is built from source, the -b option (page 93) can be used with the
<vendor-ofedVersion>-install.sh commands of step 4.
These configuration and installation steps for the vendor OFED driver are typically not needed for
Ubuntu.
11
Burning Nodes
The burn framework is a component of NVIDIA Base Command Manager 10 that can automatically run
test scripts on specified nodes within a cluster. The framework is designed to stress test newly built
machines and to detect components that may fail under load. Nodes undergoing a burn session with
the default burn configuration lose their filesystem and partition data for all attached drives, and revert
to their software image on provisioning after a reboot.
Example
<?xml version="1.0"?>
<burnconfig>
<mail>
<address>root@master</address>
<address>[email protected]</address>
</mail>
<pre-install>
<phase name="01-hwinfo">
<test name="hwinfo"/>
<test name="hwdiff"/>
</phase>
<phase name="02-disks">
<test name="disktest" args="30"/>
<test name="mce_check" endless="1"/>
</phase>
</pre-install>
<post-install>
<phase name="03-hpl">
<test name="hpl"/>
<test name="mce_check" endless="1"/>
</phase>
<phase name="04-compile">
<test name="compile" args="6"/>
<test name="mce_check" endless="1"/>
</phase>
</post-install>
</burnconfig>
11.2.4 Phases
The phases sections must exist. If there is no content for the phases, the phases tags must still be in place
(“must exist”). Each phase must have a unique name and must be written in the burn configuration file
in alphanumerical order. By default, numbers are used as prefixes. The phases are executed in sequence.
11.2.5 Tests
Each phase consists of one or more test tags. The tests can optionally be passed arguments using the args
property of the burn configuration file (section 11.2). If multiple arguments are required, they should be
a space separated list, with the (single) list being the args property.
Tests in the same phase are run simultaneously.
Most tests test something and then end. For example, the disk test tests the performance of all drives
and then quits.
Tests which are designed to end automatically are known as non-endless tests.
Tests designed to monitor continuously are known as endless tests. Endless tests are not really endless.
They end once all the non-endless tests in the same phase are ended, thus bringing an end to the phase.
Endless tests typically test for errors caused by the load induced by the non-endless tests. For example
the mce_check test continuously keeps an eye out for Machine Check Exceptions while the non-endless
tests in the same phase are run.
A special test is the final test, memtest86, which is part of the default burn run, as configured in the
XML configuration default-destructive. It does run endlessly if left to run. To end it, the adminis-
trator can deal with its output at the node console or can power reset the node. It is usually convenient
to remove memtest86 from the default XML configuration in larger clusters, and to rely on the HPL and
memtester tests instead, for uncovering memory hardware errors.
Example
The values of a particular burn configuration (default-destructive in the following example) can
be viewed as follows:
Example
The set command can be used to modify existing values of the burn configuration, that is:
Description, Name, and XML. XML is the burn configuration file itself. The get xml command can be
used to view the file, while using set xml opens up the default text editor, thus allowing the burn
configuration to be modified.
A new burn configuration can also be added with the add command. The new burn configuration
can be created from scratch with the set command. However, an XML file can also be imported to the
new burn configuration by specifying the full path of the XML file to be imported:
Example
The burn configuration can also be edited when carrying out burn execution with the burn com-
mand.
Executing A Burn
A burn as specified by the burn configuration file can be executed in cmsh using the burn command of
device mode.
Burn commands: The burn commands can modify these properties, as well as execute other burn-
related operations.
The burn commands are executed within device mode, and are:
• burn start
• burn stop
• burn status
• burn log
The burn help text that follows lists the detailed options. Next, operations with the burn commands
illustrate how the options may be used along with some features.
[head1->device[node005]]% burn
Name: burn - Node burn control
Include all nodes that have the given image, e.g default-image or
default-image,gpu-image
-i, --intersection
Calculate the intersection of the above selections
-u, --union
Calculate the union of the above selections
--config <name>
Burn with the specified burn configuration. See in partition burn configurations
for a list of valid names
--file <path>
Burn with the specified file instead of burn configuration
--later
Do not reboot nodes now, wait until manual reboot
--edit
Open editor for last minute changes
--no-drain
Do not drain the node from WLM before starting to burn
--no-undrain
Do not undrain the node from WLM after burn is complete
-p, --path
Show path to the burn log files. Of the form: /var/spool/burn/<mac>.
-v, --verbose
Show verbose output (only for burn status)
--sort <field1>[,<field2>,...]
Override default sort order (only for burn status)
Examples:
burn --config default-destructive start -n node001
Burn command operations: Burn commands allow the following operations, and have the following
features:
• start, stop, status, log: The basic burn operations allow a burn to be started or stopped, and the
status of a burn to be viewed and logged.
– The “burn start” command always needs a configuration file name. In the following it is
boxburn. The command also always needs to be given the nodes it operates on:
[basecm10->device]% burn --config boxburn -n node007 start
Power reset nodes
[basecm10->device]%
ipmi0 .................... [ RESET ] node007
Fri Nov 3 ... [notice] basecm10: node007 [ DOWN ]
[basecm10->device]%
Fri Nov 3 ... [notice] basecm10: node007 [ INSTALLING ] (node installer started)
[basecm10->device]%
Fri Nov 3 ... [notice] basecm10: node007 [ INSTALLING ] (running burn in tests)
...
– The “burn stop” command only needs to be given the nodes it operates on, for example:
[basecm10->device]% burn -n node007 stop
Each line of output is quite long, so each line has been rendered truncated and ellipsized.
The ellipsis marks in the 5 preceding output lines align with the lines that follow.
That is, the lines that follow are the endings of the preceding 5 lines:
...Warnings Tests
...--------- --------------------------------------------------------------
...0
...0
...0 /var/spool/burn/c8-1f-66-f2-61-c0/02-disks/disktest (S,171),\
/var/spool/burn/c8-1f-66-f2-61-c0/02-disks/kmon (S),\
/var/spool/bu+
– The “burn log” command displays the burn log for specified node groupings. Each node
with a boot MAC address of <mac> has an associated burn log file.
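For example, assuming the node node007 used earlier, its burn log could be requested with a command
along the following lines:
[basecm10->device]% burn -n node007 log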
Burn command output examples: The burn status command has a compact one-line output per
node:
Example
In the status output, each burn test is listed with status letters. For example, “kmon (SP)” means that
the burn test kernel log monitor, kmon, has Started and Passed.
Letter Meaning
S started
W warning
F failed
P passed
The “burn log” command output looks like the following (some output elided):
The output of the burn log command is actually the messages file in the burn directory, for the node
associated with a MAC-address directory <mac>. The burn directory is at /var/spool/burn/ and the
messages file is thus located at:
/var/spool/burn/<mac>/messages
The tests have their log files in their own directories under the MAC-address directory, using their
phase name. For example, the pre-install section has a phase named 01-hwinfo. The output logs of this
test are then stored under:
/var/spool/burn/<mac>/01-hwinfo/
Non-endless Tests
The following example test script is not a working test script, but can be used as a template for a non-
endless test:
Example
#!/bin/bash
# We need to know our own test name, amongst other things for logging.
me=`basename $0`
# The first argument passed by the burn framework is the spool directory.
# Inside the spool directory a sub-directory with the same name as the
# test is also created. This directory ($spooldir/$me) should be used
# for any output files etc. Note that the script should possibly remove
# any previous output files before starting.
spooldir=$1
# On success the script should create $passedfile before exiting, and on
# failure it should write a reason to $failedfile. The names below follow
# the same $spooldir/$me.<suffix> pattern as the other files (an assumed
# convention).
passedfile=$spooldir/$me.passed
failedfile=$spooldir/$me.failed
# In case a test detects trouble but does not want the entire burn to be
# halted, $warningfile _and_ $passedfile should be created. Any warnings
# should be written to this file.
warningfile=$spooldir/$me.warning
# Some short status info can be written to this file. For instance, the
# stresscpu test outputs something like 13/60 to this file to indicate
# time remaining.
# Keep the content on one line and as short as possible!
statusfile=$spooldir/$me.status
# Any further arguments come from the args attribute of the test in the
# burn configuration XML; here they are assumed to arrive as additional
# positional parameters.
option1=$2
option2=$3
# Some scripts may require some cleanup. For instance a test might fail
# and be restarted after hardware fixes.
rm -f $spooldir/$me/*.out &>/dev/null
# Send a message to the burn log file, syslog and the screen.
# Always prefix with $me!
blog "$me: starting, option1 = $option1 option2 = $option2"
# ...the actual non-endless test would run here...
# Report the result to burn-control by creating one of these files:
touch $passedfile                        # on success
# echo "Failure message." > $failedfile  # on failure
Endless Tests
The following example test script is not a working test, but can be used as a template for an endless test.
Example
#!/bin/bash
# We need to know our own test name, amongst other things for logging.
me=`basename $0`
# The first argument passed by the burn framework is the spool directory,
# as in the non-endless template above.
spooldir=$1
# In case a test detects trouble but does not want the entire burn to be
# halted, $warningfile _and_ $passedfile should be created. Any warnings
# should be written to this file.
warningfile=$spooldir/$me.warning
# A failure is reported by writing a message to $failedfile, which causes
# burn-control to end the burn. The name follows the same
# $spooldir/$me.<suffix> pattern as the other files (an assumed convention).
failedfile=$spooldir/$me.failed
# Some short status info can be written to this file. For instance, the
# stresscpu test outputs something like 13/60 to this file to indicate
# time remaining.
# Keep the content on one line and as short as possible!
statusfile=$spooldir/$me.status
blog "$me: starting test, checking every minute"
# Some scripts may require some cleanup. For instance a test might fail
# and be restarted after hardware fixes.
rm -f $spooldir/$me/*.out &>/dev/null
# The loop keeps checking for as long as the 'running' file exists. The
# endless test ends when this file is removed, which happens once all
# non-endless tests in the same phase have finished.
while [ -e "$spooldir/$me/running" ]
do
  run-some-check                        # placeholder for the actual check
  if [ was_a_problem ]; then            # placeholder condition
    blog "$me: WARNING, something unexpected happened."
    echo "some warning" >> $warningfile # note the append!
  elif [ failure ]; then                # placeholder condition
    blog "$me: Aiii, we're all gonna die! my-test FAILED!"
    echo "Failure message." > $failedfile
  fi
  sleep 60
done
Example
Here, burn-control, which is the parent of the disk testing process, keeps track of the tests that pass
and fail. On failure of a test, burn-control terminates all tests.
The node that has failed then requires intervention from the administrator in order to change state.
The node does not restart by default. The administrator should be aware that the state reported by the
node to CMDaemon remains burning at this point, even though it is not actually doing anything.
To change the state, the burn must be stopped with the burn stop command in cmsh. If the node is
restarted without explicitly stopping the burn, then it simply retries the phase at which it failed.
Under the burn log directory, the log of the particular test that failed for a particular node can some-
times suggest a reason for the failure. For retries, old logs are not overwritten, but are moved to a directory
with the same name, with a number appended to indicate the try number. Thus:
Example
1. The BurnSpoolDir setting can be set in the CMDaemon configuration file on the head node, at
/cm/local/apps/cmd/etc/cmd.conf. The BurnSpoolDir setting tells CMDaemon where to look
for burn data when the burn status is requested through cmsh.
• BurnSpoolDir="/var/spool/burn"
CMDaemon should be restarted after the configuration has been set. This can be done with:
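The restart command itself is not reproduced here. On a systemd-based head node it would typically
be along the lines of the following, assuming the CMDaemon service is named cmd:
[root@basecm10 ~]# systemctl restart cmd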
2. The burnSpoolHost setting, which matches the host, and burnSpoolPath setting,
which matches the location, can be changed in the node-installer configuration file
on the head node, at /cm/node-installer/scripts/node-installer.conf (for multi-
arch/multidistro configurations the path takes the form: /cm/node-installer-<distribution>-
<architecture>/scripts/node-installer.conf). These have the following values by default:
• burnSpoolHost = master
• burnSpoolPath = /var/spool/burn
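For example, if the burn spool were relocated to a hypothetical /local/burn directory on the head
node, the corresponding node-installer.conf values would become something like the following, matching
the BurnSpoolDir value set in part 1 (the path is illustrative only):
burnSpoolHost = master
burnSpoolPath = /local/burn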
3. Part 3 of the procedure adds a new location to export the burn log. This is only relevant if the
spool directory is being relocated within the head node. If the spool is on an external fileserver,
the existing burn log export may as well be removed.
The new location can be added to the head node as a path value, from a writable filesystem export
name. The writable filesystem export name can most easily be added using Base View, via the
clickpath:
Devices→Head Nodes→Edit→Settings→Filesystem exports→Add
Adding a new name like this is recommended, instead of just modifying the path value in an
existing Filesystem exports entry. This is because it is then easy to change things back if the
configuration turns out to be incorrect. By default, the existing Filesystem exports entry for the burn
directory has the name:
• /var/spool/burn@internalnet
and the path:
• /var/spool/burn
When the new name is set in Filesystem exports, the associated path value can be set in agree-
ment with the values set earlier in parts 1 and 2.
If using cmsh instead of Base View, then the change can be carried out from within the fsexports
submode. Section 3.12.1 of the Administrator Manual gives more detail on similar examples of how
to add such filesystem exports.
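A hedged sketch of such a cmsh session, reusing the hypothetical /local/burn location from above (the
prompts, the add syntax, and the path attribute name are assumptions; the Administrator Manual is the
authoritative reference):
[basecm10]% device use basecm10
[basecm10->device[basecm10]]% fsexports
[basecm10->device[basecm10]->fsexports]% add /local/burn@internalnet
[basecm10->device[basecm10]->fsexports*[/local/burn@internalnet*]]% set path /local/burn
[basecm10->device[basecm10]->fsexports*[/local/burn@internalnet*]]% commit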
Example
<burnconfig>
<pre-install>
<phase name="01-hwinfo">
<test name="hwinfo"/>
<test name="sleep" args="10"/>
</phase>
</pre-install>
<post-install>
<phase name="02-mprime">
<test name="mprime" args="2"/>
<test name="mce_check" endless="1"/>
<test name="kmon" endless="1"/>
</phase>
</post-install>
</burnconfig>
To burn a single node with this configuration, the following could be run from the device mode of
cmsh:
Example
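The original command is not reproduced here. Based on the burn options listed earlier in this chapter,
it would be along the following lines (the node name is illustrative):
[basecm10->device]% burn --config default-destructive --edit -n node001 start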
This makes an editor pop up containing the default burn configuration. The content can be replaced
with the short burn configuration. Saving and quitting the editor causes the node to power cycle and
start its burn.
The example burn configuration typically completes within about 10 minutes, depending mostly
on how fast the node can be provisioned. It runs the mprime test for about two minutes.
12
Installing And Configuring
SELinux
12.1 Introduction
Security-Enhanced Linux (SELinux) can be enabled on selected nodes. If SELinux is enabled on a stan-
dard Linux operating system, then it is typically initialized in the kernel when booting from a hard
drive. However, in the case of nodes provisioned by NVIDIA Base Command Manager, via PXE boot,
the SELinux initialization occurs at the very end of the node installer phase.
SELinux is disabled by default because its security policies are typically customized to the needs
of the organization using it. The administrator must therefore decide on appropriate access control
security policies. When creating such custom policies, special care should be taken that the cmd process
is executed in, ideally, an unconfined context.
Before enabling SELinux on a cluster, the administrator is advised to first check that the Linux distri-
bution used offers enterprise support for SELinux-enabled systems. This is because support for SELinux
should be provided by the distribution in case of issues.
Enabling SELinux is only advised for BCM if the internal security policies of the organization ab-
solutely require it. This is because it requires custom changes from the administrator. If something is
not working right, then the effect of these custom changes on the installation must also be taken into
consideration, which can sometimes be difficult.
SELinux is partially managed by BCM and can run on the head and regular nodes. The SELinux
settings managed by CMDaemon (via cmsh or Base View) should not be managed by directly dealing
with the node outside of CMDaemon management, as that can lead to an inconsistent knowledge of the
SELinux settings by CMDaemon.
When first configuring SELinux to run with BCM on regular nodes, the nodes should be configured
with permissive mode to ensure that the nodes work with applications. Troubleshooting permissive
mode so that enforcing mode can be enabled is outside the scope of BCM support, unless the issue is
demonstrably a BCM-related issue.
Example
Parameter                        Value
-------------------------------- ------------------------------------------------
Initialize                       yes
Revision
Reboot after context restore     no
Allow NFS home directories       yes
Context action auto install      always
Context action full install      always
Context action nosync install    always
Mode                             permissive
Policy                           targeted
Key value settings               <submode>
The Mode can be set to permissive, enforcing, or disabled. When starting to use SELinux and
establishing policies, it should initially be set to permissive, so that issues that would affect running
applications under enforcing mode can first be found and examined.
The default SELinux configuration parameters are in /cm/node-installer/scripts/
node-installer.conf, and that file remains unchanged by cmsh settings changes. The values of
SELinux configuration parameters used from that file are however overridden by the corresponding
cmsh settings.
For multiarch/multidistro configurations the node-installer path mentioned above takes
the form: /cm/node-installer-<distribution>-<architecture>/scripts/node-installer.conf. The val-
ues for <distribution> and <architecture> can take the values outlined on page 551 of the Administrator
Manual.
Example
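The cloning step itself is not reproduced here. A minimal sketch, assuming the default category is
cloned to a new category named secategory, as used below:
[basecm10->category]% clone default secategory
[basecm10->category*[secategory*]]% commit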
The SELinux settings can then be configured for the newly-cloned category.
Example
[basecm10->category]% use secategory; selinuxsettings
[basecm10->category[secategory]->selinuxsettings]% keyvaluesettings
[basecm10->category*[secategory*]->selinuxsettings*->keyvaluesettings*]% set domain_can_mmap_files 1
[basecm10->category*[secategory*]->selinuxsettings*->keyvaluesettings*]% exit
[basecm10->category*[secategory*]->selinuxsettings*]% set mode<tab><tab>
disabled enforcing permissive
[basecm10->category*[secategory*]->selinuxsettings*]% set mode permissive #for now, to debug apps
[basecm10->category*[secategory*]->selinuxsettings*]% commit
The domain_can_mmap_files boolean setting is needed to allow SELinux policies to revalidate some
kinds of file access in memory.
Creating a new image and using setfiles to set up SELinux file contexts on the new image: One
good way to have a node come up with SELinux file contexts is to set up the image that is provisioned
so that the image already has the contexts.
This can be configured by first cloning the image, with:
Example
[basecm10->category[secategory]->selinuxsettings]% softwareimage
[basecm10->softwareimage]% list
Name (key) Path Kernel version Nodes
-------------------- ----------------------------- ----------------------------- --------
default-image /cm/images/default-image 5.14.0-284.11.1.el9_2.x86_64 5
[basecm10->softwareimage]% clone default-image selinux-image; commit
...
...[notice] basecm10: Initial ramdisk for image selinux-image was generated successfully
Then, after selinux-image has been generated, the contexts can be set up in the new image with the
SELinux setfiles command, using the -r option to set the root path:
Example
[basecm10->softwareimage]% quit
[root@basecm10 ~]# setfiles -r /cm/images/selinux-image \
/etc/selinux/targeted/contexts/files/file_contexts /cm/images/selinux-image/
[root@basecm10 ~]# setfiles -r /cm/images/selinux-image \
/etc/selinux/targeted/contexts/files/file_contexts.local /cm/images/selinux-image/
If the image is updated in the future with new packages, or new files, then the setfiles commands
in the preceding example must be run again to set the file contexts.
Organizing the nodes and setting them up with the newly-created SELinux image: Nodes in the
category can be listed with:
[basecm10->category[secategory]]% listnodes
...lists the nodes in that category...
Nodes can be placed in the category from device mode. For example, node001, node002, and
node003 can be configured with:
[basecm10->category[secategory]]% device
[basecm10->device]% foreach -n node001..node003 (set category secategory)
If the nodes in the category secategory are to run file systems with SELinux file contexts, then the
image generated for this earlier on, selinux-image, can be committed to that category with:
Example
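The original command is not reproduced here. A minimal sketch, assuming the image is assigned via
the category's softwareimage property:
[basecm10->device]% category
[basecm10->category]% use secategory
[basecm10->category[secategory]]% set softwareimage selinux-image
[basecm10->category*[secategory*]]% commit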
Example
system_u:object_r:admin_home_t:s0 original-ks.cfg
system_u:object_r:admin_home_t:s0 rpmbuild
A
Other Licenses, Subscriptions,
Or Support Vendors
NVIDIA Base Command Manager comes with enough software to allow it to work with no additional
commercial requirements other than its own. However, BCM integrates with some other products
that have their own separate commercial requirements. The following table lists commercial software
that requires a separate license, subscription, or support vendor, and an associated URL where more
information can be found.
Software URL
Workload managers
PBS Professional http://www.altair.com
MOAB http://www.adaptivecomputing.com
LSF http://www.ibm.com/systems/platformcomputing/products/lsf/
GE http://www.altair.com
Distributions
SUSE http://www.suse.com
Red Hat http://www.redhat.com
Compilers
Intel https://software.intel.com/en-us/intel-sdp-home
Miscellaneous
Amazon AWS http://aws.amazon.com
B
Hardware Recommendations
The hardware suggestions in section 3.1 are for a minimal cluster, and are inadequate for larger clusters.
For larger clusters, hardware suggestions and examples are given in this section.
The memory used depends significantly on CMDaemon, which is the main NVIDIA Base Command
Manager service component, and on the number of processes running on the head node or regular node.
The number of processes mostly depends on the number of metrics and health checks that are run.
Hard drive storage mostly depends on the number of metrics and health checks that are managed
by CMDaemon.
A device means any item seen as a device by CMDaemon. A list of devices can be seen by cmsh under
its device node. Examples of devices are: regular nodes, cloud nodes, switches, head nodes, GPU units,
and PDUs.
This assumes that fewer than 100 metrics and health checks are being measured, which is a default for
systems that are just head nodes and regular nodes. Beyond the first 100 metrics and health checks, each
additional 100 takes about 1 MB extra per device.
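As a rough worked example: a device measured with about 300 metrics and health checks would
therefore need on the order of 2 MB more per device than the sub-100 baseline.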
B.2.2 Suggested Head Node Specification For Clusters Beyond 1000 Nodes
For clusters with more than 1000 nodes, a head node is recommended with at least the following speci-
fications:
• 24 cores
• 128 GB RAM
• 512 GB SSD
The extra RAM is useful for caching the filesystem, so scrimping on it makes little sense.
Placing the monitoring data files, which are by default located under /var/spool/cmd/monitoring/,
on an SSD is handy for speedy retrievals.
A dedicated /var or /var/lib/mysql partition is also a good idea for clusters with more than 2500
nodes.
C
Base Command Manager
Essentials And NVIDIA AI
Enterprise
Base Command Manager Essentials (BCME) is the NVIDIA AI Enterprise (https://docs.nvidia.com/
ai-enterprise/index.html) edition of Base Command Manager.
– Kubernetes
– automated scaling
– a tightly integrated Run:ai
• provides comprehensive management for cluster control and job monitoring. This includes man-
aging and monitoring for
– GPU metrics
– resource allocation
– access control
– chargeback options
• AI Enterprise vSphere
• NVIDIA drivers
• NVIDIA containers
meaning, easier information gathering, and a more optimal saving of information for later internal ref-
erence and search queries.
• regular support (customer has a question or problem that requires an answer or resolution), and
• professional services (customer asks for the team to do something or asks the team to provide
some service).
Professional services outside of the standard scope of support can be purchased via the NVIDIA
Enterprise Services page at:
https://www.nvidia.com/en-us/support/enterprise/services/