Deep Learning Server Platform_Admin Manual 2.0
6. Server Access
8. User Provisioning
9. Nvidia-smi
10. Reference for Docker Installation
11. Reference Docker commands
12. Linux Administration
About the Deep Learning Server Platform
We have recently implemented a high-end NVIDIA GPU platform that supports the development and execution of Artificial Intelligence (AI) research projects and applications. Its configuration supports large, computationally intensive workloads.
Hardware Configuration:
Server: Supermicro SYS-741GE-TNRT (https://www.supermicro.com/en/products/system/gpu/tower/sys-741ge-tnrt)
CPU:
Memory: 1 TB
Storage: 1 x 1.9 TB NVMe PCIe (OS) / 3 x 14 TB SATA 6 Gb/s, 7.2K RPM (file storage)
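The configuration above can be compared against what the running system reports, which is also a way to read off the CPU model that the list leaves blank. This is a minimal sketch, assuming the standard lscpu, free and lsblk tools and the NVIDIA driver's nvidia-smi are installed; hw_summary is a hypothetical helper name, and any missing tool is reported and skipped.

```shell
#!/usr/bin/env bash
# Sketch: print a short hardware summary to compare against the list above.
# hw_summary is a hypothetical helper; missing tools are reported and skipped.
hw_summary() {
  local tool
  for tool in lscpu free lsblk nvidia-smi; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "== $tool: not installed, skipped"
      continue
    fi
    echo "== $tool"
    case "$tool" in
      # CPU model and socket/core/thread counts only
      lscpu) lscpu | grep -E '^(Model name|Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core):' ;;
      # Total/used/free RAM in human-readable units
      free) free -h ;;
      # Whole disks only; ROTA=1 marks rotational (HDD) drives, 0 marks SSD/NVMe
      lsblk) lsblk -d -o NAME,SIZE,ROTA,MODEL ;;
      # One line per GPU with its model name and UUID
      nvidia-smi) nvidia-smi -L ;;
    esac
  done
  return 0
}
```

Source the file and run hw_summary on the server; the ROTA column separates the 7.2K RPM SATA disks from the NVMe OS drive.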
NVIDIA H100 Tensor Core GPU
Extraordinary performance, scalability, and security for every data center
NVLink Bridge
NVIDIA NVLink is a high-speed point-to-point (P2P) peer transfer connection through which one GPU can transfer data to, and receive data from, one other GPU. The NVIDIA H100 card supports an NVLink bridge connection with a single adjacent NVIDIA H100 card. Each of the three attached bridges spans two PCIe slots. To function correctly and to provide peak bridge bandwidth, a bridge connection with an adjacent NVIDIA H100 card must use all three NVLink bridges. Wherever an adjacent pair of NVIDIA H100 cards exists in the server, the pair should be bridged for the best bridging performance and a balanced bridge topology.
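Whether the H100 cards are actually bridged can be checked from the shell. The sketch below assumes the NVIDIA driver (which provides nvidia-smi) is installed; show_gpu_topology and count_nvlink_pairs are hypothetical helper names, not NVIDIA tools. In the nvidia-smi topo -m matrix an NVLink connection is shown as NV followed by a number, and because the matrix is symmetric every bridged pair appears twice.

```shell
#!/usr/bin/env bash
# Sketch: inspect how the GPUs are interconnected.
# Assumes the NVIDIA driver (which provides nvidia-smi) is installed.
# show_gpu_topology and count_nvlink_pairs are hypothetical helper names.

# Print the topology matrix and the per-link NVLink status.
show_gpu_topology() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
  nvidia-smi topo -m          # GPU-to-GPU matrix; NV<n> marks an NVLink connection
  nvidia-smi nvlink --status  # per-GPU, per-link NVLink status
}

# Read `nvidia-smi topo -m` output on stdin and print the number of
# NVLink-connected GPU pairs. Each pair appears twice in the symmetric
# matrix (row A/column B and row B/column A), so the cell count is halved.
count_nvlink_pairs() {
  local cells
  cells=$(grep -oE '\bNV[0-9]+\b' | wc -l)
  echo $(( cells / 2 ))
}
```

For example, nvidia-smi topo -m | count_nvlink_pairs should report one pair for every bridged pair of adjacent cards.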
Software stack layout – High level
Implemented Software Stack Architecture:
Server Access:
Credentials:
Password: ****
JupyterHub access:
172.16.8.22 (open in a web browser)
Credentials:
Username: admin
Password: V!t@321
Note: Without Docker, projects share the same user environment, and their package versions will conflict with one another.
User Provisioning
Steps to Set Up the Container
Create a Dockerfile
nano Dockerfile
Add the following content to the Dockerfile:
# Use the Ubuntu base image
FROM ubuntu:latest
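# Install required packages (openssh-server and sudo) without interactive prompts
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    openssh-server \
    sudo \
    && apt-get clean && rm -rf /var/lib/apt/lists/*
# Create the runtime directory that sshd needs before it can start
RUN mkdir -p /var/run/sshd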
# Add the user setup script to the container
COPY setup_user.sh /usr/local/bin/setup_user.sh
RUN chmod +x /usr/local/bin/setup_user.sh
# Expose the SSH port
EXPOSE 22
# Allow root login via SSH (optional; root has no password in this image, so set one before relying on this)
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# Ensure PasswordAuthentication is enabled
RUN sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config
# Run the setup script and start the SSH service
CMD ["/bin/bash", "-c", "/usr/local/bin/setup_user.sh && /usr/sbin/sshd -D"]
Save the file with Ctrl+O (press Enter to confirm), then exit nano with Ctrl+X.
Next, create setup_user.sh in the same directory as the Dockerfile (nano setup_user.sh) and add the following content:
#!/bin/bash
# Set default values if environment variables are not provided
USERNAME=${USERNAME:-user}
PASSWORD=${PASSWORD:-password}
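# Create the user only if it does not exist yet (this script runs on every container start)
id "$USERNAME" >/dev/null 2>&1 || useradd -m -s /bin/bash "$USERNAME"
echo "$USERNAME:$PASSWORD" | chpasswd
# Install sudo only if it is missing (the Dockerfile already installs it)
command -v sudo >/dev/null 2>&1 || { apt-get update && apt-get install -y sudo; }
# Add the user to the sudo group
usermod -aG sudo "$USERNAME"
# Allow passwordless sudo for the sudo group (a drop-in file, overwritten rather than appended on restart)
echo "%sudo ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/90-sudo-nopasswd
chmod 0440 /etc/sudoers.d/90-sudo-nopasswd
# Prepare the user's SSH directory
mkdir -p "/home/$USERNAME/.ssh"
chmod 700 "/home/$USERNAME/.ssh"
chown -R "$USERNAME:$USERNAME" "/home/$USERNAME/.ssh"
# Do not print the password: container output is visible through docker logs
echo "User $USERNAME created and added to the sudo group."
# Execute any command passed as arguments (none when started by the Dockerfile CMD)
exec "$@"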
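Save the file with Ctrl+O (press Enter to confirm), then exit nano with Ctrl+X.
Build the image from the directory that contains both files; the COPY instruction in the Dockerfile reads setup_user.sh from this build context:
docker build -t <image name> .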
Use the docker run command to start a container. Pass the USERNAME and PASSWORD as environment
variables:
docker run -d --name <container name> -p 2222:22 -e USERNAME=<user name> -e PASSWORD=<password> <image name>
-p 2222:22: Maps port 2222 on the host to port 22 in the container. Each user's container needs its own free host port (for example, 2223 for the next user).
-e USERNAME and -e PASSWORD: Set the username and password dynamically.
From the host system or an external machine, use the following command to SSH into the container:
ssh <user name>@<host-ip> -p 2222
Replace <host-ip> with the host system's IP address (e.g., 10.10.10.10).
Enter the password that was set with -e PASSWORD when prompted.
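When several users each need their own container, the docker run step above can be wrapped in a small function. This is a sketch only: provision_user, the ssh-<user> container-naming scheme, and the DRY_RUN and GPUS variables are hypothetical conventions introduced here. Setting GPUS=all adds Docker's --gpus flag, which assumes the NVIDIA Container Toolkit is installed on the host. Values passed with -e, including the password, remain visible to anyone who can run docker inspect on the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one SSH container per user from the image built
# above. The function name, the ssh-<user> naming scheme and the DRY_RUN/GPUS
# variables are illustrative conventions, not part of Docker.
#   DRY_RUN=1  print the docker command instead of running it
#   GPUS=all   add --gpus all (assumes the NVIDIA Container Toolkit is installed)
provision_user() {
  local user="$1" password="$2" host_port="$3" image="$4"
  if [ -z "$user" ] || [ -z "$password" ] || [ -z "$host_port" ] || [ -z "$image" ]; then
    echo "usage: provision_user <user> <password> <host-port> <image>" >&2
    return 2
  fi
  # Each user gets a uniquely named container and a distinct host port for SSH.
  local cmd=(docker run -d --name "ssh-$user" -p "$host_port:22")
  if [ -n "${GPUS:-}" ]; then
    cmd+=(--gpus "$GPUS")
  fi
  cmd+=(-e "USERNAME=$user" -e "PASSWORD=$password" "$image")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"   # printed without shell quoting, for review only
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 provision_user alice 'S3cret!' 2223 <image name> prints the command for review; drop DRY_RUN=1 to actually start the container.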
Nvidia-smi
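nvidia-smi, installed with the NVIDIA driver, reports GPU utilization, memory use, and running processes; run it without arguments for the standard table, or as nvidia-smi -l 5 to refresh the table every 5 seconds. For scripts, its CSV query mode is easier to parse. A minimal sketch follows; gpu_usage_csv and busy_gpus are hypothetical helper names, and the 50% default threshold is an arbitrary example value.

```shell
#!/usr/bin/env bash
# Sketch: script-friendly GPU usage summary via nvidia-smi's CSV query mode.
# gpu_usage_csv and busy_gpus are hypothetical helper names.

# One CSV line per GPU: index, name, memory used/total (MiB), utilization (%).
gpu_usage_csv() {
  nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu \
             --format=csv,noheader,nounits
}

# Read that CSV on stdin; print the index of every GPU whose utilization
# (the last field) is at or above the threshold given as $1 (default 50).
busy_gpus() {
  local threshold="${1:-50}"
  awk -F', *' -v t="$threshold" '$NF + 0 >= t + 0 { print $1 }'
}
```

For example, gpu_usage_csv | busy_gpus 80 lists the GPUs that are at least 80% busy, which helps when choosing a free GPU for a new job.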
Reference for installing Docker on the machine:
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-20-04
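After following the guide, a quick check such as the sketch below confirms that the docker CLI is present and can reach the daemon; check_docker is a hypothetical helper name. If the daemon check only passes with sudo, Docker's post-installation steps describe adding your account to the docker group (sudo usermod -aG docker "$USER", then logging out and back in). Membership in the docker group is effectively root access on the host, so grant it deliberately.

```shell
#!/usr/bin/env bash
# Sketch: confirm that Docker is installed and its daemon is reachable.
# check_docker is a hypothetical helper name.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker: not installed" >&2
    return 1
  fi
  # docker info fails if the daemon is down or the user lacks permission
  if ! docker info >/dev/null 2>&1; then
    echo "docker: CLI found, but the daemon is not reachable (service stopped, or sudo/docker group needed)" >&2
    return 1
  fi
  echo "docker: OK"
}
```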
Docker Commands - For Reference
1. To display the help information for Docker commands:
docker --help
3. To print the Docker version information in JSON format using the --format option:
docker version --format '{{json .}}'
4. This command shows detailed information about your Docker installation, including system-wide settings, storage drivers, and network information. No additional arguments are needed:
docker info
5. This command shows how to use docker pull: it lists the available options and flags, with a brief description of how to pull images from a Docker registry:
docker pull --help
7. This command downloads the Redis image from Docker Hub:
docker pull redis (no tag is given, so the image tagged latest is pulled by default)
8. The docker image ls command lists all Docker images on your local machine, showing the repository, tag, image ID, creation date, and size of each:
docker image ls
9. This shorter command is equivalent to docker image ls and lists the same details for every locally stored image:
docker images
10. This command starts a Redis container in the foreground with default settings. If the Redis image is not available locally, Docker pulls it from Docker Hub automatically:
docker run redis
11. The docker ps command lists all currently running containers, showing the container ID, image, command, status, ports, and name of each:
docker ps
12. The docker ps -a command lists all containers on your system, both running and stopped:
docker ps -a
13. The docker run -it redis command starts a container from the Redis image interactively (-i keeps STDIN open and -t allocates a terminal):
docker run -it redis
14. The docker run -d redis command runs a Redis container in detached mode (in the background):
docker run -d redis
16. This command starts a Redis container named akhilredis in the background, with an interactive terminal allocated:
docker run -it --name=akhilredis -d redis
17. The docker stats command shows real-time resource usage of running containers, with per-container metrics for CPU usage, memory usage, network I/O, block I/O, and PIDs (number of processes):
docker stats
18. This command searches Docker Hub for images related to "redis". It returns a list of Redis images with details such as the image name, description, star count, and whether each is an official image:
docker search redis
19. This command returns Redis images on Docker Hub that have at least 3 stars and displays their full (untruncated) descriptions:
docker search --filter=stars=3 --no-trunc redis
20. This command runs the same search but limits the results to at most 10 images:
docker search --filter=stars=3 --no-trunc --limit 10 redis
21. The docker start command starts a stopped container, here identified by its container ID (a8217c4c56):
docker start a8217c4c56 (the container name also works in place of the ID)
22. The docker stop command stops a running container:
docker stop a8217c4c56 (the container name also works in place of the ID)
23. The docker restart command restarts a container, identified by either its container ID or its name:
docker restart a8217c4c56
24. The docker pause command temporarily suspends all processes in a container; resume them with docker unpause:
docker pause a8217c4c56 (the container name also works in place of the ID)
26. The docker logs command shows the output (logs) of a container:
docker logs a8217c4c56 (add -f to follow the output live; the container name also works in place of the ID)
27. The docker exec -it a8217c4c56 bash command starts a new interactive shell session inside a running container:
docker exec -it a8217c4c56 bash (type exit to leave the shell; the container keeps running)
28. This command creates and starts a detached container from the redis image, but the trailing /bin/bash replaces the image's default command, so the container runs a Bash shell instead of the Redis server. If you already ran item 16, the name akhilredis is taken, so remove that container first (docker rm -f akhilredis) or choose another name:
docker run -i -t --name=akhilredis -d redis /bin/bash
29. This command runs apt-get update inside the running container identified by the ID 023828e786e0:
docker exec 023828e786e0 apt-get update
30. The docker rename command renames a container; the container can be running or stopped:
docker rename vibrant_yellow test (renames the container vibrant_yellow to "test")
31. The docker rm command removes a container, here the one named test:
docker rm test (stop the container first, or use docker rm -f; the container ID also works)
32. This command stops all running containers (docker ps -q prints the IDs of running containers only):
docker stop $(docker ps -q)
33. This command forcefully removes all containers, both running and stopped:
docker rm -f $(docker ps -a -q) (if your user is not in the docker group, prefix each docker with sudo)
34. The docker inspect command returns detailed information about a container or image as JSON, here one named happy_faraday:
docker inspect happy_faraday (also works with an ID)
35. The docker kill command immediately terminates a running container, here happy_faraday:
docker kill happy_faraday (unlike docker stop, which sends SIGTERM and waits a grace period before SIGKILL, docker kill sends SIGKILL right away)
36. This command kills all running containers:
docker kill $(docker ps -q)
37. The docker volume create command creates a new volume. Volumes are persistent storage areas that containers use to store data outside the container's own file system:
docker volume create new-vol
38. The docker volume ls command lists all volumes. Volumes persist data created and used by containers:
docker volume ls
39. The docker volume inspect command shows detailed information about a specific volume:
docker volume inspect new-vol
40. This command starts a Redis container named redisvol in the background and mounts the volume new-vol at /app inside the container. If the volume does not exist yet, Docker creates it automatically:
docker run -d --name redisvol --mount source=new-vol,target=/app redis (creating the volume first with docker volume create new-vol is optional)
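Item 33 removes every container, including running ones. When only stopped containers should go, a narrower sketch filters on status; remove_exited_containers is a hypothetical helper name. Docker's built-in docker container prune does a similar job for all stopped containers and asks for confirmation first.

```shell
#!/usr/bin/env bash
# Sketch: remove only containers that have exited, leaving running ones alone.
# remove_exited_containers is a hypothetical helper name.
remove_exited_containers() {
  local ids
  ids=$(docker ps -a -q --filter status=exited)
  if [ -z "$ids" ]; then
    echo "no exited containers"
    return 0
  fi
  # $ids is left unquoted on purpose so that each ID becomes one argument.
  # shellcheck disable=SC2086
  docker rm $ids
}
```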
Linux Administration
Displays who is logged into the system, showing their login time and terminal.
who
Displays the last login times for all users on the system.
lastlog
Lists all running processes, including the user, CPU, and memory usage.
ps aux
Real-time process monitoring, showing resource usage, such as CPU, memory, and active users.
top
An enhanced version of top, with an interactive user interface for monitoring system processes.
htop
Reports on virtual memory statistics, system processes, paging, block I/O, and CPU usage.
vmstat
Displays memory usage, showing total, used, and free memory in a human-readable format.
free -h
A cross-platform system monitoring tool that shows real-time system resource usage (CPU, memory, disk, network).
glances
Real-time monitoring tool that provides detailed reports on system activity, including CPU, memory, disk I/O,
and network.
sudo atop
Lists all user accounts from the system's configured user databases: /etc/passwd plus any additional sources, such as LDAP, enabled in /etc/nsswitch.conf.
getent passwd
Adds a user to a group. The -a flag appends the user to the group without removing them from other groups.
sudo usermod -aG <group> <username>
Deletes a user from the system (add -r to also remove the user's home directory and mail spool).
sudo userdel <username>
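To see which accounts use the most CPU and memory on this shared server, the per-process output of ps can be summed per user. This is a minimal sketch; usage_by_user is a hypothetical helper name, and the %CPU values reported by ps are averages over each process's lifetime rather than instantaneous readings (use top or htop for live values).

```shell
#!/usr/bin/env bash
# Sketch: total %CPU and %MEM per user, highest CPU first.
# usage_by_user is a hypothetical helper; it reads lines of
# "user pcpu pmem" on stdin, as produced by:
#   ps -eo user,pcpu,pmem --no-headers
usage_by_user() {
  awk '{ cpu[$1] += $2; mem[$1] += $3 }
       END { for (u in cpu) printf "%s %.1f %.1f\n", u, cpu[u], mem[u] }' |
    sort -k2,2nr
}
```

For example, ps -eo user,pcpu,pmem --no-headers | usage_by_user | head -n 10 shows the ten heaviest users.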