Lustre Installation and Administration Lab Manual - Intel EE For Lustre With ZFS
Version 2017.3
DAY 1:
DAY 2:
4. Data Lifecycle Management
5. Striping
6. OST Pools
7. Quotas
DAY 3:
8. Lustre Networking, Advanced
9. Lustre Troubleshooting
10. Lustre Tuning
Lab 0
Student Notes:
Note: The head node is now called KTLN2 and the systems are hosted at an Intel Site.
Each student will be assigned a virtual environment (VE) to complete the labs. Each VE is made
up of ten (10) virtual machines (VMs), with each VM having multiple storage targets attached.
The OSTs are all the same size, while the MGT is a smaller volume than the MDT.
Each VE has the same naming scheme except for the stXX prefix on node names. Student IDs
will be assigned prior to the first lab; you will use these student IDs to connect to the correct VE.
Example: Student ID 20 will connect to the hosts st20-mds1, st20-oss1, etc.
The lab nodes correspond to the above drawing using these interfaces:
eth0 = blue
eth1 = green
eth2 = red
Each system has a primary system (OS) disk, /dev/vda. The MDS1 server has a 10GB disk
(/dev/sda) and a 1GB disk (/dev/sdb). The OSS1 and OSS2 servers have 10GB data disks:
/dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd. Carefully follow the instructions
provided to ensure that you use the correct device files when creating file systems.
To log in to the “stXX” nodes you must first log in to the head node (ktln2).
From “ktln2”, each node in your VE can be reached via its hostname without a password.
In PuTTY:
Source port = 90XX
Destination = localhost:90XX
3. On your workstation, open your browser and use the location
https://localhost:90XX to access Intel Manager for Lustre
The examples in this lab manual use student ID 16 as well as “stXX”. As you work through the
exercises, substitute your Student ID in place of “st16” or “stXX”.
Preparation:
ST=st16
HL=$(echo mds{1..2} oss{1..4} iml1 cli{1..3})
# Verify passwordless SSH access from ktln2 to every node in your VE
for i in $HL; do
  ssh -o StrictHostKeyChecking=no root@$ST-$i uname -n
done
Lab 1
Student Notes:
Step 2: List the already provided IEEL software package in the /root directory
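For example (the ee-3.X.X.X directory used in the next step should appear in the listing):
# ls /root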
Step 3: Change into the ieel installation directory and list the package contents
# cd /root/ee-3.X.X.X
# ls
Please verify the setting of the LANG variable in the bash environment.
Currently, Intel Manager for Lustre supports only “en_US.utf8”.
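One way to check the current setting:
# echo $LANG
en_US.utf8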
Username: admin
Email: <your email address>
Password: admin.123
Confirm password: admin.123
User 'admin' successfully created.
Step 7: Hit return without typing to use the default “localhost” as the time synchronization server
Step 8: View the installation log file for a detailed description of the installed packages
$ less /var/log/chroma/install.log
[Google Chrome has been tested as one of the better-performing web browsers for the IML
interface.]
Step 2: Use “https” to access your training cluster through the SSH tunnel setup.
https://localhost:90xx
Example for the st16 training cluster: https://localhost:9016
Step 4: Wait until all the JavaScript components have been downloaded to your computer
Step 5: Log in as the super-user Administrator using the credentials entered during the installation
phase
You should now see the "Configuration" link in the top menu bar
Step 3: Hostname - Type in the hostname of the server to be added: stxx-mds1 and click Next.
On the confirmation screen, click Proceed.
Step 5: The Server Profile should appear and the “Managed storage server” profile should be
automatically selected
Note: “Monitored storage server” can be selected for monitoring existing Lustre servers
and file systems
Step 7: Click on Add another to repeat steps 3 – 5 and add stxx-mds2 and the OSS servers {stxx-
oss1, stxx-oss2}
Step 9: Click on the Volumes tab in the Configuration page and verify that all the volumes are in
GREEN Status.
On stXX-mds1:
Step 2: Find the Volume named “mgt” and make the primary server stXX-mds2 and the failover
server stXX-mds1.
Step 3: Find the Volume named “mdt” and make the primary server stXX-mds1 and the failover
server stXX-mds2.
Step 4: Click the “Apply” button and confirm the volume modification on XX0001 and XX0002.
Step 7. Finish
Lab 2
Student Notes:
cd $HOME
cat > test.sh <<\_EOF
# Generate continuous metadata and small-file I/O so that activity shows up on the IML Dashboard.
while true; do
  for i in {1..100}; do
    # The loop body was missing from the original listing; creating a directory with a
    # small file in it is one reasonable workload (the dd size is an assumption).
    mkdir -p /mnt/Lustre01/dir-$i
    dd if=/dev/zero of=/mnt/Lustre01/dir-$i/data bs=1M count=1 2>/dev/null
  done
  sleep 2
  rm -rf /mnt/Lustre01/dir-*
  sleep 2
done
_EOF
chmod 755 test.sh
./test.sh
Run the script from one client only and navigate the Dashboard.
Let the script keep running for the following tasks
Step 4: Go back to the client’s terminal. Failover should be transparent for the application
Step 4: Go back to the client’s terminal. Failback should be transparent for the application
Click on the “Create OST” button and add 4 HA Zpools as new OSTs
After a few minutes the total capacity of your Lustre file system will increase.
The script should work without any interruption.
The Dashboard should be updated with the new objects.
Why are some areas of the heatmap on the Dashboard not updated?
Please stop the script for the upcoming Lab.
Lab 3
Student Notes:
$ ssh root@stXX-mds1
Why can’t LNET be stopped? Because, while a storage target is mounted, libcfs.ko
enforces a dependency on lnet.ko that keeps the network interface up. Run the following
command to see which modules Lustre has loaded:
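The command referred to is most likely the module listing used again later in this lab:
[mds1]# lsmod | grep -E 'lustre|lnet'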
Step 1: Open the IML Web UI and navigate to Configuration → File Systems
Step 4: From the Actions menu, select Failover. This will move the Metadata Target
(e.g. Lustre01-MDT0000) from stxx-mds1 to stxx-mds2.
Step 5: Wait until complete. Verify by looking at the "Started on" column. The line
will also display an alert icon.
3. Now, login to stXX-mds1 and use the "lustre_rmmod" command to remove all of the
Lustre kernel modules.
[mds1]# lustre_rmmod
[mds1]# lsmod |grep -E 'lustre|lnet'
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko \
    networks="tcp1(eth1)"
Enter the ‘lctl’ subshell and check to see if loading the LNET module started LNET. If not, start
LNET using “network up”. List your local NID.
Note: in the following lab, 10.10.116.* is the subnet for st16; 10.10.101.*, for example, would be
the subnet for st01. You can check the eth1 device to confirm your subnet.
[mds1]# lctl
lctl > list_nids
IOC_LIBCFS_GET_NI error 100: Network is down
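If the network is down, start it from inside the subshell and list the NID again (the NID shown is
illustrative for st16's mds1):
lctl > network up
lctl > list_nids
10.10.116.1@tcp1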
5. Ping your local NID from inside the LNET subshell, then exit and ping the NID on
OSS1.
NOTE: 10.10.1XX.* is the subnet, where XX is your student number. The output below is for
st16; adjust the subnet for your student ID.
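The ping inside the subshell is not shown; a hedged sketch (use the NID reported by list_nids):
lctl > ping 10.10.1XX.1@tcp1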
> exit
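Then, from the normal shell, ping OSS1 (the OSS1 address is an assumption; confirm it with
list_nids on OSS1):
[mds1]# lctl ping 10.10.1XX.3@tcp1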
6. Bring the Lustre network down and again try to ping OSS1 and MDS1.
[mds1]# lustre_rmmod
[mds1]# lsmod |grep lnet
9. Move to the IML web interface and verify the status of server stxx-mds1
a. Navigate to Configuration → Servers
b. Look at the LNet State column in the Server Configuration table
10. Load all the Lustre modules and verify that LNET started automatically.
11. Use the IML web interface to Failback the metadata target to stxx-mds1
Step 1: Open the IML Web UI and navigate to Configuration → File Systems
Step 2: Click on the Lustre01 file system
Step 3: Scroll down to the Metadata Target section
Step 4: From the Actions menu, select Failback.
Step 5: Wait until complete. Verify by looking at the "Started on" column. The line
will also display an alert icon.
Lab 4
Student Notes:
Step 2: Create a Lustre file system Lustre01 as previously, but enable the HSM flag.
Step 2: Add stXX-cli3. After a few seconds the window will display an incompatible “Managed
Profile” warning; select “POSIX HSM Agent Node” as the server profile.
Step 3: Log into the HSM Agent node, stXX-cli3, and create a POSIX archive file system:
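The archive location referenced later in this lab is /data; a minimal sketch, assuming a plain local
directory is used as the POSIX archive for this exercise:
[root@stXX-cli3 ~]# mkdir -p /data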
2. Create file:
[root@st31-cli1 Lustre01]# cd /mnt/Lustre01/
[root@st31-cli1 Lustre01]# dd if=/dev/zero of=pippo bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00744627 s, 141 MB/s
1. From a Lustre* client, create a new much larger file and archive it
[root@st31-cli1 Lustre01]# dd if=/dev/zero of=big bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 34.4295 s, 62.4 MB/s
[root@st31-cli1 Lustre01]# lfs hsm_archive big
2. The archive command returns straight away. It will take some time to archive the file. You
can watch the progress on the HSM agent host or you can use “lfs hsm_state” to monitor
the file:
[root@st31-cli1 Lustre01]# lfs hsm_state big
big: (0x00000001) exists, archive_id:1
You can see the activity of the copytool in the HSM page of IML.
3. The copy to the archive is not complete until hsm_state lists both "exists" and "archived"
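When the copy has finished, the state changes; for example:
[root@st31-cli1 Lustre01]# lfs hsm_state big
big: (0x00000009) exists archived, archive_id:1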
[root@st31-cli1 Lustre01]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root 8.4G 1.2G 6.8G 15% /
tmpfs 1.4G 0 1.4G 0% /dev/shm
/dev/vda1 485M 71M 389M 16% /boot
10.10.131.1@tcp1:10.10.131.2@tcp1:/Lustre01 79G 17G 59G 22% /mnt/Lustre01
Even though ls still reports the file size as 2GB, the space used in the file system has decreased.
The 2GB file is now stored in the archive file system (/data).
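The release step that freed this space (a standard Lustre HSM command, not shown above) would be:
[root@st31-cli1 Lustre01]# lfs hsm_release big
[root@st31-cli1 Lustre01]# lfs hsm_state big      (the state should now also list "released")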
5. Restore the file using an indirect command and time how long it takes:
[root@st01-cli1 Lustre01]# time file big
big: data
real 1m49.416s
user 0m0.005s
sys 0m0.006s
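Running the same command a second time shows the difference discussed next:
[root@st01-cli1 Lustre01]# time file big          (the second run should return almost instantly)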
Notice the difference? The first timed run had to wait for the file to be restored from the archive,
so it took longer to complete. By comparison, the second run completed almost instantly, since the
file was already restored to Lustre.
At this point, files created before the activation of the changelog are not copied to the
backup directory.
3. Create a file
6. Verify the file in the lustre file system and in the backup directory
Lab 5
Lustre Striping
Student Notes:
6. Review the free space available from cli1 and cli2, for example:
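[cli1]# lfs df -h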
7. Make a new subdirectory in your Lustre file system on one of your clients. Use the “lfs
getstripe” command to review the default striping policy for this directory:
# mkdir /mnt/Lustre01/new_dir_1
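The review command named above would be:
# lfs getstripe /mnt/Lustre01/new_dir_1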
8. Create a file in the new subdirectory. Review the striping pattern for the file. In the
following example:
a. obdidx == 0, meaning the “index” OST (first OST) was OST0000
b. objid == 2, meaning the object id (count) on that OST was object # 2.
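The dd command that produced the output below was likely of this form (the file name is an
assumption):
# dd if=/dev/zero of=/mnt/Lustre01/new_dir_1/file1 bs=1M count=100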
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.02791 s, 102 MB/s
Repeat this for a couple more files, checking that the obdidx moves in a round-robin
like manner, and that the objid increments appropriately. Delete the files and repeat
this process. What do you find with regard to obdidx and objid? Is that what you
expected?
9. Create a new directory and use the “lfs setstripe” command to set the following striping
policy on the new sub-directory:
a. Stripe size of 128k
b. Stripe count of 2
c. Default stripe index
# mkdir -p /mnt/Lustre01/new_dir_2
# lfs getstripe /mnt/Lustre01/new_dir_2
# lfs setstripe -s 128K -c 2 -i -1 /mnt/Lustre01/new_dir_2
# lfs getstripe /mnt/Lustre01/new_dir_2
10. Create some more files in your subdirectory. Review the striping policy used for new
files less than 128KB in size and files larger than 128KB in size.
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.134001 s, 78.3 MB/s
/mnt/Lustre01/new_dir_2/file3
lmm_stripe_count: 2
lmm_stripe_size: 131072
lmm_layout_gen: 0
lmm_stripe_offset: 0
obdidx objid objid group
0 34 0x22 0
2 2 0x2 0
# touch /mnt/Lustre01/new_dir_2/file4
/mnt/Lustre01/new_dir_2/file4
lmm_stripe_count: 2
lmm_stripe_size: 131072
lmm_layout_gen: 0
lmm_stripe_offset: 3
obdidx objid objid group
3 3 0x3 0
0 35 0x23 0
Notice how even small files, that is, files whose size is less than the stripe size, get
striped across the “stripe_count” of OSTs.
# mkdir /mnt/Lustre01/nonstriped
# mkdir /mnt/Lustre01/striped
# mkdir /mnt/Lustre01/reallystriped
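The stripe policies implied by the directory names (and by the getstripe output later in this
exercise) can be set as follows; the exact counts are assumptions:
# lfs setstripe -c 1 /mnt/Lustre01/nonstriped
# lfs setstripe -c 2 /mnt/Lustre01/striped
# lfs setstripe -c -1 /mnt/Lustre01/reallystriped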
2. Check OST usage and create three (3) 1GB files (for a total of 9 files) in each of the
directories, checking usage between each new file. How well was space allocated by
Lustre? Look at the following example:
# lfs df -h
UUID bytes Used Available Use% Mounted on
Lustre01-MDT0000_UUID 7.5G 2.0G 5.0G 29% /mnt/Lustre01[MDT:0]
Lustre01-OST0000_UUID 9.8G 2.1G 7.2G 23% /mnt/Lustre01[OST:0]
Lustre01-OST0001_UUID 9.8G 2.1G 7.2G 23% /mnt/Lustre01[OST:1]
Lustre01-OST0002_UUID 9.8G 2.0G 7.3G 22% /mnt/Lustre01[OST:2]
Lustre01-OST0003_UUID 9.8G 2.1G 7.2G 23% /mnt/Lustre01[OST:3]
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.5676 s, 77.3 MB/s
# lfs df -h
Looks like the last file was written to OST-3. Confirm with “getstripe”:
# lfs getstripe /mnt/Lustre01/nonstriped/X
/mnt/Lustre01/nonstriped/X
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 3
obdidx objid objid group
3 78 0x4e 0
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 16.7898 s, 62.5 MB/s
# lfs df -h
Looks like the last file was written to OST0 & OST1. Confirm.
# lfs getstripe /mnt/Lustre01/striped/Y
/mnt/Lustre01/striped/Y
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
obdidx objid objid group
0 74 0x4a 0
1 76 0x4c 0
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 35.2855 s, 29.7 MB/s
# lfs df -h
UUID bytes Used Available Use% Mounted on
Lustre01-MDT0000_UUID 7.5G 2.0G 5.0G 29% /mnt/Lustre01[MDT:0]
Lustre01-OST0000_UUID 9.8G 2.9G 6.5G 31% /mnt/Lustre01[OST:0]
Lustre01-OST0001_UUID 9.8G 2.9G 6.5G 31% /mnt/Lustre01[OST:1]
Lustre01-OST0002_UUID 9.8G 2.3G 7.1G 24% /mnt/Lustre01[OST:2]
Lustre01-OST0003_UUID 9.8G 3.4G 6.0G 36% /mnt/Lustre01[OST:3]
1. The default striping patterns on a directory may be overridden when files are created
using the “lfs setstripe” command. Consider the following example and create some
files that have non-default striping patterns in each of the three directories above.
Use a “dd” command to write some data to each of the files you have created.
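For example (file names and values are illustrative):
# lfs setstripe -c 4 -s 4M /mnt/Lustre01/nonstriped/override1
# dd if=/dev/zero of=/mnt/Lustre01/nonstriped/override1 bs=1M count=100
# lfs getstripe /mnt/Lustre01/nonstriped/override1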
1. In this case we want to change the striping pattern of an existing file. Attempt to
change the striping pattern of one of your files. For example:
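A hedged sketch of the attempt (the exact error text varies by Lustre version, but the command
fails because the file already has a striping layout):
# lfs setstripe -c 2 /mnt/Lustre01/nonstriped/X
(lfs reports an error such as "stripe already set" / "File exists")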
2. Ok, that didn’t work. As it turns out, Lustre does not support changing the striping
pattern of an existing file. To change the striping pattern on an existing file, it is
necessary to 1) create a new “temp” file (using ‘lfs setstripe’) with the desired striping
pattern, 2) copy the existing file to the newly created file, and finally 3) delete the
original file and rename the temp file. Give this a try and verify that your operations
worked.
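A sketch of the procedure described above, using the nonstriped/X file as an example (file names
are illustrative):
# lfs setstripe -c 2 /mnt/Lustre01/nonstriped/X.tmp
# cp /mnt/Lustre01/nonstriped/X /mnt/Lustre01/nonstriped/X.tmp
# mv /mnt/Lustre01/nonstriped/X.tmp /mnt/Lustre01/nonstriped/X
# lfs getstripe /mnt/Lustre01/nonstriped/X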
Lab 6
OST Pools
Student Notes:
OST Pools
In this Lab you will create different pools and write files to the OSTs in a pool. Unfortunately,
some of the commands have to run on the MGS while others run on the MDS. To avoid
changing hosts to execute the different commands, we will use IML to perform a failover of
the MDT (stxx-mds1) to the MGS (stxx-mds2).
Add one OST from each OSS to each pool. For example:
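The pools are created and populated with lctl on the MGS node; a sketch for pool-03 (repeat for
the other pools, and add a second OST from the other OSS as instructed - the OST choice here is
illustrative):
[root@st16-mds2 ~]# lctl pool_new Lustre01.pool-03
[root@st16-mds2 ~]# lctl pool_add Lustre01.pool-03 Lustre01-OST0003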
[root@st16-mds2 ~]# lctl pool_list Lustre01.pool-03
Pool: Lustre01.pool-03
Lustre01-OST0003_UUID
7. On CLI1, make three new subdirectories in your Lustre file system and assign each
directory to an OST pool. The result of this is that files written to a directory which is
assigned to a pool causes the files to be written only to OSTs in that pool.
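A sketch of the directory/pool assignment for pool-03 (repeat for the other two pools; directory
names are assumptions):
[cli1]# mkdir /mnt/Lustre01/pool-03
[cli1]# lfs setstripe -p Lustre01.pool-03 /mnt/Lustre01/pool-03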
8. Using ‘dd’, write a new file into the “/mnt/Lustre01/pool-03” subdirectory. Confirm
that the file was written to the OST pool correctly.
Note that only one (1) OST in the pool was written to. Why didn’t both OSTs get data
written to them? Read on for the answer.
9. In the previous steps a directory was assigned to an OST pool, however the striping
policy on the directory was the Lustre default (stripe_count: 1). To have files written
to that directory AND use all the OSTs in that pool, it is necessary to change the
striping policy for that directory.
Although we could define the policy to "start at the first OST" and “stripe across the
remaining OSTs”, we know that OSTs can be added to and removed from pools, so it is
much better to use the “-1” value and let Lustre figure it out dynamically.
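For example, to have files in the pool-03 directory stripe across every OST currently in the pool:
[cli1]# lfs setstripe -c -1 -p Lustre01.pool-03 /mnt/Lustre01/pool-03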
Although the “getstripe” and “setstripe” arguments both support the “--pool” argument,
it should be realized that a striping policy may NOT be set on an OST pool. Instead, a
striping policy is set on a directory (or file), and a directory is assigned to a pool.
10. Use “dd” or “touch” to write *several* files in each of the "pool" directories, and
confirm that each file is written to the correct OSTs. Notice how “lfs getstripe” on a
directory returns the striping pattern for the directory and for the files inside.
/mnt/Lustre01/pool-03/file-03.dd
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_pool: pool-03
obdidx objid objid group
3 7 0x7 0
Pools and directories are flexible. Test out the example below and prove to yourself that
files can be created in pools of which the directory is NOT a member.
[cli1]# lfs setstripe -p Lustre01.pool-03 /mnt/Lustre01/file-in-pool-03
[cli1]# lfs getstripe -p /mnt/Lustre01/file-in-pool-03
pool-03
11. This example extends the previous, demonstrating that a file can be created in one pool
even though its parent directory is in another pool.
This example demonstrates that files retain their "creation-time" settings regardless of the state of
the pool.
[cli1]# lfs setstripe -c -1 -p Lustre01.pool-123 /mnt/Lustre01/pool-123/file-123
lmm_layout_gen: 0
lmm_stripe_offset: 2
lmm_pool: pool-123
obdidx objid objid group
2 7 0x7 0
3 10 0xa 0
1 5 0x5 0
On the MGS, remove OST2 from pool-123 and recheck the striping of the file.
On the MGS, remove the remaining OSTs, destroy pool-123 and again check the file.
(Yes, that last step is accurate. Files created in a pool retain that information.)
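A sketch of those two steps (standard lctl pool commands; the OST indexes follow the obdidx
values shown above):
[root@st16-mds2 ~]# lctl pool_remove Lustre01.pool-123 Lustre01-OST0002
[cli1]# lfs getstripe /mnt/Lustre01/pool-123/file-123
[root@st16-mds2 ~]# lctl pool_remove Lustre01.pool-123 Lustre01-OST0001
[root@st16-mds2 ~]# lctl pool_remove Lustre01.pool-123 Lustre01-OST0003
[root@st16-mds2 ~]# lctl pool_destroy Lustre01.pool-123
[cli1]# lfs getstripe /mnt/Lustre01/pool-123/file-123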
12. Verify that files written to the "pool-123" directory retain the stripe count (-1) and that
the stripe count now applies at the file system level (all 4 OSTs are written to).
Lab 7
Quotas
Student Notes:
Quotas
Preparation steps
1. On the MDS, the OSSs and the clients, add a new Linux user and directory.
2. As the newly created user, write 99 new files, some with 100 MB of data.
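A sketch of the preparation (the fixed UID, directory name, and numeric file names follow the
outputs shown later in this lab):
# useradd -u 1001 user1                 (run on the MDS, each OSS, and each client)
[root@cli1]# mkdir /mnt/Lustre01/users
[root@cli1]# chown user1:user1 /mnt/Lustre01/users
[root@cli1]# su - user1
[user1@cli1 ~]$ for a in {1..99}; do touch /mnt/Lustre01/users/$a; done
[user1@cli1 ~]$ for a in {1..8}; do dd if=/dev/zero of=/mnt/Lustre01/users/$a bs=1M count=100; done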
1. Check usage and quotas using the ‘lfs quota’ command. Note that if no user name is
specified, the current user is used.
1. From a client node, set a quota (soft limit) of 1 GB and 1500 files for user user1:
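For example (the hard limits match the quota output shown later in this lab):
[root@cli1]# lfs setquota -u user1 -b 1024000 -B 2048000 -i 1500 -I 2000 /mnt/Lustre01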
2. From a client node, set group quotas to a size larger than the individual user quotas:
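For example (the values are illustrative, simply larger than the user limits above):
[root@cli1]# lfs setquota -g users -b 3072000 -B 4096000 -i 5000 -I 10000 /mnt/Lustre01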
4. Determine how many new files it will take to put user1 over the file count quota (soft limit)
but not over the limit (hard limit), and then, as user1, write that many files. The
example writes 1500 files, which should be appropriate.
[root@cli1]# su - user1
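A sketch that writes 1500 additional files (the file names continue the numeric scheme used
during preparation):
[user1@cli1 ~]$ for a in {100..1599}; do touch /mnt/Lustre01/users/$a; done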
5. Verify that the user1 account is now over the quota for files. An asterisk will be seen
alongside the file count, and a grace period countdown will have begun.
[user1@cli1]$ lfs quota -u user1 /mnt/Lustre01
Disk quotas for user user1 (uid 1001):
Filesystem kbytes quota limit grace files quota limit grace
/mnt/Lustre01 819212 1024000 2048000 - 1599* 1500 2000 6d23h59m43s
6. Similarly, determine and then write enough data to force user1 over the block quota but
not over the block limit. This example writes 300MB.
[user1@cli1 ~]$ for a in {9..11}; do
dd if=/dev/zero of=/mnt/Lustre01/users/$a bs=1M count=100 ;done
Verify that user1 is now over the limit for both quotas. Add the verbose option to display
block utilization per target.
[user1@st16-cli1 ~]$ lfs quota -u user1 -v /mnt/Lustre01/
Disk quotas for user user1 (uid 1001):
Filesystem kbytes quota limit grace files quota limit grace
/mnt/Lustre01/ 1126428* 1024000 2048000 6d23h59m40s 1599* 1500 2000
6d23h51m57s
Lustre01-MDT0000_UUID
0 - 0 - 1599* - 1599 -
Lustre01-OST0000_UUID
307208 - 308232 - - - - -
Lustre01-OST0001_UUID
307208 - 307216 - - - - -
Lustre01-OST0002_UUID
307208 - 308240 - - - - -
Lustre01-OST0003_UUID
204804* - 204804 - - - - -
7. Observe the results as you attempt to write more than 1GB of data.
Review the size of the file you have written and explain your results. Finally, run a
quota check and notice that the grace period for block usage is no longer displayed.
Why? Remove the '99' file and check the grace period timer. Does it make sense?
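A sketch of the write attempt for step 7 (the target file '99' is inferred from the removal step
above; expect dd to stop early with a "Disk quota exceeded" error):
[user1@cli1 ~]$ dd if=/dev/zero of=/mnt/Lustre01/users/99 bs=1M count=1200
[user1@cli1 ~]$ lfs quota -u user1 /mnt/Lustre01
[user1@cli1 ~]$ rm /mnt/Lustre01/users/99
[user1@cli1 ~]$ lfs quota -u user1 /mnt/Lustre01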
8. Realizing that the default grace period (1 week) for being over the soft limit is too
short for your needs, change the grace period to 2 weeks for block data and 3 weeks for
files for users. For groups, change it to 3 weeks for block data and 4 weeks for inodes.
Then, check quotas on "user1." Examples follow:
Note that if week (w), day (d), hour (h), etc. are not specified, Lustre quotas assume
that the value provided is in seconds.
Usage:
lfs setquota -t <-u|-g>
[--block-grace <block-grace>]
[--inode-grace <inode-grace>] <filesystem>
lfs setquota -t <-u|-g>
[-b <block-grace>] [-i <inode-grace>]
<filesystem>
lfs setquota help
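For example, using the usage shown above with week suffixes:
[root@cli1]# lfs setquota -t -u -b 2w -i 3w /mnt/Lustre01
[root@cli1]# lfs setquota -t -g -b 3w -i 4w /mnt/Lustre01
[root@cli1]# lfs quota -u user1 /mnt/Lustre01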
9. After changing the grace periods, observe the grace period for the "user1" user.
What happened? The current grace period did not change to the new value. The change
just made does not immediately alter the running countdown!
If an account, as in this case "user1", is already over its quota (user or group), changing
the grace period does not change the countdown timer. The change will not help
until the next time the account goes over a soft limit.
Reset quotas on the user1 account to 0, effectively disabling quotas for that user.
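For example:
[root@cli1]# lfs setquota -u user1 -b 0 -B 0 -i 0 -I 0 /mnt/Lustre01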
[root@cli1]# su - user1
[user1@cli1 ~]$ id
uid=1001(user1) gid=1001(user1) groups=1001(user1),100(users)
Create a second user, user2, and add both users to the "users" group. The relevant entries
should look like this:
/etc/passwd:user2:x:1002:1002::/home/user2:/bin/bash
/etc/group:users:x:100:user1,user2
/etc/group:user2:x:1002:
Then assign a group quota to the users group for files, block data, or both.
Have user2 exceed the quota for the entire group, and then attempt to write using the user1
account.
Lab 8
Student Notes:
Advanced LNET
In this task you will configure clients on different LNET subnets. Each client will be able to
reach all the Lustre targets as the Lustre servers will run two LNET subnets.
On each Server:
lustre_rmmod
lctl network down
lustre_rmmod
5. Modify your Lustre MDS and OSS servers to configure LNET to run over two separate TCP
channels using interfaces eth0 and eth1. Use the networks option to configure LNET on each
server.
Run these two commands on all MDS / OSS servers (shown below is MDS1 only):
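A hedged sketch of the configuration (the network names match the tcp0/tcp1 channels referenced
in the following steps; the reload commands are illustrative):
[mds1]# cat /etc/modprobe.d/lustre.conf
options lnet networks="tcp0(eth0),tcp1(eth1)"
[mds1]# modprobe lnet; lctl network up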
6. Configure LNET on client1 to use the eth0 interface using the networks option.
...
7. Configure LNET on client2 to use the eth1 interface using the networks option.
...
[root@st20-cli2 ~]# lctl list_nids
10.10.116.6@tcp1
8. Log in to the IML UI and rescan the Lustre servers' NIDs using the button in the Configuration
→ Servers screen
9. Create a new Lustre Filesystem called Lustre01 using IML. Make sure that only the volumes
on the OSS servers are used for the OSTs (XX0003, XX0004, XX0005 and XX0006).
10. On CLI1, verify that you can ping Lustre servers over the tcp0 channel.
11. On CLI2, verify that you can ping Lustre servers over the tcp1 channel.
12. Mount the Lustre file system on both client systems. Explain the difference between the mount
commands used on the two client systems.
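The two mount commands differ only in the NIDs used to reach the MGS/MDS pair (the eth0
addresses are placeholders; use the NIDs reported by the servers):
[cli1]# mount -t lustre <mds1-eth0-IP>@tcp0:<mds2-eth0-IP>@tcp0:/Lustre01 /mnt/Lustre01
[cli2]# mount -t lustre 10.10.1XX.1@tcp1:10.10.1XX.2@tcp1:/Lustre01 /mnt/Lustre01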
13. Write some data to the Lustre file system from CLI1 and read it back from CLI2.
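One way to create a file that matches the result shown below:
[cli1]# seq 1 100 > /mnt/Lustre01/a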
[cli2]# wc -l /mnt/Lustre01/a
100 /mnt/Lustre01/a
Task B: - Schedule LNET self test between an OSS node and a client node
1. Load the LNET self-test module on both OSS1 and CLI1 (only OSS1 shown):
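A sketch of the module load and the self-test setup that precedes the add_test command below
(the session name and the NID placeholders are assumptions; the group and batch names match
the add_test command):
[oss1]# modprobe lnet_selftest
[cli1]# modprobe lnet_selftest
[cli1]# export LST_SESSION=$$
[cli1]# lst new_session bulk_rw
[cli1]# lst add_group oss <oss1-NID>
[cli1]# lst add_group client <cli1-NID>
[cli1]# lst add_batch bulk_write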
[cli1]# lst add_test --batch bulk_write --from client --to oss brw write check=full
size=1M
Test was added successfully
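Once the test has been added, the batch can be run and monitored; a sketch using standard lst
commands:
[cli1]# lst run bulk_write
[cli1]# lst stat client oss            (Ctrl-C stops the statistics display)
[cli1]# lst end_session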
In this task you will configure CLI1 to be a router, capable of routing LNET traffic between
MDS1 and CLI2.
[mds1]# lustre_rmmod
[mds1]# cat /etc/modprobe.d/lustre.conf
options lnet networks="tcp1(eth1)" routes="tcp2 10.10.1XX.5@tcp1"
6. On CLI2, configure the ETH2 network interface with IP Address 192.168.1XX.6, where XX is
the student number. Use the command system-config-network to create the interface
configuration. Use the command "ifup eth2" to bring the interface online.
7. Configure a LNET route specifying the router NID to reach tcp1.
[cli2]# lustre_rmmod
[cli2]# cat /etc/modprobe.d/lustre.conf
options lnet networks="tcp2(eth2)" routes="tcp1 192.168.1XX.5@tcp2"
1. On CLI1, configure the ETH2 network interface with IP Address 192.168.1XX.5, where XX is
the student number. Use the command system-config-network to create the interface
configuration. Use the command "ifup eth2" to bring the interface online.
2. Configure LNET routing (forwarding) between tcp1(eth1) and tcp2(eth2).
[cli1]# lustre_rmmod
[cli1]# cat /etc/modprobe.d/lustre.conf
options lnet networks="tcp1(eth1),tcp2(eth2)" forwarding=enabled
4. On MDS1, perform an LNET ping to the NID of the last failed ping.
5. Why isn't it working? Well, the answer isn't great, but it's why you came to this class - to learn
these little tidbits of information. On MDS1 and CLI2, check the status of the router.
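One way to check from the non-router nodes (a standard lctl command; the output lists each
configured route and its state):
[mds1]# lctl show_route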
State down!? Why down? Let's check the state of the router to confirm.
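And on the router itself (the first line of this file reports whether routing is enabled):
[cli1]# cat /proc/sys/lnet/routes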
Hmm, looks like routing is enabled on the router. What else could it be?
Well, if you recall from the steps above, we first had you configure the clients and THEN
configure the router. So when the clients loaded the LNET module, they checked the
router status and found that it was not up - thus, they now report that the router is down.
No, it's not quite as dynamic as would be ideal. Now you know to bring up the routers
first.
From here you don't need to unload / reload modules, but rather just reconfigure LNET.
Notice that the state of the router is now up, but that MDS1 and CLI2 are NOT routers.
7. On MDS1, CLI1 and CLI2, use the following set of commands to remove the LNET modules
(shown only for MDS1):
8. Reset the LNET modules on all nodes using the following commands:
Lab 9
Lustre Troubleshooting
Student Notes:
Lustre Troubleshooting
2. In the Dashboard you can click on each block of the Heatmap chart to discover the
applications and users that are doing I/O on the selected OST.
4. The above log output shows that the metadata server received several requests with
operation code 36 from NID 10.10.101.6@tcp1. Use this command to identify the meaning of the
operation code:
[root@st16-mds1 lustre]# cat /usr/include/lustre/lustre_idl.h|grep 36
MDS_REINT = 36,
5948344271957065728:10.10.101.1@tcp1:12345-
10.10.101.5@tcp1:x1452231776798196:480:Complete:1384956825:0s(-6s) opc 49
5948344271960604672:10.10.101.1@tcp1:12345-
10.10.101.5@tcp1:x1452231776798200:480:Complete:1384956825:0s(-6s) opc 49
5948344271964405760:10.10.101.1@tcp1:12345-
10.10.101.5@tcp1:x1452231776798204:480:Complete:1384956825:0s(-6s) opc 49
7. The operation code and the NID have changed. Depending on the version of Lustre used and
the commands issued on the Lustre file system, the opcode may differ from the ones
shown in the above example. Again, search for the opcode to find its meaning:
[root@st16-mds1 lustre]# cat /usr/include/lustre/lustre_idl.h|grep 49
MDS_GETXATTR = 49,
1. View the set of debug options currently captured by the debug daemon
4. Run some commands in the lustre file system and stop the daemon
7. Disable debugging
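The commands behind these debug steps are not shown; a hedged sketch using the standard lctl
debug controls (file paths are illustrative):
[mds1]# lctl get_param debug                              (step 1: current debug flags)
[mds1]# lctl debug_daemon start /tmp/lustre-debug 100     (start the daemon with a 100MB dump file)
[mds1]# lctl debug_daemon stop                            (step 4: stop the daemon)
[mds1]# lctl debug_file /tmp/lustre-debug /tmp/lustre-debug.txt
[mds1]# lctl set_param debug=0                            (step 7: disable debugging)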
Lab 10
Lustre Tuning
Student Notes:
Lustre Tuning
Task A: Setting the max cache size on clients
4. Create a 5MB file and verify that the Lustre client holds this file in its cache
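A sketch of this step (the file name is an assumption; the used_mb value in the parameter output
should rise to about 5):
[root@st16-cli2 ~]# dd if=/dev/zero of=/mnt/Lustre01/cachetest bs=1M count=5
[root@st16-cli2 ~]# lctl get_param llite.*.max_cached_mb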
reclaim_count: 0
llite.Lustre01-ffff8800b5544c00.max_cached_mb=1024
6. If you unmount the client and re-mount, Lustre will reset the parameter to its default
[root@st16-cli2 ~]# umount /mnt/Lustre01/
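Re-mount using the same syntax as the original client mount (the NID placeholders are the MGS
NIDs for your student subnet), then check the parameter again:
[root@st16-cli2 ~]# mount -t lustre <mds1-NID>:<mds2-NID>:/Lustre01 /mnt/Lustre01
[root@st16-cli2 ~]# lctl get_param llite.*.max_cached_mb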
7. Use the IML to set the value centrally for all the clients.
8. Navigate to Configuration → File Systems and select the Lustre01 file system.
9. Use the Update Advanced Settings button to set the max_cached_mb value to 1024.
10. From both of the clients, verify the value, e.g.:
[root@st16-cli1 ~]# lctl get_param llite.*.max_cached_mb
llite.Lustre01-ffff8800379ca800.max_cached_mb=
users: 4
max_cached_mb: 1024
used_mb: 0
unused_mb: 1024
reclaim_count: 0
No license under any patent, copyright, trade secret or other intellectual property right is granted to or conferred upon you by
disclosure or delivery of the Materials, either expressly, by implication, inducement, estoppel or otherwise. Any license under such
intellectual property rights must be express and approved by Intel in writing.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in
personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL
APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND
AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS
COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR
INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT
OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS
NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the
absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition
and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information
here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to
deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Before using any third party software referenced herein, please refer to the third party software provider’s website for more
information, including without limitation, information regarding the mitigation of potential security vulnerabilities in the third
party software.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by
calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.