
IBM Storage Ceph

IBM
© Copyright IBM Corp. 2024.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Table of Contents
IBM Storage Ceph 1
Summary of changes 1
Release notes 1
Enhancements 1
Bug fixes 3
Known issues 11
Sources 12
Asynchronous updates 12
Release Notes for 5.3z6 12
Enhancements 12
Bug fixes 13
Known issues 18
Concepts 18
Architecture 19
Ceph architecture 19
Core Ceph components 20
Prerequisites 21
Ceph pools 21
Ceph authentication 22
Ceph placement groups 23
Ceph CRUSH ruleset 24
Ceph input/output operations 24
Ceph replication 25
Ceph erasure coding 26
Ceph ObjectStore 27
Ceph BlueStore 28
Ceph self management operations 28
Ceph heartbeat 29
Ceph peering 29
Ceph rebalancing and recovery 29
Ceph data integrity 30
Ceph high availability 30
Clustering the Ceph Monitor 30
Ceph client components 31
Prerequisites 31
Ceph client native protocol 31
Ceph client object watch and notify 32
Ceph client Mandatory Exclusive Locks 32
Ceph client object map 33
Ceph client data striping 34
Ceph on-wire encryption 36
Data Security and Hardening 38
Introduction to data security 38
Preface 38
Introduction to IBM Storage Ceph 38
Supporting Software 39
Threat and Vulnerability Management 39
Threat Actors 40
Security Zones 40
Connecting Security Zones 41
Security-Optimized Architecture 41
Encryption and Key Management 42
SSH 43
SSL Termination 43
Messenger v2 protocol 43
Encryption in transit 44
Encryption at Rest 45
Identity and Access Management 45
Ceph Storage Cluster User Access 46
Ceph Object Gateway User Access 46
Ceph Object Gateway LDAP or AD authentication 47
Ceph Object Gateway OpenStack Keystone authentication 47
Infrastructure Security 47
Administration 48
Network Communication 48
Hardening the Network Service 49
Reporting 50
Auditing Administrator Actions 50
Data Retention 51
Ceph Storage Cluster 51
Ceph Block Device 51
Ceph Object Gateway 51
Federal Information Processing Standard (FIPS) 52
Summary 52
Planning 52
Compatibility 53
Compatibility Matrix for IBM Storage Ceph 5.3 53
Hardware 54
Executive summary 54
General principles for selecting hardware 55
Identify performance use case 55
Consider storage density 56
Identical hardware configuration 56
Network considerations for IBM Storage Ceph 56
Avoid using RAID solutions 57
Summary of common mistakes when selecting hardware 57
Reference 58
Optimize workload performance domains 58
Server and rack solutions 59
Minimum hardware recommendations for containerized Ceph 62
Recommended minimum hardware requirements for the IBM Storage Ceph Dashboard 63
Storage Strategies 63
Overview 63
What are storage strategies? 64
Configuring storage strategies 65
CRUSH admin overview 65
CRUSH introduction 66
Dynamic data placement 67
CRUSH failure domain 68
CRUSH performance domain 68
Using different device classes 69
CRUSH hierarchy 69
CRUSH location 70
Adding a bucket 71
Moving a bucket 71
Removing a bucket 72
CRUSH Bucket algorithms 72
Ceph OSDs in CRUSH 72
Viewing OSDs in CRUSH 74
Adding an OSD to CRUSH 76
Moving an OSD within a CRUSH Hierarchy 76
Removing an OSD from a CRUSH Hierarchy 76
Device class 77
Setting a device class 77
Removing a device class 78
Renaming a device class 78
Listing a device class 78
Listing OSDs of a device class 78
Listing CRUSH Rules by Class 79
CRUSH weights 79
Setting CRUSH weights of OSDs 79
Setting a Bucket’s OSD Weights 80
Set an OSD’s in Weight 80
Setting the OSDs weight by utilization 80
Setting an OSD’s Weight by PG distribution 81
Recalculating a CRUSH Tree’s weights 82
Primary affinity 82
CRUSH rules 82
Listing CRUSH rules 85
Dumping CRUSH rules 85
Adding CRUSH rules 85
Creating CRUSH rules for replicated pools 85
Creating CRUSH rules for erasure coded pools 86
Removing CRUSH rules 86
CRUSH tunables overview 86
CRUSH tuning 87
CRUSH tuning, the hard way 87
CRUSH legacy values 88
Edit a CRUSH map 88
Getting the CRUSH map 88
Decompiling the CRUSH map 89
Compiling the CRUSH map 89
Setting a CRUSH map 89
CRUSH storage strategies examples 89
Placement Groups 91
About placement groups 91
Placement group states 92
Placement group tradeoffs 94
Data durability 94
Data distribution 95
Resource usage 96
Placement group count 96
Placement group calculator 96
Configuring default placement group count 97
Placement group count for small clusters 97
Calculating placement group count 97
Maximum placement group count 97
Auto-scaling placement groups 98
Placement group auto-scaling 98
Viewing placement group scaling recommendations 98
Placement group splitting and merging 100
Setting placement group auto-scaling modes 101
Viewing placement group scaling recommendations 98
Setting placement group auto-scaling 103
Setting minimum and maximum number of placement groups for pools 104
Updating noautoscale flag 105
Specifying target pool size 105
Specifying target size using the absolute size of the pool 106
Specifying target size using the total cluster capacity 106
Placement group command line interface 106
Setting number of placement groups in a pool 107
Getting number of placement groups in a pool 107
Getting statistics for placement groups 107
Getting statistics for stuck placement groups 108
Getting placement group maps 108
Scrubbing placement groups 108
Getting placement group statistics 108
Marking unfound objects 109
Pools overview 109
Pools and storage strategies overview 110
Listing pool 111
Creating a pool 111
Setting pool quota 113
Deleting a pool 113
Renaming a pool 113
Viewing pool statistics 114
Setting pool values 114
Getting pool values 114
Enabling a client application 114
Disabling a client application 115
Setting application metadata 115
Removing application metadata 116
Setting the number of object replicas 116
Getting the number of object replicas 116
Pool values 117
Erasure code pools overview 120
Creating a sample erasure-coded pool 121
Erasure code profiles 121
Setting OSD erasure-code-profile 123
Removing OSD erasure-code-profile 124
Getting OSD erasure-code-profile 124
Listing OSD erasure-code-profile 125
Erasure Coding with Overwrites 125
Erasure Code Plugins 125
Creating a new erasure code profile using jerasure erasure code plugin 125
Controlling CRUSH Placement 127
Installing 128
IBM Storage Ceph 128
IBM Storage Ceph considerations and recommendations 129
Basic IBM Storage Ceph considerations 129
IBM Storage Ceph workload considerations 131
Network considerations for IBM Storage Ceph 133
Considerations for using a RAID controller with OSD hosts 134
Tuning considerations for the Linux kernel when running Ceph 135
How colocation works and its advantages 135
Operating system requirements for IBM Storage Ceph 138
Minimum hardware considerations for IBM Storage Ceph 139
IBM Storage Ceph installation 140
cephadm utility 141
How cephadm works 142
cephadm-ansible playbooks 143
Registering the IBM Storage Ceph nodes 144
Configuring Ansible inventory location 145
Enabling SSH login as root user on Red Hat Enterprise Linux 9 146
Creating an Ansible user with sudo access 147
Enabling password-less SSH for Ansible 148
Configuring SSH 149
Configuring a different SSH user 150
Running the preflight playbook 151
Bootstrapping a new storage cluster 153
Recommended cephadm bootstrap command options 155
Obtaining entitlement key 155
Using a JSON file to protect login information 156
Bootstrapping a storage cluster using a service configuration file 156
Bootstrapping the storage cluster as a non-root user 158
Bootstrap command options 159
Configuring a private registry for a disconnected installation 160
Running the preflight playbook for a disconnected installation 165
Performing a disconnected installation 166
Changing configurations of custom container images for disconnected installations 168
Distributing SSH keys 169
Launching the cephadm shell 170
Verifying the cluster installation 171
Adding hosts 172
Using the addr option to identify hosts 174
Adding multiple hosts 174
Adding hosts in disconnected deployments 176
Removing hosts 176
Labeling hosts 177
Adding a label to a host 178
Removing a label from a host 178
Using host labels to deploy daemons on specific hosts 179
Adding Monitor service 181
Setting up the admin node 182
Deploying Ceph monitor nodes using host labels 183
Adding Ceph Monitor nodes by IP address or network name 184
Removing the admin label from a host 185
Adding Manager service 186
Adding OSDs 186
Purging the Ceph storage cluster 187
Deploying client nodes 189
Managing an IBM Storage Ceph cluster using cephadm-ansible modules 191
cephadm-ansible modules 191
cephadm-ansible modules options 192
Bootstrapping a storage cluster using the cephadm_bootstrap and cephadm_registry_login modules 193
Adding or removing hosts using the ceph_orch_host module 195
Setting configuration options using the ceph_config module 199
Applying a service specification using the ceph_orch_apply module 201
Managing Ceph daemon states using the ceph_orch_daemon module 202
Comparison between Ceph Ansible and Cephadm 203
cephadm commands 204
What to do next? Day 2 209
Upgrading 209
Upgrading to an IBM Storage Ceph cluster using cephadm 209
Upgrading to IBM Storage Ceph cluster 210
Upgrading the IBM Storage Ceph cluster in a disconnected environment 213
Staggered upgrade 215
Staggered upgrade options 215
Performing a staggered upgrade 216
Monitoring and managing upgrade of the storage cluster 218
Troubleshooting upgrade error messages 218
Configuring 219
The basics of Ceph configuration 219
Ceph configuration 219
The Ceph configuration database 220
Using the Ceph metavariables 222
Viewing the Ceph configuration at runtime 222
Viewing a specific configuration at runtime 223
Setting a specific configuration at runtime 223
OSD Memory Target 224
Setting the OSD memory target 224
MDS Memory Cache Limit 225
Ceph network configuration 225
Network configuration for Ceph 226
Ceph network messenger 228
Configuring a public network 228
Configuring multiple public networks to the cluster 229
Configuring a private network 231
Verifying firewall rules are configured for default Ceph ports 232
Firewall settings for Ceph Monitor node 232
Firewall settings for Ceph OSDs 233
Ceph Monitor configuration 234
Ceph Monitor configuration 235
Viewing the Ceph Monitor configuration database 235
Ceph cluster maps 236
Ceph Monitor quorum 236
Ceph Monitor consistency 237
Bootstrap the Ceph Monitor 237
Minimum configuration for a Ceph Monitor 238
Unique identifier for Ceph 238
Ceph Monitor data store 238
Ceph storage capacity 239
Ceph heartbeat 240
Ceph Monitor synchronization role 240
Ceph time synchronization 241
Ceph authentication configuration 241
Cephx authentication 242
Enabling Cephx 242
Disabling Cephx 243
Cephx user keyrings 244
Cephx daemon keyrings 244
Cephx message signatures 244
Pools, placement groups, and CRUSH configuration 244
Pools placement groups and CRUSH 245
Ceph Object Storage Daemon (OSD) configuration 245
Ceph OSD configuration 245
Scrubbing the OSD 246
Backfilling an OSD 246
OSD recovery 246
Ceph Monitor and OSD interaction configuration 247
Ceph Monitor and OSD interaction 247
OSD heartbeat 247
Reporting an OSD as down 248
Reporting a peering failure 249
OSD reporting status 249
Ceph debugging and logging configuration 250
General configuration options 251
Ceph network configuration options 252
Ceph Monitor configuration options 256
Cephx configuration options 269
Pools, placement groups, and CRUSH configuration options 271
Object Storage Daemon (OSD) configuration options 275
Ceph Monitor and OSD configuration options 287
Ceph debugging and logging configuration options 290
Ceph scrubbing options 294
BlueStore configuration options 298
Administering 298
Administration 299
Ceph administration 299
Understanding process management for Ceph 299
Ceph process management 300
Starting, stopping, and restarting all Ceph daemons 300
Starting, stopping, and restarting all Ceph services 301
Viewing log files of Ceph daemons that run in containers 302
Powering down and rebooting IBM Storage Ceph cluster 303
Powering down and rebooting the cluster using the systemctl commands 303
Powering down and rebooting the cluster using the Ceph Orchestrator 305
Monitoring a Ceph storage cluster 308
High-level monitoring of a Ceph storage cluster 309
Using the Ceph command interface interactively 309
Checking the storage cluster health 309
Watching storage cluster events 310
How Ceph calculates data usage 311
Understanding the storage clusters usage stats 312
Understanding the OSD usage stats 313
Checking the storage cluster status 314
Checking the Ceph Monitor status 315
Using the Ceph administration socket 317
Understanding the Ceph OSD status 321
Low-level monitoring of a Ceph storage cluster 323
Monitoring Placement Group Sets 323
Ceph OSD peering 324
Placement Group States 324
Placement Group creating state 327
Placement group peering state 327
Placement group active state 327
Placement Group clean state 328
Placement Group degraded state 328
Placement Group recovering state 328
Back fill state 328
Placement Group remapped state 329
Placement Group stale state 329
Placement Group misplaced state 329
Placement Group incomplete state 330
Identifying stuck Placement Groups 330
Finding an object’s location 331
Stretch clusters for Ceph storage 331
Stretch mode for a storage cluster 332
Setting the crush location for the daemons 333
Entering the stretch mode 336
Adding OSD hosts in stretch mode 338
Override Ceph behavior 339
Setting and unsetting Ceph override options 340
Ceph override use cases 340
Ceph user management 341
Ceph user management background 341
Managing Ceph users 343
Listing Ceph users 343
Display Ceph user information 345
Add a new Ceph user 346
Modifying a Ceph User 346
Deleting a Ceph user 347
Print a Ceph user key 347
The ceph-volume utility 348
Ceph volume lvm plugin 348
Why does ceph-volume replace ceph-disk? 349
Preparing Ceph OSDs using ceph-volume 350
Listing devices using ceph-volume 351
Activating Ceph OSDs using ceph-volume 352
Deactivating Ceph OSDs using ceph-volume 353
Creating Ceph OSDs using ceph-volume 354
Migrating BlueFS data 355
Using batch mode with ceph-volume 357
Zapping data using ceph-volume 357
Ceph performance benchmark 358
Performance baseline 359
Benchmarking Ceph performance 359
Benchmarking Ceph block performance 361
Ceph performance counters 362
Access to Ceph performance counters 363
Display the Ceph performance counters 363
Dump the Ceph performance counters 364
Average count and sum 365
Ceph Monitor metrics 365
Ceph OSD metrics 367
Ceph Object Gateway metrics 372
BlueStore 374
Ceph BlueStore 374
Ceph BlueStore devices 375
Ceph BlueStore caching 376
Sizing considerations for Ceph BlueStore 376
Tuning Ceph BlueStore using bluestore_min_alloc_size parameter 376
Resharding the RocksDB database using the BlueStore admin tool 378
The BlueStore fragmentation tool 379
What is the BlueStore fragmentation tool? 380
Checking for fragmentation 380
Ceph BlueStore BlueFS 381
Viewing the bluefs_buffered_io setting 382
Viewing Ceph BlueFS statistics for Ceph OSDs 383
Cephadm troubleshooting 384
Pause or disable cephadm 385
Per service and per daemon event 385
Check cephadm logs 386
Gather log files 386
Collect systemd status 387
List all downloaded container images 387
Manually run containers 387
CIDR network error 388
Access the admin socket 388
Manually deploying a mgr daemon 388
Cephadm operations 390
Monitor cephadm log messages 390
Ceph daemon logs 391
Data location 392
Cephadm health checks 392
Cephadm operations health checks 392
Cephadm configuration health checks 393
Managing an IBM Storage Ceph cluster using cephadm-ansible modules 394
The cephadm-ansible modules 395
The cephadm-ansible modules options 395
Bootstrapping a storage cluster using the cephadm_bootstrap and cephadm_registry_login modules 397
Adding or removing hosts using the ceph_orch_host module 399
Setting configuration options using the ceph_config module 402
Applying a service specification using the ceph_orch_apply module 404
Managing Ceph daemon states using the ceph_orch_daemon module 406
Operations 407
Introduction to the Ceph Orchestrator 407
Use of the Ceph Orchestrator 407
Management of services 409
Checking service status 409
Checking daemon status 410
Placement specification of the Ceph Orchestrator 411
Deploying the Ceph daemons using the command line interface 411
Deploying the Ceph daemons on a subset of hosts using the command line interface 413
Service specification of the Ceph Orchestrator 414
Deploying the Ceph daemons using the service specification 414
Management of hosts 416
Adding hosts 416
Adding multiple hosts 418
Listing hosts 419
Adding labels to hosts 420
Removing labels from hosts 421
Removing hosts 422
Placing hosts in the maintenance mode 423
Management of monitors 424
Ceph Monitors 424
Configuring monitor election strategy 425
Deploying the Ceph monitor daemons using the command line interface 425
Deploying the Ceph monitor daemons using the service specification 427
Deploying the monitor daemons on specific network 428
Removing the monitor daemons 429
Removing a Ceph Monitor from an unhealthy storage cluster 430
Management of managers 432
Deploying the manager daemons 432
Removing the manager daemons 433
Using Ceph Manager modules 434
Using the Ceph Manager balancer module 436
Using the Ceph Manager alerts module 439
Using the Ceph manager crash module 441
Management of OSDs 443
Ceph OSDs 444
Ceph OSD node configuration 444
Automatically tuning OSD memory 444
Listing devices for Ceph OSD deployment 445
Zapping devices for Ceph OSD deployment 447
Deploying Ceph OSDs on all available devices 448
Deploying Ceph OSDs on specific devices and hosts 449
Advanced service specifications and filters for deploying OSDs 450
Deploying Ceph OSDs using advanced service specifications 452
Removing the OSD daemons 455
Replacing the OSDs 457
Replacing the OSDs with pre-created LVM 458
Replacing the OSDs in a non-colocated scenario 460
Stopping the removal of the OSDs 464
Activating the OSDs 465
Observing the data migration 466
Recalculating the placement groups 467
Management of monitoring stack 467
Deploying the monitoring stack 468
Removing the monitoring stack 470
Basic IBM Storage Ceph client setup 471
Configuring file setup on client machines 471
Setting-up keyring on client machines 472
Management of MDS service 472
Deploying the MDS service using the command line interface 473
Deploying the MDS service using the service specification 475
Removing the MDS service 476
Management of Ceph object gateway 478
Deploying the Ceph Object Gateway using the command line interface 478
Deploying the Ceph Object Gateway using the service specification 480
Deploying a multi-site Ceph Object Gateway 482
Removing the Ceph Object Gateway 486
Configuration of SNMP traps 487
Simple network management protocol 487
Configuring snmptrapd 488
Deploying the SNMP gateway 491
Handling a node failure 494
Considerations before adding or removing a node 494
Performance considerations 495
Recommendations for adding or removing nodes 496
Adding a Ceph OSD node 496
Removing a Ceph OSD node 498
Simulating a node failure 499
Handling a data center failure 500
Avoiding a data center failure 500
Handling a data center failure 501
Dashboard 502
Ceph dashboard overview 503
Ceph Dashboard components 503
Ceph Dashboard features 504
IBM Storage Ceph Dashboard architecture 505
Ceph Dashboard installation and access 506
Network port requirements for Ceph Dashboard 507
Accessing the Ceph dashboard 508
Setting message of the day (MOTD) 509
Expanding the cluster 511
Toggling Ceph dashboard features 512
Understanding the landing page of the Ceph dashboard 515
Changing the dashboard password 517
Changing the Ceph dashboard password using the command line interface 518
Setting admin user password for Grafana 518
Enabling IBM Storage Ceph Dashboard manually 520
Creating an admin account for syncing users to the Ceph dashboard 521
Syncing users to the Ceph dashboard using Red Hat Single Sign-On 522
Enabling Single Sign-On for the Ceph Dashboard 525
Disabling Single Sign-On for the Ceph Dashboard 526
Management of roles 527
User roles and permissions 527
Creating roles 529
Editing roles 531
Cloning roles 531
Deleting roles 532
Management of users 532
Creating users 533
Editing users 534
Deleting users 535
Management of Ceph daemons 535
Daemon actions 535
Monitor the cluster 536
Monitoring hosts of the Ceph cluster 537
Viewing and editing the configuration of the Ceph cluster 538
Viewing and editing the manager modules of the Ceph cluster 538
Monitoring monitors of the Ceph cluster 539
Monitoring services of the Ceph cluster 540
Monitoring Ceph OSDs 540
Monitoring HAProxy 541
Viewing the CRUSH map of the Ceph cluster 542
Filtering logs of the Ceph cluster 543
Monitoring pools of the Ceph cluster 544
Monitoring Ceph file systems 544
Monitoring Ceph object gateway daemons 545
Monitoring Block device images 545
Management of Alerts 546
Enabling monitoring stack 547
Configuring Grafana certificate 549
Adding Alertmanager webhooks 550
Viewing alerts 552
Creating a silence 552
Re-creating a silence 553
Editing a silence 554
Expiring a silence 554
Management of pools 555
Creating pools 555
Editing pools 556
Deleting pools 556
Management of hosts 558
Entering maintenance mode 558
Exiting maintenance mode 559
Removing hosts 560
Management of Ceph OSDs 561
Managing the OSDs 562
Replacing the failed OSDs 564
Management of Ceph Object Gateway 566
Manually adding Ceph object gateway login credentials to the dashboard 566
Creating the Ceph Object Gateway services with SSL using the dashboard 568
Management of Ceph Object Gateway users 569
Creating Ceph object gateway users 570
Creating Ceph object gateway subusers 571
Editing Ceph object gateway users on the dashboard 572
Deleting Ceph object gateway users 573
Management of Ceph Object Gateway buckets 574
Creating Ceph object gateway buckets 574
Editing Ceph object gateway buckets 575
Deleting Ceph object gateway buckets 576
Monitoring multisite object gateway configuration 577
Management of buckets of a multisite object configuration 578
Editing buckets of a multisite object gateway configuration 578
Deleting buckets of a multisite object gateway configuration 580
Management of block devices 581
Management of block device images 582
Creating images 582
Creating namespaces 583
Editing images 584
Copying images 585
Moving images to trash 586
Purging trash 586
Restoring images from trash 587
Deleting images 587
Deleting namespaces 588
Creating snapshots of images 589
Renaming snapshots of images 589
Protecting snapshots of images 590
Cloning snapshots of images 591
Copying snapshots of images 592
Unprotecting snapshots of images 593
Rolling back snapshots of images 594
Deleting snapshots of images 594
Management of mirroring functions 595
Mirroring view 595
Editing mode of pools 596
Adding peer in mirroring 596
Editing peer in mirroring 598
Deleting peer in mirroring 599
Activating and deactivating telemetry 600
Ceph Object Gateway 601
The Ceph Object Gateway 601
Considerations and recommendations 602
Network considerations for IBM Storage Ceph 603
Basic IBM Storage Ceph considerations 604
Colocating Ceph daemons and its advantages 605
IBM Storage Ceph workload considerations 607
Ceph Object Gateway considerations 610
Administrative data storage 610
Index pool 611
Data pool 612
Data extra pool 612
Developing CRUSH hierarchies 612
Creating CRUSH roots 613
Creating CRUSH rules 613
Ceph Object Gateway multi-site considerations 615
Considering storage sizing 616
Considering storage density 617
Considering disks for the Ceph Monitor nodes 617
Adjusting backfill and recovery settings 617
Adjusting the cluster map size 617
Adjusting scrubbing 618
Increase rgw_thread_pool_size 618
Increase objecter_inflight_ops 618
Tuning considerations for the Linux kernel when running Ceph 618
Deployment 619
Deploying the Ceph Object Gateway using the command line interface 620
Deploying the Ceph Object Gateway using the service specification 622
Deploying a multi-site Ceph Object Gateway using the Ceph Orchestrator 624
Removing the Ceph Object Gateway using the Ceph Orchestrator 627
Basic configuration 628
Add a wildcard to the DNS 629
The Beast front-end web server 631
Configuring SSL for Beast 631
Adjusting logging and debugging output 632
Static web hosting 633
Static web hosting assumptions 634
Static web hosting requirements 634
Static web hosting gateway setup 634
Static web hosting DNS configuration 635
Creating a static web hosting site 636
High availability for the Ceph Object Gateway 636
High availability service 636
Configuring high availability for the Ceph Object Gateway 637
HAProxy/keepalived Prerequisites 639
HAProxy/keepalived Prerequisites 640
Preparing HAProxy Nodes 640
Installing and Configuring HAProxy 641
Installing and Configuring keepalived 642
Advanced configuration 644
Multi-site configuration and administration 644
Requirements and Assumptions 645
Pools 647
Migrating a single site system to multi-site 648
Establishing a secondary zone 649
Configuring the archive zone (Technology Preview) 652
Deleting objects in archive zone 652
Failover and disaster recovery 654
Configuring multiple zones without replication 656
Configuring multiple realms in the same storage cluster 658
Multi-site Ceph Object Gateway command line usage 666
Realms 666
Creating a realm 666
Making a Realm the Default 667
Deleting a Realm 667
Getting a realm 667
Listing realms 667
Setting a realm 668
Listing Realm Periods 668
Pulling a Realm 668
Renaming a Realm 668
Zone Groups 668
Creating a Zone Group 669
Making a Zone Group the Default 669
Renaming a Zone Group 669
Deleting a zone group 670
Listing Zone Groups 670
Getting a Zone Group 670
Setting a Zone Group Map 671
Setting a Zone Group 672
Zones 673
Creating a Zone 673
Deleting a zone 674
Modifying a Zone 674
Listing Zones 675
Getting a Zone 675
Setting a Zone 675
Renaming a zone 676
Adding a Zone to a Zone Group 676
Removing a Zone from a Zone Group 676
Configure LDAP and Ceph Object Gateway 677
Install Red Hat Directory Server 677
Configure the Directory Server firewall 677
Label ports for SELinux 678
Configure LDAPS 678
Check if the gateway user exists 678
Add a gateway user 679
Configure the gateway to use LDAP 679
Using a custom search filter 680
Add an S3 user to the LDAP server 680
Export an LDAP token 681
Test the configuration with an S3 client 681
Configure Active Directory and Ceph Object Gateway 682
Using Microsoft Active Directory 683
Configuring Active Directory for LDAPS 683
Check if the gateway user exists 683
Add a gateway user 683
Configuring the gateway to use Active Directory 684
Add an S3 user to the LDAP server 684
Export an LDAP token 685
Test the configuration with an S3 client 685
The Ceph Object Gateway and OpenStack Keystone 686
Roles for Keystone authentication 687
Keystone authentication and the Ceph Object Gateway 687
Creating the Swift service 687
Setting the Ceph Object Gateway endpoints 688
Verifying Openstack is using the Ceph Object Gateway endpoints 689
Configuring the Ceph Object Gateway to use Keystone SSL 690
Configuring the Ceph Object Gateway to use Keystone authentication 690
Restarting the Ceph Object Gateway daemon 691
Security 692
S3 server-side encryption 692
Server-side encryption requests 693
Configuring server-side encryption 693
The HashiCorp Vault 695
Secret engines for Vault 696
Authentication for Vault 697
Namespaces for Vault 697
Transit engine compatibility support 697
Creating token policies for Vault 698
Configuring the Ceph Object Gateway to use SSE-S3 with Vault 699
Configuring the Ceph Object Gateway to use SSE-KMS with Vault 702
Creating a key using the kv engine 704
Creating a key using the transit engine 705
Uploading an object using AWS and the Vault 706
The Ceph Object Gateway and multi-factor authentication 707
Multi-factor authentication 707
Creating a seed for multi-factor authentication 707
Creating a new multi-factor authentication TOTP token 708
Test a multi-factor authentication TOTP token 709
Resynchronizing a multi-factor authentication TOTP token 710
Listing multi-factor authentication TOTP tokens 711
Display a multi-factor authentication TOTP token 711
Deleting a multi-factor authentication TOTP token 712
Administration 713
Creating storage policies 713
Creating indexless buckets 715
Configure bucket index resharding 716
Bucket index resharding 716
Recovering bucket index 717
Limitations of bucket index resharding 718
Configuring bucket index resharding in simple deployments 718
Configuring bucket index resharding in multi-site deployments 719
Resharding bucket index dynamically 721
Resharding bucket index dynamically in multi-site configuration 723
Resharding bucket index manually 725
Cleaning stale instances of bucket entries after resharding 726
Fixing lifecycle policies after resharding 727
Enabling compression 727
User management 728
Multi-tenant namespace 729
Create a user 730
Create a subuser 730
Get user information 731
Modify user information 731
Enable and suspend users 731
Remove a user 732
Remove a subuser 732
Rename a user 732
Create a key 735
Add and remove access keys 735
Add and remove admin capabilities 736
Role management 736
Creating a role 737
Getting a role 738
Listing a role 738
Updating assume role policy document of a role 739
Getting permission policy attached to a role 740
Listing permission policy attached to a role 741
Deleting policy attached to a role 741
Deleting a role 742
Updating the session duration of a role 743
Quota management 744
Set user quotas 744
Enable and disable user quotas 744
Set bucket quotas 745
Enable and disable bucket quotas 745
Get quota settings 745
Update quota stats 745
Get user quota usage stats 746
Quota cache 746
Reading and writing global quotas 746
Bucket management 746
Renaming buckets 747
Moving buckets 748
Moving buckets between non-tenanted users 748
Moving buckets between tenanted users 749
Moving buckets from non-tenanted users to tenanted users 750
Finding orphan and leaky objects 751
Managing bucket index entries 753
Bucket notifications 754
Creating bucket notifications 755
Bucket lifecycle 757
Creating a lifecycle management policy 758
Deleting a lifecycle management policy 760
Updating a lifecycle management policy 761
Monitoring bucket lifecycles 764
Configuring lifecycle expiration window 765
S3 bucket lifecycle transition within a storage cluster 766
Transitioning an object from one storage class to another 766
Enabling object lock for S3 772
Usage 774
Show usage 774
Trim usage 775
Ceph Object Gateway data layout 775
Object lookup path 776
Multiple data pools 776
Bucket and object listing 776
Object Gateway data layout parameters 777
Optimize the Ceph Object Gateway's garbage collection 778
Viewing the garbage collection queue 778
Adjusting Garbage Collection Settings 778
Adjusting garbage collection for delete-heavy workloads 779
Optimize the Ceph Object Gateway's data object storage 780
Parallel thread processing for bucket life cycles 780
Optimizing the bucket lifecycle 781
Testing 781
Create an S3 user 782
Create a Swift user 783
Test S3 access 785
Test Swift access 786
Configuration reference 786
General settings 787
About pools 789
Lifecycle settings 790
Swift settings 790
Logging settings 791
Keystone settings 791
LDAP settings 792
Block devices 792
Introduction to Ceph block devices 792
Ceph block devices 793
Displaying the command help 793
Creating a block device pool 794
Creating a block device image 794
Listing the block device images 795
Retrieving the block device image information 796
Resizing a block device image 796
Removing a block device image 797
Moving a block device image to the trash 798
Defining an automatic trash purge schedule 799
Enabling and disabling image features 799
Working with image metadata 800
Moving images between pools 802
The rbdmap service 803
Configuring the rbdmap service 804
Persistent Write Log Cache (Technology Preview) 804
Persistent write log cache limitations 805
Enabling persistent write log cache 805
Checking persistent write log cache status 807
Flushing persistent write log cache 808
Discarding persistent write log cache 808
Monitoring performance of Ceph Block Devices using the command-line interface 809
Live migration of images 810
The live migration process 810
Formats 811
Streams 812
Preparing the live migration process 813
Preparing import-only migration 814
Executing the live migration process 815
Committing the live migration process 815
Aborting the live migration process 816
Image encryption 817
Encryption format 817
Encryption load 817
Supported formats 818
Adding encryption format to images and clones 819
Snapshot management 820
Ceph block device snapshots 821
The Ceph user and keyring 821
Creating a block device snapshot 821
Listing the block device snapshots 822
Rolling back a block device snapshot 823
Deleting a block device snapshot 823
Purging the block device snapshots 824
Renaming a block device snapshot 824
Ceph block device layering 825
Protecting a block device snapshot 826
Cloning a block device snapshot 826
Unprotecting a block device snapshot 827
Listing the children of a snapshot 827
Flattening cloned images 828
Mirroring Ceph block devices 828
Ceph block device mirroring 829
An overview of journal-based and snapshot-based mirroring 831
Configuring one-way mirroring using the command-line interface 831
Configuring two-way mirroring using the command-line interface 834
Administration for mirroring Ceph block devices 837
Viewing information about peers 838
Enabling mirroring on a pool 838
Disabling mirroring on a pool 839
Enabling image mirroring 839
Disabling image mirroring 840
Image promotion and demotion 841
Image resynchronization 841
Adding a storage cluster peer 842
Removing a storage cluster peer 843
Getting mirroring status for a pool 843
Getting mirroring status for a single image 844
Delaying block device replication 844
Asynchronous updates and Ceph block device mirroring 845
Converting journal-based mirroring to snapshot-based mirroring 845
Creating an image mirror-snapshot 846
Scheduling mirror-snapshots 846
Creating a mirror-snapshot schedule 847
Listing all snapshot schedules at a specific level 847
Removing a mirror-snapshot schedule 848
Viewing the status for the next snapshots to be created 849
Recover from a disaster 849
Disaster recovery 850
Recover from a disaster with one-way mirroring 850
Recover from a disaster with two-way mirroring 850
Failover after an orderly shutdown 850
Failover after a non-orderly shutdown 851
Prepare for fail back 852
Fail back to the primary storage cluster 854
Remove two-way mirroring 856
Management of ceph-immutable-object-cache daemons 857
Explanation of ceph-immutable-object-cache daemons 857
Configuring the ceph-immutable-object-cache daemon 858
Generic settings of ceph-immutable-object-cache daemons 860
QOS settings of ceph-immutable-object-cache daemons 860
The rbd kernel module 862
Creating a Ceph Block Device and using it from a Linux kernel module client 862
Creating a Ceph block device for a Linux kernel module client using dashboard 862
Map and mount a Ceph Block Device on Linux using the command line 863
Mapping a block device 865
Displaying mapped block devices 866
Unmapping a block device 867
Using the Ceph block device Python module 867
Ceph block device configuration reference 868
Block device default options 869
Block device general options 870
Block device caching options 872
Block device parent and child read options 874
Block device read ahead options 874
Block device blocklist options 875
Block device journal options 875
Block device configuration override options 877
Block device input and output options 879
Developer 880
Ceph RESTful API 880
Prerequisites 881
Versioning for the Ceph API 881
Authentication and authorization for the Ceph API 881
Enabling and Securing the Ceph API module 882
Questions and Answers 883
Getting information 883
How Can I View All Cluster Configuration Options? 884
How Can I View a Particular Cluster Configuration Option? 885
How Can I View All Configuration Options for OSDs? 886
How Can I View CRUSH Rules? 887
How Can I View Information about Monitors? 888
How Can I View Information About a Particular Monitor? 889
How Can I View Information about OSDs? 890
How Can I View Information about a Particular OSD? 891
How Can I Determine What Processes Can Be Scheduled on an OSD? 892
How Can I View Information About Pools? 894
How Can I View Information About a Particular Pool? 895
How Can I View Information About Hosts? 896
How Can I View Information About a Particular Host? 897
Changing Configuration 898
How Can I Change OSD Configuration Options? 898
How Can I Change the OSD State? 899
How Can I Reweight an OSD? 900
How Can I Change Information for a Pool? 901
Administering the Cluster 902
How Can I Run a Scheduled Process on an OSD? 902
How Can I Create a New Pool? 903
How Can I Remove Pool? 904
Ceph Object Gateway administrative API 905
Prerequisites 907
Administration operations 907
Administration authentication requests 907
Creating an administrative user 913
Get user information 915
Create a user 917
Modify a user 921
Remove a user 926
Create a subuser 927
Modify a subuser 929
Remove a subuser 931
Add capabilities to a user 932
Remove capabilities from a user 934
Create a key 935
Remove a key 938
Bucket notifications 939
Prerequisites 939
Overview of bucket notifications 939
Persistent notifications 940
Creating a topic 940
Getting topic information 942
Listing topics 943
Deleting topics 944
Using the command-line interface for topic management 944
Event record 945
Supported event types 947
Get bucket information 947
Check a bucket index 950
Remove a bucket 951
Link a bucket 952
Unlink a bucket 954
Get a bucket or object policy 955
Remove an object 956
Quotas 957
Get a user quota 957
Set a user quota 957
Get a bucket quota 958
Set a bucket quota 960
Get usage information 960
Remove usage information 963
Standard error responses 964
Ceph Object Gateway and the S3 API 965
Prerequisites 907
S3 limitations 965
Accessing the Ceph Object Gateway with the S3 API 966
Prerequisites 966
S3 authentication 966
S3 server-side encryption 968
S3 access control lists 968
Preparing access to the Ceph Object Gateway using S3 969
Accessing the Ceph Object Gateway using Ruby AWS S3 970
Accessing the Ceph Object Gateway using Ruby AWS SDK 973
Accessing the Ceph Object Gateway using PHP 977
Secure Token Service 980
The Secure Token Service application programming interfaces 981
Configuring the Secure Token Service 984
Creating a user for an OpenID Connect provider 985
Obtaining a thumbprint of an OpenID Connect provider 986
Configuring and using STS Lite with Keystone (Technology Preview) 987
Working around the limitations of using STS Lite with Keystone (Technology Preview) 989
S3 bucket operations 990
Prerequisites 992
S3 create bucket notifications 992
S3 get bucket notifications 995
S3 delete bucket notifications 997
Accessing bucket host names 997
S3 list buckets 998
S3 return a list of bucket objects 999
S3 create a new bucket 1001
S3 put bucket website 1002
S3 get bucket website 1003
S3 delete bucket website 1003
S3 delete a bucket 1003
S3 bucket lifecycle 1004
S3 GET bucket lifecycle 1005
S3 create or replace a bucket lifecycle 1006
S3 delete a bucket lifecycle 1006
S3 get bucket location 1007
S3 get bucket versioning 1007
S3 put bucket versioning 1007
S3 get bucket access control lists 1008
S3 put bucket Access Control Lists 1009
S3 get bucket cors 1010
S3 put bucket cors 1010
S3 delete a bucket cors 1011
S3 list bucket object versions 1011
S3 head bucket 1013
S3 list multipart uploads 1013
S3 bucket policies 1017
S3 get the request payment configuration on a bucket 1019
S3 set the request payment configuration on a bucket 1019
Multi-tenant bucket operations 1020
S3 Block Public Access 1020
S3 GET PublicAccessBlock 1022
S3 PUT PublicAccessBlock 1022
S3 delete PublicAccessBlock 1023
S3 object operations 1023
Prerequisites 1024
S3 get an object from a bucket 1025
S3 get information on an object 1026
S3 put object lock 1027
S3 get object lock 1028
S3 put object legal hold 1030
S3 get object legal hold 1031
S3 put object retention 1031
S3 get object retention 1032
S3 put object tagging 1033
S3 get object tagging 1034
S3 delete object tagging 1034
S3 add an object to a bucket 1035
S3 delete an object 1035
S3 delete multiple objects 1036
S3 get an object’s Access Control List (ACL) 1036
S3 set an object’s Access Control List (ACL) 1037
S3 copy an object 1038
S3 add an object to a bucket using HTML forms 1040
S3 determine options for a request 1040
S3 initiate a multipart upload 1040
S3 add a part to a multipart upload 1041
S3 list the parts of a multipart upload 1042
S3 assemble the uploaded parts 1044
S3 copy a multipart upload 1045
S3 abort a multipart upload 1046
S3 Hadoop interoperability 1046
S3 select operations (Technology Preview) 1047
Prerequisites 1047
S3 select content from an object 1047
S3 supported select functions 1051
S3 alias programming construct 1053
S3 CSV parsing explained 1053
Ceph Object Gateway and the Swift API 1054
Prerequisites 1055
Swift API limitations 1055
Create a Swift user 1055
Swift authenticating a user 1057
Swift container operations 1058
Prerequisites 1058
Swift container operations 1058
Swift update a container’s Access Control List (ACL) 1059
Swift list containers 1059
Swift list a container’s objects 1061
Swift create a container 1063
Swift delete a container 1064
Swift add or update the container metadata 1064
Swift object operations 1065
Prerequisites 1065
Swift object operations 1065
Swift get an object 1065
Swift create or update an object 1066
Swift delete an object 1067
Swift copy an object 1068
Swift get object metadata 1069
Swift add or update object metadata 1069
Swift temporary URL operations 1070
Swift get temporary URL objects 1070
Swift POST temporary URL keys 1070
Swift multi-tenancy container operations 1071
The Ceph RESTful API specifications 1071
Prerequisites 1072
Ceph summary 1072
Authentication 1073
Ceph File System 1074
Storage cluster configuration 1079
CRUSH rules 1081
Erasure code profiles 1083
Feature toggles 1084
Grafana 1085
Storage cluster health 1086
Host 1087
Logs 1091
Ceph Manager modules 1091
Ceph Monitor 1094
Ceph OSD 1094
Ceph Object Gateway 1102
REST APIs for manipulating a role 1111
Ceph Orchestrator 1113
Pools 1114
Prometheus 1116
RADOS block device 1118
Performance counters 1132
Roles 1135
Services 1137
Settings 1139
Ceph task 1141
Telemetry 1142
Ceph users 1143
S3 common request headers 1146
S3 common response status codes 1146
S3 unsupported header fields 1147
Swift request headers 1147
Swift response headers 1147
Examples using the Secure Token Service APIs 1147
Troubleshooting 1149
Initial Troubleshooting 1150
Identifying problems 1150
Diagnosing the health of a storage cluster 1150
Understanding Ceph health 1151
Muting health alerts of a Ceph cluster 1152
Understanding Ceph logs 1153
Generating an sos report 1154
Configuring logging 1154
Ceph subsystems 1155
Configuring logging at runtime 1156
Configuring logging in configuration file 1157
Accelerating log rotation 1158
Creating and collecting operation logs for Ceph Object Gateway 1158
Troubleshooting networking issues 1159
Basic networking troubleshooting 1160
Basic chrony NTP troubleshooting 1163
Troubleshooting Ceph Monitors 1164
Most common Ceph Monitor errors 1164
Ceph Monitor error messages 1164
Common Ceph Monitor error messages in the Ceph logs 1164
Ceph Monitor is out of quorum 1165
Clock skew 1166
The Ceph Monitor store is getting too big 1167
Understanding Ceph Monitor status 1168
Injecting a monmap 1169
Replacing a failed Monitor 1170
Compacting the monitor store 1171
Opening port for Ceph manager 1172
Recovering the Ceph Monitor store 1173
Recovering the Ceph Monitor store when using BlueStore 1173
Troubleshooting Ceph OSDs 1176
Most common Ceph OSD errors 1176
Ceph OSD error messages 1177
Common Ceph OSD error messages in the Ceph logs 1177
Full OSDs 1177
Backfillfull OSDs 1178
Nearfull OSDs 1178
Down OSDs 1179
Flapping OSDs 1181
Slow requests or requests are blocked 1182
Stopping and starting rebalancing 1183
Mounting the OSD data partition 1184
Replacing an OSD drive 1185
Increasing the PID count 1187
Deleting data from a full storage cluster 1187
Troubleshooting a multi-site Ceph Object Gateway 1188
Error code definitions for the Ceph Object Gateway 1188
Syncing a multisite Ceph Object Gateway 1189
Performance counters for multi-site Ceph Object Gateway data sync 1190
Synchronizing data in a multi-site Ceph Object Gateway configuration 1191
Troubleshooting Ceph placement groups 1192
Most common Ceph placement groups errors 1192
Placement group error messages 1192
Stale placement groups 1193
Inconsistent placement groups 1193
Unclean placement groups 1195
Inactive placement groups 1195
Placement groups are down 1195
Unfound objects 1196
Listing placement groups stuck in stale, inactive, or unclean state 1198
Listing placement group inconsistencies 1199
Repairing inconsistent placement groups 1202
Increasing the placement group 1202
Troubleshooting Ceph objects 1204
Troubleshooting high-level object operations 1204
Listing objects 1204
Fixing lost objects 1205
Troubleshooting low-level object operations 1206
Manipulating the object’s content 1207
Removing an object 1208
Listing the object map 1209
Manipulating the object map header 1209
Manipulating the object map key 1210
Listing the object’s attributes 1211
Manipulating the object attribute key 1212
Troubleshooting clusters in stretch mode 1213
Replacing the tiebreaker with a monitor in quorum 1213
Replacing the tiebreaker with a new monitor 1215
Forcing stretch cluster into recovery or healthy mode 1217
Contacting IBM support for service 1217
Providing information to IBM Support engineers 1217
Generating readable core dump files 1218
Generating readable core dump files in containerized deployments 1219
Ceph subsystems default logging level values 1221
Health messages of a Ceph cluster 1222
Related information 1226
Acknowledgments 1226
IBM Storage Ceph
IBM Storage Ceph is a software-defined storage platform engineered for private cloud architectures.

Summary of changes
This topic lists the dates and nature of updates to the published information for IBM Storage Ceph.

Date | Nature of updates to the published information

08 Feb 2024 | Refreshed the release notes with bug fixes and known issues for the IBM Storage Ceph 5.3z6 minor release.
16 October 2023 | Refreshed the release notes with bug fixes and known issues for the IBM Storage Ceph 5.3z5 minor release.
4 August 2023 | Refreshed the release notes with bug fixes and known issues for the IBM Storage Ceph 5.3z4 minor release.
18 May 2023 | Added important updates to Upgrading to IBM Storage Ceph cluster for upgrading from Red Hat Ceph Storage 5.3 to IBM Storage Ceph 5.3.
17 March 2023 | Added the following sections: Concepts, Architecture, Data Security and Hardening, Planning, Compatibility, Hardware, Storage Strategies, Administering, Operations, Ceph Object Gateway, Block devices, Developer, and Troubleshooting.
10 March 2023 | The version information was added in IBM Documentation as part of the initial IBM Storage Ceph 5.3 release.

Release notes
IBM Storage Ceph is a hardened, qualified, secure, and supported enterprise software curated from the Ceph open-source project
and delivered by IBM.

Enhancements
This section lists all major updates, enhancements, and new features introduced in this release of IBM Storage Ceph.
Bug fixes
This section describes bugs with significant user impact, which were fixed in this release of IBM Storage Ceph. In addition, the
section includes descriptions of fixed known issues found in previous versions.
Known issues
This section documents known issues found in this release of IBM Storage Ceph.
Sources

Enhancements



This section lists all major updates, enhancements, and new features introduced in this release of IBM Storage Ceph.

The Cephadm utility
Ceph Dashboard
Ceph File System
Ceph Object Gateway
Multi-site Ceph Object Gateway
RADOS
RADOS Block Devices (RBD)

The Cephadm utility


cephadm automatically updates the dashboard Grafana password if it is set in the Grafana service spec
Previously, users would have to manually set the Grafana password after applying the specification.

With this enhancement, if initial_admin_password is set in an applied Grafana specification, cephadm automatically updates
the dashboard Grafana password, which is equivalent to running the ceph dashboard set-grafana-api-password
command, to streamline the process of fully setting up Grafana. Users no longer have to manually set the dashboard Grafana
password after applying a specification that includes the password.
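
For illustration, a minimal Grafana service specification of this kind might look like the following; the password value and file name are placeholders, not values from this release:

    service_type: grafana
    placement:
      count: 1
    spec:
      initial_admin_password: mypassword

Applying the file with ceph orch apply -i grafana.yaml would then also update the dashboard Grafana password, as described above.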

OSDs automatically update their Ceph configuration files with the new mon locations
With this enhancement, whenever a monmap change is detected, cephadm automatically updates the Ceph configuration
files for each OSD with the new mon locations.
Note: This enhancement may take some time to update on all OSDs if you have a lot of OSDs.

Ceph Dashboard
The Block Device images table is paginated
With this enhancement, the Block Device images table is paginated for use with storage clusters that contain 10,000 or more
images, because retrieving information for a block device image is expensive.

Newly added cross_origin_url option allows cross origin resource sharing


Previously, IBM developers faced issues with their storage insights product when they tried to ping the REST API using their
front-end because of the tight Cross Origin Resource Sharing (CORS) policies set up in Red Hat’s REST API.

With this enhancement, CORS is allowed by adding the cross_origin_url option, which can be set to a particular URL, for
example localhost, with the ceph config set mgr mgr/dashboard/cross_origin_url command, and the REST API then
allows communication with only that URL.
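
As a sketch, assuming a front-end served at http://localhost:4200 (a placeholder URL), the option could be set as follows:

    ceph config set mgr mgr/dashboard/cross_origin_url http://localhost:4200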

Ceph File System


Users can store arbitrary metadata of CephFS subvolume snapshots
With this enhancement, Ceph File System (CephFS) volume users can store arbitrary metadata in the form of key-value pairs
for CephFS subvolume snapshots with a set of command-line interface (CLI) commands.
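
A brief sketch of these CLI commands, using placeholder names for the volume (cephfs), subvolume (subvol01), snapshot (snap01), key (owner), and value (engineering):

    ceph fs subvolume snapshot metadata set cephfs subvol01 snap01 owner engineering
    ceph fs subvolume snapshot metadata get cephfs subvol01 snap01 owner
    ceph fs subvolume snapshot metadata ls cephfs subvol01 snap01
    ceph fs subvolume snapshot metadata rm cephfs subvol01 snap01 owner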

Ceph Object Gateway


STS max_session_duration for a role can now be updated
With this enhancement, the STS max_session_duration for a role can be updated using the radosgw-admin command-
line interface.
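
For example, assuming a role named S3Access (a placeholder), the maximum session duration could be updated to 7200 seconds with a command along these lines:

    radosgw-admin role update --role-name=S3Access --max-session-duration=7200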
ListBucket S3 operation now generates JSON output
With this enhancement, on customers’ request to facilitate integrations, the ListBucket S3 operation generates JSON-
formatted output, instead of the default XML, if the request contains an Accept: application/json header.
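
As an illustrative request only, using curl's built-in AWS request signing with placeholder endpoint, bucket name, region, and credentials:

    curl --aws-sigv4 "aws:amz:default:s3" --user "ACCESS_KEY:SECRET_KEY" -H "Accept: application/json" http://rgw.example.com:8080/mybucket

With the Accept: application/json header present, the bucket listing is returned as JSON instead of XML.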
The option to enable TCP keepalive managed by libcurl is added
With this enhancement, the option to enable TCP keepalive on the HTTP client sockets managed by libcurl is added to
make sync and other operations initiated by Ceph Object Gateway more resilient to network instability. This does not apply to
connections received by the HTTP frontend, but only to HTTP requests sent by the Ceph Object Gateway, such as Keystone for
authentication, sync requests from multi-site, and requests to key management servers for SSE.

Result code 2002 of radosgw-admin commands is explicitly translated to 2



Previously, a change in the S3 error translation of internal NoSuchBucket result inadvertently changed the error code from
the radosgw-admin bucket stats command, causing the programs checking the shell result code of those radosgw-
admin commands to see a different result code.

With this enhancement, the result code 2002 is explicitly translated to 2 and users can see the original behavior.

You can now use bucket policies with useful errors


Bucket policies were difficult to use because the error indication was wrong. Additionally, silently dropping principals would
cause problems during an upgrade. With this update, useful errors from the policy parser and a flag to reject invalid principals
with the rgw_policy_reject_invalid_principals=true parameter are introduced.

The rgw-restore-bucket-index command allows user specification


Previously, rgw-restore-bucket-index command would only be able to restore the indices for the buckets that were in the
default realm, default zonegroup, and default zone.

With this release, rgw-restore-bucket-index command can restore the indices for the buckets that are in the non-default
realms, non-default zonegroup, or non-default zone when the user specifies that information on the command-line.

Multi-site Ceph Object Gateway


The bucket sync run command provides more details
With this enhancement, user-friendly progress reports are added to the bucket sync run command to give users easier
visibility into the progress of the operation. When the user runs the radosgw-admin bucket sync run command with the
--extra-info flag, users get a message for the start of generation sync and also for each object that is synced.
Warning: It is not recommended to use the bucket sync run command without contacting IBM support.
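A minimal sketch of such a run, with a placeholder bucket name and only under the guidance of IBM support as noted above:

    radosgw-admin bucket sync run --bucket=mybucket --extra-info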
Multi-site configuration supports dynamic bucket index resharding
Previously, only manual resharding of the buckets for multi-site configurations was supported.

With this enhancement, dynamic bucket resharding is supported in multi-site configurations. Once the storage clusters are
upgraded, enable the resharding feature at the zone and zone group level. You can either manually reshard the buckets with the
radosgw-admin bucket reshard command or automatically reshard them with dynamic resharding, independently of other
zones in the storage cluster.
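
A sketch of enabling the feature and running a manual reshard, with placeholder zone group, zone, bucket, and shard-count values:

    radosgw-admin zonegroup modify --rgw-zonegroup=us --enable-feature=resharding
    radosgw-admin period update --commit
    radosgw-admin zone modify --rgw-zone=us-east --enable-feature=resharding
    radosgw-admin period update --commit
    radosgw-admin bucket reshard --bucket=mybucket --num-shards=101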

Users can now reshard bucket index dynamically with multi-site archive zones
With this enhancement, multi-site archive zone bucket index can be resharded dynamically when dynamic resharding is
enabled for that zone.

RADOS
Low-level log messages are introduced to warn user about hitting throttle limits
Previously, there was a lack of low-level logging indication that throttle limits were hit, causing these occurrences to
incorrectly have the appearance of a networking issue.

With this enhancement, the introduction of low-level log messages makes it much clearer that the throttle limits are hit.

RADOS Block Devices (RBD)


Cloned images can now be encrypted with their own encryption format and passphrase
With this enhancement, layered client-side encryption is now supported that enables each cloned image to be encrypted with
its own encryption format and passphrase, potentially different from that of the parent image. The efficient copy-on-write
semantics intrinsic to unformatted regular cloned images are retained.
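
As a brief sketch, assuming an existing parent image with a snapshot and placeholder pool, image, and passphrase-file names:

    rbd clone pool1/parent@snap1 pool1/clone1
    rbd encryption format pool1/clone1 luks2 clone1-passphrase.bin

The clone is then encrypted with its own LUKS2 passphrase, independent of any encryption format applied to the parent image.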

Bug fixes
This section describes bugs with significant user impact, which were fixed in this release of IBM Storage Ceph. In addition, the
section includes descriptions of fixed known issues found in previous versions.



The Cephadm utility
Ceph Manager plug ins
The Ceph Volume utility
Ceph Object Gateway
Multi-site Ceph Object Gateway
RADOS
RADOS Block Devices (RBD)
RBD Mirroring
The Ceph Ansible utility

The Cephadm utility


Users can upgrade to a local repo image without any issues
Previously, in cephadm, docker.io would be added to the start of the image name by default, if the image name was not a
qualified domain name. Due to this, users were unable to upgrade to images on local repositories.

With this fix, care has been taken to identify the images to which docker.io is added by default. Users using a local repo image
can upgrade to that image without encountering issues.

(BZ#2100553)

tcmu-runner no longer stops logging after its log file is rotated


Previously, tcmu-runner was left out of the postrotate actions of the logrotate configuration that Cephadm generated for
rotating the logs of Ceph daemons on the host. Due to this, tcmu-runner would eventually stop logging as proper signals
were not sent to regenerate its log file, which is done for other Ceph daemons using the previously mentioned postrotate
actions in the logrotate configuration.

With this fix, tcmu-runner is added to the postrotate actions in the logrotate file that Cephadm deploys for rotation of Ceph
daemons logs. tcmu-runner no longer stops logging after its log file is rotated.

(BZ#2204505)

Added support to configure retention.size parameter in Prometheus’s specification file


Previously, the Cephadm Prometheus specification did not support configuring the retention.size parameter. A ServiceSpec exception arose whenever the user included this parameter in the specification file. Due to this, the user could not limit the size of Prometheus’s data directory.

With this fix, users can configure the retention.size parameter in Prometheus’s specification file. Cephadm passes this
value to the Prometheus daemon allowing it to control the disk space usage of Prometheus by limiting the size of the data
directory.
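
A sketch of a Prometheus service specification that limits the data directory size, assuming the specification key is spelled retention_size and that the spec is applied from standard input (the size and placement values are illustrative):

# Apply a Prometheus spec that caps on-disk usage
cat <<EOF | ceph orch apply -i -
service_type: prometheus
placement:
  count: 1
spec:
  retention_size: 50GB
EOF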

(BZ#2207748)

Ceph Manager plug ins


Ceph Manager Alert emails are not tagged as spam anymore
Previously, emails sent by the Ceph Manager Alerts module did not have the "Message-Id" and "Date:" headers. This increased the chances of the emails being flagged as spam.

With this fix, both the headers are added to the emails sent by Ceph Manager Alerts module and the messages are not flagged
as spam.

(BZ#2064481)

Emails generated are not flagged as spam


Previously, the email header created by the Ceph alert manager module would not include the message-id and date fields and
the mails would get flagged as spam.

With this release, the email header is modified to include these two fields and the emails generated by the module are no
longer flagged as spam.

(BZ#2210906)

Python tasks no longer wait for the GIL



Previously, the Ceph Manager daemon held the Python global interpreter lock (GIL) during some RPCs with the Ceph MDS, due to which other Python tasks were starved while waiting for the GIL.

With this fix, the GIL is released during all libcephfs or librbd calls and other Python tasks may acquire the GIL normally.

(BZ#2219093)

The Ceph Volume utility


The volume list remains empty when no ceph-osd container is found and cephvolumescan actor no longer fails
Previously, if Ceph containers ran collocated with other containers without a ceph-osd container present among them, the
process would try to retrieve the volume list from one non-Ceph container which would not work. Due to this,
cephvolumescan actor would fail and the upgrade would not complete.

With this fix, if no ceph-osd container is found, the volume list will remain empty and the cephvolumescan actor does not
fail.

(BZ#2141393)

Ceph OSD deployment no longer fails when ceph-volume treats multiple devices.
Previously, ceph-volume computed wrong sizes when there were multiple devices to treat, resulting in failure to deploy
OSDs.

With this fix, ceph-volume computes the correct size when multiple devices are to be treated and deployment of OSDs work
as expected.

(BZ#2119774)

Re-running ceph-volume lvm batch command against created devices is now possible
Previously, in ceph-volume, lvm membership was not set for mpath devices like it was for other types of supported devices.
Due to this, re-running the ceph-volume lvm batch command against already created devices was not possible.

With this fix, the lvm membership is set for mpath devices and re-running ceph-volume lvm batch command against
already created devices is now possible.

(BZ#2215042)

Adding new OSDs with pre-created LVs no longer fails


Previously, due to a bug, ceph-volume did not filter out the devices already used by Ceph. Due to this, adding new OSDs with
ceph-volume used to fail when using pre-created LVs.

With this fix, devices already used by Ceph are filtered out as expected and adding new OSDs with pre-created LVs no longer
fails.

(BZ#2209319)

Ceph Object Gateway


Users can now set up Kafka connectivity with SASL in a non-TLS environment
Previously, due to a failure in configuring the TLS certificate for Ceph Object Gateway, it was not possible to configure Kafka
topic with SASL (user and password).

With this fix, a new configuration parameter, rgw_allow_notification_secrets_in_cleartext, is added. Users can now set up
Kafka connectivity with SASL in a non-TLS environment.
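
For example, a sketch of enabling the new option (the client.rgw section shown is an assumption; adjust it to the daemon section used in your deployment):

ceph config set client.rgw rgw_allow_notification_secrets_in_cleartext true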

(BZ#2014330)

Internal handling of tokens is fixed


Previously, internal handling of tokens in the refresh path of Java-based client authentication provider jar for AWS SDK for
Java and Hadoop S3A Connector, would not deal correctly with the large tokens, resulting in improper processing of some
tokens and preventing the renewal of client tokens.

With this fix, the internal token handling is fixed and it works as expected.

(BZ#2055137)



The object version access is corrected preventing object lock violation
Previously, inadvertent slicing of version information would occur in some call paths, causing any object version protected by
object lock to be deleted contrary to policy.

With this fix, the object version access is corrected, thereby preventing object lock violation.

(BZ#2108394)

Ceph Object Gateway no longer crashes with malformed URLs


Previously, a refactoring abstraction replaced a bucket value with a pointer to a bucket value that was not always initialized. As a result, malformed URLs corresponding to bucket operations on nonexistent buckets caused the Ceph Object Gateway to crash.

With this fix, a check on the pointer has been implemented into the call path and Ceph Object Gateway returns a permission
error, rather than crashing, if it is uninitialized.

(BZ#2109256)

The code that parses dates in x-amz-date format is changed


Previously, the standard format for x-amz-date was changed which caused issues, since the new software uses the new
date format. The new software built with the latest go libraries would not talk to the Ceph Object Gateway.

With this fix, the code in the Ceph Object Gateway that parses dates in x-amz-date format is changed to also accept the new
date format.

(BZ#2109675)

New logic in processing of lifecycle shards prevents stalling due to deleted buckets
Previously, changes were made to cause lifecycle processing to continuously cycle across days, that is, to not restart from the beginning of the list of eligible buckets each day. However, the changes contained a bug that caused the processing of lifecycle shards containing deleted buckets to stall.

With this fix, a logic is introduced to skip over the deleted buckets, due to which the processing no longer stalls.

(BZ#2118295)

Header processing no longer causes sporadic swift-protocol authentication failures


Previously, a combination of incorrect HTTP header processing and timestamp handling logic would either cause an invalid
Keystone admin token to be used for operations, or non-renewal of Keystone’s admin token as required. Due to this, sporadic
swift-protocol authentication failures would occur.

With this fix, header processing is corrected and new diagnostics are added. The logic now works as expected.

(BZ#2123335)

Warnings are no longer logged in inappropriate circumstances


Previously, an inverted logic would occasionally report an incorrect warning - unable to find head object, causing the warning
to be logged when it was not applicable in a Ceph Object Gateway configuration.

With this fix, the corrected logic no longer logs the warning in inappropriate circumstances.

(BZ#2126787)

PUT object operation writes to the correct bucket index shards


Previously, due to a race condition, a PUT object operation would rarely write to a former bucket index shard. This caused the
former bucket index shard to be recreated, and the object would not appear in the proper bucket index. Therefore, the object
would not be listed when the bucket was listed.

With this fix, care is taken to prevent various operations from creating bucket index shards and recover when the race
condition is encountered. PUT object operations now always write to the correct bucket index shards.

(BZ#2145022)

Blocksize is changed to 4K
Previously, Ceph Object Gateway GC processing would consume excessive time due to the use of a 1K blocksize that would
consume the GC queue. This caused slower processing of large GC queues.

With this fix, blocksize is changed to 4K, which has accelerated the processing of large GC queues.

(BZ#2215062)



The closing tags are sent to the client
Previously, when returning an error page to the client, for example for a 404 or 403 condition, the </body> and </html> closing tags were missing and not sent to the client, although their presence was accounted for in the request's Content-Length header field value. This occasionally caused the TCP connection between the client and the Ceph Object Gateway to be closed by an RST packet from the client, on account of the incorrect Content-Length header field value, instead of by a FIN packet under normal circumstances.

With this release, the </body> and </html> closing tags are sent to the client under all required conditions. The value of the Content-Length header field correctly represents the length of the data sent to the client, and the client no longer resets the connection because of an incorrect Content-Length value.

(BZ#2227048)

Multi-site Ceph Object Gateway


Suspending bucket versioning in the primary zone no longer suspends bucket versioning in the archive zone
Previously, if bucket versioning was suspended in the primary zone, bucket versioning in the archive zone would also be
suspended.

With this fix, archive zone versioning is always enabled irrespective of bucket versioning changes on other zones. Bucket
versioning in the archive zone no longer gets suspended.

(BZ#1957088)

The radosgw-admin sync status command in multi-site replication works as expected


Previously, in a multi-site replication, if one or more participating Ceph Object Gateway nodes were down, running the radosgw-admin sync status command would return a (5) Input/output error. This status should have been resolved after all the Ceph Object Gateway nodes were back online.

With this update, the radosgw-admin sync status command does not get stuck and works as expected.

(BZ#1749627)

Processes trimming retired bucket index entries no longer cause radosgw instance to crash
Previously, under some circumstances, processes trimming retired bucket index entries could access an uninitialized pointer variable, causing the radosgw instance to crash.

With this fix, the pointer variable is initialized immediately before use and the radosgw instance no longer crashes.

(BZ#2139258)

Bucket sync run is given control logic to sync all objects


Previously, to support dynamic bucket resharding on multisite clusters, a singular bucket index log was replaced with multiple bucket index log generations. However, due to how bucket sync run was implemented, only the oldest outstanding generation would be synced.

With this fix, bucket sync run is given control logic which enables it to run the sync from oldest outstanding to current and all
objects are now synced as expected.

(BZ#2066453)

Per-bucket replication logical error fix executes policies correctly


Previously, an internal logic error caused failures in per-bucket replication, due to which per-bucket replication policies did not
work in some circumstances.

With this fix, the logic error responsible for confusing the source and destination bucket information is corrected and the
policies execute correctly.

(BZ#2108886)

Variable access no longer causes undefined program behavior


Previously, a coverity scan would identify two cases, where variables could be used after a move, potentially causing an
undefined program behavior to occur.

With this fix, variable access is fixed and the potential fault can no longer occur.

(BZ#2123423)



Requests with a tenant but no bucket no longer cause a crash
Previously, an upstream refactoring replaced uninitialized bucket data fields with uninitialized pointers. Due to this, any bucket
request containing a URL referencing no valid bucket caused crashes.

With this fix, requests that access the bucket but do not specify a valid bucket are denied, resulting in an error instead of a
crash.

(BZ#2139422)

RADOS
Performing a DR test with two sites stretch cluster no longer causes Ceph to become unresponsive
Previously, when performing a DR test with two sites stretch-cluster, removing and adding new monitors to the cluster would
cause an incorrect rank in ConnectionTracker class. Due to this, the monitor would fail to identify itself in the
peer_tracker copy and would never update its correct field, causing a deadlock in the election process which would lead to
Ceph becoming unresponsive.

With this fix, the following corrections are made:

Added an assert in the function notify_rank_removed(), to compare the expected rank provided by the Monmap against
the rank that is manually adjusted as a sanity check.
Clear the variable removed_ranks from every Monmap update.
Added an action to manually reset peer_tracker.rank when executing the command - ceph connection scores
reset for each monitor. The peer_tracker.rank matches the current rank of the monitor.
Added functions in the Elector and ConnectionTracker classes to check for clean peer_tracker when
upgrading the monitors, including booting up. If found unclean, peer_tracker is cleared.
The user can choose to manually remove a monitor rank before shutting down the monitor, causing inconsistency in
Monmap. Therefore, in Monitor::notify_new_monmap() we prevent the function from removing our rank or ranks
that don’t exist in Monmap.

The cluster now works as expected and there is no unwarranted downtime. The cluster no longer becomes unresponsive
when performing a DR test with two sites stretch-cluster.

(BZ#2142674)

Rank is removed from the live_pinging and dead_pinging set to mitigate the inconsistent connectivity score issue
Previously, when removing two monitors consecutively, if the rank size was equal to Paxos's size, the monitor would hit a condition where it did not remove the rank from the dead_pinging set. Due to this, the rank remained in the dead_pinging set, which would cause problems, such as an inconsistent connectivity score when stretch-cluster mode was enabled.

With this fix, a case is added where the highest ranked monitor is removed, that is, when the rank is equal to Paxos’s size,
remove the rank from the live_pinging and dead_pinging set. The monitor stays healthy with a clean live_pinging and
dead_pinging set.

(BZ#2142174)

The Prometheus metrics now reflect the correct Ceph version for all Ceph Monitors whenever requested
Previously, the Prometheus metrics reported mismatched Ceph versions for Ceph Monitors when the monitor was upgraded.
As a result, the active Ceph Manager daemon needed to be restarted to resolve this inconsistency.
With this fix, the Ceph Monitors explicitly send metadata update requests with mon metadata to mgr when MON election is
over.

(BZ#2008524)

The small writes are deferred


Previously, Ceph deferred writes based on the allocation unit. When the allocation unit was large, such as 64 K, no small write was eligible for deferring.
With this update, small writes are deferred as they operate on disk blocks, even when large allocation units are in use.

(BZ#2107407)

The correct set of replicas are used for remapped placement groups
Previously, for remapped placement groups, the wrong set of replicas would be queried for the scrub information causing a
failure of the scrub process, after identifying mismatches that would not exist.
With this fix, the correct set of replicas are now queried.



(BZ#2130667)

The ceph daemon heap status command shows the heap status
Previously, due to a failure to get heap information through the ceph daemon command, the ceph daemon heap stats
command would return empty output instead of returning current heap usage for a Ceph daemon. This was because
ceph::osd_cmds::heap() was confusing the stderr and stdout concept which caused the difference in output.
With this fix, the ceph daemon heap stats command returns heap usage information for a Ceph daemon similar to what we
get using the ceph tell command.

(BZ#2119100)

Ceph Monitors no longer crash when using ceph orch apply mon <num> command
Previously, when the command ceph orch apply mon <num> was used to decrease the number of monitors in a cluster, cephadm removed the monitors before shutting them down, causing the monitors to crash.

With this fix, a sanity check is added to all code paths that check whether the peer rank is more than or equal to the size of the
ranks from the monitor map. If the condition is satisfied, then skip certain operations that lead to the monitor crashing. The
peer rank eventually resolves itself in the next version of the monitor map. The monitors no longer crash when removed from
the monitor map before shutting down.
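
For example, a sketch of decreasing the monitor count (the target count is illustrative):

ceph orch apply mon 3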

(BZ#2142141)

End-user can now see the scrub or deep-scrub starts message from the Ceph cluster log
Previously, due to the scrub or deep-scrub starts message missing in the Ceph cluster log, the end-user would fail to know if
the PG scrubbing had started for a PG from the Ceph cluster log.

With this fix, the scrub or deep-scrub starts message is reintroduced. The Ceph cluster log now shows the message for a PG,
whenever it goes for a scrubbing or deep-scrubbing process.

(BZ#2091773)

No assertion during the Ceph Manager failover


Previously, when activating the Ceph Manager, it would receive several service_map versions sent by the previously active
manager. This incorrect check in code would cause assertion failure when the newly activated manager received a map with a
higher version sent by the previously active manager.

With this fix, the check in the manager that deals with the initial service map is relaxed and there is no assertion during the
Ceph Manager failover.

(BZ#2095062)

Users can remove cloned objects after upgrading a cluster


Previously, after upgrading a cluster from Red Hat Ceph Storage 4 to Red Hat Ceph Storage 5, removing snapshots of objects created in earlier versions would leave clones, which could not be removed. This was because the SnapMapper keys were wrongly converted.

With this fix, SnapMapper's legacy conversion is updated to match the new key format. The cloned objects created in earlier versions of Ceph can now be easily removed after an upgrade.

(BZ#2107405)

RocksDB error does not occur for small writes


BlueStore employs a strategy of deferring small writes for HDDs and stores data in RocksDB. Cleaning deferred data from
RocksDB is a background process which is not synchronized with BlueFS.

With this fix, deferred replay no longer overwrites BlueFS data and some RocksDB errors do not occur, such as:

osd_superblock corruption.
CURRENT does not end with newline.
.sst files checksum error.

Note: Deferred data is written only to locations that either contain a proper object or are empty, so it is not possible to corrupt object data this way. BlueFS is the only entity that can allocate this space.
(BZ#2109886)

Corrupted dups entries of a PG Log can be removed by off-line and on-line trimming
Previously, trimming of PG log dups entries could be prevented during the low-level PG split operation, which is used by the
PG autoscaler with far higher frequency than by a human operator. Stalling the trimming of dups resulted in significant



memory growth of PG log, leading to OSD crashes as it ran out of memory. Restarting an OSD did not solve the problem as the
PG log is stored on disk and reloaded to RAM on startup.

With this fix, both off-line, using the ceph-objectstore-tool command, and on-line, within OSD, trimming can remove
corrupted dups entries of a PG log that jammed the on-line trimming machinery and were responsible for the memory growth.
A debug improvement is implemented that prints the number of dups entries to the OSD’s log to help future investigations.

(BZ#2119853)

Manager continues to send beacons in the event of an error during authentication check
Previously, if an error was encountered when performing an authentication check with a monitor, the manager would get into a
state where it would no longer have an active connection. Due to this, the manager could no longer send beacons and the
monitor would mark it as lost.

With this fix, a session (active con) is reopened in the event of an error and the manager is able to continue to send beacons
and is no longer marked as lost.

(BZ#2192479)

RADOS Block Devices (RBD)


The rbd info command no longer fails if executed when the image is being flattened
Previously, due to an implementation defect, rbd info command would fail, although rarely, if run when the image was
being flattened. This caused a transient No such file or directory error to occur, although, upon rerun, the command always
succeeded.

With this fix, the implementation defect is fixed and rbd info command no longer fails even if executed when the image is
being flattened.

(BZ#1989527)

Removing a pool with pending Block Device tasks no longer causes all the tasks to hang
Previously, due to an implementation defect, removing a pool with pending Block Device tasks caused all Block Device tasks,
including other pools, to hang. To resume hung Block Device tasks, the administrator had to restart the ceph-mgr daemon.

With this fix, the implementation defect is fixed and removing a pool with pending RBD tasks no longer causes any hangs.
Block Device tasks for the removed pool are cleaned up. Block Device tasks for other pools continue executing uninterrupted.

(BZ#2150968)

Object map for the snapshot accurately reflects the contents of the snapshot
Previously, due to an implementation defect, a stale snapshot context would be used when handling a write-like operation.
Due to this, the object map for the snapshot was not guaranteed to accurately reflect the contents of the snapshot in case the
snapshot was taken without quiescing the workload. In differential backup and snapshot-based mirroring, use cases with
object-map and/or fast-diff features enabled, the destination image could get corrupted.

With this fix, the implementation defect is fixed and everything works as expected.

(BZ#2216188)

RBD Mirroring
The image replayer shuts down as expected
Previously, due to an implementation defect, a request to shut down a particular image replayer would cause the rbd-
mirror daemon to hang indefinitely, especially in cases where the daemon was blocklisted on the remote storage cluster.

With this fix, the implementation defect is fixed and a request to shut down a particular image replayer no longer causes the
rbd-mirror daemon to hang and the image replayer shuts down as expected.

(BZ#2086471)

The rbd mirror pool peer bootstrap create command guarantees correct monitor addresses in the bootstrap token
Previously, a bootstrap token generated with the rbd mirror pool peer bootstrap create command contained monitor
addresses as specified by the mon_host option in the ceph.conf file. This was fragile and caused issues to users, such as
causing confusion between V1 and V2 endpoints, specifying only one of them, grouping them incorrectly, and the like.



With this fix, the rbd mirror pool peer bootstrap create command is changed to extract monitor address from the cluster
itself, guaranteeing the monitor addresses contained in a bootstrap token to be correct.
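
For example, a sketch of generating a bootstrap token (the site name, pool name, and output path are placeholders):

rbd mirror pool peer bootstrap create --site-name site-a mirror-pool > /root/bootstrap_token_site-a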

(BZ#2122130)

The Ceph Ansible utility


A dependency on ansible-collection-ansible is created when deploying ceph-ansible
Previously, for the cephadm-adopt.yml playbook, an additional Ansible library (ansible-utils) was used to work with mixed ipv4 and ipv6 environments. As ansible-utils is not deployed by default, there were missing dependencies.

With this fix, a dependency on ansible-collection-ansible is created when deploying ceph-ansible, and the cephadm-adopt.yml playbook completes successfully.

(BZ#2207872)

Ceph containers no longer fail during startup


Previously, the behaviour of podman released with Red Hat Enterprise Linux 8.7 had changed with respect to SELinux
relabeling. Due to this, depending on their startup order, some Ceph containers would fail to start as they would not have
access to the files they needed.

With this fix, the SELinux separation for the container is disabled and all Ceph containers start successfully.

(BZ#2222003)

Known issues
Edit online
This section documents known issues found in this release of IBM Storage Ceph.

The Cephadm utility


Multi-site Ceph Object Gateway

The Cephadm utility


ceph orch ps command does not display a version for monitoring stack daemons
In cephadm, due to the version grabbing code currently being incompatible with the downstream monitoring stack containers,
version grabbing fails for monitoring stack daemons, such as node-exporter, prometheus, and alertmanager.

As a workaround, if the user needs to find the version, the daemons' container names include the version.

(BZ#2125382)

Multi-site Ceph Object Gateway


md5 mismatch of replicated objects when testing Ceph Object gateway’s server-side encryption in multi-site
Presently, a md5 mismatch of replicated objects is observed when testing Ceph Object gateway’s server-side encryption in
multi-site. The data corruption is specific to S3 multipart uploads with SSE encryption enabled. The corruption only affects the
replicated copy. The original object remains intact.

Encryption of multipart uploads requires special handling around the part boundaries because each part is uploaded and
encrypted separately. In multi-site, objects are encrypted, and multipart uploads are replicated as a single part. As a result,
the replicated copy loses its knowledge about the original part boundaries required to decrypt the data correctly, which
causes this corruption.

As a workaround, multi-site users should not use server-side encryption for multipart uploads. For more detailed information,
see the KCS Server-side encryption with RGW multisite configuration might lead to data corruption of multipart objects.

(BZ#2214252)



Sources
Edit online

The updated IBM Storage Ceph source code packages are available at the following location:

For Red Hat Enterprise Linux 8: http://ftp.redhat.com/redhat/linux/enterprise/8Base/en/RHCEPH/SRPMS/


For Red Hat Enterprise Linux 9: http://ftp.redhat.com/redhat/linux/enterprise/9Base/en/RHCEPH/SRPMS/
For IBM Storage Ceph tools: https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/5

Asynchronous updates
Edit online
This section describes the bug fixes, known issues, and enhancements of the z-stream releases for IBM Storage Ceph 5.3.

Release Notes for 5.3z6


This section describes the bug fixes, known issues, and enhancements of the IBM Storage Ceph 5.3z6 release.

Release Notes for 5.3z6


Edit online
This section describes the bug fixes, known issues, and enhancements of the IBM Storage Ceph 5.3z6 release.

Enhancements
This section describes the enhancements in the IBM Storage Ceph 5.3z6 release.
Bug fixes
This section describes the bug fixes in the IBM Storage Ceph 5.3z6 release.
Known issues
This section documents known issues found in IBM Storage Ceph 5.3z6 release.

Enhancements
Edit online
This section describes the enhancements in the IBM Storage Ceph 5.3z6 release.

Ceph File System


Ceph Object Gateway
RADOS
Ceph Manager plugins

Ceph File System


The MDS default balancer is now disabled by default
With this release, the MDS default balancer, or the automatic dynamic subtree balancer, is disabled by default. This prevents accidental subtree migrations. Subtree migrations can be expensive to undo when the operator increases the file system max_mds setting without planning subtree delegations, such as with pinning.

BZ#2255436

Ceph Object Gateway


The rgw-restore-bucket-index experimental tool restores bucket indices for versioned and un-versioned buckets



With this enhancement, you can restore the bucket indices for versioned buckets with the rgw-restore-bucket-index
experimental tool, in addition to its existing ability to work with un-versioned buckets.
BZ#2224636

The radosgw-admin bucket stats command prints bucket versioning


With this enhancement, the radosgw-admin bucket stats command prints the versioning status for buckets as one of three
values of enabled, off, or suspended since versioning can be enabled or disabled after creation.
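
For example, a sketch of checking a bucket (the bucket name is a placeholder); the output now includes a versioning field set to enabled, off, or suspended:

radosgw-admin bucket stats --bucket=mybucket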

BZ#2240089

Enhanced ordered bucket listing


Previously, in some cases, ordered listings of buckets with a larger number of shards and several pseudo-subdirectories would take an unnecessarily long time to complete.

With this enhancement, such buckets perform an ordered bucket listing more quickly.

BZ#2239433

RADOS
Improved protection against running BlueStore twice
Previously, advisory locking was used to protect against running BlueStore twice. This works well on bare-metal deployments. However, when used in containers, unrelated inodes could target the same block device created with mknod b. As a result, two containers might assume that they had exclusive access, which led to severe errors.

With this release, protection against running OSDs twice at the same time on one block device is improved. Advisory locking is reinforced with the O_EXCL open flag dedicated for block devices. It is no longer possible to open one BlueStore instance twice, and the resulting overwrite and corruption does not occur.

BZ#2239455

New reports available for sub-events for delayed operations


Previously, slow operations were marked as delayed but without a detailed description.

With this enhancement, you can view the detailed descriptions of delayed sub-events for operations.

BZ#2240839

Ceph Manager plugins


Each Ceph Manager module has a separate thread to run commands
Previously, there was one thread through which all the ceph-mgr module commands were run. If one module's command was stuck, all the other modules' commands would hang, waiting on the same thread.

With this update, one finisher thread is added for each Ceph Manager module, so each module has a separate thread for running its commands. Even if one module's command hangs, the other modules are able to run.

BZ#2234610

Bug fixes
Edit online
This section describes the bug fixes in the IBM Storage Ceph 5.3z6 release.

Cephadm
Ceph File System
Ceph Object Gateway
Ceph Dashboard
RADOS
Ceph-Ansible

Cephadm
The cephadm-adopt playbook completes with IPV6 address
Previously, due to the single quotes around the IPV6 address, the cephadm-adopt playbook would fail because the IPV6 address was not recognized as a correct IP address.

With this fix, the single quotes around the IPV6 address are removed and the cephadm-adopt playbook completes
successfully with an IPV6 setup.

BZ#2153448

Use custom config files to bindmount tcmu.conf files


Previously, the custom config files were not added to the tcmu-runner container. As a result, the custom config files could
not be used to bindmount a tcmu.conf into the tcmu-runner containers deployed by cephadm.

With this fix, custom config files are added to the tcmu-runner container as well as the rbd-target-api container. You
can now use custom config files to bindmount a tcmu.conf file into the tcmu-runner containers deployed by cephadm.

BZ#2193419

Ceph File System


The snap-schedule module retains a defined number of snapshots
With this release, snap-schedule module supports a new retention specification to retain a user-defined number of
snapshots. For example, if you have specified 50 snapshots to retain irrespective of the snapshot creation cadence, then the
snapshot is pruned after a new snapshot is created. The actual number of snapshots retained is 1 less than the maximum
number specified. In this example, 49 snapshots are retained so that there is a margin of 1 snapshot that can be created on
the file system on the next iteration. The retained snapshot avoids breaching the system configured limit of
mds_max_snaps_per_dir.

Important: Be careful when configuring mds_max_snaps_per_dir and snapshot scheduling limits to avoid unintentional
deactivation of snapshot schedules due to the file system returning a "Too many links" error if the mds_max_snaps_per_dir
limit is breached.
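
A sketch of adding a count-based retention specification, assuming the count-based specifier is n (the path and count are placeholders):

ceph fs snap-schedule retention add /volumes/group/subvol n 50
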
BZ#2227806

The modules capture all exceptions to stay operational


Previously, the modules would crash when unknown or unexpected exceptions were not captured at the time of development,
leading to module crash and loss of functionality.

With this fix, the module captures all exceptions. The resulting traceback is also dumped to the console or the log file to report
unexpected events. As a result, the module continues to stay operational, providing a better user experience.

BZ#2227810

Client always sends a caps revocation acknowledgment to the MDS daemon


Previously, whenever an MDS daemon sent a caps revocation request to a client and, during this time, the client released the caps and removed the inode, the client would drop the request directly, but the MDS daemon would need to wait for a cap revoking acknowledgment from the client. Due to this, even when there was no need for caps revocation, the MDS daemon would continue waiting for an acknowledgment from the client, causing a warning in the MDS daemon health status.

With this fix, the client always sends a caps revocation acknowledgment to the MDS daemon, even when there is no inode existing, and the MDS daemon no longer stays stuck.

BZ#2227997

Sending split_realms information is skipped from CephFS MDS


Previously, the split_realms information would be incorrectly sent from the CephFS MDS which could not be correctly
decoded by kclient. As a result, the clients would not care about the split_realms and would treat it as a corrupted
snaptrace.

With this fix, split_realms are not sent to kclient and works as expected.

BZ#2228001

Laggy clients are now evicted only if there are no laggy OSDs
Previously, monitoring performance dumps from the MDS would sometimes show that the OSDs were laggy, objecter.op_laggy and objecter.osd_laggy, causing clients to become laggy (unable to flush dirty data for cap revokes).



With this enhancement, if defer_client_eviction_on_laggy_osds is set to true and a client gets laggy because of a
laggy OSD then client eviction will not take place until OSDs are no longer laggy.
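
For example, a sketch of enabling this behavior with the option named above:

ceph config set mds defer_client_eviction_on_laggy_osds true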

BZ#2228039

The Python librados supports iterating object omap key/values


Previously, the iteration would break whenever a binary/unicode key was encountered.

With this release, the Python librados supports iterating object omap key/values with unicode or binary keys and the iteration
continues as expected.

BZ#2232164

Deadlocks no longer occur between the unlink and reintegration requests


Previously, when fixing an async bug, a regression was introduced by previous commits, causing deadlocks between the
unlink and reintegration request.

With this fix, the old commits are reverted and there is no longer a deadlock between unlink and reintegration requests.

BZ#2233886

The create and getattr RPC requests no longer face a deadlock


Previously, when an MDS acquired metadata tree locks in the wrong order, the create and getattr RPC requests would
deadlock.

With this fix, MDS ensures that the locks are obtained in the correct order and the requests are processed correctly.

BZ#2236190

Blocklist and evict client for large session metadata


Previously, large client metadata buildup in the MDS would sometimes cause the MDS to switch to read-only mode.

With this fix, the client that is causing the buildup is blocklisted and evicted, allowing the MDS to work as expected.

BZ#2238665

The ENOTEMPTY output is detected and the message is displayed correctly


Previously, when running the subvolume group rm command, the ENOTEMPTY output was not detected in the volume's plugin, causing a generalized error message instead of a specific message.

With this fix, the ENOTEMPTY output is detected for the subvolume group rm command when there is a subvolume present inside the subvolume group, and the message is displayed correctly.

BZ#2240727

The next client replay request is queued automatically while in the up:client-replay state
Previously, MDS would not queue the next client request for replay in the up:client-replay state causing the MDS to hang
in that state.

With this fix, the next client replay request is queued automatically as part of the request clean up and the MDS proceeds with
failover recovery normally.

BZ#2244868

The MDS no longer crashes when the journal logs are flushed
Previously, when the journal logs were successfully flushed, the lockers' state could be set to LOCK_SYNC or LOCK_PREXLOCK while the xlock count was non-zero. However, the MDS would not allow that and would crash.

With this fix, the MDS allows the lockers' state to be set to LOCK_SYNC or LOCK_PREXLOCK when the xlock count is non-zero, and the MDS does not crash.

BZ#2248825

MDS triggers now only one reintegration for each case


Previously, when unlinking a CInode that had multiple links, the MDS would trigger the same reintegration multiple times, resulting in the slowing down of normal requests and thereby degrading MDS performance.

With this fix, only one reintegration is triggered for each case and no redundant reintegration is triggered.



BZ#2249565

The loner member is set to true


Previously, for a file lock in the LOCK_EXCL_XSYN state, the non-loner clients would be issued empty caps. However, since the loner of this state was set to false, it could make the locker issue the Fcb caps to them, which is incorrect. This would cause some client requests to incorrectly revoke some caps and wait indefinitely, causing slow requests.

With this fix, the loner member is set to true and as a result the corresponding request is not blocked.

BZ#2251768

Add MDS metadata with FSMap changes in batches


Previously, monitors would lose track of MDS metadata during upgrade and canceled PAXOS transactions, causing the MDS
metadata to be unavailable.

With this fix, you can add MDS metadata with FSMap changes in batches to ensure consistency. The ceph mds metadata
command functions as expected across upgrades.

BZ#2255035

The standby-replay MDS daemons now trim their caches


Previously, the standby-replay MDS daemon would retain more metadata in its cache than required, thereby reporting an
oversized cache warning. This caused a persistent “MDSs report oversized cache” warning in the storage cluster when the
standby-replay MDS daemons were used.

With this fix, the standby-replay MDS daemons trim their caches and keep the cache usage below the configured limit and no
“MDSs report oversized cache” warnings are emitted.

BZ#2257421

The perf dump command works as expected


Previously, the diagnostic counters that show the journal replay progress were not updated in the up:replay state. As a
result, the perf dump command could not be used to evaluate progress.

With this fix, the counters are updated during replay and the perf dump command works as expected.

BZ#2259297

Ceph Object Gateway


A test for reshardable bucket layout is created to prevent crash
Previously, when the bucket layout code was introduced to enable dynamic bucket resharding with multi-site, there was no check in place to verify whether the bucket layout supported dynamic, immediate, or scheduled resharding. During dynamic bucket resharding, the Ceph Object Gateway daemon would crash. During an immediate or scheduled resharding, the radosgw-admin command would crash.

With this fix, a test for reshardable bucket layout is added to prevent such crashes. In case of immediate and scheduled
resharding a descriptive error message is displayed, and for dynamic bucket resharding the bucket is simply skipped.

BZ#2245335

The user modify --placement-id command can now be used with an empty --storage-class argument
Previously, if the --storage-class argument was not used when running the user modify --placement-id command, the command would fail.

With this fix, the --storage-class argument can be left empty and the command works as expected.
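
For example, a minimal sketch that sets only the placement target (the user ID and placement ID are placeholders):

radosgw-admin user modify --uid=testuser --placement-id=default-placement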

BZ#2245699

The rados cppool command ceases the operation without the --yes-i-really-mean-it parameter
The rados cppool command does not preserve self-managed snapshots when copying an RBD pool, so it must be used with the --yes-i-really-mean-it parameter. Previously, this obligatory switch was not enforced for RBD pools.

With this fix, if the user omits this switch, the rados cppool command ceases the operation and exits with a warning.
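
For example, a sketch of copying an RBD pool with the now-mandatory switch (the pool names are placeholders):

rados cppool rbd-pool rbd-pool-copy --yes-i-really-mean-it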

BZ#2252781



Ceph Dashboard
Evicting a single client does not evict other clients
Previously, evicting a CephFS client would evict all the other clients. Once these clients were lost, the access to the mount
points were lost which caused unexpected issues for the CephFS system.

With this fix, the logic is fixed and evicting a single client does not evict other clients.

BZ#2237391

RADOS
The scrub starts message is logged for a PG only when it goes through the scrubbing process
Previously, the scrub reservation for a PG could get canceled, causing frequent scrubbing of the PG. This would result in multiple scrub messages being printed in the cluster log for the same PG.

With this fix, the scrub starts message is logged for a PG only when the replicas confirm the scrub reservation and the PG is going through the scrubbing process.

BZ#2211758

The detection code is reintroduced and spillover appears as expected


Previously, a refactor removed the spillover detection code, so spillover from the dedicated Block.DB device to the main block device would never get detected.

With this fix, the detection code is reintroduced and the spillover is reported properly.

BZ#2237880

The libcephsqlite zeros short reads at the correct region


Previously, libcephsqlite zeroed short reads from RADOS incorrectly, causing the SQLite database to get corrupted.

With this fix, libcephsqlite zeros short reads at the correct region of the buffer and no corruption occurs.

BZ#2240144

An exception is added to handle a Ceph Monitor crash


Previously, if the ceph health mute command was run with the incorrect syntax, it would trigger the Ceph Monitor to crash.

With this fix, an exception is added to handle the ceph health mute command and the Ceph Monitor handles this exception. The command errors out and notifies the user that the command has the wrong syntax.
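
For example, a sketch of the expected syntax (the health code and TTL are illustrative):

ceph health mute OSD_DOWN 1h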

BZ#2247232

The correct CRUSH location of the OSDs parent (host) is determined


Previously, when the osd_memory_target_autotune option was enabled, the memory target was applied at the host level. This
was done by using a host mask when auto-tuning the memory. However, the code that applied the memory target did not determine the correct CRUSH location of the parent host, so the change could not be propagated to the OSDs of the host. As a result, none of the OSDs hosted by the machine were notified by the config observer and the osd_memory_target remained unchanged for that set of OSDs.

With this fix, the correct CRUSH location of the OSDs parent (host) is determined based on the host mask. This allows the
change to propagate to the OSDs on the host. All the OSDs hosted by the machine are notified whenever the auto-tuner
applies a new osd_memory_target and the change is reflected.

BZ#2249014

The ceph config dump command output is now consistent


Previously, the ceph config dump command without the pretty print formatted output showed the localized option name and
its value. An example of a normalized vs localized option is shown below:

Normalized: mgr/dashboard/ssl_server_port
Localized: mgr/dashboard/x/ssl_server_port

However, the pretty-printed (for example, JSON) version of the command only showed the normalized option name as shown
in the example above. The ceph config dump command result was inconsistent between with and without the pretty-print
option.



With this fix, the output is consistent and always shows the localized option name when using the ceph config dump --format
TYPE command, with TYPE as the pretty-print type.
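
For example, a sketch of the two invocations that now report the same localized option names, such as mgr/dashboard/x/ssl_server_port:

ceph config dump
ceph config dump --format json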

BZ#2249017

The check_past_interval_bounds uses the max_oldest_map to calculate the start interval


Previously, the oldest OSDMap used to calculate the past interval bounds was local to the OSD, rather than the max_oldest_map received from other peers. A specific OSD's oldest_map can lag behind the max_oldest_map across all peers for a while. As a result, an assert would be triggered in check_past_interval_bounds.

With this fix, check_past_interval_bounds uses the max_oldest_map (renamed to cluster_osdmap_trim_lower_bound) to calculate the start interval. In addition, the option osd_skip_check_past_interval_bounds is introduced to allow OSDs to recover from this issue after applying the fix.
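
For example, a sketch of enabling the recovery option on the affected OSDs (applying it cluster-wide to the osd section is an assumption for illustration):

ceph config set osd osd_skip_check_past_interval_bounds true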

BZ#2253672

Ceph-Ansible
The "manage nodes with cephadm - ipv4/6" task option works as expected
Previously, when there was more than one IP address in the cephadm_mgmt_network, the tasks with the "manage nodes with cephadm - ipv4/6" parameter would be skipped as the condition was not met. The condition tests whether the cephadm_mgmt_network is an IP address, and a list of IPs cannot be an address.

With this fix, the first IP is extracted from the list to test whether it is an IP address, and the "manage nodes with cephadm - ipv4/6" task works as expected.

BZ#2231469

The Ceph packages now install without stopping any of the running Ceph services
Previously, during the upgrade, all Ceph services stopped running as the Ceph 4 packages would be uninstalled instead of
updating.

With this fix, the new Ceph 5 packages are installed during upgrades and do not impact the running Ceph processes.

BZ#2233444

Known issues
Edit online
This section documents known issues found in IBM Storage Ceph 5.3z6 release.

Ceph Dashboard

Ceph Dashboard
Some metrics are displayed as null leading to blank spaces in graphs
Some metrics on the Ceph dashboard are shown as null, which leads to blank spaces in the graphs, because a metric is not initialized until it has a value.

As a workaround, edit the Grafana panel in which the issue is present. From the Edit menu, click Migrate and select Connect
Nulls. Choose Always and the issue is resolved.

Concepts
Edit online
Learn about the architecture, data security, and hardening concepts for IBM Storage Ceph.

Architecture
Data Security and Hardening



Architecture
Edit online
Know the architecture information for Ceph Storage Clusters and their clients.

Ceph architecture
Core Ceph components
Ceph client components
Ceph on-wire encryption

Ceph architecture
Edit online
IBM Ceph Storage cluster is a distributed data object store designed to provide excellent performance, reliability and scalability.
Distributed object stores are the future of storage, because they accommodate unstructured data, and because clients can use
modern object interfaces and legacy interfaces simultaneously.

For example:

APIs in many languages (C/C++, Java, Python)

RESTful interfaces (S3/Swift)

Block device interface

Filesystem interface

The power of the IBM Ceph Storage cluster can transform your organization's IT infrastructure and your ability to manage vast amounts of data, especially for cloud computing platforms like Red Hat Enterprise Linux OSP. The IBM Ceph Storage cluster delivers extraordinary scalability: thousands of clients accessing petabytes to exabytes of data and beyond.

At the heart of every Ceph deployment is the IBM Ceph Storage cluster. It consists of three types of daemons:

Ceph OSD Daemon


Ceph OSDs store data on behalf of Ceph clients. Additionally, Ceph OSDs utilize the CPU, memory and networking of Ceph
nodes to perform data replication, erasure coding, rebalancing, recovery, monitoring and reporting functions.

Ceph Monitor
A Ceph Monitor maintains a master copy of the IBM Ceph Storage cluster map with the current state of the IBM Ceph Storage
cluster. Monitors require high consistency, and use Paxos to ensure agreement about the state of the IBM Ceph Storage
cluster.

Ceph Manager
The Ceph Manager maintains detailed information about placement groups, process metadata, and host metadata in lieu of the Ceph Monitor, significantly improving performance at scale. The Ceph Manager handles execution of many of the read-only Ceph CLI queries, such as placement group statistics. The Ceph Manager also provides the RESTful monitoring APIs.

Figure 1. Daemons

Ceph client interfaces read data from and write data to the IBM Ceph Storage cluster. Clients need the following data to
communicate with the IBM Ceph Storage cluster:

The Ceph configuration file, or the cluster name (usually ceph) and the monitor address.



The pool name.

The user name and the path to the secret key.

Ceph clients maintain object IDs and the pool names where they store the objects. However, they do not need to maintain an object-to-OSD index or communicate with a centralized object index to look up object locations. Instead, Ceph clients provide an object name and pool name to librados, which computes an object's placement group and the primary OSD for storing and retrieving data using the CRUSH (Controlled Replication Under Scalable Hashing) algorithm. The Ceph client connects to the primary OSD where it may perform read and write operations. There is no intermediary server, broker, or bus between the client and the OSD.

When an OSD stores data, it receives data from a Ceph client, whether the client is a Ceph Block Device, a Ceph Object Gateway, a Ceph Filesystem, or another interface, and it stores the data as an object.

NOTE: An object ID is unique across the entire cluster, not just an OSD’s storage media.

Ceph OSDs store all data as objects in a flat namespace. There are no hierarchies of directories. An object has a cluster-wide unique
identifier, binary data, and metadata consisting of a set of name/value pairs.

Figure 2. Object

Ceph clients define the semantics for the client’s data format. For example, the Ceph block device maps a block device image to a
series of objects stored across the cluster.

NOTE: Objects consisting of a unique ID, data, and name/value paired metadata can represent both structured and unstructured
data, as well as legacy and leading edge data storage interfaces.

Core Ceph components


Edit online
An IBM Storage Ceph cluster can have a large number of Ceph nodes for limitless scalability, high availability and performance. Each
node leverages non-proprietary hardware and intelligent Ceph daemons that communicate with each other to:

Write and read data

Compress data

Ensure durability by replicating or erasure coding data

Monitor and report on cluster health--also called 'heartbeating'

Redistribute data dynamically--also called 'backfilling'

Ensure data integrity; and,

Recover from failures.

To the Ceph client interface that reads and writes data, an IBM Storage Ceph cluster looks like a simple pool where it stores data.
However, librados and the storage cluster perform many complex operations in a manner that is completely transparent to the
client interface. Ceph clients and Ceph OSDs both use the CRUSH (Controlled Replication Under Scalable Hashing) algorithm. The
following sections provide details on how CRUSH enables Ceph to perform these operations seamlessly.

Prerequisites
Ceph pools
Ceph authentication
Ceph placement groups



Ceph CRUSH ruleset
Ceph input/output operations
Ceph replication
Ceph erasure coding
Ceph ObjectStore
Ceph BlueStore
Ceph self management operations
Ceph heartbeat
Ceph peering
Ceph rebalancing and recovery
Ceph data integrity
Ceph high availability
Clustering the Ceph Monitor

Prerequisites
Edit online

A basic understanding of distributed storage systems.

Ceph pools
Edit online
The Ceph storage cluster stores data objects in logical partitions called Pools. Ceph administrators can create pools for particular
types of data, such as for block devices, object gateways, or simply just to separate one group of users from another.

From the perspective of a Ceph client, the storage cluster is very simple. When a Ceph client reads or writes data using an I/O
context, it always connects to a storage pool in the Ceph storage cluster. The client specifies the pool name, a user and a secret key,
so the pool appears to act as a logical partition with access controls to its data objects.

In actual fact, a Ceph pool is not only a logical partition for storing object data. A pool plays a critical role in how the Ceph storage
cluster distributes and stores data. However, these complex operations are completely transparent to the Ceph client.

Ceph pools define:

Pool Type: In early versions of Ceph, a pool simply maintained multiple deep copies of an object. Today, Ceph can maintain
multiple copies of an object, or it can use erasure coding to ensure durability. The data durability method is pool-wide, and
does not change after creating the pool. The pool type defines the data durability method when creating the pool. Pool types
are completely transparent to the client.

Placement Groups: In an exabyte scale storage cluster, a Ceph pool might store millions of data objects or more. Ceph must
handle many types of operations, including data durability via replicas or erasure code chunks, data integrity by scrubbing or
CRC checks, replication, rebalancing and recovery. Consequently, managing data on a per-object basis presents a scalability
and performance bottleneck. Ceph addresses this bottleneck by sharding a pool into placement groups. The CRUSH algorithm
computes the placement group for storing an object and computes the Acting Set of OSDs for the placement group. CRUSH
puts each object into a placement group. Then, CRUSH stores each placement group in a set of OSDs. System administrators
set the placement group count when creating or modifying a pool.

CRUSH Ruleset: CRUSH plays another important role: CRUSH can detect failure domains and performance domains. CRUSH
can identify OSDs by storage media type and organize OSDs hierarchically into nodes, racks, and rows. CRUSH enables Ceph
OSDs to store object copies across failure domains. For example, copies of an object may get stored in different server rooms,
aisles, racks and nodes. If a large part of a cluster fails, such as a rack, the cluster can still operate in a degraded state until
the cluster recovers.

Additionally, CRUSH enables clients to write data to particular types of hardware, such as SSDs, hard drives with SSD journals, or
hard drives with journals on the same drive as the data. The CRUSH ruleset determines failure domains and performance domains
for the pool. Administrators set the CRUSH ruleset when creating a pool.

NOTE: An administrator CANNOT change a pool’s ruleset after creating the pool.



Durability: In exabyte scale storage clusters, hardware failure is an expectation and not an exception. When using data
objects to represent larger grained storage interfaces such as a block device, losing one or more data objects for that larger
grained interface can compromise the integrity of the larger grained storage entity potentially rendering it useless. So data
loss is intolerable. Ceph provides high data durability in two ways:

Replica pools store multiple deep copies of an object using the CRUSH failure domain to physically separate one data
object copy from another. That is, copies get distributed to separate physical hardware. This increases durability during
hardware failures.

Erasure coded pools store each object as K+M chunks, where K represents data chunks and M represents coding chunks. The sum K+M represents the number of OSDs used to store the object, and the M value represents the number of OSDs that can fail while the data can still be restored (see the worked example after this list).
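
A worked example, assuming K=4 and M=2: each object is split into 4 data chunks plus 2 coding chunks stored on 6 different OSDs, the data can still be reconstructed if any 2 of those OSDs fail, and the storage overhead is 6/4 = 1.5x rather than the 3x of a three-replica pool.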

From the client perspective, Ceph is elegant and simple. The client simply reads from and writes to pools. However, pools play an
important role in data durability, performance and high availability.

Ceph authentication
Edit online
To identify users and protect against man-in-the-middle attacks, Ceph provides its cephx authentication system, which
authenticates users and daemons.

NOTE: The cephx protocol does not address data encryption for data transported over the network or data stored in OSDs.

Cephx uses shared secret keys for authentication, meaning both the client and the monitor cluster have a copy of the client’s secret
key. The authentication protocol enables both parties to prove to each other that they have a copy of the key without actually
revealing it. This provides mutual authentication, which means the cluster is sure the user possesses the secret key, and the user is
sure that the cluster has a copy of the secret key.

Cephx

The cephx authentication protocol operates in a manner similar to Kerberos.

A user/actor invokes a Ceph client to contact a monitor. Unlike Kerberos, each monitor can authenticate users and distribute keys, so
there is no single point of failure or bottleneck when using cephx. The monitor returns an authentication data structure similar to a
Kerberos ticket that contains a session key for use in obtaining Ceph services. This session key is itself encrypted with the user’s
permanent secret key, so that only the user can request services from the Ceph monitors. The client then uses the session key to
request its desired services from the monitor, and the monitor provides the client with a ticket that will authenticate the client to the
OSDs that actually handle data. Ceph monitors and OSDs share a secret, so the client can use the ticket provided by the monitor with
any OSD or metadata server in the cluster. Like Kerberos, cephx tickets expire, so an attacker cannot use an expired ticket or session
key obtained surreptitiously. This form of authentication will prevent attackers with access to the communications medium from
either creating bogus messages under another user’s identity or altering another user’s legitimate messages, as long as the user’s
secret key is not divulged before it expires.

To use cephx, an administrator must set up users first. In the following diagram, the client.admin user invokes ceph auth
get-or-create-key from the command line to generate a username and secret key. Ceph’s auth subsystem generates the
username and key, stores a copy with the monitor(s) and transmits the user’s secret back to the client.admin user. This means
that the client and the monitor share a secret key.
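For illustration, the following commands show how an administrator might generate a user and key through the auth subsystem; the user name, capabilities, and pool are examples only.

Example

# Create a user with read access to the monitors and read/write access to one pool,
# and write the keyring to a file that can be given to the client
ceph auth get-or-create client.example mon 'allow r' osd 'allow rw pool=mypool' -o /etc/ceph/ceph.client.example.keyring
# Print only the secret key for an existing user
ceph auth get-or-create-key client.example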

NOTE: The client.admin user must provide the user ID and secret key to the user in a secure manner.

Figure 1. CephX



Ceph placement groups
Storing millions of objects in a cluster and managing them individually is resource intensive. So Ceph uses placement groups (PGs) to
make managing a huge number of objects more efficient.

A PG is a subset of a pool that serves to contain a collection of objects. Ceph shards a pool into a series of PGs. Then, the CRUSH
algorithm takes the cluster map and the status of the cluster into account and distributes the PGs evenly and pseudo-randomly to
OSDs in the cluster.

Here is how it works:

When a system administrator creates a pool, CRUSH creates a user-defined number of PGs for the pool. Generally, the number of
PGs should be a reasonably fine-grained subset of the data. For example, a pool with 100 PGs would mean that each PG
contains approximately 1% of the pool's data.

The number of PGs has a performance impact when Ceph needs to move a PG from one OSD to another OSD. If the pool has too few
PGs, Ceph will move a large percentage of the data simultaneously and the network load will adversely impact the cluster’s
performance. If the pool has too many PGs, Ceph will use too much CPU and RAM when moving tiny percentages of the data and
thereby adversely impact the cluster’s performance. For details on calculating the number of PGs to achieve optimal performance,
see PG Count.
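As a quick way to review PG counts, recent Ceph releases report the current and suggested pg_num for each pool; the following command is a sketch of how to inspect that output.

Example

# Show the current and suggested placement group counts for each pool
ceph osd pool autoscale-status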

Ceph ensures against data loss by storing replicas of an object or by storing erasure code chunks of an object. Since Ceph stores
objects or erasure code chunks of an object within PGs, Ceph replicates each PG in a set of OSDs called the Acting Set for each copy
of an object or each erasure code chunk of an object. A system administrator can determine the number of PGs in a pool and the
number of replicas or erasure code chunks. However, the CRUSH algorithm calculates which OSDs are in the acting set for a
particular PG.

The CRUSH algorithm and PGs make Ceph dynamic. Changes in the cluster map or the cluster state may result in Ceph moving PGs
from one OSD to another automatically.

Here are a few examples:

Expanding the Cluster: When adding a new host and its OSDs to the cluster, the cluster map changes. Since CRUSH evenly
and pseudo-randomly distributes PGs to OSDs throughout the cluster, adding a new host and its OSDs means that CRUSH will
reassign some of the pool’s placement groups to those new OSDs. That means that system administrators do not have to
rebalance the cluster manually. Also, it means that the new OSDs contain approximately the same amount of data as the other
OSDs. It also means that the new OSDs are not filled solely with newly written data, which prevents hot spots in the cluster.

An OSD Fails: When an OSD fails, the state of the cluster changes. Ceph temporarily loses one of the replicas or erasure code
chunks, and needs to make another copy. If the primary OSD in the acting set fails, the next OSD in the acting set becomes the
primary and CRUSH calculates a new OSD to store the additional copy or erasure code chunk.

By managing millions of objects within the context of hundreds to thousands of PGs, the Ceph storage cluster can grow, shrink and
recover from failure efficiently.

For Ceph clients, the CRUSH algorithm via librados makes the process of reading and writing objects very simple. A Ceph client
simply writes an object to a pool or reads an object from a pool. The primary OSD in the acting set can write replicas of the object or
erasure code chunks of the object to the secondary OSDs in the acting set on behalf of the Ceph client.



If the cluster map or cluster state changes, the CRUSH computation for which OSDs store the PG will change too. For example, a
Ceph client may write object foo to the pool bar. CRUSH will assign the object to PG 1.a, and store it on OSD 5, which makes
replicas on OSD 10 and OSD 15 respectively. If OSD 5 fails, the cluster state changes. When the Ceph client reads object foo from
pool bar, the client via librados will automatically retrieve it from OSD 10 as the new primary OSD dynamically.

The Ceph client via librados connects directly to the primary OSD within an acting set when writing and reading objects. Since I/O
operations do not use a centralized broker, network oversubscription is typically NOT an issue with Ceph.

The following diagram depicts how CRUSH assigns objects to PGs, and PGs to OSDs. The CRUSH algorithm assigns the PGs to OSDs
such that each OSD in the acting set is in a separate failure domain, which typically means the OSDs will always be on separate
server hosts and sometimes in separate racks.

Figure 1. Placement Groups

Ceph CRUSH ruleset


Ceph assigns a CRUSH ruleset to a pool. When a Ceph client stores or retrieves data in a pool, Ceph identifies the CRUSH ruleset, a
rule within the rule set, and the top-level bucket in the rule for storing and retrieving data. As Ceph processes the CRUSH rule, it
identifies the primary OSD that contains the placement group for an object. That enables the client to connect directly to the OSD,
access the placement group and read or write object data.

To map placement groups to OSDs, a CRUSH map defines a hierarchical list of bucket types. The list of bucket types are located
under types in the generated CRUSH map. The purpose of creating a bucket hierarchy is to segregate the leaf nodes by their failure
domains and/or performance domains, such as drive type, hosts, chassis, racks, power distribution units, pods, rows, rooms, and
data centers.

With the exception of the leaf nodes representing OSDs, the rest of the hierarchy is arbitrary. Administrators may define it according
to their own needs if the default types don’t suit their requirements. CRUSH supports a directed acyclic graph that models the Ceph
OSD nodes, typically in a hierarchy. So Ceph administrators can support multiple hierarchies with multiple root nodes in a single
CRUSH map. For example, an administrator can create a hierarchy representing higher cost SSDs for high performance, and a
separate hierarchy of lower cost hard drives with SSD journals for moderate performance.

Ceph input/output operations


Ceph clients retrieve a Cluster Map from a Ceph monitor, bind to a pool, and perform input/output (I/O) on objects within placement
groups in the pool. The pool’s CRUSH ruleset and the number of placement groups are the main factors that determine how Ceph will
place the data. With the latest version of the cluster map, the client knows about all of the monitors and OSDs in the cluster and their
current state. However, the client doesn’t know anything about object locations.

The only inputs required by the client are the object ID and the pool name. It is simple: Ceph stores data in named pools. When a
client wants to store a named object in a pool it takes the object name, a hash code, the number of PGs in the pool and the pool
name as inputs; then, CRUSH (Controlled Replication Under Scalable Hashing) calculates the ID of the placement group and the
primary OSD for the placement group.



Ceph clients use the following steps to compute PG IDs; the example after the list shows how to verify the result from the command line.

1. The client inputs the pool ID and the object ID. For example, pool = liverpool and object-id = john.

2. CRUSH takes the object ID and hashes it.

3. CRUSH calculates the hash modulo of the number of PGs to get a PG ID. For example, 58.

4. CRUSH calculates the primary OSD corresponding to the PG ID.

5. The client gets the pool ID given the pool name. For example, the pool liverpool is pool number 4.

6. The client prepends the pool ID to the PG ID. For example, 4.58.

7. The client performs an object operation such as write, read, or delete by communicating directly with the Primary OSD in the
Acting Set.
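To see the result of this computation for a specific object, you can ask the cluster for the mapping directly; the following sketch reuses the pool and object names from the example above.

Example

# Show the placement group and acting set that CRUSH computes for object "john" in pool "liverpool"
ceph osd map liverpool john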

The topology and state of the Ceph storage cluster are relatively stable during a session. Empowering a Ceph client via librados to
compute object locations is much faster than requiring the client to make a query to the storage cluster over a chatty session for
each read/write operation. The CRUSH algorithm allows a client to compute where objects should be stored, and enables the client
to contact the primary OSD in the acting set directly to store or retrieve data in the objects. Since a cluster at the exabyte scale has
thousands of OSDs, network oversubscription between a client and a Ceph OSD is not a significant problem. If the cluster state
changes, the client can simply request an update to the cluster map from the Ceph monitor.

Ceph replication
Like Ceph clients, Ceph OSDs can contact Ceph monitors to retrieve the latest copy of the cluster map. Ceph OSDs also use the
CRUSH algorithm, but they use it to compute where to store replicas of objects. In a typical write scenario, a Ceph client uses the
CRUSH algorithm to compute the placement group ID and the primary OSD in the Acting Set for an object. When the client writes the
object to the primary OSD, the primary OSD finds the number of replicas that it should store. The value is found in the
osd_pool_default_size setting. Then, the primary OSD takes the object ID, pool name and the cluster map and uses the CRUSH
algorithm to calculate the IDs of secondary OSDs for the acting set. The primary OSD writes the object to the secondary OSDs. When
the primary OSD receives an acknowledgment from the secondary OSDs and the primary OSD itself completes its write operation, it
acknowledges a successful write operation to the Ceph client.

Figure 1. Replicated IO

With the ability to perform data replication on behalf of Ceph clients, Ceph OSD Daemons relieve Ceph clients from that duty, while
ensuring high data availability and data safety.

NOTE: The primary OSD and the secondary OSDs are typically configured to be in separate failure domains. CRUSH computes the
IDs of the secondary OSDs with consideration for the failure domains.

Data copies: In a replicated storage pool, Ceph needs multiple copies of an object to operate in a degraded state. Ideally, a Ceph
storage cluster enables a client to read and write data even if one of the OSDs in an acting set fails. For this reason, Ceph defaults to



making three copies of an object with a minimum of two copies clean for write operations. Ceph will still preserve data even if two
OSDs fail. However, it will interrupt write operations.
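The replica counts described above can be inspected and tuned per pool; the following commands are an illustrative sketch using an example pool name.

Example

# Show the cluster-wide default number of replicas for new pools
ceph config get mon osd_pool_default_size
# Show and set the number of replicas for an existing pool
ceph osd pool get mypool size
ceph osd pool set mypool size 3
# Require at least two clean copies before acknowledging writes
ceph osd pool set mypool min_size 2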

In an erasure-coded pool, Ceph needs to store chunks of an object across multiple OSDs so that it can operate in a degraded state.
Similar to replicated pools, ideally an erasure-coded pool enables a Ceph client to read and write in a degraded state.

IMPORTANT: IBM supports the following jerasure coding values for k and m:

k=8 m=3

k=8 m=4

k=4 m=2

Ceph erasure coding


Ceph can load one of many erasure code algorithms. The earliest and most commonly used is the Reed-Solomon algorithm. An
erasure code is actually a forward error correction (FEC) code. FEC code transforms a message of K chunks into a longer message
called a code word of N chunks, such that Ceph can recover the original message from a subset of the N chunks.

More specifically, N = K+M where the variable K is the original amount of data chunks. The variable M stands for the extra or
redundant chunks that the erasure code algorithm adds to provide protection from failures. The variable N is the total number of
chunks created after the erasure coding process. The value of M is simply N-K which means that the algorithm computes N-K
redundant chunks from K original data chunks. This approach guarantees that Ceph can access all the original data, and the system is
resilient to the loss of any N-K chunks. For instance, in a K=10, N=16 configuration, or erasure coding 10/16, the erasure code algorithm
adds six extra chunks to the 10 base data chunks. In this M = N-K, or 16-10 = 6, configuration, Ceph spreads the 16
chunks across 16 OSDs. The original object can be reconstructed from any 10 of the chunks even if 6 OSDs fail, ensuring that the
IBM Storage Ceph cluster will not lose data and thereby providing a very high level of fault tolerance.

Like replicated pools, in an erasure-coded pool the primary OSD in the up set receives all write operations. In replicated pools, Ceph
makes a deep copy of each object in the placement group on the secondary OSDs in the set. For erasure coding, the process is a bit
different. An erasure coded pool stores each object as K+M chunks. It is divided into K data chunks and M coding chunks. The pool is
configured to have a size of K+M so that Ceph stores each chunk in an OSD in the acting set. Ceph stores the rank of the chunk as an
attribute of the object. The primary OSD is responsible for encoding the payload into K+M chunks and sends them to the other OSDs.
The primary OSD is also responsible for maintaining an authoritative version of the placement group logs.

For example, in a typical configuration a system administrator creates an erasure coded pool to use six OSDs and sustain the loss of
two of them. That is, (K+M = 6) such that (M = 2).
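A minimal sketch of such a configuration, assuming the jerasure plugin and example profile and pool names, might look like the following.

Example

# Define an erasure code profile with 4 data chunks and 2 coding chunks,
# placing each chunk on a separate host
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
# Create an erasure-coded pool that uses the profile
ceph osd pool create ecpool 128 128 erasure ec-4-2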

When Ceph writes the object NYAN containing ABCDEFGHIJKL to the pool, the erasure encoding algorithm splits the content into
four data chunks by simply dividing the content into four parts: ABC, DEF, GHI, and JKL. The algorithm will pad the content if the
content length is not a multiple of K. The function also creates two coding chunks: the fifth with YXY and the sixth with QGC. Ceph
stores each chunk on an OSD in the acting set, where it stores the chunks in objects that have the same name, NYAN, but reside on
different OSDs. The algorithm must preserve the order in which it created the chunks as an attribute of the object shard_t, in
addition to its name. For example, Chunk 1 contains ABC and Ceph stores it on OSD5 while chunk 5 contains YXY and Ceph stores it
on OSD4.

Figure 1. Erasure coded IO



In a recovery scenario, the client attempts to read the object NYAN from the erasure-coded pool by reading chunks 1 through 6. The
OSD informs the algorithm that chunks 2 and 6 are missing. These missing chunks are called erasures. For example, the primary OSD
could not read chunk 6 because the OSD6 is out, and could not read chunk 2, because OSD2 was the slowest and its chunk was not
taken into account. However, as soon as the algorithm has four chunks, it reads the four chunks: chunk 1 containing ABC, chunk 3
containing GHI, chunk 4 containing JKL, and chunk 5 containing YXY. Then, it rebuilds the original content of the object
ABCDEFGHIJKL, and original content of chunk 6, which contained QGC.

Splitting data into chunks is independent from object placement. The CRUSH ruleset along with the erasure-coded pool profile
determines the placement of chunks on the OSDs. For instance, using the Locally Repairable Code (lrc) plugin in the erasure code
profile creates additional chunks and requires fewer OSDs to recover from. For example, in an lrc profile configuration K=4 M=2
L=3, the algorithm creates six chunks (K+M), just as the jerasure plugin would, but the locality value (L=3) requires that the
algorithm create 2 more chunks locally. The algorithm creates the additional chunks as such, (K+M)/L. If the OSD containing chunk
0 fails, this chunk can be recovered by using chunks 1, 2 and the first local chunk. In this case, the algorithm only requires 3 chunks
for recovery instead of 5.
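As an illustrative sketch of the lrc configuration described above, the following command defines a profile with K=4, M=2, and L=3; the profile name and the crush-locality value are assumptions and must match your CRUSH hierarchy.

Example

# Locally repairable code profile: 4 data chunks, 2 coding chunks,
# plus locally generated chunks for every 3 chunks
ceph osd erasure-code-profile set lrc-4-2-3 plugin=lrc k=4 m=2 l=3 crush-failure-domain=host crush-locality=rack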

NOTE: Using erasure-coded pools disables Object Map.

Reference

For more information about CRUSH, the erasure-coding profiles, and plugins, see Storage strategies.

For more details on Object Map, see Ceph client object map.

Ceph ObjectStore

ObjectStore provides a low-level interface to an OSD’s raw block device. When a client reads or writes data, it interacts with the
ObjectStore interface. Ceph write operations are essentially ACID transactions: that is, they provide Atomicity, Consistency,
Isolation and Durability. ObjectStore ensures that a Transaction is all-or-nothing to provide Atomicity. The ObjectStore
also handles object semantics. An object stored in the storage cluster has a unique identifier, object data and metadata. So
ObjectStore provides Consistency by ensuring that Ceph object semantics are correct. ObjectStore also provides the
Isolation portion of an ACID transaction by invoking a Sequencer on write operations to ensure that Ceph write operations occur
sequentially. In contrast, an OSDs replication or erasure coding functionality provides the Durability component of the ACID
transaction. Since ObjectStore is a low-level interface to storage media, it also provides performance statistics.

Ceph implements several concrete methods for storing data:



BlueStore
A production grade implementation using a raw block device to store object data.

Memstore
A developer implementation for testing read/write operations directly in RAM.

K/V Store
An internal implementation for Ceph’s use of key/value databases.

Since administrators will generally only address BlueStore, the following section describes only that implementation in
greater detail.

Ceph BlueStore
BlueStore is the next generation storage implementation for Ceph. As the market for storage devices now includes solid state
drives or SSDs and non-volatile memory over PCI Express or NVMe, their use in Ceph reveals some of the limitations of the
FileStore storage implementation. While FileStore has many improvements to facilitate SSD and NVMe storage, other
limitations remain. Among them, increasing placement groups remains computationally expensive, and the double write penalty
remains. Whereas FileStore interacts with a file system on a block device, BlueStore eliminates that layer of indirection and
directly consumes a raw block device for object storage. BlueStore uses the very lightweight BlueFS file system on a small
partition for its k/v databases. BlueStore eliminates the paradigm of a directory representing a placement group, a file representing
an object and file XATTRs representing metadata. BlueStore also eliminates the double write penalty of FileStore, so write
operations are nearly twice as fast with BlueStore under most workloads.

BlueStore stores data as:

Object Data
In BlueStore, Ceph stores objects as blocks directly on a raw block device. The portion of the raw block device that stores
object data does NOT contain a filesystem. The omission of the filesystem eliminates a layer of indirection and thereby
improves performance. However, much of the BlueStore performance improvement comes from the block database and
write-ahead log.

Block Database
In BlueStore, the block database handles the object semantics to guarantee Consistency. An object’s unique identifier is a
key in the block database. The values in the block database consist of a series of block addresses that refer to the stored
object data, the object’s placement group, and object metadata. The block database may reside on a BlueFS partition on the
same raw block device that stores the object data, or it may reside on a separate block device, usually when the primary block
device is a hard disk drive and an SSD or NVMe will improve performance. The block database provides a number of
improvements over FileStore; namely, the key/value semantics of BlueStore do not suffer from the limitations of
filesystem XATTRs. BlueStore may assign objects to other placement groups quickly within the block database without the
overhead of moving files from one directory to another, as is the case in FileStore. BlueStore also introduces new
features. The block database can store the checksum of the stored object data and its metadata, allowing full data checksum
operations for each read, which is more efficient than periodic scrubbing to detect bit rot. BlueStore can compress an object
and the block database can store the algorithm used to compress an object—ensuring that read operations select the
appropriate algorithm for decompression.

Write-ahead Log
In BlueStore, the write-ahead log ensures Atomicity, similar to the journaling functionality of FileStore. Like FileStore,
BlueStore logs all aspects of each transaction. However, the BlueStore write-ahead log or WAL can perform this function
simultaneously, which eliminates the double write penalty of FileStore. Consequently, BlueStore is nearly twice as fast as
FileStore on write operations for most workloads. BlueStore can deploy the WAL on the same device for storing object
data, or it may deploy the WAL on another device, usually when the primary block device is a hard disk drive and an SSD or
NVMe will improve performance.

NOTE: It is only helpful to store a block database or a write-ahead log on a separate block device if the separate device is faster than
the primary storage device. For example, SSD and NVMe devices are generally faster than HDDs. Placing the block database and the
WAL on separate devices may also have performance benefits due to differences in their workloads.
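A hedged sketch of placing the block database and WAL on faster devices with a cephadm OSD service specification follows; the service id, placement, and device filters are assumptions and must match your hardware.

Example

# Write an OSD service specification that puts object data on rotational drives
# and the block database (and WAL) on non-rotational drives, then apply it
cat > osd_spec.yaml << 'EOF'
service_type: osd
service_id: osds_with_fast_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
EOF
ceph orch apply -i osd_spec.yaml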

Ceph self management operations


Ceph clusters perform a lot of self monitoring and management operations automatically. For example, Ceph OSDs can check the
cluster health and report back to the Ceph monitors. By using CRUSH to assign objects to placement groups and placement groups
to a set of OSDs, Ceph OSDs can use the CRUSH algorithm to rebalance the cluster or recover from OSD failures dynamically.

Ceph heartbeat
Ceph OSDs join a cluster and report to Ceph Monitors on their status. At the lowest level, the Ceph OSD status is up or down
reflecting whether or not it is running and able to service Ceph client requests. If a Ceph OSD is down and in the Ceph storage
cluster, this status may indicate the failure of the Ceph OSD. If a Ceph OSD is not running, for example, because it crashed, the Ceph OSD
cannot notify the Ceph Monitor that it is down. The Ceph Monitor can ping a Ceph OSD daemon periodically to ensure that it is
running. However, heartbeating also empowers Ceph OSDs to determine if a neighboring OSD is down, to update the cluster map and
to report it to the Ceph Monitors. This means that Ceph Monitors can remain light weight processes.
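The up or down status that OSDs report through heartbeats can be inspected from the command line; the following commands are a simple illustration.

Example

# Summary of how many OSDs exist and how many are up and in
ceph osd stat
# Per-OSD status organized by the CRUSH hierarchy
ceph osd tree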

Ceph peering
Ceph stores copies of placement groups on multiple OSDs. Each copy of a placement group has a status. The OSDs that store a
placement group peer with each other to ensure that they agree on the status of each copy of the PG. Peering issues usually resolve themselves.

NOTE: When Ceph monitors agree on the state of the OSDs storing a placement group, that does not mean that the placement group
has the latest contents.

When Ceph stores a placement group in an acting set of OSDs, it refers to the OSDs as Primary, Secondary, and so forth. By convention, the
Primary is the first OSD in the Acting Set. The Primary that stores the first copy of a placement group is responsible for coordinating
the peering process for that placement group. The Primary is the ONLY OSD that will accept client-initiated writes to objects for a
given placement group where it acts as the Primary.

An Acting Set is a series of OSDs that are responsible for storing a placement group. An Acting Set may refer to the Ceph OSD
Daemons that are currently responsible for the placement group, or the Ceph OSD Daemons that were responsible for a particular
placement group as of some epoch.

The Ceph OSD daemons that are part of an Acting Set may not always be up. When an OSD in the Acting Set is up, it is part of the Up
Set. The Up Set is an important distinction, because Ceph can remap PGs to other Ceph OSDs when an OSD fails.

NOTE: In an Acting Set for a PG containing osd.25, osd.32 and osd.61, the first OSD, osd.25, is the Primary. If that OSD fails, the
Secondary, osd.32, becomes the Primary, and Ceph will remove osd.25 from the Up Set.
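To see the Acting Set and Up Set for a particular placement group, you can query the cluster directly; the placement group ID below reuses the 1.a example from earlier in this section.

Example

# Show the OSD map epoch, up set, and acting set for placement group 1.a
ceph pg map 1.a
# Detailed peering state for the same placement group
ceph pg 1.a query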

Ceph rebalancing and recovery


When an administrator adds a Ceph OSD to a Ceph storage cluster, Ceph updates the cluster map. This change to the cluster map
also changes object placement, because the modified cluster map changes an input for the CRUSH calculations. CRUSH places data
evenly, but pseudo randomly. So only a small amount of data moves when an administrator adds a new OSD. The fraction of data that
moves is usually the number of new OSDs divided by the total number of OSDs in the cluster. For example, in a cluster with 50 OSDs, 1/50th or
2% of the data might move when adding an OSD.

The following diagram depicts the rebalancing process where some, but not all of the PGs migrate from existing OSDs, OSD 1 and 2
in the diagram, to the new OSD, OSD 3, in the diagram. Even when rebalancing, CRUSH is stable. Many of the placement groups
remain in their original configuration, and each OSD gets some added capacity, so there are no load spikes on the new OSD after the
cluster rebalances.

Figure 1. Rebalancing and recovery



Ceph data integrity
As part of maintaining data integrity, Ceph provides numerous mechanisms to guard against bad disk sectors and bit rot.

Scrubbing
Ceph OSD Daemons can scrub objects within placement groups. That is, Ceph OSD Daemons can compare object metadata in
one placement group with its replicas in placement groups stored on other OSDs. Scrubbing, usually performed daily, catches
bugs or storage errors. Ceph OSD Daemons also perform deeper scrubbing by comparing data in objects bit-for-bit. Deep
scrubbing, usually performed weekly, finds bad sectors on a drive that weren't apparent in a light scrub.

CRC Checks
In IBM Storage Ceph 5, when using BlueStore, Ceph can ensure data integrity by conducting a cyclic redundancy check
(CRC) on write operations and then storing the CRC value in the block database. On read operations, Ceph can retrieve the CRC
value from the block database and compare it with the generated CRC of the retrieved data to ensure data integrity instantly.
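Scrubs normally run on a schedule, but they can also be triggered manually; the following commands are an illustrative sketch using an example placement group ID.

Example

# Ask Ceph to scrub or deep-scrub a specific placement group
ceph pg scrub 1.a
ceph pg deep-scrub 1.a
# Review the intervals that control scheduled scrubbing
ceph config get osd osd_scrub_min_interval
ceph config get osd osd_deep_scrub_interval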

Ceph high availability


In addition to the high scalability enabled by the CRUSH algorithm, Ceph must also maintain high availability. This means that Ceph
clients must be able to read and write data even when the cluster is in a degraded state, or when a monitor fails.

Clustering the Ceph Monitor


Before Ceph clients can read or write data, they must contact a Ceph Monitor to obtain the most recent copy of the cluster map. An
IBM Storage Ceph cluster can operate with a single monitor; however, this introduces a single point of failure. That is, if the monitor
goes down, Ceph clients cannot read or write data.

For added reliability and fault tolerance, Ceph supports a cluster of monitors. In a cluster of Ceph Monitors, latency and other faults
can cause one or more monitors to fall behind the current state of the cluster. For this reason, Ceph must have agreement among
various monitor instances regarding the state of the storage cluster. Ceph always uses a majority of monitors and the Paxos
algorithm to establish a consensus among the monitors about the current state of the storage cluster. Ceph Monitors nodes require
NTP to prevent clock drift.



Storage administrators usually deploy Ceph with an odd number of monitors so that determining a majority is efficient. For example, a
majority is 1 out of 1, 2 out of 3, 3 out of 5, 4 out of 6, and so forth.
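With the Ceph Orchestrator, an odd number of monitors can be requested explicitly; the host names below are placeholders.

Example

# Run three monitors on three named hosts
ceph orch apply mon --placement="host01 host02 host03"
# Or simply request a monitor count and let the orchestrator choose hosts
ceph orch apply mon 3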

Ceph client components


Ceph clients differ materially in how they present data storage interfaces. A Ceph block device presents block storage that mounts
just like a physical storage drive. A Ceph gateway presents an object storage service with S3-compliant and Swift-compliant RESTful
interfaces with its own user management. However, all Ceph clients use the Reliable Autonomic Distributed Object Store (RADOS)
protocol to interact with the IBM Storage Ceph cluster.

They all have the same basic needs:

The Ceph configuration file, and the Ceph monitor address.

The pool name.

The user name and the path to the secret key.

Ceph clients tend to follow some similar patterns, such as object-watch-notify and striping. The following sections describe RADOS,
librados, and common patterns used in Ceph clients in more detail.

Prerequisites
Ceph client native protocol
Ceph client object watch and notify
Ceph client Mandatory Exclusive Locks
Ceph client object map
Ceph client data striping

Prerequisites

A basic understanding of distributed storage systems.

Ceph client native protocol


Modern applications need a simple object storage interface with asynchronous communication capability. The Ceph Storage Cluster
provides a simple object storage interface with asynchronous communication capability. The interface provides direct, parallel
access to objects throughout the cluster.

Pool Operations

Snapshots

Read/Write Objects

Create or Remove

Entire Object or Byte Range



Append or Truncate

Create/Set/Get/Remove XATTRs

Create/Set/Get/Remove Key/Value Pairs

Compound operations and dual-ack semantics

Ceph client object watch and notify


A Ceph client can register a persistent interest with an object and keep a session to the primary OSD open. The client can send a
notification message and payload to all watchers and receive notification when the watchers receive the notification. This enables a
client to use any object as a synchronization/communication channel.

Figure 1. Ceph client object watch and notify

Ceph client Mandatory Exclusive Locks

Mandatory Exclusive Locks is a feature that locks an RBD to a single client, if multiple mounts are in place. This helps address the
write conflict situation when multiple mounted clients try to write to the same object. This feature is built on object-watch-
notify explained in the previous section. So, when writing, if one client first establishes an exclusive lock on an object, another
mounted client will first check to see if a peer has placed a lock on the object before writing.

With this feature enabled, only one client can modify an RBD device at a time, especially when changing internal RBD structures
during operations like snapshot create/delete. It also provides some protection for failed clients. For instance, if a virtual
machine seems to be unresponsive and you start a copy of it with the same disk elsewhere, the first one will be blacklisted in Ceph
and unable to corrupt the new one.



Mandatory Exclusive Locks are not enabled by default. You have to explicitly enable them with the --image-feature parameter when
creating an image.

Example

[root@mon ~]# rbd create --size 102400 mypool/myimage --image-feature 13

Here, the numeral 13 is the sum of 1, 4, and 8, where 1 enables layering support, 4 enables exclusive locking support, and 8
enables object map support. So, the above command creates a 100 GB RBD image with layering, exclusive locking, and object map support enabled.

Mandatory Exclusive Locks is also a prerequisite for object map. Without enabling exclusive locking support, object map support
cannot be enabled.

Mandatory Exclusive Locks also lays some groundwork for RBD mirroring.

Ceph client object map


Object map is a feature that tracks the presence of backing RADOS objects when a client writes to an rbd image. When a write
occurs, that write is translated to an offset within a backing RADOS object. When the object map feature is enabled, the presence of
these RADOS objects is tracked. So, we can know if the objects actually exist. Object map is kept in-memory on the librbd client so it
can avoid querying the OSDs for objects that it knows don’t exist. In other words, object map is an index of the objects that actually
exist.

Object map is beneficial for certain operations, namely:

Resize

Export

Copy

Flatten

Delete

Read

A shrink resize operation is like a partial delete where the trailing objects are deleted.

An export operation knows which objects are to be requested from RADOS.

A copy operation knows which objects exist and need to be copied. It does not have to iterate over potentially hundreds of
thousands of possible objects.

A flatten operation performs a copy-up for all parent objects to the clone so that the clone can be detached from the parent, that is, the
reference from the child clone to the parent snapshot can be removed. So, instead of all potential objects, copy-up is done only for
the objects that exist.

A delete operation deletes only the objects that exist in the image.

A read operation skips the read for objects it knows don't exist.

So, without object map, operations like resize (shrinking only), export, copy, flatten, and delete would need to issue an
operation for all potentially affected RADOS objects, whether they exist or not. With object map enabled, if the object doesn't exist,
the operation need not be issued.

For example, if we have a 1 TB sparse RBD image, it can have hundreds of thousands of backing RADOS objects. A delete operation
without object map enabled would need to issue a remove object operation for each potential object in the image. But if object
map is enabled, it only needs to issue remove object operations for the objects that exist.

Object map is valuable for clones that don't have actual objects of their own but get objects from their parents. When there is a cloned image, the
clone initially has no objects and all reads are redirected to the parent. Object map can improve reads because, without the object map,
a read first needs to be issued to the OSD for the clone and, when that fails, another read is issued to the parent. With object
map enabled, the clone skips the read for objects it knows don't exist.



Object map is not enabled by default. You have to explicitly enable it with the --image-feature parameter when creating an image.
Also, Mandatory Exclusive Locks is a prerequisite for object map. Without enabling exclusive locking support, object map
support cannot be enabled. To enable object map support when creating an image, execute:

Example

[root@mon ~]# rbd create --size 102400 mypool/myimage --image-feature 13

Here, the numeral 13 is the sum of 1, 4, and 8, where 1 enables layering support, 4 enables exclusive locking support, and 8
enables object map support. So, the above command creates a 100 GB RBD image with layering, exclusive locking, and object map support enabled.
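Both features can also be enabled on an existing image; a hedged sketch, reusing the image name from the example above, follows. After enabling object map on an image that already contains data, the map must be rebuilt.

Example

# Enable exclusive locking and the object map on an existing image
rbd feature enable mypool/myimage exclusive-lock
rbd feature enable mypool/myimage object-map
# Rebuild the object map so it reflects objects that already exist
rbd object-map rebuild mypool/myimage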

Ceph client data striping


Storage devices have throughput limitations, which impact performance and scalability. So storage systems often support striping—
storing sequential pieces of information across multiple storage devices—to increase throughput and performance. The most
common form of data striping comes from RAID. The RAID type most similar to Ceph’s striping is RAID 0, or a striped volume. Ceph’s
striping offers the throughput of RAID 0 striping, the reliability of n-way RAID mirroring and faster recovery.

Ceph provides three types of clients: Ceph Block Device, Ceph Filesystem, and Ceph Object Storage. A Ceph Client converts its data
from the representation format it provides to its users, such as a block device image, RESTful objects, CephFS filesystem directories,
into objects for storage in the Ceph Storage Cluster.

TIP: The objects Ceph stores in the Ceph Storage Cluster are not striped. Ceph Object Storage, Ceph Block Device, and the Ceph
Filesystem stripe their data over multiple Ceph Storage Cluster objects. Ceph Clients that write directly to the Ceph storage cluster
using librados must perform the striping, and parallel I/O for themselves to obtain these benefits.

The simplest Ceph striping format involves a stripe count of 1 object. Ceph Clients write stripe units to a Ceph Storage Cluster object
until the object is at its maximum capacity, and then create another object for additional stripes of data. The simplest form of striping
may be sufficient for small block device images, S3 or Swift objects. However, this simple form doesn’t take maximum advantage of
Ceph’s ability to distribute data across placement groups, and consequently doesn’t improve performance very much. The following
diagram depicts the simplest form of striping:

Figure 1. Data Striping

If you anticipate large image sizes, or large S3 or Swift objects, for example, video, you may see considerable read/write performance
improvements by striping client data over multiple objects within an object set. Significant write performance occurs when the client
writes the stripe units to their corresponding objects in parallel. Since objects get mapped to different placement groups and further



mapped to different OSDs, each write occurs in parallel at the maximum write speed. A write to a single disk would be limited by the
head movement, for example, 6 ms per seek, and the bandwidth of that one device, for example, 100 MB/s. By spreading that write over
multiple objects, which map to different placement groups and OSDs, Ceph can reduce the number of seeks per drive and combine
the throughput of multiple drives to achieve much faster write or read speeds.

NOTE: Striping is independent of object replicas. Since CRUSH replicates objects across OSDs, stripes get replicated automatically.

In the following diagram, client data gets striped across an object set (object set 1 in the following diagram) consisting of 4
objects, where the first stripe unit is stripe unit 0 in object 0, and the fourth stripe unit is stripe unit 3 in object 3.
After writing the fourth stripe, the client determines if the object set is full. If the object set is not full, the client begins writing a
stripe to the first object again, see object 0 in the following diagram. If the object set is full, the client creates a new object set, see
object set 2 in the following diagram, and begins writing to the first stripe, with a stripe unit of 16, in the first object in the new
object set, see object 4 in the diagram below.

Figure 2. Client Data Striping

Three important variables determine how Ceph stripes data:

Object Size



Objects in the Ceph Storage Cluster have a maximum configurable size, such as 2 MB or 4 MB. The object size should be large
enough to accommodate many stripe units, and should be a multiple of the stripe unit.

IMPORTANT: IBM recommends a safe maximum value of 16 MB.

Stripe Width
Stripes have a configurable unit size, for example 64 KB. The Ceph Client divides the data it will write to objects into equally
sized stripe units, except for the last stripe unit. A stripe width should be a fraction of the Object Size so that an object may
contain many stripe units.

Stripe Count
The Ceph Client writes a sequence of stripe units over a series of objects determined by the stripe count. The series of objects
is called an object set. After the Ceph Client writes to the last object in the object set, it returns to the first object in the object
set.

IMPORTANT: Test the performance of your striping configuration before putting your cluster into production. You CANNOT change
these striping parameters after you stripe the data and write it to objects.
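These three variables are set when the image is created. For RBD, a hedged sketch follows; the image name and values are examples only, and the stripe unit must evenly divide the object size.

Example

# Create an image with 4 MB objects, a 64 KB stripe unit, and a stripe count of 4
rbd create --size 102400 --object-size 4M --stripe-unit 64K --stripe-count 4 mypool/stripedimage
# Confirm the striping parameters
rbd info mypool/stripedimage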

Once the Ceph Client has striped data to stripe units and mapped the stripe units to objects, Ceph’s CRUSH algorithm maps the
objects to placement groups, and the placement groups to Ceph OSD Daemons before the objects are stored as files on a storage
disk.

NOTE: Since a client writes to a single pool, all data striped into objects get mapped to placement groups in the same pool. So they
use the same CRUSH map and the same access controls.

Ceph on-wire encryption


You can enable encryption for all Ceph traffic over the network with the introduction of the messenger version 2 protocol. The
secure mode setting for messenger v2 encrypts communication between Ceph daemons and Ceph clients, giving you end-to-end
encryption.

The second version of Ceph’s on-wire protocol, msgr2, includes several new features:

A secure mode encrypting all data moving through the network.

Encapsulation improvement of authentication payloads.

Improvements to feature advertisement and negotiation.

The Ceph daemons bind to multiple ports allowing both the legacy, v1-compatible, and the new, v2-compatible, Ceph clients to
connect to the same storage cluster. Ceph clients or other Ceph daemons connecting to the Ceph Monitor daemon will try to use the
v2 protocol first, if possible, but if not, then the legacy v1 protocol will be used. By default, both messenger protocols, v1 and v2, are
enabled. The new v2 port is 3300, and the legacy v1 port is 6789, by default.

The messenger v2 protocol has two configuration options that control whether the v1 or the v2 protocol is used:

ms_bind_msgr1
This option controls whether a daemon binds to a port speaking the v1 protocol; it is true by default.

ms_bind_msgr2
This option controls whether a daemon binds to a port speaking the v2 protocol; it is true by default.

Similarly, two options control based on IPv4 and IPv6 addresses used:

ms_bind_ipv4
This option controls whether a daemon binds to an IPv4 address; it is true by default.

ms_bind_ipv6
This option controls whether a daemon binds to an IPv6 address; it is true by default.

The msgr2 protocol supports two connection modes:

crc

Provides strong initial authentication when a connection is established with cephx.



Provides a crc32c integrity check to protect against bit flips.

Does not provide protection against a malicious man-in-the-middle attack.

Does not prevent an eavesdropper from seeing all post-authentication traffic.

secure

Provides strong initial authentication when a connection is established with cephx.

Provides full encryption of all post-authentication traffic.

Provides a cryptographic integrity check.

The default mode is crc.

Ensure that you consider cluster CPU requirements when you plan the IBM Storage Ceph cluster deployment to include encryption
overhead.

IMPORTANT: Using secure mode is supported by Ceph clients using librbd, such as OpenStack Nova, Glance, and Cinder.
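Connection modes are controlled per traffic type through configuration options; the following sketch shows how secure mode might be selected for intra-cluster, daemon-to-client, and client traffic. Verify client compatibility and CPU headroom before changing these values.

Example

# Inspect the current messenger settings
ceph config get mon ms_bind_msgr2
# Use the secure connection mode for intra-cluster, service, and client traffic
ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
ceph config set global ms_client_mode secure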

Address Changes

For both versions of the messenger protocol to coexist in the same storage cluster, the address formatting has changed:

Syntax for old address format

IP_ADDR:PORT/CLIENT_ID

For example, 1.2.3.4:5678/91011

Syntax for new address format

PROTOCOL_VERSION:IP_ADDR:PORT/CLIENT_ID

For example, v2:1.2.3.4:5678/91011, where PROTOCOL_VERSION can be either v1 or v2.

Because the Ceph daemons now bind to multiple ports, the daemons display multiple addresses instead of a single address. Here is
an example from a dump of the monitor map:

epoch 1
fsid 50fcf227-be32-4bcb-8b41-34ca8370bd17
last_changed 2021-12-12 11:10:46.700821
created 2021-12-12 11:10:46.700821
min_mon_release 14 (nautilus)
0: [v2:10.0.0.10:3300/0,v1:10.0.0.10:6789/0] mon.a
1: [v2:10.0.0.11:3300/0,v1:10.0.0.11:6789/0] mon.b
2: [v2:10.0.0.12:3300/0,v1:10.0.0.12:6789/0] mon.c

Also, the mon_host configuration option and addresses specified on the command line, using -m, support the new address format.

Connection Phases

There are four phases for making an encrypted connection:

Banner
On connection, both the client and the server send a banner. Currently, the Ceph banner is ceph 0 0n.

Authentication Exchange
All data, sent or received, is contained in a frame for the duration of the connection. The server decides if authentication has
completed, and what the connection mode will be. The frame format is fixed, and can be in three different forms depending on
the authentication flags being used.

Message Flow Handshake Exchange


The peers identify each other and establish a session. The client sends the first message, and the server will reply with the
same message. The server can close connections if the client talks to the wrong daemon. For new sessions, the client and
server proceed to exchanging messages. Client cookies are used to identify a session, and can reconnect to an existing
session.

Message Exchange
The client and server start exchanging messages, until the connection is closed.



Reference
See Data Security and Hardening for details on enabling the msgr2 protocol.

Data Security and Hardening


This section of the document provides data security and hardening information for Ceph Storage Clusters and their clients.

Introduction to data security


Threat and Vulnerability Management
Encryption and Key Management
Identity and Access Management
Infrastructure Security
Data Retention
Federal Information Processing Standard (FIPS)
Summary

Introduction to data security


Security is an important concern and should be a strong focus of any IBM Storage Ceph deployment. Data breaches and downtime
are costly and difficult to manage, laws may require passing audits and compliance processes, and projects have an expectation of a
certain level of data privacy and security. This document provides a general introduction to security for IBM Storage Ceph, as well as
the role of IBM in supporting your system’s security.

Preface
Introduction to IBM Storage Ceph
Supporting Software

Preface
This document provides advice and good practice information for hardening the security of IBM Storage Ceph, with a focus on the
Ceph Orchestrator using cephadm for IBM Storage Ceph deployments.

While following the instructions in this guide will help harden the security of your environment, we do not guarantee security or
compliance from following these recommendations.

Introduction to IBM Storage Ceph


IBM Storage Ceph is a highly scalable and reliable object storage solution, which is typically deployed in conjunction with cloud
computing solutions like OpenStack, as a standalone storage service, or as network attached storage.

All IBM Storage Ceph deployments consist of a storage cluster commonly referred to as the Ceph Storage Cluster or RADOS (Reliable
Autonomic Distributed Object Store), which consists of three types of daemons:

Ceph Monitors (ceph-mon): Ceph monitors provide a few critical functions such as establishing an agreement about the state
of the cluster, maintaining a history of the state of the cluster such as whether an OSD is up and running and in the cluster,
providing a list of pools through which clients write and read data, and providing authentication for clients and the Ceph
Storage Cluster daemons.



Ceph Managers (ceph-mgr): Ceph manager daemons track the status of peering between copies of placement groups
distributed across Ceph OSDs, a history of the placement group states, and metrics about the Ceph cluster. They also provide
interfaces for external monitoring and management systems.

Ceph OSDs (ceph-osd): Ceph Object Storage Daemons (OSDs) store and serve client data, replicate client data to secondary
Ceph OSD daemons, track and report to Ceph Monitors on their health and on the health of neighboring OSDs, dynamically
recover from failures, and backfill data when the cluster size changes, among other functions.

All IBM Storage Ceph deployments store end-user data in the Ceph Storage Cluster or RADOS (Reliable Autonomic Distributed
Object Store). Generally, users DO NOT interact with the Ceph Storage Cluster directly; rather, they interact with a Ceph client.

There are three primary Ceph Storage Cluster clients:

Ceph Object Gateway (radosgw): The Ceph Object Gateway, also known as RADOS Gateway, radosgw or rgw provides an
object storage service with RESTful APIs. Ceph Object Gateway stores data on behalf of its clients in the Ceph Storage Cluster
or RADOS.

Ceph Block Device (rbd): The Ceph Block Device provides copy-on-write, thin-provisioned, and cloneable virtual block
devices to a Linux kernel via Kernel RBD (krbd) or to cloud computing solutions like OpenStack via librbd.

Ceph File System (cephfs): The Ceph File System consists of one or more Metadata Servers (mds), which store the inode
portion of a file system as objects on the Ceph Storage Cluster. Ceph file systems can be mounted via a kernel client, a FUSE
client, or via the libcephfs library for cloud computing solutions like OpenStack.

Additional clients include librados, which enables developers to create custom applications to interact with the Ceph Storage
cluster and command line interface clients for administrative purposes.

Supporting Software
An important aspect of IBM Storage Ceph security is to deliver solutions that have security built-in upfront, that IBM supports over
time. Specific steps which IBM takes with IBM Storage Ceph include:

Maintaining upstream relationships and community involvement to help focus on security from the start.

Selecting and configuring packages based on their security and performance track records.

Building binaries from associated source code (instead of simply accepting upstream builds).

Applying a suite of inspection and quality assurance tools to prevent an extensive array of potential security issues and
regressions.

Digitally signing all released packages and distributing them through cryptographically authenticated distribution channels.

Providing a single, unified mechanism for distributing patches and updates.

In addition, IBM maintains a dedicated security team that analyzes threats and vulnerabilities against our products, and provides
relevant advice and updates through the Customer Portal. This team determines which issues are important, as opposed to those
that are mostly theoretical problems. The IBM Product Security team maintains expertise in, and makes extensive contributions to
the upstream communities associated with our subscription products. A key part of the process, IBM Security Advisories, deliver
proactive notification of security flaws affecting IBM solutions, along with patches that are frequently distributed on the same day
the vulnerability is first published.

Threat and Vulnerability Management


IBM Storage Ceph is typically deployed in conjunction with cloud computing solutions, so it can be helpful to think about an IBM
Storage Ceph deployment abstractly as one of many series of components in a larger deployment. These deployments typically have
shared security concerns, which this guide refers to as Security Zones. Threat actors and vectors are classified based on their
motivation and access to resources. The intention is to provide you with a sense of the security concerns for each zone, depending on
your objectives.



Threat Actors
Security Zones
Connecting Security Zones
Security-Optimized Architecture

Threat Actors
A threat actor is an abstract way to refer to a class of adversary that you might attempt to defend against. The more capable the
actor, the more rigorous the security controls that are required for successful attack mitigation and prevention. Security is a matter of
balancing convenience, defense, and cost, based on requirements.

In some cases, it’s impossible to secure an IBM Storage Ceph deployment against all threat actors described here. When deploying
IBM Storage Ceph, you must decide where the balance lies for your deployment and usage.

As part of your risk assessment, you must also consider the type of data you store and any accessible resources, as this will also
influence certain actors. However, even if your data is not appealing to threat actors, they could simply be attracted to your
computing resources.

Nation-State Actors: This is the most capable adversary. Nation-state actors can bring tremendous resources against a
target. They have capabilities beyond that of any other actor. It’s difficult to defend against these actors without stringent
controls in place, both human and technical.

Serious Organized Crime: This class describes highly capable and financially driven groups of attackers. They are able to fund
in-house exploit development and target research. In recent years, the rise of organizations such as the Russian Business
Network, a massive cyber-criminal enterprise, has demonstrated how cyber attacks have become a commodity. Industrial
espionage falls within the serious organized crime group.

Highly Capable Groups: This refers to ‘Hacktivist’ type organizations who are not typically commercially funded, but can pose
a serious threat to service providers and cloud operators.

Motivated Individuals Acting Alone: These attackers come in many guises, such as rogue or malicious employees,
disaffected customers, or small-scale industrial espionage.

Script Kiddies: These attackers don’t target a specific organization, but run automated vulnerability scanning and
exploitation. They are often a nuisance; however, compromise by one of these actors is a major risk to an organization’s
reputation.

The following practices can help mitigate some of the risks identified above:

Security Updates: You must consider the end-to-end security posture of your underlying physical infrastructure, including
networking, storage, and server hardware. These systems will require their own security hardening practices. For your IBM
Storage Ceph deployment, you should have a plan to regularly test and deploy security updates.

Product Updates: IBM recommends running product updates as they become available. Updates are typically released every
six weeks (and occasionally more frequently). IBM endeavors to make point releases and z-stream releases fully compatible
within a major release in order to not require additional integration testing.

Access Management: Access management includes authentication, authorization, and accounting. Authentication is the
process of verifying the user’s identity. Authorization is the process of granting permissions to an authenticated user.
Accounting is the process of tracking which user performed an action. When granting system access to users, apply the
principle of least privilege, and only grant users the granular system privileges they actually need. This approach can also help
mitigate the risks of both malicious actors and typographical errors from system administrators.

Manage Insiders: You can help mitigate the threat of malicious insiders by applying careful assignment of role-based access
control (minimum required access), using encryption on internal interfaces, and using authentication/authorization security
(such as centralized identity management). You can also consider additional non-technical options, such as separation of
duties and irregular job role rotation.

Security Zones



A security zone comprises users, applications, servers, or networks that share common trust requirements and expectations within a
system. Typically they share the same authentication, authorization requirements, and users. Although you might refine these zone
definitions further, this section refers to four distinct security zones, three of which form the bare minimum that is required to deploy
a security-hardened IBM Storage Ceph cluster. These security zones are listed below from least to most trusted:

Public Security Zone: The public security zone is an entirely untrusted area of the cloud infrastructure. It can refer to the
Internet as a whole or simply to networks that are external to your OpenStack deployment over which you have no authority.
Any data with confidentiality or integrity requirements that traverse this zone should be protected using compensating
controls such as encryption. The public security zone SHOULD NOT be confused with the Ceph Storage Cluster’s front- or
client-side network, which is referred to as the public_network in IBM Storage Ceph and is usually NOT part of the public
security zone or the Ceph client security zone.

Ceph Client Security Zone: With IBM Storage Ceph, the Ceph client security zone refers to networks accessing Ceph clients
such as Ceph Object Gateway, Ceph Block Device, Ceph Filesystem, or librados. The Ceph client security zone is typically
behind a firewall separating itself from the public security zone. However, Ceph clients are not always protected from the
public security zone. It is possible to expose the Ceph Object Gateway’s S3 and Swift APIs in the public security zone.

Storage Access Security Zone: The storage access security zone refers to internal networks providing Ceph clients with
access to the Ceph Storage Cluster. We use the phrase storage access security zone so that this document is consistent with
the terminology used in the OpenStack Platform Security and Hardening Guide. The storage access security zone includes the
Ceph Storage Cluster’s front- or client-side network, which is referred to as the public_network in IBM Storage Ceph.

Ceph Cluster Security Zone: The Ceph cluster security zone refers to the internal networks providing the Ceph Storage
Cluster’s OSD daemons with network communications for replication, heartbeating, backfilling, and recovery. The Ceph
cluster security zone includes the Ceph Storage Cluster’s backside network, which is referred to as the cluster_network in
IBM Storage Ceph. These security zones can be mapped separately, or combined to represent the majority of the possible
areas of trust within a given IBM Storage Ceph deployment. Security zones should be mapped out against your specific
deployment topology. The zones and their trust requirements will vary depending upon whether the storage cluster is
operating in a standalone capacity or is serving a public, private, or hybrid cloud.

For a visual representation of these security zones, see Security-Optimized Architecture.

Reference

For more information, see Network Communication.

Connecting Security Zones


Any component that spans across multiple security zones with different trust levels or authentication requirements must be carefully
configured. These connections are often the weak points in network architecture, and should always be configured to meet the
security requirements of the highest trust level of any of the zones being connected. In many cases, the security controls of the
connected zones should be a primary concern due to the likelihood of attack. The points where zones meet do present an
opportunity for attackers to migrate or target their attack to more sensitive parts of the deployment.

In some cases, IBM Storage Ceph administrators might want to consider securing integration points at a higher standard than any of
the zones in which the integration point resides. For example, the Ceph Cluster Security Zone can be isolated from other security
zones easily, because there is no reason for it to connect to other security zones. By contrast, the Storage Access Security Zone must
provide access to port 6789 on Ceph monitor nodes, and ports 6800-7300 on Ceph OSD nodes. However, port 3000 should be
exclusive to the Storage Access Security Zone, because it provides access to Ceph Grafana monitoring information that should be
exposed to Ceph administrators only. A Ceph Object Gateway in the Ceph Client Security Zone will need to access the Ceph Cluster
Security Zone’s monitors (port 6789) and OSDs (ports 6800-7300), and may expose its S3 and Swift APIs to the Public Security
Zone such as over HTTP port 80 or HTTPS port 443; yet, it may still need to restrict access to the admin API.

As core services usually span at least two zones, special consideration must be given when applying security controls to them.

Security-Optimized Architecture


An IBM Storage Ceph cluster’s daemons typically run on nodes that are subnet isolated and behind a firewall, which makes it
relatively simple to secure a cluster.

By contrast, IBM Storage Ceph clients such as Ceph Block Device (rbd), Ceph Filesystem (cephfs), and Ceph Object Gateway (rgw)
access the IBM Storage Ceph cluster, but expose their services to other cloud computing platforms.

Figure 1. Security-optimized architecture

Encryption and Key Management


The IBM Storage Ceph cluster typically resides in its own network security zone, especially when using a private storage cluster
network.

IMPORTANT: Security zone separation might be insufficient for protection if an attacker gains access to Ceph clients on the public
network.

There are situations where there is a security requirement to assure the confidentiality or integrity of network traffic, and where IBM
Storage Ceph uses encryption and key management, including:

SSH

SSL Termination

Encryption in Transit

Encryption at Rest

SSH
SSL Termination
Messenger v2 protocol
Encryption in transit
Encryption at Rest

SSH
All nodes in the IBM Storage Ceph cluster use SSH as part of deploying the cluster. This means that on each node:

A cephadm user exists with password-less root privileges.

The SSH service is enabled and by extension port 22 is open.

A copy of the cephadm user’s public SSH key is available.

IMPORTANT: Any person with access to the cephadm user by extension has permission to run commands as root on any node in
the IBM Storage Ceph cluster.
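
As a hedged illustration, the following commands show one way to export the cluster's public SSH key and authorize it on an additional host before that host joins the cluster; the host name host02 is an assumption.

Example

# ceph cephadm get-pub-key > ~/ceph.pub
# ssh-copy-id -f -i ~/ceph.pub root@host02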

Reference

For more information, see How cephadm works.

SSL Termination
The Ceph Object Gateway may be deployed in conjunction with HAProxy and keepalived for load balancing and failover. Earlier
versions of Civetweb do not support SSL and later versions support SSL with some performance limitations.

You can configure the Beast front-end web server to use the OpenSSL library to provide Transport Layer Security (TLS).

When using HAProxy and keepalived to terminate SSL connections, the HAProxy and keepalived components use encryption
keys.

When using HAProxy and keepalived to terminate SSL, the connection between the load balancer and the Ceph Object Gateway is
NOT encrypted.
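
As a minimal sketch, Beast can also terminate TLS itself when a PEM-formatted certificate and key are available to the gateway; the configuration section name and file paths below are illustrative assumptions, not a definitive procedure.

Example

# ceph config set client.rgw rgw_frontends "beast ssl_port=443 ssl_certificate=/etc/pki/ceph/rgw.crt ssl_private_key=/etc/pki/ceph/rgw.key"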

Reference

For more information, see Configuring SSL for Beast and High availability service.

Messenger v2 protocol
The second version of Ceph’s on-wire protocol, msgr2, has the following features:

A secure mode encrypting all data moving through the network.

Encapsulation improvement of authentication payloads, enabling future integration of new authentication modes.

Improvements to feature advertisement and negotiation.

The Ceph daemons bind to multiple ports, allowing both legacy v1-compatible and new v2-compatible Ceph clients to connect to
the same storage cluster. Ceph clients or other Ceph daemons connecting to the Ceph Monitor daemon use the v2 protocol
first, if possible; otherwise, the legacy v1 protocol is used. By default, both messenger protocols, v1 and v2, are
enabled. The new v2 port is 3300, and the legacy v1 port is 6789, by default.

The messenger v2 protocol has two configuration options that control whether the v1 or the v2 protocol is used:

ms_bind_msgr1 - This option controls whether a daemon binds to a port speaking the v1 protocol; it is true by default.

ms_bind_msgr2 - This option controls whether a daemon binds to a port speaking the v2 protocol; it is true by default.

Similarly, two options control whether IPv4 and IPv6 addresses are used:

ms_bind_ipv4 - This option controls whether a daemon binds to an IPv4 address; it is true by default.

ms_bind_ipv6 - This option controls whether a daemon binds to an IPv6 address; it is true by default.

NOTE: The ability to bind to multiple ports has paved the way for dual-stack IPv4 and IPv6 support.

The msgr2 protocol supports two connection modes:

crc

Provides strong initial authentication when a connection is established with cephx.

Provides a crc32c integrity check to protect against bit flips.

Does not provide protection against a malicious man-in-the-middle attack.

Does not prevent an eavesdropper from seeing all post-authentication traffic.

secure

Provides strong initial authentication when a connection is established with cephx.

Provides full encryption of all post-authentication traffic.

Provides a cryptographic integrity check.

The default mode is crc.
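
As a hedged example, the preferred connection modes can be switched from crc to secure with the ms_cluster_mode, ms_service_mode, and ms_client_mode options; which daemons and clients you change should follow your own security policy.

Example

# ceph config set global ms_cluster_mode secure
# ceph config set global ms_service_mode secure
# ceph config set global ms_client_mode secure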

Ceph Object Gateway Encryption

The Ceph Object Gateway also supports encryption with customer-provided keys using its S3 API.

IMPORTANT: To comply with regulatory compliance standards requiring strict encryption in transit, administrators MUST deploy the
Ceph Object Gateway with client-side encryption.

Ceph Block Device Encryption

System administrators integrating Ceph as a backend for OpenStack Platform 13 MUST encrypt Ceph block device volumes using
dm_crypt for RBD Cinder to ensure on-wire encryption within the Ceph storage cluster.

IMPORTANT: To comply with regulatory compliance standards requiring strict encryption in transit, system administrators MUST use
dmcrypt for RBD Cinder to ensure on-wire encryption within the Ceph storage cluster.

Reference

For more information, see Configuring.

Encryption in transit
The secure mode setting for messenger v2 encrypts communication between Ceph daemons and Ceph clients, providing end-to-
end encryption.

You can check whether messenger v2 encryption is in effect with the ceph config dump command, with the netstat -Ip | grep
ceph-osd command, or by verifying that the Ceph daemons are listening on the v2 ports.
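
For example, the following commands are one hedged way to review the messenger settings and confirm that the monitors advertise v2 addresses:

Example

# ceph config dump | grep ms_
# ceph mon dump

Monitor addresses reported with a v2: prefix on port 3300 indicate that the msgr2 protocol is in use.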

Reference

For more information about SSL termination, see SSL Termination.

For more information about S3 API encryption, see S3 server-side encryption.

Encryption at Rest
IBM Storage Ceph supports encryption at rest in a few scenarios:

1. Ceph Storage Cluster: The Ceph Storage Cluster supports Linux Unified Key Setup or LUKS encryption of Ceph OSDs and their
corresponding journals, write-ahead logs, and metadata databases. In this scenario, Ceph will encrypt all data at rest
irrespective of whether the client is a Ceph Block Device, Ceph Filesystem, or a custom application built on librados.

2. Ceph Object Gateway: The Ceph storage cluster supports encryption of client objects. Additionally, the data transmitted
between the Ceph Object Gateway and the Ceph Storage Cluster is in encrypted form.

Ceph Storage Cluster Encryption

The Ceph storage cluster supports encrypting data stored in Ceph OSDs. IBM Storage Ceph can encrypt logical volumes with lvm by
specifying dmcrypt; that is, lvm, invoked by ceph-volume, encrypts an OSD’s logical volume, not its physical volume. It can
encrypt non-LVM devices like partitions using the same OSD key. Encrypting logical volumes allows for more configuration flexibility.

Ceph uses LUKS v1 rather than LUKS v2, because LUKS v1 has the broadest support among Linux distributions.

When creating an OSD, lvm will generate a secret key and pass the key to the Ceph Monitors securely in a JSON payload via stdin.
The attribute name for the encryption key is dmcrypt_key.

IMPORTANT: System administrators must explicitly enable encryption.

By default, Ceph does not encrypt data stored in Ceph OSDs. System administrators must enable dmcrypt to encrypt data stored in
Ceph OSDs. When using a Ceph Orchestrator service specification file for adding Ceph OSDs to the storage cluster, set the following
option in the file to encrypt Ceph OSDs:

Example

...
encrypted: true
...

NOTE: LUKS and dmcrypt only address encryption for data at rest, not encryption for data in transit.
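
For context, a complete but illustrative OSD service specification with encryption enabled might look like the following; the service_id, placement, and device selection are assumptions that you would adapt before applying the file with ceph orch apply -i osd_spec.yaml.

Example

service_type: osd
service_id: encrypted_osds
placement:
  host_pattern: '*'
spec:
  data_devices:
    all: true
  encrypted: true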

Ceph Object Gateway Encryption

The Ceph Object Gateway supports encryption with customer-provided keys using its S3 API. When using customer-provided keys,
the S3 client passes an encryption key along with each request to read or write encrypted data. It is the customer’s responsibility to
manage those keys. Customers must remember which key the Ceph Object Gateway used to encrypt each object.
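
As a hedged example using the AWS CLI, an S3 client supplies its own key with each request; the endpoint, bucket, and key file below are illustrative, and the same key options must be supplied again to read the object back.

Example

openssl rand -out sse-c.key 32
aws --endpoint-url http://rgw.example.com:8080 s3 cp confidential.doc s3://mybucket/confidential.doc \
    --sse-c AES256 --sse-c-key fileb://sse-c.key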

Reference

For more information, see S3 API server-side encryption.

Identity and Access Management


IBM Storage Ceph provides identity and access management for:

Ceph Storage Cluster User Access


Ceph Object Gateway User Access

Ceph Object Gateway LDAP or AD authentication
Ceph Object Gateway OpenStack Keystone authentication

Ceph Storage Cluster User Access


To identify users and protect against man-in-the-middle attacks, Ceph provides its cephx authentication system to authenticate
users and daemons. For more information about cephx, see Ceph user management.

IMPORTANT: The cephx protocol DOES NOT address data encryption in transport or encryption at rest.

Cephx uses shared secret keys for authentication, meaning both the client and the monitor cluster have a copy of the client’s secret
key. The authentication protocol is such that both parties are able to prove to each other they have a copy of the key without actually
revealing it. This provides mutual authentication, which means the cluster is sure the user possesses the secret key, and the user is
sure that the cluster has a copy of the secret key.

In the figure below, users are either individuals or system actors such as applications, which use Ceph clients to interact with the
IBM Storage Ceph cluster daemons.

Figure 1. Users and system actors accessing the Ceph cluster through Ceph clients

Ceph runs with authentication and authorization enabled by default. Ceph clients may specify a user name and a keyring containing
the secret key of the specified user, usually by using the command line. If the user and keyring are not provided as arguments, Ceph
will use the client.admin administrative user as the default. If a keyring is not specified, Ceph will look for a keyring by using the
keyring setting in the Ceph configuration.

IMPORTANT: To harden a Ceph cluster, keyrings SHOULD ONLY have read and write permissions for the current user and root. The
keyring containing the client.admin administrative user key must be restricted to the root user.
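
As a hedged example of least privilege, the following creates a client that can only use RBD images in a single pool and restricts the resulting keyring file; the user and pool names are illustrative.

Example

# ceph auth get-or-create client.rbd.vms mon 'profile rbd' osd 'profile rbd pool=vms' -o /etc/ceph/ceph.client.rbd.vms.keyring
# chmod 600 /etc/ceph/ceph.client.rbd.vms.keyring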

For details on configuring the IBM Storage Ceph cluster to use authentication, see the IBM Storage Ceph 5 Configuration Guide,
specifically the Ceph authentication configuration section.

Ceph Object Gateway User Access


The Ceph Object Gateway provides a RESTful application programming interface (API) service with its own user management that
authenticates and authorizes users to access S3 and Swift APIs containing user data. Authentication consists of:

S3 User: An access key and secret for a user of the S3 API.

Swift User: An access key and secret for a user of the Swift API. The Swift user is a subuser of an S3 user. Deleting the S3
parent user will delete the Swift user.

Administrative User: An access key and secret for a user of the administrative API. Administrative users should be created
sparingly, as the administrative user will be able to access the Ceph Admin API and execute its functions, such as creating
users, and giving them permissions to access buckets or containers and their objects among other things.

The Ceph Object Gateway stores all user authentication information in Ceph Storage cluster pools. Additional information may be
stored about users including names, email addresses, quotas, and usage.
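
For example, the following commands create an S3 user and a Swift subuser under it; the uid, display name, and email address are illustrative.

Example

# radosgw-admin user create --uid=janedoe --display-name="Jane Doe" --email=jane@example.com
# radosgw-admin subuser create --uid=janedoe --subuser=janedoe:swift --access=full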

Reference

For more information, see User Management and Creating an Administrative User.

Ceph Object Gateway LDAP or AD authentication


IBM Storage Ceph supports Lightweight Directory Access Protocol (LDAP) servers for authenticating Ceph Object Gateway users.
When configured to use LDAP or Active Directory (AD), Ceph Object Gateway defers to an LDAP server to authenticate users of the
Ceph Object Gateway.

Ceph Object Gateway controls whether to use LDAP. However, once configured, it is the LDAP server that is responsible for
authenticating users.

To secure communications between the Ceph Object Gateway and the LDAP server, IBM recommends deploying configurations with
LDAP Secure or LDAPS.

IMPORTANT: When using LDAP, ensure that access to the rgw_ldap_secret = _PATH_TO_SECRET_FILE_ secret file is secure.
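
A minimal sketch of the relevant settings, assuming an LDAPS server at ldaps://ldap.example.com and an existing bind account, might look like the following; every value shown is a placeholder.

Example

# ceph config set client.rgw rgw_s3_auth_use_ldap true
# ceph config set client.rgw rgw_ldap_uri ldaps://ldap.example.com:636
# ceph config set client.rgw rgw_ldap_binddn "uid=ceph,cn=users,dc=example,dc=com"
# ceph config set client.rgw rgw_ldap_searchdn "cn=users,dc=example,dc=com"
# ceph config set client.rgw rgw_ldap_dnattr uid
# ceph config set client.rgw rgw_ldap_secret /etc/bindpass
# chmod 600 /etc/bindpass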

Reference

For more information, see Configure LDAP and Ceph Object Gateway and Configure Active Directory and Ceph Object Gateway.

Ceph Object Gateway OpenStack Keystone authentication


IBM Storage Ceph supports using OpenStack Keystone to authenticate Ceph Object Gateway Swift API users. The Ceph Object
Gateway can accept a Keystone token, authenticate the user and create a corresponding Ceph Object Gateway user. When Keystone
validates a token, the Ceph Object Gateway considers the user authenticated.

Ceph Object Gateway controls whether to use OpenStack Keystone for authentication. However, once configured, it is the OpenStack
Keystone service that is responsible for authenticating users.

Configuring the Ceph Object Gateway to work with Keystone requires converting the OpenSSL certificates that Keystone uses for
creating the requests to the nss db format.
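
A hedged sketch of the core Keystone settings follows; the endpoint, credentials, and accepted roles are placeholders.

Example

# ceph config set client.rgw rgw_keystone_url https://keystone.example.com:5000
# ceph config set client.rgw rgw_keystone_api_version 3
# ceph config set client.rgw rgw_keystone_admin_user rgwuser
# ceph config set client.rgw rgw_keystone_admin_password SECRET
# ceph config set client.rgw rgw_keystone_admin_domain default
# ceph config set client.rgw rgw_keystone_admin_project service
# ceph config set client.rgw rgw_keystone_accepted_roles "member,admin"
# ceph config set client.rgw rgw_s3_auth_use_keystone true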

Reference

For more information, see The Ceph Object Gateway and OpenStack Keystone.

Infrastructure Security
The scope of this guide is IBM Storage Ceph. However, a proper IBM Storage Ceph security plan requires consideration of the
following prerequisites.

Prerequisites

Red Hat Enterprise Linux 8 Security Hardening Guide

Red Hat Enterprise Linux 8 Using SELinux Guide

Administration
Network Communication
Hardening the Network Service
Reporting
Auditing Administrator Actions

Administration
Administering an IBM Storage Ceph cluster involves using command line tools. The CLI tools require an administrator key for
administrator access privileges to the cluster. By default, Ceph stores the administrator key in the /etc/ceph directory. The default
file name is ceph.client.admin.keyring. Take steps to secure the keyring so that only a user with administrative privileges to
the cluster may access the keyring.
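
For example, one way to restrict the administrator keyring is:

Example

# chown root:root /etc/ceph/ceph.client.admin.keyring
# chmod 600 /etc/ceph/ceph.client.admin.keyring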

Network Communication
IBM Storage Ceph provides two networks:

A public network.

A cluster network.

All Ceph daemons and Ceph clients require access to the public network, which is part of the storage access security zone. By
contrast, ONLY the OSD daemons require access to the cluster network, which is part of the Ceph cluster security zone.

Figure 1. Network architecture

The Ceph configuration contains public_network and cluster_network settings. For hardening purposes, specify the IP
address and the netmask using CIDR notation. Specify multiple comma-delimited IP address and netmask entries if the cluster will
have multiple subnets.

public_network = <public-network/netmask>[,<public-network/netmask>]

cluster_network = <cluster-network/netmask>[,<cluster-network/netmask>]
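
For instance, a hedged example using documentation subnets (replace them with your own ranges) would be:

public_network = 192.0.2.0/24
cluster_network = 198.51.100.0/24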

For more information, see Ceph network configuration.

Hardening the Network Service


System administrators deploy IBM Storage Ceph clusters on Red Hat Enterprise Linux 8 Server. SELinux is on by default and the
firewall blocks all inbound traffic except for the SSH service port 22; however, you MUST ensure that this is the case so that no other
unauthorized ports are open or unnecessary services are enabled.

On each server node, execute the following:

1. Start the firewalld service, enable it to run on boot, and ensure that it is running:

# systemctl enable firewalld
# systemctl start firewalld
# systemctl status firewalld

2. Take an inventory of all open ports.

# firewall-cmd --list-all

On a new installation, the sources: section should be blank, indicating that no ports have been opened specifically. The
services section should list ssh and dhcpv6-client, indicating that the SSH service (port 22) and the DHCPv6 client are enabled.

sources:
services: ssh dhcpv6-client

3. Ensure SELinux is running and Enforcing.

# getenforce
Enforcing

If SELinux is Permissive, set it to Enforcing.

# setenforce 1

If SELinux is not running, enable it. For more information, see Red Hat Enterprise Linux 8 Using SELinux.

Each Ceph daemon uses one or more ports to communicate with other daemons in the IBM Storage Ceph cluster. In some cases, you
may change the default port settings. Administrators typically change the default port only for the Ceph Object Gateway
(ceph-radosgw) daemon.

Table 1. Ceph Ports


TCP/UDP Port Daemon Configuration Option
6789, 3300 ceph-mon N/A
6800-7300 ceph-osd ms_bind_port_min to ms_bind_port_max
6800-7300 ceph-mgr ms_bind_port_min to ms_bind_port_max
6800 ceph-mds N/A
8080 ceph-radosgw rgw_frontends
The Ceph Storage Cluster daemons include ceph-mon, ceph-mgr, and ceph-osd. These daemons and their hosts comprise the
Ceph cluster security zone, which should use its own subnet for hardening purposes.

The Ceph clients include ceph-radosgw, ceph-mds, ceph-fuse, libcephfs, rbd, librbd, and librados. These daemons and
their hosts comprise the storage access security zone, which should use its own subnet for hardening purposes.

On the Ceph Storage Cluster zone’s hosts, consider enabling only hosts running Ceph clients to connect to the Ceph Storage Cluster
daemons. For example:

firewall-cmd --zone=<zone-name> --add-rich-rule="rule family="ipv4" \
source address="<ip-address>/<netmask>" port protocol="tcp" \
port="<port-number>" accept"

Replace <zone-name> with the zone name, <ip-address> with the IP address, <netmask> with the subnet mask in CIDR notation,
and <port-number> with the port number or range. Repeat the process with the --permanent flag so that the changes persist
after reboot. For example:

firewall-cmd --zone=<zone-name> --add-rich-rule="rule family="ipv4" \
source address="<ip-address>/<netmask>" port protocol="tcp" \
port="<port-number>" accept" --permanent

Reporting
IBM Storage Ceph provides basic system monitoring and reporting with the ceph-mgr daemon plug-ins, namely, the RESTful API,
the dashboard, and other plug-ins such as Prometheus and Zabbix. Ceph collects this information using collectd and sockets to
retrieve settings, configuration details, and statistical information.

In addition to default system behavior, system administrators may configure collectd to report on security matters, such as
configuring the IP-Tables or ConnTrack plug-ins to track open ports and connections respectively.

System administrators may also retrieve configuration settings at runtime. For more information, see Viewing the Ceph configuration
at runtime.

Auditing Administrator Actions


An important aspect of system security is to periodically audit administrator actions on the cluster. IBM Storage Ceph stores a history
of administrator actions in the /var/log/ceph/CLUSTER_FSID/ceph.audit.log file. Run the following command on the
monitor host.

Example

[root@host04 ~]# cat /var/log/ceph/6c58dfb8-4342-11ee-a953-fa163e843234/ceph.audit.log

Each entry will contain:

Timestamp: Indicates when the command was executed.

Monitor Address: Identifies the monitor modified.

Client Node: Identifies the client node initiating the change.

Entity: Identifies the user making the change.

Command: Identifies the command executed.

The following is an output of the Ceph audit log:

2023-09-01T10:20:21.445990+0000 mon.host01 (mon.0) 122301 : audit [DBG] from='mgr.14189
10.0.210.22:0/1157748332' entity='mgr.host01.mcadea' cmd=[{"prefix": "config generate-minimal-
conf"}]: dispatch
2023-09-01T10:20:21.446972+0000 mon.host01 (mon.0) 122302 : audit [INF] from='mgr.14189
10.0.210.22:0/1157748332' entity='mgr.host01.mcadea' cmd=[{"prefix": "auth get", "entity":
"client.admin"}]: dispatch
2023-09-01T10:20:21.453790+0000 mon.host01 (mon.0) 122303 : audit [INF] from='mgr.14189
10.0.210.22:0/1157748332' entity='mgr.host01.mcadea'
2023-09-01T10:20:21.457119+0000 mon.host01 (mon.0) 122304 : audit [DBG] from='mgr.14189
10.0.210.22:0/1157748332' entity='mgr.host01.mcadea' cmd=[{"prefix": "osd tree", "states":
["destroyed"], "format": "json"}]: dispatch
2023-09-01T10:20:30.671816+0000 mon.host01 (mon.0) 122305 : audit [DBG] from='mgr.14189
10.0.210.22:0/1157748332' entity='mgr.host01.mcadea' cmd=[{"prefix": "osd blocklist ls", "format":
"json"}]: dispatch

In distributed systems such as Ceph, actions may begin on one instance and get propagated to other nodes in the cluster. When the
action begins, the log indicates dispatch. When the action ends, the log indicates finished.
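
As a simple hedged example, the actions of a particular entity can be extracted from the audit log with standard tools; the cluster FSID shown matches the example above and the entity is illustrative.

Example

# grep "entity='client.admin'" /var/log/ceph/6c58dfb8-4342-11ee-a953-fa163e843234/ceph.audit.log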

Data Retention
IBM Storage Ceph stores user data, but usually in an indirect manner. Customer data retention may involve other applications, such
as the OpenStack Platform.

Ceph Storage Cluster


Ceph Block Device
Ceph Object Gateway

Ceph Storage Cluster


The Ceph Storage Cluster, often referred to as the Reliable Autonomic Distributed Object Store or RADOS, stores data as objects
within pools. In most cases, these objects are the atomic units representing client data, such as Ceph Block Device images, Ceph
Object Gateway objects, or Ceph Filesystem files. However, custom applications built on top of librados may bind to a pool and
store data too.

Cephx controls access to the pools storing object data. However, Ceph Storage Cluster users are typically Ceph clients, not
end users. Consequently, users generally DO NOT have the ability to write, read, or delete objects directly in a Ceph Storage Cluster pool.

Ceph Block Device


The most popular use of IBM Storage Ceph, the Ceph Block Device interface, also referred to as RADOS Block Device or RBD, creates
virtual volumes, images, and compute instances and stores them as a series of objects within pools. Ceph assigns these objects to
placement groups and distributes or places them pseudo-randomly in OSDs throughout the cluster.

Depending upon the application consuming the Ceph Block Device interface, usually OpenStack Platform, users may create, modify,
and delete volumes and images. Ceph handles the create, retrieve, update, and delete operations of each individual object.

Deleting volumes and images destroys the corresponding objects in an unrecoverable manner. However, residual data artifacts may
continue to reside on storage media until overwritten. Data may also remain in backup archives.

Ceph Object Gateway


From a data security and retention perspective, the Ceph Object Gateway interface has some important differences when compared
to the Ceph Block Device and Ceph Filesystem interfaces. The Ceph Object Gateway provides a service to users. The Ceph Object
Gateway may store:

User Authentication Information: User authentication information generally consists of user IDs, user access keys, and user
secrets. It may also comprise a user’s name and email address if provided. Ceph Object Gateway will retain user
authentication data unless the user is explicitly deleted from the system.

User Data: User data generally comprises user- or administrator-created buckets or containers, and the user-created S3 or
Swift objects contained within them. The Ceph Object Gateway interface creates one or more Ceph Storage cluster objects for
each S3 or Swift object and stores the corresponding Ceph Storage cluster objects within a data pool. Ceph assigns the Ceph
Storage cluster objects to placement groups and distributes or places them pseudo-randomly in OSDs throughout the cluster.
The Ceph Object Gateway may also store an index of the objects contained within a bucket or container to enable services such as
listing the contents of an S3 bucket or Swift container. Additionally, when implementing multi-part uploads, the Ceph Object
Gateway may temporarily store partial uploads of S3 or Swift objects.

Users may create, modify, and delete buckets or containers, and the objects contained within them in a Ceph Object Gateway.
Ceph handles the create, retrieve, update, and delete operations of each individual Ceph Storage cluster object representing
the S3 or Swift object.

Deleting S3 or Swift objects destroys the corresponding Ceph Storage cluster objects in an unrecoverable manner. However,
residual data artifacts may continue to reside on storage media until overwritten. Data may also remain in backup archives.

Logging: Ceph Object Gateway also stores logs of user operations that the user intends to accomplish and operations that
have been executed. This data provides traceability about who created, modified or deleted a bucket or container, or an S3 or
Swift object residing in an S3 bucket or Swift container. When users delete their data, the logging information is not affected
and will remain in storage until deleted by a system administrator or removed automatically by expiration policy.

Bucket Lifecycle

Ceph Object Gateway also supports bucket lifecycle features, including object expiration. Data retention regulations like the General
Data Protection Regulation may require administrators to set object expiration policies and disclose them to users among other
compliance factors.

Multi-site

Ceph Object Gateway is often deployed in a multi-site context whereby a user stores an object at one site and the Ceph Object
Gateway creates a replica of the object in another cluster possibly at another geographic location. For example, if a primary cluster
fails, a secondary cluster may resume operations. In another example, a secondary cluster may be in a different geographic location,
such as an edge network or content-delivery network such that a client may access the closest cluster to improve response time,
throughput, and other performance characteristics. In multi-site scenarios, administrators must ensure that each site has
implemented security measures. Additionally, if geographic distribution of data would occur in a multi-site scenario, administrators
must be aware of any regulatory implications when the data crosses political boundaries.

Federal Information Processing Standard (FIPS)


IBM Storage Ceph uses FIPS validated cryptography modules when run on Red Hat Enterprise Linux 8.4.

Enable FIPS mode on Red Hat Enterprise Linux either during system installation or after it.

For container deployments, follow the instructions in the Red Hat Enterprise Linux 8 Security Hardening Guide.
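
For example, on an already installed Red Hat Enterprise Linux host, FIPS mode can be enabled and verified with the following commands (a reboot is required); this is a sketch, and the Security Hardening Guide remains the authoritative procedure.

Example

# fips-mode-setup --enable
# reboot
# fips-mode-setup --check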

Reference

For the latest information on FIPS validation, refer to the US Government Standards.

Summary
This document has provided only a general introduction to security for IBM Storage Ceph. Contact the IBM Storage Ceph consulting
team for additional help.

Planning
Planning involves considering the supported compatibility, physical configuration, and various storage strategy prerequisites before
working with IBM Storage Ceph.

Compatibility
Hardware
Storage Strategies

Compatibility
Understand the compatibility of IBM Storage Ceph versions with other products.

Compatibility Matrix for IBM Storage Ceph 5.3

Compatibility Matrix for IBM Storage Ceph 5.3


The following tables list products and their versions compatible with IBM Storage Ceph 5.3.

Host operating system:

Red Hat Enterprise Linux: versions 9.2, 9.1, 9.0, 8.8, 8.7, 8.6, 8.5, and EUS 8.4. Standard lifecycle 9.1 is included in the product (recommended); Red Hat Enterprise Linux EUS is optional.

IMPORTANT: All nodes in the cluster and their clients must use the supported OS version(s) to ensure that the version of the ceph
package is the same on all nodes. Using different versions of the ceph package is not supported.

IMPORTANT: IBM no longer supports using Ubuntu as a host operating system to deploy IBM Storage Ceph.

Products:

Ansible: Supported in a limited capacity, for upgrade and conversion to Cephadm and for other minimal playbooks.
Red Hat OpenShift 3.x: RBD, Cinder, and CephFS drivers are supported.
Red Hat OpenShift Data Foundation: See the Red Hat OpenShift Data Foundation Supportability and Interoperability Checker for detailed external mode version compatibility.
Red Hat OpenStack Platform 16.2 and 17.0: Supported through both external and director-deployed configurations.
Red Hat Satellite 6.x: Only registering with the Content Delivery Network (CDN) is supported. Registering with Red Hat Network (RHN) is deprecated and not supported.

Client connectors:

S3A 2.8.x, 3.2.x, and trunk.

IBM Storage Ceph as a backup target:

CommVault Cloud Data Management v11.
IBM Spectrum Protect Plus 10.1.5.
IBM Spectrum Protect server 8.1.8.
NetApp AltaVault 4.3.2 and 4.4.
Rubrik Cloud Data Management (CDM) 3.2 onwards.
Trilio, TrilioVault 3.0: S3 target.
Veeam (object storage), Veeam Availability Suite 9.5 Update 4: Supported on IBM Storage Ceph object storage with the S3 protocol.
Veritas NetBackup for Symantec OpenStorage (OST) cloud backup 7.7 and 8.0.

Independent software vendors:

IBM Spectrum Discover 2.0.3.
WekaIO 3.12.2.

Hardware
Get a high level guidance on selecting hardware for use with IBM Storage Ceph.

Executive summary
General principles for selecting hardware
Optimize workload performance domains
Server and rack solutions
Minimum hardware recommendations for containerized Ceph
Recommended minimum hardware requirements for the IBM Storage Ceph Dashboard

Executive summary
Many hardware vendors now offer both Ceph-optimized servers and rack-level solutions designed for distinct workload profiles. To
simplify the hardware selection process and reduce risk for organizations, IBM has worked with multiple storage server vendors to
test and evaluate specific cluster options for different cluster sizes and workload profiles. IBM’s exacting methodology combines
performance testing with proven guidance for a broad range of cluster capabilities and sizes.

With appropriate storage servers and rack-level solutions, IBM Storage Ceph can provide storage pools serving a variety of
workloads—from throughput-sensitive and cost and capacity-focused workloads to emerging IOPS-intensive workloads.

IBM Storage Ceph significantly lowers the cost of storing enterprise data and helps organizations manage exponential data growth.
The software is a robust and modern petabyte-scale storage platform for public or private cloud deployments. IBM Storage Ceph
offers mature interfaces for enterprise block and object storage, making it an optimal solution for active archive, rich media, and
cloud infrastructure workloads characterized by tenant-agnostic OpenStack® environments [1]. Delivered as a unified, software-
defined, scale-out storage platform, IBM Storage Ceph lets businesses focus on improving application innovation and availability by
offering capabilities such as:

Scaling to hundreds of petabytes [2].

No single point of failure in the cluster.

Lower capital expenses (CapEx) by running on commodity server hardware.

Lower operational expenses (OpEx) with self-managing and self-healing properties.

IBM Storage Ceph can run on myriad industry-standard hardware configurations to satisfy diverse needs. To simplify and accelerate
the cluster design process, IBM conducts extensive performance and suitability testing with participating hardware vendors. This
testing allows evaluation of selected hardware under load and generates essential performance and sizing data for diverse
workloads—ultimately simplifying Ceph storage cluster hardware selection. As discussed in this guide, multiple hardware vendors
now provide server and rack-level solutions optimized for IBM Storage Ceph deployments with IOPS-, throughput-, and cost and
capacity-optimized solutions as available options.

Software-defined storage presents many advantages to organizations seeking scale-out solutions to meet demanding applications
and escalating storage needs. With a proven methodology and extensive testing performed with multiple vendors, IBM simplifies the
process of selecting hardware to meet the demands of any environment. Importantly, the guidelines and example systems listed in
this document are not a substitute for quantifying the impact of production workloads on sample systems.

[1] Ceph is and has been the leading storage for OpenStack according to several semi-annual OpenStack user surveys.

[2] See Yahoo Cloud Object Store - Object Storage at Exabyte Scale for details.


General principles for selecting hardware


As a storage administrator, you must select the appropriate hardware for running a production IBM Storage Ceph cluster. When
selecting hardware for IBM Storage Ceph, review the following general principles. These principles will help you save time and
money, avoid common mistakes, and achieve a more effective solution.

Identify performance use case


Consider storage density
Identical hardware configuration
Network considerations for IBM Storage Ceph
Avoid using RAID solutions
Summary of common mistakes when selecting hardware
Reference

Prerequisites

A planned use for IBM Storage Ceph.

Advanced-level Linux system administration, with Red Hat Enterprise Linux certification.

Storage administrator with Ceph Certification.

Identify performance use case


One of the most important steps in a successful Ceph deployment is identifying a price-to-performance profile suitable for the
cluster’s use case and workload. It is important to choose the right hardware for the use case. For example, choosing IOPS-
optimized hardware for a cloud storage application increases hardware costs unnecessarily, whereas choosing capacity-optimized
hardware for its more attractive price point in an IOPS-intensive workload will likely lead to unhappy users complaining about slow
performance.

The primary use cases for Ceph are:

IOPS optimized: IOPS optimized deployments are suitable for cloud computing operations, such as running MySQL or
MariaDB instances as virtual machines on OpenStack. IOPS optimized deployments require higher performance storage such
as 15k RPM SAS drives and separate SSD journals to handle frequent write operations. Some high IOPS scenarios use all flash
storage to improve IOPS and total throughput.

Throughput optimized: Throughput-optimized deployments are suitable for serving up significant amounts of data, such as
graphic, audio and video content. Throughput-optimized deployments require networking hardware, controllers and hard disk
drives with acceptable total throughput characteristics. In cases where write performance is a requirement, SSD journals will
substantially improve write performance.

Capacity optimized: Capacity-optimized deployments are suitable for storing significant amounts of data as inexpensively as
possible. Capacity-optimized deployments typically trade performance for a more attractive price point. For example,
capacity-optimized deployments often use slower and less expensive SATA drives and co-locate journals rather than using
SSDs for journaling.

This document provides examples of IBM tested hardware suitable for these use cases.

Consider storage density
Hardware planning should include distributing Ceph daemons and other processes that use Ceph across many hosts to maintain high
availability in the event of hardware faults. Balance storage density considerations with the need to rebalance the cluster in the event
of hardware faults. A common hardware selection mistake is to use very high storage density in small clusters, which can overload
networking during backfill and recovery operations.

Identical hardware configuration


Create pools and define CRUSH hierarchies such that the OSD hardware within the pool is identical.

Same controller.

Same drive size.

Same RPMs.

Same seek times.

Same I/O.

Same network throughput.

Same journal configuration.

Using the same hardware within a pool provides a consistent performance profile, simplifies provisioning and streamlines
troubleshooting.

Network considerations for IBM Storage Ceph


An important aspect of a cloud storage solution is that storage clusters can run out of IOPS due to network latency and other factors.
Also, the storage cluster can run out of throughput due to bandwidth constraints long before it runs out of storage
capacity. This means that the network hardware configuration must support the chosen workloads to meet price versus performance
requirements.

Storage administrators prefer that a storage cluster recovers as quickly as possible. Carefully consider bandwidth requirements for
the storage cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-
cluster traffic. Also consider that network performance is increasingly important when considering the use of Solid State Disks (SSD),
flash, NVMe, and other high performing storage devices.

Ceph supports a public network and a storage cluster network. The public network handles client traffic and communication with
Ceph Monitors. The storage cluster network handles Ceph OSD heartbeats, replication, backfilling, and recovery traffic. At a
minimum, a single 10 GB Ethernet link should be used for storage hardware, and you can add additional 10 GB Ethernet links for
connectivity and throughput.

IMPORTANT: IBM recommends allocating bandwidth to the storage cluster network, such that it is a multiple of the public network
using the osd_pool_default_size as the basis for the multiple on replicated pools. IBM also recommends running the public and
storage cluster networks on separate network cards.

IMPORTANT: IBM recommends using 10 GB Ethernet for IBM Storage Ceph deployments in production. A 1 GB Ethernet network is
not suitable for production storage clusters.

In the case of a drive failure, replicating 1 TB of data across a 1 GB Ethernet network takes 3 hours, and 3 TB takes 9 hours. Using 3
TB is the typical drive configuration. By contrast, with a 10 GB Ethernet network, the replication times would be 20 minutes and 1
hour. Remember that when a Ceph OSD fails, the storage cluster will recover by replicating the data it contained to other Ceph OSDs
within the pool.

The failure of a larger domain such as a rack means that the storage cluster utilizes considerably more bandwidth. When building a
storage cluster consisting of multiple racks, which is common for large storage implementations, consider utilizing as much network
bandwidth between switches in a "fat tree" design for optimal performance. A typical 10 GB Ethernet switch has 48 10 GB ports and
four 40 GB ports. Use the 40 GB ports on the spine for maximum throughput. Alternatively, consider aggregating unused 10 GB ports
with QSFP+ and SFP+ cables into more 40 GB ports to connect to other rack and spine routers. Also, consider using LACP mode 4 to
bond network interfaces. Additionally, use jumbo frames, with a maximum transmission unit (MTU) of 9000, especially on the
backend or cluster network.

Before installing and testing an IBM Storage Ceph cluster, verify the network throughput. Most performance-related problems in Ceph
usually begin with a networking issue. Simple network issues like a kinked or bent Cat-6 cable could result in degraded bandwidth.
Use a minimum of 10 GB Ethernet for the front side network. For large clusters, consider using 40 GB Ethernet for the backend or
cluster network.

IMPORTANT: For network optimization, IBM recommends using jumbo frames for a better CPU per bandwidth ratio, and a non-
blocking network switch back-plane. IBM Storage Ceph requires the same MTU value throughout all networking devices in the
communication path, end-to-end for both public and cluster networks. Verify that the MTU value is the same on all hosts and
networking equipment in the environment before using an IBM Storage Ceph cluster in production.
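
As a hedged example, jumbo frames can be enabled persistently with NetworkManager; the connection name ens192 is an assumption.

Example

# nmcli connection modify ens192 802-3-ethernet.mtu 9000
# nmcli connection up ens192
# ip link show ens192 | grep mtu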

Avoid using RAID solutions


Ceph can replicate or erasure code objects. RAID duplicates this functionality on the block level and reduces available capacity.
Consequently, RAID is an unnecessary expense. Additionally, a degraded RAID will have a negative impact on performance.

IMPORTANT: IBM recommends that each hard drive be exported separately from the RAID controller as a single volume with write-
back caching enabled.

This requires a battery-backed, or a non-volatile flash memory device on the storage controller. It is important to make sure the
battery is working, as most controllers will disable write-back caching if the memory on the controller can be lost as a result of a
power failure. Periodically, check the batteries and replace them if necessary, as they do degrade over time. See the storage
controller vendor’s documentation for details. Typically, the storage controller vendor provides storage management utilities to
monitor and adjust the storage controller configuration without any downtime.

Using Just a Bunch of Drives (JBOD) in independent drive mode with Ceph is supported when using all Solid State Drives (SSDs), or
for configurations with high numbers of drives per controller. For example, 60 drives attached to one controller. In this scenario, the
write-back caching can become a source of I/O contention. Since JBOD disables write-back caching, it is ideal in this scenario. One
advantage of using JBOD mode is the ease of adding or replacing drives and then exposing the drive to the operating system
immediately after it is physically plugged in.

Summary of common mistakes when selecting hardware



Repurposing underpowered legacy hardware for use with Ceph.

Using dissimilar hardware in the same pool.

Using 1Gbps networks instead of 10Gbps or greater.

Neglecting to setup both public and cluster networks.

Using RAID instead of JBOD.

Selecting drives on a price basis without regard to performance or throughput.

Journaling on OSD data drives when the use case calls for an SSD journal.

Having a disk controller with insufficient throughput characteristics.

Use the examples in this document of IBM tested configurations for different workloads to avoid some of the foregoing hardware
selection mistakes.

Reference
See IBM Storage Ceph Supported configurations on the support portal.

Optimize workload performance domains


One of the key benefits of Ceph storage is the ability to support different types of workloads within the same cluster using Ceph
performance domains. Dramatically different hardware configurations can be associated with each performance domain. Ceph
system administrators can deploy storage pools on the appropriate performance domain, providing applications with storage tailored
to specific performance and cost profiles. Selecting appropriately sized and optimized servers for these performance domains is an
essential aspect of designing an IBM Storage Ceph cluster.

The following lists provide the criteria IBM uses to identify optimal IBM Storage Ceph cluster configurations on storage servers.
These categories are provided as general guidelines for hardware purchases and configuration decisions, and can be adjusted to
satisfy unique workload blends. Actual hardware configurations chosen will vary depending on specific workload mix and vendor
capabilities.

IOPS optimized

An IOPS-optimized storage cluster typically has the following properties:

Lowest cost per IOPS.

Highest IOPS per GB.

99th percentile latency consistency.

Typical uses for an IOPS-optimized storage cluster are:

Typically block storage.

3x replication for hard disk drives (HDDs) or 2x replication for solid state drives (SSDs).

MySQL on OpenStack clouds.

Throughput optimized

A throughput-optimized storage cluster typically has the following properties:

Lowest cost per MBps (throughput).

Highest MBps per TB.

Highest MBps per BTU.

Highest MBps per Watt.

97th percentile latency consistency.

Typical uses for a throughput-optimized storage cluster are:

Block or object storage.

3x replication.

Active performance storage for video, audio, and images.

Streaming media.

Cost and capacity optimized

A cost- and capacity-optimized storage cluster typically has the following properties:

Lowest cost per TB.

Lowest BTU per TB.

Lowest Watts required per TB.

Typical uses for a cost- and capacity-optimized storage cluster are:

Typically object storage.

Erasure coding common for maximizing usable capacity.

Object archive.

Video, audio, and image object repositories.

How performance domains work

To the Ceph client interface that reads and writes data, a Ceph storage cluster appears as a simple pool where the client stores data.
However, the storage cluster performs many complex operations in a manner that is completely transparent to the client interface.
Ceph clients and Ceph object storage daemons (Ceph OSDs, or simply OSDs) both use the controlled replication under scalable
hashing (CRUSH) algorithm for storage and retrieval of objects. OSDs run on OSD hosts—the storage servers within the cluster.

A CRUSH map describes a topography of cluster resources, and the map exists both on client nodes as well as Ceph Monitor (MON)
nodes within the cluster. Ceph clients and Ceph OSDs both use the CRUSH map and the CRUSH algorithm. Ceph clients communicate
directly with OSDs, eliminating a centralized object lookup and a potential performance bottleneck. With awareness of the CRUSH
map and communication with their peers, OSDs can handle replication, backfilling, and recovery—allowing for dynamic failure
recovery.

Ceph uses the CRUSH map to implement failure domains. Ceph also uses the CRUSH map to implement performance domains,
which simply take the performance profile of the underlying hardware into consideration. The CRUSH map describes how Ceph
stores data, and it is implemented as a simple hierarchy (acyclic graph) and a ruleset. The CRUSH map can support multiple
hierarchies to separate one type of hardware performance profile from another.

The following examples describe performance domains.

Hard disk drives (HDDs) are typically appropriate for cost- and capacity-focused workloads.

Throughput-sensitive workloads typically use HDDs with Ceph write journals on solid state drives (SSDs).

IOPS-intensive workloads such as MySQL and MariaDB often use SSDs.

All of these performance domains can coexist in a Ceph storage cluster.
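
As a hedged illustration, CRUSH device classes offer a simple way to separate such performance domains; the rule and pool names below are assumptions, and the pool must already exist.

Example

# ceph osd crush rule create-replicated fast-rule default host ssd
# ceph osd crush rule create-replicated capacity-rule default host hdd
# ceph osd pool set iops-pool crush_rule fast-rule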

Server and rack solutions


Hardware vendors have responded to the enthusiasm around Ceph by providing both optimized server-level and rack-level solution
SKUs. Validated through joint testing with IBM, these solutions offer predictable price-to-performance ratios for Ceph deployments,
with a convenient modular approach to expand Ceph storage for specific workloads.

Typical rack-level solutions include:

Network switching: Redundant network switching interconnects the cluster and provides access to clients.

Ceph MON nodes: The Ceph monitor is a datastore for the health of the entire cluster, and contains the cluster log. A
minimum of three monitor nodes are strongly recommended for a cluster quorum in production.

Ceph OSD hosts: Ceph OSD hosts house the storage capacity for the cluster, with one or more OSDs running per individual
storage device. OSD hosts are selected and configured differently depending on both workload optimization and the data
devices installed: HDDs, SSDs, or NVMe SSDs.

IBM Storage Ceph: Many vendors provide a capacity-based subscription for IBM Storage Ceph bundled with both server and
rack-level solution SKUs.

NOTE: Contact IBM Support for any additional assistance.

IOPS-optimized solutions

With the growing use of flash storage, organizations increasingly host IOPS-intensive workloads on Ceph storage clusters to let them
emulate high-performance public cloud solutions with private cloud storage. These workloads commonly involve structured data
from MySQL-, MariaDB-, or PostgreSQL-based applications.

Typical servers include the following elements:

CPU: 10 cores per NVMe SSD, assuming a 2 GHz CPU.

RAM: 16 GB baseline, plus 5 GB per OSD.

Networking: 10 Gigabit Ethernet (GbE) per 2 OSDs.

OSD media: High-performance, high-endurance enterprise NVMe SSDs.

OSDs: Two per NVMe SSD.

Bluestore WAL/DB: High-performance, high-endurance enterprise NVMe SSD, co-located with OSDs.

Controller: Native PCIe bus.

NOTE: For non-NVMe SSDs, use two CPU cores per SSD OSD.

Table 1. Solutions SKUs for IOPS-optimized Ceph Workloads, by cluster size.

Vendor          Small (250TB)       Medium (1PB)  Large (2PB+)
SuperMicro [1]  SYS-5038MR-OSD006P  N/A           N/A

[1] See Supermicro® Total Solution for Ceph

Throughput-optimized Solutions

Throughput-optimized Ceph solutions are usually centered around semi-structured or unstructured data. Large-block sequential I/O
is typical.

Typical server elements include:

CPU: 0.5 cores per HDD, assuming a 2 GHz CPU.

RAM: 16 GB baseline, plus 5 GB per OSD.

Networking: 10 GbE per 12 OSDs each for client- and cluster-facing networks.

OSD media: 7,200 RPM enterprise HDDs.

OSDs: One per HDD.

Bluestore WAL/DB: High-performance, high-endurance enterprise NVMe SSD, co-located with OSDs.

Host bus adapter (HBA): Just a bunch of disks (JBOD).

Several vendors provide pre-configured server and rack-level solutions for throughput-optimized Ceph workloads. IBM has
conducted extensive testing and evaluation of servers from Supermicro and Quanta Cloud Technologies (QCT).

Table 2. Rack-level SKUs for Ceph OSDs, MONs, and top-of-rack (TOR) switches.

Vendor      Small (250TB)       Medium (1PB)        Large (2PB+)
SuperMicro  SRS-42E112-Ceph-03  SRS-42E136-Ceph-03  SRS-42E136-Ceph-03

Table 3. Individual OSD Servers

Vendor      Small (250TB)      Medium (1PB)      Large (2PB+)
SuperMicro  SSG-6028R-OSD072P  SSG-6048-OSD216P  SSG-6048-OSD216P
QCT [1]     QxStor RCT-200     QxStor RCT-400    QxStor RCT-400

[1] See QCT: QxStor IBM Storage Ceph Edition

Table 4. Additional Servers Configurable for Throughput-optimized Ceph OSD Workloads.

Vendor  Small (250TB)         Medium (1PB)             Large (2PB+)
Dell    PowerEdge R730XD [1]  DSS 7000 [2], twin node  DSS 7000, twin node
Cisco   UCS C240 M4           UCS C3260 [3]            UCS C3260 [4]
Lenovo  System x3650 M5       System x3650 M5          N/A

[1] See Dell PowerEdge R730xd Performance and Sizing Guide for IBM Storage Ceph - A Dell IBM Technical White Paper for details.
[2] See Dell EMC DSS 7000 Performance & Sizing Guide for IBM Storage Ceph for details.
[3] See the IBM Storage Ceph hardware reference architecture for details.
[4] See UCS C3260 for details.

Cost and capacity-optimized solutions

Cost- and capacity-optimized solutions typically focus on higher capacity, or longer archival scenarios. Data can be either semi-
structured or unstructured. Workloads include media archives, big data analytics archives, and machine image backups. Large-block
sequential I/O is typical.

Solutions typically include the following elements:

CPU: 0.5 cores per HDD, assuming a 2 GHz CPU.

RAM: 16 GB baseline, plus 5 GB per OSD.

Networking: 10 GbE per 12 OSDs (each for client- and cluster-facing networks).

OSD media: 7,200 RPM enterprise HDDs.

OSDs: One per HDD.

Bluestore WAL/DB: Co-located on the HDD.

HBA: JBOD.

Supermicro and QCT provide pre-configured server and rack-level solution SKUs for cost- and capacity-focused Ceph workloads.

Table 5. Pre-configured Rack-level SKUs for Cost- and Capacity-optimized Workloads

Vendor       Small (250TB)   Medium (1PB)         Large (2PB+)
SuperMicro   N/A             SRS-42E136-Ceph-03   SRS-42E172-Ceph-03

Table 6. Pre-configured Server-level SKUs for Cost- and Capacity-optimized Workloads

Vendor       Small (250TB)   Medium (1PB)            Large (2PB+)
SuperMicro   N/A             SSG-6048R-OSD216P [1]   SSD-6048R-OSD360P
QCT          N/A             QxStor RCC-400          QxStor RCC-400

[1] See Supermicro’s Total Solution for Ceph

Table 7. Additional Servers Configurable for Cost- and Capacity-optimized Workloads

Vendor   Small (250TB)   Medium (1PB)          Large (2PB+)
Dell     N/A             DSS 7000, twin node   DSS 7000, twin node
Cisco    N/A             UCS C3260             UCS C3260
Lenovo   N/A             System x3650 M5       N/A

For more information, see the following resources:

Red Hat Ceph Storage on Samsung NVMe SSDs

Deploying MySQL Databases on Red Hat Ceph Storage

Intel® Data Center Blocks for Cloud – IBM OpenStack Platform with Red Hat Ceph Storage

Red Hat Ceph Storage on QCT Servers

Red Hat Ceph Storage on Servers with Intel Processors and SSDs

Minimum hardware recommendations for containerized Ceph


Ceph can run on non-proprietary commodity hardware. Small production clusters and development clusters can run on modest
hardware without performance optimization.

The following are the minimum recommended values for each containerized Ceph process:

ceph-osd-container
Processor: 1x AMD64 or Intel 64 CPU CORE per OSD container
RAM: Minimum of 5 GB of RAM per OSD container
OS Disk: 1x OS disk per host
OSD Storage: 1x storage drive per OSD container. Cannot be shared with OS Disk.
block.db: Optional, but IBM recommended, 1x SSD or NVMe or Optane partition or lvm per daemon. Sizing is 4% of block.data for BlueStore for object, file, and mixed workloads, and 1% of block.data for BlueStore for Block Device and OpenStack cinder workloads.
block.wal: Optionally, 1x SSD or NVMe or Optane partition or logical volume per daemon. Use a small size, for example 10 GB, and only if it is faster than the block.db device.
Network: 2x 10 GbE NICs

ceph-mon-container
Processor: 1x AMD64 or Intel 64 CPU CORE per mon-container
RAM: 3 GB per mon-container
Disk Space: 10 GB per mon-container, 50 GB recommended
Monitor Disk: Optionally, 1x SSD disk for Monitor rocksdb data
Network: 2x 1 GbE NICs, 10 GbE recommended

ceph-mgr-container
Processor: 1x AMD64 or Intel 64 CPU CORE per mgr-container
RAM: 3 GB per mgr-container
Network: 2x 1 GbE NICs, 10 GbE recommended

ceph-radosgw-container
Processor: 1x AMD64 or Intel 64 CPU CORE per radosgw-container
RAM: 1 GB per daemon
Disk Space: 5 GB per daemon
Network: 1x 1 GbE NIC

ceph-mds-container
Processor: 1x AMD64 or Intel 64 CPU CORE per mds-container
RAM: 3 GB per mds-container. This number is highly dependent on the configurable MDS cache size. The RAM requirement is typically twice as much as the amount set in the mds_cache_memory_limit configuration setting. Note also that this is the memory for your daemon, not the overall system memory.
Disk Space: 2 GB per mds-container, plus any additional space required for possible debug logging; 20 GB is a good start.
Network: 2x 1 GbE NICs, 10 GbE recommended. Note that this is the same network as the OSD containers. If you have a 10 GbE network on your OSDs, you should use the same on your MDS so that the MDS is not disadvantaged when it comes to latency.



Recommended minimum hardware requirements for the IBM Storage Ceph Dashboard
The IBM Storage Ceph Dashboard has minimum hardware requirements.

Minimum requirements

4 core processor at 2.5 GHz or higher

8 GB RAM

50 GB hard disk drive

1 Gigabit Ethernet network interface

Reference

For more information, see High-level monitoring of a Ceph storage cluster in the Administration Guide.

Storage Strategies
Creating storage strategies for IBM Storage Ceph clusters

This section of the document provides instructions for creating storage strategies, including creating CRUSH hierarchies, estimating
the number of placement groups, determining which type of storage pool to create, and managing pools.

Overview
Crush admin overview
Placement Groups
Pools overview
Erasure code pools overview

Overview
From the perspective of a Ceph client, interacting with the Ceph storage cluster is remarkably simple:

1. Connect to the Cluster

2. Create a Pool I/O Context

This remarkably simple interface is how a Ceph client selects one of the storage strategies you define. Storage strategies are invisible
to the Ceph client in all but storage capacity and performance.
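
As a minimal command-line sketch of this flow, assuming a pool named mypool already exists and the client host has the cluster configuration and an appropriate keyring: the rados command connects to the cluster, and the -p option selects the pool I/O context.

Example

[ceph: root@host01 /]# ceph -s
[ceph: root@host01 /]# rados -p mypool put hosts /etc/hosts
[ceph: root@host01 /]# rados -p mypool get hosts /tmp/hosts.copy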

The diagram below shows the logical data flow starting from the client into the IBM Storage Ceph cluster.

Figure 1. Ceph storage architecture data flow



What are storage strategies?
Configuring storage strategies

What are storage strategies?


A storage strategy is a method of storing data that serves a particular use case. For example, if you need to store volumes and
images for a cloud platform like OpenStack, you might choose to store data on reasonably performant SAS drives with SSD-based
journals. By contrast, if you need to store object data for an S3- or Swift-compliant gateway, you might choose to use something
more economical, like SATA drives. Ceph can accommodate both scenarios in the same Ceph cluster, but you need a means of
providing the SAS/SSD storage strategy to the cloud platform (for example, Glance and Cinder in OpenStack), and a means of
providing SATA storage for your object store.

Storage strategies include the storage media (hard drives, SSDs, and the rest), the CRUSH maps that set up performance and failure
domains for the storage media, the number of placement groups, and the pool interface. Ceph supports multiple storage strategies.
Use cases, cost/benefit performance tradeoffs and data durability are the primary considerations that drive storage strategies.

1. Use Cases: Ceph provides massive storage capacity, and it supports numerous use cases. For example, the Ceph Block Device
client is a leading storage backend for cloud platforms like OpenStack, providing limitless storage for volumes and images
with high performance features like copy-on-write cloning. Likewise, Ceph can provide container-based storage for OpenShift
environments. By contrast, the Ceph Object Gateway client is a leading storage backend for cloud platforms that provides
RESTful S3-compliant and Swift-compliant object storage for objects like audio, bitmap, video and other data.

2. Cost/Benefit of Performance: Faster is better. Bigger is better. High durability is better. However, there is a price for each
superlative quality, and a corresponding cost/benefit trade off. Consider the following use cases from a performance
perspective: SSDs can provide very fast storage for relatively small amounts of data and journaling. Storing a database or
object index might benefit from a pool of very fast SSDs, but prove too expensive for other data. SAS drives with SSD
journaling provide fast performance at an economical price for volumes and images. SATA drives without SSD journaling
provide cheap storage with lower overall performance. When you create a CRUSH hierarchy of OSDs, you need to consider the
use case and an acceptable cost/performance trade off.

3. Durability: In large scale clusters, hardware failure is an expectation, not an exception. However, data loss and service
interruption remain unacceptable. For this reason, data durability is very important. Ceph addresses data durability with
multiple deep copies of an object or with erasure coding and multiple coding chunks. Multiple copies or multiple coding
chunks present an additional cost/benefit tradeoff: it’s cheaper to store fewer copies or coding chunks, but it might lead to the
inability to service write requests in a degraded state. Generally, one object with two additional copies (that is, size = 3) or
two coding chunks might allow a cluster to service writes in a degraded state while the cluster recovers. The CRUSH algorithm
aids this process by ensuring that Ceph stores additional copies or coding chunks in different locations within the cluster. This
ensures that the failure of a single storage device or node doesn’t lead to a loss of all of the copies or coding chunks necessary
to preclude data loss.

You can capture use cases, cost/benefit performance tradeoffs and data durability in a storage strategy and present it to a Ceph
client as a storage pool.

IMPORTANT: Ceph’s object copies or coding chunks make RAID obsolete. Do not use RAID, because Ceph already handles data
durability, a degraded RAID has a negative impact on performance, and recovering data using RAID is substantially slower than using
deep copies or erasure coding chunks.

Configuring storage strategies


Configuring storage strategies is about assigning Ceph OSDs to a CRUSH hierarchy, defining the number of placement groups for a
pool, and creating a pool. The general steps are:

1. Define a Storage Strategy: Storage strategies require you to analyze your use case, cost/benefit performance tradeoffs and
data durability. Then, you create OSDs suitable for that use case. For example, you can create SSD-backed OSDs for a high
performance pool; SAS drive/SSD journal-backed OSDs for high-performance block device volumes and images; or, SATA-
backed OSDs for low cost storage. Ideally, each OSD for a use case should have the same hardware configuration so that you
have a consistent performance profile.

2. Define a CRUSH Hierarchy: Ceph rules select a node, usually the root, in a CRUSH hierarchy, and identify the appropriate
OSDs for storing placement groups and the objects they contain. You must create a CRUSH hierarchy and a CRUSH rule for
your storage strategy. CRUSH hierarchies get assigned directly to a pool by the CRUSH rule setting.

3. Calculate Placement Groups: Ceph shards a pool into placement groups. You do not have to manually set the number of
placement groups for your pool. PG autoscaler sets an appropriate number of placement groups for your pool that remains
within a healthy maximum number of placement groups in the event that you assign multiple pools to the same CRUSH rule.

4. Create a Pool: Finally, you must create a pool and determine whether it uses replicated or erasure-coded storage. You must
set the number of placement groups for the pool, the rule for the pool and the durability, such as size or K+M coding chunks.

Remember, the pool is the Ceph client’s interface to the storage cluster, but the storage strategy is completely transparent to the
Ceph client, except for capacity and performance.
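
The following is a minimal command-line sketch of these steps, assuming OSDs with an ssd device class already exist; the rule name fast-ssd and pool name fast-pool are placeholders.

Example

[ceph: root@host01 /]# ceph osd crush rule create-replicated fast-ssd default host ssd
[ceph: root@host01 /]# ceph osd pool create fast-pool
[ceph: root@host01 /]# ceph osd pool set fast-pool crush_rule fast-ssd
[ceph: root@host01 /]# ceph osd pool set fast-pool size 3

No pg_num value is passed to ceph osd pool create because the PG autoscaler selects the number of placement groups.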

Crush admin overview


The Controlled Replication Under Scalable Hashing (CRUSH) algorithm determines how to store and retrieve data by computing data
storage locations.

Any sufficiently advanced technology is indistinguishable from magic.

— Arthur C. Clarke

CRUSH introduction
CRUSH hierarchy
Ceph OSDs in CRUSH
Device class
CRUSH weights
Primary affinity
CRUSH rules
CRUSH tunables overview
Edit a CRUSH map
CRUSH storage strategies examples

CRUSH introduction
The CRUSH map for your storage cluster describes your device locations within CRUSH hierarchies and a rule for each hierarchy that
determines how Ceph stores data.

The CRUSH map contains at least one hierarchy of nodes and leaves. The nodes of a hierarchy, called "buckets" in Ceph, are any
aggregation of storage locations as defined by their type. For example, rows, racks, chassis, hosts, and devices. Each leaf of the
hierarchy consists essentially of one of the storage devices in the list of storage devices. A leaf is always contained in one node or
"bucket." A CRUSH map also has a list of rules that determine how CRUSH stores and retrieves data.

NOTE: Storage devices are added to the CRUSH map when adding an OSD to the cluster.

The CRUSH algorithm distributes data objects among storage devices according to a per-device weight value, approximating a
uniform probability distribution. CRUSH distributes objects and their replicas or erasure-coding chunks according to the hierarchical
cluster map an administrator defines. The CRUSH map represents the available storage devices and the logical buckets that contain
them for the rule, and by extension each pool that uses the rule.

To map placement groups to OSDs across failure domains or performance domains, a CRUSH map defines a hierarchical list of
bucket types; that is, under types in the generated CRUSH map. The purpose of creating a bucket hierarchy is to segregate the leaf
nodes by their failure domains or performance domains or both. Failure domains include hosts, chassis, racks, power distribution
units, pods, rows, rooms, and data centers. Performance domains include failure domains and OSDs of a particular configuration. For
example, SSDs, SAS drives with SSD journals, SATA drives, and so on. Devices have the notion of a class, such as hdd, ssd and
nvme to more rapidly build CRUSH hierarchies with a class of devices.

With the exception of the leaf nodes representing OSDs, the rest of the hierarchy is arbitrary, and you can define it according to your
own needs if the default types do not suit your requirements. We recommend adapting your CRUSH map bucket types to your
organization’s hardware naming conventions and using instance names that reflect the physical hardware names. Your naming
practice can make it easier to administer the cluster and troubleshoot problems when an OSD or other hardware malfunctions and
the administrator needs remote or physical access to the host or other hardware.

In the following example, the bucket hierarchy has four leaf buckets (osd 1-4), two node buckets (host 1-2) and one rack node
(rack 1).

Figure 1. CRUSH hierarchy



Since leaf nodes reflect storage devices declared under the devices list at the beginning of the CRUSH map, there is no need to
declare them as bucket instances. The second lowest bucket type in the hierarchy usually aggregates the devices; that is, it is usually
the computer containing the storage media, and uses whatever term administrators prefer to describe it, such as "node", "computer",
"server," "host", "machine", and so on. In high density environments, it is increasingly common to see multiple hosts/nodes per card
and per chassis. Make sure to account for card and chassis failure too, for example, the need to pull a card or chassis if a node fails
can result in bringing down numerous hosts/nodes and their OSDs.

When declaring a bucket instance, specify its type, give it a unique name as a string, assign it an optional unique ID expressed as a
negative integer, specify a weight relative to the total capacity or capability of its items, specify the bucket algorithm such as
straw2, and the hash that is usually 0 reflecting hash algorithm rjenkins1. A bucket can have one or more items. The items can
consist of node buckets or leaves. Items can have a weight that reflects the relative weight of the item.

Dynamic data placement


CRUSH failure domain
CRUSH performance domain
Using different device classes

Dynamic data placement


Ceph Clients and Ceph OSDs both use the CRUSH map and the CRUSH algorithm.

Ceph Clients: By distributing CRUSH maps to Ceph clients, CRUSH empowers Ceph clients to communicate with OSDs
directly. This means that Ceph clients avoid a centralized object look-up table that could act as a single point of failure, a
performance bottleneck, a connection limitation at a centralized look-up server and a physical limit to the storage cluster’s
scalability.

Ceph OSDs: By distributing CRUSH maps to Ceph OSDs, Ceph empowers OSDs to handle replication, backfilling and recovery.
This means that the Ceph OSDs handle storage of object replicas (or coding chunks) on behalf of the Ceph client. It also
means that Ceph OSDs know enough about the cluster to re-balance the cluster (backfilling) and recover from failures
dynamically.



CRUSH failure domain
Having multiple object replicas or M erasure coding chunks helps prevent data loss, but it is not sufficient to address high availability.
By reflecting the underlying physical organization of the Ceph Storage Cluster, CRUSH can model—and thereby address—potential
sources of correlated device failures. By encoding the cluster’s topology into the cluster map, CRUSH placement policies can
separate object replicas or erasure coding chunks across different failure domains while still maintaining the desired pseudo-random
distribution.

For example, to address the possibility of concurrent failures, it might be desirable to ensure that data replicas or erasure coding
chunks are on devices using different shelves, racks, power supplies, controllers or physical locations. This helps to prevent data loss
and allows the cluster to operate in a degraded state.

CRUSH performance domain


Ceph can support multiple hierarchies to separate one type of hardware performance profile from another type of hardware
performance profile. For example, CRUSH can create one hierarchy for hard disk drives and another hierarchy for SSDs. Performance
domains—hierarchies that take the performance profile of the underlying hardware into consideration—are increasingly popular due
to the need to support different performance characteristics. Operationally, these are just CRUSH maps with more than one root
type bucket. Use case examples include:

Object Storage: Ceph hosts that serve as an object storage back end for S3 and Swift interfaces might take advantage of less
expensive storage media such as SATA drives that might not be suitable for VMs—reducing the cost per gigabyte for object
storage, while separating more economical storage hosts from higher-performing ones intended for storing volumes and
images on cloud platforms. HTTP tends to be the bottleneck in object storage systems.

Cold Storage: Systems designed for cold storage—infrequently accessed data, or data retrieval with relaxed performance
requirements—might take advantage of less expensive storage media and erasure coding. However, erasure coding might
require a bit of additional RAM and CPU, and thus differ in RAM and CPU requirements from a host used for object storage or
VMs.

SSD-backed Pools: SSDs are expensive, but they provide significant advantages over hard disk drives. SSDs have no seek
time and they provide high total throughput. In addition to using SSDs for journaling, a cluster can support SSD-backed pools.
Common use cases include high performance SSD pools. For example, it is possible to map the .rgw.buckets.index pool
for the Ceph Object Gateway to SSDs instead of SATA drives.

A CRUSH map supports the notion of a device class. Ceph can discover aspects of a storage device and automatically assign a class
such as hdd, ssd or nvme. However, CRUSH is not limited to these defaults. For example, CRUSH hierarchies might also be used to
separate different types of workloads. For example, an SSD might be used for a journal or write-ahead log, a bucket index or for raw
object storage. CRUSH can support different device classes, such as ssd-bucket-index or ssd-object-storage so Ceph does
not use the same storage media for different workloads—making performance more predictable and consistent.

Behind the scenes, Ceph generates a crush root for each device-class. These roots should only be modified by setting or changing
device classes on OSDs. You can view the generated roots using the following command:

Example

[ceph: root@host01 /]# ceph osd crush tree --show-shadow

ID CLASS WEIGHT TYPE NAME
-24 ssd 4.54849 root default~ssd
-19 ssd 0.90970 host ceph01~ssd
8 ssd 0.90970 osd.8
-20 ssd 0.90970 host ceph02~ssd
7 ssd 0.90970 osd.7
-21 ssd 0.90970 host ceph03~ssd
3 ssd 0.90970 osd.3
-22 ssd 0.90970 host ceph04~ssd
5 ssd 0.90970 osd.5
-23 ssd 0.90970 host ceph05~ssd
6 ssd 0.90970 osd.6
-2 hdd 50.94173 root default~hdd
-4 hdd 7.27739 host ceph01~hdd
10 hdd 7.27739 osd.10
-12 hdd 14.55478 host ceph02~hdd
0 hdd 7.27739 osd.0
12 hdd 7.27739 osd.12
-6 hdd 14.55478 host ceph03~hdd
4 hdd 7.27739 osd.4
11 hdd 7.27739 osd.11
-10 hdd 7.27739 host ceph04~hdd
1 hdd 7.27739 osd.1
-8 hdd 7.27739 host ceph05~hdd
2 hdd 7.27739 osd.2
-1 55.49022 root default
-3 8.18709 host ceph01
10 hdd 7.27739 osd.10
8 ssd 0.90970 osd.8
-11 15.46448 host ceph02
0 hdd 7.27739 osd.0
12 hdd 7.27739 osd.12
7 ssd 0.90970 osd.7
-5 15.46448 host ceph03
4 hdd 7.27739 osd.4
11 hdd 7.27739 osd.11
3 ssd 0.90970 osd.3
-9 8.18709 host ceph04
1 hdd 7.27739 osd.1
5 ssd 0.90970 osd.5
-7 8.18709 host ceph05
2 hdd 7.27739 osd.2
6 ssd 0.90970 osd.6

Using different device classes


To create performance domains, use device classes and a single CRUSH hierarchy. Simply add OSDs to the CRUSH hierarchy, then do
the following:

1. Add a class to each device. For example:

Syntax

ceph osd crush set-device-class <class> <osdId> [<osdId>]

Example

ceph osd crush set-device-class hdd osd.0 osd.1 osd.4 osd.5
ceph osd crush set-device-class ssd osd.2 osd.3 osd.6 osd.7

2. Then, create rules to use the devices.

Syntax

ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> <class>

Example

ceph osd crush rule create-replicated cold default host hdd
ceph osd crush rule create-replicated hot default host ssd

3. Finally, set pools to use the rules.

Syntax

ceph osd pool set <poolname> crush_rule <rule-name>

Example

ceph osd pool set cold_tier crush_rule cold
ceph osd pool set hot_tier crush_rule hot

NOTE: There is no need to manually edit the CRUSH map.

CRUSH hierarchy



The CRUSH map is a directed acyclic graph, so it can accommodate multiple hierarchies, for example, performance domains. The
easiest way to create and modify a CRUSH hierarchy is with the Ceph CLI; however, you can also decompile a CRUSH map, edit it,
recompile it, and activate it.

When declaring a bucket instance with the Ceph CLI, you must specify its type and give it a unique string name. Ceph automatically
assigns a bucket ID, sets the algorithm to straw2, sets the hash to 0 reflecting rjenkins1 and sets a weight. When modifying a
decompiled CRUSH map, assign the bucket a unique ID expressed as a negative integer (optional), specify a weight relative to the
total capacity/capability of its item(s), specify the bucket algorithm (usually straw2), and the hash (usually 0, reflecting hash
algorithm rjenkins1).

A bucket can have one or more items. The items can consist of node buckets (for example, racks, rows, hosts) or leaves (for example,
an OSD disk). Items can have a weight that reflects the relative weight of the item.

When modifying a decompiled CRUSH map, you can declare a node bucket with the following syntax:

[bucket-type] [bucket-name] {
id [a unique negative numeric ID]
weight [the relative capacity/capability of the item(s)]
alg [the bucket type: uniform | list | tree | straw2 ]
hash [the hash type: 0 by default]
item [item-name] weight [weight]
}

For example, using the diagram above, we would define two host buckets and one rack bucket. The OSDs are declared as items
within the host buckets:

host node1 {
id -1
alg straw2
hash 0
item osd.0 weight 1.00
item osd.1 weight 1.00
}

host node2 {
id -2
alg straw2
hash 0
item osd.2 weight 1.00
item osd.3 weight 1.00
}

rack rack1 {
id -3
alg straw2
hash 0
item node1 weight 2.00
item node2 weight 2.00
}

NOTE: In the foregoing example, note that the rack bucket does not contain any OSDs. Rather it contains lower level host buckets,
and includes the sum total of their weight in the item entry.

CRUSH location
Adding a bucket
Moving a bucket
Removing a bucket
CRUSH Bucket algorithms

CRUSH location
A CRUSH location is the position of an OSD in terms of the CRUSH map’s hierarchy. When you express a CRUSH location on the
command line interface, a CRUSH location specifier takes the form of a list of name/value pairs describing the OSD’s position. For
example, if an OSD is in a particular row, rack, chassis and host, and is part of the default CRUSH tree, its crush location could be
described as:

root=default row=a rack=a2 chassis=a2a host=a2a1



NOTE:

1. The order of the keys does not matter.

2. The key name (left of = ) must be a valid CRUSH type. By default these include root, datacenter, room, row, pod, pdu,
rack, chassis and host. You might edit the CRUSH map to change the types to suit your needs.

3. You do not need to specify all the buckets/keys. For example, by default, Ceph automatically sets a ceph-osd daemon’s
location to be root=default host={HOSTNAME} (based on the output from hostname -s).
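
For example, a hedged sketch that places a hypothetical osd.3 at the location above with a placeholder weight of 1.0, using ceph osd crush create-or-move, which can create the location while moving the OSD:

Example

[ceph: root@host01 /]# ceph osd crush create-or-move osd.3 1.0 root=default row=a rack=a2 chassis=a2a host=a2a1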

Adding a bucket
To add a bucket instance to your CRUSH hierarchy, specify the bucket name and its type. Bucket names must be unique in the
CRUSH map.

ceph osd crush add-bucket {name} {type}

If you plan to use multiple hierarchies, for example, for different hardware performance profiles, consider naming buckets based on
their type of hardware or use case.

For example, you could create a hierarchy for solid state drives (ssd), a hierarchy for SAS disks with SSD journals (hdd-journal),
and another hierarchy for SATA drives (hdd):

ceph osd crush add-bucket ssd-root root
ceph osd crush add-bucket hdd-journal-root root
ceph osd crush add-bucket hdd-root root

The Ceph CLI outputs:

added bucket ssd-root type root to crush map
added bucket hdd-journal-root type root to crush map
added bucket hdd-root type root to crush map

IMPORTANT: Using colons (:) in bucket names is not supported.

Add an instance of each bucket type you need for your hierarchy. The following example demonstrates adding buckets for a row with
a rack of SSD hosts and a rack of hosts for object storage.

ceph osd crush add-bucket ssd-row1 row
ceph osd crush add-bucket ssd-row1-rack1 rack
ceph osd crush add-bucket ssd-row1-rack1-host1 host
ceph osd crush add-bucket ssd-row1-rack1-host2 host
ceph osd crush add-bucket hdd-row1 row
ceph osd crush add-bucket hdd-row1-rack2 rack
ceph osd crush add-bucket hdd-row1-rack1-host1 host
ceph osd crush add-bucket hdd-row1-rack1-host2 host
ceph osd crush add-bucket hdd-row1-rack1-host3 host
ceph osd crush add-bucket hdd-row1-rack1-host4 host

Once you have completed these steps, view your tree.

ceph osd tree

Notice that the hierarchy remains flat. You must move your buckets into a hierarchical position after you add them to the CRUSH
map.

Moving a bucket
When you create your initial cluster, Ceph has a default CRUSH map with a root bucket named default and your initial OSD hosts
appear under the default bucket. When you add a bucket instance to your CRUSH map, it appears in the CRUSH hierarchy, but it
does not necessarily appear under a particular bucket.

To move a bucket instance to a particular location in your CRUSH hierarchy, specify the bucket name and its type.



Example

ceph osd crush move ssd-row1 root=ssd-root
ceph osd crush move ssd-row1-rack1 row=ssd-row1
ceph osd crush move ssd-row1-rack1-host1 rack=ssd-row1-rack1
ceph osd crush move ssd-row1-rack1-host2 rack=ssd-row1-rack1

Once you have completed these steps, you can view your tree.

ceph osd tree

NOTE: You can also use ceph osd crush create-or-move to create a location while moving an OSD.

Removing a bucket
To remove a bucket instance from your CRUSH hierarchy, specify the bucket name. For example:

ceph osd crush remove {bucket-name}

Or:

ceph osd crush rm {bucket-name}

NOTE: The bucket must be empty in order to remove it.

If you are removing higher level buckets (for example, a root like default), check to see if a pool uses a CRUSH rule that selects
that bucket. If so, you need to modify your CRUSH rules; otherwise, peering fails.
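
For example, a minimal sketch that inspects the existing rules and then removes one of the empty host buckets added earlier in this section:

Example

[ceph: root@host01 /]# ceph osd crush rule dump
[ceph: root@host01 /]# ceph osd crush remove ssd-row1-rack1-host2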

CRUSH Bucket algorithms


When you create buckets using the Ceph CLI, Ceph sets the algorithm to straw2 by default. Ceph supports four bucket algorithms,
each representing a tradeoff between performance and reorganization efficiency. If you are unsure of which bucket type to use, we
recommend using a straw2 bucket. The bucket algorithms are:

1. Uniform: Uniform buckets aggregate devices with exactly the same weight. For example, when firms commission or
decommission hardware, they typically do so with many machines that have exactly the same physical configuration (for
example, bulk purchases). When storage devices have exactly the same weight, you can use the uniform bucket type, which
allows CRUSH to map replicas into uniform buckets in constant time. With non-uniform weights, you should use another
bucket algorithm.

2. List: List buckets aggregate their content as linked lists. Based on the RUSH (Replication Under Scalable Hashing) P algorithm,
a list is a natural and intuitive choice for an expanding cluster: either an object is relocated to the newest device with some
appropriate probability, or it remains on the older devices as before. The result is optimal data migration when items are
added to the bucket. Items removed from the middle or tail of the list, however, can result in a significant amount of
unnecessary movement, making list buckets most suitable for circumstances in which they never, or very rarely shrink.

3. Tree: Tree buckets use a binary search tree. They are more efficient than listing buckets when a bucket contains a larger set of
items. Based on the RUSH (Replication Under Scalable Hashing) R algorithm, tree buckets reduce the placement time to zero
(log n), making them suitable for managing much larger sets of devices or nested buckets.

4. Straw2 (default): List and Tree buckets use a divide and conquer strategy in a way that either gives certain items precedence,
for example, those at the beginning of a list or obviates the need to consider entire subtrees of items at all. That improves the
performance of the replica placement process, but can also introduce suboptimal reorganization behavior when the contents
of a bucket change due an addition, removal, or re-weighting of an item. The straw2 bucket type allows all items to fairly
“compete” against each other for replica placement through a process analogous to a draw of straws.

Ceph OSDs in CRUSH



Once you have a CRUSH hierarchy for the OSDs, add OSDs to the CRUSH hierarchy. You can also move or remove OSDs from an
existing hierarchy. The Ceph CLI usage has the following values:

id
Description
The numeric ID of the OSD.

Type
Integer

Required
Yes

Example
0

name
Description
The full name of the OSD.

Type
String

Required
Yes

Example
osd.0

weight
Description
The CRUSH weight for the OSD.

Type
Double

Required
Yes

Example
2.0

root
Description
The name of the root bucket of the hierarchy or tree in which the OSD resides.

Type
Key-value pair.

Required
Yes

Example
root=default, root=replicated_rule, and so on

bucket-type
Description
One or more name-value pairs, where the name is the bucket type and the value is the bucket’s name. You can specify a CRUSH
location for an OSD in the CRUSH hierarchy.

Type
Key-value pairs.

Required
No

Example
datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1



Viewing OSDs in CRUSH
Adding an OSD to CRUSH
Moving an OSD within a CRUSH Hierarchy
Removing an OSD from a CRUSH Hierarchy

Viewing OSDs in CRUSH


The ceph osd crush tree command prints CRUSH buckets and items in a tree view. Use this command to determine a list of
OSDs in a particular bucket. It will print output similar to ceph osd tree.

To return additional details, execute the following:

# ceph osd crush tree -f json-pretty

The command returns an output similar to the following:

[
{
"id": -2,
"name": "ssd",
"type": "root",
"type_id": 10,
"items": [
{
"id": -6,
"name": "dell-per630-11-ssd",
"type": "host",
"type_id": 1,
"items": [
{
"id": 6,
"name": "osd.6",
"type": "osd",
"type_id": 0,
"crush_weight": 0.099991,
"depth": 2
}
]
},
{
"id": -7,
"name": "dell-per630-12-ssd",
"type": "host",
"type_id": 1,
"items": [
{
"id": 7,
"name": "osd.7",
"type": "osd",
"type_id": 0,
"crush_weight": 0.099991,
"depth": 2
}
]
},
{
"id": -8,
"name": "dell-per630-13-ssd",
"type": "host",
"type_id": 1,
"items": [
{
"id": 8,
"name": "osd.8",
"type": "osd",
"type_id": 0,
"crush_weight": 0.099991,
"depth": 2
}
]
}
]
},
{
"id": -1,
"name": "default",
"type": "root",
"type_id": 10,
"items": [
{
"id": -3,
"name": "dell-per630-11",
"type": "host",
"type_id": 1,
"items": [
{
"id": 0,
"name": "osd.0",
"type": "osd",
"type_id": 0,
"crush_weight": 0.449997,
"depth": 2
},
{
"id": 3,
"name": "osd.3",
"type": "osd",
"type_id": 0,
"crush_weight": 0.289993,
"depth": 2
}
]
},
{
"id": -4,
"name": "dell-per630-12",
"type": "host",
"type_id": 1,
"items": [
{
"id": 1,
"name": "osd.1",
"type": "osd",
"type_id": 0,
"crush_weight": 0.449997,
"depth": 2
},
{
"id": 4,
"name": "osd.4",
"type": "osd",
"type_id": 0,
"crush_weight": 0.289993,
"depth": 2
}
]
},
{
"id": -5,
"name": "dell-per630-13",
"type": "host",
"type_id": 1,
"items": [
{
"id": 2,
"name": "osd.2",
"type": "osd",
"type_id": 0,
"crush_weight": 0.449997,
"depth": 2
},
{
"id": 5,
"name": "osd.5",
"type": "osd",
"type_id": 0,
"crush_weight": 0.289993,
"depth": 2
}
]
}
]
}
]

Adding an OSD to CRUSH


Adding a Ceph OSD to a CRUSH hierarchy is the final step before you might start an OSD (rendering it up and in) and Ceph assigns
placement groups to the OSD.

You must prepare a Ceph OSD before you add it to the CRUSH hierarchy. Deployment utilities, such as the Ceph Orchestrator, can
perform this step for you. For example creating a Ceph OSD on a single node:

Syntax

ceph orch daemon add osd HOST:DEVICE[,DEVICE]

The CRUSH hierarchy is notional, so the ceph osd crush add command allows you to add OSDs to the CRUSH hierarchy wherever
you wish. The location you specify should reflect its actual location. If you specify at least one bucket, the command places the OSD
into the most specific bucket you specify, and it moves that bucket underneath any other buckets you specify.

To add an OSD to a CRUSH hierarchy:

Syntax

ceph osd crush add ID_OR_NAME WEIGHT [BUCKET_TYPE=BUCKET_NAME ...]

IMPORTANT: If you specify only the root bucket, the command attaches the OSD directly to the root. However, CRUSH rules expect
OSDs to be inside of hosts or chassis, and host or chassis should be inside of other buckets reflecting your cluster topology.

The following example adds osd.0 to the hierarchy:

ceph osd crush add osd.0 1.0 root=default datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1

NOTE: You can also use ceph osd crush set or ceph osd crush create-or-move to add an OSD to the CRUSH hierarchy.

Moving an OSD within a CRUSH Hierarchy


If the storage cluster topology changes, you can move an OSD in the CRUSH hierarchy to reflect its actual location.

IMPORTANT: Moving an OSD in the CRUSH hierarchy means that Ceph will recompute which placement groups get assigned to the
OSD, potentially resulting in significant redistribution of data.

To move an OSD within the CRUSH hierarchy:

Syntax

ceph osd crush set ID_OR_NAME WEIGHT root=ROOT_NAME [BUCKET_TYPE=BUCKET_NAME...]
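
For example, a hedged sketch with placeholder names, assuming osd.0 exists and that the rack2 and node3 buckets are already in the hierarchy:

Example

[ceph: root@host01 /]# ceph osd crush set osd.0 1.0 root=default rack=rack2 host=node3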

NOTE: You can also use ceph osd crush create-or-move to move an OSD within the CRUSH hierarchy.

Removing an OSD from a CRUSH Hierarchy





Removing an OSD from a CRUSH hierarchy is the first step when you want to remove an OSD from your cluster. When you remove the
OSD from the CRUSH map, CRUSH recomputes which OSDs get the placement groups and data re-balances accordingly. See
Adding/Removing OSDs for additional details.

To remove an OSD from the CRUSH map of a running cluster, execute the following:

Syntax

ceph osd crush remove NAME
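
For example, a minimal sketch that removes a hypothetical osd.6 from the CRUSH map; expect placement groups to be reassigned and data to rebalance afterwards:

Example

[ceph: root@host01 /]# ceph osd crush remove osd.6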

Device class
Ceph’s CRUSH map provides extraordinary flexibility in controlling data placement. This is one of Ceph’s greatest strengths. Early
Ceph deployments used hard disk drives almost exclusively. Today, Ceph clusters are frequently built with multiple types of storage
devices: HDD, SSD, NVMe, or even various classes of the foregoing. For example, it is common in Ceph Object Gateway deployments
to have storage policies where clients can store data on slower HDDs and other storage policies for storing data on fast SSDs. Ceph
Object Gateway deployments might even have a pool backed by fast SSDs for bucket indices. Additionally, OSD nodes also frequently
have SSDs used exclusively for journals or write-ahead logs that do NOT appear in the CRUSH map. These complex hardware
scenarios historically required manually editing the CRUSH map, which can be time-consuming and tedious. It is not required to have
different CRUSH hierarchies for different classes of storage devices.

CRUSH rules work in terms of the CRUSH hierarchy. However, if different classes of storage devices reside in the same hosts, the
process becomes more complicated—requiring users to create multiple CRUSH hierarchies for each class of device, and then disable
the osd crush update on start option that automates much of the CRUSH hierarchy management. Device classes eliminate
this tediousness by telling the CRUSH rule what class of device to use, dramatically simplifying CRUSH management tasks.

NOTE: The ceph osd tree command has a column reflecting a device class.

Setting a device class


Removing a device class
Renaming a device class
Listing a device class
Listing OSDs of a device class
Listing CRUSH Rules by Class

Reference

Using Different Device Class

CRUSH Storage Strategy examples

Setting a device class


To set a device class for an OSD, execute the following:

Syntax

ceph osd crush set-device-class CLASS OSD_ID [OSD_ID..]

Example

[ceph: root@host01 /]# ceph osd crush set-device-class hdd osd.0 osd.1
[ceph: root@host01 /]# ceph osd crush set-device-class ssd osd.2 osd.3
[ceph: root@host01 /]# ceph osd crush set-device-class bucket-index osd.4

NOTE: Ceph might assign a class to a device automatically. However, class names are simply arbitrary strings. There is no
requirement to adhere to hdd, ssd or nvme. In the foregoing example, a device class named bucket-index might indicate an SSD
device that a Ceph Object Gateway pool uses exclusively for bucket index workloads. To change a device class that was already set, use
ceph osd crush rm-device-class first.

Removing a device class


To remove a device class for an OSD, execute the following:

Syntax

ceph osd crush rm-device-class CLASS OSD_ID [OSD_ID..]

Example

[ceph: root@host01 /]# ceph osd crush rm-device-class hdd osd.0 osd.1
[ceph: root@host01 /]# ceph osd crush rm-device-class ssd osd.2 osd.3
[ceph: root@host01 /]# ceph osd crush rm-device-class bucket-index osd.4

Renaming a device class


To rename a device class for all OSDs that use that class, execute the following:

Syntax

ceph osd crush class rename OLD_NAME NEW_NAME

Example

[ceph: root@host01 /]# ceph osd crush class rename hdd sas15k

Listing a device class


To list device classes in the CRUSH map, execute the following:

Syntax

ceph osd crush class ls

The output will look something like this:

Example

[
"hdd",
"ssd",
"bucket-index"
]

Listing OSDs of a device class


To list all OSDs that belong to a particular class, execute the following:

Syntax

ceph osd crush class ls-osd CLASS



Example

[ceph: root@host01 /]# ceph osd crush class ls-osd hdd

The output is simply a list of OSD numbers. For example:

0
1
2
3
4
5
6

Listing CRUSH Rules by Class


To list all crush rules that reference the same class, execute the following:

Syntax

ceph osd crush rule ls-by-class CLASS

Example

[ceph: root@host01 /]# ceph osd crush rule ls-by-class hdd

CRUSH weights
The CRUSH algorithm assigns a weight value in terabytes (by convention) per OSD device with the objective of approximating a
uniform probability distribution for write requests that assign new data objects to PGs and PGs to OSDs. For this reason, as a best
practice, we recommend creating CRUSH hierarchies with devices of the same type and size, and assigning the same weight. We also
recommend using devices with the same I/O and throughput characteristics so that you will also have uniform performance
characteristics in your CRUSH hierarchy, even though performance characteristics do not affect data distribution.

Since using uniform hardware is not always practical, you might incorporate OSD devices of different sizes and use a relative weight
so that Ceph will distribute more data to larger devices and less data to smaller devices.

Setting CRUSH weights of OSDs


Setting a Bucket’s OSD Weights
Set an OSD’s in Weight
Setting the OSDs weight by utilization
Setting an OSD’s Weight by PG distribution
Recalculating a CRUSH Tree’s weights

Setting CRUSH weights of OSDs


To set an OSD CRUSH weight in Terabytes within the CRUSH map, execute the following command

ceph osd crush reweight NAME WEIGHT

Where:

name
Description
The full name of the OSD.



Type
String

Required
Yes

Example
osd.0

weight
Description
The CRUSH weight for the OSD. This should be the size of the OSD in Terabytes, where 1.0 is 1 Terabyte.

Type
Double

Required
Yes

Example
2.0

This setting is used when creating an OSD or adjusting the CRUSH weight immediately after adding the OSD. It usually does not
change over the life of the OSD.
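
For example, a minimal sketch that sets the CRUSH weight of a hypothetical 4 TB OSD:

Example

[ceph: root@host01 /]# ceph osd crush reweight osd.0 4.0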

Setting a Bucket’s OSD Weights


Using ceph osd crush reweight can be time-consuming. You can set (or reset) all Ceph OSD weights under a bucket (row, rack,
node, and so on) by executing:

Syntax

ceph osd crush reweight-subtree NAME WEIGHT

Where:

name is the name of the CRUSH bucket.

weight is the CRUSH weight, in terabytes, to set on each leaf OSD under that bucket.
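
For example, a minimal sketch assuming a hypothetical bucket named rack1 whose leaf OSDs should each be weighted at 4.0:

Example

[ceph: root@host01 /]# ceph osd crush reweight-subtree rack1 4.0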

Set an OSD’s in Weight

For the purposes of ceph osd in and ceph osd out, an OSD is either in the cluster or out of the cluster. That is how a monitor
records an OSD’s status. However, even though an OSD is in the cluster, it might be experiencing a malfunction such that you do not
want to rely on it as much until you fix it (for example, replace a storage drive, change out a controller, and so on).

You can increase or decrease the in weight of a particular OSD (that is, without changing its weight in Terabytes) by executing:

Syntax

ceph osd reweight ID WEIGHT

Where:

id is the OSD number.

weight is a range from 0.0-1.0, where 0 is not in the cluster (that is, it does not have any PGs assigned to it) and 1.0 is in the
cluster (that is, the OSD receives the same number of PGs as other OSDs).
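
For example, a minimal sketch that lowers the in weight of a hypothetical OSD with ID 2 while its hardware issue is investigated:

Example

[ceph: root@host01 /]# ceph osd reweight 2 0.85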

Setting the OSDs weight by utilization



CRUSH is designed to approximate a uniform probability distribution for write requests that assign new data objects PGs and PGs to
OSDs. However, a cluster might become imbalanced anyway. This can happen for a number of reasons. For example:

Multiple Pools: You can assign multiple pools to a CRUSH hierarchy, but the pools might have different numbers of placement
groups, size (number of replicas to store), and object size characteristics.

Custom Clients: Ceph clients such as block device, object gateway and filesystem share data from their clients and stripe the
data as objects across the cluster as uniform-sized smaller RADOS objects. So except for the foregoing scenario, CRUSH
usually achieves its goal. However, there is another case where a cluster can become imbalanced: namely, using librados to
store data without normalizing the size of objects. This scenario can lead to imbalanced clusters (for example, storing 100 1
MB objects and 10 4 MB objects will make a few OSDs have more data than the others).

Probability: A uniform distribution will result in some OSDs with more PGs and some with less. For clusters with a large
number of OSDs, the statistical outliers will be further out.

You can reweight OSDs by utilization by executing the following:

Syntax

ceph osd reweight-by-utilization [THRESHOLD] [WEIGHT_CHANGE_AMOUNT] [NUMBER_OF_OSDS] [--no-increasing]

Example

[ceph: root@host01 /]# ceph osd test-reweight-by-utilization 110 .5 4 --no-increasing

Where:

threshold is a percentage of utilization such that OSDs facing higher data storage loads will receive a lower weight and thus
fewer PGs assigned to them. The default value is 120, reflecting 120%. Any value from 100+ is a valid threshold. Optional.

weight_change_amount is the amount to change the weight. Valid values are greater than 0.0 - 1.0. The default value is
0.05. Optional.

number_of_OSDs is the maximum number of OSDs to reweight. For large clusters, limiting the number of OSDs to reweight
prevents significant rebalancing. Optional.

no-increasing is off by default. Increasing the osd weight is allowed when using the reweight-by-utilization or
test-reweight-by-utilization commands. If this option is used with these commands, it prevents the OSD weight
from increasing, even if the OSD is underutilized. Optional.

IMPORTANT: Executing reweight-by-utilization is recommended and somewhat inevitable for large clusters. Utilization rates
might change over time, and as your cluster size or hardware changes, the weightings might need to be updated to reflect changing
utilization. If you elect to reweight by utilization, you might need to re-run this command as utilization, hardware or cluster size
change.

Executing this or other weight commands that assign a weight will override the weight assigned by this command (for example, osd
reweight-by-utilization, osd crush weight, osd weight, in or out).

Setting an OSD’s Weight by PG distribution


In CRUSH hierarchies with a smaller number of OSDs, it’s possible for some OSDs to get more PGs than other OSDs, resulting in a
higher load. You can reweight OSDs by PG distribution to address this situation by executing the following:

Syntax

ceph osd reweight-by-pg POOL_NAME

Where:

poolname is the name of the pool. Ceph will examine how the pool assigns PGs to OSDs and reweight the OSDs according to
this pool’s PG distribution. Note that multiple pools could be assigned to the same CRUSH hierarchy. Reweighting OSDs
according to one pool’s distribution could have unintended effects for other pools assigned to the same CRUSH hierarchy if
they do not have the same size (number of replicas) and PGs.



Recalculating a CRUSH Tree’s weights
CRUSH tree buckets should be the sum of their leaf weights. If you manually edit the CRUSH map weights, you should execute the
following to ensure that the CRUSH bucket tree accurately reflects the sum of the leaf OSDs under the bucket.

Syntax

ceph osd crush reweight-all

Primary affinity
When a Ceph Client reads or writes data, it always contacts the primary OSD in the acting set. For set [2, 3, 4], osd.2 is the
primary. Sometimes an OSD is not well suited to act as a primary compared to other OSDs (for example, it has a slow disk or a slow
controller). To prevent performance bottlenecks (especially on read operations) while maximizing utilization of your hardware, you
can set a Ceph OSD’s primary affinity so that CRUSH is less likely to use the OSD as a primary in an acting set.

Syntax

ceph osd primary-affinity OSD_ID WEIGHT

Primary affinity is 1 by default (that is, an OSD might act as a primary). You might set the OSD primary range from 0-1, where 0
means that the OSD might NOT be used as a primary and 1 means that an OSD might be used as a primary. When the weight is < 1,
it is less likely that CRUSH will select the Ceph OSD Daemon to act as a primary.
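
For example, a minimal sketch that makes CRUSH less likely to select a hypothetical osd.2 as the primary:

Example

[ceph: root@host01 /]# ceph osd primary-affinity osd.2 0.5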

CRUSH rules
CRUSH rules define how a Ceph client selects buckets and the primary OSD within them to store objects, and how the primary OSD
selects buckets and the secondary OSDs to store replicas or coding chunks. For example, you might create a rule that selects a pair
of target OSDs backed by SSDs for two object replicas, and another rule that selects three target OSDs backed by SAS drives in
different data centers for three replicas.

A rule takes the following form:

rule <rulename> {
id <unique number>
type [replicated | erasure]
min_size <min-size>
max_size <max-size>
step take <bucket-type> [class <class-name>]
step [choose|chooseleaf] [firstn|indep] <N> <bucket-type>
step emit
}

id
Description
A unique whole number for identifying the rule.

Purpose
A component of the rule mask.

Type
Integer

Required
Yes



Default
0

type
Description
Describes whether the rule is for a replicated pool or an erasure-coded pool.

Purpose
A component of the rule mask.

Type
String

Required
Yes

Default
replicated

Valid Values
replicated or erasure

min_size
Description
If a pool makes fewer replicas than this number, CRUSH will not select this rule.

Type
Integer

Purpose
A component of the rule mask.

Required
Yes

Default
1

max_size
Description
If a pool makes more replicas than this number, CRUSH will not select this rule.

Type
Integer

Purpose
A component of the rule mask.

Required
Yes

Default
10

step take <bucket-name> [class <class-name>]


Description
Takes a bucket name, and begins iterating down the tree.

Purpose
A component of the rule.

Required
Yes

Example
step take data step take data class ssd

step choose firstn <num> type <bucket-type>


Description
Selects the number of buckets of the given type. The number is usually the number of replicas in the pool (that is, pool size).



If <num> == 0, choose pool-num-replicas buckets (all available).

If <num> > 0 && < pool-num-replicas, choose that many buckets.

If <num> < 0, it means pool-num-replicas - {num}.

Purpose
A component of the rule.

Prerequisite
Follow step take or step choose.

Example
step choose firstn 1 type row

step chooseleaf firstn <num> type <bucket-type>


Description
Selects a set of buckets of {bucket-type} and chooses a leaf node from the subtree of each bucket in the set of buckets. The
number of buckets in the set is usually the number of replicas in the pool (that is, pool size).

If <num> == 0, choose pool-num-replicas buckets (all available).

If <num> > 0 && < pool-num-replicas, choose that many buckets.

If <num> < 0, it means pool-num-replicas - <num>.

Purpose
A component of the rule. Usage removes the need to select a device using two steps.

Prerequisite
Follows step take or step choose.

Example
step chooseleaf firstn 0 type row

step emit
Description
Outputs the current value and empties the stack. Typically used at the end of a rule, but might also be used to pick from different
trees in the same rule.

Purpose
A component of the rule.

Prerequisite
Follows step choose.

Example
step emit

firstn versus indep


Description
Controls the replacement strategy CRUSH uses when OSDs are marked down in the CRUSH map. If this rule is to be used with
replicated pools it should be firstn and if it is for erasure-coded pools it should be indep.

Example
You have a PG stored on OSDs 1, 2, 3, 4, 5 in which 3 goes down. In the first scenario, with the firstn mode, CRUSH adjusts its
calculation to select 1 and 2, then selects 3 but discovers it is down, so it retries and selects 4 and 5, and then goes on to select a
new OSD 6. The final CRUSH mapping change is from 1, 2, 3, 4, 5 to 1, 2, 4, 5, 6. In the second scenario, with indep mode on an
erasure-coded pool, CRUSH attempts to select the failed OSD 3, tries again and picks out 6, for a final transformation from 1, 2, 3, 4,
5 to 1, 2, 6, 4, 5.

IMPORTANT: A given CRUSH rule can be assigned to multiple pools, but it is not possible for a single pool to have multiple CRUSH
rules.
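
As an illustration of the rule form above, the following is a hedged sketch of a replicated rule that takes the default root restricted to the ssd device class and places each replica on a different host; the rule name and ID are placeholders:

rule ssd_hosts {
id 2
type replicated
min_size 1
max_size 10
step take default class ssd
step chooseleaf firstn 0 type host
step emit
}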

Listing CRUSH rules


Dumping CRUSH rules
Adding CRUSH rules
Creating CRUSH rules for replicated pools
Creating CRUSH rules for erasure coded pools
Removing CRUSH rules

Listing CRUSH rules


To list CRUSH rules from the command line, execute the following:

Syntax

ceph osd crush rule list
ceph osd crush rule ls

Dumping CRUSH rules


To dump the contents of a specific CRUSH rule, execute the following:

Syntax

ceph osd crush rule dump NAME
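
For example, a minimal sketch that dumps the rule named replicated_rule, the name that new clusters typically assign to the default replicated rule:

Example

[ceph: root@host01 /]# ceph osd crush rule dump replicated_rule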

Adding CRUSH rules


To add a CRUSH rule, you must specify a rule name, the root node of the hierarchy you wish to use, the type of bucket you want to
replicate across (for example, rack, row, and so on), and the mode for choosing the bucket.

Syntax

ceph osd crush rule create-simple RULE_NAME ROOT BUCKET_TYPE FIRSTN_OR_INDEP

Ceph creates a rule with chooseleaf and one bucket of the type you specify.

Example

[ceph: root@host01 /]# ceph osd crush rule create-simple deleteme default host firstn

Ceph creates the following rule:

{ "id": 1,
"rule_name": "deleteme",
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{ "op": "take",
"item": -1,
"item_name": "default"},
{ "op": "chooseleaf_firstn",
"num": 0,
"type": "host"},
{ "op": "emit"}]}

Creating CRUSH rules for replicated pools


To create a CRUSH rule for a replicated pool, execute the following:



Syntax

ceph osd crush rule create-replicated NAME ROOT FAILURE_DOMAIN CLASS

Where:

<name>: The name of the rule.

<root>: The root of the CRUSH hierarchy.

<failure-domain>: The failure domain. For example: host or rack.

<class>: The storage device class. For example: hdd or ssd.

Example

[ceph: root@host01 /]# ceph osd crush rule create-replicated fast default host ssd

Creating CRUSH rules for erasure coded pools


To add a CRUSH rule for use with an erasure coded pool, you might specify a rule name and an erasure code profile.

Syntax

ceph osd crush rule create-erasure RULE_NAME PROFILE_NAME
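
For example, a minimal sketch assuming an erasure code profile named myprofile has already been created:

Example

[ceph: root@host01 /]# ceph osd crush rule create-erasure ecrule myprofile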

Removing CRUSH rules


To remove a rule, execute the following and specify the CRUSH rule name:

Syntax

ceph osd crush rule rm NAME
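
For example, to remove the deleteme rule created in the earlier create-simple example:

Example

[ceph: root@host01 /]# ceph osd crush rule rm deleteme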

CRUSH tunables overview


The Ceph project has grown exponentially with many changes and many new features. Beginning with the first commercially
supported major release of Ceph, v0.48 (Argonaut), Ceph provides the ability to adjust certain parameters of the CRUSH algorithm,
that is, the settings are not frozen in the source code.

A few important points to consider:

Adjusting CRUSH values might result in the shift of some PGs between storage nodes. If the Ceph cluster is already storing a
lot of data, be prepared for some fraction of the data to move.

The ceph-osd and ceph-mon daemons will start requiring the feature bits of new connections as soon as they receive an
updated map. However, already-connected clients are effectively grandfathered in, and will misbehave if they do not support
the new feature. Make sure when you upgrade your Ceph Storage Cluster daemons that you also update your Ceph clients.

If the CRUSH tunables are set to non-legacy values and then later changed back to the legacy values, ceph-osd daemons will
not be required to support the feature. However, the OSD peering process requires examining and understanding old maps.
Therefore, you should not run old versions of the ceph-osd daemon if the cluster has previously used non-legacy CRUSH
values, even if the latest version of the map has been switched back to using the legacy defaults.

CRUSH tuning
CRUSH tuning, the hard way
CRUSH legacy values



CRUSH tuning
Before you tune CRUSH, you should ensure that all Ceph clients and all Ceph daemons use the same version. If you have recently
upgraded, ensure that you have restarted daemons and reconnected clients.

The simplest way to adjust the CRUSH tunables is by changing to a known profile. Those are:

legacy: The legacy behavior from v0.47 (pre-Argonaut) and earlier.

argonaut: The legacy values supported by v0.48 (Argonaut) release.

bobtail: The values supported by the v0.56 (Bobtail) release.

firefly: The values supported by the v0.80 (Firefly) release.

hammer: The values supported by the v0.94 (Hammer) release.

jewel: The values supported by the v10.0.2 (Jewel) release.

optimal: The current best values.

default: The current default values for a new cluster.

You can select a profile on a running cluster with the command:

Syntax

# ceph osd crush tunables PROFILE

NOTE: This might result in some data movement.

Generally, you should set the CRUSH tunables after you upgrade, or if you receive a warning. Starting with version v0.74, Ceph issues
a health warning if the CRUSH tunables are not set to their optimal values; the optimal values are the default as of v0.73.

To make this warning go away, you have two options:

1. Adjust the tunables on the existing cluster. Note that this will result in some data movement (possibly as much as 10%). This
is the preferred route, but should be taken with care on a production cluster where the data movement might affect
performance. You can enable optimal tunables with:

# ceph osd crush tunables optimal

If things go poorly (for example, too much load) and not very much progress has been made, or there is a client compatibility
problem (old kernel cephfs or rbd clients, or pre-bobtail librados clients), you can switch back to an earlier profile:

# ceph osd crush tunables <profile>

For example, to restore the pre-v0.48 (Argonaut) values, execute:

# ceph osd crush tunables legacy

2. You can make the warning go away without making any changes to CRUSH by adding the following option to the mon section of
the ceph.conf file:

mon warn on legacy crush tunables = false

For the change to take effect, restart the monitors, or apply the option to running monitors with:

# ceph tell mon.\* injectargs --no-mon-warn-on-legacy-crush-tunables

CRUSH tuning, the hard way




If you can ensure that all clients are running recent code, you can adjust the tunables by extracting the CRUSH map, modifying the
values, and reinjecting it into the cluster.

Extract the latest CRUSH map:

ceph osd getcrushmap -o /tmp/crush

Adjust tunables. These values appear to offer the best behavior for both the large and small clusters we tested with. You will need
to additionally specify the --enable-unsafe-tunables argument to crushtool for this to work. Use this option
with extreme care:

crushtool -i /tmp/crush --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 -o /tmp/crush.new

Reinject modified map:

ceph osd setcrushmap -i /tmp/crush.new

CRUSH legacy values


For reference, the legacy values for the CRUSH tunables can be set with:

crushtool -i /tmp/crush --set-choose-local-tries 2 --set-choose-local-fallback-tries 5 --set-choose-total-tries 19 --set-chooseleaf-descend-once 0 --set-chooseleaf-vary-r 0 -o /tmp/crush.legacy

Again, the special --enable-unsafe-tunables option is required. Further, as noted above, be careful running old versions of the
ceph-osd daemon after reverting to legacy values as the feature bit is not perfectly enforced.

Edit a CRUSH map


Generally, modifying your CRUSH map at runtime with the Ceph CLI is more convenient than editing the CRUSH map manually.
However, there are times when you might choose to edit it, such as changing the default bucket types, or using a bucket algorithm
other than straw2.

To edit an existing CRUSH map:

1. Get the CRUSH map.

2. Decompile the CRUSH map.

3. Edit at least one of the devices, buckets, or rules.

4. Recompile the CRUSH map.

5. Set the CRUSH map.

To activate a CRUSH Map rule for a specific pool, identify the common rule number and specify that rule number for the pool when
creating the pool.

Getting the CRUSH map


Decompiling the CRUSH map
Compiling the CRUSH map
Setting a CRUSH map

Getting the CRUSH map




To get the CRUSH map for your cluster, execute the following:

Syntax

ceph osd getcrushmap -o COMPILED_CRUSHMAP_FILENAME

Ceph will output (-o) a compiled CRUSH map to the file name you specified. Since the CRUSH map is in a compiled form, you must
decompile it first before you can edit it.
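For example, reusing the /tmp/crush file name from the tuning section above (the file name is only an illustration):

[ceph: root@host01 /]# ceph osd getcrushmap -o /tmp/crush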

Decompiling the CRUSH map


To decompile a CRUSH map, execute the following:

Syntax

crushtool -d COMPILED_CRUSHMAP_FILENAME -o DECOMPILED_CRUSHMAP_FILENAME

Ceph decompiles (-d) the compiled CRUSH map and sends the output (-o) to the file name you specified.
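For example, to decompile the map saved above into an editable text file (the file names are illustrative):

[ceph: root@host01 /]# crushtool -d /tmp/crush -o /tmp/crush.txt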

Compiling the CRUSH map


To compile a CRUSH map, execute the following:

Syntax

crushtool -c DECOMPILED_CRUSHMAP_FILENAME -o COMPILED_CRUSHMAP_FILENAME

Ceph will store a compiled CRUSH map to the file name you specified.
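For example, to recompile the edited text file back into binary form (again, the file names are illustrative):

[ceph: root@host01 /]# crushtool -c /tmp/crush.txt -o /tmp/crush.new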

Setting a CRUSH map


To set the CRUSH map for your cluster, execute the following:

Syntax

ceph osd setcrushmap -i COMPILED_CRUSHMAP_FILENAME

Ceph inputs the compiled CRUSH map of the file name you specified as the CRUSH map for the cluster.
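For example, to activate the map recompiled in the previous step:

[ceph: root@host01 /]# ceph osd setcrushmap -i /tmp/crush.new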

CRUSH storage strategies examples


If you want most pools to default to OSDs backed by large hard drives, but also want some pools mapped to OSDs backed by fast
solid-state drives (SSDs), CRUSH can handle these scenarios easily.

Use device classes. The process is simple: add a class to each device.

Syntax

ceph osd crush set-device-class CLASS OSD_ID [OSD_ID]

Example

[ceph: root@host01 /]# ceph osd crush set-device-class hdd osd.0 osd.1 osd.4 osd.5
[ceph: root@host01 /]# ceph osd crush set-device-class ssd osd.2 osd.3 osd.6 osd.7



Then, create rules to use the devices.

Syntax

ceph osd crush rule create-replicated RULENAME ROOT FAILURE_DOMAIN_TYPE DEVICE_CLASS

Example

[ceph: root@host01 /]# ceph osd crush rule create-replicated cold default host hdd
[ceph: root@host01 /]# ceph osd crush rule create-replicated hot default host ssd

Finally, set pools to use the rules.

Syntax

ceph osd pool set POOL_NAME crush_rule RULENAME

Example

[ceph: root@host01 /]# ceph osd pool set cold crush_rule cold
[ceph: root@host01 /]# ceph osd pool set hot crush_rule hot

There is no need to manually edit the CRUSH map, because one hierarchy can serve multiple classes of devices.

device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class ssd
device 7 osd.7 class ssd

host ceph-osd-server-1 {
        id -1
        alg straw2
        hash 0
        item osd.0 weight 1.00
        item osd.1 weight 1.00
        item osd.2 weight 1.00
        item osd.3 weight 1.00
}

host ceph-osd-server-2 {
        id -2
        alg straw2
        hash 0
        item osd.4 weight 1.00
        item osd.5 weight 1.00
        item osd.6 weight 1.00
        item osd.7 weight 1.00
}

root default {
        id -3
        alg straw2
        hash 0
        item ceph-osd-server-1 weight 4.00
        item ceph-osd-server-2 weight 4.00
}

rule cold {
        ruleset 0
        type replicated
        min_size 2
        max_size 11
        step take default class hdd
        step chooseleaf firstn 0 type host
        step emit
}

rule hot {
        ruleset 1
        type replicated
        min_size 2
        max_size 11
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}

Placement Groups
Placement Groups (PGs) are invisible to Ceph clients, but they play an important role in Ceph Storage Clusters.

A Ceph Storage Cluster might require many thousands of OSDs to reach an exabyte level of storage capacity. Ceph clients store
objects in pools, which are a logical subset of the overall cluster. The number of objects stored in a pool might easily run into the
millions and beyond. A system with millions of objects or more cannot realistically track placement on a per-object basis and still
perform well. Ceph assigns objects to placement groups, and placement groups to OSDs to make re-balancing dynamic and efficient.

All problems in computer science can be solved by another level of indirection, except of course for the problem of too
many indirections.

— David Wheeler

About placement groups


Placement group states
Placement group tradeoffs
Placement group count
Auto-scaling placement groups
Updating noautoscale flag
Specifying target pool size
Placement group command line interface

About placement groups


Tracking object placement on a per-object basis within a pool is computationally expensive at scale. To facilitate high performance at
scale, Ceph subdivides a pool into placement groups, assigns each individual object to a placement group, and assigns the
placement group to a primary OSD. If an OSD fails or the cluster re-balances, Ceph can move or replicate an entire placement group
—that is, all of the objects in the placement groups—without having to address each object individually. This allows a Ceph cluster to
re-balance or recover efficiently.

Figure 1. About PGs



When CRUSH assigns a placement group to an OSD, it calculates a series of OSDs—the first being the primary. The
osd_pool_default_size setting minus 1 for replicated pools, and the number of coding chunks M for erasure-coded pools
determine the number of OSDs storing a placement group that can fail without losing data permanently. Primary OSDs use CRUSH to
identify the secondary OSDs and copy the placement group’s contents to the secondary OSDs. For example, if CRUSH assigns an
object to a placement group, and the placement group is assigned to OSD 5 as the primary OSD, if CRUSH calculates that OSD 1 and
OSD 8 are secondary OSDs for the placement group, the primary OSD 5 will copy the data to OSDs 1 and 8. By copying data on behalf
of clients, Ceph simplifies the client interface and reduces the client workload. The same process allows the Ceph cluster to recover
and rebalance dynamically.

Figure 2. CRUSH hierarchy

When the primary OSD fails and gets marked out of the cluster, CRUSH assigns the placement group to another OSD, which receives
copies of objects in the placement group. Another OSD in the Up Set will assume the role of the primary OSD.

When you increase the number of object replicas or coding chunks, CRUSH will assign each placement group to additional OSDs as
required.

NOTE: PGs do not own OSDs. CRUSH assigns many placement groups to each OSD pseudo-randomly to ensure that data gets
distributed evenly across the cluster.

Placement group states


When you check the storage cluster’s status with the ceph -s or ceph -w commands, Ceph reports on the status of the placement
groups (PGs). A PG has one or more states. The optimum state for PGs in the PG map is an active + clean state.

activating
The PG is peered, but not yet active.



active
Ceph processes requests to the PG.

backfill_toofull
A backfill operation is waiting because the destination OSD is over the backfillfull ratio.

backfill_unfound
Backfill stopped due to unfound objects.

backfill_wait
The PG is waiting in line to start backfill.

backfilling
Ceph is scanning and synchronizing the entire contents of a PG instead of inferring what contents need to be synchronized from the
logs of recent operations. Backfill is a special case of recovery.

clean
Ceph replicated all objects in the PG accurately.

creating
Ceph is still creating the PG.

deep
Ceph is checking the PG data against stored checksums.

degraded
Ceph has not replicated some objects in the PG accurately yet.

down
A replica with necessary data is down, so the PG is offline. A PG with less than min_size replicas is marked as down. Use ceph
health detail to understand the backing OSD state.

forced_backfill
High backfill priority of that PG is enforced by user.

forced_recovery
High recovery priority of that PG is enforced by user.

incomplete
Ceph detects that a PG is missing information about writes that might have occurred, or does not have any healthy copies. If you see
this state, try to start any failed OSDs that might contain the needed information. In the case of an erasure coded pool, temporarily
reducing min_size might allow recovery.

inconsistent
Ceph detects inconsistencies in one or more replicas of an object in the PG, such as objects being the wrong size or objects missing
from one replica after recovery finished.

peering
The PG is undergoing the peering process. A peering process should clear off without much delay, but if it stays and the number of
PGs in a peering state does not reduce in number, the peering might be stuck.

peered
The PG has peered, but cannot serve client IO due to not having enough copies to reach the pool’s configured min_size parameter.
Recovery might occur in this state, so the PG might heal up to min_size eventually.

recovering
Ceph is migrating or synchronizing objects and their replicas.

recovery_toofull
A recovery operation is waiting because the destination OSD is over its full ratio.

recovery_unfound
Recovery stopped due to unfound objects.

recovery_wait
The PG is waiting in line to start recovery.



remapped
The PG is temporarily mapped to a different set of OSDs from what CRUSH specified.

repair
Ceph is checking the PG and repairing any inconsistencies it finds, if possible.

replay
The PG is waiting for clients to replay operations after an OSD crashed.

snaptrim
Trimming snaps.

snaptrim_error
Error stopped trimming snaps.

snaptrim_wait
Queued to trim snaps.

scrubbing
Ceph is checking the PG metadata for inconsistencies.

splitting
Ceph is splitting the PG into multiple PGs.

stale
The PG is in an unknown state; the monitors have not received an update for it since the PG mapping changed.

undersized
The PG has fewer copies than the configured pool replication level.

unknown
The ceph-mgr has not yet received any information about the PG’s state from an OSD since Ceph Manager started up.

References

See the knowledge base What are the possible Placement Group states in a Ceph cluster for more information.

Placement group tradeoffs


Data durability and even data distribution among all OSDs call for more placement groups, but their number should be kept to the
minimum required in order to conserve CPU and memory resources while maintaining good performance.

Data durability
Data distribution
Resource usage

Data durability
Ceph strives to prevent the permanent loss of data. However, after an OSD fails, the risk of permanent data loss increases until the
data it had is fully recovered. Permanent data loss, though rare, is still possible. The following scenario describes how Ceph could
permanently lose data in a single placement group with three copies of the data:

An OSD fails and all copies of the object it contains are lost. For all objects within a placement group stored on the OSD, the
number of replicas suddenly drops from three to two.

Ceph starts recovery for each placement group stored on the failed OSD by choosing a new OSD to re-create the third copy of
all objects for each placement group.

The second OSD containing a copy of the same placement group fails before the new OSD is fully populated with the third
copy. Some objects will then only have one surviving copy.



Ceph picks yet another OSD and keeps copying objects to restore the desired number of copies.

The third OSD containing a copy of the same placement group fails before recovery is complete. If this OSD contained the only
remaining copy of an object, the object is lost permanently.

Hardware failure isn’t an exception, but an expectation. To prevent the foregoing scenario, ideally the recovery process should be as
fast as reasonably possible. The size of your cluster, your hardware configuration and the number of placement groups play an
important role in total recovery time.

Small clusters don’t recover as quickly.

In a cluster containing 10 OSDs with 512 placement groups in a three replica pool, CRUSH will give each placement group three
OSDs. Each OSD will end up hosting (512 * 3) / 10 = ~150 placement groups. When the first OSD fails, the cluster will start
recovery for all 150 placement groups simultaneously.

It is likely that Ceph stored the remaining 150 placement groups randomly across the 9 remaining OSDs. Therefore, each remaining
OSD is likely to send copies of objects to all other OSDs and also receive some new objects, because the remaining OSDs become
responsible for some of the 150 placement groups now assigned to them.

The total recovery time depends upon the hardware supporting the pool. For example, in a 10 OSD cluster, if a host contains one OSD
with a 1 TB SSD, and a 10 GB/s switch connects each of the 10 hosts, the recovery time will take M minutes. By contrast, if a host
contains two SATA OSDs and a 1 GB/s switch connects the five hosts, recovery will take substantially longer. Interestingly, in a
cluster of this size, the number of placement groups has almost no influence on data durability. The placement group count could be
128 or 8192 and the recovery would not be slower or faster.

However, growing the same Ceph cluster to 20 OSDs instead of 10 OSDs is likely to speed up recovery and therefore improve data
durability significantly. Why? Each OSD now participates in only 75 placement groups instead of 150. The 20 OSD cluster will still
require all 19 remaining OSDs to perform the same amount of copy operations in order to recover. In the 10 OSD cluster, each OSDs
had to copy approximately 100 GB. In the 20 OSD cluster each OSD only has to copy 50 GB each. If the network was the bottleneck,
recovery will happen twice as fast. In other words, recovery time decreases as the number of OSDs increases.

In large clusters, PG count is important!

If the exemplary cluster grows to 40 OSDs, each OSD will only host 35 placement groups. If an OSD dies, recovery time will decrease
unless another bottleneck precludes improvement. However, if this cluster grows to 200 OSDs, each OSD will only host
approximately 7 placement groups. If an OSD dies, recovery will happen among at most 21 (7 * 3) OSDs in these placement
groups: recovery will take longer than when there were 40 OSDs, meaning the number of placement groups should be
increased!

IMPORTANT: No matter how short the recovery time, there is a chance for another OSD storing the placement group to fail while
recovery is in progress.

In the 10 OSD cluster described above, if any OSD fails, then approximately 8 placement groups (that is 75 pgs / 9 osds being
recovered) will only have one surviving copy. And if any of the 8 remaining OSDs fail, the last objects of one placement group are
likely to be lost (that is 8 pgs / 8 osds with only one remaining copy being recovered). This is why starting with a somewhat
larger cluster is preferred (for example, 50 OSDs).

When the size of the cluster grows to 20 OSDs, the number of placement groups damaged by the loss of three OSDs drops. The
second OSD lost will degrade approximately 2 (that is 35 pgs / 19 osds being recovered) instead of 8 and the third OSD lost will
only lose data if it is one of the two OSDs containing the surviving copy. In other words, if the probability of losing one OSD is
0.0001% during the recovery time frame, it goes from 8 * 0.0001% in the cluster with 10 OSDs to 2 * 0.0001% in the cluster
with 20 OSDs. Having 512 or 4096 placement groups is roughly equivalent in a cluster with less than 50 OSDs as far as data
durability is concerned.

TIP In a nutshell, more OSDs means faster recovery and a lower risk of cascading failures leading to the permanent loss of a
placement group and its objects.

When you add an OSD to the cluster, it might take a long time to populate the new OSD with placement groups and objects. However
there is no degradation of any object and adding the OSD has no impact on data durability.

Data distribution


Ceph seeks to avoid hot spots—that is, some OSDs receive substantially more traffic than other OSDs. Ideally, CRUSH assigns objects
to placement groups evenly so that when the placement groups get assigned to OSDs (also pseudo randomly), the primary OSDs
store objects such that they are evenly distributed across the cluster and hot spots and network over-subscription problems cannot
develop because of data distribution.

Since CRUSH computes the placement group for each object, but does not actually know how much data is stored in each OSD within
this placement group, the ratio between the number of placement groups and the number of OSDs might influence the
distribution of the data significantly.

For instance, if there was only one placement group with ten OSDs in a three replica pool, Ceph would only use three OSDs to store
data because CRUSH would have no other choice. When more placement groups are available, CRUSH is more likely to evenly spread
objects across OSDs. CRUSH also evenly assigns placement groups to OSDs.

As long as there are one or two orders of magnitude more placement groups than OSDs, the distribution should be even. For
instance, 256 placement groups for 3 OSDs, 512 or 1024 placement groups for 10 OSDs, and so forth.

The ratio between OSDs and placement groups usually solves the problem of uneven data distribution for Ceph clients that
implement advanced features like object striping. For example, a 4 TB block device might get sharded up into 4 MB objects.

The ratio between OSDs and placement groups does not address uneven data distribution in other cases, because CRUSH does
not take object size into account. Using the librados interface to store some relatively small objects and some very large objects
can lead to uneven data distribution. For example, one million 4K objects totaling 4 GB are evenly spread among 1000 placement
groups on 10 OSDs. They will use 4 GB / 10 = 400 MB on each OSD. If one 400 MB object is added to the pool, the three OSDs
supporting the placement group in which the object has been placed will be filled with 400 MB + 400 MB = 800 MB while the
seven others will remain occupied with only 400 MB.

Resource usage
For each placement group, OSDs and Ceph monitors need memory, network and CPU at all times, and even more during recovery.
Sharing this overhead by clustering objects within a placement group is one of the main reasons placement groups exist.

Minimizing the number of placement groups saves significant amounts of resources.

Placement group count


The number of placement groups in a pool plays a significant role in how a cluster peers, distributes data and rebalances. Small
clusters don’t see as many performance improvements compared to large clusters by increasing the number of placement groups.
However, clusters that have many pools accessing the same OSDs might need to carefully consider PG count so that Ceph OSDs use
resources efficiently.

TIP IBM recommends 100 to 200 PGs per OSD.

Placement group calculator


Configuring default placement group count
Placement group count for small clusters
Calculating placement group count
Maximum placement group count

Placement group calculator


The PG calculator calculates the number of placement groups for you and addresses specific use cases. The PG calculator is
especially helpful when using Ceph clients like the Ceph Object Gateway where there are many pools typically using the same rule
(CRUSH hierarchy). You might still calculate PGs manually using the guidelines in PG Count for Small Clusters and Calculating PG
Count. However, the PG calculator is the preferred method of calculating PGs.



See Ceph Placement Groups (PGs) per Pool Calculator on the Red Hat Customer Portal for details.

Configuring default placement group count


When you create a pool, you also create a number of placement groups for the pool. If you don’t specify the number of placement
groups, Ceph will use the default value of 8, which is unacceptably low. You can increase the number of placement groups for a pool,
but we recommend setting reasonable default values too.

osd pool default pg num = 100
osd pool default pgp num = 100

You need to set both the number of placement groups (total), and the number of placement groups used for objects (used in PG
splitting). They should be equal.
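The same defaults can also be applied at runtime with the ceph config command; this is a minimal sketch that assumes the value of 100 shown above is appropriate for your cluster:

[ceph: root@host01 /]# ceph config set global osd_pool_default_pg_num 100
[ceph: root@host01 /]# ceph config set global osd_pool_default_pgp_num 100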

Placement group count for small clusters


Small clusters don’t benefit from large numbers of placement groups. As the number of OSDs increase, choosing the right value for
pg_num and pgp_num becomes more important because it has a significant influence on the behavior of the cluster as well as the
durability of the data when something goes wrong (that is the probability that a catastrophic event leads to data loss). It is important
to use the PG calculator with small clusters.

Calculating placement group count


If you have more than 50 OSDs, we recommend approximately 50-100 placement groups per OSD to balance out resource usage,
data durability, and distribution. If you have fewer than 50 OSDs, follow the guidance in Placement group count for small clusters. For a single
pool of objects, you can use the following formula to get a baseline:

(OSDs * 100)
Total PGs = ------------
pool size

Where pool size is either the number of replicas for replicated pools or the K+M sum for erasure coded pools (as returned by ceph
osd erasure-code-profile get).

You should then check if the result makes sense with the way you designed your Ceph cluster to maximize data durability, data
distribution and minimize resource usage.

The result should be rounded up to the nearest power of two. Rounding up is optional, but recommended for CRUSH to evenly
balance the number of objects among placement groups.

For a cluster with 200 OSDs and a pool size of 3 replicas, you would estimate your number of PGs as follows:

(200 * 100)
----------- = 6667. Nearest power of 2: 8192
3

With 8192 placement groups distributed across 200 OSDs, that evaluates to approximately 41 placement groups per OSD. You also
need to consider the number of pools you are likely to use in your cluster, since each pool will create placement groups too. Ensure
that you have a reasonable maximum PG count.

Maximum placement group count




When using multiple data pools for storing objects, you need to ensure that you balance the number of placement groups per pool
with the number of placement groups per OSD so that you arrive at a reasonable total number of placement groups. The aim is to
achieve reasonably low variance per OSD without taxing system resources or making the peering process too slow.

In an exemplary Ceph Storage Cluster consisting of 10 pools, each pool with 512 placement groups on ten OSDs, there are a total of
5,120 placement groups spread over ten OSDs, or 512 placement groups per OSD. That might not use too many resources
depending on your hardware configuration. By contrast, if you create 1,000 pools with 512 placement groups each, the OSDs will
handle ~50,000 placement groups each and it would require significantly more resources. Operating with too many placement
groups per OSD can significantly reduce performance, especially during rebalancing or recovery.

The Ceph Storage Cluster has a default maximum value of 300 placement groups per OSD. You can set a different maximum value in
your Ceph configuration file.

mon pg warn max per osd

TIP Ceph Object Gateways deploy with 10-15 pools, so you might consider using less than 100 PGs per OSD to arrive at a
reasonable maximum number.

Auto-scaling placement groups


The number of placement groups (PGs) in a pool plays a significant role in how a cluster peers, distributes data, and rebalances.

Auto-scaling the number of PGs can make managing the cluster easier. The pg-autoscaling command provides recommendations
for scaling PGs, or automatically scales PGs based on how the cluster is being used.

To learn more about how auto-scaling works, see Placement group auto-scaling.

To enable, or disable auto-scaling, see Setting placement group auto-scaling modes.

To view placement group scaling recommendations, see Viewing placement group scaling recommendations.

To set placement group auto-scaling, see Setting placement group auto-scaling.

To update the autoscaler globally, see Updating noautoscale flag

To set target pool size, see Specifying target pool size

Placement group auto-scaling


Placement group splitting and merging
Setting placement group auto-scaling modes
Viewing placement group scaling recommendations
Setting placement group auto-scaling

Placement group auto-scaling


How the auto-scaler works

The auto-scaler analyzes pools and adjusts on a per-subtree basis. Because each pool can map to a different CRUSH rule, and each
rule can distribute data across different devices, Ceph considers utilization of each subtree of the hierarchy independently. For
example, a pool that maps to OSDs of class ssd, and a pool that maps to OSDs of class hdd, will each have optimal PG counts that
depend on the number of those respective device types.

Viewing placement group scaling recommendations


Placement group splitting and merging


Splitting

IBM Storage Ceph can split existing placement groups (PGs) into smaller PGs, which increases the total number of PGs for a given
pool. Splitting existing placement groups (PGs) allows a small IBM Storage Ceph cluster to scale over time as storage requirements
increase. The PG auto-scaling feature can increase the pg_num value, which causes the existing PGs to split as the storage cluster
expands. If the PG auto-scaling feature is disabled, then you can manually increase the pg_num value, which triggers the PG split
process to begin. For example, increasing the pg_num value from 4 to 16 splits each existing PG into four pieces. Increasing the pg_num value will
also increase the pgp_num value, but the pgp_num value increases at a gradual rate. This gradual increase is done to minimize the
impact to a storage cluster’s performance and to a client’s workload, because migrating object data adds a significant load to the
system. By default, Ceph queues and moves no more than 5% of the object data that is in a "misplaced" state. This default
percentage can be adjusted with the target_max_misplaced_ratio option.
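For example, to allow up to 7% of objects to be misplaced during such adjustments (a sketch; the value is illustrative and the option is set on the Ceph Manager):

[ceph: root@host01 /]# ceph config set mgr target_max_misplaced_ratio 0.07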

Figure 1. Splitting

Merging

IBM Storage Ceph can also merge two existing PGs into a larger PG, which decreases the total number of PGs. Merging two PGs
together can be useful, especially when the relative amount of objects in a pool decreases over time, or when the initial number of
PGs chosen was too large. While merging PGs can be useful, it is also a complex and delicate process. When doing a merge, pausing
I/O to the PG occurs, and only one PG is merged at a time to minimize the impact to a storage cluster’s performance. Ceph works
slowly on merging the object data until the new pg_num value is reached.

Figure 2. Merging



Setting placement group auto-scaling modes
Each pool in the IBM Storage Ceph cluster has a pg_autoscale_mode property for PGs that you can set to off, on, or warn.

off: Disables auto-scaling for the pool. It is up to the administrator to choose an appropriate PG number for each pool. Refer
to the PG count section for more information.

on: Enables automated adjustments of the PG count for the given pool.

warn: Raises health alerts when the PG count needs adjustment.

NOTE: In IBM Storage Ceph 5.3, pg_autoscale_mode is on by default. Upgraded storage clusters retain the existing
pg_autoscale_mode setting. The pg_autoscale_mode is on for newly created pools. PG count is automatically adjusted,
and ceph status might display a recovering state during PG count adjustment.

The autoscaler uses the bulk flag to determine which pool should start with a full complement of PGs and only scales down when
the usage ratio across the pool is not even. However, if the pool does not have the bulk flag, the pool starts with minimal PGs and
is scaled up only when there is more usage in the pool.

NOTE: The autoscaler identifies any overlapping roots and prevents the pools with such roots from scaling because overlapping
roots can cause problems with the scaling process.

Procedure

Enable auto-scaling on an existing pool:

Syntax

ceph osd pool set POOL_NAME pg_autoscale_mode on

Example

[ceph: root@host01 /]# ceph osd pool set testpool pg_autoscale_mode on

Enable auto-scaling on a newly created pool:



Syntax

ceph config set global osd_pool_default_pg_autoscale_mode MODE

Example

[ceph: root@host01 /]# ceph config set global osd_pool_default_pg_autoscale_mode on

Create a pool with the bulk flag:

Syntax

ceph osd pool create POOL_NAME --bulk

Example

[ceph: root@host01 /]# ceph osd pool create testpool --bulk

Set or unset the bulk flag for an existing pool:

IMPORTANT: The values must be written as true, false, 1, or 0. 1 is equivalent to true and 0 is equivalent to false. If
written with different capitalization, or with other content, an error is emitted.

The following is an example of the command written with the wrong syntax:

[ceph: root@host01 /]# ceph osd pool set ec_pool_overwrite bulk True
Error EINVAL: expecting value 'true', 'false', '0', or '1'

Syntax

ceph osd pool set POOL_NAME bulk true/false/1/0

Example

[ceph: root@host01 /]# ceph osd pool set testpool bulk true

Get the bulk flag of an existing pool:

Syntax

ceph osd pool get POOL_NAME bulk

Example

[ceph: root@host01 /]# ceph osd pool get testpool bulk
bulk: true

Viewing placement group scaling recommendations


You can view the pool, its relative utilization and any suggested changes to the PG count in the storage cluster.

Prerequisites

A running IBM Storage Ceph cluster

Root-level access to all the nodes.

Procedure

You can view each pool, its relative utilization, and any suggested changes to the PG count using:

[ceph: root@host01 /]# ceph osd pool autoscale-status



Output will look similar to the following:

POOL                   SIZE   TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
device_health_metrics  0                   3.0   374.9G        0.0000                                 1.0   1                   on         False
cephfs.cephfs.meta     24632               3.0   374.9G        0.0000                                 4.0   32                  on         False
cephfs.cephfs.data     0                   3.0   374.9G        0.0000                                 1.0   32                  on         False
.rgw.root              1323                3.0   374.9G        0.0000                                 1.0   32                  on         False
default.rgw.log        3702                3.0   374.9G        0.0000                                 1.0   32                  on         False
default.rgw.control    0                   3.0   374.9G        0.0000                                 1.0   32                  on         False
default.rgw.meta       382                 3.0   374.9G        0.0000                                 4.0   8                   on         False

SIZE is the amount of data stored in the pool.

TARGET SIZE, if present, is the amount of data the administrator has specified they expect to eventually be stored in this pool. The
system uses the larger of the two values for its calculation.

RATE is the multiplier for the pool that determines how much raw storage capacity the pool uses. For example, a 3 replica pool has a
ratio of 3.0, while a k=4,m=2 erasure coded pool has a ratio of 1.5.

RAW CAPACITY is the total amount of raw storage capacity on the OSDs that are responsible for storing the pool’s data.

RATIO is the ratio of the total capacity that the pool is consuming, that is, ratio = size * rate / raw capacity.

TARGET RATIO, if present, is the ratio of storage the administrator has specified that they expect the pool to consume relative to
other pools with target ratios set. If both target size bytes and ratio are specified, the ratio takes precedence. The default value of
TARGET RATIO is 0 unless it was specified while creating the pool. The larger the --target_ratio you give a pool, the more
PGs you are expecting the pool to have.

EFFECTIVE RATIO, is the target ratio after adjusting in two ways: 1. subtracting any capacity expected to be used by pools with
target size set. 2. normalizing the target ratios among pools with target ratio set so they collectively target the rest of the space. For
example, 4 pools with target ratio 1.0 would have an effective ratio of 0.25. The system uses the larger of the actual ratio
and the effective ratio for its calculation.

BIAS is used as a multiplier to manually adjust a pool’s PG count based on prior information about how many PGs a specific pool is
expected to have. By default, the value is 1.0 unless it was specified when creating a pool. The larger the --bias you give a pool, the
more PGs you are expecting the pool to have.

PG_NUM is the current number of PGs for the pool, or the current number of PGs that the pool is working towards, if a pg_num change
is in progress. NEW PG_NUM, if present, is the suggested number of PGs (pg_num). It is always a power of 2, and is only present if the
suggested value varies from the current value by more than a factor of 3.

AUTOSCALE, is the pool pg_autoscale_mode, and is either on, off, or warn.

BULK is used to determine which pool should start out with a full complement of PGs. BULK only scales down when the usage ratio
across the pool is not even. If the pool does not have this flag, the pool starts out with a minimal number of PGs and scales up only when
there is more usage in the pool.

The BULK values are true, false, 1, or 0, where 1 is equivalent to true and 0 is equivalent to false. The default value is false.

Set the BULK value either during or after pool creation.

For more information about using the bulk flag, see Creating a pool and Setting placement group auto-scaling modes.

Setting placement group auto-scaling


Allowing the cluster to automatically scale PGs based on cluster usage is the simplest approach to scaling PGs. IBM Storage Ceph
takes the total available storage and the target number of PGs for the whole system, compares how much data is stored in each pool,

and apportions the PGs accordingly. The command only makes changes to a pool whose current number of PGs (pg_num) is more
than three times off from the calculated or suggested PG number.

The target number of PGs per OSD is based on the mon_target_pg_per_osd configurable. The default value is set to 100.

To adjust mon_target_pg_per_osd:

Syntax

ceph config set global mon_target_pg_per_osd NUMBER

Example

[ceph: root@host01 /]# ceph config set global mon_target_pg_per_osd 150

Setting minimum and maximum number of placement groups for pools

Setting minimum and maximum number of placement groups for pools
Specify the minimum and maximum value of placement groups (PGs) in order to limit the auto-scaling range.

If a minimum value is set, Ceph does not automatically reduce, or recommend to reduce, the number of PGs to a value below the set
minimum value.

If a maximum value is set, Ceph does not automatically increase, or recommend to increase, the number of PGs to a value above the
set maximum value.

The minimum and maximum values can be set together, or separately.

In addition to this procedure, the ceph osd pool create command has two command-line options that can be used to
specify the minimum or maximum PG count at the time of pool creation.

Syntax

ceph osd pool create POOL_NAME --pg-num-min NUMBER

ceph osd pool create POOL_NAME --pg-num-max NUMBER

Example

ceph osd pool create testpool --pg-num-min 50

ceph osd pool create testpool --pg-num-max 150

Prerequisites

A running IBM Storage Ceph cluster

Root-level access to the node

Procedure

Set the minimum number of PGs for a pool.

Syntax

ceph osd pool set POOL_NAME pg_num_min NUMBER

Example

[ceph: root@host01 /]# ceph osd pool set testpool pg_num_min 50



Set the maximum number of PGs for a pool.

Syntax

ceph osd pool set POOL_NAME pg_num_max NUMBER

Example

[ceph: root@host01 /]# ceph osd pool set testpool pg_num_max 150

Resources
For more information, see:

Setting placement group auto-scaling modes

Placement Group count

Updating noautoscale flag

If you want to enable or disable the autoscaler for all the pools at the same time, you can use the noautoscale global flag. This global
flag is useful during an upgrade of the storage cluster when some OSDs are bounced or when the cluster is under maintenance. You
can set the flag before any activity and unset it once the activity is complete.

By default, the noautoscale flag is set to off. When this flag is set, all the pools have pg_autoscale_mode set to off and the
autoscaler is disabled for all pools.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Procedure

1. Get the value of the noautoscale flag:

Example

[ceph: root@host01 /]# ceph osd pool get noautoscale

2. Set the noautoscale flag before any activity:

Example

[ceph: root@host01 /]# ceph osd pool set noautoscale

3. Unset the noautoscale flag on completion of the activity:

Example

[ceph: root@host01 /]# ceph osd pool unset noautoscale

Specifying target pool size

A newly created pool consumes a small fraction of the total cluster capacity and appears to the system that it will need a small
number of PGs. However, in most cases, cluster administrators know which pools are expected to consume most of the system
capacity over time. If you provide this information, known as the target size to IBM Storage Ceph, such pools can use a more
appropriate number of PGs (pg_num) from the beginning. This approach prevents subsequent changes in pg_num and the overhead
associated with moving data around when making those adjustments.

You can specify target size of a pool in these ways:

Specifying target size using the absolute size of the pool

Specifying target size using the total cluster capacity

Specifying target size using the absolute size of the pool


Specifying target size using the total cluster capacity

Specifying target size using the absolute size of the pool



1. Set the target size using the absolute size of the pool in bytes:

Syntax

ceph osd pool set POOL_NAME target_size_bytes VALUE

For example, to instruct the system that mypool is expected to consume 100T of space:

[ceph: root@host01 /]# ceph osd pool set mypool target_size_bytes 100T

You can also set the target size of a pool at creation time by adding the optional --target-size-bytes <bytes> argument to the
ceph osd pool create command.

Specifying target size using the total cluster capacity



1. Set the target size using the ratio of the total cluster capacity:

Syntax

ceph osd pool set POOL_NAME target_size_ratio RATIO

Example

[ceph: root@host01 /]# ceph osd pool set mypool target_size_ratio 1.0

This tells the system that the pool mypool is expected to consume 1.0 relative to the other pools with target_size_ratio set.
If mypool is the only pool in the cluster, this means an expected use of 100% of the total capacity. If there is a second pool
with target_size_ratio as 1.0, both pools would expect to use 50% of the cluster capacity.

You can also set the target size of a pool at creation time by adding the optional --target-size-ratio <ratio> argument to the
ceph osd pool create command.

NOTE If you specify impossible target size values, for example, a capacity larger than the total cluster, or ratios that sum to more
than 1.0, the cluster raises a POOL_TARGET_SIZE_RATIO_OVERCOMMITTED or POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
health warning. If you specify both target_size_ratio and target_size_bytes for a pool, the cluster considers only the ratio, and raises
a POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO health warning.

Placement group command line interface




The ceph CLI allows you to set and get the number of placement groups for a pool, view the PG map and retrieve PG statistics.

Setting number of placement groups in a pool


Getting number of placement groups in a pool
Getting statistics for placement groups
Getting statistics for stuck placement groups
Getting placement group maps
Scrubbing placement groups
Getting a placement group statistics
Marking unfound objects

Setting number of placement groups in a pool


To set the number of placement groups in a pool, you must specify the number of placement groups at the time you create the pool.
See Create a Pool for details. Once you set placement groups for a pool, you can increase or decrease the number of placement
groups. To change the number of placement groups, use the following command:

Syntax

ceph osd pool set POOL_NAME pg_num PG_NUM

Example

[ceph: root@host01 /]# ceph osd pool set pool1 pg_num 60
set pool 2 pg_num to 60

Once you increase or decrease the number of placement groups, you must also adjust the number of placement groups for
placement (pgp_num) before your cluster rebalances. The pgp_num should be equal to the pg_num. To increase the number of
placement groups for placement, execute the following:

Syntax

ceph osd pool set POOL_NAME pgp_num PGP_NUM

Example

[ceph: root@host01 /]# ceph osd pool set pool1 pgp_num 60
set pool 2 pgp_num to 60

Getting number of placement groups in a pool


To get the number of placement groups in a pool, run the following:

Syntax

ceph osd pool get POOL_NAME pg_num

Example

[ceph: root@host01 /]# ceph osd pool get testpool pg_num
pg_num: 60

Getting statistics for placement groups


To get the statistics for the placement groups in your storage cluster, execute the following:

Syntax



ceph pg dump [--format FORMAT]

Valid formats are plain (default) and json.
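For example, to dump the placement group statistics as JSON so that they can be parsed by a script:

[ceph: root@host01 /]# ceph pg dump --format json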

Getting statistics for stuck placement groups


To get the statistics for all placement groups stuck in a specified state, execute the following:

Syntax

ceph pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} INTEGER

Inactive Placement groups cannot process reads or writes because they are waiting for an OSD with the most up-to-date data to
come up and in.

Unclean Placement groups contain objects that are not replicated the desired number of times. They should be recovering.

Stale Placement groups are in an unknown state - the OSDs that host them have not reported to the monitor cluster in a while
(configured by mon_osd_report_timeout).

Valid formats are plain (default) and json. The threshold defines the minimum number of seconds the placement group is stuck
before including it in the returned statistics (default 300 seconds).
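For example, to list placement groups that have been stuck in the stale state for at least the default 300 seconds (the threshold shown is only illustrative):

[ceph: root@host01 /]# ceph pg dump_stuck stale 300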

Getting placement group maps


To get the placement group map for a particular placement group, execute the following:

Syntax

ceph pg map PG_ID

Example

[ceph: root@host01 /]# ceph pg map 1.6c
osdmap e13 pg 1.6c (1.6c) -> up [1,0] acting [1,0]

Ceph returns the placement group map, the placement group, and the OSD status.

Scrubbing placement groups


To scrub a placement group, execute the following:

Syntax

ceph pg scrub PG_ID

Ceph checks the primary and any replica nodes, generates a catalog of all objects in the placement group and compares them to
ensure that no objects are missing or mismatched, and their contents are consistent. Assuming the replicas all match, a final
semantic sweep ensures that all of the snapshot-related object metadata is consistent. Errors are reported via logs.
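For example, to scrub the placement group used in the earlier mapping example:

[ceph: root@host01 /]# ceph pg scrub 1.6c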

Getting a placement group statistics




Retrieve statistics for a particular placement group:

Syntax

ceph pg PG_ID query
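For example, to query the placement group used in the earlier examples (the command returns a lengthy JSON document, which is omitted here):

[ceph: root@host01 /]# ceph pg 1.6c query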

Marking unfound objects

If the cluster has lost one or more objects, and you have decided to abandon the search for the lost data, you must mark the unfound
objects as lost.

If all possible locations have been queried and objects are still lost, you might have to give up on the lost objects. This is possible
given unusual combinations of failures that allow the cluster to learn about writes that were performed before the writes themselves
are recovered.

Currently the only supported option is "revert", which will either roll back to a previous version of the object or (if it was a new object)
forget about it entirely. To mark the "unfound" objects as "lost", execute the following:

Syntax

ceph pg PG_ID mark_unfound_lost revert|delete

IMPORTANT: Use this feature with caution, because it might confuse applications that expect the object(s) to exist.
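For example, to revert the unfound objects in a hypothetical placement group 2.5 (the PG ID is a placeholder only):

[ceph: root@host01 /]# ceph pg 2.5 mark_unfound_lost revert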

Pools overview
Ceph clients store data in pools. When you create pools, you are creating an I/O interface for clients to store data. From the
perspective of a Ceph client (that is, block device, gateway, and the rest), interacting with the Ceph storage cluster is remarkably
simple: create a cluster handle and connect to the cluster; then, create an I/O context for reading and writing objects and their
extended attributes.

Create a Cluster Handle and Connect to the Cluster

To connect to the Ceph storage cluster, the Ceph client needs the cluster name (usually ceph by default) and an initial monitor
address. Ceph clients usually retrieve these parameters using the default path for the Ceph configuration file and then read it from
the file, but a user might also specify the parameters on the command line too. The Ceph client also provides a user name and secret
key (authentication is on by default). Then, the client contacts the Ceph monitor cluster and retrieves a recent copy of the cluster
map, including its monitors, OSDs and pools.

Figure 1. Create Handle

Create a Pool I/O Context

To read and write data, the Ceph client creates an i/o context to a specific pool in the Ceph storage cluster. If the specified user has
permissions for the pool, the Ceph client can read from and write to the specified pool.



Figure 2. I/O Context

Ceph’s architecture enables the storage cluster to provide this remarkably simple interface to Ceph clients so that clients might
select one of the sophisticated storage strategies you define simply by specifying a pool name and creating an I/O context. Storage
strategies are invisible to the Ceph client in all but capacity and performance. Similarly, the complexities of Ceph clients (mapping
objects into a block device representation, providing an S3/Swift RESTful service) are invisible to the Ceph storage cluster.

A pool provides you with:

Resilience: You can set how many OSDs are allowed to fail without losing data. For replicated pools, it is the desired number of
copies/replicas of an object. A typical configuration stores an object and one additional copy (that is, size = 2), but you can
determine the number of copies/replicas. For erasure coded pools, it is the number of coding chunks (that is, m=2 in the
erasure code profile).

Placement Groups: You can set the number of placement groups for the pool. A typical configuration uses approximately 50-
100 placement groups per OSD to provide optimal balancing without using up too many computing resources. When setting
up multiple pools, be careful to ensure you set a reasonable number of placement groups for both the pool and the cluster as
a whole.

CRUSH Rules: When you store data in a pool, a CRUSH rule mapped to the pool enables CRUSH to identify the rule for the
placement of each object and its replicas or chunks for erasure coded pools in your cluster. You can create a custom CRUSH
rule for your pool.

Snapshots: When you create snapshots with ceph osd pool mksnap, you effectively take a snapshot of a particular pool.

Quotas: When you set quotas on a pool with ceph osd pool set-quota you might limit the maximum number of objects
or the maximum number of bytes stored in the specified pool.

Pools and storage strategies overview


Listing pool
Creating a pool
Setting pool quota
Deleting a pool
Renaming a pool
Viewing pool statistics
Setting pool values
Getting pool values
Enabling a client application
Disabling a client application
Setting application metadata
Removing application metadata
Setting the number of object replicas
Getting the number of object replicas
Pool values

Pools and storage strategies overview

To manage pools, you can list, create, and remove pools. You can also view the utilization statistics for each pool.

Listing pool
To list your cluster’s pools, execute:

ceph osd lspools

Creating a pool
Before creating pools, see the Pool, PG and CRUSH Configuration Reference.

NOTE The system administrators must expressly enable a pool to receive I/O operations from Ceph clients. See Enabling a client
application for details. Failure to enable a pool will result in a HEALTH_WARN status.

It is better to adjust the default value for the number of placement groups in the Ceph configuration file, as the default value might
not suit your needs.

Example

osd pool default pg num = 100
osd pool default pgp num = 100

To create a replicated pool, execute:

Syntax

ceph osd pool create POOL_NAME PG_NUM PGP_NUM [replicated] [CRUSH_RULE_NAME] [EXPECTED_NUMBER_OBJECTS]

To create a bulk pool, run:

Syntax

ceph osd pool create POOL_NAME --bulk

To create an erasure-coded pool, execute:

Syntax

ceph osd pool create POOL_NAME PG_NUM PGP_NUM erasure \
     [ERASURE_CODE_PROFILE] [CRUSH_RULE_NAME] [EXPECTED_NUMBER_OBJECTS]
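
For example, the following command creates an erasure-coded pool with the default erasure code profile. The pool name and placement group counts are illustrative only.

Example

[ceph: root@host01 /]# ceph osd pool create ecpool 32 32 erasure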

Where:

POOL_NAME
Description
The name of the pool. It must be unique.

Type
String

Required
Yes. If not specified, it is set to the value listed in the Ceph configuration file or to the default value.

Default
ceph

PG_NUM
Description
The total number of placement groups for the pool. See the Placement Groups section and the Ceph Placement Groups (PGs) per
Pool Calculator for details on calculating a suitable number. The default value 8 is not suitable for most systems.



Type
Integer

Required
Yes

Default
8

PGP_NUM
Description
The total number of placement groups for placement purposes. This value must be equal to the total number of placement groups,
except for placement group splitting scenarios.

Type
Integer

Required
Yes. If not specified it is set to the value listed in the Ceph configuration file or to the default value.

Default
8

replicated or erasure
Description
The pool type can be either replicated to recover from lost OSDs by keeping multiple copies of the objects or erasure to get a
kind of generalized RAID5 capability. The replicated pools require more raw storage but implement all Ceph operations. The erasure-
coded pools require less raw storage but only implement a subset of the available operations.

Type
String

Required
No

Default
replicated

crush-rule-name
Description
The name of the crush rule for the pool. The rule MUST exist. For replicated pools, the name is the rule specified by the
osd_pool_default_crush_rule configuration setting. For erasure-coded pools the name is erasure-code if you specify the
default erasure code profile or POOL_NAME otherwise. Ceph creates this rule with the specified name implicitly if the rule doesn’t
already exist.

Type
String

Required
No

Default
Uses erasure-code for an erasure-coded pool. For replicated pools, it uses the value of the osd_pool_default_crush_rule
variable from the Ceph configuration.

expected-num-objects
Description
The expected number of objects for the pool. By setting this value together with a negative filestore_merge_threshold
variable, Ceph splits the placement groups at pool creation time to avoid the latency impact to perform runtime directory splitting.

Type
Integer

Required
No

Default
0, no splitting at the pool creation time



erasure-code-profile
Description
For erasure-coded pools only. Use the erasure code profile. It must be an existing profile as defined with the osd erasure-code-profile set command. For further information, see the Erasure Code Profiles section.

Type
String

Required
No

When you create a pool, set the number of placement groups to a reasonable value (for example to 100). Consider the total number
of placement groups per OSD too. Placement groups are computationally expensive, so performance will degrade when you have
many pools with many placement groups, for example, 50 pools with 100 placement groups each. The point of diminishing returns
depends upon the power of the OSD host.

See the Placement Groups section and Ceph Placement Groups (PGs) per Pool Calculator for details on calculating an appropriate
number of placement groups for your pool.

Setting pool quota


You can set pool quotas for the maximum number of bytes or the maximum number of objects per pool or for both.

Syntax

ceph osd pool set-quota POOL_NAME [max_objects OBJECT_COUNT] [max_bytes BYTES]

Example

[ceph: root@host01 /]# ceph osd pool set-quota data max_objects 10000

To remove a quota, set its value to 0.
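
For example, assuming the same hypothetical data pool, the following commands set a 10 GB byte quota and then remove the object quota by setting it back to 0.

Example

[ceph: root@host01 /]# ceph osd pool set-quota data max_bytes 10737418240
[ceph: root@host01 /]# ceph osd pool set-quota data max_objects 0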

NOTE: In-flight write operations might overrun pool quotas for a short time until Ceph propagates the pool usage across the cluster.
This is normal behavior. Enforcing pool quotas on in-flight write operations would impose significant performance penalties.

Deleting a pool
To delete a pool, execute:

Syntax

ceph osd pool delete POOL_NAME [POOL_NAME --yes-i-really-really-mean-it]

IMPORTANT: To protect data, storage administrators cannot delete pools by default. Set the mon_allow_pool_delete
configuration option to true before deleting pools.
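
For example, assuming a test pool named pool1 that is safe to remove, the sequence might look like the following.

Example

[ceph: root@host01 /]# ceph config set mon mon_allow_pool_delete true
[ceph: root@host01 /]# ceph osd pool delete pool1 pool1 --yes-i-really-really-mean-it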

If a pool has its own rule, consider removing it after deleting the pool. If a pool has users strictly for its own use, consider deleting
those users after deleting the pool.

Renaming a pool
To rename a pool, execute:

Syntax

ceph osd pool rename CURRENT_POOL_NAME NEW_POOL_NAME
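
For example, the following command renames a hypothetical pool named data to datastore.

Example

[ceph: root@host01 /]# ceph osd pool rename data datastore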



If you rename a pool and you have per-pool capabilities for an authenticated user, you must update the user’s capabilities (that is,
caps) with the new pool name.

Viewing pool statistics


To show a pool’s utilization statistics, run the following command:

Syntax

rados df

Setting pool values


To set a value to a pool, execute the following command:

Syntax

ceph osd pool set POOL_NAME KEY VALUE
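
For example, the following command raises the placement group count of a hypothetical data pool; the key and value shown are illustrative only.

Example

[ceph: root@host01 /]# ceph osd pool set data pg_num 128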

The Pool Values section lists all key-value pairs that you can set.

Getting pool values


To get a value from a pool, execute the following command:

Syntax

ceph osd pool get POOL_NAME KEY
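
For example, the following command retrieves the replica count of a hypothetical data pool; the output shown is illustrative.

Example

[ceph: root@host01 /]# ceph osd pool get data size
size: 3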

The Pool Values section lists all key-value pairs that you can get.

Enabling a client application


IBM Storage Ceph provides additional protection for pools to prevent unauthorized types of clients from writing data to the pool. This
means that system administrators must expressly enable pools to receive I/O operations from Ceph Block Device, Ceph Object
Gateway, Ceph Filesystem or for a custom application.

To enable a client application to conduct I/O operations on a pool, execute the following:

Syntax

ceph osd pool application enable POOL_NAME APP {--yes-i-really-mean-it}

Where APP is:

cephfs for the Ceph Filesystem.

rbd for the Ceph Block Device

rgw for the Ceph Object Gateway

NOTE: Specify a different APP value for a custom application.
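
For example, the following command enables the Ceph Block Device application on a hypothetical pool named rbd_pool.

Example

[ceph: root@host01 /]# ceph osd pool application enable rbd_pool rbd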



IMPORTANT: A pool that is not enabled will generate a HEALTH_WARN status.

In that scenario, the ceph health detail -f json-pretty command gives the following output:

{
"checks": {
"POOL_APP_NOT_ENABLED": {
"severity": "HEALTH_WARN",
"summary": {
"message": "application not enabled on 1 pool(s)"
},
"detail": [
{
"message": "application not enabled on pool 'POOL_NAME'"
},
{
"message": "use 'ceph osd pool application enable POOL_NAME APP', where APP is
'cephfs', 'rbd', 'rgw', or freeform for custom applications."
}
]
}
},
"status": "HEALTH_WARN",
"overall_status": "HEALTH_WARN",
"detail": [
"'ceph health' JSON format has changed in luminous. If you see this your monitoring system
is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false'"
]
}

NOTE: Initialize pools for the Ceph Block Device with rbd pool init POOL_NAME.

Disabling a client application


To disable a client application from conducting I/O operations on a pool, execute the following:

Syntax

ceph osd pool application disable POOL_NAME APP {--yes-i-really-mean-it}

Where APP is:

cephfs for the Ceph Filesystem.

rbd for the Ceph Block Device

rgw for the Ceph Object Gateway

NOTE: Specify a different APP value for a custom application.

Setting application metadata


Provides the functionality to set key-value pairs describing attributes of the client application.

To set client application metadata on a pool, execute the following:

Syntax

ceph osd pool application set POOL_NAME APP KEY VALUE

Where APP is:

cephfs for the Ceph Filesystem.



rbd for the Ceph Block Device

rgw for the Ceph Object Gateway

NOTE: Specify a different APP value for a custom application.
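
For example, the following command sets an arbitrary key-value pair on a hypothetical data pool used by the Ceph Block Device application; the key and value shown are illustrative only.

Example

[ceph: root@host01 /]# ceph osd pool application set data rbd owner vm-team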

Removing application metadata


To remove client application metadata on a pool, execute the following:

Syntax

ceph osd pool application rm POOL_NAME APP KEY

Where APP is:

cephfs for the Ceph Filesystem.

rbd for the Ceph Block Device

rgw for the Ceph Object Gateway

NOTE: Specify a different APP value for a custom application.

Setting the number of object replicas


To set the number of object replicas on a replicated pool, execute the following command:

Syntax

ceph osd pool set POOL_NAME size NUMBER_OF_REPLICAS

IMPORTANT: The NUMBER_OF_REPLICAS parameter includes the object itself. If you want to include the object and two copies of
the object for a total of three instances of the object, specify 3.

Example

[ceph: root@host01 /]# ceph osd pool set data size 3

You can execute this command for each pool.

NOTE: An object might accept I/O operations in degraded mode with fewer replicas than specified by the pool size setting. To set
a minimum number of required replicas for I/O, use the min_size setting.

Example

ceph osd pool set data min_size 2

This ensures that no object in the data pool will receive I/O with fewer replicas than specified by the min_size setting.

Getting the number of object replicas


To get the number of object replicas, execute the following command:

ceph osd dump | grep 'replicated size'

Ceph will list the pools, with the replicated size attribute highlighted. By default, Ceph creates two replicas of an object, that is
a total of three copies, or a size of 3.
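
For illustration, the output for a hypothetical data pool might resemble the following line; the exact fields vary by release and pool configuration.

Example

[ceph: root@host01 /]# ceph osd dump | grep 'replicated size'
pool 2 'data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 42 flags hashpspool stripe_width 0 application rbd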



Pool values
The following list contains key-value pairs that you can set or get. For further information, see the Setting pool values and Getting
pool values sections.

size

Description
Specifies the number of replicas for objects in the pool. See the Setting the number of object replicas section for further details.
Applicable for the replicated pools only.

Type
Integer

min_size

Description
Specifies the minimum number of replicas required for I/O. See the Setting the number of object replicas section for further
details. For erasure-coded pools, this should be set to a value greater than k. If I/O is allowed at the value k, then there is no
redundancy and data is lost in the event of a permanent OSD failure. For more information, see Erasure code pools overview.

Type
Integer

crash_replay_interval

Description
Specifies the number of seconds to allow clients to replay acknowledged, but uncommitted requests.

Type
Integer

pg-num

Description
The total number of placement groups for the pool. See the Pools, placement groups, and CRUSH configuration section
for details on calculating a suitable number. The default value 8 is not suitable for most systems.

Type
Integer

Required
Yes.

Default
8

pgp-num

Description
The total number of placement groups for placement purposes. This should be equal to the total number of placement groups,
except for placement group splitting scenarios.

Type
Integer

Required
Yes. Picks up default or Ceph configuration value if not specified.

Default
8

Valid Range
Equal to or less than the value specified by the pg_num variable.

crush_rule



Description
The rule to use for mapping object placement in the cluster.

Type
String

hashpspool

Description
Enable or disable the HASHPSPOOL flag on a given pool. With this option enabled, pool hashing and placement group mapping are
changed to improve the way pools and placement groups overlap.

Type
Integer

Valid Range
1 enables the flag, 0 disables the flag.

IMPORTANT: Do not enable this option on production pools of a cluster with a large number of OSDs and data. All placement groups
in the pool would have to be remapped, causing too much data movement.

fast_read

Description
On a pool that uses erasure coding, if this flag is enabled, the read request issues subsequent reads to all shards, and waits until it
receives enough shards to decode to serve the client. In the case of the jerasure and isa erasure plug-ins, once the first K
replies return, the client’s request is served immediately using the data decoded from these replies. This helps to allocate some
resources for better performance. Currently this flag is only supported for erasure coding pools.

Type
Boolean

Defaults
0

allow_ec_overwrites

Description
Whether writes to an erasure coded pool can update part of an object, so the Ceph Filesystem and Ceph Block Device can use it.

Type
Boolean

compression_algorithm

Description
Sets inline compression algorithm to use with the BlueStore storage backend. This setting overrides the
bluestore_compression_algorithm configuration setting.

Type
String

Valid Settings
lz4, snappy, zlib, zstd

compression_mode

Description
Sets the policy for the inline compression algorithm for the BlueStore storage backend. This setting overrides the
bluestore_compression_mode configuration setting.

Type
String

Valid Settings
none, passive, aggressive, force

compression_min_blob_size



Description
BlueStore will not compress chunks smaller than this size. This setting overrides the bluestore_compression_min_blob_size
configuration setting.

Type
Unsigned Integer

compression_max_blob_size

Description
BlueStore will break chunks larger than this size into smaller blobs of compression_max_blob_size before compressing the data.

Type
Unsigned Integer

nodelete

Description
Set or unset the NODELETE flag on a given pool.

Type
Integer

Valid Range
1 sets flag. 0 unsets flag.

nopgchange

Description
Set or unset the NOPGCHANGE flag on a given pool.

Type
Integer

Valid Range
1 sets the flag. 0 unsets the flag.

nosizechange

Description
Set or unset the NOSIZECHANGE flag on a given pool.

Type
Integer

Valid Range
1 sets the flag. 0 unsets the flag.

write_fadvise_dontneed

Description
Set or unset the WRITE_FADVISE_DONTNEED flag on a given pool.

Type
Integer

Valid Range
1 sets the flag. 0 unsets the flag.

noscrub

Description
Set or unset the NOSCRUB flag on a given pool.

Type
Integer

Valid Range
1 sets the flag. 0 unsets the flag.

nodeep-scrub



Description
Set or unset the NODEEP_SCRUB flag on a given pool.

Type
Integer

Valid Range
1 sets the flag. 0 unsets the flag.

scrub_min_interval

Description
The minimum interval in seconds for pool scrubbing when load is low. If it is 0, Ceph uses the osd_scrub_min_interval
configuration setting.

Type
Double

Default
0

scrub_max_interval

Description
The maximum interval in seconds for pool scrubbing irrespective of cluster load. If it is 0, Ceph uses the
osd_scrub_max_interval configuration setting.

Type
Double

Default
0

deep_scrub_interval

Description
The interval in seconds for pool deep scrubbing. If it is 0, Ceph uses the osd_deep_scrub_interval configuration setting.

Type
Double

Default
0

Erasure code pools overview


Ceph storage strategies involve defining data durability requirements. Data durability means the ability to sustain the loss of one or
more OSDs without losing data.

Ceph stores data in pools and there are two types of the pools:

replicated

erasure-coded

Ceph uses replicated pools by default, meaning that Ceph copies every object from a primary OSD node to one or more secondary
OSDs.

Erasure-coded pools reduce the amount of disk space required to ensure data durability, but erasure coding is computationally
somewhat more expensive than replication.

Erasure coding is a method of storing an object in the Ceph storage cluster durably where the erasure code algorithm breaks the
object into data chunks (k) and coding chunks (m), and stores those chunks in different OSDs.

In the event of the failure of an OSD, Ceph retrieves the remaining data (k) and coding (m) chunks from the other OSDs and the
erasure code algorithm restores the object from those chunks.



NOTE: IBM recommends min_size for erasure-coded pools to be K+1 or more to prevent loss of writes and data.

Erasure coding uses storage capacity more efficiently than replication. The n-replication approach maintains n copies of an object
(3x by default in Ceph), whereas erasure coding maintains only k + m chunks. For example, 3 data and 2 coding chunks use
approximately 1.67x the storage space of the original object.

While erasure coding uses less storage overhead than replication, the erasure code algorithm uses more RAM and CPU than
replication when it accesses or recovers objects. Erasure coding is advantageous when data storage must be durable and fault
tolerant, but do not require fast read performance (for example, cold storage, historical records, and so on).

For the mathematical and detailed explanation on how erasure code works in Ceph, see the Ceph Erasure Coding.

Ceph creates a default erasure code profile when initializing a cluster with k=2 and m=2. This means that Ceph spreads the object
data over four OSDs (k+m == 4) and can lose up to two of those OSDs without losing data. To know more about erasure code
profiling, see the Erasure Code Profiles section.

IMPORTANT: Configure only the .rgw.buckets pool as erasure-coded and all other Ceph Object Gateway pools as replicated,
otherwise an attempt to create a new bucket fails with the following error:

set_req_state_err err_no=95 resorting to 500

The reason for this is that erasure-coded pools do not support the omap operations and certain Ceph Object Gateway metadata
pools require the omap support.

Creating a sample erasure-coded pool


Erasure code profiles
Erasure Coding with Overwrites
Erasure Code Plugins

Creating a sample erasure-coded pool


The simplest erasure-coded pool uses the default profile. With the default k=2 and m=2 values and a failure domain of host, it requires at least four hosts:

Example

$ ceph osd pool create ecpool 32 32 erasure
pool 'ecpool' created
$ echo ABCDEFGHI | rados --pool ecpool put NYAN -
$ rados --pool ecpool get NYAN -
ABCDEFGHI

NOTE: The 32 in pool create stands for the number of placement groups.

Erasure code profiles


Ceph defines an erasure-coded pool with a profile. Ceph uses a profile when creating an erasure-coded pool and the associated
CRUSH rule.

Ceph creates a default erasure code profile when initializing a cluster. The default profile defines k=2 and m=2, meaning Ceph
spreads the object data over four OSDs (k+m=4) and can lose up to two of those OSDs without losing data.

The default erasure code profile can therefore sustain the loss of two OSDs. It provides durability comparable to a replicated pool
of size three, but requires 2 TB instead of 3 TB of raw capacity to store 1 TB of data. To display the default profile use the following command:

$ ceph osd erasure-code-profile get default
k=2
m=2
plugin=jerasure
technique=reed_sol_van



You can create a new profile to improve redundancy without increasing raw storage requirements. For instance, a profile with k=8
and m=4 can sustain the loss of four (m=4) OSDs by distributing an object on 12 (k+m=12) OSDs. Ceph divides the object into 8
chunks and computes 4 coding chunks for recovery. For example, if the object size is 8 MB, each data chunk is 1 MB and each coding
chunk has the same size as the data chunk, that is also 1 MB. The object will not be lost even if four OSDs fail simultaneously.

The most important parameters of the profile are k, m and crush-failure-domain, because they define the storage overhead and the
data durability.

IMPORTANT: Choosing the correct profile is important because you cannot change the profile after you create the pool. To modify a
profile, you must create a new pool with a different profile and migrate the objects from the old pool to the new pool.

For instance, if the desired architecture must sustain the loss of two racks with a storage overhead of 40%, the following
profile can be defined:

$ ceph osd erasure-code-profile set myprofile \
     k=4 \
     m=2 \
     crush-failure-domain=rack
$ ceph osd pool create ecpool 12 12 erasure myprofile
$ echo ABCDEFGHIJKL | rados --pool ecpool put NYAN -
$ rados --pool ecpool get NYAN -
ABCDEFGHIJKL

The primary OSD will divide the NYAN object into four (k=4) data chunks and create two additional chunks (m=2). The value of m
defines how many OSDs can be lost simultaneously without losing any data. The crush-failure-domain=rack will create a CRUSH rule
that ensures no two chunks are stored in the same rack.

Figure 1. Erasure code

IMPORTANT: IBM supports the following jerasure coding values for k and m:

k=8 m=3

k=8 m=4

k=4 m=2

IMPORTANT: If the number of OSDs lost equals the number of coding chunks (m), some placement groups in the erasure coding pool
will go into incomplete state. If the number of OSDs lost is less than m, no placement groups will go into incomplete state. In either
situation, no data loss will occur. If placement groups are in incomplete state, temporarily reducing min_size of an erasure coded
pool will allow recovery.

Setting OSD erasure-code-profile



Removing OSD erasure-code-profile
Getting OSD erasure-code-profile
Listing OSD erasure-code-profile

Setting OSD erasure-code-profile

To create a new erasure code profile:

Syntax

ceph osd erasure-code-profile set NAME \
     [<directory=DIRECTORY>] \
     [<plugin=PLUGIN>] \
     [<stripe_unit=STRIPE_UNIT>] \
     [<CRUSH_DEVICE_CLASS>] \
     [<CRUSH_FAILURE_DOMAIN>] \
     [<key=value> ...] \
     [--force]

Where:

directory
Description
Set the directory name from which the erasure code plug-in is loaded.

Type
String

Required
No.

Default
/usr/lib/ceph/erasure-code

plugin
Description
Use the erasure code plug-in to compute coding chunks and recover missing details. See the Erasure Code Plug-ins section for
details.

Type
String

Required
No.

Default
jerasure

stripe_unit
Description
The amount of data in a data chunk, per stripe. For example, a profile with 2 data chunks and stripe_unit=4K would put the range
0-4K in chunk 0, 4K-8K in chunk 1, then 8K-12K in chunk 0 again. This should be a multiple of 4K for best performance. The default
value is taken from the monitor config option osd_pool_erasure_code_stripe_unit when a pool is created. The stripe_width
of a pool using this profile will be the number of data chunks multiplied by this stripe_unit.

Type
String

Required
No.

Default
4K

crush-device-class
Description



The device class, such as hdd or ssd.

Type
String

Required
No

Default
none, meaning CRUSH uses all devices regardless of class.

crush-failure-domain
Description
The failure domain, such as host or rack.

Type
String

Required
No

Default
host

key
Description
The semantic of the remaining key-value pairs is defined by the erasure code plug-in.

Type
String

Required
No.

--force
Description
Override an existing profile by the same name.

Type
String

Required
No.
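
For example, the following command creates a profile with four data chunks and two coding chunks on HDD-class devices, separated by host. The profile name and values shown are illustrative only.

Example

[ceph: root@host01 /]# ceph osd erasure-code-profile set ecprofile-4-2 k=4 m=2 crush-failure-domain=host crush-device-class=hdd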

Removing OSD erasure-code-profile

To remove an erasure code profile:

Syntax

ceph osd erasure-code-profile rm NAME

If the profile is referenced by a pool, the deletion will fail.
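
For example, the following command removes a hypothetical profile named ecprofile-4-2.

Example

[ceph: root@host01 /]# ceph osd erasure-code-profile rm ecprofile-4-2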

Getting OSD erasure-code-profile

To display an erasure code profile:

Syntax

ceph osd erasure-code-profile get NAME



Listing OSD erasure-code-profile

To list the names of all erasure code profiles:

Syntax

ceph osd erasure-code-profile ls
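
For illustration, the output might resemble the following; any profile names other than default are hypothetical.

Example

[ceph: root@host01 /]# ceph osd erasure-code-profile ls
default
ecprofile-4-2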

Erasure Coding with Overwrites


By default, erasure coded pools only work with the Ceph Object Gateway, which performs full object writes and appends.

Using erasure coded pools with overwrites allows Ceph Block Devices and CephFS store their data in an erasure coded pool:

Syntax

ceph osd pool set ERASURE_CODED_POOL_NAME allow_ec_overwrites true

Example

[ceph: root@host01 /]# ceph osd pool set ec_pool allow_ec_overwrites true

Erasure coded pools with overwrites can be used only with BlueStore OSDs, because BlueStore checksumming is used to detect bit
rot or other corruption during deep scrubs. Using FileStore with erasure coded overwrites is unsafe and yields lower performance
when compared to BlueStore.

Erasure coded pools do not support omap. To use erasure coded pools with Ceph Block Devices and CephFS, store the data in an
erasure coded pool, and the metadata in a replicated pool.

For Ceph Block Devices, use the --data-pool option during image creation:

Syntax

rbd create --size IMAGE_SIZE_M|G|T --data-pool ERASURE_CODED_POOL_NAME REPLICATED_POOL_NAME/IMAGE_NAME

Example

[ceph: root@host01 /]# rbd create --size 1G --data-pool ec_pool rep_pool/image01

If you use erasure coded pools for CephFS, you designate the erasure coded pool in a file layout, as shown in the sketch below.
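
The following is a minimal sketch, assuming a file system named cephfs mounted at /mnt/cephfs and an erasure coded pool named cephfs_ec_data; the pool name, file system name, and target directory are assumptions, not defaults.

Example

[ceph: root@host01 /]# ceph osd pool set cephfs_ec_data allow_ec_overwrites true
[ceph: root@host01 /]# ceph fs add_data_pool cephfs cephfs_ec_data
[root@client ~]# setfattr -n ceph.dir.layout.pool -v cephfs_ec_data /mnt/cephfs/archive

New files created under that directory are then written to the erasure coded data pool, while the file system metadata remains in the replicated metadata pool.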

Erasure Code Plugins


Ceph supports erasure coding with a plug-in architecture, which means you can create erasure coded pools using different types of
algorithms. Ceph supports:

Jerasure (Default)

Creating a new erasure code profile using jerasure erasure code plugin
Controlling CRUSH Placement

Creating a new erasure code profile using jerasure erasure code plugin



The jerasure plug-in is the most generic and flexible plug-in. It is also the default for Ceph erasure coded pools.

The jerasure plug-in encapsulates the Jerasure library. For detailed information about the parameters, see the jerasure
documentation.

To create a new erasure code profile using the jerasure plug-in, run the following command:

Syntax

ceph osd erasure-code-profile set NAME \
     plugin=jerasure \
     k=DATA_CHUNKS \
     m=CODING_CHUNKS \
     technique=TECHNIQUE \
     [crush-root=ROOT] \
     [crush-failure-domain=BUCKET_TYPE] \
     [directory=DIRECTORY] \
     [--force]

Where:

k
Description
Each object is split into k data chunks, each of which is stored on a different OSD.

Type
Integer

Required
Yes.

Example
4

m
Description
Compute coding chunks for each object and store them on different OSDs. The number of coding chunks is also the number of OSDs
that can be down without losing data.

Type
Integer

Required
Yes.

Example
2

technique
Description
The more flexible technique is reed_sol_van; it is enough to set k and m. The cauchy_good technique can be faster but you need to
choose the packetsize carefully. All of reed_sol_r6_op, liberation, blaum_roth, liber8tion are RAID6 equivalents in the sense that they
can only be configured with m=2.

Type
String

Required
No.

Valid Settings
reed_sol_van reed_sol_r6_op cauchy_orig cauchy_good liberation blaum_roth liber8tion

Default
reed_sol_van

packetsize
Description
The encoding is done on packets of packetsize bytes at a time. Choosing the correct packet size is difficult. The jerasure documentation
contains extensive information on this topic.



Type
Integer

Required
No.

Default
2048

crush-root
Description
The name of the CRUSH bucket used for the first step of the rule. For instance step take default.

Type
String

Required
No.

Default
default

crush-failure-domain
Description
Ensure that no two chunks are in a bucket with the same failure domain. For instance, if the failure domain is host no two chunks will
be stored on the same host. It is used to create a rule step such as step chooseleaf host.

Type
String

Required
No.

Default
host

directory
Description
Set the directory name from which the erasure code plug-in is loaded.

Type
String

Required
No.

Default
/usr/lib/ceph/erasure-code

--force
Description
Override an existing profile by the same name.

Type
String

Required
No.
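
For example, the following command creates a profile with the jerasure plug-in using one of the supported k and m combinations; the profile name is illustrative only.

Example

[ceph: root@host01 /]# ceph osd erasure-code-profile set jerasure-4-2 plugin=jerasure k=4 m=2 technique=reed_sol_van crush-failure-domain=host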

Controlling CRUSH Placement


The default CRUSH rule provides OSDs that are on different hosts. For instance:

chunk nr    01234567
step 1      _cDD_cDD
step 2      cDDD____
step 3      ____cDDD

needs exactly 8 OSDs, one for each chunk. If the hosts are in two adjacent racks, the first four chunks can be placed in the first rack
and the last four in the second rack. Recovering from the loss of a single OSD does not require using bandwidth between the two
racks.

For instance:

crush-steps='[ [ "choose", "rack", 2 ], [ "chooseleaf", "host", 4 ] ]'

creates a rule that selects two crush buckets of type rack and for each of them choose four OSDs, each of them located in a different
bucket of type host.

The rule can also be created manually for finer control.

Installing
This information provides instructions on installing IBM Storage Ceph on Red Hat Enterprise Linux running on AMD64 and Intel 64
architectures.

IBM Storage Ceph


IBM Storage Ceph considerations and recommendations
IBM Storage Ceph installation
Managing an IBM Storage Ceph cluster using cephadm-ansible modules
Comparison between Ceph Ansible and Cephadm
cephadm commands
What to do next? Day 2

IBM Storage Ceph


IBM Storage Ceph is a scalable, open, software-defined storage platform that combines an enterprise-hardened version of the Ceph
storage system, with a Ceph management platform, deployment utilities, and support services.

IBM Storage Ceph is designed for cloud infrastructure and web-scale object storage. IBM Storage Ceph clusters consist of the
following types of nodes:

Ceph Monitor

Each Ceph Monitor node runs the ceph-mon daemon, which maintains a master copy of the storage cluster map. The storage cluster
map includes the storage cluster topology. A client connecting to the Ceph storage cluster retrieves the current copy of the storage
cluster map from the Ceph Monitor, which enables the client to read from and write data to the storage cluster.

IMPORTANT: The storage cluster can run with only one Ceph Monitor; however, to ensure high availability in a production storage
cluster, IBM supports deployments with at least three Ceph Monitor nodes. Deploy a total of 5 Ceph Monitors for storage clusters
exceeding 750 Ceph OSDs.

Ceph Manager

The Ceph Manager daemon, ceph-mgr, co-exists with the Ceph Monitor daemons running on Ceph Monitor nodes to provide
additional services. The Ceph Manager provides an interface for other monitoring and management systems using Ceph Manager
modules. Running the Ceph Manager daemons is a requirement for normal storage cluster operations.

Ceph OSD

Each Ceph Object Storage Device (OSD) node runs the ceph-osd daemon, which interacts with logical disks attached to the node.
The storage cluster stores data on these Ceph OSD nodes.

Ceph can run with very few OSD nodes, of which the default is three, but production storage clusters realize better performance
beginning at modest scales. For example, 50 Ceph OSDs in a storage cluster. Ideally, a Ceph storage cluster has multiple OSD nodes,



allowing for the possibility to isolate failure domains by configuring the CRUSH map accordingly.

Ceph MDS

Each Ceph Metadata Server (MDS) node runs the ceph-mds daemon, which manages metadata related to files stored on the Ceph
File System (CephFS). The Ceph MDS daemon also coordinates access to the shared storage cluster.

Ceph Object Gateway

Ceph Object Gateway node runs the ceph-radosgw daemon, and is an object storage interface built on top of librados to provide
applications with a RESTful access point to the Ceph storage cluster. The Ceph Object Gateway supports two interfaces:

S3

Provides object storage functionality with an interface that is compatible with a large subset of the Amazon S3 RESTful API.

Swift

Provides object storage functionality with an interface that is compatible with a large subset of the OpenStack Swift API.

Reference

For details on the Ceph architecture, see Architecture

For the minimum hardware recommendations, see Hardware

IBM Storage Ceph considerations and recommendations


As a storage administrator, you can have a basic understanding about what things to consider before running an IBM Storage Ceph
cluster.

Understand the hardware and network requirements, the types of workloads that work well with an IBM Storage Ceph cluster, and
IBM's recommendations. IBM Storage Ceph can be used for different workloads based on a particular business need or set of
requirements. Doing the necessary planning before installing IBM Storage Ceph is critical to the success of running a Ceph storage
cluster efficiently and achieving the business requirements.

NOTE: Want help with planning an IBM Storage Ceph cluster for a specific use case? Contact your IBM representative for assistance.

Basic IBM Storage Ceph considerations


IBM Storage Ceph workload considerations
Network considerations for IBM Storage Ceph
Considerations for using a RAID controller with OSD hosts
Tuning considerations for the Linux kernel when running Ceph
How colocation works and its advantages
Operating system requirements for IBM Storage Ceph
Minimum hardware considerations for IBM Storage Ceph

Basic IBM Storage Ceph considerations


The first consideration for using IBM Storage Ceph is developing a storage strategy for the data. A storage strategy is a method of
storing data that serves a particular use case. If you need to store volumes and images for a cloud platform like OpenStack, you can
choose to store data on faster Serial Attached SCSI (SAS) drives with Solid State Drives (SSD) for journals. By contrast, if you need to
store object data for an S3- or Swift-compliant gateway, you can choose to use something more economical, like traditional Serial
Advanced Technology Attachment (SATA) drives. IBM Storage Ceph can accommodate both scenarios in the same storage cluster,
but you need a means of providing the fast storage strategy to the cloud platform, and a means of providing more traditional storage
for your object store.



One of the most important steps in a successful Ceph deployment is identifying a price-to-performance profile suitable for the
storage cluster’s use case and workload. It is important to choose the right hardware for the use case. For example, choosing IOPS-
optimized hardware for a cold storage application increases hardware costs unnecessarily. Whereas, choosing capacity-optimized
hardware for its more attractive price point in an IOPS-intensive workload will likely lead to unhappy users complaining about slow
performance.

IBM Storage Ceph can support multiple storage strategies. Use cases, cost versus benefit performance tradeoffs, and data
durability are the primary considerations that help develop a sound storage strategy.

Use Cases

Ceph provides massive storage capacity, and it supports numerous use cases, such as:

The Ceph Block Device client is a leading storage backend for cloud platforms that provides limitless storage for volumes and
images with high performance features like copy-on-write cloning.

The Ceph Object Gateway client is a leading storage backend for cloud platforms that provides a RESTful S3-compliant and
Swift-compliant object storage for objects like audio, bitmap, video, and other data.

The Ceph File System for traditional file storage.

Cost vs. Benefit of Performance

Faster is better. Bigger is better. High durability is better. However, there is a price for each superlative quality, and a corresponding
cost versus benefit tradeoff. Consider the following use cases from a performance perspective: SSDs can provide very fast storage
for relatively small amounts of data and journaling. Storing a database or object index can benefit from a pool of very fast SSDs, but
proves too expensive for other data. SAS drives with SSD journaling provide fast performance at an economical price for volumes and
images. SATA drives without SSD journaling provide cheap storage with lower overall performance. When you create a CRUSH
hierarchy of OSDs, you need to consider the use case and an acceptable cost versus performance tradeoff.

Data Durability

In large scale storage clusters, hardware failure is an expectation, not an exception. However, data loss and service interruption
remain unacceptable. For this reason, data durability is very important. Ceph addresses data durability with multiple replica copies
of an object or with erasure coding and multiple coding chunks. Multiple copies or multiple coding chunks present an additional cost
versus benefit tradeoff: it is cheaper to store fewer copies or coding chunks, but it can lead to the inability to service write requests in
a degraded state. Generally, one object with two additional copies, or two coding chunks can allow a storage cluster to service writes
in a degraded state while the storage cluster recovers.

Replication stores one or more redundant copies of the data across failure domains in case of a hardware failure. However,
redundant copies of data can become expensive at scale. For example, to store 1 petabyte of data with triple replication would
require a cluster with at least 3 petabytes of storage capacity.

Erasure coding stores data as data chunks and coding chunks. In the event of a lost data chunk, erasure coding can recover the lost
data chunk with the remaining data chunks and coding chunks. Erasure coding is substantially more economical than replication. For
example, using erasure coding with 8 data chunks and 3 coding chunks provides the same redundancy as 3 copies of the data.
However, such an encoding scheme uses approximately 1.5x the initial data stored compared to 3x with replication.

The CRUSH algorithm aids this process by ensuring that Ceph stores additional copies or coding chunks in different locations within
the storage cluster. This ensures that the failure of a single storage device or host does not lead to a loss of all of the copies or coding
chunks necessary to preclude data loss. You can plan a storage strategy with cost versus benefit tradeoffs, and data durability in
mind, then present it to a Ceph client as a storage pool.

IMPORTANT: ONLY the data storage pool can use erasure coding. Pools storing service data and bucket indexes use replication.

IMPORTANT: Ceph’s object copies or coding chunks make RAID solutions obsolete. Do not use RAID, because Ceph already handles
data durability, a degraded RAID has a negative impact on performance, and recovering data using RAID is substantially slower than
using deep copies or erasure coding chunks.

Reference

Minimum hardware considerations for IBM Storage Ceph



IBM Storage Ceph workload considerations
One of the key benefits of a Ceph storage cluster is the ability to support different types of workloads within the same storage cluster
using performance domains. Different hardware configurations can be associated with each performance domain. Storage
administrators can deploy storage pools on the appropriate performance domain, providing applications with storage tailored to
specific performance and cost profiles. Selecting appropriately sized and optimized servers for these performance domains is an
essential aspect of designing an IBM Storage Ceph cluster.

To the Ceph client interface that reads and writes data, a Ceph storage cluster appears as a simple pool where the client stores data.
However, the storage cluster performs many complex operations in a manner that is completely transparent to the client interface.
Ceph clients and Ceph object storage daemons, referred to as Ceph OSDs, or simply OSDs, both use the Controlled Replication Under
Scalable Hashing (CRUSH) algorithm for the storage and retrieval of objects. Ceph OSDs can run in containers within the storage
cluster.

A CRUSH map describes a topography of cluster resources, and the map exists both on client hosts as well as Ceph Monitor hosts
within the cluster. Ceph clients and Ceph OSDs both use the CRUSH map and the CRUSH algorithm. Ceph clients communicate
directly with OSDs, eliminating a centralized object lookup and a potential performance bottleneck. With awareness of the CRUSH
map and communication with their peers, OSDs can handle replication, backfilling, and recovery—allowing for dynamic failure
recovery.

Ceph uses the CRUSH map to implement failure domains. Ceph also uses the CRUSH map to implement performance domains,
which simply take the performance profile of the underlying hardware into consideration. The CRUSH map describes how Ceph
stores data, and it is implemented as a simple hierarchy, specifically an acyclic graph, and a ruleset. The CRUSH map can support
multiple hierarchies to separate one type of hardware performance profile from another. Ceph implements performance domains
with device "classes".

For example, you can have these performance domains coexisting in the same IBM Storage Ceph cluster:

Hard disk drives (HDDs) are typically appropriate for cost and capacity-focused workloads.

Throughput-sensitive workloads typically use HDDs with Ceph write journals on solid state drives (SSDs).

IOPS-intensive workloads, such as MySQL and MariaDB, often use SSDs.

Figure 1. Performance and Failure Domains



Workloads
IBM Storage Ceph is optimized for three primary workloads:

IMPORTANT: Carefully consider the workload being run by IBM Storage Ceph clusters BEFORE considering what hardware to
purchase, because it can significantly impact the price and performance of the storage cluster. For example, if the workload is
capacity-optimized and the hardware is better suited to a throughput-optimized workload, then hardware will be more expensive
than necessary. Conversely, if the workload is throughput-optimized and the hardware is better suited to a capacity-optimized
workload, then the storage cluster can suffer from poor performance.

IOPS optimized: Input, output per second (IOPS) optimization deployments are suitable for cloud computing operations,
such as running MySQL or MariaDB instances as virtual machines on OpenStack. IOPS optimized deployments require higher
performance storage such as 15k RPM SAS drives and separate SSD journals to handle frequent write operations. Some high
IOPS scenarios use all flash storage to improve IOPS and total throughput.

An IOPS-optimized storage cluster has the following properties:

Lowest cost per IOPS.

Highest IOPS per GB.

99th percentile latency consistency.

Uses for an IOPS-optimized storage cluster are:

Typically block storage.

3x replication for hard disk drives (HDDs) or 2x replication for solid state drives (SSDs).

MySQL on OpenStack clouds.



Throughput optimized: Throughput-optimized deployments are suitable for serving up significant amounts of data, such as
graphic, audio, and video content. Throughput-optimized deployments require high bandwidth networking hardware,
controllers, and hard disk drives with fast sequential read and write characteristics. If fast data access is a requirement, then
use a throughput-optimized storage strategy. Also, if fast write performance is a requirement, using Solid State Disks (SSD) for
journals will substantially improve write performance.

A throughput-optimized storage cluster has the following properties:

Lowest cost per MBps (throughput).

Highest MBps per TB.

Highest MBps per BTU.

Highest MBps per Watt.

97th percentile latency consistency.

Uses for a throughput-optimized storage cluster are:

Block or object storage.

3x replication.

Active performance storage for video, audio, and images.

Streaming media, such as 4k video.

Capacity optimized: Capacity-optimized deployments are suitable for storing significant amounts of data as inexpensively as
possible. Capacity-optimized deployments typically trade performance for a more attractive price point. For example,
capacity-optimized deployments often use slower and less expensive SATA drives and co-locate journals rather than using
SSDs for journaling.

A cost and capacity-optimized storage cluster has the following properties:

Lowest cost per TB.

Lowest BTU per TB.

Lowest Watts required per TB.

Uses for a cost and capacity-optimized storage cluster are:

Typically object storage.

Erasure coding for maximizing usable capacity

Object archive.

Video, audio, and image object repositories.

Network considerations for IBM Storage Ceph


An important aspect of a cloud storage solution is that storage clusters can run out of IOPS due to network latency, and other factors.
Also, the storage cluster can run out of throughput due to bandwidth constraints long before the storage clusters run out of storage
capacity. This means that the network hardware configuration must support the chosen workloads to meet price versus performance
requirements.

Storage administrators prefer that a storage cluster recovers as quickly as possible. Carefully consider bandwidth requirements for
the storage cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-
cluster traffic. Also consider that network performance is increasingly important when considering the use of Solid State Disks (SSD),
flash, NVMe, and other high performing storage devices.

Ceph supports a public network and a storage cluster network. The public network handles client traffic and communication with
Ceph Monitors. The storage cluster network handles Ceph OSD heartbeats, replication, backfilling, and recovery traffic. At a



minimum, a single 10 Gb/s Ethernet link should be used for storage hardware, and you can add additional 10 Gb/s Ethernet links for
connectivity and throughput.

IMPORTANT: Allocate bandwidth to the storage cluster network, such that it is a multiple of the public network using the
osd_pool_default_size as the basis for the multiple on replicated pools. Run the public and storage cluster networks on
separate network cards.

IMPORTANT: Use 10 Gb/s Ethernet for IBM Storage Ceph deployments in production. A 1 Gb/s Ethernet network is not suitable for
production storage clusters.

In the case of a drive failure, replicating 1 TB of data across a 1 Gb/s network takes 3 hours and replicating 10 TB across a 1 Gb/s
network takes 30 hours. A 10 TB drive is a typical configuration. By contrast, with a 10 Gb/s Ethernet network, the replication
times would be 20 minutes for 1 TB and 1 hour for 10 TB. Remember that when a Ceph OSD fails, the storage cluster will recover by
replicating the data it contained to other Ceph OSDs within the pool.

The failure of a larger domain such as a rack means that the storage cluster utilizes considerably more bandwidth. When building a
storage cluster consisting of multiple racks, which is common for large storage implementations, consider utilizing as much network
bandwidth between switches in a "fat tree" design for optimal performance. A typical 10 Gb/s Ethernet switch has 48 10 Gb/s ports
and four 40 Gb/s ports. Use the 40 Gb/s ports on the spine for maximum throughput. Alternatively, consider aggregating unused 10
Gb/s ports with QSFP+ and SFP+ cables into more 40 Gb/s ports to connect to other rack and spine routers. Also, consider using
LACP mode 4 to bond network interfaces. Additionally, use jumbo frames, with a maximum transmission unit (MTU) of 9000,
especially on the backend or cluster network.

Before installing and testing an IBM Storage Ceph cluster, verify the network throughput. Most performance-related problems in
Ceph usually begin with a networking issue. Simple network issues like a kinked or bent Cat-6 cable could result in degraded
bandwidth. Use a minimum of 10 Gb/s ethernet for the front side network. For large clusters, consider using 40 Gb/s ethernet for the
backend or cluster network.

IMPORTANT: For network optimization, use jumbo frames for a better CPU per bandwidth ratio, and a non-blocking network switch
back-plane. IBM Storage Ceph requires the same MTU value throughout all networking devices in the communication path, end-to-
end for both public and cluster networks. Verify that the MTU value is the same on all hosts and networking equipment in the
environment before using an IBM Storage Ceph cluster in production.
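
As an illustrative check, assuming a hypothetical interface name of eth0, you can verify and temporarily set the MTU with the following commands; make the change persistent through your usual network configuration tooling.

Example

[root@host01 ~]# ip link show eth0 | grep mtu
[root@host01 ~]# ip link set dev eth0 mtu 9000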

Reference
For more information, see:

Configuring a private network

Configuring a public network

Configuring multiple public networks to the cluster

Considerations for using a RAID controller with OSD hosts


Optionally, you can consider using a RAID controller on the OSD hosts. Here are some things to consider:

If an OSD host has a RAID controller with 1-2 Gb of cache installed, enabling the write-back cache might result in increased
small I/O write throughput. However, the cache must be non-volatile.

Most modern RAID controllers have super capacitors that provide enough power to drain volatile memory to non-volatile
NAND memory during a power-loss event. It is important to understand how a particular controller and its firmware behave
after power is restored.

Some RAID controllers require manual intervention. Hard drives typically advertise to the operating system whether their disk
caches should be enabled or disabled by default. However, certain RAID controllers and some firmware do not provide such
information. Verify that disk level caches are disabled to avoid file system corruption.

Create a single RAID 0 volume with write-back cache enabled for each Ceph OSD data drive.



If Serial Attached SCSI (SAS) or SATA connected Solid-state Drive (SSD) disks are also present on the RAID controller, then
investigate whether the controller and firmware support pass-through mode. Enabling pass-through mode helps avoid caching
logic, and generally results in much lower latency for fast media.

Tuning considerations for the Linux kernel when running Ceph


Production IBM Storage Ceph clusters generally benefit from tuning the operating system, specifically around limits and memory
allocation. Ensure that adjustments are set for all hosts within the storage cluster. You can also open a case with IBM support asking
for additional guidance.

Increase the File Descriptors

The Ceph Object Gateway can hang if it runs out of file descriptors. You can modify the /etc/security/limits.conf file on Ceph
Object Gateway hosts to increase the file descriptors for the Ceph Object Gateway.

ceph soft nofile unlimited

Adjusting the ulimit value for Large Storage Clusters

When running Ceph administrative commands on large storage clusters, for example, with 1024 Ceph OSDs or more, create an
/etc/security/limits.d/50-ceph.conf file on each host that runs administrative commands with the following contents:

Syntax

USER_NAME soft nproc unlimited

Replace USER_NAME with the name of the non-root user account that runs the Ceph administrative commands.
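
For example, if a hypothetical non-root account named cephuser runs the administrative commands, the file contains the following line.

Example

cephuser soft nproc unlimited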

NOTE: The root user’s ulimit value is already set to unlimited by default on Red Hat Enterprise Linux.

How colocation works and its advantages


You can colocate containerized Ceph daemons on the same host. Here are the advantages of colocating some of Ceph’s services:

Significant improvement in total cost of ownership (TCO) at small scale

Reduction from six hosts to three for the minimum configuration

Easier upgrade

Better resource isolation

How Colocation Works


With the help of the Cephadm orchestrator, you can colocate one daemon from the following list with one or more OSD daemons
(ceph-osd):

Ceph Monitor (ceph-mon) and Ceph Manager (ceph-mgr) daemons

NFS Ganesha (nfs-ganesha) for Ceph Object Gateway

RBD Mirror (rbd-mirror)

Observability Stack (Grafana)

Additionally, for Ceph Object Gateway (radosgw) (RGW) and Ceph File System (ceph-mds), you can colocate either with an OSD
daemon plus a daemon from the above list, excluding RBD mirror.

NOTE: Colocating two of the same kind of daemons on a given node is not supported.



NOTE: Because ceph-mon and ceph-mgr work together closely they do not count as two separate daemons for the purposes of
colocation.

NOTE: IBM recommends colocating the Ceph Object Gateway with Ceph OSD containers to increase performance.

With the colocation rules shared above, the following minimum cluster sizes comply with these rules:

Example 1

1. Media: Full flash systems (SSDs)

2. Use case: Block (RBD) and File (CephFS), or Object (Ceph Object Gateway)

3. Number of nodes: 3

4. Replication scheme: 2

Host Daemon Daemon Daemon


host1 OSD Monitor/Manager Grafana
host2 OSD Monitor/Manager RGW or CephFS
host3 OSD Monitor/Manager RGW or CephFS

NOTE: The minimum size for a storage cluster with three replicas is four nodes. Similarly, the size of a storage cluster with two
replicas is a three node cluster. It is a requirement to have a certain number of nodes for the replication factor with an extra node in
the cluster to avoid extended periods with the cluster in a degraded state.

Figure 1. Colocated Daemons Example 1



Example 2

1. Media: Full flash systems (SSDs) or spinning devices (HDDs)

2. Use case: Block (RBD), File (CephFS), and Object (Ceph Object Gateway)

3. Number of nodes: 4

4. Replication scheme: 3

Host Daemon Daemon Daemon


host1 OSD Grafana CephFS
host2 OSD Monitor/Manager RGW
host3 OSD Monitor/Manager RGW
host4 OSD Monitor/Manager CephFS

Figure 2. Colocated Daemons Example 2



Example 3

1. Media: Full flash systems (SSDs) or spinning devices (HDDs)

2. Use case: Block (RBD), Object (Ceph Object Gateway), and NFS for Ceph Object Gateway

3. Number of nodes: 4

4. Replication scheme: 3

Host Daemon Daemon Daemon


host1 OSD Grafana
host2 OSD Monitor/Manager RGW
host3 OSD Monitor/Manager RGW
host4 OSD Monitor/Manager NFS (RGW)

Figure 3. Colocated Daemons Example 3



The diagrams below show the differences between storage clusters with colocated and non-colocated daemons.



Figure 4. Colocated Daemons

Figure 5. Non-colocated Daemons



Operating system requirements for IBM Storage Ceph
Red Hat Enterprise Linux entitlements are included in the IBM Storage Ceph subscription.

The release of IBM Storage Ceph 5.3 is supported on Red Hat Enterprise Linux 8.4 EUS or later.

IBM Storage Ceph 5 is supported on container-based deployments only.

Use the same operating system version, architecture, and deployment type across all nodes.

For example, do not use a mixture of nodes with both AMD64 and Intel 64 architectures, a mixture of nodes with Red Hat Enterprise
Linux 8 operating systems, or a mixture of nodes with container-based deployments.

IMPORTANT: IBM does not support clusters with heterogeneous architectures, operating system versions, or deployment types.

SELinux

By default, SELinux is set to Enforcing mode and the ceph-selinux packages are installed. For additional information on SELinux,
see Red Hat Enterprise Linux 8 Using SELinux Guide.



Minimum hardware considerations for IBM Storage Ceph
IBM Storage Ceph can run on non-proprietary commodity hardware. Small production clusters and development clusters can run
without performance optimization with modest hardware.

NOTE: Disk space requirements are based on the Ceph daemons' default path under /var/lib/ceph/ directory.

Table 1. Containers

Process Criteria Minimum Recommended


ceph-osd- Processor 1x AMD64 or Intel 64 CPU CORE per OSD container
container
RAM Minimum of 5 GB of RAM per OSD container
OS Disk 1x OS disk per host
OSD Storage 1x storage drive per OSD container. Cannot be shared with OS Disk.
block.db Optional, but IBM recommended, 1x SSD or NVMe or Optane partition or lvm per daemon.
Sizing is 4% of block.data for BlueStore for object, file, and mixed workloads and 1% of
block.data for BlueStore for Block Device and OpenStack cinder workloads.
block.wal Optionally, 1x SSD or NVMe or Optane partition or logical volume per daemon. Use a small
size, for example 10 GB, and only if it’s faster than the block.db device.
Network 2x 10 GB Ethernet NICs
ceph-mon- Processor 1x AMD64 or Intel 64 CPU CORE per mon-container
container
RAM 3 GB per mon-container
Disk Space 10 GB per mon-container, 50 GB Recommended
Monitor Disk Optionally, 1x SSD disk for Monitor rocksdb data
Network 2x 1 GB Ethernet NICs, 10 GB Recommended
Prometheus 20 GB to 50 GB under the /var/lib/ceph/ directory, created as a separate file system to
protect the contents under the /var/ directory.
ceph-mgr- Processor 1x AMD64 or Intel 64 CPU CORE per mgr-container
container
RAM 3 GB per mgr-container
Network 2x 1 GB Ethernet NICs, 10 GB Recommended
ceph- Processor 1x AMD64 or Intel 64 CPU CORE per radosgw-container
radosgw-
container RAM 1 GB per daemon
Disk Space 5 GB per daemon
Network 1x 1 GB Ethernet NICs
ceph-mds- Processor 1x AMD64 or Intel 64 CPU CORE per mds-container
container
RAM 3 GB per mds-container

This number is highly dependent on the configurable MDS cache size. The RAM requirement is
typically twice as much as the amount set in the mds_cache_memory_limit configuration
setting. Note also that this is the memory for your daemon, not the overall system memory.
Disk Space 2 GB per mds-container, plus taking into consideration any additional space required for
possible debug logging, 20GB is a good start.
Network 2x 1 GB Ethernet NICs, 10 GB Recommended

Note that this is the same network as the OSD containers. If you have a 10 GB network on your
OSDs you should use the same on your MDS so that the MDS is not disadvantaged when it
comes to latency.
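
To make the block.db and MDS memory guidance concrete, the following worked example uses hypothetical sizes (a 4 TB data device and a 4 GB mds_cache_memory_limit) purely to illustrate the arithmetic from the table above.

Example

# Hypothetical 4 TB block.data device per OSD:
#   object, file, or mixed workload : block.db = 4 TB x 4% = 160 GB
#   Block Device (RBD) workload     : block.db = 4 TB x 1% = 40 GB
#
# Hypothetical mds_cache_memory_limit of 4 GB:
#   plan roughly 2 x 4 GB = 8 GB of RAM for that MDS daemon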

Reference
Edit online

To take a deeper look into Ceph’s various internal components and the strategies around those components, see Storage
Strategies.



IBM Storage Ceph installation
Edit online
As a storage administrator, you can use the cephadm utility to deploy new IBM Storage Ceph clusters.

The cephadm utility manages the entire life cycle of a Ceph cluster. Installation and management tasks comprise two types of
operations:

Day One operations involve installing and bootstrapping a bare-minimum, containerized Ceph storage cluster, running on a
single node. Day One also includes deploying the Monitor and Manager daemons and adding Ceph OSDs.

Day Two operations use the Ceph orchestration interface, cephadm orch, or the IBM Storage Ceph Dashboard to expand the
storage cluster by adding other Ceph services to the storage cluster.

cephadm utility
How cephadm works
cephadm-ansible playbooks
Registering the IBM Storage Ceph nodes
Configuring Ansible inventory location
Enabling SSH login as root user on Red Hat Enterprise Linux 9
Creating an Ansible user with sudo access
Enabling password-less SSH for Ansible
Configuring SSH
Configuring a different SSH user
Running the preflight playbook
Bootstrapping a new storage cluster
Distributing SSH keys
Launching the cephadm shell
Verifying the cluster installation
Adding hosts
Removing hosts
Labeling hosts
Adding Monitor service
Setting up the admin node
Adding Manager service
Adding OSDs
Purging the Ceph storage cluster
Deploying client nodes

Prerequisites
Edit online

At least one running virtual machine (VM) or bare-metal server with an active internet connection.

Red Hat Enterprise Linux 8.4 EUS or later.

Ansible 2.9 or later.

Server and Ceph repositories enabled.

Root-level access to all nodes.

An active IBM Network or service account to access the IBM Registry.

Remove any conflicting configurations in iptables so that a restart of the iptables service does not cause issues for the cluster. For an
example, see Verifying firewall rules are configured for default Ceph ports.

Procedure
Edit online

1. Enable the Red Hat Enterprise Linux baseos and appstream repositories:



Example

Red Hat Enterprise Linux 8:

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms

Example

Red Hat Enterprise Linux 9:

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms

2. Enable the ceph-tools repository for both Red Hat Enterprise Linux 8 and Red Hat Enterprise Linux 9:

Red Hat Enterprise Linux 8:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-8.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-8.repo

Red Hat Enterprise Linux 9:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-9.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-9.repo

Repeat the above steps on all the nodes of the storage cluster.

3. Install the license package for IBM Storage Ceph:

Example

[root@admin ~]# dnf install ibm-storage-ceph-license

4. Accept these provisions:

Example

[root@admin ~]# sudo touch /usr/share/ibm-storage-ceph-license/accept

cephadm utility

Edit online
The cephadm utility deploys and manages a Ceph storage cluster. It is tightly integrated with both the command-line interface (CLI)
and the IBM Storage Ceph Dashboard web interface, so that you can manage storage clusters from either environment. cephadm
uses SSH to connect to hosts from the manager daemon to add, remove, or update Ceph daemon containers. It does not rely on
external configuration or orchestration tools such as Ansible or Rook.

NOTE: The cephadm utility is available after running the preflight playbook on a host.
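
As a quick check, you can confirm that the utility is installed and report its version; the host name in the prompt is illustrative.

Example

[root@host01 ~]# cephadm version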

The cephadm utility consists of two main components:

The cephadm shell.

The cephadm orchestrator.

The cephadm shell

The cephadm shell launches a bash shell within a container. This enables you to perform “Day One” cluster setup tasks, such as
installation and bootstrapping, and to invoke ceph commands.

There are two ways to invoke the cephadm shell:

Enter cephadm shell at the system prompt:

Example



[root@host01 ~]# cephadm shell
[ceph: root@host01 /]# ceph -s

At the system prompt, type cephadm shell and the command you want to execute:

Example

[root@host01 ~]# cephadm shell ceph -s

NOTE: If the node contains configuration and keyring files in /etc/ceph/, the container environment uses the values in those files
as defaults for the cephadm shell. However, if you execute the cephadm shell on a Ceph Monitor node, the cephadm shell inherits its
default configuration from the Ceph Monitor container, instead of using the default configuration.

The cephadm orchestrator

The cephadm orchestrator enables you to perform “Day Two” Ceph functions, such as expanding the storage cluster and
provisioning Ceph daemons and services. You can use the cephadm orchestrator through either the command-line interface (CLI) or
the web-based IBM Storage Ceph Dashboard. Orchestrator commands take the form ceph orch.

The cephadm script interacts with the Ceph orchestration module used by the Ceph Manager.
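
For example, the following orchestrator commands, run from within the cephadm shell, list the services, the daemon containers, and the hosts that cephadm manages; the host name in the prompt is illustrative.

Example

[ceph: root@host01 /]# ceph orch ls       # list deployed services
[ceph: root@host01 /]# ceph orch ps       # list daemon containers across the cluster
[ceph: root@host01 /]# ceph orch host ls  # list hosts known to the orchestrator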

How cephadm works

Edit online
The cephadm command manages the full lifecycle of an IBM Storage Ceph cluster. The cephadm command can perform the
following operations:

Bootstrap a new IBM Storage Ceph cluster.

Launch a containerized shell that works with the IBM Storage Ceph command-line interface (CLI).

Aid in debugging containerized daemons.

The cephadm command uses ssh to communicate with the nodes in the storage cluster. This allows you to add, remove, or update
IBM Storage Ceph containers without using external tools. Generate the ssh key pair during the bootstrapping process, or use your
own ssh key.

The cephadm bootstrapping process creates a small storage cluster on a single node, consisting of one Ceph Monitor and one Ceph
Manager, as well as any required dependencies. You then use the orchestrator CLI or the IBM Storage Ceph Dashboard to expand the
storage cluster to include nodes, and to provision all of the IBM Storage Ceph daemons and services. You can perform management
functions through the CLI or from the IBM Storage Ceph Dashboard web interface.

NOTE: The cephadm utility is a new feature in IBM Storage Ceph 5. It does not support older versions of IBM Storage Ceph.

Figure 1. Ceph storage cluster deployment



cephadm-ansible playbooks

Edit online
The cephadm-ansible package is a collection of Ansible playbooks to simplify workflows that are not covered by cephadm. After
installation, the playbooks are located in /usr/share/cephadm-ansible/.

IMPORTANT: Red Hat Enterprise Linux 9 and later do not support the cephadm-ansible playbooks.

The cephadm-ansible package includes the following playbooks:

cephadm-preflight.yml

cephadm-clients.yml

cephadm-purge-cluster.yml

The cephadm-preflight playbook

Use the cephadm-preflight playbook to initially set up hosts before bootstrapping the storage cluster and before adding new
nodes or clients to your storage cluster. This playbook configures the Ceph repository and installs some prerequisites such as
podman, lvm2, chronyd, and cephadm.

For more information, see Running the preflight playbook.

The cephadm-clients playbook

Use the cephadm-clients playbook to set up client hosts. This playbook handles the distribution of configuration and keyring files
to a group of Ceph clients.
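
As an illustration only, a typical invocation passes the cluster FSID, the client group from the inventory, and the paths to the keyring and configuration file as extra variables. The values below are placeholders; check the exact variable names against the playbook shipped in /usr/share/cephadm-ansible/.

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-clients.yml --extra-vars '{"fsid":"FSID","client_group":"clients","keyring":"/etc/ceph/ceph.client.admin.keyring","conf":"/etc/ceph/ceph.conf"}'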

The cephadm-purge-cluster playbook



Use the cephadm-purge-cluster playbook to remove a Ceph cluster. This playbook purges a Ceph cluster managed with
cephadm.

For more information, see Purging the Ceph storage cluster.
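
For illustration, the purge playbook is typically run against the inventory with the FSID of the cluster to remove; the FSID below is a placeholder, and the exact variables should be verified against the playbook itself.

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-purge-cluster.yml -e fsid=FSID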

Registering the IBM Storage Ceph nodes


Edit online
IBM Storage Ceph 5.3 is supported on Red Hat Enterprise Linux 8.4 EUS or later.

Prerequisites
Edit online

At least one running virtual machine (VM) or bare-metal server with an active internet connection.

Red Hat Enterprise Linux 8.4 EUS or later.

A valid IBM subscription with the appropriate entitlements.

Root-level access to all nodes.

Procedure
Edit online

1. Enable the Red Hat Enterprise Linux baseos and appstream repositories:

Example

Red Hat Enterprise Linux 8:

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms

Example

Red Hat Enterprise Linux 9:

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms

2. Enable the ceph-tools repository for both Red Hat Enterprise Linux 8 and Red Hat Enterprise Linux 9:

Red Hat Enterprise Linux 8:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-


ceph-5-rhel-8.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-8.repo

Red Hat Enterprise Linux 9:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-


ceph-5-rhel-9.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-9.repo

Repeat the above steps on all the nodes of the storage cluster.

3. Install cephadm-ansible on Red Hat Enterprise Linux 8:

Syntax

dnf install cephadm-ansible

IMPORTANT: Skip this step for Red Hat Enterprise Linux 9 as cephadm-ansible is not supported.



Configuring Ansible inventory location
Edit online
You can configure inventory location files for the cephadm-ansible staging and production environments. The Ansible inventory
hosts file contains all the hosts that are part of the storage cluster. You can list nodes individually in the inventory hosts file or you
can create groups such as [mons], [osds], and [rgws] to provide clarity to your inventory and ease the usage of the --limit
option to target a group or node when running a playbook.

NOTE: If deploying clients, client nodes must be defined in a dedicated [clients] group.

IMPORTANT: Skip these steps for Red Hat Enterprise Linux 9 as cephadm-ansible is not supported.

Prerequisites
Edit online

An Ansible administration node.

Root-level access to the Ansible administration node.

The cephadm-ansible package is installed on the node.

Procedure
Edit online

1. Navigate to the /usr/share/cephadm-ansible/ directory:

[root@admin ~]# cd /usr/share/cephadm-ansible

2. Optional: Create subdirectories for staging and production:

[root@admin cephadm-ansible]# mkdir -p inventory/staging inventory/production

3. Optional: Edit the ansible.cfg file and add the following line to assign a default inventory location:

[defaults]
inventory = ./inventory/staging

4. Optional: Create an inventory hosts file for each environment:

[root@admin cephadm-ansible]# touch inventory/staging/hosts


[root@admin cephadm-ansible]# touch inventory/production/hosts

5. Open and edit each hosts file and add the nodes and [admin] group:

Syntax

NODE_NAME_1
NODE_NAME_2

[admin]
ADMIN_NODE_NAME_1

Replace NODE_NAME_1 and NODE_NAME_2 with the Ceph nodes such as monitors, OSDs, MDSs, and gateway nodes.

Replace ADMIN_NODE_NAME_1 with the name of the node where the admin keyring is stored.

Example

host02
host03
host04

[admin]
host01



NOTE: If you set the inventory location in the ansible.cfg file to staging, you need to run the playbooks in the
staging environment as follows:

Syntax

ansible-playbook -i inventory/staging/hosts PLAYBOOK.yml

To run the playbooks in the production environment:

Syntax

ansible-playbook -i inventory/production/hosts PLAYBOOK.yml

Enabling SSH login as root user on Red Hat Enterprise Linux 9


Edit online
Red Hat Enterprise Linux 9 does not support SSH login as a root user even if the PermitRootLogin parameter is set to yes in the
/etc/ssh/sshd_config file. You get the following error:

Example

[root@host01 ~]# ssh root@myhostname
root@myhostname's password:
Permission denied, please try again.

You can run one of the following methods to enable login as a root user:

Use "Allow root SSH login with password" flag while setting the root password during installation of Red Hat Enterprise Linux
9.

Manually set the PermitRootLogin parameter after Red Hat Enterprise Linux 9 installation.

This section describes manual setting of the PermitRootLogin parameter.

Prerequisites
Edit online

Root-level access to all nodes.

Procedure
Edit online

1. Set the PermitRootLogin parameter to yes in a drop-in file under the /etc/ssh/sshd_config.d/ directory:

Example

[root@admin ~]# echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config.d/01-permitrootlogin.conf

2. Restart the SSH service:

Example

# systemctl restart sshd.service

3. Login to the node as the root user:

Syntax

ssh root@HOST_NAME

Replace HOST_NAME with the host name of the Ceph node.

Example

[root@admin ~]# ssh root@host01



Enter the root password when prompted.

Reference
Edit online

For more information, see the Red Hat knowledge base article Not able to login as root user via ssh in RHEL 9 server.

Creating an Ansible user with sudo access

Edit online
You can create an Ansible user with password-less root access on all nodes in the storage cluster to run the cephadm-ansible
playbooks. The Ansible user must be able to log into all the IBM Storage Ceph nodes as a user that has root privileges to install
software and create configuration files without prompting for a password.

IMPORTANT: On Red Hat Enterprise Linux 9, follow these steps if you want to create a non-root user; otherwise, you can skip them.

Prerequisites
Edit online

Root-level access to all nodes.

For Red Hat Enterprise Linux 9, to log in as a root user, see Enabling SSH login as root user on Red Hat Enterprise Linux 9.

Procedure
Edit online

1. Log in to the node as the root user:

Syntax

ssh root@HOST_NAME

Replace HOST_NAME with the host name of the Ceph node.

Example

[root@admin ~]# ssh root@host01

Enter the root password when prompted.

2. Create a new Ansible user:

Syntax

adduser USER_NAME

Replace USER_NAME with the new user name for the Ansible user.

Example

[root@host01 ~]# adduser ceph-admin

IMPORTANT: Do not use ceph as the user name. The ceph user name is reserved for the Ceph daemons. A uniform user
name across the cluster can improve ease of use, but avoid using obvious user names, because intruders typically use them
for brute-force attacks.

3. Set a new password for this user:

Syntax

passwd USER_NAME



Replace USER_NAME with the new user name for the Ansible user.

Example

[root@host01 ~]# passwd ceph-admin

Enter the new password twice when prompted.

4. Configure sudo access for the newly created user:

Syntax

cat << EOF > /etc/sudoers.d/USER_NAME
USER_NAME ALL = (root) NOPASSWD:ALL
EOF

Replace USER_NAME with the new user name for the Ansible user.

Example

[root@host01 ~]# cat << EOF > /etc/sudoers.d/ceph-admin
ceph-admin ALL = (root) NOPASSWD:ALL
EOF

5. Assign the correct file permissions to the new file:

Syntax

chmod 0440 /etc/sudoers.d/USER_NAME

Replace USER_NAME with the new user name for the Ansible user.

Example

[root@host01 ~]# chmod 0440 /etc/sudoers.d/ceph-admin

6. Repeat the above steps on all nodes in the storage cluster.

Reference
Edit online

For more information about creating user accounts, see Configuring basic system settings > Getting started with managing
user accounts within the Red Hat Enterprise Linux guide.

Enabling password-less SSH for Ansible


Edit online
Generate an SSH key pair on the Ansible administration node and distribute the public key to each node in the storage cluster so that
Ansible can access the nodes without being prompted for a password.

IMPORTANT: On Red Hat Enterprise Linux 9, follow these steps if you are using a non-root user; otherwise, you can skip them.

Prerequisites
Edit online

Access to the Ansible administration node.

Ansible user with sudo access to all nodes in the storage cluster.

For Red Hat Enterprise Linux 9, to log in as a root user, see Enabling SSH login as root user on Red Hat Enterprise Linux 9.

Procedure
Edit online

1. Generate the SSH key pair, accept the default file name and leave the passphrase empty:

[ansible@admin ~]$ ssh-keygen

2. Copy the public key to all nodes in the storage cluster:

ssh-copy-id USER_NAME@HOST_NAME

Replace USER_NAME with the new user name for the Ansible user. Replace HOST_NAME with the host name of the Ceph node.

Example

[ansible@admin ~]$ ssh-copy-id ceph-admin@host01

3. Create the user’s SSH config file:

[ansible@admin ~]$ touch ~/.ssh/config

4. Open the config file for editing. Set values for the Hostname and User options for each node in the storage cluster:

Syntax

Host host01
Hostname HOST_NAME
User USER_NAME
Host host02
Hostname HOST_NAME
User USER_NAME
...

Replace HOST_NAME with the host name of the Ceph node. Replace USER_NAME with the new user name for the Ansible user.

Example

Host host01
Hostname host01
User ceph-admin
Host host02
Hostname host02
User ceph-admin
Host host03
Hostname host03
User ceph-admin

IMPORTANT: By configuring the ~/.ssh/config file you do not have to specify the -u USER_NAME option each time you
execute the ansible-playbook command.

5. Set the correct file permissions for the ~/.ssh/config file:

[ansible@admin ~]$ chmod 600 ~/.ssh/config

Reference
Edit online

The ssh_config(5) manual page.

See Using secure communications between two systems with OpenSSH.

Configuring SSH
Edit online
As a storage administrator, with Cephadm, you can use an SSH key to securely authenticate with remote hosts. The SSH key is stored
in the monitor to connect to remote hosts.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

An Ansible administration node.

Root-level access to the Ansible administration node.

The cephadm-ansible package is installed on the node.

Procedure
Edit online

1. Navigate to the cephadm-ansible directory.

2. Generate a new SSH key:

Example

[ceph-admin@admin cephadm-ansible]$ ceph cephadm generate-key

3. Retrieve the public portion of the SSH key:

Example

[ceph-admin@admin cephadm-ansible]$ ceph cephadm get-pub-key

4. Delete the currently stored SSH key:

Example

[ceph-admin@admin cephadm-ansible]$ ceph cephadm clear-key

5. Restart the mgr daemon to reload the configuration:

Example

[ceph-admin@admin cephadm-ansible]$ ceph mgr fail

Configuring a different SSH user


Edit online
As a storage administrator, you can configure a non-root SSH user who can log into all the Ceph cluster nodes with enough privileges
to download container images, start containers, and execute commands without prompting for a password.

IMPORTANT: Prior to configuring a non-root SSH user, the cluster SSH key needs to be added to the user's authorized_keys file
and non-root users must have passwordless sudo access.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

An Ansible administration node.

Root-level access to the Ansible administration node.

The cephadm-ansible package is installed on the node.

Add the cluster SSH keys to the user's authorized_keys.



Enable passwordless sudo access for the non-root users.

Procedure
Edit online

1. Navigate to the cephadm-ansible directory.

2. Provide Cephadm the name of the user who is going to perform all the Cephadm operations:

Syntax

ceph cephadm set-user USER

Example

[ceph-admin@admin cephadm-ansible]$ ceph cephadm set-user user

3. Retrieve the SSH public key.

Syntax

ceph cephadm get-pub-key > ~/ceph.pub

Example

[ceph-admin@admin cephadm-ansible]$ ceph cephadm get-pub-key > ~/ceph.pub

4. Copy the SSH keys to all the hosts.

Syntax

ssh-copy-id -f -i ~/ceph.pub USER@HOST

Example

[ceph-admin@admin cephadm-ansible]$ ssh-copy-id -f -i ~/ceph.pub ceph-admin@host01

Running the preflight playbook


Edit online
This Ansible playbook configures the Ceph repository and prepares the storage cluster for bootstrapping. It also installs some
prerequisites, such as podman, lvm2, chronyd, and cephadm. The default location for cephadm-ansible and cephadm-
preflight.yml is /usr/share/cephadm-ansible.

The preflight playbook uses the cephadm-ansible inventory file to identify all the admin and nodes in the storage cluster.

IMPORTANT: Skip these steps for Red Hat Enterprise Linux 9 as cephadm-ansible is not supported.

The default location for the inventory file is /usr/share/cephadm-ansible/hosts. The following example shows the structure
of a typical inventory file:

Example

host02
host03
host04

[admin]
host01

The [admin] group in the inventory file contains the name of the node where the admin keyring is stored. On a new storage cluster,
the node in the [admin] group will be the bootstrap node. To add additional admin hosts after bootstrapping the cluster see Setting
up the admin node.

NOTE: Run the preflight playbook before you bootstrap the initial host.

IMPORTANT: If you are performing a disconnected installation, see Running the preflight playbook for a disconnected installation.



Prerequisites

Root-level access to the Ansible administration node.

Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.

NOTE: In the below example, host01 is the bootstrap node.

Procedure

1. Navigate to the /usr/share/cephadm-ansible directory.

2. Open and edit the hosts file and add your nodes:

Example

host02
host03
host04

[admin]
host01

3. Install the license package for IBM Storage Ceph on all nodes:

Example

[root@admin ~]# dnf install ibm-storage-ceph-license

a. Accept these provisions:

Example

[root@admin ~]# sudo touch /usr/share/ibm-storage-ceph-license/accept

4. Run the preflight playbook:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm"

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm"

After installation is complete, cephadm resides in the /usr/sbin/ directory.

Use the --limit option to run the preflight playbook on a selected set of hosts in the storage cluster:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit GROUP_NAME|NODE_NAME

Replace GROUP_NAME with a group name from your inventory file. Replace NODE_NAME with a specific node name
from your inventory file.

NOTE: Optionally, you can group your nodes in your inventory file by group name such as [mons], [osds], and
[mgrs]. However, admin nodes must be added to the [admin] group and clients must be added to the [clients]
group.

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit clients
[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit host01

When you run the preflight playbook, cephadm-ansible automatically installs chronyd and ceph-common on the client nodes.

The preflight playbook installs chronyd but configures it for a single NTP source.



If you want to configure multiple sources or if you have a disconnected environment, see the following documentation for more
information (a brief configuration sketch follows this list):

How to configure chrony?

Best practices for NTP

Basic chrony NTP troubleshooting
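
As a minimal sketch, configuring multiple time sources usually amounts to listing additional server or pool entries in /etc/chrony.conf on each node and restarting chronyd; the server names below are placeholders.

Example

# /etc/chrony.conf (illustrative entries)
server ntp1.example.com iburst
server ntp2.example.com iburst
pool 2.rhel.pool.ntp.org iburst

[root@host01 ~]# systemctl restart chronyd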

Bootstrapping a new storage cluster


Edit online
The cephadm utility performs the following tasks during the bootstrap process:

Installs and starts a Ceph Monitor daemon and a Ceph Manager daemon for a new IBM Storage Ceph cluster on the local node
as containers.

Creates the /etc/ceph directory.

Writes a copy of the public key to /etc/ceph/ceph.pub for the IBM Storage Ceph cluster and adds the SSH key to the root
user’s /root/.ssh/authorized_keys file.

Applies the _admin label to the bootstrap node.

Writes a minimal configuration file needed to communicate with the new cluster to /etc/ceph/ceph.conf.

Writes a copy of the client.admin administrative secret key to /etc/ceph/ceph.client.admin.keyring.

Deploys a basic monitoring stack with prometheus, grafana, and other tools such as node-exporter and alert-manager.
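
After bootstrap completes, you can confirm the files listed above on the bootstrap node; the listing shows the minimal set you would typically expect.

Example

[root@host01 ~]# ls /etc/ceph/
ceph.client.admin.keyring  ceph.conf  ceph.pub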

IMPORTANT: If you are performing a disconnected installation, see Performing a disconnected installation.

NOTE: If you have existing prometheus services that you want to run with the new storage cluster, or if you are running Ceph with
Rook, use the --skip-monitoring-stack option with the cephadm bootstrap command. This option bypasses the basic
monitoring stack so that you can manually configure it later.
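
For example, a bootstrap invocation that bypasses the monitoring stack might look like the following; the IP address and JSON file reuse values from the examples in this section.

Example

[root@host01 ~]# cephadm bootstrap --mon-ip 10.10.128.68 --skip-monitoring-stack --registry-json /etc/mylogin.json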

IMPORTANT: If you are deploying a monitoring stack, see Deploying the monitoring stack using the Ceph Orchestrator

IMPORTANT: Bootstrapping provides the default user name and password for the initial login to the Dashboard. Bootstrap requires
you to change the password after you log in.

IMPORTANT: Before you begin the bootstrapping process, make sure that the container image that you want to use has the same
version of IBM Storage Ceph as cephadm. If the two versions do not match, bootstrapping fails at the Creating initial admin
user stage.

NOTE: Before you begin the bootstrapping process, you must create a username and password for the cp.icr.io/cp container
registry.

Recommended cephadm bootstrap command options


Obtaining entitlement key
Using a JSON file to protect login information
Bootstrapping a storage cluster using a service configuration file
Bootstrapping the storage cluster as a non-root user
Bootstrap command options
Configuring a private registry for a disconnected installation
Running the preflight playbook for a disconnected installation
Performing a disconnected installation
Changing configurations of custom container images for disconnected installations

Prerequisites
Edit online

An IP address for the first Ceph Monitor container, which is also the IP address for the first node in the storage cluster.



Login access to cp.icr.io/cp. For information about obtaining credentials for cp.icr.io/cp, see Obtaining entitlement
key

A minimum of 10 GB of free space for /var/lib/containers/.

Root-level access to all nodes.

NOTE: If the storage cluster includes multiple networks and interfaces, be sure to choose a network that is accessible by any node
that uses the storage cluster.

NOTE: If the local node uses fully-qualified domain names (FQDN), then add the --allow-fqdn-hostname option to cephadm
bootstrap on the command line.

IMPORTANT: Run cephadm bootstrap on the node that you want to be the initial Monitor node in the cluster. The IP_ADDRESS
option should be the IP address of the node you are using to run cephadm bootstrap.

NOTE: If you want to deploy a storage cluster using IPv6 addresses, then use the IPv6 address format for the --mon-ip
IP_ADDRESS option. For example: cephadm bootstrap --mon-ip 2620:52:0:880:225:90ff:fefc:2536 --registry-json /etc/mylogin.json

IMPORTANT: Configuring Ceph Object Gateway multi-site on IBM Storage Ceph 5.3 is not supported due to several open issues. For
more information, see the Red Hat knowledge base article Red Hat Ceph Storage 5.3 does not support multi-site configuration. Use
the --yes-i-know flag while bootstrapping a new IBM Storage Ceph cluster to get past the warning about multi-site regressions.

NOTE: Follow the knowledge base article How to upgrade from Red Hat Ceph Storage 4.2z4 to Red Hat Ceph Storage 5.0z4 with the
bootstrapping procedure if you are planning for a new installation of IBM Storage Ceph 5.3z4.

Procedure
Edit online

1. Bootstrap a storage cluster:

Syntax

cephadm bootstrap --cluster-network NETWORK_CIDR --mon-ip IP_ADDRESS --registry-url cp.icr.io/cp --registry-username USER_NAME --registry-password PASSWORD --yes-i-know

Example

[root@host01 ~]# cephadm bootstrap --cluster-network 10.10.128.0/24 --mon-ip 10.10.128.68 --registry-url cp.icr.io/cp --registry-username myuser1 --registry-password mypassword1 --yes-i-know

NOTE: If you want internal cluster traffic routed over the public network, you can omit the --cluster-network
NETWORK_CIDR option.

The script takes a few minutes to complete. Once the script completes, it provides the credentials to the IBM Storage Ceph
Dashboard URL, a command to access the Ceph command-line interface (CLI), and a request to enable telemetry.

Ceph Dashboard is now available at:

     URL: https://host01:8443/
    User: admin
Password: i8nhu7zham

Enabling client.admin keyring and conf on hosts with "admin" label

You can access the Ceph CLI with:

    sudo /usr/sbin/cephadm shell --fsid 266ee7a8-2a05-11eb-b846-5254002d4916 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Please consider enabling telemetry to help improve Ceph:

    ceph telemetry on

For more information see:

    https://docs.ceph.com/docs/master/mgr/telemetry/

Bootstrap complete.



Reference
Edit online

Recommended cephadm bootstrap command options

Bootstrap command options

Using a JSON file to protect login information

Recommended cephadm bootstrap command options


Edit online
The cephadm bootstrap command has multiple options that allow you to specify file locations, configure ssh settings, set
passwords, and perform other initial configuration tasks.

IBM recommends that you use a basic set of command options for cephadm bootstrap. You can configure additional options after
your initial cluster is up and running.

The following examples show how to specify the recommended options.

Syntax

cephadm bootstrap --ssh-user USER_NAME --mon-ip IP_ADDRESS --allow-fqdn-hostname --registry-json REGISTRY_JSON

Example

[root@host01 ~]# cephadm bootstrap --ssh-user ceph --mon-ip 10.10.128.68 --allow-fqdn-hostname --registry-json /etc/mylogin.json

For non-root users, see Creating an Ansible user with sudo access and Enabling password-less SSH for Ansible for more details.

Reference
Edit online

For more information about the --registry-json option, see Using a JSON file to protect login information

For more information about all available cephadm bootstrap options, see Bootstrap command options

For more information about bootstrapping the storage cluster as a non-root user, see Bootstrapping the storage cluster as a
non-root user

Obtaining entitlement key


Edit online
Entitlement keys determine whether IBM Storage Ceph can automatically pull the required container default images. During
installation, image pull failures can occur due to an invalid entitlement key or a key belonging to an account that does not have
entitlement to IBM Storage Ceph.

Procedure
Edit online

1. Log in to the IBM container software library with the IBM ID and password that is associated with the entitled IBM Storage
Ceph software.

2. In the navigation bar, click Get entitlement key.

3. On the Access your container software page, click Copy key to copy the generated entitlement key.



4. Save the key to a secure location for future use.

5. Use cp as the user name; the entitlement key is the token that serves as the password.

6. Verify the login against the registry.

podman login -u cp -p TOKEN cp.icr.io/cp

Login Succeeded!

Using a JSON file to protect login information


Edit online
As a storage administrator, you might choose to add login and password information to a JSON file, and then refer to the JSON file for
bootstrapping. This protects the login credentials from exposure.

NOTE: You can also use a JSON file with the cephadm --registry-login command.
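
For instance, a minimal sketch of logging in to the registry with the same kind of JSON file, assuming the cephadm registry-login subcommand available in this release, looks like this.

Example

[root@host01 ~]# cephadm registry-login --registry-json /etc/mylogin.json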

Prerequisites
Edit online

An IP address for the first Ceph Monitor container, which is also the IP address for the first node in the storage cluster.

Login access to cp.icr.io/cp.

A minimum of 10 GB of free space for /var/lib/containers/.

Root-level access to all nodes.

Procedure
Edit online

1. Create the JSON file. In this example, the file is named mylogin.json.

Syntax

{
"url":"REGISTRY_URL",
"username":"USER_NAME",
"password":"PASSWORD"
}

Example

{
"url":"cp.icr.io/cp",
"username":"myuser1",
"password":"mypassword1"
}

2. Bootstrap a storage cluster:

Syntax

cephadm bootstrap --mon-ip IP_ADDRESS --registry-json /etc/mylogin.json

Example

[root@host01 ~]# cephadm bootstrap --mon-ip 10.10.128.68 --registry-json /etc/mylogin.json

Bootstrapping a storage cluster using a service configuration file



Edit online
To bootstrap the storage cluster and configure additional hosts and daemons using a service configuration file, use the --apply-
spec option with the cephadm bootstrap command. The configuration file is a .yaml file that contains the service type,
placement, and designated nodes for services that you want to deploy.

NOTE: If you want to use a non-default realm or zone for applications such as multi-site, configure your Ceph Object Gateway
daemons after you bootstrap the storage cluster, instead of adding them to the configuration file and using the --apply-spec
option. This gives you the opportunity to create the realm or zone you need for the Ceph Object Gateway daemons before deploying
them.

NOTE: To deploy a Metadata Server (MDS) service, configure it after bootstrapping the storage cluster.

To deploy the MDS service, you must create a CephFS volume first.
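
For example, a CephFS volume can be created with the volume interface before configuring MDS; the file system name and placement shown here are illustrative.

Example

[ceph: root@host01 /]# ceph fs volume create test_fs --placement="2 host02 host03"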

NOTE: If you run the bootstrap command with the --apply-spec option, be sure to include the IP address of the bootstrap host in the
specification file. This prevents the IP address from resolving to the loopback address when the bootstrap host, where the active Ceph
Manager is already running, is re-added. If you do not use the --apply-spec option during bootstrap and instead later use the ceph
orch apply command with another specification file that re-adds the host running the active Ceph Manager, be sure to explicitly
provide the addr field. This applies to any specification file applied after bootstrapping.

For more information, see Operations.

Prerequisites
Edit online

At least one running virtual machine (VM) or server.

Red Hat Enterprise Linux 8.4 EUS or later.

Root-level access to all nodes.

Login access to cp.icr.io/cp.

Passwordless ssh is set up on all hosts in the storage cluster.

cephadm is installed on the node that you want to be the initial Monitor node in the storage cluster.

Procedure
Edit online

1. Log in to the bootstrap host.

2. Create the service configuration .yaml file for your storage cluster. The example file directs cephadm bootstrap to
configure the initial host and two additional hosts, and it specifies that OSDs be created on all available disks.

Example

service_type: host
addr: host01
hostname: host01
---
service_type: host
addr: host02
hostname: host02
---
service_type: host
addr: host03
hostname: host03
---
service_type: host
addr: host04
hostname: host04
---
service_type: mon
placement:
  host_pattern: "host[0-2]"
---



service_type: osd
service_id: my_osds
placement:
  host_pattern: "host[1-3]"
data_devices:
  all: true

3. Bootstrap the storage cluster with the --apply-spec option:

Syntax

cephadm bootstrap --apply-spec CONFIGURATION_FILE_NAME --mon-ip MONITOR_IP_ADDRESS --registry-url cp.icr.io/cp --registry-username USER_NAME --registry-password PASSWORD

Example

[root@host01 ~]# cephadm bootstrap --apply-spec initial-config.yaml --mon-ip 10.10.128.68 --registry-url cp.icr.io/cp --registry-username myuser1 --registry-password mypassword1

The script takes a few minutes to complete. Once the script completes, it provides the credentials to the IBM Storage Ceph
Dashboard URL, a command to access the Ceph command-line interface (CLI), and a request to enable telemetry.

Once your storage cluster is up and running, see Operations for more information about configuring additional daemons and
services.

Reference
Edit online

Bootstrap command options

Bootstrapping the storage cluster as a non-root user


Edit online
To bootstrap the IBM Storage Ceph cluster as a non-root user on the bootstrap node, use the --ssh-user option with the cephadm
bootstrap command. --ssh-user specifies a user for SSH connections to cluster nodes.

Non-root users must have passwordless sudo access. See the Creating an Ansible user with sudo access section and Enabling
password-less SSH for Ansible sections for more details.

Prerequisites
Edit online

An IP address for the first Ceph Monitor container, which is also the IP address for the initial Monitor node in the storage
cluster.

Login access to cp.icr.io/cp.

A minimum of 10 GB of free space for /var/lib/containers/.

SSH public and private keys.

Passwordless sudo access to the bootstrap node.

Procedure
Edit online

1. Switch to the SSH user on the bootstrap node:

Syntax

su - SSH_USER_NAME



Example

[root@host01 ~]# su - ceph


Last login: Tue Sep 14 12:00:29 EST 2021 on pts/0

2. Establish the SSH connection to the bootstrap node:

Example

[ceph@host01 ~]# ssh host01


Last login: Tue Sep 14 12:03:29 EST 2021 on pts/0

3. Optional: Invoke the cephadm bootstrap command.

NOTE: Using private and public keys is optional. If SSH keys have not previously been created, these can be created during
this step.

Include the --ssh-private-key and --ssh-public-key options:

Syntax

cephadm bootstrap --ssh-user USER_NAME --mon-ip IP_ADDRESS --ssh-private-key PRIVATE_KEY --ssh-public-key PUBLIC_KEY --registry-url cp.icr.io/cp --registry-username USER_NAME --registry-password PASSWORD

Example

cephadm bootstrap --ssh-user ceph --mon-ip 10.10.128.68 --ssh-private-key /home/ceph/.ssh/id_rsa --ssh-public-key /home/ceph/.ssh/id_rsa.pub --registry-url cp.icr.io/cp --registry-username myuser1 --registry-password mypassword1

Reference
Edit online

Bootstrap command options

For more information about utilizing Ansible to automate bootstrapping a rootless cluster, see the knowledge base article Red
Hat Ceph Storage 5.3 rootless deployment utilizing ansible ad-hoc commands.

Bootstrap command options


Edit online
The cephadm bootstrap command bootstraps a Ceph storage cluster on the local host. It deploys a MON daemon and a MGR
daemon on the bootstrap node, automatically deploys the monitoring stack on the local host, and calls ceph orch host add
HOSTNAME.

The following table lists the available options for cephadm bootstrap.

cephadm bootstrap option                                  Description

--config CONFIG_FILE, -c CONFIG_FILE                      CONFIG_FILE is the ceph.conf file to use with the bootstrap command.
--cluster-network NETWORK_CIDR                            Use the subnet defined by NETWORK_CIDR for internal cluster traffic. This is specified in CIDR notation. For example: 10.10.128.0/24.
--mon-id MON_ID                                           Bootstraps on the host named MON_ID. Default value is the local host.
--mon-addrv MON_ADDRV                                     mon IPs (e.g., [v2:localipaddr:3300,v1:localipaddr:6789]).
--mon-ip IP_ADDRESS                                       IP address of the node you are using to run cephadm bootstrap.
--mgr-id MGR_ID                                           Host ID where a MGR node should be installed. Default: randomly generated.
--fsid FSID                                               Cluster FSID.
--output-dir OUTPUT_DIR                                   Use this directory to write config, keyring, and pub key files.
--output-keyring OUTPUT_KEYRING                           Use this location to write the keyring file with the new cluster admin and mon keys.
--output-config OUTPUT_CONFIG                             Use this location to write the configuration file to connect to the new cluster.
--output-pub-ssh-key OUTPUT_PUB_SSH_KEY                   Use this location to write the public SSH key for the cluster.
--skip-ssh                                                Skip the setup of the ssh key on the local host.
--initial-dashboard-user INITIAL_DASHBOARD_USER           Initial user for the dashboard.
--initial-dashboard-password INITIAL_DASHBOARD_PASSWORD   Initial password for the initial dashboard user.
--ssl-dashboard-port SSL_DASHBOARD_PORT                   Port number used to connect with the dashboard using SSL.
--dashboard-key DASHBOARD_KEY                             Dashboard key.
--dashboard-crt DASHBOARD_CRT                             Dashboard certificate.
--ssh-config SSH_CONFIG                                   SSH config.
--ssh-private-key SSH_PRIVATE_KEY                         SSH private key.
--ssh-public-key SSH_PUBLIC_KEY                           SSH public key.
--ssh-user SSH_USER                                       Sets the user for SSH connections to cluster hosts. Passwordless sudo is needed for non-root users.
--skip-mon-network                                        Sets mon public_network based on the bootstrap mon IP.
--skip-dashboard                                          Do not enable the Ceph Dashboard.
--dashboard-password-noupdate                             Disable forced dashboard password change.
--no-minimize-config                                      Do not assimilate and minimize the configuration file.
--skip-ping-check                                         Do not verify that the mon IP is pingable.
--skip-pull                                               Do not pull the latest image before bootstrapping.
--skip-firewalld                                          Do not configure firewalld.
--allow-overwrite                                         Allow the overwrite of existing --output-* config/keyring/ssh files.
--allow-fqdn-hostname                                     Allow fully qualified host name.
--skip-prepare-host                                       Do not prepare the host.
--orphan-initial-daemons                                  Do not create initial mon, mgr, and crash service specs.
--skip-monitoring-stack                                   Do not automatically provision the monitoring stack (prometheus, grafana, alertmanager, node-exporter).
--apply-spec APPLY_SPEC                                   Apply a cluster spec file after bootstrap (copy ssh key, add hosts, and apply services).
--registry-url REGISTRY_URL                               Specifies the URL of the custom registry to log into. For example: cp.icr.io/cp.
--registry-username REGISTRY_USERNAME                     User name of the login account to the custom registry.
--registry-password REGISTRY_PASSWORD                     Password of the login account to the custom registry.
--registry-json REGISTRY_JSON                             JSON file containing registry login information.

Reference
Edit online

For more information about the --skip-monitoring-stack option, see Adding Hosts.

For more information about logging into the registry with the registry-json option, see help for the registry-login
command.

For more information about cephadm options, see help for cephadm.

Configuring a private registry for a disconnected installation


Edit online
You can use a disconnected installation procedure to install cephadm and bootstrap your storage cluster on a private network. A
disconnected installation uses a private registry for installation. Use this procedure when the IBM Storage Ceph nodes do NOT have
access to the Internet during deployment.

Follow this procedure to set up a secure private registry using authentication and a self-signed certificate. Perform these steps on a
node that has both Internet access and access to the local cluster.



NOTE: Using an insecure registry for production is not recommended.

Prerequisites
Edit online

At least one running virtual machine (VM) or server with an active internet connection.

Red Hat Enterprise Linux 8.4 EUS or later.

Login access to cp.icr.io/cp.

Root-level access to all nodes.

Procedure
Edit online

1. Enable the Red Hat Enterprise Linux baseos and appstream repositories:

Example

Red Hat Enterprise Linux 8:

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms

Example

Red Hat Enterprise Linux 9:

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms

2. Enable the ceph-tools repository for both Red Hat Enterprise Linux 8 and Red Hat Enterprise Linux 9:

Red Hat Enterprise Linux 8:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-8.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-8.repo

Red Hat Enterprise Linux 9:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-9.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-9.repo

Repeat the above steps on all the nodes of the storage cluster.

3. Install the podman and httpd-tools packages:

Example

[root@admin ~]# dnf install -y podman httpd-tools

4. Create folders for the private registry:

Example

[root@admin ~]# mkdir -p /opt/registry/{auth,certs,data}

The registry will be stored in /opt/registry and the directories are mounted in the container running the registry.

The auth directory stores the htpasswd file the registry uses for authentication.

The certs directory stores the certificates the registry uses for authentication.

The data directory stores the registry images.

5. Create credentials for accessing the private registry:



Syntax

htpasswd -bBc /opt/registry/auth/htpasswd PRIVATE_REGISTRY_USERNAME PRIVATE_REGISTRY_PASSWORD

The -b option provides the password from the command line.

The -B option stores the password using bcrypt encryption.

The -c option creates the htpasswd file.

Replace PRIVATE_REGISTRY_USERNAME with the username to create for the private registry.

Replace PRIVATE_REGISTRY_PASSWORD with the password to create for the private registry username.

Example

[root@admin ~]# htpasswd -bBc /opt/registry/auth/htpasswd myregistryusername myregistrypassword1

6. Create a self-signed certificate:

Syntax

openssl req -newkey rsa:4096 -nodes -sha256 -keyout /opt/registry/certs/domain.key -x509 -days 365 -out /opt/registry/certs/domain.crt -addext "subjectAltName = DNS:LOCAL_NODE_FQDN"

Replace LOCAL_NODE_FQDN with the fully qualified host name of the private registry node.

NOTE: You will be prompted for the respective options for your certificate. The CN= value is the host name of your node
and should be resolvable by DNS or the /etc/hosts file.

Example

# openssl req -newkey rsa:4096 -nodes -sha256 -keyout /opt/registry/certs/domain.key -x509 -days 365 -out /opt/registry/certs/domain.crt -addext "subjectAltName = DNS:admin.lab.ibm.com"

NOTE: When creating a self-signed certificate, be sure to create a certificate with a proper Subject Alternative Name
(SAN). Podman commands that require TLS verification for certificates that do not include a proper SAN, return the
following error: x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common
Name matching with GODEBUG=x509ignoreCN=0

7. Create a symbolic link to domain.cert to allow skopeo to locate the certificate with the file extension .cert:

Example

[root@admin ~]# ln -s /opt/registry/certs/domain.crt /opt/registry/certs/domain.cert

8. Add the certificate to the trusted list on the private registry node:

Syntax

cp /opt/registry/certs/domain.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust
trust list | grep -i "LOCAL_NODE_FQDN"

Replace LOCAL_NODE_FQDN with the FQDN of the private registry node.

Example

[root@admin ~]# cp /opt/registry/certs/domain.crt /etc/pki/ca-trust/source/anchors/


[root@admin ~]# update-ca-trust
[root@admin ~]# trust list | grep -i "admin.lab.ibm.com"

label: admin.lab.ibm.com

9. Copy the certificate to any nodes that will access the private registry for installation and update the trusted list:

Example

[root@admin ~]# scp /opt/registry/certs/domain.crt root@host01:/etc/pki/ca-trust/source/anchors/
[root@admin ~]# ssh root@host01
[root@host01 ~]# update-ca-trust



[root@host01 ~]# trust list | grep -i "admin.lab.ibm.com"

label: admin.lab.ibm.com

10. Start the local secure private registry:

Syntax

[root@admin ~]# podman run --restart=always --name NAME_OF_CONTAINER \
-p 5000:5000 -v /opt/registry/data:/var/lib/registry:z \
-v /opt/registry/auth:/auth:z \
-v /opt/registry/certs:/certs:z \
-e "REGISTRY_AUTH=htpasswd" \
-e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
-e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
-e "REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt" \
-e "REGISTRY_HTTP_TLS_KEY=/certs/domain.key" \
-e REGISTRY_COMPATIBILITY_SCHEMA1_ENABLED=true \
-d registry:2

Replace NAME_OF_CONTAINER with a name to assign to the container.

Example

[root@admin ~]# podman run --restart=always --name myprivateregistry \
-p 5000:5000 -v /opt/registry/data:/var/lib/registry:z \
-v /opt/registry/auth:/auth:z \
-v /opt/registry/certs:/certs:z \
-e "REGISTRY_AUTH=htpasswd" \
-e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
-e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
-e "REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt" \
-e "REGISTRY_HTTP_TLS_KEY=/certs/domain.key" \
-e REGISTRY_COMPATIBILITY_SCHEMA1_ENABLED=true \
-d registry:2

This starts the private registry on port 5000 and mounts the volumes of the registry directories in the container running the
registry.

11. On the local registry node, verify that cp.icr.io/cp is in the container registry search path.

a. Open for editing the /etc/containers/registries.conf file, and add cp.icr.io/cp to the unqualified-
search-registries list, if it does not exist:

Example

unqualified-search-registries = ["cp.icr.io/cp"]

12. Log in to cp.icr.io/cp with your IBM Customer Portal credentials:

Syntax

podman login cp.icr.io/cp

13. Copy the following IBM Storage Ceph 5 image, Prometheus images, and Dashboard image from the IBM Customer Portal to
the private registry:

Table 1. Custom image details for monitoring stack

Monitoring stack component   IBM Storage Ceph version          Image details
Prometheus                   All IBM Storage Ceph 5 versions   cp.icr.io/cp/ibm-ceph/prometheus:v4.10
Grafana                      All IBM Storage Ceph 5 versions   cp.icr.io/cp/ibm-ceph/ceph-5-dashboard-rhel8:latest
Node-exporter                All IBM Storage Ceph 5 versions   cp.icr.io/cp/ibm-ceph/prometheus-node-exporter:v4.10
AlertManager                 IBM Storage Ceph 5.3              cp.icr.io/cp/ibm-ceph/prometheus-alertmanager:v4.10
HAProxy                      IBM Storage Ceph 5.3              cp.icr.io/cp/ibm-ceph/haproxy-rhel8:latest
Keepalived                   IBM Storage Ceph 5.3              cp.icr.io/cp/ibm-ceph/keepalived-rhel8:latest
SNMP Gateway                 IBM Storage Ceph 5.3              cp.icr.io/cp/ibm-ceph/snmp-notifier-rhel8:latest

Syntax



podman run -v /CERTIFICATE_DIRECTORY_PATH:/certs:Z \
-v /CERTIFICATE_DIRECTORY_PATH/domain.cert:/certs/domain.cert:Z \
--rm registry.redhat.io/rhel8/skopeo:8.5-8 skopeo copy --remove-signatures \
--src-creds IBM_CUSTOMER_PORTAL_LOGIN:IBM_CUSTOMER_PORTAL_PASSWORD \
--dest-cert-dir=./certs/ \
--dest-creds PRIVATE_REGISTRY_USERNAME:PRIVATE_REGISTRY_PASSWORD \
docker://cp.icr.io/cp/SRC_IMAGE:SRC_TAG \
docker://LOCAL_NODE_FQDN:5000/DST_IMAGE:DST_TAG

Replace CERTIFICATE_DIRECTORY_PATH with the directory path to the self-signed certificates.

Replace IBM_CUSTOMER_PORTAL_LOGIN and IBM_CUSTOMER_PORTAL_PASSWORD with your IBM Customer Portal


credentials.

Replace PRIVATE_REGISTRY_USERNAME and PRIVATE_REGISTRY_PASSWORD with the private registry credentials.

Replace SRC_IMAGE and SRC_TAG with the name and tag of the image to copy from cp.icr.io/cp.

Replace DST_IMAGE and DST_TAG with the name and tag of the image to copy to the private registry.

Replace LOCAL_NODE_FQDN with the FQDN of the private registry.

Example

[root@admin ~]# podman run -v /opt/registry/certs:/certs:Z \
-v /opt/registry/certs/domain.cert:/certs/domain.cert:Z \
--rm registry.redhat.io/rhel8/skopeo:8.5-8 skopeo copy --remove-signatures \
--src-creds myusername:mypassword1 --dest-cert-dir=./certs/ \
--dest-creds myregistryusername:myregistrypassword1 \
docker://cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest \
docker://admin.lab.ibm.com:5000/ibm-ceph/ceph-5-rhel8:latest

[root@admin ~]# podman run -v /opt/registry/certs:/certs:Z \
-v /opt/registry/certs/domain.cert:/certs/domain.cert:Z \
--rm registry.redhat.io/rhel8/skopeo:8.5-8 skopeo copy --remove-signatures \
--src-creds myusername:mypassword1 --dest-cert-dir=./certs/ \
--dest-creds myregistryusername:myregistrypassword1 \
docker://cp.icr.io/cp/ibm-ceph/prometheus-node-exporter:v4.10 \
docker://admin.lab.ibm.com:5000/ibm-ceph/prometheus-node-exporter:v4.10

[root@admin ~]# podman run -v /opt/registry/certs:/certs:Z \
-v /opt/registry/certs/domain.cert:/certs/domain.cert:Z \
--rm registry.redhat.io/rhel8/skopeo:8.5-8 skopeo copy --remove-signatures \
--src-creds myusername:mypassword1 --dest-cert-dir=./certs/ \
--dest-creds myregistryusername:myregistrypassword1 \
docker://cp.icr.io/cp/ibm-ceph/ceph-5-dashboard-rhel8:latest \
docker://admin.lab.ibm.com:5000/ibm-ceph/ceph-5-dashboard-rhel8:latest

[root@admin ~]# podman run -v /opt/registry/certs:/certs:Z \
-v /opt/registry/certs/domain.cert:/certs/domain.cert:Z \
--rm registry.redhat.io/rhel8/skopeo:8.5-8 skopeo copy --remove-signatures \
--src-creds myusername:mypassword1 --dest-cert-dir=./certs/ \
--dest-creds myregistryusername:myregistrypassword1 \
docker://cp.icr.io/cp/ibm-ceph/prometheus:v4.10 \
docker://admin.lab.ibm.com:5000/ibm-ceph/prometheus:v4.10

[root@admin ~]# podman run -v /opt/registry/certs:/certs:Z \
-v /opt/registry/certs/domain.cert:/certs/domain.cert:Z \
--rm registry.redhat.io/rhel8/skopeo:8.5-8 skopeo copy --remove-signatures \
--src-creds myusername:mypassword1 --dest-cert-dir=./certs/ \
--dest-creds myregistryusername:myregistrypassword1 \
docker://cp.icr.io/cp/ibm-ceph/prometheus-alertmanager:v4.10 \
docker://admin.lab.ibm.com:5000/ibm-ceph/prometheus-alertmanager:v4.10

14. Using the curl command, verify the images reside in the local registry:

Syntax

curl -u PRIVATE_REGISTRY_USERNAME:PRIVATE_REGISTRY_PASSWORD https://LOCAL_NODE_FQDN:5000/v2/_catalog

Example

[root@admin ~]# curl -u myregistryusername:myregistrypassword1 https://admin.lab.ibm.com:5000/v2/_catalog

{"repositories":["ibm-ceph/prometheus","ibm-ceph/prometheus-alertmanager","ibm-ceph/prometheus-node-exporter","ibm-ceph/ceph-5-dashboard-rhel8","ibm-ceph/ceph-5-rhel8"]}



Reference
Edit online

See the Red Hat knowledge centered solution What are the Red Hat Ceph Storage releases and corresponding Ceph package
versions? for different image Ceph package versions.

Running the preflight playbook for a disconnected installation


Edit online
You use the cephadm-preflight.yml Ansible playbook to configure the Ceph repository and prepare the storage cluster for
bootstrapping. It also installs some prerequisites, such as podman, lvm2, chronyd, and cephadm.

IMPORTANT: Skip these steps for Red Hat Enterprise Linux 9 as the cephadm-preflight playbook is not supported.

The preflight playbook uses the cephadm-ansible inventory hosts file to identify all the nodes in the storage cluster. The default
location for cephadm-ansible, cephadm-preflight.yml, and the inventory hosts file is /usr/share/cephadm-ansible/.

The following example shows the structure of a typical inventory file:

Example

host02
host03
host04

[admin]
host01

The [admin] group in the inventory file contains the name of the node where the admin keyring is stored.

NOTE: Run the preflight playbook before you bootstrap the initial host.

Prerequisites

The cephadm-ansible package is installed on the Ansible administration node.

Root-level access to all nodes in the storage cluster.

Passwordless ssh is set up on all hosts in the storage cluster.

Nodes configured to access a local YUM repository server with the following repositories enabled on respective Red Hat
Enterprise Linux versions.

rhel-8-for-x86_64-baseos-rpms

rhel-8-for-x86_64-appstream-rpms

curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-8.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-8.repo

rhel-9-for-x86_64-baseos-rpms

rhel-9-for-x86_64-appstream-rpms

curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-9.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-9.repo

NOTE: For more information about setting up a local YUM repository, see the Red Hat knowledge base article Creating a Local
Repository and Sharing with Disconnected/Offline/Air-gapped Systems

Procedure

1. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node.

2. Open and edit the hosts file and add your nodes.



3. Install the license package for IBM Storage Ceph on all nodes:

Example

[root@admin ~]# dnf install ibm-storage-ceph-license

a. Accept the license provisions:

Example

[root@admin ~]# sudo touch /usr/share/ibm-storage-ceph-license/accept

4. Run the preflight playbook with the ceph_origin parameter set to custom to use a local YUM repository:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=custom" -e "custom_repo_url=CUSTOM_REPO_URL"

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=custom" -e "custom_repo_url=http://mycustomrepo.lab.ibm.com/x86_64/os/"

After installation is complete, cephadm resides in the /usr/sbin/ directory.

5. Alternatively, you can use the --limit option to run the preflight playbook on a selected set of hosts in the storage cluster:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=custom" -e "custom_repo_url=CUSTOM_REPO_URL" --limit GROUP_NAME|NODE_NAME

Replace GROUP_NAME with a group name from your inventory file. Replace NODE_NAME with a specific node name from your
inventory file.

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=custom" -e "custom_repo_url=http://mycustomrepo.lab.ibm.com/x86_64/os/" --limit clients
[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=custom" -e "custom_repo_url=http://mycustomrepo.lab.ibm.com/x86_64/os/" --limit host02

NOTE: When you run the preflight playbook, cephadm-ansible automatically installs chronyd and ceph-common on the
client nodes.

Performing a disconnected installation


Edit online
Before you can perform the installation, you must obtain an IBM Storage Ceph container image, either from a proxy host that has
access to the IBM registry or by copying the image to your local registry.

NOTE: If your local registry uses a self-signed certificate, ensure that you have added the trusted root certificate to the bootstrap host. For more information, see Configuring a private registry for a disconnected installation.

IMPORTANT: Before you begin the bootstrapping process, make sure that the container image that you want to use has the same
version of IBM Storage Ceph as cephadm. If the two versions do not match, bootstrapping fails at the Creating initial admin
user stage.

Prerequisites
Edit online

At least one running virtual machine (VM) or server.

Root-level access to all nodes.



Passwordless ssh is set up on all hosts in the storage cluster.

The preflight playbook has been run on the bootstrap host in the storage cluster. For more information, see Running the
preflight playbook for a disconnected installation.

A private registry has been configured and the bootstrap node has access to it. For more information, see Configuring a private
registry for a disconnected installation

An IBM Storage Ceph container image resides in the custom registry.

Procedure
Edit online

1. Log in to the bootstrap host.

2. Bootstrap the storage cluster:

Syntax

cephadm --image PRIVATE_REGISTRY_NODE_FQDN:5000/CUSTOM_IMAGE_NAME:IMAGE_TAG bootstrap --mon-ip IP_ADDRESS --registry-url PRIVATE_REGISTRY_NODE_FQDN:5000 --registry-username PRIVATE_REGISTRY_USERNAME --registry-password PRIVATE_REGISTRY_PASSWORD

Replace PRIVATE_REGISTRY_NODE_FQDN with the fully qualified domain name of your private registry.

Replace CUSTOM_IMAGE_NAME and IMAGE_TAG with the name and tag of the IBM Storage Ceph container image that
resides in the private registry.

Replace IP_ADDRESS with the IP address of the node you are using to run cephadm bootstrap.

Replace PRIVATE_REGISTRY_USERNAME with the username to create for the private registry.

Replace PRIVATE_REGISTRY_PASSWORD with the password to create for the private registry username.

Example

[root@host01 ~]# cephadm --image admin.lab.ibm.com:5000/ibm-ceph/ceph-5-rhel8:latest bootstrap --mon-ip 10.10.128.68 --registry-url admin.lab.ibm.com:5000 --registry-username myregistryusername --registry-password myregistrypassword1

The script takes a few minutes to complete. Once the script completes, it provides the credentials to the IBM Storage
Ceph Dashboard URL, a command to access the Ceph command-line interface (CLI), and a request to enable telemetry.

Ceph Dashboard is now available at:

URL: https://host01:8443/
User: admin
Password: i8nhu7zham

Enabling client.admin keyring and conf on hosts with "admin" label

You can access the Ceph CLI with:

sudo /usr/sbin/cephadm shell --fsid 266ee7a8-2a05-11eb-b846-5254002d4916 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Please consider enabling telemetry to help improve Ceph:

ceph telemetry on

For more information see:

https://docs.ceph.com/docs/master/mgr/telemetry/

Bootstrap complete.

After the bootstrap process is complete, configure the container images, as detailed in Changing configurations of custom container
images for disconnected installations.

Once your storage cluster is up and running, configure additional daemons and services. For more information, see Operations.



Changing configurations of custom container images for
disconnected installations
Edit online
After you perform the initial bootstrap for disconnected nodes, you must specify custom container images for monitoring stack
daemons. You can override the default container images for monitoring stack daemons, since the nodes do not have access to the
default container registry.

NOTE: Make sure that the bootstrap process on the initial host is complete before making any configuration changes.

By default, the monitoring stack components are deployed based on the primary Ceph image. For a disconnected storage cluster environment, use the latest available monitoring stack component images.

NOTE: When using a custom registry, be sure to log in to the custom registry on newly added nodes before adding any Ceph
daemons.

Syntax

# ceph cephadm registry-login --registry-url CUSTOM_REGISTRY_NAME --registry_username REGISTRY_USERNAME --registry_password REGISTRY_PASSWORD

Example

# ceph cephadm registry-login --registry-url myregistry --registry_username myregistryusername --registry_password myregistrypassword1

Prerequisites
Edit online

At least one running virtual machine (VM) or server.

Red Hat Enterprise Linux 8.4 EUS or later.

Root-level access to all nodes.

Passwordless ssh is set up on all hosts in the storage cluster.

Procedure
Edit online

1. Set the custom container images with the ceph config command:

Syntax

ceph config set mgr mgr/cephadm/OPTION_NAME CUSTOM_REGISTRY_NAME/CONTAINER_NAME

Use the following options for OPTION_NAME:

container_image_prometheus
container_image_grafana
container_image_alertmanager
container_image_node_exporter

Example

[root@host01 ~]# ceph config set mgr mgr/cephadm/container_image_prometheus myregistry/mycontainer
[root@host01 ~]# ceph config set mgr mgr/cephadm/container_image_grafana myregistry/mycontainer
[root@host01 ~]# ceph config set mgr mgr/cephadm/container_image_alertmanager myregistry/mycontainer
[root@host01 ~]# ceph config set mgr mgr/cephadm/container_image_node_exporter myregistry/mycontainer



2. Redeploy node-exporter:

Syntax

ceph orch redeploy node-exporter

NOTE: If any of the services do not deploy, you can redeploy them with the ceph orch redeploy command.

NOTE: By setting a custom image, the default values for the configuration image name and tag will be overridden, but not
overwritten. The default values change when updates become available. By setting a custom image, you will not be able to configure
the component for which you have set the custom image for automatic updates. You will need to manually update the configuration
image name and tag to be able to install updates.
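For example, when a newer monitoring image becomes available in your local registry, a minimal manual update sketch could look like the following; the v4.12 tag is an assumed example, not a released version:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/container_image_prometheus myregistry/mycontainer:v4.12
[ceph: root@host01 /]# ceph orch redeploy prometheus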

If you choose to revert to using the default configuration, you can reset the custom container image. Use ceph config rm to
reset the configuration option:

Syntax

ceph config rm mgr mgr/cephadm/OPTION_NAME

Example

ceph config rm mgr mgr/cephadm/container_image_prometheus

Reference
Edit online

Performing a disconnected installation

Distributing SSH keys


Edit online
You can use the cephadm-distribute-ssh-key.yml playbook to distribute the SSH keys instead of creating and distributing the
keys manually. The playbook distributes an SSH public key over all hosts in the inventory.

You can also generate an SSH key pair on the Ansible administration node and distribute the public key to each node in the storage
cluster so that Ansible can access the nodes without being prompted for a password.
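A minimal sketch of that manual alternative, using standard OpenSSH tooling rather than the playbook; the user name and target host are examples only:

Example

[ansible@admin ~]$ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
[ansible@admin ~]$ ssh-copy-id ceph-admin@host02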

Prerequisites
Edit online

Ansible is installed on the administration node.

Access to the Ansible administration node.

Ansible user with sudo access to all nodes in the storage cluster.

Bootstrapping is completed. See Bootstrapping a new storage cluster for more details.

Procedure
Edit online

1. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

2. From the Ansible administration node, distribute the SSH keys. The optional cephadm_pubkey_path parameter is the full
path name of the SSH public key file on the ansible controller host.



NOTE: If cephadm_pubkey_path is not specified, the playbook gets the key from the cephadm get-pub-key command.
This implies that you have at least bootstrapped a minimal cluster.
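If you want to export that key yourself, for example to inspect it or to pass it explicitly as cephadm_pubkey_path, a minimal sketch follows; the output path is an arbitrary example:

Example

[root@host01 ~]# ceph cephadm get-pub-key > ~/ceph.pub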

Syntax

ansible-playbook -i INVENTORY_HOST_FILE cephadm-distribute-ssh-key.yml -e cephadm_ssh_user=USER_NAME -e cephadm_pubkey_path=PUBLIC_KEY_FILE_PATH -e admin_node=ADMIN_NODE_NAME_1

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-distribute-ssh-key.yml -e cephadm_ssh_user=ceph-admin -e cephadm_pubkey_path=/home/cephadm/ceph.key -e admin_node=host01

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-distribute-ssh-key.yml -e cephadm_ssh_user=ceph-admin -e admin_node=host01

Launching the cephadm shell

Edit online
The cephadm shell command launches a bash shell in a container with all of the Ceph packages installed. This enables you to
perform “Day One” cluster setup tasks, such as installation and bootstrapping, and to invoke ceph commands.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all nodes in the storage cluster.

Procedure
Edit online
There are two ways to launch the cephadm shell:

Enter cephadm shell at the system prompt. This example invokes the ceph -s command from within the shell.

Example

[root@host01 ~]# cephadm shell
[ceph: root@host01 /]# ceph -s

At the system prompt, type cephadm shell and the command you want to execute:

Example

[root@host01 ~]# cephadm shell ceph -s


cluster:
id: f64f341c-655d-11eb-8778-fa163e914bcc
health: HEALTH_OK

services:
mon: 3 daemons, quorum host01,host02,host03 (age 94m)
mgr: host01.lbnhug(active, since 59m), standbys: host02.rofgay, host03.ohipra
mds: 1/1 daemons up, 1 standby
osd: 18 osds: 18 up (since 10m), 18 in (since 10m)
rgw: 4 daemons active (2 hosts, 1 zones)

data:
volumes: 1/1 healthy
pools: 8 pools, 225 pgs
objects: 230 objects, 9.9 KiB
usage: 271 MiB used, 269 GiB / 270 GiB avail
pgs: 225 active+clean



io:
client: 85 B/s rd, 0 op/s rd, 0 op/s wr

NOTE: If the node contains configuration and keyring files in /etc/ceph/, the container environment uses the values in those files
as defaults for the cephadm shell. If you execute the cephadm shell on a MON node, the cephadm shell inherits its default
configuration from the MON container, instead of using the default configuration.
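To remove any ambiguity about which configuration the shell uses, you can point it at explicit files; a minimal sketch, reusing the default paths shown in the bootstrap output:

Example

[root@host01 ~]# cephadm shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring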

Verifying the cluster installation


Edit online
Once the cluster installation is complete, you can verify that the IBM Storage Ceph 5.3 installation is running properly.

There are two ways of verifying the storage cluster installation as a root user:

Run the podman ps command.

Run the cephadm shell ceph -s.

Prerequisites
Edit online

Root-level access to all nodes in the storage cluster.

Procedure
Edit online

Run the podman ps command:

Example

[root@host01 ~]# podman ps

NOTE: In the NAMES column, the unit files now include the FSID.
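If many containers are running on the node, you can narrow the output to the Ceph containers; a small sketch using a standard podman name filter:

Example

[root@host01 ~]# podman ps --filter name=ceph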

Run the cephadm shell ceph -s command:

Example

[root@host01 ~]# cephadm shell ceph -s

cluster:
id: f64f341c-655d-11eb-8778-fa163e914bcc
health: HEALTH_OK

services:
mon: 3 daemons, quorum host01,host02,host03 (age 94m)
mgr: host01.lbnhug(active, since 59m), standbys: host02.rofgay, host03.ohipra
mds: 1/1 daemons up, 1 standby
osd: 18 osds: 18 up (since 10m), 18 in (since 10m)
rgw: 4 daemons active (2 hosts, 1 zones)

data:
volumes: 1/1 healthy
pools: 8 pools, 225 pgs
objects: 230 objects, 9.9 KiB
usage: 271 MiB used, 269 GiB / 270 GiB avail
pgs: 225 active+clean

io:
client: 85 B/s rd, 0 op/s rd, 0 op/s wr

NOTE: If the hosts and the daemons have not been added yet, the health of the storage cluster shows a HEALTH_WARN status.



Adding hosts
Edit online
Bootstrapping the IBM Storage Ceph installation creates a working storage cluster, consisting of one Monitor daemon and one
Manager daemon within the same container. As a storage administrator, you can add additional hosts to the storage cluster and
configure them.

NOTE: For Red Hat Enterprise Linux 8, running the preflight playbook installs podman, lvm2, chronyd, and cephadm on all hosts
listed in the Ansible inventory file.

NOTE: For Red Hat Enterprise Linux 9, you need to manually install podman, lvm2, chronyd, and cephadm on all hosts, and skip the steps that run Ansible playbooks, because the preflight playbook is not supported.

NOTE: When using a custom registry, be sure to log in to the custom registry on newly added nodes before adding any Ceph
daemons.

Syntax

# ceph cephadm registry-login --registry-url CUSTOM_REGISTRY_NAME --registry_username REGISTRY_USERNAME --registry_password REGISTRY_PASSWORD

Example

# ceph cephadm registry-login --registry-url myregistry --registry_username myregistryusername --registry_password myregistrypassword1

Using the addr option to identify hosts


Adding multiple hosts
Adding hosts in disconnected deployments

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access, or a user with sudo access, to all nodes in the storage cluster.

Nodes are registered to the IBM subscription.

Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.

Procedure
Edit online

1. From the node that contains the admin keyring, install the storage cluster’s public SSH key in the root user’s
authorized_keys file on the new host:

NOTE: In the following procedure, use either root, as indicated, or the username with which the user is bootstrapped.

Syntax

ssh-copy-id -f -i /etc/ceph/ceph.pub user@NEWHOST

Example

[root@host01 ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host02
[root@host01 ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host03

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node.

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible



3. From the Ansible administration node, add the new host to the Ansible inventory file. The default location for the file is
/usr/share/cephadm-ansible/hosts. The following example shows the structure of a typical inventory file:

Example

[ansible@admin ~]$ cat hosts

host02
host03
host04

[admin]
host01

NOTE: If you have previously added the new host to the Ansible inventory file and run the preflight playbook on the host, skip
to step 4.

4. Run the preflight playbook with the --limit option:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit NEWHOST

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit host02

The preflight playbook installs podman, lvm2, chronyd, and cephadm on the new host. After installation is complete,
cephadm resides in the /usr/sbin/ directory.

For Red Hat Enterprise Linux 9, install podman, lvm2, chronyd, and cephadm manually:

Example

[root@host01 ~]# dnf install podman lvm2 chronyd cephadm

5. From the bootstrap node, use the cephadm orchestrator to add the new host to the storage cluster:

Syntax

ceph orch host add NEWHOST

Example

[ceph: root@host01 /]# ceph orch host add host02
Added host 'host02' with addr '10.10.128.69'
[ceph: root@host01 /]# ceph orch host add host03
Added host 'host03' with addr '10.10.128.70'

6. Optional: You can also add nodes by IP address, before and after you run the preflight playbook. If you do not have DNS
configured in your storage cluster environment, you can add the hosts by IP address, along with the host names.

Syntax

ceph orch host add HOSTNAME IP_ADDRESS

Example

[ceph: root@host01 /]# ceph orch host add host02 10.10.128.69
Added host 'host02' with addr '10.10.128.69'

View the status of the storage cluster and verify that the new host has been added. The STATUS of the hosts is blank in the output of the ceph orch host ls command.

Example

[ceph: root@host01 /]# ceph orch host ls

Reference



Edit online

Registering IBM Storage Ceph nodes to the CDN and attaching subscriptions

Creating an Ansible user with sudo access

Using the addr option to identify hosts

Edit online
The addr option offers an additional way to contact a host. Add the IP address of the host to the addr option. If ssh cannot connect
to the host by its hostname, then it uses the value stored in addr to reach the host by its IP address.

Prerequisites
Edit online

A storage cluster that has been installed and bootstrapped.

Root-level access to all nodes in the storage cluster.

Procedure
Edit online

1. Log in to the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Add the IP address:

Syntax

ceph orch host add HOSTNAME IP_ADDR

Example

[ceph: root@host01 /]# ceph orch host add host01 10.10.128.68

NOTE: If adding a host by hostname results in that host being added with an IPv6 address instead of an IPv4 address, use ceph orch host set-addr to specify the IP address of that host:

Syntax

ceph orch host set-addr HOSTNAME IP_ADDR

To convert the IP address from IPv6 format to IPv4 format for a host you have added, use the following command:

ceph orch host set-addr HOSTNAME IPV4_ADDRESS
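A brief usage sketch, reusing the host03 address from the earlier examples:

Example

[ceph: root@host01 /]# ceph orch host set-addr host03 10.10.128.70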

Adding multiple hosts


Edit online
Use a YAML file to add multiple hosts to the storage cluster at the same time.

NOTE: Be sure to create the hosts.yaml file within a host container, or create the file on the local host and then use the cephadm
shell to mount the file within the container. The cephadm shell automatically places mounted files in /mnt. If you create the file
directly on the local host and then apply the hosts.yaml file instead of mounting it, you might see a File does not exist error.

Prerequisites
Edit online

A storage cluster that has been installed and bootstrapped.

Root-level access to all nodes in the storage cluster.

Procedure
Edit online

1. Copy over the public ssh key to each of the hosts that you want to add.

2. Use a text editor to create a hosts.yaml file.

3. Add the host descriptions to the hosts.yaml file, as shown in the following example. Include the labels to identify
placements for the daemons that you want to deploy on each host. Separate each host description with three dashes (---).

Example

service_type: host
addr:
hostname: host02
labels:
- mon
- osd
- mgr
---
service_type: host
addr:
hostname: host03
labels:
- mon
- osd
- mgr
---
service_type: host
addr:
hostname: host04
labels:
- mon
- osd

4. If you created the hosts.yaml file within the host container, invoke the ceph orch apply command:

Example

[root@host01 ~]# ceph orch apply -i hosts.yaml
Added host 'host02' with addr '10.10.128.69'
Added host 'host03' with addr '10.10.128.70'
Added host 'host04' with addr '10.10.128.71'

5. If you created the hosts.yaml file directly on the local host, use the cephadm shell to mount the file:

Example

[root@host01 ~]# cephadm shell --mount hosts.yaml -- ceph orch apply -i /mnt/hosts.yaml

6. View the list of hosts and their labels:

Example

[root@host01 ~]# ceph orch host ls


HOST ADDR LABELS STATUS
host02 host02 mon osd mgr
host03 host03 mon osd mgr
host04 host04 mon osd

NOTE: If a host is online and operating normally, its status is blank. An offline host shows a status of OFFLINE, and a host in
maintenance mode shows a status of MAINTENANCE.
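If you need to place a host into maintenance mode yourself, the orchestrator provides maintenance subcommands; a brief sketch, using host04 as an example:

Example

[ceph: root@host01 /]# ceph orch host maintenance enter host04
[ceph: root@host01 /]# ceph orch host maintenance exit host04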



Adding hosts in disconnected deployments
Edit online
If you are running a storage cluster on a private network and your host domain name server (DNS) cannot be reached through private
IP, you must include both the host name and the IP address for each host you want to add to the storage cluster.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all hosts in the storage cluster.

Procedure
Edit online

1. Log into the cephadm shell.

Example

[root@host01 ~]# cephadm shell

2. Add the host:

Syntax

ceph orch host add HOST_NAME HOST_ADDRESS

Example

[ceph: root@host01 /]# ceph orch host add host03 10.10.128.70

Removing hosts
Edit online
You can remove hosts of a Ceph cluster with the Ceph Orchestrators. All of the daemons are removed with the drain option, which adds the _no_schedule label to the host to ensure that you cannot deploy any daemons on it until the operation is complete.

IMPORTANT: If you are removing the bootstrap host, be sure to copy the admin keyring and the configuration file to another host in
the storage cluster before you remove the host.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the storage cluster.

All the services are deployed.

Cephadm is deployed on the nodes where the services have to be removed.

Procedure
Edit online



1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Fetch the host details:

Example

[ceph: root@host01 /]# ceph orch host ls

3. Drain all the daemons from the host:

Syntax

ceph orch host drain HOSTNAME

Example

[ceph: root@host01 /]# ceph orch host drain host02

The _no_schedule label is automatically applied to the host, which blocks deployment.

4. Check the status of OSD removal:

Example

[ceph: root@host01 /]# ceph orch osd rm status

When no placement groups (PG) are left on the OSD, the OSD is decommissioned and removed from the storage cluster.

5. Check if all the daemons are removed from the storage cluster:

Syntax

ceph orch ps HOSTNAME

Example

[ceph: root@host01 /]# ceph orch ps host02

6. Remove the host:

Syntax

ceph orch host rm HOSTNAME

Example

[ceph: root@host01 /]# ceph orch host rm host02

Reference
Edit online

Adding hosts using the Ceph Orchestrator

Listing hosts using the Ceph Orchestrator

Labeling hosts
Edit online
The Ceph orchestrator supports assigning labels to hosts. Labels are free-form and have no specific meanings. This means that you
can use mon, monitor, mycluster_monitor, or any other text string. Each host can have multiple labels.

For example, apply the mon label to all hosts on which you want to deploy Ceph Monitor daemons, mgr for all hosts on which you
want to deploy Ceph Manager daemons, rgw for Ceph Object Gateway daemons, and so on.



Labeling all the hosts in the storage cluster helps to simplify system management tasks by allowing you to quickly identify the
daemons running on each host. In addition, you can use the Ceph orchestrator or a YAML file to deploy or remove daemons on hosts
that have specific host labels.

Adding a label to a host


Removing a label from a host
Using host labels to deploy daemons on specific hosts

Prerequisites
Edit online

A storage cluster that has been installed and bootstrapped.

Adding a label to a host


Edit online
You can use the Ceph orchestrator to add a label to a host. Each host can have multiple labels. Labels can be used to specify
placement of daemons.

Prerequisites
Edit online

A storage cluster that has been installed and bootstrapped.

Root-level access to all nodes in the storage cluster.

Hosts are added to the storage cluster.

Procedure
Edit online

1. Launch the cephadm shell:

[root@host01 ~]# cephadm shell
[ceph: root@host01 /]#

2. Add a label to a host:

Syntax

ceph orch host label add HOSTNAME LABEL

Example

[ceph: root@host01 /]# ceph orch host label add host02 mon

Verification
Edit online

List the hosts:

Example

[ceph: root@host01 /]# ceph orch host ls

Removing a label from a host


Edit online
You can use the Ceph orchestrator to remove a label from a host.

Prerequisites
Edit online

A storage cluster that has been installed and bootstrapped.

Root-level access to all nodes in the storage cluster.

Procedure
Edit online

1. Launch the cephadm shell:

[root@host01 ~]# cephadm shell
[ceph: root@host01 /]#

2. Use the ceph orchestrator to remove a label from a host:

Syntax

ceph orch host label rm HOSTNAME LABEL

Example

[ceph: root@host01 /]# ceph orch host label rm host02 mon

Verification
Edit online

List the hosts:

Example

[ceph: root@host01 /]# ceph orch host ls

Using host labels to deploy daemons on specific hosts


Edit online
You can use host labels to deploy daemons to specific hosts. There are two ways to use host labels to deploy daemons on specific
hosts:

By using the --placement option from the command line.

By using a YAML file.

Prerequisites
Edit online

A storage cluster that has been installed and bootstrapped.

Root-level access to all nodes in the storage cluster.

Procedure
Edit online



1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List current hosts and labels:

Example

[ceph: root@host01 /]# ceph orch host ls

HOST ADDR LABELS STATUS
host01 _admin mon osd mgr
host02 mon osd mgr mylabel

Method 1: Use the --placement option to deploy a daemon from the command line:

Syntax

ceph orch apply DAEMON --placement="label:LABEL"

Example

[ceph: root@host01 /]# ceph orch apply prometheus --placement="label:mylabel"

Method 2: To assign the daemon to a specific host label in a YAML file, specify the service type and label in the YAML
file:

Create the placement.yml file:

Example

[ceph: root@host01 /]# vi placement.yml

Specify the service type and label in the placement.yml file:

Example

service_type: prometheus
placement:
label: "mylabel"

Apply the daemon placement file:

Syntax

ceph orch apply -i FILENAME

Example

[ceph: root@host01 /]# ceph orch apply -i placement.yml
Scheduled prometheus update…

Verification
Edit online

List the status of the daemons:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=prometheus


NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION
IMAGE ID CONTAINER ID
prometheus.host02 host02 *:9095 running (2h) 8m ago 2h 85.3M - 2.22.2
ac25aac5d567 ad8c7593d7c0



Adding Monitor service
Edit online
A typical IBM Storage Ceph storage cluster has three or five monitor daemons deployed on different hosts. If your storage cluster
has five or more hosts, IBM recommends that you deploy five Monitor nodes.

NOTE: In the case of a firewall, see Firewall settings for Ceph Monitor node

NOTE: The bootstrap node is the initial monitor of the storage cluster. Be sure to include the bootstrap node in the list of hosts to
which you want to deploy.

NOTE: If you want to apply Monitor service to more than one specific host, be sure to specify all of the host names within the same
ceph orch apply command. If you specify ceph orch apply mon --placement host1 and then specify ceph orch
apply mon --placement host2, the second command removes the Monitor service on host1 and applies a Monitor service to
host2.

If your Monitor nodes or your entire cluster are located on a single subnet, then cephadm automatically adds up to five Monitor
daemons as you add new hosts to the cluster. cephadm automatically configures the Monitor daemons on the new hosts. The new
hosts reside on the same subnet as the first (bootstrap) host in the storage cluster. cephadm can also deploy and scale monitors to
correspond to changes in the size of the storage cluster.

Prerequisites
Edit online

Root-level access to all hosts in the storage cluster.

A running storage cluster.

Procedure
Edit online

1. Apply the five Monitor daemons to five random hosts in the storage cluster:

ceph orch apply mon 5

2. Disable automatic Monitor deployment:

ceph orch apply mon --unmanaged

Adding Monitor nodes to specific hosts

Use host labels to identify the hosts that contain Monitor nodes.

Root-level access to all nodes in the storage cluster.

A running storage cluster.

1. Assign the mon label to the host:

Syntax

ceph orch host label add HOSTNAME mon

Example

[ceph: root@host01 /]# ceph orch host label add host01 mon

2. View the current hosts and labels:

Syntax

ceph orch host ls

Example



[ceph: root@host01 /]# ceph orch host label add host02 mon
[ceph: root@host01 /]# ceph orch host label add host03 mon
[ceph: root@host01 /]# ceph orch host ls
HOST ADDR LABELS STATUS
host01 mon
host02 mon
host03 mon
host04
host05
host06

3. Deploy monitors based on the host label:

Syntax

ceph orch apply mon label:mon

4. Deploy monitors on a specific set of hosts:

Syntax

ceph orch apply mon HOSTNAME1,HOSTNAME2,HOSTNAME3

Example

[root@host01 ~]# ceph orch apply mon host01,host02,host03

NOTE: Be sure to include the bootstrap node in the list of hosts to which you want to deploy.

Setting up the admin node


Edit online
Use an admin node to administer the storage cluster.

An admin node contains both the cluster configuration file and the admin keyring. Both of these files are stored in the directory
/etc/ceph and use the name of the storage cluster as a prefix.

For example, the default ceph cluster name is ceph. In a cluster using the default name, the admin keyring is named
/etc/ceph/ceph.client.admin.keyring. The corresponding cluster configuration file is named /etc/ceph/ceph.conf.

To set up additional hosts in the storage cluster as admin nodes, apply the _admin label to the host you want to designate as an
administrator node.

NOTE: By default, after applying the _admin label to a node, cephadm copies the ceph.conf and client.admin keyring files to
that node. The _admin label is automatically applied to the bootstrap node unless the --skip-admin-label option was specified
with the cephadm bootstrap command.
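For reference, a bootstrap invocation that opts out of the automatic label might look like the following sketch; the Monitor IP address is reused from earlier examples:

Example

[root@host01 ~]# cephadm bootstrap --mon-ip 10.10.128.68 --skip-admin-label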

Deploying Ceph monitor nodes using host labels


Adding Ceph Monitor nodes by IP address or network name
Removing the admin label from a host

Prerequisites
Edit online

A running storage cluster with cephadm installed.

The storage cluster has running Monitor and Manager nodes.

Root-level access to all nodes in the cluster.

Procedure
Edit online

1. Use ceph orch host ls to view the hosts in your storage cluster:



Example

[root@host01 ~]# ceph orch host ls


HOST ADDR LABELS STATUS
host01 mon,mgr,_admin
host02 mon
host03 mon,mgr
host04
host05
host06

2. Use the _admin label to designate the admin host in your storage cluster. For best results, this host should have both Monitor
and Manager daemons running.

Syntax

ceph orch host label add HOSTNAME _admin

Example

[root@host01 ~]# ceph orch host label add host03 _admin

3. Verify that the admin host has the _admin label.

Example

[root@host01 ~]# ceph orch host ls


HOST ADDR LABELS STATUS
host01 mon,mgr,_admin
host02 mon
host03 mon,mgr,_admin
host04
host05
host06

4. Log in to the admin node to manage the storage cluster.

Deploying Ceph monitor nodes using host labels


Edit online
A typical IBM Storage Ceph storage cluster has three or five Ceph Monitor daemons deployed on different hosts. If your storage
cluster has five or more hosts, IBM recommends that you deploy five Ceph Monitor nodes.

If your Ceph Monitor nodes or your entire cluster are located on a single subnet, then cephadm automatically adds up to five Ceph
Monitor daemons as you add new nodes to the cluster. cephadm automatically configures the Ceph Monitor daemons on the new
nodes. The new nodes reside on the same subnet as the first (bootstrap) node in the storage cluster. cephadm can also deploy and
scale monitors to correspond to changes in the size of the storage cluster.

NOTE: Use host labels to identify the hosts that contain Ceph Monitor nodes.

Prerequisites
Edit online

Root-level access to all nodes in the storage cluster.

A running storage cluster.

Procedure
Edit online

1. Assign the mon label to the host:

Syntax

ceph orch host label add HOSTNAME mon



Example

[ceph: root@host01 /]# ceph orch host label add host02 mon
[ceph: root@host01 /]# ceph orch host label add host03 mon

2. View the current hosts and labels:

Syntax

ceph orch host ls

Example

[ceph: root@host01 /]# ceph orch host ls


HOST ADDR LABELS STATUS
host01 mon,mgr,_admin
host02 mon
host03 mon
host04
host05
host06

Deploy Ceph Monitor daemons based on the host label:

Syntax

ceph orch apply mon label:mon

Deploy Ceph Monitor daemons on a specific set of hosts:

Syntax

ceph orch apply mon HOSTNAME1,HOSTNAME2,HOSTNAME3

Example

[ceph: root@host01 /]# ceph orch apply mon host01,host02,host03

NOTE: Be sure to include the bootstrap node in the list of hosts to which you want to deploy.

Adding Ceph Monitor nodes by IP address or network name


Edit online
A typical IBM Storage Ceph storage cluster has three or five monitor daemons deployed on different hosts. If your storage cluster
has five or more hosts, IBM recommends that you deploy five Monitor nodes.

If your Monitor nodes or your entire cluster are located on a single subnet, then cephadm automatically adds up to five Monitor
daemons as you add new nodes to the cluster. You do not need to configure the Monitor daemons on the new nodes. The new nodes
reside on the same subnet as the first node in the storage cluster. The first node in the storage cluster is the bootstrap node.
cephadm can also deploy and scale monitors to correspond to changes in the size of the storage cluster.

Prerequisites
Edit online

Root-level access to all nodes in the storage cluster.

A running storage cluster.

Procedure
Edit online

1. To deploy each additional Ceph Monitor node:

Syntax



ceph orch apply mon NODE:IP_ADDRESS_OR_NETWORK_NAME [NODE:IP_ADDRESS_OR_NETWORK_NAME...]

Example

[ceph: root@host01 /]# ceph orch apply mon host02:10.10.128.69 host03:mynetwork

Removing the admin label from a host


Edit online
You can use the Ceph orchestrator to remove the admin label from a host.

Prerequisites
Edit online

A running storage cluster with cephadm installed and bootstrapped.

The storage cluster has running Monitor and Manager nodes.

Root-level access to all nodes in the cluster.

Procedure
Edit online

1. Use ceph orch host ls to view the hosts and identify the admin host in your storage cluster:

Example

[root@host01 ~]# ceph orch host ls


HOST ADDR LABELS STATUS
host01 mon,mgr,_admin
host02 mon
host03 mon,mgr,_admin
host04
host05
host06

2. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

3. Use the ceph orchestrator to remove the admin label from a host:

Syntax

ceph orch host label rm HOSTNAME LABEL

Example

[ceph: root@host01 /]# ceph orch host label rm host03 _admin

4. Verify that the _admin label has been removed from the host.

Example

[root@host01 ~]# ceph orch host ls


HOST ADDR LABELS STATUS
host01 mon,mgr,_admin
host02 mon
host03 mon,mgr
host04
host05
host06



IMPORTANT: After removing the admin label from a node, ensure that you remove the ceph.conf and client.admin keyring files from that node. Also, the node must be removed from the Ansible inventory file.
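A minimal cleanup sketch on the node that no longer carries the label, using the default file paths named above:

Example

[root@host03 ~]# rm -f /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring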

Adding Manager service


Edit online
cephadm automatically installs a Manager daemon on the bootstrap node during the bootstrapping process. Use the Ceph
orchestrator to deploy additional Manager daemons.

The Ceph orchestrator deploys two Manager daemons by default. To deploy a different number of Manager daemons, specify a
different number. If you do not specify the hosts where the Manager daemons should be deployed, the Ceph orchestrator randomly
selects the hosts and deploys the Manager daemons to them.

NOTE: If you want to apply Manager daemons to more than one specific host, be sure to specify all of the host names within the
same ceph orch apply command. If you specify ceph orch apply mgr --placement host1 and then specify ceph orch
apply mgr --placement host2, the second command removes the Manager daemon on host1 and applies a Manager daemon
to host2.

Use the --placement option to deploy to specific hosts.

Prerequisites
Edit online

A running storage cluster.

Procedure
Edit online

To specify that you want to apply a certain number of Manager daemons to randomly selected hosts:

Syntax

ceph orch apply mgr NUMBER_OF_DAEMONS

Example

[ceph: root@host01 /]# ceph orch apply mgr 3

To add Manager daemons to specific hosts in your storage cluster:

Syntax

ceph orch apply mgr --placement "HOSTNAME1 HOSTNAME2 HOSTNAME3"

Example

[ceph: root@host01 /]# ceph orch apply mgr --placement "host02 host03 host04"

Adding OSDs
Edit online
Cephadm will not provision an OSD on a device that is not available. A storage device is considered available if it meets all of the
following conditions:

The device must have no partitions.

The device must not be mounted.

The device must not contain a file system.



The device must not contain a Ceph BlueStore OSD.

The device must be larger than 5 GB.

IMPORTANT: By default, the osd_memory_target_autotune parameter is set to true in IBM Storage Ceph 5.3.
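You can confirm or change this setting with the ceph config command; a brief sketch:

Example

[ceph: root@host01 /]# ceph config get osd osd_memory_target_autotune
[ceph: root@host01 /]# ceph config set osd osd_memory_target_autotune false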

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Procedure
Edit online

1. List the available devices to deploy OSDs:

Syntax

ceph orch device ls [--hostname=HOSTNAME1 HOSTNAME2] [--wide] [--refresh]

Example

[ceph: root@host01 /]# ceph orch device ls --wide --refresh

2. You can either deploy the OSDs on specific hosts or on all the available devices:

To create an OSD from a specific device on a specific host:

Syntax

ceph orch daemon add osd HOSTNAME:DEVICE_PATH

Example

[ceph: root@host01 /]# ceph orch daemon add osd host02:/dev/sdb

To deploy OSDs on any available and unused devices, use the --all-available-devices option.

Example

[ceph: root@host01 /]# ceph orch apply osd --all-available-devices

NOTE: This command creates colocated WAL and DB daemons. If you want to create non-colocated daemons, do not
use this command.
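For a non-colocated layout, you would instead describe the devices in an OSD service specification, as covered in the Advanced service specifications reference below. A hypothetical sketch that places DB volumes on a separate device; the host name and device paths are examples only:

Example

service_type: osd
service_id: osd_non_colocated_example   # hypothetical service name
placement:
  hosts:
    - host02
spec:
  data_devices:
    paths:
      - /dev/sdb                        # example data device
  db_devices:
    paths:
      - /dev/sdc                        # example DB device

Apply such a specification with ceph orch apply -i FILENAME, as shown earlier for placement files.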

Reference
Edit online

For more information about drive specifications for OSDs, see Advanced service specifications and filters for deploying OSDs

For more information about zapping devices to clear data on devices, see Zapping devices for Ceph OSD deployment

Purging the Ceph storage cluster


Edit online
Purging the Ceph storage cluster clears any data or connections that remain from previous deployments on your server. For Red Hat
Enterprise Linux 8, this Ansible script removes all daemons, logs, and data that belong to the FSID passed to the script from all hosts
in the storage cluster. For Red Hat Enterprise Linux 9, use the cephadm rm-cluster command since Ansible is not supported.

Red Hat Enterprise Linux 8



Edit online
IMPORTANT: This process works only if the cephadm binary is installed on all hosts in the storage cluster.

The Ansible inventory file lists all the hosts in your cluster and what roles each host plays in your Ceph storage cluster. The default
location for an inventory file is /usr/share/cephadm-ansible/hosts, but this file can be placed anywhere.

The following example shows the structure of an inventory file:

Example

host02
host03
host04

[admin]
host01

[clients]
client01
client02
client03

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ansible 2.9 or later is installed on the bootstrap node.

Root-level access to the Ansible administration node.

Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.

The [admin] group is defined in the inventory file with a node where the admin keyring is present at
/etc/ceph/ceph.client.admin.keyring.

Procedure
Edit online

As an Ansible user on the bootstrap node, run the purge script:

Syntax

ansible-playbook -i hosts cephadm-purge-cluster.yml -e fsid=FSID -vvv

Example

[ansible@host01 cephadm-ansible]$ ansible-playbook -i hosts cephadm-purge-cluster.yml -e fsid=a6ca415a-cde7-11eb-a41a-002590fc2544 -vvv

NOTE: An additional extra-var (-e ceph_origin=ibm) is required to zap the disk devices during the purge.
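Under that assumption, the full invocation would look like the following sketch, reusing the FSID from the example above:

Example

[ansible@host01 cephadm-ansible]$ ansible-playbook -i hosts cephadm-purge-cluster.yml -e fsid=a6ca415a-cde7-11eb-a41a-002590fc2544 -e ceph_origin=ibm -vvv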

When the script has completed, the entire storage cluster, including all OSD disks, will have been removed from all hosts in the
cluster.

Red Hat Enterprise Linux 9


Edit online

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Procedure
Edit online

1. Disable cephadm to stop all the orchestration operations to avoid deploying new daemons:

Example

[ceph: root@host01 /]# ceph mgr module disable cephadm

2. Get the FSID of the cluster:

Example

[ceph: root@host01 /]# ceph fsid

3. Purge the Ceph daemons from all hosts in the cluster:

Syntax

cephadm rm-cluster --force --zap-osds --fsid FSID

Example

[ceph: root@host01 /]# cephadm rm-cluster --force --zap-osds --fsid a6ca415a-cde7-11eb-a41a-002590fc2544

Deploying client nodes


Edit online
As a storage administrator, you can deploy client nodes by running the cephadm-preflight.yml and cephadm-clients.yml
playbooks. The cephadm-preflight.yml playbook configures the Ceph repository and prepares the storage cluster for
bootstrapping. It also installs some prerequisites, such as podman, lvm2, chronyd, and cephadm.

IMPORTANT: Skip these steps for Red Hat Enterprise Linux 9 as cephadm-preflight playbook is not supported.

The cephadm-clients.yml playbook handles the distribution of configuration and keyring files to a group of Ceph clients.

NOTE: If you are not using the cephadm-ansible playbooks, after upgrading your Ceph cluster, you must upgrade the ceph-
common package and client libraries on your client nodes. For more information, see Upgrading the IBM Storage Ceph cluster.

Prerequisites
Edit online

Root-level access to the Ansible administration node.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package.

The [admin] group is defined in the inventory file with a node where the admin keyring is present at
/etc/ceph/ceph.client.admin.keyring.

Procedure
Edit online

1. As an Ansible user, navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node.

Example

[ceph-admin@admin ~]$ cd /usr/share/cephadm-ansible

2. Open and edit the hosts inventory file and add the [clients] group and clients to your inventory:



Example

host02
host03
host04

[clients]
client01
client02
client03

[admin]
host01

3. Run the cephadm-preflight.yml playbook to install the prerequisites on the clients:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --limit CLIENT_GROUP_NAME|CLIENT_NODE_NAME

Example

[ceph-admin@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --limit clients

4. Run the cephadm-clients.yml playbook to distribute the keyring and Ceph configuration files to a set of clients.

a. To copy the keyring with a custom destination keyring name:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-clients.yml --extra-vars '{"fsid":"FSID","keyring":"KEYRING_PATH","client_group":"CLIENT_GROUP_NAME","conf":"CEPH_CONFIGURATION_PATH","keyring_dest":"KEYRING_DESTINATION_PATH"}'

Replace INVENTORY_FILE with the Ansible inventory file name.

Replace FSID with the FSID of the cluster.

Replace KEYRING_PATH with the full path name to the keyring on the admin host that you want to copy to the
client.

Optional: Replace CLIENT_GROUP_NAME with the Ansible group name for the clients to set up.

Optional: Replace CEPH_CONFIGURATION_PATH with the full path to the Ceph configuration file on the admin
node.

Optional: Replace KEYRING_DESTINATION_PATH with the full path name of the destination where the keyring
will be copied.

NOTE: If you do not specify a configuration file with the conf option when you run the playbook, the playbook
generates and distributes a minimal configuration file. By default, the generated file is located at
/etc/ceph/ceph.conf.

Example

[ceph-admin@host01 cephadm-ansible]$ ansible-playbook -i hosts cephadm-clients.yml --extra-vars '{"fsid":"266ee7a8-2a05-11eb-b846-5254002d4916","keyring":"/etc/ceph/ceph.client.admin.keyring","client_group":"clients","conf":"/etc/ceph/ceph.conf","keyring_dest":"/etc/ceph/custom.name.ceph.keyring"}'

b. To copy a keyring with the default destination keyring name of ceph.keyring and using the default group of
clients:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-clients.yml --extra-vars '{"fsid":"FSID","keyring":"KEYRING_PATH","conf":"CONF_PATH"}'

Example



[ceph-admin@host01 cephadm-ansible]$ ansible-playbook -i hosts cephadm-clients.yml --extra-vars '{"fsid":"266ee7a8-2a05-11eb-b846-5254002d4916","keyring":"/etc/ceph/ceph.client.admin.keyring","conf":"/etc/ceph/ceph.conf"}'

Verification
Edit online

Log into the client nodes and verify that the keyring and configuration files exist.

Example

[user@client01 ~]# ls -l /etc/ceph/

-rw-------. 1 ceph ceph 151 Jul 11 12:23 custom.name.ceph.keyring
-rw-------. 1 ceph ceph 151 Jul 11 12:23 ceph.keyring
-rw-------. 1 ceph ceph 269 Jul 11 12:23 ceph.conf

Reference
Edit online

For more information about admin keys, see Ceph User Management.

For more information about the cephadm-preflight playbook, see Running the preflight playbook.

Managing an IBM Storage Ceph cluster using cephadm-ansible modules
Edit online
As a storage administrator, you can use cephadm-ansible modules in Ansible playbooks to administer your IBM Storage Ceph
cluster. The cephadm-ansible package provides several modules that wrap cephadm calls to let you write your own unique
Ansible playbooks to administer your cluster.

NOTE: At this time, cephadm-ansible modules only support the most important tasks. Any operation not covered by cephadm-
ansible modules must be completed using either the command or shell Ansible modules in your playbooks.

cephadm-ansible modules
cephadm-ansible modules options
Bootstrapping a storage cluster using the cephadm_bootstrap and cephadm_registry_login modules
Adding or removing hosts using the ceph_orch_host module
Setting configuration options using the ceph_config module
Applying a service specification using the ceph_orch_apply module
Managing Ceph daemon states using the ceph_orch_daemon module

cephadm-ansible modules

Edit online
The cephadm-ansible modules are a collection of modules that simplify writing Ansible playbooks by providing a wrapper around
cephadm and ceph orch commands. You can use the modules to write your own unique Ansible playbooks to administer your
cluster using one or more of the modules.

The cephadm-ansible package includes the following modules:

cephadm_bootstrap

ceph_orch_host

ceph_config



ceph_orch_apply

ceph_orch_daemon

cephadm_registry_login

cephadm-ansible modules options

Edit online
The following tables list the available options for the cephadm-ansible modules. Options listed as required need to be set when
using the modules in your Ansible playbooks. Options listed with a default value of true indicate that the option is automatically set
when using the modules and you do not need to specify it in your playbook. For example, for the cephadm_bootstrap module, the
Ceph Dashboard is installed unless you set dashboard: false.

Table 1. Available options for the cephadm_bootstrap module.

mon_ip: Ceph Monitor IP address. Required: true.
image: Ceph container image. Required: false.
docker: Use docker instead of podman. Required: false.
fsid: Define the Ceph FSID. Required: false.
pull: Pull the Ceph container image. Required: false. Default: true.
dashboard: Deploy the Ceph Dashboard. Required: false. Default: true.
dashboard_user: Specify a specific Ceph Dashboard user. Required: false.
dashboard_password: Ceph Dashboard password. Required: false.
monitoring: Deploy the monitoring stack. Required: false. Default: true.
firewalld: Manage firewall rules with firewalld. Required: false. Default: true.
allow_overwrite: Allow overwrite of existing --output-config, --output-keyring, or --output-pub-ssh-key files. Required: false. Default: false.
registry_url: URL for custom registry. Required: false.
registry_username: Username for custom registry. Required: false.
registry_password: Password for custom registry. Required: false.
registry_json: JSON file with custom registry login information. Required: false.
ssh_user: SSH user to use for cephadm ssh to hosts. Required: false.
ssh_config: SSH config file path for cephadm SSH client. Required: false.
allow_fqdn_hostname: Allow hostname that is a fully-qualified domain name (FQDN). Required: false. Default: false.
cluster_network: Subnet to use for cluster replication, recovery, and heartbeats. Required: false.

Table 2. Available options for the ceph_orch_host module.

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
name: Name of the host to add, remove, or update. Required: true.
address: IP address of the host. Required: true when state is present.
set_admin_label: Set the _admin label on the specified host. Required: false. Default: false.
labels: The list of labels to apply to the host. Required: false. Default: [].
state: If set to present, it ensures the name specified in name is present. If set to absent, it removes the host specified in name. If set to drain, it schedules to remove all daemons from the host specified in name. Required: false. Default: present.

Table 3. Available options for the ceph_config module.

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
action: Whether to set or get the parameter specified in option. Required: false. Default: set.
who: Which daemon to set the configuration to. Required: true.
option: Name of the parameter to set or get. Required: true.
value: Value of the parameter to set. Required: true if action is set.

Table 4. Available options for the ceph_orch_apply module.

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
spec: The service specification to apply. Required: true.

Table 5. Available options for the ceph_orch_daemon module.

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
state: The desired state of the service specified in name. Required: true. If started, it ensures the service is started. If stopped, it ensures the service is stopped. If restarted, it will restart the service.
daemon_id: The ID of the service. Required: true.
daemon_type: The type of service. Required: true.

Table 6. Available options for the cephadm_registry_login module.

state: Login or logout of a registry. Required: false. Default: login.
docker: Use docker instead of podman. Required: false.
registry_url: The URL for custom registry. Required: false.
registry_username: Username for custom registry. Required: true when state is login.
registry_password: Password for custom registry. Required: true when state is login.
registry_json: The path to a JSON file. This file must be present on remote hosts prior to running this task. This option is currently not supported.

Bootstrapping a storage cluster using the cephadm_bootstrap and cephadm_registry_login modules

Edit online
As a storage administrator, you can bootstrap a storage cluster using Ansible by using the cephadm_bootstrap and
cephadm_registry_login modules in your Ansible playbook.

Prerequisites
Edit online

An IP address for the first Ceph Monitor container, which is also the IP address for the first node in the storage cluster.

Login access to cp.icr.io/cp.

A minimum of 10 GB of free space for /var/lib/containers/.

Red Hat Enterprise Linux 8.4 EUS or later.



Installation of the cephadm-ansible package on the Ansible administration node.

Passwordless SSH is set up on all hosts in the storage cluster.

Hosts are registered with CDN.

Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

3. Create the hosts file and add hosts, labels, and monitor IP address of the first host in the storage cluster:

Syntax

sudo vi INVENTORY_FILE

HOST1 labels="[LABEL1, LABEL2]"
HOST2 labels="[LABEL1, LABEL2]"
HOST3 labels="[LABEL1]"

[admin]
ADMIN_HOST monitor_address=MONITOR_IP_ADDRESS labels="[ADMIN_LABEL, LABEL1, LABEL2]"

Example

[ansible@admin cephadm-ansible]$ sudo vi hosts

host02 labels="['mon', 'mgr']"
host03 labels="['mon', 'mgr']"
host04 labels="['osd']"
host05 labels="['osd']"
host06 labels="['osd']"

[admin]
host01 monitor_address=10.10.128.68 labels="['_admin', 'mon', 'mgr']"

4. Run the preflight playbook:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm"

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm"

5. Create a playbook to bootstrap your cluster:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: NAME_OF_PLAY
  hosts: BOOTSTRAP_HOST
  become: USE_ELEVATED_PRIVILEGES
  gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
  tasks:
    - name: NAME_OF_TASK
      cephadm_registry_login:
        state: STATE
        registry_url: REGISTRY_URL
        registry_username: REGISTRY_USER_NAME
        registry_password: REGISTRY_PASSWORD

    - name: NAME_OF_TASK
      cephadm_bootstrap:
        mon_ip: "{{ monitor_address }}"
        dashboard_user: DASHBOARD_USER
        dashboard_password: DASHBOARD_PASSWORD
        allow_fqdn_hostname: ALLOW_FQDN_HOSTNAME
        cluster_network: NETWORK_CIDR

Example

[ansible@admin cephadm-ansible]$ sudo vi bootstrap.yml

---
- name: bootstrap the cluster
  hosts: host01
  become: true
  gather_facts: false
  tasks:
    - name: login to registry
      cephadm_registry_login:
        state: login
        registry_url: cp.icr.io/cp
        registry_username: user1
        registry_password: mypassword1

    - name: bootstrap initial cluster
      cephadm_bootstrap:
        mon_ip: "{{ monitor_address }}"
        dashboard_user: mydashboarduser
        dashboard_password: mydashboardpassword
        allow_fqdn_hostname: true
        cluster_network: 10.10.128.0/28

6. Run the playbook:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml -vvv

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts bootstrap.yml -vvv

Review the Ansible output after running the playbook.

Adding or removing hosts using the ceph_orch_host module

Edit online
Add and remove hosts in your storage cluster by using the ceph_orch_host module in your Ansible playbook.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Register the nodes to the CDN and attach subscriptions.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

New hosts have the storage cluster’s public SSH key. For more information about copying the storage cluster’s public SSH
keys to new hosts, see Adding hosts.

Procedure



Edit online

1. Use the following procedure to add new hosts to the cluster:

a. Log in to the Ansible administration node.

b. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

c. Add the new hosts and labels to the Ansible inventory file.

Syntax

sudo vi INVENTORY_FILE

NEW_HOST1 labels="[LABEL1, LABEL2]"
NEW_HOST2 labels="[LABEL1, LABEL2]"
NEW_HOST3 labels="[LABEL1]"

[admin]
ADMIN_HOST monitor_address=MONITOR_IP_ADDRESS labels="[ADMIN_LABEL, LABEL1, LABEL2]"

Example

[ansible@admin cephadm-ansible]$ sudo vi hosts

host02 labels="['mon', 'mgr']"
host03 labels="['mon', 'mgr']"
host04 labels="['osd']"
host05 labels="['osd']"
host06 labels="['osd']"

[admin]
host01 monitor_address=10.10.128.68 labels="['_admin', 'mon', 'mgr']"

NOTE: If you have previously added the new hosts to the Ansible inventory file and ran the preflight playbook on the
hosts, skip to step 3.

d. Run the preflight playbook with the --limit option:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit NEWHOST

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit host02

The preflight playbook installs podman, lvm2, chronyd, and cephadm on the new host. After installation is complete,
cephadm resides in the /usr/sbin/ directory.
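
If you want to confirm this before continuing, you can check for the cephadm binary on the new host; host02 is used here only as an illustration:

Example

[ansible@admin cephadm-ansible]$ ssh host02 ls /usr/sbin/cephadm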

e. Create a playbook to add the new hosts to the cluster:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
  hosts: HOSTS_OR_HOST_GROUPS
  become: USE_ELEVATED_PRIVILEGES
  gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
  tasks:
    - name: NAME_OF_TASK
      ceph_orch_host:
        name: "{{ ansible_facts['hostname'] }}"
        address: "{{ ansible_facts['default_ipv4']['address'] }}"
        labels: "{{ labels }}"
      delegate_to: HOST_TO_DELEGATE_TASK_TO

    - name: NAME_OF_TASK
      when: inventory_hostname in groups['admin']
      ansible.builtin.shell:
        cmd: CEPH_COMMAND_TO_RUN
      register: REGISTER_NAME

    - name: NAME_OF_TASK
      when: inventory_hostname in groups['admin']
      debug:
        msg: "{{ REGISTER_NAME.stdout }}"

NOTE: By default, Ansible executes all tasks on the host that matches the hosts line of your playbook. The ceph
orch commands must run on the host that contains the admin keyring and the Ceph configuration file. Use the
delegate_to keyword to specify the admin host in your cluster.

Example

[ansible@admin cephadm-ansible]$ sudo vi add-hosts.yml

---
- name: add additional hosts to the cluster
  hosts: all
  become: true
  gather_facts: true
  tasks:
    - name: add hosts to the cluster
      ceph_orch_host:
        name: "{{ ansible_facts['hostname'] }}"
        address: "{{ ansible_facts['default_ipv4']['address'] }}"
        labels: "{{ labels }}"
      delegate_to: host01

    - name: list hosts in the cluster
      when: inventory_hostname in groups['admin']
      ansible.builtin.shell:
        cmd: ceph orch host ls
      register: host_list

    - name: print current list of hosts
      when: inventory_hostname in groups['admin']
      debug:
        msg: "{{ host_list.stdout }}"

In this example, the playbook adds the new hosts to the cluster and displays a current list of hosts.

f. Run the playbook to add additional hosts to the cluster:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts add-hosts.yml

2. Use the following procedure to remove hosts from the cluster:

a. Log in to the Ansible administration node.

b. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

c. Create a playbook to remove a host or hosts from the cluster:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: NAME_OF_PLAY
  hosts: ADMIN_HOST
  become: USE_ELEVATED_PRIVILEGES
  gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
  tasks:
    - name: NAME_OF_TASK
      ceph_orch_host:
        name: HOST_TO_REMOVE
        state: STATE

    - name: NAME_OF_TASK
      ceph_orch_host:
        name: HOST_TO_REMOVE
        state: STATE
      retries: NUMBER_OF_RETRIES
      delay: DELAY
      until: CONTINUE_UNTIL
      register: REGISTER_NAME

    - name: NAME_OF_TASK
      ansible.builtin.shell:
        cmd: ceph orch host ls
      register: REGISTER_NAME

    - name: NAME_OF_TASK
      debug:
        msg: "{{ REGISTER_NAME.stdout }}"

Example

[ansible@admin cephadm-ansible]$ sudo vi remove-hosts.yml

---
- name: remove host
  hosts: host01
  become: true
  gather_facts: true
  tasks:
    - name: drain host07
      ceph_orch_host:
        name: host07
        state: drain

    - name: remove host from the cluster
      ceph_orch_host:
        name: host07
        state: absent
      retries: 20
      delay: 1
      until: result is succeeded
      register: result

    - name: list hosts in the cluster
      ansible.builtin.shell:
        cmd: ceph orch host ls
      register: host_list

    - name: print current list of hosts
      debug:
        msg: "{{ host_list.stdout }}"

In this example, the playbook tasks drain all daemons on host07, removes the host from the cluster, and displays a
current list of hosts.

d. Run the playbook to remove host from the cluster:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts remove-hosts.yml

Verification



Edit online

Review the Ansible task output displaying the current list of hosts in the cluster:

Example

TASK [print current hosts] *******************************************************************************************
Friday 24 June 2022 14:52:40 -0400 (0:00:03.365)       0:02:31.702 ***********
ok: [host01] =>
  msg: |-
    HOST    ADDR          LABELS          STATUS
    host01  10.10.128.68  _admin mon mgr
    host02  10.10.128.69  mon mgr
    host03  10.10.128.70  mon mgr
    host04  10.10.128.71  osd
    host05  10.10.128.72  osd
    host06  10.10.128.73  osd

Setting configuration options using the ceph_config module

Edit online
As a storage administrator, you can set or get IBM Storage Ceph configuration options using the ceph_config module.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

The Ansible inventory file contains the cluster and admin hosts. For more information about adding hosts to your storage
cluster, see Adding or removing hosts using the ceph_orch_host module.

Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

3. Create a playbook with configuration changes:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
  hosts: ADMIN_HOST
  become: USE_ELEVATED_PRIVILEGES
  gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
  tasks:
    - name: NAME_OF_TASK
      ceph_config:
        action: GET_OR_SET
        who: DAEMON_TO_SET_CONFIGURATION_TO
        option: CEPH_CONFIGURATION_OPTION
        value: VALUE_OF_PARAMETER_TO_SET

    - name: NAME_OF_TASK
      ceph_config:
        action: GET_OR_SET
        who: DAEMON_TO_SET_CONFIGURATION_TO
        option: CEPH_CONFIGURATION_OPTION
      register: REGISTER_NAME

    - name: NAME_OF_TASK
      debug:
        msg: "MESSAGE_TO_DISPLAY {{ REGISTER_NAME.stdout }}"

Example

[ansible@admin cephadm-ansible]$ sudo vi change_configuration.yml

---
- name: set pool delete
  hosts: host01
  become: true
  gather_facts: false
  tasks:
    - name: set the allow pool delete option
      ceph_config:
        action: set
        who: mon
        option: mon_allow_pool_delete
        value: true

    - name: get the allow pool delete setting
      ceph_config:
        action: get
        who: mon
        option: mon_allow_pool_delete
      register: verify_mon_allow_pool_delete

    - name: print current mon_allow_pool_delete setting
      debug:
        msg: "the value of 'mon_allow_pool_delete' is {{ verify_mon_allow_pool_delete.stdout }}"

In this example, the playbook first sets the mon_allow_pool_delete option to true. The playbook then gets the current
mon_allow_pool_delete setting and displays the value in the Ansible output.

4. Run the playbook:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts change_configuration.yml

Verification
Edit online

Review the output from the playbook tasks.

Example

TASK [print current mon_allow_pool_delete setting] *******************************************************************
Wednesday 29 June 2022 13:51:41 -0400 (0:00:05.523)       0:00:17.953 ********
ok: [host01] =>
  msg: the value of 'mon_allow_pool_delete' is true

Reference
Edit online



See Configuring for more details.

Applying a service specification using the ceph_orch_apply module

Edit online
As a storage administrator, you can apply service specifications to your storage cluster using the ceph_orch_apply module in your
Ansible playbooks. A service specification is a data structure that specifies the service attributes and configuration settings used
to deploy the Ceph service. You can use a service specification to deploy Ceph service types such as mon, crash, mds, mgr, osd, rbd, or
rbd-mirror.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

The Ansible inventory file contains the cluster and admin hosts. For more information about adding hosts to your storage
cluster, see Adding or removing hosts using the ceph_orch_host module.

Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

3. Create a playbook with the service specifications:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
  hosts: HOSTS_OR_HOST_GROUPS
  become: USE_ELEVATED_PRIVILEGES
  gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
  tasks:
    - name: NAME_OF_TASK
      ceph_orch_apply:
        spec: |
          service_type: SERVICE_TYPE
          service_id: UNIQUE_NAME_OF_SERVICE
          placement:
            host_pattern: HOST_PATTERN_TO_SELECT_HOSTS
            label: LABEL
          spec:
            SPECIFICATION_OPTIONS:

Example

[ansible@admin cephadm-ansible]$ sudo vi deploy_osd_service.yml

---
- name: deploy osd service
  hosts: host01
  become: true
  gather_facts: true
  tasks:
    - name: apply osd spec
      ceph_orch_apply:
        spec: |
          service_type: osd
          service_id: osd
          placement:
            host_pattern: '*'
            label: osd
          spec:
            data_devices:
              all: true

In this example, the playbook deploys the Ceph OSD service on all hosts with the label osd.

4. Run the playbook:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts deploy_osd_service.yml

Verification
Edit online

Review the output from the playbook tasks.

Managing Ceph daemon states using the ceph_orch_daemon module

Edit online
Start, stop, and restart Ceph daemons on hosts using the ceph_orch_daemon module in your Ansible playbooks.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

The Ansible inventory file contains the cluster and admin hosts. For more information about adding hosts to your storage
cluster, see Adding or removing hosts using the ceph_orch_host module.

Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible



3. Create a playbook with daemon state changes:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
  hosts: ADMIN_HOST
  become: USE_ELEVATED_PRIVILEGES
  gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
  tasks:
    - name: NAME_OF_TASK
      ceph_orch_daemon:
        state: STATE_OF_SERVICE
        daemon_id: DAEMON_ID
        daemon_type: TYPE_OF_SERVICE

Example

[ansible@admin cephadm-ansible]$ sudo vi restart_services.yml

---
- name: start and stop services
  hosts: host01
  become: true
  gather_facts: false
  tasks:
    - name: start osd.0
      ceph_orch_daemon:
        state: started
        daemon_id: 0
        daemon_type: osd

    - name: stop mon.host02
      ceph_orch_daemon:
        state: stopped
        daemon_id: host02
        daemon_type: mon

In this example, the playbook starts the OSD with an ID of 0 and stops a Ceph Monitor with an id of host02.

4. Run the playbook:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts restart_services.yml

Verification
Edit online

Review the output from the playbook tasks.

Comparison between Ceph Ansible and Cephadm


Edit online
IBM Storage Ceph 5 introduced a new deployment tool, Cephadm, for containerized deployment of the storage cluster.

The following tables compare Cephadm with Ceph-Ansible playbooks for managing the containerized deployment of a Ceph cluster for day
one and day two operations.

Table 1. Day one operations

Description | Ceph-Ansible | Cephadm
Installation of the IBM Storage Ceph cluster | Run the site-container.yml playbook. | Run the cephadm bootstrap command to bootstrap the cluster on the admin node.
Addition of hosts | Use the Ceph Ansible inventory. | Run ceph orch host add HOST_NAME to add hosts to the cluster.
Gathering Ceph logs | Run the gather-ceph-logs.yml playbook. | Run the journalctl command.
Addition of monitors | Run the add-mon.yml playbook. | Run the ceph orch apply mon command.
Addition of managers | Run the site-container.yml playbook. | Run the ceph orch apply mgr command.
Addition of OSDs | Run the add-osd.yml playbook. | Run the ceph orch apply osd command to add OSDs on all available devices or on specific hosts.
Addition of OSDs on specific devices | Select the devices in the osd.yml file and then run the add-osd.yml playbook. | Select the paths filter under data_devices in the osd.yml file and then run the ceph orch apply -i FILE_NAME.yml command.
Addition of MDS | Run the site-container.yml playbook. | Run the ceph orch apply mds FILESYSTEM_NAME command to add MDS.
Addition of Ceph Object Gateway | Run the site-container.yml playbook. | Run the ceph orch apply rgw command to add Ceph Object Gateway.

Table 2. Day two operations

Description | Ceph-Ansible | Cephadm
Removing hosts | Use the Ansible inventory. | Run ceph orch host rm HOST_NAME to remove the hosts.
Removing monitors | Run the shrink-mon.yml playbook. | Run ceph orch apply mon to redeploy other monitors.
Removing managers | Run the shrink-mgr.yml playbook. | Run ceph orch apply mgr to redeploy other managers.
Removing OSDs | Run the shrink-osd.yml playbook. | Run ceph orch osd rm OSD_ID to remove the OSDs.
Removing MDS | Run the shrink-mds.yml playbook. | Run ceph orch rm SERVICE_NAME to remove the specific service.
Deployment of Ceph Object Gateway | Run the site-container.yml playbook. | Run ceph orch apply rgw SERVICE_NAME to deploy the Ceph Object Gateway service.
Removing Ceph Object Gateway | Run the shrink-rgw.yml playbook. | Run ceph orch rm SERVICE_NAME to remove the specific service.
Block device mirroring | Run the site-container.yml playbook. | Run the ceph orch apply rbd-mirror command.
Minor version upgrade of IBM Storage Ceph | Run the infrastructure-playbooks/rolling_update.yml playbook. | Run the ceph orch upgrade start command.
Upgrading from Red Hat Ceph Storage 4 to Red Hat Ceph Storage 5 | Run the infrastructure-playbooks/rolling_update.yml playbook. | Upgrade using Cephadm is not supported.
Deployment of monitoring stack | Edit the all.yml file during installation. | Run ceph orch apply -i FILE.yml after specifying the services.

cephadm commands

Edit online
cephadm is a command-line tool that manages the local host for the Cephadm Orchestrator. It provides commands to investigate
and modify the state of the current host.

Some of the commands are generally used for debugging.

NOTE: cephadm is not required on all hosts; however, it is useful when investigating a particular daemon. The cephadm-preflight
playbook installs cephadm on all hosts, and the cephadm-ansible purge playbook requires cephadm to be installed on
all hosts to work properly.



adopt

Description
Convert an upgraded storage cluster daemon to run cephadm.

Syntax

cephadm adopt [-h] --name DAEMON_NAME --style STYLE [--cluster CLUSTER] [--legacy-dir LEGACY_DIR] [--config-json CONFIG_JSON] [--skip-firewalld] [--skip-pull]

Example

[root@host01 ~]# cephadm adopt --style=legacy --name prometheus.host02

ceph-volume

Description
Runs the ceph-volume command inside a container. Use this command to list all the devices on a particular host. It deploys
OSDs with different device technologies, such as lvm or physical disks, using pluggable tools, and follows a predictable and
robust way of preparing, activating, and starting OSDs.

Syntax

cephadm ceph-volume inventory/simple/raw/lvm [-h] [--fsid FSID] [--config-json CONFIG_JSON] [--config CONFIG, -c CONFIG] [--keyring KEYRING, -k KEYRING]

Example

[root@host01 ~]# cephadm ceph-volume inventory --fsid f64f341c-655d-11eb-8778-fa163e914bcc

check-host

Description
Check the host configuration that is suitable for a Ceph cluster.

Syntax

cephadm check-host [--expect-hostname HOSTNAME]

Example

[root@host01 ~]# cephadm check-host --expect-hostname host02

deploy

Description
Deploys a daemon on the local host.

Syntax

cephadm shell deploy DAEMON_TYPE [-h] [--name DAEMON_NAME] [--fsid FSID] [--config CONFIG, -c CONFIG] [--config-json CONFIG_JSON] [--keyring KEYRING] [--key KEY] [--osd-fsid OSD_FSID] [--skip-firewalld] [--tcp-ports TCP_PORTS] [--reconfig] [--allow-ptrace] [--memory-request MEMORY_REQUEST] [--memory-limit MEMORY_LIMIT] [--meta-json META_JSON]

Example

[root@host01 ~]# cephadm shell deploy mon --fsid f64f341c-655d-11eb-8778-fa163e914bcc

enter

Description
Run an interactive shell inside a running daemon container.

Syntax

cephadm enter [-h] [--fsid FSID] --name NAME [command [command …]]

Example

[root@host01 ~]# cephadm enter --name 52c611f2b1d9

help



Description
View all the commands supported by cephadm.

Syntax

cephadm help

Example

[root@host01 ~]# cephadm help

install

Description
Install the packages.

Syntax

cephadm install PACKAGES

Example

[root@host01 ~]# cephadm install ceph-common ceph-osd

inspect-image

Description
Inspect the local Ceph container image.

Syntax

cephadm --image IMAGE_ID inspect-image

Example

[root@host01 ~]# cephadm --image 13ea90216d0be03003d12d7869f72ad9de5cec9e54a27fd308e01e467c0d4a0a inspect-image

list-networks

Description
List the IP networks.

Syntax

cephadm list-networks

Example

[root@host01 ~]# cephadm list-networks

ls

Description
List daemon instances known to cephadm on the host. You can use the --no-detail option to make the command run faster; it reports
only the daemon name, fsid, style, and systemd unit for each daemon. You can use the --legacy-dir option to specify a legacy base
directory to search for daemons.

Syntax

cephadm ls [--no-detail] [--legacy-dir LEGACY_DIR]

Example

[root@host01 ~]# cephadm ls --no-detail

logs

Description
Print journald logs for a daemon container. This is similar to the journalctl command.

Syntax



cephadm logs [--fsid FSID] --name DAEMON_NAME
cephadm logs [--fsid FSID] --name DAEMON_NAME -- -n NUMBER # Last N lines
cephadm logs [--fsid FSID] --name DAEMON_NAME -- -f # Follow the logs

Example

[root@host01 ~]# cephadm logs --fsid 57bddb48-ee04-11eb-9962-001a4a000672 --name osd.8


[root@host01 ~]# cephadm logs --fsid 57bddb48-ee04-11eb-9962-001a4a000672 --name osd.8 -- -n 20
[root@host01 ~]# cephadm logs --fsid 57bddb48-ee04-11eb-9962-001a4a000672 --name osd.8 -- -f

prepare-host

Description
Prepare a host for cephadm.

Syntax

cephadm prepare-host [--expect-hostname HOSTNAME]

Example

[root@host01 ~]# cephadm prepare-host


[root@host01 ~]# cephadm prepare-host --expect-hostname host01

pull

Description
Pull the Ceph image.

Syntax

cephadm [-h] [--image IMAGE_ID] pull

Example

[root@host01 ~]# cephadm --image 13ea90216d0be03003d12d7869f72ad9de5cec9e54a27fd308e01e467c0d4a0a pull

registry-login

Description
Give cephadm login information for an authenticated registry. Cephadm attempts to log the calling host into that registry.

Syntax

cephadm registry-login --registry-url REGISTRY_URL --registry-username USERNAME --registry-password PASSWORD [--fsid FSID] [--registry-json JSON_FILE]

Example

[root@host01 ~]# cephadm registry-login --registry-url cp.icr.io/cp --registry-username myuser1 --registry-password mypassword1

You can also use a JSON registry file containing the login info formatted as:

Syntax

cat REGISTRY_FILE

{
"url":"REGISTRY_URL",
"username":"REGISTRY_USERNAME",
"password":"REGISTRY_PASSWORD"
}

Example

[root@host01 ~]# cat registry_file

{
"url":"cp.icr.io/cp",
"username":"myuser",
"password":"mypass"
}



[root@host01 ~]# cephadm registry-login -i registry_file

rm-daemon

Description
Remove a specific daemon instance. If you run the cephadm rm-daemon command on the host directly, although the command
removes the daemon, the cephadm mgr module notices that the daemon is missing and redeploys it. This command is problematic
and should be used only for experimental purposes and debugging.

Syntax

cephadm rm-daemon --fsid FSID --name DAEMON_NAME [--force ] [--force-delete-data]

Example

[root@host01 ~]# cephadm rm-daemon --fsid f64f341c-655d-11eb-8778-fa163e914bcc --name osd.8

rm-cluster

Description
Remove all the daemons from a storage cluster on that specific host where it is run. Similar to rm-daemon, if you remove a few
daemons this way and the Ceph Orchestrator is not paused and some of those daemons belong to services that are not unmanaged,
the cephadm orchestrator just redeploys them there.

Syntax

cephadm rm-cluster --fsid FSID [--force]

Example

[root@host01 ~]# cephadm rm-cluster --fsid f64f341c-655d-11eb-8778-fa163e914bcc

rm-repo

Description
Remove a package repository configuration. This is mainly used for the disconnected installation of IBM Storage Ceph.

Syntax

cephadm rm-repo [-h]

Example

[root@host01 ~]# cephadm rm-repo

run

Description
Run a Ceph daemon, in a container, in the foreground.

Syntax

cephadm run [--fsid FSID] --name DAEMON_NAME

Example

[root@host01 ~]# cephadm run --fsid f64f341c-655d-11eb-8778-fa163e914bcc --name osd.8

shell

Description
Run an interactive shell with access to Ceph commands over the inferred or specified Ceph cluster. You can enter the shell using the
cephadm shell command and run all the orchestrator commands within the shell.

Syntax

cephadm shell [--fsid FSID] [--name DAEMON_NAME, -n DAEMON_NAME] [--config CONFIG, -c CONFIG] [--mount MOUNT, -m MOUNT] [--keyring KEYRING, -k KEYRING] [--env ENV, -e ENV]

Example

[root@host01 ~]# cephadm shell -- ceph orch ls


[root@host01 ~]# cephadm shell



unit

Description
Start, stop, restart, enable, and disable the daemons with this operation. This operates on the daemon’s systemd unit.

Syntax

cephadm unit [--fsid FSID] --name DAEMON_NAME start/stop/restart/enable/disable

Example

[root@host01 ~]# cephadm unit --fsid f64f341c-655d-11eb-8778-fa163e914bcc --name osd.8 start

version

Description
Provides the version of the storage cluster.

Syntax

cephadm version

Example

[root@host01 ~]# cephadm version

What to do next? Day 2


Edit online
As a storage administrator, once you have installed and configured IBM Storage Ceph, you are ready to perform "Day Two" operations
for your storage cluster. These operations include adding metadata servers (MDS) and object gateways (RGW), and configuring
services.

For more information about how to use the cephadm orchestrator to perform "Day Two" operations, see Operations.

To deploy, configure, and administer the Ceph Object Gateway on "Day Two" operations, see Object Gateway.

Upgrading
Edit online
Upgrade to an IBM Storage Ceph cluster running Red Hat Enterprise Linux on AMD64 and Intel 64 architectures.

Upgrading to an IBM Storage Ceph cluster using cephadm

Upgrading to an IBM Storage Ceph cluster using cephadm

Edit online
As a storage administrator, you can use the cephadm Orchestrator to upgrade from Red Hat Ceph Storage 5.3 to IBM Storage Ceph
5.3.

You can also use the Orchestrator to upgrade an IBM Storage Ceph cluster to later releases.

The automated upgrade process follows Ceph best practices. For example:

The upgrade order starts with Ceph Managers, Ceph Monitors, then other daemons.

Each daemon is restarted only after Ceph indicates that the cluster will remain available.

The storage cluster health status is likely to switch to HEALTH_WARNING during the upgrade. When the upgrade is complete, the
health status should switch back to HEALTH_OK.



NOTE: You do not get a message once the upgrade is successful. Run ceph versions and ceph orch ps commands to verify the
new image ID and the version of the storage cluster.

IMPORTANT: Red Hat Enterprise Linux 9 and later does not support the cephadm-ansible playbook.

Upgrading to IBM Storage Ceph cluster


Upgrading the IBM Storage Ceph cluster in a disconnected environment
Staggered upgrade
Monitoring and managing upgrade of the storage cluster
Troubleshooting upgrade error messages

Upgrading to IBM Storage Ceph cluster


Edit online
You can use the ceph orch upgrade command to upgrade to an IBM Storage Ceph 5 cluster.

IMPORTANT: Red Hat Enterprise Linux 9 and later does not support the cephadm-ansible playbook.

Prerequisites
Edit online

Latest version of Red Hat Ceph Storage cluster 5.3.

Root-level access to all the nodes.

Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.

At least two Ceph Manager nodes in the storage cluster: one active and one standby.

NOTE: IBM Storage Ceph 5 also includes a health check function that returns a DAEMON_OLD_VERSION warning if it detects that
any of the daemons in the storage cluster are running multiple versions of IBM Storage Ceph. The warning is triggered when the
daemons continue to run multiple versions of IBM Storage Ceph beyond the time value set in the
mon_warn_older_version_delay option. By default, the mon_warn_older_version_delay option is set to 1 week. This
setting allows most upgrades to proceed without falsely seeing the warning.

If the upgrade process is paused for an extended time period, you can mute the health warning:

ceph health mute DAEMON_OLD_VERSION --sticky

After the upgrade has finished, unmute the health warning:

ceph health unmute DAEMON_OLD_VERSION

Procedure
Edit online

1. Enable the Red Hat Enterprise Linux baseos and appstream repositories:

Example

Red Hat Enterprise Linux 8:

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms

Example

Red Hat Enterprise Linux 9:

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms

[root@admin ~]# subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms



2. Perform this step if upgrading from Red Hat Ceph Storage 5.3 to IBM Storage Ceph 5.3.

If you are upgrading from elsewhere, skip to step 4.

IMPORTANT: This step must be performed to avoid upgrade failures of cephadm and ceph-ansible packages.

a. Disable the Red Hat Ceph Storage ceph-tools repository (/etc/yum.repos.d/).

Example

[cephuser@user-node yum.repos.d]$ subscription-manager repos --disable=rhceph-5-tools-for-rhel-8-x86_64-rpms

3. Enable the ceph-tools repository (/etc/yum.repos.d/) for both Red Hat Enterprise Linux 8 and Red Hat Enterprise Linux
9:

Red Hat Enterprise Linux 8:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-8.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-8.repo

Red Hat Enterprise Linux 9:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-9.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-9.repo

Repeat the above steps on all the nodes of the storage cluster.

4. Add the license to install IBM Storage Ceph and accept the license on all nodes:

Example

[root@admin ~]# dnf -y install ibm-storage-ceph-license

Accept these provisions:

Example

[root@admin ~]# sudo touch /usr/share/ibm-storage-ceph-license/accept

5. Perform this step if upgrading from Red Hat Ceph Storage 5.3 to IBM Storage Ceph 5.3:

NOTE: cephadm-ansible is not supported in Red Hat Enterprise Linux 9.

Example

[root@admin ~]# dnf -y reinstall cephadm


[root@admin ~]# dnf -y reinstall cephadm-ansible

6. Navigate to the /usr/share/cephadm-ansible/ directory:

Example

[root@admin ~]# cd /usr/share/cephadm-ansible

7. Run the preflight playbook with the upgrade_ceph_packages parameter set to true on the bootstrapped host in the
storage cluster:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm upgrade_ceph_packages=true"

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i /etc/ansible/hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm upgrade_ceph_packages=true"

This step upgrades the cephadm package on all the nodes.

8. Log into the cephadm shell:

Example



[root@host01 ~]# cephadm shell

9. Ensure all the hosts are online and that the storage cluster is healthy:

Example

[ceph: root@host01 /]# ceph -s

10. Set the OSD noout, noscrub, and nodeep-scrub flags to prevent OSDs from getting marked out during upgrade and to
avoid unnecessary load on the cluster:

Example

[ceph: root@host01 /]# ceph osd set noout


[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

11. Log in to the registry:

Syntax

ceph cephadm registry-login cp.icr.io USERNAME PASSWORD

or (using json file)

cat mylogin.json
{ "url":"REGISTRY_URL",
"username":"USER_NAME",
"password":"PASSWORD" }
ceph cephadm registry-login -i mylogin.json

12. Check service versions and the available target containers:

Syntax

ceph orch upgrade check IMAGE_NAME

Example

[ceph: root@host01 /]# ceph orch upgrade check cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest

13. Upgrade the storage cluster:

Syntax

ceph orch upgrade start IMAGE_NAME

Example

[ceph: root@host01 /]# ceph orch upgrade start cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest

NOTE: To perform a staggered upgrade, see Performing a staggered upgrade

While the upgrade is underway, a progress bar appears in the ceph status output.

Example

[ceph: root@host01 /]# ceph status


[...]
progress:
Upgrade to 16.2.0-146.el8cp (1s)
[............................]

14. Verify the new IMAGE_ID and VERSION of the Ceph cluster:

Example

[ceph: root@host01 /]# ceph versions


[ceph: root@host01 /]# ceph orch ps

NOTE: If you are not using the cephadm-ansible playbooks, after upgrading your Ceph cluster, you must upgrade the
ceph-common package and client libraries on your client nodes.

Example



[root@client01 ~]# dnf -y reinstall ceph-common

Verify you have the latest version:

Example

[root@client01 ~]# ceph --version

15. When the upgrade is complete, unset the noout, noscrub, and nodeep-scrub flags:

Example

[ceph: root@host01 /]# ceph osd unset noout


[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub

Upgrading the IBM Storage Ceph cluster in a disconnected environment

Edit online
You can upgrade the storage cluster in a disconnected environment by using the --image tag.

Prerequisites
Edit online

Latest version of Red Hat Ceph Storage cluster 5.3.

Root-level access to all the nodes.

Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.

At least two Ceph Manager nodes in the storage cluster: one active and one standby.

Register the nodes to CDN and attach subscriptions.

Check for the custom container images in a disconnected environment and change the configuration, if required. For more
information, see Changing configurations of custom container images for disconnected installations.

By default, the monitoring stack components are deployed based on the primary Ceph image. For disconnected environment of the
storage cluster, you have to use the latest available monitoring stack component images.

Table: Custom image details for monitoring stack components

Monitoring stack component IBM Storage Ceph version Image details


Prometheus All IBM Storage Ceph 5 versions cp.icr.io/cp/ibm-ceph/prometheus:v4.10
Grafana All IBM Storage Ceph 5 versions cp.icr.io/cp/ibm-ceph/ceph-5-dashboard-rhel8:latest
Node-exporter All IBM Storage Ceph 5 versions cp.icr.io/cp/ibm-ceph/prometheus-node-exporter:v4.10
AlertManager All IBM Storage Ceph 5 versions cp.icr.io/cp/ibm-ceph/prometheus-alertmanager:v4.10
HAProxy All IBM Storage Ceph 5 versions cp.icr.io/cp/ibm-ceph/haproxy-rhel8:latest
Keepalived All IBM Storage Ceph 5 versions cp.icr.io/cp/ibm-ceph/keepalived-rhel8:latest
SNMP Gateway All IBM Storage Ceph 5 versions cp.icr.io/cp/ibm-ceph/snmp-notifier-rhel8:latest
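
If you mirror these images to a local registry, one way to point the cephadm module at them is through the configuration database. The following is a sketch only; LOCAL_NODE_FQDN:5000 is a placeholder registry, as in the examples later in this section:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/container_image_prometheus LOCAL_NODE_FQDN:5000/ibm-ceph/prometheus:v4.10
[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/container_image_grafana LOCAL_NODE_FQDN:5000/ibm-ceph/ceph-5-dashboard-rhel8:latest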

Procedure
Edit online

1. Add the license to install IBM Storage Ceph and accept the license on all nodes:

Example

[root@admin ~]# dnf install ibm-storage-ceph-license



a. Accept these provisions:

Example

[root@admin ~]# sudo touch /usr/share/ibm-storage-ceph-license/accept

2. Update the cephadm and cephadm-ansible package.

Example

[root@admin ~]# dnf update cephadm


[root@admin ~]# dnf update cephadm-ansible

3. Run the preflight playbook with the upgrade_ceph_packages parameter set to true and the ceph_origin parameter set
to custom on the bootstrapped host in the storage cluster:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=custom upgrade_ceph_packages=true"

Example

[ansible@admin ~]$ ansible-playbook -i /etc/ansible/hosts cephadm-preflight.yml --extra-vars "ceph_origin=custom upgrade_ceph_packages=true"

This step upgrades the cephadm package on all the nodes.

4. Log into the cephadm shell:

Example

[root@node0 ~]# cephadm shell

5. Ensure all the hosts are online and that the storage cluster is healthy:

Example

[ceph: root@node0 /]# ceph -s

6. Check service versions and the available target containers:

Syntax

ceph orch upgrade check IMAGE_NAME

Example

[ceph: root@node0 /]# ceph orch upgrade check LOCAL_NODE_FQDN:5000/ibm-ceph/ceph-5-rhel8:latest

7. Set the OSD noout, noscrub, and nodeep-scrub flags to prevent OSDs from getting marked out during upgrade and to
avoid unnecessary load on the cluster:

Example

[ceph: root@host01 /]# ceph osd set noout


[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

8. Upgrade the storage cluster:

Syntax

ceph orch upgrade start IMAGE_NAME

Example

[ceph: root@node0 /]# ceph orch upgrade start LOCAL_NODE_FQDN:5000/ibm-ceph/ceph-5-rhel8:latest

While the upgrade is underway, a progress bar appears in the ceph status output.

Example



[ceph: root@node0 /]# ceph status
[...]
progress:
Upgrade to 16.2.0-115.el8cp (1s)
[............................]

9. Verify the new IMAGE_ID and VERSION of the Ceph cluster:

Example

[ceph: root@node0 /]# ceph version


[ceph: root@node0 /]# ceph versions
[ceph: root@node0 /]# ceph orch ps

10. When the upgrade is complete, unset the noout, noscrub, and nodeep-scrub flags:

Example

[ceph: root@host01 /]# ceph osd unset noout


[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub

Reference
Edit online

See the Registering IBM Storage Ceph nodes to the CDN and attaching subscriptions

See the Configuring a private registry for a disconnected installation

Staggered upgrade
Edit online
As a storage administrator, you can upgrade IBM Storage Ceph components in phases rather than all at once. The ceph orch
upgrade command enables you to specify options to limit which daemons are upgraded by a single upgrade command.

NOTE: If you want to upgrade from a version that does not support staggered upgrades, you must first manually upgrade the Ceph
Manager (ceph-mgr) daemons.

Staggered upgrade options


Performing a staggered upgrade

Staggered upgrade options


Edit online
The ceph orch upgrade command supports several options to upgrade cluster components in phases. The staggered upgrade
options include:

--daemon_types: The --daemon_types option takes a comma-separated list of daemon types and will only upgrade
daemons of those types. Valid daemon types for this option include mgr, mon, crash, osd, mds, rgw, rbd-mirror, and
cephfs-mirror.

--services: The --services option is mutually exclusive with --daemon-types, only takes services of one type at a time,
and will only upgrade daemons belonging to those services. For example, you cannot provide an OSD and RGW service
simultaneously.

--hosts: You can combine the --hosts option with --daemon_types, --services, or use it on its own. The --hosts
option parameter follows the same format as the command line options for orchestrator CLI placement specification.

--limit: The --limit option takes an integer greater than zero and provides a numerical limit on the number of daemons
cephadm will upgrade. You can combine the --limit option with --daemon_types, --services, or --hosts. For
example, if you specify to upgrade daemons of type osd on host01 with a limit set to 3, cephadm will upgrade up to three
OSD daemons on host01.
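
These options can be combined. As a sketch of the scenario just described, assuming the same container image that is used in the examples later in this section:

Example

[ceph: root@host01 /]# ceph orch upgrade start --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest --daemon-types osd --hosts host01 --limit 3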

Performing a staggered upgrade


Edit online
As a storage administrator, you can use the ceph orch upgrade options to limit which daemons are upgraded by a single upgrade
command.

Cephadm strictly enforces an order for the upgrade of daemons that is still present in staggered upgrade scenarios. The current
upgrade order is:

Ceph Manager nodes

Ceph Monitor nodes

Ceph-crash daemons

Ceph OSD nodes

Ceph Metadata Server (MDS) nodes

Ceph Object Gateway (RGW) nodes

Ceph RBD-mirror node

CephFS-mirror node

NOTE: If you specify parameters that upgrade daemons out of order, the upgrade command blocks and notes which daemons you
need to upgrade before you proceed.

Example

[ceph: root@host01 /]# ceph orch upgrade start --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest --hosts host02

Error EINVAL: Cannot start upgrade. Daemons with types earlier in upgrade order than daemons on given host need upgrading.
Please first upgrade mon.ceph-host01
NOTE: Enforced upgrade order is: mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror

Prerequisites
Edit online

Latest version of Red Hat Ceph Storage cluster 5.3.

Root-level access to all the nodes.

At least two Ceph Manager nodes in the storage cluster: one active and one standby.

Procedure
Edit online

1. Log into the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Ensure all the hosts are online and that the storage cluster is healthy:

Example



[ceph: root@host01 /]# ceph -s

3. Check service versions and the available target containers:

Syntax

ceph orch upgrade check IMAGE_NAME

Example

[ceph: root@host01 /]# ceph orch upgrade check cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest

4. Upgrade the storage cluster:

a. To upgrade specific daemon types on specific hosts:

Syntax

ceph orch upgrade start --image IMAGE_NAME --daemon-types DAEMON_TYPE1,DAEMON_TYPE2 --hosts HOST1,HOST2

Example

[ceph: root@host01 /]# ceph orch upgrade start --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest --daemon-types mgr,mon --hosts host02,host03

b. To specify specific services and limit the number of daemons to upgrade:

Syntax

ceph orch upgrade start --image IMAGE_NAME --services SERVICE1,SERVICE2 --limit LIMIT_NUMBER

Example

[ceph: root@host01 /]# ceph orch upgrade start --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest --services rgw.example1,rgw1.example2 --limit 2

NOTE: In staggered upgrade scenarios, if using a limiting parameter, the monitoring stack daemons, including
Prometheus and node-exporter, are refreshed after the upgrade of the Ceph Manager daemons. As a result of the
limiting parameter, Ceph Manager upgrades take longer to complete. The versions of monitoring stack daemons might
not change between Ceph releases, in which case, they are only redeployed.

NOTE: Upgrade commands with limiting parameters validate the options before beginning the upgrade, which can
require pulling the new container image. As a result, the upgrade start command might take a while to return when
you provide limiting parameters.

5. To see which daemons you still need to upgrade, run the ceph orch upgrade check or ceph versions command:

Example

[ceph: root@host01 /]# ceph orch upgrade check --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest

6. To complete the staggered upgrade, verify the upgrade of all remaining services:

Syntax

ceph orch upgrade start --image IMAGE_NAME

Example

[ceph: root@host01 /]# ceph orch upgrade start --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest

7. Verify the new IMAGE_ID and VERSION of the Ceph cluster:

Example

[ceph: root@host01 /]# ceph versions


[ceph: root@host01 /]# ceph orch ps

Reference
Edit online

For more information about performing a staggered upgrade and staggered upgrade options, see Performing a staggered
upgrade.

Monitoring and managing upgrade of the storage cluster


Edit online
After running the ceph orch upgrade start command to upgrade the IBM Storage Ceph cluster, you can check the status,
pause, resume, or stop the upgrade process. The health of the cluster changes to HEALTH_WARNING during an upgrade. If the host of
the cluster is offline, the upgrade is paused.

NOTE: You have to upgrade one daemon type after the other. If a daemon cannot be upgraded, the upgrade is paused.

Prerequisites
Edit online

A running Red Hat Ceph Storage cluster 5.3.

Root-level access to all the nodes.

At least two Ceph Manager nodes in the storage cluster: one active and one standby.

Upgrade for the storage cluster initiated.

Procedure
Edit online

1. Determine whether an upgrade is in process and the version to which the cluster is upgrading:

Example

[ceph: root@node0 /]# ceph orch upgrade status

NOTE: You do not get a message once the upgrade is successful. Run ceph versions and ceph orch ps commands to
verify the new image ID and the version of the storage cluster.

2. Optional: Pause the upgrade process:

Example

[ceph: root@node0 /]# ceph orch upgrade pause

3. Optional: Resume a paused upgrade process:

Example

[ceph: root@node0 /]# ceph orch upgrade resume

4. Optional: Stop the upgrade process:

Example

[ceph: root@node0 /]# ceph orch upgrade stop

Troubleshooting upgrade error messages


Edit online



The following table shows some cephadm upgrade error messages. If the cephadm upgrade fails for any reason, an error message
appears in the storage cluster health status.

Error Message            Description
UPGRADE_NO_STANDBY_MGR   Ceph requires both active and standby manager daemons to proceed, but there is currently no standby.
UPGRADE_FAILED_PULL      Ceph was unable to pull the container image for the target version. This can happen if you specify a version or container image that does not exist (for example, 1.2.3), or if the container registry is not reachable from one or more hosts in the cluster.

Configuring
Edit online
This document provides instructions for configuring IBM Storage Ceph at boot time and run time. It also provides configuration
reference information.

The basics of Ceph configuration


Ceph network configuration
Ceph Monitor configuration
Ceph authentication configuration
Pools, placement groups, and CRUSH configuration
Ceph Object Storage Daemon (OSD) configuration
Ceph Monitor and OSD interaction configuration
Ceph debugging and logging configuration
General configuration options
Ceph network configuration options
Ceph Monitor configuration options
Cephx configuration options
Pools, placement groups, and CRUSH configuration options
Object Storage Daemon (OSD) configuration options
Ceph Monitor and OSD configuration options
Ceph debugging and logging configuration options
Ceph scrubbing options
BlueStore configuration options

The basics of Ceph configuration


Edit online
As a storage administrator, you need to have a basic understanding of how to view the Ceph configuration, and how to set the Ceph
configuration options for the IBM Storage Ceph cluster. You can view and set the Ceph configuration options at runtime.

Prerequisites

Installation of IBM Storage Ceph software.

Ceph configuration
The Ceph configuration database
Using the Ceph metavariables
Viewing the Ceph configuration at runtime
Viewing a specific configuration at runtime
Setting a specific configuration at runtime
OSD Memory Target
MDS Memory Cache Limit

Ceph configuration



Edit online
All IBM Storage Ceph clusters have a configuration, which defines:

Cluster Identity

Authentication settings

Ceph daemons

Network configuration

Node names and addresses

Paths to keyrings

Paths to OSD log files

Other runtime options

A deployment tool, such as cephadm, will typically create an initial Ceph configuration file for you. However, you can create one
yourself if you prefer to bootstrap an IBM Storage Ceph cluster without using a deployment tool.

Reference

For more information about cephadm and the Ceph orchestrator, see Operations.

The Ceph configuration database


Edit online
The Ceph Monitor manages a configuration database of Ceph options that centralizes configuration management by storing
configuration options for the entire storage cluster. Centralizing the Ceph configuration in a database simplifies storage
cluster administration.

The priority order that Ceph uses to set options is:

Compiled-in default values

Ceph cluster configuration database

Local ceph.conf file

Runtime override, using the ceph daemon DAEMON-NAME config set or ceph tell DAEMON-NAME injectargs
commands
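
For example, a runtime override of the kind listed last can be applied on the node that runs the daemon; debug_osd is used here only as an illustration and also appears in later examples:

Example

[root@osd ~]# ceph daemon osd.0 config set debug_osd 20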

There are still a few Ceph options that can be defined in the local Ceph configuration file, which is /etc/ceph/ceph.conf by
default.

cephadm uses a basic ceph.conf file that only contains a minimal set of options for connecting to Ceph Monitors, authenticating,
and fetching configuration information. In most cases, cephadm uses only the mon_host option. To avoid using ceph.conf only for
the mon_host option, use DNS SRV records to perform operations with Monitors.
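
A minimal ceph.conf of this kind might look like the following sketch; the fsid and Monitor address are placeholders taken from other examples in this document:

Example

[global]
fsid = f64f341c-655d-11eb-8778-fa163e914bcc
mon_host = [v2:10.10.128.68:3300/0,v1:10.10.128.68:6789/0]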

IMPORTANT: IBM recommends that you use the assimilate-conf administrative command to move valid options into the
configuration database from the ceph.conf file. For more information about assimilate-conf, see Administrative Commands.

Ceph allows you to make changes to the configuration of a daemon at runtime. This capability can be useful for increasing or
decreasing the logging output, by enabling or disabling debug settings, and can even be used for runtime optimization.

NOTE: When the same option exists in the configuration database and the Ceph configuration file, the configuration database option
has a lower priority than what is set in the Ceph configuration file.

Sections and Masks

Just as you can configure Ceph options globally, per daemon type, or by a specific daemon in the Ceph configuration file, you can
also configure the Ceph options in the configuration database according to these sections:

Section Description

global Affects all daemons and clients.
mon Affects all Ceph Monitors.
mgr Affects all Ceph Managers.
osd Affects all Ceph OSDs.
mds Affects all Ceph Metadata Servers.
client Affects all Ceph Clients, including mounted file systems, block devices, and RADOS Gateways.
Ceph configuration options can have a mask associated with them. These masks can further restrict which daemons or clients the
options apply to.

Masks have two forms:

type:location

The type is a CRUSH property, for example, rack or host. The location is a value for the property type. For example, host:foo
limits the option only to daemons or clients running on the foo host.

Example

ceph config set osd/host:magna045 debug_osd 20

class:device-class

The device-class is the name of the CRUSH device class, such as hdd or ssd. For example, class:ssd limits the option only to
Ceph OSDs backed by solid state drives (SSD). This mask has no effect on non-OSD daemons of clients.

Example

ceph config set osd/class:hdd osd_max_backfills 8

Administrative Commands

The Ceph configuration database can be administered with the subcommand ceph config ACTION. These are the actions you can
do:

ls
Lists the available configuration options.

dump
Dumps the entire configuration database of options for the storage cluster.

get WHO
Dumps the configuration for a specific daemon or client. For example, WHO can be a daemon, like mds.a.

set WHO OPTION VALUE


Sets a configuration option in the Ceph configuration database, where WHO is the target daemon, OPTION is the option to set,
and VALUE is the desired value.

show WHO
Shows the reported running configuration for a running daemon. These options might be different from those stored by the
Ceph Monitors if there is a local configuration file in use or options have been overridden on the command line or at run time.
Also, the source of the option values is reported as part of the output.

assimilate-conf -i INPUT_FILE -o OUTPUT_FILE


Assimilate a configuration file from the INPUT_FILE and move any valid options into the Ceph Monitors’ configuration
database. Any options that are unrecognized, invalid, or cannot be controlled by the Ceph Monitor return in an abbreviated
configuration file stored in the OUTPUT_FILE. This command can be useful for transitioning from legacy configuration files to a
centralized configuration database. Note that when you assimilate a configuration and the Monitors or other daemons have
different configuration values set for the same set of options, the end result depends on the order in which the files are
assimilated.

help OPTION -f json-pretty


Displays help for a particular OPTION using a JSON-formatted output.
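
For example, the get and help actions can be used to inspect a single option; mon_allow_pool_delete is shown here only because it appears in examples elsewhere in this document:

Example

[root@mon ~]# ceph config get mon mon_allow_pool_delete
[root@mon ~]# ceph config help mon_allow_pool_delete -f json-pretty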

Reference

For more information about the command, see Setting a specific configuration at runtime.



Using the Ceph metavariables
Edit online
Metavariables simplify Ceph storage cluster configuration dramatically. When a metavariable is set in a configuration value, Ceph
expands the metavariable into a concrete value.

Metavariables are very powerful when used within the [global], [osd], [mon], or [client] sections of the Ceph configuration
file. However, you can also use them with the administration socket. Ceph metavariables are similar to Bash shell expansion.

Ceph supports the following metavariables:

$cluster
Description
Expands to the Ceph storage cluster name. Useful when running multiple Ceph storage clusters on the same hardware.

Example
/etc/ceph/$cluster.keyring

Default ceph

$type
Description
Expands to one of osd or mon, depending on the type of instant daemon.

Example
/var/lib/ceph/$type

$id
Description
Expands to the daemon identifier. For osd.0, this would be 0.

Example
/var/lib/ceph/$type/$cluster-$id

$host
Description
Expands to the host name of the instant daemon.

$name
Description
Expands to $type.$id.

Example
/var/run/ceph/$cluster-$name.asok
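
For instance, several metavariables can be combined in a keyring path. This is a sketch of a common pattern, not a required setting:

Example

[client]
    keyring = /etc/ceph/$cluster.$name.keyring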

Viewing the Ceph configuration at runtime


Edit online
The Ceph configuration files can be viewed at boot time and run time.

Prerequisites

Root-level access to the Ceph node.

Access to admin keyring.

Procedure

1. To view a runtime configuration, log in to a Ceph node running the daemon and execute:

Syntax

ceph daemon DAEMON_TYPE.ID config show

To see the configuration for osd.0, you can log into the node containing osd.0 and execute this command:



Example

[root@osd ~]# ceph daemon osd.0 config show

2. For additional options, specify a daemon and help.

Example

[root@osd ~]# ceph daemon osd.0 help

Viewing a specific configuration at runtime


Edit online
Configuration settings for IBM Storage Ceph can be viewed at runtime from the Ceph Monitor node.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

Procedure

1. Log into a Ceph node and execute:

Syntax

ceph daemon DAEMON_TYPE.ID config get PARAMETER

Example

[root@mon ~]# ceph daemon osd.0 config get public_addr

Setting a specific configuration at runtime


Edit online
To set a specific Ceph configuration at runtime, use the ceph config set command.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor or OSD nodes.

Procedure

1. Set the configuration on all Monitor or OSD daemons:

Syntax

ceph config set DAEMON CONFIG-OPTION VALUE

Example

[root@mon ~]# ceph config set osd debug_osd 10

2. Validate that the option and value are set:

Example

[root@mon ~]# ceph config dump


osd advanced debug_osd 10/10

To remove the configuration option from all daemons:

Syntax



ceph config rm DAEMON CONFIG-OPTION VALUE

Example

[root@mon ~]# ceph config rm osd debug_osd

To set the configuration for a specific daemon:

Syntax

ceph config set DAEMON.DAEMON-NUMBER CONFIG-OPTION VALUE

Example

[root@mon ~]# ceph config set osd.0 debug_osd 10

To validate that the configuration is set for the specified daemon:

Example

[root@mon ~]# ceph config dump


osd.0 advanced debug_osd 10/10

To remove the configuration for a specific daemon:

Syntax

ceph config rm DAEMON.DAEMON-NUMBER CONFIG-OPTION

Example

[root@mon ~]# ceph config rm osd.0 debug_osd

NOTE: If you use a client that does not support reading options from the configuration database, or if you still need to use
ceph.conf to change your cluster configuration for other reasons, run the following command:

ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf false

You must maintain and distribute the ceph.conf file across the storage cluster.

OSD Memory Target


Edit online
BlueStore keeps OSD heap memory usage under a designated target size with the osd_memory_target configuration option.

The option osd_memory_target sets OSD memory based upon the available RAM in the system. Use this option when TCMalloc is
configured as the memory allocator, and when the bluestore_cache_autotune option in BlueStore is set to true.
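
You can check both settings from the configuration database, for example:

Example

[ceph: root@host01 /]# ceph config get osd osd_memory_target
[ceph: root@host01 /]# ceph config get osd bluestore_cache_autotune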

Ceph OSD memory caching is more important when the block device is slow; for example, traditional hard drives, because the
benefit of a cache hit is much higher than it would be with a solid state drive. However, this must be weighed into a decision to
collocate OSDs with other services, such as in a hyper-converged infrastructure (HCI) or other applications.

Setting the OSD memory target

Setting the OSD memory target


Edit online
Use the osd_memory_target option to set the maximum memory threshold for all OSDs in the storage cluster, or for specific OSDs.
An OSD with an osd_memory_target option set to 16 GB might use up to 16 GB of memory.

NOTE: Configuration options for individual OSDs take precedence over the settings for all OSDs.

Prerequisites

A running IBM Storage Ceph cluster.



Root-level access to all hosts in the storage cluster.

Procedure

1. To set osd_memory_target for all OSDs in the storage cluster:

Syntax

ceph config set osd osd_memory_target VALUE

VALUE is the number of GBytes of memory to be allocated to each OSD in the storage cluster.

2. To set osd_memory_target for a specific OSD in the storage cluster:

Syntax

ceph config set osd.id osd_memory_target VALUE

.id is the ID of the OSD and VALUE is the number of GB of memory to be allocated to the specified OSD. For example, to
configure the OSD with ID 8 to use up to 16 GBytes of memory:

Example

[ceph: root@host01 /]# ceph config set osd.8 osd_memory_target 16G

3. To configure most OSDs in the storage cluster to use one maximum amount of memory and an individual OSD to use a different
amount, set the value for all OSDs and then override the individual OSD:

Example

[ceph: root@host01 /]# ceph config set osd osd_memory_target 16G


[ceph: root@host01 /]# ceph config set osd.8 osd_memory_target 8G

Reference

To configure IBM Storage Ceph to autotune OSD memory usage, see Automatically tuning OSD memory.

MDS Memory Cache Limit


Edit online
MDS servers keep their metadata in a separate storage pool, named cephfs_metadata, and are the users of Ceph OSDs. For Ceph
File Systems, MDS servers have to support an entire IBM Storage Ceph cluster, not just a single storage device within the storage
cluster, so their memory requirements can be significant, particularly if the workload consists of small-to-medium-size files, where
the ratio of metadata to data is much higher.

Example

Set the mds_cache_memory_limit to 2000000000 bytes:

ceph_conf_overrides:
  osd:
    mds_cache_memory_limit=2000000000
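
On a cluster deployed with cephadm, the same limit can also be set through the centralized configuration database described earlier in this document; this is an equivalent sketch rather than a replacement for the example above:

Example

[ceph: root@host01 /]# ceph config set mds mds_cache_memory_limit 2000000000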

NOTE: For a large IBM Storage Ceph cluster with a metadata-intensive workload, do not put an MDS server on the same node as
other memory-intensive services. Keeping the MDS on its own node gives you the option to allocate more memory to it, for example,
sizes greater than 100 GB.

Reference

For more information, see Metadata Server cache size limits.

Ceph network configuration


Edit online



As a storage administrator, you must understand the network environment that the IBM Storage Ceph cluster will operate in, and
configure the IBM Storage Ceph accordingly. Understanding and configuring the Ceph network options will ensure optimal
performance and reliability of the overall storage cluster.

Prerequisites

Network connectivity.

Installed IBM Storage Ceph cluster

Reference

Ceph network configuration options

Ceph on-wire encryption

Network configuration for Ceph


Ceph network messenger
Configuring a public network
Configuring multiple public networks to the cluster
Configuring a private network
Verifying firewall rules are configured for default Ceph ports
Firewall settings for Ceph Monitor node
Firewall settings for Ceph OSDs

Network configuration for Ceph


Edit online
The Ceph storage cluster does not perform request routing or dispatching on behalf of the Ceph client. Instead, Ceph clients make
requests directly to Ceph OSD daemons. Ceph OSDs perform data replication on behalf of Ceph clients, which means replication and
other factors impose additional loads on the networks of Ceph storage clusters.

Ceph has one network configuration requirement that applies to all daemons. The Ceph configuration file must specify the host for
each daemon.

Some deployment utilities, such as cephadm, create a configuration file for you. Do not set these values if the deployment utility
does it for you.

IMPORTANT: The host option is the short name of the node, not its FQDN. It is not an IP address.

All Ceph clusters must use a public network. However, unless you specify an internal cluster network, Ceph assumes a single public
network. Ceph can function with a public network only, but for large storage clusters, you will see significant performance
improvement with a second private network for carrying only cluster-related traffic.

IMPORTANT: IBM recommends running a Ceph storage cluster with two networks: one public network and one private network.

To support two networks, each Ceph Node will need to have more than one network interface card (NIC).

Figure 1. Network architecture



There are several reasons to consider operating two separate networks:

Performance: Ceph OSDs handle data replication for the Ceph clients. When Ceph OSDs replicate data more than once, the
network load between Ceph OSDs easily dwarfs the network load between Ceph clients and the Ceph storage cluster. This can
introduce latency and create a performance problem. Recovery and rebalancing can also introduce significant latency on the
public network.

Security: While most people are generally civil, some actors will engage in what is known as a Denial of Service (DoS) attack.
When traffic between Ceph OSDs gets disrupted, peering may fail and placement groups may no longer reflect an active +
clean state, which may prevent users from reading and writing data. A great way to defeat this type of attack is to maintain a
completely separate cluster network that does not connect directly to the internet.

Network configuration settings are not required. Ceph can function with a public network only, assuming a public network is
configured on all hosts running a Ceph daemon. However, Ceph allows you to establish much more specific criteria, including
multiple IP networks and subnet masks for your public network. You can also establish a separate cluster network to handle OSD
heartbeat, object replication, and recovery traffic.

Do not confuse the IP addresses you set in the configuration with the public-facing IP addresses network clients might use to access
your service. Typical internal IP networks are often 192.168.0.0 or 10.0.0.0.

NOTE: Ceph uses CIDR notation for subnets, for example, 10.0.0.0/24.

IMPORTANT: If you specify more than one IP address and subnet mask for either the public or the private network, the subnets
within the network must be capable of routing to each other. Additionally, make sure you include each IP address and subnet in your
IP tables and open ports for them as necessary.

After you configure the networks, you can restart the cluster or restart each daemon. Ceph daemons bind dynamically, so you do
not have to restart the entire cluster at once if you change the network configuration.

Reference

For common option descriptions and usage information, see Ceph network configuration options.



Ceph network messenger
Edit online
Messenger is the Ceph network layer implementation. IBM supports two messenger types:

simple

async

In IBM Storage Ceph 5.3 and higher, async is the default messenger type. To change the messenger type, specify the ms_type
configuration setting in the [global] section of the Ceph configuration file.

NOTE: For the async messenger, IBM supports the posix transport type, but does not currently support rdma or dpdk. By default,
the ms_type setting in IBM Storage Ceph 5.3 or higher reflects async+posix, where async is the messenger type and posix is
the transport type.
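
For example, a minimal sketch of checking or changing the messenger type through the centralized configuration database instead
of the configuration file; the value shown is the default:

Example

[ceph: root@host01 /]# ceph config get mon ms_type
[ceph: root@host01 /]# ceph config set global ms_type async+posix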

SimpleMessenger
The SimpleMessenger implementation uses TCP sockets with two threads per socket. Ceph associates each logical session
with a connection. A pipe handles the connection, including the input and output of each message. While SimpleMessenger
is effective for the posix transport type, it is not effective for other transport types such as rdma or dpdk.

AsyncMessenger
AsyncMessenger is the default messenger type for IBM Storage Ceph 5.3 or higher. The AsyncMessenger implementation
uses TCP sockets with a fixed-size thread pool for connections, which should be equal to the highest number of replicas or
erasure-code chunks. The thread count can be set to a lower value if performance degrades because of a low CPU count or a
high number of OSDs per server.

NOTE: IBM does not support other transport types such as rdma or dpdk at this time.

Reference

For more information about using on-wire encryption with the Ceph messenger version 2 protocol, see Ceph on-wire
encryption.

For more information about asynchronous messenger options, see Ceph network configuration options.

Configuring a public network


Edit online
To configure Ceph networks, use the config set command within the cephadm shell. Note that the IP addresses you set in your
network configuration are different from the public-facing IP addresses that network clients might use to access your service.

Ceph functions perfectly well with only a public network. However, Ceph allows you to establish much more specific criteria,
including multiple IP networks for your public network.

You can also establish a separate, private cluster network to handle OSD heartbeat, object replication, and recovery traffic. For more
information about the private network, see Configuring a private network.

NOTE:

Ceph uses CIDR notation for subnets, for example, 10.0.0.0/24. Typical internal IP networks are often 192.168.0.0/24 or
10.0.0.0/24.

If you specify more than one IP address for either the public or the cluster network, the subnets within the network must be
capable of routing to each other. In addition, make sure you include each IP address in your IP tables, and open ports for them
as necessary.

The public network configuration allows you to specifically define IP addresses and subnets for the public network.

Prerequisites

Installation of the IBM Storage Ceph software.



Procedure

1. Log in to the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Configure the public network with the subnet:

Syntax

ceph config set mon public_network IP_ADDRESS_WITH_SUBNET

Example

[ceph: root@host01 /]# ceph config set mon public_network 192.168.0.0/24

3. Get the list of services in the storage cluster:

Example

[ceph: root@host01 /]# ceph orch ls

4. Restart the daemons. Ceph daemons bind dynamically, so you do not have to restart the entire cluster at once if you change
the network configuration for a specific daemon.

Example

[ceph: root@host01 /]# ceph orch restart mon

5. Optional: If you want to restart the cluster, on the admin node as a root user, run systemctl command:

Syntax

systemctl restart ceph-FSID_OF_CLUSTER.target

Example

[root@host01 ~]# systemctl restart ceph-1ca9f6a8-d036-11ec-8263-fa163ee967ad.target
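
Optionally, you can confirm that the public network value was stored in the configuration database. This is a quick verification
sketch, not a required step:

Example

[ceph: root@host01 /]# ceph config get mon public_network
192.168.0.0/24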

Reference

For common option descriptions and usage information, see Ceph network configuration options.

Configuring multiple public networks to the cluster


Edit online
If you want to place Ceph Monitor daemons on hosts belonging to multiple network subnets, you must configure multiple public
networks for the cluster.

An example of usage is a stretch cluster mode used for Advanced Cluster Management (ACM) in Metro DR for OpenShift Data
Foundation.

You can configure multiple public networks to the cluster during bootstrap and once bootstrap is complete.

Prerequisites

Before adding a host, be sure that you have a running IBM Storage Ceph cluster.

Procedure

1. Bootstrap a Ceph cluster configured with multiple public networks.

a. Prepare a ceph.conf file containing a mon public network section.

IMPORTANT: At least one of the provided public networks must be configured on the current host used for bootstrap.

Syntax



[mon]
public_network = PUBLIC_NETWORK1, PUBLIC_NETWORK2

Example

[mon]
public_network = 10.40.0.0/24, 10.41.0.0/24, 10.42.0.0/24

This is an example with three public networks to be provided for bootstrap.

b. Bootstrap the cluster by providing the ceph.conf file as input.

NOTE: During the bootstrap you can include any other arguments that you want to provide.

Syntax

cephadm --image IMAGE_URL bootstrap --mon-ip MONITOR_IP -c PATH_TO_CEPH_CONF

NOTE: Alternatively, an IMAGE_ID (such as 13ea90216d0be03003d12d7869f72ad9de5cec9e54a27fd308e01e467c0d4a0a) can be used instead of
IMAGE_URL.

Example

[root@host01 ~]# cephadm --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest bootstrap --mon-ip 10.40.0.0/24 -c /etc/ceph/ceph.conf

2. Add new hosts to the subnets.

NOTE: The host being added must be reachable from the host that the active manager is running on.

a. Install the cluster’s public SSH key in the new host’s root user’s authorized_keys file:

Syntax

# ssh-copy-id -f -i /etc/ceph/ceph.pub root@NEW-HOST

Example

[root@host01 ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host02
[root@host01 ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host03

b. Add the new node to the Ceph cluster:

Syntax

ceph orch host add NEW_HOST IP [LABEL1 ...]

Example

ceph orch host add host02 10.10.0.102 label1
ceph orch host add host03 10.10.0.103 label2

NOTE:

It is best to explicitly provide the host IP address. If an IP is not provided, then the host name will be
immediately resolved via DNS and that IP will be used.

One or more labels can also be included to immediately label the new host. For example, by default the _admin
label will make cephadm maintain a copy of the ceph.conf file and a client.admin keyring file in
/etc/ceph.

3. Add the networks configurations for the public network parameters to a running cluster. Be sure that the subnets are
separated by commas and that the subnets are listed in subnet/mask format.

Syntax

ceph config set mon public_network "SUBNET_1,SUBNET_2, ..."

Example

[root@host01 ~]# ceph config set mon public_network "192.168.0.0/24, 10.42.0.0/24, ..."



If necessary, update the mon specifications to place the mon daemons on hosts within the specified subnets.
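
For example, one possible sketch of a monitor placement specification applied with the ceph orch apply command; the file path is
arbitrary and the host names are the ones used earlier in this procedure:

Example

[ceph: root@host01 /]# cat <<EOF > /tmp/mon-spec.yaml
service_type: mon
placement:
  hosts:
    - host01
    - host02
    - host03
EOF
[ceph: root@host01 /]# ceph orch apply -i /tmp/mon-spec.yaml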

Reference

For more information about adding hosts, see Adding hosts.

For more information about stretch clusters, see Stretch clusters for Ceph storage.

Configuring a private network


Edit online
Network configuration settings are not required. Ceph assumes a public network with all hosts operating on it, unless you specifically
configure a cluster network, also known as a private network.

If you create a cluster network, OSDs route heartbeat, object replication, and recovery traffic over the cluster network. This can
improve performance, compared to using a single network.

IMPORTANT: For added security, the cluster network should not be reachable from the public network or the Internet.

To assign a cluster network, use the --cluster-network option with the cephadm bootstrap command. The cluster network
that you specify must define a subnet in CIDR notation (for example, 10.90.90.0/24 or fe80::/64).

You can also configure the cluster_network after bootstrap.

Prerequisites

Access to the Ceph software repository.

Root-level access to all nodes in the storage cluster.

Procedure

1. Run the cephadm bootstrap command from the initial node that you want to use as the Monitor node in the storage cluster.
Include the --cluster-network option in the command.

Syntax

cephadm bootstrap --mon-ip IP-ADDRESS --registry-url registry.redhat.io --registry-username USER_NAME --registry-password PASSWORD --cluster-network NETWORK-IP-ADDRESS

Example

[root@host01 ~]# cephadm bootstrap --mon-ip 10.10.128.68 --registry-url registry.redhat.io --registry-username myuser1 --registry-password mypassword1 --cluster-network 10.10.0.0/24

2. To configure the cluster_network after bootstrap, run the config set command and redeploy the daemons:

a. Log in to the cephadm shell:

Example

[root@host01 ~]# cephadm shell

b. Configure the cluster network with the subnet:

Syntax

ceph config set global cluster_network IP_ADDRESS_WITH_SUBNET

Example

[ceph: root@host01 /]# ceph config set global cluster_network 10.10.0.0/24

c. Get the list of services in the storage cluster:

Example

[ceph: root@host01 /]# ceph orch ls



d. Restart the daemons. Ceph daemons bind dynamically, so you do not have to restart the entire cluster at once if you change
the network configuration for a specific daemon.

Example

[ceph: root@host01 /]# ceph orch restart mon

e. Optional: If you want to restart the cluster, on the admin node as a root user, run systemctl command:

Syntax

systemctl restart ceph-FSID_OF_CLUSTER.target

Example

[root@host01 ~]# systemctl restart ceph-1ca9f6a8-d036-11ec-8263-fa163ee967ad.target
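
Optionally, you can confirm that the cluster network value was stored in the configuration database. This is a quick verification
sketch, not a required step:

Example

[ceph: root@host01 /]# ceph config dump | grep cluster_network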

Reference

For more information about invoking cephadm bootstrap, see Bootstrapping a new storage cluster.

Verifying firewall rules are configured for default Ceph ports


Edit online
By default, IBM Storage Ceph daemons use TCP ports 6800-7100 to communicate with other hosts in the cluster. You can verify
that the host's firewall allows connection on these ports.

NOTE: If your network has a dedicated firewall, you might need to verify its configuration in addition to following this procedure. See
the firewall’s documentation for more information.


Prerequisites

Root-level access to the host.

Procedure

1. Verify the host’s iptables configuration:

a. List active rules:

[root@host1 ~]# iptables -L

b. Verify the absence of rules that restrict connectivity on TCP ports 6800-7100.

Example

REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

2. Verify the host’s firewalld configuration:

a. List ports open on the host:

Syntax

firewall-cmd --zone ZONE --list-ports

Example

[root@host1 ~]# firewall-cmd --zone default --list-ports

b. Verify the range is inclusive of TCP ports 6800-7100.

Firewall settings for Ceph Monitor node



Edit online
You can enable encryption for all Ceph traffic over the network with the introduction of the messenger version 2 protocol. The
secure mode setting for messenger v2 encrypts communication between Ceph daemons and Ceph clients, giving you end-to-end
encryption.

Messenger v2 Protocol

The second version of Ceph’s on-wire protocol, msgr2, includes several new features:

A secure mode encrypts all data moving through the network.

Encapsulation improvement of authentication payloads.

Improvements to feature advertisement and negotiation.

The Ceph daemons bind to multiple ports allowing both the legacy, v1-compatible, and the new, v2-compatible, Ceph clients to
connect to the same storage cluster. Ceph clients or other Ceph daemons connecting to the Ceph Monitor daemon will try to use the
v2 protocol first, if possible, but if not, then the legacy v1 protocol will be used. By default, both messenger protocols, v1 and v2, are
enabled. The new v2 port is 3300, and the legacy v1 port is 6789, by default.

Prerequisites

A running IBM Storage Ceph cluster.

Access to the Ceph software repository.

Root-level access to the Ceph Monitor node.

Procedure

1. Add rules using the following example:

[root@mon ~]# sudo iptables -A INPUT -i IFACE -p tcp -s IP-ADDRESS/NETMASK --dport 6789 -j ACCEPT
[root@mon ~]# sudo iptables -A INPUT -i IFACE -p tcp -s IP-ADDRESS/NETMASK --dport 3300 -j ACCEPT

a. Replace IFACE with the public network interface (for example, eth0, eth1, and so on).

b. Replace IP-ADDRESS with the IP address of the public network and NETMASK with the netmask for the public network.

2. For the firewalld daemon, execute the following commands:

[root@mon ~]# firewall-cmd --zone=public --add-port=6789/tcp
[root@mon ~]# firewall-cmd --zone=public --add-port=6789/tcp --permanent
[root@mon ~]# firewall-cmd --zone=public --add-port=3300/tcp
[root@mon ~]# firewall-cmd --zone=public --add-port=3300/tcp --permanent

Firewall settings for Ceph OSDs


Edit online
By default, Ceph OSDs bind to the first available ports on a Ceph node beginning at port 6800. Ensure to open at least four ports
beginning at port 6800 for each OSD that runs on the node:

One for talking to clients and monitors on the public network.

One for sending data to other OSDs on the cluster network.

Two for sending heartbeat packets on the cluster network.

Figure 1. OSD firewall



Ports are node-specific. However, you might need to open more ports than the number needed by the Ceph daemons running on
that Ceph node, in case a daemon fails and restarts without releasing its bound port, so that the restarted daemon binds to a new
port. Consider opening a few additional ports for this reason. Also, consider opening the port range of 6800-7300 on each OSD node.

If you set separate public and cluster networks, you must add rules for both the public network and the cluster network, because
clients will connect using the public network and other Ceph OSD Daemons will connect using the cluster network.

Prerequisites

A running IBM Storage Ceph cluster.

Access to the Ceph software repository.

Root-level access to the Ceph OSD nodes.

Procedure

1. Add rules using the following example:

[root@mon ~]# sudo iptables -A INPUT -i IFACE -m multiport -p tcp -s IP-ADDRESS/NETMASK --dports 6800:7300 -j ACCEPT

a. Replace IFACE with the public network interface (for example, eth0, eth1, and so on).

b. Replace IP-ADDRESS with the IP address of the public network and NETMASK with the netmask for the public network.

2. For the firewalld daemon, execute the following:

[root@mon ~]# firewall-cmd --zone=public --add-port=6800-7300/tcp
[root@mon ~]# firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent

If you put the cluster network into another zone, open the ports within that zone as appropriate.
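
For example, a sketch assuming the cluster network interfaces are assigned to a hypothetical firewalld zone named cluster:

Example

[root@osd ~]# firewall-cmd --zone=cluster --add-port=6800-7300/tcp
[root@osd ~]# firewall-cmd --zone=cluster --add-port=6800-7300/tcp --permanent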

Ceph Monitor configuration


Edit online
As a storage administrator, you can use the default configuration values for the Ceph Monitor or customize them according to the
intended workload.

Prerequisites

Installed IBM Storage Ceph cluster

Reference

Ceph Monitor configuration options

Ceph Monitor configuration


Ceph cluster maps
Ceph Monitor quorum
Ceph Monitor consistency
Bootstrap the Ceph Monitor



Minimum configuration for a Ceph Monitor
Unique identifier for Ceph
Ceph Monitor data store
Ceph storage capacity
Ceph heartbeat
Ceph Monitor synchronization role
Ceph time synchronization

Ceph Monitor configuration


Edit online
All storage clusters have at least one monitor. A Ceph Monitor configuration usually remains fairly consistent, but you can add,
remove or replace a Ceph Monitor in a storage cluster.

Ceph monitors maintain a master copy of the cluster map. That means a Ceph client can determine the location of all Ceph monitors
and Ceph OSDs just by connecting to one Ceph monitor and retrieving a current cluster map.

Before Ceph clients can read from or write to Ceph OSDs, they must connect to a Ceph Monitor first. With a current copy of the
cluster map and the CRUSH algorithm, a Ceph client can compute the location for any object. The ability to compute object locations
allows a Ceph client to talk directly to Ceph OSDs, which is a very important aspect of Ceph’s high scalability and performance.

The primary role of the Ceph Monitor is to maintain a master copy of the cluster map. Ceph Monitors also provide authentication and
logging services. Ceph Monitors write all changes in the monitor services to a single Paxos instance, and Paxos writes the changes to
a key-value store for strong consistency. Ceph Monitors can query the most recent version of the cluster map during synchronization
operations. Ceph Monitors leverage the key-value store’s snapshots and iterators, using the rocksdb database, to perform store-
wide synchronization.

Figure 1. Paxos

Viewing the Ceph Monitor configuration database

Viewing the Ceph Monitor configuration database


Edit online
You can view Ceph Monitor configuration in the configuration database.

NOTE: Previous releases of IBM Storage Ceph centralize Ceph Monitor configuration in /etc/ceph/ceph.conf. This configuration
file has been deprecated as of IBM Storage Ceph 5.3.



Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to a Ceph Monitor host.

Procedure

1. Log into the cephadm shell.

[root@host01 ~]# cephadm shell

2. Use the ceph config command to view the configuration database:

Example

[ceph: root@host01 /]# ceph config get mon

For more information about the options available for the ceph config command, use ceph config -h.

Ceph cluster maps


Edit online
The cluster map is a composite of maps, including the monitor map, the OSD map, and the placement group map. The cluster map
tracks a number of important events:

Which processes are in the IBM Storage Ceph cluster.

Which processes that are in the IBM Storage Ceph cluster are up and running or down.

Whether the placement groups are active or inactive, and clean or in some other state.

Other details that reflect the current state of the cluster, such as:

the total amount of storage space or

the amount of storage used.

When there is a significant change in the state of the cluster, for example, a Ceph OSD goes down or a placement group falls into a
degraded state, the cluster map gets updated to reflect the current state of the cluster. Additionally, the Ceph Monitor
maintains a history of the prior states of the cluster. The monitor map, OSD map, and placement group map each maintain a
history of their map versions. Each version is called an epoch.

When operating the IBM Storage Ceph cluster, keeping track of these states is an important part of the cluster administration.

Ceph Monitor quorum


Edit online
A cluster will run sufficiently with a single monitor. However, a single monitor is a single-point-of-failure. To ensure high availability in
a production Ceph storage cluster, run Ceph with multiple monitors so that the failure of a single monitor will not cause a failure of
the entire storage cluster.

When a Ceph storage cluster runs multiple Ceph Monitors for high availability, Ceph Monitors use the Paxos algorithm to establish
consensus about the master cluster map. A consensus requires a majority of the monitors to be running in order to establish a
quorum for consensus about the cluster map, for example, 1 out of 1; 2 out of 3; 3 out of 5; 4 out of 6; and so on.

IBM recommends running a production IBM Storage Ceph cluster with at least three Ceph Monitors to ensure high availability. When
you run multiple monitors, you can specify the initial monitors that must be members of the storage cluster to establish a quorum.
This may reduce the time it takes for the storage cluster to come online.

[mon]
mon_initial_members = a,b,c



NOTE: A majority of the monitors in the storage cluster must be able to reach each other in order to establish a quorum. You can
decrease the initial number of monitors required to establish a quorum with the mon_initial_members option.
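
To see which monitors are currently members of the quorum, you can query the cluster, for example:

Example

[ceph: root@host01 /]# ceph mon stat
[ceph: root@host01 /]# ceph quorum_status --format json-pretty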

Ceph Monitor consistency


Edit online
When you add monitor settings to the Ceph configuration file, you need to be aware of some of the architectural aspects of Ceph
Monitors. Ceph imposes strict consistency requirements for a Ceph Monitor when discovering another Ceph Monitor within the
cluster. Whereas Ceph clients and other Ceph daemons use the Ceph configuration file to discover monitors, monitors discover each
other using the monitor map (monmap), not the Ceph configuration file.

A Ceph Monitor always refers to the local copy of the monitor map when discovering other Ceph Monitors in the IBM Storage Ceph
cluster. Using the monitor map instead of the Ceph configuration file avoids errors that could break the cluster. For example, typos in
the Ceph configuration file when specifying a monitor address or port. Since monitors use monitor maps for discovery and they share
monitor maps with clients and other Ceph daemons, the monitor map provides monitors with a strict guarantee that their consensus
is valid.

Strict consistency when applying updates to the monitor maps

As with any other updates on the Ceph Monitor, changes to the monitor map always run through a distributed consensus algorithm
called Paxos. The Ceph Monitors must agree on each update to the monitor map, such as adding or removing a Ceph Monitor, to
ensure that each monitor in the quorum has the same version of the monitor map. Updates to the monitor map are incremental so
that Ceph Monitors have the latest agreed-upon version and a set of previous versions.

Maintaining history

Maintaining a history enables a Ceph Monitor that has an older version of the monitor map to catch up with the current state of the
IBM Storage Ceph cluster.

If Ceph Monitors discovered each other through the Ceph configuration file instead of through the monitor map, it would introduce
additional risks because the Ceph configuration files are not updated and distributed automatically. Ceph Monitors might
inadvertently use an older Ceph configuration file, fail to recognize a Ceph Monitor, fall out of a quorum, or develop a situation where
Paxos is not able to determine the current state of the system accurately.

Bootstrap the Ceph Monitor


Edit online
In most configuration and deployment cases, tools that deploy Ceph, such as cephadm, might help bootstrap the Ceph monitors by
generating a monitor map for you.

A Ceph monitor requires a few explicit settings:

File System ID: The fsid is the unique identifier for your object store. Since you can run multiple storage clusters on the
same hardware, you must specify the unique ID of the object store when bootstrapping a monitor. Using deployment tools,
such as cephadm, will generate a file system identifier, but you can also specify the fsid manually.

Monitor ID: A monitor ID is a unique ID assigned to each monitor within the cluster. By convention, the ID is set to the
monitor’s hostname. This option can be set using a deployment tool, using the ceph command, or in the Ceph configuration
file. In the Ceph configuration file, sections are formed as follows:

Example

[mon.host1]
[mon.host2]

Keys: The monitor must have secret keys.

Reference

For more information about cephadm and the Ceph orchestrator, see Operations.



Minimum configuration for a Ceph Monitor
Edit online
The bare minimum settings for a Ceph Monitor in the Ceph configuration file include a host name for each monitor, if it is not
resolvable through DNS, and the monitor address. The Ceph Monitors run on ports 6789 and 3300 by default.

IMPORTANT: Do not edit the Ceph configuration file.

NOTE: This minimum configuration for monitors assumes that a deployment tool generates the fsid and the mon. key for you.

You can use the following commands to set or read the storage cluster configuration options.

ceph config dump


Dumps the entire configuration database for the whole storage cluster.

ceph config generate-minimal-conf


Generates a minimal ceph.conf file.

ceph config get WHO


Dumps the configuration for a specific daemon or a client, as stored in the Ceph Monitor’s configuration database.

ceph config set WHO OPTION VALUE


Sets the configuration option in the Ceph Monitor’s configuration database.

ceph config show WHO


Shows the reported running configuration for a running daemon.

ceph config assimilate-conf -i INPUT_FILE -o OUTPUT_FILE


Ingests a configuration file from the input file and moves any valid options into the Ceph Monitors’ configuration database.

Here, the WHO parameter is the name of a daemon, client, or configuration section, OPTION is the name of a configuration option,
and VALUE is the value to assign to that option.
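
For example, a short sketch of using these commands from within the cephadm shell; the option and value shown are illustrative:

Example

[ceph: root@host01 /]# ceph config get osd osd_memory_target
[ceph: root@host01 /]# ceph config set osd osd_memory_target 4294967296
[ceph: root@host01 /]# ceph config generate-minimal-conf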

IMPORTANT: When a Ceph daemon needs a config option prior to getting the option from the config store, you can set the
configuration by running the following command:

ceph cephadm set-extra-ceph-conf

This command adds text to all the daemon’s ceph.conf files. It is a workaround and is NOT a recommended operation.

Unique identifier for Ceph


Edit online
Each IBM Storage Ceph cluster has a unique identifier (fsid). If specified, it usually appears under the [global] section of the
configuration file. Deployment tools usually generate the fsid and store it in the monitor map, so the value may not appear in a
configuration file. The fsid makes it possible to run daemons for multiple clusters on the same hardware.

NOTE: Do not set this value if you use a deployment tool that does it for you.
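
To view the fsid of a running cluster, for example:

Example

[ceph: root@host01 /]# ceph fsid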

Ceph Monitor data store


Edit online
Ceph provides a default path where Ceph monitors store data.

IMPORTANT: IBM recommends running Ceph monitors on separate hosts and drives from Ceph OSDs for optimal performance in a
production IBM Storage Ceph cluster.

Ceph monitors call the fsync() function often, which can interfere with Ceph OSD workloads.

Ceph monitors store their data as key-value pairs. Using a data store prevents recovering Ceph monitors from running corrupted
versions through Paxos, and it enables multiple modification operations in one single atomic batch, among other advantages.



IMPORTANT: IBM does not recommend changing the default data location. If you modify the default location, make it uniform
across Ceph monitors by setting it in the [mon] section of the configuration file.

Ceph storage capacity


Edit online
When an IBM Storage Ceph cluster gets close to its maximum capacity (specified by the mon_osd_full_ratio parameter), Ceph
prevents you from writing to or reading from Ceph OSDs as a safety measure to prevent data loss. Therefore, letting a production
IBM Storage Ceph cluster approach its full ratio is not a good practice, because it sacrifices high availability. The default full ratio is
.95, or 95% of capacity. This is a very aggressive setting for a test cluster with a small number of OSDs.

TIP: When monitoring a cluster, be alert to warnings related to the nearfull ratio. This warning means that a failure of one or more
OSDs could result in a temporary service disruption. Consider adding more OSDs to increase storage capacity.

A common scenario for test clusters involves a system administrator removing a Ceph OSD from the IBM Storage Ceph cluster to
watch the cluster re-balance. Then, removing another Ceph OSD, and so on until the IBM Storage Ceph cluster eventually reaches
the full ratio and locks up.

IMPORTANT: IBM recommends a bit of capacity planning even with a test cluster. Planning enables you to gauge how much spare
capacity you will need in order to maintain high availability.

Ideally, you want to plan for a series of Ceph OSD failures where the cluster can recover to an active + clean state without
replacing those Ceph OSDs immediately. You can run a cluster in an active + degraded state, but this is not ideal for normal
operating conditions.

The following diagram depicts a simplistic IBM Storage Ceph cluster containing 33 Ceph Nodes with one Ceph OSD per host, each
Ceph OSD Daemon reading from and writing to a 3TB drive. So this exemplary IBM Storage Ceph cluster has a maximum actual
capacity of 99TB. With a mon osd full ratio of 0.95, if the IBM Storage Ceph cluster falls to 5 TB of remaining capacity, the
cluster will not allow Ceph clients to read and write data. So, the IBM Storage Ceph cluster's operating capacity is 95 TB, not 99 TB.

Figure 1. Storage capacity

It is normal in such a cluster for one or two OSDs to fail. A less frequent but reasonable scenario involves a rack’s router or power
supply failing, which brings down multiple OSDs simultaneously, for example, OSDs 7-12. In such a scenario, you should still strive
for a cluster that can remain operational and achieve an active + clean state, even if that means adding a few hosts with
additional OSDs in short order. If your capacity utilization is too high, you might not lose data, but you could still sacrifice data
availability while resolving an outage within a failure domain if the capacity utilization of the cluster exceeds the full ratio. For this
reason, IBM recommends at least some rough capacity planning.

Identify two numbers for your cluster:



the number of OSDs

the total capacity of the cluster

To determine the mean average capacity of an OSD within a cluster, divide the total capacity of the cluster by the number of OSDs in
the cluster. Consider multiplying that number by the number of OSDs you expect to fail simultaneously during normal operations (a
relatively small number). Finally, multiply the capacity of the cluster by the full ratio to arrive at a maximum operating capacity. Then,
subtract the amount of data from the OSDs you expect to fail to arrive at a reasonable full ratio. Repeat the foregoing process with a
higher number of OSD failures (for example, a rack of OSDs) to arrive at a reasonable number for a near full ratio.
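
As a worked sketch, using the 33 OSD, 99 TB cluster described above and assuming you plan for two simultaneous OSD failures:

Example

Mean OSD capacity          = 99 TB / 33 OSDs  = 3 TB per OSD
Expected failure capacity  = 2 OSDs * 3 TB    = 6 TB
Maximum operating capacity = 99 TB * 0.95     = 94.05 TB
Reasonable full ratio      = (94.05 TB - 6 TB) / 99 TB, approximately 0.89

Repeating the calculation for a larger failure, such as a rack of OSDs, yields a correspondingly lower near full ratio.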

Ceph heartbeat
Edit online
Ceph monitors know about the cluster by requiring reports from each OSD, and by receiving reports from OSDs about the status of
their neighboring OSDs. Ceph provides reasonable default settings for interaction between monitor and OSD, however, you can
modify them as needed.

Ceph Monitor synchronization role


Edit online
When you run a production cluster with multiple monitors, which is recommended, each monitor checks to see whether a
neighboring monitor has a more recent version of the cluster map, for example, a map in a neighboring monitor with one or more
epoch numbers higher than the most current epoch in the map of the instant monitor. Periodically, one monitor in the cluster might fall behind the
other monitors to the point where it must leave the quorum, synchronize to retrieve the most current information about the cluster,
and then rejoin the quorum.

Synchronization roles

For the purposes of synchronization, monitors can assume one of three roles:

Leader: The Leader is the first monitor to achieve the most recent Paxos version of the cluster map.

Provider: The Provider is a monitor that has the most recent version of the cluster map, but was not the first to achieve the
most recent version.

Requester: The Requester is a monitor that has fallen behind the leader and must synchronize to retrieve the most recent
information about the cluster before it can rejoin the quorum.

These roles enable a leader to delegate synchronization duties to a provider, which prevents synchronization requests from
overloading the leader and improves performance. In the following diagram, the requester has learned that it has fallen behind the
other monitors. The requester asks the leader to synchronize, and the leader tells the requester to synchronize with a provider.

Figure 1. Monitor Synchronization



Monitor synchronization

Synchronization always occurs when a new monitor joins the cluster. During runtime operations, monitors can receive updates to the
cluster map at different times. This means the leader and provider roles may migrate from one monitor to another. If this happens
while synchronizing, for example, a provider falls behind the leader, the provider can terminate synchronization with a requester.

Once synchronization is complete, Ceph requires trimming across the cluster. Trimming requires that the placement groups are
active + clean.

Ceph time synchronization


Edit online
Ceph daemons pass critical messages to each other, which must be processed before daemons reach a timeout threshold. If the
clocks in Ceph monitors are not synchronized, it can lead to a number of anomalies.

For example:

Daemons ignoring received messages such as outdated timestamps.

Timeouts triggered too soon or late when a message was not received in time.

TIP: Install NTP on the Ceph monitor hosts to ensure that the monitor cluster operates with synchronized clocks.

Clock drift may still be noticeable with NTP even though the discrepancy is not yet harmful. Ceph clock drift and clock skew warnings
can get triggered even though NTP maintains a reasonable level of synchronization. Increasing your clock drift may be tolerable
under such circumstances. However, a number of factors such as workload, network latency, configuring overrides to default
timeouts, and other synchronization options can influence the level of acceptable clock drift without compromising Paxos
guarantees.

Ceph authentication configuration


Edit online
As a storage administrator, authenticating users and services is important to the security of the IBM Storage Ceph cluster. IBM
Storage Ceph includes the Cephx protocol, as the default, for cryptographic authentication, and the tools to manage authentication
in the storage cluster.

Prerequisites



Installed IBM Storage Ceph cluster

Reference

Cephx configuration options

Cephx authentication
Enabling Cephx
Disabling Cephx
Cephx user keyrings
Cephx daemon keyrings
Cephx message signatures

Cephx authentication
Edit online
The cephx protocol is enabled by default. Cryptographic authentication has some computational costs, though they are generally
quite low. If the network environment connecting clients and hosts is considered safe and you cannot afford authentication
computational costs, you can disable it. When deploying a Ceph storage cluster, the deployment tool will create the client.admin
user and keyring.

IMPORTANT: IBM recommends using authentication.

NOTE: If you disable authentication, you are at risk of a man-in-the-middle attack altering client and server messages, which could
lead to significant security issues.

Enabling and disabling Cephx

Enabling Cephx requires that you have deployed keys for the Ceph Monitors and OSDs. When toggling Cephx authentication on or off,
you do not have to repeat the deployment procedures.

Enabling Cephx
Edit online
When cephx is enabled, Ceph will look for the keyring in the default search path, which includes
/etc/ceph/$cluster.$name.keyring. You can override this location by adding a keyring option in the [global] section of
the Ceph configuration file, but this is not recommended.

Execute the following procedures to enable cephx on a cluster with authentication disabled. If you or your deployment utility have
already generated the keys, you may skip the steps related to generating keys.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

Procedure

1. Create a client.admin key, and save a copy of the key for your client host:

[root@mon ~]# ceph auth get-or-create client.admin mon 'allow *' osd 'allow *' -o /etc/ceph/ceph.client.admin.keyring

WARNING: This will erase the contents of any existing /etc/ceph/ceph.client.admin.keyring file. Do not perform this step if
a deployment tool has already done it for you.

2. Create a keyring for the monitor cluster and generate a monitor secret key:

[root@mon ~]# ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'



3. Copy the monitor keyring into a ceph.mon.keyring file in every monitor mon data directory. For example, to copy it to
mon.a in cluster ceph, use the following:

[root@mon ~]# cp /tmp/ceph.mon.keyring /var/lib/ceph/mon/ceph-a/keyring

4. Generate a secret key for every OSD, where ID is the OSD number:

ceph auth get-or-create osd.ID mon 'allow rwx' osd 'allow *' -o /var/lib/ceph/osd/ceph-ID/keyring

5. By default the cephx authentication protocol is enabled.

NOTE: If the cephx authentication protocol was disabled previously by setting the authentication options to none, then removing
the following lines under the [global] section in the Ceph configuration file (/etc/ceph/ceph.conf) reenables the cephx
authentication protocol:

auth_cluster_required = none
auth_service_required = none
auth_client_required = none

6. Start or restart the Ceph storage cluster.

IMPORTANT: Enabling cephx requires downtime because the cluster needs to be completely restarted, or it needs to be shut
down and then started while client I/O is disabled. These flags need to be set before restarting or shutting down the storage
cluster:

[root@mon ~]# ceph osd set noout
[root@mon ~]# ceph osd set norecover
[root@mon ~]# ceph osd set norebalance
[root@mon ~]# ceph osd set nobackfill
[root@mon ~]# ceph osd set nodown
[root@mon ~]# ceph osd set pause

Once cephx is enabled and all PGs are active and clean, unset the flags:

[root@mon ~]# ceph osd unset noout
[root@mon ~]# ceph osd unset norecover
[root@mon ~]# ceph osd unset norebalance
[root@mon ~]# ceph osd unset nobackfill
[root@mon ~]# ceph osd unset nodown
[root@mon ~]# ceph osd unset pause

Disabling Cephx
Edit online
The following procedure describes how to disable Cephx. If your cluster environment is relatively safe, you can offset the
computational expense of running authentication.

IMPORTANT: IBM recommends enabling authentication.

However, it may be easier during setup or troubleshooting to temporarily disable authentication.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

Procedure

1. Disable cephx authentication by setting the following options in the [global] section of the Ceph configuration file:

Example

auth_cluster_required = none
auth_service_required = none
auth_client_required = none

2. Start or restart the Ceph storage cluster.
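
If you manage settings through the centralized configuration database instead of a local Ceph configuration file, a roughly
equivalent sketch is:

Example

[ceph: root@host01 /]# ceph config set global auth_cluster_required none
[ceph: root@host01 /]# ceph config set global auth_service_required none
[ceph: root@host01 /]# ceph config set global auth_client_required none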



Cephx user keyrings
Edit online
When you run Ceph with authentication enabled, the ceph administrative commands and Ceph clients require authentication keys to
access the Ceph storage cluster.

The most common way to provide these keys to the ceph administrative commands and clients is to include a Ceph keyring under
the /etc/ceph/ directory. The file name is usually ceph.client.admin.keyring or $cluster.client.admin.keyring. If
you include the keyring under the /etc/ceph/ directory, you do not need to specify a keyring entry in the Ceph configuration file.

IMPORTANT: IBM recommends copying the IBM Storage Ceph cluster keyring file to nodes where you will run administrative
commands, because it contains the client.admin key.

To do so, execute the following command:

# scp USER@HOSTNAME:/etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring

Replace USER with the user name used on the host with the client.admin key and HOSTNAME with the host name of that host.

NOTE: Ensure the ceph.keyring file has appropriate permissions set on the client machine.
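
For example, you might restrict the keyring so that only the root user can read it. This is only a sketch; the exact mode depends on
your security policy:

Example

[root@host01 ~]# chmod 600 /etc/ceph/ceph.client.admin.keyring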

You can specify the key itself in the Ceph configuration file using the key setting, which is not recommended, or a path to a key file
using the keyfile setting.

Cephx daemon keyrings


Edit online
Administrative users or deployment tools might generate daemon keyrings in the same way as generating user keyrings. By default,
Ceph stores daemon keyrings inside their data directory. Each daemon keyring has a default location and contains the capabilities
necessary for the daemon to function.

NOTE: The monitor keyring contains a key but no capabilities, and is not part of the Ceph storage cluster auth database.

The daemon data directory locations default to directories of the form:

/var/lib/ceph/$type/CLUSTER-ID

Example

/var/lib/ceph/osd/ceph-12

You can override these locations, but it is not recommended.

Cephx message signatures


Edit online
Ceph provides fine-grained control so you can enable or disable signatures for service messages between the client and Ceph. You
can enable or disable signatures for messages between Ceph daemons.

IMPORTANT: IBM recommends that Ceph authenticate all ongoing messages between the entities using the session key set up for
that initial authentication.

NOTE: Ceph kernel modules do not support signatures yet.
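
For example, a sketch of explicitly enabling message signing through the configuration database; see the Cephx configuration
options reference for the full set of signature-related options:

Example

[ceph: root@host01 /]# ceph config set global cephx_sign_messages true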

Pools, placement groups, and CRUSH configuration


Edit online



As a storage administrator, you can choose to use the IBM Storage Ceph default options for pools, placement groups, and the CRUSH
algorithm or customize them for the intended workload.

Pools placement groups and CRUSH

Pools placement groups and CRUSH


Edit online
When you create pools and set the number of placement groups for the pool, Ceph uses default values when you do not specifically
override the defaults.

IMPORTANT: IBM recommends overriding some of the defaults. Specifically, set a pool’s replica size and override the default
number of placement groups.

You can set these values when running pool commands.

By default, Ceph makes three replicas of objects. If you want to set four copies of an object as the default value, a primary copy and
three replica copies, reset the default value of osd_pool_default_size as shown in the following example. If you want to allow
Ceph to write a lesser number of copies in a degraded state, set osd_pool_default_min_size to a number less than the
osd_pool_default_size value.

Example

[ceph: root@host01 /]# ceph config set global osd_pool_default_size 4       # Write an object four times.
[ceph: root@host01 /]# ceph config set global osd_pool_default_min_size 1   # Allow writing one copy in a degraded state.

Ensure you have a realistic number of placement groups. IBM recommends approximately 100 per OSD: the total number of OSDs
multiplied by 100, divided by the number of replicas, that is, by osd_pool_default_size. For 10 OSDs and
osd_pool_default_size = 4, the recommendation is approximately (100 * 10) / 4 = 250.

Example

[ceph: root@host01 /]# ceph config set global osd_pool_default_pg_num 250
[ceph: root@host01 /]# ceph config set global osd_pool_default_pgp_num 250

Ceph Object Storage Daemon (OSD) configuration


Edit online
As a storage administrator, you can configure the Ceph Object Storage Daemon (OSD) to be redundant and optimized based on the
intended workload.

Reference

OSD object daemon storage configuration options

Ceph OSD configuration


Scrubbing the OSD
Backfilling an OSD
OSD recovery

Ceph OSD configuration


Edit online
All Ceph clusters have a configuration, which defines:

Cluster identity



Authentication settings

Ceph daemon membership in the cluster

Network configuration

Host names and addresses

Paths to keyrings

Paths to OSD log files

Other runtime options

A deployment tool, such as cephadm, will typically create an initial Ceph configuration file for you. However, you can create one
yourself if you prefer to bootstrap a cluster without using a deployment tool.

For your convenience, each daemon has a series of default values. Many are set in the ceph/src/common/config_opts.h file.
You can override these settings with a Ceph configuration file or at runtime by using the monitor tell command or connecting
directly to a daemon socket on a Ceph node.

IMPORTANT: IBM does not recommend changing the default paths, as it makes it more difficult to troubleshoot Ceph later.

Reference

For more information about cephadm and the Ceph orchestrator, see Operations.

Scrubbing the OSD


Edit online
In addition to making multiple copies of objects, Ceph ensures data integrity by scrubbing placement groups. Ceph scrubbing is
analogous to the fsck command on the object storage layer.

For each placement group, Ceph generates a catalog of all objects and compares each primary object and its replicas to ensure that
no objects are missing or mismatched.

Light scrubbing (daily) checks the object size and attributes. Deep scrubbing (weekly) reads the data and uses checksums to ensure
data integrity.

Scrubbing is important for maintaining data integrity, but it can reduce performance. Adjust the following settings to increase or
decrease scrubbing operations.
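
For example, a sketch that restricts scrubbing to off-peak hours and limits concurrent scrubs per OSD; the hours and values shown
are illustrative:

Example

[ceph: root@host01 /]# ceph config set osd osd_scrub_begin_hour 22
[ceph: root@host01 /]# ceph config set osd osd_scrub_end_hour 7
[ceph: root@host01 /]# ceph config set osd osd_max_scrubs 1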

Reference

For more information, see Ceph scrubbing options.

Backfilling an OSD
Edit online
When you add Ceph OSDs to a cluster or remove them from the cluster, the CRUSH algorithm rebalances the cluster by moving
placement groups to or from Ceph OSDs to restore the balance. The process of migrating placement groups and the objects they
contain can reduce the cluster operational performance considerably. To maintain operational performance, Ceph performs this
migration with the backfill process, which allows Ceph to set backfill operations to a lower priority than requests to read or write
data.
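
For example, a sketch that lowers the number of concurrent backfill operations allowed to or from a single OSD; the value shown is
illustrative:

Example

[ceph: root@host01 /]# ceph config set osd osd_max_backfills 1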

OSD recovery
Edit online



When the cluster starts or when a Ceph OSD terminates unexpectedly and restarts, the OSD begins peering with other Ceph OSDs
before a write operation can occur.

If a Ceph OSD crashes and comes back online, usually it will be out of sync with other Ceph OSDs containing more recent versions of
objects in the placement groups. When this happens, the Ceph OSD goes into recovery mode and seeks to get the latest copy of the
data and bring its map back up to date. Depending upon how long the Ceph OSD was down, the OSD’s objects and placement groups
may be significantly out of date. Also, if a failure domain went down, for example, a rack, more than one Ceph OSD might come back
online at the same time. This can make the recovery process time consuming and resource intensive.

To maintain operational performance, Ceph performs recovery with limitations on the number of recovery requests, threads, and
object chunk sizes, which allows Ceph to perform well in a degraded state.
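
For example, a sketch that limits concurrent recovery operations and lowers their priority; the values shown are illustrative:

Example

[ceph: root@host01 /]# ceph config set osd osd_recovery_max_active 1
[ceph: root@host01 /]# ceph config set osd osd_recovery_op_priority 1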

Ceph Monitor and OSD interaction configuration


Edit online
As a storage administrator, you must properly configure the interactions between the Ceph Monitors and OSDs to ensure a stable
working environment.

Reference

Ceph Monitor and OSD configuration options

Ceph Monitor and OSD interaction


OSD heartbeat
Reporting an OSD as down
Reporting a peering failure
OSD reporting status

Ceph Monitor and OSD interaction


Edit online
After you have completed your initial Ceph configuration, you can deploy and run Ceph. When you execute a command such as ceph
health or ceph -s, the Ceph Monitor reports on the current state of the Ceph storage cluster. The Ceph Monitor knows about the
Ceph storage cluster by requiring reports from each Ceph OSD daemon, and by receiving reports from Ceph OSD daemons about the
status of their neighboring Ceph OSD daemons. If the Ceph Monitor does not receive reports, or if it receives reports of changes in
the Ceph storage cluster, the Ceph Monitor updates the status of the Ceph cluster map.

Ceph provides reasonable default settings for Ceph Monitor and OSD interaction. However, you can override the defaults. The
following sections describe how Ceph Monitors and Ceph OSD daemons interact for the purposes of monitoring the Ceph storage
cluster.

OSD heartbeat
Edit online
Each Ceph OSD daemon checks the heartbeat of other Ceph OSD daemons every 6 seconds. To change the heartbeat interval,
change the value at runtime:

Syntax

ceph config set osd osd_heartbeat_interval TIME_IN_SECONDS

Example

[ceph: root@host01 /]# ceph config set osd osd_heartbeat_interval 60

If a neighboring Ceph OSD daemon does not send heartbeat packets within a 20 second grace period, the Ceph OSD daemon might
consider the neighboring Ceph OSD daemon down. It can report it back to a Ceph Monitor, which updates the Ceph cluster map. To
change the grace period, set the value at runtime:



Syntax

ceph config set osd osd_heartbeat_grace TIME_IN_SECONDS

Example

[ceph: root@host01 /]# ceph config set osd osd_heartbeat_grace 30

Figure 1. Check heartbeats

Reporting an OSD as down


Edit online
By default, two Ceph OSD Daemons from different hosts must report to the Ceph Monitors that another Ceph OSD Daemon is down
before the Ceph Monitors acknowledge that the reported Ceph OSD Daemon is down.

However, there is the chance that all the OSDs reporting the failure are in different hosts in a rack with a bad switch that causes
connection problems between OSDs.

To avoid a "false alarm," Ceph considers the peers reporting the failure as a proxy for a "subcluster" that is similarly laggy. While this
is not always the case, it may help administrators localize the grace correction to a subset of the system that is performing poorly.

Ceph uses the mon_osd_reporter_subtree_level setting to group the peers into the "subcluster" by their common ancestor
type in the CRUSH map.

By default, only two reports from a different subtree are required to report another Ceph OSD Daemon down. Administrators can
change the number of reporters from unique subtrees and the common ancestor type required to report a Ceph OSD Daemon down
to a Ceph Monitor by setting the mon_osd_min_down_reporters and mon_osd_reporter_subtree_level values at runtime:

Syntax

ceph config set mon mon_osd_min_down_reporters NUMBER

Example

[ceph: root@host01 /]# ceph config set mon mon_osd_min_down_reporters 4

Syntax

ceph config set mon mon_osd_reporter_subtree_level CRUSH_ITEM



Example

[ceph: root@host01 /]# ceph config set mon mon_osd_reporter_subtree_level host
[ceph: root@host01 /]# ceph config set mon mon_osd_reporter_subtree_level rack
[ceph: root@host01 /]# ceph config set mon mon_osd_reporter_subtree_level osd

Figure 1. Report down OSDs

Reporting a peering failure


Edit online
If a Ceph OSD daemon cannot peer with any of the Ceph OSD daemons defined in its Ceph configuration file or the cluster map, it
pings a Ceph Monitor for the most recent copy of the cluster map every 30 seconds. You can change the Ceph Monitor heartbeat
interval by setting the value at runtime:

Syntax

ceph config set osd osd_mon_heartbeat_interval TIME_IN_SECONDS

Example

[ceph: root@host01 /]# ceph config set osd osd_mon_heartbeat_interval 60

Figure 1. Report peering failure

OSD reporting status


Edit online



If a Ceph OSD Daemon does not report to a Ceph Monitor, the Ceph Monitor marks the Ceph OSD Daemon down after the
mon_osd_report_timeout, which is 900 seconds, elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds
of a reportable event, such as a failure, a change in placement group stats, a change in up_thru, or when it boots.

You can change the Ceph OSD Daemon minimum report interval by setting the osd_mon_report_interval value at runtime:

Syntax

ceph config set osd osd_mon_report_interval TIME_IN_SECONDS

To get, set, and verify the config you can use the following example:

Example

[ceph: root@host01 /]# ceph config get osd osd_mon_report_interval
5

[ceph: root@host01 /]# ceph config set osd osd_mon_report_interval 20

[ceph: root@host01 /]# ceph config dump | grep osd
global   advanced  osd_pool_default_crush_rule  -1
osd      basic     osd_memory_target            4294967296
osd      advanced  osd_mon_report_interval      20

Figure 1. Ceph configuration update

Ceph debugging and logging configuration


Edit online
As a storage administrator, you can increase the amount of debugging and logging information in cephadm to help diagnose
problems with IBM Storage Ceph.

Prerequisite

A running IBM Storage Ceph cluster.



Reference

For more information about troubleshooting cephadm, see Cephadm troubleshooting.

For more information about cephadm logging, see Cephadm operations.

General configuration options


Edit online
These are the general configuration options for Ceph.

NOTE: Typically, these will be set automatically by deployment tools, such as cephadm.

fsid
Description
The file system ID. One per cluster.

Type
UUID

Required
No.

Default
N/A. Usually generated by deployment tools.

admin_socket
Description
The socket for executing administrative commands on a daemon, irrespective of whether Ceph monitors have established a
quorum.

Type
String

Required
No

Default
/var/run/ceph/$cluster-$name.asok

pid_file
Description
The file in which the monitor or OSD will write its PID. For instance, /var/run/$cluster/$type.$id.pid will create
/var/run/ceph/mon.a.pid for the mon with id a running in the ceph cluster. The pid file is removed when the daemon stops
gracefully. If the process is not daemonized (meaning it runs with the -f or -d option), the pid file is not created.

Type
String

Required
No

Default
No

chdir
Description
The directory Ceph daemons change to once they are up and running. Default / directory recommended.

Type
String

Required
No



Default
/

max_open_files
Description
If set, when the IBM Storage Ceph cluster starts, Ceph sets the max_open_fds at the OS level (that is, the maximum number of file
descriptors). It helps prevent Ceph OSDs from running out of file descriptors.

Type
64-bit Integer

Required
No

Default
0

fatal_signal_handlers
Description
If set, we will install signal handlers for SEGV, ABRT, BUS, ILL, FPE, XCPU, XFSZ, SYS signals to generate a useful log message.

Type
Boolean

Default
true
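
You can inspect these general options on a running cluster. The following is a minimal sketch that assumes the cephadm shell used
elsewhere in this document; it prints the cluster fsid and the configured admin socket path for OSD daemons:

Example

[ceph: root@host01 /]# ceph fsid
[ceph: root@host01 /]# ceph config get osd admin_socket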

Ceph network configuration options


These are the common network configuration options for Ceph.

public_network
Description
The IP address and netmask of the public (front-side) network (for example, 192.168.0.0/24). Set in [global]. You can
specify comma-delimited subnets.

Type
<ip-address>/<netmask> [, <ip-address>/<netmask>]

Required No

Default N/A

public_addr
Description
The IP address for the public (front-side) network. Set for each daemon.

Type
IP Address

Required No

Default N/A

cluster_network
Description
The IP address and netmask of the cluster network (for example, 10.0.0.0/24). Set in [global]. You can specify comma-
delimited subnets.

Type
<ip-address>/<netmask> [, <ip-address>/<netmask>]

Required No

Default NA



cluster_addr
Description
The IP address for the cluster network. Set for each daemon.

Type
Address

Required No

Default NA

ms_type
Description
The messenger type for the network transport layer. IBM supports the simple and the async messenger type using posix
semantics.

Type
String.

Required No.

Default async+posix

ms_public_type
Description
The messenger type for the network transport layer of the public network. It operates identically to ms_type, but is
applicable only to the public or front-side network. This setting enables Ceph to use a different messenger type for the public
or front-side and cluster or back-side networks.

Type String.

Required No.

Default None.

ms_cluster_type
Description
The messenger type for the network transport layer of the cluster network. It operates identically to ms_type, but is
applicable only to the cluster or back-side network. This setting enables Ceph to use a different messenger type for the public
or front-side and cluster or back-side networks.

Type String.

Required No.

Default None.
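
For example, the public and cluster networks are normally set once in the [global] section. The following is a minimal sketch that
reuses the example subnets above; substitute the subnets used in your environment:

Example

[ceph: root@host01 /]# ceph config set global public_network 192.168.0.0/24
[ceph: root@host01 /]# ceph config set global cluster_network 10.0.0.0/24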

Host options
You must declare at least one Ceph Monitor in the Ceph configuration file, with a mon_addr setting under each declared monitor.
Ceph expects a host setting under each declared monitor, metadata server, and OSD in the Ceph configuration file.

IMPORTANT: Do not use localhost. Use the short name of the node, not the fully-qualified domain name (FQDN). Do not specify
any value for host when using a third party deployment system that retrieves the node name for you.

mon_addr
Description
A list of <hostname>:<port> entries that clients can use to connect to a Ceph monitor. If not set, Ceph searches [mon.*]
sections.

Type String

Required No

Default NA

host
Description
The host name. Use this setting for specific daemon instances (for example, [osd.0]).

Type String



Required Yes, for daemon instances.

Default localhost

TCP options
Ceph disables TCP buffering by default.

ms_tcp_nodelay
Description
Ceph enables ms_tcp_nodelay so that each request is sent immediately (no buffering). Disabling Nagle’s algorithm
increases network traffic, which can introduce congestion. If you experience large numbers of small packets, you may try
disabling ms_tcp_nodelay, but be aware that disabling it will generally increase latency.

Type Boolean

Required No

Default true

ms_tcp_rcvbuf
Description
The size of the socket buffer on the receiving end of a network connection. Disabled by default.

Type 32-bit Integer

Required No

Default 0

ms_tcp_read_timeout
Description
If a client or daemon makes a request to another Ceph daemon and does not drop an unused connection, the tcp read
timeout defines the connection as idle after the specified number of seconds.

Type Unsigned 64-bit Integer

Required No

Default 900 (15 minutes)

Bind options
The bind options configure the default port ranges for the Ceph OSD daemons. The default range is 6800:7300, matching the
ms_bind_port_min and ms_bind_port_max defaults below. You can also enable Ceph daemons to bind to IPv6 addresses.

IMPORTANT: Verify that the firewall configuration allows you to use the configured port range.

ms_bind_port_min
Description
The minimum port number to which an OSD daemon will bind.

Type 32-bit Integer

Default 6800

Required No

ms_bind_port_max
Description
The maximum port number to which an OSD daemon will bind.

Type 32-bit Integer

Default 7300

Required No.

ms_bind_ipv6
Description
Enables Ceph daemons to bind to IPv6 addresses.

Type Boolean

Default false



Required No
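
As noted above, the firewall must allow the configured port range. The following is a minimal sketch that assumes firewalld and the
default public zone; adjust the zone and port range for your environment:

Example

[root@host01 ~]# firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
[root@host01 ~]# firewall-cmd --reload
[ceph: root@host01 /]# ceph config set global ms_bind_ipv6 true

The last command is only needed if the daemons should also bind to IPv6 addresses.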

Asynchronous messenger options
These Ceph messenger options configure the behavior of AsyncMessenger.

ms_async_transport_type
Description
Transport type used by the AsyncMessenger. IBM supports the posix setting, but does not support the dpdk or rdma
settings at this time. POSIX uses standard TCP/IP networking and is the default value. Other transport types are experimental
and are NOT supported.

Type String

Required No

Default posix

ms_async_op_threads
Description
Initial number of worker threads used by each AsyncMessenger instance. This configuration setting SHOULD equal the
number of replicas or erasure code chunks, but it may be set lower if the CPU core count is low or the number of OSDs on a
single server is high.

Type 64-bit Unsigned Integer

Required No

Default 3

ms_async_max_op_threads
Description
The maximum number of worker threads used by each AsyncMessenger instance. Set a lower value if the OSD host has a limited
CPU core count, and increase it if Ceph is underutilizing the available CPU cores.

Type 64-bit Unsigned Integer

Required No

Default 5

ms_async_set_affinity
Description
Set to true to bind AsyncMessenger workers to particular CPU cores.

Type Boolean

Required No

Default true

ms_async_affinity_cores
Description
When ms_async_set_affinity is true, this string specifies how AsyncMessenger workers are bound to CPU cores. For
example, 0,2 binds workers #1 and #2 to CPU cores #0 and #2, respectively.

NOTE: When manually setting affinity, make sure to not assign workers to virtual CPUs created as an effect of hyper threading
or similar technology, because they are slower than physical CPU cores.

Type String

Required No

Default (empty)

ms_async_send_inline
Description
Send messages directly from the thread that generated them instead of queuing and sending from the AsyncMessenger
thread. This option is known to decrease performance on systems with a lot of CPU cores, so it’s disabled by default.

Type Boolean



Required No

Default false
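
As an illustration of the AsyncMessenger options, the following sketch pins worker threads to two physical cores; the core numbers
are assumptions and must match cores that exist on your OSD hosts:

Example

[ceph: root@host01 /]# ceph config set osd ms_async_op_threads 3
[ceph: root@host01 /]# ceph config set osd ms_async_affinity_cores 0,2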

Ceph Monitor configuration options


The following are Ceph monitor configuration options that can be set up during deployment.

You can set these configuration options with the ceph config set mon CONFIGURATION_OPTION VALUE command.
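
For example, to raise the monitor data store size warning threshold to 20 GB and verify the change (an illustration only; substitute
any option described below):

Example

[ceph: root@host01 /]# ceph config set mon mon_data_size_warn 21474836480
[ceph: root@host01 /]# ceph config get mon mon_data_size_warn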

mon_initial_members
Description The IDs of initial monitors in a cluster during startup. If specified, Ceph requires an odd number of monitors to
form an initial quorum (for example, 3).

Type
String

Default
None

mon_force_quorum_join
Description
Force monitor to join quorum even if it has been previously removed from the map

Type
Boolean

Default
False

mon_dns_srv_name
Description
The service name used for querying the DNS for the monitor hosts/addresses.

Type
String

Default
ceph-mon

fsid
Description
The cluster ID. One per cluster.

Type
UUID

Required
Yes.

Default
N/A. May be generated by a deployment tool if not specified.

mon_data
Description
The monitor’s data location.

Type
String

Default
/var/lib/ceph/mon/$cluster-$id

mon_data_size_warn
Description
Ceph issues a HEALTH_WARN status in the cluster log when the monitor's data store reaches this threshold. The default value
is 15 GB.

Type
Integer

Default
15*1024*1024*1024

mon_data_avail_warn
Description
Ceph issues a HEALTH_WARN status in the cluster log when the available disk space of the monitor’s data store is lower than
or equal to this percentage.

Type
Integer

Default
30

mon_data_avail_crit
Description
Ceph issues a HEALTH_ERR status in the cluster log when the available disk space of the monitor’s data store is lower or equal
to this percentage.

Type
Integer

Default
5

mon_warn_on_cache_pools_without_hit_sets
Description
Ceph issues a HEALTH_WARN status in the cluster log if a cache pool does not have the hit_set_type parameter set.

Type
Boolean

Default
True

mon_warn_on_crush_straw_calc_version_zero
Description
Ceph issues a HEALTH_WARN status in the cluster log if the CRUSH’s straw_calc_version is zero.

Type
Boolean

Default
True

mon_warn_on_legacy_crush_tunables
Description
Ceph issues a HEALTH_WARN status in the cluster log if CRUSH tunables are too old (older than
mon_min_crush_required_version).

Type
Boolean

Default
True

mon_crush_min_required_version
Description
This setting defines the minimum tunable profile version required by the cluster.

Type
String

Default
hammer



mon_warn_on_osd_down_out_interval_zero
Description
Ceph issues a HEALTH_WARN status in the cluster log if the mon_osd_down_out_interval setting is zero, because the
Leader behaves in a similar manner when the noout flag is set. Administrators find it easier to troubleshoot a cluster by
setting the noout flag. Ceph issues the warning to ensure administrators know that the setting is zero.

Type
Boolean

Default
True

mon_cache_target_full_warn_ratio
Description
The position between a cache pool's cache_target_full and target_max_object values at which Ceph begins to issue a warning.

Type
Float

Default
0.66

mon_health_data_update_interval
Description
How often (in seconds) a monitor in the quorum shares its health status with its peers. A negative number disables health
updates.

Type
Float

Default
60

mon_health_to_clog
Description
This setting enables Ceph to send a health summary to the cluster log periodically.

Type
Boolean

Default
True

mon_health_detail_to_clog
Description
This setting enables Ceph to send health details to the cluster log periodically.

Type
Boolean

Default
True

mon_op_complaint_time
Description
Number of seconds after which the Ceph Monitor operation is considered blocked after no updates.

Type
Integer

Default
30

mon_health_to_clog_tick_interval
Description
How often (in seconds) the monitor sends a health summary to the cluster log. A non-positive number disables it. If the
current health summary is empty or identical to the last time, the monitor will not send the status to the cluster log.



Type
Integer

Default
60.000000

mon_health_to_clog_interval
Description
How often (in seconds) the monitor sends a health summary to the cluster log. A non-positive number disables it. The monitor
will always send the summary to the cluster log.

Type
Integer

Default
600

mon_osd_full_ratio
Description
The percentage of disk space used before an OSD is considered full.

Type
Float:

Default
.95

mon_osd_nearfull_ratio
Description
The percentage of disk space used before an OSD is considered nearfull.

Type
Float

Default
.85

mon_sync_trim_timeout
Description;

Type
Double

Default
30.0

mon_sync_heartbeat_timeout
Description;

Type
Double

Default
30.0

mon_sync_heartbeat_interval
Description;

Type
Double

Default
5.0

mon_sync_backoff_timeout
Description;

Type
Double



Default
30.0

mon_sync_timeout
Description
The number of seconds the monitor will wait for the next update message from its sync provider before it gives up and
bootstraps again.

Type
Double

Default
60.000000

mon_sync_max_retries
Description;

Type
Integer

Default
5

mon_sync_max_payload_size
Description
The maximum size for a sync payload (in bytes).

Type
32-bit Integer

Default
1045676

paxos_max_join_drift
Description
The maximum Paxos iterations before we must first sync the monitor data stores. When a monitor finds that its peer is too far
ahead of it, it will first sync with data stores before moving on.

Type
Integer

Default
10

paxos_stash_full_interval
Description
How often (in commits) to stash a full copy of the PaxosService state. Currently this setting only affects mds, mon, auth and
mgr PaxosServices.

Type
Integer

Default
25

paxos_propose_interval
Description
Gather updates for this time interval before proposing a map update.

Type
Double

Default
1.0

paxos_min
Description
The minimum number of paxos states to keep around



Type
Integer

Default
500

paxos_min_wait
Description
The minimum amount of time to gather updates after a period of inactivity.

Type
Double

Default
0.05

paxos_trim_min
Description
Number of extra proposals tolerated before trimming.

Type
Integer

Default
250

paxos_trim_max
Description
The maximum number of extra proposals to trim at a time.

Type
Integer

Default
500

paxos_service_trim_min
Description
The minimum amount of versions to trigger a trim (0 disables it).

Type
Integer

Default
250

paxos_service_trim_max
Description
The maximum amount of versions to trim during a single proposal (0 disables it).

Type
Integer

Default
500

mon_max_log_epochs
Description
The maximum amount of log epochs to trim during a single proposal.

Type
Integer

Default
500

mon_max_pgmap_epochs
Description
The maximum amount of pgmap epochs to trim during a single proposal



Type
Integer

Default
500

mon_mds_force_trim_to
Description
Force the monitor to trim mdsmaps to this point (0 disables it; dangerous, use with care).

Type
Integer

Default
0

mon_osd_force_trim_to
Description
Force the monitor to trim osdmaps to this point, even if there are PGs that are not clean at the specified epoch (0 disables it;
dangerous, use with care).

Type
Integer

Default
0

mon_osd_cache_size
Description
The size of the osdmap cache, so as not to rely on the underlying store's cache.

Type
Integer

Default
500

mon_election_timeout
Description
On election proposer, maximum waiting time for all ACKs in seconds.

Type
Float

Default
5

mon_lease
Description
The length (in seconds) of the lease on the monitor’s versions.

Type
Float

Default
5

mon_lease_renew_interval_factor
Description
mon_lease * mon_lease_renew_interval_factor will be the interval for the Leader to renew the other monitors'
leases. The factor should be less than 1.0.

Type
Float

Default
0.6

mon_lease_ack_timeout_factor



Description
The Leader will wait mon_lease * mon_lease_ack_timeout_factor for the Providers to acknowledge the lease
extension.

Type
Float

Default
2.0

mon_accept_timeout_factor
Description
The Leader will wait mon_lease * mon_accept_timeout_factor for the Requesters to accept a Paxos update. It is also
used during the Paxos recovery phase for similar purposes.

Type
Float

Default
2.0

mon_min_osdmap_epochs
Description
Minimum number of OSD map epochs to keep at all times.

Type
32-bit Integer

Default
500

mon_max_pgmap_epochs
Description
Maximum number of PG map epochs the monitor should keep.

Type
32-bit Integer

Default
500

mon_max_log_epochs
Description
Maximum number of Log epochs the monitor should keep.

Type
32-bit Integer

Default
500

clock_offset
Description
How much to offset the system clock. See Clock.cc for details.

Type
Double

Default
0

mon_tick_interval
Description
A monitor’s tick interval in seconds.

Type
32-bit Integer

Default
5



mon_clock_drift_allowed
Description
The clock drift in seconds allowed between monitors.

Type
Float

Default
.050

mon_clock_drift_warn_backoff
Description
Exponential backoff for clock drift warnings.

Type
Float

Default
5

mon_timecheck_interval
Description
The time check interval (clock drift check) in seconds for the leader.

Type
Float

Default
300.0

mon_timecheck_skew_interval
Description
The time check interval (clock drift check) in seconds when in the presence of a skew in seconds for the Leader.

Type
Float

Default
30.0

mon_max_osd
Description
The maximum number of OSDs allowed in the cluster.

Type
32-bit Integer

Default
10000

mon_globalid_prealloc
Description
The number of global IDs to pre-allocate for clients and daemons in the cluster.

Type
32-bit Integer

Default
10000

mon_sync_fs_threshold
Description
Synchronize with the filesystem when writing the specified number of objects. Set it to 0 to disable it.

Type
32-bit Integer

Default
5

mon_subscribe_interval



Description
The refresh interval, in seconds, for subscriptions. The subscription mechanism enables obtaining the cluster maps and log
information.

Type
Double

Default
86400.000000

mon_stat_smooth_intervals
Description
Ceph will smooth statistics over the last N PG maps.

Type
Integer

Default
6

mon_probe_timeout
Description
Number of seconds the monitor will wait to find peers before bootstrapping.

Type
Double

Default
2.0

mon_daemon_bytes
Description
The message memory cap for metadata server and OSD messages (in bytes).

Type
64-bit Integer Unsigned

Default
400ul << 20

mon_max_log_entries_per_event
Description
The maximum number of log entries per event.

Type
Integer

Default
4096

mon_osd_prime_pg_temp
Description
Enables or disables priming the PGMap with the previous OSDs when an out OSD comes back into the cluster. With the true
setting, clients continue to use the previous OSDs until the newly in OSDs for that PG have peered.

Type
Boolean

Default
true

mon_osd_prime_pg_temp_max_time
Description
How much time in seconds the monitor should spend trying to prime the PGMap when an out OSD comes back into the
cluster.

Type
Float



Default
0.5

mon_osd_prime_pg_temp_max_time_estimate
Description
Maximum estimate of time spent on each PG before we prime all PGs in parallel.

Type
Float

Default
0.25

mon_osd_allow_primary_affinity
Description
Allow primary_affinity to be set in the osdmap.

Type
Boolean

Default
False

mon_osd_pool_ec_fast_read
Description
Whether turn on fast read on the pool or not. It will be used as the default setting of newly created erasure pools if
fast_read is not specified at create time.

Type
Boolean

Default
False

mon_mds_skip_sanity
Description
Skip safety assertions on FSMap, in case of bugs where we want to continue anyway. Monitor terminates if the FSMap sanity
check fails, but we can disable it by enabling this option.

Type
Boolean

Default
False

mon_max_mdsmap_epochs
Description
The maximum amount of mdsmap epochs to trim during a single proposal.

Type
Integer

Default
500

mon_config_key_max_entry_size
Description
The maximum size of config-key entry (in bytes).

Type
Integer

Default
65536

mon_warn_pg_not_scrubbed_ratio
Description
The percentage of the scrub max interval past the scrub max interval to warn.



Type
float

Default
0.5

mon_warn_pg_not_deep_scrubbed_ratio
Description
The percentage of the deep scrub interval past the deep scrub interval to warn.

Type
float

Default
0.75

mon_scrub_interval
Description
How often, in seconds, the monitor scrubs its store by comparing the stored checksums with the computed ones for all stored
keys.

Type
Integer

Default
3600*24

mon_scrub_timeout
Description
The timeout, in seconds, to restart the scrub if a monitor quorum participant does not respond for the latest chunk.

Type
Integer

Default
5 min

mon_scrub_max_keys
Description
The maximum number of keys to scrub each time.

Type
Integer

Default
100

mon_scrub_inject_crc_mismatch
Description
The probability of injecting CRC mismatches into Ceph Monitor scrub.

Type
Float

Default
0

mon_scrub_inject_missing_keys
Description
The probability of injecting missing keys into monitor scrub.

Type
float

Default
0

mon_compact_on_start



Description
Compact the database used as Ceph Monitor store on ceph-mon start. A manual compaction helps to shrink the monitor
database and improve its performance if the regular compaction fails to work.

Type
Boolean

Default
False

mon_compact_on_bootstrap
Description
Compact the database used as Ceph Monitor store on bootstrap. The monitor starts probing each other for creating a quorum
after bootstrap. If it times out before joining the quorum, it will start over and bootstrap itself again.

Type
Boolean

Default
False

mon_compact_on_trim
Description
Compact a certain prefix (including paxos) when we trim its old states.

Type
Boolean

Default
True

mon_cpu_threads
Description
Number of threads for performing CPU intensive work on monitor.

Type
Integer

Default
4

mon_osd_mapping_pgs_per_chunk
Description
We calculate the mapping from the placement group to OSDs in chunks. This option specifies the number of placement groups
per chunk.

Type
Integer

Default
4096

mon_osd_max_split_count
Description
Largest number of PGs per "involved" OSD to let split create. When we increase the pg_num of a pool, the placement groups
will be split on all OSDs serving that pool. We want to avoid extreme multipliers on PG splits.

Type
Integer

Default
300

rados_mon_op_timeout
Description
Number of seconds to wait for a response from the monitor before returning an error from a RADOS operation. A value of 0
means no limit (no timeout).



Type
Double

Default
0

Cephx configuration options


The following are Cephx configuration options that can be set up during deployment.

auth_cluster_required
Description
Valid settings are cephx or none.

Type
String

Required
No

Default
cephx.

auth_service_required
Description
Valid settings are cephx or none.

Type
String

Required
No

Default
cephx.

auth_client_required
Description
If enabled, the IBM Storage Ceph cluster daemons require Ceph clients to authenticate with the IBM Storage Ceph cluster in
order to access Ceph services. Valid settings are cephx or none.

Type
String

Required
No

Default
cephx.

keyring
Description
The path to the keyring file.

Type
String

Required
No

Default
/etc/ceph/$cluster.$name.keyring, /etc/ceph/$cluster.keyring, /etc/ceph/keyring,
/etc/ceph/keyring.bin

keyfile



Description
The path to a key file (that is, a file containing only the key).

Type
String

Required
No

Default
None

key
Description
The key (that is, the text string of the key itself). Not recommended.

Type
String

Required
No

Default
None

ceph-mon
Location
$mon_data/keyring

Capabilities
mon 'allow *'

ceph-osd
Location
$osd_data/keyring

Capabilities
mon 'allow profile osd' osd 'allow *'

radosgw
Location
$rgw_data/keyring

Capabilities
mon 'allow rwx' osd 'allow rwx'

cephx_require_signatures
Description
If set to true, Ceph requires signatures on all message traffic between the Ceph client and the IBM Storage Ceph cluster, and
between daemons comprising the IBM Storage Ceph cluster.

Type
Boolean

Required
No

Default
false

cephx_cluster_require_signatures
Description
If set to true, Ceph requires signatures on all message traffic between Ceph daemons comprising the IBM Storage Ceph
cluster.

Type
Boolean

Required
No



Default
false

cephx_service_require_signatures
Description
If set to true, Ceph requires signatures on all message traffic between Ceph clients and the IBM Storage Ceph cluster.

Type
Boolean

Required
No

Default
false

cephx_sign_messages
Description
If the Ceph version supports message signing, Ceph will sign all messages so they cannot be spoofed.

Type
Boolean

Default
true

auth_service_ticket_ttl
Description
When the IBM Storage Ceph cluster sends a Ceph client a ticket for authentication, the cluster assigns the ticket a time to live.

Type
Double

Default
60*60
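
The keyring capabilities listed above are assigned when a keyring is created. The following is a minimal sketch only; the client name,
pool, and output path are illustrative assumptions:

Example

[ceph: root@host01 /]# ceph auth get-or-create client.example mon 'allow r' osd 'allow rw pool=testpool' -o /etc/ceph/ceph.client.example.keyring
[ceph: root@host01 /]# ceph config set global cephx_require_signatures true

The second command enforces message signing for all traffic, as described for cephx_require_signatures.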

Pools, placement groups, and CRUSH configuration options


The Ceph options that govern pools, placement groups, and the CRUSH algorithm.

mon_allow_pool_delete
Description
Allows a monitor to delete a pool. By default, the monitor cannot delete a pool, as an added measure to protect data.

Type
Boolean

Default
false

mon_max_pool_pg_num
Description
The maximum number of placement groups per pool.

Type
Integer

Default
65536

mon_pg_create_interval
Description
Number of seconds between PG creation in the same Ceph OSD Daemon.



Type
Float

Default
30.0

mon_pg_stuck_threshold
Description
Number of seconds after which PGs can be considered as being stuck.

Type
32-bit Integer

Default
300

mon_pg_min_inactive
Description
Ceph issues a HEALTH_ERR status in the cluster log if the number of PGs that remain inactive longer than the
mon_pg_stuck_threshold exceeds this setting. The default setting is one PG. A non-positive number disables this setting.

Type
Integer

Default
1

mon_pg_warn_min_per_osd
Description
Ceph issues a HEALTH_WARN status in the cluster log if the average number of PGs per OSD in the cluster is less than this
setting. A non-positive number disables this setting.

Type
Integer

Default
30

mon_pg_warn_max_per_osd
Description
Ceph issues a HEALTH_WARN status in the cluster log if the average number of PGs per OSD in the cluster is greater than this
setting. A non-positive number disables this setting.

Type
Integer

Default
300

mon_pg_warn_min_objects
Description
Do not warn if the total number of objects in the cluster is below this number.

Type
Integer

Default
1000

mon_pg_warn_min_pool_objects
Description
Do not warn on pools whose object number is below this number.

Type
Integer

Default
1000

mon_pg_check_down_all_threshold



Description
The threshold of down OSDs by percentage after which Ceph checks all PGs to ensure they are not stuck or stale.

Type
Float

Default
0.5

mon_pg_warn_max_object_skew
Description
Ceph issues a HEALTH_WARN status in the cluster log if the average number of objects in a pool is greater than
mon_pg_warn_max_object_skew times the average number of objects for all pools. A non-positive number disables this setting.

Type
Float

Default
10

mon_delta_reset_interval
Description
The number of seconds of inactivity before Ceph resets the PG delta to zero. Ceph keeps track of the delta of the used space
for each pool to aid administrators in evaluating the progress of recovery and performance.

Type
Integer

Default
10

mon_osd_max_op_age
Description
The maximum age in seconds for an operation to complete before issuing a HEALTH_WARN status.

Type
Float

Default
32.0

osd_pg_bits
Description
Placement group bits per Ceph OSD Daemon.

Type
32-bit Integer

Default
6

osd_pgp_bits
Description
The number of bits per Ceph OSD Daemon for Placement Groups for Placement purpose (PGPs).

Type
32-bit Integer

Default
6

osd_crush_chooseleaf_type
Description
The bucket type to use for chooseleaf in a CRUSH rule. Uses ordinal rank rather than name.

Type
32-bit Integer

Default
1. Typically a host containing one or more Ceph OSD Daemons.



osd_pool_default_crush_replicated_ruleset
Description
The default CRUSH ruleset to use when creating a replicated pool.

Type
8-bit Integer

Default
0

osd_pool_erasure_code_stripe_unit
Description
Sets the default size, in bytes, of a chunk of an object stripe for erasure coded pools. Every object of size S will be stored as N
stripes, with each data chunk receiving stripe unit bytes. Each stripe of N * stripe unit bytes will be
encoded/decoded individually. This option can be overridden by the stripe_unit setting in an erasure code profile.

Type
Unsigned 32-bit Integer

Default
4096

osd_pool_default_size
Description
Sets the number of replicas for objects in the pool. The default value is the same as ceph osd pool set {pool-name}
size {size}.

Type
32-bit Integer

Default
3

osd_pool_default_min_size
Description
Sets the minimum number of written replicas for objects in the pool in order to acknowledge a write operation to the client. If
the minimum is not met, Ceph will not acknowledge the write to the client. This setting ensures a minimum number of replicas
when operating in degraded mode.

Type
32-bit Integer

Default
0, which means no particular minimum. If 0, minimum is size - (size / 2).

osd_pool_default_pg_num
Description
The default number of placement groups for a pool. The default value is the same as pg_num with mkpool.

Type
32-bit Integer

Default
32

osd_pool_default_pgp_num
Description
The default number of placement groups for placement for a pool. The default value is the same as pgp_num with mkpool. PG
and PGP should be equal.

Type
32-bit Integer

Default
0

osd_pool_default_flags
Description
The default flags for new pools.



Type
32-bit Integer

Default
0

osd_max_pgls
Description
The maximum number of placement groups to list. A client requesting a large number can tie up the Ceph OSD Daemon.

Type
Unsigned 64-bit Integer

Default
1024

Note
Default should be fine.

osd_min_pg_log_entries
Description
The minimum number of placement group logs to maintain when trimming log files.

Type
32-bit Int Unsigned

Default
250

osd_default_data_pool_replay_window
Description
The time, in seconds, for an OSD to wait for a client to replay a request.

Type
32-bit Integer

Default
45
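
As a worked illustration of the pool defaults above, with osd_pool_default_size set to 3 and osd_pool_default_min_size left at 0, the
effective minimum is size - (size / 2) = 3 - 1 = 2, so writes are acknowledged while at least two replicas are available. The following
sketch sets the defaults and creates a test pool; the pool name is an assumption, and the PG autoscaler may later adjust pg_num:

Example

[ceph: root@host01 /]# ceph config set global osd_pool_default_size 3
[ceph: root@host01 /]# ceph config set global osd_pool_default_pg_num 128
[ceph: root@host01 /]# ceph osd pool create testpool
[ceph: root@host01 /]# ceph osd pool get testpool size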

Object Storage Daemon (OSD) configuration options


The following are Ceph Object Storage Daemon (OSD) configuration options that can be set during deployment.

You can set these configuration options with the ceph config set osd CONFIGURATION_OPTION VALUE command.
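
For example, to throttle backfill and recovery traffic on a busy cluster (an illustration only; the values shown are assumptions and
should be tuned for your workload):

Example

[ceph: root@host01 /]# ceph config set osd osd_max_backfills 1
[ceph: root@host01 /]# ceph config set osd osd_recovery_max_active 3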

osd_uuid
Description
The universally unique identifier (UUID) for the Ceph OSD.

Type
UUID

Default
The UUID.

NOTE: The osd uuid applies to a single Ceph OSD. The fsid applies to the entire cluster.

osd_data
Description
The path to the OSD’s data. You must create the directory when deploying Ceph. Mount a drive for OSD data at this mount
point.

IMPORTANT: IBM does not recommend changing the default.

Type
String



Default
/var/lib/ceph/osd/$cluster-$id

osd_max_write_size
Description
The maximum size of a write in megabytes.

Type
32-bit Integer

Default
90

osd_client_message_size_cap
Description
The largest client data message allowed in memory.

Type
64-bit Integer Unsigned

Default
500MB. 500*1024L*1024L

osd_class_dir
Description
The class path for RADOS class plug-ins.

Type
String

Default
$libdir/rados-classes

osd_max_scrubs
Description
The maximum number of simultaneous scrub operations for a Ceph OSD.

Type
32-bit Int

Default
1

osd_scrub_thread_timeout
Description
The maximum time in seconds before timing out a scrub thread.

Type
32-bit Integer

Default
60

osd_scrub_finalize_thread_timeout
Description
The maximum time in seconds before timing out a scrub finalize thread.

Type
32-bit Integer

Default
60*10

osd_scrub_begin_hour
Description
This restricts scrubbing to this hour of the day or later. Use osd_scrub_begin_hour = 0 and osd_scrub_end_hour = 0
to allow scrubbing for the entire day. Along with osd_scrub_end_hour, these options define a time window in which scrubs can
happen. However, a scrub is performed regardless of the time window if the placement group's scrub interval exceeds
osd_scrub_max_interval.



Type
Integer

Default
0

Allowed range
0 to 23

osd_scrub_end_hour
Description
This restricts scrubbing to hours earlier than this. Use osd_scrub_begin_hour = 0 and osd_scrub_end_hour = 0 to
allow scrubbing for the entire day. Along with osd_scrub_begin_hour, these options define a time window in which scrubs can
happen. However, a scrub is performed regardless of the time window if the placement group's scrub interval exceeds
osd_scrub_max_interval.

Type
Integer

Default
0

Allowed range
0 to 23

osd_scrub_load_threshold
Description
The maximum load. Ceph will not scrub when the system load (as defined by the getloadavg() function) is higher than this
number. Default is 0.5.

Type
Float

Default
0.5

osd_scrub_min_interval
Description
The minimum interval in seconds for scrubbing the Ceph OSD when the IBM Storage Ceph cluster load is low.

Type
Float

Default
Once per day. 60*60*24

osd_scrub_max_interval
Description
The maximum interval in seconds for scrubbing the Ceph OSD irrespective of cluster load.

Type
Float

Default
Once per week. 7*60*60*24

osd_scrub_interval_randomize_ratio
Description
Takes the ratio and randomizes the scheduled scrub between osd_scrub_min_interval and osd_scrub_max_interval.

Type
Float

Default
0.5.

mon_warn_not_scrubbed



Description
Number of seconds after osd_scrub_interval to warn about any PGs that were not scrubbed.

Type
Integer

Default
0 (no warning).

osd_scrub_chunk_min
Description
The object store is partitioned into chunks which end on hash boundaries. For chunky scrubs, Ceph scrubs objects one chunk
at a time with writes blocked for that chunk. The osd_scrub_chunk_min setting represents the minimum number of chunks
to scrub.

Type
32-bit Integer

Default
5

osd_scrub_chunk_max
Description
The maximum number of chunks to scrub.

Type
32-bit Integer

Default
25

osd_scrub_sleep
Description
The time to sleep between deep scrub operations.

Type
Float

Default
0 (or off).

osd_scrub_during_recovery
Description
Allows scrubbing during recovery.

Type
Boolean

Default
false

osd_scrub_invalid_stats
Description
Forces extra scrub to fix stats marked as invalid.

Type
Boolean

Default
true

osd_scrub_priority
Description
Controls queue priority of scrub operations versus client I/O.

Type
Unsigned 32-bit Integer

Default
5



osd_requested_scrub_priority
Description
The priority set for user requested scrub on the work queue. If this value were to be smaller than
osd_client_op_priority, it can be boosted to the value of osd_client_op_priority when scrub is blocking client
operations.

Type
Unsigned 32-bit Integer

Default
120

osd_scrub_cost
Description
Cost of scrub operations in megabytes for queue scheduling purposes.

Type
Unsigned 32-bit Integer

Default
52428800

osd_deep_scrub_interval
Description
The interval for deep scrubbing, that is, fully reading all data. The osd_scrub_load_threshold parameter does not affect
this setting.

Type
Float

Default
Once per week. 60*60*24*7

osd_deep_scrub_stride
Description
Read size when doing a deep scrub.

Type
32-bit Integer

Default
512 KB. 524288

mon_warn_not_deep_scrubbed
Description
Number of seconds after osd_deep_scrub_interval to warn about any PGs that were not scrubbed.

Type
Integer

Default
0 (no warning)

osd_deep_scrub_randomize_ratio
Description
The rate at which scrubs will randomly become deep scrubs (even before osd_deep_scrub_interval has passed).

Type
Float

Default
0.15 or 15%

osd_deep_scrub_update_digest_min_age
Description
How many seconds old objects must be before scrub updates the whole-object digest.

Type
Integer



Default
7200 (2 hours)

osd_deep_scrub_large_omap_object_key_threshold
Description
Ceph issues a warning when it encounters an object with more OMAP keys than this threshold.

Type
Integer

Default
200000

osd_deep_scrub_large_omap_object_value_sum_threshold
Description
Ceph issues a warning when it encounters an object with more total OMAP key bytes than this threshold.

Type
Integer

Default
1 G

osd_delete_sleep
Description
Time in seconds to sleep before the next removal transaction. This throttles the placement group deletion process.

Type
Float

Default
0.0

osd_delete_sleep_hdd
Description
Time in seconds to sleep before the next removal transaction for HDDs.

Type
Float

Default
5.0

osd_delete_sleep_ssd
Description
Time in seconds to sleep before the next removal transaction for SSDs.

Type
Float

Default
1.0

osd_delete_sleep_hybrid
Description
Time in seconds to sleep before the next removal transaction when Ceph OSD data is on HDD and OSD journal or WAL and DB
is on SSD.

Type
Float

Default
1.0

osd_op_num_shards
Description
The number of shards for client operations.

Type
32-bit Integer



Default
0

osd_op_num_threads_per_shard
Description
The number of threads per shard for client operations.

Type
32-bit Integer

Default
0

osd_op_num_shards_hdd
Description
The number of shards for HDD operations.

Type
32-bit Integer

Default
5

osd_op_num_threads_per_shard_hdd
Description
The number of threads per shard for HDD operations.

Type
32-bit Integer

Default
1

osd_op_num_shards_ssd
Description
The number of shards for SSD operations.

Type
32-bit Integer

Default
8

osd_op_num_threads_per_shard_ssd
Description
The number of threads per shard for SSD operations.

Type
32-bit Integer

Default
2

osd_client_op_priority
Description
The priority set for client operations. It is relative to osd recovery op priority.

Type
32-bit Integer

Default
63

Valid Range
1-63

osd_recovery_op_priority
Description
The priority set for recovery operations. It is relative to osd client op priority.



Type
32-bit Integer

Default
3

Valid Range
1-63

osd_op_thread_timeout
Description
The Ceph OSD operation thread timeout in seconds.

Type
32-bit Integer

Default
15

osd_op_complaint_time
Description
An operation becomes complaint worthy after the specified number of seconds have elapsed.

Type
Float

Default
30

osd_disk_threads
Description
The number of disk threads, which are used to perform background disk intensive OSD operations such as scrubbing and snap
trimming.

Type
32-bit Integer

Default
1

osd_op_history_size
Description
The maximum number of completed operations to track.

Type
32-bit Unsigned Integer

Default
20

osd_op_history_duration
Description
The oldest completed operation to track.

Type
32-bit Unsigned Integer

Default
600

osd_op_log_threshold
Description
How many operations logs to display at once.

Type
32-bit Integer

Default
5

osd_op_timeout



Description
The time in seconds after which running OSD operations time out.

Type
Integer

Default
0

IMPORTANT: Do not set the osd op timeout option unless your clients can handle the consequences. For example, setting
this parameter on clients running in virtual machines can lead to data corruption because the virtual machines interpret this
timeout as a hardware failure.

osd_max_backfills
Description
The maximum number of backfill operations allowed to or from a single OSD.

Type
64-bit Unsigned Integer

Default
1

osd_backfill_scan_min
Description
The minimum number of objects per backfill scan.

Type
32-bit Integer

Default
64

osd_backfill_scan_max
Description
The maximum number of objects per backfill scan.

Type
32-bit Integer

Default
512

osd_backfill_full_ratio
Description
Refuse to accept backfill requests when the Ceph OSD’s full ratio is above this value.

Type
Float

Default
0.85

osd_backfill_retry_interval
Description
The number of seconds to wait before retrying backfill requests.

Type
Double

Default
30.000000

osd_map_dedup
Description
Enable removing duplicates in the OSD map.

Type
Boolean



Default
true

osd_map_cache_size
Description
The size of the OSD map cache in megabytes.

Type
32-bit Integer

Default
50

osd_map_cache_bl_size
Description
The size of the in-memory OSD map cache in OSD daemons.

Type
32-bit Integer

Default
50

osd_map_cache_bl_inc_size
Description
The size of the in-memory OSD map cache incrementals in OSD daemons.

Type
32-bit Integer

Default
100

osd_map_message_max
Description
The maximum map entries allowed per MOSDMap message.

Type
32-bit Integer

Default
40

osd_snap_trim_thread_timeout
Description
The maximum time in seconds before timing out a snap trim thread.

Type
32-bit Integer

Default
60*60*1

osd_pg_max_concurrent_snap_trims
Description
The max number of parallel snap trims/PG. This controls how many objects per PG to trim at once.

Type
32-bit Integer

Default
2

osd_snap_trim_sleep
Description
Insert a sleep between every trim operation a PG issues.

Type
32-bit Integer



Default
0

osd_max_trimming_pgs
Description
The max number of trimming PGs

Type
32-bit Integer

Default
2

osd_backlog_thread_timeout
Description
The maximum time in seconds before timing out a backlog thread.

Type
32-bit Integer

Default
60*60*1

osd_default_notify_timeout
Description
The OSD default notification timeout (in seconds).

Type
32-bit Integer Unsigned

Default
30

osd_check_for_log_corruption
Description
Check log files for corruption. Can be computationally expensive.

Type
Boolean

Default
false

osd_remove_thread_timeout
Description
The maximum time in seconds before timing out a remove OSD thread.

Type
32-bit Integer

Default
60*60

osd_command_thread_timeout
Description
The maximum time in seconds before timing out a command thread.

Type
32-bit Integer

Default
10*60

osd_command_max_records
Description
Limits the number of lost objects to return.

Type
32-bit Integer



Default
256

osd_auto_upgrade_tmap
Description
Uses tmap for omap on old objects.

Type
Boolean

Default
true

osd_tmapput_sets_uses_tmap
Description
Uses tmap for debugging only.

Type
Boolean

Default
false

osd_preserve_trimmed_log
Description
Preserves trimmed log files, but uses more disk space.

Type
Boolean

Default
false

osd_recovery_delay_start
Description
After peering completes, Ceph delays for the specified number of seconds before starting to recover objects.

Type
Float

Default
0

osd_recovery_max_active
Description
The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but the requests place
an increased load on the cluster.

Type
32-bit Integer

Default
0

osd_recovery_max_chunk
Description
The maximum size of a recovered chunk of data to push.

Type
64-bit Integer Unsigned

Default
8388608

osd_recovery_threads
Description
The number of threads for recovering data.

Type
32-bit Integer



Default
1

osd_recovery_thread_timeout
Description
The maximum time in seconds before timing out a recovery thread.

Type
32-bit Integer

Default
30

osd_recover_clone_overlap
Description
Preserves clone overlap during recovery. Should always be set to true.

Type
Boolean

Default
true

rados_osd_op_timeout
Description
Number of seconds that RADOS waits for a response from the OSD before returning an error from a RADOS operation. A value
of 0 means no limit.

Type
Double

Default
0

Ceph Monitor and OSD configuration options


When modifying heartbeat settings, include them in the [global] section of the Ceph configuration file.

mon_osd_min_up_ratio
Description
The minimum ratio of up Ceph OSD Daemons before Ceph will mark Ceph OSD Daemons down.

Type
Double

Default
.3

mon_osd_min_in_ratio
Description
The minimum ratio of in Ceph OSD Daemons before Ceph will mark Ceph OSD Daemons out.

Type Double

Default
0.750000

mon_osd_laggy_halflife
Description
The number of seconds laggy estimates will decay.

Type
Integer

Default
60*60



mon_osd_laggy_weight
Description
The weight for new samples in laggy estimation decay.

Type
Double

Default
0.3

mon_osd_laggy_max_interval
Description
Maximum value of laggy_interval in laggy estimations (in seconds). The monitor uses an adaptive approach to evaluate
the laggy_interval of a certain OSD. This value will be used to calculate the grace time for that OSD.

Type
Integer

Default
300

mon_osd_adjust_heartbeat_grace
Description
If set to true, Ceph will scale based on laggy estimations.

Type
Boolean

Default
true

mon_osd_adjust_down_out_interval
Description
If set to true, Ceph will scale based on laggy estimations.

Type
Boolean

Default
true

mon_osd_auto_mark_in
Description
Ceph will mark any booting Ceph OSD Daemons as in the Ceph Storage Cluster.

Type
Boolean

Default
false

mon_osd_auto_mark_auto_out_in
Description
Ceph will mark booting Ceph OSD Daemons that were automatically marked out of the Ceph Storage Cluster as in the cluster.

Type
Boolean

Default
true

mon_osd_auto_mark_new_in
Description
Ceph will mark booting new Ceph OSD Daemons as in the Ceph Storage Cluster.

Type
Boolean

Default
true



mon_osd_down_out_interval
Description
The number of seconds Ceph waits before marking a Ceph OSD Daemon down and out if it does not respond.

Type
32-bit Integer

Default
600

mon_osd_downout_subtree_limit
Description
The largest CRUSH unit type that Ceph will automatically mark out.

Type
String

Default
rack

mon_osd_reporter_subtree_level
Description
This setting defines the parent CRUSH unit type for the reporting OSDs. The OSDs send failure reports to the monitor if they
find an unresponsive peer. The monitor may mark the reported OSD down and then out after a grace period.

Type
String

Default
host

mon_osd_report_timeout
Description
The grace period in seconds before declaring unresponsive Ceph OSD Daemons down.

Type
32-bit Integer

Default
900

mon_osd_min_down_reporters
Description
The minimum number of Ceph OSD Daemons required to report a down Ceph OSD Daemon.

Type
32-bit Integer

Default
2

osd_heartbeat_address
Description
A Ceph OSD Daemon’s network address for heartbeats.

Type
Address

Default
The host address.

osd_heartbeat_interval
Description
How often a Ceph OSD Daemon pings its peers (in seconds).

Type
32-bit Integer

Default
6



osd_heartbeat_grace
Description
The elapsed time, in seconds, without a heartbeat after which the Ceph Storage Cluster considers a Ceph OSD Daemon down.

Type
32-bit Integer

Default
20

osd_mon_heartbeat_interval
Description
Frequency of Ceph OSD Daemon pinging a Ceph Monitor if it has no Ceph OSD Daemon peers.

Type
32-bit Integer

Default
30

osd_mon_report_interval_max
Description The maximum time in seconds that a Ceph OSD Daemon can wait before it must report to a Ceph Monitor.

Type
32-bit Integer

Default
120

osd_mon_report_interval_min
Description
The minimum number of seconds a Ceph OSD Daemon may wait from startup or another reportable event before reporting to
a Ceph Monitor.

Type
32-bit Integer

Default
5

Valid Range
Should be less than osd_mon_report_interval_max.

osd_mon_ack_timeout
Description
The number of seconds to wait for a Ceph Monitor to acknowledge a request for statistics.

Type
32-bit Integer

Default
30
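
For example, to give OSDs a longer heartbeat grace period and more time before they are marked out during planned maintenance
(a sketch only; the values are assumptions, and as noted above the heartbeat settings belong in the [global] section):

Example

[ceph: root@host01 /]# ceph config set global osd_heartbeat_grace 30
[ceph: root@host01 /]# ceph config set global mon_osd_down_out_interval 900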

Ceph debugging and logging configuration options


Logging and debugging settings are not required in a Ceph configuration file, but you can override default settings as needed.

The options take a single item that is assumed to be the default for all daemons regardless of channel. For example, specifying "info"
is interpreted as "default=info". However, options can also take key/value pairs. For example, "default=daemon audit=local0" is
interpreted as "default all to daemon, override audit with local0."
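
For example, to mirror daemon and error output to syslog in addition to the default log files (a minimal sketch; both options are
described below and default to false):

Example

[ceph: root@host01 /]# ceph config set global log_to_syslog true
[ceph: root@host01 /]# ceph config set global err_to_syslog true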

log_file
Description
The location of the logging file for the cluster.



Type
String

Required
No

Default
/var/log/ceph/$cluster-$name.log

mon_cluster_log_file
Description The location of the monitor cluster’s log file.

Type String

Required No

Default
/var/log/ceph/$cluster.log

log_max_new
Description
The maximum number of new log files.

Type
Integer

Required
No

Default
1000

log_max_recent
Description
The maximum number of recent events to include in a log file.

Type
Integer

Required
No

Default
10000

log_flush_on_exit
Description
Determines if Ceph flushes the log files after exit.

Type
Boolean

Required
No

Default
true

mon_cluster_log_file_level
Description
The level of file logging for the monitor cluster. Valid settings include "debug", "info", "sec", "warn", and "error".

Type
String

Default
"info"

log_to_stderr
Description
Determines if logging messages appear in stderr.



Type
Boolean

Required
No

Default
true

err_to_stderr
Description
Determines if error messages appear in stderr.

Type
Boolean

Required
No

Default
true

log_to_syslog
Description
Determines if logging messages appear in syslog.

Type
Boolean

Required
No

Default
false

err_to_syslog
Description
Determines if error messages appear in syslog.

Type
Boolean

Required
No

Default
false

clog_to_syslog
Description
Determines if clog messages will be sent to syslog.

Type
Boolean

Required
No

Default
false

mon_cluster_log_to_syslog
Description
Determines if the cluster log will be output to syslog.

Type
Boolean

Required
No



Default
false

mon_cluster_log_to_syslog_level
Description
The level of syslog logging for the monitor cluster. Valid settings include "debug", "info", "sec", "warn", and "error".

Type
String

Default
"info"

mon_cluster_log_to_syslog_facility
Description
The facility generating the syslog output. This is usually set to "daemon" for the Ceph daemons.

Type
String

Default
"daemon"

clog_to_monitors
Description
Determines if clog messages will be sent to monitors.

Type
Boolean

Required
No

Default
true

mon_cluster_log_to_graylog
Description
Determines if the cluster will output log messages to graylog.

Type
String

Default
"false"

mon_cluster_log_to_graylog_host
Description
The IP address of the graylog host. If the graylog host is different from the monitor host, override this setting with the
appropriate IP address.

Type
String

Default
"127.0.0.1"

mon_cluster_log_to_graylog_port
Description
Graylog logs will be sent to this port. Ensure the port is open for receiving data.

Type
String

Default
"12201"

osd_preserve_trimmed_log
Description
Preserves trimmed logs after trimming.



Type
Boolean

Required
No

Default
false

osd_tmapput_sets_uses_tmap
Description
Uses tmap. For debug only.

Type
Boolean

Required
No

Default
false

osd_min_pg_log_entries
Description
The minimum number of log entries for placement groups.

Type
32-bit Unsigned Integer

Required
No

Default
1000

osd_op_log_threshold
Description
Number of op log messages to show up in one pass.

Type
Integer

Required
No

Default
5

Ceph scrubbing options


Ceph ensures data integrity by scrubbing placement groups. The following are the Ceph scrubbing options that you can adjust to
increase or decrease scrubbing operations.

You can set these configuration options with the ceph config set global CONFIGURATION_OPTION VALUE command.

mds_max_scrub_ops_in_progress
Description
The maximum number of scrub operations performed in parallel. You can set this value with the ceph config set mds
mds_max_scrub_ops_in_progress VALUE command.

Type
integer

Default
5



osd_max_scrubs
Description
The maximum number of simultaneous scrub operations for a Ceph OSD Daemon.

Type
integer

Default
1

osd_scrub_begin_hour
Description
The specific hour at which the scrubbing begins. Along with osd_scrub_end_hour, you can define a time window in which
the scrubs can happen. Use osd_scrub_begin_hour = 0 and osd_scrub_end_hour = 0 to allow scrubbing the entire
day.

Type
integer

Default
0

Allowed range
[0, 23]

osd_scrub_end_hour
Description
The specific hour at which the scrubbing ends. Along with osd_scrub_begin_hour, you can define a time window, in which
the scrubs can happen. Use osd_scrub_begin_hour = 0 and osd_scrub_end_hour = 0 to allow scrubbing for the
entire day.

Type
integer

Default
0

Allowed range
[0, 23]

osd_scrub_begin_week_day
Description
The specific day on which the scrubbing begins. 0 = Sunday, 1 = Monday, etc. Along with osd_scrub_end_week_day, you
can define a time window in which scrubs can happen. Use osd_scrub_begin_week_day = 0 and
osd_scrub_end_week_day = 0 to allow scrubbing for the entire week.

Type
integer

Default
0

Allowed range
[0, 6]

osd_scrub_end_week_day
Description
This defines the day on which the scrubbing ends. 0 = Sunday, 1 = Monday, etc. Along with osd_scrub_begin_week_day,
they define a time window, in which the scrubs can happen. Use osd_scrub_begin_week_day = 0 and
osd_scrub_end_week_day = 0 to allow scrubbing for the entire week.

Type
integer

Default
0

Allowed range
[0, 6]



osd_scrub_during_recovery
Description
Allow scrubbing during recovery. Setting this to false disables scheduling new scrubs and deep-scrubs while there is an active
recovery. Scrubs that are already running continue, which is useful for reducing load on busy storage clusters.

Type
boolean

Default
false

osd_scrub_load_threshold
Description
The normalized maximum load. Scrubbing does not happen when the system load, as defined by getloadavg()/number of
online CPUs, is higher than this defined number.

Type
float

Default
0.5

osd_scrub_min_interval
Description
The minimal interval in seconds for scrubbing the Ceph OSD daemon when the Ceph storage Cluster load is low.

Type
float

Default
1 day

osd_scrub_max_interval
Description
The maximum interval in seconds for scrubbing the Ceph OSD daemon irrespective of cluster load.

Type
float

Default
7 days

osd_scrub_chunk_min
Description
The minimal number of object store chunks to scrub during a single operation. Ceph blocks writes to a single chunk during
scrub.

Type
integer

Default
5

osd_scrub_chunk_max
Description
The maximum number of object store chunks to scrub during a single operation.

Type
integer

Default
25

osd_scrub_sleep
Description
Time to sleep before scrubbing the next group of chunks. Increasing this value slows down the overall rate of scrubbing, so
that client operations are less impacted.



Type
float

Default
0.0

osd_scrub_extended_sleep
Description
The duration, in seconds, of the delay injected between scrub operations when outside of the configured scrubbing hours or days.

Type
float

Default
0.0

osd_scrub_backoff_ratio
Description
Backoff ratio for scheduling scrubs. This is the percentage of ticks that do NOT schedule scrubs; 66% means that
1 out of 3 ticks schedules scrubs.

Type
float

Default
0.66

osd_deep_scrub_interval
Description
The interval for deep scrubbing, fully reading all data. The osd_scrub_load_threshold does not affect this setting.

Type
float

Default
7 days

osd_debug_deep_scrub_sleep
Description
Inject an expensive sleep during deep scrub I/O to make it easier to induce preemption.

Type
float

Default
0

osd_scrub_interval_randomize_ratio
Description
Add a random delay to osd_scrub_min_interval when scheduling the next scrub job for a placement group. The delay is a
random value less than osd_scrub_min_interval * osd_scrub_interval_randomize_ratio. The default setting
spreads scrubs throughout the allowed time window of [1, 1.5] * osd_scrub_min_interval.

Type
float

Default
0.5

osd_deep_scrub_stride
Description
Read size when doing a deep scrub.

Type
size

Default
512 KB

osd_scrub_auto_repair_num_errors
Description
Auto repair does not occur if more than this many errors are found.



Type
integer

Default
5

osd_scrub_auto_repair
Description
Setting this to true enables automatic Placement Group (PG) repair when errors are found by scrubs or deep-scrubs.
However, if more than osd_scrub_auto_repair_num_errors errors are found, a repair is NOT performed.

Type
boolean

Default
false

osd_scrub_max_preemptions
Description
The maximum number of times a deep scrub can be preempted by a client operation before client I/O is blocked to let the scrub
complete.

Type
integer

Default
5

osd_deep_scrub_keys
Description
Number of keys to read from an object at a time during deep scrub.

Type
integer

Default
1024
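
As an illustration of combining the scheduling options above, the following sketch limits scrubbing to a nightly window and enables
automatic repair of scrub errors. The hours shown are assumptions, and the sketch assumes the begin/end window is allowed to
wrap past midnight:

Example

[ceph: root@host01 /]# ceph config set global osd_scrub_begin_hour 23
[ceph: root@host01 /]# ceph config set global osd_scrub_end_hour 6
[ceph: root@host01 /]# ceph config set global osd_scrub_auto_repair true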

BlueStore configuration options


The following are Ceph BlueStore configuration options that can be configured during deployment.

NOTE: This list is not complete.

rocksdb_cache_size
Description
The size of the RocksDB cache in MB.

Type
32-bit Integer

Default
512
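
For example, to inspect and raise the RocksDB cache for OSDs with spare memory (a sketch only; the value follows the MB unit
documented above, and the change might not take effect until the OSDs restart):

Example

[ceph: root@host01 /]# ceph config get osd rocksdb_cache_size
[ceph: root@host01 /]# ceph config set osd rocksdb_cache_size 1024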

Administering
Learn how to properly administer and operate IBM Storage Ceph.

Administration
Operations



Administration
Learn how to manage processes, monitor cluster states, manage users, and add and remove daemons for IBM Storage Ceph.

Ceph administration
Understanding process management for Ceph
Monitoring a Ceph storage cluster
Stretch clusters for Ceph storage
Override Ceph behavior
Ceph user management
The ceph-volume utility
Ceph performance benchmark
Ceph performance counters
BlueStore
Cephadm troubleshooting
Cephadm operations
Managing an IBM Storage Ceph cluster using cephadm-ansible modules

Ceph administration
An IBM Storage Ceph cluster is the foundation for all Ceph deployments. After deploying an IBM Storage Ceph cluster, there are
administrative operations for keeping an IBM Storage Ceph cluster healthy and performing optimally.

This section helps storage administrators to perform such tasks as:

How do I check the health of my IBM Storage Ceph cluster?

How do I start and stop the IBM Storage Ceph cluster services?

How do I add or remove an OSD from a running IBM Storage Ceph cluster?

How do I manage user authentication and access controls to the objects stored in an IBM Storage Ceph cluster?

I want to understand how to use overrides with an IBM Storage Ceph cluster.

I want to monitor the performance of the IBM Storage Ceph cluster.

A basic Ceph storage cluster consists of two types of daemons:

A Ceph Object Storage Device (OSD) stores data as objects within placement groups assigned to the OSD

A Ceph Monitor maintains a master copy of the cluster map

A production system will have three or more Ceph Monitors for high availability and typically a minimum of 50 OSDs for acceptable
load balancing, data re-balancing and data recovery.

Reference
Edit online

For more information, see Installing.

Understanding process management for Ceph


Edit online



As a storage administrator, you can manipulate the various Ceph daemons by type or instance in an IBM Storage Ceph cluster.
Manipulating these daemons allows you to start, stop and restart all of the Ceph services as needed.

Ceph process management


Starting, stopping, and restarting all Ceph daemons
Starting, stopping, and restarting all Ceph services
Viewing log files of Ceph daemons that run in containers
Powering down and rebooting IBM Storage Ceph cluster

Ceph process management


Edit online
In IBM Storage Ceph, all process management is done through the systemd service. Each time you want to start, restart, or stop a Ceph daemon, you must specify the daemon type or the daemon instance.

For more information on using systemd, see Introduction to systemd and Managing system services with systemctl within the
Configuring basic system settings guide for Red Hat Enterprise Linux 8.
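
For example, to see which Ceph systemd units are present on a host, run the following command. This is a minimal sketch; the unit names include your cluster FSID, so they differ from cluster to cluster:

Example

[root@host01 ~]# systemctl list-units --type=service "ceph*"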

Starting, stopping, and restarting all Ceph daemons


Edit online
You can start, stop, and restart all Ceph daemons as the root user from the host where you want to stop the Ceph daemons.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Having root access to the node.

Procedure
Edit online

1. On the host where you want to start, stop, and restart the daemons, run the systemctl command to get the SERVICE_ID of the service.

Example

[root@host01 ~]# systemctl --type=service
ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

2. Starting all Ceph daemons:

Syntax

systemctl start SERVICE_ID

Example

[root@host01 ~]# systemctl start ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

3. Stopping all Ceph daemons:

Syntax

systemctl stop SERVICE_ID

Example

[root@host01 ~]# systemctl stop ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service



4. Restarting all Ceph daemons:

Syntax

systemctl restart SERVICE_ID

Example

[root@host01 ~]# systemctl restart ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

Starting, stopping, and restarting all Ceph services


Edit online
Ceph services are logical groups of Ceph daemons of the same type, configured to run in the same IBM Storage Ceph cluster. The
orchestration layer in Ceph allows the user to manage these services in a centralized way, making it easy to execute operations that
affect all the Ceph daemons that belong to the same logical service. The Ceph daemons running in each host are managed through
the systemd service. You can start, stop, and restart all Ceph services from the host where you want to manage the Ceph services.

IMPORTANT: If you want to start, stop, or restart a specific Ceph daemon on a specific host, you need to use the systemd service. To obtain a list of the systemd services running on a specific host, connect to the host, and run the following command:

Example

[root@host01 ~]# systemctl list-units "ceph*"

The output gives you a list of the service names that you can use to manage each Ceph daemon.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Having root access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Run the ceph orch ls command to get a list of Ceph services configured in the IBM Storage Ceph cluster and to get the
specific service ID.

Example

[ceph: root@host01 /]# ceph orch ls


NAME                       RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                                            IMAGE ID
alertmanager               1/1      4m ago     4M   count:1    cp.icr.io/cp/ibm-ceph/prometheus-alertmanager:v4.10   b7bae610cd46
crash                      3/3      4m ago     4M   *          cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest             c88a5d60f510
grafana                    1/1      4m ago     4M   count:1    cp.icr.io/cp/ibm-ceph/ceph-5-dashboard-rhel8:latest   bd3d7748747b
mgr                        2/2      4m ago     4M   count:2    cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest             c88a5d60f510
mon                        2/2      4m ago     10w  count:2    cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest             c88a5d60f510
node-exporter              1/3      4m ago     4M   *          cp.icr.io/cp/ibm-ceph/prometheus-node-exporter:v4.10  mix
osd.all-available-devices  5/5      4m ago     3M   *          cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest             c88a5d60f510
prometheus                 1/1      4m ago     4M   count:1    cp.icr.io/cp/ibm-ceph/prometheus:v4.10                bebb0ddef7f0
rgw.test_realm.test_zone   2/2      4m ago     3M   count:2    cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest             c88a5d60f510

3. To start a specific service, run the following command:

Syntax

ceph orch start SERVICE_ID

Example

[ceph: root@host01 /]# ceph orch start node-exporter

4. To stop a specific service, run the following command:

IMPORTANT: Stopping the MON or MGR service with the ceph orch stop SERVICE_ID command makes the IBM Storage Ceph cluster inaccessible. To stop a specific daemon on a specific host, use the systemctl stop SERVICE_ID command instead.

Syntax

ceph orch stop SERVICE_ID

Example

[ceph: root@host01 /]# ceph orch stop node-exporter

In this example, the ceph orch stop node-exporter command stops all the daemons of the node-exporter service.

5. To restart a specific service, run the following command:

Syntax

ceph orch restart SERVICE_ID

Example

[ceph: root@host01 /]# ceph orch restart node-exporter

Viewing log files of Ceph daemons that run in containers


Edit online
Use the journald daemon from the container host to view a log file of a Ceph daemon from a container.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To view the entire Ceph log file, run a journalctl command as root composed in the following format:

Syntax

journalctl -u SERVICE_ID

Example

[root@host01 ~]# journalctl -u ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service



In the above example, you can view the entire log for the OSD with ID osd.8.

2. To show only the recent journal entries, use the -f option.

Syntax

journalctl -fu SERVICE_ID

Example

[root@host01 ~]# journalctl -fu ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service

NOTE: You can also use the sosreport utility to view the journald logs. For more details about SOS reports, see the What is an
sosreport and how to create one in Red Hat Enterprise Linux? solution on the Red Hat Customer Portal.

Reference
Edit online

The journalctl manual page.

Powering down and rebooting IBM Storage Ceph cluster


Edit online
You can power down and reboot the IBM Storage Ceph cluster using two different approaches: systemctl commands and the Ceph
Orchestrator. You can choose either approach to power down and reboot the cluster.

Powering down and rebooting the cluster using the systemctl commands
Powering down and rebooting the cluster using the Ceph Orchestrator

Powering down and rebooting the cluster using the systemctl


commands
Edit online
You can use the systemctl commands approach to power down and reboot the IBM Storage Ceph cluster. This approach follows
the Linux way of stopping the services.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access.

Procedure
Edit online
Powering down the IBM Storage Ceph cluster

1. Stop the clients from using the Block Device images and the Ceph Object Gateway (RADOS Gateway) on this cluster, as well as any other clients.

2. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell



3. The cluster must be in a healthy state (HEALTH_OK and all PGs active+clean) before proceeding. Run ceph status on a host with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.

Example

[ceph: root@host01 /]# ceph -s

4. If you use the Ceph File System (CephFS), bring down the CephFS cluster:

Syntax

ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false

Example

[ceph: root@host01 /]# ceph fs set cephfs max_mds 1


[ceph: root@host01 /]# ceph fs fail cephfs
[ceph: root@host01 /]# ceph status
[ceph: root@host01 /]# ceph fs set cephfs joinable false

5. Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the
client keyrings, for example, the Ceph Monitor or OpenStack controller node:

Example

[ceph: root@host01 /]# ceph osd set noout


[ceph: root@host01 /]# ceph osd set norecover
[ceph: root@host01 /]# ceph osd set norebalance
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set nodown
[ceph: root@host01 /]# ceph osd set pause

6. If the MDS and Ceph Object Gateway nodes are on their own dedicated nodes, power them off.

7. Shut down the OSD nodes one by one:

Example

[root@host01 ~]# systemctl stop ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.1.service

8. Shut down the monitor nodes one by one:

Example

[root@host01 ~]# systemctl stop ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

9. Shut down the admin node.

Rebooting the IBM Storage Ceph cluster

1. If network equipment was involved, ensure it is powered ON and stable prior to powering ON any Ceph hosts or nodes.

2. Power ON the administration node.

3. Power ON the monitor nodes:

Example

[root@host01 ~]# systemctl start ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

4. Power ON the OSD nodes:

Example

[root@host01 ~]# systemctl start ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.1.service

5. Wait for all the nodes to come up. Verify all the services are up and there are no connectivity issues between the nodes.

6. Unset the noout, norecover, norebalance, nobackfill, nodown and pause flags. Run the following on a node with the
client keyrings, for example, the Ceph Monitor or OpenStack controller node:



Example

[ceph: root@host01 /]# ceph osd unset noout


[ceph: root@host01 /]# ceph osd unset norecover
[ceph: root@host01 /]# ceph osd unset norebalance
[ceph: root@host01 /]# ceph osd unset nobackfill
[ceph: root@host01 /]# ceph osd unset nodown
[ceph: root@host01 /]# ceph osd unset pause

7. If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:

Syntax

ceph fs set FS_NAME joinable true

Example

[ceph: root@host01 /]# ceph fs set cephfs joinable true

Verification
Edit online

Verify the cluster is in a healthy state (HEALTH_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.

Example

[ceph: root@host01 /]# ceph -s

Reference
Edit online

For more information on installing Ceph, see Installing.

Powering down and rebooting the cluster using the Ceph


Orchestrator
Edit online
You can also use the capabilities of the Ceph Orchestrator to power down and reboot the IBM Storage Ceph cluster. In most cases, the cluster can be powered down from a single system login.

The Ceph Orchestrator supports several operations, such as start, stop, and restart. In some cases, you use these commands together with systemctl commands when powering down or rebooting the cluster.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online
Powering down the IBM Storage Ceph cluster

1. Stop the clients from using the Block Device images and the Ceph Object Gateway on this cluster, as well as any other clients.

2. Log into the Cephadm shell:



Example

[root@host01 ~]# cephadm shell

3. The cluster must be in a healthy state (HEALTH_OK and all PGs active+clean) before proceeding. Run ceph status on a host with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.

Example

[ceph: root@host01 /]# ceph -s

4. If you use the Ceph File System (CephFS), bring down the CephFS cluster:

Syntax

ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false
ceph mds fail FS_NAME:N

Example

[ceph: root@host01 /]# ceph fs set cephfs max_mds 1


[ceph: root@host01 /]# ceph fs fail cephfs
[ceph: root@host01 /]# ceph status
[ceph: root@host01 /]# ceph fs set cephfs joinable false
[ceph: root@host01 /]# ceph mds fail cephfs:1

5. Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the
client keyrings, for example, the Ceph Monitor or OpenStack controller node:

Example

[ceph: root@host01 /]# ceph osd set noout


[ceph: root@host01 /]# ceph osd set norecover
[ceph: root@host01 /]# ceph osd set norebalance
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set nodown
[ceph: root@host01 /]# ceph osd set pause

6. Stop the MDS service.

a. Fetch the MDS service name:

Example

[ceph: root@host01 /]# ceph orch ls --service-type mds

b. Stop the MDS service using the fetched name in the previous step:

Syntax

ceph orch stop SERVICE-NAME

7. Stop the Ceph Object Gateway services. Repeat for each deployed service.

a. Fetch the Ceph Object Gateway service names:

Example

[ceph: root@host01 /]# ceph orch ls --service-type rgw

b. Stop the Ceph Object Gateway service using the fetched name:

Syntax

ceph orch stop SERVICE-NAME

8. Stop the Alertmanager service:

Example

[ceph: root@host01 /]# ceph orch stop alertmanager



9. Stop the node-exporter service which is a part of the monitoring stack:

Example

[ceph: root@host01 /]# ceph orch stop node-exporter

10. Stop the Prometheus service:

Example

[ceph: root@host01 /]# ceph orch stop prometheus

11. Stop the Grafana dashboard service:

Example

[ceph: root@host01 /]# ceph orch stop grafana

12. Stop the crash service:

Example

[ceph: root@host01 /]# ceph orch stop crash

13. Shut down the OSD nodes from the cephadm node, one by one. Repeat this step for all the OSDs in the cluster.

a. Fetch the OSD ID:

Example

[ceph: root@host01 /]# ceph orch ps --daemon-type=osd

b. Shut down the OSD node using the OSD ID you fetched:

Example

[ceph: root@host01 /]# ceph orch daemon stop osd.1


Scheduled to stop osd.1 on host 'host02'

14. Stop the monitors one by one.

a. Identify the hosts hosting the monitors:

Example

[ceph: root@host01 /]# ceph orch ps --daemon-type mon

b. On each host, stop the monitor.

i. Identify the systemctl unit name:

Example

[ceph: root@host01 /]# systemctl list-units ceph-* | grep mon

ii. Stop the service:

Syntax

systemctl stop SERVICE-NAME

15. Shut down all the hosts.

Rebooting the IBM Storage Ceph cluster

1. If network equipment was involved, ensure it is powered ON and stable prior to powering ON any Ceph hosts or nodes.

2. Power ON all the Ceph hosts.

3. Log into the administration node from the Cephadm shell:

Example

[root@host01 ~]# cephadm shell



4. Verify all the services are in running state:

Example

[ceph: root@host01 /]# ceph orch ls

5. Ensure the cluster health is in HEALTH_OK status:

Example

[ceph: root@host01 /]# ceph -s

6. Unset the noout, norecover, norebalance, nobackfill, nodown and pause flags. Run the following on a node with the
client keyrings, for example, the Ceph Monitor or OpenStack controller node:

Example

[ceph: root@host01 /]# ceph osd unset noout


[ceph: root@host01 /]# ceph osd unset norecover
[ceph: root@host01 /]# ceph osd unset norebalance
[ceph: root@host01 /]# ceph osd unset nobackfill
[ceph: root@host01 /]# ceph osd unset nodown
[ceph: root@host01 /]# ceph osd unset pause

7. If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:

Syntax

ceph fs set FS_NAME joinable true

Example

[ceph: root@host01 /]# ceph fs set cephfs joinable true

Verification
Edit online

Verify the cluster is in a healthy state (HEALTH_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.

Example

[ceph: root@host01 /]# ceph -s

Reference
Edit online

For more information, see Installing.

Monitoring a Ceph storage cluster


Edit online
As a storage administrator, you can monitor the overall health of the IBM Storage Ceph cluster, along with monitoring the health of
the individual components of Ceph.

Once you have a running IBM Storage Ceph cluster, you might begin monitoring the storage cluster to ensure that the Ceph Monitor
and Ceph OSD daemons are running, at a high-level. Ceph storage cluster clients connect to a Ceph Monitor and receive the latest
version of the storage cluster map before they can read and write data to the Ceph pools within the storage cluster. So the monitor
cluster must have agreement on the state of the cluster before Ceph clients can read and write data.

Ceph OSDs must peer the placement groups on the primary OSD with the copies of the placement groups on secondary OSDs. If
faults arise, peering will reflect something other than the active + clean state.

High-level monitoring of a Ceph storage cluster



Low-level monitoring of a Ceph storage cluster

High-level monitoring of a Ceph storage cluster


Edit online
As a storage administrator, you can monitor the health of the Ceph daemons to ensure that they are up and running. High level
monitoring also involves checking the storage cluster capacity to ensure that the storage cluster does not exceed its full ratio.
The IBM Storage Ceph Dashboard is the most common way to conduct high-level monitoring. However, you can also use the
command-line interface, the Ceph admin socket or the Ceph API to monitor the storage cluster.

Using the Ceph command interface interactively


Checking the storage cluster health
Watching storage cluster events
How Ceph calculates data usage
Understanding the storage clusters usage stats
Understanding the OSD usage stats
Checking the storage cluster status
Checking the Ceph Monitor status
Using the Ceph administration socket
Understanding the Ceph OSD status

Using the Ceph command interface interactively


Edit online
You can interactively interface with the Ceph storage cluster by using the ceph command-line utility.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To run the ceph utility in interactive mode.

Syntax

podman exec -it ceph-mon-MONITOR_NAME /bin/bash

Replace:

MONITOR_NAME with the name of the Ceph Monitor container, found by running the podman ps command.

Example

[root@host01 ~]# podman exec -it ceph-499829b4-832f-11eb-8d6d-001a4a000635-mon.host01 /bin/bash

This example opens an interactive terminal session on mon.host01, where you can start the Ceph interactive shell.

Checking the storage cluster health



Edit online
After you start the Ceph storage cluster, and before you start reading or writing data, check the storage cluster’s health first.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. You can check on the health of the Ceph storage cluster with the following command:

Example

[ceph: root@host01 /]# ceph health


HEALTH_OK

3. You can check the status of the Ceph storage cluster by running ceph status command:

Example

[ceph: root@host01 /]# ceph status

The output provides the following information:

Cluster ID

Cluster health status

The monitor map epoch and the status of the monitor quorum.

The OSD map epoch and the status of OSDs.

The status of Ceph Managers.

The status of Object Gateways.

The placement group map version.

The number of placement groups and pools.

The notional amount of data stored and the number of objects stored.

The total amount of data stored.

Upon starting the Ceph cluster, you will likely encounter a health warning such as HEALTH_WARN XXX num
placement groups stale. Wait a few moments and check it again. When the storage cluster is ready, ceph
health should return a message such as HEALTH_OK. At that point, it is okay to begin using the cluster.

Watching storage cluster events


Edit online
You can watch events that are happening with the Ceph storage cluster using the command-line interface.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. To watch the cluster’s ongoing events, run the following command:

Example

[ceph: root@host01 /]# ceph -w


cluster:
id: 8c9b0072-67ca-11eb-af06-001a4a0002a0
health: HEALTH_OK

services:
mon: 2 daemons, quorum Ceph5-2,Ceph5-adm (age 3d)
mgr: Ceph5-1.nqikfh(active, since 3w), standbys: Ceph5-adm.meckej
osd: 5 osds: 5 up (since 2d), 5 in (since 8w)
rgw: 2 daemons active (test_realm.test_zone.Ceph5-2.bfdwcn, test_realm.test_zone.Ceph5-
adm.acndrh)

data:
pools: 11 pools, 273 pgs
objects: 459 objects, 32 KiB
usage: 2.6 GiB used, 72 GiB / 75 GiB avail
pgs: 273 active+clean

io:
client: 170 B/s rd, 730 KiB/s wr, 0 op/s rd, 729 op/s wr

2022-11-02 15:45:21.655871 osd.0 [INF] 17.71 deep-scrub ok


2022-11-02 15:45:47.880608 osd.1 [INF] 1.0 scrub ok
2022-11-02 15:45:48.865375 osd.1 [INF] 1.3 scrub ok
2022-11-02 15:45:50.866479 osd.1 [INF] 1.4 scrub ok
2022-11-02 15:45:01.345821 mon.0 [INF] pgmap v41339: 952 pgs: 952 active+clean; 17130 MB data,
115 GB used, 167 GB / 297 GB avail
2022-11-02 15:45:05.718640 mon.0 [INF] pgmap v41340: 952 pgs: 1 active+clean+scrubbing+deep,
951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
2022-11-02 15:45:53.997726 osd.1 [INF] 1.5 scrub ok
2022-11-02 15:45:06.734270 mon.0 [INF] pgmap v41341: 952 pgs: 1 active+clean+scrubbing+deep,
951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
2022-11-02 15:45:15.722456 mon.0 [INF] pgmap v41342: 952 pgs: 952 active+clean; 17130 MB data,
115 GB used, 167 GB / 297 GB avail
2022-11-02 15:46:06.836430 osd.0 [INF] 17.75 deep-scrub ok
2022-11-02 15:45:55.720929 mon.0 [INF] pgmap v41343: 952 pgs: 1 active+clean+scrubbing+deep,
951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail

How Ceph calculates data usage


Edit online
The used value reflects the actual amount of raw storage used. The xxx GB / xxx GB value shows the amount available (the lesser of the two numbers) out of the overall storage capacity of the cluster. The notional number reflects the size of the stored data before it is replicated, cloned, or snapshotted. Therefore, the amount of data actually stored typically exceeds the notional amount stored, because Ceph creates replicas of the data and may also use storage capacity for cloning and snapshotting.
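
As a simple illustration, with a replication factor of three, storing 10 GiB of client data results in a notional usage of 10 GiB, while the raw storage used is approximately 3 * 10 GiB = 30 GiB, plus any additional capacity consumed by clones and snapshots.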



Understanding the storage clusters usage stats
Edit online
To check a cluster’s data usage and data distribution among pools, use the df option. It is similar to the Linux df command.

The SIZE/AVAIL/RAW USED values in the ceph df and ceph status command output differ if some OSDs are marked OUT of the cluster compared to when all OSDs are IN. The SIZE/AVAIL/RAW USED values are calculated from the sum of SIZE (OSD disk size), RAW USE (total used space on disk), and AVAIL of all OSDs that are in the IN state. You can see the total of SIZE/AVAIL/RAW USED for all OSDs in the ceph osd df tree command output.

Example

[ceph: root@host01 /]# ceph df


--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 5 TiB 2.9 TiB 2.1 TiB 2.1 TiB 42.98
TOTAL 5 TiB 2.9 TiB 2.1 TiB 2.1 TiB 42.98

--- POOLS ---


POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 5.3 MiB 3 16 MiB 0 629 GiB
.rgw.root 2 32 1.3 KiB 4 48 KiB 0 629 GiB
default.rgw.log 3 32 3.6 KiB 209 408 KiB 0 629 GiB
default.rgw.control 4 32 0 B 8 0 B 0 629 GiB
default.rgw.meta 5 32 1.7 KiB 10 96 KiB 0 629 GiB
default.rgw.buckets.index 7 32 5.5 MiB 22 17 MiB 0 629 GiB
default.rgw.buckets.data 8 32 807 KiB 3 2.4 MiB 0 629 GiB
default.rgw.buckets.non-ec 9 32 1.0 MiB 1 3.1 MiB 0 629 GiB

source-ecpool-86 11 32 1.2 TiB 391.13k 2.1 TiB 53.49 1.1 TiB

The ceph df detail command gives more details about other pool statistics such as quota objects, quota bytes, used
compression, and under compression.

The RAW STORAGE section of the output provides an overview of the amount of storage the storage cluster manages for data.

CLASS: The class of OSD device.

SIZE: The amount of storage capacity managed by the storage cluster.

For example, if the SIZE is 90 GiB, it is the total size without the replication factor, which is three by default. The total available capacity with the replication factor is 90 GiB / 3 = 30 GiB. Based on the full ratio, which is 0.85 (85%) by default, the maximum available space is 30 GiB * 0.85 = 25.5 GiB.

AVAIL: The amount of free space available in the storage cluster.

For example, if the SIZE is 90 GiB and the USED space is 6 GiB, then the AVAIL space is 84 GiB. The total available space with the replication factor, which is three by default, is 84 GiB / 3 = 28 GiB.

USED: The amount of raw storage consumed by user data.

For example, if the USED value is 100 MiB, that is the raw space consumed after considering the replication factor; the actual amount of user data is about 33 MiB.

RAW USED: The amount of raw storage consumed by user data, internal overhead, or reserved capacity.

% RAW USED: The percentage of RAW USED. Use this number in conjunction with the full ratio and near full ratio
to ensure that you are not reaching the storage cluster’s capacity.

The POOLS section of the output provides a list of pools and the notional usage of each pool. The output from this section DOES NOT
reflect replicas, clones or snapshots. For example, if you store an object with 1 MB of data, the notional usage will be 1 MB, but the
actual usage may be 3 MB or more depending on the number of replicas for example, size = 3, clones and snapshots.

POOL: The name of the pool.

ID: The pool ID.



STORED: The actual amount of data stored by the user in the pool. This value changes based on the raw usage data based on
(k+M)/K values, number of object copies, and the number of objects degraded at the time of pool stats calculation.

OBJECTS: The notional number of objects stored per pool. It is STORED size * replication factor.

USED: The notional amount of data stored in kilobytes, unless the number appends M for megabytes or G for gigabytes.

%USED: The notional percentage of storage used per pool.

MAX AVAIL: An estimate of the notional amount of data that can be written to this pool. It is the amount of data that can be
used before the first OSD becomes full. It considers the projected distribution of data across disks from the CRUSH map and
uses the first OSD to fill up as the target.

In the above example, MAX AVAIL is 153.85 MB without considering the replication factor, which is three by default.

See the Knowledgebase article titled ceph df MAX AVAIL is incorrect for simple replicated pool to calculate the value of MAX
AVAIL.

QUOTA OBJECTS: The number of quota objects.

QUOTA BYTES: The number of bytes in the quota objects.

USED COMPR: The amount of space allocated for compressed data. This includes compressed data plus allocation, replication, and erasure coding overhead.

UNDER COMPR: The amount of data passed through compression and beneficial enough to be stored in a compressed form.

NOTE: The numbers in the POOLS section are notional. They are not inclusive of the number of replicas, snapshots or clones. As a
result, the sum of the USED and %USED amounts will not add up to the RAW USED and %RAW USED amounts in the RAW STORAGE section of the output.

NOTE: The MAX AVAIL value is a complicated function of the replication or erasure code used, the CRUSH rule that maps storage to
devices, the utilization of those devices, and the configured mon_osd_full_ratio.
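
To review the ratios configured on a running cluster, one approach is to read them from the OSD map. This is a minimal sketch; the values shown are the shipped defaults and can differ on your cluster:

Example

[ceph: root@host01 /]# ceph osd dump | grep ratio

full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85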

Reference
Edit online

See How Ceph calculates data usage for details.

See Understanding the OSD usage stats for details.

Understanding the OSD usage stats


Edit online
Use the ceph osd df command to view OSD utilization stats.

Example

[ceph: root@host01 /]# ceph osd df


ID CLASS WEIGHT REWEIGHT SIZE USE DATA OMAP META AVAIL %USE VAR PGS
3 hdd 0.90959 1.00000 931GiB 70.1GiB 69.1GiB 0B 1GiB 861GiB 7.53 2.93 66
4 hdd 0.90959 1.00000 931GiB 1.30GiB 308MiB 0B 1GiB 930GiB 0.14 0.05 59
0 hdd 0.90959 1.00000 931GiB 18.1GiB 17.1GiB 0B 1GiB 913GiB 1.94 0.76 57
MIN/MAX VAR: 0.02/2.98 STDDEV: 2.91

ID: The name of the OSD.

CLASS: The type of devices the OSD uses.

WEIGHT: The weight of the OSD in the CRUSH map.

REWEIGHT: The default reweight value.

SIZE: The overall storage capacity of the OSD.



USE: The OSD capacity.

DATA: The amount of OSD capacity that is used by user data.

OMAP: An estimate of the BlueFS storage that is being used to store object map (omap) data (key-value pairs stored in RocksDB).

META: The BlueFS space allocated, or the value set in the bluestore_bluefs_min parameter, whichever is larger, for internal metadata. It is calculated as the total space allocated in BlueFS minus the estimated omap data size.

AVAIL: The amount of free space available on the OSD.

%USE: The notional percentage of storage used by the OSD.

VAR: The variation above or below average utilization.

PGS: The number of placement groups in the OSD.

MIN/MAX VAR: The minimum and maximum variation across all OSDs.

Reference
Edit online
For more information, see:

How Ceph calculates data usage

CRUSH Weights

Checking the storage cluster status


Edit online
You can check the status of the IBM Storage Ceph cluster from the command-line interface. The status sub command or the -s
argument will display the current status of the storage cluster.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. To check a storage cluster’s status, execute the following:

Example

[ceph: root@host01 /]# ceph status

Or

Example

[ceph: root@host01 /]# ceph -s



3. In interactive mode, type ceph and press Enter:

Example

[ceph: root@host01 /]# ceph


ceph> status
cluster:
id: 499829b4-832f-11eb-8d6d-001a4a000635
health: HEALTH_WARN
1 stray daemon(s) not managed by cephadm
1/3 mons down, quorum host03,host02
too many PGs per OSD (261 > max 250)

services:
mon: 3 daemons, quorum host03,host02 (age 3d), out of quorum: host01
mgr: host01.hdhzwn(active, since 9d), standbys: host05.eobuuv, host06.wquwpj
osd: 12 osds: 11 up (since 2w), 11 in (since 5w)
rgw: 2 daemons active (test_realm.test_zone.host04.hgbvnq,
test_realm.test_zone.host05.yqqilm)

data:
pools: 8 pools, 960 pgs
objects: 414 objects, 1.0 MiB
usage: 5.7 GiB used, 214 GiB / 220 GiB avail
pgs: 960 active+clean

io:
client: 41 KiB/s rd, 0 B/s wr, 41 op/s rd, 27 op/s wr

ceph> health
HEALTH_WARN 1 stray daemon(s) not managed by cephadm; 1/3 mons down, quorum host03,host02; too
many PGs per OSD (261 > max 250)

ceph> mon stat


e3: 3 mons at {host01=[v2:10.74.255.0:3300/0,v1:10.74.255.0:6789/0],host02=
[v2:10.74.249.253:3300/0,v1:10.74.249.253:6789/0],host03=
[v2:10.74.251.164:3300/0,v1:10.74.251.164:6789/0]}, election epoch 6688, leader 1 host03,
quorum 1,2 host03,host02

Checking the Ceph Monitor status


Edit online
If the storage cluster has multiple Ceph Monitors, which is a requirement for a production IBM Storage Ceph cluster, then you can
check the Ceph Monitor quorum status after starting the storage cluster, and before doing any reading or writing of data.

A quorum must be present when multiple Ceph Monitors are running.

Check the Ceph Monitor status periodically to ensure that the monitors are running. If there is a problem with a Ceph Monitor that prevents agreement on the state of the storage cluster, the fault can prevent Ceph clients from reading and writing data.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell



2. To display the Ceph Monitor map, execute the following:

Example

[ceph: root@host01 /]# ceph mon stat

or

Example

[ceph: root@host01 /]# ceph mon dump

3. To check the quorum status for the storage cluster, execute the following:

[ceph: root@host01 /]# ceph quorum_status -f json-pretty

Ceph returns the quorum status.

Example

{
"election_epoch": 6686,
"quorum": [
0,
1,
2
],
"quorum_names": [
"host01",
"host03",
"host02"
],
"quorum_leader_name": "host01",
"quorum_age": 424884,
"features": {
"quorum_con": "4540138297136906239",
"quorum_mon": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
]
},
"monmap": {
"epoch": 3,
"fsid": "499829b4-832f-11eb-8d6d-001a4a000635",
"modified": "2021-03-15T04:51:38.621737Z",
"created": "2021-03-12T12:35:16.911339Z",
"min_mon_release": 16,
"min_mon_release_name": "pacific",
"election_strategy": 1,
"disallowed_leaders: ": "",
"stretch_mode": false,
"features": {
"persistent": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
],
"optional": []
},
"mons": [
{
"rank": 0,
"name": "host01",
"public_addrs": {



"addrvec": [
{
"type": "v2",
"addr": "10.74.255.0:3300",
"nonce": 0
},
{
"type": "v1",
"addr": "10.74.255.0:6789",
"nonce": 0
}
]
},
"addr": "10.74.255.0:6789/0",
"public_addr": "10.74.255.0:6789/0",
"priority": 0,
"weight": 0,
"crush_location": "{}"
},
{
"rank": 1,
"name": "host03",
"public_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.74.251.164:3300",
"nonce": 0
},
{
"type": "v1",
"addr": "10.74.251.164:6789",
"nonce": 0
}
]
},
"addr": "10.74.251.164:6789/0",
"public_addr": "10.74.251.164:6789/0",
"priority": 0,
"weight": 0,
"crush_location": "{}"
},
{
"rank": 2,
"name": "host02",
"public_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.74.249.253:3300",
"nonce": 0
},
{
"type": "v1",
"addr": "10.74.249.253:6789",
"nonce": 0
}
]
},
"addr": "10.74.249.253:6789/0",
"public_addr": "10.74.249.253:6789/0",
"priority": 0,
"weight": 0,
"crush_location": "{}"
}
]
}
}

Using the Ceph administration socket


Edit online



Use the administration socket to interact with a given daemon directly by using a UNIX socket file. For example, the socket enables
you to:

List the Ceph configuration at runtime

Set configuration values at runtime directly without relying on Monitors. This is useful when Monitors are down.

Dump historic operations

Dump the operation priority queue state

Dump operations without rebooting

Dump performance counters

In addition, using the socket is helpful when troubleshooting problems related to Ceph Monitors or OSDs.

If the daemon is not running, the following error is returned when you attempt to use the administration socket:

Error 111: Connection Refused

IMPORTANT:

The administration socket is only available while a daemon is running. When you shut down the daemon properly, the administration
socket is removed. However, if the daemon terminates unexpectedly, the administration socket might persist.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. To use the socket:

Syntax

ceph daemon MONITOR_ID COMMAND

Replace:

MONITOR_ID with the ID of the daemon.

COMMAND with the command to run. Use help to list the available commands for a given daemon.

To view the status of a Ceph Monitor:

Example

[ceph: root@host01 /]# ceph daemon mon.host01 help


{
"add_bootstrap_peer_hint": "add peer address as potential bootstrap peer for cluster
bringup",
"add_bootstrap_peer_hintv": "add peer address vector as potential bootstrap peer for
cluster bringup",
"compact": "cause compaction of monitor's leveldb/rocksdb storage",
"config diff": "dump diff of current config and default config",
"config diff get": "dump diff get <field>: dump diff of current and default config
setting <field>",



"config get": "config get <field>: get the config value",
"config help": "get config setting schema and descriptions",
"config set": "config set <field> <val> [<val> ...]: set a config variable",
"config show": "dump current config settings",
"config unset": "config unset <field>: unset a config variable",
"connection scores dump": "show the scores used in connectivity-based elections",
"connection scores reset": "reset the scores used in connectivity-based elections",
"dump_historic_ops": "dump_historic_ops",
"dump_mempools": "get mempool stats",
"get_command_descriptions": "list available commands",
"git_version": "get git sha1",
"heap": "show heap usage info (available only if compiled with tcmalloc)",
"help": "list available commands",
"injectargs": "inject configuration arguments into running daemon",
"log dump": "dump recent log entries to log file",
"log flush": "flush log entries to log file",
"log reopen": "reopen log file",
"mon_status": "report status of monitors",
"ops": "show the ops currently in flight",
"perf dump": "dump perfcounters value",
"perf histogram dump": "dump perf histogram values",
"perf histogram schema": "dump perf histogram schema",
"perf reset": "perf reset <name>: perf reset all or one perfcounter name",
"perf schema": "dump perfcounters schema",
"quorum enter": "force monitor back into quorum",
"quorum exit": "force monitor out of the quorum",
"sessions": "list existing sessions",
"smart": "Query health metrics for underlying device",
"sync_force": "force sync of and clear monitor store",
"version": "get ceph version"
}

Example

[ceph: root@host01 /]# ceph daemon mon.host01 mon_status

{
"name": "host01",
"rank": 0,
"state": "leader",
"election_epoch": 120,
"quorum": [
0,
1,
2
],
"quorum_age": 206358,
"features": {
"required_con": "2449958747317026820",
"required_mon": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
],
"quorum_con": "4540138297136906239",
"quorum_mon": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
]
},
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": {
"epoch": 3,



"fsid": "81a4597a-b711-11eb-8cb8-001a4a000740",
"modified": "2021-05-18T05:50:17.782128Z",
"created": "2021-05-17T13:13:13.383313Z",
"min_mon_release": 16,
"min_mon_release_name": "pacific",
"election_strategy": 1,
"disallowed_leaders: ": "",
"stretch_mode": false,
"features": {
"persistent": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
],
"optional": []
},
"mons": [
{
"rank": 0,
"name": "host01",
"public_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.74.249.41:3300",
"nonce": 0
},
{
"type": "v1",
"addr": "10.74.249.41:6789",
"nonce": 0
}
]
},
"addr": "10.74.249.41:6789/0",
"public_addr": "10.74.249.41:6789/0",
"priority": 0,
"weight": 0,
"crush_location": "{}"
},
{
"rank": 1,
"name": "host02",
"public_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.74.249.55:3300",
"nonce": 0
},
{
"type": "v1",
"addr": "10.74.249.55:6789",
"nonce": 0
}
]
},
"addr": "10.74.249.55:6789/0",
"public_addr": "10.74.249.55:6789/0",
"priority": 0,
"weight": 0,
"crush_location": "{}"
},
{
"rank": 2,
"name": "host03",
"public_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.74.249.49:3300",



"nonce": 0
},
{
"type": "v1",
"addr": "10.74.249.49:6789",
"nonce": 0
}
]
},
"addr": "10.74.249.49:6789/0",
"public_addr": "10.74.249.49:6789/0",
"priority": 0,
"weight": 0,
"crush_location": "{}"
}
]
},
"feature_map": {
"mon": [
{
"features": "0x3f01cfb9fffdffff",
"release": "luminous",
"num": 1
}
],
"osd": [
{
"features": "0x3f01cfb9fffdffff",
"release": "luminous",
"num": 3
}
]
},
"stretch_mode": false
}

3. Alternatively, specify the Ceph daemon by using its socket file:

Syntax

ceph daemon /var/run/ceph/SOCKET_FILE COMMAND

4. To view the status of a Ceph OSD named osd.2:

Example

[ceph: root@host01 /]# ceph daemon /var/run/ceph/ceph-osd.2.asok status

5. To list all socket files for the Ceph processes:

Example

[ceph: root@host01 /]# ls /var/run/ceph

Reference
Edit online

For more information, see Troubleshooting.

Understanding the Ceph OSD status


Edit online
A Ceph OSD’s status is either in the storage cluster, or out of the storage cluster. It is either up and running, or it is down and not
running. If a Ceph OSD is up, it can be either in the storage cluster, where data can be read and written, or it is out of the storage
cluster. If it was in the storage cluster and recently moved out of the storage cluster, Ceph starts migrating placement groups to
other Ceph OSDs. If a Ceph OSD is out of the storage cluster, CRUSH will not assign placement groups to the Ceph OSD. If a Ceph
OSD is down, it should also be out.



NOTE: If a Ceph OSD is down and in, there is a problem, and the storage cluster will not be in a healthy state.

Figure 1. OSD States

If you execute a command such as ceph health, ceph -s or ceph -w, you might notice that the storage cluster does not always
echo back HEALTH OK. Do not panic. With respect to Ceph OSDs, you can expect that the storage cluster will NOT echo HEALTH OK
in a few expected circumstances:

You have not started the storage cluster yet, and it is not responding.

You have just started or restarted the storage cluster, and it is not ready yet, because the placement groups are getting
created and the Ceph OSDs are in the process of peering.

You just added or removed a Ceph OSD.

You just modified the storage cluster map.

An important aspect of monitoring Ceph OSDs is to ensure that when the storage cluster is up and running that all Ceph OSDs that
are in the storage cluster are up and running, too.

To see if all OSDs are running, execute:

Example

[ceph: root@host01 /]# ceph osd stat

or

Example

[ceph: root@host01 /]# ceph osd dump

The result should tell you the map epoch, eNNNN, the total number of OSDs, x, how many, y, are up, and how many, z, are in:

eNNNN: x osds: y up, z in

If the number of Ceph OSDs that are in the storage cluster is greater than the number of Ceph OSDs that are up, execute the following command to identify the ceph-osd daemons that are not running:

Example

[ceph: root@host01 /]# ceph osd tree

# id weight type name up/down reweight


-1 3 pool default
-3 3 rack mainrack
-2 3 host osd-host
0 1 osd.0 up 1
1 1 osd.1 up 1
2 1 osd.2 up 1

TIP: The ability to search through a well-designed CRUSH hierarchy can help you troubleshoot the storage cluster by identifying the
physical locations faster.

If a Ceph OSD is down, connect to the node and start it. You can use IBM Storage Ceph Console to restart the Ceph OSD daemon, or
you can use the command line.

Syntax



systemctl start CEPH_OSD_SERVICE_ID

Example

[root@host01 ~]# systemctl start ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service

Reference
Edit online

For more information, see Dashboard.

Low-level monitoring of a Ceph storage cluster


Edit online
As a storage administrator, you can monitor the health of an IBM Storage Ceph cluster from a low-level perspective. Low-level
monitoring typically involves ensuring that Ceph OSDs are peering properly. When peering faults occur, placement groups operate in
a degraded state. This degraded state can be the result of many different things, such as hardware failure, a hung or crashed Ceph
daemon, network latency, or a complete site outage.

Monitoring Placement Group Sets


Ceph OSD peering
Placement Group States
Placement Group creating state
Placement group peering state
Placement group active state
Placement Group clean state
Placement Group degraded state
Placement Group recovering state
Back fill state
Placement Group remapped state
Placement Group stale state
Placement Group misplaced state
Placement Group incomplete state
Identifying stuck Placement Groups
Finding an object’s location

Monitoring Placement Group Sets


Edit online
When CRUSH assigns placement groups to Ceph OSDs, it looks at the number of replicas for the pool and assigns the placement
group to Ceph OSDs such that each replica of the placement group gets assigned to a different Ceph OSD. For example, if the pool
requires three replicas of a placement group, CRUSH may assign them to osd.1, osd.2 and osd.3 respectively. CRUSH actually
seeks a pseudo-random placement that will take into account failure domains you set in the CRUSH map, so you will rarely see
placement groups assigned to nearest neighbor Ceph OSDs in a large cluster. We refer to the set of Ceph OSDs that should contain
the replicas of a particular placement group as the Acting Set. In some cases, an OSD in the Acting Set is down or otherwise not able
to service requests for objects in the placement group. When these situations arise, do not panic. Common examples include:

You added or removed an OSD. Then, CRUSH reassigned the placement group to other Ceph OSDs, thereby changing the
composition of the acting set and spawning the migration of data with a "backfill" process.

A Ceph OSD was down, was restarted and is now recovering.

A Ceph OSD in the acting set is down or unable to service requests, and another Ceph OSD has temporarily assumed its duties.

Ceph processes a client request using the Up Set, which is the set of Ceph OSDs that actually handle the requests. In most cases,
the up set and the Acting Set are virtually identical. When they are not, it can indicate that Ceph is migrating data, a Ceph OSD is
recovering, or that there is a problem, that is, Ceph usually echoes a HEALTH WARN state with a "stuck stale" message in such
scenarios.
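
If you see such a warning, a quick way to identify the affected placement groups, as a minimal sketch, is to run:

Example

[ceph: root@host01 /]# ceph health detail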



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. To retrieve a list of placement groups:

Example

[ceph: root@host01 /]# ceph pg dump

3. View which Ceph OSDs are in the Acting Set or in the Up Set for a given placement group:

Syntax

ceph pg map PG_NUM

Example

[ceph: root@host01 /]# ceph pg map 128

NOTE: If the Up Set and Acting Set do not match, this may be an indicator that the storage cluster is rebalancing itself or that there is a potential problem with the storage cluster.

Ceph OSD peering


Edit online
Before you can write data to a placement group, it must be in an active state, and it should be in a clean state. For Ceph to determine the current state of a placement group, the primary OSD of the placement group, that is, the first OSD in the acting set, peers with the secondary and tertiary OSDs to establish agreement on the current state of the placement group. This assumes a pool with three replicas of the PG.

Peering

Figure 1. Peering

Placement Group States


Edit online
If you execute a command such as ceph health, ceph -s or ceph -w, you may notice that the cluster does not always echo back
HEALTH OK. After you check to see if the OSDs are running, you should also check placement group states. You should expect that
the cluster will NOT echo HEALTH OK in a number of placement group peering-related circumstances:

You have just created a pool and placement groups have not peered yet.

The placement groups are recovering.

You have just added an OSD to or removed an OSD from the cluster.

You have just modified the CRUSH map and the placement groups are migrating.

There is inconsistent data in different replicas of a placement group.

Ceph is scrubbing a placement group’s replicas.

Ceph does not have enough storage capacity to complete backfilling operations.

If one of the foregoing circumstances causes Ceph to echo HEALTH WARN, do not panic. In many cases, the cluster will recover on its
own. In some cases, you may need to take action. An important aspect of monitoring placement groups is to ensure that when the
cluster is up and running that all placement groups are active, and preferably in the clean state.

To see the status of all placement groups, execute:

Example

[ceph: root@host01 /]# ceph pg stat

The result should tell you the placement group map version, vNNNNNN, the total number of placement groups, x, and how many
placement groups, y, are in a particular state such as active+clean:

vNNNNNN: x pgs: y active+clean; z bytes data, aa MB used, bb GB / cc GB avail

NOTE: It is common for Ceph to report multiple states for placement groups.

Snapshot Trimming PG States

When snapshots exist, two additional PG states will be reported.

snaptrim : The PGs are currently being trimmed

snaptrim_wait : The PGs are waiting to be trimmed

Example Output:

244 active+clean+snaptrim_wait
32 active+clean+snaptrim

In addition to the placement group states, Ceph will also echo back the amount of data used, aa, the amount of storage capacity
remaining, bb, and the total storage capacity cc for the placement group. These numbers can be important in a few cases:

You are reaching the near full ratio or full ratio.

Your data isn’t getting distributed across the cluster due to an error in the CRUSH configuration.

Placement Group IDs

Placement group IDs consist of the pool number, and not the pool name, followed by a period (.) and the placement group ID—a
hexadecimal number. You can view pool numbers and their names from the output of ceph osd lspools. The default pool names
data, metadata and rbd correspond to pool numbers 0, 1 and 2 respectively. A fully qualified placement group ID has the following
form:

Syntax

POOL_NUM.PG_ID

Example output:

0.1f

To retrieve a list of placement groups:



Example

[ceph: root@host01 /]# ceph pg dump

To format the output in JSON format and save it to a file:

Syntax

ceph pg dump -o FILE_NAME --format=json

Example

[ceph: root@host01 /]# ceph pg dump -o test --format=json

Query a particular placement group:

Syntax

ceph pg POOL_NUM.PG_ID query

Example

[ceph: root@host01 /]# ceph pg 5.fe query


{
"snap_trimq": "[]",
"snap_trimq_len": 0,
"state": "active+clean",
"epoch": 2449,
"up": [
3,
8,
10
],
"acting": [
3,
8,
10
],
"acting_recovery_backfill": [
"3",
"8",
"10"
],
"info": {
"pgid": "5.ff",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"purged_snaps": [],
"history": {
"epoch_created": 114,
"epoch_pool_created": 82,
"last_epoch_started": 2402,
"last_interval_started": 2401,
"last_epoch_clean": 2402,
"last_interval_clean": 2401,
"last_epoch_split": 114,
"last_epoch_marked_full": 0,
"same_up_since": 2401,
"same_interval_since": 2401,
"same_primary_since": 2086,
"last_scrub": "0'0",
"last_scrub_stamp": "2022-10-17T01:32:03.763988+0000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2022-10-17T01:32:03.763988+0000",
"last_clean_scrub_stamp": "2022-10-17T01:32:03.763988+0000",
"prior_readable_until_ub": 0
},
"stats": {
"version": "0'0",
"reported_seq": "2989",
"reported_epoch": "2449",
"state": "active+clean",



"last_fresh": "2022-10-18T05:16:59.401080+0000",
"last_change": "2022-10-17T01:32:03.764162+0000",
"last_active": "2022-10-18T05:16:59.401080+0000",
....

Reference
Edit online

See the Object Storage Daemon (OSD) configuration options for more details on the snapshot trimming settings.

Placement Group creating state


Edit online
When you create a pool, it will create the number of placement groups you specified. Ceph will echo creating when it is creating
one or more placement groups. Once they are created, the OSDs that are part of a placement group’s Acting Set will peer. Once
peering is complete, the placement group status should be active+clean, which means a Ceph client can begin writing to the
placement group.

Figure 1. Creating PGs

Placement group peering state


Edit online
When Ceph is Peering a placement group, Ceph is bringing the OSDs that store the replicas of the placement group into agreement
about the state of the objects and metadata in the placement group. When Ceph completes peering, this means that the OSDs that
store the placement group agree about the current state of the placement group. However, completion of the peering process does
NOT mean that each replica has the latest contents.

Authoritative History

Ceph will NOT acknowledge a write operation to a client, until all OSDs of the acting set persist the write operation. This practice
ensures that at least one member of the acting set will have a record of every acknowledged write operation since the last successful
peering operation.

With an accurate record of each acknowledged write operation, Ceph can construct and disseminate a new authoritative history of
the placement group. A complete, and fully ordered set of operations that, if performed, would bring an OSD’s copy of a placement
group up to date.

Placement group active state


Edit online
Once Ceph completes the peering process, a placement group may become active. The active state means that the data in the
placement group is generally available in the primary placement group and the replicas for read and write operations.



Placement Group clean state
Edit online
When a placement group is in the clean state, the primary OSD and the replica OSDs have successfully peered and there are no
stray replicas for the placement group. Ceph replicated all objects in the placement group the correct number of times.

Placement Group degraded state


Edit online
When a client writes an object to the primary OSD, the primary OSD is responsible for writing the replicas to the replica OSDs. After
the primary OSD writes the object to storage, the placement group will remain in a degraded state until the primary OSD has
received an acknowledgement from the replica OSDs that Ceph created the replica objects successfully.

The reason a placement group can be active+degraded is that an OSD may be active even though it doesn’t hold all of the
objects yet. If an OSD goes down, Ceph marks each placement group assigned to the OSD as degraded. The Ceph OSDs must peer
again when the Ceph OSD comes back online. However, a client can still write a new object to a degraded placement group if it is
active.

If an OSD is down and the degraded condition persists, Ceph may mark the down OSD as out of the cluster and remap the data
from the down OSD to another OSD. The time between being marked down and being marked out is controlled by
mon_osd_down_out_interval, which is set to 600 seconds by default.
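
As a minimal sketch, you can inspect or adjust this interval at runtime with the ceph config command; the 900-second value is only an illustration:

Example

[ceph: root@host01 /]# ceph config get mon mon_osd_down_out_interval
[ceph: root@host01 /]# ceph config set mon mon_osd_down_out_interval 900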

A placement group can also be degraded, because Ceph cannot find one or more objects that Ceph thinks should be in the
placement group. While you cannot read or write to unfound objects, you can still access all of the other objects in the degraded
placement group.

For example, if there are nine OSDs in a three-way replica pool and OSD number 9 goes down, the PGs assigned to OSD 9 go into a degraded state. If OSD 9 does not recover, it goes out of the storage cluster and the storage cluster rebalances. In that scenario, the PGs are degraded and then recover to an active state.
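
To list the placement groups that are currently stuck in a degraded state, a minimal sketch is:

Example

[ceph: root@host01 /]# ceph pg dump_stuck degraded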

Placement Group recovering state


Edit online
Ceph was designed for fault-tolerance at a scale where hardware and software problems are ongoing. When an OSD goes down, its
contents may fall behind the current state of other replicas in the placement groups. When the OSD is back up, the contents of the
placement groups must be updated to reflect the current state. During that time period, the OSD may reflect a recovering state.

Recovery is not always trivial, because a hardware failure might cause a cascading failure of multiple Ceph OSDs. For example, a
network switch for a rack or cabinet may fail, which can cause the OSDs of a number of host machines to fall behind the current state
of the storage cluster. Each one of the OSDs must recover once the fault is resolved.

Ceph provides a number of settings to balance the resource contention between new service requests and the need to recover data
objects and restore the placement groups to the current state. The osd recovery delay start setting allows an OSD to restart,
re-peer and even process some replay requests before starting the recovery process. The osd recovery threads setting limits
the number of threads for the recovery process, by default one thread. The osd recovery thread timeout sets a thread
timeout, because multiple Ceph OSDs can fail, restart and re-peer at staggered rates. The osd recovery max active setting
limits the number of recovery requests a Ceph OSD works on simultaneously to prevent the Ceph OSD from failing to serve requests.
The osd recovery max chunk setting limits the size of the recovered data chunks to prevent network congestion.
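
As an illustrative sketch only, you can inspect the recovery settings that are in effect on a given OSD and adjust one of them with the
ceph config commands; on the command line the option names use underscores, the osd.0 daemon name is just an example, and the
exact set of recovery options can vary by release:

Example

[ceph: root@host01 /]# ceph config show osd.0 | grep osd_recovery
[ceph: root@host01 /]# ceph config set osd osd_recovery_max_active 1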

Back fill state


Edit online
When a new Ceph OSD joins the storage cluster, CRUSH will reassign placement groups from OSDs in the cluster to the newly added
Ceph OSD. Forcing the new OSD to accept the reassigned placement groups immediately can put excessive load on the new Ceph
OSD. Backfilling the OSD with the placement groups allows this process to begin in the background. Once backfilling is complete, the
new OSD will begin serving requests when it is ready.

During the backfill operations, you might see one of several states:

backfill_wait indicates that a backfill operation is pending, but isn’t underway yet

backfill indicates that a backfill operation is underway

backfill_too_full indicates that a backfill operation was requested, but couldn’t be completed due to insufficient storage
capacity.

When a placement group cannot be backfilled, it can be considered incomplete.

Ceph provides a number of settings to manage the load spike associated with reassigning placement groups to a Ceph OSD,
especially a new Ceph OSD. By default, osd_max_backfills sets the maximum number of concurrent backfills to or from a Ceph
OSD to 10. The osd backfill full ratio enables a Ceph OSD to refuse a backfill request if the OSD is approaching its full ratio,
by default 85%. If an OSD refuses a backfill request, the osd backfill retry interval enables an OSD to retry the request, by
default after 10 seconds. OSDs can also set osd backfill scan min and osd backfill scan max to manage scan intervals,
by default 64 and 512.
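
For example, to reduce the impact of backfill on client I/O during a busy period, you might lower osd_max_backfills and later remove
the override so the default applies again; this is only a sketch and the value shown is not a recommendation:

Example

[ceph: root@host01 /]# ceph config set osd osd_max_backfills 1
[ceph: root@host01 /]# ceph config rm osd osd_max_backfills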

For some workloads, it is beneficial to avoid regular recovery entirely and use backfill instead. Since backfilling occurs in the
background, this allows I/O to proceed on the objects in the OSD. You can force a backfill rather than a recovery by setting the
osd_min_pg_log_entries option to 1, and setting the osd_max_pg_log_entries option to 2. Contact your IBM Support
account team for details on when this situation is appropriate for your workload.
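
A hedged sketch of that change with the ceph config commands follows; as noted above, check with your IBM Support account team
before applying it to a production workload:

Example

[ceph: root@host01 /]# ceph config set osd osd_min_pg_log_entries 1
[ceph: root@host01 /]# ceph config set osd osd_max_pg_log_entries 2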

Placement Group remapped state


Edit online
When the Acting Set that services a placement group changes, the data migrates from the old acting set to the new acting set. It may
take some time for a new primary OSD to service requests. So it may ask the old primary to continue to service requests until the
placement group migration is complete. Once data migration completes, the mapping uses the primary OSD of the new acting set.

Placement Group stale state


Edit online
While Ceph uses heartbeats to ensure that hosts and daemons are running, the ceph-osd daemons may also get into a stuck state
where they are not reporting statistics in a timely manner, for example, because of a temporary network fault. By default, OSD
daemons report their placement group, up through, boot, and failure statistics every half second, that is, 0.5 seconds, which is more
frequent than the heartbeat thresholds. If the primary OSD of a placement group’s acting set fails to report to the monitor, or if other
OSDs have reported the primary OSD down, the monitors will mark the placement group stale.

When you start the storage cluster, it is common to see the stale state until the peering process completes. After the storage
cluster has been running for a while, seeing placement groups in the stale state indicates that the primary OSD for those placement
groups is down or not reporting placement group statistics to the monitor.

Placement Group misplaced state


Edit online
There are some temporary backfilling scenarios where a PG gets mapped temporarily to an OSD. When that temporary situation no
longer applies, the PGs might still reside in the temporary location and not in the proper location. In that case, they are said to be
misplaced. That is because the correct number of extra copies actually exist, but one or more copies are in the wrong place.

For example, there are 3 OSDs: 0,1,2 and all PGs map to some permutation of those three. If you add another OSD (OSD 3), some
PGs will now map to OSD 3 instead of one of the others. However, until OSD 3 is backfilled, the PG will have a temporary mapping
allowing it to continue to serve I/O from the old mapping. During that time, the PG is misplaced, because it has a temporary
mapping, but not degraded, since there are three copies.

Example

pg 1.5: up=acting: [0,1,2]

ADD_OSD_3
pg 1.5: up: [0,3,1] acting: [0,1,2]

Here [0,3,1] is a temporary mapping, so the up set is not equal to the acting set and the PG is misplaced, but not degraded, since there
are still three copies.

Example

pg 1.5: up=acting: [0,3,1]

OSD 3 is now backfilled and the temporary mapping is removed, so the PG is neither degraded nor misplaced.

Placement Group incomplete state


Edit online
A PG goes into an incomplete state when there is incomplete content and peering fails, that is, when there are no complete OSDs
which are current enough to perform recovery.

Let's say OSD 1, 2, and 3 are the acting OSD set and it switches to OSD 1, 4, and 3, then osd.1 will request a temporary acting set of
OSD 1, 2, and 3 while backfilling OSD 4. During this time, if OSD 1, 2, and 3 all go down, osd.4 will be the only one left, and it might
not have fully backfilled all the data. At this time, the PG will go incomplete, indicating that there are no complete OSDs which are
current enough to perform recovery.

Alternately, if osd.4 is not involved and the acting set is simply OSD 1, 2, and 3 when OSD 1, 2, and 3 go down, the PG would likely
go stale, indicating that the monitors have not heard anything about that PG since the acting set changed, because there are no
OSDs left to notify the new OSDs.

Identifying stuck Placement Groups


Edit online
A placement group is not necessarily problematic just because it is not in an active+clean state. Generally, Ceph’s ability to self-
repair might not be working when placement groups get stuck. The stuck states include:

Unclean: Placement groups contain objects that are not replicated the desired number of times. They should be recovering.

Inactive: Placement groups cannot process reads or writes because they are waiting for an OSD with the most up-to-date
data to come back up.

Stale: Placement groups are in an unknown state, because the OSDs that host them have not reported to the monitor cluster
in a while, and can be configured with the mon osd report timeout setting.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online
To identify stuck placement groups, execute the following:

Syntax

ceph pg dump_stuck {inactive|unclean|stale|undersized|degraded
[inactive|unclean|stale|undersized|degraded...]} {<int>}

Example

[ceph: root@host01 /]# ceph pg dump_stuck stale


OK

Finding an object’s location


Edit online
The Ceph client retrieves the latest cluster map and the CRUSH algorithm calculates how to map the object to a placement group,
and then calculates how to assign the placement group to an OSD dynamically.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online
To find the object location, all you need is the object name and the pool name:

Syntax

ceph osd map POOL_NAME OBJECT_NAME

Example

[ceph: root@host01 /]# ceph osd map mypool myobject

Stretch clusters for Ceph storage


Edit online
As a storage administrator, you can configure stretch clusters by entering stretch mode with 2-site clusters.

IBM Storage Ceph can withstand the loss of Ceph OSDs when its network and cluster components are equally reliable, with failures
randomly distributed across the CRUSH map. If a number of OSDs are shut down, the remaining OSDs and monitors still manage to
operate.

However, this might not be the best solution for some stretched cluster configurations where a significant part of the Ceph cluster
can use only a single network component. The example is a single cluster located in multiple data centers, for which the user wants
to sustain a loss of a full data center.

The standard configuration is with two data centers. Other configurations are in clouds or availability zones. Each site holds two
copies of the data, therefore, the replication size is four. The third site should have a tiebreaker monitor; this can be a virtual machine,
or it can have higher latency compared to the main sites. This monitor chooses one of the sites to restore data if the network
connection fails and both data centers remain active.

IMPORTANT: The standard Ceph configuration survives many failures of the network or data centers and it never compromises data
consistency. If you restore enough Ceph servers following a failure, it recovers. Ceph maintains availability if you lose a data center,
as long as the cluster can still form a quorum of monitors and still has all the data available, with enough copies to satisfy the pools’
min_size, or CRUSH rules that replicate again to meet the size.

NOTE: There are no additional steps to power down a stretch cluster. See Powering down and rebooting IBM Storage Ceph cluster

Stretch cluster failures

IBM Storage Ceph never compromises on data integrity and consistency. If there is a network failure or a loss of nodes and the
services can still be restored, Ceph returns to normal functionality on its own.

However, there are situations where you lose data availability even if you have enough servers available to meet Ceph’s consistency
and sizing constraints, or where you unexpectedly do not meet the constraints.

The first important type of failure is caused by inconsistent networks. If there is a network split, Ceph might be unable to mark an OSD
as down to remove it from the acting placement group (PG) sets despite the primary OSD being unable to replicate data. When this
happens, I/O is not permitted, because Ceph cannot meet its durability guarantees.

The second important category of failures is when it appears that you have data replicated across data centers, but the constraints are
not sufficient to guarantee this. For example, you might have data centers A and B, and the CRUSH rule targets three copies and
places a copy in each data center with a min_size of 2. The PG might go active with two copies in site A and no copies in site B,
which means that if you lose site A, you lose the data and Ceph cannot operate on it. This situation is difficult to avoid with standard
CRUSH rules.

Stretch mode for a storage cluster


Setting the crush location for the daemons
Entering the stretch mode
Adding OSD hosts in stretch mode

Stretch mode for a storage cluster


Edit online
To configure stretch clusters, you must enter the stretch mode. When stretch mode is enabled, the Ceph OSDs only take PGs as
active when they peer across data centers, or whichever other CRUSH bucket type you specified, assuming both are active. Pools
increase in size from the default three to four, with two copies on each site.

In stretch mode, Ceph OSDs are only allowed to connect to monitors within the same data center. New monitors are not allowed to
join the cluster without a specified location.

If all the OSDs and monitors from a data center become inaccessible at once, the surviving data center will enter a degraded stretch
mode. This issues a warning, reduces the min_size to 1, and allows the cluster to reach an active state with the data from the
remaining site.

NOTE: The degraded state also triggers warnings that the pools are too small, because the pool size does not get changed.
However, a special stretch mode flag prevents the OSDs from creating extra copies in the remaining data center, therefore it still
keeps 2 copies.

When the missing data center becomes accessible again, the cluster enters recovery stretch mode. This changes the warning and
allows peering, but still requires only the OSDs from the data center, which was up the whole time.

When all PGs are in a known state and are not degraded or incomplete, the cluster goes back to the regular stretch mode, ends the
warning, and restores min_size to its starting value 2. The cluster again requires both sites to peer, not only the site that stayed up
the whole time, therefore you can fail over to the other site, if necessary.
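
One way to observe these transitions, shown here only as a sketch that anticipates the verification step later in this section, is to
filter the OSD map dump for the stretch mode fields:

Example

[ceph: root@host01 /]# ceph osd dump | grep stretch

stretch_mode_enabled true
stretch_bucket_count 2
degraded_stretch_mode 0
recovering_stretch_mode 0
stretch_mode_bucket 8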

Stretch mode limitations

It is not possible to exit from stretch mode once it is entered.

You cannot use erasure-coded pools with clusters in stretch mode. You can neither enter the stretch mode with erasure-
coded pools, nor create an erasure-coded pool when the stretch mode is active.

Stretch mode with no more than two sites is supported.

The weights of the two sites should be the same. If they are not, you receive the following error:

Example

[ceph: root@host01 /]# ceph mon enable_stretch_mode host05 stretch_rule datacenter

Error EINVAL: the 2 datacenter instances in the cluster have differing weights 25947 and
15728 but stretch mode currently requires they be the same!



To achieve the same weights on both sites, the Ceph OSDs deployed in the two sites should be of equal size, that is, the storage
capacity in the first site is equivalent to the storage capacity in the second site.

While it is not enforced, you should run two Ceph monitors on each site and a tiebreaker, for a total of five. This is because
OSDs can only connect to monitors in their own site when in stretch mode.

You have to create your own CRUSH rule, which provides two copies on each site, which totals to four on both sites.

You cannot enable stretch mode if you have existing pools with non-default size or min_size.

Because the cluster runs with min_size 1 when degraded, you should only use stretch mode with all-flash OSDs. This
minimizes the time needed to recover once connectivity is restored, and minimizes the potential for data loss.

Reference
Edit online

Troubleshooting clusters in stretch mode

Setting the crush location for the daemons


Edit online
Before you enter the stretch mode, you need to prepare the cluster by setting the crush location to the daemons in the cluster. There
are two ways to do this:

Bootstrap the cluster through a service configuration file, where the locations are added to the hosts as part of deployment.

Set the locations manually through ceph osd crush add-bucket and ceph osd crush move commands after the
cluster is deployed.

Method 1: Bootstrapping the cluster


Edit online

Prerequisites
Edit online

Root-level access to the nodes.

Procedure
Edit online

1. If you are bootstrapping your new storage cluster, you can create the service configuration .yaml file that adds the nodes to
the IBM Storage Ceph cluster and also sets specific labels for where the services should run:

Example

service_type: host
addr: host01
hostname: host01
location:
root: default
datacenter: DC1
labels:
- osd
- mon
- mgr
---
service_type: host
addr: host02
hostname: host02
location:
datacenter: DC1
labels:
- osd
- mon
---
service_type: host
addr: host03
hostname: host03
location:
datacenter: DC1
labels:
- osd
- mds
- rgw
---
service_type: host
addr: host04
hostname: host04
location:
root: default
datacenter: DC2
labels:
- osd
- mon
- mgr
---
service_type: host
addr: host05
hostname: host05
location:
datacenter: DC2
labels:
- osd
- mon
---
service_type: host
addr: host06
hostname: host06
location:
datacenter: DC2
labels:
- osd
- mds
- rgw
---
service_type: host
addr: host07
hostname: host07
labels:
- mon
---
service_type: mon
placement:
label: "mon"
---
service_id: cephfs
placement:
label: "mds"
---
service_type: mgr
service_name: mgr
placement:
label: "mgr"
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
label: "osd"
spec:
data_devices:
all: true
---
service_type: rgw
service_id: objectgw
service_name: rgw.objectgw
placement:
count: 2
label: "rgw"
spec:
rgw_frontend_port: 8080

2. Bootstrap the storage cluster with the --apply-spec option:

Syntax

cephadm bootstrap --apply-spec CONFIGURATION_FILE_NAME --mon-ip MONITOR_IP_ADDRESS --ssh-private-key PRIVATE_KEY --ssh-public-key PUBLIC_KEY --registry-url REGISTRY_URL --registry-username USER_NAME --registry-password PASSWORD

Example

[root@host01 ~]# cephadm bootstrap --apply-spec initial-config.yaml --mon-ip 10.10.128.68 --ssh-private-key /home/ceph/.ssh/id_rsa --ssh-public-key /home/ceph/.ssh/id_rsa.pub --registry-url registry.redhat.io --registry-username myuser1 --registry-password mypassword1

IMPORTANT: You can use different command options with the cephadm bootstrap command. However, always include the
--apply-spec option to use the service configuration file and configure the host locations.

Reference
Edit online

For more information about Ceph bootstrapping and different cephadm bootstrap command options, see Bootstrapping a
new storage cluster

Method 2: Setting the locations after the deployment


Edit online

Prerequisites
Edit online

Root-level access to the nodes.

Procedure
Edit online

1. Add two buckets to which you plan to set the location of your non-tiebreaker monitors to the CRUSH map, specifying the
bucket type as datacenter:

Syntax

ceph osd crush add-bucket BUCKET_NAME BUCKET_TYPE

Example

[ceph: root@host01 /]# ceph osd crush add-bucket DC1 datacenter


[ceph: root@host01 /]# ceph osd crush add-bucket DC2 datacenter

2. Move the buckets under root=default:

Syntax

ceph osd crush move BUCKET_NAME root=default

Example

[ceph: root@host01 /]# ceph osd crush move DC1 root=default


[ceph: root@host01 /]# ceph osd crush move DC2 root=default



3. Move the OSD hosts according to the required CRUSH placement:

Syntax

ceph osd crush move HOST datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph osd crush move host01 datacenter=DC1

Entering the stretch mode


Edit online
The new stretch mode is designed to handle two sites. There is a lower risk of component availability outages with 2-site clusters.

Prerequisites
Edit online

Root-level access to the nodes.

The crush location is set to the hosts.

Procedure
Edit online

1. Set the location of each monitor, matching your CRUSH map:

Syntax

ceph mon set_location HOST datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph mon set_location host01 datacenter=DC1


[ceph: root@host01 /]# ceph mon set_location host02 datacenter=DC1
[ceph: root@host01 /]# ceph mon set_location host04 datacenter=DC2
[ceph: root@host01 /]# ceph mon set_location host05 datacenter=DC2
[ceph: root@host01 /]# ceph mon set_location host07 datacenter=DC3

2. Generate a CRUSH rule which places two copies on each data center:

Syntax

ceph osd getcrushmap > COMPILED_CRUSHMAP_FILENAME
crushtool -d COMPILED_CRUSHMAP_FILENAME -o DECOMPILED_CRUSHMAP_FILENAME

Example

[ceph: root@host01 /]# ceph osd getcrushmap > crush.map.bin


[ceph: root@host01 /]# crushtool -d crush.map.bin -o crush.map.txt

3. Edit the decompiled CRUSH map file to add a new rule:

Example

rule stretch_rule {
id 1
type replicated
min_size 1
max_size 10
step take DC1
step chooseleaf firstn 2 type host
step emit
step take DC2
step chooseleaf firstn 2 type host
step emit
}



The rule id has to be unique. In this example, there is only one more rule with id 0, so id 1 is used; however, you might need to use a
different rule ID depending on the number of existing rules. In this example, there are two data center buckets named DC1 and DC2.

NOTE: This rule makes the cluster have read-affinity towards data center DC1. Therefore, all the reads or writes happen
through Ceph OSDs placed in DC1. If this is not desirable, and reads or writes are to be distributed evenly across the zones,
the crush rule is the following:

Example

rule stretch_rule {
id 1
type replicated
min_size 1
max_size 10
step take default
step choose firstn 0 type datacenter
step chooseleaf firstn 2 type host
step emit
}

In this rule, the data center is selected randomly and automatically. See CRUSH rules for more information on firstn and
indep options.

4. Inject the CRUSH map to make the rule available to the cluster:

Syntax

crushtool -c DECOMPILED_CRUSHMAP_FILENAME -o COMPILED_CRUSHMAP_FILENAME


ceph osd setcrushmap -i COMPILED_CRUSHMAP_FILENAME

Example

[ceph: root@host01 /]# crushtool -c crush.map.txt -o crush2.map.bin


[ceph: root@host01 /]# ceph osd setcrushmap -i crush2.map.bin

5. If you do not run the monitors in connectivity mode, set the election strategy to connectivity:

Example

[ceph: root@host01 /]# ceph mon set election_strategy connectivity

6. Enter stretch mode by setting the location of the tiebreaker monitor to split across the data centers:

Syntax

ceph mon set_location HOST datacenter=DATACENTER


ceph mon enable_stretch_mode HOST stretch_rule datacenter

Example

[ceph: root@host01 /]# ceph mon set_location host07 datacenter=DC3


[ceph: root@host01 /]# ceph mon enable_stretch_mode host07 stretch_rule datacenter

In this example the monitor mon.host07 is the tiebreaker.

IMPORTANT: The location of the tiebreaker monitor should differ from the data centers to which you previously set the non-
tiebreaker monitors. In the example above, it is data center DC3.

IMPORTANT: Do not add this data center to the CRUSH map as it results in the following error when you try to enter stretch
mode: Error EINVAL: there are 3 datacenters in the cluster but stretch mode currently only works with 2!

NOTE: If you are writing your own tooling for deploying Ceph, you can use a new --set-crush-location option when
booting monitors, instead of running the ceph mon set_location command. This option accepts only a single
bucket=location pair, for example ceph-mon --set-crush-location 'datacenter=DC1', which must match the
bucket type you specified when running the enable_stretch_mode command.

7. Verify that the stretch mode is enabled successfully:

Example

[ceph: root@host01 /]# ceph osd dump

epoch 361
fsid 1234ab78-1234-11ed-b1b1-de456ef0a89d
created 2023-01-16T05:47:28.4827170000
modified 2023-01-17T17:36:50.0661830000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 31
full_ratio 0.95
backfillfull_ratio 0.92
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release quincy
stretch_mode_enabled true
stretch_bucket_count 2
degraded_stretch_mode 0
recovering_stretch_mode 0
stretch_mode_bucket 8

The stretch_mode_enabled should be set to true. You can also see the number of stretch buckets, stretch mode buckets,
and if the stretch mode is degraded or recovering.

8. Verify that the monitors are in appropriate locations:

Example

[ceph: root@host01 /]# ceph mon dump

epoch 19
fsid 1234ab78-1234-11ed-b1b1-de456ef0a89d
last_changed 2023-01-17T04:12:05.7094750000
created 2023-01-16T05:47:25.6316840000
min_mon_release 16 (pacific)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host07
disallowed_leaders host07
0: [v2:132.224.169.63:3300/0,v1:132.224.169.63:6789/0] mon.host07; crush_location
{datacenter=DC3}
1: [v2:220.141.179.34:3300/0,v1:220.141.179.34:6789/0] mon.host04; crush_location
{datacenter=DC2}
2: [v2:40.90.220.224:3300/0,v1:40.90.220.224:6789/0] mon.host01; crush_location
{datacenter=DC1}
3: [v2:60.140.141.144:3300/0,v1:60.140.141.144:6789/0] mon.host02; crush_location
{datacenter=DC1}
4: [v2:186.184.61.92:3300/0,v1:186.184.61.92:6789/0] mon.host05; crush_location
{datacenter=DC2}
dumped monmap epoch 19

You can also see which monitor is the tiebreaker, and the monitor election strategy.

Reference
Edit online

Configuring monitor election strategy

Adding OSD hosts in stretch mode


Edit online
You can add Ceph OSDs in the stretch mode. The procedure is similar to the addition of the OSD hosts on a cluster where stretch
mode is not enabled.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Stretch mode is enabled on the cluster.

Root-level access to the nodes.

Procedure
Edit online

1. List the available devices to deploy OSDs:

Syntax

ceph orch device ls [--hostname=HOST_1 HOST_2] [--wide] [--refresh]

Example

[ceph: root@host01 /]# ceph orch device ls

2. Deploy the OSDs on specific hosts or on all the available devices:

Create an OSD from a specific device on a specific host:

Syntax

ceph orch daemon add osd HOST:DEVICE_PATH

Example

[ceph: root@host01 /]# ceph orch daemon add osd host03:/dev/sdb

Deploy OSDs on any available and unused devices:

IMPORTANT: This command creates collocated WAL and DB devices. If you want to create non-collocated devices, do
not use this command.

Example

[ceph: root@host01 /]# ceph orch apply osd --all-available-devices

3. Move the OSD hosts under the CRUSH bucket:

Syntax

ceph osd crush move HOST datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph osd crush move host03 datacenter=DC1


[ceph: root@host01 /]# ceph osd crush move host06 datacenter=DC2

NOTE: Ensure you add the same topology nodes on both sites. Issues might arise if hosts are added only on one site.

Reference
Edit online

Adding OSDs for more information about the addition of Ceph OSDs.

Override Ceph behavior


Edit online
As a storage administrator, you need to understand how to use overrides for the IBM Storage Ceph cluster to change Ceph options
during runtime.

Setting and unsetting Ceph override options

Ceph override use cases

Setting and unsetting Ceph override options


Edit online
You can set and unset Ceph options to override Ceph’s default behavior.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To override Ceph’s default behavior, use the ceph osd set command and the behavior you wish to override:

Syntax

ceph osd set FLAG

Once you set the behavior, ceph health will reflect the override(s) that you have set for the cluster.

Example

[ceph: root@host01 /]# ceph osd set noout

2. To cease overriding Ceph’s default behavior, use the ceph osd unset command and the override you wish to cease.

Syntax

ceph osd unset FLAG

Example

[ceph: root@host01 /]# ceph osd unset noout

Flag Description
noin Prevents OSDs from being treated as in the cluster.
noout Prevents OSDs from being treated as out of the cluster.
noup Prevents OSDs from being treated as up and running.
nodown Prevents OSDs from being treated as down.
full Makes a cluster appear to have reached its full_ratio, and thereby prevents write operations.
pause Ceph will stop processing read and write operations, but will not affect OSD in, out, up or down statuses.
nobackfill Ceph will prevent new backfill operations.
norebalance Ceph will prevent new rebalancing operations.
norecover Ceph will prevent new recovery operations.
noscrub Ceph will prevent new scrubbing operations.
nodeep-scrub Ceph will prevent new deep scrubbing operations.
notieragent Ceph will disable the process that is looking for cold/dirty objects to flush and evict.

Ceph override use cases


Edit online

noin: Commonly used with noout to address flapping OSDs.

noout: If the mon osd report timeout is exceeded and an OSD has not reported to the monitor, the OSD will get marked
out. If this happens erroneously, you can set noout to prevent the OSD(s) from getting marked out while you troubleshoot
the issue.

noup: Commonly used with nodown to address flapping OSDs.

nodown: Networking issues may interrupt Ceph heartbeat processes, and an OSD may be up but still get marked down. You
can set nodown to prevent OSDs from getting marked down while troubleshooting the issue.

full: If a cluster is reaching its full_ratio, you can pre-emptively set the cluster to full and expand capacity.

NOTE: Setting the cluster to full will prevent write operations.

pause: If you need to troubleshoot a running Ceph cluster without clients reading and writing data, you can set the cluster to
pause to prevent client operations.

nobackfill: If you need to take an OSD or node down temporarily, for example, upgrading daemons, you can set
nobackfill so that Ceph will not backfill while the OSD is down.

norecover: If you need to replace an OSD disk and don’t want the PGs to recover to another OSD while you are hotswapping
disks, you can set norecover to prevent the other OSDs from copying a new set of PGs to other OSDs.

noscrub and nodeep-scrub: If you want to prevent scrubbing, for example, to reduce overhead during high loads,
recovery, backfilling, and rebalancing, you can set noscrub and/or nodeep-scrub to prevent the cluster from scrubbing
OSDs.

notieragent: If you want to stop the tier agent process from finding cold objects to flush to the backing storage tier, you
may set notieragent.
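
As an illustrative pattern only, and not a mandated procedure, the noout, nobackfill, and norecover flags described above are often
set together before planned maintenance on a node and removed afterwards; ceph health then reports the flags while they are set:

Example

[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set norecover
[ceph: root@host01 /]# ceph health
HEALTH_WARN noout,nobackfill,norecover flag(s) set
[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset nobackfill
[ceph: root@host01 /]# ceph osd unset norecover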

Ceph user management


Edit online
As a storage administrator, you can manage the Ceph user base by providing authentication, and access control to objects in the IBM
Storage Ceph cluster.

IMPORTANT: Cephadm manages the client keyrings for the IBM Storage Ceph cluster as long as the clients are within the scope of
Cephadm. Users should not modify the keyrings that are managed by Cephadm, except for troubleshooting purposes.

Ceph user management background


Managing Ceph users

Ceph user management background


Edit online
When Ceph runs with authentication and authorization enabled, you must specify a user name. If you do not specify a user name,
Ceph will use the client.admin administrative user as the default user name.

Alternatively, you may use the CEPH_ARGS environment variable to avoid re-entry of the user name and secret.
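
For example, assuming a client.user1 keyring at a path of your choosing (the path shown here is hypothetical), you could export the
arguments once and then run Ceph commands without repeating them:

Example

[root@host01 ~]# export CEPH_ARGS="--id user1 --keyring /etc/ceph/ceph.client.user1.keyring"
[root@host01 ~]# ceph health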

Irrespective of the type of Ceph client, for example, block device, object store, file system, native API, or the Ceph command line,
Ceph stores all data as objects within pools. Ceph users must have access to pools in order to read and write data. Additionally,
administrative Ceph users must have permissions to execute Ceph’s administrative commands.

The following concepts can help you understand Ceph user management.

Storage Cluster Users

A user of the IBM Storage Ceph cluster is either an individual or an application. Creating users allows you to control who can
access the storage cluster, its pools, and the data within those pools.

Ceph has the notion of a type of user. For the purposes of user management, the type will always be client. Ceph identifies users
in period (.) delimited form consisting of the user type and the user ID. For example, TYPE.ID, client.admin, or client.user1.
The reason for user typing is that Ceph Monitors, and OSDs also use the Cephx protocol, but they are not clients. Distinguishing the
user type helps to distinguish between client users and other users—streamlining access control, user monitoring and traceability.

Sometimes Ceph’s user type may seem confusing, because the Ceph command line allows you to specify a user with or without the
type, depending upon the command line usage. If you specify --user or --id, you can omit the type. So client.user1 can be
entered simply as user1. If you specify --name or -n, you must specify the type and name, such as client.user1. IBM
recommends using the type and name as a best practice wherever possible.
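
For instance, the following two invocations, shown only as an illustration with a hypothetical user1 client, identify the same user;
the first omits the type with --id and the second supplies it with --name:

Example

[ceph: root@host01 /]# ceph --id user1 health
[ceph: root@host01 /]# ceph --name client.user1 health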

NOTE: An IBM Storage Ceph cluster user is not the same as a Ceph Object Gateway user. The object gateway uses an IBM Storage
Ceph cluster user to communicate between the gateway daemon and the storage cluster, but the gateway has its own user
management functionality for its end users.

Authorization capabilities

Ceph uses the term "capabilities" (caps) to describe authorizing an authenticated user to exercise the functionality of the Ceph
Monitors and OSDs. Capabilities can also restrict access to data within a pool or a namespace within a pool. A Ceph administrative
user sets a user’s capabilities when creating or updating a user. Capability syntax follows the form:

Syntax

DAEMON_TYPE 'allow CAPABILITY' [DAEMON_TYPE 'allow CAPABILITY']

Monitor Caps: Monitor capabilities include r, w, x, allow profile CAP, and profile rbd.

Example

mon 'allow rwx'


mon 'allow profile osd'

OSD Caps: OSD capabilities include r, w, x, class-read, class-write, profile osd, profile rbd, and profile
rbd-read-only. Additionally, OSD capabilities also allow for pool and namespace settings:

Syntax

osd 'allow CAPABILITY' [pool=POOL_NAME] [namespace=NAMESPACE_NAME]
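
For instance, a capability string that limits a client to read and write access within a single namespace of one pool might look like
the following; the pool and namespace names are placeholders:

Example

osd 'allow rw pool=mypool namespace=myns'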

NOTE: The Ceph Object Gateway daemon (radosgw) is a client of the Ceph storage cluster, so it is not represented as a Ceph storage
cluster daemon type.

The following entries describe each capability.

allow Precedes access settings for a daemon.
r Gives the user read access. Required with monitors to retrieve the CRUSH map.
w Gives the user write access to objects.
x Gives the user the capability to call class methods (that is, both read and write) and to conduct auth operations on monitors.
class-read Gives the user the capability to call class read methods. Subset of x.
class-write Gives the user the capability to call class write methods. Subset of x.
* Gives the user read, write and execute permissions for a particular daemon or pool, and the ability to execute admin commands.
profile osd Gives a user permissions to connect as an OSD to other OSDs or monitors. Conferred on OSDs to enable OSDs to handle replication heartbeat traffic and status reporting.
profile bootstrap-osd Gives a user permissions to bootstrap an OSD, so that they have permissions to add keys when bootstrapping an OSD.
profile rbd Gives a user read-write access to the Ceph Block Devices.
profile rbd-read-only Gives a user read-only access to the Ceph Block Devices.
Pool

A pool defines a storage strategy for Ceph clients, and acts as a logical partition for that strategy.

In Ceph deployments, it is common to create a pool to support different types of use cases. For example, cloud volumes or images,
object storage, hot storage, cold storage, and so on. When deploying Ceph as a back end for OpenStack, a typical deployment would
have pools for volumes, images, backups and virtual machines, and users such as client.glance, client.cinder, and so on.

Namespace

Objects within a pool can be associated to a namespace—a logical group of objects within the pool. A user’s access to a pool can be
associated with a namespace such that reads and writes by the user take place only within the namespace. Objects written to a
namespace within the pool can only be accessed by users who have access to the namespace.

NOTE: Currently, namespaces are only useful for applications written on top of librados. Ceph clients such as block device and
object storage do not currently support this feature.

The rationale for namespaces is that pools can be a computationally expensive method of segregating data by use case, because
each pool creates a set of placement groups that get mapped to OSDs. If multiple pools use the same CRUSH hierarchy and ruleset,
OSD performance may degrade as load increases.

For example, a pool should have approximately 100 placement groups per OSD. So an exemplary cluster with 1000 OSDs would
have 100,000 placement groups for one pool. Each pool mapped to the same CRUSH hierarchy and ruleset would create another
100,000 placement groups in the exemplary cluster. By contrast, writing an object to a namespace simply associates the namespace
to the object name without the computational overhead of a separate pool. Rather than creating a separate pool for a user or set of
users, you may use a namespace.

NOTE: Only available using librados at this time.

Reference
Edit online

For more information on configuring the use of authentication, see Configuring.

Managing Ceph users


Edit online
As a storage administrator, you can manage Ceph users by creating, modifying, deleting, and importing users.

A Ceph client user can be either individuals or applications, which use Ceph clients to interact with the IBM Storage Ceph cluster
daemons.

Listing Ceph users


Display Ceph user information
Add a new Ceph user
Modifying a Ceph User
Deleting a Ceph user
Print a Ceph user key

Listing Ceph users


Edit online
You can list the users in the storage cluster using the command-line interface.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

To list the users in the storage cluster, execute the following:

Example

[ceph: root@host01 /]# ceph auth list


installed auth entries:

osd.10
key: AQBW7U5gqOsEExAAg/CxSwZ/gSh8iOsDV3iQOA==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.11
key: AQBX7U5gtj/JIhAAPsLBNG+SfC2eMVEFkl3vfA==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.9
key: AQBV7U5g1XDULhAAKo2tw6ZhH1jki5aVui2v7g==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: AQADYEtgFfD3ExAAwH+C1qO7MSLE4TWRfD2g6g==
caps: [mds] allow *
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: AQAHYEtgpbkANBAANqoFlvzEXFwD8oB0w3TF4Q==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
key: AQAHYEtg3dcANBAAVQf6brq3sxTSrCrPe0pKVQ==
caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
key: AQAHYEtgD/QANBAATS9DuP3DbxEl86MTyKEmdw==
caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
key: AQAHYEtgjxEBNBAANho25V9tWNNvIKnHknW59A==
caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rbd-mirror
key: AQAHYEtgdE8BNBAAr6rLYxZci0b2hoIgH9GXYw==
caps: [mon] allow profile bootstrap-rbd-mirror
client.bootstrap-rgw
key: AQAHYEtgwGkBNBAAuRzI4WSrnowBhZxr2XtTFg==
caps: [mon] allow profile bootstrap-rgw
client.crash.host04
key: AQCQYEtgz8lGGhAAy5bJS8VH9fMdxuAZ3CqX5Q==
caps: [mgr] profile crash
caps: [mon] profile crash
client.crash.host02
key: AQDuYUtgqgfdOhAAsyX+Mo35M+HFpURGad7nJA==
caps: [mgr] profile crash
caps: [mon] profile crash
client.crash.host03
key: AQB98E5g5jHZAxAAklWSvmDsh2JaL5G7FvMrrA==
caps: [mgr] profile crash
caps: [mon] profile crash
client.rgw.test_realm.test_zone.host01.hgbvnq
key: AQD5RE9gAQKdCRAAJzxDwD/dJObbInp9J95sXw==
caps: [mgr] allow rw
caps: [mon] allow *
caps: [osd] allow rwx tag rgw *=*
client.rgw.test_realm.test_zone.host02.yqqilm
key: AQD0RE9gkxA4ExAAFXp3pLJWdIhsyTe2ZR6Ilw==
caps: [mgr] allow rw
caps: [mon] allow *
caps: [osd] allow rwx tag rgw *=*
mgr.host01.hdhzwn
key: AQAEYEtg3lhIBxAAmHodoIpdvnxK0llWF80ltQ==
caps: [mds] allow *
caps: [mon] profile mgr
caps: [osd] allow *
mgr.host02.eobuuv
key: AQAn6U5gzUuiABAA2Fed+jPM1xwb4XDYtrQxaQ==
caps: [mds] allow *
caps: [mon] profile mgr
caps: [osd] allow *
mgr.host03.wquwpj
key: AQAd6U5gIzWsLBAAbOKUKZlUcAVe9kBLfajMKw==
caps: [mds] allow *
caps: [mon] profile mgr
caps: [osd] allow *

NOTE: The TYPE.ID notation for users applies such that osd.0 is a user of type osd and its ID is 0, client.admin is a user of type
client and its ID is admin, that is, the default client.admin user. Note also that each entry has a key: VALUE entry, and one or
more caps: entries.

You may use the -o FILE_NAME option with ceph auth list to save the output to a file.

Display Ceph user information


Edit online
You can display a Ceph’s user information using the command-line interface.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To retrieve a specific user, key and capabilities, execute the following:

Syntax

ceph auth export TYPE.ID

Example

[ceph: root@host01 /]# ceph auth export mgr.host02.eobuuv

2. You can also use the -o FILE_NAME option.

Syntax

ceph auth export TYPE.ID -o FILE_NAME

Example

[ceph: root@host01 /]# ceph auth export osd.9 -o filename


export auth(key=AQBV7U5g1XDULhAAKo2tw6ZhH1jki5aVui2v7g==)

The auth export command is identical to auth get, but also prints out the internal auid, which isn’t relevant to end users.

Add a new Ceph user
Edit online
Adding a user creates a username, that is, TYPE.ID, a secret key and any capabilities included in the command you use to create the
user.

A user’s key enables the user to authenticate with the Ceph storage cluster. The user’s capabilities authorize the user to read, write,
or execute on Ceph monitors (mon), Ceph OSDs (osd) or Ceph Metadata Servers (mds).

There are a few ways to add a user:

ceph auth add: This command is the canonical way to add a user. It will create the user, generate a key and add any
specified capabilities.

ceph auth get-or-create: This command is often the most convenient way to create a user, because it returns a keyfile
format with the user name (in brackets) and the key. If the user already exists, this command simply returns the user name
and key in the keyfile format. You may use the -o FILE_NAME option to save the output to a file.

ceph auth get-or-create-key: This command is a convenient way to create a user and return the user’s key only. This is
useful for clients that need the key only, for example, libvirt. If the user already exists, this command simply returns the
key. You may use the -o FILE_NAME option to save the output to a file.

When creating client users, you may create a user with no capabilities. A user with no capabilities is useless beyond mere
authentication, because the client cannot retrieve the cluster map from the monitor. However, you can create a user with no
capabilities if you wish to defer adding capabilities later using the ceph auth caps command.

A typical user has at least read capabilities on the Ceph monitor and read and write capability on Ceph OSDs. Additionally, a user’s
OSD permissions are often restricted to accessing a particular pool:

[ceph: root@host01 /]# ceph auth add client.john mon 'allow r' osd 'allow rw pool=mypool'
[ceph: root@host01 /]# ceph auth get-or-create client.paul mon 'allow r' osd 'allow rw pool=mypool'
[ceph: root@host01 /]# ceph auth get-or-create client.george mon 'allow r' osd 'allow rw
pool=mypool' -o george.keyring
[ceph: root@host01 /]# ceph auth get-or-create-key client.ringo mon 'allow r' osd 'allow rw
pool=mypool' -o ringo.key

IMPORTANT: If you provide a user with capabilities to OSDs, but you DO NOT restrict access to particular pools, the user will have
access to ALL pools in the cluster.

Modifying a Ceph User


Edit online
The ceph auth caps command allows you to specify a user and change the user’s capabilities.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To add capabilities, use the form:

Syntax

ceph auth caps USERTYPE.USERID DAEMON 'allow [r|w|x|*|...] [pool=POOL_NAME] [namespace=NAMESPACE_NAME]'


Example

[ceph: root@host01 /]# ceph auth caps client.john mon 'allow r' osd 'allow rw pool=mypool'
[ceph: root@host01 /]# ceph auth caps client.paul mon 'allow rw' osd 'allow rwx pool=mypool'
[ceph: root@host01 /]# ceph auth caps client.brian-manager mon 'allow *' osd 'allow *'

2. To remove a capability, you may reset the capability. If you want the user to have no access to a particular daemon that was
previously set, specify an empty string:

Example

[ceph: root@host01 /]# ceph auth caps client.ringo mon ' ' osd ' '

Reference
Edit online

For more information about capabilities, see Authorization capabilities.

Deleting a Ceph user


Edit online
You can delete a user from the Ceph storage cluster using the command-line interface.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To delete a user, use ceph auth del:

Syntax

ceph auth del TYPE.ID

Example

[ceph: root@host01 /]# ceph auth del osd.6

Print a Ceph user key


Edit online
You can display a Ceph user’s key information using the command-line interface.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To print a user’s authentication key to standard output, execute the following:

Syntax

ceph auth print-key TYPE.ID

Example

[ceph: root@host01 /]# ceph auth print-key osd.6

AQBQ7U5gAry3JRAA3NoPrqBBThpFMcRL6Sr+5w==[ceph: root@host01 /]#

2. Printing a user’s key is useful when you need to populate client software with a user’s key, for example, libvirt.

Syntax

mount -t ceph HOSTNAME:/MOUNT_POINT -o name=client.user,secret=`ceph auth print-key client.user`

Example

[ceph: root@host01 /]# mount -t ceph host02:/ceph -o name=client.user,secret=`ceph auth print-key client.user`

The ceph-volume utility

Edit online
As a storage administrator, you can prepare, list, create, activate, deactivate, batch, trigger, zap, and migrate Ceph OSDs using the
ceph-volume utility. The ceph-volume utility is a single-purpose command-line tool to deploy logical volumes as OSDs. It uses a
plugin-type framework to deploy OSDs with different device technologies. The ceph-volume utility follows a similar workflow of the
ceph-disk utility for deploying OSDs, with a predictable, and robust way of preparing, activating, and starting OSDs. Currently, the
ceph-volume utility only supports the lvm plugin, with the plan to support other technologies in the future.

IMPORTANT: The ceph-disk command is deprecated.

Ceph volume lvm plugin


Why does ceph-volume replace ceph-disk?
Preparing Ceph OSDs using ceph-volume
Listing devices using ceph-volume
Activating Ceph OSDs using ceph-volume
Deactivating Ceph OSDs using ceph-volume
Creating Ceph OSDs using ceph-volume
Migrating BlueFS data
Using batch mode with ceph-volume
Zapping data using ceph-volume

Ceph volume lvm plugin

Edit online
By making use of LVM tags, the lvm sub-command is able to store and later re-discover and query devices associated with OSDs so
that they can be activated. This includes support for lvm-based technologies like dm-cache as well.

When using ceph-volume, the use of dm-cache is transparent, and treats dm-cache like a logical volume. The performance gains
and losses when using dm-cache will depend on the specific workload. Generally, random and sequential reads will see an increase
in performance at smaller block sizes. While random and sequential writes will see a decrease in performance at larger block sizes.

To use the LVM plugin, add lvm as a subcommand to the ceph-volume command within the cephadm shell:

[ceph: root@host01 /]# ceph-volume lvm

Following are the lvm subcommands:

prepare - Format an LVM device and associate it with an OSD.

activate - Discover and mount the LVM device associated with an OSD ID and start the Ceph OSD.

list - List logical volumes and devices associated with Ceph.

batch - Automatically size devices for multi-OSD provisioning with minimal interaction.

deactivate - Deactivate OSDs.

create - Create a new OSD from an LVM device.

trigger - A systemd helper to activate an OSD.

zap - Removes all data and filesystems from a logical volume or partition.

migrate - Migrate BlueFS data from one LVM device to another LVM device.

new-wal - Allocate new WAL volume for the OSD at specified logical volume.

new-db - Allocate new DB volume for the OSD at specified logical volume.

NOTE: Using the create subcommand combines the prepare and activate subcommands into one subcommand.

Reference
Edit online

See the create subcommand in Creating OSDs section for more details.

Why does ceph-volume replace ceph-disk?

Edit online
Previous versions of Ceph used the ceph-disk utility to prepare, activate, and create OSDs. Starting with IBM Storage Ceph 5,
ceph-disk is replaced by the ceph-volume utility that aims to be a single purpose command-line tool to deploy logical volumes as
OSDs, while maintaining a similar API to ceph-disk when preparing, activating, and creating OSDs.

How does ceph-volume work?

The ceph-volume is a modular tool that currently supports two ways of provisioning hardware devices, legacy ceph-disk devices
and LVM (Logical Volume Manager) devices. The ceph-volume lvm command uses the LVM tags to store information about devices
specific to Ceph and its relationship with OSDs. It uses these tags to later re-discover and query devices associated with OSDs so
that it can activate them. It supports technologies based on LVM and dm-cache as well.

The ceph-volume utility uses dm-cache transparently and treats it as a logical volume. You might consider the performance gains
and losses when using dm-cache, depending on the specific workload you are handling. Generally, the performance of random and
sequential read operations increases at smaller block sizes; while the performance of random and sequential write operations
decreases at larger block sizes. Using ceph-volume does not introduce any significant performance penalties.

IMPORTANT: The ceph-disk utility is deprecated.

NOTE: The ceph-volume simple command can handle legacy ceph-disk devices, if these devices are still in use.

How does ceph-disk work?

The ceph-disk utility was required to support many different types of init systems, such as upstart or sysvinit, while being
able to discover devices. For this reason, ceph-disk concentrates only on GUID Partition Table (GPT) partitions. Specifically on GPT
GUIDs that label devices in a unique way to answer questions like:

Is this device a journal?

Is this device an encrypted data partition?

Was the device left partially prepared?

To solve these questions, ceph-disk uses UDEV rules to match the GUIDs.

What are disadvantages of using ceph-disk?

Using the UDEV rules to call ceph-disk can lead to a back-and-forth between the ceph-disk systemd unit and the ceph-disk
executable. The process is very unreliable and time consuming and can cause OSDs to not come up at all during the boot process of
a node. Moreover, it is hard to debug, or even replicate these problems given the asynchronous behavior of UDEV.

Because ceph-disk works with GPT partitions exclusively, it cannot support other technologies, such as Logical Volume Manager
(LVM) volumes, or similar device mapper devices.

To ensure the GPT partitions work correctly with the device discovery workflow, ceph-disk requires a large number of special flags
to be used. In addition, these partitions require devices to be exclusively owned by Ceph.

Preparing Ceph OSDs using ceph-volume

Edit online
The prepare subcommand prepares an OSD back-end object store and consumes logical volumes (LV) for both the OSD data and
journal. It does not modify the logical volumes, except for adding some extra metadata tags using LVM. These tags make volumes
easier to discover, and they also identify the volumes as part of the Ceph Storage Cluster and the roles of those volumes in the
storage cluster.

The BlueStore OSD backend supports the following configurations:

A block device, a block.wal device, and a block.db device

A block device and a block.wal device

A block device and a block.db device

A single block device

The prepare subcommand accepts a whole device or partition, or a logical volume for block.

Prerequisites
Edit online

Root-level access to the OSD nodes.

Optionally, create logical volumes. If you provide a path to a physical device, the subcommand turns the device into a logical
volume. This approach is simpler, but you cannot configure or change the way the logical volume is created.
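
If you choose to create the logical volume yourself, a minimal sketch with standard LVM commands might look like the following;
the device path /dev/sdb is a placeholder, and the volume group and logical volume names simply match the example_vg/data_lv
names used in the examples below:

Example

[root@osd ~]# pvcreate /dev/sdb
[root@osd ~]# vgcreate example_vg /dev/sdb
[root@osd ~]# lvcreate -n data_lv -l 100%FREE example_vg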

Procedure
Edit online

1. Extract the Ceph keyring:

Syntax

ceph auth get client.ID -o ceph.client.ID.keyring

Example

[ceph: root@host01 /]# ceph auth get client.bootstrap-osd -o ceph.client.bootstrap-osd.keyring

2. Prepare the LVM volumes:

Syntax

ceph-volume lvm prepare --bluestore --data VOLUME_GROUP/LOGICAL_VOLUME

Example

[ceph: root@host01 /]# ceph-volume lvm prepare --bluestore --data example_vg/data_lv

a. Optionally, if you want to use a separate device for RocksDB, specify the --block.db and --block.wal options with the
devices or logical volumes to use:

Syntax

ceph-volume lvm prepare --bluestore --block.db BLOCK_DB_DEVICE --block.wal BLOCK_WAL_DEVICE --data VOLUME_GROUP/LOGICAL_VOLUME

Example

[ceph: root@host01 /]# ceph-volume lvm prepare --bluestore --block.db example_vg/db_lv --block.wal example_vg/wal_lv --data example_vg/data_lv

b. Optionally, to encrypt data, use the --dmcrypt flag:

Syntax

ceph-volume lvm prepare --bluestore --dmcrypt --data VOLUME_GROUP/LOGICAL_VOLUME

Example

[ceph: root@host01 /]# ceph-volume lvm prepare --bluestore --dmcrypt --data example_vg/data_lv

References

For more information, see:

Activating Ceph OSDs using ceph-volume

Creating Ceph OSDs using ceph-volume

Listing devices using ceph-volume

Edit online
You can use the ceph-volume lvm list subcommand to list logical volumes and devices associated with a Ceph cluster, as long
as they contain enough metadata to allow for that discovery. The output is grouped by the OSD ID associated with the devices. For
logical volumes, the devices key is populated with the physical devices associated with the logical volume.

In some cases, the output of the ceph -s command shows the following error message:

1 devices have fault light turned on

In such cases, you can list the devices with ceph device ls-lights command which gives the details about the lights on the
devices. Based on the information, you can turn off the lights on the devices.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph OSD node.

Procedure
Edit online

List the devices in the Ceph cluster:

Example

[ceph: root@host01 /]# ceph-volume lvm list

====== osd.6 =======

[block] /dev/ceph-83909f70-95e9-4273-880e-5851612cbe53/osd-block-7ce687d9-07e7-4f8f-a34e-d1b0efb89920

block device /dev/ceph-83909f70-95e9-4273-880e-5851612cbe53/osd-block-7ce687d9-07e7-4f8f-a34e-d1b0efb89920
block uuid 4d7gzX-Nzxp-UUG0-bNxQ-Jacr-l0mP-IPD8cX
cephx lockbox secret
cluster fsid 1ca9f6a8-d036-11ec-8263-fa163ee967ad
cluster name ceph
crush device class None
encrypted 0
osd fsid 7ce687d9-07e7-4f8f-a34e-d1b0efb89920
osd id 6
osdspec affinity all-available-devices
type block
vdo 0
devices /dev/vdc

Optional: List the devices in the storage cluster with the lights:

Example

[ceph: root@host01 /]# ceph device ls-lights

{
"fault": [
"SEAGATE_ST12000NM002G_ZL2KTGCK0000C149"
],
"ident": []
}

Optional: Turn off the lights on the device:

Syntax

ceph device light off DEVICE_NAME FAULT/IDENT --force

Example

[ceph: root@host01 /]# ceph device light off SEAGATE_ST12000NM002G_ZL2KTGCK0000C149 fault --force

Activating Ceph OSDs using ceph-volume

Edit online
The activation process enables a systemd unit at boot time, which allows the correct OSD identifier and its UUID to be enabled and
mounted.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph OSD node.

Ceph OSDs prepared by the ceph-volume utility.

Procedure
Edit online

1. Get the OSD ID and OSD FSID from an OSD node:

Example

[ceph: root@host01 /]# ceph-volume lvm list

2. Activate the OSD:

Syntax

ceph-volume lvm activate --bluestore OSD_ID OSD_FSID

Example

[ceph: root@host01 /]# ceph-volume lvm activate --bluestore 10 7ce687d9-07e7-4f8f-a34e-d1b0efb89920

To activate all OSDs that are prepared for activation, use the --all option:

Example

[ceph: root@host01 /]# ceph-volume lvm activate --all

3. Optionally, you can use the trigger subcommand. This command cannot be used directly, and it is used by systemd so that
it proxies input to ceph-volume lvm activate. This parses the metadata coming from systemd and startup, detecting the
UUID and ID associated with an OSD.

Syntax

ceph-volume lvm trigger SYSTEMD_DATA

Here the SYSTEMD_DATA is in OSD_ID-OSD_FSID format.

Example

[ceph: root@host01 /]# ceph-volume lvm trigger 10-7ce687d9-07e7-4f8f-a34e-d1b0efb89920

Reference
Edit online
For more information, see:

Preparing Ceph OSDs using ceph-volume

Creating Ceph OSDs using ceph-volume

Deactivating Ceph OSDs using ceph-volume

Edit online
You can deactivate the Ceph OSDs using the ceph-volume lvm subcommand. This subcommand removes the volume groups and
the logical volume.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph OSD node.

The Ceph OSDs are activated using the ceph-volume utility.

Procedure
Edit online

1. Get the OSD ID from the OSD node:

[ceph: root@host01 /]# ceph-volume lvm list

2. Deactivate the OSD:

Syntax

ceph-volume lvm deactivate OSD_ID

Example

[ceph: root@host01 /]# ceph-volume lvm deactivate 16

Reference
Edit online
For more information, see:

Activating Ceph OSDs using ceph-volume

Preparing Ceph OSDs using ceph-volume

Creating Ceph OSDs using ceph-volume

Creating Ceph OSDs using ceph-volume

Edit online
The create subcommand calls the prepare subcommand, and then calls the activate subcommand.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph OSD nodes.

NOTE: If you prefer to have more control over the creation process, you can use the prepare and activate subcommands
separately to create the OSD, instead of using create. You can use the two subcommands to gradually introduce new OSDs into a
storage cluster, while avoiding having to rebalance large amounts of data. Both approaches work the same way, except that using the
create subcommand causes the OSD to become up and in immediately after completion.

Procedure
Edit online

1. To create a new OSD:

Syntax

ceph-volume lvm create --bluestore --data VOLUME_GROUP/LOGICAL_VOLUME

Example

[root@osd ~]# ceph-volume lvm create --bluestore --data example_vg/data_lv

Reference
Edit online
For more information, see:

Preparing Ceph OSDs using ceph-volume

Activating Ceph OSDs using ceph-volume

Migrating BlueFS data
Edit online
You can migrate the BlueStore file system (BlueFS) data, that is, the RocksDB data, from the source volume to the target volume
by using the migrate LVM subcommand. The source volumes, except the main one, are removed on success.

Only LVM logical volumes are supported as target volumes.

The new volumes are attached to the OSD, replacing one of the source drives.

Following are the placement rules for the LVM volumes:

If source list has DB or WAL volume, then the target device replaces it.

If source list has slow volume only, then explicit allocation using the new-db or new-wal command is needed.

The new-db and new-wal commands attach the given logical volume to the given OSD as a DB or a WAL volume, respectively.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph OSD node.

Ceph OSDs prepared by the ceph-volume utility.

Volume groups and Logical volumes are created.

Procedure
Edit online

1. Log in to the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Stop the OSD to which you have to add the DB or the WAL device:

Example

[ceph: root@host01 /]# ceph orch daemon stop osd.1

3. Mount the new devices to the container:

Example

[root@host01 ~]# cephadm shell --mount /var/lib/ceph/72436d46-ca06-11ec-9809-ac1f6b5635ee/osd.1:/var/lib/ceph/osd/ceph-1

4. Attach the given logical volume to OSD as a DB/WAL device:

NOTE: This command fails if the OSD has an attached DB.

Syntax

ceph-volume lvm new-db --osd-id OSD_ID --osd-fsid OSD_FSID --target VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME

Example

[ceph: root@host01 /]# ceph-volume lvm new-db --osd-id 1 --osd-fsid 7ce687d9-07e7-4f8f-a34e-d1b0efb89921 --target vgname/new_db
[ceph: root@host01 /]# ceph-volume lvm new-wal --osd-id 1 --osd-fsid 7ce687d9-07e7-4f8f-a34e-d1b0efb89921 --target vgname/new_wal

5. You can migrate BlueFS data in the following ways:

Move BlueFS data from main device to LV that is already attached as DB:

Syntax

ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_UUID --from data --target
VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME

Example

[ceph: root@host01 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8 --from data --target vgname/db

Move BlueFS data from shared main device to LV which shall be attached as a new DB:

Syntax

ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_UUID --from data --target
VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME

Example

[ceph: root@host01 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8 --from data --target vgname/new_db

Move BlueFS data from DB device to new LV, and replace the DB device:

Syntax

ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_UUID --from db --target VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME

Example

[ceph: root@host01 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8 --from db --target vgname/new_db

Move BlueFS data from main and DB devices to new LV, and replace the DB device:

Syntax

ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_UUID --from data db --target
VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME

Example

[ceph: root@host01 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8 --from data db --target vgname/new_db

Move BlueFS data from main, DB, and WAL devices to new LV, remove the WAL device, and replace the DB device:

Syntax

ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_UUID --from data db wal --target
VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME

Example

[ceph: root@host01 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8 --from data db wal --target vgname/new_db

Move BlueFS data from main, DB, and WAL devices to the main device, remove the WAL and DB devices:

Syntax

ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_UUID --from db wal --target
VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME

Example

[ceph: root@host01 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8 --from db wal --target vgname/data

Using batch mode with ceph-volume

Edit online
The batch subcommand automates the creation of multiple OSDs when single devices are provided.

The ceph-volume command decides the best method to use to create the OSDs, based on drive type. Ceph OSD optimization
depends on the available devices:

If all devices are traditional hard drives, batch creates one OSD per device.

If all devices are solid state drives, batch creates two OSDs per device.

If there is a mix of traditional hard drives and solid state drives, batch uses the traditional hard drives for data, and creates
the largest possible journal (block.db) on the solid state drive.

NOTE: The batch subcommand does not support the creation of a separate logical volume for the write-ahead-log (block.wal)
device.
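
Before creating anything, you can preview the layout that batch would choose for a given set of devices by adding the --report flag, which performs a dry run. The device paths below are placeholders for your own drives.

Example

[ceph: root@host01 /]# ceph-volume lvm batch --bluestore --report /dev/sda /dev/sdb /dev/nvme0n1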

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph OSD nodes.

Procedure
Edit online

1. To create OSDs on several drives:

Syntax

ceph-volume lvm batch --bluestore PATH_TO_DEVICE [PATH_TO_DEVICE]

Example

[ceph: root@host01 /]# ceph-volume lvm batch --bluestore /dev/sda /dev/sdb /dev/nvme0n1

Reference
Edit online

For more information, see Creating Ceph OSDs using ceph-volume.

Zapping data using ceph-volume

Edit online
The zap subcommand removes all data and filesystems from a logical volume or partition.

You can use the zap subcommand to zap logical volumes, partitions, or raw devices that are used by Ceph OSDs for reuse. Any
filesystems present on the given logical volume or partition are removed and all data is purged.

Optionally, you can use the --destroy flag for complete removal of a logical volume, partition, or the physical device.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph OSD node.

Procedure
Edit online

Zap the logical volume:

Syntax

ceph-volume lvm zap VOLUME_GROUP_NAME/LOGICAL_VOLUME_NAME [--destroy]

Example

[ceph: root@host01 /]# ceph-volume lvm zap osd-vg/data-lv

Zap the partition:

Syntax

ceph-volume lvm zap DEVICE_PATH_PARTITION [--destroy]

Example

[ceph: root@host01 /]# ceph-volume lvm zap /dev/sdc1

Zap the raw device:

Syntax

ceph-volume lvm zap DEVICE_PATH --destroy

Example

[ceph: root@host01 /]# ceph-volume lvm zap /dev/sdc --destroy

Purge multiple devices with the OSD ID:

Syntax

ceph-volume lvm zap --destroy --osd-id OSD_ID

Example

[ceph: root@host01 /]# ceph-volume lvm zap --destroy --osd-id 16

NOTE: All the related devices are zapped.

Purge OSDs with the FSID:

Syntax

ceph-volume lvm zap --destroy --osd-fsid OSD_FSID

Example

[ceph: root@host01 /]# ceph-volume lvm zap --destroy --osd-fsid 65d7b6b1-e41a-4a3c-b363-83ade63cb32b

NOTE: All the related devices are zapped.

Ceph performance benchmark


Edit online
As a storage administrator, you can benchmark performance of the IBM Storage Ceph cluster. The purpose of this section is to give
Ceph administrators a basic understanding of Ceph's native benchmarking tools. These tools will provide some insight into how the

Ceph storage cluster is performing. This is not the definitive guide to Ceph performance benchmarking, nor is it a guide on how to
tune Ceph accordingly.

Performance baseline
Benchmarking Ceph performance
Benchmarking Ceph block performance

Performance baseline
Edit online
The OSD disks, including the journal, and the network throughput should each have a performance baseline to compare against. You
can identify potential tuning opportunities by comparing the baseline performance data with the data from Ceph’s native tools. Red
Hat Enterprise Linux has many built-in tools, along with a plethora of open source community tools, available to help accomplish
these tasks.
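
As one illustration, and not a substitute for the referenced article, raw disk and network throughput are often baselined with general-purpose tools such as dd and iperf3. The scratch file path and host names below are placeholders, and iperf3 may need to be installed separately.

Example

[root@host01 ~]# dd if=/dev/zero of=/mnt/scratch/testfile bs=4M count=256 oflag=direct
[root@host02 ~]# iperf3 -s
[root@host01 ~]# iperf3 -c host02

The dd command writes 1 GB with direct I/O to a file on the drive being baselined, and iperf3 measures throughput between two storage nodes.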

Reference
Edit online

For more details about some of the available tools, see this Knowledgebase article.

Benchmarking Ceph performance


Edit online
Ceph includes the rados bench command to do performance benchmarking on a RADOS storage cluster. The command will
execute a write test and two types of read tests. The --no-cleanup option is important to use when testing both read and write
performance. By default the rados bench command will delete the objects it has written to the storage pool. Leaving behind these
objects allows the two read tests to measure sequential and random read performance.

NOTE: Before running these performance tests, drop all the file system caches by running the following:

Example

[ceph: root@host01 /]# echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Create a new storage pool:

Example

[ceph: root@host01 /]# ceph osd pool create testbench 100 100

2. Execute a write test for 10 seconds to the newly created storage pool:

Example

[ceph: root@host01 /]# rados bench -p testbench 10 write --no-cleanup

Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects

Object prefix: benchmark_data_cephn1.home.network_10510
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 16 0 0 0 - 0
2 16 16 0 0 0 - 0
3 16 16 0 0 0 - 0
4 16 17 1 0.998879 1 3.19824 3.19824
5 16 18 2 1.59849 4 4.56163 3.87993
6 16 18 2 1.33222 0 - 3.87993
7 16 19 3 1.71239 2 6.90712 4.889
8 16 25 9 4.49551 24 7.75362 6.71216
9 16 25 9 3.99636 0 - 6.71216
10 16 27 11 4.39632 4 9.65085 7.18999
11 16 27 11 3.99685 0 - 7.18999
12 16 27 11 3.66397 0 - 7.18999
13 16 28 12 3.68975 1.33333 12.8124 7.65853
14 16 28 12 3.42617 0 - 7.65853
15 16 28 12 3.19785 0 - 7.65853
16 11 28 17 4.24726 6.66667 12.5302 9.27548
17 11 28 17 3.99751 0 - 9.27548
18 11 28 17 3.77546 0 - 9.27548
19 11 28 17 3.57683 0 - 9.27548
Total time run: 19.505620
Total writes made: 28
Write size: 4194304
Bandwidth (MB/sec): 5.742

Stddev Bandwidth: 5.4617


Max bandwidth (MB/sec): 24
Min bandwidth (MB/sec): 0
Average Latency: 10.4064
Stddev Latency: 3.80038
Max latency: 19.503
Min latency: 3.19824

3. Execute a sequential read test for 10 seconds to the storage pool:

Example

[ceph: root@host01 /]# rados bench -p testbench 10 seq

sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
Total time run: 0.804869
Total reads made: 28
Read size: 4194304
Bandwidth (MB/sec): 139.153

Average Latency: 0.420841


Max latency: 0.706133
Min latency: 0.0816332

4. Execute a random read test for 10 seconds to the storage pool:

Example

[ceph: root@host01 /]# rados bench -p testbench 10 rand

sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 46 30 119.801 120 0.440184 0.388125
2 16 81 65 129.408 140 0.577359 0.417461
3 16 120 104 138.175 156 0.597435 0.409318
4 15 157 142 141.485 152 0.683111 0.419964
5 16 206 190 151.553 192 0.310578 0.408343
6 16 253 237 157.608 188 0.0745175 0.387207
7 16 287 271 154.412 136 0.792774 0.39043
8 16 325 309 154.044 152 0.314254 0.39876
9 16 362 346 153.245 148 0.355576 0.406032
10 16 405 389 155.092 172 0.64734 0.398372
Total time run: 10.302229
Total reads made: 405
Read size: 4194304
Bandwidth (MB/sec): 157.248

Average Latency: 0.405976
Max latency: 1.00869
Min latency: 0.0378431

5. To increase the number of concurrent reads and writes, use the -t option; the default is 16 threads. Also, the -b parameter can
adjust the size of the object being written. The default object size is 4 MB. A safe maximum object size is 16 MB. IBM recommends
running multiple copies of these benchmark tests against different pools. Doing this shows the changes in performance from
multiple clients.

Add the --run-name LABEL option to control the names of the objects that get written during the benchmark test. Multiple
rados bench commands can be run simultaneously by changing the --run-name label for each running command instance; a
parallel-run example follows the output below. This prevents potential I/O errors that can occur when multiple clients try to
access the same object, and it allows different clients to access different objects. The --run-name option is also useful when
trying to simulate a real world workload.

Example

[ceph: root@host01 /]# rados bench -p testbench 10 write -t 4 --run-name client1

Maintaining 4 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects


Object prefix: benchmark_data_node1_12631
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 4 4 0 0 0 - 0
2 4 6 2 3.99099 4 1.94755 1.93361
3 4 8 4 5.32498 8 2.978 2.44034
4 4 8 4 3.99504 0 - 2.44034
5 4 10 6 4.79504 4 2.92419 2.4629
6 3 10 7 4.64471 4 3.02498 2.5432
7 4 12 8 4.55287 4 3.12204 2.61555
8 4 14 10 4.9821 8 2.55901 2.68396
9 4 16 12 5.31621 8 2.68769 2.68081
10 4 17 13 5.18488 4 2.11937 2.63763
11 4 17 13 4.71431 0 - 2.63763
12 4 18 14 4.65486 2 2.4836 2.62662
13 4 18 14 4.29757 0 - 2.62662
Total time run: 13.123548
Total writes made: 18
Write size: 4194304
Bandwidth (MB/sec): 5.486

Stddev Bandwidth: 3.0991


Max bandwidth (MB/sec): 8
Min bandwidth (MB/sec): 0
Average Latency: 2.91578
Stddev Latency: 0.956993
Max latency: 5.72685
Min latency: 1.91967
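
For example, two write instances can run in parallel from one node by backgrounding them with different --run-name labels. The labels, duration, and thread count below are arbitrary.

Example

[ceph: root@host01 /]# rados bench -p testbench 60 write -t 4 --run-name client1 --no-cleanup &
[ceph: root@host01 /]# rados bench -p testbench 60 write -t 4 --run-name client2 --no-cleanup &
[ceph: root@host01 /]# wait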

6. Remove the data created by the rados bench command:

Example

[ceph: root@host01 /]# rados -p testbench cleanup

Benchmarking Ceph block performance


Edit online
Ceph includes the rbd bench-write command, now invoked as rbd bench --io-type write, to test sequential writes to the block device,
measuring throughput and latency. The default byte size is 4096, the default number of I/O threads is 16, and the default total
number of bytes to write is 1 GB. These defaults can be modified by the --io-size, --io-threads, and --io-total options, respectively.
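
For example, to override the defaults with an 8 KiB I/O size, 8 threads, and a 2 GB total, run the benchmark as shown below against the image and pool created in the procedure that follows. The values are arbitrary and only illustrate the options.

Example

[root@host01 ~]# rbd bench --io-type write image01 --pool=testbench --io-size 8192 --io-threads 8 --io-total 2G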

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Load the rbd kernel module, if not already loaded:

Example

[root@host01 ~]# modprobe rbd

2. Create a 1 GB rbd image file in the testbench pool:

Example

[root@host01 ~]# rbd create image01 --size 1024 --pool testbench

3. Map the image file to a device file:

Example

[root@host01 ~]# rbd map image01 --pool testbench --name client.admin

4. Create an ext4 file system on the block device:

Example

[root@host01 ~]# mkfs.ext4 /dev/rbd/testbench/image01

5. Create a new directory:

Example

[root@host01 ~]# mkdir /mnt/ceph-block-device

6. Mount the block device under /mnt/ceph-block-device/:

Example

[root@host01 ~]# mount /dev/rbd/testbench/image01 /mnt/ceph-block-device

7. Execute the write performance test against the block device:

Example

[root@host01 ~]# rbd bench --io-type write image01 --pool=testbench

bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern seq


SEC OPS OPS/SEC BYTES/SEC
2 11127 5479.59 22444382.79
3 11692 3901.91 15982220.33
4 12372 2953.34 12096895.42
5 12580 2300.05 9421008.60
6 13141 2101.80 8608975.15
7 13195 356.07 1458459.94
8 13820 390.35 1598876.60
9 14124 325.46 1333066.62
..

Reference
Edit online

For more information about the rbd command, see Ceph block devices.

Ceph performance counters


Edit online
As a storage administrator, you can gather performance metrics of the IBM Storage Ceph cluster. The Ceph performance counters
are a collection of internal infrastructure metrics. The collection, aggregation, and graphing of this metric data can be done by an
assortment of tools and can be useful for performance analytics.

Access to Ceph performance counters
Display the Ceph performance counters
Dump the Ceph performance counters
Average count and sum
Ceph Monitor metrics
Ceph OSD metrics
Ceph Object Gateway metrics

Access to Ceph performance counters


Edit online
The performance counters are available through a socket interface for the Ceph Monitors and the OSDs. The socket file for each
respective daemon is located under /var/run/ceph, by default. The performance counters are grouped together into collection
names. These collection names represent a subsystem or an instance of a subsystem.
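
For example, from a node running an OSD daemon you can list the admin sockets and query one by its path instead of by daemon name. The directory layout and socket names vary between containerized and bare-metal deployments, so treat the path below as a placeholder.

Example

[ceph: root@host01 /]# ls /var/run/ceph/
[ceph: root@host01 /]# ceph daemon /var/run/ceph/ceph-osd.0.asok perf dump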

Here is the full list of the Monitor and the OSD collection name categories with a brief description for each:

Monitor Collection Name Categories

Cluster Metrics - Displays information about the storage cluster: Monitors, OSDs, Pools, and PGs

Level Database Metrics - Displays information about the back-end KeyValueStore database

Monitor Metrics - Displays general monitor information

Paxos Metrics - Displays information on cluster quorum management

Throttle Metrics - Displays the statistics on how the monitor is throttling

OSD Collection Name Categories

Write Back Throttle Metrics - Displays the statistics on how the write back throttle is tracking unflushed IO

Level Database Metrics - Displays information about the back-end KeyValueStore database

Objecter Metrics - Displays information on various object-based operations

Read and Write Operations Metrics - Displays information on various read and write operations

Recovery State Metrics - Displays latencies on various recovery states

OSD Throttle Metrics - Displays the statistics on how the OSD is throttling

RADOS Gateway Collection Name Categories

Object Gateway Client Metrics - Displays statistics on GET and PUT requests

Objecter Metrics - Displays information on various object-based operations

Object Gateway Throttle Metrics - Displays the statistics on how the Ceph Object Gateway is throttling

Display the Ceph performance counters


Edit online
The ceph daemon DAEMON_NAME perf schema command outputs the available metrics. Each metric has an associated bit field
value type.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To view the metric’s schema:

Syntax

ceph daemon DAEMON_NAME perf schema

NOTE: You must run the ceph daemon command from the node running the daemon.

2. Executing the ceph daemon DAEMON_NAME perf schema command from the Monitor node:

Example

[ceph: root@host01 /]# ceph daemon mon.host01 perf schema

3. Executing the ceph daemon DAEMON_NAME perf schema command from the OSD node:

Example

[ceph: root@host01 /]# ceph daemon osd.11 perf schema

Table 1. The bit field value definitions


Bit Meaning
1 Floating point value
2 Unsigned 64-bit integer value
4 Average (Sum + Count)
8 Counter
Each value will have bit 1 or 2 set to indicate the type, either a floating point or an integer value. When bit 4 is set, there will be two
values to read, a sum and a count. When bit 8 is set, the average for the previous interval would be the sum delta, since the previous
read, divided by the count delta. Alternatively, dividing the values outright would provide the lifetime average value. Typically these
are used to measure latencies, the number of requests and a sum of request latencies. Some bit values are combined, for example 5,
6 and 10. A bit value of 5 is a combination of bit 1 and bit 4. This means the average will be a floating point value. A bit value of 6 is a
combination of bit 2 and bit 4. This means the average value will be an integer. A bit value of 10 is a combination of bit 2 and bit 8.
This means the counter value will be an integer value.
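
As a plain shell illustration of the bit arithmetic, and not a Ceph command, the following sketch decodes a bit field value such as 5, 6, or 10 into its component flags:

value=10    # bit field value taken from the perf schema output
(( value & 1 )) && echo "bit 1: floating point value"
(( value & 2 )) && echo "bit 2: unsigned 64-bit integer value"
(( value & 4 )) && echo "bit 4: average, read a sum and a count"
(( value & 8 )) && echo "bit 8: counter"

For a value of 10, the sketch prints the bit 2 and bit 8 lines, matching the integer counter type described above.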

Reference
Edit online

Average count and sum

Dump the Ceph performance counters


Edit online
The ceph daemon .. perf dump command outputs the current values and groups the metrics under the collection name for
each subsystem.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To view the current metric data:

Syntax

ceph daemon DAEMON_NAME perf dump

NOTE: You must run the ceph daemon command from the node running the daemon.

2. Executing ceph daemon .. perf dump command from the Monitor node:

[ceph: root@host01 /]# ceph daemon mon.host01 perf dump

3. Executing the ceph daemon .. perf dump command from the OSD node:

[ceph: root@host01 /]# ceph daemon osd.11 perf dump

Reference
Edit online

To view a short description of each Monitor metric available, see the Ceph monitor metrics table.

Average count and sum


Edit online
All latency numbers have a bit field value of 5. This field contains floating point values for the average count and sum. The avgcount
is the number of operations within this range, and the sum is the total latency in seconds. Dividing the sum by the avgcount
provides an idea of the latency per operation.
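
For example, assuming the jq utility is installed where you run the command, the per-operation latency of the osd.op_latency counter can be derived from a dump as shown below. The counter path is illustrative; confirm the exact names with perf schema on your release.

Example

[ceph: root@host01 /]# ceph daemon osd.0 perf dump | \
  jq '.osd.op_latency | if .avgcount > 0 then .sum / .avgcount else 0 end'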

Reference
Edit online

To view a short description of each OSD metric available, see the Ceph OSD table.

Ceph Monitor metrics


Edit online

Cluster Metrics

Level Database Metrics

General Monitor Metrics

Paxos Metrics

Throttle Metrics

Cluster Metrics
Edit online
Table 1. Cluster Metrics Table
Collection Name Metric Name Bit Field Value Short Description
cluster num_mon 2 Number of monitors
num_mon_quorum 2 Number of monitors in quorum
num_osd 2 Total number of OSD

num_osd_up 2 Number of OSDs that are up
num_osd_in 2 Number of OSDs that are in cluster
osd_epoch 2 Current epoch of OSD map
osd_bytes 2 Total capacity of cluster in bytes
osd_bytes_used 2 Number of used bytes on cluster
osd_bytes_avail 2 Number of available bytes on cluster
num_pool 2 Number of pools
num_pg 2 Total number of placement groups
num_pg_active_clean 2 Number of placement groups in active+clean state
num_pg_active 2 Number of placement groups in active state
num_pg_peering 2 Number of placement groups in peering state
num_object 2 Total number of objects on cluster
num_object_degraded 2 Number of degraded (missing replicas) objects
num_object_misplaced 2 Number of misplaced (wrong location in the cluster) objects
num_object_unfound 2 Number of unfound objects
num_bytes 2 Total number of bytes of all objects
num_mds_up 2 Number of MDSs that are up
num_mds_in 2 Number of MDS that are in cluster
num_mds_failed 2 Number of failed MDS
mds_epoch 2 Current epoch of MDS map

Level Database Metrics


Edit online
Table 2. Level Database Metrics Table
Collection Name Metric Name Bit Field Value Short Description
leveldb leveldb_get 10 Gets
leveldb_transaction 10 Transactions
leveldb_compact 10 Compactions
leveldb_compact_range 10 Compactions by range
leveldb_compact_queue_merge 10 Mergings of ranges in compaction queue
leveldb_compact_queue_len 2 Length of compaction queue

General Monitor Metrics


Edit online
Table 3. General Monitor Metrics Table
Collection Name Metric Name Bit Field Value Short Description
mon num_sessions 2 Current number of opened monitor sessions
session_add 10 Number of created monitor sessions
session_rm 10 Number of remove_session calls in monitor
session_trim 10 Number of trimmed monitor sessions
num_elections 10 Number of elections monitor took part in
election_call 10 Number of elections started by monitor
election_win 10 Number of elections won by monitor
election_lose 10 Number of elections lost by monitor

Paxos Metrics
Edit online
Table 4. Paxos Metrics Table
Collection Name Metric Name Bit Field Value Short Description

paxos start_leader 10 Starts in leader role
start_peon 10 Starts in peon role
restart 10 Restarts
refresh 10 Refreshes
refresh_latency 5 Refresh latency
begin 10 Started and handled begins
begin_keys 6 Keys in transaction on begin
begin_bytes 6 Data in transaction on begin
begin_latency 5 Latency of begin operation
commit 10 Commits
commit_keys 6 Keys in transaction on commit
commit_bytes 6 Data in transaction on commit
commit_latency 5 Commit latency
collect 10 Peon collects
collect_keys 6 Keys in transaction on peon collect
collect_bytes 6 Data in transaction on peon collect
collect_latency 5 Peon collect latency
collect_uncommitted 10 Uncommitted values in started and handled collects
collect_timeout 10 Collect timeouts
accept_timeout 10 Accept timeouts
lease_ack_timeout 10 Lease acknowledgement timeouts
lease_timeout 10 Lease timeouts
store_state 10 Store a shared state on disk
store_state_keys 6 Keys in transaction in stored state
store_state_bytes 6 Data in transaction in stored state
store_state_latency 5 Storing state latency
share_state 10 Sharings of state
share_state_keys 6 Keys in shared state
share_state_bytes 6 Data in shared state
new_pn 10 New proposal number queries
new_pn_latency 5 New proposal number getting latency

Throttle Metrics
Edit online
Table 5. Throttle Metrics Table
Collection Name Metric Name Bit Field Value Short Description
throttle-* val 10 Currently available throttle
max 10 Max value for throttle
get 10 Gets
get_sum 10 Got data
get_or_fail_fail 10 Get blocked during get_or_fail
get_or_fail_success 10 Successful get during get_or_fail
take 10 Takes
take_sum 10 Taken data
put 10 Puts
put_sum 10 Put data
wait 5 Waiting latency

Ceph OSD metrics

Edit online

Write Back Throttle Metrics

Level Database Metrics

Objecter Metrics

Read and Write Operations Metrics

Recovery State Metrics

OSD Throttle Metrics

Write Back Throttle Metrics


Edit online
Table 1. Write Back Throttle Metrics Table
Collection Name Metric Name Bit Field Value Short Description
WBThrottle bytes_dirtied 2 Dirty data
bytes_wb 2 Written data
ios_dirtied 2 Dirty operations
ios_wb 2 Written operations
inodes_dirtied 2 Entries waiting for write
inodes_wb 2 Written entries

Level Database Metrics


Edit online
Table 2. Level Database Metrics Table
Collection Name Metric Name Bit Field Value Short Description
leveldb leveldb_get 10 Gets
leveldb_transaction 10 Transactions
leveldb_compact 10 Compactions
leveldb_compact_range 10 Compactions by range
leveldb_compact_queue_merge 10 Mergings of ranges in compaction queue
leveldb_compact_queue_len 2 Length of compaction queue

Objecter Metrics
Edit online
Table 3. Objecter Metrics Table
Collection Name Metric Name Bit Field Value Short Description
objecter op_active 2 Active operations
op_laggy 2 Laggy operations
op_send 10 Sent operations
op_send_bytes 10 Sent data
op_resend 10 Resent operations
op_ack 10 Commit callbacks
op_commit 10 Operation commits
op 10 Operation
op_r 10 Read operations
op_w 10 Write operations
op_rmw 10 Read-modify-write operations
op_pg 10 PG operation
osdop_stat 10 Stat operations
osdop_create 10 Create object operations

osdop_read 10 Read operations
osdop_write 10 Write operations
osdop_writefull 10 Write full object operations
osdop_append 10 Append operation
osdop_zero 10 Set object to zero operations
osdop_truncate 10 Truncate object operations
osdop_delete 10 Delete object operations
osdop_mapext 10 Map extent operations
osdop_sparse_read 10 Sparse read operations
osdop_clonerange 10 Clone range operations
osdop_getxattr 10 Get xattr operations
osdop_setxattr 10 Set xattr operations
osdop_cmpxattr 10 Xattr comparison operations
osdop_rmxattr 10 Remove xattr operations
osdop_resetxattrs 10 Reset xattr operations
osdop_tmap_up 10 TMAP update operations
osdop_tmap_put 10 TMAP put operations
osdop_tmap_get 10 TMAP get operations
osdop_call 10 Call (execute) operations
osdop_watch 10 Watch by object operations
osdop_notify 10 Notify about object operations
osdop_src_cmpxattr 10 Extended attribute comparison in multi operations
osdop_other 10 Other operations
linger_active 2 Active lingering operations
linger_send 10 Sent lingering operations
linger_resend 10 Resent lingering operations
linger_ping 10 Sent pings to lingering operations
poolop_active 2 Active pool operations
poolop_send 10 Sent pool operations
poolop_resend 10 Resent pool operations
poolstat_active 2 Active get pool stat operations
poolstat_send 10 Pool stat operations sent
poolstat_resend 10 Resent pool stats
statfs_active 2 Statfs operations
statfs_send 10 Sent FS stats
statfs_resend 10 Resent FS stats
command_active 2 Active commands
command_send 10 Sent commands
command_resend 10 Resent commands
map_epoch 2 OSD map epoch
map_full 10 Full OSD maps received
map_inc 10 Incremental OSD maps received
osd_sessions 2 Open sessions
osd_session_open 10 Sessions opened
osd_session_close 10 Sessions closed
osd_laggy 2 Laggy OSD sessions

Read and Write Operations Metrics


Edit online
Table 4. Read and Write Operations Metrics Table
Collection Name Metric Name Bit Field Value Short Description
osd op_wip 2 Replication operations currently being processed (primary)

op_in_bytes 10 Client operations total write size
op_out_bytes 10 Client operations total read size
op_latency 5 Latency of client operations (including queue time)
op_process_latency 5 Latency of client operations (excluding queue time)
op_r 10 Client read operations
op_r_out_bytes 10 Client data read
op_r_latency 5 Latency of read operation (including queue time)
op_r_process_latency 5 Latency of read operation (excluding queue time)
op_w 10 Client write operations
op_w_in_bytes 10 Client data written
op_w_rlat 5 Client write operation readable/applied latency
op_w_latency 5 Latency of write operation (including queue time)
op_w_process_latency 5 Latency of write operation (excluding queue time)
op_rw 10 Client read-modify-write operations
op_rw_in_bytes 10 Client read-modify-write operations write in
op_rw_out_bytes 10 Client read-modify-write operations read out
op_rw_rlat 5 Client read-modify-write operation readable/applied latency
op_rw_latency 5 Latency of read-modify-write operation (including queue time)
op_rw_process_latency 5 Latency of read-modify-write operation (excluding queue time)
subop 10 Suboperations
subop_in_bytes 10 Suboperations total size
subop_latency 5 Suboperations latency
subop_w 10 Replicated writes
subop_w_in_bytes 10 Replicated written data size
subop_w_latency 5 Replicated writes latency
subop_pull 10 Suboperations pull requests
subop_pull_latency 5 Suboperations pull latency
subop_push 10 Suboperations push messages
subop_push_in_bytes 10 Suboperations pushed size
subop_push_latency 5 Suboperations push latency
pull 10 Pull requests sent
push 10 Push messages sent
push_out_bytes 10 Pushed size
push_in 10 Inbound push messages
push_in_bytes 10 Inbound pushed size
recovery_ops 10 Started recovery operations
loadavg 2 CPU load
buffer_bytes 2 Total allocated buffer size
numpg 2 Placement groups
numpg_primary 2 Placement groups for which this osd is primary
numpg_replica 2 Placement groups for which this osd is replica
numpg_stray 2 Placement groups ready to be deleted from this osd
heartbeat_to_peers 2 Heartbeat (ping) peers we send to
heartbeat_from_peers 2 Heartbeat (ping) peers we recv from
map_messages 10 OSD map messages
map_message_epochs 10 OSD map epochs
map_message_epoch_dups 10 OSD map duplicates
stat_bytes 2 OSD size
stat_bytes_used 2 Used space
stat_bytes_avail 2 Available space
copyfrom 10 Rados copy-from operations
tier_promote 10 Tier promotions
tier_flush 10 Tier flushes

tier_flush_fail 10 Failed tier flushes
tier_try_flush 10 Tier flush attempts
tier_try_flush_fail 10 Failed tier flush attempts
tier_evict 10 Tier evictions
tier_whiteout 10 Tier whiteouts
tier_dirty 10 Dirty tier flag set
tier_clean 10 Dirty tier flag cleaned
tier_delay 10 Tier delays (agent waiting)
tier_proxy_read 10 Tier proxy reads
agent_wake 10 Tiering agent wake up
agent_skip 10 Objects skipped by agent
agent_flush 10 Tiering agent flushes
agent_evict 10 Tiering agent evictions
object_ctx_cache_hit 10 Object context cache hits
object_ctx_cache_total 10 Object context cache lookups

Recovery State Metrics


Edit online
Table 5. Recovery State Metrics Table
Collection Name Metric Name Bit Field Value Short Description
recoverystate_perf initial_latency 5 Initial recovery state latency
started_latency 5 Started recovery state latency
reset_latency 5 Reset recovery state latency
start_latency 5 Start recovery state latency
primary_latency 5 Primary recovery state latency
peering_latency 5 Peering recovery state latency
backfilling_latency 5 Backfilling recovery state latency
waitremotebackfillreserved_latency 5 Wait remote backfill reserved recovery state latency
waitlocalbackfillreserved_latency 5 Wait local backfill reserved recovery state latency
notbackfilling_latency 5 Notbackfilling recovery state latency
repnotrecovering_latency 5 Repnotrecovering recovery state latency
repwaitrecoveryreserved_latency 5 Rep wait recovery reserved recovery state latency
repwaitbackfillreserved_latency 5 Rep wait backfill reserved recovery state latency
RepRecovering_latency 5 RepRecovering recovery state latency
activating_latency 5 Activating recovery state latency
waitlocalrecoveryreserved_latency 5 Wait local recovery reserved recovery state latency
waitremoterecoveryreserved_latency 5 Wait remote recovery reserved recovery state latency
recovering_latency 5 Recovering recovery state latency
recovered_latency 5 Recovered recovery state latency
clean_latency 5 Clean recovery state latency
active_latency 5 Active recovery state latency
replicaactive_latency 5 Replicaactive recovery state latency
stray_latency 5 Stray recovery state latency
getinfo_latency 5 Getinfo recovery state latency
getlog_latency 5 Getlog recovery state latency
waitactingchange_latency 5 Waitactingchange recovery state latency

incomplete_latency 5 Incomplete recovery state latency
getmissing_latency 5 Getmissing recovery state latency
waitupthru_latency 5 Waitupthru recovery state latency

OSD Throttle Metrics


Edit online
Table 6. OSD Throttle Metrics Table
Collection Name Metric Name Bit Field Value Short Description
throttle-* val 10 Currently available throttle
max 10 Max value for throttle
get 10 Gets
get_sum 10 Got data
get_or_fail_fail 10 Get blocked during get_or_fail
get_or_fail_success 10 Successful get during get_or_fail
take 10 Takes
take_sum 10 Taken data
put 10 Puts
put_sum 10 Put data
wait 5 Waiting latency

Ceph Object Gateway metrics


Edit online

Ceph Object Gateway Client Metrics

Objecter Metrics

Ceph Object Gateway Throttle Metrics

Ceph Object Gateway Client Metrics


Edit online
Table 1. Ceph Object Gateway Client Metrics Table
Collection Name Metric Name Bit Field Value Short Description
client.rgw.<rgw_node_name> req 10 Requests
failed_req 10 Aborted requests
get 10 Gets
get_b 10 Size of gets
get_initial_lat 5 Get latency
put 10 Puts
put_b 10 Size of puts
put_initial_lat 5 Put latency
qlen 2 Queue length
qactive 2 Active requests queue
cache_hit 10 Cache hits
cache_miss 10 Cache miss
keystone_token_cache_hit 10 Keystone token cache hits
keystone_token_cache_miss 10 Keystone token cache miss

Objecter Metrics
Edit online
Table 2. Objecter Metrics Table
Collection Name Metric Name Bit Field Value Short Description
objecter op_active 2 Active operations
op_laggy 2 Laggy operations
op_send 10 Sent operations
op_send_bytes 10 Sent data
op_resend 10 Resent operations
op_ack 10 Commit callbacks
op_commit 10 Operation commits
op 10 Operation
op_r 10 Read operations
op_w 10 Write operations
op_rmw 10 Read-modify-write operations
op_pg 10 PG operation
osdop_stat 10 Stat operations
osdop_create 10 Create object operations
osdop_read 10 Read operations
osdop_write 10 Write operations
osdop_writefull 10 Write full object operations
osdop_append 10 Append operation
osdop_zero 10 Set object to zero operations
osdop_truncate 10 Truncate object operations
osdop_delete 10 Delete object operations
osdop_mapext 10 Map extent operations
osdop_sparse_read 10 Sparse read operations
osdop_clonerange 10 Clone range operations
osdop_getxattr 10 Get xattr operations
osdop_setxattr 10 Set xattr operations
osdop_cmpxattr 10 Xattr comparison operations
osdop_rmxattr 10 Remove xattr operations
osdop_resetxattrs 10 Reset xattr operations
osdop_tmap_up 10 TMAP update operations
osdop_tmap_put 10 TMAP put operations
osdop_tmap_get 10 TMAP get operations
osdop_call 10 Call (execute) operations
osdop_watch 10 Watch by object operations
osdop_notify 10 Notify about object operations
osdop_src_cmpxattr 10 Extended attribute comparison in multi operations
osdop_other 10 Other operations
linger_active 2 Active lingering operations
linger_send 10 Sent lingering operations
linger_resend 10 Resent lingering operations
linger_ping 10 Sent pings to lingering operations
poolop_active 2 Active pool operations
poolop_send 10 Sent pool operations
poolop_resend 10 Resent pool operations
poolstat_active 2 Active get pool stat operations
poolstat_send 10 Pool stat operations sent
poolstat_resend 10 Resent pool stats
statfs_active 2 Statfs operations
statfs_send 10 Sent FS stats
statfs_resend 10 Resent FS stats
command_active 2 Active commands

command_send 10 Sent commands
command_resend 10 Resent commands
map_epoch 2 OSD map epoch
map_full 10 Full OSD maps received
map_inc 10 Incremental OSD maps received
osd_sessions 2 Open sessions
osd_session_open 10 Sessions opened
osd_session_close 10 Sessions closed
osd_laggy 2 Laggy OSD sessions

Ceph Object Gateway Throttle Metrics


Edit online
Table 3. Ceph Object Gateway Throttle Metrics Table
Collection Name Metric Name Bit Field Value Short Description
throttle-* val 10 Currently available throttle
max 10 Max value for throttle
get 10 Gets
get_sum 10 Got data
get_or_fail_fail 10 Get blocked during get_or_fail
get_or_fail_success 10 Successful get during get_or_fail
take 10 Takes
take_sum 10 Taken data
put 10 Puts
put_sum 10 Put data
wait 5 Waiting latency

BlueStore
Edit online
BlueStore is the back-end object store for the OSD daemons and puts objects directly on the block device.

IMPORTANT: BlueStore provides a high-performance backend for OSD daemons in a production environment. By default, BlueStore
is configured to be self-tuning. If you determine that your environment performs better with BlueStore tuned manually, please
contact IBM support and share the details of your configuration to help us improve the auto-tuning capability. IBM looks forward to
your feedback and appreciates your recommendations.

Ceph BlueStore
Ceph BlueStore devices
Ceph BlueStore caching
Sizing considerations for Ceph BlueStore
Tuning Ceph BlueStore using bluestore_min_alloc_size parameter
Resharding the RocksDB database using the BlueStore admin tool
The BlueStore fragmentation tool
Ceph BlueStore BlueFS

Ceph BlueStore
Edit online
The following are some of the main features of using BlueStore:

Direct management of storage devices

BlueStore consumes raw block devices or partitions. This avoids any intervening layers of abstraction, such as local file systems like
XFS, that might limit performance or add complexity.

Metadata management with RocksDB

BlueStore uses the RocksDB key-value database to manage internal metadata, such as the mapping from object names to block
locations on a disk.

Full data and metadata checksumming

By default all data and metadata written to BlueStore is protected by one or more checksums. No data or metadata are read from
disk or returned to the user without verification.

Efficient copy-on-write

The Ceph Block Device and Ceph File System snapshots rely on a copy-on-write clone mechanism that is implemented efficiently in
BlueStore. This results in efficient I/O both for regular snapshots and for erasure coded pools which rely on cloning to implement
efficient two-phase commits.

No large double-writes

BlueStore first writes any new data to unallocated space on a block device, and then commits a RocksDB transaction that updates
the object metadata to reference the new region of the disk. Only when the write operation is below a configurable size threshold, it
falls back to a write-ahead journaling scheme.

Multi-device support

BlueStore can use multiple block devices for storing different data. For example: Hard Disk Drive (HDD) for the data, Solid-state Drive
(SSD) for metadata, Non-volatile Memory (NVM) or Non-volatile random-access memory (NVRAM) or persistent memory for the
RocksDB write-ahead log (WAL). See Ceph BlueStore devices for details.

Efficient block device usage

Because BlueStore does not use any file system, it minimizes the need to clear the storage device cache.

Ceph BlueStore devices


Edit online
BlueStore manages either one, two, or three storage devices in the backend.

Primary

WAL

DB

In the simplest case, BlueStore consumes a single primary storage device. The storage device is partitioned into two parts that
contain:

OSD metadata: A small partition formatted with XFS that contains basic metadata for the OSD. This data directory includes
information about the OSD, such as its identifier, which cluster it belongs to, and its private keyring.

Data: A large partition occupying the rest of the device that is managed directly by BlueStore and that contains all of the OSD
data. This primary device is identified by a block symbolic link in the data directory.

You can also use two additional devices:

A WAL (write-ahead-log) device: A device that stores BlueStore internal journal or write-ahead log. It is identified by the
block.wal symbolic link in the data directory. Consider using a WAL device only if the device is faster than the primary
device. For example, when the WAL device uses an SSD disk and the primary device uses an HDD disk.

A DB device: A device that stores BlueStore internal metadata. The embedded RocksDB database puts as much metadata as
it can on the DB device instead of on the primary device to improve performance. If the DB device is full, it starts adding
metadata to the primary device. Consider using a DB device only if the device is faster than the primary device.

WARNING: If you have less than a gigabyte of fast storage available, IBM recommends using it as a WAL device. If you
have more fast storage available, consider using it as a DB device. The BlueStore journal is always placed on the fastest device, so
using a DB device provides the same benefit that the WAL device provides, while also allowing additional metadata to be stored.
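
For example, an OSD that keeps its data on an HDD and places block.db and block.wal on logical volumes carved from a faster device could be prepared as shown below. The device path and volume names are placeholders; see Preparing Ceph OSDs using ceph-volume for the full workflow.

Example

[root@osd ~]# ceph-volume lvm prepare --bluestore --data /dev/sdb --block.db vg-nvme/db-lv0 --block.wal vg-nvme/wal-lv0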

Ceph BlueStore caching


Edit online
The BlueStore cache is a collection of buffers that, depending on configuration, can be populated with data as the OSD daemon does
reading from or writing to the disk. By default in IBM Storage Ceph, BlueStore caches on reads, but not writes. This is because the
bluestore_default_buffered_write option is set to false to avoid potential overhead associated with cache eviction.

If the bluestore_default_buffered_write option is set to true, data is written to the buffer first, and then committed to disk.
Afterwards, a write acknowledgement is sent to the client, allowing subsequent reads faster access to the data already in cache,
until that data is evicted.

Read-heavy workloads will not see an immediate benefit from BlueStore caching. As more reading is done, the cache will grow over
time and subsequent reads will see an improvement in performance. How fast the cache populates depends on the BlueStore block
and database disk type, and the client’s workload requirements.

IMPORTANT: Please contact IBM Support before enabling the bluestore_default_buffered_write option.
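
To check the current value before discussing a change with IBM Support, query the configuration database, which returns false unless the option has been changed:

Example

[ceph: root@host01 /]# ceph config get osd bluestore_default_buffered_write
false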

Sizing considerations for Ceph BlueStore


Edit online
When mixing traditional and solid state drives using BlueStore OSDs, it is important to size the RocksDB logical volume (block.db)
appropriately. IBM recommends that the RocksDB logical volume be no less than 4% of the block size with object, file and mixed
workloads. IBM supports 1% of the BlueStore block size with RocksDB and OpenStack block workloads. For example, if the block
size is 1 TB for an object workload, then at a minimum, create a 40 GB RocksDB logical volume.

When not mixing drive types, there is no requirement to have a separate RocksDB logical volume. BlueStore will automatically
manage the sizing of RocksDB.

BlueStore’s cache memory is used for the key-value pair metadata for RocksDB, BlueStore metadata, and object data.

NOTE: The BlueStore cache memory values are in addition to the memory footprint already being consumed by the OSD.
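
For example, the 40 GB RocksDB logical volume from the 1 TB example above could be created on a shared solid state volume group and later passed to ceph-volume with the --block.db option when preparing the OSD. The volume group and logical volume names are placeholders.

Example

[root@osd ~]# lvcreate -L 40G -n db-lv0 vg-ssd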

Tuning Ceph BlueStore using bluestore_min_alloc_size


parameter
Edit online
This procedure is for new or freshly deployed OSDs.

In BlueStore, the raw partition is allocated and managed in chunks of bluestore_min_alloc_size. By default,
bluestore_min_alloc_size is 4096, equivalent to 4 KiB for HDDs and SSDs. The unwritten area in each chunk is filled with
zeroes when it is written to the raw partition. This can lead to wasted unused space when not properly sized for your workload, for
example when writing small objects.

It is best practice to set bluestore_min_alloc_size to match the smallest write so this write amplification penalty can be
avoided.

IMPORTANT: Changing the value of bluestore_min_alloc_size is not recommended. For any assistance, contact IBM support.

NOTE: The settings bluestore_min_alloc_size_ssd and bluestore_min_alloc_size_hdd are specific to SSDs and HDDs,
respectively, but setting them is not necessary because setting bluestore_min_alloc_size overrides them.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ceph monitors and managers are deployed in the cluster.

Servers or nodes that can be freshly provisioned as OSD nodes

The admin keyring for the Ceph Monitor node, if you are redeploying an existing Ceph OSD node.

Procedure
Edit online

1. On the bootstrapped node, change the value of bluestore_min_alloc_size parameter:

Syntax

ceph config set osd.OSD_ID bluestore_min_alloc_size_DEVICE_NAME VALUE

Example

[ceph: root@host01 /]# ceph config set osd.4 bluestore_min_alloc_size_hdd 8192

You can see bluestore_min_alloc_size is set to 8192 bytes, which is equivalent to 8 KiB.

NOTE: The selected values should be power of 2 aligned.

2. Restart the OSD’s service.

Syntax

systemctl restart SERVICE_ID

Example

[ceph: root@host01 /]# systemctl restart ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.4.service

Verification
Edit online

Verify the setting using the ceph daemon command:

Syntax

ceph daemon osd.OSD_ID config get bluestore_min_alloc_size_DEVICE_NAME

Example

[ceph: root@host01 /]# ceph daemon osd.4 config get bluestore_min_alloc_size_hdd

ceph daemon osd.4 config get bluestore_min_alloc_size


{
"bluestore_min_alloc_size": "8192"
}

Reference
Edit online

For OSD removal and addition, see Management of OSDs using the Ceph Orchestrator.

NOTE: For already deployed OSDs, you cannot modify the bluestore_min_alloc_size parameter so you have to remove
the OSDs and freshly deploy them again.

Resharding the RocksDB database using the BlueStore admin tool
Edit online
You can reshard the database with the BlueStore admin tool. It transforms BlueStore’s RocksDB database from one shape to another,
splitting it into several column families, without redeploying the OSDs. Column families have the same features as the whole database,
but allow users to operate on smaller data sets and apply different options. Resharding leverages the different expected lifetimes of
the stored keys. The keys are moved during the transformation without creating new keys or deleting existing keys.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

The object store configured as BlueStore.

OSD nodes deployed on the hosts.

Root-level access to all the hosts.

The ceph-common and cephadm packages installed on all the hosts.

Procedure
Edit online

1. Log into the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Fetch the OSD_ID and the host details from the administration node:

Example

[ceph: root@host01 /]# ceph orch ps

3. Log into the respective host as a root user and stop the OSD:

Syntax

cephadm unit --name OSD_ID stop

Example

[root@host02 ~]# cephadm unit --name osd.0 stop

4. Enter into the stopped OSD daemon container:

Syntax

cephadm shell --name OSD_ID

Example

[root@host02 ~]# cephadm shell --name osd.0

5. Log into the cephadm shell and check the file system consistency:

Syntax

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-OSD_ID/ fsck

Example

[ceph: root@host02 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0/ fsck

fsck success

6. Check the sharding status of the OSD node:

Syntax

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-OSD_ID/ show-sharding

Example

[ceph: root@host02 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-6/ show-sharding

m(3) p(3,0-12) O(3,0-13) L P

7. Run the ceph-bluestore-tool command to reshard. IBM recommends using the parameters given in the command:

Syntax

ceph-bluestore-tool --log-level 10 -l log.txt --path /var/lib/ceph/osd/ceph-OSD_ID/ --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" reshard

Example

[ceph: root@host02 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-6/ --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" reshard

reshard success

8. To check the sharding status of the OSD node, run the show-sharding command:

Syntax

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-OSD_ID/ show-sharding

Example

[ceph: root@host02 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-6/ show-sharding

m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P

9. Exit from the cephadm shell:

[ceph: root@host02 /]# exit

10. Log into the respective host as a root user and start the OSD:

Syntax

cephadm unit --name OSD_ID start

Example

[root@host02 ~]# cephadm unit --name osd.0 start

Reference
Edit online

For more information, see Installing.

The BlueStore fragmentation tool


Edit online
As a storage administrator, you will want to periodically check the fragmentation level of your BlueStore OSDs. You can check
fragmentation levels with one simple command for offline or online OSDs.

What is the BlueStore fragmentation tool?


Checking for fragmentation

What is the BlueStore fragmentation tool?
Edit online
For BlueStore OSDs, the free space gets fragmented over time on the underlying storage device. Some fragmentation is normal, but
excessive fragmentation causes poor performance.

The BlueStore fragmentation tool generates a score on the fragmentation level of the BlueStore OSD. This fragmentation score is
given as a range, 0 through 1. A score of 0 means no fragmentation, and a score of 1 means severe fragmentation.

Table 1. Fragmentation scores' meaning


Score Fragmentation Amount
0.0 - 0.4 None to tiny fragmentation.
0.4 - 0.7 Small and acceptable fragmentation.
0.7 - 0.9 Considerable, but safe fragmentation.
0.9 - 1.0 Severe fragmentation and that causes performance issues.
IMPORTANT: If you have severe fragmentation, and need some help in resolving the issue, contact IBM Support.

Checking for fragmentation


Edit online
Checking the fragmentation level of BlueStore OSDs can be done either online or offline.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

BlueStore OSDs.

Online BlueStore fragmentation score

1. Inspect a running BlueStore OSD process:

a. Simple report:

Syntax

ceph daemon OSD_ID bluestore allocator score block

Example

[ceph: root@host01 /]# ceph daemon osd.123 bluestore allocator score block

b. A more detailed report:

Syntax

ceph daemon OSD_ID bluestore allocator dump block

Example

[ceph: root@host01 /]# ceph daemon osd.123 bluestore allocator dump block
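
To spot-check the online score for every OSD daemon running on the current host, a small loop over the local admin sockets can be used. The socket path below is a placeholder and varies between containerized and bare-metal deployments.

for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo -n "$(basename "$sock" .asok): "
    ceph daemon "$sock" bluestore allocator score block
done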

Offline BlueStore fragmentation score

1. Follow the steps in Resharding the RocksDB database using the BlueStore admin tool before checking the offline fragmentation score:

Example

[root@host01 ~]# podman exec -it 7fbd6c6293c0 /bin/bash

2. Inspect a non-running BlueStore OSD process:

a. Simple report:

Syntax

ceph-bluestore-tool --path PATH_TO_OSD_DATA_DIRECTORY --allocator block free-score

Example

[root@7fbd6c6293c0 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score

block:
{
"fragmentation_rating": 0.018290238194701977
}

b. A more detailed report:

Syntax

ceph-bluestore-tool --path PATH_TO_OSD_DATA_DIRECTORY --allocator block free-dump

Example

[root@7fbd6c6293c0 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump
block:
{
"capacity": 21470642176,
"alloc_unit": 4096,
"alloc_type": "hybrid",
"alloc_name": "block",
"extents": [
{
"offset": "0x370000",
"length": "0x20000"
},
{
"offset": "0x3a0000",
"length": "0x10000"
},
{
"offset": "0x3f0000",
"length": "0x20000"
},
{
"offset": "0x460000",
"length": "0x10000"
},

Reference
Edit online

See the BlueStore Fragmentation Tool for details on the fragmentation score.

See Resharding the RocksDB database using the BlueStore admin tool for details on resharding.

Ceph BlueStore BlueFS


Edit online
The BlueStore block database stores metadata as key-value pairs in a RocksDB database. The block database resides on a small BlueFS
partition on the storage device. BlueFS is a minimal file system that is designed to hold the RocksDB files.

Viewing the bluefs_buffered_io setting


Viewing Ceph BlueFS statistics for Ceph OSDs

BlueFS files
Edit online
There are three types of files that RocksDB produces.

Control files, for example CURRENT, IDENTITY, and MANIFEST-00011.

Database (DB) table files, for example 004112.sst.

Write ahead logs (WAL), for example 00038.log.

There is also an internal, hidden file, ino 1, that serves as the BlueFS replay log and acts as the directory structure, file mapping, and
operations log.

Fallback hierarchy
Edit online
With BlueFS it is possible to put any file on any device. Parts of a file can even reside on different devices, that is, on WAL, DB, and SLOW.
There is an order to where BlueFS puts files. A file is put on secondary storage only when the primary storage is exhausted, and on
tertiary storage only when the secondary storage is exhausted.

The order for the specific files is as follows, for each device type.

Write ahead logs: WAL, DB, SLOW

Replay log ino 1: DB, SLOW

Control and DB files: DB, SLOW

Control and DB file order when running out of space: SLOW

IMPORTANT: There is an exception to the control and DB file order. When RocksDB detects that the DB device is running out of space,
it directs the file to be placed on the SLOW device.

Viewing the bluefs_buffered_io setting

Edit online
As a storage administrator, you can view the current setting for the bluefs_buffered_io parameter.

The option bluefs_buffered_io is set to True by default for IBM Storage Ceph. This option enables BlueFS to perform buffered
reads in some cases, and enables the kernel page cache to act as a secondary cache for reads like RocksDB block reads.

IMPORTANT: Changing the value of bluefs_buffered_io is not recommended. Before changing the bluefs_buffered_io
parameter, contact your IBM Support account team.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

Log into the Cephadm shell, using the cephadm shell command.

Procedure
Edit online
View the current value of the bluefs_buffered_io parameter, using one of the following procedures.

View the value stored in the configuration database.



Syntax

ceph config get osd bluefs_buffered_io

Example

[ceph: root@host01 /]# ceph config get osd bluefs_buffered_io

View the running value for an OSD where the running value is different from the value stored in the configuration database.

Syntax

ceph config show OSD_ID bluefs_buffered_io

Example

[ceph: root@host01 /]# ceph config show osd.3 bluefs_buffered_io
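
To compare the stored and running values across every OSD at once, the following is a minimal sketch (not part of the documented
procedure) that loops over the OSD IDs reported by the ceph osd ls command:

# Print the running bluefs_buffered_io value for each OSD in the cluster.
# Run from inside the Cephadm shell on a node that has the admin keyring.
for osd in $(ceph osd ls); do
    echo -n "osd.${osd}: "
    ceph config show "osd.${osd}" bluefs_buffered_io
done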

Viewing Ceph BlueFS statistics for Ceph OSDs


Edit online
View the BlueFS-related information about collocated and non-collocated Ceph OSDs with the bluefs stats command.

For more information about BlueStore devices, see BlueStore devices.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the OSD node.

The object store configured as BlueStore.

Log into the Cephadm shell, using the cephadm shell command.

Procedure
Edit online
View the BlueStore OSD statistics:

Syntax

ceph daemon osd.OSD_ID bluefs stats

Example for collocated OSDs

[ceph: root@host01 /]# ceph daemon osd.1 bluefs stats

1 : device size 0x3bfc00000 : using 0x1a428000(420 MiB)


wal_total:0, db_total:15296836403, slow_total:0

Example for non-collocated OSDs

[ceph: root@host01 /]# ceph daemon osd.1 bluefs stats

0 :
1 : device size 0x1dfbfe000 : using 0x1100000(17 MiB)
2 : device size 0x27fc00000 : using 0x248000(2.3 MiB)
RocksDBBlueFSVolumeSelector: wal_total:0, db_total:7646425907, slow_total:10196562739,
db_avail:935539507
Usage matrix:
DEV/LEV WAL DB SLOW * * REAL FILES
LOG 0 B 4 MiB 0 B 0 B 0 B 756 KiB 1
WAL 0 B 4 MiB 0 B 0 B 0 B 3.3 MiB 1
DB 0 B 9 MiB 0 B 0 B 0 B 76 KiB 10
SLOW 0 B 0 B 0 B 0 B 0 B 0 B 0
TOTALS 0 B 17 MiB 0 B 0 B 0 B 0 B 12
MAXIMUMS:



LOG 0 B 4 MiB 0 B 0 B 0 B 756 KiB
WAL 0 B 4 MiB 0 B 0 B 0 B 3.3 MiB
DB 0 B 11 MiB 0 B 0 B 0 B 112 KiB
SLOW 0 B 0 B 0 B 0 B 0 B 0 B
TOTALS 0 B 17 MiB 0 B 0 B 0 B 0 B

In this example,

0 refers to the dedicated WAL device, which is block.wal.

1 refers to the dedicated DB device, which is block.db.

2 refers to the main block device, which is block or slow.

device size represents an actual size of the device.

using represents the total usage. It is not restricted to BlueFS.

IMPORTANT: DB and WAL devices are used only by BlueFS. For a main device, usage from stored BlueStore data is also
included. In this example, 2.3 MiB is the data from BlueStore.

wal_total, db_total, and slow_total are values that reiterate the device values previously stated.

db_avail represents how many bytes can be taken from the SLOW device, if necessary.

Usage matrix:

Rows WAL, DB, and SLOW describe where the specific file was intended to be put.

Row LOG describes the BlueFS replay log ino 1.

Columns WAL, DB, and SLOW describe where data is actually put. The values are in allocation units. WAL and DB
have bigger allocation units for performance reasons.

Columns * relate to the virtual devices new-db and new-wal that are used by the ceph-bluestore-tool. They should
always show 0 B.

Column REAL shows the actual usage in bytes.

Column FILES shows the count of files.

MAXIMUMS. This table captures the maximum value of each entry from the usage matrix.

Cephadm troubleshooting
Edit online
As a storage administrator, you can troubleshoot the IBM Storage Ceph cluster. Sometimes there is a need to investigate why a
Cephadm command failed or why a specific service does not run properly.

Pause or disable cephadm


Per service and per daemon event
Check cephadm logs
Gather log files
Collect systemd status
List all downloaded container images
Manually run containers
CIDR network error
Access the admin socket
Manually deploying a mgr daemon

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Pause or disable cephadm
Edit online
If Cephadm does not behave as expected, you can pause most of the background activity with the following command:

Example

[ceph: root@host01 /]# ceph orch pause

This stops any changes, but Cephadm periodically checks hosts to refresh its inventory of daemons and devices.

If you want to disable Cephadm completely, run the following commands:

Example

[ceph: root@host01 /]# ceph orch set backend ''


[ceph: root@host01 /]# ceph mgr module disable cephadm

Note that previously deployed daemon containers continue to exist and start as they did before.

To re-enable Cephadm in the cluster, run the following commands:

Example

[ceph: root@host01 /]# ceph mgr module enable cephadm


[ceph: root@host01 /]# ceph orch set backend cephadm

Per service and per daemon event


Edit online
Cephadm stores events per service and per daemon in order to aid in debugging failed daemon deployments. These events often
contain relevant information:

Per service

Syntax

ceph orch ls --service_name SERVICE_NAME --format yaml

Example

[ceph: root@host01 /]# ceph orch ls --service_name alertmanager --format yaml


service_type: alertmanager
service_name: alertmanager
placement:
hosts:
- unknown_host
status:
...
running: 1
size: 1
events:
- 2021-02-01T08:58:02.741162 service:alertmanager [INFO] "service was created"
- '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot
place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"'

Per daemon

Syntax

ceph orch ps --service-name SERVICE_NAME --daemon-id DAEMON_ID --format yaml

Example

[ceph: root@host01 /]# ceph orch ps --service-name mds --daemon-id cephfs.hostname.ppdhsz --format
yaml
daemon_type: mds



daemon_id: cephfs.hostname.ppdhsz
hostname: hostname
status_desc: running
...
events:
- 2021-02-01T08:59:43.845866 daemon:mds.cephfs.hostname.ppdhsz [INFO] "Reconfigured
mds.cephfs.hostname.ppdhsz on host 'hostname'"

Check cephadm logs


Edit online
You can monitor the Cephadm log in real time with the following command:

Example

[ceph: root@host01 /]# ceph -W cephadm

You can see the last few messages with the following command:

Example

[ceph: root@host01 /]# ceph log last cephadm

If you have enabled logging to files, you can see a Cephadm log file called ceph.cephadm.log on the monitor hosts.

Gather log files


Edit online
You can use the journalctl command, to gather the log files for all the daemons.

NOTE: You have to run all these commands outside the cephadm shell.

NOTE: By default, Cephadm stores logs in journald which means that daemon logs are no longer available in /var/log/ceph.

To read the log file of a specific daemon, run the following command:

Syntax

cephadm logs --name DAEMON_NAME

Example

[root@host01 ~]# cephadm logs --name cephfs.hostname.ppdhsz

NOTE: This command works when run on the same hosts where the daemon is running.

Read the log file of a specific daemon running on a different host:

Syntax

cephadm logs --fsid FSID --name DAEMON_NAME

Example

[root@host01 ~]# cephadm logs --fsid 2d2fd136-6df1-11ea-ae74-002590e526e8 --name


cephfs.hostname.ppdhsz

where fsid is the cluster ID provided by the ceph status command.

Fetch all log files of all the daemons on a given host:

Syntax

for name in $(cephadm ls | python3 -c "import sys, json; [print(i['name']) for i in json.load(sys.stdin)]") ; do cephadm logs --fsid FSID_OF_CLUSTER --name "$name" > $name; done



Example

[root@host01 ~]# for name in $(cephadm ls | python3 -c "import sys, json; [print(i['name'])
for i in json.load(sys.stdin)]") ; do cephadm logs --fsid 57bddb48-ee04-11eb-9962-001a4a000672
--name "$name" > $name; done

Collect systemd status


Edit online
To print the state of a systemd unit, run the following command:

Syntax

systemctl status ceph-CLUSTER_FSID@DAEMON_NAME.service

List all downloaded container images


Edit online
To list all the container images that are downloaded on a host, run the following command:

Example

[ceph: root@host01 /]# podman ps -a --format json | jq '.[].Image'


"docker.io/library/rhel8"
"cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest"

Manually run containers


Edit online
Cephadm writes small wrappers that run a container. Refer to /var/lib/ceph/CLUSTER_FSID/SERVICE_NAME/unit.run for the
container execution command.
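
For example, assuming the standard Cephadm data layout, a hypothetical cluster FSID, and a hypothetical daemon name, the
following sketch prints the wrapper that Cephadm uses to start the container:

[root@host01 ~]# cat /var/lib/ceph/57bddb48-ee04-11eb-9962-001a4a000672/mon.host01/unit.run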

Analysing SSH errors

If you get the following error:

Example

execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-73z09u6g -i /tmp/cephadm-identity-


ky7ahp_5 [email protected]

...

raise OrchestratorError(msg) from e


orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2).

Please make sure that the host is reachable and accepts connections using the cephadm SSH key

Try the following options to troubleshoot the issue:

To ensure Cephadm has a SSH identity key, run the following command:

Example

[ceph: root@host01 /]# ceph config-key get mgr/cephadm/ssh_identity_key >


~/cephadm_private_key
INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 obtained
'mgr/cephadm/ssh_identity_key'
[root@mon1 ~] # chmod 0600 ~/cephadm_private_key

If the above command fails, Cephadm does not have a key. To generate an SSH key, run the following command:

Example

[ceph: root@host01 /]# ceph cephadm generate-key

Or

Example

[ceph: root@host01 /]# cat ~/cephadm_private_key | ceph cephadm set-ssh-key -i -

To ensure that the SSH configuration is correct, run the following command:

Example

[ceph: root@host01 /]# ceph cephadm get-ssh-config

To verify the connection to the host, run the following command:

Example

[ceph: root@host01 /]# ssh -F config -i ~/cephadm_private_key root@host01

Verify public key is in authorized_keys.

To verify that the public key is in the authorized_keys file, run the following commands:

Example

[ceph: root@host01 /]# ceph cephadm get-pub-key


[ceph: root@host01 /]# grep "`cat ~/ceph.pub`" /root/.ssh/authorized_keys

CIDR network error


Edit online
Classless inter-domain routing (CIDR), also known as supernetting, is a method of assigning Internet Protocol (IP) addresses that
improves the efficiency of address distribution and replaces the previous system based on Class A, Class B, and Class C networks.
If you see one of the following errors in the Cephadm log entries:

ERROR: Failed to infer CIDR network for mon ip ***; pass --skip-mon-network to configure it later

Or

Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP

You need to run the following command:

Example

[ceph: root@host01 /]# ceph config set host public_network hostnetwork
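
For example, the following is a minimal sketch, with a hypothetical subnet, that sets the public network globally so that Cephadm
can infer the CIDR network:

Example

[ceph: root@host01 /]# ceph config set global public_network 10.10.128.0/24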

Access the admin socket


Edit online
Each Ceph daemon provides an admin socket that bypasses the MONs.

To access the admin socket, enter the daemon container on the host:

Example

[ceph: root@host01 /]# cephadm enter --name cephfs.hostname.ppdhsz


[ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-cephfs.hostname.ppdhsz.asok config show
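
Any admin socket command can be issued in the same way. For example, after entering an OSD daemon's container (hypothetical
daemon osd.0), the following sketch lists the available admin socket commands and dumps the daemon's performance counters:

[ceph: root@host01 /]# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok help
[ceph: root@host01 /]# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump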

Manually deploying a mgr daemon


Edit online
Cephadm requires a mgr daemon in order to manage the IBM Storage Ceph cluster. In case the last mgr daemon of an IBM Storage
Ceph cluster was removed, you can manually deploy a mgr daemon, on a random host of the IBM Storage Ceph cluster.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Disable the Cephadm scheduler to prevent Cephadm from removing the new MGR daemon, with the following command:

Example

[ceph: root@host01 /]# ceph config-key set mgr/cephadm/pause true

3. Get or create the auth entry for the new MGR daemon:

Example

[ceph: root@host01 /]# ceph auth get-or-create mgr.host01.smfvfd1 mon "profile mgr" osd "allow
*" mds "allow *"
[mgr.host01.smfvfd1]
key = AQDhcORgW8toCRAAlMzlqWXnh3cGRjqYEa9ikw==

4. Generate a minimal ceph.conf file:

Example

[ceph: root@host01 /]# ceph config generate-minimal-conf


# minimal ceph.conf for 8c9b0072-67ca-11eb-af06-001a4a0002a0
[global]
fsid = 8c9b0072-67ca-11eb-af06-001a4a0002a0
mon_host = [v2:10.10.200.10:3300/0,v1:10.10.200.10:6789/0]
[v2:10.10.10.100:3300/0,v1:10.10.200.100:6789/0]

5. Get the container image:

Example

[ceph: root@host01 /]# ceph config get "mgr.host01.smfvfd1" container_image

6. Create a config-json.json file and add the following:

NOTE: Use the values from the output of the ceph config generate-minimal-conf command.

Example

{
  "config": "# minimal ceph.conf for 8c9b0072-67ca-11eb-af06-001a4a0002a0\n[global]\n\tfsid = 8c9b0072-67ca-11eb-af06-001a4a0002a0\n\tmon_host = [v2:10.10.200.10:3300/0,v1:10.10.200.10:6789/0] [v2:10.10.10.100:3300/0,v1:10.10.200.100:6789/0]\n",
  "keyring": "[mgr.host01.smfvfd1]\n\tkey = AQDhcORgW8toCRAAlMzlqWXnh3cGRjqYEa9ikw==\n"
}



7. Exit from the Cephadm shell:

Example

[ceph: root@host01 /]# exit

8. Deploy the MGR daemon:

Example

[root@host01 ~]# cephadm --image cp.icr.io/cp/ibm-ceph/ceph-5-rhel8:latest deploy --fsid


8c9b0072-67ca-11eb-af06-001a4a0002a0 --name mgr.host01.smfvfd1 --config-json config-json.json

Verification
Edit online

In the Cephadm shell, run the following command:

Example

[ceph: root@host01 /]# ceph -s

You can see a new mgr daemon has been added.
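
If you paused the Cephadm scheduler in step 2, resume it once the new mgr daemon is healthy. The following is a minimal sketch,
assuming the orchestrator API is reachable again through the new mgr daemon:

[ceph: root@host01 /]# ceph orch resume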

Cephadm operations
Edit online
As a storage administrator, you can carry out Cephadm operations in the IBM Storage Ceph cluster.

Monitor cephadm log messages


Ceph daemon logs
Data location
Cephadm health checks

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Monitor cephadm log messages


Edit online
Cephadm logs to the cephadm cluster log channel so you can monitor progress in real time.

To monitor progress in real time, run the following command:

Example

[ceph: root@host01 /]# ceph -W cephadm

Example

2022-11-02T17:51:36.335728+0000 mgr.Ceph5-1.nqikfh [INF] refreshing Ceph5-adm facts


2022-11-02T17:51:37.170982+0000 mgr.Ceph5-1.nqikfh [INF] deploying 1 monitor(s) instead of 2
so monitors may achieve consensus
2022-11-02T17:51:37.173487+0000 mgr.Ceph5-1.nqikfh [ERR] It is NOT safe to stop ['mon.Ceph5-
adm']: not enough monitors would be available (Ceph5-2) after stopping mons [Ceph5-adm]
2022-11-02T17:51:37.179658+0000 mgr.Ceph5-1.nqikfh [INF] Found osd claims -> {}
2022-11-02T17:51:37.180116+0000 mgr.Ceph5-1.nqikfh [INF] Found osd claims for drivegroup all-
available-devices -> {}
2022-11-02T17:51:37.182138+0000 mgr.Ceph5-1.nqikfh [INF] Applying all-available-devices on



host Ceph5-adm...
2022-11-02T17:51:37.182987+0000 mgr.Ceph5-1.nqikfh [INF] Applying all-available-devices on
host Ceph5-1...
2022-11-02T17:51:37.183395+0000 mgr.Ceph5-1.nqikfh [INF] Applying all-available-devices on
host Ceph5-2...
2022-11-02T17:51:43.373570+0000 mgr.Ceph5-1.nqikfh [INF] Reconfiguring node-exporter.Ceph5-1
(unknown last config time)...
2022-11-02T17:51:43.373840+0000 mgr.Ceph5-1.nqikfh [INF] Reconfiguring daemon node-
exporter.Ceph5-1 on Ceph5-1

By default, the log displays info-level events and above. To see the debug-level messages, run the following commands:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/log_to_cluster_level debug


[ceph: root@host01 /]# ceph -W cephadm --watch-debug
[ceph: root@host01 /]# ceph -W cephadm --verbose

Return debugging level to default info:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/log_to_cluster_level info

To see the recent events, run the following command:

Example

[ceph: root@host01 /]# ceph log last cephadm

These events are also logged to the ceph.cephadm.log file on the monitor hosts and to the monitor daemon's stderr.

Ceph daemon logs


Edit online
You can view the Ceph daemon logs through stderr or files.

Logging to stdout

Traditionally, Ceph daemons have logged to /var/log/ceph. By default, Cephadm daemons log to stderr and the logs are
captured by the container runtime environment. For most systems, by default, these logs are sent to journald and accessible
through the journalctl command.

For example, to view the logs for the daemon on host01 for a storage cluster with ID 5c5a50ae-272a-455d-99e9-
32c6a013e694:

Example

[ceph: root@host01 /]# journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@host01

This works well for normal Cephadm operations when logging levels are low.

To disable logging to stderr, set the following values:

Example

[ceph: root@host01 /]# ceph config set global log_to_stderr false


[ceph: root@host01 /]# ceph config set global mon_cluster_log_to_stderr false

Logging to files

You can also configure Ceph daemons to log to files instead of stderr. When logging to files, Ceph logs are located in
/var/log/ceph/CLUSTER_FSID.

To enable logging to files, set the following values:

Example

[ceph: root@host01 /]# ceph config set global log_to_file true


[ceph: root@host01 /]# ceph config set global mon_cluster_log_to_file true



NOTE: Disable logging to stderr to avoid double logs.

IMPORTANT: Currently log rotation to a non-default path is not supported.

By default, Cephadm sets up log rotation on each host to rotate these files. You can configure the logging retention schedule by
modifying /etc/logrotate.d/ceph.CLUSTER_FSID.
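
As a quick check after enabling logging to files, the following sketch (with a hypothetical cluster FSID) lists the per-daemon log
files and the log rotation policy that Cephadm generated:

[root@host01 ~]# ls -l /var/log/ceph/57bddb48-ee04-11eb-9962-001a4a000672/
[root@host01 ~]# cat /etc/logrotate.d/ceph.57bddb48-ee04-11eb-9962-001a4a000672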

Data location
Edit online
Cephadm daemon data and logs are located in slightly different locations than the older versions of Ceph:

/var/log/ceph/CLUSTER_FSID contains all the storage cluster logs. Note that by default Cephadm logs through stderr and the
container runtime, so these logs are usually not present.

/var/lib/ceph/CLUSTER_FSID contains all the cluster daemon data, besides logs.

/var/lib/ceph/CLUSTER_FSID/DAEMON_NAME contains all the data for a specific daemon.

/var/lib/ceph/CLUSTER_FSID/crash contains the crash reports for the storage cluster.

/var/lib/ceph/CLUSTER_FSID/removed contains old daemon data directories for the stateful daemons, for example
monitor or Prometheus, that have been removed by Cephadm.

Disk usage

A few Ceph daemons may store a significant amount of data in /var/lib/ceph, notably the monitors and Prometheus daemon,
hence IBM recommends moving this directory to its own disk, partition, or logical volume so that the root file system is not filled up.
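
Before deciding whether to move this directory to a dedicated device, you can check how much space each daemon consumes. A
minimal sketch with a hypothetical cluster FSID:

[root@host01 ~]# du -sh /var/lib/ceph/57bddb48-ee04-11eb-9962-001a4a000672/*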

Cephadm health checks


Edit online
As a storage administrator, you can monitor the IBM Storage Ceph cluster with the additional health checks provided by the
Cephadm module. This is supplementary to the default healthchecks provided by the storage cluster.

Cephadm operations health checks


Cephadm configuration health checks

Cephadm operations health checks


Edit online
Healthchecks are executed when the Cephadm module is active. You can get the following health warnings:

CEPHADM_PAUSED

Cephadm background work is paused with the ceph orch pause command. Cephadm continues to perform passive monitoring
activities such as checking the host and daemon status, but it does not make any changes like deploying or removing daemons. You
can resume Cephadm work with the ceph orch resume command.

CEPHADM_STRAY_HOST

One or more hosts have running Ceph daemons but are not registered as hosts managed by the Cephadm module. This means that
those services are not currently managed by Cephadm; for example, they are not included in restarts and upgrades and are not
reported by the ceph orch ps command. You can manage the host(s) with the ceph orch host add HOST_NAME command, but ensure
that SSH access to the remote hosts is configured. Alternatively, you can manually connect to the host and ensure that services on
that host are removed or migrated to a host that is managed by Cephadm. You can also disable this warning with the setting ceph
config set mgr mgr/cephadm/warn_on_stray_hosts false.



CEPHADM_STRAY_DAEMON

One or more Ceph daemons are running but are not managed by the Cephadm module. This might be because they were deployed
using a different tool, or because they were started manually. Those services are not currently managed by Cephadm; for example,
they are not included in restarts and upgrades and are not reported by the ceph orch ps command.

If the daemon is a stateful one, such as a monitor or OSD daemon, it should be adopted by Cephadm. For stateless
daemons, you can provision a new daemon with the ceph orch apply command and then stop the unmanaged daemon.

You can disable this health warning with the setting ceph config set mgr mgr/cephadm/warn_on_stray_daemons false.

CEPHADM_HOST_CHECK_FAILED

One or more hosts have failed the basic Cephadm host check, which verifies that:

The host is reachable and you can execute Cephadm.

The host meets the basic prerequisites, like a working container runtime, that is Podman, and working time synchronization. If
this test fails, Cephadm will not be able to manage the services on that host.

You can manually run this check with the ceph cephadm check-host HOST_NAME command. You can remove a broken host from
management with the ceph orch host rm HOST_NAME command. You can disable this health warning with the setting ceph
config set mgr mgr/cephadm/warn_on_failed_host_check false.
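
For example, to re-run the host check for a host that raised this warning (hypothetical host name):

Example

[ceph: root@host01 /]# ceph cephadm check-host host02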

Cephadm configuration health checks


Edit online
Cephadm periodically scans each of the hosts in the storage cluster to understand the state of the OS, disks, and NICs. These facts
are analyzed for consistency across the hosts in the storage cluster to identify any configuration anomalies. The configuration checks
are an optional feature.

You can enable this feature with the following command:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/config_checks_enabled true

The configuration checks are triggered after each host scan, which is for a duration of one minute.

The ceph -W cephadm command shows log entries of the current state and outcome of the configuration checks as follows:

Disabled state

Example

ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable

Enabled state

Example

CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected

The configuration checks themselves are managed through several cephadm subcommands.

To determine whether the configuration checks are enabled, run the following command:

Example

[ceph: root@host01 /]# ceph cephadm config-check status

This command returns the status of the configuration checker as either Enabled or Disabled.

To list all the configuration checks and their current state, run the following command:

Example



[ceph: root@host01 /]# ceph cephadm config-check ls
NAME HEALTHCHECK STATUS DESCRIPTION
kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles
are consistent across cluster hosts
os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are
consistent for all cluster hosts
public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on
the Ceph public_network
osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common
MTU setting
osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common
linkspeed
network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public
networks defined exist on the Ceph hosts
ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency
- ceph daemons should be on the same release (unless upgrade is active)
kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the
kernel on Ceph hosts is consistent

Each configuration check is described as follows:

CEPHADM_CHECK_KERNEL_LSM

Each host within the storage cluster is expected to operate within the same Linux Security Module (LSM) state. For example, if the
majority of the hosts are running with SELINUX in enforcing mode, any host not running in this mode would be flagged as an
anomaly and a healthcheck with a warning state is raised.

CEPHADM_CHECK_SUBSCRIPTION

This check relates to the status of the vendor subscription. This check is only performed for hosts using Red Hat Enterprise Linux, but
helps to confirm that all the hosts are covered by an active subscription so that patches and updates are available.

CEPHADM_CHECK_PUBLIC_MEMBERSHIP

All members of the cluster should have NICs configured on at least one of the public network subnets. Hosts that are not on the
public network will rely on routing which may affect performance.

CEPHADM_CHECK_MTU

The maximum transmission unit (MTU) of the NICs on OSDs can be a key factor in consistent performance. This check examines
hosts that are running OSD services to ensure that the MTU is configured consistently within the cluster. This is determined by
establishing the MTU setting that the majority of hosts are using, with any anomalies resulting in a Ceph healthcheck.

CEPHADM_CHECK_LINKSPEED

Similar to the MTU check, linkspeed consistency is also a factor in consistent cluster performance. This check determines the
linkspeed shared by the majority of the OSD hosts, resulting in a healthcheck for any hosts that are set at a lower linkspeed rate.

CEPHADM_CHECK_NETWORK_MISSING

The public_network and cluster_network settings support subnet definitions for IPv4 and IPv6. If these settings are not found
on any host in the storage cluster a healthcheck is raised.

CEPHADM_CHECK_CEPH_RELEASE

Under normal operations, the Ceph cluster should be running daemons under the same Ceph release, for example all IBM Storage
Ceph cluster 5 releases. This check looks at the active release for each daemon, and reports any anomalies as a healthcheck. This
check is bypassed if an upgrade process is active within the cluster.

CEPHADM_CHECK_KERNEL_VERSION

The OS kernel version is checked for consistency across the hosts. Once again, the majority of the hosts is used as the basis of
identifying anomalies.
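
Individual checks can also be bypassed without disabling the whole feature. The following is a minimal sketch, assuming the
config-check subcommands accept the check names listed above:

[ceph: root@host01 /]# ceph cephadm config-check disable osd_mtu_size
[ceph: root@host01 /]# ceph cephadm config-check status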

Managing an IBM Storage Ceph cluster using cephadm-ansible modules

Edit online
As a storage administrator, you can use cephadm-ansible modules in Ansible playbooks to administer your IBM Storage Ceph
cluster. The cephadm-ansible package provides several modules that wrap cephadm calls to let you write your own unique
Ansible playbooks to administer your cluster.

NOTE: At this time, cephadm-ansible modules only support the most important tasks. Any operation not covered by cephadm-
ansible modules must be completed using either the command or shell Ansible modules in your playbooks.

The cephadm-ansible modules


The cephadm-ansible modules options
Bootstrapping a storage cluster using the cephadm_bootstrap and cephadm_registry_login modules
Adding or removing hosts using the ceph_orch_host module
Setting configuration options using the ceph_config module
Applying a service specification using the ceph_orch_apply module
Managing Ceph daemon states using the ceph_orch_daemon module

The cephadm-ansible modules

Edit online
The cephadm-ansible modules are a collection of modules that simplify writing Ansible playbooks by providing a wrapper around
cephadm and ceph orch commands. You can use the modules to write your own unique Ansible playbooks to administer your
cluster using one or more of the modules.

The cephadm-ansible package includes the following modules:

cephadm_bootstrap

ceph_orch_host

ceph_config

ceph_orch_apply

ceph_orch_daemon

cephadm_registry_login

The cephadm-ansible modules options

Edit online
The following tables list the available options for the cephadm-ansible modules. Options listed as required need to be set when
using the modules in your Ansible playbooks. Options listed with a default value of true indicate that the option is automatically set
when using the modules and you do not need to specify it in your playbook. For example, for the cephadm_bootstrap module, the
Ceph Dashboard is installed unless you set dashboard: false.

Table 1. Available options for the cephadm_bootstrap module

mon_ip: Ceph Monitor IP address. Required: true.
image: Ceph container image. Required: false.
docker: Use docker instead of podman. Required: false.
fsid: Define the Ceph FSID. Required: false.
pull: Pull the Ceph container image. Required: false. Default: true.
dashboard: Deploy the Ceph Dashboard. Required: false. Default: true.
dashboard_user: Specify a specific Ceph Dashboard user. Required: false.
dashboard_password: Ceph Dashboard password. Required: false.
monitoring: Deploy the monitoring stack. Required: false. Default: true.
firewalld: Manage firewall rules with firewalld. Required: false. Default: true.
allow_overwrite: Allow overwrite of existing --output-config, --output-keyring, or --output-pub-ssh-key files. Required: false. Default: false.
registry_url: URL for custom registry. Required: false.
registry_username: Username for custom registry. Required: false.
registry_password: Password for custom registry. Required: false.
registry_json: JSON file with custom registry login information. Required: false.
ssh_user: SSH user to use for cephadm ssh to hosts. Required: false.
ssh_config: SSH config file path for cephadm SSH client. Required: false.
allow_fqdn_hostname: Allow hostname that is a fully-qualified domain name (FQDN). Required: false. Default: false.
cluster_network: Subnet to use for cluster replication, recovery and heartbeats. Required: false.

Table 2. Available options for the ceph_orch_host module

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
name: Name of the host to add, remove, or update. Required: true.
address: IP address of the host. Required: true when state is present.
set_admin_label: Set the _admin label on the specified host. Required: false. Default: false.
labels: The list of labels to apply to the host. Required: false. Default: [].
state: If set to present, it ensures the name specified in name is present. If set to absent, it removes the host specified in name. If set to drain, it schedules to remove all daemons from the host specified in name. Required: false. Default: present.

Table 3. Available options for the ceph_config module

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
action: Whether to set or get the parameter specified in option. Required: false. Default: set.
who: Which daemon to set the configuration to. Required: true.
option: Name of the parameter to set or get. Required: true.
value: Value of the parameter to set. Required: true if action is set.

Table 4. Available options for the ceph_orch_apply module

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
spec: The service specification to apply. Required: true.

Table 5. Available options for the ceph_orch_daemon module

fsid: The FSID of the Ceph cluster to interact with. Required: false.
image: The Ceph container image to use. Required: false.
state: The desired state of the service specified in name. Required: true. If started, it ensures the service is started. If stopped, it ensures the service is stopped. If restarted, it will restart the service.
daemon_id: The ID of the service. Required: true.
daemon_type: The type of service. Required: true.

Table 6. Available options for the cephadm_registry_login module

state: Login or logout of a registry. Required: false. Default: login.
docker: Use docker instead of podman. Required: false.
registry_url: The URL for custom registry. Required: false.
registry_username: Username for custom registry. Required: true when state is login.
registry_password: Password for custom registry. Required: true when state is login.
registry_json: The path to a JSON file. This file must be present on remote hosts prior to running this task. This option is currently not supported.
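
As noted above, options with a default value of true are applied automatically. The following is a minimal sketch of a
cephadm_bootstrap task, with hypothetical values, that overrides two of those defaults:

- name: bootstrap without the dashboard or monitoring stack
  cephadm_bootstrap:
    mon_ip: 10.10.128.68
    dashboard: false
    monitoring: false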

Bootstrapping a storage cluster using the cephadm_bootstrap


and cephadm_registry_login modules

Edit online
As a storage administrator, you can bootstrap a storage cluster using Ansible by using the cephadm_bootstrap and
cephadm_registry_login modules in your Ansible playbook.

Prerequisites
Edit online

An IP address for the first Ceph Monitor container, which is also the IP address for the first node in the storage cluster.

Login access to cp.icr.io/cp.

A minimum of 10 GB of free space for /var/lib/containers/.

Red Hat Enterprise Linux 8.4 EUS or later.

Installation of the cephadm-ansible package on the Ansible administration node.

Passwordless SSH is set up on all hosts in the storage cluster.

Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

3. Create the hosts file and add hosts, labels, and monitor IP address of the first host in the storage cluster:

Syntax

sudo vi INVENTORY_FILE

HOST1 labels="['LABEL1', 'LABEL2']"


HOST2 labels="['LABEL1', 'LABEL2']"
HOST3 labels="['LABEL1']"

[admin]
ADMIN_HOST monitor_address=MONITOR_IP_ADDRESS labels="['ADMIN_LABEL', 'LABEL1', 'LABEL2']"

Example

[ansible@admin cephadm-ansible]$ sudo vi hosts



host02 labels="['mon', 'mgr']"
host03 labels="['mon', 'mgr']"
host04 labels="['osd']"
host05 labels="['osd']"
host06 labels="['osd']"

[admin]
host01 monitor_address=10.10.128.68 labels="['_admin', 'mon', 'mgr']"

4. Run the preflight playbook:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm"

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars


"ceph_origin=ibm"

5. Create a playbook to bootstrap your cluster:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: NAME_OF_PLAY
  hosts: BOOTSTRAP_HOST
  become: USE_ELEVATED_PRIVILEGES
  gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
  tasks:
    - name: NAME_OF_TASK
      cephadm_registry_login:
        state: STATE
        registry_url: REGISTRY_URL
        registry_username: REGISTRY_USER_NAME
        registry_password: REGISTRY_PASSWORD

    - name: NAME_OF_TASK
      cephadm_bootstrap:
        mon_ip: "{{ monitor_address }}"
        dashboard_user: DASHBOARD_USER
        dashboard_password: DASHBOARD_PASSWORD
        allow_fqdn_hostname: ALLOW_FQDN_HOSTNAME
        cluster_network: NETWORK_CIDR

Example

[ansible@admin cephadm-ansible]$ sudo vi bootstrap.yml

---
- name: bootstrap the cluster
  hosts: host01
  become: true
  gather_facts: false
  tasks:
    - name: login to registry
      cephadm_registry_login:
        state: login
        registry_url: cp.icr.io/cp
        registry_username: user1
        registry_password: mypassword1

    - name: bootstrap initial cluster
      cephadm_bootstrap:
        mon_ip: "{{ monitor_address }}"
        dashboard_user: mydashboarduser
        dashboard_password: mydashboardpassword
        allow_fqdn_hostname: true
        cluster_network: 10.10.128.0/28

6. Run the playbook:

Syntax



ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml -vvv

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts bootstrap.yml -vvv

Verification
Edit online

Review the Ansible output after running the playbook.

Adding or removing hosts using the ceph_orch_host module

Edit online
Add and remove hosts in your storage cluster by using the ceph_orch_host module in your Ansible playbook.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Register the nodes to the CDN and attach subscriptions.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

New hosts have the storage cluster’s public SSH key.

For more information about copying the storage cluster's public SSH keys to new hosts, see Adding hosts.

Procedure
Edit online

1. Use the following procedure to add new hosts to the cluster:

a. Log in to the Ansible administration node.

b. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

c. Add the new hosts and labels to the Ansible inventory file.

Syntax

sudo vi INVENTORY_FILE

NEW_HOST1 labels="['LABEL1', 'LABEL2']"


NEW_HOST2 labels="['LABEL1', 'LABEL2']"
NEW_HOST3 labels="['LABEL1']"

[admin]
ADMIN_HOST monitor_address=MONITOR_IP_ADDRESS labels="['ADMIN_LABEL', 'LABEL1',
'LABEL2']"

Example

[ansible@admin cephadm-ansible]$ sudo vi hosts

host02 labels="['mon', 'mgr']"



host03 labels="['mon', 'mgr']"
host04 labels="['osd']"
host05 labels="['osd']"
host06 labels="['osd']"

[admin]
host01 monitor_address=10.10.128.68 labels="['_admin', 'mon', 'mgr']"

NOTE: If you have previously added the new hosts to the Ansible inventory file and ran the preflight playbook on the
hosts, skip to step 3.

d. Run the preflight playbook with the --limit option:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm" -


-limit NEWHOST

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-


vars "ceph_origin=ibm" --limit host02

The preflight playbook installs podman, lvm2, chronyd, and cephadm on the new host. After installation is complete,
cephadm resides in the /usr/sbin/ directory.

e. Create a playbook to add the new hosts to the cluster:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
hosts: HOSTS_OR_HOST_GROUPS
become: USE_ELEVATED_PRIVILEGES
gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
tasks:
- name: NAME_OF_TASK
ceph_orch_host:
name: "{{ ansible_facts[hostname] }}"
address: "{{ ansible_facts[default_ipv4][address] }}"
labels: "{{ labels }}"
delegate_to: HOST_TO_DELEGATE_TASK_TO

- name: NAME_OF_TASK
when: inventory_hostname in groups[admin]
ansible.builtin.shell:
cmd: CEPH_COMMAND_TO_RUN
register: REGISTER_NAME

- name: NAME_OF_TASK
when: inventory_hostname in groups[admin]
debug:
msg: "{{ REGISTER_NAME.stdout }}"

NOTE: By default, Ansible executes all tasks on the host that matches the hosts line of your playbook. The ceph
orch commands must run on the host that contains the admin keyring and the Ceph configuration file. Use the
delegate_to keyword to specify the admin host in your cluster.

Example

[ansible@admin cephadm-ansible]$ sudo vi add-hosts.yml

---
- name: add additional hosts to the cluster
hosts: all
become: true
gather_facts: true
tasks:
- name: add hosts to the cluster
ceph_orch_host:
name: "{{ ansible_facts['hostname'] }}"
address: "{{ ansible_facts['default_ipv4']['address'] }}"



labels: "{{ labels }}"
delegate_to: host01

- name: list hosts in the cluster


when: inventory_hostname in groups['admin']
ansible.builtin.shell:
cmd: ceph orch host ls
register: host_list

- name: print current list of hosts


when: inventory_hostname in groups['admin']
debug:
msg: "{{ host_list.stdout }}"

In this example, the playbook adds the new hosts to the cluster and displays a current list of hosts.

f. Run the playbook to add additional hosts to the cluster:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts add-hosts.yml

2. Use the following procedure to remove hosts from the cluster:

a. Log in to the Ansible administration node.

b. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

c. Create a playbook to remove a host or hosts from the cluster:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: NAME_OF_PLAY
hosts: ADMIN_HOST
become: USE_ELEVATED_PRIVILEGES
gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
tasks:
- name: NAME_OF_TASK
ceph_orch_host:
name: HOST_TO_REMOVE
state: STATE

- name: NAME_OF_TASK
ceph_orch_host:
name: HOST_TO_REMOVE
state: STATE
retries: NUMBER_OF_RETRIES
delay: DELAY
until: CONTINUE_UNTIL
register: REGISTER_NAME

- name: NAME_OF_TASK
ansible.builtin.shell:
cmd: ceph orch host ls
register: REGISTER_NAME

- name: NAME_OF_TASK
debug:
msg: "{{ REGISTER_NAME.stdout }}"

Example

[ansible@admin cephadm-ansible]$ sudo vi remove-hosts.yml



---
- name: remove host
hosts: host01
become: true
gather_facts: true
tasks:
- name: drain host07
ceph_orch_host:
name: host07
state: drain

- name: remove host from the cluster


ceph_orch_host:
name: host07
state: absent
retries: 20
delay: 1
until: result is succeeded
register: result

- name: list hosts in the cluster


ansible.builtin.shell:
cmd: ceph orch host ls
register: host_list

- name: print current list of hosts


debug:
msg: "{{ host_list.stdout }}"

In this example, the playbook tasks drain all daemons on host07, removes the host from the cluster, and displays a current
list of hosts.

3. Run the playbook to remove host from the cluster:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts remove-hosts.yml

Verification
Edit online

Review the Ansible task output displaying the current list of hosts in the cluster:

Example

TASK [print current hosts]


**********************************************************************************************
********
Friday 24 June 2022 14:52:40 -0400 (0:00:03.365) 0:02:31.702 ***********
ok: [host01] =>
msg: |-
HOST ADDR LABELS STATUS
host01 10.10.128.68 _admin mon mgr
host02 10.10.128.69 mon mgr
host03 10.10.128.70 mon mgr
host04 10.10.128.71 osd
host05 10.10.128.72 osd
host06 10.10.128.73 osd

Setting configuration options using the ceph_config module

Edit online
As a storage administrator, you can set or get IBM Storage Ceph configuration options using the ceph_config module.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

The Ansible inventory file contains the cluster and admin hosts.

Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

3. Create a playbook with configuration changes:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
hosts: ADMIN_HOST
become: USE_ELEVATED_PRIVILEGES
gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
tasks:
- name: NAME_OF_TASK
ceph_config:
action: GET_OR_SET
who: DAEMON_TO_SET_CONFIGURATION_TO
option: CEPH_CONFIGURATION_OPTION
value: VALUE_OF_PARAMETER_TO_SET

- name: NAME_OF_TASK
ceph_config:
action: GET_OR_SET
who: DAEMON_TO_SET_CONFIGURATION_TO
option: CEPH_CONFIGURATION_OPTION
register: REGISTER_NAME

- name: NAME_OF_TASK
debug:
msg: "MESSAGE_TO_DISPLAY {{ REGISTER_NAME.stdout }}"

Example

[ansible@admin cephadm-ansible]$ sudo vi change_configuration.yml

---
- name: set pool delete
hosts: host01
become: true
gather_facts: false
tasks:
- name: set the allow pool delete option
ceph_config:
action: set
who: mon
option: mon_allow_pool_delete
value: true

- name: get the allow pool delete setting



ceph_config:
action: get
who: mon
option: mon_allow_pool_delete
register: verify_mon_allow_pool_delete

- name: print current mon_allow_pool_delete setting


debug:
msg: "the value of 'mon_allow_pool_delete' is {{ verify_mon_allow_pool_delete.stdout
}}"

In this example, the playbook first sets the mon_allow_pool_delete option to true. The playbook then gets the current
mon_allow_pool_delete setting and displays the value in the Ansible output.

4. Run the playbook:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts change_configuration.yml

Verification
Edit online

Review the output from the playbook tasks.

Example

TASK [print current mon_allow_pool_delete setting]


*************************************************************
Wednesday 29 June 2022 13:51:41 -0400 (0:00:05.523) 0:00:17.953 ********
ok: [host01] =>
msg: the value of 'mon_allow_pool_delete' is true

Reference
Edit online

For more information about configuration options, see Configuring.

Applying a service specification using the ceph_orch_apply


module
Edit online
As a storage administrator, you can apply service specifications to your storage cluster using the ceph_orch_apply module in your
Ansible playbooks. A service specification is a data structure to specify the service attributes and configuration settings that is used
to deploy the Ceph service. You can use a service specification to deploy Ceph service types like mon, crash, mds, mgr, osd, rbd, or
rbd-mirror.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

The Ansible inventory file contains the cluster and admin hosts.



Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

3. Create a playbook with the service specifications:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
hosts: HOSTS_OR_HOST_GROUPS
become: USE_ELEVATED_PRIVILEGES
gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
tasks:
- name: NAME_OF_TASK
ceph_orch_apply:
spec: |
service_type: SERVICE_TYPE
service_id: UNIQUE_NAME_OF_SERVICE
placement:
host_pattern: HOST_PATTERN_TO_SELECT_HOSTS
label: LABEL
spec:
SPECIFICATION_OPTIONS:

Example

[ansible@admin cephadm-ansible]$ sudo vi deploy_osd_service.yml

---
- name: deploy osd service
hosts: host01
become: true
gather_facts: true
tasks:
- name: apply osd spec
ceph_orch_apply:
spec: |
service_type: osd
service_id: osd
placement:
host_pattern: '*'
label: osd
spec:
data_devices:
all: true

In this example, the playbook deploys the Ceph OSD service on all hosts with the label osd.

4. Run the playbook:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts deploy_osd_service.yml

Verification
Edit online



Review the output from the playbook tasks.

Reference
Edit online

For more details on service specification options, see Operations.

Managing Ceph daemon states using the ceph_orch_daemon


module
Edit online
As a storage administrator, you can start, stop, and restart Ceph daemons on hosts using the ceph_orch_daemon module in your
Ansible playbooks.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ansible user with sudo and passwordless SSH access to all nodes in the storage cluster.

Installation of the cephadm-ansible package on the Ansible administration node.

The Ansible inventory file contains the cluster and admin hosts.

Procedure
Edit online

1. Log in to the Ansible administration node.

2. Navigate to the /usr/share/cephadm-ansible directory on the Ansible administration node:

Example

[ansible@admin ~]$ cd /usr/share/cephadm-ansible

3. Create a playbook with daemon state changes:

Syntax

sudo vi PLAYBOOK_FILENAME.yml

---
- name: PLAY_NAME
hosts: ADMIN_HOST
become: USE_ELEVATED_PRIVILEGES
gather_facts: GATHER_FACTS_ABOUT_REMOTE_HOSTS
tasks:
- name: NAME_OF_TASK
ceph_orch_daemon:
state: STATE_OF_SERVICE
daemon_id: DAEMON_ID
daemon_type: TYPE_OF_SERVICE

Example

[ansible@admin cephadm-ansible]$ sudo vi restart_services.yml

---
- name: start and stop services
hosts: host01



become: true
gather_facts: false
tasks:
- name: start osd.0
ceph_orch_daemon:
state: started
daemon_id: 0
daemon_type: osd

- name: stop mon.host02


ceph_orch_daemon:
state: stopped
daemon_id: host02
daemon_type: mon

In this example, the playbook starts the OSD with an ID of 0 and stops a Ceph Monitor with an id of host02.

4. Run the playbook:

Syntax

ansible-playbook -i INVENTORY_FILE PLAYBOOK_FILENAME.yml

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts restart_services.yml

Verification
Edit online

Review the output from the playbook tasks.

Operations
Edit online
Learn to do operational tasks for IBM Storage Ceph.

Introduction to the Ceph Orchestrator


Use of the Ceph Orchestrator
Management of services
Management of hosts
Management of monitors
Management of managers
Management of OSDs
Management of monitoring stack
Basic IBM Storage Ceph client setup
Management of MDS service
Management of Ceph object gateway
Configuration of SNMP traps
Handling a node failure
Handling a data center failure

Introduction to the Ceph Orchestrator


Edit online
As a storage administrator, you can use the Ceph Orchestrator with Cephadm utility that provides the ability to discover devices and
create services in an IBM Storage Ceph cluster.

Use of the Ceph Orchestrator


Edit online
IBM Storage Ceph Orchestrators are manager modules that primarily act as a bridge between an IBM Storage Ceph cluster and
deployment tools like Rook and Cephadm for a unified experience. They also integrate with the Ceph command line interface and
Ceph Dashboard.

The following is a workflow diagram of Ceph Orchestrator:

Figure 1. Ceph Orchestrator

Types of IBM Storage Ceph Orchestrators

There are three main types of IBM Storage Ceph Orchestrators:

Orchestrator CLI : These are common APIs used in Orchestrators and include a set of commands that can be implemented.
These APIs also provide a common command line interface (CLI) to orchestrate ceph-mgr modules with external
orchestration services. The following are the nomenclature used with the Ceph Orchestrator:

Host : This is the host name of the physical host and not the pod name, DNS name, container name, or host name inside
the container.

Service type : This is the type of the service, such as mds, osd, mon, rgw, and mgr.

Service : A functional service provided by a Ceph storage cluster such as monitors service, managers service, OSD
services, and Ceph Object Gateway service.

Daemon : A specific instance of a service deployed by one or more hosts such as Ceph Object Gateway services can
have different Ceph Object Gateway daemons running in three different hosts.

Cephadm Orchestrator - This is a Ceph Orchestrator module that does not rely on an external tool such as Rook or Ansible,
but rather manages nodes in a cluster by establishing an SSH connection and issuing explicit management commands. This
module is intended for day-one and day-two operations.

Using the Cephadm Orchestrator is the recommended way of installing a Ceph storage cluster without leveraging any
deployment frameworks like Ansible. The idea is to provide the manager daemon with access to an SSH configuration and key
that is able to connect to all nodes in a cluster to perform any management operations, like creating an inventory of storage
devices, deploying and replacing OSDs, or starting and stopping Ceph daemons. In addition, the Cephadm Orchestrator will
deploy container images managed by systemd in order to allow independent upgrades of co-located services.

This orchestrator will also likely highlight a tool that encapsulates all necessary operations to manage the deployment of
container image based services on the current host, including a command that bootstraps a minimal cluster running a Ceph
Monitor and a Ceph Manager.



Rook Orchestrator - Rook is an orchestration tool that uses the Kubernetes Rook operator to manage a Ceph storage cluster
running inside a Kubernetes cluster. The rook module provides integration between Ceph’s Orchestrator framework and Rook.
Rook is an open source cloud-native storage operator for Kubernetes.

Rook follows the “operator” model, in which a custom resource definition (CRD) object is defined in Kubernetes to describe a
Ceph storage cluster and its desired state, and a rook operator daemon is running in a control loop that compares the current
cluster state to desired state and takes steps to make them converge. The main object describing Ceph’s desired state is the
Ceph storage cluster CRD, which includes information about which devices should be consumed by OSDs, how many monitors
should be running, and what version of Ceph should be used. Rook defines several other CRDs to describe RBD pools, CephFS
file systems, and so on.

The Rook Orchestrator module is the glue that runs in the ceph-mgr daemon and implements the Ceph orchestration API by
making changes to the Ceph storage cluster in Kubernetes that describe desired cluster state. A Rook cluster’s ceph-mgr
daemon is running as a Kubernetes pod, and hence, the rook module can connect to the Kubernetes API without any explicit
configuration.

Management of services
Edit online
As a storage administrator, after installing the IBM Storage Ceph cluster, you can monitor and manage the services in a storage
cluster. A service is a group of daemons that are configured together.

Checking service status


Checking daemon status
Placement specification of the Ceph Orchestrator
Deploying the Ceph daemons using the command line interface
Deploying the Ceph daemons on a subset of hosts using the command line interface
Service specification of the Ceph Orchestrator
Deploying the Ceph daemons using the service specification

Checking service status


Edit online
You can check the following status of the services of the storage cluster using the ceph orch ls command:

Print a list of services.

Locate the service whose status you want to check.

Print the status of the service.

NOTE: If the services are applied with the ceph orch apply command while bootstrapping, changing the service specification file
is complicated. Instead, you can use the --export option with the ceph orch ls command to export the running specification,
update the yaml file, and re-apply the service.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Log into the cephadm shell.

Procedure
Edit online

Print a list of services:



Syntax

ceph orch ls [--service_type SERVICE_TYPE] [--service_name SERVICE_NAME] [--export] [--


format FORMAT] [--refresh]

The format can be plain, json, json-pretty, yaml, xml-pretty, or xml.

Example

[ceph: root@host01 /]# ceph orch ls

Check the status of a particular service or a daemon:

Syntax

ceph orch ls [--service_type SERVICE_TYPE] [--service_name SERVICE_NAME] [--refresh]

Example

[ceph: root@host01 /]# ceph orch ls --service-type mds


[ceph: root@host01 /]# ceph orch ls --service-name rgw.realm.myzone

Export the service specification:

Example

[ceph: root@host01 /]# ceph orch ls --service-type mgr --export > mgr.yaml
[ceph: root@host01 /]# ceph orch ls --export > cluster.yaml

This exports the file in the .yaml file format. This file can be used with the ceph orch apply -i command for retrieving
the service specification of a single service.
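
For example, after editing the exported specification file, re-apply it to update the running service:

Example

[ceph: root@host01 /]# ceph orch apply -i mgr.yaml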

Checking daemon status


Edit online
A daemon is a systemd unit that is running and is part of the service.

You can check the following status of the daemons of the storage cluster using the ceph orch ps command:

Print a list of all the daemons.

Query the status of the target daemon.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Log into the cephadm shell.

Procedure
Edit online

Print a list of daemons:

Syntax

ceph orch ps [--daemon-type DAEMON_TYPE] [--service_name SERVICE_NAME] [--daemon_id


DAEMON_ID] [--format FORMAT] [--refresh]

Example

[ceph: root@host01 /]# ceph orch ps

Check the status of a particular service instance:



Syntax

ceph orch ps [--daemon_type DAEMON_TYPE] [--daemon_id DAEMON_ID] [--refresh]

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type osd --daemon_id 0

Placement specification of the Ceph Orchestrator


Edit online
You can use the Ceph Orchestrator to deploy osds, mons, mgrs, mds, and rgw services. IBM recommends deploying services using
placement specifications. You need to know where and how many daemons have to be deployed to deploy a service. Placement
specifications can either be passed as command line arguments or as a service specification in a yaml file.

There are two ways of deploying the services using the placement specification:

Using the placement specification directly in the command line interface. For example, if you want to deploy three monitors on
the hosts, running the following command deploys three monitors on host01, host02, and host03.

Example

[ceph: root@host01 /]# ceph orch apply mon --placement="3 host01 host02 host03"

Using the placement specification in the YAML file. For example, if you want to deploy node-exporter on all the hosts, then
you can specify the following in the yaml file.

Example

service_type: node-exporter
placement:
  host_pattern: '*'

Deploying the Ceph daemons using the command line interface


Edit online
Using the Ceph Orchestrator, you can deploy the daemons such as Ceph Manager, Ceph Monitors, Ceph OSDs, monitoring stack, and
others using the ceph orch command. Placement specification is passed as --placement argument with the Orchestrator
commands.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the storage cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Use one of the following methods to deploy the daemons on the hosts:

Method 1: Specify the number of daemons and the host names:

Syntax

ceph orch apply SERVICE_NAME --placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2 HOST_NAME_3"

Example

[ceph: root@host01 /]# ceph orch apply mon --placement="3 host01 host02 host03"

Method 2: Add the labels to the hosts and then deploy the daemons using the labels:

Add the labels to the hosts:

Syntax

ceph orch host label add HOSTNAME_1 LABEL

Example

[ceph: root@host01 /]# ceph orch host label add host01 mon

Deploy the daemons with labels:

Syntax

ceph orch apply DAEMON_NAME label:LABEL

Example

ceph orch apply mon label:mon

Method 3: Add the labels to the hosts and deploy using the --placement argument:

Add the labels to the hosts:

Syntax

ceph orch host label add HOSTNAME_1 LABEL

Example

[ceph: root@host01 /]# ceph orch host label add host01 mon

Deploy the daemons using the label placement specification:

Syntax

ceph orch apply DAEMON_NAME --placement="label:LABEL"

Example

ceph orch apply mon --placement="label:mon"

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME


ceph orch ps --service_name=SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mon


[ceph: root@host01 /]# ceph orch ps --service_name=mon



Reference
Edit online

Adding hosts

Deploying the Ceph daemons on a subset of hosts using the


command line interface
Edit online
You can use the --placement option to deploy daemons on a subset of hosts. You can specify the number of daemons in the
placement specification with the name of the hosts to deploy the daemons.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the hosts on which you want to deploy the Ceph daemons:

Example

[ceph: root@host01 /]# ceph orch host ls

3. Deploy the daemons:

Syntax

ceph orch apply SERVICE_NAME --placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2 HOST_NAME_3"

Example

ceph orch apply mgr --placement="2 host01 host02 host03"

In this example, two mgr daemons are deployed on two of the three specified hosts.

Verification
Edit online

List the hosts:

Example

[ceph: root@host01 /]# ceph orch host ls

Reference



Edit online

See the Listing hosts section in the IBM Storage Ceph Operations Guide.

Service specification of the Ceph Orchestrator


Edit online
A service specification is a data structure that specifies the service attributes and configuration settings used to deploy a Ceph service. The following is an example of a multi-document YAML file, cluster.yaml, for specifying service specifications:

Example

service_type: mon
placement:
  host_pattern: "mon*"
---
service_type: mgr
placement:
  host_pattern: "mgr*"
---
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: "osd*"
data_devices:
  all: true

The properties of a service specification are defined by the following parameters:

service_type: The type of service:

Ceph services like mon, crash, mds, mgr, osd, rbd, or rbd-mirror.

Ceph gateway like rgw.

Monitoring stack like Alertmanager, Prometheus, Grafana or Node-exporter.

Container for custom containers.

service_id: A unique name of the service.

placement: This is used to define where and how to deploy the daemons.

unmanaged: If set to true, the Orchestrator will neither deploy nor remove any daemon associated with this service.
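For example, a minimal sketch of a specification that uses the unmanaged parameter (the host names are illustrative); with unmanaged set to true, the Orchestrator leaves the existing mon daemons untouched:

service_type: mon
placement:
  hosts:
    - host01
    - host02
    - host03
unmanaged: true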

Stateless service of Orchestrators

A stateless service is a service that does not need state information to be available in order to run. For example, to start an rgw service, additional information is not needed to start or run the service; the rgw service does not persist any state in order to provide its functionality. Regardless of when the rgw service starts, its state is the same.

Deploying the Ceph daemons using the service specification


Edit online
Using the Ceph Orchestrator, you can deploy daemons such as Ceph Manager, Ceph Monitors, Ceph OSDs, monitoring stack, and
others using the service specification in a YAML file.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Root-level access to all the nodes.

Procedure
Edit online

1. Create the yaml file:

Example

[root@host01 ~]# touch mon.yaml

2. This file can be configured in two different ways:

Edit the file to include the host details in placement specification:

Syntax

service_type: SERVICE_NAME
placement:
  hosts:
    - HOST_NAME_1
    - HOST_NAME_2

Example

service_type: mon
placement:
  hosts:
    - host01
    - host02
    - host03

Edit the file to include the label details in placement specification:

Syntax

service_type: SERVICE_NAME
placement:
  label: "LABEL_1"

Example

service_type: mon
placement:
  label: "mon"

3. Optional: You can also use extra container arguments in the service specification files such as CPUs, CA certificates, and other
files while deploying services:

Example

extra_container_args:
- "-v"
- "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro"
- "--security-opt"
- "label=disable"
- "cpus=2"

4. Mount the YAML file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount mon.yaml:/var/lib/ceph/mon/mon.yaml

5. Navigate to the directory:

Example

[ceph: root@host01 /]# cd /var/lib/ceph/mon/

6. Deploy the Ceph daemons using service specification:



Syntax

ceph orch apply -i FILE_NAME.yaml

Example

[ceph: root@host01 mon]# ceph orch apply -i mon.yaml

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mon

Reference
Edit online

See the Listing hosts section in the IBM Storage Ceph Operations Guide.

Management of hosts
Edit online
As a storage administrator, you can use the Ceph Orchestrator with Cephadm in the backend to add, list, and remove hosts in an
existing IBM Storage Ceph cluster.

You can also add labels to hosts. Labels are free-form and have no specific meanings. Each host can have multiple labels. For
example, apply the mon label to all hosts that have monitor daemons deployed, mgr for all hosts with manager daemons deployed,
rgw for Ceph object gateways, and so on.

Labeling all the hosts in the storage cluster helps to simplify system management tasks by allowing you to quickly identify the
daemons running on each host. In addition, you can use the Ceph Orchestrator or a YAML file to deploy or remove daemons on hosts
that have specific host labels.
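For example, a minimal sketch of a label-based service specification (assuming the mgr label has already been applied to the target hosts); the equivalent command-line form is ceph orch apply mgr --placement="label:mgr":

service_type: mgr
placement:
  label: "mgr"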

Adding hosts
Adding multiple hosts
Listing hosts
Adding labels to hosts
Removing labels from hosts
Removing hosts
Placing hosts in the maintenance mode

Adding hosts
Edit online
You can use the Ceph Orchestrator with Cephadm in the backend to add hosts to an existing IBM Storage Ceph cluster.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all nodes in the storage cluster.

Register the nodes to the CDN and attach subscriptions.

Ansible user with sudo and passwordless ssh access to all nodes in the storage cluster.

The IP addresses of the new hosts should be added to the /etc/hosts file.

Procedure
Edit online

1. From the Ceph administration node, log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Extract the cluster’s public SSH keys to a folder:

Syntax

ceph cephadm get-pub-key > ~/PATH

Example

[ceph: root@host01 /]# ceph cephadm get-pub-key > ~/ceph.pub

3. Copy Ceph cluster’s public SSH keys to the root user’s authorized_keys file on the new host:

Syntax

ssh-copy-id -f -i ~/PATH root@HOST_NAME_2

Example

[ceph: root@host01 /]# ssh-copy-id -f -i ~/ceph.pub root@host02

4. From the Ansible administration node, add the new host to the Ansible inventory file. The default location for the file is
/usr/share/cephadm-ansible/hosts. The following example shows the structure of a typical inventory file:

Example

host01
host02
host03

[admin]
host00

NOTE: If you have previously added the new host to the Ansible inventory file and run the preflight playbook on the host, skip
to step 6.

5. Run the preflight playbook with the --limit option:

Syntax

ansible-playbook -i INVENTORY_FILE cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit NEWHOST

Example

[ansible@admin cephadm-ansible]$ ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm" --limit host02


The preflight playbook installs podman, lvm2, chronyd, and cephadm on the new host. After installation is complete,
cephadm resides in the /usr/sbin/ directory.

6. From the Ceph administration node, log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

7. Use the cephadm orchestrator to add hosts to the storage cluster:

Syntax

ceph orch host add HOST_NAME IP_ADDRESS_OF_HOST [--labels=LABEL_NAME_1,LABEL_NAME_2]

Example

[ceph: root@host01 /]# ceph orch host add host02 10.10.128.70 --labels=mon,mgr

Verification
Edit online

List the hosts:

Example

[ceph: root@host01 /]# ceph orch host ls

Reference
Edit online

See the Listing hosts section in the IBM Storage Ceph Operations Guide.

For more information about the cephadm-preflight playbook, see Running the preflight playbook section in the IBM
Storage Ceph Installation Guide.

See the Registering IBM Storage Ceph nodes to the CDN and attaching subscriptions section in the IBM Storage Ceph
Installation Guide.

See the Creating an Ansible user with sudo access section in the IBM Storage Ceph Installation Guide.

Adding multiple hosts


Edit online
You can use the Ceph Orchestrator to add multiple hosts to an IBM Storage Ceph cluster at the same time using the service
specification in YAML file format.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Procedure
Edit online

1. Create the hosts.yaml file:

Example

[root@host01 ~]# touch hosts.yaml



2. Edit the hosts.yaml file to include the following details:

Example

service_type: host
addr: host01
hostname: host01
labels:
- mon
- osd
- mgr
---
service_type: host
addr: host02
hostname: host02
labels:
- mon
- osd
- mgr
---
service_type: host
addr: host03
hostname: host03
labels:
- mon
- osd

3. Mount the YAML file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount hosts.yaml:/var/lib/ceph/hosts.yaml

4. Navigate to the directory:

Example

[ceph: root@host01 /]# cd /var/lib/ceph/

5. Deploy the hosts using service specification:

Syntax

ceph orch apply -i FILE_NAME.yaml

Example

[ceph: root@host01 hosts]# ceph orch apply -i hosts.yaml

Verification
Edit online

List the hosts:

Example

[ceph: root@host01 /]# ceph orch host ls

Reference
Edit online

See Listing hosts.

Listing hosts
Edit online



You can list hosts of a Ceph cluster with Ceph Orchestrators.

NOTE: The STATUS of the hosts is blank in the output of the ceph orch host ls command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the hosts of the cluster:

Example

[ceph: root@host01 /]# ceph orch host ls

You will see that the STATUS of the hosts is blank, which is expected.

Adding labels to hosts


Edit online
You can use the Ceph Orchestrator to add labels to hosts in an existing IBM Storage Ceph cluster. A few examples of labels are mgr,
mon, and osd based on the service deployed on the hosts.

You can also add the following host labels that have special meaning to cephadm; these labels begin with _:

_no_schedule: This label prevents cephadm from scheduling or deploying daemons on the host. If it is added to an existing
host that already contains Ceph daemons, it causes cephadm to move those daemons elsewhere, except OSDs which are not
removed automatically. When a host is added with the _no_schedule label, no daemons are deployed on it. When the
daemons are drained before the host is removed, the _no_schedule label is set on that host.

_no_autotune_memory: This label does not autotune memory on the host. It prevents the daemon memory from being
tuned even when the osd_memory_target_autotune option or other similar options are enabled for one or more daemons
on that host.

_admin: By default, the _admin label is applied to the bootstrapped host in the storage cluster and the client.admin key is
set to be distributed to that host with the ceph orch client-keyring {ls|set|rm} function. Adding this label to
additional hosts normally causes cephadm to deploy configuration and keyring files in the /etc/ceph directory.
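For example, to distribute the admin keyring and configuration file to an additional host, apply the _admin label with the label add command described in this section (the host name is illustrative):

Example

[ceph: root@host01 /]# ceph orch host label add host03 _admin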

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the storage cluster

Procedure
Edit online



1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Add labels to the hosts:

Syntax

ceph orch host label add HOST_NAME LABEL_NAME

Example

[ceph: root@host01 /]# ceph orch host label add host02 mon

Verification
Edit online

List the hosts:

Example

[ceph: root@host01 /]# ceph orch host ls

Removing labels from hosts


Edit online
Use this information to remove labels from hosts.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the storage cluster

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Remove the label:

Syntax

ceph orch host label rm HOST_NAME LABEL_NAME

Example

[ceph: root@host01 /]# ceph orch host label rm host02 mon

Verification
Edit online
Verify that the label has been removed from the host by using the ceph orch host ls command.


Removing hosts
Edit online
You can remove hosts of a Ceph cluster with the Ceph Orchestrators. All the daemons are removed with the drain option, which adds the _no_schedule label to the host to ensure that no daemons can be deployed on it until the operation is complete.

IMPORTANT: If you are removing the bootstrap host, be sure to copy the admin keyring and the configuration file to another host in
the storage cluster before you remove the host.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the storage cluster.

All the services are deployed.

Cephadm is deployed on the nodes where the services have to be removed.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Fetch the host details:

Example

[ceph: root@host01 /]# ceph orch host ls

3. Drain all the daemons from the host:

Syntax

ceph orch host drain HOSTNAME

Example

[ceph: root@host01 /]# ceph orch host drain host02

The _no_schedule label is automatically applied to the host which blocks deployment.

4. Check the status of OSD removal:

Example

[ceph: root@host01 /]# ceph orch osd rm status

When no placement groups (PG) are left on the OSD, the OSD is decommissioned and removed from the storage cluster.

5. Check if all the daemons are removed from the storage cluster:

Syntax

ceph orch ps HOSTNAME

Example

[ceph: root@host01 /]# ceph orch ps host02



6. Remove the host:

Syntax

ceph orch host rm HOSTNAME

Example

[ceph: root@host01 /]# ceph orch host rm host02

Reference
Edit online

Adding hosts

Listing hosts

Placing hosts in the maintenance mode


Edit online
You can use the Ceph Orchestrator to place the hosts in and out of the maintenance mode. The ceph orch host maintenance
enter command stops the systemd target which causes all the Ceph daemons to stop on the host. Similarly, the ceph orch
host maintenance exit command restarts the systemd target and the Ceph daemons restart on their own.

The orchestrator adopts the following workflow when the host is placed in maintenance:

1. Confirms that removing the host does not impact data availability by running the ceph orch host ok-to-stop command.

2. If the host has Ceph OSD daemons, it applies noout to the host subtree to prevent data migration from triggering during the
planned maintenance slot.

3. Stops the Ceph target, thereby, stopping all the daemons.

4. Disables the ceph target on the host, to prevent a reboot from automatically starting Ceph services.

Exiting maintenance reverses the above sequence.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts added to the cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. You can either place the host in maintenance mode or place it out of the maintenance mode:

Place the host in maintenance mode:

Syntax

ceph orch host maintenance enter HOST_NAME [--force]



Example

[ceph: root@host01 /]# ceph orch host maintenance enter host02 --force

The --force flag allows the user to bypass warnings, but not alerts.

Place the host out of the maintenance mode:

Syntax

ceph orch host maintenance exit HOST_NAME

Example

[ceph: root@host01 /]# ceph orch host maintenance exit host02

List the hosts:

Example

[ceph: root@host01 /]# ceph orch host ls

Management of monitors
Edit online
As a storage administrator, you can deploy additional monitors using placement specification, add monitors using service
specification, add monitors to a subnet configuration, and add monitors to specific hosts. Apart from this, you can remove the
monitors.

By default, a typical IBM Storage Ceph cluster has three or five monitor daemons deployed on different hosts.

IBM recommends deploying five monitors if there are five or more nodes in a cluster.

Ceph deploys monitor daemons automatically as the cluster grows, and scales back monitor daemons automatically as the cluster
shrinks. The smooth execution of this automatic growing and shrinking depends upon proper subnet configuration.

If your monitor nodes or your entire cluster are located on a single subnet, then Cephadm automatically adds up to five monitor
daemons as you add new hosts to the cluster. Cephadm automatically configures the monitor daemons on the new hosts. The new
hosts reside on the same subnet as the bootstrapped host in the storage cluster.

Cephadm can also deploy and scale monitors to correspond to changes in the size of the storage cluster.

Ceph Monitors
Configuring monitor election strategy
Deploying the Ceph monitor daemons using the command line interface
Deploying the Ceph monitor daemons using the service specification
Deploying the monitor daemons on specific network
Removing the monitor daemons
Removing a Ceph Monitor from an unhealthy storage cluster

Ceph Monitors
Edit online
Ceph Monitors are lightweight processes that maintain a master copy of the storage cluster map. All Ceph clients contact a Ceph
monitor and retrieve the current copy of the storage cluster map, enabling clients to bind to a pool and read and write data.

Ceph Monitors use a variation of the Paxos protocol to establish consensus about maps and other critical information across the
storage cluster. Due to the nature of Paxos, Ceph requires a majority of monitors running to establish a quorum, thus establishing
consensus.

IMPORTANT: IBM requires at least three monitors on separate hosts to receive support for a production cluster.



IBM recommends deploying an odd number of monitors. An odd number of Ceph Monitors has a higher resilience to failures than an
even number of monitors. For example, to maintain a quorum on a two-monitor deployment, Ceph cannot tolerate any failures; with
three monitors, one failure; with four monitors, one failure; with five monitors, two failures. This is why an odd number is advisable.
Summarizing, Ceph needs a majority of monitors to be running and to be able to communicate with each other, two out of three,
three out of four, and so on.

For an initial deployment of a multi-node Ceph storage cluster, IBM requires three monitors, increasing the number two at a time if a
valid need for more than three monitors exists.

Since Ceph Monitors are lightweight, it is possible to run them on the same host as OpenStack nodes. However, IBM recommends
running monitors on separate hosts.

IMPORTANT: IBM ONLY supports collocating Ceph services in containerized environments.

When you remove monitors from a storage cluster, consider that Ceph Monitors use the Paxos protocol to establish a consensus
about the master storage cluster map. You must have a sufficient number of Ceph Monitors to establish a quorum.

Reference
Edit online

See the IBM Storage Ceph Supported configurations Knowledgebase article for all the supported Ceph configurations.

Configuring monitor election strategy


Edit online
The monitor election strategy identifies net splits and handles failures. You can configure the monitor election strategy in three different modes:

1. classic - This is the default mode, in which the lowest-ranked monitor is voted in based on the elector module between the two sites.

2. disallow - This mode lets you mark monitors as disallowed, in which case they will participate in the quorum and serve
clients, but cannot be an elected leader. This lets you add monitors to a list of disallowed leaders. If a monitor is in the
disallowed list, it will always defer to another monitor.

3. connectivity - This mode is mainly used to resolve network discrepancies. It evaluates connection scores, based on pings
that check liveness, provided by each monitor for its peers and elects the most connected and reliable monitor to be the
leader. This mode is designed to handle net splits, which may happen if your cluster is stretched across multiple data centers
or otherwise susceptible. This mode incorporates connection score ratings and elects the monitor with the best score. If a
specific monitor is desired to be the leader, configure the election strategy so that the specific monitor is the first monitor in
the list with a rank of 0.

IBM recommends you to stay in the classic mode unless you require features in the other modes.

Before constructing the cluster, change the election_strategy to classic, disallow, or connectivity by using the following command:

Syntax

ceph mon set election_strategy {classic|disallow|connectivity}
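For example, a minimal sketch of switching a stretched cluster to the connectivity mode (in recent releases, the ceph mon dump output includes the current election strategy, so you can use it to confirm the change):

Example

[ceph: root@host01 /]# ceph mon set election_strategy connectivity
[ceph: root@host01 /]# ceph mon dump | grep election_strategy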

Deploying the Ceph monitor daemons using the command line


interface
Edit online
The Ceph Orchestrator deploys one monitor daemon by default. You can deploy additional monitor daemons by using the placement specification in the command line interface. To deploy a different number of monitor daemons, specify a different number. If you do not specify the hosts where the monitor daemons should be deployed, the Ceph Orchestrator randomly selects the hosts and deploys the monitor daemons to them.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. There are four different ways of deploying Ceph monitor daemons:

Method 1

Use placement specification to deploy monitors on hosts:

NOTE: IBM recommends that you use the --placement option to deploy on specific hosts.

Syntax

ceph orch apply mon --placement="HOST_NAME_1 HOST_NAME_2 HOST_NAME_3"

Example

[ceph: root@host01 /]# ceph orch apply mon --placement="host01 host02 host03"

NOTE: Be sure to include the bootstrap node as the first node in the command.

IMPORTANT: Do not add the monitors individually, because each ceph orch apply mon command supersedes the previous one and does not add monitors to all the hosts. For example, if you run the following commands, the first command creates a monitor on host01. The second command supersedes the placement on host01 and creates a monitor on host02. The third command supersedes the placement on host02 and creates a monitor on host03. Eventually, there is a monitor only on the third host.

# ceph orch apply mon host01
# ceph orch apply mon host02
# ceph orch apply mon host03

Method 2

Use placement specification to deploy specific number of monitors on specific hosts with labels:

1. Add the labels to the hosts:

Syntax

ceph orch host label add HOSTNAME_1 LABEL

Example

[ceph: root@host01 /]# ceph orch host label add host01 mon

2. Deploy the daemons:

Syntax

ceph orch apply mon --placement="HOST_NAME_1:mon HOST_NAME_2:mon HOST_NAME_3:mon"

Example

[ceph: root@host01 /]# ceph orch apply mon --placement="host01:mon host02:mon host03:mon"



Method 3

Use placement specification to deploy specific number of monitors on specific hosts:

Syntax

ceph orch apply mon --placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2 HOST_NAME_3"

Example

[ceph: root@host01 /]# ceph orch apply mon --placement="3 host01 host02 host03"

Method 4

Deploy monitor daemons randomly on the hosts in the storage cluster:

Syntax

ceph orch apply mon NUMBER_OF_DAEMONS

Example

[ceph: root@host01 /]# ceph orch apply mon 3

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mon

Deploying the Ceph monitor daemons using the service


specification
Edit online
The Ceph Orchestrator deploys one monitor daemon by default. You can deploy additional monitor daemons by using a service specification in a YAML file.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Procedure
Edit online

1. Create the mon.yaml file:



Example

[root@host01 ~]# touch mon.yaml

2. Edit the mon.yaml file to include the following details:

Syntax

service_type: mon
placement:
  hosts:
    - HOST_NAME_1
    - HOST_NAME_2

Example

service_type: mon
placement:
  hosts:
    - host01
    - host02

3. Mount the YAML file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount mon.yaml:/var/lib/ceph/mon/mon.yaml

4. Navigate to the directory:

Example

[ceph: root@host01 /]# cd /var/lib/ceph/mon/

5. Deploy the monitor daemons:

Syntax

ceph orch apply -i FILE_NAME.yaml

Example

[ceph: root@host01 mon]# ceph orch apply -i mon.yaml

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mon

Deploying the monitor daemons on specific network


Edit online
The Ceph Orchestrator deploys one monitor daemon by default. You can explicitly specify the IP address or CIDR network for each
monitor and control where each monitor is placed.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Disable automated monitor deployment:

Example

[ceph: root@host01 /]# ceph orch apply mon --unmanaged

3. Deploy monitors on hosts on specific network:

Syntax

ceph orch daemon add mon HOST_NAME_1:IP_OR_NETWORK

Example

[ceph: root@host01 /]# ceph orch daemon add mon host03:10.1.2.123

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mon

Removing the monitor daemons


Edit online
To remove the monitor daemons from a host, redeploy the monitor daemons on the other hosts only.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Hosts are added to the cluster.

At least one monitor daemon deployed on the hosts.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Run the ceph orch apply command to deploy the required monitor daemons:

Syntax

ceph orch apply mon "NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_3"

If you want to remove monitor daemons from host02, then you can redeploy the monitors on other hosts.

Example

[ceph: root@host01 /]# ceph orch apply mon "2 host01 host03"

Verification
Edit online

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mon

Reference
Edit online

See Deploying the Ceph monitor daemons using the command line interface section in the IBM Storage Ceph Operations Guide
for more information.

See Deploying the Ceph monitor daemons using the service specification section in the IBM Storage Ceph Operations Guide for
more information.

Removing a Ceph Monitor from an unhealthy storage cluster


Edit online
You can remove a ceph-mon daemon from an unhealthy storage cluster. An unhealthy storage cluster is one that has placement groups persistently not in the active+clean state.

Prerequisites
Edit online

Root-level access to the Ceph Monitor node.

At least one running Ceph Monitor node.



Procedure
Edit online

1. Identify a surviving monitor and log into the host:

Syntax

ssh root@MONITOR_ID

Example

[root@admin ~]# ssh root@host00

2. Log in to each Ceph Monitor host and stop all the Ceph Monitors:

Syntax

cephadm unit --name DAEMON_NAME.HOSTNAME stop

Example

[root@host00 ~]# cephadm unit --name mon.host00 stop

3. Set up the environment suitable for extended daemon maintenance and to run the daemon interactively:

Syntax

cephadm shell --name DAEMON_NAME.HOSTNAME

Example

[root@host00 ~]# cephadm shell --name mon.host00

4. Extract a copy of the monmap file:

Syntax

ceph-mon -i HOSTNAME --extract-monmap TEMP_PATH

Example

[ceph: root@host00 /]# ceph-mon -i host01 --extract-monmap /tmp/monmap

2022-01-05T11:13:24.440+0000 7f7603bd1700 -1 wrote monmap to /tmp/monmap

5. Remove the non-surviving Ceph Monitor(s):

Syntax

monmaptool TEMP_PATH --rm HOSTNAME

Example

[ceph: root@host00 /]# monmaptool /tmp/monmap --rm host01

6. Inject the surviving monitor map with the removed monitor(s) into the surviving Ceph Monitor:

Syntax

ceph-mon -i HOSTNAME --inject-monmap TEMP_PATH

Example

[ceph: root@host00 /]# ceph-mon -i host00 --inject-monmap /tmp/monmap

7. Start only the surviving monitors:

Syntax

cephadm unit --name DAEMON_NAME.HOSTNAME start

Example



[root@host00 ~]# cephadm unit --name mon.host00 start

8. Verify the monitors form a quorum:

Example

[ceph: root@host00 /]# ceph -s

9. Optional: Archive the removed Ceph Monitor's data directory, which is in the /var/lib/ceph/CLUSTER_FSID/mon.HOSTNAME directory.

Management of managers
Edit online
As a storage administrator, you can use the Ceph Orchestrator to deploy additional manager daemons. Cephadm automatically
installs a manager daemon on the bootstrap node during the bootstrapping process.

In general, you should set up a Ceph Manager on each of the hosts running the Ceph Monitor daemon to achieve the same level of availability.

By default, whichever ceph-mgr instance comes up first is made active by the Ceph Monitors, and others are standby managers.
There is no requirement that there should be a quorum among the ceph-mgr daemons.

If the active daemon fails to send a beacon to the monitors for more than the mon mgr beacon grace, then it is replaced by a
standby.

If you want to pre-empt failover, you can explicitly mark a ceph-mgr daemon as failed with the ceph mgr fail MANAGER_NAME command.
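For example, a minimal sketch of a manual failover (the manager name host01.abcdef is illustrative; the ceph mgr stat command shows which instance is currently active):

Example

[ceph: root@host01 /]# ceph mgr stat
[ceph: root@host01 /]# ceph mgr fail host01.abcdef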

Deploying the manager daemons


Removing the manager daemons
Using Ceph Manager modules
Using the Ceph Manager balancer module
Using the Ceph Manager alerts module
Using the Ceph manager crash module

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

Deploying the manager daemons


Edit online
The Ceph Orchestrator deploys two Manager daemons by default. You can deploy additional manager daemons using the
placement specification in the command line interface. To deploy a different number of Manager daemons, specify a different
number. If you do not specify the hosts where the Manager daemons should be deployed, the Ceph Orchestrator randomly selects
the hosts and deploys the Manager daemons to them.

Prerequisites
Edit online
NOTE: Ensure that each deployment has at least three Ceph Managers.

A running IBM Storage Ceph cluster.



Hosts are added to the cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. You can deploy manager daemons in two different ways:

Method 1

Deploy manager daemons using placement specification on specific set of hosts:

NOTE: IBM recommends that you use the --placement option to deploy on specific hosts.

Syntax

ceph orch apply mgr --placement="HOST_NAME_1 HOST_NAME_2 HOST_NAME_3"

Example

[ceph: root@host01 /]# ceph orch apply mgr --placement="host01 host02 host03"

Method 2

Deploy manager daemons randomly on the hosts in the storage cluster:

Syntax

ceph orch apply mgr NUMBER_OF_DAEMONS

Example

[ceph: root@host01 /]# ceph orch apply mgr 3

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mgr

Removing the manager daemons


Edit online
To remove the manager daemons from a host, redeploy the manager daemons on the other hosts only.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

At least one manager daemon deployed on the hosts.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Run the ceph orch apply command to redeploy the required manager daemons:

Syntax

ceph orch apply mgr "“"NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_3"

If you want to remove manager daemons from host02, then you can redeploy the manager daemons on other hosts.

Example

[ceph: root@host01 /]# ceph orch apply mgr "“"2 host01 host03"

Verification
Edit online

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mgr

Reference
Edit online

See Deploying the manager daemons section in the IBM Storage Ceph Operations Guide for more information.

Using Ceph Manager modules


Edit online
Use the ceph mgr module ls command to see the available modules and the modules that are presently enabled.

Enable or disable modules with ceph mgr module enable MODULE command or ceph mgr module disable MODULE
command respectively.

If a module is enabled, then the active ceph-mgr daemon loads and executes it. In the case of modules that provide a service, such
as an HTTP server, the module might publish its address when it is loaded. To see the addresses of such modules, run the ceph mgr
services command.



Some modules might also implement a special standby mode which runs on standby ceph-mgr daemon as well as the active
daemon. This enables modules that provide services to redirect their clients to the active daemon, if the client tries to connect to a
standby.

Following is an example to enable the dashboard module:

[ceph: root@host01 /]# ceph mgr module enable dashboard

[ceph: root@host01 /]# ceph mgr module ls

MODULE
balancer on (always on)
crash on (always on)
devicehealth on (always on)
orchestrator on (always on)
pg_autoscaler on (always on)
progress on (always on)
rbd_support on (always on)
status on (always on)
telemetry on (always on)
volumes on (always on)
cephadm on
dashboard on
iostat on
nfs on
prometheus on
restful on
alerts -
diskprediction_local -
influx -
insights -
k8sevents -
localpool -
mds_autoscaler -
mirroring -
osd_perf_query -
osd_support -
rgw -
rook -
selftest -
snap_schedule -
stats -
telegraf -
test_orchestrator -
zabbix -

[ceph: root@host01 /]# ceph mgr services


{
    "dashboard": "http://myserver.com:7789/",
    "restful": "https://myserver.com:8789/"
}

The first time the cluster starts, it uses the mgr_initial_modules setting to override which modules to enable. However, this
setting is ignored through the rest of the lifetime of the cluster: only use it for bootstrapping. For example, before starting your
monitor daemons for the first time, you might add a section like this to your ceph.conf file:

[mon]
mgr initial modules = dashboard balancer

Where a module implements command line hooks, the commands are accessible as ordinary Ceph commands; Ceph automatically incorporates module commands into the standard CLI interface and routes them appropriately to the module:

[ceph: root@host01 /]# ceph <command | help>

You can use the following configuration parameters with the above command:

Table 1. Configuration parameters

mgr module path
    Description: Path to load modules from.
    Type: String
    Default: "<library dir>/mgr"

mgr data
    Description: Path to load daemon data, such as the keyring.
    Type: String
    Default: "/var/lib/ceph/mgr/$cluster-$id"

mgr tick period
    Description: How many seconds between manager beacons to monitors, and other periodic checks.
    Type: Integer
    Default: 5

mon mgr beacon grace
    Description: How long after the last beacon a manager should be considered failed.
    Type: Integer
    Default: 30

Using the Ceph Manager balancer module


Edit online
The balancer is a module for Ceph Manager (ceph-mgr) that optimizes the placement of placement groups (PGs) across OSDs in
order to achieve a balanced distribution, either automatically or in a supervised fashion.

Currently the balancer module cannot be disabled. It can only be turned off to customize the configuration.

Modes
Edit online
There are currently two supported balancer modes:

crush-compat: The CRUSH compat mode uses the compat weight-set feature, introduced in Ceph Luminous, to manage an
alternative set of weights for devices in the CRUSH hierarchy. The normal weights should remain set to the size of the device
to reflect the target amount of data that you want to store on the device. The balancer then optimizes the weight-set
values, adjusting them up or down in small increments in order to achieve a distribution that matches the target distribution as
closely as possible. Because PG placement is a pseudorandom process, there is a natural amount of variation in the
placement; by optimizing the weights, the balancer counter-acts that natural variation.

This mode is fully backwards compatible with older clients. When an OSDMap and CRUSH map are shared with older clients, the
balancer presents the optimized weights as the real weights.

The primary restriction of this mode is that the balancer cannot handle multiple CRUSH hierarchies with different placement rules if
the subtrees of the hierarchy share any OSDs. Because this configuration makes managing space utilization on the shared OSDs
difficult, it is generally not recommended. As such, this restriction is normally not an issue.

upmap: Starting with Luminous, the OSDMap can store explicit mappings for individual OSDs as exceptions to the normal CRUSH placement calculation. These upmap entries provide fine-grained control over the PG mapping. This balancer mode optimizes the placement of individual PGs in order to achieve a balanced distribution. In most cases, this distribution is "perfect", with an equal number of PGs on each OSD (+/-1 PG, as they might not divide evenly).

IMPORTANT:

To allow use of this feature, you must tell the cluster that it only needs to support
luminous or later clients with the following command:

[ceph: root@host01 /]# ceph osd set-require-min-compat-client luminous

This command fails if any pre-luminous clients or daemons are connected to the monitors.

Due to a known issue, kernel CephFS clients report themselves as jewel clients. To work
around this issue, use the `--yes-i-really-mean-it` flag:

[ceph: root@host01 /]# ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it

You can check what client versions are in use with:

[ceph: root@host01 /]# ceph features

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Procedure
Edit online

1. Ensure the balancer module is enabled:

Example

[ceph: root@host01 /]# ceph mgr module enable balancer

2. Turn on the balancer module:

Example

[ceph: root@host01 /]# ceph balancer on

3. The default mode is upmap. The mode can be changed with:

Example

[ceph: root@host01 /]# ceph balancer mode crush-compat

or

Example

[ceph: root@host01 /]# ceph balancer mode upmap

Status

The current status of the balancer can be checked at any time with:

Example

[ceph: root@host01 /]# ceph balancer status

Automatic balancing

By default, when turning on the balancer module, automatic balancing is used:

Example

[ceph: root@host01 /]# ceph balancer on

The balancer can be turned back off again with:

Example

[ceph: root@host01 /]# ceph balancer off

When automatic balancing is on, the balancer uses the configured mode (upmap by default; crush-compat is backward compatible with older clients) and makes small changes to the data distribution over time to ensure that OSDs are equally utilized.

Throttling

No adjustments will be made to the PG distribution if the cluster is degraded, for example, if an OSD has failed and the system has
not yet healed itself.

When the cluster is healthy, the balancer throttles its changes such that the percentage of PGs that are misplaced, or need to be
moved, is below a threshold of 5% by default. This percentage can be adjusted using the target_max_misplaced_ratio setting.
For example, to increase the threshold to 7%:

Example

[ceph: root@host01 /]# ceph config-key set mgr target_max_misplaced_ratio .07

Supervised optimization

The balancer operation is broken into a few distinct phases:

1. Building a plan.

2. Evaluating the quality of the data distribution, either for the current PG distribution, or the PG distribution that would result
after executing a plan.



3. Executing the plan.

To evaluate and score the current distribution:

Example

[ceph: root@host01 /]# ceph balancer eval

To evaluate the distribution for a single pool:

Syntax

ceph balancer eval POOL_NAME

Example

[ceph: root@host01 /]# ceph balancer eval rbd

To see greater detail for the evaluation:

Example

[ceph: root@host01 /]# ceph balancer eval-verbose ...

To generate a plan using the currently configured mode:

Syntax

ceph balancer optimize PLAN_NAME

Replace PLAN_NAME with a custom plan name.

Example

[ceph: root@host01 /]# ceph balancer optimize rbd_123

To see the contents of a plan:

Syntax

ceph balancer show PLAN_NAME

Example

[ceph: root@host01 /]# ceph balancer show rbd_123

To discard old plans:

Syntax

ceph balancer rm PLAN_NAME

Example

[ceph: root@host01 /]# ceph balancer rm rbd_123

To see currently recorded plans use the status command:

[ceph: root@host01 /]# ceph balancer status

To calculate the quality of the distribution that would result after executing a plan:

Syntax

ceph balancer eval PLAN_NAME

Example

[ceph: root@host01 /]# ceph balancer eval rbd_123

To execute the plan:

Syntax

ceph balancer execute PLAN_NAME



Example

[ceph: root@host01 /]# ceph balancer execute rbd_123

NOTE: Only execute the plan if it is expected to improve the distribution. After execution, the plan will be discarded.
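Putting these steps together, a minimal sketch of one supervised balancing cycle (the plan name myplan is illustrative):

Example

[ceph: root@host01 /]# ceph balancer eval
[ceph: root@host01 /]# ceph balancer optimize myplan
[ceph: root@host01 /]# ceph balancer show myplan
[ceph: root@host01 /]# ceph balancer eval myplan
[ceph: root@host01 /]# ceph balancer execute myplan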

Using the Ceph Manager alerts module


Edit online
You can use the Ceph Manager alerts module to send simple alert messages about the IBM Storage Ceph cluster’s health by email.

NOTE: This module is not intended to be a robust monitoring solution. The fact that it runs as part of the Ceph cluster itself is fundamentally limiting, in that a failure of the ceph-mgr daemon prevents alerts from being sent. This module can, however, be useful for standalone clusters that run in environments where no other monitoring infrastructure exists.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Enable the alerts module:

Example

[ceph: root@host01 /]# ceph mgr module enable alerts

3. Ensure the alerts module is enabled:

Example

[ceph: root@host01 /]# ceph mgr module ls | more


{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"alerts",
"cephadm",
"dashboard",
"iostat",
"prometheus",
"restful"
]

4. Configure the Simple Mail Transfer Protocol (SMTP):

Syntax

ceph config set mgr mgr/alerts/smtp_host SMTP_SERVER
ceph config set mgr mgr/alerts/smtp_destination RECEIVER_EMAIL_ADDRESS
ceph config set mgr mgr/alerts/smtp_sender SENDER_EMAIL_ADDRESS

Example

[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_host smtp.example.com
[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_destination [email protected]
[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_sender [email protected]

5. Optional: By default, the alerts module uses SSL and port 465.

Syntax

ceph config set mgr mgr/alerts/smtp_port PORT_NUMBER

Example

[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_port 587

Do not set the smtp_ssl parameter while configuring alerts.

6. Authenticate to the SMTP server:

Syntax

ceph config set mgr mgr/alerts/smtp_user USERNAME
ceph config set mgr mgr/alerts/smtp_password PASSWORD

Example

[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_user admin1234
[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_password admin1234

7. Optional: By default, SMTP From name is Ceph. To change that, set the smtp_from_name parameter:

Syntax

ceph config set mgr mgr/alerts/smtp_from_name CLUSTER_NAME

Example

[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_from_name 'Ceph Cluster Test'

8. Optional: By default, the alerts module checks the storage cluster’s health every minute, and sends a message when there is a
change in the cluster health status. To change the frequency, set the interval parameter:

Syntax

ceph config set mgr mgr/alerts/interval INTERVAL

Example

[ceph: root@host01 /]# ceph config set mgr mgr/alerts/interval "5m"

In this example, the interval is set to 5 minutes.

9. Optional: Send an alert immediately:

Example

[ceph: root@host01 /]# ceph alerts send

Reference
Edit online

See the Health messages of a Ceph cluster section in the IBM Storage Ceph Troubleshooting Guide for more information on
Ceph health messages.



Using the Ceph manager crash module
Edit online
By default, daemon crashdumps are dumped in /var/lib/ceph/crash. You can configure this location with the crash dir option. Crash directories are named by time, date, and a randomly-generated UUID, and contain a metadata file meta and a recent log file, with a crash_id that is the same.

You can use ceph-crash.service to submit these crashes automatically and persist them in the Ceph Monitors. The ceph-crash.service watches the crashdump directory and uploads the crashes with ceph crash post.

The RECENT_CRASH health message is one of the most common health messages in a Ceph cluster. This health message means that
one or more Ceph daemons has crashed recently, and the crash has not yet been archived or acknowledged by the administrator.
This might indicate a software bug, a hardware problem like a failing disk, or some other problem. The option
mgr/crash/warn_recent_interval controls the time period of what recent means, which is two weeks by default. You can
disable the warnings by running the following command:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/crash/warn_recent_interval 0

The option mgr/crash/retain_interval controls the period for which you want to retain the crash reports before they are
automatically purged. The default for this option is one year.
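For example, a minimal sketch of shortening the retention period to 30 days, expressed in seconds (the value is illustrative):

Example

[ceph: root@host01 /]# ceph config set mgr mgr/crash/retain_interval 2592000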

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Procedure
Edit online

1. Ensure the crash module is enabled:

Example

[ceph: root@host01 /]# ceph mgr module ls | more


{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator_cli",
"progress",
"rbd_support",
"status",
"volumes"
],
"enabled_modules": [
"dashboard",
"pg_autoscaler",
"prometheus"
]

2. Save a crash dump: The metadata file is a JSON blob stored in the crash dir as meta. You can invoke the ceph command with the -i - option, which reads from stdin.

Example

[ceph: root@host01 /]# ceph crash post -i meta

3. List the timestamp or the UUID crash IDs for all the new and archived crash info:

Example

[ceph: root@host01 /]# ceph crash ls



4. List the timestamp or the UUID crash IDs for all the new crash information:

Example

[ceph: root@host01 /]# ceph crash ls-new

5. List the summary of saved crash information grouped by age:

Example

[ceph: root@host01 /]# ceph crash stat


8 crashes recorded
8 older than 1 days old:
2022-05-20T08:30:14.533316Z_4ea88673-8db6-4959-a8c6-0eea22d305c2
2022-05-20T08:30:14.590789Z_30a8bb92-2147-4e0f-a58b-a12c2c73d4f5
2022-05-20T08:34:42.278648Z_6a91a778-bce6-4ef3-a3fb-84c4276c8297
2022-05-20T08:34:42.801268Z_e5f25c74-c381-46b1-bee3-63d891f9fc2d
2022-05-20T08:34:42.803141Z_96adfc59-be3a-4a38-9981-e71ad3d55e47
2022-05-20T08:34:42.830416Z_e45ed474-550c-44b3-b9bb-283e3f4cc1fe
2022-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d
2022-05-24T19:58:44.315282Z_1847afbc-f8a9-45da-94e8-5aef0738954e

6. View the details of the saved crash:

Syntax

ceph crash info CRASH_ID

Example

[ceph: root@host01 /]# ceph crash info 2022-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d
{
"assert_condition": "session_map.sessions.empty()",
"assert_file": "/builddir/build/BUILD/ceph-16.1.0-486-g324d7073/src/mon/Monitor.cc",
"assert_func": "virtual Monitor::~Monitor()",
"assert_line": 287,
"assert_msg": "/builddir/build/BUILD/ceph-16.1.0-486-g324d7073/src/mon/Monitor.cc: In
function 'virtual Monitor::~Monitor()' thread 7f67a1aeb700 time 2022-05-
24T19:58:42.545485+0000\n/builddir/build/BUILD/ceph-16.1.0-486-g324d7073/src/mon/Monitor.cc:
287: FAILED ceph_assert(session_map.sessions.empty())\n",
"assert_thread_name": "ceph-mon",
"backtrace": [
"/lib64/libpthread.so.0(+0x12b30) [0x7f679678bb30]",
"gsignal()",
"abort()",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9)
[0x7f6798c8d37b]",
"/usr/lib64/ceph/libceph-common.so.2(+0x276544) [0x7f6798c8d544]",
"(Monitor::~Monitor()+0xe30) [0x561152ed3c80]",
"(Monitor::~Monitor()+0xd) [0x561152ed3cdd]",
"main()",
"__libc_start_main()",
"_start()"
],
"ceph_version": "16.2.8-65.el8cp",
"crash_id": "2022-07-06T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d",
"entity_name": "mon.ceph-adm4",
"os_id": "rhel",
"os_name": "Red Hat Enterprise Linux",
"os_version": "8.5 (Ootpa)",
"os_version_id": "8.5",
"process_name": "ceph-mon",
"stack_sig": "957c21d558d0cba4cee9e8aaf9227b3b1b09738b8a4d2c9f4dc26d9233b0d511",
"timestamp": "2022-07-06T19:58:42.549073Z",
"utsname_hostname": "host02",
"utsname_machine": "x86_64",
"utsname_release": "4.18.0-240.15.1.el8_3.x86_64",
"utsname_sysname": "Linux",



"utsname_version": "#1 SMP Wed Jul 06 03:12:15 EDT 2022"
}

7. Remove saved crashes older than KEEP days: Here, KEEP must be an integer.

Syntax

ceph crash prune KEEP

Example

[ceph: root@host01 /]# ceph crash prune 60

8. Archive a crash report so that it is no longer considered for the RECENT_CRASH health check and does not appear in the crash ls-new output. It still appears in the crash ls output.

Syntax

ceph crash archive CRASH_ID

Example

[ceph: root@host01 /]# ceph crash archive 2022-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d

9. Archive all crash reports:

Example

[ceph: root@host01 /]# ceph crash archive-all

10. Remove the crash dump:

Syntax

ceph crash rm CRASH_ID

Example

[ceph: root@host01 /]# ceph crash rm 2022-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d

Reference
Edit online

See the Health messages of a Ceph cluster section in the IBM Storage Ceph Troubleshooting Guide for more information on
Ceph health messages.

Management of OSDs
Edit online
As a storage administrator, you can use the Ceph Orchestrators to manage OSDs of an IBM Storage Ceph cluster.

Ceph OSDs
Ceph OSD node configuration
Automatically tuning OSD memory
Listing devices for Ceph OSD deployment
Zapping devices for Ceph OSD deployment
Deploying Ceph OSDs on all available devices
Deploying Ceph OSDs on specific devices and hosts
Advanced service specifications and filters for deploying OSDs
Deploying Ceph OSDs using advanced service specifications
Removing the OSD daemons
Replacing the OSDs
Replacing the OSDs with pre-created LVM
Replacing the OSDs in a non-colocated scenario



Stopping the removal of the OSDs
Activating the OSDs
Recalculating the placement groups

Ceph OSDs
Edit online
A Ceph OSD generally consists of one ceph-osd daemon for one storage drive and its associated journal within a node. If a node has
multiple storage drives, then map one ceph-osd daemon for each drive.

IBM recommends checking the capacity of a cluster regularly to see if it is reaching the upper end of its storage capacity. As a
storage cluster reaches its near full ratio, add one or more OSDs to expand the storage cluster’s capacity.

If the node has multiple storage drives, you might also need to remove one of the ceph-osd daemons for a drive. Generally, it is a good idea to check the capacity of the storage cluster to see if you are reaching the upper end of its capacity. Ensure that when you remove an OSD, the storage cluster is not at its near full ratio.

IMPORTANT: Do not let a storage cluster reach the full ratio before adding an OSD. OSD failures that occur after the storage
cluster reaches the near full ratio can cause the storage cluster to exceed the full ratio. Ceph blocks write access to protect the
data until you resolve the storage capacity issues. Do not remove OSDs without considering the impact on the full ratio first.

Ceph OSD node configuration


Edit online
Configure Ceph OSDs and their supporting hardware according to the storage strategy for the pools that will use the OSDs. Ceph prefers uniform hardware across pools for a consistent performance profile. For best performance, consider a CRUSH hierarchy with drives of the same type or size.

If you add drives of dissimilar size, adjust their weights accordingly. When you add the OSD to the CRUSH map, consider the weight
for the new OSD. Hard drive capacity grows approximately 40% per year, so newer OSD nodes might have larger hard drives than
older nodes in the storage cluster, that is, they might have a greater weight.

Automatically tuning OSD memory


Edit online
The OSD daemons adjust the memory consumption based on the osd_memory_target configuration option. The option
osd_memory_target sets OSD memory based upon the available RAM in the system.

Syntax

ceph config set osd osd_memory_target_autotune true

Cephadm starts with a fraction, mgr/cephadm/autotune_memory_target_ratio, which defaults to 0.7 of the total RAM in the system, subtracts any memory consumed by daemons that are not autotuned, such as non-OSD daemons and OSDs for which osd_memory_target_autotune is false, and then divides the remainder by the number of remaining OSDs.

By default, autotune_memory_target_ratio is 0.2 for hyper-converged infrastructure and 0.7 for other environments.

The osd_memory_target parameter is calculated as follows:

Syntax

osd_memory_target = TOTAL_RAM_OF_THE_OSD_NODE (in Bytes) * (autotune_memory_target_ratio) / NUMBER_OF_OSDS_IN_THE_OSD_NODE - (SPACE_ALLOCATED_FOR_OTHER_DAEMONS (in Bytes))

SPACE_ALLOCATED_FOR_OTHER_DAEMONS may optionally include the following daemon space allocations:

Alertmanager: 1 GB



Grafana: 1 GB

Ceph Manager: 4 GB

Ceph Monitor: 2 GB

Node-exporter: 1 GB

Prometheus: 1 GB

For example, if a node has 24 OSDs and has 251 GB RAM space, then osd_memory_target is 7860684936.
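The following is a minimal sketch of that arithmetic, assuming 251 GiB of RAM, the default ratio of 0.7, 24 OSDs, and no space reserved for other daemons; it only illustrates where the value 7860684936 comes from.

Example

# Assumed inputs: 251 GiB RAM, autotune_memory_target_ratio 0.7, 24 OSDs, no other daemons.
# 251 GiB = 251 * 1024^3 bytes = 269509197824 bytes
# 269509197824 * 0.7 / 24 = 7860684936 bytes, roughly 7.3 GiB per OSD
[root@host01 ~]# echo "269509197824 * 0.7 / 24" | bc
7860684936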

The final targets are reflected in the configuration database with options. You can view the limits and the current memory consumed
by each daemon from the ceph orch ps output under MEM LIMIT column.
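For example, the following sketch shows two ways to inspect the applied values; osd.123 is an assumed OSD ID.

Example

# View the target stored in the configuration database for one OSD:
[ceph: root@host01 /]# ceph config get osd.123 osd_memory_target

# View the MEM LIMIT column for the running OSD daemons:
[ceph: root@host01 /]# ceph orch ps --daemon_type=osd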

NOTE: In a hyperconverged infrastructure, the autotune_memory_target_ratio can be set to 0.2 to reduce the memory
consumption of Ceph.

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2

You can manually set a specific memory target for an OSD in the storage cluster.

Example

[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target 7860684936

You can manually set a specific memory target for an OSD host in the storage cluster.

Syntax

ceph config set osd/host:HOSTNAME osd_memory_target TARGET_BYTES

Example

[ceph: root@host01 /]# ceph config set osd/host:host01 osd_memory_target 1000000000

NOTE: Enabling osd_memory_target_autotune overwrites existing manual OSD memory target settings. To prevent daemon
memory from being tuned even when the osd_memory_target_autotune option or other similar options are enabled, set the
_no_autotune_memory label on the host.

Syntax

ceph orch host label add HOSTNAME _no_autotune_memory

You can exclude an OSD from memory autotuning by disabling the autotune option and setting a specific memory target.

Example

[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target_autotune false


[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target 16G

Listing devices for Ceph OSD deployment


Edit online
You can check the list of available devices before deploying OSDs using the Ceph Orchestrator. The commands are used to print a list
of devices discoverable by Cephadm. A storage device is considered available if all of the following conditions are met:

The device must have no partitions.

The device must not have any LVM state.

The device must not be mounted.

The device must not contain a file system.

The device must not contain a Ceph BlueStore OSD.



The device must be larger than 5 GB.

NOTE: Ceph will not provision an OSD on a device that is not available.
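As a quick sanity check against the conditions listed above, you can inspect a candidate device directly on the host before relying on the Orchestrator; this is only a sketch, and /dev/sdb is an assumed device name.

Example

# Confirm that the device has no partitions, file system, or mount point:
[root@host01 ~]# lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/sdb

# Confirm that the device carries no LVM state:
[root@host01 ~]# pvs | grep sdb || echo "no LVM physical volume on /dev/sdb"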

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

All manager and monitor daemons are deployed.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the available devices to deploy OSDs:

Syntax

ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

Example

[ceph: root@host01 /]# ceph orch device ls --wide --refresh

Using the --wide option provides all details relating to the device, including any reasons that the device might not be eligible
for use as an OSD. This option does not support NVMe devices.

3. Optional: To enable Health, Ident, and Fault fields in the output of ceph orch device ls, run the following commands:

NOTE: These fields are supported by the libstoragemgmt library, which currently supports SCSI, SAS, and SATA devices.

a. As the root user outside the Cephadm shell, check your hardware's compatibility with the libstoragemgmt library to avoid
unplanned interruption to services:

Example

[root@host01 ~]# cephadm shell lsmcli ldl

In the output, you see the Health Status as Good with the respective SCSI VPD 0x83 ID.

NOTE: If you do not get this information, then enabling the fields might cause erratic behavior of devices.

b. Log back into the Cephadm shell and enable libstoragemgmt support:

Example

[root@host01 ~]# cephadm shell


[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/device_enhanced_scan true

Once this is enabled, ceph orch device ls gives the output of Health field as Good.

Verification
Edit online

List the devices:

Example



[ceph: root@host01 /]# ceph orch device ls

Zapping devices for Ceph OSD deployment


Edit online
You need to check the list of available devices before deploying OSDs. If there is no space available on the devices, you can clear the
data on the devices by zapping them.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

All manager and monitor daemons are deployed.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the available devices to deploy OSDs:

Syntax

ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

Example

[ceph: root@host01 /]# ceph orch device ls --wide --refresh

3. Clear the data of a device:

Syntax

ceph orch device zap HOSTNAME FILE_PATH --force

Example

[ceph: root@host01 /]# ceph orch device zap host02 /dev/sdb --force

Verification
Edit online

Verify the space is available on the device:

Example

[ceph: root@host01 /]# ceph orch device ls

You will see that the field under Available is Yes.

Reference
Edit online



See the Listing devices for Ceph OSD deployment section in the IBM Storage Ceph Operations Guide for more information.

Deploying Ceph OSDs on all available devices


Edit online
You can deploy all OSDs on all the available devices. Cephadm allows the Ceph Orchestrator to discover and deploy the OSDs on any
available and unused storage device.

To deploy OSDs on all available devices, run the command without the unmanaged parameter, and then re-run the command with the
parameter to prevent the creation of future OSDs.

NOTE: The deployment of OSDs with --all-available-devices is generally used for smaller clusters. For larger clusters, use
the OSD specification file.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

All manager and monitor daemons are deployed.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the available devices to deploy OSDs:

Syntax

ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

Example

[ceph: root@host01 /]# ceph orch device ls --wide --refresh

3. Deploy OSDs on all available devices:

Example

[ceph: root@host01 /]# ceph orch apply osd --all-available-devices

The effect of ceph orch apply is persistent which means that the Orchestrator automatically finds the device, adds it to the
cluster, and creates new OSDs. This occurs under the following conditions:

New disks or drives are added to the system.

Existing disks or drives are zapped.

An OSD is removed and the devices are zapped.

You can disable automatic creation of OSDs on all the available devices by using the --unmanaged parameter.

Example

[ceph: root@host01 /]# ceph orch apply osd --all-available-devices --unmanaged=true



Setting the --unmanaged parameter to true disables the automatic creation of OSDs, and there is also no change if you
apply a new OSD service.

NOTE: The command ceph orch daemon add creates new OSDs, but does not add an OSD service.
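The same behavior can also be expressed declaratively. The following is a minimal sketch of a service specification that is roughly equivalent to running the command with --all-available-devices and --unmanaged=true; the service_id is an assumed name.

Example

service_type: osd
service_id: all-available-devices   # assumed service name
unmanaged: true                     # do not create new OSDs automatically
placement:
  host_pattern: '*'
data_devices:
  all: true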

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

View the details of the node and devices:

Example

[ceph: root@host01 /]# ceph osd tree

Reference
Edit online

See the Listing devices for Ceph OSD deployment section in the IBM Storage Ceph Operations Guide.

Deploying Ceph OSDs on specific devices and hosts


Edit online
You can deploy all the Ceph OSDs on specific devices and hosts using the Ceph Orchestrator.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

All manager and monitor daemons are deployed.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the available devices to deploy OSDs:

Syntax

ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]

Example

[ceph: root@host01 /]# ceph orch device ls --wide --refresh

3. Deploy OSDs on specific devices and hosts:



Syntax

ceph orch daemon add osd HOSTNAME:DEVICE_PATH

Example

[ceph: root@host01 /]# ceph orch daemon add osd host02:/dev/sdb

To deploy OSDs on a raw physical device, without an LVM layer, use the --method raw option.

Syntax

ceph orch daemon add osd --method raw HOSTNAME:DEVICE_PATH

Example

[ceph: root@host01 /]# ceph orch daemon add osd --method raw host02:/dev/sdb

NOTE: If you have separate DB or WAL devices, the ratio of block to DB or WAL devices MUST be 1:1.

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls osd

View the details of the node and devices:

Example

[ceph: root@host01 /]# ceph osd tree

List the hosts, daemons, and processes:

Syntax

ceph orch ps --service_name=SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch ps --service_name=osd

Reference
Edit online

See the Listing devices for Ceph OSD deployment section in the IBM Storage Ceph Operations Guide.

Advanced service specifications and filters for deploying OSDs


Edit online
Service Specification of type OSD is a way to describe a cluster layout using the properties of disks. It gives the user an abstract way
to tell Ceph which disks should turn into an OSD with the required configuration without knowing the specifics of device names and
paths. For each device and each host, define a yaml file or a json file.

General settings for OSD specifications

service_type: osd: This is mandatory to create OSDs.

service_id: Use the service name or identification you prefer. A set of OSDs is created using the specification file. This name is
used to manage all the OSDs together and represent an Orchestrator service.

placement: This is used to define the hosts on which the OSDs need to be deployed.



You can use one of the following options:

host_pattern: '*' - A host name pattern used to select hosts.

label: osd_host - A label used on the hosts where OSDs need to be deployed.

hosts: host01, host02 - An explicit list of host names where OSDs need to be deployed.

selection of devices: The devices on which OSDs are created. This allows you to build an OSD from different devices. You can
create only BlueStore OSDs, which have three components:

OSD data: contains all the OSD data

WAL: BlueStore internal journal or write-ahead Log

DB: BlueStore internal metadata

data_devices: Define the devices on which to deploy the OSDs. In this case, OSDs are created in a collocated schema. You can use filters to
select devices and folders.

wal_devices: Define the devices used for WAL OSDs. You can use filters to select devices and folders.

db_devices: Define the devices for DB OSDs. You can use the filters to select devices and folders.

encrypted: An optional parameter to encrypt information on the OSD, which can be set to either True or False.

unmanaged: An optional parameter, set to False by default. You can set it to True if you do not want the Orchestrator to
manage the OSD service.

block_wal_size: User-defined value, in bytes.

block_db_size: User-defined value, in bytes.

osds_per_device: User-defined value for deploying more than one OSD per device.

method: An optional parameter to specify if an OSD is created with an LVM layer or not. Set to raw if you want to create OSDs
on raw physical devices that do not include an LVM layer. If you have separate DB or WAL devices, the ratio of block to DB or
WAL devices MUST be 1:1.
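As a rough illustration of how these settings fit together, the following sketch combines several of the options described above in a single specification; the service name, label, and device choices are assumptions only, and the next sections contain complete, scenario-specific examples.

Example

service_type: osd                # mandatory
service_id: osd_example_spec     # assumed service name
placement:
  label: osd_host                # deploy on hosts that carry this label
data_devices:
  rotational: 1                  # rotational drives hold the OSD data
db_devices:
  rotational: 0                  # non-rotational drives hold the BlueStore DB
encrypted: true                  # encrypt data on the OSDs
unmanaged: false                 # let the Orchestrator manage this service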

Filters for specifying devices

Filters are used in conjunction with the data_devices, wal_devices and db_devices parameters.

Name of the filter: Model
Description: Target specific disks. You can get details of the model by running the lsblk -o NAME,FSTYPE,LABEL,MOUNTPOINT,SIZE,MODEL command or the smartctl -i /DEVICE_PATH command.
Syntax: model: DISK_MODEL_NAME
Example: model: MC-55-44-XZ

Name of the filter: Vendor
Description: Target specific disks.
Syntax: vendor: DISK_VENDOR_NAME
Example: vendor: Vendor Cs

Name of the filter: Size Specification
Description: Includes disks of an exact size.
Syntax: size: EXACT
Example: size: 10G

Name of the filter: Size Specification
Description: Includes disks whose size is within the range.
Syntax: size: LOW:HIGH
Example: size: 10G:40G

Name of the filter: Size Specification
Description: Includes disks less than or equal to the given size.
Syntax: size: :HIGH
Example: size: :10G

Name of the filter: Size Specification
Description: Includes disks equal to or greater than the given size.
Syntax: size: LOW:
Example: size: 40G:

Name of the filter: Rotational
Description: Rotational attribute of the disk. 1 matches all disks that are rotational and 0 matches all disks that are non-rotational. If rotational: 0, the OSD is configured with SSDs or NVMe devices. If rotational: 1, the OSD is configured with HDDs.
Syntax: rotational: 0 or 1
Example: rotational: 0

Name of the filter: All
Description: Considers all the available disks.
Syntax: all: true
Example: all: true

Name of the filter: Limiter
Description: When you have specified valid filters but want to limit the number of matching disks, you can use the limit directive. It should be used only as a last resort.
Syntax: limit: NUMBER
Example: limit: 2
NOTE: To create an OSD with non-collocated components in the same host, you have to specify the different types of devices used
and the devices should be on the same host.

NOTE: The devices used for deploying OSDs must be supported by libstoragemgmt.

Reference
Edit online

See the Deploying Ceph OSDs using the advanced specifications section in the IBM Storage Ceph Operations Guide.

For more information on libstoragemgmt, see the Listing devices for Ceph OSD deployment section in the IBM Storage Ceph
Operations Guide.

Deploying Ceph OSDs using advanced service specifications


Edit online
The service specification of type OSD is a way to describe a cluster layout using the properties of disks. It gives the user an abstract
way to tell Ceph which disks should turn into an OSD with the required configuration without knowing the specifics of device names
and paths.

You can deploy the OSD for each device and each host by defining a yaml file or a json file.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

All manager and monitor daemons are deployed.

Procedure
Edit online

1. On the monitor node, create the osd_spec.yaml file:

Example

[root@host01 ~]# touch osd_spec.yaml

2. Edit the osd_spec.yaml file to include the following details:

Syntax

service_type: osd
service_id: SERVICE_ID
placement:
  host_pattern: '*' # optional
data_devices: # optional
  model: DISK_MODEL_NAME # optional
  paths:
  - /DEVICE_PATH
osds_per_device: NUMBER_OF_DEVICES # optional
db_devices: # optional
  size: # optional
  all: true # optional
  paths:
  - /DEVICE_PATH
encrypted: true

a. Simple scenarios: In these cases, all the nodes have the same set-up.



Example

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  all: true
  paths:
  - /dev/sdb
encrypted: true

Example

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  size: '80G'
db_devices:
  size: '40G:'
  paths:
  - /dev/sdc

b. Simple scenario: In this case, all the nodes have the same setup with OSD devices created in raw mode, without an LVM
layer.

Example

service_type: osd
service_id: all-available-devices
encrypted: "true"
method: raw
placement:
  host_pattern: "*"
data_devices:
  all: "true"

c. Advanced scenario: This would create the desired layout by using all HDDs as data_devices with two SSD assigned
as dedicated DB or WAL devices. The remaining SSDs are data_devices that have the NVMEs vendors assigned as
dedicated DB or WAL devices.

Example

service_type: osd
service_id: osd_spec_hdd
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  model: Model-name
  limit: 2
---
service_type: osd
service_id: osd_spec_ssd
placement:
  host_pattern: '*'
data_devices:
  model: Model-name
db_devices:
  vendor: Vendor-name

d. Advanced scenario with non-uniform nodes: This applies different OSD specs to different hosts depending on the
host_pattern key.

Example

service_type: osd
service_id: osd_spec_node_one_to_five
placement:
  host_pattern: 'node[1-5]'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
---
service_type: osd
service_id: osd_spec_six_to_ten
placement:
  host_pattern: 'node[6-10]'
data_devices:
  model: Model-name
db_devices:
  model: Model-name

e. Advanced scenario with dedicated WAL and DB devices:

Example

service_type: osd
service_id: osd_using_paths
placement:
  hosts:
  - host01
  - host02
data_devices:
  paths:
  - /dev/sdb
db_devices:
  paths:
  - /dev/sdc
wal_devices:
  paths:
  - /dev/sdd

f. Advanced scenario with multiple OSDs per device:

Example

service_type: osd
service_id: multiple_osds
placement:
  hosts:
  - host01
  - host02
osds_per_device: 4
data_devices:
  paths:
  - /dev/sdb

g. For pre-created volumes, edit the osd_spec.yaml file to include the following details:

Syntax

service_type: osd
service_id: SERVICE_ID
placement:
  hosts:
  - HOSTNAME
data_devices: # optional
  model: DISK_MODEL_NAME # optional
  paths:
  - /DEVICE_PATH
db_devices: # optional
  size: # optional
  all: true # optional
  paths:
  - /DEVICE_PATH

Example

service_type: osd
service_id: osd_spec
placement:
  hosts:
  - machine1
data_devices:
  paths:
  - /dev/vg_hdd/lv_hdd
db_devices:
  paths:
  - /dev/vg_nvme/lv_nvmes

3. Mount the YAML file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount osd_spec.yaml:/var/lib/ceph/osd/osd_spec.yaml

4. Navigate to the directory:

Example

[ceph: root@host01 /]# cd /var/lib/ceph/osd/

5. Before deploying OSDs, do a dry run:

NOTE: This step gives a preview of the deployment, without deploying the daemons.

Example

[ceph: root@host01 osd]# ceph orch apply -i osd_spec.yaml --dry-run

6. Deploy OSDs using service specification:

Syntax

ceph orch apply -i FILE_NAME.yml

Example

[ceph: root@host01 osd]# ceph orch apply -i osd_spec.yaml

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls osd

View the details of the node and devices:

Example

[ceph: root@host01 /]# ceph osd tree

Reference
Edit online

See the Advanced service specifications and filters for deploying OSDs section in the IBM Storage Ceph Operations Guide.

Removing the OSD daemons


Edit online
You can remove the OSD from a cluster by using Cephadm.

Removing an OSD from a cluster involves two steps:

1. Evacuating all placement groups (PGs) from the OSD.

2. Removing the PG-free OSD from the cluster.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Ceph Monitor, Ceph Manager and Ceph OSD daemons are deployed on the storage cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Check the device and the node from which the OSD has to be removed:

Example

[ceph: root@host01 /]# ceph osd tree

3. Remove the OSD:

Syntax

ceph orch osd rm OSD_ID [--replace] [--force] --zap

Example

[ceph: root@host01 /]# ceph orch osd rm 0 --zap

NOTE: If you remove the OSD from the storage cluster without an option, such as --replace, the device is removed from the
storage cluster completely. If you want to use the same device for deploying OSDs, you have to first zap the device before
adding it to the storage cluster.

4. Optional: To remove multiple OSDs from a specific node, run the following command:

Syntax

ceph orch osd rm OSD_ID OSD_ID --zap

Example

[ceph: root@host01 /]# ceph orch osd rm 2 5 --zap

5. Check the status of the OSD removal:

Example

[ceph: root@host01 /]# ceph orch osd rm status


OSD HOST STATE PGS REPLACE FORCE ZAP DRAIN STARTED AT
9 host01 done, waiting for purge 0 False False True 2023-06-06 17:50:50.525690
10 host03 done, waiting for purge 0 False False True 2023-06-06 17:49:38.731533
11 host02 done, waiting for purge 0 False False True 2023-06-06 17:48:36.641105

When no PGs are left on the OSD, it is decommissioned and removed from the cluster.

Verification
Edit online

Verify the details of the devices and the nodes from which the Ceph OSDs are removed:

Example



[ceph: root@host01 /]# ceph osd tree

Reference
Edit online

See the Deploying Ceph OSDs on all available devices section in the IBM Storage Ceph Operations Guide for more information.

See the Deploying Ceph OSDs on specific devices and hosts section in the IBM Storage Ceph Operations Guide for more
information.

See the Zapping devices for Ceph OSD deployment section in the IBM Storage Ceph Operations Guide for more information on
clearing space on devices.

Replacing the OSDs


Edit online
When disks fail, you can replace the physical storage device and reuse the same OSD ID to avoid having to reconfigure the CRUSH
map.

You can replace the OSDs in the cluster while preserving the OSD ID by using the ceph orch osd rm command.

NOTE: If you want to replace a single OSD, see Deploying Ceph OSDs on specific devices and hosts. If you want to deploy OSDs on all
available devices, see Deploying Ceph OSDs on all available devices.

The OSD is not permanently removed from the CRUSH hierarchy, but is assigned the destroyed flag. This flag is used to determine
which OSD IDs can be reused in the next OSD deployment.

If you use an OSD specification for deployment, your newly added disk is assigned the OSD ID of its replaced counterpart.
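To see which OSD IDs are currently flagged as destroyed, and are therefore candidates for reuse, you can filter the CRUSH tree output; this is a sketch only.

Example

[ceph: root@host01 /]# ceph osd tree | grep destroyed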

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Monitor, Manager, and OSD daemons are deployed on the storage cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Check the device and the node from which the OSD has to be replaced:

Example

[ceph: root@host01 /]# ceph osd tree

3. Replace the OSD:

IMPORTANT: If the storage cluster has health_warn or other errors associated with it, check the cluster and try to fix any errors
before replacing the OSD to avoid data loss.

Syntax



ceph orch osd rm OSD_ID --replace

Example

[ceph: root@host01 /]# ceph orch osd rm 0 --replace

4. Check the status of the OSD replacement:

Example

[ceph: root@host01 /]# ceph orch osd rm status

Verification
Edit online

Verify the details of the devices and the nodes from which the Ceph OSDs are replaced:

Example

[ceph: root@host01 /]# ceph osd tree

You will see an OSD with the same id as the one you replaced running on the same host.

Reference
Edit online

See the Deploying Ceph OSDs on all available devices section in the IBM Storage Ceph Operations Guide for more information.

See the Deploying Ceph OSDs on specific devices and hosts section in the IBM Storage Ceph Operations Guide for more
information.

Replacing the OSDs with pre-created LVM


Edit online
After purging the OSD with the ceph-volume lvm zap command, if the directory is not present, then you can replace the OSD
by using the OSD service specification file with the pre-created LVM.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Failed OSD

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Remove the OSD:

Syntax

ceph orch osd rm OSD_ID [--replace]

Example



[ceph: root@host01 /]# ceph orch osd rm 8 --replace
Scheduled OSD(s) for removal

3. Verify the OSD is destroyed:

Example

[ceph: root@host01 /]# ceph osd tree

ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF


-1 0.32297 root default
-9 0.05177 host host10
3 hdd 0.01520 osd.3 up 1.00000 1.00000
13 hdd 0.02489 osd.13 up 1.00000 1.00000
17 hdd 0.01169 osd.17 up 1.00000 1.00000
-13 0.05177 host host11
2 hdd 0.01520 osd.2 up 1.00000 1.00000
15 hdd 0.02489 osd.15 up 1.00000 1.00000
19 hdd 0.01169 osd.19 up 1.00000 1.00000
-7 0.05835 host host12
20 hdd 0.01459 osd.20 up 1.00000 1.00000
21 hdd 0.01459 osd.21 up 1.00000 1.00000
22 hdd 0.01459 osd.22 up 1.00000 1.00000
23 hdd 0.01459 osd.23 up 1.00000 1.00000
-5 0.03827 host host04
1 hdd 0.01169 osd.1 up 1.00000 1.00000
6 hdd 0.01129 osd.6 up 1.00000 1.00000
7 hdd 0.00749 osd.7 up 1.00000 1.00000
9 hdd 0.00780 osd.9 up 1.00000 1.00000
-3 0.03816 host host05
0 hdd 0.01169 osd.0 up 1.00000 1.00000
8 hdd 0.01129 osd.8 destroyed 0 1.00000
12 hdd 0.00749 osd.12 up 1.00000 1.00000
16 hdd 0.00769 osd.16 up 1.00000 1.00000
-15 0.04237 host host06
5 hdd 0.01239 osd.5 up 1.00000 1.00000
10 hdd 0.01540 osd.10 up 1.00000 1.00000
11 hdd 0.01459 osd.11 up 1.00000 1.00000
-11 0.04227 host host07
4 hdd 0.01239 osd.4 up 1.00000 1.00000
14 hdd 0.01529 osd.14 up 1.00000 1.00000
18 hdd 0.01459 osd.18 up 1.00000 1.00000

4. Zap and remove the OSD using the ceph-volume command:

Syntax

ceph-volume lvm zap --osd-id OSD_ID

Example

[ceph: root@host01 /]# ceph-volume lvm zap --osd-id 8

Zapping: /dev/vg1/data-lv2
Closing encrypted path /dev/mapper/l4D6ql-Prji-IzH4-dfhF-xzuf-5ETl-jNRcXC
Running command: /usr/sbin/cryptsetup remove /dev/mapper/l4D6ql-Prji-IzH4-dfhF-xzuf-5ETl-
jNRcXC
Running command: /usr/bin/dd if=/dev/zero of=/dev/vg1/data-lv2 bs=1M count=10 conv=fsync
stderr: 10+0 records in
10+0 records out
stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.034742 s, 302 MB/s
Zapping successful for OSD: 8

5. Check the OSD topology:

Example

[ceph: root@host01 /]# ceph-volume lvm list

6. Recreate the OSD with a specification file corresponding to that specific OSD topology:

Example

[ceph: root@host01 /]# cat osd.yml

service_type: osd
service_id: osd_service
placement:
  hosts:
  - host03
data_devices:
  paths:
  - /dev/vg1/data-lv2
db_devices:
  paths:
  - /dev/vg1/db-lv1

7. Apply the updated specification file:

Example

[ceph: root@host01 /]# ceph orch apply -i osd.yml


Scheduled osd.osd_service update...

8. Verify the OSD is back:

Example

[ceph: root@host01 /]# ceph -s


[ceph: root@host01 /]# ceph osd tree

Replacing the OSDs in a non-colocated scenario


Edit online
When an OSD fails in a non-colocated scenario, you can replace the WAL/DB devices. The procedure is the same for DB and WAL
devices. You need to edit the paths under db_devices for DB devices and the paths under wal_devices for WAL devices.
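For example, if the failed device was serving as a dedicated WAL device, the specification fragment that you edit would look similar to the following sketch; the service name, host pattern, and device paths are assumptions only.

Example

service_type: osd
service_id: non-colocated-wal    # assumed service name
placement:
  host_pattern: 'host02'         # assumed host
data_devices:
  paths:
  - /dev/sdb
wal_devices:
  paths:
  - /dev/sdi                     # assumed path of the replacement WAL device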

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Daemons are non-colocated.

Failed OSD

Procedure
Edit online

1. Identify the devices in the cluster:

Example

[root@host01 ~]# lsblk

NAME                                                                                                  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0  20G  0 disk
├─sda1                                                                                                  8:1    0   1G  0 part /boot
└─sda2                                                                                                  8:2    0  19G  0 part
  ├─rhel-root                                                                                         253:0    0  17G  0 lvm  /
  └─rhel-swap                                                                                         253:1    0   2G  0 lvm  [SWAP]
sdb                                                                                                     8:16   0  10G  0 disk
└─ceph--5726d3e9--4fdb--4eda--b56a--3e0df88d663f-osd--block--3ceb89ec--87ef--46b4--99c6--2a56bac09ff0 253:2    0  10G  0 lvm
sdc                                                                                                     8:32   0  10G  0 disk
└─ceph--d7c9ab50--f5c0--4be0--a8fd--e0313115f65c-osd--block--37c370df--1263--487f--a476--08e28bdbcd3c 253:4    0  10G  0 lvm
sdd                                                                                                     8:48   0  10G  0 disk
├─ceph--1774f992--44f9--4e78--be7b--b403057cf5c3-osd--db--31b20150--4cbc--4c2c--9c8f--6f624f3bfd89    253:7    0 2.5G  0 lvm
└─ceph--1774f992--44f9--4e78--be7b--b403057cf5c3-osd--db--1bee5101--dbab--4155--a02c--e5a747d38a56    253:9    0 2.5G  0 lvm
sde                                                                                                     8:64   0  10G  0 disk
sdf                                                                                                     8:80   0  10G  0 disk
└─ceph--412ee99b--4303--4199--930a--0d976e1599a2-osd--block--3a99af02--7c73--4236--9879--1fad1fe6203d 253:6    0  10G  0 lvm
sdg                                                                                                     8:96   0  10G  0 disk
└─ceph--316ca066--aeb6--46e1--8c57--f12f279467b4-osd--block--58475365--51e7--42f2--9681--e0c921947ae6 253:8    0  10G  0 lvm
sdh                                                                                                     8:112  0  10G  0 disk
├─ceph--d7064874--66cb--4a77--a7c2--8aa0b0125c3c-osd--db--0dfe6eca--ba58--438a--9510--d96e6814d853    253:3    0   5G  0 lvm
└─ceph--d7064874--66cb--4a77--a7c2--8aa0b0125c3c-osd--db--26b70c30--8817--45de--8843--4c0932ad2429    253:5    0   5G  0 lvm
sr0

2. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

3. Identify the OSDs and their DB device:

Example

[ceph: root@host01 /]# ceph-volume lvm list /dev/sdh

====== osd.2 =======

[db] /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-0dfe6eca-ba58-438a-
9510-d96e6814d853

block device /dev/ceph-5726d3e9-4fdb-4eda-b56a-3e0df88d663f/osd-block-


3ceb89ec-87ef-46b4-99c6-2a56bac09ff0
block uuid GkWLoo-f0jd-Apj2-Zmwj-ce0h-OY6J-UuW8aD
cephx lockbox secret
cluster fsid fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
cluster name ceph
crush device class
db device /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-
0dfe6eca-ba58-438a-9510-d96e6814d853
db uuid 6gSPoc-L39h-afN3-rDl6-kozT-AX9S-XR20xM
encrypted 0
osd fsid 3ceb89ec-87ef-46b4-99c6-2a56bac09ff0
osd id 2
osdspec affinity non-colocated
type db
vdo 0
devices /dev/sdh

====== osd.5 =======

[db] /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-26b70c30-8817-45de-
8843-4c0932ad2429

block device /dev/ceph-d7c9ab50-f5c0-4be0-a8fd-e0313115f65c/osd-block-


37c370df-1263-487f-a476-08e28bdbcd3c
block uuid Eay3I7-fcz5-AWvp-kRcI-mJaH-n03V-Zr0wmJ
cephx lockbox secret
cluster fsid fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
cluster name ceph
crush device class
db device /dev/ceph-d7064874-66cb-4a77-a7c2-8aa0b0125c3c/osd-db-



26b70c30-8817-45de-8843-4c0932ad2429
db uuid mwSohP-u72r-DHcT-BPka-piwA-lSwx-w24N0M
encrypted 0
osd fsid 37c370df-1263-487f-a476-08e28bdbcd3c
osd id 5
osdspec affinity non-colocated
type db
vdo 0
devices /dev/sdh

4. In the osds.yml file, set the unmanaged parameter to true; otherwise, cephadm redeploys the OSDs:

Example

[ceph: root@host01 /]# cat osds.yml

service_type: osd
service_id: non-colocated
unmanaged: true
placement:
  host_pattern: 'ceph*'
data_devices:
  paths:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdf
  - /dev/sdg
db_devices:
  paths:
  - /dev/sdd
  - /dev/sdh

5. Apply the updated specification file:

Example

[ceph: root@host01 /]# ceph orch apply -i osds.yml

Scheduled osd.non-colocated update...

6. Check the status:

Example

[ceph: root@host01 /]# ceph orch ls

NAME PORTS RUNNING REFRESHED AGE PLACEMENT


alertmanager ?:9093,9094 1/1 9m ago 4d count:1
crash 3/4 4d ago 4d *
grafana ?:3000 1/1 9m ago 4d count:1
mgr 1/2 4d ago 4d count:2
mon 3/5 4d ago 4d count:5
node-exporter ?:9100 3/4 4d ago 4d *
osd.non-colocated 8 4d ago 5s <unmanaged>
prometheus ?:9095 1/1 9m ago 4d count:1

7. Remove the OSDs. Ensure that you use the --zap option to remove the backend services and the --replace option to retain the
OSD IDs:

Example

[ceph: root@host01 /]# ceph orch osd rm 2 5 --zap --replace


Scheduled OSD(s) for removal

8. Check the status:

Example

[ceph: root@host01 /]# ceph osd df tree | egrep -i "ID|host02|osd.2|osd.5"

ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP  META    AVAIL   %USE   VAR   PGS  STATUS     TYPE NAME
-5         0.04877         -  55 GiB   15 GiB  4.1 MiB   0 B  60 MiB  40 GiB  27.27  1.17    -             host02
 2    hdd  0.01219   1.00000  15 GiB  5.0 GiB  996 KiB   0 B  15 MiB  10 GiB  33.33  1.43    0  destroyed  osd.2
 5    hdd  0.01219   1.00000  15 GiB  5.0 GiB  1.0 MiB   0 B  15 MiB  10 GiB  33.33  1.43    0  destroyed  osd.5

9. Edit the osds.yml specification file to change the unmanaged parameter to false, and replace the path to the DB device if it
changed after the device was physically replaced:

Example

[ceph: root@host01 /]# cat osds.yml

service_type: osd
service_id: non-colocated
unmanaged: false
placement:
  host_pattern: 'ceph01*'
data_devices:
  paths:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdf
  - /dev/sdg
db_devices:
  paths:
  - /dev/sdd
  - /dev/sde

In the above example, /dev/sdh is replaced with /dev/sde.

IMPORTANT: If you use the same host specification file to replace the faulty DB device on a single OSD node, modify the
host_pattern option to specify only the OSD node, else the deployment fails and you cannot find the new DB device on
other hosts.

10. Reapply the specification file with the --dry-run option to ensure that the OSDs will be deployed with the new DB device:

Example

[ceph: root@host01 /]# ceph orch apply -i osds.yml --dry-run


WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+-------+-------+----------+----------+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+-------+-------+----------+----------+-----+
|osd |non-colocated |host02 |/dev/sdb |/dev/sde |- |
|osd |non-colocated |host02 |/dev/sdc |/dev/sde |- |
+---------+-------+-------+----------+----------+-----+

11. Apply the specification file:

Example

[ceph: root@host01 /]# ceph orch apply -i osds.yml


Scheduled osd.non-colocated update...

12. Check the OSDs are redeployed:

Example

[ceph: root@host01 /]# ceph osd df tree | egrep -i "ID|host02|osd.2|osd.5"

ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP  META    AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
-5         0.04877         -  55 GiB   15 GiB  4.5 MiB   0 B  60 MiB  40 GiB  27.27  1.17    -          host host02
 2    hdd  0.01219   1.00000  15 GiB  5.0 GiB  1.1 MiB   0 B  15 MiB  10 GiB  33.33  1.43    0      up  osd.2
 5    hdd  0.01219   1.00000  15 GiB  5.0 GiB  1.1 MiB   0 B  15 MiB  10 GiB  33.33  1.43    0      up  osd.5

Verification
Edit online

1. From the OSD host where the OSDs are redeployed, verify whether they are on the new DB device:

Example

[ceph: root@host01 /]# ceph-volume lvm list /dev/sde

====== osd.2 =======

[db] /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-1998a02e-5e67-42a9-
b057-e02c22bbf461

block device /dev/ceph-a4afcb78-c804-4daf-b78f-3c7ad1ed0379/osd-block-


564b3d2f-0f85-4289-899a-9f98a2641979
block uuid ITPVPa-CCQ5-BbFa-FZCn-FeYt-c5N4-ssdU41
cephx lockbox secret
cluster fsid fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
cluster name ceph
crush device class
db device /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-
1998a02e-5e67-42a9-b057-e02c22bbf461
db uuid HF1bYb-fTK7-0dcB-CHzW-xvNn-dCym-KKdU5e
encrypted 0
osd fsid 564b3d2f-0f85-4289-899a-9f98a2641979
osd id 2
osdspec affinity non-colocated
type db
vdo 0
devices /dev/sde

====== osd.5 =======

[db] /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-6c154191-846d-4e63-
8c57-fc4b99e182bd

block device /dev/ceph-b37c8310-77f9-4163-964b-f17b4c29c537/osd-block-


b42a4f1f-8e19-4416-a874-6ff5d305d97f
block uuid 0LuPoz-ao7S-UL2t-BDIs-C9pl-ct8J-xh5ep4
cephx lockbox secret
cluster fsid fa0bd9dc-e4c4-11ed-8db4-001a4a00046e
cluster name ceph
crush device class
db device /dev/ceph-15ce813a-8a4c-46d9-ad99-7e0845baf15e/osd-db-
6c154191-846d-4e63-8c57-fc4b99e182bd
db uuid SvmXms-iWkj-MTG7-VnJj-r5Mo-Moiw-MsbqVD
encrypted 0
osd fsid b42a4f1f-8e19-4416-a874-6ff5d305d97f
osd id 5
osdspec affinity non-colocated
type db
vdo 0
devices /dev/sde

Stopping the removal of the OSDs


Edit online
You can stop the removal only of OSDs that are queued for removal. This resets the OSD to its initial state and takes it off the
removal queue.

If the OSD is in the process of removal, then you cannot stop the process.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Monitor, Manager and OSD daemons are deployed on the cluster.

Remove OSD process initiated.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Check the device and the node from which the OSD was initiated to be removed:

Example

[ceph: root@host01 /]# ceph osd tree

3. Stop the removal of the queued OSD:

Syntax

ceph orch osd rm stop OSD_ID

Example

[ceph: root@host01 /]# ceph orch osd rm stop 0

4. Check the status of the OSD removal:

Example

[ceph: root@host01 /]# ceph orch osd rm status

Verify the details of the devices and the nodes from which the Ceph OSDs were queued for removal:

Example

[ceph: root@host01 /]# ceph osd tree

Reference
Edit online

See the Removing the OSD daemons section in the IBM Storage Ceph Operations Guide for more information.

Activating the OSDs


Edit online
You can activate the OSDs in the cluster in cases where the operating system of the host was reinstalled.

Observing the data migration

Prerequisites



Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

Monitor, Manager and OSD daemons are deployed on the storage cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. After the operating system of the host is reinstalled, activate the OSDs:

Syntax

ceph cephadm osd activate HOSTNAME

Example

[ceph: root@host01 /]# ceph cephadm osd activate host03

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --service_name=SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch ps --service_name=osd

Observing the data migration


Edit online
When you add an OSD to or remove an OSD from the CRUSH map, Ceph begins rebalancing the data by migrating placement groups to the new or
existing OSDs. You can observe the data migration by using the ceph -w command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Recently added or removed an OSD.

Procedure



Edit online

1. To observe the data migration:

Example

[ceph: root@host01 /]# ceph -w

2. Watch as the placement group states change from active+clean to active, some degraded objects, and finally
active+clean when migration completes.

3. To exit the utility, press Ctrl + C.

Recalculating the placement groups


Edit online
Placement groups (PGs) define the spread of any pool data across the available OSDs. A placement group is built upon the
redundancy algorithm that is used. For 3-way replication, the redundancy is defined to use three different OSDs. For erasure-coded
pools, the number of OSDs to use is defined by the number of chunks.

When defining a pool, the number of placement groups determines the granularity with which the data is spread across all available
OSDs. The higher the number, the better the equalization of the capacity load. However, because the handling of placement groups is
also important when data must be reconstructed, the number needs to be chosen carefully upfront. To support the calculation, a PG
calculator tool is available.

During the lifetime of a storage cluster, a pool may grow above the initially anticipated limits. With a growing number of drives, a
recalculation is recommended. The number of placement groups per OSD should be around 100. When adding more OSDs to the
storage cluster, the number of PGs per OSD lowers over time. Starting with 120 drives in the storage cluster and setting the
pg_num of the pool to 4000 results in 100 PGs per OSD, given a replication factor of three. Over time, when growing to ten
times the number of OSDs, the number of PGs per OSD goes down to only ten. Because a small number of PGs per OSD tends to result in
unevenly distributed capacity, consider adjusting the PGs per pool.
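A minimal sketch of that rule of thumb, using the numbers from the paragraph above (a pg_num of 4000, a replication factor of three, and 120, later 1200, OSDs):

Example

# PGs per OSD = pg_num * replica_count / osd_count
[root@host01 ~]# echo "4000 * 3 / 120" | bc      # 100 PGs per OSD with 120 drives
[root@host01 ~]# echo "4000 * 3 / 1200" | bc     # 10 PGs per OSD after growing tenfold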

Adjusting the number of placement groups can be done online. Recalculating is not only a recalculation of the PG numbers, but also
involves data relocation, which is a lengthy process. However, data availability is maintained at all times.

Very high numbers of PGs per OSD should be avoided, because reconstruction of all PGs on a failed OSD starts at once. A high
number of IOPS is required to perform reconstruction in a timely manner, which might not be available. This would lead to deep I/O
queues and high latency, rendering the storage cluster unusable, or would result in long healing times.

Reference
Edit online

See the PG calculator for calculating the values by a given use case.

See the Erasure Code Pools chapter in the IBM Storage Ceph Strategies Guide for more information.

Management of monitoring stack


Edit online
As a storage administrator, you can use the Ceph Orchestrator with Cephadm in the backend to deploy monitoring and alerting stack.
The monitoring stack consists of Prometheus, Prometheus exporters, Prometheus Alertmanager, and Grafana. Users need to either
define these services with Cephadm in a YAML configuration file, or they can use the command line interface to deploy them. When
multiple services of the same type are deployed, a highly-available setup is deployed. The node exporter is an exception to this rule.

NOTE: IBM Storage Ceph 5.3 does not support custom images for deploying monitoring services such as Prometheus, Grafana,
Alertmanager, and node-exporter.

The following monitoring services can be deployed with Cephadm:



Prometheus is the monitoring and alerting toolkit. It collects the data provided by Prometheus exporters and fires
preconfigured alerts if predefined thresholds have been reached. The Prometheus manager module provides a Prometheus
exporter to pass on Ceph performance counters from the collection point in ceph-mgr.

The Prometheus configuration, including scrape targets, such as metrics providing daemons, is set up automatically by Cephadm.
Cephadm also deploys a list of default alerts, for example, health error, 10% OSDs down, or pgs inactive.

Alertmanager handles alerts sent by the Prometheus server. It deduplicates, groups, and routes the alerts to the correct
receiver. By default, the Ceph dashboard is automatically configured as the receiver. Alerts can be silenced using the
Alertmanager, but silences can also be managed using the Ceph Dashboard.

Grafana is a visualization and alerting software. The alerting functionality of Grafana is not used by this monitoring stack. For
alerting, the Alertmanager is used.

By default, traffic to Grafana is encrypted with TLS. You can either supply your own TLS certificate or use a self-signed one. If no
custom certificate has been configured before Grafana has been deployed, then a self-signed certificate is automatically created and
configured for Grafana. Custom certificates for Grafana can be configured using the following commands:

Syntax

ceph config-key set mgr/cephadm/grafana_key -i PRESENT_WORKING_DIRECTORY/key.pem

ceph config-key set mgr/cephadm/grafana_crt -i PRESENT_WORKING_DIRECTORY/certificate.pem
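If you want to supply your own certificate rather than rely on the automatically created one, you can first generate a self-signed key and certificate pair, for example with openssl; the subject name and validity period below are assumptions.

Example

[root@host01 ~]# openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout key.pem -out certificate.pem -subj "/CN=grafana.example.com"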

Node exporter is an exporter for Prometheus which provides data about the node on which it is installed. It is recommended to
install the node exporter on all nodes. This can be done using the monitoring.yml file with the node-exporter service type.

Deploying the monitoring stack


Removing the monitoring stack

Deploying the monitoring stack


Edit online
The monitoring stack consists of Prometheus, Prometheus exporters, Prometheus Alertmanager, and Grafana. Ceph Dashboard
makes use of these components to store and visualize detailed metrics on cluster usage and performance.

You can deploy the monitoring stack using the service specification in YAML file format. All the monitoring services can have the
network and port they bind to configured in the yml file.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the nodes.

Procedure
Edit online

1. Enable the prometheus module in the Ceph Manager daemon. This exposes the internal Ceph metrics so that Prometheus can
read them:

Example

[ceph: root@host01 /]# ceph mgr module enable prometheus

IMPORTANT: Ensure this command is run before Prometheus is deployed. If the command was not run before the
deployment, you must redeploy Prometheus to update the configuration:

ceph orch redeploy prometheus

2. Navigate to the following directory:



Syntax

cd /var/lib/ceph/DAEMON_PATH/

Example

[ceph: root@host01 mds/]# cd /var/lib/ceph/monitoring/

NOTE: If the directory monitoring does not exist, create it.

3. Create the monitoring.yml file:

Example

[ceph: root@host01 monitoring]# touch monitoring.yml

4. Edit the specification file with a content similar to the following example:

Example

service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - host01
networks:
- 192.169.142.0/24
---
service_type: node-exporter
---
service_type: alertmanager
service_name: alertmanager
placement:
  hosts:
  - host01
networks:
- 192.169.142.0/24
---
service_type: grafana
service_name: grafana
placement:
  hosts:
  - host01
networks:
- 192.169.142.0/24

NOTE: Ensure the monitoring stack components alertmanager, prometheus, and grafana are deployed on the same
host. The node-exporter component should be deployed on all the hosts.

5. Apply monitoring services:

Example

[ceph: root@host01 monitoring]# ceph orch apply -i monitoring.yml

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --service_name=SERVICE_NAME

Example



[ceph: root@host01 /]# ceph orch ps --service_name=prometheus

IMPORTANT: Prometheus, Grafana, and the Ceph dashboard are all automatically configured to talk to each other, resulting in a fully
functional Grafana integration in the Ceph dashboard.

Removing the monitoring stack


Edit online
You can remove the monitoring stack using the ceph orch rm command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Use the ceph orch rm command to remove the monitoring stack:

Syntax

ceph orch rm SERVICE_NAME --force

Example

[ceph: root@host01 /]# ceph orch rm grafana


[ceph: root@host01 /]# ceph orch rm prometheus
[ceph: root@host01 /]# ceph orch rm node-exporter
[ceph: root@host01 /]# ceph orch rm alertmanager
[ceph: root@host01 /]# ceph mgr module disable prometheus

3. Check the status of the process:

Example

[ceph: root@host01 /]# ceph orch status

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps

Example

[ceph: root@host01 /]# ceph orch ps



Reference
Edit online

See Deploying the monitoring stack section in the IBM Storage Ceph Operations Guide for more information.

Basic IBM Storage Ceph client setup


Edit online
As a storage administrator, you have to set up client machines with basic configuration to interact with the storage cluster. Most
client machines only need the ceph-common package and its dependencies installed. It will supply the basic ceph and rados
commands, as well as other commands like mount.ceph and rbd.

Configuring file setup on client machines


Setting-up keyring on client machines

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

All manager, monitor and OSD daemons are deployed.

Configuring file setup on client machines


Edit online
Client machines generally need a smaller configuration file than a full-fledged storage cluster member. You can generate a minimal
configuration file which can give details to clients to reach the Ceph monitors.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root access to the nodes.

Procedure
Edit online

1. On the node where you want to set up the files, create a directory ceph in the /etc folder:

Example

[root@host01 ~]# mkdir /etc/ceph/

2. Navigate to /etc/ceph directory:

Example

[root@host01 ~]# cd /etc/ceph/

3. Generate the configuration file in the ceph directory:



Example

[root@host01 ceph]# ceph config generate-minimal-conf

# minimal ceph.conf for 417b1d7a-a0e6-11eb-b940-001a4a000740


[global]
fsid = 417b1d7a-a0e6-11eb-b940-001a4a000740
mon_host = [v2:10.74.249.41:3300/0,v1:10.74.249.41:6789/0]

The contents of this file should be installed in the /etc/ceph/ceph.conf path. You can use this configuration file to reach the
Ceph monitors.
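For instance, you can write the generated output directly to that path; this one-liner is a sketch and assumes that it is run on a node that already has access to the cluster, such as an admin node.

Example

[root@host01 ceph]# ceph config generate-minimal-conf > /etc/ceph/ceph.conf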

Setting-up keyring on client machines


Edit online
Most Ceph clusters run with authentication enabled, and the client needs the keys in order to communicate with cluster
machines. You can generate a keyring that gives clients the details they need to reach the Ceph monitors.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root access to the nodes.

Procedure
Edit online

1. On the node where you want to set up the keyring, create a directory ceph in the /etc folder:

Example

[root@host01 ~]# mkdir /etc/ceph/

2. Navigate to the /etc/ceph directory:

Example

[root@host01 ~]# cd /etc/ceph/

3. Generate the keyring for the client:

Syntax

ceph auth get-or-create client.CLIENT_NAME -o /etc/ceph/NAME_OF_THE_FILE

Example

[root@host01 ceph]# ceph auth get-or-create client.fs -o /etc/ceph/ceph.keyring

4. Verify the output in the ceph.keyring file:

Example

[root@host01 ceph]# cat ceph.keyring

[client.fs]
key = AQAvoH5gkUCsExAATz3xCBLd4n6B6jRv+Z7CVQ==

The resulting output should be put into a keyring file, for example /etc/ceph/ceph.keyring.
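As a quick check that the keyring and the minimal configuration file work together, you can query the cluster as that client; this is a sketch, client.fs is the user created in the example above, and the command succeeds only if the user has at least read capability on the monitors.

Example

[root@host01 ceph]# ceph -n client.fs --keyring=/etc/ceph/ceph.keyring -s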

Management of MDS service


Edit online
As a storage administrator, you can use Ceph Orchestrator with Cephadm in the backend to deploy the MDS service. By default, a
Ceph File System (CephFS) uses only one active MDS daemon. However, systems with many clients benefit from multiple active MDS
daemons.

Deploying the MDS service using the command line interface


Deploying the MDS service using the service specification
Removing the MDS service

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

All manager, monitor and OSD daemons are deployed.

Deploying the MDS service using the command line interface


Edit online
Using the Ceph Orchestrator, you can deploy the Metadata Server (MDS) service using the placement specification in the command
line interface. Ceph File System (CephFS) requires one or more MDS.

NOTE: Ensure you have at least two pools, one for Ceph file system (CephFS) data and one for CephFS metadata.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

All manager, monitor, and OSD daemons are deployed.

Root-level access to all the nodes.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. There are two ways of deploying MDS daemons using placement specification:

Method 1

Use ceph fs volume to create the MDS daemons. This creates the CephFS volume and pools associated with the CephFS,
and also starts the MDS service on the hosts.

Syntax

ceph fs volume create FILESYSTEM_NAME --placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2 HOST_NAME_3"

NOTE: By default, replicated pools are created for this command.



Example

[ceph: root@host01 /]# ceph fs volume create test --placement="2 host01 host02"

Method 2

Create the pools, CephFS, and then deploy MDS service using placement specification:

1. Create the pools for CephFS:

Syntax

ceph osd pool create DATA_POOL [PG_NUM]


ceph osd pool create METADATA_POOL [PG_NUM]

Example

[ceph: root@host01 /]# ceph osd pool create cephfs_data 64


[ceph: root@host01 /]# ceph osd pool create cephfs_metadata 64

Typically, the metadata pool can start with a conservative number of Placement Groups (PGs), as it generally has far
fewer objects than the data pool. It is possible to increase the number of PGs if needed. The pool sizes range from 64
PGs to 512 PGs. Size the data pool proportionally to the number and sizes of files you expect in the file system.

IMPORTANT: For the metadata pool, consider using:

A higher replication level, because any data loss to this pool can make the whole file system inaccessible.

Storage with lower latency, such as Solid-State Drive (SSD) disks, because this directly affects the observed latency of file system operations on clients.

2. Create the file system for the data pools and metadata pools:

Syntax

ceph fs new FILESYSTEM_NAME METADATA_POOL DATA_POOL

Example

[ceph: root@host01 /]# ceph fs new test cephfs_metadata cephfs_data

3. Deploy MDS service using the ceph orch apply command:

Syntax

ceph orch apply mds FILESYSTEM_NAME --placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2 HOST_NAME_3"

Example

[ceph: root@host01 /]# ceph orch apply mds test --placement="2 host01 host02"

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

Check the CephFS status:

Example

[ceph: root@host01 /]# ceph fs ls


[ceph: root@host01 /]# ceph fs status

List the hosts, daemons, and processes:

Syntax



ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mds

Deploying the MDS service using the service specification


Edit online
Using the Ceph Orchestrator, you can deploy the MDS service using the service specification.

NOTE: Ensure you have at least two pools, one for the Ceph File System (CephFS) data and one for the CephFS metadata.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Hosts are added to the cluster.

All manager, monitor, and OSD daemons are deployed.

Procedure
Edit online

1. Create the mds.yaml file:

Example

[root@host01 ~]# touch mds.yaml

2. Edit the mds.yaml file to include the following details:

Syntax

service_type: mds
service_id: FILESYSTEM_NAME
placement:
  hosts:
  - HOST_NAME_1
  - HOST_NAME_2
  - HOST_NAME_3

Example

service_type: mds
service_id: fs_name
placement:
  hosts:
  - host01
  - host02

3. Mount the YAML file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount mds.yaml:/var/lib/ceph/mds/mds.yaml

4. Navigate to the directory:

Example

[ceph: root@host01 /]# cd /var/lib/ceph/mds/

5. Log into the Cephadm shell:



Example

[root@host01 ~]# cephadm shell

6. Navigate to the following directory:

Example

[ceph: root@host01 /]# cd /var/lib/ceph/mds/

7. Deploy MDS service using service specification:

Syntax

ceph orch apply -i FILE_NAME.yaml

Example

[ceph: root@host01 mds]# ceph orch apply -i mds.yaml

8. Once the MDS service is deployed and functional, create the CephFS:

Syntax

ceph fs new CEPHFS_NAME METADATA_POOL DATA_POOL

Example

[ceph: root@host01 /]# ceph fs new test metadata_pool data_pool

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=mds

Removing the MDS service


Edit online
You can remove the service using the ceph orch rm command. Alternatively, you can remove the file system and the associated
pools.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

At least one MDS daemon deployed on the hosts.



Procedure
Edit online

There are two ways of removing MDS daemons from the cluster:

Method 1

Remove the CephFS volume, associated pools, and the services:

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Set the configuration parameter mon_allow_pool_delete to true:

Example

[ceph: root@host01 /]# ceph config set mon mon_allow_pool_delete true

3. Remove the file system:

Syntax

ceph fs volume rm FILESYSTEM_NAME --yes-i-really-mean-it

Example

[ceph: root@host01 /]# ceph fs volume rm cephfs-new --yes-i-really-mean-it

This command removes the file system and its data and metadata pools. It also tries to remove the MDS using the
enabled ceph-mgr Orchestrator module.

Method 2

Use the ceph orch rm command to remove the MDS service from the entire cluster:

1. List the service:

Example

[ceph: root@host01 /]# ceph orch ls

2. Remove the service

Syntax

ceph orch rm SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch rm mds.test

Verification
Edit online

List the hosts, daemons, and processes:

Syntax

ceph orch ps

Example

[ceph: root@host01 /]# ceph orch ps

Reference
Edit online

See Deploying the MDS service using the command line interface section in the IBM Storage Ceph Operations Guide for more
information.

See Deploying the MDS service using the service specification section in the IBM Storage Ceph Operations Guide for more
information.

Management of Ceph object gateway


Edit online
As a storage administrator, you can deploy Ceph object gateway using the command line interface or by using the service
specification.

You can also configure multisite object gateways, and remove the Ceph object gateway.

Cephadm deploys Ceph object gateway as a collection of daemons that manages a single-cluster deployment or a particular realm
and zone in a multisite deployment.

NOTE: With Cephadm, the object gateway daemons are configured by using the monitor configuration database instead of a ceph.conf file or the command line. If that configuration is not already in the client.rgw section, the object gateway daemons start up with default settings and bind to port 80.

NOTE: The .default.rgw.buckets.index pool is created only after the bucket is created in Ceph Object Gateway, while the
.default.rgw.buckets.data pool is created after the data is uploaded to the bucket.

Deploying the Ceph Object Gateway using the command line interface
Deploying the Ceph Object Gateway using the service specification
Deploying a multi-site Ceph Object Gateway
Removing the Ceph Object Gateway

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

All the managers, monitors, and OSDs are deployed in the storage cluster.

Deploying the Ceph Object Gateway using the command line


interface
Edit online
Using the Ceph Orchestrator, you can deploy the Ceph Object Gateway with the ceph orch command in the command line
interface.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

All manager, monitor and OSD daemons are deployed.

Log in to the Cephadm shell by using the cephadm shell command to deploy Ceph Object Gateway daemons.

Procedure
Edit online
You can deploy the Ceph Object Gateway daemons in three different ways:

Method 1:

Create the realm, zone group, and zone, and then use the placement specification with the host name:

1. Create a realm:

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME --default

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=test_realm --default

2. Create a zone group:

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME --master --default

Example

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=default --master --default

3. Create a zone:

Syntax

radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME --master --default

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=test_zone --master --default

4. Commit the changes:

Syntax

radosgw-admin period update --rgw-realm=REALM_NAME --commit

Example

[ceph: root@host01 /]# radosgw-admin period update --rgw-realm=test_realm --commit

5. Run the ceph orch apply command:

Syntax

ceph orch apply rgw NAME [--realm=REALM_NAME] [--zone=ZONE_NAME] --placement="NUMBER_OF_DAEMONS [HOST_NAME_1 HOST_NAME_2]"

Example

[ceph: root@host01 /]# ceph orch apply rgw test --realm=test_realm --zone=test_zone --placement="2 host01 host02"

Method 2:

Use an arbitrary service name to deploy two Ceph Object Gateway daemons for a single cluster deployment:

Syntax

ceph orch apply rgw SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch apply rgw foo

Method 3:

Use an arbitrary service name on a labeled set of hosts:

Syntax

ceph orch host label add HOST_NAME_1 LABEL_NAME
ceph orch host label add HOST_NAME_2 LABEL_NAME
ceph orch apply rgw SERVICE_NAME --placement="label:LABEL_NAME count-per-host:NUMBER_OF_DAEMONS" --port=8000

NUMBER_OF_DAEMONS controls the number of Ceph object gateways deployed on each host. To achieve the highest
performance without incurring an additional cost, set this value to 2.

Example

[ceph: root@host01 /]# ceph orch host label add host01 rgw # the 'rgw' label can be anything
[ceph: root@host01 /]# ceph orch host label add host02 rgw
[ceph: root@host01 /]# ceph orch apply rgw foo "--placement=label:rgw count-per-host:2" --port=8000

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=rgw

Deploying the Ceph Object Gateway using the service specification


Edit online
You can deploy the Ceph Object Gateway using the service specification with either the default or the custom realms, zones, and
zone groups.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the bootstrapped host.

Hosts are added to the cluster.

All manager, monitor, and OSD daemons are deployed.

Procedure
Edit online

1. As a root user, create a specification file:

Example

[root@host01 ~]# touch radosgw.yml

2. Edit the radosgw.yml file to include the following details for the default realm, zone, and zone group:

Syntax

service_type: rgw
service_id: REALM_NAME.ZONE_NAME
placement:
  hosts:
  - HOST_NAME_1
  - HOST_NAME_2
  count_per_host: NUMBER_OF_DAEMONS
spec:
  rgw_realm: REALM_NAME
  rgw_zone: ZONE_NAME
  rgw_frontend_port: FRONT_END_PORT
networks:
- NETWORK_CIDR # Ceph Object Gateway service binds to a specific network

NOTE: NUMBER_OF_DAEMONS controls the number of Ceph Object Gateways deployed on each host. To achieve the highest
performance without incurring an additional cost, set this value to 2.

Example

service_type: rgw
service_id: default
placement:
  hosts:
  - host01
  - host02
  - host03
  count_per_host: 2
spec:
  rgw_realm: default
  rgw_zone: default
  rgw_frontend_port: 1234
networks:
- 192.169.142.0/24

3. Optional: For custom realm, zone, and zone group, create the resources and then create the radosgw.yml file:

a. Create the custom realm, zone, and zone group:

Example

[root@host01 ~]# radosgw-admin realm create --rgw-realm=test_realm
[root@host01 ~]# radosgw-admin zonegroup create --rgw-zonegroup=test_zonegroup
[root@host01 ~]# radosgw-admin zone create --rgw-zonegroup=test_zonegroup --rgw-zone=test_zone
[root@host01 ~]# radosgw-admin period update --rgw-realm=test_realm --commit

b. Create the radosgw.yml file with the following details:

Example

service_type: rgw
service_id: test_realm.test_zone
placement:
  hosts:
  - host01
  - host02
  - host03
  count_per_host: 2
spec:
  rgw_realm: test_realm
  rgw_zone: test_zone
  rgw_frontend_port: 1234
networks:
- 192.169.142.0/24

4. Mount the radosgw.yml file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount radosgw.yml:/var/lib/ceph/radosgw/radosgw.yml

NOTE: Every time you exit the shell, you have to mount the file in the container before deploying the daemon.

5. Deploy the Ceph Object Gateway using the service specification:

Syntax

ceph orch apply -i FILE_NAME.yml

Example

[ceph: root@host01 /]# ceph orch apply -i radosgw.yml

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=rgw

Deploying a multi-site Ceph Object Gateway


Edit online
Ceph Orchestrator supports multi-site configuration options for the Ceph Object Gateway.

You can configure each object gateway to work in an active-active zone configuration allowing writes to a non-primary zone. The
multi-site configuration is stored within a container called a realm.

The realm stores zone groups, zones, and a time period. The rgw daemons handle the synchronization eliminating the need for a
separate synchronization agent, thereby operating with an active-active configuration.

You can also deploy multi-site zones using the command line interface (CLI).

NOTE: The following configuration assumes at least two IBM Storage Ceph clusters are in geographically separate locations.
However, the configuration also works on the same site.

Prerequisites
Edit online

At least two running IBM Storage Ceph clusters.

At least two Ceph Object Gateway instances, one for each IBM Storage Ceph cluster.

Root-level access to all the nodes.

Nodes or containers are added to the storage cluster.

All Ceph Manager, Monitor and OSD daemons are deployed.

Procedure
Edit online

1. In the cephadm shell, configure the primary zone:

a. Create a realm:

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME --default

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=test_realm --default

If the storage cluster has a single realm, then specify the --default flag.

b. Create a primary zone group:

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME --endpoints=http://RGW_PRIMARY_HOSTNAME:RGW_PRIMARY_PORT_NUMBER_1 --master --default

Example

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:80 --master --default

c. Create a primary zone:

Syntax

radosgw-admin zone create --rgw-zonegroup=PRIMARY_ZONE_GROUP_NAME --rgw-zone=PRIMARY_ZONE_NAME --endpoints=http://RGW_PRIMARY_HOSTNAME:RGW_PRIMARY_PORT_NUMBER_1 --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-1 --endpoints=http://rgw1:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

d. Optional: Delete the default zone, zone group, and the associated pools.

IMPORTANT: Do not delete the default zone and its pools if you are using the default zone and zone group to store
data. Also, removing the default zone group deletes the system user.

To access old data in the default zone and zonegroup, use --rgw-zone default and --rgw-zonegroup
default in radosgw-admin commands.

Example

[ceph: root@host01 /]# radosgw-admin zonegroup delete --rgw-zonegroup=default
[ceph: root@host01 /]# ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-mean-it

e. Create a system user:

Syntax

radosgw-admin user create --uid=USER_NAME --display-name="USER_NAME" --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY --system

Example

[ceph: root@host01 /]# radosgw-admin user create --uid=zone.user --display-name="Zone user" --system

Make a note of the access_key and secret_key.
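
If you need to display the keys of this system user again later, a minimal sketch:

Example

[ceph: root@host01 /]# radosgw-admin user info --uid=zone.user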

f. Add the access key and system key to the primary zone:

Syntax

radosgw-admin zone modify --rgw-zone=PRIMARY_ZONE_NAME --access-key=ACCESS_KEY --secret=SECRET_KEY

Example

[ceph: root@host01 /]# radosgw-admin zone modify --rgw-zone=us-east-1 --access-key=NE48APYCAODEPLKBCZVQ --secret=u24GHQWRE3yxxNBnFBzjM4jn14mFIckQ4EKL6LoW

g. Commit the changes:

Syntax

radosgw-admin period update --commit

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

h. Outside the cephadm shell, fetch the FSID of the storage cluster and the processes:

Example

[root@host01 ~]# systemctl list-units | grep ceph

i. Start the Ceph Object Gateway daemon:

Syntax

systemctl start ceph-FSID@DAEMON_NAME


systemctl enable ceph-FSID@DAEMON_NAME

Example

[root@host01 ~]# systemctl start ceph-62a081a6-88aa-11eb-a367-[email protected]_realm.us-east-1.host01.ahdtsw.service
[root@host01 ~]# systemctl enable ceph-62a081a6-88aa-11eb-a367-[email protected]_realm.us-east-1.host01.ahdtsw.service

2. In the Cephadm shell, configure the secondary zone.

a. Pull the primary realm configuration from the host:

Syntax

radosgw-admin realm pull --url=URL_TO_PRIMARY_ZONE_GATEWAY --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host04 /]# radosgw-admin realm pull --url=http://10.74.249.26:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

b. Pull the primary period configuration from the host:

Syntax

radosgw-admin period pull --url=URL_TO_PRIMARY_ZONE_GATEWAY --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host04 /]# radosgw-admin period pull --url=http://10.74.249.26:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

c. Configure a secondary zone:

Syntax

radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=SECONDARY_ZONE_NAME --endpoints=http://RGW_SECONDARY_HOSTNAME:RGW_PRIMARY_PORT_NUMBER_1 --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY --endpoints=http://FQDN:80 [--read-only]

Example

[ceph: root@host04 /]# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-2 --endpoints=http://rgw2:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ --endpoints=http://rgw.example.com:80

d. Optional: Delete the default zone.

IMPORTANT: Do not delete the default zone and its pools if you are using the default zone and zone group to store
data.

To access old data in the default zone and zonegroup, use --rgw-zone default and --rgw-zonegroup
default in radosgw-admin commands.

Example

[ceph: root@host04 /]# radosgw-admin zone rm --rgw-zone=default
[ceph: root@host04 /]# ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-mean-it

e. Update the Ceph configuration database:

Syntax

ceph config set SERVICE_NAME rgw_zone SECONDARY_ZONE_NAME

Example

[ceph: root@host04 /]# ceph config set rgw rgw_zone us-east-2

f. Commit the changes:

Syntax

radosgw-admin period update --commit

Example

[ceph: root@host04 /]# radosgw-admin period update --commit

g. Outside the Cephadm shell, fetch the FSID of the storage cluster and the processes:

Example

[root@host04 ~]# systemctl list-units | grep ceph

h. Start the Ceph Object Gateway daemon:

Syntax

systemctl start ceph-FSID@DAEMON_NAME


systemctl enable ceph-FSID@DAEMON_NAME

Example

[root@host04 ~]# systemctl start ceph-62a081a6-88aa-11eb-a367-[email protected]_realm.us-east-2.host04.ahdtsw.service
[root@host04 ~]# systemctl enable ceph-62a081a6-88aa-11eb-a367-[email protected]_realm.us-east-2.host04.ahdtsw.service

3. Optional: Deploy multi-site Ceph Object Gateways using the placement specification:

Syntax

ceph orch apply rgw NAME --realm=REALM_NAME --zone=PRIMARY_ZONE_NAME --placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2"

Example

[ceph: root@host04 /]# ceph orch apply rgw east --realm=test_realm --zone=us-east-1 --placement="2 host01 host02"

Verification
Edit online

Check the synchronization status to verify the deployment:

Example

[ceph: root@host04 /]# radosgw-admin sync status

Removing the Ceph Object Gateway


Edit online
You can remove the Ceph object gateway daemons using the ceph orch rm command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

At least one Ceph object gateway daemon deployed on the hosts.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the service:

Example

[ceph: root@host01 /]# ceph orch ls

3. Remove the service:

Syntax

ceph orch rm SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch rm rgw.test_realm.test_zone_bb

Verification
Edit online

List the hosts, daemons, and processes:

Syntax

ceph orch ps

Example

[ceph: root@host01 /]# ceph orch ps

Reference
Edit online

See Deploying the Ceph object gateway using the command line interface section in the IBM Storage Ceph Operations Guide for
more information.

See Deploying the Ceph object gateway using the service specification section in the IBM Storage Ceph Operations Guide for
more information.

Configuration of SNMP traps


Edit online
As a storage administrator, you can deploy and configure the simple network management protocol (SNMP) gateway in an IBM
Storage Ceph cluster to receive alerts from the Prometheus Alertmanager and route them as SNMP traps to the cluster.

Simple network management protocol


Configuring snmptrapd
Deploying the SNMP gateway

Simple network management protocol


Edit online
Simple network management protocol (SNMP) is one of the most widely used open protocols for monitoring distributed systems and devices across a variety of hardware and software platforms. Ceph's SNMP integration focuses on forwarding alerts from its Prometheus Alertmanager cluster to a gateway daemon. The gateway daemon transforms the alert into an SNMP Notification and sends it on to a designated SNMP management platform. The gateway daemon is from the snmp_notifier project, which provides SNMP V2c and V3 support with authentication and encryption.

The IBM Storage Ceph SNMP gateway service deploys one instance of the gateway by default. You can increase this by providing placement information. However, if you enable multiple SNMP gateway daemons, your SNMP management platform receives multiple notifications for the same event.

The SNMP traps are alert messages, and the Prometheus Alertmanager sends these alerts to the SNMP notifier, which then looks for an object identifier (OID) in the given alert's labels. Each SNMP trap has a unique ID, which allows it to send additional traps with updated status to a given SNMP poller. SNMP hooks into the Ceph health checks so that every health warning generates a specific SNMP trap.

To work correctly and transfer information on device status to the user for monitoring, SNMP relies on several components. There are four main components that make up SNMP:

SNMP Manager - The SNMP manager, also called a management station, is a computer that runs network monitoring platforms and has the job of polling SNMP-enabled devices and retrieving data from them. An SNMP manager queries agents, receives responses from agents, and acknowledges asynchronous events from agents.

SNMP Agent - An SNMP agent is a program that runs on a system to be managed and contains the MIB database for the system. Agents collect data such as bandwidth and disk space usage, aggregate it, and send it to the management information base (MIB).

Management information base (MIB) - These are components contained within the SNMP agents. The SNMP manager uses
this as a database and asks the agent for access to particular information. This information is needed for the network
management systems (NMS). The NMS polls the agent to take information from these files and then proceeds to translate it
into graphs and displays that can be viewed by the user. MIBs contain statistical and control values that are determined by the
network device.

SNMP Devices

The following versions of SNMP are compatible and supported for gateway implementation:

V2c - Uses a community string without any authentication and is vulnerable to outside attacks.

V3 authNoPriv - Uses the username and password authentication without encryption.

V3 authPriv - Uses the username and password authentication with encryption to the SNMP management platform.

IMPORTANT: When using SNMP traps, ensure that you have the correct security configuration for your version number to minimize
the vulnerabilities that are inherent to SNMP and keep your network protected from unauthorized users.

Configuring snmptrapd

Edit online
It is important to configure the simple network management protocol (SNMP) target before deploying the snmp-gateway because
the snmptrapd daemon contains the auth settings that you need to specify when creating the snmp-gateway service.

The SNMP gateway feature provides a means of exposing the alerts that are generated in the Prometheus stack to an SNMP
management platform. You can configure the SNMP traps to the destination based on the snmptrapd tool. This tool allows you to
establish one or more SNMP trap listeners.

The following parameters are important for configuration:

The engine-id is a unique identifier for the device, in hex, and is required for the SNMPV3 gateway. IBM recommends using 8000C53F_CLUSTER_FSID_WITHOUT_DASHES for this parameter.

The snmp-community, which is the SNMP_COMMUNITY_FOR_SNMPV2 parameter, is public for SNMPV2c gateway.

The auth-protocol, which is the AUTH_PROTOCOL, is mandatory for the SNMPV3 gateway and is SHA by default.

The privacy-protocol, which is the PRIVACY_PROTOCOL, is mandatory for SNMPV3 gateway.

The PRIVACY_PASSWORD is mandatory for SNMPV3 gateway with encryption.

The SNMP_V3_AUTH_USER_NAME is the user name and is mandatory for SNMPV3 gateway.

The SNMP_V3_AUTH_PASSWORD is the password and is mandatory for SNMPV3 gateway.
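
As a minimal sketch of the engine-id recommendation above, you can build the value by prefixing 8000C53F to the cluster FSID with the dashes removed; the FSID shown is only an illustration:

Example

[ceph: root@host01 /]# ceph fsid
f64f341c-655d-11eb-8778-fa163e914bcc

The resulting engine-id is 8000C53Ff64f341c655d11eb8778fa163e914bcc.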

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the nodes.

Install firewalld on the system.

Procedure
Edit online

1. On the SNMP management host, install the SNMP packages:

Example

[root@host01 ~]# dnf install -y net-snmp-utils net-snmp

2. Open the port 162 for SNMP to receive alerts:

Example

[root@host01 ~]# firewall-cmd --zone=public --add-port=162/udp


[root@host01 ~]# firewall-cmd --zone=public --add-port=162/udp --permanent

3. Implement the management information base (MIB) to make sense of the SNMP notification and enhance SNMP support on
the destination host. Copy the raw file from the main repository:
https://github.com/ceph/ceph/blob/master/monitoring/snmp/CEPH-MIB.txt

Example

[root@host01 ~]# curl -o CEPH_MIB.txt -L https://raw.githubusercontent.com/ceph/ceph/master/monitoring/snmp/CEPH-MIB.txt
[root@host01 ~]# scp CEPH_MIB.txt root@host02:/usr/share/snmp/mibs

4. Create the snmptrapd directory.

Example

[root@host01 ~]# mkdir /root/snmptrapd/

5. Create the configuration files in snmptrapd directory for each protocol based on the SNMP version:

Syntax

format2 %V\n% Agent Address: %A \n Agent Hostname: %B \n Date: %H - %J - %K - %L - %M - %Y \n Enterprise OID: %N \n Trap Type: %W \n Trap Sub-Type: %q \n Community/Infosec Context: %P \n Uptime: %T \n Description: %W \n PDU Attribute/Value Pair Array:\n%v \n -------------- \n
createuser -e 0xENGINE_ID SNMP_V3_AUTH_USER_NAME AUTH_PROTOCOL SNMP_V3_AUTH_PASSWORD PRIVACY_PROTOCOL PRIVACY_PASSWORD
authuser log,execute SNMP_V3_AUTH_USER_NAME
authCommunity log,execute,net SNMP_COMMUNITY_FOR_SNMPV2

For SNMPV2c, create the snmptrapd_public.conf file as follows:

Example

format2 %V\n% Agent Address: %A \n Agent Hostname: %B \n Date: %H - %J - %K - %L - %M - %Y \n Enterprise OID: %N \n Trap Type: %W \n Trap Sub-Type: %q \n Community/Infosec Context: %P \n Uptime: %T \n Description: %W \n PDU Attribute/Value Pair Array:\n%v \n -------------- \n

authCommunity log,execute,net public

The public setting here must match the snmp_community setting used when deploying the snmp-gateway service.

For SNMPV3 with authentication only, create the snmptrapd_auth.conf file as follows:

Example

format2 %V\n% Agent Address: %A \n Agent Hostname: %B \n Date: %H - %J - %K - %L - %M - %Y \n Enterprise OID: %N \n Trap Type: %W \n Trap Sub-Type: %q \n Community/Infosec Context: %P \n Uptime: %T \n Description: %W \n PDU Attribute/Value Pair Array:\n%v \n -------------- \n
createuser -e 0x8000C53Ff64f341c655d11eb8778fa163e914bcc myuser SHA mypassword
authuser log,execute myuser

The 0x8000C53Ff64f341c655d11eb8778fa163e914bcc string is the engine_id, and myuser and mypassword are the credentials. The password security is defined by the SHA algorithm.

This corresponds to the settings for deploying the snmp-gateway daemon.

Example

snmp_v3_auth_username: myuser
snmp_v3_auth_password: mypassword

For SNMPV3 with authentication and encryption, create the snmptrapd_authpriv.conf file as follows:

Example

format2 %V\n% Agent Address: %A \n Agent Hostname: %B \n Date: %H - %J - %K - %L - %M - %Y \n Enterprise OID: %N \n Trap Type: %W \n Trap Sub-Type: %q \n Community/Infosec Context: %P \n Uptime: %T \n Description: %W \n PDU Attribute/Value Pair Array:\n%v \n -------------- \n
createuser -e 0x8000C53Ff64f341c655d11eb8778fa163e914bcc myuser SHA mypassword DES mysecret
authuser log,execute myuser

The 0x8000C53Ff64f341c655d11eb8778fa163e914bcc string is the engine_id, and myuser and mypassword are the credentials. The password security is defined by the SHA algorithm, and DES is the type of privacy encryption.

This corresponds to the settings for deploying the snmp-gateway daemon.

Example

snmp_v3_auth_username: myuser
snmp_v3_auth_password: mypassword
snmp_v3_priv_password: mysecret

6. Run the daemon on the SNMP management host:

Syntax

/usr/sbin/snmptrapd -M /usr/share/snmp/mibs -m CEPH-MIB.txt -f -C -c /root/snmptrapd/CONFIGURATION_FILE -Of -Lo :162

Example

[root@host01 ~]# /usr/sbin/snmptrapd -M /usr/share/snmp/mibs -m CEPH-MIB.txt -f -C -c /root/snmptrapd/snmptrapd_auth.conf -Of -Lo :162

7. If any alert is triggered on the storage cluster, you can monitor the output on the SNMP management host. Verify the SNMP
traps and also the traps decoded by MIB.

Example

NET-SNMP version 5.8


Agent Address: 0.0.0.0
Agent Hostname: <UNKNOWN>
Date: 15 - 5 - 12 - 8 - 10 - 4461391
Enterprise OID: .
Trap Type: Cold Start
Trap Sub-Type: 0
Community/Infosec Context: TRAP2, SNMP v3, user myuser, context
Uptime: 0
Description: Cold Start
PDU Attribute/Value Pair Array:
.iso.org.dod.internet.mgmt.mib-2.1.3.0 = Timeticks: (292276100) 3 days, 19:52:41.00
.iso.org.dod.internet.snmpV2.snmpModules.1.1.4.1.0 = OID:
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promMg
r.promMgrPrometheusInactive
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promMg
r.promMgrPrometheusInactive.1 = STRING:
"1.3.6.1.4.1.50495.1.2.1.6.2[alertname=CephMgrPrometheusModuleInactive]"
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promMg
r.promMgrPrometheusInactive.2 = STRING: "critical"
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promMg
r.promMgrPrometheusInactive.3 = STRING: "Status: critical
- Alert: CephMgrPrometheusModuleInactive
Summary: Ceph's mgr/prometheus module is not available
Description: The mgr/prometheus module at 10.70.39.243:9283 is unreachable. This could mean
that the module has been disabled or the mgr itself is down.
Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to
ceph and use 'ceph -s' to determine whether the mgr is active. If the mgr is not active,
restart it, otherwise you can check the mgr/prometheus module is loaded with 'ceph mgr module
ls' and if it's not listed as enabled, enable it with 'ceph mgr module enable prometheus'"

In the above example, an alert is generated after the Prometheus module is disabled.

Reference
Edit online

See the Deploying the SNMP gateway section in the IBM Storage Ceph Operations Guide.

Deploying the SNMP gateway


Edit online
You can deploy the simple network management protocol (SNMP) gateway using either SNMPV2c or SNMPV3. There are two
methods to deploy the SNMP gateway:

1. By creating a credentials file.

2. By creating one service configuration yaml file with all the details.

You can use the following parameters to deploy the SNMP gateway based on the versions:

The service_type is the snmp-gateway.

The service_name is any user-defined string.

The count is the number of SNMP gateways to be deployed in a storage cluster.

The snmp_destination parameter must be of the format hostname:port.

The engine-id is a unique identifier for the device, in hex, and is required for the SNMPV3 gateway. IBM recommends using 8000C53F_CLUSTER_FSID_WITHOUT_DASHES for this parameter.

The snmp_community parameter is public for SNMPV2c gateway.

The auth-protocol is mandatory for SNMPV3 gateway and is SHA by default.

The privacy-protocol is mandatory for SNMPV3 gateway with authentication and encryption.

The port is 9464 by default.

You must provide -i FILENAME to pass the secrets and passwords to the orchestrator.

Once the SNMP gateway service is deployed or updated, the Prometheus Alertmanager configuration is automatically updated to forward any alert that has an object identifier (OID) to the SNMP gateway daemon for further processing.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the nodes.

Configuring snmptrapd on the destination host, which is the SNMP management host.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Create a label for the host on which SNMP gateway needs to be deployed:

Syntax

ceph orch host label add HOSTNAME snmp-gateway

Example

[ceph: root@host01 /]# ceph orch host label add host02 snmp-gateway

3. Create a credentials file or a service configuration file based on the SNMP version:

For SNMPV2c, create the file as follows:

Example

[ceph: root@host01 /]# cat snmp_creds.yml

snmp_community: public

OR

Example

[ceph: root@host01 /]# cat snmp-gateway.yml

service_type: snmp-gateway
service_name: snmp-gateway
placement:
  count: 1
spec:
  credentials:
    snmp_community: public
  port: 9464
  snmp_destination: 192.168.122.73:162
  snmp_version: V2c

For SNMPV3 with authentication only, create the file as follows:

Example

[ceph: root@host01 /]# cat snmp_creds.yml

snmp_v3_auth_username: myuser
snmp_v3_auth_password: mypassword

OR

Example

[ceph: root@host01 /]# cat snmp-gateway.yml

service_type: snmp-gateway
service_name: snmp-gateway
placement:
  count: 1
spec:
  credentials:
    snmp_v3_auth_password: mypassword
    snmp_v3_auth_username: myuser
  engine_id: 8000C53Ff64f341c655d11eb8778fa163e914bcc
  port: 9464
  snmp_destination: 192.168.122.1:162
  snmp_version: V3

For SNMPV3 with authentication and encryption, create the file as follows:

Example

[ceph: root@host01 /]# cat snmp_creds.yml

snmp_v3_auth_username: myuser
snmp_v3_auth_password: mypassword
snmp_v3_priv_password: mysecret

OR

Example

[ceph: root@host01 /]# cat snmp-gateway.yml

service_type: snmp-gateway
service_name: snmp-gateway
placement:
  count: 1
spec:
  credentials:
    snmp_v3_auth_password: mypassword
    snmp_v3_auth_username: myuser
    snmp_v3_priv_password: mysecret
  engine_id: 8000C53Ff64f341c655d11eb8778fa163e914bcc
  port: 9464
  snmp_destination: 192.168.122.1:162
  snmp_version: V3

4. Run the ceph orch command:

Syntax

ceph orch apply snmp-gateway --snmp-version=V2c_OR_V3 --destination=SNMP_DESTINATION [--port=PORT_NUMBER] [--engine-id=8000C53F_CLUSTER_FSID_WITHOUT_DASHES] [--auth-protocol=MD5_OR_SHA] [--privacy-protocol=DES_OR_AES] -i FILENAME

OR

Syntax

ceph orch apply -i FILENAME.yml

For SNMPV2c, with the snmp_creds file, run the ceph orch command with the snmp-version as V2c:

Example

[ceph: root@host01 /]# ceph orch apply snmp-gateway --snmp-version=V2c --destination=192.168.122.73:162 --port=9464 -i snmp_creds.yml

For SNMPV3 with authentication only, with the snmp_creds file, run the ceph orch command with the snmp-version as
V3 and engine-id:

Example

[ceph: root@host01 /]# ceph orch apply snmp-gateway --snmp-version=V3 --engine-id=8000C53Ff64f341c655d11eb8778fa163e914bcc --destination=192.168.122.73:162 -i snmp_creds.yml

For SNMPV3 with authentication and encryption, with the snmp_creds file, run the ceph orch command with the snmp-
version as V3, privacy-protocol, and engine-id:

Example

[ceph: root@host01 /]# ceph orch apply snmp-gateway --snmp-version=V3 --engine-id=8000C53Ff64f341c655d11eb8778fa163e914bcc --destination=192.168.122.73:162 --privacy-protocol=AES -i snmp_creds.yml

OR

For all the SNMP versions, with the snmp-gateway file, run the following command:

Example

[ceph: root@host01 /]# ceph orch apply -i snmp-gateway.yml
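
A minimal sketch for verifying the gateway after deployment, following the same listing commands used elsewhere in this guide:

Example

[ceph: root@host01 /]# ceph orch ls
[ceph: root@host01 /]# ceph orch ps --daemon_type=snmp-gateway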

Reference
Edit online

See the Configuring snmptrapd section in the IBM Storage Ceph Operations Guide.

Handling a node failure
Edit online
As a storage administrator, you can experience a whole node failing within the storage cluster, and handling a node failure is similar
to handling a disk failure. With a node failure, instead of Ceph recovering placement groups (PGs) for only one disk, all PGs on the
disks within that node must be recovered. Ceph will detect that the OSDs are all down and automatically start the recovery process,
known as self-healing.

There are three node failure scenarios. Here is the high-level workflow for each scenario when replacing a node; a sketch of the backfilling flags follows the list:

Replacing the node, but using the root and Ceph OSD disks from the failed node.

Disable backfilling.

Replace the node, taking the disks from the old node, and adding them to the new node.

Enable backfilling.

Replacing the node, reinstalling the operating system, and using the Ceph OSD disks from the failed node.

Disable backfilling.

Create a backup of the Ceph configuration.

Replace the node and add the Ceph OSD disks from the failed node.

Configure the disks as JBOD.

Install the operating system.

Restore the Ceph configuration.

Add the new node to the storage cluster; the Ceph daemons are placed automatically on the respective node.

Enable backfilling.

Replacing the node, reinstalling the operating system, and using all new Ceph OSD disks.

Disable backfilling.

Remove all OSDs on the failed node from the storage cluster.

Create a backup of the Ceph configuration.

Replace the node and add new Ceph OSD disks.

Configure the disks as JBOD.

Install the operating system.

Add the new node to the storage cluster; the Ceph daemons are placed automatically on the respective node.

Enable backfilling.
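
The following is a minimal sketch of one way to disable and re-enable backfilling with cluster flags while a node is being replaced; adjust it to your maintenance procedure:

Example

[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set norecover

After the replacement node is back in the storage cluster, unset the flags:

Example

[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset nobackfill
[ceph: root@host01 /]# ceph osd unset norecover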

Considerations before adding or removing a node


Performance considerations
Recommendations for adding or removing nodes
Adding a Ceph OSD node
Removing a Ceph OSD node
Simulating a node failure

Considerations before adding or removing a node

Edit online

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A failed node.

One of the outstanding features of Ceph is the ability to add or remove Ceph OSD nodes at run time. This means that you can resize
the storage cluster capacity or replace hardware without taking down the storage cluster.

The ability to serve Ceph clients while the storage cluster is in a degraded state also has operational benefits. For example, you can add, remove, or replace hardware during regular business hours, rather than working overtime or on weekends. However, adding and removing Ceph OSD nodes can have a significant impact on performance.

Before you add or remove Ceph OSD nodes, consider the effects on storage cluster performance:

Whether you are expanding or reducing the storage cluster capacity, adding or removing Ceph OSD nodes induces backfilling
as the storage cluster rebalances. During that rebalancing time period, Ceph uses additional resources, which can impact
storage cluster performance.

In a production Ceph storage cluster, a Ceph OSD node has a particular hardware configuration that facilitates a particular
type of storage strategy.

Since a Ceph OSD node is part of a CRUSH hierarchy, the performance impact of adding or removing a node typically affects
the performance of pools that use the CRUSH ruleset.

Reference

For more information, see Storage Strategies.

Performance considerations
Edit online
The following factors typically affect a storage cluster’s performance when adding or removing Ceph OSD nodes:

Ceph clients place load on the I/O interface to Ceph; that is, the clients place load on a pool. A pool maps to a CRUSH ruleset.
The underlying CRUSH hierarchy allows Ceph to place data across failure domains. If the underlying Ceph OSD node involves a
pool that is experiencing high client load, the client load could significantly affect recovery time and reduce performance.
Because write operations require data replication for durability, write-intensive client loads in particular can increase the time
for the storage cluster to recover.

Generally, the capacity you are adding or removing affects the storage cluster’s time to recover. In addition, the storage
density of the node you add or remove might also affect recovery times. For example, a node with 36 OSDs typically takes
longer to recover than a node with 12 OSDs.

When removing nodes, you MUST ensure that you have sufficient spare capacity so that you will not reach full ratio or
near full ratio. If the storage cluster reaches full ratio, Ceph will suspend write operations to prevent data loss.

A Ceph OSD node maps to at least one Ceph CRUSH hierarchy, and the hierarchy maps to at least one pool. Each pool that
uses a CRUSH ruleset experiences a performance impact when Ceph OSD nodes are added or removed.

Replication pools tend to use more network bandwidth to replicate deep copies of the data, whereas erasure coded pools
tend to use more CPU to calculate k+m coding chunks. The more copies that exist of the data, the longer it takes for the
storage cluster to recover. For example, a larger pool or one that has a greater number of k+m chunks will take longer to
recover than a replication pool with fewer copies of the same data.

Drives, controllers and network interface cards all have throughput characteristics that might impact the recovery time.
Generally, nodes with higher throughput characteristics, such as 10 Gbps and SSDs, recover more quickly than nodes with
lower throughput characteristics, such as 1 Gbps and SATA drives.

Recommendations for adding or removing nodes
Edit online
IBM recommends adding or removing one OSD at a time within a node and allowing the storage cluster to recover before proceeding
to the next OSD. This helps to minimize the impact on storage cluster performance. Note that if a node fails, you might need to
change the entire node at once, rather than one OSD at a time.

To remove an OSD:

Using Removing the OSD daemons.

To add an OSD:

Using Deploying Ceph OSDs on all available devices.

Using Deploying Ceph OSDs using advanced service specification.

Using Deploying Ceph OSDs on specific devices and hosts.

When adding or removing Ceph OSD nodes, consider that other ongoing processes also affect storage cluster performance. To
reduce the impact on client I/O, IBM recommends the following:

Calculate capacity

Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all its OSDs without reaching the full
ratio. Reaching the full ratio will cause the storage cluster to refuse write operations.
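
A minimal sketch for checking the current utilization and the configured ratios before removing a node:

Example

[ceph: root@host01 /]# ceph df
[ceph: root@host01 /]# ceph osd dump | grep -i ratio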

Temporarily disable scrubbing

Scrubbing is essential to ensuring the durability of the storage cluster’s data; however, it is resource intensive. Before adding or
removing a Ceph OSD node, disable scrubbing and deep-scrubbing and let the current scrubbing operations complete before
proceeding.

ceph osd set noscrub


ceph osd set nodeep-scrub

Once you have added or removed a Ceph OSD node and the storage cluster has returned to an active+clean state, unset the
noscrub and nodeep-scrub settings.

ceph osd unset noscrub


ceph osd unset nodeep-scrub

Limit backfill and recovery

If you have reasonable data durability, there is nothing wrong with operating in a degraded state. For example, you can operate the
storage cluster with osd_pool_default_size = 3 and osd_pool_default_min_size = 2. You can tune the storage cluster
for the fastest possible recovery time, but doing so significantly affects Ceph client I/O performance. To maintain the highest Ceph
client I/O performance, limit the backfill and recovery operations and allow them to take longer.

osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1

You can also consider setting the sleep and delay parameters, such as osd_recovery_sleep.
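
A minimal sketch for applying these limits through the monitor configuration database; the osd_recovery_sleep value is only an illustration:

Example

[ceph: root@host01 /]# ceph config set osd osd_max_backfills 1
[ceph: root@host01 /]# ceph config set osd osd_recovery_max_active 1
[ceph: root@host01 /]# ceph config set osd osd_recovery_op_priority 1
[ceph: root@host01 /]# ceph config set osd osd_recovery_sleep 0.1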

Increase the number of placement groups

Finally, if you are expanding the size of the storage cluster, you may need to increase the number of placement groups. If you
determine that you need to expand the number of placement groups, IBM recommends making incremental increases in the number
of placement groups. Increasing the number of placement groups by a significant amount will cause a considerable degradation in
performance.
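
A minimal sketch of an incremental increase, assuming a hypothetical pool named data_pool:

Example

[ceph: root@host01 /]# ceph osd pool set data_pool pg_num 64
[ceph: root@host01 /]# ceph osd pool set data_pool pgp_num 64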

Adding a Ceph OSD node

Edit online
To expand the capacity of the IBM Storage Ceph cluster, you can add an OSD node.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A provisioned node with a network connection.

Procedure
Edit online

1. Verify that other nodes in the storage cluster can reach the new node by its short host name.

2. Temporarily disable scrubbing:

Example

[ceph: root@host01 /]# ceph osd set noscrub


[ceph: root@host01 /]# ceph osd set nodeep-scrub

3. Limit the backfill and recovery features:

Syntax

ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]

Example

[ceph: root@host01 /]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1

4. Extract the cluster’s public SSH keys to a folder:

Syntax

ceph cephadm get-pub-key > ~/PATH

Example

[ceph: root@host01 /]# ceph cephadm get-pub-key > ~/ceph.pub

5. Copy the Ceph cluster's public SSH keys to the root user's authorized_keys file on the new host:

Syntax

ssh-copy-id -f -i ~/PATH root@HOST_NAME_2

Example

[ceph: root@host01 /]# ssh-copy-id -f -i ~/ceph.pub root@host02

6. Add the new node to the CRUSH map:

Syntax

ceph orch host add NODE_NAME IP_ADDRESS

Example

[ceph: root@host01 /]# ceph orch host add host02 10.10.128.70

7. Add an OSD for each disk on the node to the storage cluster by using one of the following methods; a sketch follows this list.

Using Deploying Ceph OSDs on all available devices.

Using Deploying Ceph OSDs using advanced service specification.

Using Deploying Ceph OSDs on specific devices and hosts.
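
A minimal sketch of deploying OSDs on all available devices or on a specific device of the new host; host02 and /dev/sdb are only illustrations:

Example

[ceph: root@host01 /]# ceph orch apply osd --all-available-devices
[ceph: root@host01 /]# ceph orch daemon add osd host02:/dev/sdb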

IMPORTANT: When adding an OSD node to an IBM Storage Ceph cluster, IBM recommends adding one OSD daemon at a time and
allowing the cluster to recover to an active+clean state before proceeding to the next OSD.

Reference
Edit online

See the Setting a Specific Configuration Setting at Runtime.

See Adding a Bucket and Moving a Bucket for details on placing the node at an appropriate location in the CRUSH hierarchy.

Removing a Ceph OSD node


Edit online
To reduce the capacity of a storage cluster, remove an OSD node.

WARNING: Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all OSDs without reaching
the full ratio. Reaching the full ratio will cause the storage cluster to refuse write operations.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all nodes in the storage cluster.

Procedure
Edit online

1. Check the storage cluster’s capacity:

Syntax

ceph df
rados df
ceph osd df

2. Temporarily disable scrubbing:

Syntax

ceph osd set noscrub


ceph osd set nodeep-scrub

3. Limit the backfill and recovery features:

Syntax

ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]

Example

[ceph: root@host01 /]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1

4. Remove each OSD on the node from the storage cluster:

IMPORTANT: When removing an OSD node from the storage cluster, IBM recommends removing one OSD at a time
within the node and allowing the cluster to recover to an active+clean state before proceeding to remove the next
OSD.

After you remove an OSD, check to verify that the storage cluster is not getting to the near-full ratio:

Syntax

ceph -s
ceph df

Repeat this step until all OSDs on the node are removed from the storage cluster.
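
A minimal sketch of removing a single OSD with the Ceph Orchestrator and checking the progress; the OSD ID 4 is only an illustration:

Example

[ceph: root@host01 /]# ceph orch osd rm 4
[ceph: root@host01 /]# ceph orch osd rm status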

5. Once all OSDs are removed, remove the host:

Using Removing hosts.

Reference
Edit online

See the Setting a specific configuration at runtime section in the IBM Storage Ceph Configuration Guide for more details.

Simulating a node failure


Edit online
To simulate a hard node failure, power off the node and reinstall the operating system.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all nodes on the storage cluster.

Procedure
Edit online

1. Check the storage cluster’s capacity to understand the impact of removing the node:

Example

[ceph: root@host01 /]# ceph df


[ceph: root@host01 /]# rados df
[ceph: root@host01 /]# ceph osd df

2. Optionally, disable recovery and backfilling:

Example

[ceph: root@host01 /]# ceph osd set noout


[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

3. Shut down the node.

4. If you are changing the host name, remove the node from CRUSH map:

Example

[ceph: root@host01 /]# ceph osd crush rm host03

5. Check the status of the storage cluster:

Example

[ceph: root@host01 /]# ceph -s

6. Reinstall the operating system on the node.

7. Add the new node:

For more information, see Adding hosts.

8. Optionally, enable recovery and backfilling:

Example

[ceph: root@host01 /]# ceph osd unset noout


[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub

9. Check Ceph’s health:

Example

[ceph: root@host01 /]# ceph -s

Reference
Edit online

For more information, see Installation.

Handling a data center failure


Edit online
As a storage administrator, you can take preventive measures to avoid a data center failure. These preventive measures include:

Configuring the data center infrastructure.

Setting up failure domains within the CRUSH map hierarchy.

Designating failure nodes within the domains.

Avoiding a data center failure


Handling a data center failure

Prerequisites
Edit online

A healthy running IBM Storage Ceph cluster.

Root-level access to all nodes in the storage cluster.

Avoiding a data center failure


Edit online
Configuring the data center infrastructure

Each data center within a stretch cluster can have a different storage cluster configuration to reflect local capabilities and
dependencies. Set up replication between the data centers to help preserve the data. If one data center fails, the other data centers
in the storage cluster contain copies of the data.

Setting up failure domains within the CRUSH map hierarchy

Failure, or failover, domains are redundant copies of domains within the storage cluster. If an active domain fails, the failure domain
becomes the active domain.

By default, the CRUSH map lists all nodes in a storage cluster within a flat hierarchy. However, for best results, create a logical
hierarchical structure within the CRUSH map. The hierarchy designates the domains to which each node belongs and the
relationships among those domains within the storage cluster, including the failure domains. Defining the failure domains for each
domain within the hierarchy improves the reliability of the storage cluster.

When planning a storage cluster that contains multiple data centers, place the nodes within the CRUSH map hierarchy so that if one
data center goes down, the rest of the storage cluster stays up and running.

Designating failure nodes within the domains

If you plan to use three-way replication for data within the storage cluster, consider the location of the nodes within the failure
domain. If an outage occurs within a data center, it is possible that some data might reside in only one copy. When this scenario
happens, there are two options:

Leave the data in read-only status with the standard settings.

Live with only one copy for the duration of the outage.

With the standard settings, and because of the randomness of data placement across the nodes, not all of the data is affected. However, if some data exists in only one copy, the storage cluster reverts to read-only mode for the duration of the outage.

Handling a data center failure


Edit online
IBM Storage Ceph can withstand catastrophic failures to the infrastructure, such as losing one of the data centers in a stretch cluster.
For the standard object store use case, configuring all three data centers can be done independently with replication set up between
them. In this scenario, the storage cluster configuration in each of the data centers might be different, reflecting the local capabilities
and dependencies.

A logical structure of the placement hierarchy should be considered. A proper CRUSH map can be used, reflecting the hierarchical
structure of the failure domains within the infrastructure. Using logical hierarchical definitions improves the reliability of the storage
cluster, versus using the standard hierarchical definitions. Failure domains are defined in the CRUSH map. The default CRUSH map
contains all nodes in a flat hierarchy. In a three data center environment, such as a stretch cluster, the placement of nodes should be
managed in a way that one data center can go down, but the storage cluster stays up and running. Consider which failure domain a
node resides in when using 3-way replication for the data.

In the example below, the resulting map is derived from the initial setup of the storage cluster with 6 OSD nodes. In this example, all
nodes have only one disk and hence one OSD. All of the nodes are arranged under the default root, that is the standard root of the
hierarchy tree. Because there is a weight assigned to two of the OSDs, these OSDs receive fewer chunks of data than the other OSDs.
These nodes were introduced later with bigger disks than the initial OSD disks. This does not affect the data placement to withstand
a failure of a group of nodes.

Example

[ceph: root@host01 /]# ceph osd tree


ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.33554 root default
-2 0.04779 host host03
0 0.04779 osd.0 up 1.00000 1.00000
-3 0.04779 host host02
1 0.04779 osd.1 up 1.00000 1.00000
-4 0.04779 host host01
2 0.04779 osd.2 up 1.00000 1.00000
-5 0.04779 host host04
3 0.04779 osd.3 up 1.00000 1.00000
-6 0.07219 host host06
4 0.07219 osd.4 up 0.79999 1.00000
-7 0.07219 host host05
5 0.07219 osd.5 up 0.79999 1.00000

Using logical hierarchical definitions to group the nodes into the same data center achieves data placement that reflects the failure domains. Possible definition types of root, datacenter, rack, row, and host allow the failure domains to be reflected for the three data center stretch cluster:

Nodes host01 and host02 reside in data center 1 (DC1)

Nodes host03 and host05 reside in data center 2 (DC2)

Nodes host04 and host06 reside in data center 3 (DC3)

All data centers belong to the same structure (allDC)

Since all OSDs in a host belong to the host definition there is no change needed. All the other assignments can be adjusted during
runtime of the storage cluster by:

Defining the bucket structure with the following commands:

ceph osd crush add-bucket allDC root


ceph osd crush add-bucket DC1 datacenter
ceph osd crush add-bucket DC2 datacenter
ceph osd crush add-bucket DC3 datacenter

Moving the nodes into the appropriate place within this structure by modifying the CRUSH map:

ceph osd crush move DC1 root=allDC


ceph osd crush move DC2 root=allDC
ceph osd crush move DC3 root=allDC
ceph osd crush move host01 datacenter=DC1
ceph osd crush move host02 datacenter=DC1
ceph osd crush move host03 datacenter=DC2
ceph osd crush move host05 datacenter=DC2
ceph osd crush move host04 datacenter=DC3
ceph osd crush move host06 datacenter=DC3

Within this structure, any new hosts, as well as new disks, can be added. By placing the OSDs at the right place in the hierarchy, the CRUSH algorithm places redundant pieces into different failure domains within the structure.

The above example results in the following:

Example

[ceph: root@host01 /]# ceph osd tree


ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-8 6.00000 root allDC
-9 2.00000 datacenter DC1
-4 1.00000 host host01
2 1.00000 osd.2 up 1.00000 1.00000
-3 1.00000 host host02
1 1.00000 osd.1 up 1.00000 1.00000
-10 2.00000 datacenter DC2
-2 1.00000 host host03
0 1.00000 osd.0 up 1.00000 1.00000
-7 1.00000 host host05
5 1.00000 osd.5 up 0.79999 1.00000
-11 2.00000 datacenter DC3
-6 1.00000 host host06
4 1.00000 osd.4 up 0.79999 1.00000
-5 1.00000 host host04
3 1.00000 osd.3 up 1.00000 1.00000
-1 0 root default

The listing above shows the resulting CRUSH map as an OSD tree. It is now easy to see how the hosts belong to a data center and how all data centers belong to the same top-level structure, while the locations remain clearly distinguishable.

NOTE: Placing the data in the proper locations according to the map works properly only within a healthy cluster. Misplacement might happen under some circumstances, for example when some OSDs are not available. Those misplacements are corrected automatically once it is possible to do so.

Reference
Edit online

See the CRUSH administration chapter in the IBM Storage Ceph Storage Strategies Guide for more information.

Dashboard
Edit online
Use this information to understand how to use the IBM Storage Ceph Dashboard for monitoring and management purposes.

Ceph dashboard overview


Ceph Dashboard installation and access
Management of roles
Management of users
Management of Ceph daemons
Monitor the cluster
Management of Alerts
Management of pools
Management of hosts
Management of Ceph OSDs
Management of Ceph Object Gateway
Management of block devices
Activating and deactivating telemetry

Ceph dashboard overview


Edit online
As a storage administrator, you can use the IBM Storage Ceph Dashboard for management and monitoring, allowing you to administer and configure the cluster, as well as visualize information and performance statistics related to it. The dashboard uses a web server hosted by the ceph-mgr daemon.

The dashboard is accessible from a web browser and includes many useful management and monitoring features, for example, to
configure manager modules and monitor the state of OSDs.

Ceph Dashboard components


Ceph Dashboard features
IBM Storage Ceph Dashboard architecture

Prerequisites
Edit online

System administrator level experience.

Ceph Dashboard components


Edit online
The functionality of the dashboard is provided by multiple components.

The Cephadm application for deployment.

The embedded dashboard ceph-mgr module.

The embedded Prometheus ceph-mgr module.

The Prometheus time-series database.

The Prometheus node-exporter daemon, running on each host of the storage cluster.

The Grafana platform to provide monitoring user interface and alerting.

Reference
Edit online

For more information, see Prometheus website.

For more information, see Grafana website.

Ceph Dashboard features


Edit online
The Ceph dashboard provides the following features:

Multi-user and role management: The dashboard supports multiple user accounts with different permissions and roles. User accounts and roles can be managed by using both the command line and the web user interface. The dashboard supports various methods to enhance password security. Password complexity rules can be configured, requiring users to change their password after the first login or after a configurable time period.

Single Sign-On (SSO): The dashboard supports authentication with an external identity provider using the SAML 2.0 protocol.

Auditing: The dashboard backend can be configured to log all PUT, POST and DELETE API requests in the Ceph manager log.

Management features

View cluster hierarchy: You can view the CRUSH map, for example, to determine which host a specific OSD ID is running on.
This is helpful if there is an issue with an OSD.

Configure manager modules: You can view and change parameters for Ceph manager modules.

Embedded Grafana Dashboards: Ceph Dashboard Grafana dashboards might be embedded in external applications and web
pages to surface information and performance metrics gathered by the Prometheus module.

View and filter logs: You can view event and audit cluster logs and filter them based on priority, keyword, date, or time range.

Toggle dashboard components: You can enable and disable dashboard components so only the features you need are
available.

Manage OSD settings: You can set cluster-wide OSD flags using the dashboard. You can also mark OSDs up, down, or out, purge and reweight OSDs, perform scrub operations, modify various scrub-related configuration options, and select profiles to adjust the level of backfilling activity. You can set and change the device class of an OSD, and display and sort OSDs by device class. You can deploy OSDs on new drives and hosts.

Viewing Alerts: The alerts page allows you to see details of current alerts.

Quality of Service for images: You can set performance limits on images, for example limiting IOPS or read BPS burst rates.

Monitoring features

Username and password protection: You can access the dashboard only by providing a configurable user name and
password.

Overall cluster health: Displays performance and capacity metrics. This also displays the overall cluster status, storage
utilization, for example, number of objects, raw capacity, usage per pool, a list of pools and their status and usage statistics.

Hosts: Provides a list of all hosts associated with the cluster along with the running services and the installed Ceph version.

Performance counters: Displays detailed statistics for each running service.

Monitors: Lists all Monitors, their quorum status and open sessions.

Configuration editor: Displays all the available configuration options, their descriptions, types, default, and currently set
values. These values are editable.

Cluster logs: Displays and filters the latest updates to the cluster’s event and audit log files by priority, date, or keyword.

Device management: Lists all hosts known by the Orchestrator. Lists all drives attached to a host and their properties.
Displays drive health predictions, SMART data, and blink enclosure LEDs.

View storage cluster capacity: You can view raw storage capacity of the IBM Storage Ceph cluster in the Capacity panels of
the Ceph dashboard.



Pools: Lists and manages all Ceph pools and their details. For example: applications, placement groups, replication size, EC
profile, quotas, CRUSH ruleset, etc.

OSDs: Lists and manages all OSDs, their status, and usage statistics, as well as detailed information such as attributes (OSD map), metadata, and performance counters for read and write operations. Lists all drives associated with an OSD.

Images: Lists all RBD images and their properties such as size, objects, and features. Create, copy, modify and delete RBD
images. Create, delete, and rollback snapshots of selected images, protect or unprotect these snapshots against modification.
Copy or clone snapshots, flatten cloned images.

NOTE: The performance graph for I/O changes in the Overall Performance tab for a specific image shows values only after
specifying the pool that includes that image by setting the rbd_stats_pool parameter in Cluster > Manager modules >
Prometheus.

RBD Mirroring: Enables and configures RBD mirroring to a remote Ceph server. Lists all active sync daemons and their status,
pools and RBD images including their synchronization state.

Ceph File Systems: Lists all active Ceph file system (CephFS) clients and associated pools, including their usage statistics.
Evict active CephFS clients, manage CephFS quotas and snapshots, and browse a CephFS directory structure.

Object Gateway (RGW): Lists all active object gateways and their performance counters. Displays and manages (add, edit, delete) object gateway users and their details, for example quotas, as well as the users' buckets and their details, for example, owner or quotas.

Security features

SSL and TLS support: All HTTP communication between the web browser and the dashboard is secured via SSL. A self-signed
certificate can be created with a built-in command, but it is also possible to import custom certificates signed and issued by a
Certificate Authority (CA).

Reference
Edit online

For more information, see Toggling Ceph dashboard features.

IBM Storage Ceph Dashboard architecture


Edit online
The Dashboard architecture depends on the Ceph manager dashboard plugin and other components. See the diagram below to
understand how they work together.



Ceph Dashboard installation and access
Edit online
As a system administrator, you can access the dashboard with the credentials provided on bootstrapping the cluster.

Cephadm installs the dashboard by default. Following is an example of the dashboard URL:

URL: https://host01:8443/
User: admin
Password: zbiql951ar

NOTE: Update the browser and clear the cookies prior to accessing the dashboard URL.

The following are the Cephadm bootstrap options that are available for the Ceph dashboard configurations:

--initial-dashboard-user INITIAL_DASHBOARD_USER
Use this option while bootstrapping to set the initial dashboard user.

--initial-dashboard-password INITIAL_DASHBOARD_PASSWORD
Use this option while bootstrapping to set the initial dashboard password.

--ssl-dashboard-port SSL_DASHBOARD_PORT
Use this option while bootstrapping to set a custom dashboard port other than the default 8443.

--dashboard-key DASHBOARD_KEY
Use this option while bootstrapping to set a custom key for SSL.

--dashboard-crt DASHBOARD_CRT
Use this option while bootstrapping to set a custom certificate for SSL.

--skip-dashboard
Use this option while bootstrapping to deploy Ceph without the dashboard.

--dashboard-password-noupdate
Use this option while bootstrapping if you used the two password options above and do not want to reset the password at the first login.

--allow-fqdn-hostname
Use this option while bootstrapping to allow hostnames that are fully qualified.

--skip-prepare-host
Use this option while bootstrapping to skip preparing the host.

NOTE:

To avoid connectivity issues with the dashboard-related external URL, use fully qualified domain names (FQDN) for hostnames. For example, host01.ceph.redhat.com.

Open the Grafana URL directly in the client internet browser and accept the security exception to see the graphs on the Ceph
dashboard. Reload the browser to view the changes.

Example

[root@host01 ~]# cephadm bootstrap --mon-ip 127.0.0.1 --registry-json cephadm.txt --initial-dashboard-user admin --initial-dashboard-password zbiql951ar --dashboard-password-noupdate --allow-fqdn-hostname

NOTE:

While bootstrapping the storage cluster using cephadm, you can use the --image option for either custom container images or local container images.

You have to change the password the first time you log in to the dashboard with the credentials provided on bootstrapping only if the --dashboard-password-noupdate option is not used while bootstrapping. You can find the Ceph dashboard credentials in the /var/log/ceph/cephadm.log file. Search for the "Ceph Dashboard is now available at" string.

Network port requirements for Ceph Dashboard


Accessing the Ceph dashboard
Setting message of the day (MOTD)
Expanding the cluster
Toggling Ceph dashboard features
Understanding the landing page of the Ceph dashboard
Changing the dashboard password
Changing the Ceph dashboard password using the command line interface
Setting admin user password for Grafana
Enabling IBM Storage Ceph Dashboard manually
Creating an admin account for syncing users to the Ceph dashboard
Syncing users to the Ceph dashboard using Red Hat Single Sign-On
Enabling Single Sign-On for the Ceph Dashboard
Disabling Single Sign-On for the Ceph Dashboard

Network port requirements for Ceph Dashboard


Edit online
The Ceph dashboard components use certain TCP network ports which must be accessible. By default, the network ports are
automatically opened in firewalld during installation of IBM Storage Ceph.

TCP Port Requirements

Port 8443
Use: The dashboard web interface.
Originating hosts: IP addresses that need access to the Ceph Dashboard UI, and the host running the Grafana server, since the Alertmanager service can also initiate connections to the dashboard for reporting alerts.
Destination hosts: The Ceph Manager hosts.

Port 3000
Use: Grafana.
Originating hosts: IP addresses that need access to the Grafana Dashboard UI, all Ceph Manager hosts, and the Grafana server.
Destination hosts: The host or hosts running the Grafana server.

Port 9095
Use: Default Prometheus server for basic Prometheus graphs.
Originating hosts: IP addresses that need access to the Prometheus UI, all Ceph Manager hosts, and the Grafana server or hosts running Prometheus.
Destination hosts: The host or hosts running Prometheus.

Port 9093
Use: Prometheus Alertmanager.
Originating hosts: IP addresses that need access to the Alertmanager Web UI, all Ceph Manager hosts, and the Grafana server or hosts running Prometheus.
Destination hosts: All Ceph Manager hosts and the host running the Grafana server.

Port 9094
Use: Prometheus Alertmanager for configuring a highly available cluster made from multiple instances.
Originating hosts: All Ceph Manager hosts and the host running the Grafana server.
Destination hosts: Prometheus Alertmanager high availability (peer daemon sync); both the source and destination hosts should be running Prometheus Alertmanager.

Port 9100
Use: The Prometheus node-exporter daemon.
Originating hosts: Hosts running Prometheus that need to view the Node Exporter metrics Web UI, all Ceph Manager hosts, and the Grafana server or hosts running Prometheus.
Destination hosts: All storage cluster hosts, including MONs, OSDs, and the Grafana server host.

Port 9283
Use: The Ceph Manager Prometheus exporter module.
Originating hosts: Hosts running Prometheus that need access to the Ceph exporter metrics Web UI, and the Grafana server.
Destination hosts: All Ceph Manager hosts.
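
Cephadm normally opens these ports in firewalld automatically during installation. If you need to verify or open a port manually, for example after firewall rules were changed on a host, a minimal sketch looks like the following; port 8443 is used only as an example.

Example

[root@host01 ~]# firewall-cmd --list-ports
[root@host01 ~]# firewall-cmd --permanent --add-port=8443/tcp
[root@host01 ~]# firewall-cmd --reload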

Reference
Edit online

For more information, see Installing.

For more information, see Using and configuring firewalld.

Accessing the Ceph dashboard


Edit online
You can access the Ceph dashboard to administer and monitor your IBM Storage Ceph cluster.

Prerequisites
Edit online

Successful installation of IBM Storage Ceph Dashboard.

NTP is synchronizing clocks properly.

Procedure
Edit online

1. Enter the following URL in a web browser:

Syntax

https://HOST_NAME:PORT

Replace:

HOST_NAME with the fully qualified domain name (FQDN) of the active manager host.

PORT with port 8443

Example

https://host01:8443



You can also get the URL of the dashboard by running the following command in the Cephadm shell:

Example

[ceph: root@host01 /]# ceph mgr services

This command will show you all endpoints that are currently configured. Look for the dashboard key to obtain the URL for
accessing the dashboard.
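
For example, the output might resemble the following; the IP address and port depend on your deployment (illustrative output):

Example

{
    "dashboard": "https://10.8.0.101:8443/",
    "prometheus": "http://10.8.0.101:9283/"
}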

2. On the login page, enter the username admin and the default password provided during bootstrapping.

3. You have to change the password the first time you log in to the IBM Storage Ceph dashboard.

4. After logging in, the dashboard default landing page is displayed, which provides a high-level overview of status, performance,
and capacity metrics of the IBM Storage Ceph cluster.

5. Click the menu icon on the dashboard landing page to collapse or display the options in the vertical menu.

Reference
Edit online

For more information, see Changing the dashboard password.

Setting message of the day (MOTD)



Edit online
Sometimes, there is a need to inform the Ceph Dashboard users about the latest news, updates, and information on IBM Storage
Ceph.

As a storage administrator, you can configure a message of the day (MOTD) using the command-line interface (CLI).

When the user logs in to the Ceph Dashboard, the configured MOTD is displayed at the top of the Ceph Dashboard similar to the
Telemetry module.

The importance of MOTD can be configured based on severity, such as info, warning, or danger.

A MOTD with an info or warning severity can be closed by the user. The info MOTD is not displayed again until the local storage cookies are cleared or a new MOTD with a different severity is displayed. A MOTD with a warning severity is displayed again in a new session.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with the monitoring stack installed.

Root-level access to the cephadm host.

The dashboard module enabled.

Procedure
Edit online

1. Configure a MOTD for the dashboard:

Syntax

ceph dashboard motd set SEVERITY EXPIRES MESSAGE

Example

[ceph: root@host01 /]# ceph dashboard motd set danger 2d "Custom login message"
Message of the day has been set.

Replace

SEVERITY can be info, warning, or danger.

EXPIRES can be for seconds (s), minutes (m), hours (h), days (d), weeks (w), or never expires (0).

MESSAGE can be any custom message that users can view as soon as they log in to the dashboard.

2. Optional: Set the MOTD that does not expire:

Example

[ceph: root@host01 /]# ceph dashboard motd set danger 0 "Custom login message"
Message of the day has been set.

3. Get the configured MOTD:

Example

[ceph: root@host01 /]# ceph dashboard motd get


Message="Custom login message", severity="danger", expires="2022-09-08T07:38:52.963882Z"

4. Optional: Clear the configured MOTD using the clear command:

Example

[ceph: root@host01 /]# ceph dashboard motd clear


Message of the day has been cleared.



Verification
Edit online

Log in to the dashboard:

https://HOST_NAME:8443

Expanding the cluster


Edit online
You can use the dashboard to expand the IBM Storage Ceph cluster for adding hosts, adding OSDs, and creating services such as
Alertmanager, Cephadm-exporter, CephFS-mirror, Grafana, ingress, MDS, node-exporter, Prometheus, RBD-mirror, and Ceph Object
Gateway.

Once you bootstrap a new storage cluster, the Ceph Monitor and Ceph Manager daemons are created and the cluster is in
HEALTH_WARN state. After creating all the services for the cluster on the dashboard, the health of the cluster changes from
HEALTH_WARN to HEALTH_OK status.

Prerequisites
Edit online

Bootstrapped storage cluster. For more information, see Bootstrapping a new storage cluster.

At least cluster-manager role for the user on the IBM Storage Ceph Dashboard. For more information, see User roles and
permissions.

Procedure
Edit online

1. Copy the admin key from the bootstrapped host to other hosts:

Syntax

ssh-copy-id -f -i /etc/ceph/ceph.pub root@HOST_NAME

Example

[ceph: root@host01 /]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host02


[ceph: root@host01 /]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host03

2. Log in to the dashboard with the default credentials provided during bootstrap.

3. Change the password and log in to the dashboard with the new password.

4. On the landing page, click Expand Cluster.

5. Add hosts:

a. In the Add Hosts window, click +Add.

b. Provide the hostname. This is the same as the hostname that was provided while copying the key from the bootstrapped host.

NOTE: You can use the tool tip in the Add Hosts dialog box for more details.

c. Optional: Provide the respective IP address of the host.

d. Optional: Select the labels for the hosts on which the services are going to be created.

e. Click Add Host.



f. Follow the above steps for all the hosts in the storage cluster.

6. In the Add Hosts window, click Next.

7. Create OSDs:

a. In the Create OSDs window, for Primary devices, click +Add.

b. In the Primary Devices window, filter for the device and select the device.

c. Click Add.

d. Optional: In the Create OSDs window, if you have any shared devices such as WAL or DB devices, then add the devices.

e. Optional: Select the Encryption check box to enable encryption on the OSDs.

f. In the Create OSDs window, click Next.

8. Create services:

a. In the Create Services window, click +Create.

b. In the Create Service dialog box,

i. Select the type of the service from the drop-down.

ii. Provide the service ID, a unique name of the service.

iii. Provide the placement by hosts or label.

iv. Select the hosts.

v. Provide the number of daemons or services that need to be deployed.

c. Click Create Service.

9. In the Create Service window, click Next.

10. Review the Cluster Resources, Hosts by Services, Host Details. If you want to edit any parameter, click Back and follow the
above steps.

11. Click Expand Cluster.

12. You get a notification that the cluster expansion was successful.

13. The cluster health changes to HEALTH_OK status on the dashboard.

Verification
Edit online

1. Log in to the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Run the ceph -s command.

Example

[ceph: root@host01 /]# ceph -s

The health of the cluster is HEALTH_OK.
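
If you prefer the command-line interface, a roughly equivalent expansion can be sketched with the Ceph orchestrator; the hostname and IP address are illustrative, and the OSD specification assumes that all available devices should be consumed.

Example

[ceph: root@host01 /]# ceph orch host add host02 10.10.128.70
[ceph: root@host01 /]# ceph orch apply osd --all-available-devices
[ceph: root@host01 /]# ceph orch host ls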

Toggling Ceph dashboard features



Edit online
You can customize the IBM Storage Ceph dashboard components by enabling or disabling features on demand. All features are
enabled by default. When disabling a feature, the web-interface elements become hidden and the associated REST API end-points
reject any further requests for that feature. Enabling and disabling dashboard features can be done from the command-line interface
or the web interface.

Available features:

Ceph Block Devices:

Image management, rbd

Mirroring, mirroring

Ceph Filesystem, cephfs

Ceph Object Gateway, rgw

NOTE: By default, the Ceph Manager is collocated with the Ceph Monitor.

NOTE: You can disable multiple features at once.

IMPORTANT: Once a feature is disabled, it can take up to 20 seconds to reflect the change in the web interface.

Prerequisites
Edit online

Installation and configuration of the IBM Storage Ceph dashboard software.

User access to the Ceph Manager host or the dashboard web interface.

Root level access to the Ceph Manager host.

Procedure
Edit online

1. To toggle the dashboard features from the dashboard web interface:

a. On the dashboard landing page, navigate to Cluster drop-down menu.

b. Select Manager Modules, and then select Dashboard.

c. In the Edit Manager module page, you can enable or disable the dashboard features by checking or unchecking the selection
box next to the feature name.



d. Once the selections have been made, scroll down and click Update.

2. To toggle the dashboard features from the command-line interface:

a. Log in to the Cephadm shell:

[root@host01 ~]# cephadm shell

b. List the feature status:

[ceph: root@host01 /]# ceph dashboard feature status

c. Disable a feature:

[ceph: root@host01 /]# ceph dashboard feature disable rgw

This example disables the Ceph Object Gateway feature.

d. Enable a feature:

[ceph: root@host01 /]# ceph dashboard feature enable cephfs

This example enables the Ceph Filesystem feature.



Understanding the landing page of the Ceph dashboard
Edit online
The landing page displays an overview of the entire Ceph cluster using navigation bars and individual panels.

The navigation bar provides the following options:

Messages about tasks and notifications.

Link to the documentation, Ceph Rest API, and details about the IBM Storage Ceph Dashboard.

Link to user management and telemetry configuration.

Link to change password and sign out of the dashboard.

The individual panel displays specific information about the state of the cluster.

Categories

The landing page organizes panels into the following three categories:

1. Status

2. Capacity

3. Performance

Status panel
The status panels display the health of the cluster and host and daemon states.

Cluster Status
Displays the current health status of the Ceph storage cluster.

Hosts
Displays the total number of hosts in the Ceph storage cluster.

Monitors
Displays the number of Ceph Monitors and the quorum status.

OSDs
Displays the total number of OSDs in the Ceph storage cluster and the number that are up and in.

Managers
Displays the number and status of the Manager Daemons.



Object Gateways
Displays the number of Object Gateways in the Ceph storage cluster.

Metadata Servers
Displays the number and status of metadata servers for Ceph Filesystems (CephFS).

Capacity panel
The capacity panel displays storage usage metrics.

Raw Capacity
Displays the utilization and availability of the raw storage capacity of the cluster.

Objects
Displays the total number of objects in the pools and a graph dividing objects into states of Healthy, Misplaced, Degraded, or
Unfound.

PG Status
Displays the total number of Placement Groups and a graph dividing PGs into states of Clean, Working, Warning, or Unknown.
To simplify the display, the Working and Warning states each encompass multiple PG states.

The Working state includes PGs with any of these states:

activating

backfill_wait

backfilling

creating

deep

degraded

forced_backfill

forced_recovery

peering

peered

recovering

recovery_wait

repair

scrubbing

snaptrim

snaptrim_wait

The Warning state includes PGs with any of these states:

backfill_toofull

backfill_unfound

down

incomplete

inconsistent

recovery_toofull

recovery_unfound

remapped



snaptrim_error

stale

undersized

Pools
Displays the number of storage pools in the Ceph cluster.

PGs per OSD


Displays the number of placement groups per OSD.

Performance panel
The performance panel displays information related to data transfer speeds.

Client Read/Write
Displays total input/output operations per second, reads per second, and writes per second.

Client Throughput
Displays total client throughput, read throughput, and write throughput.

Recovery Throughput
Displays the data recovery rate.

Scrubbing
Displays whether Ceph is scrubbing data to verify its integrity.

Reference
Edit online

For more information, see Monitor the cluster.

Changing the dashboard password


Edit online
By default, the password for accessing dashboard is randomly generated by the system while bootstrapping the cluster. You have to
change the password the first time you log in to the IBM Storage Ceph dashboard. You can change the password for the admin user
using the dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Procedure
Edit online

1. Log in to the dashboard:

https://HOST_NAME:8443

2. Click the Dashboard Settings icon and then click User management.

3. To change the password of admin, click its row.

4. From the Edit drop-down menu, select Edit.

5. In the Edit User window, enter the new password, change any other parameters, and then click Edit User.

You will be logged out and redirected to the log-in screen. A notification appears confirming the password change.



Changing the Ceph dashboard password using the command line
interface
Edit online
If you have forgotten your Ceph dashboard password, you can change the password using the command line interface.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the host on which the dashboard is installed.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Create the dashboard_password.yml file:

Example

[ceph: root@host01 /]# touch dashboard_password.yml

3. Edit the file and add the new dashboard password:

Example

[ceph: root@host01 /]# vi dashboard_password.yml
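
Alternatively, you can write the new password to the file non-interactively; the password value shown is only a placeholder.

Example

[ceph: root@host01 /]# echo -n "NewP@ssw0rd" > dashboard_password.yml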

4. Reset the dashboard password:

Syntax

ceph dashboard ac-user-set-password DASHBOARD_USERNAME -i PASSWORD_FILE

Example

[ceph: root@host01 /]# ceph dashboard ac-user-set-password admin -i dashboard_password.yml


{"username": "admin", "password":
"$2b$12$i5RmvN1PolR61Fay0mPgt.GDpcga1QpYsaHUbJfoqaHd1rfFFx7XS", "roles": ["administrator"],
"name": null, "email": null, "lastUpdate": , "enabled": true, "pwdExpirationDate": null,
"pwdUpdateRequired": false}

Verification
Edit online

Log in to the dashboard with your new password.

Setting admin user password for Grafana

Edit online
By default, cephadm does not create an admin user for Grafana. With the Ceph Orchestrator, you can create an admin user and set
the password.



With these credentials, you can log in to the storage cluster’s Grafana URL with the given password for the admin user.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with the monitoring stack installed.

Root-level access to the cephadm host.

The dashboard module enabled.

Procedure
Edit online

1. As a root user, create a grafana.yml file and provide the following details:

Syntax

service_type: grafana
spec:
initial_admin_password: PASSWORD

Example

service_type: grafana
spec:
initial_admin_password: mypassword

2. Mount the grafana.yml file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount grafana.yml:/var/lib/ceph/grafana.yml

NOTE: Every time you exit the shell, you have to mount the file in the container before deploying the daemon.

3. Optional: Check if the dashboard Ceph Manager module is enabled:

Example

[ceph: root@host01 /]# ceph mgr module ls

4. Optional: Enable the dashboard Ceph Manager module:

Example

[ceph: root@host01 /]# ceph mgr module enable dashboard

5. Apply the specification using the orch command:

Syntax

ceph orch apply -i FILE_NAME.yml

Example

[ceph: root@host01 /]# ceph orch apply -i /var/lib/ceph/grafana.yml

6. Redeploy grafana service:

Example

[ceph: root@host01 /]# ceph orch redeploy grafana

This creates an admin user called admin with the given password and the user can log in to the Grafana URL with these
credentials.

Verification
Edit online

Log in to Grafana with the credentials:

Syntax

https://HOST_NAME:PORT

Example

https://host01:3000/

Enabling IBM Storage Ceph Dashboard manually


Edit online
If you have installed an IBM Storage Ceph cluster by using the --skip-dashboard option during bootstrap, the dashboard
URL and credentials are not available in the bootstrap output. You can enable the dashboard manually using the command-line
interface. Although the monitoring stack components such as Prometheus, Grafana, Alertmanager, and node-exporter are deployed,
they are disabled and you have to enable them manually.

Prerequisites
Edit online

A running IBM Storage Ceph cluster installed with --skip-dashboard option during bootstrap.

Root-level access to the host on which the dashboard needs to be enabled.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Check the Ceph Manager services:

Example

[ceph: root@host01 /]# ceph mgr services

{
"prometheus": "http://10.8.0.101:9283/"
}

You can see that the Dashboard URL is not configured.

3. Enable the dashboard module:

Example

[ceph: root@host01 /]# ceph mgr module enable dashboard

4. Create the self-signed certificate for the dashboard access:

Example

[ceph: root@host01 /]# ceph dashboard create-self-signed-cert

NOTE: You can disable the certificate verification to avoid certification errors.

5. Check the Ceph Manager services:

Example



[ceph: root@host01 /]# ceph mgr services

{
"dashboard": "https://10.8.0.101:8443/",
"prometheus": "http://10.8.0.101:9283/"
}

6. Create the admin user and password to access the dashboard:

Syntax

echo -n "PASSWORD" > PASSWORD_FILE


ceph dashboard ac-user-create admin -i PASSWORD_FILE administrator

Example

[ceph: root@host01 /]# echo -n "p@ssw0rd" > password.txt


[ceph: root@host01 /]# ceph dashboard ac-user-create admin -i password.txt administrator

7. Enable the monitoring stack. See Enabling monitoring stack.

Creating an admin account for syncing users to the Ceph


dashboard
Edit online
You have to create an admin account to synchronize users to the Ceph dashboard.

After creating the account, use Red Hat Single Sign-on (SSO) to synchronize users to the Ceph dashboard. See Syncing users to the
Ceph dashboard using Red Hat Single Sign-On.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level access to the dashboard.

Users are added to the dashboard.

Root-level access on all the hosts.

Red Hat Single Sign-On installed from a ZIP file. For more information, see Installing Red Hat Single Sign-On from a zip file.

Procedure
Edit online

1. Download the Red Hat Single Sign-On 7.4.0 Server on the system where IBM Storage Ceph is installed.

2. Unzip the folder:

[root@host01 ~]# unzip rhsso-7.4.0.zip

3. Navigate to the standalone/configuration directory and open the standalone.xml for editing:

[root@host01 ~]# cd standalone/configuration


[root@host01 configuration]# vi standalone.xml

4. Replace all instances of localhost and two instances of 127.0.0.1 with the IP address of the machine where Red Hat SSO
is installed.



5. Optional: For Red Hat Enterprise Linux 8, users might get Certificate Authority (CA) issues. Import the custom certificates from
CA and move them into the keystore with the exact java version.

Example

[root@host01 ~]# keytool -import -noprompt -trustcacerts -alias ca -file ../ca.cer -keystore /etc/java/java-1.8.0-openjdk/java-1.8.0-openjdk-1.8.0.272.b10-3.el8_3.x86_64/lib/security/cacert

6. To start the server from the bin directory of rh-sso-7.4 folder, run the standalone boot script:

[root@host01 bin]# ./standalone.sh

7. Create the admin account in https://IP_ADDRESS:8080/auth with a username and password:

NOTE: You have to create an admin account only the first time that you log in to the console.

8. Log into the admin console with the credentials created.

Reference
Edit online

For adding roles for users on the dashboard, see Creating roles.

For creating users on the dashboard, see Creating users.

Syncing users to the Ceph dashboard using Red Hat Single Sign-On
Edit online
You can use Red Hat Single Sign-on (SSO) with Lightweight Directory Access Protocol (LDAP) integration to synchronize users to the
IBM Storage Ceph Dashboard.

The users are added to specific realms in which they can access the dashboard through SSO without any additional requirements of
a password.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level access to the dashboard.

Users are added to the dashboard. See Creating users.

Root-level access on all the hosts.

Admin account created for syncing users. See Creating an admin account for syncing users to the Ceph dashboard.

Procedure
Edit online

1. To create a realm, click the Master drop-down menu. In this realm, you can provide access to users and applications.

2. In the Add Realm window, enter a case-sensitive realm name and set the parameter Enabled to ON and click Create.

3. In the Realm Settings tab, set the following parameters and click Save:

a. Enabled - ON

b. User-Managed Access - ON



c. Make a note of the link address of SAML 2.0 Identity Provider Metadata to paste in Client Settings.

4. In the Clients tab, click Create.

5. In the Add Client window, set the following parameters and click Save:

a. Client ID - BASE_URL:8443/auth/saml2/metadata

Example

https://example.ceph.redhat.com:8443/auth/saml2/metadata

b. Client Protocol - saml

6. In the Client window, under Settings tab, set the following parameters:

Client Settings tab

Client ID: BASE_URL:8443/auth/saml2/metadata (for example, https://example.ceph.redhat.com:8443/auth/saml2/metadata)

Enabled: ON

Client Protocol: saml

Include AuthnStatement: ON

Sign Documents: ON

Signature Algorithm: RSA_SHA1

SAML Signature Key Name: KEY_ID

Valid Redirect URLs: BASE_URL:8443 (for example, https://example.ceph.redhat.com:8443/)

Base URL: BASE_URL:8443 (for example, https://example.ceph.redhat.com:8443/)

Master SAML Processing URL: https://localhost:8080/auth/realms/REALM_NAME/protocol/saml/descriptor (for example, https://localhost:8080/auth/realms/Ceph_LDAP/protocol/saml/descriptor)

NOTE: Paste the link of SAML 2.0 Identity Provider Metadata from Realm Settings tab.

Under Fine Grain SAML Endpoint Configuration, set the following parameters and click Save:

Fine Grain SAML configuration

Assertion Consumer Service POST Binding URL: BASE_URL:8443/#/dashboard (for example, https://example.ceph.redhat.com:8443/#/dashboard)

Assertion Consumer Service Redirect Binding URL: BASE_URL:8443/#/dashboard (for example, https://example.ceph.redhat.com:8443/#/dashboard)

Logout Service Redirect Binding URL: BASE_URL:8443/ (for example, https://example.ceph.redhat.com:8443/)
7. In the Clients window, Mappers tab, set the following parameters and click Save:

Client Mappers tab

Protocol: saml

Name: username

Mapper Property: User Property

Property: username

SAML Attribute name: username
8. In the Clients Scope tab, select role_list:

a. In Mappers tab, select role list, set the Single Role Attribute to ON.

9. Select User_Federation tab:

a. In User Federation window, select ldap from the drop-down menu:



b. In User_Federation window, Settings tab, set the following parameters and click Save. Click Test authentication. You will get
a notification that the LDAP authentication is successful.

Console Display Name: rh-ldap

Import Users: ON

Edit_Mode: READ_ONLY

Username LDAP attribute: username

RDN LDAP attribute: username

UUID LDAP attribute: nsuniqueid

User Object Classes: inetOrgPerson, organizationalPerson, rhatPerson

Connection URL: for example, ldap://ldap.corp.redhat.com

Users DN: ou=users, dc=example, dc=com

Bind Type: simple

Click Test Connection. You will get a notification that the LDAP connection is successful.

c. In Mappers tab, select first name row and edit the following parameter and Click Save:

- LDAP Attribute - givenName

d. In User_Federation tab, Settings tab, Click Synchronize all users.

You will get a notification that the sync of users is finished successfully.

10. In the Users tab, search for the user added to the dashboard and click the Search icon.

11. To view the user, click the specific row. You should see the federation link as the name provided for the User Federation.

IMPORTANT: Do not add users manually, as manually added users are not synchronized by LDAP. If added manually, delete the user by clicking Delete.



Verification
Edit online

Users added to the realm and the dashboard can access the Ceph dashboard with their mail address and password.

Example

https://example.ceph.redhat.com:8443

Reference
Edit online

For adding roles for users on the dashboard, see Creating roles.

For more information, see Creating an admin account for syncing users to the Ceph dashboard.

Enabling Single Sign-On for the Ceph Dashboard


Edit online
The Ceph Dashboard supports external authentication of users with the Security Assertion Markup Language (SAML) 2.0 protocol. Before using single sign-on (SSO) with the Ceph dashboard, create the dashboard user accounts and assign the desired roles. The Ceph Dashboard performs authorization of the users, while the authentication process is performed by an existing Identity Provider (IdP). You can enable single sign-on using the SAML protocol.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Dashboard.

Root-level access to the Ceph Manager hosts.

Procedure
Edit online

1. To configure SSO on Ceph Dashboard, run the following command:

Syntax

podman exec CEPH_MGR_HOST ceph dashboard sso setup saml2 CEPH_DASHBOARD_BASE_URL IDP_METADATA
IDP_USERNAME_ATTRIBUTE IDP_ENTITY_ID SP_X_509_CERT SP_PRIVATE_KEY

Example

[root@host01 ~]# podman exec host01 ceph dashboard sso setup saml2
https://dashboard_hostname.ceph.redhat.com:8443 idp-metadata.xml username
https://10.70.59.125:8080/auth/realms/realm_name /home/certificate.txt /home/private-key.txt

Replace

CEPH_MGR_HOST with Ceph mgr host. For example, host01

CEPH_DASHBOARD_BASE_URL with the base URL where Ceph Dashboard is accessible.

IDP_METADATA with the URL to remote or local path or content of the IdP metadata XML. The supported URL types are
http, https, and file.

Optional: IDP_USERNAME_ATTRIBUTE with the attribute used to get the username from the authentication response.
Defaults to uid.



Optional: IDP_ENTITY_ID with the IdP entity ID when more than one entity ID exists on the IdP metadata.

Optional: SP_X_509_CERT with the file path of the certificate used by Ceph Dashboard for signing and encryption.

Optional: SP_PRIVATE_KEY with the file path of the private key used by Ceph Dashboard for signing and encryption.

2. Verify the current SAML 2.0 configuration:

Syntax

podman exec CEPH_MGR_HOST ceph dashboard sso show saml2

Example

[root@host01 ~]# podman exec host01 ceph dashboard sso show saml2

3. To enable SSO, run the following command:

Syntax

podman exec CEPH_MGR_HOST ceph dashboard sso enable saml2


SSO is "enabled" with "SAML2" protocol.

Example

[root@host01 ~]# podman exec host01 ceph dashboard sso enable saml2

4. Open your dashboard URL.

Example

https://dashboard_hostname.ceph.redhat.com:8443

5. On the SSO page, enter the login credentials. SSO redirects to the dashboard web interface.

Disabling Single Sign-On for the Ceph Dashboard


Edit online
You can disable single sign-on for Ceph Dashboard using the SAML 2.0 protocol.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Dashboard.

Root-level access to the Ceph Manager hosts.

Single sign-on enabled for Ceph Dashboard

Procedure
Edit online

1. To view status of SSO, run the following command:

Syntax

podman exec CEPH_MGR_HOST ceph dashboard sso status

Example

[root@host01 ~]# podman exec host01 ceph dashboard sso status


SSO is "enabled" with "SAML2" protocol.



2. To disable SSO, run the following command:

Syntax

podman exec CEPH_MGR_HOST ceph dashboard sso disable

Example

[root@host01 ~]# podman exec host01 ceph dashboard sso disable


SSO is "disabled".

Reference
Edit online

For information about enabling single sign-on, see Enabling Single Sign-On for the Ceph Dashboard.

Management of roles
Edit online
As a storage administrator, you can create, edit, clone, and delete roles on the dashboard.

By default, there are eight system roles. You can create custom roles and give permissions to those roles. These roles can be
assigned to users based on the requirements.

User roles and permissions


Creating roles
Editing roles
Cloning roles
Deleting roles

User roles and permissions


Edit online
User accounts are associated with a set of roles that define the specific dashboard functionality which can be accessed.

The IBM Storage Ceph dashboard functionality or modules are grouped within a security scope. Security scopes are predefined and static. The currently available security scopes on the IBM Storage Ceph dashboard are:

cephfs: Includes all features related to CephFS management.

config-opt: Includes all features related to management of Ceph configuration options.

dashboard-settings: Allows to edit the dashboard settings.

grafana: Includes all features related to the Grafana proxy.

hosts: Includes all features related to the Hosts menu entry.

log: Includes all features related to Ceph logs management.

manager: Includes all features related to Ceph manager management.

monitor: Includes all features related to Ceph monitor management.

osd: Includes all features related to OSD management.

pool: Includes all features related to pool management.

prometheus: Includes all features related to Prometheus alert management.

rbd-image: Includes all features related to RBD image management.



rbd-mirroring: Includes all features related to RBD mirroring management.

rgw: Includes all features related to Ceph object gateway (RGW) management.

A role specifies a set of mappings between a security scope and a set of permissions. There are four types of permissions:

Read

Create

Update

Delete

The list of system roles are:

administrator: Allows full permissions for all security scopes.

block-manager: Allows full permissions for RBD-image and RBD-mirroring scopes.

cephfs-manager: Allows full permissions for the Ceph file system scope.

cluster-manager: Allows full permissions for the hosts, OSDs, monitor, manager, and config-opt scopes.

pool-manager: Allows full permissions for the pool scope.

read-only: Allows read permission for all security scopes except the dashboard settings and config-opt scopes.

rgw-manager: Allows full permissions for the Ceph object gateway scope.

For example, you need to provide rgw-manager access to the users for all Ceph object gateway operations.
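
Role management is also available from the command-line interface. The following is a minimal sketch that creates a custom role, grants it read and update permissions on the rgw scope, and assigns it to a user; the role and user names are illustrative.

Example

[ceph: root@host01 /]# ceph dashboard ac-role-create rgw-auditor "Read-mostly RGW role"
[ceph: root@host01 /]# ceph dashboard ac-role-add-scope-perms rgw-auditor rgw read update
[ceph: root@host01 /]# ceph dashboard ac-user-set-roles user1 rgw-auditor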

Reference
Edit online

For creating users, see Creating Ceph Object Gateway.

For creating roles, see Creating roles on the Ceph dashboard.



Creating roles
Edit online
You can create custom roles on the dashboard and assign these roles to users based on their requirements.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level of access to the Dashboard.

Procedure
Edit online

1. Log in to the Dashboard.

2. Click the Dashboard Settings icon and then click User management.

3. On Roles tab, click Create.

4. In the Create Role window, set the Name, Description, and select the Permissions for this role, and then click the Create Role
button.



5. You get a notification that the role was created successfully.

6. Click on the Expand/Collapse icon of the row to view the details and permissions given to the roles.

Reference
Edit online

For more information, see User roles and permissions.

For more information, see Creating users.



Editing roles
Edit online
The dashboard allows you to edit roles on the dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level of access to the Dashboard.

A role is created on the dashboard.

Procedure
Edit online

1. Log in to the Dashboard.

2. Click the Dashboard Settings icon and then click User management.

3. On Roles tab, click the role you want to edit.

4. In the Edit Role window, edit the parameters, and then click Edit Role.

5. You get a notification that the role was updated successfully.

Cloning roles
Edit online
When you want to assign additional permissions to existing roles, you can clone the system roles and edit them on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level of access to the dashboard.

Roles are created on the dashboard.

Procedure
Edit online

1. Log in to the Dashboard.

2. Click the Dashboard Settings icon and then click User management.

3. On Roles tab, click the role you want to clone.

4. Select Clone from the Edit drop-down menu.



5. In the Clone Role dialog box, enter the details for the role, and then click Clone Role.

6. Once you clone the role, you can customize the permissions as per the requirements.

Reference
Edit online

For more information, see Creating roles.

Deleting roles
Edit online
You can delete the custom roles that you have created on the IBM Storage Ceph dashboard.

NOTE: You cannot delete the system roles of the Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level of access to the Dashboard.

A custom role is created on the dashboard.

Procedure
Edit online

1. Log in to the Dashboard.

2. Click the Dashboard Settings icon and then click User management.

3. On Roles tab, click the role you want to delete.

4. Select Delete from the Edit drop-down menu.

5. In the Delete Role dialog box, click the Yes, I am sure box and then click Delete Role.

Reference
Edit online

For more information, see Creating roles.

Management of users
Edit online
As a storage administrator, you can create, edit, and delete users with specific roles on the IBM Storage Ceph dashboard. Role-based
access control is given to each user based on their roles and requirements.

Creating users
Editing users
Deleting users



Creating users
Edit online
You can create users on the IBM Storage Ceph dashboard with adequate roles and permissions based on their responsibilities. For example, if you want a user to manage Ceph Object Gateway operations, you can give the rgw-manager role to that user.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level of access to the Dashboard.

NOTE: The IBM Storage Ceph Dashboard does not support any email verification when changing a user's password. This behavior is intentional, because the Dashboard supports Single Sign-On (SSO) and this feature can be delegated to the SSO provider.

Procedure
Edit online

1. Log in to the Dashboard.

2. Click the Dashboard Settings icon and then click User management.

3. On Users tab, click Create.

4. In the Create User window, set the Username and other parameters including the roles, and then click Create User.

Figure 1. Create User

5. You get a notification that the user was created successfully.
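
Users can also be managed from the command-line interface. A minimal sketch, assuming the password is stored in a local file; the username and role are illustrative.

Example

[ceph: root@host01 /]# echo -n "p@ssw0rd" > password.txt
[ceph: root@host01 /]# ceph dashboard ac-user-create user1 -i password.txt rgw-manager
[ceph: root@host01 /]# ceph dashboard ac-user-show user1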

Reference



Edit online

For more information, see Creating roles.

For more information, see User roles and permissions.

Editing users
Edit online
You can edit the users on the IBM Storage Ceph dashboard. You can modify the user’s password and roles based on the
requirements.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level of access to the Dashboard.

User created on the dashboard.

Procedure
Edit online

1. Log in to the Dashboard.

2. Click the Dashboard Settings icon and then click User management.

3. To edit the user, click the row.

4. On Users tab, select Edit from the Edit drop-down menu.

5. In the Edit User window, edit parameters like password and roles, and then click Edit User.



NOTE: If you want to disable any user's access to the Ceph dashboard, you can uncheck the Enabled option in the Edit User window.

6. You get a notification that the user was updated successfully.

Deleting users
Edit online
You can delete users on the Ceph dashboard. If a user is removed from the system, you can delete that user's access from the Ceph dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Admin level of access to the Dashboard.

User created on the dashboard.

Procedure
Edit online

1. Log in to the Dashboard.

2. Click the Dashboard Settings icon and then click User management.

3. On Users tab, click the user you want to delete.

4. Select Delete from the Edit drop-down menu.

5. In the Delete User dialog box, click the Yes, I am sure box and then click Delete User to save the settings.

Reference
Edit online

For more information, see Creating users.

Management of Ceph daemons


Edit online
As a storage administrator, you can manage Ceph daemons on the IBM Storage Ceph dashboard.

Daemon actions

Daemon actions
Edit online
The IBM Storage Ceph dashboard allows you to start, stop, restart, and redeploy daemons.

NOTE: These actions are supported on all daemons except monitor and manager daemons.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

At least one daemon is configured in the storage cluster.

Procedure
Edit online
You can manage daemons two ways.

From the Services page:

1. Log in to the dashboard.

2. From the Cluster drop-down menu, select Services.

3. View the details of the service with the daemon to perform the action on by clicking the Expand/Collapse icon on its row.

4. In Details, select the drop down next to the desired daemon to perform Start, Stop, Restart, or Redeploy.

From the Hosts page:

1. Log in to the dashboard.

2. From the Cluster drop-down menu, select Hosts.

3. From the Hosts List, select the host with the daemon to perform the action on.

4. In the Daemon tab of the host, click the daemon.

5. Use the drop down at the top to perform Start, Stop, Restart, or Redeploy.
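
The same actions are also available from the command-line interface through the Ceph orchestrator. A minimal sketch; the daemon name osd.0 is illustrative.

Example

[ceph: root@host01 /]# ceph orch ps
[ceph: root@host01 /]# ceph orch daemon restart osd.0
[ceph: root@host01 /]# ceph orch daemon redeploy osd.0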

Monitor the cluster


Edit online
As a storage administrator, you can use IBM Storage Ceph Dashboard to monitor specific aspects of the cluster based on types of
hosts, services, data access methods, and more.

Monitoring hosts of the Ceph cluster


Viewing and editing the configuration of the Ceph cluster
Viewing and editing the manager modules of the Ceph cluster
Monitoring monitors of the Ceph cluster
Monitoring services of the Ceph cluster
Monitoring Ceph OSDs
Monitoring HAProxy
Viewing the CRUSH map of the Ceph cluster
Filtering logs of the Ceph cluster
Monitoring pools of the Ceph cluster
Monitoring Ceph file systems
Monitoring Ceph object gateway daemons
Monitoring Block device images

Monitoring hosts of the Ceph cluster


Edit online
You can monitor the hosts of the cluster on the IBM Storage Ceph Dashboard.

The following are the different tabs on the hosts page:

Devices - This tab has details such as device ID, state of health, device name, and the daemons on the hosts.

Inventory - This tab shows all disks attached to a selected host, as well as their type, size, and other properties. It has details such as device path, type of device, availability, vendor, model, size, and the OSDs deployed.

Daemons - This tab shows all services that have been deployed on the selected host, which container they are running in, and their current status. It has details such as hostname, daemon type, daemon ID, container ID, container image name, container image ID, version, status, and last refreshed time.

Performance details - This tab has details such as OSDs deployed, CPU utilization, RAM usage, network load, network drop
rate, and OSD disk performance statistics.

Device health - For SMART-enabled devices, you can get the individual health status and SMART data only on the OSD
deployed hosts.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Hosts are added to the storage cluster.

All the services, monitor, manager and OSD daemons are deployed on the storage cluster.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Hosts.

3. To view the details of a specific host, click the Expand/Collapse icon on its row.



4. You can view the details such as Devices, Inventory, Daemons, Performance Details, and Device Health by clicking the
respective tabs.

Viewing and editing the configuration of the Ceph cluster


Edit online
You can view various configuration options of the Ceph cluster on the dashboard. You can edit only some configuration options.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

All the services are deployed on the storage cluster.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Configuration.

3. Optional: You can search for the configuration using the Search box:

4. Optional: You can filter for a specific configuration using following filters:

Level - Basic, advanced or dev

Service - Any, mon, mgr, osd, mds, common, mds_client, rgw, and similar filters.

Source - Any, mon, and similar filters

Modified - yes or no

5. To view the details of the configuration, click the Expand/Collapse icon on its row.

6. To edit a configuration, click its row and click Edit.

a. In the edit dialog window, edit the required parameters and Click Update.

7. You get a notification that the configuration was updated successfully.
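
Configuration options can also be inspected and changed from the command-line interface. A minimal sketch; the option and value below are illustrative.

Example

[ceph: root@host01 /]# ceph config get osd osd_memory_target
[ceph: root@host01 /]# ceph config set osd osd_memory_target 4294967296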

Viewing and editing the manager modules of the Ceph cluster


Edit online
Manager modules are used to manage module-specific configuration settings. For example, you can enable alerts for the health of
the cluster.

You can view, enable or disable, and edit the manager modules of a cluster on the IBM Storage Ceph dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Dashboard is installed.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Manager Modules.

3. To view the details of a specific manager module, click the Expand/Collapse icon on its row.

Enabling a manager module

1. Select the row.

2. From the Edit drop-down menu, select Enable.

Disabling a manager module

1. Select the row.

2. From the Edit drop-down menu, select Disable.

Editing a manager module

1. Select the row:

NOTE: Not all modules have configurable parameters. If a module is not configurable, the Edit button is disabled.

2. Edit the required parameters and click Update.

3. You get a notification that the module was updated successfully.
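
Manager modules can also be listed, enabled, and disabled from the command-line interface. A minimal sketch; the telemetry module is used only as an example.

Example

[ceph: root@host01 /]# ceph mgr module ls
[ceph: root@host01 /]# ceph mgr module enable telemetry
[ceph: root@host01 /]# ceph mgr module disable telemetry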

Monitoring monitors of the Ceph cluster


Edit online
You can monitor the performance of the Ceph Monitors on the landing page of the IBM Storage Ceph dashboard. You can also view details such as status, quorum, number of open sessions, and performance counters of the Monitors in the Monitors tab.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Monitors are deployed in the storage cluster.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Monitors.

3. The Monitors overview page displays information about the overall monitor status as well as tables of in Quorum and Not in
quorum Monitor hosts.

4. To see the number of open sessions, hover the cursor over the blue dotted trail.

5. To see performance counters for any monitor, click its hostname.
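
The same monitor information is available from the command-line interface. A minimal sketch:

Example

[ceph: root@host01 /]# ceph mon stat
[ceph: root@host01 /]# ceph quorum_status -f json-pretty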



Reference
Edit online

For more information, see Ceph monitors.

For more information, see Ceph performance counters.

Monitoring services of the Ceph cluster


Edit online
You can monitor the services of the cluster on the IBM Storage Ceph Dashboard. You can view details such as hostname, daemon type, daemon ID, container ID, container image name, container image ID, version, status, and last refreshed time.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Hosts are added to the storage cluster.

All the services are deployed on the storage cluster.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Services.

3. To view the details of a specific service, click the Expand/Collapse icon on its row.

Monitoring Ceph OSDs


Edit online
You can monitor the status of the Ceph OSDs on the landing page of the IBM Storage Ceph Dashboard. You can also view details such as host, status, device class, number of placement groups (PGs), size, flags, usage, and read or write operations time in the OSDs tab.

The following are the different tabs on the OSDs page:

Devices - This tab has details such as Device ID, state of health, life expectancy, device name, and the daemons on the hosts.

Attributes (OSD map) - This tab shows the cluster address, details of heartbeat, OSD state, and the other OSD attributes.

Metadata - This tab shows the details of the OSD object store, the devices, the operating system, and the kernel details.

Device health - For SMART-enabled devices, you can get the individual health status and SMART data.

Performance counter - This tab gives details of the bytes written on the devices.

Performance Details - This tab has details such as OSDs deployed, CPU utilization, RAM usage, network load, network drop
rate, and OSD disk performance statistics.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Hosts are added to the storage cluster.

All the services including OSDs are deployed on the storage cluster.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select OSDs.

3. To view the details of a specific OSD, click the Expand/Collapse icon on its row.

You can view additional details such as Devices, Attributes (OSD map), Metadata, Device Health, Performance counter, and
Performance Details by clicking on the respective tabs.
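
OSD usage and latency can also be checked from the command-line interface. A minimal sketch:

Example

[ceph: root@host01 /]# ceph osd df tree
[ceph: root@host01 /]# ceph osd perf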

Monitoring HAProxy
Edit online
The Ceph Object Gateway allows you to assign many instances of the object gateway to a single zone, so that you can scale out as
load increases. Since each object gateway instance has its own IP address, you can use HAProxy to balance the load across Ceph
Object Gateway servers.

You can monitor the following HAProxy metrics:

Total responses by HTTP code.

Total requests/responses.

Total number of connections.

Current total number of incoming / outgoing bytes.

You can also get the Grafana details by running the ceph dashboard get-grafana-api-url command.
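
For example, run the command from the Cephadm shell; the returned URL depends on your deployment.

Example

[ceph: root@host01 /]# ceph dashboard get-grafana-api-url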

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Admin level access on the storage dashboard.

An existing Ceph Object Gateway service, without SSL. If you want SSL service, the certificate should be configured on the
ingress service, not the Ceph Object Gateway service.

Ingress service deployed using the Ceph Orchestrator.

Monitoring stack components are created on the dashboard.

Procedure
Edit online

1. Log in to the Grafana URL and select the RGW_Overview panel:

Syntax



https://_DASHBOARD_URL_:3000

Example

https://dashboard_url:3000

2. Verify the HAProxy metrics on the Grafana URL.

3. Launch the Ceph dashboard and log in with your credentials.

Example

https://dashboard_url:8443

4. From the Cluster drop-down menu, select Object Gateway.

5. Select Daemons.

6. Select the Overall Performance tab.

Verification
Edit online

Verify the Ceph Object Gateway HAProxy metrics:

Reference
Edit online

For more information, see Configuring high availability for the Ceph Object Gateway.

Viewing the CRUSH map of the Ceph cluster


Edit online
You can view the CRUSH map that contains a list of OSDs and related information on the IBM Storage Ceph dashboard. Together,
the CRUSH map and CRUSH algorithm determine how and where data is stored. The dashboard allows you to view different aspects
of the CRUSH map, including OSD hosts, OSD daemons, ID numbers, device class, and more.

The CRUSH map allows you to determine which host a specific OSD ID is running on. This is helpful if there is an issue with an OSD.
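The same mapping is also available from the command line with the ceph osd tree command. The output below is only a sketch; the host names, OSD IDs, and weights are placeholders:

Example

[ceph: root@host01 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         0.29306  root default
-3         0.09769      host host01
 0    hdd  0.09769          osd.0        up   1.00000  1.00000
-5         0.09769      host host02
 1    hdd  0.09769          osd.1        up   1.00000  1.00000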

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

OSD daemons deployed on the storage cluster.



Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select CRUSH Map.

3. To view the details of the specific OSD, click its row.

Filtering logs of the Ceph cluster


Edit online
You can view and filter logs of the IBM Storage Ceph cluster based on several criteria. The criteria includes Priority, Keyword, Date,
and Time range.

You can download the logs to the system or copy the logs to the clipboard as well for further analysis.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

The Dashboard is installed.

Log entries have been generated since the Ceph Monitor was last started.

NOTE: The Dashboard logging feature only displays the thirty latest high level events. The events are stored in memory by the Ceph
Monitor. The entries disappear after restarting the Monitor. If you need to review detailed or older logs, refer to the file based logs.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Logs.

a. To filter by priority, click the Priority drop-down menu and select either Debug, Info, Warning, Error, or All.

b. To filter by keyword, enter text into the Keyword field.

c. To filter by date, click the Date field and either use the date picker to select a date from the menu, or enter a date in the
form of YYYY-MM-DD.

d. To filter by time, enter a range in the Time range fields using the HH:MM - HH:MM format. Hours must be entered
using numbers 0 to 23.

e. To combine filters, set two or more filters.

3. Click the Download icon or Copy to Clipboard icon to download the logs.

Reference
Edit online

For more information, see Configuring Logging.

For more information, see Understanding Ceph Logs.



Monitoring pools of the Ceph cluster
Edit online
A pool plays a critical role in how the Ceph storage cluster distributes and stores data. If you have deployed a cluster without
creating a pool, Ceph uses the default pools for storing data.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Pools are created

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, select Pools.

3. View the pools list which gives the details of Data protection and the application for which the pool is enabled. Hover the
mouse over Usage, Read bytes, and Write bytes for the required details.

4. To view more information about a pool, click the Expand/Collapse icon on its row.

Monitoring Ceph file systems


Edit online
You can use the IBM Storage Ceph Dashboard to monitor Ceph File Systems (CephFS) and related components.

There are four main tabs in File Systems:

Details - View the metadata servers (MDS) and their rank plus any standby daemons, pools and their usage, and performance counters.

Clients - View list of clients that have mounted the file systems.

Directories - View list of directories.

Performance - View the performance of the file systems.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

MDS service is deployed on at least one of the hosts.

Ceph File System is installed.

Procedure
Edit online



1. Log in to the dashboard.

2. On the navigation bar, click Filesystems.

3. To view more information about the file system, click the Expand/Collapse icon on its row.

Monitoring Ceph object gateway daemons


Edit online
You can use the IBM Storage Ceph Dashboard to monitor Ceph object gateway daemons. You can view the details, performance
counters and performance details of the Ceph object gateway daemons.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

At least one Ceph object gateway daemon configured in the storage cluster.

Procedure
Edit online

1. Log in to the dashboard.

2. On the navigation bar, click Object Gateway.

3. To view more information about the Ceph object gateway daemon, click the Expand/Collapse icon on its row.

If you have configured multiple Ceph Object Gateway daemons, click on Sync Performance tab and view the multi-site performance
counters.

Monitoring Block device images


Edit online
You can use the IBM Storage Ceph Dashboard to monitor and manage Block device images. You can view the details, snapshots,
configuration details, and performance details of the images.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Procedure
Edit online

1. Log in to the dashboard.



2. On the navigation bar, click Block.

3. To view more information about the images, click the Expand/Collapse icon on its row.

Management of Alerts
Edit online
As a storage administrator, you can see the details of alerts and create silences for them on the IBM Storage Ceph dashboard.

This includes the following pre-defined alerts:

CephadmDaemonFailed

CephadmPaused

CephadmUpgradeFailed

CephDaemonCrash

CephDeviceFailurePredicted

CephDeviceFailurePredictionTooHigh

CephDeviceFailureRelocationIncomplete

CephFilesystemDamaged

CephFilesystemDegraded

CephFilesystemFailureNoStandby

CephFilesystemInsufficientStandby

CephFilesystemMDSRanksLow

CephFilesystemOffline

CephFilesystemReadOnly

CephHealthError

CephHealthWarning

CephMgrModuleCrash

CephMgrPrometheusModuleInactive

CephMonClockSkew

CephMonDiskspaceCritical

CephMonDiskspaceLow

CephMonDown

CephMonDownQuorumAtRisk

CephNodeDiskspaceWarning

CephNodeInconsistentMTU

CephNodeNetworkPacketDrops

CephNodeNetworkPacketErrors

CephNodeRootFilesystemFull

CephObjectMissing



CephOSDBackfillFull

CephOSDDown

CephOSDDownHigh

CephOSDFlapping

CephOSDFull

CephOSDHostDown

CephOSDInternalDiskSizeMismatch

CephOSDNearFull

CephOSDReadErrors

CephOSDTimeoutsClusterNetwork

CephOSDTimeoutsPublicNetwork

CephOSDTooManyRepairs

CephPGBackfillAtRisk

CephPGImbalance

CephPGNotDeepScrubbed

CephPGNotScrubbed

CephPGRecoveryAtRisk

CephPGsDamaged

CephPGsHighPerOSD

CephPGsInactive

CephPGsUnclean

CephPGUnavilableBlockingIO

CephPoolBackfillFull

CephPoolFull

CephPoolGrowthWarning

CephPoolNearFull

CephSlowOps

PrometheusJobMissing

You can also monitor alerts using simple network management protocol (SNMP) traps.

Enabling monitoring stack


Configuring Grafana certificate
Adding Alertmanager webhooks
Viewing alerts
Creating a silence
Re-creating a silence
Editing a silence
Expiring a silence

Enabling monitoring stack


Edit online
You can manually enable the monitoring stack of the IBM Storage Ceph cluster, such as Prometheus, Alertmanager, and Grafana,
using the command-line interface. You can use the Prometheus and Alertmanager API to manage alerts and silences.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

root-level access to all the hosts.

Procedure
Edit online

1. Log into the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Set the APIs for the monitoring stack:

Specify the host and port of the Alertmanager server:

Syntax

ceph dashboard set-alertmanager-api-host '_ALERTMANAGER_API_HOST_:PORT'

Example

[ceph: root@host01 /]# ceph dashboard set-alertmanager-api-host 'http://10.0.0.101:9093'


Option ALERTMANAGER_API_HOST updated

To see the configured alerts, configure the URL to the Prometheus API. Using this API, the Ceph Dashboard UI verifies
that a new silence matches a corresponding alert.

Syntax

ceph dashboard set-prometheus-api-host '_PROMETHEUS_API_HOST_:PORT'

Example

[ceph: root@host01 /]# ceph dashboard set-prometheus-api-host 'http://10.0.0.101:9095'


Option PROMETHEUS_API_HOST updated

After setting up the hosts, refresh your browser’s dashboard window.

Specify the host and port of the Grafana server:

Syntax

ceph dashboard set-grafana-api-url '_GRAFANA_API_URL_:PORT'

Example

[ceph: root@host01 /]# ceph dashboard set-grafana-api-url 'http://10.0.0.101:3000'


Option GRAFANA_API_URL updated

3. Get the Prometheus, Alertmanager, and Grafana API host details:

Example

[ceph: root@host01 /]# ceph dashboard get-alertmanager-api-host


http://10.0.0.101:9093
[ceph: root@host01 /]# ceph dashboard get-prometheus-api-host
http://10.0.0.101:9095
[ceph: root@host01 /]# ceph dashboard get-grafana-api-url
http://10.0.0.101:3000



4. Optional: If you are using a self-signed certificate in your Prometheus, Alertmanager, or Grafana setup, disable the certificate verification in the dashboard. This avoids refused connections caused by certificates signed by an unknown Certificate Authority (CA) or that do not match the hostname.

For Prometheus:

Example

[ceph: root@host01 /]# ceph dashboard set-prometheus-api-ssl-verify False

For Alertmanager:

Example

[ceph: root@host01 /]# ceph dashboard set-alertmanager-api-ssl-verify False

For Grafana:

Example

[ceph: root@host01 /]# ceph dashboard set-grafana-api-ssl-verify False

5. Get the details of the self-signed certificate verification setting for Prometheus, Alertmanager, and Grafana:

Example

[ceph: root@host01 /]# ceph dashboard get-prometheus-api-ssl-verify


[ceph: root@host01 /]# ceph dashboard get-alertmanager-api-ssl-verify
[ceph: root@host01 /]# ceph dashboard get-grafana-api-ssl-verify

6. Optional: If the dashboard does not reflect the changes, you have to disable and then enable the dashboard:

Example

[ceph: root@host01 /]# ceph mgr module disable dashboard


[ceph: root@host01 /]# ceph mgr module enable dashboard

Configuring Grafana certificate


Edit online
cephadm deploys Grafana using the certificate defined in the Ceph key/value store. If a certificate is not specified, cephadm generates a self-signed certificate during the deployment of the Grafana service.

You can configure a custom certificate with the ceph config-key set command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Procedure
Edit online

1. Log into the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Configure the custom certificate for Grafana:

Example

[ceph: root@host01 /]# ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem


[ceph: root@host01 /]# ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem



3. If Grafana is already deployed, then run reconfig to update the configuration:

Example

[ceph: root@host01 /]# ceph orch reconfig grafana

4. Every time a new certificate is added, follow the below steps:

a. Make a new directory:

Example

[root@host01 ~]# mkdir /root/internalca


[root@host01 ~]# cd /root/internalca

b. Generate the key:

Example

[root@host01 internalca]# openssl ecparam -genkey -name secp384r1 -out $(date +%F).key

c. View the key:

Example

[root@host01 internalca]# openssl ec -text -in $(date +%F).key | less

d. Make a request:

Example

[root@host01 internalca]# umask 077; openssl req -config openssl-san.cnf -new -sha256 -key $(date +%F).key -out $(date +%F).csr

e. Review the request prior to sending it for signature:

Example

[root@host01 internalca]# openssl req -text -in $(date +%F).csr | less

f. Sign the request as the CA:

Example

# openssl ca -extensions v3_req -in $(date +%F).csr -out $(date +%F).crt -extfile openssl-san.cnf

g. Check the signed certificate:

Example

# openssl x509 -text -in $(date +%F).crt -noout | less

Reference
Edit online

See the Using shared system certificates for more details.

Adding Alertmanager webhooks


Edit online
You can add new webhooks to an existing Alertmanager configuration to receive real-time alerts about the health of the storage
cluster. You have to enable incoming webhooks to allow asynchronous messages into third-party applications.

For example, if an OSD is down in an IBM Storage Ceph cluster, you can configure the Alertmanager to send notification on Google
chat.

Prerequisites
Edit online



A running IBM Storage Ceph cluster with monitoring stack components enabled.

Incoming webhooks configured on the receiving third-party application.

Procedure
Edit online

1. Log into the cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Configure the Alertmanager to use the webhook for notification:

Syntax

service_type: alertmanager
spec:
  user_data:
    default_webhook_urls:
    - "_URLS_"

The default_webhook_urls setting is a list of additional URLs that are added to the default receivers' webhook_configs configuration.

Example

service_type: alertmanager
spec:
  user_data:
    webhook_configs:
    - url: 'http://127.0.0.10:8080'

3. Update Alertmanager configuration:

Example

[ceph: root@host01 /]# ceph orch reconfig alertmanager

Verification
Edit online

An example notification from Alertmanager to Gchat:

Example

using: https://chat.googleapis.com/v1/spaces/(xx- space identifyer -xx)/messages


posting: {'status': 'resolved', 'labels': {'alertname': 'PrometheusTargetMissing', 'instance':
'postgres-exporter.host03.chest
response: 200
response: {
"name": "spaces/(xx- space identifyer -xx)/messages/3PYDBOsIofE.3PYDBOsIofE",
"sender": {
"name": "users/114022495153014004089",
"displayName": "monitoring",
"avatarUrl": "",
"email": "",
"domainId": "",
"type": "BOT",
"isAnonymous": false,
"caaEnabled": false
},
"text": "Prometheus target missing (instance postgres-exporter.cluster.local:9187)\n\nA
Prometheus target has disappeared. An e
"cards": [],
"annotations": [],
"thread": {
"name": "spaces/(xx- space identifyer -xx)/threads/3PYDBOsIofE"
},



"space": {
"name": "spaces/(xx- space identifyer -xx)",
"type": "ROOM",
"singleUserBotDm": false,
"threaded": false,
"displayName": "_privmon",
"legacyGroupChat": false
},
"fallbackText": "",
"argumentText": "Prometheus target missing (instance postgres-
exporter.cluster.local:9187)\n\nA Prometheus target has disappea
"attachment": [],
"createTime": "2022-06-06T06:17:33.805375Z",
"lastUpdateTime": "2022-06-06T06:17:33.805375Z"

Viewing alerts
Edit online
After an alert has fired, you can view it on the IBM Storage Ceph Dashboard. You can edit the Manager module settings to trigger a
mail when an alert is fired.

NOTE: SSL is not supported in IBM Storage Ceph 6 cluster.
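If you prefer to configure the alerts Manager module from the command line instead of the Edit Manager module window, a sketch such as the following can be used. It assumes the module's standard mgr/alerts options; the SMTP host, sender, and destination addresses are placeholders for your environment:

Example

[ceph: root@host01 /]# ceph mgr module enable alerts
[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_host smtp.example.com
[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_sender ceph-alerts@example.com
[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_destination operator@example.com
[ceph: root@host01 /]# ceph config set mgr mgr/alerts/smtp_ssl false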

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A running simple mail transfer protocol (SMTP) configured.

An alert fired.

Procedure
Edit online

1. Log in to the Dashboard.

2. Customize the alerts module on the dashboard to get an email alert for the storage cluster:

a. On the navigation menu, click Cluster.

b. Select Manager modules.

c. Select alerts module.

d. In the Edit drop-down menu, select Edit.

e. In the Edit Manager module window, update the required parameters and click Update.

NOTE: Do not select the smtp_ssl parameter.

3. On the navigation menu, click Cluster.

4. Select Monitoring from the drop-down menu.

5. To view details of the alert, click the Expand/Collapse icon on its row.

6. To view the source of an alert, click on its row, and then click Source.

Creating a silence
Edit online
You can create a silence for an alert for a specified amount of time on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

An alert fired.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Cluster.

3. Select Monitoring from the drop-down menu.

4. To create a silence for an alert, select its row.

5. Click Create Silence.

6. In the Create Silence window, add the details for the Duration and click Create Silence.

7. You get a notification that the silence was created successfully.

Re-creating a silence
Edit online
You can re-create a silence from an expired silence on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

An alert fired.

A silence created for the alert.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Cluster.

3. Select Monitoring from the drop-down menu.

4. Click the Silences tab.

5. To recreate an expired silence, click its row.

6. Click the Recreate button.



7. In the Recreate Silence window, add the details and click Recreate Silence.

8. You get a notification that the silence was recreated successfully.

Editing a silence
Edit online
You can edit an active silence, for example, to extend the time it is active on the IBM Storage Ceph Dashboard. If the silence has
expired, you can either recreate a silence or create a new silence for the alert.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

An alert fired.

A silence created for the alert.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Cluster.

3. Select Monitoring from the drop-down menu.

4. Click the Silences tab.

5. To edit the silence, click its row.

6. In the Edit drop-down menu, select Edit.

7. In the Edit Silence window, update the details and click Edit Silence.

8. You get a notification that the silence was updated successfully.

Expiring a silence
Edit online
You can expire a silence so any matched alerts will not be suppressed on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

An alert fired.

A silence created for the alert.



Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Cluster.

3. Select Monitoring from the drop-down menu.

4. Click the Silences tab.

5. To expire a silence, click its row.

6. In the Edit drop-down menu, select Expire.

7. In the Expire Silence dialog box, select Yes, I am sure, and then click Expire Silence.

8. You get a notification that the silence was expired successfully.

Management of pools
Edit online
As a storage administrator, you can create, edit, and delete pools on the IBM Storage Ceph dashboard.

Creating pools
Editing pools
Deleting pools

Creating pools
Edit online
When you deploy a storage cluster without creating a pool, Ceph uses the default pools for storing data. You can create pools to
logically partition your storage objects on the IBM Storage Ceph dashboard.
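The same result can also be achieved from the command line. The pool name, placement group counts, and application below are placeholders, and a replicated pool is assumed:

Example

[ceph: root@host01 /]# ceph osd pool create mypool 32 32 replicated
[ceph: root@host01 /]# ceph osd pool application enable mypool rbd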

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Procedure
Edit online

1. Log in to the dashboard.

2. On the navigation menu, click Pools.

3. Click Create.

4. In the Create Pool window, set the following parameters:

a. Set the name of the pool and select the pool type.

b. Select either replicated or Erasure Coded (EC) pool type.

c. Set the Placement Group (PG) number.



d. Optional: If using a replicated pool type, set the replicated size.

e. Optional: If using an EC pool type configure the following additional settings.

f. Optional: To see the settings for the currently selected EC profile, click the question mark.

g. Optional: Add a new EC profile by clicking the plus symbol.

h. Optional: Click the pencil symbol to select an application for the pool.

i. Optional: Set the CRUSH rule, if applicable.

j. Optional: If compression is required, select passive, aggressive, or force.

k. Optional: Set the Quotas.

l. Optional: Set the Quality of Service configuration.

5. Click Create Pool.

6. You get a notification that the pool was created successfully.

Reference
Edit online

For more information, see Ceph pools.

Editing pools
Edit online
You can edit the pools on the IBM Storage Ceph Dashboard.

Prerequisites

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool is created.

Procedure
Edit online

1. Log in to the dashboard.

2. On the navigation menu, click Pools.

3. To edit the pool, click its row.

4. In the Edit drop-down menu, select Edit.

5. In the Edit Pool window, edit the required parameters and click Edit Pool.

6. You get a notification that the pool was updated successfully.

Deleting pools
Edit online
You can delete the pools on the IBM Storage Ceph Dashboard. Ensure that the value of mon_allow_pool_delete is set to true in the Manager modules.
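If you prefer to set this option from the command line before using the dashboard, a sketch such as the following can be used:

Example

[ceph: root@host01 /]# ceph config set mon mon_allow_pool_delete true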



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool is created.

Procedure
Edit online

1. Log in to the dashboard.

2. On the navigation bar, in Cluster drop-down menu, click Configuration.

3. In the Level drop-down menu, select Advanced.

4. Search for mon_allow_pool_delete, click Edit.

5. Set all the values to true:

6. On the navigation bar, click Pools.

7. To delete the pool, click on its row.

8. From Edit drop-down menu, select Delete.



9. In the Delete Pool window, Click the Yes, I am sure box and then click Delete Pool to save the settings.

Reference
Edit online

For more information, see Ceph pools.

For more information on Compression Modes, see Pool values.

Management of hosts
Edit online
As a storage administrator, you can enable or disable maintenance mode for a host in the IBM Storage Ceph Dashboard. The
maintenance mode ensures that shutting down the host, to perform maintenance activities, does not harm the cluster.

You can also remove hosts using Start Drain and Remove options in the IBM Storage Ceph Dashboard.

Entering maintenance mode


Exiting maintenance mode
Removing hosts

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Hosts, Ceph Monitors and Ceph Manager Daemons are added to the storage cluster.

Entering maintenance mode


Edit online
If the maintenance mode gets enabled successfully, the host is taken offline without any errors for the maintenance activity to be
performed. If the maintenance mode fails, it indicates the reasons for failure and the actions you need to take before taking the host
down.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.



All other prerequisite checks are performed internally by Ceph and any probable errors are taken care of internally by Ceph.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Hosts.

3. Select a host from the list.

4. From the Edit drop-down menu, click Enter Maintenance.

NOTE: When a host enters maintenance, all daemons are stopped. You can check the status of the daemons under the Daemons tab
of a host.

Verification
Edit online

You get a notification that the host is successfully moved to maintenance and a maintenance label appears in the Status
column.

NOTE: If the maintenance mode fails, you get a notification indicating the reasons for failure.

Exiting maintenance mode


Edit online
To restart a host, you can move it out of maintenance mode on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

All other prerequisite checks are performed internally by Ceph and any probable errors are taken care of internally by Ceph.

Procedure
Edit online

1. Log in to the Dashboard.



2. From the Cluster drop-down menu, select Hosts.

3. From the Hosts List, select the host in maintenance.

NOTE: You can identify the host in maintenance by checking for the maintenance label in the Status column.

4. From the Edit drop-down menu, click Exit Maintenance.

After exiting the maintenance mode, you need to create the required services on the host. By default, crash and node-exporter get deployed.

Verification
Edit online

You get a notification that the host has been successfully moved out of maintenance and the maintenance label is removed
from the Status column.

Removing hosts
Edit online
To remove a host from a Ceph cluster, you can use Start Drain and Remove options in IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

All other prerequisite checks are performed internally by Ceph and any probable errors are taken care of internally by Ceph.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Hosts.

3. From the Hosts List, select the host you want to remove.

4. From the Edit drop-down menu, click Start Drain. This option drains all the daemons from the host.

NOTE: The _no_schedule label is automatically applied to the host, which blocks the deployment of daemons on this host.

a. Optional: to stop the draining of daemons from the host, click Stop Drain option from the Edit drop-down menu.



5. Check if all the daemons are removed from the host.

a. Click the Expand/Collapse icon on its row.

b. Select Daemons. No daemons should be listed.

IMPORTANT: A host can be safely removed from the cluster after all the daemons are removed from it.

6. Remove the host.

a. From the Edit drop-down menu, click Remove.

b. In the Remove Host dialog box, check Yes, I am sure. and click Remove Host.

Verification
Edit online

You get a notification after the successful removal of the host from the Hosts List.
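You can also confirm the removal from the Cephadm shell; the removed host should no longer appear in the output:

Example

[ceph: root@host01 /]# ceph orch host ls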

Management of Ceph OSDs


Edit online
As a storage administrator, you can monitor and manage OSDs on the IBM Storage Ceph Dashboard.

Some of the capabilities of the IBM Storage Ceph Dashboard are:

List OSDs, their status, statistics, information such as attributes, metadata, device health, performance counters and
performance details.

Mark OSDs down, in, out, lost, purge, reweight, scrub, deep-scrub, destroy, delete, and select profiles to adjust backfilling
activity.

List all drives associated with an OSD.

Set and change the device class of an OSD.

Deploy OSDs on new drives and hosts.

Prerequisites

A running IBM Storage Ceph cluster.

cluster-manager level of access on the IBM Storage Ceph dashboard.



Managing the OSDs
Replacing the failed OSDs

Managing the OSDs


Edit online
You can carry out the following actions on a Ceph OSD on the IBM Storage Ceph Dashboard:

Create a new OSD.

Edit the device class of the OSD.

Mark the Flags as No Up, No Down, No In, or No Out.

Scrub and deep-scrub the OSDs.

Reweight the OSDs.

Mark the OSDs Out, In, Down, or Lost.

Purge the OSDs.

Destroy the OSDs.

Delete the OSDs.
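Most of these actions also have command-line equivalents, which can be useful for scripting. The sketch below uses OSD ID 0 as a placeholder; the weight passed to reweight is a value between zero and one:

Example

[ceph: root@host01 /]# ceph osd out 0
[ceph: root@host01 /]# ceph osd in 0
[ceph: root@host01 /]# ceph osd reweight 0 0.8
[ceph: root@host01 /]# ceph osd purge 0 --yes-i-really-mean-it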

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Hosts, Monitors and Manager Daemons are added to the storage cluster.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select OSDs.

Creating an OSD

1. To create the OSD, click Create.

NOTE: Ensure you have an available host and a few available devices. You can check for available devices in Physical Disks
under the Cluster drop-down menu.

a. In the Create OSDs window, from Deployment Options, select one of the below options:

Cost/Capacity-optimized: The cluster gets deployed with all available HDDs.

Throughput-optimized: Slower devices are used to store data and faster devices are used to store
journals/WALs.

IOPS-optimized: All the available NVMe devices are used to deploy OSDs.

b. From the Advanced Mode, you can add primary, WAL and DB devices by clicking +Add.

Primary devices: Primary storage devices contain all OSD data.



WAL devices: Write-Ahead-Log devices are used for BlueStore’s internal journal and are used only if the WAL
device is faster than the primary device. For example, NVMEs or SSDs.

DB devices: DB devices are used to store BlueStore’s internal metadata and are used only if the DB device is faster than the primary device. For example, NVMe devices or SSDs.

c. If you want to encrypt your data for security purposes, under Features, select encryption.

d. Click the Preview button.

e. In the OSD Creation Preview dialog box, click Create.

2. You get a notification that the OSD was created successfully.

3. The OSD status changes from in and down to in and up.

Editing an OSD

1. To edit an OSD, select the row.

a. From Edit drop-down menu, select Edit.

b. Edit the device class.

c. Click Edit OSD.

d. You get a notification that the OSD was updated successfully.

Marking the Flags of OSDs

1. To mark the flag of the OSD, select the row.

a. From Edit drop-down menu, select Flags.

b. Mark the Flags with No Up, No Down, No In, or No Out.

c. Click Update.

d. You get a notification that the flags of the OSD were updated successfully.

Scrubbing the OSDs

1. To scrub the OSD, select the row.

a. From Edit drop-down menu, select Scrub.

b. In the OSDs Scrub dialog box, click Update.

c. You get a notification that the scrubbing of the OSD was initiated successfully.

Deep-scrubbing the OSDs

1. To deep-scrub the OSD, select the row.

a. From Edit drop-down menu, select Deep scrub.

b. In the OSDs Deep Scrub dialog box, click Update.

c. You get a notification that the deep scrubbing of the OSD was initiated successfully.

Reweighting the OSDs

1. To reweight the OSD, select the row.

a. From Edit drop-down menu, select Reweight.

b. In the Reweight OSD dialog box, enter a value between zero and one.

c. Click Reweight.

Marking OSDs Out



1. To mark the OSD out, select the row.

a. From Edit drop-down menu, select Mark Out.

b. In the Mark OSD out dialog box, click Mark Out.

c. The status of the OSD will change to out.

Marking OSDs In

1. To mark the OSD in, select the OSD row that is in out status.

a. From Edit drop-down menu, select Mark In.

b. In the Mark OSD in dialog box, click Mark In.

c. The status of the OSD will change to in.

Marking OSDs Down

1. To mark the OSD down, select the row.

a. From Edit drop-down menu, select Mark Down.

b. In the Mark OSD down dialog box, click Mark Down.

c. The status of the OSD will change to down.

Marking OSDs Lost

1. To mark the OSD lost, select the OSD in out and down status.

a. From Edit drop-down menu, select Mark Lost.

b. In the Mark OSD Lost dialog box, check Yes, I am sure option, and click Mark Lost.

Purging OSDs

1. To purge the OSD, select the OSD in down status.

a. From Edit drop-down menu, select Purge.

b. In the Purge OSDs dialog box, check Yes, I am sure option, and click Purge OSD.

c. All the flags are reset and the OSD is back in in and up status.

Destroying OSDs

1. To destroy the OSD, select the OSD in down status.

a. From Edit drop-down menu, select Destroy.

b. In the Destroy OSDs dialog box, check Yes, I am sure option, and click Destroy OSD.

c. The status of the OSD changes to destroyed.

Deleting OSDs

1. To delete the OSD, select the OSD in down status.

a. From Edit drop-down menu, select Delete.

b. In the Delete OSDs dialog box, check the Yes, I am sure option, and click Delete OSD.

NOTE: You can preserve the OSD_ID when you have to replace the failed OSD.

Replacing the failed OSDs


Edit online



You can replace the failed OSDs in an IBM Storage Ceph cluster with the cluster-manager level of access on the dashboard. One of
the highlights of this feature on the dashboard is that the OSD IDs can be preserved while replacing the failed OSDs.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

At least cluster-manager level of access to the Ceph Dashboard.

At least one of the OSDs is down

Procedure
Edit online

1. On the dashboard, you can identify the failed OSDs in the following ways:

Dashboard AlertManager pop-up notifications.

Dashboard landing page showing HEALTH_WARN status.

Dashboard landing page showing failed OSDs.

Dashboard OSD page showing failed OSDs.

In this example, you can see that one of the OSDs is down on the landing page of the dashboard.

Apart from this, on the physical drive, you can view the LED lights blinking if one of the OSDs is down.

2. Click OSDs.

3. Select the out and down OSD:

a. From the Edit drop-down menu, select Flags and select No Up and click Update.

b. From the Edit drop-down menu, select Delete.

c. In the Delete OSD dialog box, select the Preserve OSD ID(s) for replacement and Yes, I am sure check boxes.

d. Click Delete OSD.

e. Wait till the status of the OSD changes to out and destroyed status.

4. Optional: If you want to change the No Up Flag for the entire cluster, in the Cluster-wide configuration drop-down menu, select
Flags.

a. In Cluster-wide OSDs Flags dialog box, select No Up and click Update.

5. Optional: If the OSDs are down due to a hard disk failure, replace the physical drive:

If the drive is hot-swappable, replace the failed drive with a new one.

If the drive is not hot-swappable and the host contains multiple OSDs, you might have to shut down the whole host and
replace the physical drive. Consider preventing the cluster from backfilling. For more information, see Stopping and
Starting Rebalancing.

When the drive appears under the /dev/ directory, make a note of the drive path.

If you want to add the OSD manually, find the OSD drive and format the disk.

If the new disk has data, zap the disk:

Syntax

ceph orch device zap HOST_NAME PATH --force

Example



ceph orch device zap ceph-adm2 /dev/sdc --force

6. From the Create drop-down menu, select Create.

7. In the Create OSDs window, click +Add for Primary devices.

a. In the Primary devices dialog box, from the Hostname drop-down list, select any one filter. From Any drop-down list,
select the respective option.

NOTE: You have to select the Hostname first and then at least one filter to add the devices.

For example, from Hostname list, select Type and from Any list select hdd. Select Vendor and from Any list, select ATA.

b. Click Add.

c. In the Create OSDs window, click the Preview button.

d. In the OSD Creation Preview dialog box, Click Create.

e. You will get a notification that the OSD is created. The OSD will be in out and down status.

8. Select the newly created OSD that has out and down status.

a. In the Edit drop-down menu, select Mark-in.

b. In the Mark OSD in window, select Mark in.

c. In the Edit drop-down menu, select Flags.

d. Uncheck No Up and click Update.

9. Optional: If you have changed the No Up Flag before for cluster-wide configuration, in the Cluster-wide configuration menu,
select Flags.

a. In Cluster-wide OSDs Flags dialog box, uncheck No Up and click Update.

Verification
Edit online

Verify that the OSD that was destroyed is created on the device and the OSD ID is preserved.

Management of Ceph Object Gateway


Edit online
As a storage administrator, the Ceph Object Gateway functions of the dashboard allow you to manage and monitor the Ceph Object
Gateway.

You can also create the Ceph Object Gateway services with Secure Sockets Layer (SSL) using the dashboard.

For example, monitoring functions allow you to view details about a gateway daemon such as its zone name, or performance graphs
of GET and PUT rates. Management functions allow you to view, create, and edit both users and buckets.

Ceph Object Gateway functions are divided between user functions and bucket functions.

Manually adding Ceph object gateway login credentials to the dashboard

Manually adding Ceph object gateway login credentials to the dashboard
Edit online



The IBM Storage Ceph Dashboard can manage the Ceph Object Gateway, also known as the RADOS Gateway, or RGW. When the Ceph Object Gateway is deployed with cephadm, the Ceph Object Gateway credentials used by the dashboard are automatically configured. You can also manually force the Ceph Object Gateway credentials to the Ceph dashboard using the command-line interface.

Creating the Ceph Object Gateway services with SSL using the dashboard
Management of Ceph Object Gateway users
Management of Ceph Object Gateway buckets

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Ceph Object Gateway is installed.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Set up the credentials manually:

Example

[ceph: root@host01 /]# ceph dashboard set-rgw-credentials

This creates a Ceph Object Gateway user with UID dashboard for each realm in the system.
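Optionally, you can confirm that the dashboard user exists with radosgw-admin:

Example

[ceph: root@host01 /]# radosgw-admin user info --uid=dashboard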

3. Optional: If you have configured a custom admin resource in your Ceph Object Gateway admin API, you also have to set the admin resource:

Syntax

ceph dashboard set-rgw-api-admin-resource RGW_API_ADMIN_RESOURCE

Example

[ceph: root@host01 /]# ceph dashboard set-rgw-api-admin-resource admin


Option RGW_API_ADMIN_RESOURCE updated

4. Optional: If you are using HTTPS with a self-signed certificate, disable certificate verification in the dashboard to avoid refused
connections.

Refused connections can happen when the certificate is signed by an unknown Certificate Authority, or if the host name used
does not match the host name in the certificate.

Syntax

ceph dashboard set-rgw-api-ssl-verify false

Example

[ceph: root@host01 /]# ceph dashboard set-rgw-api-ssl-verify False


Option RGW_API_SSL_VERIFY updated

5. Optional: If the Object Gateway takes too long to process requests and the dashboard runs into timeouts, you can set the
timeout value:

Syntax

ceph dashboard set-rest-requests-timeout _TIME_IN_SECONDS_



The default value is 45 seconds.

Example

[ceph: root@host01 /]# ceph dashboard set-rest-requests-timeout 240

Creating the Ceph Object Gateway services with SSL using the dashboard
Edit online
After installing an IBM Storage Ceph cluster, you can create the Ceph Object Gateway service with SSL using two methods:

Using the command-line interface.

Using the dashboard.
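For the command-line method, the service can be applied from a cephadm service specification. The following is a minimal sketch only; the service name, host name, port, and certificate are placeholders that must match your environment:

Example

service_type: rgw
service_id: myrgw
placement:
  hosts:
    - host01
spec:
  ssl: true
  rgw_frontend_port: 443
  rgw_frontend_ssl_certificate: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----

[ceph: root@host01 /]# ceph orch apply -i rgw-ssl.yaml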

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

SSL key from Certificate Authority (CA).

NOTE: Obtain the SSL certificate from a CA that matches the hostname of the gateway host. IBM recommends obtaining a certificate
from a CA that has subject alternate name fields and a wildcard for use with S3-style subdomains.

Procedure
Edit online

1. Log in to the Dashboard.

2. From the Cluster drop-down menu, select Services.

3. Click +Create.

4. In the Create Service window, select rgw service.

5. Select SSL and upload the Certificate in .pem format:



6. Click Create Service.

7. Check the Ceph Object Gateway service is up and running.

Reference
Edit online

See Configuring SSL for Beast.

Management of Ceph Object Gateway users


Edit online
As a storage administrator, the IBM Storage Ceph Dashboard allows you to view and manage Ceph Object Gateway users.

Creating Ceph object gateway users


Creating Ceph object gateway subusers
Editing Ceph object gateway users on the dashboard
Deleting Ceph object gateway users

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Dashboard is installed.

The Ceph Object Gateway is installed.

Object gateway login credentials are added to the dashboard.

Creating Ceph object gateway users


Edit online
You can create Ceph object gateway users on the IBM Storage Ceph dashboard once the credentials are set up using the CLI.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Ceph Object Gateway is installed.

Object gateway login credentials are added to the dashboard.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Object Gateway.

3. Click Users and then Click Create.

4. In the Create User window, set the following parameters:

a. Set the user name, full name, and edit the maximum number of buckets if required.

b. Optional: Set an email address or suspended status.

c. Optional: Set a custom access key and secret key by unchecking Auto-generate key.

d. Optional: Set a user quota.

e. Check Enabled under User quota.

f. Uncheck Unlimited size or Unlimited objects.

g. Enter the required values for Max. size or Max. objects.

h. Optional: Set a bucket quota.

i. Check Enabled under Bucket quota.

j. Uncheck Unlimited size or Unlimited objects:

k. Enter the required values for Max. size or Max. objects:

5. Click Create User.

Figure 1. Create user



6. You get a notification that the user was created successfully.

Reference
Edit online

For more information, see Adding Ceph object gateway login credentials to the dashboard.

For more information, see Ceph Object Gateway.

Creating Ceph object gateway subusers


Edit online
A subuser is associated with a user of the S3 interface. You can create a subuser for a specific Ceph object gateway user on the IBM Storage Ceph dashboard.
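Subusers can also be created from the command line with radosgw-admin; the user and subuser names below are placeholders:

Example

[ceph: root@host01 /]# radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full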

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Dashboard is installed.

The Ceph Object Gateway is installed.

Object gateway login credentials are added to the dashboard.

Object gateway user is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Object Gateway.

3. Click Users.

4. Select the user by clicking its row.

5. From Edit drop-down menu, select Edit.

6. In the Edit User window, click +Create Subuser.

7. In the Create Subuser dialog box, enter the user name and select the appropriate permissions.

8. Check the Auto-generate secret box and then click Create Subuser.

NOTE: By clicking Auto-generate-secret checkbox, the secret key for object gateway is generated automatically.

9. In the Edit User window, click the Edit user button.

10. You get a notification that the user was updated successfully.

Editing Ceph object gateway users on the dashboard


Edit online
You can edit Ceph object gateway users on the IBM Storage Ceph dashboard once the credentials are set up using the CLI.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

The Ceph Object Gateway is installed.

Object gateway login credentials are added to the dashboard.

A Ceph object gateway user is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Object Gateway.

3. Click Users.

4. To edit the user capabilities, click its row.

5. From the Edit drop-down menu, select Edit.

6. In the Edit User window, edit the required parameters.

7. Click Edit User.

8. You get a notification that the user was updated successfully.

Reference
Edit online

For more information, see Adding Ceph object gateway login credentials to the dashboard.

Deleting Ceph object gateway users


Edit online
You can delete Ceph object gateway users on the IBM Storage Ceph dashboard once the credentials are set up using the CLI.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

The Ceph Object Gateway is installed.

Object gateway login credentials are added to the dashboard.

A Ceph object gateway user is created.

Procedure
Edit online



1. Log in to the Dashboard.

2. On the navigation bar, click Object Gateway.

3. Click Users.

4. To delete the user, click its row.

5. From the Edit drop-down menu, select Delete.

6. In the Delete user dialog window, click the Yes, I am sure box and then click Delete User to save the settings.

Reference
Edit online

For more information, see Adding Ceph object gateway login credentials to the dashboard.

Management of Ceph Object Gateway buckets


Edit online
As a storage administrator, the IBM Storage Ceph Dashboard allows you to view and manage Ceph Object Gateway buckets.

Creating Ceph object gateway buckets


Editing Ceph object gateway buckets
Deleting Ceph object gateway buckets
Monitoring multisite object gateway configuration
Management of buckets of a multisite object configuration

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

The Ceph Object Gateway is installed.

At least one Ceph Object Gateway user is created.

Object gateway login credentials are added to the dashboard.

Creating Ceph object gateway buckets


Edit online
You can create Ceph object gateway buckets on the IBM Storage Ceph dashboard once the credentials are set up using the CLI.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

The Ceph Object Gateway is installed.



Object gateway login credentials are added to the dashboard.

Object gateway user is created and not suspended.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Object Gateway.

3. Click Buckets and then click Create.

4. In the Create Bucket window, enter a value for Name and select a user that is not suspended. Select a placement target.

NOTE: A bucket’s placement target is selected on creation and can not be modified.

5. Optional: Enable Locking for the objects in the bucket. Locking can only be enabled while creating a bucket. Once locking is
enabled, you also have to choose the lock mode, Compliance or Governance and the lock retention period in either days or
years, not both.

6. Click Create bucket.

7. You get a notification that the bucket was created successfully.

Editing Ceph object gateway buckets


Edit online
You can edit Ceph object gateway buckets on the IBM Storage Ceph dashboard once the credentials are set up using the CLI.

Prerequisites
Edit online



A running IBM Storage Ceph cluster.

Dashboard is installed.

The Ceph Object Gateway is installed.

Object gateway login credentials are added to the dashboard.

Object gateway user is created and not suspended.

A Ceph Object Gateway bucket created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Object Gateway.

3. Click Buckets.

4. To edit the bucket, click its row.

5. From the Edit drop-down select Edit.

6. In the Edit bucket window, edit the Owner by selecting the user from the dropdown.

a. Optional: Enable Versioning if you want to enable the versioning state for all the objects in an existing bucket. To enable versioning, you must be the owner of the bucket. If Locking is enabled during bucket creation, you cannot disable the versioning. All objects added to the bucket will receive a unique version ID. If the versioning state has not been set on a bucket, then the bucket will not have a versioning state.

b. Optional: Check Delete enabled for Multi-Factor Authentication. Multi-Factor Authentication (MFA) ensures that users need to use a one-time password (OTP) when removing objects on certain buckets. Enter a value for Token Serial Number and Token PIN.

NOTE: The buckets must be configured with versioning and MFA enabled which can be done through the S3 API.
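As a sketch of that S3 API step, MFA Delete can be enabled on a versioned bucket with the AWS CLI; the bucket name, endpoint, MFA serial number, and token below are placeholders:

Example

aws s3api put-bucket-versioning --bucket mybucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "SERIAL_NUMBER TOKEN_CODE" \
  --endpoint-url http://host01:80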

7. Click Edit Bucket.

8. You get a notification that the bucket was updated successfully.

Deleting Ceph object gateway buckets


Edit online
You can delete Ceph object gateway buckets on the IBM Storage Ceph dashboard once the credentials are set up using the CLI.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

The Ceph Object Gateway is installed.

Object gateway login credentials are added to the dashboard.

Object gateway user is created and not suspended.

A Ceph Object Gateway bucket created.



Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Object Gateway.

3. Click Buckets.

4. To delete the bucket, click its row.

5. From the Edit drop-down select Delete.

6. In the Delete Bucket dialog box, click the Yes, I am sure box and then click Delete bucket to save the settings.

Monitoring multisite object gateway configuration


Edit online
The IBM Storage Ceph dashboard supports monitoring the users and buckets of one zone in another zone in a multisite object
gateway configuration. For example, if the users and buckets are created in a zone in the primary site, you can monitor those users
and buckets in the secondary zone in the secondary site.

Prerequisites
Edit online

At least one running IBM Storage Ceph cluster deployed on both the sites.

Dashboard is installed.

The multi-site object gateway is configured on the primary and secondary sites.

Object gateway login credentials of the primary and secondary sites are added to the dashboard.

Object gateway users are created on the primary site.

Object gateway buckets are created on the primary site.

Procedure
Edit online

1. On the Dashboard landing page of the secondary site, in the vertical menu bar, click Object Gateway drop-down list.

2. Select Buckets.

3. You can see those object gateway buckets on the secondary landing page that were created for the object gateway users on
the primary site.



Management of buckets of a multisite object configuration
Edit online
As a storage administrator, you can edit buckets of one zone in another zone on the IBM Storage Ceph Dashboard. However, you can delete buckets of secondary sites in the primary site. You cannot delete the buckets of master zones of primary sites in other sites. For example, if the buckets are created in a zone in the secondary site, you can edit and delete those buckets in the master zone in the primary site.

Editing buckets of a multisite object gateway configuration


Deleting buckets of a multisite object gateway configuration

Prerequisites
Edit online

At least one running IBM Storage Ceph cluster deployed on both the sites.

Dashboard is installed.

The multi-site object gateway is configured on the primary and secondary sites.

Object gateway login credentials of the primary and secondary sites are added to the dashboard.

Object gateway users are created on the primary site.

Object gateway buckets are created on the primary site.

At least rgw-manager level of access on the Ceph dashboard.

Editing buckets of a multisite object gateway configuration


Edit online



You can edit and update the details of the buckets of one zone in another zone on the IBM Storage Ceph Dashboard in a multisite object gateway configuration. You can edit the owner, versioning, multi-factor authentication, and locking features of the buckets with this feature of the dashboard.

Prerequisites
Edit online

At least one running IBM Storage Ceph cluster deployed on both the sites.

Dashboard is installed.

The multi-site object gateway is configured on the primary and secondary sites.

Object gateway login credentials of the primary and secondary sites are added to the dashboard.

Object gateway users are created on the primary site.

Object gateway buckets are created on the primary site.

At least rgw-manager level of access on the Ceph dashboard.

Procedure
Edit online

1. On the Dashboard landing page of the secondary site, in the vertical menu bar, click Object Gateway drop-down list.

2. Select Buckets.

3. You can see those object gateway buckets on the secondary landing page that were created for the object gateway users on
the primary site.

4. Click the row of the bucket that you want to edit.

5. From the Edit drop-down menu, select Edit.

6. In the Edit Bucket window, edit the required parameters and click Edit Bucket.

Figure 1. Edit Bucket



Verification
Edit online

You will get a notification that the bucket is updated successfully.

Reference
Edit online

For more information on configuring multisite, see Multi-site configuration and administration.

For more information on adding object gateway login credentials to the dashboard, see Manually adding object gateway login
credentials to the Ceph dashboard.

For more information on creating object gateway users on the dashboard, see Creating object gateway users.

For more information on creating object gateway buckets on the dashboard, see Creating object gateway buckets.

For more information on system roles, see User roles and permissions.

Deleting buckets of a multisite object gateway configuration


Edit online
You can delete buckets of secondary sites in primary sites on the IBM Storage Ceph Dashboard in a multisite object gateway configuration.

IMPORTANT: IBM does not recommend deleting buckets of the primary site from secondary sites.



Prerequisites
Edit online

At least one running IBM Storage Ceph cluster deployed on both the sites.

Dashboard is installed.

The multi-site object gateway is configured on the primary and secondary sites.

Object gateway login credentials of the primary and secondary sites are added to the dashboard.

Object gateway users are created on the primary site.

Object gateway buckets are created on the primary site.

At least rgw-manager level of access on the Ceph dashboard.

Procedure
Edit online

1. On the Dashboard landing page of the primary site, in the vertical menu bar, click Object Gateway drop-down list.

2. Select Buckets.

3. You can see those object gateway buckets of the secondary site here.

4. Click the row of the bucket that you want to delete.

5. From the Edit drop-down menu, select Delete.

6. In the Delete Bucket dialog box, select the Yes, I am sure checkbox, and click Delete Bucket.

Verification
Edit online

The selected row of the bucket is deleted successfully.

Reference
Edit online

For more information on configuring multisite, see Multi-site configuration and administration.

For more information on adding object gateway login credentials to the dashboard, see Manually adding object gateway login
credentials to the Ceph dashboard.

For more information on creating object gateway users on the dashboard, see Creating object gateway users.

For more information on creating object gateway buckets on the dashboard, see Creating object gateway buckets.

For more information on system roles, see User roles and permissions.

Management of block devices


Edit online
As a storage administrator, you can manage and monitor block device images on the IBM Storage Ceph dashboard. The functionality
is divided between generic image functions and mirroring functions. For example, you can create new images, view the state of
images mirrored across clusters, and set IOPS limits on an image.



Management of block device images
Management of mirroring functions

Management of block device images


Edit online
As a storage administrator, you can create, edit, copy, purge, and delete images using the IBM Storage Ceph dashboard.

You can also create, clone, copy, rollback, and delete snapshots of the images using the Ceph dashboard.

NOTE: The Block Device images table is paginated for use with 10000+ image storage clusters to reduce Block Device information
retrieval costs.

Creating images
Creating namespaces
Editing images
Copying images
Moving images to trash
Purging trash
Restoring images from trash
Deleting images.
Deleting namespaces.
Creating snapshots of images
Renaming snapshots of images
Protecting snapshots of images
Cloning snapshots of images
Copying snapshots of images
Unprotecting snapshots of images
Rolling back snapshots of images
Deleting snapshots of images

Creating images
Edit online
You can create block device images on the IBM Storage Ceph dashboard.
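Images can also be created from the command line; the pool name, image name, and size below are placeholders:

Example

[ceph: root@host01 /]# rbd create --size 1024 mypool/myimage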

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.
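
If such a pool does not exist yet, it can be created from the command line before using the dashboard. The following is a minimal sketch; the pool name rbd_pool is a placeholder and the placement group count is only an example:

[ceph: root@host01 /]# ceph osd pool create rbd_pool 32
[ceph: root@host01 /]# rbd pool init rbd_pool

The rbd pool init command initializes the pool and enables the rbd application on it.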

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click the Block drop-down menu.

3. Select Images.

4. Click Create.

5. In the Create RBD window, enter the parameters.



6. Optional: Click Advanced and set the parameters.

7. Click Create RBD.

Figure 1. Create RBD

8. The block device image is created.

9. You get a notification that the image was created successfully.

Reference
Edit online

For more information about images, see Block devices.

For more information, see Creating pools.

Creating namespaces
Edit online
You can create namespaces for the block device images on the IBM Storage Ceph dashboard.

Once the namespaces are created, you can give access to the users for those namespaces.
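
The same result can also be achieved from the command line. The following is a minimal sketch, assuming a pool named rbd_pool and hypothetical namespace and client names:

[ceph: root@host01 /]# rbd namespace create --pool rbd_pool --namespace project-a
[ceph: root@host01 /]# ceph auth get-or-create client.project-a mon 'profile rbd' osd 'profile rbd pool=rbd_pool namespace=project-a'

The second command restricts the client.project-a user to images within the project-a namespace of the pool.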

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.



A Block device image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click the Block drop-down menu.

3. Select Images.

4. To create the namespace of the image, in the Namespaces tab, click Create.

5. In the Create Namespace window, select the pool and enter a name for the namespace.

6. Click Create.

7. You get a notification that the namespace was created successfully.

Reference
Edit online

See the Knowledgebase article Segregate Block device images within isolated namespaces for more details.

Editing images
Edit online
You can edit block device images on the IBM Storage Ceph dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click the Block drop-down menu.

3. Select Images.

4. To edit the image, click its row.

5. In the Edit drop-down menu, select Edit.

6. In the Edit RBD window, edit the required parameters and click Edit RBD.



7. You get a notification that the image was updated successfully.

Copying images
Edit online
You can copy block device images on the IBM Storage Ceph dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click the Block drop-down menu.

3. Select Images.

4. To copy the image, click its row.

5. In the Edit drop-down menu, select Copy.



6. In the Copy RBD window, set the required parameters and click Copy RBD.

7. You get a notification that the image was copied successfully.

Reference
Edit online

For more information on images, see Block devices.

For more information, see Creating pools.

Moving images to trash


Edit online
You can move block device images to trash before they are deleted on the IBM Storage Ceph dashboard.
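
The dashboard operation corresponds to the rbd trash commands. A minimal command-line sketch, assuming a pool named rbd_pool and an image named image1; the expiration date is only an example:

[ceph: root@host01 /]# rbd trash mv rbd_pool/image1 --expires-at "2025-12-31"
[ceph: root@host01 /]# rbd trash ls rbd_pool

The --expires-at option protects the image from being purged until the given date, and rbd trash ls shows the image ID that is needed later to restore or remove the image.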

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images from the drop-down menu.

4. To move the image to trash, click its row.

5. Select Move to Trash in the Edit drop-down.

6. In the Moving an image to trash window, edit the date until which the image needs protection, and then click Move.

7. You get a notification that the image was moved to trash successfully.

Purging trash
Edit online
You can purge trash using the IBM Storage Ceph dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.



A pool with the rbd application enabled is created.

An image is trashed.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Block.

3. Select Images.

4. In the Trash tab, click Purge Trash.

5. In the Purge Trash window, select the pool, and then click Purge Trash.

6. You get a notification that the images in the trash were purged successfully.

Restoring images from trash


Edit online
You can restore trashed images that have an expiry date on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is trashed.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To restore the image from Trash, in the Trash tab, click its row.

5. Select Restore in the Restore drop-down.

6. In the Restore Image window, enter the new name of the image, and then click Restore.

7. You get a notification that the image was restored successfully.

Deleting images
Edit online
You can delete the images only after the images are moved to trash. You can delete the cloned images and the copied images
directly without moving them to trash.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created and is moved to trash.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation bar, click Block

3. Select Images.

4. To delete the image, in the Trash tab, click its row.

5. Select Delete in the Restore drop-down menu.

6. Optional: To remove the cloned images and copied images, select Delete from the Edit drop-down menu.

7. In the Delete RBD dialog box, select the Yes, I am sure checkbox, and then click Delete RBD to save the settings.

8. You get a notification that the image was deleted successfully.

Reference
Edit online

For more information, see Moving images to trash.

Deleting namespaces
Edit online
You can delete the namespaces of the images on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created and is moved to trash.

A block device image and its namespaces are created.

Procedure
Edit online

1. Log in to the Dashboard.



2. On the navigation bar, click Block

3. Select Images.

4. To delete the namespace of the image, in the Namespaces tab, click its row.

5. Click Delete.

6. In the Delete Namespace dialog box, select the Yes, I am sure checkbox, and then click Delete Namespace to save the settings.

7. You get a notification that the namespace was deleted successfully.

Creating snapshots of images


Edit online
You can take snapshots of the Ceph block device images on the IBM Storage Ceph Dashboard.
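
Snapshots can also be created from the command line. A minimal sketch, assuming a pool named rbd_pool, an image named image1, and a snapshot named snap1:

[ceph: root@host01 /]# rbd snap create rbd_pool/image1@snap1
[ceph: root@host01 /]# rbd snap ls rbd_pool/image1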

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To take the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.

5. Select Create in the Create drop-down.

6. In the Create RBD Snapshot dialog, enter the name and click Create RBD Snapshot:

7. You get a notification that the snapshot was created successfully.

Reference
Edit online

For more information on creating snapshots, see Creating a block device snapshot.

For more information on creating RBD pools, see Creating pools.

For more information, see Creating images.

Renaming snapshots of images


Edit online
You can rename the snapshots of the Ceph block device images on the IBM Storage Ceph Dashboard.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

A snapshot of the image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To rename the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.

5. Select Rename in the Rename drop-down.

6. In the Rename RBD Snapshot dialog box, enter the name and click Rename RBD Snapshot.

Protecting snapshots of images


Edit online
You can protect the snapshots of the Ceph block device images on the IBM Storage Ceph Dashboard.

This is required when you need to clone the snapshots.
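
The equivalent command-line operation, shown here as a sketch with placeholder pool, image, and snapshot names, protects the snapshot so that a clone can be created from it:

[ceph: root@host01 /]# rbd snap protect rbd_pool/image1@snap1
[ceph: root@host01 /]# rbd clone rbd_pool/image1@snap1 rbd_pool/image1-clone

Protection prevents the snapshot from being deleted while clones depend on it.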

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

A snapshot of the image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To protect the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.

5. Select Protect in the Rename drop-down.



6. The State of the snapshot changes from UNPROTECTED to PROTECTED.

Cloning snapshots of images


Edit online
You can clone the snapshots of images on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

A snapshot of the image is created and protected.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To clone the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.

5. Select Clone in the Rename drop-down.

6. In the Clone RBD window, edit the parameters and click Clone RBD.



7. You get a notification that the snapshot was cloned successfully. You can search for the cloned image in the Images tab.

Reference
Edit online

For more information, see Protecting a Block device Snapshot.

For more information, see Protecting snapshots of images.

Copying snapshots of images


Edit online
You can copy the snapshots of images on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.



An image is created.

A snapshot of the image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To copy the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.

5. Select Copy in the Rename drop-down menu.

6. In the Copy RBD window, enter the parameters and click the Copy RBD button:

7. You get a notification that the snapshot was copied successfully. You can search for the copied image in the Images tab.

Reference
Edit online

For more information on creating RBD pools, see Creating pools.

For more information, see Creating images.

Unprotecting snapshots of images


Edit online
You can unprotect the snapshots of the Ceph block device images on the IBM Storage Ceph Dashboard.

This is required when you need to delete the snapshots.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

A snapshot of the image is created and protected.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To unprotect the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.



5. Select Unprotect in the Rename drop-down.

6. The State of the snapshot changes from PROTECTED to UNPROTECTED.

Rolling back snapshots of images


Edit online
You can roll back the snapshots of the Ceph block device images on the IBM Storage Ceph Dashboard. Rolling back an image to a
snapshot means overwriting the current version of the image with data from the snapshot. The time it takes to execute a rollback
increases with the size of the image. It is faster to clone from a snapshot than to roll back an image to a snapshot, and cloning is the
preferred method of returning to a pre-existing state.
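
The command-line equivalent is the rbd snap rollback command, shown here as a sketch with placeholder pool, image, and snapshot names:

[ceph: root@host01 /]# rbd snap rollback rbd_pool/image1@snap1

Avoid rolling back an image while clients are actively writing to it.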

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

A snapshot of the image is created.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To roll back the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.

5. Select Rollback in the Rename drop-down.

6. In the RBD snapshot rollback dialog box, click Rollback.

Deleting snapshots of images


Edit online
You can delete the snapshots of the Ceph block device images on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

A snapshot of the image is created and is unprotected.



Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Select Images.

4. To delete the snapshot of the image, in the Images tab, click its row, and then click the Snapshots tab.

5. Select Delete in the Rename drop-down.

6. You get a notification that the snapshot was deleted successfully.

Reference
Edit online

For more information, see Deleting a block device snapshot.

For more information, see Unprotecting snapshots of images.

Management of mirroring functions


Edit online
As a storage administrator, you can manage and monitor mirroring functions of the Block devices on the IBM Storage Ceph
Dashboard.

You can add another layer of redundancy to Ceph block devices by mirroring data images between storage clusters. Understanding
and using Ceph block device mirroring can provide you protection against data loss, such as a site failure. There are two
configurations for mirroring Ceph block devices, one-way mirroring or two-way mirroring, and you can configure mirroring on pools
and individual images.

Mirroring view
Editing mode of pools
Adding peer in mirroring
Editing peer in mirroring
Deleting peer in mirroring

Mirroring view
Edit online
You can view the Block device mirroring on the IBM Storage Ceph Dashboard.

You can view the daemons, the site details, the pools, and the images that are configured for Block device mirroring.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

Mirroring is configured.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Click Mirroring.

Editing mode of pools


Edit online
You can edit the mode of the overall state of mirroring functions, which includes pools and images, on the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Mirroring is configured.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Click Mirroring.

4. In the Pools tab, click the pool that you want to edit.

5. In the Edit Mode drop-down, select Edit Mode.

6. In the Edit pool mirror mode window, select the mode from the drop-down, and then click Update. The pool is updated
successfully.
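
The mirror mode of a pool can also be set from the command line, as in this sketch with a placeholder pool name:

[ceph: root@host01 /]# rbd mirror pool enable rbd_pool image
[ceph: root@host01 /]# rbd mirror pool info rbd_pool

Valid modes are image and pool; rbd mirror pool disable turns mirroring off for the pool.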

Adding peer in mirroring


Edit online
You can add storage cluster peer for the rbd-daemon mirror to discover its peer storage cluster on the IBM Storage Ceph
Dashboard.

Prerequisites
Edit online

Two healthy running IBM Storage Ceph clusters.

Dashboard is installed on both the clusters.

Pools created with the same name.



rbd application enabled on both the clusters.

NOTE: Ensure that mirroring is enabled for the pool in which images are created.

Procedure
Edit online
Site A

1. Log in to the dashboard.

2. From the Navigation menu, click the Block drop-down menu, and click Mirroring.

3. Click Create Bootstrap Token and configure the following in the window:

a. Choose the pool for mirroring for the provided site name.

b. For the selected pool, generate a new bootstrap token by clicking Generate.

c. Click the Copy icon to copy the token to clipboard.

d. Click Close.

4. Enable pool mirror mode.

a. Select the pool.

b. Click Edit Mode.

c. From the Edit pool mirror mode window, select Image from the drop-down.

d. Click Update.

Site B

1. Log in to the dashboard.

2. From the Navigation menu, click the Block drop-down menu, and click Mirroring.

3. From the Create Bootstrap token drop-down, select Import Bootstrap Token.

NOTE: Ensure that mirroring mode is enabled for the specific pool for which you are importing the bootstrap token.

4. In the Import Bootstrap Token window, choose the direction, and paste the token copied earlier from site A.

5. Click Submit. The peer is added and the images are mirrored in the cluster at site B.

6. Verify that the health of the pool is in the OK state. In the Navigation menu, under Block, select Mirroring. The health of the pool
should be OK.

Site A

1. Create an image with Mirroring enabled.

a. From the Navigation menu, click the Block drop-down menu.

b. Click Images.

c. Click Create.

d. In the Create RBD window, provide the Name, Size, and enable Mirroring.

NOTE: You can choose either Journal or Snapshot.

e. Click Create RBD.

2. Verify that the image is available at both sites. In the Navigation menu, under Block, select Images. The image in site A is
primary while the image in site B is secondary.
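
The same peering can be configured with the command-line bootstrap commands. The following is a sketch only; the site names, pool name, host names in the prompts, and token file path are placeholders:

Site A

[ceph: root@site-a /]# rbd mirror pool peer bootstrap create --site-name site-a rbd_pool > /root/bootstrap_token_site-a

Site B

[ceph: root@site-b /]# rbd mirror pool peer bootstrap import --site-name site-b --direction rx-tx rbd_pool /root/bootstrap_token_site-a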



Reference
Edit online

See Configuring two-way mirroring using the command-line interface.

Editing peer in mirroring


Edit online
You can edit the storage cluster peer for the rbd-mirror daemon to discover its peer storage cluster in the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Mirroring is configured.

A peer is added.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Click Mirroring.

4. In the Pools tab, click the pool whose peer you want to edit.

5. In the Edit Mode drop-down, select Edit peer.

6. In the Edit pool mirror peer window, edit the parameters, and then click Submit.



7. You get a notification that the peer was updated successfully.

Deleting peer in mirroring


Edit online
You can delete the storage cluster peer for the rbd-mirror daemon in the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Dashboard is installed.

A pool with the rbd application enabled is created.

An image is created.

Mirroring is configured.

A peer is added.

Procedure
Edit online

1. Log in to the Dashboard.

2. On the navigation menu, click Block.

3. Click Mirroring.

4. In the Pools tab, click the pool whose peer you want to delete.

5. In the Edit Mode drop-down, select Delete peer.

6. In the Delete mirror peer dialog window, select the Yes, I am sure checkbox, and then click Delete mirror peer to save the settings.



7. You get a notification that the peer was deleted successfully.

Reference
Edit online

For more information, see Adding peer in mirroring.

Activating and deactivating telemetry


Edit online
Activate the telemetry module to help Ceph developers understand how Ceph is used and what problems users might be
experiencing. This helps improve the dashboard experience. Activating the telemetry module sends anonymous data about the
cluster back to the Ceph developers.

View the telemetry data that is sent to the Ceph developers on the public telemetry dashboard. This allows the community to easily
see summary statistics on how many clusters are reporting, their total capacity and OSD count, and version distribution trends.

The telemetry report is broken down into several channels, each with a different type of information. Assuming telemetry has been
enabled, you can turn on and off the individual channels. If telemetry is off, the per-channel setting has no effect.

Basic: Provides basic information about the cluster.

Crash: Provides information about daemon crashes.

Device: Provides information about device metrics.

Ident: Provides user-provided identifying information about the cluster.

Perf: Provides various performance metrics of the cluster.

The data reports contain information that help the developers gain a better understanding of the way Ceph is used. The data includes
counters and statistics on how the cluster has been deployed, the version of Ceph, the distribution of the hosts, and other
parameters.

IMPORTANT: The data reports do not contain any sensitive data like pool names, object names, object contents, hostnames, or
device serial numbers.

NOTE: Telemetry can also be managed by using an API. For more information, see Telemetry.

Procedure
Edit online

1. Activate the telemetry module in one of the following ways:

From the banner within the Ceph dashboard.

Go to Settings > Telemetry configuration.

2. Select each channel that telemetry should be enabled on.

NOTE: For detailed information about each channel type, click More Info next to the channels.

3. Complete the Contact Information for the cluster. Enter the contact, Ceph cluster description, and organization.

4. Optional: Complete the Advanced Settings field options.



Interval: Set the interval by hour. The module compiles and sends a new report per this hour interval.

The default interval is 24 hours.

Proxy: Use this to configure an HTTP or HTTPS proxy server if the cluster cannot directly connect to the configured telemetry
endpoint. Add the server in one of the following formats:

https://10.0.0.1:8080 or https://ceph:[email protected]:8080

The default endpoint is telemetry.ceph.com.

5. Click Next. This displays the Telemetry report preview before enabling telemetry.

6. Review the Report preview.

NOTE: The report can be downloaded and saved locally or copied to the clipboard.

7. Select I agree to my telemetry data being submitted under the Community Data License Agreement.

8. Enable the telemetry module by clicking Update.

The following message is displayed, confirming the telemetry activation:

The Telemetry module has been configured and activated successfully

Deactivating telemetry
Edit online
To deactivate the telemetry module, go to Settings > Telemetry configuration and click Deactivate.
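
Telemetry can also be toggled and inspected from the command line, assuming the telemetry manager module is available, as in this sketch:

[ceph: root@host01 /]# ceph telemetry show
[ceph: root@host01 /]# ceph telemetry on --license sharing-1-0
[ceph: root@host01 /]# ceph telemetry status
[ceph: root@host01 /]# ceph telemetry off

The ceph telemetry show command prints the report that would be sent, so you can review it before opting in.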

Ceph Object Gateway


Edit online
Deploy, configure, and administer a Ceph Object Gateway environment.

This uses a "Day Zero", "Day One", and "Day Two" organizational methodology, providing readers with a logical progression path.

Day Zero is where research and planning are done before implementing a potential solution.

Day One is where the actual deployment, and installation of the software happens.

Day Two is where all the basic, and advanced configuration happens.

The Ceph Object Gateway


Considerations and recommendations
Deployment
Basic configuration
Advanced configuration
Security
Administration
Testing
Configuration reference

The Ceph Object Gateway


Edit online
Ceph Object Gateway, also known as RADOS Gateway (RGW), is an object storage interface built on top of the librados library to
provide applications with a RESTful gateway to Ceph storage clusters. Ceph Object Gateway supports three interfaces:

S3-compatibility: Provides object storage functionality with an interface that is compatible with a large subset of the Amazon S3
RESTful API.



Swift-compatibility: Provides object storage functionality with an interface that is compatible with a large subset of the OpenStack
Swift API.

The Ceph Object Gateway is a service interacting with a Ceph storage cluster. Since it provides interfaces compatible with OpenStack
Swift and Amazon S3, the Ceph Object Gateway has its own user management system. Ceph Object Gateway can store data in the
same Ceph storage cluster used to store data from Ceph block device clients; however, it would involve separate pools and likely a
different CRUSH hierarchy. The S3 and Swift APIs share a common namespace, so you can write data with one API and retrieve it
with the other.

Administrative API: Provides an administrative interface for managing the Ceph Object Gateways.

Administrative API requests are done on a URI that starts with the admin resource end point. Authorization for the administrative
API mimics the S3 authorization convention. Some operations require the user to have special administrative capabilities. The
response type can be either XML or JSON by specifying the format option in the request, but defaults to the JSON format.
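
For example, a user intended for the administrative API is typically created with radosgw-admin and then granted administrative capabilities; the user ID and capability set below are placeholders only:

# radosgw-admin user create --uid="admin-api-user" --display-name="Admin API user"
# radosgw-admin caps add --uid="admin-api-user" --caps="users=*;buckets=*;usage=read"

Requests signed with this user's access and secret keys can then be sent to endpoints under the admin resource, such as /admin/user or /admin/usage?format=json.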

Figure 1. Basic Access diagram

Reference
Edit online

See Developer for details on the available APIs in IBM Storage Ceph.

Considerations and recommendations


Edit online
As a storage administrator, it is important to have a basic understanding of what to consider before running a Ceph Object Gateway
and implementing a multi-site Ceph Object Gateway solution. You can learn about the hardware and network requirements, the types
of workloads that work well with a Ceph Object Gateway, and IBM's recommendations.

Network considerations for IBM Storage Ceph


Basic IBM Storage Ceph considerations
IBM Storage Ceph workload considerations
Ceph Object Gateway considerations
Developing CRUSH hierarchies
Creating CRUSH roots
Creating CRUSH rules
Ceph Object Gateway multi-site considerations
Considering storage sizing
Considering storage density
Considering disks for the Ceph Monitor nodes
Adjusting backfill and recovery settings
Adjusting the cluster map size
Adjusting scrubbing
Increase rgw_thread_pool_size
Increase objecter_inflight_ops
Tuning considerations for the Linux kernel when running Ceph

Prerequisites
Edit online

Time to understand, consider, and plan a storage solution.

Reference
Edit online

For more details about Ceph’s various internal components and the strategies around those components, see Storage
Strategies.

Network considerations for IBM Storage Ceph


Edit online
An important aspect of a cloud storage solution is that storage clusters can run out of IOPS due to network latency, and other factors.
Also, the storage cluster can run out of throughput due to bandwidth constraints long before the storage clusters run out of storage
capacity. This means that the network hardware configuration must support the chosen workloads to meet price versus performance
requirements.

Storage administrators prefer that a storage cluster recovers as quickly as possible. Carefully consider bandwidth requirements for
the storage cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-
cluster traffic. Also consider that network performance is increasingly important when considering the use of Solid State Disks (SSD),
flash, NVMe, and other high performing storage devices.

Ceph supports a public network and a storage cluster network. The public network handles client traffic and communication with
Ceph Monitors. The storage cluster network handles Ceph OSD heartbeats, replication, backfilling, and recovery traffic. At a
minimum, a single 10 GB Ethernet link should be used for storage hardware, and you can add additional 10 GB Ethernet links for
connectivity and throughput.

IMPORTANT:

IBM recommends allocating bandwidth to the storage cluster network, such that it is a multiple of the public network using the
osd_pool_default_size as the basis for the multiple on replicated pools. IBM also recommends running the public and storage
cluster networks on separate network cards.

IMPORTANT:



A 1 GB Ethernet network is not suitable for production storage clusters.

In the case of a drive failure, replicating 1 TB of data across a 1 GB Ethernet network takes 3 hours, and 3 TB takes 9 hours. Using 3
TB is the typical drive configuration. By contrast, with a 10 GB Ethernet network, the replication times would be 20 minutes and 1
hour. Remember that when a Ceph OSD fails, the storage cluster recovers by replicating the data it contained to other OSDs within
the same failure domain and device class as the failed OSD.

The failure of a larger domain such as a rack means that the storage cluster utilizes considerably more bandwidth. When building a
storage cluster consisting of multiple racks, which is common for large storage implementations, consider utilizing as much network
bandwidth between switches in a "fat tree" design for optimal performance. A typical 10 GB Ethernet switch has 48 10 GB ports and
four 40 GB ports. Use the 40 GB ports on the spine for maximum throughput. Alternatively, consider aggregating unused 10 GB ports
with QSFP+ and SFP+ cables into more 40 GB ports to connect to other rack and spine routers. Also, consider using LACP mode 4 to
bond network interfaces. Additionally, use jumbo frames, with a maximum transmission unit (MTU) of 9000, especially on the
backend or cluster network.

Most performance-related problems in Ceph usually begin with a networking issue. Simple network issues like a kinked or bent Cat-6
cable could result in degraded bandwidth. Use a minimum of 10 GB ethernet for the front side network. For large clusters, consider
using 40 GB ethernet for the backend or cluster network.

IMPORTANT:

For network optimization, IBM recommends using jumbo frames for a better CPU per bandwidth ratio, and a non-blocking network
switch back-plane. IBM Storage Ceph requires the same MTU value throughout all networking devices in the communication path,
end-to-end for both public and cluster networks. Verify that the MTU value is the same on all hosts and networking equipment in the
environment before using an IBM Storage Ceph cluster in production.
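
As a sketch, jumbo frames can be enabled on a Linux host with commands similar to the following; eth0 is a placeholder interface name and the nmcli connection name must match your environment:

# ip link set dev eth0 mtu 9000
# nmcli connection modify eth0 802-3-ethernet.mtu 9000

Apply the same MTU on every switch port, bond, and VLAN interface in the path.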

Basic IBM Storage Ceph considerations


Edit online
A storage strategy is a method of storing data that serves a particular use case. If you need to store volumes and images for a cloud
platform like OpenStack, you can choose to store data on faster Serial Attached SCSI (SAS) drives with Solid State Drives (SSD) for
journals. By contrast, if you need to store object data for an S3- or Swift-compliant gateway, you can choose to use something more
economical, like traditional Serial Advanced Technology Attachment (SATA) drives.

One of the most important steps in a successful Ceph deployment is identifying a price-to-performance profile suitable for the
storage cluster’s use case and workload. It is important to choose the right hardware for the use case. For example, choosing IOPS-
optimized hardware for a cold storage application increases hardware costs unnecessarily. Whereas, choosing capacity-optimized
hardware for its more attractive price point in an IOPS-intensive workload will likely lead to unhappy users complaining about slow
performance.

Use cases, cost versus benefit performance tradeoffs, and data durability are the primary considerations that help develop a sound
storage strategy.

Use Cases

Ceph provides massive storage capacity, and it supports numerous use cases, such as:

The Ceph Block Device client is a leading storage backend for cloud platforms that provides limitless storage for volumes and
images with high performance features like copy-on-write cloning.

The Ceph Object Gateway client is a leading storage backend for cloud platforms that provides a RESTful S3-compliant and
Swift-compliant object storage for objects like audio, bitmap, video, and other data.

The Ceph File System for traditional file storage.

Cost vs. Benefit of Performance

Faster is better. Bigger is better. High durability is better. However, there is a price for each superlative quality, and a corresponding
cost versus benefit tradeoff. Consider the following use cases from a performance perspective: SSDs can provide very fast storage
for relatively small amounts of data and journaling. Storing a database or object index can benefit from a pool of very fast SSDs, but
proves too expensive for other data. SAS drives with SSD journaling provide fast performance at an economical price for volumes and
images. SATA drives without SSD journaling provide cheap storage with lower overall performance. When you create a CRUSH
hierarchy of OSDs, you need to consider the use case and an acceptable cost versus performance tradeoff.



Data Durability

In large scale storage clusters, hardware failure is an expectation, not an exception. However, data loss and service interruption
remain unacceptable. For this reason, data durability is very important. Ceph addresses data durability with multiple replica copies
of an object or with erasure coding and multiple coding chunks. Multiple copies or multiple coding chunks present an additional cost
versus benefit tradeoff: it is cheaper to store fewer copies or coding chunks, but it can lead to the inability to service write requests in
a degraded state. Generally, one object with two additional copies, or two coding chunks can allow a storage cluster to service writes
in a degraded state while the storage cluster recovers.

Replication stores one or more redundant copies of the data across failure domains in case of a hardware failure. However,
redundant copies of data can become expensive at scale. For example, to store 1 petabyte of data with triple replication would
require a cluster with at least 3 petabytes of storage capacity.

Erasure coding stores data as data chunks and coding chunks. In the event of a lost data chunk, erasure coding can recover the lost
data chunk with the remaining data chunks and coding chunks. Erasure coding is substantially more economical than replication. For
example, using erasure coding with 8 data chunks and 3 coding chunks provides the same redundancy as 3 copies of the data.
However, such an encoding scheme uses approximately 1.5x the initial data stored compared to 3x with replication.

The CRUSH algorithm aids this process by ensuring that Ceph stores additional copies or coding chunks in different locations within
the storage cluster. This ensures that the failure of a single storage device or host does not lead to a loss of all of the copies or coding
chunks necessary to preclude data loss. You can plan a storage strategy with cost versus benefit tradeoffs, and data durability in
mind, then present it to a Ceph client as a storage pool.

IMPORTANT:

ONLY the data storage pool can use erasure coding. Pools storing service data and bucket indexes use replication.

IMPORTANT:

Ceph’s object copies or coding chunks make RAID solutions obsolete. Do not use RAID, because Ceph already handles data
durability, a degraded RAID has a negative impact on performance, and recovering data using RAID is substantially slower than using
deep copies or erasure coding chunks.

Colocating Ceph daemons and its advantages

Reference
Edit online

See the Minimum hardware considerations for IBM Storage Ceph

Colocating Ceph daemons and its advantages


Edit online
You can colocate containerized Ceph daemons on the same host. Here are the advantages of colocating some of Ceph’s daemons:

Significantly improves the total cost of ownership (TCO) at small scale.

Can increase overall performance.

Reduces the amount of physical hosts for a minimum configuration.

Better resource utilization.

Upgrading IBM Storage Ceph is easier.

By using containers, you can colocate one daemon from the following list with a Ceph OSD daemon (ceph-osd). Additionally, you can
colocate the Ceph Object Gateway (radosgw), Ceph Metadata Server (ceph-mds), or Grafana with a Ceph OSD daemon plus a daemon
from the list below.

Ceph Metadata Server (ceph-mds)

Ceph Monitor (ceph-mon)

Ceph Manager (ceph-mgr)



Grafana (ceph-grafana)

Table 1. Daemon Placement Example


Host Name Daemon Daemon Daemon
host1 OSD Monitor & Manager Prometheus
host2 OSD Monitor & Manager RGW
host3 OSD Monitor & Manager RGW
host4 OSD Metadata Server
host5 OSD Metadata Server
NOTE: Because ceph-mon and ceph-mgr work closely together, they are not considered two separate daemons for the purposes of
colocation.

Colocating Ceph daemons can be done from the command line interface, by using the --placement option to the ceph orch
command, or you can use a service specification YAML file.

Command line Example

[ceph: root@host01 /]# ceph orch apply mon --placement="host1 host2 host3"

Service Specification YAML File Example

service_type: mon
placement:
  hosts:
    - host01
    - host02
    - host03

[ceph: root@host01 /]# ceph orch apply -i mon.yml

The diagrams below show the difference between storage clusters with colocated and non-colocated daemons.

Figure 1. Colocated daemons

Figure 2. Non-colocated daemons



Reference
Edit online

See the Operations Guide for more details.

IBM Storage Ceph workload considerations


Edit online
One of the key benefits of a Ceph storage cluster is the ability to support different types of workloads within the same storage cluster
using performance domains. Different hardware configurations can be associated with each performance domain. Storage
administrators can deploy storage pools on the appropriate performance domain, providing applications with storage tailored to
specific performance and cost profiles. Selecting appropriately sized and optimized servers for these performance domains is an
essential aspect of designing a IBM Storage Ceph cluster.

To the Ceph client interface that reads and writes data, a Ceph storage cluster appears as a simple pool where the client stores data.
However, the storage cluster performs many complex operations in a manner that is completely transparent to the client interface.
Ceph clients and Ceph object storage daemons, referred to as Ceph OSDs, or simply OSDs, both use the Controlled Replication Under



Scalable Hashing (CRUSH) algorithm for the storage and retrieval of objects. Ceph OSDs can run in containers within the storage
cluster.

A CRUSH map describes a topography of cluster resources, and the map exists both on client hosts as well as Ceph Monitor hosts
within the cluster. Ceph clients and Ceph OSDs both use the CRUSH map and the CRUSH algorithm. Ceph clients communicate
directly with OSDs, eliminating a centralized object lookup and a potential performance bottleneck. With awareness of the CRUSH
map and communication with their peers, OSDs can handle replication, backfilling, and recovery—allowing for dynamic failure
recovery.

Ceph uses the CRUSH map to implement failure domains. Ceph also uses the CRUSH map to implement performance domains,
which simply take the performance profile of the underlying hardware into consideration. The CRUSH map describes how Ceph
stores data, and it is implemented as a simple hierarchy, specifically an acyclic graph, and a ruleset. The CRUSH map can support
multiple hierarchies to separate one type of hardware performance profile from another. Ceph implements performance domains
with device "classes".

Hard disk drives (HDDs) are typically appropriate for cost and capacity-focused workloads.

Throughput-sensitive workloads typically use HDDs with Ceph write journals on solid state drives (SSDs).

IOPS-intensive workloads, such as MySQL and MariaDB, often use SSDs.

Figure 1. Performance and failure domains

Workloads

IBM Storage Ceph is optimized for three primary workloads.

IMPORTANT: Carefully consider the workload being run by IBM Storage Ceph clusters BEFORE considering what hardware to
purchase, because it can significantly impact the price and performance of the storage cluster. For example, if the workload is
capacity-optimized and the hardware is better suited to a throughput-optimized workload, then hardware will be more expensive
than necessary. Conversely, if the workload is throughput-optimized and the hardware is better suited to a capacity-optimized
workload, then the storage cluster can suffer from poor performance.



IOPS optimized: Input, output per second (IOPS) optimization deployments are suitable for cloud computing operations,
such as running MYSQL or MariaDB instances as virtual machines on OpenStack. IOPS optimized deployments require higher
performance storage such as 15k RPM SAS drives and separate SSD journals to handle frequent write operations. Some high
IOPS scenarios use all flash storage to improve IOPS and total throughput.

An IOPS-optimized storage cluster has the following properties:

Lowest cost per IOPS.

Highest IOPS per GB.

99th percentile latency consistency.

Uses for an IOPS-optimized storage cluster are:

Typically block storage.

3x replication for hard disk drives (HDDs) or 2x replication for solid state drives (SSDs).

MySQL on OpenStack clouds.

Throughput optimized: Throughput-optimized deployments are suitable for serving up significant amounts of data, such as
graphic, audio, and video content. Throughput-optimized deployments require high bandwidth networking hardware,
controllers, and hard disk drives with fast sequential read and write characteristics. If fast data access is a requirement, then
use a throughput-optimized storage strategy. Also, if fast write performance is a requirement, using Solid State Disks (SSD) for
journals will substantially improve write performance.

A throughput-optimized storage cluster has the following properties:

Lowest cost per MBps (throughput).

Highest MBps per TB.

Highest MBps per BTU.

Highest MBps per Watt.

97th percentile latency consistency.

Uses for a throughput-optimized storage cluster are:

Block or object storage.

3x replication.

Active performance storage for video, audio, and images.

Streaming media, such as 4k video.

Capacity optimized: Capacity-optimized deployments are suitable for storing significant amounts of data as inexpensively as
possible. Capacity-optimized deployments typically trade performance for a more attractive price point. For example,
capacity-optimized deployments often use slower and less expensive SATA drives and co-locate journals rather than using
SSDs for journaling.

A cost and capacity-optimized storage cluster has the following properties:

Lowest cost per TB.

Lowest BTU per TB.

Lowest Watts required per TB.

Uses for a cost and capacity-optimized storage cluster are:

Typically object storage.

Erasure coding for maximizing usable capacity

Object archive.

Video, audio, and image object repositories.



Ceph Object Gateway considerations
Edit online
Another important aspect of designing a storage cluster is to determine if the storage cluster will be in one data center site or span
multiple data center sites. Multi-site storage clusters benefit from geographically distributed failover and disaster recovery, such as
long-term power outages, earthquakes, hurricanes, floods or other disasters. Additionally, multi-site storage clusters can have an
active-active configuration, which can direct client applications to the closest available storage cluster. This is a good storage
strategy for content delivery networks. Consider placing data as close to the client as possible. This is important for throughput-
intensive workloads, such as streaming 4k video.

IMPORTANT:

IBM recommends identifying realm, zone group and zone names BEFORE creating Ceph’s storage pools. Prepend some pool names
with the zone name as a standard naming convention.

Administrative data storage


Index pool
Data pool
Data extra pool

Reference
Edit online

Multi-site configuration and administration

Administrative data storage


Edit online
A Ceph Object Gateway stores administrative data in a series of pools defined in an instance’s zone configuration. For example, the
buckets, users, user quotas, and usage statistics discussed in the subsequent sections are stored in pools in the Ceph storage
cluster. By default, Ceph Object Gateway creates the following pools and maps them to the default zone.

.rgw.root

.default.rgw.control

.default.rgw.meta

.default.rgw.log

.default.rgw.buckets.index

.default.rgw.buckets.data

.default.rgw.buckets.non-ec

NOTE: The .default.rgw.buckets.index pool is created only after the bucket is created in Ceph Object Gateway, while the
.default.rgw.buckets.data pool is created after the data is uploaded to the bucket.

Consider creating these pools manually so you can set the CRUSH ruleset and the number of placement groups. In a typical
configuration, the pools that store the Ceph Object Gateway’s administrative data will often use the same CRUSH ruleset, and use
fewer placement groups, because there are 10 pools for the administrative data.

IBM recommends that the .rgw.root pool and the service pools use the same CRUSH hierarchy, and use at least node as the
failure domain in the CRUSH rule. IBM recommends using replicated data durability, and NOT erasure coding, for the .rgw.root
pool and the service pools.

The mon_pg_warn_max_per_osd setting warns you if you assign too many placement groups to an OSD, 300 by default. You may
adjust the value to suit your needs and the capabilities of your hardware, where n is the maximum number of PGs per OSD.

mon_pg_warn_max_per_osd = n
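
As a sketch, the value can also be set at runtime with the ceph config command; the value shown matches the documented default of 300:

# ceph config set mon mon_pg_warn_max_per_osd 300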



NOTE: For service pools, including .rgw.root, the suggested PG count from the Ceph placement groups (PGs) per pool calculator is
substantially less than the target PGs per Ceph OSD. Also, ensure the number of Ceph OSDs is set in step 4 of the calculator.

IMPORTANT:

Garbage collection uses the .log pool with regular RADOS objects instead of OMAP. In future releases, more features will store
metadata on the .log pool. Therefore, IBM recommends using NVMe/SSD Ceph OSDs for the .log pool.

.rgw.root Pool

The pool where the Ceph Object Gateway configuration is stored. This includes realms, zone groups, and zones. By convention, its
name is not prepended with the zone name.

Service Pools

The service pools store objects related to service control, garbage collection, logging, user information, and usage. By convention,
these pool names have the zone name prepended to the pool name.

.ZONE_NAME.rgw.control : The control pool.

.ZONE_NAME.log : The log pool contains logs of all bucket, container, and object actions, such as create, read, update, and
delete.

.ZONE_NAME.rgw.buckets.index : This pool stores index of the buckets.

.ZONE_NAME.rgw.buckets.data : This pool stores data of the buckets.

.ZONE_NAME.rgw.meta : The metadata pool stores user_keys and other critical metadata.

.ZONE_NAME.meta:users.uid : The user ID pool contains a map of unique user IDs.

.ZONE_NAME.meta:users.keys : The keys pool contains access keys and secret keys for each user ID.

.ZONE_NAME.meta:users.email : The email pool contains email addresses associated to a user ID.

.ZONE_NAME.meta:users.swift : The Swift pool contains the Swift subuser information for a user ID.

Reference
Edit online

About pools

Storage Strategies Guide

Index pool
Edit online
When selecting OSD hardware for use with a Ceph Object Gateway, irrespective of the use case, an OSD node that has at least one
high performance drive, either an SSD or NVMe drive, is required for storing the index pool. This is particularly important when
buckets contain a large number of objects.

For IBM Storage Ceph running BlueStore, IBM recommends deploying an NVMe drive as a block.db device, rather than as a
separate pool.

Ceph Object Gateway index data is written only into an object map (OMAP). OMAP data for BlueStore resides on the block.db
device on an OSD. When an NVMe drive functions as a block.db device for an HDD OSD and when the index pool is backed by HDD
OSDs, the index data will ONLY be written to the block.db device. As long as the block.db partition/lvm is sized properly at 4% of
block, this configuration is all that is needed for BlueStore.

NOTE: IBM does not support HDD devices for index pools. For more information on supported configurations, see the Red Hat Ceph
Storage: Supported configurations article.

An index entry is approximately 200 bytes of data, stored as an OMAP in rocksdb. While this is a trivial amount of data, some uses
of Ceph Object Gateway can result in tens or hundreds of millions of objects in a single bucket. By mapping the index pool to a



CRUSH hierarchy of high performance storage media, the reduced latency provides a dramatic performance improvement when
buckets contain very large numbers of objects.

IMPORTANT: In a production cluster, a typical OSD node will have at least one SSD or NVMe drive for storing the OSD journal and the
index pool or block.db device, which use separate partitions or logical volumes for the same physical drive.

Data pool
Edit online
The data pool is where the Ceph Object Gateway stores the object data for a particular storage policy. The data pool has a full
complement of placement groups (PGs), not the reduced number of PGs for service pools. Consider using erasure coding for the data
pool, as it is substantially more efficient than replication, and can significantly reduce the capacity requirements while maintaining
data durability.

To use erasure coding, create an erasure code profile. See Erasure code profiles

IMPORTANT: Choosing the correct profile is important because you cannot change the profile after you create the pool. To modify a
profile, you must create a new pool with a different profile and migrate the objects from the old pool to the new pool.

The default configuration is two data chunks (k) and two encoding chunks (m), which means two OSDs can be lost without losing
data. For higher resiliency, consider a larger number of data and encoding chunks. For example, some large scale systems use 8 data
chunks and 3 encoding chunks, which allows 3 OSDs to fail without losing data.

IMPORTANT: Each data and encoding chunk SHOULD get stored on a different node or host at a minimum. For smaller storage
clusters, this makes using rack impractical as the minimum CRUSH failure domain for a larger number of data and encoding chunks.
Consequently, it is common for the data pool to use a separate CRUSH hierarchy with host as the minimum CRUSH failure domain.
IBM recommends host as the minimum failure domain. If erasure code chunks get stored on Ceph OSDs within the same host, a
host failure, such as a failed journal or network card, could lead to data loss.

To create a data pool, run the ceph osd pool create command with the pool name, the number of PGs and PGPs, the erasure
data durability method, the erasure code profile, and the name of the rule.
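
A minimal command-line sketch, with placeholder profile name and example PG counts:

# ceph osd erasure-code-profile set rgw-ec-profile k=8 m=3 crush-failure-domain=host
# ceph osd pool create default.rgw.buckets.data 128 128 erasure rgw-ec-profile
# ceph osd pool application enable default.rgw.buckets.data rgw

The crush-failure-domain value controls the bucket type used by the erasure code rule that is generated for the profile.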

Data extra pool


Edit online
The data_extra_pool is for data that cannot use erasure coding. For example, multi-part uploads allow uploading a large object,
such as a movie in multiple parts. These parts must first be stored without erasure coding. Erasure coding applies to the whole
object, not the partial uploads.

NOTE: The placement group (PG) per Pool Calculator recommends a smaller number of PGs per pool for the data_extra_pool;
however, the PG count is approximately twice the number of PGs as the service pools and the same as the bucket index pool.

To create a data extra pool, run the ceph osd pool create command with the pool name, the number of PGs and PGPs, the
replicated data durability method, and the name of the rule. For example:

# ceph osd pool create .us-west.rgw.buckets.non-ec 64 64 replicated rgw-service

Developing CRUSH hierarchies


Edit online
As a storage administrator, when deploying a Ceph storage cluster and an Object Gateway, typically the Ceph Object Gateway has a
default zone group and zone. The Ceph storage cluster will have default pools, which in turn will use a CRUSH map with a default
CRUSH hierarchy and a default CRUSH rule.

IMPORTANT: The default rbd pool can use the default CRUSH rule. DO NOT delete the default rule or hierarchy if Ceph clients have
used them to store client data.



Production gateways typically use a custom realm, zone group and zone named according to the use and geographic location of the
gateways. Additionally, the Ceph storage cluster will have a CRUSH map that has multiple CRUSH hierarchies.

Service Pools: At least one CRUSH hierarchy will be for service pools and potentially for data. The service pools include
.rgw.root and the service pools associated with the zone. Service pools typically fall under a single CRUSH hierarchy, and
use replication for data durability. A data pool may also use the CRUSH hierarchy, but the pool will usually be configured with
erasure coding for data durability.

Index: At least one CRUSH hierarchy SHOULD be for the index pool, where the CRUSH hierarchy maps to high performance
media, such as SSD or NVMe drives. Bucket indices can be a performance bottleneck. IBM recommends to use SSD or NVMe
drives in this CRUSH hierarchy. Create partitions for indices on SSDs or NVMe drives used for Ceph OSD journals. Additionally,
an index should be configured with bucket sharding.

Placement Pools: The placement pools for each placement target include the bucket index, the data bucket, and the bucket
extras. These pools can fall under separate CRUSH hierarchies. Since the Ceph Object Gateway can support multiple storage
policies, the bucket pools of the storage policies may be associated with different CRUSH hierarchies, reflecting different use
cases, such as IOPS-optimized, throughput-optimized, and capacity-optimized. The bucket index pool SHOULD use its own
CRUSH hierarchy to map the bucket index pool to higher performance storage media, such as SSD or NVMe drives.

Creating CRUSH roots


Edit online
From the command line on the administration node, create CRUSH roots in the CRUSH map for each CRUSH hierarchy. There MUST
be at least one CRUSH hierarchy for service pools that may also potentially serve data storage pools. There SHOULD be at least one
CRUSH hierarchy for the bucket index pool, mapped to high performance storage media, such as SSDs or NVMe drives.

In the following examples, the hosts named data0, data1, and data2 use extended logical names, such as data0-sas-ssd,
data0-index, and so forth in the CRUSH map, because there are multiple CRUSH hierarchies pointing to the same physical hosts.

A typical CRUSH root might represent nodes with SAS drives and SSDs for journals. For example:

##
# SAS-SSD ROOT DECLARATION
##

root sas-ssd {
    id -1    # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0   # rjenkins1
    item data2-sas-ssd weight 4.000
    item data1-sas-ssd weight 4.000
    item data0-sas-ssd weight 4.000
}

A CRUSH root for bucket indexes SHOULD represent high performance media, such as SSD or NVMe drives. Consider creating
partitions on SSD or NVMe media that store OSD journals. For example:

##
# INDEX ROOT DECLARATION
##

root index {
    id -2    # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0   # rjenkins1
    item data2-index weight 1.000
    item data1-index weight 1.000
    item data0-index weight 1.000
}

Creating CRUSH rules



Edit online
Like the default CRUSH hierarchy, the CRUSH map also contains a default CRUSH rule.

NOTE: The default rbd pool may use this rule. DO NOT delete the default rule if other pools have used it to store customer data.

For general details on CRUSH rules, see the CRUSH rules section. To manually edit a CRUSH map, see the Editing a CRUSH map.

For each CRUSH hierarchy, create a CRUSH rule. The following example illustrates a rule for the CRUSH hierarchy that will store the
service pools, including .rgw.root. In this example, the root sas-ssd serves as the main CRUSH hierarchy. It uses the name rgw-
service to distinguish itself from the default rule. The step take sas-ssd line tells the pool to use the sas-ssd root created in
Creating CRUSH roots, whose child buckets contain OSDs with SAS drives and high performance storage media, such as SSD or
NVMe drives, for journals in a high throughput hardware configuration. The type rack portion of step chooseleaf is the failure
domain. In the following example, it is a rack.

##
# SERVICE RULE DECLARATION
##

rule rgw-service {
    type replicated
    min_size 1
    max_size 10
    step take sas-ssd
    step chooseleaf firstn 0 type rack
    step emit
}

NOTE: In the foregoing example, if data gets replicated three times, there should be at least three racks in the cluster containing a
similar number of OSD nodes.

TIP: The type replicated setting has NOTHING to do with data durability, the number of replicas, or the erasure coding. Only
replicated is supported.

The following example illustrates a rule for the CRUSH hierarchy that will store the data pool. In this example, the root sas-ssd
serves as the main CRUSH hierarchy—​the same CRUSH hierarchy as the service rule. It uses rgw-throughput to distinguish itself
from the default rule and rgw-service. The step take sas-ssd line tells the pool to use the sas-ssd root created in Creating
CRUSH roots, whose child buckets contain OSDs with SAS drives and high performance storage media, such as SSD or NVMe drives,
in a high throughput hardware configuration. The type host portion of step chooseleaf is the failure domain. In the following
example, it is a host. Notice that the rule uses the same CRUSH hierarchy, but a different failure domain.

##
# THROUGHPUT RULE DECLARATION
##

rule rgw-throughput {
type replicated
min_size 1
max_size 10
step take sas-ssd
step chooseleaf firstn 0 type host
step emit
}

NOTE: In the foregoing example, if the pool uses erasure coding with a larger number of data and encoding chunks than the default,
there should be at least as many racks in the cluster containing a similar number of OSD nodes to facilitate the erasure coding
chunks. For smaller clusters, this may not be practical, so the foregoing example uses host as the CRUSH failure domain.

The following example illustrates a rule for the CRUSH hierarchy that stores the index pool. In this example, the root index serves as
the main CRUSH hierarchy. It uses rgw-index to distinguish itself from rgw-service and rgw-throughput. The step take index line tells
the pool to use the index root created in Creating CRUSH roots, whose child buckets contain high performance storage media, such
as SSD or NVMe drives, or partitions on SSD or NVMe drives that also store OSD journals. The type rack portion of step chooseleaf is
the failure domain. In the following example, it is a rack.

##
# INDEX RULE DECLARATION
##

rule rgw-index {
type replicated
min_size 1
max_size 10
step take index
step chooseleaf firstn 0 type rack
step emit
}

Reference
Edit online

CRUSH Administration

Ceph Object Gateway multi-site considerations


Edit online
A Ceph Object Gateway multi-site configuration requires at least two IBM Storage Ceph clusters, and at least two Ceph Object
Gateway instances, one for each IBM Storage Ceph cluster. Typically, the two IBM Storage Ceph clusters will be in geographically
separate locations; however, this same multi-site configuration can work on two IBM Storage Ceph clusters located at the same
physical site.

Multi-site configurations require a primary zone group and a primary zone. Additionally, each zone group requires a primary zone.
Zone groups might have one or more secondary zones.

IMPORTANT: The primary zone within the primary zone group of a realm is responsible for storing the primary copy of the realm’s
metadata, including users, quotas, and buckets. This metadata gets synchronized to secondary zones and secondary zone groups
automatically. Metadata operations issued with the radosgw-admin command line interface (CLI) MUST be issued on a node within
the primary zone of the primary zone group to ensure that they synchronize to the secondary zone groups and zones. Currently, it is
possible to issue metadata operations on secondary zones and zone groups, but it is NOT recommended because they WILL NOT be
synchronized, which can lead to fragmentation of the metadata.

The diagrams below illustrate the possible one, and two realm configurations in multi-site Ceph Object Gateway environments.

Figure 1. One realm

Figure 2. Two realms

Figure 3. Two realms variant

Considering storage sizing


Edit online
One of the most important factors in designing a cluster is to determine the storage requirements (sizing). Ceph Storage is designed
to scale into petabytes and beyond.

The following examples are common sizes for Ceph storage clusters.

Small: 250 terabytes

Medium: 1 petabyte

Large: 2 petabytes or more

Sizing includes current needs and near future needs. Consider the rate at which the gateway client will add new data to the cluster.
That can differ from use-case to use-case. For example, recording 4k videos or storing medical images can add significant amounts
of data faster than less storage-intensive information, such as financial market data. Additionally, consider that the data durability
methods, such as replication versus erasure coding, can have a significant impact on the storage media required.
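
For example, using purely hypothetical figures, a gateway client that adds 10 TB of new data per month to a pool with 3-way replication consumes roughly 10 TB x 3 = 30 TB of raw capacity per month, or about 360 TB of raw capacity per year, before accounting for the near_full_ratio headroom. The same ingest rate on an erasure-coded pool with k=8, m=3 consumes roughly 10 TB x 11/8 = 13.75 TB of raw capacity per month.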

Reference
Edit online

For additional information on sizing, see the Hardware section and its associated links for selecting OSD hardware.

Considering storage density


Edit online
Another important aspect of Ceph’s design is storage density. Generally, a storage cluster stores data across at least 10
nodes to ensure reasonable performance when replicating, backfilling, and recovery. If a node fails, with at least 10 nodes in the
storage cluster, only 10% of the data has to move to the surviving nodes. If the number of nodes is substantially less, a higher
percentage of the data must move to the surviving nodes. Additionally, the full_ratio and near_full_ratio options need to be
set to accommodate a node failure to ensure that the storage cluster can write data. For this reason, it is important to consider
storage density. Higher storage density is not necessarily a good idea.
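
One way to adjust these ratios at runtime is shown below; the values are illustrative only, not recommendations, and should be chosen to match your failure-domain planning:

[ceph: root@host01 /]# ceph osd set-nearfull-ratio 0.75
[ceph: root@host01 /]# ceph osd set-full-ratio 0.85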

Another factor that favors more nodes over higher storage density is erasure coding. When writing an object using erasure coding
and using node as the minimum CRUSH failure domain, the Ceph storage cluster will need as many nodes as data and coding
chunks. For example, a cluster using k=8, m=3 should have at least 11 nodes so that each data or coding chunk is stored on a
separate node.
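
For illustration, an erasure code profile matching that example could be created as follows; the profile name ec-8-3 is a placeholder and the failure domain is set to host as described above:

[ceph: root@host01 /]# ceph osd erasure-code-profile set ec-8-3 k=8 m=3 crush-failure-domain=host
[ceph: root@host01 /]# ceph osd erasure-code-profile get ec-8-3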

Hot-swapping is also an important consideration. Most modern servers support drive hot-swapping. However, some hardware
configurations require removing more than one drive to replace a drive. IBM recommends avoiding such configurations, because they
can bring down more Ceph OSDs than required when swapping out failed disks.

Considering disks for the Ceph Monitor nodes


Edit online
Ceph Monitors use rocksdb, which is sensitive to synchronous write latency. IBM strongly recommends using SSD disks to store the
Ceph Monitor data. Choose SSD disks that have sufficient sequential write and throughput characteristics.

Adjusting backfill and recovery settings


Edit online
I/O is negatively impacted by both backfilling and recovery operations, leading to poor performance and unhappy end users. To help
accommodate I/O demand during a cluster expansion or recovery, set the following options and values in the Ceph Configuration file:

[osd]
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
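
If you manage settings through the centralized configuration database, as is done elsewhere in this document, the same values can also be applied with ceph config set; a sketch:

[ceph: root@host01 /]# ceph config set osd osd_max_backfills 1
[ceph: root@host01 /]# ceph config set osd osd_recovery_max_active 1
[ceph: root@host01 /]# ceph config set osd osd_recovery_op_priority 1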

Adjusting the cluster map size


Edit online

By default, the ceph-osd daemon caches 500 previous osdmaps. Even with deduplication, the map might consume a lot of memory
per daemon. Tuning the cache size in the Ceph configuration might help reduce memory consumption significantly. For example:

[ceph: root@host01 /]# ceph config set global osd_map_message_max 10


[ceph: root@host01 /]# ceph config set osd osd_map_cache_size 20
[ceph: root@host01 /]# ceph config set osd osd_map_share_max_epochs 10
[ceph: root@host01 /]# ceph config set osd osd_pg_epoch_persisted_max_stale 10

The ceph-manager daemon handles PG queries, so the cluster map should not impact performance.

Adjusting scrubbing
Edit online
By default, Ceph performs light scrubbing daily and deep scrubbing weekly. Light scrubbing checks object sizes and checksums to
ensure that PGs are storing the same object data. Over time, disk sectors can go bad irrespective of object sizes and checksums.
Deep scrubbing checks an object’s content with that of its replicas to ensure that the actual contents are the same. In this respect,
deep scrubbing ensures data integrity in the manner of fsck, but the procedure imposes an I/O penalty on the cluster. Even light
scrubbing can impact I/O.

The default settings may allow Ceph OSDs to initiate scrubbing at inopportune times, such as peak operating times or periods with
heavy loads. End users may experience latency and poor performance when scrubbing operations conflict with end user operations.

To prevent end users from experiencing poor performance, Ceph provides a number of scrubbing settings that can limit scrubbing to
periods with lower loads or during off-peak hours. See Scrubbing the OSD for more details.

If the cluster experiences high loads during the day and low loads late at night, consider restricting scrubbing to night time hours. For
example:

[osd]
osd_scrub_begin_hour = 23 # 23:00, or 11:00 PM
osd_scrub_end_hour = 6 # 06:00, or 6:00 AM

If time constraints aren’t an effective method of determining a scrubbing schedule, consider using the
osd_scrub_load_threshold. The default value is 0.5, but it could be modified for low load conditions. For example:

[osd]
osd_scrub_load_threshold = 0.25
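
These scrubbing options can likewise be set through the centralized configuration database rather than the Ceph configuration file; a sketch, reusing the illustrative values from above:

[ceph: root@host01 /]# ceph config set osd osd_scrub_begin_hour 23
[ceph: root@host01 /]# ceph config set osd osd_scrub_end_hour 6
[ceph: root@host01 /]# ceph config set osd osd_scrub_load_threshold 0.25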

Increase rgw_thread_pool_size

Edit online
To improve scalability, you can edit the value of the rgw_thread_pool_size parameter, which is the size of the thread pool. The
Beast front end is not restricted by the thread pool size when accepting new connections.

rgw_thread_pool_size = 512

Increase objecter_inflight_ops

Edit online
To improve scalability, you can edit the value of the objecter_inflight_ops parameter, which specifies the maximum number of
unsent I/O requests allowed. This parameter is used for client traffic control.

objecter_inflight_ops = 24576
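
Both values can also be applied through the configuration database; a sketch, using the values shown above:

[ceph: root@host01 /]# ceph config set client.rgw rgw_thread_pool_size 512
[ceph: root@host01 /]# ceph config set client.rgw objecter_inflight_ops 24576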

Tuning considerations for the Linux kernel when running Ceph

Edit online
Production IBM Storage Ceph clusters generally benefit from tuning the operating system, specifically around limits and memory
allocation. Ensure that adjustments are set for all hosts within the storage cluster. You can also open a case with IBM Support asking
for additional guidance.

Increase the File Descriptors

The Ceph Object Gateway can hang if it runs out of file descriptors. You can modify the /etc/security/limits.conf file on Ceph
Object Gateway hosts to increase the file descriptors for the Ceph Object Gateway.

ceph soft nofile unlimited
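
To confirm that the new limit applies to a running gateway, one option is to inspect the process limits; the pgrep pattern below assumes a radosgw process is visible on the host:

[root@host01 ~]# grep 'Max open files' /proc/$(pgrep -f radosgw | head -n 1)/limits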

Adjusting the ulimit value for Large Storage Clusters

When running Ceph administrative commands on large storage clusters, for example, with 1024 Ceph OSDs or more, create an
/etc/security/limits.d/50-ceph.conf file on each host that runs administrative commands with the following contents:

USER_NAME soft nproc unlimited

Replace USER_NAME with the name of the non-root user account that runs the Ceph administrative commands.

NOTE: The root user’s ulimit value is already set to unlimited by default on Red Hat Enterprise Linux.

Deployment
Edit online
As a storage administrator, you can deploy the Ceph Object Gateway using the Ceph Orchestrator with the command line interface or
the service specification. You can also configure multi-site Ceph Object Gateways, and remove the Ceph Object Gateway using the
Ceph Orchestrator.

The cephadm command deploys the Ceph Object Gateway as a collection of daemons that manages a single-cluster deployment or a
particular realm and zone in a multi-site deployment.

NOTE: With cephadm, the Ceph Object Gateway daemons are configured using the Ceph Monitor configuration database instead of
the ceph.conf file or the command line options. If the configuration is not in the client.rgw section, then the Ceph Object
Gateway daemons start up with default settings and bind to port 80.

WARNING: If you want Cephadm to handle the setting of a realm and zone, specify the realm and zone in the service specification
during the deployment of the Ceph Object Gateway. If you want to change that realm or zone at a later point, ensure to update and
reapply the rgw_realm and rgw_zone parameters in the specification file. If you want to handle these options manually without
Cephadm, do not include them in the service specification. Cephadm still deploys the Ceph Object Gateway daemons without setting
the configuration option for which realm or zone the daemons should use. In this case, the update of the specification file is not
necessary.

Deploying the Ceph Object Gateway using the command line interface
Deploying the Ceph Object Gateway using the service specification
Deploying a multi-site Ceph Object Gateway using the Ceph Orchestrator
Removing the Ceph Object Gateway using the Ceph Orchestrator

Prerequisites
Edit online

A running, and healthy IBM Storage Ceph cluster.

Root-level access to all the nodes.

Available nodes on the storage cluster.

All the managers, monitors, and OSDs are deployed in the storage cluster.

Deploying the Ceph Object Gateway using the command line
interface
Edit online
Using the Ceph Orchestrator, you can deploy the Ceph Object Gateway with the ceph orch command in the command line
interface.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

All manager, monitor and OSD daemons are deployed.

Log in to the Cephadm shell by using the cephadm shell to deploy Ceph Object Gateway daemons.

Procedure
Edit online
You can deploy the Ceph Object Gateway daemons in three different ways:

Method 1:

Create a realm, zone group, and zone, and then use the placement specification with the host name:

1. Create a realm:

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME --default

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=test_realm --default

2. Create a zone group:

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME --master --default

Example

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=default --master --default

3. Create a zone:

Syntax

radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME --master --default

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=test_zone --master --default

4. Commit the changes:

Syntax

radosgw-admin period update --rgw-realm=REALM_NAME --commit

Example

[ceph: root@host01 /]# radosgw-admin period update --rgw-realm=test_realm --commit

5. Run the ceph orch apply command:

Syntax

ceph orch apply rgw NAME [--realm=REALM_NAME] [--zone=ZONE_NAME] --placement="NUMBER_OF_DAEMONS [HOST_NAME_1 HOST_NAME_2]"

Example

[ceph: root@host01 /]# ceph orch apply rgw test --realm=test_realm --zone=test_zone --placement="2 host01 host02"

Method 2:

Use an arbitrary service name to deploy two Ceph Object Gateway daemons for a single cluster deployment:

Syntax

ceph orch apply rgw SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch apply rgw foo

Method 3:

Use an arbitrary service name on a labeled set of hosts:

Syntax

ceph orch host label add HOST_NAME_1 LABEL_NAME
ceph orch host label add HOST_NAME_2 LABEL_NAME
ceph orch apply rgw SERVICE_NAME --placement="label:LABEL_NAME count-per-host:NUMBER_OF_DAEMONS" --port=8000

NUMBER_OF_DAEMONS controls the number of Ceph object gateways deployed on each host. To achieve the highest
performance without incurring an additional cost, set this value to 2.

Example

[ceph: root@host01 /]# ceph orch host label add host01 rgw # the 'rgw' label can be anything
[ceph: root@host01 /]# ceph orch host label add host02 rgw
[ceph: root@host01 /]# ceph orch apply rgw foo --placement="label:rgw count-per-host:2" --port=8000

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=rgw

Deploying the Ceph Object Gateway using the service specification
Edit online
You can deploy the Ceph Object Gateway using the service specification with either the default or the custom realms, zones, and
zone groups.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the bootstrapped host.

Hosts are added to the cluster.

All manager, monitor, and OSD daemons are deployed.

Procedure
Edit online

1. As a root user, create a specification file:

Example

[root@host01 ~]# touch radosgw.yml

2. Edit the radosgw.yml file to include the following details for the default realm, zone, and zone group:

Syntax

service_type: rgw
service_id: REALM_NAME.ZONE_NAME
placement:
hosts:
- HOST_NAME_1
- HOST_NAME_2
count_per_host: NUMBER_OF_DAEMONS
spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME
rgw_zonegroup: ZONE_GROUP_NAME
rgw_frontend_port: FRONT_END_PORT
networks:
- NETWORK_CIDR # Ceph Object Gateway service binds to a specific network

NUMBER_OF_DAEMONS controls the number of Ceph Object Gateways deployed on each host. To achieve the highest
performance without incurring an additional cost, set this value to 2.

Example

service_type: rgw
service_id: default
placement:
hosts:
- host01
- host02
- host03
count_per_host: 2
spec:
rgw_realm: default
rgw_zone: default
rgw_zonegroup: default
rgw_frontend_port: 1234
networks:
- 192.169.142.0/24

3. Optional: For custom realm, zone, and zone group, create the resources and then create the radosgw.yml file:

a. Create the custom realm, zone, and zone group:

Example

[root@host01 ~]# radosgw-admin realm create --rgw-realm=test_realm --default
[root@host01 ~]# radosgw-admin zonegroup create --rgw-zonegroup=test_zonegroup --default
[root@host01 ~]# radosgw-admin zone create --rgw-zonegroup=test_zonegroup --rgw-zone=test_zone --default
[root@host01 ~]# radosgw-admin period update --rgw-realm=test_realm --commit

b. Create the radosgw.yml file with the following details:

Example

service_type: rgw
service_id: test_realm.test_zone
placement:
hosts:
- host01
- host02
- host03
count_per_host: 2
spec:
rgw_realm: test_realm
rgw_zone: test_zone
rgw_zonegroup: test_zonegroup
rgw_frontend_port: 1234
networks:
- 192.169.142.0/24

4. Mount the radosgw.yml file under a directory in the container:

Example

[root@host01 ~]# cephadm shell --mount radosgw.yml:/var/lib/ceph/radosgw/radosgw.yml

NOTE: Every time you exit the shell, you have to mount the file in the container before deploying the daemon.

5. Deploy the Ceph Object Gateway using the service specification:

Syntax

ceph orch apply -i FILE_NAME.yml

Example

[ceph: root@host01 /]# ceph orch apply -i radosgw.yml

Verification
Edit online

List the service:

Example

[ceph: root@host01 /]# ceph orch ls

List the hosts, daemons, and processes:

Syntax

ceph orch ps --daemon_type=DAEMON_NAME

Example

[ceph: root@host01 /]# ceph orch ps --daemon_type=rgw

Deploying a multi-site Ceph Object Gateway using the Ceph
Orchestrator
Edit online
Ceph Orchestrator supports multi-site configuration options for the Ceph Object Gateway.

You can configure each object gateway to work in an active-active zone configuration allowing writes to a non-primary zone. The
multi-site configuration is stored within a container called a realm.

The realm stores zone groups, zones, and a time period. The rgw daemons handle the synchronization eliminating the need for a
separate synchronization agent, thereby operating with an active-active configuration.

You can also deploy multi-site zones using the command line interface (CLI).

NOTE: The following configuration assumes at least two IBM Storage Ceph clusters are in geographically separate
locations. However, the configuration also works on the same site.

Prerequisites
Edit online

At least two running IBM Storage Ceph clusters.

At least two Ceph Object Gateway instances, one for each IBM Storage Ceph cluster.

Root-level access to all the nodes.

Nodes or containers are added to the storage cluster.

All Ceph Manager, Monitor and OSD daemons are deployed.

Procedure
Edit online

1. In the cephadm shell, configure the primary zone:

a. Create a realm:

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME --default

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=test_realm --default

If the storage cluster has a single realm, then specify the --default flag.

b. Create a primary zone group:

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME --endpoints=http://RGW_PRIMARY_HOSTNAME:RGW_PRIMARY_PORT_NUMBER_1 --master --default

Example

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:80 --master --default

c. Create a primary zone:

Syntax

radosgw-admin zone create --rgw-zonegroup=PRIMARY_ZONE_GROUP_NAME --rgw-zone=PRIMARY_ZONE_NAME --endpoints=http://RGW_PRIMARY_HOSTNAME:RGW_PRIMARY_PORT_NUMBER_1 --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-1 --endpoints=http://rgw1:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

d. Optional: Delete the default zone, zone group, and the associated pools.

IMPORTANT: Do not delete the default zone and its pools if you are using the default zone and zone group to store
data. Also, removing the default zone group deletes the system user.

To access old data in the default zone and zonegroup, use --rgw-zone default and --rgw-zonegroup
default in radosgw-admin commands.

Example

[ceph: root@host01 /]# radosgw-admin zonegroup delete --rgw-zonegroup=default
[ceph: root@host01 /]# ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-really-really-mean-it
[ceph: root@host01 /]# ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-mean-it

e. Create a system user:

Syntax

radosgw-admin user create --uid=USER_NAME --display-name="USER_NAME" --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY --system

Example

[ceph: root@host01 /]# radosgw-admin user create --uid=zone.user --display-name="Zone user" --system

Make a note of the access_key and secret_key.

f. Add the access key and system key to the primary zone:

Syntax

radosgw-admin zone modify --rgw-zone=PRIMARY_ZONE_NAME --access-key=ACCESS_KEY --secret=SECRET_KEY

Example

[ceph: root@host01 /]# radosgw-admin zone modify --rgw-zone=us-east-1 --access-key=NE48APYCAODEPLKBCZVQ --secret=u24GHQWRE3yxxNBnFBzjM4jn14mFIckQ4EKL6LoW

g. Commit the changes:

Syntax

radosgw-admin period update --commit

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

h. Outside the cephadm shell, fetch the FSID of the storage cluster and the processes:

Example

[root@host01 ~]# systemctl list-units | grep ceph

i. Start the Ceph Object Gateway daemon:

Syntax

systemctl start ceph-FSID@DAEMON_NAME
systemctl enable ceph-FSID@DAEMON_NAME

Example

[root@host01 ~]# systemctl start [email protected]_realm.us-east-1.host01.ahdtsw.service
[root@host01 ~]# systemctl enable [email protected]_realm.us-east-1.host01.ahdtsw.service

2. In the Cephadm shell, configure the secondary zone.

a. Pull the primary realm configuration from the host:

Syntax

radosgw-admin realm pull --rgw-realm=PRIMARY_REALM --url=URL_TO_PRIMARY_ZONE_GATEWAY --access-key=ACCESS_KEY --secret-key=SECRET_KEY --default

Example

[ceph: root@host04 /]# radosgw-admin realm pull --rgw-realm=test_realm --url=http://10.74.249.26:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ --default

b. Pull the primary period configuration from the host:

Syntax

radosgw-admin period pull --url=URL_TO_PRIMARY_ZONE_GATEWAY --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host04 /]# radosgw-admin period pull --url=http://10.74.249.26:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

c. Configure a secondary zone:

Syntax

radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=SECONDARY_ZONE_NAME --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY --endpoints=http://FQDN:80 [--read-only]

Example

[ceph: root@host04 /]# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-2 --endpoints=http://rgw2:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

d. Optional: Delete the default zone.

IMPORTANT: Do not delete the default zone and its pools if you are using the default zone and zone group to store
data. To access old data in the default zone and zonegroup, use --rgw-zone default and --rgw-zonegroup
default in radosgw-admin commands.

Example

[ceph: root@host04 /]# radosgw-admin zone rm --rgw-zone=default
[ceph: root@host04 /]# ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-mean-it

e. Update the Ceph configuration database:

Syntax

ceph config set SERVICE_NAME rgw_zone SECONDARY_ZONE_NAME

Example

[ceph: root@host04 /]# ceph config set rgw rgw_zone us-east-2

f. Commit the changes:

Syntax

radosgw-admin period update --commit

Example

[ceph: root@host04 /]# radosgw-admin period update --commit

g. Outside the Cephadm shell, fetch the FSID of the storage cluster and the processes:

Example

[root@host04 ~]# systemctl list-units | grep ceph

h. Start the Ceph Object Gateway daemon:

Syntax

systemctl start ceph-FSID@DAEMON_NAME
systemctl enable ceph-FSID@DAEMON_NAME

Example

[root@host04 ~]# systemctl start [email protected]_realm.us-east-2.host04.ahdtsw.service
[root@host04 ~]# systemctl enable [email protected]_realm.us-east-2.host04.ahdtsw.service

3. Optional: Deploy multi-site Ceph Object Gateways using the placement specification:

Syntax

ceph orch apply rgw NAME --realm=REALM_NAME --zone=PRIMARY_ZONE_NAME --placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2"

Example

[ceph: root@host04 /]# ceph orch apply rgw east --realm=test_realm --zone=us-east-1 --placement="2 host01 host02"

Verification
Edit online

Check the synchronization status to verify the deployment:

Example

[ceph: root@host04 /]# radosgw-admin sync status

Removing the Ceph Object Gateway using the Ceph Orchestrator


Edit online
You can remove the Ceph object gateway daemons using the ceph orch rm command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all the nodes.

Hosts are added to the cluster.

At least one Ceph object gateway daemon deployed on the hosts.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List the service:

Example

[ceph: root@host01 /]# ceph orch ls

3. Remove the service:

Syntax

ceph orch rm SERVICE_NAME

Example

[ceph: root@host01 /]# ceph orch rm rgw.test_realm.test_zone_bb

Verification
Edit online

List the hosts, daemons, and processes:

Syntax

ceph orch ps

Example

[ceph: root@host01 /]# ceph orch ps

Reference
Edit online

Deploying the Ceph object gateway using the command line interface

Deploying the Ceph object gateway using the service specification

Basic configuration
Edit online
As a storage administrator, learning the basics of configuring the Ceph Object Gateway is important. You can learn about the defaults
and the embedded web server called Beast. For troubleshooting issues with the Ceph Object Gateway, you can adjust the logging
and debugging output generated by the Ceph Object Gateway. Also, you can provide a High-Availability proxy for storage cluster
access using the Ceph Object Gateway.

Add a wildcard to the DNS
The Beast front-end web server
Configuring SSL for Beast
Adjusting logging and debugging output
Static web hosting
High availability for the Ceph Object Gateway

Add a wildcard to the DNS


Edit online
You can add a wildcard, such as *.hostname, to the DNS record of the DNS server.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ceph Object Gateway installed.

Root-level access to the admin node.

Procedure
Edit online

1. To use Ceph with S3-style subdomains, add a wildcard to the DNS record of the DNS server that the ceph-radosgw daemon
uses to resolve domain names:

Syntax

bucket-name.domain-name.com

For dnsmasq, add the following address setting with a dot (.) prepended to the host name:

Syntax

address=/.HOSTNAME_OR_FQDN/HOST_IP_ADDRESS

Example

address=/.gateway-host01/192.168.122.75

For bind, add a wildcard to the DNS record:

Example

$TTL 604800
@ IN SOA gateway-host01. root.gateway-host01. (
2 ; Serial
604800 ; Refresh
86400 ; Retry
2419200 ; Expire
604800 ) ; Negative Cache TTL
;
@ IN NS gateway-host01.
@ IN A 192.168.122.113
* IN CNAME @

2. Restart the DNS server and ping the server with a subdomain to ensure that the ceph-radosgw daemon can process the
subdomain requests:

Syntax

ping mybucket.HOSTNAME

Example

[root@host01 ~]# ping mybucket.gateway-host01

3. If the DNS server is on the local machine, you might need to modify /etc/resolv.conf by adding a nameserver entry for
the local machine.

4. Add the host name in the Ceph Object Gateway zone group:

a. Get the zone group:

Syntax

radosgw-admin zonegroup get --rgw-zonegroup=ZONEGROUP_NAME > zonegroup.json

Example

[ceph: root@host01 /]# radosgw-admin zonegroup get --rgw-zonegroup=us > zonegroup.json

b. Take a back-up of the JSON file:

Example

[ceph: root@host01 /]# cp zonegroup.json zonegroup.backup.json

c. View the zonegroup.json file:

Example

[ceph: root@host01 /]# cat zonegroup.json


{
"id": "d523b624-2fa5-4412-92d5-a739245f0451",
"name": "asia",
"api_name": "asia",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "d2a3b90f-f4f3-4d38-ac1f-6463a2b93c32",
"zones": [
{
"id": "d2a3b90f-f4f3-4d38-ac1f-6463a2b93c32",
"name": "india",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "d7e2ad25-1630-4aee-9627-84f24e13017f",
"sync_policy": {
"groups": []
}
}

d. Update the zonegroup.json file with new host name:

Example

"hostnames": ["host01", "host02","host03"],

e. Set the zone group back in the Ceph Object Gateway:

Syntax

radosgw-admin zonegroup set --rgw-zonegroup=ZONEGROUP_NAME --infile=zonegroup.json

Example

[ceph: root@host01 /]# radosgw-admin zonegroup set --rgw-zonegroup=us --infile=zonegroup.json

f. Update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

g. Restart the Ceph Object Gateway so that the DNS setting takes effect.

Reference
Edit online

The Ceph configuration database

The Beast front-end web server


Edit online
The Ceph Object Gateway provides Beast, a C/C++ embedded front-end web server. Beast uses the Boost.Beast C++ library to parse HTTP, and Boost.Asio for asynchronous network I/O.

Boost C++ Libraries

Configuring SSL for Beast


Edit online
You can configure the Beast front-end web server to use the OpenSSL library to provide Transport Layer Security (TLS). To use
Secure Socket Layer (SSL) with Beast, you need to obtain a certificate from a Certificate Authority (CA) that matches the hostname of
the Ceph Object Gateway node. Beast also requires the secret key, server certificate, and any other CA in a single .pem file.
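
For example, assuming hypothetical file names for the key, server certificate, and CA chain, the combined file can be created as follows before pasting its contents into the service specification:

[root@host01 ~]# cat rgw.key rgw.crt ca-chain.crt > rgw.pem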

IMPORTANT: Prevent unauthorized access to the .pem file, because it contains the secret key hash.

IMPORTANT: IBM recommends obtaining a certificate from a CA with the Subject Alternative Name (SAN) field, and a wildcard for
use with S3-style subdomains.

IMPORTANT: IBM recommends only using SSL with the Beast front-end web server for small to medium sized test environments. For
production environments, you must use HAProxy and keepalived to terminate the SSL connection at the HAProxy.

If the Ceph Object Gateway acts as a client and a custom certificate is used on the server, set the rgw_verify_ssl parameter to
false because injecting a custom CA to Ceph Object Gateways is currently unavailable.

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_verify_ssl false

Prerequisites
Edit online

A running, and healthy IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software package.

Installation of the OpenSSL software package.

Root-level access to the Ceph Object Gateway node.

Procedure
Edit online

1. Create a new file named rgw.yml in the current directory:

Example

[ceph: root@host01 /]# touch rgw.yml

2. Open the rgw.yml file for editing, and customize it for the environment:

Syntax

service_type: rgw
service_id: SERVICE_ID
service_name: SERVICE_NAME
placement:
hosts:
- HOST_NAME
spec:
ssl: true
rgw_frontend_ssl_certificate: CERT_HASH

Example

service_type: rgw
service_id: foo
service_name: rgw.foo
placement:
hosts:
- host01
spec:
ssl: true
rgw_frontend_ssl_certificate: |
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA+Cf4l9OagD6x67HhdCy4Asqw89Zz9ZuGbH50/7ltIMQpJJU0
gu9ObNtIoC0zabJ7n1jujueYgIpOqGnhRSvsGJiEkgN81NLQ9rqAVaGpadjrNLcM
bpgqJCZj0vzzmtFBCtenpb5l/EccMFcAydGtGeLP33SaWiZ4Rne56GBInk6SATI/
JSKweGD1y5GiAWipBR4C74HiAW9q6hCOuSdp/2WQxWT3T1j2sjlqxkHdtInUtwOm
j5Ism276IndeQ9hR3reFR8PJnKIPx73oTBQ7p9CMR1J4ucq9Ny0J12wQYT00fmJp
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIIEBTCCAu2gAwIBAgIUGfYFsj8HyA9Zv2l600hxzT8+gG4wDQYJKoZIhvcNAQEL
BQAwgYkxCzAJBgNVBAYTAklOMQwwCgYDVQQIDANLQVIxDDAKBgNVBAcMA0JMUjEM
MAoGA1UECgwDUkhUMQswCQYDVQQLDAJCVTEkMCIGA1UEAwwbY2VwaC1zc2wtcmhj
czUtOGRjeHY2LW5vZGU1MR0wGwYJKoZIhvcNAQkBFg5hYmNAcmVkaGF0LmNvbTAe
-----END CERTIFICATE-----

3. Deploy the Ceph Object Gateway using the service specification file:

Example

[ceph: root@host01 /]# ceph orch apply -i rgw.yml

Adjusting logging and debugging output


Edit online
Once you finish the setup procedure, check your logging output to ensure it meets your needs. By default, the Ceph daemons log to
journald, and you can view the logs using the journalctl command. Alternatively, you can also have the Ceph daemons log to
files, which are located under the /var/log/ceph/CEPH_CLUSTER_ID/ directory.

IMPORTANT: Verbose logging can generate over 1 GB of data per hour. This type of logging can potentially fill up the operating
system’s disk, causing the operating system to stop functioning.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Procedure
Edit online

1. Set the following parameter to increase the Ceph Object Gateway logging output:

Syntax

ceph config set client.rgw debug_rgw VALUE

Example

[ceph: root@host01 /]# ceph config set client.rgw debug_rgw 20

a. You can also modify these settings at runtime:

Syntax

ceph --admin-daemon /var/run/ceph/CEPH_CLUSTER_ID/ceph-client.rgw.NAME.asok config set debug_rgw VALUE

Example

[ceph: root@host01 /]# ceph --admin-daemon /var/run/ceph/62a081a6-88aa-11eb-a367-001a4a000672/ceph-client.rgw.rgw.asok config set debug_rgw 20

2. Optionally, you can configure the Ceph daemons to log their output to files. Set the log_to_file, and
mon_cluster_log_to_file options to true:

Example

[ceph: root@host01 /]# ceph config set global log_to_file true


[ceph: root@host01 /]# ceph config set global mon_cluster_log_to_file true

Reference
Edit online

Ceph debugging and logging configuration options

Static web hosting


Edit online
As a storage administrator, you can configure the Ceph Object Gateway to host static websites in S3 buckets. Traditional website
hosting involves configuring a web server for each website, which uses resources inefficiently when the content does not change
dynamically, for example, sites that do not use server-side services such as PHP, servlets, databases, or Node.js. Hosting static
sites in S3 buckets is substantially more economical than setting up virtual machines with web servers for each site.

Static web hosting assumptions


Static web hosting requirements
Static web hosting gateway setup
Static web hosting DNS configuration
Creating a static web hosting site

Static web hosting assumptions
Edit online
Static web hosting requires at least one running IBM Storage Ceph cluster, and at least two Ceph Object Gateway instances for the
static web sites.

IBM assumes that each zone will have multiple gateway instances using a load balancer, such as HAProxy and keepalived.

IMPORTANT: IBM DOES NOT support using a Ceph Object Gateway instance to deploy both standard S3/Swift APIs and static web
hosting simultaneously.

Reference
Edit online

High availability service

Static web hosting requirements


Edit online
Static web hosting functionality uses its own API, so configuring a gateway to use static web sites in S3 buckets requires the
following:

1. S3 static web hosting uses Ceph Object Gateway instances that are separate and distinct from instances used for standard
S3/Swift API use cases.

2. Gateway instances hosting S3 static web sites should have separate, non-overlapping domain names from the standard
S3/Swift API gateway instances.

3. Gateway instances hosting S3 static web sites should use separate public-facing IP addresses from the standard S3/Swift API
gateway instances.

4. Gateway instances hosting S3 static web sites load balance, and if necessary terminate SSL, using HAProxy/keepalived.

Static web hosting gateway setup


Edit online
To enable a Ceph Object Gateway for static web hosting, set the following options:

Syntax

ceph config set client.rgw OPTION VALUE

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_enable_static_website true
[ceph: root@host01 /]# ceph config set client.rgw rgw_enable_apis s3,s3website
[ceph: root@host01 /]# ceph config set client.rgw rgw_dns_name objects-zonegroup.example.com
[ceph: root@host01 /]# ceph config set client.rgw rgw_dns_s3website_name objects-website-zonegroup.example.com
[ceph: root@host01 /]# ceph config set client.rgw rgw_resolve_cname true

The rgw_enable_static_website setting MUST be true. The rgw_enable_apis setting MUST enable the s3website API.
The rgw_dns_name and rgw_dns_s3website_name settings must provide their fully qualified domains. If the site uses canonical
name extensions, then set the rgw_resolve_cname option to true.

IMPORTANT: The FQDNs of rgw_dns_name and rgw_dns_s3website_name MUST NOT overlap.

Static web hosting DNS configuration
Edit online
The following is an example of assumed DNS settings, where the first two lines specify the domains of the gateway instance using a
standard S3 interface and point to the IPv4 and IPv6 addresses. The third line provides a wildcard CNAME setting for S3 buckets
using canonical name extensions. The fourth and fifth lines specify the domains for the gateway instance using the S3 website
interface and point to their IPv4 and IPv6 addresses.

objects-zonegroup.domain.com. IN A 192.0.2.10
objects-zonegroup.domain.com. IN AAAA 2001:DB8::192:0:2:10
*.objects-zonegroup.domain.com. IN CNAME objects-zonegroup.domain.com.
objects-website-zonegroup.domain.com. IN A 192.0.2.20
objects-website-zonegroup.domain.com. IN AAAA 2001:DB8::192:0:2:20

NOTE: The IP addresses in the first two lines differ from the IP addresses in the fourth and fifth lines.

If using Ceph Object Gateway in a multi-site configuration, consider using a routing solution to route traffic to the gateway closest to
the client.

The Amazon Web Service (AWS) requires static web host buckets to match the host name. Ceph provides a few different ways to
configure the DNS, and HTTPS will work if the proxy has a matching certificate.

Hostname to a Bucket on a Subdomain

To use AWS-style S3 subdomains, use a wildcard in the DNS entry which can redirect requests to any bucket. A DNS entry might look
like the following:

*.objects-website-zonegroup.domain.com. IN CNAME objects-website-zonegroup.domain.com.

Access the bucket name, where the bucket name is bucket1, in the following manner:

http://bucket1.objects-website-zonegroup.domain.com

Hostname to Non-Matching Bucket

Ceph supports mapping domain names to buckets without including the bucket name in the request, which is unique to Ceph Object
Gateway. To use a domain name to access a bucket, map the domain name to the bucket name. A DNS entry might look like the
following:

www.example.com. IN CNAME bucket2.objects-website-zonegroup.domain.com.

Where the bucket name is bucket2.

Access the bucket in the following manner:

http://www.example.com

Hostname to Long Bucket with CNAME

AWS typically requires the bucket name to match the domain name. To configure the DNS for static web hosting using CNAME, the
DNS entry might look like the following:

www.example.com. IN CNAME www.example.com.objects-website-zonegroup.domain.com.

Access the bucket in the following manner:

http://www.example.com

Hostname to Long Bucket without CNAME

If the DNS name contains other non-CNAME records, such as SOA, NS, MX or TXT, the DNS record must map the domain name
directly to the IP address. For example:

www.example.com. IN A 192.0.2.20
www.example.com. IN AAAA 2001:DB8::192:0:2:20

Access the bucket in the following manner:

http://www.example.com

Creating a static web hosting site
Edit online
To create a static website, perform the following steps:

1. Create an S3 bucket. The bucket name MAY be the same as the website’s domain name. For example, mysite.com may
have a bucket name of mysite.com. This is required for AWS, but it is NOT required for Ceph.

See Static web hosting DNS configuration for details.

2. Upload the static website content to the bucket. Contents may include HTML, CSS, client-side JavaScript, images, audio/video
content, and other downloadable files. A website MUST have an index.html file and might have an error.html file.

3. Verify the website’s contents. At this point, only the creator of the bucket has access to the contents.

4. Set permissions on the files so that they are publicly readable.
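
As a rough sketch of steps 1 through 4 using the AWS CLI, assuming a hypothetical bucket named mysite.com, the standard S3 endpoint from the DNS example above, and an AWS CLI profile configured with the bucket owner's credentials:

[root@client ~]# aws --endpoint-url http://objects-zonegroup.domain.com s3 mb s3://mysite.com
[root@client ~]# aws --endpoint-url http://objects-zonegroup.domain.com s3 cp index.html s3://mysite.com/index.html --acl public-read
[root@client ~]# aws --endpoint-url http://objects-zonegroup.domain.com s3 cp error.html s3://mysite.com/error.html --acl public-read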

High availability for the Ceph Object Gateway


Edit online
As a storage administrator, you can assign many instances of the Ceph Object Gateway to a single zone. This allows you to scale out
as the load increases, that is, to run many instances within the same zone group and zone; however, you do not need a federated
architecture to use a highly available proxy. Since each Ceph Object Gateway daemon has its own IP address, you can use the ingress service to balance the
load across many Ceph Object Gateway daemons or nodes. The ingress service manages HAProxy and keepalived daemons for
the Ceph Object Gateway environment. You can also terminate HTTPS traffic at the HAProxy server, and use HTTP between the
HAProxy server and the Beast front-end web server instances for the Ceph Object Gateway.

High availability service


Configuring high availability for the Ceph Object Gateway
HAProxy/keepalived Prerequisites

Prerequisites
Edit online

At least two Ceph Object Gateway daemons running on different hosts.

Capacity for at least two instances of the ingress service running on different hosts

High availability service


Edit online
The ingress service provides a highly available endpoint for the Ceph Object Gateway. The ingress service can be deployed to
any number of hosts as needed. IBM recommends having at least two Red Hat Enterprise Linux 8 servers, each server configured
with the ingress service. You can run a high availability (HA) service with a minimum set of configuration options. The Ceph
orchestrator deploys the ingress service, which manages the haproxy and keepalived daemons, by providing load balancing
with a floating virtual IP address. The active haproxy distributes all Ceph Object Gateway requests to all the available Ceph Object
Gateway daemons.

A virtual IP address is automatically configured on one of the ingress hosts at a time, known as the primary host. The Ceph
orchestrator selects the first network interface based on existing IP addresses that are configured as part of the same subnet. In
cases where the virtual IP address does not belong to the same subnet, you can define a list of subnets for the Ceph orchestrator to
match with existing IP addresses. If the keepalived daemon and the active haproxy are not responding on the primary host, then
the virtual IP address moves to a backup host. This backup host becomes the new primary host.

WARNING: Currently, you cannot configure a virtual IP address on a network interface that does not have a configured IP address.

IMPORTANT: To use the secure socket layer (SSL), SSL must be terminated by the ingress service and not at the Ceph Object
Gateway.

Figure 1. High availability architecture

Reference
Edit online

Configuring high availability for the Ceph Object Gateway

Configuring high availability for the Ceph Object Gateway


Edit online
To configure high availability (HA) for the Ceph Object Gateway, you write a YAML configuration file, and the Ceph orchestrator does
the installation, configuration, and management of the ingress service. The ingress service uses the haproxy and keepalived
daemons to provide high availability for the Ceph Object Gateway.

Prerequisites
Edit online

A minimum of two hosts running Red Hat Enterprise Linux 8, or higher, for installing the ingress service on.

A healthy running IBM Storage Ceph cluster.

A minimum of two Ceph Object Gateway daemons running on different hosts.

Root-level access to the host running the ingress service.

If using a firewall, then open port 80 for HTTP and port 443 for HTTPS traffic.

Procedure

Edit online

1. Create a new ingress.yaml file:

Example

[root@host01 ~]# touch ingress.yaml

2. Open the ingress.yaml file for editing. Add the following options, and add values applicable to the environment:

Syntax

service_type: ingress
service_id: SERVICE_ID
placement:
hosts:
- HOST1
- HOST2
- HOST3
spec:
backend_service: SERVICE_ID
virtual_ip: IP_ADDRESS/CIDR
frontend_port: INTEGER
monitor_port: INTEGER
virtual_interface_networks:
- IP_ADDRESS/CIDR
ssl_cert: |

service_type - Must be set to ingress.

service_id - Must match the existing Ceph Object Gateway service name.

placement - Where to deploy the haproxy and keepalived containers.

virtual_ip - The virtual IP address where the ingress service is available.

frontend_port - The port to access the ingress service.

monitor_port - The port to access the haproxy load balancer status.

virtual_interface_networks - Optional list of available subnets.

ssl_cert - Optional SSL certificate and private key.

Example

service_type: ingress
service_id: rgw.foo
placement:
hosts:
- host01.example.com
- host02.example.com
- host03.example.com
spec:
backend_service: rgw.foo
virtual_ip: 192.168.1.2/24
frontend_port: 8080
monitor_port: 1967
virtual_interface_networks:
- 10.10.0.0/16
ssl_cert: |
-----BEGIN CERTIFICATE-----
MIIEpAIBAAKCAQEA+Cf4l9OagD6x67HhdCy4Asqw89Zz9ZuGbH50/7ltIMQpJJU0
gu9ObNtIoC0zabJ7n1jujueYgIpOqGnhRSvsGJiEkgN81NLQ9rqAVaGpadjrNLcM
bpgqJCZj0vzzmtFBCtenpb5l/EccMFcAydGtGeLP33SaWiZ4Rne56GBInk6SATI/
JSKweGD1y5GiAWipBR4C74HiAW9q6hCOuSdp/2WQxWT3T1j2sjlqxkHdtInUtwOm
j5Ism276IndeQ9hR3reFR8PJnKIPx73oTBQ7p9CMR1J4ucq9Ny0J12wQYT00fmJp
-----END CERTIFICATE-----
-----BEGIN PRIVATE KEY-----
MIIEBTCCAu2gAwIBAgIUGfYFsj8HyA9Zv2l600hxzT8+gG4wDQYJKoZIhvcNAQEL
BQAwgYkxCzAJBgNVBAYTAklOMQwwCgYDVQQIDANLQVIxDDAKBgNVBAcMA0JMUjEM
MAoGA1UECgwDUkhUMQswCQYDVQQLDAJCVTEkMCIGA1UEAwwbY2VwaC1zc2wtcmhj
czUtOGRjeHY2LW5vZGU1MR0wGwYJKoZIhvcNAQkBFg5hYmNAcmVkaGF0LmNvbTAe
-----END PRIVATE KEY-----

3. Launch the Cephadm shell:

Example

[root@host01 ~]# cephadm shell --mount ingress.yaml:/var/lib/ceph/radosgw/ingress.yaml

4. For non-default version and specific hot-fix scenario, configure the latest haproxy and keepalived images:

NOTE: The image names are set as default in cephadm and mgr/cephadm configuration.

Syntax

ceph config set mgr mgr/cephadm/container_image_haproxy HAPROXY_IMAGE_ID
ceph config set mgr mgr/cephadm/container_image_keepalived KEEPALIVED_IMAGE_ID

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/container_image_haproxy cp.icr.io/cp/ibm-ceph/haproxy-rhel8:latest
[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/container_image_keepalived cp.icr.io/cp/ibm-ceph/keepalived-rhel8:latest

5. Install and configure the new ingress service using the Ceph orchestrator:

[ceph: root@host01 /]# ceph orch apply -i ingress.yaml

6. After the Ceph orchestrator completes, verify the HA configuration.

a. On the host running the ingress service, check that the virtual IP address appears:

Example

[root@host01 ~]# ip addr show

b. Try reaching the Ceph Object Gateway from a Ceph client:

Syntax

wget HOST_NAME

Example

[root@client ~]# wget host01.example.com

If this returns an index.html with similar content as in the example below, then the HA configuration for the Ceph
Object Gateway is working properly.

Example

<?xml version="1.0" encoding="UTF-8"?>


<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Owner>
<ID>anonymous</ID>
<DisplayName></DisplayName>
</Owner>
<Buckets>
</Buckets>
</ListAllMyBucketsResult>

Reference
Edit online

See the Performing a Standard RHEL Installation Guide for more details.

High availability service

HAProxy/keepalived Prerequisites

Edit online
As a storage administrator, you can assign many instances of the Ceph Object Gateway to a single zone. This allows you to scale out as
the load increases, that is, to run many instances within the same zone group and zone; however, you do not need a federated architecture to use HAProxy and
keepalived. Since each object gateway instance has its own IP address, you can use HAProxy and keepalived to balance the
load across Ceph Object Gateway servers.

Another use case for HAProxy and keepalived is to terminate HTTPS at the HAProxy server. You can use an HAProxy server to
terminate HTTPS at the HAProxy server and use HTTP between the HAProxy server and the Beast web server instances.

HAProxy/keepalived Prerequisites
Preparing HAProxy Nodes
Installing and Configuring HAProxy
Installing and Configuring keepalived

HAProxy/keepalived Prerequisites
Edit online
To set up an HAProxy with the Ceph Object Gateway, you must have:

A running IBM Storage Ceph cluster

At least two Ceph Object Gateway servers within the same zone are configured to run on port 80. If you follow the simple
installation procedure, the gateway instances are in the same zone group and zone by default. If you are using a federated
architecture, ensure that the instances are in the same zone group and zone.

At least two Red Hat Enterprise Linux 8 servers for HAProxy and keepalived.

NOTE: This section assumes that you have at least two Ceph Object Gateway servers running, and that you get a valid response from
each of them when running test scripts over port 80.

Preparing HAProxy Nodes


Edit online
The following setup assumes two HAProxy nodes named haproxy and haproxy2 and two Ceph Object Gateway servers named
rgw1 and rgw2. You might use any naming convention you prefer. Perform the following procedure on at least two HAProxy nodes:

Procedure
Edit online

1. Install Red Hat Enterprise Linux 8 or 9.

2. Register the nodes.

[root@haproxy]# subscription-manager register

3. Enable the RHEL server repository.

[root@haproxy]# subscription-manager repos --enable=rhel-8-server-rpms

4. Update the server.

[root@haproxy]# dnf update -y

5. Install admin tools (e.g., wget, vim, etc.) as needed.

6. Open port 80.

[root@haproxy]# firewall-cmd --zone=public --add-port 80/tcp --permanent


[root@haproxy]# firewall-cmd --reload

7. For HTTPS, open port 443.

[root@haproxy]# firewall-cmd --zone=public --add-port 443/tcp --permanent
[root@haproxy]# firewall-cmd --reload

Installing and Configuring HAProxy


Edit online
Perform the following procedure on your at least two HAProxy nodes:

1. Install haproxy.

[root@haproxy]# dnf install haproxy

2. Configure haproxy for SELinux and HTTP.

[root@haproxy]# vim /etc/firewalld/services/haproxy-http.xml

Add the following lines:

<?xml version="1.0" encoding="utf-8"?>


<service>
<short>HAProxy-HTTP</short>
<description>HAProxy load-balancer</description>
<port protocol="tcp" port="80"/>
</service>

As root, assign the correct SELinux context and file permissions to the haproxy-http.xml file.

[root@haproxy]# cd /etc/firewalld/services
[root@haproxy]# restorecon haproxy-http.xml
[root@haproxy]# chmod 640 haproxy-http.xml

3. If you intend to use HTTPS, configure haproxy for SELinux and HTTPS.

[root@haproxy]# vim /etc/firewalld/services/haproxy-https.xml

Add the following lines:

<?xml version="1.0" encoding="utf-8"?>


<service>
<short>HAProxy-HTTPS</short>
<description>HAProxy load-balancer</description>
<port protocol="tcp" port="443"/>
</service>

As root, assign the correct SELinux context and file permissions to the haproxy-https.xml file.

# cd /etc/firewalld/services
# restorecon haproxy-https.xml
# chmod 640 haproxy-https.xml

4. Finally, put the certificate and key into a PEM file.

[root@haproxy]# cat example.com.crt example.com.key > example.com.pem


[root@haproxy]# cp example.com.pem /etc/ssl/private/

5. Configure haproxy.

[root@haproxy]# vim /etc/haproxy/haproxy.cfg

The global and defaults sections may remain unchanged. After the defaults section, you need to configure the frontend and
backend sections. For example:

frontend http_web
bind *:80
mode http
default_backend rgw

frontend rgw-https
bind *:443 ssl crt /etc/ssl/private/example.com.pem
default_backend rgw

backend rgw
balance roundrobin
mode http
server rgw1 10.0.0.71:80 check
server rgw2 10.0.0.80:80 check

6. Enable/start haproxy

[root@haproxy]# systemctl enable haproxy


[root@haproxy]# systemctl start haproxy

Installing and Configuring keepalived


Edit online
Perform the following procedure on your at least two HAProxy nodes:

Prerequisites
Edit online

A minimum of two HAProxy nodes.

A minimum of two Object Gateway nodes.

Procedure
Edit online

1. Install keepalived:

[root@haproxy]# dnf install -y keepalived

2. Configure keepalived on both HAProxy nodes:

[root@haproxy]# vim /etc/keepalived/keepalived.conf

In the configuration file, there is a script to check the haproxy processes:

vrrp_script chk_haproxy {
script "killall -0 haproxy" # check the haproxy process
interval 2 # every 2 seconds
weight 2 # add 2 points if OK
}

Next, the instance on the primary and backup load balancers uses eno1 as the network interface. It also assigns a virtual IP
address, that is, 192.168.1.20.

Primary load balancer node

vrrp_instance RGW {
state MASTER # might not be necessary. This is on the primary LB node.
@main interface eno1
priority 100
advert_int 1
interface eno1
virtual_router_id 50
@main unicast_src_ip 10.8.128.43 80
unicast_peer {
10.8.128.53
}
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.1.20
}
track_script {

chk_haproxy
}
}
virtual_server 192.168.1.20 80 eno1 { #populate correct interface
delay_loop 6
lb_algo wlc
lb_kind dr
persistence_timeout 600
protocol TCP
real_server 10.8.128.43 80 { # ip address of rgw2 on physical interface, haproxy listens here, rgw listens to localhost:8080 or similar
weight 100
TCP_CHECK { # perhaps change these to a HTTP/SSL GET?
connect_timeout 3
}
}
real_server 10.8.128.53 80 { # ip address of rgw3 on physical interface, haproxy listens here, rgw listens to localhost:8080 or similar
weight 100
TCP_CHECK { # perhaps change these to a HTTP/SSL GET?
connect_timeout 3
}
}
}

Backup load balancer node

vrrp_instance RGW {
state BACKUP # might not be necessary?
priority 99
advert_int 1
interface eno1
virtual_router_id 50
unicast_src_ip 10.8.128.53 80
unicast_peer {
10.8.128.43
}
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.1.20
}
track_script {
chk_haproxy
}
}
virtual_server 192.168.1.20 80 eno1 { #populate correct interface
delay_loop 6
lb_algo wlc
lb_kind dr
persistence_timeout 600
protocol TCP
real_server 10.8.128.43 80 { # ip address of rgw2 on physical interface, haproxy listens here, rgw listens to localhost:8080 or similar
weight 100
TCP_CHECK { # perhaps change these to a HTTP/SSL GET?
connect_timeout 3
}
}
real_server 10.8.128.53 80 { # ip address of rgw3 on physical interface, haproxy listens here, rgw listens to localhost:8080 or similar
weight 100
TCP_CHECK { # perhaps change these to a HTTP/SSL GET?
connect_timeout 3
}
}
}

3. Enable and start the keepalived service:

[root@haproxy]# systemctl enable keepalived


[root@haproxy]# systemctl start keepalived
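
To verify which node currently holds the virtual IP address, you can inspect the interface on each HAProxy node. This is a minimal check, assuming the interface and virtual IP address used in the examples above:

[root@haproxy]# ip addr show eno1 | grep 192.168.1.20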



Advanced configuration
As a storage administrator, you can configure some of the more advanced features of the Ceph Object Gateway. You can configure a
multisite Ceph Object Gateway and integrate it with directory services, such as Microsoft Active Directory and OpenStack Keystone
service.

Multi-site configuration and administration


Multi-site Ceph Object Gateway command line usage
Configure LDAP and Ceph Object Gateway
Configure Active Directory and Ceph Object Gateway
The Ceph Object Gateway and OpenStack Keystone

Multi-site configuration and administration


As a storage administrator, you can configure and administer multiple Ceph Object Gateways for a variety of use cases. You can learn
what to do during disaster recovery and failover events. Also, you can learn more about realms, zones, and syncing policies in
multi-site Ceph Object Gateway environments.

A single zone configuration typically consists of one zone group containing one zone and one or more ceph-radosgw instances
where you may load-balance gateway client requests between the instances. In a single zone configuration, typically multiple
gateway instances point to a single Ceph storage cluster. However, IBM supports several multi-site configuration options for the
Ceph Object Gateway:

Multi-zone: A more advanced configuration consists of one zone group and multiple zones, each zone with one or more
ceph-radosgw instances. Each zone is backed by its own Ceph Storage Cluster. Multiple zones in a zone group provide
disaster recovery for the zone group should one of the zones experience a significant failure. Each zone is active and may
receive write operations. In addition to disaster recovery, multiple active zones may also serve as a foundation for content
delivery networks. To configure multiple zones without replication, see Configuring multiple zones without replication.

Multi-zone-group: The Ceph Object Gateway can also support multiple zone groups (formerly called 'regions'), each zone
group with one or more zones. Objects stored to zone groups within the same realm share a global namespace, ensuring
unique object IDs across zone groups and zones.

Multiple Realms: The Ceph Object Gateway supports the notion of realms, each of which contains either a single zone group or
multiple zone groups and provides a globally unique namespace. Multiple realms provide the ability to support numerous
configurations and namespaces.

Figure 1. Ceph Object Gateway realm



Requirements and Assumptions
Pools
Migrating a single site system to multi-site
Establishing a secondary zone
Configuring the archive zone (Technology Preview)
Failover and disaster recovery
Configuring multiple zones without replication
Configuring multiple realms in the same storage cluster

Prerequisites

A healthy running IBM Storage Ceph cluster.

Deployment of the Ceph Object Gateway software.

Requirements and Assumptions


A multi-site configuration requires at least two Ceph storage clusters, and at least two Ceph object gateway instances, one for each
Ceph storage cluster.

This guide assumes at least two Ceph storage clusters in geographically separate locations; however, the configuration can work on
the same physical site. This guide also assumes four Ceph object gateway servers named rgw1, rgw2, rgw3 and rgw4 respectively.

A multi-site configuration requires a master zone group and a master zone. Additionally, each zone group requires a master
zone. Zone groups might have one or more secondary or non-master zones.



IMPORTANT: When planning network considerations for multi-site, it is important to understand the relationship between the bandwidth
and latency observed on the multi-site synchronization network and the client ingest rate, because they directly determine how current
the secondary site is with the objects owed to it. Multi-site synchronization is asynchronous, and one of the limitations is the rate at which the
sync gateways can process data across the link. As an example of network inter-connectivity speed, plan for 1 GbE
of inter-datacenter connectivity for every 8 TB of cumulative receive data, per client gateway. Thus, if you replicate to two other
sites and ingest 16 TB a day, you need 6 GbE of dedicated bandwidth for multi-site replication.

IBM also recommends private Ethernet or dense wavelength-division multiplexing (DWDM), because a VPN over the internet is not ideal
due to the additional overhead incurred.

IMPORTANT: The master zone within the master zone group of a realm is responsible for storing the master copy of the realm’s
metadata, including users, quotas and buckets (created by the radosgw-admin CLI). This metadata gets synchronized to secondary
zones and secondary zone groups automatically. Metadata operations executed with the radosgw-admin CLI MUST be executed
on a host within the master zone of the master zone group in order to ensure that they get synchronized to the secondary zone
groups and zones. Currently, it is possible to execute metadata operations on secondary zones and zone groups, but it is NOT
recommended because they WILL NOT be synchronized, leading to fragmented metadata.

In the following examples, the rgw1 host will serve as the master zone of the master zone group; the rgw2 host will serve as the
secondary zone of the master zone group; the rgw3 host will serve as the master zone of the secondary zone group; and the rgw4
host will serve as the secondary zone of the secondary zone group.

IMPORTANT: When you have a large cluster with many Ceph Object Gateways configured in a multi-site storage cluster, IBM
recommends dedicating no more than three sync-enabled Ceph Object Gateways per site for multi-site synchronization. With more
than three syncing Ceph Object Gateways, the aggregate sync rate shows diminishing returns, and the increased
contention creates an incremental risk of hitting timing-related error conditions. This is due to a sync-fairness known issue
BZ#1740782. For the rest of the Ceph Object Gateways in such a configuration, which are dedicated to client I/O operations
through load balancers, run the ceph config set client.rgw.CLIENT_NODE rgw_run_sync_thread false command to
prevent them from performing sync operations, and then restart the Ceph Object Gateway.
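
For example, a minimal sketch of disabling the sync thread on a client-facing gateway and then restarting the gateway service; the instance name client.rgw.host05 is a hypothetical example of the client.rgw.CLIENT_NODE value:

[ceph: root@host01 /]# ceph config set client.rgw.host05 rgw_run_sync_thread false
[ceph: root@host01 /]# ceph orch restart rgw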

Following is a typical configuration file for HAProxy for syncing gateways:

Example

[root@host01 ~]# cat ./haproxy.cfg

global

log 127.0.0.1 local2

chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 7000
user haproxy
group haproxy
daemon

stats socket /var/lib/haproxy/stats

defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 30s
timeout server 30s
timeout http-keep-alive 10s
timeout check 10s
timeout client-fin 1s
timeout server-fin 1s
maxconn 6000

listen stats
bind 0.0.0.0:1936



mode http
log global

maxconn 256

clitimeout 10m
srvtimeout 10m
contimeout 10m
timeout queue 10m

# JTH start
stats enable
stats hide-version
stats refresh 30s
stats show-node
## stats auth admin:password
stats uri /haproxy?stats
stats admin if TRUE

frontend main
bind *:5000
acl url_static path_beg -i /static /images /javascript /stylesheets
acl url_static path_end -i .jpg .gif .png .css .js

use_backend static if url_static


default_backend app
maxconn 6000

backend static
balance roundrobin
fullconn 6000
server app8 host01:8080 check maxconn 2000
server app9 host02:8080 check maxconn 2000
server app10 host03:8080 check maxconn 2000

backend app
balance roundrobin
fullconn 6000
server app8 host01:8080 check maxconn 2000
server app9 host02:8080 check maxconn 2000
server app10 host03:8080 check maxconn 2000

Pools
IBM recommends using the Ceph Placement Group’s per Pool Calculator to calculate a suitable number of placement groups for the
pools the radosgw daemon will create. Set the calculated values as defaults in the Ceph configuration database.

Example

[ceph: root@host01 /]# ceph config set osd osd_pool_default_pg_num 50
[ceph: root@host01 /]# ceph config set osd osd_pool_default_pgp_num 50

NOTE: Making this change to the Ceph configuration will use those defaults when the Ceph Object Gateway instance creates the
pools. Alternatively, you can create the pools manually.

Pool names particular to a zone follow the naming convention ZONE_NAME.POOL_NAME. For example, a zone named us-east will
have the following pools:

.rgw.root

us-east.rgw.control

us-east.rgw.meta

us-east.rgw.log



us-east.rgw.buckets.index

us-east.rgw.buckets.data

us-east.rgw.buckets.non-ec

us-east.rgw.meta:users.keys

us-east.rgw.meta:users.email

us-east.rgw.meta:users.swift

us-east.rgw.meta:users.uid
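
After the Ceph Object Gateway in a zone has created its pools, you can verify that they exist. This is a minimal check, assuming a zone named us-east as in the list above:

[ceph: root@host01 /]# ceph osd lspools | grep us-east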

Reference

Pools

Migrating a single site system to multi-site


You can migrate from a single site system with a default zone group and zone to a multi-site system.

Prerequisites

A running IBM Storage Ceph cluster.

A Ceph Object Gateway installed.

Procedure

1. Create a realm. Replace REALM_NAME with the realm name.

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME --default
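
For example, to create a realm named test_realm, the illustrative name reused in the later steps of this procedure:

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=test_realm --default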

2. Rename the default zone and zonegroup. Replace ZONE_GROUP_NAME and ZONE_NAME with the zonegroup or zone name
respectively.

Syntax

radosgw-admin zonegroup rename --rgw-zonegroup default --zonegroup-new-name=NEW_ZONE_GROUP_NAME
radosgw-admin zone rename --rgw-zone default --zone-new-name us-east-1 --rgw-zonegroup=ZONE_GROUP_NAME
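
For example, renaming the default zone group and zone to the names used later in this procedure, us and us-east-1:

[ceph: root@host01 /]# radosgw-admin zonegroup rename --rgw-zonegroup default --zonegroup-new-name=us
[ceph: root@host01 /]# radosgw-admin zone rename --rgw-zone default --zone-new-name us-east-1 --rgw-zonegroup=us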

3. Configure the primary zonegroup. Replace ZONE_GROUP_NAME with the zonegroup name and REALM_NAME with realm name.
Replace FQDN with the fully qualified domain name(s) in the zonegroup.

Syntax

radosgw-admin zonegroup modify --rgw-realm=REALM_NAME --rgw-zonegroup=ZONE_GROUP_NAME --endpoints http://FQDN:80 --master --default
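
For example, an illustrative sketch that assumes the realm test_realm, the zone group us, and the gateway host rgw1 used in the assumptions for this guide:

[ceph: root@host01 /]# radosgw-admin zonegroup modify --rgw-realm=test_realm --rgw-zonegroup=us --endpoints http://rgw1:80 --master --default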

4. Create a system user. Replace USER_ID with the username. Replace DISPLAY_NAME with a display name. It can contain
spaces.

Syntax



radosgw-admin user create --uid=USER_ID
--display-name="DISPLAY_NAME"
--access-key=ACCESS_KEY --secret=SECRET_KEY --system
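
For example, a sketch in which the user ID zone.user is an illustrative name, and the access and secret keys are the system user credentials that the secondary-zone procedure later uses to pull the realm:

[ceph: root@host01 /]# radosgw-admin user create --uid=zone.user --display-name="Zone user" --access-key=LIPEYZJLTWXRKXS9LPJC --secret=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ --system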

5. Configure the primary zone. Replace FQDN with the fully qualified domain name(s) in the zonegroup.

Syntax

radosgw-admin zone modify --rgw-realm=REALM_NAME --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME --endpoints http://FQDN:80 --access-key=ACCESS_KEY --secret=SECRET_KEY --master --default
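
For example, continuing the sketch with the same illustrative names and system user credentials as in the previous steps:

[ceph: root@host01 /]# radosgw-admin zone modify --rgw-realm=test_realm --rgw-zonegroup=us --rgw-zone=us-east-1 --endpoints http://rgw1:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ --master --default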

6. Optional: If you specified the realm and zone in the service specification during the deployment of the Ceph Object Gateway,
update the spec section of the specification file:

Syntax

spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME

7. Update the Ceph configuration database:

Syntax

ceph config set client.rgw.SERVICE_NAME rgw_realm REALM_NAME


ceph config set client.rgw.SERVICE_NAME rgw_zonegroup ZONE_GROUP_NAME
ceph config set client.rgw.SERVICE_NAME rgw_zone PRIMARY_ZONE_NAME

Example

[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_realm test_realm


[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zonegroup us
[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zone us-east-1

8. Commit the updated configuration:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

9. Restart the Ceph Object Gateway:

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

10. Establish a secondary zone

Establishing a secondary zone



Zones within a zone group replicate all data to ensure that each zone has the same data. When creating the secondary zone, issue
ALL of the radosgw-admin zone operations on a host identified to serve the secondary zone.

NOTE: To add additional zones, follow the same procedure as for adding the secondary zone. Use a different zone name.

IMPORTANT: You must run metadata operations, such as user creation and quotas, on a host within the master zone of the master
zonegroup. The master zone and the secondary zone can receive bucket operations from the RESTful APIs, but the secondary zone
redirects bucket operations to the master zone. If the master zone is down, bucket operations will fail. If you create a bucket using
the radosgw-admin CLI, you must run it on a host within the master zone of the master zone group so that the buckets will
synchronize with other zone groups and zones.

Prerequisites

At least two IBM Storage Ceph clusters.

At least two Ceph Object Gateway instances, one for each IBM Storage Ceph cluster.

Root-level access to all the nodes.

Nodes or containers are added to the storage cluster.

All Ceph Manager, Monitor, and OSD daemons are deployed.

Procedure

1. Log into the cephadm shell:

Example

[root@host04 ~]# cephadm shell

2. Pull the primary realm configuration from the host:

Syntax

radosgw-admin realm pull --url=URL_TO_PRIMARY_ZONE_GATEWAY --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host04 /]# radosgw-admin realm pull --url=http://10.74.249.26:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

3. Pull the primary period configuration from the host:

Syntax

radosgw-admin period pull --url=URL_TO_PRIMARY_ZONE_GATEWAY --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host04 /]# radosgw-admin period pull --url=http://10.74.249.26:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

4. Configure a secondary zone:

NOTE: All zones run in an active-active configuration by default; that is, a gateway client might write data to any zone and the
zone will replicate the data to all other zones within the zone group. If the secondary zone should not accept write operations,
specify the --read-only flag to create an active-passive configuration between the master zone and the secondary zone.
Additionally, provide the access_key and secret_key of the generated system user stored in the master zone of the master
zone group.

Syntax



radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME \
--rgw-zone=SECONDARY_ZONE_NAME \
--endpoints=http://RGW_SECONDARY_HOSTNAME:RGW_PRIMARY_PORT_NUMBER_1 \
--access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY \
[--read-only]

Example

[ceph: root@host04 /]# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-2 --endpoints=http://rgw2:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret-key=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ

5. Optional: Delete the default zone:

IMPORTANT:

Do not delete the default zone and its pools if you are using the default zone and zone group to store data.

Example

[ceph: root@host04 /]# radosgw-admin zone rm --rgw-zone=default


[ceph: root@host04 /]# ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-
mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-
really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.control default.rgw.control --yes-i-
really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-
really-really-mean-it
[ceph: root@host04 /]# ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-
mean-it

6. Optional: If you specified the realm and zone in the service specification during the deployment of the Ceph Object Gateway,
update the spec section of the specification file:

Syntax

spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME

7. Update the Ceph configuration database:

Syntax

ceph config set client.rgw.SERVICE_NAME rgw_realm REALM_NAME


ceph config set client.rgw.SERVICE_NAME rgw_zonegroup ZONE_GROUP_NAME
ceph config set client.rgw.SERVICE_NAME rgw_zone SECONDARY_ZONE_NAME

Example

[ceph: root@host04 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_realm test_realm


[ceph: root@host04 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zonegroup us
[ceph: root@host04 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zone us-east-2

8. Commit the changes:

Syntax

radosgw-admin period update --commit

Example

[ceph: root@host04 /]# radosgw-admin period update --commit

9. Outside the cephadm shell, fetch the FSID of the storage cluster and the processes:

Example

[root@host04 ~]# systemctl list-units | grep ceph

10. Start the Ceph Object Gateway daemon:

Syntax



systemctl start ceph-FSID@DAEMON_NAME
systemctl enable ceph-FSID@DAEMON_NAME

Example

[root@host04 ~]# systemctl start [email protected]_realm.us-


east-2.host04.ahdtsw.service
[root@host04 ~]# systemctl enable [email protected]_realm.us-
east-2.host04.ahdtsw.service

Configuring the archive zone (Technology Preview)


The archive zone leverages the S3 object versioning feature of the Ceph Object Gateway. The archive zone
retains a history of versions of S3 objects that can only be eliminated through the gateways associated with the archive zone. It captures
all data updates and metadata and consolidates them as versions of S3 objects.

IMPORTANT: Technology Preview features are not supported with IBM production service level agreements (SLAs), might not be
functionally complete, and IBM does not recommend using them for production. These features provide early access to upcoming
product features, enabling customers to test functionality and provide feedback during the development process.

Deleting objects in archive zone

Prerequisites

A running, and healthy IBM Storage Ceph cluster.

Root-level access to a Ceph Monitor node.

Installation of the Ceph Object Gateway software.

Procedure

Configure the archive zone when creating a new zone by using the archive tier:

Syntax

radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME --endpoints=http://FQDN:PORT,http://FQDN:PORT --tier-type=archive

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --endpoints=http://example.com:8080 --tier-type=archive

Reference

Deploying a multi-site Ceph Object Gateway using the Ceph Orchestrator

Deleting objects in archive zone


You can use an S3 lifecycle policy extension to delete objects within an <ArchiveZone> element.



If any <Rule> section contains an <ArchiveZone> element, that rule executes in the archive zone, and such rules are the ONLY rules
that run in an archive zone.

Rules marked <ArchiveZone> do NOT execute in non-archive zones.

The rules within the lifecycle policy determine when and what objects to delete. For more information about lifecycle creation and
management, see Bucket lifecycle.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to a Ceph Monitor node.

Installation of the Ceph Object Gateway software.

Procedure

1. Set the <ArchiveZone> lifecycle policy rule. For more information about creating a lifecycle policy, see the Creating a
lifecycle management policy.

Example

<?xml version="1.0" ?>


<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Rule>
<ID>delete-1-days-az</ID>
<Filter>
<Prefix></Prefix>
<ArchiveZone /> <1>
</Filter>
<Status>Enabled</Status>
<Expiration>
<Days>1</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>

<1> The archive zone rule.
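
The lifecycle policy is typically applied to a bucket with an S3 client rather than with radosgw-admin. A minimal sketch, assuming the policy above is saved as lifecycle.xml, that an s3cmd client is already configured against the gateway, and that the bucket is named test-bkt as in the next step:

[root@host01 ~]# s3cmd setlifecycle lifecycle.xml s3://test-bkt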

2. Optional: See if a specific lifecycle policy contains an archive zone rule.

Syntax

radosgw-admin lc get --bucket BUCKET_NAME

Example

[ceph: root@host01 /]# radosgw-admin lc get --bucket test-bkt

{
"prefix_map": {
"": {
"status": true,
"dm_expiration": true,
"expiration": 0,
"noncur_expiration": 2,
"mp_expiration": 0,
"transitions": {},
"noncur_transitions": {}
}
},
"rule_map": [
{
"id": "Rule 1",



"rule": {
"id": "Rule 1",
"prefix": "",
"status": "Enabled",
"expiration": {
"days": "",
"date": ""
},
"noncur_expiration": {
"days": "2",
"date": ""
},
"mp_expiration": {
"days": "",
"date": ""
},
"filter": {
"prefix": "",
"obj_tags": {
"tagset": {}
},
"archivezone": "" <1>
},
"transitions": {},
"noncur_transitions": {},
"dm_expiration": true
}
}
]
}

<1> The archive zone rule. This is an example of a lifecycle policy with an archive zone rule.

3. If the Ceph Object Gateway user is deleted, the buckets at the archive site owned by that user become inaccessible. Link those
buckets to another Ceph Object Gateway user to access the data.

Syntax

radosgw-admin bucket link --uid NEW_USER_ID --bucket BUCKET_NAME --yes-i-really-mean-it

Example

[ceph: root@host01 /]# radosgw-admin bucket link --uid arcuser1 --bucket arc1-deleted-
da473fbbaded232dc5d1e434675c1068 --yes-i-really-mean-it

Additional resources

For more information, see:

Bucket lifecycle

S3 bucket lifecycle

Failover and disaster recovery


If the primary zone fails, failover to the secondary zone for disaster recovery.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to a Ceph Monitor node.

Installation of the Ceph Object Gateway software.



Procedure

1. Make the secondary zone the primary and default zone. For example:

Syntax

radosgw-admin zone modify --rgw-zone=ZONE_NAME --master --default

By default, Ceph Object Gateway runs in an active-active configuration. If the cluster was configured to run in an active-
passive configuration, the secondary zone is a read-only zone. Remove the --read-only status to allow the zone to receive
write operations. For example:

Syntax

radosgw-admin zone modify --rgw-zone=ZONE_NAME --master --default --read-only=false
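
For example, a sketch that makes the secondary zone us-east-2 created earlier in this section the new primary and default zone:

[ceph: root@host01 /]# radosgw-admin zone modify --rgw-zone=us-east-2 --master --default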

2. Update the period to make the changes take effect:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

3. Restart the Ceph Object Gateway.

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

If the former primary zone recovers, revert the operation.

4. From the recovered zone, pull the realm from the current primary zone:

Syntax

radosgw-admin realm pull --url=URL_TO_PRIMARY_ZONE_GATEWAY


--access-key=ACCESS_KEY --secret=SECRET_KEY

5. Make the recovered zone the primary and default zone:

Syntax

radosgw-admin zone modify --rgw-zone=ZONE_NAME --master --default

6. Update the period to make the changes take effect:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

7. Restart the Ceph Object Gateway in the recovered zone:

Syntax



ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

8. If the secondary zone needs to be a read-only configuration, update the secondary zone:

Syntax

radosgw-admin zone modify --rgw-zone=ZONE_NAME --read-only

9. Update the period to make the changes take effect:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

10. Restart the Ceph Object Gateway in the secondary zone:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

Configuring multiple zones without replication


You can configure multiple zones that will not replicate each other. For example, you can create a dedicated zone for each team in a
company.

Prerequisites

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Root-level access to the Ceph Object Gateway node.

Procedure

1. Create a new realm:

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME [--default]

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=test_realm --default


{
"id": "0956b174-fe14-4f97-8b50-bb7ec5e1cf62",
"name": "test_realm",
"current_period": "1950b710-3e63-4c41-a19e-46a715000980",
"epoch": 1
}

2. Create a new zone group:



Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME --endpoints=FQDN:PORT --rgw-realm=REALM_NAME|--realm-id=REALM_ID --master --default

Example

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:80 --rgw-realm=test_realm --master --default
{
"id": "f1a233f5-c354-4107-b36c-df66126475a6",
"name": "us",
"api_name": "us",
"is_master": "true",
"endpoints": [
"http://rgw1:80"
],
"hostnames": [],
"hostnames_s3webzone": [],
"master_zone": "",
"zones": [],
"placement_targets": [],
"default_placement": "",
"realm_id": "0956b174-fe14-4f97-8b50-bb7ec5e1cf62"
}

3. Create one or more zones depending on the use case:

Syntax

radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME --master --default --endpoints=FQDN:PORT,FQDN:PORT

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --master --default --endpoints=http://rgw1:80

4. Get the JSON file with the configuration of the zone group:

Syntax

radosgw-admin zonegroup get --rgw-zonegroup=ZONE_GROUP_NAME > JSON_FILE_NAME

Example

[ceph: root@host01 /]# radosgw-admin zonegroup get --rgw-zonegroup=us > zonegroup-us.json

a. Open the file for editing, and set the log_meta, log_data, and sync_from_all fields to false:

Example

{
"id": "72f3a886-4c70-420b-bc39-7687f072997d",
"name": "default",
"api_name": "",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "a5e44ecd-7aae-4e39-b743-3a709acb60c5",
"zones": [
{
"id": "975558e0-44d8-4866-a435-96d3e71041db",
"name": "testzone",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "false",
"sync_from": []
},
{



"id": "a5e44ecd-7aae-4e39-b743-3a709acb60c5",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "false",
"sync_from": []
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "2d988e7d-917e-46e7-bb18-79350f6a5155"
}

5. Use the updated JSON file to set the zone group:

Syntax

radosgw-admin zonegroup set --rgw-zonegroup=ZONE_GROUP_NAME --infile=JSON_FILE_NAME

Example

[ceph: root@host01 /]# radosgw-admin zonegroup set --rgw-zonegroup=us --infile=zonegroup-


us.json

6. Update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

7. Optional: If you specified the realm and zone in the service specification during the deployment of the Ceph Object Gateway,
update the spec section of the specification file:

Syntax

spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME

Reference

Realms

Zone Groups

Zones

Installation

Configuring multiple realms in the same storage cluster


You can configure multiple realms in the same storage cluster. This is a more advanced use case for multi-site. Configuring multiple
realms in the same storage cluster enables you to use a local realm to handle local Ceph Object Gateway client traffic, as well as a
replicated realm for data that will be replicated to a secondary site.

NOTE: IBM recommends that each realm has its own Ceph Object Gateway.



Prerequisites

Two running IBM Storage Ceph data centers in a storage cluster.

The access key and secret key for each data center in the storage cluster.

Root-level access to all the Ceph Object Gateway nodes.

Each data center has its own local realm. They share a realm that replicates on both sites.

Procedure

1. Create one local realm on the first data center in the storage cluster:

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME --default

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=ldc1 --default

2. Create one local master zonegroup on the first data center:

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME --endpoints=http://RGW_NODE_NAME:80 --rgw-realm=REALM_NAME --master --default

Example

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=ldc1zg --endpoints=http://rgw1:80 --rgw-realm=ldc1 --master --default

3. Create one local zone on the first data center:

Syntax

radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME --master --default --endpoints=HTTP_FQDN[,HTTP_FQDN]

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=ldc1zg --rgw-zone=ldc1z --master --default --endpoints=http://rgw.example.com

4. Commit the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

5. Optional: If you specified the realm and zone in the service specification during the deployment of the Ceph Object Gateway,
update the spec section of the specification file:

Syntax

spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME

6. You can either deploy the Ceph Object Gateway daemons with the appropriate realm and zone or update the configuration
database:

Deploy the Ceph Object Gateway using placement specification:

Syntax



ceph orch apply rgw SERVICE_NAME --realm=REALM_NAME --zone=ZONE_NAME --
placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2"

Example

[ceph: root@host01 /]# ceph orch apply rgw rgw --realm=ldc1 --zone=ldc1z --placement="1
host01"

Update the Ceph configuration database:

Syntax

ceph config set client.rgw.SERVICE_NAME rgw_realm REALM_NAME


ceph config set client.rgw.SERVICE_NAME rgw_zonegroup ZONE_GROUP_NAME
ceph config set client.rgw.SERVICE_NAME rgw_zone ZONE_NAME

Example

[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_realm ldc1


[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zonegroup
ldc1zg
[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zone ldc1z

7. Restart the Ceph Object Gateway.

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

8. Create one local realm on the second data center in the storage cluster:

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME --default

Example

[ceph: root@host04 /]# radosgw-admin realm create --rgw-realm=ldc2 --default

9. Create one local master zonegroup on the second data center:

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME --endpoints=http://RGW_NODE_NAME:80 --rgw-realm=REALM_NAME --master --default

Example

[ceph: root@host04 /]# radosgw-admin zonegroup create --rgw-zonegroup=ldc2zg --endpoints=http://rgw2:80 --rgw-realm=ldc2 --master --default

10. Create one local zone on the second data center:

Syntax



radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME --master --
default --endpoints=HTTP_FQDN[, HTTP_FQDN]

Example

[ceph: root@host04 /]# radosgw-admin zone create --rgw-zonegroup=ldc2zg --rgw-zone=ldc2z --


master --default --endpoints=http://rgw.example.com

11. Commit the period:

Example

[ceph: root@host04 /]# radosgw-admin period update --commit

12. Optional: If you specified the realm and zone in the service specification during the deployment of the Ceph Object Gateway,
update the spec section of the specification file:

Syntax

spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME

13. You can either deploy the Ceph Object Gateway daemons with the appropriate realm and zone or update the configuration
database:

Deploy the Ceph Object Gateway using placement specification:

Syntax

ceph orch apply rgw SERVICE_NAME --realm=REALM_NAME --zone=ZONE_NAME --


placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2"

Example

[ceph: root@host01 /]# ceph orch apply rgw rgw --realm=ldc2 --zone=ldc2z --placement="1
host01"

Update the Ceph configuration database:

Syntax

ceph config set client.rgw.SERVICE_NAME rgw_realm REALM_NAME


ceph config set client.rgw.SERVICE_NAME rgw_zonegroup ZONE_GROUP_NAME
ceph config set client.rgw.SERVICE_NAME rgw_zone ZONE_NAME

Example

[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_realm ldc2


[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zonegroup
ldc2zg
[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zone ldc2z

14. Restart the Ceph Object Gateway.

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host04 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example



[ceph: root@host04 /]# ceph orch restart rgw

15. Create a replicated realm on the first data center in the storage cluster:

Syntax

radosgw-admin realm create --rgw-realm=REPLICATED_REALM_1 --default

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=rdc1 --default

Use the --default flag to make the replicated realm default on the primary site.

16. Create a master zonegroup for the first data center:

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=RGW_ZONE_GROUP --endpoints=http://RGW_NODE_NAME:80 --rgw-realm=RGW_REALM_NAME --master --default

Example

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=rdc1zg --endpoints=http://rgw1:80 --rgw-realm=rdc1 --master --default

17. Create a master zone on the first data center:

Syntax

radosgw-admin zone create --rgw-zonegroup=RGW_ZONE_GROUP --rgw-zone=MASTER_RGW_ZONE_NAME --master --default --endpoints=HTTP_FQDN[,HTTP_FQDN]

Example

[ceph: root@host01 /]# radosgw-admin zone create --rgw-zonegroup=rdc1zg --rgw-zone=rdc1z --master --default --endpoints=http://rgw.example.com

18. Create a synchronization user and add the system user to the master zone for multi-site:

Syntax

radosgw-admin user create --uid="SYNCHRONIZATION_USER" --display-name="Synchronization User" --system
radosgw-admin zone modify --rgw-zone=RGW_ZONE --access-key=ACCESS_KEY --secret=SECRET_KEY

Example

[ceph: root@host01 /]# radosgw-admin user create --uid="synchronization-user" --display-name="Synchronization User" --system
[ceph: root@host01 /]# radosgw-admin zone modify --rgw-zone=rdc1z --access-key=3QV0D6ZMMCJZMSCXJ2QJ --secret=VpvQWcsfI9OPzUCpR4kynDLAbqa1OIKqRB6WEnH8

19. Commit the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

20. Optional: If you specified the realm and zone in the service specification during the deployment of the Ceph Object Gateway,
update the spec section of the specification file:

Syntax

spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME

21. You can either deploy the Ceph Object Gateway daemons with the appropriate realm and zone or update the configuration
database:



Deploy the Ceph Object Gateway using placement specification:

Syntax

ceph orch apply rgw SERVICE_NAME --realm=REALM_NAME --zone=ZONE_NAME --


placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2"

Example

[ceph: root@host01 /]# ceph orch apply rgw rgw --realm=rdc1 --zone=rdc1z --placement="1
host01"

Update the Ceph configuration database:

Syntax

ceph config set client.rgw.SERVICE_NAME rgw_realm REALM_NAME


ceph config set client.rgw.SERVICE_NAME rgw_zonegroup ZONE_GROUP_NAME
ceph config set client.rgw.SERVICE_NAME rgw_zone ZONE_NAME

Example

[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_realm rdc1


[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zonegroup
rdc1zg
[ceph: root@host01 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zone rdc1z

22. Restart the Ceph Object Gateway.

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

23. Pull the replicated realm on the second data center:

Syntax

radosgw-admin realm pull --url=https://tower-osd1.cephtips.com --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host01 /]# radosgw-admin realm pull --url=https://tower-osd1.cephtips.com --access-key=3QV0D6ZMMCJZMSCXJ2QJ --secret-key=VpvQWcsfI9OPzUCpR4kynDLAbqa1OIKqRB6WEnH8

24. Pull the period from the first data center:

Syntax

radosgw-admin period pull --url=https://tower-osd1.cephtips.com --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host01 /]# radosgw-admin period pull --url=https://tower-osd1.cephtips.com --access-key=3QV0D6ZMMCJZMSCXJ2QJ --secret-key=VpvQWcsfI9OPzUCpR4kynDLAbqa1OIKqRB6WEnH8



25. Create the secondary zone on the second data center:

Syntax

radosgw-admin zone create --rgw-zone=RGW_ZONE --rgw-zonegroup=RGW_ZONE_GROUP --endpoints=https://tower-osd4.cephtips.com --access-key=ACCESS_KEY --secret-key=SECRET_KEY

Example

[ceph: root@host04 /]# radosgw-admin zone create --rgw-zone=rdc2z --rgw-zonegroup=rdc1zg --endpoints=https://tower-osd4.cephtips.com --access-key=3QV0D6ZMMCJZMSCXJ2QJ --secret-key=VpvQWcsfI9OPzUCpR4kynDLAbqa1OIKqRB6WEnH8

26. Commit the period:

Example

[ceph: root@host04 /]# radosgw-admin period update --commit

27. Optional: If you specified the realm and zone in the service specification during the deployment of the Ceph Object Gateway,
update the spec section of the specification file:

Syntax

spec:
rgw_realm: REALM_NAME
rgw_zone: ZONE_NAME

28. You can either deploy the Ceph Object Gateway daemons with the appropriate realm and zone or update the configuration
database:

Deploy the Ceph Object Gateway using placement specification:

Syntax

ceph orch apply rgw SERVICE_NAME --realm=REALM_NAME --zone=ZONE_NAME --


placement="NUMBER_OF_DAEMONS HOST_NAME_1 HOST_NAME_2"

Example

[ceph: root@host04 /]# ceph orch apply rgw rgw --realm=rdc1 --zone=rdc2z --placement="1
host04"

Update the Ceph configuration database:

Syntax

ceph config set client.rgw.SERVICE_NAME rgw_realm REALM_NAME


ceph config set client.rgw.SERVICE_NAME rgw_zonegroup ZONE_GROUP_NAME
ceph config set client.rgw.SERVICE_NAME rgw_zone ZONE_NAME

Example

[ceph: root@host04 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_realm rdc1


[ceph: root@host04 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zonegroup
rdc1zg
[ceph: root@host04 /]# ceph config set client.rgw.rgwsvcid.mons-1.jwgwwp rgw_zone rdc2z

29. Restart the Ceph Object Gateway.

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host02 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:



Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host04 /]# ceph orch restart rgw

30. Log in as root on the endpoint for the second data center.

31. Verify the synchronization status on the master realm:

Syntax

radosgw-admin sync status

Example

[ceph: root@host04 /]# radosgw-admin sync status


realm 59762f08-470c-46de-b2b1-d92c50986e67 (ldc2)
zonegroup 7cf8daf8-d279-4d5c-b73e-c7fd2af65197 (ldc2zg)
zone 034ae8d3-ae0c-4e35-8760-134782cb4196 (ldc2z)
metadata sync no sync (zone is master)
current time 2023-08-17T05:49:56Z
zonegroup features enabled: resharding
disabled: compress-encrypted

IMPORTANT: In IBM Storage Ceph 5.3.z5, the compress-encrypted feature is displayed with the radosgw-admin sync status
command and is disabled by default. Do not enable this feature, as it is not supported until IBM Storage Ceph 6.1.z2.

32. Log in as root on the endpoint for the first data center.

33. Verify the synchronization status for the replication-synchronization realm:

Syntax

radosgw-admin sync status --rgw-realm RGW_REALM_NAME

Example

[ceph: root@host01 /]# radosgw-admin sync status --rgw-realm rdc1


realm 73c7b801-3736-4a89-aaf8-e23c96e6e29d (rdc1)
zonegroup d67cc9c9-690a-4076-89b8-e8127d868398 (rdc1zg)
zone 67584789-375b-4d61-8f12-d1cf71998b38 (rdc2z)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 705ff9b0-68d5-4475-9017-452107cec9a0 (rdc1z)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
realm 73c7b801-3736-4a89-aaf8-e23c96e6e29d (rdc1)
zonegroup d67cc9c9-690a-4076-89b8-e8127d868398 (rdc1zg)
zone 67584789-375b-4d61-8f12-d1cf71998b38 (rdc2z)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 705ff9b0-68d5-4475-9017-452107cec9a0 (rdc1z)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source

34. To store and access data in the local site, create the user for local realm:

Syntax

radosgw-admin user create --uid="LOCAL_USER" --display-name="Local user" --rgw-realm=REALM_NAME --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME



Example

[ceph: root@host04 /]# radosgw-admin user create --uid="local-user" --display-name="Local user" --rgw-realm=ldc1 --rgw-zonegroup=ldc1zg --rgw-zone=ldc1z

IMPORTANT: By default, users are created under the default realm. For the users to access data in the local realm, the
radosgw-admin command requires the --rgw-realm argument.

Multi-site Ceph Object Gateway command line usage


As a storage administrator, you can have a good understanding of how to use the Ceph Object Gateway in a multi-site environment.
You can learn how to better manage the realms, zone groups, and zones in a multi-site environment.

Realms
Zone Groups
Zones

Realms
A realm represents a globally unique namespace consisting of one or more zonegroups containing one or more zones, and zones
containing buckets, which in turn contain objects. A realm enables the Ceph Object Gateway to support multiple namespaces and
their configuration on the same hardware.

A realm contains the notion of periods. Each period represents the state of the zone group and zone configuration in time. Each time
you make a change to a zonegroup or zone, update the period and commit it.

IBM recommends creating realms for new clusters.

Creating a realm
Making a Realm the Default
Deleting a Realm
Getting a realm
Listing realms
Setting a realm
Listing Realm Periods
Pulling a Realm
Renaming a Realm

Creating a realm
To create a realm, issue the realm create command and specify the realm name. If the realm is the default, specify --default.

Syntax

radosgw-admin realm create --rgw-realm=REALM_NAME [--default]

Example

[ceph: root@host01 /]# radosgw-admin realm create --rgw-realm=test_realm --default

By specifying --default, the realm will be called implicitly with each radosgw-admin call unless --rgw-realm and the realm
name are explicitly provided.



Making a Realm the Default
One realm in the list of realms should be the default realm. There may be only one default realm. If there is only one realm and it
wasn’t specified as the default realm when it was created, make it the default realm. Alternatively, to change which realm is the
default, run the following command:

[ceph: root@host01 /]# radosgw-admin realm default --rgw-realm=test_realm

NOTE: When the realm is the default, the command line assumes --rgw-realm=REALM_NAME as an argument.

Deleting a Realm
To delete a realm, run the realm delete command and specify the realm name.

Syntax

radosgw-admin realm delete --rgw-realm=REALM_NAME

Example

[ceph: root@host01 /]# radosgw-admin realm delete --rgw-realm=test_realm

Getting a realm
To get a realm, run the realm get command and specify the realm name.

Syntax

radosgw-admin realm get --rgw-realm=REALM_NAME

Example

[ceph: root@host01 /]# radosgw-admin realm get --rgw-realm=test_realm >filename.json

The CLI will echo a JSON object with the realm properties.

{
"id": "0a68d52e-a19c-4e8e-b012-a8f831cb3ebc",
"name": "test_realm",
"current_period": "b0c5bbef-4337-4edd-8184-5aeab2ec413b",
"epoch": 1
}

Use > and an output file name to output the JSON object to a file.

Listing realms
To list realms, run the realm list command:

Example

[ceph: root@host01 /]# radosgw-admin realm list



Setting a realm
To set a realm, run the realm set command, specify the realm name, and --infile= with an input file name.

Syntax

radosgw-admin realm set --rgw-realm=REALM_NAME --infile=IN_FILENAME

Example

[ceph: root@host01 /]# radosgw-admin realm set --rgw-realm=test_realm --infile=filename.json

Listing Realm Periods


To list realm periods, run the realm list-periods command.

Example

[ceph: root@host01 /]# radosgw-admin realm list-periods

Pulling a Realm
To pull a realm from the node containing the master zone group and master zone to a node containing a secondary zone group or
zone, run the realm pull command on the node that will receive the realm configuration.

Syntax

radosgw-admin realm pull --url=URL_TO_MASTER_ZONE_GATEWAY --access-key=ACCESS_KEY --secret=SECRET_KEY
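
For example, a sketch that reuses the gateway URL and system user credentials shown earlier in the secondary-zone procedure:

[ceph: root@host04 /]# radosgw-admin realm pull --url=http://10.74.249.26:80 --access-key=LIPEYZJLTWXRKXS9LPJC --secret=IsAje0AVDNXNw48LjMAimpCpI7VaxJYSnfD0FFKQ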

Renaming a Realm
A realm is not part of the period. Consequently, renaming the realm is only applied locally, and will not get pulled with realm pull.
When renaming a realm with multiple zones, run the command on each zone. To rename a realm, run the following command:

Syntax

radosgw-admin realm rename --rgw-realm=REALM_NAME --realm-new-name=NEW_REALM_NAME
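
For example, renaming the test_realm realm used in earlier examples to prod_realm, a hypothetical new name:

[ceph: root@host01 /]# radosgw-admin realm rename --rgw-realm=test_realm --realm-new-name=prod_realm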

NOTE: Do NOT use realm set to change the name parameter. That changes the internal name only. Specifying --rgw-realm
would still use the old realm name.

Zone Groups
The Ceph Object Gateway supports multi-site deployments and a global namespace by using the notion of zone groups. Formerly
called a region, a zone group defines the geographic location of one or more Ceph Object Gateway instances within one or more
zones.



Configuring zone groups differs from typical configuration procedures, because not all of the settings end up in a Ceph configuration
file. You can list zone groups, get a zone group configuration, and set a zone group configuration.

NOTE: The radosgw-admin zonegroup operations can be performed on any node within the realm, because the step of updating
the period propagates the changes throughout the cluster. However, radosgw-admin zone operations MUST be performed on a
host within the zone.

Creating a Zone Group


Making a Zone Group the Default
Renaming a Zone Group
Deleting a zone group
Listing Zone Groups
Getting a Zone Group
Setting a Zone Group Map
Setting a Zone Group

Creating a Zone Group


Creating a zone group consists of specifying the zone group name. Creating a zone group assumes it will live in the default realm unless
--rgw-realm=REALM_NAME is specified. If the zonegroup is the default zonegroup, specify the --default flag. If the zonegroup is
the master zonegroup, specify the --master flag.

Syntax

radosgw-admin zonegroup create --rgw-zonegroup=ZONE_GROUP_NAME [--rgw-realm=REALM_NAME] [--master] [--default]
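
For example, creating a master zone group named us in the realm test_realm, names reused from earlier examples in this document:

[ceph: root@host01 /]# radosgw-admin zonegroup create --rgw-zonegroup=us --rgw-realm=test_realm --master --default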

NOTE: Use zonegroup modify --rgw-zonegroup=ZONE_GROUP_NAME to modify an existing zone group’s settings.

Making a Zone Group the Default


One zonegroup in the list of zonegroups should be the default zonegroup. There may be only one default zonegroup. If there is only
one zonegroup and it wasn’t specified as the default zonegroup when it was created, make it the default zonegroup. Alternatively, to
change which zonegroup is the default, run the following command:

Example

[ceph: root@host01 /]# radosgw-admin zonegroup default --rgw-zonegroup=us

NOTE: When the zonegroup is the default, the command line assumes --rgw-zonegroup=ZONE_GROUP_NAME as an argument.

Then, update the period:

[ceph: root@host01 /]# radosgw-admin period update --commit

Renaming a Zone Group


To rename a zonegroup, run the following command:

Syntax

radosgw-admin zonegroup rename --rgw-zonegroup=ZONE_GROUP_NAME --zonegroup-new-


name=NEW_ZONE_GROUP_NAME

Then, update the period:



Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Deleting a zone group


To delete a zonegroup, run the following command:

Syntax

radosgw-admin zonegroup delete --rgw-zonegroup=ZONE_GROUP_NAME

Then, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Listing Zone Groups


A Ceph cluster contains a list of zone groups. To list the zone groups, run the following command:

[ceph: root@host01 /]# radosgw-admin zonegroup list

The radosgw-admin returns a JSON formatted list of zone groups.

{
"default_info": "90b28698-e7c3-462c-a42d-4aa780d24eda",
"zonegroups": [
"us"
]
}

Getting a Zone Group


To view the configuration of a zone group, run the following command:

Syntax

radosgw-admin zonegroup get [--rgw-zonegroup=ZONE_GROUP_NAME]

The zone group configuration looks like this:

{
"id": "90b28698-e7c3-462c-a42d-4aa780d24eda",
"name": "us",
"api_name": "us",
"is_master": "true",
"endpoints": [
"http://rgw1:80"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "9248cab2-afe7-43d8-a661-a40bf316665e",
"zones": [
{
"id": "9248cab2-afe7-43d8-a661-a40bf316665e",
"name": "us-east",
"endpoints": [
"http://rgw1"



],
"log_meta": "true",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "false"
},
{
"id": "d1024e59-7d28-49d1-8222-af101965a939",
"name": "us-west",
"endpoints": [
"http://rgw2:80"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "ae031368-8715-4e27-9a99-0c9468852cfe"
}

Setting a Zone Group Map


Setting a zone group map consists of creating a JSON object consisting of one or more zone groups, and setting the
master_zonegroup for the cluster. Each zone group in the zone group map consists of a key/value pair, where the key setting is
equivalent to the name setting for an individual zone group configuration, and the val is a JSON object consisting of an individual
zone group configuration.

You may only have one zone group with is_master equal to true, and it must be specified as the master_zonegroup at the end
of the zone group map. The following JSON object is an example of a default zone group map.

{
"zonegroups": [
{
"key": "90b28698-e7c3-462c-a42d-4aa780d24eda",
"val": {
"id": "90b28698-e7c3-462c-a42d-4aa780d24eda",
"name": "us",
"api_name": "us",
"is_master": "true",
"endpoints": [
"http://rgw1:80"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "9248cab2-afe7-43d8-a661-a40bf316665e",
"zones": [
{
"id": "9248cab2-afe7-43d8-a661-a40bf316665e",
"name": "us-east",
"endpoints": [
"http://rgw1"
],
"log_meta": "true",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "false"
},
{
"id": "d1024e59-7d28-49d1-8222-af101965a939",
"name": "us-west",
"endpoints": [
"http://rgw2:80"



],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "ae031368-8715-4e27-9a99-0c9468852cfe"
}
}
],
"master_zonegroup": "90b28698-e7c3-462c-a42d-4aa780d24eda",
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}

To set a zone group map, run the following command:

Example

[ceph: root@host01 /]# radosgw-admin zonegroup-map set --infile zonegroupmap.json

Where zonegroupmap.json is the JSON file you created. Ensure that you have zones created for the ones specified in the zone
group map. Finally, update the period.

Example

[ceph: root@host01 /]# radosgw-admin period update --commit
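
Because the period update fails if the map violates the master constraints described above, it can help to sanity-check the file
before running the set command. The following is a minimal Python sketch, not part of the radosgw-admin tooling; the file name
zonegroupmap.json is an assumption, and the check only covers the is_master and master_zonegroup rules:

import json
import sys

# Load the zone group map that will be passed to radosgw-admin zonegroup-map set --infile.
with open("zonegroupmap.json") as f:
    zgmap = json.load(f)

# Collect the keys of all zone groups that claim to be the master.
masters = [zg["key"] for zg in zgmap["zonegroups"] if zg["val"].get("is_master") == "true"]

if len(masters) != 1:
    sys.exit("Exactly one zone group must have is_master set to \"true\"; found: %s" % masters)

if zgmap.get("master_zonegroup") != masters[0]:
    sys.exit("master_zonegroup (%s) does not match the master zone group key (%s)"
             % (zgmap.get("master_zonegroup"), masters[0]))

print("zonegroupmap.json looks consistent")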

Setting a Zone Group


Edit online
Defining a zone group consists of creating a JSON object, specifying at least the required settings:

1. name: The name of the zone group. Required.

2. api_name: The API name for the zone group. Optional.

3. is_master: Determines if the zone group is the master zone group. Required.

NOTE: You can only have one master zone group.

4. endpoints: A list of all the endpoints in the zone group. For example, you may use multiple domain names to refer to the
same zone group. Remember to escape the forward slashes (/). You may also specify a port (fqdn:port) for each endpoint.
Optional.

5. hostnames: A list of all the hostnames in the zone group. For example, you may use multiple domain names to refer to the
same zone group. Optional. The rgw_dns_name setting is automatically included in this list. Restart the gateway daemon(s)
after changing this setting.

6. master_zone: The master zone for the zone group. Optional. Uses the default zone if not specified.

NOTE: You can only have one master zone per zone group.

7. zones: A list of all zones within the zone group. Each zone has a name (required), a list of endpoints (optional), and whether or
not the gateway will log metadata and data operations (false by default).

8. placement_targets: A list of placement targets (optional). Each placement target contains a name (required) for the
placement target and a list of tags (optional) so that only users with the tag can use the placement target (i.e., the user’s
placement_tags field in the user info).

9. default_placement: The default placement target for the object index and object data. Set to default-placement by
default. You may also set a per-user default placement in the user info for each user.

To set a zone group, create a JSON object consisting of the required fields, save the object to a file, for example, zonegroup.json;
then, run the following command:

Example

[ceph: root@host01 /]# radosgw-admin zonegroup set --infile zonegroup.json

Where zonegroup.json is the JSON file you created.
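
If you prefer to assemble zonegroup.json programmatically, the following minimal Python sketch writes a JSON object containing
the settings listed above. It is only an illustration of the shape of the file; the names us, us-east, us-west, and the endpoint URLs
are placeholder values that must match your deployment before you run radosgw-admin zonegroup set:

import json

# Minimal zone group definition using the fields described above; all values are placeholders.
zonegroup = {
    "name": "us",                       # required
    "api_name": "us",                   # optional
    "is_master": "true",                # required; only one master zone group is allowed
    "endpoints": ["http://rgw1:80"],    # optional
    "hostnames": [],                    # optional
    "master_zone": "us-east",           # optional; the default zone is used if not specified
    "zones": [
        {"name": "us-east", "endpoints": ["http://rgw1:80"], "log_meta": "true", "log_data": "true"},
        {"name": "us-west", "endpoints": ["http://rgw2:80"], "log_meta": "false", "log_data": "true"},
    ],
    "placement_targets": [{"name": "default-placement", "tags": []}],
    "default_placement": "default-placement",
}

with open("zonegroup.json", "w") as f:
    json.dump(zonegroup, f, indent=4)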

IMPORTANT: The default zone group is_master setting is true by default. If you create a new zone group and want to make it
the master zone group, you must either set the default zone group is_master setting to false, or delete the default zone
group.

Finally, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Zones
Edit online
Ceph Object Gateway supports the notion of zones. A zone defines a logical group consisting of one or more Ceph Object Gateway
instances.

Configuring zones differs from typical configuration procedures, because not all of the settings end up in a Ceph configuration file.
You can list zones, get a zone configuration, and set a zone configuration.

IMPORTANT: All radosgw-admin zone operations MUST be issued on a host that operates or will operate within the zone.

Creating a Zone
Deleting a zone
Modifying a Zone
Listing Zones
Getting a Zone
Setting a Zone
Renaming a zone
Adding a Zone to a Zone Group
Removing a Zone from a Zone Group

Creating a Zone
Edit online
To create a zone, specify a zone name. If it is a master zone, specify the --master option. Only one zone in a zone group may be a
master zone. To add the zone to a zonegroup, specify the --rgw-zonegroup option with the zonegroup name.

IMPORTANT: Zones must be created on a Ceph Object Gateway node that will be within the zone.

Syntax

radosgw-admin zone create --rgw-zone=ZONE_NAME \
                [--zonegroup=ZONE_GROUP_NAME] \
                [--endpoints=ENDPOINT:PORT[,ENDPOINT:PORT]] \
                [--master] [--default] \
                --access-key ACCESS_KEY --secret SECRET_KEY

Then, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Deleting a zone
Edit online
To delete a zone, first remove it from the zonegroup.

1. Remove the zone from the zonegroup:

Syntax

radosgw-admin zonegroup remove --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME

2. Update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

3. Delete the zone:

IMPORTANT: This procedure MUST be used on a host within the zone.

Syntax

radosgw-admin zone delete --rgw-zone=ZONE_NAME

4. Update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

IMPORTANT: Do not delete a zone without removing it from a zone group first. Otherwise, updating the period will fail.

If the pools for the deleted zone will not be used anywhere else, consider deleting the pools. Replace DELETED_ZONE_NAME
in the example below with the deleted zone’s name.

IMPORTANT: Once Ceph deletes the zone pools, it deletes all of the data within them in an unrecoverable manner. Only delete the
zone pools if Ceph clients no longer need the pool contents.

IMPORTANT: In a multi-realm cluster, deleting the .rgw.root pool along with the zone pools will remove ALL the realm information
for the cluster. Ensure that .rgw.root does not contain other active realms before deleting the .rgw.root pool.

Syntax

ceph osd pool delete DELETED_ZONE_NAME.rgw.control DELETED_ZONE_NAME.rgw.control --yes-i-really-really-mean-it
ceph osd pool delete DELETED_ZONE_NAME.rgw.data.root DELETED_ZONE_NAME.rgw.data.root --yes-i-really-really-mean-it
ceph osd pool delete DELETED_ZONE_NAME.rgw.log DELETED_ZONE_NAME.rgw.log --yes-i-really-really-mean-it
ceph osd pool delete DELETED_ZONE_NAME.rgw.users.uid DELETED_ZONE_NAME.rgw.users.uid --yes-i-really-really-mean-it

IMPORTANT: After deleting the pools, restart the RGW process.

Modifying a Zone

Edit online
To modify a zone, specify the zone name and the parameters you wish to modify.

IMPORTANT: Zones should be modified on a Ceph Object Gateway node that will be within the zone.

Syntax

radosgw-admin zone modify [options]

--access-key=<key>
--secret/--secret-key=<key>
--master
--default
--endpoints=<list>

Then, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Listing Zones
Edit online
As root, to list the zones in a cluster, run the following command:

Example

[ceph: root@host01 /]# radosgw-admin zone list

Getting a Zone
Edit online
As root, to get the configuration of a zone, run the following command:

Syntax

radosgw-admin zone get [--rgw-zone=ZONE_NAME]

The default zone looks like this:

{ "domain_root": ".rgw",
"control_pool": ".rgw.control",
"gc_pool": ".rgw.gc",
"log_pool": ".log",
"intent_log_pool": ".intent-log",
"usage_log_pool": ".usage",
"user_keys_pool": ".users",
"user_email_pool": ".users.email",
"user_swift_pool": ".users.swift",
"user_uid_pool": ".users.uid",
"system_key": { "access_key": "", "secret_key": ""},
"placement_pools": [
{ "key": "default-placement",
"val": { "index_pool": ".rgw.buckets.index",
"data_pool": ".rgw.buckets"}
}
]
}

Setting a Zone
Edit online

Configuring a zone involves specifying a series of Ceph Object Gateway pools. For consistency, we recommend using a pool prefix
that is the same as the zone name. See the Pools for details on configuring pools.

IMPORTANT: Zones should be set on a Ceph Object Gateway node that will be within the zone.

To set a zone, create a JSON object consisting of the pools, save the object to a file, for example, zone.json; then, run the following
command, replacing ZONE_NAME with the name of the zone:

Example

[ceph: root@host01 /]# radosgw-admin zone set --rgw-zone=test-zone --infile zone.json

Where zone.json is the JSON file you created.

Then, as root, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Renaming a zone
Edit online
To rename a zone, specify the zone name and the new zone name. Issue the following command on a host within the zone:

Syntax

radosgw-admin zone rename --rgw-zone=ZONE_NAME --zone-new-name=NEW_ZONE_NAME

Then, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Adding a Zone to a Zone Group


Edit online
To add a zone to a zonegroup, you MUST run this command on a host that will be in the zone. To add a zone to a zonegroup, run the
following command:

Syntax

radosgw-admin zonegroup add --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME

Then, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Removing a Zone from a Zone Group


Edit online
To remove a zone from a zonegroup, run the following command:

Syntax

radosgw-admin zonegroup remove --rgw-zonegroup=ZONE_GROUP_NAME --rgw-zone=ZONE_NAME

Then, update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Configure LDAP and Ceph Object Gateway


Edit online
Perform the following steps to configure the Red Hat Directory Server to authenticate Ceph Object Gateway users.

Install Red Hat Directory Server


Configure the Directory Server firewall
Label ports for SELinux
Configure LDAPS
Check if the gateway user exists
Add a gateway user
Configure the gateway to use LDAP
Using a custom search filter
Add an S3 user to the LDAP server
Export an LDAP token
Test the configuration with an S3 client

Install Red Hat Directory Server


Edit online
Red Hat Directory Server should be installed on a Red Hat Enterprise Linux 8 with a graphical user interface (GUI) in order to use the
Java Swing GUI Directory and Administration consoles. However, Red Hat Directory Server can still be serviced exclusively from the
command line interface (CLI).

Prerequisites
Edit online

Red Hat Enterprise Linux (RHEL) is installed on the server.

The Directory Server node’s FQDN is resolvable using DNS or the /etc/hosts file.

Register the Directory Server node to the Red Hat subscription management service.

A valid Red Hat Directory Server subscription is available in your Red Hat account.

Procedure
Edit online

Follow the instructions in Installing the Directory Server packages and Setting up a new Directory Server instance of the Red Hat
Directory Server Installation Guide.

Reference
Edit online

Red Hat Directory Server Installation Guide

Configure the Directory Server firewall

Edit online
On the LDAP host, make sure that the firewall allows access to the Directory Server's secure (636) port, so that LDAP clients can
access the Directory Server. Leave the default insecure port (389) closed.

# firewall-cmd --zone=public --add-port=636/tcp
# firewall-cmd --zone=public --add-port=636/tcp --permanent

Label ports for SELinux


Edit online
To ensure SELinux does not block requests, label the ports for SELinux. For details see the Changing the LDAP and LDAPS Port
Numbers.

Configure LDAPS
Edit online
The Ceph Object Gateway uses a simple ID and password to authenticate with the LDAP server, so the connection requires an SSL
certificate for LDAP. Once the LDAP is working, configure the Ceph Object Gateway servers to trust the Directory Server’s certificate.

1. Extract/Download a PEM-formatted certificate for the Certificate Authority (CA) that signed the LDAP server’s SSL certificate.

2. Confirm that /etc/openldap/ldap.conf does not have TLS_REQCERT set.

3. Confirm that /etc/openldap/ldap.conf contains a TLS_CACERTDIR /etc/openldap/certs setting.

4. Use the certutil command to add the AD CA to the store at /etc/openldap/certs. For example, if the CA is "msad-
frog-MSAD-FROG-CA", and the PEM-formatted CA file is ldap.pem, use the following command:

Example

# certutil -d /etc/openldap/certs -A -t "TC,," -n "msad-frog-MSAD-FROG-CA" -i /path/to/ldap.pem

5. Update SELinux on all remote LDAP sites:

Example

# setsebool -P httpd_can_network_connect on

NOTE: This still has to be set even if SELinux is in permissive mode.

6. Make the certs database world-readable:

Example

# chmod 644 /etc/openldap/certs/*

7. Connect to the server using the "ldapwhoami" command as a non-root user.

Example

$ ldapwhoami -H ldaps://redhat-directory-server.example.com -d 9

The -d 9 option will provide debugging information in case something went wrong with the SSL negotiation.

Check if the gateway user exists


Edit online
Before creating the gateway user, ensure that the Ceph Object Gateway does not already have the user.

Example

[ceph: root@host01 /]# radosgw-admin metadata list user

The user name should NOT be in this list of users.

Add a gateway user


Edit online
Create an LDAP user for the Ceph Object Gateway.

Procedure
Edit online

1. Create an LDAP user for the Ceph Object Gateway, and make a note of the binddn. Since the Ceph Object Gateway uses the
ceph user, consider using ceph as the username. The user needs to have permissions to search the directory. The Ceph Object
Gateway binds to this user as specified in rgw_ldap_binddn.

2. Test to ensure that the user creation worked. Where ceph is the user ID under People and example.com is the domain, you
can perform a search for the user.

ldapsearch -x -D "uid=ceph,ou=People,dc=example,dc=com" -W -H ldaps://example.com -b "ou=People,dc=example,dc=com" -s sub 'uid=ceph'

On each gateway node, create a file for the user's secret. For example, the secret may be stored in a file named /etc/bindpass.
For security, change the owner of this file to the ceph user and group to ensure it is not globally readable.

Add the rgw_ldap_secret option.

Syntax

ceph config set client.rgw OPTION VALUE

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_secret /etc/bindpass

1. Pass the bind password file to the Ceph Object Gateway container and reapply the Ceph Object Gateway specification.

Example

service_type: rgw
service_id: rgw.1
service_name: rgw.rgw.1
placement:
  label: rgw
extra_container_args:
- -v
- /etc/bindpass:/etc/bindpass

Configure the gateway to use LDAP


Edit online

1. Change the Ceph configuration with the following commands on all the Ceph nodes.

Syntax

ceph config set client.rgw OPTION VALUE

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_uri ldaps://:636
[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_binddn "ou=poc,dc=example,dc=local"
[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_searchdn "ou=poc,dc=example,dc=local"
[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_dnattr "uid"
[ceph: root@host01 /]# ceph config set client.rgw rgw_s3_auth_use_ldap true

2. Restart the Ceph Object Gateway:

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

Using a custom search filter


Edit online
You can create a custom search filter to limit user access by using the rgw_ldap_searchfilter setting. There are two ways to use
the rgw_ldap_searchfilter setting:

1. Specifying a partial filter:

Example

"objectclass=inetorgperson"

The Ceph Object Gateway generates the search filter with the user name from the token and the value of rgw_ldap_dnattr.
The constructed filter is then combined with the partial filter from the rgw_ldap_searchfilter value. For example, the
user name and the settings generate the final search filter:

Example

"(&(uid=joe)(objectclass=inetorgperson))"

User joe is only granted access if he is found in the LDAP directory, has an object class of inetorgperson, and specifies a
valid password.

2. Specifying a complete filter:

A complete filter must contain a @USERNAME@ token, which is substituted with the user name during the authentication
attempt. The rgw_ldap_dnattr setting is not used in this case. For example, to limit valid users to a specific group, use the
following filter:

Example

"(&(uid=@USERNAME@)(memberOf=cn=ceph-users,ou=groups,dc=mycompany,dc=com))"

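The way the gateway combines these settings can be illustrated with a short Python sketch. This is only an illustration of the
behavior described above, not the gateway's actual code; the user name joe and the filters are taken from the examples:

# Illustrative only: mimic how the final LDAP search filter is assembled.
def build_filter(username, dnattr="uid", searchfilter=""):
    if not searchfilter:
        # No custom filter: match only on the DN attribute.
        return "(%s=%s)" % (dnattr, username)
    if "@USERNAME@" in searchfilter:
        # Complete filter: substitute the token; rgw_ldap_dnattr is not used.
        return searchfilter.replace("@USERNAME@", username)
    # Partial filter: AND it with the generated DN attribute match.
    partial = searchfilter if searchfilter.startswith("(") else "(%s)" % searchfilter
    return "(&(%s=%s)%s)" % (dnattr, username, partial)

print(build_filter("joe", searchfilter="objectclass=inetorgperson"))
# -> (&(uid=joe)(objectclass=inetorgperson))
print(build_filter("joe", searchfilter="(&(uid=@USERNAME@)(memberOf=cn=ceph-users,ou=groups,dc=mycompany,dc=com))"))
# -> (&(uid=joe)(memberOf=cn=ceph-users,ou=groups,dc=mycompany,dc=com))
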
Add an S3 user to the LDAP server


Edit online
In the administrative console on the LDAP server, create at least one S3 user so that an S3 client can use the LDAP user credentials.
Make a note of the user name and secret for use when passing the credentials to the S3 client.

Export an LDAP token


Edit online
When running Ceph Object Gateway with LDAP, the access token is all that is required. However, the access token is created from the
access key and secret key. Export the access key and secret key as an LDAP token.

1. Export the access key:

Syntax

export RGW_ACCESS_KEY_ID="USERNAME"

2. Export the secret key:

Syntax

export RGW_SECRET_ACCESS_KEY="PASSWORD"

3. Export the token. For LDAP, use ldap as the token type (ttype).

Example

radosgw-token --encode --ttype=ldap

For Active Directory, use ad as the token type.

Example

radosgw-token --encode --ttype=ad

The result is a base-64 encoded string, which is the access token. Provide this access token to S3 clients in lieu of the access
key. The secret key is no longer required.

4. Optional: For added convenience, export the base-64 encoded string to the RGW_ACCESS_KEY_ID environment variable if the
S3 client uses the environment variable.

Example

export RGW_ACCESS_KEY_ID="ewogICAgIlJHV19UT0tFTiI6IHsKICAgICAgICAidmVyc2lvbiI6IDEsCiAgICAgICAgInR5cGUiOiAibGRhcCIsCiAgICAgICAgImlkIjogImNlcGgiLAogICAgICAgICJrZXkiOiAiODAwI0dvcmlsbGEiCiAgICB9Cn0K"
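
As the decoded value shows, the token is simply base64-encoded JSON that records the token type, the LDAP user ID, and the
secret. The following Python sketch builds an equivalent token for the values used above; it is shown only to illustrate the format,
radosgw-token remains the supported way to generate tokens, and the exact whitespace (and therefore the encoded string) may
differ from the radosgw-token output:

import base64
import json

# Values from the example above: "ldap" token type, user "ceph", secret "800#Gorilla".
token = {
    "RGW_TOKEN": {
        "version": 1,
        "type": "ldap",        # use "ad" for Active Directory
        "id": "ceph",          # the value exported as RGW_ACCESS_KEY_ID
        "key": "800#Gorilla",  # the value exported as RGW_SECRET_ACCESS_KEY
    }
}

encoded = base64.b64encode(json.dumps(token, indent=4).encode()).decode()
print(encoded)  # supply this string to S3 clients in place of the access key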

Test the configuration with an S3 client


Edit online
Test the configuration with a Ceph Object Gateway client, using a script such as Python Boto.

Procedure
Edit online

1. Use the RGW_ACCESS_KEY_ID environment variable to configure the Ceph Object Gateway client. Alternatively, you can copy
the base-64 encoded string and specify it as the access key. Following is an example of the configured S3 client.

Example

cat .aws/credentials

[default]
aws_access_key_id = ewogICaGbnjlwe9UT0tFTiI6IHsKICAgICAgICAidmVyc2lvbiI6IDEsCiAgICAgICAgInR5cGUiOiAiYWQiLAogICAgICAgICJpZCI6ICJjZXBoIiwKICAgICAgICAia2V5IjogInBhc3M0Q2VwaCIKICAgIH0KfQo=
aws_secret_access_key =

NOTE: The secret key is no longer required.

2. Run the aws s3 ls command to verify the user.

Example

[root@host01 ~]# aws s3 ls --endpoint http://host03

2023-12-11 17:08:50 mybucket
2023-12-24 14:55:44 mybucket2

3. Optional: You can also run the radosgw-admin user command to verify the user in the directory.

Example

[root@host01 ~]# radosgw-admin user info --uid dir1


{
"user_id": "dir1",
"display_name": "dir1",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"subusers": [],
"keys": [],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"default_storage_class": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "ldap",
"mfa_ids": []
}
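
If you prefer to test from Python rather than the aws command, the following boto3 sketch performs the same kind of check. The
endpoint URL is a placeholder, the access key is the base-64 LDAP token from the previous section, and the secret key is left
empty because it is not required when authenticating with an LDAP token:

import boto3

endpoint = "http://host03:8080"                   # placeholder gateway endpoint
ldap_token = "ewogICAgIlJHV19UT0tFTiI6IHsK..."    # the full base-64 token, truncated here for readability

s3 = boto3.client(
    "s3",
    endpoint_url=endpoint,
    aws_access_key_id=ldap_token,
    aws_secret_access_key="",  # no secret key is needed with an LDAP token
)

# List the buckets owned by the LDAP-authenticated user.
for bucket in s3.list_buckets().get("Buckets", []):
    print(bucket["Name"])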

Configure Active Directory and Ceph Object Gateway


Edit online
Perform the following steps to configure an Active Directory server to authenticate Ceph Object Gateway users.

Using Microsoft Active Directory


Configuring Active Directory for LDAPS
Check if the gateway user exists
Add a gateway user
Configuring the gateway to use Active Directory
Add an S3 user to the LDAP server
Export an LDAP token
Test the configuration with an S3 client

Using Microsoft Active Directory
Edit online
Ceph Object Gateway LDAP authentication is compatible with any LDAP-compliant directory service that can be configured for
simple bind, including Microsoft Active Directory. By using Active Directory, the Ceph Object Gateway binds as the user configured in
the rgw_ldap_binddn setting, and uses LDAPS to ensure security.

The process for configuring Active Directory is essentially identical to Configuring LDAP and Ceph Object Gateway, but may have
some Windows-specific usage.

Configuring Active Directory for LDAPS


Edit online
Active Directory LDAP servers are configured to use LDAPS by default. Windows Server 2012 and higher can use Active Directory
Certificate Services. Instructions for generating and installing SSL certificates for use with Active Directory LDAP are available in the
following MS TechNet article: LDAP over SSL (LDAPS) Certificate.

NOTE: Ensure that port 636 is open on the Active Directory host.

Check if the gateway user exists


Edit online
Before creating the gateway user, ensure that the Ceph Object Gateway does not already have the user.

Example

[ceph: root@host01 /]# radosgw-admin metadata list user

The user name should NOT be in this list of users.

Add a gateway user


Edit online
Create an LDAP user for the Ceph Object Gateway, and make a note of the binddn. Since the Ceph object gateway uses the ceph
user, consider using ceph as the username. The user needs to have permissions to search the directory.

The Ceph Object Gateway will bind to this user as specified in rgw_ldap_binddn.

Test to ensure that the user creation worked. Where ceph is the user ID under People and example.com is the domain, you can
perform a search for the user.

# ldapsearch -x -D "uid=ceph,ou=People,dc=example,dc=com" -W -H ldaps://example.com -b "ou=People,dc=example,dc=com" -s sub 'uid=ceph'

On each gateway node, create a file for the user's secret. For example, the secret may be stored in a file named /etc/bindpass.
For security, change the owner of this file to the ceph user and group to ensure it is not globally readable.

Add the rgw_ldap_secret option:

Syntax

ceph config set client.rgw OPTION VALUE

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_secret /etc/bindpass

Copy the updated configuration file to each Ceph node:

Syntax

scp /etc/ceph/ceph.conf NODE:/etc/ceph

Configuring the gateway to use Active Directory


Edit online

1. Add the following options after setting the rgw_ldap_secret setting:

Syntax

ceph config set client.rgw OPTION VALUE

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_uri ldaps://FQDN:636
[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_binddn "BINDDN"
[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_searchdn "SEARCHDN"
[ceph: root@host01 /]# ceph config set client.rgw rgw_ldap_dnattr "cn"
[ceph: root@host01 /]# ceph config set client.rgw rgw_s3_auth_use_ldap true

For the rgw_ldap_uri setting, substitute FQDN with the fully qualified domain name of the LDAP server. If there is more than
one LDAP server, specify each domain.

For the rgw_ldap_binddn setting, substitute BINDDN with the bind domain. With a domain of example.com and a ceph
user under users and accounts, it should look something like this:

Example

rgw_ldap_binddn "uid=ceph,cn=users,cn=accounts,dc=example,dc=com"

For the rgw_ldap_searchdn setting, substitute SEARCHDN with the search domain. With a domain of example.com and
users under users and accounts, it should look something like this:

rgw_ldap_searchdn "cn=users,cn=accounts,dc=example,dc=com"

2. Restart the Ceph Object Gateway:

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

Add an S3 user to the LDAP server

Edit online
In the administrative console on the LDAP server, create at least one S3 user so that an S3 client can use the LDAP user credentials.
Make a note of the user name and secret for use when passing the credentials to the S3 client.

Export an LDAP token


Edit online
When running Ceph Object Gateway with LDAP, the access token is all that is required. However, the access token is created from the
access key and secret key. Export the access key and secret key as an LDAP token.

1. Export the access key:

Syntax

export RGW_ACCESS_KEY_ID="USERNAME"

2. Export the secret key:

Syntax

export RGW_SECRET_ACCESS_KEY="PASSWORD"

3. Export the token. For LDAP, use ldap as the token type (ttype).

Example

radosgw-token --encode --ttype=ldap

For Active Directory, use ad as the token type.

Example

radosgw-token --encode --ttype=ad

The result is a base-64 encoded string, which is the access token. Provide this access token to S3 clients in lieu of the access
key. The secret key is no longer required.

4. Optional: For added convenience, export the base-64 encoded string to the RGW_ACCESS_KEY_ID environment variable if the
S3 client uses the environment variable.

Example

export RGW_ACCESS_KEY_ID="ewogICAgIlJHV19UT0tFTiI6IHsKICAgICAgICAidmVyc2lvbiI6IDEsCiAgICAgICAgInR5cGUiOiAibGRhcCIsCiAgICAgICAgImlkIjogImNlcGgiLAogICAgICAgICJrZXkiOiAiODAwI0dvcmlsbGEiCiAgICB9Cn0K"

Test the configuration with an S3 client


Edit online
Test the configuration with a Ceph Object Gateway client, using a script such as Python Boto.

Procedure
Edit online

1. Use the RGW_ACCESS_KEY_ID environment variable to configure the Ceph Object Gateway client. Alternatively, you can copy
the base-64 encoded string and specify it as the access key. Following is an example of the configured S3 client:

Example

cat .aws/credentials

[default]
aws_access_key_id = ewogICaGbnjlwe9UT0tFTiI6IHsKICAgICAgICAidmVyc2lvbiI6IDEsCiAgICAgICAgInR5cGUiOiAiYWQiLAogICAgICAgICJpZCI6ICJjZXBoIiwKICAgICAgICAia2V5IjogInBhc3M0Q2VwaCIKICAgIH0KfQo=
aws_secret_access_key =

NOTE: The secret key is no longer required.

2. Run the aws s3 ls command to verify the user:

Example

[root@host01 ~]# aws s3 ls --endpoint http://host03

2023-12-11 17:08:50 mybucket
2023-12-24 14:55:44 mybucket2

3. Optional: You can also run the radosgw-admin user command to verify the user in the directory.

Example

[root@host01 ~]# radosgw-admin user info --uid dir1


{
"user_id": "dir1",
"display_name": "dir1",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"subusers": [],
"keys": [],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"default_storage_class": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "ldap",
"mfa_ids": []
}

The Ceph Object Gateway and OpenStack Keystone


Edit online
As a storage administrator, you can use OpenStack's Keystone authentication service to authenticate users through the Ceph Object
Gateway. Before you can configure the Ceph Object Gateway, you need to configure Keystone first. This enables the Swift service,
and points the Keystone service to the Ceph Object Gateway. Next, you need to configure the Ceph Object Gateway to accept
authentication requests from the Keystone service.

Roles for Keystone authentication


Keystone authentication and the Ceph Object Gateway
Creating the Swift service
Setting the Ceph Object Gateway endpoints
Verifying Openstack is using the Ceph Object Gateway endpoints
Configuring the Ceph Object Gateway to use Keystone SSL
Configuring the Ceph Object Gateway to use Keystone authentication
Restarting the Ceph Object Gateway daemon

Prerequisites
Edit online

A running Red Hat OpenStack Platform environment.

A running IBM Storage Ceph environment.

A running Ceph Object Gateway environment.

Roles for Keystone authentication


Edit online
The OpenStack Keystone service provides three roles: admin, member, and reader. These roles are hierarchical; users with the
admin role inherit the capabilities of the member role and users with the member role inherit the capabilities of the reader role.

NOTE: The member role’s read permissions only apply to objects of the project it belongs to.

admin

The admin role is reserved for the highest level of authorization within a particular scope. This usually includes all the create, read,
update, or delete operations on a resource or API.

member

The member role is not used directly by default. It provides flexibility during deployments and helps reduce responsibility for
administrators.

For example, you can override a policy for a deployment by using the default member role and a simple policy override, to allow
system members to update services and endpoints. This provides a layer of authorization between admin and reader roles.

reader

The reader role is reserved for read-only operations regardless of the scope.

WARNING: If you use a reader to access sensitive information such as image license keys, administrative image data,
administrative volume metadata, application credentials, and secrets, you might unintentionally expose sensitive information.
Hence, APIs that expose these resources should carefully consider the impact of the reader role and appropriately defer access to
the member and admin roles.

Keystone authentication and the Ceph Object Gateway


Edit online
Organizations using OpenStack Keystone to authenticate users can integrate Keystone with the Ceph Object Gateway. The Ceph
Object Gateway enables the gateway to accept a Keystone token, authenticate the user, and create a corresponding Ceph Object
Gateway user. When Keystone validates a token, the gateway considers the user authenticated.

Benefits

Assigning admin, member, and reader roles to users with Keystone.

Automatic user creation in the Ceph Object Gateway.

Managing users with Keystone

The Ceph Object Gateway will query Keystone periodically for a list of revoked tokens.

Creating the Swift service

Edit online
Before configuring the Ceph Object Gateway, configure Keystone so that the Swift service is enabled and pointing to the Ceph Object
Gateway.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Access to the Ceph software repository.

Root-level access to OpenStack controller node.

Procedure
Edit online

Create the Swift service:

# openstack service create --name=swift --description="Swift Service" object-store

Creating the service will echo the service settings.

Example

Field Value
description Swift Service
enabled True
id 37c4c0e79571404cb4644201a4a6e5ee
name swift
type object-store

Setting the Ceph Object Gateway endpoints


Edit online
After creating the Swift service, point the service to a Ceph Object Gateway.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Access to the Ceph software repository.

A running Swift service on a RHOSP 17.0 environment.

Procedure
Edit online

Create the OpenStack endpoints pointing to the Ceph Object Gateway:

Syntax

openstack endpoint create --region REGION_NAME swift admin "URL"
openstack endpoint create --region REGION_NAME swift public "URL"
openstack endpoint create --region REGION_NAME swift internal "URL"

Replace REGION_NAME with the name of the gateway’s zone group name or region name. Replace URL with URLs appropriate
for the Ceph Object Gateway.

Example

[root@osp ~]# openstack endpoint create --region us-west swift admin "http://radosgw.example.com:8080/swift/v1"
[root@osp ~]# openstack endpoint create --region us-west swift public "http://radosgw.example.com:8080/swift/v1"
[root@osp ~]# openstack endpoint create --region us-west swift internal "http://radosgw.example.com:8080/swift/v1"

Setting the endpoints will output the service endpoint settings:

Field Value
adminurl http://radosgw.example.com:8080/swift/v1
id e4249d2b60e44743a67b5e5b38c18dd3
internalurl http://radosgw.example.com:8080/swift/v1
publicurl http://radosgw.example.com:8080/swift/v1
region us-west
service_id 37c4c0e79571404cb4644201a4a6e5ee
service_name swift
service_type object-store

Verifying Openstack is using the Ceph Object Gateway endpoints


Edit online
After creating the Swift service and setting the endpoints, show the endpoints to ensure that all settings are correct.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Access to the Ceph software repository.

Procedure
Edit online

1. List the endpoints under the Swift service:

Syntax

# openstack endpoint list --service=swift

2. Verify settings for the endpoints listed in the previous command:

Syntax

# openstack endpoint show ENDPOINT_ID

Showing the endpoints will echo the endpoints settings, and the service settings.

Table 1. Example
Field Value
adminurl http://radosgw.example.com:8080/swift/v1
enabled True
id e4249d2b60e44743a67b5e5b38c18dd3
internalurl http://radosgw.example.com:8080/swift/v1
publicurl http://radosgw.example.com:8080/swift/v1
region us-west
service_id 37c4c0e79571404cb4644201a4a6e5ee
service_name swift
service_type object-store

Configuring the Ceph Object Gateway to use Keystone SSL


Edit online
To configure the Ceph Object Gateway to work with Keystone, convert the OpenSSL certificates that Keystone uses. When the Ceph
Object Gateway interacts with OpenStack's Keystone authentication, Keystone terminates with a self-signed SSL certificate.

Prerequisites
Edit online

A running, and healthy IBM Storage Ceph cluster.

Access to the Ceph software repository.

Procedure
Edit online

1. Convert the OpenSSL certificate to the nss db format:

Example

[root@osp ~]# mkdir /var/ceph/nss

[root@osp ~]# openssl x509 -in /etc/keystone/ssl/certs/ca.pem -pubkey | certutil -d /var/ceph/nss -A -n ca -t "TCu,Cu,Tuw"

[root@osp ~]# openssl x509 -in /etc/keystone/ssl/certs/signing_cert.pem -pubkey | certutil -A -d /var/ceph/nss -n signing_cert -t "P,P,P"

2. Install Keystone's SSL certificate on the node running the Ceph Object Gateway. Alternatively, set the value of the
rgw_keystone_verify_ssl setting to false.

Setting rgw_keystone_verify_ssl to false means that the gateway will not attempt to verify the certificate.

Configuring the Ceph Object Gateway to use Keystone authentication


Edit online
Configure the IBM Storage Ceph to use OpenStack's Keystone authentication.

Prerequisites
Edit online

A running, and healthy IBM Storage Ceph cluster.

Access to the Ceph software repository.

Have admin privileges to the production environment.

Procedure

Edit online

1. Do the following for each gateway instance.

a. Set the rgw_s3_auth_use_keystone option to true:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_s3_auth_use_keystone true

b. Set the nss_db_path setting to the path where the NSS database is stored:

Example

[ceph: root@host01 /]# ceph config set client.rgw nss_db_path "/var/lib/ceph/radosgw/ceph-rgw.rgw01/nss"

2. Provide authentication credentials:

It is possible to configure a Keystone service tenant, user, and password for Keystone (for the OpenStack Identity API), similar
to the way system administrators tend to configure OpenStack services. Providing a username and password avoids providing
the shared secret to the rgw_keystone_admin_token setting.

IMPORTANT:

IBM recommends disabling authentication by admin token in production environments. The service tenant credentials should
have admin privileges.

The necessary configuration options are:

Syntax

ceph config set client.rgw rgw_keystone_url KEYSTONE_URL:ADMIN_PORT
ceph config set client.rgw rgw_keystone_admin_user KEYSTONE_TENANT_USER_NAME
ceph config set client.rgw rgw_keystone_admin_password KEYSTONE_TENANT_USER_PASSWORD
ceph config set client.rgw rgw_keystone_admin_tenant KEYSTONE_TENANT_NAME
ceph config set client.rgw rgw_keystone_accepted_roles KEYSTONE_ACCEPTED_USER_ROLES
ceph config set client.rgw rgw_keystone_token_cache_size NUMBER_OF_TOKENS_TO_CACHE
ceph config set client.rgw rgw_keystone_revocation_interval NUMBER_OF_SECONDS_BEFORE_CHECKING_REVOKED_TICKETS
ceph config set client.rgw rgw_keystone_implicit_tenants TRUE_FOR_PRIVATE_TENANT_FOR_EACH_NEW_USER

A Ceph Object Gateway user is mapped into a Keystone tenant. A Keystone user has different roles assigned to it on possibly
more than a single tenant. When the Ceph Object Gateway gets the ticket, it looks at the tenant, and the user roles that are
assigned to that ticket, and accepts or rejects the request according to the rgw_keystone_accepted_roles configurable.
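
After the gateway has been restarted (see the following sections), you can verify the integration from a client. The following is a
minimal Python sketch using the python-swiftclient package; the Keystone URL, project, user, and password are placeholder
values, and the user must hold one of the roles listed in rgw_keystone_accepted_roles:

import swiftclient

# Placeholder Keystone credentials; the user must have an accepted role on the project.
conn = swiftclient.Connection(
    authurl="http://keystone.example.com:5000/v3",
    user="swiftuser",
    key="password",
    auth_version="3",
    os_options={
        "project_name": "demo",
        "project_domain_name": "Default",
        "user_domain_name": "Default",
    },
)

# The request goes to the Swift endpoint registered for the Ceph Object Gateway; the gateway
# validates the Keystone token and creates the corresponding Ceph Object Gateway user on first use.
conn.put_container("keystone-test")
print([c["name"] for c in conn.get_account()[1]])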

Reference
Edit online

See the Users and Identity Management Guide for Red Hat OpenStack Platform.

Restarting the Ceph Object Gateway daemon


Edit online
Restarting the Ceph Object Gateway must be done to activate configuration changes.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Access to the Ceph software repository.

admin privileges to the production environment.

Procedure
Edit online

Once you have saved the Ceph configuration file and distributed it to each Ceph node, restart the Ceph Object Gateway
instances:

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

1. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-


[email protected]

2. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

Security
Edit online
As a storage administrator, securing the storage cluster environment is important. IBM Storage Ceph provides encryption and key
management to secure the Ceph Object Gateway access point.

S3 server-side encryption
Server-side encryption requests
Configuring server-side encryption
The HashiCorp Vault
The Ceph Object Gateway and multi-factor authentication

Prerequisites
Edit online

A healthy running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

S3 server-side encryption
Edit online
The Ceph Object Gateway supports server-side encryption of uploaded objects for the S3 application programming interface (API).
Server-side encryption means that the S3 client sends data over HTTP in its unencrypted form, and the Ceph Object Gateway stores
that data in the IBM Storage Ceph cluster in encrypted form.

NOTE: IBM does NOT support S3 object encryption of Static Large Object (SLO) or Dynamic Large Object (DLO).

NOTE: Currently, none of the S3 Server-Side Encryption (SSE) modes have implemented support for CopyObject. Support is under
development; see BZ#2149758.

IMPORTANT: To use encryption, client requests MUST send requests over an SSL connection. IBM does not support S3 encryption
from a client unless the Ceph Object Gateway uses SSL. However, for testing purposes, administrators can disable SSL during testing
by setting the rgw_crypt_require_ssl configuration setting to false at runtime, using the ceph config set client.rgw
command, and then restarting the Ceph Object Gateway instance. In a production environment, it might not be possible to send
encrypted requests over SSL. In such a case, send requests using HTTP with server-side encryption.

There are two options for the management of encryption keys:

Customer-provided Keys

When using customer-provided keys, the S3 client passes an encryption key along with each request to read or write encrypted data.
It is the customer’s responsibility to manage those keys. Customers must remember which key the Ceph Object Gateway used to
encrypt each object.

Ceph Object Gateway implements the customer-provided key behavior in the S3 API according to the Amazon SSE-C specification.

Since the customer handles the key management and the S3 client passes keys to the Ceph Object Gateway, the Ceph Object
Gateway requires no special configuration to support this encryption mode.
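
For illustration, an SSE-C upload from an S3 client might look like the following boto3 sketch. The endpoint, credentials, bucket
name, and object key are placeholders; boto3 base64-encodes the customer key and adds the required SSE-C headers, and the
same key must be presented again to read the object:

import os
import boto3

# Placeholder endpoint and credentials; SSE-C requires an SSL endpoint in production.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

customer_key = os.urandom(32)  # a 256-bit key that the client must remember

s3.put_object(
    Bucket="mybucket",
    Key="encrypted-object",
    Body=b"hello world",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)

# Reads must present the same key; without it the gateway cannot decrypt the object.
obj = s3.get_object(
    Bucket="mybucket",
    Key="encrypted-object",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)
print(obj["Body"].read())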

Key Management Service

When using a key management service, the secure key management service stores the keys and the Ceph Object Gateway retrieves
them on demand to serve requests to encrypt or decrypt data.

Ceph Object Gateway implements the key management service behavior in the S3 API according to the Amazon SSE-KMS
specification.
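
From the client's point of view, SSE-KMS only names the key; the gateway retrieves the key material from the configured key
management service. The following boto3 sketch is a hedged illustration; the endpoint, credentials, bucket, and key ID are
placeholders, and the key ID must refer to a key that exists in the key management service the gateway is configured to use:

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# The key ID must name a key the gateway can fetch from its configured KMS (for example, a Vault key).
s3.put_object(
    Bucket="mybucket",
    Key="kms-encrypted-object",
    Body=b"hello world",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="mybucketkey",
)

# No key material is needed on read; the gateway decrypts the object transparently.
print(s3.get_object(Bucket="mybucket", Key="kms-encrypted-object")["Body"].read())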

IMPORTANT: Currently, the only tested key management implementations are HashiCorp Vault, and OpenStack Barbican. However,
OpenStack Barbican is a Technology Preview and is not supported for use in production systems.

Amazon SSE-C

Amazon SSE-KMS

Configuring server side encryption

The HashiCorp Vault

Server-side encryption requests


Edit online
In a production environment, clients often contact the Ceph Object Gateway through a proxy. This proxy is referred to as a load
balancer because it connects to multiple Ceph Object Gateways. When the client sends requests to the Ceph Object Gateway, the
load balancer routes those requests to the multiple Ceph Object Gateways, thus distributing the workload.

In this type of configuration, it is possible that SSL termination occurs at the load balancer, while communication between the load
balancer and the multiple Ceph Object Gateways occurs over HTTP only.

Configuring server-side encryption


Edit online
You can set up server-side encryption to send requests to the Ceph Object Gateway using HTTP, in cases where it might not be
possible to send encrypted requests over SSL.

This procedure uses HAProxy as proxy and load balancer.

Prerequisites

Edit online

A running IBM Storage Ceph cluster.

Root-level access to all nodes in the storage cluster.

Installation of the Ceph Object Gateway software.

Installation of the HAProxy software.

Procedure
Edit online

1. Edit the haproxy.cfg file:

Example

frontend http_web *:80
    mode http
    default_backend rgw

frontend rgw-https
    bind *:443 ssl crt /etc/ssl/private/example.com.pem
    default_backend rgw

backend rgw
    balance roundrobin
    mode http
    server rgw1 10.0.0.71:8080 check
    server rgw2 10.0.0.80:8080 check

2. Comment out the lines that allow access to the http front end and add instructions to direct HAProxy to use the https front
end instead:

Example

# frontend http_web *:80
#     mode http
#     default_backend rgw

frontend rgw-https
    bind *:443 ssl crt /etc/ssl/private/example.com.pem
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Proto https
    # here we set the incoming HTTPS port on the load balancer (eg : 443)
    http-request set-header X-Forwarded-Port 443
    default_backend rgw

backend rgw
    balance roundrobin
    mode http
    server rgw1 10.0.0.71:8080 check
    server rgw2 10.0.0.80:8080 check

3. Set the rgw_trust_forwarded_https option to true:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_trust_forwarded_https true

4. Enable and start HAProxy:

[root@host01 ~]# systemctl enable haproxy
[root@host01 ~]# systemctl start haproxy

Reference
Edit online

High availability service

IBM Storage Ceph installation

The HashiCorp Vault


Edit online
As a storage administrator, you can securely store keys, passwords, and certificates in the HashiCorp Vault for use with the Ceph
Object Gateway. The HashiCorp Vault provides a secure key management service for server-side encryption used by the Ceph Object
Gateway.

Figure 1. Ceph Vault integration

The basic workflow:

1. The client requests the creation of a secret key from the Vault based on an object's key ID.

2. The client uploads an object with the object's key ID to the Ceph Object Gateway.

3. The Ceph Object Gateway then requests the newly created secret key from the Vault.

4. The Vault replies to the request by returning the secret key to the Ceph Object Gateway.

5. Now the Ceph Object Gateway can encrypt the object using the new secret key.

6. After encryption is done the object is then stored on the Ceph OSD.

IMPORTANT: IBM works with our technology partners to provide this documentation as a service to our customers. However, IBM
does not provide support for this product. If you need technical assistance for this product, then contact Hashicorp for support.

Secret engines for Vault


Authentication for Vault
Namespaces for Vault
Transit engine compatibility support
Creating token policies for Vault
Configuring the Ceph Object Gateway to use SSE-S3 with Vault
Configuring the Ceph Object Gateway to use SSE-KMS with Vault
Creating a key using the kv engine
Creating a key using the transit engine
Uploading an object using AWS and the Vault

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Installation of the HashiCorp Vault software.

Reference
Edit online

Install Vault documentation on Vault's project site

Secret engines for Vault


Edit online
The HashiCorp Vault provides several secret engines to generate, store, or encrypt data. The application programming interface (API)
sends data calls to the secret engine asking for action on that data, and the secret engine returns a result of that action request.

The Ceph Object Gateway supports two of the HashiCorp Vault secret engines:

Key/Value version 2

Transit

Key/Value version 2

The Key/Value secret engine stores random secrets within the Vault, on disk. With version 2 of the kv engine, a key can have a
configurable number of versions. The default number of versions is 10. Deleting a version does not delete the underlying data, but
marks the data as deleted, allowing deleted versions to be undeleted. You can use the API endpoint or the destroy command to
permanently remove a version’s data. To delete all versions and metadata for a key, you can use the metadata command or the API
endpoint. The key names must be strings, and the engine will convert non-string values into strings when using the command line
interface. To preserve non-string values, provide a JSON file or use the HTTP application programming interface (API).

NOTE: For access control list (ACL) policies, the Key/Value secret engine recognizes the distinctions between the create and
update capabilities.

Transit

The Transit secret engine performs cryptographic functions on in-transit data. The Transit secret engine can generate hashes, can be
a source of random bytes, and can also sign and verify data. The Vault does not store data when using the Transit secret engine. The
Transit secret engine supports key derivation, by allowing the same key to be used for multiple purposes. Also, the transit secret
engine supports key versioning. The Transit secret engine supports these key types:

aes128-gcm96
AES-GCM with a 128-bit AES key and a 96-bit nonce; supports encryption, decryption, key derivation, and convergent encryption

aes256-gcm96
AES-GCM with a 256-bit AES key and a 96-bit nonce; supports encryption, decryption, key derivation, and convergent encryption
(default)

chacha20-poly1305
ChaCha20-Poly1305 with a 256-bit key; supports encryption, decryption, key derivation, and convergent encryption

ed25519
Ed25519; supports signing, signature verification, and key derivation

ecdsa-p256
ECDSA using curve P-256; supports signing and signature verification

ecdsa-p384
ECDSA using curve P-384; supports signing and signature verification

ecdsa-p521
ECDSA using curve P-521; supports signing and signature verification

rsa-2048
2048-bit RSA key; supports encryption, decryption, signing, and signature verification

rsa-3072
3072-bit RSA key; supports encryption, decryption, signing, and signature verification

rsa-4096
4096-bit RSA key; supports encryption, decryption, signing, and signature verification

See the KV Secrets Engine documentation on Vault’s project site for more information.

See the Transit Secrets Engine documentation on Vault’s project site for more information.

Authentication for Vault


Edit online
The HashiCorp Vault supports several types of authentication mechanisms. The Ceph Object Gateway currently supports the Vault
agent method. The Ceph Object Gateway uses the rgw_crypt_vault_auth, and rgw_crypt_vault_addr options to configure
the use of the HashiCorp Vault.

IMPORTANT: IBM supports the usage of Vault agent as the authentication method for containers and the usage of token
authentication is not supported on containers.

Vault Agent

The Vault agent is a daemon that runs on a client node and provides client-side caching, along with token renewal. The Vault agent
typically runs on the Ceph Object Gateway node. Run the Vault agent and refresh the token file. When the Vault agent is used in this
mode, you can use file system permissions to restrict who has access to the usage of tokens. Also, the Vault agent can act as a proxy
server, that is, Vault will add a token when required and add it to the requests passed to it before forwarding them to the actual
server. The Vault agent can still handle token renewal just as it would when storing a token in the Filesystem. It is required to secure
the network that Ceph Object Gateways uses to connect with the Vault agent, for example, the Vault agent listens to only the
localhost.

Reference
Edit online

See the Vault Agent documentation on Vault’s project site for more information.

Namespaces for Vault


Edit online
Using HashiCorp Vault as an enterprise service provides centralized management for isolated namespaces that teams within an
organization can use. These isolated namespace environments are known as tenants, and teams within an organization can utilize
these tenants to isolate their policies, secrets, and identities from other teams. The namespace features of Vault help support secure
multi-tenancy from within a single infrastructure.

Reference
Edit online

See the Vault Enterprise Namespaces documentation on Vault’s project site for more information.

Transit engine compatibility support


Edit online

There is compatibility support for the previous versions of Ceph which used the Transit engine as a simple key store. You can use the
compat option in the Transit engine to configure the compatibility support. You can disable previous support with the following
command:

Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_secret_engine transit compat=0

NOTE: This will be the default in future versions and is the recommended setting for new installations.

The normal default with the current version is:

Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_secret_engine transit compat=1

This enables the new engine for newly created objects and still allows the old engine to be used for the old objects. To access old
and new objects, the Vault token must have both the old and new transit policies.

You can force use only the old engine with the following command:

Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_secret_engine transit compat=2

This mode is selected by default if the Vault prefix (rgw_crypt_vault_prefix) ends in export/encryption-key.

IMPORTANT: After configuring the client.rgw options, you need to restart the Ceph Object Gateway daemons for the new values
to take effect.

Reference
Edit online

See the Vault Agent documentation on Vault’s project site for more information.

Creating token policies for Vault


Edit online
A token policy specifies the powers that all Vault tokens have. One token can have multiple policies. You should use the required
policy for the configuration.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the HashiCorp Vault software.

Root-level access to the HashiCorp Vault node.

Procedure
Edit online

1. Create a token policy:

a. For the Key/Value secret engine:

Example

[root@vault ~]# vault policy write rgw-kv-policy -<<EOF
path "secret/data/*" {
  capabilities = ["read"]
}
EOF

b. For the Transit engine:

Example

[root@vault ~]# vault policy write rgw-transit-policy -<<EOF


path "transit/keys/*" {
capabilities = [ "create", "update" ]
denied_parameters = {"exportable" = [], "allow_plaintext_backup" = [] }
}

path "transit/keys/*" {
capabilities = ["read", "delete"]
}

path "transit/keys/" {
capabilities = ["list"]
}

path "transit/keys/+/rotate" {
capabilities = [ "update" ]
}

path "transit/*" {
capabilities = [ "update" ]
}
EOF

NOTE: If you have used the Transit secret engine on an older version of Ceph, the token policy is:

Example

[root@vault ~]# vault policy write old-rgw-transit-policy -<<EOF


path "transit/export/encryption-key/*" {
capabilities = ["read"]
}
EOF

If you are using both SSE-KMS and SSE-S3, you should point each to separate containers. You could either use separate Vault
instances or separately mount transit instances or different branches under a common transit point. If you are not using separate
Vault instances, you can point SSE-KMS and SSE-S3 to separate containers using rgw_crypt_vault_prefix and
rgw_crypt_sse_s3_vault_prefix. When granting Vault permissions to SSE-KMS bucket owners, do not give them
permission to SSE-S3 keys; only Ceph should have access to the SSE-S3 keys.

Configuring the Ceph Object Gateway to use SSE-S3 with Vault


Edit online
To configure the Ceph Object Gateway to use the HashiCorp Vault with SSE-S3 for key management, it must be set as the encryption
key store. Currently, the Ceph Object Gateway supports two secret engines and two different authentication methods.

Prerequisites
Edit online

A running, and healthy IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Root-level access to a Ceph Object Gateway node.

Procedure
Edit online

1. Log into the Cephadm shell

Example

[root@host01 ~]# cephadm shell

2. Enable Vault as the secrets engine to retrieve SSE-S3 encryption keys:

Syntax

ceph config set client.rgw rgw_crypt_sse_s3_backend vault

3. Set the authentication method to use with SSE-S3 and Vault.

Method 1: If using agent authentication, configure the following settings:

Syntax

ceph config set client.rgw rgw_crypt_sse_s3_vault_auth agent
ceph config set client.rgw rgw_crypt_sse_s3_vault_addr http://VAULT_AGENT:VAULT_AGENT_PORT

Example

[ceph: root@host01 ~]# ceph config set client.rgw rgw_crypt_sse_s3_vault_auth agent
[ceph: root@host01 ~]# ceph config set client.rgw rgw_crypt_sse_s3_vault_addr http://vaultagent:8100

Customize the policy as per your use case to set up a Vault agent.

Get the role-id:

Syntax

vault read -format=json auth/approle/role/rgw-ap/role-id | jq -r .data.role_id > PATH_TO_FILE

Get the secret-id:

Syntax

vault write -f -format=json auth/approle/role/rgw-ap/secret-id | jq -r .data.secret_id > PATH_TO_FILE

Create the configuration for the Vault agent:

Example

pid_file = "/run/rgw-vault-agent-pid"
auto_auth {
  method "AppRole" {
    mount_path = "auth/approle"
    config = {
      role_id_file_path = "/usr/local/etc/vault/.rgw-ap-role-id"
      secret_id_file_path = "/usr/local/etc/vault/.rgw-ap-secret-id"
      remove_secret_id_file_after_reading = "false"
    }
  }
}
cache {
  use_auto_auth_token = true
}
listener "tcp" {
  address = "127.0.0.1:8100"
  tls_disable = true
}
vault {
  address = "https://vaultserver:8200"
}

Use systemctl to run the persistent daemon:

Example

[root@host01 ~]# /usr/local/bin/vault agent -config=/usr/local/etc/vault/rgw-agent.hcl

A token file is populated with a valid token when the Vault agent runs.

Method 2: If using token authentication, configure the following settings:

NOTE: Token authentication is not supported on IBM Storage Ceph 5.

Syntax

ceph config set client.rgw rgw_crypt_sse_s3_vault_auth token
ceph config set client.rgw rgw_crypt_sse_s3_vault_token_file PATH_TO_TOKEN_FILE
ceph config set client.rgw rgw_crypt_sse_s3_vault_addr http://VAULT_SERVER:VAULT_PORT

Example

[ceph: root@host01 ~]# ceph config set client.rgw rgw_crypt_sse_s3_vault_auth token
[ceph: root@host01 ~]# ceph config set client.rgw rgw_crypt_sse_s3_vault_token_file /run/.rgw-vault-token
[ceph: root@host01 ~]# ceph config set client.rgw rgw_crypt_sse_s3_vault_addr http://vaultserver:8200

NOTE: For security reasons, the path to the token file should only be readable by the RADOS Gateway.
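For example, assuming the token file from the previous step is /run/.rgw-vault-token and the gateway daemons run as the ceph user (an assumption; adjust the owner to your deployment), you might restrict access as follows:

Example

[root@host01 ~]# chown ceph:ceph /run/.rgw-vault-token
[root@host01 ~]# chmod 600 /run/.rgw-vault-token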

4. Set the Vault secret engine to use to retrieve encryption keys, either Key/Value or Transit.

a. If using Key/Value, set the following:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_secret_engine kv

b. If using Transit, set the following:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_secret_engine transit

5. Optional: Configure the Ceph Object Gateway to access Vault within a particular namespace to retrieve the encryption keys:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_namespace company/testnamespace1

NOTE: Vault namespaces allow teams to operate within isolated environments known as tenants. The Vault namespaces
feature is only available in the Vault Enterprise version.

6. Optional: Restrict access to a particular subset of the Vault secret space by setting a URL path prefix, where the Ceph Object
Gateway retrieves the encryption keys from:

a. If using Key/Value, set the following:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_prefix /v1/secret/data

b. If using Transit, set the following:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_prefix /v1/transit

Assuming the domain name of the Vault server is vaultserver, the Ceph Object Gateway will fetch encrypted transit
keys from the following URL:

Example

http://vaultserver:8200/v1/transit

7. Optional: To use custom SSL certification to authenticate with Vault, configure the following settings:

Syntax



ceph config set client.rgw rgw_crypt_sse_s3_vault_verify_ssl true
ceph config set client.rgw rgw_crypt_sse_s3_vault_ssl_cacert PATH_TO_CA_CERTIFICATE
ceph config set client.rgw rgw_crypt_sse_s3_vault_ssl_clientcert PATH_TO_CLIENT_CERTIFICATE
ceph config set client.rgw rgw_crypt_sse_s3_vault_ssl_clientkey PATH_TO_PRIVATE_KEY

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_verify_ssl true
[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_ssl_cacert /etc/ceph/vault.ca
[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_ssl_clientcert /etc/ceph/vault.crt
[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_sse_s3_vault_ssl_clientkey /etc/ceph/vault.key

8. Restart the Ceph Object Gateway daemons.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-534f5595d330@rgw.realm.zone.host01.gwasto.service

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

Reference
Edit online

Secret engines for Vault

Authentication for Vault

Configuring the Ceph Object Gateway to use SSE-KMS with Vault


Edit online
To configure the Ceph Object Gateway to use the HashiCorp Vault with SSE-KMS for key management, it must be set as the
encryption key store. Currently, the Ceph Object Gateway supports two different secret engines, and two different authentication
methods.

Prerequisites
Edit online

A running, and healthy IBM Storage Ceph cluster.

Access to the Ceph software repository.

Installation of the Ceph Object Gateway software.

Root-level access to a Ceph Object Gateway node.

Procedure
Edit online

1. Use the ceph config set client.rgw _OPTION_ _VALUE_ command to enable Vault as the encryption key store:

Syntax

ceph config set client.rgw rgw_crypt_s3_kms_backend vault

2. Add the following options and values:

Syntax

ceph config set client.rgw rgw_crypt_vault_auth agent
ceph config set client.rgw rgw_crypt_vault_addr http://VAULT_SERVER:8100

3. Customize the policy as per the use case.

4. Get the role-id:

Syntax

vault read auth/approle/role/rgw-ap/role-id -format=json | \
  jq -r .data.role_id > _PATH_TO_FILE_

5. Get the secret-id:

Syntax

vault write -f auth/approle/role/rgw-ap/secret-id -format=json | \
  jq -r .data.secret_id > _PATH_TO_FILE_

6. Create the configuration for the Vault agent:

Example

pid_file = "/run/kv-vault-agent-pid"
auto_auth {
method "AppRole" {
mount_path = "auth/approle"
config = {
role_id_file_path ="/root/vault_configs/kv-agent-role-id"
secret_id_file_path ="/root/vault_configs/kv-agent-secret-id"
remove_secret_id_file_after_reading ="false"
}
}
}
cache {
use_auto_auth_token = true
}
listener "tcp" {
address = "127.0.0.1:8100"
tls_disable = true
}
vault {
address = "http://10.8.128.9:8200"
}

7. Run the Vault agent as a persistent daemon, for example, by wrapping the following command in a systemd service:

Example

[root@host03 ~]# /usr/local/bin/vault agent -config=/usr/local/etc/vault/rgw-agent.hcl

8. A token file is populated with a valid token when the Vault agent runs.

9. Select a Vault secret engine, either Key/Value or Transit.

a. If using Key/Value, then add the following line:

Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_secret_engine kv

b. If using Transit, then add the following line:



Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_secret_engine transit

10. Use the ceph config set client.rgw _OPTION_ _VALUE_ command to set the Vault namespace to retrieve the
encryption keys:

Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_namespace testnamespace1

11. Restrict where the Ceph Object Gateway retrieves the encryption keys from the Vault by setting a path prefix:

Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_prefix /v1/secret/data

a. For exportable Transit keys, set the prefix path as follows:

Example

[ceph: root@host03 /]# ceph config set client.rgw rgw_crypt_vault_prefix /v1/transit/export/encryption-key

Assuming the domain name of the Vault server is vault-server, the Ceph Object Gateway will fetch encrypted transit
keys from the following URL:

Example

http://vault-server:8200/v1/transit/export/encryption-key

12. Restart the Ceph Object Gateway daemons.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host03 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-534f5595d330@rgw.realm.zone.host01.gwasto.service

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host03 /]# ceph orch restart rgw

Reference
Edit online

Secret engines for Vault

Authentication for Vault

Creating a key using the kv engine

Edit online
Configure the HashiCorp Vault Key/Value secret engine (kv) so you can create a key for use with the Ceph Object Gateway. Secrets
are stored as key-value pairs in the kv secret engine.



IMPORTANT: Keys for server-side encryption must be 256-bits long and encoded using base64.
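For example, a quick sketch of generating such a key and verifying that it decodes to 32 bytes (256 bits):

Example

[root@vault ~]# KEY=$(openssl rand -base64 32)
[root@vault ~]# echo -n "$KEY" | base64 -d | wc -c
32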

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

root or sudo access

Ceph Object Gateway installed

Installation of the HashiCorp Vault software.

Root-level access to the HashiCorp Vault node.

Procedure
Edit online

1. Enable the Key/Value version 2 secret engine:

Example

vault secrets enable -path secret kv-v2

2. Create a new key:

Syntax

vault kv put secret/PROJECT_NAME/BUCKET_NAME key=$(openssl rand -base64 32)

Example

[root@vault ~]# vault kv put secret/myproject/mybucketkey key=$(openssl rand -base64 32)

====== Metadata ======
Key              Value
---              -----
created_time     2020-02-21T17:01:09.095824999Z
deletion_time    n/a
destroyed        false
version          1

Creating a key using the transit engine


Edit online
Configure the HashiCorp Vault Transit secret engine (transit) so you can create a key for use with the Ceph Object Gateway.
Creating keys with the Transit secret engine must be exportable in order to be used for server-side encryption with the Ceph Object
Gateway.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the HashiCorp Vault software.

Root-level access to the HashiCorp Vault node.

Procedure
Edit online



1. Enable the Transit secret engine:

[root@vault ~]# vault secrets enable transit

2. Create a new exportable key:

Syntax

vault write -f transit/keys/BUCKET_NAME exportable=true

Example

[root@vault ~]# vault write -f transit/keys/mybucketkey exportable=true

NOTE: By default, the above command creates an aes256-gcm96 type key.
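If you prefer to state the key type explicitly rather than rely on the default, the transit engine accepts a type parameter; the following sketch creates the same aes256-gcm96 key type:

Example

[root@vault ~]# vault write -f transit/keys/mybucketkey exportable=true type=aes256-gcm96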

3. Verify the creation of the key:

Syntax

vault read transit/export/encryption-key/BUCKET_NAME/VERSION_NUMBER

Example

[root@vault ~]# vault read transit/export/encryption-key/mybucketkey/1

Key Value
--- -----
keys map[1:-gbTI9lNpqv/V/2lDcmH2Nq1xKn6FPDWarCmFM2aNsQ=]
name mybucketkey
type aes256-gcm96

NOTE: Providing the full key path, including the key version, is required.

Uploading an object using AWS and the Vault


Edit online
When an object is uploaded to the Ceph Object Gateway, the Ceph Object Gateway fetches the key from the Vault, encrypts the
object, and stores it in a bucket. When a request is made to download the object, the Ceph Object Gateway automatically
retrieves the corresponding key from the Vault and decrypts the object.

NOTE: The URL is constructed using the base address, set by the rgw_crypt_vault_addr option, and the path prefix, set by the
rgw_crypt_vault_prefix option.
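For example, a sketch of how the two options combine into the lookup URL (the address and prefix values are illustrative):

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_vault_addr http://vaultserver:8200
[ceph: root@host01 /]# ceph config set client.rgw rgw_crypt_vault_prefix /v1/secret/data
# Keys are then fetched from http://vaultserver:8200/v1/secret/data/KEY_ID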

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Installation of the HashiCorp Vault software.

Access to a Ceph Object Gateway client node.

Access to Amazon Web Services (AWS).

Procedure
Edit online

1. Upload an object using the AWS command-line client and provide the Server-Side Encryption (SSE) key ID in the request:



a. For the Key/Value secret engine:

Example

[user@client ~]$ aws --endpoint=http://radosgw:8000 s3 cp plaintext.txt s3://mybucket/encrypted.txt --sse=aws:kms --sse-kms-key-id myproject/mybucketkey

NOTE: In the example, the Ceph Object Gateway would fetch the secret from http://vault-server:8200/v1/secret/data/myproject/mybucketkey

b. For the Transit engine:

Example

[user@client ~]$ aws --endpoint=http://radosgw:8000 s3 cp plaintext.txt s3://mybucket/encrypted.txt --sse=aws:kms --sse-kms-key-id mybucketkey

NOTE: In the example, the Ceph Object Gateway would fetch the secret from http://vaultserver:8200/v1/transit/mybucketkey

The Ceph Object Gateway and multi-factor authentication


Edit online
As a storage administrator, you can manage time-based one time password (TOTP) tokens for Ceph Object Gateway users.

Multi-factor authentication
Creating a seed for multi-factor authentication
Creating a new multi-factor authentication TOTP token
Test a multi-factor authentication TOTP token
Resynchronizing a multi-factor authentication TOTP token
Listing multi-factor authentication TOTP tokens
Display a multi-factor authentication TOTP token
Deleting a multi-factor authentication TOTP token

Multi-factor authentication
Edit online
When a bucket is configured for object versioning, a developer can optionally configure the bucket to require multi-factor
authentication (MFA) for delete requests. Using MFA, a time-based one time password (TOTP) token is passed as a key to the
x-amz-mfa header. The tokens are generated with virtual MFA devices like Google Authenticator, or a hardware MFA device like
those provided by Gemalto.
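For example, a sketch of a delete request on a version-enabled bucket that passes the TOTP serial and current PIN with the AWS command-line client (the endpoint, bucket, key, VERSION_ID placeholder, serial MFAtest, and PIN 123456 are all illustrative):

Example

[user@client ~]$ aws --endpoint=http://radosgw:8000 s3api delete-object --bucket mybucket --key file.txt --version-id VERSION_ID --mfa "MFAtest 123456"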

Use radosgw-admin to assign time-based one time password tokens to a user. You must set a secret seed and a serial ID. You can
also use radosgw-admin to list, remove, and resynchronize tokens.

IMPORTANT: In a multisite environment it is advisable to use different tokens for different zones, because, while MFA IDs are set on
the user’s metadata, the actual MFA one time password configuration resides on the local zone’s OSDs.

Term Description
TOTP Time-based One Time Password.
Token serial A string that represents the ID of a TOTP token.
Token seed The secret seed that is used to calculate the TOTP. It can be hexadecimal or base32.
TOTP seconds The time resolution used for TOTP generation.
TOTP window The number of TOTP tokens that are checked before and after the current token when validating tokens.
TOTP pin The valid value of a TOTP token at a certain time.
Table: Terminology

Creating a seed for multi-factor authentication


Edit online
To set up multi-factor authentication (MFA), you must create a secret seed for use by the one-time password generator and the
back-end MFA system.

Prerequisites
Edit online

A Linux system.

Access to the command line shell.

Procedure
Edit online

1. Generate a 30 character seed from the urandom Linux device file and store it in the shell variable SEED:

Example

[user@host01 ~]$ SEED=$(head -10 /dev/urandom | sha512sum | cut -b 1-30)

2. Print the seed by running echo on the SEED variable:

Example

[user@host01 ~]$ echo $SEED
492dedb20cf51d1405ef6a1316017e

Configure the one-time password generator and the back-end MFA system to use the same seed.

Reference
Edit online

The Ceph Object Gateway and multi-factor authentication

Creating a new multi-factor authentication TOTP token


Edit online
Create a new multi-factor authentication (MFA) time-based one time password (TOTP) token.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ceph Object Gateway is installed.

You have root access on a Ceph Monitor node.

A secret seed for the one-time password generator and Ceph Object Gateway MFA was generated.

Procedure
Edit online

Create a new MFA TOTP token:

Syntax



radosgw-admin mfa create --uid=USERID --totp-serial=SERIAL --totp-seed=SEED --totp-seed-
type=SEED_TYPE --totp-seconds=TOTP_SECONDS --totp-window=TOTP_WINDOW

Set USERID to the user name to set up MFA on, set SERIAL to a string that represents the ID for the TOTP token, and set SEED
to a hexadecimal or base32 value that is used to calculate the TOTP. The following settings are optional: Set the SEED_TYPE to
hex or base32, set TOTP_SECONDS to the timeout in seconds, or set TOTP_WINDOW to the number of TOTP tokens to check
before and after the current token when validating tokens.

Example

[root@host01 ~]# radosgw-admin mfa create --uid=johndoe --totp-serial=MFAtest --totp-seed=492dedb20cf51d1405ef6a1316017e

Reference
Edit online

Creating a seed for multi-factor authentication

Resynchronizing a multi-factor authentication token

Test a multi-factor authentication TOTP token


Edit online
Test a multi-factor authentication (MFA) time-based one time password (TOTP) token.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the nodes.

Ceph Object Gateway is installed.

You have root access on a Ceph Monitor node.

An MFA TOTP token was created using radosgw-admin mfa create.

Procedure
Edit online

Test the TOTP token PIN to verify that TOTP functions correctly:

Syntax

radosgw-admin mfa check --uid=USERID --totp-serial=SERIAL --totp-pin=PIN

Set USERID to the user name MFA is set up on, set SERIAL to the string that represents the ID for the TOTP token, and set PIN
to the latest PIN from the one-time password generator.

Example

[root@host01 ~]# radosgw-admin mfa check --uid=johndoe --totp-serial=MFAtest --totp-pin=870305
ok

If this is the first time you have tested the PIN, it may fail. If it fails, resynchronize the token.

Reference
Edit online



Creating a seed for multi-factor authentication

Resynchronizing a multi-factor authentication token

Resynchronizing a multi-factor authentication TOTP token


Edit online
Resynchronize a multi-factor authentication (MFA) time-based one time password token.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ceph Object Gateway is installed.

You have root access on a Ceph Monitor node.

An MFA TOTP token was created using radosgw-admin mfa create.

Procedure
Edit online

1. Resynchronize a multi-factor authentication TOTP token in case of time skew or failed checks.

This requires passing in two consecutive pins: the previous pin, and the current pin.

Syntax

radosgw-admin mfa resync --uid=USERID --totp-serial=SERIAL --totp-pin=PREVIOUS_PIN --totp-pin=CURRENT_PIN

Set USERID to the user name MFA is set up on, set SERIAL to the string that represents the ID for the TOTP token, set
PREVIOUS_PIN to the user’s previous PIN, and set CURRENT_PIN to the user’s current PIN.

Example

[root@host01 ~]# radosgw-admin mfa resync --uid=johndoe --totp-serial=MFAtest --totp-pin=802021 --totp-pin=439996

2. Verify the token was successfully resynchronized by testing a new PIN:

Syntax

radosgw-admin mfa check --uid=USERID --totp-serial=SERIAL --totp-pin=PIN

Set USERID to the user name MFA is set up on, set SERIAL to the string that represents the ID for the TOTP token, and set PIN
to the user’s PIN.

Example

[root@host01 ~]# radosgw-admin mfa check --uid=johndoe --totp-serial=MFAtest --totp-pin=870305
ok

Reference
Edit online

Creating a seed for multi-factor authentication



Listing multi-factor authentication TOTP tokens
Edit online
List all multi-factor authentication (MFA) time-based one time password (TOTP) tokens that a particular user has.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ceph Object Gateway is installed.

You have root access on a Ceph Monitor node.

An MFA TOTP token was created using radosgw-admin mfa create.

Procedure
Edit online

List MFA TOTP tokens:

Syntax

radosgw-admin mfa list --uid=USERID

Set USERID to the user name MFA is set up on.

Example

[root@host01 ~]# radosgw-admin mfa list --uid=johndoe


{
"entries": [
{
"type": 2,
"id": "MFAtest",
"seed": "492dedb20cf51d1405ef6a1316017e",
"seed_type": "hex",
"time_ofs": 0,
"step_size": 30,
"window": 2
}
]
}

Reference
Edit online

Creating a new multi-factor authentication TOTP token

Display a multi-factor authentication TOTP token


Edit online
Display a specific multi-factor authentication (MFA) time-based one time password (TOTP) token by specifying its serial.

Prerequisites
Edit online



A running IBM Storage Ceph cluster.

Ceph Object Gateway is installed.

You have root access on a Ceph Monitor node.

An MFA TOTP token was created using radosgw-admin mfa create.

Procedure
Edit online

Show the MFA TOTP token:

Syntax

radosgw-admin mfa get --uid=USERID --totp-serial=SERIAL

Set USERID to the user name MFA is set up on and set SERIAL to the string that represents the ID for the TOTP token.

Reference
Edit online

Creating a new multi-factor authentication TOTP token

Deleting a multi-factor authentication TOTP token


Edit online
Delete a multi-factor authentication (MFA) time-based one time password (TOTP) token.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ceph Object Gateway is installed.

You have root access on a Ceph Monitor node.

An MFA TOTP token was created using radosgw-admin mfa create.

Procedure
Edit online

1. Delete an MFA TOTP token:

Syntax

radosgw-admin mfa remove --uid=USERID --totp-serial=SERIAL

Set USERID to the user name MFA is set up on and set SERIAL to the string that represents the ID for the TOTP token.

Example

[root@host01 ~]# radosgw-admin mfa remove --uid=johndoe --totp-serial=MFAtest

2. Verify the MFA TOTP token was deleted:

Syntax

radosgw-admin mfa get --uid=USERID --totp-serial=SERIAL



Set USERID to the user name MFA is set up on and set SERIAL to the string that represents the ID for the TOTP token.

Example

[root@host01 ~]# radosgw-admin mfa get --uid=johndoe --totp-serial=MFAtest


MFA serial id not found

Administration
Edit online
As a storage administrator, you can manage the Ceph Object Gateway using the radosgw-admin command line interface (CLI) or
using the IBM Storage Ceph Dashboard.

NOTE: Not all of the Ceph Object Gateway features are available to the IBM Storage Ceph Dashboard.

Creating storage policies


Creating indexless buckets
Configure bucket index resharding
Enabling compression
User management
Role management
Quota management
Bucket management
Bucket lifecycle
Usage
Ceph Object Gateway data layout
Optimize the Ceph Object Gateway's garbage collection
Optimize the Ceph Object Gateway's data object storage

Prerequisites
Edit online

A healthy running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Creating storage policies


Edit online
The Ceph Object Gateway stores the client bucket and object data by identifying placement targets, and storing buckets and objects
in the pools associated with a placement target. If you don’t configure placement targets and map them to pools in the instance’s
zone configuration, the Ceph Object Gateway will use default targets and pools, for example, default_placement.

Storage policies give Ceph Object Gateway clients a way of accessing a storage strategy, that is, the ability to target a particular type
of storage, such as SSDs, SAS drives, and SATA drives, as a way of ensuring, for example, durability, replication, and erasure coding.
For details, see Storage Strategies.

To create a storage policy, use the following procedure:

1. Create a new pool .rgw.buckets.special with the desired storage strategy. For example, a pool customized with erasure-
coding, a particular CRUSH ruleset, the number of replicas, and the pg_num and pgp_num count.
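A minimal sketch of this step, creating the pool as a simple replicated pool (the PG counts and the use of the default CRUSH rule are illustrative; substitute your own CRUSH rule or erasure-code profile as needed):

Example

[ceph: root@host01 /]# ceph osd pool create .rgw.buckets.special 32 32 replicated
[ceph: root@host01 /]# ceph osd pool application enable .rgw.buckets.special rgw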

2. Get the zone group configuration and store it in a file:

Syntax

radosgw-admin zonegroup --rgw-zonegroup=ZONE_GROUP_NAME get > FILE_NAME.json

Example



[root@host01 ~]# radosgw-admin zonegroup --rgw-zonegroup=default get > zonegroup.json

3. Add a special-placement entry under placement_target in the zonegroup.json file:

Example

{
"name": "default",
"api_name": "",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"master_zone": "",
"zones": [{
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 5
}],
"placement_targets": [{
"name": "default-placement",
"tags": []
}, {
"name": "special-placement",
"tags": []
}],
"default_placement": "default-placement"
}

4. Set the zone group with the modified zonegroup.json file:

Example

[root@host01 ~]# radosgw-admin zonegroup set < zonegroup.json

5. Get the zone configuration and store it in a file, for example, zone.json:

Example

[root@host01 ~]# radosgw-admin zone get > zone.json

6. Edit the zone file and add the new placement policy key under placement_pool:

Example

{
"domain_root": ".rgw",
"control_pool": ".rgw.control",
"gc_pool": ".rgw.gc",
"log_pool": ".log",
"intent_log_pool": ".intent-log",
"usage_log_pool": ".usage",
"user_keys_pool": ".users",
"user_email_pool": ".users.email",
"user_swift_pool": ".users.swift",
"user_uid_pool": ".users.uid",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [{
"key": "default-placement",
"val": {
"index_pool": ".rgw.buckets.index",
"data_pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra"
}
}, {
"key": "special-placement",
"val": {
"index_pool": ".rgw.buckets.index",
"data_pool": ".rgw.buckets.special",
"data_extra_pool": ".rgw.buckets.extra"
}



}]
}

7. Set the new zone configuration:

Example

[root@host01 ~]# radosgw-admin zone set < zone.json

8. Update the zone group map:

Example

[root@host01 ~]# radosgw-admin period update --commit

The special-placement entry is listed as a placement_target.

9. To specify the storage policy when making a request:

Example

$ curl -i http://10.0.0.1/swift/v1/TestContainer/file.txt -X PUT -H "X-Storage-Policy: special-placement" -H "X-Auth-Token: AUTH_rgwtxxxxxx"

Creating indexless buckets


Edit online
You can configure a placement target where created buckets do not use the bucket index to store objects index; that is, indexless
buckets. Placement targets that do not use data replication or listing might implement indexless buckets. Indexless buckets provide
a mechanism in which the placement target does not track objects in specific buckets. This removes a resource contention that
happens whenever an object write happens and reduces the number of round trips that Ceph Object Gateway needs to make to the
Ceph storage cluster. This can have a positive effect on concurrent operations and small object write performance.

IMPORTANT: The bucket index does not reflect the correct state of the bucket, and listing these buckets does not correctly return
their list of objects. This affects multiple features. Specifically, these buckets are not synced in a multi-zone environment because
the bucket index is not used to store change information. IBM recommends not to use S3 object versioning on indexless buckets,
because the bucket index is necessary for this feature.

NOTE: Using indexless buckets removes the limit of the max number of objects in a single bucket.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Root-level access to a Ceph Object Gateway node.

Procedure
Edit online

1. Add a new placement target to the zonegroup:

Example

[ceph: root@host03 /]# radosgw-admin zonegroup placement add --rgw-zonegroup="default" --placement-id="indexless-placement"

2. Add a new placement target to the zone:

Example



[ceph: root@host03 /]# radosgw-admin zone placement add --rgw-zone="default" \
--placement-id="indexless-placement" \
--data-pool="default.rgw.buckets.data" \
--index-pool="default.rgw.buckets.index" \
--data_extra_pool="default.rgw.buckets.non-ec" \
--placement-index-type="indexless"

3. Set the zonegroup’s default placement to indexless-placement:

Example

[ceph: root@host03 /]# radosgw-admin zonegroup placement default --placement-id "indexless-placement"

In this example, the buckets created in the indexless-placement target will be indexless buckets.

4. Update and commit the period if the cluster is in a multi-site configuration:

Example

[ceph: root@host03 /]# radosgw-admin period update --commit

5. Restart the Ceph Object Gateways on all nodes in the storage cluster for the change to take effect:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host03 /]# ceph orch restart rgw

Configure bucket index resharding


Edit online
As a storage administrator, you can configure bucket index resharding in single-site and multi-site deployments to improve
performance.

You can reshard a bucket index either manually offline or dynamically online.

Bucket index resharding


Recovering bucket index
Limitations of bucket index resharding
Configuring bucket index resharding in simple deployments
Configuring bucket index resharding in multi-site deployments
Resharding bucket index dynamically
Resharding bucket index dynamically in multi-site configuration
Resharding bucket index manually
Cleaning stale instances of bucket entries after resharding
Fixing lifecycle policies after resharding

Bucket index resharding


Edit online
The Ceph Object Gateway stores bucket index data in the index pool, which defaults to .rgw.buckets.index. When the client puts
many objects in a single bucket without setting quotas for the maximum number of objects per bucket, the index pool can suffer
significant performance degradation.

Bucket index resharding prevents performance bottlenecks when you add a high number of objects per bucket.

You can configure bucket index resharding for new buckets or change the bucket index on the existing ones.



Choose a shard count that is the prime number nearest to the calculated shard count. Bucket index shards that are prime
numbers tend to distribute bucket index entries more evenly across the shards. For example, if the calculation yields 20 shards,
choose 19 or 23.

Bucket index can be resharded manually or dynamically.

During the process of resharding bucket index dynamically, there is a periodic check of all the Ceph Object Gateway buckets
and it detects buckets that require resharding. If a bucket has grown larger than the value specified in the
rgw_max_objs_per_shard parameter, the Ceph Object Gateway reshards the bucket dynamically in the background. The
default value for rgw_max_objs_per_shard is 100k objects per shard. Resharding bucket index dynamically works as
expected on the upgraded single-site configuration without any modification to the zone or the zone group. A single-site
configuration can be any of the following:

A default zone configuration with no realm.

A non-default configuration with at least one realm.

A multi-realm single-site configuration.

Recovering bucket index


Edit online
Resharding a bucket that was created with bucket_index_max_shards = 0 removes the bucket's metadata. However, you can
restore the bucket indexes by recovering the affected buckets.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A Ceph Object Gateway installed at a minimum of two sites.

The jq package installed.

Procedure
Edit online

Perform either of the following two steps to recover the bucket indexes:

Run the radosgw-admin object reindex --bucket BUCKET_NAME --object OBJECT_NAME command.

Run the /usr/bin/rgw-restore-bucket-index -b BUCKET_NAME -p DATA_POOL_NAME script.

Example

[root@host01 ceph]# /usr/bin/rgw-restore-bucket-index -b bucket-large-1 -p local-zone.rgw.buckets.data

marker is d8a347a4-99b6-4312-a5c1-75b83904b3d4.41610.2
bucket_id is d8a347a4-99b6-4312-a5c1-75b83904b3d4.41610.2
number of bucket index shards is 5
data pool is local-zone.rgw.buckets.data
NOTICE: This tool is currently considered EXPERIMENTAL.
The list of objects that we will attempt to restore can be found in "/tmp/rgwrbi-
object-list.49946".
Please review the object names in that file (either below or in another
window/terminal) before proceeding.
Type "proceed!" to proceed, "view" to view object list, or "q" to quit: view
Viewing...
Type "proceed!" to proceed, "view" to view object list, or "q" to quit: proceed!
Proceeding...
NOTICE: Bucket stats are currently incorrect. They can be restored with the following
command after 2 minutes:
radosgw-admin bucket list --bucket=bucket-large-1 --allow-unordered --max-
entries=1073741824



Would you like to take the time to recalculate bucket stats now? [yes/no] yes
Done

real 2m16.530s
user 0m1.082s
sys 0m0.870s

NOTE: The tool does not work for versioned buckets.

[root@host01 ~]# time rgw-restore-bucket-index --proceed serp-bu-ver-1 default.rgw.buckets.data
NOTICE: This tool is currently considered EXPERIMENTAL.
marker is e871fb65-b87f-4c16-a7c3-064b66feb1c4.25076.5
bucket_id is e871fb65-b87f-4c16-a7c3-064b66feb1c4.25076.5
Error: this bucket appears to be versioned, and this tool cannot work with versioned
buckets.

NOTE: The tool's scope is limited to a single site only and not multi-site, that is, if we run rgw-restore-bucket-index tool
at site-1, it does not recover objects in site-2 and vice versa. On a multi-site, the recovery tool and the object re-index
command should be executed at both sites for a bucket.

Limitations of bucket index resharding


Edit online
IMPORTANT: Use the following limitations with caution. There are implications related to your hardware selections, so you should
always discuss these requirements with your IBM account team.

Maximum number of objects in one bucket before it needs resharding: Use a maximum of 102,400 objects per bucket
index shard. To take full advantage of resharding and maximize parallelism, provide a sufficient number of OSDs in the Ceph
Object Gateway bucket index pool. This parallelization scales with the number of Ceph Object Gateway instances, and
replaces the in-order index shard enumeration with a number sequence. The default locking timeout is extended from 60
seconds to 90 seconds.

Maximum number of objects when using sharding: Based on prior testing, the number of bucket index shards currently
supported is 65521.

You can reshard a bucket three times before the other zones catch-up: Resharding is not recommended until the older
generations synchronize. Around four generations of the buckets from previous reshards are supported. Once the limit is
reached, dynamic resharding does not reshard the bucket again until at least one of the old log generations are fully trimmed.
Using the command radosgw-admin bucket reshard throws the following error:

Bucket BUCKET_NAME already has too many log generations (4) from previous reshards that peer
zones haven't finished syncing.

Resharding is not recommended until the old generations sync, but you can force a reshard with
`--yes-i-really-mean-it`.

Configuring bucket index resharding in simple deployments


Edit online
To enable and configure bucket index resharding on all new buckets, use the rgw_override_bucket_index_max_shards
parameter.

You can set the parameter to one of the following values:

0 to disable bucket index sharding, which is the default value.

A value greater than 0 to enable bucket sharding and to set the maximum number of shards.

Prerequisites
Edit online



Two running IBM Storage Ceph clusters.

A Ceph Object Gateway installed.

Procedure
Edit online

1. Calculate the recommended number of shards:

number of objects expected in a bucket / 100,000

NOTE: The maximum number of bucket index shards currently supported is 65,521.
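For example, a quick sketch of the calculation for a bucket expected to hold 2 million objects (the object count is illustrative); rounding the result to a nearby prime number, such as 19 or 23, tends to distribute index entries more evenly:

Example

[root@host01 ~]# echo $(( 2000000 / 100000 ))
20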

2. Set the rgw_override_bucket_index_max_shards option accordingly:

Syntax

ceph config set client.rgw rgw_override_bucket_index_max_shards VALUE

Replace VALUE with the recommended number of shards calculated:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_override_bucket_index_max_shards 12

To configure bucket index resharding for all instances of the Ceph Object Gateway, set the
rgw_override_bucket_index_max_shards parameter with the global option.

To configure bucket index resharding only for a particular instance of the Ceph Object Gateway, add
rgw_override_bucket_index_max_shards parameter under the instance.

3. Restart the Ceph Object Gateways on all nodes in the cluster to take effect:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

Reference
Edit online

Resharding bucket index dynamically

Resharding bucket index manually

Configuring bucket index resharding in multi-site deployments


Edit online
In multi-site deployments, each zone can have a different index_pool setting to manage failover. To configure a consistent shard
count for zones in one zone group, set the bucket_index_max_shards parameter in the configuration for that zone group. The
default value of bucket_index_max_shards parameter is 11.

You can set the parameter to one of the following values:

0 to disable bucket index sharding.

A value greater than 0 to enable bucket sharding and to set the maximum number of shards.

NOTE: Mapping the index pool, for each zone, if applicable, to a CRUSH ruleset of SSD-based OSDs might also help with bucket index
performance. See CRUSH performance domains



IMPORTANT: To prevent sync issues in multi-site deployments, a bucket should not have more than three generation gaps.

Prerequisites
Edit online

Two running IBM Storage Ceph clusters.

A Ceph Object Gateway installed at a minimum of two sites.

Procedure
Edit online

1. Calculate the recommended number of shards:

number of objects expected in a bucket / 100,000

NOTE: The maximum number of bucket index shards currently supported is 65,521.

2. Extract the zone group configuration to the zonegroup.json file:

Example

[ceph: root@host01 /]# radosgw-admin zonegroup get > zonegroup.json

3. In the zonegroup.json file, set the bucket_index_max_shards parameter for each named zone:

Syntax

bucket_index_max_shards = VALUE

Replace VALUE with the recommended number of shards calculated:

Example

bucket_index_max_shards = 12

4. Reset the zone group:

Example

[ceph: root@host01 /]# radosgw-admin zonegroup set < zonegroup.json

5. Update the period:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

6. Check if resharding is complete:

Syntax

radosgw-admin reshard status --bucket BUCKET_NAME

Example

[ceph: root@host01 /]# radosgw-admin reshard status --bucket data

Verification
Edit online

Check the sync status of the storage cluster:

Example

[ceph: root@host01 /]# radosgw-admin sync status



Resharding bucket index dynamically
Edit online
You can reshard the bucket index dynamically by adding the bucket to the resharding queue, where it is scheduled to be resharded.
The reshard threads run in the background and execute the scheduled resharding operations, one at a time.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A Ceph Object Gateway installed.

Procedure
Edit online

1. Ensure that the rgw_dynamic_resharding parameter is set to true, which is the default value.

Example

[ceph: root@host01 /]# radosgw-admin period get

2. Optional: Customize Ceph configuration using the following command:

Syntax

ceph config set client.rgw OPTION VALUE

Replace OPTION with the following options:

rgw_reshard_num_logs: The number of shards for the resharding log. The default value is 16.

rgw_reshard_bucket_lock_duration: The duration of the lock on a bucket during resharding. The default value is
360 seconds.

rgw_dynamic_resharding: Enables or disables dynamic resharding. The default value is true.

rgw_max_objs_per_shard: The maximum number of objects per shard. The default value is 100000 objects per
shard.

rgw_reshard_thread_interval: The maximum time between rounds of reshard thread processing. The default
value is 600 seconds.

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_reshard_num_logs 23

3. Add a bucket to the resharding queue:

Syntax

radosgw-admin reshard add --bucket BUCKET --num-shards NUMBER

Example

[ceph: root@host01 /]# radosgw-admin reshard add --bucket data --num-shards 10

4. List the resharding queue:

Example

[ceph: root@host01 /]# radosgw-admin reshard list

5. Check the bucket log generations and shards:



Example

[ceph: root@host01 /]# radosgw-admin bucket layout --bucket data


{
"layout": {
"resharding": "None",
"current_index": {
"gen": 1,
"layout": {
"type": "Normal",
"normal": {
"num_shards": 23,
"hash_type": "Mod"
}
}
},
"logs": [
{
"gen": 0,
"layout": {
"type": "InIndex",
"in_index": {
"gen": 0,
"layout": {
"num_shards": 11,
"hash_type": "Mod"
}
}
}
},
{
"gen": 1,
"layout": {
"type": "InIndex",
"in_index": {
"gen": 1,
"layout": {
"num_shards": 23,
"hash_type": "Mod"
}
}
}
}
]
}
}

6. Check bucket resharding status:

Syntax

radosgw-admin reshard status --bucket BUCKET

Example

[ceph: root@host01 /]# radosgw-admin reshard status --bucket data

7. Process entries on the resharding queue immediately:

[ceph: root@host01 /]# radosgw-admin reshard process

8. Cancel pending bucket resharding:

WARNING: You can only cancel pending resharding operations. Do not cancel ongoing resharding operations.

Syntax

radosgw-admin reshard cancel --bucket BUCKET

Example

[ceph: root@host01 /]# radosgw-admin reshard cancel --bucket data

Verification
Edit online

Check bucket resharding status:

Syntax

radosgw-admin reshard status --bucket BUCKET

Example

[ceph: root@host01 /]# radosgw-admin reshard status --bucket data

Reference
Edit online

Cleaning stale instances of bucket entries after resharding

Resharding bucket index manually

Configuring bucket index resharding in simple deployments

Resharding bucket index dynamically in multi-site configuration


Edit online
The feature allows buckets to be resharded in a multi-site configuration without interrupting the replication of their objects. When
rgw_dynamic_resharding is enabled, it runs on each zone independently, and the zones might choose different shard counts for
the same bucket.

You need to enable the resharding feature manually on the existing zones and the zone groups after upgrading the storage cluster.

NOTE: You can reshard a bucket three times before the other zones catch-up. See Limitations of bucket index resharding

NOTE: If a bucket is created and uploaded with more objects than the threshold for dynamic resharding, you must continue to
write I/O to the bucket for the resharding process to begin.

Prerequisites
Edit online

At least two running IBM Storage Ceph clusters.

All the Ceph Object Gateway daemons enabled at both the sites are upgraded to the latest version.

Root-level access to all the nodes.

Procedure
Edit online

1. Check if resharding is enabled on the zonegroup:

Example

[ceph: root@host01 /]# radosgw-admin sync status

If zonegroup features enabled is not enabled for resharding on the zonegroup, then continue with the procedure.

2. Enable the resharding feature on all the zonegroups in the multi-site configuration where Ceph Object Gateway is installed:

Syntax

radosgw-admin zonegroup modify --rgw-zonegroup=ZONEGROUP_NAME --enable-feature=resharding

Example



[ceph: root@host01 /]# radosgw-admin zonegroup modify --rgw-zonegroup=us --enable-feature=resharding

3. Update the period and commit:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

4. Enable the resharding feature on all the zones in the multi-site configuration where Ceph Object Gateway is installed:

Syntax

radosgw-admin zone modify --rgw-zone=ZONE_NAME --enable-feature=resharding

Example

[ceph: root@host01 /]# radosgw-admin zone modify --rgw-zone=us-east --enable-feature=resharding

5. Update the period and commit:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

6. Verify that the resharding feature is enabled on the zones and zonegroups. Each zone lists its supported_features and each
zonegroup lists its enabled_features:

Example

[ceph: root@host01 /]# radosgw-admin period get

"zones": [
{
"id": "505b48db-6de0-45d5-8208-8c98f7b1278d",
"name": "us_east",
"endpoints": [
"http://10.0.208.11:8080"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": "",
"supported_features": [
"resharding"
]
"default_placement": "default-placement",
"realm_id": "26cf6f23-c3a0-4d57-aae4-9b0010ee55cc",
"sync_policy": {
"groups": []
},
"enabled_features": [
"resharding"
]

7. Check the sync status:

Example

[ceph: root@host01 /]# radosgw-admin sync status


realm 26cf6f23-c3a0-4d57-aae4-9b0010ee55cc (usa)
zonegroup 33a17718-6c77-493e-99fe-048d3110a06e (us)
zone 505b48db-6de0-45d5-8208-8c98f7b1278d (us_east)
zonegroup features enabled: resharding

In this example, you can see that the resharding feature is enabled for the us zonegroup.

8. Optional: You can disable the resharding feature for the zonegroups:

a. Disable the feature on all the zonegroups in the multi-site where Ceph Object Gateway is installed:



Syntax

radosgw-admin zonegroup modify --rgw-zonegroup=ZONEGROUP_NAME --disable-feature=resharding

Example

[ceph: root@host01 /]# radosgw-admin zonegroup modify --rgw-zonegroup=us --disable-feature=resharding

b. Update the period and commit:

Example

[ceph: root@host01 /]# radosgw-admin period update --commit

Reference
Edit online

Resharding bucket index dynamically

Resharding bucket index manually


Edit online
If a bucket has grown larger than the initial configuration for which it was optimized, reshard the bucket index pool by using the
radosgw-admin bucket reshard command. This command performs the following tasks:

Creates a new set of bucket index objects for the specified bucket.

Distributes object entries across these bucket index objects.

Creates a new bucket instance.

Links the new bucket instance with the bucket so that all new index operations go through the new bucket indexes.

Prints the old and the new bucket ID to the command output.

Prerequisites
Edit online

At least two running IBM Storage Ceph clusters.

A Ceph Object Gateway installed at a minimum of two sites.

Procedure
Edit online

1. Back up the original bucket index:

Syntax

radosgw-admin bi list --bucket=BUCKET > BUCKET.list.backup

Example

[ceph: root@host01 /]# radosgw-admin bi list --bucket=data > data.list.backup

2. Reshard the bucket index:

Syntax

radosgw-admin bucket reshard --bucket=BUCKET --num-shards=NUMBER



Example

[ceph: root@host01 /]# radosgw-admin bucket reshard --bucket=data --num-shards=100

Verification
Edit online

Check bucket resharding status:

Syntax

radosgw-admin reshard status --bucket BUCKET

Example

[ceph: root@host01 /]# radosgw-admin reshard status --bucket data

Cleaning stale instances of bucket entries after resharding


Edit online
The resharding process might not clean stale instances of bucket entries automatically and these instances can impact performance
of the storage cluster.

Clean them manually to prevent the stale instances from negatively impacting the performance of the storage cluster.

IMPORTANT: Contact IBM Support prior to cleaning the stale instances.

IMPORTANT: Use this procedure only in simple deployments, not in multi-site clusters.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A Ceph Object Gateway node.

Procedure
Edit online

1. List stale instances:

[ceph: root@host01 /]# radosgw-admin reshard stale-instances list

2. Clean the stale instances of the bucket entries:

[ceph: root@host01 /]# radosgw-admin reshard stale-instances rm

Verification
Edit online

Check bucket resharding status:

Syntax

radosgw-admin reshard status --bucket BUCKET

Example

[ceph: root@host01 /]# radosgw-admin reshard status --bucket data



Fixing lifecycle policies after resharding
Edit online
For storage clusters with resharded instances, the old lifecycle processes would have flagged and deleted the lifecycle processing as
the bucket instance changed during a reshard.

However, for older buckets that had lifecycle policies and have undergone resharding, you can fix such buckets with the reshard
fix option.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A Ceph Object Gateway installed.

Procedure
Edit online

Fix the lifecycle policies of the older bucket:

Syntax

radosgw-admin lc reshard fix --bucket BUCKET_NAME

IMPORTANT: If you do not use the --bucket argument, then the command fixes lifecycle policies for all the buckets in the
storage cluster.

Example

[ceph: root@host01 /]# radosgw-admin lc reshard fix --bucket mybucket

Enabling compression
Edit online
The Ceph Object Gateway supports server-side compression of uploaded objects using any of Ceph’s compression plugins. These
include:

zlib: Supported.

snappy: Supported.

zstd: Supported.

Configuration

To enable compression on a zone’s placement target, provide the --compression=TYPE option to the radosgw-admin zone
placement modify command. The compression TYPE refers to the name of the compression plugin to use when writing new
object data.

Each compressed object stores the compression type. Changing the setting does not hinder the ability to decompress existing
compressed objects, nor does it force the Ceph Object Gateway to recompress existing objects.

This compression setting applies to all new objects uploaded to buckets using this placement target.

To disable compression on a zone’s placement target, provide the --compression=TYPE option to the radosgw-admin zone
placement modify command and specify an empty string or none.

Example



[root@host01 ~]# radosgw-admin zone placement modify --rgw-zone=default --placement-id=default-placement --compression=zlib
{
...
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "default.rgw.buckets.index",
"data_pool": "default.rgw.buckets.data",
"data_extra_pool": "default.rgw.buckets.non-ec",
"index_type": 0,
"compression": "zlib"
}
}
],
...
}

After enabling or disabling compression, restart the Ceph Object Gateway instance so the change will take effect.

NOTE: Ceph Object Gateway creates a default zone and a set of pools. For production deployments, see Creating a realm

Statistics

While all existing commands and APIs continue to report object and bucket sizes based on their uncompressed data, the radosgw-
admin bucket stats command includes compression statistics for all buckets.

Syntax

radosgw-admin bucket stats --bucket=BUCKET_NAME


{
...
"usage": {
"rgw.main": {
"size": 1075028,
"size_actual": 1331200,
"size_utilized": 592035,
"size_kb": 1050,
"size_kb_actual": 1300,
"size_kb_utilized": 579,
"num_objects": 104
}
},
...
}

The size is the accumulated size of the objects in the bucket, uncompressed and unencrypted. The size_kb is the accumulated
size in kilobytes and is calculated as ceiling(size/1024). In this example, it is ceiling(1075028/1024) = 1050.

The size_actual is the accumulated size of all the objects after each object is distributed in a set of 4096-byte blocks. If a bucket
has two objects, one of size 4100 bytes and the other of 8500 bytes, the first object is rounded up to 8192 bytes, and the second one
is rounded up to 12288 bytes, and their total for the bucket is 20480 bytes. The size_kb_actual is the actual size in kilobytes and is
calculated as size_actual/1024. In this example, it is 1331200/1024 = 1300.

The size_utilized is the total size of the data in bytes after it has been compressed and/or encrypted. Encryption could increase
the size of the object while compression could decrease it. The size_kb_utilized is the total size in kilobytes and is calculated as
ceiling(size_utilized/1024). In this example, it is ceiling(592035/1024)= 579.
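For example, you can reproduce the reported kilobyte values from the byte values above with shell arithmetic, writing the integer ceiling division as (x + 1023) / 1024:

Example

[root@host01 ~]# echo $(( (1075028 + 1023) / 1024 ))    # size_kb
1050
[root@host01 ~]# echo $(( 1331200 / 1024 ))             # size_kb_actual
1300
[root@host01 ~]# echo $(( (592035 + 1023) / 1024 ))     # size_kb_utilized
579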

User management
Edit online
Ceph Object Storage user management refers to users that are client applications of the Ceph Object Storage service; not the Ceph
Object Gateway as a client application of the Ceph Storage Cluster. You must create a user, access key, and secret to enable client
applications to interact with the Ceph Object Gateway service.

There are two user types:

User: The term 'user' reflects a user of the S3 interface.



Subuser: The term 'subuser' reflects a user of the Swift interface. A subuser is associated with a user.

You can create, modify, view, suspend, and remove users and subusers.

IMPORTANT: When managing users in a multi-site deployment, ALWAYS issue the radosgw-admin command on a Ceph Object
Gateway node within the master zone of the master zone group to ensure that users synchronize throughout the multi-site cluster.
DO NOT create, modify, or delete users on a multi-site cluster from a secondary zone or a secondary zone group.

In addition to creating user and subuser IDs, you may add a display name and an email address for a user. You can specify a key and
secret, or generate a key and secret automatically. When generating or specifying keys, note that user IDs correspond to an S3 key
type and subuser IDs correspond to a swift key type. Swift keys also have access levels of read, write, readwrite and full.

User management command line syntax generally follows the pattern user COMMAND USER_ID where USER_ID is either the --
uid= option followed by the user's ID (S3) or the --subuser= option followed by the user name (Swift).

Syntax

radosgw-admin user <create|modify|info|rm|suspend|enable|check|stats> <--uid=USER_ID|--subuser=SUB_USER_NAME> [other-options]

Additional options may be required depending on the command you issue.

Multi-tenant namespace
Create a user
Create a subuser
Get user information
Modify user information
Enable and suspend users
Remove a user
Remove a subuser
Rename a user
Create a key
Add and remove access keys
Add and remove admin capabilities

Multi-tenant namespace
Edit online
The Ceph Object Gateway supports multi-tenancy for both the S3 and Swift APIs, where each user and bucket lies under a "tenant."
Multi-tenancy prevents namespace clashes when multiple tenants use common bucket names, such as "test", "main", and so
forth.

Each user and bucket lies under a tenant. For backward compatibility, a "legacy" tenant with an empty name is added. Whenever
a bucket is referred to without explicitly specifying a tenant, the Swift API assumes the "legacy" tenant. Existing users are also
stored under the legacy tenant, so they access buckets and objects the same way as in earlier releases.

Tenants as such do not have any operations on them. They appear and disappear as needed, when users are administered. In order
to create, modify, and remove users with explicit tenants, either an additional option --tenant is supplied, or a syntax
"_TENANT_$_USER_" is used in the parameters of the radosgw-admin command.

To create a user testx$tester for S3, run the following command:

Example

[root@host01 ~]# radosgw-admin --tenant testx --uid tester --display-name "Test User" --access_key TESTER --secret test123 user create

To create a user testx$tester for Swift, run one of the following commands:

Example

[root@host01 ~]# radosgw-admin --tenant testx --uid tester --display-name "Test User" --subuser tester:swift --key-type swift --access full subuser create



[root@host01 ~]# radosgw-admin key create --subuser 'testx$tester:swift' --key-type swift --secret test123

NOTE: The subuser with explicit tenant had to be quoted in the shell.

Create a user
Edit online
Use the user create command to create an S3-interface user. You MUST specify a user ID and a display name. You may also
specify an email address. If you DO NOT specify a key or secret, radosgw-admin will generate them for you automatically. However,
you may specify a key and/or a secret if you prefer not to use generated key/secret pairs.

Syntax

radosgw-admin user create --uid=USER_ID [--key-type=KEY_TYPE] [--gen-access-key|--access-key=ACCESS_KEY] [--gen-secret | --secret=SECRET_KEY] [--email=EMAIL] --display-name=DISPLAY_NAME

Example

[root@host01 ~]# radosgw-admin user create --uid=janedoe --access-key=11BS02LGFB6AL6H1ADMW --secret=vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY --email=janedoe@example.com --display-name="Jane Doe"

{ "user_id": "janedoe",
"display_name": "Jane Doe",
"email": "[email protected]",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{ "user": "janedoe",
"access_key": "11BS02LGFB6AL6H1ADMW",
"secret_key": "vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY"}],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": { "enabled": false,
"max_size_kb": -1,
"max_objects": -1},
"user_quota": { "enabled": false,
"max_size_kb": -1,
"max_objects": -1},
"temp_url_keys": []}

IMPORTANT: Check the key output. Sometimes radosgw-admin generates a JSON escape (\) character, and some clients
do not know how to handle JSON escape characters. Remedies include removing the JSON escape character (\), encapsulating
the string in quotes, regenerating the key to ensure that it does not have a JSON escape character, or specifying the key and
secret manually.
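For example, a sketch of the regeneration remedy for the user created above, using the key management options described later in this section:

Example

[root@host01 ~]# radosgw-admin key create --uid=janedoe --key-type=s3 --gen-access-key --gen-secret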

Create a subuser
Edit online
To create a subuser (Swift interface), you must specify the user ID (--uid=_USERNAME_), a subuser ID and the access level for the
subuser. If you DO NOT specify a key or secret, radosgw-admin generates them for you automatically. However, you can specify a
key, a secret, or both if you prefer not to use generated key and secret pairs.

NOTE: full is not readwrite, as it also includes the access control policy.

Syntax

radosgw-admin subuser create --uid=USER_ID --subuser=SUB_USER_ID --access=[ read | write | readwrite | full ]



Example

[root@host01 ~]# radosgw-admin subuser create --uid=janedoe --subuser=janedoe:swift --access=full

{ "user_id": "janedoe",
"display_name": "Jane Doe",
"email": "[email protected]",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{ "id": "janedoe:swift",
"permissions": "full-control"}],
"keys": [
{ "user": "janedoe",
"access_key": "11BS02LGFB6AL6H1ADMW",
"secret_key": "vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY"}],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": { "enabled": false,
"max_size_kb": -1,
"max_objects": -1},
"user_quota": { "enabled": false,
"max_size_kb": -1,
"max_objects": -1},
"temp_url_keys": []}

Get user information


Edit online
To get information about a user, specify user info and the user ID (--uid=USERNAME).

Example

[root@host01 ~]# radosgw-admin user info --uid=janedoe

To get information about a tenanted user, specify both the user ID and the name of the tenant.

[root@host01 ~]# radosgw-admin user info --uid=janedoe --tenant=test

Modify user information


Edit online
To modify information about a user, you must specify the user ID (--uid=USERNAME) and the attributes you want to modify.
Typical modifications are to keys and secrets, email addresses, display names, and access levels.

Example

[root@host01 ~]# radosgw-admin user modify --uid=janedoe --display-name="Jane E. Doe"

To modify subuser values, specify subuser modify and the subuser ID.

Example

[root@host01 ~]# radosgw-admin subuser modify --subuser=janedoe:swift --access=full

Enable and suspend users


Edit online



When you create a user, the user is enabled by default. However, you may suspend user privileges and re-enable them at a later
time. To suspend a user, specify user suspend and the user ID.

[root@host01 ~]# radosgw-admin user suspend --uid=johndoe

To re-enable a suspended user, specify user enable and the user ID:

[root@host01 ~]# radosgw-admin user enable --uid=johndoe

NOTE: Disabling the user disables the subuser.

Remove a user
Edit online
When you remove a user, the user and subuser are removed from the system. However, you may remove only the subuser if you
wish. To remove a user (and subuser), specify user rm and the user ID.

Syntax

radosgw-admin user rm --uid=USER_ID [--purge-keys] [--purge-data]

Example

[ceph: root@host01 /]# radosgw-admin user rm --uid=johndoe --purge-data

To remove the subuser only, specify subuser rm and the subuser name.

Example

[ceph: root@host01 /]# radosgw-admin subuser rm --subuser=johndoe:swift --purge-keys

Options include:

Purge Data: The --purge-data option purges all data associated with the UID.

Purge Keys: The --purge-keys option purges all keys associated with the UID.

Remove a subuser
Edit online
When you remove a subuser, you are removing access to the Swift interface. The user remains in the system. To remove the subuser,
specify subuser rm and the subuser ID.

Syntax

radosgw-admin subuser rm --subuser=SUB_USER_ID

Example

[root@host01 /]# radosgw-admin subuser rm --subuser=johndoe:swift

Options include:

Purge Keys: The --purge-keys option purges all keys associated with the UID.

Rename a user
Edit online
To change the name of a user, use the radosgw-admin user rename command. The time that this command takes depends on
the number of buckets and objects that the user has. If the number is large, IBM recommends using the command in the Screen
utility provided by the screen package.



Prerequisites
Edit online

A working Ceph cluster.

root or sudo access to the host running the Ceph Object Gateway.

Installed Ceph Object Gateway.

Procedure
Edit online

1. Rename a user:

Syntax

radosgw-admin user rename --uid=CURRENT_USER_NAME --new-uid=NEW_USER_NAME

Example

[ceph: root@host01 /]# radosgw-admin user rename --uid=user1 --new-uid=user2

{
"user_id": "user2",
"display_name": "user 2",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "user2",
"access_key": "59EKHI6AI9F8WOW8JQZJ",
"secret_key": "XH0uY3rKCUcuL73X0ftjXbZqUbk0cavD11rD8MsA"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw"
}

If a user is inside a tenant, specify both the user name and the tenant:

Syntax

radosgw-admin user rename --uid USER_NAME --new-uid NEW_USER_NAME --tenant TENANT

Example

[ceph: root@host01 /]# radosgw-admin user rename --uid=test$user1 --new-uid=test$user2 --tenant test



1000 objects processed in tvtester1. Next marker 80_tVtester1_99
2000 objects processed in tvtester1. Next marker 64_tVtester1_44
3000 objects processed in tvtester1. Next marker 48_tVtester1_28
4000 objects processed in tvtester1. Next marker 2_tVtester1_74
5000 objects processed in tvtester1. Next marker 14_tVtester1_53
6000 objects processed in tvtester1. Next marker 87_tVtester1_61
7000 objects processed in tvtester1. Next marker 6_tVtester1_57
8000 objects processed in tvtester1. Next marker 52_tVtester1_91
9000 objects processed in tvtester1. Next marker 34_tVtester1_74
9900 objects processed in tvtester1. Next marker 9_tVtester1_95
1000 objects processed in tvtester2. Next marker 82_tVtester2_93
2000 objects processed in tvtester2. Next marker 64_tVtester2_9
3000 objects processed in tvtester2. Next marker 48_tVtester2_22
4000 objects processed in tvtester2. Next marker 32_tVtester2_42
5000 objects processed in tvtester2. Next marker 16_tVtester2_36
6000 objects processed in tvtester2. Next marker 89_tVtester2_46
7000 objects processed in tvtester2. Next marker 70_tVtester2_78
8000 objects processed in tvtester2. Next marker 51_tVtester2_41
9000 objects processed in tvtester2. Next marker 33_tVtester2_32
9900 objects processed in tvtester2. Next marker 9_tVtester2_83
{
"user_id": "test$user2",
"display_name": "User 2",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "test$user2",
"access_key": "user2",
"secret_key": "123456789"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw"
}

2. Verify that the user has been renamed successfully:

Syntax

radosgw-admin user info --uid=NEW_USER_NAME

Example

[ceph: root@host01 /]# radosgw-admin user info --uid=user2

If a user is inside a tenant, use the TENANT$USER_NAME format:

Syntax

radosgw-admin user info --uid=TENANT$USER_NAME



Example

[ceph: root@host01 /]# radosgw-admin user info --uid=test$user2

The screen(1) manual page

Create a key
Edit online
To create a key for a user, specify key create with the user ID and the s3 key type. To create a key for a
subuser, specify the subuser ID and the swift key type.

Example

[ceph: root@host01 /]# radosgw-admin key create --subuser=johndoe:swift --key-type=swift --gen-secret

{ "user_id": "johndoe",
"rados_uid": 0,
"display_name": "John Doe",
"email": "[email protected]",
"suspended": 0,
"subusers": [
{ "id": "johndoe:swift",
"permissions": "full-control"}],
"keys": [
{ "user": "johndoe",
"access_key": "QFAMEDSJP5DEKJO0DDXY",
"secret_key": "iaSFLDVvDdQt6lkNzHyW4fPLZugBAI1g17LO0+87"}],
"swift_keys": [
{ "user": "johndoe:swift",
"secret_key": "E9T2rUZNu2gxUjcwUBO8n/Ev4KX6/GprEuH4qhu1"}]}

Add and remove access keys


Edit online
Users and subusers must have access keys to use the S3 and Swift interfaces. When you create a user or subuser and you do not
specify an access key and secret, the key and secret get generated automatically. You may create a key and either specify or
generate the access key and/or secret. You may also remove an access key and secret. Options include:

--secret=SECRET_KEY specifies a secret key, for example, manually generated.

--gen-access-key generates a random access key (for S3 users by default).

--gen-secret generates a random secret key.

--key-type=KEY_TYPE specifies a key type. The options are: swift and s3.

To add a key, specify the user:

Example

[root@host01 ~]# radosgw-admin key create --uid=johndoe --key-type=s3 --gen-access-key --gen-secret

You might also specify a key and a secret.

To remove an access key, you need to specify the user and the key:

1. Find the access key for the specific user:

Example

[root@host01 ~]# radosgw-admin user info --uid=johndoe

The access key is the "access_key" value in the output:



Example

[root@host01 ~]# radosgw-admin user info --uid=johndoe


{
"user_id": "johndoe",
...
"keys": [
{
"user": "johndoe",
"access_key": "0555b35654ad1656d804",
"secret_key": "h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q=="
}
],
...
}

2. Specify the user ID and the access key from the previous step to remove the access key:

Syntax

radosgw-admin key rm --uid=USER_ID --access-key ACCESS_KEY

Example

[root@host01 ~]# radosgw-admin key rm --uid=johndoe --access-key 0555b35654ad1656d804

Add and remove admin capabilities


Edit online
The Ceph Storage Cluster provides an administrative API that enables users to run administrative functions through the REST API. By
default, users do not have access to this API. To enable a user to exercise administrative functionality, provide the user with
administrative capabilities.

To add administrative capabilities to a user, run the following command:

Syntax

radosgw-admin caps add --uid=USER_ID --caps=CAPS

You can add read, write, or all capabilities to users, buckets, metadata, and usage (utilization).

Syntax

--caps="[users|buckets|metadata|usage|zone]=[*|read|write|read, write]"

Example

[root@host01 ~]# radosgw-admin caps add --uid=johndoe --caps="users=*"

To remove administrative capabilities from a user, run the following command:

Example

[root@host01 ~]# radosgw-admin caps remove --uid=johndoe --caps="users=*"
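
After a user holds administrative capabilities, it can call the administrative REST API directly. The following sketch signs a GET /admin/user request with botocore's SigV4 signer and sends it with the requests library; the endpoint, the "default" region name, and the ADMIN_ACCESS_KEY/ADMIN_SECRET_KEY credentials are assumptions for illustration, not values taken from this procedure.

Example

import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.credentials import Credentials

# Placeholder endpoint and the key pair of a user that has, for example, users=* capabilities.
endpoint = "http://host01:80"
creds = Credentials("ADMIN_ACCESS_KEY", "ADMIN_SECRET_KEY")

# Query the user admin resource for johndoe; "default" is an assumed region/zonegroup name.
url = f"{endpoint}/admin/user?uid=johndoe&format=json"
request = AWSRequest(method="GET", url=url)
SigV4Auth(creds, "s3", "default").add_auth(request)

response = requests.get(url, headers=dict(request.headers))
print(response.status_code, response.json())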

Role management
Edit online
As a storage administrator, you can create, delete, or update a role and the permissions associated with that role with the radosgw-
admin commands.

A role is similar to a user and has permission policies attached to it. It can be assumed by any identity. If a user assumes a role, a set
of dynamically created temporary credentials are returned to the user. A role can be used to delegate access to users, applications
and services that do not have permissions to access some S3 resources.
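
For example, once a role exists and a permission policy is attached to it (both are shown in the following sections), a user named in the role's trust policy can request temporary credentials through the gateway's STS interface. The boto3 sketch below assumes that STS is enabled on the gateway and reuses the TESTER/test123 credentials and the S3Access1 role ARN from the examples in this section; treat the endpoint as a placeholder.

Example

import boto3

endpoint = "http://host01:80"

# Credentials of the user listed in the role's trust policy.
sts = boto3.client(
    "sts",
    endpoint_url=endpoint,
    aws_access_key_id="TESTER",
    aws_secret_access_key="test123",
)

# Assume the role and receive dynamically created temporary credentials.
response = sts.assume_role(
    RoleArn="arn:aws:iam:::role/application_abc/component_xyz/S3Access1",
    RoleSessionName="demo-session",
)
creds = response["Credentials"]

# Use the temporary credentials, including the session token, for S3 access.
s3 = boto3.client(
    "s3",
    endpoint_url=endpoint,
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_buckets())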

Creating a role



Getting a role
Listing a role
Updating assume role policy document of a role
Getting permission policy attached to a role
Listing permission policy attached to a role
Deleting policy attached to a role
Deleting a role
Updating the session duration of a role

Reference
Edit online

REST APIs for manipulating a role

Creating a role
Edit online
Create a role for the user with the radosgw-admin role create command. You need to create the role with the assume-role-policy-doc parameter in the command, which is the trust relationship policy document that grants an entity the permission to
assume the role.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

An S3 user created with user access.

Procedure
Edit online

Create the role:

Syntax

radosgw-admin role create --role-name=ROLE_NAME [--path="PATH"] [--assume-role-policy-doc=TRUST_RELATIONSHIP_POLICY_DOCUMENT]

Example

[root@host01 ~]# radosgw-admin role create --role-name=S3Access1 --path=/application_abc/component_xyz/ --assume-role-policy-doc={"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":["sts:AssumeRole"]}]}

{
"RoleId": "ca43045c-082c-491a-8af1-2eebca13deec",
"RoleName": "S3Access1",
"Path": "/application_abc/component_xyz/",
"Arn": "arn:aws:iam:::role/application_abc/component_xyz/S3Access1",
"CreateDate": "2022-06-17T10:18:29.116Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument": "{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":["sts:AssumeRole"]}]}"
}

The value for --path is / by default.
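
Because the trust relationship policy is a JSON document passed on the command line, it can help to build and validate it programmatically before quoting it for the shell. A small sketch using the same policy as the example above:

Example

import json

# Trust policy that allows the S3 user TESTER to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam:::user/TESTER"]},
            "Action": ["sts:AssumeRole"],
        }
    ],
}

# Print a compact, single-quoted form ready to paste after --assume-role-policy-doc=.
print("'" + json.dumps(trust_policy, separators=(",", ":")) + "'")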

Getting a role
Edit online
Get the information about a role with the get command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

A role created.

An S3 user created with user access.

Procedure
Edit online

Getting the information about the role:

Syntax

radosgw-admin role get --role-name=ROLE_NAME

Example

[root@host01 ~]# radosgw-admin role get --role-name=S3Access1

{
"RoleId": "ca43045c-082c-491a-8af1-2eebca13deec",
"RoleName": "S3Access1",
"Path": "/application_abc/component_xyz/",
"Arn": "arn:aws:iam:::role/application_abc/component_xyz/S3Access1",
"CreateDate": "2022-06-17T10:18:29.116Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument": "{"Version":"2012-10-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":
["sts:AssumeRole"]}]}"
}

Reference
Edit online

Creating a role

Listing a role
Edit online
You can list the roles in the specific path with the role list command.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

A role created.

An S3 user created with user access.

Procedure
Edit online

List the roles:

Syntax

radosgw-admin role list --role-name=ROLE_NAME [--path-prefix=PATH_PREFIX]

Example

[root@host01 ~]# radosgw-admin role list --role-name=S3Access1 --path-prefix="/application"

[
{
"RoleId": "85fb46dd-a88a-4233-96f5-4fb54f4353f7",
"RoleName": "kvm-sts",
"Path": "/application_abc/component_xyz/",
"Arn": "arn:aws:iam:::role/application_abc/component_xyz/kvm-sts",
"CreateDate": "2022-09-13T11:55:09.39Z",
"MaxSessionDuration": 7200,
"AssumeRolePolicyDocument": "{"Version":"2012-10-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/kvm"]},"Action":
["sts:AssumeRole"]}]}"
},
{
"RoleId": "9116218d-4e85-4413-b28d-cdfafba24794",
"RoleName": "kvm-sts-1",
"Path": "/application_abc/component_xyz/",
"Arn": "arn:aws:iam:::role/application_abc/component_xyz/kvm-sts-1",
"CreateDate": "2022-09-16T00:05:57.483Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument": "{"Version":"2012-10-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/kvm"]},"Action":
["sts:AssumeRole"]}]}"
}
]

Updating assume role policy document of a role


Edit online
You can update the assume role policy document that grants an entity permission to assume the role with the radosgw-admin role-trust-policy modify command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

A role created.

An S3 user created with user access.

Procedure
Edit online

Modify the assume role policy document of a role:

Syntax

radosgw-admin role-trust-policy modify --role-name=ROLE_NAME --assume-role-policy-doc=TRUST_RELATIONSHIP_POLICY_DOCUMENT

Example

[root@host01 ~]# radosgw-admin role-trust-policy modify --role-name=S3Access1 --assume-role-policy-doc={"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":["sts:AssumeRole"]}]}

{
"RoleId": "ca43045c-082c-491a-8af1-2eebca13deec",
"RoleName": "S3Access1",
"Path": "/application_abc/component_xyz/",
"Arn": "arn:aws:iam:::role/application_abc/component_xyz/S3Access1",
"CreateDate": "2022-06-17T10:18:29.116Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument": "{"Version":"2012-10-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":
["sts:AssumeRole"]}]}"
}

Getting permission policy attached to a role


Edit online
You can get the specific permission policy attached to a role with the get command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

A role created.

An S3 user created with user access.

Procedure
Edit online

Get the permission policy:



Syntax

radosgw-admin role-policy get --role-name=ROLE_NAME --policy-name=POLICY_NAME

Example

[root@host01 ~]# radosgw-admin role-policy get --role-name=S3Access1 --policy-name=Policy1

{
"Permission policy": "{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":
["s3:*"],"Resource":"arn:aws:s3:::example_bucket"}]}"
}

Listing permission policy attached to a role


Edit online
You can list the names of the permission policies attached to a role with the list command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

A role created.

An S3 user created with user access.

Procedure
Edit online

List the names of the permission policies:

Syntax

radosgw-admin role-policy list --role-name=ROLE_NAME

Example

[root@host01 ~]# radosgw-admin role-policy list --role-name=S3Access1

[
"Policy1"
]

Deleting policy attached to a role


Edit online
You can delete the permission policy attached to a role with the radosgw-admin role policy delete command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

A role created.

An S3 user created with user access.

Procedure
Edit online

Delete the permission policy:

Syntax

radosgw-admin role policy delete --role-name=ROLE_NAME --policy-name=POLICY_NAME

Example

[root@host01 ~]# radosgw-admin role policy delete --role-name=S3Access1 --policy-name=Policy1

Deleting a role
Edit online
You can delete the role only after removing the permission policy attached to it.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

A role created.

An S3 bucket created.

An S3 user created with user access.

Procedure
Edit online

1. Delete the policy attached to the role:

Syntax

radosgw-admin role policy delete --role-name=ROLE_NAME --policy-name=POLICY_NAME

Example

[root@host01 ~]# radosgw-admin role policy delete --role-name=S3Access1 --policy-name=Policy1

2. Delete the role:

Syntax

radosgw-admin role delete --role-name=ROLE_NAME



Example

[root@host01 ~]# radosgw-admin role delete --role-name=S3Access1

Reference
Edit online

Deleting policy attached to a role

Updating the session duration of a role


Edit online
You can update the session duration of a role with the update command to control the length of time that a user can be signed into
the account with the provided credentials.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

A role created.

An S3 user created with user access.

Procedure
Edit online

Update the max-session-duration using the update command:

Syntax

radosgw-admin role update --role-name=ROLE_NAME --max-session-duration=MAX_SESSION_DURATION_IN_SECONDS

Example

[root@node1 ~]# radosgw-admin role update --role-name=test-sts-role --max-session-duration=7200

Verification
Edit online

List the roles to verify the updates:

Example

[root@node1 ~]# radosgw-admin role list


[
{
"RoleId": "d4caf33f-caba-42f3-8bd4-48c84b4ea4d3",
"RoleName": "test-sts-role",
"Path": "/",
"Arn": "arn:aws:iam:::role/test-role",
"CreateDate": "2022-09-07T20:01:15.563Z",
"MaxSessionDuration": 7200, <<<<<<

"AssumeRolePolicyDocument": "{"Version":"2012-10-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/kvm"]},"Action":
["sts:AssumeRole"]}]}"
}
]

Quota management
Edit online
The Ceph Object Gateway enables you to set quotas on users and buckets owned by users. Quotas include the maximum number of
objects in a bucket and the maximum storage size in megabytes.

Bucket: The --bucket option allows you to specify a quota for buckets the user owns.

Maximum Objects: The --max-objects setting allows you to specify the maximum number of objects. A negative value
disables this setting.

Maximum Size: The --max-size option allows you to specify a quota for the maximum number of bytes. A negative value
disables this setting.

Quota Scope: The --quota-scope option sets the scope for the quota. The options are bucket and user. Bucket quotas
apply to buckets a user owns. User quotas apply to a user.

IMPORTANT: Buckets with a large number of objects can cause serious performance issues. The recommended maximum number
of objects in a one bucket is 100,000. To increase this number, configure bucket index sharding. See Configure bucket index
resharding

Set user quotas


Enable and disable user quotas
Set bucket quotas
Enable and disable bucket quotas
Get quota settings
Update quota stats
Get user quota usage stats
Quota cache
Reading and writing global quotas

Set user quotas


Edit online
Before you enable a quota, you must first set the quota parameters.

Syntax

radosgw-admin quota set --quota-scope=user --uid=USER_ID [--max-objects=NUMBER_OF_OBJECTS] [--max-size=MAXIMUM_SIZE_IN_BYTES]

Example

[root@host01 ~]# radosgw-admin quota set --quota-scope=user --uid=johndoe --max-objects=1024 --max-size=1024

A negative value for --max-objects, --max-size, or both means that the specific quota attribute check is disabled.

Enable and disable user quotas


Edit online
Once you set a user quota, you can enable it.



Syntax

radosgw-admin quota enable --quota-scope=user --uid=USER_ID

You may disable an enabled user quota.

Syntax

radosgw-admin quota disable --quota-scope=user --uid=USER_ID

Set bucket quotas


Edit online
Bucket quotas apply to the buckets owned by the specified uid. They are independent of the user.

Syntax

radosgw-admin quota set --uid=USER_ID --quota-scope=bucket --bucket=BUCKET_NAME [--max-objects=NUMBER_OF_OBJECTS] [--max-size=MAXIMUM_SIZE_IN_BYTES]

A negative value for NUMBER_OF_OBJECTS, MAXIMUM_SIZE_IN_BYTES, or both means that the specific quota attribute check is
disabled.
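
From the S3 side, writes that would exceed an enabled quota are rejected by the gateway (quotas are enabled in the following section). The sketch below keeps uploading objects until an error is returned; the endpoint, credentials, bucket name, and loop count are placeholders, and the exact error code reported can vary, so treat this only as an illustration of the effect of an enabled quota.

Example

import boto3
from botocore.exceptions import ClientError

# Placeholder endpoint and the credentials of the quota-limited user that owns the bucket.
s3 = boto3.client(
    "s3",
    endpoint_url="http://host01:80",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Attempt to write more objects than the --max-objects setting allows.
for i in range(2048):
    try:
        s3.put_object(Bucket="data", Key=f"object-{i}", Body=b"payload")
    except ClientError as err:
        print(f"upload {i} rejected: {err.response['Error']['Code']}")
        break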

Enable and disable bucket quotas


Edit online
Once you set a bucket quota, you may enable it.

Syntax

radosgw-admin quota enable --quota-scope=bucket --uid=USER_ID

You may disable an enabled bucket quota.

Syntax

radosgw-admin quota disable --quota-scope=bucket --uid=USER_ID

Get quota settings


Edit online
You may access each user’s quota settings via the user information API. To read user quota setting information with the CLI
interface, run the following command:

Syntax

radosgw-admin user info --uid=USER_ID

To get quota settings for a tenanted user, specify the user ID and the name of the tenant:

Syntax

radosgw-admin user info --uid=USER_ID --tenant=TENANT

Update quota stats


Edit online



Quota stats get updated asynchronously. You can update quota statistics for all users and all buckets manually to retrieve the latest
quota stats.

Syntax

radosgw-admin user stats --uid=USER_ID --sync-stats

Get user quota usage stats


Edit online
To see how much of the quota a user has consumed, run the following command:

Syntax

radosgw-admin user stats --uid=USER_ID

NOTE: You should run the radosgw-admin user stats command with the --sync-stats option to receive the latest data.

Quota cache
Edit online
Quota statistics are cached for each Ceph Object Gateway instance. If there are multiple instances, then the cache can keep quotas from
being perfectly enforced, because each instance has a different view of the quotas. The options that control this behavior are
rgw_bucket_quota_ttl, rgw_user_quota_bucket_sync_interval, and rgw_user_quota_sync_interval. The higher these values are,
the more efficient quota operations are, but the more out-of-sync multiple instances will be. The lower these values are, the closer to
perfect enforcement multiple instances will achieve. If all three are 0, then quota caching is effectively disabled, and multiple
instances will have perfect quota enforcement.

For more details on these options, see Configuration reference

Reading and writing global quotas


Edit online
You can read and write quota settings in a zonegroup map. To get a zonegroup map:

[root@host01 ~]# radosgw-admin global quota get

The global quota settings can be manipulated with the global quota counterparts of the quota set, quota enable, and quota
disable commands, for example:

[root@host01 ~]# radosgw-admin global quota set --quota-scope bucket --max-objects 1024
[root@host01 ~]# radosgw-admin global quota enable --quota-scope bucket

NOTE: In a multi-site configuration, where there is a realm and period present, changes to the global quotas must be committed
using period update --commit. If there is no period present, the Ceph Object Gateways must be restarted for the changes to
take effect.

Bucket management
Edit online
As a storage administrator, when using the Ceph Object Gateway you can manage buckets by moving them between users and
renaming them. You can create bucket notifications to trigger on specific events. Also, you can find orphan or leaky objects within the
Ceph Object Gateway that can occur over the lifetime of a storage cluster.

NOTE: When millions of objects are uploaded to a Ceph Object Gateway bucket with a high ingest rate, incorrect num_objects values can be
reported by the radosgw-admin bucket stats command. You can correct the value of the num_objects parameter with the
radosgw-admin bucket list command.

NOTE: The radosgw-admin bucket stats command does not return the Unknown error 2002 error; it explicitly translates the error to
POSIX error 2, "No such file or directory".

Renaming buckets
Moving buckets
Finding orphan and leaky objects
Managing bucket index entries
Bucket notifications
Creating bucket notifications

Reference
Edit online

Developer

Renaming buckets
Edit online
You can rename buckets. If you want to allow underscores in bucket names, then set the rgw_relaxed_s3_bucket_names option
to true.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

An existing bucket.

Procedure
Edit online

1. List the buckets:

Example

[ceph: root@host01 /]# radosgw-admin bucket list


[
"34150b2e9174475db8e191c188e920f6/swcontainer",
"s3bucket1",
"34150b2e9174475db8e191c188e920f6/swimpfalse",
"c278edd68cfb4705bb3e07837c7ad1a8/ec2container",
"c278edd68cfb4705bb3e07837c7ad1a8/demoten1",
"c278edd68cfb4705bb3e07837c7ad1a8/demo-ct",
"c278edd68cfb4705bb3e07837c7ad1a8/demopostup",
"34150b2e9174475db8e191c188e920f6/postimpfalse",
"c278edd68cfb4705bb3e07837c7ad1a8/demoten2",
"c278edd68cfb4705bb3e07837c7ad1a8/postupsw"
]

2. Rename the bucket:

Syntax

radosgw-admin bucket link --bucket=ORIGINAL_NAME --bucket-new-name=NEW_NAME --uid=USER_ID

Example



[ceph: root@host01 /]# radosgw-admin bucket link --bucket=s3bucket1 --bucket-new-name=s3newb --uid=testuser

If the bucket is inside a tenant, specify the tenant as well:

Syntax

radosgw-admin bucket link --bucket=tenant/ORIGINAL_NAME --bucket-new-name=NEW_NAME --uid=TENANT$USER_ID

Example

[ceph: root@host01 /]# radosgw-admin bucket link --bucket=test/s3bucket1 --bucket-new-name=s3newb --uid=test$testuser

3. Verify the bucket was renamed:

Example

[ceph: root@host01 /]# radosgw-admin bucket list


[
"34150b2e9174475db8e191c188e920f6/swcontainer",
"34150b2e9174475db8e191c188e920f6/swimpfalse",
"c278edd68cfb4705bb3e07837c7ad1a8/ec2container",
"s3newb",
"c278edd68cfb4705bb3e07837c7ad1a8/demoten1",
"c278edd68cfb4705bb3e07837c7ad1a8/demo-ct",
"c278edd68cfb4705bb3e07837c7ad1a8/demopostup",
"34150b2e9174475db8e191c188e920f6/postimpfalse",
"c278edd68cfb4705bb3e07837c7ad1a8/demoten2",
"c278edd68cfb4705bb3e07837c7ad1a8/postupsw"
]

Moving buckets
Edit online
The radosgw-admin bucket utility provides the ability to move buckets between users. To do so, link the bucket to a new user and
change the ownership of the bucket to the new user.

You can move buckets:

Between two non-tenanted users

Between two tenanted users

Between a non-tenanted user to a tenanted user

Moving buckets between non-tenanted users


Moving buckets between tenanted users
Moving buckets from non-tenanted users to tenanted users

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Ceph Object Gateway is installed.

An S3 bucket.

Various tenanted and non-tenanted users.

Moving buckets between non-tenanted users



Edit online
The radosgw-admin bucket chown command provides the ability to change the ownership of buckets and all objects they
contain from one user to another. To do so, unlink a bucket from the current user, link it to a new user, and change the ownership of
the bucket to the new user.

Procedure
Edit online

1. Link the bucket to a new user:

Syntax

radosgw-admin bucket link --uid=USER --bucket=BUCKET

Example

[ceph: root@host01 /]# radosgw-admin bucket link --uid=user2 --bucket=data

2. Verify that the bucket has been linked to user2 successfully:

Example

[ceph: root@host01 /]# radosgw-admin bucket list --uid=user2


[
"data"
]

3. Change the ownership of the bucket to the new user:

Syntax

radosgw-admin bucket chown --uid=USER --bucket=BUCKET

Example

[ceph: root@host01 /]# radosgw-admin bucket chown --uid=user2 --bucket=data

4. Verify that the ownership of the data bucket has been successfully changed by checking the owner line in the output of the
following command:

Example

[ceph: root@host01 /]# radosgw-admin bucket list --bucket=data

Moving buckets between tenanted users


Edit online
You can move buckets between one tenanted user and another.

Procedure
Edit online

1. Link the bucket to a new user:

Syntax

radosgw-admin bucket link --bucket=CURRENT_TENANT/BUCKET --uid=NEW_TENANT$USER

Example

[ceph: root@host01 /]# radosgw-admin bucket link --bucket=test/data --uid=test2$user2

2. Verify that the bucket has been linked to user2 successfully:



[ceph: root@host01 /]# radosgw-admin bucket list --uid=test2$user2
[
"data"
]

3. Change the ownership of the bucket to the new user:

Syntax

radosgw-admin bucket chown --bucket=NEW_TENANT/BUCKET --uid=NEW_TENANT$USER

Example

[ceph: root@host01 /]# radosgw-admin bucket chown --bucket='test2/data' --uid='test2$user2'

4. Verify that the ownership of the data bucket has been successfully changed by checking the owner line in the output of the
following command:

[ceph: root@host01 /]# radosgw-admin bucket list --bucket=test2/data

Moving buckets from non-tenanted users to tenanted users


Edit online
You can move buckets from a non-tenanted user to a tenanted user.

Procedure
Edit online

1. Optional: If you do not already have multiple tenants, you can create them by enabling rgw_keystone_implicit_tenants
and accessing the Ceph Object Gateway from an external tenant:

Enable the rgw_keystone_implicit_tenants option:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_keystone_implicit_tenants true

Access the Ceph Object Gateway from an external tenant using either the s3cmd or swift command:

Example

[ceph: root@host01 /]# swift list

Or use s3cmd:

Example

[ceph: root@host01 /]# s3cmd ls

The first access from an external tenant creates an equivalent Ceph Object Gateway user.

2. Move a bucket to a tenanted user:

Syntax

radosgw-admin bucket link --bucket=/BUCKET --uid=TENANT$USER

Example

[ceph: root@host01 /]# radosgw-admin bucket link --bucket=/data --uid='test$tenanted-user'

3. Verify that the data bucket has been linked to tenanted-user successfully:

Example

[ceph: root@host01 /]# radosgw-admin bucket list --uid='test$tenanted-user'


[



"data"
]

4. Change the ownership of the bucket to the new user:

Syntax

radosgw-admin bucket chown --bucket=TENANT/BUCKET_NAME --uid=TENANT$USER

Example

[ceph: root@host01 /]# radosgw-admin bucket chown --bucket='test/data' --uid='test$tenanted-user'

5. Verify that the ownership of the data bucket has been successfully changed by checking the owner line in the output of the
following command:

Example

[ceph: root@host01 /]# radosgw-admin bucket list --bucket=test/data

Finding orphan and leaky objects


Edit online
A healthy storage cluster does not have any orphan or leaky objects, but in some cases orphan or leaky objects can occur.

An orphan object exists in a storage cluster and has an object ID associated with the RADOS object. However, there is no reference of
the RADOS object with the S3 object in the bucket index reference.

For example, if the Ceph Object Gateway goes down in the middle of an operation, this can cause some objects to become orphans.
Also, an undiscovered bug can cause orphan objects to occur.

You can see how the Ceph Object Gateway objects map to the RADOS objects. The radosgw-admin command provides a tool to
search for and produce a list of these potential orphan or leaky objects. Using the radoslist subcommand displays objects stored
within buckets, or all buckets in the storage cluster. The rgw-orphan-list script displays orphan objects within a pool.

NOTE: The radoslist subcommand is replacing the deprecated orphans find and orphans finish subcommands.

IMPORTANT: Do not use this command where Indexless buckets are in use as all the objects appear as orphaned.

IMPORTANT: An alternative way to identify orphaned objects is to run the rados -p <pool> ls | grep BUCKET_ID command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A running Ceph Object Gateway.

Procedure
Edit online

1. Generate a list of objects that hold data within a bucket.

Syntax

radosgw-admin bucket radoslist --bucket BUCKET_NAME

Example

[root@host01 ~]# radosgw-admin bucket radoslist --bucket mybucket

NOTE: If the BUCKET_NAME is omitted, then all objects in all buckets are displayed.



2. Check the version of rgw-orphan-list.

Example

[root@host01 ~]# head /usr/bin/rgw-orphan-list

The version should be 2023-01-11 or newer.

3. Create a directory where you need to generate the list of orphans.

Example

[root@host01 ~]# mkdir orphans

4. Navigate to the directory created earlier.

Example

[root@host01 ~]# cd orphans

5. From the pool list, select the pool in which you want to find orphans. This script might run for a long time depending on the
objects in the cluster.

Example

[root@host01 orphans]# rgw-orphan-list

Example

Available pools:
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data
rbd
default.rgw.buckets.non-ec
ma.rgw.control
ma.rgw.meta
ma.rgw.log
ma.rgw.buckets.index
ma.rgw.buckets.data
ma.rgw.buckets.non-ec
Which pool do you want to search for orphans?

Enter the pool name to search for orphans.

IMPORTANT: A data pool must be specified when using the rgw-orphan-list command, and not a metadata pool.

6. View the details of the rgw-orphan-list tool usage.

Syntax

rgw-orphan-list -h
rgw-orphan-list POOL_NAME /DIRECTORY

Example

[root@host01 orphans]# rgw-orphan-list default.rgw.buckets.data /orphans

2023-09-12 08:41:14 ceph-host01 Computing delta...


2023-09-12 08:41:14 ceph-host01 Computing results...
10 potential orphans found out of a possible 2412 (0%). <<<<<<< orphans detected
The results can be found in './orphan-list-20230912124113.out'.
Intermediate files are './rados-20230912124113.intermediate' and './radosgw-admin-
20230912124113.intermediate'.
***
*** WARNING: This is EXPERIMENTAL code and the results should be used
*** only with CAUTION!
***
Done at 2023-09-12 08:41:14.



7. Run the ls -l command and verify that the files ending in error have zero length, which indicates that the script ran without any
issues.

Example

[root@host01 orphans]# ls -l

-rw-r--r--. 1 root root 770 Sep 12 03:59 orphan-list-20230912075939.out


-rw-r--r--. 1 root root 0 Sep 12 03:59 rados-20230912075939.error
-rw-r--r--. 1 root root 248508 Sep 12 03:59 rados-20230912075939.intermediate
-rw-r--r--. 1 root root 0 Sep 12 03:59 rados-20230912075939.issues
-rw-r--r--. 1 root root 0 Sep 12 03:59 radosgw-admin-20230912075939.error
-rw-r--r--. 1 root root 247738 Sep 12 03:59 radosgw-admin-20230912075939.intermediate

8. Review the orphan objects listed.

Example

[root@host01 orphans]# cat ./orphan-list-20230912124113.out

a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.0
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.1
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.2
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.3
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.4
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.5
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.6
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.7
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.8
a9c042bc-be24-412c-9052-dda6b2f01f55.16749.1_key1.cherylf.433-bucky-4865-0.9

9. Remove orphan objects:

Syntax

rados -p POOL_NAME rm OBJECT_NAME

Example

[root@host01 orphans]# rados -p default.rgw.buckets.data rm myobject

WARNING: Verify you are removing the correct objects. Running the rados rm command removes data from the storage
cluster.

Managing bucket index entries


Edit online
Each bucket index entry related to a piece of a multipart upload object is matched against its corresponding .meta index entry.
There should be one .meta entry for all the pieces of a given multipart upload. If it fails to find a corresponding .meta entry for a
piece, it lists out the "orphaned" piece entries in a section of the output.

The stats for the bucket are stored in the bucket index headers. This phase loads those headers and also iterates through all the
plain object entries in the bucket index and recalculates the stats. It then displays the actual and calculated stats in sections labeled
"existing_header" and "calculated_header" respectively, so they can be compared.

If you use the --fix option with the bucket check sub-command, it removes the "orphaned" entries from the bucket index and
also overwrites the existing stats in the header with those that it calculated. It causes all entries, including the multiple entries used
in versioning, to be listed in the output.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A running Ceph Object Gateway.



A newly created bucket.

Procedure
Edit online

1. Check the bucket index of a specific bucket:

Syntax

radosgw-admin bucket check --bucket=BUCKET_NAME

Example

[root@rgw ~]# radosgw-admin bucket check --bucket=mybucket

2. Fix the inconsistencies in the bucket index, including removal of orphaned objects:

Syntax

radosgw-admin bucket check --fix --bucket=BUCKET_NAME

Example

[root@rgw ~]# radosgw-admin bucket check --fix --bucket=mybucket

Bucket notifications
Edit online
Bucket notifications provide a way to send information out of the Ceph Object Gateway when certain events happen in the bucket.
Bucket notifications can be sent to HTTP, AMQP0.9.1, and Kafka endpoints. A notification entry must be created to send bucket
notifications for events on a specific bucket and to a specific topic. A bucket notification can be created on a subset of event types or
by default for all event types. The bucket notification can filter out events based on key prefix or suffix, regular expression matching
the keys, and on the metadata attributes attached to the object, or the object tags. Bucket notifications have a REST API to provide
configuration and control interfaces for the bucket notification mechanism.

NOTE: The bucket notifications API is enabled by default. If the rgw_enable_apis configuration parameter is explicitly set, ensure that
s3 and notifications are included. To verify this, run the ceph --admin-daemon /var/run/ceph/ceph-client.rgw.NAME.asok config get rgw_enable_apis command. Replace NAME with the Ceph Object Gateway instance name.

Topic management using CLI

You can manage list, get, and remove topics for the Ceph Object Gateway buckets:

List topics: Run the following command to list the configuration of all topics:

Example

[ceph: host01 /]# radosgw-admin topic list

Get topics: Run the following command to get the configuration of a specific topic:

Example

[ceph: host01 /]# radosgw-admin topic get --topic=topic1

Remove topics: Run the following command to remove the configuration of a specific topic:

Example

[ceph: host01 /]# radosgw-admin topic rm --topic=topic1

NOTE: The topic is removed even if the Ceph Object Gateway bucket is configured to that topic.
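
Topics can also be inspected and removed from a client through the SNS interface, which is how the boto scripts later in this section create them. The following is a minimal boto3 sketch reusing the endpoint and keys shown in those scripts; the topic ARN follows the arn:aws:sns:ZONEGROUP::TOPIC pattern, and mytopic is an assumed topic name.

Example

import boto3
from botocore.client import Config

# Endpoint and keys match the boto scripts later in this section (placeholders).
sns = boto3.client(
    "sns",
    endpoint_url="http://127.0.0.1:8000",
    aws_access_key_id="0555b35654ad1656d804",
    aws_secret_access_key="h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q==",
    config=Config(signature_version="s3"),
)

# List all topics, inspect one, and remove it.
print(sns.list_topics()["Topics"])
print(sns.get_topic_attributes(TopicArn="arn:aws:sns:default::mytopic"))
sns.delete_topic(TopicArn="arn:aws:sns:default::mytopic")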



Creating bucket notifications
Edit online
Create bucket notifications at the bucket level. These need to be published with the destination to send the bucket notifications.
Bucket notifications are S3 operations.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with Ceph Object Gateway.

A running HTTP server, RabbitMQ server, or a Kafka server.

Root-level access.

User access key and secret key.

Endpoint parameters.

IMPORTANT: IBM supports ObjectCreate events, such as put, post, multipartUpload, and copy. IBM also supports
ObjectRemove events, such as object_delete and s3_multi_object_delete.

Listed here are two ways of creating bucket notifications:

Using the boto script

Using AWS CLI

Procedure
Edit online
Using the boto script

1. Install the python3-boto3 package:

Example

[user@client ~]$ dnf install python3-boto3

2. Create an S3 bucket.

3. Create a python script topic.py to create an SNS topic for http,amqp, or kafka protocol:

Example

import boto3
from botocore.client import Config
import sys

# endpoint and keys from vstart
endpoint = 'http://127.0.0.1:8000'
access_key='0555b35654ad1656d804'
secret_key='h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q=='

client = boto3.client('sns',
endpoint_url=endpoint,
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
config=Config(signature_version='s3'))

# topic attributes: push endpoint, exchange, and acknowledgement level for AMQP
attributes = {"push-endpoint": "amqp://localhost:5672", "amqp-exchange": "ex1", "amqp-ack-level": "broker"}

client.create_topic(Name="mytopic", Attributes=attributes)

4. Run the python script for creating topic:



Example

python3 topic.py

5. Create a python script notification.py to create S3 bucket notification for s3:objectCreate and s3:objectRemove
events:

Example

import boto3
import sys

# bucket name as first argument
bucketname = sys.argv[1]
# topic ARN as second argument
topic_arn = sys.argv[2]
# notification id as third argument
notification_id = sys.argv[3]

# endpoint and keys from vstart
endpoint = 'http://127.0.0.1:8000'
access_key='0555b35654ad1656d804'
secret_key='h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q=='

client = boto3.client('s3',
endpoint_url=endpoint,
aws_access_key_id=access_key,
aws_secret_access_key=secret_key)

# regex filter on the object name and metadata based filtering are extensions to the AWS S3 API
# bucket and topic should be created beforehand
topic_conf_list = [{'Id': notification_id,
                    'TopicArn': topic_arn,
                    'Events': ['s3:ObjectCreated:*', 's3:ObjectRemoved:*'],
                   }]

client.put_bucket_notification_configuration(
    Bucket=bucketname,
    NotificationConfiguration={'TopicConfigurations': topic_conf_list})

6. Run the python script, passing the bucket name, topic ARN, and notification ID as arguments, to create the bucket notification:

Syntax

python3 notification.py BUCKET_NAME TOPIC_ARN NOTIFICATION_ID

7. Create S3 objects in the bucket.

8. Fetch the notification configuration:

Example

import boto3
import sys

# bucket name as first argument
bucketname = sys.argv[1]

# endpoint and keys from vstart
endpoint = 'http://127.0.0.1:8000'
access_key='0555b35654ad1656d804'
secret_key='h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q=='

client = boto3.client('s3',
endpoint_url=endpoint,
aws_access_key_id=access_key,
aws_secret_access_key=secret_key)

# getting a specific notification configuration is an extension to the AWS S3 API
print(client.get_bucket_notification_configuration(Bucket=bucketname))

9. Optional: Delete the objects.

a. Verify the object deletion events at the http, rabbitmq, or kafka receiver.



Using the AWS CLI

1. Create topic:

Syntax

aws --endpoint=AWS_ENDPOINT sns create-topic --name NAME --attributes=ATTRIBUTES_FILE

Example

[user@client ~]$ aws --endpoint=http://localhost sns create-topic --name test-kafka --attributes=file://topic.json

sample topic.json:
{"push-endpoint": "kafka://localhost", "verify-ssl": "False", "kafka-ack-level": "broker", "persistent": "true"}

ref: https://docs.aws.amazon.com/cli/latest/reference/sns/create-topic.html

2. Create the bucket notification:

Syntax

aws s3api put-bucket-notification-configuration --bucket BUCKET_NAME --notification-configuration NOTIFICATION_FILE

Example

[user@client ~]$ aws s3api put-bucket-notification-configuration --bucket my-bucket --notification-configuration file://notification.json

sample notification.json:
{
"TopicConfigurations": [
{
"Id": "test_notification",
"TopicArn": "arn:aws:sns:us-west-2:123456789012:test-kafka",
"Events": [
"s3:ObjectCreated:*"
]
}
]
}

3. Fetch the notification configuration:

Syntax

aws s3api --endpoint=AWS_ENDPOINT get-bucket-notification-configuration --bucket BUCKET_NAME

Example

[user@client ~]$ aws s3api --endpoint=http://localhost get-bucket-notification-configuration --bucket my-bucket
{
"TopicConfigurations": [
{
"Id": "test_notification",
"TopicArn": "arn:aws:sns:default::test-kafka",
"Events": [
"s3:ObjectCreated:*"
]
}
]
}

Bucket lifecycle
Edit online
As a storage administrator, you can use a bucket lifecycle configuration to manage your objects so they are stored effectively
throughout their lifetime. For example, you can transition objects to less expensive storage classes, archive, or even delete them



based on your use case.

RADOS Gateway supports S3 API object expiration by using rules defined for a set of bucket objects. Each rule has a prefix, which
selects the objects, and a number of days after which objects become unavailable.

Creating a lifecycle management policy


Deleting a lifecycle management policy
Updating a lifecycle management policy
Monitoring bucket lifecycles
Configuring lifecycle expiration window
S3 bucket lifecycle transition within a storage cluster
Transitioning an object from one storage class to another
Enabling object lock for S3

Creating a lifecycle management policy


Edit online
You can manage a bucket lifecycle policy configuration using standard S3 operations rather than using the radosgw-admin
command. RADOS Gateway supports only a subset of the Amazon S3 API policy language applied to buckets. The lifecycle
configuration contains one or more rules defined for a set of bucket objects.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

An S3 user created with user access.

Access to a Ceph Object Gateway client with the AWS CLI package installed.

Procedure
Edit online

1. Create a JSON file for lifecycle configuration:

Example

[user@client ~]$ vi lifecycle.json

2. Add the specific lifecycle configuration rules in the file:

Example

{
"Rules": [
{
"Filter": {
"Prefix": "images/"
},
"Status": "Enabled",
"Expiration": {
"Days": 1
},
"ID": "ImageExpiration"
}
]
}



The lifecycle configuration example expires objects in the images directory after 1 day.

3. Set the lifecycle configuration on the bucket:

Syntax

aws --endpoint-url=RADOSGW_ENDPOINT_URL:PORT s3api put-bucket-lifecycle-configuration --bucket BUCKET_NAME --lifecycle-configuration file://PATH_TO_LIFECYCLE_CONFIGURATION_FILE/LIFECYCLE_CONFIGURATION_FILE.json

Example

[user@client ~]$ aws --endpoint-url=http://host01:80 s3api put-bucket-lifecycle-configuration --bucket testbucket --lifecycle-configuration file://lifecycle.json

In this example, the lifecycle.json file exists in the current directory.

4. Retrieve the lifecycle configuration for the bucket:

Syntax

aws --endpoint-url=RADOSGW_ENDPOINT_URL:PORT s3api get-bucket-lifecycle-configuration --bucket BUCKET_NAME

Example

[user@client ~]$ aws --endpoint-url=http://host01:80 s3api get-bucket-lifecycle-configuration --bucket testbucket
{
"Rules": [
{
"Expiration": {
"Days": 1
},
"ID": "ImageExpiration",
"Filter": {
"Prefix": "images/"
},
"Status": "Enabled"
}
]
}

5. Optional: From the Ceph Object Gateway node, log into the Cephadm shell and retrieve the bucket lifecycle configuration:

Syntax

radosgw-admin lc get --bucket=BUCKET_NAME

Example

[ceph: root@host01 /]# radosgw-admin lc get --bucket=testbucket


{
"prefix_map": {
"images/": {
"status": true,
"dm_expiration": false,
"expiration": 1,
"noncur_expiration": 0,
"mp_expiration": 0,
"transitions": {},
"noncur_transitions": {}
}
},
"rule_map": [
{
"id": "ImageExpiration",
"rule": {
"id": "ImageExpiration",
"prefix": "",
"status": "Enabled",
"expiration": {
"days": "1",
"date": ""
},

"mp_expiration": {
"days": "",
"date": ""
},
"filter": {
"prefix": "images/",
"obj_tags": {
"tagset": {}
}
},
"transitions": {},
"noncur_transitions": {},
"dm_expiration": false
}
}
]
}
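
The same lifecycle rule can also be applied and read back with boto3 instead of the AWS CLI. A minimal sketch, assuming the endpoint and bucket used in this procedure and placeholder credentials for the bucket owner:

Example

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://host01:80",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Same rule as lifecycle.json: expire objects under images/ after one day.
lifecycle = {
    "Rules": [
        {
            "ID": "ImageExpiration",
            "Filter": {"Prefix": "images/"},
            "Status": "Enabled",
            "Expiration": {"Days": 1},
        }
    ]
}

s3.put_bucket_lifecycle_configuration(Bucket="testbucket", LifecycleConfiguration=lifecycle)
print(s3.get_bucket_lifecycle_configuration(Bucket="testbucket")["Rules"])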

Reference
Edit online

S3 bucket lifecycle

For more information on using the AWS CLI to manage lifecycle configurations, see the Setting lifecycle configuration on a
bucket section of the Amazon Simple Storage Service documentation.

Deleting a lifecycle management policy


Edit online
You can delete the lifecycle management policy for a specified bucket by using the s3api delete-bucket-lifecycle
command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

An S3 user created with user access.

Access to a Ceph Object Gateway client with the AWS CLI package installed.

Procedure
Edit online

Delete a lifecycle configuration:

Syntax

aws --endpoint-url=RADOSGW_ENDPOINT_URL:PORT s3api delete-bucket-lifecycle --bucket BUCKET_NAME

Example

[user@client ~]$ aws --endpoint-url=http://host01:80 s3api delete-bucket-lifecycle --bucket testbucket



Verification
Edit online

Retrieve lifecycle configuration for the bucket:

Syntax

aws --endpoint-url=RADOSGW_ENDPOINT_URL:PORT s3api get-bucket-lifecycle-configuration --bucket BUCKET_NAME

Example

[user@client ~]$ aws --endpoint-url=http://host01:80 s3api get-bucket-lifecycle-configuration --bucket testbucket

Optional: From the Ceph Object Gateway node, retrieve the bucket lifecycle configuration:

Syntax

radosgw-admin lc get --bucket=BUCKET_NAME

Example

[ceph: root@host01 /]# radosgw-admin lc get --bucket=testbucket

The command does not return any information if a bucket lifecycle policy is not present.

Updating a lifecycle management policy


Edit online
You can update a lifecycle management policy by using the s3api put-bucket-lifecycle-configuration command.

NOTE: The put-bucket-lifecycle-configuration overwrites an existing bucket lifecycle configuration. If you want to retain
any of the current lifecycle policy settings, you must include them in the lifecycle configuration file.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

An S3 bucket created.

An S3 user created with user access.

Access to a Ceph Object Gateway client with the AWS CLI package installed.

Procedure
Edit online

1. Create a JSON file for the lifecycle configuration:

Example

[user@client ~]$ vi lifecycle.json

2. Add the specific lifecycle configuration rules to the file:

Example



{
"Rules": [
{
"Filter": {
"Prefix": "images/"
},
"Status": "Enabled",
"Expiration": {
"Days": 1
},
"ID": "ImageExpiration"
},
{
"Filter": {
"Prefix": "docs/"
},
"Status": "Enabled",
"Expiration": {
"Days": 30
},
"ID": "DocsExpiration"
}
]
}

3. Update the lifecycle configuration on the bucket:

Syntax

aws --endpoint-url=RADOSGW_ENDPOINT_URL:PORT s3api put-bucket-lifecycle-configuration --bucket BUCKET_NAME --lifecycle-configuration file://PATH_TO_LIFECYCLE_CONFIGURATION_FILE/LIFECYCLE_CONFIGURATION_FILE.json

Example

[user@client ~]$ aws --endpoint-url=http://host01:80 s3api put-bucket-lifecycle-configuration --bucket testbucket --lifecycle-configuration file://lifecycle.json

4. Retrieve the lifecycle configuration for the bucket:

Syntax

aws --endpoint-url=RADOSGW_ENDPOINT_URL:PORT s3api get-bucket-lifecycle-configuration --bucket BUCKET_NAME

Example

[user@client ~]$ aws --endpoint-url=http://host01:80 s3api get-bucket-lifecycle-configuration --bucket testbucket

{
"Rules": [
{
"Expiration": {
"Days": 30
},
"ID": "DocsExpiration",
"Filter": {
"Prefix": "docs/"
},
"Status": "Enabled"
},
{
"Expiration": {
"Days": 1
},
"ID": "ImageExpiration",
"Filter": {
"Prefix": "images/"
},
"Status": "Enabled"
}
]
}



5. Optional: From the Ceph Object Gateway node, log into the Cephadm shell and retrieve the bucket lifecycle configuration:

Syntax

radosgw-admin lc get --bucket=BUCKET_NAME

Example

[ceph: root@host01 /]# radosgw-admin lc get --bucket=testbucket


{
"prefix_map": {
"docs/": {
"status": true,
"dm_expiration": false,
"expiration": 1,
"noncur_expiration": 0,
"mp_expiration": 0,
"transitions": {},
"noncur_transitions": {}
},
"images/": {
"status": true,
"dm_expiration": false,
"expiration": 1,
"noncur_expiration": 0,
"mp_expiration": 0,
"transitions": {},
"noncur_transitions": {}
}
},
"rule_map": [
{
"id": "DocsExpiration",
"rule": {
"id": "DocsExpiration",
"prefix": "",
"status": "Enabled",
"expiration": {
"days": "30",
"date": ""
},
"noncur_expiration": {
"days": "",
"date": ""
},
"mp_expiration": {
"days": "",
"date": ""
},
"filter": {
"prefix": "docs/",
"obj_tags": {
"tagset": {}
}
},
"transitions": {},
"noncur_transitions": {},
"dm_expiration": false
}
},
{
"id": "ImageExpiration",
"rule": {
"id": "ImageExpiration",
"prefix": "",
"status": "Enabled",
"expiration": {
"days": "1",
"date": ""
},
"mp_expiration": {
"days": "",
"date": ""
},
"filter": {

"prefix": "images/",
"obj_tags": {
"tagset": {}
}
},
"transitions": {},
"noncur_transitions": {},
"dm_expiration": false
}
}
]
}

Monitoring bucket lifecycles


Edit online
You can monitor lifecycle processing and manually process the lifecycle of buckets with the radosgw-admin lc list and
radosgw-admin lc process commands.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to a Ceph Object Gateway node.

Creation of an S3 bucket with a lifecycle configuration policy applied.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. List bucket lifecycle progress:

Example

[ceph: root@host01 /]# radosgw-admin lc list

[
{
"bucket": ":testbucket:8b63d584-9ea1-4cf3-8443-a6a15beca943.54187.1",
"started": "Thu, 01 Jan 1970 00:00:00 GMT",
"status": "UNINITIAL"
},
{
"bucket": ":testbucket1:8b635499-9e41-4cf3-8443-a6a15345943.54187.2",
"started": "Thu, 01 Jan 1970 00:00:00 GMT",
"status": "UNINITIAL"
}
]

The bucket lifecycle processing status can be one of the following:

UNINITIAL - The process has not run yet.

PROCESSING - The process is currently running.

COMPLETE - The process has completed.

3. Optional: You can manually process bucket lifecycle policies:



a. Process the lifecycle policy for a single bucket:

Syntax

radosgw-admin lc process --bucket=BUCKET_NAME

Example

[ceph: root@host01 /]# radosgw-admin lc process --bucket=testbucket1

b. Process all bucket lifecycle policies immediately:

Example

[ceph: root@host01 /]# radosgw-admin lc process

Verification
Edit online

List the bucket lifecycle policies:

[ceph: root@host01 /]# radosgw-admin lc list


[
{
"bucket": ":testbucket:8b63d584-9ea1-4cf3-8443-a6a15beca943.54187.1",
"started": "Thu, 17 Mar 2022 21:48:50 GMT",
"status": "COMPLETE"
},
{
"bucket": ":testbucket1:8b635499-9e41-4cf3-8443-a6a15345943.54187.2",
"started": "Thu, 17 Mar 2022 20:38:50 GMT",
"status": "COMPLETE"
}
]

Reference
Edit online

S3 bucket lifecycle

Configuring lifecycle expiration window


Edit online
You can set the time that the lifecycle management process runs each day by setting the rgw_lifecycle_work_time parameter.
By default, lifecycle processing occurs once per day, at midnight.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A Ceph Object Gateway node.

Installation of the Ceph Object Gateway.

Root-level access to a Ceph Object Gateway node.

Procedure
Edit online



1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Set the lifecycle expiration time:

Syntax

ceph config set client.rgw rgw_lifecycle_work_time %D:%D-%D:%D

Replace %D:%D-%D:%D with start_hour:start_minute-end_hour:end_minute.

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_lifecycle_work_time 06:00-08:00

Verification
Edit online

Retrieve the lifecycle expiration work time:

Example

[ceph: root@host01 /]# ceph config get client.rgw rgw_lifecycle_work_time

06:00-08:00

Reference
Edit online

S3 bucket lifecycle

S3 bucket lifecycle transition within a storage cluster


Edit online
You can use a bucket lifecycle configuration to manage objects so that they are stored effectively throughout their lifetime. The object lifecycle transition rule lets you transition objects to less expensive storage classes, archive them, or even delete them.

You can create storage classes for:

Fast media, such as SSD or NVMe for I/O sensitive workloads.

Slow magnetic media, such as SAS or SATA for archiving.

You can create a schedule for data movement between a hot storage class and a cold storage class, and you can also schedule an object to expire and be deleted permanently after a specified time. For example, you can transition objects to a different storage class 30 days after creating them, or archive them to a cold storage class one year after creating them. You do this through a transition rule, which applies to an object transitioning from one storage class to another. The lifecycle configuration contains one or more rules using the <Rule> element.
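For example, a minimal lifecycle configuration with a single transition rule might look like the following sketch. It assumes a storage class named hot.test already exists (one is created in the procedure that follows), and the rule ID is illustrative:

Example

{
    "Rules": [
        {
            "ID": "transition-to-hot",
            "Filter": {
                "Prefix": ""
            },
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 30,
                    "StorageClass": "hot.test"
                }
            ]
        }
    ]
}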

Transitioning an object from one storage class to another


Edit online
The object lifecycle transition rule allows you to transition an object from one storage class to another class.

You can migrate data between replicated pools, erasure-coded pools, replicated to erasure-coded pools, or erasure-coded to
replicated pools with the Ceph Object Gateway lifecycle transition policy.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Root-level access to the Ceph Object Gateway node.

An S3 user created with user access.

Procedure
Edit online

1. Create a new data pool:

Syntax

ceph osd pool create POOL_NAME

Example

[ceph: root@host01 /]# ceph osd pool create test.hot.data

2. Add a new storage class:

Syntax

radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id PLACEMENT_TARGET --storage-class STORAGE_CLASS

Example

[ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id default-placement --storage-class hot.test
{
"key": "default-placement",
"val": {
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD",
"hot.test"
]
}
}

3. Provide the zone placement information for the new storage class:

Syntax

radosgw-admin zone placement add --rgw-zone default --placement-id PLACEMENT_TARGET --storage-class STORAGE_CLASS --data-pool DATA_POOL

Example

[ceph: root@host01 /]# radosgw-admin zone placement add --rgw-zone default --placement-id
default-placement --storage-class hot.test --data-pool test.hot.data
{
"key": "default-placement",
"val": {
"index_pool": "test_zone.rgw.buckets.index",
"storage_classes": {
"STANDARD": {
"data_pool": "test.hot.data"
},
"hot.test": {
"data_pool": "test.hot.data",
}
},
"data_extra_pool": "",
"index_type": 0
}
}

NOTE: Consider setting the compression_type when creating cold or archival data storage pools that are written once.
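As a sketch of one way to do this (not part of the original procedure), compression can be enabled on a storage class by modifying its zone placement entry; the storage class and compression algorithm shown here are illustrative:

Syntax

radosgw-admin zone placement modify --rgw-zone default --placement-id PLACEMENT_TARGET --storage-class STORAGE_CLASS --compression COMPRESSION_TYPE

Example

[ceph: root@host01 /]# radosgw-admin zone placement modify --rgw-zone default --placement-id default-placement --storage-class cold.test --compression lz4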

4. Enable the rgw application on the data pool:

Syntax

ceph osd pool application enable POOL_NAME rgw

Example

[ceph: root@host01 /]# ceph osd pool application enable test.hot.data rgw
enabled application 'rgw' on pool 'test.hot.data'

5. Restart all the rgw daemons.

6. Create a bucket:

Example

[ceph: root@host01 /]# aws s3api create-bucket --bucket testbucket10 --create-bucket-configuration LocationConstraint=default:default-placement --endpoint-url http://10.0.0.80:8080

7. Add the object:

Example

[ceph: root@host01 /]# aws --endpoint=http://10.0.0.80:8080 s3api put-object --bucket testbucket10 --key compliance-upload --body /root/test2.txt

8. Create a second data pool:

Syntax

ceph osd pool create POOL_NAME

Example

[ceph: root@host01 /]# ceph osd pool create test.cold.data

9. Add a new storage class:

Syntax

radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id PLACEMENT_TARGET --storage-class STORAGE_CLASS

Example

[ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id default-placement --storage-class cold.test
{
"key": "default-placement",
"val": {
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD",
"cold.test"
]
}
}

10. Provide the zone placement information for the new storage class:

Syntax

radosgw-admin zone placement add --rgw-zone default --placement-id PLACEMENT_TARGET --storage-class STORAGE_CLASS --data-pool DATA_POOL

Example



[ceph: root@host01 /]# radosgw-admin zone placement add --rgw-zone default --placement-id
default-placement --storage-class cold.test --data-pool test.cold.data

11. Enable rgw application on the data pool:

Syntax

ceph osd pool application enable POOL_NAME rgw

Example

[ceph: root@host01 /]# ceph osd pool application enable test.cold.data rgw
enabled application 'rgw' on pool 'test.cold.data'

12. Restart all the rgw daemons.

13. To view the zone group configuration, run the following command:

Syntax

radosgw-admin zonegroup get


{
"id": "3019de59-ddde-4c5c-b532-7cdd29de09a1",
"name": "default",
"api_name": "default",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "adacbe1b-02b4-41b8-b11d-0d505b442ed4",
"zones": [
{
"id": "adacbe1b-02b4-41b8-b11d-0d505b442ed4",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"hot.test",
"cold.test",
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "",
"sync_policy": {
"groups": []
}
}

14. To view the zone configuration, run the following command:

Syntax

radosgw-admin zone get


{
"id": "adacbe1b-02b4-41b8-b11d-0d505b442ed4",
"name": "default",
"domain_root": "default.rgw.meta:root",
"control_pool": "default.rgw.control",
"gc_pool": "default.rgw.log:gc",
"lc_pool": "default.rgw.log:lc",

IBM Storage Ceph 769


"log_pool": "default.rgw.log",
"intent_log_pool": "default.rgw.log:intent",
"usage_log_pool": "default.rgw.log:usage",
"roles_pool": "default.rgw.meta:roles",
"reshard_pool": "default.rgw.log:reshard",
"user_keys_pool": "default.rgw.meta:users.keys",
"user_email_pool": "default.rgw.meta:users.email",
"user_swift_pool": "default.rgw.meta:users.swift",
"user_uid_pool": "default.rgw.meta:users.uid",
"otp_pool": "default.rgw.otp",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "default.rgw.buckets.index",
"storage_classes": {
"cold.test": {
"data_pool": "test.cold.data"
},
"hot.test": {
"data_pool": "test.hot.data"
},
"STANDARD": {
"data_pool": "default.rgw.buckets.data"
}
},
"data_extra_pool": "default.rgw.buckets.non-ec",
"index_type": 0
}
}
],
"realm_id": "",
"notif_pool": "default.rgw.log:notif"
}

15. Create a bucket:

Example

[ceph: root@host01 /]# aws s3api create-bucket --bucket testbucket10 --create-bucket-configuration LocationConstraint=default:default-placement --endpoint-url http://10.0.0.80:8080

16. List the objects prior to transition:

Example

[ceph: root@host01 /]# radosgw-admin bucket list --bucket testbucket10

{
"ETag": "\"211599863395c832a3dfcba92c6a3b90\"",
"Size": 540,
"StorageClass": "STANDARD",
"Key": "obj1",
"VersionId": "W95teRsXPSJI4YWJwwSG30KxSCzSgk-",
"IsLatest": true,
"LastModified": "2023-11-23T10:38:07.214Z",
"Owner": {
"DisplayName": "test-user",
"ID": "test-user"
}
}

17. Create a JSON file for lifecycle configuration:

Example

[ceph: root@host01 /]# vi lifecycle.json

18. Add the specific lifecycle configuration rule in the file:



Example

{
"Rules": [
{
"Filter": {
"Prefix": ""
},
"Status": "Enabled",
"Transitions": [
{
"Days": 5,
"StorageClass": "hot.test"
},
{
"Days": 20,
"StorageClass": "cold.test"
}
],
"Expiration": {
"Days": 365
},
"ID": "double transition and expiration"
}
]
}

The lifecycle configuration example shows an object that transitions from the default
`STANDARD` storage class to the `hot.test` storage class after 5 days, transitions again
to the `cold.test` storage class after 20 days, and finally expires after 365 days in the
`cold.test` storage class.

19. Set the lifecycle configuration on the bucket:

Example

[ceph: root@host01 /]# aws s3api put-bucket-lifecycle-configuration --bucket testbucket10 --lifecycle-configuration file://lifecycle.json

20. Retrieve the lifecycle configuration on the bucket:

Example

[ceph: root@host01 /]# aws s3api get-bucket-lifecycle-configuration --bucket testbucket10


{
"Rules": [
{
"Expiration": {
"Days": 365
},
"ID": "double transition and expiration",
"Prefix": "",
"Status": "Enabled",
"Transitions": [
{
"Days": 20,
"StorageClass": "cold.test"
},
{
"Days": 5,
"StorageClass": "hot.test"
}
]
}
]
}

21. Verify that the object is transitioned to the given storage class:

Example

[ceph: root@host01 /]# radosgw-admin bucket list --bucket testbucket10

{
"ETag": "\"211599863395c832a3dfcba92c6a3b90\"",
"Size": 540,
"StorageClass": "cold.test",
"Key": "obj1",

IBM Storage Ceph 771


"VersionId": "W95teRsXPSJI4YWJwwSG30KxSCzSgk-",
"IsLatest": true,
"LastModified": "2023-11-23T10:38:07.214Z",
"Owner": {
"DisplayName": "test-user",
"ID": "test-user"
}
}

Enabling object lock for S3


Edit online
Using the S3 object lock mechanism, you can use object lock concepts like retention period, legal hold, and bucket configuration to
implement Write-Once-Read-Many (WORM) functionality as part of a custom workflow that overrides data deletion permissions.

IMPORTANT: The object version, not the object name, is the defining and required value for object lock to work correctly in
GOVERNANCE or COMPLIANCE mode. You need to know the version of the object when it is written so that you can retrieve it at a
later time.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Object Gateway node.

S3 user with version-bucket creation access.

Procedure
Edit online

1. Create a bucket with object lock enabled:

Syntax

aws --endpoint=http://RGW_PORT:8080 s3api create-bucket --bucket BUCKET_NAME --object-lock-enabled-for-bucket

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api create-bucket --bucket worm-bucket --object-lock-enabled-for-bucket

2. Set a retention period for the bucket:

Syntax

aws --endpoint=http://RGW_PORT:8080 s3api put-object-lock-configuration --bucket BUCKET_NAME --object-lock-configuration '{ "ObjectLockEnabled": "Enabled", "Rule": { "DefaultRetention": { "Mode": "RETENTION_MODE", "Days": NUMBER_OF_DAYS }}}'

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api put-object-lock-configuration --bucket worm-bucket --object-lock-configuration '{ "ObjectLockEnabled": "Enabled", "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 10 }}}'

NOTE: You can choose either the GOVERNANCE or COMPLIANCE mode for the RETENTION_MODE in S3 object lock, to apply
different levels of protection to any object version that is protected by object lock.

In GOVERNANCE mode, users cannot overwrite or delete an object version or alter its lock settings unless they have special
permissions.



In COMPLIANCE mode, a protected object version cannot be overwritten or deleted by any user, including the root user in
your AWS account. When an object is locked in COMPLIANCE mode, its RETENTION_MODE cannot be changed, and its
retention period cannot be shortened. COMPLIANCE mode helps ensure that an object version cannot be overwritten or
deleted for the duration of the period.
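For example, a user who has been granted the s3:BypassGovernanceRetention permission can remove a GOVERNANCE-protected object version by passing the bypass flag; the bucket, key, and version ID below are illustrative:

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api delete-object --bucket worm-bucket --key compliance-upload --version-id VERSION_ID --bypass-governance-retention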

3. Put the object into the bucket with a retention time set:

Syntax

aws --endpoint=http://RGW_PORT:8080 s3api put-object --bucket BUCKET_NAME --object-lock-mode RETENTION_MODE --object-lock-retain-until-date "DATE" --key compliance-upload --body TEST_FILE

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api put-object --bucket worm-bucket --object-lock-mode COMPLIANCE --object-lock-retain-until-date "2022-05-31" --key compliance-upload --body test.dd
{
"ETag": "\"d560ea5652951637ba9c594d8e6ea8c1\"",
"VersionId": "Nhhk5kRS6Yp6dZXVWpZZdRcpSpBKToD"
}

4. Upload a new object using the same key:

Syntax

aws --endpoint=http://RGW_PORT:8080 s3api put-object --bucket BUCKET_NAME --object-lock-mode RETENTION_MODE --object-lock-retain-until-date "DATE" --key compliance-upload --body PATH

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api put-object --bucket worm-bucket --object-lock-mode COMPLIANCE --object-lock-retain-until-date "2022-05-31" --key compliance-upload --body /etc/fstab
{
"ETag": "\"d560ea5652951637ba9c594d8e6ea8c1\"",
"VersionId": "Nhhk5kRS6Yp6dZXVWpZZdRcpSpBKToD"
}

Command-line options

Set an object lock legal hold on an object version:

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api put-object-legal-hold --bucket worm-bucket --key compliance-upload --legal-hold Status=ON

NOTE: Using the object lock legal hold operation, you can place a legal hold on an object version, thereby preventing an object
version from being overwritten or deleted. A legal hold doesn’t have an associated retention period and hence, remains in
effect until removed.
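To release a legal hold later, the same operation can be run with Status=OFF, for example:

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api put-object-legal-hold --bucket worm-bucket --key compliance-upload --legal-hold Status=OFF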

List the objects from the bucket to retrieve only the latest version of the object:

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api list-objects --bucket worm-bucket

List the object versions from the bucket:

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api list-object-versions --bucket worm-bucket
{
"Versions": [
{
"ETag": "\"d560ea5652951637ba9c594d8e6ea8c1\"",
"Size": 288,
"StorageClass": "STANDARD",
"Key": "hosts",
"VersionId": "Nhhk5kRS6Yp6dZXVWpZZdRcpSpBKToD",
"IsLatest": true,
"LastModified": "2022-06-17T08:51:17.392000+00:00",
"Owner": {
"DisplayName": "Test User in Tenant test",
"ID": "test$test.user"
}
}
]
}

Access objects using version-ids:

Example

[root@rgw-2 ~]# aws --endpoint=http://rgw.ceph.com:8080 s3api get-object --bucket worm-bucket --key compliance-upload --version-id 'IGOU.vdIs3SPduZglrB-RBaK.sfXpcd' download.1
{
"AcceptRanges": "bytes",
"LastModified": "2022-06-17T08:51:17+00:00",
"ContentLength": 288,
"ETag": "\"d560ea5652951637ba9c594d8e6ea8c1\"",
"VersionId": "Nhhk5kRS6Yp6dZXVWpZZdRcpSpBKToD",
"ContentType": "binary/octet-stream",
"Metadata": {},
"ObjectLockMode": "COMPLIANCE",
"ObjectLockRetainUntilDate": "2023-06-17T08:51:17+00:00"
}

Usage
Edit online
The Ceph Object Gateway logs usage for each user. You can track user usage within date ranges too.

Options include:

Start Date: The --start-date option allows you to filter usage stats from a particular start date (format: yyyy-mm-
dd[HH:MM:SS]).

End Date: The --end-date option allows you to filter usage up to a particular date (format: yyyy-mm-dd[HH:MM:SS]).

Log Entries: The --show-log-entries option allows you to specify whether or not to include log entries with the usage stats
(options: true | false).

NOTE: You can specify time with minutes and seconds, but it is stored with 1 hour resolution.

Show usage
Trim usage

Show usage
Edit online
To show usage statistics, run the radosgw-admin usage show command. To show usage for a particular user, you must specify a user ID. You may also specify a start date, end date, and whether or not to show log entries.

Example

[ceph: root@host01 /]# radosgw-admin usage show --uid=johndoe --start-date=2022-06-01 --end-date=2022-07-01

You may also show a summary of usage information for all users by omitting a user ID.

Example

[ceph: root@host01 /]# radosgw-admin usage show --show-log-entries=false



Trim usage
Edit online
With heavy use, usage logs can begin to take up storage space. You can trim usage logs for all users and for specific users. You may
also specify date ranges for trim operations.

Example

[ceph: root@host01 /]# radosgw-admin usage trim --start-date=2022-06-01 --end-date=2022-07-31

[ceph: root@host01 /]# radosgw-admin usage trim --uid=johndoe


[ceph: root@host01 /]# radosgw-admin usage trim --uid=johndoe --end-date=2021-04-30

Ceph Object Gateway data layout


Edit online
Although RADOS only knows about pools and objects with their Extended Attributes (xattrs) and object map (OMAP), conceptually
Ceph Object Gateway organizes its data into three different kinds:

metadata

bucket index

data

Metadata

There are three sections of metadata:

user: Holds user information.

bucket: Holds a mapping between bucket name and bucket instance ID.

bucket.instance: Holds bucket instance information.

You can use the following commands to view metadata entries:

Syntax

radosgw-admin metadata get bucket:BUCKET_NAME
radosgw-admin metadata get bucket.instance:BUCKET:BUCKET_ID
radosgw-admin metadata get user:USER
radosgw-admin metadata set user:USER

Example

[ceph: root@host01 /]# radosgw-admin metadata list
[ceph: root@host01 /]# radosgw-admin metadata list bucket
[ceph: root@host01 /]# radosgw-admin metadata list bucket.instance
[ceph: root@host01 /]# radosgw-admin metadata list user

Every metadata entry is kept on a single RADOS object.

IMPORTANT: When using the radosgw-admin tool, ensure that the tool and the Ceph Cluster are of the same version. The use of
mismatched versions is not supported.

NOTE: A Ceph Object Gateway object might consist of several RADOS objects, the first of which is the head that contains the
metadata, such as manifest, Access Control List (ACL), content type, ETag, and user-defined metadata. The metadata is stored in
xattrs. The head might also contain up to 512 KB of object data, for efficiency and atomicity. The manifest describes how each
object is laid out in RADOS objects.
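As an illustration, the head object and manifest of a specific Ceph Object Gateway object can be inspected with radosgw-admin object stat; the bucket and object names below are only examples:

Example

[ceph: root@host01 /]# radosgw-admin object stat --bucket=testbucket --object=docs/readme.txt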

Bucket index



The bucket index is a different kind of metadata, and is kept separately. The bucket index holds a key-value map in RADOS objects. By default, it is a
single RADOS object per bucket, but it is possible to shard the map over multiple RADOS objects.

The map itself is kept in OMAP associated with each RADOS object. The key of each OMAP is the name of the objects, and the value
holds some basic metadata of that object, the metadata that appears when listing the bucket. Each OMAP holds a header, and we
keep some bucket accounting metadata in that header such as number of objects, total size, and the like.

NOTE: OMAP is a key-value store, associated with an object, in a way similar to how extended attributes associate with a POSIX file.
An object’s OMAP is not physically located in the object’s storage, but its precise implementation is invisible and immaterial to the
Ceph Object Gateway.
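As a sketch (not a required step), the keys in a bucket index OMAP can be listed directly with the rados tool; the pool name and index object name below are illustrative:

Example

[ceph: root@host01 /]# rados -p default.rgw.buckets.index listomapkeys .dir.default.7593.4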

Data

Objects data is kept in one or more RADOS objects for each Ceph Object Gateway object.

Object lookup path


Multiple data pools
Bucket and object listing
Object Gateway data layout parameters

Object lookup path


Edit online
When accessing objects, REST APIs come to Ceph Object Gateway with three parameters:

Account information, which has the access key in S3 or account name in Swift

Bucket or container name

Object name or key

At present, Ceph Object Gateway only uses account information to find out the user ID and for access control. It uses only the bucket
name and object key to address the object in a pool.

Account information

The user ID in Ceph Object Gateway is a string, typically the actual user name from the user credentials and not a hashed or mapped
identifier.

When accessing a user’s data, the user record is loaded from an object USER_ID in the default.rgw.meta pool with users.uid
namespace.

Bucket names

They are represented in the default.rgw.meta pool with root namespace. Bucket record is loaded in order to obtain a marker,
which serves as a bucket ID.

Object names

The object is located in the default.rgw.buckets.data pool. Object name is MARKER_KEY, for example
default.7593.4_image.png, where the marker is default.7593.4 and the key is image.png. These concatenated names are
not parsed and are passed down to RADOS only. Therefore, the choice of the separator is not important and causes no ambiguity. For
the same reason, slashes are permitted in object names, such as keys.
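For example, the RADOS objects that back a particular bucket can be listed by filtering the data pool on the bucket marker; the marker value below is illustrative:

Example

[ceph: root@host01 /]# rados -p default.rgw.buckets.data ls | grep default.7593.4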

Multiple data pools


Edit online
It is possible to create multiple data pools so that different users’ buckets are created in different RADOS pools by default, thus
providing the necessary scaling. The layout and naming of these pools is controlled by a policy setting.

Bucket and object listing


Edit online
Buckets that belong to a given user are listed in an OMAP of an object named USER_ID.buckets, for example, foo.buckets, in the
default.rgw.meta pool with users.uid namespace. These objects are accessed when listing buckets, when updating bucket
contents, and updating and retrieving bucket statistics such as quota. These listings are kept consistent with buckets in the .rgw
pool.

NOTE: See the user-visible, encoded class cls_user_bucket_entry and its nested class cls_user_bucket for the values of
these OMAP entries.

Objects that belong to a given bucket are listed in a bucket index. The default naming for index objects is .dir.MARKER in the
default.rgw.buckets.index pool.

Reference
Edit online

Configure bucket index resharding

Object Gateway data layout parameters


Edit online
This is a list of data layout parameters for Ceph Object Gateway.

Known pools:

.rgw.root Unspecified region, zone, and global information records, one per object.

ZONE.rgw.control notify.N

ZONE.rgw.meta Multiple namespaces with different kinds of metadata

namespace: root BUCKET.bucket.meta.BUCKET:MARKER # see put_bucket_instance_info() The tenant is used to disambiguate buckets, but not bucket instances.

Example

.bucket.meta.prodtx:test%25star:default.84099.6
.bucket.meta.testcont:default.4126.1
.bucket.meta.prodtx:testcont:default.84099.4
prodtx/testcont
prodtx/test%25star
testcont

namespace: users.uid Contains per-user information (RGWUserInfo) in USER objects and per-user lists of buckets in omaps of
USER.buckets objects. The USER might contain the tenant if non-empty.

Example

prodtx$prodt
test2.buckets
prodtx$prodt.buckets
test2

namespace: users.email Unimportant

namespace: users.keys 47UA98JSTJZ9YAN3OS3O

This allows Ceph Object Gateway to look up users by their access keys during authentication.

namespace: users.swift test:tester

ZONE.rgw.buckets.index Objects are named .dir.MARKER, each contains a bucket index. If the index is sharded, each shard
appends the shard index after the marker.

ZONE.rgw.buckets.data default.7593.4__shadow_.488urDFerTYXavx4yAd-Op8mxehnvTI_1 MARKER_KEY

An example of a marker would be default.16004.1 or default.7593.4. The current format is ZONE.INSTANCE_ID.BUCKET_ID, but once generated, a marker is not parsed again, so its format might change freely in the future.

Reference
Edit online

Ceph Object Gateway layout

Optimize the Ceph Object Gateway's garbage collection


Edit online
When new data objects are written into the storage cluster, the Ceph Object Gateway immediately allocates the storage for these
new objects. After you delete or overwrite data objects in the storage cluster, the Ceph Object Gateway deletes those objects from
the bucket index. Some time afterward, the Ceph Object Gateway then purges the space that was used to store the objects in the
storage cluster. The process of purging the deleted object data from the storage cluster is known as Garbage Collection, or GC.

Garbage collection operations typically run in the background. You can configure these operations to either run continuously, or to
run only during intervals of low activity and light workloads. By default, the Ceph Object Gateway conducts GC operations
continuously. Because GC operations are a normal part of Ceph Object Gateway operations, deleted objects that are eligible for
garbage collection exist most of the time.

Viewing the garbage collection queue


Adjusting Garbage Collection Settings
Adjusting garbage collection for delete-heavy workloads

Viewing the garbage collection queue


Edit online
Before you purge deleted and overwritten objects from the storage cluster, use radosgw-admin to view the objects awaiting
garbage collection.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Object Gateway.

Procedure
Edit online

To view the queue of objects awaiting garbage collection:

Example

[ceph: root@host01 /]# radosgw-admin gc list

NOTE: To list all entries in the queue, including unexpired entries, use the --include-all option.
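For example, to list all entries including unexpired ones:

Example

[ceph: root@host01 /]# radosgw-admin gc list --include-all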

Adjusting Garbage Collection Settings



Edit online
The Ceph Object Gateway allocates storage for new and overwritten objects immediately. Additionally, the parts of a multi-part
upload also consume some storage.

The Ceph Object Gateway purges the storage space used for deleted objects after deleting the objects from the bucket index.
Similarly, the Ceph Object Gateway will delete data associated with a multi-part upload after the multi-part upload completes or
when the upload has gone inactive or failed to complete for a configurable amount of time.

Viewing the objects awaiting garbage collection can be done with the following command:

radosgw-admin gc list

Garbage collection is a background activity that runs continuously or during times of low loads, depending upon how the storage
administrator configures the Ceph Object Gateway. By default, the Ceph Object Gateway conducts garbage collection operations
continuously. Since garbage collection operations are a normal function of the Ceph Object Gateway, especially with object delete
operations, objects eligible for garbage collection exist most of the time.

Some workloads can temporarily or permanently outpace the rate of garbage collection activity. This is especially true of delete-
heavy workloads, where many objects get stored for a short period of time and then deleted. For these types of workloads, storage
administrators can increase the priority of garbage collection operations relative to other operations with the following configuration
parameters:

The rgw_gc_obj_min_wait configuration option waits a minimum length of time, in seconds, before purging a deleted
object’s data. The default value is two hours, or 7200 seconds. The object is not purged immediately, because a client might
be reading the object. Under heavy workloads, this setting can consume too much storage or have a large number of deleted
objects to purge. IBM recommends not setting this value below 30 minutes, or 1800 seconds.

The rgw_gc_processor_period configuration option is the garbage collection cycle run time. That is, the amount of time
between the start of consecutive runs of garbage collection threads. If garbage collection runs longer than this period, the
Ceph Object Gateway will not wait before running a garbage collection cycle again.

The rgw_gc_max_concurrent_io configuration option specifies the maximum number of concurrent IO operations that the
gateway garbage collection thread will use when purging deleted data. Under delete heavy workloads, consider increasing this
setting to a larger number of concurrent IO operations.

The rgw_gc_max_trim_chunk configuration option specifies the maximum number of keys to remove from the garbage
collector log in a single operation. Under delete heavy operations, consider increasing the maximum number of keys so that
more objects are purged during each garbage collection operation.

Some new configuration parameters have been added to Ceph Object Gateway to tune the garbage collection queue, as follows:

The rgw_gc_max_deferred_entries_size configuration option sets the maximum size of deferred entries in the garbage
collection queue.

The rgw_gc_max_queue_size configuration option sets the maximum queue size used for garbage collection. This value
should not be greater than osd_max_object_size minus rgw_gc_max_deferred_entries_size minus 1 KB.

The rgw_gc_max_deferred configuration option sets the maximum number of deferred entries stored in the garbage
collection queue.

NOTE: In testing, with an evenly balanced delete-write workload, such as 50% delete and 50% write operations, the storage cluster
fills completely in 11 hours. This is because Ceph Object Gateway garbage collection fails to keep pace with the delete operations.
The cluster status switches to the HEALTH_ERR state if this happens. Aggressive settings for parallel garbage collection tunables
significantly delayed the onset of storage cluster fill in testing and can be helpful for many workloads. Typical real-world storage
cluster workloads are not likely to cause a storage cluster fill primarily due to garbage collection.
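Before tuning, it can help to confirm the currently active values of these options with the ceph config get command, for example:

Example

[ceph: root@host01 /]# ceph config get client.rgw rgw_gc_max_concurrent_io
[ceph: root@host01 /]# ceph config get client.rgw rgw_gc_processor_period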

Adjusting garbage collection for delete-heavy workloads


Edit online
Some workloads may temporarily or permanently outpace the rate of garbage collection activity. This is especially true of delete-
heavy workloads, where many objects get stored for a short period of time and are then deleted. For these types of workloads,
consider increasing the priority of garbage collection operations relative to other operations. Contact IBM Support with any
additional questions about Ceph Object Gateway Garbage Collection.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all nodes in the storage cluster.

Procedure
Edit online

1. Set the value of rgw_gc_max_concurrent_io to 20, and the value of rgw_gc_max_trim_chunk to 64:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_gc_max_concurrent_io 20
[ceph: root@host01 /]# ceph config set client.rgw rgw_gc_max_trim_chunk 64

2. Restart the Ceph Object Gateway to allow the changed settings to take effect.

3. Monitor the storage cluster during GC activity to verify that the increased values do not adversely affect performance.

IMPORTANT: Never modify the value for the rgw_gc_max_objs option in a running cluster. You should only change this value
before deploying the RGW nodes.

Reference
Edit online

Red Hat Ceph RGW - Garbage Collection (GC) Tuning Options

Ceph Object Gateway General Settings

Optimize the Ceph Object Gateway's data object storage


Edit online
Bucket lifecycle configuration optimizes data object storage to increase its efficiency and to provide effective storage throughout the
lifetime of the data.

The S3 API in the Ceph Object Gateway currently supports a subset of the AWS bucket lifecycle configuration actions:

Expiration

NoncurrentVersionExpiration

AbortIncompleteMultipartUpload

Parallel thread processing for bucket life cycles


Optimizing the bucket lifecycle

Parallel thread processing for bucket life cycles


Edit online
The Ceph Object Gateway now allows for parallel thread processing of bucket life cycles across multiple Ceph Object Gateway
instances. Increasing the number of threads that run in parallel enables the Ceph Object Gateway to process large workloads more
efficiently. In addition, the Ceph Object Gateway now uses a numbered sequence for index shard enumeration instead of using in-
order numbering.



Optimizing the bucket lifecycle
Edit online
Two options in the Ceph configuration file affect the efficiency of bucket lifecycle processing:

rgw_lc_max_worker specifies the number of lifecycle worker threads to run in parallel. This enables the simultaneous
processing of both bucket and index shards. The default value for this option is 3.

rgw_lc_max_wp_worker specifies the number of threads in each lifecycle worker thread’s work pool. This option helps to
accelerate processing for each bucket. The default value for this option is 3.

For a workload with a large number of buckets — for example, a workload with thousands of buckets — consider increasing the value
of the rgw_lc_max_worker option.

For a workload with a smaller number of buckets but with a higher number of objects in each bucket — such as in the hundreds of
thousands — consider increasing the value of the rgw_lc_max_wp_worker option.

NOTE: Before increasing the value of either of these options, validate current storage cluster performance and Ceph Object
Gateway utilization. IBM does not recommend that you assign a value of 10 or above for either of these options.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to all of the nodes in the storage cluster.

Procedure
Edit online

1. To increase the number of threads to run in parallel, set the value of rgw_lc_max_worker to a value between 3 and 9:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_lc_max_worker 7

2. To increase the number of threads in each thread’s work pool, set the value of rgw_lc_max_wp_worker to a value between
3 and 9:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_lc_max_wp_worker 7

3. Restart the Ceph Object Gateway to allow the changed settings to take effect.

4. Monitor the storage cluster to verify that the increased values do not adversely affect performance.

Reference
Edit online

For more information about Ceph Object Gateway lifecycle, contact IBM Support.

Testing
Edit online
As a storage administrator, you can do basic functionality testing to verify that the Ceph Object Gateway environment is working as
expected. You can use the REST interfaces by creating an initial Ceph Object Gateway user for the S3 interface, and then create a
subuser for the Swift interface.



Create an S3 user
Create a Swift user
Test S3 access
Test Swift access

Prerequisites
Edit online

A healthy running IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway software.

Create an S3 user
Edit online
To test the gateway, create an S3 user and grant the user access. The man radosgw-admin command provides information on
additional command options.

NOTE: In a multi-site deployment, always create a user on a host in the master zone of the master zone group.

Prerequisites
Edit online

root or sudo access

Ceph Object Gateway installed

Procedure
Edit online

1. Create an S3 user:

Syntax

radosgw-admin user create --uid=name --display-name="USER_NAME"

Replace name with the name of the S3 user:

Example

[root@host01 ~]# radosgw-admin user create --uid=testuser --display-name="Jane Doe"
{
{
"user_id": "testuser",
"display_name": "Jane Doe",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "testuser",
"access_key": "CEP28KDIQXBKU4M15PDC",
"secret_key": "MARoio8HFc8JxhEilES3dKFVj8tV3NOOYymihTLO"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",

782 IBM Storage Ceph


"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw"
}

2. Verify the output to ensure that the values of access_key and secret_key do not include a JSON escape character (\).
These values are needed for access validation, but certain clients cannot handle values that include JSON escape characters.
To fix this problem, perform one of the following actions:

Remove the JSON escape character.

Encapsulate the string in quotes.

Regenerate the key and ensure that it does not include a JSON escape character.

Specify the key and secret manually.

Do not remove the forward slash / because it is a valid character.

Create a Swift user


Edit online
To test the Swift interface, create a Swift subuser. Creating a Swift user is a two-step process. The first step is to create the user. The
second step is to create the secret key.

NOTE: In a multi-site deployment, always create a user on a host in the master zone of the master zone group.

Prerequisites
Edit online

A running, and healthy IBM Storage Ceph cluster.

Installation of the Ceph Object Gateway.

Root-level access to the Ceph Object Gateway node.

Procedure
Edit online

1. Create the Swift user:

Syntax

radosgw-admin subuser create --uid=NAME --subuser=NAME:swift --access=full

Replace NAME with the Swift user name, for example:

Example

[root@host01 ~]# radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full

{
"user_id": "testuser",
"display_name": "First User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{
"id": "testuser:swift",
"permissions": "full-control"
}
],
"keys": [
{
"user": "testuser",
"access_key": "O8JDE41XMI74O185EHKD",
"secret_key": "i4Au2yxG5wtr1JK01mI8kjJPM93HNAoVWOSTdJd6"
}
],
"swift_keys": [
{
"user": "testuser:swift",
"secret_key": "13TLtdEW7bCqgttQgPzxFxziu0AgabtOc6vM8DLA"
}
],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw"
}

2. Create the secret key:

Syntax

radosgw-admin key create --subuser=NAME:swift --key-type=swift --gen-secret

Replace NAME with the Swift user name, for example:

Example

[root@host01 ~]# radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret
{
"user_id": "testuser",
"display_name": "First User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{
"id": "testuser:swift",
"permissions": "full-control"
}
],
"keys": [
{
"user": "testuser",
"access_key": "O8JDE41XMI74O185EHKD",
"secret_key": "i4Au2yxG5wtr1JK01mI8kjJPM93HNAoVWOSTdJd6"
}
],
"swift_keys": [
{
"user": "testuser:swift",
"secret_key": "a4ioT4jEP653CDcdU8p4OuhruwABBRZmyNUbnSSt"
}
],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw"
}

Test S3 access
Edit online
You need to write and run a Python test script for verifying S3 access. The S3 access test script connects to the radosgw, creates a
new bucket, and lists all buckets. The values for aws_access_key_id and aws_secret_access_key are taken from the values of
access_key and secret_key returned by the radosgw-admin command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the nodes.

Procedure
Edit online

1. Enable high availability repository.

2. Install the python3-boto3 package:

dnf install python3-boto3

3. Create the Python script:

vi s3test.py

4. Add the following contents to the file:

Syntax

import boto3

endpoint = ""  # enter the endpoint URL along with the port, for example "http://URL:PORT"

access_key = 'ACCESS'
secret_key = 'SECRET'

s3 = boto3.client(
    's3',
    endpoint_url=endpoint,
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key
)

s3.create_bucket(Bucket='my-new-bucket')

response = s3.list_buckets()
for bucket in response['Buckets']:
    print("{name}\t{created}".format(
        name=bucket['Name'],
        created=bucket['CreationDate']
    ))

Replace endpoint with the URL of the host where you have configured the gateway service. That is, the gateway
host. Ensure that the host setting resolves with DNS. Replace PORT with the port number of the gateway.

Replace ACCESS and SECRET with the access_key and secret_key values

5. Run the script:

python3 s3test.py

The output will be something like the following:

my-new-bucket 2022-05-31T17:09:10.000Z

Test Swift access


Edit online
Swift access can be verified via the swift command line client. The command man swift will provide more information on
available command line options.

To install the swift client, run the following command:

sudo yum install python-setuptools
sudo easy_install pip
sudo pip install --upgrade setuptools
sudo pip install --upgrade python-swiftclient

To test swift access, run the following command:

Syntax

# swift -A http://IP_ADDRESS:PORT/auth/1.0 -U testuser:swift -K 'SWIFT_SECRET_KEY' list

Replace IP_ADDRESS with the public IP address of the gateway server and SWIFT_SECRET_KEY with its value from the output of
the radosgw-admin key create command issued for the swift user. Replace PORT with the port number you are using with
Beast. If you do not replace the port, it will default to port 80.

For example:

swift -A http://10.10.143.116:80/auth/1.0 -U testuser:swift -K '244+fz2gSqoHwR3lYtSbIyomyPHf3i7rgSJrF/IA' list

The output should be:

my-new-bucket

Configuration reference
Edit online
As a storage administrator, you can set various options for the Ceph Object Gateway. These options contain default values. If you do
not specify each option, then the default value is set automatically.

To set specific values for these options, update the configuration database by using the ceph config set client.rgw
_OPTION_ _VALUE_ command.
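For example, to change one of the general settings listed below and then confirm the new value:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_thread_pool_size 512
[ceph: root@host01 /]# ceph config get client.rgw rgw_thread_pool_size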

General settings
About pools
Lifecycle settings
Swift settings
Logging settings
Keystone settings
LDAP settings

General settings
Edit online
Name (Type; Default): Description

rgw_data (String; default: /var/lib/ceph/radosgw/$cluster-$id): Sets the location of the data files for Ceph Object Gateway.
rgw_enable_apis (String; default: s3, s3website, swift, swift_auth, admin, sts, iam, notifications): Enables the specified APIs.
rgw_cache_enabled (Boolean; default: true): Whether the Ceph Object Gateway cache is enabled.
rgw_cache_lru_size (Integer; default: 10000): The number of entries in the Ceph Object Gateway cache.
rgw_socket_path (String; default: N/A): The socket path for the domain socket. FastCgiExternalServer uses this socket. If you do not specify a socket path, Ceph Object Gateway will not run as an external server. The path you specify here must be the same as the path specified in the rgw.conf file.
rgw_host (String; default: 0.0.0.0): The host for the Ceph Object Gateway instance. Can be an IP address or a hostname.
rgw_port (String; default: None): Port the instance listens for requests. If not specified, Ceph Object Gateway runs external FastCGI.
rgw_dns_name (String; default: None): The DNS name of the served domain. See also the hostnames setting within zone groups.
rgw_script_uri (String; default: None): The alternative value for the SCRIPT_URI if not set in the request.
rgw_request_uri (String; default: None): The alternative value for the REQUEST_URI if not set in the request.
rgw_print_continue (Boolean; default: true): Enable 100-continue if it is operational.
rgw_remote_addr_param (String; default: REMOTE_ADDR): The remote address parameter. For example, the HTTP field containing the remote address, or the X-Forwarded-For address if a reverse proxy is operational.
rgw_op_thread_timeout (Integer; default: 600): The timeout in seconds for open threads.
rgw_op_thread_suicide_timeout (Integer; default: 0): The timeout in seconds before a Ceph Object Gateway process dies. Disabled if set to 0.
rgw_thread_pool_size (Integer; default: 512 threads): The size of the thread pool.
rgw_num_control_oids (Integer; default: 8): The number of notification objects used for cache synchronization between different rgw instances.
rgw_init_timeout (Integer; default: 30): The number of seconds before Ceph Object Gateway gives up on initialization.
rgw_mime_types_file (String; default: /etc/mime.types): The path and location of the MIME types. Used for Swift auto-detection of object types.
rgw_gc_max_objs (Integer; default: 32): The maximum number of objects that may be handled by garbage collection in one garbage collection processing cycle.
rgw_gc_obj_min_wait (Integer; default: 2 * 3600): The minimum wait time before the object may be removed and handled by garbage collection processing.
rgw_gc_processor_max_time (Integer; default: 3600): The maximum time between the beginning of two consecutive garbage collection processing cycles.
rgw_gc_processor_period (Integer; default: 3600): The cycle time for garbage collection processing.
rgw_s3_success_create_obj_status (Integer; default: 0): The alternate success status response for create-obj.
rgw_resolve_cname (Boolean; default: false): Whether rgw should use the DNS CNAME record of the request hostname field (if hostname is not equal to rgw_dns_name).
rgw_object_stripe_size (Integer; default: 4 << 20): The size of an object stripe for Ceph Object Gateway objects.
rgw_extended_http_attrs (String; default: None): Add a new set of attributes that could be set on an object. These extra attributes can be set through HTTP header fields when putting the objects. If set, these attributes will return as HTTP fields when doing GET/HEAD on the object. For example: "content_foo, content_bar".
rgw_exit_timeout_secs (Integer; default: 120): Number of seconds to wait for a process before exiting unconditionally.
rgw_get_obj_window_size (Integer; default: 16 << 20): The window size in bytes for a single object request.
rgw_get_obj_max_req_size (Integer; default: 4 << 20): The maximum request size of a single get operation sent to the Ceph Storage Cluster.
rgw_relaxed_s3_bucket_names (Boolean; default: false): Enables relaxed S3 bucket names rules for zone group buckets.
rgw_list_buckets_max_chunk (Integer; default: 1000): The maximum number of buckets to retrieve in a single operation when listing user buckets.
rgw_override_bucket_index_max_shards (Integer; default: 0): The number of shards for the bucket index object. A value of 0 indicates there is no sharding. IBM does not recommend setting a value too large (for example, 1000) as it increases the cost for bucket listing. This variable should be set in the [client] or the [global] section so it is automatically applied to radosgw-admin commands.
rgw_curl_wait_timeout_ms (Integer; default: 1000): The timeout in milliseconds for certain curl calls.
rgw_copy_obj_progress (Boolean; default: true): Enables output of object progress during long copy operations.
rgw_copy_obj_progress_every_bytes (Integer; default: 1024 * 1024): The minimum bytes between copy progress output.
rgw_admin_entry (String; default: admin): The entry point for an admin request URL.
rgw_content_length_compat (Boolean; default: false): Enable compatibility handling of FCGI requests with both CONTENT_LENGTH and HTTP_CONTENT_LENGTH set.
rgw_bucket_default_quota_max_objects (Integer; default: -1): The default maximum number of objects per bucket. This value is set on new users if no other quota is specified. It has no effect on existing users. This variable should be set in the [client] or the [global] section so it is automatically applied to radosgw-admin commands.
rgw_bucket_quota_ttl (Integer; default: 600): The amount of time in seconds cached quota information is trusted. After this timeout, the quota information will be re-fetched from the cluster.
rgw_user_quota_bucket_sync_interval (Integer; default: 180): The amount of time in seconds bucket quota information is accumulated before syncing to the cluster. During this time, other RGW instances will not see the changes in bucket quota stats from operations on this instance.
rgw_user_quota_sync_interval (Integer; default: 3600 * 24): The amount of time in seconds user quota information is accumulated before syncing to the cluster. During this time, other RGW instances will not see the changes in user quota stats from operations on this instance.
log_meta (Boolean; default: false): A zone parameter to determine whether or not the gateway logs the metadata operations.
log_data (Boolean; default: false): A zone parameter to determine whether or not the gateway logs the data operations.
sync_from_all (Boolean; default: false): A radosgw-admin command to set or unset whether zone syncs from all zonegroup peers.

About pools
Edit online
Ceph zones map to a series of Ceph Storage Cluster pools.

Manually Created Pools vs. Generated Pools

If the user key for the Ceph Object Gateway contains write capabilities, the gateway has the ability to create pools automatically.
This is convenient for getting started. However, the Ceph Object Storage Cluster uses the placement group default values unless they
were set in the Ceph configuration file. Additionally, Ceph will use the default CRUSH hierarchy. These settings are NOT ideal for
production systems.

The default pools for the Ceph Object Gateway’s default zone include:

.rgw.root

.default.rgw.control

.default.rgw.meta

.default.rgw.log

.default.rgw.buckets.index

.default.rgw.buckets.data

.default.rgw.buckets.non-ec

The Ceph Object Gateway creates pools on a per zone basis. If you create the pools manually, prepend the zone name. The system
pools store objects related to, for example, system control, logging, and user information. By convention, these pool names have the
zone name prepended to the pool name.

.<zone-name>.rgw.control: The control pool.

.<zone-name>.log: The log pool contains logs of all bucket/container and object actions, such as create, read, update, and
delete.

.<zone-name>.rgw.buckets.index: This pool stores the index of the buckets.

.<zone-name>.rgw.buckets.data: This pool stores the data of the buckets.

.<zone-name>.rgw.meta: The metadata pool stores user_keys and other critical metadata.

.<zone-name>.meta:users.uid: The user ID pool contains a map of unique user IDs.

.<zone-name>.meta:users.keys: The keys pool contains access keys and secret keys for each user ID.

.<zone-name>.meta:users.email: The email pool contains email addresses associated with a user ID.

.<zone-name>.meta:users.swift: The Swift pool contains the Swift subuser information for a user ID.



Ceph Object Gateways store data for the bucket index (index_pool) and bucket data (data_pool) in placement pools. These may
overlap; that is, you may use the same pool for the index and the data. The index pool for default placement is {zone-name}.rgw.buckets.index and the data pool for default placement is {zone-name}.rgw.buckets.

Name (Type; Default): Description

rgw_zonegroup_root_pool (String; default: .rgw.root): The pool for storing all zone group-specific information.
rgw_zone_root_pool (String; default: .rgw.root): The pool for storing zone-specific information.

Lifecycle settings
Edit online
As a storage administrator, you can set various bucket lifecycle options for a Ceph Object Gateway. These options contain default
values. If you do not specify each option, then the default value is set automatically.

To set specific values for these options, update the configuration database by using the ceph config set client.rgw
_OPTION_ _VALUE_ command.

Name (Type; Default): Description

rgw_lc_debug_interval (Integer; default: -1): For developer use only to debug lifecycle rules by scaling expiration rules from days into an interval in seconds. IBM recommends that this option not be used in a production cluster.
rgw_lc_lock_max_time (Integer; default: 90): The timeout value used internally by the Ceph Object Gateway.
rgw_lc_max_objs (Integer; default: 32): Controls the sharding of the RADOS Gateway internal lifecycle work queues, and should only be set as part of a deliberate resharding workflow. IBM recommends not changing this setting after the setup of your cluster, without first contacting IBM Support.
rgw_lc_max_rules (Integer; default: 1000): The number of lifecycle rules to include in one, per bucket, lifecycle configuration document. The Amazon Web Service (AWS) limit is 1000 rules.
rgw_lc_max_worker (Integer; default: 3): The number of lifecycle worker threads to run in parallel, processing bucket and index shards simultaneously. IBM does not recommend setting a value larger than 10 without contacting IBM Support.
rgw_lc_max_wp_worker (Integer; default: 3): The number of buckets that each lifecycle worker thread can process in parallel. IBM does not recommend setting a value larger than 10 without contacting IBM Support.
rgw_lc_thread_delay (Integer; default: 0): A delay, in milliseconds, that can be injected into shard processing at several points. The default value is 0. Setting a value from 10 to 100 ms would reduce CPU utilization on RADOS Gateway instances and reduce the proportion of workload capacity of lifecycle threads relative to ingest if saturation is being observed.

Swift settings
Edit online
Name (Type; Default): Description

rgw_enforce_swift_acls (Boolean; default: true): Enforces the Swift Access Control List (ACL) settings.
rgw_swift_token_expiration (Integer; default: 24 * 3600): The time in seconds for expiring a Swift token.
rgw_swift_url (String; default: None): The URL for the Ceph Object Gateway Swift API.
rgw_swift_url_prefix (default: swift): The URL prefix for the Swift API, for example, http://fqdn.com/swift.
rgw_swift_auth_url (String; default: None): Default URL for verifying v1 auth tokens (if not using internal Swift auth).
rgw_swift_auth_entry (String; default: auth): The entry point for a Swift auth URL.



Logging settings
Edit online
rgw_log_nonexistent_bucket (Boolean, default false): Enables Ceph Object Gateway to log a request for a non-existent bucket.

rgw_log_object_name (Date, default %Y-%m-%d-%H-%i-%n): The logging format for an object name. See manpage date for details about format specifiers.

rgw_log_object_name_utc (Boolean, default false): Whether a logged object name includes a UTC time. If false, it uses the local time.

rgw_usage_max_shards (Integer, default 32): The maximum number of shards for usage logging.

rgw_usage_max_user_shards (Integer, default 1): The maximum number of shards used for a single user's usage logging.

rgw_enable_ops_log (Boolean, default false): Enable logging for each successful Ceph Object Gateway operation.

rgw_enable_usage_log (Boolean, default false): Enable the usage log.

rgw_ops_log_rados (Boolean, default true): Whether the operations log should be written to the Ceph Storage Cluster backend.

rgw_ops_log_socket_path (String, default None): The Unix domain socket for writing operations logs.

rgw_ops_log_data_backlog (Integer, default 5 << 20): The maximum data backlog size for operations logs written to a Unix domain socket.

rgw_usage_log_flush_threshold (Integer, default 1024): The number of dirty merged entries in the usage log before flushing synchronously.

rgw_usage_log_tick_interval (Integer, default 30): Flush pending usage log data every n seconds.

rgw_intent_log_object_name (Date, default %Y-%m-%d-%i-%n): The logging format for the intent log object name. See manpage date for details about format specifiers.

rgw_intent_log_object_name_utc (Boolean, default false): Whether the intent log object name includes a UTC time. If false, it uses the local time.

rgw_data_log_window (Integer, default 30): The data log entries window in seconds.

rgw_data_log_changes_size (Integer, default 1000): The number of in-memory entries to hold for the data changes log.

rgw_data_log_num_shards (Integer, default 128): The number of shards (objects) on which to keep the data changes log.

rgw_data_log_obj_prefix (String, default data_log): The object name prefix for the data log.

rgw_replica_log_obj_prefix (String, default replica log): The object name prefix for the replica log.

rgw_md_log_max_shards (Integer, default 64): The maximum number of shards for the metadata log.

rgw_log_http_headers (String, default None): Comma-delimited list of HTTP headers to include with ops log entries. Header names are case insensitive, and use the full header name with words separated by underscores.
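
For example, the operations and usage logs can be enabled at runtime. The following commands are an illustrative sketch using the options from the list above:

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_enable_ops_log true
[ceph: root@host01 /]# ceph config set client.rgw rgw_enable_usage_log true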

Keystone settings
Edit online
rgw_keystone_url (String, default None): The URL for the Keystone server.

rgw_keystone_admin_token (String, default None): The Keystone admin token (shared secret).

rgw_keystone_accepted_roles (String, default Member, admin): The roles required to serve requests.

rgw_keystone_token_cache_size (Integer, default 10000): The maximum number of entries in each Keystone token cache.

rgw_keystone_revocation_interval (Integer, default 15 * 60): The number of seconds between token revocation checks.

LDAP settings
Edit online
rgw_ldap_uri (String, example ldaps://<ldap.your.domain>): A space-separated list of LDAP servers in URI format.

rgw_ldap_searchdn (String, example cn=users,cn=accounts,dc=example,dc=com): The LDAP search domain name, also known as the base domain.

rgw_ldap_binddn (String, example uid=admin,cn=users,dc=example,dc=com): The gateway will bind with this LDAP entry (user match).

rgw_ldap_secret (String, example /etc/openldap/secret): A file containing credentials for rgw_ldap_binddn.

rgw_ldap_dnattr (String, example uid): The LDAP attribute containing Ceph Object Gateway user names (to form binddns).

Block devices
Edit online
Learn to manage, create, configure, and use IBM Storage Ceph Block Devices.

Introduction to Ceph block devices


Ceph block devices
Live migration of images
Image encryption
Snapshot management
Mirroring Ceph block devices
Management of ceph-immutable-object-cache daemons
The rbd kernel module
Using the Ceph block device Python module
Ceph block device configuration reference

Introduction to Ceph block devices


Edit online
A block is a set length of bytes in a sequence, for example, a 512-byte block of data. Combining many blocks together into a single
file can be used as a storage device that you can read from and write to. Block-based storage interfaces are the most common way to
store data with rotating media such as:

Hard drives

CD/DVD discs

Floppy disks

Traditional 9-track tapes

Ceph block devices are thin-provisioned, resizable and store data striped over multiple Object Storage Devices (OSD) in a Ceph
storage cluster. Ceph block devices are also known as Reliable Autonomic Distributed Object Store (RADOS) Block Devices (RBDs).
Ceph block devices leverage RADOS capabilities such as:

Snapshots

Replication

Data consistency

Ceph block devices interact with OSDs by using the librbd library.

Ceph block devices deliver high performance with infinite scalability to Kernel Virtual Machines (KVMs), such as Quick Emulator
(QEMU), and cloud-based computing systems, like OpenStack, that rely on the libvirt and QEMU utilities to integrate with Ceph
block devices. You can use the same storage cluster to operate the Ceph Object Gateway and Ceph block devices simultaneously.

IMPORTANT: Using Ceph block devices requires access to a running Ceph storage cluster. For details on installing an IBM Storage
Ceph cluster, see the Installation.

Ceph block devices


Edit online
As a storage administrator, being familiar with Ceph's block device commands can help you effectively manage the IBM Storage
Ceph cluster. You can create and manage block devices pools and images, along with enabling and disabling the various features of
Ceph block devices.

Displaying the command help


Creating a block device pool
Creating a block device image
Listing the block device images
Retrieving the block device image information
Resizing a block device image
Removing a block device image
Moving a block device image to the trash
Defining an automatic trash purge schedule
Enabling and disabling image features
Working with image metadata
Moving images between pools
The rbdmap service
Configuring the rbdmap service
Persistent Write Log Cache (Technology Preview)
Persistent write log cache limitations
Enabling persistent write log cache
Checking persistent write log cache status
Flushing persistent write log cache
Discarding persistent write log cache
Monitoring performance of Ceph Block Devices using the command-line interface

Displaying the command help


Edit online
Display command, and sub-command online help from the command-line interface.

NOTE: The -h option still displays help for all available commands.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. Use the rbd help command to display help for a particular rbd command and its subcommand:

Syntax

rbd help COMMAND SUBCOMMAND

2. To display help for the snap list command:

[root@rbd-client ~]# rbd help snap list

Creating a block device pool


Edit online
Before using the block device client, ensure that a pool for rbd exists and that it is initialized and enabled for the rbd application.

NOTE: You must create the pool before you can specify it as a source.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To create an rbd pool, run the following:

Syntax

ceph osd pool create POOL_NAME PG_NUM


ceph osd pool application enable POOL_NAME rbd
rbd pool init -p POOL_NAME

Example

[root@rbd-client ~]# ceph osd pool create pool1


[root@rbd-client ~]# ceph osd pool application enable pool1 rbd
[root@rbd-client ~]# rbd pool init -p pool1

Reference
Edit online

See Pools for additional details.

Creating a block device image


Edit online
Before adding a block device to a node, create an image for it in the Ceph storage cluster.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To create a block device image, execute the following command:

Syntax

rbd create IMAGE_NAME --size MEGABYTES --pool POOL_NAME

Example

[root@rbd-client ~]# rbd create image1 --size 1024 --pool pool1

This example creates a 1 GB image named image1 that stores information in a pool named pool1.

NOTE: Ensure the pool exists before creating an image.

Reference
Edit online

See Creating a block device pool for additional details.

Listing the block device images


Edit online
List the block device images.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To list block devices in the rbd pool, execute the following command:

NOTE: rbd is the default pool name.

Example

[root@rbd-client ~]# rbd ls

2. To list block devices in a specific pool:

Syntax

rbd ls POOL_NAME

Example

[root@rbd-client ~]# rbd ls pool1

Retrieving the block device image information
Edit online
Retrieve information on the block device image.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To retrieve information from a particular image in the default rbd pool, run the following command:

Syntax

rbd --image IMAGE_NAME info

Example

[root@rbd-client ~]# rbd --image image1 info

2. To retrieve information from an image within a pool:

Syntax

rbd --image IMAGE_NAME -p POOL_NAME info

Example

[root@rbd-client ~]# rbd --image image1 -p pool1 info

Resizing a block device image


Edit online
Ceph block device images are thin-provisioned. They do not actually use any physical storage until you begin saving data to them.
However, they do have a maximum capacity that you set with the --size option.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

To increase the maximum size of a Ceph block device image for the default rbd pool:

Syntax

rbd resize --image IMAGE_NAME --size SIZE

Example

[root@rbd-client ~]# rbd resize --image image1 --size 1024

To decrease the maximum size of a Ceph block device image for the default rbd pool:

Syntax

rbd resize --image IMAGE_NAME --size SIZE --allow-shrink

Example

[root@rbd-client ~]# rbd resize --image image1 --size 1024 --allow-shrink

To increase the maximum size of a Ceph block device image for a specific pool:

Syntax

rbd resize --image POOL_NAME/IMAGE_NAME --size SIZE

Example

[root@rbd-client ~]# rbd resize --image pool1/image1 --size 1024

To decrease the maximum size of a Ceph block device image for a specific pool:

Syntax

rbd resize --image POOL_NAME/IMAGE_NAME --size SIZE --allow-shrink

Example

[root@rbd-client ~]# rbd resize --image pool1/image1 --size 1024 --allow-shrink

Removing a block device image


Edit online
Remove a block device image.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To remove a block device from the default rbd pool:

Syntax

rbd rm IMAGE_NAME

Example

[root@rbd-client ~]# rbd rm image1

2. To remove a block device from a specific pool:

Syntax

rbd rm IMAGE_NAME -p POOL_NAME

Example

[root@rbd-client ~]# rbd rm image1 -p pool1

Moving a block device image to the trash


Edit online
RADOS Block Device (RBD) images can be moved to the trash using the rbd trash command. This command provides more
options than the rbd rm command.

Once an image is moved to the trash, it can be removed from the trash at a later time. This helps to avoid accidental deletion.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To move an image to the trash run the following:

Syntax

rbd trash mv POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd trash mv pool1/image1

Once an image is in the trash, a unique image ID is assigned.

NOTE: You need this image ID to specify the image later if you need to use any of the trash options.

2. Execute the rbd trash list POOL_NAME command for a list of IDs of the images in the trash. This command also returns the image's
pre-deletion name. In addition, there is an optional --image-id argument that can be used with the rbd info and rbd snap
commands. Use --image-id with the rbd info command to see the properties of an image in the trash, and with rbd
snap to remove an image's snapshots from the trash, as shown in the sketch below.
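
The following is a minimal sketch, assuming an image named image1 was moved to the trash in pool1 and received the ID shown in the listing:

Example

[root@rbd-client ~]# rbd trash list pool1
d35ed01706a0 image1

[root@rbd-client ~]# rbd info --pool pool1 --image-id d35ed01706a0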

3. To remove an image from the trash execute the following:

Syntax

rbd trash rm POOL_NAME/IMAGE_ID

Example

[root@rbd-client ~]# rbd trash rm pool1/d35ed01706a0

IMPORTANT: Once an image is removed from the trash, it cannot be restored.

4. Execute the rbd trash restore command to restore the image:

Syntax

rbd trash restore POOL_NAME/IMAGE_ID

Example

[root@rbd-client ~]# rbd trash restore pool1/d35ed01706a0

5. To remove all expired images from trash:

Syntax

rbd trash purge POOL_NAME

Example

[root@rbd-client ~]# rbd trash purge pool1


Removing images: 100% complete...done.

Defining an automatic trash purge schedule


Edit online
You can schedule periodic trash purge operations on a pool.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To add a trash purge schedule, run:

Syntax

rbd trash purge schedule add --pool POOL_NAME INTERVAL

Example

[ceph: root@host01 /]# rbd trash purge schedule add --pool pool1 10m

2. To list the trash purge schedule, execute:

Syntax

rbd trash purge schedule ls --pool POOL_NAME

Example

[ceph: root@host01 /]# rbd trash purge schedule ls --pool pool1


every 10m

3. To know the status of trash purge schedule, execute:

Example

[ceph: root@host01 /]# rbd trash purge schedule status


POOL NAMESPACE SCHEDULE TIME
pool1 2021-08-02 11:50:00

4. To remove the trash purge schedule, execute:

Syntax

rbd trash purge schedule remove --pool POOL_NAME INTERVAL

Example

[ceph: root@host01 /]# rbd trash purge schedule remove --pool pool1 10m

Enabling and disabling image features

Edit online
Block device image features, such as fast-diff, exclusive-lock, object-map, and deep-flatten, are enabled by default on newly created images. You can
enable or disable these image features on already existing images.

NOTE: The deep-flatten feature can only be disabled on already existing images, not enabled. To use deep-flatten, enable
it when creating images.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. Retrieve information from a particular image in a pool:

Syntax

rbd --image POOL_NAME/IMAGE_NAME info

Example

[ceph: root@host01 /]# rbd --image pool1/image1 info

2. Enable a feature:

Syntax

rbd feature enable POOL_NAME/IMAGE_NAME FEATURE_NAME

i. To enable the exclusive-lock feature on the image1 image in the pool1 pool:

Example

[ceph: root@host01 /]# rbd feature enable pool1/image1 exclusive-lock

IMPORTANT: If you enable the fast-diff and object-map features, then rebuild the object map:

Syntax

rbd object-map rebuild POOL_NAME/IMAGE_NAME

3. Disable a feature:

Syntax

rbd feature disable POOL_NAME/IMAGE_NAME FEATURE_NAME

i. To disable the fast-diff feature on the image1 image in the pool1 pool:

Example

[ceph: root@host01 /]# rbd feature disable pool1/image1 fast-diff

Working with image metadata


Edit online
Ceph supports adding custom image metadata as key-value pairs. The pairs do not have any strict format.

Also, by using metadata, you can set the RADOS Block Device (RBD) configuration parameters for particular images.

Use the rbd image-meta commands to work with metadata.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. To set a new metadata key-value pair:

Syntax

rbd image-meta set POOL_NAME/IMAGE_NAME KEY VALUE

Example

[ceph: root@host01 /]# rbd image-meta set pool1/image1 last_update 2021-06-06

This example sets the last_update key to the 2021-06-06 value on the image1 image in the pool1 pool.

2. To view a value of a key:

Syntax

rbd image-meta get POOL_NAME/IMAGE_NAME KEY

Example

[ceph: root@host01 /]# rbd image-meta get pool1/image1 last_update

This example views the value of the last_update key.

3. To show all metadata on an image:

Syntax

rbd image-meta list POOL_NAME/IMAGE_NAME

Example

[ceph: root@host01 /]# rbd image-meta list pool1/image1

This example lists the metadata set for the image1 image in the pool1 pool.

4. To remove a metadata key-value pair:

Syntax

rbd image-meta remove POOL_NAME/IMAGE_NAME KEY

Example

[ceph: root@host01 /]# rbd image-meta remove pool1/image1 last_update

This example removes the last_update key-value pair from the image1 image in the pool1 pool.

5. To override the RBD image configuration settings set in the Ceph configuration file for a particular image:

Syntax

rbd config image set POOL_NAME/IMAGE_NAME PARAMETER VALUE

Example

[ceph: root@host01 /]# rbd config image set pool1/image1 rbd_cache false

This example disables the RBD cache for the image1 image in the pool1 pool.

Reference
Edit online

See Block device general options for a list of possible configuration options.

Moving images between pools


Edit online
You can move RADOS Block Device (RBD) images between different pools within the same cluster.

During this process, the source image is copied to the target image with all snapshot history and, optionally, with a link to the source
image's parent to help preserve sparseness. The source image is read-only; the target image is writable. The target image is linked to
the source image while the migration is in progress.

You can safely run this process in the background while the new target image is in use. However, stop all clients using the source
image before the preparation step to ensure that clients that use the image are updated to point to the new target image.

IMPORTANT: The krbd kernel module does not support live migration at this time.

Prerequisites
Edit online

Stop all clients that use the source image.

Root-level access to the client node.

Procedure
Edit online

1. Prepare for migration by creating the new target image that cross-links the source and target images:

Syntax

rbd migration prepare SOURCE_IMAGE TARGET_IMAGE

Replace:

SOURCE_IMAGE with the name of the image to be moved. Use the POOL/IMAGE_NAME format.

TARGET_IMAGE with the name of the new image. Use the POOL/IMAGE_NAME format.

Example

[root@rbd-client ~]# rbd migration prepare pool1/image1 pool2/image2

2. Verify the state of the new target image, which is supposed to be prepared:

Syntax

rbd status TARGET_IMAGE

Example

[root@rbd-client ~]# rbd status pool2/image2


Watchers: none
Migration:
source: pool1/image1 (5e2cba2f62e)
destination: pool2/image2 (5e2ed95ed806)
state: prepared

3. Optionally, restart the clients using the new target image name.

4. Copy the source image to target image:

Syntax

rbd migration execute TARGET_IMAGE

Example

[root@rbd-client ~]# rbd migration execute pool2/image2

5. Ensure that the migration is completed:

Example

[root@rbd-client ~]# rbd status pool2/image2


Watchers:
watcher=1.2.3.4:0/3695551461 client.123 cookie=123
Migration:
source: pool1/image1 (5e2cba2f62e)
destination: pool2/image2 (5e2ed95ed806)
state: executed

6. Commit the migration by removing the cross-link between the source and target images, and this also removes the source
image:

Syntax

rbd migration commit TARGET_IMAGE

Example

[root@rbd-client ~]# rbd migration commit pool2/image2

If the source image is a parent of one or more clones, use the --force option after ensuring that the clone images are not in
use:

Example

[root@rbd-client ~]# rbd migration commit pool2/image2 --force

7. If you did not restart the clients after the preparation step, restart them using the new target image name.

The rbdmap service

Edit online
The systemd unit file, rbdmap.service, is included with the ceph-common package. The rbdmap.service unit executes the
rbdmap shell script.

This script automates the mapping and unmapping of RADOS Block Devices (RBD) for one or more RBD images. The script can be run
manually at any time, but the typical use case is to automatically mount RBD images at boot time, and unmount at shutdown. The
script takes a single argument, which can be either map, for mounting, or unmap, for unmounting RBD images. The script parses a
configuration file, the default is /etc/ceph/rbdmap, but it can be overridden by using an environment variable called RBDMAPFILE.
Each line of the configuration file corresponds to an RBD image.

The format of the configuration file is as follows:

IMAGE_SPEC RBD_OPTS

Where IMAGE_SPEC specifies the POOL_NAME / IMAGE_NAME, or just the IMAGE_NAME, in which case the POOL_NAME defaults
to rbd. The RBD_OPTS is an optional list of options to be passed to the underlying rbd map command. These parameters and their
values should be specified as a comma-separated string:

OPT1=VAL1,OPT2=VAL2,...,OPT_N=VAL_N

This will cause the script to issue an rbd map command like the following:

Syntax

rbd map POOL_NAME/IMAGE_NAME --OPT1 VAL1 --OPT2 VAL2

NOTE: For options and values which contain commas or equality signs, a simple apostrophe can be used to prevent replacing them.

When successful, the rbd map operation maps the image to a /dev/rbdX device, at which point a udev rule is triggered to create a
friendly device name symlink, for example, /dev/rbd/POOL_NAME/IMAGE_NAME, pointing to the real mapped device. For mounting
or unmounting to succeed, the friendly device name must have a corresponding entry in /etc/fstab file. When writing
/etc/fstab entries for RBD images, it is a good idea to specify the noauto or nofail mount option. This prevents the init system
from trying to mount the device too early, before the device exists.
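
The following is a minimal sketch of an /etc/fstab entry for a mapped image; the pool, image, mount point, and filesystem type are illustrative assumptions:

Example

/dev/rbd/pool1/image1 /mnt/image1 ext4 noauto,nofail 0 0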

Reference
Edit online

See the rbd manpage for a full list of possible options.

Configuring the rbdmap service

Edit online
To automatically map and mount, or unmap and unmount, RADOS Block Devices (RBD) at boot time, or at shutdown respectively.

Prerequisites
Edit online

Root-level access to the node doing the mounting.

Installation of the ceph-common package.

Procedure
Edit online

1. Open for editing the /etc/ceph/rbdmap configuration file.

2. Add the RBD image or images to the configuration file:

Example

foo/bar1 id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
foo/bar2 id=admin,keyring=/etc/ceph/ceph.client.admin.keyring,options='lock_on_read,queue_depth=1024'

3. Save changes to the configuration file.

4. Enable the RBD mapping service:

Example

[root@client ~]# systemctl enable rbdmap.service

Reference
Edit online

See The rbdmap service for more details on the RBD system service.

Persistent Write Log Cache (Technology Preview)


Edit online

IMPORTANT: Technology Preview features are not supported with IBM production service level agreements (SLAs), might not be
functionally complete, and IBM does not recommend using them for production. These features provide early access to upcoming
product features, enabling customers to test functionality and provide feedback during the development process.

In an IBM Storage Ceph cluster, Persistent Write Log (PWL) cache provides a persistent, fault-tolerant write-back cache for librbd-
based RBD clients.

PWL cache uses a log-ordered write-back design which maintains checkpoints internally so that writes that get flushed back to the
cluster are always crash consistent. If the client cache is lost entirely, the disk image is still consistent but the data appears stale.
You can use PWL cache with persistent memory (PMEM) or solid-state disks (SSD) as cache devices.

For PMEM, the cache mode is replica write log (RWL) and for SSD, the cache mode is SSD. Currently, PWL cache supports the RWL and
SSD modes and is disabled by default.

Primary benefits of PWL cache are:

PWL cache can provide high performance when the cache is not full. The larger the cache, the longer the duration of high
performance.

PWL cache provides persistence and is not much slower than RBD cache. RBD cache is faster but volatile and cannot
guarantee data order and persistence.

In a steady state, where the cache is full, performance is affected by the number of I/Os in flight. For example, PWL can
provide higher performance at low io_depth, but at high io_depth, such as when the number of I/Os is greater than 32, the
performance is often worse than that in cases without cache.

Use cases for PMEM caching are:

Different from RBD cache, PWL cache has non-volatile characteristics and is used in scenarios where you do not want data
loss and need performance.

RWL mode provides low latency. It has a stable low latency for burst I/Os and it is suitable for those scenarios with high
requirements for stable low latency.

RWL mode also has high continuous and stable performance improvement in scenarios with low I/O depth or not too much
inflight I/O.

Use case for SSD caching is:

The advantages of SSD mode are similar to RWL mode. SSD hardware is relatively cheap and popular, but its performance is
slightly lower than PMEM.

Persistent write log cache limitations


Edit online
When using Persistent Write Log (PWL) cache, there are several limitations that should be considered.

The underlying implementation of persistent memory (PMEM) and solid-state disks (SSD) is different, with PMEM having
higher performance. At present, PMEM can provide "persist on write" and SSD is "persist on flush or checkpoint". In future
releases, these two modes will be configurable.

When images are opened and closed frequently, Ceph shows poor performance, and the performance is worse when PWL cache is
enabled. Instead of setting num_jobs in a Flexible I/O (fio) test, set up multiple jobs that write to different images.

Enabling persistent write log cache


Edit online
You can enable persistent write log cache (PWL) on an IBM Storage Ceph cluster by setting the Ceph RADOS block device (RBD)
rbd_persistent_cache_mode and rbd_plugins options.

IMPORTANT: The exclusive-lock feature must be enabled to enable persistent write log cache. The cache can be loaded only after
the exclusive-lock is acquired. Exclusive-locks are enabled on newly created images by default unless overridden by the
rbd_default_features configuration option or the --image-feature flag for the rbd create command. See Enabling and
disabling image features in the IBM Storage Ceph Block Device Guide for more details on the exclusive-lock feature.

Set the persistent write log cache options at the host level by using the ceph config set command. Set the persistent write log
cache options at the pool or image level by using the rbd config pool set or the rbd config image set commands.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the monitor node.

The exclusive-lock feature is enabled.

Client-side disks are persistent memory (PMEM) or solid-state disks (SSD).

RBD cache is disabled.

Procedure
Edit online

1. Enable PWL cache:

i. At the host level, use the ceph config set command:

Syntax

ceph config set client rbd_persistent_cache_mode CACHE_MODE


ceph config set client rbd_plugins pwl_cache

Replace CACHE_MODE with rwl or ssd.

Example

[ceph: root@host01 /]# ceph config set client rbd_persistent_cache_mode ssd


[ceph: root@host01 /]# ceph config set client rbd_plugins pwl_cache

ii. At the pool level, use the rbd config pool set command:

Syntax

rbd config pool set POOL_NAME rbd_persistent_cache_mode CACHE_MODE


rbd config pool set POOL_NAME rbd_plugins pwl_cache

Replace CACHE_MODE with rwl or ssd.

Example

[ceph: root@host01 /]# rbd config pool set pool1 rbd_persistent_cache_mode ssd
[ceph: root@host01 /]# rbd config pool set pool1 rbd_plugins pwl_cache

iii. At the image level, use the rbd config image set command:

Syntax

rbd config image set POOL_NAME/IMAGE_NAME rbd_persistent_cache_mode CACHE_MODE


rbd config image set POOL_NAME/IMAGE_NAME rbd_plugins pwl_cache

Replace CACHE_MODE with rwl or ssd.

Example

[ceph: root@host01 /]# rbd config image set pool1/image1 rbd_persistent_cache_mode ssd
[ceph: root@host01 /]# rbd config image set pool1/image1 rbd_plugins pwl_cache

2. Optional: Set the additional RBD options at the host, the pool, or the image level:

Syntax

rbd_persistent_cache_mode CACHE_MODE
rbd_plugins pwl_cache
rbd_persistent_cache_path /PATH_TO_DAX_ENABLED_FOLDER/WRITE_BACK_CACHE_FOLDER <1>
rbd_persistent_cache_size PERSISTENT_CACHE_SIZE <2>

<1> rbd_persistent_cache_path - A file folder to cache data that must have direct access (DAX) enabled when using the
rwl mode to avoid performance degradation.

<2> rbd_persistent_cache_size - The cache size per image, with a minimum cache size of 1 GB. The larger the cache
size, the better the performance.

Example

rbd_cache false
rbd_persistent_cache_mode rwl
rbd_plugins pwl_cache
rbd_persistent_cache_path /mnt/pmem/cache/
rbd_persistent_cache_size 1073741824

Reference
Edit online

See Direct Access for files article on kernel.org for more details on using DAX.

Checking persistent write log cache status


Edit online
You can check the status of the Persistent Write Log (PWL) cache. The cache is used when an exclusive lock is acquired, and when
the exclusive-lock is released, the persistent write log cache is closed. The cache status shows information about the cache size,
location, type, and other cache-related information. Updates to the cache status are done when the cache is opened and closed.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the monitor node.

A running process with PWL cache enabled.

Procedure
Edit online

View the PWL cache status:

Syntax

rbd status POOL_NAME/IMAGE_NAME

Example

[ceph: root@host01 /]# rbd status pool1/image1


Watchers:
watcher=10.10.0.102:0/1061883624 client.25496 cookie=140338056493088
Persistent cache state:
host: host02
path: /mnt/nvme0/rbd-pwl.rbd.101e5824ad9a.pool
size: 1 GiB
mode: ssd
stats_timestamp: Mon Apr 18 13:26:32 2022
present: true empty: false clean: false
allocated: 509 MiB
cached: 501 MiB
dirty: 338 MiB
free: 515 MiB
hits_full: 1450 / 61%
hits_partial: 0 / 0%
misses: 924
hit_bytes: 192 MiB / 66%
miss_bytes: 97 MiB

Flushing persistent write log cache


Edit online
You can flush the cache file with the rbd command, specifying persistent-cache flush, the pool name, and the image name
before discarding the persistent write log (PWL) cache. The flush command explicitly writes cache entries back to the OSDs. If
the cache is interrupted or the application dies unexpectedly, you can manually flush all the entries in the cache to the OSDs and
then invalidate the cache.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the monitor node.

PWL cache is enabled.

Procedure
Edit online

Flush the PWL cache:

Syntax

rbd persistent-cache flush POOL_NAME/IMAGE_NAME

Example

[ceph: root@host01 /]# rbd persistent-cache flush pool1/image1

Reference
Edit online

See Discarding persistent write log cache for more details.

Discarding persistent write log cache


Edit online
You might need to manually discard the Persistent Write Log (PWL) cache, for example, if the data in the cache has expired. You can
discard a cache file for an image by using the rbd persistent-cache invalidate command. The command removes the cache
metadata for the specified image, disables the cache feature, and deletes the local cache file, if it exists.

Prerequisites

Edit online

A running IBM Storage Ceph cluster.

Root-level access to the monitor node.

PWL cache is enabled.

Procedure
Edit online

Discard PWL cache:

Syntax

rbd persistent-cache invalidate POOL_NAME/IMAGE_NAME

Example

[ceph: root@host01 /]# rbd persistent-cache invalidate pool1/image1

Monitoring performance of Ceph Block Devices using the


command-line interface
Edit online
This framework provides a built-in method to generate and process performance metrics upon which other Ceph Block Device
performance monitoring solutions are built.

A new Ceph Manager module, rbd_support, aggregates the performance metrics when enabled. The rbd command has two new
actions: iotop and iostat.

NOTE: The initial use of these actions can take around 30 seconds to populate the data fields.

Prerequisites
Edit online

User-level access to a Ceph Monitor node.

Procedure
Edit online

1. Ensure the rbd_support Ceph Manager module is enabled:

Example

[ceph: root@host01 /]# ceph mgr module ls

{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support", <--
"status",
"telemetry",
"volumes"
]
}

2. To display an "iotop"-style of images:

Example

[user@mon ~]$ rbd perf image iotop

NOTE: The write-ops, read-ops, write-bytes, read-bytes, write-latency, and read-latency columns can be sorted dynamically
by using the right and left arrow keys.

3. To display an "iostat"-style of images:

Example

[user@mon ~]$ rbd perf image iostat

NOTE: The output from this command can be in JSON or XML format, and then can be sorted using other command-line tools.
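
For example, a minimal sketch of requesting JSON output for further processing with other tools; the pool name is an illustrative assumption:

Example

[user@mon ~]$ rbd perf image iostat pool1 --format json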

Live migration of images


Edit online
As a storage administrator, you can live-migrate RBD images between different pools, or even within the same pool, in the same
storage cluster. You can migrate between different image formats and layouts and even from external data sources. When live
migration is initiated, the source image is deep copied to the destination image, pulling all snapshot history while preserving the
sparse allocation of data where possible.

IMPORTANT: Currently, the krbd kernel module does not support live migration.

The live migration process


Formats
Streams
Preparing the live migration process
Preparing import-only migration
Executing the live migration process
Committing the live migration process
Aborting the live migration process

The live migration process


Edit online
By default, during the live migration of the RBD images with the same storage cluster, the source image is marked read-only. All
clients redirect the Input/Output (I/O) to the new target image. Additionally, this mode can preserve the link to the source image’s
parent to preserve sparseness, or it can flatten the image during the migration to remove the dependency on the source image’s
parent. You can use the live migration process in an import-only mode, where the source image remains unmodified. You can link the
target image to an external data source, such as a backup file, HTTP(s) file, or an S3 object. The live migration copy process can
safely run in the background while the new target image is being used.

The live migration process consists of three steps:

Prepare Migration: The first step is to create new target image and link the target image to the source image. If the import-only
mode is not configured, the source image will also be linked to the target image and marked read-only. Attempts to read uninitialized
data extents within the target image will internally redirect the read to the source image, and writes to uninitialized extents within the
target image will internally deep copy the overlapping source image extents to the target image.

Execute Migration: This is a background operation that deep-copies all initialized blocks from the source image to the target. You
can run this step when clients are actively using the new target image.

Finish Migration: You can commit or abort the migration once the background migration process is completed. Committing the
migration removes the cross-links between the source and target images, and removes the source image if the import-only mode is
not configured. Aborting the migration removes the cross-links and removes the target image. The three steps are sketched below.
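
The following is a minimal sketch of the three steps, assuming a source image sourcepool/image1 and a target image targetpool/image1; the procedures later in this section cover each step in detail:

Example

[ceph: root@rbd-client /]# rbd migration prepare sourcepool/image1 targetpool/image1
[ceph: root@rbd-client /]# rbd migration execute targetpool/image1
[ceph: root@rbd-client /]# rbd migration commit targetpool/image1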

Formats
Edit online
The source-spec JSON document is encoded as:

Syntax

{
"type": "native",
"pool_name": "POOL_NAME",
["pool_id": "POOL_ID",] (optional, alternative to "POOL_NAME" key)
["pool_namespace": "POOL_NAMESPACE",] (optional)
"image_name": "IMAGE_NAME>",
["image_id": "IMAGE_ID",] (optional, useful if image is in trash)
"snap_name": "SNAP_NAME",
["snap_id": "SNAP_ID",] (optional, alternative to "SNAP_NAME" key)
}

Note that the native format does not include the stream object since it utilizes native Ceph operations. For example, to import from
the image rbd/ns1/image1@snap1, the source-spec could be encoded as:

Example

{
"type": "native",
"pool_name": "rbd",
"pool_namespace": "ns1",
"image_name": "image1",
"snap_name": "snap1"
}

You can use the qcow format to describe a QEMU copy-on-write (QCOW) block device. Both the QCOW v1 and v2 formats are
currently supported with the exception of advanced features such as compression, encryption, backing files, and external data files.
You can link the qcow format data to any supported stream source:

Example

{
"type": "qcow",
"stream": {
"type": "file",
"file_path": "/mnt/image.qcow"
}
}

You can use the raw format to describe a thick-provisioned, raw block device export, that is, the output of rbd export --export-format 1
SNAP_SPEC. You can link the raw format data to any supported stream source:

Example

{
"type": "raw",
"stream": {
"type": "file",
"file_path": "/mnt/image-head.raw"
},
"snapshots": [
{
"type": "raw",
"name": "snap1",
"stream": {
"type": "file",
"file_path": "/mnt/image-snap1.raw"
}
},
] (optional oldest to newest ordering of snapshots)
}

The inclusion of the snapshots array is optional and currently only supports thick-provisioned raw snapshot exports.

Streams
Edit online
File stream

You can use the file stream to import from a locally accessible POSIX file source.

Syntax

{
<format unique parameters>
"stream": {
"type": "file",
"file_path": "FILE_PATH"
}
}

For example, to import a raw-format image from a file located at /mnt/image.raw, the source-spec JSON file is:

Example

{
"type": "raw",
"stream": {
"type": "file",
"file_path": "/mnt/image.raw"
}
}

HTTP stream

You can use the HTTP stream to import from a remote HTTP or HTTPS web server.

Syntax

{
<format unique parameters>
"stream": {
"type": "http",
"url": "URL_PATH"
}
}

For example, to import a raw-format image from a file located at http://download.ceph.com/image.raw, the source-spec
JSON file is:

Example

{
"type": "raw",
"stream": {
"type": "http",
"url": "http://download.ceph.com/image.raw"
}
}

S3 stream

You can use the s3 stream to import from a remote S3 bucket.

Syntax

{
<format unique parameters>
"stream": {
"type": "s3",
"url": "URL_PATH",
"access_key": "ACCESS_KEY",
"secret_key": "SECRET_KEY"
}
}

For example, to import a raw-format image from a file located at http://s3.ceph.com/bucket/image.raw, its source-spec
JSON is encoded as follows:

Example

{
"type": "raw",
"stream": {
"type": "s3",
"url": "http://s3.ceph.com/bucket/image.raw",
"access_key": "NX5QOQKC6BH2IDN8HC7A",
"secret_key": "LnEsqNNqZIpkzauboDcLXLcYaWwLQ3Kop0zAnKIn"
}
}

Preparing the live migration process


Edit online
You can prepare the default live migration process for RBD images within the same IBM Storage Ceph cluster. The rbd migration
prepare command accepts all the same layout options as the rbd create command, which allows you to change the on-disk
layout of the otherwise immutable image. If you only want to change the on-disk layout and keep the original image
name, skip the migration_target argument. All clients using the source image must be stopped before preparing a live migration.
The prepare step will fail if it finds any running clients with the image open in read/write mode. You can restart the clients using the
new target image once the prepare step is completed.

NOTE: You cannot restart the clients using the source image as it will result in a failure.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Two block device pools.

One block device image.

1. Prepare the live migration within the storage cluster:

Syntax

rbd migration prepare SOURCE_POOL_NAME/SOURCE_IMAGE_NAME TARGET_POOL_NAME/SOURCE_IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd migration prepare sourcepool1/sourceimage1 targetpool1/sourceimage1

OR

If you want to rename the source image:

Syntax

rbd migration prepare SOURCE_POOL_NAME/SOURCE_IMAGE_NAME TARGET_POOL_NAME/NEW_SOURCE_IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd migration prepare sourcepool1/sourceimage1 targetpool1/newsourceimage1

In the example, newsourceimage1 is the renamed source image.

2. You can check the current state of the live migration process with the following command:

Syntax

rbd status TARGET_POOL_NAME/SOURCE_IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd status targetpool1/sourceimage1


Watchers: none
Migration:
source: sourcepool1/sourceimage1 (adb429cb769a)
destination: targetpool2/testimage1 (add299966c63)
state: prepared

IMPORTANT: During the migration process, the source image is moved into the RBD trash to prevent mistaken usage.

Example

[ceph: root@rbd-client /]# rbd info sourceimage1


rbd: error opening image sourceimage1: (2) No such file or directory

Example

[ceph: root@rbd-client /]# rbd trash ls --all sourcepool1


adb429cb769a sourceimage1

Preparing import-only migration


Edit online
You can initiate the import-only live migration process by running the rbd migration prepare command with the --import-
only and either, --source-spec or --source-spec-path options, passing a JSON document that describes how to access the
source image data directly on the command line or from a file.

Ensure that a bucket and an S3 object are created before you begin.

1. Create a JSON file:

Example

[ceph: root@rbd-client /]# cat testspec.json


{
"type": "raw",
"stream": {
"type": "s3",
"url": "http://10.74.253.18:80/testbucket1/image.raw",
"access_key": "RLJOCP6345BGB38YQXI5",
"secret_key": "oahWRB2ote2rnLy4dojYjDrsvaBADriDDgtSfk6o"
}
}

2. Prepare the import-only live migration process:

Syntax

rbd migration prepare --import-only --source-spec-path "JSON_FILE" TARGET_POOL_NAME

Example

[ceph: root@rbd-client /]# rbd migration prepare --import-only --source-spec-path "testspec.json" targetpool1

NOTE: The rbd migration prepare command accepts all the same image options as the rbd create command.

3. You can check the status of the import-only live migration:

Example

[ceph: root@rbd-client /]# rbd status targetpool1/sourceimage1


Watchers: none
Migration:
source: {"stream":
{"access_key":"RLJOCP6345BGB38YQXI5","secret_key":"oahWRB2ote2rnLy4dojYjDrsvaBADriDDgtSfk6o","
type":"s3","url":"http://10.74.253.18:80/testbucket1/image.raw"},"type":"raw"}

destination: targetpool1/sourceimage1 (b13865345e66)
state: prepared

Executing the live migration process


Edit online
After you prepare for the live migration, you must copy the image blocks from the source image to the target image.

NOTE: The execute sub-command copies the image blocks. No further action is required other than running the
command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Two block device pools.

One block device image with migration prepared using Live migration of images.

Procedure
Edit online

1. Execute the live migration:

Syntax

rbd migration execute TARGET_POOL_NAME/SOURCE_IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd migration execute targetpool1/sourceimage1


Image migration: 100% complete...done.

2. You can check the feedback on the progress of the migration block deep-copy process:

Syntax

rbd status TARGET_POOL_NAME/SOURCE_IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd status targetpool1/sourceimage1


Watchers: none
Migration:
source: sourcepool1/testimage1 (adb429cb769a)
destination: targetpool1/testimage1 (add299966c63)
state: executed

Committing the live migration process


Edit online
You can commit the migration, once the live migration has completed deep-copying all the data blocks from the source image to the
target image.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Two block device pools.

One block device image using Executing the live migration process.

Procedure
Edit online

1. Commit the migration, once deep-copying is completed:

Syntax

rbd migration commit TARGET_POOL_NAME/SOURCE_IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd migration commit targetpool1/sourceimage1


Commit image migration: 100% complete...done.

Verification

Committing the live migration will remove the cross-links between the source and target images, and also removes the source image
from the source pool:

Example

[ceph: root@rbd-client /]# rbd trash list --all sourcepool1

Aborting the live migration process


Edit online
You can revert the live migration process. Aborting live migration reverts the prepare and execute steps.

NOTE: You can abort only if you have not committed the live migration.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Two block device pools.

One block device image.

Procedure
Edit online

1. Abort the live migration process:

Syntax

rbd migration abort TARGET_POOL_NAME/SOURCE_IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd migration abort targetpool1/sourceimage1


Abort image migration: 100% complete...done.

Verification

When the live migration process is aborted, the target image is deleted and access to the original source image is restored in the
source pool:

Example

[ceph: root@rbd-client /]# rbd ls sourcepool1


sourceimage1

Image encryption
Edit online
As a storage administrator, you can set a secret key that is used to encrypt a specific RBD image. Image level encryption is handled
internally by RBD clients.

NOTE: The krbd module does not support image level encryption.

NOTE: You can use external tools such as dm-crypt or QEMU to encrypt an RBD image.

Encryption format
Encryption load
Supported formats
Adding encryption format to images and clones

Prerequisites
Edit online

A running IBM Storage Ceph 5 cluster.

Root-level permissions.

Encryption format
Edit online
RBD images are not encrypted by default. You can encrypt an RBD image by formatting to one of the supported encryption formats.
The format operation persists the encryption metadata to the RBD image. The encryption metadata includes information such as the
encryption format and version, cipher algorithm and mode specifications, as well as the information used to secure the encryption
key.

The encryption key is protected by a user kept secret that is a passphrase, which is never stored as persistent data in the RBD image.
The encryption format operation requires you to specify the encryption format, cipher algorithm, and mode specification as well as a
passphrase. The encryption metadata is stored in the RBD image, currently as an encryption header that is written at the start of the
raw image. This means that the effective image size of the encrypted image would be lower than the raw image size.

NOTE: Currently you can only encrypt flat RBD images. Clones of an encrypted RBD image are inherently encrypted using the same
encryption profile and passphrase.

NOTE: Any data written to the RBD image before formatting might become unreadable, even though it might still occupy storage
resources. RBD images with the journal feature enabled cannot be encrypted.

Encryption load
Edit online
By default, all RBD APIs treat encrypted RBD images the same way as unencrypted RBD images. You can read or write raw data
anywhere in the image. Writing raw data into the image might risk the integrity of the encryption format. For example, the raw data
could override the encryption metadata located at the beginning of the image. To safely perform encrypted Input/Output(I/O) or
maintenance operations on the encrypted RBD image, an additional encryption load operation must be applied immediately after
opening the image.

The encryption load operation requires you to specify the encryption format and a passphrase for unlocking the encryption key for
the image itself and each of its explicitly formatted ancestor images. All I/Os for the opened RBD image are encrypted or decrypted,
for a cloned RBD image, this includes IOs for the parent images. The encryption key is stored in memory by the RBD client until the
image is closed.

NOTE: Once the encryption is loaded on the RBD image, no other encryption load or format operation can be applied. Additionally,
API calls for retrieving the RBD image size using the opened image context return the effective image size. The encryption is loaded
automatically when mapping the RBD images as block devices through rbd-nbd.

NOTE: API calls for retrieving the image size and the parent overlap using the opened image context returns the effective image size
and the effective parent overlap.

NOTE: If a clone of an encrypted image is explicitly formatted, flattening or shrinking of the cloned image ceases to be transparent
since the parent data must be re-encrypted according to the cloned image format as it is copied from the parent snapshot. If
encryption is not loaded before the flatten operation is issued, any parent data that was previously accessible in the cloned image
might become unreadable.

NOTE: If a clone of an encrypted image is explicitly formatted, the operation of shrinking the cloned image ceases to be transparent.
This is because, in scenarios such as the cloned image containing snapshots or the cloned image being shrunk to a size that is not
aligned with the object size, the action of copying some data from the parent snapshot, similar to flattening is involved. If encryption
is not loaded before the shrink operation is issued, any parent data that was previously accessible in the cloned image might become
unreadable.

Supported formats
Edit online
Both Linux Unified Key Setup (LUKS) 1 and 2 are supported. The data layout is fully compliant with the LUKS specification. External
LUKS compatible tools such as dm-crypt or QEMU can safely perform encrypted Input/Output (I/O) on encrypted RBD images.
Additionally, you can import existing LUKS images created by external tools, by copying the raw LUKS data into the RBD image.

Currently, only Advanced Encryption Standards (AES) 128 and 256 encryption algorithms are supported. xts-plain64 is currently the
only supported encryption mode.

To use the LUKS format, format the RBD image with the following command:

NOTE: You need to create a file named passphrase.txt and enter a passphrase. You can randomly generate the passphrase, which
might contain NULL characters. If the passphrase ends with a newline character, it will be stripped off.

Syntax

rbd encryption format POOL_NAME/LUKS_IMAGE luks1|luks2 passphrase.txt

Example

[ceph: root@host01 /]# rbd encryption format pool1/luksimage1 luks1 passphrase.txt

NOTE: You can select either the luks1 or luks2 encryption format.

The encryption format operation generates a LUKS header and writes it at the start of the RBD image. A single keyslot is appended to
the header. The keyslot holds a randomly generated encryption key, and is protected by the passphrase read from the passphrase
file. By default, AES-256 in xts-plain64 mode, which is the current recommended mode and the default for other LUKS tools, is used.
Adding or removing additional passphrases is currently not supported natively, but can be achieved using LUKS tools such as
cryptsetup. The LUKS header size can vary, up to 136 MiB in LUKS2, but it is usually up to 16 MiB, depending on the version of
libcryptsetup installed. For optimal performance, the encryption format sets the data offset to be aligned with the image
object size. For example, expect a minimum overhead of 8 MiB if using an image configured with an 8 MiB object size.

In LUKS1, sectors, which are the minimal encryption units, are fixed at 512 bytes. LUKS2 supports larger sectors, and for better
performance, the default sector size is set to the maximum of 4KiB. Writes which are either smaller than a sector, or are not aligned
to a sector start, will trigger a guarded read-modify-write chain on the client, with a considerable latency penalty. A batch of
such unaligned writes can lead to I/O races, which further deteriorate performance. IBM recommends avoiding RBD
encryption in cases where incoming writes cannot be guaranteed to be LUKS sector aligned.

To map a LUKS encrypted image, run the following command:

Syntax

rbd device map -t nbd -o encryption-format=luks1|luks2,encryption-passphrase-file=passphrase.txt
POOL_NAME/LUKS_IMAGE

Example

[ceph: root@host01 /]# rbd device map -t nbd -o encryption-format=luks1,encryption-passphrase-file=passphrase.txt pool1/luksimage1

NOTE: You can select either luks1 or luks2 encryption format.

NOTE: For security reasons, both the encryption format and encryption load operations are CPU-intensive, and may take a few
seconds to complete. For encrypted I/O, assuming AES-NI is enabled, a relatively small latency of a few microseconds might be added, as well
as a small increase in CPU utilization.

Adding encryption format to images and clones


Edit online
Layered-client-side encryption is supported. The cloned images can be encrypted with their own format and passphrase, potentially
different from that of the parent image.

Add encryption format to images and clones with the rbd encryption format command. Given a LUKS2-formatted image, you can
create both a LUKS2-formatted clone and a LUKS1-formatted clone.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with Block Device (RBD) configured.

Root-level access to the node.

Procedure
Edit online

1. Create a LUKS2-formatted image:

Syntax

rbd create --size SIZE POOL_NAME/LUKS_IMAGE


rbd encryption format POOL_NAME/LUKS_IMAGE luks1|luks2 PASSPHRASE_FILE
rbd resize --size SIZE --encryption-passphrase-file PASSPHRASE_FILE POOL_NAME/LUKS_IMAGE

Example

[ceph: root@host01 /]# rbd create --size 50G mypool/myimage


[ceph: root@host01 /]# rbd encryption format mypool/myimage luks2 passphrase.txt
[ceph: root@host01 /]# rbd resize --size 50G --encryption-passphrase-file passphrase.txt mypool/myimage

The rbd resize command grows the image to compensate for the overhead associated with the LUKS2 header.

2. With the LUKS2-formatted image, create a LUKS2-formatted clone with the same effective size:

Syntax

rbd snap create POOL_NAME/IMAGE_NAME@SNAP_NAME


rbd snap protect POOL_NAME/IMAGE_NAME@SNAP_NAME
rbd clone POOL_NAME/IMAGE_NAME@SNAP_NAME POOL_NAME/CLONE_NAME
rbd encryption format POOL_NAME/CLONE_NAME luks1 CLONE_PASSPHRASE_FILE

Example

[ceph: root@host01 /]# rbd snap create mypool/myimage@snap


[ceph: root@host01 /]# rbd snap protect mypool/myimage@snap
[ceph: root@host01 /]# rbd clone mypool/myimage@snap mypool/myclone
[ceph: root@host01 /]# rbd encryption format mypool/myclone luks1 clone-passphrase.bin

3. With the LUKS2-formatted image, create a LUKS1-formatted clone with the same effective size:

Syntax

rbd snap create POOL_NAME/IMAGE_NAME@SNAP_NAME


rbd snap protect POOL_NAME/IMAGE_NAME@SNAP_NAME
rbd clone POOL_NAME/IMAGE_NAME@SNAP_NAME POOL_NAME/CLONE_NAME
rbd encryption format POOL_NAME/CLONE_NAME luks1 CLONE_PASSPHRASE_FILE
rbd resize --size SIZE --allow-shrink --encryption-passphrase-file CLONE_PASSPHRASE_FILE --encryption-passphrase-file PASSPHRASE_FILE POOL_NAME/CLONE_NAME

Example

[ceph: root@host01 /]# rbd snap create mypool/myimage@snap


[ceph: root@host01 /]# rbd snap protect mypool/myimage@snap
[ceph: root@host01 /]# rbd clone mypool/myimage@snap mypool/myclone
[ceph: root@host01 /]# rbd encryption format mypool/myclone luks1 clone-passphrase.bin
[ceph: root@host01 /]# rbd resize --size 50G --allow-shrink --encryption-passphrase-file clone-passphrase.bin --encryption-passphrase-file passphrase.bin mypool/myclone

Since the LUKS1 header is usually smaller than the LUKS2 header, the rbd resize command at the end shrinks the cloned image to get
rid of the unwanted space allowance.

4. With the LUKS1-formatted image, create a LUKS2-formatted clone with the same effective size:

Syntax

rbd resize --size SIZE POOL_NAME/LUKS_IMAGE


rbd snap create POOL_NAME/IMAGE_NAME@SNAP_NAME
rbd snap protect POOL_NAME/IMAGE_NAME@SNAP_NAME
rbd clone POOL_NAME/IMAGE_NAME@SNAP_NAME POOL_NAME/CLONE_NAME
rbd encryption format POOL_NAME/CLONE_NAME luks2 CLONE_PASSPHRASE_FILE
rbd resize --size SIZE --allow-shrink --encryption-passphrase-file PASSPHRASE_FILE POOL_NAME/LUKS_IMAGE
rbd resize --size SIZE --allow-shrink --encryption-passphrase-file CLONE_PASSPHRASE_FILE --encryption-passphrase-file PASSPHRASE_FILE POOL_NAME/CLONE_NAME

Example

[ceph: root@host01 /]# rbd resize --size 51G mypool/myimage


[ceph: root@host01 /]# rbd snap create mypool/myimage@snap
[ceph: root@host01 /]# rbd snap protect mypool/myimage@snap
[ceph: root@host01 /]# rbd clone mypool/myimage@snap mypool/myclone
[ceph: root@host01 /]# rbd encryption format mypool/myclone luks2 clone-passphrase.bin
[ceph: root@host01 /]# rbd resize --size 50G --allow-shrink --encryption-passphrase-file passphrase.bin mypool/myimage
[ceph: root@host01 /]# rbd resize --size 50G --allow-shrink --encryption-passphrase-file clone-passphrase.bin --encryption-passphrase-file passphrase.bin mypool/myclone

Since the LUKS2 header is usually bigger than the LUKS1 header, the rbd resize command at the beginning temporarily grows the
parent image to reserve some extra space in the parent snapshot and consequently the cloned image. This is necessary to
make all parent data accessible in the cloned image. The rbd resize command at the end shrinks the parent image back to its
original size, without impacting the parent snapshot or the cloned image, to get rid of the unused reserved space.

The same applies to creating a formatted clone of an unformatted image, since an unformatted image does not have a header
at all.

Snapshot management
Edit online
As a storage administrator, being familiar with Ceph's snapshotting feature can help you manage the snapshots and clones of images
stored in the IBM Storage Ceph cluster.

Ceph block device snapshots


The Ceph user and keyring
Creating a block device snapshot
Listing the block device snapshots
Rolling back a block device snapshot
Deleting a block device snapshot



Purging the block device snapshots
Renaming a block device snapshot
Ceph block device layering
Protecting a block device snapshot
Cloning a block device snapshot
Unprotecting a block device snapshot
Listing the children of a snapshot
Flattening cloned images

Ceph block device snapshots


Edit online
A snapshot is a read-only copy of the state of an image at a particular point in time. One of the advanced features of Ceph block
devices is that you can create snapshots of the images to retain a history of an image’s state. Ceph also supports snapshot layering,
which allows you to clone images quickly and easily, for example a virtual machine image. Ceph supports block device snapshots
using the rbd command and many higher level interfaces, including QEMU, libvirt, OpenStack and CloudStack.

NOTE: If a snapshot is taken while I/O is occurring, then the snapshot might not get the exact or latest data of the image and the
snapshot might have to be cloned to a new image to be mountable. IBM recommends stopping I/O before taking a snapshot of an
image. If the image contains a filesystem, the filesystem must be in a consistent state before taking a snapshot. To stop I/O you can
use fsfreeze command. For virtual machines, the qemu-guest-agent can be used to automatically freeze filesystems when
creating a snapshot.
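
As a minimal sketch of the quiescing step described in the note above, you could freeze the mounted filesystem, take the snapshot, and then unfreeze it. The mount point /mnt/rbd0 and the pool and image names are assumptions for illustration only:

[root@rbd-client ~]# fsfreeze --freeze /mnt/rbd0
[root@rbd-client ~]# rbd snap create pool1/image1@snap1
[root@rbd-client ~]# fsfreeze --unfreeze /mnt/rbd0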

Figure 1. Ceph Block device snapshots

Reference
Edit online

See the fsfreeze(8) man page for more details.

The Ceph user and keyring


Edit online
When cephx is enabled, you must specify a user name or ID and a path to the keyring containing the corresponding key for the user.

NOTE: cephx is enabled by default.

You might also add the CEPH_ARGS environment variable to avoid re-entry of the following parameters:

Syntax

rbd --id USER_ID --keyring=/path/to/secret [commands]


rbd --name USERNAME --keyring=/path/to/secret [commands]

Example

[root@rbd-client ~]# rbd --id admin --keyring=/etc/ceph/ceph.keyring [commands]


[root@rbd-client ~]# rbd --name client.admin --keyring=/etc/ceph/ceph.keyring [commands]

TIP: Add the user and secret to the CEPH_ARGS environment variable so that you do not need to enter them each time.
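
For example, a minimal sketch of setting CEPH_ARGS in a shell session, assuming the admin user and the default keyring path shown above; with the variable exported, subsequent rbd commands in the same session pick up the user and keyring automatically:

[root@rbd-client ~]# export CEPH_ARGS="--id admin --keyring=/etc/ceph/ceph.keyring"
[root@rbd-client ~]# rbd ls mypool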

Creating a block device snapshot


Edit online
Create a snapshot of a Ceph block device.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Specify the snap create option, the pool name, and the image name:

Method 1:

Syntax

rbd --pool POOL_NAME snap create --snap SNAP_NAME IMAGE_NAME

Example

[root@rbd-client ~]# rbd --pool pool1 snap create --snap snap1 image1

Method 2:

Syntax

rbd snap create POOL_NAME/IMAGE_NAME@SNAP_NAME

Example

[root@rbd-client ~]# rbd snap create pool1/image1@snap1

Listing the block device snapshots


Edit online
List the block device snapshots.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Specify the pool name and the image name:

Syntax

rbd --pool POOL_NAME --image IMAGE_NAME snap ls


rbd snap ls POOL_NAME/IMAGE_NAME

Example



[root@rbd-client ~]# rbd --pool pool1 --image image1 snap ls
[root@rbd-client ~]# rbd snap ls pool1/image1

Rolling back a block device snapshot


Edit online
Rollback a block device snapshot.

NOTE: Rolling back an image to a snapshot means overwriting the current version of the image with data from a snapshot. The time it
takes to execute a rollback increases with the size of the image. It is faster to clone from a snapshot than to rollback an image to a
snapshot, and it is the preferred method of returning to a pre-existing state.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Specify the snap rollback option, the pool name, the image name and the snap name:

Syntax

rbd --pool POOL_NAME snap rollback --snap SNAP_NAME IMAGE_NAME


rbd snap rollback POOL_NAME/IMAGE_NAME@SNAP_NAME

Example

[root@rbd-client ~]# rbd --pool pool1 snap rollback --snap snap1 image1
[root@rbd-client ~]# rbd snap rollback pool1/image1@snap1

Deleting a block device snapshot


Edit online
Delete a snapshot for Ceph block devices.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To delete a block device snapshot, specify the snap rm option, the pool name, the image name and the snapshot name:

Syntax

rbd --pool POOL_NAME snap rm --snap SNAP_NAME IMAGE_NAME


rbd snap rm POOL_NAME/IMAGE_NAME@SNAP_NAME



Example

[root@rbd-client ~]# rbd --pool pool1 snap rm --snap snap2 image1


[root@rbd-client ~]# rbd snap rm pool1/image1@snap1

IMPORTANT: If an image has any clones, the cloned images retain reference to the parent image snapshot. To delete the parent
image snapshot, you must flatten the child images first.

NOTE: Ceph OSD daemons delete data asynchronously, so deleting a snapshot does not free up the disk space immediately.

Reference
Edit online

See Flattening cloned images for more details.

Purging the block device snapshots


Edit online
Purge block device snapshots.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Specify the snap purge option and the image name on a specific pool:

Syntax

rbd --pool POOL_NAME snap purge IMAGE_NAME


rbd snap purge POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd --pool pool1 snap purge image1


[root@rbd-client ~]# rbd snap purge pool1/image1

Renaming a block device snapshot


Edit online
Rename a block device snapshot.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To rename a snapshot:

Syntax

rbd snap rename POOL_NAME/IMAGE_NAME@ORIGINAL_SNAPSHOT_NAME POOL_NAME/IMAGE_NAME@NEW_SNAPSHOT_NAME

Example

[root@rbd-client ~]# rbd snap rename data/dataset@snap1 data/dataset@snap2

This renames snap1 snapshot of the dataset image on the data pool to snap2.

2. Run the rbd help snap rename command to display additional details on renaming snapshots.

Ceph block device layering


Edit online
Ceph supports the ability to create many copy-on-write (COW) or copy-on-read (COR) clones of a block device snapshot. Snapshot
layering enables Ceph block device clients to create images very quickly. For example, you might create a block device image with a
Linux VM written to it. Then, snapshot the image, protect the snapshot, and create as many clones as you like. A snapshot is read-
only, so cloning a snapshot simplifies semantics—making it possible to create clones rapidly.

Figure 1. Ceph Block device layering: a read-only parent snapshot and a writeable child clone, where the child refers to the parent.

NOTE: The terms parent and child mean a Ceph block device snapshot, parent, and the corresponding image cloned from the
snapshot, child. These terms are important for the command line usage below.

Each cloned image, the child, stores a reference to its parent image, which enables the cloned image to open the parent snapshot and read it. This reference is removed when the clone is flattened, that is, when the information from the snapshot is completely copied to the clone.

A clone of a snapshot behaves exactly like any other Ceph block device image. You can read from, write to, clone, and resize cloned images. There are no special restrictions with cloned images. However, the clone of a snapshot refers to the snapshot, so you MUST protect the snapshot before you clone it.

A clone of a snapshot can be a copy-on-write (COW) or copy-on-read (COR) clone. Copy-on-write (COW) is always enabled for clones, while copy-on-read (COR) has to be enabled explicitly. Copy-on-write (COW) copies data from the parent to the clone when it writes to an unallocated object within the clone. Copy-on-read (COR) copies data from the parent to the clone when it reads from an unallocated object within the clone. Reading data from a clone only reads data from the parent if the object does not yet exist in the clone. RADOS Block Device breaks up large images into multiple objects, with a default object size of 4 MB, and all copy-on-write (COW) and copy-on-read (COR) operations occur on a full object. That is, writing 1 byte to a clone results in a 4 MB object being read from the parent and written to the clone, if the destination object does not already exist in the clone from a previous COW/COR operation.

Whether or not copy-on-read (COR) is enabled, any reads that cannot be satisfied by reading an underlying object from the clone will
be rerouted to the parent. Since there is practically no limit to the number of parents, meaning that you can clone a clone, this
reroute continues until an object is found or you hit the base parent image. If copy-on-read (COR) is enabled, any reads that fail to be
satisfied directly from the clone result in a full object read from the parent and writing that data to the clone so that future reads of
the same extent can be satisfied from the clone itself without the need of reading from the parent.

This is essentially an on-demand, object-by-object flatten operation. It is especially useful when the clone sits across a high-latency connection from its parent, that is, when the parent is in a different pool or in another geographical location. Copy-on-read (COR) reduces the amortized latency of reads. The first few reads have high latency because extra data must be read from the parent; for example, you read 1 byte from the clone, but 4 MB has to be read from the parent and written to the clone. All future reads of the same extent, however, are served from the clone itself.



To create copy-on-read (COR) clones from a snapshot, you must explicitly enable this feature by adding rbd_clone_copy_on_read = true under the [global] or [client] section in the ceph.conf file.
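
For example, a minimal sketch of the corresponding ceph.conf entry; on cephadm-based deployments, the same option can typically also be set through the centralized configuration database, shown here as an assumption:

[client]
rbd_clone_copy_on_read = true

[ceph: root@host01 /]# ceph config set client rbd_clone_copy_on_read true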

Reference
Edit online

For more information on flattening, see Flattening cloned images.

Protecting a block device snapshot


Edit online
Clones access the parent snapshots. All clones would break if a user inadvertently deleted the parent snapshot.

You can set the set-require-min-compat-client parameter to mimic or a later Ceph version.

Example

ceph osd set-require-min-compat-client mimic

This enables clone v2 by default. However, clients older than mimic cannot access those block device images.

NOTE: Clone v2 does not require protection of snapshots.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Specify POOL_NAME, IMAGE_NAME, and SNAP_SHOT_NAME in the following command:

Syntax

rbd --pool POOL_NAME snap protect --image IMAGE_NAME --snap SNAPSHOT_NAME


rbd snap protect POOL_NAME/IMAGE_NAME@SNAPSHOT_NAME

Example

[root@rbd-client ~]# rbd --pool pool1 snap protect --image image1 --snap snap1
[root@rbd-client ~]# rbd snap protect pool1/image1@snap1

NOTE: You cannot delete a protected snapshot.

Cloning a block device snapshot


Edit online
Clone a block device snapshot to create a writeable child image of the snapshot within the same pool or in another pool. One use
case would be to maintain read-only images and snapshots as templates in one pool, and writable clones in another pool.

NOTE: Clone v2 does not require protection of snapshots.

Prerequisites



Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To clone a snapshot, you need to specify the parent pool, snapshot, child pool and image name:

Syntax

rbd clone --pool POOL_NAME --image PARENT_IMAGE --snap SNAP_NAME --dest-pool POOL_NAME --dest CHILD_IMAGE_NAME
rbd clone POOL_NAME/PARENT_IMAGE@SNAP_NAME POOL_NAME/CHILD_IMAGE_NAME

Example

[root@rbd-client ~]# rbd clone --pool pool1 --image image1 --snap snap2 --dest-pool pool2 --
dest childimage1
[root@rbd-client ~]# rbd clone pool1/image1@snap1 pool1/childimage1

Unprotecting a block device snapshot


Edit online
Before you can delete a snapshot, you must unprotect it first. Additionally, you may NOT delete snapshots that have references from
clones. You must flatten each clone of a snapshot, before you can delete the snapshot.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Run the following commands:

Syntax

rbd --pool POOL_NAME snap unprotect --image IMAGE_NAME --snap SNAPSHOT_NAME


rbd snap unprotect POOL_NAME/IMAGE_NAME@SNAPSHOT_NAME

Example

[root@rbd-client ~]# rbd --pool pool1 snap unprotect --image image1 --snap snap1

[root@rbd-client ~]# rbd snap unprotect pool1/image1@snap1

Listing the children of a snapshot


Edit online
List the children of a snapshot.



Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To list the children of a snapshot, execute the following:

Syntax

rbd --pool POOL_NAME children --image IMAGE_NAME --snap SNAP_NAME


rbd children POOL_NAME/IMAGE_NAME@SNAPSHOT_NAME

Example

[root@rbd-client ~]# rbd --pool pool1 children --image image1 --snap snap1
[root@rbd-client ~]# rbd children pool1/image1@snap1

Flattening cloned images


Edit online
Cloned images retain a reference to the parent snapshot. When you remove the reference from the child clone to the parent
snapshot, you effectively "flatten" the image by copying the information from the snapshot to the clone. The time it takes to flatten a
clone increases with the size of the snapshot. Because a flattened image contains all the information from the snapshot, a flattened
image will use more storage space than a layered clone.

NOTE: If the deep flatten feature is enabled on an image, the image clone is dissociated from its parent by default.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To delete a parent image snapshot associated with child images, you must flatten the child images first:

Syntax

rbd --pool POOL_NAME flatten --image IMAGE_NAME


rbd flatten POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd --pool pool1 flatten --image childimage1


[root@rbd-client ~]# rbd flatten pool1/childimage1
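
To confirm that a clone no longer references its parent, you can inspect the image after flattening; a minimal sketch, assuming the pool and image names used above (a flattened image no longer shows a parent line in the output):

[root@rbd-client ~]# rbd info pool1/childimage1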

Mirroring Ceph block devices



Edit online
As a storage administrator, you can add another layer of redundancy to Ceph block devices by mirroring data images between IBM
Storage Ceph clusters. Understanding and using Ceph block device mirroring can provide you protection against data loss, such as a
site failure. There are two configurations for mirroring Ceph block devices, one-way mirroring or two-way mirroring, and you can
configure mirroring on pools and individual images.

Ceph block device mirroring


Configuring one-way mirroring using the command-line interface
Configuring two-way mirroring using the command-line interface
Administration for mirroring Ceph block devices
Recover from a disaster

Ceph block device mirroring


Edit online
RADOS Block Device (RBD) mirroring is a process of asynchronous replication of Ceph block device images between two or more
Ceph storage clusters. By locating a Ceph storage cluster in different geographic locations, RBD Mirroring can help you recover from
a site disaster. Journal-based Ceph block device mirroring ensures point-in-time consistent replicas of all changes to an image,
including reads and writes, block device resizing, snapshots, clones and flattening.

RBD mirroring uses exclusive locks and the journaling feature to record all modifications to an image in the order in which they occur.
This ensures that a crash-consistent mirror of an image is available.

IMPORTANT: The CRUSH hierarchies supporting primary and secondary pools that mirror block device images must have the same
capacity and performance characteristics, and must have adequate bandwidth to ensure mirroring without excess latency. For
example, if you have X MB/s average write throughput to images in the primary storage cluster, the network must support N * X throughput in the network connection to the secondary site plus a safety factor of Y% to mirror N images.

The rbd-mirror daemon is responsible for synchronizing images from one Ceph storage cluster to another Ceph storage cluster by
pulling changes from the remote primary image and writing those changes to the local, non-primary image. The rbd-mirror
daemon can run either on a single Ceph storage cluster for one-way mirroring or on two Ceph storage clusters for two-way mirroring
that participate in the mirroring relationship.

For RBD mirroring to work, either using one-way or two-way replication, a couple of assumptions are made:

A pool with the same name exists on both storage clusters.

A pool contains journal-enabled images you want to mirror.

IMPORTANT: In one-way or two-way replication, each instance of rbd-mirror must be able to connect to the other Ceph storage
cluster simultaneously. Additionally, the network must have sufficient bandwidth between the two data center sites to handle
mirroring.

One-way Replication

One-way mirroring implies that a primary image or pool of images in one storage cluster gets replicated to a secondary storage
cluster. One-way mirroring also supports replicating to multiple secondary storage clusters.

On the secondary storage cluster, the image is the non-primary replica; that is, Ceph clients cannot write to the image. When data
is mirrored from a primary storage cluster to a secondary storage cluster, the rbd-mirror runs ONLY on the secondary storage
cluster.

For one-way mirroring to work, a couple of assumptions are made:

You have two Ceph storage clusters and you want to replicate images from a primary storage cluster to a secondary storage
cluster.

The secondary storage cluster has a Ceph client node attached to it running the rbd-mirror daemon. The rbd-mirror
daemon will connect to the primary storage cluster to sync images to the secondary storage cluster.

Figure 1. One-way mirroring



Two-way Replication

Two-way replication adds an rbd-mirror daemon on the primary cluster so images can be demoted on it and promoted on the
secondary cluster. Changes can then be made to the images on the secondary cluster and they will be replicated in the reverse
direction, from secondary to primary. Both clusters must have rbd-mirror running to allow promoting and demoting images on
either cluster. Currently, two-way replication is only supported between two sites.

For two-way mirroring to work, a couple of assumptions are made:

You have two storage clusters and you want to be able to replicate images between them in either direction.

Both storage clusters have a client node attached to them running the rbd-mirror daemon. The rbd-mirror daemon
running on the secondary storage cluster will connect to the primary storage cluster to synchronize images to secondary, and
the rbd-mirror daemon running on the primary storage cluster will connect to the secondary storage cluster to synchronize
images to primary.

Figure 2. Two-way mirroring

Mirroring Modes

Mirroring is configured on a per-pool basis with mirror peering storage clusters. Ceph supports two mirroring modes, depending on
the type of images in the pool.

Pool Mode
All images in a pool with the journaling feature enabled are mirrored.



Image Mode
Only a specific subset of images within a pool are mirrored. You must enable mirroring for each image separately.

Image States

Whether or not an image can be modified depends on its state:

Images in the primary state can be modified.

Images in the non-primary state cannot be modified.

Images are automatically promoted to primary when mirroring is first enabled on an image. The promotion can happen:

Implicitly by enabling mirroring in pool mode.

Explicitly by enabling mirroring of a specific image.

It is possible to demote primary images and promote non-primary images.

An overview of journal-based and snapshot-based mirroring

Reference
Edit online

See Enabling mirroring on a pool for more details.

See Enabling image mirroring for more details.

See Image promotion and demotion for more details.

An overview of journal-based and snapshot-based mirroring


Edit online
Journal-based mirroring

The actual image is not modified until every write to the RBD image is first recorded to the associated journal. The remote cluster
reads from this journal and replays the updates to its local copy of the image. Because each write to the RBD images results in two
writes to the Ceph cluster, write latencies nearly double with the usage of the RBD journaling image feature.

Snapshot-based mirroring

The remote cluster determines any data or metadata updates between two mirror snapshots and copies the deltas to its local copy
of the image. The RBD fast-diff image feature enables the quick determination of updated data blocks without the need to scan
the full RBD image. The complete delta between two snapshots needs to be synchronized prior to use during a failover scenario. Any
partially applied set of deltas are rolled back at moment of failover.

Configuring one-way mirroring using the command-line interface


Edit online
This procedure configures one-way replication of a pool from the primary storage cluster to a secondary storage cluster.

NOTE: When using one-way replication you can mirror to multiple secondary storage clusters.

NOTE: Examples in this section will distinguish between two storage clusters by referring to the primary storage cluster with the
primary images as site-a, and the secondary storage cluster you are replicating the images to, as site-b. The pool name used in
these examples is called data.

Prerequisites
Edit online



A minimum of two healthy and running IBM Storage Ceph clusters.

Root-level access to a Ceph client node for each storage cluster.

A CephX user with administrator-level capabilities.

Procedure
Edit online

1. Log into the cephadm shell on both the sites:

Example

[root@site-a ~]# cephadm shell


[root@site-b ~]# cephadm shell

2. On site-b, schedule the deployment of mirror daemon on the secondary cluster:

Syntax

ceph orch apply rbd-mirror --placement=NODENAME

Example

[ceph: root@site-b /]# ceph orch apply rbd-mirror --placement=host04

NOTE: The nodename is the host where you want to configure mirroring in the secondary cluster.

3. Enable journaling features on an image on site-a.

i. For new images, use the --image-feature option:

Syntax

rbd create IMAGE_NAME --size MEGABYTES --pool POOL_NAME --image-feature FEATURE FEATURE

Example

[ceph: root@site-a /]# rbd create image1 --size 1024 --pool data --image-feature
exclusive-lock,journaling

NOTE: If exclusive-lock is already enabled, use journaling as the only argument, else it returns the following error:

one or more requested features are already enabled (22) Invalid argument

ii. For existing images, use the rbd feature enable command:

Syntax

rbd feature enable POOL_NAME/IMAGE_NAME FEATURE, FEATURE

Example

[ceph: root@site-a /]# rbd feature enable data/image1 exclusive-lock, journaling

iii. To enable journaling on all new images by default, set the configuration parameter using ceph config set command:

Example

[ceph: root@site-a /]# ceph config set global rbd_default_features 125


[ceph: root@site-a /]# ceph config show mon.host01 rbd_default_features
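
In this example, the value 125 is the sum of the RBD feature bit values, assuming the standard bit assignments: layering (1) + exclusive-lock (4) + object-map (8) + fast-diff (16) + deep-flatten (32) + journaling (64) = 125.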

4. Choose the mirroring mode, either pool or image mode, on both the storage clusters.

i. Enabling pool mode:

Syntax

rbd mirror pool enable POOL_NAME MODE

Example



[ceph: root@site-a /]# rbd mirror pool enable data pool
[ceph: root@site-b /]# rbd mirror pool enable data pool

This example enables mirroring of the whole pool named data.

ii. Enabling image mode:

Syntax

rbd mirror pool enable POOL_NAME MODE

Example

[ceph: root@site-a /]# rbd mirror pool enable data image


[ceph: root@site-b /]# rbd mirror pool enable data image

This example enables image mode mirroring on the pool named data.

iii. Verify that mirroring has been successfully enabled at both the sites:

Syntax

rbd mirror pool info POOL_NAME

Example

[ceph: root@site-a /]# rbd mirror pool info data


Mode: pool
Site Name: c13d8065-b33d-4cb5-b35f-127a02768e7f

Peer Sites: none

[ceph: root@site-b /]# rbd mirror pool info data


Mode: pool
Site Name: a4c667e2-b635-47ad-b462-6faeeee78df7

Peer Sites: none

5. On a Ceph client node, bootstrap the storage cluster peers.

i. Create Ceph user accounts, and register the storage cluster peer to the pool:

Syntax

rbd mirror pool peer bootstrap create --site-name PRIMARY_LOCAL_SITE_NAME POOL_NAME > PATH_TO_BOOTSTRAP_TOKEN

Example

[ceph: root@rbd-client-site-a /]# rbd mirror pool peer bootstrap create --site-name site-a
data > /root/bootstrap_token_site-a

NOTE: This example bootstrap command creates the client.rbd-mirror.site-a and the client.rbd-mirror-peer
Ceph users.

ii. Copy the bootstrap token file to the site-b storage cluster.

iii. Import the bootstrap token on the site-b storage cluster:

Syntax

rbd mirror pool peer bootstrap import --site-name SECONDARY_LOCAL_SITE_NAME --direction rx-only POOL_NAME PATH_TO_BOOTSTRAP_TOKEN

Example

[ceph: root@rbd-client-site-b /]# rbd mirror pool peer bootstrap import --site-name site-b --
direction rx-only data /root/bootstrap_token_site-a

6. To verify the mirroring status, run the following command from a Ceph Monitor node on the primary and secondary sites:

Syntax

rbd mirror image status POOL_NAME/IMAGE_NAME



Example

[ceph: root@mon-site-a /]# rbd mirror image status data/image1


image1:
global_id: c13d8065-b33d-4cb5-b35f-127a02768e7f
state: up+stopped
description: remote image is non-primary
service: host03.yuoosv on host03
last_update: 2021-10-06 09:13:58

Here, up means the rbd-mirror daemon is running, and stopped means this image is not the target for replication from
another storage cluster. This is because the image is primary on this storage cluster.

Example

[ceph: root@mon-site-b /]# rbd mirror image status data/image1


image1:
global_id: c13d8065-b33d-4cb5-b35f-127a02768e7f

Reference
Edit online

See Ceph block device mirroring for more details.

See User Management for more details on Ceph users.

Configuring two-way mirroring using the command-line interface


Edit online
This procedure configures two-way replication of a pool between the primary storage cluster, and a secondary storage cluster.

NOTE: When using two-way replication you can only mirror between two storage clusters.

NOTE: Examples in this section will distinguish between two storage clusters by referring to the primary storage cluster with the
primary images as site-a, and the secondary storage cluster you are replicating the images to, as site-b. The pool name used in
these examples is called data.

Prerequisites
Edit online

A minimum of two healthy and running IBM Storage Ceph clusters.

Root-level access to a Ceph client node for each storage cluster.

A CephX user with administrator-level capabilities.

Procedure
Edit online

1. Log into the cephadm shell on both the sites:

Example

[root@site-a ~]# cephadm shell


[root@site-b ~]# cephadm shell

2. On the site-a primary cluster, run the following command:

Example

[ceph: root@site-a /]# ceph orch apply rbd-mirror --placement=host01



NOTE: The nodename is the host where you want to configure mirroring.

3. On site-b, schedule the deployment of mirror daemon on the secondary cluster:

Syntax

ceph orch apply rbd-mirror --placement=NODENAME

Example

[ceph: root@site-b /]# ceph orch apply rbd-mirror --placement=host04

NOTE: The nodename is the host where you want to configure mirroring in the secondary cluster.

4. Enable journaling features on an image on site-a.

i. For new images, use the --image-feature option:

Syntax

rbd create IMAGE_NAME --size MEGABYTES --pool POOL_NAME --image-feature FEATURE FEATURE

Example

[ceph: root@site-a /]# rbd create image1 --size 1024 --pool data --image-feature
exclusive-lock,journaling

NOTE: If exclusive-lock is already enabled, use journaling as the only argument, else it returns the following error:

one or more requested features are already enabled (22) Invalid argument

ii. For existing images, use the rbd feature enable command:

Syntax

rbd feature enable POOL_NAME/IMAGE_NAME FEATURE, FEATURE

Example

[ceph: root@site-a /]# rbd feature enable data/image1 exclusive-lock, journaling

iii. To enable journaling on all new images by default, set the configuration parameter using ceph config set command:

Example

[ceph: root@site-a /]# ceph config set global rbd_default_features 125


[ceph: root@site-a /]# ceph config show mon.host01 rbd_default_features

5. Choose the mirroring mode, either pool or image mode, on both the storage clusters.

i. Enabling pool mode:

Syntax

rbd mirror pool enable POOL_NAME MODE

Example

[ceph: root@site-a /]# rbd mirror pool enable data pool


[ceph: root@site-b /]# rbd mirror pool enable data pool

This example enables mirroring of the whole pool named data.

ii. Enabling image mode:

Syntax

rbd mirror pool enable POOL_NAME MODE

Example

[ceph: root@site-a /]# rbd mirror pool enable data image


[ceph: root@site-b /]# rbd mirror pool enable data image



This example enables image mode mirroring on the pool named data.

iii. Verify that mirroring has been successfully enabled at both the sites:

Syntax

rbd mirror pool info POOL_NAME

Example

[ceph: root@site-a /]# rbd mirror pool info data


Mode: pool
Site Name: c13d8065-b33d-4cb5-b35f-127a02768e7f

Peer Sites: none

[ceph: root@site-b /]# rbd mirror pool info data


Mode: pool
Site Name: a4c667e2-b635-47ad-b462-6faeeee78df7

Peer Sites: none

6. On a Ceph client node, bootstrap the storage cluster peers.

i. Create Ceph user accounts, and register the storage cluster peer to the pool:

Syntax

rbd mirror pool peer bootstrap create --site-name PRIMARY_LOCAL_SITE_NAME POOL_NAME >
PATH_TO_BOOTSTRAP_TOKEN

Example

[ceph: root@rbd-client-site-a /]# rbd mirror pool peer bootstrap create --site-name site-a
data > /root/bootstrap_token_site-a

NOTE: This example bootstrap command creates the client.rbd-mirror.site-a and the client.rbd-mirror-peer
Ceph users.

ii. Copy the bootstrap token file to the site-b storage cluster.

iii. Import the bootstrap token on the site-b storage cluster:

Syntax

rbd mirror pool peer bootstrap import --site-name SECONDARY_LOCAL_SITE_NAME --direction rx-tx
POOL_NAME PATH_TO_BOOTSTRAP_TOKEN

Example

[ceph: root@rbd-client-site-b /]# rbd mirror pool peer bootstrap import --site-name site-b --
direction rx-tx data /root/bootstrap_token_site-a

NOTE: The --direction argument is optional, as two-way mirroring is the default when bootstrapping peers.

7. To verify the mirroring status, run the following command from a Ceph Monitor node on the primary and secondary sites:

Syntax

rbd mirror image status POOL_NAME/IMAGE_NAME

Example

[ceph: root@mon-site-a /]# rbd mirror image status data/image1


image1:
global_id: c13d8065-b33d-4cb5-b35f-127a02768e7f
state: up+stopped
description: remote image is non-primary
service: host03.yuoosv on host03
last_update: 2021-10-06 09:13:58

Here, up means the rbd-mirror daemon is running, and stopped means this image is not the target for replication from
another storage cluster. This is because the image is primary on this storage cluster.



Example

[ceph: root@mon-site-b /]# rbd mirror image status data/image1


image1:
global_id: a4c667e2-b635-47ad-b462-6faeeee78df7
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"entries_behind_primary":0,"entries_per_second":0.0,"non_primary_posit
ion":{"entry_tid":3,"object_number":3,"tag_tid":1},"primary_position":
{"entry_tid":3,"object_number":3,"tag_tid":1}}
service: host05.dtisty on host05
last_update: 2021-09-16 10:57:20
peer_sites:
name: b
state: up+stopped
description: local image is primary
last_update: 2021-09-16 10:57:28

If images are in the state up+replaying, then mirroring is functioning properly. Here, up means the rbd-mirror daemon is
running, and replaying means this image is the target for replication from another storage cluster.

NOTE: Depending on the connection between the sites, mirroring can take a long time to sync the images.

Reference
Edit online

See Ceph block device mirroring for more details.

See User Management for more details on Ceph users.

Administration for mirroring Ceph block devices


Edit online
As a storage administrator, you can do various tasks to help you manage the Ceph block device mirroring environment. You can do
the following tasks:

Viewing information about storage cluster peers.

Add or remove a storage cluster peer.

Getting mirroring status for a pool or image.

Enabling mirroring on a pool or image.

Disabling mirroring on a pool or image.

Delaying block device replication.

Promoting and demoting an image.

Viewing information about peers


Enabling mirroring on a pool
Disabling mirroring on a pool
Enabling image mirroring
Disabling image mirroring
Image promotion and demotion
Image resynchronization
Adding a storage cluster peer
Removing a storage cluster peer
Getting mirroring status for a pool
Getting mirroring status for a single image
Delaying block device replication
Asynchronous updates and Ceph block device mirroring
Converting journal-based mirroring to snapshot-based mirroring



Creating an image mirror-snapshot
Scheduling mirror-snapshots

Viewing information about peers


Edit online
View information about storage cluster peers.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To view information about the peers:

Syntax

rbd mirror pool info POOL_NAME

Example

[root@rbd-client ~]# rbd mirror pool info data


Mode: pool
Site Name: a

Peer Sites:

UUID: 950ddadf-f995-47b7-9416-b9bb233f66e3
Name: b
Mirror UUID: 4696cd9d-1466-4f98-a97a-3748b6b722b3
Direction: rx-tx
Client: client.rbd-mirror-peer

Enabling mirroring on a pool


Edit online
Enable mirroring on a pool by running the following commands on both peer clusters.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To enable mirroring on a pool:

Syntax



rbd mirror pool enable POOL_NAME MODE

Example

[root@rbd-client ~]# rbd mirror pool enable data pool

This example enables mirroring of the whole pool named data.

Example

[root@rbd-client ~]# rbd mirror pool enable data image

This example enables image mode mirroring on the pool named data.

Reference
Edit online

See Mirroring Ceph block devices for more details.

Disabling mirroring on a pool


Edit online
Before disabling mirroring, remove the peer clusters.

NOTE: When you disable mirroring on a pool, you also disable it on any images within the pool for which mirroring was enabled
separately in image mode.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. To disable mirroring on a pool:

Syntax

rbd mirror pool disable POOL_NAME

Example

[root@rbd-client ~]# rbd mirror pool disable data

This example disables mirroring of a pool named data.

Enabling image mirroring


Edit online
Before you can enable mirroring for a specific image, mirroring must be enabled in image mode on the whole pool on both peer storage clusters.

Prerequisites
Edit online



A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Enable mirroring for a specific image within the pool:

Syntax

rbd mirror image enable POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image enable data/image2

This example enables mirroring for the image2 image in the data pool.

Reference
Edit online

See Enabling mirroring on a pool for more details.

Disabling image mirroring


Edit online
You can disable Ceph Block Device mirroring on images.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with snapshot-based mirroring configured.

Root-level access to the node.

Procedure
Edit online

1. To disable mirroring for a specific image:

Syntax

rbd mirror image disable POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image disable data/image2

This example disables mirroring of the image2 image in the data pool.

Reference
Edit online

See Configuring Ansible inventory location in the IBM Storage Ceph Installation Guide for more details on adding clients to the
cephadm-ansible inventory.



Image promotion and demotion
Edit online
You can promote or demote an image in a pool.

NOTE: Do not force promote non-primary images that are still syncing, because the images will not be valid after the promotion.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with snapshot-based mirroring configured.

Root-level access to the node.

Procedure
Edit online

1. To demote an image to non-primary:

Syntax

rbd mirror image demote POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image demote data/image2

This example demotes the image2 image in the data pool.

2. To promote an image to primary:

Syntax

rbd mirror image promote POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image promote data/image2

This example promotes image2 in the data pool.

Syntax

rbd mirror image promote --force POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image promote --force data/image2

Use forced promotion when the demotion cannot be propagated to the peer Ceph storage cluster. For example, because of
cluster failure or communication outage.

Reference
Edit online

See Failover after a non-orderly shutdown for details.

Image resynchronization
Edit online



You can re-synchronize an image. In case of an inconsistent state between the two peer clusters, the rbd-mirror daemon does not
attempt to mirror the image that is causing the inconsistency.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with snapshot-based mirroring configured.

Root-level access to the node.

Procedure
Edit online

1. To request a re-synchronization to the primary image:

Syntax

rbd mirror image resync POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image resync data/image2

This example requests resynchronization of image2 in the data pool.
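
After requesting resynchronization, you can watch the image status until it returns to a healthy replaying state; a minimal sketch, assuming the same pool and image:

[root@rbd-client ~]# rbd mirror image status data/image2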

Reference
Edit online

To recover from an inconsistent state because of a disaster, see either Recover from a disaster with one-way mirroring or
Recover from a disaster with two-way mirroring for details.

Adding a storage cluster peer


Edit online
Add a storage cluster peer for the rbd-mirror daemon to discover its peer storage cluster. For example, to add the site-a storage cluster as a peer to the site-b storage cluster, follow this procedure from the client node in the site-b storage cluster.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with snapshot-based mirroring configured.

Root-level access to the node.

Procedure
Edit online

Register the peer to the pool:

Syntax

rbd --cluster CLUSTER_NAME mirror pool peer add POOL_NAME PEER_CLIENT_NAME@PEER_CLUSTER_NAME -n CLIENT_NAME

Example



[root@rbd-client ~]# rbd --cluster site-b mirror pool peer add data client.site-a@site-a -n
client.site-b

Removing a storage cluster peer


Edit online
Remove a storage cluster peer by specifying the peer UUID.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with snapshot-based mirroring configured.

Root-level access to the node.

Procedure
Edit online

Specify the pool name and the peer Universally Unique Identifier (UUID).

Syntax

rbd mirror pool peer remove POOL_NAME PEER_UUID

Example

[root@rbd-client ~]# rbd mirror pool peer remove data 7e90b4ce-e36d-4f07-8cbc-42050896825d

To view the peer UUID, use the rbd mirror pool info command.

Getting mirroring status for a pool


Edit online
You can get the mirror status for a pool on the storage clusters.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with snapshot-based mirroring configured.

Root-level access to the node.

Procedure
Edit online

1. To get the mirroring pool summary:

Syntax

rbd mirror pool status POOL_NAME

Example

[root@site-a ~]# rbd mirror pool status data


health: OK
daemon health: OK



image health: OK
images: 1 total
1 replaying

TIP: To output status details for every mirroring image in a pool, use the --verbose option.
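
For example, a minimal sketch using the pool name from the example above; the verbose output additionally lists a status entry for each mirrored image in the pool:

[root@site-a ~]# rbd mirror pool status data --verbose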

Getting mirroring status for a single image


Edit online
You can get the mirror status for an image by running the mirror image status command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster with snapshot-based mirroring configured.

Root-level access to the node.

Procedure
Edit online

To get the status of a mirrored image:

Syntax

rbd mirror image status POOL_NAME/IMAGE_NAME

Example

[root@site-a ~]# rbd mirror image status data/image2


image2:
global_id: 1e3422a2-433e-4316-9e43-1827f8dbe0ef
state: up+unknown
description: remote image is non-primary
service: pluto008.yuoosv on pluto008
last_update: 2021-10-06 09:37:58

This example gets the status of the image2 image in the data pool.

Delaying block device replication


Edit online
Whether you are using one- or two-way replication, you can delay replication between RADOS Block Device (RBD) mirroring images.
You might want to implement delayed replication if you want a window of cushion time in case an unwanted change to the primary
image needs to be reverted before being replicated to the secondary image.

To implement delayed replication, the rbd-mirror daemon within the destination storage cluster should set the
rbd_mirroring_replay_delay = MINIMUM_DELAY_IN_SECONDS configuration option. This setting can either be applied
globally within the ceph.conf file utilized by the rbd-mirror daemons, or on an individual image basis.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.



Procedure
Edit online

1. To utilize delayed replication for a specific image, on the primary image, run the following rbd CLI command:

Syntax

rbd image-meta set POOL_NAME/IMAGE_NAME conf_rbd_mirroring_replay_delay MINIMUM_DELAY_IN_SECONDS

Example

[root@rbd-client ~]# rbd image-meta set vms/vm-1 conf_rbd_mirroring_replay_delay 600

This example sets a 10 minute minimum replication delay on image vm-1 in the vms pool.
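
A minimal sketch of the global alternative mentioned above, applied in the ceph.conf file used by the rbd-mirror daemons on the destination storage cluster; the 600-second value is an assumption for illustration:

[client]
rbd_mirroring_replay_delay = 600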

Asynchronous updates and Ceph block device mirroring


Edit online
When updating a storage cluster that uses Ceph block device mirroring with an asynchronous update, follow the update instructions in Installing. Once the update is done, restart the Ceph block device mirroring instances.

NOTE: There is no required order for restarting the instances. Restart the instance pointing to the pool with primary images followed
by the instance pointing to the mirrored pool.
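
A minimal sketch of restarting the mirroring instances on a cephadm-managed cluster, assuming the daemons were deployed as the rbd-mirror service with ceph orch apply; run the equivalent command on each storage cluster that runs the rbd-mirror daemon:

[ceph: root@host01 /]# ceph orch restart rbd-mirror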

Converting journal-based mirroring to snapshot-based mirroring


Edit online
You can convert journal-based mirroring to snapshot-based mirroring by disabling mirroring and enabling snapshot.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@rbd-client ~]# cephadm shell

2. Disable mirroring for a specific image within the pool:

Syntax

rbd mirror image disable POOL_NAME/IMAGE_NAME

Example

[ceph: root@rbd-client /]# rbd mirror image disable mirror_pool/mirror_image


Mirroring disabled

3. Enable snapshot-based mirroring for the image:



Syntax

rbd mirror image enable POOL_NAME/IMAGE_NAME snapshot

Example

[ceph: root@rbd-client /]# rbd mirror image enable mirror_pool/mirror_image snapshot


Mirroring enabled

This example enables snapshot-based mirroring for the mirror_image image in the mirror_pool pool.
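
To confirm the new mirroring mode, you can inspect the image; a minimal sketch, assuming a recent rbd client that reports mirroring details in the image information (look for a mirroring mode of snapshot in the output):

[ceph: root@rbd-client /]# rbd info mirror_pool/mirror_image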

Creating an image mirror-snapshot


Edit online
Create an image mirror-snapshot when it is required to mirror the changed contents of an RBD image when using snapshot-based
mirroring.

Prerequisites
Edit online

A minimum of two healthy running IBM Storage Ceph clusters.

Root-level access to the Ceph client nodes for the IBM Storage Ceph clusters.

A CephX user with administrator-level capabilities.

Access to the IBM Storage Ceph cluster where a snapshot mirror will be created.

IMPORTANT: By default, a maximum of 5 image mirror-snapshots are retained. If the limit is reached, the oldest image mirror-snapshot is automatically removed. If required, the limit can be overridden through the rbd_mirroring_max_mirroring_snapshots configuration option. Image mirror-snapshots are automatically deleted when the image is removed or when mirroring is disabled.

To create an image-mirror snapshot:

Syntax

rbd --cluster CLUSTER_NAME mirror image snapshot POOL_NAME/IMAGE_NAME

Example

[root@site-a ~]# rbd mirror image snapshot data/image1
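
To check that the mirror-snapshot was created, you can list the image snapshots across all namespaces; a minimal sketch, assuming the --all option is available in your rbd client (mirror-snapshots appear in the mirror namespace):

[root@site-a ~]# rbd snap ls --all data/image1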

Reference
Edit online

See Mirroring Ceph block devices in the IBM Storage Block Device Guide for details.

Scheduling mirror-snapshots
Edit online
Mirror-snapshots can be created automatically when mirror-snapshot schedules are defined. A mirror-snapshot can be scheduled at the global, per-pool, or per-image level. Multiple mirror-snapshot schedules can be defined at any level, but only the most specific snapshot schedules that match an individual mirrored image will run.

Creating a mirror-snapshot schedule


Listing all snapshot schedules at a specific level
Removing a mirror-snapshot schedule
Viewing the status for the next snapshots to be created



Creating a mirror-snapshot schedule
Edit online
You can create a mirror-snapshot schedule using the snapshot schedule command.

Prerequisites
Edit online

A minimum of two healthy running IBM Storage Ceph clusters.

Root-level access to the Ceph client nodes for the IBM Storage Ceph clusters.

A CephX user with administrator-level capabilities.

Access to the IBM Storage Ceph cluster where a snapshot mirror will be created.

Procedure
Edit online

1. To create a mirror-snapshot schedule:

Syntax

rbd --cluster CLUSTER_NAME mirror snapshot schedule add --pool POOL_NAME --image IMAGE_NAME
INTERVAL [START_TIME]

The CLUSTER_NAME should be used only when the cluster name is different from the default name ceph. The interval can be
specified in days, hours, or minutes using d, h, or m suffix respectively. The optional START_TIME can be specified using the
ISO 8601 time format.

Example

[root@site-a ~]# rbd mirror snapshot schedule add --pool data --image image1 6h

Example

[root@site-a ~]# rbd mirror snapshot schedule add --pool data --image image1 24h 14:00:00-
05:00
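
A global schedule can be defined by omitting the --pool and --image options; a minimal sketch, assuming a daily global schedule is wanted:

[root@site-a ~]# rbd mirror snapshot schedule add 24h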

Reference
Edit online

See Mirroring Ceph block devices for details.

Listing all snapshot schedules at a specific level


Edit online
You can list all snapshot schedules at a specific level.

Prerequisites
Edit online

A minimum of two healthy running IBM Storage Ceph clusters.

Root-level access to the Ceph client nodes for the IBM Storage Ceph clusters.

A CephX user with administrator-level capabilities.



Access to the IBM Storage Ceph cluster where a snapshot mirror will be created.

Procedure
Edit online

1. To list all snapshot schedules for a specific global, pool or image level, with an optional pool or image name:

Syntax

rbd --cluster CLUSTER_NAME mirror snapshot schedule ls --pool POOL_NAME --recursive

Additionally, the --recursive option can be specified to list all schedules at the specified level as shown below:

Example

[root@rbd-client ~]# rbd mirror snapshot schedule ls --pool data --recursive


POOL NAMESPACE IMAGE SCHEDULE
data - - every 1d starting at 14:00:00-05:00
data - image1 every 6h

Reference
Edit online

See Mirroring Ceph block devices for more details.

Removing a mirror-snapshot schedule


Edit online
You can remove a mirror-snapshot schedule using the snapshot schedule remove command.

Prerequisites
Edit online

A minimum of two healthy running IBM Storage Ceph clusters.

Root-level access to the Ceph client nodes for the IBM Storage Ceph clusters.

A CephX user with administrator-level capabilities.

Access to the IBM Storage Ceph cluster where a snapshot mirror will be created.

Procedure
Edit online

1. To remove a mirror-snapshot schedule:

Syntax

rbd --cluster CLUSTER_NAME mirror snapshot schedule remove --pool POOL_NAME --image IMAGE_NAME
INTERVAL START_TIME

The interval can be specified in days, hours, or minutes using d, h, m suffix respectively. The optional START_TIME can be
specified using the ISO 8601 time format.

Example

[root@site-a ~]# rbd mirror snapshot schedule remove --pool data --image image1 6h

Example



[root@site-a ~]# rbd mirror snapshot schedule remove --pool data --image image1 24h 14:00:00-
05:00

Reference
Edit online

See Mirroring Ceph block devices for details.

Viewing the status for the next snapshots to be created


Edit online
You can view the status for the next snapshots to be created for snapshot-based mirroring RBD images.

Prerequisites
Edit online

A minimum of two healthy running IBM Storage Ceph clusters.

Root-level access to the Ceph client nodes for the IBM Storage Ceph clusters.

A CephX user with administrator-level capabilities.

Access to the IBM Storage Ceph cluster where a snapshot mirror will be created.

Procedure
Edit online

1. To view the status for the next snapshots to be created:

Syntax

rbd --cluster CLUSTER_NAME mirror snapshot schedule status [--pool POOL_NAME] [--image IMAGE_NAME]

Example

[root@rbd-client ~]# rbd mirror snapshot schedule status


SCHEDULE TIME IMAGE
2021-09-21 18:00:00 data/image1

Reference
Edit online

See Mirroring Ceph block devices for details.

Recover from a disaster


Edit online
As a storage administrator, you can be prepared for eventual hardware failure by knowing how to recover the data from another
storage cluster where mirroring was configured.

In the examples, the primary storage cluster is known as the site-a, and the secondary storage cluster is known as the site-b.
Additionally, the storage clusters both have a data pool with two images, image1 and image2.

Disaster recovery
Recover from a disaster with one-way mirroring



Recover from a disaster with two-way mirroring
Failover after an orderly shutdown
Failover after a non-orderly shutdown
Prepare for fail back
Remove two-way mirroring

Disaster recovery
Edit online
These failures have a widespread impact, also referred to as a large blast radius, and can be caused by impacts to the power grid and natural disasters.

Customer data needs to be protected during these scenarios. Volumes must be replicated with consistency and efficiency and also
within Recovery Point Objective (RPO) and Recovery Time Objective (RTO) targets. This solution is called a Wide Area Network-
Disaster Recovery (WAN-DR).

In such scenarios it is hard to restore the primary system and the data center. The solutions that are used to recover from these
failure scenarios are guided by the application:

Recovery Point Objective (RPO): The amount of data loss an application can tolerate in the worst case.

Recovery Time Objective (RTO): The time taken to get the application back online with the latest copy of the data available.

Reference
Edit online

See Mirroring Ceph block devices for more details.

See Encryption in transit to learn more about data transmission over the wire in an encrypted state.

Recover from a disaster with one-way mirroring


Edit online
To recover from a disaster when using one-way mirroring use the following procedures. They show how to fail over to the secondary
cluster after the primary cluster terminates, and how to fail back. The shutdown can be orderly or non-orderly.

IMPORTANT: One-way mirroring supports multiple secondary sites. If you are using additional secondary clusters, choose one of the
secondary clusters to fail over to. Synchronize from the same cluster during fail back.

Recover from a disaster with two-way mirroring


Edit online
To recover from a disaster when using two-way mirroring use the following procedures. They show how to fail over to the mirrored
data on the secondary cluster after the primary cluster terminates, and how to fail back. The shutdown can be orderly or non-orderly.

Failover after an orderly shutdown


Edit online
Failover to the secondary storage cluster after an orderly shutdown.

Prerequisites



Edit online

Minimum of two running IBM Storage Ceph clusters.

Root-level access to the node.

Pool mirroring or image mirroring configured with one-way mirroring.

Procedure
Edit online

1. Stop all clients that use the primary image. This step depends on which clients use the image. For example, detach volumes
from any OpenStack instances that use the image.

2. Demote the primary images located on the site-a cluster by running the following commands on a monitor node in the
site-a cluster:

Syntax

rbd mirror image demote POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image demote data/image1


[root@rbd-client ~]# rbd mirror image demote data/image2

3. Promote the non-primary images located on the site-b cluster by running the following commands on a monitor node in the
site-b cluster:

Syntax

rbd mirror image promote POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image promote data/image1


[root@rbd-client ~]# rbd mirror image promote data/image2

4. After some time, check the status of the images from a monitor node in the site-b cluster. They should show a state of
up+stopped and be listed as primary:

[root@rbd-client ~]# rbd mirror image status data/image1


image1:
global_id: 08027096-d267-47f8-b52e-59de1353a034
state: up+stopped
description: local image is primary
last_update: 2019-04-17 16:04:37
[root@rbd-client ~]# rbd mirror image status data/image2
image2:
global_id: 596f41bc-874b-4cd4-aefe-4929578cc834
state: up+stopped
description: local image is primary
last_update: 2019-04-17 16:04:37

5. Resume the access to the images. This step depends on which clients use the image.

Reference
Edit online

See the Block Storage and Volumes chapter in the Red Hat OpenStack Platform Storage Guide.

Failover after a non-orderly shutdown


Edit online
Failover to secondary storage cluster after a non-orderly shutdown.



Prerequisites
Edit online

Minimum of two running IBM Storage Ceph clusters.

Root-level access to the node.

Pool mirroring or image mirroring configured with one-way mirroring.

Procedure
Edit online

1. Verify that the primary storage cluster is down.

2. Stop all clients that use the primary image. This step depends on which clients use the image. For example, detach volumes
from any OpenStack instances that use the image.

3. Promote the non-primary images from a Ceph Monitor node in the site-b storage cluster. Use the --force option, because
the demotion cannot be propagated to the site-a storage cluster:

Syntax

rbd mirror image promote --force POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image promote --force data/image1


[root@rbd-client ~]# rbd mirror image promote --force data/image2

4. Check the status of the images from a Ceph Monitor node in the site-b storage cluster. They should show a state of
up+stopping_replay. The description should say force promoted, meaning it is in an intermediate state. Wait until the
state comes to up+stopped to validate the successful promotion of the site.

Example

[root@rbd-client ~]# rbd mirror image status data/image1


image1:
global_id: 08027096-d267-47f8-b52e-59de1353a034
state: up+stopping_replay
description: force promoted
last_update: 2023-04-17 13:25:06

[root@rbd-client ~]# rbd mirror image status data/image1


image1:
global_id: 08027096-d267-47f8-b52e-59de1353a034
state: up+stopped
description: force promoted
last_update: 2023-04-17 13:25:06

Reference
Edit online

See Block Storage and Volumes in the Red Hat OpenStack Platform Storage Guide.

Prepare for fail back


Edit online
If the two storage clusters were originally configured only for one-way mirroring, then before you can fail back, you must configure
the primary storage cluster for mirroring so that the images can be replicated in the opposite direction.

During the failback scenario, the existing peer that is inaccessible must be removed before a new peer is added to the existing cluster.



Fail back to the primary storage cluster

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the client node.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@rbd-client ~]# cephadm shell

2. On the site-a storage cluster, run the following command:

Example

[ceph: root@rbd-client /]# ceph orch apply rbd-mirror --placement=host01

3. Remove any inaccessible peers:

IMPORTANT: This step must be run on the peer site which is up and running.

NOTE: Multiple peers are supported only for one-way mirroring.

a. Get the pool UUID:

Syntax

rbd mirror pool info POOL_NAME

Example

[ceph: root@host01 /]# rbd mirror pool info pool_failback

b. Remove the inaccessible peer:

Syntax

rbd mirror pool peer remove POOL_NAME PEER_UUID

Example

[ceph: root@host01 /]# rbd mirror pool peer remove pool_failback f055bb88-6253-4041-923d-
08c4ecbe799a

4. Create a block device pool with the same name as its peer mirror pool.

a. To create an rbd pool, execute the following:

Syntax

ceph osd pool create POOL_NAME PG_NUM


ceph osd pool application enable POOL_NAME rbd
rbd pool init -p POOL_NAME

Example

[root@rbd-client ~]# ceph osd pool create pool1


[root@rbd-client ~]# ceph osd pool application enable pool1 rbd
[root@rbd-client ~]# rbd pool init -p pool1

5. On a Ceph client node, bootstrap the storage cluster peers.



a. Create Ceph user accounts, and register the storage cluster peer to the pool:

Syntax

rbd mirror pool peer bootstrap create --site-name LOCAL_SITE_NAME POOL_NAME >
PATH_TO_BOOTSTRAP_TOKEN

Example

[ceph: root@rbd-client-site-a /]# rbd mirror pool peer bootstrap create --site-name site-
a data > /root/bootstrap_token_site-a

NOTE: This example bootstrap command creates the client.rbd-mirror.site-a and the client.rbd-mirror-
peer Ceph users.

b. Copy the bootstrap token file to the site-b storage cluster.

c. Import the bootstrap token on the site-b storage cluster:

Syntax

rbd mirror pool peer bootstrap import --site-name LOCAL_SITE_NAME --direction rx-only
POOL_NAME PATH_TO_BOOTSTRAP_TOKEN

Example

[ceph: root@rbd-client-site-b /]# rbd mirror pool peer bootstrap import --site-name site-
b --direction rx-only data /root/bootstrap_token_site-a

NOTE: For one-way RBD mirroring, you must use the --direction rx-only argument, as two-way mirroring is the
default when bootstrapping peers.

6. From a monitor node in the site-a storage cluster, verify the site-b storage cluster was successfully added as a peer:

Example

[ceph: root@rbd-client /]# rbd mirror pool info -p data


Mode: image
Peers:
UUID NAME CLIENT
d2ae0594-a43b-4c67-a167-a36c646e8643 site-b client.site-b

Reference
Edit online

See User Management for more details.

Fail back to the primary storage cluster


Edit online
When the formerly primary storage cluster recovers, fail back to the primary storage cluster.

NOTE: If you have scheduled snapshots at the image level, then you need to re-add the schedule as image resync operations
changes the RBD Image ID and the previous schedule becomes obsolete.

Prerequisites
Edit online

Minimum of two running IBM Storage Ceph clusters.

Root-level access to the node.

Pool mirroring or image mirroring configured with one-way mirroring.



Procedure
Edit online

1. Check the status of the images from a monitor node in the site-b cluster again. They should show a state of up+stopped
and the description should say local image is primary:

Example

[root@rbd-client ~]# rbd mirror image status data/image1


image1:
global_id: 08027096-d267-47f8-b52e-59de1353a034
state: up+stopped
description: local image is primary
last_update: 2019-04-22 17:37:48
[root@rbd-client ~]# rbd mirror image status data/image2
image2:
global_id: 596f41bc-874b-4cd4-aefe-4929578cc834
state: up+stopped
description: local image is primary
last_update: 2019-04-22 17:38:18

2. From a Ceph Monitor node on the site-a storage cluster determine if the images are still primary:

Syntax

rbd info POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd info data/image1


[root@rbd-client ~]# rbd info data/image2

In the output from the commands, look for mirroring primary: true or mirroring primary: false, to determine
the state.

3. Demote any images that are listed as primary by running a command like the following from a Ceph Monitor node in the site-
a storage cluster:

Syntax

rbd mirror image demote POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image demote data/image1

4. Resynchronize the images ONLY if there was a non-orderly shutdown. Run the following commands on a monitor node in the
site-a storage cluster to resynchronize the images from site-b to site-a:

Syntax

rbd mirror image resync POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image resync data/image1


Flagged image for resync from primary
[root@rbd-client ~]# rbd mirror image resync data/image2
Flagged image for resync from primary

5. After some time, ensure resynchronization of the images is complete by verifying they are in the up+replaying state. Check
their state by running the following commands on a monitor node in the site-a storage cluster:

Syntax

rbd mirror image status POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image status data/image1


[root@rbd-client ~]# rbd mirror image status data/image2



6. Demote the images on the site-b storage cluster by running the following commands on a Ceph Monitor node in the site-b
storage cluster:

Syntax

rbd mirror image demote POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image demote data/image1


[root@rbd-client ~]# rbd mirror image demote data/image2

NOTE: If there are multiple secondary storage clusters, this only needs to be done from the secondary storage cluster where it
was promoted.

7. Promote the formerly primary images located on the site-a storage cluster by running the following commands on a Ceph
Monitor node in the site-a storage cluster:

Syntax

rbd mirror image promote POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image promote data/image1


[root@rbd-client ~]# rbd mirror image promote data/image2

8. Check the status of the images from a Ceph Monitor node in the site-a storage cluster. They should show a status of
up+stopped and the description should say local image is primary:

Syntax

rbd mirror image status POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd mirror image status data/image1


image1:
global_id: 08027096-d267-47f8-b52e-59de1353a034
state: up+stopped
description: local image is primary
last_update: 2019-04-22 11:14:51
[root@rbd-client ~]# rbd mirror image status data/image2
image2:
global_id: 596f41bc-874b-4cd4-aefe-4929578cc834
state: up+stopped
description: local image is primary
last_update: 2019-04-22 11:14:51

Remove two-way mirroring


Edit online
After fail back is complete, you can remove two-way mirroring and disable the Ceph block device mirroring service.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Remove the site-b storage cluster as a peer from the site-a storage cluster:



Example

[root@rbd-client ~]# rbd mirror pool peer remove data client.remote@remote --cluster local
[root@rbd-client ~]# rbd --cluster site-a mirror pool peer remove data client.site-b@site-b -n
client.site-a

2. Stop and disable the rbd-mirror daemon on the site-a client:

Syntax

systemctl stop ceph-rbd-mirror@CLIENT_ID


systemctl disable ceph-rbd-mirror@CLIENT_ID
systemctl disable ceph-rbd-mirror.target

Example

[root@rbd-client ~]# systemctl stop ceph-rbd-mirror@site-a


[root@rbd-client ~]# systemctl disable ceph-rbd-mirror@site-a
[root@rbd-client ~]# systemctl disable ceph-rbd-mirror.target

Management of ceph-immutable-object-cache daemons

Edit online
As a storage administrator, use the ceph-immutable-object-cache daemons to cache the parent image content on the local
disk. This cache is in the local caching directory. Future reads on that data use the local cache.

Figure 1. Ceph immutable cache daemon

Explanation of ceph-immutable-object-cache daemons


Configuring the ceph-immutable-object-cache daemon
Generic settings of ceph-immutable-object-cache daemons
QOS settings of ceph-immutable-object-cache daemons

Explanation of ceph-immutable-object-cache daemons



Edit online
Cloned Block device images usually modify only a small fraction of the parent image. For example, in a virtual desktop interface
(VDI), the virtual machines are cloned from the same base image and initially differ only by the hostname and the IP address. During
the bootup, if you use a local cache of the parent image, this speeds up reads on the caching host. This change reduces the client to
cluster network traffic.

Reasons to use ceph-immutable-object-cache daemons

Ceph is a scalable, open-source, and distributed storage system. The ceph-immutable-object-cache daemon connects to local clusters
with the RADOS protocol, relying on default search paths to find the ceph.conf file, monitor addresses, and authentication
information, such as /etc/ceph/CLUSTER.conf, /etc/ceph/CLUSTER.keyring, and /etc/ceph/CLUSTER.NAME.keyring, where CLUSTER
is the human-friendly name of the cluster and NAME is the RADOS user to connect as, for example, client.ceph-immutable-
object-cache.

Key components of the daemon

The ceph-immutable-object-cache daemon has the following parts:

Domain socket based inter-process communication (IPC): The daemon listens on a local domain socket on start-up and waits
for connections from librbd clients.

Least recently used (LRU) based promotion or demotion policy: The daemon maintains in-memory statistics of cache hits on
each cache file. It demotes cold cache files if capacity reaches the configured threshold.

File-based caching store: The daemon maintains a simple file based cache store. On promotion the RADOS objects are
fetched from RADOS cluster and stored in the local caching directory.

When you open each cloned RBD image, librbd tries to connect to the cache daemon through its Unix domain socket. Once
successfully connected, librbd coordinates with the daemon on the subsequent reads.

If there is a read that is not cached, the daemon promotes the RADOS object to the local caching directory, so the next read on that
object is serviced from cache. The daemon also maintains simple LRU statistics so that under capacity pressure it evicts cold cache
files as needed.

NOTE: For better performance, use SSDs as the underlying storage.

Configuring the ceph-immutable-object-cache daemon

Edit online
The ceph-immutable-object-cache daemon provides a local cache of RADOS objects for Ceph clusters.

IMPORTANT: To use the ceph-immutable-object-cache daemon, you must be able to connect RADOS clusters.

The daemon promotes the objects to a local directory. These cache objects service the future reads. You can configure the daemon
by installing the ceph-immutable-object-cache package.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

At least one SSD for the cache.

Procedure
Edit online

1. Enable the RBD shared read only parent image cache. Add the following parameters under [client] in the
/etc/ceph/ceph.conf file:

Example



[root@ceph-host01 ~]# vi /etc/ceph/ceph.conf

[client]
rbd parent cache enabled = true
rbd plugins = parent_cache

Restart the cluster.

2. Install the ceph-immutable-object-cache package:

Example

[root@ceph-host1 ~]# dnf install ceph-immutable-object-cache

3. Create a unique Ceph user ID, the keyring:

Syntax

ceph auth get-or-create client.ceph-immutable-object-cache.USER_NAME mon 'profile rbd' osd 'profile rbd-read-only'

Example

[root@ceph-host1 ~]# ceph auth get-or-create client.ceph-immutable-object-cache.user mon 'profile rbd' osd 'profile rbd-read-only'

[client.ceph-immutable-object-cache.user]
key = AQCVPH1gFgHRAhAAp8ExRIsoxQK4QSYSRoVJLw==

Copy this keyring.

4. In the /etc/ceph directory, create a file and paste the keyring:

Example

[root@ceph-host1 ]# vi /etc/ceph/ceph.client.ceph-immutable-object-cache.user.keyring

[client.ceph-immutable-object-cache.user]
key = AQCVPH1gFgHRAhAAp8ExRIsoxQK4QSYSRoVJLw

5. Enable the daemon:

Syntax

systemctl enable ceph-immutable-object-cache@ceph-immutable-object-cache.USER_NAME

Specify the USER_NAME as the daemon instance.

Example

[root@ceph-host1 ~]# systemctl enable ceph-immutable-object-cache@ceph-immutable-object-cache.user

Created symlink /etc/systemd/system/ceph-immutable-object-cache.target.wants/ceph-immutable-object-cache@ceph-immutable-object-cache.user.service → /usr/lib/systemd/system/ceph-immutable-object-cache@.service.

6. Start the ceph-immutable-object-cache daemon:

Syntax

systemctl start ceph-immutable-object-cache@ceph-immutable-object-cache.USER_NAME

Example

[root@ceph-host1 ~]# systemctl start ceph-immutable-object-cache@ceph-immutable-object-cache.user

Verification

Check the status of the configuration:

Syntax

systemctl status ceph-immutable-object-cache@ceph-immutable-object-cache.USER_NAME



Example

[root@ceph-host1 ~]# systemctl status ceph-immutable-object-cache@ceph-immutable-object-cache.user

ceph-immutable-object-cache@ceph-immutable-object-cache.user.service>
   Loaded: loaded (/usr/lib/systemd/system/ceph-immutable-objec>
   Active: active (running) since Mon 2021-04-19 13:49:06 IST; >
 Main PID: 85020 (ceph-immutable-)
    Tasks: 15 (limit: 49451)
   Memory: 8.3M
   CGroup: /system.slice/system-ceph\x2dimmutable\x2dobject\x2d>
           └─85020 /usr/bin/ceph-immutable-object-cache -f --cl>

Generic settings of ceph-immutable-object-cache daemons

Edit online
A few important generic settings of ceph-immutable-object-cache daemons are listed.

immutable_object_cache_sock

Description
The path to the domain socket used for communication between librbd clients and the ceph-immutable-object-cache daemon.

Type
String

Default /var/run/ceph/immutable_object_cache_sock

immutable_object_cache_path

Description
The immutable object cache data directory.

Type
String

Default
/tmp/ceph_immutable_object_cache

immutable_object_cache_max_size

Description The maximum size for immutable cache.

Type
Size

Default
1G

immutable_object_cache_watermark

Description The high-water mark for the cache. The value is between zero and one. If the cache size reaches this threshold the
daemon starts to delete cold cache based on LRU statistics.

Type
Float

Default 0.9
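
The following sketch shows one way these options might be overridden in the [client] section of /etc/ceph/ceph.conf on the node that runs the daemon; the cache directory and size are illustrative assumptions, not recommendations:

[client]
immutable_object_cache_path = /mnt/ssd/ceph_immutable_object_cache
immutable_object_cache_max_size = 16G
immutable_object_cache_watermark = 0.9

Restarting the ceph-immutable-object-cache daemon is one way to make sure the new values take effect.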

QOS settings of ceph-immutable-object-cache daemons

Edit online
The ceph-immutable-object-cache daemon supports throttling, which uses the settings described in this section.



immutable_object_cache_qos_schedule_tick_min

Description
Minimum schedule tick for immutable object cache.

Type
Milliseconds

Default
50

immutable_object_cache_qos_iops_limit

Description
User-defined immutable object cache IO operations limit per second.

Type
Integer

Default
0

immutable_object_cache_qos_iops_burst

Description User-defined burst limit of immutable object cache IO operations.

Type
Integer

Default
0

immutable_object_cache_qos_iops_burst_seconds

Description
User-defined burst duration in seconds of immutable object cache IO operations.

Type
Seconds

Default 1

immutable_object_cache_qos_bps_limit

Description User-defined immutable object cache IO bytes limit per second.

Type
Integer

Default
0

immutable_object_cache_qos_bps_burst

Description
User-defined burst limit of immutable object cache IO bytes.

Type
Integer

Default
0

immutable_object_cache_qos_bps_burst_seconds

Description
The desired burst duration in seconds of immutable object cache IO bytes.

Type
Seconds



Default
1
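
As an illustration only, the throttles above could be combined in the same [client] section; the limits shown here are arbitrary assumptions (1000 IOPS and roughly 100 MB per second):

[client]
immutable_object_cache_qos_iops_limit = 1000
immutable_object_cache_qos_iops_burst = 2000
immutable_object_cache_qos_iops_burst_seconds = 1
immutable_object_cache_qos_bps_limit = 104857600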

The rbd kernel module

Edit online
As a storage administrator, you can access Ceph block devices through the rbd kernel module. You can map and unmap a block
device, and display those mappings. You can also get a list of images through the rbd kernel module.

IMPORTANT: Kernel clients on Linux distributions other than Red Hat Enterprise Linux (RHEL) are permitted but not supported. If
issues are found in the storage cluster when using these kernel clients, IBM addresses them, but if the root cause is found to be on
the kernel client side, the issue will have to be addressed by the software vendor.

Creating a Ceph Block Device and using it from a Linux kernel module client
Mapping a block device
Displaying mapped block devices
Unmapping a block device

Creating a Ceph Block Device and using it from a Linux kernel


module client
Edit online
Create a Ceph Block Device for a Linux kernel module client on the IBM Storage Ceph Dashboard. As a system administrator, you can
map that block device on a Linux client, and partition, format, and mount it, using the command line. After this, you can read and
write files to it.

Kernel module client supports features like Deep flatten, Layering, Exclusive lock, Object map, and Fast diff.

Creating a Ceph block device for a Linux kernel module client using dashboard
Map and mount a Ceph Block Device on Linux using the command line

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A Red Hat Enterprise Linux client.

Creating a Ceph block device for a Linux kernel module client using
dashboard
Edit online
You can create a Ceph block device specifically for a Linux kernel module client using the dashboard web interface by enabling only
the features it supports.

Kernel module client supports features like Deep flatten, Layering, Exclusive lock, Object map, and Fast diff.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.



A replicated RBD pool created and enabled.

Procedure
Edit online

1. From the Block drop-down menu, select Images.

2. Click Create.

3. In the Create RBD window, enter an image name, select the RBD-enabled pool, and select the supported features:

Figure 1. Create RBD window

4. Click Create RBD.

Verification

You get a notification that the image is created successfully.

Map and mount a Ceph Block Device on Linux using the command
line
Edit online
After mapping it, you can partition, format, and mount it, so you can write files to it.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A Ceph block device for a Linux kernel module client using the dashboard is created.

A Red Hat Enterprise Linux client.



Procedure
Edit online

1. On the Red Hat Enterprise Linux client node, enable the IBM Storage Ceph 5 Tools repository:

Red Hat Enterprise Linux 8

[root@rbd-client ~]# subscription-manager repos --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms

2. Install the ceph-common RPM package:

Red Hat Enterprise Linux 8

[root@rbd-client ~]# dnf install ceph-common

3. Copy the Ceph configuration file from a Monitor node to the Client node:

Syntax

scp root@MONITOR_NODE:/etc/ceph/ceph.conf /etc/ceph/ceph.conf

Example

[root@rbd-client ~]# scp root@cluster1-node2:/etc/ceph/ceph.conf /etc/ceph/ceph.conf


root@cluster1-node2's password:
ceph.conf 100% 497 724.9KB/s 00:00
[root@client1 ~]#

4. Copy the key file from a Monitor node to the Client node:

Syntax

scp root@MONITOR_NODE:/etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring

Example

[root@rbd-client ~]# scp root@cluster1-node2:/etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
root@cluster1-node2's password:
ceph.client.admin.keyring 100% 151 265.0KB/s 00:00
[root@client1 ~]#

5. Map the image:

Syntax

rbd map --pool POOL_NAME IMAGE_NAME --id admin

Example

[root@rbd-client ~]# rbd map --pool block-device-pool image1 --id admin


/dev/rbd0
[root@client1 ~]#

6. Create a partition table on the block device:

Syntax

parted /dev/MAPPED_BLOCK_DEVICE mklabel msdos

Example

[root@rbd-client ~]# parted /dev/rbd0 mklabel msdos


Information: You may need to update /etc/fstab.

7. Create a partition for an XFS file system:

Syntax

parted /dev/MAPPED_BLOCK_DEVICE mkpart primary xfs 0% 100%



Example

[root@rbd-client ~]# parted /dev/rbd0 mkpart primary xfs 0% 100%


Information: You may need to update /etc/fstab.

8. Format the partition:

Syntax

mkfs.xfs /dev/MAPPED_BLOCK_DEVICE_WITH_PARTITION_NUMBER

Example

[root@rbd-client ~]# mkfs.xfs /dev/rbd0p1


meta-data=/dev/rbd0p1 isize=512 agcount=16, agsize=163824 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=2621184, imaxpct=25
= sunit=16 swidth=16 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0

9. Create a directory to mount the new file system on:

Syntax

mkdir PATH_TO_DIRECTORY

Example

[root@rbd-client ~]# mkdir /mnt/ceph

10. Mount the file system:

Syntax

mount /dev/MAPPED_BLOCK_DEVICE_WITH_PARTITION_NUMBER PATH_TO_DIRECTORY

Example

[root@rbd-client ~]# mount /dev/rbd0p1 /mnt/ceph/

11. Verify that the file system is mounted and showing the correct size:

Syntax

df -h PATH_TO_DIRECTORY

Example

[root@rbd-client ~]# df -h /mnt/ceph/


Filesystem Size Used Avail Use% Mounted on
/dev/rbd0p1 10G 105M 9.9G 2% /mnt/ceph

Reference
Edit online

For more information, see Managing file systems.

For more information, see Storage Administration Guide.

Mapping a block device


Edit online



Use rbd to map an image name to a kernel module. You must specify the image name, the pool name and the user name. rbd will
load the RBD kernel module if it is not already loaded.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Return a list of the images:

Example

[root@rbd-client ~]# rbd list

2. The following are two options for mapping the image:

Map an image name to a kernel module:

Syntax

rbd device map POOL_NAME/IMAGE_NAME --id USER_NAME

Example

[root@rbd-client ~]# rbd device map rbd/myimage --id admin

Specify a secret when using cephx authentication by either the keyring or a file containing the secret:

Syntax

rbd device map POOL_NAME/IMAGE_NAME --id USER_NAME --keyring PATH_TO_KEYRING

or

rbd device map POOL_NAME/IMAGE_NAME --id USER_NAME --keyfile PATH_TO_FILE

Displaying mapped block devices


Edit online
You can display which block device images are mapped to the kernel module with the rbd command.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Display the mapped block devices:



[root@rbd-client ~]# rbd device list

Unmapping a block device


Edit online
You can unmap a block device image with the rbd command, by using the unmap option and providing the device name.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

An image that is mapped.

Procedure
Edit online

1. Get the specification of the device.

Example

[root@rbd-client ~]# rbd device list

2. Unmap the block device image.

Syntax

rbd device unmap /dev/rbd/POOL_NAME/IMAGE_NAME

Example

[root@rbd-client ~]# rbd device unmap /dev/rbd/pool1/image1

Using the Ceph block device Python module


Edit online
The rbd python module provides file-like access to Ceph block device images. In order to use this built-in tool, import the rbd and
rados Python modules.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure
Edit online

1. Connect to RADOS and open an IO context:

cluster = rados.Rados(conffile='my_ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('mypool')



2. Instantiate an rbd.RBD object, which you use to create the image:

rbd_inst = rbd.RBD()
size = 4 * 1024**3 # 4 GiB
rbd_inst.create(ioctx, 'myimage', size)

3. To perform I/O on the image, instantiate an rbd.Image object:

image = rbd.Image(ioctx, 'myimage')


data = 'foo' * 200
image.write(data, 0)

This writes foo to the first 600 bytes of the image. Note that the data cannot be unicode; librbd does not know how to
deal with characters wider than a char.

4. Close the image, the IO context and the connection to RADOS:

image.close()
ioctx.close()
cluster.shutdown()

To be safe, each of these calls must be in a separate finally block:

import rados
import rbd

cluster = rados.Rados(conffile='my_ceph_conf')
try:
ioctx = cluster.open_ioctx('my_pool')
try:
rbd_inst = rbd.RBD()
size = 4 * 1024**3 # 4 GiB
rbd_inst.create(ioctx, 'myimage', size)
image = rbd.Image(ioctx, 'myimage')
try:
data = 'foo' * 200
image.write(data, 0)
finally:
image.close()
finally:
ioctx.close()
finally:
cluster.shutdown()

This can be cumbersome, so the Rados, Ioctx, and Image classes can be used as context managers that close or shut down
automatically. Using them as context managers, the above example becomes:

with rados.Rados(conffile='my_ceph.conf') as cluster:


with cluster.open_ioctx('mypool') as ioctx:
rbd_inst = rbd.RBD()
size = 4 * 1024**3 # 4 GiB
rbd_inst.create(ioctx, 'myimage', size)
with rbd.Image(ioctx, 'myimage') as image:
data = 'foo' * 200
image.write(data, 0)
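
A short follow-up sketch, assuming the same my_ceph.conf, mypool, and myimage names as above, reads the data back with the Image.read() call:

import rados
import rbd

with rados.Rados(conffile='my_ceph.conf') as cluster:
    with cluster.open_ioctx('mypool') as ioctx:
        with rbd.Image(ioctx, 'myimage') as image:
            # Read back the 600 bytes written in the previous example
            data = image.read(0, 600)
            print(data)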

Ceph block device configuration reference


Edit online
As a storage administrator, you can fine tune the behavior of Ceph block devices through the various options that are available. You
can use this reference for viewing such things as the default Ceph block device options, and Ceph block device caching options.

Block device default options


Block device general options
Block device caching options
Block device parent and child read options
Block device read ahead options
Block device blocklist options
Block device journal options



Block device configuration override options
Block device input and output options

Block device default options


Edit online
It is possible to override the default settings for creating an image. Ceph will create images with format 2 and no striping.

rbd_default_format

Description The default format (2) if no other format is specified. Format 1 is the original format for a new image, which is
compatible with all versions of librbd and the kernel module, but does not support newer features like cloning. Format 2 is
supported by librbd and the kernel module since version 3.11 (except for striping). Format 2 adds support for cloning and is more
easily extensible to allow more features in the future.

Type
Integer

Default 2

rbd_default_order

Description The default order if no other order is specified.

Type
Integer

Default 22

rbd_default_stripe_count

Description The default stripe count if no other stripe count is specified. Changing the default value requires striping v2 feature.

Type
64-bit Unsigned Integer

Default 0

rbd_default_stripe_unit

Description The default stripe unit if no other stripe unit is specified. Changing the unit from 0 (that is, the object size) requires the
striping v2 feature.

Type
64-bit Unsigned Integer

Default
0

rbd_default_features

Description
The default features enabled when creating a block device image. This setting only applies to format 2 images. The settings are:

1: Layering support. Layering enables you to use cloning.

2: Striping v2 support. Striping spreads data across multiple objects. Striping helps with parallelism for sequential read/write
workloads.

4: Exclusive locking support. When enabled, it requires a client to get a lock on an object before making a write.

8: Object map support. Block devices are thin-provisioned—meaning, they only store data that actually exists. Object map support
helps track which objects actually exist (have data stored on a drive). Enabling object map support speeds up I/O operations for
cloning, or importing and exporting a sparsely populated image.



16: Fast-diff support. Fast-diff support depends on object map support and exclusive lock support. It adds another property to the
object map, which makes it much faster to generate diffs between snapshots of an image, and the actual data usage of a snapshot
much faster.

32: Deep-flatten support. Deep-flatten makes rbd flatten work on all the snapshots of an image, in addition to the image itself.
Without it, snapshots of an image will still rely on the parent, so the parent will not be delete-able until the snapshots are deleted.
Deep-flatten makes a parent independent of its clones, even if they have snapshots.

64: Journaling support. Journaling records all modifications to an image in the order they occur. This ensures that a crash-consistent mirror of the remote image is available locally.

The enabled features are the sum of the numeric settings; for example, layering (1), exclusive locking (4), object map (8), fast-diff (16), and deep-flatten (32) add up to 61.

Type
Integer

Default
61 - layering, exclusive-lock, object-map, fast-diff, and deep-flatten are enabled

IMPORTANT: The current default setting is not compatible with the RBD kernel driver nor older RBD clients.
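
If an image must be mapped by a kernel client, one workaround (shown here as a sketch; the pool and image names are hypothetical) is to enable only the features that the client supports when creating the image:

[root@rbd-client ~]# rbd create --size 1024 --image-feature layering pool1/image1

Alternatively, rbd_default_features can be lowered, for example to 1 for layering only, in the [client] section of the Ceph configuration file to change the default for all new images.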

rbd_default_map_options

Description

Most of the options are useful mainly for debugging and benchmarking. See man rbd under Map Options for details.

Type
String

Default ""

Block device general options


Edit online
rbd_op_threads

Description The number of block device operation threads.

Type Integer

Default 1

WARNING: Do not change the default value of rbd_op_threads because setting it to a number higher than 1 might cause data
corruption.

rbd_op_thread_timeout

Description
The timeout (in seconds) for block device operation threads.

Type Integer

Default
60

rbd_non_blocking_aio

Description If true, Ceph will process block device asynchronous I/O operations from a worker thread to prevent blocking.

Type
Boolean

Default true

rbd_concurrent_management_ops

Description
The maximum number of concurrent management operations in flight (for example, deleting or resizing an image).

Type Integer



Default
10

rbd_request_timed_out_seconds

Description
The number of seconds before a maintenance request times out.

Type
Integer

Default
30

rbd_clone_copy_on_read

Description When set to true, copy-on-read cloning is enabled.

Type
Boolean

Default
false

rbd_enable_alloc_hint

Description If true, allocation hinting is enabled, and the block device will issue a hint to the OSD back end to indicate the expected object size.

Type
Boolean

Default true

rbd_skip_partial_discard

Description
If true, the block device will skip zeroing a range when trying to discard a range inside an object.

Type
Boolean

Default
false

rbd_tracing

Description
Set this option to true to enable the Linux Trace Toolkit Next Generation User Space Tracer (LTTng-UST) tracepoints. See Tracing
RADOS Block Device (RBD) Workloads with the RBD Replay Feature for details.

Type
Boolean

Default
false

rbd_validate_pool

Description
Set this option to true to validate empty pools for RBD compatibility.

Type
Boolean

Default
true

rbd_validate_names



Description
Set this option to true to validate image specifications.

Type
Boolean

Default true

Block device caching options


Edit online
The user space implementation of the Ceph block device, that is, librbd, cannot take advantage of the Linux page cache, so it
includes its own in-memory caching, called RBD caching. Ceph block device caching behaves just like well-behaved hard disk
caching. When the operating system sends a barrier or a flush request, all dirty data is written to the Ceph OSDs. This means that
using write-back caching is just as safe as using a well-behaved physical hard disk with a virtual machine that properly sends
flushes, that is, Linux kernel version 2.6.32 or higher. The cache uses a Least Recently Used (LRU) algorithm, and in write-back mode
it can coalesce contiguous requests for better throughput.

Ceph block devices support write-back caching. To enable write-back caching, set rbd_cache = true to the [client] section of
the Ceph configuration file. By default, librbd does not perform any caching. Writes and reads go directly to the storage cluster, and
writes return only when the data is on disk on all replicas. With caching enabled, writes return immediately, unless there are more
than rbd_cache_max_dirty unflushed bytes. In this case, the write triggers write-back and blocks until enough bytes are flushed.

Ceph block devices support write-through caching. You can set the size of the cache, and you can set targets and limits to switch
from write-back caching to write-through caching. To enable write-through mode, set rbd_cache_max_dirty to 0. This means
writes return only when the data is on disk on all replicas, but reads may come from the cache. The cache is in memory on the client,
and each Ceph block device image has its own. Since the cache is local to the client, there is no coherency if there are others
accessing the image. Running other file systems, such as GFS or OCFS, on top of Ceph block devices will not work with caching
enabled.

The Ceph configuration settings for Ceph block devices must be set in the [client] section of the Ceph configuration file, by
default, /etc/ceph/ceph.conf.
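
For example, a minimal write-back caching configuration might look like the following sketch; the sizes are illustrative assumptions chosen to satisfy the constraints listed below (rbd_cache_max_dirty smaller than rbd_cache_size, and rbd_cache_target_dirty smaller than rbd_cache_max_dirty):

[client]
rbd_cache = true
# 64 MiB cache, 48 MiB dirty limit, 32 MiB dirty target
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432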

The settings include:

rbd_cache

Description
Enable caching for RADOS Block Device (RBD).

Type Boolean

Required
No

Default
true

rbd_cache_size

Description The RBD cache size in bytes.

Type
64-bit Integer

Required
No

Default
32 MiB

rbd_cache_max_dirty

Description
The dirty limit in bytes at which the cache triggers write-back. If 0, uses write-through caching.



Type
64-bit Integer

Required No

Constraint Must be less than rbd cache size.

Default
24 MiB

rbd_cache_target_dirty

Description
The dirty target before the cache begins writing data to the data storage. Does not block writes to the cache.

Type
64-bit Integer

Required
No

Constraint
Must be less than rbd cache max dirty.

Default
16 MiB

rbd_cache_max_dirty_age

Description
The number of seconds dirty data is in the cache before writeback starts.

Type Float

Required
No

Default
1.0

rbd_cache_max_dirty_object

Description The dirty limit for objects - set to 0 for auto calculate from rbd_cache_size.

Type
Integer

Default
0

rbd_cache_block_writes_upfront

Description
If true, it will block writes to the cache before the aio_write call completes. If false, it will block before the aio_completion
is called.

Type Boolean

Default
false

rbd_cache_writethrough_until_flush

Description
Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but
safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32.

Type
Boolean



Required
No

Default
true

Block device parent and child read options


Edit online
rbd_balance_snap_reads

Description
Ceph typically reads objects from the primary OSD. Since reads are immutable, you may enable this feature to balance snap reads
between the primary OSD and the replicas.

Type
Boolean

Default false

rbd_localize_snap_reads

Description
Whereas rbd_balance_snap_reads randomizes which replica serves a snapshot read, enabling rbd_localize_snap_reads makes the
block device consult the CRUSH map to find the closest or local OSD for reading the snapshot.

Type
Boolean

Default
false

rbd_balance_parent_reads

Description
Ceph typically reads objects from the primary OSD. Since reads are immutable, you may enable this feature to balance parent reads
between the primary OSD and the replicas.

Type
Boolean

Default false

rbd_localize_parent_reads

Description
Whereas rbd_balance_parent_reads randomizes which replica serves a parent read, enabling rbd_localize_parent_reads makes the
block device consult the CRUSH map to find the closest or local OSD for reading the parent.

Type
Boolean

Default
true

Block device read ahead options


Edit online
RBD supports read-ahead/prefetching to optimize small, sequential reads. This should normally be handled by the guest OS in the
case of a VM, but boot loaders may not issue efficient reads. Read-ahead is automatically disabled if caching is disabled.



rbd_readahead_trigger_requests

Description
Number of sequential read requests necessary to trigger read-ahead.

Type Integer

Required
No

Default
10

rbd_readahead_max_bytes

Description
Maximum size of a read-ahead request. If zero, read-ahead is disabled.

Type
64-bit Integer

Required
No

Default 512 KiB

rbd_readahead_disable_after_bytes

Description

After this many bytes have been read from an RBD image, read-ahead is disabled for that image until it is closed. This allows the
guest OS to take over read-ahead once it is booted. If zero, read-ahead stays enabled.

Type
64-bit Integer

Required No

Default 50 MiB
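
A sketch of how these read-ahead options might appear in the [client] section of the Ceph configuration file; the values simply restate the defaults listed above and are shown only to illustrate the syntax:

[client]
rbd_readahead_trigger_requests = 10
rbd_readahead_max_bytes = 524288
rbd_readahead_disable_after_bytes = 52428800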

Block device blocklist options


Edit online
rbd_blocklist_on_break_lock

Description Whether to blocklist clients whose lock was broken.

Type
Boolean

Default true

rbd_blocklist_expire_seconds

Description
The number of seconds to blocklist - set to 0 for OSD default.

Type
Integer

Default
0

Block device journal options


Edit online
rbd_journal_order

Description
The number of bits to shift to compute the journal object maximum size. The value is between 12 and 64.

Type
32-bit Unsigned Integer

Default 24

rbd_journal_splay_width

Description
The number of active journal objects.

Type
32-bit Unsigned Integer

Default
4

rbd_journal_commit_age

Description The commit time interval in seconds.

Type
Double Precision Floating Point Number

Default
5

rbd_journal_object_flush_interval

Description
The maximum number of pending commits per a journal object.

Type Integer

Default
0

rbd_journal_object_flush_bytes

Description
The maximum number of pending bytes per a journal object.

Type
Integer

Default
0

rbd_journal_object_flush_age

Description
The maximum time interval in seconds for pending commits.

Type
Double Precision Floating Point Number

Default
0

rbd_journal_pool

Description
Specifies a pool for journal objects.

Type String



Default
""

Block device configuration override options


Edit online
Block device configuration override options for global and pool levels.

Global level

Available keys

rbd_qos_bps_burst

Description
The desired burst limit of IO bytes.

Type
Integer

Default 0

rbd_qos_bps_limit

Description
The desired limit of IO bytes per second.

Type Integer

Default
0

rbd_qos_iops_burst

Description
The desired burst limit of IO operations.

Type
Integer

Default
0

rbd_qos_iops_limit

Description
The desired limit of IO operations per second.

Type Integer

Default
0

rbd_qos_read_bps_burst

Description
The desired burst limit of read bytes.

Type
Integer

Default
0

rbd_qos_read_bps_limit

Description
The desired limit of read bytes per second.



Type
Integer

Default
0

rbd_qos_read_iops_burst

Description
The desired burst limit of read operations.

Type
Integer

Default
0

rbd_qos_read_iops_limit

Description
The desired limit of read operations per second.

Type Integer

Default
0

rbd_qos_write_bps_burst

Description
The desired burst limit of write bytes.

Type
Integer

Default
0

rbd_qos_write_bps_limit

Description
The desired limit of write bytes per second.

Type
Integer

Default
0

rbd_qos_write_iops_burst

Description
The desired burst limit of write operations.

Type
Integer

Default
0

rbd_qos_write_iops_limit

Description
The desired limit of write operations per second.

Type
Integer

Default 0

The above keys can be used for the following:



rbd config global set CONFIG_ENTITY KEY VALUE

Description
Set a global level configuration override.

rbd config global get CONFIG_ENTITY KEY

Description
Get a global level configuration override.

rbd config global list CONFIG_ENTITY

Description
List the global level configuration overrides.

rbd config global remove CONFIG_ENTITY KEY

Description
Remove a global level configuration override.

Pool level

rbd config pool set POOL_NAME KEY VALUE

Description
Set a pool level configuration override.

rbd config pool get POOL_NAME KEY

Description
Get a pool level configuration override.

rbd config pool list POOL_NAME

Description
List the pool level configuration overrides.

rbd config pool remove POOL_NAME KEY

Description
Remove a pool level configuration override.

NOTE: CONFIG_ENTITY is global, client, or client id. KEY is the config key. VALUE is the config value. POOL_NAME is the name of the pool.
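
As an illustrative example, the QoS keys listed above can be applied globally or per pool with these commands; the pool name and values here are hypothetical:

Example

[ceph: root@host01 /]# rbd config global set global rbd_qos_iops_limit 500
[ceph: root@host01 /]# rbd config pool set pool1 rbd_qos_bps_limit 10485760
[ceph: root@host01 /]# rbd config pool list pool1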

Block device input and output options


Edit online
rbd_compression_hint

Description
Hint to send to the OSDs on write operations. If set to compressible and the OSD bluestore_compression_mode setting is
passive, the OSD attempts to compress data. If set to incompressible and the OSD bluestore_compression_mode setting is
aggressive, the OSD will not attempt to compress data.

Type
Enum

Required
No

Default
none

Values
none, compressible, incompressible

rbd_read_from_replica_policy



Description
Policy for determining which OSD receives read operations. If set to default, each PG’s primary OSD will always be used for read
operations. If set to balance, read operations will be sent to a randomly selected OSD within the replica set. If set to localize,
read operations will be sent to the closest OSD as determined by the CRUSH map and the crush_location configuration option,
where the crush_location is denoted using key=value. The key aligns with the CRUSH map keys.

NOTE: This feature requires the storage cluster to be configured with a minimum compatible OSD release of the latest version of IBM
Storage Ceph.

Type
Enum

Required
No

Default
default

Values
default, balance, localize
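
For example, reads can be directed to the nearest OSD by overriding the policy at the global level, as sketched below; this assumes that CRUSH locations are configured for the clients and that the OSD release requirement in the note above is met:

Example

[ceph: root@host01 /]# rbd config global set global rbd_read_from_replica_policy localize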

Developer
Edit online
Use the various application programming interfaces (APIs) for IBM Storage Ceph running on AMD64 and Intel 64 architectures.

Ceph RESTful API


Ceph Object Gateway administrative API
Ceph Object Gateway and the S3 API
Ceph Object Gateway and the Swift API
The Ceph RESTful API specifications
S3 common request headers
S3 common response status codes
S3 unsupported header fields
Swift request headers
Swift response headers
Examples using the Secure Token Service APIs

Ceph RESTful API


Edit online
As a storage administrator, you can use the Ceph RESTful API, or simply the Ceph API, provided by the IBM Storage Ceph Dashboard
to interact with the IBM Storage Ceph cluster. You can display information about the Ceph Monitors and OSDs, along with their
respective configuration options. You can even create or edit Ceph pools.

The Ceph API uses the following standards:

HTTP 1.1

JSON

MIME and HTTP Content Negotiation

JWT

These standards are OpenAPI 3.0 compliant, regulating the API syntax, semantics, content encoding, versioning, authentication, and
authorization.

Prerequisites
Versioning for the Ceph API
Authentication and authorization for the Ceph API



Enabling and Securing the Ceph API module
Questions and Answers

Prerequisites
Edit online

A healthy running IBM Storage Ceph cluster.

Access to the node running the Ceph Manager.

Versioning for the Ceph API


Edit online
A main goal of the Ceph RESTful API is to provide a stable interface. To achieve a stable interface, the Ceph API is built on the
following principles:

A mandatory explicit default version for all endpoints to avoid implicit defaults.

Fine-grain change control per-endpoint.

The expected version from a specific endpoint is stated in the HTTP header.

Syntax

Accept: application/vnd.ceph.api.vMAJOR.MINOR+json

Example

Accept: application/vnd.ceph.api.v1.0+json

If the current Ceph API server is not able to address that specific version, a 415 - Unsupported Media Type
response will be returned.

Using semantic versioning.

Major changes are backwards incompatible. Changes might result in non-additive changes to the request, and to the
response formats for a specific endpoint.

Minor changes are backwards and forwards compatible. Changes consist of additive changes to the request or
response formats for a specific endpoint.

Authentication and authorization for the Ceph API


Edit online
Access to the Ceph RESTful API goes through two checkpoints. The first is authenticating that the request is made on behalf of a
valid, existing user. The second is authorizing that the authenticated user is allowed to perform a specific action, such as creating,
reading, updating, or deleting, on the target endpoint.

Before users start using the Ceph API, they need a valid JSON Web Token (JWT). The /api/auth endpoint allows you to retrieve
this token.

Example

[root@mon ~]# curl -X POST "https://example.com:8443/api/auth" \
     -H "Accept: application/vnd.ceph.api.v1.0+json" \
     -H "Content-Type: application/json" \
     -d '{"username": "user1", "password": "password1"}'

This token must be used together with every API request by placing it within the Authorization HTTP header.



Syntax

curl -H "Authorization: Bearer TOKEN" ...
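
For example, a token obtained from the /api/auth endpoint can be passed to one of the endpoints described later in this reference; the host name below is a placeholder:

Example

[root@mon ~]# curl -X GET "https://example.com:8443/api/monitor" \
     -H "Accept: application/vnd.ceph.api.v1.0+json" \
     -H "Authorization: Bearer TOKEN"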

Reference
Edit online

See the Ceph user management for more details.

Enabling and Securing the Ceph API module


Edit online
The IBM Storage Ceph Dashboard module offers the RESTful API access to the storage cluster over an SSL-secured connection.

IMPORTANT: If disabling SSL, then user names and passwords are sent unencrypted to the IBM Storage Ceph Dashboard.

Prerequisites
Edit online

Root-level access to a Ceph Monitor node.

Ensure that you have at least one ceph-mgr daemon active.

If you use a firewall, ensure that TCP port 8443, for SSL, and TCP port 8080, without SSL, are open on the node with the active
ceph-mgr daemon.

Procedure
Edit online

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Enable the RESTful plug-in:

[ceph: root@host01 /]# ceph mgr module enable dashboard

3. Configure an SSL certificate.

a. If your organization’s certificate authority (CA) provides a certificate, then set using the certificate files:

Syntax

ceph dashboard set-ssl-certificate HOST_NAME -i CERT_FILE


ceph dashboard set-ssl-certificate-key HOST_NAME -i KEY_FILE

Example

[ceph: root@host01 /]# ceph dashboard set-ssl-certificate -i dashboard.crt


[ceph: root@host01 /]# ceph dashboard set-ssl-certificate-key -i dashboard.key

If you want to set unique node-based certificates, then add a HOST_NAME to the commands:

Example

[ceph: root@host01 /]# ceph dashboard set-ssl-certificate host01 -i dashboard.crt


[ceph: root@host01 /]# ceph dashboard set-ssl-certificate-key host01 -i dashboard.key

b. Alternatively, you can generate a self-signed certificate. However, using a self-signed certificate does not provide full
security benefits of the HTTPS protocol:



[ceph: root@host01 /]# ceph dashboard create-self-signed-cert

WARNING: Most modern web browsers will complain about self-signed certificates, which require you to confirm
before establishing a secure connection.

4. Create a user, set the password, and set the role:

Syntax

echo -n "PASSWORD" > PATH_TO_FILE/PASSWORDFILE


ceph dashboard ac-user-create USERNAME -i PASSWORDFILE ROLE

Example

[ceph: root@host01 /]# echo -n "p@ssw0rd" > /root/dash-password.txt


[ceph: root@host01 /]# ceph dashboard ac-user-create user1 -i /root/dash-password.txt
administrator

This example creates a user named user1 with the administrator role.

5. Connect to the RESTful plug-in web page. Open a web browser and enter the following URL:

Syntax

https://HOST_NAME:8443

Example

https://host01:8443

If you used a self-signed certificate, confirm a security exception.

Reference
Edit online

The ceph dashboard --help command.

The https://HOST_NAME:8443/doc page, where HOST_NAME is the IP address or name of the node with the running
ceph-mgr instance.

Red Hat Enterprise Linux 8 Security Hardening Guide

Questions and Answers


Edit online

Getting information
Changing Configuration
Administering the Cluster

Getting information
Edit online
This section describes how to use the Ceph API to view information about the storage cluster, Ceph Monitors, OSDs, pools, and
hosts.

How Can I View All Cluster Configuration Options?


How Can I View a Particular Cluster Configuration Option?
How Can I View All Configuration Options for OSDs?
How Can I View CRUSH Rules?
How Can I View Information about Monitors?
How Can I View Information About a Particular Monitor?



How Can I View Information about OSDs?
How Can I View Information about a Particular OSD?
How Can I Determine What Processes Can Be Scheduled on an OSD?
How Can I View Information About Pools?
How Can I View Information About a Particular Pool?
How Can I View Information About Hosts?
How Can I View Information About a Particular Host?

How Can I View All Cluster Configuration Options?


Edit online
This section describes how to use the RESTful plug-in to view cluster configuration options and their values.

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:CEPH_MANAGER_PORT/api/cluster_conf'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

CEPH_MANAGER_PORT with the TCP port number. The default TCP port number is 8443.

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/cluster_conf'

Python
Edit online
In the Python interpreter, enter:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/cluster_conf', auth=("USER", "PASSWORD"))
>>> print(result.json())

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/cluster_conf', auth=("USER", "PASSWORD"), verify=False)
>>> print(result.json())

Web Browser
Edit online
In the web browser, enter:



https://CEPH_MANAGER:8080/api/cluster_conf

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user name and password when prompted.

Reference
Edit online

Configuration

How Can I View a Particular Cluster Configuration Option?


Edit online
This section describes how to view a particular cluster option and its value.

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/cluster_conf/ARGUMENT'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ARGUMENT with the configuration option you want to view

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/cluster_conf/ARGUMENT'

Python
Edit online
In the Python interpreter, enter:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/cluster_conf/ARGUMENT', auth=("USER", "PASSWORD"))
>>> print(result.json())

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ARGUMENT with the configuration option you want to view

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/cluster_conf/ARGUMENT', auth=("USER", "PASSWORD"), verify=False)
>>> print(result.json())

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/cluster_conf/ARGUMENT

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ARGUMENT with the configuration option you want to view

Enter the user name and password when prompted.

Reference
Edit online

Configuration

How Can I View All Configuration Options for OSDs?


Edit online
This section describes how to view all configuration options and their values for OSDs.

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/osd/flags'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/osd/flags'

Python
Edit online
In the Python interpreter, enter:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/flags', auth=("USER", "PASSWORD"))
>>> print(result.json())

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

USER with the user name



PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/flags', auth=("USER", "PASSWORD"), verify=False)
>>> print(result.json())

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/osd/flags

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user name and password when prompted.

Reference
Edit online

Configuration

How Can I View CRUSH Rules?


Edit online
This section describes how to view CRUSH rules.

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/crush_rule'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/crush_rule'

Python
Edit online
In the Python interpreter, enter:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/crush_rule', auth=("USER", "PASSWORD"))
>>> print(result.json())

Replace:



CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python3
>>> import requests
>>> result = requests.get('https://CEPH_MANAGER:8080/api/crush_rule', auth=("USER", "PASSWORD"), verify=False)
>>> print(result.json())

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/crush_rule

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user name and password when prompted.

Reference
Edit online

CRUSH Rules

How Can I View Information about Monitors?


Edit online
This section describes how to view information about Monitors, such as:

IP address

Name

Quorum status

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/monitor'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/monitor'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/monitor', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/monitor', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/monitor

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user name and password when prompted.

How Can I View Information About a Particular Monitor?


Edit online
This section describes how to view information about a particular Monitor, such as:

IP address

Name

Quorum status

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/monitor/NAME'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

NAME with the short host name of the Monitor

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:



curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/monitor/NAME'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/monitor/NAME', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

NAME with the short host name of the Monitor

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/monitor/NAME', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/monitor/NAME

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

NAME with the short host name of the Monitor

Enter the user name and password when prompted.

How Can I View Information about OSDs?


Edit online
This section describes how to view information about OSDs, such as:

IP address

Its pools

Affinity

Weight

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/osd'



Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/osd'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/osd

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user name and password when prompted.

How Can I View Information about a Particular OSD?


Edit online
This section describes how to view information about a particular OSD, such as:

IP address

Its pools

Affinity

Weight

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/osd/ID'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/osd/ID'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/ID', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/ID', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/osd/ID

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

Enter the user name and password when prompted.

How Can I Determine What Processes Can Be Scheduled on an OSD?


Edit online


This section describes how to use the RESTful plug-in to view what processes, such as scrubbing or deep scrubbing, can be
scheduled on an OSD.

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/osd/ID/command'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/osd/ID/command'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/ID/command', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/osd/ID/command', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/osd/ID/command

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

Enter the user name and password when prompted.



How Can I View Information About Pools?
Edit online
This section describes how to view information about pools, such as:

Flags

Size

Number of placement groups

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/pool'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/pool'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/pool', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/pool', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/pool

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance



Enter the user name and password when prompted.

How Can I View Information About a Particular Pool?


Edit online
This section describes how to view information about a particular pool, such as:

Flags

Size

Number of placement groups

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/pool/ID'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the pool listed in the pool field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/pool/ID'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/pool/ID', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the pool listed in the pool field

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/pool/ID', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online



In the web browser, enter:

https://CEPH_MANAGER:8080/api/pool/ID

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the pool listed in the pool field

Enter the user name and password when prompted.

How Can I View Information About Hosts?


Edit online
This section describes how to view information about hosts, such as:

Host names

Ceph daemons and their IDs

Ceph version

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/host'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/host'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/host', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/host', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()



Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/host

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user name and password when prompted.

How Can I View Information About a Particular Host?


Edit online
This section describes how to view information about a particular host, such as:

Host names

Ceph daemons and their IDs

Ceph version

The curl Command


Edit online
On the command line, use:

curl --silent --user USER 'https://CEPH_MANAGER:8080/api/host/HOST_NAME'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

HOST_NAME with the host name of the host listed in the hostname field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/host/HOST_NAME'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/host/HOST_NAME', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

HOST_NAME with the host name of the host listed in the hostname field

USER with the user name

PASSWORD with the user’s password



If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.get('https://CEPH_MANAGER:8080/api/host/HOST_NAME', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()

Web Browser
Edit online
In the web browser, enter:

https://CEPH_MANAGER:8080/api/host/HOST_NAME

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

HOST_NAME with the host name of the host listed in the hostname field

Enter the user name and password when prompted.

Changing Configuration
Edit online
This section describes how to use the Ceph API to change OSD configuration options, the state of an OSD, and information about
pools.

How Can I Change OSD Configuration Options?


How Can I Change the OSD State?
How Can I Reweight an OSD?
How Can I Change Information for a Pool?

How Can I Change OSD Configuration Options?


Edit online
This section describes how to use the RESTful plug-in to change OSD configuration options.

The curl Command


Edit online
On the command line, use:

echo -En '{"=OPTION": VALUE}' | curl --request PATCH --data @- --silent --user USER
'https://CEPH_MANAGER:8080/api/osd/flags'

Replace:

OPTION with the option to modify; pause, noup, nodown, noout, noin, nobackfill, norecover, noscrub, nodeep-
scrub

VALUE with true or false

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:



echo -En '{"OPTION": VALUE}' | curl --request PATCH --data @- --silent --insecure --user USER
'https://CEPH_MANAGER:8080/api/osd/flags'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.patch('https://CEPH_MANAGER:8080/api/osd/flags', json={"OPTION": VALUE}, auth=
("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

OPTION with the option to modify; pause, noup, nodown, noout, noin, nobackfill, norecover, noscrub, nodeep-
scrub

VALUE with True or False

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.patch('https://CEPH_MANAGER:8080/api/osd/flags', json={"OPTION": VALUE}, auth=
("USER", "PASSWORD"), verify=False)
>> print result.json()
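
As a concrete illustration, the following Python 3 sketch sets the noout flag before planned maintenance and clears it again afterwards. The host name and credentials are placeholders, and the self-signed certificate case is assumed:

#!/usr/bin/env python3
# Sketch: set the noout flag before maintenance, then clear it afterwards.
import requests
import urllib3

urllib3.disable_warnings()

BASE_URL = "https://ceph-mgr-host:8080"   # placeholder manager host
AUTH = ("api-user", "PASSWORD")          # placeholder credentials

# Set noout so that OSDs are not marked out while they are down for maintenance.
result = requests.patch(f"{BASE_URL}/api/osd/flags", json={"noout": True},
                        auth=AUTH, verify=False)
result.raise_for_status()

# ... perform the maintenance ...

# Clear the flag again once the OSDs are back.
result = requests.patch(f"{BASE_URL}/api/osd/flags", json={"noout": False},
                        auth=AUTH, verify=False)
result.raise_for_status()
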

How Can I Change the OSD State?


Edit online
This section describes how to use the RESTful plug-in to change the state of an OSD.

The curl Command


Edit online
On the command line, use:

echo -En '{"STATE": VALUE}' | curl --request PATCH --data @- --silent --user USER
'https://CEPH_MANAGER:8080/api/osd/ID'

Replace:

STATE with the state to change (in or up)

VALUE with true or false

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

echo -En '{"STATE": VALUE}' | curl --request PATCH --data @- --silent --insecure --user USER
'https://CEPH_MANAGER:8080/api/osd/ID'



Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.patch('https://CEPH_MANAGER:8080/api/osd/ID', json={"STATE": VALUE}, auth=
("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

STATE with the state to change (in or up)

VALUE with True or False

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.patch('https://CEPH_MANAGER:8080/api/osd/ID', json={"STATE": VALUE}, auth=
("USER", "PASSWORD"), verify=False)
>> print result.json()
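
For example, a minimal Python 3 sketch (placeholder host, credentials, and OSD ID) that marks one OSD out and prints the response:

#!/usr/bin/env python3
# Sketch: mark a single OSD "out" so that its data is rebalanced elsewhere.
import requests
import urllib3

urllib3.disable_warnings()

BASE_URL = "https://ceph-mgr-host:8080"   # placeholder manager host
AUTH = ("api-user", "PASSWORD")          # placeholder credentials
OSD_ID = 2                               # placeholder OSD ID

response = requests.patch(f"{BASE_URL}/api/osd/{OSD_ID}", json={"in": False},
                          auth=AUTH, verify=False)
response.raise_for_status()
print(response.json())
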

How Can I Reweight an OSD?


Edit online
This section describes how to change the weight of an OSD.

The curl Command


Edit online
On the command line, use:

echo -En '{"reweight": VALUE}' | curl --request PATCH --data @- --silent --user USER
'https://CEPH_MANAGER:8080/api/osd/ID'

Replace:

VALUE with the new weight

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

echo -En '{"reweight": VALUE}' | curl --request PATCH --data @- --silent --insecure --user USER
'https://CEPH_MANAGER:8080/api/osd/ID'

Python
Edit online



In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.patch('https://CEPH_MANAGER:8080/api/osd/ID', json={"reweight": VALUE}, auth=
("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

VALUE with the new weight

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.patch('https://CEPH_MANAGER:8080/api/osd/ID', json={"reweight": VALUE}, auth=
("USER", "PASSWORD"), verify=False)
>> print result.json()
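
For example, the following Python 3 sketch reads the current record of an OSD and then lowers its reweight value to 0.8 (placeholder host, credentials, and OSD ID):

#!/usr/bin/env python3
# Sketch: inspect an OSD and then lower its reweight value.
import requests
import urllib3

urllib3.disable_warnings()

BASE_URL = "https://ceph-mgr-host:8080"   # placeholder manager host
AUTH = ("api-user", "PASSWORD")          # placeholder credentials
OSD_ID = 5                               # placeholder OSD ID

# Show the OSD record before the change.
before = requests.get(f"{BASE_URL}/api/osd/{OSD_ID}", auth=AUTH, verify=False)
before.raise_for_status()
print(before.json())

# Reduce the reweight value so the OSD receives less data.
result = requests.patch(f"{BASE_URL}/api/osd/{OSD_ID}", json={"reweight": 0.8},
                        auth=AUTH, verify=False)
result.raise_for_status()
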

How Can I Change Information for a Pool?


Edit online
This section describes how to use the RESTful plug-in to change information for a particular pool.

The curl Command


Edit online
On the command line, use:

echo -En '{"OPTION": VALUE}' | curl --request PATCH --data @- --silent --user USER
'https://CEPH_MANAGER:8080/api/pool/ID'

Replace:

OPTION with the option to modify

VALUE with the new value of the option

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the pool listed in the pool field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

echo -En '{"OPTION": VALUE}' | curl --request PATCH --data @- --silent --insecure --user USER
'https://CEPH_MANAGER:8080/api/pool/ID'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests



>> result = requests.patch('https://CEPH_MANAGER:8080/api/pool/ID', json={"OPTION": VALUE}, auth=
("USER, "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the pool listed in the pool field

OPTION with the option to modify

VALUE with the new value of the option

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.patch('https://CEPH_MANAGER:8080/api/pool/ID', json={"OPTION": VALUE}, auth=
("USER, "PASSWORD"), verify=False)
>> print result.json()
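
For example, assuming a replicated pool with ID 2 that accepts the size option, the following Python 3 sketch raises its replica count to 3 (placeholder host and credentials):

#!/usr/bin/env python3
# Sketch: change one option (here: the replica count) on an existing pool.
import requests
import urllib3

urllib3.disable_warnings()

BASE_URL = "https://ceph-mgr-host:8080"   # placeholder manager host
AUTH = ("api-user", "PASSWORD")          # placeholder credentials
POOL_ID = 2                              # placeholder pool ID from the "pool" field

response = requests.patch(f"{BASE_URL}/api/pool/{POOL_ID}", json={"size": 3},
                          auth=AUTH, verify=False)
response.raise_for_status()
print(response.json())
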

Administering the Cluster


Edit online
This section describes how to use the Ceph API to initialize scrubbing or deep scrubbing on an OSD, create a pool or remove data
from a pool, remove requests, or create a request.

How Can I Run a Scheduled Process on an OSD?


How Can I Create a New Pool?
How Can I Remove a Pool?

How Can I Run a Scheduled Process on an OSD?


Edit online
This section describes how to use the RESTful API to run scheduled processes, such as scrubbing or deep scrubbing, on an OSD.

The curl Command


Edit online
On the command line, use:

echo -En '{"command": "COMMAND"}' | curl --request POST --data @- --silent --user USER
'https://CEPH_MANAGER:8080/api/osd/ID/command'

Replace:

COMMAND with the process (scrub, deep-scrub, or repair) you want to start. Verify that the process is supported on the OSD.
See How Can I Determine What Processes Can Be Scheduled on an OSD? for details.

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:



echo -En '{"command": "COMMAND"}' | curl --request POST --data @- --silent --insecure --user USER
'https://CEPH_MANAGER:8080/api/osd/ID/command'

Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.post('https://CEPH_MANAGER:8080/api/osd/ID/command', json={"command":
"COMMAND"}, auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the OSD listed in the osd field

COMMAND with the process (scrub, deep-scrub, or repair) you want to start. Verify that the process is supported on the OSD.
See How Can I Determine What Processes Can Be Scheduled on an OSD? for details.

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.post('https://CEPH_MANAGER:8080/api/osd/ID/command', json={"command":
"COMMAND"}, auth=("USER", "PASSWORD"), verify=False)
>> print result.json()
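
Putting both calls together, a Python 3 sketch (placeholder host, credentials, and OSD ID) can first check which commands the OSD accepts and then request a deep scrub:

#!/usr/bin/env python3
# Sketch: check which commands an OSD accepts, then start a deep scrub on it.
import requests
import urllib3

urllib3.disable_warnings()

BASE_URL = "https://ceph-mgr-host:8080"   # placeholder manager host
AUTH = ("api-user", "PASSWORD")          # placeholder credentials
OSD_ID = 3                               # placeholder OSD ID

allowed = requests.get(f"{BASE_URL}/api/osd/{OSD_ID}/command", auth=AUTH, verify=False)
allowed.raise_for_status()
print("Commands accepted by the OSD:", allowed.json())

scrub = requests.post(f"{BASE_URL}/api/osd/{OSD_ID}/command",
                      json={"command": "deep-scrub"}, auth=AUTH, verify=False)
scrub.raise_for_status()
print(scrub.json())
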

How Can I Create a New Pool?


Edit online
This section describes how to use the RESTful plug-in to create a new pool.

The curl Command


Edit online
On the command line, use:

echo -En '{"name": "NAME", "pg_num": NUMBER}' | curl --request POST --data @- --silent --user USER
'https://CEPH_MANAGER:8080/api/pool'

Replace:

NAME with the name of the new pool

NUMBER with the number of the placement groups

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

echo -En '{"name": "NAME", "pg_num": NUMBER}' | curl --request POST --data @- --silent --insecure -
-user USER 'https://CEPH_MANAGER:8080/api/pool'



Python
Edit online
In the Python interpreter, enter:

$ python
>> import requests
>> result = requests.post('https://CEPH_MANAGER:8080/api/pool', json={"name": "NAME", "pg_num":
NUMBER}, auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

NAME with the name of the new pool

NUMBER with the number of the placement groups

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.post('https://CEPH_MANAGER:8080/api/pool', json={"name": "NAME", "pg_num":
NUMBER}, auth=("USER", "PASSWORD"), verify=False)
>> print result.json()
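
For example, the following Python 3 sketch creates a pool named test-pool with 128 placement groups and reports whether the request was accepted (placeholder host and credentials):

#!/usr/bin/env python3
# Sketch: create a new pool and report the outcome.
import requests
import urllib3

urllib3.disable_warnings()

BASE_URL = "https://ceph-mgr-host:8080"   # placeholder manager host
AUTH = ("api-user", "PASSWORD")          # placeholder credentials

response = requests.post(f"{BASE_URL}/api/pool",
                         json={"name": "test-pool", "pg_num": 128},
                         auth=AUTH, verify=False)
if response.ok:
    print("Pool creation request accepted:", response.json())
else:
    print("Request failed with HTTP status", response.status_code)
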

How Can I Remove a Pool?


Edit online
This section describes how to use the RESTful plug-in to remove a pool.

This request is forbidden by default. To allow it, add the following parameter to the Ceph configuration:

mon_allow_pool_delete = true

The curl Command


Edit online
On the command line, use:

curl --request DELETE --silent --user USER 'https://CEPH_MANAGER:8080/api/pool/ID'

Replace:

USER with the user name

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the pool listed in the pool field

Enter the user’s password when prompted.

If you used a self-signed certificate, use the --insecure option:

curl --request DELETE --silent --insecure --user USER 'https://CEPH_MANAGER:8080/api/pool/ID'

Python
Edit online
In the Python interpreter, enter:



$ python
>> import requests
>> result = requests.delete('https://CEPH_MANAGER:8080/api/pool/ID', auth=("USER", "PASSWORD"))
>> print result.json()

Replace:

CEPH_MANAGER with the IP address or short host name of the node with the active ceph-mgr instance

ID with the ID of the pool listed in the pool field

USER with the user name

PASSWORD with the user’s password

If you used a self-signed certificate, use the verify=False option:

$ python
>> import requests
>> result = requests.delete('https://CEPH_MANAGER:8080/api/pool/ID', auth=("USER", "PASSWORD"),
verify=False)
>> print result.json()
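
For example, a Python 3 sketch that deletes the pool with ID 14 and checks the HTTP status code; it assumes that mon_allow_pool_delete has already been set to true as described above (placeholder host, credentials, and pool ID):

#!/usr/bin/env python3
# Sketch: delete a pool by ID and report the HTTP status of the request.
# Requires mon_allow_pool_delete = true, otherwise the request is rejected.
import requests
import urllib3

urllib3.disable_warnings()

BASE_URL = "https://ceph-mgr-host:8080"   # placeholder manager host
AUTH = ("api-user", "PASSWORD")          # placeholder credentials
POOL_ID = 14                             # placeholder pool ID from the "pool" field

response = requests.delete(f"{BASE_URL}/api/pool/{POOL_ID}", auth=AUTH, verify=False)
print("HTTP status:", response.status_code)
print(response.json())
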

Ceph Object Gateway administrative API


Edit online
As a developer, you can administer the Ceph Object Gateway by interacting with the RESTful application programming interface
(API). The Ceph Object Gateway makes available the features of the radosgw-admin command in a RESTful API. You can manage
users, data, quotas, and usage, which you can integrate with other management platforms.

NOTE: IBM recommends using the command-line interface when configuring the Ceph Object Gateway.

Figure 1. Basic access diagram



The administrative API provides the following functionality:

Prerequisites
Administration operations
Administration authentication requests
Creating an administrative user
Get user information
Create a user
Modify a user
Remove a user
Create a subuser
Modify a subuser
Remove a subuser
Add capabilities to a user
Remove capabilities from a user
Create a key
Remove a key
Bucket notifications
Get bucket information
Check a bucket index
Remove a bucket
Link a bucket
Unlink a bucket
Get a bucket or object policy
Remove an object



Quotas
Get a user quota
Set a user quota
Get a bucket quota
Set a bucket quota
Get usage information
Remove usage information
Standard error responses

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.

Administration operations
Edit online
An administrative Application Programming Interface (API) request will be done on a URI that starts with the configurable admin
resource entry point. Authorization for the administrative API duplicates the S3 authorization mechanism. Some operations require
that the user holds special administrative capabilities. The response entity type, either XML or JSON, might be specified as the
format option in the request and defaults to JSON if not specified.

Example

PUT /admin/user?caps&format=json HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
Content-Type: text/plain
Authorization: AUTHORIZATION_TOKEN

usage=read

Administration authentication requests


Edit online
Amazon’s S3 service uses the access key and a hash of the request header and the secret key to authenticate the request. It has the
benefit of providing an authenticated request, especially for large uploads, without SSL overhead.

Most use cases for the S3 API involve using open-source S3 clients such as the AmazonS3Client in the Amazon SDK for Java or
Python Boto. These libraries do not support the Ceph Object Gateway Admin API. You can subclass and extend these libraries to
support the Ceph Admin API. Alternatively, you can create a unique Gateway client.

Creating an execute() method

The CephAdminAPI example class in this section illustrates how to create an execute() method that can take request parameters,
authenticate the request, call the Ceph Admin API and receive a response.

NOTE: The CephAdminAPI class example is not supported or intended for commercial use. It is for illustrative purposes only.

Calling the Ceph Object Gateway

The client code contains five calls to the Ceph Object Gateway to demonstrate CRUD operations:

Create a User

Get a User



Modify a User

Create a Subuser

Delete a User

To use this example, get the httpcomponents-client-4.5.3 Apache HTTP components. You can download it for example here:
http://hc.apache.org/downloads.cgi. Then unzip the tar file, navigate to its lib directory and copy the contents to the
/jre/lib/ext directory of the JAVA_HOME directory, or a custom classpath.

As you examine the CephAdminAPI class example, notice that the execute() method takes an HTTP method, a request path, an
optional subresource, null if not specified, and a map of parameters. To execute with subresources, for example, subuser, and
key, you will need to specify the subresource as an argument in the execute() method.

The example method:

1. Builds a URI.

2. Builds an HTTP header string.

3. Instantiates an HTTP request, for example, PUT, POST, GET, DELETE.

4. Adds the Date header to the HTTP header string and the request header.

5. Adds the Authorization header to the HTTP request header.

6. Instantiates an HTTP client and passes it the instantiated HTTP request.

7. Makes a request.

8. Returns a response.

Building the header string

Building the header string is the portion of the process that involves Amazon’s S3 authentication procedure. Specifically, the
example method does the following:

1. Adds a request type, for example, PUT, POST, GET, DELETE.

2. Adds the date.

3. Adds the requestPath.

The request type should be uppercase with no leading or trailing white space. If you do not trim white space, authentication will fail.
The date MUST be expressed in GMT, or authentication will fail.

The exemplary method does not have any other headers. The Amazon S3 authentication procedure sorts x-amz headers
lexicographically. So if you are adding x-amz headers, be sure to add them lexicographically.

Once you have built the header string, the next step is to instantiate an HTTP request and pass it the URI. The exemplary method
uses PUT for creating a user and subuser, GET for getting a user, POST for modifying a user and DELETE for deleting a user.

Once you instantiate a request, add the Date header followed by the Authorization header. Amazon’s S3 authentication uses the
standard Authorization header, and has the following structure:

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

The CephAdminAPI example class has a base64Sha1Hmac() method, which takes the header string and the secret key for the
admin user, and returns a SHA1 HMAC as a base-64 encoded string. Each execute() call will invoke the same line of code to build
the Authorization header:

httpRequest.addHeader("Authorization", "AWS " + this.getAccessKey() + ":" +


base64Sha1Hmac(headerString.toString(), this.getSecretKey()));
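
The same signature can be reproduced outside of Java. The following Python 3 sketch builds the header string for a GET request and computes the base64-encoded SHA-1 HMAC in the way this authentication scheme expects; the access key, secret key, and request path are placeholders:

#!/usr/bin/env python3
# Sketch: compute the AWS-style Authorization header value used by the admin API.
import base64
import hashlib
import hmac
from email.utils import formatdate

ACCESS_KEY = "ACCESS_KEY"        # placeholder
SECRET_KEY = "SECRET_KEY"        # placeholder
REQUEST_PATH = "/admin/user"     # placeholder resource

# Method, two empty lines for Content-MD5 and Content-Type, the GMT date, then the path.
date = formatdate(usegmt=True)
header_string = "GET\n\n\n" + date + "\n" + REQUEST_PATH

digest = hmac.new(SECRET_KEY.encode("utf-8"),
                  header_string.encode("utf-8"),
                  hashlib.sha1).digest()
signature = base64.b64encode(digest).decode("utf-8")

print("Date:", date)
print("Authorization: AWS " + ACCESS_KEY + ":" + signature)
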

The following CephAdminAPI example class requires you to pass the access key, secret key, and an endpoint to the constructor. The
class provides accessor methods to change them at runtime.

Example

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.time.OffsetDateTime;



import java.time.format.DateTimeFormatter;
import java.time.ZoneId;

import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.Header;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpRequestBase;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.client.methods.HttpDelete;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.apache.http.client.utils.URIBuilder;

import java.util.Base64;
import java.util.Base64.Encoder;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.crypto.spec.SecretKeySpec;
import javax.crypto.Mac;

import java.util.Map;
import java.util.Iterator;
import java.util.Set;
import java.util.Map.Entry;

public class CephAdminAPI {

/*
* Each call must specify an access key, secret key, endpoint and format.
*/
String accessKey;
String secretKey;
String endpoint;
String scheme = "http"; //http only.
int port = 80;

/*
* A constructor that takes an access key, secret key, endpoint and format.
*/
public CephAdminAPI(String accessKey, String secretKey, String endpoint){
this.accessKey = accessKey;
this.secretKey = secretKey;
this.endpoint = endpoint;
}

/*
* Accessor methods for access key, secret key, endpoint and format.
*/
public String getEndpoint(){
return this.endpoint;
}

public void setEndpoint(String endpoint){


this.endpoint = endpoint;
}

public String getAccessKey(){


return this.accessKey;
}

public void setAccessKey(String accessKey){


this.accessKey = accessKey;
}

public String getSecretKey(){


return this.secretKey;
}

public void setSecretKey(String secretKey){


this.secretKey = secretKey;



}

/*
* Takes an HTTP Method, a resource and a map of arguments and
* returns a CloseableHTTPResponse.
*/
public CloseableHttpResponse execute(String HTTPMethod, String resource,
String subresource, Map arguments) {

String httpMethod = HTTPMethod;


String requestPath = resource;
StringBuffer request = new StringBuffer();
StringBuffer headerString = new StringBuffer();
HttpRequestBase httpRequest;
CloseableHttpClient httpclient;
URI uri;
CloseableHttpResponse httpResponse = null;

try {

uri = new URIBuilder()


.setScheme(this.scheme)
.setHost(this.getEndpoint())
.setPath(requestPath)
.setPort(this.port)
.build();

if (subresource != null){
uri = new URIBuilder(uri)
.setCustomQuery(subresource)
.build();
}

for (Iterator iter = arguments.entrySet().iterator();


iter.hasNext();) {
Entry entry = (Entry)iter.next();
uri = new URIBuilder(uri)
.setParameter(entry.getKey().toString(),
entry.getValue().toString())
.build();
}

request.append(uri);

headerString.append(HTTPMethod.toUpperCase().trim() + "\n\n\n");

OffsetDateTime dateTime = OffsetDateTime.now(ZoneId.of("GMT"));


DateTimeFormatter formatter = DateTimeFormatter.RFC_1123_DATE_TIME;
String date = dateTime.format(formatter);

headerString.append(date + "\n");
headerString.append(requestPath);

if (HTTPMethod.equalsIgnoreCase("PUT")){
httpRequest = new HttpPut(uri);
} else if (HTTPMethod.equalsIgnoreCase("POST")){
httpRequest = new HttpPost(uri);
} else if (HTTPMethod.equalsIgnoreCase("GET")){
httpRequest = new HttpGet(uri);
} else if (HTTPMethod.equalsIgnoreCase("DELETE")){
httpRequest = new HttpDelete(uri);
} else {
System.err.println("The HTTP Method must be PUT,
POST, GET or DELETE.");
throw new IOException();
}

httpRequest.addHeader("Date", date);
httpRequest.addHeader("Authorization", "AWS " + this.getAccessKey()
+ ":" + base64Sha1Hmac(headerString.toString(),
this.getSecretKey()));

httpclient = HttpClients.createDefault();



httpResponse = httpclient.execute(httpRequest);

} catch (URISyntaxException e){


System.err.println("The URI is not formatted properly.");
e.printStackTrace();
} catch (IOException e){
System.err.println("There was an error making the request.");
e.printStackTrace();
}
return httpResponse;
}

/*
* Takes a uri and a secret key and returns a base64-encoded
* SHA-1 HMAC.
*/
public String base64Sha1Hmac(String uri, String secretKey) {
try {

byte[] keyBytes = secretKey.getBytes("UTF-8");


SecretKeySpec signingKey = new SecretKeySpec(keyBytes, "HmacSHA1");

Mac mac = Mac.getInstance("HmacSHA1");


mac.init(signingKey);

byte[] rawHmac = mac.doFinal(uri.getBytes("UTF-8"));

Encoder base64 = Base64.getEncoder();


return base64.encodeToString(rawHmac);

} catch (Exception e) {
throw new RuntimeException(e);
}
}
}

The subsequent CephAdminAPIClient example illustrates how to instantiate the CephAdminAPI class, build a map of request
parameters, and use the execute() method to create, get, update and delete a user.

Example

import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.HttpEntity;
import org.apache.http.util.EntityUtils;
import java.util.*;

public class CephAdminAPIClient {

public static void main (String[] args){

CephAdminAPI adminApi = new CephAdminAPI ("FFC6ZQ6EMIF64194158N",


"Xac39eCAhlTGcCAUreuwe1ZuH5oVQFa51lbEMVoT",
"ceph-client");

/*
* Create a user
*/
Map requestArgs = new HashMap();
requestArgs.put("access", "usage=read, write; users=read, write");
requestArgs.put("display-name", "New User");
requestArgs.put("email", "[email protected]");
requestArgs.put("format", "json");
requestArgs.put("uid", "new-user");

CloseableHttpResponse response =
adminApi.execute("PUT", "/admin/user", null, requestArgs);

System.out.println(response.getStatusLine());
HttpEntity entity = response.getEntity();

try {
System.out.println("\nResponse Content is: "



+ EntityUtils.toString(entity, "UTF-8") + "\n");
response.close();
} catch (IOException e){
System.err.println ("Encountered an I/O exception.");
e.printStackTrace();
}

/*
* Get a user
*/
requestArgs = new HashMap();
requestArgs.put("format", "json");
requestArgs.put("uid", "new-user");

response = adminApi.execute("GET", "/admin/user", null, requestArgs);

System.out.println(response.getStatusLine());
entity = response.getEntity();

try {
System.out.println("\nResponse Content is: "
+ EntityUtils.toString(entity, "UTF-8") + "\n");
response.close();
} catch (IOException e){
System.err.println ("Encountered an I/O exception.");
e.printStackTrace();
}

/*
* Modify a user
*/
requestArgs = new HashMap();
requestArgs.put("display-name", "John Doe");
requestArgs.put("email", "[email protected]");
requestArgs.put("format", "json");
requestArgs.put("uid", "new-user");
requestArgs.put("max-buckets", "100");

response = adminApi.execute("POST", "/admin/user", null, requestArgs);

System.out.println(response.getStatusLine());
entity = response.getEntity();

try {
System.out.println("\nResponse Content is: "
+ EntityUtils.toString(entity, "UTF-8") + "\n");
response.close();
} catch (IOException e){
System.err.println ("Encountered an I/O exception.");
e.printStackTrace();
}

/*
* Create a subuser
*/
requestArgs = new HashMap();
requestArgs.put("format", "json");
requestArgs.put("uid", "new-user");
requestArgs.put("subuser", "foobar");

response = adminApi.execute("PUT", "/admin/user", "subuser", requestArgs);


System.out.println(response.getStatusLine());
entity = response.getEntity();

try {
System.out.println("\nResponse Content is: "
+ EntityUtils.toString(entity, "UTF-8") + "\n");
response.close();
} catch (IOException e){
System.err.println ("Encountered an I/O exception.");
e.printStackTrace();
}

/*



* Delete a user
*/
requestArgs = new HashMap();
requestArgs.put("format", "json");
requestArgs.put("uid", "new-user");

response = adminApi.execute("DELETE", "/admin/user", null, requestArgs);


System.out.println(response.getStatusLine());
entity = response.getEntity();

try {
System.out.println("\nResponse Content is: "
+ EntityUtils.toString(entity, "UTF-8") + "\n");
response.close();
} catch (IOException e){
System.err.println ("Encountered an I/O exception.");
e.printStackTrace();
}
}
}

Reference
Edit online

For a more extensive explanation of the Amazon S3 authentication procedure, consult the Signing and Authenticating REST
Requests section of Amazon Simple Storage Service documentation.

See the S3 Authentication for additional details.

Creating an administrative user


Edit online
IMPORTANT: To run the radosgw-admin command from the Ceph Object Gateway node, ensure the node has the admin key. The
admin key can be copied from any Ceph Monitor node.

Prerequisites
Edit online

Root-level access to the Ceph Object Gateway node.

Procedure
Edit online

1. Create an object gateway user:

Syntax

radosgw-admin user create --uid="USER_NAME" --display-name="DISPLAY_NAME"

Example

[user@client ~]$ radosgw-admin user create --uid="admin-api-user" --display-name="Admin API User"

The radosgw-admin command-line interface will return the user.

Example output

{
"user_id": "admin-api-user",



"display_name": "Admin API User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "admin-api-user",
"access_key": "NRWGT19TWMYOB1YDBV1Y",
"secret_key": "gr1VEGIV7rxcP3xvXDFCo4UDwwl2YoNrmtRlIAty"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}

2. Assign administrative capabilities to the user you create:

Syntax

radosgw-admin caps add --uid="USER_NAME" --caps="users=*"

Example

[user@client ~]$ radosgw-admin caps add --uid=admin-api-user --caps="users=*"

The radosgw-admin command-line interface will return the user. The "caps": will have the capabilities you assigned to the
user:

Example output

{
"user_id": "admin-api-user",
"display_name": "Admin API User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "admin-api-user",
"access_key": "NRWGT19TWMYOB1YDBV1Y",
"secret_key": "gr1VEGIV7rxcP3xvXDFCo4UDwwl2YoNrmtRlIAty"
}
],
"swift_keys": [],
"caps": [
{
"type": "users",
"perm": "*"
}
],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,



"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}

Now you have a user with administrative privileges.
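
To confirm that the new credentials work, you can sign and send a request to the administrative API yourself. The following Python 3 sketch uses the access key and secret key shown in the example output above to fetch the user record with GET /admin/user; the gateway host name and port are placeholders, and the signing follows the header-string procedure described earlier:

#!/usr/bin/env python3
# Sketch: call GET /admin/user with AWS-style (S3 v2) request signing.
import base64
import hashlib
import hmac
from email.utils import formatdate

import requests

GATEWAY = "http://rgw-host:8080"                          # placeholder gateway endpoint
ACCESS_KEY = "NRWGT19TWMYOB1YDBV1Y"                       # from the example output above
SECRET_KEY = "gr1VEGIV7rxcP3xvXDFCo4UDwwl2YoNrmtRlIAty"   # from the example output above
RESOURCE = "/admin/user"

date = formatdate(usegmt=True)
header_string = "GET\n\n\n" + date + "\n" + RESOURCE
digest = hmac.new(SECRET_KEY.encode("utf-8"),
                  header_string.encode("utf-8"),
                  hashlib.sha1).digest()
signature = base64.b64encode(digest).decode("utf-8")

headers = {
    "Date": date,
    "Authorization": "AWS " + ACCESS_KEY + ":" + signature,
}

response = requests.get(GATEWAY + RESOURCE,
                        headers=headers,
                        params={"format": "json", "uid": "admin-api-user"})
print(response.status_code)
print(response.json())
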

Get user information


Edit online
Get the user’s information.

Capabilities

users=read

Syntax

GET /admin/user?format=json HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user for which the information is requested.

Type
String

Example
foo_user

Required
Yes

Response Entities

user

Description
A container for the user data information.

Type
Container

Parent
N/A

user_id

Description
The user ID.

Type
String

Parent
user

display_name



Description
Display name for the user.

Type
String

Parent
user

suspended

Description
True if the user is suspended.

Type
Boolean

Parent
user

max_buckets

Description
The maximum number of buckets to be owned by the user.

Type
Integer

Parent
user

subusers

Description
Subusers associated with this user account.

Type
Container

Parent
user

keys

Description
S3 keys associated with this user account.

Type
Container

Parent
user

swift_keys

Description
Swift keys associated with this user account.

Type
Container

Parent
user

caps

Description
User capabilities.

Type
Container



Parent
user

If successful, the response contains the user information.

Special Error Responses

None.

Create a user
Edit online
Create a new user. By default, an S3 key pair will be created automatically and returned in the response. If only an access-key or
secret-key is provided, the omitted key will be automatically generated. By default, a generated key is added to the keyring
without replacing an existing key pair. If access-key is specified and refers to an existing key owned by the user then it will be
modified.

Capabilities

`users=write`

Syntax

PUT /admin/user?format=json HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID to be created.

Type
String

Example
foo_user

Required
Yes

display-name

Description
The display name of the user to be created.

Type
String

Example
foo_user

Required
Yes

email

Description
The email address associated with the user.

Type
String

Example
[email protected]

Required



No

key-type

Description
Key type to be generated, options are: swift, s3 (default).

Type
String

Example
s3

Required
No

access-key

Description
Specify access key.

Type
String

Example
ABCD0EF12GHIJ2K34LMN

Required
No

secret-key

Description
Specify secret key.

Type
String

Example
0AbCDEFg1h2i34JklM5nop6QrSTUV+WxyzaBC7D8

Required
No

user-caps

Description
User capabilities.

Type
String

Example
usage=read, write; users=read

Required
No

generate-key

Description
Generate a new key pair and add to the existing keyring.

Type
Boolean

Example
True

Required
No



max-buckets

Description
Specify the maximum number of buckets the user can own.

Type
Integer

Example
500

Required
No

suspended

Description
Specify whether the user should be suspended

Type
Boolean

Example
False

Required
No

Response Entities

user

Description
A container for the user data information.

Type
Container

Parent
N/A

user_id

Description
The user ID.

Type
String

Parent
user

display_name

Description
Display name for the user.

Type
String

Parent
user

suspended

Description
True if the user is suspended.

Type
Boolean



Parent
user

max_buckets

Description
The maximum number of buckets to be owned by the user.

Type
Integer

Parent
user

subusers

Description
Subusers associated with this user account.

Type
Container

Parent
user

keys

Description
S3 keys associated with this user account.

Type
Container

Parent
user

swift_keys

Description
Swift keys associated with this user account.

Type
Container

Parent
user

caps

Description
User capabilities.

Type
Container

Parent
user

If successful, the response contains the user information.

Special Error Responses

UserExists

Description
Attempt to create existing user.

Code
409 Conflict

InvalidAccessKey

Description
Invalid access key specified.



Code
400 Bad Request

InvalidKeyType

Description
Invalid key type specified.

Code
400 Bad Request

InvalidSecretKey

Description
Invalid secret key specified.

Code
400 Bad Request

KeyExists

Description
Provided access key exists and belongs to another user.

Code
409 Conflict

EmailExists

Description
Provided email address exists.

Code
409 Conflict

InvalidCap

Description
Attempt to grant invalid admin capability.

Code
400 Bad Request

Reference
Edit online

See Developer for creating subusers.

Modify a user
Edit online
Modify an existing user.

Capabilities

`users=write`

Syntax

POST /admin/user?format=json HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid



Description
The user ID to be created.

Type
String

Example
foo_user

Required
Yes

display-name

Description
The display name of the user to be created.

Type
String

Example
foo_user

Required
Yes

email

Description
The email address associated with the user.

Type
String

Example
[email protected]

Required
No

generate-key

Description
Generate a new key pair and add to the existing keyring.

Type
Boolean

Example
True

Required
No

access-key

Description
Specify access key.

Type
String

Example
ABCD0EF12GHIJ2K34LMN

Required
No

secret-key

Description



Specify secret key.

Type
String

Example
0AbCDEFg1h2i34JklM5nop6QrSTUV+WxyzaBC7D8

Required
No

key-type

Description
Key type to be generated, options are: swift, s3 (default).

Type
String

Example
s3

Required
No

user-caps

Description
User capabilities.

Type
String

Example
usage=read, write; users=read

Required
No

max-buckets

Description
Specify the maximum number of buckets the user can own.

Type
Integer

Example
500

Required
No

suspended

Description
Specify whether the user should be suspended

Type
Boolean

Example
False

Required
No

Response Entities

user



Description
A container for the user data information.

Type
Container

Parent
N/A

user_id

Description
The user ID.

Type
String

Parent
user

display_name

Description
Display name for the user.

Type
String

Parent
user

suspended

Description
True if the user is suspended.

Type
Boolean

Parent
user

max_buckets

Description
The maximum number of buckets to be owned by the user.

Type
Integer

Parent
user

subusers

Description
Subusers associated with this user account.

Type
Container

Parent
user

keys

Description
S3 keys associated with this user account.

Type
Container



Parent
user

swift_keys

Description
Swift keys associated with this user account.

Type
Container

Parent
user

caps

Description
User capabilities.

Type
Container

Parent
user

If successful, the response contains the user information.

Special Error Responses

InvalidAccessKey

Description
Invalid access key specified.

Code
400 Bad Request

InvalidKeyType

Description
Invalid key type specified.

Code
400 Bad Request

InvalidSecretKey

Description
Invalid secret key specified.

Code
400 Bad Request

KeyExists

Description
Provided access key exists and belongs to another user.

Code
409 Conflict

EmailExists

Description
Provided email address exists.

Code
409 Conflict

InvalidCap

Description
Attempt to grant invalid admin capability.



Code
400 Bad Request

Reference
Edit online

Modifying subusers

Remove a user
Edit online
Remove an existing user.

Capabilities

`users=write`

Syntax

DELETE /admin/user?format=json HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID to be removed.

Type
String

Example
foo_user

Required
Yes

purge-data

Description
When specified the buckets and objects belonging to the user will also be removed.

Type
Boolean

Example
True

Required
No

Response Entities

None.

Special Error Responses

None.

Reference
Edit online



See Remove a subuser for removing subusers.

Create a subuser
Edit online
Create a new subuser, primarily useful for clients using the Swift API.

NOTE: Either gen-subuser or subuser is required for a valid request. In general, for a subuser to be useful, it must be granted
permissions by specifying access. As with user creation if subuser is specified without secret, then a secret key is automatically
generated.

Capabilities

`users=write`

Syntax

PUT /admin/user?subuser&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID under which a subuser is to be created.

Type
String

Example
foo_user

Required
Yes

subuser

Description
Specify the subuser ID to be created.

Type
String

Example
sub_foo

Required
Yes (or gen-subuser)

gen-subuser

Description
Specify the subuser ID to be created.

Type
String

Example
sub_foo

Required
Yes (or subuser)

secret-key

Description
Specify secret key.



Type
String

Example
0AbCDEFg1h2i34JklM5nop6QrSTUV+WxyzaBC7D8

Required
No

key-type

Description
Key type to be generated, options are: swift (default), s3.

Type
String

Example
swift

Required
No

access

Description
Set access permissions for sub-user, should be one of read, write, readwrite, full.

Type
String

Example
read

Required
No

generate-secret

Description
Generate the secret key.

Type
Boolean

Example
True

Required
No

Response Entities

subusers

Description
Subusers associated with the user account.

Type
Container

Parent
N/A

permissions

Description
Subuser access to user account.

Type
String



Parent
subusers

If successful, the response contains the subuser information.

SubuserExists

Description
Specified subuser exists.

Code
409 Conflict

InvalidKeyType

Description
Invalid key type specified.

Code
400 Bad Request

InvalidSecretKey

Description
Invalid secret key specified.

Code
400 Bad Request

InvalidAccess

Description
Invalid subuser access specified

Code
400 Bad Request

Modify a subuser
Edit online
Modify an existing subuser.

Capabilities

`users=write`

Syntax

POST /admin/user?subuser&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID under which a subuser is to be created.

Type
String

Example
foo_user

Required
Yes

subuser



Description
The subuser ID to be modified.

Type
String

Example
sub_foo

Required
No

generate-secret

Description
Generate a new secret key for the subuser, replacing the existing key.

Type
Boolean

Example
True

Required
No

secret

Description
Specify secret key.

Type
String

Example
0AbCDEFg1h2i34JklM5nop6QrSTUV+WxyzaBC7D8

Required
No

key-type

Description
Key type to be generated, options are: swift (default), s3.

Type
String

Example
swift

Required
No

access

Description
Set access permissions for sub-user, should be one of read, write, readwrite, full.

Type
String

Example
read

Required
No

Response Entities

subusers



Description
Subusers associated with the user account.

Type
Container

Parent
N/A

id

Description
Subuser ID

Type
String

Parent
subusers

permissions

Description
Subuser access to user account.

Type
String

Parent
subusers

If successful, the response contains the subuser information.

InvalidKeyType

Description
Invalid key type specified.

Code
400 Bad Request

InvalidSecretKey

Description
Invalid secret key specified.

Code
400 Bad Request

InvalidAccess

Description
Invalid subuser access specified

Code
400 Bad Request

Remove a subuser
Edit online
Remove an existing subuser.

Capabilities

`users=write`

Syntax



DELETE /admin/user?subuser&format=json HTTP/1.1
Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID to be removed.

Type
String

Example
foo_user

Required
Yes

subuser

Description
The subuser ID to be removed.

Type
String

Example
sub_foo

Required
Yes

purge-keys

Description
Remove keys belonging to the subuser.

Type
Boolean

Example
True

Required
No

Response Entities

None.

Special Error Responses

None.

Add capabilities to a user


Edit online
Add an administrative capability to a specified user.

Capabilities

`users=write`

Syntax

PUT /admin/user?caps&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID to add an administrative capability to.

Type
String

Example
foo_user

Required
Yes

user-caps

Description
The administrative capability to add to the user.

Type
String

Example
usage=read, write

Required
Yes

Response Entities

user

Description
A container for the user data information.

Type
Container

Parent
N/A

user_id

Description
The user ID

Type
String

Parent
user

caps

Description
User capabilities

Type
Container

Parent
user

If successful, the response contains the user’s capabilities.

Special Error Responses

InvalidCap

Description
Attempt to grant invalid admin capability.

Code



400 Bad Request

Remove capabilities from a user


Edit online
Remove an administrative capability from a specified user.

Capabilities

`users=write`

Syntax

DELETE /admin/user?caps&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID to remove an administrative capability from.

Type
String

Example
foo_user

Required
Yes

user-caps

Description
The administrative capabilities to remove from the user.

Type
String

Example
usage=read, write

Required
Yes

Response Entities

user

Description
A container for the user data information.

Type
Container

Parent
N/A

user_id

Description
The user ID.

Type
String

Parent



user

caps

Description
User capabilities.

Type
Container

Parent
user

If successful, the response contains the user’s capabilities.

Special Error Responses

InvalidCap

Description
Attempt to remove an invalid admin capability.

Code
400 Bad Request

NoSuchCap

Description
User does not possess specified capability.

Code
404 Not Found

Create a key
Edit online
Create a new key. If a subuser is specified, then by default the created keys will be of the swift type. If only one of access-key or secret-key is provided, the omitted key will be automatically generated; that is, if only secret-key is specified, then access-key will be automatically generated. By default, a generated key is added to the keyring without replacing an existing key pair. If access-key is specified and refers to an existing key owned by the user, then it will be modified. The response is a container listing all keys of the same type as the key created.

NOTE: When creating a swift key, specifying the option access-key will have no effect. Additionally, only one swift key might be
held by each user or subuser.

Capabilities

`users=write`

Syntax

PUT /admin/user?key&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user ID to receive the new key.

Type
String

Example
foo_user

Required
Yes



subuser

Description
The subuser ID to receive the new key.

Type
String

Example
sub_foo

Required
No

key-type

Description
Key type to be generated, options are: swift, s3 (default).

Type
String

Example
s3

Required
No

access-key

Description
Specify access key.

Type
String

Example
AB01C2D3EF45G6H7IJ8K

Required
No

secret-key

Description
Specify secret key.

Type
String

Example
0ab/CdeFGhij1klmnopqRSTUv1WxyZabcDEFgHij

Required
No

generate-key

Description
Generate a new key pair and add to the existing keyring.

Type
Boolean

Example
True

Required
No

Response Entities



keys

Description
Keys of type created associated with this user account.

Type
Container

Parent
N/A

user

Description
The user account associated with the key.

Type
String

Parent
keys

access-key

Description
The access key.

Type
String

Parent
keys

secret-key

Description
The secret key.

Type
String

Parent
keys

Special Error Responses

InvalidAccessKey

Description
Invalid access key specified.

Code
400 Bad Request

InvalidKeyType

Description
Invalid key type specified.

Code
400 Bad Request

InvalidSecretKey

Description
Invalid secret key specified.

Code
400 Bad Request


KeyExists

Description
Provided access key exists and belongs to another user.

Code
409 Conflict
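
The following short Python sketch, under the same assumptions as the earlier subuser example (the requests and requests-aws4auth packages, a placeholder host, and admin credentials with users=write), generates an S3 key pair for a user and prints the keys container described above.

import requests
from requests_aws4auth import AWS4Auth

auth = AWS4Auth('ADMIN_ACCESS_KEY', 'ADMIN_SECRET_KEY', 'us-east-1', 's3')

# PUT /admin/user?key&format=json with generate-key=True creates a new
# S3 access/secret key pair for the user.
response = requests.put(
    'http://FULLY_QUALIFIED_DOMAIN_NAME:8080/admin/user',
    params={'key': '', 'format': 'json', 'uid': 'foo_user',
            'key-type': 's3', 'generate-key': 'True'},
    auth=auth)

print(response.status_code)
print(response.json())   # keys of the created type associated with the user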

Remove a key
Edit online
Remove an existing key.

Capabilities

`users=write`

Syntax

DELETE /admin/user?key&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

access-key

Description
The S3 access key belonging to the S3 key pair to remove.

Type
String

Example
AB01C2D3EF45G6H7IJ8K

Required
Yes

uid

Description
The user to remove the key from.

Type
String

Example
foo_user

Required
No

subuser

Description
The subuser to remove the key from.

Type
String

Example
sub_foo



Required
No

key-type

Description
Key type to be removed, options are: swift, s3.

NOTE: Required to remove swift key.

Type
String

Example
swift

Required
No

Special Error Responses

None.

Response Entities

None.

Bucket notifications
Edit online
As a storage administrator, you can use these APIs to provide configuration and control interfaces for the bucket notification
mechanism. The API topics are named objects that contain the definition of a specific endpoint. Bucket notifications associate topics
with a specific bucket. The S3 bucket operations section gives more details on bucket notifications.

NOTE: In all topic actions, the parameters are URL encoded, and sent in the message body using the application/x-www-form-urlencoded content type.

NOTE: Any bucket notification already associated with the topic needs to be re-created for the topic update to take effect.

Prerequisites
Overview of bucket notifications
Persistent notifications
Creating a topic
Getting topic information
Listing topics
Deleting topics
Using the command-line interface for topic management
Event record
Supported event types

Prerequisites
Edit online

Create bucket notifications on the Ceph Object Gateway.

Overview of bucket notifications


Edit online



Bucket notifications provide a way to send information out of the Ceph Object Gateway when certain events happen in the bucket.
Bucket notifications can be sent to HTTP, AMQP0.9.1, and Kafka endpoints. A notification entry must be created to send bucket
notifications for events on a specific bucket and to a specific topic. A bucket notification can be created on a subset of event types or
by default for all event types. The bucket notification can filter out events based on key prefix or suffix, regular expression matching
the keys, and the metadata attributes attached to the object, or the object tags. Bucket notifications have a REST API to provide
configuration and control interfaces for the bucket notification mechanism.

Persistent notifications
Edit online
Persistent notifications enable reliable and asynchronous delivery of notifications from the Ceph Object Gateway to the endpoint
configured at the topic. Regular notifications are also reliable because the delivery to the endpoint is performed synchronously
during the request. With persistent notifications, the Ceph Object Gateway retries sending notifications even when the endpoint is
down or there are network issues during the operations, that is notifications are retried if not successfully delivered to the endpoint.
Notifications are sent only after all other actions related to the notified operation are successful. If an endpoint goes down for a
longer duration, the notification queue fills up and the S3 operations that have configured notifications for these endpoints will fail.

NOTE: With kafka-ack-level=none, there is no indication of message failures, so messages sent while the broker is down are not retried when the broker comes back up. After the broker is up again, only new notifications are delivered.

Creating a topic
Edit online
You can create topics before creating bucket notifications. A topic is a Simple Notification Service (SNS) entity and all the topic
operations, that is, create, delete, list, and get, are SNS operations. The topic needs to have endpoint parameters that are
used when a bucket notification is created. Once the request is successful, the response includes the topic Amazon Resource Name
(ARN) that can be used later to reference this topic in the bucket notification request.

NOTE: A topic_arn provides the bucket notification configuration and is generated after a topic is created.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access.

Installation of the Ceph Object Gateway.

User access key and secret key.

Endpoint parameters.

1. Create a topic with the following request format:

Syntax

POST
Action=CreateTopic
&Name=TOPIC_NAME
[&Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=EXCHANGE]
[&Attributes.entry.2.key=amqp-ack-level&Attributes.entry.2.value=none|broker|routable]
[&Attributes.entry.3.key=verify-ssl&Attributes.entry.3.value=true|false]
[&Attributes.entry.4.key=kafka-ack-level&Attributes.entry.4.value=none|broker]
[&Attributes.entry.5.key=use-ssl&Attributes.entry.5.value=true|false]
[&Attributes.entry.6.key=ca-location&Attributes.entry.6.value=FILE_PATH]
[&Attributes.entry.7.key=OpaqueData&Attributes.entry.7.value=OPAQUE_DATA]
[&Attributes.entry.8.key=push-endpoint&Attributes.entry.8.value=ENDPOINT]
[&Attributes.entry.9.key=persistent&Attributes.entry.9.value=true|false]

Here are the request parameters:



Endpoint: URL of an endpoint to send notifications to.

OpaqueData: opaque data is set in the topic configuration and added to all notifications triggered by the topic.

persistent: indication of whether notifications to this endpoint are persistent that is asynchronous or not. By default
the value is false.

HTTP endpoint:

URL: https://FQDN:PORT

port defaults to: 80 for HTTP and 443 for HTTPS.

verify-ssl: Indicates whether the server certificate is validated by the client or not. By default, it is true.

AMQP0.9.1 endpoint:

URL: amqp://USER:PASSWORD@FQDN:PORT.

User and password defaults to: guest and guest respectively.

User and password details should be provided over HTTPS, otherwise the topic creation request is rejected.

port defaults to: 5672.

vhost defaults to: “/”

amqp-exchange: The exchanges must exist and be able to route messages based on topics. This is a mandatory
parameter for AMQP0.9.1. Different topics pointing to the same endpoint must use the same exchange.

amqp-ack-level: No end to end acknowledgment is required, as messages may persist in the broker before
being delivered into their final destination. Three acknowledgment methods exist:

none: Message is considered delivered if sent to the broker.

broker: By default, the message is considered delivered if acknowledged by the broker.

routable: Message is considered delivered if the broker can route to a consumer.

NOTE: The key and value of a specific parameter do not have to reside in the same line, or in any specific
order, but must use the same index. Attribute indexing does not need to be sequential or start from any
specific value.

NOTE: The topic-name is used for the AMQP topic.

Kafka endpoint:

URL: kafka://USER:PASSWORD@FQDN:PORT.

use-ssl is set to false by default. If use-ssl is set to true, secure connection is used for connecting with
the broker.

If ca-location is provided, and secure connection is used, the specified CA will be used, instead of the default
one, to authenticate the broker.

User and password can only be provided over HTTPS. Otherwise, the topic creation request is rejected.

User and password may only be provided together with use-ssl, otherwise, the connection to the broker will
fail.

port defaults to: 9092.

kafka-ack-level: no end to end acknowledgment required, as messages may persist in the broker before
being delivered into their final destination. Two acknowledgment methods exist:

none: message is considered delivered if sent to the broker.

broker: By default, the message is considered delivered if acknowledged by the broker.

The following is an example of the response format:



Example

<CreateTopicResponse xmlns="https://sns.amazonaws.com/doc/2010-03-31/">
<CreateTopicResult>
<TopicArn></TopicArn>
</CreateTopicResult>
<ResponseMetadata>
<RequestId></RequestId>
</ResponseMetadata>
</CreateTopicResponse>

NOTE: The topic Amazon Resource Name (ARN) in the response will have the following format:
arn:aws:sns:ZONE_GROUP:TENANT:TOPIC

The following is an example of AMQP0.9.1 endpoint:

Example

client.create_topic(Name='my-topic', Attributes={'push-endpoint': 'amqp://127.0.0.1:5672', 'amqp-exchange': 'ex1', 'amqp-ack-level': 'broker'})
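
The example above assumes an already constructed SNS-compatible client object. A hedged sketch of how such a client could be built with the boto3 SDK follows; the endpoint URL, credentials, region, and Kafka broker address are assumptions to adapt to your environment.

import boto3

client = boto3.client(
    'sns',
    endpoint_url='http://FQDN_OF_GATEWAY_NODE:8080',
    aws_access_key_id='MY_ACCESS_KEY',
    aws_secret_access_key='MY_SECRET_KEY',
    region_name='us-east-1')

# Create a persistent topic that pushes notifications to a Kafka broker.
response = client.create_topic(
    Name='my-kafka-topic',
    Attributes={'push-endpoint': 'kafka://MY_BROKER:9092',
                'kafka-ack-level': 'broker',
                'persistent': 'true'})

# The returned ARN is referenced later when the bucket notification is created.
print(response['TopicArn'])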

Getting topic information


Edit online
Returns information about a specific topic. This can include endpoint information if it is provided.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access.

Installation of the Ceph Object Gateway.

User access key and secret key.

Endpoint parameters.

Procedure
Edit online

1. Get topic information with the following request format:

Syntax

POST
Action=GetTopic
&TopicArn=TOPIC_ARN

Here is an example of the response format:

<GetTopicResponse>
<GetTopicResult>
<Topic>
<User></User>
<Name></Name>
<EndPoint>
<EndpointAddress></EndpointAddress>
<EndpointArgs></EndpointArgs>
<EndpointTopic></EndpointTopic>
<HasStoredSecret></HasStoredSecret>
<Persistent></Persistent>
</EndPoint>
<TopicArn></TopicArn>
<OpaqueData></OpaqueData>
</Topic>
</GetTopicResult>
<ResponseMetadata>
<RequestId></RequestId>
</ResponseMetadata>
</GetTopicResponse>

The following are the tags and definitions:

User: Name of the user that created the topic.

Name: Name of the topic.

JSON formatted endpoints include:

EndpointAddress: The endpoint URL. If the endpoint URL contains user and password information, the request must be made over HTTPS. Otherwise, the topic get request is rejected.

EndPointArgs: The endpoint arguments.

EndpointTopic: The topic name that is sent to the endpoint, which can be different from the topic name above.

HasStoredSecret: true when the endpoint URL contains user and password information.

Persistent: true when the topic is persistent.

TopicArn: Topic ARN.

OpaqueData: This is an opaque data set on the topic.

Listing topics
Edit online
List the topics that the user has defined.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access.

Installation of the Ceph Object Gateway.

User access key and secret key.

Endpoint parameters.

Procedure
Edit online

1. List topic information with the following request format:

Syntax

POST
Action=ListTopics

Here is an example of the response format:

<ListTopicsResponse xmlns="https://sns.amazonaws.com/doc/2020-03-31/">
<ListTopicsResult>
<Topics>
<member>
<User></User>
<Name></Name>
<EndPoint>
<EndpointAddress></EndpointAddress>
<EndpointArgs></EndpointArgs>
<EndpointTopic></EndpointTopic>
</EndPoint>
<TopicArn></TopicArn>
<OpaqueData></OpaqueData>
</member>
</Topics>
</ListTopicsResult>
<ResponseMetadata>
<RequestId></RequestId>
</ResponseMetadata>
</ListTopicsResponse>

NOTE: If the endpoint URL of any of the topics contains user and password information, the request must be made over HTTPS. Otherwise, the topic list request is rejected.
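
As a complement to the raw request above, the following is a minimal boto3 sketch; it reuses the assumed SNS client construction from the topic-creation example and prints only the standard SNS fields of each topic.

import boto3

client = boto3.client(
    'sns',
    endpoint_url='http://FQDN_OF_GATEWAY_NODE:8080',
    aws_access_key_id='MY_ACCESS_KEY',
    aws_secret_access_key='MY_SECRET_KEY',
    region_name='us-east-1')

# ListTopics returns the topics defined by the requesting user.
for topic in client.list_topics()['Topics']:
    print(topic['TopicArn'])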

Deleting topics
Edit online
Delete a specified topic. Removing a topic that has already been deleted is a no-op and is not treated as a failure.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

Root-level access.

Installation of the Ceph Object Gateway.

User access key and secret key.

Endpoint parameters.

Procedure
Edit online

Delete a topic with the following request format:

Syntax

POST
Action=DeleteTopic
&TopicArn=TOPIC_ARN

Here is an example of the response format:

<DeleteTopicResponse xmlns="https://sns.amazonaws.com/doc/2020-03-31/">
<ResponseMetadata>
<RequestId></RequestId>
</ResponseMetadata>
</DeleteTopicResponse>
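
The same assumed boto3 client can issue the DeleteTopic action; the topic ARN below is a placeholder that follows the ARN format described earlier.

import boto3

client = boto3.client(
    'sns',
    endpoint_url='http://FQDN_OF_GATEWAY_NODE:8080',
    aws_access_key_id='MY_ACCESS_KEY',
    aws_secret_access_key='MY_SECRET_KEY',
    region_name='us-east-1')

# Deleting a topic that is already gone is a no-op, as noted above.
client.delete_topic(TopicArn='arn:aws:sns:default::my-kafka-topic')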

Using the command-line interface for topic management


Edit online



You can list, get, and remove topics using the command-line interface.

Prerequisites
Edit online

Root-level access to the Ceph Object Gateway node.

1. To get a list of all topics of a user:

Syntax

radosgw-admin topic list --uid=USER_ID

Example

[root@rgw ~]# radosgw-admin topic list --uid=example

2. To get configuration of a specific topic:

Syntax

radosgw-admin topic get --uid=USER_ID --topic=TOPIC_NAME

Example

[root@rgw ~]# radosgw-admin topic get --uid=example --topic=example-topic

3. To remove a specific topic:

Syntax

radosgw-admin topic rm --uid=USER_ID --topic=TOPIC_NAME

Example

[root@rgw ~]# radosgw-admin topic rm --uid=example --topic=example-topic

Event record
Edit online
An event holds information about the operation done by the Ceph Object Gateway and is sent as a payload over the chosen endpoint,
such as HTTP, HTTPS, Kafka, or AMQP0.9.1. The event record is in JSON format.

Example

{"Records":[
{
"eventVersion":"2.1",
"eventSource":"ceph:s3",
"awsRegion":"us-east-1",
"eventTime":"2019-11-22T13:47:35.124724Z",
"eventName":"ObjectCreated:Put",
"userIdentity":{
"principalId":"tester"
},
"requestParameters":{
"sourceIPAddress":""
},
"responseElements":{
"x-amz-request-id":"503a4c37-85eb-47cd-8681-2817e80b4281.5330.903595",
"x-amz-id-2":"14d2-zone1-zonegroup1"
},
"s3":{
"s3SchemaVersion":"1.0",
"configurationId":"mynotif1",
"bucket":{
"name":"mybucket1",
"ownerIdentity":{

IBM Storage Ceph 945


"principalId":"tester"
},
"arn":"arn:aws:s3:us-east-1::mybucket1",
"id":"503a4c37-85eb-47cd-8681-2817e80b4281.5332.38"
},
"object":{
"key":"myimage1.jpg",
"size":"1024",
"eTag":"37b51d194a7513e45b56f6524f2d51f2",
"versionId":"",
"sequencer": "F7E6D75DC742D108",
"metadata":[],
"tags":[]
}
},
"eventId":"",
"opaqueData":"[email protected]"
}
]}

These are the event record keys and their definitions:

awsRegion
Zonegroup.

eventTime
Timestamp that indicates when the event was triggered.

eventName
The type of the event.

userIdentity.principalId
The identity of the user that triggered the event.

requestParameters.sourceIPAddress
The IP address of the client that triggered the event. This field is not supported.

responseElements.x-amz-request-id
The request ID that triggered the event.

responseElements.x_amzID2
The identity of the Ceph Object Gateway on which the event was triggered. The identity format is RGWID-ZONE-ZONEGROUP.

s3.configurationId
The notification ID that created the event.

s3.bucket.name
The name of the bucket.

s3.bucket.ownerIdentity.principalId
The owner of the bucket.

s3.bucket.arn
Amazon Resource Name (ARN) of the bucket.

s3.bucket.id
Identity of the bucket.

s3.object.key
The object key.

s3.object.size
The size of the object.

s3.object.eTag
The object etag.

s3.object.version
The object version in a versioned bucket.

s3.object.sequencer
Monotonically increasing identifier of the change per object in the hexadecimal format.

s3.object.metadata
Any metadata set on the object sent as x-amz-meta.

s3.object.tags
Any tags set on the object.

s3.eventId
Unique identity of the event.

s3.opaqueData
Opaque data is set in the topic configuration and added to all notifications triggered by the topic.
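
A receiving HTTP endpoint typically parses the JSON payload and picks out a few of the keys defined above. The following is a small illustrative Python sketch; the field access follows the example record in this section.

import json

def handle_notification(body):
    """Print the essentials of each record in a bucket notification payload."""
    for record in json.loads(body)['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        print(record['eventTime'], record['eventName'], bucket + '/' + key)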

Reference
Edit online

See the Event Message Structure for more information.

Supported event types


Edit online
The following event types are supported:

s3:ObjectCreated:

s3:ObjectCreated:Put

s3:ObjectCreated:Post

s3:ObjectCreated:Copy

s3:ObjectCreated:CompleteMultipartUpload

s3:ObjectRemoved:*

s3:ObjectRemoved:Delete

s3:ObjectRemoved:DeleteMarkerCreated

Get bucket information


Edit online
Get information about a subset of the existing buckets. If uid is specified without bucket then all buckets belonging to the user will
be returned. If bucket alone is specified, information for that particular bucket will be retrieved.

Capabilities

`buckets=read`

Syntax

GET /admin/bucket?format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket to return info on.

Type
String

Example



foo_bucket

Required
No

uid

Description
The user to retrieve bucket information for.

Type
String

Example
foo_user

Required
No

stats

Description
Return bucket statistics.

Type
Boolean

Example
True

Required
No

Response Entities

stats

Description
Per bucket information.

Type
Container

Parent
N/A

buckets

Description
Contains a list of one or more bucket containers.

Type
Container

Parent
N/A

bucket

Description
Container for single bucket information.

Type
Container

Parent
buckets

name

Description
The name of the bucket.

Type
String

Parent
bucket

pool

Description
The pool the bucket is stored in.

Type
String

Parent
bucket

id

Description
The unique bucket ID.

Type
String

Parent
bucket

marker

Description
Internal bucket tag.

Type
String

Parent
bucket

owner

Description
The user ID of the bucket owner.

Type
String

Parent
bucket

usage

Description
Storage usage information.

Type
Container

Parent
bucket

index

Description
Status of bucket index.

Type
String

Parent
bucket

If successful, then the request returns a bucket’s container with the bucket information.

Special Error Responses

IndexRepairFailed

Description
Bucket index repair failed.

Code
409 Conflict
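
The following Python sketch issues this request with the assumed requests and requests-aws4auth packages; the host, credentials, signing region, and bucket name are placeholders, and the caller needs the buckets=read capability.

import requests
from requests_aws4auth import AWS4Auth

auth = AWS4Auth('ADMIN_ACCESS_KEY', 'ADMIN_SECRET_KEY', 'us-east-1', 's3')

# GET /admin/bucket?format=json with stats=True includes the usage container.
response = requests.get(
    'http://FULLY_QUALIFIED_DOMAIN_NAME:8080/admin/bucket',
    params={'format': 'json', 'bucket': 'foo_bucket', 'stats': 'True'},
    auth=auth)

print(response.status_code)
print(response.json())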

Check a bucket index


Edit online
Check the index of an existing bucket.

NOTE: To check multipart object accounting with check-objects, fix must be set to True.

Capabilities

`buckets=write`

Syntax

GET /admin/bucket?index&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket to return info on.

Type
String

Example
foo_bucket

Required
Yes

check-objects

Description
Check multipart object accounting.

Type
Boolean

Example
True

Required
No

fix

Description
Also fix the bucket index when checking.

Type
Boolean



Example
False

Required
No

Response Entities

index

Description
Status of bucket index.

Type
String

Special Error Responses

IndexRepairFailed

Description
Bucket index repair failed.

Code
409 Conflict

Remove a bucket
Edit online
Removes an existing bucket.

Capabilities

`buckets=write`

Syntax

DELETE /admin/bucket?format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket to remove.

Type
String

Example
foo_bucket

Required
Yes

purge-objects

Description
Remove a bucket’s objects before deletion.

Type
Boolean

Example
True

Required
No

Response Entities

None.

Special Error Responses

BucketNotEmpty

Description
Attempted to delete non-empty bucket.

Code
409 Conflict

ObjectRemovalFailed

Description
Unable to remove objects.

Code
409 Conflict

Link a bucket
Edit online
Link a bucket to a specified user, unlinking the bucket from any previous user.

Capabilities

`buckets=write`

Syntax

PUT /admin/bucket?format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket to link to the specified user.

Type
String

Example
foo_bucket

Required
Yes

uid

Description
The user ID to link the bucket to.

Type
String

Example
foo_user

Required
Yes



Response Entities

bucket

Description
Container for single bucket information.

Type
Container

Parent
N/A

name

Description
The name of the bucket.

Type
String

Parent
bucket

pool

Description
The pool the bucket is stored in.

Type
String

Parent
bucket

id

Description
The unique bucket ID.

Type
String

Parent
bucket

marker

Description
Internal bucket tag.

Type
String

Parent
bucket

owner

Description
The user ID of the bucket owner.

Type
String

Parent
bucket

usage

Description
Storage usage information.



Type
Container

Parent
bucket

index

Description
Status of bucket index.

Type
String

Parent
bucket

Special Error Responses

BucketUnlinkFailed

Description
Unable to unlink bucket from specified user.

Code
409 Conflict

BucketLinkFailed

Description
Unable to link bucket to specified user.

Code
409 Conflict

Unlink a bucket
Edit online
Unlink a bucket from a specified user. Primarily useful for changing bucket ownership.

Capabilities

`buckets=write`

Syntax

POST /admin/bucket?format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket to unlink.

Type
String

Example
foo_bucket

Required
Yes

uid

Description
The user ID to unlink the bucket from.



Type
String

Example
foo_user

Required
Yes

Response Entities

None.

Special Error Responses

BucketUnlinkFailed

Description
Unable to unlink bucket from specified user.

Code
409 Conflict

Get a bucket or object policy


Edit online
Read the policy of an object or bucket.

Capabilities

`buckets=read`

Syntax

GET /admin/bucket?policy&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket to read the policy from.

Type
String

Example
foo_bucket

Required
Yes

object

Description
The object to read the policy from.

Type
String

Example
foo.txt

Required
No

Response Entities

policy

Description
Access control policy.

Type
Container

Parent
N/A

If successful, returns the object or bucket policy.

Special Error Responses

IncompleteBody

Description
Either bucket was not specified for a bucket policy request or bucket and object were not specified for an object policy
request.

Code
400 Bad Request

Remove an object
Edit online
Remove an existing object.

NOTE:

Does not require owner to be non-suspended.

Capabilities

`buckets=write`

Syntax

DELETE /admin/bucket?object&format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket containing the object to be removed.

Type
String

Example
foo_bucket

Required
Yes

object

Description
The object to remove

Type
String

Example
foo.txt

Required
Yes



Response Entities

None.

Special Error Responses

NoSuchObject

Description
Specified object does not exist.

Code
404 Not Found

ObjectRemovalFailed

Description
Unable to remove objects.

Code
409 Conflict

Quotas
Edit online
The administrative Operations API enables you to set quotas on users and on buckets owned by users. Quotas include the maximum
number of objects in a bucket and the maximum storage size in megabytes.

To view quotas, the user must have a users=read capability. To set, modify or disable a quota, the user must have users=write
capability.

Valid parameters for quotas include:

Bucket
The bucket option allows you to specify a quota for buckets owned by a user.

Maximum Objects
The max-objects setting allows you to specify the maximum number of objects. A negative value disables this setting.

Maximum Size
The max-size option allows you to specify a quota for the maximum number of bytes. A negative value disables this setting.

Quota Scope
The quota-scope option sets the scope for the quota. The options are bucket and user.

Get a user quota


Edit online
To get a quota, the user must have users capability set with read permission.

Syntax

GET /admin/user?quota&uid=UID&quota-type=user

Set a user quota


Edit online
To set a quota, the user must have users capability set with write permission.

Syntax



PUT /admin/user?quota&uid=UID&quota-type=user

The content must include a JSON representation of the quota settings as encoded in the corresponding read operation.
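
The following hedged Python sketch shows the read-modify-write flow implied above, using the assumed requests and requests-aws4auth packages; the host, admin credentials, signing region, and the quota values written back are placeholders, and the JSON field names are assumed to mirror the output of the corresponding GET request.

import json

import requests
from requests_aws4auth import AWS4Auth

auth = AWS4Auth('ADMIN_ACCESS_KEY', 'ADMIN_SECRET_KEY', 'us-east-1', 's3')
url = 'http://FULLY_QUALIFIED_DOMAIN_NAME:8080/admin/user'
params = {'quota': '', 'uid': 'foo_user', 'quota-type': 'user', 'format': 'json'}

# Read the current user quota settings as JSON ...
quota = requests.get(url, params=params, auth=auth).json()

# ... adjust them, then PUT the same JSON representation back.
quota['enabled'] = True          # field names assumed to match the GET output
quota['max_objects'] = 10000
requests.put(url, params=params, auth=auth, data=json.dumps(quota))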

Get a bucket quota


Edit online
Get information about a subset of the existing buckets. If uid is specified without bucket then all buckets belonging to the user will
be returned. If bucket alone is specified, information for that particular bucket will be retrieved.

Capabilities

`buckets=read`

Syntax

GET /admin/bucket?format=json HTTP/1.1


Host FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

bucket

Description
The bucket to return info on.

Type
String

Example
foo_bucket

Required
No

uid

Description
The user to retrieve bucket information for.

Type
String

Example
foo_user

Required
No

stats

Description
Return bucket statistics.

Type
Boolean

Example
True

Required
No

Response Entities

stats

Description
Per bucket information.

Type
Container

Parent
N/A

buckets

Description
Contains a list of one or more bucket containers.

Type
Container

Parent
N/A

bucket

Description
Container for single bucket information.

Type
Container

Parent
buckets

name

Description
The name of the bucket.

Type
String

Parent
bucket

pool

Description
The pool the bucket is stored in.

Type
String

Parent
bucket

id

Description
The unique bucket ID.

Type
String

Parent
bucket

marker

Description
Internal bucket tag.

Type
String



Parent
bucket

owner

Description
The user ID of the bucket owner.

Type
String

Parent
bucket

usage

Description
Storage usage information.

Type
Container

Parent
bucket

index

Description
Status of bucket index.

Type
String

Parent
bucket

If successful, then the request returns a bucket’s container with the bucket information.

Special Error Responses

IndexRepairFailed

Description
Bucket index repair failed.

Code
409 Conflict

Set a bucket quota


Edit online
To set a quota, the user must have users capability set with write permission.

Syntax

PUT /admin/user?quota&uid=UID&quota-type=bucket

The content must include a JSON representation of the quota settings as encoded in the corresponding read operation.

Get usage information


Edit online
Requesting bandwidth usage information.



Capabilities

`usage=read`

Syntax

GET /admin/usage?format=json HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user for which the information is requested.

Type
String

Required
Yes

start

Description
The date, and optionally, the time of when the data request started. For example, 2012-09-25 16:00:00.

Type
String

Required
No

end

Description
The date, and optionally, the time of when the data request ended. For example, 2012-09-25 16:00:00.

Type
String

Required
No

show-entries

Description
Specifies whether data entries should be returned.

Type
Boolean

Required
No

show-summary

Description
Specifies whether data summary should be returned.

Type
Boolean

Required
No

Response Entities

usage

Description
A container for the usage information.

Type
Container

entries

Description
A container for the usage entries information.

Type
Container

user

Description
A container for the user data information.

Type
Container

owner

Description
The name of the user that owns the buckets.

Type
String

bucket

Description
The bucket name.

Type
String

time

Description
The lower bound of the time period for which data is specified, rounded down to the beginning of the first relevant hour.

Type
String

epoch

Description
The time specified in seconds since 1/1/1970.

Type
String

categories

Description
A container for stats categories.

Type
Container

entry

Description
A container for stats entry.

Type
Container

category



Description
Name of request category for which the stats are provided.

Type
String

bytes_sent

Description
Number of bytes sent by the Ceph Object Gateway.

Type
Integer

bytes_received

Description
Number of bytes received by the Ceph Object Gateway.

Type
Integer

ops

Description
Number of operations.

Type
Integer

successful_ops

Description
Number of successful operations.

Type
Integer

summary

Description
A container for the statistics summary.

Type
Container

total

Description
A container for stats summary aggregated total.

Type
Container

If successful, the response contains the requested information.

Remove usage information


Edit online
Remove usage information. With no dates specified, removes all usage information.

Capabilities

`usage=write`

Syntax



DELETE /admin/usage?format=json HTTP/1.1
Host: FULLY_QUALIFIED_DOMAIN_NAME

Request Parameters

uid

Description
The user for which the information is requested.

Type
String

Example
foo_user

Required
Yes

start

Description
The date, and optionally, the time of when the data request started.

Type
String

Example
2012-09-25 16:00:00

Required
No

end

Description
The date, and optionally, the time of when the data request ended.

Type
String

Example
2012-09-25 16:00:00

Required
No

remove-all

Description
Required when uid is not specified, in order to acknowledge multi-user data removal.

Type
Boolean

Example
True

Required
No

Standard error responses


Edit online
The following list details standard error responses and their descriptions.

AccessDenied



Description
Access denied.

Code
403 Forbidden

InternalError

Description
Internal server error.

Code
500 Internal Server Error

NoSuchUser

Description
User does not exist.

Code
404 Not Found

NoSuchBucket

Description
Bucket does not exist.

Code
404 Not Found

NoSuchKey

Description
No such access key.

Code
404 Not Found

Ceph Object Gateway and the S3 API


Edit online
As a developer, you can use a RESTful application programming interface (API) that is compatible with the Amazon S3 data access
model. You can manage the buckets and objects stored in an IBM Storage Ceph cluster through the Ceph Object Gateway.

Prerequisites
S3 limitations
Accessing the Ceph Object Gateway with the S3 API
S3 bucket operations
S3 object operations
S3 select operations (Technology Preview)

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.

S3 limitations
Edit online
IMPORTANT: Treat the following limitations with caution. They have implications for your hardware selections, so always discuss these requirements with your IBM account team.

Maximum object size when using Amazon S3: Individual Amazon S3 objects can range in size from a minimum of 0B to a
maximum of 5TB. The largest object that can be uploaded in a single PUT is 5GB. For objects larger than 100MB, you should
consider using the Multipart Upload capability.

Maximum metadata size when using Amazon S3: There is no defined limit on the total size of user metadata that can be
applied to an object, but a single HTTP request is limited to 16,000 bytes.

The amount of data overhead the IBM Storage Ceph cluster produces to store S3 objects and metadata: The estimate here is 200-300 bytes plus the length of the object name. Versioned objects consume additional space proportional to the number of versions. Also, transient overhead is produced during multi-part upload and other transactional updates, but these overheads are recovered during garbage collection.

Reference
Edit online

See unsupported header fields for more details.

Accessing the Ceph Object Gateway with the S3 API


Edit online
As a developer, you must configure access to the Ceph Object Gateway and the Secure Token Service (STS) before you can start
using the Amazon S3 API.

Prerequisites
S3 authentication
S3 server-side encryption
S3 access control lists
Preparing access to the Ceph Object Gateway using S3
Accessing the Ceph Object Gateway using Ruby AWS S3
Accessing the Ceph Object Gateway using Ruby AWS SDK
Accessing the Ceph Object Gateway using PHP
Secure Token Service

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A running Ceph Object Gateway.

A RESTful client.

S3 authentication
Edit online
Requests to the Ceph Object Gateway can be either authenticated or unauthenticated. Ceph Object Gateway assumes
unauthenticated requests are sent by an anonymous user. Ceph Object Gateway supports canned ACLs.

For most use cases, clients use existing open source libraries like the Amazon SDK’s AmazonS3Client for Java, and Python Boto.
With open source libraries you simply pass in the access key and secret key and the library builds the request header and
authentication signature for you. However, you can create requests and sign them too.



Authenticating a request requires including an access key and a base 64-encoded hash-based Message Authentication Code (HMAC)
in the request before it is sent to the Ceph Object Gateway server. Ceph Object Gateway uses an S3-compatible authentication
approach.

Example

PUT /buckets/bucket/object.mpeg HTTP/1.1
Host: cname.domain.com
Date: Mon, 2 Jan 2012 00:01:01 +0000
Content-Encoding: mpeg
Content-Length: 9999999
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

In the above example, replace ACCESS_KEY with the value for the access key ID followed by a colon (:). Replace
HASH_OF_HEADER_AND_SECRET with a hash of a canonicalized header string and the secret corresponding to the access key ID.

Generate hash of header string and secret

To generate the hash of the header string and secret:

1. Get the value of the header string.

2. Normalize the request header string into canonical form.

3. Generate an HMAC using a SHA-1 hashing algorithm.

4. Encode the hmac result as base-64.

Normalize header

To normalize the header into canonical form:

1. Get all content- headers.

2. Remove all content- headers except for content-type and content-md5.

3. Ensure the content- header names are lowercase.

4. Sort the content- headers lexicographically.

5. Ensure you have a Date header AND ensure the specified date uses GMT and not an offset.

6. Get all headers beginning with x-amz-.

7. Ensure that the x-amz- headers are all lowercase.

8. Sort the x-amz- headers lexicographically.

9. Combine multiple instances of the same field name into a single field and separate the field values with a comma.

10. Replace white space and line breaks in header values with a single space.

11. Remove white space before and after colons.

12. Append a new line after each header.

13. Merge the headers back into the request header.

Replace the HASH_OF_HEADER_AND_SECRET with the base-64 encoded HMAC string.
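
The following Python sketch illustrates the hashing steps above (an HMAC-SHA1 over the canonicalized string, base-64 encoded). The string to sign shown here is an assumed example built from the normalization rules; real requests must include the exact headers being sent.

import base64
import hashlib
import hmac

def sign(secret_key, string_to_sign):
    """Return the value placed after the colon in the Authorization header."""
    digest = hmac.new(secret_key.encode('utf-8'),
                      string_to_sign.encode('utf-8'),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode('utf-8')

string_to_sign = ('PUT\n'                              # HTTP verb
                  '\n'                                 # content-md5 (empty)
                  '\n'                                 # content-type (empty)
                  'Mon, 2 Jan 2012 00:01:01 +0000\n'   # Date header
                  '/bucket/object.mpeg')               # canonicalized resource
print('Authorization: AWS ACCESS_KEY:' + sign('SECRET_KEY', string_to_sign))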

Reference
Edit online

For additional details, consult the Signing and Authenticating REST Requests section of Amazon Simple Storage Service
documentation.



S3 server-side encryption
Edit online
The Ceph Object Gateway supports server-side encryption of uploaded objects for the S3 application programming interface (API).
Server-side encryption means that the S3 client sends data over HTTP in its unencrypted form, and the Ceph Object Gateway stores
that data in the IBM Storage Ceph cluster in encrypted form.

NOTE: IBM does NOT support S3 object encryption of Static Large Object (SLO) or Dynamic Large Object (DLO).

IMPORTANT: To use encryption, client requests MUST send requests over an SSL connection. IBM does not support S3 encryption
from a client unless the Ceph Object Gateway uses SSL. However, for testing purposes, administrators can disable SSL during testing
by setting the rgw_crypt_require_ssl configuration setting to false at runtime, using the ceph config set client.rgw
command, and then restarting the Ceph Object Gateway instance.

In a production environment, it might not be possible to send encrypted requests over SSL. In such a case, send requests using
HTTP with server-side encryption.

There are two options for the management of encryption keys:

Customer-provided Keys
When using customer-provided keys, the S3 client passes an encryption key along with each request to read or write
encrypted data. It is the customer’s responsibility to manage those keys. Customers must remember which key the Ceph
Object Gateway used to encrypt each object.

Ceph Object Gateway implements the customer-provided key behavior in the S3 API according to the Amazon SSE-C specification.

Since the customer handles the key management and the S3 client passes keys to the Ceph Object Gateway, the Ceph Object
Gateway requires no special configuration to support this encryption mode.
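
A hedged boto3 sketch of a customer-provided key (SSE-C) upload and download follows; the endpoint, credentials, bucket, and the randomly generated 256-bit key are assumptions, and, as noted above, the client is responsible for remembering which key encrypted which object.

import os

import boto3

s3 = boto3.client(
    's3',
    endpoint_url='https://FQDN_OF_GATEWAY_NODE:443',
    aws_access_key_id='MY_ACCESS_KEY',
    aws_secret_access_key='MY_SECRET_KEY')

key = os.urandom(32)    # 256-bit customer-provided key; store it safely

s3.put_object(Bucket='my-bucket', Key='secret.txt',
              Body=b'stored encrypted by the gateway',
              SSECustomerAlgorithm='AES256',
              SSECustomerKey=key)   # the SDK base64-encodes the key and adds its MD5

# The same algorithm and key must be supplied again to read the object back.
obj = s3.get_object(Bucket='my-bucket', Key='secret.txt',
                    SSECustomerAlgorithm='AES256',
                    SSECustomerKey=key)
print(obj['Body'].read())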

Key Management Service


When using a key management service, the secure key management service stores the keys and the Ceph Object Gateway
retrieves them on demand to serve requests to encrypt or decrypt data.

Ceph Object Gateway implements the key management service behavior in the S3 API according to the Amazon SSE-KMS
specification.

IMPORTANT: Currently, the only tested key management implementations are HashiCorp Vault, and OpenStack Barbican. However,
OpenStack Barbican is a Technology Preview and is not supported for use in production systems.

Reference
Edit online

Amazon SSE-C

Amazon SSE-KMS

Configuring server-side encryption

The HashiCorp Vault

S3 access control lists


Edit online
Ceph Object Gateway supports S3-compatible Access Control Lists (ACL) functionality. An ACL is a list of access grants that specify
which operations a user can perform on a bucket or on an object. Each grant has a different meaning when applied to a bucket versus
applied to an object:

Table 1. User Operations


Permission     Bucket                                                   Object
READ           Grantee can list the objects in the bucket.              Grantee can read the object.
WRITE          Grantee can write or delete objects in the bucket.       N/A
READ_ACP       Grantee can read bucket ACL.                             Grantee can read the object ACL.
WRITE_ACP      Grantee can write bucket ACL.                            Grantee can write to the object ACL.
FULL_CONTROL   Grantee has full permissions for object in the bucket.   Grantee can read or write to the object ACL.

Preparing access to the Ceph Object Gateway using S3


Edit online
You have to follow some pre-requisites on the Ceph Object Gateway node before attempting to access the gateway server.

Prerequisites
Edit online

Installation of the Ceph Object Gateway software.

Root-level access to the Ceph Object Gateway node.

Procedure
Edit online

1. As root, open port 8080 on the firewall:

[root@rgw ~]# firewall-cmd --zone=public --add-port=8080/tcp --permanent


[root@rgw ~]# firewall-cmd --reload

2. Add a wildcard to the DNS server that you are using for the gateway as mentioned in the Add a wildcard to the DNS section.

You can also set up the gateway node for local DNS caching. To do so, execute the following steps:

a. As root, install and setup dnsmasq:

[root@rgw ]# yum install dnsmasq


[root@rgw ]# echo "address=/.FQDN_OF_GATEWAY_NODE/IP_OF_GATEWAY_NODE" | tee --append
/etc/dnsmasq.conf
[root@rgw ]# systemctl start dnsmasq
[root@rgw ]# systemctl enable dnsmasq

Replace IP_OF_GATEWAY_NODE and FQDN_OF_GATEWAY_NODE with the IP address and FQDN of the gateway node.

b. As root, stop NetworkManager:

[root@rgw ~]# systemctl stop NetworkManager


[root@rgw ~]# systemctl disable NetworkManager

c. As root, set the gateway server’s IP as the nameserver:

[root@rgw ]# echo "DNS1=IP_OF_GATEWAY_NODE" | tee --append /etc/sysconfig/network-


scripts/ifcfg-eth0
[root@rgw ]# echo "IP_OF_GATEWAY_NODE FQDN_OF_GATEWAY_NODE" | tee --append /etc/hosts
[root@rgw ]# systemctl restart network
[root@rgw ]# systemctl enable network
[root@rgw ~]# systemctl restart dnsmasq

Replace IP_OF_GATEWAY_NODE and FQDN_OF_GATEWAY_NODE with the IP address and FQDN of the gateway node.

d. Verify subdomain requests:

[user@rgw ~]$ ping mybucket.FQDN_OF_GATEWAY_NODE

Replace FQDN_OF_GATEWAY_NODE with the FQDN of the gateway node.



WARNING: Setting up the gateway server for local DNS caching is for testing purposes only. You won’t be able to
access the outside network after doing this. It is strongly recommended to use a proper DNS server for the IBM Storage
cluster and gateway node.

3. Create the radosgw user for S3 access carefully and copy the generated access_key and secret_key. You will need these keys for S3 access and subsequent bucket management tasks. For more details, see Create an S3 user.

Accessing the Ceph Object Gateway using Ruby AWS S3


Edit online
You can use Ruby programming language along with aws-s3 gem for S3 access. Execute the steps mentioned below on the node
used for accessing the Ceph Object Gateway server with Ruby AWS::S3.

Prerequisites
Edit online

User-level access to Ceph Object Gateway.

Root-level access to the node accessing the Ceph Object Gateway.

Internet access.

Procedure
Edit online

1. Install the ruby package:

[root@dev ~]# yum install ruby

NOTE: The above command will install ruby and its essential dependencies like rubygems and ruby-libs. If somehow the
command does not install all the dependencies, install them separately.

2. Install the aws-s3 Ruby package:

[root@dev ~]# gem install aws-s3

3. Create a project directory:

[user@dev ~]$ mkdir ruby_aws_s3


[user@dev ~]$ cd ruby_aws_s3

4. Create the connection file:

[user@dev ~]$ vim conn.rb

5. Paste the following contents into the conn.rb file:

Syntax

#!/usr/bin/env ruby

require 'aws/s3'
require 'resolv-replace'

AWS::S3::Base.establish_connection!(
:server => FQDN_OF_GATEWAY_NODE,
:port => 8080,
:access_key_id => MY_ACCESS_KEY,
:secret_access_key => MY_SECRET_KEY
)

Replace FQDN_OF_GATEWAY_NODE with the FQDN of the Ceph Object Gateway node.

Example



#!/usr/bin/env ruby

require 'aws/s3'
require 'resolv-replace'

AWS::S3::Base.establish_connection!(
:server => 'testclient.englab.pnq.redhat.com',
:port => '8080',
:access_key_id => '98J4R9P22P5CDL65HKP8',
:secret_access_key => '6C+jcaP0dp0+FZfrRNgyGA9EzRy25pURldwje049'
)

Save the file and exit the editor.

6. Make the file executable:

[user@dev ~]$ chmod +x conn.rb

7. Run the file:

[user@dev ~]$ ./conn.rb | echo $?

If you have provided the values correctly in the file, the output of the command will be 0.

8. Create a new file for creating a bucket:

[user@dev ~]$ vim create_bucket.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

AWS::S3::Bucket.create('my-new-bucket1')

Save the file and exit the editor.

9. Make the file executable:

[user@dev ~]$ chmod +x create_bucket.rb

10. Run the file:

[user@dev ~]$ ./create_bucket.rb

If the output of the command is true it would mean that bucket my-new-bucket1 was created successfully.

11. Create a new file for listing owned buckets:

[user@dev ~]$ vim list_owned_buckets.rb

Paste the following content into the file:

#!/usr/bin/env ruby

load 'conn.rb'

AWS::S3::Service.buckets.each do |bucket|
puts "{bucket.name}\t{bucket.creation_date}"
end

Save the file and exit the editor.

12. Make the file executable:

[user@dev ~]$ chmod +x list_owned_buckets.rb

13. Run the file:

[user@dev ~]$ ./list_owned_buckets.rb

The output should look something like this:

my-new-bucket1 2020-01-21 10:33:19 UTC



14. Create a new file for creating an object:

[user@dev ~]$ vim create_object.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

AWS::S3::S3Object.store(
'hello.txt',
'Hello World!',
'my-new-bucket1',
:content_type => 'text/plain'
)

Save the file and exit the editor.

15. Make the file executable:

[user@dev ~]$ chmod +x create_object.rb

16. Run the file:

[user@dev ~]$ ./create_object.rb

This will create a file hello.txt with the string Hello World!.

17. Create a new file for listing a bucket’s content:

[user@dev ~]$ vim list_bucket_content.rb

Paste the following content into the file:

#!/usr/bin/env ruby

load 'conn.rb'

new_bucket = AWS::S3::Bucket.find('my-new-bucket1')
new_bucket.each do |object|
puts "{object.key}\t{object.about['content-length']}\t{object.about['last-modified']}"
end

Save the file and exit the editor.

18. Make the file executable.

[user@dev ~]$ chmod +x list_bucket_content.rb

19. Run the file:

[user@dev ~]$ ./list_bucket_content.rb

The output will look something like this:

hello.txt 12 Fri, 22 Jan 2020 15:54:52 GMT

20. Create a new file for deleting an empty bucket:

[user@dev ~]$ vim del_empty_bucket.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

AWS::S3::Bucket.delete('my-new-bucket1')

Save the file and exit the editor.

21. Make the file executable:

[user@dev ~]$ chmod +x del_empty_bucket.rb



22. Run the file:

[user@dev ~]$ ./del_empty_bucket.rb | echo $?

If the bucket is successfully deleted, the command will return 0 as output.

NOTE: Edit the create_bucket.rb file to create empty buckets, for example, my-new-bucket4, my-new-bucket5. Next,
edit the above-mentioned del_empty_bucket.rb file accordingly before trying to delete empty buckets.

23. Create a new file for deleting non-empty buckets:

[user@dev ~]$ vim del_non_empty_bucket.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

AWS::S3::Bucket.delete('my-new-bucket1', :force => true)

Save the file and exit the editor.

24. Make the file executable:

[user@dev ~]$ chmod +x del_non_empty_bucket.rb

25. Run the file:

[user@dev ~]$ ./del_non_empty_bucket.rb | echo $?

If the bucket is successfully deleted, the command will return 0 as output.

26. Create a new file for deleting an object:

[user@dev ~]$ vim delete_object.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

AWS::S3::S3Object.delete('hello.txt', 'my-new-bucket1')

Save the file and exit the editor.

27. Make the file executable:

[user@dev ~]$ chmod +x delete_object.rb

28. Run the file:

[user@dev ~]$ ./delete_object.rb

This will delete the object hello.txt.

Accessing the Ceph Object Gateway using Ruby AWS SDK


Edit online
You can use the Ruby programming language along with aws-sdk gem for S3 access. Execute the steps mentioned below on the
node used for accessing the Ceph Object Gateway server with Ruby AWS::SDK.

Prerequisites
Edit online

User-level access to Ceph Object Gateway.



Root-level access to the node accessing the Ceph Object Gateway.

Internet access.

Procedure
Edit online

1. Install the ruby package:

[root@dev ~]# yum install ruby

NOTE: The above command will install ruby and its essential dependencies like rubygems and ruby-libs. If somehow the
command does not install all the dependencies, install them separately.

2. Install the aws-sdk Ruby package:

[root@dev ~]# gem install aws-sdk

3. Create a project directory:

[user@dev ~]$ mkdir ruby_aws_sdk


[user@dev ~]$ cd ruby_aws_sdk

4. Create the connection file:

[user@dev ~]$ vim conn.rb

5. Paste the following contents into the conn.rb file:

Syntax

#!/usr/bin/env ruby

require 'aws-sdk'
require 'resolv-replace'

Aws.config.update(
endpoint: http://FQDN_OF_GATEWAY_NODE:8080,
access_key_id: MY_ACCESS_KEY,
secret_access_key: MY_SECRET_KEY,
force_path_style: true,
region: us-east-1
)

Replace FQDN_OF_GATEWAY_NODE with the FQDN of the Ceph Object Gateway node. Replace MY_ACCESS_KEY and
MY_SECRET_KEY with the access_key and secret_key that were generated when you created the radosgw user for S3
access as mentioned in the Create an S3 user section.

Example

#!/usr/bin/env ruby

require 'aws-sdk'
require 'resolv-replace'

Aws.config.update(
endpoint: 'http://testclient.englab.pnq.redhat.com:8080',
access_key_id: '98J4R9P22P5CDL65HKP8',
secret_access_key: '6C+jcaP0dp0+FZfrRNgyGA9EzRy25pURldwje049',
force_path_style: true,
region: 'us-east-1'
)

Save the file and exit the editor.

6. Make the file executable:

[user@dev ~]$ chmod +x conn.rb

7. Run the file:

[user@dev ~]$ ./conn.rb | echo $?



If you have provided the values correctly in the file, the output of the command will be 0.

8. Create a new file for creating a bucket:

[user@dev ~]$ vim create_bucket.rb

Paste the following contents into the file:

Syntax

#!/usr/bin/env ruby

load 'conn.rb'

s3_client = Aws::S3::Client.new
s3_client.create_bucket(bucket: 'my-new-bucket2')

Save the file and exit the editor.

9. Make the file executable:

[user@dev ~]$ chmod +x create_bucket.rb

10. Run the file:

[user@dev ~]$ ./create_bucket.rb

If the output of the command is true, this means that bucket my-new-bucket2 was created successfully.

11. Create a new file for listing owned buckets:

[user@dev ~]$ vim list_owned_buckets.rb

Paste the following content into the file:

#!/usr/bin/env ruby

load 'conn.rb'

s3_client = Aws::S3::Client.new
s3_client.list_buckets.buckets.each do |bucket|
puts "{bucket.name}\t{bucket.creation_date}"
end

Save the file and exit the editor.

12. Make the file executable:

[user@dev ~]$ chmod +x list_owned_buckets.rb

13. Run the file:

[user@dev ~]$ ./list_owned_buckets.rb

The output should look something like this:

my-new-bucket2 2020-01-21 10:33:19 UTC

14. Create a new file for creating an object:

[user@dev ~]$ vim create_object.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

s3_client = Aws::S3::Client.new
s3_client.put_object(
key: 'hello.txt',
body: 'Hello World!',
bucket: 'my-new-bucket2',
content_type: 'text/plain'
)



Save the file and exit the editor.

15. Make the file executable:

[user@dev ~]$ chmod +x create_object.rb

16. Run the file:

[user@dev ~]$ ./create_object.rb

This will create a file hello.txt with the string Hello World!.

17. Create a new file for listing a bucket’s content:

[user@dev ~]$ vim list_bucket_content.rb

Paste the following content into the file:

#!/usr/bin/env ruby

load 'conn.rb'

s3_client = Aws::S3::Client.new
s3_client.list_objects(bucket: 'my-new-bucket2').contents.each do |object|
puts "{object.key}\t{object.size}"
end

Save the file and exit the editor.

18. Make the file executable.

[user@dev ~]$ chmod +x list_bucket_content.rb

19. Run the file:

[user@dev ~]$ ./list_bucket_content.rb

The output will look something like this:

hello.txt 12 Fri, 22 Jan 2020 15:54:52 GMT

20. Create a new file for deleting an empty bucket:

[user@dev ~]$ vim del_empty_bucket.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

s3_client = Aws::S3::Client.new
s3_client.delete_bucket(bucket: 'my-new-bucket2')

Save the file and exit the editor.

21. Make the file executable:

[user@dev ~]$ chmod +x del_empty_bucket.rb

22. Run the file:

[user@dev ~]$ ./del_empty_bucket.rb | echo $?

If the bucket is successfully deleted, the command will return 0 as output.

NOTE: Edit the create_bucket.rb file to create empty buckets, for example, my-new-bucket6, my-new-bucket7. Next,
edit the above-mentioned del_empty_bucket.rb file accordingly before trying to delete empty buckets.

23. Create a new file for deleting a non-empty bucket:

[user@dev ~]$ vim del_non_empty_bucket.rb

Paste the following contents into the file:



#!/usr/bin/env ruby

load 'conn.rb'

s3_client = Aws::S3::Client.new
Aws::S3::Bucket.new('my-new-bucket2', client: s3_client).clear!
s3_client.delete_bucket(bucket: 'my-new-bucket2')

Save the file and exit the editor.

24. Make the file executable:

[user@dev ~]$ chmod +x del_non_empty_bucket.rb

25. Run the file:

[user@dev ~]$ ./del_non_empty_bucket.rb | echo $?

If the bucket is successfully deleted, the command will return 0 as output.

26. Create a new file for deleting an object:

[user@dev ~]$ vim delete_object.rb

Paste the following contents into the file:

#!/usr/bin/env ruby

load 'conn.rb'

s3_client = Aws::S3::Client.new
s3_client.delete_object(key: 'hello.txt', bucket: 'my-new-bucket2')

Save the file and exit the editor.

27. Make the file executable:

[user@dev ~]$ chmod +x delete_object.rb

28. Run the file:

[user@dev ~]$ ./delete_object.rb

This will delete the object hello.txt.

Accessing the Ceph Object Gateway using PHP


Edit online
You can use PHP scripts for S3 access. This procedure provides some example PHP scripts to do various tasks, such as deleting a
bucket or an object.

IMPORTANT: The examples given below are tested against php v5.4.16 and aws-sdk v2.8.24. DO NOT use the latest version of aws-sdk for php because it requires php >= 5.5. php 5.5 is not available in the default repositories of RHEL 7. If you want to use php 5.5, you will have to enable epel and other third-party repositories. Also, the configuration options for php 5.5 and the latest version of aws-sdk are different.

Prerequisites
Edit online

Root-level access to a development workstation.

Internet access.

1. Install the php package:

[root@dev ~]# yum install php



2. Download the zip archive of aws-sdk for PHP and extract it.

3. Create a project directory:

[user@dev ~]$ mkdir php_s3


[user@dev ~]$ cd php_s3

4. Copy the extracted aws directory to the project directory. For example:

[user@dev ~]$ cp -r ~/Downloads/aws/ ~/php_s3/

5. Create the connection file:

[user@dev ~]$ vim conn.php

6. Paste the following contents in the conn.php file:

Syntax

<?php
define('AWS_KEY', 'MY_ACCESS_KEY');
define('AWS_SECRET_KEY', 'MY_SECRET_KEY');
define('HOST', 'FQDN_OF_GATEWAY_NODE');
define('PORT', '8080');

// require the AWS SDK for php library
require '/PATH_TO_AWS/aws-autoloader.php';

use Aws\S3\S3Client;

// Establish connection with host using S3 Client
$client = S3Client::factory(array(
    'base_url' => HOST,
    'port' => PORT,
    'key' => AWS_KEY,
    'secret' => AWS_SECRET_KEY
));
?>

Replace FQDN_OF_GATEWAY_NODE with the FQDN of the gateway node. Replace PATH_TO_AWS with the absolute path to the
extracted aws directory that you copied to the php project directory.

Save the file and exit the editor.

7. Run the file:

[user@dev ~]$ php -f conn.php | echo $?

If you have provided the values correctly in the file, the output of the command will be 0.

8. Create a new file for creating a bucket:

[user@dev ~]$ vim create_bucket.php

Paste the following contents into the new file:

Syntax

<?php

include 'conn.php';

$client->createBucket(array('Bucket' => 'my-new-bucket3'));

?>

Save the file and exit the editor.

9. Run the file:

[user@dev ~]$ php -f create_bucket.php

10. Create a new file for listing owned buckets:

[user@dev ~]$ vim list_owned_buckets.php

Paste the following content into the file:

Syntax

<?php

include 'conn.php';

$blist = $client->listBuckets();
echo "Buckets belonging to " . $blist['Owner']['ID'] . ":\n";
foreach ($blist['Buckets'] as $b) {
    echo "{$b['Name']}\t{$b['CreationDate']}\n";
}

?>

Save the file and exit the editor.

11. Run the file:

[user@dev ~]$ php -f list_owned_buckets.php

The output should look similar to this:

my-new-bucket3 2020-01-21 10:33:19 UTC

12. Create an object by first creating a source file named hello.txt:

[user@dev ~]$ echo "Hello World!" > hello.txt

13. Create a new php file:

[user@dev ~]$ vim create_object.php

Paste the following contents into the file:

Syntax

<?php

include 'conn.php';

$key = 'hello.txt';
$source_file = './hello.txt';
$acl = 'private';
$bucket = 'my-new-bucket3';
$client->upload($bucket, $key, fopen($source_file, 'r'), $acl);

?>

Save the file and exit the editor.

14. Run the file:

[user@dev ~]$ php -f create_object.php

This will create the object hello.txt in bucket my-new-bucket3.

15. Create a new file for listing a bucket’s content:

[user@dev ~]$ vim list_bucket_content.php

Paste the following content into the file:

Syntax

<?php

include 'conn.php';

$o_iter = $client->getIterator('ListObjects', array(
    'Bucket' => 'my-new-bucket3'
));
foreach ($o_iter as $o) {
    echo "{$o['Key']}\t{$o['Size']}\t{$o['LastModified']}\n";
}
?>

Save the file and exit the editor.

16. Run the file:

[user@dev ~]$ php -f list_bucket_content.php

The output will look similar to this:

hello.txt 12 Fri, 22 Jan 2020 15:54:52 GMT

17. Create a new file for deleting an empty bucket:

[user@dev ~]$ vim del_empty_bucket.php

Paste the following contents into the file:

Syntax

<?php

include 'conn.php';

$client->deleteBucket(array('Bucket' => 'my-new-bucket3'));


?>

Save the file and exit the editor.

18. Run the file:

[user@dev ~]$ php -f del_empty_bucket.php | echo $?

If the bucket is successfully deleted, the command will return 0 as output.

NOTE: Edit the create_bucket.php file to create empty buckets, for example, my-new-bucket4, my-new-bucket5. Next,
edit the above-mentioned del_empty_bucket.php file accordingly before trying to delete empty buckets.

IMPORTANT: Deleting a non-empty bucket is currently not supported with version 2 and newer of the aws-sdk for PHP.

19. Create a new file for deleting an object:

[user@dev ~]$ vim delete_object.php

Paste the following contents into the file:

Syntax

<?php

include 'conn.php';

$client->deleteObject(array(
    'Bucket' => 'my-new-bucket3',
    'Key' => 'hello.txt',
));
?>

Save the file and exit the editor.

20. Run the file:

[user@dev ~]$ php -f delete_object.php

This will delete the object hello.txt.

Secure Token Service


Edit online

The Amazon Web Services Secure Token Service (STS) returns a set of temporary security credentials for authenticating users. The
Ceph Object Gateway implements a subset of the STS application programming interfaces (APIs) to provide temporary credentials
for identity and access management (IAM). Using these temporary credentials authenticates S3 calls by utilizing the STS engine in
the Ceph Object Gateway. You can restrict temporary credentials even further by using an IAM policy, which is a parameter passed to
the STS APIs.

The Secure Token Service application programming interfaces


Configuring the Secure Token Service
Creating a user for an OpenID Connect provider
Obtaining a thumbprint of an OpenID Connect provider
Configuring and using STS Lite with Keystone (Technology Preview)
Working around the limitations of using STS Lite with Keystone (Technology Preview)

Reference
Edit online

Amazon Web Services Secure Token Service welcome page.

See the Configuring and using STS Lite with Keystone section for details on STS Lite and Keystone.

See the Working around the limitations of using STS Lite with Keystone section for details on the limitations of STS Lite and
Keystone.

The Secure Token Service application programming interfaces


Edit online
The Ceph Object Gateway implements the following Secure Token Service (STS) application programming interfaces (APIs):

AssumeRole

This API returns a set of temporary credentials for cross-account access. These temporary credentials allow for both, permission
policies attached with Role and policies attached with AssumeRole API. The RoleArn and the RoleSessionName request
parameters are required, but the other request parameters are optional.

RoleArn

Description
The role to assume for the Amazon Resource Name (ARN) with a length of 20 to 2048 characters.

Type
String

Required
Yes

RoleSessionName

Description
Identifying the role session name to assume. The role session name can uniquely identify a session when different principals
or different reasons assume a role. This parameter’s value has a length of 2 to 64 characters. The =, ,, ., @, and - characters
are allowed, but no spaces are allowed.

Type
String

Required
Yes

Policy

Description
An identity and access management policy (IAM) in a JSON format for use in an inline session. This parameter’s value has a
length of 1 to 2048 characters.

Type
String

Required
No

DurationSeconds

Description
The duration of the session in seconds, with a minimum value of 900 seconds to a maximum value of 43200 seconds. The
default value is 3600 seconds.

Type
Integer

Required
No

ExternalId

Description
When assuming a role for another account, provide the unique external identifier if available. This parameter’s value has a
length of 2 to 1224 characters.

Type
String

Required
No

SerialNumber

Description
A user’s identification number from their associated multi-factor authentication (MFA) device. The parameter’s value can be
the serial number of a hardware device or a virtual device, with a length of 9 to 256 characters.

Type
String

Required
No

TokenCode

Description
The value generated from the multi-factor authentication (MFA) device, if the trust policy requires MFA. If an MFA device is
required, and if this parameter’s value is empty or expired, then AssumeRole call returns an "access denied" error message.
This parameter’s value has a fixed length of 6 characters.

Type
String

Required
No

AssumeRoleWithWebIdentity

This API returns a set of temporary credentials for users who have been authenticated by an application, such as OpenID Connect or
OAuth 2.0 Identity Provider. The RoleArn and the RoleSessionName request parameters are required, but the other request
parameters are optional.

RoleArn

Description
The role to assume for the Amazon Resource Name (ARN) with a length of 20 to 2048 characters.

Type
String

Required

Yes

RoleSessionName

Description
Identifying the role session name to assume. The role session name can uniquely identify a session when different principals
or different reasons assume a role. This parameter’s value has a length of 2 to 64 characters. The =, ,, ., @, and - characters
are allowed, but no spaces are allowed.

Type
String

Required
Yes

Policy

Description
An identity and access management policy (IAM) in a JSON format for use in an inline session. This parameter’s value has a
length of 1 to 2048 characters.

Type
String

Required
No

DurationSeconds

Description
The duration of the session in seconds, with a minimum value of 900 seconds to a maximum value of 43200 seconds. The
default value is 3600 seconds.

Type
Integer

Required
No

ProviderId

Description
The fully qualified host component of the domain name from the identity provider. This parameter’s value is only valid for
OAuth 2.0 access tokens, with a length of 4 to 2048 characters.

Type
String

Required
No

WebIdentityToken

Description
The OpenID Connect identity token or OAuth 2.0 access token provided from an identity provider. This parameter’s value has
a length of 4 to 2048 characters.

Type
String

Required
No
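
The following is a minimal boto3 sketch of an AssumeRoleWithWebIdentity call against the Ceph Object Gateway STS endpoint. The endpoint URL, credentials, role ARN, and web identity token are placeholders; replace them with values from your own gateway and OpenID Connect provider.

Example

import boto3

# All values below are placeholders; substitute your own gateway endpoint,
# user credentials, role ARN, and the token issued by your identity provider.
sts = boto3.client('sts',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com/rgw',
    region_name='')

response = sts.assume_role_with_web_identity(
    RoleArn='arn:aws:iam:::role/S3Access',
    RoleSessionName='Bob',
    DurationSeconds=3600,
    WebIdentityToken='TOKEN_FROM_IDENTITY_PROVIDER',
)

# The returned temporary credentials can then be passed to an S3 client.
print(response['Credentials']['AccessKeyId'])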

Reference
Edit online

See the Examples using the Secure Token Service APIs for more details.

Amazon Web Services Security Token Service, the AssumeRole action.

Amazon Web Services Security Token Service, the AssumeRoleWithWebIdentity action.

Configuring the Secure Token Service


Edit online
Configure the Secure Token Service (STS) for use with the Ceph Object Gateway by setting the rgw_sts_key and
rgw_s3_auth_use_sts options.

NOTE: The S3 and STS APIs co-exist in the same namespace, and both can be accessed from the same endpoint in the Ceph Object
Gateway.

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A running Ceph Object Gateway.

Root-level access to a Ceph Manager node.

Procedure
Edit online

1. Set the following configuration options for the Ceph Object Gateway client:

Syntax

ceph config set RGW_CLIENT_NAME rgw_sts_key STS_KEY


ceph config set RGW_CLIENT_NAME rgw_s3_auth_use_sts true

The rgw_sts_key is the STS key for encrypting or decrypting the session token and is exactly 16 hex characters.

Example

[root@mgr ~]# ceph config set client.rgw rgw_sts_key abcdefghijklmnop


[root@mgr ~]# ceph config set client.rgw rgw_s3_auth_use_sts true

2. Restart the Ceph Object Gateway for the added key to take effect.

NOTE: Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

a. To restart the Ceph Object Gateway on an individual node in the storage cluster:

Syntax

systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

Example

[root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-534f7d8b5d69@rgw.realm.zone.host01.service

b. To restart the Ceph Object Gateways on all nodes in the storage cluster:

Syntax

ceph orch restart SERVICE_TYPE

Example

[ceph: root@host01 /]# ceph orch restart rgw

Reference
Edit online

See Secure Token Service application programming interfaces for more details on the STS APIs.

See the The basics of Ceph configuration for more details on using the Ceph configuration database.

Creating a user for an OpenID Connect provider


Edit online
To establish trust between the Ceph Object Gateway and the OpenID Connect Provider create a user entity and a role trust policy.

Prerequisites
Edit online

User-level access to the Ceph Object Gateway node.

Procedure
Edit online

1. Create a new Ceph user:

Syntax

radosgw-admin --uid USER_NAME --display-name "DISPLAY_NAME" --access_key USER_NAME --secret SECRET user create

Example

[user@rgw ~]$ radosgw-admin --uid TESTER --display-name "TestUser" --access_key TESTER --secret test123 user create

2. Configure the Ceph user capabilities:

Syntax

radosgw-admin caps add --uid="USER_NAME" --caps="oidc-provider=*"

Example

[user@rgw ~]$ radosgw-admin caps add --uid="TESTER" --caps="oidc-provider=*"

3. Add a condition to the role trust policy using the Secure Token Service (STS) API:

Syntax

"{\"Version\":\"2020-01-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":
{\"Federated\":[\"arn:aws:iam:::oidc-provider/IDP_URL\"]},\"Action\":
[\"sts:AssumeRoleWithWebIdentity\"],\"Condition\":{\"StringEquals\":
{\"IDP_URL:app_id\":\"AUD_FIELD\"\}\}\}\]\}"

IMPORTANT: The app_id in the syntax example above must match the AUD_FIELD field of the incoming token.

Reference
Edit online

See the Obtaining the Root CA Thumbprint for an OpenID Connect Identity Provider article on Amazon’s website.

See the Secure Token Service application programming interfaces for more details on the STS APIs.

See the Examples using the Secure Token Service APIs for more details.

Obtaining a thumbprint of an OpenID Connect provider


Edit online
Get the OpenID Connect provider's (IDP) configuration document and certificate, and use them to obtain the IDP's thumbprint.

Prerequisites
Edit online

Installation of the openssl and curl packages.

Procedure
Edit online

1. Get the configuration document from the IDP’s URL:

Syntax

curl -k -v \
-X GET \
-H "Content-Type: application/x-www-form-urlencoded" \
"IDP_URL:8000/CONTEXT/realms/REALM/.well-known/openid-configuration" \
| jq .

Example

[user@client ~]$ curl -k -v \
     -X GET \
     -H "Content-Type: application/x-www-form-urlencoded" \
     "http://www.example.com:8000/auth/realms/quickstart/.well-known/openid-configuration" \
     | jq .

2. Get the IDP certificate:

Syntax

curl -k -v \
-X GET \
-H "Content-Type: application/x-www-form-urlencoded" \
"IDP_URL/CONTEXT/realms/REALM/protocol/openid-connect/certs" \
| jq .

Example

[user@client ~]$ curl -k -v \
     -X GET \
     -H "Content-Type: application/x-www-form-urlencoded" \
     "http://www.example.com/auth/realms/quickstart/protocol/openid-connect/certs" \
     | jq .

3. Copy the result of the x5c response from the previous command and paste it into the certificate.crt file. Include
-----BEGIN CERTIFICATE----- at the beginning and -----END CERTIFICATE----- at the end.

4. Get the certificate thumbprint:

Syntax

openssl x509 -in CERT_FILE -fingerprint -noout

Example

[user@client ~]$ openssl x509 -in certificate.crt -fingerprint -noout
SHA1 Fingerprint=F7:D7:B3:51:5D:D0:D3:19:DD:21:9A:43:A9:EA:72:7A:D6:06:52:87

5. Remove all the colons from the SHA1 fingerprint and use this as the input for creating the IDP entity in the IAM request.
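
As an alternative to copying the certificate by hand, the thumbprint can also be computed directly from the x5c value returned in step 2. The following Python sketch assumes the same hypothetical certs URL used in the example above; the SHA-1 digest of the decoded certificate is the thumbprint with the colons already removed.

Example

import base64
import hashlib
import json
import urllib.request

# Hypothetical IDP certs endpoint; replace with your provider's URL.
CERTS_URL = "http://www.example.com/auth/realms/quickstart/protocol/openid-connect/certs"

with urllib.request.urlopen(CERTS_URL) as response:
    jwks = json.load(response)

# x5c[0] is the base64-encoded DER certificate; hashing the DER bytes with
# SHA-1 gives the same fingerprint that openssl prints, without the colons.
der_certificate = base64.b64decode(jwks["keys"][0]["x5c"][0])
print(hashlib.sha1(der_certificate).hexdigest().upper())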

Reference
Edit online

See the Obtaining the Root CA Thumbprint for an OpenID Connect Identity Provider article on Amazon’s website.

See the Secure Token Service application programming interfaces for more details on the STS APIs.

See the Examples using the Secure Token Service APIs for more details.

Configuring and using STS Lite with Keystone (Technology Preview)


Edit online
The Amazon Secure Token Service (STS) and S3 APIs co-exist in the same namespace. The STS options can be configured in
conjunction with the Keystone options.

NOTE: Both S3 and STS APIs can be accessed using the same endpoint in Ceph Object Gateway.

Prerequisites
Edit online

IBM Storage Ceph 5.0 or higher.

A running Ceph Object Gateway.

Installation of the Boto Python module, version 3 or higher.

Root-level access to a Ceph Manager node.

User-level access to an OpenStack node.

Procedure
Edit online

1. Set the following configuration options for the Ceph Object Gateway client:

Syntax

ceph config set RGW_CLIENT_NAME rgw_sts_key STS_KEY


ceph config set RGW_CLIENT_NAME rgw_s3_auth_use_sts true

The rgw_sts_key is the STS key for encrypting or decrypting the session token and is exactly 16 hex characters.

Example

[root@mgr ~]# ceph config set client.rgw rgw_sts_key abcdefghijklmnop


[root@mgr ~]# ceph config set client.rgw rgw_s3_auth_use_sts true

2. Generate the EC2 credentials on the OpenStack node:

Example

[user@osp ~]$ openstack ec2 credentials create

+------------+--------------------------------------------------------+
| Field | Value |
+------------+--------------------------------------------------------+
| access | b924dfc87d454d15896691182fdeb0ef |

| links | {u'self': u'http://192.168.0.15/identity/v3/users/ |
| | 40a7140e424f493d8165abc652dc731c/credentials/ |
| | OS-EC2/b924dfc87d454d15896691182fdeb0ef'} |
| project_id | c703801dccaf4a0aaa39bec8c481e25a |
| secret | 6a2142613c504c42a94ba2b82147dc28 |
| trust_id | None |
| user_id | 40a7140e424f493d8165abc652dc731c |
+------------+--------------------------------------------------------+

3. Use the generated credentials to get back a set of temporary security credentials using GetSessionToken API:

Example

import boto3

access_key = 'b924dfc87d454d15896691182fdeb0ef'
secret_key = '6a2142613c504c42a94ba2b82147dc28'

client = boto3.client('sts',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url='https://www.example.com/rgw',
region_name='',
)

response = client.get_session_token(
DurationSeconds=43200
)

4. The temporary credentials obtained in the previous step can be used for making S3 calls:

Example

s3client = boto3.client('s3',
    aws_access_key_id = response['Credentials']['AccessKeyId'],
    aws_secret_access_key = response['Credentials']['SecretAccessKey'],
    aws_session_token = response['Credentials']['SessionToken'],
    endpoint_url='https://www.example.com/s3',
    region_name='')

bucket = s3client.create_bucket(Bucket='my-new-shiny-bucket')
response = s3client.list_buckets()
for bucket in response["Buckets"]:
    print("{name}\t{created}".format(
        name = bucket['Name'],
        created = bucket['CreationDate'],
    ))

5. Create a new S3Access role and configure a policy.

a. Assign a user with administrative CAPS:

Syntax

radosgw-admin caps add --uid="USER" --caps="roles=*"

Example

[root@mgr ~]# radosgw-admin caps add --uid="gwadmin" --caps="roles=*"

b. Create the S3Access role:

Syntax

radosgw-admin role create --role-name=ROLE_NAME --path=PATH --assume-role-policy-doc=TRUST_POLICY_DOC

Example

[root@mgr ~]# radosgw-admin role create --role-name=S3Access --path=/application_abc/component_xyz/ --assume-role-policy-doc=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Principal\":\{\"AWS\":\[\"arn:aws:iam:::user/TESTER\"\]\},\"Action\":\[\"sts:AssumeRole\"\]\}\]\}

c. Attach a permission policy to the S3Access role:

Syntax

radosgw-admin role-policy put --role-name=ROLE_NAME --policy-name=POLICY_NAME --policy-doc=PERMISSION_POLICY_DOC

Example

[root@mgr ~]# radosgw-admin role-policy put --role-name=S3Access --policy-name=Policy --policy-doc=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Action\":\[\"s3:*\"\],\"Resource\":\"arn:aws:s3:::example_bucket\"\}\]\}

d. Now another user can assume the role of the gwadmin user. For example, the gwuser user can assume the
permissions of the gwadmin user.

e. Make a note of the assuming user’s access_key and secret_key values.

Example

[root@mgr ~]# radosgw-admin user info --uid=gwuser | grep -A1 access_key

6. Use the AssumeRole API call, providing the access_key and secret_key values from the assuming user:

Example

import boto3

access_key = '11BS02LGFB6AL6H1ADMW'
secret_key = 'vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY'

client = boto3.client('sts',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url='https://www.example.com/rgw',
region_name='',
)

response = client.assume_role(
RoleArn='arn:aws:iam:::role/application_abc/component_xyz/S3Access',
RoleSessionName='Bob',
DurationSeconds=3600
)

IMPORTANT: The AssumeRole API requires the S3Access role.

Reference
Edit online

See the Test S3 Access for more information on installing the Boto Python module.

See the Create a User for more information.

Working around the limitations of using STS Lite with Keystone


(Technology Preview)
Edit online
A limitation with Keystone is that it does not support Secure Token Service (STS) requests. Another limitation is that the payload hash is
not included with the request. To work around these two limitations, the Boto authentication code must be modified.

Prerequisites
Edit online

A running IBM Storage Ceph cluster, version 5.0 or higher.

A running Ceph Object Gateway.

Installation of Boto Python module, version 3 or higher.

Procedure
Edit online

1. Open and edit Boto’s auth.py file.

a. Add the following four lines to the code block:

   class SigV4Auth(BaseSigner):
       """
       Sign a request with Signature V4.
       """
       REQUIRES_REGION = True

       def __init__(self, credentials, service_name, region_name):
           self.credentials = credentials
           # We initialize these value here so the unit tests can have
           # valid values. But these will get overridden in `add_auth`
           # later for real requests.
           self._region_name = region_name
           if service_name == 'sts':              # <1>
               self._service_name = 's3'          # <2>
           else:                                  # <3>
               self._service_name = service_name  # <4>

b. Add the following two lines to the code block:

   def _modify_request_before_signing(self, request):
       if 'Authorization' in request.headers:
           del request.headers['Authorization']
       self._set_necessary_date_headers(request)
       if self.credentials.token:
           if 'X-Amz-Security-Token' in request.headers:
               del request.headers['X-Amz-Security-Token']
           request.headers['X-Amz-Security-Token'] = self.credentials.token

       if not request.context.get('payload_signing_enabled', True):
           if 'X-Amz-Content-SHA256' in request.headers:
               del request.headers['X-Amz-Content-SHA256']
           request.headers['X-Amz-Content-SHA256'] = UNSIGNED_PAYLOAD  # <1>
       else:                                                           # <2>
           request.headers['X-Amz-Content-SHA256'] = self.payload(request)

Reference
Edit online

See the Test S3 Access section for more information on installing the Boto Python module.

S3 bucket operations
Edit online
As a developer, you can perform bucket operations with the Amazon S3 application programming interface (API) through the Ceph
Object Gateway.

The following table lists the Amazon S3 functional operations for buckets, along with their support status.

Feature                        Status                Notes
List Buckets                   Supported
Create a Bucket                Supported             Different set of canned ACLs.
Put Bucket Website             Supported
Get Bucket Website             Supported
Delete Bucket Website          Supported
Bucket Lifecycle               Partially Supported   Expiration, NoncurrentVersionExpiration, and AbortIncompleteMultipartUpload supported.
Put Bucket Lifecycle           Partially Supported   Expiration, NoncurrentVersionExpiration, and AbortIncompleteMultipartUpload supported.
Delete Bucket Lifecycle        Supported
Get Bucket Objects             Supported
Bucket Location                Supported
Get Bucket Version             Supported
Put Bucket Version             Supported
Delete Bucket                  Supported
Get Bucket ACLs                Supported             Different set of canned ACLs.
Put Bucket ACLs                Supported             Different set of canned ACLs.
Get Bucket cors                Supported
Put Bucket cors                Supported
Delete Bucket cors             Supported
List Bucket Object Versions    Supported
Head Bucket                    Supported
List Bucket Multipart Uploads  Supported
Bucket Policies                Partially Supported
Get a Bucket Request Payment   Supported
Put a Bucket Request Payment   Supported
Get PublicAccessBlock          Supported
Put PublicAccessBlock          Supported
Delete PublicAccessBlock       Supported

Prerequisites
S3 create bucket notifications
S3 get bucket notifications
S3 delete bucket notifications
Accessing bucket host names
S3 list buckets
S3 return a list of bucket objects
S3 create a new bucket
S3 put bucket website
S3 get bucket website
S3 delete bucket website
S3 delete a bucket
S3 bucket lifecycle
S3 GET bucket lifecycle
S3 create or replace a bucket lifecycle
S3 delete a bucket lifecycle
S3 get bucket location
S3 get bucket versioning
S3 put bucket versioning
S3 get bucket access control lists
S3 put bucket Access Control Lists
S3 get bucket cors
S3 put bucket cors
S3 delete a bucket cors
S3 list bucket object versions
S3 head bucket
S3 list multipart uploads
S3 bucket policies
S3 get the request payment configuration on a bucket

S3 set the request payment configuration on a bucket
Multi-tenant bucket operations
S3 Block Public Access
S3 GET PublicAccessBlock
S3 PUT PublicAccessBlock
S3 delete PublicAccessBlock

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.

S3 create bucket notifications


Edit online
Create bucket notifications at the bucket level. A notification configuration names the events to publish and the destination topic to
which the bucket notifications are sent. Bucket notifications are S3 operations.

To create a bucket notification for s3:objectCreate and s3:objectRemove events, use PUT:

Example

client.put_bucket_notification_configuration(
    Bucket=bucket_name,
    NotificationConfiguration={
        'TopicConfigurations': [
            {
                'Id': notification_name,
                'TopicArn': topic_arn,
                'Events': ['s3:ObjectCreated:*', 's3:ObjectRemoved:*']
            }]})

IMPORTANT: IBM supports ObjectCreate events, such as put, post, multipartUpload, and copy. IBM also supports
ObjectRemove events, such as object_delete and s3_multi_object_delete.

Request Entities

NotificationConfiguration

Description
list of TopicConfiguration entities.

Type
Container

Required
Yes

TopicConfiguration

Description
Id, Topic, and list of Event entities.

Type
Container

Required
Yes

id

Description
Name of the notification.

Type
String

Required
Yes

Topic

Description
Topic Amazon Resource Name(ARN)

NOTE:

The topic must be created beforehand.

Type
String

Required
Yes

Event

Description
List of supported events. Multiple event entities can be used. If omitted, all events are handled.

Type
String

Required
No

Filter

Description
S3Key, S3Metadata and S3Tags entities.

Type
Container

Required
No

S3Key

Description
A list of FilterRule entities, for filtering based on the object key. At most, 3 entities may be in the list, for example Name
would be prefix, suffix, or regex. All filter rules in the list must match for the filter to match.

Type
Container

Required
No

S3Metadata

Description
A list of FilterRule entities, for filtering based on object metadata. All filter rules in the list must match the metadata
defined on the object. However, the object still matches if it has other metadata entries not listed in the filter.

Type
Container

Required
No

S3Tags

Description
A list of FilterRule entities, for filtering based on object tags. All filter rules in the list must match the tags defined on the
object. However, the object still matches if it has other tags not listed in the filter.

Type
Container

Required
No

S3Key.FilterRule

Description
Name and Value entities. Name is : prefix, suffix, or regex. The Value would hold the key prefix, key suffix, or a regular
expression for matching the key, accordingly.

Type
Container

Required
Yes

S3Metadata.FilterRule

Description
Name and Value entities. Name is the name of the metadata attribute for example x-amz-meta-xxx. The value is the
expected value for this attribute.

Type
Container

Required
Yes

S3Tags.FilterRule

Description
Name and Value entities. Name is the tag key, and the value is the tag value.

Type
Container

Required
Yes
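
The following boto3 sketch shows how the entities above map onto a notification configuration that uses an S3Key filter. The endpoint, bucket, topic ARN, and filter values are assumptions for illustration; the topic must already exist.

Example

import boto3

# Hypothetical endpoint, credentials, bucket, and topic values.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

client.put_bucket_notification_configuration(
    Bucket='my-new-bucket2',
    NotificationConfiguration={
        'TopicConfigurations': [{
            'Id': 'image-uploads',
            'TopicArn': 'arn:aws:sns:default::mytopic',
            'Events': ['s3:ObjectCreated:*'],
            # S3Key filter: only keys that start with images/ and end with .jpg
            'Filter': {
                'Key': {
                    'FilterRules': [
                        {'Name': 'prefix', 'Value': 'images/'},
                        {'Name': 'suffix', 'Value': '.jpg'},
                    ]
                }
            }
        }]
    })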

HTTP response

400

Status Code
MalformedXML

Description
The XML is not well-formed.

400

Status Code
InvalidArgument

Description
Missing Id or missing or invalid topic ARN or invalid event.

404

Status Code
NoSuchBucket

Description

The bucket does not exist.

404

Status Code
NoSuchKey

Description
The topic does not exist.

S3 get bucket notifications


Edit online
Get a specific notification or list all the notifications configured on a bucket.

Syntax

GET /_BUCKET_?notification=_NOTIFICATIONID HTTP/1.1


Host: cname.domain.com
Date: date
Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Example

GET /testbucket?notification=testnotificationID HTTP/1.1


Host: cname.domain.com
Date: date
Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Example Response

<NotificationConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<TopicConfiguration>
<Id></Id>
<Topic></Topic>
<Event></Event>
<Filter>
<S3Key>
<FilterRule>
<Name></Name>
<Value></Value>
</FilterRule>
</S3Key>
<S3Metadata>
<FilterRule>
<Name></Name>
<Value></Value>
</FilterRule>
</S3Metadata>
<S3Tags>
<FilterRule>
<Name></Name>
<Value></Value>
</FilterRule>
</S3Tags>
</Filter>
</TopicConfiguration>
</NotificationConfiguration>

NOTE: The notification subresource returns the bucket notification configuration or an empty NotificationConfiguration
element. The caller must be the bucket owner.

Request Entities

notification-id

Description
Name of the notification. All notifications are listed if the ID is not provided.

Type

String

NotificationConfiguration

Description
list of TopicConfiguration entities.

Type
Container

Required
Yes

TopicConfiguration

Description
Id, Topic, and list of Event entities.

Type
Container

Required
Yes

id

Description
Name of the notification.

Type
String

Required
Yes

Topic

Description
Topic Amazon Resource Name(ARN)

NOTE: The topic must be created beforehand.

Type
String

Required
Yes

Event

Description
Handled event. Multiple event entities may exist.

Type
String

Required
Yes

Filter

Description
The filters for the specified configuration.

Type
Container

Required
No

HTTP response

404

Status Code
NoSuchBucket

Description
The bucket does not exist.

404

Status Code
NoSuchKey

Description
The notification does not exist if it has been provided.

S3 delete bucket notifications


Edit online
Delete a specific or all notifications from a bucket.

NOTE: Notification deletion is an extension to the S3 notification API. Any defined notifications on a bucket are deleted when the
bucket is deleted. Deleting an unknown notification, for example a double delete, is not considered an error.

To delete a specific or all notifications use DELETE:

Syntax

DELETE /BUCKET?notification=NOTIFICATION_ID HTTP/1.1

Example

DELETE /testbucket?notification=testnotificationID HTTP/1.1

Request Entities

notification-id

Description
Name of the notification. All notifications on the bucket are deleted if the notification ID is not provided.

Type
String

HTTP response

404

Status Code
NoSuchBucket

Description
The bucket does not exist.

Accessing bucket host names


Edit online
There are two different modes of accessing buckets. The first, and preferred, method identifies the bucket as the top-level
directory in the URI.

Example

GET /mybucket HTTP/1.1


Host: cname.domain.com

The second method identifies the bucket via a virtual bucket host name.

Example

GET / HTTP/1.1
Host: mybucket.cname.domain.com

TIP: IBM prefers the first method, because the second method requires expensive domain certification and DNS wildcards.

S3 list buckets
Edit online
GET / returns a list of buckets created by the user making the request. GET / only returns buckets created by an authenticated
user. You cannot make an anonymous request.

Syntax

GET / HTTP/1.1
Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Response Entities

Buckets

Description
Container for list of buckets.

Type
Container

Bucket

Description
Container for bucket information.

Type
Container

Name

Description
Bucket name.

Type
String

CreationDate

Description
UTC time when the bucket was created.

Type
Date

ListAllMyBucketsResult

Description
A container for the result.

Type
Container

Owner

Description
A container for the bucket owner’s ID and DisplayName.

Type
Container

ID

Description
The bucket owner’s ID.

Type
String

DisplayName

Description
The bucket owner’s display name.

Type
String
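
A minimal boto3 sketch of the same request follows; the endpoint and credentials are assumptions, and the response maps directly onto the entities listed above.

Example

import boto3

# Hypothetical endpoint and credentials.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

response = client.list_buckets()
print("Owner:", response['Owner']['ID'])
for bucket in response['Buckets']:
    print(bucket['Name'], bucket['CreationDate'])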

S3 return a list of bucket objects


Edit online
Returns a list of bucket objects.

Syntax

GET /BUCKET?max-keys=25 HTTP/1.1


Host: cname.domain.com

Parameters

prefix

Description
Only returns objects that contain the specified prefix.

Type
String

delimiter

Description
The delimiter between the prefix and the rest of the object name.

Type
String

marker

Description
A beginning index for the list of objects returned.

Type
String

max-keys

Description
The maximum number of keys to return. Default is 1000.

Type
Integer

HTTP Response

200

Status Code

OK

Description
Buckets retrieved.

GET /_BUCKET returns a container for buckets with the following fields:

Bucket Response Entities

ListBucketResult

Description
The container for the list of objects.

Type
Entity

Name

Description
The name of the bucket whose contents will be returned.

Type
String

Prefix

Description
A prefix for the object keys.

Type
String

Marker

Description
A beginning index for the list of objects returned.

Type
String

MaxKeys

Description
The maximum number of keys returned.

Type
Integer

Delimiter

Description
If set, objects with the same prefix will appear in the CommonPrefixes list.

Type
String

IsTruncated

Description
If true, only a subset of the bucket’s contents were returned.

Type
Boolean

CommonPrefixes

Description
If multiple objects contain the same prefix, they will appear in this list.

Type

Container

The ListBucketResult contains objects, where each object is within a Contents container.

Object Response Entities

Contents

Description
A container for the object.

Type
Object

Key

Description
The object’s key.

Type
String

LastModified

Description
The object’s last-modified date and time.

Type
Date

ETag

Description
An MD5 hash of the object. The ETag is an entity tag.

Type
String

Size

Description
The object’s size.

Type
Integer

StorageClass

Description
Should always return STANDARD.

Type
String
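
The request parameters above map onto a boto3 call as shown in the following sketch; the endpoint, credentials, bucket name, and prefix are assumptions. IsTruncated together with the last returned key can be used to page through larger buckets.

Example

import boto3

# Hypothetical endpoint, credentials, and bucket.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

marker = ''
while True:
    response = client.list_objects(Bucket='my-new-bucket2',
                                   Prefix='images/',
                                   MaxKeys=25,
                                   Marker=marker)
    for obj in response.get('Contents', []):
        print(obj['Key'], obj['Size'], obj['LastModified'])
    if not response.get('IsTruncated'):
        break
    # Continue from the last key returned in this page.
    marker = response['Contents'][-1]['Key']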

S3 create a new bucket


Edit online
Creates a new bucket. To create a bucket, you must have a user ID and a valid AWS Access Key ID to authenticate requests. You
cannot create buckets as an anonymous user.

Constraints

In general, bucket names should follow domain name constraints.

Bucket names must be unique.

Bucket names cannot be formatted as IP address.

Bucket names can be between 3 and 63 characters long.

Bucket names must not contain uppercase characters or underscores.

Bucket names must start with a lowercase letter or number.

Bucket names can contain a dash (-).

Bucket names must be a series of one or more labels. Adjacent labels are separated by a single period (.). Bucket names can
contain lowercase letters, numbers, and hyphens. Each label must start and end with a lowercase letter or a number.

NOTE: The above constraints are relaxed if rgw_relaxed_s3_bucket_names is set to true. The bucket names must still be
unique, cannot be formatted as IP address, and can contain letters, numbers, periods, dashes, and underscores of up to 255
characters long.

Syntax

PUT /_BUCKET_ HTTP/1.1


Host: cname.domain.com
x-amz-acl: public-read-write

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Parameters x-amz-acl

Description
Canned ACLs.

Valid Values
private, public-read,public-read-write, authenticated-read

Required
No

HTTP Response

If the bucket name is unique, within constraints, and unused, the operation will succeed. If a bucket with the same name already
exists and the user is the bucket owner, the operation will succeed. If the bucket name is already in use, the operation will fail.

409

Status Code
BucketAlreadyExists

Description
Bucket already exists under different user’s ownership.
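
A short boto3 sketch of a create bucket request with one of the canned ACLs listed above follows; the endpoint, credentials, and bucket name are assumptions.

Example

import boto3

# Hypothetical endpoint and credentials.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

# ACL corresponds to the x-amz-acl request header.
client.create_bucket(Bucket='my-new-bucket2', ACL='public-read-write')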

S3 put bucket website


Edit online
The put bucket website API sets the configuration of the website that is specified in the website subresource. To configure a bucket
as a website, the website subresource can be added on the bucket.

NOTE: The put operation requires the S3:PutBucketWebsite permission. By default, only the bucket owner can configure the
website attached to a bucket.

Syntax

PUT /BUCKET?website-configuration=HTTP/1.1

Example

PUT /testbucket?website-configuration=HTTP/1.1

Reference
Edit online

For more information about this API call, see S3 API.

S3 get bucket website


Edit online
The get bucket website API retrieves the configuration of the website that is specified in the website subresource.

NOTE: The get operation requires the S3:GetBucketWebsite permission. By default, only the bucket owner can read the bucket
website configuration.

Syntax

GET /BUCKET?website-configuration=HTTP/1.1

Example

GET /testbucket?website-configuration=HTTP/1.1

Reference
Edit online

For more information about this API call, see S3 API.

S3 delete bucket website


Edit online
The delete bucket website API removes the website configuration for a bucket.

Syntax

DELETE /BUCKET?website-configuration=HTTP/1.1

Example

DELETE /testbucket?website-configuration=HTTP/1.1

Reference
Edit online

For more information about this API call, see S3 API.

S3 delete a bucket
Edit online
Deletes a bucket. You can reuse bucket names following a successful bucket removal.

Syntax

DELETE /_BUCKET_ HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

HTTP Response

204

Status Code
No Content

Description
Bucket removed.

S3 bucket lifecycle
Edit online
You can use a bucket lifecycle configuration to manage your objects so they are stored effectively throughout their lifetime. The S3
API in the Ceph Object Gateway supports a subset of the AWS bucket lifecycle actions:

Expiration
This defines the lifespan of objects within a bucket. It takes the number of days the object should live or an expiration date, at
which point the Ceph Object Gateway deletes the object. If versioning is not enabled on the bucket, the Ceph Object Gateway
deletes the object permanently. If versioning is enabled, the Ceph Object Gateway creates a delete marker for the current
version, and then deletes the current version.

NoncurrentVersionExpiration
This defines the lifespan of non-current object versions within a bucket. To use this feature, the bucket must enable
versioning. It takes the number of days a non-current object should live, at which point Ceph Object Gateway will delete the
non-current object.

AbortIncompleteMultipartUpload
This defines the number of days an incomplete multipart upload should live before it is aborted.

The lifecycle configuration contains one or more rules using the <Rule> element.

Example

<LifecycleConfiguration>
<Rule>
<Prefix/>
<Status>Enabled</Status>
<Expiration>
<Days>10</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>

A lifecycle rule can apply to all or a subset of objects in a bucket based on the <Filter> element that you specify in the lifecycle
rule. You can specify a filter in several ways:

Key prefixes

Object tags

Both key prefix and one or more object tags

Key prefixes

You can apply a lifecycle rule to a subset of objects based on the key name prefix. For example, specifying <keypre/> would apply
to objects that begin with keypre/:

<LifecycleConfiguration>
<Rule>
<Status>Enabled</Status>
<Filter>
<Prefix>keypre/</Prefix>
</Filter>
</Rule>
</LifecycleConfiguration>

You can also apply different lifecycle rules to objects with different key prefixes:

<LifecycleConfiguration>
<Rule>
<Status>Enabled</Status>
<Filter>

<Prefix>keypre/</Prefix>
</Filter>
</Rule>
<Rule>
<Status>Enabled</Status>
<Filter>
<Prefix>mypre/</Prefix>
</Filter>
</Rule>
</LifecycleConfiguration>

Object tags

You can apply a lifecycle rule to only objects with a specific tag using the <Key> and <Value> elements:

<LifecycleConfiguration>
<Rule>
<Status>Enabled</Status>
<Filter>
<Tag>
<Key>key</Key>
<Value>value</Value>
</Tag>
</Filter>
</Rule>
</LifecycleConfiguration>

Both prefix and one or more tags

In a lifecycle rule, you can specify a filter based on both the key prefix and one or more tags. They must be wrapped in the <And>
element. A filter can have only one prefix, and zero or more tags:

<LifecycleConfiguration>
<Rule>
<Status>Enabled</Status>
<Filter>
<And>
<Prefix>key-prefix</Prefix>
<Tag>
<Key>key1</Key>
<Value>value1</Value>
</Tag>
<Tag>
<Key>key2</Key>
<Value>value2</Value>
</Tag>
...
</And>
</Filter>
</Rule>
</LifecycleConfiguration>
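
The XML rules above can also be applied with boto3, which accepts the same structure as a dictionary. The following sketch combines an expiration rule with a key-prefix filter; the endpoint, credentials, and bucket name are assumptions.

Example

import boto3

# Hypothetical endpoint, credentials, and bucket.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

client.put_bucket_lifecycle_configuration(
    Bucket='my-new-bucket2',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-keypre-objects',
            'Status': 'Enabled',
            # Only objects whose keys begin with keypre/ are affected.
            'Filter': {'Prefix': 'keypre/'},
            # Delete (or delete-mark, on versioned buckets) after 10 days.
            'Expiration': {'Days': 10},
        }]
    })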

Additional Resources

See the S3 GET bucket lifecycle section for details on getting a bucket lifecycle.

See the S3 create or replace a bucket lifecycle section for details on creating a bucket lifecycle.

See the S3 delete a bucket lifecycle section for details on deleting a bucket lifecycle.

S3 GET bucket lifecycle


Edit online
To get a bucket lifecycle, use GET and specify a destination bucket.

Syntax

GET /_BUCKET_?lifecycle HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Request Headers

See the S3 common request headers in Appendix B for more information about common request headers.

Response

The response contains the bucket lifecycle and its elements.

S3 create or replace a bucket lifecycle


Edit online
To create or replace a bucket lifecycle, use PUT and specify a destination bucket and a lifecycle configuration. The Ceph Object
Gateway only supports a subset of the S3 lifecycle functionality.

Syntax

PUT /_BUCKET_?lifecycle HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_


<LifecycleConfiguration>
<Rule>
<Expiration>
<Days>10</Days>
</Expiration>
</Rule>
...
<Rule>
</Rule>
</LifecycleConfiguration>

Request Headers

content-md5

Description
A base64-encoded MD5 hash of the message.

Valid Values
String No defaults or constraints.

Required
No

Reference
Edit online

See the S3 common request headers sfor more information on Amazon S3 common request headers.

See the S3 bucket lifecycles for more information on Amazon S3 bucket lifecycles.

S3 delete a bucket lifecycle


Edit online
To delete a bucket lifecycle, use DELETE and specify a destination bucket.

Syntax

DELETE /_BUCKET_?lifecycle HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Request Headers

The request does not contain any special elements.

Response

The response returns common response status.

Reference
Edit online

See the S3 common request headers for more information on Amazon S3 common request headers.

See the S3 common response status codes for more information on Amazon S3 common response status codes.

S3 get bucket location


Edit online
Retrieves the bucket’s zone group. The user needs to be the bucket owner to call this. A bucket can be constrained to a zone group
by providing LocationConstraint during a PUT request.

Add the location subresource to the bucket resource as shown below.

Syntax

GET /_BUCKET_?location HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Response Entities

LocationConstraint

Description
The zone group where bucket resides, an empty string for default zone group.

Type
String

S3 get bucket versioning


Edit online
Retrieves the versioning state of a bucket. The user needs to be the bucket owner to call this.

Add the versioning subresource to the bucket resource as shown below.

Syntax

GET /_BUCKET_?versioning HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

S3 put bucket versioning


Edit online
This subresource sets the versioning state of an existing bucket. The user needs to be the bucket owner to set the versioning state. If
the versioning state has never been set on a bucket, then it has no versioning state, and a GET versioning request does not return a
versioning state value.

Setting the bucket versioning state:

Enabled: Enables versioning for the objects in the bucket. All objects added to the bucket receive a unique version ID.

Suspended: Disables versioning for the objects in the bucket. All objects added to the bucket receive the version ID null.

Syntax

PUT /BUCKET?versioning HTTP/1.1

Example

PUT /testbucket?versioning HTTP/1.1

Bucket Request Entities

VersioningConfiguration

Description
A container for the request.

Type
Container

Status

Description
Sets the versioning state of the bucket. Valid Values: Suspended/Enabled

Type
String
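
A minimal boto3 sketch of the request follows, assuming the endpoint, credentials, and bucket shown; the Status value is either Enabled or Suspended as described above.

Example

import boto3

# Hypothetical endpoint, credentials, and bucket.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

client.put_bucket_versioning(
    Bucket='testbucket',
    VersioningConfiguration={'Status': 'Enabled'})

# The current state can be read back with a GET versioning request.
print(client.get_bucket_versioning(Bucket='testbucket').get('Status'))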

S3 get bucket access control lists


Edit online
Retrieves the bucket access control list. The user needs to be the bucket owner or to have been granted READ_ACP permission on
the bucket.

Add the acl subresource to the bucket request as shown below.

Syntax

GET /_BUCKET_?acl HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

Response Entities

AccessControlPolicy

Description
A container for the response.

Type
Container

AccessControlList

Description
A container for the ACL information.

Type
Container

Owner

Description

A container for the bucket owner’s ID and DisplayName.

Type
Container

ID

Description
The bucket owner’s ID.

Type
String

DisplayName

Description
The bucket owner’s display name.

Type
String

Grant

Description
A container for Grantee and Permission.

Type
Container

Grantee

Description
A container for the DisplayName and ID of the user receiving a grant of permission.

Type
Container

Permission

Description
The permission given to the Grantee bucket.

Type
String

S3 put bucket Access Control Lists


Edit online
Sets an access control to an existing bucket. The user needs to be the bucket owner or to have been granted WRITE_ACP permission
on the bucket.

Add the acl subresource to the bucket request as shown below.

Syntax

PUT /_BUCKET_?acl HTTP/1.1

Request Entities

AccessControlList

Description
A container for the ACL information.

Type
Container

Owner

Description
A container for the bucket owner’s ID and DisplayName.

Type
Container

ID

Description
The bucket owner’s ID.

Type
String

DisplayName

Description
The bucket owner’s display name.

Type
String

Grant

Description
A container for Grantee and Permission.

Type
Container

Grantee

Description
A container for the DisplayName and ID of the user receiving a grant of permission.

Type
Container

Permission

Description
The permission given to the Grantee bucket.

Type
String

S3 get bucket cors


Edit online
Retrieves the cors configuration information set for the bucket. The user needs to be the bucket owner or to have been granted
READ_ACP permission on the bucket.

Add the cors subresource to the bucket request as shown below.

Syntax

GET /_BUCKET_?cors HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

S3 put bucket cors


Edit online

Sets the cors configuration for the bucket. The user needs to be the bucket owner or to have been granted READ_ACP permission on
the bucket.

Add the cors subresource to the bucket request as shown below.

Syntax

PUT /_BUCKET_?cors HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

S3 delete a bucket cors


Edit online
Deletes the cors configuration information set for the bucket. The user needs to be the bucket owner or to have been granted
READ_ACP permission on the bucket.

Add the cors subresource to the bucket request as shown below.

Syntax

DELETE /_BUCKET_?cors HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

S3 list bucket object versions


Edit online
Returns a list of metadata about all the versions of objects within a bucket. Requires READ access to the bucket.

Add the versions subresource to the bucket request as shown below.

Syntax

GET /_BUCKET_?versions HTTP/1.1


Host: cname.domain.com

Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_

You can specify parameters for GET /_BUCKET_?versions, but none of them are required.

Parameters

prefix

Description
Returns in-progress uploads whose keys contain the specified prefix.

Type
String

delimiter

Description
The delimiter between the prefix and the rest of the object name.

Type
String

key-marker

Description
The beginning marker for the list of uploads.

Type
String

max-keys

Description
The maximum number of in-progress uploads. The default is 1000.

Type
Integer

version-id-marker

Description
Specifies the object version to begin the list.

Type
String

Response Entities

KeyMarker

Description
The key marker specified by the key-marker request parameter, if any.

Type
String

NextKeyMarker

Description
The key marker to use in a subsequent request if IsTruncated is true.

Type
String

NextUploadIdMarker

Description
The upload ID marker to use in a subsequent request if IsTruncated is true.

Type
String

IsTruncated

Description
If true, only a subset of the bucket’s upload contents were returned.

Type
Boolean

Size

Description
The size of the uploaded part.

Type
Integer

DisplayName

Description
The owner’s display name.

Type
String

ID

Description
The owner’s ID.

Type
String

Owner

Description
A container for the ID and DisplayName of the user who owns the object.

Type
Container

StorageClass

Description
The method used to store the resulting object. STANDARD or REDUCED_REDUNDANCY

Type
String

Version

Description
Container for the version information.

Type
Container

versionId

Description
Version ID of an object.

Type
String

versionIdMarker

Description
The last version of the key in a truncated response.

Type
String
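
The following boto3 sketch issues the same request and walks the returned version entries; the endpoint, credentials, and bucket name are assumptions, and the DeleteMarkers container shown is the standard S3 response element for versions that are delete markers.

Example

import boto3

# Hypothetical endpoint, credentials, and bucket.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

response = client.list_object_versions(Bucket='testbucket', MaxKeys=100)
for version in response.get('Versions', []):
    print(version['Key'], version['VersionId'], version['IsLatest'])
for marker in response.get('DeleteMarkers', []):
    print('delete marker:', marker['Key'], marker['VersionId'])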

S3 head bucket
Edit online
Calls HEAD on a bucket to determine if it exists and if the caller has access permissions. Returns 200 OK if the bucket exists and the
caller has permissions; 404 Not Found if the bucket does not exist; and, 403 Forbidden if the bucket exists but the caller does
not have access permissions.

Syntax

HEAD /_BUCKET_ HTTP/1.1


Host: cname.domain.com
Date: date
Authorization: AWS _ACCESS_KEY_:_HASH_OF_HEADER_AND_SECRET_
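
A minimal boto3 sketch of a head bucket check follows; the 200, 404, and 403 cases above surface as a successful call or a ClientError whose code can be inspected. The endpoint, credentials, and bucket name are assumptions.

Example

import boto3
from botocore.exceptions import ClientError

# Hypothetical endpoint, credentials, and bucket.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

try:
    client.head_bucket(Bucket='testbucket')
    print('bucket exists and is accessible')
except ClientError as error:
    # '404' means the bucket does not exist, '403' means access is denied.
    print('HEAD failed with status', error.response['Error']['Code'])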

S3 list multipart uploads


Edit online

GET /?uploads returns a list of the current in-progress multipart uploads, that is, multipart uploads that an application has
initiated but that have not yet been completed or aborted.

Syntax

GET /_BUCKET_?uploads HTTP/1.1

You can specify parameters for GET /_BUCKET_?uploads, but none of them are required.

Parameters

prefix

Description
Returns in-progress uploads whose keys contain the specified prefix.

Type
String

delimiter

Description
The delimiter between the prefix and the rest of the object name.

Type
String

key-marker

Description
The beginning marker for the list of uploads.

Type
String

max-keys

Description
The maximum number of in-progress uploads. The default is 1000.

Type
Integer

max-uploads

Description
The maximum number of multipart uploads. The range is from 1-1000. The default is 1000.

Type
Integer

upload-id-marker

Description
Ignored if key-marker isn’t specified. Specifies the ID of the first upload to list in lexicographical order at or following the ID.

Type
String

Response Entities

ListMultipartUploadsResult

Description
A container for the results.

Type
Container

ListMultipartUploadsResult.Prefix

Description

The prefix specified by the prefix request parameter, if any.

Type
String

Bucket

Description
The bucket that will receive the bucket contents.

Type
String

KeyMarker

Description
The key marker specified by the key-marker request parameter, if any.

Type
String

UploadIdMarker

Description
The marker specified by the upload-id-marker request parameter, if any.

Type
String

NextKeyMarker

Description
The key marker to use in a subsequent request if IsTruncated is true.

Type
String

NextUploadIdMarker

Description
The upload ID marker to use in a subsequent request if IsTruncated is true.

Type
String

MaxUploads

Description
The max uploads specified by the max-uploads request parameter.

Type
Integer

Delimiter

Description
If set, objects with the same prefix will appear in the CommonPrefixes list.

Type
String

IsTruncated

Description
If true, only a subset of the bucket’s upload contents were returned.

Type
Boolean

Upload

Description
A container for Key, UploadId, InitiatorOwner, StorageClass, and Initiated elements.

Type
Container

Key

Description
The key of the object once the multipart upload is complete.

Type
String

UploadId

Description
The ID that identifies the multipart upload.

Type
String

Initiator

Description
Contains the ID and DisplayName of the user who initiated the upload.

Type
Container

DisplayName

Description
The initiator’s display name.

Type
String

ID

Description
The initiator’s ID.

Type
String

Owner

Description
A container for the ID and DisplayName of the user who owns the uploaded object.

Type
Container

StorageClass

Description
The method used to store the resulting object. STANDARD or REDUCED_REDUNDANCY

Type
String

Initiated

Description
The date and time the user initiated the upload.

Type
Date

CommonPrefixes

Description
If multiple objects contain the same prefix, they will appear in this list.

Type
Container

CommonPrefixes.Prefix

Description
The substring of the key after the prefix as defined by the prefix request parameter.

Type
String
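
The following boto3 sketch lists in-progress multipart uploads using the parameters described above; the endpoint, credentials, bucket name, and prefix are assumptions.

Example

import boto3

# Hypothetical endpoint, credentials, and bucket.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

response = client.list_multipart_uploads(Bucket='testbucket',
                                         Prefix='images/',
                                         MaxUploads=100)
for upload in response.get('Uploads', []):
    print(upload['Key'], upload['UploadId'], upload['Initiated'])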

S3 bucket policies
Edit online
The Ceph Object Gateway supports a subset of the Amazon S3 policy language applied to buckets.

Creation and Removal

Ceph Object Gateway manages S3 Bucket policies through standard S3 operations rather than using the radosgw-admin CLI tool.

Administrators may use the s3cmd command to set or delete a policy.

Example

$ cat > examplepol


{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["arn:aws:iam::usfolks:user/fred"]},
"Action": "s3:PutObjectAcl",
"Resource": [
"arn:aws:s3:::happybucket/*"
]
}]
}

$ s3cmd setpolicy examplepol s3://happybucket


$ s3cmd delpolicy s3://happybucket
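
The same policy can also be set, read back, and removed through the S3 API itself, for example with boto3. The following sketch reuses the policy document above; the endpoint, credentials, and bucket name are assumptions.

Example

import json

import boto3

# Hypothetical endpoint and credentials.
client = boto3.client('s3',
    aws_access_key_id='TESTER',
    aws_secret_access_key='test123',
    endpoint_url='https://www.example.com',
    region_name='')

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam::usfolks:user/fred"]},
        "Action": "s3:PutObjectAcl",
        "Resource": ["arn:aws:s3:::happybucket/*"]
    }]
}

client.put_bucket_policy(Bucket='happybucket', Policy=json.dumps(policy))
print(client.get_bucket_policy(Bucket='happybucket')['Policy'])
client.delete_bucket_policy(Bucket='happybucket')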

Limitations

Ceph Object Gateway only supports the following S3 actions:

s3:AbortMultipartUpload

s3:CreateBucket

s3:DeleteBucketPolicy

s3:DeleteBucket

s3:DeleteBucketWebsite

s3:DeleteObject

s3:DeleteObjectVersion

s3:GetBucketAcl

s3:GetBucketCORS

s3:GetBucketLocation

s3:GetBucketPolicy

s3:GetBucketRequestPayment

s3:GetBucketVersioning

s3:GetBucketWebsite

s3:GetLifecycleConfiguration

s3:GetObjectAcl

s3:GetObject

s3:GetObjectTorrent

s3:GetObjectVersionAcl

s3:GetObjectVersion

s3:GetObjectVersionTorrent

s3:ListAllMyBuckets

s3:ListBucketMultiPartUploads

s3:ListBucket

s3:ListBucketVersions

s3:ListMultipartUploadParts

s3:PutBucketAcl

s3:PutBucketCORS

s3:PutBucketPolicy

s3:PutBucketRequestPayment

s3:PutBucketVersioning

s3:PutBucketWebsite

s3:PutLifecycleConfiguration

s3:PutObjectAcl

s3:PutObject

s3:PutObjectVersionAcl

NOTE: Ceph Object Gateway does not support setting policies on users, groups, or roles.

The Ceph Object Gateway uses the RGW ‘tenant’ identifier in place of the Amazon twelve-digit account ID. Ceph Object Gateway
administrators who want to use policies between Amazon Web Service (AWS) S3 and Ceph Object Gateway S3 will have to use the
Amazon account ID as the tenant ID when creating users.

With AWS S3, all tenants share a single namespace. By contrast, Ceph Object Gateway gives every tenant its own namespace of
buckets. At present, Ceph Object Gateway clients trying to access a bucket belonging to another tenant MUST address it as
tenant:bucket in the S3 request.

In the AWS, a bucket policy can grant access to another account, and that account owner can then grant access to individual users
with user permissions. Since Ceph Object Gateway does not yet support user, role, and group permissions, account owners will need
to grant access directly to individual users.

IMPORTANT: Granting an entire account access to a bucket grants access to ALL users in that account.

Bucket policies do NOT support string interpolation.

Ceph Object Gateway supports the following condition keys:

aws:CurrentTime

aws:EpochTime

aws:PrincipalType

aws:Referer

aws:SecureTransport

aws:SourceIp

aws:UserAgent

aws:username

Ceph Object Gateway ONLY supports the following condition keys for the ListBucket action:

s3:prefix

s3:delimiter

s3:max-keys

Impact on Swift

Ceph Object Gateway provides no functionality to set bucket policies under the Swift API. However, bucket policies that have been
set with the S3 API govern Swift as well as S3 operations.

Ceph Object Gateway matches Swift credentials against Principals specified in a policy.

S3 get the request payment configuration on a bucket


Edit online
Uses the requestPayment subresource to return the request payment configuration of a bucket. The user needs to be the bucket
owner or to have been granted READ_ACP permission on the bucket.

Add the requestPayment subresource to the bucket request as shown below.

Syntax

GET /BUCKET?requestPayment HTTP/1.1
Host: cname.domain.com
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

S3 set the request payment configuration on a bucket


Edit online
Uses the requestPayment subresource to set the request payment configuration of a bucket. By default, the bucket owner pays for
downloads from the bucket. This configuration parameter enables the bucket owner to specify that the person requesting the
download will be charged for the request and the data download from the bucket.

Add the requestPayment subresource to the bucket request as shown below.

Syntax

PUT /BUCKET?requestPayment HTTP/1.1


Host: cname.domain.com

Request Entities

Payer

Description
Specifies who pays for the download and request fees.

Type
Enum

RequestPaymentConfiguration

Description
A container for Payer.

Type
Container
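
The same configuration can be set and read programmatically. The following is a minimal boto3 sketch; the endpoint, credentials, and bucket name are placeholders for your environment, and the Requester value follows the AWS S3 convention for the Payer element:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Set the request payment configuration so the requester pays for downloads.
s3.put_bucket_request_payment(
    Bucket='testbucket',
    RequestPaymentConfiguration={'Payer': 'Requester'},
)

# Read the configuration back; the response contains the Payer element.
print(s3.get_bucket_request_payment(Bucket='testbucket')['Payer'])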

Multi-tenant bucket operations


Edit online
When a client application accesses buckets, it always operates with the credentials of a particular user. Consequently, every bucket
operation has an implicit tenant in its context if no tenant is specified explicitly. Thus multi-tenancy is completely backward
compatible with previous releases, as long as the referred buckets and referring user belong to the same tenant.

Extensions employed to specify an explicit tenant differ according to the protocol and authentication system used.

In the following example, a colon character separates tenant and bucket. Thus a sample URL would be:

https://rgw.domain.com/tenant:bucket

By contrast, a simple Python example separates the tenant and bucket in the bucket method itself:

Example

from boto.s3.connection import S3Connection, OrdinaryCallingFormat


c = S3Connection(

aws_access_key_id="_home_markdown_jenkins_workspace_Transform_in_SSEG27_5.3_developer_con_api_multi
-tenant-bucket-operations_TESTER",
aws_secret_access_key="test123",
host="rgw.domain.com",
calling_format = OrdinaryCallingFormat()
)
bucket = c.get_bucket("tenant:bucket")

NOTE: It’s not possible to use S3-style subdomains using multi-tenancy, since host names cannot contain colons or any other
separators that are not already valid in bucket names. Using a period creates an ambiguous syntax. Therefore, the bucket-in-URL-
path format has to be used with multi-tenancy.

Reference
Edit online

See the Multi Tenancy section under User Management for additional details.

S3 Block Public Access


Edit online
You can use the S3 Block Public Access feature on access points, buckets, and accounts to help you manage public access to
Amazon S3 resources.

Using this feature, bucket policies, access point policies, and object permissions that allow public access can be overridden. By
default, new buckets, access points, and objects do not allow public access.

The S3 API in the Ceph Object Gateway supports a subset of the AWS public access settings:

BlockPublicPolicy
This defines the setting to allow users to manage access point and bucket policies. This setting does not allow users to
publicly share the bucket or the objects it contains. Existing access point and bucket policies are not affected by enabling this
setting.

Setting this option to TRUE causes the S3:

To reject calls to PUT Bucket policy.

To reject calls to PUT access point policy for all of the bucket's same-account access points.

IMPORTANT: Apply this setting at the account level so that users cannot alter a specific bucket's block public access setting.

NOTE: The TRUE setting only works if the specified policy allows public access.

RestrictPublicBuckets
This defines the setting to restrict access to a bucket or access point with public policy. The restriction applies to only AWS
service principals and authorized users within the bucket owner's account and access point owner's account.

This blocks cross-account access to the access point or bucket, except for the cases specified, while still allowing users within the
account to manage the access points or buckets.

Enabling this setting does not affect existing access point or bucket policies. It only defines that Amazon S3 blocks public and cross-
account access derived from any public access point or bucket policy, including non-public delegation to specific accounts.

NOTE: Access control lists (ACLs) are not currently supported by IBM Storage Ceph.

Bucket policies are assumed to be public unless defined otherwise. To block public access a bucket policy must give access only to
fixed values for one or more of the following:

NOTE: A fixed value does not contain a wildcard (*) or an AWS Identity and Access Management Policy Variable.

An AWS principal, user, role, or service principal

A set of Classless Inter-Domain Routings (CIDRs), using aws:SourceIp

aws:SourceArn

aws:SourceVpc

aws:SourceVpce

aws:SourceOwner

aws:SourceAccount

s3:x-amz-server-side-encryption-aws-kms-key-id

aws:userid, outside the pattern AROLEID:*

s3:DataAccessPointArn

NOTE: When used in a bucket policy, this value can contain a wildcard for the access point name without rendering the policy public, as long as the account ID is fixed.

s3:DataAccessPointAccount

The following example policy is considered public.

Example

{
"Principal": "*",
"Resource": "*",
"Action": "s3:PutObject",
"Effect": "Allow",
"Condition": { "StringLike": {"aws:SourceVpc": "vpc-*"}}
}

To make a policy non-public, include any of the condition keys with a fixed value.

Example

{
"Principal": "*",
"Resource": "*",
"Action": "s3:PutObject",

IBM Storage Ceph 1021


"Effect": "Allow",
"Condition": {"StringEquals": {"aws:SourceVpc": "vpc-91237329"}}
}

Additional Resources
Edit online

For more information about getting a PublicAccessBlock see S3 GET PublicAccessBlock.

For more information about creating or modifying a PublicAccessBlock see S3 PUT PublicAccessBlock.

For more information about deleting a PublicAccessBlock see S3 delete PublicAccessBlock.

For more information about bucket policies, see S3 bucket policies.

See the Blocking public access to your Amazon S3 storage section of Amazon Simple Storage Service (S3) documentation.

S3 GET PublicAccessBlock

Edit online
To get the S3 Block Public Access configuration, use GET and specify a destination AWS account.

Syntax

GET /v20180820/configuration/publicAccessBlock HTTP/1.1


Host: cname.domain.com
x-amz-account-id: AccountID

Request headers

For more information about common request headers, see S3 common request headers.

Response

The response is an HTTP 200 response and is returned in XML format.

Additional Resources
Edit online

For more information about the S3 Public Access Block feature, see S3 Block Public Access.

S3 PUT PublicAccessBlock

Edit online
Use this to create or modify the PublicAccessBlock configuration for an S3 bucket.

To use this operation, you must have the s3:PutBucketPublicAccessBlock permission.

IMPORTANT: If the PublicAccessBlock configuration is different between the bucket and the account, Amazon S3 uses the most
restrictive combination of the bucket-level and account-level settings.

Syntax

PUT /?publicAccessBlock HTTP/1.1


Host: Bucket.s3.amazonaws.com
Content-MD5: ContentMD5
x-amz-sdk-checksum-algorithm: ChecksumAlgorithm
x-amz-expected-bucket-owner: ExpectedBucketOwner
<?xml version="1.0" encoding="UTF-8"?>
<PublicAccessBlockConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<BlockPublicAcls>boolean</BlockPublicAcls>
<IgnorePublicAcls>boolean</IgnorePublicAcls>
<BlockPublicPolicy>boolean</BlockPublicPolicy>
<RestrictPublicBuckets>boolean</RestrictPublicBuckets>
</PublicAccessBlockConfiguration>

Request headers

For more information about common request headers, see S3 common request headers.

Response

The response is an HTTP 200 response and is returned with an empty HTTP body.
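
The equivalent operation can be issued from an S3 client library. The following is a minimal boto3 sketch; the endpoint, credentials, and bucket name are placeholders for your environment:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Apply a PublicAccessBlock configuration to the bucket.
s3.put_public_access_block(
    Bucket='testbucket',
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True,
    },
)

# Retrieve the configuration that was just applied.
print(s3.get_public_access_block(Bucket='testbucket')['PublicAccessBlockConfiguration'])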

Additional Resources
Edit online

For more information about the S3 Public Access Block feature, see S3 Block Public Access.

S3 delete PublicAccessBlock

Edit online
Use this to delete the PublicAccessBlock configuration for an S3 bucket.

Syntax

DELETE /v20180820/configuration/publicAccessBlock HTTP/1.1


Host: s3-control.amazonaws.com
x-amz-account-id: AccountId

Request headers

For more information about common request headers, see S3 common request headers.

Response

The response is an HTTP 200 response and is returned with an empty HTTP body.

Additional Resources
Edit online

For more information about the S3 Public Access Block feature, see S3 Block Public Access.

S3 object operations
Edit online
As a developer, you can perform object operations with the Amazon S3 application programming interface (API) through the Ceph
Object Gateway.

The following table list the Amazon S3 functional operations for objects, along with the function's support status.

Feature Status
Get Object Supported
Get Object Information Supported
Put Object Lock Supported
Get Object Lock Supported
Put Object Legal Hold Supported
Get Object Legal Hold Supported
Put Object Retention Supported
Get Object Retention Supported
Put Object Tagging Supported
Get Object Tagging Supported
Delete Object Tagging Supported
Put Object Supported
Delete Object Supported
Delete Multiple Objects Supported
Get Object ACLs Supported
Put Object ACLs Supported
Copy Object Supported
Post Object Supported
Options Object Supported
Initiate Multipart Upload Supported
Add a Part to a Multipart Upload Supported
List Parts of a Multipart Upload Supported
Assemble Multipart Upload Supported
Copy Multipart Upload Supported
Abort Multipart Upload Supported
Multi-Tenancy Supported

Prerequisites
S3 get an object from a bucket
S3 get information on an object
S3 put object lock
S3 get object lock
S3 put object legal hold
S3 get object legal hold
S3 put object retention
S3 get object retention
S3 put object tagging
S3 get object tagging
S3 delete object tagging
S3 add an object to a bucket
S3 delete an object
S3 delete multiple objects
S3 get an object’s Access Control List (ACL)
S3 set an object’s Access Control List (ACL)
S3 copy an object
S3 add an object to a bucket using HTML forms
S3 determine options for a request
S3 initiate a multipart upload
S3 add a part to a multipart upload
S3 list the parts of a multipart upload
S3 assemble the uploaded parts
S3 copy a multipart upload
S3 abort a multipart upload
S3 Hadoop interoperability

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.


S3 get an object from a bucket


Edit online
Retrieves an object from a bucket:

Syntax

GET /BUCKET/OBJECT HTTP/1.1

Add the versionId subresource to retrieve a particular version of the object:

Syntax

GET /BUCKET/OBJECT?versionId=VERSION_ID HTTP/1.1

Request Headers

range

Description
The range of the object to retrieve.

Valid Values
Range:bytes=beginbyte-endbyte

Required
No

if-modified-since

Description
Gets only if modified since the timestamp.

Valid Values
Timestamp

Required
No

if-match

Description
Gets only if object ETag matches ETag.

Valid Values
Entity Tag

Required
No

if-none-match

Description
Gets only if the object ETag does not match the supplied ETag.

Valid Values
Entity Tag

Required
No

Response Headers

Content-Range

Description
Data range, will only be returned if the range header field was specified in the request.

x-amz-version-id

Description
Returns the version ID or null.
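
The following is a minimal boto3 sketch of the same request; the endpoint, credentials, bucket, object name, and version ID are placeholders, and it shows the range and versionId options described above:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Retrieve the first kilobyte of the current version of the object.
partial = s3.get_object(Bucket='testbucket', Key='testobject', Range='bytes=0-1023')
print(partial['Body'].read())

# Retrieve a specific version of the object; VERSION_ID is a placeholder.
versioned = s3.get_object(Bucket='testbucket', Key='testobject', VersionId='VERSION_ID')
print(versioned.get('VersionId'))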

S3 get information on an object


Edit online
Returns information about an object. This request will return the same header information as with the Get Object request, but will
include the metadata only, not the object data payload.

Retrieves the current version of the object:

Syntax

HEAD /BUCKET/OBJECT HTTP/1.1

Add the versionId subresource to retrieve info for a particular version:

Syntax

HEAD /BUCKET/OBJECT?versionId=VERSION_ID HTTP/1.1

Request Headers

range

Description
The range of the object to retrieve.

Valid Values
Range:bytes=beginbyte-endbyte

Required
No

if-modified-since

Description
Gets only if modified since the timestamp.

Valid Values
Timestamp

Required
No

if-match

Description
Gets only if object ETag matches ETag.

Valid Values
Entity Tag

Required
No

if-none-match

Description
Gets only if the object ETag does not match the supplied ETag.

Valid Values
Entity Tag

Required
No

Response Headers

x-amz-version-id

Description
Returns the version ID or null.

S3 put object lock


Edit online
The put object lock API places a lock configuration on the selected bucket. With object lock, you can store objects using a write-once-read-many (WORM) model. Object lock ensures that an object is not deleted or overwritten for a fixed amount of time or
indefinitely. The rule specified in the object lock configuration is applied by default to every new object placed in the selected bucket.

IMPORTANT: Enable the object lock when creating a bucket otherwise, the operation fails.

Syntax

PUT /BUCKET?object-lock HTTP/1.1

Example

PUT /testbucket?object-lock HTTP/1.1

Request Entities

ObjectLockConfiguration

Description
A container for the request.

Type
Container

Required
Yes

ObjectLockEnabled

Description
Indicates whether this bucket has an object lock configuration enabled.

Type
String

Required
Yes

Rule

Description
The object lock rule in place for the specified bucket.

Type
Container

Required
No

DefaultRetention

Description

The default retention period applied to new objects placed in the specified bucket.

Type
Container

Required
No

Mode

Description
The default object lock retention mode. Valid values: GOVERNANCE/COMPLIANCE.

Type
Container

Required
Yes

Days

Description
The number of days specified for the default retention period.

Type
Integer

Required
No

Years

Description
The number of years specified for the default retention period.

Type
Integer

Required
No

HTTP Response

400

Status Code
MalformedXML

Description
The XML is not well-formed.

409

Status Code
InvalidBucketState

Description
The bucket object lock is not enabled.
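
A minimal boto3 sketch of the call follows, with a placeholder endpoint, credentials, and bucket name. Note that, as stated above, object lock must be enabled when the bucket is created:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# The bucket must be created with object lock enabled.
s3.create_bucket(Bucket='lockedbucket', ObjectLockEnabledForBucket=True)

# Apply a default retention rule of 30 days in GOVERNANCE mode.
s3.put_object_lock_configuration(
    Bucket='lockedbucket',
    ObjectLockConfiguration={
        'ObjectLockEnabled': 'Enabled',
        'Rule': {'DefaultRetention': {'Mode': 'GOVERNANCE', 'Days': 30}},
    },
)

# Read the lock configuration back.
print(s3.get_object_lock_configuration(Bucket='lockedbucket'))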

Reference
Edit online

For more information about this API call, see S3 API.

S3 get object lock

Edit online
The get object lock API retrieves the lock configuration for a bucket.

Syntax

GET /BUCKET?object-lock HTTP/1.1

Example

GET /testbucket?object-lock HTTP/1.1

Response Entities

ObjectLockConfiguration

Description
A container for the request.

Type
Container

Required
Yes

ObjectLockEnabled

Description
Indicates whether this bucket has an object lock configuration enabled.

Type
String

Required
Yes

Rule

Description
The object lock rule is in place for the specified bucket.

Type
Container

Required
No

DefaultRetention

Description
The default retention period applied to new objects placed in the specified bucket.

Type
Container

Required
No

Mode

Description
The default object lock retention mode. Valid values: GOVERNANCE/COMPLIANCE.

Type
Container

Required
Yes

Days

Description

The number of days specified for the default retention period.

Type
Integer

Required
No

Years

Description
The number of years specified for the default retention period.

Type
Integer

Required
No

Reference
Edit online

For more information about this API call, see S3 API.

S3 put object legal hold


Edit online
The put object legal hold API applies a legal hold configuration to the selected object. With a legal hold in place, you cannot
overwrite or delete an object version. A legal hold does not have an associated retention period and remains in place until you
explicitly remove it.

Syntax

PUT /BUCKET/OBJECT?legal-hold&versionId= HTTP/1.1

Example

PUT /testbucket/testobject?legal-hold&versionId= HTTP/1.1

The versionId subresource retrieves a particular version of the object.

Request Entities

LegalHold

Description
A container for the request.

Type
Container

Required
Yes

Status

Description
Indicates whether the specified object has a legal hold in place. Valid values: ON/OFF

Type
String

Required
Yes
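
A minimal boto3 sketch of this request follows; the endpoint, credentials, bucket, and object name are placeholders:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Place a legal hold on the object; it cannot be overwritten or deleted while ON.
s3.put_object_legal_hold(
    Bucket='testbucket',
    Key='testobject',
    LegalHold={'Status': 'ON'},
)

# Remove the hold later by setting the status to OFF.
s3.put_object_legal_hold(
    Bucket='testbucket',
    Key='testobject',
    LegalHold={'Status': 'OFF'},
)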

Reference
Edit online

For more information about this API call, see S3 API.

S3 get object legal hold


Edit online
The get object legal hold API retrieves an object’s current legal hold status.

Syntax

GET /BUCKET/OBJECT?legal-hold&versionId= HTTP/1.1

Example

GET /testbucket/testobject?legal-hold&versionId= HTTP/1.1

The versionId subresource retrieves a particular version of the object.

Response Entities

LegalHold

Description
A container for the request.

Type
Container

Required
Yes

Status

Description
Indicates whether the specified object has a legal hold in place. Valid values: ON/OFF

Type
String

Required
Yes

Reference
Edit online

For more information about this API call, see S3 API.

S3 put object retention


Edit online
The put object retention API places an object retention configuration on an object. A retention period protects an object version for a
fixed amount of time. There are two modes: governance mode and compliance mode. These two retention modes apply different
levels of protection to your objects.

NOTE: During this period, your object is write-once-read-many (WORM) protected and cannot be overwritten or deleted.

Syntax

PUT /BUCKET/OBJECT?retention&versionId= HTTP/1.1

Example

PUT /testbucket/testobject?retention&versionId= HTTP/1.1

The versionId subresource retrieves a particular version of the object.

Request Entities

Retention

Description
A container for the request.

Type
Container

Required
Yes

Mode

Description
Retention mode for the specified object. Valid values: GOVERNANCE/COMPLIANCE

Type
String

Required
Yes

RetainUntilDate

Description
Retention date. Format: 2020-01-05T00:00:00.000Z

Type
Timestamp

Required
Yes
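
A minimal boto3 sketch of this request follows; the endpoint, credentials, bucket, object name, and retention date are placeholders:

Example

import boto3
from datetime import datetime, timezone

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Protect the object version in GOVERNANCE mode until the given date.
s3.put_object_retention(
    Bucket='testbucket',
    Key='testobject',
    Retention={
        'Mode': 'GOVERNANCE',
        'RetainUntilDate': datetime(2030, 1, 5, tzinfo=timezone.utc),
    },
)

# Read the retention configuration back.
print(s3.get_object_retention(Bucket='testbucket', Key='testobject')['Retention'])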

Reference
Edit online

For more information about this API call, see S3 API.

S3 get object retention


Edit online
The get object retention API retrieves an object retention configuration on an object.

Syntax

GET /BUCKET/OBJECT?retention&versionId= HTTP/1.1

Example

GET /testbucket/testobject?retention&versionId= HTTP/1.1

The versionId subresource retrieves a particular version of the object.

Response Entities

Retention

Description
A container for the request.

Type
Container

Required
Yes

Mode

Description
Retention mode for the specified object. Valid values: GOVERNANCE/COMPLIANCE

Type
String

Required
Yes

RetainUntilDate

Description
Retention date. Format: 2020-01-05T00:00:00.000Z

Type
Timestamp

Required
Yes

Reference
Edit online

For more information about this API call, see S3 API.

S3 put object tagging


Edit online
The put object tagging API associates tags with an object. A tag is a key-value pair. To put tags of any other version, use the
versionId query parameter. You must have permission to perform the s3:PutObjectTagging action. By default, the bucket
owner has this permission and can grant this permission to others.

Syntax

PUT /BUCKET/OBJECT?tagging&versionId= HTTP/1.1

Example

PUT /testbucket/testobject?tagging&versionId= HTTP/1.1

Request Entities

Tagging

Description
A container for the request.

Type
Container

Required
Yes

TagSet

Description
A collection of a set of tags.

Type
String

Required
Yes
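
A minimal boto3 sketch of this request follows; the endpoint, credentials, bucket, object name, and tag values are placeholders:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Associate a set of key-value tags with the object.
s3.put_object_tagging(
    Bucket='testbucket',
    Key='testobject',
    Tagging={'TagSet': [{'Key': 'project', 'Value': 'analytics'}]},
)

# Read the tag set back.
print(s3.get_object_tagging(Bucket='testbucket', Key='testobject')['TagSet'])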

Reference
Edit online

For more information about this API call, see S3 API.

S3 get object tagging


Edit online
The get object tagging API returns the tag of an object. By default, the GET operation returns information on the current version of an
object.

NOTE: For a versioned bucket, you can have multiple versions of an object in your bucket. To retrieve tags of any other version, add
the versionId query parameter in the request.

Syntax

GET /BUCKET/OBJECT?tagging&versionId= HTTP/1.1

Example

GET /testbucket/testobject?tagging&versionId= HTTP/1.1

Reference
Edit online

For more information about this API call, see S3 API.

S3 delete object tagging


Edit online
The delete object tagging API removes the entire tag set from the specified object. To use this operation, you must have permission
to perform the s3:DeleteObjectTagging action.

NOTE:

To delete tags of a specific object version, add the versionId query parameter in the request.

Syntax

DELETE /BUCKET/OBJECT?tagging&versionId= HTTP/1.1

Example

DELETE /testbucket/testobject?tagging&versionId= HTTP/1.1

Reference
Edit online

For more information about this API call, see S3 API.

S3 add an object to a bucket
Edit online
Adds an object to a bucket. You must have write permissions on the bucket to perform this operation.

Syntax

PUT /BUCKET/OBJECT HTTP/1.1

Request Headers

content-md5

Description
A base64-encoded MD5 hash of the message.

Valid Values
A string. No defaults or constraints.

Required
No

content-type

Description
A standard MIME type.

Valid Values
Any MIME type. Default: binary/octet-stream.

Required
No

x-amz-meta-<...>

Description
User metadata. Stored with the object.

Valid Values
A string up to 8kb. No defaults.

Required
No

x-amz-acl

Description
A canned ACL.

Valid Values
private, public-read, public-read-write, authenticated-read

Required
No

Response Headers

x-amz-version-id

Description
Returns the version ID or null.
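
A minimal boto3 sketch of the upload follows; the endpoint, credentials, bucket, and object name are placeholders. The Metadata entries map to the x-amz-meta-* headers and ACL maps to x-amz-acl:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Upload a small text object with a content type, user metadata, and a canned ACL.
response = s3.put_object(
    Bucket='testbucket',
    Key='notes/hello.txt',
    Body=b'hello world',
    ContentType='text/plain',
    Metadata={'owner': 'ops-team'},
    ACL='private',
)
print(response['ETag'])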

S3 delete an object
Edit online

Removes an object. Requires WRITE permission on the containing bucket. If object versioning is enabled, the operation creates a delete marker.

Syntax

DELETE /BUCKET/OBJECT HTTP/1.1

To delete an object when versioning is on, you must specify the versionId subresource and the version of the object to delete.

DELETE /BUCKET/OBJECT?versionId=VERSION_ID HTTP/1.1

S3 delete multiple objects


Edit online
This API call deletes multiple objects from a bucket.

Syntax

POST /BUCKET/OBJECT?delete HTTP/1.1

S3 get an object’s Access Control List (ACL)


Edit online
Returns the ACL for the current version of the object:

Syntax

GET /BUCKET/OBJECT?acl HTTP/1.1

Add the versionId subresource to retrieve the ACL for a particular version:

Syntax

GET /BUCKET/OBJECT?versionId=VERSION_ID&acl HTTP/1.1

Response Headers

x-amz-version-id

Description
Returns the version ID or null.

Response Entities

AccessControlPolicy

Description
A container for the response.

Type
Container

AccessControlList

Description
A container for the ACL information.

Type
Container

Owner

Description
A container for the bucket owner’s ID and DisplayName.

Type
Container

ID

Description
The bucket owner’s ID.

Type
String

DisplayName

Description
The bucket owner’s display name.

Type
String

Grant

Description
A container for Grantee and Permission.

Type
Container

Grantee

Description
A container for the DisplayName and ID of the user receiving a grant of permission.

Type
Container

Permission

Description
The permission given to the Grantee.

Type
String

S3 set an object’s Access Control List (ACL)


Edit online
Sets an object ACL for the current version of the object.

Syntax

PUT /BUCKET/OBJECT?acl

Request Entities

AccessControlPolicy

Description
A container for the response.

Type
Container

AccessControlList

Description
A container for the ACL information.

Type

Container

Owner

Description
A container for the bucket owner’s ID and DisplayName.

Type
Container

ID

Description
The bucket owner’s ID.

Type
String

DisplayName

Description
The bucket owner’s display name.

Type
String

Grant

Description
A container for Grantee and Permission.

Type
Container

Grantee

Description
A container for the DisplayName and ID of the user receiving a grant of permission.

Type
Container

Permission

Description
The permission given to the Grantee.

Type
String

S3 copy an object
Edit online
To copy an object, use PUT and specify a destination bucket and the object name.

Syntax

PUT /DEST_BUCKET/DEST_OBJECT HTTP/1.1
x-amz-copy-source: SOURCE_BUCKET/SOURCE_OBJECT

Request Headers

x-amz-copy-source

Description
The source bucket name + object name.

Valid Values

BUCKET/OBJECT

Required
Yes

x-amz-acl

Description
A canned ACL.

Valid Values
private, public-read, public-read-write, authenticated-read

Required
No

x-amz-copy-if-modified-since

Description
Copies only if modified since the timestamp.

Valid Values
Timestamp

Required
No

x-amz-copy-if-unmodified-since

Description
Copies only if unmodified since the timestamp.

Valid Values
Timestamp

Required
No

x-amz-copy-if-match

Description
Copies only if object ETag matches ETag.

Valid Values
Entity Tag

Required
No

x-amz-copy-if-none-match

Description
Copies only if the object ETag does not match the supplied ETag.

Valid Values
Entity Tag

Required
No

Response Entities

CopyObjectResult

Description
A container for the response elements.

Type
Container

LastModified

Description
The last modified date of the source object.

Type
Date

Etag

Description
The ETag of the new object.

Type
String
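
A minimal boto3 sketch of the copy follows; the endpoint, credentials, and the bucket and object names are placeholders:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Copy an object from a source bucket to a destination bucket.
result = s3.copy_object(
    Bucket='destbucket',
    Key='destobject',
    CopySource={'Bucket': 'sourcebucket', 'Key': 'sourceobject'},
)

# The CopyObjectResult element carries the new object's ETag and LastModified values.
print(result['CopyObjectResult']['ETag'])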

S3 add an object to a bucket using HTML forms


Edit online
Adds an object to a bucket using HTML forms. You must have write permissions on the bucket to perform this operation.

Syntax

POST /BUCKET/OBJECT HTTP/1.1

S3 determine options for a request


Edit online
A preflight request to determine if an actual request can be sent with the specific origin, HTTP method, and headers.

Syntax

OPTIONS /OBJECT HTTP/1.1

S3 initiate a multipart upload


Edit online
Initiates a multi-part upload process. Returns an UploadId, which you can specify when adding additional parts, listing parts, and
completing or abandoning a multi-part upload.

Syntax

POST /BUCKET/OBJECT?uploads

Request Headers

content-md5

Description
A base64-encoded MD5 hash of the message.

Valid Values
A string. No defaults or constraints.

Required
No

content-type

Description
A standard MIME type.

Valid Values

Any MIME type. Default: binary/octet-stream

Required
No

x-amz-meta-<...>

Description
User metadata. Stored with the object.

Valid Values
A string up to 8kb. No defaults.

Required
No

x-amz-acl

Description
A canned ACL.

Valid Values
private, public-read, public-read-write, authenticated-read

Required
No

Response Entities

InitiatedMultipartUploadsResult

Description
A container for the results.

Type
Container

Bucket

Description
The bucket that will receive the object contents.

Type
String

Key

Description
The key specified by the key request parameter, if any.

Type
String

UploadId

Description
The ID specified by the upload-id request parameter identifying the multipart upload, if any.

Type
String

S3 add a part to a multipart upload


Edit online
Adds a part to a multi-part upload.

Specify the uploadId subresource and the upload ID to add a part to a multi-part upload:

Syntax

PUT /BUCKET/OBJECT?partNumber=PART_NUMBER&uploadId=UPLOAD_ID HTTP/1.1

The following HTTP response might be returned:

HTTP Response

404

Status Code
NoSuchUpload

Description
Specified upload-id does not match any initiated upload on this object.

S3 list the parts of a multipart upload


Edit online
Specify the uploadId subresource and the upload ID to list the parts of a multi-part upload:

Syntax

GET /BUCKET/OBJECT?uploadId=UPLOAD_ID HTTP/1.1

Response Entities

InitiatedMultipartUploadsResult

Description
A container for the results.

Type
Container

Bucket

Description
The bucket that will receive the object contents.

Type
String

Key

Description
The key specified by the key request parameter, if any.

Type
String

UploadId

Description
The ID specified by the upload-id request parameter identifying the multipart upload, if any.

Type
String

Initiator

Description
Contains the ID and DisplayName of the user who initiated the upload.

Type
Container

ID

Description
The initiator’s ID.

Type
String

DisplayName

Description
The initiator’s display name.

Type
String

Owner

Description
A container for the ID and DisplayName of the user who owns the uploaded object.

Type
Container

StorageClass

Description
The method used to store the resulting object. STANDARD or REDUCED_REDUNDANCY

Type
String

PartNumberMarker

Description
The part marker to use in a subsequent request if IsTruncated is true. Precedes the list.

Type
String

NextPartNumberMarker

Description
The next part marker to use in a subsequent request if IsTruncated is true. The end of the list.

Type
String

IsTruncated

Description
If true, only a subset of the object’s upload contents were returned.

Type
Boolean

Part

Description
A container for Key, Part, InitiatorOwner, StorageClass, and Initiated elements.

Type
Container

PartNumber

Description
The identifier of the part.

Type
Integer

ETag

Description
The part’s entity tag.

Type
String

Size

Description
The size of the uploaded part.

Type
Integer

S3 assemble the uploaded parts


Edit online
Assembles uploaded parts and creates a new object, thereby completing a multipart upload.

Specify the uploadId subresource and the upload ID to complete a multi-part upload:

Syntax

POST /BUCKET/OBJECT?uploadId=UPLOAD_ID HTTP/1.1

Request Entities

CompleteMultipartUpload

Description
A container consisting of one or more parts.

Type
Container

Required
Yes

Part

Description
A container for the PartNumber and ETag.

Type
Container

Required
Yes

PartNumber

Description
The identifier of the part.

Type
Integer

Required
Yes

ETag

Description
The part’s entity tag.

Type
String

Required
Yes

Response Entities

CompleteMultipartUploadResult

Description
A container for the response.

Type
Container

Location

Description
The resource identifier (path) of the new object.

Type
URI

bucket

Description
The name of the bucket that contains the new object.

Type
String

Key

Description
The object’s key.

Type
String

ETag

Description
The entity tag of the new object.

Type
String
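
The initiate, upload-part, and assemble steps can be driven together from an S3 client library. The following is a minimal boto3 sketch; the endpoint, credentials, bucket, object name, and part data are placeholders, and real workloads typically use parts of at least 5 MB each, except for the last part:

Example

import boto3

# Placeholder endpoint and credentials for the Ceph Object Gateway.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# 1. Initiate the multipart upload and record the UploadId.
upload = s3.create_multipart_upload(Bucket='testbucket', Key='bigobject')
upload_id = upload['UploadId']

# 2. Upload each part and keep its PartNumber and ETag for the final assembly.
part = s3.upload_part(
    Bucket='testbucket',
    Key='bigobject',
    PartNumber=1,
    UploadId=upload_id,
    Body=b'part one of the object data',
)

# 3. Assemble the uploaded parts into the final object.
s3.complete_multipart_upload(
    Bucket='testbucket',
    Key='bigobject',
    UploadId=upload_id,
    MultipartUpload={'Parts': [{'PartNumber': 1, 'ETag': part['ETag']}]},
)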

S3 copy a multipart upload


Edit online
Uploads a part by copying data from an existing object as data source.

Specify the uploadId subresource and the upload ID to perform a multi-part upload copy:

Syntax

PUT /BUCKET/OBJECT?partNumber=PART_NUMBER&uploadId=UPLOAD_ID HTTP/1.1
Host: cname.domain.com
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Request Headers

x-amz-copy-source

Description
The source bucket name and object name.

Valid Values
BUCKET/OBJECT

Required
Yes

x-amz-copy-source-range

Description
The range of bytes to copy from the source object.

Valid Values
Range: bytes=first-last, where first and last are the zero-based byte offsets to copy. For example, bytes=0-9
indicates that you want to copy the first ten bytes of the source.

Required
No

Response Entities

CopyPartResult

Description
A container for all response elements.

Type
Container

ETag

Description
Returns the ETag of the new part.

Type
String

LastModified

Description
Returns the date the part was last modified.

Type
String

Reference
Edit online

For more information about this feature, see the Amazon S3 site.

S3 abort a multipart upload


Edit online
Aborts a multipart upload.

Specify the uploadId subresource and the upload ID to abort a multi-part upload:

Syntax

DELETE /BUCKET/OBJECT?uploadId=UPLOAD_ID HTTP/1.1

S3 Hadoop interoperability
Edit online

For data analytics applications that require Hadoop Distributed File System (HDFS) access, the Ceph Object Gateway can be
accessed using the Apache S3A connector for Hadoop. The S3A connector is an open-source tool that presents S3 compatible object
storage as an HDFS file system with HDFS file system read and write semantics to the applications while data is stored in the Ceph
Object Gateway.

Ceph Object Gateway is fully compatible with the S3A connector that ships with Hadoop 2.7.3.

S3 select operations (Technology Preview)


Edit online
IMPORTANT: Technology Preview features are not supported with IBM production service level agreements (SLAs), might not be
functionally complete, and IBM does not recommend using them for production. These features provide early access to upcoming
product features, enabling customers to test functionality and provide feedback during the development process.

As a developer, you can use the S3 select API for high-level analytic applications like Spark-SQL to improve latency and throughput.
For example, from a CSV S3 object containing several gigabytes of data, you can extract a single column that is filtered by another
column using the following query:

Example

select customerid from s3Object where age>30 and age<65;

Currently, the object data must be retrieved from the Ceph OSD through the Ceph Object Gateway before it is filtered and extracted.
The performance improvement is greater when the object is large and the query is more specific.

Prerequisites
S3 select content from an object
S3 supported select functions
S3 alias programming construct
S3 CSV parsing explained

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.

A S3 user created with user access.

S3 select content from an object


Edit online
The select object content API filters the content of an object through the structured query language (SQL). See the Metadata
collected by inventory section in the AWS Systems Manager User Guide for an example description of what should reside in the
inventory object. In the request, you must specify the data serialization format as comma-separated values (CSV) for the object in
order to retrieve the specified content. The Amazon Web Services (AWS) command-line interface (CLI) select object content command
uses the CSV format to parse object data into records and returns only the records specified in the query.

NOTE: You must specify the data serialization format for the response. You must have s3:GetObject permission for this operation.

Syntax

POST /BUCKET/KEY?select&select-type=2 HTTP/1.1\r\n

Example

POST /testbucket/sample1csv?select&select-type=2 HTTP/1.1\r\n

Bucket

Description
The bucket to select object content from.

Type
String

Required
Yes

Key

Description
The object key.

Length Constraints
Minimum length of 1.

Type
String

Required
Yes

SelectObjectContentRequest

Description
Root level tag for the select object content request parameters.

Type
String

Required
Yes

Expression

Description
The expression that is used to query the object.

Type
String

Required
Yes

ExpressionType

Description
The type of the provided expression for example SQL.

Type
String

Valid Values
SQL

Required
Yes

InputSerialization

Description
Describes the format of the data in the object that is being queried.

Type
String

Required

Yes

OutputSerialization

Description
Format of the returned data, using a comma separator and newline.

Type
String

Required
Yes

Response entities

If the action is successful, the service sends back an HTTP 200 response. Data is returned in XML format by the service:

Payload

Description
Root level tag for the payload parameters.

Type
String

Required
Yes

Records

Description
The records event.

Type
Base64-encoded binary data object

Required
No

Stats

Description
The stats event.

Type
Long

Required
No

The Ceph Object Gateway supports the following response:

Example

{:event-type,records} {:content-type,application/octet-stream} {:message-type,event}

Syntax (for CSV)

aws --endpoint-url http://localhost:80 s3api select-object-content


--bucket BUCKET_NAME
--expression-type SQL
--input-serialization
{"CSV": {"FieldDelimiter": "," , "QuoteCharacter": "\"" , "RecordDelimiter" : "\n" ,
"QuoteEscapeCharacter" : "\\" , "FileHeaderInfo": "USE" }, "CompressionType": "NONE"}
--output-serialization {"CSV": {}}
--key OBJECT_NAME
--expression "select count(0) from s3object where int(_1)<10;" output.csv

Example (for CSV)

aws --endpoint-url http://localhost:80 s3api select-object-content


--bucket testbucket
--expression-type 'SQL'
--input-serialization
'{"CSV": {"FieldDelimiter": "," , "QuoteCharacter": "\"" , "RecordDelimiter" : "\n" ,
"QuoteEscapeCharacter" : "\\" , "FileHeaderInfo": "USE" }, "CompressionType": "NONE"}'
--output-serialization '{"CSV": {}}'
--key testobject
--expression "select count(0) from s3object where int(_1)<10;" output.csv

Supported features

Currently, only part of the AWS s3 select command is supported:

Feature | Details | Description | Example
Arithmetic operators | ^ * % / + - ( ) |  | select (int(_1)+int(_2))*int(_9) from s3object;
Arithmetic operators | % modulo |  | select count(*) from s3object where cast(_1 as int)%2 == 0;
Arithmetic operators | ^ power-of |  | select cast(2^10 as int) from s3object;
Compare operators | > < >= <= == != |  | select _1,_2 from s3object where (int(_1)+int(_3))>int(_5);
Logical operator | AND OR NOT |  | select count(*) from s3object where not (int(1)>123 and int(_5)<200);
Logical operator | is null | Returns true/false for null indication in expression |
Logical operator and NULL | is not null | Returns true/false for null indication in expression |
Logical operator and NULL | unknown state | Review null-handle and observe the results of logical operations with NULL. The query returns 0. | select count(*) from s3object where null and (3>2);
Arithmetic operator with NULL | unknown state | Review null-handle and observe the results of binary operations with NULL. The query returns 0. | select count(*) from s3object where (null+1) and (3>2);
Compare with NULL | unknown state | Review null-handle and observe results of compare operations with NULL. The query returns 0. | select count(*) from s3object where (null*1.5) != 3;
Missing column | unknown state |  | select count(*) from s3object where _1 is null;
Projection column | Similar to if or then or else |  | select case when (1+1==(2+1)*3) then 'case_1' when ((4*3)==(12)) then 'case_2' else 'case_else' end, age*2 from s3object;
Logical operator | coalesce | Returns first non-null argument | select coalesce(nullif(5,5),nullif(1,1.0),age+12) from s3object;
Logical operator | nullif | Returns null in case both arguments are equal, or else the first one; nullif(1,1)=NULL, nullif(null,1)=NULL, nullif(2,1)=2 | select nullif(cast(_1 as int),cast(_2 as int)) from s3object;
Logical operator | {expression} in ( .. {expression} ..) |  | select count(*) from s3object where 'ben' in (trim(_5),substring(_1,char_length(_1)-3,3),last_name);
Logical operator | {expression} between {expression} and {expression} |  | select count(*) from stdin where substring(_3,char_length(_3),1) between "x" and trim(_1) and substring(_3,char_length(_3)-1,1) == ":";
Logical operator | {expression} like {match-pattern} |  | select count(*) from s3object where first_name like '%de_'; select count(*) from s3object where _1 like "%a[r-s];
Casting operator |  |  | select cast(123 as int)%2 from s3object;
Casting operator |  |  | select cast(123.456 as float)%2 from s3object;
Casting operator |  |  | select cast('ABC0-9' as string),cast(substr('ab12cd',3,2) as int)*4 from s3object;
Casting operator |  |  | select cast(substring('publish on 2007-01-01',12,10) as timestamp) from s3object;
Non AWS casting operator |  |  | select int(_1),int( 1.2 + 3.4) from s3object;
Non AWS casting operator |  |  | select float(1.2) from s3object;
Non AWS casting operator |  |  | select timestamp('1999:10:10-12:23:44') from s3object;
Aggregation function | sum |  | select sum(int(_1)) from s3object;
Aggregation function | avg |  | select avg(cast(_1 as float) + cast(_2 as int)) from s3object;
Aggregation function | min |  | select avg(cast(_1 as float) + cast(_2 as int)) from s3object;
Aggregation function | max |  | select max(float(_1)),min(int(_5)) from s3object;
Aggregation function | count |  | select count(*) from s3object where (int(1)+int(_3))>int(_5);
Timestamp functions | extract |  | select count(*) from s3object where extract('year',timestamp(_2)) > 1950 and extract('year',timestamp(_1)) < 1960;
Timestamp functions | dateadd |  | select count(0) from s3object where datediff('year',timestamp(_1),dateadd('day',366,timestamp(_1))) == 1;
Timestamp functions | datediff |  | select count(0) from s3object where datediff('month',timestamp(_1),timestamp(_2)) == 2;
Timestamp functions | utcnow |  | select count(0) from s3object where datediff('hours',utcnow(),dateadd('day',1,utcnow())) == 24
String functions | substring |  | select count(0) from s3object where int(substring(_1,1,4))>1950 and int(substring(_1,1,4))<1960;
String functions | trim |  | select trim(' foobar ') from s3object;
String functions | trim |  | select trim(trailing from ' foobar ') from s3object;
String functions | trim |  | select trim(leading from ' foobar ') from s3object;
String functions | trim |  | select trim(both '12' from '1112211foobar22211122') from s3objects;
String functions | lower or upper |  | select trim(both '12' from '1112211foobar22211122') from s3objects;
String functions | char_length, character_length |  | select count(*) from s3object where char_length(_3)==3;
Complex queries |  |  | select sum(cast(_1 as int)),max(cast(_3 as int)), substring('abcdefghijklm', (2-1)*3+sum(cast(_1 as int))/sum(cast(_1 as int))+1, (count() + count(0))/count(0)) from s3object;
Alias support |  |  | select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3 from s3object where a3>100 and a3<300;

Reference
Edit online

See Amazon’s S3 Select Object Content API for more details.

S3 supported select functions

Edit online
S3 select supports the following functions:

Timestamp

timestamp(string)

Description
Converts string to the basic type of timestamp.

Supported
Currently it converts: yyyy:mm:dd hh:mi:dd

extract(date-part,timestamp)

Description
Returns integer according to date-part extract from input timestamp.

Supported
date-part: year,month,week,day.

dateadd(date-part ,integer,timestamp)

Description
Returns timestamp, a calculation based on the results of input timestamp and date-part.

Supported
date-part : year,month,day.

datediff(date-part,timestamp,timestamp)

Description
Return an integer, a calculated result of the difference between two timestamps according to date-part.

Supported
date-part : year,month,day,hours.

utcnow()

Description
Return timestamp of current time.

Aggregation

count()

Description
Returns integers based on the number of rows that match a condition if there is one.

sum(expression)

Description
Returns the sum of the expression on each row that matches a condition if there is one.

avg(expression)

Description
Returns an average expression on each row that matches a condition if there is one.

max(expression)

Description
Returns the maximal result for all expressions that match a condition if there is one.

min(expression)

Description
Returns the minimal result for all expressions that match a condition if there is one.

String

substring(string,from,to)

Description
Returns a string extract from input string based on from and to inputs.

Char_length

Description
Returns a number of characters in string. Character_length also does the same.

Trim

Description
Trims the leading or trailing characters from the target string, default is a blank character.

Upper\lower

Description
Converts characters into uppercase or lowercase.

NULL

A NULL value represents missing or unknown data, which means NULL cannot produce a value in any arithmetic operation. The same
applies to arithmetic comparisons: any comparison to NULL is NULL, that is, unknown.

Table 1. The NULL use case


A is NULL | Result (NULL=UNKNOWN)
Not A | NULL
A or False | NULL
A or True | True
A or A | NULL
A and False | False
A and True | NULL
A and A | NULL

Reference
Edit online

See Amazon’s S3 Select Object Content API for more details.

S3 alias programming construct


Edit online
The alias programming construct is an essential part of the S3 select language because it enables better programming with objects that
contain many columns or complex queries. When a statement with an alias construct is parsed, the alias is replaced with a reference to
the correct projection column, and during query execution the reference is evaluated like any other expression. An alias maintains a
result cache: if an alias is used more than once, the expression is not re-evaluated and the cached result is returned instead.
Currently, IBM supports only column aliases.

Example

select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3 from s3object where a3>100 and a3<300;

S3 CSV parsing explained


Edit online
You can define the CSV format with input serialization, which uses these default values:

Use {\n} for row-delimiter.

Use {“} for quote.

Use {\} for escape characters.

The csv-header-info is parsed when USE appears in the AWS CLI; this is the first row in the input object containing the schema.
Currently, output serialization and compression-type are not supported. The S3 select engine has a CSV parser, which parses S3
objects:

Each row ends with a row-delimiter.

The field-separator separates the adjacent columns.

The successive field separator defines the NULL column.

The quote-character overrides the field-separator; that is, the field separator is any character between the quotes.

The escape character disables any special character except the row delimiter.

The following are examples of CSV parsing rules:

Table 1. CSV parsing


Feature | Description | Input (Tokens)
NULL | Successive field delimiter | ,,1,,2, ==> {null}{null}{1}{null}{2}{null}
QUOTE | The quote character overrides the field delimiter. | 11,22,"a,b,c,d",last ==> {11}{22}{"a,b,c,d"}{last}
Escape | The escape character overrides the meta-character. | A container for the object owner's ID and DisplayName
row delimiter | There is no closed quote; row delimiter is the closing line. | 11,22,a="str,44,55,66 ==> {11}{22}{a="str,44,55,66}
csv header info | FileHeaderInfo tag | USE value means each token on the first line is the column-name; IGNORE value means to skip the first line.

Reference
Edit online

See Amazon’s S3 Select Object Content API for more details.

Ceph Object Gateway and the Swift API


Edit online
As a developer, you can use a RESTful application programming interface (API) that is compatible with the Swift API data access
model. You can manage the buckets and objects stored in IBM Storage Ceph cluster through the Ceph Object Gateway.

The following table describes the support status for current Swift functional features:

Feature Status Remarks


Authentication Supported
Get Account Metadata Supported No custom metadata
Swift ACLs Supported Supports a subset of Swift ACLs
List Containers Supported
List Container's Objects Supported
Create Container Supported
Delete Container Supported
Get Container Metadata Supported
Add/Update Container Metadata Supported
Delete Container Metadata Supported
Get Object Supported
Create/Update an Object Supported

Create Large Object Supported
Delete Object Supported
Copy Object Supported
Get Object Metadata Supported
Add/Update Object Metadata Supported
Temp URL Operations Supported
CORS Not Supported
Expiring Objects Supported
Object Versioning Not Supported
Static Website Not Supported

Prerequisites
Swift API limitations
Create a Swift user
Swift authenticating a user
Swift container operations
Swift object operations
Swift temporary URL operations
Swift multi-tenancy container operations

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.


Swift API limitations


Edit online
IMPORTANT: The following limitations should be used with caution. There are implications related to your hardware selections, so
you should always discuss these requirements with your IBM account team.

Maximum object size when using Swift API: 5GB

Maximum metadata size when using Swift API: There is no defined limit on the total size of user metadata that can be
applied to an object, but a single HTTP request is limited to 16,000 bytes.

Create a Swift user


Edit online
To test the Swift interface, create a Swift subuser. Creating a Swift user is a two-step process. The first step is to create the user. The
second step is to create the secret key.

NOTE: In a multi-site deployment, always create a user on a host in the master zone of the master zone group.

Prerequisites
Edit online

Installation of the Ceph Object Gateway.

Root-level access to the Ceph Object Gateway node.

1. Create the Swift user:

Syntax

radosgw-admin subuser create --uid=NAME --subuser=NAME:swift --access=full

Replace NAME with the Swift user name, for example:

Example

[root@host01 ~]# radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --


access=full
{
"user_id": "testuser",
"display_name": "First User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{
"id": "testuser:swift",
"permissions": "full-control"
}
],
"keys": [
{
"user": "testuser",
"access_key": "O8JDE41XMI74O185EHKD",
"secret_key": "i4Au2yxG5wtr1JK01mI8kjJPM93HNAoVWOSTdJd6"
}
],
"swift_keys": [
{
"user": "testuser:swift",
"secret_key": "13TLtdEW7bCqgttQgPzxFxziu0AgabtOc6vM8DLA"
}
],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw"
}

2. Create the secret key:

Syntax

radosgw-admin key create --subuser=NAME:swift --key-type=swift --gen-secret

Replace NAME with the Swift user name, for example:

Example

[root@host01 ~]# radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-


secret
{
"user_id": "testuser",
"display_name": "First User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{
"id": "testuser:swift",
"permissions": "full-control"
}
],
"keys": [
{
"user": "testuser",
"access_key": "O8JDE41XMI74O185EHKD",
"secret_key": "i4Au2yxG5wtr1JK01mI8kjJPM93HNAoVWOSTdJd6"
}
],
"swift_keys": [
{
"user": "testuser:swift",
"secret_key": "a4ioT4jEP653CDcdU8p4OuhruwABBRZmyNUbnSSt"
}
],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw"
}

Swift authenticating a user


Edit online
To authenticate a user, make a request containing an X-Auth-User and an X-Auth-Key in the header.

Syntax

GET /auth HTTP/1.1


Host: swift.example.com
X-Auth-User: johndoe
X-Auth-Key: R7UUOLFDI2ZI9PRCQ53K

Example Response

HTTP/1.1 204 No Content
Date: Mon, 16 Jul 2012 11:05:33 GMT
Server: swift
X-Storage-Url: https://swift.example.com
X-Storage-Token: UOlCCC8TahFKlWuv9DB09TWHF0nDjpPElha0kAa
Content-Length: 0
Content-Type: text/plain; charset=UTF-8

NOTE: You can retrieve data about Ceph’s Swift-compatible service by executing GET requests using the X-Storage-Url value
during authentication.
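
The following is a minimal Python sketch of the same authentication request using the requests library; the endpoint, subuser name, and secret key are placeholders, and the returned storage URL and token are reused for subsequent calls:

Example

import requests

# Placeholder Ceph Object Gateway Swift endpoint and subuser credentials.
resp = requests.get(
    'https://swift.example.com/auth',
    headers={'X-Auth-User': 'testuser:swift', 'X-Auth-Key': 'SWIFT_SECRET_KEY'},
)

storage_url = resp.headers['X-Storage-Url']
token = resp.headers['X-Storage-Token']

# Use the token for an authenticated request, for example listing containers.
containers = requests.get(storage_url, headers={'X-Auth-Token': token})
print(containers.status_code, containers.text)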

Reference
Edit online

Swift request headers

Swift response headers

Swift container operations


Edit online
As a developer, you can perform container operations with the Swift application programming interface (API) through the Ceph
Object Gateway. You can list, create, update, and delete containers. You can also add or update the container's metadata.

Prerequisites
Swift container operations
Swift update a container’s Access Control List (ACL)
Swift list containers
Swift list a container’s objects
Swift create a container
Swift delete a container
Swift add or update the container metadata

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.

Swift container operations


Edit online
A container is a mechanism for storing data objects. An account can have many containers, but container names must be unique.
This API enables a client to create a container, set access controls and metadata, retrieve a container’s contents, and delete a
container. Since this API makes requests related to information in a particular user’s account, all requests in this API must be
authenticated unless a container’s access control is deliberately made publicly accessible, that is, allows anonymous requests.

NOTE: The Amazon S3 API uses the term bucket to describe a data container. When you hear someone refer to a bucket within the
Swift API, the term bucket might be construed as the equivalent of the term container.

One facet of object storage is that it does not support hierarchical paths or directories. Instead, it supports one level consisting of
one or more containers, where each container might have objects. The RADOS Gateway’s Swift-compatible API supports the notion
of pseudo-hierarchical containers, which is a means of using object naming to emulate a container, or directory hierarchy without
actually implementing one in the storage system. You can name objects with pseudo-hierarchical names, for example,
photos/buildings/empire-state.jpg, but container names cannot contain a forward slash (/) character.

IMPORTANT: When uploading large objects to versioned Swift containers, use the --leave-segments option with the python-
swiftclient utility. Not using --leave-segments overwrites the manifest file. Consequently, an existing object is overwritten,
which leads to data loss.

Swift update a container’s Access Control List (ACL)


Edit online
When a user creates a container, the user has read and write access to the container by default. To allow other users to read a
container's contents or write to a container, you must specifically enable the user. You can also specify * in the X-Container-Read
or X-Container-Write settings, which effectively enables all users to either read from or write to the container. Setting * makes
the container public; that is, it enables anonymous users to either read from or write to the container.

Syntax

POST /API_VERSION/ACCOUNT/TENANT:CONTAINER HTTP/1.1
Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: AUTH_TOKEN
X-Container-Read: *
X-Container-Write: UID1, UID2, UID3

X-Container-Read

Description
The user IDs with read permissions for the container.

Type
Comma-separated string values of user IDs.

Required
No

X-Container-Write

Description
The user IDs with write permissions for the container.

Type
Comma-separated string values of user IDs.

Required
No
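
The following is a minimal Python sketch of this request using the requests library; the storage URL, token, container name, and user IDs are placeholders, and the storage URL should be the X-Storage-Url value returned at authentication:

Example

import requests

# Placeholders: take the storage URL and token from the authentication response.
storage_url = 'https://swift.example.com/swift/v1'
token = 'AUTH_TOKEN'

# Make the container world-readable and grant write access to two users.
resp = requests.post(
    storage_url + '/mycontainer',
    headers={
        'X-Auth-Token': token,
        'X-Container-Read': '*',
        'X-Container-Write': 'testuser:swift, otheruser:swift',
    },
)
print(resp.status_code)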

Swift list containers


Edit online
A GET request that specifies the API version and the account will return a list of containers for a particular user account. Since the
request returns a particular user’s containers, the request requires an authentication token. The request cannot be made
anonymously.

Syntax

GET /API_VERSION/ACCOUNT HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: AUTH_TOKEN

Request Parameters

limit

Description
Limits the number of results to the specified value.

Type
Integer

Valid Values
N/A

Required
Yes

format

Description
Defines the format of the returned results.

Type
String

Valid Values
json or xml

Required
No

marker

Description
Returns a list of results greater than the marker value.

Type
String

Valid Values
N/A

Required
No

The response contains a list of containers, or returns with an HTTP 204 response code.

Response Entities

account

Description
A list for account information.

Type
Container

container

Description
The list of containers.

Type
Container

name

Description
The name of a container.

Type
String

bytes

Description
The size of the container.

Type
Integer
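
As a rough sketch, the same request can be made with the Python requests library; the endpoint URL and token are placeholder values, and the example assumes the JSON response format so that the container list can be parsed directly.

import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed account-level Swift endpoint
headers = {'X-Auth-Token': 'AUTH_TOKEN'}

# Request the account's containers as JSON, limited to the first five results.
response = requests.get(endpoint, headers=headers, params={'format': 'json', 'limit': 5})
for container in response.json():
    print(container['name'], container['bytes'])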

Swift list a container’s objects


Edit online
To list the objects within a container, make a GET request with the API version, account, and the name of the container. You can
specify query parameters to filter the full list, or leave out the parameters to return a list of the first 10,000 object names stored in
the container.

Syntax

GET /_API_VERSION_/_TENANT_:_CONTAINER_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

Request Parameters

format

Description
Defines the format of the results.

Type
String

Valid Values
json or xml

Required
No

prefix

Description
Limits the result set to objects beginning with the specified prefix.

Type
String

Valid Values
N/A

Required
No

marker

Description
Returns a list of results greater than the marker value.

Type
String

Valid Values
N/A

Required
No

limit

Description
Limits the number of results to the specified value.

Type
Integer

Valid Values
0 - 10,000

Required
No

delimiter

Description
The delimiter between the prefix and the rest of the object name.

Type
String

Valid Values
N/A

Required
No

path

Description
The pseudo-hierarchical path of the objects.

Type
String

Valid Values
N/A

Required
No

Response Entities

container

Description
The container.

Type
Container

object

Description
An object within the container.

Type
Container

name

Description
The name of an object within the container.

Type
String

hash

Description
A hash code of the object’s contents.

Type
String

last_modified

Description
The last time the object’s contents were modified.

Type
Date

content_type

Description
The type of content within the object.

Type
String
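
For illustration, a filtered object listing might look like the following sketch with the Python requests library; the endpoint, token, container name, and prefix are placeholder values.

import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed Swift endpoint
headers = {'X-Auth-Token': 'AUTH_TOKEN'}

# List objects in the container whose names begin with the pseudo-hierarchical prefix.
response = requests.get(endpoint + '/my-container', headers=headers,
                        params={'format': 'json', 'prefix': 'photos/', 'limit': 100})
for obj in response.json():
    print(obj['name'], obj['hash'], obj['last_modified'])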

Swift create a container


Edit online
To create a new container, make a PUT request with the API version, account, and the name of the new container. The container
name must be unique, must not contain a forward-slash (/) character, and should be less than 256 bytes. You can include access
control headers and metadata headers in the request. You can also include a storage policy identifying a key for a set of placement
pools. For example, execute radosgw-admin zone get to see a list of available keys under placement_pools. A storage policy
enables you to specify a special set of pools for the container, for example, SSD-based storage. The operation is idempotent. If you
make a request to create a container that already exists, it will return with an HTTP 202 return code, but will not create another
container.

Syntax

PUT /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_
X-Container-Read: _COMMA_SEPARATED_UIDS_
X-Container-Write: _COMMA_SEPARATED_UIDS_
X-Container-Meta-_KEY_:VALUE
X-Storage-Policy: _PLACEMENT_POOLS_KEY_

Headers

X-Container-Read

Description
The user IDs with read permissions for the container.

Type
Comma-separated string values of user IDs.

Required
No

X-Container-Write

Description
The user IDs with write permissions for the container.

Type
Comma-separated string values of user IDs.

Required
No

X-Container-Meta-_KEY

Description
A user-defined metadata key that takes an arbitrary string value.

Type
String

Required
No

X-Storage-Policy

Description
The key that identifies the storage policy under placement_pools for the Ceph Object Gateway. Execute radosgw-admin
zone get for available keys.

Type
String

Required
No

If a container with the same name already exists and the user is the container owner, then the operation will succeed. Otherwise, the
operation will fail.

HTTP Response

409

Status Code
BucketAlreadyExists

Description
The container already exists under a different user’s ownership.
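
The following sketch creates a container with metadata and a storage policy using the Python requests library; the endpoint, token, container name, and placement key are placeholder values, and the key must match one listed by radosgw-admin zone get.

import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed Swift endpoint
headers = {
    'X-Auth-Token': 'AUTH_TOKEN',
    'X-Container-Meta-Project': 'demo',           # optional user-defined metadata
    'X-Storage-Policy': 'default-placement',      # assumed placement_pools key
}

# PUT creates the container; repeating the request for a container you own does not create another one.
response = requests.put(endpoint + '/my-container', headers=headers)
print(response.status_code)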

Swift delete a container


Edit online
To delete a container, make a DELETE request with the API version, account, and the name of the container. The container must be
empty. If you’d like to check if the container is empty, execute a HEAD request against the container. Once you’ve successfully
removed the container, you’ll be able to reuse the container name.

Syntax

DELETE /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

HTTP Response

204

Status Code
NoContent

Description
The container was removed.

Swift add or update the container metadata


Edit online
To add metadata to a container, make a POST request with the API version, account, and container name. You must have write
permissions on the container to add or update metadata.

Syntax

POST /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_
X-Container-Meta-Color: red
X-Container-Meta-Taste: salty

Request Headers

X-Container-Meta-_KEY

Description
A user-defined metadata key that takes an arbitrary string value.

Type
String

Required
No
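
For example, the request above can be sent with the Python requests library as in the following sketch; the endpoint, token, and container name are placeholder values.

import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed Swift endpoint
headers = {
    'X-Auth-Token': 'AUTH_TOKEN',
    'X-Container-Meta-Color': 'red',     # user-defined metadata keys, as in the syntax above
    'X-Container-Meta-Taste': 'salty',
}

# POST to the container to add or update its metadata.
response = requests.post(endpoint + '/my-container', headers=headers)
print(response.status_code)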

Swift object operations


Edit online
As a developer, you can perform object operations with the Swift application programming interface (API) through the Ceph Object
Gateway. You can list, create, update, and delete objects. You can also add or update the object's metadata.

Prerequisites
Swift object operations
Swift get an object
Swift create or update an object
Swift delete an object
Swift copy an object
Swift get object metadata
Swift add or update object metadata

Prerequisites
Edit online

A running IBM Storage Ceph cluster.

A RESTful client.

Swift object operations


Edit online
An object is a container for storing data and metadata. A container might have many objects, but the object names must be unique.
This API enables a client to create an object, set access controls and metadata, retrieve an object’s data and metadata, and delete
an object. Since this API makes requests related to information in a particular user’s account, all requests in this API must be
authenticated unless the container or object’s access control is deliberately made publicly accessible, that is, allows anonymous
requests.

Swift get an object


Edit online
To retrieve an object, make a GET request with the API version, account, container, and object name. You must have read
permissions on the container to retrieve an object within it.

Syntax

GET /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_/_OBJECT_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

Request Headers

range

Description
To retrieve a subset of an object’s contents, you can specify a byte range.

Type
String

Required
No

If-Modified-Since

Description
Only copies if modified since the date and time of the source object’s last_modified attribute.

Type
Date

Required
No

If-Unmodified-Since

Description
Only copies if not modified since the date and time of the source object’s last_modified attribute.

Type
Date

Required
No

Copy-If-Match

Description
Copies only if the ETag in the request matches the source object’s ETag.

Type
ETag

Required
No

Copy-If-None-Match

Description
Copies only if the ETag in the request does not match the source object’s ETag.

Type
ETag

Required
No

Response Headers

Content-Range

Description
The range of the subset of object contents. Returned only if the range header field was specified in the request.
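
As a minimal sketch, a ranged GET with the Python requests library could look like the following; the endpoint, token, and object path are placeholder values.

import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed Swift endpoint
headers = {
    'X-Auth-Token': 'AUTH_TOKEN',
    'Range': 'bytes=0-1023',     # retrieve only the first 1024 bytes of the object
}

response = requests.get(endpoint + '/my-container/photos/buildings/empire-state.jpg',
                        headers=headers)
print(response.status_code)                    # typically 206 Partial Content for a range request
print(response.headers.get('Content-Range'))   # the range of the returned subset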

Swift create or update an object


Edit online
To create a new object, make a PUT request with the API version, account, container name, and the name of the new object. You
must have write permission on the container to create or update an object. The object name must be unique within the container.
The PUT request is not idempotent, so if you do not use a unique name, the request will update the object. However, you can use
pseudo-hierarchical syntax in the object name to distinguish it from another object of the same name if it is under a different
pseudo-hierarchical directory. You can include access control headers and metadata headers in the request.

Syntax

PUT /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_/_OBJECT_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

Request Headers

ETag

Description
An MD5 hash of the object’s contents. Recommended.

Type
String

Valid Values
N/A

Required
No

Content-Type

Description
The type of content the object contains.

Type
String

Valid Values
N/A

Required
No

Transfer-Encoding

Description
Indicates whether the object is part of a larger aggregate object.

Type
String

Valid Values
chunked

Required
No
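
The following sketch uploads an object with the Python requests library, sending the recommended ETag header as the MD5 hash of the payload; the endpoint, token, and names are placeholder values.

import hashlib
import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed Swift endpoint
data = b'hello world'

headers = {
    'X-Auth-Token': 'AUTH_TOKEN',
    'ETag': hashlib.md5(data).hexdigest(),   # lets the gateway verify the uploaded contents
    'Content-Type': 'text/plain',
}

# PUT the object under a pseudo-hierarchical name within the container.
response = requests.put(endpoint + '/my-container/docs/hello.txt', data=data, headers=headers)
print(response.status_code)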

Swift delete an object


Edit online
To delete an object, make a DELETE request with the API version, account, container, and object name. You must have write
permissions on the container to delete an object within it. Once you’ve successfully deleted the object, you will be able to reuse the
object name.

Syntax

DELETE /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_/_OBJECT_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

Swift copy an object
Edit online
Copying an object allows you to make a server-side copy of an object, so that you do not have to download it and upload it under
another container. To copy the contents of one object to another object, you can make either a PUT request or a COPY request with
the API version, account, and the container name.

For a PUT request, use the destination container and object name in the request, and the source container and object in the request
header.

For a COPY request, use the source container and object in the request, and the destination container and object in the request
header. You must have write permission on the container to copy an object. The destination object name must be unique within the
container. The request is not idempotent, so if you do not use a unique name, the request will update the destination object. You can
use pseudo-hierarchical syntax in the object name to distinguish the destination object from the source object of the same name if it
is under a different pseudo-hierarchical directory. You can include access control headers and metadata headers in the request.

Syntax

PUT /_API_VERSION_/_ACCOUNT_/_TENANT_:_DEST_CONTAINER_/_DEST_OBJECT_ HTTP/1.1


X-Copy-From: _TENANT_:_SOURCE_CONTAINER_/_SOURCE_OBJECT_
Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

or alternatively:

Syntax

COPY /_API_VERSION_/_ACCOUNT_/_TENANT_:_SOURCE_CONTAINER_/_SOURCE_OBJECT_ HTTP/1.1


Destination: _TENANT_:_DEST_CONTAINER_/_DEST_OBJECT_

Request Headers

X-Copy-From

Description
Used with a PUT request to define the source container/object path.

Type
String

Required
Yes, if using PUT.

Destination

Description
Used with a COPY request to define the destination container/object path.

Type
String

Required
Yes, if using COPY.

If-Modified-Since

Description
Only copies if modified since the date and time of the source object’s last_modified attribute.

Type
Date

Required
No

If-Unmodified-Since

Description
Only copies if not modified since the date and time of the source object’s last_modified attribute.

Type
Date

Required
No

Copy-If-Match

Description
Copies only if the ETag in the request matches the source object’s ETag.

Type
ETag

Required
No

Copy-If-None-Match

Description
Copies only if the ETag in the request does not match the source object’s ETag.

Type
ETag

Required
No
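
As an illustration, a server-side copy using the PUT form with the Python requests library might look like the following sketch; the endpoint, token, and container and object names are placeholder values.

import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed Swift endpoint
headers = {
    'X-Auth-Token': 'AUTH_TOKEN',
    'X-Copy-From': 'source-container/docs/hello.txt',   # source container/object path
}

# PUT to the destination container and object; the gateway copies the source server-side.
response = requests.put(endpoint + '/dest-container/docs/hello-copy.txt', headers=headers)
print(response.status_code)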

Swift get object metadata


Edit online
To retrieve an object’s metadata, make a HEAD request with the API version, account, container, and object name. You must have
read permissions on the container to retrieve metadata from an object within the container. This request returns the same header
information as the request for the object itself, but it does not return the object’s data.

Syntax

HEAD /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_/_OBJECT_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

Swift add or update object metadata


Edit online
To add metadata to an object, make a POST request with the API version, account, container, and object name. You must have write
permissions on the parent container to add or update metadata.

Syntax

POST /_API_VERSION_/_ACCOUNT_/_TENANT_:_CONTAINER_/_OBJECT_ HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: _AUTH_TOKEN_

Request Headers

X-Object-Meta-_KEY

Description
A user-defined metadata key that takes an arbitrary string value.

Type
String

Required
No

Swift temporary URL operations


Edit online
To allow temporary access, for example GET requests to objects without the need to share credentials, temporary URL functionality is
supported by the Swift endpoint of the Ceph Object Gateway (radosgw).

To use this functionality, first set the value of X-Account-Meta-Temp-URL-Key and, optionally, X-Account-Meta-Temp-URL-Key-2.
The temporary URL functionality relies on an HMAC-SHA1 signature against these secret keys.

Swift get temporary URL objects


Swift POST temporary URL keys

Swift get temporary URL objects


Edit online
Temporary URL uses a cryptographic HMAC-SHA1 signature, which includes the following elements:

The value of the Request method, GET for instance

The expiry time, in the format of seconds since the epoch, that is, Unix time

The request path starting from v1 onwards

The above items are normalized with newlines appended between them, and an HMAC is generated using the SHA-1 hashing
algorithm against one of the Temp URL keys posted earlier.

A sample Python script that demonstrates the above is given below:

Example

import hmac
from hashlib import sha1
from time import time

method = 'GET'
host = 'https://objectstore.example.com'
duration_in_seconds = 300  # Duration for which the URL is valid
expires = int(time() + duration_in_seconds)
path = '/v1/your-bucket/your-object'
key = b'secret'  # One of the Temp URL keys set on the account

# Normalize the method, expiry time, and path with newlines between them,
# then sign the result with HMAC-SHA1 using the secret key.
hmac_body = '%s\n%s\n%s' % (method, expires, path)
sig = hmac.new(key, hmac_body.encode('utf-8'), sha1).hexdigest()

rest_uri = "{host}{path}?temp_url_sig={sig}&temp_url_expires={expires}".format(
    host=host, path=path, sig=sig, expires=expires)
print(rest_uri)

Example Output

https://objectstore.example.com/v1/your-bucket/your-object?
temp_url_sig=ff4657876227fc6025f04fcf1e82818266d022c6&temp_url_expires=1423200992
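
Once generated, the URL can be used without an authentication token until it expires. The following sketch fetches the object anonymously with the Python requests library, using the placeholder URL from the example output above.

import requests

temp_url = ('https://objectstore.example.com/v1/your-bucket/your-object'
            '?temp_url_sig=ff4657876227fc6025f04fcf1e82818266d022c6'
            '&temp_url_expires=1423200992')

# No X-Auth-Token header is needed; the signature and expiry in the URL grant access.
response = requests.get(temp_url)
print(response.status_code)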

Swift POST temporary URL keys


Edit online
A POST request to the Swift account with the required key sets the secret temporary URL key for the account, against which
temporary URL access can be provided. Up to two keys are supported, and signatures are checked against both keys, if present,
so that keys can be rotated without invalidating the temporary URLs.

Syntax

POST /API_VERSION/ACCOUNT HTTP/1.1


Host: FULLY_QUALIFIED_DOMAIN_NAME
X-Auth-Token: AUTH_TOKEN

Request Headers

X-Account-Meta-Temp-URL-Key

Description
A user-defined key that takes an arbitrary string value.

Type
String

Required
Yes

X-Account-Meta-Temp-URL-Key-2

Description
A user-defined key that takes an arbitrary string value.

Type
String

Required
No
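
For example, the keys can be set with the Python requests library as in the following sketch; the endpoint, token, and key values are placeholders.

import requests

endpoint = 'https://objectstore.example.com/swift/v1'   # assumed account-level Swift endpoint
headers = {
    'X-Auth-Token': 'AUTH_TOKEN',
    'X-Account-Meta-Temp-URL-Key': 'secret',         # primary key used to sign temporary URLs
    'X-Account-Meta-Temp-URL-Key-2': 'old-secret',   # optional second key, useful for rotation
}

# POST to the account to store the temporary URL keys.
response = requests.post(endpoint, headers=headers)
print(response.status_code)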

Swift multi-tenancy container operations


Edit online
When a client application accesses containers, it always operates with the credentials of a particular user. In an IBM Storage Ceph
cluster, every user belongs to a tenant. Consequently, every container operation has an implicit tenant in its context if no tenant is
specified explicitly. Thus multi-tenancy is completely backward compatible with previous releases, as long as the referred containers
and referring user belong to the same tenant.

Extensions employed to specify an explicit tenant differ according to the protocol and authentication system used.

A colon character separates tenant and container, thus a sample URL would be:

Example

https://rgw.domain.com/tenant:container

By contrast, when using a create_container() method, simply separate the tenant and container within the container name itself:

Example

create_container("tenant:container")

The Ceph RESTful API specifications


Edit online
As a storage administrator, you can access the various Ceph sub-systems through the Ceph RESTful API endpoints. This is a
reference section for the available Ceph RESTful API methods.

Prerequisites
Ceph summary
Authentication
Ceph File System
Storage cluster configuration
CRUSH rules
Erasure code profiles
Feature toggles
Grafana
Storage cluster health
Host
Logs
Ceph Manager modules
Ceph Monitor
Ceph OSD
Ceph Object Gateway
REST APIs for manipulating a role
Ceph Orchestrator
Pools
Prometheus
RADOS block device
Performance counters
Roles
Services
Settings
Ceph task
Telemetry
Ceph users

Prerequisites
Edit online

An understanding of how to use a RESTful API.

A healthy running IBM Storage Ceph cluster.

The Ceph Manager dashboard module is enabled.

Ceph summary
Edit online
The method reference for using the Ceph RESTful API summary endpoint to display the Ceph summary details.

GET /api/summary

Description
Display a summary of Ceph details.

Example

GET /api/summary HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Authentication
Edit online
The method reference for using the Ceph RESTful API auth endpoint to initiate a session with IBM Storage Ceph.

POST /api/auth

Curl Example

curl -i -k --location -X POST 'https://192.168.0.44:8443/api/auth' -H 'Accept: application/vnd.ceph.api.v1.0+json' -H 'Content-Type: application/json' --data '{"password": "admin@123", "username": "admin"}'

Example

POST /api/auth HTTP/1.1


Host: example.com
Content-Type: application/json

{
"password": "STRING",
"username": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
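
As a rough sketch, the same login can be performed with the Python requests library and the returned token reused on later calls; the host, port, credentials, and the assumption that the response body contains a token field accepted as a bearer token are placeholders based on the curl example above.

import requests

base = 'https://192.168.0.44:8443'   # assumed dashboard API address
headers = {
    'Accept': 'application/vnd.ceph.api.v1.0+json',
    'Content-Type': 'application/json',
}

# Authenticate and extract the session token from the JSON response.
response = requests.post(base + '/api/auth',
                         json={'username': 'admin', 'password': 'admin@123'},
                         headers=headers, verify=False)
token = response.json()['token']

# Reuse the token on subsequent requests, for example to read the Ceph summary.
summary = requests.get(base + '/api/summary',
                       headers={'Accept': 'application/vnd.ceph.api.v1.0+json',
                                'Authorization': 'Bearer ' + token},
                       verify=False)
print(summary.status_code)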

POST /api/auth/check

Description
Check the requirement for an authentication token.

Example

POST /api/auth/check?token=STRING HTTP/1.1


Host: example.com
Content-Type: application/json

{
"token": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/auth/logout

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Ceph File System


Edit online
The method reference for using the Ceph RESTful API cephfs endpoint to manage Ceph File Systems (CephFS).

GET /api/cephfs

Example

GET /api/cephfs HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cephfs/_FS_ID

Parameters

Replace FS_ID with the Ceph File System identifier string.

Example

GET /api/cephfs/FS_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/cephfs/_FS_ID/client/_CLIENT_ID

Parameters

Replace FS_ID with the Ceph File System identifier string.

Replace CLIENT_ID with the Ceph client identifier string.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cephfs/_FS_ID/clients

Parameters

Replace FS_ID with the Ceph File System identifier string.

Example

GET /api/cephfs/FS_ID/clients HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cephfs/_FS_ID/get_root_directory

Description
The root directory that can not be fetched using the ls_dir API call.

Parameters

Replace FS_ID with the Ceph File System identifier string.

Example

GET /api/cephfs/FS_ID/get_root_directory HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cephfs/_FS_ID/ls_dir

Description
List directories for a given path.

Parameters

Replace FS_ID with the Ceph File System identifier string.

Queries:

path - The string value where you want to start the listing. The default path is /, if not given.

depth - An integer value specifying the number of steps to go down the directory tree.

Example

GET /api/cephfs/FS_ID/ls_dir HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cephfs/_FS_ID/mds_counters

Parameters

Replace FS_ID with the Ceph File System identifier string.

Queries:

counters - An integer value.

Example

GET /api/cephfs/FS_ID/mds_counters HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cephfs/_FS_ID/quota

Description
Display the CephFS quotas for the given path.

Parameters

Replace FS_ID with the Ceph File System identifier string.

Queries:

path - A required string value specifying the directory path.

Example

GET /api/cephfs/FS_ID/quota?path=STRING HTTP/1.1
Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/cephfs/_FS_ID/quota

Description
Sets the quota for a given path.

Parameters

Replace FS_ID with the Ceph File System identifier string.

max_bytes - A string value defining the byte limit.

max_files - A string value defining the file limit.

path - A string value defining the path to the directory or file.

Example

PUT /api/cephfs/FS_ID/quota HTTP/1.1


Host: example.com
Content-Type: application/json

{
"max_bytes": "STRING",
"max_files": "STRING",
"path": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing, check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
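
For illustration, the quota can be set with the Python requests library as in the following sketch; the dashboard address, token, file system identifier, path, and limits are placeholder values.

import requests

base = 'https://192.168.0.44:8443'          # assumed dashboard API address
fs_id = '1'                                 # placeholder Ceph File System identifier
headers = {
    'Accept': 'application/vnd.ceph.api.v1.0+json',
    'Content-Type': 'application/json',
    'Authorization': 'Bearer AUTH_TOKEN',   # token obtained from POST /api/auth
}

# Set a byte and file quota on the given directory path.
payload = {'max_bytes': '10737418240', 'max_files': '10000', 'path': '/volumes/group1'}
response = requests.put(base + '/api/cephfs/' + fs_id + '/quota',
                        json=payload, headers=headers, verify=False)
print(response.status_code)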

DELETE /api/cephfs/_FS_ID/snapshot

Description
Remove a snapshot.

Parameters

Replace FS_ID with the Ceph File System identifier string.

Queries:

name - A required string value specifying the snapshot name.

path - A required string value defining the path to the directory.

Status Codes

202 Accepted – Operation is still executing, check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/cephfs/_FS_ID/snapshot

Description
Create a snapshot.

Parameters

Replace FS_ID with the Ceph File System identifier string.

name - A string value specifying the snapshot name. If no name is specified, then a name using the current time in
RFC3339 UTC format is generated.

path - A string value defining the path to the directory.

Example

POST /api/cephfs/FS_ID/snapshot HTTP/1.1


Host: example.com
Content-Type: application/json

{
"name": "STRING",
"path": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing, check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/cephfs/_FS_ID/tree

Description
Remove a directory.

Parameters

Replace FS_ID with the Ceph File System identifier string.

Queries:

path - A required string value defining the path to the directory.

Status Codes

202 Accepted – Operation is still executing, check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/cephfs/_FS_ID/tree

Description
Creates a directory.

Parameters

Replace FS_ID with the Ceph File System identifier string.

path - A string value defining the path to the directory.

Example

POST /api/cephfs/FS_ID/tree HTTP/1.1


Host: example.com
Content-Type: application/json

{
"path": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing, check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Storage cluster configuration


Edit online
The method reference for using the Ceph RESTful API cluster_conf endpoint to manage the IBM Storage Ceph cluster.

GET /api/cluster_conf

Example

GET /api/cluster_conf HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/cluster_conf

Example

POST /api/cluster_conf HTTP/1.1


Host: example.com
Content-Type: application/json

{
"name": "STRING",
"value": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing, check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/cluster_conf

Example

PUT /api/cluster_conf HTTP/1.1


Host: example.com
Content-Type: application/json

{
"options": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing, check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cluster_conf/filter

Description
Display the storage cluster configuration by name.

Parameters

Queries:

names - A string value for the configuration option names.

Example

GET /api/cluster_conf/filter HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/cluster_conf/_NAME

Parameters

Replace NAME with the storage cluster configuration name.

Queries:

section - A required string value.

Status Codes

202 Accepted – Operation is still executing, check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/cluster_conf/_NAME

Parameters

Replace NAME with the storage cluster configuration name.

Example

GET /api/cluster_conf/NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

CRUSH rules
Edit online
The method reference for using the Ceph RESTful API crush_rule endpoint to manage the CRUSH rules.

GET /api/crush_rule

Description
List the CRUSH rule configuration.

Example

GET /api/crush_rule HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/crush_rule

Example

POST /api/crush_rule HTTP/1.1


Host: example.com
Content-Type: application/json

{
"device_class": "STRING",
"failure_domain": "STRING",
"name": "STRING",
"root": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing, check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
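
The following sketch creates a CRUSH rule with the Python requests library, mirroring the request body shown above; the dashboard address, token, and rule attributes are placeholder values.

import requests

base = 'https://192.168.0.44:8443'          # assumed dashboard API address
headers = {
    'Accept': 'application/vnd.ceph.api.v1.0+json',
    'Content-Type': 'application/json',
    'Authorization': 'Bearer AUTH_TOKEN',   # token obtained from POST /api/auth
}

# Create a rule that places data under the default root, one replica per host, on SSD devices.
payload = {
    'name': 'ssd_rule',
    'root': 'default',
    'failure_domain': 'host',
    'device_class': 'ssd',
}
response = requests.post(base + '/api/crush_rule', json=payload, headers=headers, verify=False)
print(response.status_code)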

DELETE /api/crush_rule/_NAME

Parameters

Replace NAME with the rule name.

Status Codes

202 Accepted – Operation is still executing, check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/crush_rule/_NAME

Parameters

Replace NAME with the rule name.

Example

GET /api/crush_rule/NAME HTTP/1.1
Host: example.com

Status Codes

202 Accepted – Operation is still executing, check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Erasure code profiles


Edit online
The method reference for using the Ceph RESTful API erasure_code_profile endpoint to manage the profiles for erasure coding.

GET /api/erasure_code_profile

Description
List erasure-coded profile information.

Example

GET /api/erasure_code_profile HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/erasure_code_profile

Example

POST /api/erasure_code_profile HTTP/1.1


Host: example.com
Content-Type: application/json

{
"name": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing, check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/erasure_code_profile/_NAME

Parameters

Replace NAME with the profile name.

Status Codes

202 Accepted – Operation is still executing, check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/erasure_code_profile/_NAME

Parameters

Replace NAME with the profile name.

Example

GET /api/erasure_code_profile/NAME HTTP/1.1


Host: example.com

Status Codes

202 Accepted – Operation is still executing, check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Feature toggles
Edit online
The method reference for using the Ceph RESTful API feature_toggles endpoint to manage the CRUSH rules.

GET /api/feature_toggles

Description
List the features of IBM Storage Ceph.

Example

GET /api/feature_toggles HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Grafana
Edit online
The method reference for using the Ceph RESTful API grafana endpoint to manage Grafana.

POST /api/grafana/dashboards

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/grafana/url

Description
List the Grafana URL instance.

Example

GET /api/grafana/url HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/grafana/validation/_PARAMS

Parameters

Replace PARAMS with a string value.

Example

GET /api/grafana/validation/PARAMS HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Storage cluster health


Edit online
The method reference for using the Ceph RESTful API health endpoint to display the storage cluster health details and status.

GET /api/health/full

Example

GET /api/health/full HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/health/minimal

Description
Display the storage cluster’s minimal health report.

Example

GET /api/health/minimal HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Host
Edit online
The method reference for using the Ceph RESTful API host endpoint to display host, also known as node, information.

GET /api/host

Description
List the host specifications.

Parameters

Queries:

sources - A string value of host sources.

Example

GET /api/host HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/host

Example

POST /api/host HTTP/1.1


Host: example.com
Content-Type: application/json

{
"hostname": "STRING",
"status": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/host/_HOST_NAME

Parameters

Replace HOST_NAME with the name of the node.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/host/_HOST_NAME

Description
Displays information on the given host.

Parameters

Replace HOST_NAME with the name of the node.

Example

GET /api/host/HOST_NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/host/_HOST_NAME

Description
Updates information for the given host. This method is only supported when the Ceph Orchestrator is enabled.

Parameters

Replace HOST_NAME with the name of the node.

force - Force the host to enter maintenance mode.

labels - A list of labels.

maintenance - Enter or exit maintenance mode.

update_labels - Updates the labels.

Example

PUT /api/host/HOST_NAME HTTP/1.1


Host: example.com
Content-Type: application/json

{
"force": true,

1088 IBM Storage Ceph


"labels": [
"STRING"
],
"maintenance": true,
"update_labels": true
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
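
As a sketch, the following puts a host into maintenance mode with the Python requests library, mirroring the request body shown above; the dashboard address, token, host name, and labels are placeholder values.

import requests

base = 'https://192.168.0.44:8443'          # assumed dashboard API address
hostname = 'host01'                         # placeholder host name
headers = {
    'Accept': 'application/vnd.ceph.api.v1.0+json',
    'Content-Type': 'application/json',
    'Authorization': 'Bearer AUTH_TOKEN',   # token obtained from POST /api/auth
}

# Ask the orchestrator to move the host into maintenance mode and update its labels.
payload = {'maintenance': True, 'force': False, 'update_labels': True, 'labels': ['osd']}
response = requests.put(base + '/api/host/' + hostname, json=payload, headers=headers, verify=False)
print(response.status_code)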

GET /api/host/HOST_NAME/daemons

Parameters

Replace HOST_NAME with the name of the node.

Example

GET /api/host/HOST_NAME/daemons HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/host/HOST_NAME/devices

Parameters

Replace HOST_NAME with the name of the node.

Example

GET /api/host/HOST_NAME/devices HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/host/HOST_NAME/identify_device

Description
Identify a device by switching on the device’s light for a specified number of seconds.

Parameters

Replace HOST_NAME with the name of the node.

device - The device id, such as, /dev/dm-0 or ABC1234DEF567-1R1234_ABC8DE0Q.

duration - The number of seconds the device’s LED should flash.

Example

POST /api/host/HOST_NAME/identify_device HTTP/1.1


Host: example.com
Content-Type: application/json

{
"device": "STRING",
"duration": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/host/HOST_NAME/inventory

Description
Display the inventory of the host.

Parameters

Replace HOST_NAME with the name of the node.

Queries:

refresh - A string value to trigger an asynchronous refresh.

Example

GET /api/host/HOST_NAME/inventory HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/host/HOST_NAME/smart

Parameters

Replace HOST_NAME with the name of the node.

Example

GET /api/host/HOST_NAME/smart HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Logs
Edit online
The method reference for using the Ceph RESTful API logs endpoint to display log information.

GET /api/logs/all

Description
View all the log configuration.

Example

GET /api/logs/all HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Ceph Manager modules


Edit online
The method reference for using the Ceph RESTful API mgr/module endpoint to manage the Ceph Manager modules.

GET /api/mgr/module

Description
View the list of managed modules.

Example

GET /api/mgr/module HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/mgr/module/_MODULE_NAME

Description
Retrieve the values of the persistent configuration settings.

Parameters

Replace MODULE_NAME with the Ceph Manager module name.

Example

GET /api/mgr/module/MODULE_NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/mgr/module/_MODULE_NAME

Description
Set the values of the persistent configuration settings.

Parameters

Replace MODULE_NAME with the Ceph Manager module name.

config - The values of the module options.

Example

PUT /api/mgr/module/MODULE_NAME HTTP/1.1


Host: example.com
Content-Type: application/json

{
"config": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/mgr/module/MODULE_NAME/disable

Description
Disable the given Ceph Manager module.

Parameters

Replace MODULE_NAME with the Ceph Manager module name.

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/mgr/module/MODULE_NAME/enable

Description
Enable the given Ceph Manager module.

Parameters

Replace MODULE_NAME with the Ceph Manager module name.

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/mgr/module/MODULE_NAME/options

Description
View the options for the given Ceph Manager module.

Parameters

Replace MODULE_NAME with the Ceph Manager module name.

Example

GET /api/mgr/module/MODULE_NAME/options HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Ceph Monitor
Edit online
The method reference for using the Ceph RESTful API monitor endpoint to display information on the Ceph Monitor.

GET /api/monitor

Description
View Ceph Monitor details.

Example

GET /api/monitor HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Ceph OSD
Edit online
The method reference for using the Ceph RESTful API osd endpoint to manage the Ceph OSDs.

GET /api/osd

Example

GET /api/osd HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/osd

Example

POST /api/osd HTTP/1.1


Host: example.com
Content-Type: application/json

{
"data": "STRING",
"method": "STRING",
"tracking_id": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/osd/flags

Description
View the Ceph OSD flags.

Example

GET /api/osd/flags HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/osd/flags

Description
Sets the Ceph OSD flags for the entire storage cluster.

Parameters

The recovery_deletes, sortbitwise, and pglog_hardlimit flags can not be unset.

The purged_snapshots flag can not be set.

IMPORTANT: You must include these four flags for a successful operation.

Example

PUT /api/osd/flags HTTP/1.1


Host: example.com
Content-Type: application/json

{
"flags": [
"STRING"
]
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/osd/flags/individual

Description
View the individual Ceph OSD flags.

Example

GET /api/osd/flags/individual HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/osd/flags/individual

Description
Updates the noout, noin, nodown, and noup flags for an individual subset of Ceph OSDs.

Example

PUT /api/osd/flags/individual HTTP/1.1


Host: example.com
Content-Type: application/json

{
"flags": {
"nodown": true,
"noin": true,
"noout": true,
"noup": true
},
"ids": [
1
]
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
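
For example, the following sketch sets noout and noup on two specific OSDs with the Python requests library, mirroring the request body shown above; the dashboard address, token, and OSD IDs are placeholder values.

import requests

base = 'https://192.168.0.44:8443'          # assumed dashboard API address
headers = {
    'Accept': 'application/vnd.ceph.api.v1.0+json',
    'Content-Type': 'application/json',
    'Authorization': 'Bearer AUTH_TOKEN',   # token obtained from POST /api/auth
}

# Set noout and noup only on OSDs 1 and 2, leaving the cluster-wide flags untouched.
payload = {
    'flags': {'noout': True, 'noup': True, 'noin': False, 'nodown': False},
    'ids': [1, 2],
}
response = requests.put(base + '/api/osd/flags/individual',
                        json=payload, headers=headers, verify=False)
print(response.status_code)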

GET /api/osd/safe_to_delete

Parameters

Queries:

svc_ids - A required string of the Ceph OSD service identifier.

Example

GET /api/osd/safe_to_delete?svc_ids=STRING HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/osd/safe_to_destroy

Description
Check to see if the Ceph OSD is safe to destroy.

Parameters

Queries:

ids: A required string of the Ceph OSD service identifier.

Example

GET /api/osd/safe_to_destroy?ids=STRING HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
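
Because destroying an OSD is irreversible, it is worth running this check from scripts before calling the destroy endpoint below.
A minimal sketch, assuming OSD 3 and the conventions from the earlier sketches:

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth

# Ask whether OSD 3 can be destroyed without risking data availability.
resp = requests.get(f"{BASE}/api/osd/safe_to_destroy",
                    params={"ids": "3"}, headers=HEADERS, verify=False)
print(resp.json())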

DELETE /api/osd/_SVC_ID

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Queries:

preserve_id - A string value.

force - A string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/osd/_SVC_ID

Description
Returns collected data about a Ceph OSD.

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Example

GET /api/osd/SVC_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/osd/_SVC_ID

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Example

PUT /api/osd/SVC_ID HTTP/1.1


Host: example.com
Content-Type: application/json

{
"device_class": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/osd/_SVC_ID/destroy

Description
Marks the Ceph OSD as being destroyed. The Ceph OSD must be marked down before being destroyed. This operation keeps the
Ceph OSD identifier intact, but removes the Cephx keys, configuration key data, and lockbox keys.

WARNING: This operation renders the data permanently unreadable.

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/osd/_SVC_ID/devices

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Example

GET /api/osd/SVC_ID/devices HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/osd/_SVC_ID/histogram

Description
Returns the Ceph OSD histogram data.

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Example

GET /api/osd/SVC_ID/histogram HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/osd/_SVC_ID/mark

Description
Marks a Ceph OSD out, in, down, and lost.

NOTE: A Ceph OSD must be marked down before marking it lost.

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Example

PUT /api/osd/SVC_ID/mark HTTP/1.1


Host: example.com
Content-Type: application/json

{
"action": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
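
The note above requires marking the OSD down before marking it lost, so a script typically issues the two actions in order. A minimal
sketch, assuming OSD 5 and the conventions from the earlier sketches:

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Content-Type": "application/json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth
OSD_ID = 5                          # assumption: the OSD to act on

# Mark the OSD down first, then mark it lost, as required by the note above.
for action in ("down", "lost"):
    resp = requests.put(f"{BASE}/api/osd/{OSD_ID}/mark",
                        json={"action": action}, headers=HEADERS, verify=False)
    print(action, resp.status_code)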

POST /api/osd/_SVC_ID/purge

Description
Removes the Ceph OSD from the CRUSH map.

NOTE: The Ceph OSD must be marked down before removal.

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/osd/_SVC_ID/reweight

Description
Temporarily reweights the Ceph OSD. When a Ceph OSD is marked out, the OSD’s weight is set to 0. When the Ceph OSD is
marked back in, the OSD’s weight is set to 1.

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Example

POST /api/osd/SVC_ID/reweight HTTP/1.1


Host: example.com
Content-Type: application/json

{
"weight": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/osd/_SVC_ID/scrub

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Queries:

deep - A boolean value, either true or false.

Example

POST /api/osd/SVC_ID/scrub HTTP/1.1


Host: example.com
Content-Type: application/json

{
"deep": true
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/osd/_SVC_ID/smart

Parameters

Replace SVC_ID with a string value for the Ceph OSD service identifier.

Example

GET /api/osd/SVC_ID/smart HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Ceph Object Gateway
Edit online
The method reference for using the Ceph RESTful API rgw endpoint to manage the Ceph Object Gateway.

GET /api/rgw/status

Description
Display the Ceph Object Gateway status.

Example

GET /api/rgw/status HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/rgw/daemon

Description
Display the Ceph Object Gateway daemons.

Example

GET /api/rgw/daemon HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/rgw/daemon/_SVC_ID

Parameters

Replace SVC_ID with the service identifier as a string value.

Example

GET /api/rgw/daemon/SVC_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/rgw/site

Parameters

Queries:

query - A string value.

daemon_name - The name of the daemon as a string value.

Example

GET /api/rgw/site HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Bucket Management

GET /api/rgw/bucket

Parameters

Queries:

stats - A boolean value for bucket statistics.

daemon_name - The name of the daemon as a string value.

Example

GET /api/rgw/bucket HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/rgw/bucket

Example

POST /api/rgw/bucket HTTP/1.1


Host: example.com
Content-Type: application/json

{
"bucket": "STRING",
"daemon_name": "STRING",
"lock_enabled": "false",
"lock_mode": "STRING",
"lock_retention_period_days": "STRING",
"lock_retention_period_years": "STRING",
"placement_target": "STRING",

IBM Storage Ceph 1103


"uid": "STRING",
"zonegroup": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
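
A minimal Python sketch of the bucket creation call above; the bucket name, owner, and placement values are assumptions, and only a
subset of the available fields is sent.

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Content-Type": "application/json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth

# Create a bucket owned by an existing Ceph Object Gateway user.
payload = {
    "bucket": "example-bucket",               # assumption: new bucket name
    "uid": "testuser",                        # assumption: an existing gateway user
    "zonegroup": "default",                   # assumption: zonegroup name
    "placement_target": "default-placement",  # assumption: placement target
    "lock_enabled": "false",
}
resp = requests.post(f"{BASE}/api/rgw/bucket", json=payload,
                     headers=HEADERS, verify=False)
print(resp.status_code)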

DELETE /api/rgw/bucket/_BUCKET

Parameters

Replace BUCKET with the bucket name as a string value.

Queries:

purge_objects - A string value.

daemon_name - The name of the daemon as a string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/rgw/bucket/_BUCKET

Parameters

Replace BUCKET with the bucket name as a string value.

Queries:

daemon_name - The name of the daemon as a string value.

Example

GET /api/rgw/bucket/BUCKET HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/rgw/bucket/_BUCKET

Parameters

Replace BUCKET with the bucket name as a string value.

Example

PUT /api/rgw/bucket/BUCKET HTTP/1.1


Host: example.com
Content-Type: application/json

{
"bucket_id": "STRING",
"daemon_name": "STRING",
"lock_mode": "STRING",
"lock_retention_period_days": "STRING",
"lock_retention_period_years": "STRING",
"mfa_delete": "STRING",
"mfa_token_pin": "STRING",
"mfa_token_serial": "STRING",
"uid": "STRING",
"versioning_state": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

User Management

GET /api/rgw/user

Description
Display the Ceph Object Gateway users.

Parameters

Queries:

daemon_name - The name of the daemon as a string value.

Example

GET /api/rgw/user HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/rgw/user

Example

POST /api/rgw/user HTTP/1.1


Host: example.com
Content-Type: application/json

{
"access_key": "STRING",
"daemon_name": "STRING",
"display_name": "STRING",
"email": "STRING",
"generate_key": "STRING",
"max_buckets": "STRING",
"secret_key": "STRING",
"suspended": "STRING",
"uid": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
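
A minimal sketch of creating a gateway user through this endpoint; the user details are assumptions, and generate_key asks the
gateway to create the S3 key pair instead of supplying one.

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Content-Type": "application/json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth

# Create a Ceph Object Gateway user and let the gateway generate its keys.
payload = {
    "uid": "testuser",                  # assumption: new user identifier
    "display_name": "Test User",
    "email": "testuser@example.com",
    "generate_key": "true",
    "max_buckets": "1000",
    "suspended": "false",
}
resp = requests.post(f"{BASE}/api/rgw/user", json=payload,
                     headers=HEADERS, verify=False)
print(resp.status_code)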

GET /api/rgw/user/get_emails

Parameters

Queries:

daemon_name - The name of the daemon as a string value.

Example

GET /api/rgw/user/get_emails HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/rgw/user/_UID

Parameters

Replace UID with the user identifier as a string.

Queries:

daemon_name - The name of the daemon as a string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/rgw/user/_UID

Parameters

Replace UID with the user identifier as a string.

Queries:

daemon_name - The name of the daemon as a string value.

stats - A boolean value for user statistics.

Example

GET /api/rgw/user/UID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/rgw/user/_UID

Parameters

Replace UID with the user identifier as a string.

Example

PUT /api/rgw/user/UID HTTP/1.1


Host: example.com
Content-Type: application/json

{
"daemon_name": "STRING",
"display_name": "STRING",
"email": "STRING",
"max_buckets": "STRING",
"suspended": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/rgw/user/_UID_/capability

Parameters

Replace UID with the user identifier as a string.

Queries:

daemon_name - The name of the daemon as a string value.

type - Required. A string value.

perm - Required. A string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/rgw/user/_UID_/capability

Parameters

Replace UID with the user identifier as a string.

Example

POST /api/rgw/user/UID/capability HTTP/1.1


Host: example.com
Content-Type: application/json

{
"daemon_name": "STRING",
"perm": "STRING",
"type": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/rgw/user/_UID_/key

Parameters

Replace UID with the user identifier as a string.

Queries:

daemon_name - The name of the daemon as a string value.

key_type - A string value.

subuser - A string value.

access_key - A string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/rgw/user/_UID_/key

Parameters

Replace UID with the user identifier as a string.

Example

POST /api/rgw/user/UID/key HTTP/1.1


Host: example.com
Content-Type: application/json

{
"access_key": "STRING",
"daemon_name": "STRING",
"generate_key": "true",
"key_type": "s3",
"secret_key": "STRING",
"subuser": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/rgw/user/_UID_/quota

Parameters

Replace UID with the user identifier as a string.

Example

GET /api/rgw/user/UID/quota HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/rgw/user/_UID_/quota

Parameters

Replace UID with the user identifier as a string.

Example

PUT /api/rgw/user/UID/quota HTTP/1.1


Host: example.com
Content-Type: application/json

{
"daemon_name": "STRING",
"enabled": "STRING",
"max_objects": "STRING",
"max_size_kb": 1,
"quota_type": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
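
A minimal sketch of enabling a user quota with this endpoint; the quota values, and the quota_type value of user, are assumptions
mirroring the body shown above.

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Content-Type": "application/json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth
UID = "testuser"                    # assumption: an existing gateway user

# Enable a user-scoped quota of roughly 1 GiB and 10,000 objects.
payload = {
    "quota_type": "user",           # assumption: user-scoped quota
    "enabled": "true",
    "max_objects": "10000",
    "max_size_kb": 1048576,         # 1 GiB expressed in KB
}
resp = requests.put(f"{BASE}/api/rgw/user/{UID}/quota", json=payload,
                    headers=HEADERS, verify=False)
print(resp.status_code)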

POST /api/rgw/user/_UID_/subuser

Parameters

Replace UID with the user identifier as a string.

Example

POST /api/rgw/user/UID/subuser HTTP/1.1


Host: example.com
Content-Type: application/json

{
"access": "STRING",
"access_key": "STRING",
"daemon_name": "STRING",
"generate_secret": "true",
"key_type": "s3",
"secret_key": "STRING",
"subuser": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/rgw/user/_UID_/subuser/_SUBUSER

Parameters

Replace UID with the user identifier as a string.

Replace SUBUSER with the sub user name as a string.

Queries:

purge_keys - Set to false to not purge the keys. This only works for S3 subusers.

daemon_name - The name of the daemon as a string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

REST APIs for manipulating a role


Edit online
In addition to the radosgw-admin role commands, you can use the REST APIs for manipulating a role.

To invoke the REST admin APIs, create a user with admin caps.

Example

[root@host01 ~]# radosgw-admin --uid TESTER --display-name "TestUser" --access_key TESTER --secret test123 user create
[root@host01 ~]# radosgw-admin caps add --uid="TESTER" --caps="roles=*"

Create a role:

Syntax

POST “<hostname>?
Action=CreateRole&RoleName=ROLE_NAME&Path=PATH_TO_FILE&AssumeRolePolicyDocument=TRUST_RELATION
SHIP_POLICY_DOCUMENT”

Example

POST “<hostname>?
Action=CreateRole&RoleName=S3Access&Path=/application_abc/component_xyz/&AssumeRolePolicyDocum
ent={"Version":"2022-06-17","Statement":[{"Effect":"Allow","Principal":{"AWS":
["arn:aws:iam:::user/TESTER"]},"Action":["sts:AssumeRole"]}]}”

Example response

<role>
<id>8f41f4e0-7094-4dc0-ac20-074a881ccbc5</id>
<name>S3Access</name>
<path>/application_abc/component_xyz/</path>
<arn>arn:aws:iam:::role/application_abc/component_xyz/S3Access</arn>
<create_date>2022-06-23T07:43:42.811Z</create_date>
<max_session_duration>3600</max_session_duration>
<assume_role_policy_document>{"Version":"2022-06-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":
["sts:AssumeRole"]}]}</assume_role_policy_document>
</role>

Get a role:

Syntax

POST “<hostname>?Action=GetRole&RoleName=ROLE_NAME”

Example

POST “<hostname>?Action=GetRole&RoleName=S3Access”

Example response

<role>
<id>8f41f4e0-7094-4dc0-ac20-074a881ccbc5</id>
<name>S3Access</name>
<path>/application_abc/component_xyz/</path>
<arn>arn:aws:iam:::role/application_abc/component_xyz/S3Access</arn>
<create_date>2022-06-23T07:43:42.811Z</create_date>
<max_session_duration>3600</max_session_duration>
<assume_role_policy_document>{"Version":"2022-06-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":
["sts:AssumeRole"]}]}</assume_role_policy_document>
</role>

List a role:

Syntax

POST “<hostname>?Action=ListRoles&RoleName=ROLE_NAME&PathPrefix=PATH_PREFIX”

Example request

POST “<hostname>?Action=ListRoles&RoleName=S3Access&PathPrefix=/application”

Example response

<role>
<id>8f41f4e0-7094-4dc0-ac20-074a881ccbc5</id>
<name>S3Access</name>
<path>/application_abc/component_xyz/</path>
<arn>arn:aws:iam:::role/application_abc/component_xyz/S3Access</arn>
<create_date>2022-06-23T07:43:42.811Z</create_date>
<max_session_duration>3600</max_session_duration>
<assume_role_policy_document>{"Version":"2022-06-17","Statement":
[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/TESTER"]},"Action":
["sts:AssumeRole"]}]}</assume_role_policy_document>
</role>

Update the assume role policy document:

Syntax

POST “<hostname>?
Action=UpdateAssumeRolePolicy&RoleName=ROLE_NAME&PolicyDocument=TRUST_RELATIONSHIP_POLICY_DOCU
MENT”

Example

POST “<hostname>?Action=UpdateAssumeRolePolicy&RoleName=S3Access&PolicyDocument=
{"Version":"2022-06-17","Statement":[{"Effect":"Allow","Principal":{"AWS":
["arn:aws:iam:::user/TESTER2"]},"Action":["sts:AssumeRole"]}]}”

Update policy attached to a role:

Syntax

POST “<hostname>?
Action=PutRolePolicy&RoleName=ROLE_NAME&PolicyName=POLICY_NAME&PolicyDocument=TRUST_RELATIONSH
IP_POLICY_DOCUMENT”

Example

POST “<hostname>?Action=PutRolePolicy&RoleName=S3Access&PolicyName=Policy1&PolicyDocument=
{"Version":"2022-06-17","Statement":[{"Effect":"Allow","Action":
["s3:CreateBucket"],"Resource":"arn:aws:s3:::example_bucket"}]}”

List permission policy names attached to a role:

Syntax

POST “<hostname>?Action=ListRolePolicies&RoleName=ROLE_NAME”

Example

POST “<hostname>?Action=ListRolePolicies&RoleName=S3Access”

<PolicyNames>
<member>Policy1</member>
</PolicyNames>

Get permission policy attached to a role:

Syntax

POST “<hostname>?Action=GetRolePolicy&RoleName=ROLE_NAME&PolicyName=POLICY_NAME”

Example

POST “<hostname>?Action=GetRolePolicy&RoleName=S3Access&PolicyName=Policy1”

<GetRolePolicyResult>
<PolicyName>Policy1</PolicyName>
<RoleName>S3Access</RoleName>
<Permission_policy>{"Version":"2022-06-17","Statement":[{"Effect":"Allow","Action":
["s3:CreateBucket"],"Resource":"arn:aws:s3:::example_bucket"}]}</Permission_policy>
</GetRolePolicyResult>

Delete policy attached to a role:

Syntax

POST “<hostname>?Action=DeleteRolePolicy&RoleName=ROLE_NAME&PolicyName=POLICY_NAME”

Example

POST “<hostname>?Action=DeleteRolePolicy&RoleName=S3Access&PolicyName=Policy1”

Delete a role:

NOTE: You can delete a role only when it does not have any permission policy attached to it.

Syntax

POST “<hostname>?Action=DeleteRole&RoleName=ROLE_NAME”

Example

POST “<hostname>?Action=DeleteRole&RoleName=S3Access”
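
These role actions can also be issued from a script instead of hand-built signed requests. The following is one possible sketch that
uses the boto3 IAM client pointed at the gateway; the endpoint URL, the region value, and the credentials of the TESTER user created
above are assumptions, and this is not the only way to sign these requests.

import json

import boto3

# Assumptions: the gateway endpoint and the admin-capable TESTER credentials
# created with radosgw-admin above. The region value is arbitrary here.
iam = boto3.client(
    "iam",
    endpoint_url="http://rgw-host:8080",
    aws_access_key_id="TESTER",
    aws_secret_access_key="test123",
    region_name="us-east-1",
)

# Trust relationship policy matching the examples above.
trust_policy = {
    "Version": "2022-06-17",
    "Statement": [{"Effect": "Allow",
                   "Principal": {"AWS": ["arn:aws:iam:::user/TESTER"]},
                   "Action": ["sts:AssumeRole"]}],
}

iam.create_role(RoleName="S3Access",
                Path="/application_abc/component_xyz/",
                AssumeRolePolicyDocument=json.dumps(trust_policy))
print(iam.get_role(RoleName="S3Access")["Role"]["Arn"])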

Reference
Edit online

See the Role management for details.

Ceph Orchestrator
Edit online
The method reference for using the Ceph RESTful API orchestrator endpoint to display the Ceph Orchestrator status.

GET /api/orchestrator/status

Description
Display the Ceph Orchestrator status.

Example

GET /api/orchestrator/status HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Pools
Edit online
The method reference for using the Ceph RESTful API pool endpoint to manage the storage pools.

GET /api/pool

Description
Display the pool list.

Parameters

Queries:

attrs - A string value of pool attributes.

stats - A boolean value for pool statistics.

Example

GET /api/pool HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/pool

Example

POST /api/pool HTTP/1.1


Host: example.com
Content-Type: application/json

{
"application_metadata": "STRING",
"configuration": "STRING",
"erasure_code_profile": "STRING",
"flags": "STRING",
"pg_num": 1,
"pool": "STRING",
"pool_type": "STRING",

1114 IBM Storage Ceph


"rule_name": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
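
A minimal sketch of creating a replicated pool through this endpoint; the pool name, placement group count, and CRUSH rule name are
assumptions, and only a subset of the available fields is sent.

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Content-Type": "application/json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth

# Create a small replicated pool.
payload = {
    "pool": "example-pool",          # assumption: new pool name
    "pool_type": "replicated",
    "pg_num": 32,                    # assumption: placement group count
    "rule_name": "replicated_rule",  # assumption: default replicated CRUSH rule
}
resp = requests.post(f"{BASE}/api/pool", json=payload,
                     headers=HEADERS, verify=False)
print(resp.status_code)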

DELETE /api/pool/_POOL_NAME

Parameters

Replace POOL_NAME with the name of the pool.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/pool/_POOL_NAME

Parameters

Replace POOL_NAME with the name of the pool.

Queries:

attrs - A string value of pool attributes.

stats - A boolean value for pool statistics.

Example

GET /api/pool/POOL_NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/pool/_POOL_NAME

Parameters

Replace POOL_NAME with the name of the pool.

Example

PUT /api/pool/POOL_NAME HTTP/1.1
Host: example.com
Content-Type: application/json

{
"application_metadata": "STRING",
"configuration": "STRING",
"flags": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/pool/POOL_NAME/configuration

Parameters

Replace POOL_NAME with the name of the pool.

Example

GET /api/pool/POOL_NAME/configuration HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Prometheus
Edit online
The method reference for using the Ceph RESTful API prometheus endpoint to manage Prometheus.

GET /api/prometheus

Example

GET /api/prometheus HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/prometheus/rules

Example

GET /api/prometheus/rules HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/prometheus/silence

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
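
No request body is shown above for creating a silence. In deployments where the dashboard proxies Alertmanager, the body generally
follows the Alertmanager silence format; the following sketch assumes that format, and the matcher, author, and time window are
illustrative.

from datetime import datetime, timedelta, timezone

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Content-Type": "application/json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth

# Silence a hypothetical alert for two hours (Alertmanager-style body, assumed).
now = datetime.now(timezone.utc)
payload = {
    "matchers": [{"name": "alertname", "value": "CephHealthWarning", "isRegex": False}],
    "startsAt": now.isoformat(),
    "endsAt": (now + timedelta(hours=2)).isoformat(),
    "createdBy": "admin",
    "comment": "Planned maintenance window",
}
resp = requests.post(f"{BASE}/api/prometheus/silence", json=payload,
                     headers=HEADERS, verify=False)
print(resp.status_code)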

DELETE /api/prometheus/silence/_S_ID

Parameters

Replace S_ID with a string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/prometheus/silences

Example

GET /api/prometheus/silences HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/prometheus/notifications

Example

GET /api/prometheus/notifications HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

RADOS block device


Edit online
The method reference for using the Ceph RESTful API block endpoint to manage RADOS block devices (RBD). This reference
includes all available RBD feature endpoints, such as:

RBD Namespace

RBD Snapshots

RBD Trash

RBD Mirroring

RBD Mirroring Summary

RBD Mirroring Pool Bootstrap

RBD Mirroring Pool Mode

RBD Mirroring Pool Peer

RBD Images

GET /api/block/image

Description
View the RBD images.

Parameters

Queries:

pool_name - The pool name as a string.

Example

GET /api/block/image HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image

Example

POST /api/block/image HTTP/1.1


Host: example.com
Content-Type: application/json

{
"configuration": "STRING",
"data_pool": "STRING",
"features": "STRING",
"name": "STRING",
"namespace": "STRING",
"obj_size": 1,
"pool_name": "STRING",
"size": 1,
"stripe_count": 1,
"stripe_unit": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
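
A minimal sketch of creating an RBD image through this endpoint; the pool name, the image name, and the assumption that size is
given in bytes are illustrative, and only a subset of the available fields is sent.

import requests

BASE = "https://example.com:8443"   # assumption: dashboard URL
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json",
           "Content-Type": "application/json",
           "Authorization": "Bearer TOKEN"}   # token from /api/auth

# Create a 10 GiB image in an existing RBD pool.
payload = {
    "pool_name": "rbd",             # assumption: an existing pool with the rbd application
    "name": "example-image",        # assumption: new image name
    "size": 10 * 1024 ** 3,         # assumption: size expressed in bytes (10 GiB)
}
resp = requests.post(f"{BASE}/api/block/image", json=payload,
                     headers=HEADERS, verify=False)
print(resp.status_code)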

GET /api/block/image/clone_format_version

Description
Returns the RBD clone format version.

Example

GET /api/block/image/clone_format_version HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/block/image/default_features

Example

GET /api/block/image/default_features HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/block/image/_IMAGE_SPEC

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/block/image/_IMAGE_SPEC

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Example

GET /api/block/image/IMAGE_SPEC HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/block/image/_IMAGE_SPEC

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Example

PUT /api/block/image/IMAGE_SPEC HTTP/1.1


Host: example.com
Content-Type: application/json

{
"configuration": "STRING",
"features": "STRING",
"name": "STRING",
"size": 1
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image/IMAGE_SPEC/copy

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Example

POST /api/block/image/IMAGE_SPEC/copy HTTP/1.1


Host: example.com
Content-Type: application/json

{
"configuration": "STRING",
"data_pool": "STRING",
"dest_image_name": "STRING",
"dest_namespace": "STRING",
"dest_pool_name": "STRING",
"features": "STRING",
"obj_size": 1,
"snapshot_name": "STRING",
"stripe_count": 1,
"stripe_unit": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image/IMAGE_SPEC/flatten

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image/IMAGE_SPEC/move_trash

Description
Move an image to the trash. Images actively in use by clones can be moved to the trash and deleted at a later time.

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Example

POST /api/block/image/IMAGE_SPEC/move_trash HTTP/1.1


Host: example.com
Content-Type: application/json

{
"delay": 1
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

RBD Mirroring

GET /api/block/mirroring/site_name

Description
Display the RBD mirroring site name.

Example

GET /api/block/mirroring/site_name HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/block/mirroring/site_name

Example

PUT /api/block/mirroring/site_name HTTP/1.1


Host: example.com
Content-Type: application/json

{
"site_name": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

RBD Mirroring Pool Bootstrap

POST /api/block/mirroring/pool/POOL_NAME/bootstrap/peer

Parameters

Replace POOL_NAME with the name of the pool as a string.

Example

POST /api/block/mirroring/pool/POOL_NAME/bootstrap/peer HTTP/1.1


Host: example.com
Content-Type: application/json

{
"direction": "STRING",
"token": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/mirroring/pool/POOL_NAME/bootstrap/token

Parameters

Replace POOL_NAME with the name of the pool as a string.

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.
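
The two bootstrap endpoints are normally used as a pair: a token is created against the pool on one site and imported as a peer on the
other. The following is a minimal sketch of that flow, assuming two dashboards, a pool named rbd-mirror-pool enabled for mirroring on
both sites, the direction value rx-tx for two-way mirroring, and that the token response carries a token field.

import requests

SITE_A = "https://site-a.example.com:8443"   # assumption: primary site dashboard
SITE_B = "https://site-b.example.com:8443"   # assumption: secondary site dashboard
HEADERS_A = {"Accept": "application/vnd.ceph.api.v1.0+json",
             "Content-Type": "application/json",
             "Authorization": "Bearer TOKEN_A"}   # token from site A /api/auth
HEADERS_B = dict(HEADERS_A, Authorization="Bearer TOKEN_B")   # token from site B /api/auth
POOL = "rbd-mirror-pool"            # assumption: pool to mirror

# 1. Create a bootstrap token for the pool on site A.
token_resp = requests.post(f"{SITE_A}/api/block/mirroring/pool/{POOL}/bootstrap/token",
                           headers=HEADERS_A, verify=False)
token = token_resp.json().get("token")   # assumption: the response carries the token here

# 2. Import the token on site B as a peer.
peer = requests.post(f"{SITE_B}/api/block/mirroring/pool/{POOL}/bootstrap/peer",
                     json={"direction": "rx-tx", "token": token},
                     headers=HEADERS_B, verify=False)
print(peer.status_code)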

RBD Mirroring Pool Mode

GET /api/block/mirroring/pool/_POOL_NAME

Description
Display the RBD mirroring summary.

Parameters

Replace POOL_NAME with the name of the pool as a string.

Example

GET /api/block/mirroring/pool/POOL_NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/block/mirroring/pool/_POOL_NAME

Parameters

Replace POOL_NAME with the name of the pool as a string.

Example

PUT /api/block/mirroring/pool/POOL_NAME HTTP/1.1


Host: example.com
Content-Type: application/json

{
"mirror_mode": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

RBD Mirroring Pool Peer

GET /api/block/mirroring/pool/POOL_NAME/peer

Parameters

Replace POOL_NAME with the name of the pool as a string.

Example

GET /api/block/mirroring/pool/POOL_NAME/peer HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/mirroring/pool/POOL_NAME/peer

Parameters

Replace POOL_NAME with the name of the pool as a string.

Example

POST /api/block/mirroring/pool/POOL_NAME/peer HTTP/1.1


Host: example.com
Content-Type: application/json

{
"client_id": "STRING",
"cluster_name": "STRING",
"key": "STRING",
"mon_host": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/block/mirroring/pool/POOL_NAME/peer/_PEER_UUID

Parameters

Replace POOL_NAME with the name of the pool as a string.

Replace PEER_UUID with the UUID of the peer as a string.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/block/mirroring/pool/POOL_NAME/peer/_PEER_UUID

Parameters

Replace POOL_NAME with the name of the pool as a string.

Replace PEER_UUID with the UUID of the peer as a string.

Example

GET /api/block/mirroring/pool/POOL_NAME/peer/PEER_UUID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/block/mirroring/pool/POOL_NAME/peer/_PEER_UUID

Parameters

Replace POOL_NAME with the name of the pool as a string.

Replace PEER_UUID with the UUID of the peer as a string.

Example

PUT /api/block/mirroring/pool/POOL_NAME/peer/PEER_UUID HTTP/1.1


Host: example.com
Content-Type: application/json

{
"client_id": "STRING",
"cluster_name": "STRING",
"key": "STRING",
"mon_host": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

RBD Mirroring Summary

GET /api/block/mirroring/summary

Description
Display the RBD mirroring summary.

Example

GET /api/block/mirroring/summary HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

RBD Namespace

GET /api/block/pool/POOL_NAME/namespace

Parameters

Replace POOL_NAME with the name of the pool as a string.

Example

GET /api/block/pool/POOL_NAME/namespace HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/pool/POOL_NAME/namespace

Parameters

Replace POOL_NAME with the name of the pool as a string.

Example

POST /api/block/pool/POOL_NAME/namespace HTTP/1.1


Host: example.com
Content-Type: application/json

{
"namespace": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/block/pool/POOL_NAME/namespace/_NAMESPACE

Parameters

Replace POOL_NAME with the name of the pool as a string.

Replace NAMESPACE with the namespace as a string.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

RBD Snapshots

POST /api/block/image/IMAGE_SPEC/snap

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Example

POST /api/block/image/IMAGE_SPEC/snap HTTP/1.1


Host: example.com
Content-Type: application/json

{
"snapshot_name": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/block/image/IMAGE_SPEC/snap/_SNAPSHOT_NAME

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Replace SNAPSHOT_NAME with the name of the snapshot as a string value.

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/block/image/IMAGE_SPEC/snap/_SNAPSHOT_NAME

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Replace SNAPSHOT_NAME with the name of the snapshot as a string value.

Example

PUT /api/block/image/IMAGE_SPEC/snap/SNAPSHOT_NAME HTTP/1.1


Host: example.com
Content-Type: application/json

{
"is_protected": true,
"new_snap_name": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image/IMAGE_SPEC/snap/SNAPSHOT_NAME/clone

Description
Clones a snapshot to an image.

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Replace SNAPSHOT_NAME with the name of the snapshot as a string value.

Example

POST /api/block/image/IMAGE_SPEC/snap/SNAPSHOT_NAME/clone HTTP/1.1


Host: example.com
Content-Type: application/json

{
"child_image_name": "STRING",
"child_namespace": "STRING",
"child_pool_name": "STRING",
"configuration": "STRING",
"data_pool": "STRING",
"features": "STRING",
"obj_size": 1,
"stripe_count": 1,
"stripe_unit": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image/IMAGE_SPEC/snap/SNAPSHOT_NAME/rollback

Parameters

Replace IMAGE_SPEC with the image name as a string value.

Replace SNAPSHOT_NAME with the name of the snapshot as a string value.

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

RBD Trash

GET /api/block/image/trash

Description
Display all the RBD trash entries, or the RBD trash details by pool name.

Parameters

Queries:

pool_name - The name of the pool as a string value.

Example

GET /api/block/image/trash HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image/trash/purge

Description
Remove all the expired images from trash.

Parameters

Queries:

pool_name - The name of the pool as a string value.

Example

POST /api/block/image/trash/purge HTTP/1.1


Host: example.com
Content-Type: application/json

{
"pool_name": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/block/image/trash/IMAGE_ID_SPEC

Description
Deletes an image from the trash. If the image deferment time has not expired, you cannot delete it unless you use force. An image that is actively in use by clones, or that has snapshots, cannot be deleted.

Parameters

Replace IMAGE_ID_SPEC with the image ID specification as a string value.

Queries:

force - A boolean value to force the deletion of an image from trash.
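
A deletion request of this form is illustrative; IMAGE_ID_SPEC is a placeholder, and the optional force query can be appended as ?force=true.

Example

DELETE /api/block/image/trash/IMAGE_ID_SPEC HTTP/1.1
Host: example.com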

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/block/image/trash/IMAGE_ID_SPEC/restore

Description
Restores an image from the trash.

Parameters

Replace IMAGE_ID_SPEC with the image ID specification as a string value.

Example

POST /api/block/image/trash/IMAGE_ID_SPEC/restore HTTP/1.1


Host: example.com
Content-Type: application/json

{
"new_image_name": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Performance counters
Edit online
The method reference for using the Ceph RESTful API perf_counters endpoint to display the various Ceph performance counters.
This reference includes all available performance counter endpoints, such as:

Ceph Metadata Server (MDS)

Ceph Manager

Ceph Monitor

Ceph OSD

Ceph Object Gateway

Ceph RADOS Block Device (RBD) Mirroring

TCMU Runner

GET /api/perf_counters

Description
Displays the performance counters.

Example

GET /api/perf_counters HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Ceph Metadata Server (MDS)

GET /api/perf_counters/mds/SERVICE_ID

Parameters

Replace SERVICE_ID with the required service identifier as a string.

Example

GET /api/perf_counters/mds/SERVICE_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Ceph Manager

GET /api/perf_counters/mgr/SERVICE_ID

Parameters

Replace SERVICE_ID with the required service identifier as a string.

Example

GET /api/perf_counters/mgr/SERVICE_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Ceph Monitor

GET /api/perf_counters/mon/SERVICE_ID

Parameters

Replace SERVICE_ID with the required service identifier as a string.

Example

GET /api/perf_counters/mon/SERVICE_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Ceph OSD

GET /api/perf_counters/osd/SERVICE_ID

Parameters

Replace SERVICE_ID with the required service identifier as a string.

Example

GET /api/perf_counters/osd/SERVICE_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Ceph RADOS Block Device (RBD) Mirroring

GET /api/perf_counters/rbd-mirror/SERVICE_ID

Parameters

Replace SERVICE_ID with the required service identifier as a string.

Example

GET /api/perf_counters/rbd-mirror/SERVICE_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Ceph Object Gateway

GET /api/perf_counters/rgw/SERVICE_ID

Parameters

Replace SERVICE_ID with the required service identifier as a string.

Example

GET /api/perf_counters/rgw/SERVICE_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

TCMU Runner

GET /api/perf_counters/tcmu-runner/SERVICE_ID

Parameters

Replace SERVICE_ID with the required service identifier as a string.

Example

GET /api/perf_counters/tcmu-runner/SERVICE_ID HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Roles
Edit online
The method reference for using the Ceph RESTful API role endpoint to manage the various user roles in Ceph.

GET /api/role

Description
Display the role list.

Example

GET /api/role HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/role

Example

POST /api/role HTTP/1.1


Host: example.com
Content-Type: application/json

{
"description": "STRING",
"name": "STRING",
"scopes_permissions": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/role/NAME

Parameters

Replace NAME with the role name as a string.
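
A deletion request of this form is illustrative; NAME is a placeholder for the role name.

Example

DELETE /api/role/NAME HTTP/1.1
Host: example.com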

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/role/NAME

Parameters

Replace NAME with the role name as a string.

Example

GET /api/role/NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/role/NAME

Parameters

Replace NAME with the role name as a string.

Example

PUT /api/role/NAME HTTP/1.1


Host: example.com
Content-Type: application/json

{
"description": "STRING",
"scopes_permissions": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/role/NAME/clone

Parameters

Replace NAME with the role name as a string.

Example

POST /api/role/NAME/clone HTTP/1.1


Host: example.com
Content-Type: application/json

{
"new_name": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Services
Edit online
The method reference for using the Ceph RESTful API service endpoint to manage the various Ceph services.

GET /api/service

Parameters

Queries:

service_name - The name of the service as a string.

Example

GET /api/service HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/service

Parameters

service_spec - The service specification as a JSON file.

service_name - The name of the service.

Example

POST /api/service HTTP/1.1


Host: example.com
Content-Type: application/json

{
"service_name": "STRING",
"service_spec": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/service/known_types

Description
Display a list of known service types.

Example

GET /api/service/known_types HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/service/SERVICE_NAME

Parameters

Replace SERVICE_NAME with the name of the service as a string.
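
A deletion request of this form is illustrative; SERVICE_NAME is a placeholder for the service name.

Example

DELETE /api/service/SERVICE_NAME HTTP/1.1
Host: example.com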

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/service/SERVICE_NAME

Parameters

Replace SERVICE_NAME with the name of the service as a string.

Example

GET /api/service/SERVICE_NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/service/SERVICE_NAME/daemons

Parameters

Replace SERVICE_NAME with the name of the service as a string.

Example

GET /api/service/SERVICE_NAME/daemons HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Settings
Edit online
The method reference for using the Ceph RESTful API settings endpoint to manage the various Ceph settings.

GET /api/settings

Description
Display the list of available options.

Parameters

Queries:

names - A comma-separated list of option names.

Example

GET /api/settings HTTP/1.1
Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/settings
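
For illustration only, a bulk update request is assumed to take a JSON object that maps option names to values; the exact body schema is not documented here, so confirm it in the Ceph RESTful API reference.

Example

PUT /api/settings HTTP/1.1
Host: example.com
Content-Type: application/json

{
"NAME": "VALUE"
}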

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/settings/NAME

Parameters

Replace NAME with the option name as a string.
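
A deletion request of this form is illustrative; NAME is a placeholder for the option name.

Example

DELETE /api/settings/NAME HTTP/1.1
Host: example.com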

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/settings/NAME

Description
Display the given option.

Parameters

Replace NAME with the option name as a string.

Example

GET /api/settings/NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/settings/NAME

Parameters

Replace NAME with the option name as a string.

Example

PUT /api/settings/NAME HTTP/1.1


Host: example.com
Content-Type: application/json

{
"value": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Ceph task
Edit online
The method reference for using the Ceph RESTful API task endpoint to display Ceph tasks.

GET /api/task

Description
Display Ceph tasks.

Parameters

Queries:

name - The name of the task.

Example

GET /api/task HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

Telemetry
Edit online
The method reference for using the Ceph RESTful API telemetry endpoint to manage data for the telemetry Ceph Manager module.

PUT /api/telemetry

Description
Enables or disables the sending of collected data by the telemetry module.

Parameters

enable - A boolean value.

license_name - A string value, such as sharing-1-0. Make sure the user is aware of and accepts the license for sharing telemetry data.

Example

PUT /api/telemetry HTTP/1.1


Host: example.com
Content-Type: application/json

{
"enable": true,
"license_name": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/telemetry/report

Description
Display report data on Ceph and devices.

Example

GET /api/telemetry/report HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

For more information, see Ceph RESTful API.

For more information about managing with the Ceph dashboard, see Activating and deactivating telemetry.

Ceph users
Edit online
The method reference for using the Ceph RESTful API user endpoint to display Ceph user details and to manage Ceph user
passwords.

GET /api/user

Description
Display a list of users.

Example

GET /api/user HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/user

Example

POST /api/user HTTP/1.1


Host: example.com
Content-Type: application/json

{
"email": "STRING",
"enabled": true,
"name": "STRING",
"password": "STRING",
"pwdExpirationDate": "STRING",
"pwdUpdateRequired": true,
"roles": "STRING",
"username": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

DELETE /api/user/USER_NAME

Parameters

Replace USER_NAME with the name of the user as a string.
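
A deletion request of this form is illustrative; USER_NAME is a placeholder for the user name.

Example

DELETE /api/user/USER_NAME HTTP/1.1
Host: example.com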

Status Codes

202 Accepted – Operation is still executing. Please check the task queue.

204 No Content – Resource deleted.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

GET /api/user/USER_NAME

Parameters

Replace USER_NAME with the name of the user as a string.

Example

GET /api/user/USER_NAME HTTP/1.1


Host: example.com

Status Codes

200 OK – Okay.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

PUT /api/user/USER_NAME

Parameters

Replace USER_NAME with the name of the user as a string.

Example

PUT /api/user/USER_NAME HTTP/1.1


Host: example.com
Content-Type: application/json

{
"email": "STRING",
"enabled": "STRING",
"name": "STRING",
"password": "STRING",
"pwdExpirationDate": "STRING",
"pwdUpdateRequired": true,
"roles": "STRING"
}

Status Codes

200 OK – Okay.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/user/USER_NAME/change_password

Parameters

Replace USER_NAME with the name of the user as a string.

Example

POST /api/user/USER_NAME/change_password HTTP/1.1


Host: example.com
Content-Type: application/json

{
"new_password": "STRING",
"old_password": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

POST /api/user/validate_password

Description
Checks the password to see if it meets the password policy.

Parameters

password - The password to validate.

username - Optional. The name of the user.

old_password - Optional. The old password.

Example

POST /api/user/validate_password HTTP/1.1


Host: example.com
Content-Type: application/json

{
"old_password": "STRING",
"password": "STRING",
"username": "STRING"
}

Status Codes

201 Created – Resource created.

202 Accepted – Operation is still executing. Please check the task queue.

400 Bad Request – Operation exception. Please check the response body for details.

401 Unauthorized – Unauthenticated access. Please login first.

403 Forbidden – Unauthorized access. Please check your permissions.

500 Internal Server Error – Unexpected error. Please check the response body for the stack trace.

Reference
Edit online

See the Ceph RESTful API for more details.

S3 common request headers


Edit online
The following table lists the valid common request headers and their descriptions.

Table 1. Request Headers


Request Header Description
CONTENT_LENGTH Length of the request body.
DATE Request time and date (in UTC).
HOST The name of the host server.
AUTHORIZATION Authorization token.
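
For illustration, a request carrying the common headers might look like the following; the bucket, date, access key, and signature are placeholder values.

Example

GET /testbucket HTTP/1.1
Host: example.com
Date: Fri, 01 Mar 2024 10:00:00 GMT
Authorization: AWS ACCESS_KEY:SIGNATURE
Content-Length: 0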

S3 common response status codes


Edit online
The following table lists the valid common HTTP response status and its corresponding code.

Table 1. Response Status


HTTP Status Response Code
100 Continue
200 Success
201 Created
202 Accepted
204 NoContent
206 Partial content
304 NotModified
400 InvalidArgument
400 InvalidDigest
400 BadDigest
400 InvalidBucketName
400 InvalidObjectName
400 UnresolvableGrantByEmailAddress
400 InvalidPart
400 InvalidPartOrder
400 RequestTimeout
400 EntityTooLarge
403 AccessDenied
403 UserSuspended
403 RequestTimeTooSkewed
404 NoSuchKey
404 NoSuchBucket
404 NoSuchUpload
405 MethodNotAllowed
408 RequestTimeout
409 BucketAlreadyExists
409 BucketNotEmpty
411 MissingContentLength
412 PreconditionFailed
416 InvalidRange
422 UnprocessableEntity
500 InternalError

S3 unsupported header fields


Edit online
Table 1. Unsupported Header
Fields
Name Type
x-amz-security-token Request
Server Response
x-amz-delete-marker Response
x-amz-id-2 Response
x-amz-request-id Response
x-amz-version-id Response

Swift request headers


Edit online
Table 1. Request Headers
Name Description Type Required
X-Auth-User The key Ceph Object Gateway username to authenticate. String Yes
X-Auth-Key The key associated to a Ceph Object Gateway username. String Yes
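
For illustration, an authentication request supplies both headers; ACCOUNT:USER and SWIFT_KEY are placeholder values, and the /auth/1.0 path assumes the default Ceph Object Gateway Swift authentication endpoint.

Example

GET /auth/1.0 HTTP/1.1
Host: example.com
X-Auth-User: ACCOUNT:USER
X-Auth-Key: SWIFT_KEY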

Swift response headers


Edit online
The response from the server should include an X-Auth-Token value. The response might also contain an X-Storage-Url that provides the API_VERSION/ACCOUNT prefix that is specified in other requests throughout the API documentation.

Table 1. Response Headers


Name Description Type
X-Storage-Token The authorization token for the X-Auth-User specified in the request. String
X-Storage-Url The URL and API_VERSION/ACCOUNT path for the user. String

Examples using the Secure Token Service APIs


Edit online

These examples use Python's boto3 module to interface with the Ceph Object Gateway's implementation of the Secure Token Service (STS). In these examples, TESTER2 assumes a role created by TESTER1 in order to access S3 resources owned by TESTER1, based on the permission policy attached to the role.

The AssumeRole example creates a role, assigns a policy to the role, then assumes a role to get temporary credentials and access to
S3 resources using those temporary credentials.

The AssumeRoleWithWebIdentity example authenticates users using an external application with Keycloak, an OpenID Connect
identity provider, assumes a role to get temporary credentials and access S3 resources according to the permission policy of the role.

AssumeRole Example

import boto3

# Create an IAM client with TESTER1's credentials to create the role and attach its policies.
iam_client = boto3.client('iam',
    aws_access_key_id=ACCESS_KEY_OF_TESTER1,
    aws_secret_access_key=SECRET_KEY_OF_TESTER1,
    endpoint_url=<IAM URL>,
    region_name=''
)

# Trust policy that controls who is allowed to assume the role.
policy_document = "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"AWS\":[\"arn:aws:iam:::user/TESTER1\"]},\"Action\":[\"sts:AssumeRole\"]}]}"

role_response = iam_client.create_role(
    AssumeRolePolicyDocument=policy_document,
    Path='/',
    RoleName='S3Access'
)

# Permission policy that grants the role full access to S3 resources.
role_policy = "{\"Version\":\"2012-10-17\",\"Statement\":{\"Effect\":\"Allow\",\"Action\":\"s3:*\",\"Resource\":\"arn:aws:s3:::*\"}}"

response = iam_client.put_role_policy(
    RoleName='S3Access',
    PolicyName='Policy1',
    PolicyDocument=role_policy
)

# Create an STS client with TESTER2's credentials and assume the role.
sts_client = boto3.client('sts',
    aws_access_key_id=ACCESS_KEY_OF_TESTER2,
    aws_secret_access_key=SECRET_KEY_OF_TESTER2,
    endpoint_url=<STS URL>,
    region_name=''
)

response = sts_client.assume_role(
    RoleArn=role_response['Role']['Arn'],
    RoleSessionName='Bob',
    DurationSeconds=3600
)

# Access S3 resources with the temporary credentials returned by STS.
s3client = boto3.client('s3',
    aws_access_key_id=response['Credentials']['AccessKeyId'],
    aws_secret_access_key=response['Credentials']['SecretAccessKey'],
    aws_session_token=response['Credentials']['SessionToken'],
    endpoint_url=<S3 URL>,
    region_name=''
)

bucket_name = 'my-bucket'
s3bucket = s3client.create_bucket(Bucket=bucket_name)
resp = s3client.list_buckets()

AssumeRoleWithWebIdentity Example

import boto3

iam_client = boto3.client('iam',
    aws_access_key_id=ACCESS_KEY_OF_TESTER1,
    aws_secret_access_key=SECRET_KEY_OF_TESTER1,
    endpoint_url=<IAM URL>,
    region_name=''
)

# Register the OpenID Connect identity provider, for example Keycloak, with the Ceph Object Gateway.
oidc_response = iam_client.create_open_id_connect_provider(
    Url=<URL of the OpenID Connect Provider>,
    ClientIDList=[
        <Client id registered with the IDP>
    ],
    ThumbprintList=[
        <IDP THUMBPRINT>
    ]
)

# Trust policy that allows users federated through the IDP to assume the role.
policy_document = "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Federated\":[\"arn:aws:iam:::oidc-provider/localhost:8080/auth/realms/demo\"]},\"Action\":[\"sts:AssumeRoleWithWebIdentity\"],\"Condition\":{\"StringEquals\":{\"localhost:8080/auth/realms/demo:app_id\":\"customer-portal\"}}}]}"

role_response = iam_client.create_role(
    AssumeRolePolicyDocument=policy_document,
    Path='/',
    RoleName='S3Access'
)

# Permission policy that grants the role full access to S3 resources.
role_policy = "{\"Version\":\"2012-10-17\",\"Statement\":{\"Effect\":\"Allow\",\"Action\":\"s3:*\",\"Resource\":\"arn:aws:s3:::*\"}}"

response = iam_client.put_role_policy(
    RoleName='S3Access',
    PolicyName='Policy1',
    PolicyDocument=role_policy
)

sts_client = boto3.client('sts',
    aws_access_key_id=ACCESS_KEY_OF_TESTER2,
    aws_secret_access_key=SECRET_KEY_OF_TESTER2,
    endpoint_url=<STS URL>,
    region_name=''
)

# Assume the role with the web identity token obtained from the IDP.
response = sts_client.assume_role_with_web_identity(
    RoleArn=role_response['Role']['Arn'],
    RoleSessionName='Bob',
    DurationSeconds=3600,
    WebIdentityToken=<Web Token>
)

# Access S3 resources with the temporary credentials returned by STS.
s3client = boto3.client('s3',
    aws_access_key_id=response['Credentials']['AccessKeyId'],
    aws_secret_access_key=response['Credentials']['SecretAccessKey'],
    aws_session_token=response['Credentials']['SessionToken'],
    endpoint_url=<S3 URL>,
    region_name=''
)

bucket_name = 'my-bucket'
s3bucket = s3client.create_bucket(Bucket=bucket_name)
resp = s3client.list_buckets()

Reference
Edit online

For more details on using Python's boto module, see Test S3 Access.

Troubleshooting
Edit online
Troubleshoot and resolve common problems with IBM Storage Ceph.

Initial Troubleshooting
Configuring logging
Troubleshooting networking issues

Troubleshooting Ceph Monitors
Troubleshooting Ceph OSDs
Troubleshooting a multi-site Ceph Object Gateway
Troubleshooting Ceph placement groups
Troubleshooting Ceph objects
Troubleshooting clusters in stretch mode
Contacting IBM support for service
Ceph subsystems default logging level values
Health messages of a Ceph cluster

Initial Troubleshooting
Edit online
As a storage administrator, you can do the initial troubleshooting of an IBM Storage Ceph cluster before contacting IBM support. This chapter includes the following information:

Prerequisites

A running IBM Storage Ceph cluster.

Identifying problems
Diagnosing the health of a storage cluster
Understanding Ceph health
Muting health alerts of a Ceph cluster
Understanding Ceph logs
Generating an sos report

Identifying problems
Edit online
To determine possible causes of the error with the IBM Storage Ceph cluster, answer the questions in the Procedure section.

Prerequisites

A running IBM Storage Ceph cluster.

Procedure

1. Certain problems can arise when using unsupported configurations. Ensure that your configuration is supported.

2. Do you know which Ceph component causes the problem?

a. No. See Diagnosing the health of a Ceph storage cluster.

b. Ceph Monitors. See Troubleshooting Ceph Monitors.

c. Ceph OSDs. See Troubleshooting Ceph OSDs.

d. Ceph placement groups. See Troubleshooting Ceph placement groups.

e. Multi-site Ceph Object Gateway. See Troubleshooting a multi-site Ceph Object Gateway.

Reference

See the Supported configurations article for details.

Diagnosing the health of a storage cluster


Edit online

This procedure lists basic steps to diagnose the health of an IBM Storage Ceph cluster.

Prerequisites

A running IBM Storage Ceph cluster.

Procedure

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Check the overall status of the storage cluster:

Example

[ceph: root@host01 /]# ceph health detail

If the command returns HEALTH_WARN or HEALTH_ERR see Understanding Ceph health for details.

3. Monitor the logs of the storage cluster:

Example

[ceph: root@host01 /]# ceph -W cephadm

4. To capture the logs of the cluster to a file, run the following commands:

Example

[ceph: root@host01 /]# ceph config set global log_to_file true


[ceph: root@host01 /]# ceph config set global mon_cluster_log_to_file true

The logs are located by default in the /var/log/ceph/ directory. Check the Ceph logs for any error messages listed in
Understanding Ceph logs.

5. If the logs do not include a sufficient amount of information, increase the debugging level and try to reproduce the action that
failed. See Configuring logging for details.

Understanding Ceph health


Edit online
The ceph health command returns information about the status of the IBM Storage Ceph cluster:

HEALTH_OK indicates that the cluster is healthy.

HEALTH_WARN indicates a warning. In some cases, the Ceph status returns to HEALTH_OK automatically, for example, when the IBM Storage Ceph cluster finishes the rebalancing process. However, consider further troubleshooting if a cluster stays in the HEALTH_WARN state for a longer time.

HEALTH_ERR indicates a more serious problem that requires your immediate attention.

Use the ceph health detail and ceph -s commands to get a more detailed output.

NOTE: A health warning is displayed if there is no mgr daemon running. If the last mgr daemon of an IBM Storage Ceph cluster was removed, you can manually deploy a mgr daemon on a random host of the IBM Storage Ceph cluster.

See Manually deploying a mgr daemon in the IBM Storage Ceph 5.3 Administration Guide.

Reference

See Ceph Monitor error messages table in the IBM Storage Ceph Troubleshooting Guide.

See Ceph OSD error messages table in the IBM Storage Ceph Troubleshooting Guide.

See Placement group error messages table in the IBM Storage Ceph Troubleshooting Guide.

Muting health alerts of a Ceph cluster
Edit online
In certain scenarios, users might want to temporarily mute some warnings, because they are already aware of the warning and
cannot act on it right away. You can mute health checks so that they do not affect the overall reported status of the Ceph cluster.

Alerts are specified using the health check codes. For example, when an OSD is brought down for maintenance, OSD_DOWN warnings are expected. You can choose to mute the warning until the maintenance is over, because those warnings put the cluster in HEALTH_WARN instead of HEALTH_OK for the entire duration of maintenance.

Most health mutes also disappear if the extent of an alert gets worse. For example, if there is one OSD down, and the alert is muted,
the mute disappears if one or more additional OSDs go down. This is true for any health alert that involves a count indicating how
much or how many of something is triggering the warning or error.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level of access to the nodes.

A health warning message.

Procedure

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Check the health of the IBM Storage Ceph cluster by running the ceph health detail command:

Example

[ceph: root@host01 /]# ceph health detail

HEALTH_WARN 1 osds down; 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_DOWN: 1 osds down
    osd.1 (root=default,host=host01) is down
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
    osd.1 has flags noup

You can see that the storage cluster is in HEALTH_WARN status as one of the OSDs is down.

3. Mute the alert:

Syntax

ceph health mute HEALTH_MESSAGE

Example

[ceph: root@host01 /]# ceph health mute OSD_DOWN

4. Optional: A health check mute can have a time to live (TTL) associated with it, such that the mute automatically expires after
the specified period of time has elapsed. Specify the TTL as an optional duration argument in the command:

Syntax

ceph health mute HEALTH_MESSAGE DURATION

DURATION can be specified in s, sec, m, min, h, or hour.

Example

[ceph: root@host01 /]# ceph health mute OSD_DOWN 10m

In this example, the alert OSD_DOWN is muted for 10 minutes.

5. Verify if the IBM Storage Ceph cluster status has changed to HEALTH_OK:

Example

[ceph: root@host01 /]# ceph -s


cluster:
id: 81a4597a-b711-11eb-8cb8-001a4a000740
health: HEALTH_OK
(muted: OSD_DOWN(9m) OSD_FLAGS(9m))

services:
mon: 3 daemons, quorum host01,host02,host03 (age 33h)
mgr: host01.pzhfuh(active, since 33h), standbys: host02.wsnngf, host03.xwzphg
osd: 11 osds: 10 up (since 4m), 11 in (since 5d)

data:
pools: 1 pools, 1 pgs
objects: 13 objects, 0 B
usage: 85 MiB used, 165 GiB / 165 GiB avail
pgs: 1 active+clean

In this example, you can see that the alerts OSD_DOWN and OSD_FLAGS are muted, and the mutes are active for nine minutes.

6. Optional: You can retain the mute even after the alert is cleared by making it sticky.

Syntax

ceph health mute HEALTH_MESSAGE DURATION --sticky

Example

[ceph: root@host01 /]# ceph health mute OSD_DOWN 1h --sticky

7. You can remove the mute by running the following command:

Syntax

ceph health unmute HEALTH_MESSAGE

Example

[ceph: root@host01 /]# ceph health unmute OSD_DOWN

Reference

See Health messages of a Ceph cluster section in the IBM Storage Ceph Troubleshooting Guide for details.

Understanding Ceph logs


Edit online
Ceph stores its logs in the /var/log/ceph/ directory after the logging is enabled.

The CLUSTER_NAME.log is the main storage cluster log file that includes global events. By default, the log file name is ceph.log.
Only the Ceph Monitor nodes include the main storage cluster log.

Each Ceph OSD and Monitor has its own log file named CLUSTER_NAME-osd.NUMBER.log and CLUSTER_NAME-
mon.HOSTNAME.log respectively.

When you increase debugging level for Ceph subsystems, Ceph generates new log files for those subsystems as well.

Reference

For details about logging, see Configuring logging in the IBM Storage Ceph Troubleshooting Guide.

See Common Ceph Monitor error messages in the Ceph logs table in the IBM Storage Ceph Troubleshooting Guide.

See Common Ceph OSD error messages in the Ceph logs table in the IBM Storage Ceph Troubleshooting Guide.

Generating an sos report

Edit online
You can run the sos report command to collect the configuration details, system information, and diagnostic information of an IBM Storage Ceph cluster from a Red Hat Enterprise Linux node. The IBM Support team uses this information for further troubleshooting of the storage cluster.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the nodes.

Procedure

1. Install the sos package:

Example

[root@host01 ~]# dnf install sos

NOTE: Install the sos-4.0.11.el8 package or higher version to capture the Ceph command output correctly.

2. Run the sos report to get the system information of the storage cluster:

Example

[root@host01 ~]# sos report -a --all-logs

The report is saved in the /var/tmp directory.

For sos versions 4.3 and later, you need to run the following command for specific Ceph information:

Example

[root@host01 ~]# sos report -a --all-logs -e ceph_mon

In the above example, you can get the logs of the Ceph Monitor.

Reference

See the What is an sos report and how to create one in Red Hat Enterprise Linux? KnowledgeBase article for more information.

Configuring logging
Edit online
This chapter describes how to configure logging for various Ceph subsystems.

IMPORTANT: Logging is resource intensive. Also, verbose logging can generate a huge amount of data in a relatively short time. If
you are encountering problems in a specific subsystem of the cluster, enable logging only of that subsystem. See Ceph Subsystems
for more information.

In addition, consider setting up a rotation of log files. See Accelerating log rotation for details.

Once you fix any problems you encounter, change the subsystems log and memory levels to their default values. See Ceph
subsystems default logging level values for a list of all Ceph subsystems and their default values.

You can configure Ceph logging by:

Using the ceph command at runtime. This is the most common approach. See Configuring logging at runtime for details.

Updating the Ceph configuration file. Use this approach if you are encountering problems when starting the cluster. See
Configuring logging in configuration file for details.

Prerequisites

A running IBM Storage Ceph cluster.

Ceph subsystems
Configuring logging at runtime
Configuring logging in configuration file
Accelerating log rotation
Creating and collecting operation logs for Ceph Object Gateway

Ceph subsystems
Edit online
This section contains information about Ceph subsystems and their logging levels.

Understanding Ceph Subsystems and Their Logging Levels

Ceph consists of several subsystems.

Each subsystem has a logging level for its:

Output logs that are stored by default in /var/log/ceph/ directory (log level)

Logs that are stored in a memory cache (memory level)

In general, Ceph does not send logs stored in memory to the output logs unless:

A fatal signal is raised

An assert in source code is triggered

You request it

You can set different values for each of these subsystems. Ceph logging levels operate on a scale of 1 to 20, where 1 is terse and 20
is verbose.

Use a single value for the log level and memory level to set them both to the same value. For example, debug_osd = 5 sets the
debug level for the ceph-osd daemon to 5.

To use different values for the output log level and the memory level, separate the values with a forward slash (/). For example,
debug_mon = 1/5 sets the debug log level for the ceph-mon daemon to 1 and its memory log level to 5.

Ceph subsystems and their default logging values

Subsystem Log Level Memory Level Description


asok 1 5 The administration socket
auth 1 5 Authentication
client 0 5 Any application or library that uses librados to connect to the cluster.
bluestore 1 5 The BlueStore OSD backend.
journal 1 5 The OSD journal
mds 1 5 The Metadata Servers
monc 0 5 The Monitor client handles communication between most Ceph daemons and Monitors
mon 1 5 Monitors
ms 0 5 The messaging system between Ceph components
osd 0 5 The OSD Daemons
paxos 0 5 The algorithm that Monitors use to establish a consensus
rados 0 5 Reliable Autonomic Distributed Object Store, a core component of Ceph
rbd 0 5 The Ceph Block Devices
rgw 1 5 The Ceph Object Gateway
Example Log Outputs

The following examples show the type of messages in the logs when you increase the verbosity for the Monitors and OSDs.

Monitor Debug Settings

debug_ms = 5
debug_mon = 20
debug_paxos = 20
debug_auth = 20

Example Log Output of Monitor Debug Settings

2022-05-12 12:37:04.278761 7f45a9afc700 10 mon.cephn2@0(leader).osd e322 e322: 2 osds: 2 up, 2 in


2022-05-12 12:37:04.278792 7f45a9afc700 10 mon.cephn2@0(leader).osd e322 min_last_epoch_clean 322
2022-05-12 12:37:04.278795 7f45a9afc700 10 mon.cephn2@0(leader).log v1010106 log
2022-05-12 12:37:04.278799 7f45a9afc700 10 mon.cephn2@0(leader).auth v2877 auth
2022-05-12 12:37:04.278811 7f45a9afc700 20 mon.cephn2@0(leader) e1 sync_trim_providers
2022-05-12 12:37:09.278914 7f45a9afc700 11 mon.cephn2@0(leader) e1 tick
2022-05-12 12:37:09.278949 7f45a9afc700 10 mon.cephn2@0(leader).pg v8126 v8126: 64 pgs: 64
active+clean; 60168 kB data, 172 MB used, 20285 MB / 20457 MB avail
2022-05-12 12:37:09.278975 7f45a9afc700 10 mon.cephn2@0(leader).paxosservice(pgmap 7511..8126)
maybe_trim trim_to 7626 would only trim 115 < paxos_service_trim_min 250
2022-05-12 12:37:09.278982 7f45a9afc700 10 mon.cephn2@0(leader).osd e322 e322: 2 osds: 2 up, 2 in
2022-05-12 12:37:09.278989 7f45a9afc700 5 mon.cephn2@0(leader).paxos(paxos active c
1028850..1029466) is_readable = 1 - now=2021-08-12 12:37:09.278990 lease_expire=0.000000 has v0 lc
1029466
....
2022-05-12 12:59:18.769963 7f45a92fb700 1 -- 192.168.0.112:6789/0 <== osd.1
192.168.0.114:6800/2801 5724 ==== pg_stats(0 pgs tid 3045 v 0) v1 ==== 124+0+0 (2380105412 0 0)
0x5d96300 con 0x4d5bf40
2022-05-12 12:59:18.770053 7f45a92fb700 1 -- 192.168.0.112:6789/0 --> 192.168.0.114:6800/2801 --
pg_stats_ack(0 pgs tid 3045) v1 -- ?+0 0x550ae00 con 0x4d5bf40
2022-05-12 12:59:32.916397 7f45a9afc700 0 mon.cephn2@0(leader).data_health(1) update_stats avail
53% total 1951 MB, used 780 MB, avail 1053 MB
....
2022-05-12 13:01:05.256263 7f45a92fb700 1 -- 192.168.0.112:6789/0 --> 192.168.0.113:6800/2410 --
mon_subscribe_ack(300s) v1 -- ?+0 0x4f283c0 con 0x4d5b440

OSD Debug Settings

debug_ms = 5
debug_osd = 20

Example Log Output of OSD Debug Settings

2022-05-12 11:27:53.869151 7f5d55d84700 1 -- 192.168.17.3:0/2410 --> 192.168.17.4:6801/2801 --


osd_ping(ping e322 stamp 2021-08-12 11:27:53.869147) v2 -- ?+0 0x63baa00 con 0x578dee0
2022-05-12 11:27:53.869214 7f5d55d84700 1 -- 192.168.17.3:0/2410 --> 192.168.0.114:6801/2801 --
osd_ping(ping e322 stamp 2021-08-12 11:27:53.869147) v2 -- ?+0 0x638f200 con 0x578e040
2022-05-12 11:27:53.870215 7f5d6359f700 1 -- 192.168.17.3:0/2410 <== osd.1 192.168.0.114:6801/2801
109210 ==== osd_ping(ping_reply e322 stamp 2021-08-12 11:27:53.869147) v2 ==== 47+0+0 (261193640 0
0) 0x63c1a00 con 0x578e040
2022-05-12 11:27:53.870698 7f5d6359f700 1 -- 192.168.17.3:0/2410 <== osd.1 192.168.17.4:6801/2801
109210 ==== osd_ping(ping_reply e322 stamp 2021-08-12 11:27:53.869147) v2 ==== 47+0+0 (261193640 0
0) 0x6313200 con 0x578dee0
....
2022-05-12 11:28:10.432313 7f5d6e71f700 5 osd.0 322 tick
2022-05-12 11:28:10.432375 7f5d6e71f700 20 osd.0 322 scrub_random_backoff lost coin flip, randomly
backing off
2022-05-12 11:28:10.432381 7f5d6e71f700 10 osd.0 322 do_waiters -- start
2022-05-12 11:28:10.432383 7f5d6e71f700 10 osd.0 322 do_waiters -- finish

Reference

Configuring logging at runtime

Configuring logging in configuration file

Configuring logging at runtime


Edit online
You can configure the logging of Ceph subsystems at system runtime to help troubleshoot any issues that might occur.

Prerequisites

A running IBM Storage Ceph cluster.

Access to Ceph debugger.

Procedure

1. To activate the Ceph debugging output, dout(), at runtime:

ceph tell TYPE.ID injectargs --debug-SUBSYSTEM VALUE [--NAME VALUE]

2. Replace:

TYPE with the type of Ceph daemons (osd, mon, or mds)

ID with a specific ID of the Ceph daemon. Alternatively, use * to apply the runtime setting to all daemons of a particular
type.

SUBSYSTEM with a specific subsystem.

VALUE with a number from 1 to 20, where 1 is terse and 20 is verbose.

For example, to set the log level for the OSD subsystem on the OSD named osd.0 to 0 and the memory level to 5:

# ceph tell osd.0 injectargs --debug-osd 0/5

To see the configuration settings at runtime:

1. Log in to the host with a running Ceph daemon, for example, ceph-osd or ceph-mon.

2. Display the configuration:

Syntax

ceph daemon NAME config show | less

Example

[ceph: root@host01 /]# ceph daemon osd.0 config show | less

Reference

See Ceph subsystems for details.

See Configuration logging in configuration file for details.

The Ceph Debugging and Logging Configuration Reference chapter in the IBM Storage Ceph Configuration Guide.

Configuring logging in configuration file


Edit online
Configure Ceph subsystems to log informational, warning, and error messages to the log file. You can specify the debugging level in
the Ceph configuration file, by default /etc/ceph/ceph.conf.

Prerequisites

A running IBM Storage Ceph cluster.

Procedure

1. To activate Ceph debugging output, dout() at boot time, add the debugging settings to the Ceph configuration file.

a. For subsystems common to each daemon, add the settings under the [global] section.

b. For subsystems for particular daemons, add the settings under a daemon section, such as [mon], [osd], or [mds].

Example

[global]
debug_ms = 1/5

[mon]
debug_mon = 20
debug_paxos = 1/5
debug_auth = 2

[osd]
debug_osd = 1/5
debug_monc = 5/20

[mds]
debug_mds = 1

Reference

Ceph subsystems

Configuring logging at runtime

Ceph Debugging and Logging Configuration Reference

Accelerating log rotation


Edit online
Increasing the debugging level for Ceph components might generate a huge amount of data. If your disks are nearly full, you can accelerate log rotation by modifying the Ceph log rotation file at /etc/logrotate.d/ceph. The Cron job scheduler uses this file to schedule log rotation.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure

1. Add the size setting after the rotation frequency to the log rotation file:

rotate 7
weekly
size SIZE
compress
sharedscripts

For example, to rotate a log file when it reaches 500 MB:

rotate 7
weekly
size 500M
compress
sharedscripts

2. Open the crontab editor:

[root@mon ~]# crontab -e

3. Add an entry to check the /etc/logrotate.d/ceph file. For example, to instruct Cron to check /etc/logrotate.d/ceph
every 30 minutes:

30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1

Creating and collecting operation logs for Ceph Object Gateway


Edit online
User identity information is added to the operation log output. This enables customers to access this information for auditing of S3 access.

Track user identities reliably by S3 request in all versions of the Ceph Object Gateway operation log.

Procedure

1. Find where the logs are located.

Syntax

logrotate -f

Example

[root@host01 ~]# logrotate -f /etc/logrotate.d/ceph-12ab345c-1a2b-11ed-b736-fa163e4f6220

2. List the logs within the specified location.

Syntax

ll LOG_LOCATION

Example

[root@host01 ~]# ll /var/log/ceph/12ab345c-1a2b-11ed-b736-fa163e4f6220


-rw-r--r--. 1 ceph ceph 412 Sep 28 09:26 opslog.log.1.gz

3. Create a bucket.

Syntax

/usr/local/bin/s3cmd mb s3://NEW_BUCKET_NAME

Example

[root@host01 ~]# /usr/local/bin/s3cmd mb s3://bucket1


Bucket `s3://bucket1` created

4. Collect the logs.

Syntax

tail -f LOG_LOCATION/opslog.log

Example

[root@host01 ~]# tail -f /var/log/ceph/12ab345c-1a2b-11ed-b736-fa163e4f6220/opslog.log

{"bucket":"","time":"2022-09-29T06:17:03.133488Z","time_local":"2022-09-
29T06:17:03.133488+0000","remote_addr":"10.0.211.66","user":"test1",
"operation":"list_buckets","uri":"GET /
HTTP/1.1","http_status":"200","error_code":"","bytes_sent":232,
"bytes_received":0,"object_size":0,"total_time":9,"user_agent":"","referrer":
"","trans_id":"tx00000c80881a9acd2952a-006335385f-175e5-primary",
"authentication_type":"Local","access_key_id":"1234","temp_url":false}

{"bucket":"cn1","time":"2022-09-29T06:17:10.521156Z","time_local":"2022-09-
29T06:17:10.521156+0000","remote_addr":"10.0.211.66","user":"test1",
"operation":"create_bucket","uri":"PUT /cn1/
HTTP/1.1","http_status":"200","error_code":"","bytes_sent":0,
"bytes_received":0,"object_size":0,"total_time":106,"user_agent":"",
"referrer":"","trans_id":"tx0000058d60c593632c017-0063353866-175e5-primary",
"authentication_type":"Local","access_key_id":"1234","temp_url":false}

Troubleshooting networking issues


Edit online
This chapter lists basic troubleshooting procedures connected with networking and chrony for Network Time Protocol (NTP).

Prerequisites

A running IBM Storage Ceph cluster.

Basic networking troubleshooting
Basic chrony NTP troubleshooting

Basic networking troubleshooting


Edit online
IBM Storage Ceph depends heavily on a reliable network connection. Ceph Storage nodes use the network for communicating with each other. Networking issues can cause many problems with Ceph OSDs, such as OSDs flapping or being incorrectly reported as down. Networking issues can also cause Ceph Monitor clock skew errors. In addition, packet loss, high latency, or limited bandwidth can impact the cluster performance and stability.

Prerequisites

Root-level access to the node.

Procedure

1. Installing the net-tools and telnet packages can help when troubleshooting network issues that can occur in a Ceph
storage cluster:

Example

[root@host01 ~]# dnf install net-tools


[root@host01 ~]# dnf install telnet

2. Log into the cephadm shell and verify that the public_network parameters in the Ceph configuration file include the correct
values:

Example

[ceph: root@host01 /]# cat /etc/ceph/ceph.conf


# minimal ceph.conf for 57bddb48-ee04-11eb-9962-001a4a000672
[global]
fsid = 57bddb48-ee04-11eb-9962-001a4a000672
mon_host = [v2:10.74.249.26:3300/0,v1:10.74.249.26:6789/0]
[v2:10.74.249.163:3300/0,v1:10.74.249.163:6789/0]
[v2:10.74.254.129:3300/0,v1:10.74.254.129:6789/0]
[mon.host01]
public network = 10.74.248.0/21

3. Exit the shell and verify that the network interfaces are up:

Example

[root@host01 ~]# ip link list


1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group
default qlen 1000
link/ether 00:1a:4a:00:06:72 brd ff:ff:ff:ff:ff:ff

4. Verify that the Ceph nodes are able to reach each other using their short host names. Verify this on each node in the storage
cluster:

Syntax

ping SHORT_HOST_NAME

Example

[root@host01 ~]# ping host02

5. If you use a firewall, ensure that Ceph nodes are able to reach each other on their appropriate ports. The firewall-cmd and telnet tools can validate the port status and whether the port is open, respectively:

Syntax

firewall-cmd --info-zone=ZONE
telnet IP_ADDRESS PORT

Example

[root@host01 ~]# firewall-cmd --info-zone=public


public (active)
target: default
icmp-block-inversion: no
interfaces: ens3
sources:
services: ceph ceph-mon cockpit dhcpv6-client ssh
ports: 9283/tcp 8443/tcp 9093/tcp 9094/tcp 3000/tcp 9100/tcp 9095/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:

[root@host01 ~]# telnet 192.168.0.22 9100

6. Verify that there are no errors on the interface counters. Verify that the network connectivity between nodes has expected
latency, and that there is no packet loss.

a. Using the ethtool command:

Syntax

ethtool -S INTERFACE

Example

[root@host01 ~]# ethtool -S ens3 | grep errors


NIC statistics:
rx_fcs_errors: 0
rx_align_errors: 0
rx_frame_too_long_errors: 0
rx_in_length_errors: 0
rx_out_length_errors: 0
tx_mac_errors: 0
tx_carrier_sense_errors: 0
tx_errors: 0
rx_errors: 0

b. Using the ifconfig command:

Example

[root@host01 ~]# ifconfig


ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.74.249.26 netmask 255.255.248.0 broadcast 10.74.255.255
inet6 fe80::21a:4aff:fe00:672 prefixlen 64 scopeid 0x20<link>
inet6 2620:52:0:4af8:21a:4aff:fe00:672 prefixlen 64 scopeid 0x0<global>
ether 00:1a:4a:00:06:72 txqueuelen 1000 (Ethernet)
RX packets 150549316 bytes 56759897541 (52.8 GiB)
RX errors 0 dropped 176924 overruns 0 frame 0
TX packets 55584046 bytes 62111365424 (57.8 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536


inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 9373290 bytes 16044697815 (14.9 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9373290 bytes 16044697815 (14.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

c. Using the netstat command:

Example

[root@host01 ~]# netstat -ai


Kernel Interface table



Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
ens3 1500 311847720 0 364903 0 114341918 0 0 0 BMRU
lo 65536 19577001 0 0 0 19577001 0 0 0 LRU
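
A quick way to check the latency and packet loss mentioned in this step is a short ping sweep between two nodes; the following is a
minimal sketch, assuming host02 is a reachable peer in the storage cluster:

Example

[root@host01 ~]# ping -c 10 -i 0.2 host02
# The summary line reports the packet loss percentage and the
# minimum/average/maximum round-trip latency for the 10 probes.

Consistently non-zero packet loss, or an average latency that is far higher than between other node pairs, usually points to a
problem on that specific link, switch, or path.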

7. For performance issues, in addition to the latency checks, use the iperf3 tool to verify the network bandwidth between all nodes
of the storage cluster. The iperf3 tool runs a simple point-to-point network bandwidth test between a server and a client.

a. Install the iperf3 package on the IBM Storage Ceph nodes whose bandwidth you want to check:

Example

[root@host01 ~]# dnf install iperf3

b. On an IBM Storage Ceph node, start the iperf3 server:

Example

[root@host01 ~]# iperf3 -s


Server listening on 5201

NOTE: The default port is 5201, but it can be changed by using the -p option; see the example at the end of this step.

c. On a different IBM Storage Ceph node, start the iperf3 client:

Example

[root@host02 ~]# iperf3 -c mon


Connecting to host mon, port 5201
[ 4] local xx.x.xxx.xx port 52270 connected to xx.x.xxx.xx port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 0 409 KBytes
[ 4] 1.00-2.00 sec 113 MBytes 945 Mbits/sec 0 409 KBytes
[ 4] 2.00-3.00 sec 112 MBytes 943 Mbits/sec 0 454 KBytes
[ 4] 3.00-4.00 sec 112 MBytes 941 Mbits/sec 0 471 KBytes
[ 4] 4.00-5.00 sec 112 MBytes 940 Mbits/sec 0 471 KBytes
[ 4] 5.00-6.00 sec 113 MBytes 945 Mbits/sec 0 471 KBytes
[ 4] 6.00-7.00 sec 112 MBytes 937 Mbits/sec 0 488 KBytes
[ 4] 7.00-8.00 sec 113 MBytes 947 Mbits/sec 0 520 KBytes
[ 4] 8.00-9.00 sec 112 MBytes 939 Mbits/sec 0 520 KBytes
[ 4] 9.00-10.00 sec 112 MBytes 939 Mbits/sec 0 520 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.10 GBytes 943 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.10 GBytes 941 Mbits/sec receiver

iperf Done.

This output shows a sustained network bandwidth of roughly 940 Mbits/second between the IBM Storage Ceph nodes, with no
retransmissions (Retr) during the test, which indicates a healthy 1 Gbit/s link. IBM recommends that you validate the network
bandwidth between all the nodes in the storage cluster.
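
If port 5201 is blocked, or if a single TCP stream cannot fill a faster link, the same test can be run on an alternate port with
several parallel streams. This is a sketch only; host01, port 5202, and the stream count are example values:

Example

[root@host01 ~]# iperf3 -s -p 5202
[root@host02 ~]# iperf3 -c host01 -p 5202 -P 4

The -P option starts multiple parallel client streams, which can help saturate 10 Gb/s and faster links that one stream alone
cannot fill.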

8. Ensure that all nodes have the same network interconnect speed. Nodes attached at lower speeds might slow down the faster
connected ones. Also, ensure that the inter-switch links can handle the aggregated bandwidth of the attached nodes:

Syntax

ethtool INTERFACE

Example

[root@host01 ~]# ethtool ens3


Settings for ens3:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised pause frame use: Symmetric



Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off
Supports Wake-on: g
Wake-on: d
Current message level: 0x000000ff (255)
drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes

Reference

See the Basic Network troubleshooting solution for details.

See the What is the ethtool command and how can I use it to obtain information about my network devices and interfaces
article for details.

See RHEL network interface dropping packets solutions for details.

For details, see the What are the performance benchmarking tools available for IBM Storage Ceph? solution on the Customer
Portal.

For more information, see Knowledgebase articles and solutions related to troubleshooting networking issues.

Basic chrony NTP troubleshooting


Edit online
This section includes basic chrony NTP troubleshooting steps.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

Procedure

1. Verify that the chronyd daemon is running on the Ceph Monitor hosts:

Example

[root@mon ~]# systemctl status chronyd

2. If chronyd is not running, enable and start it:

Example

[root@mon ~]# systemctl enable chronyd


[root@mon ~]# systemctl start chronyd

3. Ensure that chronyd is synchronizing the clocks correctly:

Example

[root@mon ~]# chronyc sources


[root@mon ~]# chronyc sourcestats
[root@mon ~]# chronyc tracking
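
If the output shows that the clock is not yet synchronized, one possible remediation, assuming that chronyd is configured with
reachable time sources, is to step the clock immediately instead of waiting for it to slew:

Example

[root@mon ~]# chronyc makestep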

Reference



See the How to troubleshoot chrony issues solution for advanced chrony NTP troubleshooting steps.

See Clock skew section in the IBM Storage Ceph Troubleshooting Guide for further details.

See the Checking if chrony is synchronized section for further details.

Troubleshooting Ceph Monitors


Edit online
This chapter contains information on how to fix the most common errors related to the Ceph Monitors.

Prerequisites

Verify the network connection.

Most common Ceph Monitor errors


Injecting a monmap
Replacing a failed Monitor
Compacting the monitor store
Opening port for Ceph manager
Recovering the Ceph Monitor store

Most common Ceph Monitor errors


Edit online
This section lists the most common error messages that are returned by the ceph health detail command, or included in the
Ceph logs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix the
problems.

Prerequisites

A running IBM Storage Ceph cluster.

Ceph Monitor error messages


Common Ceph Monitor error messages in the Ceph logs
Ceph Monitor is out of quorum
Clock skew
The Ceph Monitor store is getting too big
Understanding Ceph Monitor status

Ceph Monitor error messages


Edit online
A table of common Ceph Monitor error messages, and a potential fix.

Error message                              See

HEALTH_WARN:
  mon.X is down (out of quorum)            Ceph Monitor is out of quorum
  clock skew                               Clock skew
  store is getting too big!                The Ceph Monitor store is getting too big

Common Ceph Monitor error messages in the Ceph logs


Edit online



A table of common Ceph Monitor error messages found in the Ceph logs, and a link to a potential fix.

Error message                              Log file            See

clock skew                                 Main cluster log    Clock skew
clocks not synchronized                    Main cluster log    Clock skew
Corruption: error in middle of record      Monitor log         Ceph Monitor is out of quorum; Recovering the Ceph Monitor store
Corruption: 1 missing files                Monitor log         Ceph Monitor is out of quorum; Recovering the Ceph Monitor store
Caught signal (Bus error)                  Monitor log         Ceph Monitor is out of quorum

Ceph Monitor is out of quorum


Edit online
One or more Ceph Monitors are marked as down but the other Ceph Monitors are still able to form a quorum. In addition, the ceph
health detail command returns an error message similar to the following one:

HEALTH_WARN 1 mons down, quorum 1,2 mon.b,mon.c, mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out
of quorum)

What this means

Ceph marks a Ceph Monitor as down due to various reasons.

If the ceph-mon daemon is not running, it might have a corrupted store, or some other error is preventing the daemon from starting.
Also, the /var/ partition might be full. As a consequence, ceph-mon is not able to perform any operations on the store, located by
default at /var/lib/ceph/mon/CLUSTER_NAME-SHORT_HOST_NAME/store.db, and terminates.

If the ceph-mon daemon is running but the Ceph Monitor is out of quorum and marked as down, the cause of the problem depends
on the Ceph Monitor state:

If the Ceph Monitor is in the probing state longer than expected, it cannot find the other Ceph Monitors. This problem can be
caused by networking issues, or the Ceph Monitor can have an outdated Ceph Monitor map (monmap) and be trying to reach
the other Ceph Monitors on incorrect IP addresses. Alternatively, if the monmap is up-to-date, Ceph Monitor’s clock might not
be synchronized.

If the Ceph Monitor is in the electing state longer than expected, the Ceph Monitor’s clock might not be synchronized.

If the Ceph Monitor changes its state from synchronizing to electing and back, the cluster state is advancing. This means that it
is generating new maps faster than the synchronization process can handle.

If the Ceph Monitor marks itself as the leader or a peon, then it believes it is in a quorum, while the rest of the cluster is sure
that it is not. This problem can be caused by failed clock synchronization.

To troubleshoot this problem

1. Verify that the ceph-mon daemon is running. If not, start it:

Syntax

systemctl status ceph-mon@HOST_NAME


systemctl start ceph-mon@HOST_NAME

Replace HOST_NAME with the short name of the host where the daemon is running. Use the hostname -s command when
unsure.

2. If you are not able to start ceph-mon, follow the steps in The ceph-mon daemon cannot start.

3. If you are able to start the ceph-mon daemon but it is marked as down, follow the steps in The ceph-mon daemon is running,
but marked as down.

The ceph-mon daemon cannot start

1. Check the corresponding Ceph Monitor log located at /var/log/ceph/CLUSTER_FSID/ directory.



NOTE: By default, the monitor logs are not present in the log folder. You need to enable logging to files for the logs to appear in
the folder. See the Ceph daemon logs to enable logging to files.

2. If the log contains error messages similar to the following ones, the Ceph Monitor might have a corrupted store.

Corruption: error in middle of record


Corruption: 1 missing files; example: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

To fix this problem, replace the Ceph Monitor. See Replacing a failed monitor for details.

3. If the log contains an error message similar to the following one, the /var/ partition might be full. Delete any unnecessary
data from /var/; a quick space check is shown after this list.

Caught signal (Bus error)

IMPORTANT: Do not delete any data from the Monitor directory manually. Instead, use the ceph-monstore-tool to
compact it.

4. If you see any other error messages, open a support ticket. For more information, see Contacting IBM Support for service.
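
Before deleting anything in step 3, it can help to confirm that the /var/ partition is actually full and to see how much space the
Monitor store occupies. This is a minimal check that assumes the default store location:

Example

[root@mon ~]# df -h /var
[root@mon ~]# du -sh /var/lib/ceph/mon/*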

The ceph-mon daemon is running, but still marked as down

1. From the Ceph Monitor host that is out of the quorum, use the mon_status command to check its state:

[root@mon ~]# ceph daemon ID mon_status

Replace ID with the ID of the Ceph Monitor, for example:

[ceph: root@host01 /]# ceph daemon mon.host01 mon_status

2. If the status is probing, verify the locations of the other Ceph Monitors in the mon_status output.

a. If the addresses are incorrect, the Ceph Monitor has incorrect Ceph Monitor map (monmap). To fix this problem, see
Injecting a monmap.

b. If the addresses are correct, verify that the Ceph Monitor clocks are synchronized. See Clock skew for details. In addition, to
troubleshoot any networking issues, see Troubleshooting Networking issues for details.

3. If the status is electing, verify that the Ceph Monitor clocks are synchronized. See Clock skew for details.

4. If the status changes from electing to synchronizing, open a support ticket. For more information, see Contacting IBM Support
for service.

5. If the Ceph Monitor is the leader or a peon, verify that the Ceph Monitor clocks are synchronized. Open a support ticket if
synchronizing the clocks does not solve the problem. For more information, see Contacting IBM Support for service.

Reference

See Understanding Ceph Monitor status.

See Starting, Stopping, Restarting the Ceph daemons section in the IBM Storage Ceph Administration Guide.

The Using the Ceph Administration Socket section in the IBM Storage Ceph Administration Guide.

Clock skew
Edit online
A Ceph Monitor is out of quorum, and the ceph health detail command output contains error messages similar to these:

mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)


mon.a addr 127.0.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)

In addition, Ceph logs contain error messages similar to these:

2022-05-04 07:28:32.035795 7f806062e700 0 log [WRN] : mon.a 127.0.0.1:6789/0 clock skew 0.14s > max
0.05s
2022-05-04 04:31:25.773235 7f4997663700 0 log [WRN] : message from mon.1 was stamped 0.186257s in
the future, clocks not synchronized



What This Means

The clock skew error message indicates that the Ceph Monitors' clocks are not synchronized. Clock synchronization is important
because Ceph Monitors depend on time precision and behave unpredictably if their clocks are not synchronized.

The mon_clock_drift_allowed parameter determines what disparity between the clocks is tolerated. By default, this parameter
is set to 0.05 seconds.

IMPORTANT: Do not change the default value of mon_clock_drift_allowed without previous testing. Changing this value might
affect the stability of the Ceph Monitors and the Ceph Storage Cluster in general.
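
To review the value that is currently in effect without changing it, you can query the Monitor configuration. This assumes a
cephadm shell with an admin keyring:

Example

[ceph: root@host01 /]# ceph config get mon mon_clock_drift_allowed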

Possible causes of the clock skew error include network problems or problems with chrony Network Time Protocol (NTP)
synchronization if that is configured. In addition, time synchronization does not work properly on Ceph Monitors deployed on virtual
machines.

To Troubleshoot This Problem

1. Verify that your network works correctly. For details, see Troubleshooting networking issues. If you use chrony for NTP, see
Basic chrony NTP troubleshooting section for more information.

2. If you use a remote NTP server, consider deploying your own chrony NTP server on your network. For details, see Using the
Chrony suite to configure NTP in the Configuring basic system settings for Red Hat Enterprise Linux 8.

NOTE: Ceph evaluates time synchronization every five minutes only so there will be a delay between fixing the problem and clearing
the clock skew messages.

Reference

Understanding Ceph Monitor status

Ceph Monitor is out of quorum

The Ceph Monitor store is getting too big


Edit online
The ceph health command returns an error message similar to the following one:

mon.ceph1 store is getting too big! 48031 MB >= 15360 MB -- 62% avail

What This Means

The Ceph Monitor store is a LevelDB database that stores entries as key–value pairs. The database includes a cluster map and is
located by default at /var/lib/ceph/mon/CLUSTER_NAME-SHORT_HOST_NAME/store.db.

Querying a large Monitor store can take time. As a consequence, the Ceph Monitor can be delayed in responding to client queries.

In addition, if the /var/ partition is full, the Ceph Monitor cannot perform any write operations to the store and terminates. See
Ceph Monitor is out of quorum for details on troubleshooting this issue.

To Troubleshoot This Problem

1. Check the size of the database:

du -sch /var/lib/ceph/mon/CLUSTER_NAME-SHORT_HOST_NAME/store.db

Specify the name of the cluster and the short host name of the host where the ceph-mon is running.

Example

# du -sch /var/lib/ceph/mon/ceph-host1/store.db
47G /var/lib/ceph/mon/ceph-host1/store.db/
47G total

2. Compact the Ceph Monitor store. For details, see Compacting the Ceph Monitor Store.

Reference

Ceph Monitor is out of quorum



Understanding Ceph Monitor status
Edit online
The mon_status command returns information about a Ceph Monitor, such as:

State

Rank

Elections epoch

Monitor map (monmap)

If Ceph Monitors are able to form a quorum, use mon_status with the ceph command-line utility.

If Ceph Monitors are not able to form a quorum, but the ceph-mon daemon is running, use the administration socket to execute
mon_status.
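
For example, assuming a Monitor named mon.host01, either of the following can be used, depending on whether a quorum exists:

Example

[ceph: root@host01 /]# ceph tell mon.host01 mon_status
[root@host01 ~]# ceph daemon mon.host01 mon_status

The first form requires a quorum; the second form uses the administration socket on the Monitor host and works without one.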

An example output of mon_status

{
"name": "mon.3",
"rank": 2,
"state": "peon",
"election_epoch": 96,
"quorum": [
1,
2
],
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": {
"epoch": 1,
"fsid": "d5552d32-9d1d-436c-8db1-ab5fc2c63cd0",
"modified": "0.000000",
"created": "0.000000",
"mons": [
{
"rank": 0,
"name": "mon.1",
"addr": "172.25.1.10:6789\/0"
},
{
"rank": 1,
"name": "mon.2",
"addr": "172.25.1.12:6789\/0"
},
{
"rank": 2,
"name": "mon.3",
"addr": "172.25.1.13:6789\/0"
}
]
}
}

Ceph Monitor States

Leader
During the electing phase, Ceph Monitors elect a leader. The leader is the Ceph Monitor with the highest rank, that is, the rank
with the lowest value. In the example above, the leader is mon.1.

Peon
Peons are the Ceph Monitors in the quorum that are not leaders. If the leader fails, the peon with the highest rank becomes a
new leader.

Probing
A Ceph Monitor is in the probing state if it is looking for other Ceph Monitors. For example, after you start the Ceph Monitors,
they are probing until they find enough Ceph Monitors specified in the Ceph Monitor map (monmap) to form a quorum.



Electing
A Ceph Monitor is in the electing state if it is in the process of electing the leader. Usually, this status changes quickly.

Synchronizing
A Ceph Monitor is in the synchronizing state if it is synchronizing with the other Ceph Monitors to join the quorum. The smaller
the Ceph Monitor store is, the faster the synchronization process completes. Therefore, if you have a large store, synchronization
takes a longer time.

Reference

For more information, see Using the Ceph Administration Socket.

Injecting a monmap

Edit online
If a Ceph Monitor has an outdated or corrupted Ceph Monitor map (monmap), it cannot join a quorum because it is trying to reach the
other Ceph Monitors on incorrect IP addresses.

The safest way to fix this problem is to obtain and inject the actual Ceph Monitor map from other Ceph Monitors.

NOTE: This action overwrites the existing Ceph Monitor map kept by the Ceph Monitor.

This procedure shows how to inject the Ceph Monitor map when the other Ceph Monitors are able to form a quorum, or when at least
one Ceph Monitor has a correct Ceph Monitor map. If all Ceph Monitors have corrupted store and therefore also the Ceph Monitor
map, see Recovering the Ceph Monitor store.

Prerequisites

Access to the Ceph Monitor Map.

Root-level access to the Ceph Monitor node.

Procedure

1. If the remaining Ceph Monitors are able to form a quorum, get the Ceph Monitor map by using the ceph mon getmap
command:

Example

[ceph: root@host01 /]# ceph mon getmap -o /tmp/monmap

2. If the remaining Ceph Monitors are not able to form the quorum and you have at least one Ceph Monitor with a correct Ceph
Monitor map, copy it from that Ceph Monitor:

a. Stop the Ceph Monitor which you want to copy the Ceph Monitor map from:

Syntax

systemctl stop ceph-mon@HOST_NAME

For example, to stop the Ceph Monitor running on a host with the host01 short host name:

Example

[root@mon ~]# systemctl stop ceph-mon@host01

b. Copy the Ceph Monitor map:

Syntax

ceph-mon -i ID --extract-monmap /tmp/monmap

Replace ID with the ID of the Ceph Monitor which you want to copy the Ceph Monitor map from:

Example

[ceph: root@host01 /]# ceph-mon -i mon.a --extract-monmap /tmp/monmap



3. Stop the Ceph Monitor with the corrupted or outdated Ceph Monitor map:

Syntax

systemctl stop ceph-mon@HOST_NAME

For example, to stop a Ceph Monitor running on a host with the host01 short host name:

Example

[root@mon ~]# systemctl stop ceph-mon@host01

4. Inject the Ceph Monitor map:

Syntax

ceph-mon -i ID --inject-monmap /tmp/monmap

Replace ID with the ID of the Ceph Monitor with the corrupted or outdated Ceph Monitor map:

Example

[root@mon ~]# ceph-mon -i mon.host01 --inject-monmap /tmp/monmap

5. Start the Ceph Monitor:

Example

[root@mon ~]# systemctl start ceph-mon@host01

If you copied the Ceph Monitor map from another Ceph Monitor, start that Ceph Monitor, too:

Example

[root@mon ~]# systemctl start ceph-mon@host01
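
At any point in this procedure, you can confirm that the extracted map lists the expected Monitors by printing it with the
monmaptool utility, which ships with the Ceph packages:

Example

[ceph: root@host01 /]# monmaptool --print /tmp/monmap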

Reference

See Ceph Monitor is out of quorum.

See Recovering the Ceph Monitor store when using bluestore.

Replacing a failed Monitor


Edit online
When a Ceph Monitor has a corrupted store, you can replace the monitor in the storage cluster.

Prerequisites

A running IBM Storage Ceph cluster.

Able to form a quorum.

Root-level access to Ceph Monitor node.

1. From the Monitor host, remove the Monitor store by default located at /var/lib/ceph/mon/CLUSTER_NAME-
SHORT_HOST_NAME:

rm -rf /var/lib/ceph/mon/CLUSTER_NAME-SHORT_HOST_NAME

Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor store of a Monitor
running on host1 from a cluster called remote:

[root@mon ~]# rm -rf /var/lib/ceph/mon/remote-host1

2. Remove the Monitor from the Monitor map (monmap):

ceph mon remove SHORT_HOST_NAME --cluster CLUSTER_NAME



Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor running on host1
from a cluster called remote:

[ceph: root@host01 /]# ceph mon remove host01 --cluster remote

3. Troubleshoot and fix any problems related to the underlying file system or hardware of the Monitor host.

Reference

See Ceph Monitor is out of quorum for details.

Compacting the monitor store


Edit online
When the Monitor store has grown big in size, you can compact it:

Dynamically by using the ceph tell command.

Upon the start of the ceph-mon daemon.

By using the ceph-monstore-tool when the ceph-mon daemon is not running. Use this method when the previously
mentioned methods fail to compact the Monitor store or when the Monitor is out of quorum and its log contains the Caught
signal (Bus error) error message.

IMPORTANT: Monitor store size changes when the cluster is not in the active+clean state or during the rebalancing process. For
this reason, compact the Monitor store when rebalancing is completed. Also, ensure that the placement groups are in the
active+clean state.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

Procedure

1. To compact the Monitor store when the ceph-mon daemon is running:

Syntax

ceph tell mon.HOST_NAME compact

2. Replace HOST_NAME with the short host name of the host where the ceph-mon is running. Use the hostname -s command
when unsure.

Example

[ceph: root@host01 /]# ceph tell mon.host01 compact

3. Add the following parameter to the Ceph configuration under the [mon] section:

[mon]
mon_compact_on_start = true

4. Restart the ceph-mon daemon:

Syntax

systemctl restart ceph-mon@HOST_NAME

Replace HOST_NAME with the short name of the host where the daemon is running. Use the hostname -s command when
unsure.

Example

[root@mon ~]# systemctl restart ceph-mon@host01

5. Ensure that Monitors have formed a quorum:



[ceph: root@host01 /]# ceph mon stat

6. Repeat these steps on other Monitors if needed.

NOTE: Before you start, ensure that you have the ceph-test package installed.

7. Verify that the ceph-mon daemon with the large store is not running. Stop the daemon if needed.

Syntax

[root@mon ]# systemctl status ceph-mon@HOST_NAME


[root@mon ]# systemctl stop ceph-mon@HOST_NAME

Replace HOST_NAME with the short name of the host where the daemon is running. Use the hostname -s command when
unsure.

Example

[root@mon ~]# systemctl status ceph-mon@host01


[root@mon ~]# systemctl stop ceph-mon@host01

8. Compact the Monitor store:

Syntax

ceph-monstore-tool /var/lib/ceph/mon/mon.HOST_NAME compact

Replace HOST_NAME with a short host name of the Monitor host.

Example

[ceph: root@host01 /]# ceph-monstore-tool /var/lib/ceph/mon/mon.host01 compact

9. Start ceph-mon again:

Syntax

systemctl start ceph-mon@HOST_NAME

Example

[root@mon ~]# systemctl start ceph-mon@host01

Reference

See The Ceph Monitor store is getting too big for details.

See Ceph Monitor is out of quorum for details.

Opening port for Ceph manager


Edit online
The ceph-mgr daemons receive placement group information from OSDs on the same range of ports as the ceph-osd daemons. If
these ports are not open, the cluster status changes from HEALTH_OK to HEALTH_WARN, and the cluster reports that a percentage of
the placement groups are in the unknown state.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to Ceph Manager.

Procedure

1. To resolve this situation, for each host running ceph-mgr daemons, open ports 6800-7300.

Example

[root@ceph-mgr] # firewall-cmd --add-port 6800-7300/tcp


[root@ceph-mgr] # firewall-cmd --add-port 6800-7300/tcp --permanent



2. Restart the ceph-mgr daemons.
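
In a cephadm-managed cluster, one way to restart all Ceph Manager daemons is through the orchestrator. This assumes that the
cephadm shell is available:

Example

[ceph: root@host01 /]# ceph orch restart mgr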

Recovering the Ceph Monitor store


Edit online
Ceph Monitors store the cluster map in a key-value store such as LevelDB. If the store is corrupted on a Monitor, the Monitor
terminates unexpectedly and fails to start again. The Ceph logs might include the following errors:

Corruption: error in middle of record


Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

The IBM Storage Ceph clusters use at least three Ceph Monitors so that if one fails, it can be replaced with another one. However,
under certain circumstances, all Ceph Monitors can have corrupted stores. For example, when the Ceph Monitor nodes have
incorrectly configured disk or file system settings, a power outage can corrupt the underlying file system.

If there is corruption on all Ceph Monitors, you can recover it with information stored on the OSD nodes by using utilities called
ceph-monstore-tool and ceph-objectstore-tool.

IMPORTANT: These procedures cannot recover the following information:

Metadata Server (MDS) keyrings and maps

Placement Group settings:

full ratio set by using the ceph pg set_full_ratio command

nearfull ratio set by using the ceph pg set_nearfull_ratio command

IMPORTANT: Never restore the Ceph Monitor store from an old backup. Rebuild the Ceph Monitor store from the current cluster
state using the following steps and restore from that.

Recovering the Ceph Monitor store when using BlueStore

Recovering the Ceph Monitor store when using BlueStore


Edit online
Follow this procedure if the Ceph Monitor store is corrupted on all Ceph Monitors and you use the BlueStore back end.

In containerized environments, this method requires attaching Ceph repositories and restoring to a non-containerized Ceph Monitor
first.

WARNING: This procedure can cause data loss. If you are unsure about any step in this procedure, contact the IBM Support for
assistance with the recovering process.

Prerequisites

All OSDs containers are stopped.

Enable Ceph repositories on the Ceph nodes based on their roles.

The ceph-test and rsync packages are installed on the OSD and Monitor nodes.

The ceph-mon package is installed on the Monitor nodes.

The ceph-osd package is installed on the OSD nodes.

Procedure

1. Mount all disks with Ceph data to a temporary location. Repeat this step for all OSD nodes.

a. List the data partitions using the ceph-volume command:

Example



[ceph: root@host01 /]# ceph-volume lvm list

b. Mount the data partitions to a temporary location:

Syntax

mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-$i

c. Restore the SELinux context:

Syntax

for i in {OSD_ID}; do restorecon /var/lib/ceph/osd/ceph-$i; done

Replace OSD_ID with a numeric, space-separated list of Ceph OSD IDs on the OSD node.

d. Change the owner and group to ceph:ceph:

Syntax

for i in {OSD_ID}; do chown -R ceph:ceph /var/lib/ceph/osd/ceph-$i; done

Replace OSD_ID with a numeric, space-separated list of Ceph OSD IDs on the OSD node.

IMPORTANT: Due to a bug that causes the update-mon-db command to use additional db and db.slow directories for the
Monitor database, you must also copy these directories. To do so:

a. Prepare a temporary location outside the container to mount and access the OSD database and extract the OSD maps
needed to restore the Ceph Monitor:

Syntax

ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev OSD-DATA --path /var/lib/ceph/osd/ceph-OSD-ID

Replace OSD-DATA with the Volume Group (VG) or Logical Volume (LV) path to the OSD data and OSD-ID with the ID of
the OSD.

b. Create a symbolic link between the BlueStore database and block.db:

Syntax

ln -snf BLUESTORE_DATABASE /var/lib/ceph/osd/ceph-OSD-ID/block.db

Replace BLUESTORE_DATABASE with the Volume Group (VG) or Logical Volume (LV) path to the BlueStore database and
OSD-ID with the ID of the OSD.

2. Use the following commands from the Ceph Monitor node with the corrupted store. Repeat them for all OSDs on all nodes.

a. Collect the cluster map from all OSD nodes:

Example

[root@host01 ~]# cd /root/

[root@host01 ~]# ms=/tmp/monstore/
[root@host01 ~]# db=/root/db/
[root@host01 ~]# db_slow=/root/db.slow/

[root@host01 ~]# mkdir $ms

[root@host01 ~]# for host in $osd_nodes; do
echo "$host"
rsync -avz $ms $host:$ms
rsync -avz $db $host:$db
rsync -avz $db_slow $host:$db_slow

rm -rf $ms
rm -rf $db
rm -rf $db_slow

ssh -t $host <<EOF
for osd in /var/lib/ceph/osd/ceph-*; do
ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --mon-store-path $ms
done
EOF

rsync -avz $host:$ms $ms
rsync -avz $host:$db $db
rsync -avz $host:$db_slow $db_slow
done

b. Set the appropriate capabilities:

Example

[ceph: root@host01 /]# ceph-authtool /etc/ceph/ceph.client.admin.keyring -n mon. --cap mon 'allow *' --gen-key
[ceph: root@host01 /]# cat /etc/ceph/ceph.client.admin.keyring
[mon.]
key = AQCleqldWqm5IhAAgZQbEzoShkZV42RiQVffnA==
caps mon = "allow *"
[client.admin]
key = AQCmAKld8J05KxAArOWeRAw63gAwwZO5o75ZNQ==
auid = 0
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"

c. Move all sst file from the db and db.slow directories to the temporary location:

Example

[ceph: root@host01 /]# mv /root/db/*.sst /root/db.slow/*.sst /tmp/monstore/store.db

d. Rebuild the Monitor store from the collected map:

Example

[ceph: root@host01 /]# ceph-monstore-tool /tmp/monstore rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring

NOTE: After using this command, only keyrings extracted from the OSDs and the keyring specified on the ceph-monstore-
tool command line are present in Ceph’s authentication database. You have to recreate or import all other keyrings, such as
clients, Ceph Manager, Ceph Object Gateway, and others, so those clients can access the cluster.

e. Back up the corrupted store. Repeat this step for all Ceph Monitor nodes:

Syntax

mv /var/lib/ceph/mon/ceph-HOSTNAME/store.db /var/lib/ceph/mon/ceph-HOSTNAME/store.db.corrupted

Replace HOSTNAME with the host name of the Ceph Monitor node.

f. Replace the corrupted store. Repeat this step for all Ceph Monitor nodes:

Syntax

scp -r /tmp/monstore/store.db HOSTNAME:/var/lib/ceph/mon/ceph-HOSTNAME/

Replace HOSTNAME with the host name of the Monitor node.

g. Change the owner of the new store. Repeat this step for all Ceph Monitor nodes:

Syntax

chown -R ceph:ceph /var/lib/ceph/mon/ceph-HOSTNAME/store.db

Replace HOSTNAME with the host name of the Ceph Monitor node.

3. Unmount all the temporary mounted OSDs on all nodes:

Example

[root@host01 ~]# umount /var/lib/ceph/osd/ceph-*

4. Start all the Ceph Monitor daemons:



[root@host01 ~]# systemctl start ceph-mon *

5. Ensure that the Monitors are able to form a quorum:

Syntax

ceph -s


6. Import the Ceph Manager keyring and start all Ceph Manager processes:

Syntax

ceph auth import -i /etc/ceph/ceph.mgr.HOSTNAME.keyring


systemctl start ceph-mgr@HOSTNAME

Replace HOSTNAME with the host name of the Ceph Manager node.

7. Start all OSD processes across all OSD nodes:

Example

[root@host01 ~]# systemctl start ceph-osd *

8. Ensure that the OSDs are returning to service:

Example

[ceph: root@host01 /]# ceph -s

Reference

For details on registering Ceph nodes to the Content Delivery Network (CDN), see Registering the IBM Storage Ceph nodes to
the CDN and attaching subscriptions section in the IBM Storage Ceph Installation Guide.

Troubleshooting Ceph OSDs


Edit online
This chapter contains information on how to fix the most common errors related to Ceph OSDs.

Prerequisites

Verify your network connection. See Troubleshooting networking issues for details.

Verify that Monitors have a quorum by using the ceph health command. If the command returns a health status
(HEALTH_OK, HEALTH_WARN, or HEALTH_ERR), the Monitors are able to form a quorum. If not, address any Monitor problems
first. See Troubleshooting Ceph Monitors for details. For details about ceph health see Understanding Ceph health.

Optionally, stop the rebalancing process to save time and resources. See Stopping and starting rebalancing for details.

Most common Ceph OSD errors


Stopping and starting rebalancing
Mounting the OSD data partition
Replacing an OSD drive
Increasing the PID count
Deleting data from a full storage cluster

Most common Ceph OSD errors


Edit online
The following tables list the most common error messages that are returned by the ceph health detail command, or included in
the Ceph logs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix the
problems.



Prerequisites

Root-level access to the Ceph OSD nodes.

Ceph OSD error messages


Common Ceph OSD error messages in the Ceph logs
Full OSDs
Backfillfull OSDs
Nearfull OSDs
Down OSDs
Flapping OSDs
Slow requests or requests are blocked

Ceph OSD error messages


Edit online
A table of common Ceph OSD error messages, and a potential fix.

Error message                   See

HEALTH_ERR:
  full osds                     Full OSDs
HEALTH_WARN:
  nearfull osds                 Nearfull OSDs
  osds are down                 Down OSDs; Flapping OSDs
  requests are blocked          Slow requests or requests are blocked
  slow requests                 Slow requests or requests are blocked

Common Ceph OSD error messages in the Ceph logs


Edit online
A table of common Ceph OSD error messages found in the Ceph logs, and a link to a potential fix.

Error message                                  Log file            See

heartbeat_check: no reply from osd.X           Main cluster log    Flapping OSDs
wrongly marked me down                         Main cluster log    Flapping OSDs
osds have slow requests                        Main cluster log    Slow requests or requests are blocked
FAILED assert(0 == "hit suicide timeout")      OSD log             Down OSDs

Full OSDs
Edit online
The ceph health detail command returns an error message similar to the following one:

HEALTH_ERR 1 full osds


osd.3 is full at 95%

What This Means

Ceph prevents clients from performing I/O operations on full OSD nodes to avoid losing data. It returns the HEALTH_ERR full
osds message when the cluster reaches the capacity set by the mon_osd_full_ratio parameter. By default, this parameter is set
to 0.95 which means 95% of the cluster capacity.

To Troubleshoot This Problem

Determine what percentage of raw storage (%RAW USED) is used:



ceph df

If %RAW USED is above 70-75%, you can:

Delete unnecessary data. This is a short-term solution to avoid production downtime.

Scale the cluster by adding a new OSD node. This is a long-term solution recommended by IBM.

Reference

See Nearfull OSDs in IBM Storage Ceph Troubleshooting Guide for details.

See Deleting data from a full storage cluster in IBM Storage Ceph Troubleshooting Guide for details.

Backfillfull OSDs
Edit online
The ceph health detail command returns an error message similar to the following one:

health: HEALTH_WARN
3 backfillfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 32 pgs backfill_toofull

What this means

When one or more OSDs have exceeded the backfillfull threshold, Ceph prevents data from rebalancing to those devices. This is an
early warning that rebalancing might not complete and that the cluster is approaching full. The default for the backfillfull
threshold is 90%.

To troubleshoot this problem, check utilization by pool:

ceph df

If %RAW USED is above 70-75%, you can carry out one of the following actions:

Delete unnecessary data. This is a short-term solution to avoid production downtime.

Scale the cluster by adding a new OSD node. This is a long-term solution recommended by IBM.

Increase the backfillfull ratio for the OSDs that contain the PGs stuck in backfill_toofull to allow the recovery
process to continue. Add new storage to the cluster as soon as possible or remove data to prevent filling more OSDs.

Syntax

ceph osd set-backfillfull-ratio VALUE

The range for VALUE is 0.0 to 1.0.

Example

[ceph: root@host01 /]# ceph osd set-backfillfull-ratio 0.92

References

See Nearfull OSDS for details.

See Deleting data from a full storage cluster for details.

Nearfull OSDs
Edit online
The ceph health detail command returns an error message similar to the following one:

HEALTH_WARN 1 nearfull osds


osd.2 is near full at 85%



What This Means

Ceph returns the nearfull osds message when the cluster reaches the capacity set by the mon_osd_nearfull_ratio
parameter. By default, this parameter is set to 0.85, which means 85% of the cluster capacity.

Ceph distributes data based on the CRUSH hierarchy in the best possible way but it cannot guarantee equal distribution. The main
causes of the uneven data distribution and the nearfull osds messages are:

The OSDs are not balanced among the OSD nodes in the cluster. That is, some OSD nodes host significantly more OSDs than
others, or the weight of some OSDs in the CRUSH map is not adequate to their capacity.

The Placement Group (PG) count is not proper as per the number of the OSDs, use case, target PGs per OSD, and OSD
utilization.

The cluster uses inappropriate CRUSH tunables.

The back-end storage for OSDs is almost full.

To Troubleshoot This Problem:

1. Verify that the PG count is sufficient and increase it if needed.

2. Verify that you use CRUSH tunables optimal to the cluster version and adjust them if not.

3. Change the weight of OSDs by utilization; see the example after this list.

4. Determine how much space is left on the disks used by OSDs.

a. To view how much space OSDs use in general:

[ceph: root@host01 /]# ceph osd df

b. To view how much space OSDs use on particular nodes. Use the following command from the node containing nearfull
OSDs:

df

c. If needed, add a new OSD node.
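
For step 3, one way to adjust OSD weights based on their current utilization is the reweight-by-utilization command. Run the
test variant first to review the proposed changes before applying them:

Example

[ceph: root@host01 /]# ceph osd test-reweight-by-utilization
[ceph: root@host01 /]# ceph osd reweight-by-utilization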

Reference

See Full OSDs for details.

For details, see CRUSH Tunables section in the Storage Strategies Guide for IBM Storage Ceph 5.3 and How can I test the
impact CRUSH map tunable modifications will have on my PG distribution across OSDs in IBM Storage Ceph?.

See Increasing the placement group for details.

Down OSDs
Edit online
The ceph health detail command returns an error similar to the following one:

HEALTH_WARN 1/3 in osds are down

What This Means

One of the ceph-osd processes is unavailable due to a possible service failure or problems with communication with other OSDs. As
a consequence, the surviving ceph-osd daemons reported this failure to the Monitors.

If the ceph-osd daemon is not running, the underlying OSD drive or file system is either corrupted, or some other error, such as a
missing keyring, is preventing the daemon from starting.

In most cases, networking issues cause the situation when the ceph-osd daemon is running but still marked as down.

To Troubleshoot This Problem

1. Determine which OSD is down:



[ceph: root@host01 /]# ceph health detail
HEALTH_WARN 1/3 in osds are down
osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080

2. Try to restart the ceph-osd daemon:

[root@host01 ~]# systemctl restart ceph-osd@OSD_NUMBER

Replace OSD_NUMBER with the ID of the OSD that is down, for example:

[root@host01 ~]# systemctl restart ceph-osd@0

a. If you are not able to start ceph-osd, follow the steps in The ceph-osd daemon cannot start.

b. If you are able to start the ceph-osd daemon but it is marked as down, follow the steps in The ceph-osd daemon is
running but still marked as down.

The ceph-osd daemon cannot start

1. If you have a node containing a number of OSDs (generally, more than twelve), verify that the default maximum number of
threads (PID count) is sufficient. See Increasing the PID count for details.

2. Verify that the OSD data and journal partitions are mounted properly. You can use the ceph-volume lvm list command to
list all devices and volumes associated with the Ceph Storage Cluster and then manually inspect if they are mounted properly.
See the mount(8) manual page for details.

3. If you got the ERROR: missing keyring, cannot use cephx for authentication error message, the OSD is a
missing keyring.

4. If you got the ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1 error message, the
ceph-osd daemon cannot read the underlying file system. See the following steps for instructions on how to troubleshoot
and fix this error.

a. Check the corresponding log file to determine the cause of the failure. By default, Ceph stores log files in the
/var/log/ceph/ directory.

b. An EIO error message indicates a failure of the underlying disk. To fix this problem replace the underlying OSD disk. See
Replacing an OSD drive for details.

c. If the log includes any other FAILED assert errors, such as the following one, open a support ticket. See Contacting IBM
Support for service for details.

FAILED assert(0 == "hit suicide timeout")

5. Check the dmesg output for the errors with the underlying file system or disk:

dmesg

a. The error -5 error message similar to the following one indicates corruption of the underlying XFS file system. For details
on how to fix this problem, see the What is the meaning of "xfs_log_force: error -5 returned"? solution on the IBM Customer
Portal.

xfs_log_force: error -5 returned

b. If the dmesg output includes any SCSI error error messages, see the SCSI Error Codes Solution Finder solution to
determine the best way to fix the problem.

c. Alternatively, if you are unable to fix the underlying file system, replace the OSD drive. See Replacing an OSD drive for
details.

6. If the OSD failed with a segmentation fault, such as the following one, gather the required information and open a support
ticket. See Contacting IBM Support for service for details.

Caught signal (Segmentation fault)

The ceph-osd daemon is running but still marked as down

1. Check the corresponding log file to determine the cause of the failure. By default, Ceph stores log files in the
/var/log/ceph/ directory.

a. If the log includes error messages similar to the following ones, see Flapping OSDs.



wrongly marked me down
heartbeat_check: no reply from osd.2 since back

b. If you see any other errors, open a support ticket. See Contacting IBM Support for service for details.

Reference

Flapping OSDs

Stale placement groups

See Ceph daemon logs to enable logging to files.

Flapping OSDs
Edit online
The ceph -w | grep osds command shows OSDs repeatedly as down and then up again within a short period of time:

ceph -w | grep osds


2022-05-05 06:27:20.810535 mon.0 [INF] osdmap e609: 9 osds: 8 up, 9 in
2022-05-05 06:27:24.120611 mon.0 [INF] osdmap e611: 9 osds: 7 up, 9 in
2022-05-05 06:27:25.975622 mon.0 [INF] HEALTH_WARN; 118 pgs stale; 2/9 in osds are down
2022-05-05 06:27:27.489790 mon.0 [INF] osdmap e614: 9 osds: 6 up, 9 in
2022-05-05 06:27:36.540000 mon.0 [INF] osdmap e616: 9 osds: 7 up, 9 in
2022-05-05 06:27:39.681913 mon.0 [INF] osdmap e618: 9 osds: 8 up, 9 in
2022-05-05 06:27:43.269401 mon.0 [INF] osdmap e620: 9 osds: 9 up, 9 in
2022-05-05 06:27:54.884426 mon.0 [INF] osdmap e622: 9 osds: 8 up, 9 in
2022-05-05 06:27:57.398706 mon.0 [INF] osdmap e624: 9 osds: 7 up, 9 in
2022-05-05 06:27:59.669841 mon.0 [INF] osdmap e625: 9 osds: 6 up, 9 in
2022-05-05 06:28:07.043677 mon.0 [INF] osdmap e628: 9 osds: 7 up, 9 in
2022-05-05 06:28:10.512331 mon.0 [INF] osdmap e630: 9 osds: 8 up, 9 in
2022-05-05 06:28:12.670923 mon.0 [INF] osdmap e631: 9 osds: 9 up, 9 in

In addition, the Ceph log contains error messages similar to the following ones:

2022-05-25 03:44:06.510583 osd.50 127.0.0.1:6801/149046 18992 : cluster [WRN] map e600547 wrongly marked me down

2022-05-25 19:00:08.906864 7fa2a0033700 -1 osd.254 609110 heartbeat_check: no reply from osd.2 since back 2021-07-25 19:00:07.444113 front 2021-07-25 18:59:48.311935 (cutoff 2021-07-25 18:59:48.906862)

What This Means

The main causes of flapping OSDs are:

Certain storage cluster operations, such as scrubbing or recovery, take an abnormal amount of time, for example, if you
perform these operations on objects with a large index or large placement groups. Usually, after these operations finish, the
flapping OSDs problem is solved.

Problems with the underlying physical hardware. In this case, the ceph health detail command also returns the slow
requests error message.

Problems with the network.

Ceph OSDs cannot manage situations where the private network for the storage cluster fails, or there is significant latency on the
public client-facing network.

Ceph OSDs use the private network for sending heartbeat packets to each other to indicate that they are up and in. If the private
storage cluster network does not work properly, OSDs are unable to send and receive the heartbeat packets. As a consequence, they
report each other as being down to the Ceph Monitors, while marking themselves as up.

The following parameters in the Ceph configuration file influence this behavior:

Parameter                     Description                                                                      Default value

osd_heartbeat_grace_time      How long OSDs wait for the heartbeat packets to return before reporting an       20 seconds
                              OSD as down to the Ceph Monitors.
mon_osd_min_down_reporters    How many OSDs must report another OSD as down before the Ceph Monitors mark      2
                              the OSD as down.
With the default configuration, the Ceph Monitors mark an OSD as down only after at least two OSDs report it as down. In some
cases, if one single host encounters network issues, the entire cluster can experience flapping OSDs. This is because the OSDs
that reside on the host will report other OSDs in the cluster as down.

NOTE: The flapping OSDs scenario does not include the situation when the OSD processes are started and then immediately killed.

To Troubleshoot This Problem

1. Check the output of the ceph health detail command again. If it includes the slow requests error message, see Slow
requests or requests are blocked for details on how to troubleshoot this issue.

ceph health detail


HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests

2. Determine which OSDs are marked as down and on what nodes they reside:

ceph osd tree | grep down

3. On the nodes containing the flapping OSDs, troubleshoot and fix any networking problems. For details, see Troubleshooting
networking issues.

4. Alternatively, you can temporarily force Monitors to stop marking the OSDs as down and up by setting the noup and nodown
flags:

ceph osd set noup


ceph osd set nodown

IMPORTANT: Using the noup and nodown flags does not fix the root cause of the problem but only prevents OSDs from
flapping.
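
When troubleshooting is complete, clear the flags so that normal failure detection resumes:

Example

[ceph: root@host01 /]# ceph osd unset noup
[ceph: root@host01 /]# ceph osd unset nodown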

IMPORTANT: Flapping OSDs can be caused by MTU misconfiguration on Ceph OSD nodes, at the network switch level, or both. To
resolve the issue, set MTU to a uniform size on all storage cluster nodes, including on the core and access network switches with a
planned downtime. Do not tune osd heartbeat min size because changing this setting can hide issues within the network, and
it will not solve actual network inconsistency.

Reference

See Ceph heartbeat section in the IBM Storage Ceph Architecture Guide for details.

See Slow requests or requests are blocked section in the IBM Storage Ceph Troubleshooting Guide.

Slow requests or requests are blocked


Edit online
The ceph-osd daemon is slow to respond to a request and the ceph health detail command returns an error message similar
to the following one:

HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests

In addition, the Ceph logs include an error message similar to the following ones:

2022-05-24 13:18:10.024659 osd.1 127.0.0.1:6812/3032 9 : cluster [WRN] 6 slow requests, 6 included below; oldest blocked for > 61.758455 secs

2022-05-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]

What This Means

An OSD has slow requests when it is not able to service the I/O operations per second (IOPS) in its queue within the time defined
by the osd_op_complaint_time parameter. By default, this parameter is set to 30 seconds.

The main causes of OSDs having slow requests are:

Problems with the underlying hardware, such as disk drives, hosts, racks, or network switches

Problems with the network. These problems are usually connected with flapping OSDs. See Flapping OSDs for details.

System load

The following table shows the types of slow requests. Use the dump_historic_ops administration socket command to determine
the type of a slow request.

Slow request type               Description

waiting for rw locks            The OSD is waiting to acquire a lock on a placement group for the operation.
waiting for subops              The OSD is waiting for replica OSDs to apply the operation to the journal.
no flag points reached          The OSD did not reach any major operation milestone.
waiting for degraded object     The OSDs have not replicated an object the specified number of times yet.
For details about the administration socket, see Using the Ceph Administration Socket section in the Administration Guide for IBM
Storage Ceph 5.3.
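
For example, assuming access to osd.0 from the host where that daemon runs, a query of the recent slow operations might look like
the following; the OSD ID is a placeholder:

Example

[ceph: root@host01 /]# ceph daemon osd.0 dump_historic_ops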

To Troubleshoot This Problem

1. Determine if the OSDs with slow or block requests share a common piece of hardware, for example, a disk drive, host, rack, or
network switch.

2. If the OSDs share a disk:

a. Use the smartmontools utility to check the health of the disk or the logs to determine any errors on the disk.

NOTE: The smartmontools utility is included in the smartmontools package.

b. Use the iostat utility to get the I/O wait report (%iowait) on the OSD disk to determine if the disk is under heavy load.

NOTE: The iostat utility is included in the sysstat package.

3. If the OSDs share the node with another service:

a. Check the RAM and CPU utilization

b. Use the netstat utility to see the network statistics on the Network Interface Controllers (NICs) and troubleshoot any
networking issues.

4. If the OSDs share a rack, check the network switch for the rack. For example, if you use jumbo frames, verify that the NIC in
the path has jumbo frames set.

5. If you are unable to determine a common piece of hardware shared by OSDs with slow requests, or to troubleshoot and fix
hardware and networking problems, open a support ticket. See Contacting IBM support for service for details.

Reference

See Using the Ceph Administration Socket section in the IBM Storage Ceph Administration Guide for details.

See Troubleshooting networking issues for details.

Stopping and starting rebalancing


Edit online
When an OSD fails or you stop it, the CRUSH algorithm automatically starts the rebalancing process to redistribute data across the
remaining OSDs.



Rebalancing can take time and resources, therefore, consider stopping rebalancing during troubleshooting or maintaining OSDs.

NOTE: Placement groups within the stopped OSDs become degraded during troubleshooting and maintenance.

Prerequisites

Root-level access to the Ceph Monitor node.

Procedure

1. Log in to the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Set the noout flag before stopping the OSD:

Example

[ceph: root@host01 /]# ceph osd set noout

3. When you finish troubleshooting or maintenance, unset the noout flag to start rebalancing:

Example

[ceph: root@host01 /]# ceph osd unset noout

Reference

Rebalancing and Recovery.

Mounting the OSD data partition


Edit online
If the OSD data partition is not mounted correctly, the ceph-osd daemon cannot start. If you discover that the partition is not
mounted as expected, follow the steps in this section to mount it.

Prerequisites

Access to the ceph-osd daemon.

Root-level access to the Ceph Monitor node.

Procedure

1. Mount the partition:

Syntax

mount -o noatime PARTITION /var/lib/ceph/osd/CLUSTER_NAME-OSD_NUMBER

Replace PARTITION with the path to the partition on the OSD drive dedicated to OSD data. Specify the cluster name and the
OSD number.

Example

[root@host01 ~]# mount -o noatime /dev/sdd1 /var/lib/ceph/osd/ceph-0

2. Try to start the failed ceph-osd daemon:

Syntax

systemctl start ceph-osd@OSD_NUMBER

Replace the OSD_NUMBER with the ID of the OSD.

Example



[root@host01 ~]# systemctl start ceph-osd@0

Reference

See Down OSDs in the IBM Storage Ceph Troubleshooting Guide for more details.

Replacing an OSD drive


Edit online
Ceph is designed for fault tolerance, which means that it can operate in a degraded state without losing data. Consequently, Ceph
can operate even if a data storage drive fails. In the context of a failed drive, the degraded state means that the extra copies of the
data stored on other OSDs will backfill automatically to other OSDs in the cluster. However, if this occurs, replace the failed OSD drive
and recreate the OSD manually.

When a drive fails, Ceph reports the OSD as down:

HEALTH_WARN 1/3 in osds are down


osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080

NOTE: Ceph can mark an OSD as down also as a consequence of networking or permissions problems.

Modern servers typically deploy with hot-swappable drives so you can pull a failed drive and replace it with a new one without
bringing down the node. The whole procedure includes these steps:

1. Removing the OSD from the Ceph cluster.

2. Replacing the drive.

3. Adding the OSD to the cluster.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the Ceph Monitor node.

At least one OSD is down.

Removing an OSD from the Ceph Cluster

1. Log into the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Determine which OSD is down.

Example

[ceph: root@host01 /]# ceph osd tree | grep -i down


ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
0 hdd 0.00999 osd.0 down 1.00000 1.00000

3. Mark the OSD as out for the cluster to rebalance and copy its data to other OSDs.

Syntax

ceph osd out OSD_ID

Example

[ceph: root@host01 /]# ceph osd out osd.0


marked out osd.0.

NOTE: If the OSD is down, Ceph marks it as out automatically after 600 seconds when it does not receive any heartbeat
packet from the OSD based on the mon_osd_down_out_interval parameter. When this happens, other OSDs with copies
of the failed OSD data begin backfilling to ensure that the required number of copies exists within the cluster. While the cluster
is backfilling, the cluster will be in a degraded state.



4. Ensure that the failed OSD is backfilling.

Example

[ceph: root@host01 /]# ceph -w | grep backfill


2022-05-02 04:48:03.403872 mon.0 [INF] pgmap v10293282: 431 pgs: 1
active+undersized+degraded+remapped+backfilling, 28 active+undersized+degraded, 49
active+undersized+degraded+remapped+wait_backfill, 59 stale+active+clean, 294 active+clean;
72347 MB data, 101302 MB used, 1624 GB / 1722 GB avail; 227 kB/s rd, 1358 B/s wr, 12 op/s;
10626/35917 objects degraded (29.585%); 6757/35917 objects misplaced (18.813%); 63500 kB/s, 15
objects/s recovering
2022-05-02 04:48:04.414397 mon.0 [INF] pgmap v10293283: 431 pgs: 2
active+undersized+degraded+remapped+backfilling, 75
active+undersized+degraded+remapped+wait_backfill, 59 stale+active+clean, 295 active+clean;
72347 MB data, 101398 MB used, 1623 GB / 1722 GB avail; 969 kB/s rd, 6778 B/s wr, 32 op/s;
10626/35917 objects degraded (29.585%); 10580/35917 objects misplaced (29.457%); 125 MB/s, 31
objects/s recovering
2022-05-02 04:48:00.380063 osd.1 [INF] 0.6f starting backfill to osd.0 from (0'0,0'0] MAX to
2521'166639
2022-05-02 04:48:00.380139 osd.1 [INF] 0.48 starting backfill to osd.0 from (0'0,0'0] MAX to
2513'43079
2022-05-02 04:48:00.380260 osd.1 [INF] 0.d starting backfill to osd.0 from (0'0,0'0] MAX to
2513'136847
2022-05-02 04:48:00.380849 osd.1 [INF] 0.71 starting backfill to osd.0 from (0'0,0'0] MAX to
2331'28496
2022-05-02 04:48:00.381027 osd.1 [INF] 0.51 starting backfill to osd.0 from (0'0,0'0] MAX to
2513'87544

You should see the placement group states change from active+clean to active, some degraded objects, and finally
active+clean when migration completes.

5. Stop the OSD:

Syntax

ceph orch daemon stop OSD_ID

Example

[ceph: root@host01 /]# ceph orch daemon stop osd.0

6. Remove the OSD from the storage cluster:

Syntax

ceph orch osd rm OSD_ID --replace

Example

[ceph: root@host01 /]# ceph orch osd rm 0 --replace

The OSD_ID is preserved.
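
You can optionally track the progress of the scheduled removal and replacement. For example, the following Ceph Orchestrator command lists any OSDs that are queued, draining, or waiting for replacement:

Example

[ceph: root@host01 /]# ceph orch osd rm status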

Replacing the physical drive

See the documentation for the hardware node for details on replacing the physical drive.

1. If the drive is hot-swappable, replace the failed drive with a new one.

2. If the drive is not hot-swappable and the node contains multiple OSDs, you might have to shut down the whole node and
replace the physical drive. Consider preventing the cluster from backfilling. See Stopping and Starting Rebalancing chapter in
the IBM Storage Ceph Troubleshooting Guide for details.

3. When the drive appears under the /dev/ directory, make a note of the drive path.

4. If you want to add the OSD manually, find the OSD drive and format the disk.
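
Before adding the OSD back, you can optionally verify that the replacement drive is visible to the storage cluster by listing the devices known to the Ceph Orchestrator, for example:

Example

[ceph: root@host01 /]# ceph orch device ls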

Adding an OSD to the Ceph Cluster

1. Once the new drive is inserted, you can use the following options to deploy the OSDs:

The OSDs are deployed automatically by the Ceph Orchestrator if the --unmanaged parameter is not set.

Example



[ceph: root@host01 /]# ceph orch apply osd --all-available-devices

Deploy the OSDs on all the available devices with the unmanaged parameter set to true.

Example

[ceph: root@host01 /]# ceph orch apply osd --all-available-devices --unmanaged=true

Deploy the OSDs on specific devices and hosts.

Example

[ceph: root@host01 /]# ceph orch daemon add osd host02:/dev/sdb

2. Ensure that the CRUSH hierarchy is accurate:

Example

[ceph: root@host01 /]# ceph osd tree

Reference

See Deploying Ceph OSDs on all available devices section in the IBM Storage Ceph Operations Guide.

See Deploying Ceph OSDs on specific devices and hosts section in the IBM Storage Ceph Operations Guide.

See Down OSDs section in the IBM Storage Ceph Troubleshooting Guide.

See IBM Storage Ceph Installation Guide.

Increasing the PID count


Edit online
If you have a node containing more than 12 Ceph OSDs, the default maximum number of threads (PID count) can be insufficient,
especially during recovery. As a consequence, some ceph-osd daemons can terminate and fail to start again. If this happens,
increase the maximum possible number of threads allowed.

Procedure

To temporarily increase the number:

[root@mon ~]# sysctl -w kernel.pid_max=4194303

To permanently increase the number, update the /etc/sysctl.conf file as follows:

kernel.pid_max = 4194303
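
To apply the /etc/sysctl.conf change without rebooting the node, you can reload the sysctl settings, for example:

[root@mon ~]# sysctl -p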

Deleting data from a full storage cluster


Edit online
Ceph automatically prevents any I/O operations on OSDs that reached the capacity specified by the mon_osd_full_ratio
parameter and returns the full osds error message.

This procedure shows how to delete unnecessary data to fix this error.

NOTE: The mon_osd_full_ratio parameter sets the value of the full_ratio parameter when creating a cluster. You cannot
change the value of mon_osd_full_ratio afterward. To temporarily increase the full_ratio value, use the ceph osd
set-full-ratio command instead.

Prerequisites

Root-level access to the Ceph Monitor node.

Procedure



1. Log in to the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Determine the current value of full_ratio, by default it is set to 0.95:

[ceph: root@host01 /]# ceph osd dump | grep -i full


full_ratio 0.95

3. Temporarily increase the value of set-full-ratio to 0.97:

[ceph: root@host01 /]# ceph osd set-full-ratio 0.97

IMPORTANT: IBM strongly recommends not setting the set-full-ratio to a value higher than 0.97. Setting this parameter
to a higher value makes the recovery process harder. As a consequence, you might not be able to recover full OSDs at all.

4. Verify that you successfully set the parameter to 0.97:

[ceph: root@host01 /]# ceph osd dump | grep -i full


full_ratio 0.97

5. Monitor the cluster state:

[ceph: root@host01 /]# ceph -w

As soon as the cluster changes its state from full to nearfull, delete any unnecessary data.
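
To help identify which pools consume the most space and where data can be deleted, you can optionally review the cluster and per-pool utilization, for example:

[ceph: root@host01 /]# ceph df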

6. Set the value of full_ratio back to 0.95:

[ceph: root@host01 /]# ceph osd set-full-ratio 0.95

7. Verify that you successfully set the parameter to 0.95:

[ceph: root@host01 /]# ceph osd dump | grep -i full


full_ratio 0.95

Reference

Full OSDs.

Nearfull OSDs.

Troubleshooting a multi-site Ceph Object Gateway


Edit online
This chapter contains information on how to fix the most common errors related to multisite Ceph Object Gateways configuration
and operational conditions.

NOTE: If the bucket sync status command reports that the bucket is behind on shards even though the data is consistent across the
multi-site configuration, performing additional writes to the bucket synchronizes the sync status reports and displays the message
bucket is caught up with source.

Prerequisites

A running IBM Storage Ceph cluster.

A running Ceph Object Gateway.

Error code definitions for the Ceph Object Gateway


Syncing a multisite Ceph Object Gateway
Synchronizing data in a multi-site Ceph Object Gateway configuration

Error code definitions for the Ceph Object Gateway



Edit online
The Ceph Object Gateway logs contain error and warning messages to assist in troubleshooting conditions in your environment.
Some common ones are listed below with suggested resolutions.

Common error messages

data_sync: ERROR: a sync operation returned error


This is the high-level data sync process complaining that a lower-level bucket sync process returned an error. This message is
redundant; the bucket sync error appears above it in the log.

data sync: ERROR: failed to sync object: BUCKET_NAME:OBJECT_NAME


Either the process failed to fetch the required object over HTTP from a remote gateway or the process failed to write that object to
RADOS and it will be tried again.

data sync: ERROR: failure in sync, backing out (sync_status=2)


A low-level message reflecting one of the above conditions, specifically that the data was deleted before it could sync, thus showing
a -2 ENOENT status.

data sync: ERROR: failure in sync, backing out (sync_status=-5)


A low-level message reflecting one of the above conditions, specifically that the process failed to write the object to RADOS, thus
showing a -5 EIO status.

ERROR: failed to fetch remote data log info: ret=11


This is the generic EAGAIN error code from libcurl, reflecting an error condition from another gateway. The process retries by default.

meta sync: ERROR: failed to read mdlog info with (2) No such file or directory


The shard of the mdlog was never created, so there is nothing to sync.

Syncing error messages

failed to sync object


Either the process failed to fetch this object over HTTP from a remote gateway or it failed to write that object to RADOS and it will be
tried again.

failed to sync bucket instance: (11) Resource temporarily unavailable


A connection issue between primary and secondary zones.

failed to sync bucket instance: (125) Operation canceled


A racing condition exists between writes to the same RADOS object.

Reference

Contact IBM Support for any additional assistance.

Syncing a multisite Ceph Object Gateway


Edit online
A multisite sync reads the change log from other zones. To get a high-level view of the sync progress from the metadata and the data
logs, you can use the following command:

Example

[ceph: root@host01 /]# radosgw-admin sync status

This command lists which log shards, if any, are behind their source zone.

NOTE: Sometimes you might observe recovering shards when running the radosgw-admin sync status command. For data
sync, there are 128 shards of replication logs that are each processed independently. If any of the actions triggered by these
replication log events result in an error from the network, storage, or elsewhere, those errors are tracked so that the operation can
be retried later. While a given shard has errors that need a retry, the radosgw-admin sync status command reports that shard as
recovering. This recovery happens automatically, so the operator does not need to intervene.

If the sync status output reports that log shards are behind, run the following command, replacing X with the shard ID.

Syntax



radosgw-admin data sync status --shard-id=X --source-zone=ZONE_NAME

Example

[ceph: root@host01 /]# radosgw-admin data sync status --shard-id=27 --source-zone=us-east


{
"shard_id": 27,
"marker": {
"status": "incremental-sync",
"marker": "1_1534494893.816775_131867195.1",
"next_step_marker": "",
"total_entries": 1,
"pos": 0,
"timestamp": "0.000000"
},
"pending_buckets": [],
"recovering_buckets": [
"pro-registry:4ed07bb2-a80b-4c69-aa15-fdc17ae6f5f2.314303.1:26"
]
}

The output lists which buckets are next to sync and which buckets, if any, are going to be retried due to previous errors.

Inspect the status of individual buckets with the following command, replacing X with the bucket ID.

Syntax

radosgw-admin bucket sync status --bucket=X

Replace X with the ID number of the bucket.
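
For example, assuming a bucket named mybucket (the bucket name shown here is only illustrative):

Example

[ceph: root@host01 /]# radosgw-admin bucket sync status --bucket=mybucket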

The result shows which bucket index log shards are behind their source zone.

A common error in sync is EBUSY, which means that the sync is already in progress, often on another gateway. Errors written to the
sync error log can be read with the following command:

radosgw-admin sync error list

The syncing process will try again until it is successful. Errors can still occur that can require intervention.

Performance counters for multi-site Ceph Object Gateway data sync

Performance counters for multi-site Ceph Object Gateway data sync


Edit online
The following performance counters are available for multi-site configurations of the Ceph Object Gateway to measure data sync:

poll_latency measures the latency of requests for remote replication logs.

fetch_bytes measures the number of objects and bytes fetched by data sync.

Use the ceph --admin-daemon command to view the current metric data for the performance counters:

Syntax

ceph --admin-daemon /var/run/ceph/cluster-id/ceph-client.rgw.RGW_ID.asok perf dump data-sync-from-ZONE_NAME

Example

[ceph: root@host01 /]# ceph --admin-daemon /var/run/ceph/cluster-id/ceph-client.rgw.host02-rgw0.103.94309060818504.asok perf dump data-sync-from-us-west

{
"data-sync-from-us-west": {
"fetch bytes": {
"avgcount": 54,
"sum": 54526039885



},
"fetch not modified": 7,
"fetch errors": 0,
"poll latency": {
"avgcount": 41,
"sum": 2.533653367,
"avgtime": 0.061796423
},
"poll errors": 0
}
}

NOTE: You must run the ceph --admin-daemon command from the node running the daemon.

Reference

See Ceph performance counters in the IBM Storage Ceph Administration Guide for more information about performance
counters.

Synchronizing data in a multi-site Ceph Object Gateway configuration


Edit online
In a multi-site Ceph Object Gateway configuration of a storage cluster, failover and failback causes data synchronization to stop. The
radosgw-admin sync status command reports that the data sync is behind for an extended period of time.

You can run the radosgw-admin data sync init command to synchronize data between the sites and then restart the Ceph
Object Gateway. This command does not touch any actual object data and initiates data sync for a specified source zone. It causes
the zone to restart a full sync from the source zone.

IMPORTANT: Contact IBM support before running the data sync init command to avoid data loss. If you are performing a full
restart of sync and a large amount of data needs to be synchronized from the source zone, the bandwidth consumption is high and
you must plan accordingly.

NOTE: If a user accidentally deletes a bucket on the secondary site, you can use the metadata sync init command on the site to
synchronize data.
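
For example, such a metadata resynchronization could be initiated on the affected site and followed by a restart of its Ceph Object Gateway daemons; the service name shown here is only illustrative:

[ceph: root@host04 /]# radosgw-admin metadata sync init

[ceph: root@host04 /]# ceph orch restart rgw.myrgw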

Prerequisites

A running IBM Storage Ceph cluster.

Ceph Object Gateway configured at a minimum of two sites.

Procedure

1. Check the sync status between the sites:

Example

[ceph: root@host04 /]# radosgw-admin sync status


realm d713eec8-6ec4-4f71-9eaf-379be18e551b (india)
zonegroup ccf9e0b2-df95-4e0a-8933-3b17b64c52b7 (shared)
zone 04daab24-5bbd-4c17-9cf5-b1981fd7ff79 (primary)
current time 2022-09-15T06:53:52Z
zonegroup features enabled: resharding
metadata sync no sync (zone is master)
data sync source: 596319d2-4ffe-4977-ace1-8dd1790db9fb (secondary)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source

2. Synchronize data from the secondary zone:

Example

[ceph: root@host04 /]# radosgw-admin data sync init --source-zone primary



3. Restart all the Ceph Object Gateway daemons at the site:

Example

[ceph: root@host04 /]# ceph orch restart rgw.myrgw

Troubleshooting Ceph placement groups


Edit online
This section contains information about fixing the most common errors related to the Ceph Placement Groups (PGs).

Prerequisites

Verify your network connection.

Ensure that Monitors are able to form a quorum.

Ensure that all healthy OSDs are up and in, and the backfilling and recovery processes are finished.

Most common Ceph placement groups errors


Listing placement groups stuck in stale, inactive, or unclean state
Listing placement group inconsistencies
Repairing inconsistent placement groups
Increasing the placement group

Most common Ceph placement groups errors


Edit online
The following sub-chapters contain tables which list the most common error messages that are returned by the ceph health
detail command. The table provides links to corresponding sections that explain the errors and point to specific procedures to fix
the problems.

In addition, you can list placement groups that are stuck in a state that is not optimal.

Reference

See Listing placement groups stuck in stale, inactive, or unclean state for details.

Prerequisites

A running IBM Storage Ceph cluster.

A running Ceph Object Gateway.

Placement group error messages


Stale placement groups
Inconsistent placement groups
Unclean placement groups
Inactive placement groups
Placement groups are down
Unfound objects

Placement group error messages


Edit online
A table of common placement group error messages, and a potential fix.

Error message        See

HEALTH_ERR
  pgs down           Placement groups are down
  pgs inconsistent   Inconsistent placement groups
  scrub errors       Inconsistent placement groups

HEALTH_WARN
  pgs stale          Stale placement groups
  unfound            Unfound objects

Stale placement groups


Edit online
The ceph health command lists some Placement Groups (PGs) as stale:

HEALTH_WARN 24 pgs stale; 3/300 in osds are down

What This Means

The Monitor marks a placement group as stale when it does not receive any status update from the primary OSD of the placement
group’s acting set, or when other OSDs report that the primary OSD is down.

Usually, PGs enter the stale state after you start the storage cluster and until the peering process completes. However, when the
PGs remain stale for longer than expected, it might indicate that the primary OSD for those PGs is down or not reporting PG
statistics to the Monitor. When the primary OSD storing stale PGs is back up, Ceph starts to recover the PGs.

The mon_osd_report_timeout setting determines how often OSDs report PGs statistics to Monitors. By default, this parameter is
set to 0.5, which means that OSDs report the statistics every half a second.

To Troubleshoot This Problem

1. Identify which PGs are stale and on what OSDs they are stored. The error message includes information similar to the
following example:

Example

[ceph: root@host01 /]# ceph health detail


HEALTH_WARN 24 pgs stale; 3/300 in osds are down
...
pg 2.5 is stuck stale+active+remapped, last acting [2,0]
...
osd.10 is down since epoch 23, last address 192.168.106.220:6800/11080
osd.11 is down since epoch 13, last address 192.168.106.220:6803/11539
osd.12 is down since epoch 24, last address 192.168.106.220:6806/11861

2. Troubleshoot any problems with the OSDs that are marked as down.

Reference

See the Monitoring Placement Group Sets section in the Administration Guide for IBM Storage Ceph 5.3

See Down OSDs for details.

Inconsistent placement groups


Edit online
Some placement groups are marked as active+clean+inconsistent and the ceph health detail command returns an error
message similar to the following one:

HEALTH_ERR 1 pgs inconsistent; 2 scrub errors


pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors

What This Means



When Ceph detects inconsistencies in one or more replicas of an object in a placement group, it marks the placement group as
inconsistent. The most common inconsistencies are:

Objects have an incorrect size.

Objects are missing from one replica after a recovery finished.

In most cases, errors during scrubbing cause inconsistency within placement groups.

1. Log in to the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Determine which placement group is in the inconsistent state:

[ceph: root@host01 /]# ceph health detail


HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors

3. Determine why the placement group is inconsistent.

a. Start the deep scrubbing process on the placement group:

Syntax

ceph pg deep-scrub ID

Replace ID with the ID of the inconsistent placement group, for example:

[ceph: root@host01 /]# ceph pg deep-scrub 0.6


instructing pg 0.6 on osd.0 to deep-scrub

b. Search the output of the ceph -w for any messages related to that placement group:

Syntax

ceph -w | grep ID

Replace ID with the ID of the inconsistent placement group, for example:

[ceph: root@host01 /]# ceph -w | grep 0.6


2022-05-26 01:35:36.778215 osd.106 [ERR] 0.6 deep-scrub stat mismatch, got 636/635
objects, 0/0 clones, 0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,
1855455/1854371 bytes.
2022-05-26 01:35:36.788334 osd.106 [ERR] 0.6 deep-scrub 1 errors

4. If the output includes any error messages similar to the following ones, you can repair the inconsistent placement group.

Syntax

PG.ID shard OSD: soid OBJECT missing attr , missing attr ATTRIBUTE_TYPE
PG.ID shard OSD: soid OBJECT digest 0 != known digest DIGEST, size 0 != known size SIZE
PG.ID shard OSD: soid OBJECT size 0 != known size SIZE
PG.ID deep-scrub stat mismatch, got MISMATCH
PG.ID shard OSD: soid OBJECT candidate had a read error, digest 0 != known digest DIGEST

5. If the output includes any error messages similar to the following ones, it is not safe to repair the inconsistent placement
group because you can lose data. Open a support ticket in this situation.

PG.ID shard OSD: soid OBJECT digest DIGEST != known digest DIGEST
PG.ID shard OSD: soid OBJECT omap_digest DIGEST != known omap_digest DIGEST

Reference

See Listing placement group inconsistencies section in the IBM Storage Ceph Troubleshooting Guide.

See the Ceph data integrity section in the IBM Storage Ceph Architecture Guide.

See Scrubbing the OSD section in the IBM Storage Ceph Configuration Guide.

See Repairing inconsistent placement groups for details.



Unclean placement groups
Edit online
The ceph health command returns an error message similar to the following one:

HEALTH_WARN 197 pgs stuck unclean

What This Means

Ceph marks a placement group as unclean if it has not achieved the active+clean state for the number of seconds specified in
the mon_pg_stuck_threshold parameter in the Ceph configuration file. The default value of mon_pg_stuck_threshold is 300
seconds.

If a placement group is unclean, it contains objects that are not replicated the number of times specified in the
osd_pool_default_size parameter. The default value of osd_pool_default_size is 3, which means that Ceph creates three
replicas.
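
You can confirm the replication size that is configured for a specific pool, which can differ from the default, for example (the pool name shown here is only illustrative):

[ceph: root@host01 /]# ceph osd pool get data size

size: 3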

Usually, unclean placement groups indicate that some OSDs might be down.

1. Determine which OSDs are down:

[ceph: root@host01 /]# ceph osd tree

2. Troubleshoot and fix any problems with the OSDs. See Down OSDs for details.

Reference

Listing placement groups stuck in stale, inactive, or unclean state

Inactive placement groups


Edit online
The ceph health command returns an error message similar to the following one:

HEALTH_WARN 197 pgs stuck inactive

What This Means

Ceph marks a placement group as inactive if it has not been active for the number of seconds specified in the
mon_pg_stuck_threshold parameter in the Ceph configuration file. The default value of mon_pg_stuck_threshold is 300
seconds.

Usually, inactive placement groups indicate that some OSDs might be down.

1. Determine which OSDs are down:

[ceph: root@host01 /]# ceph osd tree

2. Troubleshoot and fix any problems with the OSDs.

Reference

See Listing placement groups stuck in stale, inactive, or unclean state for details.

See Down OSDs for details.

Placement groups are down


Edit online
The ceph health detail command reports that some placement groups are down:



HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean;
114/3300 degraded (3.455%); 1/3 in osds are down
...
pg 0.5 is down+peering
pg 1.4 is down+peering
...
osd.1 is down since epoch 69, last address 192.168.106.220:6801/8651

What This Means

In certain cases, the peering process can be blocked, which prevents a placement group from becoming active and usable. Usually, a
failure of an OSD causes the peering failures.

To Troubleshoot This Problem

Determine what blocks the peering process:

Syntax

ceph pg ID query

Replace ID with the ID of the placement group that is down:

Example

[ceph: root@host01 /]# ceph pg 0.5 query

{ "state": "down+peering",
...
"recovery_state": [
{ "name": "Started\/Primary\/Peering\/GetInfo",
"enter_time": "2021-08-06 14:40:16.169679",
"requested_info_from": []},
{ "name": "Started\/Primary\/Peering",
"enter_time": "2021-08-06 14:40:16.169659",
"probing_osds": [
0,
1],
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
1],
"peering_blocked_by": [
{ "osd": 1,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"}]},
{ "name": "Started",
"enter_time": "2021-08-06 14:40:16.169513"}
]
}

The recovery_state section includes information on why the peering process is blocked.

If the output includes the peering is blocked due to down osds error message, see Down OSDs.

If you see any other error message, open a support ticket. See Contacting IBM Support for service for details.

Reference

See Ceph OSD peering section in the IBM Storage Ceph Administration Guide.

Unfound objects
Edit online
The ceph health command returns an error message similar to the following one, containing the unfound keyword:

HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%)

What this means



Ceph marks objects as unfound when it knows these objects or their newer copies exist but it is unable to find them. As a
consequence, Ceph cannot recover such objects and proceed with the recovery process.

An example situation

A placement group stores data on osd.1 and osd.2.

1. osd.1 goes down.

2. osd.2 handles some write operations.

3. osd.1 comes up.

4. A peering process between osd.1 and osd.2 starts, and the objects missing on osd.1 are queued for recovery.

5. Before Ceph copies new objects, osd.2 goes down.

As a result, osd.1 knows that these objects exist, but there is no OSD that has a copy of the objects.

In this scenario, Ceph is waiting for the failed node to be accessible again, and the unfound objects block the recovery process.

To troubleshoot this problem

1. Log in to the Cephadm shell:

Example

[root@host01 ~]# cephadm shell

2. Determine which placement group contains unfound objects:

[ceph: root@host01 /]# ceph health detail


HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; recovery 5/937611 objects degraded
(0.001%); 1/312537 unfound (0.000%)
pg 3.8a5 is stuck unclean for 803946.712780, current state active+recovering, last acting
[320,248,0]
pg 3.8a5 is active+recovering, acting [320,248,0], 1 unfound
recovery 5/937611 objects degraded (0.001%); 1/312537 unfound (0.000%)

3. List more information about the placement group:

Syntax

ceph pg ID query

Replace ID with the ID of the placement group containing the unfound objects:

Example

[ceph: root@host01 /]# ceph pg 3.8a5 query


{ "state": "active+recovering",
"epoch": 10741,
"up": [
320,
248,
0],
"acting": [
320,
248,
0],
<snip>
"recovery_state": [
{ "name": "Started\/Primary\/Active",
"enter_time": "2021-08-28 19:30:12.058136",
"might_have_unfound": [
{ "osd": "0",
"status": "already probed"},
{ "osd": "248",
"status": "already probed"},
{ "osd": "301",
"status": "already probed"},
{ "osd": "362",
"status": "already probed"},
{ "osd": "395",



"status": "already probed"},
{ "osd": "429",
"status": "osd is down"}],
"recovery_progress": { "backfill_targets": [],
"waiting_on_backfill": [],
"last_backfill_started": "0\/\/0\/\/-1",
"backfill_info": { "begin": "0\/\/0\/\/-1",
"end": "0\/\/0\/\/-1",
"objects": []},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": { "pull_from_peer": [],
"pushing": []}},
"scrub": { "scrubber.epoch_start": "0",
"scrubber.active": 0,
"scrubber.block_writes": 0,
"scrubber.finalizing": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []}},
{ "name": "Started",
"enter_time": "2021-08-28 19:30:11.044020"}],

The might_have_unfound section includes OSDs where Ceph tried to locate the unfound objects:

The already probed status indicates that Ceph cannot locate the unfound objects in that OSD.

The osd is down status indicates that Ceph cannot contact that OSD.

4. Troubleshoot the OSDs that are marked as down. See Down OSDs for details.

5. If you are unable to fix the problem that causes the OSD to be down, open a support ticket. See IBM support for details.

Listing placement groups stuck in stale, inactive, or unclean state


Edit online
After a failure, placement groups enter states like degraded or peering. These states indicate normal progression through the
failure recovery process.

However, if a placement group stays in one of these states for a longer time than expected, it can be an indication of a larger
problem. The Monitors report when placement groups get stuck in a state that is not optimal.

The mon_pg_stuck_threshold option in the Ceph configuration file determines the number of seconds after which placement
groups are considered inactive, unclean, or stale.
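
If needed, you can check the threshold value currently in effect, for example:

[ceph: root@host01 /]# ceph config get mon mon_pg_stuck_threshold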

The following list describes these states, their most common causes, and where to find more information.

inactive
The PG has not been able to service read/write requests. Most common causes: peering problems. See Inactive placement groups.

unclean
The PG contains objects that are not replicated the desired number of times. Something is preventing the PG from recovering. Most
common causes: unfound objects, OSDs are down, incorrect configuration. See Unclean placement groups.

stale
The status of the PG has not been updated by a ceph-osd daemon. Most common causes: OSDs are down. See Stale placement groups.

Prerequisites

A running IBM Storage Ceph cluster.

Root-level access to the node.

Procedure

1. Log into the Cephadm shell:



Example

[root@host01 ~]# cephadm shell

2. List the stuck PGs:

Example

[ceph: root@host01 /]# ceph pg dump_stuck inactive


[ceph: root@host01 /]# ceph pg dump_stuck unclean
[ceph: root@host01 /]# ceph pg dump_stuck stale

Reference

See Placement Group States section in the IBM Storage Ceph Administration Guide.

Listing placement group inconsistencies


Edit online
Use the rados utility to list inconsistencies in various replicas of objects. Use the --format=json-pretty option to list a more
detailed output.

This section covers the listing of:

Inconsistent placement group in a pool

Inconsistent objects in a placement group

Inconsistent snapshot sets in a placement group

Prerequisites

A running IBM Storage Ceph cluster in a healthy state.

Root-level access to the node.

Procedure

1. List all the inconsistent placement groups in a pool:

Syntax

rados list-inconsistent-pg POOL --format=json-pretty

Example

[ceph: root@host01 /]# rados list-inconsistent-pg data --format=json-pretty


[0.6]

2. List inconsistent objects in a placement group with ID:

Syntax

rados list-inconsistent-obj PLACEMENT_GROUP_ID

Example

[ceph: root@host01 /]# rados list-inconsistent-obj 0.6


{
"epoch": 14,
"inconsistents": [
{
"object": {
"name": "image1",
"nspace": "",
"locator": "",
"snap": "head",
"version": 1
},
"errors": [



"data_digest_mismatch",
"size_mismatch"
],
"union_shard_errors": [
"data_digest_mismatch_oi",
"size_mismatch_oi"
],
"selected_object_info": "0:602f83fe:::foo:head(16'1 client.4110.0:1
dirty|data_digest|omap_digest s 968 uv 1 dd e978e67f od ffffffff alloc_hint [0 0 0])",
"shards": [
{
"osd": 0,
"errors": [],
"size": 968,
"omap_digest": "0xffffffff",
"data_digest": "0xe978e67f"
},
{
"osd": 1,
"errors": [],
"size": 968,
"omap_digest": "0xffffffff",
"data_digest": "0xe978e67f"
},
{
"osd": 2,
"errors": [
"data_digest_mismatch_oi",
"size_mismatch_oi"
],
"size": 0,
"omap_digest": "0xffffffff",
"data_digest": "0xffffffff"
}
]
}
]
}

The following fields are important to determine what causes the inconsistency:

name
The name of the object with inconsistent replicas.

nspace
The namespace that is a logical separation of a pool. It’s empty by default.

locator
The key that is used as the alternative of the object name for placement.

snap
The snapshot ID of the object. The only writable version of the object is called head. If an object is a clone, this field includes
its sequential ID.

version
The version ID of the object with inconsistent replicas. Each write operation to an object increments it.

errors
A list of errors that indicate inconsistencies between shards without determining which shard or shards are incorrect. See the
shard array to further investigate the errors.

data_digest_mismatch
The digest of the replica read from one OSD is different from the other OSDs.

size_mismatch
The size of a clone or the head object does not match the expectation.

read_error
This error indicates inconsistencies caused most likely by disk errors.

union_shard_error
The union of all errors specific to shards. These errors are connected to a faulty shard. The errors that end with oi indicate
that you have to compare the information from a faulty object to information with selected objects. See the shard array to
further investigate the errors.



In the above example, the object replica stored on osd.2 has a different digest than the replicas stored on osd.0 and osd.1.
Specifically, the digest of the replica is not 0xffffffff as calculated from the shard read from osd.2, but 0xe978e67f. In
addition, the size of the replica read from osd.2 is 0, while the size reported by osd.0 and osd.1 is 968.

3. List inconsistent sets of snapshots:

Syntax

rados list-inconsistent-snapset PLACEMENT_GROUP_ID

Example

[ceph: root@host01 /]# rados list-inconsistent-snapset 0.23 --format=json-pretty


{
"epoch": 64,
"inconsistents": [
{
"name": "obj5",
"nspace": "",
"locator": "",
"snap": "0x00000001",
"headless": true
},
{
"name": "obj5",
"nspace": "",
"locator": "",
"snap": "0x00000002",
"headless": true
},
{
"name": "obj5",
"nspace": "",
"locator": "",
"snap": "head",
"ss_attr_missing": true,
"extra_clones": true,
"extra clones": [
2,
1
]
}
]

The command returns the following errors:

ss_attr_missing
One or more attributes are missing. Attributes are information about snapshots encoded into a snapshot set as a list of key-
value pairs.

ss_attr_corrupted
One or more attributes fail to decode.

clone_missing
A clone is missing.

snapset_mismatch
The snapshot set is inconsistent by itself.

head_mismatch
The snapshot set indicates that head exists or not, but the scrub results report otherwise.

headless
The head of the snapshot set is missing.

size_mismatch
The size of a clone or the head object does not match the expectation.

Reference

Inconsistent placement groups.

Repairing inconsistent placement groups.



Repairing inconsistent placement groups
Edit online
Due to an error during deep scrubbing, some placement groups can include inconsistencies. Ceph reports such placement groups as
inconsistent:

HEALTH_ERR 1 pgs inconsistent; 2 scrub errors


pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors

WARNING: You can repair only certain inconsistencies.

Do not repair the placement groups if the Ceph logs include the following errors:

PG.ID shard OSD: soid OBJECT digest DIGEST != known digest DIGEST
PG.ID shard OSD: soid OBJECT omap_digest DIGEST != known omap_digest DIGEST

Open a support ticket instead. See Contacting IBM Support for service for details.

Prerequisites

Root-level access to the Ceph Monitor node.

Procedure

Repair the inconsistent placement groups:

Syntax

ceph pg repair ID

Replace ID with the ID of the inconsistent placement group.
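
For example, to repair the inconsistent placement group 0.6 shown earlier in this chapter:

Example

[ceph: root@host01 /]# ceph pg repair 0.6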

Reference

Inconsistent placement groups.

Listing placement group inconsistencies.

Increasing the placement group


Edit online
Insufficient Placement Group (PG) count impacts the performance of the Ceph cluster and data distribution. It is one of the main
causes of the nearfull osds error messages.

The recommended ratio is between 100 and 300 PGs per OSD. This ratio can decrease when you add more OSDs to the cluster.

The pg_num and pgp_num parameters determine the PG count. These parameters are configured per pool, and therefore, you
must adjust each pool with a low PG count separately.
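
To review the current values for a given pool before changing them, you can run, for example (the pool name shown here is only illustrative):

[ceph: root@host01 /]# ceph osd pool get data pg_num

[ceph: root@host01 /]# ceph osd pool get data pgp_num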

IMPORTANT: Increasing the PG count is the most intensive process that you can perform on a Ceph cluster. This process might have
a serious performance impact if not done in a slow and methodical way. Once you increase pgp_num, you will not be able to stop or
reverse the process and you must complete it. Consider increasing the PG count outside of business critical processing time
allocation, and alert all clients about the potential performance impact. Do not change the PG count if the cluster is in the
HEALTH_ERR state.

Prerequisites

A running IBM Storage Ceph cluster in a healthy state.

Root-level access to the node.

Procedure

1. Reduce the impact of data redistribution and recovery on individual OSDs and OSD hosts:



a. Lower the value of the osd_max_backfills, osd_recovery_max_active, and osd_recovery_op_priority
parameters:

[ceph: root@host01 /]# ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'

b. Disable the shallow and deep scrubbing:

[ceph: root@host01 /]# ceph osd set noscrub


[ceph: root@host01 /]# ceph osd set nodeep-scrub

2. Use the Ceph Placement Groups (PGs) per Pool Calculator to calculate the optimal value of the pg_num and pgp_num
parameters.

3. Increase the pg_num value in small increments until you reach the desired value.

a. Determine the starting increment value. Use a very low value that is a power of two, and increase it when you
determine the impact on the cluster. The optimal value depends on the pool size, OSD count, and client I/O load.

b. Increment the pg_num value:

Syntax

ceph osd pool set POOL pg_num VALUE

Specify the pool name and the new value, for example:

Example

[ceph: root@host01 /]# ceph osd pool set data pg_num 4

c. Monitor the status of the cluster:

Example

[ceph: root@host01 /]# ceph -s

The PGs state will change from creating to active+clean. Wait until all PGs are in the active+clean state.

4. Increase the pgp_num value in small increments until you reach the desired value:

a. Determine the starting increment value. Use a very low value that is a power of two, and increase it when you
determine the impact on the cluster. The optimal value depends on the pool size, OSD count, and client I/O load.

b. Increment the pgp_num value:

Syntax

ceph osd pool set POOL pgp_num VALUE

Specify the pool name and the new value, for example:

[ceph: root@host01 /]# ceph osd pool set data pgp_num 4

c. Monitor the status of the cluster:

[ceph: root@host01 /]# ceph -s

The PGs state will change through peering, wait_backfill, backfilling, recover, and others. Wait until all PGs
are in the active+clean state.

5. Repeat the previous steps for all pools with insufficient PG count.

6. Set osd_max_backfills, osd_recovery_max_active, and osd_recovery_op_priority to their default values:

[ceph: root@host01 /]# ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 3 --osd_recovery_op_priority 3'

7. Enable the shallow and deep scrubbing:

[ceph: root@host01 /]# ceph osd unset noscrub


[ceph: root@host01 /]# ceph osd unset nodeep-scrub

Reference



See Nearfull OSDs.

See Monitoring Placement Group Sets section in the IBM Storage Ceph Administration Guide.

Troubleshooting Ceph objects


Edit online
As a storage administrator, you can use the ceph-objectstore-tool utility to perform high-level or low-level object operations.
The ceph-objectstore-tool utility can help you troubleshoot problems related to objects within a particular OSD or placement
group.

IMPORTANT: Manipulating objects can cause unrecoverable data loss. Contact IBM support before using the ceph-objectstore-
tool utility.

Prerequisites

Verify there are no network-related issues.

Troubleshooting high-level object operations


Troubleshooting low-level object operations

Troubleshooting high-level object operations


Edit online
As a storage administrator, you can use the ceph-objectstore-tool utility to perform high-level object operations. The ceph-
objectstore-tool utility supports the following high-level object operations:

List objects

List lost objects

Fix lost objects

IMPORTANT: Manipulating objects can cause unrecoverable data loss. Contact IBM support before using the ceph-objectstore-
tool utility.

Prerequisites

Root-level access to the Ceph OSD nodes.

Listing objects
Fixing lost objects

Listing objects
Edit online
The OSD can contain zero to many placement groups, and zero to many objects within a placement group (PG). The ceph-
objectstore-tool utility allows you to list objects stored within an OSD.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Verify the appropriate OSD is down:

Syntax



systemctl status ceph-osd@OSD_NUMBER

Example

[root@host01 ~]# systemctl status ceph-osd@1

2. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

3. Identify all the objects within an OSD, regardless of their placement group:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --op list

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list

4. Identify all the objects within a placement group:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --pgid PG_ID --op list

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c --op list

5. Identify the PG an object belongs to:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --op list OBJECT_ID

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list default.region

Fixing lost objects


Edit online
You can use the ceph-objectstore-tool utility to list and fix lost and unfound objects stored within a Ceph OSD. This procedure
applies only to legacy objects.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Verify the appropriate OSD is down:

Syntax

systemctl status ceph-osd@OSD_NUMBER

Example

[root@host01 ~]# systemctl status ceph-osd@1



2. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

3. To list all the lost legacy objects:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --op fix-lost --dry-run

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op fix-lost --dry-run

4. Use the ceph-objectstore-tool utility to fix lost and unfound objects. Select the appropriate circumstance:

a. To fix all lost objects:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --op fix-lost

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op fix-lost

b. To fix all the lost objects within a placement group:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --pgid PG_ID --op fix-lost

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c --op fix-lost

c. To fix a lost object by its identifier:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --op fix-lost OBJECT_ID

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op fix-lost default.region

Troubleshooting low-level object operations


Edit online
As a storage administrator, you can use the ceph-objectstore-tool utility to perform low-level object operations. The ceph-
objectstore-tool utility supports the following low-level object operations:

Manipulate the object's content

Remove an object

List the object map (OMAP)

Manipulate the OMAP header



Manipulate the OMAP key

List the object's attributes

Manipulate the object's attribute key

IMPORTANT: Manipulating objects can cause unrecoverable data loss. Contact IBM support before using the ceph-objectstore-
tool utility.

Prerequisites

Root-level access to the Ceph OSD nodes.

Manipulating the object’s content


Removing an object
Listing the object map
Manipulating the object map header
Manipulating the object map key
Listing the object’s attributes
Manipulating the object attribute key

Manipulating the object’s content


Edit online
With the ceph-objectstore-tool utility, you can get or set bytes on an object.

IMPORTANT: Setting the bytes on an object can cause unrecoverable data loss. To prevent data loss, make a backup copy of the
object.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Verify the appropriate OSD is down:

Syntax

systemctl status ceph-osd@OSD_ID

Example

[root@host01 ~]# systemctl status ceph-osd@1

2. Find the object by listing the objects of the OSD or placement group (PG).

3. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

4. Before setting the bytes on an object, make a backup and a working copy of the object:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --pgid PG_ID \
   OBJECT \
   get-bytes > OBJECT_FILE_NAME

Example



[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   get-bytes > zone_info.default.backup

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   get-bytes > zone_info.default.working-copy

5. Edit the working copy object file and modify the object contents accordingly.

6. Set the bytes of the object:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --pgid PG_ID \
   OBJECT \
   set-bytes < OBJECT_FILE_NAME

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   set-bytes < zone_info.default.working-copy

Removing an object
Edit online
Use the ceph-objectstore-tool utility to remove an object. By removing an object, its contents and references are removed
from the placement group (PG).

IMPORTANT: You cannot recreate an object once it is removed.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

2. Remove an object:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --pgid PG_ID \
   OBJECT \
   remove

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   remove

Listing the object map


Edit online
Use the ceph-objectstore-tool utility to list the contents of the object map (OMAP). The output provides you with a list of keys.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Verify the appropriate OSD is down:

systemctl status ceph-osd@OSD_ID

Example

[root@host01 ~]# systemctl status ceph-osd@1

2. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

3. List the object map:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD --pgid PG_ID OBJECT list-omap

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   list-omap

Manipulating the object map header


Edit online
The ceph-objectstore-tool utility outputs the object map (OMAP) header with the values associated with the object’s keys.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Verify the appropriate OSD is down:

Syntax

systemctl status ceph-osd@OSD_ID



Example

[root@host01 ~]# systemctl status ceph-osd@1

2. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

3. Get the object map header:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   get-omaphdr > OBJECT_MAP_FILE_NAME

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   get-omaphdr > zone_info.default.omaphdr.txt

4. Set the object map header:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   set-omaphdr < OBJECT_MAP_FILE_NAME

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   set-omaphdr < zone_info.default.omaphdr.txt

Manipulating the object map key


Edit online
Use the ceph-objectstore-tool utility to change the object map (OMAP) key. You need to provide the data path, the placement
group identifier (PG ID), the object, and the key in the OMAP.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0



2. Get the object map key:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   get-omap KEY > OBJECT_MAP_FILE_NAME

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   get-omap "" > zone_info.default.omap.txt

3. Set the object map key:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   set-omap KEY < OBJECT_MAP_FILE_NAME

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   set-omap "" < zone_info.default.omap.txt

4. Remove the object map key:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   rm-omap KEY

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   rm-omap ""

Listing the object’s attributes


Edit online
Use the ceph-objectstore-tool utility to list an object’s attributes. The output provides you with the object’s keys and values.

Prerequisites

Root-level access to the Ceph OSD node.

Stopping the ceph-osd daemon.

Procedure

1. Verify the appropriate OSD is down:

Syntax

systemctl status ceph-osd@OSD_ID

Example

[root@host01 ~]# systemctl status ceph-osd@1



2. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

3. List the object’s attributes:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   list-attrs

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   list-attrs

Manipulating the object attribute key


Edit online
Use the ceph-objectstore-tool utility to change an object’s attributes. To manipulate the object’s attributes, you need the data
path, the placement group identifier (PG ID), the object, and the key in the object’s attribute.

Prerequisites

Root-level access to the Ceph OSD node.

Stop the ceph-osd daemon.

Procedure

1. Verify the appropriate OSD is down:

Syntax

systemctl status ceph-osd@OSD_ID

Example

[root@host01 ~]# systemctl status ceph-osd@1

2. Log in to the OSD container:

Syntax

cephadm shell --name osd.OSD_ID

Example

[root@host01 ~]# cephadm shell --name osd.0

3. Get the object’s attributes:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   get-attr KEY > OBJECT_ATTRS_FILE_NAME

Example



[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   get-attr "oid" > zone_info.default.attr.txt

4. Set an object’s attributes:

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   set-attr KEY < OBJECT_ATTRS_FILE_NAME

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   set-attr "oid" < zone_info.default.attr.txt

5. Remove an object’s attributes.

Syntax

ceph-objectstore-tool --data-path PATH_TO_OSD \
   --pgid PG_ID OBJECT \
   rm-attr KEY

Example

[ceph: root@host01 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
   --pgid 0.1c \
   '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
   rm-attr "oid"

Troubleshooting clusters in stretch mode


Edit online
You can replace and remove the failed tiebreaker monitors. You can also force the cluster into the recovery or healthy mode if
needed.

Replacing the tiebreaker with a monitor in quorum


Replacing the tiebreaker with a new monitor
Forcing stretch cluster into recovery or healthy mode

Replacing the tiebreaker with a monitor in quorum


Edit online
If your tiebreaker monitor fails, you can replace it with an existing monitor in quorum and remove it from the cluster.

Prerequisites

A running IBM Storage Ceph cluster.

Stretch mode is enabled on a cluster.

Procedure

1. Disable automated monitor deployment:

Example



[ceph: root@host01 /]# ceph orch apply mon --unmanaged

Scheduled mon update…

2. View the monitors in quorum:

Example

[ceph: root@host01 /]# ceph -s

mon: 5 daemons, quorum host01, host02, host04, host05 (age 30s), out of quorum: host07

3. Set the monitor in quorum as a new tiebreaker:

Syntax

ceph mon set_new_tiebreaker NEW_HOST

Example

[ceph: root@host01 /]# ceph mon set_new_tiebreaker host02

Important: You get an error message if the monitor is in the same location as existing non-tiebreaker monitors:

Example

[ceph: root@host01 /]# ceph mon set_new_tiebreaker host02

Error EINVAL: mon.host02 has location DC1, which matches mons host02 on the datacenter
dividing bucket for stretch mode.

If that happens, change the location of the monitor:

Syntax

ceph mon set_location HOST datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph mon set_location host02 datacenter=DC3

4. Remove the failed tiebreaker monitor:

Syntax

ceph orch daemon rm FAILED_TIEBREAKER_MONITOR --force

Example

[ceph: root@host01 /]# ceph orch daemon rm mon.host07 --force

Removed mon.host07 from host 'host07'

5. Once the monitor is removed from the host, redeploy the monitor:

Syntax

ceph mon add HOST IP_ADDRESS datacenter=DATACENTER


ceph orch daemon add mon HOST

Example

[ceph: root@host01 /]# ceph mon add host07 213.222.226.50 datacenter=DC1


[ceph: root@host01 /]# ceph orch daemon add mon host07

6. Ensure there are five monitors in quorum:

Example

[ceph: root@host01 /]# ceph -s

mon: 5 daemons, quorum host01, host02, host04, host05, host07 (age 15s)

7. Verify that everything is configured properly:



Example

[ceph: root@host01 /]# ceph mon dump

epoch 19
fsid 1234ab78-1234-11ed-b1b1-de456ef0a89d
last_changed 2023-01-17T04:12:05.709475+0000
created 2023-01-16T05:47:25.631684+0000
min_mon_release 16 (pacific)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host02
disallowed_leaders host02
0: [v2:132.224.169.63:3300/0,v1:132.224.169.63:6789/0] mon.host02; crush_location
{datacenter=DC3}
1: [v2:220.141.179.34:3300/0,v1:220.141.179.34:6789/0] mon.host04; crush_location
{datacenter=DC2}
2: [v2:40.90.220.224:3300/0,v1:40.90.220.224:6789/0] mon.host01; crush_location
{datacenter=DC1}
3: [v2:60.140.141.144:3300/0,v1:60.140.141.144:6789/0] mon.host07; crush_location
{datacenter=DC1}
4: [v2:186.184.61.92:3300/0,v1:186.184.61.92:6789/0] mon.host03; crush_location
{datacenter=DC2}
dumped monmap epoch 19

8. Redeploy the monitors:

Syntax

ceph orch apply mon --placement="HOST_1, HOST_2, HOST_3, HOST_4, HOST_5"

Example

[ceph: root@host01 /]# ceph orch apply mon --placement="host01, host02, host04, host05,
host07"

Scheduled mon update...

Replacing the tiebreaker with a new monitor


Edit online
If your tiebreaker monitor fails, you can replace it with a new monitor and remove it from the cluster.

Prerequisites

A running IBM Storage Ceph cluster.

Stretch mode is enabled on a cluster.

Procedure

1. Add a new monitor to the cluster:

a. Manually add the crush_location to the new monitor:

Syntax

ceph mon add NEW_HOST IP_ADDRESS datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph mon add host06 213.222.226.50 datacenter=DC3

adding mon.host06 at [v2:213.222.226.50:3300/0,v1:213.222.226.50:6789/0]

Note: The new monitor has to be in a different location than existing non-tiebreaker monitors.

b. Disable automated monitor deployment:

Example



[ceph: root@host01 /]# ceph orch apply mon --unmanaged

Scheduled mon update…

c. Deploy the new monitor:

Syntax

ceph orch daemon add mon NEW_HOST

Example

[ceph: root@host01 /]# ceph orch daemon add mon host06

2. Ensure there are six monitors, of which five are in quorum:

Example

[ceph: root@host01 /]# ceph -s

mon: 6 daemons, quorum host01, host02, host04, host05, host06 (age 30s), out of quorum:
host07

3. Set the new monitor as a new tiebreaker:

Syntax

ceph mon set_new_tiebreaker NEW_HOST

Example

[ceph: root@host01 /]# ceph mon set_new_tiebreaker host06

4. Remove the failed tiebreaker monitor:

Syntax

ceph orch daemon rm FAILED_TIEBREAKER_MONITOR --force

Example

[ceph: root@host01 /]# ceph orch daemon rm mon.host07 --force

Removed mon.host07 from host 'host07'

5. Verify that everything is configured properly:

Example

[ceph: root@host01 /]# ceph mon dump

epoch 19
fsid 1234ab78-1234-11ed-b1b1-de456ef0a89d
last_changed 2023-01-17T04:12:05.709475+0000
created 2023-01-16T05:47:25.631684+0000
min_mon_release 16 (pacific)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host06
disallowed_leaders host06
0: [v2:213.222.226.50:3300/0,v1:213.222.226.50:6789/0] mon.host06; crush_location
{datacenter=DC3}
1: [v2:220.141.179.34:3300/0,v1:220.141.179.34:6789/0] mon.host04; crush_location
{datacenter=DC2}
2: [v2:40.90.220.224:3300/0,v1:40.90.220.224:6789/0] mon.host01; crush_location
{datacenter=DC1}
3: [v2:60.140.141.144:3300/0,v1:60.140.141.144:6789/0] mon.host02; crush_location
{datacenter=DC1}
4: [v2:186.184.61.92:3300/0,v1:186.184.61.92:6789/0] mon.host05; crush_location
{datacenter=DC2}
dumped monmap epoch 19

6. Redeploy the monitors:

Syntax



ceph orch apply mon --placement="HOST_1, HOST_2, HOST_3, HOST_4, HOST_5"

Example

[ceph: root@host01 /]# ceph orch apply mon --placement="host01, host02, host04, host05,
host06"

Scheduled mon update…

Forcing stretch cluster into recovery or healthy mode


When the cluster is in stretch degraded mode, it goes into recovery mode automatically after the disconnected data center comes back. If that does not happen, or if you want to enable recovery mode early, you can force the stretch cluster into recovery mode.

Prerequisites

A running IBM Storage Ceph cluster.

Stretch mode is enabled on the cluster.

Procedure

1. Force the stretch cluster into the recovery mode:

Example

[ceph: root@host01 /]# ceph osd force_recovery_stretch_mode --yes-i-really-mean-it

NOTE: The recovery state puts the cluster in the HEALTH_WARN state.

2. When in recovery mode, the cluster should go back into normal stretch mode after the placement groups are healthy. If that
does not happen, you can force the stretch cluster into the healthy mode:

Example

[ceph: root@host01 /]# ceph osd force_healthy_stretch_mode --yes-i-really-mean-it

NOTE: You can also run this command if you want to force the cross-data-center peering early and you are willing to risk data
downtime, or you have verified separately that all the placement groups can peer, even if they are not fully recovered. You
might also wish to invoke the healthy mode to remove the HEALTH_WARN state, which is generated by the recovery state.
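To check whether the recovery-related HEALTH_WARN is still being reported, you can review the health details. This check is illustrative and not part of the original procedure:

Example

[ceph: root@host01 /]# ceph health detail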

NOTE: The force_recovery_stretch_mode and force_healthy_stretch_mode commands should not normally be necessary, because they are included in the process of managing unanticipated situations.

Contacting IBM support for service


If the information in this guide did not help you to solve the problem, this chapter explains how to contact the IBM support service.

Prerequisites

IBM support account.

Providing information to IBM Support engineers


Generating readable core dump files

Providing information to IBM Support engineers





If you are unable to fix problems related to IBM Storage Ceph, contact the IBM Support service and provide enough information to help the support engineers troubleshoot the problem you encounter more quickly.

Prerequisites

Root-level access to the node.

IBM Support account.

Procedure

1. Open a support ticket on the IBM support portal.

2. Ideally, attach an sos report to the ticket. See the What is a sos report and how to create one in Red Hat Enterprise Linux? solution for details. An example of generating the report follows this procedure.

3. If the Ceph daemons fail with a segmentation fault, consider generating a human-readable core dump file. See Generating
readable core dump files for details.
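As referenced in step 2, the following is a minimal sketch of generating an sos report on the affected node. Depending on the installed sos version, the command is sos report or the older sosreport:

Example

[root@host01 ~]# sos report --batch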

Generating readable core dump files


When a Ceph daemon terminates unexpectedly with a segmentation fault, gather the information about its failure and provide it to
the IBM Support Engineers.

Such information speeds up the initial investigation. Also, the Support Engineers can compare the information from the core dump files with known IBM Storage Ceph cluster issues.

Prerequisites

Install the debuginfo packages if they are not installed already.

Enable the following repositories to install the required debuginfo packages.

RHEL 8:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-8.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-8.repo

RHEL 9:

[root@admin ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-5-rhel-9.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-5-rhel-9.repo

Once the repository is enabled, you can install the debuginfo packages that you need from the following list of supported packages. An example installation command follows the list:

ceph-base-debuginfo
ceph-common-debuginfo
ceph-debugsource
ceph-fuse-debuginfo
ceph-immutable-object-cache-debuginfo
ceph-mds-debuginfo
ceph-mgr-debuginfo
ceph-mon-debuginfo
ceph-osd-debuginfo
ceph-radosgw-debuginfo
cephfs-mirror-debuginfo
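For example, to install the debuginfo packages for a failed OSD daemon, you might run the following command; the package selection is illustrative and should be adjusted to the daemon that crashed:

Example

[root@host01 ~]# dnf install ceph-osd-debuginfo ceph-common-debuginfo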

Ensure that the gdb package is installed and if it is not, install it:

Example

[root@host01 ~]# dnf install gdb

Generating readable core dump files in containerized deployments



Generating readable core dump files in containerized deployments
You can generate a core dump file for IBM Storage Ceph 5 in two scenarios:

When a Ceph process terminates unexpectedly due to the SIGILL, SIGTRAP, SIGABRT, or SIGSEGV error.

Manually, for example, to debug issues such as a Ceph process consuming high CPU cycles or not responding.

Prerequisites

Root-level access to the container node running the Ceph containers.

Installation of the appropriate debugging packages.

Installation of the GNU Project Debugger (gdb) package.

Ensure that the host has at least 8 GB RAM. If there are multiple daemons on the host, IBM recommends more RAM.

Procedure

1. If a Ceph process terminates unexpectedly due to the SIGILL, SIGTRAP, SIGABRT, or SIGSEGV error:

a. Set the core pattern to the systemd-coredump service on the node where the container with the failed Ceph process is
running:

Example

[root@mon]# echo "| /usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" > /proc/sys/kernel/core_pattern
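Optionally, confirm that the pattern is in effect before waiting for the next failure. This check is illustrative and not part of the original procedure:

Example

[root@mon]# cat /proc/sys/kernel/core_pattern

| /usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e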

b. Watch for the next container failure due to a Ceph process and search for the core dump file in the
/var/lib/systemd/coredump/ directory:

Example

[root@mon]# ls -ltr /var/lib/systemd/coredump


total 8232
-rw-r-----. 1 root root 8427548 Jan 22 19:24 core.ceph-
osd.167.5ede29340b6c4fe4845147f847514c12.15622.1584573794000000.xz

2. To manually capture a core dump file for the Ceph Monitors and Ceph OSDs:

a. Get the MONITOR_ID or the OSD_ID and enter the container:

Syntax

podman ps
podman exec -it MONITOR_ID_OR_OSD_ID bash

Example

[root@host01 ~]# podman ps


[root@host01 ~]# podman exec -it ceph-1ca9f6a8-d036-11ec-8263-fa163ee967ad-osd-2 bash

b. Install the procps-ng and gdb packages inside the container:

Example

[root@host01 /]# dnf install procps-ng gdb

c. Find the process ID:

Syntax

ps -aef | grep PROCESS | grep -v run

Replace PROCESS with the name of the running process, for example ceph-mon or ceph-osd.

Example



[root@host01 /]# ps -aef | grep ceph-mon | grep -v run
ceph 15390 15266 0 18:54 ? 00:00:29 /usr/bin/ceph-mon --cluster ceph --setroot
ceph --setgroup ceph -d -i 5
ceph 18110 17985 1 19:40 ? 00:00:08 /usr/bin/ceph-mon --cluster ceph --setroot
ceph --setgroup ceph -d -i 2

d. Generate the core dump file:

Syntax

gcore ID

Replace ID with the ID of the process that you got from the previous step, for example 18110:

Example

[root@host01 /]# gcore 18110


warning: target file /proc/18110/cmdline contained unexpected null characters
Saved corefile core.18110

e. Verify that the core dump file has been generated correctly.

Example

[root@host01 /]# ls -ltr


total 709772
-rw-r--r--. 1 root root 726799544 Mar 18 19:46 core.18110

f. Copy the core dump file outside of the Ceph Monitor container:

Syntax

podman cp ceph-mon-MONITOR_ID:/tmp/mon.core.MONITOR_PID /tmp

Replace MONITOR_ID with the ID number of the Ceph Monitor and replace MONITOR_PID with the process ID number.
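An illustrative invocation with hypothetical values, assuming the core dump file is available at /tmp/mon.core.18110 inside a container named ceph-mon-host01; adjust the container name and path to your environment:

Example

[root@host01 ~]# podman cp ceph-mon-host01:/tmp/mon.core.18110 /tmp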

3. To manually capture a core dump file for other Ceph daemons:

a. Log in to the cephadm shell:

Example

[root@host03 ~]# cephadm shell

b. Enable ptrace for the daemons:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/allow_ptrace true

c. Redeploy the daemon service:

Syntax

ceph orch redeploy SERVICE_ID

Example

[ceph: root@host01 /]# ceph orch redeploy mgr


[ceph: root@host01 /]# ceph orch redeploy rgw.rgw.1

d. Exit the cephadm shell and log in to the host where the daemons are deployed:

Example

[ceph: root@host01 /]# exit


[root@host01 ~]# ssh root@host04

e. Get the DAEMON_ID and enter the container:

Example

[root@host04 ~]# podman ps


[root@host04 ~]# podman exec -it ceph-1ca9f6a8-d036-11ec-8263-fa163ee967ad-rgw-rgw-1-host04
bash



f. Install the procps-ng and gdb packages:

Example

[root@host04 /]# dnf install procps-ng gdb

g. Get the PID of process:

Example

[root@host04 /]# ps aux | grep rados


ceph 6 0.3 2.8 5334140 109052 ? Sl May10 5:25 /usr/bin/radosgw -n
client.rgw.rgw.1.host04 -f --setuser ceph --setgroup ceph --default-log-to-file=false --
default-log-to-stderr=true --default-log-stderr-prefix=debug

h. Gather core dump:

Syntax

gcore PID

Example

[root@host04 /]# gcore 6

i. Verify that the core dump file has been generated correctly.

Example

[root@host04 /]# ls -ltr


total 108798
-rw-r--r--. 1 root root 726799544 Mar 18 19:46 core.6

j. Copy the core dump file outside the container:

Syntax

podman cp ceph-mon-DAEMON_ID:/tmp/mon.core.PID /tmp

Replace DAEMON_ID with the ID number of the Ceph daemon and replace PID with the process ID number.
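An illustrative invocation that uses the container name from the earlier podman ps output; adjust the container name and the path to where gcore saved the file (core.6 in the container's working directory in the earlier example):

Example

[root@host04 ~]# podman cp ceph-1ca9f6a8-d036-11ec-8263-fa163ee967ad-rgw-rgw-1-host04:/core.6 /tmp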

4. To allow systemd-coredump to successfully store the core dump for a crashed ceph daemon:

a. Set DefaultLimitCORE to infinity in /etc/systemd/system.conf to allow core dump collection for a crashed process:

Syntax

# cat /etc/systemd/system.conf

DefaultLimitCORE=infinity

b. Restart systemd or the node to apply the updated systemd settings:

Syntax

# sudo systemctl daemon-reexec
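Optionally, confirm that the new default limit is active. This check is illustrative and not part of the original procedure:

Example

# systemctl show --property DefaultLimitCORE

DefaultLimitCORE=infinity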

c. Verify the core dump files associated with previous daemon crashes:

Syntax

# ls -ltr /var/lib/systemd/coredump/

5. Upload the core dump file for analysis to an IBM Support case. See Providing information to IBM Support engineers for details.

Ceph subsystems default logging level values


A table of the default logging level values for the various Ceph subsystems.



Subsystem Log Level Memory Level
asok 1 5
auth 1 5
buffer 0 0
client 0 5
context 0 5
crush 1 5
default 0 5
filer 0 5
bluestore 1 5
finisher 1 5
heartbeatmap 1 5
javaclient 1 5
journaler 0 5
journal 1 5
lockdep 0 5
mds balancer 1 5
mds locker 1 5
mds log expire 1 5
mds log 1 5
mds migrator 1 5
mds 1 5
monc 0 5
mon 1 5
ms 0 5
objclass 0 5
objectcacher 0 5
objecter 0 0
optracker 0 5
osd 0 5
paxos 0 5
perfcounter 1 5
rados 0 5
rbd 0 5
rgw 1 5
throttle 1 5
timer 0 5
tp 0 5
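As an illustration only, and not part of the table, a subsystem's log and memory levels can be overridden at runtime by setting the corresponding debug option to a LOG_LEVEL/MEMORY_LEVEL pair, for example for the osd subsystem:

Example

[ceph: root@host01 /]# ceph config set osd debug_osd 5/5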

Health messages of a Ceph cluster


Health messages are defined as health checks, each of which has a unique identifier. The identifier is a terse, pseudo-human-readable string that is intended to enable tools to make sense of health checks and present them in a way that reflects their meaning.
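As an illustration only, you can list the currently raised health checks, and optionally mute a specific code for a limited time; the health code used here is hypothetical for your cluster:

Example

[ceph: root@host01 /]# ceph health detail
[ceph: root@host01 /]# ceph health mute OSD_NEARFULL 1h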

Table 1. Monitor

DAEMON_OLD_VERSION: Warns if an old version of Ceph is running on any daemons. A health error is generated if multiple versions are detected.
MON_DOWN: One or more Ceph Monitor daemons are currently down.
MON_CLOCK_SKEW: The clocks on the nodes running the ceph-mon daemons are not sufficiently well synchronized. Resolve it by synchronizing the clocks using ntpd or chrony.
MON_MSGR2_NOT_ENABLED: The ms_bind_msgr2 option is enabled but one or more Ceph Monitors is not configured to bind to a v2 port in the cluster's monmap. Resolve this by running the ceph mon enable-msgr2 command.
MON_DISK_LOW: One or more Ceph Monitors are low on disk space.
MON_DISK_CRIT: One or more Ceph Monitors are critically low on disk space.
MON_DISK_BIG: The database size for one or more Ceph Monitors is very large.
AUTH_INSECURE_GLOBAL_ID_RECLAIM: One or more clients or daemons connected to the storage cluster are not securely reclaiming their global_id when reconnecting to a Ceph Monitor.
AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: Ceph is currently configured to allow clients to reconnect to monitors using an insecure process to reclaim their previous global_id because the setting auth_allow_insecure_global_id_reclaim is set to true.

Table 2. Manager

MGR_DOWN: All Ceph Manager daemons are currently down.
MGR_MODULE_DEPENDENCY: An enabled Ceph Manager module is failing its dependency check.
MGR_MODULE_ERROR: A Ceph Manager module has experienced an unexpected error. Typically, this means an unhandled exception was raised from the module's serve function.

Table 3. OSDs

OSD_DOWN: One or more OSDs are marked down.
OSD_CRUSH_TYPE_DOWN: All the OSDs within a particular CRUSH subtree are marked down, for example all OSDs on a host. For example, OSD_HOST_DOWN and OSD_ROOT_DOWN.
OSD_ORPHAN: An OSD is referenced in the CRUSH map hierarchy but does not exist. Remove the OSD by running the ceph osd crush rm osd._OSD_ID_ command.
OSD_OUT_OF_ORDER_FULL: The utilization thresholds for nearfull, backfillfull, full, or failsafefull are not ascending. Adjust the thresholds by running the ceph osd set-nearfull-ratio _RATIO_, ceph osd set-backfillfull-ratio _RATIO_, and ceph osd set-full-ratio _RATIO_ commands.
OSD_FULL: One or more OSDs have exceeded the full threshold and are preventing the storage cluster from servicing writes. Restore write availability by raising the full threshold by a small margin with ceph osd set-full-ratio _RATIO_.
OSD_BACKFILLFULL: One or more OSDs have exceeded the backfillfull threshold, which prevents data from being allowed to rebalance to this device.
OSD_NEARFULL: One or more OSDs have exceeded the nearfull threshold.
OSDMAP_FLAGS: One or more storage cluster flags of interest have been set. These flags include full, pauserd, pausewr, noup, nodown, noin, noout, nobackfill, norecover, norebalance, noscrub, nodeep_scrub, and notieragent. Except for full, the flags can be cleared with the ceph osd set _FLAG_ and ceph osd unset _FLAG_ commands.
OSD_FLAGS: One or more OSDs or CRUSH has a flag of interest set. These flags include noup, nodown, noin, and noout.
OLD_CRUSH_TUNABLES: The CRUSH map is using very old settings and should be updated.
OLD_CRUSH_STRAW_CALC_VERSION: The CRUSH map is using an older, non-optimal method for calculating intermediate weight values for straw buckets.
CACHE_POOL_NO_HIT_SET: One or more cache pools are not configured with a hit set to track utilization, which prevents the tiering agent from identifying cold objects to flush and evict from the cache. Configure the hit sets on the cache pool with the ceph osd pool set _POOL_NAME_ hit_set_type _TYPE_, ceph osd pool set _POOL_NAME_ hit_set_period _PERIOD_IN_SECONDS_, ceph osd pool set _POOL_NAME_ hit_set_count _NUMBER_OF_HIT_SETS_, and ceph osd pool set _POOL_NAME_ hit_set_fpp _TARGET_FALSE_POSITIVE_RATE_ commands.
OSD_NO_SORTBITWISE: The sortbitwise flag is not set. Set the flag with the ceph osd set sortbitwise command.
POOL_FULL: One or more pools have reached their quota and are no longer allowing writes. Increase the pool quota with ceph osd pool set-quota _POOL_NAME_ max_objects _NUMBER_OF_OBJECTS_ and ceph osd pool set-quota _POOL_NAME_ max_bytes _BYTES_, or delete some existing data to reduce utilization.
BLUEFS_SPILLOVER: One or more OSDs that use the BlueStore backend have been allocated db partitions, but that space has filled, such that metadata has "spilled over" onto the normal slow device. Disable this warning with the ceph config set osd bluestore_warn_on_bluefs_spillover false command.
BLUEFS_AVAILABLE_SPACE: This output gives three values: BDEV_DB free, BDEV_SLOW free, and available_from_bluestore.
BLUEFS_LOW_SPACE: If the BlueStore File System (BlueFS) is running low on available free space and there is little available_from_bluestore, consider reducing the BlueFS allocation unit size.
BLUESTORE_FRAGMENTATION: As BlueStore works, free space on the underlying storage becomes fragmented. This is normal and unavoidable, but excessive fragmentation causes slowdown.
BLUESTORE_LEGACY_STATFS: BlueStore tracks its internal usage statistics on a per-pool granular basis, and one or more OSDs have BlueStore volumes. Disable the warning with the ceph config set global bluestore_warn_on_legacy_statfs false command.
BLUESTORE_NO_PER_POOL_OMAP: BlueStore tracks omap space utilization by pool. Disable the warning with the ceph config set global bluestore_warn_on_no_per_pool_omap false command.
BLUESTORE_NO_PER_PG_OMAP: BlueStore tracks omap space utilization by PG. Disable the warning with the ceph config set global bluestore_warn_on_no_per_pg_omap false command.
BLUESTORE_DISK_SIZE_MISMATCH: One or more OSDs using BlueStore have an internal inconsistency between the size of the physical device and the metadata tracking its size.
BLUESTORE_NO_COMPRESSION: One or more OSDs is unable to load a BlueStore compression plugin. This can be caused by a broken installation, in which the ceph-osd binary does not match the compression plugins, or by a recent upgrade that did not include a restart of the ceph-osd daemon.
BLUESTORE_SPURIOUS_READ_ERRORS: One or more OSDs using BlueStore detects spurious read errors on the main device. BlueStore has recovered from these errors by retrying disk reads.

Table 4. Device health

DEVICE_HEALTH: One or more devices are expected to fail soon, where the warning threshold is controlled by the mgr/devicehealth/warn_threshold configuration option. Mark the device out to migrate the data and replace the hardware.
DEVICE_HEALTH_IN_USE: One or more devices are expected to fail soon and have been marked "out" of the storage cluster based on mgr/devicehealth/mark_out_threshold, but they are still participating in one or more PGs.
DEVICE_HEALTH_TOOMANY: Too many devices are expected to fail soon and the mgr/devicehealth/self_heal behavior is enabled, such that marking out all of the ailing devices would exceed the cluster's mon_osd_min_in_ratio ratio, which prevents too many OSDs from being automatically marked out.

Table 5. Pools and placement groups

PG_AVAILABILITY: Data availability is reduced, meaning that the storage cluster is unable to service potential read or write requests for some data in the cluster.
PG_DEGRADED: Data redundancy is reduced for some data, meaning the storage cluster does not have the desired number of replicas for replicated pools or erasure code fragments.
PG_RECOVERY_FULL: Data redundancy might be reduced or at risk for some data due to a lack of free space in the storage cluster. Specifically, one or more PGs has the recovery_toofull flag set, which means that the cluster is unable to migrate or recover data because one or more OSDs is above the full threshold.
PG_BACKFILL_FULL: Data redundancy might be reduced or at risk for some data due to a lack of free space in the storage cluster. Specifically, one or more PGs has the backfill_toofull flag set, which means that the cluster is unable to migrate or recover data because one or more OSDs is above the backfillfull threshold.
PG_DAMAGED: Data scrubbing has discovered some problems with data consistency in the storage cluster. Specifically, one or more PGs has the inconsistent or snaptrim_error flag set, indicating an earlier scrub operation found a problem, or the repair flag is set, meaning a repair for such an inconsistency is currently in progress.
OSD_SCRUB_ERRORS: Recent OSD scrubs have uncovered inconsistencies.
OSD_TOO_MANY_REPAIRS: When a read error occurs and another replica is available, it is used to repair the error immediately so that the client can get the object data.
LARGE_OMAP_OBJECTS: One or more pools contain large omap objects as determined by osd_deep_scrub_large_omap_object_key_threshold or osd_deep_scrub_large_omap_object_value_sum_threshold or both. Adjust the thresholds with the ceph config set osd osd_deep_scrub_large_omap_object_key_threshold _KEYS_ and ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold _BYTES_ commands.
CACHE_POOL_NEAR_FULL: A cache tier pool is nearly full. Adjust the cache pool target size with the ceph osd pool set _CACHE_POOL_NAME_ target_max_bytes _BYTES_ and ceph osd pool set _CACHE_POOL_NAME_ target_max_objects _NUMBER_OF_OBJECTS_ commands.
TOO_FEW_PGS: The number of PGs in use in the storage cluster is below the configurable threshold of mon_pg_warn_min_per_osd PGs per OSD.
POOL_PG_NUM_NOT_POWER_OF_TWO: One or more pools has a pg_num value that is not a power of two. Disable the warning with the ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false command.
POOL_TOO_FEW_PGS: One or more pools should probably have more PGs, based on the amount of data that is currently stored in the pool. You can either disable auto-scaling of PGs with the ceph osd pool set _POOL_NAME_ pg_autoscale_mode off command, automatically adjust the number of PGs with the ceph osd pool set _POOL_NAME_ pg_autoscale_mode on command, or manually set the number of PGs with the ceph osd pool set _POOL_NAME_ pg_num _NEW_PG_NUMBER_ command.
TOO_MANY_PGS: The number of PGs in use in the storage cluster is above the configurable threshold of mon_max_pg_per_osd PGs per OSD. Increase the number of OSDs in the cluster by adding more hardware.
POOL_TOO_MANY_PGS: One or more pools should probably have fewer PGs, based on the amount of data that is currently stored in the pool. You can either disable auto-scaling of PGs with the ceph osd pool set _POOL_NAME_ pg_autoscale_mode off command, automatically adjust the number of PGs with the ceph osd pool set _POOL_NAME_ pg_autoscale_mode on command, or manually set the number of PGs with the ceph osd pool set _POOL_NAME_ pg_num _NEW_PG_NUMBER_ command.
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED: One or more pools have a target_size_bytes property set to estimate the expected size of the pool, but the values exceed the total available storage. Set the value for the pool to zero with the ceph osd pool set _POOL_NAME_ target_size_bytes 0 command.
POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO: One or more pools have both target_size_bytes and target_size_ratio set to estimate the expected size of the pool. Set the value for the pool to zero with the ceph osd pool set _POOL_NAME_ target_size_bytes 0 command.
TOO_FEW_OSDS: The number of OSDs in the storage cluster is below the configurable threshold of osd_pool_default_size.
SMALLER_PGP_NUM: One or more pools has a pgp_num value less than pg_num. This is normally an indication that the PG count was increased without also increasing the placement behavior. Resolve this by setting pgp_num to match pg_num with the ceph osd pool set _POOL_NAME_ pgp_num _PG_NUM_VALUE_ command.
MANY_OBJECTS_PER_PG: One or more pools has an average number of objects per PG that is significantly higher than the overall storage cluster average. The specific threshold is controlled by the mon_pg_warn_max_object_skew configuration value.
POOL_APP_NOT_ENABLED: A pool exists that contains one or more objects but has not been tagged for use by a particular application. Resolve this warning by labeling the pool for use by an application with the rbd pool init _POOL_NAME_ command.
POOL_FULL: One or more pools has reached its quota. The threshold to trigger this error condition is controlled by the mon_pool_quota_crit_threshold configuration option.
POOL_NEAR_FULL: One or more pools is approaching a configured fullness threshold. Adjust the pool quotas with the ceph osd pool set-quota _POOL_NAME_ max_objects _NUMBER_OF_OBJECTS_ and ceph osd pool set-quota _POOL_NAME_ max_bytes _BYTES_ commands.
OBJECT_MISPLACED: One or more objects in the storage cluster is not stored on the node the storage cluster would like it to be stored on. This is an indication that data migration due to a recent storage cluster change has not yet completed.
OBJECT_UNFOUND: One or more objects in the storage cluster cannot be found. Specifically, the OSDs know that a new or updated copy of an object should exist, but a copy of that version of the object has not been found on OSDs that are currently online.
SLOW_OPS: One or more OSD or monitor requests is taking a long time to process. This can be an indication of extreme load, a slow storage device, or a software bug.
PG_NOT_SCRUBBED: One or more PGs has not been scrubbed recently. PGs are normally scrubbed within every configured interval specified by osd_scrub_max_interval globally. Initiate the scrub with the ceph pg scrub _PG_ID_ command.
PG_NOT_DEEP_SCRUBBED: One or more PGs has not been deep scrubbed recently. Initiate the scrub with the ceph pg deep-scrub _PG_ID_ command. PGs are normally scrubbed every osd_deep_scrub_interval seconds, and this warning triggers when mon_warn_pg_not_deep_scrubbed_ratio percentage of the interval has elapsed without a scrub since it was due.
PG_SLOW_SNAP_TRIMMING: The snapshot trim queue for one or more PGs has exceeded the configured warning threshold. This indicates that either an extremely large number of snapshots were recently deleted, or that the OSDs are unable to trim snapshots quickly enough to keep up with the rate of new snapshot deletions.

Table 6. Miscellaneous

RECENT_CRASH: One or more Ceph daemons have crashed recently, and the crash has not yet been acknowledged by the administrator.
TELEMETRY_CHANGED: Telemetry has been enabled, but the contents of the telemetry report have changed since that time, so telemetry reports will not be sent.
AUTH_BAD_CAPS: One or more auth users have capabilities that cannot be parsed by the monitor. Update the capabilities of the user with the ceph auth _ENTITY_NAME_ _DAEMON_TYPE_ _CAPS_ command.
OSD_NO_DOWN_OUT_INTERVAL: The mon_osd_down_out_interval option is set to zero, which means that the system does not automatically perform any repair or healing operations after an OSD fails. Silence the warning with the ceph config set global mon_warn_on_osd_down_out_interval_zero false command.
DASHBOARD_DEBUG: The Dashboard debug mode is enabled. This means that if there is an error while processing a REST API request, the HTTP error response contains a Python traceback. Disable the debug mode with the ceph dashboard debug disable command.

Related information
Product guides, other publications, websites, blogs, and videos that contain information related to IBM Storage Ceph.

IBM publications
IBM Storage Insights documentation
IBM Spectrum Protect documentation

External publications
Product Documentation for Red Hat Ceph Storage
Product Documentation for Red Hat Enterprise Linux
Product Documentation for Red Hat OpenShift Data Foundation
Product Documentation for Red Hat OpenStack Platform

Blogs and videos


Introduction to IBM Storage Ceph - 01102023. Publication date: 10 Jan 2023 (1:05:54).

Red Hat's Ceph team is moving to IBM, on Ceph Blog. Last updated 4 Oct 2022.

What is Ceph?. Publication date: 16 Dec 2020 (5:44).

Acknowledgments

The Ceph Storage project is seeing amazing growth in the quality and quantity of contributions from individuals and organizations in
the Ceph community. We would like to thank all members of the IBM Storage Ceph team, all of the individual contributors in the
Ceph community, and additionally, but not limited to, the contributions from organizations such as:

Intel®
Fujitsu®
UnitedStack
Yahoo™
Ubuntu Kylin
Mellanox®
CERN™
Deutsche Telekom
Mirantis®
SanDisk™
SUSE
Red Hat®
IBM™

