
IBM STORAGE CEPH FOR BEGINNER'S

Abstract
This document is intended as an introduction to Ceph,
and IBM Storage Ceph in particular, from installation to
deployment of all unified services including block, file
and object storage. It is ideally suited to help customers
evaluate the benefits of IBM Storage Ceph in a test or
POC environment.

NISHAAN DOCRAT
IBM Systems Hardware

Version 1.00 – December 2024



Trademarks
IBM, IBM Storage Ceph, IBM Storage Scale are trademarks or registered trademarks of the
International Business Machines Corporation in the United States and other countries.

Ceph is a trademark or registered trademark of Red Hat, Inc. or its subsidiaries in the United States
and other countries.

Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora,
the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other
countries.

OpenStack, OpenStack Swift are trademarks or registered trademarks of the OpenStack Foundation.

Amazon Web Services, AWS, Amazon EC2, EC2, Amazon S3 are trademarks or registered trademarks
of Amazon.com, Inc. or its affiliates in the United States and other countries.

HAProxy, HAProxy Community Edition are trademarks or registered trademarks of HAProxy
Technologies LLC and its affiliated entities.

KUBERNETES is a registered trademark of the Linux Foundation in the United States and other
countries.

Ubuntu and Canonical are registered trademarks of Canonical Ltd.

Linux is the registered trademark of Linus Torvalds in the United States and other countries.

"SUSE" and the SUSE logo are trademarks of SUSE LLC or its subsidiaries or affiliates.

Dnsmasq and s3fs-fuse are distributed under the GPL.

s3cmd is the copyright of s3tools.org.

MINIO is a trademark of Minio, Inc.

All other trademarks, trade names, or company names referenced herein are used for identification
only and are the property of their respective owners.

Acknowledgments and Feedback


If you have anything to contribute to this document or have identified any errors or omissions, you are
welcome to contact me at [email protected].


Table of Contents
Executive Summary ......................................................................................... 5
Ceph Introduction ........................................................................................... 5
IBM Storage Ceph Architecture and Key Components ......................................... 6
Client/Cluster versus Traditional Client/Server Architectures ................................. 7
Ceph Calculated Data Placement........................................................................... 8
Putting it all together – how does Ceph store or retrieve data .................................. 9
Who Uses Ceph and What for? ........................................................................ 12
Sizing an IBM Storage Ceph Cluster ................................................................ 14
Using IBM Storage Modeller (StorM) to Size an IBM Storage Ceph Cluster .............. 16
Obtaining a 60-day Trial License for IBM Storage Ceph Pro Edition ................... 18
Deploying an IBM Storage Ceph 4-node Cluster ............................................... 18
Firewall Rules Required for IBM Storage Ceph ..................................................... 19
Storage Configuration ......................................................................................... 23
Initial Installation of your IBM Storage Ceph cluster ............................................. 24
Register your IBM Storage Ceph Cluster Nodes with Red Hat ................................ 24
Configure the Ansible Inventory Location ............................................................. 29
Enabling SSH login as root user on Red Hat Enterprise Linux ................................. 29
Run cephadm pre-flight playbook to install all pre-requisites on all cluster nodes .. 30
Bootstrapping a new storage cluster.................................................................... 31
Distributing Ceph Cluster SSH keys to all nodes ................................................... 35
Verifying the cluster installation .......................................................................... 36
Logging into the Ceph Dashboard and Adding Cluster Nodes ................................. 37
IBM Storage Ceph RADOS Gateway (RGW) Deployment .................................... 47
Using Virtual-hosted Style Bucket Addressing with IBM Storage Ceph RGW ........... 64
Setting up IBM Storage Ceph RGW static web hosting .......................................... 71
Setting up IBM Storage Ceph RGW Presigned URL ................................................ 73
IBM Storage Ceph RADOS Block Device (RBD) Deployment ............................... 73
Accessing a Ceph RBD Image on a Windows Host ................................................. 77
Accessing a Ceph RBD Image on a Linux Host ....................................................... 81
Ceph RBD Images and Thin Provisioning .............................................................. 84
Testing RBD client access during a failure ............................................................ 85
IBM Storage Ceph Grafana Dashboards .......................................................... 89


IBM Storage Ceph Software Upgrade .............................................................. 93


IBM Storage Ceph Filesystem (CephFS) Deployment ....................................... 101
IBM Storage Ceph NFS Service Deployment ................................................... 110
Deploying the NFS Server with no failover .......................................................... 110
Deploying a Highly-Available NFS Server ........................................................... 114
IBM Storage Ceph NFS with an Object Storage Backend ................................. 120
IBM Storage Ceph iSCSI Gateway ................................................................. 124
Deploying an IBM Storage Ceph Single Node Cluster....................................... 128
IBM Storage Ceph Container Storage Interface (CSI) Driver ............................ 141
Deploying a simple application to dynamically provision a PVC ........................... 153
Ceph CSI Snapshots ......................................................................................... 155
Practical Use of Ceph RGW for Kubernetes Application Backup ........................... 160
IBM Storage Ceph Multi-Cluster Management ................................................ 163
IBM Storage Ceph Replication – RBD Snapshots ............................................ 167
IBM Storage Ceph Replication – CephFS Snapshots ....................................... 171
IBM Storage Ceph Replication – RBD Mirroring .............................................. 175
IBM Storage Ceph Replication – CephFS Snapshot Mirroring ........................... 196
IBM Storage Ceph Replication – RGW Multi-site ............................................. 211
IBM Storage Ceph Encryption ....................................................................... 240
IBM Storage Ceph Data-in-flight Encryption ....................................................... 240
IBM Storage Ceph Data-at-Rest Encryption ........................................................ 241
IBM Storage Ceph RGW SSL Termination ........................................................... 246
IBM Storage Ceph RGW S3 Encryption ............................................................... 246
IBM Storage Ceph Compression .................................................................... 251
IBM Storage Ceph Performance Benchmarking .............................................. 258
Using the IBM Storage RESTful API ............................................................... 260
Using the IBM Storage Ceph SNMP Gateway .................................................. 265
IBM Storage Ceph Email Alerting .................................................................. 268
IBM Storage Ceph Call Home and Storage Insights......................................... 270
IBM Storage Ceph Logging a Manual Support Ticket ....................................... 273
IBM Storage Ceph Simulating Hardware Failures ........................................... 275
IBM Storage Ceph OSD Replacement ................................................................. 275
IBM Storage Ceph Simulating a Node Failure...................................................... 284


IBM Storage Ceph Licensing ......................................................................... 292


Summary .................................................................................................... 293
Appendix A - IBM Storage Ceph Tips ............................................................. 294


Executive Summary

IBM Storage Ceph is the latest addition to IBM's software-defined storage portfolio. IBM has long had
market-leading storage software offerings, including IBM Storage Scale (formerly GPFS) and IBM
Cloud Object Storage (COS). Whilst the introduction of IBM Storage Ceph does overlap to an extent
with these existing offerings, IBM Storage Ceph still offers a strong value proposition to our clients:
very strong S3 API compatibility, the ability to serve block storage and tight integration with
Kubernetes. Open source Ceph is widely used worldwide across a multitude of industries and for
varied applications. IBM Storage Ceph is built on open source Ceph and gives customers the ability to
purchase and implement Ceph in mission-critical environments with the full backing of IBM support
and development.

This document is primarily meant to help IBM Technical Sellers, Business Partners and IBM customers
deploy and evaluate all of IBM Storage Ceph's features. Whilst the publicly available documentation
is comprehensive, it is primarily targeted at using the command line. This is the same issue IBM has
had with IBM Storage Scale, where our customers' perception of the product's ease of use is
determined by the effort and knowledge required to install, configure and administer it. To address
this concern, this document makes use of the IBM Storage Ceph Dashboard wherever possible. The
full implementation of IBM Storage Ceph, including all protocols (file, block and object), is covered in
detail along with advanced features like replication, compression and encryption. Integration with
native Kubernetes is also demonstrated.

AUDIENCE: This document is intended for anyone involved in evaluating, acquiring, managing,
operating, or designing a software defined storage solution based on IBM Storage Ceph.

Ceph Introduction
Ceph is a popular open source software defined distributed storage solution that is highly reliable and
extremely scalable. It is built on commodity hardware and provides file, block and object storage from
a single unified storage cluster. Ceph is designed to have no single point of failure and includes self-
healing and self-managing capabilities to reduce administrative overhead and costs. It favors
consistency and correctness above performance. It was initially developed by Sage Weil in 2004 with
the primary goal of resolving issues with the storage solutions of the time, which struggled with
scalability and were prone to performance degradation at scale. These legacy storage solutions
centralised their metadata service, and this became a bottleneck as the solution scaled. Ceph, on the
other hand, is based on a distributed storage architecture with no centralised metadata server, so it
can scale to exabytes of data without any noticeable performance degradation. It achieves this
through the CRUSH (Controlled Replication Under Scalable Hashing) algorithm, which calculates
where data needs to be stored and retrieved from without requiring any central lookup or access to
dedicated metadata servers. The beauty of this approach is that the calculation is done on the client
side, so there is no single point of failure or bottleneck to limit scalability.

For a detailed explanation of the CRUSH algorithm refer to https://ceph.com/assets/pdfs/weil-crush-sc06.pdf

The first prototype of Ceph was released in 2006 and in 2007 Ceph was released under the LGPL.
Ceph was later incorporated into the Linux kernel in 2010 by Linus Torvalds and Sage Weil formed a
company called Inktank Storage to commercialise and promote Ceph in 2011. The first stable release
of Ceph (Argonaut) was released in 2012. In 2014 Red Hat purchased Inktank Storage which was a
major milestone for Ceph as it brought significant investment into Ceph’s development with enterprise
focus and exposure to a wider audience. Red Hat Ceph storage was paired with their OpenStack

offering and would later form the basis for their OpenShift Data Foundation product. In 2015, the Ceph
Community Advisory Board was formed (which was later replaced in 2018 by the Ceph Foundation).
In the years following, Ceph had a string of new releases that further improved performance and
introduced new features with contributions from users, developers and companies across a broad
spectrum of industries. In January 2023, IBM acquired the entire storage development team from
Red Hat and rebranded Red Hat Ceph Storage as IBM Storage Ceph. Ceph remains open source and
IBM is committed to ensuring that it stays that way. IBM is a diamond member of the Ceph Foundation,
all new Ceph code changes are published upstream, and open source Ceph still forms the basis for
IBM Storage Ceph. What IBM brings to Ceph is enterprise hardening: a fully tested and supported
product ready for large-scale deployments in mission-critical environments.

IBM Storage Ceph Architecture and Key Components


Whilst a detailed explanation of IBM Storage Ceph’s architecture is outside the scope of this
document, there are a few key concepts that you need to understand prior to its deployment or use.
A good explanation of the inner workings of Ceph can be found here:

https://docs.ceph.com/en/latest/architecture/
https://www.redbooks.ibm.com/abstracts/redp5721.html

Ceph’s architecture1 is designed primarily for reliability, scalability and performance. It makes use of
distributed computing to handle exabytes of data efficiently and each of its key components can be
individually scaled depending on its intended use case. Figure 1 illustrates the key components of a
Ceph cluster starting with the data access services, through the librados client API library and
underpinned by RADOS.

Figure 1: Unified Storage Architecture based on RADOS

RADOS (Reliable Autonomic Distributed Object Store) is the foundation of the Ceph architecture. It is
the underlying storage layer that supports the block, file and object services. RADOS provides a low-
level object storage service that is reliable, scalable and strongly consistent. RADOS manages data
placement, data protection (using either replication or erasure coding), rebalancing, repair and
recovery. A RADOS cluster comprises the following components:

1 Source: https://docs.ceph.com/en/latest/architecture/


MON (Monitor) – The Ceph Monitor's primary function is to maintain a master copy of the cluster map.
Monitors also keep a detailed record of the cluster state, including all OSDs, their status, and other
critical metadata. Monitors ensure the cluster achieves consensus on the state of the system using
the Paxos algorithm, providing a reliable and consistent view of the cluster to all clients and OSDs. A
Ceph cluster will typically have anywhere from 3 to 7 monitors.

MGR (Manager) – The Ceph Manager aggregates real-time metrics (throughput, disk usage, etc.) and
tracks the current state of the cluster. It provides essential management and monitoring capabilities
for the cluster. A Ceph cluster typically has one active and one or more standby Managers (3 in total
is recommended).

OSD (Object Storage Daemon) – The Ceph Object Storage Daemon is responsible for storing data on
disk and serving client I/O requests. OSDs are also responsible for data replication, recovery and
rebalancing, and they communicate with each other to ensure data is replicated and distributed
across the cluster. Each OSD is mapped to a single disk. A Ceph cluster will typically have tens to
thousands of OSDs.
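
Once a cluster is deployed (covered later in this document), these daemons can be viewed from any
node that holds the admin keyring. A brief sketch of the relevant commands:

$ ceph -s         # overall cluster health, monitor quorum and OSD summary
$ ceph mon stat   # list the monitors and the current quorum
$ ceph mgr stat   # show the active manager and any standbys
$ ceph osd tree   # show every OSD and its place in the CRUSH hierarchy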

A Ceph storage cluster offers the following data access services:

LIBRADOS - The Ceph storage cluster provides the basic storage services that allow Ceph to uniquely
deliver object, block and file storage in one unified system. However, you are not limited to using the
RESTful, block or POSIX interfaces. Based on RADOS, the librados API enables you to create your own
interface to the Ceph storage cluster. Librados is a C language library, commonly referred to as the
Ceph basic library, in which the functions of RADOS are abstracted and encapsulated. It essentially
provides low-level access to the RADOS service.
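
The rados command-line tool is a thin wrapper around librados and is an easy way to see raw RADOS
objects in action. A minimal sketch, assuming a pool named testpool has already been created (for
example with ceph osd pool create testpool):

$ echo "hello rados" > /tmp/hello.txt
$ rados -p testpool put hello-object /tmp/hello.txt      # store the file as a RADOS object
$ rados -p testpool ls                                   # list the objects in the pool
$ rados -p testpool stat hello-object                    # show the object size and modification time
$ rados -p testpool get hello-object /tmp/hello-copy.txt # read the object back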

RADOS BLOCK DEVICE (RBD) – A reliable and fully distributed block device with a Linux kernel client
and a QEMU/KVM driver.

RADOS GATEWAY (RGW) – This interface provides object storage. Ceph RGW is a bucket-based REST
API gateway compatible with S3 and Swift.

CEPH FILESYSTEM (CEPHFS) – A POSIX-compliant distributed filesystem with a Linux kernel client
and support for FUSE (Filesystem in Userspace).

Client/Cluster versus Traditional Client/Server Architectures

One of the primary goals of the Ceph architecture is to avoid the pitfalls of traditional client/server
models2. Whilst traditional client/server architectures work well, as services scale it becomes
increasingly difficult to maintain the illusion of a single server when hundreds or thousands of servers
make up the storage cluster. Traditional architectures used technologies like virtual IP addresses,
failover pairs or gateway nodes to hide the layout of the data from the client. These technologies,
however, have limitations of their own that constrain the overall design of the system and affect its
performance, consistency and overall cluster behavior.

Ceph is designed around a client/cluster architecture. This basically means that there is an intelligent
client library that sits on the application clients. This library understands that it is not talking to a single
server but to a cluster of co-operating servers. This client library enables intelligent access to the
storage cluster by enabling smart routing where I/O requests can be routed to the server that has the

2 Source: Sage Weil, Ceph Tech Talk – Intro to Ceph 27/06/19 - https://www.youtube.com/watch?v=PmLPbrf-x9g&list=WL&index=48


actual data in question. It also allows for flexible addressing where it can address all the nodes in the
storage cluster and manage the fact that data can be moving around in the background whilst
providing a seamless experience for the client. Lastly, the application is abstracted from the
underlying complexity of the storage cluster by using this client library that handles the intricacies of
where exactly data is being written to or retrieved from.

Figure 2: Ceph client/cluster architecture

Ceph Calculated Data Placement

Considering a Ceph cluster can store hundreds of millions of objects, the overhead of keeping a
centralised record of where each object is supposed to be stored or retrieved from would be
computationally expensive and quickly become a bottleneck. In fact, Ceph’s primary goal is to
eliminate having to “work out” where to store or retrieve data from a centralised metadata service but
to “calculate” it. The best part of this approach is that the calculation is done on the client side. Figure
3 depicts this process. Initially, the client library requests a cluster map from one of the Ceph monitor
daemons (1). As mentioned previously, the cluster map reflects the current state of the cluster
including the structure of the cluster and how to layout data across the servers that comprise it. When
the client application wants to read or write data, the client library will then do a calculation based on
the state of the cluster and the name of the object (2). The result of this calculation provides the
location in the cluster where that data needs to be stored. The client library can then contact the
appropriate node in the cluster to read or write the object (3).

Figure 3: Ceph Calculated Data Placement


If the state of the cluster changes (e.g. a server node is added or removed or a device fails), an updated
cluster map is provided to the application so that when it needs to retrieve an object it had previously
stored, it will redo the calculation which might produce a different result. The application can then go
and retrieve the object from the new location.
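
You can observe this calculation from the command line. As a sketch, assuming a pool named testpool
containing an object named hello-object, the following asks the cluster to perform the same mapping
a client would:

$ ceph osd map testpool hello-object

The output shows the placement group the object hashes to and the acting set of OSDs that store it.
If the cluster topology changes, re-running the command may return a different set of OSDs.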

Putting it all together – how does Ceph store or retrieve data

An object is the fundamental unit of storage in RADOS. A RADOS object consists of a unique name,
associated attributes or metadata and the actual byte data of the object which can range from a few
bytes to tens of megabytes. Most objects in Ceph default to 4MB in size. RADOS also supports a
special type of object called an OMAP object, which stores key/value map data instead of byte data.
All objects in Ceph reside in storage pools. Ceph storage pools represent a high-level grouping of
similar objects and are typically created based on their use case. For example, you might have pools
to store RBD images or pools to store S3 objects. Ceph storage pools are logical constructs and are
thin provisioned. Ceph storage pools usually share devices (unless Ceph's CRUSH algorithm has a
placement policy that specifies a specific class of device, e.g. hdd vs ssd).

How does Ceph know where to store these RADOS objects? Figure 4 illustrates exactly how this is
achieved.

Figure 4: Ceph Data Storage Illustrated

The way Ceph stores objects can be categorized into the following 3 functions:

• File to Object mapping


• Object to Placement Group mapping
• Placement Group to OSD mapping

Let's assume we need to store a file (e.g. a large MP4 video). The first thing Ceph does is to break up
the video file into multiple 4MB objects. Each object has an ino (filename and metadata) and an ono
(a sequence number determined by the 4MB fragmentation). Ceph then calculates and assigns each
object an object ID (oid) from the ino and ono. All of these objects are then mapped to a pool.
Remember, a pool is a logical construct and could contain petabytes of data and billions of objects. It
would be computationally expensive to manage the placement of objects individually. To cope
with this number of objects, Ceph breaks the pool down into fragments, or shards, and groups them
into placement groups. Objects are mapped to placement groups within a pool by applying a static
hash function to the oid, which maps it to an approximately uniformly distributed pseudo-random
value, and then performing a bitwise AND with a mask to obtain the placement group ID, or pgid. The
mask is typically the number of placement groups in the pool less 1. Once the placement group ID is
calculated, Ceph then uses the CRUSH algorithm, substituting the pgid into it, to get the set of OSDs
on which to store the object. CRUSH is essentially a pseudo-random data distribution and replication
algorithm.
It always produces a consistent and repeatable calculation. Each Ceph cluster has a CRUSH map. The
CRUSH map is basically a hierarchy describing the physical topology of the cluster and a set of rules
defining policy about how to place data on those devices that make up the topology. The hierarchy
has devices (OSDs) at the leaves and internal nodes corresponding to other physical features or
groupings (e.g. hosts, racks, rows, datacenters, etc.). The rules describe how replicas are placed in
terms of that hierarchy (e.g. three replicas in different racks). Figure 5 depicts a simple CRUSH map.

Figure 5: A simple CRUSH hierarchy

In the above CRUSH map, having a 3-way replica policy will cause Ceph to distribute each replica of
an object across 3 separate rack buckets or failure domains. When you deploy a Ceph cluster, a default
CRUSH map is generated. This might be fine for a POC or test environment, but for a large cluster you
should give careful consideration to creating a custom CRUSH map that will ensure optimal
performance and maximum availability. You can also create your own bucket types to suit your
environment. The default bucket types are:

0 OSD
1 HOST
2 CHASSIS
3 RACK
4 ROW
5 PDU
6 POD
7 ROOM
8 DATACENTER
9 REGION
10 ROOT

Table 1: CRUSH bucket types
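
To inspect the CRUSH map that was generated for your cluster, the following commands can be used
(a sketch; replicated_rule is the name of the default rule created with a new cluster):

$ ceph osd crush tree                                   # show the bucket hierarchy of hosts, racks and OSDs
$ ceph osd crush rule ls                                # list the CRUSH rules defined in the cluster
$ ceph osd crush rule dump replicated_rule              # show the placement steps that make up a rule
$ ceph osd getcrushmap -o /tmp/crushmap.bin             # export the compiled CRUSH map
$ crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt   # decompile it into editable text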

In terms of data durability, Ceph supports both replication and erasure coding. The default replication
factor is 3 though this can be dynamically changed. Ceph also supports erasure coding using the Reed-

Solomon algorithm. In erasure coding, data is broken into fragments of two kinds: data chunks (k) and
parity or coding chunks (m), and those chunks are stored on different OSDs. If a drive fails or becomes
corrupted, Ceph retrieves the remaining data (k) and coding (m) chunks from the other OSDs and the
erasure code algorithm restores the object from those chunks. Erasure coding uses storage capacity
more efficiently than replication. The n-replication approach maintains n copies of an object (3x by
default in Ceph), whereas erasure coding maintains only k + m chunks. For example, 3 data and 2
coding chunks use roughly 1.67x the storage space of the original object.
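
The replication factor can be inspected and changed per pool at any time. A brief sketch, using a
hypothetical replicated pool named rbdpool:

$ ceph osd pool get rbdpool size       # show the current number of replicas
$ ceph osd pool set rbdpool size 3     # change the replication factor on the fly
$ ceph osd pool get rbdpool min_size   # minimum replicas required for the pool to accept I/O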

While erasure coding uses less storage overhead than replication, the erasure code algorithm uses
more RAM and CPU than replication when it accesses or recovers objects. Erasure coding is
advantageous when data storage must be durable and fault tolerant but does not require fast read
performance (for example, cold storage, historical records, and so on).

Ceph defines an erasure-coded pool with a profile3. Ceph uses a profile when creating an erasure-
coded pool and the associated CRUSH rule. Ceph creates a default erasure code profile when
initializing a cluster with k=2 and m=2. This means that Ceph will spread the object data over four
OSDs (k+m = 4) and can lose up to two of those OSDs without losing data. You can create a new profile to
improve redundancy without increasing raw storage requirements. For instance, a profile with k=8
and m=4 can sustain the loss of four (m=4) OSDs by distributing an object on 12 (k+m=12) OSDs. Ceph
divides the object into 8 chunks and computes 4 coding chunks for recovery. For example, if the object
size is 8 MB, each data chunk is 1 MB and each coding chunk has the same size as the data chunk,
that is also 1 MB. The object is not lost even if four OSDs fail simultaneously.

For instance, if the desired architecture must sustain the loss of two racks with a 40% storage
overhead, the following profile can be defined:

$ ceph osd erasure-code-profile set myprofile \


k=4 \
m=2 \
crush-failure-domain=rack
$ ceph osd pool create ecpool 12 12 erasure myprofile
$ echo ABCDEFGHIJKL | rados --pool ecpool put NYAN -
$ rados --pool ecpool get NYAN -
ABCDEFGHIJKL

The primary OSD will divide the NYAN object into four (k=4) data chunks and create two additional
chunks (m=2). The value of m defines how many OSDs can be lost simultaneously without losing any
data. The crush-failure-domain=rack will create a CRUSH rule that ensures no two chunks are stored
in the same rack.

Figure 6: Ceph erasure coding profile (4+2)
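
To list and examine the erasure code profiles known to your cluster (including the default profile and
the myprofile example above), the following commands can be used:

$ ceph osd erasure-code-profile ls              # list all defined profiles
$ ceph osd erasure-code-profile get default     # show k, m and the plugin for the default profile
$ ceph osd erasure-code-profile get myprofile   # show the custom profile created above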

3 Source: https://docs.redhat.com/en/documentation/red_hat_ceph_storage/7/html/storage_strategies_guide/erasure-code-pools-overview_strategy#erasure-code-profiles_strategy


Who Uses Ceph and What for?


Now that we know how Ceph works, let's take a look at who uses Ceph, why they do so and what the
typical use cases are. Based on the most recent Ceph User Survey undertaken by the Ceph Foundation
in 2022, we can see that Ceph is deployed primarily in commercial settings.

Source: https://ceph.io/en/news/blog/2022/ceph-user-survey-results-2022/

Figure 7: Sectors in which Ceph is deployed

More importantly, now that we understand some of the basics of the Ceph architecture, we can see
that it is used exactly because it is open source and because of its availability, reliability and scalability
characteristics.

Figure 8: Why people use Ceph

The primary use cases for Ceph are virtualization, containers and backup. This implies that Ceph
RBD and Ceph RGW are popular, and these are also where IBM is advocating its use: for VMware
environments (using Ceph block storage over NVMe/TCP) and as an S3-compliant object store for a
wide range of applications including backup.

Figure 9: Ceph Use Cases


Perhaps the biggest reason for paying for a commercially supported Ceph offering (e.g. IBM Storage
Ceph) becomes clear when analyzing the graph below.

Figure 10: Where do Ceph users go for help

Getting help from these sources is probably acceptable when deploying Ceph into a test or
development environment. For mission critical environments however, being reliant on
documentation and the wider Ceph community for help is not practical. When you have a critical issue,
you want to know that you are backed by a vendor with the support capability of IBM, with over 200
dedicated Ceph developers and the support personnel to assist you at a moment's notice.
With the release of IBM Storage Ready Nodes for IBM Storage Ceph, IBM is able to provide both
hardware and software support for your mission critical environments.

IBM Storage Ready Nodes: https://www.ibm.com/downloads/documents/us-en/107a02e95bc8f6bd

Figure 11: IBM Storage Ready Nodes for IBM Storage Ceph – Value Proposition

Apart from 3rd party organizations that provide Ceph consulting and support services, SUSE was the
only other major vendor to offer a commercial Ceph product (SUSE Enterprise Storage or SES).
However, they discontinued this offering in 2020 leaving only IBM and Red Hat as the major vendors
that offer a commercial Ceph product along with its associated support, services and continued
software maintenance.


Sizing an IBM Storage Ceph Cluster


Since the primary goal of this document is to help deploy IBM Storage Ceph in a test or POC
environment, we will concentrate on the minimal configuration. For production environments though,
careful planning is needed to ensure proper sizing. A good reference on all the factors involved in
sizing an IBM Storage Ceph cluster can be found in the following redbook,
“IBM Storage Ceph Concepts and Architecture Guide, Chapter 4 – Sizing IBM Storage Ceph”.

https://www.redbooks.ibm.com/abstracts/redp5721.html

IBM Storage Ceph is deployed as containers. Containerization makes IBM Storage Ceph services easy
to deploy, manage and scale. The only caveat is that troubleshooting and tuning become more
complex than if it were deployed natively on bare metal servers.

The minimum hardware requirements for IBM Storage Ceph are listed here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=hardware-minimum-recommendations-containerized-ceph

The above link also discusses the rules around Ceph daemon colocation. Whilst it is possible to deploy
all services on a single node (which is described later in this paper), for a POC or test environment the
minimum supported configuration is a 4-node cluster. Technically, a 3-node cluster would also work,
though the failure of a single node would leave no rebuild space and affect cluster performance and
data integrity. A 4-node cluster can tolerate a single node failure without affecting data redundancy
and with little impact on performance. The minimum recommended cluster configuration
is depicted below:

Figure 12: Minimum supported IBM Storage Ceph Cluster configuration with service collocation
(Source: https://www.redbooks.ibm.com/abstracts/redp5721.html)

The best practice minimum hardware requirements for each of the IBM Storage Ceph daemons is
listed below:


Figure 13: Best Practice Minimum hardware requirements for each Ceph daemon
(Source: https://www.redbooks.ibm.com/abstracts/redp5721.html)

The full software requirements for IBM Storage Ceph including supported Operating Systems and ISV
applications are documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=compatibility-matrix

There are many other factors to take into account when sizing an IBM Storage Ceph cluster.
Separating internal OSD (cluster) traffic onto a dedicated private network is good practice. The number
and size of your disk devices per node also play a big role. Ideally, an IBM Storage Ceph cluster should
be able to recover from a complete node failure in under 8 hours. Red Hat offers an official Recovery
Calculator that can be found here:

https://access.redhat.com/labs/rhsrc/

It is also important to keep the same size disks per server node. Whilst it is possible to calculate the
optimal number of placement groups per storage pool, using similarly sized drives will ensure data is
evenly distributed across them and performance is uniform. Mixed-size disks would cause a
performance imbalance, and IBM Storage Ceph will also complain if there is a placement group
imbalance across OSDs. The Red Hat placement group calculator can be found here:

https://access.redhat.com/labs/cephpgc/manual/

The placement group autoscaler is an excellent way to automatically manage placement groups in
your Ceph cluster. Based on expected pool usage and tunings set by the user, the autoscaler can
make recommendations and adjust the number of placement groups in the cluster.

For a POC or test environment, with a limited number of OSDs per storage node, you will most likely
hit the limit of the optimal number of PGs per OSD. The default maximum is set to 300 (though this
can be adjusted). When using multiple data pools for storing objects, you need to ensure that you
balance the number of placement groups per pool with the number of placement groups per OSD so
that you arrive at a reasonable total number of placement groups. The aim is to achieve reasonably
low variance per OSD without taxing system resources or making the peering process too slow. It is
recommended to use the PG calculator to work out the optimal number of PGs per pool (as opposed
to using the autoscaler) so that you don’t over-burden your Ceph cluster with too many PGs per OSD.
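
A brief sketch of the related commands, using a hypothetical pool named rbdpool:

$ ceph osd pool autoscale-status                    # review the autoscaler's view of every pool
$ ceph osd pool set rbdpool pg_autoscale_mode off   # disable autoscaling for a specific pool
$ ceph osd pool set rbdpool pg_num 128              # apply the PG count worked out with the PG calculator
$ ceph osd pool get rbdpool pg_num                  # confirm the change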


Using IBM Storage Modeller (StorM) to Size an IBM Storage Ceph Cluster
IBMers and Business Partners should be familiar with IBM StorM. This is the primary tool that is used
to size most of our storage solutions. StorM has support for IBM Storage Ceph and can be used to size
a Ceph solution based on IBM Storage Ready nodes.

A detailed description of the IBM Storage Ready Nodes for IBM Storage Ceph can be found here:

https://www.ibm.com/downloads/documents/us-en/107a02e95bc8f6bd

And IBM Storage Modeller is accessible via the following URL:

https://www.ibm.com/tools/storage-modeller

As a quick example, let’s size an Entry configuration with a required usable capacity of 125TB on
StorM. First, you need to add IBM Storage Ceph to your project.

Figure 14: IBM StorM Product Selection

Next, we choose the Solution Group and Site (refer to IBM StorM Help for more information on these
constructs).

Figure 15: IBM StorM Product Selection – Solution Group and Site


IBM StorM allows you to choose the type of Storage Ready Node and also specify some details about
the expected use case. As you can see from the drop-down list below, there are a few pre-defined use
cases (each with recommended options for data redundancy).

Figure 16: IBM StorM Ceph Modelling Use Case Selection

In this example, we want an Entry configuration with a usable capacity of 125TB using a 2+2 erasure
coding data protection scheme (refer to the IBM StorM help for more information on the different use
cases). As you can see, using a 2+2 EC data protection scheme, StorM recommends 5 nodes (1 node
is added to ensure node redundancy and optimal performance even with a single node failure). IBM
StorM calculates the cluster usable capacity and, depending on the use case, will also calculate the
expected performance throughput. It is also useful to get the RAW capacity values for the proposed
solution so that you can determine the cluster’s licensing requirement. Note, IBM Storage Ceph is
licensed based on RAW capacity.

Figure 17: IBM StorM Ceph Modelling Result

If you don’t have access to IBM StorM, you can also refer to these public sources to help you calculate
the usable capacity for your Ceph cluster.


https://access.redhat.com/solutions/6980916
https://www.virtualizationhowto.com/2024/09/ceph-storage-calculator-to-find-capacity-and-cost/
https://bennetgallein.de/tools/ceph-calculator

Obtaining a 60-day Trial License for IBM Storage Ceph Pro Edition
For a POC or test environment, you can get access to a trial version of IBM Storage Ceph Pro Edition
(including RHEL) using the procedure documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=installing-pro-edition-free

Deploying an IBM Storage Ceph 4-node Cluster

It helps a great deal to have command output to compare against when trying to troubleshoot issues or
validate proper execution of commands. This document therefore includes, where possible, the
output of executed commands from the lab environment.

For the initial cluster deployment, we will use 4 storage nodes deployed as virtual machines with 2
OSDs per node. For some of the advanced functions like replication and multi-cluster management,
single node IBM Storage Ceph clusters were deployed to save on lab resources.

Figure 18: IBM Storage Ceph Test Cluster

All the IBM Storage Ceph cluster nodes were installed using RHEL 9.4. Whilst not strictly required in
a test or POC environment, it is good practice to separate internal OSD traffic onto a dedicated network
for performance and security reasons (see
https://www.ibm.com/docs/en/storage-ceph/7.1?topic=configuration-network-ceph). It is also
important to ensure that time synchronization is enabled on, at the very least, all Ceph Monitor hosts.
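
A quick way to confirm time synchronization across all nodes (a sketch assuming chrony, the RHEL
default, and the cephdsh alias described later in this section):

$ cephdsh chronyc tracking   # show the current time source and offset on each node
$ cephdsh timedatectl        # check that each node reports its system clock as synchronized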


Try to size your nodes symmetrically, from resources to the number and size of the drives (ideally 3-5
nodes and 12 OSDs per node). Whilst Ceph can cope with unbalanced nodes, optimal performance is
easier to achieve with a symmetrical configuration (e.g. Ceph weights each disk, so larger disks get a
higher weighting and therefore more I/O will be directed to them than to smaller ones). Also, whilst a
3-node cluster is viable for a POC, consider performance and capacity when a single node fails. With
4 nodes, Ceph can rebuild and recover onto the free space of the remaining 3 nodes (using 3x
replication) and be ready to deal with another failure. With 3 nodes, there is nowhere to recover to and
another failure will cause an outage. Also consider cluster quorum: if you want to survive the failure of
2 nodes, then you need at least a 5-node cluster, and so forth.

IBM Storage Ceph container images are obtained via the IBM Container Registry (ICR). For the initial
deployment, you need internet access from your cluster nodes to pull the required container images.
Most organizations would require firewall rules to be in place to allow access to icr.io on port 443. If this is not
possible, you can create a podman private container registry with the required IBM Storage Ceph
container images to perform a disconnected install. You would need at least one node however to be
able to access the IBM Container Registry to pull the required images. Instructions on how to setup a
private container registry can be found at the link below:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=installation-configuring-private-registry-disconnected

The disconnected install process is described in detail in the link below:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=installation-performing-disconnected

Since we need to run similar commands on all the IBM Storage Ceph cluster nodes, it is a good idea
to use tools like ParallelSSH (pssh) to save time during the installation. A good reference on how to
install and set up ParallelSSH can be found here:
https://www.cyberciti.biz/cloud-computing/how-to-use-pssh-parallel-ssh-program-on-linux-unix/. In
our lab environment, I have created an alias called cephdsh which basically issues the pssh command
with the required options, as sketched below.
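
The exact alias used in the lab is not shown; the following is one possible way to define it, assuming a
host list file at /root/ceph-hosts (the file location and option choices are illustrative):

$ cat /root/ceph-hosts
cephnode1
cephnode2
cephnode3
cephnode4
$ alias cephdsh='pssh -i -h /root/ceph-hosts -l root -O StrictHostKeyChecking=no'

The -i option prints each node's output inline under a numbered [n] timestamp [SUCCESS] header,
which is the format seen in the command listings throughout this document.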

Firewall Rules Required for IBM Storage Ceph


In the lab environment described above, each cluster node has two network interfaces (to separate
the internal OSD traffic amongst nodes). This is optional of course. Linux firewalld has multiple zones
predefined, with one zone being the default (this is usually the public zone). In our example setup, we
will add the interface to be used for internal traffic to the internal zone. Though this is not strictly
required in our example setup, as both interfaces will require the same services, it might be prudent in
a production environment to do so and is a key consideration for threat and vulnerability management
(see https://www.ibm.com/docs/en/storage-ceph/7.1?topic=hardening-threat-vulnerability-management).

The official recommendation to harden your IBM Storage Ceph cluster is to use 4 distinct security
zones as described in the link above. Since this would be overkill for a test or POC environment, the
following commands are just to demonstrate the use of two zones (public and internal).

List the pre-defined firewalld zones:

[root@cephnode1 ~]# cephdsh firewall-cmd --get-zones


[1] 23:32:20 [SUCCESS] root@cephnode1
block dmz drop external home internal nm-shared public trusted work
[2] 23:32:20 [SUCCESS] root@cephnode3
block dmz drop external home internal nm-shared public trusted work
[3] 23:32:20 [SUCCESS] root@cephnode4
block dmz drop external home internal nm-shared public trusted work
[4] 23:32:20 [SUCCESS] root@cephnode2
block dmz drop external home internal nm-shared public trusted work
[root@cephnode1 ~]#

Check which zone is set to the default:


[root@cephnode1 ~]# cephdsh firewall-cmd --get-default-zone
[1] 23:33:06 [SUCCESS] root@cephnode1
public
[2] 23:33:06 [SUCCESS] root@cephnode2
public
[3] 23:33:06 [SUCCESS] root@cephnode4
public
[4] 23:33:06 [SUCCESS] root@cephnode3
public
[root@cephnode1 ~]#

Check to see which zones our two interfaces belong too:


[root@cephnode1 ~]# cephdsh firewall-cmd --get-active-zone
[1] 23:29:25 [SUCCESS] root@cephnode1
public
interfaces: ens18 ens19
[2] 23:29:25 [SUCCESS] root@cephnode2
public
interfaces: ens18 ens19
[3] 23:29:25 [SUCCESS] root@cephnode3
public
interfaces: ens18 ens19
[4] 23:29:25 [SUCCESS] root@cephnode4
public
interfaces: ens18 ens19
[root@cephnode1 ~]#

Add the interface we designated for internal traffic to the internal zone with persistence:
[root@cephnode1 ~]# cephdsh firewall-cmd --zone=internal --add-interface=ens19 --permanent
[1] 23:36:03 [SUCCESS] root@cephnode1
The interface is under control of NetworkManager, setting zone to 'internal'.
success
[2] 23:36:03 [SUCCESS] root@cephnode3
The interface is under control of NetworkManager, setting zone to 'internal'.
success
[3] 23:36:03 [SUCCESS] root@cephnode4
The interface is under control of NetworkManager, setting zone to 'internal'.
success
[4] 23:36:03 [SUCCESS] root@cephnode2
The interface is under control of NetworkManager, setting zone to 'internal'.
success
[root@cephnode1 ~]#

Reload the firewalld configuration to pick up the changes:


[root@cephnode1 ~]# cephdsh firewall-cmd --reload
[1] 23:36:56 [SUCCESS] root@cephnode1
success
[2] 23:36:56 [SUCCESS] root@cephnode2
success
[3] 23:36:56 [SUCCESS] root@cephnode4
success
[4] 23:36:56 [SUCCESS] root@cephnode3
Success
[root@cephnode1 ~]#

Confirm our interfaces to be used for internal traffic are assigned to the internal zone:
[root@cephnode1 ~]# cephdsh firewall-cmd --list-interfaces --zone internal
[1] 23:37:17 [SUCCESS] root@cephnode1
ens19
[2] 23:37:17 [SUCCESS] root@cephnode2
ens19
[3] 23:37:17 [SUCCESS] root@cephnode4
ens19
[4] 23:37:17 [SUCCESS] root@cephnode3
ens19
[root@cephnode1 ~]#

Check that NetworkManager has these interfaces set to the correct security zone (if they are not, use
nmcli con modify to change them):

[root@cephnode1 ~]# cephdsh nmcli -p conn show ens19 | grep connection.zone


connection.zone: internal
connection.zone: internal
connection.zone: internal
connection.zone: internal
[root@cephnode1 ~]#

Now that we have set up two security zones, let us check which services are already enabled on each
cluster node:

[root@cephnode1 ~]# cephdsh firewall-cmd --list-services


[1] 23:54:02 [SUCCESS] root@cephnode1
cockpit dhcpv6-client ssh
[2] 23:54:03 [SUCCESS] root@cephnode2
cockpit dhcpv6-client ssh
[3] 23:54:03 [SUCCESS] root@cephnode4
cockpit dhcpv6-client ssh
[4] 23:54:03 [SUCCESS] root@cephnode3
cockpit dhcpv6-client ssh
[root@cephnode1 ~]#

IBM Storage Ceph requires port 3300 (Ceph Monitor) and the port range 6800-7300 for OSDs. If you
want to make use of the Ceph iSCSI gateway (this functionality is deprecated), then ports 3260 and
5000 are also required.

The iSCSI gateway is in maintenance as of November 2022. This means that it is no longer in active
development and will not be updated to add new features.
https://docs.ceph.com/en/reef/rbd/iscsi-overview/

Ceph is already part of firewalld's pre-defined list of available services.

[root@cephnode1 ~]# firewall-cmd --get-services


RH-Satellite-6 RH-Satellite-6-capsule afp amanda-client amanda-k5-client amqp amqps apcupsd
audit ausweisapp2 bacula bacula-client bareos-director bareos-filedaemon bareos-storage bb
bgp bitcoin bitcoin-rpc bitcoin-testnet bitcoin-testnet-rpc bittorrent-lsd ceph ceph-exporter
ceph-mon cfengine checkmk-agent cockpit collectd condor-collector cratedb ctdb dds dds-
multicast dds-unicast dhcp dhcpv6 dhcpv6-client distcc dns dns-over-tls docker-registry
.
.
.
[root@cephnode1 ~]#

Let’s check which ports these services are configured to use.

[root@cephnode1 ~]# firewall-cmd --info-service ceph


ceph
ports: 6800-7300/tcp
protocols:
source-ports:
modules:
destination:
includes:
helpers:
[root@cephnode1 ~]# firewall-cmd --info-service ceph-mon
ceph-mon
ports: 3300/tcp 6789/tcp
protocols:
source-ports:
modules:
destination:
includes:
helpers:
[root@cephnode1 ~]# firewall-cmd --info-service ceph-exporter
ceph-exporter
ports: 9283/tcp
protocols:
source-ports:
modules:
destination:
includes:
helpers:
[root@cephnode1 ~]#

In our setup, Monitors and MDS daemons operate on the public network, while OSDs operate on both
the public and cluster networks. We will now add the required services to the appropriate zones.

First, we add the ceph service to both zones:

[root@cephnode1 ~]# cephdsh firewall-cmd --add-service ceph --zone=internal --permanent


[1] 00:00:44 [SUCCESS] root@cephnode3
success
[2] 00:00:44 [SUCCESS] root@cephnode4
success
[3] 00:00:44 [SUCCESS] root@cephnode2
success
[4] 00:00:44 [SUCCESS] root@cephnode1
success
[root@cephnode1 ~]# cephdsh firewall-cmd --add-service ceph --zone=public --permanent
[1] 00:01:07 [SUCCESS] root@cephnode1
success
[2] 00:01:07 [SUCCESS] root@cephnode4
success
[3] 00:01:07 [SUCCESS] root@cephnode2
success
[4] 00:01:07 [SUCCESS] root@cephnode3
success
[root@cephnode1 ~]#

The Ceph Monitor service is only required in the public zone:


[root@cephnode1 ~]# cephdsh firewall-cmd --add-service ceph-mon --zone=public --permanent
[1] 00:02:41 [SUCCESS] root@cephnode1
success
[2] 00:02:41 [SUCCESS] root@cephnode4
success
[3] 00:02:41 [SUCCESS] root@cephnode3
success
[4] 00:02:41 [SUCCESS] root@cephnode2
success
[root@cephnode1 ~]#

Reload the firewalld configuration to pick up the changes:

[root@cephnode1 ~]# cephdsh firewall-cmd --reload


[1] 00:03:35 [SUCCESS] root@cephnode1
success
[2] 00:03:35 [SUCCESS] root@cephnode3
success
[3] 00:03:35 [SUCCESS] root@cephnode2
success
[4] 00:03:35 [SUCCESS] root@cephnode4
Success
[root@cephnode1 ~]#


Double check that the correct services are enabled:

[root@cephnode1 ~]# cephdsh firewall-cmd --list-services


[1] 00:03:39 [SUCCESS] root@cephnode1
ceph ceph-mon cockpit dhcpv6-client ssh
[2] 00:03:39 [SUCCESS] root@cephnode2
ceph ceph-mon cockpit dhcpv6-client ssh
[3] 00:03:39 [SUCCESS] root@cephnode3
ceph ceph-mon cockpit dhcpv6-client ssh
[4] 00:03:39 [SUCCESS] root@cephnode4
ceph ceph-mon cockpit dhcpv6-client ssh
[root@cephnode1 ~]#
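
The predefined ceph and ceph-mon services do not cover the Ceph Dashboard or the monitoring
stack. If you intend to reach these from outside the cluster, the relevant ports (by default 8443 for the
Dashboard, 3000 for Grafana, 9095 for Prometheus and 9093 for Alertmanager) can be opened
explicitly. A sketch, assuming the default ports are kept:

$ cephdsh firewall-cmd --add-port=8443/tcp --zone=public --permanent   # Ceph Dashboard (SSL)
$ cephdsh firewall-cmd --add-port=3000/tcp --zone=public --permanent   # Grafana
$ cephdsh firewall-cmd --add-port=9095/tcp --zone=public --permanent   # Prometheus
$ cephdsh firewall-cmd --reload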

Storage Configuration
IBM Storage Ceph is meant to run on commodity-based hardware with internal disks (JBOD). Using
external storage (e.g. SAN attached storage) increases the cost of a Ceph deployment and defeats the
purpose of Ceph as it most likely will already provide data protection via RAID (which means that we
are effectively protecting data using Ceph replication or erasure coding then again at the block level
with RAID). A failure at the array level which leads to a degraded RAID array will adversely affect
performance. Another consideration is that capacity will further be reduced (e.g. Ceph 3x replication
and RAID6 data protection). Lastly, the external SAN array is a single point of failure and would negate
Ceph’s effort to ensure data durability and maintain separate fault domains. IBM does not officially
support the use of SAN as backend storage for OSDs (see https://www.ibm.com/docs/en/storage-ceph/7.1?topic=hardware-avoid-using-raid-san-solutions).

In our example lab setup, and most likely for a test or POC environment, you will deploy your IBM
Storage Ceph cluster on virtualized infrastructure (e.g. VMware, KVM or Proxmox). The backing store
for your OSDs will most likely be sourced from a single SAN array. In our lab setup, each of the nodes
is initially configured with 2 x 25GB disks for use by Ceph (more will be added later in this document).

[root@cephnode1 ~]# cephdsh lsblk


[1] 00:24:59 [SUCCESS] root@cephnode1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 49G 0 part
├─rhel_cephnode1-root 253:0 0 44G 0 lvm /
└─rhel_cephnode1-swap 253:1 0 5G 0 lvm [SWAP]
sdb 8:16 0 25G 0 disk
sdc 8:32 0 25G 0 disk
sr0 11:0 1 10.3G 0 rom
[2] 00:24:59 [SUCCESS] root@cephnode4
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 49G 0 part
├─rhel_cephnode1-root 253:0 0 44G 0 lvm /
└─rhel_cephnode1-swap 253:1 0 5G 0 lvm [SWAP]
sdb 8:16 0 25G 0 disk
sdc 8:32 0 25G 0 disk
sr0 11:0 1 10.3G 0 rom
[3] 00:24:59 [SUCCESS] root@cephnode3
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 49G 0 part
├─rhel_cephnode1-root 253:0 0 44G 0 lvm /
└─rhel_cephnode1-swap 253:1 0 5G 0 lvm [SWAP]
sdb 8:16 0 25G 0 disk
sdc 8:32 0 25G 0 disk
sr0 11:0 1 10.3G 0 rom
[4] 00:24:59 [SUCCESS] root@cephnode2
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot


└─sda2 8:2 0 49G 0 part
├─rhel_cephnode1-root 253:0 0 44G 0 lvm /
└─rhel_cephnode1-swap 253:1 0 5G 0 lvm [SWAP]
sdb 8:16 0 25G 0 disk
sdc 8:32 0 25G 0 disk
sr0 11:0 1 10.3G 0 rom
[root@cephnode1 ~]#

Make sure the disks are free (you can use dd to overwrite the partition table if they already contain
one).
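
A sketch of how this could be done, assuming /dev/sdb and /dev/sdc are the disks intended for Ceph
on every node (double-check the device names first, as these commands are destructive):

$ cephdsh "wipefs --all /dev/sdb /dev/sdc"                            # remove any filesystem or partition table signatures
$ cephdsh "dd if=/dev/zero of=/dev/sdb bs=1M count=10 oflag=direct"   # or overwrite the start of each disk with dd
$ cephdsh "dd if=/dev/zero of=/dev/sdc bs=1M count=10 oflag=direct"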

Initial Installation of your IBM Storage Ceph cluster

Refer to https://www.ibm.com/docs/en/storage-ceph/7.1?topic=compatibility-matrix for a list of
supported hardware and software.

Initial installation instructions can be found here:
https://www.ibm.com/docs/en/storage-ceph/7.1?topic=installing-initial-installation

We will make use of the cephadm utility to perform the initial cluster installation. The cephadm utility
deploys and manages a Ceph storage cluster. It is tightly integrated with both the command-line
interface (CLI) and the IBM Storage Ceph Dashboard web interface so that you can manage storage
clusters from either environment. Cephadm uses SSH to connect to hosts from the manager daemon
to add, remove, or update Ceph daemon containers. It does not rely on external configuration or
orchestration tools such as Ansible or Rook. The following is a high-level summary of the installation
steps:

Figure 19: IBM Storage Ceph Cluster Installation Steps

Register your IBM Storage Ceph Cluster Nodes with Red Hat
Register your cluster nodes with subscription-manager register (enter your Red Hat customer portal
credentials when prompted).
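
Because pssh does not handle interactive prompts well, the registration can also be run
non-interactively by passing the credentials on the command line (shown here with placeholder
values; substitute your own Red Hat customer portal login):

$ cephdsh "subscription-manager register --username <rh-username> --password '<rh-password>'"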

Check that all nodes are successfully registered.

[root@cephnode1 ~]# cephdsh subscription-manager status


[1] 00:41:00 [SUCCESS] root@cephnode1
+-------------------------------------------+
System Status Details
+-------------------------------------------+
Overall Status: Disabled
Content Access Mode is set to Simple Content Access. This host has access to content,
regardless of subscription status.

System Purpose Status: Disabled

[2] 00:41:00 [SUCCESS] root@cephnode4


+-------------------------------------------+
System Status Details
+-------------------------------------------+
Overall Status: Disabled
Content Access Mode is set to Simple Content Access. This host has access to content,
regardless of subscription status.

System Purpose Status: Disabled

[3] 00:41:01 [SUCCESS] root@cephnode3


+-------------------------------------------+
System Status Details
+-------------------------------------------+
Overall Status: Disabled
Content Access Mode is set to Simple Content Access. This host has access to content,
regardless of subscription status.

System Purpose Status: Disabled

[4] 00:41:01 [SUCCESS] root@cephnode2


+-------------------------------------------+
System Status Details
+-------------------------------------------+
Overall Status: Disabled
Content Access Mode is set to Simple Content Access. This host has access to content,
regardless of subscription status.

System Purpose Status: Disabled

[root@cephnode1 ~]#

Disable all the default software repositories on all cluster nodes.

[root@cephnode1 ~]# cephdsh subscription-manager repos --disable=*


[1] 00:50:09 [SUCCESS] root@cephnode4
Repository 'osso-1-for-rhel-9-x86_64-files' is disabled for this system.
Repository 'rh-sso-textonly-1-for-middleware-rpms' is disabled for this system.
.
.
.
Repository 'rhel-atomic-7-cdk-3.16-rpms' is disabled for this system.
Repository 'cert-manager-1.11-for-rhel-9-x86_64-debug-rpms' is disabled for this system.
Repository 'rhocp-4.12-for-rhel-9-x86_64-rpms' is disabled for this system.
Repository 'rhceph-7-tools-for-rhel-9-x86_64-debug-rpms' is disabled for this system.
[root@cephnode1 ~]#

Enable the Red Hat Enterprise Linux BaseOS and AppStream repositories on all cluster nodes.

[root@cephnode1 ~]# cephdsh subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms


[1] 00:51:56 [SUCCESS] root@cephnode3
Repository 'rhel-9-for-x86_64-baseos-rpms' is enabled for this system.
[2] 00:51:59 [SUCCESS] root@cephnode4
Repository 'rhel-9-for-x86_64-baseos-rpms' is enabled for this system.
[3] 00:52:04 [SUCCESS] root@cephnode2
Repository 'rhel-9-for-x86_64-baseos-rpms' is enabled for this system.
[4] 00:52:08 [SUCCESS] root@cephnode1
Repository 'rhel-9-for-x86_64-baseos-rpms' is enabled for this system.
[root@cephnode1 ~]# cephdsh subscription-manager repos --enable=rhel-9-for-x86_64-appstream-
rpms
[1] 00:52:52 [SUCCESS] root@cephnode1
Repository 'rhel-9-for-x86_64-appstream-rpms' is enabled for this system.
[2] 00:52:53 [SUCCESS] root@cephnode3
Repository 'rhel-9-for-x86_64-appstream-rpms' is enabled for this system.
[3] 00:53:01 [SUCCESS] root@cephnode4
Repository 'rhel-9-for-x86_64-appstream-rpms' is enabled for this system.
[4] 00:53:06 [SUCCESS] root@cephnode2
Repository 'rhel-9-for-x86_64-appstream-rpms' is enabled for this system.
[root@cephnode1 ~]#


Update all the cluster nodes to the latest RHEL version.

[root@cephnode1 ~]# cephdsh dnf update


[1] 00:54:59 [SUCCESS] root@cephnode1
Updating Subscription Management repositories.
Last metadata expiration check: 0:02:08 ago on Mon 15 Jul 2024 00:52:50.
Dependencies resolved.
Nothing to do.
Complete!
[2] 00:54:59 [SUCCESS] root@cephnode3
Updating Subscription Management repositories.
Last metadata expiration check: 0:02:08 ago on Mon 15 Jul 2024 00:52:50.
Dependencies resolved.
Nothing to do.
Complete!
[3] 00:54:59 [SUCCESS] root@cephnode4
Updating Subscription Management repositories.
Last metadata expiration check: 0:01:59 ago on Mon 15 Jul 2024 00:52:59.
Dependencies resolved.
Nothing to do.
Complete!
[4] 00:54:59 [SUCCESS] root@cephnode2
Updating Subscription Management repositories.
Last metadata expiration check: 0:01:55 ago on Mon 15 Jul 2024 00:53:04.
Dependencies resolved.
Nothing to do.
Complete!
[root@cephnode1 ~]#

Enable the IBM ceph-tools repository on all cluster nodes.

[root@cephnode1 ~]# cephdsh "curl


https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-7-rhel-9.repo |
sudo tee /etc/yum.repos.d/ibm-storage-ceph-7-rhel-9.repo"
[1] 01:23:34 [SUCCESS] root@cephnode1
[ibm-storage-ceph-7]
name = ibm-storage-ceph-7
baseurl = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/7/rhel9/$basearch/
enabled = 1
gpgcheck = 1
gpgkey = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/RPM-GPG-KEY-IBM-CEPH
Stderr: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 245 100 245 0 0 310 0 --:--:-- --:--:-- --:--:-- 310
[2] 01:23:34 [SUCCESS] root@cephnode2
[ibm-storage-ceph-7]
name = ibm-storage-ceph-7
baseurl = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/7/rhel9/$basearch/
enabled = 1
gpgcheck = 1
gpgkey = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/RPM-GPG-KEY-IBM-CEPH
Stderr: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 245 100 245 0 0 320 0 --:--:-- --:--:-- --:--:-- 319
[3] 01:23:34 [SUCCESS] root@cephnode4
[ibm-storage-ceph-7]
name = ibm-storage-ceph-7
baseurl = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/7/rhel9/$basearch/
enabled = 1
gpgcheck = 1
gpgkey = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/RPM-GPG-KEY-IBM-CEPH
Stderr: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 245 100 245 0 0 318 0 --:--:-- --:--:-- --:--:-- 317
[4] 01:23:34 [SUCCESS] root@cephnode3
[ibm-storage-ceph-7]
name = ibm-storage-ceph-7
baseurl = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/7/rhel9/$basearch/
enabled = 1
gpgcheck = 1
gpgkey = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/RPM-GPG-KEY-IBM-CEPH
Stderr: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 245 100 245 0 0 311 0 --:--:-- --:--:-- --:--:-- 311


[root@cephnode1 ~]# cephdsh ls -al /etc/yum.repos.d/ibm-storage-ceph-7-rhel-9.repo


[1] 01:23:52 [SUCCESS] root@cephnode1
-rw-r--r--. 1 root root 245 Jul 15 01:23 /etc/yum.repos.d/ibm-storage-ceph-7-rhel-9.repo
[2] 01:23:52 [SUCCESS] root@cephnode3
-rw-r--r--. 1 root root 245 Jul 15 01:23 /etc/yum.repos.d/ibm-storage-ceph-7-rhel-9.repo
[3] 01:23:52 [SUCCESS] root@cephnode4
-rw-r--r--. 1 root root 245 Jul 15 01:23 /etc/yum.repos.d/ibm-storage-ceph-7-rhel-9.repo
[4] 01:23:52 [SUCCESS] root@cephnode2
-rw-r--r--. 1 root root 245 Jul 15 01:23 /etc/yum.repos.d/ibm-storage-ceph-7-rhel-9.repo
[root@cephnode1 ~]#

Check that the IBM ceph-tools repository has been successfully added to the list of repositories on all cluster nodes.

[root@cephnode1 ~]# cephdsh dnf repolist


[1] 01:24:38 [SUCCESS] root@cephnode1
Updating Subscription Management repositories.
repo id repo name
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[2] 01:24:38 [SUCCESS] root@cephnode2
Updating Subscription Management repositories.
repo id repo name
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[3] 01:24:38 [SUCCESS] root@cephnode3
Updating Subscription Management repositories.
repo id repo name
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[4] 01:24:38 [SUCCESS] root@cephnode4
Updating Subscription Management repositories.
repo id repo name
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[root@cephnode1 ~]#

Add and accept the IBM Storage Ceph license on all cluster nodes.

[root@cephnode1 ~]# cephdsh -t 20 dnf install ibm-storage-ceph-license -y


[1] 01:30:11 [SUCCESS] root@cephnode1
Updating Subscription Management repositories.
Last metadata expiration check: 0:04:30 ago on Mon 15 Jul 2024 01:25:37.
Dependencies resolved.
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
ibm-storage-ceph-license noarch 7-2.el9cp ibm-storage-ceph-7 458 k

Transaction Summary
================================================================================
Install 1 Package

Total download size: 458 k


Installed size: 5.4 M
Downloading Packages:
ibm-storage-ceph-license-7-2.el9cp.noarch.rpm 197 kB/s | 458 kB 00:02
--------------------------------------------------------------------------------
Total 196 kB/s | 458 kB 00:02
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction


Preparing : 1/1
Installing : ibm-storage-ceph-license-7-2.el9cp.noarch 1/1
Running scriptlet: ibm-storage-ceph-license-7-2.el9cp.noarch 1/1
Your licenses have been installed in /usr/share/ibm-storage-ceph-license/L-XSHK-LPQLHG/UTF8/
System locale: en
NOTICE

This document includes License Information documents below for multiple Programs. Each
License Information document identifies the Program(s) to which it applies. Only those
License Information documents for the Program(s) for which Licensee has acquired entitlements
apply.
.
.
.
You can read this license in another language at /usr/share/ibm-storage-ceph-license/L-XSHK-
LPQLHG/UTF8/

To Accept these provisions:


run `sudo touch /usr/share/ibm-storage-ceph-license/accept`

Then proceed with install


[root@cephnode1 ~]#

Verify the license was installed on all cluster nodes.

[root@cephnode1 ~]# cephdsh rpm -qa ibm-storage-ceph-license


[1] 01:31:57 [SUCCESS] root@cephnode1
ibm-storage-ceph-license-7-2.el9cp.noarch
[2] 01:31:57 [SUCCESS] root@cephnode2
ibm-storage-ceph-license-7-2.el9cp.noarch
[3] 01:31:57 [SUCCESS] root@cephnode3
ibm-storage-ceph-license-7-2.el9cp.noarch
[4] 01:31:58 [SUCCESS] root@cephnode4
ibm-storage-ceph-license-7-2.el9cp.noarch
[root@cephnode1 ~]#

Accept the license provisions on all cluster nodes.

[root@cephnode1 ~]# cephdsh touch /usr/share/ibm-storage-ceph-license/accept


[1] 01:32:32 [SUCCESS] root@cephnode1
[2] 01:32:33 [SUCCESS] root@cephnode2
[3] 01:32:33 [SUCCESS] root@cephnode4
[4] 01:32:33 [SUCCESS] root@cephnode3
[root@cephnode1 ~]# cephdsh ls -l /usr/share/ibm-storage-ceph-license/accept
[1] 01:32:44 [SUCCESS] root@cephnode1
-rw-r--r--. 1 root root 0 Jul 15 01:32 /usr/share/ibm-storage-ceph-license/accept
[2] 01:32:44 [SUCCESS] root@cephnode2
-rw-r--r--. 1 root root 0 Jul 15 01:32 /usr/share/ibm-storage-ceph-license/accept
[3] 01:32:44 [SUCCESS] root@cephnode4
-rw-r--r--. 1 root root 0 Jul 15 01:32 /usr/share/ibm-storage-ceph-license/accept
[4] 01:32:44 [SUCCESS] root@cephnode3
-rw-r--r--. 1 root root 0 Jul 15 01:32 /usr/share/ibm-storage-ceph-license/accept
[root@cephnode1 ~]#

Install cephadm-ansible on the Ansible admin node (in our case, this is cephnode1).

[root@cephnode1 ~]# dnf install cephadm-ansible


Updating Subscription Management repositories.
Last metadata expiration check: 0:07:42 ago on Mon 15 Jul 2024 01:25:37.
Dependencies resolved.
=============================================================================================
=================================================================
Package Architecture Version
Repository Size
=============================================================================================
=================================================================
Installing:
cephadm-ansible noarch 1:3.2.0-1.el9cp
ibm-storage-ceph-7 31 k
Installing dependencies:
.
.
.


python3-cryptography-36.0.1-4.el9.x86_64 python3-packaging-20.9-5.el9.noarch
python3-ply-3.11-14.el9.noarch
python3-pycparser-2.20-6.el9.noarch python3-pyparsing-2.4.7-9.el9.noarch
python3-resolvelib-0.5.4-5.el9.noarch
sshpass-1.09-4.el9.x86_64

Complete!
[root@cephnode1 ~]#

Configure the Ansible Inventory Location


On our admin node (cephnode1) we will create the Ansible inventory hosts file. This file contains all the hosts that will be part of the Ceph storage cluster. You can list hosts individually in the inventory hosts file, or you can create groups such as [mons], [osds] and [rgws] so that you can target a node or group of nodes when running an Ansible playbook. For a test or POC environment with 4 cluster nodes we don't have dedicated nodes for OSDs, for example, so we will just list all the nodes that will be part of our Ceph cluster.

[root@cephnode1 cephadm-ansible]# cd /usr/share/cephadm-ansible


[root@cephnode1 cephadm-ansible]# mkdir -p inventory/production
[root@cephnode1 cephadm-ansible]# vi inventory/production/hosts
[root@cephnode1 cephadm-ansible]# cat inventory/production/hosts
cephnode2
cephnode3
cephnode4

[admin]
cephnode1
[root@cephnode1 cephadm-ansible]#

Now we need to edit the ansible.cfg file to add the location of our inventory hosts file.

[root@cephnode1 cephadm-ansible]# vi ansible.cfg


[root@cephnode1 cephadm-ansible]# cat ansible.cfg
[defaults]
inventory = ./inventory/production
log_path = $HOME/ansible/ansible.log
library = ./library
module_utils = ./module_utils
roles_path = ./

forks = 20
host_key_checking = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = $HOME/ansible/facts
fact_caching_timeout = 7200
nocows = 1
callback_whitelist = profile_tasks
stdout_callback = yaml
force_valid_group_names = ignore
inject_facts_as_vars = False
retry_files_enabled = False
timeout = 60

[ssh_connection]
control_path = %(directory)s/%%h-%%r-%%p
ssh_args = -o ControlMaster=auto -o ControlPersist=600s
pipelining = True
retries = 10

[root@cephnode1 cephadm-ansible]#

Enabling SSH login as root user on Red Hat Enterprise Linux


We need password-less SSH as root to work from our installer/admin node to all cluster nodes. (If you already set up password-less SSH when configuring Parallel SSH, then you don't need to do this.)


[root@cephnode1 sshd_config.d]# cephdsh 'echo "PermitRootLogin yes" >> /etc/ssh/sshd_config.d/01-permitrootlogin.conf'
[1] 01:56:43 [SUCCESS] root@cephnode1
[2] 01:56:43 [SUCCESS] root@cephnode2
[3] 01:56:43 [SUCCESS] root@cephnode3
[4] 01:56:43 [SUCCESS] root@cephnode4
[root@cephnode1 sshd_config.d]# cat 01-permitrootlogin.conf
PermitRootLogin yes
[root@cephnode1 sshd_config.d]#

[root@cephnode1 sshd_config.d]# cephdsh systemctl restart sshd.service


[1] 01:57:25 [SUCCESS] root@cephnode1
[2] 01:57:26 [SUCCESS] root@cephnode2
[3] 01:57:26 [SUCCESS] root@cephnode3
[4] 01:57:26 [SUCCESS] root@cephnode4
[root@cephnode1 sshd_config.d]#

Test to see if you can ssh to the other cluster nodes without being prompted for a password.

[root@cephnode1 sshd_config.d]# ssh root@cephnode2


Activate the web console with: systemctl enable --now cockpit.socket

Register this system with Red Hat Insights: insights-client --register


Create an account or view all your systems at https://red.ht/insights-dashboard
Last login: Mon Jul 15 01:07:37 2024 from 10.0.0.240
[root@cephnode2 ~]# exit
logout
Connection to cephnode2 closed.
[root@cephnode1 sshd_config.d]#

If you don't want to use root for the installation then you need to create an ansible user with sudo root access on all the cluster nodes and set up password-less access for this user (see https://www.ibm.com/docs/en/storage-ceph/7.1?topic=installation-creating-ansible-user-sudo-access and https://www.ibm.com/docs/en/storage-ceph/7.1?topic=installation-enabling-password-less-ssh-ansible).
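As a rough sketch only (the links above contain the exact procedure), creating such a user, here assumed to be called ceph-admin, might look like this:

[root@cephnode1 ~]# cephdsh "useradd ceph-admin && echo 'ceph-admin ALL=(root) NOPASSWD:ALL' > /etc/sudoers.d/ceph-admin && chmod 0440 /etc/sudoers.d/ceph-admin"
[root@cephnode1 ~]# su - ceph-admin -c "ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519"
[root@cephnode1 ~]# su - ceph-admin -c "ssh-copy-id ceph-admin@cephnode2"    (repeat for each cluster node)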

Run cephadm pre-flight playbook to install all pre-requisites on all cluster nodes
This Ansible playbook configures the Ceph repository and prepares the storage cluster for
bootstrapping. It also installs some prerequisites, such as podman, lvm2, chrony, and cephadm. The
default location for cephadm-ansible and cephadm-preflight.yml is /usr/share/cephadm-ansible. The
preflight playbook uses the cephadm-ansible inventory file to identify the admin node and all other nodes in the storage cluster.

[root@cephnode1 cephadm-ansible]# ansible-playbook -i ./inventory/production/hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm"
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new
standard, use callbacks_enabled instead. This feature will be removed
from ansible-core in version 2.15. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.

PLAY [insecure_registries]
*********************************************************************************************
**************************************

TASK [fail if insecure_registry is undefined]


*********************************************************************************************
*******************
Monday 15 July 2024 02:05:48 +0200 (0:00:00.024) 0:00:00.024 ***********
skipping: [cephnode2]

PLAY [preflight]
*********************************************************************************************
************************************************
.
.


.
PLAY RECAP
*********************************************************************************************
******************************************************
cephnode1 : ok=8 changed=3 unreachable=0 failed=0 skipped=25
rescued=0 ignored=0
cephnode2 : ok=8 changed=3 unreachable=0 failed=0 skipped=29
rescued=0 ignored=0
cephnode3 : ok=8 changed=3 unreachable=0 failed=0 skipped=25
rescued=0 ignored=0
cephnode4 : ok=8 changed=3 unreachable=0 failed=0 skipped=25
rescued=0 ignored=0

Monday 15 July 2024 02:07:16 +0200 (0:00:00.101) 0:01:28.337 ***********


===============================================================================
install ceph-common on rhel -----------------------------------------------------------------
--------------------------------------------------------- 42.80s
install prerequisites packages on servers ---------------------------------------------------
--------------------------------------------------------- 28.88s
remove remaining local services ceph packages -----------------------------------------------
---------------------------------------------------------- 9.84s
configure ceph repository key ---------------------------------------------------------------
---------------------------------------------------------- 2.62s
Gathering Facts -----------------------------------------------------------------------------
---------------------------------------------------------- 1.53s
ensure chronyd is running -------------------------------------------------------------------
---------------------------------------------------------- 0.60s
configure ceph stable repository ------------------------------------------------------------
---------------------------------------------------------- 0.38s
install docker ------------------------------------------------------------------------------
---------------------------------------------------------- 0.13s
add registry as insecure registry in registries.conf ----------------------------------------
---------------------------------------------------------- 0.10s
fail if insecure_registry is undefined ------------------------------------------------------
---------------------------------------------------------- 0.09s
fail if insecure_registry is undefined ------------------------------------------------------
---------------------------------------------------------- 0.09s
uninstall old version packages --------------------------------------------------------------
---------------------------------------------------------- 0.08s
set_fact _ceph_repo -------------------------------------------------------------------------
---------------------------------------------------------- 0.06s
fail if insecure_registry is undefined ------------------------------------------------------
---------------------------------------------------------- 0.06s
configure Ceph custom repositories ----------------------------------------------------------
---------------------------------------------------------- 0.06s
remove ceph_stable repositories -------------------------------------------------------------
---------------------------------------------------------- 0.06s
install prerequisites packages on clients ---------------------------------------------------
---------------------------------------------------------- 0.06s
setup custom repositories -------------------------------------------------------------------
---------------------------------------------------------- 0.05s
fail if baseurl is not defined for ceph_custom_repositories ---------------------------------
---------------------------------------------------------- 0.05s
enable red hat ceph storage tools repository ------------------------------------------------
---------------------------------------------------------- 0.05s
[root@cephnode1 cephadm-ansible]#

Make sure in the Play Recap none of the tasks failed. Resolve any issues and re-run the playbook as
required.

Bootstrapping a new storage cluster

We are now ready to bootstrap our IBM Storage Ceph cluster using the cephadm utility. There are
some important pre-requisites that you need to be aware of that are documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=installation-bootstrapping-new-storage-cluster.


Access to container images is necessary for the successful deployment of IBM Storage Ceph. Container images are hosted on the IBM Cloud Container Registry (ICR). To obtain an entitlement key for the registry, navigate to the following URL and log in with your IBM ID and password: https://myibm.ibm.com/products-services/containerlibrary.

Figure 20: IBM Cloud Container Registry

Once you login, generate a new entitlement key by clicking on “Add new key”:

Figure 21: IBM Cloud Container Registry – Adding an entitlement key

Once you have obtained the entitlement key, you need to create a JSON file with this key (see https://www.ibm.com/docs/en/storage-ceph/7.1?topic=cluster-using-json-file-protect-login-information). Note, the login username is always "cp", not your IBM ID. The format of the JSON file is as follows:

[root@cephnode1 ~]# cat /etc/registry.json


{
"url":"cp.icr.io/cp",
"username":"cp",
"password":"<YOUR ENTITLEMENT KEY OBTAINED FROM THE ICR IN THE STEP ABOVE PASTED HERE>"
}
[root@cephnode1 ~]#

You can test access to the IBM Cloud Container Registry by issuing the podman login cp.icr.io command. Log in using cp as the username and your entitlement key as the password. You can also test access to the IBM Storage Ceph container images using the following command: skopeo list-tags docker://cp.icr.io/cp/ibm-ceph/ceph-7-rhel9.
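For example (the entitlement key is supplied at the password prompt):

[root@cephnode1 ~]# podman login cp.icr.io
Username: cp
Password: <your entitlement key>
[root@cephnode1 ~]# skopeo list-tags docker://cp.icr.io/cp/ibm-ceph/ceph-7-rhel9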

For experienced Ceph users, you can bootstrap your cluster using a service configuration file. The service configuration file is a YAML file that contains the service type, placement, and designated nodes for the services that you want to deploy in your cluster. Since we want to take the easy route (deploy a single-node cluster and then use the Ceph Dashboard wizard to add the remaining cluster nodes and deploy services), we won't use a service configuration file.
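For illustration only, a minimal service configuration file for a deployment like ours might look roughly as follows (the host address and labels are assumptions based on our lab layout); such a file could then be passed to cephadm bootstrap with --apply-spec:

service_type: host
hostname: cephnode2.local
addr: 10.0.0.241
labels:
  - mon
  - osd
---
service_type: mon
placement:
  label: mon
---
service_type: osd
service_id: default_osds
placement:
  label: osd
spec:
  data_devices:
    all: true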

Remember, in our lab setup we want to separate OSD traffic to a private network. The public network
is extrapolated from the --mon-ip parameter that is provided by the bootstrap command. The cluster
network can be provided during the bootstrap operation by using the --cluster-network parameter. If
the --cluster-network parameter is not specified, it is set to the same value as the public network
value.

A full list of the available options for cephadm is available here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=cluster-bootstrap-command-options.

In our example, we want to use FQDNs and, since we saved our ICR login details in a JSON file, we also need to specify it. If you don't have a private network for OSD traffic then you don't need to specify the --cluster-network parameter.

[root@cephnode1 ~]# cephadm bootstrap --cluster-network 192.168.1.0/24 --mon-ip 10.0.0.240 --allow-fqdn-hostname --registry-json /etc/registry.json
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.9.4 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: e7fcc1ac-42ec-11ef-a58f-bc241172f341
Verifying IP 10.0.0.240 port 3300 ...
Verifying IP 10.0.0.240 port 6789 ...
Mon IP `10.0.0.240` is in CIDR network `10.0.0.0/8`
Mon IP `10.0.0.240` is in CIDR network `10.0.0.0/8`
Pulling custom registry login info from /etc/registry.json.
Logging into custom registry.
Pulling container image cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest...
Ceph version: ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
firewalld ready
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting public_network to 10.0.0.0/8 in mon config section
Setting cluster_network to 192.168.1.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 0.0.0.0:9283 ...
Verifying port 0.0.0.0:8765 ...
Verifying port 0.0.0.0:8443 ...
firewalld ready
firewalld ready
Enabling firewalld port 9283/tcp in current zone...
Enabling firewalld port 8765/tcp in current zone...
Enabling firewalld port 8443/tcp in current zone...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr is available


Enabling cephadm module...


Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host cephnode1.local...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying ceph-exporter service with default placement...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for mgr epoch 9...
mgr epoch 9 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
firewalld ready
Ceph Dashboard is now available at:

URL: https://cephnode1.local:8443/
User: admin
Password: 3cb37m2xt6

Enabling client.admin keyring and conf on hosts with "admin" label


Saving cluster configuration to /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-bc241172f341/config
directory
Skipping call home integration. --enable-ibm-call-home not provided
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:

sudo /usr/sbin/cephadm shell --fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341 -c


/etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Or, if you are only running a single cluster on this host:

sudo /usr/sbin/cephadm shell

Please consider enabling telemetry to help improve Ceph:

ceph telemetry on

For more information see:

https://docs.ceph.com/en/latest/mgr/telemetry/

Bootstrap complete.
[root@cephnode1 ~]#

Note the URL, port and username/password that is output from the bootstrap command.

Consider contributing to the Ceph project by enabling telemetry. The telemetry module sends anonymous data about the cluster back to the Ceph developers to help them understand how Ceph is used and what problems users may be experiencing.

This data is visualized on public dashboards that allow the community to quickly see summary
statistics on how many clusters are reporting, their total capacity and OSD count, and version
distribution trends (see https://telemetry-public.ceph.com/).
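If you decide to enable it, a sketch of the relevant commands (run from the cephadm shell; recent releases ask you to accept the data-sharing license when switching telemetry on) is:

[ceph: root@cephnode1 /]# ceph telemetry preview
[ceph: root@cephnode1 /]# ceph telemetry on --license sharing-1-0
[ceph: root@cephnode1 /]# ceph telemetry status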


Figure 22: Ceph Public Telemetry Dashboard https://telemetry-public.ceph.com

Distributing Ceph Cluster SSH keys to all nodes

We need to distribute the Ceph cluster public SSH key to all nodes in the cluster. We will use the
cephadm-distribute-ssh-key.yml playbook to distribute the SSH keys instead of creating and
distributing the keys manually.

[root@cephnode1 cephadm-ansible]# ansible-playbook -i ./inventory/production/hosts cephadm-distribute-ssh-key.yml -e cephadm_ssh_user=root -e admin_node=cephnode1.local
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new
standard, use callbacks_enabled instead. This feature will be removed from ansible-core in
version 2.15. Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg.

PLAY [all]
*********************************************************************************************
****************************************************************************

TASK [fail if admin_node is not defined]


*********************************************************************************************
**********************************************
Thursday 18 July 2024 22:28:13 +0200 (0:00:00.031) 0:00:00.031 *********
skipping: [cephnode2]
.
.
.
PLAY RECAP
*********************************************************************************************
****************************************************************************
cephnode1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0
rescued=0 ignored=0
cephnode2 : ok=3 changed=1 unreachable=0 failed=0 skipped=3
rescued=0 ignored=0
cephnode3 : ok=1 changed=1 unreachable=0 failed=0 skipped=0
rescued=0 ignored=0
cephnode4 : ok=1 changed=1 unreachable=0 failed=0 skipped=0
rescued=0 ignored=0

Thursday 18 July 2024 22:28:17 +0200 (0:00:01.512) 0:00:04.200 *********


===============================================================================
get the cephadm ssh pub key -----------------------------------------------------------------
-------------------------------------------------------------------------------- 1.72s
set cephadm ssh user to root ----------------------------------------------------------------
-------------------------------------------------------------------------------- 1.51s
allow ssh public key for root account -------------------------------------------------------
-------------------------------------------------------------------------------- 0.86s
fail if admin_node is not defined -----------------------------------------------------------
-------------------------------------------------------------------------------- 0.02s


fail if {{ cephadm_pubkey_path }} doesn't exist ---------------------------------------------


-------------------------------------------------------------------------------- 0.02s
get details about {{ cephadm_pubkey_path }} -------------------------------------------------
-------------------------------------------------------------------------------- 0.02s
[root@cephnode1 cephadm-ansible]#

Verifying the cluster installation


After the bootstrap process is complete, we should have a functioning IBM Storage Ceph single node
cluster. We can use the cephadm shell to query the details of our storage cluster.

[root@cephnode1 ~]# cephadm shell


Inferring fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
Inferring config /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-bc241172f341/mon.cephnode1/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@cephnode1 /]# ceph -s
cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3

services:
mon: 1 daemons, quorum cephnode1 (age 7m)
mgr: cephnode1.tbqyke(active, since 4m)
osd: 0 osds: 0 up, 0 in

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:

[ceph: root@cephnode1 /]#

We only have our admin node added to the cluster for now.
[ceph: root@cephnode1 /]# ceph orch host ls
HOST ADDR LABELS STATUS
cephnode1.local 10.0.0.240 _admin
1 hosts in cluster
[ceph: root@cephnode1 /]#

If you want to add node labels, you can specify them from the command line as follows:
[ceph: root@cephnode1 /]# ceph orch host label add cephnode1.local mon
Added label mon to host cephnode1.local
[ceph: root@cephnode1 /]# ceph orch host label add cephnode1.local mgr
Added label mgr to host cephnode1.local
[ceph: root@cephnode1 /]# ceph orch host label add cephnode1.local osd
Added label osd to host cephnode1.local
[ceph: root@cephnode1 /]# ceph orch host ls
HOST ADDR LABELS STATUS
cephnode1.local 10.0.0.240 _admin,mon,mgr,osd
1 hosts in cluster
[ceph: root@cephnode1 /]#

We can also label nodes in the Ceph Dashboard GUI. Labelling nodes makes it easier to deploy Ceph
services (e.g. deploy RGW to all nodes labelled with RGW).
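For instance, once labels are in place a service can be targeted at them with a placement expression; the following dry-run is shown purely for illustration (we deploy RGW through the dashboard later in this guide):

[ceph: root@cephnode1 /]# ceph orch apply rgw rgw_default --placement="label:rgw" --port=8080 --dry-run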

We can also configure all available disks as OSDs from the command line (or we can do this via the
Ceph Dashboard GUI when adding additional cluster nodes). If you prefer to do this from the command
line, you can do so as follows:
[ceph: root@cephnode1 /]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
[ceph: root@cephnode1 /]#


You can see that the two available disks were automatically configured for us.
[ceph: root@cephnode1 /]# ceph -s
cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
OSD count 2 < osd_pool_default_size 3

services:
mon: 1 daemons, quorum cephnode1 (age 37m)
mgr: cephnode1.tbqyke(active, since 35m)
osd: 2 osds: 2 up (since 18s), 2 in (since 32s)

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 453 MiB used, 50 GiB / 50 GiB avail
pgs:

[ceph: root@cephnode1 /]#

Logging into the Ceph Dashboard and Adding Cluster Nodes

A Ceph cluster typically has at least three manager (MGR) daemons, which are responsible, among other things, for providing the Ceph Dashboard web-based GUI (they also collect and distribute statistics and perform rebalancing and other cluster tasks). You would ideally deploy MGR and MON daemons on the same nodes to achieve the same level of availability. A Ceph cluster will typically have one active and two standby managers. Unlike the MON daemons, there is no requirement for MGR daemons to maintain quorum, and the cluster can tolerate the loss of all three managers (you can simply start a MGR daemon on one of the remaining nodes, for example). When you bootstrap your cluster, the dashboard URL (the admin node on port 8443) and the login username and password are provided. The first time you connect to the Ceph Dashboard, you need to accept the security risk and continue (because Ceph uses a self-signed certificate). Log in with the admin username and password you previously recorded from the bootstrap process.

You can reset the dashboard password from the command line.
https://www.ibm.com/docs/en/storage-ceph/7.1?topic=ia-changing-ceph-dashboard-password-using-command-line-interface.
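A sketch of that procedure, assuming the new password is first written to a temporary file, looks like this:

[ceph: root@cephnode1 /]# echo -n 'MyNewPassw0rd!' > /tmp/dashboard_password.txt
[ceph: root@cephnode1 /]# ceph dashboard ac-user-set-password admin -i /tmp/dashboard_password.txt
[ceph: root@cephnode1 /]# rm /tmp/dashboard_password.txt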

Before we add additional cluster nodes, remember that for the minimum recommended 4-node cluster we want to deploy the following Ceph daemons, keeping in mind the service collocation guidelines. (Note: Ceph services are logical groups of Ceph daemons that are configured to run together in a Ceph cluster. Ceph daemons are individual processes that run in the background when a Ceph service starts.)

Figure 23: Recommended Ceph service collocation for a 4-node cluster


Figure 24: Ceph Dashboard Login Page

You will initially be prompted to change the current password. Once you have changed the password, you need to add the additional nodes to your Ceph cluster via the "Expand Cluster" wizard. Note that you will see alerts prompting you to enable public telemetry and IBM Call Home (again, Call Home is one of the reasons why you would choose IBM Storage Ceph over the publicly available version).

Figure 25: Ceph expand your cluster wizard after initial login

As you can see, all the Ceph core services (blue) are deployed on our admin/install node. You can also see the node labels (black). As mentioned earlier, when you deploy Ceph services, you can specify deployment based on labels instead of specifying the exact nodes. Lastly, you can see a summary of the resources for each node.


Figure 26: Ceph expand your cluster wizard – Add Hosts

We need to add the 3 other cluster nodes to the cluster. Click on "+ Add" to add your nodes. You need to enter each node's fully qualified hostname (since we are using FQDNs, as specified during bootstrap). You can see that some default labels are automatically added to the node.

Figure 27: Ceph expand your cluster wizard – Adding your cluster nodes

Add the rest of the nodes and, before you click "Next", check that the labels match the desired service collocation for a 4-node cluster. If they do not, select the node and click "Edit" to correct its labels.


Figure 28: Ceph expand your cluster wizard – Checking Node Labels

Once you have completed the node labels, you can click “Next”.

Figure 29: Ceph expand your cluster wizard – Recommended service collocation for a 4-node cluster

On the next screen you can create OSDs and enable data-at-rest encryption (DAE, discussed later in this document). For now, since we already configured OSDs on all available devices from the CLI (equivalent to the Cost/Capacity Optimized option), we don't have to choose anything. For a detailed explanation of all the available options for OSD creation (including the Advanced Mode options) see here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=osds-managing.


Figure 30: Ceph expand your cluster wizard – OSD Creation

Clicking on “Next” takes us to the “Create Services” step. We will deploy protocols separately so for
now just accept the default mandatory cluster services and their respective counts. As an example,
the Grafana service will be deployed on a cluster node that is tagged with the correct label (or if no
label is specified it will be deployed on any of the manager nodes).

Figure 31: Ceph expand your cluster wizard – Create Services

Select “Next” and you will have a chance to review your configuration and accept it.


Figure 32: Ceph expand your cluster wizard – Review

Depending on the size of your cluster, the GUI will display some warnings as the additional nodes and services are added. Once completed, you should have a stable, healthy cluster.

Figure 33: Ceph Cluster Dashboard – Healthy Status

Check your cluster inventory to make sure you have the correct number of deployed services. In our example, the Expand Cluster wizard deployed 4 monitors.


Figure 34: Ceph Cluster Dashboard – Inventory

We only want 3 monitors. To correct this, navigate to Administration -> Services, select the mon service and edit it, changing the count to 3.
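If you prefer the command line, the same change can be made with the orchestrator, for example by pinning three monitors to the nodes labelled mon:

[ceph: root@cephnode1 /]# ceph orch apply mon --placement="3 label:mon"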

Figure 35: Ceph Edit Service Count

If you expand the service, you will see which cluster nodes the service is deployed on.


Figure 36: Ceph Service Information

As a reference, our example 4-node cluster has the following services and node labels deployed.

Figure 37: Ceph 4-node cluster services and node labels

If you navigate to Cluster -> Pools, you will see that a default .mgr pool is created with a 3x replica
protection scheme.


Figure 38: Ceph Cluster Pools

If you navigate to Cluster -> Hosts, you can check the status and get information on all your cluster
nodes.

Figure 39: Ceph Cluster Hosts

If you navigate to Cluster -> OSDs, you can check the status and get information on all your cluster
OSDs.


Figure 40: Ceph Cluster OSDs

If you navigate to Cluster -> Physical Disks, you can get information on all your cluster physical disks.

Figure 41: Ceph Cluster Physical Disks

If you navigate to Cluster -> CRUSH map, you can check the current CRUSH map hierarchy. As
explained earlier, our CRUSH map failure domains in our example are hosts (each replica will reside
on a separate host).
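The same hierarchy and rule can be inspected from the command line, for example:

[ceph: root@cephnode1 /]# ceph osd tree
[ceph: root@cephnode1 /]# ceph osd crush rule dump replicated_rule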


Figure 42: Ceph Cluster CRUSH map

Lastly, we can view the status of the monitor (MON) daemons. Click on Cluster -> Monitors. The Ceph cluster quorum is displayed here. With 3 monitors deployed, we need at least 2 MONs in quorum for the cluster to remain active.
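The same information is available from the command line, for example:

[ceph: root@cephnode1 /]# ceph mon stat
[ceph: root@cephnode1 /]# ceph quorum_status --format json-pretty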

Figure 43: Ceph Cluster Monitors

Now that you have a working IBM Storage Ceph cluster, we will now start to deploy the different
protocols.

IBM Storage Ceph RADOS Gateway (RGW) Deployment


Ceph Object Gateway, also known as RADOS Gateway (RGW), is an object storage interface built on top of the librados library to provide applications with a RESTful gateway to Ceph storage clusters. We will start by deploying two RGWs and then convert them to a highly available configuration. Later, we will demonstrate a multi-site configuration. Comprehensive instructions on deploying RGWs can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=gateway-basic-configuration

Before we start, there are a few concepts that we need to understand. The Ceph Object Gateway
typically supports a single site or multi-site deployment. In order to support a multi-site configuration,
Ceph makes use of realms, zonegroups (formerly called regions) and zones. A realm represents a
globally unique namespace consisting of one or more zonegroups containing one or more zones with
each zone supported by one or more rgw instances and backed by a single Ceph storage cluster. A
single zone contains buckets, which in turn contain objects. A realm enables the Ceph Object Gateway
to support multiple namespaces and their configuration on the same hardware.
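Although we will create these objects through the dashboard below, the equivalent radosgw-admin commands look roughly like the following; the realm name is an assumption, while the zonegroup and zone names match the ones used later in this guide:

[ceph: root@cephnode1 /]# radosgw-admin realm create --rgw-realm=LAB_REALM --default
[ceph: root@cephnode1 /]# radosgw-admin zonegroup create --rgw-zonegroup=LAB_ZONE_GROUP1 --master --default
[ceph: root@cephnode1 /]# radosgw-admin zone create --rgw-zonegroup=LAB_ZONE_GROUP1 --rgw-zone=DC1_ZONE --master --default
[ceph: root@cephnode1 /]# radosgw-admin period update --commit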

Figure 44: Ceph Object Gateway Realm (https://www.ibm.com/docs/en/storage-ceph/7.1?topic=gateway-multi-site-configuration-administration)

For our purposes, we want to first deploy a single site configuration. To do this, we need a single
zonegroup which contains one zone with one or more Ceph RGW instances. You can specify the realm,
zonegroup and zone when creating a RGW service or accept the default. Navigate to Administration -
> Services -> Create and select RGW.


Figure 45: Ceph RGW service

You have to specify the service id (in our case it's rgw_default) and also the placement of the service (we will use a label, since we already added the required rgw labels to each Ceph cluster node). To support a multi-site configuration, you can create your own realm, zonegroup and zone names (or accept the defaults). If you are not planning to test a multi-site configuration then you can just accept the defaults.

Figure 46: Ceph RGW service – Create Realm, Zonegroup and Zone

Finally, you need to specify the port on which the RGW service will run, which in our case is port 8080.


Figure 47: Ceph RGW service Inputs

You can verify if the RGW instances are deployed after clicking on “Create Service”.

Figure 48: Ceph RGW instances

You will notice that a set of default pools is created when you deploy the RGW service. The .rgw.root pool is where the configuration for the Ceph Object Gateway (RGW) is stored. This includes information such as realms, zonegroups, and zones. In addition, 3 pools are created per zone, as illustrated below.


Figure 49: Ceph RGW service default pools

One of the advantages of containerized Ceph is that firewall rules are automatically updated on the relevant Ceph cluster nodes for each service that is deployed. As an example, if we try to enable port 8080 ourselves, we see that it is already enabled on our RGW nodes.

[root@cephnode1 ~]# cephdsh firewall-cmd --zone=public --add-port=8080/tcp --permanent


[1] 22:33:46 [SUCCESS] root@cephnode4
success
[2] 22:33:46 [SUCCESS] root@cephnode3
success
Stderr: Warning: ALREADY_ENABLED: 8080:tcp
[3] 22:33:46 [SUCCESS] root@cephnode2
success
Stderr: Warning: ALREADY_ENABLED: 8080:tcp
[4] 22:33:46 [SUCCESS] root@cephnode1
success
[root@cephnode1 ~]#

You can also check the status of the RGWs from the command line as follows:

[ceph: root@cephnode1 /]# ceph orch ps --daemon_type=rgw


NAME HOST PORTS STATUS REFRESHED AGE MEM
USE MEM LIM VERSION IMAGE ID CONTAINER ID
rgw.rgw_default.cephnode2.oyesnm cephnode2.local *:8080 running (37m) 6m ago 37m
88.8M - 18.2.1-194.el9cp a09ffce67935 7f393ec48fea
rgw.rgw_default.cephnode3.lovrlw cephnode3.local *:8080 running (37m) 6m ago 37m
89.0M - 18.2.1-194.el9cp a09ffce67935 a68afda7db70
[ceph: root@cephnode1 /]#

Now that we have deployed the RGWs, we can manage this service from the Ceph dashboard.


Figure 50: Ceph Dashboard Object Protocol

You can use the radosgw-admin command to query the current RGW service configuration from the command line. As an example, we will query the zonegroup and zone configuration for our deployment, as depicted below:

[ceph: root@cephnode1 /]# radosgw-admin zonegroup list


{
"default_info": "ace3190f-c02e-40e2-abdd-344efc6ab06c",
"zonegroups": [
"LAB_ZONE_GROUP1"
]
}
[ceph: root@cephnode1 /]# radosgw-admin zonegroup get --rgw-zonegroup="LAB_ZONE_GROUP1"
{
"id": "ace3190f-c02e-40e2-abdd-344efc6ab06c",
"name": "LAB_ZONE_GROUP1",
"api_name": "LAB_ZONE_GROUP1",
"is_master": true,
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "7109acc2-883b-4502-b863-4e7097d7d83c",
"zones": [
{
"id": "7109acc2-883b-4502-b863-4e7097d7d83c",
"name": "DC1_ZONE",
"endpoints": [],
"log_meta": false,
"log_data": false,
"bucket_index_max_shards": 11,
"read_only": false,
"tier_type": "",
"sync_from_all": true,
"sync_from": [],
"redirect_zone": "",
"supported_features": [
"compress-encrypted",
"resharding"
]
}
],
"placement_targets": [
{
"name": "default-placement",


"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "a9c5b73e-66bd-4e10-9a2f-5df2f6fc515a",
"sync_policy": {
"groups": []
},
"enabled_features": [
"resharding"
]
}
[ceph: root@cephnode1 /]# radosgw-admin zone get
{
"id": "7109acc2-883b-4502-b863-4e7097d7d83c",
"name": "DC1_ZONE",
"domain_root": "DC1_ZONE.rgw.meta:root",
"control_pool": "DC1_ZONE.rgw.control",
"gc_pool": "DC1_ZONE.rgw.log:gc",
"lc_pool": "DC1_ZONE.rgw.log:lc",
"log_pool": "DC1_ZONE.rgw.log",
"intent_log_pool": "DC1_ZONE.rgw.log:intent",
"usage_log_pool": "DC1_ZONE.rgw.log:usage",
"roles_pool": "DC1_ZONE.rgw.meta:roles",
"reshard_pool": "DC1_ZONE.rgw.log:reshard",
"user_keys_pool": "DC1_ZONE.rgw.meta:users.keys",
"user_email_pool": "DC1_ZONE.rgw.meta:users.email",
"user_swift_pool": "DC1_ZONE.rgw.meta:users.swift",
"user_uid_pool": "DC1_ZONE.rgw.meta:users.uid",
"otp_pool": "DC1_ZONE.rgw.otp",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "DC1_ZONE.rgw.buckets.index",
"storage_classes": {
"STANDARD": {
"data_pool": "DC1_ZONE.rgw.buckets.data"
}
},
"data_extra_pool": "DC1_ZONE.rgw.buckets.non-ec",
"index_type": 0,
"inline_data": true
}
}
],
"realm_id": "a9c5b73e-66bd-4e10-9a2f-5df2f6fc515a",
"notif_pool": "DC1_ZONE.rgw.log:notif"
}
[ceph: root@cephnode1 /]#

Note the placement target for our zonegroup is set to default_placement. Placement targets control
which pools are associated with a bucket and cannot be modified once a bucket is created. Storage
classes specify the placement of object data. S3 Bucket Lifecycle (LC) rules can automate the
transition of objects between storage classes. Storage classes are defined in terms of placement
targets. Each zonegroup placement target lists its available storage classes with an initial class named
STANDARD. The zone configuration is responsible for providing a data_pool pool name for each of the
zone group’s storage classes. Demonstrating Bucket Lifecycle policy is outside the scope of this
document but it’s important to understand how this works.
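To illustrate the idea (this is not performed in this guide), adding a second storage class to the default placement target could look roughly like this; the COLD class name and its backing data pool are assumptions:

[ceph: root@cephnode1 /]# radosgw-admin zonegroup placement add --rgw-zonegroup LAB_ZONE_GROUP1 --placement-id default-placement --storage-class COLD
[ceph: root@cephnode1 /]# radosgw-admin zone placement add --rgw-zone DC1_ZONE --placement-id default-placement --storage-class COLD --data-pool DC1_ZONE.rgw.cold.data
[ceph: root@cephnode1 /]# radosgw-admin period update --commit

An S3 lifecycle rule on a bucket can then transition objects to the COLD class after a given number of days.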

Before we start using the object service, let us configure high availability for the Ceph object gateways we deployed. Even though we have two RGWs running, a failure of one instance or cluster node will result in all clients using that gateway failing to connect. Also, we would have to statically distribute clients across the two RGWs in order to balance the workload, which is not ideal.
includes a built-in load balancer which is referred to as an ingress service. The ingress service allows
you to create a high availability endpoint for RGW with a minimum set of configuration options. The
orchestrator will deploy and manage a combination of haproxy and keepalived to provide load
balancing on a floating virtual IP. We also want to use SSL for our object gateway service as most
commercial applications require a secure connection to an object store and won’t work over standard
http. This requires SSL termination by the ingress service (and not on the object gateways
themselves).

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=gateway-high-availability-service

Figure 51: Ceph RGW High Availability Architecture (https://www.ibm.com/docs/en/storage-ceph/7.1?topic=gateway-high-availability-service)

Firstly, let us create a self-signed certificate for use by the ingress service. We will use cephs3.local (10.0.0.244) as the virtual IP address for our ingress service. All object clients will connect to the virtual IP and will be load balanced across our two RGW instances automatically. If you are not planning on using SSL, then you can skip this step.

[root@cephnode2 cert]# cat cert.txt


[req]
default_bits = 2048
default_md = sha256
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no

[req_distinguished_name]
C = ZA
ST = Gauteng
L = Johannesburg
O = Acme Ltd
OU = Storage Management
CN = cephs3.local

[v3_req]
keyUsage = digitalSignature, keyEncipherment, nonRepudiation, keyCertSign
extendedKeyUsage = serverAuth


subjectAltName = @alt_names

[alt_names]
DNS.1 = *.local
[root@cephnode2 cert]#

[root@cephnode2 cert]# openssl req -new -nodes -x509 -days 365 -keyout cephs3.key -out
cephs3.crt -config ./cert.txt -addext 'basicConstraints = critical,CA:TRUE'
....+...+...........+......+......+....+.....+......+...+....+...++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++*.........+..+............+...+.......+.....+.+.....+...
....+...+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*...+.....+...+....+
.....+....+............+........+.........+...+...+......+.+...+..+.........+.+.........+...+
.....................+.....+..........+......+.....+.+...+..+....+......+++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++
....+...+..+.+........+..........+..+.........+.............+........++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++*......+.+...+.........+++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++*.+......+....+.....+..........+......+.....+......+...+.
......+...........+....+...........+.............+.....+..........+............+............+
..+.+...+...........+.+.....+....+..................+.....+.+.....+.+...+..+...+.......+.....
.+..+......+.......+.....+....+.....+.+...+.....+....+..+......+............+.+..+...........
.+.......+.....+..........+.........+..+.......+..+.+.........+...+......+.....+....+...+...+
.........+..+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----
[root@cephnode2 cert]# openssl x509 -noout -text -in cephs3.pem
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
09:ac:c8:39:8e:5d:25:15:4c:e4:bc:2d:ca:0b:e2:0e:a5:f4:4d:83
Signature Algorithm: sha256WithRSAEncryption
Issuer: C = ZA, ST = Gauteng, L = Johannesburg, O = Acme Ltd, OU = Storage
Management, CN = cephs3.local
Validity
Not Before: Jul 19 19:25:43 2024 GMT
Not After : Jul 19 19:25:43 2025 GMT
Subject: C = ZA, ST = Gauteng, L = Johannesburg, O = Acme Ltd, OU = Storage
Management, CN = cephs3.local
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
.
.
.
91:f7:75:ea:cd:1f:68:2a:a6:fa:37:34:3a:b1:34:4a:28:46:
e0:cb:f2:1a:03:b0:4b:a0:43:14:c3:2b:6b:43:1e:33:8a:80:
f7:33:2c:61:21:70:7d:96:ea:ef:02:74:f8:1f:25:38:15:47:
f3:bf:5b:32:69:a5:84:20:0e:6d:5b:cb:52:eb:42:21:e9:34:
6f:98:95:1a:71:29:33:d7:8f:6e:ab:3d:1c:ce:8d:7b:42:cd:
a9:1a:d0:e6
[root@cephnode2 cert]#
[root@cephnode2 cert]# openssl verify cephs3.crt
C = ZA, ST = Gauteng, L = Johannesburg, O = Acme Ltd, OU = Storage Management, CN =
cephs3.local
error 18 at 0 depth lookup: self-signed certificate
error cephs3.crt: verification failed
[root@cephnode2 cert]# cp cephs3.crt /etc/pki/ca-trust/source/anchors/
[root@cephnode2 cert]# update-ca-trust enable; update-ca-trust; update-ca-trust extract

[root@cephnode2 cert]# trust list | more


pkcs11:id=%8D%3B%6F%FE%85%9A%6B%AE%25%FB%AB%5D%72%12%D9%C0%5F%BD%8E%B8;type=cert
type: certificate
label: cephs3.local
trust: anchor
category: authority

pkcs11:id=%42%3D%2B%24%A6%C1%45%CE;type=cert
type: certificate
label: A-Trust-Qual-02
trust: anchor
category: authority
.
.
.
[root@cephnode2 cert]# openssl verify cephs3.crt
cephs3.crt: OK
[root@cephnode2 cert]# openssl verify cephs3.pem
cephs3.pem: OK


[root@cephnode2 cert]#

[root@cephnode3 ~]# openssl verify /tmp/cephs3.crt


/tmp/cephs3.crt: OK
[root@cephnode3 ~]# openssl verify /tmp/cephs3.pem
/tmp/cephs3.pem: OK
[root@cephnode3 ~]#

You can deploy the ingress service using the Ceph orchestrator or via the dashboard. To use the orchestrator, we need to create an input ingress.yaml file.

You can use a host of web-based tools to validate your input yaml file syntax. As an example, https://www.yamllint.com/.
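A quick local check works just as well, for example with Python's yaml module if it is installed:

[root@cephnode1 ~]# python3 -c "import yaml; yaml.safe_load(open('ingress.yaml')); print('ingress.yaml parses OK')"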

[root@cephnode1 ~]# cat ingress.yaml


service_type: ingress
service_id: rgw.rgw_default
placement:
  label: rgw
  count: 2
spec:
  backend_service: rgw.rgw_default
  virtual_ip: 10.0.0.244/24
  frontend_port: 443
  monitor_port: 1900
  ssl_cert: |
    -----BEGIN CERTIFICATE-----
    MIID8DCCAtigAwIBAgIUCazIOY5dJRVM5LwtygviDqX0TYMwDQYJKoZIhvcNAQEL
    BQAwfTELMAkGA1UEBhMCWkExEDAOBgNVBAgMB0dhdXRlbmcxFTATBgNVBAcMDEpv
    aGFubmVzYnVyZzERMA8GA1UECgwIQWNtZSBMdGQxGzAZBgNVBAsMElN0b3JhZ2Ug
    TWFuYWdlbWVudDEVMBMGA1UEAwwMY2VwaHMzLmxvY2FsMB4XDTI0MDcxOTE5MjU0
    M1oXDTI1MDcxOTE5MjU0M1owfTELMAkGA1UEBhMCWkExEDAOBgNVBAgMB0dhdXRl
    bmcxFTATBgNVBAcMDEpvaGFubmVzYnVyZzERMA8GA1UECgwIQWNtZSBMdGQxGzAZ
    BgNVBAsMElN0b3JhZ2UgTWFuYWdlbWVudDEVMBMGA1UEAwwMY2VwaHMzLmxvY2Fs
    MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAvWWkcit9DmFgjCawJe54
    9MKDjcb87XEkH41q5+aLlCQ+4z9ndWgN8n3kVfuhexP5L5RTtRv3uDUxTSzbEj5o
    QNw3n7y+7C/0S6uJvADwX5bmljNX+Usi/KDP+pa3DwSH1hJrHfrsLhicuIPbnCqW
    V4D8oKYoNECvgY29VyXBC1g2yod/F9jDZuefZGG9faSVReOOBjwL7xsQrmlnC7ZW
    K8C35PttAB7HbnS6UmMSgyPrxcOUPsORuCcQhlXWx9u9dgiKAK9gKtyadYue3nmM
    2iwFQN/y3PO6AaDx4ASxoNMtY2P2rhjoKCRllvEbLAPla5R99IxXkX3fIK4eNrN8
    ywIDAQABo2gwZjALBgNVHQ8EBAMCAuQwEwYDVR0lBAwwCgYIKwYBBQUHAwEwEgYD
    VR0RBAswCYIHKi5sb2NhbDAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBSNO2/+
    hZprriX7q11yEtnAX72OuDANBgkqhkiG9w0BAQsFAAOCAQEAS4pNn4NDi4QRiEAA
    ONvZGTbhK0sy0Yb4U3kUANkyf6aYUxwk95kpkyHR6S8jQBYDltaY3AHaopfXaJPa
    Sh4m1E/uFaS33eiEKhXhITuHZYrLXC2Tw68s9pdrllP8dmeosxZS2GHpfFUOmGEc
    G2CAhspbcinQ14CJRqZr3hO7n68UbGlGM0DUo4EYKpurtBEsxomjR0vfd+ZM8l91
    CiLSTX3/kfd16s0faCqm+jc0OrE0SihG4MvyGgOwS6BDFMMra0MeM4qA9zMsYSFw
    fZbq7wJ0+B8lOBVH879bMmmlhCAObVvLUutCIek0b5iVGnEpM9ePbqs9HM6Ne0LN
    qRrQ5g==
    -----END CERTIFICATE-----
    -----BEGIN PRIVATE KEY-----
    MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC9ZaRyK30OYWCM
    JrAl7nj0woONxvztcSQfjWrn5ouUJD7jP2d1aA3yfeRV+6F7E/kvlFO1G/e4NTFN
    LNsSPmhA3DefvL7sL/RLq4m8APBfluaWM1f5SyL8oM/6lrcPBIfWEmsd+uwuGJy4
    g9ucKpZXgPygpig0QK+Bjb1XJcELWDbKh38X2MNm559kYb19pJVF444GPAvvGxCu
    aWcLtlYrwLfk+20AHsdudLpSYxKDI+vFw5Q+w5G4JxCGVdbH2712CIoAr2Aq3Jp1
    i57eeYzaLAVA3/Lc87oBoPHgBLGg0y1jY/auGOgoJGWW8RssA+VrlH30jFeRfd8g
    rh42s3zLAgMBAAECggEAAkDlWZAwrkmQ/aEFjGpvh1StmrPOTisqrqSJq/yy7AOm
    1a6yVf6eShZXmkgUaOcnRPrE01DVptUAhrniIIU4rwBvyBsAXv6c2i4Vb+aOg9g0
    6UXvHrOQKlcem0Y+eK/wckNlZRsJNNCJQ/KL6io2qtb7KoYsCS3K9wWoDRNmlw7V
    gd2cb89hA5imeJs3LNzzLHdTgzea4O7LhIqQVPLvWg6JN8HFp/u7uztTB8gs/2mK
    lDc9X4aj/z6yw5vDcPkMWBc7iPgViUgykqaamYufLF2M/U7f+Ud68kTtlEV0SNgV
    vhUagX0iSxq+F2eInK/iIlfepIy5j+KqZbjWkLtt8QKBgQDRhk4iZ2/El6Urz765
    54tLY0DWIC6OFl8oTxTL+eBkfafdXAFBfwpVj7A7/zoDrHb7GOEhKBOggD5PkwSD
    veUMjTNxi1DuL3kX/60S8DavXgXGmYKzJq1v9ut85q6peJV3eqGeWLPNhcLtPbTq
    Atlojp3mk5I7Or7dBGyTnO2kHwKBgQDnaGhRpN7pjSBb9+LGkeY1NjlgwfAPJJGX
    dsuZ1+5buSZxT3SQCz0aAc3kPrOix/sBUN/ims4CsTlwc/vEm36csRZwvbg7sUfV
    kdzSpfZ2xKXoy5g2sBtRRhALTkgPktpx3CbnNFGM8yq50GvTj0leuy8G3NgA/8gb
    lyURRXAx1QKBgQCjWYgU/nt+05NsQry5hzFsBueHiPOCxyJM9MqL9DXjYqu6wn4g
    KAFQj4OgYu1B6/We8dii1vHmUdVCiKYeZ6/pRzRyM2FXMR/BfA3dE/YuZqkuGoRx
    U5goEGOrrtVBPseYrLzQDOuxMbW07ETdpHcHMxkbqLV7A+PFwCs+Mjx7lQKBgAtC
    nFjkseghcuKmxEUvUklikxYvObQy7la1dCDPTgzujH1VBXIA6f86+T7TAkC4hHFC
    8zH+oGmnIAlly2l8u4N6ZoIj6TQWY010JI+nfb+3v+79ATIgDaQ9yYgTThRb6/9A
    XDBB7nnyVzDlgGmx/jr61sX5txUNXTpid25It7XlAoGBAJwBTX86b4HMzHUDghVA
    oTbH4kjK0msYr+9Hbsc+iaaoAfrY8pUkP1krcGhj3S5LjHEAYUTWZU+xW54wUFWe
    o6CWB+cQ7RQww7vITE0B0F24iS24IoTA1OXmN8bit/KW4+PkXNFv0uP3d0b9R7Qr
    pJaeAgy19eTaPnUSdCWrywhw
    -----END PRIVATE KEY-----
[root@cephnode1 ~]#

Here is an explanation of the options we specified:

• service_type - Must be set to ingress.


• service_id - Must match the existing Ceph Object Gateway service name.
• placement - Where to deploy the haproxy and keepalived containers.
• virtual_ip - The virtual IP address where the ingress service is available.
• frontend_port - The port to access the ingress service.
• monitor_port - The port to access the haproxy load balancer status.
• virtual_interface_networks - Optional list of available subnets.
• ssl_cert - Optional SSL certificate and private key.

Note, we want to deploy the ingress service on the same nodes as our RGWs, so the placement in the spec is set to the rgw label. Before we deploy the ingress service, we need to make sure the orchestrator is pointed at the latest haproxy and keepalived container images.

[root@cephnode1 ~]# cephadm shell --mount ingress.yaml:/root/ingress.yaml


Inferring fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
Inferring config /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-bc241172f341/mon.cephnode1/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@cephnode1 /]# ceph config set mgr mgr/cephadm/container_image_haproxy
cp.icr.io/cp/ibm-ceph/haproxy-rhel9:latest
[ceph: root@cephnode1 /]# ceph config set mgr mgr/cephadm/container_image_keepalived
cp.icr.io/cp/ibm-ceph/keepalived-rhel9:latest
[ceph: root@cephnode1 /]#
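
Before applying the spec, you can confirm which hosts actually carry the rgw label by listing the hosts known to the orchestrator and checking the LABELS column (a quick check from inside the cephadm shell):

# List cluster hosts together with their assigned labels
ceph orch host ls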

Then use the ceph orchestrator to deploy the ingress service.


[root@cephnode1 ~]# cephadm shell --mount ingress.yaml:/root/ingress.yaml
Inferring fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
Inferring config /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-bc241172f341/mon.cephnode1/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@cephnode1 /]# ceph orch apply -i /root/ingress.yaml
Scheduled ingress.rgw_default update...
[ceph: root@cephnode1 /]#

We can check the status of the deployment. We should have two haproxy and two keepalived daemons
running.

[ceph: root@cephnode1 /]# ceph orch ps --daemon-type haproxy


NAME HOST PORTS STATUS
REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
haproxy.rgw.rgw_default.cephnode2.ggqdig cephnode2.local *:443,1900 running (4m) 4m
ago 4m 5150k - 2.4.22-f8e3218 56a7ae245674 f76c5da80802
haproxy.rgw.rgw_default.cephnode3.hrynln cephnode3.local *:443,1900 running (4m) 4m
ago 4m 5171k - 2.4.22-f8e3218 56a7ae245674 47ad7f318296
[ceph: root@cephnode1 /]# ceph orch ps --daemon-type keepalived
NAME HOST PORTS STATUS REFRESHED
AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
keepalived.rgw.rgw_default.cephnode2.feptmu cephnode2.local running (4m) 4m ago


4m 1795k - 2.2.8 84146097b087 535446ddf934
keepalived.rgw.rgw_default.cephnode3.warcjn cephnode3.local running (4m) 4m ago
4m 2437k - 2.2.8 84146097b087 fe07fa627073
[ceph: root@cephnode1 /]#

Validate that the HA configuration for the Ceph Object Gateway is working by using wget or curl. Both should return an index.html.

[root@cephnode1 ~]# wget --no-check-certificate https://cephs3.local


--2024-07-20 01:00:54-- https://cephs3.local/
Resolving cephs3.local (cephs3.local)... 10.0.0.244
Connecting to cephs3.local (cephs3.local)|10.0.0.244|:443... connected.
WARNING: The certificate of ‘cephs3.local’ is not trusted.
WARNING: The certificate of ‘cephs3.local’ doesn't have a known issuer.
The certificate's owner does not match hostname ‘cephs3.local’
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: ‘index.html’

index.html [ <=>
] 214 --.-KB/s in 0s

2024-07-20 01:00:54 (126 MB/s) - ‘index.html’ saved [214]

[root@cephnode1 ~]#
[root@cephnode1 ~]# curl -k https://cephs3.local
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult
xmlns="http://s3.amazonaws.com/doc/2006-03-
01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner> [root@cephnode1 ~]#

As mentioned earlier, you can deploy the ingress service via the Ceph dashboard as follows:

Figure 52: Defining the Ceph Ingress Service using the Ceph Dashboard

Now that we have a highly available object gateway, we can create an object user and test access.
Navigate to Object -> Users on the Ceph Dashboard and create a new user. Make sure to check the
box to generate an S3 access key ID and secret access key.
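
If you prefer the command line, an equivalent object user can be created with radosgw-admin (a sketch; the uid and display name are examples that mirror the user shown below):

# Create an S3 user and auto-generate an access key ID and secret access key
radosgw-admin user create --uid=ndocrat --display-name="Nishaan Docrat"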


Figure 53: Creating an object user

If you get an error “The Object Gateway Service is not configured” or “Error Connecting to Object
Gateway” when navigating to the Object service in the dashboard then check that the server port is
set correctly (in our case 443 after deploying the ingress service). You can query the current port by
issuing “ceph config dump | grep -i -e rgw -e dash”. You can also get this error when configuring
multi-site. See here https://bugzilla.redhat.com/show_bug.cgi?id=2231072 to resolve it.

You can also display the S3 access key ID and secret access key by clicking on the user and choosing "Show" under Keys. Note, a dashboard user is also created by default.

Figure 54: Displaying an object user access key ID and secret access key


We will use AWSCLI for testing access to our object service (https://aws.amazon.com/cli/). We need
the PEM file in order to use SSL via HTTPS with AWSCLI.

[root@cephnode1 ~]# aws configure


AWS Access Key ID [None]: Z3VKKAHG9W9WLRN30FQF
AWS Secret Access Key [None]: zBDMygZBnEizrLrY3ioP7wpePeSy69AJBRGhMG1z
Default region name [None]:
Default output format [None]:
[root@cephnode1 ~]# export AWS_CA_BUNDLE=/root/cert/cephs3.pem
[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local list-buckets
{
"Buckets": [],
"Owner": {
"DisplayName": "Nishaan Docrat",
"ID": "ndocrat"
}
}
[root@cephnode1 ~]#
[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local create-bucket --bucket
mybucket
[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local list-buckets
{
"Buckets": [
{
"Name": "mybucket",
"CreationDate": "2024-07-19T23:46:00.788000+00:00"
}
],
"Owner": {
"DisplayName": "Nishaan Docrat",
"ID": "ndocrat"
}
}
[root@cephnode1 ~]#

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local put-object --bucket


mybucket --key hostfile --body /etc/hosts
{
"ETag": "\"7731f264edd83fff369c86be2c1a5a0a\""
}
[root@cephnode1 ~]#

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local list-objects --bucket


mybucket
{
"Contents": [
{
"Key": "hostfile",
"LastModified": "2024-07-19T23:48:25.240000+00:00",
"ETag": "\"7731f264edd83fff369c86be2c1a5a0a\"",
"Size": 472,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "Nishaan Docrat",
"ID": "ndocrat"
}
}
],
"RequestCharged": null
}
[root@cephnode1 ~]#

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local get-object --bucket


mybucket --key hostfile myhostfile
{
"AcceptRanges": "bytes",
"LastModified": "2024-07-19T23:48:25+00:00",
"ContentLength": 472,
"ETag": "\"7731f264edd83fff369c86be2c1a5a0a\"",
"ContentType": "binary/octet-stream",
"Metadata": {}
}
[root@cephnode1 ~]# ls -al myhostfile
-rw-r--r--. 1 root root 472 Jul 20 01:51 myhostfile
[root@cephnode1 ~]#


[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local delete-object --bucket


mybucket --key hostfile
[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local delete-bucket --bucket
mybucket
[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local list-buckets
{
"Buckets": [],
"Owner": {
"DisplayName": "Nishaan Docrat",
"ID": "ndocrat"
}
}
[root@cephnode1 ~]#

To test high availability, you can check which node has the virtual IP (10.0.0.244 or cephs3.local). In our case, cephnode2 currently has the virtual IP. We will abruptly shut down this node and test access to our virtual IP with AWSCLI as follows:

[root@cephnode1 ~]# ssh cephnode2


Activate the web console with: systemctl enable --now cockpit.socket

Register this system with Red Hat Insights: insights-client --register


Create an account or view all your systems at https://red.ht/insights-dashboard
Last login: Sat Jul 20 01:07:13 2024 from 10.0.0.240
[root@cephnode2 ~]# halt
[root@cephnode2 ~]# Connection to cephnode2 closed by remote host.
Connection to cephnode2 closed.
[root@cephnode1 ~]# ssh cephnode3
Activate the web console with: systemctl enable --now cockpit.socket

Register this system with Red Hat Insights: insights-client --register


Create an account or view all your systems at https://red.ht/insights-dashboard
Last login: Sat Jul 20 00:52:23 2024 from 10.0.0.240
[root@cephnode3 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:5e:07:89 brd ff:ff:ff:ff:ff:ff
altname enp0s18
inet 10.0.0.242/8 brd 10.255.255.255 scope global noprefixroute ens18
valid_lft forever preferred_lft forever
inet 10.0.0.244/24 scope global ens18
valid_lft forever preferred_lft forever
inet6 fe80::be24:11ff:fe5e:789/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:bb:d5:f6 brd ff:ff:ff:ff:ff:ff
altname enp0s19
inet 192.168.1.12/24 brd 192.168.1.255 scope global noprefixroute ens19
valid_lft forever preferred_lft forever
inet6 fe80::4860:35fe:9978:e3a6/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@cephnode3 ~]#

With cephnode2 down, we need to test access to the virtual IP, which has now moved to cephnode3 as per the above output. Firstly, we verify that the node is down and its RGW is not available.

[root@cephnode1 ~]# cephadm shell


Inferring fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
Inferring config /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-bc241172f341/mon.cephnode1/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@cephnode1 /]# ceph -s
cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
1/3 mons down, quorum cephnode1,cephnode3
2 osds down
1 host (2 osds) down
Degraded data redundancy: 192/663 objects degraded (28.959%), 56 pgs degraded,
327 pgs undersized

services:
mon: 3 daemons, quorum cephnode1,cephnode3 (age 108s), out of quorum: cephnode2
mgr: cephnode1.tbqyke(active, since 77s)
osd: 8 osds: 6 up (since 2m), 8 in (since 27h)
rgw: 1 daemon active (1 hosts, 1 zones)

data:
pools: 7 pools, 417 pgs
objects: 221 objects, 586 KiB
usage: 363 MiB used, 150 GiB / 150 GiB avail
pgs: 192/663 objects degraded (28.959%)
271 active+undersized
90 active+clean
56 active+undersized+degraded

[ceph: root@cephnode1 /]# ceph orch ls --service-type ingress


NAME PORTS RUNNING REFRESHED AGE PLACEMENT
ingress.rgw.rgw_default 10.0.0.244:443,1900 2/4 7m ago 68m count:2;label:rgw
[ceph: root@cephnode1 /]# ceph orch ps --daemon-type haproxy
NAME HOST PORTS STATUS
REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
haproxy.rgw.rgw_default.cephnode2.ggqdig cephnode2.local *:443,1900 host is offline 7m
ago 68m 7948k - 2.4.22-f8e3218 56a7ae245674 f76c5da80802
haproxy.rgw.rgw_default.cephnode3.hrynln cephnode3.local *:443,1900 running (68m) 3m
ago 68m 5381k - 2.4.22-f8e3218 56a7ae245674 47ad7f318296
[ceph: root@cephnode1 /]# ceph orch ps --daemon-type keepalived
NAME HOST PORTS STATUS
REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
keepalived.rgw.rgw_default.cephnode2.feptmu cephnode2.local host is offline 7m
ago 68m 1795k - 2.2.8 84146097b087 535446ddf934
keepalived.rgw.rgw_default.cephnode3.warcjn cephnode3.local running (68m) 3m
ago 68m 1799k - 2.2.8 84146097b087 fe07fa627073
[ceph: root@cephnode1 /]#

Now we need to test access to our object service using AWSCLI.

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local create-bucket --bucket


newbucket
[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local put-object --bucket
newbucket --key hostfile --body /etc/hosts
{
"ETag": "\"7731f264edd83fff369c86be2c1a5a0a\""
}
[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local get-object --bucket
newbucket --key hostfile myhostfile
{
"AcceptRanges": "bytes",
"LastModified": "2024-07-19T23:58:31+00:00",
"ContentLength": 472,
"ETag": "\"7731f264edd83fff369c86be2c1a5a0a\"",
"ContentType": "binary/octet-stream",
"Metadata": {}
}
[root@cephnode1 ~]#

We have just demonstrated the high availability of the Ceph RGW service. We need to reboot
cephnode2 and wait for it to rejoin the cluster.

[ceph: root@cephnode1 /]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
Degraded data redundancy: 44/780 objects degraded (5.641%), 10 pgs degraded, 61
pgs undersized

services:
mon: 3 daemons, quorum cephnode1,cephnode3,cephnode2 (age 2m)
mgr: cephnode1.tbqyke(active, since 11m), standbys: cephnode2.iaecpr
osd: 8 osds: 8 up (since 2m), 8 in (since 27h)
rgw: 1 daemon active (1 hosts, 1 zones)

data:
pools: 7 pools, 417 pgs
objects: 260 objects, 586 KiB
usage: 431 MiB used, 200 GiB / 200 GiB avail
pgs: 44/780 objects degraded (5.641%)
354 active+clean
51 active+undersized
10 active+undersized+degraded
2 active+clean+scrubbing

[ceph: root@cephnode1 /]#

When recovery is completed, the cluster will return to a healthy state.

[ceph: root@cephnode1 /]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_OK

services:
mon: 3 daemons, quorum cephnode1,cephnode3,cephnode2 (age 16m)
mgr: cephnode1.tbqyke(active, since 25m), standbys: cephnode2.iaecpr
osd: 8 osds: 8 up (since 16m), 8 in (since 27h)
rgw: 2 daemons active (2 hosts, 1 zones)

data:
pools: 7 pools, 417 pgs
objects: 259 objects, 586 KiB
usage: 427 MiB used, 200 GiB / 200 GiB avail
pgs: 417 active+clean

[ceph: root@cephnode1 /]#

When performing failure testing during a POC, like the node failure we just simulated, you might want to prevent CRUSH from automatically rebalancing the cluster. To avert this rebalancing behaviour, set the noout flag for the duration of the test, as shown below.
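
A quick sketch of setting the flag before the test and clearing it again afterwards:

# Prevent OSDs from being marked out (and data from being rebalanced) during the test
ceph osd set noout
# ... perform the node failure test ...
# Restore normal rebalancing behaviour once the node has rejoined
ceph osd unset noout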

The virtual IP will return to cephnode2 (the original node) by default. You can verify this by using ip a. As you can see from the output below, the virtual IP has moved back to cephnode2.

[root@cephnode2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:e9:1f:f5 brd ff:ff:ff:ff:ff:ff
altname enp0s18
inet 10.0.0.241/8 brd 10.255.255.255 scope global noprefixroute ens18
valid_lft forever preferred_lft forever
inet 10.0.0.244/24 scope global ens18
valid_lft forever preferred_lft forever
inet6 fe80::be24:11ff:fee9:1ff5/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:2b:c4:ee brd ff:ff:ff:ff:ff:ff
altname enp0s19
inet 192.168.1.11/24 brd 192.168.1.255 scope global noprefixroute ens19
valid_lft forever preferred_lft forever
inet6 fe80::f3b8:12a1:3e2c:7db/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@cephnode2 ~]#

[root@cephnode3 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:5e:07:89 brd ff:ff:ff:ff:ff:ff
altname enp0s18
inet 10.0.0.242/8 brd 10.255.255.255 scope global noprefixroute ens18
valid_lft forever preferred_lft forever
inet6 fe80::be24:11ff:fe5e:789/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:bb:d5:f6 brd ff:ff:ff:ff:ff:ff
altname enp0s19
inet 192.168.1.12/24 brd 192.168.1.255 scope global noprefixroute ens19
valid_lft forever preferred_lft forever
inet6 fe80::4860:35fe:9978:e3a6/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@cephnode3 ~]#

Using Virtual-hosted Style Bucket Addressing with IBM Storage Ceph RGW

Amazon S3 supports two types of bucket addressing schemes. These are referred to as Path-style and
Virtual-hosted style addressing. For Virtual-hosted style bucket addressing, the bucket name is part
of the DNS name in the URL. For example:

https://bucket.s3.amazonaws.com/object
https://bucket.s3-aws-region.amazonaws.com/object

For Path-style addressing, the bucket name is not part of the DNS name in the URL. For example:

https://s3.amazonaws.com/bucket/object
https://s3-aws-region.amazonaws.com/bucket/object

Most commercial applications that make use of object storage will require virtual-hosted style bucket addressing. Supporting virtual-hosted style addressing requires a wildcard DNS record. Amazon intends to deprecate path-style API requests.

“Amazon S3 currently supports two request URI styles in all regions: path-style (also known as V1) that
includes bucket name in the path of the URI (example: //s3.amazonaws.com/<bucketname>/key), and
virtual-hosted style (also known as V2) which uses the bucket name as part of the domain name
(example: //<bucketname>.s3.amazonaws.com/key). In our effort to continuously improve customer
experience, the path-style naming convention is being retired in favor of virtual-hosted style request
format. Customers should update their applications to use the virtual-hosted style request format when
making S3 API requests before September 30th, 2020 to avoid any service disruptions. Customers using
the AWS SDK can upgrade to the most recent version of the SDK to ensure their applications are using
the virtual-hosted style request format.


Virtual-hosted style requests are supported for all S3 endpoints in all AWS regions. S3 will stop
accepting requests made using the path-style request format in all regions starting September 30th,
2020. Any requests using the path-style request format made after this time will fail.” Source:
https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/ and
https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html.

The deprecation of Path-style requests was originally targeted for September 2020. As of the time of
writing, this deadline has been extended.

“Update (September 23, 2020) – Over the last year, we’ve heard feedback from many customers who
have asked us to extend the deprecation date. Based on this feedback we have decided to delay the
deprecation of path-style URLs to ensure that customers have the time that they need to transition to
virtual hosted-style URLs.

We have also heard feedback from customers that virtual hosted-style URLs should support buckets
that have dots in their names for compatibility reasons, so we’re working on developing that support.
Once we do, we will provide at least one full year prior to deprecating support for path-style URLs for
new buckets.” Source: https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-
rest-of-the-story/.

Without the ability to implement virtual-hosted style bucket addressing in IBM Storage Ceph, we risk being unable to support applications that are built to use this addressing scheme exclusively. The procedure to add a wildcard to the DNS record of the DNS server is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=configuration-add-wildcard-dns

For a POC environment, you can set up a DNS server using dnsmasq, which is an open-source, lightweight, easy-to-configure DNS forwarder and DHCP server. Our lab dnsmasq configuration file looks as follows:

.
.
.
#mods
server=8.8.8.8
address=/cephs3.local/10.0.0.244
#address=/local/127.0.0.1
domain=local
.
.
.
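
After editing the configuration, you can check the syntax and restart dnsmasq so the wildcard entry takes effect (a sketch, assuming dnsmasq runs as a systemd service on the lab server):

# Validate the dnsmasq configuration file syntax
dnsmasq --test
# Restart the service to pick up the new wildcard entry
systemctl restart dnsmasq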

Check that name resolution is working. Any name under cephs3.local that doesn't have an explicit entry in the /etc/hosts file of the dnsmasq server should resolve to the virtual IP via the *.cephs3.local wildcard.

root@labserver:/etc# nslookup cephnode1.local


Server: 127.0.0.53
Address: 127.0.0.53#53

Name: cephnode1.local
Address: 10.0.0.240

root@labserver:/etc# nslookup cephnode1


Server: 127.0.0.53
Address: 127.0.0.53#53

Name: cephnode1
Address: 10.0.0.240


root@labserver:/etc# nslookup cephs3.local


Server: 127.0.0.53
Address: 127.0.0.53#53

Name: cephs3.local
Address: 10.0.0.244

root@labserver:/etc# nslookup cephs3


Server: 127.0.0.53
Address: 127.0.0.53#53

Name: cephs3
Address: 10.0.0.244

root@labserver:/etc# nslookup bucket1.cephs3.local


Server: 127.0.0.53
Address: 127.0.0.53#53

Name: bucket1.cephs3.local
Address: 10.0.0.244

root@labserver:/etc#

From one of our ceph cluster nodes we can also verify wildcard DNS is working.

[root@cephnode1 ~]# nslookup bucket.cephs3.local


Server: 10.0.0.246
Address: 10.0.0.246#53

Name: bucket.cephs3.local
Address: 10.0.0.244

[root@cephnode1 ~]#

Before following the procedure from the documentation, we can test whether virtual-hosted style bucket addressing is working. For this, we will use s3cmd (https://s3tools.org/s3cmd). When issuing s3cmd --configure to add your S3 credentials and endpoint URL, the configuration should look something like this.

New settings:
Access Key: Z3VKKAHG9W9WLRN30FQF
Secret Key: zBDMygZBnEizrLrY3ioP7wpePeSy69AJBRGhMG1z
Default Region: US
S3 Endpoint: cephs3.local
DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.cephs3.local
Encryption password: Nlp345gp
Path to GPG program: /usr/bin/gpg
Use HTTPS protocol: True
HTTP Proxy server name:
HTTP Proxy server port: 0

Since we have SSL enabled with a self-signed certificate, we will just disable validating the SSL
certificate in the .s3cfg file with “check_ssl_certificate = False”. If we test the creation of a DNS style
bucket with s3cmd we get:

root@labserver:~# s3cmd mb s3://s3cmdnucket


ERROR: S3 error: 405 (MethodNotAllowed)
root@labserver:~#

Now we will follow the procedure as per the documentation. Firstly, we need to get the current
zonegroup information and output it to a json file.

[ceph: root@cephnode1 /]# radosgw-admin zonegroup get --rgw-zonegroup=LAB_ZONE_GROUP1 >


zonegroup.json

Next, we need to modify the file to include cephs3.local in the hostnames field.


[root@cephnode1 ~]# cat zonegroup.json

.
.
"api_name": "LAB_ZONE_GROUP1",
"is_master": true,
"endpoints": [
"10.0.0.241",
"10.0.0.242"
],
"hostnames": ["cephs3.local","cephnode2.local","cephnode3.local"],
"hostnames_s3website": [],
"master_zone": "7109acc2-883b-4502-b863-4e7097d7d83c",
"zones": [
{
"id": "7109acc2-883b-4502-b863-4e7097d7d83c",
"name": "DC1_ZONE",
"endpoints": [
"10.0.0.241",
"10.0.0.242"
],
"log_meta": false,
"log_data": false,
.
.

We need to now upload the new zonegroup information back to the Ceph RGWs and update the period.
Each realm is associated with a “period”. A period represents the state of the zonegroup and zone
configuration in time. Each time you make a change to a zonegroup or zone, you should update and
commit the period.

[root@cephnode1 ~]# radosgw-admin zonegroup set --rgw-zonegroup=LAB_ZONE_GROUP1 --


infile=zonegroup.json
{
"id": "ace3190f-c02e-40e2-abdd-344efc6ab06c",
"name": "LAB_ZONE_GROUP1",
"api_name": "LAB_ZONE_GROUP1",
"is_master": true,
"endpoints": [
"10.0.0.241",
"10.0.0.242"
],
"hostnames": [
"cephs3.local",
"cephnode2.local",
"cephnode3.local"
],
.
.
.
"default_placement": "default-placement",
"realm_id": "a9c5b73e-66bd-4e10-9a2f-5df2f6fc515a",
"sync_policy": {
"groups": []
},
"enabled_features": [
"resharding"
]
}
[root@cephnode1 ~]# radosgw-admin period update --commit
{
"id": "8911517d-673c-4b8b-bd5c-64a43b804f51",
"epoch": 12,
"predecessor_uuid": "256b965c-d5f8-4d21-a54d-f9239ccc318a",
"sync_status": [],
"period_map": {
"id": "8911517d-673c-4b8b-bd5c-64a43b804f51",
"zonegroups": [
{
"id": "ace3190f-c02e-40e2-abdd-344efc6ab06c",
"name": "LAB_ZONE_GROUP1",
"api_name": "LAB_ZONE_GROUP1",
"is_master": true,
"endpoints": [

67
IBM Storage Ceph for Beginner’s

"10.0.0.241",
"10.0.0.242"
],
"hostnames": [
"cephs3.local",
"cephnode2.local",
"cephnode3.local"
],
.
.
.
"realm_id": "a9c5b73e-66bd-4e10-9a2f-5df2f6fc515a",
"realm_epoch": 2
}
[root@cephnode1 ~]#

Lastly, we need to recycle the Ceph Object Gateways (do this for all RGWs).

Figure 55: Recycle the Ceph RGWs

We can now test virtual-hosted style bucket addressing with s3cmd.

root@labserver:~# s3cmd mb s3://s3cmdnucket


Bucket 's3://s3cmdnucket/' created
root@labserver:~#

root@labserver:~# s3cmd rb s3://s3cmdnucket


Bucket 's3://s3cmdnucket/' removed
root@labserver:~#

We can also test it via a simple script.

root@labserver:~# cat virtual.sh


# Script to test Virtual Hosted Style Bucket Addressing
# PUT a given file into a given bucket
# Variables
S3_ACCESS_KEY=Z3VKKAHG9W9WLRN30FQF
S3_SECRET_KEY=zBDMygZBnEizrLrY3ioP7wpePeSy69AJBRGhMG1z
BUCKET="awsbucket"
FILE="testfile"
DATEVALUE=`date -R`
FILEPATH="/${BUCKET}/${FILE}"
# Curl Metadata
CONTENTTYPE="application/x-compressed-tar"
DATEVALUE=`date -R`
SIGNATURE_STRING="PUT\n\n${CONTENTTYPE}\n${DATEVALUE}\n${FILEPATH}"
# Create signature hash to be sent in Authorization header
SIGNATURE_HASH=`echo -en ${SIGNATURE_STRING} | openssl sha1 -hmac ${S3_SECRET_KEY} -binary |
base64`
# Clean out awsbucket
s3cmd rb --recursive --force s3://awsbucket >/dev/null 2>&1
s3cmd mb s3://awsbucket >/dev/null 2>&1
# Output to screen
echo
echo "Script to demonstrate Virtual-hosted style bucket addressing"
echo
echo "Listing $BUCKET bucket prior to PUT"
echo
echo " -> Running s3cmd ls s3://awsbucket.."
s3cmd ls s3://awsbucket
echo
echo " -> List bucket completed"
echo
echo "Performing a PUT to the following URL https://${BUCKET}.cephs3.local/${FILE}"
echo
echo " -> curl -k -X PUT -T ${FILE} -H Host: ${BUCKET}.cephs3.local -H Date: ${DATEVALUE} -H
Content-Type: ${CONTENTTYPE} -H Authorization: AWS ${S3_ACCESS_KEY}:${SIGNATURE_HASH}
https://${BUCKET}.cephs3.local/${FILE}"
# curl command to do PUT operation to our S3 endpoint
curl -k -X PUT -T "${FILE}" \
-H "Host: ${BUCKET}.cephs3.local" \
-H "Date: ${DATEVALUE}" \
-H "Content-Type: ${CONTENTTYPE}" \
-H "Authorization: AWS ${S3_ACCESS_KEY}:${SIGNATURE_HASH}" \
https://${BUCKET}.cephs3.local/${FILE}
echo
echo "Listing bucket after PUT"
echo
echo " -> Running s3cmd ls s3://awsbucket.."
echo
s3cmd ls s3://awsbucket
echo
echo " -> List bucket completed"
echo
echo "End of script.. Exiting"
root@labserver:~#

root@labserver:~# ./virtual.sh

Script to demonstrate Virtual-hosted style bucket addressing

Listing awsbucket bucket prior to PUT

-> Running s3cmd ls s3://awsbucket..

-> List bucket completed

Performing a PUT to the following URL https://awsbucket.cephs3.local/testfile

-> curl -k -X PUT -T testfile -H Host: awsbucket.cephs3.local -H Date: Sat, 20 Jul


2024 22:52:45 +0200 -H Content-Type: application/x-compressed-tar -H Authorization: AWS
Z3VKKAHG9W9WLRN30FQF:KcO6sCBpGw+HqEGKSvyjvAtgXaE= https://awsbucket.cephs3.local/testfile

Listing bucket after PUT

-> Running s3cmd ls s3://awsbucket..

2024-07-20 20:52 518 s3://awsbucket/testfile

-> List bucket completed

End of script.. Exiting


root@labserver:~#


For a POC or test environment, you can use s3bench to test the performance of the Ceph Object Gateway (https://github.com/igneous-systems/s3bench). You can monitor the RGW performance using the Grafana dashboards.

root@labserver:~# go install github.com/igneous-systems/s3bench@latest


go: downloading github.com/igneous-systems/s3bench v0.0.0-20190531022958-7b8100187531
go: finding module for package github.com/aws/aws-sdk-go/aws/credentials
go: finding module for package github.com/aws/aws-sdk-go/aws
go: downloading github.com/aws/aws-sdk-go v1.54.20
go: finding module for package github.com/aws/aws-sdk-go/aws/session
go: finding module for package github.com/aws/aws-sdk-go/service/s3
go: found github.com/aws/aws-sdk-go/aws in github.com/aws/aws-sdk-go v1.54.20
go: found github.com/aws/aws-sdk-go/aws/credentials in github.com/aws/aws-sdk-go v1.54.20
go: found github.com/aws/aws-sdk-go/aws/session in github.com/aws/aws-sdk-go v1.54.20
go: found github.com/aws/aws-sdk-go/service/s3 in github.com/aws/aws-sdk-go v1.54.20
go: downloading github.com/jmespath/go-jmespath v0.4.0
root@labserver:~#
root@labserver:~/go/bin# ./s3bench -accessKey=Z3VKKAHG9W9WLRN30FQF -
accessSecret=zBDMygZBnEizrLrY3ioP7wpePeSy69AJBRGhMG1z -bucket=s3bench -
endpoint=http://cephnode2.local:8080 -region DC1_ZONE -numClients=2 -numSamples=100 -
objectNamePrefix=s3bench -objectSize=10024
Test parameters
endpoint(s): [http://cephnode2.local:8080]
bucket: s3bench
objectNamePrefix: s3bench
objectSize: 0.0096 MB
numClients: 2
numSamples: 100
verbose: %!d(bool=false)

Generating in-memory sample data... Done (47.033µs)

Running Write test...

Running Read test...

Test parameters
endpoint(s): [http://cephnode2.local:8080]
bucket: s3bench
objectNamePrefix: s3bench
objectSize: 0.0096 MB
numClients: 2
numSamples: 100
verbose: %!d(bool=false)

Results Summary for Write Operation(s)


Total Transferred: 0.956 MB
Total Throughput: 1.68 MB/s
Total Duration: 0.569 s
Number of Errors: 0
------------------------------------
Write times Max: 0.019 s
Write times 99th %ile: 0.019 s
Write times 90th %ile: 0.014 s
Write times 75th %ile: 0.013 s
Write times 50th %ile: 0.011 s
Write times 25th %ile: 0.010 s
Write times Min: 0.008 s

Results Summary for Read Operation(s)


Total Transferred: 0.956 MB
Total Throughput: 13.09 MB/s
Total Duration: 0.073 s
Number of Errors: 0
------------------------------------
Read times Max: 0.004 s
Read times 99th %ile: 0.004 s
Read times 90th %ile: 0.002 s
Read times 75th %ile: 0.002 s


Read times 50th %ile: 0.001 s
Read times 25th %ile: 0.001 s
Read times Min: 0.001 s

Cleaning up 100 objects...


Deleting a batch of 100 objects in range {0, 99}... Succeeded
Successfully deleted 100/100 objects in 188.251053ms
root@labserver:~/go/bin#

Note, some applications require you to specify a REGION. With Ceph, the REGION equates to the zone
name.
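
For example, with AWSCLI the region can be passed on the command line (a sketch using the zone name and endpoint from our lab):

# Pass the Ceph zone name as the region when the client insists on one
aws s3api --endpoint-url https://cephs3.local --region DC1_ZONE list-buckets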

Another useful tool for a POC environment is s3tests https://github.com/ceph/s3-tests. This tool tests
S3 API compatibility. If you are comparing different object stores (e.g. Ceph, MinIO, Dell EMC Isilon,
Storage Scale etc.) you can use this to see which solution offers the highest level of S3 API
compatibility.

root@labserver:~/s3-tests# S3TEST_CONF=nd.conf tox -- s3tests_boto3/functional


.pkg: _optional_hooks> python /usr/lib/python3/dist-packages/pyproject_api/_backend.py True
setuptools.build_meta __legacy__
.pkg: get_requires_for_build_sdist> python /usr/lib/python3/dist-
packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: get_requires_for_build_wheel> python /usr/lib/python3/dist-
packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: prepare_metadata_for_build_wheel> python /usr/lib/python3/dist-
packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: build_sdist> python /usr/lib/python3/dist-packages/pyproject_api/_backend.py True
setuptools.build_meta __legacy__
py: install_package> python -I -m pip install --force-reinstall --no-deps /root/s3-
tests/.tox/.tmp/package/5/s3tests-0.0.1.tar.gz
py: commands[0]> pytest s3tests_boto3/functional
======================================================================= test session starts
=======================================================================
platform linux -- Python 3.12.3, pytest-8.3.1, pluggy-1.5.0
cachedir: .tox/py/.pytest_cache
rootdir: /root/s3-tests
configfile: pytest.ini
collected 759 items

s3tests_boto3/functional/test_headers.py .........F...FFFF........FF.FFF....F...F....F...
[ 6%]
s3tests_boto3/functional/test_iam.py
FFFFFFFFFFFFFFFFFFFFFFFFFFFssssssssssssssssssssssssssssssssssssssssssssssssssssssss
[ 17%]
s3tests_boto3/functional/test_s3.py
.............................................................................................
...............F.......... [ 32%]
...........................F..F.....................................................F........
..........................F................................... [ 53%]
........

.
.
.
.
.
======================================== 138 failed, 552 passed, 69 skipped, 14776 warnings,
1 error in 1071.85s (0:17:51) ========================================

Setting up IBM Storage Ceph RGW static web hosting

You can configure the Ceph Object Gateway to host static websites in S3 buckets. Traditional website hosting involves configuring a web server for each website, which can use resources inefficiently when content does not change dynamically, for example, sites that do not use server-side services like PHP, servlets, databases, NodeJS, and the like. Hosting static content in S3 buckets is substantially more economical than setting up virtual machines with web servers for each site. The procedure is documented here:


https://www.ibm.com/docs/en/storage-ceph/7.1?topic=configuration-static-web-hosting

We will use s3cmd to create our static website. First, we need to create a bucket to host it.

root@labserver:~/ceph-s3-tests-master# s3cmd mb s3://testwebsite


Bucket 's3://testwebsite/' created
root@labserver:~/ceph-s3-tests-master# s3cmd ls
2024-07-20 20:52 s3://awsbucket
2024-07-20 20:21 s3://testbucket
2024-07-21 09:01 s3://testwebsite
root@labserver:~/ceph-s3-tests-master#

Next, we will upload the index.html, error.html (required) and any other files referenced in the
index.html and grant all of these objects public access.

root@labserver:~# s3cmd put --acl-public index.html s3://testwebsite/


upload: 'index.html' -> 's3://testwebsite/index.html' [1 of 1]
250 of 250 100% in 0s 11.44 KB/s done
Public URL of the object is: http://testwebsite.cephs3.local/index.html
root@labserver:~# s3cmd put --acl-public error.html s3://testwebsite/
upload: 'error.html' -> 's3://testwebsite/error.html' [1 of 1]
169 of 169 100% in 0s 8.96 KB/s done
Public URL of the object is: http://testwebsite.cephs3.local/error.html
root@labserver:~# s3cmd put --acl-public cephpic.jpg s3://testwebsite/
upload: 'cephpic.jpg' -> 's3://testwebsite/cephpic.jpg' [1 of 1]
8821 of 8821 100% in 0s 465.21 KB/s done
Public URL of the object is: http://testwebsite.cephs3.local/cephpic.jpg
root@labserver:~# s3cmd ls s3://testwebsite
2024-07-21 09:15 8821 s3://testwebsite/cephpic.jpg
2024-07-21 09:02 169 s3://testwebsite/error.html
2024-07-21 09:02 250 s3://testwebsite/index.html
root@labserver:~#

Lastly, we enable static web hosting on the Ceph Object Gateways.

[root@cephnode1 ~]# ceph config set client.rgw rgw_enable_static_website true


[root@cephnode1 ~]# ceph config set client.rgw rgw_dns_name cephs3.local
[root@cephnode1 ~]# ceph config set client.rgw rgw_resolve_cname true
[root@cephnode1 ~]#

We should now be able to access our static website.

Figure 56: IBM Storage Ceph RGW Static Web hosting Example


Setting up IBM Storage Ceph RGW Presigned URL


By default, all S3 objects are private, with only the object owner able to access them. The object owner, however, can optionally share objects with others by creating a presigned URL, using their own security credentials to grant time-limited permission to download objects. Anyone who receives the presigned URL can then access the object. Below is an example of how this works. We will create a bucket, upload a test file to it, then generate a presigned URL and access the object with a web browser. We will set the time-limited access to 3000 seconds.

root@labserver:~# s3cmd put /etc/hosts s3://presign


upload: '/etc/hosts' -> 's3://presign/hosts' [1 of 1]
712 of 712 100% in 0s 38.15 KB/s done
root@labserver:~# s3cmd ls s3://presign
2024-08-10 05:16 712 s3://presign/hosts
root@labserver:~#
root@labserver:~# s3cmd signurl s3://presign/hosts +3000
http://presign.cephs3.local/hosts?AWSAccessKeyId=Z3VKKAHG9W9WLRN30FQF&Expires=1723270384&Sign
ature=OLjEzrGIlgxzH36uocPt2yLZIQM%3D
root@labserver:~#
We can now access the presigned URL directly from a browser without having to use any S3
credentials.

Figure 57: IBM Storage Ceph RGW Presigned URL Example
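
If you prefer AWSCLI, an equivalent presigned URL can be generated with aws s3 presign (a sketch using the endpoint, bucket and expiry from our lab example):

# Generate a presigned URL for the object, valid for 3000 seconds
aws s3 presign s3://presign/hosts --expires-in 3000 --endpoint-url https://cephs3.local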

IBM Storage Ceph RADOS Block Device (RBD) Deployment


Ceph RADOS block devices (RBDs) are thin-provisioned, resizable, and store data striped over multiple
OSDs. RBDs support snapshots and replication. Ceph block storage clients communicate with Ceph
clusters through kernel modules or the librbd library. RBDs deliver high performance and work well
with hypervisors like KVM, VMware and Hyper-V, cloud platforms like OpenStack, and container platforms like Kubernetes.

Since this document assumes you are trying to evaluate IBM Storage Ceph, we will configure and
provision RBDs to Linux and Windows hosts and also to a native Kubernetes cluster for persistent
volume storage. Ceph RBDs require just monitor, manager and OSD daemons and can be hosted in
the same Ceph storage cluster as RGWs and CephFS. More information on RADOS Block Devices can
be found here:


https://www.ibm.com/docs/en/storage-ceph/7.1?topic=ceph-block-devices

Deleting pools on the Ceph dashboard is disabled by default. You can delete pools on the Ceph Storage Dashboard by ensuring that the value of mon_allow_pool_delete is set to true in the Manager modules. You can also enable this in a running Ceph cluster from the command line by issuing "ceph tell mon.* injectargs --mon_allow_pool_delete true".
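
A more persistent alternative to injectargs is to set the option through the central configuration database (a sketch):

# Allow pool deletion cluster-wide until the option is explicitly set back to false
ceph config set mon mon_allow_pool_delete true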

If you navigate to the Block storage section in the Ceph dashboard, you will notice that you need to first create an RBD pool.

Figure 58: Ceph Dashboard – Block Storage

Navigate to Cluster -> Pools -> Create and create a new pool called RBD. You need to specify the
application as RBD. Notice the default is a replicated pool with 3 replicas.


Figure 59: Ceph Dashboard – Pool Create

Navigate back to Block Storage and we can now create RBD images (similar to LUNs). Notice, there is an option to create namespaces. A namespace allows you to segregate RBD images (it is similar to the LUN masking that we usually do on block storage arrays). Users granted access to one namespace won't be able to see RBD images that reside in a different namespace to which they don't have access. See https://access.redhat.com/solutions/4872331.

For a POC environment, we want to create images for a Windows and a Linux host. We don't want them to be able to access each other's images, so we will create two namespaces called windows and linux. Both will reside in the same storage pool we created earlier.

Figure 60: Ceph Dashboard – Block Storage Namespaces


We can now create an image in each namespace. We will call them winlun and linuxlun and set them
to 3GB in size (we will accept the default options for now).

Figure 61: Ceph Dashboard – Block Storage Image Creation

Figure 62: Ceph Dashboard – Block Storage Image List

Unless otherwise specified, the client rbd command uses the Ceph user ID admin to access the Ceph
cluster. The admin Ceph user ID allows full administrative access to the cluster. It is recommended
that you access the Ceph cluster with a Ceph user ID that has fewer permissions than the admin Ceph
user ID does. We call this non-admin Ceph user ID a “block device user” or “Ceph user”. We will create
two users called client.windows and client.linux (note, all ceph users should have the prefix client).
You can use the CLI command ceph auth get-or-create to create the required users and assign them
the correct MONITOR and OSD capabilities. Since we are using namespaces, we also have to specify
the namespaces for which they have access. We use the -o flag to create a keyring file that we will
provide to each of the clients with their corresponding keys.

[root@cephnode1 ~]# ceph auth get-or-create client.windows mon 'profile rbd' osd 'profile
rbd pool=rbd namespace=windows' -o /etc/ceph/ceph.client.windows.keyring
[root@cephnode1 ~]# ceph auth get-or-create client.linux mon 'profile rbd' osd 'profile rbd
pool=rbd namespace=linux' -o /etc/ceph/ceph.client.linux.keyring
[root@cephnode1 ~]# ceph auth get client.windows
[client.windows]
key = AQApVZxmHkGOFRAAvpJODJPdmeU1pzuN7OAx5g==
caps mon = "profile rbd"
caps osd = "profile rbd pool=rbd namespace=windows"
[root@cephnode1 ~]# ceph auth get client.linux
[client.linux]
key = AQA/VZxmTnFvEhAA+IOSfUHZig/HrW/RkGSKsw==
caps mon = "profile rbd"
caps osd = "profile rbd pool=rbd namespace=linux"
[root@cephnode1 ~]#

We can test access for each user to ensure that they are only able to see images in their own
namespace. Since we have both keyring files already in /etc/ceph we can issue the following
commands:

[root@cephnode1 ceph]# rbd --namespace linux --id linux --pool rbd ls


linuxlun
[root@cephnode1 ceph]# rbd --namespace windows --id linux --pool rbd ls
rbd: error asserting namespace: (1) Operation not permitted
2024-07-21T03:13:29.524+0200 7fee02502c00 -1 librbd::api::Namespace: exists: error asserting
namespace: (1) Operation not permitted
rbd: listing images failed: (1) Operation not permitted
[root@cephnode1 ceph]# rbd --namespace windows --id windows --pool rbd ls
winlun
[root@cephnode1 ceph]# rbd --namespace linux --id windows --pool rbd ls
rbd: error asserting namespace: (1) Operation not permitted
2024-07-21T03:13:41.794+0200 7fcc7e877c00 -1 librbd::api::Namespace: exists: error asserting
namespace: (1) Operation not permitted
rbd: listing images failed: (1) Operation not permitted
[root@cephnode1 ceph]#

As you can see, the linux user can’t access the windows namespace and vice-versa.

Accessing a Ceph RBD Image on a Windows Host


Ceph has been ported to Windows for both RBD and CephFS. The Windows Ceph driver is maintained
by Cloudbase and all contributions are directed upstream. The Windows MSI installer can be
downloaded here:

https://cloudbase.it/ceph-for-windows/

Download the installer and run it on your Windows host.

Figure 63: Ceph for Windows Installer


Ceph for Windows is released under the GNU LGPL (the same as Ceph). Be sure to choose the defaults
as we want both the CLI tools and RBD driver to be installed.

Figure 64: Ceph for Windows Installer Custom Setup

Be sure to accept the warning to install the driver.

Figure 65: Ceph for Windows Installer Driver Warning

Once completed, you will be prompted to reboot your Windows server. After the reboot, we need to modify the ceph.conf file. The default location for the ceph.conf file on Windows is %ProgramData%\ceph\ceph.conf. You can just take the MON host information from one of your Ceph cluster nodes.

[root@cephnode1 ~]# cat /etc/ceph/ceph.conf


# minimal ceph.conf for e7fcc1ac-42ec-11ef-a58f-bc241172f341
[global]
fsid = e7fcc1ac-42ec-11ef-a58f-bc241172f341
mon_host = [v2:10.0.0.240:3300/0,v1:10.0.0.240:6789/0]
[v2:10.0.0.241:3300/0,v1:10.0.0.241:6789/0] [v2:10.0.0.242:3300/0,v1:10.0.0.242:6789/0]
[root@cephnode1 ~]#

Edit the Windows ceph.conf file in the C:\ProgramData\ceph directory to match.

Figure 66: Ceph for Windows ceph.conf file


Copy the client keyring file to the same directory.

Figure 67: Ceph for Windows Configuration file and client keyring

You should be able to list the RBD images to which you have access using the rbd command. Note, if we try the wrong namespace we get an error.

Figure 68: Ceph for Windows rbd image listing
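
The listing in Figure 68 uses the same rbd syntax we used on Linux; for reference (a sketch, run from a Windows command prompt once ceph.conf and the keyring are in place):

# List the images in the windows namespace using the client.windows identity
rbd --namespace windows --id windows --pool rbd ls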

If you navigate to Windows Disk Management you should see the RBD image. Right click to mark it as
online.

Figure 69: Windows Disk Management

Right Click on the RBD image and initialize and format it.


Figure 70: Windows Disk Management Initialize Disk

The volume should come online and be available for use.

Figure 71: Windows Disk Management with online RBD image

You can use the rbd command to list the configuration as well. You can find a list of rbd command
options here:

https://docs.ceph.com/en/reef/rbd/rados-rbd-cmds/

Figure 72: Ceph for Windows rbd command to show mapped images


Accessing a Ceph RBD Image on a Linux Host


In our lab setup we have a Linux VM running Ubuntu Server. We will first install the required ceph-common package, which includes the rbd command. Additional packages ceph and ceph-mds will also be installed as prerequisites by the package manager.

root@labserver:~# apt install ceph-common


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libbabeltrace1 libboost-context1.83.0 libboost-filesystem1.83.0 libboost-iostreams1.83.0
.
.
.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
root@labserver:~#

Copy the client keyring file from the cluster node where we created the rbd users (or you can re-export
the keyring file again from the CLI or use the GUI to copy and paste it). Also copy the ceph.conf from
one of your cluster nodes.

root@labserver:~# scp root@cephnode1:/etc/ceph/ceph.client.linux.keyring


/etc/ceph/ceph.client.linux.keyring
root@cephnode1's password:
ceph.client.linux.keyring
100% 63 132.5KB/s 00:00
root@labserver:~# scp root@cephnode1:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
root@cephnode1's password:
ceph.conf
100% 259 473.4KB/s 00:00
root@labserver:~#

Like we did on Windows, we need to map the RBD image. As you can see, a new device /dev/rbd0 is
created.

root@labserver:~# rbd map --namespace linux --id linux rbd/linuxlun


/dev/rbd0
root@labserver:~# rbd showmapped
id pool namespace image snap device
0 rbd linux linuxlun - /dev/rbd0
root@labserver:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 50.8M 1 loop /snap/aws-cli/707
loop1 7:1 0 74.2M 1 loop /snap/core22/1380
loop2 7:2 0 38.8M 1 loop /snap/snapd/21759
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 1M 0 part
└─sda2 8:2 0 20G 0 part /
sr0 11:0 1 2.6G 0 rom
rbd0 251:0 0 3G 0 disk
root@labserver:~#

Create a filesystem and mount it.

root@labserver:~# mkfs.xfs -K /dev/rbd0


meta-data=/dev/rbd0 isize=512 agcount=8, agsize=98304 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1 nrext64=0
data = bsize=4096 blocks=786432, imaxpct=25
= sunit=16 swidth=16 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=16384, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
root@labserver:~# mount /dev/rbd0 /mnt
root@labserver:~# df -hT
Filesystem Type Size Used Avail Use% Mounted on


tmpfs tmpfs 392M 1.1M 391M 1% /run
/dev/sda2 ext4 20G 13G 6.3G 66% /
tmpfs tmpfs 2.0G 84K 2.0G 1% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 392M 12K 392M 1% /run/user/0
/dev/rbd0 xfs 3.0G 91M 2.9G 3% /mnt
root@labserver:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
.
.
.
/dev/rbd0 on /mnt type xfs
(rw,relatime,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=128,noquota)
root@labserver:~#

A handy option for POCs is rbd bench, which tests the performance of an RBD image. We will map a new RBD image and then run rbd bench with a read/write I/O pattern against it. Note, we are using the raw device, but you could also create a filesystem and run the benchmark against that instead. You can also perform different types of tests and use different block sizes (see the rbd command syntax or the syntax output below).

root@labserver:~# rbd map --namespace linux --id linux rbd/rdbbench


/dev/rbd1
root@labserver:~# rbd showmapped
id pool namespace image snap device
0 rbd linux linuxlun - /dev/rbd0
1 rbd linux rdbbench - /dev/rbd1
root@labserver:~#

bench --io-type <read | write | readwrite | rw> [--io-size size-in-B/K/M/G/T] [--io-threads


num-ios-in-flight] [--io-total size-in-B/K/M/G/T]
[--io-pattern seq | rand] [--rw-mix-read read proportion in readwrite] image-spec
Generate a series of IOs to the image and measure the IO throughput and
latency. If no suffix is given, unit B is assumed for both
--io-size and --io-total. Defaults are: --io-size 4096, --io-threads 16, --io-
total 1G, --io-pattern seq, --rw-mix-read 50.

rbd: couldn't connect to the cluster!


root@labserver:~# rbd --namespace linux --id linux bench --io-type rw rdbbench
bench type readwrite read:write=50:50 io_size 4096 io_threads 16 bytes 1073741824 pattern
sequential
SEC OPS OPS/SEC BYTES/SEC
1 5232 5248.04 21 MiB/s
2 9520 4760.9 19 MiB/s
3 13776 4594.31 18 MiB/s
.
.
.
60 251456 4575.58 18 MiB/s
61 255168 4119.26 16 MiB/s
62 258880 4081.67 16 MiB/s
elapsed: 62 ops: 262144 ops/sec: 4168.33 bytes/sec: 16 MiB/s
read_ops: 131039 read_ops/sec: 2083.64 read_bytes/sec: 8.1 MiB/s
write_ops: 131105 write_ops/sec: 2084.69 write_bytes/sec: 8.1 MiB/s
root@labserver:~#

root@labserver:~# rbd --namespace linux --id linux unmap rbd/rdbbench


root@labserver:~# rbd device list
id pool namespace image snap device
0 rbd linux linuxlun - /dev/rbd0
root@labserver:~#

You can monitor performance of the benchmark from the Ceph Grafana dashboard.


Figure 73: Ceph Grafana Dashboard – RBD Performance statistics

You can also make use of the rbd iotop and rbd iostat commands. You need to first make sure that the
rbd_support Ceph manager module is enabled (you can check with “ceph mgr module ls”) and ensure
that the RBD stats are enabled in the Prometheus manager module (they are not by default).
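
If the module is not listed as enabled, you can switch it on first (a sketch; on recent releases rbd_support is typically an always-on module):

# Check the manager module state, then enable it if required
ceph mgr module ls | grep -i rbd_support
ceph mgr module enable rbd_support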

[root@cephnode1 ~]# ceph config set mgr mgr/prometheus/rbd_stats_pools "*"


[root@cephnode1 ~]# rbd perf image iotop

Figure 74: Ceph iotop

Or alternatively you can use iostat or dstat.

[root@cephnode1 ~]# rbd perf image iostat


NAME WR RD WR_BYTES RD_BYTES WR_LAT RD_LAT
rbd/linux/linuxlun 10/s 0/s 22 MiB/s 0 B/s 7.08 s 0.00 ns

NAME WR RD WR_BYTES RD_BYTES WR_LAT RD_LAT


rbd/linux/linuxlun 12/s 0/s 28 MiB/s 0 B/s 8.48 s 0.00 ns
rbd/windows/winlun 0/s 0/s 0 B/s 0 B/s 3.65 s 0.00 ns

NAME WR RD WR_BYTES RD_BYTES WR_LAT RD_LAT


rbd/linux/linuxlun 15/s 0/s 30 MiB/s 0 B/s 10.94 s 0.00 ns
rbd/windows/winlun 0/s 0/s 0 B/s 0 B/s 26.74 ms 0.00 ns


Ceph RBD Images and Thin Provisioning


As mentioned earlier, RBD images are thin provisioned by default. As an example, let us write files to
our Windows RBD image and then delete them to see if the free space is reclaimed. First, we copy
some data to the RBD image.

Figure 75: Copying data to the Windows RBD image

Next, we can check on the Ceph dashboard for the current capacity utilization. In our case, the
Windows RBD image has consumed ~42% of its provisioned size of 3GB.

Figure 76: Ceph Dashboard List RBD Images
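
You can also compare provisioned versus actual usage from the CLI with rbd du (a sketch using the windows namespace and the client.windows identity from our lab):

# Show the provisioned size and the actual space used by the Windows image
rbd du --pool rbd --namespace windows --id windows winlun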

We now delete some files on the Windows host to free up space.


Figure 77: Deleting file on the Windows host

If we check on the Ceph dashboard, we should immediately see the freed-up space. This is one of the advantages of using Ceph. On traditional block storage arrays, thin-provisioned space is not reclaimed by default; reclamation is only possible if the array and the host operating system support the SCSI UNMAP function.

Figure 78: Ceph Dashboard List RBD Images

Testing RBD client access during a failure


For a POC, you will most likely need to prove that RADOS block clients are not affected during a Ceph storage node failure, for example. To test this scenario, we will generate an I/O load from both the Windows and Linux clients.


For the Windows RBD client, we will use AJA System Test Utility (https://www.aja.com/products/aja-system-test).

Figure 79: Using AJA System Test to generate load on Windows RBD client

On Linux, you can just use a simple dd command to generate load as follows:

root@labserver:/mnt# while true


> do
> dd if=/dev/zero of=1gbfile bs=1024k count=1000
> sleep 2
> done
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 8.31147 s, 126 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 18.5819 s, 56.4 MB/s
.
.
.

We can check the total throughput on the Ceph Grafana Dashboard.

Figure 80: Ceph Grafana Dashboard RBD Details


On the IBM Storage Ceph Dashboard we can check the load on our four cluster nodes.

Figure 81: IBM Storage Ceph Dashboard Host Performance

We can verify from the Linux client which nodes the client is connecting to. Sure enough, the client is load-balancing across the OSD nodes (.240-.243).

root@labserver:~# tshark -i ens18 | grep 10.0.0.24 | grep -v "10.0.0.101"


Running as user "root" and group "root". This could be dangerous.
Capturing on 'ens18'
60 1 0.000000000 10.0.0.241 → 10.0.0.246 TCP 66 6809 → 55436 [ACK] Seq=1 Ack=1
Win=22457 Len=0 TSval=3859233196 TSecr=1924312089
3 0.031761244 10.0.0.243 → 10.0.0.246 TCP 324 6811 → 35434 [PSH, ACK] Seq=1 Ack=1
Win=22451 Len=258 TSval=109991383 TSecr=2431963138
4 0.031789716 10.0.0.246 → 10.0.0.243 TCP 66 35434 → 6811 [ACK] Seq=1 Ack=259 Win=249
Len=0 TSval=2431963538 TSecr=109991383
6 0.185628596 10.0.0.241 → 10.0.0.246 TCP 324 6801 → 45636 [PSH, ACK] Seq=1 Ack=1
Win=28599 Len=258 TSval=3859233382 TSecr=1924312073
7 0.185675411 10.0.0.246 → 10.0.0.241 TCP 66 45636 → 6801 [ACK] Seq=1 Ack=259 Win=249
Len=0 TSval=1924312315 TSecr=3859233382
8 0.211422132 10.0.0.241 → 10.0.0.246 TCP 324 6809 → 55436 [PSH, ACK] Seq=1 Ack=1
Win=22457 Len=258 TSval=3859233408 TSecr=1924312089
9 0.211422396 10.0.0.241 → 10.0.0.246 TCP 324 6809 → 55436 [PSH, ACK] Seq=259 Ack=1
Win=22457 Len=258 TSval=3859233408 TSecr=1924312089
10 0.211496459 10.0.0.246 → 10.0.0.241 TCP 66 55436 → 6809 [ACK] Seq=1 Ack=259
Win=1775 Len=0 TSval=1924312341 TSecr=3859233408
11 0.211511900 10.0.0.246 → 10.0.0.241 TCP 66 55436 → 6809 [ACK] Seq=1 Ack=517
Win=1773 Len=0 TSval=1924312341 TSecr=3859233408
12 0.247393568 10.0.0.246 → 10.0.0.243 TCP 75 35434 → 6811 [PSH, ACK] Seq=1 Ack=259
Win=249 Len=9 TSval=2431963754 TSecr=109991383
13 0.247647438 10.0.0.243 → 10.0.0.246 TCP 66 6811 → 35434 [ACK] Seq=259 Ack=10
Win=22451 Len=0 TSval=109991599 TSecr=2431963754
15 0.249786477 10.0.0.243 → 10.0.0.246 TCP 324 6811 → 35434 [PSH, ACK] Seq=259 Ack=10
Win=22451 Len=258 TSval=109991601 TSecr=2431963754
16 0.249830887 10.0.0.246 → 10.0.0.243 TCP 66 35434 → 6811 [ACK] Seq=10 Ack=517
Win=249 Len=0 TSval=2431963756 TSecr=109991601
17 0.250556209 10.0.0.240 → 10.0.0.246 TCP 324 6811 → 34828 [PSH, ACK] Seq=1 Ack=1
Win=22454 Len=258 TSval=448126349 TSecr=4140953940
18 0.250580712 10.0.0.246 → 10.0.0.240 TCP 66 34828 → 6811 [ACK] Seq=1 Ack=259
Win=2356 Len=0 TSval=4140954719 TSecr=448126349
.
.
.


Let us now shutdown one of the cluster nodes and monitor the behavior.

[root@cephnode2 ~]# shutdown -t now


Shutdown scheduled for Mon 2024-07-22 19:06:17 SAST, use 'shutdown -c' to cancel.
[root@cephnode2 ~]#

[root@cephnode1 ~]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
1/3 mons down, quorum cephnode1,cephnode3
2 osds down
1 host (2 osds) down

services:
mon: 3 daemons, quorum cephnode1,cephnode3 (age 17s), out of quorum: cephnode2
mgr: cephnode2.iaecpr(active, since 78m), standbys: cephnode1.tbqyke
mds: 1/1 daemons up, 1 standby
osd: 8 osds: 6 up (since 16s), 8 in (since 3d)
rgw: 2 daemons active (2 hosts, 1 zones)

data:
volumes: 1/1 healthy
pools: 11 pools, 625 pgs
objects: 1.40k objects, 3.2 GiB
usage: 17 GiB used, 182 GiB / 200 GiB avail
pgs: 625 active+clean

io:
client: 1.2 KiB/s rd, 15 MiB/s wr, 1 op/s rd, 8 op/s wr

[root@cephnode1 ~]#

From the Linux host we can see a slight delay in writing our test file. However, access was not lost and I/O continued as normal during the failure, albeit with some performance degradation (this is to be expected as we shut down an OSD node with two active OSDs). The same behavior is observed on the Windows client.

.
.
.
1048576000 bytes (1.0 GB, 1000 MiB) copied, 28.8649 s, 36.3 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 36.7548 s, 28.5 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 38.0272 s, 27.6 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 30.8565 s, 34.0 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 25.0541 s, 41.9 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 25.4571 s, 41.2 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 23.8836 s, 43.9 MB/s
.
.
.

We can see the drop in performance when cephnode2 was shut down and the subsequent resumption of the I/O load after a brief pause.


Figure 82: IBM Storage Ceph Dashboard Host Performance

The Ceph Grafana dashboard RBD Details also show the failure and continuation of I/O after it.

Figure 83: Ceph Grafana Dashboard RBD Details

Reboot the failed node and wait for the cluster to recover.

[root@cephnode1 ~]# ceph health detail


HEALTH_OK
[root@cephnode1 ~]#

IBM Storage Ceph Grafana Dashboards


For a POC, you would typically need to demonstrate Ceph’s reporting capability. The Ceph Grafana
dashboards are available at port 3000 on the node that has the Grafana service deployed. If you recall,
we selected cephnode4 (by way of labelling). The bootstrap process automatically enables the
Grafana dashboards unless you choose not to. By default, cephadm does not create an admin user for
Grafana so you might want to do this now. This process is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=access-setting-admin-user-password-grafana


Essentially, we need to create a Grafana YAML file with the desired admin password and then use the
Ceph orchestrator to apply the new specification as per below:

[root@cephnode1 ~]# cat grafana.yml


service_type: grafana
spec:
initial_admin_password: Passw0rd
[root@cephnode1 ~]#

[root@cephnode1 ~]# cephadm shell --mount grafana.yml:/root/grafana.yml


Inferring fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
Inferring config /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-bc241172f341/mon.cephnode1/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@cephnode1 /]#
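
From inside the cephadm shell, you can then apply the specification and redeploy Grafana so the new password takes effect. A sketch of the documented flow (output omitted):

[ceph: root@cephnode1 /]# ceph orch apply -i /root/grafana.yml
[ceph: root@cephnode1 /]# ceph orch redeploy grafana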

You can then connect to the IP of the node running the Grafana service (check under services in the
Ceph Dashboard) and login as admin with the new password.

Figure 84: Ceph Grafana Welcome page

If you navigate to Dashboards, you will see the list of pre-installed dashboards that can be displayed.


Figure 85: Ceph Grafana Pre-installed Dashboards

You can then display the desired dashboard.

Figure 86: Ceph Grafana Cluster Dashboard


Figure 87: Ceph Grafana Host Dashboard

Figure 88: Ceph Grafana Pools Dashboard


Figure 89: Ceph Grafana OSD Dashboard

If you get NO DATA on any of the dashboards, check that the data source definitions are correctly defined. Also, if you get errors when accessing the embedded Grafana pages on the Ceph dashboard due to the use of a self-signed certificate, be sure to add an exception in your browser (e.g. in Firefox, under Settings -> Privacy and Security, navigate to Certificate Manager and add an exception manually).

IBM Storage Ceph Software Upgrade


As part of a POC, you will need to demonstrate Ceph’s non-disruptive upgrade capability. You can upgrade your IBM Storage Ceph software version from the Ceph dashboard or the CLI. A full explanation of all the upgrade prerequisites and options is documented here.

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=upgrading

To check for a software upgrade on the Ceph dashboard, navigate to “Administration -> Upgrade”.


Figure 90: Ceph Dashboard Upgrade Software

As you can see, we can’t do an upgrade check from the GUI. This issue is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=access-upgrading-cluster

You can check the current version of IBM Storage Ceph from the command line as follows:

[root@cephnode1 ~]# ceph version


ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)
[root@cephnode1 ~]# ceph versions
{
"mon": {
"ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)": 3
},
"mgr": {
"ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)": 2
},
"osd": {
"ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)": 8
},
"rgw": {
"ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)": 2
},
"overall": {


"ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef


(stable)": 15
}
}
[root@cephnode1 ~]#

To perform a command line upgrade, follow the process documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=cephadm-upgrading-storage-ceph-cluster

You will also want to check the options for performing a staggered upgrade, which are documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=upgrading-staggered-upgrade
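
As a sketch of what a staggered upgrade can look like, you can restrict an upgrade run to particular daemon types, hosts or a daemon count; the daemon types, host and limit below are illustrative assumptions rather than values used later in this lab:

[root@cephnode1 ~]# ceph orch upgrade start --image cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest --daemon-types mgr,mon
[root@cephnode1 ~]# ceph orch upgrade start --image cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest --daemon-types osd --hosts cephnode2.local --limit 1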

Since we can’t demonstrate the upgrade from the Ceph Dashboard, we will go through the steps for a command-line upgrade using the Ceph orchestrator. The automated upgrade process follows Ceph best practices: it upgrades the MGRs first, then the MONs, and then the other daemons. Each daemon is restarted only after Ceph determines that the cluster will remain available.

As per the link provided above, the procedure is demonstrated below. First, we will ensure we have a
valid subscription and apply the latest OS updates (similar to when we bootstrapped a new cluster).

[root@cephnode1 ~]# cephdsh yum repolist


[1] 20:09:26 [SUCCESS] root@cephnode2
Updating Subscription Management repositories.
repo id repo name
ceph_stable_x86_64 IBM Ceph repo - x86_64
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[2] 20:09:26 [SUCCESS] root@cephnode1
Updating Subscription Management repositories.
repo id repo name
ceph_stable_x86_64 IBM Ceph repo - x86_64
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[3] 20:09:26 [SUCCESS] root@cephnode4
Updating Subscription Management repositories.
repo id repo name
ceph_stable_x86_64 IBM Ceph repo - x86_64
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[4] 20:09:27 [SUCCESS] root@cephnode3
Updating Subscription Management repositories.
repo id repo name
ceph_stable_x86_64 IBM Ceph repo - x86_64
epel Extra Packages for Enterprise Linux 9 - x86_64
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
[root@cephnode1 ~]#

[root@cephnode1 ~]# cephdsh dnf update -y


[1] 21:38:08 [SUCCESS] root@cephnode4
Updating Subscription Management repositories.
Last metadata expiration check: 1:40:50 ago on Sun 21 Jul 2024 19:57:18.
Dependencies resolved.
Nothing to do.
Complete!
[2] 21:38:08 [SUCCESS] root@cephnode3
Updating Subscription Management repositories.
Last metadata expiration check: 2:39:59 ago on Sun 21 Jul 2024 18:58:09.
Dependencies resolved.


Nothing to do.
Complete!
[3] 21:38:09 [SUCCESS] root@cephnode1
Updating Subscription Management repositories.
Last metadata expiration check: 1:26:15 ago on Sun 21 Jul 2024 20:11:53.
Dependencies resolved.
Nothing to do.
Complete!
[4] 21:38:09 [SUCCESS] root@cephnode2
Updating Subscription Management repositories.
Last metadata expiration check: 2:09:19 ago on Sun 21 Jul 2024 19:28:49.
Dependencies resolved.
Nothing to do.
Complete!
[root@cephnode1 ~]#

We also need to update cephadm and cephadm-ansible to the latest versions.

[root@cephnode1 ~]# cephdsh dnf update cephadm -y


[1] 21:38:57 [SUCCESS] root@cephnode1
Updating Subscription Management repositories.
Last metadata expiration check: 1:27:04 ago on Sun 21 Jul 2024 20:11:53.
Dependencies resolved.
Nothing to do.
Complete!
[2] 21:38:57 [SUCCESS] root@cephnode4
Updating Subscription Management repositories.
Last metadata expiration check: 1:41:39 ago on Sun 21 Jul 2024 19:57:18.
Dependencies resolved.
Nothing to do.
Complete!
[3] 21:38:58 [SUCCESS] root@cephnode3
Updating Subscription Management repositories.
Last metadata expiration check: 2:40:48 ago on Sun 21 Jul 2024 18:58:09.
Dependencies resolved.
Nothing to do.
Complete!
[4] 21:38:58 [SUCCESS] root@cephnode2
Updating Subscription Management repositories.
Last metadata expiration check: 2:10:08 ago on Sun 21 Jul 2024 19:28:49.
Dependencies resolved.
Nothing to do.
Complete!
[root@cephnode1 ~]#

[root@cephnode1 ~]# dnf update cephadm-ansible -y


Updating Subscription Management repositories.
Last metadata expiration check: 1:28:00 ago on Sun 21 Jul 2024 20:11:53.
Dependencies resolved.
Nothing to do.
Complete!
[root@cephnode1 ~]#

We used PSSH as explained earlier. If you chose not to use it, you can use the Ceph preflight playbook to upgrade the Ceph packages on the other cluster nodes by specifying upgrade_ceph_packages=true as follows:
[root@cephnode1 ~]# cd /usr/share/cephadm-ansible
[root@cephnode1 cephadm-ansible]# ansible-playbook -i ./inventory/production/hosts cephadm-
preflight.yml --extra-vars "ceph_origin=ibm upgrade_ceph_packages=true"
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new
standard, use callbacks_enabled instead. This feature will be removed from ansible-core in
version
2.15. Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg.

PLAY [insecure_registries]
*********************************************************************************************
*********************************************************************

TASK [fail if insecure_registry is undefined]


*********************************************************************************************
**************************************************


Sunday 21 July 2024 21:30:44 +0200 (0:00:00.036) 0:00:00.036 ***********


skipping: [cephnode2]

PLAY [preflight]
*********************************************************************************************
*******************************************************************************

TASK [fail when ceph_origin is custom with no repository defined]


*********************************************************************************************
******************************
Sunday 21 July 2024 21:30:44 +0200 (0:00:00.106) 0:00:00.143 ***********
skipping: [cephnode2]

TASK [fail if baseurl is not defined for ceph_custom_repositories]


*********************************************************************************************
*****************************
.
.
.
fetch ceph development repository -----------------------------------------------------------
-----------------------------------------------------------------------------------------
0.05s
enable red hat ceph storage tools repository ------------------------------------------------
-----------------------------------------------------------------------------------------
0.05s
configure ceph development repository -------------------------------------------------------
-----------------------------------------------------------------------------------------
0.05s
[root@cephnode1 cephadm-ansible]#

Before we start the upgrade, we need to check that all cluster nodes are online and that our cluster is
healthy.

[root@cephnode1 cephadm-ansible]# cephadm shell


Inferring fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
Inferring config /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-bc241172f341/mon.cephnode1/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@cephnode1 /]# ceph -s
cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_OK

services:
mon: 3 daemons, quorum cephnode1,cephnode3,cephnode2 (age 57m)
mgr: cephnode2.iaecpr(active, since 57m), standbys: cephnode1.tbqyke
osd: 8 osds: 8 up (since 57m), 8 in (since 2d)
rgw: 2 daemons active (2 hosts, 1 zones)

data:
pools: 9 pools, 481 pgs
objects: 632 objects, 716 MiB
usage: 2.6 GiB used, 197 GiB / 200 GiB avail
pgs: 481 active+clean

[ceph: root@cephnode1 /]#

[root@cephnode1 ~]# ceph orch host ls


HOST ADDR LABELS STATUS
cephnode1.local 10.0.0.240 _admin,mon,mgr,osd,mds
cephnode2.local 10.0.0.241 mgr,mon,osd,rgw
cephnode3.local 10.0.0.242 mon,osd,rgw
cephnode4.local 10.0.0.243 mds,osd,grafana
4 hosts in cluster
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph cephadm check-host cephnode4.local


cephnode4.local (None) ok
podman (/usr/bin/podman) version 4.9.4 is present
systemctl is present


lvcreate is present
Unit chronyd.service is enabled and running
Hostname "cephnode4.local" matches what is expected.
Host looks OK
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph health


HEALTH_OK
[root@cephnode1 ~]#

In order to ensure that no recovery actions are performed during the upgrade, we want to set the
noout, noscrub and nodeep-scrub flags. This will also prevent any unnecessary load on the cluster
during the upgrade.

[ceph: root@cephnode1 /]# ceph osd set noout


noout is set
[ceph: root@cephnode1 /]# ceph osd set noscrub
noscrub is set
[ceph: root@cephnode1 /]# ceph osd set nodeep-scrub
nodeep-scrub is set
[ceph: root@cephnode1 /]#

Now we need to login to the IBM Container Registry to check service versions and available target
versions.

[root@cephnode1 ~]# ceph cephadm registry-login -i /etc/registry.json


registry login scheduled
[root@cephnode1 ~]# ceph orch upgrade check cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest
{
"needs_update": {},
"non_ceph_image_daemons": [
"node-exporter.cephnode1",
"alertmanager.cephnode1",
"prometheus.cephnode1",
"node-exporter.cephnode2",
"haproxy.rgw.rgw_default.cephnode2.ggqdig",
"keepalived.rgw.rgw_default.cephnode2.feptmu",
"node-exporter.cephnode3",
"haproxy.rgw.rgw_default.cephnode3.hrynln",
"keepalived.rgw.rgw_default.cephnode3.warcjn",
"node-exporter.cephnode4",
"grafana.cephnode4"
],
"target_digest": "cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:01269061f428d247bdbc8d4855ccf847e3d909e9a4d2418e4ee63bf2e6e6a7de",
"target_id": "a09ffce67935824d5fa4c1ed5d399ec3c815d0d36dd4eaca0902ff6764375bfe",
"target_name": "cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest",
"target_version": "ceph version 18.2.1-194.el9cp
(04a992766839cd3207877e518a1238cdbac3787e) reef (stable)",
"up_to_date": [
"mon.cephnode1",
"mgr.cephnode1.tbqyke",
"ceph-exporter.cephnode1",
"crash.cephnode1",
"osd.0",
"osd.1",
"ceph-exporter.cephnode2",
"crash.cephnode2",
"osd.2",
"osd.5",
"mon.cephnode2",
"mgr.cephnode2.iaecpr",
"rgw.rgw_default.cephnode2.urtdgh",
"ceph-exporter.cephnode3",
"crash.cephnode3",
"osd.3",
"osd.6",
"mon.cephnode3",
"rgw.rgw_default.cephnode3.iuusox",
"ceph-exporter.cephnode4",
"crash.cephnode4",


"osd.4",
"osd.7"
]
}
[root@cephnode1 ~]#

The check output shows which daemons are already up to date, which (if any) need an update, and which daemons run non-Ceph images and are therefore not compared against this target. We can now start the upgrade of our Ceph cluster.

[root@cephnode1 ~]# ceph orch upgrade start cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest


Initiating upgrade to cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest
[root@cephnode1 ~]#

We can monitor the progress using the “ceph orch upgrade status” command. When the in_progress field changes to false, the upgrade is complete.
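
You can also follow the orchestrator activity in real time, and pause or resume a running upgrade if needed; standard orchestrator commands, shown here without output:

[root@cephnode1 ~]# ceph -W cephadm
[root@cephnode1 ~]# ceph orch upgrade pause
[root@cephnode1 ~]# ceph orch upgrade resume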

[root@cephnode1 ~]# ceph orch upgrade status


{
"target_image": "cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:01269061f428d247bdbc8d4855ccf847e3d909e9a4d2418e4ee63bf2e6e6a7de",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [
"crash",
"ceph-exporter",
"osd",
"rgw",
"mgr"
],
"progress": "22/30 daemons upgraded",
"message": "Currently upgrading mon daemons",
"is_paused": false
}
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph orch upgrade status


{
"target_image": null,
"in_progress": false,
"which": "<unknown>",
"services_complete": [],
"progress": null,
"message": "",
"is_paused": false
}
[root@cephnode1 ~]#

You can also check the status using the “ceph status” command (sample output provided).

[ceph: root@host01 /]# ceph status


[...]
progress:
Upgrade to 18.2.0-128.el9c (1s)
[............................]

After the upgrade is completed, you can check the daemon versions using “ceph orch ps” or “ceph versions”, and check the cluster version using the “ceph --version” command.

[root@cephnode1 ~]# ceph orch ps


NAME HOST PORTS STATUS
REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.cephnode1 cephnode1.local *:9093,9094 running (75m)
4m ago 5d 24.4M - 0.26.0 2bdd88ba9d9f 3e4cdf16f044
ceph-exporter.cephnode1 cephnode1.local running (75m)
4m ago 5d 9.84M - 18.2.1-194.el9cp a09ffce67935 8b43319b2a73


ceph-exporter.cephnode2 cephnode2.local running (79m)


3m ago 2d 11.1M - 18.2.1-194.el9cp a09ffce67935 ab42f6da1169
ceph-exporter.cephnode3 cephnode3.local running (88m)
3m ago 2d 11.5M - 18.2.1-194.el9cp a09ffce67935 ba3c02427fca
ceph-exporter.cephnode4 cephnode4.local running (89m)
3m ago 2d 9517k - 18.2.1-194.el9cp a09ffce67935 8c18454ac5ee
crash.cephnode1 cephnode1.local running (75m)
4m ago 5d 6895k - 18.2.1-194.el9cp a09ffce67935 495bfbc94b2a
crash.cephnode2 cephnode2.local running (79m)
3m ago 2d 6883k - 18.2.1-194.el9cp a09ffce67935 88e7260c77bf
crash.cephnode3 cephnode3.local running (88m)
3m ago 2d 6878k - 18.2.1-194.el9cp a09ffce67935 3b7ef66338ad
crash.cephnode4 cephnode4.local running (89m)
3m ago 2d 6874k - 18.2.1-194.el9cp a09ffce67935 51380ed8956e
grafana.cephnode4 cephnode4.local *:3000 running (89m)
3m ago 9h 81.5M - 10.4.0-pre 623fd2b148fe 46d089c37119
haproxy.rgw.rgw_default.cephnode2.ggqdig cephnode2.local *:443,1900 running (79m)
.
.
.
cephnode1.local *:9095 running (75m) 4m ago 5d 68.0M - 2.48.0
d1ad5c044d2e eefb16865f49
rgw.rgw_default.cephnode2.urtdgh cephnode2.local *:8080 running (79m)
3m ago 37h 153M - 18.2.1-194.el9cp a09ffce67935 11f695d401ef
rgw.rgw_default.cephnode3.iuusox cephnode3.local *:8080 running (88m)
3m ago 37h 157M - 18.2.1-194.el9cp a09ffce67935 0de0049e3422
[root@cephnode1 ~]#

We need to unset the noout, noscrub, and nodeep-scrub flags we set before we started the upgrade.

[root@cephnode1 ~]# ceph status


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
noout,noscrub,nodeep-scrub flag(s) set

services:
mon: 3 daemons, quorum cephnode1,cephnode3,cephnode2 (age 5m)
mgr: cephnode2.iaecpr(active, since 75m), standbys: cephnode1.tbqyke
osd: 8 osds: 8 up (since 75m), 8 in (since 2d)
flags noout,noscrub,nodeep-scrub
rgw: 2 daemons active (2 hosts, 1 zones)

data:
pools: 9 pools, 481 pgs
objects: 632 objects, 716 MiB
usage: 2.6 GiB used, 197 GiB / 200 GiB avail
pgs: 481 active+clean

[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph osd unset noout


noout is unset
[root@cephnode1 ~]# ceph osd unset noscrub
noscrub is unset
[root@cephnode1 ~]# ceph osd unset nodeep-scrub
nodeep-scrub is unset
[root@cephnode1 ~]# ceph -s
cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_OK

services:
mon: 3 daemons, quorum cephnode1,cephnode3,cephnode2 (age 12m)
mgr: cephnode2.iaecpr(active, since 83m), standbys: cephnode1.tbqyke
osd: 8 osds: 8 up (since 83m), 8 in (since 2d)
rgw: 2 daemons active (2 hosts, 1 zones)

data:
pools: 9 pools, 481 pgs
objects: 632 objects, 716 MiB
usage: 2.6 GiB used, 197 GiB / 200 GiB avail
pgs: 481 active+clean

[root@cephnode1 ~]#


The final step is to upgrade the Ceph client packages on all client nodes that connect to the cluster and check that they are on the latest version. On RHEL clients, you can run “dnf update ceph-common” followed by “ceph --version” to do this.
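
A minimal sketch of that client-side step on a RHEL host (the hostname is hypothetical and output is omitted):

[root@rhelclient ~]# dnf update ceph-common -y
[root@rhelclient ~]# ceph --version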

IBM Storage Ceph Filesystem (CephFS) Deployment


Ceph File System (CephFS) is a highly performant, POSIX-compliant filesystem that is built on top of Ceph’s distributed object store, RADOS. CephFS stores the filesystem metadata in a separate RADOS pool from the file data, and the filesystem is served to clients via a cluster of Metadata Servers (MDS) which can scale to support high-throughput workloads. CephFS was originally the primary storage interface for RADOS, with RBD and RGW being added later. The CephFS kernel driver is included in the Linux kernel.

As part of a POC, you would want to demonstrate the use of CephFS as a clustered filesystem similar
to IBM Storage Scale, GlusterFS or Lustre. The same use cases would typically apply to CephFS as
with any of the others mentioned.

Let’s start by navigating to File Systems on the Ceph Dashboard and clicking on “Create”. Note that we are choosing to deploy our MDS daemons on any cluster nodes labelled mds. Ceph will create two pools for each filesystem: a metadata pool and a data pool. We do not specify a size for the filesystem at creation time as the pools are thin provisioned.

Figure 91: Ceph Dashboard File Systems – Create
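
If you prefer the command line, the same filesystem can typically be created with the fs volume interface, which also creates the metadata and data pools and schedules the MDS daemons. A sketch using the mds label from this guide:

[root@cephnode1 ~]# ceph fs volume create labserver --placement="label:mds"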

By default, a Ceph File System uses only one active MDS daemon. However, systems with many clients benefit from multiple active MDS daemons. For a POC, we just need one active and one standby daemon. Navigate to Administration -> Services, edit the service called mds.labserver (our filesystem name) and change the count to 2.
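
The equivalent change from the command line would typically be a placement update on the MDS service; a sketch using the count and label from this guide:

[root@cephnode1 ~]# ceph orch apply mds labserver --placement="2 label:mds"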


Figure 92: Ceph Dashboard MDS Service – Edit

We should have at least two MDS daemons running, one active and one standby.

Figure 93: Ceph Dashboard MDS Service – Details

You can also query this from the command line as follows:

[root@cephnode1 ~]# ceph orch ls --service-type mds


NAME PORTS RUNNING REFRESHED AGE PLACEMENT
mds.labserver 2/2 90s ago 102s count:2;label:mds
[root@cephnode1 ~]#

On the Ceph Dashboard, navigate back to File System and get details for the newly created filesystem.
We can see that we have one active and one standby daemon for our filesystem.


Figure 94: Ceph Dashboard File Systems – File System Details

We now need to add client access to the filesystem. You can click on “Authorize” to add a client. We want to grant the client lab access to the entire labserver filesystem with read and write permissions. We can untick “Root Squash”.

Figure 95: Ceph Dashboard File Systems – Update Access
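
The same authorization can also be granted from the command line with the fs authorize command; a sketch matching the client name, filesystem and permissions used here:

[root@cephnode1 ~]# ceph fs authorize labserver client.lab / rw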

We need the client keyring, so we must navigate to Administration -> Ceph Users -> client.lab and click on “Edit” to see the client key. You can also choose to export it.


Figure 96: Ceph Dashboard Ceph Users – Edit User to display key

You can also generate the client keyring file from the command line as follows:

[root@cephnode1 ceph]# ceph auth get client.lab


[client.lab]
key = AQB0nZ1mIzu1ORAAI0z9e0wjqsqhNn5ENsWlRw==
caps mds = "allow rw fsname=labserver"
caps mon = "allow r fsname=labserver"
caps osd = "allow rw tag cephfs data=labserver"
[root@cephnode1 ceph]# ceph auth get client.lab > /etc/ceph/ceph.client.lab.keyring
[root@cephnode1 ceph]#

Always make sure your keyring files have the correct permissions set. They should be set to 600.

We need to copy this keyring file to our CephFS client server and mount the Ceph filesystem we just created. We want to mount the filesystem using the CephFS Linux kernel driver so that it mounts as a regular filesystem and we get native kernel performance. You can also mount it using ceph-fuse. The syntax for the mount command is as follows:

mount -t ceph <client_name>@<fsid>.<fs_name>=/[subdir] <directory> -o [options]

The fsid is a unique identifier for the Ceph cluster. It stands for File System ID, from the days when the Ceph Storage Cluster was used principally for the Ceph File System. Ceph now supports block device and object storage gateway interfaces too, so fsid is a bit of a misnomer. You can obtain it by running “ceph -s” on one of the Ceph cluster nodes.

[root@cephnode1 ~]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_OK
.
.
.


You can also get the CephFS mount command syntax from the Ceph Dashboard by clicking on the
filesystem and then selecting “Attach” as the action.

Figure 97: Ceph Dashboard – File System – Attach Commands

Let us copy the client keyring file we exported and mount the filesystem.

root@labserver:/etc/ceph# scp cephnode1:/etc/ceph/ceph.client.lab.keyring


/etc/ceph/ceph.client.lab.keyring
root@cephnode1's password:
ceph.client.lab.keyring
100% 189 349.2KB/s 00:00
root@labserver:/etc/ceph# mount -t ceph [email protected]=/
/cephfs -v
parsing options: rw
mount.ceph: resolved to: "10.0.0.240:3300,10.0.0.241:3300,10.0.0.242:3300"
mount.ceph: trying mount with new device syntax: lab@e7fcc1ac-42ec-11ef-a58f-
bc241172f341.labserver=/
mount.ceph: options "name=lab,ms_mode=prefer-
crc,key=lab,mon_addr=10.0.0.240:3300/10.0.0.241:3300/10.0.0.242:3300" will pass to kernel
root@labserver:/etc/ceph# df
Filesystem 1K-blocks Used Available Use%
Mounted on
tmpfs 401004 1080 399924 1% /run
/dev/sda2 20463184 14165708 5232672 74% /
tmpfs 2005008 84 2004924 1%
/dev/shm
tmpfs 5120 0 5120 0%
/run/lock
/dev/rbd0 3080192 92372 2987820 3% /mnt
tmpfs 401000 12 400988 1%
/run/user/0
[email protected]=/ 65007616 0 65007616 0%
/cephfs
root@labserver:/etc/ceph# mount
.
.
.
[email protected]=/ on /cephfs type ceph
(rw,relatime,name=lab,secret=<hidden>,ms_mode=prefer-
crc,acl,mon_addr=10.0.0.240:3300/10.0.0.241:3300/10.0.0.242:3300)
root@labserver:/etc/ceph#
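
To make the mount persistent across reboots, an /etc/fstab entry using the same device syntax can typically be used, assuming the client has the cluster’s ceph.conf and the client keyring in /etc/ceph (a sketch, not taken from this lab):

lab@e7fcc1ac-42ec-11ef-a58f-bc241172f341.labserver=/   /cephfs   ceph   noatime,_netdev   0 0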


If you want to monitor performance from the command line, you can install the cephfs-top utility. You need to enable the stats plugin in the MGR module, as it is disabled by default. You also need to create a client.fstop user for the cephfs-top utility to function.

[root@cephnode1 ~]# ceph fs perf stats


Error ENOTSUP: Module 'stats' is not enabled/loaded (required by command 'fs perf stats'):
use `ceph mgr module enable stats` to enable it
[root@cephnode1 ~]# ceph mgr module enable stats
[root@cephnode1 ~]# ceph fs perf stats
{"version": 2, "global_counters": ["cap_hit", "read_latency", "write_latency",
"metadata_latency", "dentry_lease", "opened_files", "pinned_icaps", "opened_inodes",
"read_io_sizes", "write_io_sizes", "avg_read_latency", "stdev_read_latency",
"avg_write_latency", "stdev_write_latency", "avg_metadata_latency",
"stdev_metadata_latency"], "counters": [], "client_metadata": {}, "global_metrics": {},
"metrics": {"delayed_ranks": []}}
[root@cephnode1 ~]# ceph fs perf stats
{"version": 2, "global_counters": ["cap_hit", "read_latency", "write_latency",
"metadata_latency", "dentry_lease", "opened_files", "pinned_icaps", "opened_inodes",
"read_io_sizes", "write_io_sizes", "avg_read_latency", "stdev_read_latency",
.
.
.
"stdev_metadata_latency"], "IP": "10.0.0.244"}}}, "global_metrics": {"labserver":
{"client.205540": [[2059, 8], [0, 0], [72, 673842930], [0, 16169946], [5, 0], [0, 1], [1, 1],
[0, 1], [0, 0], [251, 1048576518], [0, 0], [0, 0], [0, 289537235], [2778254950830185715,
251], [0, 2309994], [62991625956152, 7]], "client.274523": [[2, 0], [0, 0], [0, 0], [0,
631727], [0, 0], [0, 1], [1, 1], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0,
315863], [3418239244, 2]], "client.284421": [[2, 0], [0, 0], [0, 0], [0, 681363], [0, 0], [0,
1], [1, 1], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 340681],
[22916621784, 2]]}}, "metrics": {"delayed_ranks": [], "mds.0": {"client.205540": [],
"client.274523": [], "client.284421": []}}}
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd
'allow r' mgr 'allow r'
[client.fstop]
key = AQC+dZ5mhqpkAhAAohC9SDLOSj8f10QxzLiDBg==
[root@cephnode1 ~]#

[root@cephnode1 ~]# dnf install cephfs-top -y


Updating Subscription Management repositories.
Last metadata expiration check: 1:44:31 ago on Mon 22 Jul 2024 15:26:22.
Dependencies resolved.
=============================================================================================
=========================================
Package Architecture Version
Repository Size
=============================================================================================
=========================================
Installing:
cephfs-top noarch 2:18.2.1-194.el9cp
ceph_stable_x86_64 114 k
.
.
.
Installed:
cephfs-top-2:18.2.1-194.el9cp.noarch

Complete!
[root@cephnode1 ~]#

Now you can get detailed metrics from the command line using cephfs-top.


Figure 98: Cephfs-top command line utility

You can also view the performance through the Ceph Dashboard or Grafana Dashboard.

Figure 99: Ceph Dashboard – File System Performance Details

A quick test of client access during a node failure can be done as follows. First, we generate some
client load from our CephFS client.

root@labserver:/cephfs# while true; do dd if=/dev/zero of=1gbfile bs=1024k count=1000; sleep


2; done
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.13656 s, 491 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.18898 s, 479 MB/s
.
.
Next, we simulate a failure of one of the nodes running an MDS daemon.

[root@cephnode1 ~]# shutdown -t now


Shutdown scheduled for Mon 2024-07-22 17:44:32 SAST, use 'shutdown -c' to cancel.
[root@cephnode1 ~]#


Check the cluster status to make sure the node is down; we should now only have one MDS daemon running.

[root@cephnode4 ~]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
insufficient standby MDS daemons available
1/3 mons down, quorum cephnode3,cephnode2
2 osds down
1 host (2 osds) down
Degraded data redundancy: 902/3924 objects degraded (22.987%), 194 pgs degraded,
455 pgs undersized

services:
mon: 3 daemons, quorum cephnode3,cephnode2 (age 2m), out of quorum: cephnode1
mgr: cephnode2.iaecpr(active, since 42m)
mds: 1/1 daemons up
osd: 8 osds: 6 up (since 2m), 8 in (since 3d)
rgw: 2 daemons active (2 hosts, 1 zones)

data:
volumes: 1/1 healthy
pools: 11 pools, 625 pgs
objects: 1.31k objects, 2.5 GiB
usage: 15 GiB used, 185 GiB / 200 GiB avail
pgs: 902/3924 objects degraded (22.987%)
261 active+undersized
194 active+undersized+degraded
170 active+clean

io:
client: 4.3 KiB/s rd, 121 MiB/s wr, 8 op/s rd, 136 op/s wr

[root@cephnode4 ~]#

On the CephFS client, our simple script to create a file with dd is still running fine.

1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.81071 s, 579 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.62853 s, 644 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.58135 s, 663 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.55161 s, 411 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 8.83065 s, 119 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 5.6657 s, 185 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.7842 s, 588 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.72276 s, 609 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.78127 s, 589 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.70911 s, 614 MB/s

You can reboot the failed node and make sure the cluster goes back to a healthy state.


As with pools, deleting a CephFS filesystem is disabled in the Dashboard by default. You can follow
the procedure below to delete a File System.

[root@cephnode1 ~]# ceph fs status


cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.cephnode2.ecosxb Reqs: 0 /s 10 13 12 0
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 96.0k 62.0G
cephfs.cephfs.data data 0 62.0G
new - 0 clients
===
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active new.cephnode1.cdvukk Reqs: 0 /s 10 13 12 0
POOL TYPE USED AVAIL
cephfs.new.meta metadata 96.0k 62.0G
cephfs.new.data data 0 62.0G
STANDBY MDS
mds_default.cephnode4.utqxnu
linuxfs.cephnode4.nwrfth
mds_default.cephnode1.unlvwf
MDS version: ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)
[root@cephnode1 ~]# ceph fs set cephfs down true
cephfs marked down.
[root@cephnode1 ~]# ceph fs set new down true
new marked down.
[root@cephnode1 ~]# ceph fs status
cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 stopping cephfs.cephnode2.ecosxb 10 13 12 0
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 108k 62.0G
cephfs.cephfs.data data 0 62.0G
new - 0 clients
===
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 stopping new.cephnode1.cdvukk 10 13 12 0
POOL TYPE USED AVAIL
cephfs.new.meta metadata 96.0k 62.0G
cephfs.new.data data 0 62.0G
STANDBY MDS
mds_default.cephnode4.utqxnu
linuxfs.cephnode4.nwrfth
mds_default.cephnode1.unlvwf
MDS version: ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)
[root@cephnode1 ~]# ceph fs status
cephfs - 0 clients
======
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 108k 62.0G
cephfs.cephfs.data data 0 62.0G
new - 0 clients
===
POOL TYPE USED AVAIL
cephfs.new.meta metadata 96.0k 62.0G
cephfs.new.data data 0 62.0G
STANDBY MDS
mds_default.cephnode4.utqxnu
new.cephnode1.cdvukk
linuxfs.cephnode4.nwrfth
cephfs.cephnode2.ecosxb
mds_default.cephnode1.unlvwf
MDS version: ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)
[root@cephnode1 ~]# ceph fs rm new --yes-i-really-mean-it
[root@cephnode1 ~]# ceph fs rm cephfs --yes-i-really-mean-it


[root@cephnode1 ~]# ceph fs ls


No filesystems enabled
[root@cephnode1 ~]#

IBM Storage Ceph NFS Service Deployment


CephFS namespaces can be exported over NFS using the NFS-Ganesha service. NFS-Ganesha is an
open-source Network File System (NFS) server that exports file systems via the NFS protocol. NFS is
a popular choice for distributed systems (e.g. to provide Ceph storage to an IBM AIX client that doesn’t
have native Ceph support).

We will demonstrate deploying a single-instance NFS service. As we did with the RGW service, we will also demonstrate the deployment of a highly available NFS server using the Ceph ingress service, which deploys a virtual IP along with HAProxy and keepalived. If you don’t need to test NFS high availability, you can just deploy one or more NFS gateways with no failover.

Full instructions can be found here.

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=operations-managing-nfs-ganesha-gateway-using-ceph-orchestrator

Deploying the NFS Server with no failover


We will start by creating an NFS cluster. You can deploy the NFS service from the GUI or use the Ceph orchestrator. Deploying it via the GUI doesn’t allow you to specify the port to use (the default is 12049). Make sure to add the node label nfs to one of your cluster nodes if you are going to use node labels to place your service.
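
If you do want to control the port or placement, the cluster can instead be created from the command line; a sketch where the cluster name and label mirror what is used in this section and the port value is an illustrative assumption:

[root@cephnode1 ~]# ceph nfs cluster create mynfs "1 label:nfs" --port 12049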

Figure 100: Ceph Host Labels

We can create the NFS cluster using the Dashboard. Navigate to Administration -> Services.


Figure 101: Ceph NFS Service Details

We can query our NFS cluster information from the command line.

[root@cephnode1 ~]# ceph nfs cluster ls


[
"mynfs"
]
[root@cephnode1 ~]# ceph nfs cluster info mynfs
{
"mynfs": {
"backend": [
{
"hostname": "cephnode4.local",
"ip": "10.0.0.243",
"port": 12049
}
],
"virtual_ip": null
}
}
[root@cephnode1 ~]#

Next, we will create a separate Ceph File system to use for our NFS export.


Figure 102: Ceph Dashboard File System

Now we will create an NFS export. Navigate to File -> NFS -> Create. IBM Storage Ceph supports both NFSv3 and NFSv4; open-source Ceph (Reef) only supports NFSv4. The pseudo path is the position within the NFSv4 pseudo filesystem where the export will be available on the server. It must be an absolute path and must be unique.
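
The same export can typically be created from the command line; a sketch using the pseudo path /nfs and the filesystem cephnfs from this walkthrough:

[root@cephnode1 ~]# ceph nfs export create cephfs --cluster-id mynfs --pseudo-path /nfs --fsname cephnfs --path /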

Figure 103: Ceph Dashboard NFS Export Create

We can specify the clients to which we want to export the CephFS filesystem, as well as specify NFS options.


Figure 104: Ceph Dashboard NFS Export Create - Clients

Figure 105: Ceph Dashboard NFS Export Details

You can now mount your NFS export on your intended NFS client. Since we just have a single NFS server running on cephnode4, you would specify cephnode4 as the NFS server and 12049 as the port.

root@labserver:~# mount -t nfs -o vers=4.2,proto=tcp,port=12049 cephnode4.local:/nfs


/mnt/cephnfs -vv
mount.nfs: timeout set for Wed Jul 24 09:35:48 2024
mount.nfs: trying text-based options
'vers=4.2,proto=tcp,port=12049,addr=10.0.0.243,clientaddr=10.0.0.246'
root@labserver:~# df
Filesystem 1K-blocks Used Available Use%
Mounted on
tmpfs 401004 1012 399992 1% /run


/dev/sda2 20463184 15064904 4333476 78% /


tmpfs 2005008 84 2004924 1%
/dev/shm
tmpfs 5120 0 5120 0%
/run/lock
[email protected]=/ 61784064 659456 61124608 2%
/cephfs
/dev/rbd0 3080192 1116372 1963820 37%
/mnt/rbd
tmpfs 401000 16 400984 1%
/run/user/0
cephnode4.local:/nfs 61124608 0 61124608 0%
/mnt/cephnfs
root@labserver:~#

You can deploy more than one NFS daemon (we only used one when we created our NFS service via Administration -> Services). If you deploy more than one NFS daemon, you can mount from any cluster node that the NFS service is running on. However, if that node fails, the NFS client will lose connectivity to the NFS mount.

Deploying a Highly-Available NFS Server


We will use the ingress service to deploy a highly available NFS server. You can find the full procedure here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=orchestrator-implementing-ha-cephfsnfs-service

The above link also explains how to convert an existing NFS cluster for high availability. First, we need
to label our nodes that we want to use as NFS servers. We will use cephnode1 and cephnode4.

Figure 106: Ceph Host Labels

Next, we will use the Ceph Orchestrator to deploy our NFS server with the ingress service. Our virtual
IP for the NFS server will be 10.0.0.245 (cephnfs.local).

[root@cephnode1 ganesha]# ceph nfs cluster create mynfs "2 label:nfs" --ingress --ingress-
mode haproxy-protocol --virtual-ip 10.0.0.245/24


[root@cephnode1 ganesha]#

In this command:

• CLUSTER_ID is a unique string that names the NFS Ganesha cluster.
• PLACEMENT specifies the number of NFS servers to deploy and the host or hosts that you want to deploy the NFS Ganesha daemon containers on.
• The PORT_NUMBER flag is used to deploy NFS on a port other than the default port of 12049. With ingress mode, the high-availability proxy takes port 2049 and the backend NFS services are deployed on port 12049.
• The --ingress flag, combined with the --virtual-ip flag, deploys NFS with a high-availability front end (virtual IP and load balancer).
• The --virtual-ip IP_ADDRESS specifies an IP address that provides a known, stable NFS endpoint all clients can use to mount NFS exports. The --virtual-ip must include a CIDR prefix length. The virtual IP will normally be configured on the first identified network interface that has an existing IP in the same subnet.

You can query your NFS cluster deployment and check where the NFS server virtual IP is active (in our case the virtual IP is aliased onto the public interface on cephnode1).

[root@cephnode1 ganesha]# ceph nfs cluster info mynfs


{
"mynfs": {
"backend": [
{
"hostname": "cephnode1.local",
"ip": "10.0.0.240",
"port": 12049
},
{
"hostname": "cephnode4.local",
"ip": "10.0.0.243",
"port": 12049
}
],
"monitor_port": 9049,
"port": 2049,
"virtual_ip": "10.0.0.245"
}
}
[root@cephnode1 ganesha]#

[root@cephnode4 ~]# ceph orch ls --service_name=nfs.mynfs


NAME PORTS RUNNING REFRESHED AGE PLACEMENT
nfs.mynfs ?:12049 2/2 3m ago 4m count:2;label:nfs
[root@cephnode4 ~]# ceph orch ls --service_name=ingress.nfs.mynfs
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
ingress.nfs.mynfs 10.0.0.245:2049,9049 4/4 3m ago 4m count:2;label:nfs
[root@cephnode4 ~]#
[root@cephnode4 ~]# ceph orch ps | grep nfs
haproxy.nfs.mynfs.cephnode1.dauiqc cephnode1.local *:2049,9049 running (6m)
4m ago 6m 5112k - 2.4.22-f8e3218 56a7ae245674 ae05a6e76804
haproxy.nfs.mynfs.cephnode4.wfoioy cephnode4.local *:2049,9049 running (6m)
4m ago 6m 5108k - 2.4.22-f8e3218 56a7ae245674 a17369fe3513
keepalived.nfs.mynfs.cephnode1.dpzxiv cephnode1.local running (6m)
4m ago 6m 1795k - 2.2.8 84146097b087 6dd7d4edb2ba
keepalived.nfs.mynfs.cephnode4.daebfo cephnode4.local running (6m)
4m ago 6m 1795k - 2.2.8 84146097b087 2d0ce7e1f37a
mds.cephnfs.cephnode1.cinygq cephnode1.local running (48m)
4m ago 9h 30.3M - 18.2.1-194.el9cp a09ffce67935 41dbcad3db49
nfs.mynfs.0.0.cephnode4.kkbwwa cephnode4.local *:12049 running (6m)
4m ago 6m 51.4M - 5.7 a09ffce67935 aeca3463341a
nfs.mynfs.1.0.cephnode1.kfmvag cephnode1.local *:12049 running (6m)
4m ago 6m 51.9M - 5.7 a09ffce67935 2977c14d5a1f
[root@cephnode4 ~]#

[root@cephnode4 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:04:de:52 brd ff:ff:ff:ff:ff:ff


altname enp0s18
inet 10.0.0.243/8 brd 10.255.255.255 scope global noprefixroute ens18
valid_lft forever preferred_lft forever
inet6 fe80::be24:11ff:fe04:de52/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:d8:68:7b brd ff:ff:ff:ff:ff:ff
altname enp0s19
inet 192.168.1.13/24 brd 192.168.1.255 scope global noprefixroute ens19
valid_lft forever preferred_lft forever
inet6 fe80::bd5d:7ac4:eaf0:adb9/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@cephnode4 ~]#

[root@cephnode1 e7fcc1ac-42ec-11ef-a58f-bc241172f341]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:72:f3:41 brd ff:ff:ff:ff:ff:ff
altname enp0s18
inet 10.0.0.240/8 brd 10.255.255.255 scope global noprefixroute ens18
valid_lft forever preferred_lft forever
inet 10.0.0.245/24 scope global ens18
valid_lft forever preferred_lft forever
inet6 fe80::be24:11ff:fe72:f341/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
qlen 1000
link/ether bc:24:11:05:3f:27 brd ff:ff:ff:ff:ff:ff
altname enp0s19
inet 192.168.1.10/24 brd 192.168.1.255 scope global noprefixroute ens19
valid_lft forever preferred_lft forever
inet6 fe80::3524:3a85:d8dd:2a79/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@cephnode1 e7fcc1ac-42ec-11ef-a58f-bc241172f341]#

As demonstrated previously, we now need to create an NFS export on the Ceph Dashboard.

Figure 107: Ceph Dashboard NFS Export Details


You can now mount your NFS export on the NFS client as follows:

root@labserver:~# mount -t nfs -o vers=4.2,proto=tcp,port=2049 cephnfs.local:/nfs


/mnt/cephnfs -vv
mount.nfs: timeout set for Wed Jul 24 10:08:29 2024
mount.nfs: trying text-based options
'vers=4.2,proto=tcp,port=2049,addr=10.0.0.245,clientaddr=10.0.0.246'
root@labserver:~# df
Filesystem 1K-blocks Used Available Use%
Mounted on
tmpfs 401004 1016 399988 1% /run
/dev/sda2 20463184 15064940 4333440 78% /
tmpfs 2005008 84 2004924 1%
/dev/shm
tmpfs 5120 0 5120 0%
/run/lock
[email protected]=/ 60579840 659456 59920384 2%
/cephfs
/dev/rbd0 3080192 1116372 1963820 37%
/mnt/rbd
tmpfs 401000 16 400984 1%
/run/user/0
cephnfs.local:/nfs 60936192 1015808 59920384 2%
/mnt/cephnfs
root@labserver:~#

You can now perform failover testing if required (similar to what we did for CephFS). Run a dd command on the client to generate some load and shut down the cluster node that holds the NFS server virtual IP. You can check the I/O load via the Ceph Dashboard or using cephfs-top.

Figure 108: Ceph Dashboard CephFS Performance Details

Figure 109: Cephfs-top command line utility


In the current IBM Storage Ceph version, NFS failover testing didn’t work as expected, with the client being disconnected when the NFS server IP failed over. After some additional testing and research, it was discovered that the ingress service deploys HAProxy without the health check option. HAProxy health checks automatically detect when a server becomes unresponsive or begins to return errors; HAProxy can then temporarily remove that server from the pool until it begins to behave normally again. Without health checks, HAProxy has no way of knowing when a server has become dysfunctional. Credit to this Reddit post that eventually helped resolve the issue:
https://www.reddit.com/r/ceph/comments/10bcwra/nfs_cluster_ha_not_working_what_am_i_missing/

Since IBM Storage Ceph is containerized, the configuration files for HAProxy are deployed inside
containers. The only way to make changes to any of the configuration files (e.g. NFS-Ganesha Exports
file or HAProxy configuration file) is to use the method described below. For NFS-Ganesha exports,
you can download a sample file from here:

https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/export.txt

Modify it and update the NFS service as described here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=orchestrator-setting-custom-nfs-ganesha-configuration

[root@cephnode1 ~]# cat ganesha.conf


# {{ cephadm_managed }}
NFS_CORE_PARAM {
Enable_NLM = false;
Enable_RQUOTA = false;
Protocols = 4;
Disable_UDP = true;
}
.
.
.
RGW {
cluster = "ceph";
name = "client.{{ rgw_user }}";
}

%url {{ url }}
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph config-key set mgr/cephadm/services/nfs/ganesha.conf -i


./ganesha.conf
set mgr/cephadm/services/nfs/ganesha.conf
[root@cephnode1 ~]#

For HAProxy, to add the health check option perform the following steps:

[root@cephnode1 ~]# wget


https://raw.githubusercontent.com/ceph/ceph/main/src/pybind/mgr/cephadm/templates/services/ingress/haproxy.cfg.j2
--2024-07-24 10:43:38--
https://raw.githubusercontent.com/ceph/ceph/main/src/pybind/mgr/cephadm/templates/services/ingress/haproxy.cfg.j2
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133,
185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 2566 (2.5K) [text/plain]
Saving to: ‘haproxy.cfg.j2’

haproxy.cfg.j2
100%[================================================================>] 2.51K --.-KB/s
in 0.008s

2024-07-24 10:43:38 (319 KB/s) - ‘haproxy.cfg.j2’ saved [2566/2566]


[root@cephnode1 ~]# mv haproxy.cfg.j2 haproxy.cfg


[root@cephnode1 ~]# mv haproxy.cfg.j2 haproxy.cfg
mv: overwrite 'haproxy.cfg'? y
[root@cephnode1 ~]# vi haproxy.cfg
[root@cephnode1 ~]#
[root@cephnode1 ~]# cat haproxy.cfg
# {{ cephadm_managed }}
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/lib/haproxy/haproxy.pid
maxconn 8000
daemon
stats socket /var/lib/haproxy/stats
.
.
.
{% if mode == 'tcp' %}
mode tcp
balance source
hash-type consistent
{% if default_server_opts %}
default-server {{ default_server_opts|join(" ") }}
{% endif %}
{% for server in servers %}
server {{ server.name }} {{ server.ip }}:{{ server.port }} check
{% endfor %}
{% endif %}
[root@cephnode1 ~]#
[root@cephnode1 ~]# ceph config-key set mgr/cephadm/services/ingress/haproxy.cfg -i
./haproxy.cfg
set mgr/cephadm/services/ingress/haproxy.cfg
[root@cephnode1 ~]# ceph config-key get mgr/cephadm/services/ingress/haproxy.cfg
# {{ cephadm_managed }}
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/lib/haproxy/haproxy.pid
maxconn 8000
daemon
stats socket /var/lib/haproxy/stats
.
.
.
default-server {{ default_server_opts|join(" ") }}
{% endif %}
{% for server in servers %}
server {{ server.name }} {{ server.ip }}:{{ server.port }} check
{% endfor %}
{% endif %}
[root@cephnode1 ~]# ceph orch reconfig ingress.nfs.mynfs
Scheduled to reconfig haproxy.nfs.mynfs.cephnode1.mhbtvd on host 'cephnode1.local'
Scheduled to reconfig keepalived.nfs.mynfs.cephnode1.cdypgi on host 'cephnode1.local'
Scheduled to reconfig haproxy.nfs.mynfs.cephnode4.sxviuh on host 'cephnode4.local'
Scheduled to reconfig keepalived.nfs.mynfs.cephnode4.tqijom on host 'cephnode4.local'
[root@cephnode1 ~]#

You can verify that the new file has taken effect after redeploying the haproxy and keepalived services
as follows:

[root@cephnode1 ~]# cd /var/lib/ceph/e7fcc1ac-42ec-11ef-a58f-


bc241172f341/haproxy.nfs.mynfs.cephnode1.mhbtvd/haproxy/
[root@cephnode1 haproxy]# cat haproxy.cfg
# This file is generated by cephadm.
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/lib/haproxy/haproxy.pid
maxconn 8000
daemon
stats socket /var/lib/haproxy/stats

defaults
mode tcp


log global
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s
maxconn 8000

frontend stats
mode http
bind 10.0.0.245:9000
bind 10.0.0.240:9000
stats enable
stats uri /stats
stats refresh 10s
stats auth admin:wokqvaej
http-request use-service prometheus-exporter if { path /metrics }
monitor-uri /health

frontend frontend
bind 10.0.0.245:2049
default_backend backend

backend backend
mode tcp
balance source
hash-type consistent
server nfs.mynfs.0 10.0.0.243:12049 check
server nfs.mynfs.1 10.0.0.240:12049 check
[root@cephnode1 haproxy]#

IBM Storage Ceph NFS with an Object Storage Backend


It is possible to expose Ceph Object Storage via the NFSv4 protocol. This presents buckets and objects as directories and files over NFS. This capability is important for certain types of workloads where different applications require access to the same data via different protocols. It is similar to IBM Storage Scale Unified File and Object Access, where Storage Scale files can be exposed via the NFS, SMB and Object protocols. If the primary use case for a customer is Ceph Object Storage, then it is an important capability to demonstrate in a POC. A full description of how to configure this is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=ganesha-nfs-ceph-object-storage

Let’s enable this via the Ceph Dashboard. Navigate to Object -> NFS -> Create Export. It is recommended to enable read-only (RO) access for compatibility reasons (see the limitations in the link above); we will, however, enable RW access. We will use a bucket called testbucket. Note that the Storage Backend is set to Object Gateway (unlike a normal NFS export, which uses CephFS as the backend).
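
For reference, the equivalent command-line call uses the rgw export type rather than cephfs; a sketch using the cluster and bucket names from this guide:

[root@cephnode1 ~]# ceph nfs export create rgw --cluster-id mynfs --pseudo-path /testbucket --bucket testbucket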


Figure 110: Ceph Dashboard Object NFS Export

After you create the Export, you should see it in the list of cluster NFS exports.

Figure 111: Ceph Dashboard NFS Export List

We can now mount our NFS export on a client server.

root@labserver:~# mount -t nfs -o vers=4.2,proto=tcp,port=2049,rw


cephnode1.local:/testbucket /mnt/testbucket -vv
mount.nfs: timeout set for Thu Jul 25 07:56:58 2024
mount.nfs: trying text-based options
'vers=4.2,proto=tcp,port=2049,addr=10.0.0.240,clientaddr=10.0.0.246'
root@labserver:~# df
Filesystem 1K-blocks Used Available Use%
Mounted on
tmpfs 401004 996 400008 1% /run


/dev/sda2 20463184 16027760 3370620 83% /


tmpfs 2005008 84 2004924 1%
/dev/shm
tmpfs 5120 0 5120 0%
/run/lock
/dev/rbd0 3080192 1116372 1963820 37%
/mnt/rbd
[email protected]=/ 60678144 1212416 59465728 2%
/cephfs
tmpfs 401000 16 400984 1%
/run/user/0
cephnode1.local:/testbucket 209682432 16444416 193238016 8%
/mnt/testbucket
root@labserver:~#

Let’s navigate to the newly mounted filesystem and list the contents of testbucket. As we can see, we
already have 2 files (these are actually objects).

root@labserver:~# cd /mnt/testbucket/
root@labserver:/mnt/testbucket# ls -al
total 4026
drwxrwxrwx 1 root root 0 Jul 24 20:53 .
drwxr-xr-x 5 root root 4096 Jul 24 20:54 ..
-rw-rw-rw- 1 root root 4118141 Jul 24 20:49 1gfile
-rw-rw-rw- 1 root root 318 Jul 20 22:22 hosts
root@labserver:/mnt/testbucket#

Let’s now use s3cmd to write an object to our testbucket called can_u_see_me_from_nfs. Thereafter,
we should be able to immediately see this new object via NFS with a simple ls command.

root@labserver:/mnt/testbucket# s3cmd put /mnt/cephnfs/hosts s3://testbucket/can_u_see_me_from_nfs
upload: '/mnt/cephnfs/hosts' -> 's3://testbucket/can_u_see_me_from_nfs' [1 of 1]
552 of 552 100% in 0s 35.37 KB/s done
root@labserver:/mnt/testbucket# ls -alt
total 4026
drwxrwxrwx 3 root root 0 Jul 24 2024 .
-rw-rw-rw- 1 root root 332 Jul 24 20:58 can_u_see_me_from_nfs
drwxr-xr-x 5 root root 4096 Jul 24 20:54 ..
-rw-rw-rw- 1 root root 4118141 Jul 24 20:49 1gfile
-rw-rw-rw- 1 root root 318 Jul 20 22:22 hosts
root@labserver:/mnt/testbucket#

Sure enough, we can see the file we wrote to the RGW via s3cmd. Now let's test it the other way
around. We will create a file via NFS called can_you_see_me_from_s3 and then query the RGW via the
object protocol, listing the contents of testbucket to see if the new file is visible.

root@labserver:/mnt/testbucket# cp /etc/hosts ./can_you_see_me_from_s3


root@labserver:/mnt/testbucket# ls -alt
total 4026
-rw-r--r-- 1 root root 552 Jul 25 07:57 can_you_see_me_from_s3
drwxrwxrwx 1 root root 0 Jul 25 07:57 .
-rw-rw-rw- 1 root root 332 Jul 24 20:58 can_u_see_me_from_nfs
drwxr-xr-x 5 root root 4096 Jul 24 20:54 ..
-rw-rw-rw- 1 root root 4118141 Jul 24 20:49 1gfile
-rw-rw-rw- 1 root root 318 Jul 20 22:22 hosts

root@labserver:~# s3cmd ls s3://testbucket


2024-07-24 18:49 1048576000 s3://testbucket/1gfile
2024-07-24 18:58 552 s3://testbucket/can_u_see_me_from_nfs
2024-07-25 05:57 552 s3://testbucket/can_you_see_me_from_s3
2024-07-20 20:22 518 s3://testbucket/hosts
root@labserver:~#

We have just demonstrated that we can expose our object storage via NFS easily. You can query the
contents of the testbucket in Ceph using the radosgw-admin command as well.

[root@cephnode1 ~]# radosgw-admin bucket list


[

"testwebsite",
"awsbucket",
"testbucket"
]
[root@cephnode1 ~]# radosgw-admin bucket list --bucket=testbucket
[
{
"name": "1gfile",
"instance": "",
"ver": {
"pool": 7,
"epoch": 12
},
"locator": "",
"exists": true,
"meta": {
"category": 1,
"size": 4118141,
"mtime": "2024-07-24T18:49:40.747228Z",
"etag": "94b829092828287fd4d714e5adfd0481-67",
"storage_class": "",
"owner": "ndocrat",
"owner_display_name": "Nishaan Docrat",
"content_type": "application/octet-stream",
"accounted_size": 1048576000,
"user_data": "",
"appendable": false
},
"tag": "7109acc2-883b-4502-b863-4e7097d7d83c.574704.15486274694923450889",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
},
{
"name": "can_u_see_me_from_nfs",
"instance": "",
"ver": {
"pool": 7,
"epoch": 17
},
"locator": "",
"exists": true,
"meta": {
"category": 1,
"size": 332,
"mtime": "2024-07-24T18:58:15.274165Z",
"etag": "87d7315472cc2b6e89fa6a7ab85236c5",
"storage_class": "STANDARD",
"owner": "ndocrat",
"owner_display_name": "Nishaan Docrat",
"content_type": "text/plain",
"accounted_size": 552,
"user_data": "",
"appendable": false
},
"tag": "7109acc2-883b-4502-b863-4e7097d7d83c.584431.17680027879669500773",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
},
{
"name": "can_you_see_me_from_s3",
"instance": "",
"ver": {
"pool": 7,
"epoch": 28
},
"locator": "",
"exists": true,
"meta": {
"category": 1,
"size": 552,
"mtime": "2024-07-25T05:57:20.986184Z",
"etag": "87d7315472cc2b6e89fa6a7ab85236c5",
"storage_class": "",
"owner": "ndocrat",
"owner_display_name": "",

"content_type": "",
"accounted_size": 552,
"user_data": "",
"appendable": false
},
"tag": "7109acc2-883b-4502-b863-4e7097d7d83c.594593.5626747190077097157",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
},
{
"name": "hosts",
"instance": "",
"ver": {
"pool": 7,
"epoch": 6
},
"locator": "",
"exists": true,
"meta": {
"category": 1,
"size": 318,
"mtime": "2024-07-20T20:22:18.288298Z",
"etag": "b04bc021f3faea3a141ee55e39e5bfbf",
"storage_class": "STANDARD",
"owner": "ndocrat",
"owner_display_name": "Nishaan Docrat",
"content_type": "text/plain",
"accounted_size": 518,
"user_data": "",
"appendable": false
},
"tag": "7109acc2-883b-4502-b863-4e7097d7d83c.112302.4596205271780943401",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
]
[root@cephnode1 ~]#

IBM Storage Ceph iSCSI Gateway


The Ceph iSCSI Gateway presents a highly available iSCSI Target that exports RBD images as SCSI
disks. This is a good way to offer block storage for hosts that don’t have native Ceph client support
(e.g. AIX, Solaris etc.). As of Red Hat Ceph Storage 5, the Ceph iSCSI Gateway is deprecated in favor
of NVMe-oF. Deprecated functionality will receive only bug fixes and may be removed in future
releases. Relevant documentation around this technology is identified as "Limited Availability".

Figure 112: Ceph iSCSI Gateway (https://docs.ceph.com/en/reef/rbd/iscsi-overview/)

Open-source Ceph still offers the iSCSI Gateway, whilst IBM Storage Ceph has this functionality
disabled by default in the Ceph Dashboard. It is possible to enable it, though it was not possible to get
it working on IBM Storage Ceph 7.1. Deploying the iSCSI service appeared to work as expected, but the
service itself failed to start (this was tested on 3 separate IBM Storage Ceph clusters).

Figure 113: Ceph Dashboard – Administration – Manager Modules – Dashboard - iSCSI

Figure 114: IBM Storage Ceph iSCSI Gateway Service Creation

The service fails with the following error:

Aug 1 18:26:18 cephnode1 podman[13197]: 2024-08-01 18:26:18.812440929 +0200 SAST


m=+0.042185503 container create
039cebcbb7bce76d884bbd064cd25645280cc470970777e6ffa40acf93b09a08 (image=cp.icr.io/cp/ibm-
ceph/ceph-7-rhel9@sha256:01269061f428d247bdbc8d4855ccf847e3d909e9a4d2418e4ee63bf2e6e6a7de,
name=ceph-e7fcc1ac-42ec-11ef-a58f-bc241172f341-iscsi-iscsi-cephnode1-tufhhh-tcmu,
CEPH_POINT_RELEASE=, io.openshift.tags=ibm ceph, io.k8s.display-name=IBM Storage Ceph 7,
url=https://access.redhat.com/containers/#/registry.access.redhat.com/ibm-ceph/images/7-59,
io.openshift.expose-services=, io.k8s.description=IBM Storage Ceph 7, vcs-
ref=c1f5b277ae27199c6825ef06243c822df4123ce2, GIT_REPO=https://github.com/ceph/ceph-
container.git, release=59, GIT_CLEAN=True, architecture=x86_64, vcs-type=git,
maintainer=Guillaume Abrioux <[email protected]>, ceph=True, name=ibm-ceph, RELEASE=main,
com.redhat.component=ibm-ceph-container,

com.redhat.license_terms=https://www.redhat.com/agreements, GIT_BRANCH=main, version=7,


description=IBM Storage Ceph 7, io.buildah.version=1.29.0, summary=Provides the latest IBM
Storage Ceph 7 in a fully featured and supported base image., build-date=2024-05-31T19:46:36,
distribution-scope=public, vendor=Red Hat, Inc.,
GIT_COMMIT=12717c0777377369ea674892da98b0d85250f5b0)
.
.
.
Aug 1 18:26:18 cephnode1 ceph-e7fcc1ac-42ec-11ef-a58f-bc241172f341-iscsi-iscsi-cephnode1-
tufhhh-tcmu[13222]: /usr/local/scripts/tcmu-runner-entrypoint.sh: line 13: /usr/bin/tcmu-
runner: No such file or directory
.
.
.
Aug 1 18:26:18 cephnode1 ceph-e7fcc1ac-42ec-11ef-a58f-bc241172f341-iscsi-iscsi-cephnode1-
tufhhh[13259]: ERROR (catatonit:2): failed to exec pid1: No such file or directory
Aug 1 18:26:18 cephnode1 systemd[1]: Started Ceph iscsi.iscsi.cephnode1.tufhhh for e7fcc1ac-
42ec-11ef-a58f-bc241172f341.
Aug 1 18:26:19 cephnode1 podman[13264]: 2024-08-01 18:26:19.016832617 +0200 SAST
m=+0.030689684 container died
69906997c1ac8c3a9432434f43eb41ed4b2567676b57901b3fb6efcfa50aa1d5 (image=cp.icr.io/cp/ibm-
ceph/ceph-7-rhel9@sha256:01269061f428d247bdbc8d4855ccf847e3d909e9a4d2418e4ee63bf2e6e6a7de,
name=ceph-e7fcc1ac-42ec-11ef-a58f-bc241172f341-iscsi-iscsi-cephnode1-tufhhh,
com.redhat.component=ibm-ceph-container, ceph=True, GIT_COMMIT=
.
.
.
Aug 1 18:26:19 cephnode1 systemd[1]: var-lib-ceph-
e7fcc1ac\x2d42ec\x2d11ef\x2da58f\x2dbc241172f341-iscsi.iscsi.cephnode1.tufhhh-configfs.mount:
Deactivated successfully.
Aug 1 18:26:29 cephnode1 systemd[1]: ceph-e7fcc1ac-42ec-11ef-a58f-
[email protected]: Scheduled restart job, restart counter is
at 5.
Aug 1 18:26:29 cephnode1 systemd[1]: Stopped Ceph iscsi.iscsi.cephnode1.tufhhh for e7fcc1ac-
42ec-11ef-a58f-bc241172f341.
Aug 1 18:26:29 cephnode1 systemd[1]: ceph-e7fcc1ac-42ec-11ef-a58f-
[email protected]: Start request repeated too quickly.
Aug 1 18:26:29 cephnode1 systemd[1]: ceph-e7fcc1ac-42ec-11ef-a58f-
[email protected]: Failed with result 'exit-code'.
Aug 1 18:26:29 cephnode1 systemd[1]: Failed to start Ceph iscsi.iscsi.cephnode1.tufhhh for
e7fcc1ac-42ec-11ef-a58f-bc241172f341.
Aug 1 18:26:38 cephnode1 ceph-mgr[2277]: log_channel(cephadm) log [DBG] : Applying service
iscsi.iscsi spec

It is important to note that whilst IBM Storage Ceph does not support the use of the iSCSI Gateway,
open-source Ceph still allows use of this feature. For the sake of completeness, the figures below show
the iSCSI Gateway deployment using open-source Ceph (Reef) 18.2.1.

Figure 115: Open-source Ceph iSCSI Service Creation

Figure 116: Open-source Ceph iSCSI Gateway Information

Figure 117: Open-source Ceph iSCSI Target Creation

Figure 118: Open-source Ceph iSCSI – Windows iSCSI Software Initiator Configuration

IBM is targeting the use of Ceph NVMe-oF for high-performance workloads like VMware. The NVMe-
oF Gateway presents an NVMe-oF target that exports RADOS Block Device (RBD) images as NVMe
namespaces. The NVMe-oF protocol allows clients (initiators) to send NVMe commands to storage
devices (targets) over a TCP/IP network, enabling clients without native Ceph client support to access
Ceph block storage.
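
Although NVMe-oF configuration is beyond the scope of this section, the gateway is deployed through
the orchestrator much like the other services. A minimal sketch, assuming a dedicated RBD pool named
nvmeof_pool and a gateway placed on cephnode1 (the exact flags can differ between releases, so check
the IBM Storage Ceph documentation for your version):

ceph osd pool create nvmeof_pool
rbd pool init nvmeof_pool
ceph orch apply nvmeof nvmeof_pool --placement="cephnode1"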

Figure 119: Ceph NVMe-oF Architecture (https://docs.ceph.com/en/reef/rbd/nvmeof-overview/)

Deploying an IBM Storage Ceph Single Node Cluster


For a POC or test environment, you will most likely need to test multi-cluster functionality (e.g., RGW
multi-site, RBD replication or CephFS mirroring). An easy way to save on resources is to use single
node Ceph clusters for the multi-cluster requirement. You can also use a single node Ceph cluster to
test functionality or practice the use of Ceph. In order to deploy a single node cluster, we need to use
the --single-host-defaults option when bootstrapping a new Ceph cluster with cephadm. The --single-
host-defaults flag sets the following options:

Figure 120: Ceph --single-host-defaults flag sets these cluster options

As you can see, the CRUSH rule is set to replicate data or placement groups across OSDs instead of
the default hosts. Also, the default pool replica is set to 2x instead of the default 3x. Lastly, we only
have a single host so no need for a standby MGR.
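
After bootstrapping, you can verify these settings yourself. The sketch below assumes the flag applies
osd_crush_chooseleaf_type=0 (replicate across OSDs), osd_pool_default_size=2 and
mgr_standby_modules=false; confirm the values on your own cluster:

ceph config dump | grep -E 'osd_crush_chooseleaf_type|osd_pool_default_size|mgr_standby_modules'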

As an example, below is the disk layout for the VM we will be using.

[root@rceph ~]# lsblk


NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0 11:0 1 2K 0 rom
vda 252:0 0 50G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 49G 0 part
├─rhel-root 253:0 0 45G 0 lvm /
└─rhel-swap 253:1 0 3.9G 0 lvm [SWAP]
vdb 252:16 0 30G 0 disk
vdc 252:32 0 30G 0 disk
vdd 252:48 0 30G 0 disk
[root@rceph ~]#

As with the bootstrap process for the 4-node Ceph cluster described earlier, we will follow a similar
process for the single node cluster (the only real difference is that we specify the --single-host-defaults
flag when creating our cluster).

First, we will make sure we have a valid Red Hat subscription, ensure we only have the BaseOS and
AppStream repositories enabled, and apply the latest OS updates.

[root@rceph ~]# subscription-manager status


+-------------------------------------------+
System Status Details
+-------------------------------------------+
Overall Status: Disabled
Content Access Mode is set to Simple Content Access. This host has access to content,
regardless of subscription status.

System Purpose Status: Disabled

[root@rceph ~]# subscription-manager repos --disable=*


Repository 'kmm-2-for-rhel-9-x86_64-source-rpms' is disabled for this system.
Repository 'rhel-9-for-x86_64-baseos-e4s-rpms' is disabled for this system.
.
.
.
Repository 'satellite-client-6-for-rhel-9-x86_64-aus-debug-rpms' is disabled for this system.
[root@rceph ~]#

[root@rceph ~]# subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms


Repository 'rhel-9-for-x86_64-baseos-rpms' is enabled for this system.
[root@rceph ~]# subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms
Repository 'rhel-9-for-x86_64-appstream-rpms' is enabled for this system.

[root@rceph ~]# dnf update


Updating Subscription Management repositories.
Last metadata expiration check: 0:00:11 ago on Thu 01 Aug 2024 08:43:35 PM SAST.
Dependencies resolved.
Nothing to do.
Complete!
[root@rceph ~]#

We need to add the IBM Storage Ceph repository, then install the license package and accept the license.

[root@rceph ~]# curl https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/ibm-storage-ceph-7-rhel-9.repo | sudo tee /etc/yum.repos.d/ibm-storage-ceph-7-rhel-9.repo

% Total % Received % Xferd Average Speed Time Time Time Current


Dload Upload Total Spent Left Speed
100 245 100 245 0 0 74 0 0:00:03 0:00:03 --:--:-- 74
[ibm-storage-ceph-7]
name = ibm-storage-ceph-7
baseurl = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/7/rhel9/$basearch/
enabled = 1
gpgcheck = 1
gpgkey = https://public.dhe.ibm.com/ibmdl/export/pub/storage/ceph/RPM-GPG-KEY-IBM-CEPH

[root@rceph ~]# dnf repolist


Updating Subscription Management repositories.
repo id repo name
ibm-storage-ceph-7 ibm-storage-ceph-7
rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 -
AppStream (RPMs)
rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 -
BaseOS (RPMs)
[root@rceph ~]#

[root@rceph ~]# dnf install ibm-storage-ceph-license -y


Updating Subscription Management repositories.
ibm-storage-ceph-7
26 kB/s | 55 kB 00:02
Dependencies resolved.
.
.
.
Transaction test succeeded.
Running transaction
Preparing :
1/1
Installing : ibm-storage-ceph-license-7-2.el9cp.noarch
1/1
Running scriptlet: ibm-storage-ceph-license-7-2.el9cp.noarch
1/1

Your licenses have been installed in /usr/share/ibm-storage-ceph-license/L-XSHK-LPQLHG/UTF8/


System locale: en
NOTICE

This document includes License Information documents below for multiple Programs. Each
License Information document identifies the Program(s) to which it applies. Only those
License Information documents for the Program(s) for which Licensee has acquired entitlements
apply.

==============================================
IMPORTANT: READ CAREFULLY
.
.
.
.
.
.
You can read this license in another language at /usr/share/ibm-storage-ceph-license/L-XSHK-
LPQLHG/UTF8/

To Accept these provisions:


run `sudo touch /usr/share/ibm-storage-ceph-license/accept`

Then proceed with install


Verifying : ibm-storage-ceph-license-7-2.el9cp.noarch
1/1
Installed products updated.

Installed:
ibm-storage-ceph-license-7-2.el9cp.noarch

Complete!
[root@rceph ~]#
[root@rceph ~]# touch /usr/share/ibm-storage-ceph-license/accept
[root@rceph ~]# ls -al /usr/share/ibm-storage-ceph-license/accept
-rw-r--r--. 1 root root 0 Aug 1 20:47 /usr/share/ibm-storage-ceph-license/accept
[root@rceph ~]#

Next, we need to install the cephadm utility and ansible playbooks.

[root@rceph ~]# dnf install cephadm-ansible


Updating Subscription Management repositories.
Last metadata expiration check: 0:02:17 ago on Thu 01 Aug 2024 08:45:39 PM SAST.
Dependencies resolved.
.
.
.
python3-pyparsing-2.4.7-9.el9.noarch python3-resolvelib-0.5.4-
5.el9.noarch
sshpass-1.09-4.el9.x86_64

Complete!
[root@rceph ~]#

We want to use Ansible as the root user, so we don't need to create a separate ansible user with sudo
access. We do need to ensure that root ssh login is allowed and working.

[root@rceph ~]# grep -i PermitRoot /etc/ssh/sshd_config


PermitRootLogin yes
# the setting of "PermitRootLogin without-password".
[root@rceph ~]#

[root@rceph .ssh]# ssh-copy-id -i id_rsa.pub rceph.local


/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'rceph.local (10.0.0.239)' can't be established.
ED25519 key fingerprint is SHA256:ZeoiMYl3xQIiCXQ6Lfkj5IDF6nVW+maRNseqxlQ2fj4.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that
are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is
to install the new keys
[email protected]'s password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'rceph.local'"


and check to make sure that only the key(s) you wanted were added.

[root@rceph .ssh]#

[root@rceph .ssh]# ssh-copy-id -i id_rsa.pub rceph


/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'rceph (10.0.0.239)' can't be established.
ED25519 key fingerprint is SHA256:ZeoiMYl3xQIiCXQ6Lfkj5IDF6nVW+maRNseqxlQ2fj4.
This host key is known by the following other names/addresses:
~/.ssh/known_hosts:1: rceph.local
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that
are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote
system.
(if you think this is a mistake, you may want to use -f option)

[root@rceph .ssh]#

We need to create an ansible inventory file and run the preflight playbook.

[root@rceph ~]# cd /usr/share/cephadm-ansible/


[root@rceph cephadm-ansible]#
[root@rceph cephadm-ansible]# cat hosts
[admin]
rceph.local
[root@rceph cephadm-ansible]#

[root@rceph cephadm-ansible]# ansible-playbook -i ./hosts cephadm-preflight.yml --extra-vars "ceph_origin=ibm"

[WARNING]: log file at /root/ansible/ansible.log is not writeable and we cannot create it,
aborting

[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new


standard, use callbacks_enabled instead. This
feature will be removed from ansible-core in version 2.15. Deprecation warnings can be
disabled by setting
deprecation_warnings=False in ansible.cfg.

PLAY [insecure_registries]
*********************************************************************************************
************
.
.
.
fail if baseurl is not defined for ceph_custom_repositories ---------------------------------
-------------------------------- 0.02s
install prerequisites packages on clients ---------------------------------------------------
-------------------------------- 0.02s
set_fact ceph_custom_repositories -----------------------------------------------------------
-------------------------------- 0.02s
configure Ceph community repository ---------------------------------------------------------
-------------------------------- 0.02s
[root@rceph cephadm-ansible]#

Make sure the login details for the IBM Container Registry (ICR) are stored in a file. Your generated
entitlement key should be used as the password.

[root@rceph cephadm-ansible]# cat /etc/registry.json


{
"url":"cp.icr.io/cp",
"username":"cp",
"password":"<YOUR_ENTITLEMENT_KEY_HERE_FROM_ICR>"
}
[root@rceph cephadm-ansible]#

Lastly, we need to bootstrap our single node cluster with cephadm. We have to specify
--single-host-defaults as explained earlier.

[root@rceph cephadm-ansible]# cephadm bootstrap --cluster-network 10.0.0.0/24 --mon-ip 10.0.0.239 --allow-fqdn-hostname --registry-json /etc/registry.json --single-host-defaults
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.9.4 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: d3a18482-5038-11ef-8136-525400be8183
Verifying IP 10.0.0.239 port 3300 ...
Verifying IP 10.0.0.239 port 6789 ...
Mon IP `10.0.0.239` is in CIDR network `10.0.0.0/24`
Mon IP `10.0.0.239` is in CIDR network `10.0.0.0/24`
Adjusting default settings to suit single-host cluster...
Pulling custom registry login info from /etc/registry.json.
Logging into custom registry.
Pulling container image cp.icr.io/cp/ibm-ceph/ceph-7-rhel9:latest...
Ceph version: ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef
(stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...

Setting public_network to 10.0.0.0/24 in mon config section


Setting cluster_network to 10.0.0.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 0.0.0.0:9283 ...
Verifying port 0.0.0.0:8765 ...
Verifying port 0.0.0.0:8443 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host rceph.local...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying ceph-exporter service with default placement...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for mgr epoch 9...
mgr epoch 9 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
Ceph Dashboard is now available at:

URL: https://rceph.local:8443/
User: admin
Password: rhmj2s8egg

Enabling client.admin keyring and conf on hosts with "admin" label


Saving cluster configuration to /var/lib/ceph/d3a18482-5038-11ef-8136-525400be8183/config
directory
Skipping call home integration. --enable-ibm-call-home not provided
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:

sudo /usr/sbin/cephadm shell --fsid d3a18482-5038-11ef-8136-525400be8183 -c


/etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Or, if you are only running a single cluster on this host:

sudo /usr/sbin/cephadm shell

Please consider enabling telemetry to help improve Ceph:

ceph telemetry on

For more information see:

https://docs.ceph.com/en/latest/mgr/telemetry/

Bootstrap complete.
[root@rceph cephadm-ansible]#

You can check the status of your cluster from the command line.

[root@rceph cephadm-ansible]# ceph -s


cluster:
id: d3a18482-5038-11ef-8136-525400be8183
health: HEALTH_WARN

OSD count 0 < osd_pool_default_size 2

services:
mon: 1 daemons, quorum rceph (age 106s)
mgr: rceph.zigotn(active, since 67s)
osd: 0 osds: 0 up, 0 in

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:

progress:
Updating grafana deployment (+1 -> 1) (0s)
[............................]

[root@rceph cephadm-ansible]#

You can check that the CRUSH rule uses OSD (rather than host) as the failure domain.

[root@rceph ~]# ceph osd crush rule dump


[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"type": 1,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "choose_firstn",
"num": 0,
"type": "osd"
},
{
"op": "emit"
}
]
}
]

[root@rceph ~]#

We can also just check what daemons and services are deployed and what disks are available for OSDs
before connecting to the Ceph Dashboard to complete the installation.

[root@rceph ~]# cephadm shell


Inferring fsid d3a18482-5038-11ef-8136-525400be8183
Inferring config /var/lib/ceph/d3a18482-5038-11ef-8136-525400be8183/mon.rceph/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@rceph /]#

[ceph: root@rceph /]# ceph orch host ls


HOST ADDR LABELS STATUS
rceph.local 10.0.0.239 _admin
1 hosts in cluster
[ceph: root@rceph /]# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE MEM USE
MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.rceph rceph.local *:9093,9094 running (4m) 3m ago 5m 19.9M
- 0.26.0 2bdd88ba9d9f 34ff4e1f9eca
ceph-exporter.rceph rceph.local running (6m) 3m ago 6m 6299k
- 18.2.1-194.el9cp a09ffce67935 cb775021a55b
crash.rceph rceph.local running (6m) 3m ago 6m 6899k
- 18.2.1-194.el9cp a09ffce67935 9f84661144a8

grafana.rceph rceph.local *:3000 running (4m) 3m ago 4m 60.5M


- 10.4.0-pre 623fd2b148fe a41baa01e68e
mgr.rceph.kumbrt rceph.local *:8443,8765 running (4m) 3m ago 4m 449M
- 18.2.1-194.el9cp a09ffce67935 9bf242a95197
mgr.rceph.zigotn rceph.local *:9283,8765,8443 running (6m) 3m ago 6m 493M
- 18.2.1-194.el9cp a09ffce67935 49157ce646da
mon.rceph rceph.local running (7m) 3m ago 7m 37.0M
2048M 18.2.1-194.el9cp a09ffce67935 358191cc4142
node-exporter.rceph rceph.local *:9100 running (5m) 3m ago 5m 14.5M
- 1.7.0 cf2bcc5cf8d9 7df2c801f4db
prometheus.rceph rceph.local *:9095 running (4m) 3m ago 4m 30.3M
- 2.48.0 d1ad5c044d2e f04587b9bf06
[ceph: root@rceph /]#
[ceph: root@rceph /]# ceph orch device ls --wide --refresh
HOST PATH TYPE TRANSPORT RPM DEVICE ID SIZE HEALTH IDENT
FAULT AVAILABLE REFRESHED REJECT REASONS
rceph.local /dev/sr0 hdd QEMU_DVD-ROM_QM00001 2048 N/A N/A
No 72s ago Insufficient space (<5GB)
rceph.local /dev/vdb hdd 30.0G N/A N/A
Yes 72s ago
rceph.local /dev/vdc hdd 30.0G N/A N/A
Yes 72s ago
rceph.local /dev/vdd hdd 30.0G N/A N/A
Yes 72s ago
[ceph: root@rceph /]#

We can now connect to the IBM Storage Ceph Dashboard (credentials were provided during the
bootstrap process).

Figure 121: IBM Storage Ceph Dashboard

Click on “Expand Cluster” to complete the installation. On the Add Hosts page we can edit or add node
labels as we did when we created a 4-node cluster.

Figure 122: IBM Storage Ceph Dashboard

We do not need to modify anything, so we can click “Next”. On the “Create OSDs” page, the
“Cost/Capacity-optimized” option is selected by default. We can select “Next”.

Figure 123: IBM Storage Ceph Dashboard

On the Create Services page you can deploy additional services as you require. For now, though, we
will just select the defaults and click “Next”.

Figure 124: IBM Storage Ceph Dashboard

On the “Cluster Review” page we get a summary of our configuration. We can then click “Expand
Cluster” to complete the installation.

Figure 125: IBM Storage Ceph Dashboard

After a few minutes we should see a healthy cluster with a raw storage capacity of 90GB.

Figure 126: IBM Storage Ceph Dashboard

Navigate through the cluster resources to validate the configuration.

Figure 127: IBM Storage Ceph Dashboard – Cluster Pools

Figure 128: IBM Storage Ceph Dashboard – Cluster Hosts

Figure 129: IBM Storage Ceph Dashboard – Cluster Physical Disks

Figure 130: IBM Storage Ceph Dashboard – CRUSH Map

Figure 131: IBM Storage Ceph Dashboard - Monitors

Figure 132: IBM Storage Ceph Dashboard – Administration Services

We can also check we have a healthy single node cluster from the command line.

[ceph: root@rceph /]# ceph -s


cluster:
id: d3a18482-5038-11ef-8136-525400be8183
health: HEALTH_OK

services:
mon: 1 daemons, quorum rceph (age 17m)
mgr: rceph.zigotn(active, since 14m), standbys: rceph.kumbrt
osd: 3 osds: 3 up (since 4m), 3 in (since 4m)

data:
pools: 2 pools, 33 pgs
objects: 2 objects, 449 KiB
usage: 80 MiB used, 90 GiB / 90 GiB avail
pgs: 33 active+clean

[ceph: root@rceph /]#

IBM Storage Ceph Container Storage Interface (CSI) Driver


As per the last Ceph user survey, one of the main use cases for Ceph is for containerized workloads.
The de-facto standard for most Red Hat OpenShift deployments is ODF (OpenShift Data Foundation)
now called IBM Fusion Data Foundation (FDF). FDF supports two storage options, Data Foundation
(which is basically Ceph under the covers with a Rook-Ceph operator and NooBaa as a multi-cloud
object gateway) and IBM Global Data Platform (GDP) which is based on IBM Storage Scale. FDF
essentially deploys a Ceph storage cluster inside Red Hat OpenShift or supports the use of an external
Ceph cluster (required for example when using Metro-DR for disaster recovery). For more information
on IBM Fusion, you can refer here https://www.ibm.com/products/storage-fusion.

Most OpenShift customers would be using ODF/FDF or a CSI driver based on their existing storage
deployment. Alternatively, they have the option of deploying the rook-ceph operator via the OpenShift
OperatorHub and configuring an external Ceph cluster. So, for OpenShift, the use of Ceph is largely
automated (unless you are deploying the rook-ceph operator to use with an external Ceph cluster).

For our purposes, we will demonstrate the use of the IBM Storage Ceph CSI driver for a native
Kubernetes cluster. Like OpenShift, you can deploy the rook-ceph operator to deploy a storage cluster
inside your Kubernetes cluster or make use of an external Ceph cluster which is what we will
demonstrate. This deployment type supports providing storage to multiple Kubernetes clusters as
depicted below.

Figure 133: Centralised Ceph Storage Cluster serving multiple k8s Clusters

For the test setup, we have a 4-node Kubernetes cluster (1 master and 3 workers). We will present
RBD, CephFS and also configure Object Bucket Claims using the Ceph RGW.

First, let's create a dedicated RBD pool for the CSI.

Figure 134: Ceph Dashboard – RBD pool for CSI

We also need a dedicated filesystem to test CephFS.

Figure 135: Ceph Dashboard – CephFS for CSI
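
If you prefer the command line over the Dashboard, the pool and filesystem above can be created with
roughly the following commands (a sketch using the k8s-rbd and k8s-fs names from this setup):

ceph osd pool create k8s-rbd
rbd pool init k8s-rbd
ceph fs volume create k8s-fs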

The instructions to configure an external Ceph cluster for the Rook operator can be found here:

https://rook.io/docs/rook/latest-release/CRDs/Cluster/external-cluster/external-cluster/

Rook is an open-source cloud-native storage orchestrator for Kubernetes, providing the platform,
framework, and support for Ceph storage to natively integrate with Kubernetes. Rook automates
deployment and management of Ceph to provide self-managing, self-scaling, and self-healing storage
services. The Rook operator does this by building on Kubernetes resources to deploy, configure,
provision, scale, upgrade, and monitor Ceph (https://github.com/rook/rook).

We need to export the configuration from the provider Ceph cluster and import it into the Rook
consumer cluster. To do this we need to run the python script create-external-cluster-resources.py in
the provider Ceph cluster cephadm shell, to create the necessary users and keys.

To get the script we will clone the rook Git repository.

git clone --single-branch --branch v1.14.9 https://github.com/rook/rook.git

The script is located at rook/deploy/examples/create-external-cluster-resources.py. You can find a
full description of the options required for the create-external-cluster-resources.py script here:

https://rook.io/docs/rook/latest-release/CRDs/Cluster/external-cluster/provider-export/

Let’s execute it on our Ceph storage cluster with the --dry-run option to see what commands it is going
to run. We need to populate the options to match our cluster configuration.

[root@cephnode1 k8s]# python3 create-external-cluster-resources.py --namespace rook-ceph --rbd-data-pool-name k8s-rbd --cephfs-filesystem-name k8s-fs --monitoring-endpoint 10.0.0.240,10.0.0.241 --monitoring-endpoint-port 9283 --rgw-endpoint cephs3.local:8080 --rgw-skip-tls true --rgw-realm-name NDLAB --rgw-zonegroup-name LAB_ZONE_GROUP1 --rgw-zone-name DC1_ZONE --dry-run
Execute: 'ceph fs ls'
Execute: 'ceph fs ls'

Execute: 'ceph fsid'


Execute: 'ceph quorum_status'
Execute: 'ceph auth get-or-create client.healthchecker mon allow r, allow command
quorum_status, allow command version mgr allow command config osd profile rbd-read-only,
allow rwx pool=default.rgw.meta, allow r pool=.rgw.root, allow rw pool=default.rgw.control,
allow rx pool=default.rgw.log, allow x pool=default.rgw.buckets.index'
Execute: 'ceph mgr services'
Execute: 'ceph auth get-or-create client.csi-rbd-node mon profile rbd, allow command 'osd
blocklist' osd profile rbd'
Execute: 'ceph auth get-or-create client.csi-rbd-provisioner mon profile rbd, allow command
'osd blocklist' mgr allow rw osd profile rbd'
Execute: 'ceph status'
Execute: 'ceph radosgw-admin user create --uid rgw-admin-ops-user --display-name Rook RGW
Admin Ops user --caps buckets=*;users=*;usage=read;metadata=read;zone=read --rgw-realm NDLAB
--rgw-zonegroup LAB_ZONE_GROUP1 --rgw-zone DC1_ZONE'

[root@cephnode1 k8s]#

Next, we can run the command with the --format bash option as we need to copy the environment
variables to our consumer cluster.

[root@cephnode1 k8s]# python3 create-external-cluster-resources.py --format bash --namespace rook-ceph --rbd-data-pool-name k8s-rbd --cephfs-filesystem-name k8s-fs --monitoring-endpoint 10.0.0.240,10.0.0.241 --monitoring-endpoint-port 9283 --rgw-endpoint cephs3.local:8080 --rgw-skip-tls true --rgw-realm-name NDLAB --rgw-zonegroup-name LAB_ZONE_GROUP1 --rgw-zone-name DC1_ZONE
export NAMESPACE=rook-ceph
export ROOK_EXTERNAL_FSID=e7fcc1ac-42ec-11ef-a58f-bc241172f341
export ROOK_EXTERNAL_USERNAME=client.healthchecker
export ROOK_EXTERNAL_CEPH_MON_DATA=cephnode1=10.0.0.240:6789
export ROOK_EXTERNAL_USER_SECRET=AQAafKVmFr7+CxAAYpKlrhZB9uEuqWXPi1PECw==
export ROOK_EXTERNAL_DASHBOARD_LINK=https://10.0.0.240:8443/
export CSI_RBD_NODE_SECRET=AQAafKVmN/iPDRAA4tGgdsOK1A3bb0Lg4zQ73A==
export CSI_RBD_NODE_SECRET_NAME=csi-rbd-node
export CSI_RBD_PROVISIONER_SECRET=AQAafKVmv0aNDhAAiiCpwU/RFhjeBkBvdsLIbg==
export CSI_RBD_PROVISIONER_SECRET_NAME=csi-rbd-provisioner
export CEPHFS_POOL_NAME=cephfs.k8s-fs.data
export CEPHFS_METADATA_POOL_NAME=cephfs.k8s-fs.meta
export CEPHFS_FS_NAME=k8s-fs
export CSI_CEPHFS_NODE_SECRET=AQAafKVmADZUDxAAL+Z23JR/HpFcJB9m5OAl4g==
export CSI_CEPHFS_PROVISIONER_SECRET=AQAafKVmtXYlEBAAULEpXDO08ZA0wmtGCuAwGQ==
export CSI_CEPHFS_NODE_SECRET_NAME=csi-cephfs-node
export CSI_CEPHFS_PROVISIONER_SECRET_NAME=csi-cephfs-provisioner
export MONITORING_ENDPOINT=10.0.0.240,10.0.0.241
export MONITORING_ENDPOINT_PORT=9283
export RBD_POOL_NAME=k8s-rbd
export RGW_POOL_PREFIX=default
export RGW_ENDPOINT=cephs3.local:8080
export RGW_ADMIN_OPS_USER_ACCESS_KEY=VENNMWSXOWBZEKDOBV10
export RGW_ADMIN_OPS_USER_SECRET_KEY=7cr1jCjwX4UAU1Tl8m5h3tt0IA8jdIsiqAFBSIGA

[root@cephnode1 k8s]#

On our consumer cluster we will clone the same GitHub repository, as we will need the example YAML
files in the rook/deploy/examples directory to define our Kubernetes storage resources.

root@k8smaster:~/ceph# git clone --single-branch --branch v1.14.9 https://github.com/rook/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 99235, done.
remote: Counting objects: 100% (517/517), done.
remote: Compressing objects: 100% (303/303), done.
remote: Total 99235 (delta 333), reused 315 (delta 212), pack-reused 98718
Receiving objects: 100% (99235/99235), 52.71 MiB | 11.09 MiB/s, done.
Resolving deltas: 100% (69578/69578), done.
Note: switching to '0139b342aa287ee77563b580700c0591779f99d3'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

git switch -c <new-branch-name>

Or undo this operation with:

git switch -

Turn off this advice by setting config variable advice.detachedHead to false

root@k8smaster:~/ceph#

Copy the environment variables from the provider cluster to a file on the consumer cluster to make it
easy to export all the required variables. We need to source the file before we run the import.

root@k8smaster:~/ceph/rook/deploy/examples# cat /root/ceph/vars.sh


export NAMESPACE=rook-ceph
export ROOK_EXTERNAL_FSID=e7fcc1ac-42ec-11ef-a58f-bc241172f341
export ROOK_EXTERNAL_USERNAME=client.healthchecker
export ROOK_EXTERNAL_CEPH_MON_DATA=cephnode1=10.0.0.240:6789
export ROOK_EXTERNAL_USER_SECRET=AQAafKVmFr7+CxAAYpKlrhZB9uEuqWXPi1PECw==
export ROOK_EXTERNAL_DASHBOARD_LINK=https://10.0.0.240:8443/
export CSI_RBD_NODE_SECRET=AQAafKVmN/iPDRAA4tGgdsOK1A3bb0Lg4zQ73A==
export CSI_RBD_NODE_SECRET_NAME=csi-rbd-node
export CSI_RBD_PROVISIONER_SECRET=AQAafKVmv0aNDhAAiiCpwU/RFhjeBkBvdsLIbg==
export CSI_RBD_PROVISIONER_SECRET_NAME=csi-rbd-provisioner
export CEPHFS_POOL_NAME=cephfs.k8s-fs.data
export CEPHFS_METADATA_POOL_NAME=cephfs.k8s-fs.meta
export CEPHFS_FS_NAME=k8s-fs
export CSI_CEPHFS_NODE_SECRET=AQAafKVmADZUDxAAL+Z23JR/HpFcJB9m5OAl4g==
export CSI_CEPHFS_PROVISIONER_SECRET=AQAafKVmtXYlEBAAULEpXDO08ZA0wmtGCuAwGQ==
export CSI_CEPHFS_NODE_SECRET_NAME=csi-cephfs-node
export CSI_CEPHFS_PROVISIONER_SECRET_NAME=csi-cephfs-provisioner
export MONITORING_ENDPOINT=10.0.0.240,10.0.0.241
export MONITORING_ENDPOINT_PORT=9283
export RBD_POOL_NAME=k8s-rbd
export RGW_POOL_PREFIX=default
export RGW_ENDPOINT=cephs3.local:8080
export RGW_ADMIN_OPS_USER_ACCESS_KEY=VENNMWSXOWBZEKDOBV10
export RGW_ADMIN_OPS_USER_SECRET_KEY=7cr1jCjwX4UAU1Tl8m5h3tt0IA8jdIsiqAFBSIGA
root@k8smaster:~/ceph/rook/deploy/examples# source /root/ceph/vars.sh
root@k8smaster:~/ceph/rook/deploy/examples#

We need to install the rook operator either using Helm or using manifests. We will use manifests. Note
the following:

# If Rook is not managing any existing cluster in the 'rook-ceph' namespace do:
# kubectl create -f ../../examples/crds.yaml -f ../../examples/common.yaml -f
../../examples/operator.yaml
# kubectl create -f common-external.yaml -f cluster-external.yaml
#
# If there is already a cluster managed by Rook in 'rook-ceph' then do:
# kubectl create -f common-external.yaml
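
For reference, if you prefer Helm over manifests, installing just the Rook operator looks roughly like the
following (a sketch based on the upstream Rook Helm chart; this walkthrough uses the manifests
instead):

helm repo add rook-release https://charts.rook.io/release
helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph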

Since we do not already have any existing cluster, we have to do the following:

root@k8smaster:~/ceph/rook/deploy/examples/external# kubectl create -f ../../examples/crds.yaml -f ../../examples/common.yaml -f ../../examples/operator.yaml
customresourcedefinition.apiextensions.k8s.io/cephblockpoolradosnamespaces.ceph.rook.io
created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbucketnotifications.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbuckettopics.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclients.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephcosidrivers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemsubvolumegroups.ceph.rook.io
created
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectrealms.ceph.rook.io created

customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzonegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzones.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephrbdmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/objectbucketclaims.objectbucket.io created
customresourcedefinition.apiextensions.k8s.io/objectbuckets.objectbucket.io created
clusterrole.rbac.authorization.k8s.io/cephfs-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/cephfs-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/objectstorage-provisioner-role created
clusterrole.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrole.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/cephfs-csi-nodeplugin-role created
clusterrolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/objectstorage-provisioner-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-system created
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-purge-osd created
role.rbac.authorization.k8s.io/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
serviceaccount/objectstorage-provisioner created
serviceaccount/rook-ceph-cmd-reporter created
serviceaccount/rook-ceph-default created
serviceaccount/rook-ceph-mgr created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-purge-osd created
serviceaccount/rook-ceph-rgw created
serviceaccount/rook-ceph-system created
serviceaccount/rook-csi-cephfs-plugin-sa created
serviceaccount/rook-csi-cephfs-provisioner-sa created
serviceaccount/rook-csi-rbd-plugin-sa created
serviceaccount/rook-csi-rbd-provisioner-sa created
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
Error from server (AlreadyExists): error when creating "../../examples/common.yaml":
namespaces "rook-ceph" already exists
root@k8smaster:~/ceph/rook/deploy/examples/external#

I mistakenly ran the import-external-cluster.sh script before this command, hence the "already exists"
errors for some resources. It didn't cause any issues, though you should follow the steps in the order
shown here.

root@k8smaster:~/ceph/rook/deploy/examples/external# kubectl create -f common-external.yaml -f cluster-external.yaml
cephcluster.ceph.rook.io/rook-ceph-external created
Error from server (AlreadyExists): error when creating "common-external.yaml": namespaces
"rook-ceph" already exists
Error from server (AlreadyExists): error when creating "common-external.yaml":
rolebindings.rbac.authorization.k8s.io "rook-ceph-cluster-mgmt" already exists

Error from server (AlreadyExists): error when creating "common-external.yaml":


rolebindings.rbac.authorization.k8s.io "rook-ceph-cmd-reporter" already exists
Error from server (AlreadyExists): error when creating "common-external.yaml":
serviceaccounts "rook-ceph-cmd-reporter" already exists
Error from server (AlreadyExists): error when creating "common-external.yaml":
serviceaccounts "rook-ceph-default" already exists
Error from server (AlreadyExists): error when creating "common-external.yaml":
roles.rbac.authorization.k8s.io "rook-ceph-cmd-reporter" already exists
root@k8smaster:~/ceph/rook/deploy/examples/external#

We can now run the import-external-cluster.sh script to define the external cluster.

root@k8smaster:~/ceph/rook/deploy/examples# chmod +x import-external-cluster.sh


root@k8smaster:~/ceph/rook/deploy/examples# ./import-external-cluster.sh
namespace/rook-ceph created
secret/rook-ceph-mon created
configmap/rook-ceph-mon-endpoints created
secret/rook-csi-rbd-node created
secret/rook-csi-rbd-provisioner created
secret/rgw-admin-ops-user created
secret/rook-csi-cephfs-node created
secret/rook-csi-cephfs-provisioner created
storageclass.storage.k8s.io/ceph-rbd created
storageclass.storage.k8s.io/cephfs created
root@k8smaster:~/ceph/rook/deploy/examples#

After issuing the above commands we can query the rook operator deployment. Note, k is aliased to
kubectl.
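
If you want the same shortcut, the alias is simply:

alias k=kubectl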

root@k8smaster:~# k config set-context --current --namespace=rook-ceph


root@k8smaster:~# k get pods -o wide
NAME READY STATUS RESTARTS AGE
IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-6g2qn 0/2 ContainerCreating 0 3m40s
10.0.0.239 k8sworker2.local <none> <none>
csi-cephfsplugin-provisioner-76b7548bfb-hs6qv 0/5 ContainerCreating 0 3m40s
<none> k8sworker1.local <none> <none>
csi-cephfsplugin-provisioner-76b7548bfb-rnrzd 0/5 ContainerCreating 0 3m40s
<none> k8sworker2.local <none> <none>
csi-cephfsplugin-tlsp8 2/2 Running 0 3m40s
10.0.0.238 k8sworker1.local <none> <none>
csi-rbdplugin-fjvkz 0/2 ContainerCreating 0 3m40s
10.0.0.239 k8sworker2.local <none> <none>
csi-rbdplugin-l95gz 2/2 Running 0 3m40s
10.0.0.238 k8sworker1.local <none> <none>
csi-rbdplugin-provisioner-7bfcc8659c-fbn9p 0/5 ContainerCreating 0 3m40s
<none> k8sworker1.local <none> <none>
csi-rbdplugin-provisioner-7bfcc8659c-rcs5q 0/5 ContainerCreating 0 3m40s
<none> k8sworker2.local <none> <none>
rook-ceph-operator-58bb4fdc9c-bqz8k 1/1 Running 0 6m53s
192.168.220.6 k8sworker1.local <none> <none>
root@k8smaster:~#

After a few minutes, we should see all pods running and check if our external cluster is connected.

root@k8smaster:~# k get pods


NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-nbt8b 2/2 Running 1 (118s ago) 3m17s
csi-cephfsplugin-provisioner-76b7548bfb-6cjtz 5/5 Running 3 (17m ago) 48m
csi-cephfsplugin-provisioner-76b7548bfb-kv9l9 5/5 Running 2 (17m ago) 48m
csi-cephfsplugin-tk66l 2/2 Running 2 (17m ago) 46m
csi-cephfsplugin-tlsp8 2/2 Running 4 (17m ago) 58m
csi-rbdplugin-5xjt5 2/2 Running 1 (116s ago) 3m17s
csi-rbdplugin-l95gz 2/2 Running 4 (17m ago) 58m
csi-rbdplugin-provisioner-7bfcc8659c-5qjqk 5/5 Running 0 48m
csi-rbdplugin-provisioner-7bfcc8659c-tt4jz 5/5 Running 0 48m
csi-rbdplugin-xbvqt 2/2 Running 2 (17m ago) 46m
rook-ceph-operator-58bb4fdc9c-bqz8k 1/1 Running 1 (18m ago) 61m
root@k8smaster:~#

root@k8smaster:~# k get cephcluster

NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE


HEALTH EXTERNAL FSID
rook-ceph-external 8m15s Connected Cluster connected
successfully HEALTH_OK true e7fcc1ac-42ec-11ef-a58f-bc241172f341
root@k8smaster:~#

We can also query our storage classes and set the default storage class.

root@k8smaster:~/ceph/rook/deploy/examples# k get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
ALLOWVOLUMEEXPANSION AGE
ceph-rbd rook-ceph.rbd.csi.ceph.com Delete Immediate true
15m
cephfs rook-ceph.cephfs.csi.ceph.com Delete Immediate true
15m
root@k8smaster:~/ceph/rook/deploy/examples#

root@k8smaster:~/ceph/rook/deploy/examples# kubectl patch storageclass cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/cephfs patched

root@k8smaster:~/ceph/rook/deploy/examples# k get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
ALLOWVOLUMEEXPANSION AGE
ceph-rbd rook-ceph.rbd.csi.ceph.com Delete Immediate true
16m
cephfs (default) rook-ceph.cephfs.csi.ceph.com Delete Immediate true
16m
root@k8smaster:~/ceph/rook/deploy/examples#

We can also check the storage class definitions from the Kubernetes dashboard and make sure they
correspond to the correct RBD pool and CephFS filesystem.

Figure 136: Kubernetes Dashboard – Storage Classes

Figure 137: Kubernetes Dashboard – Storage Classes – Ceph RBD

Figure 138: Kubernetes Dashboard – Storage Classes – CephFS

We can use the example YAML files from the rook Git repository we cloned to test PVC creation. First,
let's try using the CephFS storage class.

root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# cat mycephfspvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: cephfs
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs#

root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k apply -f mycephfspvc.yaml


persistentvolumeclaim/cephfs-pvc created
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs#

root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS
CLAIM STORAGECLASS REASON AGE

pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO Delete Bound


rook-ceph/cephfs-pvc cephfs 3m8s
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES
STORAGECLASS AGE
cephfs-pvc Bound pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO
cephfs 3m11s
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs#

Now we can try using our RBD storage class.

root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# cat myrbdpvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ceph-rbd
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k apply -f myrbdpvc.yaml
persistentvolumeclaim/rbd-pvc created
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS
CLAIM STORAGECLASS REASON AGE
pvc-a20cda16-332f-4f01-af1e-dc319d61c60c 2Gi RWO Delete Bound
rook-ceph/rbd-pvc ceph-rbd 17s
pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO Delete Bound
rook-ceph/cephfs-pvc cephfs 4m25s
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES
STORAGECLASS AGE
cephfs-pvc Bound pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO
cephfs 4m28s
rbd-pvc Bound pvc-a20cda16-332f-4f01-af1e-dc319d61c60c 2Gi RWO
ceph-rbd 20s
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd#

We can also check the status of our PVCs on the Kubernetes Dashboard.

Figure 139: Kubernetes Dashboard – Config and Storage - PVCs

PVCs only support block or file. However, through the use of Object Bucket Claims (OBC) you can
define storage classes from where buckets can be provisioned. The goal of the Object Bucket Claim
(OBC) is to “provide a generic, dynamic bucket provision API similar to Persistent Volumes and
Claims”. This will look a lot like the PVC model used for block and file: An administrator defines a
storage class that points to underlying object storage, and when users create an object bucket claim
the bucket is automatically provisioned and is guaranteed to be available by the time the pod is
started. The procedure to configure an OBC is detailed here.

https://rook.io/docs/rook/latest/Storage-Configuration/Object-Storage-RGW/object-
storage/#connect-to-an-external-object-store

A good reference on object storage for Kubernetes concepts and the new COSI standard can be found
here:

https://archive.fosdem.org/2021/schedule/event/sds_object_storage_for_k8s/attachments/slides/
4507/export/events/attachments/sds_object_storage_for_k8s/slides/4507/cosi_slides.pdf

Let’s first define an external object store to the rook operator. We will use our highly-available RGW
endpoint.

root@k8smaster:~/ceph/rook/deploy/examples/external# cat myobject-external.yaml
#################################################################################################################
# Create an object store with settings for replication in a production environment. A minimum of 3 hosts with
# OSDs are required in this example.
#  kubectl create -f object.yaml
#################################################################################################################

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: external-store
  namespace: rook-ceph # namespace:cluster
spec:
  gateway:
    # The port on which **ALL** the gateway(s) are listening on.
    # Passing a single IP from a load-balancer is also valid.
    port: 8080
    externalRgwEndpoints:
      - ip: cephs3.local
        # hostname: example.com
root@k8smaster:~/ceph/rook/deploy/examples/external#

root@k8smaster:~/ceph/rook/deploy/examples/external# k apply -f myobject-external.yaml


cephobjectstore.ceph.rook.io/external-store created
root@k8smaster:~/ceph/rook/deploy/examples/external#

On our Ceph cluster, we can see it creates a cosi object user.

Figure 140: Ceph Dashboard – Object Users

Now we can use the example YAML files to define our Object Storage Class using the rook external
object store we created above.

root@k8smaster:~/ceph/rook/deploy/examples/external# cat mystorageclass-bucket-delete.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-delete-bucket
provisioner: rook-ceph.ceph.rook.io/bucket # driver:namespace:cluster
# set the reclaim policy to delete the bucket and all objects
# when its OBC is deleted.
reclaimPolicy: Delete
parameters:
  objectStoreName: external-store
  objectStoreNamespace: rook-ceph # namespace:cluster
  # To accommodate brownfield cases reference the existing bucket name here instead
  # of in the ObjectBucketClaim (OBC). In this case the provisioner will grant
  # access to the bucket by creating a new user, attaching it to the bucket, and
  # providing the credentials via a Secret in the namespace of the requesting OBC.
  #bucketName:
root@k8smaster:~/ceph/rook/deploy/examples/external# k apply -f mystorageclass-bucket-delete.yaml
storageclass.storage.k8s.io/rook-ceph-delete-bucket created
root@k8smaster:~/ceph/rook/deploy/examples/external#

root@k8smaster:~/ceph/rook/deploy/examples/external# k describe sc rook-ceph-delete-bucket


Name: rook-ceph-delete-bucket
IsDefaultClass: No
Annotations: kubectl.kubernetes.io/last-applied-
configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotation
s":{},"name":"rook-ceph-delete-bucket"},"parameters":{"objectStoreName":"external-
store","objectStoreNamespace":"rook-ceph"},"provisioner":"rook-
ceph.ceph.rook.io/bucket","reclaimPolicy":"Delete"}

Provisioner: rook-ceph.ceph.rook.io/bucket
Parameters: objectStoreName=external-store,objectStoreNamespace=rook-ceph
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
root@k8smaster:~/ceph/rook/deploy/examples/external#

And our object bucket claim (OBC).

root@k8smaster:~/ceph/rook/deploy/examples/external# cat object-bucket-claim-delete.yaml


apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-delete-bucket
spec:
  # To create a new bucket specify either `bucketName` or
  # `generateBucketName` here. Both cannot be used. To access
  # an existing bucket the bucket name needs to be defined in
  # the StorageClass referenced here, and both `bucketName` and
  # `generateBucketName` must be omitted in the OBC.
  #bucketName:
  generateBucketName: ceph-bkt
  storageClassName: rook-ceph-delete-bucket
  additionalConfig:
    # To set for quota for OBC
    #maxObjects: "1000"
    #maxSize: "2G"
root@k8smaster:~/ceph/rook/deploy/examples/external# k apply -f object-bucket-claim-
delete.yaml
objectbucketclaim.objectbucket.io/ceph-delete-bucket created
root@k8smaster:~/ceph/rook/deploy/examples/external# k describe obc ceph-delete-bucket
Name: ceph-delete-bucket
Namespace: rook-ceph
Labels: bucket-provisioner=rook-ceph.ceph.rook.io-bucket
Annotations: <none>
API Version: objectbucket.io/v1alpha1
Kind: ObjectBucketClaim
Metadata:


Creation Timestamp: 2024-07-28T12:54:36Z


Finalizers:
objectbucket.io/finalizer
Generation: 4
Resource Version: 25877
UID: a69792ea-eeb2-4e20-addc-20cf95213cd2
Spec:
Bucket Name: ceph-bkt-76073a1d-de20-43b6-bb28-702f003d9d90
Generate Bucket Name: ceph-bkt
Object Bucket Name: obc-rook-ceph-ceph-delete-bucket
Storage Class Name: rook-ceph-delete-bucket
Status:
Phase: Bound
Events: <none>
root@k8smaster:~/ceph/rook/deploy/examples/external#

On our Ceph cluster we can see the newly created bucket.

Figure 141: Ceph Dashboard – Object Buckets
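
An application consumes the claim through the ConfigMap and Secret that the provisioner creates with the same name as the OBC (the ConfigMap holds the bucket coordinates, the Secret the S3 credentials). A quick way to inspect them, assuming the rook-ceph namespace used above:

kubectl -n rook-ceph get configmap ceph-delete-bucket -o yaml
kubectl -n rook-ceph get secret ceph-delete-bucket -o yaml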

Deploying a simple application to dynamically provision a PVC


Let’s ensure that we can deploy an application using our newly created Storage Classes. We will use
the open source nginx webserver for this test.

First, we need to create a deployment and apply it. We will use CephFS storage class for the PVC.

root@k8smaster:~/ceph# cat nginx.yaml


apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-pvc
spec:
  storageClassName: cephfs
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: /data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc-claim
spec:
  storageClassName: cephfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clear-nginx-deployment
spec:
  selector:
    matchLabels:
      app: clear-nginx
  template:
    metadata:
      labels:
        app: clear-nginx
    spec:
      containers:
        - name: clear-nginx
          image: nginx:1.14.2
          volumeMounts:
            - mountPath: /var/www/html
              name: site-data
          ports:
            - containerPort: 80
      volumes:
        - name: site-data
          persistentVolumeClaim:
            claimName: nginx-pvc-claim
---
apiVersion: v1
kind: Service
metadata:
  name: clear-nginx-service
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: clear-nginx
  type: NodePort
root@k8smaster:~/ceph#
root@k8smaster:~/ceph# k apply -f nginx.yaml
persistentvolume/nginx-pvc created
persistentvolumeclaim/nginx-pvc-claim created
deployment.apps/clear-nginx-deployment created
service/clear-nginx-service created
root@k8smaster:~/ceph#

Next, we can query the PVC creation and deployment.

root@k8smaster:~/ceph# k get pvc


NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nginx-pvc-claim Bound nginx-pvc 1Gi RWO cephfs 31s
root@k8smaster:~/ceph# k get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS
CLAIM STORAGECLASS REASON AGE
nginx-pvc 1Gi RWO Retain Bound
default/nginx-pvc-claim cephfs 37s
pvc-a20cda16-332f-4f01-af1e-dc319d61c60c 2Gi RWO Delete Bound
rook-ceph/rbd-pvc ceph-rbd 156m
pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO Delete Bound
rook-ceph/cephfs-pvc cephfs 160m
root@k8smaster:~/ceph#

root@k8smaster:~/ceph# k get po -o wide


NAME READY STATUS RESTARTS AGE IP


NODE NOMINATED NODE READINESS GATES
clear-nginx-deployment-7fc79f5fd-fkpvr 1/1 Running 0 110s 192.168.198.4
k8sworker3.local <none> <none>
root@k8smaster:~/ceph#

root@k8smaster:~/ceph# k get svc -n default


NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
clear-nginx-service NodePort 10.102.169.128 <none> 80:32527/TCP 13m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h14m
root@k8smaster:~/ceph#

And test our nginx webserver.

Figure 142: nginx Welcome Page
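
From any machine that can reach a worker node, you can also test the NodePort service with curl (the port, 32527 here, will differ in your environment):

curl http://k8sworker3.local:32527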

And check the status of our application on the Kubernetes Dashboard.

Figure 143: Kubernetes Dashboard – Workloads – Pods

Ceph CSI Snapshots


You may need to support volume or volume group snapshots as part of your POC. The procedure to
deploy the CSI Snapshotter is documented here:

https://github.com/kubernetes-csi/external-snapshotter


To support snapshots, we need to clone the Git repository above and then follow the documented
instructions. First, we install the snapshot and volume group snapshot CRDs. Note the location of
the files within the cloned repository.

root@k8smaster:~/ceph/snap/external-snapshotter/client/config# kubectl -n kube-system


kustomize ./crd | kubectl create -f -
customresourcedefinition.apiextensions.k8s.io/volumesnapshotclasses.snapshot.storage.k8s.io
created
customresourcedefinition.apiextensions.k8s.io/volumesnapshotcontents.snapshot.storage.k8s.io
created
customresourcedefinition.apiextensions.k8s.io/volumesnapshots.snapshot.storage.k8s.io created
root@k8smaster:~/ceph/snap/external-snapshotter/client/config#

Then we need to deploy the common snapshot controller. Make sure to change the namespace to
match the namespace where your rook operator is deployed.

root@k8smaster:~/ceph/snap/external-snapshotter/deploy/kubernetes/snapshot-controller# vi
rbac-snapshot-controller.yaml
root@k8smaster:~/ceph/snap/external-snapshotter/deploy/kubernetes/snapshot-controller# grep
namespace: *
rbac-snapshot-controller.yaml: namespace: rook-ceph
rbac-snapshot-controller.yaml: namespace: rook-ceph
rbac-snapshot-controller.yaml: namespace: rook-ceph
rbac-snapshot-controller.yaml: namespace: rook-ceph
setup-snapshot-controller.yaml: namespace: kube-system
root@k8smaster:~/ceph/snap/external-snapshotter/deploy/kubernetes/snapshot-controller# vi
setup-snapshot-controller.yaml
root@k8smaster:~/ceph/snap/external-snapshotter/deploy/kubernetes/snapshot-controller# grep
namespace: *
rbac-snapshot-controller.yaml: namespace: rook-ceph
rbac-snapshot-controller.yaml: namespace: rook-ceph
rbac-snapshot-controller.yaml: namespace: rook-ceph
rbac-snapshot-controller.yaml: namespace: rook-ceph
setup-snapshot-controller.yaml: namespace: rook-ceph
root@k8smaster:~/ceph/snap/external-snapshotter/deploy/kubernetes/snapshot-controller#
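
Instead of editing the files by hand, the same namespace change could be scripted (a sketch, assuming the file names shown above and the rook-ceph target namespace):

sed -i 's/namespace: kube-system/namespace: rook-ceph/g' rbac-snapshot-controller.yaml setup-snapshot-controller.yaml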

Once you have modified the namespaces to match your Kubernetes cluster, you can deploy the
common snapshot controller.

root@k8smaster:~/ceph/snap/external-snapshotter/deploy/kubernetes# kubectl -n kube-system


kustomize ./snapshot-controller | kubectl create -f -
serviceaccount/snapshot-controller created
role.rbac.authorization.k8s.io/snapshot-controller-leaderelection created
clusterrole.rbac.authorization.k8s.io/snapshot-controller-runner created
rolebinding.rbac.authorization.k8s.io/snapshot-controller-leaderelection created
clusterrolebinding.rbac.authorization.k8s.io/snapshot-controller-role created
deployment.apps/snapshot-controller created
root@k8smaster:~/ceph/snap/external-snapshotter/deploy/kubernetes#

You should see new snapshot controller pods deployed. You can check that they are running.

root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get pods


NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-nbt8b 2/2 Running 5 (34m ago) 2d7h
csi-cephfsplugin-provisioner-76b7548bfb-6cjtz 5/5 Running 8 (34m ago) 2d8h
csi-cephfsplugin-provisioner-76b7548bfb-kv9l9 5/5 Running 7 (34m ago) 2d8h
csi-cephfsplugin-tk66l 2/2 Running 6 (34m ago) 2d8h
csi-cephfsplugin-tlsp8 2/2 Running 6 (34m ago) 2d8h
csi-rbdplugin-5xjt5 2/2 Running 5 (34m ago) 2d7h
csi-rbdplugin-l95gz 2/2 Running 6 (34m ago) 2d8h
csi-rbdplugin-provisioner-7bfcc8659c-5qjqk 5/5 Running 5 (34m ago) 2d8h
csi-rbdplugin-provisioner-7bfcc8659c-tt4jz 5/5 Running 5 (34m ago) 2d8h
csi-rbdplugin-xbvqt 2/2 Running 6 (34m ago) 2d8h
rook-ceph-operator-58bb4fdc9c-bqz8k 1/1 Running 2 (34m ago) 2d8h
snapshot-controller-7c5dccb849-4cnq9 1/1 Running 0 13m
snapshot-controller-7c5dccb849-xs7rk 1/1 Running 0 13m
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd#


Let's create an RBD snapshot class. Use the example YAML files from the rook Git repository we cloned
earlier.

root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# cat snapshotclass.yaml


---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
driver: rook-ceph.rbd.csi.ceph.com # csi-provisioner-name
parameters:
  # Specify a string that identifies your cluster. Ceph CSI supports any
  # unique string. When Ceph CSI is deployed by Rook use the Rook namespace,
  # for example "rook-ceph".
  clusterID: rook-ceph # namespace:cluster
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph # namespace:cluster
deletionPolicy: Delete
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k create -f snapshotclass.yaml
volumesnapshotclass.snapshot.storage.k8s.io/csi-rbdplugin-snapclass created
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get volumesnapshotclass
NAME DRIVER DELETIONPOLICY AGE
csi-rbdplugin-snapclass rook-ceph.rbd.csi.ceph.com Delete 12s
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd#

We can now take a snapshot of an RBD PVC. Modify the snapshot.yaml file to match your snapshot
name and source volume.

root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get pvc


NAME STATUS VOLUME CAPACITY ACCESS MODES
STORAGECLASS AGE
cephfs-pvc Bound pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO
cephfs 2d7h
rbd-pvc Bound pvc-a20cda16-332f-4f01-af1e-dc319d61c60c 2Gi RWO
ceph-rbd 2d7h
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# cat snapshot.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: rbd-pvc-snapshot
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: rbd-pvc
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k apply -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/rbd-pvc-snapshot created
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd#

root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get volumesnapshot


NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE
SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME
AGE
rbd-pvc-snapshot true rbd-pvc 2Gi csi-
rbdplugin-snapclass snapcontent-49272fca-81e1-4715-8132-debccda1fcc7 53s 54s
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd#

You can double check the snapshot creation from the Ceph Dashboard as well.


Figure 144: Ceph Dashboard – Block - Images
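
If you prefer the CLI, you can also list the snapshot on the backing RBD image from a Ceph cluster node (a sketch; the pool name and the csi-vol image name are specific to your environment):

rbd ls <rbd-pool-name>
rbd snap ls <rbd-pool-name>/csi-vol-<uuid>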

We can also test restoring the snapshot to a new PVC. Modify the storageClassName to match your
RBD storage class and the name of the snapshot you want to restore.

root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
ALLOWVOLUMEEXPANSION AGE
ceph-rbd rook-ceph.rbd.csi.ceph.com Delete Immediate
true 2d8h
cephfs (default) rook-ceph.cephfs.csi.ceph.com Delete Immediate
true 2d8h
rook-ceph-delete-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate
false 2d5h
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# cat pvc-restore.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-restore
spec:
  storageClassName: ceph-rbd
  dataSource:
    name: rbd-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k apply -f pvc-restore.yaml
persistentvolumeclaim/rbd-pvc-restore created
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd# k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES
STORAGECLASS AGE
cephfs-pvc Bound pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO
cephfs 2d7h
rbd-pvc Bound pvc-a20cda16-332f-4f01-af1e-dc319d61c60c 2Gi RWO
ceph-rbd 2d7h
rbd-pvc-restore Bound pvc-2715d0fe-1b88-40a3-958d-6cd2b4b99bb0 2Gi RWO
ceph-rbd 5s
root@k8smaster:~/ceph/rook/deploy/examples/csi/rbd#

Now that we have created a snapshot class for RBD, we can do the same for CephFS.
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# cat snapshotclass.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-cephfsplugin-snapclass
driver: rook-ceph.cephfs.csi.ceph.com # csi-provisioner-name
parameters:
  # Specify a string that identifies your cluster. Ceph CSI supports any
  # unique string. When Ceph CSI is deployed by Rook use the Rook namespace,
  # for example "rook-ceph".
  clusterID: rook-ceph # namespace:cluster
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph # namespace:cluster
deletionPolicy: Delete
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k apply -f snapshotclass.yaml
volumesnapshotclass.snapshot.storage.k8s.io/csi-cephfsplugin-snapclass created
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k get volumesnapshotclass
NAME DRIVER DELETIONPOLICY AGE
csi-cephfsplugin-snapclass rook-ceph.cephfs.csi.ceph.com Delete 8s
csi-rbdplugin-snapclass rook-ceph.rbd.csi.ceph.com Delete 13m
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs#

And take a snapshot of an existing CephFS PVC. Modify the snapshot.yaml file to specify the name of
the snapshot and source CephFS PVC.

root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# cat snapshot.yaml


---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: cephfs-pvc-snapshot
spec:
  volumeSnapshotClassName: csi-cephfsplugin-snapclass
  source:
    persistentVolumeClaimName: cephfs-pvc
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k apply -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/cephfs-pvc-snapshot created
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k get volumesnapshot
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE
SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME
AGE
cephfs-pvc-snapshot true cephfs-pvc 2Gi csi-
cephfsplugin-snapclass snapcontent-749b2032-b0ec-44d5-a681-8ee554f45db9 8s 8s
rbd-pvc-snapshot true rbd-pvc 2Gi csi-
rbdplugin-snapclass snapcontent-49272fca-81e1-4715-8132-debccda1fcc7 12m
12m
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs#

You can check the creation of the CephFS snapshot on the Ceph Dashboard.

Figure 145: Ceph Dashboard – File – File Systems
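
The same check can be done from a cluster node. CSI-provisioned subvolumes normally live in the csi subvolume group, so a sketch (assuming the file system is called cephfs; the csi-vol name is environment-specific) would be:

ceph fs subvolume ls cephfs --group_name csi
ceph fs subvolume snapshot ls cephfs csi-vol-<uuid> --group_name csi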

Finally, we can restore the CephFS snapshot to a new PVC. Modify the storageClassName to match
your CephFS storage class and the name of the snapshot you want to restore.

root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
ALLOWVOLUMEEXPANSION AGE


ceph-rbd rook-ceph.rbd.csi.ceph.com Delete Immediate


true 2d8h
cephfs (default) rook-ceph.cephfs.csi.ceph.com Delete Immediate
true 2d8h
rook-ceph-delete-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate
false 2d5h
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# vi pvc-restore.yaml
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# cat pvc-restore.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc-restore
spec:
  storageClassName: cephfs
  dataSource:
    name: cephfs-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k apply -f ./pvc-restore.yaml
persistentvolumeclaim/cephfs-pvc-restore created
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs# k get pvc
NAME STATUS VOLUME CAPACITY ACCESS
MODES STORAGECLASS AGE
cephfs-pvc Bound pvc-a441b483-0921-4735-8cc8-202f82d96cc7 2Gi RWO
cephfs 2d7h
cephfs-pvc-restore Bound pvc-13de23e7-d3c3-4305-b50e-7a1eb71c6c6d 2Gi RWX
cephfs 10s
rbd-pvc Bound pvc-a20cda16-332f-4f01-af1e-dc319d61c60c 2Gi RWO
ceph-rbd 2d7h
rbd-pvc-restore Bound pvc-2715d0fe-1b88-40a3-958d-6cd2b4b99bb0 2Gi RWO
ceph-rbd 7m48s
root@k8smaster:~/ceph/rook/deploy/examples/csi/cephfs#

Practical Use of Ceph RGW for Kubernetes Application Backup


A good use of our Ceph Object Store is backups. To demonstrate this, we will use Velero, an
open-source tool that backs up and restores Kubernetes clusters, including their resources and
persistent volumes.

You can download the latest version of Velero here:

https://github.com/vmware-tanzu/velero/releases/tag/v1.15.1

And find out more information including documentation here:

https://velero.io

Before we install Velero, we need to create a new S3 user on Ceph.


Figure 146: Ceph Dashboard – Object – Users
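
If you prefer the CLI over the dashboard, the equivalent user creation could look like this (a sketch; the uid and display name are examples):

radosgw-admin user create --uid=velero --display-name="Velero Backup User"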

And make a new bucket using the Velero user S3 credentials.

root@labserver:~# s3cmd mb s3://velero


Bucket 's3://velero/' created
root@labserver:~#

Download Velero and unpack it on a node in your Kubernetes cluster. Create a text file with the Velero
user S3 credentials.

root@k8smaster:~/velero# cat s3.txt


[default]
aws_access_key_id = O7GDH9W9UFGY6ZC2NRJB
aws_secret_access_key = lXdOfMWI4ncXT7zBKPCb3bkbUA2bMV7FT74ROAXg
root@k8smaster:~/velero#

Next, we can deploy Velero. Note how we specify the Ceph RGW endpoint in the --backup-location-
config option. We also have to specify the location of our SSL certificate since we configured our RGW
to use SSL termination.
root@k8smaster:~/velero# velero install --provider aws --bucket velero --plugins
velero/velero-plugin-for-aws --secret-file ./s3.txt --use-volume-snapshots=false --cacert
./cephs3.pem --backup-location-config
region=default,s3Url=http://cephs3.local:8080,insecureSkipTLSVerify=true

CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource


CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource client
CustomResourceDefinition/backuprepositories.velero.io: created
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: attempting to create resource client
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
.
.
.
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: attempting to create resource client
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: attempting to create resource client
Deployment/velero: created
Velero is installed! Use 'kubectl logs deployment/velero -n velero' to view the status.
root@k8smaster:~/velero#


We can query the Velero deployment on our Kubernetes cluster and also check if the backup location
is available.

root@k8smaster:~/velero# k get pods -n velero


NAME READY STATUS RESTARTS AGE
velero-6986498dcc-58zn8 1/1 Running 0 2m46s
root@k8smaster:~/velero#

root@k8smaster:~/velero# velero backup-location get


NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE
DEFAULT
default aws velero Available 2024-07-28 18:08:52 +0200 SAST ReadWrite
true
root@k8smaster:~/velero#

We can now use Velero to back up our Kubernetes applications. We will use the nginx webserver we
deployed earlier.

root@k8smaster:~/velero# kubectl get all -A --show-labels


NAMESPACE NAME READY
STATUS RESTARTS AGE LABELS
default pod/clear-nginx-deployment-7fc79f5fd-fkpvr 1/1
Running 0 128m app=clear-nginx,pod-template-hash=7fc79f5fd
.
.
.
root@k8smaster:~/velero# velero backup create nginx-backup --selector app=clear-nginx
Backup request "nginx-backup" submitted successfully.
Run `velero backup describe nginx-backup` or `velero backup logs nginx-backup` for more
details.
root@k8smaster:~/velero# velero backup describe nginx-backu
An error occurred: backups.velero.io "nginx-backu" not found
root@k8smaster:~/velero# velero backup describe nginx-backup
Name: nginx-backup
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/resource-timeout=10m0s
velero.io/source-cluster-k8s-gitversion=v1.28.12
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=28

Phase: Completed

Namespaces:
Included: *
Excluded: <none>

Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto

Label selector: app=clear-nginx

Or label selector: <none>

Storage Location: default

Velero-Native Snapshot PVs: auto


Snapshot Move Data: false
Data Mover: velero

TTL: 720h0m0s

CSISnapshotTimeout: 10m0s
ItemOperationTimeout: 4h0m0s

Hooks: <none>

Backup Format Version: 1.1.0

Started: 2024-07-28 18:14:03 +0200 SAST


Completed: 2024-07-28 18:14:04 +0200 SAST

Expiration: 2024-08-27 18:14:03 +0200 SAST

Total items to be backed up: 6


Items backed up: 6

Backup Volumes:
Velero-Native Snapshots: <none included>

CSI Snapshots: <none included>

Pod Volume Backups: <none included>

HooksAttempted: 0
HooksFailed: 0
root@k8smaster:~/velero#

You can check the backup objects stored on our Ceph cluster.

root@labserver:~/s3-tests# s3cmd ls s3://velero/backups


DIR s3://velero/backups/
root@labserver:~/s3-tests# s3cmd ls s3://velero/backups --recursive
2024-07-28 16:14 29 s3://velero/backups/nginx-backup/nginx-backup-csi-
volumesnapshotclasses.json.gz
2024-07-28 16:14 29 s3://velero/backups/nginx-backup/nginx-backup-csi-
volumesnapshotcontents.json.gz
2024-07-28 16:14 29 s3://velero/backups/nginx-backup/nginx-backup-csi-
volumesnapshots.json.gz
2024-07-28 16:14 27 s3://velero/backups/nginx-backup/nginx-backup-
itemoperations.json.gz
2024-07-28 16:14 4865 s3://velero/backups/nginx-backup/nginx-backup-logs.gz
2024-07-28 16:14 29 s3://velero/backups/nginx-backup/nginx-backup-
podvolumebackups.json.gz
2024-07-28 16:14 176 s3://velero/backups/nginx-backup/nginx-backup-resource-
list.json.gz
2024-07-28 16:14 49 s3://velero/backups/nginx-backup/nginx-backup-results.gz
2024-07-28 16:14 250 s3://velero/backups/nginx-backup/nginx-backup-
volumeinfo.json.gz
2024-07-28 16:14 29 s3://velero/backups/nginx-backup/nginx-backup-
volumesnapshots.json.gz
2024-07-28 16:14 3476 s3://velero/backups/nginx-backup/nginx-backup.tar.gz
2024-07-28 16:14 3091 s3://velero/backups/nginx-backup/velero-backup.json
root@labserver:~/s3-tests#
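
To round off the test, you could also restore the application from this backup using standard Velero commands (a sketch; for a meaningful test, delete the nginx resources first):

velero restore create --from-backup nginx-backup
velero restore get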

IBM Storage Ceph Multi-Cluster Management


To improve administrative efficiency, the IBM Storage Ceph Dashboard supports multi-cluster
management, and you can also monitor multiple clusters through the Grafana dashboard. This is an
example of a feature available in IBM Storage Ceph that is not yet available in the corresponding
open-source release. Information on how to set up and configure IBM Storage Ceph multi-cluster
management can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=dashboard-managing-multi-clusters-technology-preview

To set up multi-cluster management, navigate to Multi-Cluster and click on "Connect" to add another
Ceph Storage Cluster. You need to specify the cluster API URL (this is the same as the dashboard URL)
as well as the admin username and password. You can also set the login expiration in days.


Figure 147: Ceph Dashboard – Multi-Cluster – Connect Cluster

You can see a list of all connected clusters on the Ceph Dashboard. You can launch the Ceph
dashboard of any connected cluster by clicking on the URL.

Figure 148: Ceph Dashboard – Multi-Cluster – Manage Clusters

If you navigate to Multi-Cluster on one of the connected clusters, you will see a message stating that
the cluster is already managed by another cluster, along with the cluster ID of the managing cluster.


Figure 149: Ceph Dashboard – Multi-Cluster – Manage Clusters (client)

This feature is useful to demonstrate during a POC if you are deploying multiple clusters (e.g. for
replication). You can see rolled up information from the management cluster by navigating to Multi-
Cluster -> Overview (scroll down to see information).

Figure 150: Ceph Dashboard – Multi-Cluster – Overview


Figure 151: Ceph Dashboard – Multi-Cluster – Overview

Figure 152: Ceph Dashboard – Multi-Cluster – Overview

Open-source Ceph Reef (18.2.4) did not include this feature at the time of writing this document.


Figure 153: Open-source Ceph Dashboard – No Multi-Cluster

IBM Storage Ceph Replication – RBD Snapshots


As with any block storage array, Ceph supports RBD snapshots. Typical use cases for snapshots are
backups and volume cloning, and as part of a POC you will most likely need to demonstrate this
capability. Ceph creates copy-on-write (COW) snapshots. It is important to note that, like on any other
block storage array, these snapshots are crash-consistent rather than application-consistent.
Snapshots only consume space when the source image changes, and they are not visible to any RBD
client. RBD snapshots consume space in the primary storage pool of the source RBD image, so you
would need to factor this into your capacity planning. Because these snapshots are not visible to
clients, with the correct user roles and access they are similar to IBM Storage Scale Safeguarded Copy
(SGC) and can be used as part of your Cyber Resiliency strategy to protect against cyber or
ransomware attacks. An RBD snapshot can be copied to another pool or cloned to a new RBD image,
which can then be assigned to a client. Prior to cloning, though, you need to protect the snapshot you
are using for the clone so that it cannot be deleted. An explanation of RBD image snapshots can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=images-creating-snapshots

Below is the procedure to showcase RBD image snapshots. We will first map an RBD image on our RBD
client and write a file to it so that we can later demonstrate how to roll back the source image from a
snapshot.

Create an RBD image and map it to a client. You will need to create a Ceph user for the RBD client as
we did before (e.g. ceph auth get-or-create client.ndcephrbd mon 'profile rbd' osd 'profile rbd
pool=rbd' -o /etc/ceph/ceph.client.ndcephrbd.keyring). Copy the client keyring and the ceph.conf file
from one of your cluster nodes to the RBD client, then map the volume, create a filesystem if necessary
and mount it. Once mounted, we will create a file to demonstrate how we can use snapshots to
roll back the RBD image; the individual commands are sketched below.
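
A sketch of these steps, using the pool, image, client and mount point names that appear in the output below (the image size is an example):

rbd create rbd-mirror/rbd_labserver --size 3G          # on a cluster node
rbd map --id ndcephrbd rbd-mirror/rbd_labserver        # on the RBD client
mkfs.xfs /dev/rbd0
mkdir -p /mnt/rbd
mount /dev/rbd0 /mnt/rbd
echo "BEFORE SNAPSHOT" > /mnt/rbd/file_before_snap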

root@labserver:/etc/ceph# rbd showmapped


id pool namespace image snap device
0 rbd-mirror rbd_labserver - /dev/rbd0
root@labserver:/etc/ceph#


root@labserver:/mnt/rbd# ls -alt
total 8
drwxr-xr-x 2 root root 30 Dec 31 12:53 .
-rw-r--r-- 1 root root 398 Dec 31 12:53 file_before_snap
drwxr-xr-x 13 root root 4096 Aug 4 21:49 ..
root@labserver:/mnt/rbd#

Next, on the Ceph Dashboard, navigate to Block -> Images, select the RBD image, click on the
Snapshots tab and create a new snapshot.

Figure 154: Ceph Dashboard – Block – Images - Snapshots

Once the snapshot is created, note the options we have available. We will demonstrate the rollback
and clone functionality.

Figure 155: Ceph Dashboard – Block – Images – Snapshots - Actions


Next, we will create a new file after taking the snapshot. Then we unmount the filesystem and
unmap the RBD image prior to restoring the source image from the snapshot we just created. This
is done to ensure consistency of the filesystem and is the typical procedure employed on all
of our IBM storage arrays.

root@labserver:/mnt/rbd# ls -alt
total 12
drwxr-xr-x 2 root root 53 Dec 31 12:57 .
-rw-r--r-- 1 root root 398 Dec 31 12:57 file_after_snap
-rw-r--r-- 1 root root 398 Dec 31 12:53 file_before_snap
drwxr-xr-x 13 root root 4096 Aug 4 21:49 ..
root@labserver:/mnt/rbd#

root@labserver:~# umount /mnt/rbd


root@labserver:~# rbd unmap /dev/rbd0
root@labserver:~# rbd showmapped
root@labserver:~#

Now we can rollback the source image from the snapshot from the Ceph Dashboard.

Figure 156: Ceph Dashboard – Block – Images – Snapshots – Rollback

Depending on the size of the RBD image, it might take some time to complete.

Figure 157: Ceph Dashboard – Block – Images – Snapshots – Rollback
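
The equivalent rollback from the command line would be (a sketch; substitute the snapshot name you created):

rbd snap rollback rbd-mirror/rbd_labserver@<snapshot-name>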


To confirm the rollback was successful, we can remap the RBD image and remount the filesystem and
we should only see the file(s) we had prior to taking the snapshot.

root@labserver:~# rbd map --id ndcephrbd rbd-mirror/rbd_labserver


/dev/rbd0
root@labserver:~# mount /dev/rbd0 /mnt/rbd
root@labserver:~# cd /mnt/rbd
root@labserver:/mnt/rbd# ls -alt
total 8
drwxr-xr-x 2 root root 30 Dec 31 12:53 .
-rw-r--r-- 1 root root 398 Dec 31 12:53 file_before_snap
drwxr-xr-x 13 root root 4096 Aug 4 21:49 ..
root@labserver:/mnt/rbd#

The above procedure would suffice to test the creation and recovery from a block snapshot. Let us
now clone the snapshot to a new RBD image. First, we need to protect the source snapshot image.

Figure 158: Ceph Dashboard – Block – Images – Snapshots – Protect

Now we can clone the snapshot to a new RBD image.

Figure 159: Ceph Dashboard – Block – Images – Snapshots – Clone
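
For reference, the same protect and clone operations can be done from the CLI (a sketch; the snapshot name is an example, the clone name matches the one used below):

rbd snap protect rbd-mirror/rbd_labserver@<snapshot-name>
rbd clone rbd-mirror/rbd_labserver@<snapshot-name> rbd-mirror/clonerbd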


After the clone completes, we should have a new RBD image that would be visible to RBD clients who
have the correct permissions on the RBD pool it resides in.

Figure 160: Ceph Dashboard – Block – Images

To demonstrate that the clone is available for use, we can map it to our desired RBD client, mount the
filesystem and list the contents as follows:

root@labserver:/mnt/rbd# rbd --id ndcephrbd --pool rbd-mirror ls


clonerbd
rbd_labserver
root@labserver:/mnt/rbd# rbd map --id ndcephrbd rbd-mirror/clonerbd
/dev/rbd1
root@labserver:/mnt/rbd# mkdir /mnt/clone
root@labserver:/mnt/rbd# mount /dev/rbd1 /mnt/clone
root@labserver:/mnt/rbd# ls -al /mnt/clone
total 8
drwxr-xr-x 2 root root 30 Dec 31 12:53 .
drwxr-xr-x 14 root root 4096 Dec 31 13:04 ..
-rw-r--r-- 1 root root 398 Dec 31 12:53 file_before_snap
root@labserver:/mnt/rbd#

The above two use cases should be sufficient to demonstrate block snapshots for your POC.

IBM Storage Ceph Replication – CephFS Snapshots


Similar to RBD snapshots, CephFS snapshots are immutable point-in-time copies of the active
filesystem. Again, with the correct user roles and access these snapshots can be incorporated into
your Cyber Resiliency strategy as mentioned earlier because of their immutability. Before we
demonstrate filesystem snapshots, there are a few key concepts we need to be aware of.

• CephFS volumes are an abstraction for CephFS file systems


• CephFS subvolumes are an abstraction for independent CephFS directory trees
• CephFS subvolume groups are an abstraction for a directory level higher than CephFS
subvolumes to effect policies (e.g., File layouts) across a set of subvolumes

The use of subvolumes and subvolume groups is outside the scope of this document (they are
typically used with OpenStack, for example). Snapshots are currently supported on volumes and
subvolumes (as of IBM Storage Ceph 7.1). As with RBD snapshots, you can clone a subvolume
snapshot to a new subvolume.


For our purposes we will demonstrate creating a snapshot of the CephFS filesystem (CephFS volume).
This process is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=snapshots-creating-snapshot-ceph-file-system

On the Ceph Dashboard, if you navigate to File Systems and select a filesystem, under the Snapshots
tab you will notice that it requires the use of subvolumes.

Figure 161: Ceph Dashboard – File Systems – Snapshots

One useful feature of CephFS snapshots is the ability to use the snapshot scheduler to automate
snapshot creation and retention. Unfortunately, this feature is supported on CephFS directories or
subvolumes only.

Figure 162: Ceph Dashboard – File Systems – Snapshots
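
For reference, the scheduler can also be driven from the command line via the snap_schedule manager module (a sketch; the directory path and retention values are examples):

ceph mgr module enable snap_schedule
ceph fs snap-schedule add /some/directory 1h
ceph fs snap-schedule retention add /some/directory h 24
ceph fs snap-schedule status /some/directory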

In order to demonstrate a snapshot of an entire CephFS volume (filesystem), we will use an existing
filesystem called cephfs and create a filesystem snapshot on the command line and on the Ceph
Dashboard. First, let us authorize a new Ceph user called client.ndcephfs who has access to create
snapshots (for the command line use case). We need to allow snapshots, so tick this box. We
should also untick Root Squash.

Figure 163: Ceph Dashboard – File Systems – Create

As we did in the section describing how to deploy CephFS, we will mount the filesystem on a client.
We need the client keyring obtained as below after authorizing a new filesystem user on the Ceph
Dashboard.

[root@ndceph ceph]# ceph auth get client.ndcephfs


[client.ndcephfs]
key = AQDB0XNnDa0YBxAA8X+JPu0TDktr3+zwjhPRjQ==
caps mds = "allow rws fsname=cephfs"
caps mon = "allow r fsname=cephfs"
caps osd = "allow rw tag cephfs data=cephfs"
[root@ndceph ceph]# ceph auth get client.ndcephfs > /etc/ceph/ceph.client.ndcephfs.keyring

We need to copy the client keyring and /etc/ceph/ceph.conf from the cluster node where we ran the
above command. Then we can mount the filesystem as follows:

root@labserver:~# mount -t ceph ndcephfs@6f105f06-562e-11ef-8666-525400463683.cephfs=/ /cephfs -v
parsing options: rw
mount.ceph: resolved to: "10.0.0.235:3300"
mount.ceph: trying mount with new device syntax: ndcephfs@6f105f06-562e-11ef-8666-
525400463683.cephfs=/
mount.ceph: options "name=ndcephfs,ms_mode=prefer-crc,key=ndcephfs,mon_addr=10.0.0.235:3300"
will pass to kernel
root@labserver:~# df
Filesystem 1K-blocks Used Available Use% Mounted
on
tmpfs 401000 1028 399972 1% /run
/dev/sda2 41102636 21848724 17350352 56% /
tmpfs 2005000 84 2004916 1%
/dev/shm
tmpfs 5120 0 5120 0%
/run/lock
tmpfs 401000 16 400984 1%
/run/user/0
/dev/rbd0 3080192 92376 2987816 3%
/mnt/rbd
ndcephfs@6f105f06-562e-11ef-8666-525400463683.cephfs=/ 29691904 0 29691904 0% /cephfs
root@labserver:~#

On one of our Ceph cluster nodes, we need to enable new snapshots on the existing filesystem.


[root@ndceph ceph]# ceph fs set cephfs allow_new_snaps true


enabled new snapshots
[root@ndceph ceph]#

Next, we can copy an arbitrary file to the new filesystem prior to creating a command line snapshot.

root@labserver:/mnt/cephfs# cp /etc/hosts .
root@labserver:/mnt/cephfs# ls -alt
total 5
-rw-r--r-- 1 root root 398 Dec 31 13:53 hosts
drwxr-xr-x 2 root root 1 Dec 31 13:53 .
drwxr-xr-x 15 root root 4096 Dec 31 13:49 ..
root@labserver:/mnt/cephfs#

To create a CephFS snapshot from the command line, we need to navigate to the hidden .snap
directory in the root of our Ceph filesystem.

root@labserver:/mnt/cephfs# cd .snap
root@labserver:/mnt/cephfs/.snap#
root@labserver:/mnt/cephfs/.snap# pwd
/mnt/cephfs/.snap
root@labserver:/mnt/cephfs/.snap#

And create a directory with mkdir to trigger a snapshot creation.

root@labserver:/mnt/cephfs/.snap# mkdir mysnap


root@labserver:/mnt/cephfs/.snap# ls -alt
total 0
drwxr-xr-x 2 root root 1 Dec 31 13:53 ..
drwxr-xr-x 2 root root 1 Dec 31 13:53 mysnap
drwxr-xr-x 2 root root 1 Dec 31 13:49 .
root@labserver:/mnt/cephfs/.snap#
We can list the contents of the directory and it should match the contents of the active filesystem at
the time when we created the snapshot.

root@labserver:/mnt/cephfs/.snap/mysnap# ls -alt
total 1
-rw-r--r-- 1 root root 398 Dec 31 13:53 hosts
drwxr-xr-x 2 root root 1 Dec 31 13:53 .
drwxr-xr-x 2 root root 1 Dec 31 13:49 ..
root@labserver:/mnt/cephfs/.snap/mysnap#

If you don’t want to use the command line, you can use the Ceph Dashboard to create a filesystem
snapshot. Navigate to File systems -> cephfs -> Directories and click on Create Snapshot.

Figure 164: Ceph Dashboard – File Systems – Directories – Create Snapshot


Figure 165: Ceph Dashboard – File Systems – Directories – List Snapshots

We can verify this from the active filesystem as well on a CephFS client.

root@labserver:/mnt/cephfs/.snap# ls -alt
total 0
drwxr-xr-x 2 root root 1 Dec 31 13:53 ..
drwxr-xr-x 2 root root 1 Dec 31 13:53 2024-12-31T14:00:02.978+02:00
drwxr-xr-x 2 root root 1 Dec 31 13:53 mysnap
drwxr-xr-x 2 root root 2 Dec 31 13:49 .
root@labserver:/mnt/cephfs/.snap#
root@labserver:/mnt/cephfs/.snap# ls -alt 2024-12-31T14:00:02.978+02:00
total 1
-rw-r--r-- 1 root root 398 Dec 31 13:53 hosts
drwxr-xr-x 2 root root 1 Dec 31 13:53 .
drwxr-xr-x 2 root root 2 Dec 31 13:49 ..
root@labserver:/mnt/cephfs/.snap#

Lastly, as mentioned earlier, the snapshots are immutable.

root@labserver:/mnt/cephfs/.snap# rm -fr mysnap


rm: cannot remove 'mysnap/hosts': Read-only file system
root@labserver:/mnt/cephfs/.snap#

IBM Storage Ceph Replication – RBD Mirroring


RBD mirroring is achieved via asynchronous replication of RBD images between two or more Ceph
storage clusters. RBD mirroring supports either journal-based or snapshot-based mirroring. With
journal-based mirroring, all writes are first recorded in a journal on the primary cluster, which the
secondary cluster uses to replay the updates to its local copy. Snapshot-based mirroring periodically
takes a mirror snapshot, computes the difference since the previous snapshot and writes that delta to
the local copy. Journalling maintains write-order consistency and provides the lowest RPO. The
rbd-mirror daemon is responsible for synchronizing images between Ceph storage clusters. It does
this by pulling changes from the primary image and applying them to the secondary (non-primary)
image. Ceph supports either one-way or two-way replication. One-way replication is required when
you need to replicate to multiple secondary clusters. For one-way replication, the rbd-mirror daemon
runs only on the secondary cluster(s) and the non-primary RBD image is read-only.


Figure 166: Ceph One-way RBD Replication (https://www.ibm.com/docs/en/storage-ceph/7.1?topic=devices-ceph-block-device-mirroring)

Two-way replication supports failover and failback (demote primary image and promote the non-
primary image). Both the primary and secondary images are enabled for writes. Changes to the RBD
images on the secondary cluster will be replicated back to the primary. For two-way mirroring, the
rbd-mirror daemon runs on both clusters. Two-way mirroring also only supports two sites.

Figure 167: Ceph Two-way RBD Replication (https://www.ibm.com/docs/en/storage-ceph/7.1?topic=devices-ceph-block-device-mirroring)

RBD mirroring is configured at the pool level, and Ceph supports two mirroring modes: pool mode (all
RBD images in the pool are mirrored) and image mode (only a selected subset of images is replicated).
A full explanation of RBD mirroring can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=devices-mirroring-ceph-block

Important: The CRUSH hierarchies supporting primary and secondary pools that mirror block
device images must have the same capacity and performance characteristics, and must have
adequate bandwidth to ensure mirroring without excess latency. For example, if you have X MB/s
average write throughput to images in the primary storage cluster, the network must support N * X
throughput in the network connection to the secondary site plus a safety factor of Y% to mirror N
images. (https://www.ibm.com/docs/en/storage-ceph/7.1?topic=devices-ceph-block-device-mirroring)

For a POC environment, you most likely would need to demonstrate this functionality. We will set up
two-way replication to showcase how we can tolerate a site failure for Ceph RBD clients. Our test
environment makes use of two single-node IBM Storage Ceph clusters (for a POC, you could use a
single-node cluster for the secondary cluster). As per the diagram below, we will mirror an RBD image
from RCEPH to LCEPH. We have a Ceph RBD client called labserver to test client access to the RBD
image.

Figure 168: Ceph RBD Mirroring Lab Environment

The first thing we need to do for two-way replication is to deploy the rbd-mirror service on each of our
clusters.

Figure 169: Ceph Dashboard – Administration – Services – Create RBD Mirror Service on RCEPH


Figure 170: Ceph Dashboard – Administration – Services – Create RBD Mirror Service on LCEPH

If we navigate to Block -> Mirroring, we should see our rbd-mirror daemon running and the current
mirroring mode that is enabled for our rbd pool. Note that you can edit the Site Name to make the
configuration easier to understand.

Figure 171: Ceph Dashboard – Block – Mirroring on RCEPH


Figure 172: Ceph Dashboard – Block – Mirroring on LCEPH

We want to enable RBD mirroring for all newly created images. We can enable this as follows:

[root@rceph ~]# ceph config set global rbd_default_features 125


[root@rceph ~]# ceph config show mon.rceph rbd_default_features
125
[root@rceph ~]#

[root@lceph ~]# ceph config set global rbd_default_features 125


[root@lceph ~]# ceph config show mon.lceph rbd_default_features
125
[root@lceph ~]#

If you already have RBD images created in your RBD pool, you can enable mirroring as follows:
rbd feature enable <POOL_NAME>/<IMAGE_NAME> exclusive-lock, journaling

If we query the mirror info for our RBD pool called rbd, we will see that we don’t have any peer sites
defined yet.

[root@rceph ~]# rbd mirror pool info rbd


Mode: pool
Site Name: RCEPH

Peer Sites: none


[root@rceph ~]#

[root@lceph ~]# rbd mirror pool info rbd


Mode: pool
Site Name: LCEPH

Peer Sites: none


[root@lceph ~]#

You might see a warning for the rbd-mirror daemon health status. This will disappear as soon as we
define the peer clusters at both sites.


[root@rceph ~]# rbd mirror pool status rbd --verbose


health: WARNING
daemon health: WARNING
image health: OK
images: 0 total

DAEMONS
service 124170:
instance_id:
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: false
health: WARNING
callouts: not reporting status

service 134099:
instance_id:
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: false
health: OK

IMAGES[root@rceph ~]#

We now need to bootstrap the peer clusters. You can navigate to Block -> Mirroring and click on the
top right option “Create Bootstrap Token” on the primary site which is RCEPH.

Figure 173: Ceph Dashboard – Block – Mirroring – Create Bootstrap Token on RCEPH

We now need to import the bootstrap token on the secondary site cluster. Navigate to Block -> Mirroring
and click on the down icon next to Create Bootstrap Token to import a bootstrap token. On the
Import Bootstrap Token page, make sure to select bi-directional for two-way replication, insert the
local site's name (LCEPH), choose the pool we want mirroring enabled on (we only have one RBD pool)
and paste the peer's bootstrap token.


Figure 174: Ceph Dashboard – Block – Mirroring – Import Bootstrap Token on LCEPH

You can do the bootstrap process from the command line as well as follows:

[root@rceph ~]# rbd mirror pool peer bootstrap create --site-name RCEPH rbd > rceph.rbd
[root@rceph ~]#
[root@lceph ~]# rbd mirror pool peer bootstrap import --site-name LCEPH --direction rx-tx
rbd /tmp/rceph.rbd
[root@lceph ~]#

On the Ceph dashboard for each cluster, we should see both set to leaders with no images being
mirrored yet.

Figure 175: Ceph Dashboard – Block – Mirroring on RCEPH


Figure 176: Ceph Dashboard – Block – Mirroring on LCEPH

On RCEPH we can create an RBD image to mirror. We will name it linuxlun. Mirroring should be auto-
selected as we enabled mirroring for all new images.

Figure 177: Ceph Dashboard – Block – Images – Create on RCEPH

In the current version of Ceph, you cannot mirror an RBD image that belongs to a namespace. See
https://bugzilla.redhat.com/show_bug.cgi?id=2024444.
[root@cephnode1 ~]# rbd mirror image enable rbd/winlun --namespace=windows
2024-08-03T21:18:28.068+0200 7f060e20dc00 -1 librbd::api::Mirror: image_enable: cannot
enable mirroring: mirroring is not enabled on a namespace
[root@cephnode1 ~]#
This should be fixed in the next release of IBM Storage Ceph.


Make sure to create a corresponding client RBD user on both Ceph clusters with the ceph auth
get-or-create command, as discussed in the RBD deployment section (with MON and OSD capabilities).
For our test we will use a Ceph user called drlinux; a sketch of the command follows below. Once the
image is created, we should see it set as primary on RCEPH and secondary on LCEPH.
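
A sketch of the user creation, mirroring the ceph auth command used earlier (run on both RCEPH and LCEPH):

ceph auth get-or-create client.drlinux mon 'profile rbd' osd 'profile rbd pool=rbd' -o /etc/ceph/ceph.client.drlinux.keyring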

A maximum of 5 snapshots are retained by default. If required, the limit can be overridden through
the rbd_mirroring_max_mirroring_snapshots configuration option. All snapshots are automatically
removed when the RBD image is deleted or when mirroring is disabled.

Figure 178: Ceph Dashboard – Block – Images on RCEPH

Figure 179: Ceph Dashboard – Block – Images on LCEPH


On RCEPH, the mirroring status should be STOPPED (since we are the primary for this image).

Figure 180: Ceph Dashboard – Block – Mirroring on RCEPH

On LCEPH, the status should be REPLAYING.

Figure 181: Ceph Dashboard – Block – Mirroring on LCEPH

To get more details of the mirroring status (including the transfer speed and last update) we can use
the command line.

[root@rceph ~]# rbd mirror pool status rbd --verbose


health: OK
daemon health: OK
image health: OK
images: 1 total


1 replaying

DAEMONS
service 134099:
instance_id: 134159
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK

IMAGES
linuxlun:
global_id: b5bc5fff-824e-42e9-9afb-b9a1a553bf4d
state: up+stopped
description: local image is primary
service: rceph.qxvdwv on rceph.local
last_update: 2024-08-04 10:12:18
peer_sites:
name: LCEPH
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"entries_behind_primary":0,"entries_per_second":0.0,"non_primary_posi
tion":{"entry_tid":3,"object_number":3,"tag_tid":15},"primary_position":{"entry_tid":3,"objec
t_number":3,"tag_tid":15}}
last_update: 2024-08-04 10:12:19
[root@rceph ~]#

The next step in a POC would be to simulate a DR scenario. To do this, we will map and mount our RBD
image on our RBD client, write some data to it, then fail over to the secondary cluster and access the
same RBD image.

Trying to map the RBD image, however, fails on the client (Ubuntu Server 22.04). See below:

root@labserver:~# rbd ls --id drlinux --conf /etc/ceph/dr/ceph.conf --keyring


/etc/ceph/dr/ceph.client.drlinux.keyring
linuxlun
root@labserver:~# rbd map --id drlinux rbd/linuxlun --conf /etc/ceph/dr/ceph.conf --keyring
/etc/ceph/dr/ceph.client.drlinux.keyring
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd
feature disable linuxlun journaling".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address
root@labserver:~# dmesg | tail
[641096.336457] XFS (rbd0): Unmounting Filesystem afdf8246-15b1-40a2-b430-74f7b5b8c1e0
[672558.036713] libceph: mon0 (1)10.0.0.239:6789 session established
[672558.037993] libceph: client114144 fsid d3a18482-5038-11ef-8136-525400be8183
[672558.070471] rbd: rbd0: capacity 3221225472 features 0x3d
[672593.946202] XFS (rbd0): Mounting V5 Filesystem 27f81964-0d2d-423c-ba6d-61a1839b582a
[672594.092033] XFS (rbd0): Ending clean mount
[675303.392634] XFS (rbd0): Unmounting Filesystem 27f81964-0d2d-423c-ba6d-61a1839b582a
[678617.285196] libceph: mon0 (1)10.0.0.239:6789 session established
[678617.286437] libceph: client134172 fsid d3a18482-5038-11ef-8136-525400be8183
[678617.302129] rbd: image linuxlun: image uses unsupported features: 0x40
root@labserver:~#

The Ubuntu 22.04 kernel does not support one of the RBD image features (0x40). This feature
corresponds to the journaling flag that was set on the RBD image at creation. Later Linux kernels have
support for all of the advanced RBD features, as explained here:
https://access.redhat.com/solutions/4270092. For the purposes of this document, instead of trying
a later Linux kernel on our RBD client, it is easier to change from journal-based mirroring to
snapshot-based mirroring, since our goal is to demonstrate the failover/failback process. If you are
using a Linux kernel version later than our RBD client's (Linux kernel 6.5) that supports RBD
journalling, then you do not have to change to snapshots.


If you have existing RBD images, you can convert from journal-based mirroring to snapshot-based
mirroring without having to delete any images:
https://www.ibm.com/docs/en/storage-ceph/7.1?topic=devices-converting-journal-based-mirroring-snapshot-based-mirroring

On both RCEPH and LCEPH, delete the RBD image and then change the mirroring mode to image (from
pool).

Figure 182: Ceph Dashboard – Block – Mirroring – Pool Mirroring Mode on RCEPH

Do the same on LCEPH. Next, recreate the RBD image specifying Snapshot mirroring on RCEPH and
choose a suitable schedule interval (we have chosen every 3 minutes).

A maximum of 5 snapshots are retained by default. If required, the limit can be overridden through
the rbd_mirroring_max_mirroring_snapshots configuration option. All snapshots are automatically
removed when the RBD image is deleted or when mirroring is disabled.


Figure 183: Ceph Dashboard – Block – Images – Pool Mirroring Mode on RCEPH

Soon after creating the new RBD image, you should see automatic snapshots being taken for the
snapshot mirroring process.

Figure 184: Ceph Dashboard – Block – Images – Snapshots on RCEPH

If we query the pool mirroring status now, we can see the details for the last snapshot etc.

[root@rceph ~]# rbd mirror pool status rbd --verbose


health: OK
daemon health: OK
image health: OK
images: 1 total
1 replaying

DAEMONS


service 134099:
instance_id: 134159
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK

IMAGES
linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+stopped
description: local image is primary
service: rceph.qxvdwv on rceph.local
last_update: 2024-08-04 10:35:18
peer_sites:
name: LCEPH
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"last_snapshot_bytes":0,"last_snapshot_sync_
seconds":0,"local_snapshot_timestamp":1722760410,"remote_snapshot_timestamp":1722760410,"repl
ay_state":"idle"}
last_update: 2024-08-04 10:35:19
[root@rceph ~]#

Next, we can map this RBD image on our client, create a filesystem, mount it and write some data to it.

root@labserver:~# rbd map --id drlinux rbd/linuxlun --conf /etc/ceph/dr/ceph.conf --keyring


/etc/ceph/dr/ceph.client.drlinux.keyring
/dev/rbd0
root@labserver:~#
root@labserver:~# rbd showmapped
id pool namespace image snap device
0 rbd linuxlun - /dev/rbd0
root@labserver:~#

root@labserver:~# mkfs.xfs /dev/rbd0


meta-data=/dev/rbd0 isize=512 agcount=8, agsize=98304 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1 nrext64=0
data = bsize=4096 blocks=786432, imaxpct=25
= sunit=16 swidth=16 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=16384, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Discarding blocks...Done.
root@labserver:~#

root@labserver:~# mount /dev/rbd0 /mnt/rbd/


root@labserver:~# cd /mnt/rbd
root@labserver:/mnt/rbd# dd if=/dev/zero of=600mb bs=1024k count=600
600+0 records in
600+0 records out
629145600 bytes (629 MB, 600 MiB) copied, 3.43661 s, 183 MB/s
root@labserver:/mnt/rbd#

root@labserver:/mnt/rbd# ls -alt /mnt/rbd


total 614404
-rw-r--r-- 1 root root 629145600 Aug 4 10:38 600mb
drwxr-xr-x 2 root root 19 Aug 4 10:38 .
drwxr-xr-x 6 root root 4096 Jul 27 13:06 ..
root@labserver:/mnt/rbd#

We can query the pool mirroring status again to see if data is being transferred.

root@rceph ~]# rbd mirror image status rbd/linuxlun


linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+stopped
description: local image is primary


service: rceph.qxvdwv on rceph.local


last_update: 2024-08-04 10:40:48
peer_sites:
name: LCEPH
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"bytes_per_snapshot":315882496.0,"last_snapshot_bytes":631764992,"las
t_snapshot_sync_seconds":6,"local_snapshot_timestamp":1722760740,"remote_snapshot_timestamp":
1722760740,"replay_state":"idle"}
last_update: 2024-08-04 10:40:49
snapshots:
7 .mirror.primary.0ee454a8-cf71-48e2-95ea-96f6a737f5ac.fae94c85-2671-416e-94a1-
8be2cdd6b490 (peer_uuids:[])
8 .mirror.primary.0ee454a8-cf71-48e2-95ea-96f6a737f5ac.12d9d443-28d4-4f58-b73a-
b288e32f9e55 (peer_uuids:[49c540be-90cc-4587-9b72-33c943902fb5])
[root@rceph ~]#

And we can query the mirroring status for the actual RBD image on the secondary cluster to check the
last snapshot sync time.

[root@lceph ~]# rbd mirror image status rbd/linuxlun


linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"bytes_per_snapshot":315882496.0,"last_snapshot_bytes":631764992,"las
t_snapshot_sync_seconds":6,"local_snapshot_timestamp":1722760740,"remote_snapshot_timestamp":
1722760740,"replay_state":"idle"}
service: lceph.quprif on lceph.local
last_update: 2024-08-04 10:40:19
peer_sites:
name: RCEPH
state: up+stopped
description: local image is primary
last_update: 2024-08-04 10:40:18
[root@lceph ~]#

Let us now simulate DR (with a planned failover). First we will write a new file prior to stopping client
RBD access to the primary image.

root@labserver:~# echo "BEFORE FAILOVER" > /mnt/rbd/before.failover


root@labserver:~# ls -al /mnt/rbd
total 8
drwxr-xr-x 2 root root 42 Aug 4 11:06 .
drwxr-xr-x 6 root root 4096 Jul 27 13:06 ..
-rw-r--r-- 1 root root 0 Aug 4 10:56 600mb
-rw-r--r-- 1 root root 16 Aug 4 11:06 before.failover
root@labserver:~# umount /mnt/rbd
root@labserver:~# rbd unmap --id drlinux rbd/linuxlun --conf /etc/ceph/dr/ceph.conf --keyring
/etc/ceph/dr/ceph.client.drlinux.keyring
root@labserver:~# rbd showmapped
root@labserver:~#

We can check the mirror status to make sure the last snapshot update is after we created the test file
and that the health status is good.

[root@rceph ~]# rbd mirror pool status rbd --verbose


health: OK
daemon health: OK
image health: OK
images: 1 total
1 replaying

DAEMONS
service 134099:
instance_id: 134159
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK


IMAGES
linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+stopped
description: local image is primary
service: rceph.qxvdwv on rceph.local
last_update: 2024-08-04 11:07:18
peer_sites:
name: LCEPH
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"bytes_per_snapshot":1114368.0,"last_snapshot_bytes":2097152,"last_sn
apshot_sync_seconds":0,"local_snapshot_timestamp":1722762360,"remote_snapshot_timestamp":1722
762360,"replay_state":"idle"}
last_update: 2024-08-04 11:07:19
[root@rceph ~]#

On RCEPH, we can DEMOTE the RBD image.

[root@rceph ~]# rbd mirror image demote rbd/linuxlun


Image demoted to non-primary
[root@rceph ~]#

And on LCEPH, we can PROMOTE the RBD image (if this was an unplanned failover, you can use the --force option).

[root@lceph ~]# rbd mirror image promote rbd/linuxlun


Image promoted to primary
[root@lceph ~]#
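For reference, had this been an unplanned failover with the old primary unreachable, the promotion on LCEPH would need to be forced. A minimal sketch (not needed in our planned test):

rbd mirror image promote --force rbd/linuxlun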

On RCEPH, our RBD image is now the secondary.

Figure 185: Ceph Dashboard – Block – Images on RCEPH

And on LCEPH our RBD image is now the primary.


Figure 186: Ceph Dashboard – Block – Images on LCEPH

We can query the mirroring status to make sure RCEPH is now set to replaying.

[root@lceph ~]# rbd mirror pool status rbd --verbose


health: OK
daemon health: OK
image health: OK
images: 1 total
1 replaying

DAEMONS
service 104106:
instance_id: 104159
client_id: lceph.quprif
hostname: lceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK

IMAGES
linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+stopped
description: local image is primary
service: lceph.quprif on lceph.local
last_update: 2024-08-04 11:09:49
peer_sites:
name: RCEPH
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"bytes_per_snapshot":370176.0,"last_snapshot_bytes":370176,"last_snap
shot_sync_seconds":0,"local_snapshot_timestamp":1722762520,"remote_snapshot_timestamp":172276
2520,"replay_state":"idle"}
last_update: 2024-08-04 11:09:48
[root@lceph ~]#

Let us map the RBD image from LCEPH on our RBD client, mount the filesystem and check to see if we
have the latest data (the file we created prior to the planned failover in our example).

root@labserver:~# rbd map --id linux rbd/linuxlun --conf /etc/ceph/lceph/lceph.conf --keyring


/etc/ceph/lceph/ceph.client.linux.keyring
/dev/rbd0
root@labserver:~#


root@labserver:~# rbd showmapped


id pool namespace image snap device
0 rbd linuxlun - /dev/rbd0
root@labserver:~# mount /dev/rbd0 /mnt/rbd/
root@labserver:~# ls -alt /mnt/rbd
total 8
drwxr-xr-x 2 root root 42 Aug 4 11:06 .
-rw-r--r-- 1 root root 16 Aug 4 11:06 before.failover
-rw-r--r-- 1 root root 0 Aug 4 10:56 600mb
drwxr-xr-x 6 root root 4096 Jul 27 13:06 ..
root@labserver:~#

To simulate running workload at the secondary site (which is now the primary), we will create a file.

root@labserver:~# echo "AFTER FAILOVER" > /mnt/rbd/after.failover


root@labserver:~# ls -alt /mnt/rbd
total 12
drwxr-xr-x 2 root root 64 Aug 4 11:14 .
-rw-r--r-- 1 root root 15 Aug 4 11:14 after.failover
-rw-r--r-- 1 root root 16 Aug 4 11:06 before.failover
-rw-r--r-- 1 root root 0 Aug 4 10:56 600mb
drwxr-xr-x 6 root root 4096 Jul 27 13:06 ..
root@labserver:~#

Now let's fail back to the old primary (RCEPH). First, we will unmap the RBD image on our client.

root@labserver:~# umount /mnt/rbd


root@labserver:~# rbd unmap --id linux rbd/linuxlun --conf /etc/ceph/lceph/lceph.conf --
keyring /etc/ceph/lceph/ceph.client.linux.keyring
root@labserver:~# rbd showmapped
root@labserver:~#

Next, we need to check the current mirroring status and make sure it is healthy on LCEPH.

[root@lceph ~]# rbd mirror pool status rbd --verbose


health: OK
daemon health: OK
image health: OK
images: 1 total
1 replaying

DAEMONS
service 104106:
instance_id: 104159
client_id: lceph.quprif
hostname: lceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK

IMAGES
linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+stopped
description: local image is primary
service: lceph.quprif on lceph.local
last_update: 2024-08-04 11:23:49
peer_sites:
name: RCEPH
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"bytes_per_snapshot":370176.0,"last_snapshot_bytes":370176,"last_snap
shot_sync_seconds":0,"local_snapshot_timestamp":1722762520,"remote_snapshot_timestamp":172276
2520,"replay_state":"idle"}
last_update: 2024-08-04 11:23:48
[root@lceph ~]#

Before we fail back, we need to double-check that the image on RCEPH is not still set to primary (in the case of an unplanned outage it might still be flagged as primary). Make sure "mirroring primary" is set to false.


[root@rceph ~]# rbd info rbd/linuxlun


rbd image 'linuxlun':
size 3 GiB in 768 objects
order 22 (4 MiB objects)
snapshot_count: 3
id: 20c07c7bb1a76
block_name_prefix: rbd_data.20c07c7bb1a76
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, non-primary
op_features:
flags:
create_timestamp: Sun Aug 4 10:33:30 2024
access_timestamp: Sun Aug 4 11:08:19 2024
modify_timestamp: Sun Aug 4 10:33:30 2024
mirroring state: enabled
mirroring mode: snapshot
mirroring global id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
mirroring primary: false
[root@rceph ~]#

For our example (planned failover/failback) we do not have to manually resynchronize the image on
the old primary, because changes were continuously replicated from LCEPH back to RCEPH while we
were failed over. However, if this had been an unplanned outage, before failing back to the old primary
(RCEPH) we would need to manually resynchronize the image from LCEPH (after making sure RCEPH is
demoted as per above) to ensure we have the latest updates. The syntax of the resync command at the
old primary site is as follows:

rbd mirror image resync POOL_NAME/IMAGE_NAME

For the sake of completeness, we will run this even though in our example it is not needed.

[root@rceph ~]# rbd mirror image resync rbd/linuxlun


Flagged image for resync from primary
[root@rceph ~]#

We need to wait for the mirroring status at RCEPH (old primary) to transition from down+unknown to
up+replaying.
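Rather than re-running the status command by hand, you can poll it until the state changes. A simple sketch (any equivalent loop works):

watch -n 10 'rbd mirror pool status rbd --verbose'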

[root@rceph ~]# rbd mirror pool status rbd --verbose


health: OK
daemon health: OK
image health: OK
images: 1 total
1 stopped

DAEMONS
service 134099:
instance_id: 134159
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK

IMAGES
linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: down+unknown
description: status not found
last_update:
peer_sites:
name: LCEPH
state: up+stopped
description: local image is primary
last_update: 2024-08-04 11:27:21
[root@rceph ~]# rbd mirror pool status rbd --verbose
health: OK
daemon health: OK
image health: OK


images: 1 total
1 stopped

DAEMONS
service 134099:
instance_id: 134159
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK

IMAGES
linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: down+unknown
description: status not found
last_update:
peer_sites:
name: LCEPH
state: up+stopped
description: local image is primary
last_update: 2024-08-04 11:27:21
[root@rceph ~]#

[root@rceph ~]# rbd mirror pool status rbd --verbose


health: OK
daemon health: OK
image health: OK
images: 1 total
1 replaying

DAEMONS
service 134099:
instance_id: 134159
client_id: rceph.qxvdwv
hostname: rceph.local
version: 18.2.1-194.el9cp
leader: true
health: OK

IMAGES
linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+replaying
description: replaying,
{"bytes_per_second":36165939.2,"bytes_per_snapshot":723318784.0,"last_snapshot_bytes":7233187
84,"last_snapshot_sync_seconds":7,"local_snapshot_timestamp":1722762520,"remote_snapshot_time
stamp":1722762520,"replay_state":"idle"}
service: rceph.qxvdwv on rceph.local
last_update: 2024-08-04 11:27:48
peer_sites:
name: LCEPH
state: up+stopped
description: local image is primary
last_update: 2024-08-04 11:27:21
[root@rceph ~]#

Now that we are back in sync (remember, this manual resync was only required if we had an unplanned
outage), we can demote the RBD image on LCEPH and promote the RBD image at RCEPH back to
primary to complete the failback.

[root@lceph ~]# rbd mirror image demote rbd/linuxlun


Image demoted to non-primary
[root@lceph ~]#

[root@rceph ~]# rbd mirror image promote rbd/linuxlun


Image promoted to primary
[root@rceph ~]#


We can check the mirroring status on RCEPH to confirm the failback is successful. The status should
be up+stopped on RCEPH and up+replaying on LCEPH.

[root@rceph ~]# rbd mirror image status rbd/linuxlun


linuxlun:
global_id: 0ee454a8-cf71-48e2-95ea-96f6a737f5ac
state: up+stopped
description: local image is primary
service: rceph.qxvdwv on rceph.local
last_update: 2024-08-04 11:30:48
peer_sites:
name: LCEPH
state: up+replaying
description: replaying,
{"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"last_snapshot_bytes":0,"last_snapshot_sync_
seconds":0,"remote_snapshot_timestamp":1722763840,"replay_state":"idle"}
last_update: 2024-08-04 11:30:49
snapshots:
37 .mirror.primary.0ee454a8-cf71-48e2-95ea-96f6a737f5ac.513be678-62d9-495c-8687-
8903a7bdc6f2 (peer_uuids:[49c540be-90cc-4587-9b72-33c943902fb5])
[root@rceph ~]#

Or verify from the Ceph Dashboard on RCEPH.

Figure 187: Ceph Dashboard – Block – Images on RCEPH

As a final step, we can map our RBD image on our RBD client, mount the filesystem and check to make
sure we can see the file we created on the secondary after the initial failover.

root@labserver:~# rbd map --id drlinux rbd/linuxlun --conf /etc/ceph/dr/ceph.conf --keyring


/etc/ceph/dr/ceph.client.drlinux.keyring
/dev/rbd0
root@labserver:~# rbd showmapped
id pool namespace image snap device
0 rbd linuxlun - /dev/rbd0
root@labserver:~# mount /dev/rbd0 /mnt/rbd/
root@labserver:~# ls -alt /mnt/rbd
total 12
drwxr-xr-x 2 root root 64 Aug 4 11:14 .
-rw-r--r-- 1 root root 15 Aug 4 11:14 after.failover
-rw-r--r-- 1 root root 16 Aug 4 11:06 before.failover
-rw-r--r-- 1 root root 0 Aug 4 10:56 600mb
drwxr-xr-x 6 root root 4096 Jul 27 13:06 ..
root@labserver:~#


That concludes the test case for demonstrating RBD mirroring with failover/failback that you can use
for your POC.
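To recap, the planned failover/failback sequence we just walked through condenses to the following sketch (pool, image, client ID and keyring names are from our lab; run each step on the cluster or host indicated in the comments):

# On the RBD client: stop I/O, unmount and unmap the image
umount /mnt/rbd
rbd unmap --id drlinux rbd/linuxlun --conf /etc/ceph/dr/ceph.conf --keyring /etc/ceph/dr/ceph.client.drlinux.keyring

# On RCEPH (old primary): demote the image
rbd mirror image demote rbd/linuxlun

# On LCEPH (new primary): promote the image (add --force for an unplanned failover)
rbd mirror image promote rbd/linuxlun

# To fail back: demote on LCEPH, resync on RCEPH only if the outage was unplanned,
# then promote on RCEPH and remap the image on the client
rbd mirror image demote rbd/linuxlun      # on LCEPH
rbd mirror image resync rbd/linuxlun      # on RCEPH (unplanned outage only)
rbd mirror image promote rbd/linuxlun     # on RCEPH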

IBM Storage Ceph Replication – CephFS Snapshot Mirroring


One of Ceph's less user-friendly features to configure is CephFS snapshot mirroring. Unlike RBD two-
way replication, there is no built-in failover/failback functionality similar to IBM Storage Scale AFM-DR. Ceph
supports the asynchronous replication of snapshots to a remote Ceph cluster. Snapshot
synchronization copies snapshot data to a remote CephFS filesystem and creates a remote snapshot
with the same name. As mentioned when we discussed CephFS snapshots, you are also able to
replicate specific directories within a CephFS filesystem if desired. CephFS snapshot mirroring is
enabled by the cephfs-mirror daemon which is responsible for copying data to the remote Ceph
cluster. You can configure one or more cephfs-mirror daemons for high availability if required.

Both source and target Ceph clusters must be running IBM Storage Ceph 7.0 or later

A more detailed explanation of CephFS snapshot mirroring can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=systems-ceph-file-system-snapshot-mirroring

For our lab environment, we will replicate our source filesystem called rcephfs on Ceph cluster RCEPH
to a target filesystem on Ceph cluster LCEPH called lcephfs as depicted below. We have a NFS client
called labserver that we will use to test access to the primary and replicated CephFS filesystems
accessed via NFS as opposed to using the native CephFS kernel driver.

Figure 188: CephFS Snapshot Mirroring Setup

Let us start by creating a Ceph filesystem called rcephfs on our source Ceph cluster called RCEPH and
deploying at least one cephfs-mirror daemon.


Figure 189: Ceph Dashboard – File Systems – Create on RCEPH

Remember, for high availability you need to deploy more than one cephfs-mirror daemon. We just
need a single daemon for our test as we are not concerned about replication performance or high
availability.

Figure 190: Ceph Dashboard – Administration – Services – Create Service on RCEPH
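If you prefer the command line over the Dashboard, the same service can be deployed with the orchestrator. A sketch, assuming a single daemon placed on our lab node rceph.local:

ceph orch apply cephfs-mirror --placement="rceph.local"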

We will create the target Ceph filesystem on LCEPH called lcephfs.


Figure 191: Ceph Dashboard – File Systems – Create on LCEPH

We need to create a Ceph user for each CephFS peer on our target cluster LCEPH. We just have a single
peer so just one user is required. You can create it from the command line and verify it from the Ceph
Dashboard.

[root@lceph ceph]# ceph fs authorize lcephfs client.mirror_remote / rwps


[client.mirror_remote]
key = AQAJrK9mbFBuCRAAmJV/niXwpa+yJ8b/iI1zYA==
[root@lceph ceph]#

Figure 192: Ceph Dashboard – Administration – Ceph Users on LCEPH

On both source and target clusters, we need to make sure the CephFS mirroring Manager module is
enabled (it is disabled by default).


Figure 193: Ceph Dashboard – Administration – Manager Modules – Mirroring on RCEPH

Figure 194: Ceph Dashboard – Administration – Manager Modules – Mirroring on LCEPH

On the source cluster RCEPH, we can enable CephFS mirroring on rcephfs.

[root@rceph ~]# ceph fs snapshot mirror enable rcephfs


{}
[root@rceph ~]#

On the target cluster LCEPH we need to create a cluster bootstrap token. The format of the command
is as follows:

ceph fs snapshot mirror peer_bootstrap create FILE_SYSTEM_NAME CLIENT_NAME SITE_NAME


We need to specify the target Ceph filesystem, the Ceph client we created earlier, and a site name,
which will be LCEPH. We then need to copy the bootstrap token, which is output inside the double
quotes.

[root@lceph ceph]# ceph fs snapshot mirror peer_bootstrap create lcephfs client.mirror_remote


LCEPH
{"token":
"eyJmc2lkIjogImE1ZDYzYmEwLTUxNjAtMTFlZi04YTk5LTUyNTQwMDBhYmQ0NiIsICJmaWxlc3lzdGVtIjogImxjZXBo
ZnMiLCAidXNlciI6ICJjbGllbnQubWlycm9yX3JlbW90ZSIsICJzaXRlX25hbWUiOiAiTENFUEgiLCAia2V5IjogIkFRQ
UpySzltYkZCdUNSQUFtSlYvbmlYd3BhK3lKOGIvaUkxellBPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjAuMC4yNDk6Mz
MwMC8wLHYxOjEwLjAuMC4yNDk6Njc4OS8wXSJ9"}
[root@lceph ceph]#

On our source cluster RCEPH, we need to import the target bootstrap token we just generated. The
format of the command is as follows:

ceph fs snapshot mirror peer_bootstrap import FILE_SYSTEM_NAME TOKEN

FILE_SYSTEM_NAME refers to our source filesystem which is rcephfs.

[root@rceph ~]# ceph fs snapshot mirror peer_bootstrap import rcephfs


eyJmc2lkIjogImE1ZDYzYmEwLTUxNjAtMTFlZi04YTk5LTUyNTQwMDBhYmQ0NiIsICJmaWxlc3lzdGVtIjogImxjZXBoZ
nMiLCAidXNlciI6ICJjbGllbnQubWlycm9yX3JlbW90ZSIsICJzaXRlX25hbWUiOiAiTENFUEgiLCAia2V5IjogIkFRQU
pySzltYkZCdUNSQUFtSlYvbmlYd3BhK3lKOGIvaUkxellBPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjAuMC4yNDk6MzM
wMC8wLHYxOjEwLjAuMC4yNDk6Njc4OS8wXSJ9
{}
[root@rceph ~]#

On our source cluster RCEPH, we can now list the mirror peers for our filesystem rcephfs.

[root@rceph ~]# ceph fs snapshot mirror peer_list rcephfs


{"c1014a51-4f65-4e64-a2fd-5d980648d963": {"client_name": "client.mirror_remote", "site_name":
"LCEPH", "fs_name": "lcephfs"}}
[root@rceph ~]#

[root@rceph ~]# ceph fs snapshot mirror daemon status


[{"daemon_id": 134264, "filesystems": [{"filesystem_id": 1, "name": "rcephfs",
"directory_count": 0, "peers": [{"uuid": "c1014a51-4f65-4e64-a2fd-5d980648d963", "remote":
{"client_name": "client.mirror_remote", "cluster_name": "LCEPH", "fs_name": "lcephfs"},
"stats": {"failure_count": 0, "recovery_count": 0}}]}]}]
[root@rceph ~]#

On the source cluster RCEPH, we need to configure the directory path for CephFS mirroring. We want
to mirror the entire rcephfs filesystem, so we will specify / as the directory path.

[root@rceph ~]# ceph fs snapshot mirror add rcephfs /


{}
[root@rceph ~]#

If we query our cephfs-mirror daemon we should have 1 directory configured. Also note the peer
cluster UUID.

[ceph: root@rceph ceph]# ceph fs snapshot mirror daemon status


[{"daemon_id": 134264, "filesystems": [{"filesystem_id": 1, "name": "rcephfs",
"directory_count": 1, "peers": [{"uuid": "c1014a51-4f65-4e64-a2fd-5d980648d963", "remote":
{"client_name": "client.mirror_remote", "cluster_name": "LCEPH", "fs_name": "lcephfs"},
"stats": {"failure_count": 0, "recovery_count": 0}}]}]}]
[ceph: root@rceph ceph]#

To get the actual mirror status of the snapshot mirroring, we have to run a few obscure commands.
First, we need to get the FSID of rcephfs on the source cluster RCEPH. ASOK refers to the Ceph
administration socket (refer to https://www.ibm.com/docs/en/storage-ceph/7?topic=monitoring-using-ceph-administration-socket
for more information on the use of ASOKs).


The ASOK files are located in /var/run/ceph. On the node where the cephfs-mirror daemon is
running, navigate to that directory and list the files. You have to be within the cephadm shell.

[root@rceph ~]# cephadm shell


Inferring fsid d3a18482-5038-11ef-8136-525400be8183
Inferring config /var/lib/ceph/d3a18482-5038-11ef-8136-525400be8183/mon.rceph/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@rceph /]# cd /var/run/ceph
[ceph: root@rceph ceph]# ls
ceph-client.ceph-exporter.rceph.asok ceph-mgr.rceph.zigotn.asok
ceph-client.cephfs-mirror.rceph.asxyxf.2.94142486026488.asok ceph-mon.rceph.asok
ceph-client.cephfs-mirror.rceph.asxyxf.2.94142501419256.asok ceph-osd.0.asok
ceph-client.mirror_remote.2.94142506072312.asok ceph-osd.1.asok
ceph-client.rbd-mirror.rceph.qxvdwv.2.94180778195192.asok ceph-osd.2.asok
ceph-client.rgw.rgw.rceph.morhpb.2.94086409779448.asok client.rbd-mirror-
peer.2.LCEPH.94180808284408.asok
ceph-mds.rcephfs.rceph.nnjtxu.asok client.rbd-
mirror.rceph.qxvdwv.2.ceph.94180794046712.asok
ceph-mgr.rceph.kumbrt.asok
[ceph: root@rceph ceph]#

We have two ASOK files for our mirroring daemon. To get the FSID, we have to run the following
command:

ceph --admin-daemon PATH_TO_THE_ASOK_FILE help

From within the cephadm shell, issue the above command against the two files we found. We are
looking for fs mirror peer status; the first file does not have this information.

[ceph: root@rceph ceph]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142501419256.asok help
{
"config diff": "dump diff of current config and default config",
"config diff get": "dump diff get <field>: dump diff of current and default config
setting <field>",
"config get": "config get <field>: get the config value",
"config help": "get config setting schema and descriptions",
"config set": "config set <field> <val> [<val> ...]: set a config variable",
"config show": "dump current config settings",
"config unset": "config unset <field>: unset a config variable",
"counter dump": "dump all labeled and non-labeled counters and their values",
"counter schema": "dump all labeled and non-labeled counters schemas",
"dump_cache": "show in-memory metadata cache contents",
"dump_mempools": "get mempool stats",
"get_command_descriptions": "list available commands",
"git_version": "get git sha1",
"help": "list available commands",
"injectargs": "inject configuration arguments into running daemon",
"kick_stale_sessions": "kick sessions that were remote reset",
"log dump": "dump recent log entries to log file",
"log flush": "flush log entries to log file",
"log reopen": "reopen log file",
"mds_requests": "show in-progress mds requests",
"mds_sessions": "show mds session state",
"objecter_requests": "show in-progress osd requests",
"perf dump": "dump non-labeled counters and their values",
"perf histogram dump": "dump perf histogram values",
"perf histogram schema": "dump perf histogram schema",
"perf reset": "perf reset <name>: perf reset all or one perfcounter name",
"perf schema": "dump non-labeled counters schemas",
"rotate-key": "rotate live authentication key",
"status": "show overall client status",
"version": "get ceph version"
}
[ceph: root@rceph ceph]#


The second file does have it. Our FSID is therefore rcephfs@1. Also, the PEER UUID is c1014a51-4f65-4e64-a2fd-5d980648d963.

[ceph: root@rceph ceph]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142486026488.asok help
{
"config diff": "dump diff of current config and default config",
"config diff get": "dump diff get <field>: dump diff of current and default config
setting <field>",
"config get": "config get <field>: get the config value",
"config help": "get config setting schema and descriptions",
"config set": "config set <field> <val> [<val> ...]: set a config variable",
"config show": "dump current config settings",
"config unset": "config unset <field>: unset a config variable",
"counter dump": "dump all labeled and non-labeled counters and their values",
"counter schema": "dump all labeled and non-labeled counters schemas",
"dump_mempools": "get mempool stats",
"fs mirror peer status rcephfs@1 c1014a51-4f65-4e64-a2fd-5d980648d963": "get peer mirror
status",
"fs mirror status rcephfs@1": "get filesystem mirror status",
"get_command_descriptions": "list available commands",
"git_version": "get git sha1",
"help": "list available commands",
"injectargs": "inject configuration arguments into running daemon",
"log dump": "dump recent log entries to log file",
"log flush": "flush log entries to log file",
"log reopen": "reopen log file",
"objecter_requests": "show in-progress osd requests",
"perf dump": "dump non-labeled counters and their values",
"perf histogram dump": "dump perf histogram values",
"perf histogram schema": "dump perf histogram schema",
"perf reset": "perf reset <name>: perf reset all or one perfcounter name",
"perf schema": "dump non-labeled counters schemas",
"rotate-key": "rotate live authentication key",
"version": "get ceph version"
}
[ceph: root@rceph ceph]#
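Rather than inspecting each ASOK file by hand, you can grep the help output of every cephfs-mirror socket for the peer status command. A small sketch, run from within the cephadm shell (the socket file names will differ on your cluster):

for sock in /var/run/ceph/ceph-client.cephfs-mirror.*.asok; do
    echo "== $sock"
    ceph --admin-daemon "$sock" help | grep "fs mirror peer status"
done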

Once we have the FSID, we can run the following command from within the cephadm shell to get the
mirroring status.

ceph --admin-daemon PATH_TO_THE_ASOK_FILE fs mirror status FILE_SYSTEM_NAME@FILE_SYSTEM_ID

If we run the command against the second file we identified with the correct filesystem name and
FSID we get:

[ceph: root@rceph ceph]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142486026488.asok fs mirror status rcephfs@1
{
"rados_inst": "10.0.0.239:0/2903478823",
"peers": {
"c1014a51-4f65-4e64-a2fd-5d980648d963": {
"remote": {
"client_name": "client.mirror_remote",
"cluster_name": "LCEPH",
"fs_name": "lcephfs"
}
}
},
"snap_dirs": {
"dir_count": 1
}
}
[ceph: root@rceph ceph]#

To view the detailed peer status, we need to run the following command:

ceph --admin-daemon PATH_TO_ADMIN_SOCKET fs mirror peer status FILE_SYSTEM_NAME@FILE_SYSTEM_ID PEER_UUID


Issuing the above command with our FSID and PEER UUID we get:

[ceph: root@rceph ceph]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142486026488.asok fs mirror peer status rcephfs@1 c1014a51-4f65-4e64-
a2fd-5d980648d963
{
"/": {
"state": "idle",
"snaps_synced": 0,
"snaps_deleted": 0,
"snaps_renamed": 0
}
}
[ceph: root@rceph ceph]#

The state can be one of these three values:

• idle means that the directory is currently not being synchronized.


• syncing means that the directory is currently being synchronized.
• failed means that the directory has reached the upper limit of consecutive failures.

Our state is idle as we haven't created any snapshots yet. To query detailed metrics for snapshot
mirroring, issue the following command from within the cephadm shell:

[ceph: root@rceph ceph]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142486026488.asok counter dump
{
"AsyncMessenger::Worker": [
{
"labels": {
"id": "0"
},
.
.
.
"throttle-objecter_ops": [
{
"labels": {},
"counters": {
"val": 0,
"max": 1024,
"get_started": 0,
"get": 0,
"get_sum": 0,
"get_or_fail_fail": 0,
"get_or_fail_success": 0,
"take": 0,
"take_sum": 0,
"put": 0,
"put_sum": 0,
"wait": {
"avgcount": 0,
"sum": 0.000000000,
"avgtime": 0.000000000
}
}
}
]
}

If we create a snapshot on RCEPH, after some time (depending on the size of the snapshot and
bandwidth of the network between clusters), we should see the snapshot propagated to the target
cluster LCEPH.


Figure 195: Ceph Dashboard – File Systems – Directories – Create Snapshot on RCEPH

On LCEPH we can check to make sure we have the same snapshot.

Figure 196: Ceph Dashboard – File Systems – Directories on LCEPH

If we create another snapshot the same mirroring process should occur.


Figure 197: Ceph Dashboard – File Systems – Directories – Create Snapshot on RCEPH

If we query the peer mirroring status again, we should see 2 snapshots being transferred.

[ceph: root@rceph /]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142486026488.asok fs mirror peer status rcephfs@1 c1014a51-4f65-4e64-
a2fd-5d980648d963
{
"/": {
"state": "idle",
"last_synced_snap": {
"id": 3,
"name": "THIS_IS_A_PRIMARY_SNAP",
"sync_duration": 16.448360960999999,
"sync_time_stamp": "34431.938954s"
},
"snaps_synced": 2,
"snaps_deleted": 0,
"snaps_renamed": 0
}
}
[ceph: root@rceph /]#

And on LCEPH we can check to see if we can see the snapshot we created on RCEPH.


Figure 198: Ceph Dashboard – File Systems – Directories – Snapshots on LCEPH

Next, let us mount both rcephfs and lcephfs on our NFS client labserver. First, we need to
create an NFS export on both clusters.

Figure 199: Ceph Dashboard – NFS - Create on RCEPH


Figure 200: Ceph Dashboard – NFS - Create on LCEPH

We can mount both filesystems on our NFS client.

root@labserver:~# mount -t nfs -o vers=4.2,proto=tcp,port=2049,rw lceph.local:/lceph


/mnt/lcephfs -vv
mount.nfs: timeout set for Sun Aug 4 19:51:32 2024
mount.nfs: trying text-based options
'vers=4.2,proto=tcp,port=2049,addr=10.0.0.249,clientaddr=10.0.0.246'
root@labserver:~# mount -t nfs -o vers=4.2,proto=tcp,port=2049,rw rceph.local:/rceph
/mnt/rcephfs -vv
mount.nfs: timeout set for Sun Aug 4 19:52:22 2024
mount.nfs: trying text-based options
'vers=4.2,proto=tcp,port=2049,addr=10.0.0.239,clientaddr=10.0.0.246'
root@labserver:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 401004 1032 399972 1% /run
/dev/sda2 20463184 16832204 2566176 87% /
tmpfs 2005008 84 2004924 1% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 401000 16 400984 1% /run/user/0
/dev/rbd0 3080192 92380 2987812 3% /mnt/rbd
rceph.local:/rceph 41373696 0 41373696 0% /mnt/rcephfs
lceph.local:/lceph 42696704 0 42696704 0% /mnt/lcephfs
root@labserver:~#

Now we will create a file in rcephfs and then take a snapshot and see if it propagates to lcephfs while
the target filesystem is mounted.

root@labserver:~# ls -l /mnt/rceph
total 0
root@labserver:~# ls -l /mnt/lceph
total 0
root@labserver:~#
root@labserver:~# touch /mnt/rceph/NEW_FILE_TO_REPLICATE
root@labserver:~# ls /mnt/rceph
NEW_FILE_TO_REPLICATE
root@labserver:~# ls /mnt/lceph
root@labserver:~#

Let us create a new snapshot for rcephfs on Ceph cluster RCEPH.


Figure 201: Ceph Dashboard – File Systems – Directories – Snapshots – Create on RCEPH

Our snapshot is tiny, so it takes barely a minute to propagate to LCEPH.

[ceph: root@rceph /]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142486026488.asok fs mirror peer status rcephfs@1 c1014a51-4f65-4e64-
a2fd-5d980648d963
{
"/": {
"state": "idle",
"last_synced_snap": {
"id": 4,
"name": "AFTER_RCEPH_FILE_CREATION",
"sync_duration": 7.9741755809999999,
"sync_time_stamp": "35657.087865s"
},
"snaps_synced": 3,
"snaps_deleted": 0,
"snaps_renamed": 0
}
}
[ceph: root@rceph /]#

We can check on LCEPH Ceph Dashboard to verify.

Figure 202: Ceph Dashboard – File Systems – Directories – Snapshots on LCEPH


And if we do a directory listing on our NFS client we can see that the file is replicated to LCEPH and
accessible on the client with the filesystem mounted.

root@labserver:/mnt/rcephfs# ls -al /mnt/lcephfs/


total 1228805
drwxr-xr-x 2 root root 1258291200 Aug 4 19:52 .
drwxr-xr-x 8 root root 4096 Aug 4 19:47 ..
-rw-r--r-- 1 root root 0 Aug 4 19:50 NEW_FILE_TO_REPLICATE
root@labserver:/mnt/rcephfs#

As a final test, let’s see what happens if we create a file on the target filesystem and then a new
snapshot on the source to check whether the target file gets overwritten after the latest source
snapshot is replicated.

root@labserver:/mnt/rcephfs# touch TEST_TO_SEE_TARGET_OVERWRITE_RCEPH


root@labserver:/mnt/rcephfs# ls -alt
total 1228805
-rw-r--r-- 1 root root 0 Aug 4 19:55 TEST_TO_SEE_TARGET_OVERWRITE_RCEPH
drwxr-xr-x 2 root root 1258291200 Aug 4 19:55 .
-rw-r--r-- 1 root root 0 Aug 4 19:50 NEW_FILE_TO_REPLICATE
drwxr-xr-x 8 root root 4096 Aug 4 19:47 ..
root@labserver:/mnt/rcephfs#

root@labserver:/mnt/lcephfs# touch LCEPH_AFTER_LATEST_SNAP


root@labserver:/mnt/lcephfs# ls -alt
total 1228805
-rw-r--r-- 1 root root 0 Aug 4 19:56 LCEPH_AFTER_LATEST_SNAP
drwxr-xr-x 2 root root 1258291200 Aug 4 19:56 .
-rw-r--r-- 1 root root 0 Aug 4 19:50 NEW_FILE_TO_REPLICATE
drwxr-xr-x 8 root root 4096 Aug 4 19:47 ..
root@labserver:/mnt/lcephfs#

So, we have a new file to replicate to the target and the target has a new file created (that of course is
not replicated to the source). We now create a new rcephfs snapshot on the source cluster RCEPH.

Figure 203: Ceph Dashboard – File Systems – Directories – Snapshots – Create on RCEPH

Make sure it is successfully transferred.


[ceph: root@rceph /]# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-


mirror.rceph.asxyxf.2.94142486026488.asok fs mirror peer status rcephfs@1 c1014a51-4f65-4e64-
a2fd-5d980648d963
{
"/": {
"state": "idle",
"last_synced_snap": {
"id": 5,
"name": "TESTING_OVERWRITE_OR_NOT",
"sync_duration": 0.031000682000000002,
"sync_time_stamp": "35960.508546s"
},
"snaps_synced": 4,
"snaps_deleted": 0,
"snaps_renamed": 0
}
}
[ceph: root@rceph /]#

Check on LCEPH if we can see the latest snapshot.

Figure 204: Ceph Dashboard – File Systems – Directories – Snapshots on LCEPH

And now we can check whether we picked up the latest file created on our source rcephfs and whether
the file we created on the target prior to the latest snapshot has been overwritten or not.

root@labserver:/mnt/lcephfs# ls -alt
total 1228805
drwxr-xr-x 2 root root 1258291200 Aug 4 19:57 .
-rw-r--r-- 1 root root 0 Aug 4 19:56 LCEPH_AFTER_LATEST_SNAP
-rw-r--r-- 1 root root 0 Aug 4 19:55 TEST_TO_SEE_TARGET_OVERWRITE_RCEPH
-rw-r--r-- 1 root root 0 Aug 4 19:50 NEW_FILE_TO_REPLICATE
drwxr-xr-x 8 root root 4096 Aug 4 19:47 ..
root@labserver:/mnt/lcephfs#

So, we didn’t lose the file we created prior to the snapshot and can see the last file we generated on
the source. This functionality is similar to IBM Storage Scale AFM-DR.

root@labserver:/mnt/lcephfs# ls -alt /mnt/rcephfs/


total 1228805
-rw-r--r-- 1 root root 0 Aug 4 19:55 TEST_TO_SEE_TARGET_OVERWRITE_RCEPH
drwxr-xr-x 2 root root 1258291200 Aug 4 19:55 .
-rw-r--r-- 1 root root 0 Aug 4 19:50 NEW_FILE_TO_REPLICATE


drwxr-xr-x 8 root root 4096 Aug 4 19:47 ..


root@labserver:/mnt/lcephfs#

Since we can't use the CephFS snapshot scheduler (we are not making use of subvolumes, as explained
earlier), you can schedule client snapshots via cron, for example, to meet a specific RPO (e.g. every
15 minutes or every hour), as shown in the sketch below.
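On a native CephFS mount, a snapshot is created simply by making a directory under the special .snap directory, so a client-side cron entry is enough to drive mirroring on a schedule. A sketch assuming the source filesystem is mounted with the CephFS kernel client at /mnt/rcephfs (our labserver uses NFS, so this would need to run on a host with a native mount, or you can keep creating snapshots from the Dashboard):

# crontab entry: snapshot the root of rcephfs every 15 minutes
*/15 * * * * /usr/bin/mkdir /mnt/rcephfs/.snap/sched-$(date +\%Y\%m\%d\%H\%M)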

If the source site experiences a failure, you can point clients to the target site and continue processing.
When the source site is recovered, you would need to disable snapshot mirroring and then set up
replication in the opposite direction; once it is in sync, you can cut over clients back to the original
primary site (see the sketch after this paragraph).
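A rough sketch of that reverse direction, reusing the commands shown earlier (cluster and filesystem names are from our lab, and the exact steps depend on the failure scenario):

# On RCEPH (old source): stop mirroring in the original direction
ceph fs snapshot mirror disable rcephfs

# On LCEPH (now the source): enable mirroring and add the path
# (LCEPH also needs at least one cephfs-mirror daemon deployed)
ceph fs snapshot mirror enable lcephfs
ceph fs snapshot mirror add lcephfs /

# On RCEPH (now the target): authorize a mirror user and create a bootstrap token
ceph fs authorize rcephfs client.mirror_remote / rwps
ceph fs snapshot mirror peer_bootstrap create rcephfs client.mirror_remote RCEPH

# On LCEPH: import the token generated on RCEPH
ceph fs snapshot mirror peer_bootstrap import lcephfs <token>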

IBM Storage Ceph Replication – RGW Multi-site


Perhaps Ceph’s strongest feature is the RGW Object Store. As with RBD Mirroring and CephFS
Snapshot mirroring, replication for the RGW is referred to as a multi-site configuration. In the RGW
Deployment section we briefly introduced the concepts of a Realm, Zone Group and Zone.

• Realms - A realm represents a globally unique namespace consisting of one or more


zonegroups containing one or more zones, and zones containing buckets, which in turn
contain objects. A realm enables the Ceph Object Gateway to support multiple namespaces
and their configuration on the same hardware.
• Zone groups - Zone groups define the geographic location of one or more Ceph Object
Gateway instances within one or more zones.
• Zones - Ceph Object Gateway supports the notion of zones. A zone defines a logical group
consisting of one or more Ceph Object Gateway instances.

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=gateway-multi-site-configuration-administration

Ceph supports several multi-site configuration options for the Ceph Object Gateway:

• Multi-zone: A more advanced configuration consists of one zone group and multiple zones,
each zone with one or more ceph-radosgw instances. Each zone is backed by its own Ceph
Storage Cluster. Multiple zones in a zone group provides disaster recovery for the zone group
should one of the zones experience a significant failure. Each zone is active and may receive
write operations. In addition to disaster recovery, multiple active zones may also serve as a
foundation for content delivery networks.

• Multi-zone-group: Formerly called 'regions', the Ceph Object Gateway can also support
multiple zone groups, each zone group with one or more zones. Objects stored to zone groups
within the same realm share a global namespace, ensuring unique object IDs across zone
groups and zones.

• Multiple Realms: The Ceph Object Gateway supports the notion of realms, which can be a
single zone group or multiple zone groups and a globally unique namespace for the realm.
Multiple realms provide the ability to support numerous configurations and namespaces.


Figure 205: Ceph Object Gateway Realm (https://www.ibm.com/docs/en/storage-ceph/7.1?topic=gateway-multi-site-configuration-administration)

A multi-site configuration requires a master zone group and a master zone. Each zone group requires
a master zone. Zone groups may have one or more secondary or non-master zones. A single site
deployment would typically consist of a single zone group with a single zone and one or more RGW
instances (like we covered for the section earlier on RGW Deployment). For a multi-site configuration,
we require at least two Ceph storage clusters and at least two RGW instances, one per cluster. To
demonstrate RGW replication, we will create a multi-site configuration based on the following:

Figure 206: Our Ceph Lab RGW Multi-Site Configuration

Please refer to the following URL for a more detailed explanation of how to setup a Ceph Object
Gateway Multi-Site configuration:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=gateway-configuring-multi-site


You can also migrate a single site deployment with a default zone configuration to a multi-site
deployment. This process is documented here: https://www.ibm.com/docs/en/storage-ceph/7.1?topic=administration-migrating-single-site-system-multi-site.
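For reference, the realm, zone group and zone that we configure through the Dashboard below can also be created with radosgw-admin on the primary cluster. A condensed sketch using our lab names:

radosgw-admin realm create --rgw-realm=MYLAB --default
radosgw-admin zonegroup create --rgw-zonegroup=REPLICATED_ZONE --endpoints=http://rceph.local:8080 --master --default
radosgw-admin zone create --rgw-zonegroup=REPLICATED_ZONE --rgw-zone=PRIMARY --endpoints=http://rceph.local:8080 --master --default
radosgw-admin period update --commit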

Let us start on RCEPH and deploy a RGW service with the following configuration to match our multi-
site deployment:

Figure 207: Ceph Dashboard – Administration – Create Service on RCEPH

Next, we need to create a replication user on the primary cluster RCEPH as follows. Note the S3 access
and secret keys for this new replication user.

[root@rceph ~]# radosgw-admin user create --uid="rgw-sync-user" --display-


name="Syncronization User" --system
{
"user_id": "rgw-sync-user",
"display_name": "Syncronization User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"subusers": [],
"keys": [
{
"user": "rgw-sync-user",
"access_key": "SB4FZDPO22LFEATFKO32",
"secret_key": "dp4dtV0g6RD1aNJtsR1klwK5srn4aDHUiW2QQmJB"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"system": true,
"default_placement": "",
"default_storage_class": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,


"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw",
"mfa_ids": []
}

[root@rceph ~]#

On the primary cluster RCEPH, navigate to Object -> Multi-site and edit the Zone called PRIMARY.
Enter the replication user access and secret keys. Ensure the endpoint is set to the RGW we just
deployed (where we chose to place the RGW daemon which in our case is on cluster node rceph.local
or 10.0.0.239).

Figure 208: Ceph Dashboard – Object – Multi-site - Edit Zone on RCEPH

Now edit the Zone Group called REPLICATED_ZONE on RCEPH and ensure our endpoint is correctly
set to rceph.local or 10.0.0.239.


Figure 209: Ceph Dashboard – Object – Multi-site - Edit Zone Group on RCEPH

On our primary cluster RCEPH, edit the Realm called MYLAB and copy the multi-site token.

Figure 210: Ceph Dashboard – Object – Multi-site - Edit Realm on RCEPH

Make sure the new Realm, Zone Group and Zone are set to be the Default.

When you next go to Object -> Gateways, you get an error that "The Object Gateway Service is not
configured". This bug is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=2231072. As
a workaround, set the Ceph Object Gateway credentials on the command-line interface with the
ceph dashboard set-rgw-credentials command, as shown below.
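The workaround is a single command run against the cluster:

ceph dashboard set-rgw-credentials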


We can query the RGW sync status now to confirm our configuration.

[root@rceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
current time 2024-08-03T07:01:14Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync no sync (zone is master)
[root@rceph ~]#

On the secondary cluster LCEPH, deploy the RGW service as follows. Note that we will import the Realm
Token from our primary cluster RCEPH later, so we will accept the default Zone Group and Zone Name.
This step is required for us to get access to the Object section of the Dashboard in order to configure
Multi-site (otherwise, navigating to Object will give us a message that the Object Gateway service is
not configured).

Figure 211: Ceph Dashboard – Administration – Create Service on LCEPH

We now need to configure the connection between our primary cluster RCEPH and our secondary cluster
LCEPH for object replication. On the secondary cluster LCEPH, you should now be able to navigate to
Object -> Multi-site -> Import. Enter the information as follows. Note that we have to specify the
secondary Zone Name to match the configuration we want, which is SECONDARY. We can also
choose how many RGW daemons to deploy (one will do for our test).


Figure 212: Ceph Dashboard – Object – Multi-site – Import on LCEPH

Once you click on Import to complete this step, your configuration should look similar to this. Note
that there is a warning for us to restart all RGW instances on our secondary cluster to ensure a consistent
multi-site configuration.

Figure 213: Ceph Dashboard – Object – Multi-site on LCEPH

On our secondary cluster, navigate to Administration -> Services and restart the newly created RGW
service (which was deployed when we did the Import). Note that we also still have the initial RGW service
we deployed, which is no longer needed.


Figure 214: Ceph Dashboard – Administration – Services

If you get the following error when you navigate to Object on the Ceph Dashboard, then you need to
issue the ceph dashboard set-rgw-credentials command as detailed earlier.

Figure 215: Ceph Dashboard – Object – ERROR on LCEPH

We can now query the RGW replication status on our secondary cluster LCEPH. We can check that the
data sync source is set to our primary cluster zone called PRIMARY and if the data is all caught up or
not (in our case both bucket metadata and data are already caught up).

[root@lceph cephadm-ansible]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)
current time 2024-08-03T07:14:21Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master


data sync source: 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)


syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@lceph cephadm-ansible]#

In Ceph RGW, a shard is a part of a bucket index that is split into multiple rados objects. Sharding is
a process that breaks down a dataset into multiple parts to increase parallelism and distribute the
load.

During sync, the shards are of two types:


• Behind shards are shards that require a data sync (either a full data sync or an incremental
data sync) in order to be brought up to date.
• Recovery shards are shards that encountered an error during sync and have been marked for
retry. The error occurs mostly on minor issues, such as acquiring a lock on a bucket. Errors of
this kind typically resolve on their own.
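To drill down from these per-zone shard counts to an individual bucket, you can query that bucket's sync status directly. A sketch using a bucket name from our lab (mynewbucket is created later in this section):

radosgw-admin bucket sync status --bucket=mynewbucket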

Secondary zones accept bucket operations; however, secondary zones redirect bucket operations to
the master zone and then synchronize with the master zone to receive the result of the bucket
operations. If the master zone is down, bucket operations executed on the secondary zone will fail,
but object operations should succeed. The master zone is only important when talking about accounts
and new buckets. Writing object data will always use the latest write, regardless of where it is
ingested.

On our secondary cluster, if you navigate to Object -> Overview you can see the status of the
replication as well.

Figure 216: Ceph Dashboard – Object – Overview on LCEPH

On the secondary cluster LCEPH, in the Ceph Dashboard under Object -> Multi-site, you might see a
red exclamation mark next to the secondary zone. This is due to the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2242994.


Figure 217: Ceph Dashboard – Object – Multi-site on RCEPH

To fix this, edit the secondary zone called SECONDARY on LCEPH and change the endpoint to the IP
address of LCEPH which is 10.0.0.249.

Figure 218: Ceph Dashboard – Object – Multi-site – Edit SECONDARY Zone on LCEPH

On our primary cluster, if we navigate to Object -> Multi-site, we should see the following which clearly
shows that the PRIMARY ZONE on RCEPH is the master zone for the Zone Group REPLICATED_ZONE.
Depending on how many buckets and objects need to be replicated, it might take the Ceph Dashboard
a while to report the correct status.


Figure 219: Ceph Dashboard – Object – Multi-site on RCEPH

On our primary cluster RCEPH, the sync status should also show that we are the master zone.

[root@rceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
current time 2024-08-03T07:31:37Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync no sync (zone is master)
data sync source: 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@rceph ~]#

You can also check the sync status on our primary cluster RCEPH from the Ceph Dashboard. If you
hover over the status you will see how many shards need to still be replicated.


Figure 220: Ceph Dashboard – Object – Overview on RCEPH

Any existing Object Users on RCEPH will be propagated to LCEPH. You can verify this by navigating to
Object -> Users on each cluster.

Figure 221: Ceph Dashboard – Object – Users on RCEPH


Figure 222: Ceph Dashboard – Object – Users on LCEPH

Let us now do a simple test by creating a bucket on RCEPH, uploading a file to it, and then querying
the same bucket on LCEPH to see if the data is replicated. First, we create the bucket on RCEPH and
upload a file to it using the AWS CLI or any S3 client of your choice.

[root@rceph ~]# aws s3api --endpoint-url http://rceph.local:8080 create-bucket --bucket


mynewbucket
[root@rceph ~]# aws s3api --endpoint-url http://rceph.local:8080 put-object --bucket
mynewbucket --key new --body ./awscliv2.zip
{
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\""
}
[root@rceph ~]# aws s3api --endpoint-url http://rceph.local:8080 list-objects --bucket
mynewbucket
{
"Contents": [
{
"Key": "new",
"LastModified": "2024-08-03T07:39:25.488000+00:00",
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\"",
"Size": 60864458,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
],
"RequestCharged": null
}
[root@rceph ~]#

Then we can query the bucket on LCEPH to see if it got replicated.

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-buckets


{
"Buckets": [
{
"Name": "mynewbucket",
"CreationDate": "2024-08-03T07:38:26.208000+00:00"
}
],
"Owner": {
"DisplayName": "S3 Test User",


"ID": "s3test"
}
}
[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-objects --bucket
mynewbucket
{
"Contents": [
{
"Key": "new",
"LastModified": "2024-08-03T07:39:25.488000+00:00",
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\"",
"Size": 60864458,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
],
"RequestCharged": null
}
[root@rceph ~]#

Even though LCEPH does not contain the master zone, it will still accept bucket operations by
redirecting the request to the master and synchronizing afterwards. Let us test this by creating a
bucket on LCEPH, listing all buckets on RCEPH, copying a file to our bucket on LCEPH, and then
querying the contents of the bucket on RCEPH.

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 create-bucket --bucket


secbucket
[root@rceph ~]# aws s3api --endpoint-url http://rceph.local:8080 list-buckets
{
"Buckets": [
{
"Name": "mynewbucket",
"CreationDate": "2024-08-03T07:38:26.208000+00:00"
},
{
"Name": "secbucket",
"CreationDate": "2024-08-03T07:41:30.284000+00:00"
}
],
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
[root@rceph ~]#

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 put-object --bucket


secbucket --key sec --body ./awscliv2.zip
{
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\""
}
[root@rceph ~]# aws s3api --endpoint-url http://rceph.local:8080 list-objects --bucket
secbucket
{
"Contents": [
{
"Key": "sec",
"LastModified": "2024-08-03T07:42:28.876000+00:00",
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\"",
"Size": 60864458,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
],
"RequestCharged": null
}
[root@rceph ~]#


We should have the same Object statistics on both primary RCEPH and secondary LCEPH. We can
check this via the Ceph Dashboard.

Figure 223: Ceph Dashboard – Object – Overview on RCEPH

Figure 224: Ceph Dashboard – Object – Overview on LCEPH

You can also view the replication performance from the Ceph Dashboard. Navigate to Object ->
Gateways -> Sync Performance.


Figure 225: Ceph Dashboard – Object – Gateways – Sync Performance on RCEPH

Let us simulate a primary gateway failure by stopping the RGW service on our primary cluster RCEPH.

Figure 226: Ceph Dashboard – Administration – Services – RGW Service on RCEPH

As explained earlier, the loss of the master zone should only affect new bucket operations; it should not
affect access to existing buckets and objects, or our ability to write new objects to existing buckets. This is
because the default behavior of the Ceph RGW is to run in an active-active configuration. Now that
we have simulated a primary zone failure by stopping the RGW service on RCEPH, let us test the creation of
a new bucket on LCEPH and access to an existing bucket on LCEPH. You could also set up the cluster
to run as active/passive, in which case you would need to remove the read-only status on the
secondary zone before running this test.


[root@rceph ceph]# aws s3api --endpoint-url http://rceph.local:8080 list-buckets

Could not connect to the endpoint URL: "http://rceph.local:8080/"

[root@rceph ceph]# aws s3api --endpoint-url http://lceph.local:8080 list-buckets
{
"Buckets": [
{
"Name": "mynewbucket",
"CreationDate": "2024-08-03T07:38:26.208000+00:00"
},
{
"Name": "secbucket",
"CreationDate": "2024-08-03T07:41:30.284000+00:00"
}
],
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
[root@rceph ceph]# aws s3api --endpoint-url http://lceph.local:8080 create-bucket --bucket
afterstop

argument of type 'NoneType' is not iterable

[root@rceph ceph]#

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 put-object --bucket
mynewbucket --key afterstop --body ./awscliv2.zip
{
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\""
}
[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-objects --bucket
mynewbucket
{
"Contents": [
{
"Key": "afterstop",
"LastModified": "2024-08-03T13:27:41.326000+00:00",
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\"",
"Size": 60864458,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
},
{
"Key": "new",
"LastModified": "2024-08-03T07:39:25.488000+00:00",
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\"",
"Size": 60864458,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
],
"RequestCharged": null
}
[root@rceph ~]#

As expected, we can’t create new buckets (or Object Users for that matter) but we can access existing
buckets and write new objects to them even with the master zone being unavailable. This proves that
there are no failover commands to run in the event of a site failure due to the default active/active
behavior of the RGW. If you implemented an external load balancer (e.g. DNS round-robin) that
included RGWs from both zones, then Object Clients would not be interrupted as the load-balancer
would just direct new requests to RGWs on the site that is available.
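As an illustration only (we did not deploy this in our lab), a minimal HAProxy front end that spreads S3 requests across the RGW endpoints in both zones could look something like the following; host names and ports match our lab setup:

frontend s3_frontend
    bind *:80
    mode http
    default_backend rgw_all_zones

backend rgw_all_zones
    mode http
    balance roundrobin
    option httpchk GET /
    server rgw-primary rceph.local:8080 check
    server rgw-secondary lceph.local:8080 check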


Let us now assume that we have an extended outage and cannot get our primary site back up. We
want to be able to create new buckets and new Object users, so we cannot afford to wait for the master
zone to be recovered. We will now convert the secondary zone to the master so that we can create new
buckets and new Object users even with the original master zone down.

On the secondary cluster LCEPH, we need to make the secondary zone called SECONDARY the master
and default zone for our zone group. We can do this as follows:

[root@lceph ~]# radosgw-admin zone modify --rgw-zone=SECONDARY --master --default


2024-08-03T15:32:02.228+0200 7f7168126800 0 NOTICE: overriding master zone: 44d5f41c-a04f-
4d1b-8e14-4622bcc7b2af
{
"id": "30de453d-b997-4435-9a48-feebe5245952",
"name": "SECONDARY",
"domain_root": "SECONDARY.rgw.meta:root",
"control_pool": "SECONDARY.rgw.control",
"gc_pool": "SECONDARY.rgw.log:gc",
"lc_pool": "SECONDARY.rgw.log:lc",
"log_pool": "SECONDARY.rgw.log",
"intent_log_pool": "SECONDARY.rgw.log:intent",
"usage_log_pool": "SECONDARY.rgw.log:usage",
"roles_pool": "SECONDARY.rgw.meta:roles",
"reshard_pool": "SECONDARY.rgw.log:reshard",
"user_keys_pool": "SECONDARY.rgw.meta:users.keys",
"user_email_pool": "SECONDARY.rgw.meta:users.email",
"user_swift_pool": "SECONDARY.rgw.meta:users.swift",
"user_uid_pool": "SECONDARY.rgw.meta:users.uid",
"otp_pool": "SECONDARY.rgw.otp",
"system_key": {
"access_key": "SB4FZDPO22LFEATFKO32",
"secret_key": "dp4dtV0g6RD1aNJtsR1klwK5srn4aDHUiW2QQmJB"
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "SECONDARY.rgw.buckets.index",
"storage_classes": {
"STANDARD": {
"data_pool": "SECONDARY.rgw.buckets.data",
"compression_type": "lz4"
}
},
"data_extra_pool": "SECONDARY.rgw.buckets.non-ec",
"index_type": 0,
"inline_data": true
}
}
],
"realm_id": "bee2e14c-b965-4f6d-a010-3a0625dee923",
"notif_pool": "SECONDARY.rgw.log:notif"
}
[root@lceph ~]#

Next, on the secondary cluster LCEPH, we need to update the period.

A Ceph RGW period is a time period with multiple epochs that tracks changes to a Ceph Object
Gateway (RGW) configuration. The RGW period is stored in a realm, which also contains zones and
zone groups. Updating the period changes the epoch and ensures that other zones receive the
updated configuration.

[root@lceph ~]# radosgw-admin period update --commit


{
"id": "a43d12da-2ba9-4458-a6d9-ed04a3635cb3",
"epoch": 1,
"predecessor_uuid": "af5c89c3-efa7-45b9-98ff-b7886d99ba7a",
"sync_status": [


.
.
.
"period_map": {
"id": "a43d12da-2ba9-4458-a6d9-ed04a3635cb3",
"zonegroups": [
{
"id": "b0269fde-61cc-4b7e-9e34-389cf0bc0eed",
"name": "REPLICATED_ZONE",
"api_name": "REPLICATED_ZONE",
"is_master": true,
"endpoints": [
"http://10.0.0.239:8080"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "30de453d-b997-4435-9a48-feebe5245952",
"zones": [
{
"id": "30de453d-b997-4435-9a48-feebe5245952",
"name": "SECONDARY",
"endpoints": [
"http://10.0.0.249:8080"
],
"log_meta": false,
"log_data": true,
"bucket_index_max_shards": 11,
"read_only": false,
"tier_type": "",
"sync_from_all": true,
"sync_from": [],
"redirect_zone": "",
"supported_features": [
"compress-encrypted",
"resharding"
]
},
{
"id": "44d5f41c-a04f-4d1b-8e14-4622bcc7b2af",
"name": "PRIMARY",
"endpoints": [
"http://10.0.0.239:8080"
],
"log_meta": false,
"log_data": true,
"bucket_index_max_shards": 11,
"read_only": false,
"tier_type": "",
"sync_from_all": true,
"sync_from": [],
"redirect_zone": "",
"supported_features": [
"compress-encrypted",
"resharding"
]
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "bee2e14c-b965-4f6d-a010-3a0625dee923",
"sync_policy": {
"groups": []
},
"enabled_features": [
"resharding"
]
}
],
"short_zone_ids": [


{
"key": "30de453d-b997-4435-9a48-feebe5245952",
"val": 4270858657
},
{
"key": "44d5f41c-a04f-4d1b-8e14-4622bcc7b2af",
"val": 583748692
}
]
},
"master_zonegroup": "b0269fde-61cc-4b7e-9e34-389cf0bc0eed",
"master_zone": "30de453d-b997-4435-9a48-feebe5245952",
"period_config": {
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_ratelimit": {
"max_read_ops": 0,
"max_write_ops": 0,
"max_read_bytes": 0,
"max_write_bytes": 0,
"enabled": false
},
"bucket_ratelimit": {
"max_read_ops": 0,
"max_write_ops": 0,
"max_read_bytes": 0,
"max_write_bytes": 0,
"enabled": false
},
"anonymous_ratelimit": {
"max_read_ops": 0,
"max_write_ops": 0,
"max_read_bytes": 0,
"max_write_bytes": 0,
"enabled": false
}
},
"realm_id": "bee2e14c-b965-4f6d-a010-3a0625dee923",
"realm_epoch": 3
}
[root@lceph ~]#

Finally, we have to restart the RGWs on our secondary cluster to pick up the changes.

[root@lceph ~]# ceph orch ls --service-type rgw


NAME PORTS RUNNING REFRESHED AGE PLACEMENT
rgw.rgw ?:8080 1/1 3m ago 4h lceph.local;count:1
[root@lceph ~]# ceph orch restart rgw.rgw
Scheduled to restart rgw.rgw.lceph.tbhdia on host 'lceph.local'
[root@lceph ~]# ceph orch ls --service-type rgw
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
rgw.rgw ?:8080 1/1 0s ago 4h lceph.local;count:1
[root@lceph ~]#

If we check the Multi-site configuration on the secondary cluster LCEPH Ceph Dashboard, we can see
that our zone SECONDARY is now the master zone for our zone group REPLICATED_ZONE.


Figure 227: Ceph Dashboard – Object – Multi-site on LCEPH

We can also verify this from the command line on LCEPH. SECONDARY zone on LCEPH is now the
master (no sync).

[root@lceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)
current time 2024-08-03T13:38:13Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync no sync (zone is master)
2024-08-03T15:38:13.302+0200 7f5ccff24800 0 ERROR: failed to fetch datalog info
data sync source: 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
failed to retrieve sync info: (22) Invalid argument
[root@lceph ~]#

We can now test bucket creation (which failed earlier as we had lost the master zone). This now works
as expected.

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 create-bucket --bucket


afterfailover
[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-buckets
{
"Buckets": [
{
"Name": "afterfailover",
"CreationDate": "2024-08-03T13:40:26.533000+00:00"
},
{
"Name": "mynewbucket",
"CreationDate": "2024-08-03T07:38:26.208000+00:00"
},
{
"Name": "secbucket",
"CreationDate": "2024-08-03T07:41:30.284000+00:00"
}
],
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
[root@rceph ~]#

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-objects --bucket


afterfailover
{
"Contents": [
{
"Key": "afterfail",
"LastModified": "2024-08-03T13:41:21.371000+00:00",
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\"",
"Size": 60864458,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
],
"RequestCharged": null
}
[root@rceph ~]#

When the former master zone recovers you can revert the failover configuration back to its original state. To simulate this, let us first restart the RGW on RCEPH (which contained the former master zone).

Figure 228: Ceph Dashboard – Administration – Services – RGW Service on RCEPH

If we query the replication sync status on RCEPH, we can see that it still thinks that it is the master
zone (it doesn’t have the latest RGW period).

[root@rceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
current time 2024-08-03T13:46:35Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@rceph ~]#

We can confirm this as well from the Ceph Dashboard. Our PRIMARY zone on RCEPH still thinks it’s
the master zone for our zone group REPLICATED_ZONE.

Figure 229: Ceph Dashboard – Object – Multi-site on RCEPH

To correct this issue, we need to pull the latest realm configuration from the current master zone
(which is on LCEPH).

[root@rceph ~]# radosgw-admin realm pull --url=http://lceph.local:8080 --access-


key=SB4FZDPO22LFEATFKO32 --secret=dp4dtV0g6RD1aNJtsR1klwK5srn4aDHUiW2QQmJB
{
"id": "bee2e14c-b965-4f6d-a010-3a0625dee923",
"name": "MYLAB",
"current_period": "a43d12da-2ba9-4458-a6d9-ed04a3635cb3",
"epoch": 3
}
[root@rceph ~]#

Next, we need to make the recovered zone (former master zone) the master and default for our
REPLICATED_ZONE zone group. On RCEPH, issue the following command:

[root@rceph ~]# radosgw-admin zone modify --rgw-zone=PRIMARY --master --default


2024-08-03T15:50:07.465+0200 7faee253f800 0 NOTICE: overriding master zone: 30de453d-b997-
4435-9a48-feebe5245952
{
"id": "44d5f41c-a04f-4d1b-8e14-4622bcc7b2af",
"name": "PRIMARY",
"domain_root": "PRIMARY.rgw.meta:root",
"control_pool": "PRIMARY.rgw.control",
"gc_pool": "PRIMARY.rgw.log:gc",
"lc_pool": "PRIMARY.rgw.log:lc",
"log_pool": "PRIMARY.rgw.log",
"intent_log_pool": "PRIMARY.rgw.log:intent",
"usage_log_pool": "PRIMARY.rgw.log:usage",
"roles_pool": "PRIMARY.rgw.meta:roles",
"reshard_pool": "PRIMARY.rgw.log:reshard",
"user_keys_pool": "PRIMARY.rgw.meta:users.keys",
"user_email_pool": "PRIMARY.rgw.meta:users.email",
"user_swift_pool": "PRIMARY.rgw.meta:users.swift",
"user_uid_pool": "PRIMARY.rgw.meta:users.uid",
"otp_pool": "PRIMARY.rgw.otp",
"system_key": {
"access_key": "SB4FZDPO22LFEATFKO32",
"secret_key": "dp4dtV0g6RD1aNJtsR1klwK5srn4aDHUiW2QQmJB"
},
"placement_pools": [

{
"key": "default-placement",
"val": {
"index_pool": "PRIMARY.rgw.buckets.index",
"storage_classes": {
"STANDARD": {
"data_pool": "PRIMARY.rgw.buckets.data",
"compression_type": "lz4"
}
},
"data_extra_pool": "PRIMARY.rgw.buckets.non-ec",
"index_type": 0,
"inline_data": true
}
}
],
"realm_id": "bee2e14c-b965-4f6d-a010-3a0625dee923",
"notif_pool": "PRIMARY.rgw.log:notif"
}
[root@rceph ~]#

We now have the latest realm configuration on RCEPH and have issued the command to convert our
zone called PRIMARY to make it the master and default zone for our zone group REPLICATED_ZONE.
We need to update the RGW period to reflect the changes. On RCEPH, issue:

[root@rceph ~]# radosgw-admin period update --commit


{
"id": "86b6792d-b46f-4c63-98cc-771203ec9233",
"epoch": 1,
"predecessor_uuid": "a43d12da-2ba9-4458-a6d9-ed04a3635cb3",
"sync_status": [
.
.
.
},
"realm_id": "bee2e14c-b965-4f6d-a010-3a0625dee923",
"realm_epoch": 4
}

[root@rceph ~]#

As we did for the failover, after we update the period we need to restart the RGW in the recovered
zone. On RCEPH, issue:

[root@rceph ~]# ceph orch ls --service-type rgw


NAME PORTS RUNNING REFRESHED AGE PLACEMENT
rgw.rgw ?:8080 1/1 8m ago 7h rceph.local;count:1
[root@rceph ~]# ceph orch restart rgw.rgw
Scheduled to restart rgw.rgw.rceph.morhpb on host 'rceph.local'
[root@rceph ~]# ceph orch ls --service-type rgw
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
rgw.rgw ?:8080 1/1 28s ago 7h rceph.local;count:1
[root@rceph ~]#

If you had previously configured the secondary zone to be read-only, then you would set that now.
Since we chose the active/active default configuration we don’t have to. We do however need to
update the period in the secondary zone and restart the RGWs for it to pick up the latest changes to
the realm. On LCEPH, do the following:

[root@lceph ~]# radosgw-admin period update --commit


Sending period to new master zone 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af
{
"id": "86b6792d-b46f-4c63-98cc-771203ec9233",
"epoch": 2,
"predecessor_uuid": "a43d12da-2ba9-4458-a6d9-ed04a3635cb3",
"sync_status": [
.
.
.

},
"realm_id": "bee2e14c-b965-4f6d-a010-3a0625dee923",
"realm_epoch": 4
}
[root@lceph ~]#

[root@lceph ~]# ceph orch ls --service-type rgw


NAME PORTS RUNNING REFRESHED AGE PLACEMENT
rgw.rgw ?:8080 1/1 9m ago 4h lceph.local;count:1
[root@lceph ~]# ceph orch restart rgw.rgw
Scheduled to restart rgw.rgw.lceph.tbhdia on host 'lceph.local'
[root@lceph ~]# ceph orch ls --service-type rgw
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
rgw.rgw ?:8080 1/1 3s ago 4h lceph.local;count:1
[root@lceph ~]#
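
As mentioned above, if you wanted the secondary zone to be read-only rather than active/active, this is the point at which you would flag it. A minimal sketch of the commands involved (not applied in our lab; the zone name SECONDARY is ours), run on the secondary cluster:

radosgw-admin zone modify --rgw-zone=SECONDARY --read-only
radosgw-admin period update --commit
ceph orch restart rgw.rgw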

We can query the current sync status from either the command line or via the Ceph Dashboard. As
expected, our PRIMARY zone on RCEPH is now the master for our zone group REPLICATED_ZONE and
LCEPH reflects this as well.

[root@lceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)
current time 2024-08-03T13:57:22Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@lceph ~]#

[root@rceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
current time 2024-08-03T13:57:43Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync no sync (zone is master)
data sync source: 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@rceph ~]#

From the Ceph Dashboard on RCEPH:

Figure 230: Ceph Dashboard – Object – Multi-site on RCEPH

Figure 231: Ceph Dashboard – Object – Overview on RCEPH

From the Ceph Dashboard on LCEPH:

Figure 232: Ceph Dashboard – Object – Multi-site on LCEPH

Figure 233: Ceph Dashboard – Object – Overview on LCEPH

To complete our testing of the failback, let us create a new bucket on RCEPH.

[root@rceph ~]# aws s3api --endpoint-url http://rceph.local:8080 create-bucket --bucket


afterfailback

And check that it is available on LCEPH. Depending on the number of changes you made whilst your
SECONDARY zone was the master after failover, it might take some time for the changes to propagate.
In our lab environment, the sync appeared to be stuck for a few minutes:

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-buckets


{
"Buckets": [
{
"Name": "afterfailover",
"CreationDate": "2024-08-03T13:40:26.533000+00:00"
},
{
"Name": "mynewbucket",
"CreationDate": "2024-08-03T07:38:26.208000+00:00"
},
{
"Name": "secbucket",

"CreationDate": "2024-08-03T07:41:30.284000+00:00"
}
],
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
[root@rceph ~]#

Querying the sync status showed some metadata shards behind.

[root@lceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)
current time 2024-08-03T14:02:34Z
zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is behind on 1 shards
behind shards: [25]
oldest incremental change not applied: 2024-08-03T16:00:38.499961+0200 [25]
data sync source: 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@lceph ~]#

You can check for any sync errors as follows:

[root@rceph ~]# radosgw-admin sync error list


[
{
"shard_id": 0,
"entries": []
},
.
.
.
"shard_id": 31,
"entries": []
}
]
[root@rceph ~]#

Since there are no errors, but we also don't see any progress when issuing the sync status (the out-of-sync shard count is not reducing), replication appears to have stalled. It is not uncommon for failover/failback to cause replication to stop. See here for an explanation:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=tmscog-synchronizing-data-in-multi-site-ceph-object-gateway-configuration

We can manually issue a resync to resolve this and restart the RGW gateway on LCEPH.

[root@lceph ~]# radosgw-admin data sync init --source-zone PRIMARY

[root@lceph ~]# ceph orch ls --service-type rgw


NAME PORTS RUNNING REFRESHED AGE PLACEMENT
rgw.rgw ?:8080 1/1 3m ago 4h lceph.local;count:1
[root@lceph ~]# ceph orch restart rgw.rgw
Scheduled to restart rgw.rgw.lceph.tbhdia on host 'lceph.local'
[root@lceph ~]#

We can check the sync status again to see if it has restarted and is progressing.

[root@lceph ~]# radosgw-admin sync status


realm bee2e14c-b965-4f6d-a010-3a0625dee923 (MYLAB)
zonegroup b0269fde-61cc-4b7e-9e34-389cf0bc0eed (REPLICATED_ZONE)
zone 30de453d-b997-4435-9a48-feebe5245952 (SECONDARY)

current time 2024-08-03T14:28:05Z


zonegroup features enabled: resharding
disabled: compress-encrypted
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 44d5f41c-a04f-4d1b-8e14-4622bcc7b2af (PRIMARY)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@lceph ~]#

And list the buckets on LCEPH to see if we can see the bucket we created on RCEPH after we
performed the failback.

[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-buckets


{
"Buckets": [
{
"Name": "afterfailback",
"CreationDate": "2024-08-03T14:00:38.493000+00:00"
},
{
"Name": "afterfailover",
"CreationDate": "2024-08-03T13:40:26.533000+00:00"
},
{
"Name": "mynewbucket",
"CreationDate": "2024-08-03T07:38:26.208000+00:00"
},
{
"Name": "secbucket",
"CreationDate": "2024-08-03T07:41:30.284000+00:00"
}
],
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
[root@rceph ~]#

Lastly, we can check if new data is syncing by writing an object on RCEPH and checking to see if it is
replicated to LCEPH.

[root@rceph ~]# aws s3api --endpoint-url http://rceph.local:8080 put-object --bucket


afterfailback --key afterfailback --body ./awscliv2.zip {
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\""
}
[root@rceph ~]#
[root@rceph ~]# aws s3api --endpoint-url http://lceph.local:8080 list-objects --bucket
afterfailback
{
"Contents": [
{
"Key": "afterfailback",
"LastModified": "2024-08-03T14:13:54.126000+00:00",
"ETag": "\"7bb7cfda37a4ed26666902219c73cce2\"",
"Size": 60864458,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "S3 Test User",
"ID": "s3test"
}
}
],
"RequestCharged": null
}
[root@rceph ~]#

IBM Storage Ceph Encryption


With cyber-resiliency one of the top priorities for most organizations, you will most likely need to
demonstrate IBM Storage Ceph’s data in-flight and at-rest encryption. Storage encryption provides
many benefits including but not limited to:

• Protecting data - Encryption protects data from unauthorized access, theft, and tampering
• Meeting compliance requirements - Encryption may be required or encouraged by laws and
regulations. For example, the Payment Card Industry Data Security Standard (PCI DSS)
requires merchants to encrypt customer payment card data.
• Reducing the risk of costly penalties - Encryption can help organizations avoid costly
penalties, lengthy lawsuits, reduced revenue, and tarnished reputations.
• Reducing the attack surface - Encryption can reduce the surface of attack by cutting out the
lower layers of the hardware and software stack.

More information on IBM Storage Ceph encryption can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=hardening-encryption-key-management

IBM Storage Ceph Data-in-flight Encryption

Encryption for all Ceph traffic over the network is enabled by default, with the introduction of the
messenger version 2 protocol. The secure mode setting for messenger v2 encrypts communication
between Ceph daemons and Ceph clients, providing end-to-end encryption.

https://docs.ceph.com/en/reef/rados/configuration/msgr2/

The messenger v2 protocol, or msgr2, is the second major revision on Ceph’s on-wire protocol. It
brings with it several key features:

• A secure mode that encrypts all data passing over the network
• Improved encapsulation of authentication payloads, enabling future integration of new
authentication modes like Kerberos
• Improved earlier feature advertisement and negotiation, enabling future protocol revisions

Ceph daemons can now bind to multiple ports, allowing both legacy Ceph clients and new v2-capable
clients to connect to the same cluster. By default, monitors now bind to the new IANA-assigned port
3300 (ce4h or 0xce4) for the new v2 protocol, while also binding to the old default port 6789 for the
legacy v1 protocol.

We can verify that our MONs are using the v2 protocol as follows:

[root@cephnode1 ~]# ceph mon dump


epoch 7
fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
last_changed 2024-07-22T08:25:33.891253+0000
created 2024-07-15T21:01:38.387060+0000
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:10.0.0.240:3300/0,v1:10.0.0.240:6789/0] mon.cephnode1
1: [v2:10.0.0.242:3300/0,v1:10.0.0.242:6789/0] mon.cephnode3
2: [v2:10.0.0.241:3300/0,v1:10.0.0.241:6789/0] mon.cephnode2
dumped monmap epoch 7
[root@cephnode1 ~]#

And a netstat should also confirm that we are talking over the v2 protocol (port 3300).

[root@cephnode1 ~]# netstat -a | grep 3300


tcp 0 0 cephnode1.local:3300 cephnode3.local:37036 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:3300 cephnode1.local:53126 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:3300 cephnode1.local:52256 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:3300 cephnode1.local:49302 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:37778 cephnode3.local:3300 ESTABLISHED 2208/ceph-mgr
tcp 0 0 cephnode1.local:39494 cephnode2.local:3300 ESTABLISHED 2273/ceph-mds
tcp 0 0 cephnode1.local:3300 cephnode1.local:53174 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:52256 cephnode1.local:3300 ESTABLISHED 2972/ceph-osd
tcp 0 0 cephnode1.local:41792 cephnode2.local:3300 ESTABLISHED
867158/ganesha.nfsd
tcp 0 0 cephnode1.local:60738 cephnode2.local:3300 ESTABLISHED 2217/ceph-mds
tcp 0 0 cephnode1.local:33002 cephnode3.local:6816 ESTABLISHED 2208/ceph-mgr
tcp 0 0 cephnode1.local:mysqlx cephnode2.local:3300 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:3300 cephnode3.local:37104 ESTABLISHED 2199/ceph-mon
.
.
.
tcp 0 0 cephnode1.local:3300 cephnode1.local:58938 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:60498 cephnode3.local:3300 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:3300 cephs3.local:51352 ESTABLISHED 2199/ceph-mon
tcp 0 0 cephnode1.local:3300 cephnode3.local:37014 ESTABLISHED 2199/ceph-mon
[root@cephnode1 ~]#

We can also verify that the v1 protocol is not in use (i.e. we do not have any legacy clients).

[root@cephnode1 ~]# netstat -a | grep 6789


[root@cephnode1 ~]#

Ceph data-in-flight encryption does have a slight performance overhead. An excellent article that tests the impact of using over-the-wire encryption (on Ceph Reef, on which IBM Storage Ceph V7 is based) can be found here:

https://ceph.io/en/news/blog/2023/ceph-encryption-performance/

The conclusion is that the performance impact is negligible when using the messenger v2 protocol to
perform end-to-end encryption.

Figure 234: Ceph Data-in-flight Performance Overhead Summary


(https://ceph.io/en/news/blog/2023/ceph-encryption-performance/)
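
Note that msgr2 supports both a crc (integrity only) mode and the secure (encrypted) mode. If you want to ensure that only the secure mode is negotiated between daemons and clients, the messenger modes can be pinned with the ceph config command. The following is an illustrative sketch only (check your release's defaults and test the impact before applying it cluster-wide):

ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
ceph config set global ms_client_mode secure
# Verify the active setting
ceph config get mon ms_cluster_mode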

IBM Storage Ceph also supports key rotation. See here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=management-enabling-key-rotation

IBM Storage Ceph Data-at-Rest Encryption

Ceph Block Storage Encryption is a feature in Ceph that enables users to encrypt data at the block
level. It encrypts data before writing it to the storage cluster and decrypts it when retrieving it. Block

storage encryption adds an extra degree of protection to sensitive data stored on Ceph. The encryption
is done per-volume, so the user may select which volumes to encrypt and which to leave unencrypted.
Block Storage Encryption does incur a performance overhead though, especially on workloads with
large writes. More information on IBM Storage Ceph’s Data-at-rest encryption (DAE) is documented
here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=management-encryption-rest

Figure 235: Ceph Data-at-rest Performance Overhead Summary


(https://ceph.io/en/news/blog/2023/ceph-encryption-performance/)

Ceph OSD encryption-at-rest relies on the Linux kernel's dm-crypt subsystem and the Linux Unified
Key Setup ("LUKS"). See here for more information:

https://docs.ceph.com/en/latest/ceph-volume/lvm/encryption/

When creating an encrypted OSD, ceph-volume creates an encrypted logical volume and saves the
corresponding dm-crypt secret key in the Ceph Monitor data store. When the OSD is to be started,
ceph-volume ensures the device is mounted, retrieves the dm-crypt secret key from the Ceph
Monitors, and decrypts the underlying device. This creates a new device, containing the unencrypted
data, and this is the device the Ceph OSD daemon is started on.

Since decrypting the data on an encrypted OSD disk requires knowledge of the corresponding dm-
crypt secret key, OSD encryption provides protection for cases when a disk drive that was used as an
OSD is decommissioned, lost, or stolen. The OSD itself does not know whether the underlying logical
volume is encrypted or not, so there is no ceph OSD command that will return this information.

To check if your cluster is using block encryption you can issue the following command on your OSD
nodes. Check for “encrypted”. A value of 0 means it is not encrypted.

[ceph: root@cephnode1 /]# ceph-volume lvm list

====== osd.0 =======

[block] /dev/ceph-2a2f9223-0d70-41cb-aa07-7754b885728a/osd-block-187560c1-07a4-444d-
b55f-48a093c324f1

block device /dev/ceph-2a2f9223-0d70-41cb-aa07-7754b885728a/osd-block-


187560c1-07a4-444d-b55f-48a093c324f1
block uuid gNwSgp-22Xx-bidh-GVf1-6XMX-Qc7U-98GokG
cephx lockbox secret
cluster fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341

cluster name ceph


crush device class
encrypted 0
osd fsid 187560c1-07a4-444d-b55f-48a093c324f1
osd id 0
osdspec affinity all-available-devices
type block
vdo 0
devices /dev/sdb

====== osd.1 =======

[block] /dev/ceph-ae4c3ffc-5d5e-473f-b324-8d5c195cc841/osd-block-34372239-a478-4bbd-
8bff-9e6704cbc35a

block device /dev/ceph-ae4c3ffc-5d5e-473f-b324-8d5c195cc841/osd-block-


34372239-a478-4bbd-8bff-9e6704cbc35a
block uuid d9m0ka-kTUc-ePwc-wPQB-imO0-3RTX-VaVmQB
cephx lockbox secret
cluster fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
cluster name ceph
crush device class
encrypted 0
osd fsid 34372239-a478-4bbd-8bff-9e6704cbc35a
osd id 1
osdspec affinity all-available-devices
type block
vdo 0
devices /dev/sdc

====== osd.10 ======

[block] /dev/ceph-fbe588c6-d736-4f75-be72-43ce94f21b7d/osd-block-5bd7dc13-5e2b-4340-
a2e9-8a940c0bfdca

block device /dev/ceph-fbe588c6-d736-4f75-be72-43ce94f21b7d/osd-block-


5bd7dc13-5e2b-4340-a2e9-8a940c0bfdca
block uuid IgmgHU-dHsp-ym4k-Wkkp-C29E-xM2q-QkROe5
cephx lockbox secret
cluster fsid e7fcc1ac-42ec-11ef-a58f-bc241172f341
cluster name ceph
crush device class
encrypted 0
osd fsid 5bd7dc13-5e2b-4340-a2e9-8a940c0bfdca
osd id 10
osdspec affinity all-available-devices
type block
vdo 0
devices /dev/sdd
[ceph: root@cephnode1 /]#

Or you can query LUKS as follows using the device name of the logical volume used for the OSD
(obtained from the output above). As an example, for osd.10:

[ceph: root@cephnode1 /]# cryptsetup luksDump /dev/ceph-fbe588c6-d736-4f75-be72-


43ce94f21b7d/osd-block-5bd7dc13-5e2b-4340-a2e9-8a940c0bfdca
Device /dev/ceph-fbe588c6-d736-4f75-be72-43ce94f21b7d/osd-block-5bd7dc13-5e2b-4340-a2e9-
8a940c0bfdca is not a valid LUKS device.
[ceph: root@cephnode1 /]#

Encryption is done at an OSD level so you can have a mix of encrypted and non-encrypted OSDs in a
cluster as it is transparent to the upper layers. This of course is not recommended.

Since OSDs can only be encrypted at creation time, it is most practical to do this when the cluster is first deployed. Because a cluster can contain a mix of encrypted and unencrypted OSDs, you could also convert an existing cluster by removing and re-adding its OSDs one at a time, encrypting each as it is recreated. This would be too tedious though, so we will just demonstrate how to do this at cluster creation. Our single node Ceph cluster has the following storage layout.

[root@ndceph ~]# ceph orch device ls


HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT
REASONS
ndceph.local /dev/sr0 hdd QEMU_DVD-ROM_QM00001 2048 No 88s ago Insufficient
space (<5GB)
ndceph.local /dev/vdb hdd 20.0G Yes 88s ago
ndceph.local /dev/vdc hdd 20.0G Yes 88s ago
ndceph.local /dev/vdd hdd 20.0G Yes 88s ago
[root@ndceph ~]#
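
Before we do this via the Dashboard, note that encrypted OSDs can also be requested with a cephadm OSD service specification. The following is a minimal sketch (the service_id, host name and device selection are illustrative for our lab):

# osd_encrypted.yaml - request dm-crypt/LUKS encrypted OSDs on all available devices
service_type: osd
service_id: encrypted_osds
placement:
  hosts:
    - ndceph.local
spec:
  data_devices:
    all: true
  encrypted: true

The specification would then be applied with ceph orch apply -i osd_encrypted.yaml. In this document we will use the Dashboard wizard instead.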

If you recall, after bootstrapping your Ceph cluster you connect to the Ceph Dashboard and go through
the Expand Cluster wizard. After adding your cluster nodes, a screen is displayed to create OSDs. Click
on Advanced and select Encryption to make use of data-at-rest or block encryption.

Figure 236: Ceph Dashboard – Expand Cluster – OSD Creation

After you complete the Expand Cluster wizard, OSDs are created on our lab cluster as follows:

[root@ndceph ~]# ceph osd tree


ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.05846 root default
-3 0.05846 host ndceph
0 hdd 0.01949 osd.0 up 1.00000 1.00000
1 hdd 0.01949 osd.1 up 1.00000 1.00000
2 hdd 0.01949 osd.2 up 1.00000 1.00000
[root@ndceph ~]#

We can check if our OSDs are encrypted from the command line. If encrypted is set to 1 then the OSD
is encrypted.

[root@ndceph ~]# cephadm shell


Inferring fsid 6f105f06-562e-11ef-8666-525400463683
Inferring config /var/lib/ceph/6f105f06-562e-11ef-8666-525400463683/mon.ndceph/config
Using ceph image with id 'a09ffce67935' and tag 'latest' created on 2024-05-31 19:48:46 +0000
UTC
cp.icr.io/cp/ibm-ceph/ceph-7-
rhel9@sha256:354f5b6f203dbd9334ac2f2bfb541c7e06498a62283b1c91cef5fa8a036aea4f
[ceph: root@ndceph /]# ceph-volume lvm list

====== osd.0 =======

[block] /dev/ceph-7513fae5-a462-4357-9bc4-7d9664815902/osd-block-10ecc366-f829-4b3f-
bc4e-f3932d7aeb83

block device /dev/ceph-7513fae5-a462-4357-9bc4-7d9664815902/osd-block-


10ecc366-f829-4b3f-bc4e-f3932d7aeb83
block uuid 8Qm4W4-w5qb-0KZY-hCd2-St0o-1T4Q-AtSpWC
cephx lockbox secret AQDAerZmbnOCOxAADQSl03/1TPW4eONmCWoxnQ==
cluster fsid 6f105f06-562e-11ef-8666-525400463683
cluster name ceph
crush device class
encrypted 1
osd fsid 10ecc366-f829-4b3f-bc4e-f3932d7aeb83
osd id 0
osdspec affinity cost_capacity
type block
vdo 0
devices /dev/vdb

====== osd.1 =======

[block] /dev/ceph-bbf9c4c4-6f4d-4c10-a34b-8704de19abd8/osd-block-a05fcdf3-3f95-485a-
b915-5b7781417fbe

block device /dev/ceph-bbf9c4c4-6f4d-4c10-a34b-8704de19abd8/osd-block-


a05fcdf3-3f95-485a-b915-5b7781417fbe
block uuid sH2lln-gufE-zoVa-tWt5-nEyO-EDz9-dFfdME
cephx lockbox secret AQDPerZmPEj0NxAAayM3bXBkdKLg19J+D0zogA==
cluster fsid 6f105f06-562e-11ef-8666-525400463683
cluster name ceph
crush device class
encrypted 1
osd fsid a05fcdf3-3f95-485a-b915-5b7781417fbe
osd id 1
osdspec affinity cost_capacity
type block
vdo 0
devices /dev/vdc

====== osd.2 =======

[block] /dev/ceph-af9e6646-715b-44de-8a82-06fa4978ff04/osd-block-e49b668e-2398-4d4e-
85e3-f050cb7989d3

block device /dev/ceph-af9e6646-715b-44de-8a82-06fa4978ff04/osd-block-


e49b668e-2398-4d4e-85e3-f050cb7989d3
block uuid QO4sBS-Mft4-3F6H-3rHD-JEEd-sVQx-I5xAR0
cephx lockbox secret AQDcerZme/mMOxAAchkaNrkJDj0Na6k+a78FvA==
cluster fsid 6f105f06-562e-11ef-8666-525400463683
cluster name ceph
crush device class
encrypted 1
osd fsid e49b668e-2398-4d4e-85e3-f050cb7989d3
osd id 2
osdspec affinity cost_capacity
type block
vdo 0
devices /dev/vdd
[ceph: root@ndceph /]#

Or we can query LUKS using the device name for the OSD obtained from the above command. For
example, for osd.2:

[ceph: root@ndceph /]# cryptsetup luksDump /dev/ceph-af9e6646-715b-44de-8a82-


06fa4978ff04/osd-block-e49b668e-2398-4d4e-85e3-f050cb7989d3
LUKS header information
Version: 2
Epoch: 3
Metadata area: 16384 [bytes]
Keyslots area: 16744448 [bytes]
UUID: 7ef61bb0-be92-4a63-87c9-41dcf9d27863
Label: (no label)

Subsystem: (no subsystem)


Flags: (no flags)

Data segments:
0: crypt
offset: 16777216 [bytes]
length: (whole device)
cipher: aes-xts-plain64
sector: 512 [bytes]

Keyslots:
0: luks2
Key: 512 bits
Priority: normal
Cipher: aes-xts-plain64
Cipher key: 512 bits
PBKDF: argon2id
Time cost: 5
Memory: 1048576
Threads: 2
Salt: e2 73 80 30 82 a5 28 f9 f5 58 d2 46 c8 d6 68 9b
34 34 3c 4b 4e 17 66 b7 e6 b3 ee 53 10 b0 0f ee
AF stripes: 4000
AF hash: sha256
Area offset:32768 [bytes]
Area length:258048 [bytes]
Digest ID: 0
Tokens:
Digests:
0: pbkdf2
Hash: sha256
Iterations: 117028
Salt: 9a 16 a0 4a b2 15 ca 10 08 94 45 70 5e 34 b7 ff
dc 7f 36 0b 2d 6f dd 98 49 bd 81 10 40 a8 d0 ed
Digest: ce a5 f2 8b 70 27 a6 6d 16 b5 2e 78 66 af 58 78
2f ca bc 31 d5 4b dc d7 24 dc f9 91 c8 de 27 b4
[ceph: root@ndceph /]#

Note, Ceph uses LUKS version 2. IBM Storage Ceph also supports key rotation. See here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=management-enabling-key-rotation

IBM Storage Ceph RGW SSL Termination

When using HAProxy and keepalived to terminate SSL connections, the HAProxy and keepalived
components use encryption keys. Please refer to the section describing Ceph RGW Deployment where
we used the ingress service that makes use of HAProxy and keepalived. We enabled SSL termination
and generated a self-signed certificate for SSL termination.

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=management-ssl-termination

IBM Storage Ceph RGW S3 Encryption

If you implemented Ceph Data-at-rest encryption, then all data stored including object data will
automatically be encrypted. However, Ceph RGW also supports both server-side and client-side
encryption for S3 object data. The Ceph Object Gateway supports server-side encryption of uploaded
objects for the S3 application programming interface (API). Server-side encryption means that the S3
client sends data over HTTP in its unencrypted form, and the Ceph Object Gateway stores that data in
the IBM Storage Ceph cluster in encrypted form. For more information see here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=security-server-side-encryption

The three server-side encryption key options supported by the Ceph RGW are described below
(source: https://www.ibm.com/docs/en/storage-ceph/7.1?topic=security-server-side-encryption)

Customer-provided keys - When using customer-provided keys, the S3 client passes an encryption
key along with each request to read or write encrypted data. It is the customer’s responsibility to
manage those keys. Customers must remember which key the Ceph Object Gateway used to encrypt
each object. Ceph Object Gateway implements the customer-provided key behavior in the S3 API
according to the Amazon SSE-C specification. Since the customer handles the key management and
the S3 client passes keys to the Ceph Object Gateway, the Ceph Object Gateway requires no special
configuration to support this encryption mode.
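
As an illustration of SSE-C (not run in our lab), an S3 client such as AWSCLI can generate its own key and pass it with every request. Note that the Ceph Object Gateway will reject SSE-C requests sent over plain HTTP unless rgw_crypt_require_ssl is set to false; the bucket and file names below are purely illustrative:

# Generate a random 256-bit key that remains under the customer's control
openssl rand -out sse-c.key 32
# Upload and download using the customer-provided key
aws s3 cp ./myfile s3://mybucket/myfile --endpoint-url https://cephs3.local --sse-c AES256 --sse-c-key fileb://sse-c.key
aws s3 cp s3://mybucket/myfile ./myfile.out --endpoint-url https://cephs3.local --sse-c AES256 --sse-c-key fileb://sse-c.key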

Key management service - When using a key management service, the secure key management
service stores the keys and the Ceph Object Gateway retrieves them on demand to serve requests to
encrypt or decrypt data. Ceph Object Gateway implements the key management service behavior in
the S3 API according to the Amazon SSE-KMS specification. Important: Currently, the only tested key
management implementations are HashiCorp Vault, and OpenStack Barbican. However, OpenStack
Barbican is a Technology Preview and is not supported for use in production systems.

SSE-S3 - When using SSE-S3, the keys are stored in a vault, but they are automatically created and
deleted by Ceph and retrieved as required to serve requests to encrypt or decrypt data. Ceph Object
Gateway implements the SSE-S3 behavior in the S3 API according to the Amazon SSE-S3
specification.
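
For SSE-KMS, the Ceph Object Gateway needs to be told where to find the key management service. The following is a hedged sketch of the relevant configuration options, assuming a HashiCorp Vault server at https://vault.example.com:8200 using token authentication and the transit secrets engine (all values are illustrative and would need to match your Vault deployment):

ceph config set client.rgw rgw_crypt_s3_kms_backend vault
ceph config set client.rgw rgw_crypt_vault_addr https://vault.example.com:8200
ceph config set client.rgw rgw_crypt_vault_auth token
ceph config set client.rgw rgw_crypt_vault_token_file /etc/ceph/vault.token
ceph config set client.rgw rgw_crypt_vault_secret_engine transit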

Figure 237: Ceph Dashboard – Object – Bucket – Create

Figure 238: Ceph Dashboard – Object – Bucket – Create (Specify server-side encryption method)

Using server-side S3 encryption is outside the scope of this document so we will concentrate on
demonstrating the use of client-side encryption. However, for the sake of completeness, you can set
the default server-side encryption for an existing bucket by using the S3 API. This is documented in
the link below:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=encryption-setting-default-existing-s3-
bucket

By setting the desired server-side encryption for an existing bucket, we can ensure that all new files
written to this bucket will be encrypted. First, we will query an existing bucket to see if it is encrypted
using AWSCLI.

[root@cephnode1 ~]# export AWS_CA_BUNDLE=/root/cert/cephs3.pem


[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local list-buckets
{
"Buckets": [
{
"Name": "awsbucket",
"CreationDate": "2024-07-20T20:52:45.833000+00:00"
},
{
"Name": "s3-tests-02dmnyfqor86m2m224ln-396",
"CreationDate": "2024-07-25T21:35:53.935000+00:00"
},
{
"Name": "testbuck2",
"CreationDate": "2024-07-25T21:39:57.960000+00:00"
},
{
"Name": "testbucket",
"CreationDate": "2024-07-20T20:21:36.722000+00:00"
},
{
"Name": "testwebsite",
"CreationDate": "2024-07-21T09:01:03.958000+00:00"
}
],
"Owner": {
"DisplayName": "Nishaan Docrat",
"ID": "ndocrat"

}
}
[root@cephnode1 ~]#

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local get-bucket-encryption --


bucket awsbucket

An error occurred (ServerSideEncryptionConfigurationNotFoundError) when calling the


GetBucketEncryption operation: The server side encryption configuration was not found
[root@cephnode1 ~]#

Since there is no encryption on the existing bucket, we will create a JSON file with the preferred encryption settings.

[root@cephnode1 ~]# cat bucket_encryption.json


{
"Rules": [
{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}
]
}
[root@cephnode1 ~]#

Next, we will apply the default encryption for our bucket.

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local put-bucket-encryption --


bucket awsbucket --server-side-encryption-configuration file://bucket_encryption.json
[root@cephnode1 ~]#

To verify that default encryption is now set on the bucket, we can run:

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local get-bucket-encryption --


bucket awsbucket {
"ServerSideEncryptionConfiguration": {
"Rules": [
{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}
]
}
}
[root@cephnode1 ~]#
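
If we now upload an object to the bucket, the gateway should encrypt it with the default AES256 setting. Although we did not capture this in the lab, a quick way to confirm it would be to check the ServerSideEncryption field returned by head-object (the object and file names below are illustrative):

aws s3api --endpoint-url https://cephs3.local put-object --bucket awsbucket --key ssetest --body ./testfile
aws s3api --endpoint-url https://cephs3.local head-object --bucket awsbucket --key ssetest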

Client-side encryption is the act of encrypting your data locally (at the client) to help ensure its security
in transit and at rest. To demonstrate the use of client-side encryption we will use s3cmd. To use
s3cmd, we need to define the encryption passphrase in the .s3cfg configuration file.

root@labserver:~# cat .s3cfg | grep gpg


gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd
%(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd
%(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase = myencryptionpassword
root@labserver:~#

Next, we will create a new bucket called encrypted and upload a text file to it with the -e flag, which encrypts the object.

root@labserver:~# s3cmd mb s3://encrypted


Bucket 's3://encrypted/' created
root@labserver:~# s3cmd info s3://encrypted
s3://encrypted/ (bucket):
Location: us-east-1
Payer: BucketOwner

Ownership: none
Versioning:none
Expiration rule: none
Block Public Access: none
Policy: none
CORS: none
ACL: Nishaan Docrat: FULL_CONTROL
root@labserver:~# s3cmd put /etc/hosts s3://encrypted -e
upload: '/tmp/tmpfile-7r0ikiQiALALIhkEQpqV' -> 's3://encrypted/hosts' [1 of 1]
357 of 357 100% in 0s 14.31 KB/s done
root@labserver:~#

s3cmd will automatically decrypt a file when it is retrieved. If we use another S3 API client like AWSCLI, we should be able to retrieve the object, but its contents should still be encrypted. We can test this as follows:

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local list-objects --bucket


encrypted
{
"Contents": [
{
"Key": "hosts",
"LastModified": "2024-08-10T05:06:11.525000+00:00",
"ETag": "\"c84f57ae238daef622b728674c472e33\"",
"Size": 357,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "Nishaan Docrat",
"ID": "ndocrat"
}
}
],
"RequestCharged": null
}
[root@cephnode1 ~]#

[root@cephnode1 ~]# aws s3api --endpoint-url https://cephs3.local get-object --bucket


encrypted --key hosts /root/hosts
{
"AcceptRanges": "bytes",
"LastModified": "2024-08-10T05:06:11+00:00",
"ContentLength": 357,
"ETag": "\"c84f57ae238daef622b728674c472e33\"",
"ContentType": "application/octet-stream",
"Metadata": {
"s3cmd-attrs":
"atime:1723258386/ctime:1723171986/gid:0/gname:root/md5:fbaa3e77a30a93cf10e4089b803fb1ae/mode
:33188/mtime:1723171986/uid:0/uname:root",
"s3tools-gpgenc": "gpg"
},
"StorageClass": "STANDARD"
}
[root@cephnode1 ~]# ls -al hosts
-rw-r--r--. 1 root root 357 Aug 10 07:11 hosts
[root@cephnode1 ~]# cat hosts

▒g▒z▒▒▒▒▒▒▒]:▒▒▒▒>▒
R▒_▒s0▒▒▒▒
▒G]▒▒e<1▒x▒▒v▒▒T▒▒?a▒▒h#
I▒▒▒eW▒▒Co▒▒q▒▒▒kƻ#TS▒<▒+KB▒b▒x▒QLm:▒▒▒▒▒▒G▒▒B▒▒K▒D&▒▒І▒N▒F▒▒#7▒7‫▒`▒▒{▒▒@▒ڿ‬
▒▒▒L▒▒zN5`gO▒~▒W▒▒s˔▒▒/▒▒▒▒v
6}▒!▒▒^,ű▒7[W▒;cB7▒L{▒8^E:▒▒▒/upzTH!g▒▒▒*▒▒&▒HN7▒.▒
"y▒▒ XA5▒▒Y-▒~u▒/]▒jC0�▒1˴▒ɦ▒▒,
k▒
▒ƚ ▒i[root@cephnode1 ~]#

Using AWSCLI we are able to retrieve the object, but its contents are encrypted. We can use s3cmd with our passphrase to download the same file and check that it is decrypted.

root@labserver:~# s3cmd get s3://encrypted/hosts hosts


download: 's3://encrypted/hosts' -> 'hosts' [1 of 1]
357 of 357 100% in 0s 121.64 KB/s done
root@labserver:~# ls -alt hosts

-rwx------ 1 root root 712 Aug 10 05:06 hosts


root@labserver:~# cat hosts
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts


::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.0.0.240 cephnode1.local cephnode1


10.0.0.241 cephnode2.local cephnode2
10.0.0.242 cephnode3.local cephnode3
10.0.0.243 cephnode4.local cephnode4
10.0.0.244 cephs3.local cephs3
10.0.0.245 cephnfs.local cephnfs
10.0.0.246 labserver.myceph.com labserver

10.0.0.249 lceph.local lceph


10.0.0.239 rceph.local rceph
10.0.0.235 ndceph.local ndceph

10.0.0.237 k8smaster.local k8smaster


10.0.0.238 k8sworker1.local k8sworker1
#10.0.0.239 k8sworker2.local k8sworker2
10.0.0.236 k8sworker3.local k8sworker3
root@labserver:~#

IBM Storage Ceph Compression


Possibly the biggest pitfall of software-defined storage is the lack of data reduction capabilities such as deduplication and compression. With the average amount of usable capacity required by customers growing at an exponential rate, the lack of data reduction capabilities drives up the overall solution cost. If you factor in the cost of SSDs, you can quickly see how the overall cost is negatively impacted. IBM Storage Ceph (like IBM Storage Scale) only supports software-based compression. IBM Storage Ceph offers two types of compression: BlueStore compression (done at the storage pool level) and Ceph RGW S3 compression. As with Ceph block encryption, software-based compression has a performance overhead.

Figure 239: Ceph BlueStore Compression (source: https://www.redhat.com/en/blog/red-hat-ceph-


storage-33-bluestore-compression-performance)

Ceph supports four different compression algorithms and several compression modes for back-end BlueStore (pool level) compression. It is almost guaranteed that you would need to test Ceph's data reduction capability during a POC. To get an accurate representation of the expected performance impact and compression ratios, it would be best to test with the customer's data. Granted,
any performance testing on POC hardware is definitely not representative of a production deployment
so you would need to document this clearly. You would also need to get a reasonable variety of data
that you expect to write to the Ceph cluster of a representative size (testing with synthetically
generated files would give you inaccurate compression results for example). More details on back-
end compression can be found here:

https://www.ibm.com/docs/en/storage-ceph/7?topic=clusters-back-end-compression

For the purposes of this document, a process to test the compression is demonstrated. Due to storage capacity limitations in the lab environment, the actual compression results should not be taken as representative of Ceph's capability. For the BlueStore compression test, a wide variety of different file types were downloaded from https://filesamples.com/ to exercise the different Ceph compression algorithms. The total sample size was only 2GB, so it is definitely not representative of any real-world workload. Five RBD storage pools were created: four using one of Ceph's compression algorithms each and one left uncompressed. It is important to note that this test was performed on RBD images and you could potentially get better results using CephFS.

Figure 240: Ceph Dashboard – Cluster – Pools - Create

Figure 241: RBD Pools used for Compression Testing

As mentioned above, Ceph supports different compression modes. These are:

• none - Never compress data.


• passive - Do not compress data unless the write operation has a compressible hint set.
• aggressive - Compress data unless the write operation has an incompressible hint set.
• force - Try to compress data no matter what. Uses compression under all circumstances even
if the clients hint that the data is not compressible

All the pools were set to use force. Another important factor to consider is that for HDD OSDs the allocation unit is 64KB and the compression block sizes fall within the [128KB, 512KB] range. If an input block is smaller than 128KB it is not compressed. If it is larger than 512KB, it is split into multiple chunks and each one is compressed independently (noting that any chunk smaller than 128KB is not compressed).
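
For reference, the same BlueStore compression settings can also be applied to a pool from the command line instead of the Dashboard; a short sketch using one of our lab pool names (the required-ratio value shown is just an example):

ceph osd pool set rbd_snappy compression_algorithm snappy
ceph osd pool set rbd_snappy compression_mode force
# Only keep the compressed copy if it is at most 87.5% of the original size
ceph osd pool set rbd_snappy compression_required_ratio 0.875
# Verify the settings
ceph osd pool get rbd_snappy compression_algorithm
ceph osd pool get rbd_snappy compression_mode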

On the RBD client, images from each of these pools were mounted.

root@labserver:~# df -hT
Filesystem Type Size Used Avail Use% Mounted on
tmpfs tmpfs 392M 1.1M 391M 1% /run
/dev/sda2 ext4 20G 17G 2.5G 87% /
tmpfs tmpfs 2.0G 84K 2.0G 1% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 392M 16K 392M 1% /run/user/0
/dev/rbd0 xfs 5.0G 130M 4.9G 3% /mnt/snappy
/dev/rbd1 xfs 5.0G 130M 4.9G 3% /mnt/zlib
/dev/rbd2 xfs 5.0G 130M 4.9G 3% /mnt/zstd
/dev/rbd3 xfs 5.0G 130M 4.9G 3% /mnt/lz4
/dev/rbd4 xfs 5.0G 130M 4.9G 3% /mnt/uncompressed
root@labserver:~#

And our sample data was copied to each filesystem. The overall data stored on each filesystem
equated to 2GB and was the same across all filesystems.

root@labserver:/mnt# du -b .
2088985345 ./zlib/testfiles
2088985345 ./zlib
0 ./rcephfs
2088985345 ./uncompressed/testfiles

2088985345 ./uncompressed
0 ./testbucket
2088985345 ./snappy/testfiles
2088985345 ./snappy
0 ./cephnfs
0 ./rbd
2088985345 ./lz4/testfiles
2088985345 ./lz4
0 ./opennfs
2088985345 ./zstd/testfiles
2088985345 ./zstd
0 ./lcephfs
10444926725 .
root@labserver:/mnt#

Now if we check the pool capacity usage on our Ceph cluster we can see the following:

root@lceph ~]# ceph df | grep -E "POOL|rbd"


--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
rbd 13 32 697 MiB 198 2.0 GiB 3.77 17 GiB
rbd_uncompressed 23 32 1.9 GiB 510 5.8 GiB 10.12 17 GiB
rbd_snappy 24 32 1.9 GiB 510 3.9 GiB 6.95 26 GiB
rbd_zlib 25 32 1.9 GiB 510 5.8 GiB 10.05 17 GiB
rbd_zstd 26 32 1.9 GiB 510 5.8 GiB 10.05 17 GiB
rbd_lz4 27 32 1.9 GiB 512 3.9 GiB 6.95 26 GiB
[root@lceph ~]#

Based on the above output, and taking into account the pool's 3-way replication, storing 2GB of data would require roughly 6GB of physical capacity without compression. We can see that the snappy and lz4 algorithms offered around a 35% capacity saving and the others about 4%. Of course, with a larger and more representative data sample you would likely see better results.

The above is a simple method to test Ceph BlueStore compression.
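
You can also query BlueStore compression statistics directly. For example, ceph df detail adds USED COMPR and UNDER COMPR columns for each pool, and the OSD performance counters expose the compressed versus original byte counts (output not shown here as it depends on your cluster):

ceph df detail
ceph tell osd.0 perf dump | grep -i compress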

Note that, much like a traditional file system, Ceph does not delete the underlying objects when a file is deleted from an RBD image; the objects remain on the RBD device. New writes will either overwrite these objects or create new ones as required. The objects are therefore still present in the pool, and 'ceph df' will show the pool as occupied by them even though they are no longer referenced. See here for more information: https://access.redhat.com/solutions/3075321

For RGW compression, compression is enabled on a storage class in the Zone’s placement target. It
is enabled by default when creating a zone and set to lz4.

Figure 242: Ceph Dashboard – Object – Multi-site – Edit Zone

You can query this for any zone from the command line as well.

[root@cephnode1 ~]# radosgw-admin zone placement list


[
{
"key": "default-placement",
"val": {
"index_pool": "DC1_ZONE.rgw.buckets.index",
"storage_classes": {
"STANDARD": {
"data_pool": "DC1_ZONE.rgw.buckets.data",
"compression_type": "lz4"
}
},
"data_extra_pool": "DC1_ZONE.rgw.buckets.non-ec",
"index_type": 0,
"inline_data": true
}
}
]
[root@cephnode1 ~]#
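
If you wanted a different algorithm (or to disable compression) for a placement target's storage class, it can be changed from the command line. A sketch using our lab zone name; in a multi-site configuration the period must be committed and the RGWs restarted afterwards:

radosgw-admin zone placement modify --rgw-zone=DC1_ZONE --placement-id=default-placement --storage-class=STANDARD --compression=zstd
radosgw-admin period update --commit
ceph orch restart rgw.rgw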

As a simple test, we can create a new bucket and write our sample data to it and then query the
compression savings.

root@labserver:~/local_disk_testfiles# s3cmd put * s3://compression-test --recursive


upload: 'Around the World in 28 Languages.epub' -> 's3://compression-test/Around the World in
28 Languages.epub' [1 of 54]
3246310 of 3246310 100% in 0s 41.12 MB/s done
upload: 'FileZilla_3.67.0_win64-setup.exe' -> 's3://compression-test/FileZilla_3.67.0_win64-
setup.exe' [2 of 54]
12388016 of 12388016 100% in 0s 72.63 MB/s done
upload: 'Symphony No.6 (1st movement).flac' -> 's3://compression-test/Symphony No.6 (1st
movement).flac' [part 1 of 4, 15MB] [3 of 54]
15728640 of 15728640 100% in 0s 72.75 MB/s done
upload: 'Symphony No.6 (1st movement).flac' -> 's3://compression-test/Symphony No.6 (1st
movement).flac' [part 2 of 4, 15MB] [3 of 54]
15728640 of 15728640 100% in 0s 61.10 MB/s done
upload: 'Symphony No.6 (1st movement).flac' -> 's3://compression-test/Symphony No.6 (1st
movement).flac' [part 3 of 4, 15MB] [3 of 54]
15728640 of 15728640 100% in 0s 75.98 MB/s done
.
.

.
upload: 'sample_5184×3456.pbm' -> 's3://compression-test/sample_5184×3456.pbm' [52 of 54]
2239501 of 2239501 100% in 0s 20.56 MB/s done
upload: 'sample_5184×3456.tiff' -> 's3://compression-test/sample_5184×3456.tiff' [part 1 of
4, 15MB] [53 of 54]
15728640 of 15728640 100% in 0s 16.60 MB/s done
upload: 'sample_5184×3456.tiff' -> 's3://compression-test/sample_5184×3456.tiff' [part 2 of
4, 15MB] [53 of 54]
15728640 of 15728640 100% in 0s 17.17 MB/s done
upload: 'sample_5184×3456.tiff' -> 's3://compression-test/sample_5184×3456.tiff' [part 3 of
4, 15MB] [53 of 54]
15728640 of 15728640 100% in 0s 17.38 MB/s done
upload: 'sample_5184×3456.tiff' -> 's3://compression-test/sample_5184×3456.tiff' [part 4 of
4, 6MB] [53 of 54]
6562056 of 6562056 100% in 0s 15.86 MB/s done
upload: 'sample_960x400_ocean_with_audio.webm' -> 's3://compression-
test/sample_960x400_ocean_with_audio.webm' [part 1 of 2, 15MB] [54 of 54]
15728640 of 15728640 100% in 0s 17.73 MB/s done
upload: 'sample_960x400_ocean_with_audio.webm' -> 's3://compression-
test/sample_960x400_ocean_with_audio.webm' [part 2 of 2, 1485KB] [54 of 54]
1520959 of 1520959 100% in 0s 10.25 MB/s done
root@labserver:~/local_disk_testfiles#

Next, we can issue the radosgw-admin bucket stats command to get the bucket capacity utilisation.
We are interested in size_kb_actual and size_kb_utilized.

[root@lceph ~]# radosgw-admin bucket stats


[
{
"bucket": "secbucket",
"num_shards": 11,
"tenant": "",
"versioning": "off",
"zonegroup": "b0269fde-61cc-4b7e-9e34-389cf0bc0eed",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "44d5f41c-a04f-4d1b-8e14-4622bcc7b2af.87566.2",
"marker": "44d5f41c-a04f-4d1b-8e14-4622bcc7b2af.87566.2",
"index_type": "Normal",
"versioned": false,
"versioning_enabled": false,
"object_lock_enabled": false,
"mfa_enabled": false,
"owner": "s3test",
"ver": "0#1,1#1,2#1,3#2,4#1,5#1,6#1,7#1,8#1,9#1,10#1",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0",
"mtime": "2024-08-03T07:41:30.293187Z",
"creation_time": "2024-08-03T07:41:30.284419Z",
"max_marker": "0#,1#,2#,3#00000000001.16.6,4#,5#,6#,7#,8#,9#,10#",
"usage": {
"rgw.main": {
"size": 60864458,
"size_actual": 60866560,
"size_utilized": 59842764,
"size_kb": 59438,
"size_kb_actual": 59440,
"size_kb_utilized": 58441,
"num_objects": 1
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
},
{
"bucket": "afterfailback",
.

.
.
},
{
"bucket": "compression-test",
"num_shards": 11,
"tenant": "",
"versioning": "off",
"zonegroup": "ace3190f-c02e-40e2-abdd-344efc6ab06c",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "7109acc2-883b-4502-b863-4e7097d7d83c.1215475.1",
"marker": "7109acc2-883b-4502-b863-4e7097d7d83c.1215475.1",
"index_type": "Normal",
"versioned": false,
"versioning_enabled": false,
"object_lock_enabled": false,
"mfa_enabled": false,
"owner": "velero",
"ver": "0#4,1#13,2#10,3#17,4#14,5#11,6#22,7#17,8#96,9#10,10#6",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0",
"mtime": "2024-08-04T21:04:08.685795Z",
"creation_time": "2024-08-04T21:04:08.677361Z",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#",
"usage": {
"rgw.main": {
"size": 2088985345,
"size_actual": 2089123840,
"size_utilized": 2073674044,
"size_kb": 2040025,
"size_kb_actual": 2040160,
"size_kb_utilized": 2025073,
"num_objects": 54
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 0
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
]
[root@lceph ~]#

Notice that for our compression-test bucket, using the same data sample we used for the RBD test,
we hardly got any compression savings. Other buckets though do have significant savings (e.g.
testwebsite bucket has a 50% capacity saving).

[root@cephnode1 ~]# radosgw-admin bucket stats | grep -E


"\"bucket\"|size_kb_actual|size_kb_utilized"
"bucket": "testwebsite",
"size_kb_actual": 20,
"size_kb_utilized": 9,
"bucket": "ceph-bkt-76073a1d-de20-43b6-bb28-702f003d9d90",
"bucket": "testbuck2",
"size_kb_actual": 1400,
"size_kb_utilized": 803,
"bucket": "velero",

"size_kb_actual": 104,
"size_kb_utilized": 21,
"bucket": "awsbucket",
"size_kb_actual": 4,
"size_kb_utilized": 1,
"bucket": "testbucket",
"size_kb_actual": 20,
"size_kb_utilized": 2,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"bucket": "compression-test",
"size_kb_actual": 2040160,
"size_kb_utilized": 2025073,
"size_kb_actual": 0,
"size_kb_utilized": 0,
[root@cephnode1 ~]#

IBM Storage Ceph Performance Benchmarking


It is unlikely that you would need to benchmark a POC environment as the hardware resources being
used won’t be representative of a production deployment. In the unlikely event that you have
representative hardware or that you need to prove that you can meet a specific performance target
taking into account the hardware resources being used, then you can use a variety of tools to
benchmark your Ceph cluster. These same tools should be used to set a performance baseline for a
production cluster. Ceph includes tools to test OSD and RBD performance for example, and you can
use third party tools to test others (e.g. s3bench for RGW and FIO for CephFS). A comprehensive
explanation of all the available options can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=benchmark-benchmarking-ceph-
performance

For example, to test OSD performance we can run rados bench.

[root@ndceph ~]# ceph osd pool create testbench


pool 'testbench' created
[root@ndceph ~]# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10
seconds or 0 objects
Object prefix: benchmark_data_ndceph.local_791787
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 16 16 0 0 0 - 0
1 16 52 36 143.728 144 0.349982 0.364187
2 16 70 54 107.881 72 0.363495 0.365326
3 16 70 54 71.942 0 - 0.365326
4 16 70 54 53.9645 0 - 0.365326
5 16 70 54 43.1749 0 - 0.365326
6 16 70 54 35.9798 0 - 0.365326
7 16 70 54 30.84 0 - 0.365326
8 16 70 54 26.9852 0 - 0.365326
9 16 71 55 24.4313 0.571429 7.02754 0.486458
10 16 83 67 26.7866 48 8.60579 1.94373
Total time run: 10.3208
Total writes made: 83
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 32.168
Stddev Bandwidth: 48.5621
Max bandwidth (MB/sec): 144
Min bandwidth (MB/sec): 0
Average IOPS: 8
Stddev IOPS: 12.1491
Max IOPS: 36
Min IOPS: 0
Average Latency(s): 1.96114
Stddev Latency(s): 3.23827
Max latency(s): 8.65966

Min latency(s): 0.0975068


[root@ndceph ~]#
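
Because we ran the write test with --no-cleanup, the benchmark objects remain in the pool and can be reused for sequential or random read tests, after which they can be removed:

rados bench -p testbench 10 seq
rados bench -p testbench 10 rand
rados -p testbench cleanup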

To test RBD performance from a client we can run the rbd command with the bench option.

root@labserver:~# rbd --namespace linux --id linux bench --io-type rw rdbbench


bench type readwrite read:write=50:50 io_size 4096 io_threads 16 bytes 1073741824 pattern
sequential
SEC OPS OPS/SEC BYTES/SEC
1 5232 5248.04 21 MiB/s
2 9520 4760.9 19 MiB/s
3 13776 4594.31 18 MiB/s
4 19408 4854.83 19 MiB/s
5 23776 4757.49 19 MiB/s
6 28080 4565.98 18 MiB/s
7 32272 4553.17 18 MiB/s
8 37536 4751.09 19 MiB/s
.
.
.
60 251456 4575.58 18 MiB/s
61 255168 4119.26 16 MiB/s
62 258880 4081.67 16 MiB/s
elapsed: 62 ops: 262144 ops/sec: 4168.33 bytes/sec: 16 MiB/s
read_ops: 131039 read_ops/sec: 2083.64 read_bytes/sec: 8.1 MiB/s
write_ops: 131105 write_ops/sec: 2084.69 write_bytes/sec: 8.1 MiB/s
root@labserver:~#

To test RGW performance we can use s3bench (https://github.com/igneous-systems/s3bench).

root@labserver:~/go/bin# ./s3bench -accessKey=Z3VKKAHG9W9WLRN30FQF -


accessSecret=zBDMygZBnEizrLrY3ioP7wpePeSy69AJBRGhMG1z -bucket=s3bench -
endpoint=http://cephnode2.local:8080 -region DC1_ZONE -numClients=2 -numSamples=100 -
objectNamePrefix=s3bench -objectSize=10024
Test parameters
endpoint(s): [http://cephnode2.local:8080]
bucket: s3bench
objectNamePrefix: s3bench
objectSize: 0.0096 MB
numClients: 2
numSamples: 100
verbose: %!d(bool=false)

Generating in-memory sample data... Done (47.033µs)

Running Write test...

Running Read test...

Test parameters
endpoint(s): [http://cephnode2.local:8080]
bucket: s3bench
objectNamePrefix: s3bench
objectSize: 0.0096 MB
numClients: 2
numSamples: 100
verbose: %!d(bool=false)

Results Summary for Write Operation(s)


Total Transferred: 0.956 MB
Total Throughput: 1.68 MB/s
Total Duration: 0.569 s
Number of Errors: 0
------------------------------------
Write times Max: 0.019 s
Write times 99th %ile: 0.019 s
Write times 90th %ile: 0.014 s
Write times 75th %ile: 0.013 s
Write times 50th %ile: 0.011 s
Write times 25th %ile: 0.010 s
Write times Min: 0.008 s

Results Summary for Read Operation(s)


Total Transferred: 0.956 MB
Total Throughput: 13.09 MB/s
Total Duration: 0.073 s
Number of Errors: 0
------------------------------------
Read times Max: 0.004 s
Read times 99th %ile: 0.004 s
Read times 90th %ile: 0.002 s
Read times 75th %ile: 0.002 s
Read times 50th %ile: 0.001 s
Read times 25th %ile: 0.001 s
Read times Min: 0.001 s

Cleaning up 100 objects...


Deleting a batch of 100 objects in range {0, 99}... Succeeded
Successfully deleted 100/100 objects in 188.251053ms
root@labserver:~/go/bin#

For CephFS performance we can run FIO (https://fio.readthedocs.io/en/latest/fio_doc.html).

root@labserver:/mnt/cephfs# fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio -


-bs=4k --iodepth=1 --size=1G --runtime=10 --group_reporting=1
randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B,
ioengine=libaio, iodepth=1
fio-3.36
Starting 1 process
randwrite: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=1036KiB/s][w=259 IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=82062: Wed Jan 1 21:36:22 2025
write: IOPS=250, BW=1001KiB/s (1025kB/s)(9.78MiB/10004msec); 0 zone resets
slat (usec): min=11, max=11322, avg=37.35, stdev=225.76
clat (usec): min=2425, max=15356, avg=3948.63, stdev=1130.19
lat (usec): min=2455, max=24761, avg=3985.97, stdev=1189.73
clat percentiles (usec):
| 1.00th=[ 2573], 5.00th=[ 2769], 10.00th=[ 2868], 20.00th=[ 3032],
| 30.00th=[ 3195], 40.00th=[ 3326], 50.00th=[ 3523], 60.00th=[ 3884],
| 70.00th=[ 4359], 80.00th=[ 4883], 90.00th=[ 5604], 95.00th=[ 5932],
| 99.00th=[ 7111], 99.50th=[ 8160], 99.90th=[ 9896], 99.95th=[13435],
| 99.99th=[15401]
bw ( KiB/s): min= 696, max= 1312, per=98.98%, avg=992.00, stdev=206.11, samples=19
iops : min= 174, max= 328, avg=248.00, stdev=51.53, samples=19
lat (msec) : 4=63.06%, 10=36.86%, 20=0.08%
cpu : usr=0.58%, sys=1.08%, ctx=2638, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2504,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):


WRITE: bw=1001KiB/s (1025kB/s), 1001KiB/s-1001KiB/s (1025kB/s-1025kB/s), io=9.78MiB (10.3MB),
run=10004-10004msec
root@labserver:/mnt/cephfs#

Using the IBM Storage Ceph RESTful API


Most customers have monitoring and management software that makes use of a RESTful API. As part
of a POC you may need to demonstrate the use of IBM Storage Ceph’s RESTful API to query and/or
modify cluster resources. A full description of Ceph’s RESTful API can be found here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=developer-ceph-restful-api

The Ceph Dashboard has a direct link to the REST API which can be launched by clicking on the Cluster
API URL.


Figure 243: Ceph Dashboard Landing Page – Cluster API launch

On the Ceph RESTful API webpage you will see a list of all the supported API calls along with their
required parameters and expected responses.

Figure 244: Ceph REST API webpage

You can test any of the API calls by clicking on “Try it out”.


Figure 245: Ceph REST API – Trying out an API call

If you need to simulate application access, you can use the curl command. Ceph uses JSON Web Token
(JWT) authentication. You first need to authenticate with the Ceph cluster and obtain a valid
token. Then you can use that token to perform REST API calls. Let us demonstrate this process. First,
we need to obtain a valid JWT token. The syntax for the auth API call is as follows (provided so that
you can easily cut and paste it).

curl -k -X 'POST' \
'https://10.0.0.235:8443/api/auth' \
-H 'accept: application/vnd.ceph.api.v1.0+json' \
-H 'Content-Type: application/json' \
-d '{
"username": "admin",
"password": "MyPassw0rd",
"ttl": null
}'

To get formatted results, we will pipe the output to jq. The permissions for the user are displayed
in the response. You can create a user with only specific permissions (for example, read-only access)
via the Ceph Dashboard.

root@labserver:~# curl -k -X 'POST' 'https://10.0.0.235:8443/api/auth' -H 'accept:


application/vnd.ceph.api.v1.0+json' -H 'Content-Type: application/json' -d '{
"username": "admin",
"password": "myPassw0rd",
"ttl": null
}' | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1356 100 1288 100 68 5202 274 --:--:-- --:--:-- --:--:-- 5489
{
"token":
"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJjZXBoLWRhc2hib2FyZCIsImp0aSI6ImNiNzgwZTM0LWM
1NDItNDAzYi1hY2UwLTU2OTcxZDc3NTNiZSIsImV4cCI6MTczNTgyOTY2MiwiaWF0IjoxNzM1ODAwODYyLCJ1c2VybmFt
ZSI6ImFkbWluIn0.JHGmC3418z7IsqqX8vFKZBBmn8Ed2_6aLJFZKUCO9sQ",
"username": "admin",
"permissions": {
"cephfs": [
"create",
"delete",
"read",
"update"

262
IBM Storage Ceph for Beginner’s

],
"config-opt": [
"create",
"delete",
"read",
"update"
],
"dashboard-settings": [
"create",
"delete",
"read",
"update"
],
"grafana": [
"create",
"delete",
"read",
"update"
],
"hosts": [
"create",
"delete",
"read",
"update"
],
.
.
.
],
"prometheus": [
"create",
"delete",
"read",
"update"
],
"rbd-image": [
"create",
"delete",
"read",
"update"
],
"rbd-mirroring": [
"create",
"delete",
"read",
"update"
],
"rgw": [
"create",
"delete",
"read",
"update"
],
"user": [
"create",
"delete",
"read",
"update"
]
},
"pwdExpirationDate": null,
"sso": false,
"pwdUpdateRequired": false
}
root@labserver:~#

The authentication token we just generated needs to be passed with all future API requests. The
syntax to make requests is provided below for the /api/health/get_cluster_capacity API call.

curl -k -X 'GET' \
'https://10.0.0.235:8443/api/health/get_cluster_capacity' \
-H 'Authorization: Bearer
eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJjZXBoLWRhc2hib2FyZCIsImp0aSI6ImNiNzgwZTM0LWM1
NDItNDAzYi1hY2UwLTU2OTcxZDc3NTNiZSIsImV4cCI6MTczNTgyOTY2MiwiaWF0IjoxNzM1ODAwODYyLCJ1c2VybmFtZ
SI6ImFkbWluIn0.JHGmC3418z7IsqqX8vFKZBBmn8Ed2_6aLJFZKUCO9sQ' \
-H 'accept: application/vnd.ceph.api.v1.0+json' \
-H 'Content-Type:application/json'

Again, we can pipe the output to jq to make it easily readable.

root@labserver:~# curl -k -X 'GET' 'https://10.0.0.235:8443/api/health/get_cluster_capacity'


-H 'Authorization: Bearer
eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJjZXBoLWRhc2hib2FyZCIsImp0aSI6ImNiNzgwZTM0LWM1
NDItNDAzYi1hY2UwLTU2OTcxZDc3NTNiZSIsImV4cCI6MTczNTgyOTY2MiwiaWF0IjoxNzM1ODAwODYyLCJ1c2VybmFtZ
SI6ImFkbWluIn0.JHGmC3418z7IsqqX8vFKZBBmn8Ed2_6aLJFZKUCO9sQ' -H 'accept:
application/vnd.ceph.api.v1.0+json' -H 'Content-Type:application/json' | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 179 100 179 0 0 11828 0 --:--:-- --:--:-- --:--:-- 11933
{
"total_avail_bytes": 63625928704,
"total_bytes": 64361594880,
"total_used_raw_bytes": 735666176,
"total_objects": 1,
"total_pool_bytes_used": 8192,
"average_object_size": 4096.0
}
root@labserver:~#
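
In a script, you would typically capture the token into a shell variable rather than pasting it by hand. A minimal sketch, reusing the same endpoint and credentials as above and assuming jq is installed on the client:

TOKEN=$(curl -sk -X POST 'https://10.0.0.235:8443/api/auth' \
  -H 'accept: application/vnd.ceph.api.v1.0+json' \
  -H 'Content-Type: application/json' \
  -d '{"username": "admin", "password": "MyPassw0rd", "ttl": null}' | jq -r .token)

# Reuse the token for any subsequent call, for example the cluster capacity query shown below
curl -sk -X GET 'https://10.0.0.235:8443/api/health/get_cluster_capacity' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'accept: application/vnd.ceph.api.v1.0+json' | jq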

You can also use curl to obtain the Ceph cluster's metrics, which are exported by the Ceph Manager's
Prometheus module, if you need to.

root@labserver:~# curl -s http://ndceph.local:9283/metrics

# HELP ceph_health_status Cluster health status


# TYPE ceph_health_status untyped
ceph_health_status 0.0
# HELP ceph_mon_quorum_status Monitors in quorum
# TYPE ceph_mon_quorum_status gauge
ceph_mon_quorum_status{ceph_daemon="mon.ndceph"} 1.0
# HELP ceph_fs_metadata FS Metadata
# TYPE ceph_fs_metadata untyped
ceph_fs_metadata{data_pools="10",fs_id="1",metadata_pool="9",name="cephfs"} 1.0
# HELP ceph_mds_metadata MDS Metadata
# TYPE ceph_mds_metadata untyped
ceph_mds_metadata{ceph_daemon="mds.cephfs.ndceph.qbicpf",fs_id="-1",hostname="nd
ceph.local",public_addr="10.0.0.235:6827/1462988993",rank="-1",ceph_version="cep
h version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stab
le)"} 1.0
ceph_mds_metadata{ceph_daemon="mds.cephfs.ndceph.ujhgyz",fs_id="1",hostname="ndc
eph.local",public_addr="10.0.0.235:6829/261267190",rank="0",ceph_version="ceph v
ersion 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)
"} 1.0
# HELP ceph_mon_metadata MON Metadata
# TYPE ceph_mon_metadata untyped
ceph_mon_metadata{ceph_daemon="mon.ndceph",hostname="ndceph.local",public_addr="
10.0.0.235",rank="0",ceph_version="ceph version 18.2.1-194.el9cp (04a992766839cd
3207877e518a1238cdbac3787e) reef (stable)"} 1.0
# HELP ceph_mgr_metadata MGR metadata
# TYPE ceph_mgr_metadata gauge
ceph_mgr_metadata{ceph_daemon="mgr.ndceph.cmmpsd",hostname="ndceph.local",ceph_v
ersion="ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e)
reef (stable)"} 1.0
ceph_mgr_metadata{ceph_daemon="mgr.ndceph.azavyo",hostname="ndceph.local",ceph_v
ersion="ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e)
reef (stable)"} 1.0
.
.
.
ceph_prometheus_collect_duration_seconds_count{method="get_mgr_status"} 12651.0
ceph_prometheus_collect_duration_seconds_count{method="get_pg_status"} 12651.0
ceph_prometheus_collect_duration_seconds_count{method="get_osd_stats"} 12651.0
ceph_prometheus_collect_duration_seconds_count{method="get_metadata_and_osd_stat
us"} 12651.0
ceph_prometheus_collect_duration_seconds_count{method="get_num_objects"} 12651.0
ceph_prometheus_collect_duration_seconds_count{method="get_rbd_stats"} 12651.0
root@labserver:~#


Using the IBM Storage Ceph SNMP Gateway


SNMP is still a widely used protocol for monitoring distributed systems and devices across a variety
of hardware and software platforms. Ceph's SNMP integration focuses on forwarding alerts from its
Prometheus Alertmanager cluster to a gateway daemon. The gateway daemon transforms each alert
into an SNMP notification and sends it on to a designated SNMP management platform. The gateway
daemon provides SNMP V2c and V3 support (authentication and encryption). The procedure to
configure snmptrapd is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=traps-configuring-snmptrapd

If you need to demonstrate this as part of a POC, the easiest way to do it is to use Net-SNMP. First, we
need to install Net-SNMP on an SNMP management host and make sure UDP port 162 is open.

dnf install -y net-snmp-utils net-snmp


firewall-cmd --zone=public --add-port=162/udp --permanent
firewall-cmd --reload

Next, we need to download the Ceph SNMP MIB and copy it to the default Net-SNMP MIB directory.
Make sure permissions on the MIB file are set to 644.

curl -o CEPH_MIB.txt -L
https://raw.githubusercontent.com/ceph/ceph/master/monitoring/snmp/CEPH-MIB.txt
cp CEPH_MIB.txt /usr/share/snmp/mibs
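
To satisfy the permission requirement mentioned above, you can, for example, run:

chmod 644 /usr/share/snmp/mibs/CEPH_MIB.txt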

You can browse the CEPH SNMP MIB using snmptranslate if you need to see what traps are supported.

[root@lceph snmptrapd]# snmptranslate -Pu -Tz -M /usr/share/snmp/mibs -m CEPH-MIB


"org" "1.3"
"dod" "1.3.6"
"internet" "1.3.6.1"
"directory" "1.3.6.1.1"
"mgmt" "1.3.6.1.2"
"mib-2" "1.3.6.1.2.1"
"transmission" "1.3.6.1.2.1.10"
"experimental" "1.3.6.1.3"
"private" "1.3.6.1.4"
"enterprises" "1.3.6.1.4.1"
"ceph" "1.3.6.1.4.1.50495"
"cephCluster" "1.3.6.1.4.1.50495.1"
"cephMetadata" "1.3.6.1.4.1.50495.1.1"
"cephNotifications" "1.3.6.1.4.1.50495.1.2"
"prometheus" "1.3.6.1.4.1.50495.1.2.1"
"promGeneric" "1.3.6.1.4.1.50495.1.2.1.1"
"promGenericNotification" "1.3.6.1.4.1.50495.1.2.1.1.1"
"promGenericDaemonCrash" "1.3.6.1.4.1.50495.1.2.1.1.2"
"promHealthStatus" "1.3.6.1.4.1.50495.1.2.1.2"
"promHealthStatusError" "1.3.6.1.4.1.50495.1.2.1.2.1"
"promHealthStatusWarning" "1.3.6.1.4.1.50495.1.2.1.2.2"
"promMon" "1.3.6.1.4.1.50495.1.2.1.3"
"promMonLowQuorum" "1.3.6.1.4.1.50495.1.2.1.3.1"
"promMonDiskSpaceCritical" "1.3.6.1.4.1.50495.1.2.1.3.2"
"promOsd" "1.3.6.1.4.1.50495.1.2.1.4"
"promOsdDownHigh" "1.3.6.1.4.1.50495.1.2.1.4.1"
"promOsdDown" "1.3.6.1.4.1.50495.1.2.1.4.2"
.
.
.
"cephCompliance" "1.3.6.1.4.1.50495.2.2.1"
"security" "1.3.6.1.5"
"snmpV2" "1.3.6.1.6"
"snmpDomains" "1.3.6.1.6.1"
"snmpProxys" "1.3.6.1.6.2"
"snmpModules" "1.3.6.1.6.3"
"zeroDotZero" "0.0"
[root@lceph snmptrapd]#


Since we want to use SNMPv3, we only need to create the snmptrapd_auth.conf file. In it, we need to
specify an ENGINE_ID. IBM recommends using 8000C53F_CLUSTER_FSID_WITHOUT_DASHES for
this parameter. To obtain the Ceph FSID, you can run ceph -s.

[root@cephnode1 ~]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_OK
.
.
,

Our engine ID should therefore be 8000C53Fe7fcc1ac42ec11efa58fbc241172f341. Note that although
the IBM documentation states the recommended ENGINE_ID should be
8000C53F_CLUSTER_FSID_WITHOUT_DASHES, this did not work in our lab.

What did work is 0x800007DB03 in the snmptrapd_auth.conf file. The file should also
contain an SNMP_V3_AUTH_USER_NAME and SNMP_V3_AUTH_PASSWORD. We will use myceph as
the user and mycephpassword as the password.

[root@lceph snmptrapd]# cat snmptrapd_auth.conf


format2 %V\n% Agent Address: %A \n Agent Hostname: %B \n Date: %H - %J - %K - %L - %M - %Y \n
Enterprise OID: %N \n Trap Type: %W \n Trap Sub-Type: %q \n Community/Infosec Context: %P \n
Uptime: %T \n Description: %W \n PDU Attribute/Value Pair Array:\n%v \n -------------- \n
createuser -e 0x800007DB03 myceph SHA mycephpassword
authuser log,execute myceph
[root@lceph snmptrapd]#

Next, we can run the Net-SNMP trap daemon in the foreground on the SNMP management host. Note
that the IBM documentation says to specify CEPH-MIB.txt, which is incorrect; you need to drop the .txt
extension.

[root@lceph snmptrapd]# /usr/sbin/snmptrapd -M /usr/share/snmp/mibs -m CEPH-MIB -f -C -c


/root/snmptrapd/snmptrapd_auth.conf -Of -Lo :162
NET-SNMP version 5.9.1

On your Ceph cluster, we can now create the SNMP gateway service.

Figure 246: Ceph Dashboard – Administration – Services – Create


Specify the engine ID that matches our snmptrapd_auth.conf file (excluding the 0x prefix) and the SNMPv3
username and password we specified in the file. Also, make sure you select SNMP Version 3. Verify the service
is up and running.

Figure 247: Ceph Dashboard – Administration – Services
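
As an alternative to the Dashboard, the gateway can also be deployed from the command line with a service specification and ceph orch apply. The sketch below follows the snmp-gateway spec format from the Ceph documentation and reuses the values from our lab (engine ID without the 0x prefix, the myceph credentials and the SNMP management host 10.0.0.249); adjust these to match your own environment:

cat << EOF > snmp-gateway.yaml
service_type: snmp-gateway
service_name: snmp-gateway
placement:
  count: 1
spec:
  credentials:
    snmp_v3_auth_username: myceph
    snmp_v3_auth_password: mycephpassword
  engine_id: 800007DB03
  port: 9464
  snmp_destination: 10.0.0.249:162
  snmp_version: V3
EOF
ceph orch apply -i snmp-gateway.yaml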

Now you can monitor the SNMP management host for any SNMP traps. A simple way to generate a
trap is just to reboot one of the Ceph cluster nodes.

[root@cephnode3 ~]# reboot


[root@cephnode3 ~]#

And on the SNMP management host, we immediately receive an SNMP trap.

[root@lceph snmptrapd]# /usr/sbin/snmptrapd -d -M /usr/share/snmp/mibs -m CEPH-MIB -f -C -c


/root/snmptrapd/snmptrapd_auth.conf -Of -Lo :162
NET-SNMP version 5.9.1

Received 269 byte packet from UDP: [10.0.0.243]:34040->[10.0.0.249]:162


0000: 30 82 01 09 02 01 03 30 10 02 04 71 00 8C AD 02 0......0...q....
0016: 02 05 78 04 01 01 02 01 03 04 27 30 25 04 05 80 ..x.......'0%...
0032: 00 07 DB 03 02 01 00 02 01 00 04 06 6D 79 63 65 ............myce
0048: 70 68 04 0C A8 77 27 45 F2 85 96 A2 83 9B CA 68 ph...w'E.......h
0064: 04 00 30 81 C8 04 05 80 00 07 DB 03 04 00 A7 81 ..0.............
0080: BC 02 04 59 0F 08 83 02 01 00 02 01 00 30 81 AD ...Y.........0..
0096: 30 0E 06 08 2B 06 01 02 01 01 03 00 43 02 17 D4 0...+.......C...
0112: 30 1B 06 0A 2B 06 01 06 03 01 01 04 01 00 06 0D 0...+...........
0128: 2B 06 01 04 01 83 8A 3F 01 02 01 04 01 30 48 06 +......?.....0H.
0144: 0E 2B 06 01 04 01 83 8A 3F 01 02 01 04 01 01 04 .+......?.......
0160: 36 31 2E 33 2E 36 2E 31 2E 34 2E 31 2E 35 30 34 61.3.6.1.4.1.504
0176: 39 35 2E 31 2E 32 2E 31 2E 34 2E 31 5B 61 6C 65 95.1.2.1.4.1[ale
0192: 72 74 6E 61 6D 65 3D 43 65 70 68 4F 53 44 44 6F rtname=CephOSDDo
0208: 77 6E 48 69 67 68 5D 30 16 06 0E 2B 06 01 04 01 wnHigh]0...+....
0224: 83 8A 3F 01 02 01 04 01 02 04 04 69 6E 66 6F 30 ..?........info0
0240: 1C 06 0E 2B 06 01 04 01 83 8A 3F 01 02 01 04 01 ...+......?.....
0256: 03 04 0A 53 74 61 74 75 73 3A 20 4F 4B ...Status: OK

Agent Address: 0.0.0.0


Agent Hostname: cephnode4.local
Date: 0 - 17 - 36 - 13 - 7 - 4432539
Enterprise OID: .
Trap Type: Cold Start
Trap Sub-Type: 0
Community/Infosec Context: TRAP2, SNMP v3, user myceph, context
Uptime: 0
Description: Cold Start
PDU Attribute/Value Pair Array:
.iso.org.dod.internet.mgmt.mib-2.1.3.0 = Timeticks: (6100) 0:01:01.00
.iso.org.dod.internet.snmpV2.snmpModules.1.1.4.1.0 = OID:
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promO
sd.promOsdDownHigh
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promO
sd.promOsdDownHigh.1 = STRING: "1.3.6.1.4.1.50495.1.2.1.4.1[alertname=CephOSDDownHigh]"
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promO
sd.promOsdDownHigh.2 = STRING: "info"
.iso.org.dod.internet.private.enterprises.ceph.cephCluster.cephNotifications.prometheus.promO
sd.promOsdDownHigh.3 = STRING: "Status: OK"
--------------

IBM Storage Ceph Email Alerting


Email alerts are important in order to effectively manage one or more IBM Storage Ceph clusters. If
you need to demonstrate this as part of a POC or test, you can follow the procedure documented here:

https://www.ibm.com/docs/en/storage-ceph/7?topic=daemons-using-ceph-manager-alerts-
module

If you don’t have an SMTP relay to use, you can set up Postfix on a server to forward email to public
email providers like Gmail and Yahoo. See here:
https://www.linode.com/docs/guides/configure-postfix-to-send-mail-using-gmail-and-google-
workspace-on-debian-or-ubuntu/

A simple way to enable email alerts is via the Ceph Dashboard. Navigate to Administration -> Manager
Modules -> select Alerts and click Edit. Enter your SMTP relay information.

Figure 248: Ceph Dashboard – Administration – Manager Modules – Alerts
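
The same settings can also be applied from the command line by enabling the alerts manager module and setting its SMTP options. A hedged sketch (the relay hostname and email addresses below are placeholders for your own environment):

ceph mgr module enable alerts
ceph config set mgr mgr/alerts/smtp_host smtp.example.com
ceph config set mgr mgr/alerts/smtp_port 25
ceph config set mgr mgr/alerts/smtp_ssl false
ceph config set mgr mgr/alerts/smtp_sender ceph-cluster@example.com
ceph config set mgr mgr/alerts/smtp_destination storage-admins@example.com
ceph alerts send     # send a health summary immediately to test the configuration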

Now when events are triggered, you should receive email alerts. Some examples are shown below:


Figure 249: Ceph Email Alert

Figure 250: Ceph Email Alert


IBM Storage Ceph Call Home and Storage Insights


IBM Call Home is one of the benefits of using IBM Storage Ceph versus open-source Ceph. Call Home
connects you to IBM support to monitor and respond to system events. IBM Storage Insights is a
cloud-based service that allows you to analyse, protect and optimize your storage infrastructure. More
information on the benefits of using IBM Storage Insights can be found here
https://www.ibm.com/products/storage-insights. Both the IBM Storage Ceph Pro and Premium
Editions include entitlement to use IBM Storage Insights.

The latest version of IBM Storage Insights supports the log upload feature for existing IBM Storage
Ceph support tickets, which can dramatically reduce the time to resolve issues. It also includes an AI
chatbot that allows users to interact with IBM Storage Insights in natural language to help with
observability and monitoring. You can view the latest Insights enhancements here:

https://www.ibm.com/docs/en/storage-insights?topic=new-change-history

Figure 251: IBM Storage Insights – Unified Storage Systems Dashboard

IBM Storage Ceph is only supported with IBM Storage Insights Pro. If you want to demonstrate this
integration as part of your POC, you can use the IBM Storage Insights 60-day trial. More information on
how to access the trial is available here:

https://www.ibm.com/docs/en/storage-insights?topic=pro-want-try-buy-storage-insights

To configure Call Home and Storage Insights in Ceph, you can follow this procedure. Make sure you
have a Storage Insights Pro license (or trial) entitlement. Click the user icon at the top right of the
Ceph Dashboard and select the drop-down option to set up IBM Call Home and/or Storage Insights. The full
procedure is also documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=insights-enabling-call-home-storage
https://www.ibm.com/docs/en/storage-insights?topic=pro-planning-storage-ceph-systems


Figure 252: Ceph Dashboard – Call Home and Storage Insights

IBM Storage Ceph clusters do not require the use of a Storage Insights data collector. To understand
what information is uploaded to IBM and the firewall rules required for Insights to work, see here:

https://www.ibm.com/docs/en/SSQRB8/pdf/IBM_Storage_Insights_Security_Guide.pdf

You have to configure Call Home first or you will get an error.

Click on Call Home and complete the wizard.


Figure 253: Ceph Dashboard – Call Home Setup

Next, you can configure IBM Storage Insights. Select your company name as the tenant ID; if Insights
finds a match, it will give you a list of choices to select and confirm.

Figure 254: Ceph Dashboard – IBM Storage Insights Setup

If you click on Call Home again you should have the option of downloading your inventory or checking
when the last contact with IBM was.


Figure 255: Ceph Dashboard – IBM Call Home Download Reports

IBM Storage Ceph Logging a Manual Support Ticket


If you need to log a support ticket to help you resolve any issue, you can do so easily. Log in to the IBM
Support website with your IBM ID (you can create an IBM ID if you don’t already have one) and click
on Open a case.

https://www.ibm.com/mysupport/s/?language=en_US

Figure 256: IBM Support – Open a case


You will need to generate an sos report and upload it to your support ticket. This process is
documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=troubleshooting-generating-sos-report

On one of your Storage Ceph cluster nodes, you can issue the following command to generate the report.
The report will be located in /var/tmp.

[root@cephnode1 ~]# sosreport -a --all-logs


Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command,
E.G. 'sos report'.
Redirecting to 'sos report -a --all-logs'

sosreport (version 4.7.1)

This command will collect diagnostic and configuration information from


this Red Hat Enterprise Linux system and installed applications.

An archive containing the collected information will be generated in


/var/tmp/sos.mq1wgh68 and may be provided to a Red Hat support
representative.

Any information provided to Red Hat will be treated in accordance with


the published support policies at:

Distribution Website : https://www.redhat.com/


Commercial Support : https://access.redhat.com/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Optionally, please enter the case id that you are generating this report for []: DUMMY_CASE

Setting up archive ...


Setting up plugins ...
[plugin:firewall_tables] skipped command 'nft -a list ruleset': required kmods missing:
nf_tables, nfnetlink. Use '--allow-system-changes' to enable collection.
[plugin:firewall_tables] skipped command 'iptables -vnxL': required kmods missing: nf_tables,
iptable_filter.
[plugin:firewall_tables] skipped command 'ip6tables -vnxL': required kmods missing:
nf_tables, ip6table_filter.
[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec.
Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing: xsk_diag. Use '-
-allow-system-changes' to enable collection.
[plugin:networking] WARNING: collecting an eeprom dump is known to cause certain NIC drivers
(e.g. bnx2x/tg3) to interrupt device operation
[plugin:sssd] skipped command 'sssctl config-check': required services missing: sssd.
[plugin:sssd] skipped command 'sssctl domain-list': required services missing: sssd.
[plugin:systemd] skipped command 'systemd-resolve --status': required services missing:
systemd-resolved.
[plugin:systemd] skipped command 'systemd-resolve --statistics': required services missing:
systemd-resolved.
Running plugins. Please wait ...

Finishing plugins [Running: subscription_manager]


Finished running plugins
Creating compressed archive...

Your sosreport has been generated and saved in:


/var/tmp/sosreport-cephnode1-DUMMYCASE-2024-08-06-qkkhrke.tar.xz

Size 206.69MiB
Owner root
sha256 6abd358eb6387dd0166134a0279fb65daee32d3eb8b9ab1adc48e200fa5b4eae

Please send this file to your support representative.


[root@cephnode1 ~]#

[root@cephnode1 ~]# du -h /var/tmp/sosreport-cephnode1-DUMMYCASE-2024-08-06-qkkhrke.tar.xz


207M /var/tmp/sosreport-cephnode1-DUMMYCASE-2024-08-06-qkkhrke.tar.xz
[root@cephnode1 ~]#

Once you have the sos report, you can upload it to your support ticket via IBM ECuRep.

https://www.ibm.com/support/pages/enhanced-customer-data-repository-ecurep-send-data-
https#secure

Figure 257: IBM ECuRep

IBM Storage Ceph Simulating Hardware Failures


It is important to include the simulation of hardware failures in any POC or test. Most customers are
used to calling their storage vendor to repair faults on their storage arrays and may be concerned
about the effort required to recover from hardware failures when evaluating a software-defined storage
solution like IBM Storage Ceph. For the purposes of this document, we will simulate an OSD (disk) failure
and a Ceph OSD node failure.

IBM Storage Ceph OSD Replacement


The procedure to replace a failed disk is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=osds-replacing

In our lab environment, we have a single-node cluster with 3 OSDs. We will simulate a failure by
destroying the OSD and deleting the virtual disk (destroying an OSD removes it permanently). We will then
assign a new virtual disk to replace the failed one.

First, we navigate to Cluster -> OSDs and get the details for an OSD we want to replace (since this is a
virtual environment we need to get the OS device name that corresponds to the OSD).


Figure 258: Ceph Dashboard – Cluster – OSDs - Information

We can then select the action to DELETE the OSD.

Figure 259: Ceph Dashboard – Cluster – OSDs - Delete

For a single-node cluster with 3 OSDs, we set the pool replica count to 2, so deleting the OSD
should not impact data durability. We also choose not to preserve the OSD's ID in the CRUSH map.
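
If you prefer the command line for this step, the orchestrator provides an equivalent removal workflow. A sketch, assuming osd.1 is the OSD being removed as in this example:

ceph orch osd rm 1 --zap      # drain, remove and zap the backing device
ceph orch osd rm status       # monitor the removal progress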


Figure 260: Ceph Dashboard – Cluster – OSDs = Delete

The OSD should now be destroyed.

Figure 261: Ceph Dashboard – Cluster – OSDs - Information

[root@ndceph ~]# ceph osd tree


ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.05846 root default
-3 0.05846 host ndceph
0 hdd 0.01949 osd.0 up 1.00000 1.00000
1 hdd 0.01949 osd.1 destroyed 0 1.00000
2 hdd 0.01949 osd.2 up 1.00000 1.00000
[root@ndceph ~]#
[root@ndceph ~]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR
PGS STATUS
0 hdd 0.01949 1.00000 20 GiB 48 MiB 2.5 MiB 6 KiB 46 MiB 20 GiB 0.24 0.89
289 up
1 hdd 0.01949 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0
0 destroyed
2 hdd 0.01949 1.00000 20 GiB 60 MiB 2.5 MiB 6 KiB 58 MiB 20 GiB 0.29 1.11
289 up
TOTAL 40 GiB 109 MiB 5.0 MiB 13 KiB 104 MiB 40 GiB 0.27
MIN/MAX VAR: 0.89/1.11 STDDEV: 0.03
[root@ndceph ~]#

Next, we can unmap the OSD from the cluster node, create a new virtual disk and assign the new
virtual disk to the cluster node.

root@ndocrat-desktop:~# virsh detach-disk --domain ndceph /datastores/ndceph_osd2.qcow2 --


persistent --config --live
Disk detached successfully
root@ndocrat-desktop:~# qemu-img create -f qcow2 /datastores/ndceph_newosd2.qcow2 20G
Formatting '/datastores/ndceph_newosd2.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off
compression_type=zlib size=21474836480 lazy_refcounts=off refcount_bits=16
root@ndocrat-desktop:/datastores# virsh attach-disk ndceph /datastores/ndceph_newosd2.qcow2
vdc --persistent --subdriver qcow2
Disk attached successfully
root@ndocrat-desktop:/datastores#

On the Ceph cluster node, we can verify that the new disk is available for use (/dev/vdc).

[root@ndceph ~]# lsblk


NAME
MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0
11:0 1 2K 0 rom
vda
252:0 0 50G 0 disk
├─vda1
252:1 0 1G 0 part /boot
└─vda2
252:2 0 49G 0 part
├─rhel-root
253:0 0 45G 0 lvm /var/lib/containers/storage/overlay

/
└─rhel-swap
253:1 0 3.9G 0 lvm [SWAP]
vdb
252:16 0 20G 0 disk
└─ceph--7513fae5--a462--4357--9bc4--7d9664815902-osd--block--10ecc366--f829--4b3f--bc4e--
f3932d7aeb83 253:2 0 20G 0 lvm
└─8Qm4W4-w5qb-0KZY-hCd2-St0o-1T4Q-AtSpWC
253:4 0 20G 0 crypt
vdc
252:32 0 20G 0 disk
vdd
252:48 0 20G 0 disk
└─ceph--af9e6646--715b--44de--8a82--06fa4978ff04-osd--block--e49b668e--2398--4d4e--85e3--
f050cb7989d3 253:3 0 20G 0 lvm
└─QO4sBS-Mft4-3F6H-3rHD-JEEd-sVQx-I5xAR0
253:5 0 20G 0 crypt
[root@ndceph ~]#

Because we set Ceph to use all devices when we created our cluster, Ceph will automatically add the
new disk back as an OSD for us.

The effect of ceph orch apply is persistent, which means that the orchestrator automatically finds the
device, adds it to the cluster, and creates new OSDs. This occurs under the following conditions:

• New disks or drives are added to the system.


• Existing disks or drives are zapped.
• An OSD is removed and the devices are zapped.


You can disable automatic creation of OSDs on all the available devices by using the --unmanaged
parameter.
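
For example, assuming the cluster was originally deployed with an all-available-devices OSD specification as in our lab, you could pause automatic OSD creation (and later re-enable it) like this:

ceph orch apply osd --all-available-devices --unmanaged=true
ceph orch apply osd --all-available-devices --unmanaged=false   # re-enable automatic OSD creation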

After a few minutes the new OSD will appear on the Ceph Dashboard.

Figure 262: Ceph Dashboard – Cluster – OSDs - List

And we can verify it from the command line.

[root@ndceph ~]# ceph orch device ls


HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT
REASONS
ndceph.local /dev/sr0 hdd QEMU_DVD-ROM_QM00001 2048 No 67s ago Insufficient
space (<5GB)
ndceph.local /dev/vdb hdd 20.0G No 67s ago Has a
FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ndceph.local /dev/vdc hdd 20.0G No 67s ago Has a
FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ndceph.local /dev/vdd hdd 20.0G No 67s ago Has a
FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
[root@ndceph ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.05846 root default
-3 0.05846 host ndceph
0 hdd 0.01949 osd.0 up 1.00000 1.00000
1 hdd 0.01949 osd.1 up 1.00000 1.00000
2 hdd 0.01949 osd.2 up 1.00000 1.00000
[root@ndceph ~]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR
PGS STATUS
0 hdd 0.01949 1.00000 20 GiB 58 MiB 2.8 MiB 9 KiB 55 MiB 20 GiB 0.28 1.10
199 up
1 hdd 0.01949 1.00000 20 GiB 47 MiB 2.8 MiB 1 KiB 44 MiB 20 GiB 0.23 0.88
206 up
2 hdd 0.01949 1.00000 20 GiB 54 MiB 2.3 MiB 10 KiB 51 MiB 20 GiB 0.26 1.02
173 up
TOTAL 60 GiB 159 MiB 7.9 MiB 22 KiB 151 MiB 60 GiB 0.26
MIN/MAX VAR: 0.88/1.10 STDDEV: 0.02
[root@ndceph ~]#


For an unplanned OSD failure, we won’t delete the OSD from the Ceph Dashboard. We will just unmap
it from the cluster node, delete it, create a new one and map it back to the cluster node. The behaviour
is similar to a planned removal.

We will forcefully remove the virtual disk backing OSD.0 with device name /dev/vdb.

Figure 263: Ceph Dashboard – Cluster – OSDs - Information

Without doing anything on the Ceph Dashboard, we will unmap the OSD from the cluster node and
delete the virtual disk on the Hypervisor.

root@ndocrat-desktop:/datastores# virsh detach-disk --domain ndceph


/datastores/ndceph_osd1.qcow2 --persistent --config --live Disk detached
successfully
root@ndocrat-desktop:/datastores# rm /datastores/ndceph_osd1.qcow2

On the Ceph node, we are missing /dev/vdb.

[root@ndceph ~]# lsblk


NAME
MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0
11:0 1 2K 0 rom
vda
252:0 0 50G 0 disk
├─vda1
252:1 0 1G 0 part /boot
└─vda2
252:2 0 49G 0 part
├─rhel-root
253:0 0 45G 0 lvm /var/lib/containers/storage/overlay

/
└─rhel-swap
253:1 0 3.9G 0 lvm [SWAP]
vdc
252:32 0 20G 0 disk
└─ceph--79e5581c--a1e0--476b--9d23--e6f0629bcfa9-osd--block--fdbb31a0--31ae--49be--b9d3--
a38544475372 253:6 0 20G 0 lvm
└─21TIgq-7Ych-a5ut-Oatv-wNQt-aWFd-RZVylo
253:7 0 20G 0 crypt
vdd
252:48 0 20G 0 disk
└─ceph--af9e6646--715b--44de--8a82--06fa4978ff04-osd--block--e49b668e--2398--4d4e--85e3--
f050cb7989d3 253:3 0 20G 0 lvm
└─QO4sBS-Mft4-3F6H-3rHD-JEEd-sVQx-I5xAR0
253:5 0 20G 0 crypt
[root@ndceph ~]#

Ceph logs alerts that we are missing an OSD.

Figure 264: Ceph Dashboard Home

And we can see it is performing a rebalance.

[root@ndceph ~]# ceph -s


cluster:
id: 6f105f06-562e-11ef-8666-525400463683
health: HEALTH_WARN
1 failed cephadm daemon(s)
1 osds down
Degraded data redundancy: 161/514 objects degraded (31.323%), 55 pgs degraded,
231 pgs undersized

services:
mon: 1 daemons, quorum ndceph (age 10m)
mgr: ndceph.azavyo(active, since 10m), standbys: ndceph.cmmpsd
osd: 3 osds: 2 up (since 95s), 3 in (since 8m)
rgw: 1 daemon active (1 hosts, 1 zones)

data:
pools: 8 pools, 321 pgs
objects: 257 objects, 455 KiB
usage: 163 MiB used, 60 GiB / 60 GiB avail
pgs: 161/514 objects degraded (31.323%)
176 active+undersized
90 active+clean
55 active+undersized+degraded

progress:
Global Recovery Event (90s)
[=======.....................] (remaining: 3m)

The OSD is marked down on the Ceph Dashboard.


Figure 265: Ceph Dashboard – Cluster – OSDs - List

We can now delete it as before.

Figure 266: Ceph Dashboard – Cluster – OSDs - Delete

We have to wait for the rebalance and purge to occur. After a while it should show as removed on the
Ceph Dashboard.

[root@ndceph ~]# ceph orch osd rm status


OSD HOST STATE PGS REPLACE FORCE ZAP DRAIN STARTED AT
0 ndceph.local done, waiting for purge 0 False False False
[root@ndceph ~]#


Figure 267: Ceph Dashboard – Cluster – OSDs - List

We can now create a new virtual disk and map it to the Ceph node. As before, because we set the Ceph
orchestrator to use all available devices, Ceph will automatically create a new OSD and add it to the
cluster.

root@ndocrat-desktop:/datastores# qemu-img create -f qcow2 ndceph_newosd1.qcow2 20G


Formatting 'ndceph_newosd1.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off
compression_type=zlib size=21474836480 lazy_refcounts=off refcount_bits=16
root@ndocrat-desktop:/datastores# virsh attach-disk ndceph /datastores/ndceph_newosd1.qcow2
vdb --persistent --subdriver qcow2
Disk attached successfully
root@ndocrat-desktop:/datastores#

A quick check on the node and the disk is already in use.

[root@ndceph ~]# lsblk


NAME
MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0
11:0 1 2K 0 rom
vda
252:0 0 50G 0 disk
├─vda1
252:1 0 1G 0 part /boot
└─vda2
252:2 0 49G 0 part
├─rhel-root
253:0 0 45G 0 lvm /var/lib/containers/storage/overlay

/
└─rhel-swap
253:1 0 3.9G 0 lvm [SWAP]
vdb
252:16 0 20G 0 disk
└─ceph--370788cf--86ca--4983--aa99--e5f6b2f4dd28-osd--block--05a4fb49--5ce9--4222--8a08--
7c30aba58731 253:6 0 20G 0 lvm
└─dbVJjO-jwpK-f5x9-hqVR-6Puw-1Lbt-oyBBl3
253:7 0 20G 0 crypt
vdc
252:32 0 20G 0 disk
└─ceph--79e5581c--a1e0--476b--9d23--e6f0629bcfa9-osd--block--fdbb31a0--31ae--49be--b9d3--
a38544475372 253:3 0 20G 0 lvm
└─21TIgq-7Ych-a5ut-Oatv-wNQt-aWFd-RZVylo
253:5 0 20G 0 crypt
vdd
252:48 0 20G 0 disk
└─ceph--af9e6646--715b--44de--8a82--06fa4978ff04-osd--block--e49b668e--2398--4d4e--85e3--
f050cb7989d3 253:2 0 20G 0 lvm
└─QO4sBS-Mft4-3F6H-3rHD-JEEd-sVQx-I5xAR0
253:4 0 20G 0 crypt
[root@ndceph ~]#

The Ceph Dashboard confirms it is back and the cluster is back to a healthy state.

Figure 268: Ceph Dashboard – Cluster – OSDs - List

IBM Storage Ceph Simulating a Node Failure

The procedure to replace a failed node is documented here:

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=failure-simulating-node

In our lab environment, we will simulate the worst-case scenario of completely losing a Ceph OSD
node with its associated disks and replacing it with a brand-new node with new disks.

First, we need to understand the impact of removing a node. We are only using 4% of the RAW capacity
so should be fine to remove a node.

[ceph: root@cephnode4 /]# ceph df


--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 400 GiB 384 GiB 16 GiB 16 GiB 4.03
TOTAL 400 GiB 384 GiB 16 GiB 16 GiB 4.03

--- POOLS ---


POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 769 KiB 2 2.3 MiB 0 118 GiB
.rgw.root 2 32 35 KiB 44 516 KiB 0 118 GiB
DC1_ZONE.rgw.log 3 32 23 KiB 338 1.9 MiB 0 118 GiB
DC1_ZONE.rgw.control 4 32 0 B 8 0 B 0 118 GiB
DC1_ZONE.rgw.meta 5 32 15 KiB 64 629 KiB 0 118 GiB
DC1_ZONE.rgw.buckets.index 6 32 111 KiB 154 332 KiB 0 118 GiB


DC1_ZONE.rgw.buckets.data 7 128 34 KiB 41 540 KiB 0 118 GiB
DC1_ZONE.rgw.buckets.non-ec 8 32 0 B 0 0 B 0 118 GiB
rbd 10 32 2.6 GiB 677 7.7 GiB 2.13 118 GiB
cephfs.labserver.meta 19 16 171 MiB 67 514 MiB 0.14 118 GiB
cephfs.labserver.data 20 128 518 B 3 12 KiB 0 118 GiB
cephfs.cephnfs.meta 25 16 14 MiB 28 43 MiB 0.01 118 GiB
cephfs.cephnfs.data 26 128 1000 MiB 250 2.9 GiB 0.82 118 GiB
.nfs 27 32 21 KiB 13 107 KiB 0 118 GiB
k8s-rbd 28 32 4.1 KiB 13 60 KiB 0 118 GiB
cephfs.k8s-fs.meta 29 16 981 KiB 30 3.0 MiB 0 118 GiB
cephfs.k8s-fs.data 30 128 470 B 4 36 KiB 0 118 GiB
[ceph: root@cephnode4 /]#

The node we want to remove is cephnode4, which has the following services running. Note down its
labels as well, as we will need them when we add the brand-new replacement node.

Figure 269: Ceph Dashboard – Cluster – Hosts - Information

And owns the following OSDs.


Figure 270: Ceph Dashboard – Cluster – OSDs - List

Since this is a 4-node cluster and we want to limit the amount of recovery activity during the node
replacement, we will set the noout, noscrub and nodeep-scrub flags to temporarily disable rebalancing
and scrubbing.

[root@cephnode4 ~]# ceph osd set noout


noout is set
[root@cephnode4 ~]# ceph osd set noscrub
noscrub is set
[root@cephnode4 ~]# ceph osd set nodeep-scrub
nodeep-scrub is set
[root@cephnode4 ~]#

Next, we need to shutdown the node.

[root@cephnode4 ~]# shutdown -h now


[root@cephnode4 ~]#

[root@cephnode1 ~]# ceph orch host ls


HOST ADDR LABELS STATUS
cephnode1.local 10.0.0.240 _admin,mon,mgr,osd,mds,nfs,iscsi
cephnode2.local 10.0.0.241 mgr,mon,osd,rgw
cephnode3.local 10.0.0.242 mon,osd,rgw
cephnode4.local 10.0.0.243 mds,osd,grafana,nfs Offline
4 hosts in cluster
[root@cephnode1 ~]#

We want to change the node’s hostname, so we need to remove the host from the Ceph CRUSH map.
Before we do that, however, we will delete its associated OSDs and remove them from the CRUSH
map.

[root@cephnode1 ~]# ceph osd dump | grep " down "


osd.4 down in weight 1 up_from 8389 up_thru 8778 down_at 8783 last_clean_interval
[8365,8384) [v2:10.0.0.243:6818/3031317955,v1:10.0.0.243:6819/3031317955]
[v2:192.168.1.13:6820/3031317955,v1:192.168.1.13:6821/3031317955] exists 422e1d7b-2220-49b3-
92da-fba48425f255
osd.7 down in weight 1 up_from 8388 up_thru 8766 down_at 8783 last_clean_interval
[8363,8384) [v2:10.0.0.243:6810/796686406,v1:10.0.0.243:6811/796686406]
[v2:192.168.1.13:6812/796686406,v1:192.168.1.13:6813/796686406] exists de12d590-5f56-4195-
9cf1-e611ea1a7755
osd.8 down in weight 1 up_from 8613 up_thru 8769 down_at 8783 last_clean_interval [0,0)
[v2:10.0.0.243:6802/1045852892,v1:10.0.0.243:6803/1045852892]
[v2:192.168.1.13:6804/1045852892,v1:192.168.1.13:6805/1045852892] exists fb3a64c4-0937-423e-


9b2e-15ab13fadc4c
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph osd rm osd.4


removed osd.4
[root@cephnode1 ~]# ceph osd rm osd.7
removed osd.7
[root@cephnode1 ~]# ceph osd rm osd.8
removed osd.8
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph osd crush rm cephnode4


Error ENOTEMPTY: (39) Directory not empty
[root@cephnode1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.39038 root default
-3 0.09760 host cephnode1
0 hdd 0.02440 osd.0 up 1.00000 1.00000
1 hdd 0.02440 osd.1 up 1.00000 1.00000
10 hdd 0.04880 osd.10 up 1.00000 1.00000
-9 0.09760 host cephnode2
2 hdd 0.02440 osd.2 up 1.00000 1.00000
5 hdd 0.02440 osd.5 up 0.95001 1.00000
11 hdd 0.04880 osd.11 up 1.00000 1.00000
-7 0.09760 host cephnode3
3 hdd 0.02440 osd.3 up 1.00000 1.00000
6 hdd 0.02440 osd.6 up 1.00000 1.00000
9 hdd 0.04880 osd.9 up 1.00000 1.00000
-5 0.09760 host cephnode4
4 hdd 0.02440 osd.4 DNE 0
7 hdd 0.02440 osd.7 DNE 0
8 hdd 0.04880 osd.8 DNE 0
[root@cephnode1 ~]# ceph osd crush rm osd.4
removed item id 4 name 'osd.4' from crush map
[root@cephnode1 ~]# ceph osd crush rm osd.7
removed item id 7 name 'osd.7' from crush map
[root@cephnode1 ~]# ceph osd crush rm osd.8
removed item id 8 name 'osd.8' from crush map
[root@cephnode1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29279 root default
-3 0.09760 host cephnode1
0 hdd 0.02440 osd.0 up 1.00000 1.00000
1 hdd 0.02440 osd.1 up 1.00000 1.00000
10 hdd 0.04880 osd.10 up 1.00000 1.00000
-9 0.09760 host cephnode2
2 hdd 0.02440 osd.2 up 1.00000 1.00000
5 hdd 0.02440 osd.5 up 0.95001 1.00000
11 hdd 0.04880 osd.11 up 1.00000 1.00000
-7 0.09760 host cephnode3
3 hdd 0.02440 osd.3 up 1.00000 1.00000
6 hdd 0.02440 osd.6 up 1.00000 1.00000
9 hdd 0.04880 osd.9 up 1.00000 1.00000
-5 0 host cephnode4
[root@cephnode1 ~]#

Now we can remove the node from the CRUSH map and also the Ceph OSD users.

[root@cephnode1 ~]# ceph osd crush rm cephnode4


removed item id -5 name 'cephnode4' from crush map
[root@cephnode1 ~]#

[root@cephnode1 ~]# ceph auth del osd.4


[root@cephnode1 ~]# ceph auth del osd.7
[root@cephnode1 ~]# ceph auth del osd.8
[root@cephnode1 ~]#

On our Hypervisor, we can physically delete the Virtual Machine.


Figure 271: Proxmox Delete Virtual Machine cephnode4

We removed the node from the CRUSH map but still need to remove it from the cluster. Specify --force,
as Ceph might otherwise prevent the removal if it cannot relocate daemons (or would lose daemons
configured to run only a single instance).

[root@cephnode2 tmp]# ceph orch host rm cephnode4.local --offline --force


Removed offline host 'cephnode4.local'
[root@cephnode2 tmp]#

[root@cephnode2 tmp]# ceph orch host ls


HOST ADDR LABELS STATUS
cephnode1.local 10.0.0.240 _admin,mon,mgr,osd,mds,nfs,iscsi
cephnode2.local 10.0.0.241 mgr,mon,osd,rgw
cephnode3.local 10.0.0.242 mon,osd,rgw
3 hosts in cluster
[root@cephnode2 tmp]#

As mentioned, the Grafana service is down as we don’t have any other node labelled to take the service
over.

[root@cephnode1 ~]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_WARN
Failed to apply 1 service(s): grafana
noout,noscrub,nodeep-scrub flag(s) set
too many PGs per OSD (283 > max 250)

services:
mon: 3 daemons, quorum cephnode1,cephnode3,cephnode2 (age 5d)
mgr: cephnode1.tbqyke(active, since 3d), standbys: cephnode2.iaecpr
mds: 3/3 daemons up, 1 standby
osd: 9 osds: 9 up (since 50m), 9 in (since 2w)
flags noout,noscrub,nodeep-scrub
rgw: 2 daemons active (2 hosts, 1 zones)
rgw-nfs: 1 daemon active (1 hosts, 1 zones)

data:
volumes: 3/3 healthy
pools: 17 pools, 849 pgs
objects: 1.74k objects, 3.7 GiB
usage: 16 GiB used, 284 GiB / 300 GiB avail
pgs: 849 active+clean


io:
client: 115 B/s rd, 0 op/s rd, 0 op/s wr

[root@cephnode1 ~]#

We now need to create a new virtual machine and install the OS. We then need to perform the same tasks
we did prior to bootstrapping our Ceph cluster:

• dnf install ibm-storage-ceph-license -y


• touch /usr/share/ibm-storage-ceph-license/accept
• dnf install cephadm-ansible
• setup passwordless SSH from the admin node (cephnode1)

On cephnode1, we can update the Ansible inventory file to reflect the new node (newcephnode4.local)
and re-run the cephadm-ansible playbooks (SSH key distribution and pre-flight).

[root@cephnode1 production]# cat hosts


cephnode2
cephnode3
newcephnode4

[admin]
cephnode1
[root@cephnode1 production]#
[root@cephnode1 cephadm-ansible]# ansible-playbook -i ./inventory/production/hosts cephadm-
distribute-ssh-key.yml -e cephadm_ssh_user=root -e admin_node=cephnode1.local

PLAY RECAP
*********************************************************************************************
*******************************
cephnode1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0
rescued=0 ignored=0
cephnode2 : ok=3 changed=0 unreachable=0 failed=0 skipped=3
rescued=0 ignored=0
cephnode3 : ok=1 changed=0 unreachable=0 failed=0 skipped=0
rescued=0 ignored=0
newcephnode4 : ok=1 changed=1 unreachable=0 failed=0 skipped=0
rescued=0 ignored=0
.
.
.
[root@cephnode1 cephadm-ansible]#

Our new cluster node, newcephnode4.local has the following storage configuration (similar to the one
we removed).

[root@newcephnode4 ~]# lsblk


NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 39G 0 part
├─rhel-root 253:0 0 35G 0 lvm /
└─rhel-swap 253:1 0 4G 0 lvm [SWAP]
sdb 8:16 0 25G 0 disk
sdc 8:32 0 25G 0 disk
sdd 8:48 0 50G 0 disk
sr0 11:0 1 10.3G 0 rom
[root@newcephnode4 ~]#

On the Ceph Dashboard, we can add the new node back to the Ceph cluster. We need to add the
correct labels too.


Figure 272: Ceph Dashboard – Cluster - Hosts – Add

Wait for the node to be added and services to be started.

Figure 273: Ceph Dashboard – Cluster - Hosts – Information


Figure 274: Ceph Dashboard – Cluster - OSDs - List

We need to copy the Ceph admin keyring and ceph.conf file from one of the other cluster nodes in
order for the cephadm-shell to work.

[root@newcephnode4 ceph]# scp cephnode1:/etc/ceph/ceph.client.admin.keyring .


root@cephnode1's password:
ceph.client.admin.keyring
100% 151 336.2KB/s 00:00
[root@newcephnode4 ceph]# scp cephnode1:/etc/ceph/ceph.conf .
root@cephnode1's password:
ceph.conf
100% 259 627.8KB/s 00:00
[root@newcephnode4 ceph]#

Finally, we need to unset the noout, noscrub and nodeep-scrub settings and verify the cluster health.

[root@newcephnode4 ceph]# ceph osd unset noout


noout is unset
[root@newcephnode4 ceph]# ceph osd unset noscrub
noscrub is unset
[root@newcephnode4 ceph]# ceph osd unset nodeep-scrub
nodeep-scrub is unset
[root@newcephnode4 ceph]#

[root@newcephnode4 ceph]# ceph -s


cluster:
id: e7fcc1ac-42ec-11ef-a58f-bc241172f341
health: HEALTH_OK

services:
mon: 3 daemons, quorum cephnode1,cephnode3,cephnode2 (age 5d)
mgr: cephnode1.tbqyke(active, since 3d), standbys: cephnode2.iaecpr
mds: 3/3 daemons up, 1 standby
osd: 12 osds: 12 up (since 5m), 12 in (since 2w)
rgw: 2 daemons active (2 hosts, 1 zones)
rgw-nfs: 1 daemon active (1 hosts, 1 zones)

data:
volumes: 3/3 healthy
pools: 17 pools, 849 pgs
objects: 1.74k objects, 3.7 GiB
usage: 16 GiB used, 383 GiB / 400 GiB avail
pgs: 846 active+clean
2 active+clean+scrubbing
1 active+clean+scrubbing+deep

io:
client: 204 B/s rd, 0 op/s rd, 0 op/s wr

[root@newcephnode4 ceph]#

We can double check our CRUSH map to validate the new node is added.

Figure 275: Ceph Dashboard – Cluster - CRUSH map

IBM Storage Ceph Licensing


From the publicly available IBM Storage Ceph product licensing FAQ:

https://www.ibm.com/support/pages/ibm-storage-ceph-product-licensing-frequently-asked-
questions-faq

What editions are offered?

IBM Storage Ceph is offered in two editions: Premium Edition and Pro Edition. For each edition, there
is an Object part number that limits use to only object protocols while the other parts include file,
block, and object protocols. For each of the above, clients can purchase perpetual licenses, annual
licenses, or monthly licenses. After the initial year, clients can purchase renewal or reinstatement
parts.

What is the difference between editions?

IBM Storage Ceph Premium Edition includes the required Red Hat Enterprise Linux subscriptions for
the IBM Storage Ceph nodes. IBM Storage Ceph Pro Edition requires the client to acquire Red Hat
Enterprise Linux subscriptions directly from Red Hat for the IBM Storage Ceph nodes. Both editions
include entitlement for IBM Storage Insights. The license for IBM Storage Insights (entitled via IBM
Spectrum Control) is limited to use with the IBM Storage Ceph environment ONLY.


What is the licensing metric for IBM Storage Ceph?

IBM Storage Ceph is licensed per TB of raw capacity. IBM defines a TB as 2^40 bytes (TiB). Customers
must purchase enough TiB entitlements to equal the total aggregate raw TiB of all OSD data devices
independent of the number of nodes, clusters, or how the underlying hardware architecture is
implemented. Once installed, the client can confirm license compliance by summing up the
"ceph_cluster_capacity_bytes" metric for all their clusters.

What is the minimum configuration that can be licensed?

Table 2: IBM Storage Ceph Minimum Licensing Configurations

Summary
This document has highlighted the steps required to plan, size and deploy an IBM Storage Ceph cluster.
In addition, we have covered the use of advanced Ceph functions including replication, compression
and encryption. If you are undertaking a POC or testing Ceph, this document should help you create a test
plan with the relevant test cases.


Appendix A - IBM Storage Ceph Tips


The following tips might help you during your POC or test.

ALLOW POOL DELETIONS

The Ceph Dashboard does not allow you to delete storage pools by default. You can change this by
navigating to Administration -> Configuration and setting mon_allow_pool_delete to true.
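
The same setting can be changed from the command line if you prefer:

ceph config set mon mon_allow_pool_delete true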

Figure 276: Ceph Dashboard – Administration - Configuration

CEPH ROLE-BASED ACCESS CONTROL (RBAC)

You might be required to showcase RBAC. The Ceph Dashboard has the following built-in user roles.

Figure 277: Ceph Dashboard – User Management - Roles

You can create a new user with the specific role you need to test.
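
If you also want to test a role from the command line, Dashboard users can be created with a specific role using the access-control commands. A sketch, assuming a password file in /tmp and the built-in read-only role (the user name is just an example):

echo 'MyP0Cpassw0rd!' > /tmp/pass.txt
ceph dashboard ac-user-create poc-viewer -i /tmp/pass.txt read-only
ceph dashboard ac-user-show poc-viewer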


Figure 278: Ceph Dashboard – User Management - Create

CHANGING THE CEPH DASHBOARD PASSWORD FROM THE COMMAND LINE

You can use this procedure to change the Ceph Dashboard admin password.

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=ia-changing-ceph-dashboard-password-
using-command-line-interface
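
For reference, a sketch of what this typically looks like, assuming the new password is stored in a temporary file:

echo 'NewPassw0rd!' > /tmp/dashboard_password.txt
ceph dashboard ac-user-set-password admin -i /tmp/dashboard_password.txt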

CEPH HEALTHCHECK HISTORY

You can issue the following command to check the cluster health history.

[root@cephnode1 ~]# ceph healthcheck history ls


Healthcheck Name First Seen (UTC) Last seen (UTC) Count Active
CEPHADM_APPLY_SPEC_FAIL 2024/07/21 10:26:31 2024/07/21 10:31:01 2 No
CEPHADM_DAEMON_PLACE_FAIL 2024/07/19 21:49:33 2024/07/19 22:43:11 6 No
CEPHADM_FAILED_DAEMON 2024/07/19 21:59:48 2024/07/19 21:59:48 1 No
CEPHADM_HOST_CHECK_FAILED 2024/07/20 00:00:27 2024/07/21 11:13:03 2 No
CEPHADM_REFRESH_FAILED 2024/07/18 20:32:01 2024/07/22 10:17:55 9 No
,
,
.

CEPH PLACEMENT GROUPS TOO HIGH

You are bound to see this warning depending on the number and size of disks you use per node. It is
better to use a large number of smaller disks than a small number of large disks. The optimal number
of PGs per OSD is 200-300.

Figure 279: Ceph Dashboard – Observability - Alerts


Change the limit with these commands. The PG Autoscaler is not really helpful for small POC or test
clusters.

[root@cephnode1 ~]# ceph config get mon mon_max_pg_per_osd


250
[root@cephnode1 ~]# ceph config set mon mon_max_pg_per_osd 500

USE CENTRALISED LOGGING

Containerization makes it difficult to debug issues. Remember to check the node’s /var/log/messages
file and also use journalctl to see daemon errors. It is highly recommended to enable centralised
logging to a file on Ceph as well. See here.

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=cluster-viewing-centralized-logs-ceph

[root@cephnode4 ~]# ceph config set global log_to_file true


[root@cephnode4 ~]# ceph config set global mon_cluster_log_to_file true

CEPH DASHBOARD SSO AND ACTIVE DIRECTORY FOR RGW

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=dashboard-enabling-single-sign-ceph
https://www.ibm.com/docs/en/storage-ceph/7.1?topic=configuration-configuring-active-directory-
ceph-object-gateway

POWERING DOWN AND REBOOTING A STORAGE CEPH CLUSTER

https://www.ibm.com/docs/en/storage-ceph/7.1?topic=management-powering-down-rebooting-
cluster
