White Paper
Implement high-performance
object storage with MinIO
and IBM
Achieve robust performance for AI, IoT and more using MinIO
and IBM Power Systems servers with POWER9 processors
Executive summary
Object storage presents several important benefits for accommodating
fast-growing volumes of unstructured data. With the right object storage
solution and hardware infrastructure, organizations can also achieve the
robust performance required for supporting computationally intensive
workloads, including artificial intelligence (AI)/machine learning,
Internet of Things (IoT), and big data analytics.

Recent benchmark testing shows that MinIO object storage running on IBM
Power Systems servers with IBM POWER9 processors can deliver exceptional
throughput performance (up to 25 GB/s in aggregate for four servers) plus
linear scalability as clusters grow. That level of performance enables
organizations to unlock the full value of their data while also
capitalizing on the scalability, accessibility, data protection, and
cost-effectiveness of object storage.

Launching data-intensive initiatives
Across industries, organizations are launching new technology initiatives
that require them to store, access, and analyze large, fast-growing
volumes of data. Whether they are implementing artificial intelligence
(AI)/machine learning, capitalizing on Internet of Things (IoT)
technology, or employing other big data solutions, these organizations
might need to store and analyze tens or hundreds of petabytes of data.

Much of that data is unstructured. From multimedia files and text
documents to web pages and log files, unstructured data can be difficult
to query, making it challenging for organizations to work with all of the
data they are collecting. Traditional hierarchical file storage systems
and block storage are not the best fit for these unstructured data
volumes.

Object storage offers an important alternative to file- and block-based
storage for big data, as proven by organizations with hyperscale
environments. Object storage provides the right combination of
cost-effective scalability, data integrity, and accessibility that many
organizations need. It also offers the flexibility to disaggregate
storage from compute resources, enabling organizations to optimize
compute and storage for specific workflows. As a result, object storage
is fast becoming the default storage option for these organizations.

Using the right hardware infrastructure, object storage can also provide
a fundamentally different performance profile than other types of
storage, enabling organizations to implement new use cases and launch
more ambitious projects. High-performance object storage can support
workloads ranging from training AI algorithms to analyzing IoT data.
Running MinIO object storage with IBM Power Systems servers based on IBM
POWER9 processors can deliver this level of performance, opening
important opportunities for enterprises deploying workloads in private
cloud or multicloud environments.

Recognizing the advantages of object storage
For storing large, rapidly expanding volumes of unstructured data, object
storage can present your organization with several advantages over more
traditional file- or block-based storage.

Scalability
Object storage is designed to scale. Instead of the nested files and
folders used by hierarchical file systems, object storage uses a flat
structure. That structure enables you to store billions of files without
the complexity and performance issues that can develop as you scale
hierarchical environments. Object storage also lets you scale
incrementally: you can scale performance or capacity simply by adding
racks of clusters.

Fast retrieval
With MinIO object storage, each object has metadata and uses the URL as a
unique identifier. These tags and ID numbers help eliminate the need to
know the exact location of data within the storage environment. Every
object is accessible from anywhere through its unique URL; only standard
IP routing and DNS mechanisms are required. The right object storage
solution can also avoid the bottleneck of a centralized metadata server,
storing the metadata alongside objects.
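
As a rough illustration of the flat addressing described under Fast
retrieval, the Python sketch below models a store in which every object is
reached through a single URL-derived key and carries its metadata with it.
The class names, the endpoint https://storage.example.com, and the bucket
and key are hypothetical; this is a toy in-memory model of the idea, not
MinIO's API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: its bytes plus the metadata kept alongside it."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy in-memory store with a flat key space (no nested directories)."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint.rstrip("/")
        self._objects = {}  # maps "bucket/key" -> StoredObject

    def url_for(self, bucket: str, key: str) -> str:
        # The URL doubles as the object's unique identifier.
        return f"{self.endpoint}/{bucket}/{key}"

    def put(self, bucket: str, key: str, data: bytes, **metadata) -> str:
        self._objects[f"{bucket}/{key}"] = StoredObject(data, dict(metadata))
        return self.url_for(bucket, key)

    def get(self, url: str) -> StoredObject:
        # One dictionary lookup; no directory tree is walked.
        path = url[len(self.endpoint) + 1:]
        return self._objects[path]


# Hypothetical endpoint, bucket, and key, used only for illustration.
store = FlatObjectStore("https://storage.example.com")
url = store.put("sensors", "2019/04/device-17.json", b'{"temp": 21.5}',
                content_type="application/json")
obj = store.get(url)
print(url)
print(obj.metadata)
```

The slashes in the key look like folders, but the store treats the whole
string as one flat name, which is why lookup cost does not depend on how
deep a "path" appears to be.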
Data protection and preservation
Object storage solutions protect and preserve data more efficiently than
other types of storage architectures. By using data protection
capabilities such as erasure coding, object storage can protect data
using far less raw storage capacity than RAID-based architectures. Data
protection capabilities can also help quickly repair problems on a
per-object basis, instead of on a per-disk basis, helping to avoid data
loss and to maintain high availability of data.

Cost-effectiveness
The ability of object storage to scale incrementally, without forklift
upgrades, can help you control storage costs. In addition, object storage
data protection capabilities help eliminate the need for numerous copies
of files, reducing the raw storage capacity required to safeguard data
and driving down capital expenditures.

Unlocking the full value of data with high-performance object storage
Object storage has not always been used for high-performance workloads.
In fact, some organizations employ object storage as a backup environment
or a long-term disk-based archive.

Object storage does have advantages for these use cases. By storing
objects along with metadata, object storage can make it easier for users
to find and retrieve the files, media clips, or entire projects they need
among millions or billions of files. At the same time, data protection
capabilities can help securely preserve data over the long term.

Yet to maximize the value of data residing in object storage, you need to
be able to consume it quickly. High-performance object storage solutions
can help you extend the benefits of object storage to new use cases and
extract more value from your stored data. If you can achieve sufficient
throughput, you can use object storage for big data and IoT analytics, as
well as AI/machine learning workloads.

Until recently, big data, IoT, and AI workloads often drove organizations
to employ Hadoop Distributed File System (HDFS) storage. With HDFS, you
bring the algorithm to the data. Each node computes a part of the
algorithm using local storage and then sends the results back to a
centralized server, where results are aggregated. This approach can work
well for some algorithms, and it can offer scalability for large-scale
collections of data.

However, object storage presents several advantages over HDFS. For
example, object storage can provide greater flexibility for balancing
compute and storage across your environment. Using high-speed networking
with your object storage environment, you can consume your compute and
storage resources in the optimal way for each particular workload.

Object storage also requires less capacity than HDFS to ensure data
protection for the same amount of data. While HDFS stores multiple copies
of each file, object storage can use data protection capabilities such as
erasure coding to protect data more efficiently. Object storage also
helps eliminate the risk of using a single master node, which can become
a single point of failure. Overall, high-performance object storage
provides a more efficient and reliable way to support data-intensive
workloads than HDFS.

Capitalizing on MinIO high-performance object storage with enterprise
capabilities
MinIO high-performance distributed object storage is designed for
large-scale data environments. It is a well-suited Amazon S3–compatible
replacement for HDFS, especially when used for AI/machine learning, IoT,
and other big data workloads.
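
The capacity comparison between keeping multiple copies and using erasure
coding can be made concrete with a little arithmetic. The Python sketch
below is only an illustration under stated assumptions: three copies is
the commonly used HDFS default replication factor, while the 12 data + 4
parity and 8 data + 8 parity layouts are hypothetical examples and are not
configurations reported in this paper.

```python
def raw_capacity_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure(usable_tb: float, data_shards: int,
                         parity_shards: int) -> float:
    """Raw capacity needed when data is split into data shards plus parity."""
    return usable_tb * (data_shards + parity_shards) / data_shards


usable = 100.0  # TB of user data to protect (illustrative figure)

# Assumption: three full copies of each file.
replicated = raw_capacity_replication(usable, copies=3)

# Assumption: hypothetical erasure-coding layouts.
erasure_12_4 = raw_capacity_erasure(usable, data_shards=12, parity_shards=4)
erasure_8_8 = raw_capacity_erasure(usable, data_shards=8, parity_shards=8)

print(f"3 copies:        {replicated:.1f} TB raw")
print(f"12 data + 4 par: {erasure_12_4:.1f} TB raw")
print(f"8 data + 8 par:  {erasure_8_8:.1f} TB raw")
```

Under these assumptions, even the layout with as much parity as data needs
less raw capacity than three full copies.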
To achieve the object storage performance needed for AI, IoT, and big
data workloads, the POWER9-based servers take advantage of PCIe 4.0
technology. PCIe 4.0 doubles the bandwidth offered by PCIe 3.0, which
remains the standard used by other CPU architectures.

Several POWER9-based servers also feature a storage-rich design that
supports processing and analysis of very large data volumes. The Power
Systems LC922, which offers the highest storage capacity in the Power
Systems portfolio, supports up to 120 TB of capacity in a 2U form factor.
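
To put the PCIe comparison in numbers, the Python sketch below computes
the theoretical per-slot bandwidth of each generation from the published
per-lane signalling rates (8 GT/s and 16 GT/s) and the 128b/130b line
encoding. The x16 slot width is an assumption chosen for illustration, and
the figures ignore packet and protocol overhead, so real devices will see
somewhat less.

```python
# Per-lane signalling rate in gigatransfers per second for each generation.
GT_PER_S = {"PCIe 3.0": 8.0, "PCIe 4.0": 16.0}

# Both generations use 128b/130b encoding: 128 payload bits per 130 sent.
ENCODING_EFFICIENCY = 128 / 130


def lane_gb_per_s(generation: str) -> float:
    """Theoretical one-direction bandwidth of a single lane in GB/s."""
    return GT_PER_S[generation] * ENCODING_EFFICIENCY / 8  # bits -> bytes


def slot_gb_per_s(generation: str, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth of a slot (x16 assumed)."""
    return lane_gb_per_s(generation) * lanes


# A 100 Gb/s network link expressed in GB/s, for comparison.
NETWORK_GB_PER_S = 100 / 8

for gen in GT_PER_S:
    print(f"{gen} x16: {slot_gb_per_s(gen):.2f} GB/s "
          f"(100 Gb/s link: {NETWORK_GB_PER_S:.1f} GB/s)")
```

The doubling comes entirely from the per-lane rate; the encoding is the
same in both generations.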

Figure 1: The test environment included four IBM Power Systems LC922
POWER9 servers (right), four IBM Power Systems S822LC servers as clients,
and 100 GbE networking.

The MinIO team first evaluated throughput performance for accelerated
versions of two computationally intensive algorithms: erasure coding and
HighwayHash (for bitrot detection).

Erasure coding
With MinIO, erasure coding is designed to take place inline on a
per-object basis. When you store 1 GB of data, MinIO splits up that data
across a large number of drives and creates the appropriate amount of
parity data on separate drives. Depending on the parity configuration you
choose, you can afford to lose up to half of the servers and half of the
drives and still be able to reconstruct all of your data. Running erasure
coding inline, instead of offline, enables you to start protecting data
the moment you store it, but it inherently demands high-performance
object storage, which MinIO is able to provide.

In the benchmark testing, the optimized erasure coding algorithm running
on POWER9 systems achieved throughput of 7–9 GB/s per core, which is
critical for saturating the fast 100 Gb network. This level of throughput
for the optimized algorithm reflects the robust performance of the POWER9
system architecture, which is particularly well suited for this type of
high-throughput workload.

Bitrot detection
Similar to erasure coding, MinIO is designed to run bitrot detection on
the fly. MinIO's implementation of the HighwayHash algorithm helps
prevent the reading of corrupt data. The algorithm computes a hash on
read and verifies the hash on write from the application. Any change in
the hash fingerprint indicates data corruption and requires the use of
parity data instead of the corrupted data.

Hashing operations require considerable CPU resources, but the
POWER9-based servers can deliver the required performance. In the
benchmark testing, the optimized HighwayHash algorithm running on the
POWER9 servers achieved throughput of 5 GB/s per core, which can saturate
the 100 Gb network.

COSBench
The team also ran COSBench, a commonly used open source benchmarking
tool, to measure the performance of object storage services. COSBench
testing used four POWER9-based systems, each with four NVMe drives and
connected with 100 Gb/s networking.

The team ran COSBench on the four clients with 256 threads per client
(1024 total). Each test typically took about an hour, with a prepare
(WRITE) stage of 20–30 minutes, a 20-minute main (READ) stage, and a
final cleanup stage. The team uploaded and downloaded more than 10 TB of
data to mitigate any memory caching effects that could inflate the
performance numbers.

Object-size benchmarks: The team used the four-node cluster to benchmark
MinIO object storage read and write throughput for objects of increasing
size. Read performance reached 18 GB/s and stayed constant through 32 MB
and 64 MB object sizes. For larger objects, the write performance
achieved 50 percent of the read performance, which is a strong result.

Object Size 10 MB 20 MB 32 MB 64 MB
Read (GB/s) 14.9 18.1 18.7 18.0
Write (GB/s) 5.7 7.3 10.1
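
The interplay between erasure coding and bitrot detection described above
can be sketched in a few lines of Python. This is a deliberately
simplified model, not MinIO's implementation: a single XOR parity shard
stands in for a real erasure code (which can tolerate many more failures),
and hashlib's blake2b stands in for HighwayHash, which is not in the
Python standard library. All function names are hypothetical.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def digest(shard: bytes) -> str:
    # Assumption: blake2b is a stand-in for HighwayHash in this sketch.
    return hashlib.blake2b(shard, digest_size=16).hexdigest()


def encode(data: bytes, data_shards: int):
    """Split data into equal shards, add one XOR parity shard, hash each."""
    shard_len = -(-len(data) // data_shards)  # ceiling division
    padded = data.ljust(shard_len * data_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len]
              for i in range(data_shards)]
    shards.append(reduce(xor_bytes, shards))  # parity shard goes last
    return shards, [digest(s) for s in shards]


def decode(shards, digests, original_len: int) -> bytes:
    """Verify every shard hash; rebuild at most one bad shard from the rest."""
    bad = [i for i, s in enumerate(shards) if digest(s) != digests[i]]
    if len(bad) > 1:
        raise ValueError("single parity can repair only one corrupted shard")
    if bad:
        good = [s for i, s in enumerate(shards) if i != bad[0]]
        shards = list(shards)
        shards[bad[0]] = reduce(xor_bytes, good)  # XOR of the others
    return b"".join(shards[:-1])[:original_len]


payload = b"sensor readings " * 64
shards, digests = encode(payload, data_shards=4)

# Simulate silent corruption of one shard on one drive.
shards[2] = b"\xff" + shards[2][1:]

recovered = decode(shards, digests, len(payload))
print(recovered == payload)
```

The hash mismatch is what identifies which shard to distrust; the parity
data is what makes it possible to rebuild that shard instead of returning
corrupted bytes.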

Cluster scaling benchmarks: The team also benchmarked MinIO cluster
scaling by increasing the number of nodes used in the test. The COSBench
test demonstrated a maximum read performance of about 25 GB/s in
aggregate for the four POWER9-based servers.

Expanding the cluster could also boost read performance. Because MinIO
clusters can grow to any number of servers, and overall throughput
increases as cluster size increases, the total read performance could be
higher than 25 GB/s.

Number of Servers 1 2 3 4
Throughput (GB/s) 10.5 19.4 24.1 25.4

Figure 3: MinIO Server performance increases as the cluster size expands.

Benchmarking summary
Results from the erasure coding, bitrot, and COSBench testing all show
the impressive throughput performance that can be achieved with MinIO
Server on POWER9-based systems. The results of the erasure coding and
bitrot detection algorithm testing highlight how well this architecture
handles these two specific computationally intensive processes. But the
results also suggest that this architecture could deliver strong results
for computationally intensive AI, IoT, and big data workloads.

Moving forward with MinIO and IBM
Object storage provides an important alternative to file and block
storage for large and growing volumes of unstructured data. By selecting
high-performance object storage, your organization can extend the
benefits of object storage to new use cases, including AI/machine
learning, IoT, and other big data workloads. Employing MinIO in
combination with IBM Power Systems servers based on POWER9 processors can
deliver the performance to support those workloads and unlock greater
value from data.

Learn more
To discover more about MinIO benefits for AI, IoT, and additional big
data workloads, visit: https://min.io

To learn more about the complete line of the IBM Power Systems family,
visit: ibm.com/it-infrastructure/power

IBM, the IBM logo and ibm.com are trademarks or registered trademarks
of International Business Machines Corporation in the United States, other
countries, or both. If these and other IBM trademarked terms are marked
on their first occurrence in this information with a trademark symbol
(® or ™), these symbols indicate US registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks
may also be registered or common law trademarks in other countries.
A current list of IBM trademarks is available on the Web at “Copyright and
trademark information” at ibm.com/legal/copytrade.shtml. Other company,
product and service names may be trademarks or service marks of others.