
SSDs for Big Data: Fast Processing Requires High-Performance Storage

Micron Technology, Inc. Technical Marketing Brief

Executive Summary
Big data applications handle extremely large datasets that present challenges of scale. High-performance IT infrastructure is necessary to achieve very fast processing throughput for big data. Solid state drives (SSDs) based on NAND Flash memory are well-suited for big data applications because they provide ultra-fast storage performance, quickly delivering an impressive return on investment. SSDs can be deployed as host cache, network cache, all-SSD storage arrays, or hybrid storage arrays with an SSD tier. Depending on the big data application, either enterprise-class or personal storage SSDs may be used. Enterprise SSDs are robust and durable, offering superior performance for mixed read/write workloads, while personal storage SSDs typically cost less and are suitable for read-centric workloads. It is important to understand the workload's performance and endurance requirements before making a decision.

Examples of big data applications include:

- Business analytics to drive insight, innovation, and predictions
- Scientific computing, such as seismic processing, genomics, and meteorology
- Real-time processing of data streams, such as sensor data or financial transactions
- Web 2.0 public cloud services, such as social networking sites, search engines, video sharing, and hosted services

The primary reason for implementing big data solutions is productivity and competitive advantage. If analyzing customer data opens up new, high-growth market segments; or if analyzing product data leads to valuable new features and innovations; or if analyzing seismic images pinpoints the most productive places to drill for oil and gas, then big data is ultimately about big success.

Big data presents challenges of extreme scale. It pushes the limits of IT applications and infrastructure for processing large datasets quickly and cost-effectively. Many technologies and techniques have been developed to meet these challenges, such as distributed computing, massively parallel processing (e.g., Apache Hadoop), and data structures that limit the data required for queries (e.g., bitmaps and column-oriented databases). Underlying all of this is the constant need for faster hardware with greater capacity, because big data requires fast processing throughput: faster, multicore CPUs, greater memory performance and capacity, improved network bandwidth, and higher storage capacity and throughput.

The Big Picture of Big Data


In a scenario where data grows 40% year-over-year, where 90% of the world's data was created in the last two years, and where terabytes (TBs) and petabytes (PBs) are talked about as glibly as megabytes and gigabytes, it is easy to mistakenly think that all data is big data. In fact, big data refers to datasets so large they are beyond the ability of traditional database management systems and data processing applications to capture and process. While the exact amount of data that qualifies as big is debatable, it generally ranges from tens of TBs to multiple PBs. The Gartner Group further characterizes it in terms of volume of data, velocity into and out of a system (e.g., real-time processing of a data stream), and variety of data types and sources. Big data is typically unstructured, or free-floating, and not part of a relational database schema.

SSDs for Ultra-Fast Storage


SSDs have emerged as a popular choice for ultra-fast storage in enterprise environments, including big data applications. SSDs offer a level of price-to-performance somewhere between DRAM and hard disk drives (HDDs).


SSDs are an order of magnitude denser and less expensive than DRAM, but DRAM has higher bandwidth and significantly faster access times. Compared to HDDs, SSDs offer orders of magnitude faster random I/O performance and lower cost per IOPS, but HDDs still offer the best price per gigabyte. With capacity pricing for Flash memory projected to fall faster than that of other media, the SSD value proposition will continue to strengthen.

SSD Benefits

- Exceptional Storage Performance: SSDs deliver good sequential I/O and outstanding random I/O performance. In many systems, storage I/O acts as a bottleneck while powerful, multicore CPUs sit idle waiting for data to process. SSDs remove the bottleneck and unleash application performance, enabling true processing throughput and user productivity.
- Nonvolatile: SSDs retain data when power is removed; unlike DRAM, no destaging is required.
- Low Power: SSDs consume less power per system than equivalent spinning disks, reducing data center power and cooling expenses.
- Flexible Deployment: SSDs are available in a wider variety of form factors and interfaces than other storage solutions:
  - Form factors: half-height, half-length (HHHL), 2.5-inch, 1.8-inch, mSATA, M.2, etc.
  - Interfaces: PCIe, SAS, and SATA

SSD Deployment Options

Host Cache: SSDs reside in the host server and act as a level-2 cache for data moved out of memory; intelligent caching software determines which blocks of data to hold in cache. PCIe SSDs are typically used because no host controllers or adapters are involved, giving the lowest latency. Best results are achieved for heavy read workloads. The cache may be read-only or write-back; redundant SSDs are recommended for write-back caching to ensure data is protected. (A minimal sketch of the caching idea follows this list.)

Network Cache: Similar to host cache, except the SSDs reside in a shared network appliance that accelerates all storage systems behind it. Out-of-band cache is read-only, while in-band cache is write-back. Network cache offers a better economic benefit because it is shared, but it can be slower than direct host cache.

All-SSD Storage Array: An enterprise storage array that uses Flash for storage and DRAM for ultra-high throughput and low latency. All-SSD arrays offer features traditionally found in enterprise storage, such as built-in RAID, snapshots, and replication, and they may include technologies like inline compression and deduplication to shrink the data footprint and maximize SSD efficiency. This option also provides array-wide management of SSD wearout.

SSD Tier in a Hybrid Storage Array: A traditional enterprise storage array that includes SSDs as an ultra-fast tier in a hybrid storage environment. Automated storage management monitors data usage, placing hot data in the SSD tier and cold (less frequently accessed) data in high-capacity, slower HDD tiers to optimize storage performance and cost. This option works well for mixed data, some of which requires very high performance. A variation on hybrid storage incorporates an SSD as a secondary cache in the storage controller's read/write cache.
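To make the host-cache concept concrete, here is a minimal sketch of a block-level, level-2 read cache with LRU eviction. It is illustrative only: the `ssd` and `hdd` stores and their dict-like interface are assumptions for the example, and real caching software adds write-back handling, persistence, and fault tolerance.

```python
from collections import OrderedDict

class HostReadCache:
    """Minimal level-2 block read cache with LRU eviction (illustrative sketch)."""

    def __init__(self, capacity_blocks, ssd, hdd):
        self.capacity = capacity_blocks
        self.ssd = ssd            # fast cache device: assumed dict-like block store
        self.hdd = hdd            # slow backing store: assumed dict-like block store
        self.lru = OrderedDict()  # block_id -> True, ordered oldest to newest

    def read(self, block_id):
        if block_id in self.lru:              # cache hit: serve from the SSD
            self.lru.move_to_end(block_id)
            return self.ssd[block_id]
        data = self.hdd[block_id]             # cache miss: read from the HDD
        self._insert(block_id, data)
        return data

    def _insert(self, block_id, data):
        if len(self.lru) >= self.capacity:    # evict the least-recently-used block
            victim, _ = self.lru.popitem(last=False)
            del self.ssd[victim]
        self.ssd[block_id] = data
        self.lru[block_id] = True

# Example: a 2-block cache in front of a 3-block backing store
cache = HostReadCache(capacity_blocks=2, ssd={}, hdd={1: b"a", 2: b"b", 3: b"c"})
cache.read(1); cache.read(2); cache.read(3)   # block 1 is evicted by the third read
```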

Choosing the Right SSD in Big Data Deployments

SSDs in general are rated for 1 or 2 million device hours (MTTF), which translates to a century or two of operation; NAND Flash cells wear out only when they are written to. Enterprise-class SSDs are designed for high reliability, maximum durability, and fast, consistent performance. Under write workloads, enterprise-class SSDs last 10 to 1,000 times longer than personal storage SSDs, and while Flash memory performance tends to degrade with use, enterprise SSDs maintain performance over time. Their write performance is 2 to 12 times better than personal storage SSDs, and their read performance is comparable or better; the price per gigabyte, however, is 2 to 30 times higher. Big data applications in large corporate data centers, such as scientific computing and business analytics, are often characterized by mixed read/write workloads that require very low latency and massive IOPS, a good match for durable, robust enterprise-class SSDs.


Personal storage SSDs are designed for good read performance and tailored reliability and durability. They are optimized for workloads where reads are more frequent than writes, and they offer high capacity at a lower price per gigabyte than enterprise-class SSDs. Web 2.0 public cloud applications like social networking sites are characterized by users uploading image, video, and audio files, which are subsequently downloaded or streamed by other users. This write-once, read-many-times workload is a good candidate for personal storage SSDs.

Application Considerations for Enterprise vs. Personal Storage SSDs


Not all big data deployments are the same, and not all SSDs are the same. The question is how to match the right SSD to the right big data deployment. Choosing an SSD solution is based primarily on the performance and availability requirements of the application. The decision tree in Figure 1 and the following Q&A will help you choose the optimal SSD solution for your application.

The decision tree in Figure 1 walks through the following questions:

- What is the IOPS performance requirement for your application, including read/write mix? A mixed read/write workload points directly to an enterprise SSD; a read-only (read-centric) workload continues down the tree.
- Is the estimated durability of personal storage SSDs at least 3 to 5 years? If so, personal storage SSDs are a candidate; if not, replacement costs must be weighed in the TCO question below.
- Will personal storage SSDs deliver sufficient IOPS performance? If not, choose an enterprise SSD.
- What is the drive TCO over 3 to 5 years, including replacement cost? If the total cost of personal storage SSDs (including replacements) exceeds that of enterprise SSDs, choose an enterprise SSD; otherwise, choose a personal storage SSD.

Drive TCO = Cost of drives + Cost of downtime + Cost of slowdown + Cost of IT labor + Risk of data loss

Figure 1: Decision Tree for Enterprise-Class vs. Personal Storage SSD
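Expressed as code, the decision tree reduces to a few comparisons. The sketch below is one reading of the figure, not an exact transcription of its branches; all inputs (workload mix, durability estimate, IOPS sufficiency, and TCO figures) are values you would supply from your own measurements.

```python
def choose_ssd(mixed_read_write, durability_ok, iops_sufficient,
               personal_tco, enterprise_tco):
    """Sketch of the Figure 1 decision flow (inputs are caller-supplied estimates)."""
    if mixed_read_write:
        return "Enterprise SSD"        # mixed read/write workloads need enterprise endurance
    if not iops_sufficient:
        return "Enterprise SSD"        # personal SSDs cannot meet the IOPS target
    if durability_ok:
        return "Personal Storage SSD"  # will outlast the system at a lower price
    # Personal SSDs would wear out early: compare 3- to 5-year TCO with replacements
    return "Enterprise SSD" if personal_tco > enterprise_tco else "Personal Storage SSD"
```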


Question #1: What is the IOPS performance requirement for your application, including the read/write mix?
The first step is to quantify the workload that the SSDs will support. An application workload can be measured using a variety of performance monitoring tools. Beyond the workload itself, also consider the configuration of the system and the impact on the overall platform.
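As one example of quantifying a workload, the sketch below samples Linux's /proc/diskstats twice to estimate read and write IOPS and the read/write mix for a block device. The device name and sampling interval are assumptions to adjust for your system; dedicated tools (iostat, vendor utilities) provide far more detail.

```python
import time

def disk_iops(device="sda", interval=5.0):
    """Estimate read/write IOPS and read mix from two /proc/diskstats samples (Linux)."""
    def completed_ios():
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == device:
                    # field 4 = reads completed, field 8 = writes completed
                    return int(fields[3]), int(fields[7])
        raise ValueError(f"device {device!r} not found")

    r0, w0 = completed_ios()
    time.sleep(interval)
    r1, w1 = completed_ios()
    read_iops = (r1 - r0) / interval
    write_iops = (w1 - w0) / interval
    total = read_iops + write_iops
    read_mix = read_iops / total if total else 0.0
    return read_iops, write_iops, read_mix
```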

Question #2: What are the endurance requirements for your application?
For mixed read/write workloads, it is important to look closely at SSD durability ratings, usually expressed as total bytes written (TBW) or as full drive writes per day over a 5-year period. By comparing an application's daily write total with the durability rating of an SSD, it is possible to estimate the drive's lifetime in your environment (assuming a constant workload; it is wise to also estimate future growth). If the write workload is small enough that the estimated lifetime of a personal storage SSD will at least equal the 3 to 5 years typically expected of an IT system, and performance is sufficient, then personal storage SSDs can be a good choice. However, if personal storage SSDs will likely wear out and need to be replaced during the IT system's lifetime, then replacement costs should be considered.
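As a worked example (with made-up rating and workload numbers), the lifetime estimate described above divides the drive's TBW rating by the application's daily write volume; an optional growth rate shows how a rising write load shortens the estimate.

```python
def estimated_lifetime_years(tbw_rating_tb, daily_writes_tb, annual_growth=0.0):
    """Estimate SSD lifetime in years from its TBW rating and daily write volume.

    A rating given as drive writes per day (DWPD) converts to TBW as:
    TBW = DWPD * capacity_TB * 365 * warranty_years.
    """
    if annual_growth == 0.0:
        return tbw_rating_tb / (daily_writes_tb * 365)
    # With growth, consume the TBW budget year by year
    remaining, years, yearly_writes = tbw_rating_tb, 0.0, daily_writes_tb * 365
    while yearly_writes < remaining:
        remaining -= yearly_writes
        years += 1
        yearly_writes *= 1 + annual_growth
    return years + remaining / yearly_writes

# Hypothetical example: a 72 TBW personal storage SSD under 30 GB/day of writes
print(estimated_lifetime_years(72, 0.03))   # about 6.6 years: outlasts a 5-year system
```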

Question #3: What is the SSD total cost of ownership (TCO) over 3 to 5 years?
The SSD TCO over the system lifetime includes:

- Cost of Drives: Determine how many drives will need to be purchased during a 3- to 5-year period, including replacements due to wearout, and multiply this figure by the acquisition cost.
- Cost of Application Downtime: If the application needs to be taken offline to replace an SSD, what is the cost of that lost productivity? Multiply this figure by the number of replacements.
- Cost of Slower Application Performance: If the application does not have to go offline for drive replacements, but system performance slows during the replacement and the subsequent data replication or RAID rebuild, how will this affect user productivity? Multiply this cost by the number of replacements.
- Cost of Labor for Drive Replacement: Drive monitoring and replacement is an additional management task for the IT staff, so the cost of labor should be included.
- Risk of Data Loss: Unprotected drives carry a significant risk of data loss, and even RAID-protected drives carry a small risk during the RAID rebuild window. Though difficult to quantify, these risks should be factored into the cost.
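These components map directly onto the Drive TCO formula from Figure 1. The sketch below is hypothetical; every cost figure is a placeholder to replace with your own estimates.

```python
def drive_tco(drive_cost, n_drives, n_replacements,
              downtime_cost=0.0, slowdown_cost=0.0,
              labor_cost_per_swap=0.0, data_loss_risk_cost=0.0):
    """Drive TCO = cost of drives + downtime + slowdown + IT labor + data-loss risk."""
    drives = drive_cost * (n_drives + n_replacements)
    return (drives
            + downtime_cost * n_replacements
            + slowdown_cost * n_replacements
            + labor_cost_per_swap * n_replacements
            + data_loss_risk_cost)

# Hypothetical 5-year comparison for 10 drive bays (all dollar figures invented):
personal = drive_tco(300, 10, n_replacements=20, slowdown_cost=150,
                     labor_cost_per_swap=75, data_loss_risk_cost=2000)
enterprise = drive_tco(1200, 10, n_replacements=0)
print(personal, enterprise)   # 15500.0 vs 12000.0: enterprise wins in this scenario
```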

Conclusion
SSDs are a popular solution for big data applications. Deciding between personal storage and enterprise-class SSDs will depend on performance and endurance requirements and TCO.


micron.com
Products are warranted only to meet Micron's production data sheet specifications. Products and specifications are subject to change without notice.
©2013 Micron Technology, Inc. Micron and the Micron logo are trademarks of Micron Technology, Inc. All other trademarks are the property of their respective owners. All rights reserved. 04/13 EN
