SSDs for Big Data
Executive Summary
Big data applications handle extremely large datasets that present challenges of scale. High-performance IT infrastructure is necessary to achieve very fast processing throughput for big data. Solid state drives (SSDs) based on NAND Flash memory are well-suited for big data applications because they provide ultra-fast storage performance, quickly delivering an impressive return on investment. SSDs can be deployed as host cache, network cache, all-SSD storage arrays, or hybrid storage arrays with an SSD tier. Depending on the big data application, either enterprise-class or personal storage SSDs may be used. Enterprise SSDs are robust and durable, offering superior performance for mixed read/write workloads, while personal storage SSDs typically cost less and are suitable for read-centric workloads. It is important to understand the workload performance and endurance requirements before making a decision.
predictions
Scientific computing, such as seismic processing,
sites, search engines, video sharing, and hosted services
The primary reason for implementing big data solutions is productivity and competitive advantage. If analyzing customer data opens up new, high-growth market segments; or if analyzing product data leads to valuable new features and innovations; or if analyzing seismic images pinpoints the most productive places to drill for oil and gas, then big data is ultimately about big success.
Big data presents challenges of extreme scale. It pushes the limits of IT applications and infrastructure for processing large datasets quickly and cost-effectively. Many technologies and techniques have been developed to meet these challenges, such as distributed computing, massively parallel processing (e.g., Apache Hadoop), and data structures that limit the data required for queries (e.g., bitmaps and column-oriented databases). Underlying all of this is the constant need for faster hardware with greater capacity because big data requires fast processing throughput, which means faster, multicore CPUs, greater memory performance and capacity, improved network bandwidth, and higher storage capacity and throughput.
SSDs are an order of magnitude denser and less expensive than DRAM, but DRAM has higher bandwidth and significantly faster access times. Compared to HDDs, SSDs offer orders of magnitude faster random I/O performance and lower cost per IOPS, but HDDs still offer the best price per gigabyte. With capacity pricing for Flash memory projected to fall faster than other media, the SSD value proposition will continue to strengthen in the future.
SSD Benefits
Exceptional Storage Performance: Deliver good sequential I/O and outstanding random I/O performance. For many systems, storage I/O acts as a bottleneck, while powerful, multicore CPUs sit idle waiting for data to process. SSDs remove the bottleneck and unleash application performance, enabling true processing throughput and user productivity.
Nonvolatile: Retain data when power is removed; unlike DRAM, no destaging is required.
Low Power: Consume less power per system than equivalent spinning disks, reducing data center power and cooling expenses.
Flexible Deployment: Available in a unique variety of form factors and interfaces compared to other storage solutions:
o Form factors: Half-height, half-length (HHHL), 2.5-inch, 1.8-inch, mSATA, M.2, etc.
o Interfaces: PCIe, SAS, and SATA
array that uses Flash for storage and DRAM for ultrahigh throughput and low latency. All-SSD arrays offer features like built-in RAID, snapshots, and replication traditionally found in enterprise storage. They may include technologies like inline compression and deduplication to shrink the data footprint and maximize SSD efficiency. This option also manages SSD wearout across the entire array.
SSD Tier in a Hybrid Storage Array: A traditional enterprise storage array that includes SSDs as an ultra-fast tier in a hybrid storage environment. Automated storage management monitors data usage and places hot data in the SSD tier and cold, or less-frequently accessed, data in high-capacity, slower HDD tiers to optimize storage performance and cost. This option works well for mixed data, some of which requires very high performance. A variation on hybrid storage is to incorporate an SSD as secondary cache in the storage controller's read/write cache.
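The hot/cold placement described for the hybrid array can be illustrated with a minimal Python sketch. It assumes a simple access-count policy: the most frequently accessed blocks go to the SSD tier until it is full, and everything else stays on HDD. The function name `assign_tiers`, the block identifiers, and the policy itself are hypothetical; commercial arrays likely use more sophisticated heuristics that this document does not describe.

```python
from collections import Counter


def assign_tiers(access_log, ssd_capacity_blocks):
    """Map each block seen in access_log to "ssd" or "hdd".

    access_log: iterable of block IDs, one entry per access (hypothetical input).
    ssd_capacity_blocks: how many blocks fit in the SSD tier.
    """
    counts = Counter(access_log)
    # The most frequently accessed blocks are treated as "hot".
    hot = {block for block, _ in counts.most_common(ssd_capacity_blocks)}
    return {block: ("ssd" if block in hot else "hdd") for block in counts}


if __name__ == "__main__":
    log = ["a", "a", "a", "b", "b", "c"]  # made-up access trace
    print(assign_tiers(log, ssd_capacity_blocks=1))
```

Re-running the assignment periodically over a recent window of accesses would let blocks migrate between tiers as usage changes.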
as a level-2 cache for data moved out of memory. Intelligent caching software determines which blocks of data to hold in cache. Typically, PCIe SSDs are used because they offer the lowest latency, with no host controllers or adapters involved. Best results are achieved for heavy read workloads. Cache may be read-only or write-back. Redundant SSDs are recommended for write-back to ensure data is protected.
Network Cache: Similar to host cache, except SSDs reside in a shared network appliance that accelerates all storage systems behind it. Out-of-band cache is read-only, while in-band is write-back. Network cache offers a better economic benefit because it is shared, but it can be slower than direct host cache.
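To make the read-only cache idea concrete, here is a minimal Python sketch of a block cache that sits in front of slower storage. The eviction policy (least recently used) is an assumption chosen for illustration; the document does not say which policy caching software uses. The class name `ReadCache` and the `backing_read` callback are hypothetical.

```python
from collections import OrderedDict


class ReadCache:
    """Read-only block cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity, backing_read):
        self.capacity = capacity          # number of blocks the cache can hold
        self.backing_read = backing_read  # function that reads from slower storage
        self.blocks = OrderedDict()       # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.backing_read(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data


if __name__ == "__main__":
    cache = ReadCache(capacity=2, backing_read=lambda b: f"data-{b}")
    for block in [1, 2, 1, 3, 2]:
        cache.read(block)
    print("hits:", cache.hits, "misses:", cache.misses)
```

A write-back cache would also need to track modified blocks and flush them to the backing store, which is why redundant SSDs are recommended in that mode.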
Personal storage SSDs are designed for good read performance and tailored reliability and durability. They are optimized for workloads where reads are more frequent than writes. Personal storage SSDs offer high capacity and lower price per gigabyte than enterprise-class SSDs. Web 2.0 public cloud applications like social networking sites are characterized by users uploading images, video, and audio files, which are subsequently downloaded or streamed by other users. This type of write-once, read-many-times workload is a good candidate for personal storage SSDs.
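Figure: Decision flowchart for choosing an SSD class. It starts from the question "What is the IOPS performance requirement for your application, including read/write mix?", branches on mixed read/write versus read-only workloads, and ends at either Enterprise SSD or Personal Storage SSD.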
Drive TCO = Cost of drives + Cost of downtime + Cost of slowdown + Cost of IT labor + Risk of data loss
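As a rough illustration, the drive TCO formula above can be turned into a small calculator. This is a minimal Python sketch, not a vendor tool: the `TcoInputs` fields are hypothetical names, the per-replacement multipliers follow the cost descriptions given under Question #3, treating the risk of data loss as a single estimated cost is a simplifying assumption, and every figure in the usage example is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class TcoInputs:
    initial_drives: int                          # drives bought up front
    replacements: int                            # expected wearout replacements
    cost_per_drive: float                        # acquisition cost per drive
    downtime_cost_per_replacement: float = 0.0   # lost productivity while offline
    slowdown_cost_per_replacement: float = 0.0   # degraded performance during rebuild
    labor_cost: float = 0.0                      # IT labor for monitoring and swaps
    data_loss_risk_cost: float = 0.0             # estimated cost assigned to data-loss risk


def drive_tco(i: TcoInputs) -> float:
    """Sum the five terms of the drive TCO formula."""
    cost_of_drives = (i.initial_drives + i.replacements) * i.cost_per_drive
    cost_of_downtime = i.replacements * i.downtime_cost_per_replacement
    cost_of_slowdown = i.replacements * i.slowdown_cost_per_replacement
    return (cost_of_drives + cost_of_downtime + cost_of_slowdown
            + i.labor_cost + i.data_loss_risk_cost)


if __name__ == "__main__":
    # Hypothetical comparison: cheaper drives that wear out vs. pricier drives that do not.
    personal = TcoInputs(initial_drives=8, replacements=8, cost_per_drive=300.0,
                         slowdown_cost_per_replacement=150.0, labor_cost=800.0,
                         data_loss_risk_cost=500.0)
    enterprise = TcoInputs(initial_drives=8, replacements=0, cost_per_drive=900.0)
    print("personal storage TCO:", drive_tco(personal))
    print("enterprise TCO:", drive_tco(enterprise))
```

Running the same calculation for each candidate drive class makes it easier to see when a lower purchase price is offset by replacement-related costs.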
Question #1: What is the IOPS performance requirement for your application, including the read/write mix?
The first step is to quantify the workload that SSDs will support. An application workload can be measured using a variety of performance monitoring tools. Beyond workload, also consider the configuration of the system and the impact on the overall platform.
Question #2: What are the endurance requirements for your application?
For mixed read/write workloads, it is important to look closely at SSD durability ratings. These are usually expressed in terms of total bytes written (TBW) or full drive writes per day over a 5-year period. By comparing an application's daily write total with the durability rating of an SSD, it is possible to estimate the drive's lifetime in your environment (assuming a constant workload; it might be wise to also estimate future growth). If the write workload is small enough that the estimated lifetime of a personal storage SSD will at least equal the 3 to 5 years typically expected of an IT system, and performance is sufficient, then personal storage SSDs can be a good choice. However, if personal storage SSDs will likely wear out and need to be replaced during the IT system's lifetime, then replacement costs should be considered.
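The lifetime estimate described above is simple arithmetic, sketched here in Python. The sketch assumes the endurance rating is given either as TBW in terabytes or as drive writes per day (converted as rating x capacity x 365 x rating period), and it models future growth as a fixed yearly percentage increase in writes. The function names and all numbers in the usage example are hypothetical; a real assessment should use the drive's data sheet and measured workload.

```python
def dwpd_to_tbw(dwpd: float, capacity_tb: float, rating_years: int = 5) -> float:
    """Convert a drive-writes-per-day rating into total terabytes written."""
    return dwpd * capacity_tb * 365 * rating_years


def lifetime_years(tbw_tb: float, daily_writes_tb: float,
                   annual_growth: float = 0.0, horizon_years: int = 100) -> float:
    """Estimate years until the rated TBW is consumed.

    annual_growth is the assumed yearly increase in writes (0.10 means 10%).
    The result is capped at horizon_years.
    """
    if tbw_tb < 0 or daily_writes_tb <= 0:
        raise ValueError("tbw_tb must be >= 0 and daily_writes_tb must be > 0")
    remaining = tbw_tb
    yearly_writes = daily_writes_tb * 365
    years = 0.0
    for _ in range(horizon_years):
        if yearly_writes >= remaining:
            # The drive wears out partway through this year.
            return years + remaining / yearly_writes
        remaining -= yearly_writes
        years += 1
        yearly_writes *= 1 + annual_growth
    return float(horizon_years)


if __name__ == "__main__":
    tbw = dwpd_to_tbw(dwpd=0.3, capacity_tb=0.48)  # hypothetical rating
    years = lifetime_years(tbw, daily_writes_tb=0.2, annual_growth=0.10)
    print(f"Rated endurance: {tbw:.0f} TB written")
    print(f"Estimated lifetime: {years:.1f} years")
    print("Meets a 5-year system lifetime:", years >= 5)
```

If the estimate falls short of the expected system lifetime, the number of replacements it implies becomes an input to the TCO calculation.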
Question #3: What is the SSD total cost of ownership (TCO) over 3 to 5 years?
The SSD TCO over the system lifetime includes:
Cost of Drives: Determine how many drives will need replacements due to wearout, and multiply this figure by the acquisition cost.
Cost of Application Downtime: If the application needs to be taken offline to replace an SSD, what is the cost for that lost productivity? Multiply this figure by the number of replacements.
Cost of Slower Application Performance: If the application does not have to go offline for drive replacements, but system performance will slow during the replacement and subsequent data replication or RAID rebuild, how will this affect user productivity? Multiply this cost by the number of replacements.
Cost of Labor for Drive Replacement: Drive monitoring and replacement will be an additional management task for the IT staff, so the cost of labor should be included.
Risk of Data Loss: For unprotected drives, there is a significant risk of data loss, and even for RAID-protected drives, there is a small risk during the RAID rebuild window. Though difficult to quantify, these risks should be factored into the cost.
Conclusion
SSDs are a popular solution for big data applications. Deciding between personal storage and enterprise-class SSDs will depend on performance and endurance requirements and TCO.
micron.com
Products are warranted only to meet Micron's production data sheet specifications. Products and specifications are subject to change without notice.
©2013 Micron Technology, Inc. Micron and the Micron logo are trademarks of Micron Technology, Inc. All other trademarks are the property of their respective owners. All rights reserved. 04/13 EN