
HPE 3PAR StoreServ

Performance

Tomasz Piasecki
[email protected]

24 Oct 2018

© 2015 Hewlett-Packard Development Company, L.P. – Peter Mattei


Layers to consider

1. Host VLUNs
2. Host Ports
3. Node Cache
4. Node Volumes
5. Disk Ports
6. Disks
Understanding Performance
Common terms and definitions

IOPS (Input/output operations per second)


✓ Each component has a max IOPS value based on physical hardware and design
✓ Can vary greatly based on type of workload (pattern, RAID, RW ratio, queue, etc.)

Bandwidth (bits/s, KB/s, MB/s, GB/s)


✓ Each component has a max bandwidth value based on physical HW & design
✓ Can vary greatly based on type of workload (pattern, RAID, RW ratio, queue, etc.)
✓ Based on the following equation and known variables:

 Bandwidth = IOPS x (block size of IO)
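As a quick illustration of this relationship, here is a minimal Python sketch (the helper name and units are illustrative, not part of any 3PAR tooling):

# Bandwidth = IOPS x block size; illustrative helper only.
def bandwidth_mb_per_s(iops: float, block_size_kb: float) -> float:
    """Convert an IOPS figure and an IO block size (KB) into throughput in MB/s."""
    return iops * block_size_kb / 1024.0

# Example: 60,000 IOPS at 8 KB blocks is roughly 469 MB/s.
print(bandwidth_mb_per_s(60_000, 8))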

Service Time
✓ Time to process one request (IO) once received
✓ In some cases, this might include any wait time as well
✓ Can vary greatly based on type of workload
Hardware Component Limits - CPU

CPU can limit your performance if maxed out


• Will prevent the InServ from reaching its maximum performance
• Even if only one node is maxed out, it can cause the entire system to run slowly

What processes take a lot of CPU?

• Active RC (Remote Copy) nodes have higher CPU utilization (30-50% higher)
• SSH connections use more CPU than CLI sessions
• Very high IO rates (especially read-heavy loads) and delayed acks (delack)
• Volume compression, or volume compression + deduplication
3PAR ASIC
It makes the difference in real-world environments

[Diagram: CPU load comparison. In most arrays, RAID rebuilds, data deduplication, snapshots, thin reclamation, sub-LUN tiering, mixed workloads, thin provisioning, replication, RAID calculations and inter-node & cache IO all load the general-purpose processors ("Any Processors"). In 3PAR StoreServ, the same functions run on Intel x64 processors plus the 3PAR ASIC.]
Hardware Component Limits – Drive Chassis

InServ Drive Chassis (2 x FC links) or SAS links (12 Gb/s) on newer 3PAR models

Notes:
✓ We do not support enough drives to max out the IOPS limits of a chassis!
✓ In most cases, only the MB/s (throughput) on an FC link is a concern.
Hardware Component Limits – FC Links

Type (Speed)     IOPS*      MB/s
4 Gb/s ports     30,000     360
8 Gb/s ports     60,000     720
16 Gb/s ports    120,000    1440

Notes:
✓ In most cases, only the MB/s (throughput) on an FC link is a concern.
✓ Limits apply to anything with an FC link (hosts AND drive chassis).
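As a rough sanity check against these limits, the table values can be restated in code. This is an illustrative sketch only; the dictionary and helper function below are assumptions, not HPE tools:

# Per-port limits restated from the table above (IOPS, MB/s).
FC_PORT_LIMITS = {
    "4Gb":  {"iops": 30_000,  "mb_s": 360},
    "8Gb":  {"iops": 60_000,  "mb_s": 720},
    "16Gb": {"iops": 120_000, "mb_s": 1440},
}

def port_headroom(port: str, iops: float, block_kb: float) -> dict:
    """Return the fraction of the port's IOPS and MB/s limits a workload would consume."""
    limits = FC_PORT_LIMITS[port]
    mb_s = iops * block_kb / 1024.0          # Bandwidth = IOPS x block size
    return {"iops_used": iops / limits["iops"], "mb_s_used": mb_s / limits["mb_s"]}

# Example: 20,000 IOPS of 64 KB IO on an 8 Gb/s port is within the IOPS limit
# but well over the bandwidth limit (iops_used ~0.33, mb_s_used ~1.74).
print(port_headroom("8Gb", 20_000, 64))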
Understanding Performance
How performance is tested in the results that follow

IOPS (Random)
✓ Small block sizes (16k or smaller)
✓ Random access pattern to the entire device
✓ Multiple threads/queues to the device

Bandwidth (Sequential)
✓ Large block sizes (128k or larger)
✓ Sequential access pattern to the entire device
✓ Single or few threads/queues to the device
Understanding Performance
Larger Block Size Means Fewer Disk IOPS
[Chart: Maximum IOPS vs Block Size (8K = 100%). Percentage drop from the 8K value across block sizes 4k, 8k, 16k, 32k, 64k, 128k, 256k and 512k. The graph illustrates that the maximum IOPS per PD drops as the block size increases. The drop can be a disk limit or a port/node bandwidth limitation. The actual impact depends on many factors such as the number of drives, InServ model, etc.]
Hardware Component Limits
Disk Drives

Disk Type   IOPS       MB/s
SSD         Variable (covered on the next slide)
15K FC      200        45
10K FC      150        45
7.2K NL     75         30

Notes:
• These are back-end numbers only! Use the performance spreadsheet to determine front-end performance.
• Performance assumes the load described earlier. Changing the workload changes performance!
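For a back-of-the-envelope feel only (this sketch ignores RAID write penalties and cache effects, and is not a substitute for the performance spreadsheet mentioned above), the per-drive figures can be summed into a raw back-end IOPS estimate:

# Per-drive back-end IOPS from the table above (HDD types only; SSD is covered separately).
BACKEND_IOPS_PER_DRIVE = {"15K FC": 200, "10K FC": 150, "7.2K NL": 75}

def raw_backend_iops(drive_counts: dict) -> int:
    """Sum per-drive back-end IOPS for a drive population, e.g. {'10K FC': 128, '7.2K NL': 32}."""
    return sum(BACKEND_IOPS_PER_DRIVE[t] * n for t, n in drive_counts.items())

# Example: 128 x 10K FC and 32 x 7.2K NL drives.
print(raw_backend_iops({"10K FC": 128, "7.2K NL": 32}))   # 21600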
Hardware Component Limits
SSD performance:
100 GB and 200GB SSDs, 4KB or 8KB block size

Workload                 Max IOPS RAID 1   Max IOPS RAID 5
100% Reads               8000              7500
70% Reads, 30% Writes    6000              3300
50% Reads, 50% Writes    5000              3000
30% Reads, 70% Writes    5000              2800
100% Writes              5000              2800

Notes:

• These are back-end numbers only! Use the performance spreadsheet to determine front-end performance.
• Performance assumes the load described earlier. Changing the workload changes performance!
InServ Cache Behavior

High Level Caching Algorithm


Cache is mirrored across nodes to ensure there won’t be data loss in the event of a node
failure.
The algorithm is self-adapting, determining in real-time the allocation for writes versus
reads.
• Under intensive reads, up to 100% of the cache per node may be used.
• Under intensive writes, up to 50% of the cache per node may be used for its own dirty data, or as little as 12.5% with 8 nodes.
The cache is used primarily as a buffer to facilitate data movement to and from disk.
✓ The write cache acts as a very fast buffer cache that allows the InServ to acknowledge the
host as soon as the data is mirrored in the cache rather than having to wait for it to be written to
disk.
3PAR Adaptive Cache
Self-adapting Cache – 50 to 100% for reads / 50 to 0% for writes
[Chart: MB of cache dedicated to writes per node (y-axis, 0-3000 MB) versus % read IOPS from the host (x-axis, 0-100%), plotted for host loads of 20K, 30K and 40K IOPS.]
InServ Cache Behavior
Read-Ahead Algorithm

We detect & pre-fetch for up to 5 interlaced streams per VV


✓ Streams do not need to be purely sequential.
✓ Our pre-fetch algorithm is adaptive and will trigger when multiple pages in a set range or
zone are read. This allows us to read ahead on various patterns of reads such as reading
every other block.

When an IO comes in, a zone is computed equal to 8 times the IO size.
✓ An 8KB IO will have a 64KB zone. If 3 reads hit in this zone, a read-ahead is triggered.
✓ If an IO size is 512KB or less, then the read-ahead is (IO size*8), with a minimum of 1MB.
✓ If an IO size is greater than 512KB, then the read-ahead is (IO size*4), with a maximum of
4MB.
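A minimal sketch of the read-ahead sizing rules just described (illustrative pseudologic in Python, not 3PAR source code; the function names and byte-based units are assumptions):

KB = 1024
MB = 1024 * KB

def zone_size(io_size: int) -> int:
    """Zone is 8x the IO size, e.g. an 8 KB IO gives a 64 KB zone."""
    return 8 * io_size

def read_ahead_size(io_size: int) -> int:
    """Read-ahead is IO*8 (min 1 MB) for IOs <= 512 KB, else IO*4 (max 4 MB)."""
    if io_size <= 512 * KB:
        return max(8 * io_size, 1 * MB)
    return min(4 * io_size, 4 * MB)

# Example: an 8 KB IO has a 64 KB zone; three hits in that zone trigger a 1 MB read-ahead.
print(zone_size(8 * KB) // KB, read_ahead_size(8 * KB) // MB)   # 64 1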
“Delayed ack” mode
The statcmp command displays Cache Memory Page (CMP) statistics by node or by virtual volume.
It shows the number of dirty pages per node and disk type, the maximum number of pages allowed, and the number of delayed acks:

• Current number of dirty pages for each node for this type of disk (instantaneous)
• Maximum allowed pages per node for this type of disk
• Number of delayed acks – this counter is incremented whenever a delayed ack happens
Delayed ack mode

• “Delayed ack” is a behavior of HP 3PAR systems when the cache gets filled faster than it can be de-
staged to disk (most likely because the physical disks are maxed out)

• This is determined by the number of dirty cache pages for a type of disk exceeding 85% of the allowed
maximum

• When this happens, the system will start delaying incoming write IOs to keep the number of dirty pages
below this number

• At this point, increasing the amount of outstanding IOs will only result in a higher average response
time

MB/s per disk and Write cache flusher limits

– Upon writing to the 3PAR array, the data will be put in write cache. Each 3PAR controller
node only allows a maximum number of pages for a type of disk, based on the number of
disks of each type.
– When reaching 85% of this maximum number of allowed cache pages, the system will start
delaying the acknowledgement of IOs in order to throttle down the hosts, until some
cache pages have been freed by having their data de-staged to disk (condition known as
“delayed ack”)
– This de-staging happens at a fixed speed that will also depend on the number of disks of
each type.
– Because this de-staging happens at fixed speed, the maximum write bandwidth of the
hosts will be limited to the speed of this de-staging

Max number of pages per disk

• The maximum number of cache pages is a function of the number of disks of each type that are connected to a given node:
  SSD   4800 pages per PD
  FC    1200 pages per PD
  NL     600 pages per PD

• Example on a 4-node system with 32 SSDs, 256 FC disks and 64 NL disks (each node will see 16 SSDs, 128 FC and 32 NL):
  Per node: 76800 pages for SSDs, 153600 pages for FC, 19200 pages for NL
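A small worked sketch combining the per-PD page counts above with the 85% delayed-ack threshold from the previous slides (illustrative only, not a 3PAR utility):

# Pages-per-PD figures restated from the list above.
PAGES_PER_PD = {"SSD": 4800, "FC": 1200, "NL": 600}

def max_pages_per_node(disks_behind_node: dict) -> dict:
    """Maximum allowed cache pages per node and the 85% delayed-ack threshold for each disk type."""
    result = {}
    for disk_type, count in disks_behind_node.items():
        max_pages = PAGES_PER_PD[disk_type] * count
        result[disk_type] = {"max_pages": max_pages, "delack_threshold": int(max_pages * 0.85)}
    return result

# Example from the slide: each node sees 16 SSDs, 128 FC and 32 NL drives.
print(max_pages_per_node({"SSD": 16, "FC": 128, "NL": 32}))
# SSD: 76800 (delack at 65280), FC: 153600 (delack at 130560), NL: 19200 (delack at 16320)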

stat commands

statvlun -ni -rw
statport -host -ni -rw
statport -d 15 -iter 1 -ni
statport -disk -ni
statvv -ni -rw
statpd -ni -rw
statcpu -t -d 15 -iter 1
statcmp -iter 1

Common options:
-ni              only non-idle devices are displayed
-rw              read/write data reported separately
-d <secs>        interval in seconds
-host / -disk    only host ports / disk ports
3PAR performance troubleshooting
– Get problem statement
• What is the exact performance problem they are seeing?
• Hosts involved, SAN topology, specific operations

– Collect configuration data
• SAN configuration (HPCC)
• InSplore data of the array

– Calculate performance limits
• Based on array type, ports, disks/disk types and RAID, this array should do
– XYZ MB/sec
– ABC IOPS

– Collect stat data
– Format stat data
– View performance data against the configuration
3PAR performance analysis basics

Start looking at:
• Front-end
• Back-end
• VV/CPU
• Do one or more of these areas show higher than calculated activity or latencies?


Performance Isolation Examples

statvlun   statport   statvv   Action
Low        Low        Low      None required
High       Low        Low      Verify host system resources and HBA settings
High       Low        High     Verify statpd and statcmp
High       High       Low      Verify connectivity between host and array

[Diagram: performance versus cost per usable TB across drive tiers and RAID levels – SSD, Fast Class and Nearline, each with RAID 1, RAID 5 and RAID 6. In a single command the array can non-disruptively optimize and adapt for cost, performance, efficiency and resiliency.]


Express Layout & Express Writes
3PAR Express Layout
Provides more flexibility and usable capacity in smaller configurations
Traditional 3PAR Logical Disk Layout
– A physical drive is owned by one node
– Logical Disks (LD) / RAID sets are built by node with chunklets from owned drives
– Standard layout for all configurations
– Capacity inefficient in very small configurations
– i.e. RAID 5 (4+1)

3PAR Express Layout
– A physical drive is owned by a node pair
– Provides more capacity efficiency with small numbers of drives
– LDs are built by node with chunklets from all drives *
– Introduced for SSD with 3PAR OS 3.2.2
– Support expanded to all drive types with 3PAR OS 3.3.1
– i.e. RAID 5 (7+1) or RAID 6 (6+2)

[Diagram: with the traditional layout, each node (Node0, Node1) builds LDs only from the drives it owns; with Express Layout, LDs are built from all drives behind the node pair.]

* Caution for 2-node systems only:
In order to allow the array to create new LDs even after a drive failure, the set size for the CPG must be no more than the number of PDs minus the number of PDs needed for fault tolerance – i.e. with 10 drives use R5 7+1 or R6 6+2, with 8 drives use R5 6+1 or R6 4+2.
Also see this advisory: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00045368en_us
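A tiny sketch of that set-size rule (illustrative only; it assumes 1 parity PD for RAID 5 and 2 for RAID 6 as the fault-tolerance count):

def max_set_size(num_pds: int, raid_level: int) -> int:
    """Largest set size that still lets the array build new LDs after a drive failure:
    set size <= number of PDs minus the PDs needed for fault tolerance."""
    fault_tolerance_pds = {5: 1, 6: 2}[raid_level]   # RAID 5 tolerates 1 failure, RAID 6 tolerates 2
    return num_pds - fault_tolerance_pds

# Example: with 8 drives, R5 can use up to 6+1 (set size 7) and R6 up to 4+2 (set size 6).
print(max_set_size(8, 5), max_set_size(8, 6))   # 7 6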
3PAR Express Layout rules

Drives per node pair and CPG   RAID level and set size   Express Layout *   Standard Layout
 6                             R5 2+1                    N                  Y
 6                             R5 5+1                    Y                  N
 6                             R6 4+2                    Y                  N
 8                             R5 3+1                    N                  Y
 8                             R5 7+1                    Y                  N
 8                             R6 6+2                    Y                  N
10                             R5 4+1                    N                  Y
10                             R5 7+1                    Y                  N
10                             R6 8+2                    Y                  N
12                             R5 5+1                    N                  Y
12                             R5 7+1                    Y                  N
12                             R6 10+2                   Y                  N
14                             R5 6+1                    N                  Y
14                             R5 7+1                    Y                  N
14                             R6 10+2                   Y                  N
16                             R5 7+1                    N                  Y
16                             R6 14+2                   Y                  N
18                             R5 8+1                    N                  Y
18                             R6 14+2                   Y                  N
20                             R5 8+1                    N                  Y
20                             R6 14+2                   Y                  N
28                             R5 8+1                    N                  Y
28                             R6 14+2                   N                  Y

When does Express Layout kick in
Automatically when a single node cannot build Logical Disks (LD) using the standard layout with the selected set size.

When will Standard Layout be used
When a single node can build LDs with the selected set size – initially or after a drive upgrade.

Logical Disk (LD) examination
As each LD is examined, the array looks at the set size and the number of available drives behind a node and identifies whether Express Layout needs to be used.

Tunesys
When tunesys runs, i.e. after an upgrade, each LD is examined and restriped across the new drives. LDs will be built and/or rebuilt in the matching layout.

* Note for 2-node systems only:
In order to allow the array to create new LDs even after a drive failure, the set size for the CPG must be no more than the number of PDs minus the number of PDs needed for fault tolerance – i.e. with 10 drives use R5 7+1 or R6 6+2, with 8 drives use R5 6+1 or R6 4+2.
Also see this advisory: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00045368en_us
3PAR Virtualization Concept with OS ≥ 3.3.1
Drive ownership – Example 1: 2-Node System with 2 drive enclosures

• Nodes are installed in pairs for redundancy and write cache mirroring
• In this example we have a 2-node configuration with 2 drive enclosures
• Physical Drives (PD) are owned by a node pair

[Diagram: Node0 and Node1 with their drive enclosures]


3PAR Virtualization Concept with OS ≥ 3.3.1
Drive ownership – Example 2: 4-Node system with 8 drive enclosures

• Nodes are installed in pairs for redundancy and write cache mirroring
• 3PAR StoreServ arrays with 4 or more nodes support “Cache Persistency”
• In this example we have a 4-node configuration with 8 drive enclosures in total
• A drive enclosure belongs to a node pair
• Physical Drives (PD) are owned by a node pair

[Diagram: node pairs Node 0/Node 1 and Node 2/Node 3, each with their drive enclosures]
3PAR Virtualization Concept with OS ≥ 3.3.1
End-to-end on a 4-node system

Process step → Phase state
• Physical drives (PD) are automatically formatted into 1 GB chunklets → Disk initialization
• Chunklets are bound together to form Logical Disks (LD) in the format defined in the CPG policies (RAID level, step size …) → Defines RAID level, step size, set size and redundancy
• Virtual Volumes are built striped across all LDs of all nodes from all drives defined in a particular CPG → Autonomic wide-striping across all Logical Disks (LD)
• Virtual Volumes can now be exported as LUNs to servers → Present and access LUNs across multiple active-active paths (HBAs, fabrics, nodes)

[Diagram: a server with active-active multipathing accesses an exported LUN; the Virtual Volume is wide-striped across LDs from all nodes grouped in a CPG, i.e. RAID 6 (6+2).]
3PAR Express Writes
Lowering Write Latency for SSD and HDD
3PAR OS protocol optimization
• Acceleration improves overall throughput and read performance
• Accelerates OLTP databases by reducing write latencies on transaction logs
• Enabled by default, no configuration necessary

Fibre Channel
• 3.2.1 introduced 8Gb adapter support in 7k, 8k, 10k and 20k systems
• Provides up to 20% improvement in latency
• Able to provide sub-0.2 ms write latency

iSCSI
• Available with 3.3.1 for 8k, 9k and 20k systems
• Provides up to 50% improvement in latency
• Able to provide sub-0.3 ms write latency

Optimized writes – fewer interrupts
• Delivers lower CPU interrupts per I/O transaction
• Results in higher IOPS and reduced latency for writes
• Does not require any changes on the initiator side
• All supported HBAs and hosts will benefit from this optimization

[Diagram: host writing over Fibre Channel into 3PAR cache with fewer interrupts]

Read the Demartek report on Express Writes
Read in Ivan’s Blog about Express Writes
HPE 3PAR FC Express Writes
Write latency improvement on a 3PAR 7450
Latency     Write IOs (3PAR OS ≤ 3.1.2)     Write IOs (3PAR OS ≥ 3.2.1)
0.062 ms    0%                              5%
0.125 ms    2%                              45%
0.25 ms     70%                             40%
0.5 ms      28%                             10%
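As an illustrative calculation only (assumption: each row is treated as sitting exactly at its listed latency, which only approximates the measured distribution), the table implies roughly these average write latencies:

# Latency buckets (ms) and the share of write IOs in each, restated from the table.
buckets_ms = [0.062, 0.125, 0.25, 0.5]
old_os = [0.00, 0.02, 0.70, 0.28]   # 3PAR OS <= 3.1.2
new_os = [0.05, 0.45, 0.40, 0.10]   # 3PAR OS >= 3.2.1

def mean_latency(shares):
    """Weighted average latency assuming each bucket sits exactly at its nominal value."""
    return sum(share * ms for share, ms in zip(shares, buckets_ms))

print(round(mean_latency(old_os), 2), round(mean_latency(new_os), 2))   # roughly 0.32 ms vs 0.21 ms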
3PAR Adaptive Flash Cache (AFC)
Read Cache Extension for hybrid arrays

Advantages and Use Cases
• Leverages SSD, providing lower latency for HDD workloads
• Can be enabled system-wide or for individual Virtual Volume sets (VVset)
• Complementary to Adaptive Optimization (AO)
• AO moves regions of 128 MB between tiers based on IO density sampling
• AFC increases read cache hits by de-staging 16 kB DRAM pages to Flash Cache
• No dedicated SSDs required – can be shared with the SSD tier and AO
• Starting with 3.2.2 the default for 7k, 8k and 20k is RAID 0 (was RAID 1 before)
• Minimum required SSD and supported maximum capacities can be found in the HPE 3PAR Support Matrices on SPOCK:
  www.hpe.com/storage/spock → other Hardware: 3PAR → HPE 3PAR support matrix 3.x.x
• SSMC built-in or CLI simulation helps determine the expected acceleration

[Diagram: cache control with DRAM write and read cache, a Flash Cache look-up hash table, 16 kB cache page moves from DRAM to Flash Cache on SSD, and 128 MB AO region moves between the Nearline (NL), Fast Class (FC) and Solid State (SSD) tiers.]
3PAR Adaptive Flash Cache performance benefits
Random read acceleration and increased write throughput in mixed workloads

Fewer random read requests to the HDD → more available capacity for write caching

Tested configuration:
• 3PAR 7200
• 8 x 10k SAS
• 4 x SSD and 668 GB Adaptive Flash Cache
• Workload: 32 kB blocks, 60/40 r/w, 60/40 sequential/random

[Chart: front-end read and write IOPS (0-2500) with AFC disabled versus AFC enabled.]
3PAR Adaptive Flash Cache
Effect on a 7000 array with a working set of 1.2 x Cache size

Tested configuration:
• StoreServ 7400 – 2 node
• 100% 8 kB random reads
• Working set : Cache = 1.2 : 1
• 272 GB total cache (16 GB DRAM + 256 GB AFC)
• 320 GB working set
• 32 x 10 GB VV
• RAID 5, SAS

Result: 3 x IOPS and 1/3 the latency
Alerts & Event Logs
Monitoring and managing alerts

showalert [-n|-f|-a|-all] [-d|-oneline|-svc]

setalert new|fixed|ack <alert_ID>

removealert <alert_ID>
Setting threshold alert

Most system alerts are generated automatically


• Component State change
• Task failed
• Overall File Services for Node etc...

Several types of alerts can be configured manually


• Limits relating to CPGs
• Limits for virtual volumes capable of allocating space on demand
• The raw space alert - a global system threshold
Setting threshold alert
CPG limits

Setting limits relating to CPGs

Threshold of used LD space = 2 TB; when exceeded, results in a warning alert for cpg1:
setcpg -sdgw 2T cpg1

Limit of CPG auto-grow for cpg1:
setcpg -sdgl 3T cpg1
Setting threshold alert
Virtual Volumes limits

Setting the virtual volume space allocation thresholds

User space allocation warning. Generates a warning alert when the user data space of the TPVV exceeds the specified percentage of the virtual volume size:
setvv -usr_aw <threshold> vvname

User space allocation limit. The user space of the TPVV is prevented from growing beyond the indicated percentage of the virtual volume size:
setvv -usr_al <threshold> vvname

showvv -s
Setting threshold alert
Raw space limits

Setting the raw space threshold alert


To set a raw space alert for a storage system with NL drives
setsys RawSpaceAlertNL <threshold>

To set a raw space alert for a storage system with FC drives


setsys RawSpaceAlertFC <threshold>

To set a raw space alert for a storage system with SSDs


setsys RawSpaceAlertSSD <threshold>

For each of these commands, <threshold> is an integer from 10 to 100,000 that represents the
total available space on the system in gigabytes. A value of 0 enables the default raw space alerts
of 50%, 75%, 85% and 95%.
Monitoring and managing the event log

• Includes all alerts generated and alerts marked as acknowledged or fixed
• Active event log size = 4 MB
• 11 versions of the event log: current + past 10 versions
• Total size is 44 MB (all 11 event log versions)

showeventlog [-min -more -online]

-sev <pattern>
only events with severities that match the specified pattern(s) are displayed:
Fatal, Critical, Major, Minor, Degraded, Informational, Debug
-msg <pattern>
only events whose messages match the specified pattern(s) are displayed
Monitoring and managing the event log

Displaying the current event log size

showsys -param
System parameters from configured settings
----Parameter----- ---Value----
RawSpaceAlertFC : 800
RawSpaceAlertNL : 0
RemoteSyslog : 1
RemoteSyslogHost : 192.168.6.15
SparingAlgorithm : Minimal
CopySpaceReclaim : 0
EventLogSize : 4M
Monitoring and managing the event log

Changing the default event log size (within the range 512 KB to 4 MB):
setsys EventLogSize

Exporting the event log to remote syslog servers


• General - sends all non-debug events to the syslog server.
setsys RemoteSyslogHost
{{<hostname>|<IPv4>}[:<port>]|<IPv6>|[<IPv6>]:<port>}

• Security - sends security related events (user logins, commands issued, etc)
setsys RemoteSyslogSecurityHost
{{<hostname>|<IPv4>}[:<port>]|<IPv6>|[<IPv6>]:<port>}

• Display status: showsys -d

• Disabling event log exporting: setsys RemoteSyslog 0
The End

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
