Best Practices of OceanStor Dorado Oriented To Oracle Database Deploymen...
Issue 01
Date 2020-12-10
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: https://e.huawei.com
Contents
1 Overview
1.1 Overview
1.2 Conventions
1.3 Change History
7 Summary
8 References
9 Glossary
10 Appendixes
1 Overview
1.1 Overview
1.2 Conventions
1.3 Change History
1.1 Overview
Oracle databases are enterprises' core applications. Typically, Oracle databases are
used for online transaction processing (OLTP), online analytical processing (OLAP),
and data warehousing/decision support systems (DSS). However, many enterprises'
database systems run slowly, usually for one of the following reasons: the
configuration and deployment of the upper-layer applications, database, host,
network parameters, and storage devices are not optimal, or the planning and
design are not proper. To meet the unique requirements of Oracle workloads,
Huawei provides these best practices for the OceanStor Dorado V6 storage solution.
This document describes the best practices of a storage solution in which the
OceanStor Dorado V6 is used with Oracle databases, focusing on how to use the
OceanStor Dorado V6 to deploy Oracle database services efficiently. It aims to help
you improve service deployment efficiency and service operation quality while
ensuring the performance and availability of storage systems and Oracle databases,
and to help database administrators, application architects, system administrators,
network administrators, storage administrators, and system architects easily and
quickly deploy stable, reliable, high-performance Oracle database systems.
NOTE
The recommended configurations are applicable to general-purpose scenarios. In specific
scenarios, you need to verify whether the configurations are appropriate.
1.2 Conventions
Symbol Conventions
General Conventions
Command Conventions
Applicable Versions
The best practices provided in this document are applicable to the products
described in the following table.
NOTE
For details about FlashLink of the OceanStor Dorado V6 series, click here.
HyperCDP
Generally, the data volume of a database in an Oracle data center can reach 100
TB. The traditional backup and restoration mode can meet the basic backup
requirements, but its long duration affects the progress of deploying development
and test environments, delaying product delivery. In addition, it is difficult to use
production data at any point in time to deploy development and testing
environments as required.
HyperCDP can address the challenges brought by the traditional backup mode. As
a high-density snapshot technology, HyperCDP can generate fully available copies
of specified data sets. A data copy at a certain point in time is generated through
redirect-on-write (ROW) and the mapping table, instead of physical data copying.
Data centers and enterprises can use such data copies for data protection and
recovery. In addition, the data copies can be used to quickly deploy development
and test environments, with an impact of no more than 5% on production service
performance. HyperCDP generates data copies within seconds. Compared with
Oracle Data Guard, this shortens data backup and recovery durations, accelerates
the delivery of development and test environments, and keeps product delivery on
schedule.
HyperMetro
DR solutions are crucial to protecting customers' production systems against
natural disasters or misoperations and ensuring service continuity, recoverability,
and high availability. Conventional Oracle databases generally use DataGuard and
GoldenGate for DR. However, the recovery process is complex and may cause
exceptions. On the other hand, the traditional active-passive DR solution of
storage systems has disadvantages such as low resource utilization and high total
cost of ownership (TCO). In the event of a fault, customers must manually switch
over services and data recovery takes a long time, resulting in long service
interruption time. To address these challenges, the active-active solution is
proposed. Huawei's active-active data center solution allows two data centers to
carry service loads concurrently, improving service performance and resource
utilization. Both data centers back up each other. If either one fails, the other one
automatically takes over services to guarantee service continuity.
When the active-active data center solution is deployed for an Oracle cluster, the
hosts in the cluster run on two active-active storage systems. The hosts and
storage systems are deployed in different data centers for remote DR, providing
the optimal RPO and RTO for database cluster applications.
SmartDedupe and SmartCompression can be enabled for Oracle databases, and
their minimal impact on services delivers more than 65% storage space savings.
For details about SmartDedupe and SmartCompression of the OceanStor Dorado
V6 series, click here.
3.1 Overview
3.2 Storage Pools
3.3 LUNs
3.4 Mapping Views
3.1 Overview
Parameter/Operation | Recommended Value/Method | Default Value/Method | Location
Disk Capacity
The best practice for a storage tier is to use disks of the same type and capacity. If
member disks vary in capacity, larger-capacity disks cannot be fully used, causing
a waste of capacity. If member disks vary in rotational speed, the performance
may deteriorate.
The disk capacity defined by disk manufacturers is different from that calculated
by operating systems. As a result, the nominal capacity of a disk is different from
that displayed in the operating system.
Number of Disks
The best practice is to configure no more than 100 disks for each tier in a storage
pool. Assume that the number of disks at a tier is D, the integer quotient of D/100
is N, and the remainder is M. In this case, refer to the following rules:
● If D ≤ 100, configure all disks of this tier in one disk domain.
● If D > 100, create N+1 disk domains and evenly distribute all disks to these
disk domains. That is, the number of disks in each disk domain is D/(N+1). In
addition, it is recommended that disk enclosures be fully populated with disks.
● For the SmartTier feature, it is recommended that a maximum of 100 disks be
configured for each tier of a disk domain. Follow the preceding rules to
configure the specific number of disks at each tier.
Example 1: The total number (D) of SSDs in a storage system is 328. Dividing 328
by 100 gives N = 3 and a remainder M = 28. It is recommended that four (N+1)
disk domains be configured and that each disk domain contain 82 disks (328/4).
Example 2: The total number (D) of SSDs in a storage system is 223. Dividing 223
by 100 gives N = 2 and a remainder M = 23. It is recommended that three (N+1)
disk domains be configured, with 223/3 ≈ 74.3 disks each. Therefore, 74 disks are
configured for each of two disk domains and 75 disks are configured for the
remaining disk domain.
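The sizing rule above can be expressed as a short calculation. The following shell sketch (an illustration only, not a Huawei tool) derives the number of disk domains and the per-domain disk count from the total number of disks at a tier:
#!/bin/bash
# disk_domains.sh - illustration of the "no more than 100 disks per disk domain" rule
D=${1:-328}                      # total number of disks at the tier (example: 328)
if [ "$D" -le 100 ]; then
    DOMAINS=1                    # D <= 100: a single disk domain holds all disks
else
    DOMAINS=$(( D / 100 + 1 ))   # D > 100: create N+1 disk domains (N = integer part of D/100)
fi
BASE=$(( D / DOMAINS ))          # disks placed in every domain
EXTRA=$(( D % DOMAINS ))         # number of domains that take one extra disk
echo "$D disks -> $DOMAINS disk domain(s): $EXTRA with $((BASE+1)) disks, $((DOMAINS-EXTRA)) with $BASE disks"
Running the sketch with 328 reports four domains of 82 disks each, and with 223 it reports one domain of 75 disks and two domains of 74 disks, matching the examples above.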
NOTE
A larger number of disks in a disk domain results in a higher reconstruction speed. If the
number of disks in a disk domain reaches 100, the disk domain reliability reaches
99.9999%. However, if the number of disks is too large, the RAID 2.0+ computing overhead
increases and uneven disk use occurs, affecting storage performance to some extent.
Different RAID levels use different RAID algorithms, but the read/write
performance is similar. The performance of RAID 5, RAID 6, or RAID-TP decreases
slightly as the redundancy increases. The performance of different I/O models
(random/sequential read/write) is different:
● The performance of random read is equivalent to that of sequential read.
● The performance of sequential write is higher than that of random write.
You are advised to use RAID-TP for important Oracle database services.
NOTE
● RAID 2.0+ allows all member disks in a storage pool to provide hot spare capacity. For
ease of understanding, the hot spare capacity is expressed in the number of hot spare
disks on DeviceManager.
● If the total number of disks in a storage pool is less than 13, the number of hot spare
disks must meet the following requirement: Total number of disks – The number of hot
spare disks ≥ 5.
3.3 LUNs
Block storage uses LUNs as basic units for services. All LUNs created on OceanStor
Dorado V6 storage systems are thin LUNs. Before using a storage system of Oracle
database services, select a proper application type for LUNs based on actual
storage needs.
NOTE
● Each preset application type has a default application request size: 32 KB for
Oracle_OLAP and SQL_Server_OLAP, and 8 KB for the remaining types.
● The application type of a LUN cannot be changed after being set.
● For details about the number and size of LUNs, see 6.3 ASM.
To ensure the storage system's security and the connectivity between the storage
system and the database server, you need to plan the networking and security
authentication. The following describes the best practice of general-purpose
networking configuration in the Oracle database scenario.
4.1 Overview
4.2 Logical Topology
4.3 Physical Topology
4.4 Number of Paths
4.1 Overview
Parameter/Operation | Recommended Value/Method | Default Value/Method | Location
NOTE
This document describes only one networking model. For more networking
models, click here.
● Use dual-port HBAs for the host and connect the HBAs to different switches. 16
Gbit/s or higher-rate Fibre Channel interface modules are recommended.
● Point-to-point zone planning
In a point-to-point zoning plan, each zone contains only one initiator and one
target. Point-to-point zoning is the most reliable and secure zoning rule.
Configuring zones effectively ensures at the network layer that access
permissions are allocated to nodes on a network strictly according to the
service plan. In other zoning rules, devices in zones interfere with each other.
However, the practice of making a zone contain only one initiator and one
target minimizes the possibility of this problem.
NOTE
If zones are incorrectly configured, link contention exists, leading to a decrease in host
performance.
To facilitate demonstration, zones are planned based on ports. In an actual project, you are
advised to plan zones based on WWPNs. If port-based zones are used, subsequent port
changes will interrupt services and require users to reconfigure zones.
NOTE
The physical networking planned in this document provides theoretical support for zone
planning and multipathing quantity calculation. In an actual project, plan physical
networking based on site requirements.
Figure 4-2 Fibre Channel physical networking where a two-controller storage system is used
NOTE
The physical networking planned in this document provides theoretical support for zone
planning and multipathing quantity calculation. In an actual project, plan physical
networking based on site requirements.
Figure 4-3 Fibre Channel physical networking where a multi-controller storage system is used
Diagrammatic Illustration
As shown in the following figure, each PCIe HBA on the Oracle server connects one
port to each switch (four host ports in total), and each controller connects one port
to each switch (eight storage ports in total). Each server port therefore has four
paths to the storage system, one through each controller. Because the server has
four connected ports, there are 16 paths from the server to the storage system. As
shown in Figure Diagrammatic illustration of multipathing, there are 2 x 4 + 2 x 4
= 16 storage paths.
Combination Enumeration
One controller port corresponds to one host port. Therefore, you can list all paths,
starting from the host side, as shown in the following table. There are 16 storage
paths in total.
PCIE01_Port01 A_Controller_P0
PCIE01_Port01 B_Controller_P0
PCIE01_Port01 C_Controller_P0
PCIE01_Port01 D_Controller_P0
PCIE01_Port02 A_Controller_P1
PCIE01_Port02 B_Controller_P1
PCIE01_Port02 C_Controller_P1
PCIE01_Port02 D_Controller_P1
PCIE02_Port01 A_Controller_P0
PCIE02_Port01 B_Controller_P0
PCIE02_Port01 C_Controller_P0
PCIE02_Port01 D_Controller_P0
PCIE02_Port02 A_Controller_P1
PCIE02_Port02 B_Controller_P1
PCIE02_Port02 C_Controller_P1
PCIE02_Port02 D_Controller_P1
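To verify that all 16 paths are visible on the host, count them with the multipathing software. The following sketch assumes DM-Multipath and reuses the ORA-DATA4M-01 device alias from the LUN expansion example later in this document; with Huawei UltraPath, query the paths with the upadmin CLI instead. Each path line of the multipath output contains an sd* device, so 16 lines are expected in this topology:
# multipath -ll ORA-DATA4M-01 | grep -c sd
16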
5.1 Overview
5.2 Multipathing
5.3 HBA
5.4 Udev Configuration
5.5 I/O Scheduler
5.6 Kernel Parameters
5.7 HugePages
5.8 Optimization of Other Parameters
5.1 Overview
Parameter/Operation | Recommended Value/Method | Default Value/Method | Location
HBA port retry interval | QLogic HBAs: Port Down Retry Count: 10, Link Down Timeout: 10; Emulex HBAs: lpfc_devloss_tmo: 10 | QLogic HBAs: Port Down Retry Count: 30, Link Down Timeout: 30; Emulex HBAs: lpfc_devloss_tmo: 10 | 5.3.1 HBA Parameters
DIRTY_BACKGROUND_RATIO | 3 | 10 | 5.6.1 Virtual Memory
SHMMAX | Red Hat Enterprise Linux 7.0 x86_64: 18446744073692774399 bytes; Red Hat Enterprise Linux 7.1 and later: default value | Red Hat Enterprise Linux 7.0 x86_64: 4294967295 bytes; Red Hat Enterprise Linux 7.1 and later: 18446744073692774399 bytes | 5.6.2 Shared Memory
5.2 Multipathing
Windows hosts generally have built-in multipathing software Multi-Path I/O
(MPIO). MPIO can only implement basic failover and load balancing, but cannot
meet the application requirements of systems with higher reliability. Huawei
UltraPath provides basic failover and load balancing functions as well as the
following advanced functions: routine path test, protection against transient path
interruption, path isolation, path alarm push, and path performance monitoring.
UltraPath is more compatible with Huawei storage systems. In addition, UltraPath
for OceanStor Dorado provides the partition alignment function. When I/Os are
delivered, UltraPath detects the partition alignment attributes on the storage
system and calculates the optimal path for forwarding I/Os based on the LBA
information carried by the I/Os. This avoids extra resource overheads caused by
frequent non-aligned I/O forwarding on the storage system, improving the overall
storage performance. UltraPath can meet the requirements of the entire IT system
for reliability, performance, maintainability, and storage adaptation. You are
advised to use Huawei UltraPath for your Huawei storage systems.
If Huawei UltraPath is used, no extra configuration on the storage system or host
is required. By default, UltraPath provides the optimal configuration for Huawei
storage to achieve the best performance and reliability. The best practice is to use
Huawei UltraPath.
NOTE
The following best practices are all based on Huawei UltraPath. If multipathing
software provided by the system is used, refer to the OceanStor Dorado V6 Host
Connectivity Guide for Windows.
5.3 HBA
Before mounting storage resources to a host, check whether the HBA on the host
can be identified and work properly. The commands for querying HBAs vary
according to vendors. For details, see the help documents of the corresponding
vendors. This document describes only the best practice of general HBA
configuration in Oracle database scenarios.
Parameter Description
Link Down Timeout (seconds) | Specifies how long the driver waits for the link to come up after a link-down event before returning I/Os. A value of 0 indicates that no timeout is used.
● Value range: 0 to 240
● Default value: 30
● Recommended value: 10
Emulex
The following table lists the key parameters that need to be optimized for some
Emulex HBAs. For details about other parameters, see the documentation of the
corresponding Emulex HBA version.
lpfc_devloss_tmo Specifies the time when the driver waits for the link to
come up after link down before returning I/Os.
● Value range: 1 to 255
● Default value: 10
● Recommended value: 10
NOTE
You are advised to use Huawei UltraPath to configure HBA parameters. UltraPath can
automatically modify the Qlogic-qla2xxx.conf or elx-lpfc.conf configuration file on the
system for optimal HBA parameter configuration. If the multipathing tool provided by the
system is used, refer to the tool instruction provided by the corresponding vendor.
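If UltraPath is not used and the timeouts are set manually at the driver level, the usual Linux approach is a module options file. The following is a minimal sketch, assuming the in-box QLogic qla2xxx and Emulex lpfc drivers; qlport_down_retry corresponds to Port Down Retry Count, the file name is arbitrary, and Link Down Timeout is usually configured through the HBA's own management utility rather than a module option. Rebuild the initramfs and reboot (or reload the drivers) for the options to take effect:
# cat /etc/modprobe.d/99-hba-timeouts.conf
options qla2xxx qlport_down_retry=10
options lpfc lpfc_devloss_tmo=10
# dracut -f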
Figure 5-1 Running the iostat command to check the disk running status
In the command output, avgqu-sz indicates the average queue depth of the block
devices where LUNs are created. If the value is greater than 10 for a long time, the
problem is probably caused by concurrency limits. As a result, I/O operations are
piled up at the block device layer of the host instead of being delivered to the
storage side. In this case, you can modify the HBA concurrency.
Run the following command to check the HBA driver concurrency:
#cat /sys/module/qla2xxx/parameters/ql2xmaxqdepth
128
You are advised to set the same queue depth for hosts in the Oracle cluster. For
small- and medium-sized databases, set the queue depth to 32. For large-sized
databases, you are advised to set the queue depth to 128.
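The queue depth of the QLogic driver is controlled by the ql2xmaxqdepth module parameter queried above. A minimal sketch for making the recommended value persistent (32 for small- and medium-sized databases, 128 for large databases; the file name is arbitrary, and the initramfs must be rebuilt and the host rebooted for the option to take effect):
# cat /etc/modprobe.d/qla2xxx-qdepth.conf
options qla2xxx ql2xmaxqdepth=32
# dracut -f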
The best practice in this document uses udev rules to configure Oracle ASM
disks. Udev has the same performance as ASMLib and Filter Driver, and is an
RHEL native tool that does not occupy extra space.
NOTE
SYMLINK or NAME | You are advised to use SYMLINK to create a directory in /dev for storing device files, for example, raw. | The device files are then stored in a different directory from the original files. This prevents ASM from reporting the ORA-15020 error when the original device and the linked device are discovered at the same time.
Using SYMLINK in this way keeps the linked device out of the same directory as the
original device during the creation of a disk group. The recommended udev
configuration is described in the following sections.
NOTE
If Disk Discovery Path is set incorrectly when you create an ASM disk group, the ORA-15020
error is reported.
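The following is a minimal udev rule sketch for one ASM disk, using the same scsi_id matching style as the I/O scheduler rules in 10 Appendixes. The WWID, link name, and the grid:asmadmin owner, group, and 0660 mode are illustrative assumptions; replace them with the values used in your environment:
# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
ACTION=="add|change", KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u /dev/$name", RESULT=="36207969100f4a3810efc24f70000001a", SYMLINK+="raw/asm-data01", OWNER="grid", GROUP="asmadmin", MODE="0660"
With such rules in place, the ASM disk discovery path can point to /dev/raw/asm-* so that only the linked devices are discovered, avoiding the ORA-15020 error described above.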
5.5.1 Principles
Each block device or its partition corresponds to a request queue for which an I/O
scheduler is selected to coordinate submitted requests. The basic purpose of the
I/O scheduler is to arrange the requests according to their sector numbers on the
block device, to reduce disk head movement and improve efficiency. Each
scheduler maintains a different number of queues for sequencing the submitted
requests and moves the first requests in a queue to the request queue for
response in sequence. The following figure shows the position of the I/O scheduler
in the kernel stack.
The I/O scheduler aims to increase I/O throughput and reduce I/O response time,
two goals that generally conflict. To balance them, the I/O scheduler offers
multiple scheduling algorithms to adapt to different I/O request scenarios. The
following describes the scheduling algorithms provided by the Linux kernel.
CFQ
CFQ allocates a request queue and a time slice to each process for accessing the
block device. A process sends read/write requests to the underlying block device
within its time slice; after the time slice is used up, the request queue of the
process is suspended and waits to be scheduled.
NOTE
In kernel 2.6.18 (released on September 20, 2006), CFQ became the default I/O scheduler.
Deadline
The deadline I/O scheduler aims to ensure that each I/O request is served within a
certain period of time. An expiration time is set for every I/O operation so that
requests are scheduled in a timely manner and are not starved.
Tests show that the deadline I/O scheduler is superior to the CFQ I/O scheduler and is
suitable for certain multithreaded workloads and database workloads.
The deadline I/O scheduler is used in RHEL 7.x systems by default while CFQ is used for
SATA disks by default.
Noop
Noop is short for the No Operation I/O scheduler. It inserts all incoming I/O
requests into a simple first in first out (FIFO) queue and merges adjacent I/O
requests.
Assume that the I/O request sequence is as follows:
● 100, 500, 101, 10, 56, 1000
Noop will process in the following order:
● 100(101), 500, 10, 56, 1000
Noop has obvious advantages in the following scenarios:
● The underlying device is more intelligent than the kernel I/O scheduler. If the
block devices are backed by RAID controllers, or are SAN or NAS storage
devices, these devices organize I/O requests better themselves and do not
need the kernel I/O scheduler to perform extra scheduling.
● Upper-layer applications understand their I/O better than the I/O scheduler.
In other words, the I/O requests sent by the upper-layer applications to the
I/O scheduler have already been optimized, so the I/O scheduler only needs
to issue them in sequence.
● For disks without rotating magnetic heads, Noop is a better choice.
Reordering requests in the I/O scheduler consumes CPU time that mainly
benefits rotating disks; for SSDs, that CPU time can be saved because the
devices schedule requests intelligently themselves.
NOTE
In the preceding scenarios, Noop is not necessarily preferred for all tuning including
performance tuning is based on actual workloads.
NOTE
● The deadline I/O scheduler is recommended by Oracle. For official explanation, click
here.
● The deadline I/O scheduler is recommended by RHEL. For official explanation, click here.
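To check which I/O scheduler a block device is currently using, read its queue/scheduler file in sysfs; the active algorithm appears in brackets. The following runtime sketch uses sdb as an example device; the change made with echo is not persistent across reboots, and persistent configuration through udev rules or grub is shown in 10 Appendixes:
# cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# echo deadline > /sys/block/sdb/queue/scheduler
# cat /sys/block/sdb/queue/scheduler
noop [deadline] cfq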
NOTE
DIRTY DATA: Data that has been modified and saved in the cache to gain performance
advantages. After the data is stored to disks, it becomes clean.
5.6.3 SEM
The following table describes the semaphore kernel parameters comprised of the
SEMMSL, SEMMNI, SEMMNS, and SEMOPM.
Parameter Description
SEMMNS | Defined as the total number of semaphores for the entire system. The default value is 32000.
On database startup, Oracle requires twice the value of PROCESSES (from the
init.ora parameter file) in semaphores, and then releases half of them. To size
SEMMNS properly, you must know the sum of PROCESSES across all instances on
the host. SEMMNS should be set no lower than SEMMNI x SEMMSL, which is how
the default value of 32000 is obtained (250 x 128).
SEMMNI | Defined as the maximum number of semaphore sets for the entire system. The default value is 128.
SEMMNI should be set high enough for a proper number of semaphore sets to be
available on the system. Using the value of SEMMSL, you can determine the
maximum SEMMNI required (SEMMNI = SEMMNS/SEMMSL); round up to the
nearest power of 2.
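On Linux, the four semaphore parameters are set together through the kernel.sem entry, in the order SEMMSL SEMMNS SEMOPM SEMMNI. The following sketch shows the values commonly documented for Oracle database servers; verify them against the sum of PROCESSES across the instances on the host:
# grep kernel.sem /etc/sysctl.conf
kernel.sem = 250 32000 100 128
# sysctl -p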
net.core.wmem_max
5.7 HugePages
HugePages is a feature integrated into the Linux kernel. Enabling HugePages
makes it possible for the operating system to support memory pages greater than
the default (usually 4 KB). Using very large page sizes can improve system
performance by reducing the amount of system resources required to access page
table entries. The size of HugePages varies from 2 MB to 256 MB, depending on
the kernel version and hardware architecture. For Oracle databases, using
HugePages reduces the operating system maintenance of page states, and
increases Translation Lookaside Buffer (TLB) hit ratio. The following describes the
best practices of HugePages in Oracle databases.
5.7.1 Principle
Page Scheduling Process
The following table shows how an operating system converts a virtual address to
a physical address by using a processor (CPU).
Advantages of HugePages
● Reduction of the page table size
The operating system page table (the mapping from virtual memory to physical
memory) is smaller, because each page table entry points to a page of 2 MB to
256 MB. For example, if you use HugePages on 64-bit hardware and want to map
256 MB of memory, you may need only one page table entry (PTE). Without
HugePages, mapping the same 256 MB of memory requires 65,536 PTEs
(256 MB / 4 KB).
● The memory of HugePages is intelligently locked in the physical memory and
is never swapped out to the swap partition, avoiding the swap impact on the
performance.
● As the number of page tables decreases, the TLB hits in the CPU (CPU cache
for the page table) increases.
● Contiguous pages are preallocated and can only be used for the shared
memory.
Parameters of HugePages
The following table describes the parameters related to HugePages.
HugePages_Free | Number of huge pages in the pool that are not yet allocated
NOTE
Start the Oracle database instance and then calculate the recommended value using a
script. After the configuration is complete, restart the Oracle database instance.
Based on the lab test, the service delay decreases and transactions per second
(TPS) increases by about 15% under the same load after HugePages is enabled.
You are advised to enable HugePages when deploying Oracle databases.
Performance improvement varies with service loads.
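A minimal configuration sketch is shown below. The values are illustrative: the huge page count should come from the hugepages_settings.sh script in 10 Appendixes (51200 pages of 2 MB correspond to about 100 GB), and the memlock limit (in KB) reuses the ulimit -l value shown there and must be large enough to cover the SGA:
# cat /etc/sysctl.d/97-oracle-hugepages.conf
vm.nr_hugepages = 51200
# cat /etc/security/limits.d/99-oracle-memlock.conf
oracle soft memlock 241591910
oracle hard memlock 241591910
You can also consider setting the Oracle initialization parameter USE_LARGE_PAGES=ONLY so that an instance refuses to start, rather than silently running without huge pages, when the pool is too small.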
I/O Alignment
If master boot record (MBR) partitions are created in earlier versions of RHEL, the
first 63 sectors of a disk are reserved for the MBR and partition table. The first
partition starts from the 64th sector by default. As a result, misalignment occurs
between data blocks (database or file system) of hosts and data blocks stored in
the disk array, causing poor I/O processing efficiency.
In a Linux operating system, you can use either of the following methods to
resolve I/O misalignment:
Method 1: Change the start location of partitions.
When creating MBR partitions in Linux, you are advised to run the fdisk command
in expert mode and set the start location of the first partition as the start location
of the second extent on a LUN (the default extent size is 4 MB). The following is a
quick command used to create an MBR partition in /dev/sdb. The partition uses
all space of /dev/sdb. The start sector is set to 8192, namely, 4 MB.
printf "n\np\n1\n\n\nx\nb\n1\n8192\nw\n" | fdisk /dev/sdb
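If GPT partitions are acceptable, an equivalent aligned layout can be created with parted instead of the fdisk expert-mode sequence above. The following sketch also starts the first partition at 4 MiB:
# parted -s /dev/sdb mklabel gpt mkpart primary 4MiB 100%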
NOTE
If the database is installed on whole devices configured through udev (without
partitions), partition formatting, and therefore I/O alignment, is not involved.
6.1 Overview
6.2 Data Block
6.3 ASM
6.1 Overview
Parameter/Operation | Recommended Value/Method | Default Value/Method | Location
6.2.1 DB_BLOCK_SIZE
DB_BLOCK_SIZE specifies the size (in bytes) of an Oracle database block. To
achieve good performance, the Oracle database block size must be a multiple of
(and at least equal to) the operating system block size. The following table
describes the DB_BLOCK_SIZE parameter.
Attribute Description
Type Integer
A block size of 8 KB is the best choice for most systems. However, online
transaction processing (OLTP) systems occasionally use a smaller block size, and
decision support systems (DSS) occasionally use a larger block size. Pay attention
to the following when selecting the database block size to achieve the best
performance:
● Read
The purpose is to minimize the number of read times during data retrieval
regardless of the block size.
– If rows are small and access is mostly random, select a smaller block size.
– If rows are small and access is mostly sequential, select a larger block size.
– If rows are small and access is both random and sequential, a larger block
size may be a better choice.
– If rows are large, for example, rows containing large object (LOB) data,
select a larger block size.
● Write
In OLTP systems with highly concurrent I/Os, the database block size of 8 KB
is usually the best for most systems that process a large number of
transactions. Only the system that processes LOB data requires a block size
larger than 8 KB.
● Advantages and disadvantages of the block size
The following table lists the advantages and disadvantages of different block
sizes.
NOTE
For details, see DB_BLOCK_SIZE Initialization Parameter on the official Oracle website.
The size of a non-standard block can be set to 2 KB, 4 KB, 8 KB, 16 KB, or 32 KB.
The ability to specify multiple block sizes for a database is especially useful if you
want to transfer table spaces between databases. For example, a table space that
uses a 4 KB block size can be transferred from an OLTP system to a database that
uses a standard block size of 8 KB.
The following describes how to create a table space with a non-standard block
size of 16 KB.
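The SQL statements themselves were not captured in this section; a representative sketch is shown below (the cache size, tablespace name, and data file destination are illustrative, and a DB_16K_CACHE_SIZE buffer cache must be configured before a 16 KB tablespace can be created). The two confirmations that follow correspond to these statements:
SQL> ALTER SYSTEM SET DB_16K_CACHE_SIZE = 64M;
SQL> CREATE TABLESPACE TBS_16K DATAFILE '+ORADATA' SIZE 10G BLOCKSIZE 16K;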
System altered.
Tablespace created.
Some newer high-capacity disk drives provide 4 KB sectors to strengthen the error
correcting code (ECC) function and improve format efficiency. Most Oracle
database platforms can detect the larger sector size, and the database
automatically creates redo log files whose block size is 4 KB on these disks.
NOTE
The reason for the change from the default 512 bytes to 4 KB is that some drives use 4 KB
as their native block size and increase the block size to reduce metadata overhead.
When using Huawei OceanStor Dorado V6 storage systems, you are advised not to
change the redo log block size. On a disk drive that has 4 KB sectors, you can
specify a 512-byte block size to override the default 4 KB redo log block size and
avoid wasting space. You can run the following command to add a redo log file
group whose block size is 512 bytes:
NOTE
For a disk with 4 KB sectors, BLOCKSIZE 512 overwrites the default 4 KB block.
ALTER DATABASE orcl ADD LOGFILE GROUP 4 ('+ORAREDO') SIZE 100M BLOCKSIZE 512 REUSE;
Run the following command to check the block size of the redo log file:
SQL> select blocksize from v$log;
BLOCKSIZE
----------
512
512
512
512
BLOCK_SIZE
----------
16384
6.3 ASM
Redundancy
The following table lists the redundancy types supported by ASM disk groups.
NOTE
The FLEX or EXTENDED disk groups can have different levels of mirroring redundancy.
You are advised to use the EXTERNAL redundancy mode for all disk groups other
than ORAGRID and ORAMGMT, because the disk groups are already protected by
RAID on the storage system. The OCR and voting files occupy little space, mirroring
them does not affect database performance, and these files are not covered by
normal backup procedures; therefore, to improve cluster software reliability, you
are advised to use the NORMAL redundancy mode for the ORAGRID and
ORAMGMT disk groups. The other data disk groups use external redundancy to
provide a large amount of storage space: because the disk array already provides
data protection, data does not need to be mirrored in the disk groups, which
delivers better I/O performance and larger usable capacity.
NOTE
ASM stores the files on the disk stripes of the disk groups that use the EXTERNAL
redundancy mode. The mirroring function is processed and managed by the storage array.
6.3.3 AU Size
ASM allocates space by block called the allocation unit (AU). An AU is the
fundamental unit of allocation within a disk. All disks in an ASM disk group are
partitioned by AUs of the same size. In most ASM deployment scenarios, the
default AU settings, such as 1 MB for Oracle 11g or 4 MB for Oracle 12cR2, are
helpful to provide excellent performance for Oracle databases. For data
warehouses with large sequential reads, a larger AU is recommended, which can
reduce the metadata quantity for describing files in a disk group and avoid
hotspot data to give full play to performance advantages. The value can be 1 MB,
2 MB, 4 MB, 8 MB, 16 MB, 32 MB, or 64 MB.
For large databases whose size is 10 TB or greater, an AU size of 4 MB or larger is
recommended. The advantages are as follows:
● The SGA size in the relational database management system (RDBMS)
instance is decreased.
● The maximum file size supported by ASM is increased.
● Less time is taken to open a large database, which usually has a large
number of big data files.
NOTE
The AU attribute can be specified only during disk group creation. Once a disk group is
created, the AU size cannot be changed.
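Because the AU size can be set only when a disk group is created, it is specified as a disk group attribute in the CREATE DISKGROUP statement. The following sketch is illustrative (the disk group name, disk path, and 8 MB AU size are assumptions; the disk path follows the udev SYMLINK convention recommended in 5.4 Udev Configuration):
SQL> CREATE DISKGROUP ORADATA EXTERNAL REDUNDANCY DISK '/dev/raw/asm-data*' ATTRIBUTE 'au_size'='8M';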
NOTE
Does ASM striping conflict with the striping provided by storage arrays? The answer is no.
ASM striping is a supplement to storage array striping.
Striped The striping attribute of templates applies to all disk group types
(normal redundancy, high redundancy, and external redundancy).
Permitted values for the striping attribute of ASM templates are
the following:
● FINE. Striping in 128 KB chunks
● COARSE. Striping in 1 MB chunks (4 MB starting with Oracle
Release 12.2)
If no striping attribute is specified, the default value is set to
COARSE.
Oracle ASM striping is used to balance loads across all disks in a disk group and
reduce I/O latency.
● Coarse striping
By default, ASM stripes data in a disk group in the AU size, which is named
coarse striping. The stripe size is always equal to the AU size. That is, a stripe
lies in only one disk and does not cross disks. Data is evenly distributed to all
disks in the disk group in the AU size, ensuring logical data continuity.
● Fine striping
By default, the fine striping size always remains 128 KB, that is, data is read
and written as 128 KB stripes. ASM selects eight devices (if any) from the disk
group, divides them into AUs, and stripes the AUs in the size of 128 KB.
There are two striping methods because of the differences between Oracle files.
Most Oracle files (including data files, archive logs, and backup sets) are large in
size and involve a large amount of data in each I/O operation. For these file
operations, the actual disk data read/write time is much longer than the
addressing time. Therefore, in this mode, using large stripes (coarse striping) can
increase the throughput.
Online log files and control files are small in size. Each I/O operation involves a
small amount of data and requires shorter read and write latency. Therefore, fine
striping is used to distribute data to multiple disks to provide concurrent access
with less latency. In addition, less time is taken to synchronize data to disks
because of small I/Os.
NOTE
During disk group creation, only control files use fine striping by default. You can modify
the template for a specified file type to change the default settings.
● ASM stripe template
Oracle ASM configures an ASM stripe template for each file type. The
following figure shows how to view the ASM stripe template.
Use the ALTER DISKGROUP ... ALTER TEMPLATE command to change the default
striping of the online redo log (onlinelog) template in the ORAREDO disk group to
FINE, as shown in the following example:
SQL> ALTER DISKGROUP +ORAREDO ALTER TEMPLATE onlinelog
ATTRIBUTES(FINE);
NOTE
In the preceding command, +ORAREDO indicates the created disk group. Change it based
on the site requirements.
fixed capacity for all databases. Therefore, you are advised to provide
sufficient concurrency and capacity based on the service scale.
● Advanced LUN features
Ensure that the LUNs in the same disk group have the same advanced
features and add the LUNs in the cluster to the consistency group.
● Example of LUN planning
This document assumes that there are four active I/O paths and Oracle
database 12cR2. The following table describes the LUN plan for Oracle ASM.
Disk Group | AU Size | Redundancy | LUN Configuration
ORAGRID | 4 MB | NORMAL | 50 GB x 3
ORAMGMT | 4 MB | NORMAL | 50 GB x 3
The LUN capacity is calculated based on the actual service volume. You are advised to plan
the capacity of Oracle data files, log files, archive files, and temporary files based on the
multiple of the number of active I/O paths. For details, see Recommendations for Storage
Preparation.
6.3.6 Multiplex
For better performance, you are advised not to reuse redo logs and the reuse
affects performance. You are advised to plan at least seven log groups and two
member files for each log group.
For higher security, you are advised to place group members in different disk
groups. For example, if seven log groups are planned and each log group has two
member files, group members are allocated to different disk groups, as shown in
the following table.
redo01_02.log ORAREDO2
redo02_02.log ORAREDO2
redo03_02.log ORAREDO2
[...]
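Each planned log group can be created with a single statement that places one member in each disk group. The following sketch is illustrative (the group number and size are assumptions); repeat it for every planned group:
SQL> ALTER DATABASE ADD LOGFILE GROUP 4 ('+ORAREDO1','+ORAREDO2') SIZE 2G;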
Attribute Description
Type Integer
Default value 1
OceanStor Dorado V6 storage systems can use the ASM rebalancing feature to
take over or replace devices from other storage vendors and implement data
migration. For example, the DATA disk group can be migrated from an old storage
device to an OceanStor Dorado V6 storage system. To implement such a migration,
ensure that both storage devices are mounted to the hosts and discovered by ASM
during the migration. When the rebalancing is complete, all data has been moved
off the old storage device.
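The takeover itself is a standard ASM add/drop rebalance. The following sketch is illustrative (the new disk path is an assumption; ORADATA_0000 reuses the disk name from the capacity expansion example below); adding the new LUN and dropping the old disk in one statement triggers the rebalance that moves the data:
SQL> ALTER DISKGROUP ORADATA ADD DISK '/dev/raw/asm-newdata01' DROP DISK ORADATA_0000 REBALANCE POWER 8;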
Expanding the Existing LUN Capacity to Expand the Capacity of Oracle ASM
Storage
The following uses DM Multipath as an example to describe how to expand the
LUN capacity when the entire LUN partition is used.
1. Use DeviceManager to expand the capacity of the original LUN, as shown in
the following figure.
2. Run the following command to rescan for disks on the host again. The disk
capacity remains unchanged.
# rescan-scsi-bus.sh
3. Obtain the drive letter of the LUN on the host and run the following
command to rescan for disks:
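The rescan command itself was not captured in this extract; on Linux hosts the per-device rescan is typically triggered as follows (sdx is the drive letter described in the note below):
# echo 1 > /sys/block/sdx/device/rescan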
After the scan is complete, the disk capacity becomes 50 GB.
NOTE
sdx indicates the drive letter of the LUN to be expanded on the application server. For
details about how to view the drive letter, see Step 4. Replace it with the actual drive
letter.
4. View the LUN capacity after expansion.
# multipath -ll ORA-DATA4M-01 | awk '/sd/ {print $(NF-4)}' | xargs -i fdisk -l /dev/{} | egrep "^Disk"
Disk /dev/sdyn: 859.0 GB, 858993459200 bytes, 1677721600 sectors
Disk /dev/sdzd: 859.0 GB, 858993459200 bytes, 1677721600 sectors
Disk /dev/sdyv: 859.0 GB, 858993459200 bytes, 1677721600 sectors
Disk /dev/sdzl: 859.0 GB, 858993459200 bytes, 1677721600 sectors
5. Reload the multipath device.
# multipathd -k "resize map ORA-DATA4M-01"
ok
NOTE
In the preceding command, ORA-DATA4M-01 is the LUN alias in the multipath file.
Change it based on the site requirements.
6. Run the asmcmd lsdsk command to check the capacity change of the ASM
disk.
If the capacity of OS_MB does not change, do not go to the next step.
NOTE
Total_MB: capacity of the LUN in the current disk group before capacity expansion.
OS_MB: maximum size of an ASM disk that can be expanded to.
7. Run the ALTER DISKGROUP RESIZE DISK command to adjust the ASM disk
size.
SQL> ALTER DISKGROUP ORADATA RESIZE DISK ORADATA_0000 SIZE 819200M REBALANCE POWER
10;
Diskgroup altered.
8. Check the size of the new ASM disk.
NOTE
Adjusting the LUN capacity on the operating system may cause data loss. You are advised
to back up all LUN data before adjusting the capacity.
Oracle ASM has the following storage limits if the COMPATIBLE.ASM and
COMPATIBLE.RDBMS disk group attributes are set to 12.1 or later:
● When the AU size is equal to 1 MB, the maximum storage space of each
Oracle ASM disk can be set to 4 PB.
● When the AU size is equal to 2 MB, the maximum storage space of each
Oracle ASM disk can be set to 8 PB.
● When the AU size is equal to 4 MB, the maximum storage space of each
Oracle ASM disk can be set to 16 PB.
● When the AU size is equal to 8 MB, the maximum storage space of each
Oracle ASM disk can be set to 32 PB.
● The maximum storage space of the storage system can be set to 320 EB.
NOTE
● Maximum size of a disk group = Maximum disk size x Maximum number of disks in the disk group (10,000)
● The maximum number of disks in all disk groups is 10,000. The 10,000 disks can be in
one disk group or in a maximum of 511 disk groups.
NOTE
n indicates the number of Oracle databases connected to Oracle ASM instances. The
formula is based on experience.
7 Summary
The following table lists the best practices of all operation objects and
configurations mentioned in the preceding sections. For specific details, see the
corresponding section.
SWAPPINESS | 1 to 20 | 60 | 5.6.1 Virtual Memory
DIRTY_RATIO | 40 to 80 | 20 | 5.6.1 Virtual Memory
DIRTY_BACKGROUND_RATIO | 3 | 10 | 5.6.1 Virtual Memory
8 References
References
OceanStor Dorado V6 storage system product documentation
Oracle ASM Administrator's Guide
Product Documentation for RHEL 7
9 Glossary
Acronym or Description
Abbreviation
FC Fibre Channel
10 Appendixes
NOTE
The CLI commands may vary depending on the UltraPath version. For details, click
here.
Method 2:
# cd /dev/disk/by-id
[root@oracle1 by-id]# ll -lh
lrwxrwxrwx 1 root root 9 Mar 12 17:08 wwn-0x6c4ff1f100ee3d7501948ec2000002c5 -> ../../sdb
NOTE
If the required permission is incorrect, restart the node or the UDEV service. The
following are examples:
RHEL7.x
udevadm control --reload-rules
udevadm trigger
or
udevadm trigger --type=devices --action=change
RHEL6.x
udevadm control --reload-rules
start_udev
NOTE
The echo configuration will become invalid after the system is restarted.
● Run the udev rules to configure I/O scheduler.
RHEL creates a file in the following location. The Linux operating system uses
the udev rules to set the I/O scheduler algorithm after each restart.
#vi /etc/udev/rules.d/99-huawei-storage.rules
In RHEL 6 and 7, add the following content to the udev rules file. Then,
restart the operating system. The following is an example:
ACTION=="add|change", KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u /dev/$name", RESULT=="36207969100f4a3810efc24f70000001a", ATTR{queue/scheduler}="deadline"
ACTION=="add|change", KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u /dev/$name", RESULT=="36207969100f4a3810efc253b0000001b", ATTR{queue/scheduler}="deadline"
ACTION=="add|change", KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u /dev/$name", RESULT=="36207969100f4a3810efc25a60000001c", ATTR{queue/scheduler}="deadline"
ACTION=="add|change", KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u /dev/$name", RESULT=="36207969100f4a3810efc25e70000001d", ATTR{queue/scheduler}="deadline"
NOTE
After the configuration is complete, only the changes that apply to the sd* devices
are displayed. Changes related to dm-* devices are not displayed directly; however,
a dm-* device inherits the I/O scheduler algorithm from the sd* devices that form
its paths.
NOTE
In some virtual environments (such as VMs) and for some special devices, the output of
the preceding command may be none. In this case, the operating system or VM bypasses
the kernel I/O scheduler and submits all I/O requests directly to the device (for example,
for swap space). Do not change the I/O scheduler algorithm in such an environment.
Method 2: Modify the grub file to change the scheduling algorithm for all
devices in a unified manner.
● In RHEL 6, add elevator=deadline to the end of the kernel line in the /etc/
grub.conf file.
– After modifying the /etc/grub.conf file, restart the system for the
modification to take effect.
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=3333d6ac-b29b-400a-8ccf-4b93e7b232da rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet elevator=deadline
initrd /initramfs-2.6.32-431.el6.x86_64.img
▪ On a BIOS-based host:
# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-112.16.4.el7uek.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-862.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-862.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-6976e35301344914acfe829b6e8ec911
Found initrd image: /boot/initramfs-0-rescue-6976e35301344914acfe829b6e8ec911.img
done
▪ On a UEFI-based host:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-327.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-327.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-537dacc8159a4d4caaa419342da0b820
Found initrd image: /boot/initramfs-0-rescue-537dacc8159a4d4caaa419342da0b820.img
done
Enabling HugePages
To configure and enable HugePages on a computer, perform the following steps:
3. Log in to the operating system as user oracle and run the ulimit -l command
to verify that the value of memlock is changed.
$ ulimit -l
241591910
5. Create a script and use it to calculate the recommended value of the current
shared memory segment of HugePages.
a. Create a text file named hugepages_settings.sh.
b. Add the following content in the file:
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk {'print $2'}`
# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk {'print $5'} | grep "[0-9][0-9]*"`
do
MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
if [ $MIN_PG -gt 0 ]; then
NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
fi
done
# Finish with results
case $KERN in
'2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
'2.6' | '3.8' | '3.10' | '4.1' | '4.14' ) echo "Recommended setting: vm.nr_hugepages =
$NUM_PG" ;;
*) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac
# End
NOTE
If the host's kernel information is not contained in the script, add it manually. If
the kernel information is not contained in the script, the following error message
is displayed:
Unrecognized kernel version 4.1. Exiting.
Transparent HugePages may cause memory allocation delay during the runtime.
To prevent performance problems, you are advised to disable Transparent
HugePages on all Oracle database servers.
▪ On a BIOS-based host:
# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-112.16.4.el7uek.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-862.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-862.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-6976e35301344914acfe829b6e8ec911
Found initrd image: /boot/initramfs-0-rescue-6976e35301344914acfe829b6e8ec911.img
done
▪ On a UEFI-based host:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-327.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-327.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-537dacc8159a4d4caaa419342da0b820
Found initrd image: /boot/initramfs-0-rescue-537dacc8159a4d4caaa419342da0b820.img
done