Assessing and Tuning Network Performance for Data Guard and RMAN (Doc ID 2064368.1)
In this Document

  Goal
  Solution
    Installation and Usage of oratcptest
    Supporting Utilities
      Gathering Information about the Interface with ethtool
      Gather Socket Information Using the ‘ss’ Command
      Using the Linux ‘sar’ Command to Monitor Interface Throughput
    Determining Optimal Socket Buffer Size
      Socket Buffer Size
      Selective Acknowledgements
      Changing the Maximum Socket Buffer Size and Testing Single Process Throughput (Scenarios 1, 2, 3 and 4)
    Multiple Process Aggregate Throughput (Scenario 3 - RMAN Operations)
      Testing Multiple Process Throughput
    Throughput with SYNC (Scenario 4 – SYNC Transport)
      Assessing Network Performance for Data Guard Synchronous Redo Transport
      Setting Oracle Net SDU for SYNC Transport
        Setting SDU for Oracle RAC
        Setting SDU for Non-Oracle RAC
      Use oratcptest to Assess SYNC Transport Performance
        Determine Redo Write Size
        Run tests with oratcptest
        Implement FASTSYNC
        Increase Socket Buffer Size
    Command Options
      oratcptest help output

Document Details: Type: HOWTO; Status: PUBLISHED; Last Major Update: Sep 9, 2022; Last Update: Sep 9, 2022

APPLIES TO:

Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.

GOAL

The goal of this note is to provide steps to evaluate network bandwidth and experiments that will help tune the operating system, Oracle Net, or RMAN parallelism to improve overall network throughput for Oracle database operations that need to transfer data across the network. The most typical database operations that benefit from this tuning are 1) RMAN database backup/restore or database migration operations across the network, 2) Data Guard instantiation across the network, and 3) Data Guard or GoldenGate redo or data transport across the network.

Scenario 1 - Understand the existing network and evaluate tuning options prior to database migration, Data Guard deployment or RMAN operations for a large database.

It is critical that you have sufficient network bandwidth to support peak redo rates (steady state and when resolving gaps) along with any other network activity that shares the same network. Please note that your point-to-point network bandwidth will be throttled by the network segment, switch, router, and interface with the lowest network bandwidth. Some networks also implement Quality of Service rules to prioritize certain data, as well as placing restrictions on single-flow throughput to prevent a single process from consuming the network. Using oratcptest as described below can help you determine potential network throughput on the shared network.

Scenario 2 - Experiencing a transport lag with the ASYNC transport.

With enough network bandwidth, ASYNC transport can maintain pace with very high workloads, up to approximately 400MB/sec per Real Application Clusters instance. In cases where resources are constrained, the ASYNC transport can fall behind, resulting in a growing transport lag on the standby. A transport lag is the amount of data, measured in time, that the standby has not received from the primary. Determine the transport lag on the standby database by querying the V$DATAGUARD_STATS view with a query like the following.

SQL> select name,value,time_computed,datum_time from v$dataguard_stats where name='transport lag';

Scenario 3 - Determine maximum network bandwidth or evaluate potential RMAN operation throughput.

RMAN can parallelize across many RMAN channels on the same node or across nodes in a RAC cluster, but it is bounded by the available network bandwidth. Evaluating network bandwidth is an important prerequisite prior to a large database migration or standby database instantiation, or when backup or restore rates are not meeting expectations.

Scenario 4 - Post deployment: tuning transaction response time with SYNC transport.

When SYNC redo transport is enabled, a remote write is introduced in addition to the regular local write for commit processing. This remote write, depending on network latency and remote I/O bandwidth, can increase commit processing time. Because commit processing takes longer, more sessions will wait on LGWR to finish its work and begin work on their commit requests; in other words, application concurrency increases. Analyze database statistics and wait events to detect increased application concurrency and to confirm that application performance (e.g. throughput and response time) is still acceptable.

NOTE: Redo compression and encryption are out of scope for this document; however, each can have an impact on transfer rates and transport lag and should be tested prior to implementation. It is best to evaluate with and without compression and encryption and compare the performance differences. The overhead is usually attributed to the additional work and time to compress or encrypt the redo before sending it, and to decompress or decrypt it after receiving it.

SOLUTION

Installation and Usage of oratcptest

oratcptest can be used as a general-purpose tool for measuring network bandwidth and latency. However, oratcptest was designed specifically to help customers assess the network resources used by Data Guard redo transport, GoldenGate, RMAN backup and restore, migration, Data Guard instantiation, and database remote clone operations.

You can control the test behavior by specifying various options. For example,

Network message size
Delay time between messages
Parallel sending streams
Whether the oratcptest server should write messages to disk, or not
Whether the oratcptest client should wait for an ACK before it sends the next message, or not, thus simulating the ASYNC and SYNC transports used by Data Guard

NOTE: This tool, like any Oracle network streaming transport, can simulate efficient network packet transfers from the source host to the target host. Throughput can be 100 MB/sec or higher depending on the available network bandwidth between the source and target servers and the invoked tool options. Use caution when other critical applications share the same network.

1. Copy the JAR file attached to this MOS note onto both client and server hosts. 

NOTE: oratcptest can be executed as any user.  Root privileges are not required.

2. Verify that the host where you install oratcptest has JRE 6 or later.
$ java -version
java version "11.0.16.1" 2022-08-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.16.1+1-LTS-1)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.16.1+1-LTS-1, mixed mode)

3. Verify that the JVM can run the JAR file with this command on both hosts.
% java -jar oratcptest.jar -help
This command displays the help. An error will result if the JVM cannot run the JAR file.

4. Start the test server on the receiving side.   


% java -jar oratcptest.jar -server [VIP in RAC configurations (host name or IP in non-RAC)] -port=<port number>

NOTE: you can supply any available port for the server.

5. Run the test client. (Please change the server address and port number to match those of your server started in step 4.)
% java -jar oratcptest.jar <test.server.address.com or IP provided in the server command> -port=<port number> -duration=120s -interval=20s -mode=ASYNC

The test will display output similar to the following :

Server (Standby)
-----------------

$ java -jar oratcptest.jar -server <VIP> -port=1521

OraTcpTest server started.

[A test was requested.]


Message payload = 1 Mbyte
Disk write = NO
Socket receive buffer = (system default)

The test terminated. The socket receive buffer was 3 Mbytes.

Client (Primary)
----------------

$ java -jar oratcptest.jar <standby VIP> -port=1521 -duration=120s -interval=20s -mode=ASYNC

[Requesting a test]
        Message payload        = 1 Mbyte
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = ASYNC
        Disk write             = NO
        Statistics interval    = 20 seconds
        Test duration          = 2 minutes
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(16:50:56) The server is ready.


                        Throughput           
(16:51:16)         46.859 Mbytes/s
(16:51:36)         48.011 Mbytes/s
(16:51:56)         46.699 Mbytes/s
(16:52:16)         45.982 Mbytes/s
(16:52:36)         46.145 Mbytes/s
(16:52:56)         46.095 Mbytes/s
(16:52:56) Test finished.
               Socket send buffer = 2097152
                  Avg. throughput = 46.631 Mbytes/s

NOTE: The socket buffer size reported by oratcptest is half of the actual value as confirmed by the ‘ss’ command (see
below).

 
6. To see a complete list of options issue the following command
$ java -jar oratcptest.jar -help

Supporting Utilities

Some utilities are useful in assisting with the documented processes. The utilities described are available in Linux. For other
operating systems refer to the operating system administration guide for equivalent utilities.

Gathering Information about the Interface with ethtool

The Linux utility ethtool can be used to find the theoretical maximum speed of the network interface.  On a shared system, the
combined network traffic of the databases cannot exceed the interface’s speed. 

To find the interface used, execute the following ip command from each node to the VIP or IP of the other node:

# ip route get <IP>


<target IP> via <GATEWAY> dev <INTERFACE> src <LOCAL IP> uid 0
    cache

Find the speed and duplex setting of the interface using ethtool.

# ethtool <INTERFACE> | egrep 'Speed|Duplex'


        Speed: 10000Mb/s
        Duplex: Full

Multiply the speed returned by 0.125 to convert to MB/s

10000 Mb * 0.125 = 1250 MB/s
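
This conversion can be scripted; the following one-liner is a sketch that assumes the 'Speed: <N>Mb/s' output format shown above:

# ethtool <INTERFACE> | awk '/Speed/ { sub(/Mb\/s/, "", $2); printf "%.0f MB/s\n", $2 * 0.125 }'
1250 MB/s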

If the Duplex value is ‘Full’, the speed can be achieved in both directions (send and receive) simultaneously.  Otherwise, the
aggregate of send and receive throughput cannot exceed the speed.

For a data link, if the interfaces of the two systems involved have different ratings, throughput will be limited by the smaller of the two.

Gather Socket Information Using the ‘ss’ Command

The Linux ‘ss’ command provides detailed information about a network socket which can be used to monitor the effects of
tuning.  It must be run with root permissions.

Execute the command on the primary and standby servers while oratcptest is running.  The information relevant to this note is:

rb = receive buffers (in bytes)
tb = transmit (write) buffers (in bytes)
sack = selective acknowledgements (enabled if shown)

Example output:

(Standby)

# ss --info --tcp --memory --processes --numeric --extended | grep -A1 `ps -aef | grep oratcptest | grep -v grep | awk '{print $2}'`

ESTAB 0 12 [::ffff: <standby IP address>]:<standby port> [::ffff: <primary IP address>]:<primary port>


users:(("java",pid=1491972,fd=8)) timer:(on,125ms,0) uid:1000 ino:109975165 sk:d <->
skmem:(r0,rb6291456,t0,tb332800,f348404,w3852,o0,bl0,d0) ts sack cubic wscale:7,7 rto:262
rtt:61.828/0.769 ato:40 mss:8948 pmtu:9000 rcvmss:8948 advmss:8948 cwnd:10 bytes_sent:18712
bytes_acked:18700 bytes_received:4867992964 segs_out:33837 segs_in:546831 data_segs_out:4643
data_segs_in:546472 send 11577926bps lastsnd:1 lastack:1 pacing_rate 23155800bps delivery_rate
4605984bps delivered:4641 app_limited busy:97472ms unacked:3 rcv_rtt:61.137 rcv_space:4913536
rcv_ssthresh:3144448 minrtt:61.242
(Primary)

# ss --info --tcp --memory --processes --numeric --extended | grep -A1 `ps -aef | grep oratcptest | grep -v grep | awk '{print $2}'`

ESTAB 0 4113372 [::ffff:<primary IP address>]:<primary port> [::ffff: <standby IP address>]:<standby


port> users:(("java",pid=1447642,fd=6)) timer:(on,078ms,0) uid:1000 ino:107991454 sk:18 <->
skmem:(r0,rb87380,t0,tb4194304,f804,w4201692,o0,bl0,d0) ts sack cubic wscale:7,7 rto:262
rtt:61.518/0.03 ato:40 mss:8948 pmtu:9000 rcvmss:536 advmss:8948 cwnd:730 bytes_sent:4712232408
bytes_acked:4709087961 bytes_received:18104 segs_out:529310 segs_in:32666 data_segs_out:528958
data_segs_in:4491 send 849447641bps lastsnd:48 lastrcv:48 lastack:48 pacing_rate 1698891824bps
delivery_rate 411803432bps delivered:528606 app_limited busy:94800ms rwnd_limited:69614ms(73.4%)
sndbuf_limited:23344ms(24.6%) unacked:353 rcv_space:34742 rcv_ssthresh:34742 notsent:968924
minrtt:61.233

In the example, the primary transmit socket buffer size is 4MB (tb4194304) and the standby receive socket buffer size is 6MB (rb6291456).
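
To isolate just the buffer fields while a test runs, the skmem portion of the output can be extracted; this is a sketch assuming the skmem:(...) format shown above:

# ss --info --tcp --memory --numeric | grep -o 'skmem:([^)]*)'
skmem:(r0,rb6291456,t0,tb332800,f348404,w3852,o0,bl0,d0)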

Using the Linux ‘sar’ Command to Monitor Interface Throughput

The Linux ‘sar’ command is useful for monitoring the total throughput of a given interface.  The information gathered from ethtool should indicate the approximate achievable maximum aggregate for the interface.

While oratcptest is running, execute the following command, which polls the interface determined in the ethtool section every 5 seconds.

rxkB/s = Receive KB/s
txkB/s = Transmit KB/s

Sample Output:

(standby)

$ sar -n DEV 5 | egrep 'IFACE|<interface name>'

#NOTE rxkB/s column

04:50:59 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:04 PM    <name>   5692.00    364.80  49867.42      34.17      0.00      0.00      0.00      1.63
04:51:04 PM     IFACE   rxpck/s   txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:09 PM    <name>   5705.40    384.40  49890.11     74.35      0.00      0.00      0.00      1.63
04:51:09 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:14 PM    <name>   5601.40    345.80  48999.09      45.89      0.00      0.00      0.00      1.61
04:51:14 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:19 PM    <name>   5704.20    387.60  49928.55     38.92      0.00      0.00      0.00      1.64
04:51:19 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:24 PM    <name>   5671.80    333.60  49691.13      22.55      0.00      0.00      0.00      1.63
04:51:24 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:29 PM    <name>   5581.00    343.60  48883.48      22.39      0.00      0.00      0.00      1.60

(primary) 

$ sar -n DEV 5 | egrep 'IFACE|<interface name>'

#NOTE txkB/s column

04:50:58 PM     IFACE   rxpck/s   txpck/s    rxkB/s   txkB/s  rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:03 PM    <name>    365.20   5699.40     24.09  49880.22       0.00      0.00      0.00      1.63
04:51:03 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s  rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:08 PM    <name>    364.00   5694.60     23.67  49866.02       0.00      0.00      0.00      1.63
04:51:08 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:13 PM    <name>    353.80   5606.80     22.99  49057.97       0.00      0.00      0.00      1.61
04:51:13 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:18 PM    <name>    379.80   5698.20     24.69  49887.37       0.00      0.00      0.00      1.63
04:51:18 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:23 PM    <name>    342.20   5681.20     22.35  49762.49       0.00      0.00      0.00      1.63
04:51:23 PM     IFACE   rxpck/s   txpck/s    rxkB/s   txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:28 PM    <name>    347.20   5579.80     22.61  48851.03       0.00      0.00      0.00      1.60

Determining Optimal Socket Buffer Size

NOTE: Use the value determined for socket buffer size by this process for all subsequent testing.

Socket Buffer Size

The Bandwidth Delay Product (BDP) is the product of a data link's capacity (bandwidth) and its round-trip time, or latency.  BDP represents the maximum amount of unacknowledged data possible on the data link; each link will have a different BDP.
The socket buffer is the queue of packets the kernel stores while writing to or reading from the network socket.  In essence, the
socket buffer size limits the maximum amount of unacknowledged data the endpoints/systems allow on a data link.  There is a
separate buffer for read and write and each socket has its own buffers. 

If the socket buffer size for a given data link is not at least as large as the BDP, throughput may be throttled.

For best results, the recommended maximum socket buffer size is 3*BDP; however, for larger bandwidth and higher latency networks that value could be very high.  A lower value may be chosen for the maximum socket buffer size, but it is not recommended to set it lower than the BDP.
i.e. BDP <= socket buffer size <= 3*BDP
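
As a worked example, using assumed link characteristics of a 10GbE interface and a 4ms round-trip time (the same figures used in step 3 of the next section), the BDP and the 3*BDP ceiling can be computed as follows:

$ awk 'BEGIN {
    bw  = 10000000000;   # assumed bandwidth in bits/sec (10GbE)
    rtt = 0.004;         # assumed round-trip time in seconds (4ms)
    bdp = bw / 8 * rtt;  # bytes the link can hold unacknowledged
    printf "BDP = %d bytes (~%.1f MB), 3*BDP = %d bytes (~%.1f MB)\n", bdp, bdp/1048576, 3*bdp, 3*bdp/1048576
}'
BDP = 5000000 bytes (~4.8 MB), 3*BDP = 15000000 bytes (~14.3 MB)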

The TCP stack dynamically manages the socket buffers for a given socket as defined by the values of ipv4/tcp_rmem and
ipv4/tcp_wmem.  Each of these kernel parameters has three values, the minimum, default and maximum.  Tuning involves
changing the maximum (third) value.  The values are in bytes. 

For example:

# sysctl -a | egrep 'tcp_[r|w]mem'


net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304

When a socket opens, it is allocated the default number of buffers, the middle value.  If the socket requires additional buffers,
TCP will allocate those buffers up to the maximum value.  Therefore, increasing the maximum (third) value for tcp_rmem and
tcp_wmem can improve throughput of a data link.

Additionally, TCP buffer allocation is bound by the values in ipv4/tcp_mem, which contains three values (low, pressure, and high) expressed in pages (find the page size in Linux with getconf PAGESIZE).  These values are set at boot time based on available memory.  For example:

# sysctl net.ipv4.tcp_mem
net.ipv4.tcp_mem = 6169545 8226061 12339090

# getconf PAGESIZE
4096

In this example system, the maximum total memory TCP will allocate is ~50GB (4096 page size * 12339090 pages (high)). 
When allocation exceeds ~32GB (4096 page size * 8226061 pages (pressure)), TCP will begin to moderate memory
consumption.
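
The same arithmetic can be run directly against the live settings; this sketch simply combines the sysctl and getconf values shown above:

# awk -v high=$(sysctl -n net.ipv4.tcp_mem | awk '{print $3}') -v psz=$(getconf PAGESIZE) 'BEGIN { printf "TCP memory ceiling: ~%.1f GB\n", high * psz / 1000000000 }'
TCP memory ceiling: ~50.5 GB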

When setting tcp_rmem and tcp_wmem these values should be taken into consideration with the needs of the entire system.

NOTE: net.core.rmem_max and net.core.wmem_max are orthogonal kernel parameters that set the maximum value allowed when a process requests a specific socket buffer size with a setsockopt call; they have no relationship to the tcp_[r|w]mem parameters.  These kernel parameters are out of scope for this process.

Selective Acknowledgements

With large TCP windows, enabled by larger socket buffer sizes, Selective Acknowledgements (SACK) are strongly recommended. With selective acknowledgements, dropped or out-of-order packets are retransmitted by the sender individually, rather than retransmitting the entire window as happens when SACK is not used. Ensure SACK is enabled on all systems.  

# cat /proc/sys/net/ipv4/tcp_sack
1

A value >0 indicates SACK is enabled.
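
If SACK is disabled, it can be enabled with sysctl, and persisted in /etc/sysctl.conf in the same way as the buffer parameters described below:

# sysctl -w net.ipv4.tcp_sack=1
net.ipv4.tcp_sack = 1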

SACK can be further confirmed for a given socket using the ‘ss’ command. Selective acknowledgements are enabled by default on Exadata and most newer systems.

Changing the Maximum Socket Buffer Size and Testing Single Process Throughput (Scenarios 1, 2, 3 and 4)

Run tests to find the right value for the maximum socket buffer size which balances throughput and memory utilization of the
system.

1. Find the current setting for maximum socket buffer size on each system:

On Linux, as root:
# cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 6291456

# cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304
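
The same limits can be read with a single sysctl command, which is equivalent to reading the proc files above:

# sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304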

2. Run a test of single process throughput while gathering output from:


a. oratcptest
b. ss
c. sar

Output Standby

oratcptest $ java -jar oratcptest.jar -server <VIP> -port=1521

OraTcpTest server started.

[A test was requested.]


        Message payload       = 1 Mbyte
        Disk write            = NO
        Socket receive buffer = (system default)

The test terminated. The socket receive buffer was 3 Mbytes.  (Note: Actual buffer is 6MB)

ss (root) # ss --info --tcp --memory --processes --numeric --extended | grep -A1 `ps -aef | grep oratcptest | grep -v grep | awk '{print $2}'`

ESTAB      0      12     [::ffff: <standby IP address>]:<standby port>  [::ffff: <primary IP address>]:<primary port> users:(("java",pid=1491972,fd=8)) timer:(on,125ms,0) uid:1000 ino:109975165 sk:d <->

         skmem:(r0,rb6291456,t0,tb332800,f348404,w3852,o0,bl0,d0) ts sack cubic wscale:7,7 rto:262
rtt:61.828/0.769 ato:40 mss:8948 pmtu:9000 rcvmss:8948 advmss:8948 cwnd:10 bytes_sent:18712
bytes_acked:18700 bytes_received:4867992964 segs_out:33837 segs_in:546831 data_segs_out:4643
data_segs_in:546472 send 11577926bps lastsnd:1 lastack:1 pacing_rate 23155800bps delivery_rate
4605984bps delivered:4641 app_limited busy:97472ms unacked:3 rcv_rtt:61.137 rcv_space:4913536
rcv_ssthresh:3144448 minrtt:61.242

sar $ sar -n DEV 5 | egrep 'IFACE|<interface name>'

#NOTE rxkB/s column

04:50:59 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:04 PM    <name>   5692.00    364.80  49867.42     34.17      0.00      0.00      0.00      1.63
04:51:04 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:09 PM    <name>   5705.40    384.40  49890.11     74.35      0.00      0.00      0.00      1.63
04:51:09 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:14 PM    <name>   5601.40    345.80  48999.09     45.89      0.00      0.00      0.00      1.61
04:51:14 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
04:51:19 PM    <name>   5704.20    387.60  49928.55     38.92      0.00      0.00      0.00      1.64

<truncated for brevity>

Notice the socket buffer size from the ss output on the primary was 4MB and on the standby was 6MB. These are the
maximum values available to the sockets for the current settings of tcp_rmem and tcp_wmem.
These settings achieved a rate of ~46MB/s.

 
3. Now set the maximum send and receive buffer sizes to 3*BDP.  For example, with a 10GbE network and a 4ms latency (~5MB BDP), a 16MB socket buffer size would be appropriate.

On Linux, as root:

Set the maximum buffer sizes to 16MB for read and write buffers.  Change only the third value:
 

# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"

net.ipv4.tcp_rmem = 4096 87380 16777216

# sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"

net.ipv4.tcp_wmem = 4096 16384 16777216

  

NOTE: The values for the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters must be quoted on the sysctl -w command line.

NOTE: Any firewall or other pass-through server on the route between the source and target can reduce the effective
socket buffer size of the endpoints. Be sure to increase the size of sockets buffers on these servers as well.

NOTE: Increasing these values potentially increases system memory usage.

NOTE: Changes made with sysctl -w are not persistent across reboots; see step 8 below.

  
4. Test throughput of a single process using the new maximum socket buffer size.

Re-run the previous tests, allowing the socket buffers to dynamically grow to the new maximum of 3*BDP.

Output Standby

oratcptest $ java -jar oratcptest.jar -server <VIP of node (or hostname when no VIP exists) > -port=1521

OraTcpTest server started.

[A test was requested.]


        Message payload       = 1 Mbyte
        Disk write            = NO
        Socket receive buffer = (system default)

The test terminated. The socket receive buffer was 8 Mbytes.  (Note: Actual buffer is 16MB)

ss (root)             skmem:(r0,rb16777216,t0,tb332800,f2016,w10272,o0,bl0,d0) ts sack cubic wscale:9,9 rto:260

rtt:59.137/0.083 ato:40 mss:8948 pmtu:9000 rcvmss:8948 advmss:8948 cwnd:18 bytes_sent:24360
bytes_acked:24328 bytes_received:6349127924 segs_out:46216 segs_in:710774 data_segs_out:6055
data_segs_in:710478 send 21788593bps lastsnd:49 lastrcv:49 lastack:49 pacing_rate 43576720bps
delivery_rate 11222792bps delivered:6048 app_limited busy:45268ms unacked:8 rcv_rtt:59.005
rcv_space:10021760 rcv_ssthresh:8387072 minrtt:56.632

sar  

#NOTE rxkB/s column


05:34:13 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
05:34:18 PM    <name>  15837.00   1146.20 139186.95     74.40      0.00      0.00      0.00
05:34:18 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
05:34:23 PM    <name>  15730.40   1010.00 138213.64     66.48      0.00      0.00      0.00
05:34:23 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
05:34:28 PM    <name>  15790.40    996.00 138734.82     66.39      0.00      0.00      0.00
05:34:28 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
05:34:33 PM    <name>  15884.60   1016.60 139570.83     74.17      0.00      0.00      0.00

<truncated for brevity>

By allowing TCP to increase the size of the read and write socket buffers to 16MB, the throughput of the link increased from ~46MB/s to over 130MB/s for a single process.

NOTE: The -sockbuf oratcptest parameter is not used during this process.  TCP dynamic buffer management is
disabled when a specific buffer size is requested as with the -sockbuf parameter.

 
5. If needed, repeat steps 3 and 4 with a larger socket buffer size.

6. After completing for one node, find the total aggregate throughput of a cluster by executing the same tests on all nodes of the primary and all nodes of the standby, where primary node 1 ships to standby node 1, node 2 to node 2, etc.
 
7. Reverse roles and re-run these tests. Routes can differ depending on the direction of the stream, so throughput could be affected.
 
8. Set the new preferred values in /etc/sysctl.conf

To make the new maximum values persistent across reboots, edit /etc/sysctl.conf (or the appropriate /usr/lib/sysctl.d/ configuration file), changing the existing lines for the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters to the new values. If these lines do not exist, they can be added. Note that values in /etc/sysctl.conf are not quoted; quoting is only required on the sysctl -w command line to protect the spaces from the shell.

(as root)
# vi /etc/sysctl.conf
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
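
To apply the file without a reboot, reload it with sysctl; by default sysctl -p reads /etc/sysctl.conf and echoes each setting it applies:

# sysctl -p
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216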

Multiple Process Aggregate Throughput (Scenario 3 - RMAN Operations)

While redo transport is a single process for each database instance, standby database instantiation and backup and recovery operations can be done in parallel. Understanding the achievable throughput with parallelism is an important part of planning instantiation of large databases, migrations, and backup and recovery scenarios.

Testing Multiple Process Throughput

The oratcptest option -num_conn runs multiple streams and aggregates the total throughput.  It can be used to mimic parallelism on a given node and can be run on multiple nodes simultaneously to find the total throughput of a cluster.

Iteratively test increasing values of -num_conn until the throughput does not improve for 3 consecutive values.  The socket buffer size is another variable that can be changed.

1. Execute the same process from the previous section while including the -num_conn parameter, as shown in the example below. 

NOTE: This is an iterative process of increasing the number of connections to find the maximum throughput. The assumption is that the optimal maximum socket buffer size has already been set using the steps in the previous section.
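
A first-iteration client invocation might look like the following sketch, which reuses the earlier ASYNC client command; substitute your own server address and port:

$ java -jar oratcptest.jar <standby VIP> -port=1521 -mode=ASYNC -duration=120s -interval=20s -num_conn=2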

  

Output Standby

oratcptest $ java -jar oratcptest.jar -server <VIP of node (or hostname when no VIP exists)> -port=1521

OraTcpTest server started.

[A test was requested.]


        Message payload       = 1 Mbyte
        Disk write            = NO
        Socket receive buffer = (system default)

[A test was requested.]


        Message payload       = 1 Mbyte
        Disk write            = NO
        Socket receive buffer = (system default)

The test terminated. The socket receive buffer was 8 Mbytes.  (Note: Actual buffer is 16MB)

The test terminated. The socket receive buffer was 8 Mbytes.  (Note: Actual buffer is 16MB)

ss (root)  # ss --info --tcp --memory --processes --numeric --extended | grep -A1 `ps -aef | grep oratcptest | grep -v grep | awk '{print $2}'`

ESTAB      0      32     [::ffff: <standby IP address>]:<standby port>  [::ffff: <primary IP address>]:<primary
port> users:(("java",pid=1841422,fd=9)) timer:(on,120ms,0) uid:1000 ino:111022883 sk:31 <->

         skmem:(r0,rb16777216,t0,tb332800,f649184,w10272,o0,bl0,d0) ts sack cubic wscale:9,9 rto:265


rtt:64.416/2.67 ato:40 mss:8948 pmtu:9000 rcvmss:8948 advmss:8948 cwnd:18 bytes_sent:1992
bytes_acked:1960 bytes_received:484606268 segs_out:9951 segs_in:54190 data_segs_out:463
data_segs_in:54184 send 20002981bps lastsnd:21 lastrcv:11 lastack:11 pacing_rate 40005488bps delivery_rate
9975160bps delivered:456 app_limited busy:3892ms unacked:8 rcv_rtt:62.604 rcv_space:8590080
rcv_ssthresh:8383488 minrtt:62.738                                               

ESTAB      0      32     [::ffff: <standby IP address>]:<standby port> [::ffff: <primary IP address>]:<primary port>
users:(("java",pid=1841422,fd=8)) timer:(on,139ms,0) uid:1000 ino:111022882 sk:32 <->

         skmem:(r0,rb16777216,t0,tb332800,f653280,w10272,o0,bl0,d0) ts sack cubic wscale:9,9 rto:270


rtt:69.975/0.2 ato:40 mss:8948 pmtu:9000 rcvmss:8948 advmss:8948 cwnd:18 bytes_sent:1764 bytes_acked:1
bytes_received:425218052 segs_out:5112 segs_in:47674 data_segs_out:406 data_segs_in:47647 send
18413891bps lastsnd:3 lastack:3 pacing_rate 36827576bps delivery_rate 9036480bps delivered:399 app_limited
busy:3796ms unacked:8 rcv_rtt:69.15 rcv_space:8956948 rcv_ssthresh:8383488 minrtt:69.416       

sar  

03:35:55 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
03:36:00 PM    <name>  28147.40   5020.80 247559.12    324.54      0.00      0.00      0.00
03:36:00 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
03:36:05 PM    <name>  27495.20   4618.20 241625.40    307.50      0.00      0.00      0.00
03:36:05 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
03:36:10 PM    <name>  25562.20   4408.20 224800.60    294.70      0.00      0.00      0.00
03:36:10 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
03:36:15 PM    <name>  25543.00   4310.60 224615.99    336.71      0.00      0.00      0.00

<truncated for brevity>

2. Run the same command, increasing -num_conn by 2, and gather information. Repeat this process until the aggregate throughput does not increase for 3 consecutive values of -num_conn.
3. After completing for one node, find the total aggregate throughput of a cluster by executing the same tests on all nodes of the primary and all nodes of the standby, where node 1 ships to node 1, node 2 to node 2, etc.
4. Reverse roles and re-run these tests. Routes can differ depending on the direction of the stream, so throughput could be affected.

Use this information to set the optimal parallelism during migration, instantiation, or backup/recovery.

Example Results (after optimizing socket buffer size):

Degree of Parallelism               One Node Rate (MB/s)   Two Node Aggregate Rate (MB/s)
1                                    140                     260
2                                    280                     560
4                                    560                    1100
6                                    720                    1420
8                                    860                    1700
10                                  1065                    2140
12                                  1150                    2310
14 (optimal per-node parallelism)   1245                    2500
16                                  1250                    2498
18                                  1247                    2500

Throughput with SYNC (Scenario 4 – SYNC Transport)

Assessing Network Performance for Data Guard Synchronous Redo Transport

Synchronous redo transport requires that a primary database transaction wait for confirmation from the standby that redo has
been received and written to disk (a standby redo log file) before commit success is signaled to the application. Network latency
is the single largest inhibitor in SYNC transport, and it must be consistently low to ensure minimal response time and throughput
impact for OLTP applications. Due to the impact of latency, the options for tuning transport are limited, so focus on assessing the
feasibility of SYNC transport for a given network link.
When assessing the performance of SYNC transport between primary and standby systems:

Set Oracle Net SDU for SYNC transport
Determine the redo write size that LGWR submits to the network
Use oratcptest to assess SYNC transport performance
Implement FASTSYNC
Increase socket buffer size

Setting Oracle Net SDU for SYNC Transport

Oracle Net encapsulates data into buffers the size of the session data unit (SDU) before sending the data across the network. Adjusting the SDU buffer size can improve performance, network utilization, and memory consumption. Oracle internal testing has shown that setting the SDU to 65535 improves SYNC transport performance.

Setting SDU for Oracle RAC

SDU cannot be set in the TCP endpoint for SCAN/node listeners, but SDU can be changed using the global parameter DEFAULT_SDU_SIZE in the sqlnet.ora file.
Set DEFAULT_SDU_SIZE in the RDBMS home sqlnet.ora file (not the GRID home).

DEFAULT_SDU_SIZE=2097152

Setting SDU for Non-Oracle RAC

You can set SDU on a per connection basis using the SDU parameter in the local naming configuration file (TNSNAMES.ORA) and
the listener configuration file (LISTENER.ORA).

tnsnames.ora

<net_service_name>=
 (DESCRIPTION=
  (SDU=65535)
  (ADDRESS_LIST=
   (ADDRESS=(PROTOCOL=tcp)(HOST=<hostname>)(PORT=<PORT>)))
  (CONNECT_DATA=
   (SERVICE_NAME=<service name>)))

listener.ora

<listener name>=
 (DESCRIPTION=
  (SDU=65535)
  (ADDRESS=(PROTOCOL=tcp)(HOST=<hostname>)(PORT=<PORT>)))
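
After editing listener.ora, reload or restart the listener so the new SDU takes effect. A sketch, assuming the listener is managed with lsnrctl:

$ lsnrctl reload <listener name>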
 

NOTE: ASYNC transport uses a streaming protocol, and increasing the SDU size from the default has no performance benefit.

NOTE: When the SDU sizes of the client and server differ, the lower of the two values is used.

  

Use oratcptest to Assess SYNC Transport Performance

Using oratcptest, SYNC writes can be simulated over the network in order to determine bandwidth and latency.  To do this accurately, the average redo write size is needed.

Determine Redo Write Size

The log writer (LGWR) redo write size translates to the packet size written to the network. You can determine the average redo
write size using the metrics total redo size and total redo writes from an AWR report taken during peak redo rate.

total redo size      3,418,477,080
total redo writes    426,201

In this example the average redo write size is about 8k:

(redo size / redo writes) = 8,020 bytes, or ~8k
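
The same average can be computed from the cumulative statistics in V$SYSSTAT ('redo size' and 'redo writes' are the instance-wide statistics behind the AWR metrics). This is a sketch; because V$SYSSTAT accumulates since instance startup, sample the deltas over the peak interval rather than querying the raw totals:

SQL> select round(s.value/w.value) avg_redo_write_bytes from v$sysstat s, v$sysstat w where s.name='redo size' and w.name='redo writes';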

The redo write size varies depending on workload and commit time. As the time to commit increases, the amount of redo
waiting for the next write increases, thus increasing the next write size. Because SYNC transport increases the time to commit,
you can expect the redo write size to increase as well. The degree to which the size increases depends on the latency between
the primary and standby. Therefore, metrics taken from an ASYNC configuration are a starting point, and this process should be
repeated once SYNC is enabled for a period of time. 

Run tests with oratcptest

In addition to providing the average write size, you can also specify that the oratcptest server process write network messages to the same disk location where the standby redo logs will be placed.

NOTE: ASM is currently not supported for the write location

Given that the average redo write size in the example is 8k, and if the standby redo logs will be placed on /u01/oraredo, the
server command to issue would be:

$ java -jar oratcptest.jar -server -port=<port number> -file=/u01/oraredo/oratcp.tmp

On the sending side, issue the following client command to send 8k messages with SYNC writes:

$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=sync -duration=120s -interval=20s -length=8k -write

[Requesting a test]

        Message payload        = 8 kbytes


        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = YES
        Statistics interval    = 20 seconds
        Test duration          = 2 minutes
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)
 
(14:43:03) The server is ready.
                        Throughput            Latency (including disk-write)
(14:43:23)          5.662 Mbytes/s           1.382 ms
(14:43:43)          5.618 Mbytes/s           1.393 ms
(14:44:03)          5.656 Mbytes/s           1.383 ms
(14:44:23)          5.644 Mbytes/s           1.386 ms
(14:44:43)          5.680 Mbytes/s           1.377 ms
(14:45:03)          5.637 Mbytes/s           1.388 ms
(14:45:03) Test finished.
               Socket send buffer = 166400
                  Avg. throughput = 5.649 Mbytes/s
                     Avg. latency = 1.385 ms (including disk-write at server)

The lower throughput is a result of the latency of the network round trip and the write to disk.  The round trip is a necessity with SYNC transport, but the write to disk can be addressed as described in the following section.
NOTE: SYNC transport with higher round-trip latency (> 5ms) can significantly impact application response time and throughput for OLTP applications. In the same environment, the overall elapsed time of batch jobs or DML operations may not be impacted as much if sufficient network bandwidth is available.

Implement FASTSYNC

As of Oracle 12c, Data Guard FASTSYNC can improve the round-trip time of a SYNC remote write by acknowledging the write when it is written to memory, instead of waiting for the write to disk to complete. Whether you see a benefit with FASTSYNC depends on the speed of the disk at the standby database. Enable FASTSYNC by setting the Data Guard Broker property LogXptMode='FASTSYNC', or by setting SYNC NOAFFIRM directly in the log_archive_dest_n parameter when Broker is not used.

DGMGRL> edit database standby set property LogXptMode='FASTSYNC';

OR

SQL> alter system set log_archive_dest_2='service=<standby net service name> SYNC NOAFFIRM db_unique_name=<standby unique name> net_timeout=8 valid_for=(online_logfile,all_roles)';

Test the benefits of FASTSYNC in oratcptest by running SYNC mode without the -write option.

Server(standby):

$ java -jar oratcptest.jar -server -port=<port number>

Client(primary):

$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=sync -duration=120s -interval=20s -length=8k

[Requesting a test]
        Message payload        = 8 kbytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = NO
        Statistics interval    = 20 seconds
        Test duration          = 2 minutes
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)
 
(14:40:19) The server is ready.
                        Throughput            Latency
(14:40:39)         25.145 Mbytes/s           0.311 ms
(14:40:59)         24.893 Mbytes/s           0.314 ms
(14:41:19)         25.380 Mbytes/s           0.308 ms
(14:41:39)         25.101 Mbytes/s           0.312 ms
(14:41:59)         24.757 Mbytes/s           0.316 ms
(14:42:19)         25.136 Mbytes/s           0.311 ms
(14:42:19) Test finished.
               Socket send buffer = 166400
                  Avg. throughput = 25.068 Mbytes/s
                     Avg. latency = 0.312 ms

  

NOTE: As the redo write size increases, the throughput and latency increase. Therefore, it is important to repeat these tests with the actual redo write size from metrics collected during SYNC redo transport.

Increase Socket Buffer Size

Socket buffers do not have the same impact on SYNC transport as they do on ASYNC; however, increased buffer sizes can help resolve gaps in redo following a standby database outage.  Using the previously determined socket buffer size is recommended, but a setting of 3*Bandwidth Delay Product (BDP) can be used as well.

For example, if the available bandwidth is 622 Mbits/sec and the latency is 30 ms:

BDP = 622,000,000 (bandwidth) / 8 x 0.030 (latency) = 2,332,500 bytes

3 x BDP = 6,997,500 bytes

Set the Linux kernel socket buffer limits to this value as described above in 'Changing the Maximum Socket Buffer Size and Testing Single Process Throughput'.

Command Options
When ASYNC is chosen, only the bandwidth is measured. Using the default message length (1MB) should suffice for a bandwidth calculation. The bandwidth is measured from an application point of view: it is calculated from the beginning of one message send to the start of the next message send. Using the message size and this time interval, bandwidth is calculated. The average bandwidth is the average of all measurements made during the statistics interval.

With oratcptest, latency is calculated when the SYNC option is selected. This calculation is based on the time interval from the start of a message send at the client to the application acknowledgement the client receives from the server. The statistics interval is used to calculate the average latency across each sent and acknowledged message. This is application latency and includes the lower network protocol latencies. More than one message send occurs during the statistics interval, and oratcptest tracks the time interval between all message sends and their acknowledgements. If the -file and -write parameters are used, the latency includes the server's write to disk. Because oratcptest uses the interval between the start of the message write and the receipt of the acknowledgement message, latency normally increases as the message size increases.
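
For a quick latency-only measurement, the -rtt option (described in the help output below) is equivalent to -mode=SYNC with a zero-length payload:

$ java -jar oratcptest.jar <standby VIP> -port=1521 -rtt -duration=60s -interval=10s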

oratcptest help output:

$ java -jar oratcptest.jar -help

[OraTcpTest server]

Usage:
        java OraTcpTest -server [server_address] [OPTION]

Options:
        -port=<number>
                listening TCP port number. Must be specified.
        -file=<name>
                file name for disk-write test. Default value is oratcp.tmp.
        -sockbuf=<bytes>
                server socket receive buffer size. Default value is zero,
                which means system default receive buffer size.
        -help
                display help.
        -version
                display version.
Server examples:
        java OraTcpTest -server -port=5555
        java OraTcpTest -server my.server.com -port=5555 -file=test.out -sockbuf=64k

[OraTcpTest client]
Usage:

        java OraTcpTest <server_address> [OPTION]

Options:

        -port=<number>
                listening TCP port number at server. Must be specified.
        -write
                server writes network message to disk before server replies
                with ACK.
        -mode=[SYNC|ASYNC]
                In SYNC mode, client waits for server's ACK before it sends
                next message. In ASYNC mode, it doesn't wait. Default value is
                SYNC
        -num_conn=<number>
                number of TCP connections. Default value is 1.
        -sockbuf=<bytes>
                client socket send buffer size. Default value is zero, which
                means system default send buffer size.
        -length=<bytes>
                message payload length. Default value is 1 Mbyte.
        -delay=<milliseconds>
                delay in milliseconds between network messages. Default value
                is zero, which means no delay between messages.
        -rtt
                round-trip-time measurement mode. Equivalent to -mode=SYNC and
                -length=0.
        -random_length
                random payload length uniformly distributed between 512 bytes
                and -length option value.
        -random_delay
                random delay uniformly distributed between zero and -delay
                option value.
        -payload=[RANDOM|ZERO|<filename>]
                payload content type among random data, all zeroes, or the
                contents of a user-specified file. Default value is RANDOM.
        -interval=<time>
                statistics reporting interval. Default value is 10 seconds.
        -duration=<time> or <bytes>
                test duration in time or bytes. If not specified, test does not
                terminate.
        -freq=<time>/<time>
                test repeat frequency. For example, -freq=1h/24h means the
                test will repeat every 1 hour for 24 hours.
        -timeout=<time>
                network timeout. Default value is zero, which means no timeout.
        -output=<name>
                output file name where client stores test result statistics.
        -help
                display help.
        -version
                display version.

Client examples:

        java OraTcpTest my.server.com -port=5555


        java OraTcpTest my.server.com -port=5555 -rtt
        java OraTcpTest my.server.com -port=5555 -write -mode=SYNC -num_conn=2 -length=2M -delay=15 -sockbuf=64k -interval=5s -duration=2m -freq=1h/24h -timeout=60s -random_length -random_delay -payload=ZERO -output=test.out


Attachments
Script to detect if SACK is seen (4.3 KB)
oratcptest (30.17 KB)

Related
Products

Oracle Cloud > Oracle Platform Cloud > Oracle Cloud Infrastructure - Database Service > Oracle Cloud Infrastructure - Database Service
Oracle Cloud > Oracle Platform Cloud > Oracle Database Cloud Exadata Service > Oracle Database Cloud Exadata Service
Oracle Cloud > Oracle Platform Cloud > Oracle Database Cloud Service > Oracle Database Cloud Schema Service
Oracle Cloud > Oracle Platform Cloud > Oracle Database Backup Service > Oracle Database Backup Service
Oracle Cloud > Oracle Platform Cloud > Oracle Database Cloud Service > Oracle Database Exadata Express Cloud Service
Oracle Cloud > Oracle Platform Cloud > Oracle Database Cloud Service > Oracle Database Cloud Service
Oracle Database Products > Oracle Database Suite > Oracle Database > Oracle Database - Enterprise Edition > Oracle Data Guard > Transport Issues
Oracle Cloud > Oracle Infrastructure Cloud > Oracle Cloud at Customer > Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine)

Keywords
ASYNC; CONNECT; DATA GUARD; DATAGUARD; NETWORK; PERFORMANCE; PRIMARY; REDO; STANDBY; STATISTICS; SYNC; SYNCHRONOUS; THROUGHPUT; TRANSFER; TRANSPORT
MODE; V$ARCHIVED_LOG
