
OCFS2 Best Practices

OCFS2 is an extent based, POSIX compliant, general-purpose, symmetric shared disk cluster file system. A single volume will have a fixed block size of up to 4 KB, allowing a maximum volume size of 16 TB.

Uploaded by Sanchita Banta
Copyright © Attribution Non-Commercial (BY-NC)

https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doc...

Linux OCFS2 - Best Practices [ID 603080.1]
Modified 04-JUL-2010   Type REFERENCE   Status PUBLISHED

In this Document
   Purpose
   Scope
   Linux OCFS2 - Best Practices
      1. What is OCFS2?
      2. Is OCFS2 supported by Oracle?
      3. Is there a limit to the number of files and subdirectories in a directory?
      4. File Types Supported by OCFS2
      5. Should I label a disk after formatting?
      6. Performance
      7. LVM
      8. How does Block size affect performance?
      9. Should the IP interconnect be public or private?
      10. What should the node name be and should it be related to the IP address?
      11. Support for Mount by Label
      12. Sample Configuration and Layout
      13. Checking for Mounted Volumes
      14. How to Obtain OCFS2?
      15. How do I populate /etc/ocfs2/cluster.conf?
   References

Applies to:
Oracle Server - Enterprise Edition
Linux x86
Linux x86-64

Purpose
The purpose of this article is to provide best practice guidelines for installing and using Oracle Cluster File System V2 (OCFS2) on Linux.

Scope
The article is intended for System Administrators, Database Administrators or users who are planning to install and use OCFS2 on Linux.

Linux OCFS2 - Best Practices

1. What is OCFS2?


Oracle Cluster File System V2 is the next generation of Oracle Cluster File System V1. It is an extent-based, POSIX-compliant, general-purpose, symmetric shared-disk cluster file system. OCFS2 is licensed under the GPL. It can be used for shared Oracle home installations, making management of Oracle Real Application Clusters (RAC) installations easier. OCFS2 allows application binaries, data files, and databases to be stored on shareable devices, e.g. those in a SAN. All nodes in a cluster can have concurrent read and write access to all volumes. OCFS2 supports different block sizes on different volumes, though a single volume has a fixed block size of up to 4 KB, allowing a maximum volume size of 16 TB.

2. Is OCFS2 supported by Oracle?


Licensed Oracle Database customers are entitled to receive support (usage and maintenance) for OCFS2 for Oracle Database and Clusterware usage as part of their Database licensing. Where Oracle is the provider of OCFS2 packages for the Linux distribution, Oracle provides maintenance support (including patches, backports and updates) for OCFS2 packages for that distribution. However, where Oracle is not the provider of OCFS2 software, customers must contact their OCFS2 supplier directly to obtain OCFS2 updates or PTFs (Program Temporary Fixes) for their distribution. Oracle will support OCFS2 for general-purpose filesystem usage under Oracle Unbreakable Linux Support licensing.

3. Is there a limit to the number of files and subdirectories in a directory?


OCFS2 supports up to 32,000 subdirectories and millions of files in each directory. Directory lookups slow down as directories grow larger, which impacts performance. The impact is greater if the system lacks the memory to cache the inodes (this is true for any filesystem on Linux or Unix). It is also greater for a clustered filesystem, because cluster locking is involved.
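One rough way to see how close a directory is to the 32,000-subdirectory limit quoted above is to count its immediate subdirectories. The following Python sketch does that. It is an illustrative helper, not an OCFS2 tool. The 90% warning threshold is an arbitrary assumption, and `count_subdirs` and `subdir_headroom` are hypothetical names.

```python
import os
import sys

# Per-directory subdirectory limit stated in the note.
OCFS2_MAX_SUBDIRS = 32_000
# Hypothetical alerting threshold: warn once 90% of the limit is used.
WARN_FRACTION = 0.9


def count_subdirs(path: str) -> int:
    """Count the immediate subdirectories of `path` (symlinks are not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def subdir_headroom(path: str, limit: int = OCFS2_MAX_SUBDIRS) -> int:
    """Return how many more subdirectories `path` can hold before reaching `limit`."""
    return limit - count_subdirs(path)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    used = count_subdirs(target)
    print(f"{target}: {used} subdirectories, {subdir_headroom(target)} remaining")
    if used >= WARN_FRACTION * OCFS2_MAX_SUBDIRS:
        print("warning: approaching the 32,000 subdirectory limit")
```

Files do not count against this limit; only directory entries that are themselves directories are counted.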

4. File Types Supported by OCFS2


Unlike the previous release (OCFS), OCFS2 is a general-purpose file system. You can store both database and non-database files on OCFS2 file systems. OCFS2 volumes containing RAC voting files, Oracle Cluster Registry (OCR) files, RDBMS database files, redo logs, archive logs and control files must be mounted with the "datavolume" and "nointr" mount options. The datavolume option ensures that Oracle processes open these files with the O_DIRECT flag. The nointr option ensures that I/Os are not interrupted by signals. The /etc/fstab entries for such volumes should be as follows:


4/18/2011 9:58 AM


/dev/sdX /dir ocfs2 _netdev,datavolume,nointr 0 0

You can store a shared Oracle home or any other non-database files on OCFS2. If you are using OCFS2 as a general-purpose filesystem, the /etc/fstab entry should be as follows:
/dev/sdX /dir ocfs2 _netdev 0 0
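The mount-option rules above can be checked mechanically. The Python sketch below reports which of the options the note calls for are missing from an /etc/fstab entry. `missing_options` and `ocfs2_entries` are hypothetical helper names. The code only inspects text; it does not consult the kernel or OCFS2 itself.

```python
# Mount options the note requires on OCFS2 volumes holding Oracle database
# files (voting/OCR files, datafiles, redo and archive logs, control files).
DB_VOLUME_OPTIONS = frozenset({"_netdev", "datavolume", "nointr"})
# Option the note uses for general-purpose OCFS2 volumes.
GENERAL_VOLUME_OPTIONS = frozenset({"_netdev"})


def missing_options(fstab_line: str, holds_db_files: bool) -> set:
    """Return the required mount options absent from one /etc/fstab entry."""
    fields = fstab_line.split()
    if len(fields) < 4 or fields[2] != "ocfs2":
        raise ValueError(f"not an ocfs2 fstab entry: {fstab_line!r}")
    present = set(fields[3].split(","))
    required = DB_VOLUME_OPTIONS if holds_db_files else GENERAL_VOLUME_OPTIONS
    return set(required - present)


def ocfs2_entries(fstab_text: str) -> list:
    """Return the non-comment /etc/fstab lines whose type field is ocfs2."""
    entries = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and not fields[0].startswith("#") and fields[2] == "ocfs2":
            entries.append(line.strip())
    return entries


if __name__ == "__main__":
    sample = "/dev/sdX /dir ocfs2 _netdev 0 0"
    # A general-purpose entry lacks the options needed for database files.
    print(sorted(missing_options(sample, holds_db_files=True)))
```

To audit a host, feed the contents of /etc/fstab to `ocfs2_entries` and pass each result to `missing_options`. You must supply `holds_db_files` per volume, because fstab itself does not record what a volume stores.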

5. Should I label a disk after formatting?


Linux does not guarantee persistent device names across reboots, especially when new devices are added. Device names are allocated in the order in which the devices are discovered, which can lead to the wrong device name being used when attempting to mount. Labeling devices allows the label to be used instead of the device name when mounting. For volumes to be mountable by label, the disk must be partitioned before formatting. Partitioning is recommended even if you are not planning to use labels, or are planning to use the entire disk for the OCFS2 volume. After partitioning, use the "mkfs.ocfs2" tool to format the disk. The label can be specified at this time, or later using the "tunefs.ocfs2" tool. An entry in /etc/fstab can then be created to mount the partition by label, avoiding device naming issues. Examples:
# mkfs.ocfs2 -L ocfs2_disk1 /dev/sdb1
# tunefs.ocfs2 -L oracle_homes /dev/sdb1

Add the following entry to /etc/fstab:
LABEL=oracle_homes /orahomes ocfs2 _netdev 0 0

After adding the above entry to /etc/fstab, the command "mount -a" can be used to re-read the /etc/fstab entries and mount the new volume.
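Mounting by label works because a label can be resolved to whatever device node it currently maps to. On systems where udev maintains /dev/disk/by-label symlinks, this Python sketch performs that resolution and builds the matching fstab line. It assumes the directory exists and that the label contains no characters udev must escape. Whether OCFS2 labels appear there depends on the distribution's udev/blkid support. `device_for_label` and `fstab_entry_for_label` are hypothetical helpers.

```python
import os

# Assumption: udev maintains /dev/disk/by-label symlinks for labeled volumes.
BY_LABEL_DIR = "/dev/disk/by-label"


def device_for_label(label: str, by_label_dir: str = BY_LABEL_DIR) -> str:
    """Resolve a volume label to the device node it currently points at."""
    link = os.path.join(by_label_dir, label)
    if not os.path.lexists(link):
        raise FileNotFoundError(f"no device currently carries label {label!r}")
    return os.path.realpath(link)


def fstab_entry_for_label(label: str, mount_point: str, options: str = "_netdev") -> str:
    """Build an /etc/fstab line that mounts an OCFS2 volume by label."""
    return f"LABEL={label} {mount_point} ocfs2 {options} 0 0"


if __name__ == "__main__":
    print(fstab_entry_for_label("oracle_homes", "/orahomes"))
```

The device behind `oracle_homes` may change between reboots. The fstab line does not, which is the reason for labeling.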

6. Performance
Cluster size: The smallest unit of space allocated to a file to hold data. Options are 4, 8, 16, 32, 64, 128, 256, 512, and 1024 KB. Cluster size cannot be modified after the volume is formatted. Oracle recommends a cluster size of 128 KB or larger for database volumes, and a cluster size of 32 or 64 KB for Oracle Home.

Block size: The smallest unit of space addressable by the file system. Specify the block size when you create the volume. Options are 512 bytes (not recommended), 1 KB, 2 KB, or 4 KB (recommended for most volumes). Block size cannot be modified after the volume is formatted.
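Because neither size can be changed after formatting, it helps to validate both before running mkfs.ocfs2. This Python sketch checks the values against the options listed above and assembles an mkfs.ocfs2 command line. It uses the -b (block size), -C (cluster size), -N (node slots) and -L (label) options. It deliberately returns the argument list instead of running it, since formatting destroys existing data. The helper names and the choice of 64 KB (rather than 32 KB) for Oracle Home are assumptions.

```python
# Values quoted in the note's Performance section.
CLUSTER_SIZES_KB = (4, 8, 16, 32, 64, 128, 256, 512, 1024)
BLOCK_SIZES = (512, 1024, 2048, 4096)


def recommended_cluster_kb(use: str) -> int:
    """Cluster size per the note: >=128 KB for databases, 32 or 64 KB for Oracle Home."""
    if use == "database":
        return 128
    if use == "oracle_home":
        return 64  # the note allows 32 or 64; 64 is an arbitrary pick here
    raise ValueError(f"no recommendation in the note for use {use!r}")


def _size_arg(size_bytes: int) -> str:
    """Format a byte count with a K/M suffix (512, 4K, 128K, 1M)."""
    if size_bytes >= 1 << 20:
        return f"{size_bytes >> 20}M"
    if size_bytes >= 1 << 10:
        return f"{size_bytes >> 10}K"
    return str(size_bytes)


def mkfs_argv(device: str, label: str, cluster_kb: int, node_slots: int,
              block_size: int = 4096) -> list:
    """Build (but do not run) an mkfs.ocfs2 command line after validating geometry."""
    if cluster_kb not in CLUSTER_SIZES_KB:
        raise ValueError(f"cluster size {cluster_kb} KB is not one of {CLUSTER_SIZES_KB}")
    if block_size not in BLOCK_SIZES:
        raise ValueError(f"block size {block_size} is not one of {BLOCK_SIZES}")
    if node_slots < 1:
        raise ValueError("node_slots must be at least 1")
    return ["mkfs.ocfs2",
            "-b", _size_arg(block_size),
            "-C", _size_arg(cluster_kb * 1024),
            "-N", str(node_slots),
            "-L", label,
            device]


if __name__ == "__main__":
    # Formatting destroys data, so only print the command for review.
    print(" ".join(mkfs_argv("/dev/sdb1", "ocfs2_disk1",
                             recommended_cluster_kb("database"), node_slots=4)))
```

Choose `node_slots` from the number of nodes expected to mount the volume concurrently.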

Number of node slots: The maximum number of nodes that can concurrently mount a volume. When formatting, OCFS2 allocates space for metadata, including a journal, for each node slot. Nodes that access the volume can be a combination of little-endian architectures (such as x86, x86-64, and ia64) and big-endian architectures (such as ppc64 and s390x). Node-specific files are referred to as local files, and a node slot number is appended to each local file name. For example, journal:0000 belongs to whichever node is assigned to slot number 0. Set each volume's maximum number of node slots when you create it, according to how many nodes you expect to mount the volume concurrently. Use the tunefs.ocfs2 utility to increase the number of node slots as needed. The OCFS2 1.2 tools cannot decrease this number. Later versions can, but the volume must first be unmounted from all nodes.

Note that the "df" and "du" commands do not account for space used for metadata within a volume. In extreme circumstances, file expansion may fail with an "out of space" error even though plenty of space appears to be available. This can happen when the volume is so fragmented that certain metadata objects cannot be expanded for lack of contiguous free space. The problem is resolved in current versions, though the complete solution requires the volume to have been formatted with version 1.4.4.1 or later of the tools. An alternative workaround is to reduce the number of node slots when the out-of-space condition occurs. This frees the space used by that slot's journal, allowing the metadata to be allocated. Note 391292.1 provides a script that produces a report of OCFS2 volumes.

File System Capacity: Current software uses 32 bits to address block numbers, so volume capacity is limited to (2 ^ 32) * blocksize. With a 4 KB block size this amounts to a 16 TB file system. This block addressing limit will be relaxed in a future release. At that point the limit will become 2 ^ 32 addressable clusters of 1 MB each, allowing a maximum file system size of 4 PB.

Number of volumes: Mounting too many OCFS2 volumes (i.e. 50 or more) per cluster is likely to create a performance (process) bottleneck; this is not specific to OCFS2. Ideally, a system should have no more than around 20 OCFS2 partitions. See also http://oss.oracle.com/bugzilla/show_bug.cgi?id=992
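The File System Capacity arithmetic in this section can be checked directly: the maximum volume size is the number of addressable units (2^32) multiplied by the unit size. The Python sketch below reproduces the 16 TB figure for 4 KB blocks and the 4 PB figure for 1 MB clusters. It assumes the note's TB and PB are binary units (2^40 and 2^50 bytes).

```python
TiB = 2 ** 40
PiB = 2 ** 50


def max_volume_bytes(unit_bytes: int, address_bits: int = 32) -> int:
    """Largest volume addressable with `address_bits`-bit numbers of `unit_bytes` units."""
    return (2 ** address_bits) * unit_bytes


if __name__ == "__main__":
    # Current limit: 32-bit block numbers, one line per supported block size.
    for block_size in (512, 1024, 2048, 4096):
        print(f"{block_size:>4}-byte blocks: {max_volume_bytes(block_size) // TiB} TiB")
    # Future limit described in the note: 32-bit numbers addressing 1 MB clusters.
    print(f"1 MB clusters: {max_volume_bytes(2 ** 20) // PiB} PiB")
```

The same formula shows why the 512-byte block size is a poor choice for large volumes: it caps the volume at one eighth of the 4 KB limit.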

7. LVM
Since no volume management is built into OCFS2, Oracle recommends enabling hardware RAID support to create logical disk volumes of sufficient size. Logical Volume Manager (LVM) and any software RAID managers that are not cluster-aware are not supported. To learn more about support for LVM and LVM2, refer to Note 423207.1 Enterprise Linux Support for GFS, LVM and LVM2.

8. How does Block size affect performance?


Unix filesystems are assigned a block size when they are created; a block is the unit in which the filesystem allocates space on disk. How does this affect performance? A large block size gives the best I/O speed for large files, whereas a small block size gives better storage efficiency. In general, filesystems used for large files should have a large block size, because as file size increases, the unused space at the end of the file becomes relatively small. This also makes sense because large files should be accessible as quickly as possible, particularly if they are executables: large executables on a filesystem with a large block size have faster startup and paging times. One might object that this consideration is moot, because block size applies to a whole filesystem rather than to individual directories or files, and most practical filesystems contain a mix of file types. However, dedicated filesystems are not only good practice but also essential to managing, for instance, a large database. For a database-dedicated filesystem, block size is a definite consideration. Large data files and executables should be stored on filesystems with large block sizes. Areas holding smaller files, such as development areas, are best served by a smaller block size. Further refinement is possible by aligning the database block size with the filesystem block size, for example by making the Oracle block size the same; this improves datafile read and write performance.
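The storage-efficiency side of this trade-off is the unused space in a file's last allocation unit. The following Python sketch computes that slack for a given unit size. For OCFS2 file data, the relevant unit is the cluster size described under Performance. The function names are illustrative only.

```python
def slack_bytes(file_size: int, unit: int) -> int:
    """Unused bytes in a file's last allocation unit (0 if the size is an exact multiple)."""
    return (-file_size) % unit


def slack_fraction(file_size: int, unit: int) -> float:
    """Share of the allocated space that is wasted at the end of the file."""
    allocated = file_size + slack_bytes(file_size, unit)
    return slack_bytes(file_size, unit) / allocated if allocated else 0.0


if __name__ == "__main__":
    unit = 4096
    for size in (1_000, 100_000, 1_000_000_000):
        print(f"{size:>13,} bytes: {slack_bytes(size, unit):>5} bytes slack "
              f"({slack_fraction(size, unit):.4%})")
```

A 1 KB file in a 4 KB unit wastes three quarters of its allocation. For a 1 GB file, the waste of at most 4095 bytes is negligible. This is the asymmetry the paragraph above describes.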

9. Should the IP interconnect be public or private?


Using a private interconnect is recommended. While OCFS2 does not use much bandwidth, it does require the nodes to be alive on the network, and it sends regular keepalive packets to confirm that they are. A private interconnect is recommended so that a network delay is not interpreted as a node disappearing from the network, which could lead to the node self-fencing. The same interconnect can be used for Oracle RAC and OCFS2. To learn more about OCFS2 and private interconnects, refer to Note 604958.1 OCFS2 Node Fence Caused by Removing the External Network.

10. What should the node name be and should it be related to the IP address?
The node name must match the hostname. The IP address need not be the one associated with that hostname: any valid IP address on that node can be used, and OCFS2 does not attempt to match the node name (hostname) with the specified IP address. The node name does not need to include the domain name. For example, if the fully qualified name is "appserver.oracle.com", the node can be defined to OCFS2 as "appserver". See also "How do I populate /etc/ocfs2/cluster.conf?" below.
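The node name rule can be sanity-checked before editing cluster.conf. This sketch compares a proposed node name with the local hostname and accepts it with or without the domain part. It assumes Python's `socket.gethostname()` reports the same name OCFS2 sees, which you should confirm on your system. `node_name_matches` is a hypothetical helper. It ignores IP addresses, since OCFS2 does not tie them to the name.

```python
import socket
from typing import Optional


def short_name(hostname: str) -> str:
    """Strip the domain part, e.g. "appserver.oracle.com" -> "appserver"."""
    return hostname.split(".", 1)[0]


def node_name_matches(node_name: str, hostname: Optional[str] = None) -> bool:
    """Check a cluster.conf node name against this host's name.

    Per the note, the node name must match the hostname but may omit the domain.
    """
    host = hostname if hostname is not None else socket.gethostname()
    return node_name in (host, short_name(host))


if __name__ == "__main__":
    host = socket.gethostname()
    print(f"hostname {host!r}: use node name {short_name(host)!r} or {host!r}")
```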

11. Support for Mount by Label


All OCFS2 versions support mounting by label, and mount also supports auto-mounting of OCFS2 volumes by label.
# mount -t ocfs2 -L mylabel /dir

See also "Should I label a disk after formatting?" above.

12. Sample Configuration and Layout


Install the appropriate ocfs2-kernel module. The correct module depends on your kernel version, architecture and type. The 'uname -a' command identifies the currently running kernel, e.g.:
Linux bangalore-ibc-kp-8b-10-176-230-211.idc.oracle.com 2.6.18-194.3.1.0.2.el5 #1 SMP Wed Mar 5 13:03:41 EST 2008 i686 i686 i386 GNU/Linux

In the example above, the following Red Hat 5 OCFS2 packages (kernel type enterprise, x86 architecture) should be installed:
ocfs2-2.6.18-194.3.1.0.2.el5ovs-1.4.7-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2console-1.4.4-1.el5

During installation, OCFS2 automatically creates the necessary init/rc scripts (/etc/init.d/ocfs2). To mount OCFS2 volumes automatically at server startup/reboot, add a corresponding line to the /etc/fstab file for each OCFS2 partition, specifying a filesystem type of 'ocfs2'. Also, when running "service o2cb configure", reply "y" to the option "Load O2CB driver on boot". For Oracle Enterprise Linux / Red Hat Enterprise Linux only, be sure to add _netdev to the 4th field of the /etc/fstab entry (usually 'defaults'), e.g.:
/dev/sda3 /OVS ocfs2 _netdev,datavolume 1 0

Note: the _netdev option tells mount to skip these volumes during the first mount pass, i.e. to mount them only after all network services have started.

13. Checking for Mounted Volumes


OCFS2 provides a utility (/sbin/mounted.ocfs2) to check which nodes have an OCFS2 volume mounted. Before performing maintenance on an OCFS2 volume (e.g. an upgrade), ensure that no other nodes have the volume mounted. The following is sample mounted.ocfs2 output:
[root@arachnid /]# /sbin/mounted.ocfs2 /dev/sda3
Device     FS     Nodes
/dev/sda3  ocfs2  0

See also the man page for mounted.ocfs2.
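Before maintenance, the check above amounts to confirming that no other node appears in the Nodes column. The Python sketch below parses text in the layout of the sample output. It assumes node names are comma-separated and that an unmounted volume is reported as "Not mounted"; verify both against the mounted.ocfs2 man page for your release. The sample's Nodes value of 0 is treated as a node identifier. All function and constant names are hypothetical.

```python
# Hypothetical placeholder for how an unmounted volume is reported.
NOT_MOUNTED = "Not mounted"


def parse_mounted_ocfs2(output: str) -> dict:
    """Map each OCFS2 device to the list of nodes reported as mounting it."""
    result = {}
    for line in output.splitlines():
        parts = line.split(None, 2)  # device, fs type, rest of line = nodes
        if len(parts) < 3 or parts[0] == "Device" or parts[1] != "ocfs2":
            continue  # skip the header, prompt lines and non-OCFS2 rows
        nodes = parts[2].strip()
        result[parts[0]] = [] if nodes == NOT_MOUNTED else [
            n.strip() for n in nodes.split(",") if n.strip()]
    return result


def safe_for_maintenance(output: str, device: str, this_node: str) -> bool:
    """True if no node other than `this_node` has `device` mounted."""
    # KeyError if the device is not listed: an unknown state is never "safe".
    nodes = parse_mounted_ocfs2(output)[device]
    return all(n == this_node for n in nodes)


if __name__ == "__main__":
    sample = "Device     FS     Nodes\n/dev/sda3  ocfs2  0\n"
    print(parse_mounted_ocfs2(sample))
```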

14. How to Obtain OCFS2?


Refer to the OCFS2 project page at http://oss.oracle.com/projects/ocfs2/ for details of the Linux distributions supported by OCFS2.

15. How do I populate /etc/ocfs2/cluster.conf?


If you have installed the console, use it to create this configuration file. For details, refer to the user's guide. If you do not have the console installed, it can be manually created. Users populating cluster.conf manually should follow the format strictly. The stanza header must start at the first column and end with a colon, stanza parameters must start after a tab, and a blank line must separate each stanza. Take care to avoid any stray "white space". Ensure the cluster.conf file is identical on all the nodes in the cluster.

The following is a sample /etc/ocfs2/cluster.conf that describes a three node cluster.

cluster:
	node_count = 3
	name = webcluster

node:
	ip_port = 7777
	ip_address = 192.168.0.107
	number = 7
	name = node7
	cluster = webcluster

node:
	ip_port = 7777
	ip_address = 192.168.0.106
	number = 6
	name = node6
	cluster = webcluster

node:
	ip_port = 7777
	ip_address = 192.168.0.110
	number = 10
	name = node10
	cluster = webcluster
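Generating cluster.conf avoids the whitespace mistakes described above. This Python sketch renders the same stanza layout as the sample: each header in column 0 ending with a colon, tab-indented parameters, and a blank line between stanzas. It is not the ocfs2console's own generator. `Node` and `render_cluster_conf` are hypothetical names, and the resulting file must still be copied identically to every node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str        # must match the node's hostname (domain optional)
    number: int      # unique node number within the cluster
    ip_address: str  # address on the (preferably private) interconnect
    ip_port: int = 7777


def _stanza(header: str, params: list) -> str:
    # Header in column 0 ending with a colon; each parameter indented by one tab.
    lines = [f"{header}:"] + [f"\t{key} = {value}" for key, value in params]
    return "\n".join(lines) + "\n"


def render_cluster_conf(cluster_name: str, nodes: list) -> str:
    """Render /etc/ocfs2/cluster.conf text in the stanza layout the note describes."""
    if (len({n.number for n in nodes}) != len(nodes)
            or len({n.name for n in nodes}) != len(nodes)):
        raise ValueError("node names and numbers must be unique")
    stanzas = [_stanza("cluster", [("node_count", len(nodes)), ("name", cluster_name)])]
    for n in nodes:
        stanzas.append(_stanza("node", [
            ("ip_port", n.ip_port),
            ("ip_address", n.ip_address),
            ("number", n.number),
            ("name", n.name),
            ("cluster", cluster_name),
        ]))
    # Each stanza ends in "\n", so joining with "\n" leaves one blank line between them.
    return "\n".join(stanzas)


if __name__ == "__main__":
    print(render_cluster_conf("webcluster", [
        Node("node7", 7, "192.168.0.107"),
        Node("node6", 6, "192.168.0.106"),
        Node("node10", 10, "192.168.0.110"),
    ]), end="")
```

The uniqueness check catches a common hand-editing slip. The code cannot verify that each name matches a real hostname.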

References
NOTE:391771.1 - OCFS2 - FREQUENTLY ASKED QUESTIONS
NOTE:421640.1 - OCFS2: Supportability as a general purpose filesystem
NOTE:566819.1 - Supportability of OCFS2 on Non-certified Linux Distributions
NOTE:727866.1 - OCFS2 Performance: Measurement, Diagnosis and Tuning
NOTE:391292.1 - Script to gather OCFS2 diagnostic information
http://oss.oracle.com/projects/ocfs2/
http://www.novell.com/documentation/sles10/sles_admin/index.html?page=/documentation/sles10/sles_admin/data/b3uxgac.html

Related Products: Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition
Keywords: VOLUME; ARCHIVELOGS; BLOCK SIZE; DATA FILE; FRAGMENTATION; OCFS VOLUME; RAW; REDO LOG
Errors: LVM2; ENOMEM; FS-2

Copyright (c) 2007, 2010, Oracle. All rights reserved.
