CheckPoint NGX ClusterXL User Guide PDF
NGX (R60)
IMPORTANT
Check Point recommends that customers stay up-to-date with the latest
service packs and versions of security products, as they contain security
enhancements and protection against new and changing attacks.
For additional technical information about Check Point products, consult Check Point's SecureKnowledge at:
https://secureknowledge.checkpoint.com
See the latest version of this document in the User Center at:
http://www.checkpoint.com/support/technical/documents/docs_r60.html
Table of Contents
Chapter 4 High Availability and Load Sharing in ClusterXL
Introduction to High Availability and Load Sharing 35
Load Sharing 36
High Availability 36
Example ClusterXL Topology 37
Defining the Cluster Member IP Addresses 38
Defining the Cluster Virtual IP Addresses 39
The Synchronization Network 39
Configuring Cluster Addresses on Different Subnets 39
ClusterXL Modes 40
Introduction to ClusterXL Modes 40
Load Sharing Multicast Mode 41
Load Sharing Unicast Mode 42
New High Availability Mode 44
Mode Comparison Table 46
Failover 47
What is a Failover? 47
When Does a Failover Occur? 48
What Happens When a Gateway Recovers? 48
How a Recovered Cluster Member Obtains the Security Policy 48
Implementation Planning Considerations 49
High Availability or Load Sharing 49
Choosing the Load Sharing Mode 49
IP Address Migration 50
Hardware Requirements, Compatibility and Example Configuration 50
ClusterXL Hardware Requirements 50
ClusterXL Hardware Compatibility 53
Example configuration of a Cisco Catalyst Routing Switch 53
Check Point Software Compatibility 55
Operating System Compatibility 55
Check Point Software Compatibility (excluding SmartDefense) 55
ClusterXL Compatibility with SmartDefense 58
Forwarding Layer 58
Configuring ClusterXL 60
Configuring Routing for the Client Machines 60
Preparing the Cluster Member Machines 60
Choosing the CCP Transport Mode on the Cluster Members 61
SmartDashboard Configuration 62
Chapter 6 Monitoring and Troubleshooting Gateway Clusters
How to Verify the Cluster is Working Properly (cphaprob) 75
The cphaprob Command 76
Monitoring Cluster Status (cphaprob state) 77
Monitoring Cluster Interfaces (cphaprob [-a] if) 78
Monitoring Critical Devices (cphaprob list) 80
Registering a Critical Device (cphaprob -d ... register) 81
Registering Critical Devices Listed in a File (cphaprob -f <file> register) 81
Unregistering a Critical Device (cphaprob -d ... unregister) 82
Reporting Critical Device Status to ClusterXL (cphaprob -d ... report) 82
Example cphaprob Script 82
Monitoring Cluster Status using SmartConsole Clients 83
SmartView Monitor 83
SmartView Tracker 83
ClusterXL Configuration Commands (cphaconf, cphastart, cphastop) 87
The cphaconf Command 87
The cphastart and cphastop Commands 87
How to Initiate Failover 88
Stopping the Cluster Member 88
Starting the Cluster Member 88
Monitoring Synchronization (fw ctl pstat) 89
Troubleshooting Synchronization (cphaprob [-reset] syncstat) 92
Introduction to cphaprob [-reset] syncstat 92
Output of the cphaprob [-reset] syncstat command 93
Synchronization Troubleshooting Options 101
ClusterXL Error Messages 103
General ClusterXL Error Messages 104
SmartView Tracker Active Mode Messages 105
Sync Related Error Messages 106
TCP Out-of-State Error Messages 107
Platform Specific Error Messages 108
Solaris Platform Specific Issues: VLAN Switch Port Flapping 109
Member Fails to Start After Reboot 110
How to Configure Module Configuration Parameters to Survive a Boot 120
Controlling the Clustering and Synchronization Timers 121
Blocking New Connections Under Load 121
Working with SmartView Tracker Active Mode 122
Reducing the Number of Pending Packets 123
Configuring Full Synchronization Advanced Options 124
Defining Disconnected Interfaces 125
Defining a Disconnected Interface on Unix 125
Defining a Disconnected Interface on Windows 125
Configuring Policy Update Timeout 125
Enhanced Enforcement of the TCP 3-Way Handshake 126
Configuring Cluster Addresses on Different Subnets 127
Introduction to Cluster Addresses on Different Subnets 127
Configuration of Cluster Addresses on Different Subnets 128
Example of Cluster Addresses on Different Subnets 129
Limitations of Cluster Addresses on Different Subnets 130
Moving from High Availability Legacy to High Availability New Mode or Load Sharing with
Minimal Effort 132
On the Modules 133
From SmartDashboard 133
Moving from High Availability Legacy to High Availability New Mode or Load Sharing with
Minimal Downtime 134
Moving from a Single Gateway to a ClusterXL Cluster 136
On the Single Gateway Machine 136
On Machine 'B' 136
In SmartDashboard, for Machine B 136
On Machine 'A' 136
In SmartDashboard for Machine A 137
Adding Another Member to an Existing Cluster 137
Configuring ISP Redundancy on a Cluster 138
Enabling Dynamic Routing Protocols in a Cluster Deployment 139
Components of the System 139
Dynamic Routing in ClusterXL 140
Appendix B Example cphaprob Script
More information 149
The clusterXL_monitor_process script 149
Index 155
CHAPTER 1
Introduction to
ClusterXL
In This Chapter
Summary of Contents
Chapter 1, Introduction to ClusterXL briefly describes the need for Gateway
Clusters, introduces ClusterXL and the Cluster Control Protocol, specifies
installation and licensing requirements, and lists some clustering definitions and
terms.
Chapter 2, Synchronizing Connection Information Across the Cluster describes
State Synchronization, what not to synchronize, and how to configure State
Synchronization.
Chapter 4, High Availability and Load Sharing in ClusterXL describes the
ClusterXL Load Sharing and High Availability modes, talks about failover and the
compatibility with other Check Point software and hardware.
Chapter 5, Working with OPSEC Certified Clustering Products describes the
special considerations for working with OPSEC clustering products.
The Need for Gateway Clusters
Enhanced Reliability and Performance through Load Sharing
ClusterXL uses unique physical IP and MAC addresses for the cluster members, and virtual
IP addresses to represent the cluster itself. Virtual addresses (in all configurations other
than High Availability Legacy mode) do not belong to any real machine interface.
ClusterXL supplies an infrastructure that ensures that no data is lost in case of a failure,
by making sure each gateway cluster member is aware of the connections going through the
other members. Passing information about connections and other VPN-1 Pro states
between the cluster members is called State Synchronization.
VPN-1 Pro Gateway Clusters can also be built using OPSEC certified High Availability
and Load Sharing products. OPSEC Certified Clustering products use the same State
Synchronization infrastructure as ClusterXL.
1 You must have a license for VPN-1 Pro (with SKU: CPMP-VPG) installed on at least
one of the cluster members. For Check Point Express you must have the matching
Express license (with SKU: CPXP-VPX) installed on at least one of the cluster
members.
2 On the other member(s) it is possible to install a secondary module license with
SKU: CPMP-HVPG and for a Check Point Express with SKU: CPXP-HVPX.
3 If you are using legacy licenses (FM-X), ignore points 1 and 2 and make sure that
each cluster member has a FireWall-1 license (with SKU: FM-U or similar).
4 For each ClusterXL Load Sharing cluster you must have an additional Load Sharing
add-on license installed on the management station. There are two Load Sharing
license SKUs: CPMP-CXLS-U-NGX and CPMP-CXLS-500-NGX. ClusterXL High
Availability and third party clusters (both High Availability and Load Sharing) do
not require an additional license/add-on.
5 After upgrading to NGX (R60), a previous version license for ClusterXL
automatically counts as a legitimate Load Sharing license eliminating the
requirement in point 4.
6 Both the plug and play and the evaluation licenses include the option to work with
up to three ClusterXL Load Sharing clusters managed by the same management
station.
ClusterXL supported platforms are listed in the platform support matrix in the Check
Point Enterprise Suite NGX (R60) Release Notes, available online at:
http://www.checkpoint.com/techsupport/downloads.jsp.
Cluster
A group of machines that work together to provide Load Sharing and/or High
Availability.
Failover
A machine taking over packet filtering in place of another machine in the cluster that
suffered a Failure.
High Availability
The ability to maintain a connection when there is a Failure by having another machine
in the cluster take over the connection, without any loss of connectivity. Only the
Active machine filters packets, and the others do not. One of the machines in the
cluster is configured as the Active machine. If a Failure occurs on the Active machine,
one of the other machines in the cluster assumes its responsibilities.
Active Up
When the High Availability machine that was Active and suffered a Failure becomes
available again, it returns to the cluster, not as the Active machine but as one of the
standby machines in the cluster.
Primary Up
When the High Availability machine that was Active and suffered a Failure becomes
available again, it resumes its responsibilities as the Primary machine.
Hot Standby
Also known as Active/Standby. It has the same meaning as High Availability.
Load Sharing
In a Load Sharing Gateway Cluster, all machines in the cluster filter packets. Load
Sharing provides High Availability, gives transparent Failover to any of the other
machines in the cluster when a Failure occurs and provides enhanced reliability and
performance. Load Sharing is also known as Active/Active.
Multicast Load Sharing
In Load Sharing Multicast mode of ClusterXL, every member of the cluster receives all
the packets sent to the cluster IP address. A router or Layer 3 switch forwards packets to
all cluster members using multicast. A ClusterXL decision algorithm on all cluster
members decides which cluster member should perform enforcement processing on the
packet.
Unicast Load Sharing
In Load Sharing Unicast mode of ClusterXL, one machine (the Pivot) receives all
traffic from a router with a unicast configuration, and redistributes the packets to the
other machines in the cluster. The Pivot machine is chosen automatically by ClusterXL.
Critical Device
A device which the administrator has defined to be critical to the operation of the
cluster member. A critical device is also known as a Problem Notification (pnote).
Critical devices are constantly monitored. If a critical device stops functioning, this is
defined as a Failure. A device can be hardware, or a process. The fwd and cphad
processes are predefined by default as critical devices. The Security Policy is also
predefined as a critical device. The administrator can add to the list of critical devices
using the cphaprob command.
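The bookkeeping described in the Critical Device definition can be pictured with a small Python model. This is only an illustrative sketch: the class name, its methods, and the device name used in the test are hypothetical, and it does not reproduce the interface of the cphaprob command. The three predefined entries are the ones named in the definition above.

```python
from dataclasses import dataclass, field


@dataclass
class CriticalDeviceRegistry:
    """Toy model of the critical devices (pnotes) on one cluster member."""

    # Device name -> True if the device currently reports a problem.
    # fwd, cphad and the Security Policy are predefined, as the text states.
    devices: dict = field(default_factory=lambda: {
        "fwd": False,
        "cphad": False,
        "Security Policy": False,
    })

    def register(self, name: str) -> None:
        """Add a device to the list; it starts in the 'ok' state."""
        self.devices.setdefault(name, False)

    def unregister(self, name: str) -> None:
        """Remove a device from the list."""
        self.devices.pop(name, None)

    def report(self, name: str, problem: bool) -> None:
        """Record the current status of a registered device."""
        if name not in self.devices:
            raise KeyError(f"{name} is not a registered critical device")
        self.devices[name] = problem

    def has_failure(self) -> bool:
        """A member has a Failure if any critical device reports a problem."""
        return any(self.devices.values())
```

In this model a single problem report is enough to mark the member as failed, which mirrors the statement that a critical device that stops functioning is defined as a Failure.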
State Synchronization
The technology that maintains connections after Failover. State Synchronization is used
by both ClusterXL and third-party clustering solutions. It works by replicating VPN-1
Pro kernel tables.
Secured interface
CHAPTER 2
Synchronizing
Connection Information
Across the Cluster
In This Chapter
The Check Point State Synchronization Solution
How State Synchronization Works
connecting the physical network interfaces of the cluster members directly using a
cross-cable. In a cluster with three or more members, use a dedicated hub or
switch.
Note - It is possible to run synchronization across a WAN. For details, see Synchronizing
Clusters over a Wide Area Network on page 24.
Note - The source MAC address can be changed. See Connecting Several Clusters on the
Same VLAN on page 116.
Non-Synchronized Services
In a gateway cluster, all connections on all cluster members are normally synchronized
across the cluster. However, not all services that cross a gateway cluster need necessarily
be synchronized.
It is possible to decide not to synchronize TCP, UDP and Other types of service.
By default, all these services are synchronized.
The VRRP and IP Clustering control protocols, as well as the IGMP protocol, are
not synchronized by default (although you can choose to turn on synchronization
for these protocols). Protocols that run solely between cluster members need not be
synchronized. Although it is possible to synchronize them, no benefit will be gained
if the cluster is configured to do so. The synchronization information is not relevant
for this case because it will not help in case of a failover. Therefore the following
protocols are not synchronized by default: IGMP, VRRP, IP clustering and some
other OPSEC cluster control protocols.
Broadcasts and multicasts are not synchronized, and cannot be synchronized.
It is possible to have both a synchronized service and a non-synchronized definition of
a service, and to use them selectively in the Rule Base.
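The default rules in this section can be summarized as a small decision function. The sketch below is written in Python for illustration only; the function and parameter names are hypothetical and do not correspond to any product setting.

```python
# Protocols that the text lists as not synchronized by default.
NOT_SYNCED_BY_DEFAULT = {"VRRP", "IP Clustering", "IGMP"}


def is_synchronized(service_type, override=None, is_broadcast_or_multicast=False):
    """Return True if connections of this service would be synchronized.

    service_type: "TCP", "UDP", "Other", or one of the control protocols above.
    override: None to use the default, or True/False for an explicit
              administrator setting on the Service object.
    """
    if is_broadcast_or_multicast:
        # Broadcasts and multicasts cannot be synchronized, whatever the setting.
        return False
    if override is not None:
        return override
    return service_type not in NOT_SYNCED_BY_DEFAULT
```

For example, `is_synchronized("TCP")` is true by default, while `is_synchronized("VRRP")` is false unless the administrator turns synchronization on.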
Duration Limited Synchronization
To configure a service so that it will not be synchronized, edit the Service object. See
Setting a Service to be Non-Synchronized on page 26.
Non-Sticky Connections
A connection is called sticky if all packets of the connection are handled by a single
cluster member. In a non-sticky connection, a reply packet may return through a
different gateway than the original packet.
The synchronization mechanism knows how to properly handle non-sticky
connections. In a non-sticky connection, a cluster member gateway can receive an
out-of-state packet, which VPN-1 Pro normally drops because it poses a security risk.
In Load Sharing configurations, all cluster members are active, and in Static NAT and
encrypted connections, the source and destination IP addresses change. Therefore,
Static NAT and encrypted connections through a Load Sharing cluster may be
non-sticky. Non-stickiness may also occur with Hide NAT, but ClusterXL has a
mechanism to make it sticky.
In High Availability configurations, all packets reach the Active machine, so all
connections are sticky. If failover occurs during connection establishment, the
connection is lost, but synchronization can be performed later.
If the other members do not know about a non-sticky connection, the packet will be
out-of-state, and the connection will be dropped for security reasons. However, the
Synchronization mechanism knows how to inform other members of the connection.
The Synchronization mechanism thereby prevents out-of-state packets in valid, but
non-sticky connections, so that these non-sticky connections are allowed.
Non-sticky connections will also occur if the network administrator has configured
asymmetric routing, where a reply packet returns through a different gateway than the
original packet.
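The effect of synchronization on a non-sticky connection can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions: each member keeps only a set of known connections, and the class and method names are hypothetical.

```python
class Member:
    """Toy cluster member that tracks the connections it knows about."""

    def __init__(self, name):
        self.name = name
        self.connections = set()

    def handle_syn(self, conn, peers, synchronize):
        """Record a new connection and, optionally, inform the other members."""
        self.connections.add(conn)
        if synchronize:
            for peer in peers:
                peer.connections.add(conn)  # state delivered by synchronization

    def handle_reply(self, conn):
        """A reply for an unknown connection is out of state and is dropped."""
        return "accept" if conn in self.connections else "drop (out of state)"
```

If member A sees the original packet and member B sees the reply, B accepts the reply only when A has synchronized the connection beforehand.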
TCP Streaming
TCP streaming technology reassembles TCP segments, enabling inspection of complete
protocol units before any of them reach the client or server. In addition, TCP streaming
provides the ability to modify TCP streams on-the-fly and add or remove data from the
stream.
Certain Web Intelligence and VoIP Application Intelligence features that use TCP
streaming technology must be sticky (i.e., be handled by the same cluster member in
each direction) to avoid excessive synchronization. For further details about Check
Point security features that require stickiness, refer to the Check Point Enterprise Suite
NGX (R60) Release Notes, available online at:
http://www.checkpoint.com/techsupport/downloads.jsp.
By default, in the event of failover, a TCP streaming connection is reset.
How the Synchronization Mechanism Handles Non-Sticky Connections
See Enhanced Enforcement of the TCP 3-Way Handshake on page 126 for
additional information.
FIGURE 2-1 A Non-sticky (asymmetrically routed) connection
5 Wait for all the other cluster members to acknowledge the information in the sync
packet.
6 Release held SYN packet.
7 All cluster members are ready for the SYN-ACK.
Configuring State Synchronization
5 Set Synchronize connections on the cluster in the new service, so that it is different
from the setting in the existing service.
Configuring Duration Limited Synchronization
2 In the Service Properties window, click Advanced to display the Advanced Services
Properties window.
3 Select Start synchronizing x seconds after connection initiation.
Note - As this feature is limited to HTTP-based services, the Start synchronizing x seconds
after connection initiation checkbox is not displayed for other services.
4 In the seconds field, enter the number of seconds or select the number of seconds
from the dropdown list, for which you want synchronization to be delayed after
connection initiation.
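The effect of the delay can be sketched as a simple filter. The Python function below is illustrative only; its name is hypothetical, and treating a connection that lasts exactly the configured number of seconds as synchronized is an assumption.

```python
def synchronized_connections(lifetimes, delay_seconds):
    """Return the lifetimes of connections that would be synchronized.

    lifetimes: how long each connection stays open, in seconds.
    delay_seconds: the value entered in the seconds field.
    A connection that closes before the delay elapses never generates
    synchronization traffic (boundary handling is an assumption).
    """
    return [t for t in lifetimes if t >= delay_seconds]
```

With a delay of 30 seconds, connections lasting 0.4 and 2 seconds are never synchronized, while connections lasting 45 and 600 seconds are.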
CHAPTER 3
Sticky Connections
In This Chapter
Note - For the latest information regarding features that require sticky connections, refer to
the Check Point Enterprise Suite NGX (R60) Release Notes, available online at:
http://www.checkpoint.com/techsupport/downloads.jsp.
Introduction to Sticky Connections
The following services and connection types are now supported by enabling the Sticky
Decision Function:
VPN deployments with third-party VPN peers
SecureClient/SecuRemote/SSL Network Extender encrypted connections,
including SecureClient visitor mode
The Sticky Decision Function has the following limitations:
Sticky Decision Function is not supported when employing either Performance
Pack or a hardware-based accelerator card. Enabling the Sticky Decision Function
disables these acceleration products.
When the Sticky Decision Function is used in conjunction with VPN, cluster
members are prevented from opening more than one connection to a specific peer.
Opening another connection would cause another SA to be generated, which a
third-party peer, in many cases, would not be able to process.
Third-Party Gateways in Hub and Spoke Deployments
In this scenario:
A third-party peer (gateway or client) attempts to create a VPN tunnel.
Cluster Members A and B belong to a ClusterXL Gateway in Load Sharing mode.
The third-party peers, lacking the ability to store more than one set of SAs, cannot
negotiate a VPN tunnel with multiple cluster members, and therefore the cluster
member cannot complete the routing transaction.
This issue is resolved for certain third-party peers or any gateways that can save only
one set of SAs by making the connection sticky. Enabling the Sticky Decision Function
sets all VPN sessions to be processed by a single cluster member. To enable the Sticky
Decision Function, in SmartDashboard edit the cluster object > ClusterXL page >
Advanced, and enable the property Use Sticky Decision Function.
In this scenario:
The intent of this deployment is to enable hosts that reside behind Spoke A to
communicate with hosts behind Spoke B.
The ClusterXL Gateway is in Load Sharing mode, is composed of Cluster Members
A and B, and serves as a VPN Hub.
Establishing a Third-Party Gateway in a Hub and Spoke Deployment
2 Create a Tunnel Group to handle traffic from specific peers. Use a text editor to
edit the file $FWDIR/lib/user.def, and add a line similar to the following:
all@{member1,member2} vpn_sticky_gws = {<10.10.10.1;1>,
<20.20.20.1;1>};
The elements of this configuration are as follows:
Element Description
3 Other peers can be added to the Tunnel Group by including their IP addresses in
the same format as shown above. To continue with the example above, adding
Spoke C would look like this:
all@{member1,member2} vpn_sticky_gws = {<10.10.10.1;1>,
<20.20.20.1;1>,<30.30.30.1;1>};
Note that the Tunnel Group Identifier ;1 stays the same, which means that the
listed peers will always connect through the same cluster member.
This procedure in essence turns off Load Sharing for the connections affected. If
the implementation is to connect multiple sets of third-party gateways one to
another, a form of Load Sharing can be accomplished by setting gateway pairs to
work in tandem with specific cluster members. For instance, to set up a connection
between two other spokes (C and D), simply add their IP addresses to the line and
replace the Tunnel Group Identifier ;1 with ;2. The line would then look
something like this:
all@{member1,member2} vpn_sticky_gws = {<10.10.10.1;1>,
<20.20.20.1;1>,<192.168.15.5;2>,<192.168.1.4;2>};
Note that there are now two Tunnel Group Identifiers: ;1 and ;2. Spokes A and B will now
connect through one cluster member, and Spokes C and D through another.
Note - The tunnel groups are shared between active cluster members. In case of a change in
cluster state (e.g., failover or member attach/detach), the reassignment is performed
according to the new state.
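The Tunnel Group mechanism can be pictured with a short Python sketch that reads a vpn_sticky_gws line and maps each peer to a cluster member. The function names are hypothetical, and the round-robin assignment of groups to members is an assumption made for illustration; the guide does not describe how the cluster actually chooses a member for each group.

```python
import re


def parse_sticky_gws(line):
    """Return {peer_ip: tunnel_group_id} from a vpn_sticky_gws line."""
    return {ip: int(group) for ip, group in re.findall(r"<([\d.]+);(\d+)>", line)}


def assign_groups(peer_groups, active_members):
    """Map each peer to a cluster member; peers in one group share a member.

    The round-robin choice below is an assumption made for illustration.
    """
    if not active_members:
        raise ValueError("no active cluster members")
    groups = sorted(set(peer_groups.values()))
    owner = {g: active_members[i % len(active_members)]
             for i, g in enumerate(groups)}
    return {peer: owner[g] for peer, g in peer_groups.items()}
```

Calling `assign_groups` again with a shorter list of active members models the reassignment that follows a failover: all Tunnel Groups are redistributed over the members that remain.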
CHAPTER 4
In This Chapter
Introduction to High Availability and Load Sharing
All machines in the cluster are aware of the connections passing through each of the
other machines. The cluster members synchronize their connection and status
information across a secure synchronization network.
The glue that binds the machines in a ClusterXL cluster is the Cluster Control Protocol
(CCP), which is used to pass synchronization and other information between the
cluster members.
Load Sharing
ClusterXL Load Sharing distributes traffic within a cluster of gateways so that the total
throughput of multiple machines is increased.
In Load Sharing configurations, all functioning machines in the cluster are active, and
handle network traffic (Active/Active operation).
If any individual Check Point gateway in the cluster becomes unreachable, transparent
failover occurs to the remaining operational machines in the cluster, thus providing
High Availability. All connections are shared between the remaining gateways without
interruption.
High Availability
High Availability allows organizations to maintain a connection when there is a failure
in a cluster member, without Load Sharing between cluster members. In a High
Availability cluster, only one machine is active (Active/Standby operation). In the event
that the active cluster member becomes unreachable, all connections are re-directed to
a designated backup without interruption. In a synchronized cluster, the backup cluster
members are updated with the state of the connections of the active cluster member.
In a High Availability cluster, each machine is given a priority. The highest priority
machine serves as the gateway in normal circumstances. If this machine fails, control is
passed to the next highest priority machine. If that machine fails, control is passed to
the next machine, and so on.
Upon gateway recovery, it is possible to maintain the current active gateway (Active Up),
or to switch to the highest priority gateway (Primary Up). Note that in Active Up
configuration, changing and installing the Security Policy may restart the ClusterXL
configuration handshake on the members, which may lead to another member being
chosen as the Active machine.
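The difference between Active Up and Primary Up can be expressed as a small selection function. The following Python sketch is illustrative only: the function name, the tuple layout, and the mode strings are hypothetical, and rank 1 is taken to mean the highest priority.

```python
def choose_active(members, current_active, mode):
    """Pick the Active machine in a High Availability cluster.

    members: list of (name, priority_rank, is_up); rank 1 is the highest priority.
    current_active: name of the machine that is Active right now, or None.
    mode: "active_up" or "primary_up".
    """
    if mode not in ("active_up", "primary_up"):
        raise ValueError(f"unknown mode: {mode}")
    up = sorted((rank, name) for name, rank, is_up in members if is_up)
    if not up:
        return None
    up_names = [name for _, name in up]
    if mode == "active_up" and current_active in up_names:
        return current_active  # keep the current Active machine
    return up_names[0]         # highest-priority machine that is up
```

When a failed high-priority member recovers while a lower-priority member is Active, `"active_up"` leaves the current member in place and `"primary_up"` switches back.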
ClusterXL uses unique physical IP and MAC addresses for the cluster members, and virtual
IP addresses to represent the cluster itself. Cluster interface addresses do not belong to
any real machine interface.
FIGURE 4-1 shows a two-member ClusterXL cluster, and contrasts the virtual IP
addresses of the cluster, and the physical IP addresses of the cluster members.
Each cluster member has three interfaces: one external interface, one internal interface,
and one for synchronization. Cluster member interfaces facing in each direction are
connected via a switch, router, or VLAN switch.
All cluster member interfaces facing the same direction must be in the same network.
For example, there must not be a router between cluster members.
The SmartCenter Management Server can be located anywhere, and should be routable
to either the internal or external cluster addresses.
Refer to the sections following FIGURE 4-1 for a description of the ClusterXL
configuration concepts shown in the example.
Note -
1. High Availability Legacy Mode uses a different Topology, and is discussed in the
Appendix: High Availability Legacy Mode on page 141.
2. In the examples in this and subsequent sections, addresses in the range 192.168.0.0 to
192.168.255.255, which are RFC 1918 private addresses, are used to represent routable
(public) IP addresses.
Defining the Cluster Virtual IP Addresses
For example, in the configuration in FIGURE 4-1, there are two cluster members,
Member_A and Member_B. Each has an interface with an IP address facing the
Internet through a hub or a switch. This is the External interface with IP address
192.168.10.1 on Member_A and 192.168.10.2 on Member_B, and is the interface that
the cluster external interface sees.
Note - NGX presents an option to use only two interfaces per member, one external and one
internal and to run synchronization over the internal interface. However, this configuration is
not recommended and should be used for backup only. For more information see Chapter 2,
Synchronizing Connection Information Across the Cluster.
The cluster has one external virtual IP address and one internal virtual IP address. The
external IP address is 192.168.10.100, and the internal IP address is 10.10.0.100.
Configuring different subnets for the cluster IP addresses and the member addresses is
useful in order to:
Enable a multi-machine cluster to replace a single-machine gateway in a
pre-configured network, without the need to allocate new addresses to the cluster
members.
Allow organizations to use only one routable address for the ClusterXL Gateway
Cluster. This saves routable addresses.
For details, see Configuring Cluster Addresses on Different Subnets on page 127.
ClusterXL Modes
In This Section
Note - All examples in the section refer to the ClusterXL configuration shown in FIGURE 4-1
on page 38.
Load Sharing Multicast Mode
Example
This scenario describes a user logging in from the Internet to a web server behind the
Firewall cluster that is configured in Load Sharing Multicast mode.
1 The user requests a connection from 192.168.10.78 (his computer) to 10.10.0.34
(the web server).
2 A router on the 192.168.10.x network recognizes 192.168.10.100 (the cluster's
virtual IP address) as the gateway to the 10.10.0.x network.
3 The router issues an ARP request to 192.168.10.100.
4 One of the active members intercepts the ARP request, and responds with the
Multicast MAC assigned to the cluster IP address of 192.168.10.100.
5 When the web server responds to the user requests, it recognizes 10.10.0.100 as its
gateway to the Internet.
6 The web server issues an ARP request to 10.10.0.100.
7 One of the active members intercepts the ARP request, and responds with the
Multicast MAC address assigned to the cluster IP address of 10.10.0.100.
8 All packets sent between the user and the web server reach every cluster member,
which decides whether to handle or drop each packet.
9 When a failover occurs, one of the cluster members goes down. However, traffic
still reaches all of the active cluster members, and hence there is no need to make
changes in the network's ARP routing. All that changes is the cluster's decision
function, which takes into account the new state of the members.
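Steps 8 and 9 rely on every member reaching the same verdict independently. The Python sketch below illustrates that idea with a CRC-based hash over the address pair. The hash is an assumption chosen for illustration and is not the actual ClusterXL decision function; the function names are hypothetical.

```python
import zlib


def owner(packet, active_members):
    """Pick the member that handles a connection.

    Every member runs the same computation on the same inputs, so all
    members reach the same answer without exchanging messages.
    """
    src, dst = packet
    key = "|".join(sorted((src, dst))).encode()  # same key for both directions
    return sorted(active_members)[zlib.crc32(key) % len(active_members)]


def should_handle(me, packet, active_members):
    """Each member keeps the packet only if it is the owner; others drop it."""
    return owner(packet, active_members) == me
```

After a failover only the list of active members changes, so the same function redistributes the connections without any change to the ARP entries in the network.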
Load Sharing Unicast Mode
Even though the pivot member is responsible for the decision process, it still acts as a
Firewall module that processes packets (for example, the decision it makes can be to
handle a packet on the local machine). However, since its additional tasks can be time
consuming, it is usually assigned a smaller share of the total load.
When a failover event occurs in a non-pivot member, its handled connections are
redistributed between active cluster members, providing the same High Availability
capabilities as New High Availability and Load Sharing Multicast. When the pivot
member encounters a problem, a regular failover event occurs, and, in addition, another
member assumes the role of the new pivot. The pivot member is always the active
member with the highest priority. This means that when a former pivot recovers, it
resumes its previous role.
See Figure 4-1 on page 38 for an example of a typical ClusterXL configuration.
Example
In this scenario, we use a Load Sharing Unicast cluster as the gateway between the
user's computer and the web server.
1 The user requests a connection from 192.168.10.78 (his computer) to 10.10.0.34
(the web server).
2 A router on the 192.168.10.x network recognizes 192.168.10.100 (the cluster's
virtual IP address) as the gateway to the 10.10.0.x network.
3 The router issues an ARP request to 192.168.10.100.
4 The pivot member intercepts the ARP request, and responds with the MAC address
that corresponds to its own unique IP address of 192.168.10.1.
5 When the web server responds to the user requests, it recognizes 10.10.0.100 as its
gateway to the Internet.
6 The web server issues an ARP request to 10.10.0.100.
7 The pivot member intercepts the ARP request, and responds with the MAC address
that corresponds to its own unique IP address of 10.10.0.1.
8 The user's request packet reaches the pivot member on interface 192.168.10.1.
9 The pivot decides that the second member should handle this packet, and forwards
it to 192.168.10.2.
10 The second member recognizes the packet as a forwarded one, and processes it.
11 Further packets are processed by either the pivot member, or forwarded and
processed by the non-pivot member.
12 When a failover occurs on the pivot, the second member assumes the role of pivot.
13 The new pivot member sends gratuitous ARP requests to both the 192.168.10.x
and the 10.10.0.x networks. These requests associate the virtual IP address of
192.168.10.100 with the MAC address that corresponds to the unique IP address of
192.168.10.2, and the virtual IP address of 10.10.0.100 with the MAC address that
corresponds to the unique IP address of 10.10.0.2.
14 Traffic sent to the cluster is now received by the new pivot, and processed by the
local machine (as it is currently the only active machine in the cluster).
15 When the first machine recovers, it re-assumes the role of pivot, by associating the
cluster IP addresses with its own unique MAC addresses.
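The pivot's role in the steps above can be sketched in Python. The pivot is taken to be the active member with the highest priority, as stated earlier in this section. The modulo rule that picks the handling member is an assumption made for illustration, and all names are hypothetical.

```python
def pivot(members):
    """Return the active member with the highest priority (rank 1 is highest).

    members: {name: (priority_rank, is_active)}
    """
    active = [(rank, name) for name, (rank, is_active) in members.items() if is_active]
    return min(active)[1] if active else None


def route(conn_id, members):
    """Return (receiver, handler) for a connection.

    The pivot receives every packet, then either keeps it or forwards it.
    The modulo distribution is an illustrative assumption.
    """
    receiver = pivot(members)
    if receiver is None:
        raise RuntimeError("no active cluster members")
    active = sorted(name for name, (_, up) in members.items() if up)
    handler = active[conn_id % len(active)]
    return receiver, handler
```

Marking Member_A as inactive makes Member_B both receiver and handler, which corresponds to steps 12 to 14. Marking it active again returns the pivot role to Member_A, as in step 15.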
New High Availability Mode
handled according to their last known state. Upon the recovery of a member with a
higher priority, the role of the active machine may or may not be switched back to that
member, depending on the user's configuration.
It is important to note that the cluster may encounter problems in standby machines as
well. In this case, these machines are not considered for the role of active members, in
the event of a failover.
See Figure 4-1, Example ClusterXL Topology, on page 38 for an example of a typical
ClusterXL configuration.
Example
This scenario describes a user logging in from the Internet to a web server behind the
Firewall cluster.
1 The user requests a connection from 192.168.10.78 (his computer) to 10.10.0.34
(the web server).
2 A router on the 192.168.10.x network recognizes 192.168.10.100 (the cluster's
virtual IP address) as the gateway to the 10.10.0.x network.
3 The router issues an ARP request to 192.168.10.100.
4 The active member intercepts the ARP request, and responds with the MAC
address that corresponds to its own unique IP address of 192.168.10.1.
5 When the web server responds to the user requests, it recognizes 10.10.0.100 as its
gateway to the Internet.
6 The web server issues an ARP request to 10.10.0.100.
7 The active member intercepts the ARP request, and responds with the MAC
address that corresponds to its own unique IP address of 10.10.0.1.
8 All traffic between the user and the web server is now routed through the active
member.
9 When a failover occurs, the standby member concludes that it should now replace
the faulty active member.
10 The standby member sends gratuitous ARP requests to both the 192.168.10.x and
the 10.10.0.x networks. These requests associate the virtual IP address of
192.168.10.100 with the MAC address that corresponds to the unique IP address of
192.168.10.2, and the virtual IP address of 10.10.0.100 with the MAC address that
corresponds to the unique IP address of 10.10.0.2.
11 The standby member has now switched to the role of the active member, and all
traffic directed through the cluster is routed through this machine.
12 The former active member is now considered to be down, waiting to recover
from whatever problem caused the failover event.
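The address takeover in this example can be pictured as an update to the ARP caches of the neighboring hosts. The following Python sketch is illustrative only: the MAC addresses and the helper name associate are hypothetical and are not part of ClusterXL. Only the IP addresses are taken from the example.

```python
# Illustrative model of the failover example above.
# The MAC addresses are made up; only the IP addresses come from the example.
MEMBER_MACS = {
    "192.168.10.1": "00:00:00:00:0a:01",  # first member, external interface
    "10.10.0.1": "00:00:00:00:0b:01",     # first member, internal interface
    "192.168.10.2": "00:00:00:00:0a:02",  # second member, external interface
    "10.10.0.2": "00:00:00:00:0b:02",     # second member, internal interface
}


def associate(arp_cache, virtual_ip, member_ip):
    """Record that virtual_ip now resolves to the MAC of the member owning member_ip."""
    arp_cache[virtual_ip] = MEMBER_MACS[member_ip]


arp_cache = {}

# Steps 4 and 7: the active member answers ARP requests for the virtual addresses.
associate(arp_cache, "192.168.10.100", "192.168.10.1")
associate(arp_cache, "10.10.0.100", "10.10.0.1")

# Step 10: after the failover, the standby member sends gratuitous ARP requests.
associate(arp_cache, "192.168.10.100", "192.168.10.2")
associate(arp_cache, "10.10.0.100", "10.10.0.2")
```

After the last two calls, both virtual addresses resolve to the second member, so traffic through the cluster reaches that machine.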
Failover
In This Section
What is a Failover?
A failover occurs when a Gateway is no longer able to perform its designated functions.
When this happens, another Gateway in the cluster assumes the failed Gateway's
responsibilities.
In a Load Sharing configuration, if one VPN-1 Pro Gateway in a cluster of Gateways
goes down, its connections are distributed among the remaining Gateways. All gateways
in a Load Sharing configuration are synchronized, so no connections are interrupted.
In a High Availability configuration, if one Gateway in a synchronized cluster goes
down, another Gateway becomes active and takes over the connections of the failed
Gateway. If you do not use State Synchronization, existing connections are closed when
failover occurs, although new connections can be opened.
To tell each cluster member that the other gateways are alive and functioning, the
ClusterXL Cluster Control Protocol maintains a heart beat between cluster members. If
a certain predetermined time has elapsed and no message is received from a cluster
member, it is assumed that the cluster member is down and a failover occurs. At this
point another cluster member automatically assumes the responsibilities of the failed
cluster member.
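The heartbeat rule described above can be sketched in a few lines of Python. This is a simplified model, not the Cluster Control Protocol itself: the function name and the timeout value used in the usage note are hypothetical, since this guide does not state the actual value.

```python
def members_considered_down(last_heard, now, timeout):
    """Return the IDs of members whose last heartbeat is older than timeout seconds.

    last_heard maps a member ID to the time (in seconds) its last message arrived.
    """
    return sorted(
        member for member, heard_at in last_heard.items()
        if now - heard_at > timeout
    )
```

For example, with a hypothetical timeout of 3 seconds, members_considered_down({1: 10.0, 2: 7.0}, now=10.5, timeout=3.0) reports only member 2, which has been silent for 3.5 seconds.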
It should be noted that a cluster machine may still be operational but if any of the above
checks fail in the cluster, then the faulty member initiates the failover because it has
determined that it can no longer function as a cluster member.
Note that more than one cluster member may encounter a problem that will result in a
failover event. In cases where all cluster members encounter such problems, ClusterXL
will try to choose a single member to continue operating. The state of the chosen
member will be reported as Active Attention. This situation lasts until another member
fully recovers. For example, if a cross cable connecting the cluster members
malfunctions, both members will detect an interface problem. One of them will change
to the Down state, and the other to Active Attention.
High Availability or Load Sharing
on the SmartCenter Server. If the policy on the SmartCenter Server is more up to date
than the one on the cluster member, the policy on the SmartCenter Server will be
retrieved. If the cluster member does not have a local policy, it retrieves one from the
SmartCenter Server. This ensures that all cluster members use the same policy at any
given moment.
IP Address Migration
If you wish to provide High Availability or Load Sharing to an existing single gateway
configuration, it is recommended to take the existing IP addresses from the current
gateway, and make these the cluster addresses (cluster virtual addresses), when feasible.
Doing so will avoid altering current IPSec endpoint identities, as well as keep Hide NAT
configurations the same in many cases.
In This Section
Hardware Requirements for HA New and Load Sharing Unicast Modes page 50
Hardware Requirements for Load Sharing Multicast Mode page 52
ClusterXL Hardware Requirements
TABLE 4-2 Switch Setting for High Availability New Mode and Load Sharing
ClusterXL Hardware Compatibility
Routers
Cisco 7200 Series
Cisco 1600, 2600, 3600 Series
Routing Switch
Extreme Networks Blackdiamond (Disable IGMP snooping)
Extreme Networks Alpine 3800 Series (Disable IGMP snooping)
Foundry Network Bigiron 4000 Series
Nortel Networks Passport 8600 Series
Cisco Catalyst 6500 Series (Disable IGMP snooping, Configure Multicast MAC
manually)
Switches
Cisco Catalyst 2900, 3500 Series
Nortel BayStack 450
Alteon 180e
Dell PowerConnect 3248 and PowerConnect 5224
To determine the MAC address that needs to be set, use the following procedure:
On a network that has a cluster IP address of x.y.z.w:
If y<=127, the multicast MAC address would be 01:00:5e:y:z:w, where y, z and w are
written in hexadecimal. For example:
01:00:5e:5A:0A:64 for 192.90.10.100
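The rule above can be checked with a short Python sketch. The function name is hypothetical. The handling of y>127 is an assumption that is not stated in the text above: it follows the standard IP multicast mapping, which keeps only the low 23 bits of the address and therefore clears the top bit of y.

```python
def cluster_multicast_mac(cluster_ip):
    """Return the multicast MAC address for a cluster IP address x.y.z.w."""
    octets = [int(part) for part in cluster_ip.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % cluster_ip)
    _, y, z, w = octets
    # Assumption: for y > 127 the top bit of y is cleared (y & 0x7F),
    # as in the standard IP multicast to Ethernet mapping.
    return "01:00:5e:%02X:%02X:%02X" % (y & 0x7F, z, w)
```

For the address in the example, cluster_multicast_mac("192.90.10.100") returns 01:00:5e:5A:0A:64, since 90, 10 and 100 are 5A, 0A and 64 in hexadecimal.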
Determining the MAC address is done using the procedure described in Defining static
cam entries.
Operating System Compatibility
Notes
TABLE 4-7 Products and features that are not fully supported with ClusterXL
1 Since it requires per-packet state tracking, this feature cannot be guaranteed when a
session starts on one cluster member and fails over to another.
2 Application Intelligence protocol inspection includes the general HTTP worm
catcher, configuration of Optimized Protocol Enforcement, and Microsoft networks
inspection.
3 Application Intelligence protocol inspection is supported when connections
maintain unidirectional stickiness. Unidirectional stickiness means that all packets
in the client-to-server direction are handled by one cluster member, and all packets
in the server-to-client direction are handled by one cluster member, which may be
a different member. OPSEC cluster solutions must maintain at least unidirectional
stickiness for all connections in order to qualify as OPSEC clusters.
Failover can break unidirectional stickiness for certain connections, and in that case,
VPN-1 Pro will proactively reset these connections.
4 Supported when connections maintain bidirectional stickiness. Bidirectional
stickiness is the situation where all packets of a connection, regardless of whether
they are in the client-to-server direction or the server-to-client direction, are
processed by a single cluster member.
Check Point Software Compatibility (excluding SmartDefense)
5 Supported with bandwidth limits and guarantees that are manually divided between
the members. With a 1.5 Mbps connection, and a three-member cluster, each
member would have a bandwidth of 500 Kbps, and limits of 1/3 of the total. If a
cluster member fails, the total bandwidth will not be automatically re-allocated
among the remaining members.
6 Using OPSEC partners platform.
7 Use SecureClient NG FP3 and above.
8 Configuration instructions for ACE server in Cluster environment:
High Availability: To support failover scenarios, manually copy the secured file,
which is created after the first authentication with the ACE server, from the
initiating member to all other members.
Load Sharing:
Every cluster member should be defined separately on the server with its unique
IP address.
Add the following entry to the tables.def file on the SmartCenter Server:
no_hide_services_ports = {.., <5500, 17> };
This forces the connection from the cluster members to the ACE server to go
out with the member's IP address and not the Cluster address. Make sure the IP
addresses of the cluster members are routable from the ACE server box, and then
install the Security Policy.
In some cases the agent libraries (client side) will use the wrong interface IP
address in the decryption, and the authentication will fail. To overcome this
problem, place a new text file sdopts.rec in the same directory as the
dconf.rec file, with the following line:
CLIENT_IP=x.x.x.x
where x.x.x.x is the primary IP address, as defined on the server. This is the IP
address of the interface to which the server is routed.
9 Works as two single gateways. SAM commands executed while a cluster member is
down are not enforced on that member.
10 In a High Availability configuration, client authentication Wait mode is not reliable.
Use other client authentication modes instead.
11 The ipassignment.conf file must be copied manually.
12 Performance Pack on Solaris with VLANs is not supported.
13 ISP Redundancy is not supported if cluster addresses are configured on different
subnets.
14 ISP redundancy works with ClusterXL in Load Sharing Unicast mode only if
SecureXL is enabled.
15 Not supported in Legacy Mode.
16 For SecureXL hardware-based acceleration support consult the third party vendor.
17 Sticky Decision Function must be disabled.
18 If the VPN peer device supports only one Security Association (SA), the Sticky
Decision Function must be enabled. Examples for such peers are Access VPN with
Microsoft IPSec (L2TP), and Cisco VPN routers.
Notes
1 If there is a failover when fragments are being received, the packet will be lost.
2 Does not survive failover.
3 Requires unidirectional stickiness. This means that the same member must receive
all external packets, and the same member must receive all internal packets, but the
same member does not have to receive both internal and external packets.
4 Requires bidirectional connection stickiness.
5 Uses the forwarding layer, described in the next section.
Forwarding Layer
The Forwarding Layer is a ClusterXL mechanism that allows a cluster member to pass
packets to other members, after they have been locally inspected by the Firewall. This
feature allows connections to be opened from a cluster member to an external host.
Packets originated by cluster members are hidden behind the cluster's virtual IP. Thus,
a reply from an external host is sent to the cluster, and not directly to the source
member. This can pose problems in the following situations:
The cluster is working in New High Availability mode, and the connection is
opened from the stand-by machine. All packets from the external host are handled
by the active machine, instead.
The cluster is working in a Load Sharing mode, and the decision function has
selected another member to handle this connection. This can happen since packets
directed at a cluster IP are distributed among cluster members as with any other
connection.
If a member decides, upon the completion of the Firewall inspection process, that a
packet is intended for another cluster member, it can use the Forwarding Layer to hand
the packet over to that destination. This is done by sending the packet over a secured
network (any subnet designated as a Synchronization network) directly to that member.
It is important to use secured networks only, as encrypted packets are decrypted during
the inspection process, and are forwarded as clear-text (unencrypted) data.
Packets sent on the Forwarding Layer use a special source MAC address to inform the
receiving member that they have already been inspected by another Firewall module.
Thus, the receiving member can safely hand over these packets to the local Operating
System, without further inspection. This process is secure, as Synchronization Networks
should always be isolated from any other network (using a dedicated network).
Configuring ClusterXL
In This Section
This procedure describes how to configure the Load Sharing Multicast, Load Sharing
Unicast, and High Availability New modes from scratch. Their configuration is
identical, apart from the mode selection in the SmartDashboard Gateway Cluster object
or the Gateway Cluster creation wizard. FIGURE 4-2 is used to illustrate the
configuration steps.
Note - To configure High Availability Legacy Mode, see High Availability Legacy Mode on
page 141.
Choosing the CCP Transport Mode on the Cluster Members
on Member_B configure the Int Interface with address 10.10.0.2, the Ext
interface with address 192.168.10.2, and the SYNC interface with address
10.0.10.2
10 For a VPN cluster to properly function, the cluster member clocks must be
accurately synchronized to within a second of each other. On cluster members that
are constantly up and running it is usually enough to set the time once. More
reliable synchronization can be achieved using NTP or some other time
synchronization services supplied by the operating system. The cluster member
clocks are not relevant for any other (non VPN) cluster capability.
11 Connect the cluster network machines, via the switches. For the Synchronization
interfaces, use a cross cable, or a dedicated switch. Make sure that each network
(internal, external, Synchronization, DMZ, and so on) is configured on a separate
VLAN, switch or hub.
Note - It is possible to run synchronization across a WAN. For details, see Synchronizing
Clusters over a Wide Area Network on page 24.
If you do not make this selection during installation, you can use the Check Point
Configuration Tool at any time. Run the cpconfig utility from the command line,
and select the option to turn on cluster capabilities on the module. Note that on
some platforms you may be asked to reboot.
SmartDashboard Configuration
FIGURE 4-2 relates the physical cluster topology to the required SmartDashboard
configuration.
When configuring a ClusterXL cluster in SmartDashboard, you use the Cluster object
Topology page to configure the topology for both cluster and cluster member. The cluster
IP addresses are virtual, in other words, they do not belong to any physical interface.
One (or more) interfaces of each cluster member will be in the synchronization
network.
FIGURE 4-2 Example ClusterXL topology and configuration
To define a new Gateway Cluster object, right click the Network Objects tree, and
choose New Check Point > Gateway Cluster. Configuration of the Gateway Cluster
Object can be performed using
Simple Mode (Wizard) which guides you step by step through the configuration
process. See the online help for further assistance.
Classic Mode, described below.
In the Network Objective column, define the purpose of the network by choosing
one of the options from the drop-down list (Cluster, 1st Sync., etc.). The
options are explained in the Online Help. To define a new network, click Add
Network.
The Edit Topology window for the example in FIGURE 4-2 on page 62 is as
follows
7 Still in the Topology page, define the topology for each virtual cluster interface. In
a virtual cluster interface cell, right click and select Edit Interface. The Interface
Properties window opens.
In the General tab, Name the virtual interface, and define an IP Address (in
FIGURE 4-2, 192.168.10.100 is one of the virtual interfaces).
In the Topology tab, define whether the interface is internal or external, and set
up anti-spoofing.
In the Member Networks tab, define the member network and its netmask if
necessary. This advanced option is explained in Configuring Cluster Addresses
on Different Subnets on page 127.
8 Define the other pages in the cluster object as required (NAT, VPN, Remote Access,
and so on).
9 Install the Security Policy on the cluster.
CHAPTER 5
In This Chapter
Configuring OPSEC Certified Clustering Products
OPSEC certified clustering products use the VPN-1 Pro state synchronization
mechanism (described in Chapter 2, Synchronizing Connection Information Across
the Cluster) to exchange and update connection information and other states between
cluster members.
This guide provides general guidelines for working with OPSEC certified clustering
products. Configuration details vary for each clustering product. You are therefore
urged to follow the instructions supplied with the OPSEC product.
Note - It is possible to run synchronization across a WAN. For details, see Synchronizing
Clusters over a Wide Area Network on page 24.
3 For Nokia clusters, configure VRRP or IP Clustering before installing VPN-1 Pro.
For other OPSEC certified clusters, follow the vendor recommendations.
After the installation has finished, make sure that the option Enable VPN-1/FW-1
monitoring is set to Enable in the Nokia configuration manager. This assures that
IPSO will monitor changes in the status of the firewall. For VRRP and IP
Clustering in IPSO 3.8.2 and above, the state of the firewall is reported to the
Nokia cluster for failover purposes.
4 Install VPN-1 Pro on all cluster members. During the configuration phase (or later,
using the cpconfig Configuration Tool):
Install a license for VPN-1 Pro on each cluster member. No special license is
required to allow the OPSEC certified product to work with VPN-1 Pro.
SmartDashboard Configuration for OPSEC Clusters
5 The Topology page is used to define the virtual cluster IP addresses and cluster
member addresses.
For each cluster member, define the interfaces for the individual members.
For OPSEC certified products, the configuration of virtual cluster IPs is mandatory
in several products, while in others it is forbidden. Refer to your cluster product
documentation for details.
Define the synchronization networks. Depending on the OPSEC implementation,
it might be possible to get the synchronization network from the OPSEC
configuration if it is already defined. Refer to the OPSEC documentation to find
out if this feature is implemented for a specific OPSEC product.
6 Now go back to the 3rd Party Configuration page.
A non-sticky connection is one in which packets from client to server and from
server to client pass through different cluster members. Non-sticky connections are
a problem because they can lead to out-of-state packets being received by the
cluster member. VPN-1 Pro will reject out-of-state packets, even if they belong to
a valid connection.
Either the synchronization mechanism or the OPSEC certified clustering product
needs to be able to identify valid non-sticky connections, so that VPN-1 Pro will
allow those connections through the cluster.
Find out whether or not the OPSEC certified clustering product can identify valid
non-sticky connections.
If the clustering product cannot identify valid non-sticky connections, the
synchronization mechanism can do so instead. In that case, check Support
non-sticky connections.
If the clustering product can identify valid non-sticky connections, the
synchronization mechanism does not have to take care of this. In that case,
uncheck Support non-sticky connections. Usually it is safe to uncheck this option
in High Availability solutions (not in Load Sharing). Unchecking this option
will lead to a slight improvement in the connection establishment rate.
If the Hide Cluster Members outgoing traffic behind the Clusters IP Address
option is checked, Support non-sticky connections should also be checked to
support outgoing connections from a standby machine (unless specifically
directed by OPSEC certified clustering product guide).
7 Many gateway clusters have a virtual cluster IP address that is defined in the
Topology page of the cluster object, in addition to physical cluster member interface
addresses. The use of virtual cluster IP addresses affects the settings in the 3rd Party
Configuration page.
When a client behind the cluster establishes an outgoing connection towards the
Internet, the source address in the outgoing packets is usually the physical IP
address of the cluster member interface. If virtual cluster IP addresses are used, the
clustering product usually changes the source IP address (using NAT) to that of the
external virtual IP address of the cluster.
This corresponds to the default setting of Hide Cluster Members outgoing traffic
behind the Clusters IP address being checked.
This section describes the behavior of specific command lines in OPSEC clusters.
Note - For details of the cpha command lines see Monitoring and Troubleshooting Gateway
Clusters on page 75.
The cphaprob Command in OPSEC Clusters
To produce a usage printout for cphaprob that shows all the available commands, type
cphaprob at the command line and press Enter. The meaning of each of these
commands is explained in the following sections.
cphaprob -d <device> -t <timeout(sec)> -s <ok|init|problem> [-p] register
cphaprob -f <file> register
cphaprob -d <device> [-p] unregister
cphaprob -d <device> -s <ok|init|problem> report
cphaprob [-i[a]] [-e] list
cphaprob state
cphaprob [-a] if
cphaprob state: The state reported by this command is the Check Point status only,
and not the status of the machine as a whole. The command only monitors whether full
sync succeeded and whether a policy was successfully installed. For IP clustering, the
state is accurate and also includes the status of the Nokia Cluster. For VRRP, the status
is accurate for a firewall, but it does not correctly reflect the status of the Nokia
machine (for example, it does not detect interface failure).
cphaprob [-a] if: Shows only the relevant information: the interface name, and
whether or not it is a sync interface. Multicast/Broadcast refers to the Cluster Control
Protocol and is relevant only for the sync interface. Note that the status of the
interface is not printed, since it is not monitored. (This also applies to the Nokia
machine.)
CHAPTER 6
Monitoring and Troubleshooting Gateway Clusters
In This Chapter
How to Verify the Cluster is Working Properly (cphaprob)
Monitoring Cluster Status (cphaprob state)
Do this after setting up the cluster, and whenever you want to monitor the cluster
status. The following is an example of the output of cphaprob state:
cphaprob state
State             Meaning                                                 Forwarding  Is this state
                                                                          packets?    a problem?
Active            Everything is OK.                                       Yes         No
Active attention  A problem has been detected, but the cluster member     Yes         Yes
                  is still forwarding packets because it is the only
                  machine in the cluster or there are no other active
                  machines in the cluster. In any other situation the
                  state of the machine would be down.
Down              One of the critical devices is down.                    No          Yes
Ready             Can occur in the following scenarios:                   No          No
                  1 When a cluster is upgraded from one version of
                    VPN-1 Pro to another, and the cluster members
                    have different versions of VPN-1 Pro, the members
                    with a new version have the ready state and the
                    members with the previous version have the active
                    state.
                  2 Before a cluster member becomes active, it sends a
                    message to the rest of the cluster, and then
                    expects to receive confirmations from the other
                    cluster members agreeing that it will become
                    active. In the period of time before it receives
                    the confirmations, the machine is in the ready
                    state.
Standby           Applies only to a High Availability configuration,      No          No
                  and means the member is waiting for an active
                  machine to fail in order to start packet forwarding.
Initializing      An initial and transient state of the cluster member.   No          No
                  The cluster member is booting up, and the ClusterXL
                  product is already running, but VPN-1 Pro is not yet
                  ready.
Local machine cannot hear anything coming from this Don't know Yes
cluster member.
To see the state of the cluster member interfaces and the virtual cluster interfaces, run
Monitoring Cluster Interfaces (cphaprob [-a] if)
The output of this command must be identical to the configuration in the cluster object
Topology page. For example:
cphaprob -a if
Required interfaces: 4
Required secured interfaces: 1
The interfaces are ClusterXL critical devices. ClusterXL checks the number of good
interfaces and sets a value of Required interfaces to the maximum number of good
interfaces seen since the last reboot. If the number of good interfaces is less than the
Required number, ClusterXL initiates failover. The same for secured interfaces, where only
the good synchronization interfaces are counted.
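The Required interfaces rule described above can be modeled as follows. This Python sketch is a simplified illustration under stated assumptions, not the ClusterXL implementation: the class and method names are hypothetical, and a new object stands for a freshly rebooted member.

```python
class RequiredInterfaces:
    """Track the Required value as the highest count of good interfaces seen."""

    def __init__(self):
        # No interfaces have been seen yet (the member has just rebooted).
        self.required = 0

    def update(self, good_interfaces):
        """Record the current count of good interfaces.

        Returns True when the count is below the Required value,
        which is the condition under which a failover is initiated.
        """
        self.required = max(self.required, good_interfaces)
        return good_interfaces < self.required
```

If a member first sees 4 good interfaces and later only 3, the second update returns True. The same logic applies to Required secured interfaces, counting only the good synchronization interfaces.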
An interface can be:
Non-secured or Secured. A secured interface is a synchronization interface.
Shared or unique. A shared interface applies only to High Availability Legacy mode.
Multicast or broadcast. The Cluster Control Protocol (CCP) mode used in the cluster.
CCP can be changed to use broadcast instead. To toggle between these two modes
use the command cphaconf set_ccp <broadcast|multicast>
For third-party clustering products, except in the case of Nokia IP Clustering,
cphaprob -a if should always show virtual cluster IP addresses.
When an interface is DOWN, it means that the interface can neither receive nor transmit
CCP packets. This may happen when an interface is malfunctioning, is connected to an
incorrect subnet, is unable to pick up Multicast Ethernet packets, and so on. The
interface may also be able to receive but not transmit CCP packets, in which case the
status field is read. The displayed time is the number of seconds that have elapsed since
the interface was last able to receive/transmit a CCP packet.
See Defining Disconnected Interfaces on page 125 for additional information.
There are a number of built-in critical devices, and the administrator can define
additional critical devices. The default critical devices are:
The cluster interfaces on the cluster members.
Synchronization: full synchronization completed successfully.
Filter: the Security Policy, and whether it is loaded.
cphad: follows the ClusterXL process called cphamcset.
fwd: the VPN-1 Pro daemon.
For Nokia IP Clustering, the output is the same as for ClusterXL Load Sharing. For
other third-party products, this command produces no output. The following example
output shows that the fwd process is down:
cphaprob list
Built-in Devices:
Registered Devices:
Registering a Critical Device (cphaprob -d ... register)
It is possible to add a user defined critical device to the default list of critical devices.
Use this command to register <device> as a critical process, and add it to the list of
devices that must be running for the cluster member to be considered active. If
<device> fails, then the cluster member is considered to have failed.
If <device> fails to contact the cluster member in <timeout> seconds, <device> will
be considered to have failed. For no timeout, use the value 0.
Define the status of the <device> that will be reported to ClusterXL upon registration.
This initial status can be one of:
ok: <device> is alive.
init: <device> is initializing. The machine is down. This state prevents the
machine from becoming active.
problem: <device> has failed.
[-p] makes these changes permanent. After performing a reboot or after removing the
VPN-1 Pro kernel module (on Linux or IPSO for example) and re-attaching it, the
status of critical devices that were registered with this flag will be saved.
Register all the user defined critical devices listed in <file>. <file> must be an ASCII
file, with each device on a separate line. Each line must list three parameters, which
must be separated by at least a space or a tab, as follows:
<device> <timeout> <status>
<device>: The name of the critical device. It must have no more than 15
characters, and must not include white spaces.
<timeout>: If <device> fails to contact the cluster member in <timeout>
seconds, <device> will be considered to have failed. For no timeout, use the value
0.
<status>: can be one of
ok: <device> is alive.
init: <device> is initializing. The machine is down. This state prevents the
machine from becoming active.
problem: <device> has failed.
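Before registering devices from a file, it can be useful to check that the file follows the format described above. The following Python sketch validates such a file. The function name and the sample device names are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behavior.

```python
VALID_STATUSES = {"ok", "init", "problem"}


def parse_critical_devices(text):
    """Parse lines of the form '<device> <timeout> <status>'.

    Returns a list of (device, timeout, status) tuples and raises
    ValueError on the first line that does not follow the format.
    """
    devices = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split()  # fields are separated by spaces or tabs
        if len(fields) != 3:
            raise ValueError("line %d: expected three parameters" % number)
        device, timeout, status = fields
        if len(device) > 15:
            raise ValueError("line %d: device name exceeds 15 characters" % number)
        if not timeout.isdigit():
            raise ValueError("line %d: timeout must be a whole number" % number)
        if status not in VALID_STATUSES:
            raise ValueError("line %d: status must be ok, init or problem" % number)
        devices.append((device, int(timeout), status))
    return devices
```

A file containing the two lines "mydaemon 30 ok" and "backup_agent 0 init" would be accepted, with the second device registered without a timeout.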
Unregister a user defined <device> as a critical process. This means that this device is
no longer considered critical. If a critical device (and hence a cluster member) was
registered as problem before running this command, then after running this
command the status of the cluster will depend only on the remaining critical devices.
[-p] makes these changes permanent. This means that after performing a reboot or after
removing the kernel (on Linux or IPSO for example) and re-attaching it, these critical
devices remain unregistered.
Use this command to report the status of a user defined critical device to ClusterXL.
<device> is the device that must be running for the cluster member to be considered
active. If <device> fails, then the cluster member is considered to have failed.
The status to be reported. The status can be one of:
ok: <device> is alive.
init: <device> is initializing. The machine is down. This state prevents the machine
from becoming active.
problem: <device> has failed. If this status is reported to ClusterXL, the cluster
member will immediately failover to another cluster member.
If <device> fails to contact the cluster member within the timeout that was defined
when it was registered, <device>, and hence the cluster member, will be considered
to have failed. This is true only for critical devices with timeouts. If a critical device is
registered with the -t 0 parameter, there will be no timeout, and until the device
reports otherwise, the status is considered to be the last reported status.
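The timeout behavior described above, and the command line used to report a status, can be sketched as follows. The function names are hypothetical. The command arguments follow the cphaprob usage printout shown earlier in this guide, and the sketch only builds the argument list, it does not run the command.

```python
def effective_status(last_status, seconds_since_report, timeout):
    """Return the status ClusterXL would assume for a critical device.

    A timeout of 0 means no timeout: the last reported status stays in effect.
    """
    if timeout != 0 and seconds_since_report > timeout:
        return "problem"
    return last_status


def report_command(device, status):
    """Build the argument list for: cphaprob -d <device> -s <status> report"""
    if status not in ("ok", "init", "problem"):
        raise ValueError("status must be ok, init or problem")
    return ["cphaprob", "-d", device, "-s", status, "report"]
```

A monitoring script for a user defined device would typically call the reporting command at intervals shorter than the registered timeout, so that the device is not considered to have failed while it is still healthy.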
SmartView Monitor
SmartView Monitor displays a snapshot of all ClusterXL cluster members in the
enterprise, enabling real-time monitoring and alerting. For each cluster member, state
change and critical device problem notifications are displayed. SmartView Monitor
allows you to specify the action to be taken if the status of a cluster member changes.
For example, VPN-1 Pro can issue an alert notifying you of suspicious activity.
Note - SmartView Monitor does not initiate full synchronization, so that some connections
may be lost. To initiate full synchronization, perform cpstart, or start the cluster member
using the cphaprob command.
SmartView Tracker
Every change in status of a cluster member is recorded in SmartView Tracker according
to the choice in the Fail-Over Tracking option of the cluster object ClusterXL page.
1 Square brackets are used to indicate place holders, which are substituted by relevant
data when an actual log message is issued (for example, [NUMBER] will be
replaced by a numeric value).
2 Angle brackets are used to indicate alternatives, one of which will be used in actual
log messages. The different alternatives are separated with a vertical line (for
example, <up|down> indicates that either up or down will be used).
3 The following place holders are frequently used:
ID: A unique cluster member identifier, starting from 1. This corresponds to
the order in which members are sorted in the cluster object's GUI.
IP: Any unique IP address that belongs to the member.
MODE: The cluster mode (for example, New HA, LS Multicast, and so on).
STATE: The state of the member (for example, active, down, standby).
DEVICE: The name of a pnote device (for example, fwd, Interface Active
Check).
General logs
Starting <ClusterXL|State Synchronization>.
Indicates that ClusterXL (or State Synchronization, for 3rd party clusters) was
successfully started on the reporting member. This message is usually issued after a
member boots, or after an explicit call to cphastart.
Stopping <ClusterXL|State Synchronization>.
Informs that ClusterXL (or State Synchronization) was deactivated on this machine.
The machine will no longer be a part of the cluster (even if configured to be so), until
ClusterXL is restarted. This message is usually issued when a machine is shut down, or
after an explicit call to cphastop.
Unconfigured cluster Machines changed their MAC Addresses. Please reboot
the cluster so that the changes take affect.
State logs
Mode inconsistency detected: member [ID] ([IP]) will change its mode to
[MODE]. Please re-install the security policy on the cluster.
This message should rarely happen. It indicates that another cluster member has
reported a different cluster mode than is known to the local member. This is usually the
result of a failure to install the security policy on all cluster members. To correct this
problem, install the Security Policy again.
Note - The cluster will continue to operate after a mode inconsistency has been detected, by
altering the mode of the reporting machine to match the other cluster members. However, it
is highly recommended that the policy be re-installed as soon as possible.
State change of member [ID] ([IP]) from [STATE] to [STATE] was cancelled,
since all other members are down. Member remains [STATE].
When a member needs to change its state (for example, when an active member
encounters a problem and needs to bring itself down), it first queries the other
members for their state. If all other members are down, this member cannot change its
state to a non-active one (or else all members will be down, and the cluster will not
function). Thus, the reporting member continues to function, despite its problem (and
will usually report its state as active attention).
member [ID] ([IP]) <is active|is down|is stand-by|is initializing>
([REASON]).
This message is issued whenever a cluster member changes its state. The log text
specifies the new state of the member.
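As a rough illustration, the state log text above can be matched with a regular expression. This is only a sketch: the pattern is inferred from the message template shown here, and the function name parse_state_log, the sample line and its reason text are made up for the example. Real SmartView Tracker entries may carry additional fields.

```python
import re

# Pattern inferred from the template "member [ID] ([IP]) <is ...> ([REASON])."
# It is an assumption of this sketch, not an official log grammar.
STATE_LOG = re.compile(
    r"member (?P<id>\d+) \((?P<ip>[\d.]+)\) "
    r"is (?P<state>active|down|stand-by|initializing)"
    r"(?: \((?P<reason>.*)\))?\.?$"
)

def parse_state_log(line):
    """Return a dict with id, ip, state and reason, or None if no match."""
    match = STATE_LOG.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["id"] = int(fields["id"])
    return fields

# Hypothetical log line built from the template above.
sample = "member 2 (10.0.10.2) is down (Interface eth1 is down)."
print(parse_state_log(sample))
```

The reason group is optional so that a line without a trailing reason still parses, with reason set to None.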
Pnote logs
PNote log messages are issued when a pnote device changes its state.
[DEVICE] on member [ID] ([IP]) status OK ([REASON]).
The pnote device is working normally.
[DEVICE] on member [ID] ([IP]) detected a problem ([REASON]).
Either an error was detected by the pnote device, or the device has not reported its
state for a number of seconds (as set by the timeout option of the pnote).
[DEVICE] on member [ID] ([IP]) is initializing ([REASON]).
Indicates that the device has registered itself with the pnote mechanism, but has not
yet determined its state.
[DEVICE] on member [ID] ([IP]) is in an unknown state ([STATE ID])
([REASON]).
This message should not normally appear. Contact Check Point Support.
Interface logs
interface [INTERFACE NAME] of member [ID] ([IP]) is up.
Indicates that this interface is working normally, meaning that it is able to receive
and transmit packets on the expected subnet.
interface [INTERFACE NAME] of member [ID] ([IP]) is down (receive
<up|down>, transmit <up|down>).
SecureXL logs
SecureXL device was deactivated since it does not support CPLS.
This message is the result of an attempt to configure a ClusterXL in Load Sharing
Multicast mode over VPN-1 Pro modules using an acceleration device that does
not support Load Sharing. As a result, acceleration will be turned off, but the
cluster will work in Check Point Load Sharing mode (CPLS).
Reason Strings
member [ID] ([IP]) reports more interfaces up.
This text can be included in a pnote log message describing the reasons for a
problem report: Another member has more interfaces reported to be working, than
the local member does. This means that the local member has a faulty interface, and
that its counterpart can do a better job as a cluster member. The local member will
therefore go down, leaving the member specified in the message to handle traffic.
member [ID] ([IP]) has more interfaces - check your disconnected
interfaces configuration in the <discntd.if file|registry>.
This message is issued when members in the same cluster have a different number
of interfaces. A member having fewer interfaces than the maximum number in the
cluster (the reporting member) may not be working properly, as it is missing an
interface required to operate against a cluster IP address, or a synchronization
network. If some of the interfaces on the other cluster member are redundant, and
should not be monitored by ClusterXL, they should be explicitly designated as
Disconnected. This is done using the file $FWDIR/conf/discntd.if (under Unix
systems), or the Windows Registry.
[NUMBER] interfaces required, only [NUMBER] up.
ClusterXL has detected a problem with one or more of the monitored interfaces.
This does not necessarily mean that the member will go down, as the other
members may have fewer operational interfaces. In such a condition, the member
with the highest number of operational interfaces will remain up, while the others
will go down.
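The comparison described above can be sketched as a small function. This is a simplified model, not the actual ClusterXL decision logic: the function name and the sample counts are hypothetical, and the handling of ties (all members sharing the highest count stay up) is an assumption, since the text does not describe that case.

```python
def members_to_keep_up(up_interfaces):
    """Given {member_id: number of operational interfaces}, return the set
    of members that stay up under the rule described above.

    Tie handling is an assumption of this sketch: members that share the
    highest count all stay up.
    """
    if not up_interfaces:
        return set()
    best = max(up_interfaces.values())
    return {member for member, count in up_interfaces.items() if count == best}

# Hypothetical cluster: member 1 has lost one of its three monitored interfaces.
counts = {1: 2, 2: 3}
print(members_to_keep_up(counts))
```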
The state of a cluster member can be manually controlled in order to take down the
cluster member. This initiates failover to the other cluster member(s), in the case of
Load Sharing, or failover to the next highest priority cluster member in the case of
High Availability.
Note - Starting the Cluster member from SmartView Monitor does not initiate full
synchronization, so that some connections may be lost. To initiate full synchronization,
perform cpstart.
Starting the Cluster Member
The output of this command is a long list of statistics for the VPN-1 Pro Gateway. At
the end of the list there is a section called Synchronization that applies per Gateway
Cluster member. Many of the statistics are counters that can only increase. A typical
output is as follows:
Version: new
Status: Able to Send/Receive sync packets
Sync packets sent:
total : 3976, retransmitted : 0, retrans reqs : 58, acks : 97
Sync packets received:
total : 4290, were queued : 58, dropped by net : 47
retrans reqs : 0, received 0 acks
retrans reqs for illegal seq : 0
Callback statistics: handled 3 cb, average delay : 1, max delay : 2
Delta Sync memory usage: currently using XX KB mem
Callback statistics: handled 322 cb, average delay : 2, max delay : 8
Number of Pending packets currently held: 1
Packets released due to timeout: 18
The Version: new line must appear if synchronization is configured. It indicates that new sync is
working (as opposed to old sync from version 4.1).
If sync is unable to either send or receive packets, there is a problem. Sync may be
temporarily unable to send or receive packets during boot, but this should not happen
during normal operation. When performing full sync, sync packet reception may be
interrupted.
The total number of sync packets sent is shown. Note that the total number of sync
packets is non-zero and increasing.
The cluster member sends a retransmission request when a sync packet is received out of
order. This number may increase when under load.
Acks are the acknowledgements sent for received sync packets, when an
acknowledgement was requested by another cluster member.
The total number of sync packets received is shown. The queued packets figure
increases when a sync packet is received that complies with one of the following
conditions:
1 The sync packet is received with a sequence number that does not follow the
previously processed sync packet.
2 The sync packet is fragmented. This is done to solve MTU restrictions.
This figure never decreases. A non-zero value does not indicate a problem.
The dropped by net number may indicate network congestion. This number may
increase slowly under load. If this number increases too fast, a networking error may
interfere with the sync protocol. In that case, check the network.
This message refers to the number of received retransmission requests, in contrast to the
transmitted retransmission requests in the section above. When this number grows very
fast, it may indicate that the load on the machine is becoming too high for sync to
handle.
Acks refer to the number of acknowledgements received for the cb request sync
packets, which are sync packets with requests for acknowledgments.
Retrans reqs for illegal seq displays the number of retransmission requests for
packets which are no longer in this member's possession. This may indicate a sync
problem.
Callback statistics relate to received packets that involve Flush and Ack. This
statistic only appears for a non-zero value.
The callback average delay is how much the packet was delayed in this member until
it was released when the member received an ACK from all the other members. The
delay happens because packets are held until all other cluster members have
acknowledged reception of that sync packet.
This figure is measured in terms of numbers of packets. Normally this number should
be small (~1-5). Larger numbers may indicate an overload of sync traffic, which causes
connections that require sync acknowledgements to suffer slight latency.
In a heavily loaded system, the cluster member may drop synchronization updates sent
from another cluster member.
Delta Sync memory usage only appears for a non-zero value. Delta sync requires
memory only while full sync is occurring. Full sync happens when the system
goes up (after a reboot, for example). At other times, Delta sync requires no memory
because Delta sync updates are applied immediately. For information about Delta sync,
see How State Synchronization Works on page 19.
Number of Pending packets currently held only appears for a non-zero value.
ClusterXL prevents out-of-state packets in non-sticky connections. It does this by
holding packets until a SYN-ACK is received from all other active cluster members. If
for some reason a SYN-ACK is not received, VPN-1 Pro on the cluster member will
not release the packet, and the connection will not be established.
Packets released due to timeout only appears for a non-zero value. If the Number
of Pending Packets is large (more than 100 pending packets), and the number of
Packets released due to timeout is small, you should take action to reduce the
number of pending packets. To tackle this problem, see Reducing the Number of
Pending Packets on page 123.
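When the counters are tracked over time, it can help to pull them out of the text. The following is a minimal sketch that splits one counter line from the sample output into named integers. The function name parse_counters is hypothetical, and the layout is taken only from the sample shown in this guide, so real output may need different handling.

```python
def parse_counters(line):
    """Split a 'name : value, name : value' line into a dict of integers.
    Fragments without a 'name : number' shape are skipped."""
    counters = {}
    for part in line.split(","):
        name, sep, value = part.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters

# The "Sync packets received" line from the sample output above.
received = parse_counters("total : 4290, were queued : 58, dropped by net : 47")
dropped_share = received["dropped by net"] / received["total"]
print(received)
print(f"dropped by net: {dropped_share:.2%} of received sync packets")
```

Because the counters only increase, comparing two readings taken some time apart says more about the rate of change than a single reading does.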
Output of the cphaprob [-reset] syncstat command
If its value is unreasonably high (more than 30% of the Total Generated Updates of other
members), contact Technical Support equipped with the entire output and a detailed
description of the network topology and configuration.
Tip - If this value is unreasonably high, contact Technical Support, equipped with the entire
output and a detailed description of the network topology and configuration.
Tip - See Enlarging the Receiving Queue on page 101. If this value is unreasonably high
(more than 10% of the total updates sent), contact Technical Support, equipped with the
entire output and a detailed description of the network topology and configuration.
Tip - To decrease the number of lost updates, expand the capacity of the Receiving Queue.
See Enlarging the Receiving Queue on page 101.
Tip - Allow the sync mechanism to handle large differences in sequence numbers by
expanding the Receiving Queue capacity. See Enlarging the Receiving Queue on page 101.
Tip - Try enlarging the Sync Timer (see Enlarging the Sync Timer on page 102). However,
you may well have to contact Technical Support equipped with the entire output and a
detailed description of the network topology and configuration.
Local Updates:
The statistics in this section relate to updates generated by the local cluster member.
Updates inform about changes in the connections handled by the cluster member, and
are sent from and to members. Updates are identified by sequence numbers.
Tip - If this value is unreasonably high (more than 30% of the Total generated updates on
page 96) contact Technical Support, equipped with the entire output and a detailed
description of the network topology and configuration.
Blocking Scenarios
Under extremely heavy load conditions, the cluster blocks new connections. This
parameter shows the number of times that the cluster member started blocking new
connections due to sync overload.
The member starts to block connections when its Sending Queue has reached its
capacity threshold. The capacity threshold is calculated as 80% of the difference
between the current sequence number and the sequence number for which the member
received an ACK from all the other operating members.
A positive value indicates heavy load. In this case, observe the Blocked packets on
page 97 to see how many packets were blocked. Each dropped packet means one blocked
connection.
This parameter is only measured if the Block New Connections mechanism (described in
Blocking New Connections Under Load on page 121) is active. To activate the Block
New Connections mechanism, apply the following command on all the cluster
members:
fw ctl set int fw_sync_block_new_conns 0
Tip - The best way to handle a severe blocking connections problem is to enlarge the
sending queue. See Enlarging the Sending Queue on page 101.
Another possibility is to decrease the timeout after which a member initiates an ACK. See
Reconfiguring the Acknowledgment Timeout on page 103. This updates the sending queue
capacity more accurately, thus making the blocking process more precise.
Blocked packets
The number of packets that were blocked because the cluster member was blocking all
new connections (see Blocking Scenarios on page 96). The number of blocked
packets is usually one packet per new connection attempt.
A value higher than 5% of the Sending Queue (see Avg length of sending queue on
page 98) can imply a connectivity problem, or that ACKs are not being sent frequently
enough.
This parameter is only measured if the Block New Connections mechanism (described in
Blocking New Connections Under Load on page 121) is active. To activate the Block
New Connections mechanism, apply the following command on all the cluster
members:
fw ctl set int fw_sync_block_new_conns 0
Tip - The best way to handle a severe blocking connections problem is to enlarge the
sending queue. See Enlarging the Sending Queue on page 101.
Another possibility is to decrease the timeout after which a member initiates an ACK. See
Reconfiguring the Acknowledgment Timeout on page 103. This updates the sending queue
capacity more accurately, thus making the blocking process more precise.
Tip - Enlarge the Sending Queue to a value larger than this value. See Enlarging the Sending
Queue on page 101.
Tip - Enlarge the Sending Queue so that this value is not larger than 80% of the new queue
size. See Enlarging the Sending Queue on page 101.
Tip - Contact Technical Support equipped with the entire output and a detailed description of
the network topology and configuration.
It should not be higher than 50 (5 seconds), because of the pending timeout mechanism
which releases held packets after a certain timeout. By default, the release timeout is 50
ticks. A high value indicates a connectivity problem between the members.
Tip - Optionally change the default timeout by changing the value of the
fwldbcast_pending_timeout global variable. See How to Configure Module Configuration
Parameters on page 119 and Reducing the Number of Pending Packets on page 123.
Also, examine the parameter Timed out sync connection on page 95 to understand why
packets were held for a long time.
You may also need to contact Technical Support equipped with the entire output and a
detailed description of the network topology and configuration.
Tip - If the value is high, contact Technical Support equipped with the entire output and a
detailed description of the network topology and configuration in order to examine the cause
of the problem.
Timers:
The Sync and CPHA timers perform sync and cluster related actions every fixed
interval.
Queues:
Each cluster member has two queues: the Sending Queue and the Receiving Queue.
Synchronization Troubleshooting Options
To enlarge the receiving queue size, change the value of the global parameter
fw_sync_recv_queue_size. See How to Configure Module Configuration
Parameters on page 119. You must also make sure that the required queue size survives
boot. See How to Configure Module Configuration Parameters to Survive a Boot on
page 120.
Enlarging this queue means that the member can save more updates from other
members. However, be aware that each saved update consumes memory. When
changing this variable you should carefully consider the memory implications. Changes
will only take effect after reboot.
This section lists the ClusterXL error messages. For other, less common error messages,
see SecureKnowledge solution sk23642 at http://support.checkpoint.com/kb/.
SmartView Tracker Active Mode Messages
This can cause some insignificant problems, such as a connection that is being
deleted twice, a link to an existing link, and so forth. It should not affect
connectivity or cause security issues.
Error SEP_IKE_owner_outbound: other cluster member packet in outbound
Cluster is not synchronized. Usually happens in OPSEC certified third-party load
sharing products for which Support non-sticky connections is unchecked in the
cluster object 3rd Party Configuration page. (Or equivalently, in NG FP3 clusters,
where the property use_limited_flushnack is set to false).
FW-1: fwha_pnote_register: too many registering members, cannot
register
The critical device (also known as Problem Notification, or pnote) mechanism can
only store up to 16 different devices. An attempt to configure the 17th device
(either by editing the cphaprob.conf file or by using the cphaprob -d ...
register command) will result in this message.
FW-1: fwha_pnote_register: <NAME> already registered (# <NUMBER>)
Each device registered with the pnote mechanism must have a unique name. This
message may appear when registering a new pnote device, and means that the device
<NAME> is already registered with pnote number <NUMBER>.
FW-1: fwha_pnote_unregister: attempting to unregister an unregistered
device <DEVICE NAME>
Indicates an attempt to unregister a device which is not currently registered.
FW-1: alert_policy_id_mismatch: failed to send a log
A log indicating that two or more members have different policy IDs was not
sent. Verify that all cluster members have the same policy (using fw
stat). It is recommended to re-install the policy.
FW-1: fwha_receive_fwhap_msg: received incomplete HAP packet (read
<number> bytes)
This message can be received when ClusterXL hears CCP packets of clusters of
version 4.1. In that case it can be safely ignored.
TCP Out-of-State Error Messages
TCP packet out of state - first packet isn't SYN tcp_flags: FIN-ACK
TCP packet out of state - first packet isn't SYN tcp_flags:
FIN-PUSH-ACK
These messages occur when a FIN packet is retransmitted after deleting the
connection from the connection table. To solve the problem, in SmartDashboard
Global properties for Stateful Inspection, enlarge the TCP end timeout from 20
seconds to 60 seconds. If necessary, also enlarge the connection table so it won't fill
completely.
SYN packet for established connection
This message occurs when a SYN is received on an established connection, and the
sequence verifier is turned off. The sequence verifier is turned off for a non-sticky
connection in a cluster (or in SecureXL). Some applications close connections with
a RST packet (in order to reuse ports). To solve the problem, enable this behavior
for specific ports or for all ports. For example, run the command:
fw ctl set int fw_trust_rst_on_port <port>
Setting the value to -1 means that VPN-1 Pro should trust a RST coming from every
port, in case a single port is not enough.
Platform Specific Error Messages
New connections/second
Note - These figures were derived for cluster members using the Windows platform, with
Pentium 4 processors running at 2.4 GHz.
For example, if the cluster holds 10,000 connections, and the connection rate is 1000
connections/sec you will need 69 MB for full sync.
Define the maximum amount of memory using the module global parameter:
fw_sync_max_saved_buf_mem.
The units are in megabytes. For details, see Advanced Cluster Configuration using
Module Configuration Parameters on page 119.
CHAPTER 7
ClusterXL Advanced Configuration
In This Chapter
Upgrading ClusterXL Clusters
How to Define a Cluster Object for a VPN Peer with a Separate Manager
2 In the Topology page, add the external and internal cluster interface addresses of the
VPN peer. Do not use the cluster member interface addresses, except in the
following cases:
If the external cluster is of version 4.1, add the IP addresses of the cluster
member interfaces.
If the cluster is an OPSEC certified product (excluding Nokia), you may need
to add the IP addresses of the cluster members.
When adding cluster member interface IP addresses, in the interface Topology tab,
define the interface as Internal, and the IP Addresses behind this interface as Not
defined.
3 In the VPN Domain section of the page, define the encryption domain of the
externally managed gateway to be behind the internal virtual IP address of the
gateway. If the encryption domain is just one subnet, choose All IP addresses
behind cluster members based on topology information. If the encryption domain
includes more than one subnet, it must be Manually Defined.
When a cluster member establishes an outgoing connection towards the Internet, the
source address in the outgoing packets is the physical IP address of the cluster member
interface. The source IP address is changed using NAT to that of the external virtual IP
address of the cluster. This address translation is called Cluster Hide.
For OPSEC certified clustering products, this corresponds to the default setting in the
3rd Party Configuration page of the cluster object, of Hide Cluster Members' outgoing
traffic behind the Cluster's IP address being checked.
VLAN Support in ClusterXL
Note - For more details about VLAN support, see the Check Point Enterprise Suite NGX (R60)
Release Notes, available online at: http://www.checkpoint.com/techsupport/downloads.jsp.
Note - ClusterXL does not support VLANs on Windows 2000 or Windows 2003 Server.
When a machine that is outside the cluster wishes to communicate with the cluster, it
sends an ARP query with the cluster (virtual) IP address. The cluster replies to the
ARP request with a multicast MAC address, even though the IP address is a unicast
address.
This destination multicast MAC address of the cluster is based on the unicast IP address
of the cluster. The upper three bytes are 01.00.5E, and they identify a Multicast MAC
in the standard way. The lower three bytes are the same as the lower three bytes of the
IP address. An example MAC address based on the IP address 10.0.10.11 is shown in
FIGURE 7-1.
FIGURE 7-1 The Multicast MAC address of the cluster
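As a minimal sketch of the mapping just described, the following function builds the multicast MAC address from a cluster IP address. The function name is hypothetical. The sketch copies the lower three bytes unchanged, exactly as the text states; any additional masking the product may apply is not covered here.

```python
import ipaddress

def cluster_multicast_mac(cluster_ip):
    """Build the multicast MAC from a unicast cluster IP as described above:
    the prefix 01:00:5e followed by the lower three bytes of the address."""
    octets = ipaddress.IPv4Address(cluster_ip).packed
    return "01:00:5e:" + ":".join(f"{byte:02x}" for byte in octets[1:])

# The example address used in the text.
print(cluster_multicast_mac("10.0.10.11"))  # 01:00:5e:00:0a:0b
```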
Connecting Several Clusters on the Same VLAN
When more than one cluster is connected to the same VLAN, the last three bytes of the
IP addresses of the cluster interfaces connected to the VLAN must be different. If they
are the same, then communication from outside the cluster that is intended for one of
the clusters will reach both clusters, which will cause communication problems.
For example, it is OK for the cluster interface of one of the clusters connected to the
VLAN to have the address 10.0.10.11, and the cluster interface of a second cluster to
have the address 10.0.10.12. However, the following addresses for the interfaces of the
first and second clusters will cause complications: 10.0.10.11 and 20.0.10.11.
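A planned address list can be checked for this conflict before deployment. The sketch below groups cluster interface addresses by their last three bytes and reports any group with more than one address. The function name is hypothetical, and the check covers only the rule stated in this section.

```python
import ipaddress
from collections import defaultdict

def find_mac_conflicts(cluster_ips):
    """Group cluster interface IPs that share their last three bytes and
    would therefore map to the same multicast MAC address."""
    groups = defaultdict(list)
    for ip in cluster_ips:
        groups[ipaddress.IPv4Address(ip).packed[1:]].append(ip)
    return [ips for ips in groups.values() if len(ips) > 1]

print(find_mac_conflicts(["10.0.10.11", "10.0.10.12"]))  # the acceptable pair
print(find_mac_conflicts(["10.0.10.11", "20.0.10.11"]))  # the conflicting pair
```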
The best solution is to change the last three bytes of the IP address of all but one of
the cluster interfaces that share the same last three bytes of their IP address.
If the IP address of the cluster interface cannot be changed, you must change the
automatically assigned multicast MAC address of all but one of the clusters and replace
it with a user-defined multicast MAC address. Proceed as follows:
1 In the ClusterXL page of the cluster object, select Load Sharing>Multicast Mode. In
the Topology tab, edit the cluster interface that is connected to the same VLAN as
the other cluster.
2 In the Interface Properties window, General tab, click Advanced.
3 Change the default MAC address, and carefully type the new user defined MAC
address. It must be of the form 01:00:5e:xy:yy:yy where x is between 0 and 7 and y
is between 0 and f(hex).
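Because the new address is typed by hand, it may be worth checking it against the required form first. The following sketch does this with a regular expression. The names are hypothetical, and accepting upper-case hex digits is an assumption of the sketch.

```python
import re

# 01:00:5e:xy:yy:yy with x in 0-7 and y in 0-f, as required in step 3.
# Accepting upper-case hex digits is an assumption of this sketch.
USER_MULTICAST_MAC = re.compile(
    r"01:00:5e:[0-7][0-9a-f](:[0-9a-f]{2}){2}", re.IGNORECASE
)

def is_valid_user_mac(mac):
    """Return True if mac has the form 01:00:5e:xy:yy:yy described above."""
    return USER_MULTICAST_MAC.fullmatch(mac) is not None

print(is_valid_user_mac("01:00:5e:00:0a:0c"))
print(is_valid_user_mac("01:00:5e:80:0a:0c"))
```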
Cluster members communicate with each other using the Cluster Control Protocol
(CCP). CCP packets are distinguished from ordinary network traffic by giving CCP
packets a unique source MAC address.
The first four bytes of the source MAC address are all zero: 00.00.00.00
The fifth byte of the source MAC address is a magic number. Its value indicates its
purpose.
TABLE 7-1
When more than one cluster is connected to the same VLAN, if CCP and forwarding
layer traffic uses multicast, this traffic reaches only the intended cluster.
However, if broadcast is used for CCP and forwarding layer traffic (and in certain other
cases), cluster traffic intended for one cluster is seen by all connected clusters, and is
processed by the wrong cluster, which causes communication problems.
To ensure that the source MAC address in packets from different clusters that are
connected to the same VLAN can be distinguished, change the MAC source address of
the cluster interface that is connected to the VLAN in all but one of the clusters.
Use the following module configuration parameters to set more than one cluster on the
same VLAN. These parameters apply to both ClusterXL and OPSEC certified
clustering products.
TABLE 7-2
Changing the values of these module configuration parameters alters the fifth part of
the source MAC address of Cluster Control Protocol and forwarded packets. Use any
value as long as the two module configuration parameters are different. To avoid
confusion, do not use the value 0x00.
When Performance Pack is used to enhance the performance of ClusterXL Load
Sharing Multicast Mode, it is recommended that the values of fwha_mac_magic and
fwha_mac_forward_magic be consecutive, with the lower one being even (for example
0x10 and 0x11, or 0xBE and 0xBF).
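The rules for choosing the two magic values can be collected into a small check. This is a sketch under stated assumptions: the function name and the performance_pack flag are hypothetical, and the one-byte range follows from the values being used as the fifth byte of the source MAC address.

```python
def check_magic_pair(mac_magic, forward_magic, performance_pack=False):
    """Return a list of problems with a proposed pair of values for
    fwha_mac_magic and fwha_mac_forward_magic (empty list means OK)."""
    problems = []
    for name, value in (("fwha_mac_magic", mac_magic),
                        ("fwha_mac_forward_magic", forward_magic)):
        if not 0x00 <= value <= 0xFF:
            problems.append(f"{name} does not fit in one byte")
        elif value == 0x00:
            problems.append(f"{name} uses 0x00, which the guide advises against")
    if mac_magic == forward_magic:
        problems.append("the two values must be different")
    if performance_pack:
        # Recommendation for Load Sharing Multicast with Performance Pack.
        low, high = sorted((mac_magic, forward_magic))
        if high - low != 1 or low % 2 != 0:
            problems.append("values should be consecutive with the lower one even")
    return problems

print(check_magic_pair(0x10, 0x11, performance_pack=True))
print(check_magic_pair(0x11, 0x12, performance_pack=True))
```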
How to Configure Module Configuration Parameters
For instructions on how to change these parameters, see How to Configure Module
Configuration Parameters on page 119.
In This Section
Linux/SecurePlatform
1 Edit the file $FWDIR/boot/modules/fwkern.conf.
2 Add the line Parameter=<value in hex>.
3 Reboot.
Solaris
1 Edit the file /etc/system.
2 Add the line set fw:Parameter=<value in hex>.
3 Reboot.
Windows
1 Edit the registry.
2 Add a DWORD value named Parameter under the key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\FW1\Parameters\Globals.
3 Reboot.
On Nokia
Run the command
modzap _Parameter $FWDIR/boot/modules/fwmod.o <value in hex>.
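When the same parameter must be set on members running different platforms, the per-platform steps above can be rendered from one helper. This is an illustrative sketch only: the function name and platform keys are hypothetical, the 0x prefix on the hex value is an assumption, and Windows is left out because the registry is edited rather than given a text line.

```python
def persistent_setting(platform, parameter, value):
    """Format the line (or command) that makes a module configuration
    parameter survive a reboot, following the per-platform steps above.
    The 0x prefix on the value is an assumption of this sketch."""
    hex_value = f"0x{value:x}"
    templates = {
        # Line for $FWDIR/boot/modules/fwkern.conf
        "linux": f"{parameter}={hex_value}",
        # Line for /etc/system
        "solaris": f"set fw:{parameter}={hex_value}",
        # Command to run on Nokia
        "nokia": f"modzap _{parameter} $FWDIR/boot/modules/fwmod.o {hex_value}",
    }
    return templates[platform]

for name in ("linux", "solaris", "nokia"):
    print(persistent_setting(name, "fwha_mac_magic", 0xFE))
```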
Controlling the Clustering and Synchronization Timers
Note that blocking new connections when sync is busy is only recommended for
Load Sharing ClusterXL deployments. While it is possible to block new
connections in High Availability mode, doing so does not solve inconsistencies in
sync, as High Availability mode precludes that from happening. This parameter can
be set to survive boot using the mechanism described in How to Configure
Module Configuration Parameters to Survive a Boot on page 120.
fw_sync_buffer_threshold is the maximum percentage of the buffer that may
be filled before new connections are blocked. By default it is set to 80, with a
buffer size of 512. By default, if more than 410 consecutive packets are sent without
getting an ACK on any one of them, new connections are dropped. When blocking
starts, fw_sync_block_new_conns is set to 1. When the situation stabilizes it is set
back to 0.
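The figure of 410 follows from the two defaults: 80% of a 512-entry buffer is 409.6, which rounds up to 410. The short sketch below reproduces the arithmetic. The function name is hypothetical, and rounding up is an assumption chosen to match the figure quoted above.

```python
import math

def blocking_threshold(buffer_size=512, threshold_percent=80):
    """Number of unacknowledged packets at which new connections start
    to be blocked. Rounding up is an assumption chosen to match the
    figure of 410 quoted above (80% of 512 is 409.6)."""
    return math.ceil(buffer_size * threshold_percent / 100)

print(blocking_threshold())  # 410
```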
fw_sync_allowed_protocols is used to determine the type of connections that
can be opened while the system is in a blocking state. Thus, the user can have
better control over the system's behavior in cases of unusual load. The
fw_sync_allowed_protocols variable is a combination of flags, each specifying a
different type of connection. The required value of the variable is the result of
adding the separate values of these flags. For example, the default value of this
variable is 24, which is the sum of TCP_DATA_CONN_ALLOWED (8) and
UDP_DATA_CONN_ALLOWED (16), meaning that the default allows only TCP and UDP
data connections to be opened under load.
ICMP_CONN_ALLOWED        1
TCP_CONN_ALLOWED         2   (except for data connections)
UDP_CONN_ALLOWED         4   (except for data connections)
TCP_DATA_CONN_ALLOWED    8   (the control connection should be established or allowed)
UDP_DATA_CONN_ALLOWED    16  (the control connection should be established or allowed)
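Since fw_sync_allowed_protocols is a sum of flag values, a value can be decoded into flag names or built from them with a few lines of code. The sketch below uses the flag values listed above. The helper names are hypothetical.

```python
# Flag values as listed in the table above.
FLAGS = {
    1: "ICMP_CONN_ALLOWED",
    2: "TCP_CONN_ALLOWED",
    4: "UDP_CONN_ALLOWED",
    8: "TCP_DATA_CONN_ALLOWED",
    16: "UDP_DATA_CONN_ALLOWED",
}

def decode_allowed_protocols(value):
    """Return the names of the flags that are set in value."""
    return [name for bit, name in FLAGS.items() if value & bit]

def encode_allowed_protocols(names):
    """Return the value obtained by adding the flags named in names."""
    lookup = {name: bit for bit, name in FLAGS.items()}
    return sum(lookup[name] for name in set(names))

print(decode_allowed_protocols(24))
print(encode_allowed_protocols(["ICMP_CONN_ALLOWED", "TCP_DATA_CONN_ALLOWED"]))
```

Decoding the default value of 24 yields the two data connection flags, which matches the description of the default behavior.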
Reducing the Number of Pending Packets
Active mode view is not recommended on a heavily loaded cluster. To obtain a more
accurate report of Active connections under load, two solutions are available. They
apply both to a cluster and to a single VPN-1 Pro Gateway:
1 Enlarge fwlddist_buf_size
The fwlddist_buf_size parameter controls the size of the synchronization buffer
in words. (Words are used for both synchronization and in SmartView Tracker
Active mode. 1 word equals 4kbytes). The default is 16k words. The maximum
value is 64k words and the minimum value is 2k words.
If changing this parameter, make sure that it survives boot, because the change is
only applied after a reboot. Use the mechanism described in How to Configure
Module Configuration Parameters to Survive a Boot on page 120.
2 Obtain a Hotfix from Technical Support
Obtain a Check Point Technical Support Hotfix. This Hotfix has a variable that
controls the rate at which Active connections are read by fwd on the enforcement
module before being sent to the Management Server. Note that this solution
requires additional CPU resources.
Defining a Disconnected Interface on Unix
3 Add the interface name. To obtain the interface system name run the command:
fw getifs
4 Add this name to the list of disconnected interfaces using the following format:
\device\<System Interface Name>
The default value is 1 which should be sufficient for most configurations. For
configurations where the situation described above occurs, setting this parameter to 2
should be sufficient. Do NOT set this parameter to a value larger than 3.
Introduction to Cluster Addresses on Different Subnets
Note - This capability is available only for ClusterXL Gateway Clusters. For details about
OPSEC certified clusters, see the vendor documentation.
An important aspect of this is that packets sent from cluster members (as opposed to
packets routed through the members) are hidden behind the cluster IP and MAC
addresses. The cluster MAC is the:
MAC of the active machine, in High Availability New mode.
Multicast MAC, in Load Sharing Multicast mode.
Pivot member MAC in Load Sharing Unicast mode.
This enables the members to communicate with the surrounding networks, but also has
certain limitations, as described in Limitations of Cluster Addresses on Different
Subnets on page 130.
Example of Cluster Addresses on Different Subnets
One setting its 192.168.2.x IP address as the gateway for network 172.16.4.0.
2 For each cluster interface, configure the Interface Properties window as follows:
TABLE 7-4 Example ClusterXL Topology > Interface Properties
Note - Do not define Cluster IP addresses for the synchronization interfaces. The
synchronization interfaces are also defined in the Edit Topology page of the Gateway
Cluster object.
Note - Static ARP is not required in order for the machines to work properly as a cluster,
since the cluster synchronization protocol does not rely on ARP.
Limitations of Cluster Addresses on Different Subnets
When different subnets are used for the cluster IPs, static ARP entries containing the
router's MAC need to be configured on each of the cluster members. This is done
because this kind of router will not respond to ARP requests containing a multicast
source MAC. These special procedures are not required when using routers that fully
support multicast MAC addresses.
Anti-Spoofing
When the different subnets feature is defined on a non-external interface, the cluster IP
in the Cluster Topology tab should not be defined with the Network defined by
interface IP and Net Mask definition in the Topology tab of the Interface Properties
window of the cluster interface. You must add a group of networks that contain both
the routable network and the non-routable network, and define the Anti-spoofing for
this interface as specific: network with this new group.
In the example shown in FIGURE 7-3 on page 129, if side B is the internal
network, you must define a group which contains both the 172.16.4.0 and 192.168.2.0
networks, and define the new group in the specific field of the Topology tab.
On the Modules
1 Run cpstop on all members (all network connectivity will be lost).
2 Reconfigure the IP addresses on all the cluster members, so that unique IP
addresses are used instead of shared (duplicate) IP addresses.
Note - SecurePlatform only: These address changes delete any existing static routes. Copy
them down for restoration in step 4.
4 SecurePlatform cluster members only: Redefine the static routes deleted in step 2.
5 Reboot the members.
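Because the address changes delete the static routes on SecurePlatform, it can help to save the routing table to a file before starting. The following is a minimal sketch, assuming that netstat -rn is available on the member. ROUTE_CMD, the function name, and the backup path are hypothetical.

```shell
#!/bin/sh
# Sketch only: save the current routing table before changing interface addresses.
# Assumption: "netstat -rn" is available on the member; ROUTE_CMD and the
# backup path are hypothetical and can be overridden.
ROUTE_CMD=${ROUTE_CMD:-"netstat -rn"}

save_routes() {
    backup_file=$1
    # Word splitting of $ROUTE_CMD is intentional: it holds a command plus arguments.
    $ROUTE_CMD > "$backup_file" || return 1
    echo "routes saved to $backup_file"
}

# Example:
# save_routes /var/tmp/static_routes.before
```

The saved file can then be used as the reference when redefining the static routes.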
From SmartDashboard
In SmartDashboard, open the cluster object, select the ClusterXL tab, and change the cluster
mode from Legacy mode to New mode or to Load Sharing mode. Then follow the
Check Point Gateway Cluster Wizard. For manual configuration, proceed as follows:
1 In the Topology tab of the cluster object:
For each cluster member, get the interfaces which have changed since the IP
addresses were changed. The interfaces which were previously shared interfaces
should now be defined as Cluster interfaces.
Define the cluster IP addresses of the cluster. The cluster interface names may
be defined as you wish, as they will be bound to physical interfaces according to
the IP addresses.
If the new IP addresses of the cluster members on a specific interface reside on
a different subnet than the cluster IP address in this direction, define the cluster
members' network in the Members Network fields of the cluster
interface (see Configuring Cluster Addresses on Different Subnets on page 127).
2 Install the policy on the new cluster object (Security policy, QoS policy and so
on).
1. Make sure that you have all the IP addresses needed before you start implementing the
changes described here.
2. Back up your configuration before starting this procedure, because this procedure deletes
and recreates the objects in SmartDashboard.
In this procedure we use the example of machines 'A' and 'B', with the starting point
being that machine 'A' is active, and machine 'B' is on standby.
1 Disconnect machine 'B' from all interfaces except the interface connecting it to the
management (the management interface).
2 Run cphastop on machine 'B'.
3 Change the IP addresses of machine 'B' (as required by the new configuration).
Note - SecurePlatform only: These address changes delete any existing static routes. Copy
them down for restoration in step 5.
7 In the Topology tab of the Cluster Member Properties window, define the topology
of cluster member 'B' by clicking Get.... Make sure to mark the appropriate
interfaces as Cluster Interfaces.
8 In the Cluster Object, define the new topology of the cluster (define the cluster
interfaces in the cluster's Topology tab).
9 In the ClusterXL page, change the cluster's High Availability mode from Legacy
Mode to New Mode or select Load Sharing mode.
10 Verify that the other pages in the Cluster Object (NAT, VPN, Remote Access and
so on) are correct. In Legacy Check Point High Availability, the definitions were
per cluster member, while now they are on the cluster itself.
11 Install the policy on the cluster, which now only comprises cluster member 'B'.
12 Reconnect machine 'B' (which you disconnected in step 1) to the networks.
13 In this example the cluster comprises only two members, but if the cluster
comprises more than two members, repeat steps 1-9 for each cluster member.
14 For Load Sharing Multicast mode, configure the routers as described in TABLE 4-5
on page 52.
15 Disconnect machine 'A' from all networks except the management network.
The cluster stops processing traffic.
16 Run cphastop on machine 'A'.
17 Run cpstop and then cpstart on machine 'B' (if there are more than two machines,
run these commands on all machines except 'A').
18 Machine 'B' now becomes active and starts processing traffic.
19 Change the IP addresses of machine 'A' (as required by the new configuration).
20 Reset the MAC addresses of machine 'A' by executing cphaconf uninstall_macs.
21 Reboot the Windows machine for the MAC address change to take effect.
22 In SmartDashboard, open the Cluster Object and select the Cluster Members page.
Click Add > Add Gateway to Cluster and select member 'A' to re-attach it to the
cluster.
23 Reconnect machine 'A' to the networks from which it was disconnected in step 15.
24 Install the security policy on the cluster.
25 Run cpstop and then cpstart on machine 'A'.
26 Redefine static routes
On Machine 'B'
1 Define an interface on machine 'B' for each proposed cluster interface and
synchronization interface on machine 'A', with the same subnet.
2 Install VPN-1 Pro on the machine. During the installation you must enable
ClusterXL.
On Machine 'A'
1 Disconnect all proposed cluster and Synchronization interfaces. New connections
now open through the cluster, instead of through machine 'A'.
2 Change the addresses of these interfaces to some other unique IP address (preferably
on the same subnet as before).
In SmartDashboard for Machine A
3 Connect each pair of interfaces of the same subnet using a dedicated network. Any
hosts or gateways previously connected to the single gateway must now be
connected to both machines, using the hub/VLAN.
Note - It is possible to run synchronization across a WAN. For details, see Synchronizing
Clusters over a Wide Area Network on page 24.
6 If the Cluster Mode is Load Sharing or New HA, ensure that the proper interfaces on
the new cluster member are configured as Cluster Interfaces.
7 Install the security policy on the cluster.
8 The new member is now part of the cluster.
Components of the System
Virtual IP Integration
All cluster members use the cluster IP address(es).
Failure Recovery
Dynamic Routing on ClusterXL avoids creating a ripple effect upon failover by
informing the neighboring routers that the router has exited a maintenance mode. The
neighboring routers then reestablish their relationships to the cluster, without informing
the other routers in the network. These restart protocols are widely adopted by all
major networking vendors. The following table lists the RFCs and drafts with which
Check Point Dynamic Routing complies:
TABLE 7-5 Compliant Protocols
CHAPTER A
Example of High Availability Legacy Mode Topology
FIGURE A-1 shows an example ClusterXL Topology for High Availability Legacy
mode. The diagram relates the physical cluster topology to the required
SmartDashboard configuration. It shows two cluster members, Member_A (the
primary) and Member_B (the secondary), each with three interfaces: one for
synchronization, one external shared interface, and one internal shared interface.
FIGURE A-1 Example High Availability Legacy Mode Topology
Only one cluster member is active at any given time, so that the outside world can see
only the shared interfaces on one machine at any given time.
FIGURE A-1 shows the shared interfaces. The EXT interface, facing the Internet, has
IP address 192.168.0.1 on both Member_A and Member_B. The INT interface facing
the local network has IP address 172.20.10.1 on both Member_A and Member_B.
IP Address Migration
Many ClusterXL installations are intended to provide High Availability or Load Sharing
to an existing single gateway configuration. In those cases, it is recommended to take
the existing IP addresses from the current gateway, and make these the cluster addresses
(cluster virtual addresses) when feasible. Doing so will avoid altering current IPSec
endpoint identities, and in many cases will make it unnecessary to change Hide NAT
configurations.
Routing Configuration
Configure routing so that communication with the opposite side of the cluster is via the
cluster IP address on the near side of the cluster.
For example, in FIGURE A-1, configure routing as follows:
On each machine on the internal side of the router, define 172.20.0.1 as the default
gateway.
On the external router, configure a static route such that network 172.20.0.1 is reached
via 192.168.10.1.
If the connecting switch is incapable of forwarding multicast, CCP can be changed to
use broadcast instead. To toggle between these two modes use the command:
'cphaconf set_ccp broadcast/multicast'
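The toggle can be wrapped in a small helper that checks the requested mode before calling cphaconf. This is only a sketch: the function name and the DRY_RUN variable are hypothetical, and with DRY_RUN set to 1 the command is printed instead of being executed.

```shell
#!/bin/sh
# Sketch only: validate the requested CCP transport mode before calling cphaconf.
# Assumption: set_ccp_mode and DRY_RUN are hypothetical names used by this sketch;
# when DRY_RUN is 1 the command is printed instead of executed.

set_ccp_mode() {
    mode=$1
    case "$mode" in
        broadcast|multicast)
            ;;
        *)
            echo "usage: set_ccp_mode broadcast|multicast" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]
    then
        echo "cphaconf set_ccp $mode"
    else
        cphaconf set_ccp "$mode"
    fi
}

# Example:
# DRY_RUN=1 set_ccp_mode broadcast
```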
Routing Configuration
6 Configure routing so that communication with the networks on the internal side of
the cluster is via the cluster IP address on the external side of the cluster. For
example, in FIGURE A-1, on the external router, configure a static route such that
network 10.255.255.100 is reached via 192.168.10.100.
7 Configure routing so that communication with the networks on the external side of
the cluster is via the cluster IP address on the internal side of the cluster. For
example, in FIGURE A-1, on each machine on the internal side of the router,
define 10.255.255.100 as the default gateway.
8 Reboot the cluster members. MAC address configuration will take place
automatically.
SmartDashboard configuration
1 Using SmartDashboard, define the Gateway Cluster object. In the General
Properties page of the Gateway Cluster object, assign the routable external IP
address of the cluster as the general IP address of the cluster. Check ClusterXL as a
product installed on the cluster.
2 In the Cluster Members page, click Add > New Cluster Member to add cluster
members to the cluster. Cluster members exist solely inside the Gateway Cluster
object. For each cluster member:
In the Cluster Members Properties window General tab, define a Name
and IP Address. Choose an IP address that is routable from the SmartCenter
Server so that the Security Policy installation will be successful. This can be an
internal or an external address, or a dedicated management interface.
Click Communication, and Initialize Secure Internal Communication (SIC).
Define the NAT and VPN tabs, as required.
You can also add an existing gateway as a cluster member by selecting Add > Add
Gateway to Cluster in the Cluster Members page and selecting the gateway from the
list in the Add Gateway to Cluster window.
If you want to remove a gateway from the cluster, click Remove in the Cluster
Members page and select Detach Member from Cluster or right-click on the cluster
member in the Network Objects tree and select Detach from Cluster.
3 In the ClusterXL page:
Check High Availability Legacy Mode.
Choose whether to Use State Synchronization. This option is checked by default.
If you uncheck this, the cluster members will not be synchronized, and existing
connections on the failed gateway will be closed when failover occurs.
Specify the action Upon Gateway Recovery (see What Happens When a
Gateway Recovers? on page 48 for additional information).
Define the Fail-over Tracking method.
4 In the Topology page, define the cluster member addresses. Do not define any
virtual cluster interfaces. If converting from another cluster mode, the virtual cluster
interface definitions are deleted. In the Edit Topology window:
Define the topology for each cluster member interface. To automatically read all
the predefined settings on the member interfaces, click Get all members
topology.
In the Network Objective column, define the purpose of the network by
choosing one of the options from the drop-down list. Define the interfaces with
shared IP addresses as belonging to a Monitored Private network, and define one
(or more) interfaces of each cluster member as a synchronization interface in a
synchronization network (1st Sync/2nd Sync/3rd Sync). The options are
explained in the Online Help. To define a new network, click Add Network.
5 Define the other pages in the Gateway Cluster object as required (NAT, VPN, Remote
Access, etc.).
CHAPTER B
Example cphaprob Script
More information
The cphaprob command is described in How to Verify the Cluster is Working
Properly (cphaprob) on page 75, in Chapter 6, Monitoring and Troubleshooting
Gateway Clusters.
#!/bin/sh
#
# This script monitors the existence of processes in the system. The process
# names should be written in the $FWDIR/conf/cpha_proc_list file, one per line.
#
# USAGE :
# cpha_monitor_process X silent
# where X is the number of seconds between process probings.
# If silent is set to 1, no messages will appear on the console.
#
# We initially register a pnote for each of the monitored processes
# (process name must be up to 15 characters) in the problem notification
# mechanism.
# When we detect that a process is missing, we report the pnote to be in
# "problem" state.
# When the process is up again, we report the pnote is OK.

# Treat a missing or non-numeric second argument as 0 (not silent).
if [ "${2:-0}" -le 1 ] 2>/dev/null
then
    silent=${2:-0}
else
    silent=0
fi

if [ -f "$FWDIR/conf/cpha_proc_list" ]
then
    procfile=$FWDIR/conf/cpha_proc_list
else
    echo "No process file in $FWDIR/conf/cpha_proc_list "
    exit 0
fi

arch=`uname -s`

# Register a pnote for each monitored process.
for process in `cat $procfile`
do
    $FWDIR/bin/cphaprob -d $process -t 0 -s ok -p register > /dev/null 2>&1
done

while [ 1 ]
do
    result=1

    for process in `cat $procfile`
    do
        # Check whether the process appears in the process list.
        ps -ef | grep $process | grep -v grep > /dev/null 2>&1
        status=$?
        if [ $status = 0 ]
        then
            if [ $silent = 0 ]
            then
                echo " $process is alive"
            fi
            # echo "3, $FWDIR/bin/cphaprob -d $process -s ok report"
            $FWDIR/bin/cphaprob -d $process -s ok report
        else
            if [ $silent = 0 ]
            then
                echo " $process is down"
            fi
            # Report the pnote as being in "problem" state.
            $FWDIR/bin/cphaprob -d $process -s problem report
            result=0
        fi
    done

    if [ $result = 0 ]
    then
        if [ $silent = 0 ]
        then
            echo " One of the monitored processes is down!"
        fi
    else
        if [ $silent = 0 ]
        then
            echo " All monitored processes are up "
        fi
    fi

    if [ "$silent" = 0 ]
    then
        echo "sleeping"
    fi
    sleep $1
done
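Since the script comments limit process names to 15 characters, the process list can be checked before the monitor is started. The following is a minimal sketch of such a check. check_proc_list is a hypothetical helper and is not part of the Check Point tools.

```shell
#!/bin/sh
# Sketch only: check that every process name in a list file fits the
# 15-character limit mentioned in the script comments.
# Assumption: check_proc_list is a hypothetical helper, not a Check Point tool.

check_proc_list() {
    list_file=$1
    bad=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r name || [ -n "$name" ]
    do
        # Skip empty lines.
        [ -z "$name" ] && continue
        if [ "${#name}" -gt 15 ]
        then
            echo "name too long: $name"
            bad=1
        fi
    done < "$list_file"
    return $bad
}

# Example:
# check_proc_list "$FWDIR/conf/cpha_proc_list" && echo "process list looks valid"
```

The function returns a non-zero status if any name exceeds the limit, so it can gate the start of the monitoring script.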
CHAPTER C
ClusterXL Command Line Interface
The following command line commands relate to ClusterXL and are documented in
the Command Line Interface (CLI) Guide.
TABLE 7-6 ClusterXL Command Line Interface
Command     Description
cphaconf    Used to configure ClusterXL. Running this command is not
            recommended. It should be run only by VPN-1 Pro. See The
            cphaconf Command on page 87.
cphaprob    Verifies that the cluster and the cluster members are working
            properly. See How to Verify the Cluster is Working Properly
            (cphaprob) on page 75. On Nokia VRRP and other OPSEC certified
            clusters, this command behaves differently. See The cphaprob
            Command in OPSEC Clusters on page 72.
cphastart   Running cphastart on a cluster member activates ClusterXL on
            the member. It does not initiate full synchronization. cpstart
            is the recommended way to start a cluster member. See The
            cphastart and cphastop Commands on page 87.
cphastop    Running cphastop on a cluster member stops the cluster member
            from passing traffic. State synchronization also stops. It is
            still possible to open connections directly to the cluster
            member. In High Availability Legacy mode, running cphastop may
            cause the entire cluster to stop functioning. See The cphastart
            and cphastop Commands on page 87.