
Netfilter

The Linux netfilter framework allows modules to register custom hooks or filters at specific points in the network packet processing path. When a packet passes through these points, the registered hooks are invoked in priority order. Each hook is defined using an nf_hook_ops structure specifying the protocol family, hook type, and hook function. The nf_register_hook() function adds the hook to the appropriate list, while nf_hook_slow() iterates through the list, invoking each hook and passing the packet to the next hook or final function based on the hook's verdict.

Uploaded by laertepalves
Copyright © All Rights Reserved

The Net Filter Facility

The Linux net filter is a framework in the kernel that allows modules to observe and modify packets
as they pass through the protocol stack. Kernel services or modules can register custom
hooks/filters by protocol family (e.g. PF_INET) and by the point in packet processing
(e.g. NF_IP_LOCAL_IN) at which the filter is to be invoked.

The facility is currently available for IPv4, IPv6 and DECnet but could be extended to other protocol
families. Each protocol family can provide several processing points in the stack where a packet of
that protocol can be passed to a filter. These points are referred to as hook points or hook types.
Hence, when registering a custom hook, both the protocol family and the protocol-specific hook type
must be specified. For example, the following fragment registers the hook function fw_output() for
PF_INET at the NF_IP_POST_ROUTING hook point with the standard filter priority:

    static struct nf_hook_ops postroute_ops
        = { { NULL, NULL }, fw_output, PF_INET,
            NF_IP_POST_ROUTING, NF_IP_PRI_FILTER };
    /* ... */
    rc = nf_register_hook(&postroute_ops);

A statically allocated array of lists, defined in net/core/netfilter.c, holds all the hooks registered for
each protocol family and hook type. NF_MAX_HOOKS, the maximum number of hook types a protocol
family can support, is defined as 8 in include/linux/netfilter.h.

    struct list_head nf_hooks[NPROTO][NF_MAX_HOOKS];

    /* Largest hook number + 1 */
    #define NF_MAX_HOOKS 8
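The nf_hooks table can be modelled in userspace to show how it is indexed and what "empty" means. This is an illustration, not kernel code: struct list_head is a simplified stand-in for the kernel's circular doubly linked list, DEMO_NPROTO is an assumed table size, and nf_hooks_init() and nf_hook_list() are hypothetical helpers. An empty list head points to itself, which is exactly what list_empty() tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct list_head (not <linux/list.h>). */
struct list_head {
    struct list_head *next, *prev;
};

#define DEMO_NPROTO  32   /* assumed table size; the kernel uses NPROTO */
#define NF_MAX_HOOKS 8    /* largest hook number + 1 */

/* One list head per (protocol family, hook number) pair. */
struct list_head nf_hooks[DEMO_NPROTO][NF_MAX_HOOKS];

/* An empty circular list head points to itself in both directions. */
void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Make every slot an empty list (the kernel initializes its lists once at startup). */
void nf_hooks_init(void)
{
    for (int pf = 0; pf < DEMO_NPROTO; pf++)
        for (int h = 0; h < NF_MAX_HOOKS; h++)
            init_list_head(&nf_hooks[pf][h]);
}

/* Bounds-checked access to the list for one protocol family and hook number. */
struct list_head *nf_hook_list(int pf, int hooknum)
{
    assert(pf >= 0 && pf < DEMO_NPROTO);
    assert(hooknum >= 0 && hooknum < NF_MAX_HOOKS);
    return &nf_hooks[pf][hooknum];
}
```

After nf_hooks_init(), list_empty(nf_hook_list(2, 1)) is true (PF_INET is 2 on Linux and 1 is NF_IP_LOCAL_IN) until a hook is registered at that point.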

Five hook types are defined for the IPv4 protocol:

    /* After promisc drops, checksum checks. */
    #define NF_IP_PRE_ROUTING   0
    /* If the packet is destined for this box. */
    #define NF_IP_LOCAL_IN      1
    /* If the packet is destined for another interface. */
    #define NF_IP_FORWARD       2
    /* Packets coming from a local process. */
    #define NF_IP_LOCAL_OUT     3
    /* Packets about to hit the wire. */
    #define NF_IP_POST_ROUTING  4
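Read together, the comments above imply which hooks an IPv4 packet passes through, depending on where it comes from and where it is going. A small sketch of that mapping for the three usual cases; enum pkt_path and hooks_visited() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NF_IP_PRE_ROUTING  0
#define NF_IP_LOCAL_IN     1
#define NF_IP_FORWARD      2
#define NF_IP_LOCAL_OUT    3
#define NF_IP_POST_ROUTING 4

/* Hypothetical classification of an IPv4 packet's route through the host. */
enum pkt_path { PATH_DELIVER_LOCAL, PATH_FORWARD, PATH_LOCAL_OUTPUT };

/* Fill hooks[] with the hook numbers visited, in order; return how many. */
size_t hooks_visited(enum pkt_path path, int hooks[3])
{
    size_t n = 0;
    switch (path) {
    case PATH_DELIVER_LOCAL:          /* received, destined for this box */
        hooks[n++] = NF_IP_PRE_ROUTING;
        hooks[n++] = NF_IP_LOCAL_IN;
        break;
    case PATH_FORWARD:                /* received, routed out another interface */
        hooks[n++] = NF_IP_PRE_ROUTING;
        hooks[n++] = NF_IP_FORWARD;
        hooks[n++] = NF_IP_POST_ROUTING;
        break;
    case PATH_LOCAL_OUTPUT:           /* generated by a local process */
        hooks[n++] = NF_IP_LOCAL_OUT;
        hooks[n++] = NF_IP_POST_ROUTING;
        break;
    }
    assert(n <= 3);
    return n;
}
```

The locally generated case, NF_IP_LOCAL_OUT followed by NF_IP_POST_ROUTING, is the path traced through ip_build_xmit() and ip_finish_output() in the rest of this document.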

Defining a netfilter hook

Each custom hook is defined using the following nf_hook_ops structure. This structure is passed to
the nf_register_hook() function.
    struct nf_hook_ops
    {
        struct list_head list;

        /* User fills in from here down. */
        nf_hookfn *hook;
        int pf;
        int hooknum;
        /* Hooks are ordered in ascending priority. */
        int priority;
    };

Structure elements are used as follows:

list: links all hooks with a common pf and hooknum into the nf_hooks array.
pf: protocol family (e.g. PF_INET) of the filter.
hooknum: the protocol-specific hook type identifier (e.g. NF_IP_FORWARD).
priority: position of the hook in the list.
hook: a pointer to the hook function. Its prototype is as follows:

    typedef unsigned int nf_hookfn(unsigned int hooknum,
                                   struct sk_buff **skb,
                                   const struct net_device *in,
                                   const struct net_device *out,
                                   int (*okfn)(struct sk_buff *));
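A hook function is any function matching nf_hookfn. The sketch below is a self-contained userspace illustration with stub struct sk_buff and struct net_device types (not the kernel's definitions); the hook drop_oversize() and its 1500-byte limit are hypothetical. The verdict values NF_DROP (0) and NF_ACCEPT (1) are those of the 2.4 include/linux/netfilter.h. The hook receives struct sk_buff **, presumably so that it can substitute a different buffer if it needs to.

```c
#include <assert.h>
#include <stddef.h>

#define NF_DROP   0
#define NF_ACCEPT 1

/* Stub types: only the fields this demo reads. */
struct net_device { const char *name; };
struct sk_buff    { unsigned int len; };

typedef unsigned int nf_hookfn(unsigned int hooknum,
                               struct sk_buff **skb,
                               const struct net_device *in,
                               const struct net_device *out,
                               int (*okfn)(struct sk_buff *));

/* Hypothetical hook: drop outgoing packets longer than 1500 bytes,
 * accept everything else (including all incoming packets). */
unsigned int drop_oversize(unsigned int hooknum, struct sk_buff **skb,
                           const struct net_device *in,
                           const struct net_device *out,
                           int (*okfn)(struct sk_buff *))
{
    (void)hooknum; (void)in; (void)okfn;
    assert(skb != NULL && *skb != NULL);
    if (out != NULL && (*skb)->len > 1500)
        return NF_DROP;
    return NF_ACCEPT;
}

/* The function matches the typedef, so it fits in nf_hook_ops.hook. */
nf_hookfn *example_hook = drop_oversize;
```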

Some standard priorities are shown below.

    enum nf_ip_hook_priorities {
        NF_IP_PRI_FIRST = INT_MIN,
        NF_IP_PRI_CONNTRACK = -200,
        NF_IP_PRI_MANGLE = -150,
        NF_IP_PRI_NAT_DST = -100,
        NF_IP_PRI_FILTER = 0,
        NF_IP_PRI_NAT_SRC = 100,
        NF_IP_PRI_LAST = INT_MAX,
    };

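The nf_register_hook() function, defined in net/core/netfilter.c, adds the nf_hook_ops structure that
defines a custom hook to the appropriate list based on the protocol family and hook type.
Since the list is kept in ascending order of priority value, the hook with the lowest numerical value
is invoked first. A new hook is inserted before the first existing hook whose priority is strictly
greater, so hooks of equal priority are invoked in the order in which they were registered.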

    int nf_register_hook(struct nf_hook_ops *reg)
    {
        struct list_head *i;

        br_write_lock_bh(BR_NETPROTO_LOCK);
        /* Find the first hook with a strictly higher priority value. */
        for (i = nf_hooks[reg->pf][reg->hooknum].next;
             i != &nf_hooks[reg->pf][reg->hooknum];
             i = i->next) {
            /* list is the first member, so the cast is valid. */
            if (reg->priority < ((struct nf_hook_ops *)i)->priority)
                break;
        }
        /* Insert the new hook just before that entry. */
        list_add(&reg->list, i->prev);
        br_write_unlock_bh(BR_NETPROTO_LOCK);
        return 0;
    }
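The insertion rule can be exercised outside the kernel. In this sketch, struct hook_ops is a cut-down, hypothetical version of nf_hook_ops, list_add() reimplements the kernel helper's "insert after this entry" contract, and register_hook() and hook_order() are demo names. As in the kernel code, list must be the first member so that the cast from struct list_head * is valid.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Hypothetical, cut-down nf_hook_ops: list MUST stay the first member. */
struct hook_ops {
    struct list_head list;
    int priority;
    int id;                       /* demo tag to observe the order */
};

void init_list_head(struct list_head *h) { h->next = h->prev = h; }

/* Insert new immediately after prev (same contract as the kernel's list_add()). */
void list_add(struct list_head *new, struct list_head *prev)
{
    struct list_head *next = prev->next;
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

/* Same loop as nf_register_hook(): stop at the first strictly larger
 * priority and insert before it. */
void register_hook(struct list_head *head, struct hook_ops *reg)
{
    struct list_head *i;
    for (i = head->next; i != head; i = i->next)
        if (reg->priority < ((struct hook_ops *)i)->priority)
            break;
    list_add(&reg->list, i->prev);
}

/* Copy the ids in list (invocation) order into out[]; return the count. */
int hook_order(struct list_head *head, int *out, int max)
{
    int n = 0;
    for (struct list_head *i = head->next; i != head; i = i->next) {
        assert(n < max);
        out[n++] = ((struct hook_ops *)i)->id;
    }
    return n;
}
```

Registering priorities 0, -200, 0 and 100 (ids 1 to 4, in that order) yields the invocation order 2, 1, 3, 4: the -200 hook moves to the front, and the second priority-0 hook lands after the first because the comparison is a strict <.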

IP Packet Transmission Through the Netfilter Layer

From ip_build_xmit() or ip_build_xmit_slow(), the IP packet is pushed to the device/netfilter layer
using the NF_HOOK macro defined in include/linux/netfilter.h. The parameters passed include the
output device to be used and the final output function to be invoked on a successful verdict from
all the hooks in the list. The hook type is NF_IP_LOCAL_OUT. The input device is set to NULL,
since the packet originated on the local host.

    err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb,
                  NULL, rt->u.dst.dev, output_maybe_reroute);

This macro translates to a call to the nf_hook_slow() function if the netfilter debug option
(CONFIG_NETFILTER_DEBUG) is defined or if hooks/filters are registered for the specific protocol
family and hook type. Otherwise it simply passes the sk_buff directly to the okfn.

    /* This is gross, but inline doesn't cut it for avoiding the
       function call in fast path: gcc doesn't inline (needs
       value tracking?). --RR */
    #ifdef CONFIG_NETFILTER_DEBUG
    #define NF_HOOK nf_hook_slow
    #else
    #define NF_HOOK(pf, hook, skb, indev, outdev, okfn)              \
    (list_empty(&nf_hooks[(pf)][(hook)])                              \
     ? (okfn)(skb)                                                    \
     : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn)))
    #endif
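The point of the macro is that the common case, with no hooks registered, costs only a list test and a direct call to okfn. The userspace sketch below reproduces that shape; hook_count, hook_slow() and HOOK() are hypothetical stand-ins for nf_hooks, nf_hook_slow() and NF_HOOK, with a counter replacing each list, and deliver() is a demo okfn.

```c
#include <assert.h>

struct sk_buff { int id; };
typedef int (*okfn_t)(struct sk_buff *);

#define NPF    4   /* assumed number of protocol families in the demo */
#define NHOOKS 8

int hook_count[NPF][NHOOKS];  /* number of hooks registered per slot */
int slow_path_calls;          /* how often the slow path was taken */

/* Stand-in for nf_hook_slow(): here it only counts calls and accepts. */
int hook_slow(int pf, int hook, struct sk_buff *skb, okfn_t okfn)
{
    assert(pf >= 0 && pf < NPF && hook >= 0 && hook < NHOOKS);
    slow_path_calls++;
    return okfn(skb);
}

/* Same shape as the non-debug NF_HOOK: no function call at all into
 * the hook machinery when nothing is registered for (pf, hook). */
#define HOOK(pf, hook, skb, okfn)                  \
    (hook_count[(pf)][(hook)] == 0                 \
         ? (okfn)(skb)                             \
         : hook_slow((pf), (hook), (skb), (okfn)))

/* Demo okfn: "transmits" the packet by returning its id. */
int deliver(struct sk_buff *skb) { return skb->id; }
```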

When the net filter facility is enabled and the hook list is non-empty, this macro invokes the
nf_hook_slow() function, defined in net/core/netfilter.c. Its task is to invoke each hook in the
specified list and, based on the verdict from the hooks, either pass the packet to the okfn or drop
the packet.
    int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
                     struct net_device *indev,
                     struct net_device *outdev,
                     int (*okfn)(struct sk_buff *))
    {
        struct list_head *elem;
        unsigned int verdict;
        int ret = 0;

For a non-linear sk_buff, each fragment's size, offset and page address are stored in the
skb_frag_struct array. If the skb is non-linear (i.e. skb->data_len != 0), skb_linearize() is called to
reorganize all the data into one linear buffer. If that fails, the packet is freed and -ENOMEM is
returned.

        /* This stopgap cannot be removed until all the hooks are audited. */
        if (skb_is_nonlinear(skb) && skb_linearize(skb, GFP_ATOMIC) != 0) {
            kfree_skb(skb);
            return -ENOMEM;
        }
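What linearization amounts to can be shown with a simplified buffer: copy the linear head, then each fragment's bytes starting at page + offset, into one contiguous allocation. struct buf, struct frag and linearize() are hypothetical and ignore everything else an sk_buff carries; the sketch also assumes the head was allocated with malloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fragment descriptor: where the bytes live and how many. */
struct frag { const unsigned char *page; size_t offset; size_t size; };

/* Hypothetical non-linear buffer: a linear head plus page fragments. */
struct buf {
    unsigned char *data;   /* linear part (malloc'ed) */
    size_t headlen;        /* bytes in the linear part */
    size_t data_len;       /* bytes held in fragments; 0 means linear */
    struct frag frags[4];
    int nr_frags;
};

/* Gather head and fragments into one allocation.
 * Returns 0 on success, -1 if memory cannot be allocated. */
int linearize(struct buf *b)
{
    if (b->data_len == 0)
        return 0;                              /* already linear */

    size_t total = b->headlen + b->data_len;
    unsigned char *lin = malloc(total);
    if (lin == NULL)
        return -1;                             /* nf_hook_slow() would free the skb */

    memcpy(lin, b->data, b->headlen);          /* linear head first */
    size_t pos = b->headlen;
    for (int f = 0; f < b->nr_frags; f++) {    /* then each fragment in order */
        memcpy(lin + pos, b->frags[f].page + b->frags[f].offset, b->frags[f].size);
        pos += b->frags[f].size;
    }
    assert(pos == total);                      /* fragments must add up to data_len */

    free(b->data);
    b->data = lin;
    b->headlen = total;
    b->data_len = 0;
    b->nr_frags = 0;
    return 0;
}
```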

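After ensuring the sk_buff is linear, nf_hook_slow() examines the checksum state. For a locally
created packet, the ip_summed field was initialized to 0 (CHECKSUM_NONE) when the sk_buff was
allocated. It should be remembered, though, that nf_hook_slow() is called for both input and output
processing. On input (outdev == NULL), CHECKSUM_HW means the device has already checksummed the
packet, and the code downgrades this to CHECKSUM_NONE, presumably because a hook may modify the
packet and invalidate the hardware result. On output, CHECKSUM_HW means the device was expected to
fill in the checksum, and skb_checksum_help() computes it in software instead, so the hooks see a
complete packet.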

        if (skb->ip_summed == CHECKSUM_HW) {
            if (outdev == NULL) {
                skb->ip_summed = CHECKSUM_NONE;
            } else {
                skb_checksum_help(skb);
            }
        }

        /* We may already have this, but read-locks nest anyway */
        br_read_lock_bh(BR_NETPROTO_LOCK);

    #ifdef CONFIG_NETFILTER_DEBUG
        if (skb->nf_debug & (1 << hook)) {
            printk("nf_hook: hook %i already set.\n", hook);
            nf_dump_skb(pf, skb);
        }
        skb->nf_debug |= (1 << hook);
    #endif
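The skb_checksum_help() call in the checksum block above fills in, in software, the checksum that the device would otherwise have computed. TCP, UDP and the IPv4 header all use the 16-bit one's-complement Internet checksum described in RFC 1071. A plain sketch of that arithmetic follows; it is not the kernel's optimized csum_partial() code, and inet_checksum() is a name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: sum the data as big-endian 16-bit words
 * (an odd trailing byte is padded with zero), fold the carries back in,
 * and return the one's complement of the result. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)(data[len - 1] << 8);

    /* End-around carry: add the high 16 bits into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the 20-byte IPv4 header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7 (checksum field zeroed), inet_checksum() returns 0xb861; once that value is stored in the checksum field, the checksum over the whole header becomes 0, which is how a receiver verifies it.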

Here the function nf_iterate() is called to execute all the hooks defined for this protocol family and
hook type.
        elem = &nf_hooks[pf][hook];
        verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev, outdev,
                             &elem, okfn);

On return to nf_hook_slow(), the action taken depends on the verdict. A hook returns NF_QUEUE when
the packet is to be handed to a userspace program for a decision; for IPv4 this is what the iptables
QUEUE target does. For an IP packet this results in a series of function calls leading to the
ipq_enqueue() function defined in net/ipv4/netfilter/ip_queue.c, which passes the packet to
userspace.

        if (verdict == NF_QUEUE) {
            NFDEBUG("nf_hook: Verdict = QUEUE.\n");
            nf_queue(skb, elem, pf, hook, indev, outdev, okfn);
        }

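If NF_ACCEPT is the verdict from all hooks, the okfn passed into nf_hook_slow(), here
output_maybe_reroute(), is invoked with the sk_buff as its parameter. If the packet is to be dropped,
kfree_skb() is called and -EPERM is returned. There is no case for NF_STOLEN or NF_QUEUE: the packet
then belongs to the hook or to the queue, so nf_hook_slow() simply releases the lock and returns 0.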

        switch (verdict) {
        case NF_ACCEPT:
            ret = okfn(skb);
            break;

        case NF_DROP:
            kfree_skb(skb);
            ret = -EPERM;
            break;
        }

        br_read_unlock_bh(BR_NETPROTO_LOCK);
        return ret;
    }

Iterating through the hook chain

The nf_iterate() function is defined in net/core/netfilter.c:

    static unsigned int nf_iterate(struct list_head *head,
                                   struct sk_buff **skb,
                                   int hook,
                                   const struct net_device *indev,
                                   const struct net_device *outdev,
                                   struct list_head **i,
                                   int (*okfn)(struct sk_buff *))
    {

One iteration of this loop occurs for each hook called.

        for (*i = (*i)->next; *i != head; *i = (*i)->next) {
            struct nf_hook_ops *elem = (struct nf_hook_ops *)*i;

The value returned by the hook function determines the action taken by the switch statement. An
immediate return, possibly aborting the send, is made if the value returned is NF_QUEUE,
NF_STOLEN, or NF_DROP. For NF_ACCEPT the for loop simply moves on to the next hook. For
NF_REPEAT, *i is stepped back to the previous entry, so the loop's increment returns to the same
hook and it is called again.
            switch (elem->hook(hook, skb, indev, outdev, okfn)) {
            case NF_QUEUE:
                return NF_QUEUE;

            case NF_STOLEN:
                return NF_STOLEN;

            case NF_DROP:
                return NF_DROP;

            case NF_REPEAT:
                *i = (*i)->prev;
                break;

    #ifdef CONFIG_NETFILTER_DEBUG
            case NF_ACCEPT:
                break;

            default:
                NFDEBUG("Evil return from %p(%u).\n",
                        elem->hook, hook);
    #endif
            }
        }

If all the hook functions return NF_ACCEPT, then NF_ACCEPT is returned to nf_hook_slow.

        return NF_ACCEPT;
    }
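The same control flow can be written over an array of hook functions instead of a list. In the sketch below, iterate(), hook_fn and the demo hooks are hypothetical. NF_REPEAT is handled by not advancing the index, which has the same effect as the kernel's *i = (*i)->prev. Like nf_iterate(), it reports where it stopped (*pos), the role that elem plays for nf_queue(), and it treats unknown verdicts as NF_ACCEPT, as the non-debug kernel build does.

```c
#include <assert.h>
#include <stddef.h>

#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2
#define NF_QUEUE  3
#define NF_REPEAT 4

typedef unsigned int hook_fn(int *state);

/* Run hooks[0..n-1] in order. Return the first DROP/STOLEN/QUEUE verdict
 * (with *pos set to the hook that produced it), or NF_ACCEPT if every
 * hook lets the packet through (with *pos set to n). */
unsigned int iterate(hook_fn *const hooks[], size_t n, int *state, size_t *pos)
{
    size_t i = 0;

    assert(hooks != NULL || n == 0);
    while (i < n) {
        unsigned int v = hooks[i](state);
        if (v == NF_REPEAT)
            continue;                     /* call the same hook again */
        if (v == NF_QUEUE || v == NF_STOLEN || v == NF_DROP) {
            *pos = i;                     /* remember where the chain stopped */
            return v;
        }
        i++;                              /* NF_ACCEPT (or anything else): next hook */
    }
    *pos = n;
    return NF_ACCEPT;
}

/* Demo hooks (hypothetical). */
unsigned int count_and_accept(int *state) { (*state)++; return NF_ACCEPT; }

/* Asks to be called again until the counter reaches 3. */
unsigned int repeat_until_3(int *state)
{
    if (*state < 3) {
        (*state)++;
        return NF_REPEAT;
    }
    return NF_ACCEPT;
}

unsigned int always_drop(int *state) { (void)state; return NF_DROP; }
```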

The output_maybe_reroute() function

If the packet is accepted for transmission by nf_hook_slow(), the okfn, output_maybe_reroute(),
defined in net/ipv4/ip_output.c, is called. It simply passes control to the output function of the
dst structure that is currently bound to the sk_buff.

    static inline int
    output_maybe_reroute(struct sk_buff *skb)
    {
        return skb->dst->output(skb);
    }

The pointer skb->dst refers to the route cache element associated with this packet's source and
destination. In ip_route_output_slow(), rt->u.dst.output was set to ip_output(), which is defined
in net/ipv4/ip_output.c:
    int ip_output(struct sk_buff *skb)
    {
    #ifdef CONFIG_IP_ROUTE_NAT
        struct rtable *rt = (struct rtable *)skb->dst;
    #endif

        IP_INC_STATS(IpOutRequests);

    #ifdef CONFIG_IP_ROUTE_NAT
        if (rt->rt_flags & RTCF_NAT)
            ip_do_nat(skb);
    #endif

        return ip_finish_output(skb);
    }

The ip_finish_output() function

The ip_finish_output() function sets skb->dev to the output device of the route (skb->dst->dev) and
sets the protocol type to ETH_P_IP (0x0800), stored in network byte order. This indicates that
ETH_P_IP identifies an IP packet even if the output device is not an Ethernet device.
    __inline__ int ip_finish_output(struct sk_buff *skb)
    {
        struct net_device *dev = skb->dst->dev;

        skb->dev = dev;
        skb->protocol = __constant_htons(ETH_P_IP);

Next, the NF_HOOK macro is invoked again, this time for the NF_IP_POST_ROUTING hook type. If any
filters are registered for PF_INET at this hook, the macro calls nf_hook_slow(), which invokes them;
if the verdict from all of them is NF_ACCEPT, the okfn, ip_finish_output2(), is called as before.
With no filters registered, ip_finish_output2() is called directly.


        return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, dev,
                       ip_finish_output2);
    }
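Whatever the host's byte order, __constant_htons(ETH_P_IP) leaves skb->protocol holding the bytes 0x08 0x00 in memory. This can be checked in userspace with htons() from <arpa/inet.h>; protocol_bytes() is a hypothetical helper, and ETH_P_IP is defined locally with the value used in <linux/if_ether.h>.

```c
#include <arpa/inet.h>   /* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_IP 0x0800  /* same value as in <linux/if_ether.h> */

/* Store the in-memory bytes of htons(ETH_P_IP) in out[0..1]:
 * network (big-endian) order on every host. */
void protocol_bytes(uint8_t out[2])
{
    uint16_t proto = htons(ETH_P_IP);

    assert(out != NULL);
    memcpy(out, &proto, sizeof proto);
}
```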

The ip_finish_output2() function

The ip_finish_output2() function is defined in net/ipv4/ip_output.c.


    static inline int ip_finish_output2(struct sk_buff *skb)
    {
        struct dst_entry *dst = skb->dst;
        struct hh_cache *hh = dst->hh;

    #ifdef CONFIG_NETFILTER_DEBUG
        nf_debug_ip_finish_output2(skb);
    #endif /*CONFIG_NETFILTER_DEBUG*/

There are two mechanisms by which calls to the link layer may be made. If the dst_entry has an
hh_cache pointer, then the hh_cache entry must contain both the hardware header itself and a pointer
to an output function at the device/link layer. This output function is always set to
dev_queue_xmit().

If there is no hh pointer but there is a neighbour pointer, then the neighbour structure must have an
output function pointer. The arp_constructor() function, which is called when each neighbour
structure is created, sets this output function to neigh_resolve_output() if the network device needs
a hardware header; otherwise (for a loopback, point-to-point, or virtual device) it is set to invoke
dev_queue_xmit(). Note the ugly hardcoding of the hardware header length at 16 bytes: the full
16-byte hh_data buffer is copied in front of skb->data, and skb_push() then exposes only the hh_len
bytes that actually belong to the header.

        if (hh) {
            read_lock_bh(&hh->hh_lock);
            memcpy(skb->data - 16, hh->hh_data, 16);
            read_unlock_bh(&hh->hh_lock);
            skb_push(skb, hh->hh_len);
            return hh->hh_output(skb);
        } else if (dst->neighbour)
            return dst->neighbour->output(skb);
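The choice between the two link-layer paths, and the fixed 16-byte copy, can be reproduced in userspace. The structures below keep only the fields this logic uses and omit the locking; finish_output2() is hypothetical, and the demo assumes an Ethernet-style 14-byte header stored at the end of the 16-byte hh_data buffer. xmit_stub() and neigh_stub() stand in for dev_queue_xmit() and a neighbour output function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified structures: field names follow the kernel's, layouts do not. */
struct sk_buff { unsigned char *data; size_t len; };

struct hh_cache {
    size_t hh_len;                        /* real header length, e.g. 14 */
    unsigned char hh_data[16];            /* demo: header kept at the end */
    int (*hh_output)(struct sk_buff *);
};

struct neighbour { int (*output)(struct sk_buff *); };
struct dst_entry { struct hh_cache *hh; struct neighbour *neighbour; };

/* Stand-ins for dev_queue_xmit() and a neighbour output function. */
int xmit_stub(struct sk_buff *skb)  { return (int)skb->len; }
int neigh_stub(struct sk_buff *skb) { (void)skb; return 100; }

/* Cached header first, then the neighbour, else give up (-1).
 * The kernel instead frees the skb and returns -EINVAL. */
int finish_output2(struct dst_entry *dst, struct sk_buff *skb)
{
    if (dst->hh) {
        struct hh_cache *hh = dst->hh;
        assert(hh->hh_len <= 16);
        /* Copy all 16 bytes so the buffer ends exactly at skb->data... */
        memcpy(skb->data - 16, hh->hh_data, 16);
        /* ...then expose only hh_len bytes of it, as skb_push() does. */
        skb->data -= hh->hh_len;
        skb->len += hh->hh_len;
        return hh->hh_output(skb);
    }
    if (dst->neighbour)
        return dst->neighbour->output(skb);
    return -1;
}
```

With hh_len set to 14 and the header in hh_data[2..15], the copy writes 16 bytes before skb->data and the data pointer then moves back by 14, so the header sits directly in front of the payload and the 2 leading bytes remain unused headroom.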

If there is no hardware header cache entry and no neighbour structure available, then there is no way
to send the packet and it must be dropped. The net_ratelimit() function is used to limit the resulting
printk() messages to roughly one every 5 seconds, to avoid flooding the syslog in case something is
badly amiss in the network setup.

        if (net_ratelimit())
            printk(KERN_DEBUG "ip_finish_output2: "
                   "No header cache and no neighbour!\n");
        kfree_skb(skb);
        return -EINVAL;
    }
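A limiter like net_ratelimit() can be built as a token bucket. The sketch below illustrates the idea; the cost of 5 time units per message and the burst of 10 messages are assumptions chosen to fit the "one every 5 seconds" description, and struct ratelimit, ratelimit_init() and ratelimit_ok() are hypothetical names. Time is passed in explicitly so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed parameters (time in seconds for the demo). */
#define COST  5UL              /* credit consumed per message */
#define BURST (10UL * COST)    /* credit cap: up to 10 messages back to back */

struct ratelimit {
    unsigned long toks;        /* accumulated credit */
    unsigned long last;        /* time of the previous call */
    int missed;                /* messages suppressed since the last success */
};

void ratelimit_init(struct ratelimit *rl, unsigned long now)
{
    rl->toks = BURST;          /* start with a full bucket */
    rl->last = now;
    rl->missed = 0;
}

/* Return true if a message may be printed at time now. */
bool ratelimit_ok(struct ratelimit *rl, unsigned long now)
{
    assert(now >= rl->last);
    rl->toks += now - rl->last;     /* earn credit for elapsed time */
    rl->last = now;
    if (rl->toks > BURST)
        rl->toks = BURST;           /* never save up more than a burst */
    if (rl->toks >= COST) {
        rl->toks -= COST;
        rl->missed = 0;
        return true;
    }
    rl->missed++;
    return false;
}
```

Starting with a full bucket, ten calls at time 0 succeed and the next two are suppressed; a further message is allowed only once 5 time units of credit have accumulated.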
