Lecture 08 - TCP/IP Stack in the Linux Kernel
Pracheer Agarwal
Swagat Konchada
The virtual file system (VFS) is the software layer in the kernel that
provides a uniform filesystem interface to userspace programs.
sock_create()    sock_map_fd()
inet_create()    fd_install()
Lower layer initialization
sys_connect()
sockfd_lookup_light()
Returns the socket object
associated with the given fd
move_addr_to_kernel()
Copies the userspace sockaddr * into kernel space
sock->ops->connect()
Lower layer call
tcp_v4_connect()
Socket layer functions
are elided.
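The connect path above can be sketched in userspace C. This is a simplified model, not kernel code: every type and function name ending in `_model`, the `fd_table` array, and the string address format are assumptions made for illustration. It only shows the shape of the call chain: look up the socket for the fd, copy the address, then call down through the socket's operations table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ADDR_MAX 32
#define MAX_FDS  8

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct socket_model;

struct proto_ops_model {
    /* Protocol-specific connect handler (the "lower layer call"). */
    int (*connect)(struct socket_model *sock, const char *addr);
};

struct socket_model {
    const struct proto_ops_model *ops;
    char peer[ADDR_MAX];
    int connected;
};

/* Plays the role of tcp_v4_connect(): the lower-layer implementation. */
static int tcp_connect_model(struct socket_model *sock, const char *addr)
{
    snprintf(sock->peer, sizeof(sock->peer), "%s", addr);
    sock->connected = 1;
    return 0;
}

static const struct proto_ops_model tcp_ops_model = {
    .connect = tcp_connect_model,
};

/* Assumed fd table: index is the fd, value is the socket object. */
static struct socket_model *fd_table[MAX_FDS];

/* Plays the role of sockfd_lookup_light(): fd -> socket object, or NULL. */
static struct socket_model *lookup_model(int fd)
{
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;
    return fd_table[fd];
}

/* Plays the role of sys_connect(). */
static int connect_model(int fd, const char *uaddr)
{
    char kaddr[ADDR_MAX]; /* stands in for the kernel-side copy of the address */
    struct socket_model *sock = lookup_model(fd);

    if (sock == NULL)
        return -1; /* the model's "bad descriptor" error */
    assert(sock->ops != NULL && sock->ops->connect != NULL);

    /* Stands in for move_addr_to_kernel(): never use the caller's buffer directly. */
    snprintf(kaddr, sizeof(kaddr), "%s", uaddr);

    return sock->ops->connect(sock, kaddr);
}
```

The indirection through `ops` is the point of the sketch: the socket layer does not need to know which protocol sits below it.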
Defined in <include/linux/skbuff.h>
struct sk_buff_head {
... ... ...
__u32 qlen;
spinlock_t lock; /* atomicity in accessing a sk_buff list. */
};
Layout
General
Feature-specific
Management functions
struct sock *sk
the sock data structure of the socket that owns this buffer
unsigned int len
total size of the data: includes both the data in the main buffer (i.e., the
one pointed to by head) and the data in the fragments
unsigned int data_len
unlike len, data_len accounts only for the size of the data in the fragments
unsigned int truesize
total size of the buffer, including the sk_buff structure itself:
skb->truesize = size + sizeof(struct sk_buff);
atomic_t users
reference count, i.e. the number of entities using this sk_buff buffer;
updated with atomic_inc() and atomic_dec()
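The relationship between len, data_len and truesize can be checked with a small userspace model. The struct below is an assumption that copies only the three length fields, and the helper names are hypothetical. The difference `len - data_len` is the amount of data in the main buffer, which is what the kernel helper skb_headlen() computes.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the length fields; names mirror struct sk_buff. */
struct skb_model {
    unsigned int len;      /* main-buffer data + fragment data */
    unsigned int data_len; /* fragment data only */
    unsigned int truesize; /* buffer size + size of this structure */
};

/* Set up the accounting for a freshly allocated buffer of `size` bytes. */
static void skb_init_model(struct skb_model *skb, unsigned int size)
{
    skb->len = 0;
    skb->data_len = 0;
    skb->truesize = size + (unsigned int)sizeof(struct skb_model);
}

/* Account for `n` bytes placed in the main buffer. */
static void skb_add_linear_model(struct skb_model *skb, unsigned int n)
{
    skb->len += n;
}

/* Account for `n` bytes placed in a fragment: both counters grow. */
static void skb_add_frag_model(struct skb_model *skb, unsigned int n)
{
    skb->len += n;
    skb->data_len += n;
}

/* Bytes held in the main buffer only. */
static unsigned int skb_headlen_model(const struct skb_model *skb)
{
    assert(skb->data_len <= skb->len);
    return skb->len - skb->data_len;
}
```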
• unsigned char *head
• sk_buff_data_t end
• unsigned char *data
• sk_buff_data_t tail
struct net_device *dev
represents the interface on which the packet was received, or the device
(interface) through which it is to be transmitted.
It usually refers to the net_device structure of the virtual device (the
representation of a group of devices).
struct tcp_skb_cb {
... ... ...
__u32 seq;     /* Starting sequence number */
__u32 end_seq; /* SEQ + FIN + SYN + datalen */
__u32 when;    /* used to compute rtt's */
__u8 flags;    /* TCP header flags. */
... ... ...
};
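The comment on end_seq says that SYN and FIN each consume one sequence number in addition to the payload. A minimal sketch of that arithmetic follows, assuming a cut-down control block (`tcp_cb_model`) and a hypothetical helper; the flag values are the FIN and SYN bits of the TCP header. Unsigned 32-bit arithmetic gives the wrap-around that sequence numbers need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_FIN_MODEL 0x01 /* FIN bit in the TCP header flags */
#define FLAG_SYN_MODEL 0x02 /* SYN bit in the TCP header flags */

/* Cut-down model of the control block; not the kernel structure. */
struct tcp_cb_model {
    uint32_t seq;     /* starting sequence number */
    uint32_t end_seq; /* seq + SYN + FIN + datalen */
    uint8_t flags;    /* TCP header flags */
};

/* Fill the control block for a segment carrying `datalen` payload bytes. */
static void tcp_cb_fill_model(struct tcp_cb_model *cb, uint32_t seq,
                              uint8_t flags, uint32_t datalen)
{
    assert(cb != NULL);
    cb->seq = seq;
    cb->flags = flags;
    cb->end_seq = seq + datalen
                + ((flags & FLAG_SYN_MODEL) ? 1u : 0u)
                + ((flags & FLAG_FIN_MODEL) ? 1u : 0u);
}
```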
Defined in <include/linux/skbuff.h> & <net/core/skbuff.c>
Each of the above four memory management functions returns the data pointer.
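The four functions are not named in the surviving text; the sketch below assumes they are skb_reserve(), skb_put(), skb_push() and skb_pull(), and models them in userspace on a fixed array. The `buf_model` type, the `buf_*` names and the return values are choices of this sketch, not the kernel's definitions. It shows how the head, data, tail and end pointers move.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the four buffer pointers over a fixed array. */
struct buf_model {
    unsigned char storage[256];
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

static void buf_init(struct buf_model *b)
{
    b->head = b->data = b->tail = b->storage;
    b->end = b->storage + sizeof(b->storage);
    b->len = 0;
}

/* Like skb_reserve(): open n bytes of headroom in an empty buffer. */
static void buf_reserve(struct buf_model *b, unsigned int n)
{
    assert(b->len == 0 && (size_t)(b->end - b->tail) >= n);
    b->data += n;
    b->tail += n;
}

/* Like skb_put(): grow the data area at the tail.
 * In this model it returns the start of the newly added space. */
static unsigned char *buf_put(struct buf_model *b, unsigned int n)
{
    unsigned char *old_tail = b->tail;

    assert((size_t)(b->end - b->tail) >= n);
    b->tail += n;
    b->len += n;
    return old_tail;
}

/* Like skb_push(): grow the data area at the front (prepend a header). */
static unsigned char *buf_push(struct buf_model *b, unsigned int n)
{
    assert((size_t)(b->data - b->head) >= n);
    b->data -= n;
    b->len += n;
    return b->data;
}

/* Like skb_pull(): shrink the data area at the front (strip a header). */
static unsigned char *buf_pull(struct buf_model *b, unsigned int n)
{
    assert(n <= b->len);
    b->data += n;
    b->len -= n;
    return b->data;
}
```

Reserving headroom first is what lets each lower layer prepend its header with a push instead of copying the payload.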
Defined in <net/core/skbuff.c>
…
struct net_device{
char name[IFNAMSIZ];
int ifindex;
…
struct net_device *next;
struct hlist_node name_hlist;
struct hlist_node index_hlist;
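The next pointer and the two hash nodes suggest three ways of reaching a device: walking the global list, hashing its name, or hashing its ifindex. The following userspace sketch illustrates that idea under stated assumptions: plain singly linked chains replace hlist_node, the hash functions and bucket count are arbitrary, and all `_model` names are hypothetical. In the kernel, the corresponding lookups are done by dev_get_by_name() and dev_get_by_index().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ_MODEL 16
#define HASH_BUCKETS   8

struct netdev_model {
    char name[IFNAMSIZ_MODEL];
    int ifindex;
    struct netdev_model *next;       /* global device list */
    struct netdev_model *name_next;  /* chain in the name hash bucket */
    struct netdev_model *index_next; /* chain in the index hash bucket */
};

static struct netdev_model *dev_list;
static struct netdev_model *name_hash[HASH_BUCKETS];
static struct netdev_model *index_hash[HASH_BUCKETS];

/* Arbitrary string hash chosen for the sketch. */
static unsigned int hash_name_model(const char *name)
{
    unsigned int h = 0;

    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % HASH_BUCKETS;
}

static unsigned int hash_index_model(int ifindex)
{
    return (unsigned int)ifindex % HASH_BUCKETS;
}

/* Link the device into the list and into both hash tables. */
static void netdev_register_model(struct netdev_model *dev)
{
    unsigned int n = hash_name_model(dev->name);
    unsigned int i = hash_index_model(dev->ifindex);

    assert(dev->name[0] != '\0');
    dev->next = dev_list;
    dev_list = dev;
    dev->name_next = name_hash[n];
    name_hash[n] = dev;
    dev->index_next = index_hash[i];
    index_hash[i] = dev;
}

static struct netdev_model *netdev_by_name_model(const char *name)
{
    struct netdev_model *d = name_hash[hash_name_model(name)];

    while (d != NULL && strcmp(d->name, name) != 0)
        d = d->name_next;
    return d;
}

static struct netdev_model *netdev_by_index_model(int ifindex)
{
    struct netdev_model *d = index_hash[hash_index_model(ifindex)];

    while (d != NULL && d->ifindex != ifindex)
        d = d->index_next;
    return d;
}
```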
The packet is not processed in the interrupt handler itself.
netif_rx(): queues the packet and raises the network receive softirq (NET_RX_SOFTIRQ).
net_rx_action() is then called and starts processing the packet.
Processing of the packet starts with the protocol switching section.
netif_receive_skb() is called to process the packet and find the next protocol layer.
Protocol family of the packet is extracted from the link layer header.
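The split between interrupt time and softirq time, followed by protocol switching, can be modelled in userspace as a queue plus a dispatch on the link-layer protocol. Everything here is a sketch under assumptions: the ring buffer, the counters and the `_model` names are hypothetical, and only the two EtherType values (0x0800 for IP, 0x0806 for ARP) are real constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP_MODEL  0x0800 /* EtherType for IPv4 */
#define ETH_P_ARP_MODEL 0x0806 /* EtherType for ARP */
#define QUEUE_LEN       16     /* power of two, so index wrap-around is safe */

struct pkt_model {
    uint16_t protocol; /* taken from the link-layer header */
};

static struct pkt_model *backlog[QUEUE_LEN];
static unsigned int q_head, q_tail;
static int softirq_pending;
static int ip_count, arp_count, dropped;

/* Interrupt side, in the role of netif_rx(): only queue the packet
 * and mark the receive softirq as pending. */
static int rx_enqueue_model(struct pkt_model *p)
{
    assert(p != NULL);
    if (q_tail - q_head == QUEUE_LEN)
        return -1; /* queue full: the packet is dropped */
    backlog[q_tail++ % QUEUE_LEN] = p;
    softirq_pending = 1;
    return 0;
}

static void ip_handler_model(struct pkt_model *p)  { (void)p; ip_count++; }
static void arp_handler_model(struct pkt_model *p) { (void)p; arp_count++; }

/* In the role of netif_receive_skb(): choose the next layer by protocol. */
static void receive_model(struct pkt_model *p)
{
    switch (p->protocol) {
    case ETH_P_IP_MODEL:
        ip_handler_model(p);
        break;
    case ETH_P_ARP_MODEL:
        arp_handler_model(p);
        break;
    default:
        dropped++; /* no handler registered for this protocol */
        break;
    }
}

/* Softirq side, in the role of net_rx_action(): drain the queue. */
static void rx_action_model(void)
{
    while (q_head != q_tail)
        receive_model(backlog[q_head++ % QUEUE_LEN]);
    softirq_pending = 0;
}
```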
ip_rcv() is the entry point for IP packet processing.
It drops packets destined for some other host (packet type PACKET_OTHERHOST).
It verifies the header checksum by calling ip_fast_csum().
It calls ip_route_input(), which looks the packet up in the kernel routing cache rt_hash_table.
If the packet needs to be forwarded, the input routine is ip_forward();
otherwise it is ip_local_deliver().
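The checks above can be sketched with a portable version of the header checksum. ip_fast_csum() itself is architecture-specific, so the function below is a stand-in that computes the same ones'-complement sum byte by byte; the verdict enum, the `other_host` and `dst_is_local` parameters (which replace the packet-type test and the route lookup), and all `_model` names are assumptions of this sketch. A header whose stored checksum is correct sums to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement checksum over an IPv4 header of `ihl` 32-bit words.
 * Returns 0 when the header, including its checksum field, is valid. */
static uint16_t ip_csum_model(const uint8_t *hdr, unsigned int ihl)
{
    uint32_t sum = 0;

    assert(hdr != NULL);
    for (unsigned int i = 0; i < ihl * 4; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16) /* fold the carries back into 16 bits */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

enum verdict_model { DROP_MODEL, FORWARD_MODEL, LOCAL_DELIVER_MODEL };

/* Order of checks as listed in the text: foreign packets and bad
 * checksums are dropped, then the routing result picks the input routine. */
static enum verdict_model ip_input_model(int other_host, const uint8_t *hdr,
                                         unsigned int ihl, int dst_is_local)
{
    if (other_host)
        return DROP_MODEL;
    if (ihl < 5 || ip_csum_model(hdr, ihl) != 0)
        return DROP_MODEL;
    return dst_is_local ? LOCAL_DELIVER_MODEL : FORWARD_MODEL;
}
```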
ip_send() is called to check whether the packet needs to be fragmented.
If so, the packet is fragmented by calling ip_fragment().
Packet output path: ip_finish_output()
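The fragmentation decision and the resulting fragment layout can be illustrated with a short calculation. This is a sketch of the arithmetic only, not of ip_fragment() itself: the `frag_model` type and both helper names are hypothetical. It relies on two facts of IPv4: a packet is fragmented when it is larger than the MTU, and every fragment except the last must carry a multiple of 8 payload bytes, because the offset field counts 8-byte units.

```c
#include <assert.h>
#include <stddef.h>

struct frag_model {
    unsigned int offset8; /* fragment offset, in 8-byte units */
    unsigned int len;     /* payload bytes carried by this fragment */
    int more;             /* 1 if more fragments follow (MF flag) */
};

/* A packet must be fragmented when it does not fit in the MTU. */
static int needs_fragment_model(unsigned int total_len, unsigned int mtu)
{
    return total_len > mtu;
}

/* Plan the fragments for `payload_len` bytes of IP payload.
 * Returns the number of fragments, or -1 if the plan does not fit. */
static int plan_fragments_model(unsigned int payload_len, unsigned int mtu,
                                unsigned int hdr_len,
                                struct frag_model *out, int max)
{
    unsigned int per, off = 0;
    int n = 0;

    assert(out != NULL);
    if (mtu <= hdr_len)
        return -1;
    per = (mtu - hdr_len) & ~7u; /* round down to a multiple of 8 */
    if (per == 0)
        return -1;

    while (off < payload_len) {
        unsigned int left = payload_len - off;
        unsigned int len = left > per ? per : left;

        if (n == max)
            return -1;
        out[n].offset8 = off / 8;
        out[n].len = len;
        out[n].more = (off + len < payload_len);
        off += len;
        n++;
    }
    return n;
}
```

For example, 4000 payload bytes with a 20-byte header and an MTU of 1500 give three fragments of 1480, 1480 and 1040 bytes.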
ip_local_deliver(): handles packets that need to be delivered locally.
ip_defrag(): reassembles fragmented packets.
The next protocol is identified by the protocol field in the IP header, skb->nh.iph->protocol.
For TCP, the receive handler is tcp_v4_rcv() (the entry point for the TCP layer).
__tcp_v4_lookup(): finds the socket to which the packet belongs.
Established sockets are maintained in the hash table tcp_ehash.
If no established socket is found, the packet may be a new connection request for a listening socket.
Search for a listening socket: tcp_v4_lookup_listener()
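The two-stage lookup can be sketched as follows: first an exact match on the connection 4-tuple in a hash table of established sockets, then a fallback search among listening sockets on the destination port. This is a userspace model under assumptions: the hash function, the table size, the use of address 0 as a wildcard, and all `_model` names are hypothetical and do not reproduce the kernel's tcp_ehash layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EHASH_SIZE 16

struct sock_model {
    uint32_t laddr, raddr; /* local and remote address */
    uint16_t lport, rport; /* local and remote port */
    struct sock_model *next;
};

static struct sock_model *ehash[EHASH_SIZE]; /* established sockets */
static struct sock_model *listeners;         /* listening sockets */

/* Arbitrary 4-tuple hash chosen for the sketch. */
static unsigned int ehash_fn_model(uint32_t laddr, uint16_t lport,
                                   uint32_t raddr, uint16_t rport)
{
    uint32_t h = laddr ^ raddr ^ ((uint32_t)lport << 16) ^ rport;

    h ^= h >> 16;
    h ^= h >> 8;
    return h % EHASH_SIZE;
}

static void add_established_model(struct sock_model *sk)
{
    unsigned int b = ehash_fn_model(sk->laddr, sk->lport, sk->raddr, sk->rport);

    sk->next = ehash[b];
    ehash[b] = sk;
}

static void add_listener_model(struct sock_model *sk)
{
    sk->next = listeners;
    listeners = sk;
}

/* Stage 1: exact 4-tuple match. Stage 2: listener on the local port,
 * bound either to the wildcard address (0) or to the local address. */
static struct sock_model *sock_lookup_model(uint32_t laddr, uint16_t lport,
                                            uint32_t raddr, uint16_t rport)
{
    struct sock_model *sk;

    for (sk = ehash[ehash_fn_model(laddr, lport, raddr, rport)];
         sk != NULL; sk = sk->next) {
        if (sk->laddr == laddr && sk->lport == lport &&
            sk->raddr == raddr && sk->rport == rport)
            return sk;
    }
    for (sk = listeners; sk != NULL; sk = sk->next) {
        if (sk->lport == lport && (sk->laddr == 0 || sk->laddr == laddr))
            return sk;
    }
    return NULL;
}
```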
tcp_rcv_established()
The application reads data from the receive queue when it issues recv().
The kernel routine that reads data from a TCP socket is tcp_recvmsg().