This document describes the Linux kernel's data structure APIs for linked lists, including functions for adding, removing, and traversing nodes. It covers doubly linked lists, list operations such as splicing and rotation, and common list macros.
The Linux Kernel API ii This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA For more details see the le COPYING in the source distribution of Linux. The Linux Kernel API iii COLLABORATORS TITLE : The Linux Kernel API ACTION NAME DATE SIGNATURE WRITTEN BY September 24, 2014 REVISION HISTORY NUMBER DATE DESCRIPTION NAME The Linux Kernel API iv Contents 1 Data Types 1 1.1 Doubly Linked Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 list_add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.2 list_add_tail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.3 __list_del_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.4 list_replace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.5 list_del_init . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.6 list_move . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.7 list_move_tail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.8 list_is_last . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.9 list_empty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.10 list_empty_careful . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.11 list_rotate_left . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.12 list_is_singular . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.13 list_cut_position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1.14 list_splice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1.15 list_splice_tail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1.16 list_splice_init . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.17 list_splice_tail_init . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.18 list_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.19 list_rst_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.20 list_last_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.21 list_rst_entry_or_null . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.22 list_next_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.1.23 list_prev_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.1.24 list_for_each . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . 8 1.1.25 list_for_each_prev . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.1.26 list_for_each_safe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.1.27 list_for_each_prev_safe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.1.28 list_for_each_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 The Linux Kernel API v 1.1.29 list_for_each_entry_reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.1.30 list_prepare_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.1.31 list_for_each_entry_continue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.1.32 list_for_each_entry_continue_reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.1.33 list_for_each_entry_from . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.1.34 list_for_each_entry_safe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.1.35 list_for_each_entry_safe_continue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.1.36 list_for_each_entry_safe_from . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.1.37 list_for_each_entry_safe_reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.1.38 list_safe_reset_next . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.1.39 hlist_for_each_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.1.40 hlist_for_each_entry_continue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
14 1.1.41 hlist_for_each_entry_from . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.1.42 hlist_for_each_entry_safe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2 Basic C Library Functions 16 2.1 String Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.1 simple_strtoull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.2 simple_strtoul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.3 simple_strtol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.1.4 simple_strtoll . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.1.5 vsnprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.1.6 vscnprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.1.7 snprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.1.8 scnprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.1.9 vsprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.10 sprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.11 vbin_printf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.1.12 bstr_printf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.1.13 bprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
22 2.1.14 vsscanf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.1.15 sscanf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.1.16 kstrtol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.1.17 kstrtoul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.1.18 kstrtoull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.1.19 kstrtoll . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.1.20 kstrtouint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.1.21 kstrtoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.2 String Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 The Linux Kernel API vi 2.2.1 strnicmp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.2.2 strcpy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.2.3 strncpy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.2.4 strlcpy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.2.5 strcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.2.6 strncat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.2.7 strlcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
28 2.2.8 strcmp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.2.9 strncmp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.2.10 strchr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.2.11 strchrnul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.2.12 strrchr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.2.13 strnchr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.2.14 skip_spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.2.15 strim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.2.16 strlen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.2.17 strnlen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.2.18 strspn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.2.19 strcspn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.2.20 strpbrk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.2.21 strsep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.2.22 sysfs_streq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.2.23 strtobool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.2.24 memset . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.2.25 memcpy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.2.26 memmove . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.2.27 memcmp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.2.28 memscan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.2.29 strstr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.2.30 strnstr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.2.31 memchr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.2.32 memchr_inv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.3 Bit Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.3.1 set_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.3.2 __set_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.3.3 clear_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.3.4 __change_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.3.5 change_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.3.6 test_and_set_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 The Linux Kernel API vii 2.3.7 test_and_set_bit_lock . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . 39 2.3.8 __test_and_set_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.3.9 test_and_clear_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.3.10 __test_and_clear_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.3.11 test_and_change_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.3.12 test_bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.3.13 __ffs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.3.14 ffz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.3.15 ffs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 2.3.16 s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 2.3.17 s64 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3 Basic Kernel Library Functions 44 3.1 Bitmap Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.1.1 __bitmap_shift_right . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.1.2 __bitmap_shift_left . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.1.3 bitmap_scnprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.1.4 __bitmap_parse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.1.5 bitmap_parse_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . 46 3.1.6 bitmap_scnlistprintf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.1.7 bitmap_parselist_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.1.8 bitmap_remap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.1.9 bitmap_bitremap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.1.10 bitmap_onto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.1.11 bitmap_fold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.1.12 bitmap_nd_free_region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.1.13 bitmap_release_region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.1.14 bitmap_allocate_region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.1.15 bitmap_copy_le . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.1.16 __bitmap_parselist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.1.17 bitmap_pos_to_ord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.1.18 bitmap_ord_to_pos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.2 Command-line Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.2.1 get_option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.2.2 get_options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2.3 memparse . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.3 CRC Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.3.1 crc7_be . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.3.2 crc16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 The Linux Kernel API viii 3.3.3 crc_itu_t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.3.4 .//lib/crc32.c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.3.5 crc_ccitt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.4 idr/ida Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.1 idr_preload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.2 idr_alloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.3 idr_alloc_cyclic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.4.4 idr_remove . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.4.5 idr_destroy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 3.4.6 idr_for_each . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 3.4.7 idr_get_next . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.4.8 idr_replace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.4.9 idr_init . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . 60 3.4.10 ida_pre_get . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.4.11 ida_get_new_above . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.4.12 ida_remove . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.4.13 ida_destroy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.4.14 ida_simple_get . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.4.15 ida_simple_remove . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.4.16 ida_init . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4 Memory Management in Linux 64 4.1 The Slab Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.1.1 kmalloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.1.2 kmalloc_array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.1.3 kcalloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.1.4 kzalloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.1.5 kzalloc_node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.1.6 kmem_cache_alloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.1.7 kmem_cache_alloc_node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.1.8 kmem_cache_free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . 67 4.1.9 kfree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.1.10 ksize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.1.11 kstrdup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.1.12 kstrndup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.1.13 kmemdup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.1.14 memdup_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.1.15 __krealloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.1.16 krealloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 The Linux Kernel API ix 4.1.17 kzfree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.1.18 get_user_pages_fast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.2 User Space Memory Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.2.1 __copy_to_user_inatomic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.2.2 __copy_to_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2.3 __copy_from_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2.4 clear_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.2.5 __clear_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.2.6 _copy_to_user . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.2.7 _copy_from_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.3 More Memory Management Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.3.1 read_cache_pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.3.2 page_cache_sync_readahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.3.3 page_cache_async_readahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.3.4 delete_from_page_cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.3.5 lemap_ush . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.3.6 lemap_fdatawait_range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.3.7 lemap_fdatawait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.3.8 lemap_write_and_wait_range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.3.9 replace_page_cache_page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.3.10 add_to_page_cache_locked . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.3.11 add_page_wait_queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.3.12 unlock_page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.3.13 end_page_writeback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.3.14 __lock_page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
80 4.3.15 page_cache_next_hole . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.3.16 page_cache_prev_hole . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.3.17 nd_get_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.3.18 nd_lock_entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.3.19 pagecache_get_page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.3.20 nd_get_pages_contig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 4.3.21 nd_get_pages_tag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 4.3.22 generic_le_read_iter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.3.23 lemap_fault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.3.24 read_cache_page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.3.25 read_cache_page_gfp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.3.26 __generic_le_write_iter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.3.27 generic_le_write_iter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.3.28 try_to_release_page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 The Linux Kernel API x 4.3.29 zap_vma_ptes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 4.3.30 vm_insert_page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 4.3.31 vm_insert_pfn . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.3.32 remap_pfn_range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.3.33 vm_iomap_memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.3.34 unmap_mapping_range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.3.35 follow_pfn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.3.36 vm_unmap_aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.3.37 vm_unmap_ram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.3.38 vm_map_ram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.3.39 unmap_kernel_range_noush . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.3.40 unmap_kernel_range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.3.41 vfree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.3.42 vunmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.3.43 vmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.3.44 vmalloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.3.45 vzalloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.3.46 vmalloc_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.3.47 vmalloc_node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.3.48 vzalloc_node . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.3.49 vmalloc_32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.3.50 vmalloc_32_user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.3.51 remap_vmalloc_range_partial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4.3.52 remap_vmalloc_range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4.3.53 alloc_vm_area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.3.54 nr_free_zone_pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.3.55 nr_free_pagecache_pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.3.56 nd_next_best_node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.3.57 free_bootmem_with_active_regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.3.58 sparse_memory_present_with_active_regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.3.59 get_pfn_range_for_nid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.3.60 absent_pages_in_range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.3.61 node_map_pfn_alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.3.62 nd_min_pfn_with_active_regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.3.63 free_area_init_nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.3.64 set_dma_reserve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
4.3.65 setup_per_zone_wmarks
4.3.66 get_pfnblock_flags_mask
4.3.67 set_pfnblock_flags_mask
4.3.68 alloc_contig_range
4.3.69 mempool_destroy
4.3.70 mempool_create
4.3.71 mempool_resize
4.3.72 mempool_alloc
4.3.73 mempool_free
4.3.74 dma_pool_create
4.3.75 dma_pool_destroy
4.3.76 dma_pool_alloc
4.3.77 dma_pool_free
4.3.78 dmam_pool_create
4.3.79 dmam_pool_destroy
4.3.80 balance_dirty_pages_ratelimited
4.3.81 tag_pages_for_writeback
4.3.82 write_cache_pages
4.3.83 generic_writepages
4.3.84 write_one_page
4.3.85 wait_for_stable_page
4.3.86 truncate_inode_pages_range
4.3.87 truncate_inode_pages
4.3.88 truncate_inode_pages_final
4.3.89 invalidate_mapping_pages
4.3.90 invalidate_inode_pages2_range
4.3.91 invalidate_inode_pages2
4.3.92 truncate_pagecache
4.3.93 truncate_setsize
4.3.94 truncate_pagecache_range
5 Kernel IPC facilities
5.1 IPC utilities
5.1.1 ipc_init
5.1.2 ipc_init_ids
5.1.3 ipc_init_proc_interface
5.1.4 ipc_findkey
5.1.5 ipc_get_maxid
5.1.6 ipc_addid
5.1.7 ipcget_new
5.1.8 ipc_check_perms
5.1.9 ipcget_public
5.1.10 ipc_rmid
5.1.11 ipc_alloc
5.1.12 ipc_free
5.1.13 ipc_rcu_alloc
5.1.14 ipcperms
5.1.15 kernel_to_ipc64_perm
5.1.16 ipc64_perm_to_ipc_perm
5.1.17 ipc_obtain_object
5.1.18 ipc_lock
5.1.19 ipc_obtain_object_check
5.1.20 ipcget
5.1.21 ipc_update_perm
5.1.22 ipcctl_pre_down_nolock
5.1.23 ipc_parse_version
6 FIFO Buffer
6.1 kfifo interface
6.1.1 DECLARE_KFIFO_PTR
6.1.2 DECLARE_KFIFO
6.1.3 INIT_KFIFO
6.1.4 DEFINE_KFIFO
6.1.5 kfifo_initialized
6.1.6 kfifo_esize
6.1.7 kfifo_recsize
6.1.8 kfifo_size
6.1.9 kfifo_reset
6.1.10 kfifo_reset_out
6.1.11 kfifo_len
6.1.12 kfifo_is_empty
6.1.13 kfifo_is_full
6.1.14 kfifo_avail
6.1.15 kfifo_skip
6.1.16 kfifo_peek_len
6.1.17 kfifo_alloc
6.1.18 kfifo_free
6.1.19 kfifo_init
6.1.20 kfifo_put
6.1.21 kfifo_get
6.1.22 kfifo_peek
6.1.23 kfifo_in
6.1.24 kfifo_in_spinlocked
6.1.25 kfifo_out
6.1.26 kfifo_out_spinlocked
6.1.27 kfifo_from_user
6.1.28 kfifo_to_user
6.1.29 kfifo_dma_in_prepare
6.1.30 kfifo_dma_in_finish
6.1.31 kfifo_dma_out_prepare
6.1.32 kfifo_dma_out_finish
6.1.33 kfifo_out_peek
7 relay interface support
7.1 relay interface
7.1.1 relay_buf_full
7.1.2 relay_reset
7.1.3 relay_open
7.1.4 relay_switch_subbuf
7.1.5 relay_subbufs_consumed
7.1.6 relay_close
7.1.7 relay_flush
7.1.8 relay_mmap_buf
7.1.9 relay_alloc_buf
7.1.10 relay_create_buf
7.1.11 relay_destroy_channel
7.1.12 relay_destroy_buf
7.1.13 relay_remove_buf
7.1.14 relay_buf_empty
7.1.15 wakeup_readers
7.1.16 __relay_reset
7.1.17 relay_close_buf
7.1.18 relay_hotcpu_callback
7.1.19 relay_late_setup_files
7.1.20 relay_file_open
7.1.21 relay_file_mmap
7.1.22 relay_file_poll
7.1.23 relay_file_release
7.1.24 relay_file_read_subbuf_avail
7.1.25 relay_file_read_start_pos
7.1.26 relay_file_read_end_pos
8 Module Support
8.1 Module Loading
8.1.1 __request_module
8.1.2 call_usermodehelper_setup
8.1.3 call_usermodehelper_exec
8.1.4 call_usermodehelper
8.2 Inter Module support
9 Hardware Interfaces
9.1 Interrupt Handling
9.1.1 synchronize_hardirq
9.1.2 synchronize_irq
9.1.3 irq_set_affinity_notifier
9.1.4 disable_irq_nosync
9.1.5 disable_irq
9.1.6 enable_irq
9.1.7 irq_set_irq_wake
9.1.8 irq_wake_thread
9.1.9 setup_irq
9.1.10 remove_irq
9.1.11 free_irq
9.1.12 request_threaded_irq
9.1.13 request_any_context_irq
9.2 DMA Channels
9.2.1 request_dma
9.2.2 free_dma
9.3 Resources Management
9.3.1 request_resource_conflict
9.3.2 reallocate_resource
9.3.3 lookup_resource
9.3.4 insert_resource_conflict
9.3.5 insert_resource
9.3.6 insert_resource_expand_to_fit
9.3.7 resource_alignment
9.3.8 release_mem_region_adjustable
9.3.9 request_resource
9.3.10 release_resource
9.3.11 allocate_resource
9.3.12 adjust_resource
9.3.13 __request_region
9.3.14 __check_region
9.3.15 __release_region
9.4 MTRR Handling
9.4.1 mtrr_add
9.4.2 mtrr_del
9.4.3 arch_phys_wc_add
9.5 PCI Support Library
9.5.1 pci_bus_max_busnr
9.5.2 pci_find_capability
9.5.3 pci_bus_find_capability
9.5.4 pci_find_next_ext_capability
9.5.5 pci_find_ext_capability
9.5.6 pci_find_next_ht_capability
9.5.7 pci_find_ht_capability
9.5.8 pci_find_parent_resource
9.5.9 __pci_complete_power_transition
9.5.10 pci_set_power_state
9.5.11 pci_choose_state
9.5.12 pci_save_state
9.5.13 pci_restore_state
9.5.14 pci_store_saved_state
9.5.15 pci_load_and_free_saved_state
9.5.16 pci_reenable_device
9.5.17 pci_enable_device_io
9.5.18 pci_enable_device_mem
9.5.19 pci_enable_device
9.5.20 pcim_enable_device
9.5.21 pcim_pin_device
9.5.22 pci_disable_device
9.5.23 pci_set_pcie_reset_state
9.5.24 pci_pme_capable
9.5.25 pci_pme_active
9.5.26 __pci_enable_wake
9.5.27 pci_wake_from_d3
9.5.28 pci_prepare_to_sleep
9.5.29 pci_back_from_sleep
9.5.30 pci_dev_run_wake
9.5.31 pci_release_region
9.5.32 pci_request_region
9.5.33 pci_request_region_exclusive
9.5.34 pci_release_selected_regions
9.5.35 pci_request_selected_regions
9.5.36 pci_release_regions
9.5.37 pci_request_regions
9.5.38 pci_request_regions_exclusive
9.5.39 pci_set_master
9.5.40 pci_clear_master
9.5.41 pci_set_cacheline_size
9.5.42 pci_set_mwi
9.5.43 pci_try_set_mwi
9.5.44 pci_clear_mwi
9.5.45 pci_intx
9.5.46 pci_intx_mask_supported
9.5.47 pci_check_and_mask_intx
9.5.48 pci_check_and_unmask_intx
9.5.49 pci_msi_off
9.5.50 pci_wait_for_pending_transaction
9.5.51 pci_reset_bridge_secondary_bus
9.5.52 __pci_reset_function
9.5.53 __pci_reset_function_locked
9.5.54 pci_reset_function
9.5.55 pci_try_reset_function
9.5.56 pci_probe_reset_slot
9.5.57 pci_reset_slot
9.5.58 pci_try_reset_slot
9.5.59 pci_probe_reset_bus
9.5.60 pci_reset_bus
9.5.61 pci_try_reset_bus
9.5.62 pcix_get_max_mmrbc
9.5.63 pcix_get_mmrbc
9.5.64 pcix_set_mmrbc
9.5.65 pcie_get_readrq
9.5.66 pcie_set_readrq
9.5.67 pcie_get_mps
9.5.68 pcie_set_mps
9.5.69 pcie_get_minimum_link
9.5.70 pci_select_bars
9.5.71 pci_add_dynid
9.5.72 pci_match_id
9.5.73 __pci_register_driver
9.5.74 pci_unregister_driver
9.5.75 pci_dev_driver
9.5.76 pci_dev_get
9.5.77 pci_dev_put
9.5.78 pci_stop_and_remove_bus_device
9.5.79 pci_find_bus
9.5.80 pci_find_next_bus
9.5.81 pci_get_slot
9.5.82 pci_get_domain_bus_and_slot
9.5.83 pci_get_subsys
9.5.84 pci_get_device
9.5.85 pci_get_class
9.5.86 pci_dev_present
9.5.87 pci_msi_vec_count
9.5.88 pci_msix_vec_count
9.5.89 pci_enable_msix
9.5.90 pci_msi_enabled
9.5.91 pci_enable_msi_range
9.5.92 pci_enable_msix_range
9.5.93 pci_bus_alloc_resource
9.5.94 pci_bus_add_device
9.5.95 pci_bus_add_devices
9.5.96 pci_bus_set_ops
9.5.97 pci_read_vpd
9.5.98 pci_write_vpd
9.5.99 pci_cfg_access_lock
9.5.100 pci_cfg_access_trylock
9.5.101 pci_cfg_access_unlock
9.5.102 pci_lost_interrupt
9.5.103 __ht_create_irq
9.5.104 ht_create_irq
9.5.105 ht_destroy_irq
9.5.106 pci_scan_slot
9.5.107 pci_rescan_bus
9.5.108 pci_create_slot
9.5.109 pci_destroy_slot
9.5.110 pci_hp_create_module_link
9.5.111 pci_hp_remove_module_link
9.5.112 pci_enable_rom
9.5.113 pci_disable_rom
9.5.114 pci_map_rom
9.5.115 pci_unmap_rom
9.5.116 pci_platform_rom
9.5.117 pci_enable_sriov
9.5.118 pci_disable_sriov
9.5.119 pci_num_vf
9.5.120 pci_vfs_assigned
9.5.121 pci_sriov_set_totalvfs
9.5.122 pci_sriov_get_totalvfs
9.5.123 pci_read_legacy_io
9.5.124 pci_write_legacy_io
9.5.125 pci_mmap_legacy_mem
9.5.126 pci_mmap_legacy_io
9.5.127 pci_adjust_legacy_attr
9.5.128 pci_create_legacy_files
9.5.129 pci_mmap_resource
9.5.130 pci_remove_resource_files
9.5.131 pci_create_resource_files
9.5.132 pci_write_rom
9.5.133 pci_read_rom
9.5.134 pci_remove_sysfs_dev_files
9.6 PCI Hotplug Support Library
9.6.1 __pci_hp_register
9.6.2 pci_hp_deregister
9.6.3 pci_hp_change_slot_info
10 Firmware Interfaces
10.1 DMI Interfaces
10.1.1 dmi_check_system
10.1.2 dmi_first_match
10.1.3 dmi_get_system_info
10.1.4 dmi_name_in_vendors
10.1.5 dmi_find_device
10.1.6 dmi_get_date
10.1.7 dmi_walk
10.1.8 dmi_match
10.2 EDD Interfaces
10.2.1 edd_show_raw_data
10.2.2 edd_release
10.2.3 edd_dev_is_type
10.2.4 edd_get_pci_dev
10.2.5 edd_init
11 Security Framework
11.1 security_init
11.2 security_module_enable
11.3 register_security
11.4 securityfs_create_file
11.5 securityfs_create_dir
11.6 securityfs_remove
12 Audit Interfaces
12.1 audit_log_start
12.2 audit_log_format
12.3 audit_log_end
12.4 audit_log
12.5 audit_log_secctx
12.6 audit_alloc
12.7 __audit_free
12.8 __audit_syscall_entry
12.9 __audit_syscall_exit
12.10 __audit_reusename
12.11 __audit_getname
12.12 __audit_inode
12.13 auditsc_get_stamp
12.14 audit_set_loginuid
12.15 __audit_mq_open
12.16 __audit_mq_sendrecv
12.17 __audit_mq_notify
12.18 __audit_mq_getsetattr
12.19 __audit_ipc_obj
12.20 __audit_ipc_set_perm
12.21 __audit_socketcall
12.22 __audit_fd_pair
12.23 __audit_sockaddr
12.24 __audit_signal_info
12.25 __audit_log_bprm_fcaps
12.26 __audit_log_capset
12.27 audit_core_dumps
12.28 audit_rule_change
12.29 audit_list_rules_send
12.30 parent_len
12.31 audit_compare_dname_path
13 Accounting Framework
13.1 sys_acct
13.2 acct_auto_close_mnt
13.3 acct_auto_close
13.4 acct_collect
13.5 acct_process
14 Block Devices
14.1 blk_get_backing_dev_info
14.2 blk_delay_queue
14.3 blk_start_queue
14.4 blk_stop_queue
14.5 blk_sync_queue
14.6 __blk_run_queue
14.7 blk_run_queue_async
14.8 blk_run_queue
14.9 blk_queue_bypass_start
14.10 blk_queue_bypass_end
14.11 blk_cleanup_queue
14.12 blk_init_queue
14.13 blk_make_request
14.14 blk_rq_set_block_pc
14.15 blk_requeue_request
14.16 part_round_stats
14.17 blk_add_request_payload
14.18 generic_make_request
14.19 submit_bio
14.20 blk_rq_check_limits
14.21 blk_insert_cloned_request
14.22 blk_rq_err_bytes
14.23 blk_peek_request
14.24 blk_start_request
14.25 blk_fetch_request
14.26 blk_update_request
14.27 blk_unprep_request
14.28 blk_end_request
14.29 blk_end_request_all
14.30 blk_end_request_cur
14.31 blk_end_request_err
14.32 __blk_end_request
14.33 __blk_end_request_all
14.34 __blk_end_request_cur
14.35 __blk_end_request_err
14.36 rq_flush_dcache_pages
14.37 blk_lld_busy
14.38 blk_rq_unprep_clone
14.39 blk_rq_prep_clone
14.40 blk_start_plug
14.41 blk_pm_runtime_init
14.42 blk_pre_runtime_suspend
14.43 blk_post_runtime_suspend
14.44 blk_pre_runtime_resume
14.45 blk_post_runtime_resume
14.46 __blk_run_queue_uncond
14.47 __blk_drain_queue
14.48 rq_ioc
14.49 __get_request
14.50 get_request
14.51 blk_attempt_plug_merge
14.52 blk_end_bidi_request
14.53 __blk_end_bidi_request
14.54 blk_rq_map_user
14.55 blk_rq_map_user_iov
14.56 blk_rq_unmap_user
14.57 blk_rq_map_kern
14.58 blk_release_queue
14.59 blk_queue_prep_rq
14.60 blk_queue_unprep_rq
14.61 blk_queue_merge_bvec
14.62 blk_set_default_limits
14.63 blk_set_stacking_limits
14.64 blk_queue_make_request
14.65 blk_queue_bounce_limit
14.66 blk_limits_max_hw_sectors
14.67 blk_queue_max_hw_sectors
14.68 blk_queue_chunk_sectors
14.69 blk_queue_max_discard_sectors
14.70 blk_queue_max_write_same_sectors
14.71 blk_queue_max_segments
14.72 blk_queue_max_segment_size
14.73 blk_queue_logical_block_size
14.74 blk_queue_physical_block_size
14.75 blk_queue_alignment_offset
14.76 blk_limits_io_min
14.77 blk_queue_io_min
14.78 blk_limits_io_opt
14.79 blk_queue_io_opt
14.80 blk_queue_stack_limits
14.81 blk_stack_limits
14.82 bdev_stack_limits
14.83 disk_stack_limits
14.84 blk_queue_dma_pad
14.85 blk_queue_update_dma_pad
14.86 blk_queue_dma_drain
14.87 blk_queue_segment_boundary
14.88 blk_queue_dma_alignment
14.89 blk_queue_update_dma_alignment
14.90 blk_queue_flush
14.91 blk_execute_rq_nowait
14.92 blk_execute_rq
14.93 blkdev_issue_flush
14.94 blkdev_issue_discard
14.95 blkdev_issue_write_same
14.96 blkdev_issue_zeroout
14.97 blk_queue_find_tag
14.98 blk_free_tags
14.99 blk_queue_free_tags
14.100 blk_init_tags
14.101 blk_queue_init_tags
14.102 blk_queue_resize_tags
14.103 blk_queue_end_tag
14.104 blk_queue_start_tag
14.105 blk_queue_invalidate_tags
14.106 __blk_queue_free_tags
14.107 blk_rq_count_integrity_sg
14.108 blk_rq_map_integrity_sg
14.109 blk_integrity_compare
14.110 blk_integrity_register
14.111 blk_integrity_unregister
14.112 blk_trace_ioctl
14.113 blk_trace_shutdown
14.114 blk_add_trace_rq
14.115 blk_add_trace_bio
14.116 blk_add_trace_bio_remap
14.117 blk_add_trace_rq_remap
14.118 blk_mangle_minor
14.119 blk_alloc_devt
14.120 blk_free_devt
14.121 disk_replace_part_tbl
14.122 disk_expand_part_tbl
14.123 disk_block_events
14.124 disk_unblock_events
14.125 disk_flush_events
14.126 disk_clear_events
14.127 disk_get_part
14.128 disk_part_iter_init
14.129 disk_part_iter_next
14.130 disk_part_iter_exit
14.131 disk_map_sector_rcu
14.132 register_blkdev
14.133 add_disk
14.134 get_gendisk
14.135 bdget_disk
15 Char devices
15.1 register_chrdev_region
15.2 alloc_chrdev_region
15.3 __register_chrdev
15.4 unregister_chrdev_region
15.5 __unregister_chrdev
15.6 cdev_add
15.7 cdev_del
15.8 cdev_alloc
15.9 cdev_init
16 Miscellaneous Devices
16.1 misc_register
16.2 misc_deregister
17 Clock Framework
17.1 struct clk_notifier
17.2 struct clk_notifier_data
17.3 clk_notifier_register
17.4 clk_notifier_unregister
17.5 clk_get_accuracy
17.6 clk_prepare
17.7 clk_unprepare
17.8 clk_get
17.9 devm_clk_get
17.10 clk_enable
17.11 clk_disable
17.12 clk_get_rate
17.13 clk_put
17.14 devm_clk_put
17.15 clk_round_rate
17.16 clk_set_rate
17.17 clk_set_parent
17.18 clk_get_parent
17.19 clk_get_sys
17.20 clk_add_alias

Chapter 1 Data Types

1.1 Doubly Linked Lists

1.1.1 list_add

list_add - add a new entry

Synopsis
    void list_add (struct list_head * new, struct list_head * head);

Arguments
    new     new entry to be added
    head    list head to add it after

Description
    Insert a new entry after the specified head. This is good for implementing stacks.

1.1.2 list_add_tail

list_add_tail - add a new entry

Synopsis
    void list_add_tail (struct list_head * new, struct list_head * head);

Arguments
    new     new entry to be added
    head    list head to add it before

Description
    Insert a new entry before the specified head. This is useful for implementing queues.

1.1.3 __list_del_entry

__list_del_entry - deletes entry from list.
Synopsis
    void __list_del_entry(struct list_head *entry);

Arguments
    entry  the element to delete from the list.

Note
    list_empty on entry does not return true after this; the entry is in an undefined state.

1.1.4 list_replace

list_replace - replace old entry by new one

Synopsis
    void list_replace(struct list_head *old, struct list_head *new);

Arguments
    old  the element to be replaced
    new  the new element to insert

Description
    If old was empty, it will be overwritten.

1.1.5 list_del_init

list_del_init - deletes entry from list and reinitializes it.

Synopsis
    void list_del_init(struct list_head *entry);

Arguments
    entry  the element to delete from the list.

1.1.6 list_move

list_move - delete from one list and add as another's head

Synopsis
    void list_move(struct list_head *list, struct list_head *head);

Arguments
    list  the entry to move
    head  the head that will precede our entry

1.1.7 list_move_tail

list_move_tail - delete from one list and add as another's tail

Synopsis
    void list_move_tail(struct list_head *list, struct list_head *head);

Arguments
    list  the entry to move
    head  the head that will follow our entry

1.1.8 list_is_last

list_is_last - tests whether list is the last entry in list head

Synopsis
    int list_is_last(const struct list_head *list, const struct list_head *head);

Arguments
    list  the entry to test
    head  the head of the list

1.1.9 list_empty

list_empty - tests whether a list is empty

Synopsis
    int list_empty(const struct list_head *head);

Arguments
    head  the list to test.
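
The stack and queue behaviour of list_add and list_add_tail, and the "reinitialized" state left by list_del_init, can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: struct list_head and the four list functions are simplified local stand-ins that follow the descriptions above, and container_of, demo_insert, struct item, demo_first_id and demo_del_init are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical helper: link entry between two currently adjacent nodes. */
static void demo_insert(struct list_head *entry,
                        struct list_head *prev, struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Insert right after head: the newest entry is found first (stack). */
static void list_add(struct list_head *entry, struct list_head *head)
{
    demo_insert(entry, head, head->next);
}

/* Insert right before head: the newest entry is found last (queue). */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    demo_insert(entry, head->prev, head);
}

/* Unlink entry and turn it back into a valid empty list. */
static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

struct item {
    int id;
    struct list_head link;
};

/* Insert ids 1, 2, 3 and report which id sits directly after the head. */
int demo_first_id(int as_queue)
{
    struct list_head head;
    struct item items[3] = { { .id = 1 }, { .id = 2 }, { .id = 3 } };
    int i;

    INIT_LIST_HEAD(&head);
    for (i = 0; i < 3; i++) {
        if (as_queue)
            list_add_tail(&items[i].link, &head);
        else
            list_add(&items[i].link, &head);
    }
    return container_of(head.next, struct item, link)->id;
}

/* After list_del_init, both the list and the removed entry test as empty. */
int demo_del_init(void)
{
    struct list_head head;
    struct item it = { .id = 7 };

    INIT_LIST_HEAD(&head);
    list_add(&it.link, &head);
    list_del_init(&it.link);
    return list_empty(&head) && list_empty(&it.link);
}
```

With list_add the last inserted item is first in the list, while with list_add_tail the first inserted item stays first. The second helper shows why list_del_init differs from __list_del_entry: the removed entry can safely be passed to list_empty afterwards.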
1.1.10 list_empty_careful

list_empty_careful - tests whether a list is empty and not being modified

Synopsis
    int list_empty_careful(const struct list_head *head);

Arguments
    head  the list to test

Description
    Tests whether a list is empty _and_ checks that no other CPU might be in the process of modifying either member (next or prev).

NOTE
    Using list_empty_careful without synchronization can only be safe if the only activity that can happen to the list entry is list_del_init. E.g. it cannot be used if another CPU could re-list_add it.

1.1.11 list_rotate_left

list_rotate_left - rotate the list to the left

Synopsis
    void list_rotate_left(struct list_head *head);

Arguments
    head  the head of the list

1.1.12 list_is_singular

list_is_singular - tests whether a list has just one entry.

Synopsis
    int list_is_singular(const struct list_head *head);

Arguments
    head  the list to test.

1.1.13 list_cut_position

list_cut_position - cut a list into two

Synopsis
    void list_cut_position(struct list_head *list, struct list_head *head, struct list_head *entry);

Arguments
    list   a new list to add all removed entries
    head   a list with entries
    entry  an entry within head, could be the head itself and if so we won't cut the list

Description
    This helper moves the initial part of head, up to and including entry, from head to list. You should pass on entry an element you know is on head. list should be an empty list or a list you do not care about losing its data.

1.1.14 list_splice

list_splice - join two lists, this is designed for stacks

Synopsis
    void list_splice(const struct list_head *list, struct list_head *head);

Arguments
    list  the new list to add.
    head  the place to add it in the first list.

1.1.15 list_splice_tail

list_splice_tail - join two lists, each list being a queue

Synopsis
    void list_splice_tail(struct list_head *list, struct list_head *head);

Arguments
    list  the new list to add.
    head  the place to add it in the first list.
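
The effect of list_cut_position in 1.1.13 ("up to and including entry") is easiest to see by counting entries before and after the cut. The sketch below is a userspace approximation under stated assumptions: struct list_head, INIT_LIST_HEAD and list_add_tail are simplified local stand-ins, and demo_cut_position, demo_length and demo_cut are hypothetical names. It covers only the cases described above (empty head, entry equal to head, entry on head).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Hypothetical sketch of the cut: move everything from the first element
 * of head up to and including entry onto list. Whatever list held before
 * is lost, as the description warns.
 */
static void demo_cut_position(struct list_head *list,
                              struct list_head *head,
                              struct list_head *entry)
{
    struct list_head *new_first = entry->next;

    if (head->next == head)     /* empty head: nothing to cut */
        return;
    if (entry == head) {        /* entry is the head itself: no cut */
        INIT_LIST_HEAD(list);
        return;
    }
    list->next = head->next;    /* list takes over the front part */
    list->next->prev = list;
    list->prev = entry;
    entry->next = list;
    head->next = new_first;     /* head keeps the remainder */
    new_first->prev = head;
}

static int demo_length(const struct list_head *head)
{
    const struct list_head *p;
    int n = 0;

    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Build a 4-element list, cut after the cut_after-th element (1-based). */
void demo_cut(int cut_after, int *moved, int *remaining)
{
    struct list_head head, front, nodes[4];
    int i;

    INIT_LIST_HEAD(&head);
    INIT_LIST_HEAD(&front);
    for (i = 0; i < 4; i++)
        list_add_tail(&nodes[i], &head);

    demo_cut_position(&front, &head, &nodes[cut_after - 1]);
    *moved = demo_length(&front);
    *remaining = demo_length(&head);
}
```

Cutting after the second of four elements leaves two on each list; cutting after the last element moves everything and leaves head empty.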
1.1.16 list_splice_init

list_splice_init - join two lists and reinitialise the emptied list.

Synopsis
    void list_splice_init(struct list_head *list, struct list_head *head);

Arguments
    list  the new list to add.
    head  the place to add it in the first list.

Description
    The list at list is reinitialised.

1.1.17 list_splice_tail_init

list_splice_tail_init - join two lists and reinitialise the emptied list

Synopsis
    void list_splice_tail_init(struct list_head *list, struct list_head *head);

Arguments
    list  the new list to add.
    head  the place to add it in the first list.

Description
    Each of the lists is a queue. The list at list is reinitialised.

1.1.18 list_entry

list_entry - get the struct for this entry

Synopsis
    list_entry(ptr, type, member);

Arguments
    ptr     the struct list_head pointer.
    type    the type of the struct this is embedded in.
    member  the name of the list_head within the struct.

1.1.19 list_first_entry

list_first_entry - get the first element from a list

Synopsis
    list_first_entry(ptr, type, member);

Arguments
    ptr     the list head to take the element from.
    type    the type of the struct this is embedded in.
    member  the name of the list_head within the struct.

Description
    Note that the list is expected to be non-empty.

1.1.20 list_last_entry

list_last_entry - get the last element from a list

Synopsis
    list_last_entry(ptr, type, member);

Arguments
    ptr     the list head to take the element from.
    type    the type of the struct this is embedded in.
    member  the name of the list_head within the struct.

Description
    Note that the list is expected to be non-empty.

1.1.21 list_first_entry_or_null

list_first_entry_or_null - get the first element from a list

Synopsis
    list_first_entry_or_null(ptr, type, member);

Arguments
    ptr     the list head to take the element from.
    type    the type of the struct this is embedded in.
    member  the name of the list_head within the struct.

Description
    Note that if the list is empty, it returns NULL.
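
The ptr, type and member arguments of list_entry and its relatives describe a pointer adjustment from the embedded list node back to the structure that contains it. The following userspace sketch illustrates that idea with macros built on offsetof; they are simplified stand-ins under that assumption, not the kernel definitions, and struct job, demo_entries and demo_empty_is_null are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Stand-ins for the macros above, written with offsetof only. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)
#define list_last_entry(ptr, type, member) \
    list_entry((ptr)->prev, type, member)
#define list_first_entry_or_null(ptr, type, member) \
    ((ptr)->next != (ptr) ? list_first_entry(ptr, type, member) : NULL)

struct job {
    int prio;
    struct list_head node;  /* deliberately not the first member */
};

/* Link two jobs by hand and read them back through the macros. */
void demo_entries(int *first_prio, int *last_prio)
{
    struct list_head head;
    struct job a = { .prio = 10 }, b = { .prio = 20 };

    /* head -> a -> b -> head, linked manually to keep the sketch short */
    head.next = &a.node;
    a.node.prev = &head;
    a.node.next = &b.node;
    b.node.prev = &a.node;
    b.node.next = &head;
    head.prev = &b.node;

    *first_prio = list_first_entry(&head, struct job, node)->prio;
    *last_prio = list_last_entry(&head, struct job, node)->prio;
}

/* On an empty list only the _or_null variant gives a usable answer. */
int demo_empty_is_null(void)
{
    struct list_head head = { &head, &head };

    return list_first_entry_or_null(&head, struct job, node) == NULL;
}
```

On an empty list, head.next points back at the head itself, so list_first_entry would compute a pointer to a struct job that does not exist. That is why the descriptions above insist on a non-empty list for everything except the _or_null variant.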
1.1.22 list_next_entry

list_next_entry - get the next element in list

Synopsis
    list_next_entry(pos, member);

Arguments
    pos     the type * to cursor
    member  the name of the list_head within the struct.

1.1.23 list_prev_entry

list_prev_entry - get the prev element in list

Synopsis
    list_prev_entry(pos, member);

Arguments
    pos     the type * to cursor
    member  the name of the list_head within the struct.

1.1.24 list_for_each

list_for_each - iterate over a list

Synopsis
    list_for_each(pos, head);

Arguments
    pos   the struct list_head to use as a loop cursor.
    head  the head for your list.

1.1.25 list_for_each_prev

list_for_each_prev - iterate over a list backwards

Synopsis
    list_for_each_prev(pos, head);

Arguments
    pos   the struct list_head to use as a loop cursor.
    head  the head for your list.

1.1.26 list_for_each_safe

list_for_each_safe - iterate over a list safe against removal of list entry

Synopsis
    list_for_each_safe(pos, n, head);

Arguments
    pos   the struct list_head to use as a loop cursor.
    n     another struct list_head to use as temporary storage
    head  the head for your list.

1.1.27 list_for_each_prev_safe

list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry

Synopsis
    list_for_each_prev_safe(pos, n, head);

Arguments
    pos   the struct list_head to use as a loop cursor.
    n     another struct list_head to use as temporary storage
    head  the head for your list.

1.1.28 list_for_each_entry

list_for_each_entry - iterate over list of given type

Synopsis
    list_for_each_entry(pos, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    head    the head for your list.
    member  the name of the list_head within the struct.

1.1.29 list_for_each_entry_reverse

list_for_each_entry_reverse - iterate backwards over list of given type.

Synopsis
    list_for_each_entry_reverse(pos, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    head    the head for your list.
    member  the name of the list_head within the struct.

1.1.30 list_prepare_entry

list_prepare_entry - prepare a pos entry for use in list_for_each_entry_continue

Synopsis
    list_prepare_entry(pos, head, member);

Arguments
    pos     the type * to use as a start point
    head    the head of the list
    member  the name of the list_head within the struct.

Description
    Prepares a pos entry for use as a start point in list_for_each_entry_continue.

1.1.31 list_for_each_entry_continue

list_for_each_entry_continue - continue iteration over list of given type

Synopsis
    list_for_each_entry_continue(pos, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    head    the head for your list.
    member  the name of the list_head within the struct.

Description
    Continue to iterate over list of given type, continuing after the current position.

1.1.32 list_for_each_entry_continue_reverse

list_for_each_entry_continue_reverse - iterate backwards from the given point

Synopsis
    list_for_each_entry_continue_reverse(pos, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    head    the head for your list.
    member  the name of the list_head within the struct.

Description
    Start to iterate over list of given type backwards, continuing after the current position.

1.1.33 list_for_each_entry_from

list_for_each_entry_from - iterate over list of given type from the current point

Synopsis
    list_for_each_entry_from(pos, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    head    the head for your list.
    member  the name of the list_head within the struct.

Description
    Iterate over list of given type, continuing from current position.

1.1.34 list_for_each_entry_safe

list_for_each_entry_safe - iterate over list of given type safe against removal of list entry

Synopsis
    list_for_each_entry_safe(pos, n, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    n       another type * to use as temporary storage
    head    the head for your list.
    member  the name of the list_head within the struct.

1.1.35 list_for_each_entry_safe_continue

list_for_each_entry_safe_continue - continue list iteration safe against removal

Synopsis
    list_for_each_entry_safe_continue(pos, n, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    n       another type * to use as temporary storage
    head    the head for your list.
    member  the name of the list_head within the struct.

Description
    Iterate over list of given type, continuing after current point, safe against removal of list entry.

1.1.36 list_for_each_entry_safe_from

list_for_each_entry_safe_from - iterate over list from current point safe against removal

Synopsis
    list_for_each_entry_safe_from(pos, n, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    n       another type * to use as temporary storage
    head    the head for your list.
    member  the name of the list_head within the struct.

Description
    Iterate over list of given type from current point, safe against removal of list entry.

1.1.37 list_for_each_entry_safe_reverse

list_for_each_entry_safe_reverse - iterate backwards over list safe against removal

Synopsis
    list_for_each_entry_safe_reverse(pos, n, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    n       another type * to use as temporary storage
    head    the head for your list.
    member  the name of the list_head within the struct.

Description
    Iterate backwards over list of given type, safe against removal of list entry.

1.1.38 list_safe_reset_next

list_safe_reset_next - reset a stale list_for_each_entry_safe loop

Synopsis
    list_safe_reset_next(pos, n, member);

Arguments
    pos     the loop cursor used in the list_for_each_entry_safe loop
    n       temporary storage used in list_for_each_entry_safe
    member  the name of the list_head within the struct.

Description
    list_safe_reset_next is not safe to use in general if the list may be modified concurrently (e.g. the lock is dropped in the loop body).
    An exception to this is if the cursor element (pos) is pinned in the list, and list_safe_reset_next is called after re-taking the lock and before completing the current iteration of the loop body.

1.1.39 hlist_for_each_entry

hlist_for_each_entry - iterate over list of given type

Synopsis
    hlist_for_each_entry(pos, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    head    the head for your list.
    member  the name of the hlist_node within the struct.

1.1.40 hlist_for_each_entry_continue

hlist_for_each_entry_continue - iterate over a hlist continuing after current point

Synopsis
    hlist_for_each_entry_continue(pos, member);

Arguments
    pos     the type * to use as a loop cursor.
    member  the name of the hlist_node within the struct.

1.1.41 hlist_for_each_entry_from

hlist_for_each_entry_from - iterate over a hlist continuing from current point

Synopsis
    hlist_for_each_entry_from(pos, member);

Arguments
    pos     the type * to use as a loop cursor.
    member  the name of the hlist_node within the struct.

1.1.42 hlist_for_each_entry_safe

hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry

Synopsis
    hlist_for_each_entry_safe(pos, n, head, member);

Arguments
    pos     the type * to use as a loop cursor.
    n       another struct hlist_node to use as temporary storage
    head    the head for your list.
    member  the name of the hlist_node within the struct.

Chapter 2 Basic C Library Functions

When writing drivers, you cannot in general use routines which are from the C Library. Some of the functions have been found generally useful and they are listed below. The behaviour of these functions may vary slightly from those defined by ANSI, and these deviations are noted in the text.
2.1 String Conversions

2.1.1 simple_strtoull

simple_strtoull - convert a string to an unsigned long long

Synopsis
    unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base);

Arguments
    cp    The start of the string
    endp  A pointer to the end of the parsed string will be placed here
    base  The number base to use

Description
    This function is obsolete. Please use kstrtoull instead.

2.1.2 simple_strtoul

simple_strtoul - convert a string to an unsigned long

Synopsis
    unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base);

Arguments
    cp    The start of the string
    endp  A pointer to the end of the parsed string will be placed here
    base  The number base to use

Description
    This function is obsolete. Please use kstrtoul instead.

2.1.3 simple_strtol

simple_strtol - convert a string to a signed long

Synopsis
    long simple_strtol(const char *cp, char **endp, unsigned int base);

Arguments
    cp    The start of the string
    endp  A pointer to the end of the parsed string will be placed here
    base  The number base to use

Description
    This function is obsolete. Please use kstrtol instead.

2.1.4 simple_strtoll

simple_strtoll - convert a string to a signed long long

Synopsis
    long long simple_strtoll(const char *cp, char **endp, unsigned int base);

Arguments
    cp    The start of the string
    endp  A pointer to the end of the parsed string will be placed here
    base  The number base to use

Description
    This function is obsolete. Please use kstrtoll instead.
2.1.5 vsnprintf

vsnprintf - Format a string and place it in a buffer

Synopsis
    int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);

Arguments
    buf   The buffer to place the result into
    size  The size of the buffer, including the trailing null space
    fmt   The format string to use
    args  Arguments for the format string

Description
    This function follows C99 vsnprintf, but has some extensions:

    %pS        output the name of a text symbol with offset
    %ps        output the name of a text symbol without offset
    %pF        output the name of a function pointer with its offset
    %pf        output the name of a function pointer without its offset
    %pB        output the name of a backtrace symbol with its offset
    %pR        output the address range in a struct resource with decoded flags
    %pr        output the address range in a struct resource with raw flags
    %pM        output a 6-byte MAC address with colons
    %pMR       output a 6-byte MAC address with colons in reversed order
    %pMF       output a 6-byte MAC address with dashes
    %pm        output a 6-byte MAC address without colons
    %pmR       output a 6-byte MAC address without colons in reversed order
    %pI4       print an IPv4 address without leading zeros
    %pi4       print an IPv4 address with leading zeros
    %pI6       print an IPv6 address with colons
    %pi6       print an IPv6 address without colons
    %pI6c      print an IPv6 address as specified by RFC 5952
    %pIS       depending on sa_family of struct sockaddr * print IPv4/IPv6 address
    %piS       depending on sa_family of struct sockaddr * print IPv4/IPv6 address
    %pU[bBlL]  print a UUID/GUID in big or little endian using lower or upper case.
    %*ph[CDN]  a variable-length hex string with a separator (supports up to 64 bytes of the input)
    %n         is ignored

    ** Please update Documentation/printk-formats.txt when making changes **

    The return value is the number of characters which would be generated for the given input, excluding the trailing \0, as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing \0), use vscnprintf.
    If the return is greater than or equal to size, the resulting string is truncated.

    If you're not already dealing with a va_list consider using snprintf.

2.1.6 vscnprintf

vscnprintf - Format a string and place it in a buffer

Synopsis
    int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);

Arguments
    buf   The buffer to place the result into
    size  The size of the buffer, including the trailing null space
    fmt   The format string to use
    args  Arguments for the format string

Description
    The return value is the number of characters which have been written into buf, not including the trailing \0. If size is == 0 the function returns 0.

    If you're not already dealing with a va_list consider using scnprintf.

    See the vsnprintf documentation for format string extensions over C99.

2.1.7 snprintf

snprintf - Format a string and place it in a buffer

Synopsis
    int snprintf(char *buf, size_t size, const char *fmt, ...);

Arguments
    buf   The buffer to place the result into
    size  The size of the buffer, including the trailing null space
    fmt   The format string to use
    ...   Arguments for the format string (variable arguments)

Description
    The return value is the number of characters which would be generated for the given input, excluding the trailing null, as per ISO C99. If the return is greater than or equal to size, the resulting string is truncated.

    See the vsnprintf documentation for format string extensions over C99.

2.1.8 scnprintf

scnprintf - Format a string and place it in a buffer

Synopsis
    int scnprintf(char *buf, size_t size, const char *fmt, ...);

Arguments
    buf   The buffer to place the result into
    size  The size of the buffer, including the trailing null space
    fmt   The format string to use
    ...   Arguments for the format string (variable arguments)

Description
    The return value is the number of characters written into buf, not including the trailing \0. If size is == 0 the function returns 0.
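
The difference between the two return-value conventions matters whenever the result is used as an offset into the buffer. The sketch below contrasts them in userspace: it uses the standard C snprintf for the "would be generated" convention and a hypothetical wrapper, demo_scnprintf, that approximates the "actually written" convention described for scnprintf. The wrapper is an illustration built on the C library, not the kernel implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace approximation of scnprintf: return the number
 * of characters actually stored in buf (excluding the trailing '\0'),
 * and 0 when size is 0.
 */
int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int ret;

    if (size == 0)
        return 0;

    va_start(args, fmt);
    ret = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (ret < 0)
        return 0;               /* formatting error: nothing usable written */
    if ((size_t)ret >= size)
        return (int)(size - 1); /* output was truncated to fit */
    return ret;
}
```

For an 8-byte buffer and a 10-character input, snprintf reports 10 even though only 7 characters plus the terminator were stored, while the wrapper reports 7. Code that appends in a loop with `len += snprintf(buf + len, size - len, ...)` can therefore run past the end of the buffer after a truncation, which is the situation the scnprintf convention avoids.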
2.1.9 vsprintf

vsprintf - Format a string and place it in a buffer

Synopsis
    int vsprintf(char *buf, const char *fmt, va_list args);

Arguments
    buf   The buffer to place the result into
    fmt   The format string to use
    args  Arguments for the format string

Description
    The function returns the number of characters written into buf. Use vsnprintf or vscnprintf in order to avoid buffer overflows.

    If you're not already dealing with a va_list consider using sprintf.

    See the vsnprintf documentation for format string extensions over C99.

2.1.10 sprintf

sprintf - Format a string and place it in a buffer

Synopsis
    int sprintf(char *buf, const char *fmt, ...);

Arguments
    buf  The buffer to place the result into
    fmt  The format string to use
    ...  Arguments for the format string (variable arguments)

Description
    The function returns the number of characters written into buf. Use snprintf or scnprintf in order to avoid buffer overflows.

    See the vsnprintf documentation for format string extensions over C99.

2.1.11 vbin_printf

vbin_printf - Parse a format string and place args' binary value in a buffer

Synopsis
    int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);

Arguments
    bin_buf  The buffer to place args' binary value
    size     The size of the buffer (in 32-bit words, not characters)
    fmt      The format string to use
    args     Arguments for the format string

Description
    The format follows C99 vsnprintf, except %n is ignored, and its argument is skipped.

    The return value is the number of 32-bit words which would be generated for the given input.

NOTE
    If the return value is greater than size, the resulting bin_buf is NOT valid for bstr_printf.
2.1.12 bstr_printf

bstr_printf - Format a string from binary arguments and place it in a buffer

Synopsis
    int bstr_printf(char *buf, size_t size, const char *fmt, const u32 *bin_buf);

Arguments
    buf      The buffer to place the result into
    size     The size of the buffer, including the trailing null space
    fmt      The format string to use
    bin_buf  Binary arguments for the format string

Description
    This function is like C99 vsnprintf; the difference is that vsnprintf gets its arguments from the stack, while bstr_printf gets its arguments from bin_buf, a binary buffer generated by vbin_printf.

    The format follows C99 vsnprintf, but has some extensions: see the vsnprintf comment for details.

    The return value is the number of characters which would be generated for the given input, excluding the trailing \0, as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing \0), use vscnprintf. If the return is greater than or equal to size, the resulting string is truncated.

2.1.13 bprintf

bprintf - Parse a format string and place args' binary value in a buffer

Synopsis
    int bprintf(u32 *bin_buf, size_t size, const char *fmt, ...);

Arguments
    bin_buf  The buffer to place args' binary value
    size     The size of the buffer (in 32-bit words, not characters)
    fmt      The format string to use
    ...      Arguments for the format string (variable arguments)

Description
    The function returns the number of words (u32) written into bin_buf.

2.1.14 vsscanf

vsscanf - Unformat a buffer into a list of arguments

Synopsis
    int vsscanf(const char *buf, const char *fmt, va_list args);

Arguments
    buf   input buffer
    fmt   format of buffer
    args  arguments

2.1.15 sscanf

sscanf - Unformat a buffer into a list of arguments

Synopsis
    int sscanf(const char *buf, const char *fmt, ...);

Arguments
    buf  input buffer
    fmt  formatting of buffer
    ...  resulting arguments
         (variable arguments)

2.1.16 kstrtol

kstrtol - convert a string to a long

Synopsis
    int kstrtol(const char *s, unsigned int base, long *res);

Arguments
    s     The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
    base  The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
    res   Where to write the result of the conversion on success.

Description
    Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtol. Return code must be checked.

2.1.17 kstrtoul

kstrtoul - convert a string to an unsigned long

Synopsis
    int kstrtoul(const char *s, unsigned int base, unsigned long *res);

Arguments
    s     The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
    base  The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
    res   Where to write the result of the conversion on success.

Description
    Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtoul. Return code must be checked.
2.1.18 kstrtoull

kstrtoull - convert a string to an unsigned long long

Synopsis
    int kstrtoull(const char *s, unsigned int base, unsigned long long *res);

Arguments
    s     The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
    base  The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
    res   Where to write the result of the conversion on success.

Description
    Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtoull. Return code must be checked.

2.1.19 kstrtoll

kstrtoll - convert a string to a long long

Synopsis
    int kstrtoll(const char *s, unsigned int base, long long *res);

Arguments
    s     The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
    base  The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
    res   Where to write the result of the conversion on success.

Description
    Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtoll. Return code must be checked.
2.1.20 kstrtouint

kstrtouint - convert a string to an unsigned int

Synopsis
    int kstrtouint(const char *s, unsigned int base, unsigned int *res);

Arguments
    s     The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
    base  The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
    res   Where to write the result of the conversion on success.

Description
    Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtoul. Return code must be checked.

2.1.21 kstrtoint

kstrtoint - convert a string to an int

Synopsis
    int kstrtoint(const char *s, unsigned int base, int *res);

Arguments
    s     The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
    base  The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
    res   Where to write the result of the conversion on success.

Description
    Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtol. Return code must be checked.
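
The kstrto* family shares one contract: the whole string must be a number (optionally followed by a single newline), the result goes through a pointer, and the return value carries only success or an error code. The following userspace sketch approximates that contract for the long variant on top of the C library's strtol. The name demo_kstrtol is hypothetical, and the behaviour in corner cases is an assumption derived from the descriptions above rather than from the kernel source.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of the kstrtol contract:
 * 0 on success, -EINVAL on a parsing error, -ERANGE on overflow.
 */
int demo_kstrtol(const char *s, unsigned int base, long *res)
{
    char *end;
    long val;

    /* strtol would silently skip leading whitespace; reject it here. */
    if (isspace((unsigned char)s[0]))
        return -EINVAL;
    if (base > 16)
        return -EINVAL;     /* the text above gives 16 as the maximum */

    errno = 0;
    val = strtol(s, &end, (int)base);
    if (end == s)
        return -EINVAL;     /* no digits at all */
    if (*end == '\n')
        end++;              /* a single trailing newline is allowed */
    if (*end != '\0')
        return -EINVAL;     /* trailing garbage */
    if (errno == ERANGE)
        return -ERANGE;     /* value does not fit in a long */

    *res = val;
    return 0;
}
```

A caller checks the return code before touching the result, for example `if (demo_kstrtol(buf, 0, &val)) return -EINVAL;`. Unlike the simple_strto* functions, there is no endp, so input such as "12abc" is an error rather than a partial parse.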
2.2 String Manipulation

2.2.1 strnicmp

strnicmp - Case insensitive, length-limited string comparison

Synopsis
    int strnicmp(const char *s1, const char *s2, size_t len);

Arguments
    s1   One string
    s2   The other string
    len  the maximum number of characters to compare

2.2.2 strcpy

strcpy - Copy a NUL terminated string

Synopsis
    char * strcpy(char *dest, const char *src);

Arguments
    dest  Where to copy the string to
    src   Where to copy the string from

2.2.3 strncpy

strncpy - Copy a length-limited C-string

Synopsis
    char * strncpy(char *dest, const char *src, size_t count);

Arguments
    dest   Where to copy the string to
    src    Where to copy the string from
    count  The maximum number of bytes to copy

Description
    The result is not NUL-terminated if the source exceeds count bytes.

    In the case where the length of src is less than count, the remainder of dest will be padded with NUL.

2.2.4 strlcpy

strlcpy - Copy a C-string into a sized buffer

Synopsis
    size_t strlcpy(char *dest, const char *src, size_t size);

Arguments
    dest  Where to copy the string to
    src   Where to copy the string from
    size  size of destination buffer

Description
    Compatible with *BSD: the result is always a valid NUL-terminated string that fits in the buffer (unless, of course, the buffer size is zero). It does not pad out the result like strncpy does.

2.2.5 strcat

strcat - Append one NUL-terminated string to another

Synopsis
    char * strcat(char *dest, const char *src);

Arguments
    dest  The string to be appended to
    src   The string to append to it

2.2.6 strncat

strncat - Append a length-limited C-string to another

Synopsis
    char * strncat(char *dest, const char *src, size_t count);

Arguments
    dest   The string to be appended to
    src    The string to append to it
    count  The maximum number of bytes to copy

Description
    Note that in contrast to strncpy, strncat ensures the result is terminated.
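
The termination and padding rules of strncpy and strlcpy in 2.2.3 and 2.2.4 are easy to confuse, so a small userspace sketch may help. It uses the standard C strncpy and a hypothetical demo_strlcpy written for this illustration (strlcpy is not available in every C library). The wrapper returns the length of src, which is the BSD convention and an assumption here, since the entry above does not state the return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strlcpy-style copy: always NUL-terminates when size > 0,
 * never pads, and returns strlen(src) so callers can detect truncation.
 */
size_t demo_strlcpy(char *dest, const char *src, size_t size)
{
    size_t ret = strlen(src);

    if (size) {
        size_t len = (ret >= size) ? size - 1 : ret;

        memcpy(dest, src, len);
        dest[len] = '\0';
    }
    return ret;
}

/* Returns 1 if strncpy left a NUL somewhere in a 4-byte destination. */
int demo_strncpy_terminates(const char *src)
{
    char dest[4];

    memset(dest, 'X', sizeof(dest));
    strncpy(dest, src, sizeof(dest));
    return memchr(dest, '\0', sizeof(dest)) != NULL;
}

/* Returns 1 if strncpy NUL-padded the whole tail of an 8-byte buffer. */
int demo_strncpy_pads(const char *src)
{
    char dest[8];
    size_t i, len = strlen(src);

    memset(dest, 'X', sizeof(dest));
    strncpy(dest, src, sizeof(dest));
    for (i = len; i < sizeof(dest); i++)
        if (dest[i] != '\0')
            return 0;
    return 1;
}
```

Copying "abcdef" into four bytes with strncpy leaves no terminator at all, whereas the strlcpy-style copy stores "abc" and reports 6, which is greater than or equal to the buffer size and so signals truncation.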
2.2.7 strlcat

strlcat - Append a length-limited C-string to another

Synopsis
    size_t strlcat(char *dest, const char *src, size_t count);

Arguments
    dest   The string to be appended to
    src    The string to append to it
    count  The size of the destination buffer.

2.2.8 strcmp

strcmp - Compare two strings

Synopsis
    int strcmp(const char *cs, const char *ct);

Arguments
    cs  One string
    ct  Another string

2.2.9 strncmp

strncmp - Compare two length-limited strings

Synopsis
    int strncmp(const char *cs, const char *ct, size_t count);

Arguments
    cs     One string
    ct     Another string
    count  The maximum number of bytes to compare

2.2.10 strchr

strchr - Find the first occurrence of a character in a string

Synopsis
    char * strchr(const char *s, int c);

Arguments
    s  The string to be searched
    c  The character to search for

2.2.11 strchrnul

strchrnul - Find and return a character in a string, or end of string

Synopsis
    char * strchrnul(const char *s, int c);

Arguments
    s  The string to be searched
    c  The character to search for

Description
    Returns pointer to first occurrence of c in s. If c is not found, then return a pointer to the null byte at the end of s.

2.2.12 strrchr

strrchr - Find the last occurrence of a character in a string

Synopsis
    char * strrchr(const char *s, int c);

Arguments
    s  The string to be searched
    c  The character to search for

2.2.13 strnchr

strnchr - Find a character in a length limited string

Synopsis
    char * strnchr(const char *s, size_t count, int c);

Arguments
    s      The string to be searched
    count  The number of characters to be searched
    c      The character to search for

2.2.14 skip_spaces

skip_spaces - Removes leading whitespace from str.

Synopsis
    char * skip_spaces(const char *str);

Arguments
    str  The string to be stripped.

Description
    Returns a pointer to the first non-whitespace character in str.

2.2.15 strim

strim - Removes leading and trailing whitespace from s.
Synopsis
    char * strim(char *s);

Arguments
    s  The string to be stripped.

Description
    Note that the first trailing whitespace is replaced with a NUL-terminator in the given string s. Returns a pointer to the first non-whitespace character in s.

2.2.16 strlen

strlen - Find the length of a string

Synopsis
    size_t strlen(const char *s);

Arguments
    s  The string to be sized

2.2.17 strnlen

strnlen - Find the length of a length-limited string

Synopsis
    size_t strnlen(const char *s, size_t count);

Arguments
    s      The string to be sized
    count  The maximum number of bytes to search

2.2.18 strspn

strspn - Calculate the length of the initial substring of s which only contains letters in accept

Synopsis
    size_t strspn(const char *s, const char *accept);

Arguments
    s       The string to be searched
    accept  The string to search for

2.2.19 strcspn

strcspn - Calculate the length of the initial substring of s which does not contain letters in reject

Synopsis
    size_t strcspn(const char *s, const char *reject);

Arguments
    s       The string to be searched
    reject  The string to avoid

2.2.20 strpbrk

strpbrk - Find the first occurrence of a set of characters

Synopsis
    char * strpbrk(const char *cs, const char *ct);

Arguments
    cs  The string to be searched
    ct  The characters to search for

2.2.21 strsep

strsep - Split a string into tokens

Synopsis
    char * strsep(char **s, const char *ct);

Arguments
    s   The string to be searched
    ct  The characters to search for

Description
    strsep updates s to point after the token, ready for the next call.

    It returns empty tokens, too, behaving exactly like the libc function of that name. In fact, it was stolen from glibc2 and de-fancy-fied. Same semantics, slimmer shape.
    ;)

2.2.22 sysfs_streq

sysfs_streq - return true if strings are equal, modulo trailing newline

Synopsis
    bool sysfs_streq (const char * s1, const char * s2);

Arguments
    s1  one string
    s2  another string

Description
    This routine returns true iff two strings are equal, treating both NUL
    and newline-then-NUL as equivalent string terminations. It's geared for
    use with sysfs input strings, which generally terminate with newlines but
    are compared against values without newlines.

2.2.23 strtobool

strtobool - convert common user inputs into boolean values

Synopsis
    int strtobool (const char * s, bool * res);

Arguments
    s    input string
    res  result

Description
    This routine returns 0 iff the first character is one of 'Yy1Nn0'.
    Otherwise it will return -EINVAL. Value pointed to by res is updated upon
    finding a match.

2.2.24 memset

memset - Fill a region of memory with the given value

Synopsis
    void * memset (void * s, int c, size_t count);

Arguments
    s      Pointer to the start of the area.
    c      The byte to fill the area with
    count  The size of the area.

Description
    Do not use memset to access IO space, use memset_io instead.

2.2.25 memcpy

memcpy - Copy one area of memory to another

Synopsis
    void * memcpy (void * dest, const void * src, size_t count);

Arguments
    dest   Where to copy to
    src    Where to copy from
    count  The size of the area.

Description
    You should not use this function to access IO space, use memcpy_toio or
    memcpy_fromio instead.

2.2.26 memmove

memmove - Copy one area of memory to another

Synopsis
    void * memmove (void * dest, const void * src, size_t count);

Arguments
    dest   Where to copy to
    src    Where to copy from
    count  The size of the area.

Description
    Unlike memcpy, memmove copes with overlapping areas.

2.2.27 memcmp

memcmp - Compare two areas of memory

Synopsis
    __visible int memcmp (const void * cs, const void * ct, size_t count);

Arguments
    cs     One area of memory
    ct     Another area of memory
    count  The size of the area.
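
The terminator rule described for sysfs_streq in 2.2.22 (NUL and newline-then-NUL are treated as the same ending) can be illustrated with a small user-space model. This is only a sketch of the documented behaviour, not the kernel source; the name model_sysfs_streq is hypothetical.

```c
#include <stdbool.h>

/*
 * User-space model of the comparison rule documented for sysfs_streq:
 * two strings are equal if they match up to the end, where "the end" is
 * either a NUL or a single newline followed by a NUL.
 * model_sysfs_streq is a hypothetical name used for illustration only.
 */
static bool model_sysfs_streq(const char *s1, const char *s2)
{
    /* Skip the common prefix. */
    while (*s1 && *s1 == *s2) {
        s1++;
        s2++;
    }

    if (*s1 == *s2)
        return true;    /* both strings ended at the same point */
    if (!*s1 && *s2 == '\n' && !s2[1])
        return true;    /* s2 only has one extra trailing newline */
    if (*s1 == '\n' && !s1[1] && !*s2)
        return true;    /* s1 only has one extra trailing newline */
    return false;
}
```

With this model, "enabled\n" (as typically written to a sysfs file) compares equal to "enabled", while "enabled\n\n" does not.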

2.2.28 memscan

memscan - Find a character in an area of memory.

Synopsis
    void * memscan (void * addr, int c, size_t size);

Arguments
    addr  The memory area
    c     The byte to search for
    size  The size of the area.

Description
    returns the address of the first occurrence of c, or 1 byte past the area
    if c is not found

2.2.29 strstr

strstr - Find the first substring in a NUL terminated string

Synopsis
    char * strstr (const char * s1, const char * s2);

Arguments
    s1  The string to be searched
    s2  The string to search for

2.2.30 strnstr

strnstr - Find the first substring in a length-limited string

Synopsis
    char * strnstr (const char * s1, const char * s2, size_t len);

Arguments
    s1   The string to be searched
    s2   The string to search for
    len  the maximum number of characters to search

2.2.31 memchr

memchr - Find a character in an area of memory.

Synopsis
    void * memchr (const void * s, int c, size_t n);

Arguments
    s  The memory area
    c  The byte to search for
    n  The size of the area.

Description
    returns the address of the first occurrence of c, or NULL if c is not
    found

2.2.32 memchr_inv

memchr_inv - Find an unmatching character in an area of memory.

Synopsis
    void * memchr_inv (const void * start, int c, size_t bytes);

Arguments
    start  The memory area
    c      Find a character other than c
    bytes  The size of the area.

Description
    returns the address of the first character other than c, or NULL if the
    whole buffer contains just c.

2.3 Bit Operations

2.3.1 set_bit

set_bit - Atomically set a bit in memory

Synopsis
    void set_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    the bit to set
    addr  the address to start counting from

Description
    This function is atomic and may not be reordered. See __set_bit if you do
    not require the atomic guarantees.

    Note there are no guarantees that this function will not be reordered on
    non x86 architectures, so if you are writing portable code, make sure not
    to rely on its reordering guarantees.
    Note that nr may be almost arbitrarily large; this function is not
    restricted to acting on a single-word quantity.

2.3.2 __set_bit

__set_bit - Set a bit in memory

Synopsis
    void __set_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    the bit to set
    addr  the address to start counting from

Description
    Unlike set_bit, this function is non-atomic and may be reordered. If it's
    called on the same region of memory simultaneously, the effect may be that
    only one operation succeeds.

2.3.3 clear_bit

clear_bit - Clears a bit in memory

Synopsis
    void clear_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to clear
    addr  Address to start counting from

Description
    clear_bit is atomic and may not be reordered. However, it does not contain
    a memory barrier, so if it is used for locking purposes, you should call
    smp_mb__before_atomic and/or smp_mb__after_atomic in order to ensure
    changes are visible on other processors.

2.3.4 __change_bit

__change_bit - Toggle a bit in memory

Synopsis
    void __change_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    the bit to change
    addr  the address to start counting from

Description
    Unlike change_bit, this function is non-atomic and may be reordered. If
    it's called on the same region of memory simultaneously, the effect may be
    that only one operation succeeds.

2.3.5 change_bit

change_bit - Toggle a bit in memory

Synopsis
    void change_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to change
    addr  Address to start counting from

Description
    change_bit is atomic and may not be reordered. Note that nr may be almost
    arbitrarily large; this function is not restricted to acting on a
    single-word quantity.

2.3.6 test_and_set_bit

test_and_set_bit - Set a bit and return its old value

Synopsis
    int test_and_set_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to set
    addr  Address to count from

Description
    This operation is atomic and cannot be reordered.
    It also implies a memory barrier.

2.3.7 test_and_set_bit_lock

test_and_set_bit_lock - Set a bit and return its old value for lock

Synopsis
    int test_and_set_bit_lock (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to set
    addr  Address to count from

Description
    This is the same as test_and_set_bit on x86.

2.3.8 __test_and_set_bit

__test_and_set_bit - Set a bit and return its old value

Synopsis
    int __test_and_set_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to set
    addr  Address to count from

Description
    This operation is non-atomic and can be reordered. If two examples of this
    operation race, one can appear to succeed but actually fail. You must
    protect multiple accesses with a lock.

2.3.9 test_and_clear_bit

test_and_clear_bit - Clear a bit and return its old value

Synopsis
    int test_and_clear_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to clear
    addr  Address to count from

Description
    This operation is atomic and cannot be reordered. It also implies a memory
    barrier.

2.3.10 __test_and_clear_bit

__test_and_clear_bit - Clear a bit and return its old value

Synopsis
    int __test_and_clear_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to clear
    addr  Address to count from

Description
    This operation is non-atomic and can be reordered. If two examples of this
    operation race, one can appear to succeed but actually fail. You must
    protect multiple accesses with a lock.

    Note the operation is performed atomically with respect to the local CPU,
    but not other CPUs. Portable code should not rely on this behaviour.
    KVM relies on this behaviour on x86 for modifying memory that is also
    accessed from a hypervisor on the same CPU if running in a VM: don't
    change this without also updating arch/x86/kernel/kvm.c

2.3.11 test_and_change_bit

test_and_change_bit - Change a bit and return its old value

Synopsis
    int test_and_change_bit (long nr, volatile unsigned long * addr);

Arguments
    nr    Bit to change
    addr  Address to count from

Description
    This operation is atomic and cannot be reordered. It also implies a memory
    barrier.

2.3.12 test_bit

test_bit - Determine whether a bit is set

Synopsis
    int test_bit (int nr, const volatile unsigned long * addr);

Arguments
    nr    bit number to test
    addr  Address to start counting from

2.3.13 __ffs

__ffs - find first set bit in word

Synopsis
    unsigned long __ffs (unsigned long word);

Arguments
    word  The word to search

Description
    Undefined if no bit exists, so code should check against 0 first.

2.3.14 ffz

ffz - find first zero bit in word

Synopsis
    unsigned long ffz (unsigned long word);

Arguments
    word  The word to search

Description
    Undefined if no zero exists, so code should check against ~0UL first.

2.3.15 ffs

ffs - find first set bit in word

Synopsis
    int ffs (int x);

Arguments
    x  the word to search

Description
    This is defined the same way as the libc and compiler builtin ffs
    routines, therefore differs in spirit from the other bitops.

    ffs(value) returns 0 if value is 0 or the position of the first set bit if
    value is nonzero. The first (least significant) bit is at position 1.

2.3.16 fls

fls - find last set bit in word

Synopsis
    int fls (int x);

Arguments
    x  the word to search

Description
    This is defined in a similar way as the libc and compiler builtin ffs, but
    returns the position of the most significant set bit.

    fls(value) returns 0 if value is 0 or the position of the last set bit if
    value is nonzero. The last (most significant) bit is at position 32.
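
The bit-search routines above use two different numbering conventions: __ffs and ffz return a zero-based index and are undefined when no matching bit exists, while ffs and fls return a one-based position and return 0 for an input of 0. The following user-space sketch models those documented conventions with plain loops. It is not the kernel implementation, the model_* names are hypothetical, and the fls result of 32 for the top bit assumes a 32-bit int.

```c
/*
 * User-space models of the documented conventions. All model_* names are
 * hypothetical; the kernel versions are architecture-specific.
 */

/* Models __ffs: zero-based index of the lowest set bit.
 * The caller must ensure word != 0 (the loop would not terminate). */
static unsigned long model_lowbit(unsigned long word)
{
    unsigned long n = 0;

    while (!(word & 1UL)) {
        word >>= 1;
        n++;
    }
    return n;
}

/* Models ffz: zero-based index of the lowest clear bit.
 * The caller must ensure word != ~0UL. */
static unsigned long model_ffz(unsigned long word)
{
    return model_lowbit(~word);
}

/* Models ffs: one-based position of the lowest set bit, 0 if x is 0. */
static int model_ffs(int x)
{
    unsigned int v = (unsigned int)x;
    int pos = 1;

    if (!v)
        return 0;
    while (!(v & 1U)) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Models fls: one-based position of the highest set bit, 0 if x is 0. */
static int model_fls(int x)
{
    unsigned int v = (unsigned int)x;
    int pos = 0;

    while (v) {
        v >>= 1;
        pos++;
    }
    return pos;
}
```

For the value 8 (only bit 3 set), the zero-based helper yields 3 while the one-based ffs and fls models both yield 4, which is the off-by-one difference the descriptions above warn about.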

2.3.17 fls64

fls64 - find last set bit in a 64-bit word

Synopsis
    int fls64 (__u64 x);

Arguments
    x  the word to search

Description
    This is defined in a similar way as the libc and compiler builtin ffsll,
    but returns the position of the most significant set bit.

    fls64(value) returns 0 if value is 0 or the position of the last set bit
    if value is nonzero. The last (most significant) bit is at position 64.

Chapter 3

Basic Kernel Library Functions

The Linux kernel provides more basic utility functions.

3.1 Bitmap Operations

3.1.1 __bitmap_shift_right

__bitmap_shift_right - logical right shift of the bits in a bitmap

Synopsis
    void __bitmap_shift_right (unsigned long * dst, const unsigned long * src,
    int shift, int bits);

Arguments
    dst    destination bitmap
    src    source bitmap
    shift  shift by this many bits
    bits   bitmap size, in bits

Description
    Shifting right (dividing) means moving bits in the MS -> LS bit direction.
    Zeros are fed into the vacated MS positions and the LS bits shifted off
    the bottom are lost.

3.1.2 __bitmap_shift_left

__bitmap_shift_left - logical left shift of the bits in a bitmap

Synopsis
    void __bitmap_shift_left (unsigned long * dst, const unsigned long * src,
    int shift, int bits);

Arguments
    dst    destination bitmap
    src    source bitmap
    shift  shift by this many bits
    bits   bitmap size, in bits

Description
    Shifting left (multiplying) means moving bits in the LS -> MS direction.
    Zeros are fed into the vacated LS bit positions and those MS bits shifted
    off the top are lost.

3.1.3 bitmap_scnprintf

bitmap_scnprintf - convert bitmap to an ASCII hex string.

Synopsis
    int bitmap_scnprintf (char * buf, unsigned int buflen,
    const unsigned long * maskp, int nmaskbits);

Arguments
    buf        byte buffer into which string is placed
    buflen     reserved size of buf, in bytes
    maskp      pointer to bitmap to convert
    nmaskbits  size of bitmap, in bits

Description
    Exactly nmaskbits bits are displayed.
    Hex digits are grouped into comma-separated sets of eight digits per set.
    Returns the number of characters which were written to *buf, excluding the
    trailing \0.

3.1.4 __bitmap_parse

__bitmap_parse - convert an ASCII hex string into a bitmap.

Synopsis
    int __bitmap_parse (const char * buf, unsigned int buflen, int is_user,
    unsigned long * maskp, int nmaskbits);

Arguments
    buf        pointer to buffer containing string.
    buflen     buffer size in bytes. If string is smaller than this then it
               must be terminated with a \0.
    is_user    location of buffer, 0 indicates kernel space
    maskp      pointer to bitmap array that will contain result.
    nmaskbits  size of bitmap, in bits.

Description
    Commas group hex digits into chunks. Each chunk defines exactly 32 bits of
    the resultant bitmask. No chunk may specify a value larger than 32 bits
    (-EOVERFLOW), and if a chunk specifies a smaller value then leading 0-bits
    are prepended. -EINVAL is returned for illegal characters and for grouping
    errors such as "1,,5", ",44", "," and "". Leading and trailing whitespace
    accepted, but not embedded whitespace.

3.1.5 bitmap_parse_user

bitmap_parse_user - convert an ASCII hex string in a user buffer into a bitmap

Synopsis
    int bitmap_parse_user (const char __user * ubuf, unsigned int ulen,
    unsigned long * maskp, int nmaskbits);

Arguments
    ubuf       pointer to user buffer containing string.
    ulen       buffer size in bytes. If string is smaller than this then it
               must be terminated with a \0.
    maskp      pointer to bitmap array that will contain result.
    nmaskbits  size of bitmap, in bits.

Description
    Wrapper for __bitmap_parse, providing it with user buffer.

    We cannot have this as an inline function in bitmap.h because it needs
    linux/uaccess.h to get the access_ok declaration and this causes cyclic
    dependencies.
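
The chunk rules described for __bitmap_parse in 3.1.4 (comma-separated hex chunks, each defining 32 bits, with -EOVERFLOW and -EINVAL for the stated error cases) can be sketched in user space for a single 64-bit mask. This is a simplified model and makes several assumptions that are not in the text above: the name model_parse_hex_chunks is hypothetical, the mask is fixed at 64 bits, whitespace is not handled, and a trailing comma is rejected as an empty chunk.

```c
#include <errno.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Simplified user-space model of the chunk format: parse a string such as
 * "1,0000ffff" into a 64-bit mask, most significant chunk first.
 * Hypothetical helper; it does not handle whitespace or user buffers.
 */
static int model_parse_hex_chunks(const char *s, uint64_t *mask)
{
    uint64_t result = 0;
    uint64_t chunk = 0;
    int ndigits = 0;

    for (;; s++) {
        if (*s == ',' || *s == '\0') {
            if (ndigits == 0)
                return -EINVAL;     /* "", ",", "1,,5", ",44" */
            if (result >> 32)
                return -EOVERFLOW;  /* more chunks than fit in 64 bits */
            result = (result << 32) | chunk;
            if (*s == '\0')
                break;
            chunk = 0;
            ndigits = 0;
            continue;
        }

        int v = hex_val(*s);
        if (v < 0)
            return -EINVAL;         /* illegal character */
        if (chunk >> 28)
            return -EOVERFLOW;      /* chunk would exceed 32 bits */
        chunk = (chunk << 4) | (uint64_t)v;
        ndigits++;
    }

    *mask = result;
    return 0;
}
```

In this model, "1,0000ffff" and "1,ffff" produce the same mask, because a chunk that specifies fewer than 32 bits is padded with leading zero bits.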

3.1.6 bitmap_scnlistprintf

bitmap_scnlistprintf - convert bitmap to list format ASCII string

Synopsis
    int bitmap_scnlistprintf (char * buf, unsigned int buflen,
    const unsigned long * maskp, int nmaskbits);

Arguments
    buf        byte buffer into which string is placed
    buflen     reserved size of buf, in bytes
    maskp      pointer to bitmap to convert
    nmaskbits  size of bitmap, in bits

Description
    Output format is a comma-separated list of decimal numbers and ranges.
    Consecutively set bits are shown as two hyphen-separated decimal numbers,
    the smallest and largest bit numbers set in the range. Output format is
    compatible with the format accepted as input by bitmap_parselist.

    The return value is the number of characters which were written to *buf
    excluding the trailing \0, as per ISO C99's scnprintf.

3.1.7 bitmap_parselist_user

bitmap_parselist_user

Synopsis
    int bitmap_parselist_user (const char __user * ubuf, unsigned int ulen,
    unsigned long * maskp, int nmaskbits);

Arguments
    ubuf       pointer to user buffer containing string.
    ulen       buffer size in bytes. If string is smaller than this then it
               must be terminated with a \0.
    maskp      pointer to bitmap array that will contain result.
    nmaskbits  size of bitmap, in bits.

Description
    Wrapper for bitmap_parselist, providing it with user buffer.

    We cannot have this as an inline function in bitmap.h because it needs
    linux/uaccess.h to get the access_ok declaration and this causes cyclic
    dependencies.
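
The list format described for bitmap_scnlistprintf in 3.1.6 (decimal bit numbers, with runs of consecutive set bits written as first-last) can be sketched in user space for a 64-bit mask. This is an illustrative model only: model_list_format is a hypothetical name, the bitmap is limited to one uint64_t, and the truncation handling is a simplification.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * User-space model of the list output format: writes the set bits of mask
 * into buf as e.g. "0-3,5,8-10" and returns the number of characters
 * written, excluding the trailing NUL. Hypothetical helper.
 */
static int model_list_format(char *buf, size_t buflen, uint64_t mask)
{
    size_t len = 0;
    int bit = 0;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';

    while (bit < 64) {
        if (!((mask >> bit) & 1)) {
            bit++;
            continue;
        }

        /* Extend the run while the next bit is also set. */
        int first = bit;
        while (bit + 1 < 64 && ((mask >> (bit + 1)) & 1))
            bit++;

        int n;
        if (first == bit)
            n = snprintf(buf + len, buflen - len, "%s%d",
                         len ? "," : "", first);
        else
            n = snprintf(buf + len, buflen - len, "%s%d-%d",
                         len ? "," : "", first, bit);

        if (n < 0)
            break;                  /* formatting error: stop */
        if ((size_t)n >= buflen - len) {
            len = buflen - 1;       /* output was truncated */
            break;
        }
        len += (size_t)n;
        bit++;
    }
    return (int)len;
}
```

For a mask with bits 0 through 3, 5, and 8 through 10 set (0x72F), the model produces "0-3,5,8-10".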

3.1.8 bitmap_remap

bitmap_remap - Apply map defined by a pair of bitmaps to another bitmap

Synopsis
    void bitmap_remap (unsigned long * dst, const unsigned long * src,
    const unsigned long * old, const unsigned long * new, int bits);

Arguments
    dst   remapped result
    src   subset to be remapped
    old   defines domain of map
    new   defines range of map
    bits  number of bits in each of these bitmaps

Description
    Let old and new define a mapping of bit positions, such that whatever
    position is held by the n-th set bit in old is mapped to the n-th set bit
    in new. In the more general case, allowing for the possibility that the
    weight w of new is less than the weight of old, map the position of the
    n-th set bit in old to the position of the m-th set bit in new, where
    m == n % w.

    If either of the old and new bitmaps are empty, or if src and dst point to
    the same location, then this routine copies src to dst.

    The positions of unset bits in old are mapped to themselves (the identity
    map).

    Apply the above specified mapping to src, placing the result in dst,
    clearing any bits previously set in dst.

    For example, let's say that old has bits 4 through 7 set, and new has bits
    12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to
    13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if
    say src comes into this routine with bits 1, 5 and 7 set, then dst should
    leave with bits 1, 13 and 15 set.

3.1.9 bitmap_bitremap

bitmap_bitremap - Apply map defined by a pair of bitmaps to a single bit

Synopsis
    int bitmap_bitremap (int oldbit, const unsigned long * old,
    const unsigned long * new, int bits);

Arguments
    oldbit  bit position to be mapped
    old     defines domain of map
    new     defines range of map
    bits    number of bits in each of these bitmaps

Description
    Let old and new define a mapping of bit positions, such that whatever
    position is held by the n-th set bit in old is mapped to the n-th set bit
    in new.
    In the more general case, allowing for the possibility that the weight w
    of new is less than the weight of old, map the position of the n-th set
    bit in old to the position of the m-th set bit in new, where m == n % w.

    The positions of unset bits in old are mapped to themselves (the identity
    map).

    Apply the above specified mapping to bit position oldbit, returning the
    new bit position.

    For example, let's say that old has bits 4 through 7 set, and new has bits
    12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to
    13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if
    say oldbit is 5, then this routine returns 13.

3.1.10 bitmap_onto

bitmap_onto - translate one bitmap relative to another

Synopsis
    void bitmap_onto (unsigned long * dst, const unsigned long * orig,
    const unsigned long * relmap, int bits);

Arguments
    dst     resulting translated bitmap
    orig    original untranslated bitmap
    relmap  bitmap relative to which translated
    bits    number of bits in each of these bitmaps

Description
    Set the n-th bit of dst iff there exists some m such that the n-th bit of
    relmap is set, the m-th bit of orig is set, and the n-th bit of relmap is
    also the m-th _set_ bit of relmap. (If you understood the previous
    sentence the first time you read it, you're overqualified for your current
    job.)

    In other words, orig is mapped onto (surjectively) dst, using the map
    { <n, m> | the n-th bit of relmap is the m-th set bit of relmap }.

    Any set bits in orig above bit number W, where W is the weight of (number
    of set bits in) relmap are mapped nowhere. In particular, if for all bits
    m set in orig, m >= W, then dst will end up empty. In situations where the
    possibility of such an empty result is not desired, one way to avoid it is
    to use the bitmap_fold operator, below, to first fold the orig bitmap over
    itself so that all its set bits x are in the range 0 <= x < W.
    The bitmap_fold operator does this by setting the bit (m % W) in dst, for
    each bit (m) set in orig.

    Example [1] for bitmap_onto: Let's say relmap has bits 30-39 set, and orig
    has bits 1, 3, 5, 7, 9 and 11 set. Then on return from this routine, dst
    will have bits 31, 33, 35, 37 and 39 set.

    When bit 0 is set in orig, it means turn on the bit in dst corresponding
    to whatever is the first bit (if any) that is turned on in relmap. Since
    bit 0 was off in the above example, we leave off that bit (bit 30) in dst.

    When bit 1 is set in orig (as in the above example), it means turn on the
    bit in dst corresponding to whatever is the second bit that is turned on
    in relmap. The second bit in relmap that was turned on in the above
    example was bit 31, so we turned on bit 31 in dst.

    Similarly, we turned on bits 33, 35, 37 and 39 in dst, because they were
    the 4th, 6th, 8th and 10th set bits set in relmap, and the 4th, 6th, 8th
    and 10th bits of orig (i.e. bits 3, 5, 7 and 9) were also set.

    When bit 11 is set in orig, it means turn on the bit in dst corresponding
    to whatever is the twelfth bit that is turned on in relmap. In the above
    example, there were only ten bits turned on in relmap (30..39), so the
    fact that bit 11 was set in orig had no effect on dst.

    Example [2] for bitmap_fold + bitmap_onto: Let's say relmap has these ten
    bits set: 40 41 42 43 45 48 53 61 74 95 (for the curious, that's 40 plus
    the first ten terms of the Fibonacci sequence.)

    Further let's say we use the following code, invoking bitmap_fold then
    bitmap_onto, as suggested above to avoid the possibility of an empty dst
    result:

        unsigned long *tmp;    // a temporary bitmap's bits

        bitmap_fold(tmp, orig, bitmap_weight(relmap, bits), bits);
        bitmap_onto(dst, tmp, relmap, bits);

    Then this table shows what various values of dst would be, for various
    orig's. I list the zero-based positions of each set bit. The tmp column
    shows the intermediate result, as computed by using bitmap_fold to fold
    the orig bitmap modulo ten (the weight of relmap).

        orig           tmp          dst
        0              0            40
        1              1            41
        9              9            95
        10             0            40 (*)
        1 3 5 7        1 3 5 7      41 43 48 61
        0 1 2 3 4      0 1 2 3 4    40 41 42 43 45
        0 9 18 27      0 9 8 7      40 61 74 95
        0 10 20 30     0            40
        0 11 22 33     0 1 2 3      40 41 42 43
        0 12 24 36     0 2 4 6      40 42 45 53
        78 102 211     1 2 8        41 42 74 (*)

    (*) For these marked lines, if we hadn't first done bitmap_fold into tmp,
    then the dst result would have been empty.

    If either of orig or relmap is empty (no set bits), then dst will be
    returned empty.

    If (as explained above) the only set bits in orig are in positions m where
    m >= W, (where W is the weight of relmap) then dst will once again be
    returned empty.

    All bits in dst not set by the above rule are cleared.

3.1.11 bitmap_fold

bitmap_fold - fold larger bitmap into smaller, modulo specified size

Synopsis
    void bitmap_fold (unsigned long * dst, const unsigned long * orig, int sz,
    int bits);

Arguments
    dst   resulting smaller bitmap
    orig  original larger bitmap
    sz    specified size
    bits  number of bits in each of these bitmaps

Description
    For each bit oldbit in orig, set bit oldbit mod sz in dst. Clear all other
    bits in dst. See further the comment and Example [2] for bitmap_onto for
    why and how to use this.

3.1.12 bitmap_find_free_region

bitmap_find_free_region - find a contiguous aligned mem region

Synopsis
    int bitmap_find_free_region (unsigned long * bitmap, int bits, int order);

Arguments
    bitmap  array of unsigned longs corresponding to the bitmap
    bits    number of bits in the bitmap
    order   region size (log base 2 of number of bits) to find

Description
    Find a region of free (zero) bits in a bitmap of bits bits and allocate
    them (set them to one). Only consider regions of length a power (order) of
    two, aligned to that power of two, which makes the search algorithm much
    faster.

    Return the bit offset in bitmap of the allocated region, or -errno on
    failure.
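
The fold-then-onto sequence from Example [2] for bitmap_onto can be reproduced with a small user-space model. The sketch below represents each bitmap as an array of bool rather than packed unsigned longs, purely to keep the logic visible; it is not the kernel implementation, and NBITS and the model_* names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define NBITS 256   /* hypothetical bitmap size for this model */

/* Number of set bits in map. */
static int model_weight(const bool *map)
{
    int w = 0;

    for (int i = 0; i < NBITS; i++)
        if (map[i])
            w++;
    return w;
}

/* Models bitmap_fold: set bit (m % sz) in dst for each bit m set in orig.
 * The caller must pass sz > 0. */
static void model_fold(bool *dst, const bool *orig, int sz)
{
    bool tmp[NBITS] = { false };    /* scratch copy so dst may alias orig */

    for (int m = 0; m < NBITS; m++)
        if (orig[m])
            tmp[m % sz] = true;
    memcpy(dst, tmp, sizeof(tmp));
}

/* Models bitmap_onto: set bit n of dst iff bit n of relmap is its m-th set
 * bit and bit m of orig is set. dst must not alias orig or relmap. */
static void model_onto(bool *dst, const bool *orig, const bool *relmap)
{
    int m = 0;                      /* ordinal among relmap's set bits */

    memset(dst, 0, NBITS * sizeof(bool));
    for (int n = 0; n < NBITS; n++) {
        if (!relmap[n])
            continue;
        if (orig[m])
            dst[n] = true;
        m++;
    }
}
```

With relmap set to bits 40 41 42 43 45 48 53 61 74 95 and orig set to bits 78 102 211, folding by the weight of relmap (ten) gives tmp bits 1 2 8, and mapping tmp onto relmap gives dst bits 41 42 74, matching the last row of the table.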

3.1.13 bitmap_release_region

bitmap_release_region - release allocated bitmap region

Synopsis
    void bitmap_release_region (unsigned long * bitmap, int pos, int order);

Arguments
    bitmap  array of unsigned longs corresponding to the bitmap
    pos     beginning of bit region to release
    order   region size (log base 2 of number of bits) to release

Description
    This is the complement to __bitmap_find_free_region and releases the found
    region (by clearing it in the bitmap).

    No return value.

3.1.14 bitmap_allocate_region

bitmap_allocate_region - allocate bitmap region

Synopsis
    int bitmap_allocate_region (unsigned long * bitmap, int pos, int order);

Arguments
    bitmap  array of unsigned longs corresponding to the bitmap
    pos     beginning of bit region to allocate
    order   region size (log base 2 of number of bits) to allocate

Description
    Allocate (set bits in) a specified region of a bitmap.

    Return 0 on success, or -EBUSY if specified region wasn't free (not all
    bits were zero).

3.1.15 bitmap_copy_le

bitmap_copy_le - copy a bitmap, putting the bits into little-endian order.

Synopsis
    void bitmap_copy_le (void * dst, const unsigned long * src, int nbits);

Arguments
    dst    destination buffer
    src    bitmap to copy
    nbits  number of bits in the bitmap

Description
    Require nbits % BITS_PER_LONG == 0.

3.1.16 __bitmap_parselist

__bitmap_parselist - convert list format ASCII string to bitmap

Synopsis
    int __bitmap_parselist (const char * buf, unsigned int buflen, int is_user,
    unsigned long * maskp, int nmaskbits);

Arguments
    buf        read nul-terminated user string from this buffer
    buflen     buffer size in bytes. If string is smaller than this then it
               must be terminated with a \0.
    is_user    location of buffer, 0 indicates kernel space
    maskp      write resulting mask here
    nmaskbits  number of bits in mask to be written

Description
    Input format is a comma-separated list of decimal numbers and ranges.
    Consecutively set bits are shown as two hyphen-separated decimal numbers,
    the smallest and largest bit numbers set in the range.

    Returns 0 on success, -errno on invalid input strings.

Error values
    -EINVAL: second number in range smaller than first
    -EINVAL: invalid character in string
    -ERANGE: bit number specified too large for mask

3.1.17 bitmap_pos_to_ord

bitmap_pos_to_ord - find ordinal of set bit at given position in bitmap

Synopsis
    int bitmap_pos_to_ord (const unsigned long * buf, int pos, int bits);

Arguments
    buf   pointer to a bitmap
    pos   a bit position in buf (0 <= pos < bits)
    bits  number of valid bit positions in buf

Description
    Map the bit at position pos in buf (of length bits) to the ordinal of
    which set bit it is. If it is not set or if pos is not a valid bit
    position, map to -1.

    If for example, just bits 4 through 7 are set in buf, then pos values 4
    through 7 will get mapped to 0 through 3, respectively, and other pos
    values will get mapped to -1. When pos value 7 gets mapped to (returns)
    ord value 3 in this example, that means that bit 7 is the 3rd (starting
    with 0th) set bit in buf.

    The bit positions 0 through bits - 1 are valid positions in buf.

3.1.18 bitmap_ord_to_pos

bitmap_ord_to_pos - find position of n-th set bit in bitmap

Synopsis
    int bitmap_ord_to_pos (const unsigned long * buf, int ord, int bits);

Arguments
    buf   pointer to bitmap
    ord   ordinal bit position (n-th set bit, n >= 0)
    bits  number of valid bit positions in buf

Description
    Map the ordinal offset of bit ord in buf to its position in buf. Value of
    ord should be in range 0 <= ord < weight(buf), else results are undefined.

    If for example, just bits 4 through 7 are set in buf, then ord values 0
    through 3 will get mapped to 4 through 7, respectively, and all other ord
    values return undefined values. When ord value 3 gets mapped to (returns)
    pos value 7 in this example, that means that the 3rd set bit (starting
    with 0th) is at position 7 in buf.
    The bit positions 0 through bits - 1 are valid positions in buf.

3.2 Command-line Parsing

3.2.1 get_option

get_option - Parse integer from an option string

Synopsis
    int get_option (char ** str, int * pint);

Arguments
    str   option string
    pint  (output) integer value parsed from str

Description
    Read an int from an option string; if available accept a subsequent comma
    as well.

Return values
    0 - no int in string
    1 - int found, no subsequent comma
    2 - int found including a subsequent comma
    3 - hyphen found to denote a range

3.2.2 get_options

get_options - Parse a string into a list of integers

Synopsis
    char * get_options (const char * str, int nints, int * ints);

Arguments
    str    String to be parsed
    nints  size of integer array
    ints   integer array

Description
    This function parses a string containing a comma-separated list of
    integers, a hyphen-separated range of _positive_ integers, or a
    combination of both. The parse halts when the array is full, or when no
    more numbers can be retrieved from the string.

    Return value is the character in the string which caused the parse to end
    (typically a null terminator, if str is completely parseable).

3.2.3 memparse

memparse - parse a string with mem suffixes into a number

Synopsis
    unsigned long long memparse (const char * ptr, char ** retptr);

Arguments
    ptr     Where parse begins
    retptr  (output) Optional pointer to next char after parse completes

Description
    Parses a string into a number. The number stored at ptr is potentially
    suffixed with K (for kilobytes, or 1024 bytes), M (for megabytes, or
    1048576 bytes), or G (for gigabytes, or 1073741824). If the number is
    suffixed with K, M, or G, then the return value is the number multiplied
    by one kilobyte, one megabyte, or one gigabyte, respectively.
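
The suffix handling described for memparse can be sketched in user space on top of the standard strtoull. This is a model of the documented behaviour rather than the kernel source: model_memparse is a hypothetical name, and accepting lowercase k, m and g as well as the documented uppercase suffixes is an assumption of this sketch.

```c
#include <stdlib.h>

/*
 * User-space model of memparse: parse a number with an optional K, M or G
 * suffix and scale it by 2^10, 2^20 or 2^30. Hypothetical helper; accepting
 * lowercase suffixes is an assumption of this sketch.
 */
static unsigned long long model_memparse(const char *ptr, char **retptr)
{
    char *end;
    unsigned long long ret = strtoull(ptr, &end, 0);

    switch (*end) {
    case 'G':
    case 'g':
        ret <<= 10;
        /* fall through */
    case 'M':
    case 'm':
        ret <<= 10;
        /* fall through */
    case 'K':
    case 'k':
        ret <<= 10;
        end++;          /* consume the suffix character */
        break;
    default:
        break;
    }

    if (retptr)
        *retptr = end;  /* first character after the parsed value */
    return ret;
}
```

Each case falls through to the next, so G applies three shifts of ten bits, M applies two and K applies one. Passing a non-NULL retptr lets the caller continue parsing after the number, for example at the comma in "16M,foo".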

3.3 CRC Functions

3.3.1 crc7_be

crc7_be - update the CRC7 for the data buffer

Synopsis
    u8 crc7_be (u8 crc, const u8 * buffer, size_t len);

Arguments
    crc     previous CRC7 value
    buffer  data pointer
    len     number of bytes in the buffer

Context
    any

Description
    Returns the updated CRC7 value. The CRC7 is left-aligned in the byte (the
    lsbit is always 0), as that makes the computation easier, and all callers
    want it in that form.

3.3.2 crc16

crc16 - compute the CRC-16 for the data buffer

Synopsis
    u16 crc16 (u16 crc, u8 const * buffer, size_t len);

Arguments
    crc     previous CRC value
    buffer  data pointer
    len     number of bytes in the buffer

Description
    Returns the updated CRC value.

3.3.3 crc_itu_t

crc_itu_t - Compute the CRC-ITU-T for the data buffer

Synopsis
    u16 crc_itu_t (u16 crc, const u8 * buffer, size_t len);

Arguments
    crc     previous CRC value
    buffer  data pointer
    len     number of bytes in the buffer

Description
    Returns the updated CRC value

3.3.4 .//lib/crc32.c

.//lib/crc32.c - Document generation inconsistency

Oops

Warning
    The template for this document tried to insert the structured comment from
    the file .//lib/crc32.c at this point, but none was found. This dummy
    section is inserted to allow generation to continue.

3.3.5 crc_ccitt

crc_ccitt - recompute the CRC for the data buffer

Synopsis
    u16 crc_ccitt (u16 crc, u8 const * buffer, size_t len);

Arguments
    crc     previous CRC value
    buffer  data pointer
    len     number of bytes in the buffer

3.4 idr/ida Functions

idr synchronization (stolen from radix-tree.h)

idr_find is able to be called locklessly, using RCU. The caller must ensure
calls to this function are made within rcu_read_lock regions. Other readers
(lock-free or otherwise) and modifications may be running concurrently.

It is still required that the caller manage the synchronization and lifetimes
of the items.
So if RCU lock-free lookups are used, typically this would mean that the items
have their own locks, or are amenable to lock-free access; and that the items
are freed by RCU (or only freed after having been deleted from the idr tree
*and* a synchronize_rcu grace period).

IDA - IDR based ID allocator

This is id allocator without id -> pointer translation. Memory usage is much
lower than full blown idr because each id only occupies a bit. ida uses a
custom leaf node which contains IDA_BITMAP_BITS slots.

2007-04-25 written by Tejun Heo <htejun@gmail.com>

3.4.1 idr_preload

idr_preload - preload for idr_alloc

Synopsis
    void idr_preload (gfp_t gfp_mask);

Arguments
    gfp_mask  allocation mask to use for preloading

Description
    Preload per-cpu layer buffer for idr_alloc. Can only be used from process
    context and each idr_preload invocation should be matched with
    idr_preload_end. Note that preemption is disabled while preloaded.

    The first idr_alloc in the preloaded section can be treated as if it were
    invoked with gfp_mask used for preloading. This allows using more
    permissive allocation masks for idrs protected by spinlocks.

    For example, if idr_alloc below fails, the failure can be treated as if
    idr_alloc were called with GFP_KERNEL rather than GFP_NOWAIT.

        idr_preload(GFP_KERNEL);
        spin_lock(lock);

        id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);

        spin_unlock(lock);
        idr_preload_end();
        if (id < 0)
            error;

3.4.2 idr_alloc

idr_alloc - allocate new idr entry

Synopsis
    int idr_alloc (struct idr * idr, void * ptr, int start, int end,
    gfp_t gfp_mask);

Arguments
    idr       the (initialized) idr
    ptr       pointer to be associated with the new id
    start     the minimum id (inclusive)
    end       the maximum id (exclusive, <= 0 for max)
    gfp_mask  memory allocation flags

Description
    Allocate an id in [start, end) and associate it with ptr. If no ID is
    available in the specified range, returns -ENOSPC. On memory allocation
    failure, returns -ENOMEM.

    Note that end is treated as max when <= 0.
    This is to always allow using start + N as end as long as N is inside
    integer range.

    The user is responsible for exclusively synchronizing all operations which
    may modify idr. However, read-only accesses such as idr_find or iteration
    can be performed under RCU read lock provided the user destroys ptr in
    RCU-safe way after removal from idr.

3.4.3 idr_alloc_cyclic

idr_alloc_cyclic - allocate new idr entry in a cyclical fashion

Synopsis
    int idr_alloc_cyclic (struct idr * idr, void * ptr, int start, int end,
    gfp_t gfp_mask);

Arguments
    idr       the (initialized) idr
    ptr       pointer to be associated with the new id
    start     the minimum id (inclusive)
    end       the maximum id (exclusive, <= 0 for max)
    gfp_mask  memory allocation flags

Description
    Essentially the same as idr_alloc, but prefers to allocate progressively
    higher ids if it can. If the "cur" counter wraps, then it will start again
    at the "start" end of the range and allocate one that has already been
    used.

3.4.4 idr_remove

idr_remove - remove the given id and free its slot

Synopsis
    void idr_remove (struct idr * idp, int id);

Arguments
    idp  idr handle
    id   unique key

3.4.5 idr_destroy

idr_destroy - release all cached layers within an idr tree

Synopsis
    void idr_destroy (struct idr * idp);

Arguments
    idp  idr handle

Description
    Free all id mappings and all idp_layers. After this function, idp is
    completely unused and can be freed / recycled. The caller is responsible
    for ensuring that no one else accesses idp during or after idr_destroy.

    A typical clean-up sequence for objects stored in an idr tree will use
    idr_for_each to free all objects, if necessary, then idr_destroy to free
    up the id mappings and cached idr_layers.
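
The range rules documented for idr_alloc in 3.4.2 (ids are taken from [start, end), end <= 0 means no upper bound, and -ENOSPC is returned when the range is full) can be illustrated with a toy user-space table. This is not how the kernel idr is implemented: the fixed-size array, the MODEL_IDS capacity, the model_* names, the -EINVAL result for a negative start, and the requirement that ptr be non-NULL are all assumptions of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_IDS 64    /* hypothetical capacity of this toy table */

/* Toy id -> pointer table; a NULL slot means the id is free. */
struct model_idr {
    void *slot[MODEL_IDS];
};

/*
 * Model of the idr_alloc range rule: return the lowest free id in
 * [start, end), or in [start, max) when end <= 0. ptr must be non-NULL
 * in this model because NULL marks a free slot.
 */
static int model_idr_alloc(struct model_idr *idr, void *ptr, int start, int end)
{
    int max = (end <= 0) ? INT_MAX : end;   /* end <= 0 is treated as max */

    if (start < 0)
        return -EINVAL;     /* assumption: negative start is rejected */

    for (int id = start; id < max && id < MODEL_IDS; id++) {
        if (!idr->slot[id]) {
            idr->slot[id] = ptr;
            return id;
        }
    }
    return -ENOSPC;         /* no free id in the requested range */
}

/* Look up the pointer stored for id, or NULL if none. */
static void *model_idr_find(const struct model_idr *idr, int id)
{
    if (id < 0 || id >= MODEL_IDS)
        return NULL;
    return idr->slot[id];
}

/* Remove id and free its slot. */
static void model_idr_remove(struct model_idr *idr, int id)
{
    if (id >= 0 && id < MODEL_IDS)
        idr->slot[id] = NULL;
}
```

Because end is exclusive, a request for [10, 12) can succeed at most twice (ids 10 and 11) before returning -ENOSPC. Passing end as 0 avoids the caller having to compute an upper bound when any id at or above start is acceptable.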
3.4.6 idr_for_each

idr_for_each - iterate through all stored pointers

Synopsis
  int idr_for_each (struct idr * idp, int (*fn) (int id, void *p, void *data), void * data);

Arguments
  idp - idr handle
  fn - function to be called for each pointer
  data - data passed back to callback function

Description
Iterate over the pointers registered with the given idr. The callback function will be called for each pointer currently registered, passing the id, the pointer and the data pointer passed to this function. It is not safe to modify the idr tree while in the callback, so functions such as idr_get_new and idr_remove are not allowed.

We check the return of fn each time. If it returns anything other than 0, we break out and return that value.

The caller must serialize idr_for_each vs idr_get_new and idr_remove.

3.4.7 idr_get_next

idr_get_next - look up the next object at or after the given id

Synopsis
  void * idr_get_next (struct idr * idp, int * nextidp);

Arguments
  idp - idr handle
  nextidp - pointer to lookup key

Description
Returns a pointer to the registered object with the smallest id greater than or equal to the given id. After the lookup, *nextidp is updated for the next iteration.

This function can be called under rcu_read_lock, given that the leaf pointers' lifetimes are correctly managed.

3.4.8 idr_replace

idr_replace - replace pointer for given id

Synopsis
  void * idr_replace (struct idr * idp, void * ptr, int id);

Arguments
  idp - idr handle
  ptr - pointer you want associated with the id
  id - lookup key

Description
Replace the pointer registered with an id and return the old value. A -ENOENT return indicates that id was not found. A -EINVAL return indicates that id was not within valid constraints.

The caller must serialize with writers.

3.4.9 idr_init

idr_init - initialize idr handle

Synopsis
  void idr_init (struct idr * idp);

Arguments
  idp - idr handle

Description
This function is used to set up the handle (idp) that you will pass to the rest of the functions.
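The lookup-key contract of idr_get_next (3.4.7) is easiest to see in a loop. The following is a userspace sketch only, assuming a fixed array in place of the idr tree; toy_get_next, toy_count and TOY_SLOTS are hypothetical names. The caller advances the key by one after each hit so that the next call continues past the entry just returned.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 16	/* hypothetical table size */

/*
 * Return the first non-NULL entry with id >= *nextid and store the id of
 * that entry in *nextid. Returns NULL when no such entry exists.
 */
static void *toy_get_next(void *const slots[TOY_SLOTS], int *nextid)
{
	assert(nextid != NULL);
	for (int id = *nextid < 0 ? 0 : *nextid; id < TOY_SLOTS; id++) {
		if (slots[id] != NULL) {
			*nextid = id;
			return slots[id];
		}
	}
	return NULL;
}

/* Count all entries by walking the table with the get_next pattern. */
static int toy_count(void *const slots[TOY_SLOTS])
{
	int n = 0;

	/* id++ steps past the entry that was just returned */
	for (int id = 0; toy_get_next(slots, &id) != NULL; id++)
		n++;
	return n;
}
```

The loop in toy_count has the same shape a caller would use with the real function: look up from the current key, process the entry, increment the key, repeat until NULL.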
3.4.10 ida_pre_get

ida_pre_get - reserve resources for ida allocation

Synopsis
  int ida_pre_get (struct ida * ida, gfp_t gfp_mask);

Arguments
  ida - ida handle
  gfp_mask - memory allocation flag

Description
This function should be called prior to locking and calling the following function. It preallocates enough memory to satisfy the worst possible allocation.

If the system is REALLY out of memory this function returns 0, otherwise 1.

3.4.11 ida_get_new_above

ida_get_new_above - allocate new ID above or equal to a start id

Synopsis
  int ida_get_new_above (struct ida * ida, int starting_id, int * p_id);

Arguments
  ida - ida handle
  starting_id - id to start search at
  p_id - pointer to the allocated handle

Description
Allocate new ID above or equal to starting_id. It should be called with any required locks.

If memory is required, it will return -EAGAIN; you should unlock and go back to the ida_pre_get call. If the ida is full, it will return -ENOSPC.

p_id returns a value in the range starting_id ... 0x7fffffff.

3.4.12 ida_remove

ida_remove - remove the given ID

Synopsis
  void ida_remove (struct ida * ida, int id);

Arguments
  ida - ida handle
  id - ID to free

3.4.13 ida_destroy

ida_destroy - release all cached layers within an ida tree

Synopsis
  void ida_destroy (struct ida * ida);

Arguments
  ida - ida handle

3.4.14 ida_simple_get

ida_simple_get - get a new id.

Synopsis
  int ida_simple_get (struct ida * ida, unsigned int start, unsigned int end, gfp_t gfp_mask);

Arguments
  ida - the (initialized) ida.
  start - the minimum id (inclusive, < 0x8000000)
  end - the maximum id (exclusive, < 0x8000000 or 0)
  gfp_mask - memory allocation flags

Description
Allocates an id in the range start <= id < end, or returns -ENOSPC. On memory allocation failure, returns -ENOMEM.

Use ida_simple_remove to get rid of an id.

3.4.15 ida_simple_remove

ida_simple_remove - remove an allocated id.
Synopsis
  void ida_simple_remove (struct ida * ida, unsigned int id);

Arguments
  ida - the (initialized) ida.
  id - the id returned by ida_simple_get.

3.4.16 ida_init

ida_init - initialize ida handle

Synopsis
  void ida_init (struct ida * ida);

Arguments
  ida - ida handle

Description
This function is used to set up the handle (ida) that you will pass to the rest of the functions.

Chapter 4 Memory Management in Linux

4.1 The Slab Cache

4.1.1 kmalloc

kmalloc - allocate memory

Synopsis
  void * kmalloc (size_t size, gfp_t flags);

Arguments
  size - how many bytes of memory are required.
  flags - the type of memory to allocate.

Description
kmalloc is the normal method of allocating memory for objects smaller than page size in the kernel.

The flags argument may be one of:

  GFP_USER - Allocate memory on behalf of user. May sleep.
  GFP_KERNEL - Allocate normal kernel ram. May sleep.
  GFP_ATOMIC - Allocation will not sleep. May use emergency pools. For example, use this inside interrupt handlers.
  GFP_HIGHUSER - Allocate pages from high memory.
  GFP_NOIO - Do not do any I/O at all while trying to get memory.
  GFP_NOFS - Do not make any fs calls while trying to get memory.
  GFP_NOWAIT - Allocation will not sleep.
  __GFP_THISNODE - Allocate node-local memory only.
  GFP_DMA - Allocation suitable for DMA. Should only be used for kmalloc caches. Otherwise, use a slab created with SLAB_DMA.

It is also possible to set different flags by ORing in one or more of the following additional flags:

  __GFP_COLD - Request cache-cold pages instead of trying to return cache-warm pages.
  __GFP_HIGH - This allocation has high priority and may use emergency pools.
  __GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail (think twice before using).
  __GFP_NORETRY - If memory is not immediately available, then give up at once.
  __GFP_NOWARN - If allocation fails, don't issue any warnings.
  __GFP_REPEAT - If allocation fails initially, try once more before failing.
There are other flags available as well, but these are not intended for general use, and so are not documented here. For a full list of potential flags, always refer to linux/gfp.h.

4.1.2 kmalloc_array

kmalloc_array - allocate memory for an array.

Synopsis
  void * kmalloc_array (size_t n, size_t size, gfp_t flags);

Arguments
  n - number of elements.
  size - element size.
  flags - the type of memory to allocate (see kmalloc).

4.1.3 kcalloc

kcalloc - allocate memory for an array. The memory is set to zero.

Synopsis
  void * kcalloc (size_t n, size_t size, gfp_t flags);

Arguments
  n - number of elements.
  size - element size.
  flags - the type of memory to allocate (see kmalloc).

4.1.4 kzalloc

kzalloc - allocate memory. The memory is set to zero.

Synopsis
  void * kzalloc (size_t size, gfp_t flags);

Arguments
  size - how many bytes of memory are required.
  flags - the type of memory to allocate (see kmalloc).

4.1.5 kzalloc_node

kzalloc_node - allocate zeroed memory from a particular memory node.

Synopsis
  void * kzalloc_node (size_t size, gfp_t flags, int node);

Arguments
  size - how many bytes of memory are required.
  flags - the type of memory to allocate (see kmalloc).
  node - memory node from which to allocate

4.1.6 kmem_cache_alloc

kmem_cache_alloc - Allocate an object

Synopsis
  void * kmem_cache_alloc (struct kmem_cache * cachep, gfp_t flags);

Arguments
  cachep - The cache to allocate from.
  flags - See kmalloc.

Description
Allocate an object from this cache. The flags are only relevant if the cache has no available objects.

4.1.7 kmem_cache_alloc_node

kmem_cache_alloc_node - Allocate an object on the specified node

Synopsis
  void * kmem_cache_alloc_node (struct kmem_cache * cachep, gfp_t flags, int nodeid);

Arguments
  cachep - The cache to allocate from.
  flags - See kmalloc.
  nodeid - node number of the target node.

Description
Identical to kmem_cache_alloc but it will allocate memory on the given node, which can improve the performance for cpu bound structures.
Fallback to another node is possible if __GFP_THISNODE is not set.

4.1.8 kmem_cache_free

kmem_cache_free - Deallocate an object

Synopsis
  void kmem_cache_free (struct kmem_cache * cachep, void * objp);

Arguments
  cachep - The cache the allocation was from.
  objp - The previously allocated object.

Description
Free an object which was previously allocated from this cache.

4.1.9 kfree

kfree - free previously allocated memory

Synopsis
  void kfree (const void * objp);

Arguments
  objp - pointer returned by kmalloc.

Description
If objp is NULL, no operation is performed.

Don't free memory not originally allocated by kmalloc or you will run into trouble.

4.1.10 ksize

ksize - get the actual amount of memory allocated for a given object

Synopsis
  size_t ksize (const void * objp);

Arguments
  objp - Pointer to the object

Description
kmalloc may internally round up allocations and return more memory than requested. ksize can be used to determine the actual amount of memory allocated. The caller may use this additional memory, even though a smaller amount of memory was initially specified with the kmalloc call. The caller must guarantee that objp points to a valid object previously allocated with either kmalloc or kmem_cache_alloc. The object must not be freed during the duration of the call.
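The rounding that ksize reports can be illustrated with a toy size-class function. This is a sketch under a loud assumption: it uses power-of-two size classes starting at 8 bytes, which is chosen only for illustration; the classes a real kernel uses depend on the slab allocator and its configuration. toy_bucket_size is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the size of the smallest class that can hold `requested` bytes.
 * Assumed classes: 8, 16, 32, ... (an illustration, not the kernel's table).
 */
static size_t toy_bucket_size(size_t requested)
{
	size_t bucket = 8;

	assert(requested <= ((size_t)1 << 30));	/* keep the toy loop bounded */
	while (bucket < requested)
		bucket <<= 1;
	return bucket;
}
```

Under this assumption a 100-byte request would be served from a 128-byte class, and the 28 extra bytes are the "additional memory" that the ksize description says the caller may use.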
4.1.11 kstrdup

kstrdup - allocate space for and copy an existing string

Synopsis
  char * kstrdup (const char * s, gfp_t gfp);

Arguments
  s - the string to duplicate
  gfp - the GFP mask used in the kmalloc call when allocating memory

4.1.12 kstrndup

kstrndup - allocate space for and copy an existing string

Synopsis
  char * kstrndup (const char * s, size_t max, gfp_t gfp);

Arguments
  s - the string to duplicate
  max - read at most max chars from s
  gfp - the GFP mask used in the kmalloc call when allocating memory

4.1.13 kmemdup

kmemdup - duplicate region of memory

Synopsis
  void * kmemdup (const void * src, size_t len, gfp_t gfp);

Arguments
  src - memory region to duplicate
  len - memory region length
  gfp - GFP mask to use

4.1.14 memdup_user

memdup_user - duplicate memory region from user space

Synopsis
  void * memdup_user (const void __user * src, size_t len);

Arguments
  src - source address in user space
  len - number of bytes to copy

Description
Returns an ERR_PTR on failure.

4.1.15 __krealloc

__krealloc - like krealloc but don't free p.

Synopsis
  void * __krealloc (const void * p, size_t new_size, gfp_t flags);

Arguments
  p - object to reallocate memory for.
  new_size - how many bytes of memory are required.
  flags - the type of memory to allocate.

Description
This function is like krealloc except it never frees the originally allocated buffer. Use this if you don't want to free the buffer immediately like, for example, with RCU.

4.1.16 krealloc

krealloc - reallocate memory. The contents will remain unchanged.

Synopsis
  void * krealloc (const void * p, size_t new_size, gfp_t flags);

Arguments
  p - object to reallocate memory for.
  new_size - how many bytes of memory are required.
  flags - the type of memory to allocate.

Description
The contents of the object pointed to are preserved up to the lesser of the new and old sizes. If p is NULL, krealloc behaves exactly like kmalloc. If new_size is 0 and p is not a NULL pointer, the object pointed to is freed.
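The three cases in the krealloc description can be modeled in userspace with malloc and free. This is a sketch, not the kernel implementation: toy_krealloc is a hypothetical name, the explicit old_size parameter is an addition made for the model (the kernel derives the old size itself), the model always allocates a new buffer, and it returns NULL after freeing in the zero-size case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model of the documented krealloc cases.
 * old_size is a hypothetical extra parameter needed only by this model.
 */
static void *toy_krealloc(void *p, size_t old_size, size_t new_size)
{
	void *q;

	assert(p != NULL || old_size == 0);
	if (new_size == 0) {
		free(p);	/* size 0: the old object is freed */
		return NULL;
	}
	q = malloc(new_size);	/* p == NULL: behaves like a plain allocation */
	if (q == NULL)
		return NULL;	/* on failure the old object is left untouched */
	if (p != NULL) {
		/* contents survive up to the lesser of the old and new sizes */
		memcpy(q, p, old_size < new_size ? old_size : new_size);
		free(p);
	}
	return q;
}
```

Dropping the free(p) call in the copy branch would give the behavior described for __krealloc, where the caller remains responsible for releasing the original buffer later.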
4.1.17 kzfree

kzfree - like kfree but zero memory

Synopsis
  void kzfree (const void * p);

Arguments
  p - object to free memory of

Description
The memory of the object p points to is zeroed before it is freed. If p is NULL, kzfree does nothing.

Note that this function zeroes the whole allocated buffer, which can be a good deal bigger than the requested buffer size passed to kmalloc. So be careful when using this function in performance-sensitive code.

4.1.18 get_user_pages_fast

get_user_pages_fast - pin user pages in memory

Synopsis
  int get_user_pages_fast (unsigned long start, int nr_pages, int write, struct page ** pages);

Arguments
  start - starting user address
  nr_pages - number of pages from start to pin
  write - whether pages will be written to
  pages - array that receives pointers to the pages pinned. Should be at least nr_pages long.

Description
Returns number of pages pinned. This may be fewer than the number requested. If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns -errno.

get_user_pages_fast provides equivalent functionality to get_user_pages, operating on current and current->mm, with force=0 and vma=NULL. However, unlike get_user_pages, it must be called without mmap_sem held.

get_user_pages_fast may take mmap_sem and page table locks, so no assumptions can be made about lack of locking. get_user_pages_fast is to be implemented in a way that is advantageous (vs get_user_pages) when the user memory area is already faulted in and present in ptes. However, if the pages have to be faulted in, it may turn out to be slightly slower, so callers need to carefully consider what to use. On many architectures, get_user_pages_fast simply falls back to get_user_pages.

4.2 User Space Memory Access

4.2.1 __copy_to_user_inatomic

__copy_to_user_inatomic - Copy a block of data into user space, with less checking.
Synopsis
  unsigned long __copy_to_user_inatomic (void __user * to, const void * from, unsigned long n);

Arguments
  to - Destination address, in user space.
  from - Source address, in kernel space.
  n - Number of bytes to copy.

Context
User context only.

Description
Copy data from kernel space to user space. Caller must check the specified block with access_ok before calling this function. The caller should also pin the user space address so that the copy does not take a page fault and sleep.

Here we special-case 1, 2 and 4-byte copy_*_user invocations. On a fault we return the initial request size (1, 2 or 4), as copy_*_user should do. If a store crosses a page boundary and gets a fault, the x86 will not write anything, so this is accurate.

4.2.2 __copy_to_user

__copy_to_user - Copy a block of data into user space, with less checking.

Synopsis
  unsigned long __copy_to_user (void __user * to, const void * from, unsigned long n);

Arguments
  to - Destination address, in user space.
  from - Source address, in kernel space.
  n - Number of bytes to copy.

Context
User context only. This function may sleep.

Description
Copy data from kernel space to user space. Caller must check the specified block with access_ok before calling this function.

Returns number of bytes that could not be copied. On success, this will be zero.

4.2.3 __copy_from_user

__copy_from_user - Copy a block of data from user space, with less checking.

Synopsis
  unsigned long __copy_from_user (void * to, const void __user * from, unsigned long n);

Arguments
  to - Destination address, in kernel space.
  from - Source address, in user space.
  n - Number of bytes to copy.

Context
User context only. This function may sleep.

Description
Copy data from user space to kernel space. Caller must check the specified block with access_ok before calling this function.

Returns number of bytes that could not be copied. On success, this will be zero.
If some data could not be copied, this function will pad the copied data to the requested size using zero bytes.

An alternate version - __copy_from_user_inatomic - may be called from atomic context and will fail rather than sleep. In this case the uncopied bytes will *NOT* be padded with zeros. See fs/filemap.h for explanation of why this is needed.

4.2.4 clear_user

clear_user - Zero a block of memory in user space.

Synopsis
  unsigned long clear_user (void __user * to, unsigned long n);

Arguments
  to - Destination address, in user space.
  n - Number of bytes to zero.

Description
Zero a block of memory in user space.

Returns number of bytes that could not be cleared. On success, this will be zero.

4.2.5 __clear_user

__clear_user - Zero a block of memory in user space, with less checking.

Synopsis
  unsigned long __clear_user (void __user * to, unsigned long n);

Arguments
  to - Destination address, in user space.
  n - Number of bytes to zero.

Description
Zero a block of memory in user space. Caller must check the specified block with access_ok before calling this function.

Returns number of bytes that could not be cleared. On success, this will be zero.

4.2.6 _copy_to_user

_copy_to_user - Copy a block of data into user space.

Synopsis
  unsigned long _copy_to_user (void __user * to, const void * from, unsigned n);

Arguments
  to - Destination address, in user space.
  from - Source address, in kernel space.
  n - Number of bytes to copy.

Context
User context only. This function may sleep.

Description
Copy data from kernel space to user space.

Returns number of bytes that could not be copied. On success, this will be zero.

4.2.7 _copy_from_user

_copy_from_user - Copy a block of data from user space.

Synopsis
  unsigned long _copy_from_user (void * to, const void __user * from, unsigned n);

Arguments
  to - Destination address, in kernel space.
  from - Source address, in user space.
  n - Number of bytes to copy.

Context
User context only.
This function may sleep.

Description
Copy data from user space to kernel space.

Returns number of bytes that could not be copied. On success, this will be zero.

If some data could not be copied, this function will pad the copied data to the requested size using zero bytes.

4.3 More Memory Management Functions

4.3.1 read_cache_pages

read_cache_pages - populate an address space with some pages & start reads against them

Synopsis
  int read_cache_pages (struct address_space * mapping, struct list_head * pages, int (*filler) (void *, struct page *), void * data);

Arguments
  mapping - the address_space
  pages - The address of a list_head which contains the target pages. These pages have their ->index populated and are otherwise uninitialised.
  filler - callback routine for filling a single page.
  data - private data for the callback routine.

Description
Hides the details of the LRU cache etc. from the filesystems.

4.3.2 page_cache_sync_readahead

page_cache_sync_readahead - generic file readahead

Synopsis
  void page_cache_sync_readahead (struct address_space * mapping, struct file_ra_state * ra, struct file * filp, pgoff_t offset, unsigned long req_size);

Arguments
  mapping - address_space which holds the pagecache and I/O vectors
  ra - file_ra_state which holds the readahead state
  filp - passed on to ->readpage and ->readpages
  offset - start offset into mapping, in pagecache page-sized units
  req_size - hint: total size of the read which the caller is performing in pagecache pages

Description
page_cache_sync_readahead should be called when a cache miss happened: it will submit the read. The readahead logic may decide to piggyback more pages onto the read request if access patterns suggest it will improve performance.
4.3.3 page_cache_async_readahead

page_cache_async_readahead - file readahead for marked pages

Synopsis
  void page_cache_async_readahead (struct address_space * mapping, struct file_ra_state * ra, struct file * filp, struct page * page, pgoff_t offset, unsigned long req_size);

Arguments
  mapping - address_space which holds the pagecache and I/O vectors
  ra - file_ra_state which holds the readahead state
  filp - passed on to ->readpage and ->readpages
  page - the page at offset which has the PG_readahead flag set
  offset - start offset into mapping, in pagecache page-sized units
  req_size - hint: total size of the read which the caller is performing in pagecache pages

Description
page_cache_async_readahead should be called when a page is used which has the PG_readahead flag; this is a marker to suggest that the application has used up enough of the readahead window that we should start pulling in more pages.

4.3.4 delete_from_page_cache

delete_from_page_cache - delete page from page cache

Synopsis
  void delete_from_page_cache (struct page * page);

Arguments
  page - the page which the kernel is trying to remove from page cache

Description
This must be called only on pages that have been verified to be in the page cache and locked. It will never put the page into the free list, because the caller has a reference on the page.

4.3.5 filemap_flush

filemap_flush - mostly a non-blocking flush

Synopsis
  int filemap_flush (struct address_space * mapping);

Arguments
  mapping - target address_space

Description
This is a mostly non-blocking flush. Not suitable for data-integrity purposes - I/O may not be started against all dirty pages.
4.3.6 filemap_fdatawait_range

filemap_fdatawait_range - wait for writeback to complete

Synopsis
  int filemap_fdatawait_range (struct address_space * mapping, loff_t start_byte, loff_t end_byte);

Arguments
  mapping - address space structure to wait for
  start_byte - offset in bytes where the range starts
  end_byte - offset in bytes where the range ends (inclusive)

Description
Walk the list of under-writeback pages of the given address space in the given range and wait for all of them.

4.3.7 filemap_fdatawait

filemap_fdatawait - wait for all under-writeback pages to complete

Synopsis
  int filemap_fdatawait (struct address_space * mapping);

Arguments
  mapping - address space structure to wait for

Description
Walk the list of under-writeback pages of the given address space and wait for all of them.

4.3.8 filemap_write_and_wait_range

filemap_write_and_wait_range - write out & wait on a file range

Synopsis
  int filemap_write_and_wait_range (struct address_space * mapping, loff_t lstart, loff_t lend);

Arguments
  mapping - the address_space for the pages
  lstart - offset in bytes where the range starts
  lend - offset in bytes where the range ends (inclusive)

Description
Write out and wait upon file offsets lstart->lend, inclusive.

Note that lend is inclusive (describes the last byte to be written) so that this function can be used to write to the very end-of-file (end = -1).

4.3.9 replace_page_cache_page

replace_page_cache_page - replace a pagecache page with a new one

Synopsis
  int replace_page_cache_page (struct page * old, struct page * new, gfp_t gfp_mask);

Arguments
  old - page to be replaced
  new - page to replace with
  gfp_mask - allocation mode

Description
This function replaces a page in the pagecache with a new one. On success it acquires the pagecache reference for the new page and drops it for the old page. Both the old and new pages must be locked. This function does not add the new page to the LRU; the caller must do that.

The remove + add is atomic.
The only way this function can fail is memory allocation failure.

4.3.10 add_to_page_cache_locked

add_to_page_cache_locked - add a locked page to the pagecache

Synopsis
  int add_to_page_cache_locked (struct page * page, struct address_space * mapping, pgoff_t offset, gfp_t gfp_mask);

Arguments
  page - page to add
  mapping - the page's address_space
  offset - page index
  gfp_mask - page allocation mode

Description
This function is used to add a page to the pagecache. The page must be locked. This function does not add the page to the LRU. The caller must do that.

4.3.11 add_page_wait_queue

add_page_wait_queue - Add an arbitrary waiter to a page's wait queue

Synopsis
  void add_page_wait_queue (struct page * page, wait_queue_t * waiter);

Arguments
  page - Page defining the wait queue of interest
  waiter - Waiter to add to the queue

Description
Add an arbitrary waiter to the wait queue for the nominated page.

4.3.12 unlock_page

unlock_page - unlock a locked page

Synopsis
  void unlock_page (struct page * page);

Arguments
  page - the page

Description
Unlocks the page and wakes up sleepers in ___wait_on_page_locked. Also wakes sleepers in wait_on_page_writeback because the wakeup mechanism between PageLocked pages and PageWriteback pages is shared. But that's OK - sleepers in wait_on_page_writeback just go back to sleep.

The mb is necessary to enforce ordering between the clear_bit and the read of the waitqueue (to avoid SMP races with a parallel wait_on_page_locked).
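The ordering requirement in the unlock_page description (clear the lock bit, issue a full barrier, then read the wait queue) can be sketched with C11 atomics. This is a userspace illustration only: struct toy_page, TOY_PG_LOCKED and the toy_* functions are hypothetical names, the wait queue is reduced to a counter, and the "wake-up" is a counter increment standing in for the real wake-up call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_PG_LOCKED 0x1u	/* hypothetical flag bit */

struct toy_page {
	atomic_uint flags;	/* bit 0: page is locked */
	atomic_int waiters;	/* number of sleepers queued on this page */
	int wakeups;		/* counts wake-ups, for demonstration only */
};

/* Try to take the page lock; returns true if this caller now owns it. */
static bool toy_trylock_page(struct toy_page *page)
{
	unsigned int old = atomic_fetch_or(&page->flags, TOY_PG_LOCKED);

	return (old & TOY_PG_LOCKED) == 0;
}

/* Clear the lock bit, then a full barrier, then look at the wait queue. */
static void toy_unlock_page(struct toy_page *page)
{
	unsigned int old = atomic_fetch_and_explicit(&page->flags,
						     ~TOY_PG_LOCKED,
						     memory_order_release);

	assert(old & TOY_PG_LOCKED);	/* unlocking an unlocked page is a bug */
	/* Without this fence the waiters load could be ordered before the clear. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&page->waiters, memory_order_relaxed) > 0)
		page->wakeups++;	/* stand-in for waking the sleepers */
}
```

A waiter in this model would do the mirror image: register itself in waiters, issue a full barrier, then re-check the lock bit. With both barriers in place, either the unlocker sees the waiter or the waiter sees the cleared bit, so a wake-up cannot be lost.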
4.3.13 end_page_writeback

end_page_writeback - end writeback against a page

Synopsis
  void end_page_writeback (struct page * page);

Arguments
  page - the page

4.3.14 __lock_page

__lock_page - get a lock on the page, assuming we need to sleep to get it

Synopsis
  void __lock_page (struct page * page);

Arguments
  page - the page to lock

4.3.15 page_cache_next_hole

page_cache_next_hole - find the next hole (not-present entry)

Synopsis
  pgoff_t page_cache_next_hole (struct address_space * mapping, pgoff_t index, unsigned long max_scan);

Arguments
  mapping - mapping
  index - index
  max_scan - maximum range to search

Description
Search the set [index, min(index+max_scan-1, MAX_INDEX)] for the lowest indexed hole.

Returns the index of the hole if found, otherwise returns an index outside of the set specified (in which case 'return - index >= max_scan' will be true). In rare cases of index wrap-around, 0 will be returned.

page_cache_next_hole may be called under rcu_read_lock. However, like radix_tree_gang_lookup, this will not atomically search a snapshot of the tree at a single point in time. For example, if a hole is created at index 5, then subsequently a hole is created at index 10, page_cache_next_hole covering both indexes may return 10 if called under rcu_read_lock.

4.3.16 page_cache_prev_hole

page_cache_prev_hole - find the prev hole (not-present entry)

Synopsis
  pgoff_t page_cache_prev_hole (struct address_space * mapping, pgoff_t index, unsigned long max_scan);

Arguments
  mapping - mapping
  index - index
  max_scan - maximum range to search

Description
Search backwards in the range [max(index-max_scan+1, 0), index] for the first hole.

Returns the index of the hole if found, otherwise returns an index outside of the set specified (in which case 'index - return >= max_scan' will be true). In rare cases of wrap-around, ULONG_MAX will be returned.

page_cache_prev_hole may be called under rcu_read_lock.
However, like radix_tree_gang_lookup, this will not atomically search a snapshot of the tree at a single point in time. For example, if a hole is created at index 10, then subsequently a hole is created at index 5, page_cache_prev_hole covering both indexes may return 5 if called under rcu_read_lock.

4.3.17 find_get_entry

find_get_entry - find and get a page cache entry

Synopsis
  struct page * find_get_entry (struct address_space * mapping, pgoff_t offset);

Arguments
  mapping - the address_space to search
  offset - the page cache index

Description
Looks up the page cache slot at mapping & offset. If there is a page cache page, it is returned with an increased refcount.

If the slot holds a shadow entry of a previously evicted page, or a swap entry from shmem/tmpfs, it is returned.

Otherwise, NULL is returned.

4.3.18 find_lock_entry

find_lock_entry - locate, pin and lock a page cache entry

Synopsis
  struct page * find_lock_entry (struct address_space * mapping, pgoff_t offset);

Arguments
  mapping - the address_space to search
  offset - the page cache index

Description
Looks up the page cache slot at mapping & offset. If there is a page cache page, it is returned locked and with an increased refcount.

If the slot holds a shadow entry of a previously evicted page, or a swap entry from shmem/tmpfs, it is returned.

Otherwise, NULL is returned.

find_lock_entry may sleep.

4.3.19 pagecache_get_page

pagecache_get_page - find and get a page reference

Synopsis
  struct page * pagecache_get_page (struct address_space * mapping, pgoff_t offset, int fgp_flags, gfp_t cache_gfp_mask, gfp_t radix_gfp_mask);

Arguments
  mapping - the address_space to search
  offset - the page index
  fgp_flags - FGP flags
  cache_gfp_mask - gfp mask to use for the page cache data page allocation
  radix_gfp_mask - gfp mask to use for radix tree node allocation

Description
Looks up the page cache slot at mapping & offset.

FGP flags modify how the page is returned.
  FGP_ACCESSED - the page will be marked accessed
  FGP_LOCK - the page is returned locked
  FGP_CREAT - If the page is not present then a new page is allocated using cache_gfp_mask and added to the page cache and the VM's LRU list. If radix tree nodes are allocated during page cache insertion then radix_gfp_mask is used. The page is returned locked and with an increased refcount. Otherwise, NULL is returned.

If FGP_LOCK or FGP_CREAT are specified then the function may sleep even if the GFP flags specified for FGP_CREAT are atomic.

If there is a page cache page, it is returned with an increased refcount.

4.3.20 find_get_pages_contig

find_get_pages_contig - gang contiguous pagecache lookup

Synopsis
  unsigned find_get_pages_contig (struct address_space * mapping, pgoff_t index, unsigned int nr_pages, struct page ** pages);

Arguments
  mapping - The address_space to search
  index - The starting page index
  nr_pages - The maximum number of pages
  pages - Where the resulting pages are placed

Description
find_get_pages_contig works exactly like find_get_pages, except that the returned pages are guaranteed to be contiguous.

find_get_pages_contig returns the number of pages which were found.

4.3.21 find_get_pages_tag

find_get_pages_tag - find and return pages that match tag

Synopsis
  unsigned find_get_pages_tag (struct address_space * mapping, pgoff_t * index, int tag, unsigned int nr_pages, struct page ** pages);

Arguments
  mapping - the address_space to search
  index - the starting page index
  tag - the tag index
  nr_pages - the maximum number of pages
  pages - where the resulting pages are placed

Description
Like find_get_pages, except we only return pages which are tagged with tag. We update index to index the next page for the traversal.
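The "update index for the traversal" behavior of find_get_pages_tag can be sketched with a toy mapping. This is a userspace model under stated assumptions: struct toy_slot, TOY_NSLOTS and toy_find_tagged are hypothetical names, tags are modeled as a bitmask rather than a tag index, and the function returns slot indexes instead of referenced struct page pointers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSLOTS 16	/* hypothetical size of the toy mapping */

/* One slot of a toy mapping: is a page present, and which tag bits are set. */
struct toy_slot {
	int present;
	unsigned int tags;
};

/*
 * Collect up to nr_pages indexes of present slots carrying `tag`, starting
 * at *index. On a non-empty result, *index is advanced past the last hit
 * so that the next call continues the traversal.
 */
static unsigned int toy_find_tagged(const struct toy_slot slots[TOY_NSLOTS],
				    size_t *index, unsigned int tag,
				    unsigned int nr_pages, size_t *found)
{
	unsigned int n = 0;

	assert(index != NULL && found != NULL);
	for (size_t i = *index; i < TOY_NSLOTS && n < nr_pages; i++) {
		if (slots[i].present && (slots[i].tags & tag))
			found[n++] = i;
	}
	if (n > 0)
		*index = found[n - 1] + 1;
	return n;
}
```

A caller would invoke toy_find_tagged repeatedly with the same index variable and a small batch size until it returns 0, which is the usual shape of a batched traversal over tagged pages.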
4.3.22 generic_file_read_iter

generic_file_read_iter - generic filesystem read routine

Synopsis
  ssize_t generic_file_read_iter (struct kiocb * iocb, struct iov_iter * iter);

Arguments
  iocb - kernel I/O control block
  iter - destination for the data read

Description
This is the read_iter routine for all filesystems that can use the page cache directly.

4.3.23 filemap_fault

filemap_fault - read in file data for page fault handling

Synopsis
  int filemap_fault (struct vm_area_struct * vma, struct vm_fault * vmf);

Arguments
  vma - vma in which the fault was taken
  vmf - struct vm_fault containing details of the fault

Description
filemap_fault is invoked via the vma operations vector for a mapped memory region to read in file data during a page fault.

The gotos are kind of ugly, but this streamlines the normal case of having it in the page cache, and handles the special cases reasonably without having a lot of duplicated code.

4.3.24 read_cache_page

read_cache_page - read into page cache, fill it if needed

Synopsis
  struct page * read_cache_page (struct address_space * mapping, pgoff_t index, int (*filler) (void *, struct page *), void * data);

Arguments
  mapping - the page's address_space
  index - the page index
  filler - function to perform the read
  data - first arg to filler(data, page) function, often left as NULL

Description
Read into the page cache. If a page already exists, and PageUptodate is not set, try to fill the page and wait for it to become unlocked.

If the page does not get brought uptodate, return -EIO.

4.3.25 read_cache_page_gfp

read_cache_page_gfp - read into page cache, using specified page allocation flags.

Synopsis
  struct page * read_cache_page_gfp (struct address_space * mapping, pgoff_t index, gfp_t gfp);

Arguments
  mapping - the page's address_space
  index - the page index
  gfp - the page allocator flags to use if allocating

Description
This is the same as read_mapping_page(mapping, index, NULL), but with any new page allocations done using the specified allocation flags.
If the page does not get brought uptodate, return -EIO.

4.3.26 __generic_file_write_iter

__generic_file_write_iter - write data to a file

Synopsis
  ssize_t __generic_file_write_iter (struct kiocb * iocb, struct iov_iter * from);

Arguments
  iocb - IO state structure (file, offset, etc.)
  from - iov_iter with data to write

Description
This function does all the work needed for actually writing data to a file. It does all basic checks, removes SUID from the file, updates modification times and calls proper subroutines depending on whether we do direct IO or a standard buffered write.

It expects i_mutex to be grabbed unless we work on a block device or similar object which does not need locking at all.

This function does *not* take care of syncing data in case of O_SYNC write. A caller has to handle it. This is mainly due to the fact that we want to avoid syncing under i_mutex.

4.3.27 generic_file_write_iter

generic_file_write_iter - write data to a file

Synopsis
  ssize_t generic_file_write_iter (struct kiocb * iocb, struct iov_iter * from);

Arguments
  iocb - IO state structure
  from - iov_iter with data to write

Description
This is a wrapper around __generic_file_write_iter to be used by most filesystems. It takes care of syncing the file in case of O_SYNC file and acquires i_mutex as needed.

4.3.28 try_to_release_page

try_to_release_page - release old fs-specific metadata on a page

Synopsis
  int try_to_release_page (struct page * page, gfp_t gfp_mask);

Arguments
  page - the page which the kernel is trying to free
  gfp_mask - memory allocation flags (and I/O mode)

Description
The address_space is to try to release any data against the page (presumably at page->private). If the release was successful, return 1. Otherwise return zero.

This may also be called if PG_fscache is set on a page, indicating that the page is known to the local caching routines.

The gfp_mask argument specifies whether I/O may be performed to release this page (__GFP_IO), and whether the call may block (__GFP_WAIT & __GFP_FS).
4.3.29 zap_vma_ptes

zap_vma_ptes - remove ptes mapping the vma

Synopsis

int zap_vma_ptes (struct vm_area_struct * vma, unsigned long address, unsigned long size);

Arguments

vma
    vm_area_struct holding ptes to be zapped
address
    starting address of pages to zap
size
    number of bytes to zap

Description

This function only unmaps ptes assigned to VM_PFNMAP vmas.

The entire address range must be fully contained within the vma.

Returns 0 if successful.

4.3.30 vm_insert_page

vm_insert_page - insert single page into user vma

Synopsis

int vm_insert_page (struct vm_area_struct * vma, unsigned long addr, struct page * page);

Arguments

vma
    user vma to map to
addr
    target user address of this page
page
    source kernel page

Description

This allows drivers to insert individual pages they've allocated into a user vma.

The page has to be a nice clean _individual_ kernel allocation. If you allocate a compound page, you need to have marked it as such (__GFP_COMP), or manually just split the page up yourself (see split_page).

NOTE! Traditionally this was done with "remap_pfn_range" which took an arbitrary page protection parameter. This doesn't allow that. Your vma protection will have to be set up correctly, which means that if you want a shared writable mapping, you'd better ask for a shared writable mapping!

The page does not need to be reserved.

Usually this function is called from f_op->mmap handler under mm->mmap_sem write-lock, so it can change vma->vm_flags. Caller must set VM_MIXEDMAP on vma if it wants to call this function from other places, for example from page-fault handler.
4.3.31 vm_insert_pfn

vm_insert_pfn - insert single pfn into user vma

Synopsis

int vm_insert_pfn (struct vm_area_struct * vma, unsigned long addr, unsigned long pfn);

Arguments

vma
    user vma to map to
addr
    target user address of this page
pfn
    source kernel pfn

Description

Similar to vm_insert_page, this allows drivers to insert individual pages they've allocated into a user vma. Same comments apply.

This function should only be called from a vm_ops->fault handler, and in that case the handler should return NULL.

vma cannot be a COW mapping.

As this is called only for pages that do not currently exist, we do not need to flush old virtual caches or the TLB.

4.3.32 remap_pfn_range

remap_pfn_range - remap kernel memory to userspace

Synopsis

int remap_pfn_range (struct vm_area_struct * vma, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t prot);

Arguments

vma
    user vma to map to
addr
    target user address to start at
pfn
    physical address of kernel memory
size
    size of map area
prot
    page protection flags for this mapping

Note

This is only safe if the mm semaphore is held when called.

4.3.33 vm_iomap_memory

vm_iomap_memory - remap memory to userspace

Synopsis

int vm_iomap_memory (struct vm_area_struct * vma, phys_addr_t start, unsigned long len);

Arguments

vma
    user vma to map to
start
    start of area
len
    size of area

Description

This is a simplified io_remap_pfn_range for common driver use. The driver just needs to give us the physical memory range to be mapped, we'll figure out the rest from the vma information.

NOTE! Some drivers might want to tweak vma->vm_page_prot first to get whatever write-combining details or similar.

4.3.34 unmap_mapping_range

unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
Synopsis

void unmap_mapping_range (struct address_space * mapping, loff_t const holebegin, loff_t const holelen, int even_cows);

Arguments

mapping
    the address space containing mmaps to be unmapped.
holebegin
    byte in first page to unmap, relative to the start of the underlying file. This will be rounded down to a PAGE_SIZE boundary. Note that this is different from truncate_pagecache, which must keep the partial page. In contrast, we must get rid of partial pages.
holelen
    size of prospective hole in bytes. This will be rounded up to a PAGE_SIZE boundary. A holelen of zero truncates to the end of the file.
even_cows
    1 when truncating a file, unmap even private COWed pages; but 0 when invalidating pagecache, don't throw away private data.

4.3.35 follow_pfn

follow_pfn - look up PFN at a user virtual address

Synopsis

int follow_pfn (struct vm_area_struct * vma, unsigned long address, unsigned long * pfn);

Arguments

vma
    memory mapping
address
    user virtual address
pfn
    location to store found PFN

Description

Only IO mappings and raw PFN mappings are allowed.

Returns zero and the pfn at pfn on success, a negative value otherwise.

4.3.36 vm_unmap_aliases

vm_unmap_aliases - unmap outstanding lazy aliases in the vmap layer

Synopsis

void vm_unmap_aliases ( void);

Arguments

void
    no arguments

Description

The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily to amortize TLB flushing overheads. What this means is that any page you have now, may, in a former life, have been mapped into kernel virtual address by the vmap layer and so there might be some CPUs with TLB entries still referencing that page (additional to the regular 1:1 kernel mapping).

vm_unmap_aliases flushes all such lazy mappings. After it returns, we can be sure that none of the pages we have control over will have any aliases from the vmap layer.
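The rounding rules stated for the holebegin and holelen arguments of unmap_mapping_range can be illustrated with a small userspace sketch. It is only a model of the arithmetic, assuming a 4 KiB page size; `sketch_hole_to_pages`, `struct sketch_hole`, and the `SKETCH_PAGE_*` macros are hypothetical names introduced here, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Hypothetical result type: page-granular version of the byte range. */
struct sketch_hole {
    uint64_t first_page;  /* index of the first page to unmap */
    uint64_t nr_pages;    /* number of pages; 0 stands for "to the end of the file" */
};

static struct sketch_hole sketch_hole_to_pages(uint64_t holebegin, uint64_t holelen)
{
    struct sketch_hole hole;

    /* Guard the round-up below against wrapping. */
    assert(holelen <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));

    /* holebegin is rounded down, so a partially covered first page is included. */
    hole.first_page = holebegin >> SKETCH_PAGE_SHIFT;
    /* holelen is rounded up to whole pages; a length of zero stays zero. */
    hole.nr_pages = (holelen + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
    return hole;
}
```

With these assumptions, a hole that begins at byte 5000 and is 100 bytes long maps to one page starting at page index 1, because the partial page is unmapped as a whole.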
4.3.37 vm_unmap_ram

vm_unmap_ram - unmap linear kernel address space set up by vm_map_ram

Synopsis

void vm_unmap_ram (const void * mem, unsigned int count);

Arguments

mem
    the pointer returned by vm_map_ram
count
    the count passed to that vm_map_ram call (cannot unmap partial)

4.3.38 vm_map_ram

vm_map_ram - map pages linearly into kernel virtual address (vmalloc space)

Synopsis

void * vm_map_ram (struct page ** pages, unsigned int count, int node, pgprot_t prot);

Arguments

pages
    an array of pointers to the pages to be mapped
count
    number of pages
node
    prefer to allocate data structures on this node
prot
    memory protection to use. PAGE_KERNEL for regular RAM

Description

If you use this function for less than VMAP_MAX_ALLOC pages, it could be faster than vmap so it's good. But if you mix long-life and short-life objects with vm_map_ram, it could consume lots of address space through fragmentation (especially on a 32bit machine). You could see failures in the end. Please use this function for short-lived objects.

Returns a pointer to the address that has been mapped, or NULL on failure.

4.3.39 unmap_kernel_range_noflush

unmap_kernel_range_noflush - unmap kernel VM area

Synopsis

void unmap_kernel_range_noflush (unsigned long addr, unsigned long size);

Arguments

addr
    start of the VM area to unmap
size
    size of the VM area to unmap

Description

Unmap PFN_UP(size) pages at addr. The VM area addr and size specify should have been allocated using get_vm_area and its friends.

NOTE

This function does NOT do any cache flushing. The caller is responsible for calling flush_cache_vunmap on to-be-mapped areas before calling this function and flush_tlb_kernel_range after.
4.3.40 unmap_kernel_range

unmap_kernel_range - unmap kernel VM area and flush cache and TLB

Synopsis

void unmap_kernel_range (unsigned long addr, unsigned long size);

Arguments

addr
    start of the VM area to unmap
size
    size of the VM area to unmap

Description

Similar to unmap_kernel_range_noflush but flushes vcache before the unmapping and tlb after.

4.3.41 vfree

vfree - release memory allocated by vmalloc

Synopsis

void vfree (const void * addr);

Arguments

addr
    memory base address

Description

Free the virtually contiguous memory area starting at addr, as obtained from vmalloc, vmalloc_32 or __vmalloc. If addr is NULL, no operation is performed.

Must not be called in NMI context (strictly speaking, only if we don't have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, but making the calling conventions for vfree arch-dependent would be a really bad idea).

NOTE

Assumes that the object at *addr has a size >= sizeof(llist_node).

4.3.42 vunmap

vunmap - release virtual mapping obtained by vmap

Synopsis

void vunmap (const void * addr);

Arguments

addr
    memory base address

Description

Free the virtually contiguous memory area starting at addr, which was created from the page array passed to vmap.

Must not be called in interrupt context.

4.3.43 vmap

vmap - map an array of pages into virtually contiguous space

Synopsis

void * vmap (struct page ** pages, unsigned int count, unsigned long flags, pgprot_t prot);

Arguments

pages
    array of page pointers
count
    number of pages to map
flags
    vm_area->flags
prot
    page protection for the mapping

Description

Maps count pages from pages into contiguous kernel virtual space.

4.3.44 vmalloc

vmalloc - allocate virtually contiguous memory

Synopsis

void * vmalloc (unsigned long size);

Arguments

size
    allocation size

Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space.
Description

For tight control over page level allocator and protection flags use __vmalloc instead.

4.3.45 vzalloc

vzalloc - allocate virtually contiguous memory with zero fill

Synopsis

void * vzalloc (unsigned long size);

Arguments

size
    allocation size

Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space. The memory allocated is set to zero.

Description

For tight control over page level allocator and protection flags use __vmalloc instead.

4.3.46 vmalloc_user

vmalloc_user - allocate zeroed virtually contiguous memory for userspace

Synopsis

void * vmalloc_user (unsigned long size);

Arguments

size
    allocation size

Description

The resulting memory area is zeroed so it can be mapped to userspace without leaking data.

4.3.47 vmalloc_node

vmalloc_node - allocate memory on a specific node

Synopsis

void * vmalloc_node (unsigned long size, int node);

Arguments

size
    allocation size
node
    numa node

Description

Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space.

For tight control over page level allocator and protection flags use __vmalloc instead.

4.3.48 vzalloc_node

vzalloc_node - allocate memory on a specific node with zero fill

Synopsis

void * vzalloc_node (unsigned long size, int node);

Arguments

size
    allocation size
node
    numa node

Description

Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space. The memory allocated is set to zero.

For tight control over page level allocator and protection flags use __vmalloc_node instead.

4.3.49 vmalloc_32

vmalloc_32 - allocate virtually contiguous memory (32bit addressable)

Synopsis

void * vmalloc_32 (unsigned long size);

Arguments

size
    allocation size

Description

Allocate enough 32bit PA addressable pages to cover size from the page level allocator and map them into contiguous kernel virtual space.
4.3.50 vmalloc_32_user

vmalloc_32_user - allocate zeroed virtually contiguous 32bit memory

Synopsis

void * vmalloc_32_user (unsigned long size);

Arguments

size
    allocation size

Description

The resulting memory area is 32bit addressable and zeroed so it can be mapped to userspace without leaking data.

4.3.51 remap_vmalloc_range_partial

remap_vmalloc_range_partial - map vmalloc pages to userspace

Synopsis

int remap_vmalloc_range_partial (struct vm_area_struct * vma, unsigned long uaddr, void * kaddr, unsigned long size);

Arguments

vma
    vma to cover
uaddr
    target user address to start at
kaddr
    virtual address of vmalloc kernel memory
size
    size of map area

Returns 0 for success, -Exxx on failure.

This function checks that kaddr is a valid vmalloc'ed area, and that it is big enough to cover the range starting at uaddr in vma. Will return failure if that criterion isn't met.

Similar to remap_pfn_range (see mm/memory.c).

4.3.52 remap_vmalloc_range

remap_vmalloc_range - map vmalloc pages to userspace

Synopsis

int remap_vmalloc_range (struct vm_area_struct * vma, void * addr, unsigned long pgoff);

Arguments

vma
    vma to cover (map full range of vma)
addr
    vmalloc memory
pgoff
    number of pages into addr before first page to map

Returns 0 for success, -Exxx on failure.

This function checks that addr is a valid vmalloc'ed area, and that it is big enough to cover the vma. Will return failure if that criterion isn't met.

Similar to remap_pfn_range (see mm/memory.c).

4.3.53 alloc_vm_area

alloc_vm_area - allocate a range of kernel address space

Synopsis

struct vm_struct * alloc_vm_area (size_t size, pte_t ** ptes);

Arguments

size
    size of the area
ptes
    returns the PTEs for the address space

Returns NULL on failure, vm_struct on success.

This function reserves a range of kernel address space, and allocates pagetables to map that range. No actual mappings are created.
If ptes is non-NULL, pointers to the PTEs (in init_mm) allocated for the VM area are returned.

4.3.54 nr_free_zone_pages

nr_free_zone_pages - count number of pages beyond high watermark

Synopsis

unsigned long nr_free_zone_pages (int offset);

Arguments

offset
    The zone index of the highest zone

Description

nr_free_zone_pages counts the number of pages which are beyond the high watermark within all zones at or below a given zone index. For each zone, the number of pages is calculated as: managed_pages - high_pages

4.3.55 nr_free_pagecache_pages

nr_free_pagecache_pages - count number of pages beyond high watermark

Synopsis

unsigned long nr_free_pagecache_pages ( void);

Arguments

void
    no arguments

Description

nr_free_pagecache_pages counts the number of pages which are beyond the high watermark within all zones.

4.3.56 find_next_best_node

find_next_best_node - find the next node that should appear in a given node's fallback list

Synopsis

int find_next_best_node (int node, nodemask_t * used_node_mask);

Arguments

node
    node whose fallback list we're appending
used_node_mask
    nodemask_t of already used nodes

Description

We use a number of factors to determine which is the next node that should appear on a given node's fallback list. The node should not have appeared already in node's fallback list, and it should be the next closest node according to the distance array (which contains arbitrary distance values from each node to each node in the system), and should also prefer nodes with no CPUs, since presumably they'll have very little allocation pressure on them otherwise. It returns -1 if no node is found.

4.3.57 free_bootmem_with_active_regions

free_bootmem_with_active_regions - Call memblock_free_early_nid for each active range

Synopsis

void free_bootmem_with_active_regions (int nid, unsigned long max_low_pfn);

Arguments

nid
    The node to free memory on. If MAX_NUMNODES, all nodes are freed.
max_low_pfn
    The highest PFN that will be passed to memblock_free_early_nid

Description

If an architecture guarantees that all ranges registered contain no holes and may be freed, this function may be used instead of calling memblock_free_early_nid manually.

4.3.58 sparse_memory_present_with_active_regions

sparse_memory_present_with_active_regions - Call memory_present for each active range

Synopsis

void sparse_memory_present_with_active_regions (int nid);

Arguments

nid
    The node to call memory_present for. If MAX_NUMNODES, all nodes will be used.

Description

If an architecture guarantees that all ranges registered contain no holes and may be freed, this function may be used instead of calling memory_present manually.

4.3.59 get_pfn_range_for_nid

get_pfn_range_for_nid - Return the start and end page frames for a node

Synopsis

void __meminit get_pfn_range_for_nid (unsigned int nid, unsigned long * start_pfn, unsigned long * end_pfn);

Arguments

nid
    The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned.
start_pfn
    Passed by reference. On return, it will have the node start_pfn.
end_pfn
    Passed by reference. On return, it will have the node end_pfn.

Description

It returns the start and end page frame of a node based on information provided by memblock_set_node. If called for a node with no available memory, a warning is printed and the start and end PFNs will be 0.

4.3.60 absent_pages_in_range

absent_pages_in_range - Return number of page frames in holes within a range

Synopsis

unsigned long absent_pages_in_range (unsigned long start_pfn, unsigned long end_pfn);

Arguments

start_pfn
    The start PFN to start searching for holes
end_pfn
    The end PFN to stop searching for holes

Description

It returns the number of page frames in memory holes within a range.
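The idea of counting page frames that fall in holes can be sketched in userspace as the size of the queried range minus the part of it covered by registered memory. This is a simplified model, not the kernel's implementation: it assumes the present ranges are half-open, non-overlapping intervals supplied by the caller, and `struct sketch_pfn_range` and `sketch_absent_pages` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one registered memory range: [start_pfn, end_pfn). */
struct sketch_pfn_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/*
 * Count the PFNs in [start_pfn, end_pfn) that are not covered by any
 * entry of present[]. Assumes the entries do not overlap each other.
 */
static unsigned long sketch_absent_pages(const struct sketch_pfn_range *present,
                                         size_t nr_ranges,
                                         unsigned long start_pfn,
                                         unsigned long end_pfn)
{
    unsigned long nr_present = 0;

    assert(start_pfn <= end_pfn);

    for (size_t i = 0; i < nr_ranges; i++) {
        /* Clamp the registered range to the queried window. */
        unsigned long lo = present[i].start_pfn > start_pfn ? present[i].start_pfn : start_pfn;
        unsigned long hi = present[i].end_pfn < end_pfn ? present[i].end_pfn : end_pfn;

        if (lo < hi)
            nr_present += hi - lo;
    }
    return (end_pfn - start_pfn) - nr_present;
}
```

For two present ranges [0, 100) and [200, 300), querying the window [0, 300) in this model reports 100 absent page frames, which is exactly the gap between the two ranges.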
4.3.61 node_map_pfn_alignment

node_map_pfn_alignment - determine the maximum internode alignment

Synopsis

unsigned long node_map_pfn_alignment ( void);

Arguments

void
    no arguments

Description

This function should be called after node map is populated and sorted. It calculates the maximum power of two alignment which can distinguish all the nodes.

For example, if all nodes are 1GiB and aligned to 1GiB, the return value would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)). If the nodes are shifted by 256MiB, 256MiB. Note that if only the last node is shifted, 1GiB is enough and this function will indicate so.

This is used to test whether pfn -> nid mapping of the chosen memory model has fine enough granularity to avoid incorrect mapping for the populated node map.

Returns the determined alignment in pfns. 0 if there is no alignment requirement (single node).

4.3.62 find_min_pfn_with_active_regions

find_min_pfn_with_active_regions - Find the minimum PFN registered

Synopsis

unsigned long find_min_pfn_with_active_regions ( void);

Arguments

void
    no arguments

Description

It returns the minimum PFN based on information provided via memblock_set_node.

4.3.63 free_area_init_nodes

free_area_init_nodes - Initialise all pg_data_t and zone data

Synopsis

void free_area_init_nodes (unsigned long * max_zone_pfn);

Arguments

max_zone_pfn
    an array of max PFNs for each zone

Description

This will call free_area_init_node for each active node in the system. Using the page ranges provided by memblock_set_node, the size of each zone in each node and their holes is calculated. If the maximum PFN between two adjacent zones match, it is assumed that the zone is empty. For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed that arch_max_dma32_pfn has no pages. It is also assumed that a zone starts where the previous one ended. For example, ZONE_DMA32 starts at arch_max_dma_pfn.
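The two assumptions stated for free_area_init_nodes (a zone starts where the previous one ended, and equal maximum PFNs mean an empty zone) can be sketched as a small userspace calculation. This is an illustrative model only; `struct sketch_zone_span`, `sketch_zone_spans`, and the `min_pfn` parameter (standing in for the lowest registered PFN) are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-zone result: the zone covers [start_pfn, end_pfn). */
struct sketch_zone_span {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/*
 * Derive each zone's PFN span from an array of per-zone maximum PFNs.
 * Zone i starts where zone i-1 ended; if its maximum does not exceed
 * that point, the zone is empty (start_pfn == end_pfn).
 */
static void sketch_zone_spans(const unsigned long *max_zone_pfn, size_t nr_zones,
                              unsigned long min_pfn, struct sketch_zone_span *out)
{
    unsigned long prev_end = min_pfn;

    assert(max_zone_pfn != NULL && out != NULL);

    for (size_t i = 0; i < nr_zones; i++) {
        unsigned long end = max_zone_pfn[i] > prev_end ? max_zone_pfn[i] : prev_end;

        out[i].start_pfn = prev_end;
        out[i].end_pfn = end;
        prev_end = end;
    }
}
```

With maxima of 4096, 4096 and 262144 and a lowest PFN of 16, the second zone comes out empty and the third starts at PFN 4096, mirroring the arch_max_dma_pfn == arch_max_dma32_pfn example in the description.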
4.3.64 set_dma_reserve

set_dma_reserve - set the specified number of pages reserved in the first zone

Synopsis

void set_dma_reserve (unsigned long new_dma_reserve);

Arguments

new_dma_reserve
    The number of pages to mark reserved

Description

The per-cpu batchsize and zone watermarks are determined by present_pages. In the DMA zone, a significant percentage may be consumed by kernel image and other unfreeable allocations which can skew the watermarks badly. This function may optionally be used to account for unfreeable pages in the first zone (e.g., ZONE_DMA). The effect will be lower watermarks and smaller per-cpu batchsize.

4.3.65 setup_per_zone_wmarks

setup_per_zone_wmarks - called when min_free_kbytes changes or when memory is hot-{added|removed}

Synopsis

void setup_per_zone_wmarks ( void);

Arguments

void
    no arguments

Description

Ensures that the watermark[min,low,high] values for each zone are set correctly with respect to min_free_kbytes.

4.3.66 get_pfnblock_flags_mask

get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages

Synopsis

unsigned long get_pfnblock_flags_mask (struct page * page, unsigned long pfn, unsigned long end_bitidx, unsigned long mask);

Arguments

page
    The page within the block of interest
pfn
    The target page frame number
end_bitidx
    The last bit of interest to retrieve
mask
    mask of bits that the caller is interested in

Return

pageblock_bits flags

4.3.67 set_pfnblock_flags_mask

set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages

Synopsis

void set_pfnblock_flags_mask (struct page * page, unsigned long flags, unsigned long pfn, unsigned long end_bitidx, unsigned long mask);

Arguments

page
    The page within the block of interest
flags
    The flags to set
pfn
    The target page frame number
end_bitidx
    The last bit of interest
mask
    mask of bits that the caller is interested in

4.3.68 alloc_contig_range

alloc_contig_range - tries to allocate given range of pages

Synopsis

int alloc_contig_range (unsigned long start, unsigned long end, unsigned migratetype);

Arguments

start
    start PFN to allocate
end
    one-past-the-last PFN to allocate
migratetype
    migratetype of the underlying pageblocks (either MIGRATE_MOVABLE or MIGRATE_CMA). All pageblocks in range must have the same migratetype and it must be either of the two.

Description

The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES aligned, however it's the caller's responsibility to guarantee that we are the only thread that changes migrate type of pageblocks the pages fall in.

The PFN range must belong to a single zone.

Returns zero on success or negative error code. On success all pages whose PFN is in [start, end) are allocated for the caller and need to be freed with free_contig_range.

4.3.69 mempool_destroy

mempool_destroy - deallocate a memory pool

Synopsis

void mempool_destroy (mempool_t * pool);

Arguments

pool
    pointer to the memory pool which was allocated via mempool_create.

Description

Free all reserved elements in pool and pool itself. This function only sleeps if the free_fn function sleeps.

4.3.70 mempool_create

mempool_create - create a memory pool

Synopsis

mempool_t * mempool_create (int min_nr, mempool_alloc_t * alloc_fn, mempool_free_t * free_fn, void * pool_data);

Arguments

min_nr
    the minimum number of elements guaranteed to be allocated for this pool.
alloc_fn
    user-defined element-allocation function.
free_fn
    user-defined element-freeing function.
pool_data
    optional private data available to the user-defined functions.

Description

This function creates and allocates a guaranteed size, preallocated memory pool. The pool can be used from the mempool_alloc and mempool_free functions. This function might sleep. Both the alloc_fn and the free_fn functions might sleep - as long as the mempool_alloc function is not called from IRQ contexts.
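The notion of a guaranteed, preallocated reserve can be modelled in userspace. The sketch below is not the kernel implementation: it leaves out locking, sleeping and retrying, and every `sketch_*` and `demo_*` name is hypothetical. It only shows the general pattern: the pool is filled with min_nr elements at creation, allocation tries alloc_fn first and falls back to the reserve, and freeing refills the reserve before handing elements to free_fn.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *(*sketch_alloc_fn)(void *pool_data);
typedef void (*sketch_free_fn)(void *element, void *pool_data);

struct sketch_pool {
    int min_nr;        /* size of the guaranteed reserve */
    int curr_nr;       /* elements currently sitting in the reserve */
    void **elements;   /* the reserve itself */
    sketch_alloc_fn alloc_fn;
    sketch_free_fn free_fn;
    void *pool_data;
};

/* Release every reserved element, then the pool itself. */
static void sketch_pool_destroy(struct sketch_pool *pool)
{
    while (pool->curr_nr > 0)
        pool->free_fn(pool->elements[--pool->curr_nr], pool->pool_data);
    free(pool->elements);
    free(pool);
}

/* Create a pool and preallocate min_nr elements; NULL on failure. */
static struct sketch_pool *sketch_pool_create(int min_nr, sketch_alloc_fn alloc_fn,
                                              sketch_free_fn free_fn, void *pool_data)
{
    struct sketch_pool *pool;

    assert(min_nr > 0 && alloc_fn != NULL && free_fn != NULL);

    pool = calloc(1, sizeof(*pool));
    if (!pool)
        return NULL;
    pool->elements = calloc((size_t)min_nr, sizeof(void *));
    if (!pool->elements) {
        free(pool);
        return NULL;
    }
    pool->min_nr = min_nr;
    pool->alloc_fn = alloc_fn;
    pool->free_fn = free_fn;
    pool->pool_data = pool_data;

    while (pool->curr_nr < min_nr) {
        void *element = alloc_fn(pool_data);

        if (!element) {
            sketch_pool_destroy(pool);
            return NULL;
        }
        pool->elements[pool->curr_nr++] = element;
    }
    return pool;
}

/* Try the normal allocator first; fall back to the reserve if it fails. */
static void *sketch_pool_alloc(struct sketch_pool *pool)
{
    void *element = pool->alloc_fn(pool->pool_data);

    if (element)
        return element;
    if (pool->curr_nr > 0)
        return pool->elements[--pool->curr_nr];
    return NULL;  /* this model gives up here instead of waiting */
}

/* Refill the reserve first; only surplus elements go back to free_fn. */
static void sketch_pool_free(void *element, struct sketch_pool *pool)
{
    if (pool->curr_nr < pool->min_nr)
        pool->elements[pool->curr_nr++] = element;
    else
        pool->free_fn(element, pool->pool_data);
}

/* Demo callbacks: pool_data points to an int; nonzero makes allocation fail. */
static void *demo_alloc(void *pool_data)
{
    int *fail = pool_data;

    return *fail ? NULL : malloc(64);
}

static void demo_free(void *element, void *pool_data)
{
    (void)pool_data;
    free(element);
}
```

In this model, a pool created with a min_nr of 2 can still satisfy two allocations after `demo_alloc` starts failing, and elements returned afterwards go back into the reserve until it is full again.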
4.3.71 mempool_resize

mempool_resize - resize an existing memory pool

Synopsis

int mempool_resize (mempool_t * pool, int new_min_nr, gfp_t gfp_mask);

Arguments

pool
    pointer to the memory pool which was allocated via mempool_create.
new_min_nr
    the new minimum number of elements guaranteed to be allocated for this pool.
gfp_mask
    the usual allocation bitmask.

Description

This function shrinks/grows the pool. In the case of growing, it cannot be guaranteed that the pool will be grown to the new size immediately, but new mempool_free calls will refill it.

Note, the caller must guarantee that no mempool_destroy is called while this function is running. mempool_alloc & mempool_free might be called (e.g. from IRQ contexts) while this function executes.

4.3.72 mempool_alloc

mempool_alloc - allocate an element from a specific memory pool

Synopsis

void * mempool_alloc (mempool_t * pool, gfp_t gfp_mask);

Arguments

pool
    pointer to the memory pool which was allocated via mempool_create.
gfp_mask
    the usual allocation bitmask.

Description

This function only sleeps if the alloc_fn function sleeps or returns NULL. Note that due to preallocation, this function *never* fails when called from process contexts. (It might fail if called from an IRQ context.)

Note

Using __GFP_ZERO is not supported.

4.3.73 mempool_free

mempool_free - return an element to the pool.

Synopsis

void mempool_free (void * element, mempool_t * pool);

Arguments

element
    pool element pointer.
pool
    pointer to the memory pool which was allocated via mempool_create.

Description

This function only sleeps if the free_fn function sleeps.

4.3.74 dma_pool_create

dma_pool_create - Creates a pool of consistent memory blocks, for dma.

Synopsis

struct dma_pool * dma_pool_create (const char * name, struct device * dev, size_t size, size_t align, size_t boundary);

Arguments

name
    name of pool, for diagnostics
dev
    device that will be doing the DMA
size
    size of the blocks in this pool.
align
    alignment requirement for blocks; must be a power of two
boundary
    returned blocks won't cross this power of two boundary

Context

!in_interrupt

Description

Returns a dma allocation pool with the requested characteristics, or null if one can't be created. Given one of these pools, dma_pool_alloc may be used to allocate memory. Such memory will all have consistent DMA mappings, accessible by the device and its driver without using cache flushing primitives. The actual size of blocks allocated may be larger than requested because of alignment.

If boundary is nonzero, objects returned from dma_pool_alloc won't cross that size boundary. This is useful for devices which have addressing restrictions on individual DMA transfers, such as not crossing boundaries of 4KBytes.

4.3.75 dma_pool_destroy

dma_pool_destroy - destroys a pool of dma memory blocks.

Synopsis

void dma_pool_destroy (struct dma_pool * pool);

Arguments

pool
    dma pool that will be destroyed

Context

!in_interrupt

Description

Caller guarantees that no more memory from the pool is in use, and that nothing will try to use the pool after this call.

4.3.76 dma_pool_alloc

dma_pool_alloc - get a block of consistent memory

Synopsis

void * dma_pool_alloc (struct dma_pool * pool, gfp_t mem_flags, dma_addr_t * handle);

Arguments

pool
    dma pool that will produce the block
mem_flags
    GFP_* bitmask
handle
    pointer to dma address of block

Description

This returns the kernel virtual address of a currently unused block, and reports its dma address through the handle. If such a memory block can't be allocated, NULL is returned.
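The boundary constraint described for dma_pool_create (blocks handed out by dma_pool_alloc do not cross a power-of-two boundary) comes down to a simple address check. The userspace sketch below illustrates that check and one way to pick a compliant offset; it is not how the kernel's pool lays out blocks, and `sketch_crosses_boundary` and `sketch_place_block` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool sketch_is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True if the block [offset, offset + size) straddles a multiple of boundary. */
static bool sketch_crosses_boundary(size_t offset, size_t size, size_t boundary)
{
    assert(size > 0 && sketch_is_pow2(boundary));

    /* First and last byte must fall into the same boundary-sized window. */
    return (offset & ~(boundary - 1)) != ((offset + size - 1) & ~(boundary - 1));
}

/* Smallest offset >= the given one at which the block does not cross. */
static size_t sketch_place_block(size_t offset, size_t size, size_t boundary)
{
    assert(size <= boundary);

    if (sketch_crosses_boundary(offset, size, boundary))
        offset = (offset + boundary - 1) & ~(boundary - 1);  /* round up */
    return offset;
}
```

For a 4096-byte boundary, a 200-byte block at offset 4000 would cross into the next window, so this sketch moves it to offset 4096; a 64-byte block at offset 128 stays where it is.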
4.3.77 dma_pool_free

dma_pool_free - put block back into dma pool

Synopsis

void dma_pool_free (struct dma_pool * pool, void * vaddr, dma_addr_t dma);

Arguments

pool
    the dma pool holding the block
vaddr
    virtual address of block
dma
    dma address of block

Description

Caller promises neither device nor driver will again touch this block unless it is first re-allocated.

4.3.78 dmam_pool_create

dmam_pool_create - Managed dma_pool_create

Synopsis

struct dma_pool * dmam_pool_create (const char * name, struct device * dev, size_t size, size_t align, size_t allocation);

Arguments

name
    name of pool, for diagnostics
dev
    device that will be doing the DMA
size
    size of the blocks in this pool.
align
    alignment requirement for blocks; must be a power of two
allocation
    returned blocks won't cross this boundary (or zero)

Description

Managed dma_pool_create. DMA pool created with this function is automatically destroyed on driver detach.

4.3.79 dmam_pool_destroy

dmam_pool_destroy - Managed dma_pool_destroy

Synopsis

void dmam_pool_destroy (struct dma_pool * pool);

Arguments

pool
    dma pool that will be destroyed

Description

Managed dma_pool_destroy.

4.3.80 balance_dirty_pages_ratelimited

balance_dirty_pages_ratelimited - balance dirty memory state

Synopsis

void balance_dirty_pages_ratelimited (struct address_space * mapping);

Arguments

mapping
    address_space which was dirtied

Description

Processes which are dirtying memory should call in here once for each page which was newly dirtied. The function will periodically check the system's dirty state and will initiate writeback if needed.

On really big machines, get_writeback_state is expensive, so try to avoid calling it too often (ratelimiting). But once we're over the dirty memory limit we decrease the ratelimiting by a lot, to prevent individual processes from overshooting the limit by (ratelimit_pages) each.
4.3.81 tag_pages_for_writeback

tag_pages_for_writeback - tag pages to be written by write_cache_pages

Synopsis

void tag_pages_for_writeback (struct address_space * mapping, pgoff_t start, pgoff_t end);

Arguments

mapping
    address space structure to write
start
    starting page index
end
    ending page index (inclusive)

Description

This function scans the page range from start to end (inclusive) and tags all pages that have DIRTY tag set with a special TOWRITE tag. The idea is that write_cache_pages (or whoever calls this function) will then use TOWRITE tag to identify pages eligible for writeback. This mechanism is used to avoid livelocking of writeback by a process steadily creating new dirty pages in the file (thus it is important for this function to be quick so that it can tag pages faster than a dirtying process can create them).

4.3.82 write_cache_pages

write_cache_pages - walk the list of dirty pages of the given address space and write all of them.

Synopsis

int write_cache_pages (struct address_space * mapping, struct writeback_control * wbc, writepage_t writepage, void * data);

Arguments

mapping
    address space structure to write
wbc
    subtract the number of written pages from *wbc->nr_to_write
writepage
    function called for each page
data
    data passed to writepage function

Description

If a page is already under I/O, write_cache_pages skips it, even if it's dirty. This is desirable behaviour for memory-cleaning writeback, but it is INCORRECT for data-integrity system calls such as fsync. fsync and msync need to guarantee that all the data which was dirty at the time the call was made get new I/O started against them. If wbc->sync_mode is WB_SYNC_ALL then we were called for data integrity and we must wait for existing IO to complete.

To avoid livelocks (when other process dirties new pages), we first tag pages which should be written back with TOWRITE tag and only then start writing them.
For data-integrity sync we have to be careful so that we do not miss some pages (e.g., because some other process has cleared TOWRITE tag we set). The rule we follow is that TOWRITE tag can be cleared only by the process clearing the DIRTY tag (and submitting the page for IO).

4.3.83 generic_writepages

generic_writepages - walk the list of dirty pages of the given address space and writepage all of them.

Synopsis

int generic_writepages (struct address_space * mapping, struct writeback_control * wbc);

Arguments

mapping
    address space structure to write
wbc
    subtract the number of written pages from *wbc->nr_to_write

Description

This is a library function, which implements the writepages address_space_operation.

4.3.84 write_one_page

write_one_page - write out a single page and optionally wait on I/O

Synopsis

int write_one_page (struct page * page, int wait);

Arguments

page
    the page to write
wait
    if true, wait on writeout

Description

The page must be locked by the caller and will be unlocked upon return.

write_one_page returns a negative error code if I/O failed.

4.3.85 wait_for_stable_page

wait_for_stable_page - wait for writeback to finish, if necessary.

Synopsis

void wait_for_stable_page (struct page * page);

Arguments

page
    The page to wait on.

Description

This function determines if the given page is related to a backing device that requires page contents to be held stable during writeback. If so, then it will wait for any pending writeback to complete.
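The tag-then-write scheme described for tag_pages_for_writeback and write_cache_pages can be modelled with a plain array of per-page tag bits. This userspace sketch is a simplification that ignores locking, I/O, and the real page cache data structure; `SK_DIRTY`, `SK_TOWRITE`, `sketch_tag_for_writeback`, and `sketch_write_tagged` are hypothetical names. It shows why a page dirtied after the tagging pass is left for a later pass instead of extending the current one.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SK_DIRTY   = 1 << 0,  /* page has unwritten changes */
    SK_TOWRITE = 1 << 1   /* page was dirty when the tagging pass ran */
};

/* Tag every dirty page in [start, end] (inclusive); returns how many were tagged. */
static size_t sketch_tag_for_writeback(unsigned *tags, size_t start, size_t end)
{
    size_t tagged = 0;

    assert(tags != NULL && start <= end);

    for (size_t i = start; i <= end; i++) {
        if (tags[i] & SK_DIRTY) {
            tags[i] |= SK_TOWRITE;
            tagged++;
        }
    }
    return tagged;
}

/*
 * "Write" only the pages carrying TOWRITE. Clearing DIRTY and TOWRITE
 * together follows the rule that TOWRITE is cleared only by whoever
 * clears DIRTY. Returns the number of pages written.
 */
static size_t sketch_write_tagged(unsigned *tags, size_t start, size_t end)
{
    size_t written = 0;

    assert(tags != NULL && start <= end);

    for (size_t i = start; i <= end; i++) {
        if (tags[i] & SK_TOWRITE) {
            tags[i] &= ~(unsigned)(SK_DIRTY | SK_TOWRITE);
            written++;
        }
    }
    return written;
}
```

If a page becomes dirty between the two calls, it carries DIRTY but not TOWRITE, so the write pass in this model skips it and the pass is bounded by the set of pages tagged up front.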
4.3.86 truncate_inode_pages_range

truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets

Synopsis

void truncate_inode_pages_range (struct address_space * mapping, loff_t lstart, loff_t lend);

Arguments

mapping
    mapping to truncate
lstart
    offset from which to truncate
lend
    offset to which to truncate (inclusive)

Description

Truncate the page cache, removing the pages that are between specified offsets (and zeroing out partial pages if lstart or lend + 1 is not page aligned).

Truncate takes two passes - the first pass is nonblocking. It will not block on page locks and it will not block on writeback. The second pass will wait. This is to prevent as much IO as possible in the affected region. The first pass will remove most pages, so the search cost of the second pass is low.

We pass down the cache-hot hint to the page freeing code. Even if the mapping is large, it is probably the case that the final pages are the most recently touched, and freeing happens in ascending file offset order.

Note that since ->invalidatepage accepts range to invalidate, truncate_inode_pages_range is able to handle cases where lend + 1 is not page aligned properly.

4.3.87 truncate_inode_pages

truncate_inode_pages - truncate *all* the pages from an offset

Synopsis

void truncate_inode_pages (struct address_space * mapping, loff_t lstart);

Arguments

mapping
    mapping to truncate
lstart
    offset from which to truncate

Description

Called under (and serialised by) inode->i_mutex.

Note

When this function returns, there can be a page in the process of deletion (inside __delete_from_page_cache) in the specified range. Thus mapping->nrpages can be non-zero when this function returns even after truncation of the whole mapping.
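The split between whole pages that are removed and partial pages that are zeroed, as described for truncate_inode_pages_range, can be worked out from lstart and lend with a little arithmetic. The following userspace sketch assumes a 4 KiB page size and does not model the "truncate to end of file" case; `struct sketch_trunc_plan` and `sketch_plan_truncate` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

struct sketch_trunc_plan {
    uint64_t first_full_page;  /* first page index removed entirely */
    uint64_t end_full_page;    /* one past the last page removed entirely */
    uint64_t partial_start;    /* nonzero: zero the first partial page from this byte offset */
    uint64_t partial_end;      /* nonzero: zero this many leading bytes of the last partial page */
};

/* lend is inclusive, matching the argument description above. */
static struct sketch_trunc_plan sketch_plan_truncate(uint64_t lstart, uint64_t lend)
{
    struct sketch_trunc_plan plan;

    /* Not modelled: lend == UINT64_MAX meaning "to the end of the file". */
    assert(lend != UINT64_MAX && lstart <= lend + 1);
    /* Guard the round-up below against wrapping. */
    assert(lstart <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));

    plan.partial_start = lstart & (SKETCH_PAGE_SIZE - 1);
    plan.partial_end = (lend + 1) & (SKETCH_PAGE_SIZE - 1);
    /* Whole pages start at lstart rounded up and stop at lend + 1 rounded down. */
    plan.first_full_page = (lstart + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
    plan.end_full_page = (lend + 1) >> SKETCH_PAGE_SHIFT;
    return plan;
}
```

When first_full_page is not less than end_full_page, no whole page lies inside the range and only partial zeroing applies in this model. For lstart = 100 and lend = 8191, page 1 is removed entirely and page 0 is zeroed from byte 100 onward.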
4.3.88 truncate_inode_pages_final
truncate_inode_pages_final - truncate *all* pages before inode dies
Synopsis
void truncate_inode_pages_final (struct address_space * mapping);
Arguments
mapping - mapping to truncate
Description
Called under (and serialized by) inode->i_mutex.
Filesystems have to use this in the .evict_inode path to inform the VM that this is the final truncate and the inode is going away.

4.3.89 invalidate_mapping_pages
invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
Synopsis
unsigned long invalidate_mapping_pages (struct address_space * mapping, pgoff_t start, pgoff_t end);
Arguments
mapping - the address_space which holds the pages to invalidate
start - the offset from which to invalidate
end - the offset to which to invalidate (inclusive)
Description
This function only removes the unlocked pages; if you want to remove all the pages of one inode, you must call truncate_inode_pages.
invalidate_mapping_pages will not block on IO activity. It will not invalidate pages which are dirty, locked, under writeback or mapped into pagetables.

4.3.90 invalidate_inode_pages2_range
invalidate_inode_pages2_range - remove range of pages from an address_space
Synopsis
int invalidate_inode_pages2_range (struct address_space * mapping, pgoff_t start, pgoff_t end);
Arguments
mapping - the address_space
start - the page offset from which to invalidate
end - the page offset to which to invalidate (inclusive)
Description
Any pages which are found to be mapped into pagetables are unmapped prior to invalidation.
Returns -EBUSY if any pages could not be invalidated.

4.3.91 invalidate_inode_pages2
invalidate_inode_pages2 - remove all pages from an address_space
Synopsis
int invalidate_inode_pages2 (struct address_space * mapping);
Arguments
mapping - the address_space
Description
Any pages which are found to be mapped into pagetables are unmapped prior to invalidation.
Returns -EBUSY if any pages could not be invalidated.

4.3.92 truncate_pagecache
truncate_pagecache - unmap and remove pagecache that has been truncated
Synopsis
void truncate_pagecache (struct inode * inode, loff_t newsize);
Arguments
inode - inode
newsize - new file size
Description
The inode's new i_size must already be written before truncate_pagecache is called.
This function should typically be called before the filesystem releases resources associated with the freed range (e.g., deallocates blocks). This way, pagecache will always stay logically coherent with on-disk format, and the filesystem would not have to deal with situations such as writepage being called for a page that has already had its underlying blocks deallocated.

4.3.93 truncate_setsize
truncate_setsize - update inode and pagecache for a new file size
Synopsis
void truncate_setsize (struct inode * inode, loff_t newsize);
Arguments
inode - inode
newsize - new file size
Description
truncate_setsize updates i_size and performs pagecache truncation (if necessary) to newsize. It will typically be called from the filesystem's setattr function when ATTR_SIZE is passed in.
Must be called with inode_mutex held and before all filesystem-specific block truncation has been performed.

4.3.94 truncate_pagecache_range
truncate_pagecache_range - unmap and remove pagecache that is hole-punched
Synopsis
void truncate_pagecache_range (struct inode * inode, loff_t lstart, loff_t lend);
Arguments
inode - inode
lstart - offset of beginning of hole
lend - offset of last byte of hole
Description
This function should typically be called before the filesystem releases resources associated with the freed range (e.g., deallocates blocks). This way, pagecache will always stay logically coherent with on-disk format, and the filesystem would not have to deal with situations such as writepage being called for a page that has already had its underlying blocks deallocated.
Chapter 5 Kernel IPC facilities

5.1 IPC utilities

5.1.1 ipc_init
ipc_init - initialise ipc subsystem
Synopsis
int ipc_init (void);
Arguments
void - no arguments
Description
The various sysv ipc resources (semaphores, messages and shared memory) are initialised. A callback routine is registered into the memory hotplug notifier chain: since msgmni scales to lowmem, this callback routine will be called upon successful memory add / remove to recompute msgmni.

5.1.2 ipc_init_ids
ipc_init_ids - initialise ipc identifiers
Synopsis
void ipc_init_ids (struct ipc_ids * ids);
Arguments
ids - ipc identifier set
Description
Set up the sequence range to use for the ipc identifier range (limited below IPCMNI) then initialise the ids idr.

5.1.3 ipc_init_proc_interface
ipc_init_proc_interface - create a proc interface for sysipc types using a seq_file interface.
Synopsis
void ipc_init_proc_interface (const char * path, const char * header, int ids, int (*show) (struct seq_file *, void *));
Arguments
path - Path in procfs
header - Banner to be printed at the beginning of the file.
ids - ipc id table to iterate.
show - show routine.

5.1.4 ipc_findkey
ipc_findkey - find a key in an ipc identifier set
Synopsis
struct kern_ipc_perm * ipc_findkey (struct ipc_ids * ids, key_t key);
Arguments
ids - ipc identifier set
key - key to find
Description
Returns the locked pointer to the ipc structure if found or NULL otherwise. If key is found, ipc points to the owning ipc structure.
Called with ipc_ids.rwsem held.

5.1.5 ipc_get_maxid
ipc_get_maxid - get the last assigned id
Synopsis
int ipc_get_maxid (struct ipc_ids * ids);
Arguments
ids - ipc identifier set
Description
Called with ipc_ids.rwsem held.
5.1.6 ipc_addid
ipc_addid - add an ipc identifier
Synopsis
int ipc_addid (struct ipc_ids * ids, struct kern_ipc_perm * new, int size);
Arguments
ids - ipc identifier set
new - new ipc permission set
size - limit for the number of used ids
Description
Add an entry new to the ipc ids idr. The permissions object is initialised and the first free entry is set up and the id assigned is returned. The new entry is returned in a locked state on success. On failure the entry is not locked and a negative err-code is returned.
Called with writer ipc_ids.rwsem held.

5.1.7 ipcget_new
ipcget_new - create a new ipc object
Synopsis
int ipcget_new (struct ipc_namespace * ns, struct ipc_ids * ids, const struct ipc_ops * ops, struct ipc_params * params);
Arguments
ns - ipc namespace
ids - ipc identifier set
ops - the actual creation routine to call
params - its parameters
Description
This routine is called by sys_msgget, sys_semget and sys_shmget when the key is IPC_PRIVATE.

5.1.8 ipc_check_perms
ipc_check_perms - check security and permissions for an ipc object
Synopsis
int ipc_check_perms (struct ipc_namespace * ns, struct kern_ipc_perm * ipcp, const struct ipc_ops * ops, struct ipc_params * params);
Arguments
ns - ipc namespace
ipcp - ipc permission set
ops - the actual security routine to call
params - its parameters
Description
This routine is called by sys_msgget, sys_semget and sys_shmget when the key is not IPC_PRIVATE and that key already exists in the ids IDR.
On success, the ipc id is returned.
It is called with ipc_ids.rwsem and ipcp->lock held.

5.1.9 ipcget_public
ipcget_public - get an ipc object or create a new one
Synopsis
int ipcget_public (struct ipc_namespace * ns, struct ipc_ids * ids, const struct ipc_ops * ops, struct ipc_params * params);
Arguments
ns - ipc namespace
ids - ipc identifier set
ops - the actual creation routine to call
params - its parameters
Description
This routine is called by sys_msgget, sys_semget and sys_shmget when the key is not IPC_PRIVATE.
It adds a new entry if the key is not found and does some permission / security checks if the key is found.
On success, the ipc id is returned.

5.1.10 ipc_rmid
ipc_rmid - remove an ipc identifier
Synopsis
void ipc_rmid (struct ipc_ids * ids, struct kern_ipc_perm * ipcp);
Arguments
ids - ipc identifier set
ipcp - ipc perm structure containing the identifier to remove
Description
ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on the exit.

5.1.11 ipc_alloc
ipc_alloc - allocate ipc space
Synopsis
void * ipc_alloc (int size);
Arguments
size - size desired
Description
Allocate memory from the appropriate pools and return a pointer to it. NULL is returned if the allocation fails.

5.1.12 ipc_free
ipc_free - free ipc space
Synopsis
void ipc_free (void * ptr, int size);
Arguments
ptr - pointer returned by ipc_alloc
size - size of block
Description
Free a block created with ipc_alloc. The caller must know the size used in the allocation call.

5.1.13 ipc_rcu_alloc
ipc_rcu_alloc - allocate ipc and rcu space
Synopsis
void * ipc_rcu_alloc (int size);
Arguments
size - size desired
Description
Allocate memory for the rcu header structure + the object. Returns the pointer to the object or NULL upon failure.

5.1.14 ipcperms
ipcperms - check ipc permissions
Synopsis
int ipcperms (struct ipc_namespace * ns, struct kern_ipc_perm * ipcp, short flag);
Arguments
ns - ipc namespace
ipcp - ipc permission set
flag - desired permission set
Description
Check user, group, other permissions for access to ipc resources.
Returns 0 if allowed.
flag will most probably be 0 or S_...UGO from <linux/stat.h>.

5.1.15 kernel_to_ipc64_perm
kernel_to_ipc64_perm - convert kernel ipc permissions to user
Synopsis
void kernel_to_ipc64_perm (struct kern_ipc_perm * in, struct ipc64_perm * out);
Arguments
in - kernel permissions
out - new style ipc permissions
Description
Turn the kernel object in into a set of permissions descriptions for returning to userspace (out).

5.1.16 ipc64_perm_to_ipc_perm
ipc64_perm_to_ipc_perm - convert new ipc permissions to old
Synopsis
void ipc64_perm_to_ipc_perm (struct ipc64_perm * in, struct ipc_perm * out);
Arguments
in - new style ipc permissions
out - old style ipc permissions
Description
Turn the new style permissions object in into a compatibility object and store it into the out pointer.

5.1.17 ipc_obtain_object
ipc_obtain_object
Synopsis
struct kern_ipc_perm * ipc_obtain_object (struct ipc_ids * ids, int id);
Arguments
ids - ipc identifier set
id - ipc id to look for
Description
Look for an id in the ipc ids idr and return associated ipc object.
Call inside the RCU critical section. The ipc object is *not* locked on exit.

5.1.18 ipc_lock
ipc_lock - lock an ipc structure without rwsem held
Synopsis
struct kern_ipc_perm * ipc_lock (struct ipc_ids * ids, int id);
Arguments
ids - ipc identifier set
id - ipc id to look for
Description
Look for an id in the ipc ids idr and lock the associated ipc object.
The ipc object is locked on successful exit.

5.1.19 ipc_obtain_object_check
ipc_obtain_object_check
Synopsis
struct kern_ipc_perm * ipc_obtain_object_check (struct ipc_ids * ids, int id);
Arguments
ids - ipc identifier set
id - ipc id to look for
Description
Similar to ipc_obtain_object but also checks the ipc object reference counter.
Call inside the RCU critical section. The ipc object is *not* locked on exit.
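Returning to ipcperms (5.1.14): the user, group, other check it describes can be modelled in user space roughly as follows. This is a simplified sketch under stated assumptions, not the kernel code. The struct sketch_perm type, the sketch_ipcperms function and the explicit euid/egid parameters are hypothetical, and the model deliberately leaves out capability overrides, supplementary groups and security hooks.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for an ipc permission set. */
struct sketch_perm {
    unsigned int uid, gid;    /* current owner */
    unsigned int cuid, cgid;  /* creator */
    unsigned short mode;      /* rwx bits for user/group/other, e.g. 0640 */
};

/*
 * Returns 0 if the access is allowed, -1 otherwise.
 * flag holds the wanted permission bits, e.g. 0444 (read) or 0222 (write),
 * which is what the S_...UGO style constants expand to.
 */
static int sketch_ipcperms(const struct sketch_perm *p,
                           unsigned int euid, unsigned int egid, short flag)
{
    int requested, granted;

    assert(p);

    /* Fold the three copies of the wanted bits into the low three bits. */
    requested = (flag >> 6) | (flag >> 3) | flag;
    granted = p->mode;

    /* Select which class of mode bits applies to this caller. */
    if (euid == p->cuid || euid == p->uid)
        granted >>= 6;            /* user bits */
    else if (egid == p->cgid || egid == p->gid)
        granted >>= 3;            /* group bits */
    /* otherwise the "other" bits already sit in the low three bits */

    /* Deny if any wanted bit is missing from the granted bits. */
    if (requested & ~granted & 0007)
        return -1;
    return 0;
}
```

With mode 0640, the owner may read and write, a group member may only read, and everyone else is denied. A flag of 0 requests nothing and is always allowed in this model.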
5.1.20 ipcget
ipcget - Common sys_*get code
Synopsis
int ipcget (struct ipc_namespace * ns, struct ipc_ids * ids, const struct ipc_ops * ops, struct ipc_params * params);
Arguments
ns - namespace
ids - ipc identifier set
ops - operations to be called on ipc object creation, permission checks and further checks
params - the parameters needed by the previous operations.
Description
Common routine called by sys_msgget, sys_semget and sys_shmget.

5.1.21 ipc_update_perm
ipc_update_perm - update the permissions of an ipc object
Synopsis
int ipc_update_perm (struct ipc64_perm * in, struct kern_ipc_perm * out);
Arguments
in - the permission given as input.
out - the permission of the ipc to set.

5.1.22 ipcctl_pre_down_nolock
ipcctl_pre_down_nolock - retrieve an ipc and check permissions for some IPC_XXX cmd
Synopsis
struct kern_ipc_perm* ipcctl_pre_down_nolock (struct ipc_namespace * ns, struct ipc_ids * ids, int id, int cmd, struct ipc64_perm * perm, int extra_perm);
Arguments
ns - ipc namespace
ids - the table of ids where to look for the ipc
id - the id of the ipc to retrieve
cmd - the cmd to check
perm - the permission to set
extra_perm - one extra permission parameter used by msq
Description
This function does some common audit and permissions check for some IPC_XXX cmd and is called from semctl_down, shmctl_down and msgctl_down. It must be called without any lock held and
- retrieves the ipc with the given id in the given table.
- performs some audit and permission check, depending on the given cmd
- returns a pointer to the ipc object or otherwise, the corresponding error.
Call holding both the rwsem and the rcu read lock.

5.1.23 ipc_parse_version
ipc_parse_version - ipc call version
Synopsis
int ipc_parse_version (int * cmd);
Arguments
cmd - pointer to command
Description
Return IPC_64 for new style IPC and IPC_OLD for old style IPC. The cmd value is turned from an encoding command and version into just the command code.
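The decoding that ipc_parse_version describes, splitting a combined value into a version and a bare command code, can be sketched in user space as below. The constant values (0x0100 for the new-style flag, 0 for the old style) and all SKETCH_ and sketch_ names are assumptions for this example; consult the kernel headers for the authoritative definitions.

```c
#include <assert.h>

/* Assumed values for this sketch only. */
#define SKETCH_IPC_OLD 0
#define SKETCH_IPC_64  0x0100

/*
 * If the version flag is set in *cmd, strip it so that *cmd holds only
 * the command code, and report the new style. Otherwise leave *cmd
 * unchanged and report the old style.
 */
static int sketch_ipc_parse_version(int *cmd)
{
    assert(cmd);

    if (*cmd & SKETCH_IPC_64) {
        *cmd ^= SKETCH_IPC_64;   /* clear the flag, keep the command */
        return SKETCH_IPC_64;
    }
    return SKETCH_IPC_OLD;
}
```

A caller would pass the address of its command variable, branch on the returned version to pick the matching user-space structure layout, and then switch on the now-bare command code.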
Chapter 6 FIFO Buffer

6.1 kfifo interface

6.1.1 DECLARE_KFIFO_PTR
DECLARE_KFIFO_PTR - macro to declare a fifo pointer object
Synopsis
DECLARE_KFIFO_PTR (fifo, type);
Arguments
fifo - name of the declared fifo
type - type of the fifo elements

6.1.2 DECLARE_KFIFO
DECLARE_KFIFO - macro to declare a fifo object
Synopsis
DECLARE_KFIFO (fifo, type, size);
Arguments
fifo - name of the declared fifo
type - type of the fifo elements
size - the number of elements in the fifo, this must be a power of 2

6.1.3 INIT_KFIFO
INIT_KFIFO - Initialize a fifo declared by DECLARE_KFIFO
Synopsis
INIT_KFIFO (fifo);
Arguments
fifo - name of the declared fifo datatype

6.1.4 DEFINE_KFIFO
DEFINE_KFIFO - macro to define and initialize a fifo
Synopsis
DEFINE_KFIFO (fifo, type, size);
Arguments
fifo - name of the declared fifo datatype
type - type of the fifo elements
size - the number of elements in the fifo, this must be a power of 2
Note
The macro can be used for global and local fifo data type variables.

6.1.5 kfifo_initialized
kfifo_initialized - Check if the fifo is initialized
Synopsis
kfifo_initialized (fifo);
Arguments
fifo - address of the fifo to check
Description
Return true if fifo is initialized, otherwise false. Assumes the fifo was 0 before.

6.1.6 kfifo_esize
kfifo_esize - returns the size of the element managed by the fifo
Synopsis
kfifo_esize (fifo);
Arguments
fifo - address of the fifo to be used

6.1.7 kfifo_recsize
kfifo_recsize - returns the size of the record length field
Synopsis
kfifo_recsize (fifo);
Arguments
fifo - address of the fifo to be used

6.1.8 kfifo_size
kfifo_size - returns the size of the fifo in elements
Synopsis
kfifo_size (fifo);
Arguments
fifo - address of the fifo to be used

6.1.9 kfifo_reset
kfifo_reset - removes the entire fifo content
Synopsis
kfifo_reset (fifo);
Arguments
fifo - address of the fifo to be used
Note
Usage of kfifo_reset is dangerous. It should only be called when the fifo is exclusively locked or when it is ensured that no other thread is accessing the fifo.
6.1.10 kfifo_reset_out
kfifo_reset_out - skip fifo content
Synopsis
kfifo_reset_out (fifo);
Arguments
fifo - address of the fifo to be used
Note
The usage of kfifo_reset_out is safe as long as it is only called from the reader thread and there is only one concurrent reader. Otherwise it is dangerous and must be handled in the same way as kfifo_reset.

6.1.11 kfifo_len
kfifo_len - returns the number of used elements in the fifo
Synopsis
kfifo_len (fifo);
Arguments
fifo - address of the fifo to be used

6.1.12 kfifo_is_empty
kfifo_is_empty - returns true if the fifo is empty
Synopsis
kfifo_is_empty (fifo);
Arguments
fifo - address of the fifo to be used

6.1.13 kfifo_is_full
kfifo_is_full - returns true if the fifo is full
Synopsis
kfifo_is_full (fifo);
Arguments
fifo - address of the fifo to be used

6.1.14 kfifo_avail
kfifo_avail - returns the number of unused elements in the fifo
Synopsis
kfifo_avail (fifo);
Arguments
fifo - address of the fifo to be used

6.1.15 kfifo_skip
kfifo_skip - skip output data
Synopsis
kfifo_skip (fifo);
Arguments
fifo - address of the fifo to be used

6.1.16 kfifo_peek_len
kfifo_peek_len - gets the size of the next fifo record
Synopsis
kfifo_peek_len (fifo);
Arguments
fifo - address of the fifo to be used
Description
This function returns the size of the next fifo record in number of bytes.

6.1.17 kfifo_alloc
kfifo_alloc - dynamically allocates a new fifo buffer
Synopsis
kfifo_alloc (fifo, size, gfp_mask);
Arguments
fifo - pointer to the fifo
size - the number of elements in the fifo, this must be a power of 2
gfp_mask - get_free_pages mask, passed to kmalloc
Description
This macro dynamically allocates a new fifo buffer.
The number of elements will be rounded up to a power of 2. The fifo will be released with kfifo_free. Return 0 if no error, otherwise an error code.
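The power-of-2 size requirement that recurs in these entries is what lets a ring buffer turn a free-running counter into a slot index with a single mask. The following user-space sketch illustrates the rounding that kfifo_alloc is described as performing, together with the masking trick. The sketch_ names are hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* True if n is a non-zero power of two. */
static int sketch_is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two that is >= n. */
static size_t sketch_roundup_pow2(size_t n)
{
    size_t p = 1;

    /* Reject 0 and values whose rounded-up result would not fit in size_t. */
    assert(n > 0 && n <= ((size_t)-1 >> 1) + 1);

    while (p < n)
        p <<= 1;
    return p;
}

/* Map a free-running counter to a buffer slot; valid only for power-of-2 sizes. */
static size_t sketch_slot(size_t counter, size_t size)
{
    assert(sketch_is_pow2(size));
    return counter & (size - 1);
}
```

Requesting 1000 elements would give a 1024-element buffer in this model, and counter values 8, 9, 10 map to slots 0, 1, 2 of an 8-element buffer without any division.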
6.1.18 kfifo_free
kfifo_free - frees the fifo
Synopsis
kfifo_free (fifo);
Arguments
fifo - the fifo to be freed

6.1.19 kfifo_init
kfifo_init - initialize a fifo using a preallocated buffer
Synopsis
kfifo_init (fifo, buffer, size);
Arguments
fifo - the fifo to assign the buffer
buffer - the preallocated buffer to be used
size - the size of the internal buffer, this has to be a power of 2
Description
This macro initializes a fifo using a preallocated buffer.
The number of elements will be rounded up to a power of 2. Return 0 if no error, otherwise an error code.

6.1.20 kfifo_put
kfifo_put - put data into the fifo
Synopsis
kfifo_put (fifo, val);
Arguments
fifo - address of the fifo to be used
val - the data to be added
Description
This macro copies the given value into the fifo. It returns 0 if the fifo was full. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.

6.1.21 kfifo_get
kfifo_get - get data from the fifo
Synopsis
kfifo_get (fifo, val);
Arguments
fifo - address of the fifo to be used
val - address where to store the data
Description
This macro reads the data from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.

6.1.22 kfifo_peek
kfifo_peek - get data from the fifo without removing
Synopsis
kfifo_peek (fifo, val);
Arguments
fifo - address of the fifo to be used
val - address where to store the data
Description
This reads the data from the fifo without removing it from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.
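The return-value conventions of kfifo_put, kfifo_get and kfifo_peek (0 when full or empty, otherwise the number of elements processed) can be tried out with a tiny user-space model. Everything named sketch_ here is hypothetical. The model has no memory barriers and is single-threaded only, so it says nothing about the kernel's lockless one-reader, one-writer guarantee.

```c
#include <assert.h>

#define SKETCH_FIFO_SIZE 4u   /* must be a power of 2 */

struct sketch_fifo {
    unsigned int in;              /* free-running count of elements written */
    unsigned int out;             /* free-running count of elements read */
    int data[SKETCH_FIFO_SIZE];
};

/* Number of stored elements; unsigned wraparound keeps the difference valid. */
static unsigned int sketch_fifo_len(const struct sketch_fifo *f)
{
    assert(f);
    return f->in - f->out;
}

/* Returns 0 if the fifo was full, otherwise 1 (one element stored). */
static int sketch_fifo_put(struct sketch_fifo *f, int val)
{
    if (sketch_fifo_len(f) == SKETCH_FIFO_SIZE)
        return 0;
    f->data[f->in & (SKETCH_FIFO_SIZE - 1)] = val;
    f->in++;
    return 1;
}

/* Returns 0 if the fifo was empty, otherwise 1; the element stays queued. */
static int sketch_fifo_peek(const struct sketch_fifo *f, int *val)
{
    assert(val);
    if (sketch_fifo_len(f) == 0)
        return 0;
    *val = f->data[f->out & (SKETCH_FIFO_SIZE - 1)];
    return 1;
}

/* Returns 0 if the fifo was empty, otherwise 1; the element is removed. */
static int sketch_fifo_get(struct sketch_fifo *f, int *val)
{
    if (!sketch_fifo_peek(f, val))
        return 0;
    f->out++;
    return 1;
}
```

The writer only ever advances in and the reader only ever advances out. That separation is the structural reason a single reader and a single writer can avoid a shared lock, although a real implementation also needs the appropriate memory ordering.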
6.1.23 kfifo_in
kfifo_in - put data into the fifo
Synopsis
kfifo_in (fifo, buf, n);
Arguments
fifo - address of the fifo to be used
buf - the data to be added
n - number of elements to be added
Description
This macro copies the given buffer into the fifo and returns the number of copied elements.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.

6.1.24 kfifo_in_spinlocked
kfifo_in_spinlocked - put data into the fifo using a spinlock for locking
Synopsis
kfifo_in_spinlocked (fifo, buf, n, lock);
Arguments
fifo - address of the fifo to be used
buf - the data to be added
n - number of elements to be added
lock - pointer to the spinlock to use for locking
Description
This macro copies the given values buffer into the fifo and returns the number of copied elements.

6.1.25 kfifo_out
kfifo_out - get data from the fifo
Synopsis
kfifo_out (fifo, buf, n);
Arguments
fifo - address of the fifo to be used
buf - pointer to the storage buffer
n - max. number of elements to get
Description
This macro gets some data from the fifo and returns the number of elements copied.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.

6.1.26 kfifo_out_spinlocked
kfifo_out_spinlocked - get data from the fifo using a spinlock for locking
Synopsis
kfifo_out_spinlocked (fifo, buf, n, lock);
Arguments
fifo - address of the fifo to be used
buf - pointer to the storage buffer
n - max. number of elements to get
lock - pointer to the spinlock to use for locking
Description
This macro gets the data from the fifo and returns the number of elements copied.
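kfifo_in and kfifo_out are documented as returning the number of elements actually copied, which may be fewer than requested. A bulk copy in a ring buffer also has to cope with the data wrapping past the end of the storage. The user-space sketch below models both points with a byte fifo. The sketch_ names and the fixed 8-byte size are assumptions, and the model is single-threaded with no locking or barriers.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_BUF_SIZE 8u    /* must be a power of 2 */

struct sketch_bfifo {
    unsigned int in;                      /* free-running write counter */
    unsigned int out;                     /* free-running read counter */
    unsigned char data[SKETCH_BUF_SIZE];
};

/* Copy up to n bytes in; returns how many were actually stored. */
static unsigned int sketch_in(struct sketch_bfifo *f,
                              const unsigned char *buf, unsigned int n)
{
    unsigned int avail, off, first;

    assert(f && buf);
    avail = SKETCH_BUF_SIZE - (f->in - f->out);
    off = f->in & (SKETCH_BUF_SIZE - 1);
    if (n > avail)
        n = avail;                        /* clamp to the free space */
    first = SKETCH_BUF_SIZE - off;        /* room before the wrap point */
    if (first > n)
        first = n;
    memcpy(f->data + off, buf, first);        /* part up to the end */
    memcpy(f->data, buf + first, n - first);  /* wrapped remainder */
    f->in += n;
    return n;
}

/* Copy up to n bytes out; returns how many were actually retrieved. */
static unsigned int sketch_out(struct sketch_bfifo *f,
                               unsigned char *buf, unsigned int n)
{
    unsigned int used, off, first;

    assert(f && buf);
    used = f->in - f->out;
    off = f->out & (SKETCH_BUF_SIZE - 1);
    if (n > used)
        n = used;                         /* clamp to the stored data */
    first = SKETCH_BUF_SIZE - off;
    if (first > n)
        first = n;
    memcpy(buf, f->data + off, first);
    memcpy(buf + first, f->data, n - first);
    f->out += n;
    return n;
}
```

Callers should always use the returned count rather than assume the full request was satisfied. The spinlocked variants described above add locking around the same kind of copy for cases with more than one reader or writer.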
6.1.27 kfifo_from_user
kfifo_from_user - puts some data from user space into the fifo
Synopsis
kfifo_from_user (fifo, from, len, copied);
Arguments
fifo - address of the fifo to be used
from - pointer to the data to be added
len - the length of the data to be added
copied - pointer to output variable to store the number of copied bytes
Description
This macro copies at most len bytes from the from into the fifo, depending on the available space, and returns -EFAULT/0.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.

6.1.28 kfifo_to_user
kfifo_to_user - copies data from the fifo into user space
Synopsis
kfifo_to_user (fifo, to, len, copied);
Arguments
fifo - address of the fifo to be used
to - where the data must be copied
len - the size of the destination buffer
copied - pointer to output variable to store the number of copied bytes
Description
This macro copies at most len bytes from the fifo into the to buffer and returns -EFAULT/0.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.

6.1.29 kfifo_dma_in_prepare
kfifo_dma_in_prepare - setup a scatterlist for DMA input
Synopsis
kfifo_dma_in_prepare (fifo, sgl, nents, len);
Arguments
fifo - address of the fifo to be used
sgl - pointer to the scatterlist array
nents - number of entries in the scatterlist array
len - number of elements to transfer
Description
This macro fills a scatterlist for DMA input. It returns the number of entries in the scatterlist array.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use these macros.

6.1.30 kfifo_dma_in_finish
kfifo_dma_in_finish - finish a DMA IN operation
Synopsis
kfifo_dma_in_finish (fifo, len);
Arguments
fifo - address of the fifo to be used
len - number of bytes received
Description
This macro finishes a DMA IN operation. The in counter will be updated by the len parameter. No error checking will be done.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use these macros.

6.1.31 kfifo_dma_out_prepare
kfifo_dma_out_prepare - setup a scatterlist for DMA output
Synopsis
kfifo_dma_out_prepare (fifo, sgl, nents, len);
Arguments
fifo - address of the fifo to be used
sgl - pointer to the scatterlist array
nents - number of entries in the scatterlist array
len - number of elements to transfer
Description
This macro fills a scatterlist for DMA output with at most len bytes to transfer. It returns the number of entries in the scatterlist array. A zero means there is no space available and the scatterlist is not filled.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use these macros.

6.1.32 kfifo_dma_out_finish
kfifo_dma_out_finish - finish a DMA OUT operation
Synopsis
kfifo_dma_out_finish (fifo, len);
Arguments
fifo - address of the fifo to be used
len - number of bytes transferred
Description
This macro finishes a DMA OUT operation. The out counter will be updated by the len parameter. No error checking will be done.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use these macros.

6.1.33 kfifo_out_peek
kfifo_out_peek - gets some data from the fifo
Synopsis
kfifo_out_peek (fifo, buf, n);
Arguments
fifo - address of the fifo to be used
buf - pointer to the storage buffer
n - max. number of elements to get
Description
This macro gets the data from the fifo and returns the number of elements copied. The data is not removed from the fifo.
Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use this macro.

Chapter 7 relay interface support

Relay interface support is designed to provide an efficient mechanism for tools and facilities to relay large amounts of data from kernel space to user space.

7.1 relay interface

7.1.1 relay_buf_full
relay_buf_full - boolean, is the channel buffer full?
Synopsis
int relay_buf_full (struct rchan_buf * buf);
Arguments
buf - channel buffer
Description
Returns 1 if the buffer is full, 0 otherwise.

7.1.2 relay_reset
relay_reset - reset the channel
Synopsis
void relay_reset (struct rchan * chan);
Arguments
chan - the channel
Description
This has the effect of erasing all data from all channel buffers and restarting the channel in its initial state. The buffers are not freed, so any mappings are still in effect.
NOTE. Care should be taken that the channel isn't actually being used by anything when this call is made.

7.1.3 relay_open
relay_open - create a new relay channel
Synopsis
struct rchan * relay_open (const char * base_filename, struct dentry * parent, size_t subbuf_size, size_t n_subbufs, struct rchan_callbacks * cb, void * private_data);
Arguments
base_filename - base name of files to create, NULL for buffering only
parent - dentry of parent directory, NULL for root directory or buffer
subbuf_size - size of sub-buffers
n_subbufs - number of sub-buffers
cb - client callback functions
private_data - user-defined data
Description
Returns channel pointer if successful, NULL otherwise.
Creates a channel buffer for each cpu using the sizes and attributes specified. The created channel buffer files will be named base_filename0...base_filenameN-1. File permissions will be S_IRUSR.

7.1.4 relay_switch_subbuf
relay_switch_subbuf - switch to a new sub-buffer
Synopsis
size_t relay_switch_subbuf (struct rchan_buf * buf, size_t length);
Arguments
buf - channel buffer
length - size of current event
Description
Returns either the length passed in or 0 if full.
Performs sub-buffer-switch tasks such as invoking callbacks, updating padding counts, waking up readers, etc.
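The per-CPU file naming described for relay_open (base_filename0 through base_filenameN-1) amounts to appending the CPU number to the base name. The user-space sketch below illustrates the scheme. The function name sketch_relay_name and its error convention are hypothetical; the real buffer-size limit and error handling are not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<base><cpu>" into dst. Returns 0 on success, or -1 if the name
 * does not fit in len bytes (dst then holds a truncated name).
 */
static int sketch_relay_name(char *dst, size_t len,
                             const char *base, unsigned int cpu)
{
    int n;

    assert(dst && base && len > 0);
    n = snprintf(dst, len, "%s%u", base, cpu);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With a base name of "trace" on a four-CPU machine, this naming gives trace0, trace1, trace2 and trace3, one file per channel buffer.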
7.1.5 relay_subbufs_consumed
relay_subbufs_consumed - update the buffer's sub-buffers-consumed count
Synopsis
void relay_subbufs_consumed (struct rchan * chan, unsigned int cpu, size_t subbufs_consumed);
Arguments
chan - the channel
cpu - the cpu associated with the channel buffer to update
subbufs_consumed - number of sub-buffers to add to current buf's count
Description
Adds to the channel buffer's consumed sub-buffer count. subbufs_consumed should be the number of sub-buffers newly consumed, not the total consumed.
NOTE. Kernel clients don't need to call this function if the channel mode is overwrite.

7.1.6 relay_close
relay_close - close the channel
Synopsis
void relay_close (struct rchan * chan);
Arguments
chan - the channel
Description
Closes all channel buffers and frees the channel.

7.1.7 relay_flush
relay_flush - flush the channel
Synopsis
void relay_flush (struct rchan * chan);
Arguments
chan - the channel
Description
Flushes all channel buffers, i.e. forces buffer switch.

7.1.8 relay_mmap_buf
relay_mmap_buf - mmap channel buffer to process address space
Synopsis
int relay_mmap_buf (struct rchan_buf * buf, struct vm_area_struct * vma);
Arguments
buf - relay channel buffer
vma - vm_area_struct describing memory to be mapped
Description
Returns 0 if ok, negative on error.
Caller should already have grabbed mmap_sem.

7.1.9 relay_alloc_buf
relay_alloc_buf - allocate a channel buffer
Synopsis
void * relay_alloc_buf (struct rchan_buf * buf, size_t * size);
Arguments
buf - the buffer struct
size - total size of the buffer
Description
Returns a pointer to the resulting buffer, NULL if unsuccessful. The passed in size will get page aligned, if it isn't already.

7.1.10 relay_create_buf
relay_create_buf - allocate and initialize a channel buffer
Synopsis
struct rchan_buf * relay_create_buf (struct rchan * chan);
Arguments
chan - the relay channel
Description
Returns channel buffer if successful, NULL otherwise.
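relay_alloc_buf (7.1.9) notes that the passed-in size is page aligned if it is not already. The arithmetic behind such an alignment can be sketched in user space as follows. The 4096-byte page size and the sketch_ names are assumptions for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Round size up to the next multiple of the page size. */
static size_t sketch_page_align(size_t size)
{
    /* Guard against overflow in the round-up addition. */
    assert(size <= (size_t)-1 - (SKETCH_PAGE_SIZE - 1));
    return (size + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/* Number of whole pages needed to hold size bytes. */
static size_t sketch_page_count(size_t size)
{
    return sketch_page_align(size) / SKETCH_PAGE_SIZE;
}
```

A request for 4097 bytes therefore occupies two pages (8192 bytes) in this model, while a request that is already a multiple of the page size is left unchanged.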
7.1.11 relay_destroy_channel
relay_destroy_channel - free the channel struct
Synopsis
void relay_destroy_channel (struct kref * kref);
Arguments
kref - target kernel reference that contains the relay channel
Description
Should only be called from kref_put.

7.1.12 relay_destroy_buf
relay_destroy_buf - destroy an rchan_buf struct and associated buffer
Synopsis
void relay_destroy_buf (struct rchan_buf * buf);
Arguments
buf - the buffer struct

7.1.13 relay_remove_buf
relay_remove_buf - remove a channel buffer
Synopsis
void relay_remove_buf (struct kref * kref);
Arguments
kref - target kernel reference that contains the relay buffer
Description
Removes the file from the filesystem, which also frees the rchan_buf_struct and the channel buffer. Should only be called from kref_put.

7.1.14 relay_buf_empty
relay_buf_empty - boolean, is the channel buffer empty?
Synopsis
int relay_buf_empty (struct rchan_buf * buf);
Arguments
buf - channel buffer
Description
Returns 1 if the buffer is empty, 0 otherwise.

7.1.15 wakeup_readers
wakeup_readers - wake up readers waiting on a channel
Synopsis
void wakeup_readers (unsigned long data);
Arguments
data - contains the channel buffer
Description
This is the timer function used to defer reader waking.

7.1.16 __relay_reset
__relay_reset - reset a channel buffer
Synopsis
void __relay_reset (struct rchan_buf * buf, unsigned int init);
Arguments
buf - the channel buffer
init - 1 if this is a first-time initialization
Description
See relay_reset for description of effect.

7.1.17 relay_close_buf
relay_close_buf - close a channel buffer
Synopsis
void relay_close_buf (struct rchan_buf * buf);
Arguments
buf - channel buffer
Description
Marks the buffer finalized and restores the default callbacks. The channel buffer and channel buffer data structure are then freed automatically when the last reference is given up.
7.1.18 relay_hotcpu_callback
relay_hotcpu_callback - CPU hotplug callback
Synopsis
int relay_hotcpu_callback (struct notifier_block * nb, unsigned long action, void * hcpu);
Arguments
nb - notifier block
action - hotplug action to take
hcpu - CPU number
Description
Returns the success/failure of the operation. (NOTIFY_OK, NOTIFY_BAD)

7.1.19 relay_late_setup_files
relay_late_setup_files - triggers file creation
Synopsis
int relay_late_setup_files (struct rchan * chan, const char * base_filename, struct dentry * parent);
Arguments
chan - channel to operate on
base_filename - base name of files to create
parent - dentry of parent directory, NULL for root directory
Description
Returns 0 if successful, non-zero otherwise.
Use to setup files for a previously buffer-only channel. Useful to do early tracing in kernel, before VFS is up, for example.

7.1.20 relay_file_open
relay_file_open - open file op for relay files
Synopsis
int relay_file_open (struct inode * inode, struct file * filp);
Arguments
inode - the inode
filp - the file
Description
Increments the channel buffer refcount.

7.1.21 relay_file_mmap
relay_file_mmap - mmap file op for relay files
Synopsis
int relay_file_mmap (struct file * filp, struct vm_area_struct * vma);
Arguments
filp - the file
vma - the vma describing what to map
Description
Calls upon relay_mmap_buf to map the file into user space.

7.1.22 relay_file_poll
relay_file_poll - poll file op for relay files
Synopsis
unsigned int relay_file_poll (struct file * filp, poll_table * wait);
Arguments
filp - the file
wait - poll table
Description
Poll implementation.

7.1.23 relay_file_release
relay_file_release - release file op for relay files
Synopsis
int relay_file_release (struct inode * inode, struct file * filp);
Arguments
inode - the inode
filp - the file
Description
Decrements the channel refcount, as the filesystem is no longer using it.
7.1.24 relay_file_read_subbuf_avail
relay_file_read_subbuf_avail - return bytes available in sub-buffer
Synopsis
size_t relay_file_read_subbuf_avail (size_t read_pos, struct rchan_buf * buf);
Arguments
read_pos - file read position
buf - relay channel buffer

7.1.25 relay_file_read_start_pos
relay_file_read_start_pos - find the first available byte to read
Synopsis
size_t relay_file_read_start_pos (size_t read_pos, struct rchan_buf * buf);
Arguments
read_pos - file read position
buf - relay channel buffer
Description
If the read_pos is in the middle of padding, return the position of the first actually available byte, otherwise return the original value.

7.1.26 relay_file_read_end_pos
relay_file_read_end_pos - return the new read position
Synopsis
size_t relay_file_read_end_pos (struct rchan_buf * buf, size_t read_pos, size_t count);
Arguments
buf - relay channel buffer
read_pos - file read position
count - number of bytes to be read

Chapter 8 Module Support

8.1 Module Loading

8.1.1 __request_module
__request_module - try to load a kernel module
Synopsis
int __request_module (bool wait, const char * fmt, ...);
Arguments
wait - wait (or not) for the operation to complete
fmt - printf style format string for the name of the module
... - variable arguments, as specified in the format string
Description
Load a module using the user mode module loader. The function returns zero on success or a negative errno code on failure. Note that a successful module load does not mean the module did not then unload and exit on an error of its own. Callers must check that the service they requested is now available, not blindly invoke it.
If module auto-loading support is disabled then this function becomes a no-operation.
8.1.2 call_usermodehelper_setup

call_usermodehelper_setup - prepare to call a usermode helper

Synopsis
struct subprocess_info * call_usermodehelper_setup (char * path, char ** argv, char ** envp, gfp_t gfp_mask, int (*init) (struct subprocess_info *info, struct cred *new), void (*cleanup) (struct subprocess_info *info), void * data);

Arguments
path: path to usermode executable
argv: arg vector for process
envp: environment for process
gfp_mask: gfp mask for memory allocation
init: an init function
cleanup: a cleanup function
data: arbitrary context sensitive data

Description
Returns either NULL on allocation failure, or a subprocess_info structure. This should be passed to call_usermodehelper_exec to exec the process and free the structure.
The init function is used to customize the helper process prior to exec. A non-zero return code causes the process to error out, exit, and return the failure to the calling process.
The cleanup function is called just before the subprocess_info is about to be freed. This can be used for freeing the argv and envp. The function must be runnable in either a process context or the context in which call_usermodehelper_exec is called.

8.1.3 call_usermodehelper_exec

call_usermodehelper_exec - start a usermode application

Synopsis
int call_usermodehelper_exec (struct subprocess_info * sub_info, int wait);

Arguments
sub_info: information about the subprocess
wait: wait for the application to finish and return status. With UMH_NO_WAIT, don't wait at all; you then get no useful error back when the program couldn't be exec'ed. This makes it safe to call from interrupt context.

Description
Runs a user-space application. The application is started asynchronously if wait is not set, and runs as a child of keventd (i.e. it runs with full root capabilities).
8.1.4 call_usermodehelper

call_usermodehelper - prepare and start a usermode application

Synopsis
int call_usermodehelper (char * path, char ** argv, char ** envp, int wait);

Arguments
path: path to usermode executable
argv: arg vector for process
envp: environment for process
wait: wait for the application to finish and return status. With UMH_NO_WAIT, don't wait at all; you then get no useful error back when the program couldn't be exec'ed. This makes it safe to call from interrupt context.

Description
This function is equivalent to calling call_usermodehelper_setup followed by call_usermodehelper_exec.

8.2 Inter Module support

Refer to the file kernel/module.c for more information.

Chapter 9 Hardware Interfaces

9.1 Interrupt Handling

9.1.1 synchronize_hardirq

synchronize_hardirq - wait for pending hard IRQ handlers (on other CPUs)

Synopsis
void synchronize_hardirq (unsigned int irq);

Arguments
irq: interrupt number to wait for

Description
This function waits for any pending hard IRQ handlers for this interrupt to complete before returning. If you use this function while holding a resource the IRQ handler may need, you will deadlock. It does not take associated threaded handlers into account.
Do not use this for shutdown scenarios where you must be sure that all parts (hardirq and threaded handler) have completed.
This function may be called - with care - from IRQ context.

9.1.2 synchronize_irq

synchronize_irq - wait for pending IRQ handlers (on other CPUs)

Synopsis
void synchronize_irq (unsigned int irq);

Arguments
irq: interrupt number to wait for

Description
This function waits for any pending IRQ handlers for this interrupt to complete before returning. If you use this function while holding a resource the IRQ handler may need, you will deadlock.
This function may be called - with care - from IRQ context.
9.1.3 irq_set_affinity_notifier

irq_set_affinity_notifier - control notification of IRQ affinity changes

Synopsis
int irq_set_affinity_notifier (unsigned int irq, struct irq_affinity_notify * notify);

Arguments
irq: Interrupt for which to enable/disable notification
notify: Context for notification, or NULL to disable notification. Function pointers must be initialised; the other fields will be initialised by this function.

Description
Must be called in process context. Notification may only be enabled after the IRQ is allocated and must be disabled before the IRQ is freed using free_irq.

9.1.4 disable_irq_nosync

disable_irq_nosync - disable an irq without waiting

Synopsis
void disable_irq_nosync (unsigned int irq);

Arguments
irq: Interrupt to disable

Description
Disable the selected interrupt line. Disables and enables are nested. Unlike disable_irq, this function does not ensure existing instances of the IRQ handler have completed before returning.
This function may be called from IRQ context.

9.1.5 disable_irq

disable_irq - disable an irq and wait for completion

Synopsis
void disable_irq (unsigned int irq);

Arguments
irq: Interrupt to disable

Description
Disable the selected interrupt line. Enables and disables are nested. This function waits for any pending IRQ handlers for this interrupt to complete before returning. If you use this function while holding a resource the IRQ handler may need, you will deadlock.
This function may be called - with care - from IRQ context.

9.1.6 enable_irq

enable_irq - enable handling of an irq

Synopsis
void enable_irq (unsigned int irq);

Arguments
irq: Interrupt to enable

Description
Undoes the effect of one call to disable_irq. If this matches the last disable, processing of interrupts on this IRQ line is re-enabled.
This function may be called from IRQ context only when desc->irq_data.chip->bus_lock and desc->chip->bus_sync_unlock are NULL!
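The nesting rule stated for disable_irq, disable_irq_nosync and enable_irq can be pictured as a depth counter. The following is only a user-space sketch of that rule, not kernel code: struct irq_line_model and the model_* functions are hypothetical names invented for illustration, and the waiting behaviour of disable_irq is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only. The structure and function names are
 * hypothetical and do not exist in the kernel. */
struct irq_line_model {
    unsigned int depth;   /* number of disables not yet undone */
};

/* Models disable_irq()/disable_irq_nosync(): every call nests one deeper. */
static void model_disable(struct irq_line_model *line)
{
    line->depth++;
}

/* Models enable_irq(): undoes exactly one earlier disable. */
static void model_enable(struct irq_line_model *line)
{
    assert(line->depth > 0);   /* an unbalanced enable is a caller bug */
    line->depth--;
}

/* The line is live again only once the last disable has been matched. */
static bool model_is_enabled(const struct irq_line_model *line)
{
    return line->depth == 0;
}
```

In this model two disables followed by one enable leave the line masked; only the second enable re-enables it, which is the "matches the last disable" condition described for enable_irq.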
9.1.7 irq_set_irq_wake

irq_set_irq_wake - control irq power management wakeup

Synopsis
int irq_set_irq_wake (unsigned int irq, unsigned int on);

Arguments
irq: interrupt to control
on: enable/disable power management wakeup

Description
Enable/disable power management wakeup mode, which is disabled by default. Enables and disables must match, just as they match for non-wakeup mode support.
Wakeup mode lets this IRQ wake the system from sleep states like suspend to RAM.

9.1.8 irq_wake_thread

irq_wake_thread - wake the irq thread for the action identified by dev_id

Synopsis
void irq_wake_thread (unsigned int irq, void * dev_id);

Arguments
irq: Interrupt line
dev_id: Device identity for which the thread should be woken

9.1.9 setup_irq

setup_irq - setup an interrupt

Synopsis
int setup_irq (unsigned int irq, struct irqaction * act);

Arguments
irq: Interrupt line to setup
act: irqaction for the interrupt

Description
Used to statically set up interrupts in the early boot process.

9.1.10 remove_irq

remove_irq - free an interrupt

Synopsis
void remove_irq (unsigned int irq, struct irqaction * act);

Arguments
irq: Interrupt line to free
act: irqaction for the interrupt

Description
Used to remove interrupts statically set up by the early boot process.

9.1.11 free_irq

free_irq - free an interrupt allocated with request_irq

Synopsis
void free_irq (unsigned int irq, void * dev_id);

Arguments
irq: Interrupt line to free
dev_id: Device identity to free

Description
Remove an interrupt handler. The handler is removed and if the interrupt line is no longer in use by any driver it is disabled. On a shared IRQ the caller must ensure the interrupt is disabled on the card it drives before calling this function. The function does not return until any executing interrupts for this IRQ have completed.
This function must not be called from interrupt context.
9.1.12 request_threaded_irq

request_threaded_irq - allocate an interrupt line

Synopsis
int request_threaded_irq (unsigned int irq, irq_handler_t handler, irq_handler_t thread_fn, unsigned long irqflags, const char * devname, void * dev_id);

Arguments
irq: Interrupt line to allocate
handler: Function to be called when the IRQ occurs. Primary handler for threaded interrupts. If NULL and thread_fn != NULL, the default primary handler is installed.
thread_fn: Function called from the irq handler thread. If NULL, no irq thread is created.
irqflags: Interrupt type flags
devname: An ascii name for the claiming device
dev_id: A cookie passed back to the handler function

Description
This call allocates interrupt resources and enables the interrupt line and IRQ handling. From the point this call is made your handler function may be invoked. Since your handler function must clear any interrupt the board raises, you must take care both to initialise your hardware and to set up the interrupt handler in the right order.
If you want to set up a threaded irq handler for your device then you need to supply handler and thread_fn. handler is still called in hard interrupt context and has to check whether the interrupt originates from the device. If yes, it needs to disable the interrupt on the device and return IRQ_WAKE_THREAD, which will wake up the handler thread and run thread_fn. This split handler design is necessary to support shared interrupts.
dev_id must be globally unique. Normally the address of the device data structure is used as the cookie. Since the handler receives this value it makes sense to use it.
If your interrupt is shared you must pass a non-NULL dev_id as this is required when freeing the interrupt.
Flags
IRQF_SHARED: Interrupt is shared
IRQF_TRIGGER_*: Specify active edge(s) or level

9.1.13 request_any_context_irq

request_any_context_irq - allocate an interrupt line

Synopsis
int request_any_context_irq (unsigned int irq, irq_handler_t handler, unsigned long flags, const char * name, void * dev_id);

Arguments
irq: Interrupt line to allocate
handler: Function to be called when the IRQ occurs. Threaded handler for threaded interrupts.
flags: Interrupt type flags
name: An ascii name for the claiming device
dev_id: A cookie passed back to the handler function

Description
This call allocates interrupt resources and enables the interrupt line and IRQ handling. It selects either a hardirq or threaded handling method depending on the context.
On failure, it returns a negative value. On success, it returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED.

9.2 DMA Channels

9.2.1 request_dma

request_dma - request and reserve a system DMA channel

Synopsis
int request_dma (unsigned int dmanr, const char * device_id);

Arguments
dmanr: DMA channel number
device_id: reserving device ID string, used in /proc/dma

9.2.2 free_dma

free_dma - free a reserved system DMA channel

Synopsis
void free_dma (unsigned int dmanr);

Arguments
dmanr: DMA channel number

9.3 Resources Management

9.3.1 request_resource_conflict

request_resource_conflict - request and reserve an I/O or memory resource

Synopsis
struct resource * request_resource_conflict (struct resource * root, struct resource * new);

Arguments
root: root resource descriptor
new: resource descriptor desired by caller

Description
Returns 0 for success, conflict resource on error.

9.3.2 reallocate_resource

reallocate_resource - allocate a slot in the resource tree given range & alignment. The resource will be relocated if the new size cannot be reallocated in the current location.
Synopsis
int reallocate_resource (struct resource * root, struct resource * old, resource_size_t newsize, struct resource_constraint * constraint);

Arguments
root: root resource descriptor
old: resource descriptor desired by caller
newsize: new size of the resource descriptor
constraint: the size and alignment constraints to be met.

9.3.3 lookup_resource

lookup_resource - find an existing resource by a resource start address

Synopsis
struct resource * lookup_resource (struct resource * root, resource_size_t start);

Arguments
root: root resource descriptor
start: resource start address

Description
Returns a pointer to the resource if found, NULL otherwise.

9.3.4 insert_resource_conflict

insert_resource_conflict - Inserts resource in the resource tree

Synopsis
struct resource * insert_resource_conflict (struct resource * parent, struct resource * new);

Arguments
parent: parent of the new resource
new: new resource to insert

Description
Returns 0 on success, conflict resource if the resource can't be inserted.
This function is equivalent to request_resource_conflict when no conflict happens. If a conflict happens, and the conflicting resources entirely fit within the range of the new resource, then the new resource is inserted and the conflicting resources become children of the new resource.

9.3.5 insert_resource

insert_resource - Inserts a resource in the resource tree

Synopsis
int insert_resource (struct resource * parent, struct resource * new);

Arguments
parent: parent of the new resource
new: new resource to insert

Description
Returns 0 on success, -EBUSY if the resource can't be inserted.
9.3.6 insert_resource_expand_to_fit

insert_resource_expand_to_fit - Insert a resource into the resource tree

Synopsis
void insert_resource_expand_to_fit (struct resource * root, struct resource * new);

Arguments
root: root resource descriptor
new: new resource to insert

Description
Insert a resource into the resource tree, possibly expanding it in order to make it encompass any conflicting resources.

9.3.7 resource_alignment

resource_alignment - calculate resource's alignment

Synopsis
resource_size_t resource_alignment (struct resource * res);

Arguments
res: resource pointer

Description
Returns alignment on success, 0 (invalid alignment) on failure.

9.3.8 release_mem_region_adjustable

release_mem_region_adjustable - release a previously reserved memory region

Synopsis
int release_mem_region_adjustable (struct resource * parent, resource_size_t start, resource_size_t size);

Arguments
parent: parent resource descriptor
start: resource start address
size: resource region size

Description
This interface is intended for memory hot-delete. The requested region is released from a currently busy memory resource. The requested region must either match exactly or fit into a single busy resource entry. In the latter case, the remaining resource is adjusted accordingly. Existing children of the busy memory resource must be immutable in the request.

Note
- Additional release conditions, such as overlapping region, can be supported after they are confirmed as valid cases.
- When a busy memory resource gets split into two entries, the code assumes that all children remain in the lower address entry for simplicity. Enhance this logic when necessary.
9.3.9 request_resource

request_resource - request and reserve an I/O or memory resource

Synopsis
int request_resource (struct resource * root, struct resource * new);

Arguments
root: root resource descriptor
new: resource descriptor desired by caller

Description
Returns 0 for success, negative error code on error.

9.3.10 release_resource

release_resource - release a previously reserved resource

Synopsis
int release_resource (struct resource * old);

Arguments
old: resource pointer

9.3.11 allocate_resource

allocate_resource - allocate empty slot in the resource tree given range & alignment. The resource will be reallocated with a new size if it was already allocated.

Synopsis
int allocate_resource (struct resource * root, struct resource * new, resource_size_t size, resource_size_t min, resource_size_t max, resource_size_t align, resource_size_t (*alignf) (void *, const struct resource *, resource_size_t, resource_size_t), void * alignf_data);

Arguments
root: root resource descriptor
new: resource descriptor desired by caller
size: requested resource region size
min: minimum boundary to allocate
max: maximum boundary to allocate
align: alignment requested, in bytes
alignf: alignment function, optional, called if not NULL
alignf_data: arbitrary data to pass to the alignf function

9.3.12 adjust_resource

adjust_resource - modify a resource's start and size

Synopsis
int adjust_resource (struct resource * res, resource_size_t start, resource_size_t size);

Arguments
res: resource to modify
start: new start value
size: new size

Description
Given an existing resource, change its start and size to match the arguments. Returns 0 on success, -EBUSY if it can't fit. Existing children of the resource are assumed to be immutable.
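The return convention of request_resource_conflict (nothing on success, the conflicting resource on failure) is easier to see on a toy example. The sketch below is a user-space model under simplifying assumptions, not the kernel implementation: struct res_model and model_request are hypothetical names, ranges are inclusive, siblings are kept sorted by start address, and the check against the parent's own bounds is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct resource: an inclusive
 * [start, end] range on a sibling list kept sorted by start address. */
struct res_model {
    unsigned long start, end;
    struct res_model *sibling;
};

/* Try to insert new_res among the children that begin at *head.
 * Returns NULL on success, or the first overlapping sibling on conflict. */
static struct res_model *model_request(struct res_model **head,
                                       struct res_model *new_res)
{
    struct res_model **p = head;

    assert(new_res->start <= new_res->end);
    for (;;) {
        struct res_model *cur = *p;

        /* Free slot: end of the list, or the next sibling starts
         * after the new range ends. Link the new range in here. */
        if (cur == NULL || cur->start > new_res->end) {
            new_res->sibling = cur;
            *p = new_res;
            return NULL;
        }
        p = &cur->sibling;
        /* This sibling lies entirely below the new range: keep walking. */
        if (cur->end < new_res->start)
            continue;
        return cur;   /* overlap: report the conflicting resource */
    }
}
```

A wrapper in the style of request_resource would then map a non-NULL result to a negative error code, which is why the two entries document different return types for what is otherwise the same operation.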
9.3.13 __request_region

__request_region - create a new busy resource region

Synopsis
struct resource * __request_region (struct resource * parent, resource_size_t start, resource_size_t n, const char * name, int flags);

Arguments
parent: parent resource descriptor
start: resource start address
n: resource region size
name: reserving caller's ID string
flags: IO resource flags

9.3.14 __check_region

__check_region - check if a resource region is busy or free

Synopsis
int __check_region (struct resource * parent, resource_size_t start, resource_size_t n);

Arguments
parent: parent resource descriptor
start: resource start address
n: resource region size

Description
Returns 0 if the region is free at the moment it is checked, returns -EBUSY if the region is busy.

NOTE
This function is deprecated because its use is racy. Even if it returns 0, a subsequent call to request_region may fail because another driver etc. just allocated the region. Do NOT use it. It will be removed from the kernel.

9.3.15 __release_region

__release_region - release a previously reserved resource region

Synopsis
void __release_region (struct resource * parent, resource_size_t start, resource_size_t n);

Arguments
parent: parent resource descriptor
start: resource start address
n: resource region size

Description
The described resource region must match a currently busy region.

9.4 MTRR Handling

9.4.1 mtrr_add

mtrr_add - Add a memory type region

Synopsis
int mtrr_add (unsigned long base, unsigned long size, unsigned int type, bool increment);

Arguments
base: Physical base address of region
size: Physical size of region
type: Type of MTRR desired
increment: If this is true do usage counting on the region

Description
Memory type region registers control the caching on newer Intel and non Intel processors. This function allows drivers to request that an MTRR be added.
The details and hardware specifics of each processor's implementation are hidden from the caller, but nevertheless the caller should expect to need to provide a power of two size on an equivalent power of two boundary.
If the region cannot be added either because all regions are in use or the CPU cannot support it a negative value is returned. On success the register number for this entry is returned, but should be treated as a cookie only.
On a multiprocessor machine the changes are made to all processors. This is required on x86 by the Intel processors.
The available types are
MTRR_TYPE_UNCACHABLE - No caching
MTRR_TYPE_WRBACK - Write data back in bursts whenever
MTRR_TYPE_WRCOMB - Write data back soon but allow bursts
MTRR_TYPE_WRTHROUGH - Cache reads but not writes

BUGS
Needs a quiet flag for the cases where drivers do not mind failures and do not wish system log messages to be sent.

9.4.2 mtrr_del

mtrr_del - delete a memory type region

Synopsis
int mtrr_del (int reg, unsigned long base, unsigned long size);

Arguments
reg: Register returned by mtrr_add
base: Physical base address
size: Size of region

Description
If register is supplied then base and size are ignored. This is how drivers should call it.
Releases an MTRR region. If the usage count drops to zero the register is freed and the region returns to default state. On success the register is returned, on failure a negative error code.

9.4.3 arch_phys_wc_add

arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable

Synopsis
int arch_phys_wc_add (unsigned long base, unsigned long size);

Arguments
base: Physical base address
size: Size of region

Description
If PAT is available, this does nothing. If PAT is unavailable, it attempts to add a WC MTRR covering size bytes starting at base and logs an error if this fails.
Drivers must store the return value to pass to mtrr_del_wc_if_needed, but drivers should not try to interpret that return value.
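The mtrr_add description asks callers to expect "a power of two size on an equivalent power of two boundary". As a hedged illustration of what that constraint means arithmetically, the user-space helpers below check a base/size pair and round a length up to a power of two. Both function names are hypothetical; they are not part of the kernel, and the real per-CPU validation may accept or reject ranges differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not a kernel function: true when size is a
 * power of two and base is aligned to that size. */
static bool model_mtrr_range_ok(unsigned long base, unsigned long size)
{
    /* A power of two has exactly one bit set. */
    if (size == 0 || (size & (size - 1)) != 0)
        return false;
    /* The base must be a multiple of the size. */
    return (base & (size - 1)) == 0;
}

/* Hypothetical helper: smallest power of two that is >= len. */
static unsigned long model_round_up_pow2(unsigned long len)
{
    unsigned long size = 1;

    /* Reject zero and lengths whose rounded value would overflow. */
    assert(len > 0 && len <= (~0UL >> 1) + 1);
    while (size < len)
        size <<= 1;
    return size;
}
```

For example, a 3 MiB frame buffer would be rounded up to 4 MiB, and the resulting range only passes the check if its base address is itself 4 MiB aligned.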
9.5 PCI Support Library

9.5.1 pci_bus_max_busnr

pci_bus_max_busnr - returns maximum PCI bus number of given bus' children

Synopsis
unsigned char pci_bus_max_busnr (struct pci_bus * bus);

Arguments
bus: pointer to PCI bus structure to search

Description
Given a PCI bus, returns the highest PCI bus number present in the set including the given PCI bus and its list of child PCI buses.

9.5.2 pci_find_capability

pci_find_capability - query for devices' capabilities

Synopsis
int pci_find_capability (struct pci_dev * dev, int cap);

Arguments
dev: PCI device to query
cap: capability code

Description
Tell if a device supports a given PCI capability. Returns the address of the requested capability structure within the device's PCI configuration space or 0 in case the device does not support it. Possible values for cap:
PCI_CAP_ID_PM: Power Management
PCI_CAP_ID_AGP: Accelerated Graphics Port
PCI_CAP_ID_VPD: Vital Product Data
PCI_CAP_ID_SLOTID: Slot Identification
PCI_CAP_ID_MSI: Message Signalled Interrupts
PCI_CAP_ID_CHSWP: CompactPCI HotSwap
PCI_CAP_ID_PCIX: PCI-X
PCI_CAP_ID_EXP: PCI Express

9.5.3 pci_bus_find_capability

pci_bus_find_capability - query for devices' capabilities

Synopsis
int pci_bus_find_capability (struct pci_bus * bus, unsigned int devfn, int cap);

Arguments
bus: the PCI bus to query
devfn: PCI device to query
cap: capability code

Description
Like pci_find_capability but works for pci devices that do not have a pci_dev structure set up yet.
Returns the address of the requested capability structure within the device's PCI configuration space or 0 in case the device does not support it.
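To make "the address of the requested capability structure within the device's PCI configuration space" concrete, here is a minimal user-space sketch that walks the standard capability list in a 256-byte copy of configuration space. It is not the kernel implementation: model_find_capability and the macro names are hypothetical, and the hop limit of 48 is simply a bound chosen for this sketch so that a malformed, looping list cannot hang the walk. The register offsets (Status at 0x06 with the capability-list bit 0x10, first-capability pointer at 0x34) follow the conventional PCI header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS          0x06  /* Status register, low byte */
#define CFG_STATUS_CAP_LIST 0x10  /* Status bit 4: capability list present */
#define CFG_CAP_PTR         0x34  /* offset of the first capability */
#define CAP_WALK_LIMIT      48    /* hop limit chosen for this sketch */

/* Walk the capability list in a 256-byte copy of configuration space.
 * Returns the offset of the capability with ID cap, or 0 if absent. */
static int model_find_capability(const uint8_t cfg[256], uint8_t cap)
{
    int hops = CAP_WALK_LIMIT;
    uint8_t pos;

    assert(cfg != NULL);
    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
        return 0;                        /* no capability list at all */

    pos = cfg[CFG_CAP_PTR] & ~3;         /* entries are dword aligned */
    while (pos >= 0x40 && hops-- > 0) {  /* list lives above the header */
        if (cfg[pos] == cap)             /* byte 0: capability ID */
            return pos;
        pos = cfg[pos + 1] & ~3;         /* byte 1: pointer to next entry */
    }
    return 0;
}
```

A return value of 0 doubles as "not supported" because offset 0 lies inside the standard header and can never hold a capability, which matches the convention the entries above describe.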
9.5.4 pci_find_next_ext_capability

pci_find_next_ext_capability - Find an extended capability

Synopsis
int pci_find_next_ext_capability (struct pci_dev * dev, int start, int cap);

Arguments
dev: PCI device to query
start: address at which to start looking (0 to start at beginning of list)
cap: capability code

Description
Returns the address of the next matching extended capability structure within the device's PCI configuration space or 0 if the device does not support it. Some capabilities can occur several times, e.g., the vendor-specific capability, and this provides a way to find them all.

9.5.5 pci_find_ext_capability

pci_find_ext_capability - Find an extended capability

Synopsis
int pci_find_ext_capability (struct pci_dev * dev, int cap);

Arguments
dev: PCI device to query
cap: capability code

Description
Returns the address of the requested extended capability structure within the device's PCI configuration space or 0 if the device does not support it. Possible values for cap:
PCI_EXT_CAP_ID_ERR: Advanced Error Reporting
PCI_EXT_CAP_ID_VC: Virtual Channel
PCI_EXT_CAP_ID_DSN: Device Serial Number
PCI_EXT_CAP_ID_PWR: Power Budgeting

9.5.6 pci_find_next_ht_capability

pci_find_next_ht_capability - query a device's Hypertransport capabilities

Synopsis
int pci_find_next_ht_capability (struct pci_dev * dev, int pos, int ht_cap);

Arguments
dev: PCI device to query
pos: Position from which to continue searching
ht_cap: Hypertransport capability code

Description
To be used in conjunction with pci_find_ht_capability to search for all capabilities matching ht_cap. pos should always be a value returned from pci_find_ht_capability.
NB. To be 100% safe against broken PCI devices, the caller should take steps to avoid an infinite loop.
9.5.7 pci_find_ht_capability

pci_find_ht_capability - query a device's Hypertransport capabilities

Synopsis
int pci_find_ht_capability (struct pci_dev * dev, int ht_cap);

Arguments
dev: PCI device to query
ht_cap: Hypertransport capability code

Description
Tell if a device supports a given Hypertransport capability. Returns an address within the device's PCI configuration space or 0 in case the device does not support the requested capability. The address points to the PCI capability, of type PCI_CAP_ID_HT, which has a Hypertransport capability matching ht_cap.

9.5.8 pci_find_parent_resource

pci_find_parent_resource - return resource region of parent bus of given region

Synopsis
struct resource * pci_find_parent_resource (const struct pci_dev * dev, struct resource * res);

Arguments
dev: PCI device structure contains resources to be searched
res: child resource record for which parent is sought

Description
For given resource region of given device, return the resource region of parent bus the given region is contained in.

9.5.9 __pci_complete_power_transition

__pci_complete_power_transition - Complete power transition of a PCI device

Synopsis
int __pci_complete_power_transition (struct pci_dev * dev, pci_power_t state);

Arguments
dev: PCI device to handle.
state: State to put the device into.

Description
This function should not be called directly by device drivers.

9.5.10 pci_set_power_state

pci_set_power_state - Set the power state of a PCI device

Synopsis
int pci_set_power_state (struct pci_dev * dev, pci_power_t state);

Arguments
dev: PCI device to handle.
state: PCI power state (D0, D1, D2, D3hot) to put the device into.

Description
Transition a device to a new power state, using the platform firmware and/or the device's PCI PM registers.

RETURN VALUE
-EINVAL if the requested state is invalid.
-EIO if device does not support PCI PM or its PM capabilities register has a wrong version, or device doesn't support the requested state.
0 if device already is in the requested state.
0 if device's power state has been successfully changed.

9.5.11 pci_choose_state

pci_choose_state - Choose the power state of a PCI device

Synopsis
pci_power_t pci_choose_state (struct pci_dev * dev, pm_message_t state);

Arguments
dev: PCI device to be suspended
state: target sleep state for the whole system. This is the value that is passed to the suspend function.

Description
Returns PCI power state suitable for given device and given system message.

9.5.12 pci_save_state

pci_save_state - save the PCI configuration space of a device before suspending

Synopsis
int pci_save_state (struct pci_dev * dev);

Arguments
dev: PCI device that we're dealing with

9.5.13 pci_restore_state

pci_restore_state - Restore the saved state of a PCI device

Synopsis
void pci_restore_state (struct pci_dev * dev);

Arguments
dev: PCI device that we're dealing with

9.5.14 pci_store_saved_state

pci_store_saved_state - Allocate and return an opaque struct containing the device saved state.

Synopsis
struct pci_saved_state * pci_store_saved_state (struct pci_dev * dev);

Arguments
dev: PCI device that we're dealing with

Description
Return NULL if no state or error.

9.5.15 pci_load_and_free_saved_state

pci_load_and_free_saved_state - Reload the save state pointed to by state, and free the memory allocated for it.

Synopsis
int pci_load_and_free_saved_state (struct pci_dev * dev, struct pci_saved_state ** state);

Arguments
dev: PCI device that we're dealing with
state: Pointer to saved state returned from pci_store_saved_state

9.5.16 pci_reenable_device

pci_reenable_device - Resume abandoned device

Synopsis
int pci_reenable_device (struct pci_dev * dev);

Arguments
dev: PCI device to be resumed

Description
Note that this function is a backend of pci_default_resume and is not supposed to be called by normal code; write a proper resume handler and use it instead.
9.5.17 pci_enable_device_io

pci_enable_device_io - Initialize a device for use with IO space

Synopsis
int pci_enable_device_io (struct pci_dev * dev);

Arguments
dev: PCI device to be initialized

Description
Initialize device before it's used by a driver. Ask low-level code to enable I/O resources. Wake up the device if it was suspended. Beware, this function can fail.

9.5.18 pci_enable_device_mem

pci_enable_device_mem - Initialize a device for use with Memory space

Synopsis
int pci_enable_device_mem (struct pci_dev * dev);

Arguments
dev: PCI device to be initialized

Description
Initialize device before it's used by a driver. Ask low-level code to enable Memory resources. Wake up the device if it was suspended. Beware, this function can fail.

9.5.19 pci_enable_device

pci_enable_device - Initialize device before it's used by a driver.

Synopsis
int pci_enable_device (struct pci_dev * dev);

Arguments
dev: PCI device to be initialized

Description
Initialize device before it's used by a driver. Ask low-level code to enable I/O and memory. Wake up the device if it was suspended. Beware, this function can fail.
Note we don't actually enable the device many times if we call this function repeatedly (we just increment the count).

9.5.20 pcim_enable_device

pcim_enable_device - Managed pci_enable_device

Synopsis
int pcim_enable_device (struct pci_dev * pdev);

Arguments
pdev: PCI device to be initialized

Description
Managed pci_enable_device.

9.5.21 pcim_pin_device

pcim_pin_device - Pin managed PCI device

Synopsis
void pcim_pin_device (struct pci_dev * pdev);

Arguments
pdev: PCI device to pin

Description
Pin managed PCI device pdev. Pinned device won't be disabled on driver detach. pdev must have been enabled with pcim_enable_device.
9.5.22 pci_disable_device

pci_disable_device - Disable PCI device after use

Synopsis
void pci_disable_device (struct pci_dev * dev);

Arguments
dev: PCI device to be disabled

Description
Signal to the system that the PCI device is not in use by the system anymore. This only involves disabling PCI bus-mastering, if active.
Note we don't actually disable the device until all callers of pci_enable_device have called pci_disable_device.

9.5.23 pci_set_pcie_reset_state

pci_set_pcie_reset_state - set reset state for device dev

Synopsis
int pci_set_pcie_reset_state (struct pci_dev * dev, enum pcie_reset_state state);

Arguments
dev: the PCIe device to reset
state: Reset state to enter into

Description
Sets the PCI reset state for the device.

9.5.24 pci_pme_capable

pci_pme_capable - check the capability of PCI device to generate PME#

Synopsis
bool pci_pme_capable (struct pci_dev * dev, pci_power_t state);

Arguments
dev: PCI device to handle.
state: PCI state from which device will issue PME#.

9.5.25 pci_pme_active

pci_pme_active - enable or disable PCI device's PME# function

Synopsis
void pci_pme_active (struct pci_dev * dev, bool enable);

Arguments
dev: PCI device to handle.
enable: true to enable PME# generation; false to disable it.

Description
The caller must verify that the device is capable of generating PME# before calling this function with enable equal to true.

9.5.26 __pci_enable_wake

__pci_enable_wake - enable PCI device as wakeup event source

Synopsis
int __pci_enable_wake (struct pci_dev * dev, pci_power_t state, bool runtime, bool enable);

Arguments
dev: PCI device affected
state: PCI state from which device will issue wakeup events
runtime: True if the events are to be generated at run time
enable: True to enable event generation; false to disable

Description
This enables the device as a wakeup event source, or disables it. When such events involve platform-specific hooks, those hooks are called automatically by this routine.
Devices with legacy power management (no standard PCI PM capabilities) always require such platform hooks.

RETURN VALUE
0 is returned on success.
-EINVAL is returned if device is not supposed to wake up the system.
Error code depending on the platform is returned if both the platform and the native mechanism fail to enable the generation of wake-up events.

9.5.27 pci_wake_from_d3

pci_wake_from_d3 - enable/disable device to wake up from D3_hot or D3_cold

Synopsis
int pci_wake_from_d3 (struct pci_dev * dev, bool enable);

Arguments
dev: PCI device to prepare
enable: True to enable wake-up event generation; false to disable

Description
Many drivers want the device to wake up the system from D3_hot or D3_cold and this function allows them to set that up cleanly - pci_enable_wake should not be called twice in a row to enable wake-up due to PCI PM vs ACPI ordering constraints.
This function only returns error code if the device is not capable of generating PME# from both D3_hot and D3_cold, and the platform is unable to enable wake-up power for it.

9.5.28 pci_prepare_to_sleep

pci_prepare_to_sleep - prepare PCI device for system-wide transition into a sleep state

Synopsis
int pci_prepare_to_sleep (struct pci_dev * dev);

Arguments
dev: Device to handle.

Description
Choose the power state appropriate for the device depending on whether it can wake up the system and/or is power manageable by the platform (PCI_D3hot is the default) and put the device into that state.

9.5.29 pci_back_from_sleep

pci_back_from_sleep - turn PCI device on during system-wide transition into working state

Synopsis
int pci_back_from_sleep (struct pci_dev * dev);

Arguments
dev: Device to handle.

Description
Disable device's system wake-up capability and put it into D0.

9.5.30 pci_dev_run_wake

pci_dev_run_wake - Check if device can generate run-time wake-up events.

Synopsis
bool pci_dev_run_wake (struct pci_dev * dev);

Arguments
dev: Device to check.
Description
Return true if the device itself is capable of generating wake-up events (through the platform or using the native PCIe PME) or if the device supports PME and one of its upstream bridges can generate wake-up events.

9.5.31 pci_release_region
pci_release_region - Release a PCI BAR
Synopsis
void pci_release_region (struct pci_dev * pdev, int bar);
Arguments
pdev - PCI device whose resources were previously reserved by pci_request_region
bar - BAR to release
Description
Releases the PCI I/O and memory resources previously reserved by a successful call to pci_request_region. Call this function only after all use of the PCI regions has ceased.

9.5.32 pci_request_region
pci_request_region - Reserve PCI I/O and memory resource
Synopsis
int pci_request_region (struct pci_dev * pdev, int bar, const char * res_name);
Arguments
pdev - PCI device whose resources are to be reserved
bar - BAR to be reserved
res_name - Name to be associated with resource
Description
Mark the PCI region associated with PCI device pdev BAR bar as being reserved by owner res_name. Do not access any address inside the PCI regions unless this call returns successfully.
Returns 0 on success, or -EBUSY on error. A warning message is also printed on failure.

9.5.33 pci_request_region_exclusive
pci_request_region_exclusive - Reserve PCI I/O and memory resource
Synopsis
int pci_request_region_exclusive (struct pci_dev * pdev, int bar, const char * res_name);
Arguments
pdev - PCI device whose resources are to be reserved
bar - BAR to be reserved
res_name - Name to be associated with resource.
Description
Mark the PCI region associated with PCI device pdev BAR bar as being reserved by owner res_name. Do not access any address inside the PCI regions unless this call returns successfully.
Returns 0 on success, or -EBUSY on error. A warning message is also printed on failure.
The key difference that _exclusive makes is that userspace is explicitly not allowed to map the resource via /dev/mem or sysfs.

9.5.34 pci_release_selected_regions
pci_release_selected_regions - Release selected PCI I/O and memory resources
Synopsis
void pci_release_selected_regions (struct pci_dev * pdev, int bars);
Arguments
pdev - PCI device whose resources were previously reserved
bars - Bitmask of BARs to be released
Description
Release selected PCI I/O and memory resources previously reserved. Call this function only after all use of the PCI regions has ceased.

9.5.35 pci_request_selected_regions
pci_request_selected_regions - Reserve selected PCI I/O and memory resources
Synopsis
int pci_request_selected_regions (struct pci_dev * pdev, int bars, const char * res_name);
Arguments
pdev - PCI device whose resources are to be reserved
bars - Bitmask of BARs to be requested
res_name - Name to be associated with resource

9.5.36 pci_release_regions
pci_release_regions - Release reserved PCI I/O and memory resources
Synopsis
void pci_release_regions (struct pci_dev * pdev);
Arguments
pdev - PCI device whose resources were previously reserved by pci_request_regions
Description
Releases all PCI I/O and memory resources previously reserved by a successful call to pci_request_regions. Call this function only after all use of the PCI regions has ceased.

9.5.37 pci_request_regions
pci_request_regions - Reserve PCI I/O and memory resources
Synopsis
int pci_request_regions (struct pci_dev * pdev, const char * res_name);
Arguments
pdev - PCI device whose resources are to be reserved
res_name - Name to be associated with resource.
Description
Mark all PCI regions associated with PCI device pdev as being reserved by owner res_name. Do not access any address inside the PCI regions unless this call returns successfully.
Returns 0 on success, or -EBUSY on error. A warning message is also printed on failure.
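
The request/release contract for a bitmask of BARs can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel's implementation: struct mock_pdev, MOCK_NUM_BARS and both mock_* functions are hypothetical stand-ins that only mimic the documented behaviour (0 on success, -EBUSY when a selected BAR is already owned).

```c
#include <errno.h>
#include <stddef.h>

#define MOCK_NUM_BARS 6   /* assumption: only the six standard BARs are modelled */

/* Hypothetical stand-in for per-device resource ownership. */
struct mock_pdev {
    const char *owner[MOCK_NUM_BARS];   /* NULL means the BAR is free */
};

/* Reserve every BAR whose bit is set in bars, or fail without side effects. */
static int mock_request_selected_regions(struct mock_pdev *pdev, int bars,
                                         const char *res_name)
{
    int i;

    /* First pass: refuse the whole request if any selected BAR is taken. */
    for (i = 0; i < MOCK_NUM_BARS; i++)
        if ((bars & (1 << i)) && pdev->owner[i])
            return -EBUSY;

    /* Second pass: record the new owner for each selected BAR. */
    for (i = 0; i < MOCK_NUM_BARS; i++)
        if (bars & (1 << i))
            pdev->owner[i] = res_name;
    return 0;
}

/* Release every BAR whose bit is set in bars. */
static void mock_release_selected_regions(struct mock_pdev *pdev, int bars)
{
    int i;

    for (i = 0; i < MOCK_NUM_BARS; i++)
        if (bars & (1 << i))
            pdev->owner[i] = NULL;
}
```

A caller would build the mask as (1 << 0) | (1 << 2) to select BAR 0 and BAR 2, touch the regions only after the request returns 0, and pass the same mask to the release function once all use has ceased.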
9.5.38 pci_request_regions_exclusive
pci_request_regions_exclusive - Reserve PCI I/O and memory resources
Synopsis
int pci_request_regions_exclusive (struct pci_dev * pdev, const char * res_name);
Arguments
pdev - PCI device whose resources are to be reserved
res_name - Name to be associated with resource.
Description
Mark all PCI regions associated with PCI device pdev as being reserved by owner res_name. Do not access any address inside the PCI regions unless this call returns successfully.
pci_request_regions_exclusive will mark the region so that /dev/mem and the sysfs MMIO access will not be allowed.
Returns 0 on success, or -EBUSY on error. A warning message is also printed on failure.

9.5.39 pci_set_master
pci_set_master - enables bus-mastering for device dev
Synopsis
void pci_set_master (struct pci_dev * dev);
Arguments
dev - the PCI device to enable
Description
Enables bus-mastering on the device and calls pcibios_set_master to do the needed arch-specific settings.

9.5.40 pci_clear_master
pci_clear_master - disables bus-mastering for device dev
Synopsis
void pci_clear_master (struct pci_dev * dev);
Arguments
dev - the PCI device to disable

9.5.41 pci_set_cacheline_size
pci_set_cacheline_size - ensure the CACHE_LINE_SIZE register is programmed
Synopsis
int pci_set_cacheline_size (struct pci_dev * dev);
Arguments
dev - the PCI device for which MWI is to be enabled
Description
Helper function for pci_set_mwi. Originally copied from drivers/net/acenic.c. Copyright 1998-2001 by Jes Sorensen, <jes@trained-monkey.org>.
RETURNS
An appropriate -ERRNO error value on error, or zero for success.

9.5.42 pci_set_mwi
pci_set_mwi - enables memory-write-invalidate PCI transaction
Synopsis
int pci_set_mwi (struct pci_dev * dev);
Arguments
dev - the PCI device for which MWI is enabled
Description
Enables the Memory-Write-Invalidate transaction in PCI_COMMAND.
RETURNS
An appropriate -ERRNO error value on error, or zero for success.
9.5.43 pci_try_set_mwi
pci_try_set_mwi - enables memory-write-invalidate PCI transaction
Synopsis
int pci_try_set_mwi (struct pci_dev * dev);
Arguments
dev - the PCI device for which MWI is enabled
Description
Enables the Memory-Write-Invalidate transaction in PCI_COMMAND. Callers are not required to check the return value.
RETURNS
An appropriate -ERRNO error value on error, or zero for success.

9.5.44 pci_clear_mwi
pci_clear_mwi - disables Memory-Write-Invalidate for device dev
Synopsis
void pci_clear_mwi (struct pci_dev * dev);
Arguments
dev - the PCI device to disable
Description
Disables PCI Memory-Write-Invalidate transaction on the device.

9.5.45 pci_intx
pci_intx - enables/disables PCI INTx for device dev
Synopsis
void pci_intx (struct pci_dev * pdev, int enable);
Arguments
pdev - the PCI device to operate on
enable - boolean: whether to enable or disable PCI INTx
Description
Enables/disables PCI INTx for device dev.

9.5.46 pci_intx_mask_supported
pci_intx_mask_supported - probe for INTx masking support
Synopsis
bool pci_intx_mask_supported (struct pci_dev * dev);
Arguments
dev - the PCI device to operate on
Description
Check if the device dev supports INTx masking via the config space command word.

9.5.47 pci_check_and_mask_intx
pci_check_and_mask_intx - mask INTx on pending interrupt
Synopsis
bool pci_check_and_mask_intx (struct pci_dev * dev);
Arguments
dev - the PCI device to operate on
Description
Check if the device dev has its INTx line asserted, mask it and return true in that case. False is returned if no interrupt was pending.

9.5.48 pci_check_and_unmask_intx
pci_check_and_unmask_intx - unmask INTx if no interrupt is pending
Synopsis
bool pci_check_and_unmask_intx (struct pci_dev * dev);
Arguments
dev - the PCI device to operate on
Description
Check if the device dev has its INTx line asserted, unmask it if not and return true. False is returned and the mask remains active if there was still an interrupt pending.
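
The check-and-mask semantics described above can be modelled on a pair of plain register values. This is a hedged userspace sketch: struct mock_cfg and the mock_* functions are hypothetical, and the real routines read and write config space atomically under a lock, which is not reproduced here. The two bit definitions follow the PCI command register (INTx Disable, bit 10) and status register (Interrupt Status, bit 3).

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400  /* command register, bit 10 */
#define PCI_STATUS_INTERRUPT     0x08   /* status register, bit 3 */

/* Hypothetical snapshot of the two config-space registers involved. */
struct mock_cfg {
    uint16_t command;
    uint16_t status;
};

/* Mask INTx only if an interrupt is pending; report whether one was. */
static bool mock_check_and_mask_intx(struct mock_cfg *cfg)
{
    if (!(cfg->status & PCI_STATUS_INTERRUPT))
        return false;                       /* nothing pending, leave as is */
    cfg->command |= PCI_COMMAND_INTX_DISABLE;
    return true;
}

/* Unmask INTx only if no interrupt is pending; report whether we unmasked. */
static bool mock_check_and_unmask_intx(struct mock_cfg *cfg)
{
    if (cfg->status & PCI_STATUS_INTERRUPT)
        return false;                       /* still pending, keep the mask */
    cfg->command &= (uint16_t)~PCI_COMMAND_INTX_DISABLE;
    return true;
}
```

In this model the pending bit is cleared by the test harness; on real hardware it clears once the device's interrupt condition has been serviced.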
9.5.49 pci_msi_off
pci_msi_off - disables any MSI or MSI-X capabilities
Synopsis
void pci_msi_off (struct pci_dev * dev);
Arguments
dev - the PCI device to operate on
Description
If you want to use MSI, see pci_enable_msi and friends. This is a lower-level primitive that allows us to disable MSI operation at the device level.

9.5.50 pci_wait_for_pending_transaction
pci_wait_for_pending_transaction - waits for pending transaction
Synopsis
int pci_wait_for_pending_transaction (struct pci_dev * dev);
Arguments
dev - the PCI device to operate on
Description
Return 0 if transaction is pending, 1 otherwise.

9.5.51 pci_reset_bridge_secondary_bus
pci_reset_bridge_secondary_bus - Reset the secondary bus on a PCI bridge.
Synopsis
void pci_reset_bridge_secondary_bus (struct pci_dev * dev);
Arguments
dev - Bridge device
Description
Use the bridge control register to assert reset on the secondary bus. Devices on the secondary bus are left in power-on state.

9.5.52 __pci_reset_function
__pci_reset_function - reset a PCI device function
Synopsis
int __pci_reset_function (struct pci_dev * dev);
Arguments
dev - PCI device to reset
Description
Some devices allow an individual function to be reset without affecting other functions in the same device. The PCI device must be responsive to PCI config space in order to use this function.
The device function is presumed to be unused when this function is called. Resetting the device will make the contents of PCI configuration space random, so any caller of this must be prepared to reinitialise the device including MSI, bus mastering, BARs, decoding IO and memory spaces, etc.
Returns 0 if the device function was successfully reset or negative if the device doesn't support resetting a single function.

9.5.53 __pci_reset_function_locked
__pci_reset_function_locked - reset a PCI device function while holding the dev mutex lock.
Synopsis
int __pci_reset_function_locked (struct pci_dev * dev);
Arguments
dev - PCI device to reset
Description
Some devices allow an individual function to be reset without affecting other functions in the same device. The PCI device must be responsive to PCI config space in order to use this function.
The device function is presumed to be unused and the caller is holding the device mutex lock when this function is called. Resetting the device will make the contents of PCI configuration space random, so any caller of this must be prepared to reinitialise the device including MSI, bus mastering, BARs, decoding IO and memory spaces, etc.
Returns 0 if the device function was successfully reset or negative if the device doesn't support resetting a single function.

9.5.54 pci_reset_function
pci_reset_function - quiesce and reset a PCI device function
Synopsis
int pci_reset_function (struct pci_dev * dev);
Arguments
dev - PCI device to reset
Description
Some devices allow an individual function to be reset without affecting other functions in the same device. The PCI device must be responsive to PCI config space in order to use this function.
This function does not just reset the PCI portion of a device, but clears all the state associated with the device. This function differs from __pci_reset_function in that it saves and restores device state over the reset.
Returns 0 if the device function was successfully reset or negative if the device doesn't support resetting a single function.

9.5.55 pci_try_reset_function
pci_try_reset_function - quiesce and reset a PCI device function
Synopsis
int pci_try_reset_function (struct pci_dev * dev);
Arguments
dev - PCI device to reset
Description
Same as pci_reset_function, except return -EAGAIN if unable to lock the device.
9.5.56 pci_probe_reset_slot
pci_probe_reset_slot - probe whether a PCI slot can be reset
Synopsis
int pci_probe_reset_slot (struct pci_slot * slot);
Arguments
slot - PCI slot to probe
Description
Return 0 if slot can be reset, negative if a slot reset is not supported.

9.5.57 pci_reset_slot
pci_reset_slot - reset a PCI slot
Synopsis
int pci_reset_slot (struct pci_slot * slot);
Arguments
slot - PCI slot to reset
Description
A PCI bus may host multiple slots, and each slot may support a reset mechanism independent of other slots. For instance, some slots may support slot power control. In the case of a 1:1 bus to slot architecture, this function may wrap the bus reset to avoid spurious slot related events such as hotplug. Generally a slot reset should be attempted before a bus reset. All of the functions of the slot and any subordinate buses behind the slot are reset through this function. PCI config space of all devices in the slot and behind the slot is saved before and restored after reset.
Return 0 on success, non-zero on error.

9.5.58 pci_try_reset_slot
pci_try_reset_slot - Try to reset a PCI slot
Synopsis
int pci_try_reset_slot (struct pci_slot * slot);
Arguments
slot - PCI slot to reset
Description
Same as pci_reset_slot, except return -EAGAIN if the slot cannot be locked.

9.5.59 pci_probe_reset_bus
pci_probe_reset_bus - probe whether a PCI bus can be reset
Synopsis
int pci_probe_reset_bus (struct pci_bus * bus);
Arguments
bus - PCI bus to probe
Description
Return 0 if bus can be reset, negative if a bus reset is not supported.

9.5.60 pci_reset_bus
pci_reset_bus - reset a PCI bus
Synopsis
int pci_reset_bus (struct pci_bus * bus);
Arguments
bus - top level PCI bus to reset
Description
Do a bus reset on the given bus and any subordinate buses, saving and restoring state of all devices.
Return 0 on success, non-zero on error.
9.5.61 pci_try_reset_bus
pci_try_reset_bus - Try to reset a PCI bus
Synopsis
int pci_try_reset_bus (struct pci_bus * bus);
Arguments
bus - top level PCI bus to reset
Description
Same as pci_reset_bus, except return -EAGAIN if the bus cannot be locked.

9.5.62 pcix_get_max_mmrbc
pcix_get_max_mmrbc - get PCI-X maximum designed memory read byte count
Synopsis
int pcix_get_max_mmrbc (struct pci_dev * dev);
Arguments
dev - PCI device to query
Returns
mmrbc: maximum designed memory read count in bytes, or an appropriate error value.

9.5.63 pcix_get_mmrbc
pcix_get_mmrbc - get PCI-X maximum memory read byte count
Synopsis
int pcix_get_mmrbc (struct pci_dev * dev);
Arguments
dev - PCI device to query
Returns
mmrbc: maximum memory read count in bytes, or an appropriate error value.

9.5.64 pcix_set_mmrbc
pcix_set_mmrbc - set PCI-X maximum memory read byte count
Synopsis
int pcix_set_mmrbc (struct pci_dev * dev, int mmrbc);
Arguments
dev - PCI device to query
mmrbc - maximum memory read count in bytes; valid values are 512, 1024, 2048, 4096
Description
If possible, sets the maximum memory read byte count; some bridges have errata that prevent this.

9.5.65 pcie_get_readrq
pcie_get_readrq - get PCI Express read request size
Synopsis
int pcie_get_readrq (struct pci_dev * dev);
Arguments
dev - PCI device to query
Description
Returns maximum memory read request in bytes or appropriate error value.
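
The four valid mmrbc values listed for pcix_set_mmrbc are consecutive powers of two, which the PCI-X command register stores as a 2-bit field (0 for 512 bytes up to 3 for 4096 bytes). The sketch below shows one way a caller could validate a byte count and derive that field value. mock_mmrbc_to_field is a hypothetical helper written for this illustration, and returning -EINVAL for an invalid count is an assumption rather than something this section states.

```c
#include <errno.h>

/*
 * Map a PCI-X maximum memory read byte count to its 2-bit field value.
 * Returns 0..3 for 512, 1024, 2048, 4096 and -EINVAL for anything else.
 */
static int mock_mmrbc_to_field(int mmrbc)
{
    int field = 0;
    int size;

    /* Walk the four legal sizes; each step doubles the byte count. */
    for (size = 512; size <= 4096; size <<= 1, field++)
        if (size == mmrbc)
            return field;

    return -EINVAL;   /* not one of the documented valid values */
}
```

For example, mock_mmrbc_to_field(2048) yields 2, while 768 or 8192 is rejected.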
9.5.66 pcie_set_readrq
pcie_set_readrq - set PCI Express maximum memory read request
Synopsis
int pcie_set_readrq (struct pci_dev * dev, int rq);
Arguments
dev - PCI device to query
rq - maximum memory read count in bytes; valid values are 128, 256, 512, 1024, 2048, 4096
Description
If possible, sets the maximum memory read request in bytes.

9.5.67 pcie_get_mps
pcie_get_mps - get PCI Express maximum payload size
Synopsis
int pcie_get_mps (struct pci_dev * dev);
Arguments
dev - PCI device to query
Description
Returns maximum payload size in bytes.

9.5.68 pcie_set_mps
pcie_set_mps - set PCI Express maximum payload size
Synopsis
int pcie_set_mps (struct pci_dev * dev, int mps);
Arguments
dev - PCI device to query
mps - maximum payload size in bytes; valid values are 128, 256, 512, 1024, 2048, 4096
Description
If possible, sets the maximum payload size.

9.5.69 pcie_get_minimum_link
pcie_get_minimum_link - determine minimum link settings of a PCI device
Synopsis
int pcie_get_minimum_link (struct pci_dev * dev, enum pci_bus_speed * speed, enum pcie_link_width * width);
Arguments
dev - PCI device to query
speed - storage for minimum speed
width - storage for minimum width
Description
This function will walk up the PCI device chain and determine the minimum link width and speed of the device.

9.5.70 pci_select_bars
pci_select_bars - Make BAR mask from the type of resource
Synopsis
int pci_select_bars (struct pci_dev * dev, unsigned long flags);
Arguments
dev - the PCI device for which BAR mask is made
flags - resource type mask to be selected
Description
This helper routine makes a BAR mask from the type of resource.
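
The BAR mask that pci_select_bars produces is a plain bitmask with bit i set when resource i has any of the requested type flags. The following userspace sketch shows the idea. struct mock_dev, MOCK_NUM_BARS and mock_select_bars are hypothetical names, and only six BARs are modelled here, whereas the kernel iterates over all of a device's resources. The two flag values match the kernel's IORESOURCE_IO and IORESOURCE_MEM definitions.

```c
#define MOCK_NUM_BARS  6              /* assumption: standard BARs 0..5 only */
#define IORESOURCE_IO  0x00000100UL   /* I/O port resource */
#define IORESOURCE_MEM 0x00000200UL   /* memory-mapped resource */

/* Hypothetical device model: one flags word per BAR, 0 if unused. */
struct mock_dev {
    unsigned long bar_flags[MOCK_NUM_BARS];
};

/* Build a bitmask of the BARs whose type matches any bit in flags. */
static int mock_select_bars(const struct mock_dev *dev, unsigned long flags)
{
    int i, bars = 0;

    for (i = 0; i < MOCK_NUM_BARS; i++)
        if (dev->bar_flags[i] & flags)
            bars |= 1 << i;

    return bars;
}
```

The resulting mask has the same shape as the bars argument of pci_request_selected_regions and pci_release_selected_regions, so a driver that only needs its memory BARs can select them by type and reserve exactly those.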
9.5.71 pci_add_dynid
pci_add_dynid - add a new PCI device ID to this driver and re-probe devices
Synopsis
int pci_add_dynid (struct pci_driver * drv, unsigned int vendor, unsigned int device, unsigned int subvendor, unsigned int subdevice, unsigned int class, unsigned int class_mask, unsigned long driver_data);
Arguments
drv - target pci driver
vendor - PCI vendor ID
device - PCI device ID
subvendor - PCI subvendor ID
subdevice - PCI subdevice ID
class - PCI class
class_mask - PCI class mask
driver_data - private driver data
Description
Adds a new dynamic pci device ID to this driver and causes the driver to probe for all devices again. drv must have been registered prior to calling this function.
CONTEXT
Does GFP_KERNEL allocation.
RETURNS
0 on success, -errno on failure.

9.5.72 pci_match_id
pci_match_id - See if a pci device matches a given pci_id table
Synopsis
const struct pci_device_id * pci_match_id (const struct pci_device_id * ids, struct pci_dev * dev);
Arguments
ids - array of PCI device id structures to search in
dev - the PCI device structure to match against.
Description
Used by a driver to check whether a PCI device present in the system is in its list of supported devices. Returns the matching pci_device_id structure or NULL if there is no match.
Deprecated, don't use this as it will not catch any dynamic ids that a driver might want to check for.

9.5.73 __pci_register_driver
__pci_register_driver - register a new pci driver
Synopsis
int __pci_register_driver (struct pci_driver * drv, struct module * owner, const char * mod_name);
Arguments
drv - the driver structure to register
owner - owner module of drv
mod_name - module name string
Description
Adds the driver structure to the list of registered drivers. Returns a negative value on error, otherwise 0. If no error occurred, the driver remains registered even if no device was claimed during registration.
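
Table matching of the kind pci_match_id performs can be sketched in userspace as a walk over a zero-terminated array, where a wildcard ID matches anything and the class is compared only under class_mask. This is a simplified illustration, not the kernel's code: struct mock_id, struct mock_dev_ids, MOCK_ANY_ID and mock_match_id are hypothetical, and the subvendor and subdevice fields are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_ANY_ID 0xffffffffu   /* wildcard, in the spirit of PCI_ANY_ID */

/* Simplified ID table entry; an all-zero entry terminates the table. */
struct mock_id {
    uint32_t vendor, device;
    uint32_t class_code, class_mask;
};

/* The identifying fields of one device. */
struct mock_dev_ids {
    uint32_t vendor, device, class_code;
};

/* Return the first table entry matching dev, or NULL if none matches. */
static const struct mock_id *mock_match_id(const struct mock_id *ids,
                                           const struct mock_dev_ids *dev)
{
    for (; ids->vendor || ids->class_mask; ids++) {
        if ((ids->vendor == MOCK_ANY_ID || ids->vendor == dev->vendor) &&
            (ids->device == MOCK_ANY_ID || ids->device == dev->device) &&
            !((ids->class_code ^ dev->class_code) & ids->class_mask))
            return ids;
    }
    return NULL;
}
```

An entry with class_mask 0 ignores the class entirely, while an entry with wildcard vendor and device and a non-zero class_mask matches every device of a given class.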
9.5.74 pci_unregister_driver
pci_unregister_driver - unregister a pci driver
Synopsis
void pci_unregister_driver (struct pci_driver * drv);
Arguments
drv - the driver structure to unregister
Description
Deletes the driver structure from the list of registered PCI drivers, gives it a chance to clean up by calling its remove function for each device it was responsible for, and marks those devices as driverless.

9.5.75 pci_dev_driver
pci_dev_driver - get the pci_driver of a device
Synopsis
struct pci_driver * pci_dev_driver (const struct pci_dev * dev);
Arguments
dev - the device to query
Description
Returns the appropriate pci_driver structure or NULL if there is no registered driver for the device.

9.5.76 pci_dev_get
pci_dev_get - increments the reference count of the pci device structure
Synopsis
struct pci_dev * pci_dev_get (struct pci_dev * dev);
Arguments
dev - the device being referenced
Description
Each live reference to a device should be refcounted.
Drivers for PCI devices should normally record such references in their probe methods, when they bind to a device, and release them by calling pci_dev_put, in their disconnect methods.
A pointer to the device with the incremented reference counter is returned.

9.5.77 pci_dev_put
pci_dev_put - release a use of the pci device structure
Synopsis
void pci_dev_put (struct pci_dev * dev);
Arguments
dev - device that's been disconnected
Description
Must be called when a user of a device is finished with it. When the last user of the device calls this function, the memory of the device is freed.

9.5.78 pci_stop_and_remove_bus_device
pci_stop_and_remove_bus_device - remove a PCI device and any children
Synopsis
void pci_stop_and_remove_bus_device (struct pci_dev * dev);
Arguments
dev - the device to remove
Description
Remove a PCI device from the device lists, informing the drivers that the device has been removed.
We also remove any subordinate buses and children in a depth-first manner.
For each device we remove, delete the device structure from the device lists, remove the /proc entry, and notify userspace (/sbin/hotplug).

9.5.79 pci_find_bus
pci_find_bus - locate PCI bus from a given domain and bus number
Synopsis
struct pci_bus * pci_find_bus (int domain, int busnr);
Arguments
domain - number of PCI domain to search
busnr - number of desired PCI bus
Description
Given a PCI bus number and domain number, the desired PCI bus is located in the global list of PCI buses. If the bus is found, a pointer to its data structure is returned. If no bus is found, NULL is returned.

9.5.80 pci_find_next_bus
pci_find_next_bus - begin or continue searching for a PCI bus
Synopsis
struct pci_bus * pci_find_next_bus (const struct pci_bus * from);
Arguments
from - Previous PCI bus found, or NULL for new search.
Description
Iterates through the list of known PCI buses. A new search is initiated by passing NULL as the from argument. Otherwise, if from is not NULL, the search continues from the next bus on the global list.

9.5.81 pci_get_slot
pci_get_slot - locate PCI device for a given PCI slot
Synopsis
struct pci_dev * pci_get_slot (struct pci_bus * bus, unsigned int devfn);
Arguments
bus - PCI bus on which desired PCI device resides
devfn - encodes number of PCI slot in which the desired PCI device resides and the logical device number within that slot in case of multi-function devices.
Description
Given a PCI bus and slot/function number, the desired PCI device is located in the list of PCI devices. If the device is found, its reference count is increased and this function returns a pointer to its data structure. The caller must decrement the reference count by calling pci_dev_put. If no device is found, NULL is returned.
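
The devfn argument packs a 5-bit slot (device) number and a 3-bit function number into one byte. The kernel headers provide the PCI_DEVFN, PCI_SLOT and PCI_FUNC macros for this; the sketch below defines local copies under hypothetical MOCK_ names so that it compiles on its own in userspace.

```c
/* Local copies of the usual devfn helpers (hypothetical MOCK_ names). */

/* Pack slot (0..31) and function (0..7) into a single devfn byte. */
#define MOCK_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Extract the slot number from a devfn value. */
#define MOCK_SLOT(devfn)       (((devfn) >> 3) & 0x1f)

/* Extract the function number from a devfn value. */
#define MOCK_FUNC(devfn)       ((devfn) & 0x07)
```

With these definitions, slot 3 function 1 encodes as 0x19, which is the value a caller would pass as devfn to look up that function on a given bus.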
9.5.82 pci_get_domain_bus_and_slot
pci_get_domain_bus_and_slot - locate PCI device for a given PCI domain (segment), bus, and slot
Synopsis
struct pci_dev * pci_get_domain_bus_and_slot (int domain, unsigned int bus, unsigned int devfn);
Arguments
domain - PCI domain/segment on which the PCI device resides.
bus - PCI bus on which desired PCI device resides
devfn - encodes number of PCI slot in which the desired PCI device resides and the logical device number within that slot in case of multi-function devices.
Description
Given a PCI domain, bus, and slot/function number, the desired PCI device is located in the list of PCI devices. If the device is found, its reference count is increased and this function returns a pointer to its data structure. The caller must decrement the reference count by calling pci_dev_put. If no device is found, NULL is returned.

9.5.83 pci_get_subsys
pci_get_subsys - begin or continue searching for a PCI device by vendor/subvendor/device/subdevice id
Synopsis
struct pci_dev * pci_get_subsys (unsigned int vendor, unsigned int device, unsigned int ss_vendor, unsigned int ss_device, struct pci_dev * from);
Arguments
vendor - PCI vendor id to match, or PCI_ANY_ID to match all vendor ids
device - PCI device id to match, or PCI_ANY_ID to match all device ids
ss_vendor - PCI subsystem vendor id to match, or PCI_ANY_ID to match all vendor ids
ss_device - PCI subsystem device id to match, or PCI_ANY_ID to match all device ids
from - Previous PCI device found in search, or NULL for new search.
Description
Iterates through the list of known PCI devices. If a PCI device is found with a matching vendor, device, ss_vendor and ss_device, a pointer to its device structure is returned, and the reference count to the device is incremented. Otherwise, NULL is returned. A new search is initiated by passing NULL as the from argument. Otherwise, if from is not NULL, the search continues from the next device on the global list.
The reference count for from is always decremented if it is not NULL.

9.5.84 pci_get_device
pci_get_device - begin or continue searching for a PCI device by vendor/device id
Synopsis
struct pci_dev * pci_get_device (unsigned int vendor, unsigned int device, struct pci_dev * from);
Arguments
vendor - PCI vendor id to match, or PCI_ANY_ID to match all vendor ids
device - PCI device id to match, or PCI_ANY_ID to match all device ids
from - Previous PCI device found in search, or NULL for new search.
Description
Iterates through the list of known PCI devices. If a PCI device is found with a matching vendor and device, the reference count to the device is incremented and a pointer to its device structure is returned. Otherwise, NULL is returned. A new search is initiated by passing NULL as the from argument. Otherwise, if from is not NULL, the search continues from the next device on the global list. The reference count for from is always decremented if it is not NULL.

9.5.85 pci_get_class
pci_get_class - begin or continue searching for a PCI device by class
Synopsis
struct pci_dev * pci_get_class (unsigned int class, struct pci_dev * from);
Arguments
class - search for a PCI device with this class designation
from - Previous PCI device found in search, or NULL for new search.
Description
Iterates through the list of known PCI devices. If a PCI device is found with a matching class, the reference count to the device is incremented and a pointer to its device structure is returned. Otherwise, NULL is returned. A new search is initiated by passing NULL as the from argument. Otherwise, if from is not NULL, the search continues from the next device on the global list. The reference count for from is always decremented if it is not NULL.

9.5.86 pci_dev_present
pci_dev_present - Returns 1 if device matching the device list is present, 0 if not.
Synopsis
int pci_dev_present (const struct pci_device_id * ids);
Arguments
ids - A pointer to a null terminated list of struct pci_device_id structures that describe the type of PCI device the caller is trying to find.
Obvious fact
You do not have a reference to any device that might be found by this function, so if that device is removed from the system right after this function is finished, the value will be stale. Use this function to find devices that are usually built into a system, or for a general hint as to whether another device happens to be present at this specific moment in time.

9.5.87 pci_msi_vec_count
pci_msi_vec_count - Return the number of MSI vectors a device can send
Synopsis
int pci_msi_vec_count (struct pci_dev * dev);
Arguments
dev - device to report about
Description
This function returns the number of MSI vectors a device requested via the Multiple Message Capable register. It returns a negative errno if the device is not capable of sending MSI interrupts. Otherwise, the call succeeds and returns a power of two, up to a maximum of 2^5 (32), according to the MSI specification.

9.5.88 pci_msix_vec_count
pci_msix_vec_count - return the number of device's MSI-X table entries
Synopsis
int pci_msix_vec_count (struct pci_dev * dev);
Arguments
dev - pointer to the pci_dev data structure of MSI-X device function
Description
This function returns the number of device's MSI-X table entries and therefore the number of MSI-X vectors the device is capable of sending. It returns a negative errno if the device is not capable of sending MSI-X interrupts.
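
The power-of-two result described for pci_msi_vec_count follows from how the Multiple Message Capable field is encoded: it is a 3-bit field in bits 3:1 of the MSI Message Control register holding log2 of the vector count. The sketch below decodes a raw Message Control value in userspace. mock_msi_vec_count is a hypothetical helper, and it does not reproduce the capability lookup or the error return for devices without MSI.

```c
#include <stdint.h>

#define PCI_MSI_FLAGS_QMASK 0x000e   /* Multiple Message Capable, bits 3:1 */

/*
 * Decode the number of MSI vectors a device requests from its
 * Message Control register: the field holds log2 of the count.
 */
static int mock_msi_vec_count(uint16_t msgctl)
{
    return 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
}
```

A field value of 5 therefore gives the documented maximum of 32 vectors; the other bits of the register do not affect the result.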
9.5.89 pci_enable_msix
pci_enable_msix - configure device's MSI-X capability structure
Synopsis
int pci_enable_msix (struct pci_dev * dev, struct msix_entry * entries, int nvec);
Arguments
dev - pointer to the pci_dev data structure of MSI-X device function
entries - pointer to an array of MSI-X entries
nvec - number of MSI-X irqs requested for allocation by device driver
Description
Set up the MSI-X capability structure of the device function with the number of requested irqs when its driver requests that MSI-X mode be enabled on the hardware device function. A return of zero indicates the successful configuration of the MSI-X capability structure with newly allocated MSI-X irqs. A return of < 0 indicates a failure. A return of > 0 indicates that the driver's request exceeds the number of irqs or MSI-X vectors available. The driver should use the returned value to re-send its request.

9.5.90 pci_msi_enabled
pci_msi_enabled - is MSI enabled?
Synopsis
int pci_msi_enabled ( void);
Arguments
void - no arguments
Description
Returns true if MSI has not been disabled by the command-line option pci=nomsi.

9.5.91 pci_enable_msi_range
pci_enable_msi_range - configure device's MSI capability structure
Synopsis
int pci_enable_msi_range (struct pci_dev * dev, int minvec, int maxvec);
Arguments
dev - device to configure
minvec - minimal number of interrupts to configure
maxvec - maximum number of interrupts to configure
Description
This function tries to allocate a maximum possible number of interrupts in a range between minvec and maxvec. It returns a negative errno if an error occurs. If it succeeds, it returns the actual number of interrupts allocated and updates the dev's irq member to the lowest new interrupt number; the other interrupt numbers allocated to this device are consecutive.
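
The three-way return convention of pci_enable_msix (zero, negative, or a positive count of available vectors) leads to a retry loop in the caller. The userspace sketch below models that loop. struct mock_msix_dev and both mock_* functions are hypothetical, the entries array is omitted, and returning -ENOSPC when fewer than minvec vectors are available is an assumption of this sketch.

```c
#include <errno.h>

/* Hypothetical device model: how many vectors exist, how many are enabled. */
struct mock_msix_dev {
    int available;
    int enabled;
};

/* Mimics the documented return convention: 0, < 0, or > 0 (retry count). */
static int mock_enable_msix(struct mock_msix_dev *dev, int nvec)
{
    if (nvec <= 0 || dev->available <= 0)
        return -EINVAL;             /* bad request or no MSI-X support */
    if (nvec > dev->available)
        return dev->available;      /* too many: report what is possible */
    dev->enabled = nvec;
    return 0;
}

/* Ask for nvec vectors, shrinking the request until it fits or drops below minvec. */
static int mock_enable_msix_retry(struct mock_msix_dev *dev, int minvec, int nvec)
{
    while (nvec >= minvec) {
        int rc = mock_enable_msix(dev, nvec);

        if (rc == 0)
            return nvec;            /* success: this many vectors enabled */
        if (rc < 0)
            return rc;              /* hard failure */
        nvec = rc;                  /* re-send the request with the hint */
    }
    return -ENOSPC;                 /* cannot satisfy the minimum */
}
```

The range variants documented in this section, pci_enable_msi_range and pci_enable_msix_range, take minvec and maxvec directly and return the number of interrupts actually allocated, so a driver using them does not need to write this loop itself.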
9.5.92 pci_enable_msix_range
pci_enable_msix_range - configure device's MSI-X capability structure
Synopsis
int pci_enable_msix_range (struct pci_dev * dev, struct msix_entry * entries, int minvec, int maxvec);
Arguments
dev - pointer to the pci_dev data structure of MSI-X device function
entries - pointer to an array of MSI-X entries
minvec - minimum number of MSI-X irqs requested
maxvec - maximum number of MSI-X irqs requested
Description
Set up the MSI-X capability structure of the device function with a maximum possible number of interrupts in the range between minvec and maxvec when its driver requests that MSI-X mode be enabled on the hardware device function. It returns a negative errno if an error occurs. If it succeeds, it returns the actual number of interrupts allocated and indicates the successful configuration of the MSI-X capability structure with newly allocated MSI-X interrupts.

9.5.93 pci_bus_alloc_resource
pci_bus_alloc_resource - allocate a resource from a parent bus
Synopsis
int pci_bus_alloc_resource (struct pci_bus * bus, struct resource * res, resource_size_t size, resource_size_t align, resource_size_t min, unsigned long type_mask, resource_size_t (*alignf) (void *, const struct resource *, resource_size_t, resource_size_t), void * alignf_data);
Arguments
bus - PCI bus
res - resource to allocate
size - size of resource to allocate
align - alignment of resource to allocate
min - minimum /proc/iomem address to allocate
type_mask - IORESOURCE_* type flags
alignf - resource alignment function
alignf_data - data argument for resource alignment function
Description
Given the PCI bus a device resides on, the size, minimum address, alignment and type, try to find an acceptable resource allocation for a specific device resource.
9.5.94 pci_bus_add_device
pci_bus_add_device - start driver for a single device
Synopsis
void pci_bus_add_device (struct pci_dev * dev);
Arguments
dev - device to add
Description
This adds sysfs entries and starts device drivers.

9.5.95 pci_bus_add_devices
pci_bus_add_devices - start driver for PCI devices
Synopsis
void pci_bus_add_devices (const struct pci_bus * bus);
Arguments
bus - bus to check for new devices
Description
Start driver for PCI devices and add some sysfs entries.

9.5.96 pci_bus_set_ops
pci_bus_set_ops - Set raw operations of pci bus
Synopsis
struct pci_ops * pci_bus_set_ops (struct pci_bus * bus, struct pci_ops * ops);
Arguments
bus - pci bus struct
ops - new raw operations
Description
Return previous raw operations.

9.5.97 pci_read_vpd
pci_read_vpd - Read one entry from Vital Product Data
Synopsis
ssize_t pci_read_vpd (struct pci_dev * dev, loff_t pos, size_t count, void * buf);
Arguments
dev - pci device struct
pos - offset in vpd space
count - number of bytes to read
buf - pointer to where to store result

9.5.98 pci_write_vpd
pci_write_vpd - Write entry to Vital Product Data
Synopsis
ssize_t pci_write_vpd (struct pci_dev * dev, loff_t pos, size_t count, const void * buf);
Arguments
dev - pci device struct
pos - offset in vpd space
count - number of bytes to write
buf - buffer containing write data

9.5.99 pci_cfg_access_lock
pci_cfg_access_lock - Lock PCI config reads/writes
Synopsis
void pci_cfg_access_lock (struct pci_dev * dev);
Arguments
dev - pci device struct
Description
When access is locked, any userspace reads or writes to config space and concurrent lock requests will sleep until access is allowed via pci_cfg_access_unlock again.

9.5.100 pci_cfg_access_trylock
pci_cfg_access_trylock - try to lock PCI config reads/writes
Synopsis
bool pci_cfg_access_trylock (struct pci_dev * dev);
Arguments
dev - pci device struct
Description
Same as pci_cfg_access_lock, but will return 0 if access is already locked, 1 otherwise.
This function can be used from atomic contexts.

9.5.101 pci_cfg_access_unlock

pci_cfg_access_unlock - Unlock PCI config reads/writes

Synopsis

void pci_cfg_access_unlock (struct pci_dev * dev);

Arguments

dev
    pci device struct

Description

This function allows PCI config accesses to resume.

9.5.102 pci_lost_interrupt

pci_lost_interrupt - reports a lost PCI interrupt

Synopsis

enum pci_lost_interrupt_reason pci_lost_interrupt (struct pci_dev * pdev);

Arguments

pdev
    device whose interrupt is lost

Description

The primary function of this routine is to report a lost interrupt in a standard way which users can recognise (instead of blaming the driver).

Returns a suggestion for fixing it (although the driver is not required to act on this).

9.5.103 __ht_create_irq

__ht_create_irq - create an irq and attach it to a device.

Synopsis

int __ht_create_irq (struct pci_dev * dev, int idx, ht_irq_update_t * update);

Arguments

dev
    The hypertransport device to find the irq capability on.
idx
    Which of the possible irqs to attach to.
update
    Function to be called when changing the htirq message

Description

The irq number of the new irq or a negative error value is returned.

9.5.104 ht_create_irq

ht_create_irq - create an irq and attach it to a device.

Synopsis

int ht_create_irq (struct pci_dev * dev, int idx);

Arguments

dev
    The hypertransport device to find the irq capability on.
idx
    Which of the possible irqs to attach to.

Description

ht_create_irq needs to be called for all hypertransport devices that generate irqs.

The irq number of the new irq or a negative error value is returned.

9.5.105 ht_destroy_irq

ht_destroy_irq - destroy an irq created with ht_create_irq

Synopsis

void ht_destroy_irq (unsigned int irq);

Arguments

irq
    irq to be destroyed

Description

This reverses ht_create_irq, removing the specified irq from existence. The irq should be free before this happens.

9.5.106 pci_scan_slot

pci_scan_slot - scan a PCI slot on a bus for devices.
Synopsis

int pci_scan_slot (struct pci_bus * bus, int devfn);

Arguments

bus
    PCI bus to scan
devfn
    slot number to scan (must have zero function.)

Description

Scan a PCI slot on the specified PCI bus for devices, adding discovered devices to the bus->devices list. New devices will not have is_added set.

Returns the number of new devices found.

9.5.107 pci_rescan_bus

pci_rescan_bus - scan a PCI bus for devices.

Synopsis

unsigned int pci_rescan_bus (struct pci_bus * bus);

Arguments

bus
    PCI bus to scan

Description

Scan a PCI bus and child buses for new devices, add them, and enable them.

Returns the highest subordinate bus number discovered.

9.5.108 pci_create_slot

pci_create_slot - create or increment refcount for physical PCI slot

Synopsis

struct pci_slot * pci_create_slot (struct pci_bus * parent, int slot_nr, const char * name, struct hotplug_slot * hotplug);

Arguments

parent
    struct pci_bus of parent bridge
slot_nr
    PCI_SLOT(pci_dev->devfn) or -1 for placeholder
name
    user visible string presented in /sys/bus/pci/slots/<name>
hotplug
    set if caller is hotplug driver, NULL otherwise

Description

PCI slots have first class attributes such as address, speed, width, and a struct pci_slot is used to manage them. This interface will either return a new struct pci_slot to the caller, or if the pci_slot already exists, its refcount will be incremented.

Slots are uniquely identified by a (pci_bus, slot_nr) tuple.

There are known platforms with broken firmware that assign the same name to multiple slots. Work around these broken platforms by renaming the slots on behalf of the caller. If firmware assigns name N to multiple slots:

The first slot is assigned N
The second slot is assigned N-1
The third slot is assigned N-2
etc.

Placeholder slots

In most cases, (pci_bus, slot_nr) will be sufficient to uniquely identify a slot.
There is one notable exception - pSeries (rpaphp), where the slot_nr cannot be determined until a device is actually inserted into the slot. In this scenario, the caller may pass -1 for slot_nr.

The following semantics are imposed when the caller passes slot_nr == -1. First, we no longer check for an existing struct pci_slot, as there may be many slots with slot_nr of -1. The other change in semantics is user-visible: the address parameter presented in sysfs will consist solely of a dddd:bb tuple, where dddd is the PCI domain of the struct pci_bus and bb is the bus number. In other words, the devfn of the placeholder slot will not be displayed.

9.5.109 pci_destroy_slot

pci_destroy_slot - decrement refcount for physical PCI slot

Synopsis

void pci_destroy_slot (struct pci_slot * slot);

Arguments

slot
    struct pci_slot to decrement

Description

struct pci_slot is refcounted, so destroying one is really easy; we just call kobject_put on its kobj and let our release methods do the rest.

9.5.110 pci_hp_create_module_link

pci_hp_create_module_link - create symbolic link to the hotplug driver module.

Synopsis

void pci_hp_create_module_link (struct pci_slot * pci_slot);

Arguments

pci_slot
    struct pci_slot

Description

Helper function for pci_hotplug_core.c to create symbolic link to the hotplug driver module.

9.5.111 pci_hp_remove_module_link

pci_hp_remove_module_link - remove symbolic link to the hotplug driver module.

Synopsis

void pci_hp_remove_module_link (struct pci_slot * pci_slot);

Arguments

pci_slot
    struct pci_slot

Description

Helper function for pci_hotplug_core.c to remove symbolic link to the hotplug driver module.

9.5.112 pci_enable_rom

pci_enable_rom - enable ROM decoding for a PCI device

Synopsis

int pci_enable_rom (struct pci_dev * pdev);

Arguments

pdev
    PCI device to enable

Description

Enable ROM decoding on dev. This involves simply turning on the last bit of the PCI ROM BAR.
Note that some cards may share address decoders between the ROM and other resources, so enabling it may disable access to MMIO registers or other card memory.

9.5.113 pci_disable_rom

pci_disable_rom - disable ROM decoding for a PCI device

Synopsis

void pci_disable_rom (struct pci_dev * pdev);

Arguments

pdev
    PCI device to disable

Description

Disable ROM decoding on a PCI device by turning off the last bit in the ROM BAR.

9.5.114 pci_map_rom

pci_map_rom - map a PCI ROM to kernel space

Synopsis

void __iomem * pci_map_rom (struct pci_dev * pdev, size_t * size);

Arguments

pdev
    pointer to pci device struct
size
    pointer to receive size of pci window over ROM

Return

kernel virtual pointer to image of ROM

Map a PCI ROM into kernel space. If ROM is boot video ROM, the shadow BIOS copy will be returned instead of the actual ROM.

9.5.115 pci_unmap_rom

pci_unmap_rom - unmap the ROM from kernel space

Synopsis

void pci_unmap_rom (struct pci_dev * pdev, void __iomem * rom);

Arguments

pdev
    pointer to pci device struct
rom
    virtual address of the previous mapping

Description

Remove a mapping of a previously mapped ROM

9.5.116 pci_platform_rom

pci_platform_rom - provides a pointer to any ROM image provided by the platform

Synopsis

void __iomem * pci_platform_rom (struct pci_dev * pdev, size_t * size);

Arguments

pdev
    pointer to pci device struct
size
    pointer to receive size of pci window over ROM

9.5.117 pci_enable_sriov

pci_enable_sriov - enable the SR-IOV capability

Synopsis

int pci_enable_sriov (struct pci_dev * dev, int nr_virtfn);

Arguments

dev
    the PCI device
nr_virtfn
    number of virtual functions to enable

Description

Returns 0 on success, or negative on failure.
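The "last bit" that pci_enable_rom and pci_disable_rom toggle is the lowest bit (bit 0) of the Expansion ROM BAR, which acts as the decode-enable flag. The userspace sketch below models that on a plain 32-bit value; ROM_DECODE_ENABLE and the rom_bar_* helpers are hypothetical names for illustration and do not touch real configuration space.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of the Expansion ROM BAR is the decode-enable bit. */
#define ROM_DECODE_ENABLE 0x1u

/* Set the enable bit; the address bits must be left untouched. */
uint32_t rom_bar_enable(uint32_t bar)
{
    uint32_t v = bar | ROM_DECODE_ENABLE;

    assert((v & ~ROM_DECODE_ENABLE) == (bar & ~ROM_DECODE_ENABLE));
    return v;
}

/* Clear the enable bit; the address bits must be left untouched. */
uint32_t rom_bar_disable(uint32_t bar)
{
    uint32_t v = bar & ~ROM_DECODE_ENABLE;

    assert((v & ~ROM_DECODE_ENABLE) == (bar & ~ROM_DECODE_ENABLE));
    return v;
}

/* Report whether ROM decoding is currently enabled. */
int rom_bar_is_enabled(uint32_t bar)
{
    return (bar & ROM_DECODE_ENABLE) != 0;
}
```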
9.5.118 pci_disable_sriov

pci_disable_sriov - disable the SR-IOV capability

Synopsis

void pci_disable_sriov (struct pci_dev * dev);

Arguments

dev
    the PCI device

9.5.119 pci_num_vf

pci_num_vf - return number of VFs associated with a PF

Synopsis

int pci_num_vf (struct pci_dev * dev);

Arguments

dev
    the PCI device

Description

Returns number of VFs, or 0 if SR-IOV is not enabled.

9.5.120 pci_vfs_assigned

pci_vfs_assigned - returns number of VFs that are assigned to a guest

Synopsis

int pci_vfs_assigned (struct pci_dev * dev);

Arguments

dev
    the PCI device

Description

Returns number of VFs belonging to this device that are assigned to a guest. If device is not a physical function, returns 0.

9.5.121 pci_sriov_set_totalvfs

pci_sriov_set_totalvfs - reduce the TotalVFs available

Synopsis

int pci_sriov_set_totalvfs (struct pci_dev * dev, u16 numvfs);

Arguments

dev
    the PCI PF device
numvfs
    number that should be used for TotalVFs supported

Description

Should be called from the PF driver's probe routine with the device's mutex held.

Returns 0 if PF is an SRIOV-capable device and the value of numvfs is valid. If not a PF, return -ENOSYS; if numvfs is invalid, return -EINVAL; if VFs are already enabled, return -EBUSY.

9.5.122 pci_sriov_get_totalvfs

pci_sriov_get_totalvfs - get total VFs supported on this device

Synopsis

int pci_sriov_get_totalvfs (struct pci_dev * dev);

Arguments

dev
    the PCI PF device

Description

For a PCIe device with SRIOV support, return the PCIe SRIOV capability value of TotalVFs or the value of driver_max_VFs if the driver reduced it. Otherwise 0.
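The return-value rules for pci_sriov_set_totalvfs and pci_sriov_get_totalvfs can be sketched as a userspace model. Everything here is hypothetical: struct fake_pf and the fake_* functions are illustrative names, and treating "invalid numvfs" as "larger than the advertised TotalVFs" is an assumption of this sketch rather than something the text above states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified view of a physical function's SR-IOV state. */
struct fake_pf {
    int is_physfn;        /* non-zero if this is an SR-IOV physical function */
    int vfs_enabled;      /* non-zero once VFs have been enabled */
    unsigned total_vfs;   /* TotalVFs advertised by the capability */
    unsigned driver_max;  /* driver-imposed limit, 0 if not set */
};

/* Model of the "reduce TotalVFs" call and its three error cases. */
int fake_set_totalvfs(struct fake_pf *pf, unsigned numvfs)
{
    assert(pf != NULL);
    if (!pf->is_physfn)
        return -ENOSYS;           /* not a PF */
    if (numvfs > pf->total_vfs)
        return -EINVAL;           /* assumed meaning of "invalid" */
    if (pf->vfs_enabled)
        return -EBUSY;            /* too late, VFs already enabled */
    pf->driver_max = numvfs;
    return 0;
}

/* Model of the getter: driver limit if set, else TotalVFs, else 0. */
int fake_get_totalvfs(const struct fake_pf *pf)
{
    assert(pf != NULL);
    if (!pf->is_physfn)
        return 0;
    return (int)(pf->driver_max ? pf->driver_max : pf->total_vfs);
}
```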
9.5.123 pci_read_legacy_io

pci_read_legacy_io - read byte(s) from legacy I/O port space

Synopsis

ssize_t pci_read_legacy_io (struct file * filp, struct kobject * kobj, struct bin_attribute * bin_attr, char * buf, loff_t off, size_t count);

Arguments

filp
    open sysfs file
kobj
    kobject corresponding to file to read from
bin_attr
    struct bin_attribute for this file
buf
    buffer to store results
off
    offset into legacy I/O port space
count
    number of bytes to read

Description

Reads 1, 2, or 4 bytes from legacy I/O port space using an arch specific callback routine (pci_legacy_read).

9.5.124 pci_write_legacy_io

pci_write_legacy_io - write byte(s) to legacy I/O port space

Synopsis

ssize_t pci_write_legacy_io (struct file * filp, struct kobject * kobj, struct bin_attribute * bin_attr, char * buf, loff_t off, size_t count);

Arguments

filp
    open sysfs file
kobj
    kobject corresponding to file to write to
bin_attr
    struct bin_attribute for this file
buf
    buffer containing value to be written
off
    offset into legacy I/O port space
count
    number of bytes to write

Description

Writes 1, 2, or 4 bytes to legacy I/O port space using an arch specific callback routine (pci_legacy_write).

9.5.125 pci_mmap_legacy_mem

pci_mmap_legacy_mem - map legacy PCI memory into user memory space

Synopsis

int pci_mmap_legacy_mem (struct file * filp, struct kobject * kobj, struct bin_attribute * attr, struct vm_area_struct * vma);

Arguments

filp
    open sysfs file
kobj
    kobject corresponding to device to be mapped
attr
    struct bin_attribute for this file
vma
    struct vm_area_struct passed to mmap

Description

Uses an arch specific callback, pci_mmap_legacy_mem_page_range, to mmap legacy memory space (first meg of bus space) into application virtual memory space.
9.5.126 pci_mmap_legacy_io

pci_mmap_legacy_io - map legacy PCI IO into user memory space

Synopsis

int pci_mmap_legacy_io (struct file * filp, struct kobject * kobj, struct bin_attribute * attr, struct vm_area_struct * vma);

Arguments

filp
    open sysfs file
kobj
    kobject corresponding to device to be mapped
attr
    struct bin_attribute for this file
vma
    struct vm_area_struct passed to mmap

Description

Uses an arch specific callback, pci_mmap_legacy_io_page_range, to mmap legacy IO space (first meg of bus space) into application virtual memory space. Returns -ENOSYS if the operation isn't supported.

9.5.127 pci_adjust_legacy_attr

pci_adjust_legacy_attr - adjustment of legacy file attributes

Synopsis

void pci_adjust_legacy_attr (struct pci_bus * b, enum pci_mmap_state mmap_type);

Arguments

b
    bus to create files under
mmap_type
    I/O port or memory

Description

Stub implementation. Can be overridden by arch if necessary.

9.5.128 pci_create_legacy_files

pci_create_legacy_files - create legacy I/O port and memory files

Synopsis

void pci_create_legacy_files (struct pci_bus * b);

Arguments

b
    bus to create files under

Description

Some platforms allow access to legacy I/O port and ISA memory space on a per-bus basis. This routine creates the files and ties them into their associated read, write and mmap files from pci-sysfs.c.

On error, unwind, but don't propagate the error to the caller, as it is OK to set up the PCI bus without these files.

9.5.129 pci_mmap_resource

pci_mmap_resource - map a PCI resource into user memory space

Synopsis

int pci_mmap_resource (struct kobject * kobj, struct bin_attribute * attr, struct vm_area_struct * vma, int write_combine);

Arguments

kobj
    kobject for mapping
attr
    struct bin_attribute for the file being mapped
vma
    struct vm_area_struct passed into the mmap
write_combine
    1 for write_combine mapping

Description

Use the regular PCI mapping routines to map a PCI resource into userspace.
9.5.130 pci_remove_resource_files

pci_remove_resource_files - cleanup resource files

Synopsis

void pci_remove_resource_files (struct pci_dev * pdev);

Arguments

pdev
    dev to cleanup

Description

If we created resource files for pdev, remove them from sysfs and free their resources.

9.5.131 pci_create_resource_files

pci_create_resource_files - create resource files in sysfs for dev

Synopsis

int pci_create_resource_files (struct pci_dev * pdev);

Arguments

pdev
    dev in question

Description

Walk the resources in pdev, creating files for each resource available.

9.5.132 pci_write_rom

pci_write_rom - used to enable access to the PCI ROM display

Synopsis

ssize_t pci_write_rom (struct file * filp, struct kobject * kobj, struct bin_attribute * bin_attr, char * buf, loff_t off, size_t count);

Arguments

filp
    sysfs file
kobj
    kernel object handle
bin_attr
    struct bin_attribute for this file
buf
    user input
off
    file offset
count
    number of bytes in input

Description

Writing anything except 0 enables it.

9.5.133 pci_read_rom

pci_read_rom - read a PCI ROM

Synopsis

ssize_t pci_read_rom (struct file * filp, struct kobject * kobj, struct bin_attribute * bin_attr, char * buf, loff_t off, size_t count);

Arguments

filp
    sysfs file
kobj
    kernel object handle
bin_attr
    struct bin_attribute for this file
buf
    where to put the data we read from the ROM
off
    file offset
count
    number of bytes to read

Description

Put count bytes starting at off into buf from the ROM in the PCI device corresponding to kobj.

9.5.134 pci_remove_sysfs_dev_files

pci_remove_sysfs_dev_files - cleanup PCI specific sysfs files

Synopsis

void pci_remove_sysfs_dev_files (struct pci_dev * pdev);

Arguments

pdev
    device whose entries we should free

Description

Cleanup when pdev is removed from sysfs.
9.6 PCI Hotplug Support Library

9.6.1 __pci_hp_register

__pci_hp_register - register a hotplug_slot with the PCI hotplug subsystem

Synopsis

int __pci_hp_register (struct hotplug_slot * slot, struct pci_bus * bus, int devnr, const char * name, struct module * owner, const char * mod_name);

Arguments

slot
    pointer to the struct hotplug_slot to register
bus
    bus this slot is on
devnr
    device number
name
    name registered with kobject core
owner
    caller module owner
mod_name
    caller module name

Description

Registers a hotplug slot with the pci hotplug subsystem, which will allow userspace interaction with the slot.

Returns 0 if successful, anything else for an error.

9.6.2 pci_hp_deregister

pci_hp_deregister - deregister a hotplug_slot with the PCI hotplug subsystem

Synopsis

int pci_hp_deregister (struct hotplug_slot * hotplug);

Arguments

hotplug
    pointer to the struct hotplug_slot to deregister

Description

The slot must have been registered with the pci hotplug subsystem previously with a call to pci_hp_register.

Returns 0 if successful, anything else for an error.

9.6.3 pci_hp_change_slot_info

pci_hp_change_slot_info - changes the slot's information structure in the core

Synopsis

int pci_hp_change_slot_info (struct hotplug_slot * hotplug, struct hotplug_slot_info * info);

Arguments

hotplug
    pointer to the slot whose info has changed
info
    pointer to the info to copy into the slot's info structure

Description

The slot must have been registered with the pci hotplug subsystem previously with a call to pci_hp_register.

Returns 0 if successful, anything else for an error.
Chapter 10 Firmware Interfaces

10.1 DMI Interfaces

10.1.1 dmi_check_system

dmi_check_system - check system DMI data

Synopsis

int dmi_check_system (const struct dmi_system_id * list);

Arguments

list
    array of dmi_system_id structures to match against. All non-null elements of the list must match their slot's (field index's) data (i.e., each list string must be a substring of the specified DMI slot's string data) to be considered a successful match.

Description

Walk the blacklist table running matching functions until someone returns non zero or we hit the end. Callback function is called for each successful match. Returns the number of matches.

10.1.2 dmi_first_match

dmi_first_match - find dmi_system_id structure matching system DMI data

Synopsis

const struct dmi_system_id * dmi_first_match (const struct dmi_system_id * list);

Arguments

list
    array of dmi_system_id structures to match against. All non-null elements of the list must match their slot's (field index's) data (i.e., each list string must be a substring of the specified DMI slot's string data) to be considered a successful match.

Description

Walk the blacklist table until the first match is found. Return the pointer to the matching entry or NULL if there's no match.

10.1.3 dmi_get_system_info

dmi_get_system_info - return DMI data value

Synopsis

const char * dmi_get_system_info (int field);

Arguments

field
    data index (see enum dmi_field)

Description

Returns one DMI data value, can be used to perform complex DMI data checks.
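The matching rule shared by dmi_check_system and dmi_first_match (every non-null element must be a substring of the corresponding field's data) can be modeled in userspace as follows. This is a sketch under stated assumptions: the field enum, struct match, struct system_id, the two-slot limit and the convention that a NULL ident ends the table are all hypothetical simplifications, not the kernel's dmi_system_id layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field indexes standing in for enum dmi_field. */
enum { FIELD_SYS_VENDOR, FIELD_PRODUCT_NAME, FIELD_COUNT };

struct match {
    int used;            /* zero means "this element is null, skip it" */
    int field;           /* which field the substring applies to */
    const char *substr;  /* must occur somewhere in that field's data */
};

struct system_id {
    const char *ident;   /* NULL terminates a table (assumption) */
    struct match matches[2];
};

/* Returns 1 if every used element is a substring of its field's data. */
int id_matches(const struct system_id *id,
               const char *const fields[FIELD_COUNT])
{
    size_t i;

    assert(id != NULL && fields != NULL);
    for (i = 0; i < 2; i++) {
        const struct match *m = &id->matches[i];

        if (!m->used)
            continue;
        if (fields[m->field] == NULL ||
            strstr(fields[m->field], m->substr) == NULL)
            return 0;
    }
    return 1;
}

/* Walk the whole table and count the matching entries. */
int count_matches(const struct system_id *list,
                  const char *const fields[FIELD_COUNT])
{
    int n = 0;

    for (; list->ident != NULL; list++)
        n += id_matches(list, fields);
    return n;
}

/* Walk the table and return the first matching entry, or NULL. */
const struct system_id *first_match(const struct system_id *list,
                                    const char *const fields[FIELD_COUNT])
{
    for (; list->ident != NULL; list++)
        if (id_matches(list, fields))
            return list;
    return NULL;
}
```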
10.1.4 dmi_name_in_vendors

dmi_name_in_vendors - Check if string is in the DMI system or board vendor name

Synopsis

int dmi_name_in_vendors (const char * str);

Arguments

str
    Case sensitive Name

10.1.5 dmi_find_device

dmi_find_device - find onboard device by type/name

Synopsis

const struct dmi_device * dmi_find_device (int type, const char * name, const struct dmi_device * from);

Arguments

type
    device type or DMI_DEV_TYPE_ANY to match all device types
name
    device name string or NULL to match all
from
    previous device found in search, or NULL for new search.

Description

Iterates through the list of known onboard devices. If a device is found with a matching vendor and device, a pointer to its device structure is returned. Otherwise, NULL is returned. A new search is initiated by passing NULL as the from argument. If from is not NULL, searches continue from next device.

10.1.6 dmi_get_date

dmi_get_date - parse a DMI date

Synopsis

bool dmi_get_date (int field, int * yearp, int * monthp, int * dayp);

Arguments

field
    data index (see enum dmi_field)
yearp
    optional out parameter for the year
monthp
    optional out parameter for the month
dayp
    optional out parameter for the day

Description

The date field is assumed to be in the form resembling [mm[/dd]]/yy[yy] and the result is stored in the out parameters, any or all of which can be omitted.

If the field doesn't exist, all out parameters are set to zero and false is returned. Otherwise, true is returned with any invalid part of date set to zero.

On return, year, month and day are guaranteed to be in the range of [0,9999], [0,12] and [0,31] respectively.

10.1.7 dmi_walk

dmi_walk - Walk the DMI table and get called back for every record

Synopsis

int dmi_walk (void (*decode) (const struct dmi_header *, void *), void * private_data);

Arguments

decode
    Callback function
private_data
    Private data to be passed to the callback function

Description

Returns -1 when the DMI table can't be reached, 0 on success.
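The [mm[/dd]]/yy[yy] format accepted by dmi_get_date is easier to follow with a worked parser. The userspace sketch below takes the date string directly instead of a field index; parse_dmi_date is a hypothetical name, and the rule used to widen two-digit years (values below 96 become 20xx, the rest 19xx) is an assumption of this sketch that the description above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parse "[mm[/dd]]/yy[yy]"; s == NULL models a field that does not exist. */
bool parse_dmi_date(const char *s, int *yearp, int *monthp, int *dayp)
{
    unsigned long year = 0, month = 0, day = 0;
    const char *y;
    char *e;
    bool exists = (s != NULL);

    if (!exists)
        goto out;

    /* The year is whatever follows the last '/'. */
    y = strrchr(s, '/');
    if (y == NULL)
        goto out;
    y++;
    year = strtoul(y, &e, 10);
    if (y != e && year < 100) {
        /* Assumption: widen two-digit years around a 1996 pivot. */
        year += 1900;
        if (year < 1996)
            year += 100;
    }
    if (year > 9999)
        year = 0;

    /* Month must be present, be followed by '/', and lie in 1..12. */
    month = strtoul(s, &e, 10);
    if (s == e || *e != '/' || month == 0 || month > 12) {
        month = 0;
        goto out;
    }

    /* Day is optional: it exists only if another '/' precedes the year. */
    s = e + 1;
    day = strtoul(s, &e, 10);
    if (s == y || s == e || *e != '/' || day > 31)
        day = 0;
out:
    /* Ranges promised by the description: [0,9999], [0,12], [0,31]. */
    assert(year <= 9999 && month <= 12 && day <= 31);
    if (yearp)
        *yearp = (int)year;
    if (monthp)
        *monthp = (int)month;
    if (dayp)
        *dayp = (int)day;
    return exists;
}
```

In this model a string such as "03/2014" yields a month and year with the day left at zero, which mirrors the "any invalid part of date set to zero" rule.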
10.1.8 dmi_match

dmi_match - compare a string to the dmi field (if exists)

Synopsis

bool dmi_match (enum dmi_field f, const char * str);

Arguments

f
    DMI field identifier
str
    string to compare the DMI field to

Description

Returns true if the requested field equals str (including NULL).

10.2 EDD Interfaces

10.2.1 edd_show_raw_data

edd_show_raw_data - copies raw data to buffer for userspace to parse

Synopsis

ssize_t edd_show_raw_data (struct edd_device * edev, char * buf);

Arguments

edev
    target edd_device
buf
    output buffer

Returns number of bytes written, or -EINVAL on failure

10.2.2 edd_release

edd_release - free edd structure

Synopsis

void edd_release (struct kobject * kobj);

Arguments

kobj
    kobject of edd structure

Description

This is called when the refcount of the edd structure reaches 0. This should happen right after we unregister, but just in case, we use the release callback anyway.

10.2.3 edd_dev_is_type

edd_dev_is_type - is this EDD device a 'type' device?

Synopsis

int edd_dev_is_type (struct edd_device * edev, const char * type);

Arguments

edev
    target edd_device
type
    a host bus or interface identifier string per the EDD spec

Description

Returns 1 (TRUE) if it is a 'type' device, 0 otherwise.

10.2.4 edd_get_pci_dev

edd_get_pci_dev - finds pci_dev that matches edev

Synopsis

struct pci_dev * edd_get_pci_dev (struct edd_device * edev);

Arguments

edev
    edd_device

Description

Returns pci_dev if found, or NULL

10.2.5 edd_init

edd_init - creates sysfs tree of EDD data

Synopsis

int edd_init ( void);

Arguments

void
    no arguments

Chapter 11 Security Framework

11.1 security_init

security_init - initializes the security framework

Synopsis

int security_init ( void);

Arguments

void
    no arguments

Description

This should be called early in the kernel initialization sequence.

11.2 security_module_enable

security_module_enable - Load given security module on boot ?
Synopsis

int security_module_enable (struct security_operations * ops);

Arguments

ops
    a pointer to the struct security_operations that is to be checked.

Description

Each LSM must pass this method before registering its own operations to avoid security registration races. This method may also be used to check if your LSM is currently loaded during kernel initialization.

Return true if

- the passed LSM is the one chosen by user at boot time,
- or the passed LSM is configured as the default and the user did not choose an alternate LSM at boot time.

Otherwise, return false.

11.3 register_security

register_security - registers a security framework with the kernel

Synopsis

int register_security (struct security_operations * ops);

Arguments

ops
    a pointer to the struct security_operations that is to be registered

Description

This function allows a security module to register itself with the kernel security subsystem. Some rudimentary checking is done on the ops value passed to this function. You'll need to check first if your LSM is allowed to register its ops by calling security_module_enable(ops).

If there is already a security module registered with the kernel, an error will be returned. Otherwise 0 is returned on success.

11.4 securityfs_create_file

securityfs_create_file - create a file in the securityfs filesystem

Synopsis

struct dentry * securityfs_create_file (const char * name, umode_t mode, struct dentry * parent, void * data, const struct file_operations * fops);

Arguments

name
    a pointer to a string containing the name of the file to create.
mode
    the permission that the file should have
parent
    a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the securityfs filesystem.
data
    a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open call.
fops
    a pointer to a struct file_operations that should be used for this file.

Description

This is the basic "create a file" function for securityfs. It allows for a wide range of flexibility in creating a file, or a directory (if you want to create a directory, the securityfs_create_dir function is recommended to be used instead).

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).

If securityfs is not enabled in the kernel, the value -ENODEV is returned.

11.5 securityfs_create_dir

securityfs_create_dir - create a directory in the securityfs filesystem

Synopsis

struct dentry * securityfs_create_dir (const char * name, struct dentry * parent);

Arguments

name
    a pointer to a string containing the name of the directory to create.
parent
    a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the securityfs filesystem.

Description

This function creates a directory in securityfs with the given name.

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, NULL will be returned.

If securityfs is not enabled in the kernel, the value -ENODEV is returned. It is not wise to check for this value, but rather, check for NULL or !NULL instead as to eliminate the need for #ifdef in the calling code.
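Returning an error "via ERR_PTR", as described for securityfs_create_file, means encoding a small negative errno in the pointer value itself. The userspace sketch below models that idiom; err_ptr, ptr_err, is_err and fake_create are hypothetical lowercase stand-ins written for illustration, and the limit of 4095 error codes is an assumption of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on errno values that can be encoded in a pointer. */
#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the top MAX_ERRNO addresses. */
void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Decode the errno stored in an error pointer. */
long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the range reserved for encoded errors. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical creator: fails with -ENODEV when 'enabled' is zero. */
void *fake_create(int enabled, int *object)
{
    if (!enabled)
        return err_ptr(-ENODEV);
    return object;
}
```

A caller of such a function tests the result with is_err before using it and recovers the error code with ptr_err; note that NULL is not an encoded error in this scheme, which is why the two securityfs functions above document their failure values separately.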
11.6 securityfs_remove

securityfs_remove - removes a file or directory from the securityfs filesystem

Synopsis

void securityfs_remove (struct dentry * dentry);

Arguments

dentry
    a pointer to the dentry of the file or directory to be removed.

Description

This function removes a file or directory in securityfs that was previously created with a call to another securityfs function (like securityfs_create_file or variants thereof.)

This function is required to be called in order for the file to be removed. No automatic cleanup of files will happen when a module is removed; you are responsible here.

Chapter 12 Audit Interfaces

12.1 audit_log_start

audit_log_start - obtain an audit buffer

Synopsis

struct audit_buffer * audit_log_start (struct audit_context * ctx, gfp_t gfp_mask, int type);

Arguments

ctx
    audit_context (may be NULL)
gfp_mask
    type of allocation
type
    audit message type

Description

Returns audit_buffer pointer on success or NULL on error.

Obtain an audit buffer. This routine does locking to obtain the audit buffer, but then no locking is required for calls to audit_log_*format. If the task (ctx) is a task that is currently in a syscall, then the syscall is marked as auditable and an audit record will be written at syscall exit. If there is no associated task, then task context (ctx) should be NULL.

12.2 audit_log_format

audit_log_format - format a message into the audit buffer.

Synopsis

void audit_log_format (struct audit_buffer * ab, const char * fmt, ...);

Arguments

ab
    audit_buffer
fmt
    format string
...
    optional parameters matching fmt string

Description

All the work is done in audit_log_vformat.
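The split between audit_log_format and audit_log_vformat follows a common C pattern: a variadic front end that only packages its arguments into a va_list and hands them to a worker. The userspace sketch below shows that pattern with vsnprintf; struct log_buf, log_format and log_vformat are hypothetical names, and the fixed 128-byte buffer with silent truncation is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-size record buffer; zero-initialise before use. */
struct log_buf {
    char data[128];
    size_t len;      /* bytes used, not counting the terminating NUL */
};

/* Worker: append formatted text, truncating if the buffer fills up. */
void log_vformat(struct log_buf *b, const char *fmt, va_list args)
{
    size_t room;
    int n;

    assert(b != NULL && fmt != NULL && b->len < sizeof(b->data));
    room = sizeof(b->data) - b->len;
    n = vsnprintf(b->data + b->len, room, fmt, args);
    if (n < 0)
        return;                       /* encoding error, nothing appended */
    b->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Front end: collect the variable arguments and delegate to the worker. */
void log_format(struct log_buf *b, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    log_vformat(b, fmt, args);
    va_end(args);
}
```

Keeping the va_list variant separate lets other helpers that already hold a va_list reuse the same worker without re-implementing the formatting.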
12.3 audit_log_end

audit_log_end - end one audit record

Synopsis

void audit_log_end (struct audit_buffer * ab);

Arguments

ab
    the audit_buffer

Description

netlink_unicast cannot be called inside an irq context because it blocks (last arg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed on a queue and a tasklet is scheduled to remove them from the queue outside the irq context. May be called in any context.

12.4 audit_log

audit_log - Log an audit record

Synopsis

void audit_log (struct audit_context * ctx, gfp_t gfp_mask, int type, const char * fmt, ...);

Arguments

ctx
    audit context
gfp_mask
    type of allocation
type
    audit message type
fmt
    format string to use
...
    variable parameters matching the format string

Description

This is a convenience function that calls audit_log_start, audit_log_vformat, and audit_log_end. It may be called in any context.

12.5 audit_log_secctx

audit_log_secctx - Converts and logs SELinux context

Synopsis

void audit_log_secctx (struct audit_buffer * ab, u32 secid);

Arguments

ab
    audit_buffer
secid
    security number

Description

This is a helper function that calls security_secid_to_secctx to convert secid to secctx and then adds the (converted) SELinux context to the audit log by calling audit_log_format, thus also preventing leak of internal secid to userspace. If secid cannot be converted, audit_panic is called.

12.6 audit_alloc

audit_alloc - allocate an audit context block for a task

Synopsis

int audit_alloc (struct task_struct * tsk);

Arguments

tsk
    task

Description

Filter on the task information and allocate a per-task audit context if necessary. Doing so turns on system call auditing for the specified task. This is called from copy_process, so no lock is needed.
12.7 __audit_free

__audit_free - free a per-task audit context

Synopsis

void __audit_free (struct task_struct * tsk);

Arguments

tsk
    task whose audit context block to free

Description

Called from copy_process and do_exit

12.8 __audit_syscall_entry

__audit_syscall_entry - fill in an audit record at syscall entry

Synopsis

void __audit_syscall_entry (int arch, int major, unsigned long a1, unsigned long a2, unsigned long a3, unsigned long a4);

Arguments

arch
    architecture type
major
    major syscall type (function)
a1
    additional syscall register 1
a2
    additional syscall register 2
a3
    additional syscall register 3
a4
    additional syscall register 4

Description

Fill in audit context at syscall entry. This only happens if the audit context was created when the task was created and the state or filters demand the audit context be built. If the state from the per-task filter or from the per-syscall filter is AUDIT_RECORD_CONTEXT, then the record will be written at syscall exit time (otherwise, it will only be written if another part of the kernel requests that it be written).

12.9 __audit_syscall_exit

__audit_syscall_exit - deallocate audit context after a system call

Synopsis

void __audit_syscall_exit (int success, long return_code);

Arguments

success
    success value of the syscall
return_code
    return value of the syscall

Description

Tear down after system call. If the audit context has been marked as auditable (either because of the AUDIT_RECORD_CONTEXT state from filtering, or because some other part of the kernel wrote an audit message), then write out the syscall information. In all cases, free the names stored from getname.

12.10 __audit_reusename

__audit_reusename - fill out filename with info from existing entry

Synopsis

struct filename * __audit_reusename (const __user char * uptr);

Arguments

uptr
    userland ptr to pathname

Description

Search the audit_names list for the current audit context.
If there is an existing entry with a matching uptr then return the filename associated with that audit_name. If not, return NULL.

12.11 __audit_getname

__audit_getname - add a name to the list

Synopsis

void __audit_getname (struct filename * name);

Arguments

name
    name to add

Description

Add a name to the list of audit names for this context. Called from fs/namei.c:getname.

12.12 __audit_inode

__audit_inode - store the inode and device from a lookup

Synopsis

void __audit_inode (struct filename * name, const struct dentry * dentry, unsigned int flags);

Arguments

name
    name being audited
dentry
    dentry being audited
flags
    attributes for this particular entry

12.13 auditsc_get_stamp

auditsc_get_stamp - get local copies of audit_context values

Synopsis

int auditsc_get_stamp (struct audit_context * ctx, struct timespec * t, unsigned int * serial);

Arguments

ctx
    audit_context for the task
t
    timespec to store time recorded in the audit_context
serial
    serial value that is recorded in the audit_context

Description

Also sets the context as auditable.

12.14 audit_set_loginuid

audit_set_loginuid - set current task's audit_context loginuid

Synopsis

int audit_set_loginuid (kuid_t loginuid);

Arguments

loginuid
    loginuid value

Description

Returns 0.

Called (set) from fs/proc/base.c::proc_loginuid_write.
12.15 __audit_mq_open

__audit_mq_open - record audit data for a POSIX MQ open

Synopsis
void __audit_mq_open (int oflag, umode_t mode, struct mq_attr * attr);

Arguments
oflag: open flag
mode: mode bits
attr: queue attributes

12.16 __audit_mq_sendrecv

__audit_mq_sendrecv - record audit data for a POSIX MQ timed send/receive

Synopsis
void __audit_mq_sendrecv (mqd_t mqdes, size_t msg_len, unsigned int msg_prio, const struct timespec * abs_timeout);

Arguments
mqdes: MQ descriptor
msg_len: Message length
msg_prio: Message priority
abs_timeout: Message timeout in absolute time

12.17 __audit_mq_notify

__audit_mq_notify - record audit data for a POSIX MQ notify

Synopsis
void __audit_mq_notify (mqd_t mqdes, const struct sigevent * notification);

Arguments
mqdes: MQ descriptor
notification: Notification event

12.18 __audit_mq_getsetattr

__audit_mq_getsetattr - record audit data for a POSIX MQ get/set attribute

Synopsis
void __audit_mq_getsetattr (mqd_t mqdes, struct mq_attr * mqstat);

Arguments
mqdes: MQ descriptor
mqstat: MQ flags

12.19 __audit_ipc_obj

__audit_ipc_obj - record audit data for ipc object

Synopsis
void __audit_ipc_obj (struct kern_ipc_perm * ipcp);

Arguments
ipcp: ipc permissions

12.20 __audit_ipc_set_perm

__audit_ipc_set_perm - record audit data for new ipc permissions

Synopsis
void __audit_ipc_set_perm (unsigned long qbytes, uid_t uid, gid_t gid, umode_t mode);

Arguments
qbytes: msgq bytes
uid: msgq user id
gid: msgq group id
mode: msgq mode (permissions)

Description
Called only after audit_ipc_obj.

12.21 __audit_socketcall

__audit_socketcall - record audit data for sys_socketcall

Synopsis
int __audit_socketcall (int nargs, unsigned long * args);

Arguments
nargs: number of args, which should not be more than AUDITSC_ARGS.
args: args array

12.22 __audit_fd_pair

__audit_fd_pair - record audit data for pipe and socketpair

Synopsis
void __audit_fd_pair (int fd1, int fd2);

Arguments
fd1: the first file descriptor
fd2: the second file descriptor

12.23 __audit_sockaddr

__audit_sockaddr - record audit data for sys_bind, sys_connect, sys_sendto

Synopsis
int __audit_sockaddr (int len, void * a);

Arguments
len: data length in user space
a: data address in kernel space

Description
Returns 0 for success or NULL context or < 0 on error.

12.24 __audit_signal_info

__audit_signal_info - record signal info for shutting down audit subsystem

Synopsis
int __audit_signal_info (int sig, struct task_struct * t);

Arguments
sig: signal value
t: task being signaled

Description
If the audit subsystem is being terminated, record the task (pid) and uid that is doing that.

12.25 __audit_log_bprm_fcaps

__audit_log_bprm_fcaps - store information about a loading bprm and relevant fcaps

Synopsis
int __audit_log_bprm_fcaps (struct linux_binprm * bprm, const struct cred * new, const struct cred * old);

Arguments
bprm: pointer to the bprm being processed
new: the proposed new credentials
old: the old credentials

Description
Simply check if the proc already has the caps given by the file and if not store the priv escalation info for later auditing at the end of the syscall -Eric

12.26 __audit_log_capset

__audit_log_capset - store information about the arguments to the capset syscall

Synopsis
void __audit_log_capset (const struct cred * new, const struct cred * old);

Arguments
new: the new credentials
old: the old (current) credentials

Description
Record the arguments userspace sent to sys_capset for later printing by the audit system if applicable.

12.27 audit_core_dumps

audit_core_dumps - record information about processes that end abnormally

Synopsis
void audit_core_dumps (long signr);

Arguments
signr: signal value

Description
If a process ends with a core dump, something fishy is going on and
we should record the event for investigation.

12.28 audit_rule_change

audit_rule_change - apply all rules to the specified message type

Synopsis
int audit_rule_change (int type, __u32 portid, int seq, void * data, size_t datasz);

Arguments
type: audit message type
portid: target port id for netlink audit messages
seq: netlink audit message sequence (serial) number
data: payload data
datasz: size of payload data

12.29 audit_list_rules_send

audit_list_rules_send - list the audit rules

Synopsis
int audit_list_rules_send (struct sk_buff * request_skb, int seq);

Arguments
request_skb: skb of request we are replying to (used to target the reply)
seq: netlink audit message sequence (serial) number

12.30 parent_len

parent_len - find the length of the parent portion of a pathname

Synopsis
int parent_len (const char * path);

Arguments
path: pathname of which to determine length

12.31 audit_compare_dname_path

audit_compare_dname_path - compare given dentry name with last component in given path. Return of 0 indicates a match.

Synopsis
int audit_compare_dname_path (const char * dname, const char * path, int parentlen);

Arguments
dname: dentry name that we're comparing
path: full pathname that we're comparing
parentlen: length of the parent if known. Passing in AUDIT_NAME_FULL here indicates that we must compute this value.

Chapter 13 Accounting Framework

13.1 sys_acct

sys_acct - enable/disable process accounting

Synopsis
long sys_acct (const char __user * name);

Arguments
name: file name for accounting records or NULL to shut down accounting

Description
Returns 0 for success or negative errno values for failure.
sys_acct is the only system call needed to implement process accounting. It takes the name of the file where accounting records should be written. If the filename is NULL, accounting will be shut down.
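Looking back at the pathname helpers in sections 12.30 and 12.31: both are plain string routines, so their documented contracts can be tried out in userspace. The sketch below is one plausible reading of those contracts, not the kernel source. The names sketch_parent_len and sketch_compare_dname_path and the sentinel SKETCH_NAME_FULL (standing in for AUDIT_NAME_FULL, whose value is not given here) are hypothetical, and the handling of trailing slashes is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sentinel standing in for AUDIT_NAME_FULL ("compute it"). */
#define SKETCH_NAME_FULL (-1)

/* Length of the parent portion of path, including its trailing slash. */
static int sketch_parent_len(const char *path)
{
    size_t len = strlen(path);
    const char *p;

    if (len == 0)
        return 0;

    p = path + len - 1;
    while (*p == '/' && p > path)   /* ignore trailing slashes */
        p--;
    while (*p != '/' && p > path)   /* walk back over the last component */
        p--;
    if (*p == '/')                  /* keep the separating slash in the parent */
        p++;
    return (int)(p - path);
}

/* Return 0 when dname equals the last component of path, non-zero otherwise. */
static int sketch_compare_dname_path(const char *dname, const char *path,
                                     int parentlen)
{
    size_t dlen = strlen(dname);
    const char *last;

    if (strlen(path) < dlen)
        return 1;
    if (parentlen == SKETCH_NAME_FULL)
        parentlen = sketch_parent_len(path);

    last = path + parentlen;
    if (strncmp(last, dname, dlen) != 0)
        return 1;
    last += dlen;
    while (*last == '/')            /* assumption: trailing slashes are ignored */
        last++;
    return *last != '\0';
}
```

A caller that already knows the parent length can pass it in to skip the scan; passing the sentinel asks the function to compute it.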
13.2 acct_auto_close_mnt

acct_auto_close_mnt - turn off a filesystem's accounting if it is on

Synopsis
void acct_auto_close_mnt (struct vfsmount * m);

Arguments
m: vfsmount being shut down

Description
If the accounting is turned on for a file in the subtree pointed to by m, turn accounting off. Done when m is about to die.

13.3 acct_auto_close

acct_auto_close - turn off a filesystem's accounting if it is on

Synopsis
void acct_auto_close (struct super_block * sb);

Arguments
sb: super block for the filesystem

Description
If the accounting is turned on for a file in the filesystem pointed to by sb, turn accounting off.

13.4 acct_collect

acct_collect - collect accounting information into pacct_struct

Synopsis
void acct_collect (long exitcode, int group_dead);

Arguments
exitcode: task exit code
group_dead: not 0, if this thread is the last one in the process.

13.5 acct_process

acct_process - now just a wrapper around acct_process_in_ns, which in turn is a wrapper around do_acct_process.

Synopsis
void acct_process ( void);

Arguments
void: no arguments

Description
Handles process accounting for an exiting task.

Chapter 14 Block Devices

14.1 blk_get_backing_dev_info

blk_get_backing_dev_info - get the address of a queue's backing_dev_info

Synopsis
struct backing_dev_info * blk_get_backing_dev_info (struct block_device * bdev);

Arguments
bdev: device

Description
Locates the passed device's request queue and returns the address of its backing_dev_info. Will return NULL if the request queue cannot be located.

14.2 blk_delay_queue

blk_delay_queue - restart queueing after defined interval

Synopsis
void blk_delay_queue (struct request_queue * q, unsigned long msecs);

Arguments
q: The struct request_queue in question
msecs: Delay in msecs

Description
Sometimes queueing needs to be postponed for a little while, to allow resources to come back.
This function will make sure that queueing is restarted around the specified time. Queue lock must be held.

14.3 blk_start_queue

blk_start_queue - restart a previously stopped queue

Synopsis
void blk_start_queue (struct request_queue * q);

Arguments
q: The struct request_queue in question

Description
blk_start_queue will clear the stop flag on the queue, and call the request_fn for the queue if it was in a stopped state when entered. Also see blk_stop_queue. Queue lock must be held.

14.4 blk_stop_queue

blk_stop_queue - stop a queue

Synopsis
void blk_stop_queue (struct request_queue * q);

Arguments
q: The struct request_queue in question

Description
The Linux block layer assumes that a block driver will consume all entries on the request queue when the request_fn strategy is called. Often this will not happen, because of hardware limitations (queue depth settings). If a device driver gets a queue full response, or if it simply chooses not to queue more I/O at one point, it can call this function to prevent the request_fn from being called until the driver has signalled it's ready to go again. This happens by calling blk_start_queue to restart queue operations. Queue lock must be held.

14.5 blk_sync_queue

blk_sync_queue - cancel any pending callbacks on a queue

Synopsis
void blk_sync_queue (struct request_queue * q);

Arguments
q: the queue

Description
The block layer may perform asynchronous callback activity on a queue, such as calling the unplug function after a timeout. A block device may call blk_sync_queue to ensure that any such activity is cancelled, thus allowing it to release resources that the callbacks might use. The caller must already have made sure that its ->make_request_fn will not re-add plugging prior to calling this function.
This function does not cancel any asynchronous activity arising out of elevator or throttling code. That would require elevator_exit and blkcg_exit_queue to be called with queue lock initialized.
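The interplay between blk_stop_queue and blk_start_queue (sections 14.3 and 14.4) can be illustrated with a counting model in userspace. Everything here is a hypothetical stand-in: struct toy_queue, its fields, and the toy_* functions are invented for illustration, the "locked" flag merely models the "queue lock must be held" rule, and calling the request function only when the queue was stopped follows the wording of the blk_start_queue description rather than any kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy queue: counters only, no real requests. */
struct toy_queue {
    bool locked;      /* models "queue lock must be held" */
    bool stopped;     /* models the stop flag */
    int pending;      /* requests waiting on the queue */
    int hw_slots;     /* free slots in the pretend hardware */
    int dispatched;   /* requests handed to the hardware */
};

/* Models blk_stop_queue: the request function is not called until restart. */
static void toy_stop_queue(struct toy_queue *q)
{
    assert(q->locked);
    q->stopped = true;
}

/* Toy request_fn: takes what the hardware can hold, then stops the queue. */
static void toy_request_fn(struct toy_queue *q)
{
    assert(q->locked);
    while (q->pending > 0 && q->hw_slots > 0) {
        q->pending--;
        q->hw_slots--;
        q->dispatched++;
    }
    if (q->pending > 0)
        toy_stop_queue(q);      /* hardware full: hold off further calls */
}

/* Running a stopped queue does nothing in this model. */
static void toy_run_queue(struct toy_queue *q)
{
    assert(q->locked);
    if (!q->stopped)
        toy_request_fn(q);
}

/* Models blk_start_queue: clear the stop flag, run if it had been stopped. */
static void toy_start_queue(struct toy_queue *q)
{
    bool was_stopped;

    assert(q->locked);
    was_stopped = q->stopped;
    q->stopped = false;
    if (was_stopped)
        toy_request_fn(q);
}

/* Toy completion: the hardware frees n slots and the driver restarts. */
static void toy_complete(struct toy_queue *q, int n)
{
    assert(q->locked);
    q->hw_slots += n;
    toy_start_queue(q);
}
```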
14.6 __blk_run_queue

__blk_run_queue - run a single device queue

Synopsis
void __blk_run_queue (struct request_queue * q);

Arguments
q: The queue to run

Description
See blk_run_queue. This variant must be called with the queue lock held and interrupts disabled.

14.7 blk_run_queue_async

blk_run_queue_async - run a single device queue in workqueue context

Synopsis
void blk_run_queue_async (struct request_queue * q);

Arguments
q: The queue to run

Description
Tells kblockd to perform the equivalent of blk_run_queue on behalf of us. The caller must hold the queue lock.

14.8 blk_run_queue

blk_run_queue - run a single device queue

Synopsis
void blk_run_queue (struct request_queue * q);

Arguments
q: The queue to run

Description
Invoke request handling on this queue, if it has pending work to do. May be used to restart queueing when a request has completed.

14.9 blk_queue_bypass_start

blk_queue_bypass_start - enter queue bypass mode

Synopsis
void blk_queue_bypass_start (struct request_queue * q);

Arguments
q: queue of interest

Description
In bypass mode, only the dispatch FIFO queue of q is used. This function makes q enter bypass mode and drains all requests which were throttled or issued before. On return, it's guaranteed that no request is being throttled or has ELVPRIV set and blk_queue_bypass is true inside queue or RCU read lock.

14.10 blk_queue_bypass_end

blk_queue_bypass_end - leave queue bypass mode

Synopsis
void blk_queue_bypass_end (struct request_queue * q);

Arguments
q: queue of interest

Description
Leave bypass mode and restore the normal queueing behavior.

14.11 blk_cleanup_queue

blk_cleanup_queue - shut down a request queue

Synopsis
void blk_cleanup_queue (struct request_queue * q);

Arguments
q: request queue to shut down

Description
Mark q DYING, drain all pending requests, mark q DEAD, destroy and put it. All future requests will be failed immediately with -ENODEV.
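The teardown order given for blk_cleanup_queue (DYING, drain, DEAD, then -ENODEV for later submissions) can be sketched as a tiny state machine. The enum life_state, struct life_queue, and both functions are hypothetical names for this userspace model; real draining waits for requests to complete rather than simply discarding a counter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lifecycle states; the description names DYING and DEAD. */
enum life_state { LIFE_LIVE, LIFE_DYING, LIFE_DEAD };

struct life_queue {
    enum life_state state;
    int pending;                /* requests still in flight */
};

/* Queue one request; once teardown has begun this fails with -ENODEV. */
static int life_submit(struct life_queue *q)
{
    if (q->state != LIFE_LIVE)
        return -ENODEV;
    q->pending++;
    return 0;
}

/* Models the order given for blk_cleanup_queue: DYING, drain, DEAD. */
static void life_cleanup(struct life_queue *q)
{
    q->state = LIFE_DYING;
    while (q->pending > 0)
        q->pending--;           /* pretend each pending request completes */
    q->state = LIFE_DEAD;
}
```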
14.12 blk_init_queue

blk_init_queue - prepare a request queue for use with a block device

Synopsis
struct request_queue * blk_init_queue (request_fn_proc * rfn, spinlock_t * lock);

Arguments
rfn: The function to be called to process requests that have been placed on the queue.
lock: Request queue spin lock

Description
If a block device wishes to use the standard request handling procedures, which sorts requests and coalesces adjacent requests, then it must call blk_init_queue. The function rfn will be called when there are requests on the queue that need to be processed. If the device supports plugging, then rfn may not be called immediately when requests are available on the queue, but may be called at some time later instead. Plugged queues are generally unplugged when a buffer belonging to one of the requests on the queue is needed, or due to memory pressure.
rfn is not required, or even expected, to remove all requests off the queue, but only as many as it can handle at a time. If it does leave requests on the queue, it is responsible for arranging that the requests get dealt with eventually.
The queue spin lock must be held while manipulating the requests on the request queue; this lock will be taken also from interrupt context, so irq disabling is needed for it.
Function returns a pointer to the initialized request queue, or NULL if it didn't succeed.

Note
blk_init_queue must be paired with a blk_cleanup_queue call when the block device is deactivated (such as at module unload).

14.13 blk_make_request

blk_make_request - given a bio, allocate a corresponding struct request.

Synopsis
struct request * blk_make_request (struct request_queue * q, struct bio * bio, gfp_t gfp_mask);

Arguments
q: target request queue
bio: The bio describing the memory mappings that will be submitted for IO. It may be a chained-bio properly constructed by block/bio layer.
gfp_mask: gfp flags to be used for memory allocation

Description
blk_make_request is the parallel of generic_make_request for BLOCK_PC type commands, where the struct request needs to be further initialized by the caller. It is passed a struct bio, which describes the memory info of the I/O transfer.
The caller of blk_make_request must make sure that bi_io_vec are set to describe the memory buffers, that bio_data_dir will return the needed direction of the request, and that all bios in the passed bio-chain are properly set accordingly.
If called under non-sleepable conditions, mapped bio buffers must not need bouncing; allocate them with the appropriate masked or flagged allocator, suitable for the target device. Otherwise the call to blk_queue_bounce will BUG.

WARNING
When allocating/cloning a bio-chain, careful consideration should be given to how you allocate bios. In particular, you cannot use __GFP_WAIT for anything but the first bio in the chain. Otherwise you risk waiting for IO completion of a bio that hasn't been submitted yet, thus resulting in a deadlock. Alternatively bios should be allocated using bio_kmalloc instead of bio_alloc, as that avoids the mempool deadlock. If possible a big IO should be split into smaller parts when allocation fails. Partial allocation should not be an error, or you risk a live-lock.

14.14 blk_rq_set_block_pc

blk_rq_set_block_pc - initialize a request to type BLOCK_PC

Synopsis
void blk_rq_set_block_pc (struct request * rq);

Arguments
rq: request to be initialized

14.15 blk_requeue_request

blk_requeue_request - put a request back on queue

Synopsis
void blk_requeue_request (struct request_queue * q, struct request * rq);

Arguments
q: request queue where request should be inserted
rq: request to be inserted

Description
Drivers often keep queueing requests until the hardware cannot accept more. When that condition happens, we need to put the request back on the queue. Must be called with queue lock held.
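The situation blk_requeue_request addresses, a driver that has already taken a request off the queue when the hardware turns out to be full, can be sketched with a small array-backed queue. This is a userspace model under stated assumptions: struct rq_fifo and the rq_* functions are hypothetical, requests are reduced to integer ids, and putting the request back at the head (so it is retried first) is an assumption of the model, since the description only says the request goes "back on the queue".

```c
#include <assert.h>

#define RQ_MAX 8

/* Hypothetical toy queue holding request ids; index 0 is the head. */
struct rq_fifo {
    int ids[RQ_MAX];
    int count;
};

/* Take the request at the head of the queue, or -1 if the queue is empty. */
static int rq_fetch(struct rq_fifo *q)
{
    int id;

    if (q->count == 0)
        return -1;
    id = q->ids[0];
    for (int i = 1; i < q->count; i++)
        q->ids[i - 1] = q->ids[i];
    q->count--;
    return id;
}

/* Put a fetched request back at the head so it is retried first. */
static void rq_requeue(struct rq_fifo *q, int id)
{
    assert(q->count < RQ_MAX);
    for (int i = q->count; i > 0; i--)
        q->ids[i] = q->ids[i - 1];
    q->ids[0] = id;
    q->count++;
}

/* Toy driver loop: issue until the pretend hardware refuses a request. */
static int rq_drive(struct rq_fifo *q, int hw_slots)
{
    int issued = 0;
    int id;

    while ((id = rq_fetch(q)) != -1) {
        if (hw_slots == 0) {        /* hardware cannot accept more */
            rq_requeue(q, id);
            break;
        }
        hw_slots--;
        issued++;
    }
    return issued;
}
```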
14.16 part_round_stats

part_round_stats - Round off the performance stats on a struct disk_stats.

Synopsis
void part_round_stats (int cpu, struct hd_struct * part);

Arguments
cpu: cpu number for stats access
part: target partition

Description
The average IO queue length and utilisation statistics are maintained by observing the current state of the queue length and the amount of time it has been in this state for.
Normally, that accounting is done on IO completion, but that can result in more than a second's worth of IO being accounted for within any one second, leading to >100% utilisation. To deal with that, we call this function to do a round-off before returning the results when reading /proc/diskstats. This accounts immediately for all queue usage up to the current jiffies and restarts the counters again.

14.17 blk_add_request_payload

blk_add_request_payload - add a payload to a request

Synopsis
void blk_add_request_payload (struct request * rq, struct page * page, unsigned int len);

Arguments
rq: request to update
page: page backing the payload
len: length of the payload.

Description
This allows a block driver to add a payload later to an already submitted request. The driver needs to take care of freeing the payload itself.
Note that this is a quite horrible hack and nothing but handling of discard requests should ever use it.

14.18 generic_make_request

generic_make_request - hand a buffer to its device driver for I/O

Synopsis
void generic_make_request (struct bio * bio);

Arguments
bio: The bio describing the location in memory and on the device.

Description
generic_make_request is used to make I/O requests of block devices. It is passed a struct bio, which describes the I/O that needs to be done.
generic_make_request does not return any status.
The success/failure status of the request, along with notification of completion, is delivered asynchronously through the bio->bi_end_io function described (one day) elsewhere.
The caller of generic_make_request must make sure that bi_io_vec are set to describe the memory buffer, and that bi_dev and bi_sector are set to describe the device address, and the bi_end_io and optionally bi_private are set to describe how completion notification should be signaled.
generic_make_request and the drivers it calls may use bi_next if this bio happens to be merged with someone else, and may resubmit the bio to a lower device by calling into generic_make_request recursively, which means the bio should NOT be touched after the call to ->make_request_fn.

14.19 submit_bio

submit_bio - submit a bio to the block device layer for I/O

Synopsis
void submit_bio (int rw, struct bio * bio);

Arguments
rw: whether to READ or WRITE, or maybe to READA (read ahead)
bio: The struct bio which describes the I/O

Description
submit_bio is very similar in purpose to generic_make_request, and uses that function to do most of the work. Both are fairly rough interfaces; bio must be presetup and ready for I/O.

14.20 blk_rq_check_limits

blk_rq_check_limits - Helper function to check a request for the queue limit

Synopsis
int blk_rq_check_limits (struct request_queue * q, struct request * rq);

Arguments
q: the queue
rq: the request being checked

Description
rq may have been made based on weaker limitations of upper-level queues in request stacking drivers, and it may violate the limitation of q. Since the block layer and the underlying device driver trust rq after it is inserted to q, it should be checked against q before the insertion using this generic function.
This function should also be useful for request stacking drivers in some cases below, so export this function. Request stacking drivers like request-based dm may change the queue limits while requests are in the queue (e.g.
dm's table swapping). Such request stacking drivers should check those requests against the new queue limits again when they dispatch those requests, although such checks are also done against the old queue limits when submitting requests.

14.21 blk_insert_cloned_request

blk_insert_cloned_request - Helper for stacking drivers to submit a request

Synopsis
int blk_insert_cloned_request (struct request_queue * q, struct request * rq);

Arguments
q: the queue to submit the request
rq: the request being queued

14.22 blk_rq_err_bytes

blk_rq_err_bytes - determine number of bytes till the next failure boundary

Synopsis
unsigned int blk_rq_err_bytes (const struct request * rq);

Arguments
rq: request to examine

Description
A request could be a merge of IOs which require different failure handling. This function determines the number of bytes which can be failed from the beginning of the request without crossing into an area which needs to be retried further.

Return
The number of bytes to fail.

Context
queue_lock must be held.

14.23 blk_peek_request

blk_peek_request - peek at the top of a request queue

Synopsis
struct request * blk_peek_request (struct request_queue * q);

Arguments
q: request queue to peek at

Description
Return the request at the top of q. The returned request should be started using blk_start_request before LLD starts processing it.

Return
Pointer to the request at the top of q if available. Null otherwise.

Context
queue_lock must be held.

14.24 blk_start_request

blk_start_request - start request processing on the driver

Synopsis
void blk_start_request (struct request * req);

Arguments
req: request to dequeue

Description
Dequeue req and start timeout timer on it. This hands off the request to the driver.
Block internal functions which don't want to start timer should call blk_dequeue_request.

Context
queue_lock must be held.
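The difference between peeking at a request and starting it (sections 14.23 and 14.24) can be made concrete with a linked-list model in userspace. All names here are hypothetical: struct toy_req, struct toy_rq_queue, and the toy_* functions are stand-ins, the "locked" flag only models the documented queue_lock context, and toy_start takes the queue as an extra argument because the toy request has no back-pointer to its queue (the documented blk_start_request takes only the request).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy request on a singly linked queue. */
struct toy_req {
    struct toy_req *next;
    bool timer_armed;       /* models the timeout timer being started */
};

struct toy_rq_queue {
    struct toy_req *head;
    bool locked;            /* models "queue_lock must be held" */
};

/* Models blk_peek_request: look at the head without removing it. */
static struct toy_req *toy_peek(struct toy_rq_queue *q)
{
    assert(q->locked);
    return q->head;
}

/* Models blk_start_request: dequeue req and start its timeout timer. */
static void toy_start(struct toy_rq_queue *q, struct toy_req *req)
{
    assert(q->locked);
    assert(q->head == req);
    q->head = req->next;
    req->next = NULL;
    req->timer_armed = true;
}

/* Peek and start in one step, returning NULL when the queue is empty. */
static struct toy_req *toy_fetch(struct toy_rq_queue *q)
{
    struct toy_req *req = toy_peek(q);

    if (req)
        toy_start(q, req);
    return req;
}
```

Peeking twice returns the same request; only starting it removes it from the queue and arms the timer.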
14.25 blk_fetch_request

blk_fetch_request - fetch a request from a request queue

Synopsis
struct request * blk_fetch_request (struct request_queue * q);

Arguments
q: request queue to fetch a request from

Description
Return the request at the top of q. The request is started on return and LLD can start processing it immediately.

Return
Pointer to the request at the top of q if available. Null otherwise.

Context
queue_lock must be held.

14.26 blk_update_request

blk_update_request - Special helper function for request stacking drivers

Synopsis
bool blk_update_request (struct request * req, int error, unsigned int nr_bytes);

Arguments
req: the request being processed
error: 0 for success, < 0 for error
nr_bytes: number of bytes to complete req

Description
Ends I/O on a number of bytes attached to req, but doesn't complete the request structure even if req doesn't have leftover. If req has leftover, sets it up for the next range of segments.
This special helper function is only for request stacking drivers (e.g. request-based dm) so that they can handle partial completion. Actual device drivers should use blk_end_request instead.
Passing the result of blk_rq_bytes as nr_bytes guarantees false return from this function.

Return
false - this request doesn't have any more data
true - this request has more data

14.27 blk_unprep_request

blk_unprep_request - unprepare a request

Synopsis
void blk_unprep_request (struct request * req);

Arguments
req: the request

Description
This function makes a request ready for complete resubmission (or completion). It happens only after all error handling is complete, so represents the appropriate moment to deallocate any resources that were allocated to the request in the prep_rq_fn. The queue lock is held when calling this.

14.28 blk_end_request

blk_end_request - Helper function for drivers to complete the request.
Synopsis
bool blk_end_request (struct request * rq, int error, unsigned int nr_bytes);

Arguments
rq: the request being processed
error: 0 for success, < 0 for error
nr_bytes: number of bytes to complete

Description
Ends I/O on a number of bytes attached to rq. If rq has leftover, sets it up for the next range of segments.

Return
false - we are done with this request
true - still buffers pending for this request

14.29 blk_end_request_all

blk_end_request_all - Helper function for drivers to finish the request.

Synopsis
void blk_end_request_all (struct request * rq, int error);

Arguments
rq: the request to finish
error: 0 for success, < 0 for error

Description
Completely finish rq.

14.30 blk_end_request_cur

blk_end_request_cur - Helper function to finish the current request chunk.

Synopsis
bool blk_end_request_cur (struct request * rq, int error);

Arguments
rq: the request to finish the current chunk for
error: 0 for success, < 0 for error

Description
Complete the current consecutively mapped chunk from rq.

Return
false - we are done with this request
true - still buffers pending for this request

14.31 blk_end_request_err

blk_end_request_err - Finish a request till the next failure boundary.

Synopsis
bool blk_end_request_err (struct request * rq, int error);

Arguments
rq: the request to finish till the next failure boundary for
error: must be negative errno

Description
Complete rq till the next failure boundary.

Return
false - we are done with this request
true - still buffers pending for this request

14.32 __blk_end_request

__blk_end_request - Helper function for drivers to complete the request.

Synopsis
bool __blk_end_request (struct request * rq, int error, unsigned int nr_bytes);

Arguments
rq: the request being processed
error: 0 for success, < 0 for error
nr_bytes: number of bytes to complete

Description
Must be called with queue lock held, unlike blk_end_request.
Return
false - we are done with this request
true - still buffers pending for this request

14.33 __blk_end_request_all

__blk_end_request_all - Helper function for drivers to finish the request.

Synopsis
void __blk_end_request_all (struct request * rq, int error);

Arguments
rq: the request to finish
error: 0 for success, < 0 for error

Description
Completely finish rq. Must be called with queue lock held.

14.34 __blk_end_request_cur

__blk_end_request_cur - Helper function to finish the current request chunk.

Synopsis
bool __blk_end_request_cur (struct request * rq, int error);

Arguments
rq: the request to finish the current chunk for
error: 0 for success, < 0 for error

Description
Complete the current consecutively mapped chunk from rq. Must be called with queue lock held.

Return
false - we are done with this request
true - still buffers pending for this request

14.35 __blk_end_request_err

__blk_end_request_err - Finish a request till the next failure boundary.

Synopsis
bool __blk_end_request_err (struct request * rq, int error);

Arguments
rq: the request to finish till the next failure boundary for
error: must be negative errno

Description
Complete rq till the next failure boundary. Must be called with queue lock held.

Return
false - we are done with this request
true - still buffers pending for this request

14.36 rq_flush_dcache_pages

rq_flush_dcache_pages - Helper function to flush all pages in a request

Synopsis
void rq_flush_dcache_pages (struct request * rq);

Arguments
rq: the request to be flushed

Description
Flush all pages in rq.

14.37 blk_lld_busy

blk_lld_busy - Check if underlying low-level drivers of a device are busy

Synopsis
int blk_lld_busy (struct request_queue * q);

Arguments
q: the queue of the device being checked

Description
Check if underlying low-level drivers of a device are busy. If the drivers want to export their busy state, they must first set their own exporting function using blk_queue_lld_busy.
Basically, this function is used only by request stacking drivers to stop dispatching requests to underlying devices when underlying devices are busy. This behavior helps more I/O merging on the queue of the request stacking driver and prevents I/O throughput regression on burst I/O load.

Return
0 - Not busy (The request stacking driver should dispatch request)
1 - Busy (The request stacking driver should stop dispatching request)

14.38 blk_rq_unprep_clone

blk_rq_unprep_clone - Helper function to free all bios in a cloned request

Synopsis
void blk_rq_unprep_clone (struct request * rq);

Arguments
rq: the clone request to be cleaned up

Description
Free all bios in rq for a cloned request.

14.39 blk_rq_prep_clone

blk_rq_prep_clone - Helper function to setup clone request

Synopsis
int blk_rq_prep_clone (struct request * rq, struct request * rq_src, struct bio_set * bs, gfp_t gfp_mask, int (*bio_ctr) (struct bio *, struct bio *, void *), void * data);

Arguments
rq: the request to be setup
rq_src: original request to be cloned
bs: bio_set that bios for clone are allocated from
gfp_mask: memory allocation mask for bio
bio_ctr: setup function to be called for each clone bio. Returns 0 for success, non 0 for failure.
data: private data to be passed to bio_ctr

Description
Clones bios in rq_src to rq, and copies attributes of rq_src to rq. The actual data parts of rq_src (e.g. ->cmd, ->sense) are not copied, and copying such parts is the caller's responsibility. Also, pages which the original bios are pointing to are not copied and the cloned bios just point to the same pages. So cloned bios must be completed before original bios, which means the caller must complete rq before rq_src.
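The cloning contract of blk_rq_prep_clone, where each clone bio shares the pages of its original and a caller-supplied bio_ctr callback may veto the clone, can be sketched in userspace. The structures toy_bio and toy_request, the callback type toy_bio_ctr, and both functions are hypothetical stand-ins. Two assumptions are labeled in the code: that the callback receives the clone first and the original second, and that a failure is reported as -1 (the documentation does not name the error value).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BIOS 4

/* Hypothetical stand-ins for struct bio and struct request. */
struct toy_bio {
    void *page;             /* data page the bio points at */
    void *private_data;     /* per-bio data set by the callback */
};

struct toy_request {
    struct toy_bio bios[TOY_MAX_BIOS];
    int nr_bios;
};

/* Assumed argument order: clone bio, original bio, caller's private data. */
typedef int (*toy_bio_ctr)(struct toy_bio *clone, struct toy_bio *src,
                           void *data);

/* Clone the bios of src into clone; pages are shared, not copied. */
static int toy_prep_clone(struct toy_request *clone, struct toy_request *src,
                          toy_bio_ctr ctr, void *data)
{
    clone->nr_bios = 0;
    for (int i = 0; i < src->nr_bios; i++) {
        clone->bios[i].page = src->bios[i].page;
        clone->bios[i].private_data = NULL;
        if (ctr && ctr(&clone->bios[i], &src->bios[i], data) != 0) {
            clone->nr_bios = 0;     /* undo, as an unprep step would */
            return -1;              /* assumed error value for the model */
        }
        clone->nr_bios++;
    }
    return 0;
}

/* Example callback: tag every clone bio with the caller's private data. */
static int toy_tag_ctr(struct toy_bio *clone, struct toy_bio *src, void *data)
{
    (void)src;
    clone->private_data = data;
    return 0;
}
```

Because the clone and the original point at the same pages, the model makes it easy to see why the clone has to be completed before the original is.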
14.40 blk_start_plug

blk_start_plug - initialize blk_plug and track it inside the task_struct

Synopsis
void blk_start_plug (struct blk_plug * plug);

Arguments
plug: The struct blk_plug that needs to be initialized

Description
Tracking blk_plug inside the task_struct will help with auto-flushing the pending I/O should the task end up blocking between blk_start_plug and blk_finish_plug. This is important from a performance perspective, but also ensures that we don't deadlock. For instance, if the task is blocking for a memory allocation, memory reclaim could end up wanting to free a page belonging to that request that is currently residing in our private plug. By flushing the pending I/O when the process goes to sleep, we avoid this kind of deadlock.

14.41 blk_pm_runtime_init

blk_pm_runtime_init - Block layer runtime PM initialization routine

Synopsis
void blk_pm_runtime_init (struct request_queue * q, struct device * dev);

Arguments
q: the queue of the device
dev: the device the queue belongs to

Description
Initialize runtime-PM-related fields for q and start auto suspend for dev. Drivers that want to take advantage of request-based runtime PM should call this function after dev has been initialized, and its request queue q has been allocated, and runtime PM for it cannot happen yet (either because it is disabled/forbidden or because its usage_count > 0). In most cases, the driver should call this function before any I/O has taken place.
This function takes care of setting up auto suspend for the device. The autosuspend delay is set to -1 to make runtime suspend impossible until an updated value is set either by the user or by the driver. Drivers do not need to touch other autosuspend settings.
The block layer runtime PM is request based, so it only works for drivers that use requests as their IO unit, not for those that use bios directly.
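The auto-flush behaviour described for blk_start_plug (section 14.40) can be modelled with two counters: I/O parked in a per-task plug and I/O actually submitted. The types toy_plug and toy_task and all toy_* functions are hypothetical names for this userspace sketch; nested plugs and real request lists are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a plug counts parked I/O, a task tracks its plug. */
struct toy_plug { int held; };
struct toy_task {
    struct toy_plug *plug;  /* plug tracked inside the task, or NULL */
    int submitted;          /* I/O that reached the device queue */
};

/* Models blk_start_plug: initialize the plug and track it in the task. */
static void toy_start_plug(struct toy_task *t, struct toy_plug *p)
{
    p->held = 0;
    t->plug = p;
}

/* Move everything parked in the task's plug to the device queue. */
static void toy_flush_plug(struct toy_task *t)
{
    if (t->plug) {
        t->submitted += t->plug->held;
        t->plug->held = 0;
    }
}

/* Issue one I/O: parked in the plug if there is one, else submitted. */
static void toy_queue_io(struct toy_task *t)
{
    if (t->plug)
        t->plug->held++;
    else
        t->submitted++;
}

/* The task is about to block: auto-flush so no I/O is stranded. */
static void toy_task_sleep(struct toy_task *t)
{
    toy_flush_plug(t);
}

/* Models blk_finish_plug: flush what is left and stop tracking the plug. */
static void toy_finish_plug(struct toy_task *t, struct toy_plug *p)
{
    toy_flush_plug(t);
    if (t->plug == p)
        t->plug = NULL;
}
```

The flush in toy_task_sleep is the step that avoids the deadlock described above: nothing a sleeping task has parked can remain invisible to the rest of the system.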
14.42 blk_pre_runtime_suspend

blk_pre_runtime_suspend - Pre runtime suspend check

Synopsis
int blk_pre_runtime_suspend (struct request_queue * q);

Arguments
q: the queue of the device

Description
This function will check if runtime suspend is allowed for the device by examining if there are any requests pending in the queue. If there are requests pending, the device cannot be runtime suspended; otherwise, the queue's status will be updated to SUSPENDING and the driver can proceed to suspend the device.
For the not allowed case, we mark last busy for the device so that runtime PM core will try to autosuspend it some time later.
This function should be called near the start of the device's runtime_suspend callback.

Return
0 - OK to runtime suspend the device
-EBUSY - Device should not be runtime suspended

14.43 blk_post_runtime_suspend

blk_post_runtime_suspend - Post runtime suspend processing

Synopsis
void blk_post_runtime_suspend (struct request_queue * q, int err);

Arguments
q: the queue of the device
err: return value of the device's runtime_suspend function

Description
Update the queue's runtime status according to the return value of the device's runtime suspend function and mark last busy for the device so that PM core will try to auto suspend the device at a later time.
This function should be called near the end of the device's runtime_suspend callback.

14.44 blk_pre_runtime_resume

blk_pre_runtime_resume - Pre runtime resume processing

Synopsis
void blk_pre_runtime_resume (struct request_queue * q);

Arguments
q: the queue of the device

Description
Update the queue's runtime status to RESUMING in preparation for the runtime resume of the device.
This function should be called near the start of the device's runtime_resume callback.
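The calling pattern for the suspend-side helpers (pre check near the start of the callback, post processing near the end) can be sketched as a small state machine. This is a userspace model: enum toy_rpm, struct toy_pm_queue, and the toy_* functions are hypothetical, and the mapping used in toy_post_runtime_suspend (success leads to SUSPENDED, failure returns to ACTIVE) is an assumption, since the text only says the status is updated "according to the return value".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical runtime PM states for the model. */
enum toy_rpm {
    TOY_RPM_ACTIVE,
    TOY_RPM_SUSPENDING,
    TOY_RPM_SUSPENDED,
    TOY_RPM_RESUMING
};

struct toy_pm_queue {
    enum toy_rpm status;
    int nr_pending;         /* requests pending in the queue */
};

/* Models blk_pre_runtime_suspend: refuse while requests are pending. */
static int toy_pre_runtime_suspend(struct toy_pm_queue *q)
{
    if (q->nr_pending > 0)
        return -EBUSY;
    q->status = TOY_RPM_SUSPENDING;
    return 0;
}

/* Models blk_post_runtime_suspend; the status mapping is an assumption. */
static void toy_post_runtime_suspend(struct toy_pm_queue *q, int err)
{
    q->status = err ? TOY_RPM_ACTIVE : TOY_RPM_SUSPENDED;
}

/* Models blk_pre_runtime_resume: mark the queue as RESUMING. */
static void toy_pre_runtime_resume(struct toy_pm_queue *q)
{
    q->status = TOY_RPM_RESUMING;
}

/* Hypothetical driver runtime_suspend callback built from the helpers. */
static int toy_driver_runtime_suspend(struct toy_pm_queue *q, int hw_err)
{
    int ret = toy_pre_runtime_suspend(q);   /* near the start */

    if (ret)
        return ret;
    /* ... the real driver would power down the hardware here ... */
    toy_post_runtime_suspend(q, hw_err);    /* near the end */
    return hw_err;
}
```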
14.45 blk_post_runtime_resume

blk_post_runtime_resume - Post runtime resume processing

Synopsis
void blk_post_runtime_resume (struct request_queue * q, int err);

Arguments
q: the queue of the device
err: return value of the device's runtime_resume function

Description
Update the queue's runtime status according to the return value of the device's runtime_resume function. If the device is successfully resumed, process the requests that were queued into the device's queue while it was resuming, then mark last busy and initiate autosuspend for it.

This function should be called near the end of the device's runtime_resume callback.

14.46 __blk_run_queue_uncond

__blk_run_queue_uncond - run a queue whether or not it has been stopped

Synopsis
void __blk_run_queue_uncond (struct request_queue * q);

Arguments
q: The queue to run

Description
Invoke request handling on a queue if there are any pending requests. May be used to restart request handling after a request has completed. This variant runs the queue whether or not the queue has been stopped. Must be called with the queue lock held and interrupts disabled. See also blk_run_queue.

14.47 __blk_drain_queue

__blk_drain_queue - drain requests from request_queue

Synopsis
void __blk_drain_queue (struct request_queue * q, bool drain_all);

Arguments
q: queue to drain
drain_all: whether to drain all requests or only the ones w/ ELVPRIV

Description
Drain requests from q. If drain_all is set, all requests are drained. If not, only ELVPRIV requests are drained. The caller is responsible for ensuring that no new requests which need to be drained are queued.

14.48 rq_ioc

rq_ioc - determine io_context for request allocation

Synopsis
struct io_context * rq_ioc (struct bio * bio);

Arguments
bio: request being allocated is for this bio (can be NULL)

Description
Determine io_context to use for request allocation for bio. May return NULL if current->io_context doesn't exist.
14.49 __get_request

__get_request - get a free request

Synopsis
struct request * __get_request (struct request_list * rl, int rw_flags, struct bio * bio, gfp_t gfp_mask);

Arguments
rl: request list to allocate from
rw_flags: RW and SYNC flags
bio: bio to allocate request for (can be NULL)
gfp_mask: allocation mask

Description
Get a free request from q. This function may fail under memory pressure or if q is dead.

Must be called with q->queue_lock held. Returns NULL on failure, with q->queue_lock held. Returns !NULL on success, with q->queue_lock *not held*.

14.50 get_request

get_request - get a free request

Synopsis
struct request * get_request (struct request_queue * q, int rw_flags, struct bio * bio, gfp_t gfp_mask);

Arguments
q: request_queue to allocate request from
rw_flags: RW and SYNC flags
bio: bio to allocate request for (can be NULL)
gfp_mask: allocation mask

Description
Get a free request from q. If __GFP_WAIT is set in gfp_mask, this function keeps retrying under memory pressure and fails iff q is dead.

Must be called with q->queue_lock held. Returns NULL on failure, with q->queue_lock held. Returns !NULL on success, with q->queue_lock *not held*.

14.51 blk_attempt_plug_merge

blk_attempt_plug_merge - try to merge with current's plugged list

Synopsis
bool blk_attempt_plug_merge (struct request_queue * q, struct bio * bio, unsigned int * request_count);

Arguments
q: request_queue new bio is being queued at
bio: new bio being queued
request_count: out parameter for number of traversed plugged requests

Description
Determine whether bio being queued on q can be merged with a request on current's plugged list. Returns true if merge was successful, otherwise false.

Plugging coalesces IOs from the same issuer for the same purpose without going through q->queue_lock. As such it's more of an issuing mechanism than scheduling, and the request, while it may have elvpriv data, is not added to the elevator at this point.
In addition, we don't have reliable access to the elevator outside queue lock. Only check basic merging parameters without querying the elevator.

Caller must ensure !blk_queue_nomerges(q) beforehand.

14.52 blk_end_bidi_request

blk_end_bidi_request - Complete a bidi request

Synopsis
bool blk_end_bidi_request (struct request * rq, int error, unsigned int nr_bytes, unsigned int bidi_bytes);

Arguments
rq: the request to complete
error: 0 for success, < 0 for error
nr_bytes: number of bytes to complete rq
bidi_bytes: number of bytes to complete rq->next_rq

Description
Ends I/O on a number of bytes attached to rq and rq->next_rq. Drivers that support bidi can safely call this function for any type of request, bidi or uni. In the latter case bidi_bytes is just ignored.

Return
false - we are done with this request
true - still buffers pending for this request

14.53 __blk_end_bidi_request

__blk_end_bidi_request - Complete a bidi request with queue lock held

Synopsis
bool __blk_end_bidi_request (struct request * rq, int error, unsigned int nr_bytes, unsigned int bidi_bytes);

Arguments
rq: the request to complete
error: 0 for success, < 0 for error
nr_bytes: number of bytes to complete rq
bidi_bytes: number of bytes to complete rq->next_rq

Description
Identical to blk_end_bidi_request except that queue lock is assumed to be locked on entry and remains so on return.
Return
false - we are done with this request
true - still buffers pending for this request

14.54 blk_rq_map_user

blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage

Synopsis
int blk_rq_map_user (struct request_queue * q, struct request * rq, struct rq_map_data * map_data, void __user * ubuf, unsigned long len, gfp_t gfp_mask);

Arguments
q: request queue where request should be inserted
rq: request structure to fill
map_data: pointer to the rq_map_data holding pages (if necessary)
ubuf: the user buffer
len: length of user data
gfp_mask: memory allocation flags

Description
Data will be mapped directly for zero copy I/O, if possible. Otherwise a kernel bounce buffer is used.

A matching blk_rq_unmap_user must be issued at the end of I/O, while still in process context.

Note
The mapped bio may need to be bounced through blk_queue_bounce before being submitted to the device, as pages mapped may be out of reach. It's the caller's responsibility to make sure this happens. The original bio must be passed back in to blk_rq_unmap_user for proper unmapping.

14.55 blk_rq_map_user_iov

blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage

Synopsis
int blk_rq_map_user_iov (struct request_queue * q, struct request * rq, struct rq_map_data * map_data, const struct sg_iovec * iov, int iov_count, unsigned int len, gfp_t gfp_mask);

Arguments
q: request queue where request should be inserted
rq: request to map data to
map_data: pointer to the rq_map_data holding pages (if necessary)
iov: pointer to the iovec
iov_count: number of elements in the iovec
len: I/O byte count
gfp_mask: memory allocation flags

Description
Data will be mapped directly for zero copy I/O, if possible. Otherwise a kernel bounce buffer is used.

A matching blk_rq_unmap_user must be issued at the end of I/O, while still in process context.
Note
The mapped bio may need to be bounced through blk_queue_bounce before being submitted to the device, as pages mapped may be out of reach. It's the caller's responsibility to make sure this happens. The original bio must be passed back in to blk_rq_unmap_user for proper unmapping.

14.56 blk_rq_unmap_user

blk_rq_unmap_user - unmap a request with user data

Synopsis
int blk_rq_unmap_user (struct bio * bio);

Arguments
bio: start of bio list

Description
Unmap a rq previously mapped by blk_rq_map_user. The caller must supply the original rq->bio from the blk_rq_map_user return, since the I/O completion may have changed rq->bio.

14.57 blk_rq_map_kern

blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage

Synopsis
int blk_rq_map_kern (struct request_queue * q, struct request * rq, void * kbuf, unsigned int len, gfp_t gfp_mask);

Arguments
q: request queue where request should be inserted
rq: request to fill
kbuf: the kernel buffer
len: length of user data
gfp_mask: memory allocation flags

Description
Data will be mapped directly if possible. Otherwise a bounce buffer is used. Can be called multiple times to append multiple buffers.

14.58 blk_release_queue

blk_release_queue - release a struct request_queue when it is no longer needed

Synopsis
void blk_release_queue (struct kobject * kobj);

Arguments
kobj: the kobj belonging to the request queue to be released

Description
blk_release_queue is the pair to blk_init_queue or blk_queue_make_request. It should be called when a request queue is being released, typically when a block device is being de-registered. Currently, its primary task is to free all the struct request structures that were allocated to the queue and the queue itself.

Caveat
Hopefully the low level driver will have finished any outstanding requests first...
14.59 blk_queue_prep_rq

blk_queue_prep_rq - set a prepare_request function for queue

Synopsis
void blk_queue_prep_rq (struct request_queue * q, prep_rq_fn * pfn);

Arguments
q: queue
pfn: prepare_request function

Description
It's possible for a queue to register a prepare_request callback which is invoked before the request is handed to the request_fn. The goal of the function is to prepare a request for I/O; it can be used, for instance, to build a cdb from the request data.

14.60 blk_queue_unprep_rq

blk_queue_unprep_rq - set an unprepare_request function for queue

Synopsis
void blk_queue_unprep_rq (struct request_queue * q, unprep_rq_fn * ufn);

Arguments
q: queue
ufn: unprepare_request function

Description
It's possible for a queue to register an unprepare_request callback which is invoked before the request is finally completed. The goal of the function is to deallocate any data that was allocated in the prepare_request callback.

14.61 blk_queue_merge_bvec

blk_queue_merge_bvec - set a merge_bvec function for queue

Synopsis
void blk_queue_merge_bvec (struct request_queue * q, merge_bvec_fn * mbfn);

Arguments
q: queue
mbfn: merge_bvec_fn

Description
Usually queues have static limitations on the max sectors or segments that we can put in a request. Stacking drivers may have some settings that are dynamic, and thus we have to query the queue whether it is ok to add a new bio_vec to a bio at a given offset or not. If the block device has such limitations, it needs to register a merge_bvec_fn to control the size of bios sent to it. Note that a block device *must* allow a single page to be added to an empty bio. The block device driver may want to use the bio_split function to deal with these bios. By default no merge_bvec_fn is defined for a queue, and only the fixed limits are honored.
14.62 blk_set_default_limits

blk_set_default_limits - reset limits to default values

Synopsis
void blk_set_default_limits (struct queue_limits * lim);

Arguments
lim: the queue_limits structure to reset

Description
Returns a queue_limit struct to its default state.

14.63 blk_set_stacking_limits

blk_set_stacking_limits - set default limits for stacking devices

Synopsis
void blk_set_stacking_limits (struct queue_limits * lim);

Arguments
lim: the queue_limits structure to reset

Description
Returns a queue_limit struct to its default state. Should be used by stacking drivers like DM that have no internal limits.

14.64 blk_queue_make_request

blk_queue_make_request - define an alternate make_request function for a device

Synopsis
void blk_queue_make_request (struct request_queue * q, make_request_fn * mfn);

Arguments
q: the request queue for the device to be affected
mfn: the alternate make_request function

Description
The normal way for struct bios to be passed to a device driver is for them to be collected into requests on a request queue, and then to allow the device driver to select requests off that queue when it is ready. This works well for many block devices. However some block devices (typically virtual devices such as md or lvm) do not benefit from the processing on the request queue, and are served best by having the requests passed directly to them. This can be achieved by providing a function to blk_queue_make_request.

Caveat
The driver that does this *must* be able to deal appropriately with buffers in high memory. This can be accomplished by either calling __bio_kmap_atomic to get a temporary kernel mapping, or by calling blk_queue_bounce to create a buffer in normal memory.
14.65 blk_queue_bounce_limit

blk_queue_bounce_limit - set bounce buffer limit for queue

Synopsis
void blk_queue_bounce_limit (struct request_queue * q, u64 max_addr);

Arguments
q: the request queue for the device
max_addr: the maximum address the device can handle

Description
Different hardware can have different requirements as to what pages it can do I/O directly to. A low level driver can call blk_queue_bounce_limit to have lower memory pages allocated as bounce buffers for doing I/O to pages residing above max_addr.

14.66 blk_limits_max_hw_sectors

blk_limits_max_hw_sectors - set hard and soft limit of max sectors for request

Synopsis
void blk_limits_max_hw_sectors (struct queue_limits * limits, unsigned int max_hw_sectors);

Arguments
limits: the queue limits
max_hw_sectors: max hardware sectors in the usual 512b unit

Description
Enables a low level driver to set a hard upper limit, max_hw_sectors, on the size of requests. max_hw_sectors is set by the device driver based upon the combined capabilities of I/O controller and storage device.

max_sectors is a soft limit imposed by the block layer for filesystem type requests. This value can be overridden on a per-device basis in /sys/block/<device>/queue/max_sectors_kb. The soft limit cannot exceed max_hw_sectors.

14.67 blk_queue_max_hw_sectors

blk_queue_max_hw_sectors - set max sectors for a request for this queue

Synopsis
void blk_queue_max_hw_sectors (struct request_queue * q, unsigned int max_hw_sectors);

Arguments
q: the request queue for the device
max_hw_sectors: max hardware sectors in the usual 512b unit

Description
See description for blk_limits_max_hw_sectors.
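The relationship between the hard limit max_hw_sectors and the soft limit max_sectors described for blk_limits_max_hw_sectors can be sketched in userspace C. The struct, the function names, and the soft_default parameter below are hypothetical; the only facts taken from the text are that both limits are counted in 512-byte sectors, that the sysfs file is expressed in kilobytes, and that the soft limit may not exceed the hard one.

```c
#include <assert.h>

/* Hypothetical model of the two limits; both count 512-byte sectors. */
struct model_sector_limits {
    unsigned int max_hw_sectors;   /* hard limit set by the driver */
    unsigned int max_sectors;      /* soft limit used for filesystem requests */
};

/* Set the hard limit and clamp the soft limit to it.
 * soft_default is an assumed block-layer default, not a kernel constant. */
static void model_set_max_hw_sectors(struct model_sector_limits *lim,
                                     unsigned int max_hw_sectors,
                                     unsigned int soft_default)
{
    assert(lim);
    lim->max_hw_sectors = max_hw_sectors;
    lim->max_sectors = soft_default < max_hw_sectors ? soft_default
                                                     : max_hw_sectors;
}

/* Model of a write to max_sectors_kb: 1 KiB is two 512-byte sectors.
 * Returns 0 on success, -1 if the value would exceed the hard limit. */
static int model_store_max_sectors_kb(struct model_sector_limits *lim,
                                      unsigned int kb)
{
    assert(lim);
    if (kb > lim->max_hw_sectors / 2)
        return -1;
    lim->max_sectors = kb * 2;
    return 0;
}
```

With a hard limit of 1024 sectors, for example, the model accepts 256 (KiB) and rejects 1024 (KiB), since the latter would mean 2048 sectors.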
14.68 blk_queue_chunk_sectors

blk_queue_chunk_sectors - set size of the chunk for this queue

Synopsis
void blk_queue_chunk_sectors (struct request_queue * q, unsigned int chunk_sectors);

Arguments
q: the request queue for the device
chunk_sectors: chunk sectors in the usual 512b unit

Description
If a driver doesn't want IOs to cross a given chunk size, it can set this limit and prevent merging across chunks. Note that the chunk size must currently be a power-of-2 in sectors. Also note that the block layer must accept a page worth of data at any offset. So if the crossing of chunks is a hard limitation in the driver, it must still be prepared to split single page bios.

14.69 blk_queue_max_discard_sectors

blk_queue_max_discard_sectors - set max sectors for a single discard

Synopsis
void blk_queue_max_discard_sectors (struct request_queue * q, unsigned int max_discard_sectors);

Arguments
q: the request queue for the device
max_discard_sectors: maximum number of sectors to discard

14.70 blk_queue_max_write_same_sectors

blk_queue_max_write_same_sectors - set max sectors for a single write same

Synopsis
void blk_queue_max_write_same_sectors (struct request_queue * q, unsigned int max_write_same_sectors);

Arguments
q: the request queue for the device
max_write_same_sectors: maximum number of sectors to write per command

14.71 blk_queue_max_segments

blk_queue_max_segments - set max hw segments for a request for this queue

Synopsis
void blk_queue_max_segments (struct request_queue * q, unsigned short max_segments);

Arguments
q: the request queue for the device
max_segments: max number of segments

Description
Enables a low level driver to set an upper limit on the number of hw data segments in a request.
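The power-of-2 requirement on chunk_sectors in blk_queue_chunk_sectors makes the boundary arithmetic a simple mask operation. The following userspace sketch shows one way to test whether an I/O would cross a chunk boundary; the helper names are hypothetical and are not claimed to match any function in the block layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when n is a non-zero power of two. */
static bool model_is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Sectors remaining from 'start' up to the end of its chunk. */
static uint32_t model_sectors_to_chunk_end(uint64_t start,
                                           uint32_t chunk_sectors)
{
    assert(model_is_power_of_two(chunk_sectors));
    return chunk_sectors - (uint32_t)(start & (chunk_sectors - 1));
}

/* True when an I/O of nr_sectors starting at 'start' spans two chunks. */
static bool model_crosses_chunk(uint64_t start, uint32_t nr_sectors,
                                uint32_t chunk_sectors)
{
    return nr_sectors > model_sectors_to_chunk_end(start, chunk_sectors);
}
```

A driver with a hard chunk limitation could use a check of this shape to decide where a single-page bio has to be split.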
14.72 blk_queue_max_segment_size

blk_queue_max_segment_size - set max segment size for blk_rq_map_sg

Synopsis
void blk_queue_max_segment_size (struct request_queue * q, unsigned int max_size);

Arguments
q: the request queue for the device
max_size: max size of segment in bytes

Description
Enables a low level driver to set an upper limit on the size of a coalesced segment.

14.73 blk_queue_logical_block_size

blk_queue_logical_block_size - set logical block size for the queue

Synopsis
void blk_queue_logical_block_size (struct request_queue * q, unsigned short size);

Arguments
q: the request queue for the device
size: the logical block size, in bytes

Description
This should be set to the lowest possible block size that the storage device can address. The default of 512 covers most hardware.

14.74 blk_queue_physical_block_size

blk_queue_physical_block_size - set physical block size for the queue

Synopsis
void blk_queue_physical_block_size (struct request_queue * q, unsigned int size);

Arguments
q: the request queue for the device
size: the physical block size, in bytes

Description
This should be set to the lowest possible sector size that the hardware can operate on without reverting to read-modify-write operations.

14.75 blk_queue_alignment_offset

blk_queue_alignment_offset - set physical block alignment offset

Synopsis
void blk_queue_alignment_offset (struct request_queue * q, unsigned int offset);

Arguments
q: the request queue for the device
offset: alignment offset in bytes

Description
Some devices are naturally misaligned to compensate for things like the legacy DOS partition table 63-sector offset. Low-level drivers should call this function for devices whose first sector is not naturally aligned.
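To make the 63-sector case from blk_queue_alignment_offset concrete, the sketch below computes how far a given start sector is from the next physically aligned position. It assumes, as an interpretation rather than something stated above, that a sector is aligned when its byte position modulo the physical block size equals the alignment offset. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u

/* Bytes from 'sector' to the next physically aligned position
 * (0 means the sector itself is aligned).
 * phys_block: physical block size in bytes, a multiple of 512.
 * align_off:  alignment offset in bytes, under the assumption above. */
static uint32_t model_misalignment(uint64_t sector, uint32_t phys_block,
                                   uint32_t align_off)
{
    assert(phys_block != 0 && phys_block % MODEL_SECTOR_SIZE == 0);
    uint32_t pos = (uint32_t)((sector * MODEL_SECTOR_SIZE) % phys_block);
    return (phys_block + align_off % phys_block - pos) % phys_block;
}
```

For a 4096-byte physical block and an offset of 3584 bytes, sector 63 comes out aligned, which is the situation the DOS-compatibility offset is meant to produce. With an offset of 0, sector 63 is 512 bytes short of alignment.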
14.76 blk_limits_io_min

blk_limits_io_min - set minimum request size for a device

Synopsis
void blk_limits_io_min (struct queue_limits * limits, unsigned int min);

Arguments
limits: the queue limits
min: smallest I/O size in bytes

Description
Some devices have an internal block size bigger than the reported hardware sector size. This function can be used to signal the smallest I/O the device can perform without incurring a performance penalty.

14.77 blk_queue_io_min

blk_queue_io_min - set minimum request size for the queue

Synopsis
void blk_queue_io_min (struct request_queue * q, unsigned int min);

Arguments
q: the request queue for the device
min: smallest I/O size in bytes

Description
Storage devices may report a granularity or preferred minimum I/O size which is the smallest request the device can perform without incurring a performance penalty. For disk drives this is often the physical block size. For RAID arrays it is often the stripe chunk size. A properly aligned multiple of minimum_io_size is the preferred request size for workloads where a high number of I/O operations is desired.

14.78 blk_limits_io_opt

blk_limits_io_opt - set optimal request size for a device

Synopsis
void blk_limits_io_opt (struct queue_limits * limits, unsigned int opt);

Arguments
limits: the queue limits
opt: optimal request size in bytes

Description
Storage devices may report an optimal I/O size, which is the device's preferred unit for sustained I/O. This is rarely reported for disk drives. For RAID arrays it is usually the stripe width or the internal track size. A properly aligned multiple of optimal_io_size is the preferred request size for workloads where sustained throughput is desired.
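As an illustration of "a properly aligned multiple" of the minimum and optimal I/O sizes, the following userspace sketch trims a request length to the largest such multiple. The policy (prefer io_opt, fall back to io_min, otherwise leave the length alone) and the function names are assumptions made for this example; the text does not prescribe how a caller should pick its request size.

```c
#include <assert.h>

/* Round len down to a multiple of unit; unit 0 means "not reported". */
static unsigned int model_round_down(unsigned int len, unsigned int unit)
{
    if (unit == 0)
        return len;
    return len - len % unit;
}

/* Pick a request length: a multiple of io_opt when the request is large
 * enough, else a multiple of io_min, else the length unchanged. */
static unsigned int model_preferred_len(unsigned int len,
                                        unsigned int io_min,
                                        unsigned int io_opt)
{
    if (io_opt != 0 && len >= io_opt)
        return model_round_down(len, io_opt);
    if (io_min != 0 && len >= io_min)
        return model_round_down(len, io_min);
    return len;
}
```

With io_min = 4096 and io_opt = 65536, a 200000-byte request is trimmed to 196608 bytes (three stripes in a RAID reading of io_opt), while a 10000-byte request is trimmed to 8192 bytes.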
14.79 blk_queue_io_opt

blk_queue_io_opt - set optimal request size for the queue

Synopsis
void blk_queue_io_opt (struct request_queue * q, unsigned int opt);

Arguments
q: the request queue for the device
opt: optimal request size in bytes

Description
Storage devices may report an optimal I/O size, which is the device's preferred unit for sustained I/O. This is rarely reported for disk drives. For RAID arrays it is usually the stripe width or the internal track size. A properly aligned multiple of optimal_io_size is the preferred request size for workloads where sustained throughput is desired.

14.80 blk_queue_stack_limits

blk_queue_stack_limits - inherit underlying queue limits for stacked drivers

Synopsis
void blk_queue_stack_limits (struct request_queue * t, struct request_queue * b);

Arguments
t: the stacking driver (top)
b: the underlying device (bottom)

14.81 blk_stack_limits

blk_stack_limits - adjust queue_limits for stacked devices

Synopsis
int blk_stack_limits (struct queue_limits * t, struct queue_limits * b, sector_t start);

Arguments
t: the stacking driver limits (top device)
b: the underlying queue limits (bottom, component device)
start: first data sector within component device

Description
This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.

Returns 0 if the top and bottom queue_limits are compatible. The top device's block sizes and alignment offsets may be adjusted to ensure alignment with the bottom device. If no compatible sizes and alignments exist, -1 is returned and the resulting top queue_limits will have the misaligned flag set to indicate that the alignment_offset is undefined.
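The iterative combining described for blk_stack_limits can be pictured with a reduced userspace model. This is a sketch under stated assumptions: struct model_qlimits holds only four of the many real limits, the combination rules (smallest non-zero request size, largest block sizes) and the compatibility check are a simplification chosen for illustration, and the start-sector alignment handling is omitted entirely.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for struct queue_limits. */
struct model_qlimits {
    unsigned int max_sectors;          /* 0 means "no limit set yet" */
    unsigned int logical_block_size;   /* bytes, must be non-zero */
    unsigned int physical_block_size;  /* bytes */
    unsigned int io_min;               /* bytes */
    int misaligned;                    /* set when stacking fails */
};

static unsigned int model_min_not_zero(unsigned int a, unsigned int b)
{
    if (a == 0) return b;
    if (b == 0) return a;
    return a < b ? a : b;
}

static unsigned int model_max(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/* Fold the bottom device's limits into the top device's limits.
 * Returns 0 when compatible, -1 (and sets misaligned) otherwise. */
static int model_stack_limits(struct model_qlimits *t,
                              const struct model_qlimits *b)
{
    assert(t && b);
    t->max_sectors = model_min_not_zero(t->max_sectors, b->max_sectors);
    t->logical_block_size = model_max(t->logical_block_size,
                                      b->logical_block_size);
    t->physical_block_size = model_max(t->physical_block_size,
                                       b->physical_block_size);
    t->io_min = model_max(t->io_min, b->io_min);

    assert(t->logical_block_size != 0);
    /* Assumed check: a physical block must hold whole logical blocks. */
    if (t->physical_block_size % t->logical_block_size != 0) {
        t->misaligned = 1;
        return -1;
    }
    return 0;
}
```

A stacking driver in this model starts from a permissive top structure and calls model_stack_limits once per component device, checking the return value each time.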
14.82 bdev_stack_limits

bdev_stack_limits - adjust queue limits for stacked drivers

Synopsis
int bdev_stack_limits (struct queue_limits * t, struct block_device * bdev, sector_t start);

Arguments
t: the stacking driver limits (top device)
bdev: the component block_device (bottom)
start: first data sector within component device

Description
Merges queue limits for a top device and a block_device. Returns 0 if alignment didn't change. Returns -1 if adding the bottom device caused misalignment.

14.83 disk_stack_limits

disk_stack_limits - adjust queue limits for stacked drivers

Synopsis
void disk_stack_limits (struct gendisk * disk, struct block_device * bdev, sector_t offset);

Arguments
disk: MD/DM gendisk (top)
bdev: the underlying block device (bottom)
offset: offset to beginning of data within component device

Description
Merges the limits for a top level gendisk and a bottom level block_device.

14.84 blk_queue_dma_pad

blk_queue_dma_pad - set pad mask

Synopsis
void blk_queue_dma_pad (struct request_queue * q, unsigned int mask);

Arguments
q: the request queue for the device
mask: pad mask

Description
Set dma pad mask.

Appending pad buffer to a request modifies the last entry of a scatter list such that it includes the pad buffer.

14.85 blk_queue_update_dma_pad

blk_queue_update_dma_pad - update pad mask

Synopsis
void blk_queue_update_dma_pad (struct request_queue * q, unsigned int mask);

Arguments
q: the request queue for the device
mask: pad mask

Description
Update dma pad mask.

Appending pad buffer to a request modifies the last entry of a scatter list such that it includes the pad buffer.

14.86 blk_queue_dma_drain

blk_queue_dma_drain - Set up a drain buffer for excess dma.
Synopsis
int blk_queue_dma_drain (struct request_queue * q, dma_drain_needed_fn * dma_drain_needed, void * buf, unsigned int size);

Arguments
q: the request queue for the device
dma_drain_needed: fn which returns non-zero if drain is necessary
buf: physically contiguous buffer
size: size of the buffer in bytes

Description
Some devices have excess DMA problems and can't simply discard (or zero fill) the unwanted piece of the transfer. They have to have a real area of memory to transfer it into. The use case for this is ATAPI devices in DMA mode. If the packet command causes a transfer bigger than the transfer size, some HBAs will lock up if there aren't DMA elements to contain the excess transfer. What this API does is adjust the queue so that the buf is always appended silently to the scatterlist.

Note
This routine adjusts max_hw_segments to make room for appending the drain buffer. If you call blk_queue_max_segments after calling this routine, you must set the limit to one fewer than your device can support, otherwise there won't be room for the drain buffer.

14.87 blk_queue_segment_boundary

blk_queue_segment_boundary - set boundary rules for segment merging

Synopsis
void blk_queue_segment_boundary (struct request_queue * q, unsigned long mask);

Arguments
q: the request queue for the device
mask: the memory boundary mask

14.88 blk_queue_dma_alignment

blk_queue_dma_alignment - set dma length and memory alignment

Synopsis
void blk_queue_dma_alignment (struct request_queue * q, int mask);

Arguments
q: the request queue for the device
mask: alignment mask

Description
Set required memory and length alignment for direct dma transactions. This is used when building direct io requests for the queue.
14.89 blk_queue_update_dma_alignment

blk_queue_update_dma_alignment - update dma length and memory alignment

Synopsis
void blk_queue_update_dma_alignment (struct request_queue * q, int mask);

Arguments
q: the request queue for the device
mask: alignment mask

Description
Update required memory and length alignment for direct dma transactions. If the requested alignment is larger than the current alignment, then the current queue alignment is updated to the new value, otherwise it is left alone. The design of this is to allow multiple objects (driver, device, transport etc) to set their respective alignments without having them interfere.

14.90 blk_queue_flush

blk_queue_flush - configure queue's cache flush capability

Synopsis
void blk_queue_flush (struct request_queue * q, unsigned int flush);

Arguments
q: the request queue for the device
flush: 0, REQ_FLUSH or REQ_FLUSH | REQ_FUA

Description
Tell block layer cache flush capability of q. If it supports flushing, REQ_FLUSH should be set. If it supports bypassing write cache for individual writes, REQ_FUA should be set.
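The "only ever grow" rule described for blk_queue_update_dma_alignment, together with what an alignment mask means for a buffer, can be sketched in userspace C. The names below are hypothetical, and the aligned-buffer test (address and length both clear of the mask bits) is an assumed reading of "memory and length alignment", not a quotation of kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Raise the current alignment mask only if the requested one is larger,
 * so independent callers cannot weaken each other's requirement. */
static void model_update_dma_alignment(unsigned int *current_mask,
                                       unsigned int mask)
{
    assert(current_mask);
    if (mask > *current_mask)
        *current_mask = mask;
}

/* A buffer suits direct DMA in this model when neither its address nor
 * its length has any bit set that the mask covers. */
static bool model_dma_aligned(uintptr_t addr, size_t len, unsigned int mask)
{
    return ((addr | len) & mask) == 0;
}
```

A mask of 511 therefore means 512-byte alignment of both start address and length; a later request for mask 3 leaves it untouched, while a request for 4095 raises it.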
14.92 blk_execute_rq

blk_execute_rq - insert a request into queue for execution

Synopsis
int blk_execute_rq (struct request_queue * q, struct gendisk * bd_disk, struct request * rq, int at_head);

Arguments
q: queue to insert the request in
bd_disk: matching gendisk
rq: request to insert
at_head: insert request at head or tail of queue

Description
Insert a fully prepared request at the back of the I/O scheduler queue for execution and wait for completion.

14.93 blkdev_issue_flush

blkdev_issue_flush - queue a flush

Synopsis
int blkdev_issue_flush (struct block_device * bdev, gfp_t gfp_mask, sector_t * error_sector);

Arguments
bdev: blockdev to issue flush for
gfp_mask: memory allocation flags (for bio_alloc)
error_sector: error sector

Description
Issue a flush for the block device in question. The caller can supply room for storing the error offset in case of a flush error, if they wish to. If the WAIT flag is not passed, the caller can only check that the request was pushed into some internal queue for later handling.

14.94 blkdev_issue_discard

blkdev_issue_discard - queue a discard

Synopsis
int blkdev_issue_discard (struct block_device * bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);

Arguments
bdev: blockdev to issue discard for
sector: start sector
nr_sects: number of sectors to discard
gfp_mask: memory allocation flags (for bio_alloc)
flags: BLKDEV_IFL_* flags to control behaviour

Description
Issue a discard request for the sectors in question.

14.95 blkdev_issue_write_same

blkdev_issue_write_same - queue a write same operation

Synopsis
int blkdev_issue_write_same (struct block_device * bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct page * page);

Arguments
bdev: target blockdev
sector: start sector
nr_sects: number of sectors to write
gfp_mask: memory allocation flags (for bio_alloc)
page: page containing data to write

Description
Issue a write same request for the sectors in question.
14.96 blkdev_issue_zeroout

blkdev_issue_zeroout - zero-fill a block range

Synopsis
int blkdev_issue_zeroout (struct block_device * bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask);

Arguments
bdev: blockdev to write
sector: start sector
nr_sects: number of sectors to write
gfp_mask: memory allocation flags (for bio_alloc)

Description
Generate and issue a number of bios with zero-filled pages.

14.97 blk_queue_find_tag

blk_queue_find_tag - find a request by its tag and queue

Synopsis
struct request * blk_queue_find_tag (struct request_queue * q, int tag);

Arguments
q: The request queue for the device
tag: The tag of the request

Notes
Should be used when a device returns a tag and you want to match it with a request.

No locks need be held.

14.98 blk_free_tags

blk_free_tags - release a given set of tag maintenance info

Synopsis
void blk_free_tags (struct blk_queue_tag * bqt);

Arguments
bqt: the tag map to free

Description
Drops the reference count on bqt and frees it when the last reference is dropped.

14.99 blk_queue_free_tags

blk_queue_free_tags - release tag maintenance info

Synopsis
void blk_queue_free_tags (struct request_queue * q);

Arguments
q: the request queue for the device

Notes
This is used to disable tagged queuing to a device, yet leave the queue functional.

14.100 blk_init_tags

blk_init_tags - initialize the tag info for an external tag map

Synopsis
struct blk_queue_tag * blk_init_tags (int depth);

Arguments
depth: the maximum queue depth supported

14.101 blk_queue_init_tags

blk_queue_init_tags - initialize the queue tag info

Synopsis
int blk_queue_init_tags (struct request_queue * q, int depth, struct blk_queue_tag * tags);

Arguments
q: the request queue for the device
depth: the maximum queue depth supported
tags: the tag to use

Description
Queue lock must be held here if the function is called to resize an existing map.
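The idea behind a tag map of a given depth, and behind looking a request up by the tag a device hands back (as blk_queue_find_tag does), can be shown with a small userspace model. Everything here is hypothetical: the fixed MODEL_MAX_DEPTH, the struct layouts, the lowest-free-tag policy, and the return convention of model_start_tag are choices made for the sketch, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEPTH 32

struct model_request {
    int tag;                      /* -1 while the request holds no tag */
};

/* Hypothetical tag map: index[tag] points at the owning request. */
struct model_tag_map {
    int depth;
    struct model_request *index[MODEL_MAX_DEPTH];
};

static void model_init_tags(struct model_tag_map *m, int depth)
{
    assert(m && depth > 0 && depth <= MODEL_MAX_DEPTH);
    m->depth = depth;
    for (int i = 0; i < MODEL_MAX_DEPTH; i++)
        m->index[i] = NULL;
}

/* Match a tag returned by the device with its request, or NULL. */
static struct model_request *model_find_tag(const struct model_tag_map *m,
                                            int tag)
{
    if (tag < 0 || tag >= m->depth)
        return NULL;
    return m->index[tag];
}

/* Assign the lowest free tag; returns 0 on success, 1 if all are busy. */
static int model_start_tag(struct model_tag_map *m, struct model_request *rq)
{
    for (int tag = 0; tag < m->depth; tag++) {
        if (m->index[tag] == NULL) {
            m->index[tag] = rq;
            rq->tag = tag;
            return 0;
        }
    }
    return 1;
}

/* Release the tag held by a completed request. */
static void model_end_tag(struct model_tag_map *m, struct model_request *rq)
{
    assert(rq->tag >= 0 && rq->tag < m->depth);
    m->index[rq->tag] = NULL;
    rq->tag = -1;
}
```

With a depth of 2, a third request cannot be tagged until one of the first two completes and releases its tag.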
14.102 blk_queue_resize_tags

blk_queue_resize_tags - change the queueing depth

Synopsis
int blk_queue_resize_tags (struct request_queue * q, int new_depth);

Arguments
q: the request queue for the device
new_depth: the new max command queueing depth

Notes
Must be called with the queue lock held.

14.103 blk_queue_end_tag

blk_queue_end_tag - end tag operations for a request

Synopsis
void blk_queue_end_tag (struct request_queue * q, struct request * rq);

Arguments
q: the request queue for the device
rq: the request that has completed

Description
Typically called when end_that_request_first returns 0, meaning all transfers have been done for a request. It's important to call this function before end_that_request_last, as that will put the request back on the free list thus corrupting the internal tag list.

Notes
Queue lock must be held.

14.104 blk_queue_start_tag

blk_queue_start_tag - find a free tag and assign it

Synopsis
int blk_queue_start_tag (struct request_queue * q, struct request * rq);

Arguments
q: the request queue for the device
rq: the block request that needs tagging

Description
This can either be used as a stand-alone helper, or possibly be assigned as the queue prep_rq_fn (in which case struct request automagically gets a tag assigned). Note that this function assumes that any type of request can be queued! If this is not true for your device, you must check the request type before calling this function. The request will also be removed from the request queue, so it's the driver's responsibility to re-add it if it should need to be restarted for some reason.

Notes
Queue lock must be held.

14.105 blk_queue_invalidate_tags

blk_queue_invalidate_tags - invalidate all pending tags

Synopsis
void blk_queue_invalidate_tags (struct request_queue * q);

Arguments
q: the request queue for the device

Description
Hardware conditions may dictate a need to stop all pending requests.
In this case, we will safely clear the block side of the tag queue and re-add all requests to the request queue in the right order.
Notes: Queue lock must be held.

14.106 __blk_queue_free_tags
__blk_queue_free_tags - release tag maintenance info
Synopsis: void __blk_queue_free_tags (struct request_queue * q);
Arguments:
  q: the request queue for the device
Notes: blk_cleanup_queue will take care of calling this function if tagging has been used, so there's no need to call this directly.

14.107 blk_rq_count_integrity_sg
blk_rq_count_integrity_sg - Count number of integrity scatterlist elements
Synopsis: int blk_rq_count_integrity_sg (struct request_queue * q, struct bio * bio);
Arguments:
  q: request queue
  bio: bio with integrity metadata attached
Description: Returns the number of elements required in a scatterlist corresponding to the integrity metadata in a bio.

14.108 blk_rq_map_integrity_sg
blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist
Synopsis: int blk_rq_map_integrity_sg (struct request_queue * q, struct bio * bio, struct scatterlist * sglist);
Arguments:
  q: request queue
  bio: bio with integrity metadata attached
  sglist: target scatterlist
Description: Map the integrity vectors in request into a scatterlist. The scatterlist must be big enough to hold all elements, i.e. sized using blk_rq_count_integrity_sg.

14.109 blk_integrity_compare
blk_integrity_compare - Compare integrity profile of two disks
Synopsis: int blk_integrity_compare (struct gendisk * gd1, struct gendisk * gd2);
Arguments:
  gd1: Disk to compare
  gd2: Disk to compare
Description: Meta-devices like DM and MD need to verify that all sub-devices use the same integrity format before advertising to upper layers that they can send/receive integrity metadata. This function can be used to check whether two gendisk devices have compatible integrity formats.
14.110 blk_integrity_register
blk_integrity_register - Register a gendisk as being integrity-capable
Synopsis: int blk_integrity_register (struct gendisk * disk, struct blk_integrity * template);
Arguments:
  disk: struct gendisk pointer to make integrity-aware
  template: optional integrity profile to register
Description: When a device needs to advertise itself as being able to send/receive integrity metadata it must use this function to register the capability with the block layer. The template is a blk_integrity struct with values appropriate for the underlying hardware. If template is NULL the new profile is allocated but not filled out. See Documentation/block/data-integrity.txt.

14.111 blk_integrity_unregister
blk_integrity_unregister - Remove block integrity profile
Synopsis: void blk_integrity_unregister (struct gendisk * disk);
Arguments:
  disk: disk whose integrity profile to deallocate
Description: This function frees all memory used by the block integrity profile. To be called at device teardown.

14.112 blk_trace_ioctl
blk_trace_ioctl - handle the ioctls associated with tracing
Synopsis: int blk_trace_ioctl (struct block_device * bdev, unsigned cmd, char __user * arg);
Arguments:
  bdev: the block device
  cmd: the ioctl cmd
  arg: the argument data, if any

14.113 blk_trace_shutdown
blk_trace_shutdown - stop and cleanup trace structures
Synopsis: void blk_trace_shutdown (struct request_queue * q);
Arguments:
  q: the request queue associated with the device

14.114 blk_add_trace_rq
blk_add_trace_rq - Add a trace for a request oriented action
Synopsis: void blk_add_trace_rq (struct request_queue * q, struct request * rq, unsigned int nr_bytes, u32 what);
Arguments:
  q: queue the io is for
  rq: the source request
  nr_bytes: number of completed bytes
  what: the action
Description: Records an action against a request. Will log the bio offset + size.
14.115 blk_add_trace_bio
blk_add_trace_bio - Add a trace for a bio oriented action
Synopsis: void blk_add_trace_bio (struct request_queue * q, struct bio * bio, u32 what, int error);
Arguments:
  q: queue the io is for
  bio: the source bio
  what: the action
  error: error, if any
Description: Records an action against a bio. Will log the bio offset + size.

14.116 blk_add_trace_bio_remap
blk_add_trace_bio_remap - Add a trace for a bio-remap operation
Synopsis: void blk_add_trace_bio_remap (void * ignore, struct request_queue * q, struct bio * bio, dev_t dev, sector_t from);
Arguments:
  ignore: trace callback data parameter (not used)
  q: queue the io is for
  bio: the source bio
  dev: target device
  from: source sector
Description: A device mapper or RAID target sometimes needs to split a bio because it spans a stripe (or similar). Add a trace for that action.

14.117 blk_add_trace_rq_remap
blk_add_trace_rq_remap - Add a trace for a request-remap operation
Synopsis: void blk_add_trace_rq_remap (void * ignore, struct request_queue * q, struct request * rq, dev_t dev, sector_t from);
Arguments:
  ignore: trace callback data parameter (not used)
  q: queue the io is for
  rq: the source request
  dev: target device
  from: source sector
Description: Device mapper remaps requests to other devices. Add a trace for that action.

14.118 blk_mangle_minor
blk_mangle_minor - scatter minor numbers apart
Synopsis: int blk_mangle_minor (int minor);
Arguments:
  minor: minor number to mangle
Description: Scatter consecutively allocated minor numbers apart if MANGLE_DEVT is enabled. Mangling twice gives the original value.
RETURNS: Mangled value.
CONTEXT: Don't care.

14.119 blk_alloc_devt
blk_alloc_devt - allocate a dev_t for a partition
Synopsis: int blk_alloc_devt (struct hd_struct * part, dev_t * devt);
Arguments:
  part: partition to allocate dev_t for
  devt: out parameter for resulting dev_t
Description: Allocate a dev_t for block device.
RETURNS: 0 on success, allocated dev_t is returned in *devt. -errno on failure.
CONTEXT: Might sleep.

14.120 blk_free_devt
blk_free_devt - free a dev_t
Synopsis: void blk_free_devt (dev_t devt);
Arguments:
  devt: dev_t to free
Description: Free devt which was allocated using blk_alloc_devt.
CONTEXT: Might sleep.

14.121 disk_replace_part_tbl
disk_replace_part_tbl - replace disk->part_tbl in RCU-safe way
Synopsis: void disk_replace_part_tbl (struct gendisk * disk, struct disk_part_tbl * new_ptbl);
Arguments:
  disk: disk to replace part_tbl for
  new_ptbl: new part_tbl to install
Description: Replace disk->part_tbl with new_ptbl in RCU-safe way. The original ptbl is freed using RCU callback.
LOCKING: Matching bd_mutex locked.

14.122 disk_expand_part_tbl
disk_expand_part_tbl - expand disk->part_tbl
Synopsis: int disk_expand_part_tbl (struct gendisk * disk, int partno);
Arguments:
  disk: disk to expand part_tbl for
  partno: expand such that this partno can fit in
Description: Expand disk->part_tbl such that partno can fit in. disk->part_tbl uses RCU to allow unlocked dereferencing for stats and other stuff.
LOCKING: Matching bd_mutex locked, might sleep.
RETURNS: 0 on success, -errno on failure.

14.123 disk_block_events
disk_block_events - block and flush disk event checking
Synopsis: void disk_block_events (struct gendisk * disk);
Arguments:
  disk: disk to block events for
Description: On return from this function, it is guaranteed that event checking isn't in progress and won't happen until unblocked by disk_unblock_events. Events blocking is counted and the actual unblocking happens after the matching number of unblocks are done. Note that this intentionally does not block event checking from disk_clear_events.
CONTEXT: Might sleep.

14.124 disk_unblock_events
disk_unblock_events - unblock disk event checking
Synopsis: void disk_unblock_events (struct gendisk * disk);
Arguments:
  disk: disk to unblock events for
Description: Undo disk_block_events.
When the block count reaches zero, it starts events polling if configured.
CONTEXT: Don't care. Safe to call from irq context.

14.125 disk_flush_events
disk_flush_events - schedule immediate event checking and flushing
Synopsis: void disk_flush_events (struct gendisk * disk, unsigned int mask);
Arguments:
  disk: disk to check and flush events for
  mask: events to flush
Description: Schedule immediate event checking on disk if not blocked. Events in mask are scheduled to be cleared from the driver. Note that this doesn't clear the events from disk->ev.
CONTEXT: If mask is non-zero must be called with bdev->bd_mutex held.

14.126 disk_clear_events
disk_clear_events - synchronously check, clear and return pending events
Synopsis: unsigned int disk_clear_events (struct gendisk * disk, unsigned int mask);
Arguments:
  disk: disk to fetch and clear events from
  mask: mask of events to be fetched and cleared
Description: Disk events are synchronously checked and pending events in mask are cleared and returned. This ignores the block count.
CONTEXT: Might sleep.

14.127 disk_get_part
disk_get_part - get partition
Synopsis: struct hd_struct * disk_get_part (struct gendisk * disk, int partno);
Arguments:
  disk: disk to look up the partition from
  partno: partition number
Description: Look for partition partno from disk. If found, increment reference count and return it.
CONTEXT: Don't care.
RETURNS: Pointer to the found partition on success, NULL if not found.

14.128 disk_part_iter_init
disk_part_iter_init - initialize partition iterator
Synopsis: void disk_part_iter_init (struct disk_part_iter * piter, struct gendisk * disk, unsigned int flags);
Arguments:
  piter: iterator to initialize
  disk: disk to iterate over
  flags: DISK_PITER_* flags
Description: Initialize piter so that it iterates over partitions of disk.
CONTEXT: Don't care.
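The counted blocking described for disk_block_events and disk_unblock_events (sections 14.123 and 14.124) can be pictured with a small user-space model. This is only a sketch of the counting rule stated there: the struct event_gate type and the gate_block and gate_unblock functions are hypothetical names, and the kernel's real bookkeeping and locking are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of counted event blocking; not a kernel structure. */
struct event_gate {
    int block;        /* block calls not yet matched by an unblock */
    bool configured;  /* whether event polling is configured at all */
    bool polling;     /* whether event polling is currently active */
};

/* Models disk_block_events: every call raises the count and stops polling. */
void gate_block(struct event_gate *g)
{
    g->block++;
    g->polling = false;
}

/* Models disk_unblock_events: polling restarts only when the count
 * returns to zero, and only if polling is configured. */
void gate_unblock(struct event_gate *g)
{
    assert(g->block > 0);
    if (--g->block == 0 && g->configured)
        g->polling = true;
}
```

With two nested gate_block calls, the first gate_unblock leaves polling stopped; only the second one restarts it.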
14.129 disk_part_iter_next
disk_part_iter_next - proceed iterator to the next partition and return it
Synopsis: struct hd_struct * disk_part_iter_next (struct disk_part_iter * piter);
Arguments:
  piter: iterator of interest
Description: Proceed piter to the next partition and return it.
CONTEXT: Don't care.

14.130 disk_part_iter_exit
disk_part_iter_exit - finish up partition iteration
Synopsis: void disk_part_iter_exit (struct disk_part_iter * piter);
Arguments:
  piter: iter of interest
Description: Called when iteration is over. Cleans up piter.
CONTEXT: Don't care.

14.131 disk_map_sector_rcu
disk_map_sector_rcu - map sector to partition
Synopsis: struct hd_struct * disk_map_sector_rcu (struct gendisk * disk, sector_t sector);
Arguments:
  disk: gendisk of interest
  sector: sector to map
Description: Find out which partition sector maps to on disk. This is primarily used for stats accounting.
CONTEXT: RCU read locked. The returned partition pointer is valid only while preemption is disabled.
RETURNS: Found partition on success, part0 is returned if no partition matches.

14.132 register_blkdev
register_blkdev - register a new block device
Synopsis: int register_blkdev (unsigned int major, const char * name);
Arguments:
  major: the requested major device number [1..255]. If major=0, try to allocate any unused major number.
  name: the name of the new block device as a zero terminated string
Description: The name must be unique within the system. The return value depends on the major input parameter.
- if a major device number was requested in range [1..255] then the function returns zero on success, or a negative error code
- if any unused major number was requested with major=0 parameter then the return value is the allocated major number in range [1..255] or a negative error code otherwise

14.133 add_disk
add_disk - add partitioning information to kernel list
Synopsis: void add_disk (struct gendisk * disk);
Arguments:
  disk: per-device partitioning information
Description: This function registers the partitioning information in disk with the kernel. FIXME: error handling

14.134 get_gendisk
get_gendisk - get partitioning information for a given device
Synopsis: struct gendisk * get_gendisk (dev_t devt, int * partno);
Arguments:
  devt: device to get partitioning information for
  partno: returned partition index
Description: This function gets the structure containing partitioning information for the given device devt.

14.135 bdget_disk
bdget_disk - do bdget by gendisk and partition number
Synopsis: struct block_device * bdget_disk (struct gendisk * disk, int partno);
Arguments:
  disk: gendisk of interest
  partno: partition number
Description: Find partition partno from disk, do bdget on it.
CONTEXT: Don't care.
RETURNS: Resulting block_device on success, NULL on failure.

Chapter 15 Char devices

15.1 register_chrdev_region
register_chrdev_region - register a range of device numbers
Synopsis: int register_chrdev_region (dev_t from, unsigned count, const char * name);
Arguments:
  from: the first in the desired range of device numbers; must include the major number.
  count: the number of consecutive device numbers required
  name: the name of the device or driver.
Description: Return value is zero on success, a negative error code on failure.
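The requirement that from "must include the major number" is easier to see with a model of how a device number packs a major and a minor together. The sketch below is a user-space illustration patterned on the kernel's MKDEV, MAJOR and MINOR macros; the model_ names are hypothetical, and the 12-bit major / 20-bit minor split is an assumption that should be checked against include/linux/kdev_t.h for the kernel in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: upper 12 bits major, lower 20 bits minor. */
#define MODEL_MINORBITS 20u
#define MODEL_MINORMASK ((1u << MODEL_MINORBITS) - 1u)

typedef uint32_t model_dev_t;  /* hypothetical stand-in for dev_t */

/* Combine a major and a minor into one device number. */
model_dev_t model_mkdev(unsigned int major, unsigned int minor)
{
    assert(major < (1u << 12) && minor <= MODEL_MINORMASK);
    return (major << MODEL_MINORBITS) | minor;
}

/* Extract the major part of a device number. */
unsigned int model_major(model_dev_t dev)
{
    return dev >> MODEL_MINORBITS;
}

/* Extract the minor part of a device number. */
unsigned int model_minor(model_dev_t dev)
{
    return dev & MODEL_MINORMASK;
}
```

Under this layout, a range of count consecutive device numbers starting at from keeps the same major while the minor increases, as long as the range does not overflow the minor field.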
15.2 alloc_chrdev_region
alloc_chrdev_region - register a range of char device numbers
Synopsis: int alloc_chrdev_region (dev_t * dev, unsigned baseminor, unsigned count, const char * name);
Arguments:
  dev: output parameter for first assigned number
  baseminor: first of the requested range of minor numbers
  count: the number of minor numbers required
  name: the name of the associated device or driver
Description: Allocates a range of char device numbers. The major number will be chosen dynamically, and returned (along with the first minor number) in dev. Returns zero or a negative error code.

15.3 __register_chrdev
__register_chrdev - create and register a cdev occupying a range of minors
Synopsis: int __register_chrdev (unsigned int major, unsigned int baseminor, unsigned int count, const char * name, const struct file_operations * fops);
Arguments:
  major: major device number or 0 for dynamic allocation
  baseminor: first of the requested range of minor numbers
  count: the number of minor numbers required
  name: name of this range of devices
  fops: file operations associated with these devices
Description: If major == 0 this function will dynamically allocate a major and return its number. If major > 0 this function will attempt to reserve a device with the given major number and will return zero on success. Returns a negative errno on failure. The name of this device has nothing to do with the name of the device in /dev. It only helps to keep track of the different owners of devices. If your module has only one type of device it's OK to use e.g. the name of the module here.

15.4 unregister_chrdev_region
unregister_chrdev_region - return a range of device numbers
Synopsis: void unregister_chrdev_region (dev_t from, unsigned count);
Arguments:
  from: the first in the range of numbers to unregister
  count: the number of device numbers to unregister
Description: This function will unregister a range of count device numbers, starting with from.
The caller should normally be the one who allocated those numbers in the first place...

15.5 __unregister_chrdev
__unregister_chrdev - unregister and destroy a cdev
Synopsis: void __unregister_chrdev (unsigned int major, unsigned int baseminor, unsigned int count, const char * name);
Arguments:
  major: major device number
  baseminor: first of the range of minor numbers
  count: the number of minor numbers this cdev is occupying
  name: name of this range of devices
Description: Unregister and destroy the cdev occupying the region described by major, baseminor and count. This function undoes what __register_chrdev did.

15.6 cdev_add
cdev_add - add a char device to the system
Synopsis: int cdev_add (struct cdev * p, dev_t dev, unsigned count);
Arguments:
  p: the cdev structure for the device
  dev: the first device number for which this device is responsible
  count: the number of consecutive minor numbers corresponding to this device
Description: cdev_add adds the device represented by p to the system, making it live immediately. A negative error code is returned on failure.

15.7 cdev_del
cdev_del - remove a cdev from the system
Synopsis: void cdev_del (struct cdev * p);
Arguments:
  p: the cdev structure to be removed
Description: cdev_del removes p from the system, possibly freeing the structure itself.

15.8 cdev_alloc
cdev_alloc - allocate a cdev structure
Synopsis: struct cdev * cdev_alloc (void);
Arguments:
  void: no arguments
Description: Allocates and returns a cdev structure, or NULL on failure.

15.9 cdev_init
cdev_init - initialize a cdev structure
Synopsis: void cdev_init (struct cdev * cdev, const struct file_operations * fops);
Arguments:
  cdev: the structure to initialize
  fops: the file_operations for this device
Description: Initializes cdev, remembering fops, making it ready to add to the system with cdev_add.
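The return convention documented for __register_chrdev in section 15.3 (and, similarly, for register_blkdev) is easy to misread: a successful call returns the new major when major == 0 was passed, but returns zero when a specific major was requested. The user-space sketch below models only that convention. model_register and claim_major are hypothetical names, the 255 limit and the search order for a free major are assumptions made for the model, and nothing here touches a real device table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_MAJOR 255  /* assumed upper bound for this model only */

static bool major_in_use[MODEL_MAX_MAJOR + 1];

/* Stand-in that follows the documented return convention:
 *   major == 0: returns the allocated major (> 0), or -errno
 *   major  > 0: returns 0 on success, or -errno */
int model_register(unsigned int major)
{
    if (major > MODEL_MAX_MAJOR)
        return -EINVAL;
    if (major == 0) {
        /* Search order is arbitrary in this model. */
        for (int m = MODEL_MAX_MAJOR; m > 0; m--) {
            if (!major_in_use[m]) {
                major_in_use[m] = true;
                return m;
            }
        }
        return -EBUSY;
    }
    if (major_in_use[major])
        return -EBUSY;
    major_in_use[major] = true;
    return 0;
}

/* Caller-side helper: fold both cases into
 * "the major now owned, or a negative errno". */
int claim_major(unsigned int requested)
{
    int ret = model_register(requested);

    if (ret < 0)
        return ret;
    return requested ? (int)requested : ret;
}
```

A driver that accepts an optional fixed major can use the claim_major pattern so that the rest of its code always sees the major it actually owns.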
Chapter 16 Miscellaneous Devices

16.1 misc_register
misc_register - register a miscellaneous device
Synopsis: int misc_register (struct miscdevice * misc);
Arguments:
  misc: device structure
Description: Register a miscellaneous device with the kernel. If the minor number is set to MISC_DYNAMIC_MINOR a minor number is assigned and placed in the minor field of the structure. For other cases the minor number requested is used. The structure passed is linked into the kernel and may not be destroyed until it has been unregistered. A zero is returned on success and a negative errno code for failure.

16.2 misc_deregister
misc_deregister - unregister a miscellaneous device
Synopsis: int misc_deregister (struct miscdevice * misc);
Arguments:
  misc: device to unregister
Description: Unregister a miscellaneous device that was previously successfully registered with misc_register. Success is indicated by a zero return; a negative errno code indicates an error.

Chapter 17 Clock Framework

The clock framework defines programming interfaces to support software management of the system clock tree. This framework is widely used with System-On-Chip (SOC) platforms to support power management and various devices which may need custom clock rates. Note that these "clocks" don't relate to timekeeping or real time clocks (RTCs), each of which has its own framework. These struct clk instances may be used to manage, for example, a 96 MHz signal that is used to shift bits into and out of peripherals or busses, or otherwise trigger synchronous state machine transitions in system hardware.

Power management is supported by explicit software clock gating: unused clocks are disabled, so the system doesn't waste power changing the state of transistors that aren't in active use. On some systems this may be backed by hardware clock gating, where clocks are gated without being disabled in software.
Sections of chips that are powered but not clocked may be able to retain their last state. This low power state is often called a retention mode. This mode still incurs leakage currents, especially with finer circuit geometries, but for CMOS circuits power is mostly used by clocked state changes.

Power-aware drivers only enable their clocks when the device they manage is in active use. Also, system sleep states often differ according to which clock domains are active: while a "standby" state may allow wakeup from several active domains, a "mem" (suspend-to-RAM) state may require a more wholesale shutdown of clocks derived from higher speed PLLs and oscillators, limiting the number of possible wakeup event sources. A driver's suspend method may need to be aware of system-specific clock constraints on the target sleep state.

Some platforms support programmable clock generators. These can be used by external chips of various kinds, such as other CPUs, multimedia codecs, and devices with strict requirements for interface clocking.

17.1 struct clk_notifier
struct clk_notifier - associate a clk with a notifier
Synopsis:
  struct clk_notifier {
    struct clk * clk;
    struct srcu_notifier_head notifier_head;
    struct list_head node;
  };
Members:
  clk: struct clk * to associate the notifier with
  notifier_head: an srcu_notifier_head for this clk
  node: linked list pointers
Description: A list of struct clk_notifier is maintained by the notifier code. An entry is created whenever code registers the first notifier on a particular clk. Future notifiers on that clk are added to the notifier_head.
17.2 struct clk_notifier_data
struct clk_notifier_data - rate data to pass to the notifier callback
Synopsis:
  struct clk_notifier_data {
    struct clk * clk;
    unsigned long old_rate;
    unsigned long new_rate;
  };
Members:
  clk: struct clk * being changed
  old_rate: previous rate of this clk
  new_rate: new rate of this clk
Description: For a pre-notifier, old_rate is the clk's rate before this rate change, and new_rate is what the rate will be in the future. For a post-notifier, old_rate and new_rate are both set to the clk's current rate (this was done to optimize the implementation).

17.3 clk_notifier_register
clk_notifier_register - change notifier callback
Synopsis: int clk_notifier_register (struct clk * clk, struct notifier_block * nb);
Arguments:
  clk: clock whose rate we are interested in
  nb: notifier block with callback function pointer
ProTip: debugging across notifier chains can be frustrating. Make sure that your notifier callback function prints a nice big warning in case of failure.

17.4 clk_notifier_unregister
clk_notifier_unregister - change notifier callback
Synopsis: int clk_notifier_unregister (struct clk * clk, struct notifier_block * nb);
Arguments:
  clk: clock whose rate we are no longer interested in
  nb: notifier block which will be unregistered

17.5 clk_get_accuracy
clk_get_accuracy - obtain the clock accuracy in ppb (parts per billion) for a clock source.
Synopsis: long clk_get_accuracy (struct clk * clk);
Arguments:
  clk: clock source
Description: This gets the clock source accuracy expressed in ppb. A perfect clock returns 0.

17.6 clk_prepare
clk_prepare - prepare a clock source
Synopsis: int clk_prepare (struct clk * clk);
Arguments:
  clk: clock source
Description: This prepares the clock source for use. Must not be called from within atomic context.

17.7 clk_unprepare
clk_unprepare - undo preparation of a clock source
Synopsis: void clk_unprepare (struct clk * clk);
Arguments:
  clk: clock source
Description: This undoes a previously prepared clock.
The caller must balance the number of prepare and unprepare calls. Must not be called from within atomic context.

17.8 clk_get
clk_get - lookup and obtain a reference to a clock producer.
Synopsis: struct clk * clk_get (struct device * dev, const char * id);
Arguments:
  dev: device for clock consumer
  id: clock consumer ID
Description: Returns a struct clk corresponding to the clock producer, or valid IS_ERR condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.) Drivers must assume that the clock source is not enabled. clk_get should not be called from within interrupt context.

17.9 devm_clk_get
devm_clk_get - lookup and obtain a managed reference to a clock producer.
Synopsis: struct clk * devm_clk_get (struct device * dev, const char * id);
Arguments:
  dev: device for clock consumer
  id: clock consumer ID
Description: Returns a struct clk corresponding to the clock producer, or valid IS_ERR condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.) Drivers must assume that the clock source is not enabled. devm_clk_get should not be called from within interrupt context. The clock will automatically be freed when the device is unbound from the bus.

17.10 clk_enable
clk_enable - inform the system when the clock source should be running.
Synopsis: int clk_enable (struct clk * clk);
Arguments:
  clk: clock source
Description: If the clock can not be enabled/disabled, this should return success. May be called from atomic contexts. Returns success (0) or negative errno.

17.11 clk_disable
clk_disable - inform the system when the clock source is no longer required.
Synopsis: void clk_disable (struct clk * clk);
Arguments:
  clk: clock source
Description: Inform the system that a clock source is no longer required by a driver and may be shut down. May be called from atomic contexts.
Implementation detail: if the clock source is shared between multiple drivers, clk_enable calls must be balanced by the same number of clk_disable calls for the clock source to be disabled.

17.12 clk_get_rate
clk_get_rate - obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.
Synopsis: unsigned long clk_get_rate (struct clk * clk);
Arguments:
  clk: clock source

17.13 clk_put
clk_put - "free" the clock source
Synopsis: void clk_put (struct clk * clk);
Arguments:
  clk: clock source
Note: drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function. clk_put should not be called from within interrupt context.

17.14 devm_clk_put
devm_clk_put - "free" a managed clock source
Synopsis: void devm_clk_put (struct device * dev, struct clk * clk);
Arguments:
  dev: device used to acquire the clock
  clk: clock source acquired with devm_clk_get
Note: drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function. clk_put should not be called from within interrupt context.

17.15 clk_round_rate
clk_round_rate - adjust a rate to the exact rate a clock can provide
Synopsis: long clk_round_rate (struct clk * clk, unsigned long rate);
Arguments:
  clk: clock source
  rate: desired clock rate in Hz
Description: Returns rounded clock rate in Hz, or negative errno.

17.16 clk_set_rate
clk_set_rate - set the clock rate for a clock source
Synopsis: int clk_set_rate (struct clk * clk, unsigned long rate);
Arguments:
  clk: clock source
  rate: desired clock rate in Hz
Description: Returns success (0) or negative errno.
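The balancing rule stated for clk_enable and clk_disable (a shared clock is only disabled once every enable has been matched by a disable) can be illustrated with a small counting model. This is a user-space sketch of that rule only. struct model_clk, model_clk_enable and model_clk_disable are hypothetical names, and the sketch does not represent how any particular platform implements struct clk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a shared clock; not the kernel's struct clk. */
struct model_clk {
    int enable_count;  /* enable calls not yet matched by a disable */
    bool running;      /* whether the modelled clock is gated on */
};

/* Models clk_enable: the first enable turns the clock on. */
int model_clk_enable(struct model_clk *clk)
{
    if (clk->enable_count++ == 0)
        clk->running = true;
    return 0;  /* success, following the "0 or negative errno" convention */
}

/* Models clk_disable: only the last matching disable turns the clock off. */
void model_clk_disable(struct model_clk *clk)
{
    assert(clk->enable_count > 0);  /* unbalanced disable is a caller bug */
    if (--clk->enable_count == 0)
        clk->running = false;
}
```

If two drivers each enable the same modelled clock, it keeps running after the first driver disables it and stops only after the second one does.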
17.17 clk_set_parent
clk_set_parent - set the parent clock source for this clock
Synopsis: int clk_set_parent (struct clk * clk, struct clk * parent);
Arguments:
  clk: clock source
  parent: parent clock source
Description: Returns success (0) or negative errno.

17.18 clk_get_parent
clk_get_parent - get the parent clock source for this clock
Synopsis: struct clk * clk_get_parent (struct clk * clk);
Arguments:
  clk: clock source
Description: Returns struct clk corresponding to parent clock source, or valid IS_ERR condition containing errno.

17.19 clk_get_sys
clk_get_sys - get a clock based upon the device name
Synopsis: struct clk * clk_get_sys (const char * dev_id, const char * con_id);
Arguments:
  dev_id: device name
  con_id: connection ID
Description: Returns a struct clk corresponding to the clock producer, or valid IS_ERR condition containing errno. The implementation uses dev_id and con_id to determine the clock consumer, and thereby the clock producer. In contrast to clk_get this function takes the device name instead of the device itself for identification. Drivers must assume that the clock source is not enabled. clk_get_sys should not be called from within interrupt context.

17.20 clk_add_alias
clk_add_alias - add a new clock alias
Synopsis: int clk_add_alias (const char * alias, const char * alias_dev_name, char * id, struct device * dev);
Arguments:
  alias: name for clock alias
  alias_dev_name: device name
  id: platform specific clock name
  dev: device
Description: Allows using generic clock names for drivers by adding a new alias. Assumes clkdev, see clkdev.h for more info.
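Several functions in this chapter (clk_get, devm_clk_get, clk_get_parent, clk_get_sys) return either a usable pointer or a "valid IS_ERR condition containing errno". The user-space sketch below shows how a small negative errno can travel inside a pointer value. It is patterned on the kernel's ERR_PTR, IS_ERR and PTR_ERR helpers, but the model_ names are hypothetical and the 4095 limit is an assumption to be checked against include/linux/err.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095  /* assumed size of the reserved error range */

/* Encode a negative errno as a pointer at the very top of the address space. */
void *model_err_ptr(long error)
{
    assert(error < 0 && error >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True if the pointer value lies in the reserved error range. */
int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the errno from an error pointer. */
long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}
```

A caller following this pattern checks the returned pointer with the is-error test before using it, and extracts the errno only when that test succeeds; in this model a NULL pointer is not an error pointer.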