io_uring-BPF
Pavel Begunkov
asml.silence at gmail.com
io_uring: introduction
Lots of operations ...
enum {
	IORING_OP_NOP,			IORING_OP_READ,
	IORING_OP_READV,		IORING_OP_WRITE,
	IORING_OP_WRITEV,		IORING_OP_FADVISE,
	IORING_OP_FSYNC,		IORING_OP_MADVISE,
	IORING_OP_READ_FIXED,		IORING_OP_SEND,
	IORING_OP_WRITE_FIXED,		IORING_OP_RECV,
	IORING_OP_POLL_ADD,		IORING_OP_OPENAT2,
	IORING_OP_POLL_REMOVE,		IORING_OP_EPOLL_CTL,
	IORING_OP_SYNC_FILE_RANGE,	IORING_OP_SPLICE,
	IORING_OP_SENDMSG,		IORING_OP_PROVIDE_BUFFERS,
	IORING_OP_RECVMSG,		IORING_OP_REMOVE_BUFFERS,
	IORING_OP_TIMEOUT,		IORING_OP_TEE,
	IORING_OP_TIMEOUT_REMOVE,	IORING_OP_SHUTDOWN,
	IORING_OP_ACCEPT,		IORING_OP_RENAMEAT,
	IORING_OP_ASYNC_CANCEL,		IORING_OP_UNLINKAT,
	IORING_OP_LINK_TIMEOUT,		IORING_OP_MKDIRAT,
	IORING_OP_CONNECT,		IORING_OP_SYMLINKAT,
	IORING_OP_FALLOCATE,		IORING_OP_LINKAT,
	IORING_OP_OPENAT,
	IORING_OP_CLOSE,
	IORING_OP_FILES_UPDATE,
	IORING_OP_STATX,
	...
};
Features
• SQPOLL for syscall-less submission
• IOPOLL for beating performance records
• Registered resources with fast updates
- IORING_REGISTER_FILES: optimised file refcounting
- IORING_REGISTER_BUFFERS: eliminates page refcounting, no page table walking, etc.
- dynamic fast updates: no more full io_uring quiesce
• IOSQE_IO_LINK: request links for execution ordering
• IORING_FEAT_FAST_POLL: automatic poll fallback, no need for epoll
• IO-WQ: internal thread pool, when nothing else works
• multi-shot requests, e.g. poll generating multiple CQEs
• sharing executors (IO-WQ, SQPOLL) between rings
• and more ...
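Request links in practice: a minimal liburing sketch (liburing assumed available; in_fd/out_fd are assumed open descriptors; error handling omitted) that chains a read to a write, so the write starts only after the read completes:

```c
/* Sketch: ordering with IOSQE_IO_LINK; error handling omitted.
 * For simplicity the write uses the full buffer length rather than
 * the read's actual result. */
struct io_uring ring;
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
char buf[4096];

io_uring_queue_init(8, &ring, 0);

sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, in_fd, buf, sizeof(buf), 0);
sqe->flags |= IOSQE_IO_LINK;		/* the next SQE waits for this one */

sqe = io_uring_get_sqe(&ring);
io_uring_prep_write(sqe, out_fd, buf, sizeof(buf), 0);

io_uring_submit(&ring);			/* one syscall submits both */
io_uring_wait_cqe(&ring, &cqe);		/* then reap the two CQEs */
```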
Execution flow
First try nowait: IOCB_NOWAIT, LOOKUP_CACHED, etc.
• might just complete, e.g. if data is already there
• O_DIRECT goes async, -EIOCBQUEUED
• added to a waitqueue, e.g. poll requests
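The flow above as pseudocode; issue(), arm_poll() and punt_to_io_wq() are illustrative names, not the actual kernel functions:

```c
/* pseudocode sketch of the issue path */
ret = issue(req, IO_URING_F_NONBLOCK);	/* IOCB_NOWAIT, LOOKUP_CACHED, ... */
if (ret != -EAGAIN)
	complete(req);			/* data was already there */
else if (file_can_poll(req))
	arm_poll(req);			/* added to a waitqueue */
else
	punt_to_io_wq(req);		/* worker-thread fallback */
```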
Misconception debunking
io_uring is not "just a worker pool"
• worker threads are the slower path
The problem
syscall overhead
Vulnerability mitigations are expensive, and so are syscalls
• cost varies with CPU and enabled mitigations
Syscall overhead in a tight loop doing little work per call can reach 20-50%
(apparently, tested CPU is the worst case)
# mitigations enabled
# nop requests, batch 32
# fio/t/io_uring -d32 -s32 -c32 -N1
# mitigations enabled
# Null block device, “realistic batching” 4 requests at a time
# modprobe null_blk no_sched=1 irqmode=1 completion_nsec=0 submit_queues=16
# fio/t/io_uring -d4 -s4 -c4 -p1 -B1 -F1 -b512 /dev/nullb0
Sweet spot for optimisation. How about SQPOLL?
• still needs userspace to process completions
• takes a CPU core; high CPU consumption
• cache bouncing
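For reference, enabling SQPOLL with liburing looks roughly like this (sketch; error handling omitted):

```c
/* Sketch: SQPOLL setup; a kernel-side thread polls the SQ for new SQEs */
struct io_uring ring;
struct io_uring_params p = { };

p.flags = IORING_SETUP_SQPOLL;
p.sq_thread_idle = 2000;	/* ms of idle before the SQ thread sleeps */
io_uring_queue_init_params(64, &ring, &p);

/* submission now usually avoids io_uring_enter(2), but userspace
 * still has to spend cycles reaping completions itself */
```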
Requirements
Flexibility: what capabilities does BPF have to have?
• submitting new requests
• accessing CQEs, multiple if needed
• poking into userspace memory
Low overhead
• Traditionally we’ve optimised batched submission more
• BPF is expected to have a lower batch ratio
Idea 1: let's add a callback to each SQE

struct io_uring_sqe {
	...
	u32 callback_id;
};
New io_uring request type: IORING_OP_BPF
No extra per-request overhead: everything is enclosed in the opcode handlers.
And we can use generic io_uring infrastructure:
• locking and better control of execution context
• completion and other batching
• space in the internal request struct, i.e. struct io_kiocb
• can be linked to other requests
• possible to execute multiple times, i.e. keeping a BPF request alive
The downside is that extra requests are not free; there is a cost, but we can
work with it.
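With the proposed out-of-tree API, submission could look like the sketch below. This is hypothetical: the SQE field carrying the program index is an illustrative guess, not a fixed ABI.

```c
/* Hypothetical sketch of submitting a BPF request (bpf_v3 proposal) */
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

memset(sqe, 0, sizeof(*sqe));
sqe->opcode = IORING_OP_BPF;
sqe->off = bpf_prog_idx;	/* index of a registered BPF program (guess) */
sqe->user_data = tag;
io_uring_submit(&ring);
/* the program runs in io_uring's context; it can submit further SQEs,
 * reap CQEs and be re-run, keeping the BPF request alive */
```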
Feeding BPF completions
BPF needs feedback from other requests.
The first idea: just use links and pass the CQE of the previous request to BPF!
• ugly again
• bound to linking by design
• no way to pass multiple CQEs
• extra overhead for non-BPF code
Multiple CQs
Introduce multiple CQs:
• sqe->cq_idx: each request specifies which CQ its completion goes to
• BPF can emit and consume CQEs to / from any CQ
• CQs can be waited on
• synchronisation is up to userspace / BPF
Pros:
• can pass multiple CQEs
• CQs can be waited on (including by BPF)
• an extra way of communication: posting to a CQ

Example: each BPF request has its own CQ. It keeps a number of operations
in-flight and posts to the main CQ when it's done with the job.
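A sketch of that scheme (hypothetical: sqe->cq_idx and the CQ indices follow the proposal, not mainline, and PRIVATE_CQ_IDX is an illustrative name):

```c
/* Hypothetical: route worker-request completions to the BPF
 * request's private CQ */
sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buf, len, off);
sqe->cq_idx = PRIVATE_CQ_IDX;	/* CQE lands in the BPF request's CQ */

/* the BPF program reaps from PRIVATE_CQ_IDX, refills the pipeline,
 * and posts a single CQE to the main CQ when the job is done */
```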
What about poking into the normal userspace memory?
BPF subsystem already has an answer: sleepable BPF programs
There are also BPF maps / arrays and other infrastructure provided by BPF
• not everything is supported with sleepable programs; restrictions may get lifted (if not already)
Overhead
There can be O(N) BPF requests, so it is important to keep overhead low
A lot of work has been done! Highlights:
• persistent submission state, request caching
• infrastructure around task_work and execution batching
• task_struct referencing and other overhead amortisation
• removing request refcounting
• completion batching
• native io-wq workers (planned to be used)
• upcoming IOSQE_CQE_SKIP_SUCCESS
• just cutting the number of instructions required per request ...
API: registration

enum {
...
IORING_REGISTER_BPF,
IORING_UNREGISTER_BPF,
};
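Registration would presumably follow the usual io_uring_register(2) pattern; a hypothetical sketch (the exact arguments are from the proposal, not mainline):

```c
/* Hypothetical: register loaded "iouring" BPF program fds with a ring */
int prog_fds[] = { bpf_prog_fd };

ret = io_uring_register(ring_fd, IORING_REGISTER_BPF, prog_fds, 1);
/* BPF requests would then refer to programs by index into this array;
 * IORING_UNREGISTER_BPF drops them */
```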
API: BPF request
enum {
...
IORING_OP_BPF,
};
API: BPF definitions
// Return values for io_uring BPF programs
enum {
	IORING_BPF_OK = 0,	// complete request
	IORING_BPF_WAIT,	// wait on CQ for completions
};
API: libbpf example
SEC("iouring")	// io_uring BPF program
int bpf_program_name(struct io_uring_bpf_ctx *ctx)
{
	struct io_uring_cqe cqe;
	int ret;

	ret = bpf_io_uring_reap_cqe(ctx, cq_idx, &cqe, sizeof(cqe));
	if (ret)
		return IORING_BPF_WAIT;	// no CQE yet, wait on the CQ
	...
	return IORING_BPF_OK;		// done, complete the BPF request
}
Testing
Not yet conclusive. Test case:
• copy a file 4KB at a time into /dev/zero, buffered and fully cached

Mitigations | Test case        | Time (ms)
ON          | read(2)/write(2) | 1350
ON          | read(2)/write(2) | 1320
Applicability
Shouldn't be of interest if batching is naturally "high enough".
High queue depth is not always possible and/or desirable:
• batching hurts latency
• ordering may matter, e.g. TCP sockets
• slow devices and memory/responsiveness restrictions
Resources
Kernel
https://github.com/isilence/linux.git bpf_v3
Liburing, see <liburing>/examples/bpf/*
https://github.com/isilence/liburing.git bpf_v3