android_kernel_xiaomi_sm8450/kernel
Daniel Borkmann be35504b95 bpf: Fix overrunning reservations in ringbuf
commit cfa1a2329a691ffd991fcf7248a57d752e712881 upstream.

The BPF ring buffer is internally implemented as a power-of-2 sized circular
buffer with two logical, ever-increasing counters: consumer_pos is the
consumer counter showing up to which logical position the consumer has
consumed the data, and producer_pos is the producer counter denoting the
amount of data reserved by all producers.
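
As a rough illustration of the two counters, here is a minimal user-space
model. The struct and function names (rb_model, rb_model_offset, and so on)
are hypothetical and are not the kernel's struct bpf_ringbuf; the sketch only
assumes that logical positions are reduced to data offsets with a
power-of-2 mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two logical counters (not the kernel struct). */
struct rb_model {
	uint64_t consumer_pos; /* logical position up to which data was consumed */
	uint64_t producer_pos; /* logical amount of data reserved by producers */
	uint64_t mask;         /* size - 1, where size is a power of two */
};

static void rb_model_init(struct rb_model *rb, uint64_t size)
{
	/* The masking trick below only works for power-of-2 sizes. */
	assert(size != 0 && (size & (size - 1)) == 0);
	rb->consumer_pos = 0;
	rb->producer_pos = 0;
	rb->mask = size - 1;
}

/* Map an ever-increasing logical position to an offset in the data area. */
static uint64_t rb_model_offset(const struct rb_model *rb, uint64_t pos)
{
	return pos & rb->mask;
}

/* Bytes reserved by producers but not yet consumed. */
static uint64_t rb_model_used(const struct rb_model *rb)
{
	return rb->producer_pos - rb->consumer_pos;
}
```

Because the counters only ever grow, the difference between them gives the
amount of outstanding data without any wrap-around bookkeeping.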

Each time a record is reserved, the producer that "owns" the record
advances the producer counter. In user space, each time a record is read,
the consumer of the data advances the consumer counter once it has finished
processing. Both counters are stored in separate pages so that from user
space, the producer counter is read-only and the consumer counter is read-write.

One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is that the data area is mapped twice, contiguously
back-to-back, in virtual memory. No special measures are needed for samples
that wrap around at the end of the circular buffer data area, because the
page after the last data page is the first data page again, and thus the
sample still appears completely contiguous in virtual memory.
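
For contrast, the following sketch shows the kind of special measure a
circular buffer mapped only once would need: a write that crosses the end of
the data area has to be split into two copies. The helper name
single_map_write is hypothetical and nothing like it exists in the ring
buffer code; with the double mapping described above, a single contiguous
access is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write len bytes at logical position pos into a
 * circular buffer that is mapped only once. size must be a power of two.
 */
static void single_map_write(uint8_t *data, size_t size, uint64_t pos,
			     const void *src, size_t len)
{
	size_t off, first;

	assert(size != 0 && (size & (size - 1)) == 0);
	assert(len <= size);

	off = (size_t)(pos & (size - 1));
	first = size - off;	/* bytes left before the end of the area */
	if (first > len)
		first = len;

	/* Part that fits before the end of the data area. */
	memcpy(data + off, src, first);
	/* Remainder that wraps around to the start of the data area. */
	memcpy(data, (const uint8_t *)src + first, len - first);
}
```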

Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset, which is inaccessible to the BPF program.
Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ`
for the BPF program to use. Bing-Jhong and Muhammad reported, however, that it
is possible to make a second allocated memory chunk overlap with the first
chunk, and as a result the BPF program is able to edit the first chunk's
header.

For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, let's
allocate a chunk B with size 0x3000. This will succeed because consumer_pos
was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask`
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data
pages. This means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B has modified chunk A's header, bpf_ringbuf_commit() refers to the wrong
page and could cause a crash.
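
The arithmetic of this example can be replayed with a small user-space
model. This is only a sketch: old_check_passes and covers_header are
hypothetical helpers written for illustration, HDR_SZ stands in for the
8-byte header size, and the only thing taken from the text above is the
`new_prod_pos - cons_pos > rb->mask` comparison.

```c
#include <assert.h>
#include <stdint.h>

#define HDR_SZ 8u	/* assumed size of struct bpf_ringbuf_hdr: two u32 */

/*
 * Space check quoted in the text above, in isolation. Returns nonzero
 * when the reservation would be accepted.
 */
static int old_check_passes(uint64_t new_prod_pos, uint64_t cons_pos,
			    uint64_t mask)
{
	return !(new_prod_pos - cons_pos > mask);
}

/*
 * Does the writable logical range [start, end) fully cover the header
 * stored at data offset hdr_off? size is the power-of-2 buffer size and
 * hdr_off must be smaller than size.
 */
static int covers_header(uint64_t start, uint64_t end, uint64_t hdr_off,
			 uint64_t size)
{
	/* First logical position >= start that maps to hdr_off. */
	uint64_t base = start - (start & (size - 1)) + hdr_off;

	if (base < start)
		base += size;
	return base + HDR_SZ <= end;
}
```

With size 0x4000 and consumer_pos forced to 0x3000, chunk B ends at 0x6010
and still passes the check, and its writable range [0x3010,0x6010] covers
logical [0x4000,0x4008], which maps to offset 0x0, chunk A's header.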

Fix it by calculating the oldest pending_pos and checking whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix, and while it seems a bit slower on some benchmarks, the
difference is not significant enough to matter.
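
The idea of the fix can be sketched the same way. This is a simplified
model, not the patch itself: fixed_check_passes is a hypothetical name,
pend_pos is passed in rather than computed by walking the records, and
keeping the consumer check alongside the new one is an assumption of this
sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the reservation check with the additional pending-position
 * test. pend_pos is the logical position of the oldest record that has
 * been reserved but not yet submitted or discarded. Returns nonzero
 * when the reservation would be accepted.
 */
static int fixed_check_passes(uint64_t new_prod_pos, uint64_t cons_pos,
			      uint64_t pend_pos, uint64_t mask)
{
	/* Not enough room according to the (user-writable) consumer counter. */
	if (new_prod_pos - cons_pos > mask)
		return 0;
	/* Oldest outstanding record to newest would exceed the buffer size. */
	if (new_prod_pos - pend_pos > mask)
		return 0;
	return 1;
}
```

Applied to the example above, chunk A is still pending at position 0x0
when chunk B is requested, so 0x6010 - 0x0 exceeds the mask 0x3fff and the
request is rejected regardless of the forged consumer_pos.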

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg>
Co-developed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27 10:40:22 +02:00
..
bpf bpf: Fix overrunning reservations in ringbuf 2024-07-27 10:40:22 +02:00
cgroup sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:32:15 +02:00
configs
debug kdb: Use format-specifiers rather than memset() for padding in kdb_read() 2024-06-16 13:32:35 +02:00
dma dma-mapping: clear dev->dma_mem to NULL after freeing it 2024-01-25 14:37:45 -08:00
entry entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set 2023-01-04 11:39:22 +01:00
events perf/core: Fix missing wakeup when waiting for context reference 2024-07-05 09:12:44 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:06:44 +01:00
gcov gcov: add support for GCC 14 2024-07-05 09:12:41 +02:00
irq genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline 2024-06-16 13:32:30 +02:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-27 08:43:57 +02:00
livepatch kallsyms: refactor {,module_}kallsyms_on_each_symbol 2024-06-21 14:52:58 +02:00
locking lockdep: Fix block chain corruption 2023-12-08 08:46:09 +01:00
power PM: suspend: Set mem_sleep_current during kernel command line setup 2024-04-13 12:58:13 +02:00
printk printk: Update @console_may_schedule in console_trylock_spinning() 2024-04-13 12:58:54 +02:00
rcu rcutorture: Fix invalid context warning when enable srcu barrier testing 2024-07-05 09:12:33 +02:00
sched sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:32:15 +02:00
time tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() 2024-07-05 09:12:32 +02:00
trace tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test 2024-07-05 09:12:43 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-14 10:16:14 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-02-23 08:41:53 +01:00
audit_fsnotify.c fsnotify: make allow_dups a property of the group 2024-06-21 14:53:39 +02:00
audit_tree.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:42:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-18 13:05:44 +02:00
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-19 12:20:13 +02:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:23:46 +02:00
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:23:45 +02:00
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c cpu: Re-enable CPU mitigations by default for !X86 architectures 2024-05-02 16:23:44 +02:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:44:30 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c mm: optimize the redundant loop of mm_update_owner_next() 2024-07-18 13:05:42 +02:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 16:40:18 +01:00
fork.c exec: Simplify unshare_files 2024-06-21 14:52:47 +02:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-04-07 15:00:14 +02:00
gen_kheaders.sh kheaders: explicitly define file modes for archived headers 2024-07-05 09:12:44 +02:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work, smp: Allow irq_work on call_single_queue 2020-05-28 10:54:15 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-20 16:05:58 +02:00
kallsyms.c kallsyms: only build {,module_}kallsyms_on_each_symbol when required 2024-06-21 14:52:58 +02:00
kcmp.c kcmp: In get_file_raw_ptr use task_lookup_fd_rcu 2024-06-21 14:52:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: don't lose track of remote references during softirqs 2024-07-05 09:12:41 +02:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-27 08:43:40 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:45:37 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kexec.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-17 11:47:33 +02:00
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Fix possible use-after-free issue on kprobe registration 2024-05-02 16:23:36 +02:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:10:29 +02:00
kthread.c exit: Implement kthread_exit 2024-06-21 14:53:28 +02:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2023-01-14 10:15:20 +01:00
module_signature.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module_signing.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module-internal.h
module.c NFSD: Remove svc_serv_ops::svo_module 2024-06-21 14:53:37 +02:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c padata: Disable BH when taking works lock on MT path 2024-07-05 09:12:33 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 12:59:40 +02:00
params.c params: lift param_set_uint_minmax to common code 2024-06-16 13:32:26 +02:00
pid_namespace.c zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING 2024-07-05 09:12:33 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2024-06-21 14:53:17 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-21 15:16:05 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:20:49 +02:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:54:58 +00:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-17 11:47:34 +02:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-11 16:40:04 +01:00
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:40:03 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:01:05 +02:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2024-03-01 13:16:46 +01:00
signal.c task_work: unconditionally run task_work from get_signal() 2023-01-04 11:39:23 +01:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:23:29 +02:00
smpboot.c sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:01:00 +01:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-09-08 08:49:00 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:12:55 +02:00
sys.c fs: add file and path permissions helpers 2024-06-21 14:52:58 +02:00
sysctl-test.c
sysctl.c sysctl: introduce new proc handler proc_dobool 2024-06-21 14:53:18 +02:00
task_work.c task_work: add helper for more targeted task_work canceling 2023-01-04 11:39:23 +01:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c torture: Fix hang during kthread shutdown phase 2023-08-30 16:23:17 +02:00
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 10:54:33 +01:00
ucount.c fanotify: configurable limits via sysfs 2024-06-21 14:53:06 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-30 14:32:03 +02:00
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths 2023-03-17 08:45:13 +01:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-27 08:43:40 +02:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 16:54:56 +00:00
workqueue_internal.h
workqueue.c Revert "workqueue: remove unused cancel_work()" 2023-12-08 08:46:13 +01:00