android_kernel_xiaomi_sm8450/kernel
Eduard Zingerman ca42be8dd1 bpf: Allow reads from uninit stack
commit 6715df8d5d24655b9fd368e904028112b54c7de1 upstream.

This commits updates the following functions to allow reads from
uninitialized stack locations when env->allow_uninit_stack option is
enabled:
- check_stack_read_fixed_off()
- check_stack_range_initialized(), called from:
  - check_stack_read_var_off()
  - check_helper_mem_access()

Such change allows to relax logic in stacksafe() to treat STACK_MISC
and STACK_INVALID in a same way and make the following stack slot
configurations equivalent:

  |  Cached state    |  Current state   |
  |   stack slot     |   stack slot     |
  |------------------+------------------|
  | STACK_INVALID or | STACK_INVALID or |
  | STACK_MISC       | STACK_SPILL   or |
  |                  | STACK_MISC    or |
  |                  | STACK_ZERO    or |
  |                  | STACK_DYNPTR     |

This leads to significant verification speed gains (see below).

The idea was suggested by Andrii Nakryiko [1] and initial patch was
created by Alexei Starovoitov [2].

Currently the env->allow_uninit_stack is allowed for programs loaded
by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities.

A number of test cases from verifier/*.c were expecting uninitialized
stack access to be an error. These test cases were updated to execute
in unprivileged mode (thus preserving the tests).

The test progs/test_global_func10.c expected "invalid indirect read
from stack" error message because of the access to uninitialized
memory region. This error is no longer possible in privileged mode.
The test is updated to provoke an error "invalid indirect access to
stack" because of access to invalid stack address (such error is not
verified by progs/test_global_func*.c series of tests).

The following tests had to be removed because these can't be made
unprivileged:
- verifier/sock.c:
  - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init
  stack_value"
  BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode.
- verifier/var_off.c:
  - "indirect variable-offset stack access, max_off+size > max_initialized"
  - "indirect variable-offset stack access, uninitialized"
  These tests verify that access to uninitialized stack values is
  detected when stack offset is not a constant. However, variable
  stack access is prohibited in unprivileged mode, thus these tests
  are no longer valid.

 * * *

Here is veristat log comparing this patch with current master on a
set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
and cilium BPF binaries (see [3]):

$ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log
File                        Program                     States (A)  States (B)  States    (DIFF)
--------------------------  --------------------------  ----------  ----------  ----------------
bpf_host.o                  tail_handle_ipv6_from_host         349         244    -105 (-30.09%)
bpf_host.o                  tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
bpf_lxc.o                   tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
bpf_sock.o                  cil_sock4_connect                   70          48     -22 (-31.43%)
bpf_sock.o                  cil_sock4_sendmsg                   68          46     -22 (-32.35%)
bpf_xdp.o                   tail_handle_nat_fwd_ipv4          1554         803    -751 (-48.33%)
bpf_xdp.o                   tail_lb_ipv4                      6457        2473   -3984 (-61.70%)
bpf_xdp.o                   tail_lb_ipv6                      7249        3908   -3341 (-46.09%)
pyperf600_bpf_loop.bpf.o    on_event                           287         145    -142 (-49.48%)
strobemeta.bpf.o            on_event                         15915        4772  -11143 (-70.02%)
strobemeta_nounroll2.bpf.o  on_event                         17087        3820  -13267 (-77.64%)
xdp_synproxy_kern.bpf.o     syncookie_tc                     21271        6635  -14636 (-68.81%)
xdp_synproxy_kern.bpf.o     syncookie_xdp                    23122        6024  -17098 (-73.95%)
--------------------------  --------------------------  ----------  ----------  ----------------

Note: I limited selection by states_pct<-30%.

Inspection of differences in pyperf600_bpf_loop behavior shows that
the following patch for the test removes almost all differences:

    - a/tools/testing/selftests/bpf/progs/pyperf.h
    + b/tools/testing/selftests/bpf/progs/pyperf.h
    @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx)
            }

            if (event->pthread_match || !pidData->use_tls) {
    -               void* frame_ptr;
    -               FrameData frame;
    +               void* frame_ptr = 0;
    +               FrameData frame = {};
                    Symbol sym = {};
                    int cur_cpu = bpf_get_smp_processor_id();

W/o this patch the difference comes from the following pattern
(for different variables):

    static bool get_frame_data(... FrameData *frame ...)
    {
        ...
        bpf_probe_read_user(&frame->f_code, ...);
        if (!frame->f_code)
            return false;
        ...
        bpf_probe_read_user(&frame->co_name, ...);
        if (frame->co_name)
            ...;
    }

    int __on_event(struct bpf_raw_tracepoint_args *ctx)
    {
        FrameData frame;
        ...
        get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback
        ...
    }

    SEC("raw_tracepoint/kfree_skb")
    int on_event(struct bpf_raw_tracepoint_args* ctx)
    {
        ...
        ret |= __on_event(ctx);
        ret |= __on_event(ctx);
        ...
    }

With regards to value `frame->co_name` the following is important:
- Because of the conditional `if (!frame->f_code)` each call to
  __on_event() produces two states, one with `frame->co_name` marked
  as STACK_MISC, another with it as is (and marked STACK_INVALID on a
  first call).
- The call to bpf_probe_read_user() does not mark stack slots
  corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks
  these slots as BPF_MISC, this happens because of the following loop
  in the check_helper_call():

	for (i = 0; i < meta.access_size; i++) {
		err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B,
				       BPF_WRITE, -1, false);
		if (err)
			return err;
	}

  Note the size of the write, it is a one byte write for each byte
  touched by a helper. The BPF_B write does not lead to write marks
  for the target stack slot.
- Which means that w/o this patch when second __on_event() call is
  verified `if (frame->co_name)` will propagate read marks first to a
  stack slot with STACK_MISC marks and second to a stack slot with
  STACK_INVALID marks and these states would be considered different.

[1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/
[3] git@github.com:anakryiko/cilium.git

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-18 13:05:50 +02:00
..
bpf bpf: Allow reads from uninit stack 2024-07-18 13:05:50 +02:00
cgroup sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:32:15 +02:00
configs
debug kdb: Use format-specifiers rather than memset() for padding in kdb_read() 2024-06-16 13:32:35 +02:00
dma dma-mapping: clear dev->dma_mem to NULL after freeing it 2024-01-25 14:37:45 -08:00
entry entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set 2023-01-04 11:39:22 +01:00
events perf/core: Fix missing wakeup when waiting for context reference 2024-07-05 09:12:44 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:06:44 +01:00
gcov gcov: add support for GCC 14 2024-07-05 09:12:41 +02:00
irq genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline 2024-06-16 13:32:30 +02:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-27 08:43:57 +02:00
livepatch kallsyms: refactor {,module_}kallsyms_on_each_symbol 2024-06-21 14:52:58 +02:00
locking lockdep: Fix block chain corruption 2023-12-08 08:46:09 +01:00
power PM: suspend: Set mem_sleep_current during kernel command line setup 2024-04-13 12:58:13 +02:00
printk printk: Update @console_may_schedule in console_trylock_spinning() 2024-04-13 12:58:54 +02:00
rcu rcutorture: Fix invalid context warning when enable srcu barrier testing 2024-07-05 09:12:33 +02:00
sched sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:32:15 +02:00
time tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() 2024-07-05 09:12:32 +02:00
trace tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test 2024-07-05 09:12:43 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-14 10:16:14 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-02-23 08:41:53 +01:00
audit_fsnotify.c fsnotify: make allow_dups a property of the group 2024-06-21 14:53:39 +02:00
audit_tree.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-06-21 14:53:39 +02:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:42:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-18 13:05:44 +02:00
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-19 12:20:13 +02:00
backtracetest.c
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:23:46 +02:00
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:23:45 +02:00
configs.c
context_tracking.c
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c cpu: Re-enable CPU mitigations by default for !X86 architectures 2024-05-02 16:23:44 +02:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:44:30 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c mm: optimize the redundant loop of mm_update_owner_next() 2024-07-18 13:05:42 +02:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 16:40:18 +01:00
fork.c exec: Simplify unshare_files 2024-06-21 14:52:47 +02:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-04-07 15:00:14 +02:00
gen_kheaders.sh kheaders: explicitly define file modes for archived headers 2024-07-05 09:12:44 +02:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-20 16:05:58 +02:00
kallsyms.c kallsyms: only build {,module_}kallsyms_on_each_symbol when required 2024-06-21 14:52:58 +02:00
kcmp.c kcmp: In get_file_raw_ptr use task_lookup_fd_rcu 2024-06-21 14:52:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: don't lose track of remote references during softirqs 2024-07-05 09:12:41 +02:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-27 08:43:40 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:45:37 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kexec.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:10:29 +02:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-17 11:47:33 +02:00
kmod.c
kprobes.c kprobes: Fix possible use-after-free issue on kprobe registration 2024-05-02 16:23:36 +02:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:10:29 +02:00
kthread.c exit: Implement kthread_exit 2024-06-21 14:53:28 +02:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2023-01-14 10:15:20 +01:00
module_signature.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module_signing.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module-internal.h
module.c NFSD: Remove svc_serv_ops::svo_module 2024-06-21 14:53:37 +02:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c
padata.c padata: Disable BH when taking works lock on MT path 2024-07-05 09:12:33 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 12:59:40 +02:00
params.c params: lift param_set_uint_minmax to common code 2024-06-16 13:32:26 +02:00
pid_namespace.c zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING 2024-07-05 09:12:33 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2024-06-21 14:53:17 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-21 15:16:05 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:20:49 +02:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:54:58 +00:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-17 11:47:34 +02:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-11 16:40:04 +01:00
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:40:03 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:01:05 +02:00
scs.c
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2024-03-01 13:16:46 +01:00
signal.c task_work: unconditionally run task_work from get_signal() 2023-01-04 11:39:23 +01:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:23:29 +02:00
smpboot.c sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:01:00 +01:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-09-08 08:49:00 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:12:55 +02:00
sys.c fs: add file and path permissions helpers 2024-06-21 14:52:58 +02:00
sysctl-test.c
sysctl.c sysctl: introduce new proc handler proc_dobool 2024-06-21 14:53:18 +02:00
task_work.c task_work: add helper for more targeted task_work canceling 2023-01-04 11:39:23 +01:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c torture: Fix hang during kthread shutdown phase 2023-08-30 16:23:17 +02:00
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 10:54:33 +01:00
ucount.c fanotify: configurable limits via sysfs 2024-06-21 14:53:06 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-30 14:32:03 +02:00
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths 2023-03-17 08:45:13 +01:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-27 08:43:40 +02:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 16:54:56 +00:00
workqueue_internal.h
workqueue.c Revert "workqueue: remove unused cancel_work()" 2023-12-08 08:46:13 +01:00