android_kernel_xiaomi_sm8450/net/core
Connor O'Brien cf002be3b8 bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode
From: Daniel Borkmann <daniel@iogearbox.net>

commit 8520e224f547cd070c7c8f97b1fc6d58cff7ccaa upstream.

Fix cgroup v1 interference when non-root cgroup v2 BPF programs are used.
Back in the day, commit bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
embedded per-socket cgroup information into sock->sk_cgrp_data and in order
to save 8 bytes in struct sock made both mutually exclusive, that is, when
cgroup v1 socket tagging (e.g. net_cls/net_prio) is used, then cgroup v2
falls back to the root cgroup in sock_cgroup_ptr() (&cgrp_dfl_root.cgrp).
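
The mutual exclusion described above can be sketched in userspace C. This is
a simplified model under stated assumptions, not the kernel code: struct
old_skcd, old_set_cgroup(), old_set_classid(), old_cgroup_ptr() and
model_root_cgrp are hypothetical names for illustration, and the bit layout
is reduced to a single flag bit. Only the behavior matches the description:
once v1 tag data is stored, the v2 lookup returns the root cgroup.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct cgroup (assumption). */
struct cgroup {
	const char *path;
};

/* Stand-in for cgrp_dfl_root.cgrp, the cgroup v2 root. */
static struct cgroup model_root_cgrp = { "/" };

/*
 * Hypothetical model of the pre-fix storage: one 64-bit word holds
 * EITHER a cgroup v2 pointer OR cgroup v1 tag data, never both.
 * Bit 0 set means "this word holds v1 data".
 */
struct old_skcd {
	uint64_t val;
};

/* Store a v2 cgroup pointer; pointer alignment keeps bit 0 clear. */
static void old_set_cgroup(struct old_skcd *d, struct cgroup *cgrp)
{
	uint64_t v = (uint64_t)(uintptr_t)cgrp;

	assert((v & 1) == 0);
	d->val = v;
}

/* Tag the socket with a v1 net_cls classid; this overwrites the v2 pointer. */
static void old_set_classid(struct old_skcd *d, uint32_t classid)
{
	d->val = ((uint64_t)classid << 32) | 1;
}

/* Model of the pre-fix lookup: v1 data (or no data) yields the v2 root. */
static struct cgroup *old_cgroup_ptr(const struct old_skcd *d)
{
	if (d->val & 1)
		return &model_root_cgrp;
	if (d->val == 0)
		return &model_root_cgrp;
	return (struct cgroup *)(uintptr_t)d->val;
}
```

In this model, calling old_set_cgroup() with a non-root cgroup and then
old_set_classid() makes old_cgroup_ptr() return the root again, so a BPF
program attached to the non-root cgroup would no longer be found for that
socket.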

The assumption made was "there is no reason to mix the two and this is in line
with how legacy and v2 compatibility is handled" as stated in bd1060a1d6.
However, with Kubernetes more widely supporting cgroups v2 as well nowadays,
this assumption no longer holds, and the possibility of the v1/v2 mixed mode
with the v2 root fallback being hit becomes a real security issue.

Many cgroup v2 BPF programs are also used for policy enforcement, for
example to programmatically deny socket-related system calls like connect(2)
or bind(2). A v2 root fallback would implicitly cause a policy bypass for the
affected Pods.

In production environments, we have recently seen this case due to various
circumstances: i) a different 3rd party agent and/or ii) a container runtime
such as [0] in the user's environment configuring legacy cgroup v1 net_cls
tags, which implicitly triggered the root fallback mentioned above. Another
case is Kubernetes projects like kind [1], which create Kubernetes nodes in a
container and also add cgroup namespaces to the mix, meaning programs which
are attached to the cgroup v2 root of the cgroup namespace get attached to a
non-root cgroup v2 path from the init namespace's point of view. The latter's
root is out of reach for agents on a kind Kubernetes node to configure. As a
result, any entity on the node setting a cgroup v1 net_cls tag will trigger
the bypass despite cgroup v2 BPF programs being attached to the namespace
root.

Generally, this mutual exclusiveness does not hold anymore in today's user
environments and makes cgroup v2 usage from the BPF side fragile and
unreliable. This fix adds a proper struct cgroup pointer for the cgroup v2
case to struct sock_cgroup_data in order to address these issues; this
implicitly also fixes the tradeoffs made back then with regard to races and
refcount leaks as stated in bd1060a1d6, and removes the fallback, so that
cgroup v2 BPF programs always operate as expected.
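
The effect of giving the v2 case its own pointer can be sketched in userspace
C as follows. This is a simplified model and not the patched kernel
definition: struct new_skcd, new_set_cgroup(), new_set_classid() and
new_cgroup_ptr() are hypothetical names, and the field set is an assumption
reduced to what the text mentions (a struct cgroup pointer for v2 next to the
v1 net_cls/net_prio tags).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct cgroup (assumption). */
struct cgroup {
	const char *path;
};

/*
 * Hypothetical model of the post-fix storage: the v2 pointer and the
 * v1 tags live in separate fields, so writing one never clobbers the other.
 */
struct new_skcd {
	struct cgroup *cgroup;	/* cgroup v2 association */
	uint32_t classid;	/* cgroup v1 net_cls tag */
	uint16_t prioidx;	/* cgroup v1 net_prio index */
};

/* Record the socket's v2 cgroup. */
static void new_set_cgroup(struct new_skcd *d, struct cgroup *cgrp)
{
	assert(cgrp != NULL);
	d->cgroup = cgrp;
}

/* Tag the socket with a v1 net_cls classid; the v2 pointer is untouched. */
static void new_set_classid(struct new_skcd *d, uint32_t classid)
{
	d->classid = classid;
}

/* Model of the post-fix lookup: no root fallback, just the stored pointer. */
static struct cgroup *new_cgroup_ptr(const struct new_skcd *d)
{
	return d->cgroup;
}
```

In this model a v1 tag set after new_set_cgroup() leaves new_cgroup_ptr()
pointing at the same non-root cgroup, which is the property the policy
enforcement case relies on. The cost is a larger structure than the single
64-bit word the original design aimed for.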

  [0] https://github.com/nestybox/sysbox/
  [1] https://kind.sigs.k8s.io/

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-1-daniel@iogearbox.net
[resolve trivial conflicts]
Signed-off-by: Connor O'Brien <connor.obrien@crowdstrike.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12 11:06:42 +02:00
bpf_sk_storage.c bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing 2023-08-11 11:57:47 +02:00
datagram.c net: datagram: fix data-races in datagram_poll() 2023-05-30 12:57:46 +01:00
datagram.h
dev_addr_lists.c net: core: add nested_level variable in net_device 2020-09-28 15:00:15 -07:00
dev_ioctl.c net: dev: Convert sa_data to flexible array in struct sockaddr 2024-03-01 13:16:50 +01:00
dev.c net: give more chances to rcu in netdev_wait_allrefs_any() 2024-06-16 13:32:07 +02:00
devlink.c devlink: remove reload failed checks in params get/set callbacks 2023-09-23 11:01:05 +02:00
drop_monitor.c drop_monitor: replace spin_lock by raw_spin_lock 2024-07-05 09:12:34 +02:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-12-08 09:03:22 +01:00
dst.c ipv6: remove max_size check inline with ipv4 2024-01-15 18:48:07 +01:00
failover.c
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c ipv6: fix memory leak in fib6_rule_suppress 2021-12-08 09:03:21 +01:00
filter.c bpf: Fix a segment issue when downgrading gso_size 2024-08-19 05:41:05 +02:00
flow_dissector.c net/ipv6: SKB symmetric hash should incorporate transport ports 2023-09-19 12:20:23 +02:00
flow_offload.c netfilter: nf_tables: bail out early if hardware offload is not supported 2022-06-14 18:32:40 +02:00
gen_estimator.c net_sched: gen_estimator: support large ewma log 2021-01-27 11:55:23 +01:00
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c net: Fix data-races around netdev_max_backlog. 2022-08-31 17:15:19 +02:00
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: linkwatch: use system_unbound_wq 2024-08-19 05:41:11 +02:00
lwt_bpf.c lwt: Fix return values of BPF xmit ops 2023-09-19 12:20:09 +02:00
lwtunnel.c lwtunnel: Validate RTA_ENCAP_TYPE attribute length 2022-01-11 15:25:00 +01:00
Makefile ethtool: move to its own directory 2019-12-12 17:07:05 -08:00
neighbour.c neighbour: Don't let neigh_forced_gc() disable preemption for long 2024-01-25 14:37:37 -08:00
net_namespace.c netns: Make get_net_ns() handle zero refcount net 2024-07-05 09:12:38 +02:00
net-procfs.c net-procfs: show net devices bound packet types 2022-02-01 17:25:44 +01:00
net-sysfs.c ethtool: check device is present when getting link settings 2024-09-04 13:17:46 +02:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c page_pool: add tracepoints for page_pool with details need by XDP 2019-06-19 11:23:13 -04:00
netclassid_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2024-09-12 11:06:42 +02:00
netevent.c
netpoll.c netpoll: Fix race condition in netpoll_owner_active 2024-07-05 09:12:35 +02:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2024-09-12 11:06:42 +02:00
page_pool.c mm: fix struct page layout on 32-bit systems 2021-05-19 10:13:17 +02:00
pktgen.c net: pktgen: Fix interface flags printing 2023-10-25 11:54:20 +02:00
ptp_classifier.c ptp: Add generic ptp v2 header parsing function 2020-08-19 16:07:49 -07:00
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-02-23 08:41:55 +01:00
rtnetlink.c rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validation 2024-05-17 11:48:07 +02:00
scm.c io_uring/unix: drop usage of io_uring socket 2024-03-26 18:21:45 -04:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-29 17:19:21 +02:00
skbuff.c kcov: Remove kcov include from sched.h and move it to its users. 2024-05-17 11:48:07 +02:00
skmsg.c bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues 2024-07-18 13:05:43 +02:00
sock_diag.c sock_diag: annotate data-races around sock_diag_handlers[family] 2024-03-26 18:21:49 -04:00
sock_map.c bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues 2024-07-18 13:05:43 +02:00
sock_reuseport.c udp: Update reuse->has_conns under reuseport_lock. 2022-10-30 09:41:19 +01:00
sock.c ipv6: Fix data races around sk->sk_prot. 2024-07-05 09:12:56 +02:00
stream.c net: deal with most data-races in sk_wait_event() 2023-05-30 12:57:46 +01:00
sysctl_net_core.c net: Fix data-races around weight_p and dev_weight_[rt]x_bias. 2022-08-31 17:15:19 +02:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c xdp: fix invalid wait context of page_pool_destroy() 2024-08-19 05:40:48 +02:00