android_kernel_xiaomi_sm8450/net/core
Taehee Yoo be9d08ff10 xdp: fix invalid wait context of page_pool_destroy()
[ Upstream commit 59a931c5b732ca5fc2ca727f5a72aeabaafa85ec ]

If the driver uses a page pool, it creates a page pool with
page_pool_create().
The reference count of page pool is 1 as default.
A page pool will be destroyed only when a reference count reaches 0.
page_pool_destroy() is used to destroy page pool, it decreases a
reference count.
When a page pool is destroyed, ->disconnect() is called, which is
mem_allocator_disconnect().
This function internally acquires mutex_lock().

If the driver uses XDP, it registers a memory model with
xdp_rxq_info_reg_mem_model().
The xdp_rxq_info_reg_mem_model() internally increases a page pool
reference count if a memory model is a page pool.
Now the reference count is 2.

To destroy a page pool, the driver should call both page_pool_destroy()
and xdp_unreg_mem_model().
The xdp_unreg_mem_model() internally calls page_pool_destroy().
Only page_pool_destroy() decreases a reference count.

If a driver calls page_pool_destroy() then xdp_unreg_mem_model(), we
will face an invalid wait context warning.
Because xdp_unreg_mem_model() calls page_pool_destroy() with
rcu_read_lock().
The page_pool_destroy() internally acquires mutex_lock().

Splat looks like:
=============================
[ BUG: Invalid wait context ]
6.10.0-rc6+ #4 Tainted: G W
-----------------------------
ethtool/1806 is trying to lock:
ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
other info that might help us debug this:
context-{5:5}
3 locks held by ethtool/1806:
stack backtrace:
CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
__lock_acquire+0x1681/0x4de0
? _printk+0x64/0xe0
? __pfx_mark_lock.part.0+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
lock_acquire+0x1b3/0x580
? mem_allocator_disconnect+0x73/0x150
? __wake_up_klogd.part.0+0x16/0xc0
? __pfx_lock_acquire+0x10/0x10
? dump_stack_lvl+0x91/0xc0
__mutex_lock+0x15c/0x1690
? mem_allocator_disconnect+0x73/0x150
? __pfx_prb_read_valid+0x10/0x10
? mem_allocator_disconnect+0x73/0x150
? __pfx_llist_add_batch+0x10/0x10
? console_unlock+0x193/0x1b0
? lockdep_hardirqs_on+0xbe/0x140
? __pfx___mutex_lock+0x10/0x10
? tick_nohz_tick_stopped+0x16/0x90
? __irq_work_queue_local+0x1e5/0x330
? irq_work_queue+0x39/0x50
? __wake_up_klogd.part.0+0x79/0xc0
? mem_allocator_disconnect+0x73/0x150
mem_allocator_disconnect+0x73/0x150
? __pfx_mem_allocator_disconnect+0x10/0x10
? mark_held_locks+0xa5/0xf0
? rcu_is_watching+0x11/0xb0
page_pool_release+0x36e/0x6d0
page_pool_destroy+0xd7/0x440
xdp_unreg_mem_model+0x1a7/0x2a0
? __pfx_xdp_unreg_mem_model+0x10/0x10
? kfree+0x125/0x370
? bnxt_free_ring.isra.0+0x2eb/0x500
? bnxt_free_mem+0x5ac/0x2500
xdp_rxq_info_unreg+0x4a/0xd0
bnxt_free_mem+0x1356/0x2500
bnxt_close_nic+0xf0/0x3b0
? __pfx_bnxt_close_nic+0x10/0x10
? ethnl_parse_bit+0x2c6/0x6d0
? __pfx___nla_validate_parse+0x10/0x10
? __pfx_ethnl_parse_bit+0x10/0x10
bnxt_set_features+0x2a8/0x3e0
__netdev_update_features+0x4dc/0x1370
? ethnl_parse_bitset+0x4ff/0x750
? __pfx_ethnl_parse_bitset+0x10/0x10
? __pfx___netdev_update_features+0x10/0x10
? mark_held_locks+0xa5/0xf0
? _raw_spin_unlock_irqrestore+0x42/0x70
? __pm_runtime_resume+0x7d/0x110
ethnl_set_features+0x32d/0xa20

To fix this problem, it uses rhashtable_lookup_fast() instead of
rhashtable_lookup() with rcu_read_lock().
Using xa without rcu_read_lock() here is safe.
xa is freed by __xdp_mem_allocator_rcu_free() and this is called by
call_rcu() of mem_xa_remove().
The mem_xa_remove() is called by page_pool_destroy() if a reference
count reaches 0.
The xa is already protected by the reference count mechanism well in the
control plane.
So removing rcu_read_lock() for page_pool_destroy() is safe.

Fixes: c3f812cea0 ("page_pool: do not release pool until inflight == 0.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-19 05:40:48 +02:00
..
bpf_sk_storage.c bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing 2023-08-11 11:57:47 +02:00
datagram.c net: datagram: fix data-races in datagram_poll() 2023-05-30 12:57:46 +01:00
datagram.h
dev_addr_lists.c net: core: add nested_level variable in net_device 2020-09-28 15:00:15 -07:00
dev_ioctl.c net: dev: Convert sa_data to flexible array in struct sockaddr 2024-03-01 13:16:50 +01:00
dev.c net: give more chances to rcu in netdev_wait_allrefs_any() 2024-06-16 13:32:07 +02:00
devlink.c devlink: remove reload failed checks in params get/set callbacks 2023-09-23 11:01:05 +02:00
drop_monitor.c drop_monitor: replace spin_lock by raw_spin_lock 2024-07-05 09:12:34 +02:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-12-08 09:03:22 +01:00
dst.c ipv6: remove max_size check inline with ipv4 2024-01-15 18:48:07 +01:00
failover.c
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c ipv6: fix memory leak in fib6_rule_suppress 2021-12-08 09:03:21 +01:00
filter.c bpf: Add a check for struct bpf_fib_lookup size 2024-07-05 09:12:49 +02:00
flow_dissector.c net/ipv6: SKB symmetric hash should incorporate transport ports 2023-09-19 12:20:23 +02:00
flow_offload.c netfilter: nf_tables: bail out early if hardware offload is not supported 2022-06-14 18:32:40 +02:00
gen_estimator.c net_sched: gen_estimator: support large ewma log 2021-01-27 11:55:23 +01:00
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c net: Fix data-races around netdev_max_backlog. 2022-08-31 17:15:19 +02:00
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: linkwatch: fix failure to restore device state across suspend/resume 2021-08-18 08:59:13 +02:00
lwt_bpf.c lwt: Fix return values of BPF xmit ops 2023-09-19 12:20:09 +02:00
lwtunnel.c lwtunnel: Validate RTA_ENCAP_TYPE attribute length 2022-01-11 15:25:00 +01:00
Makefile ethtool: move to its own directory 2019-12-12 17:07:05 -08:00
neighbour.c neighbour: Don't let neigh_forced_gc() disable preemption for long 2024-01-25 14:37:37 -08:00
net_namespace.c netns: Make get_net_ns() handle zero refcount net 2024-07-05 09:12:38 +02:00
net-procfs.c net-procfs: show net devices bound packet types 2022-02-01 17:25:44 +01:00
net-sysfs.c net-sysfs: add check for netdevice being present to speed_show 2022-03-16 14:16:00 +01:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c page_pool: add tracepoints for page_pool with details need by XDP 2019-06-19 11:23:13 -04:00
netclassid_cgroup.c cgroup, netclassid: remove double cond_resched 2020-04-21 15:44:30 -07:00
netevent.c
netpoll.c netpoll: Fix race condition in netpoll_owner_active 2024-07-05 09:12:35 +02:00
netprio_cgroup.c netprio_cgroup: Fix unlimited memory leak of v2 cgroups 2020-05-09 20:59:21 -07:00
page_pool.c mm: fix struct page layout on 32-bit systems 2021-05-19 10:13:17 +02:00
pktgen.c net: pktgen: Fix interface flags printing 2023-10-25 11:54:20 +02:00
ptp_classifier.c ptp: Add generic ptp v2 header parsing function 2020-08-19 16:07:49 -07:00
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-02-23 08:41:55 +01:00
rtnetlink.c rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validation 2024-05-17 11:48:07 +02:00
scm.c io_uring/unix: drop usage of io_uring socket 2024-03-26 18:21:45 -04:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-29 17:19:21 +02:00
skbuff.c kcov: Remove kcov include from sched.h and move it to its users. 2024-05-17 11:48:07 +02:00
skmsg.c bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues 2024-07-18 13:05:43 +02:00
sock_diag.c sock_diag: annotate data-races around sock_diag_handlers[family] 2024-03-26 18:21:49 -04:00
sock_map.c bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues 2024-07-18 13:05:43 +02:00
sock_reuseport.c udp: Update reuse->has_conns under reuseport_lock. 2022-10-30 09:41:19 +01:00
sock.c ipv6: Fix data races around sk->sk_prot. 2024-07-05 09:12:56 +02:00
stream.c net: deal with most data-races in sk_wait_event() 2023-05-30 12:57:46 +01:00
sysctl_net_core.c net: Fix data-races around weight_p and dev_weight_[rt]x_bias. 2022-08-31 17:15:19 +02:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c xdp: fix invalid wait context of page_pool_destroy() 2024-08-19 05:40:48 +02:00