android_kernel_samsung_sm8650/net
Felix Huettner 644b3051b0 net: openvswitch: fix race on port output
[ Upstream commit 066b86787fa3d97b7aefb5ac0a99a22dad2d15f8 ]

assume the following setup on a single machine:
1. An openvswitch instance with one bridge and default flows
2. two network namespaces "server" and "client"
3. two ovs interfaces "server" and "client" on the bridge
4. for each ovs interface a veth pair with a matching name and 32 rx and
   tx queues
5. move the ends of the veth pairs to the respective network namespaces
6. assign ip addresses to each of the veth ends in the namespaces (needs
   to be the same subnet)
7. start some http server on the server network namespace
8. test if a client in the client namespace can reach the http server

when following the actions below the host has a chance of getting a cpu
stuck in a infinite loop:
1. send a large amount of parallel requests to the http server (around
   3000 curls should work)
2. in parallel delete the network namespace (do not delete interfaces or
   stop the server, just kill the namespace)

there is a low chance that this will cause the below kernel cpu stuck
message. If this does not happen just retry.
Below there is also the output of bpftrace for the functions mentioned
in the output.

The series of events happening here is:
1. the network namespace is deleted calling
   `unregister_netdevice_many_notify` somewhere in the process
2. this sets first `NETREG_UNREGISTERING` on both ends of the veth and
   then runs `synchronize_net`
3. it then calls `call_netdevice_notifiers` with `NETDEV_UNREGISTER`
4. this is then handled by `dp_device_event` which calls
   `ovs_netdev_detach_dev` (if a vport is found, which is the case for
   the veth interface attached to ovs)
5. this removes the rx_handlers of the device but does not prevent
   packages to be sent to the device
6. `dp_device_event` then queues the vport deletion to work in
   background as a ovs_lock is needed that we do not hold in the
   unregistration path
7. `unregister_netdevice_many_notify` continues to call
   `netdev_unregister_kobject` which sets `real_num_tx_queues` to 0
8. port deletion continues (but details are not relevant for this issue)
9. at some future point the background task deletes the vport

If after 7. but before 9. a packet is send to the ovs vport (which is
not deleted at this point in time) which forwards it to the
`dev_queue_xmit` flow even though the device is unregistering.
In `skb_tx_hash` (which is called in the `dev_queue_xmit`) path there is
a while loop (if the packet has a rx_queue recorded) that is infinite if
`dev->real_num_tx_queues` is zero.

To prevent this from happening we update `do_output` to handle devices
without carrier the same as if the device is not found (which would
be the code path after 9. is done).

Additionally we now produce a warning in `skb_tx_hash` if we will hit
the infinite loop.

bpftrace (first word is function name):

__dev_queue_xmit server: real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 2, reg_state: 1
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 6, reg_state: 2
ovs_netdev_detach_dev server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, reg_state: 2
netdev_rx_handler_unregister server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
netdev_rx_handler_unregister ret server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 27, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 22, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 18, reg_state: 2
netdev_unregister_kobject: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
ovs_vport_send server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
__dev_queue_xmit server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
broken device server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024
ovs_dp_detach_port server: real_num_tx_queues: 0 cpu 9, pid: 9124, tid: 9124, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 33604, tid: 33604

stuck message:

watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [curl:1929279]
Modules linked in: veth pktgen bridge stp llc ip_set_hash_net nft_counter xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tls binfmt_misc nls_iso8859_1 input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net ahci net_failover crypto_simd cryptd psmouse libahci virtio_blk failover
CPU: 5 PID: 1929279 Comm: curl Not tainted 5.15.0-67-generic #74-Ubuntu
Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:netdev_pick_tx+0xf1/0x320
Code: 00 00 8d 48 ff 0f b7 c1 66 39 ca 0f 86 e9 01 00 00 45 0f b7 ff 41 39 c7 0f 87 5b 01 00 00 44 29 f8 41 39 c7 0f 87 4f 01 00 00 <eb> f2 0f 1f 44 00 00 49 8b 94 24 28 04 00 00 48 85 d2 0f 84 53 01
RSP: 0018:ffffb78b40298820 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff9c8773adc2e0 RCX: 000000000000083f
RDX: 0000000000000000 RSI: ffff9c8773adc2e0 RDI: ffff9c870a25e000
RBP: ffffb78b40298858 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c870a25e000
R13: ffff9c870a25e000 R14: ffff9c87fe043480 R15: 0000000000000000
FS:  00007f7b80008f00(0000) GS:ffff9c8e5f740000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7b80f6a0b0 CR3: 0000000329d66000 CR4: 0000000000350ee0
Call Trace:
 <IRQ>
 netdev_core_pick_tx+0xa4/0xb0
 __dev_queue_xmit+0xf8/0x510
 ? __bpf_prog_exit+0x1e/0x30
 dev_queue_xmit+0x10/0x20
 ovs_vport_send+0xad/0x170 [openvswitch]
 do_output+0x59/0x180 [openvswitch]
 do_execute_actions+0xa80/0xaa0 [openvswitch]
 ? kfree+0x1/0x250
 ? kfree+0x1/0x250
 ? kprobe_perf_func+0x4f/0x2b0
 ? flow_lookup.constprop.0+0x5c/0x110 [openvswitch]
 ovs_execute_actions+0x4c/0x120 [openvswitch]
 ovs_dp_process_packet+0xa1/0x200 [openvswitch]
 ? ovs_ct_update_key.isra.0+0xa8/0x120 [openvswitch]
 ? ovs_ct_fill_key+0x1d/0x30 [openvswitch]
 ? ovs_flow_key_extract+0x2db/0x350 [openvswitch]
 ovs_vport_receive+0x77/0xd0 [openvswitch]
 ? __htab_map_lookup_elem+0x4e/0x60
 ? bpf_prog_680e8aff8547aec1_kfree+0x3b/0x714
 ? trace_call_bpf+0xc8/0x150
 ? kfree+0x1/0x250
 ? kfree+0x1/0x250
 ? kprobe_perf_func+0x4f/0x2b0
 ? kprobe_perf_func+0x4f/0x2b0
 ? __mod_memcg_lruvec_state+0x63/0xe0
 netdev_port_receive+0xc4/0x180 [openvswitch]
 ? netdev_port_receive+0x180/0x180 [openvswitch]
 netdev_frame_hook+0x1f/0x40 [openvswitch]
 __netif_receive_skb_core.constprop.0+0x23d/0xf00
 __netif_receive_skb_one_core+0x3f/0xa0
 __netif_receive_skb+0x15/0x60
 process_backlog+0x9e/0x170
 __napi_poll+0x33/0x180
 net_rx_action+0x126/0x280
 ? ttwu_do_activate+0x72/0xf0
 __do_softirq+0xd9/0x2e7
 ? rcu_report_exp_cpu_mult+0x1b0/0x1b0
 do_softirq+0x7d/0xb0
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0x54/0x60
 ip_finish_output2+0x191/0x460
 __ip_finish_output+0xb7/0x180
 ip_finish_output+0x2e/0xc0
 ip_output+0x78/0x100
 ? __ip_finish_output+0x180/0x180
 ip_local_out+0x5e/0x70
 __ip_queue_xmit+0x184/0x440
 ? tcp_syn_options+0x1f9/0x300
 ip_queue_xmit+0x15/0x20
 __tcp_transmit_skb+0x910/0x9c0
 ? __mod_memcg_state+0x44/0xa0
 tcp_connect+0x437/0x4e0
 ? ktime_get_with_offset+0x60/0xf0
 tcp_v4_connect+0x436/0x530
 __inet_stream_connect+0xd4/0x3a0
 ? kprobe_perf_func+0x4f/0x2b0
 ? aa_sk_perm+0x43/0x1c0
 inet_stream_connect+0x3b/0x60
 __sys_connect_file+0x63/0x70
 __sys_connect+0xa6/0xd0
 ? setfl+0x108/0x170
 ? do_fcntl+0xe8/0x5a0
 __x64_sys_connect+0x18/0x20
 do_syscall_64+0x5c/0xc0
 ? __x64_sys_fcntl+0xa9/0xd0
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x27/0x50
 ? do_syscall_64+0x69/0xc0
 ? __sys_setsockopt+0xea/0x1e0
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x27/0x50
 ? __x64_sys_setsockopt+0x1f/0x30
 ? do_syscall_64+0x69/0xc0
 ? irqentry_exit+0x1d/0x30
 ? exc_page_fault+0x89/0x170
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f7b8101c6a7
Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89
RSP: 002b:00007ffffd6b2198 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b8101c6a7
RDX: 0000000000000010 RSI: 00007ffffd6b2360 RDI: 0000000000000005
RBP: 0000561f1370d560 R08: 00002795ad21d1ac R09: 0030312e302e302e
R10: 00007ffffd73f080 R11: 0000000000000246 R12: 0000561f1370c410
R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000
 </TASK>

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bug
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:09 +02:00
..
6lowpan
9p 9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition 2023-04-20 12:35:08 +02:00
802 mrp: introduce active flags to prevent UAF when applicant uninit 2022-12-31 13:33:02 +01:00
8021q
appletalk
atm net/atm: fix proc_mpc_write incorrect return value 2022-10-15 11:08:36 +01:00
ax25
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-09-22 13:02:10 -07:00
bluetooth Bluetooth: hci_conn: Fix possible UAF 2023-04-20 12:35:06 +02:00
bpf Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" 2023-03-17 08:50:32 +01:00
bpfilter
bridge netfilter: ebtables: fix table blob use-after-free 2023-03-11 13:55:24 +01:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-17 08:50:24 +01:00
can can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos 2023-04-13 16:55:33 +02:00
ceph use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
core net: openvswitch: fix race on port output 2023-04-20 12:35:09 +02:00
dcb
dccp dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions. 2023-02-22 12:59:52 +01:00
dns_resolver
dsa net: dsa: tag_brcm: legacy: fix daisy-chained switches 2023-03-30 12:49:09 +02:00
ethernet
ethtool ethtool: reset #lanes when lanes is omitted 2023-04-13 16:55:24 +02:00
hsr hsr: ratelimit only when errors are printed 2023-04-06 12:10:58 +02:00
ieee802154 net: ieee802154: fix error return code in dgram_bind() 2022-10-07 09:29:17 +02:00
ife
ipv4 tcp: restrict net.ipv4.tcp_app_win 2023-04-20 12:35:08 +02:00
ipv6 raw: Fix NULL deref in raw_get_next(). 2023-04-13 16:55:23 +02:00
iucv net/iucv: Fix size of interrupt data 2023-03-22 13:33:50 +01:00
kcm kcm: close race conditions on sk_receive_queue 2022-11-15 12:42:26 +01:00
key xfrm: Fix oops in __xfrm_state_delete() 2022-11-22 07:14:55 +01:00
l2tp l2tp: generate correct module alias strings 2023-04-13 16:55:20 +02:00
l3mdev
lapb
llc
mac80211 wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta 2023-04-13 16:55:19 +02:00
mac802154 mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() 2022-12-05 09:53:08 +01:00
mctp net: mctp: purge receive queues on sk destruction 2023-02-06 08:06:34 +01:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-22 12:59:53 +01:00
mptcp mptcp: fix UaF in listener shutdown 2023-03-30 12:49:00 +02:00
ncsi
netfilter netfilter: nft_redir: correct value of inet type .maxattrs 2023-03-22 13:33:42 +01:00
netlabel
netlink netlink: annotate lockless accesses to nlk->max_recvmsg_len 2023-04-13 16:55:24 +02:00
netrom netrom: Fix use-after-free caused by accept on already connected socket 2023-02-09 11:28:06 +01:00
nfc nfc: change order inside nfc_se_io error path 2023-03-17 08:50:17 +01:00
nsh
openvswitch net: openvswitch: fix race on port output 2023-04-20 12:35:09 +02:00
packet packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE 2022-11-29 08:30:18 -08:00
phonet
psample
qrtr net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT 2023-04-13 16:55:21 +02:00
rds rds: rds_rm_zerocopy_callback() correct order for list_add_tail() 2023-03-10 09:33:02 +01:00
rfkill
rose net/rose: Fix to not accept on connected socket 2023-02-22 12:59:42 +01:00
rxrpc rxrpc: Fix missing unlock in rxrpc_do_sendmsg() 2022-12-31 13:32:55 +01:00
sched act_mirred: use the backlog for nested calls to mirred ingress 2023-03-30 12:49:20 +02:00
sctp sctp: check send stream number after wait_for_sndbuf 2023-04-13 16:55:21 +02:00
smc net/smc: fix deadlock triggered by cancel_delayed_work_syn() 2023-03-22 13:33:47 +01:00
strparser
sunrpc sunrpc: only free unix grouplist after RCU settles 2023-04-13 16:55:23 +02:00
switchdev
tipc tipc: fix kernel warning when sending SYN message 2023-02-22 12:59:53 +01:00
tls net: tls: fix device-offloaded sendpage straddling records 2023-03-17 08:50:26 +01:00
unix af_unix: fix struct pid leaks in OOB support 2023-03-17 08:50:28 +01:00
vmw_vsock net: vmw_vsock: vmci: Check memcpy_from_msg() 2022-12-31 13:32:26 +01:00
wireless wifi: cfg80211: fix MLO connection ownership 2023-03-22 13:33:43 +01:00
x25 net/x25: Fix to not accept on connected socket 2023-02-09 11:28:13 +01:00
xdp xsk: Add missing overflow check in xdp_umem_reg 2023-03-30 12:49:02 +02:00
xfrm xfrm: Zero padding when dumping algos and encap 2023-04-06 12:10:37 +02:00
compat.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
devres.c
Kconfig
Kconfig.debug
Makefile
socket.c net: avoid double iput when sock_alloc_file fails 2023-03-10 09:34:34 +01:00
sysctl_net.c