62accf6c1d
65410 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
linke li
|
d21475d29d |
net: mark racy access on sk->sk_rcvbuf
[ Upstream commit c2deb2e971f5d9aca941ef13ee05566979e337a4 ] sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be changed by other threads. Mark this as benign using READ_ONCE(). This patch is aimed at reducing the number of benign races reported by KCSAN in order to focus future debugging effort on harmful races. Signed-off-by: linke li <lilinke99@qq.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Igor Artemiev
|
a762b8e041 |
wifi: cfg80211: fix rdev_dump_mpp() arguments order
[ Upstream commit ec50f3114e55406a1aad24b7dfaa1c3f4336d8eb ] Fix the order of arguments in the TP_ARGS macro for the rdev_dump_mpp tracepoint event. Found by Linux Verification Center (linuxtesting.org). Signed-off-by: Igor Artemiev <Igor.A.Artemiev@mcst.ru> Link: https://msgid.link/20240311164519.118398-1-Igor.A.Artemiev@mcst.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Jeff Johnson
|
a21712550a |
wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc
[ Upstream commit 774f8841f55d7ac4044c79812691649da203584a ] Running kernel-doc on ieee80211_i.h flagged the following: net/mac80211/ieee80211_i.h:145: warning: expecting prototype for enum ieee80211_corrupt_data_flags. Prototype was for enum ieee80211_bss_corrupt_data_flags instead net/mac80211/ieee80211_i.h:162: warning: expecting prototype for enum ieee80211_valid_data_flags. Prototype was for enum ieee80211_bss_valid_data_flags instead Fix these warnings. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://msgid.link/20240314-kdoc-ieee80211_i-v1-1-72b91b55b257@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Richard Gobert
|
7df798dd59 |
net: gro: add flush check in udp_gro_receive_segment
[ Upstream commit 5babae777c61aa8a8679d59d3cdc54165ad96d42 ]
GRO-GSO path is supposed to be transparent and as such L3 flush checks are
relevant to all UDP flows merging in GRO. This patch uses the same logic
and code from tcp_gro_receive, terminating merge if flush is non zero.
Fixes:
|
||
Xin Long
|
adbce6d20d |
tipc: fix a possible memleak in tipc_buf_append
[ Upstream commit 97bf6f81b29a8efaf5d0983251a7450e5794370d ] __skb_linearize() doesn't free the skb when it fails, so move '*buf = NULL' after __skb_linearize(), so that the skb can be freed on the err path. Fixes: b7df21cf1b79 ("tipc: skb_linearize the head skb when reassembling msgs") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/90710748c29a1521efac4f75ea01b3b7e61414cf.1714485818.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Felix Fietkau
|
faa83a7797 |
net: core: reject skb_copy(_expand) for fraglist GSO skbs
[ Upstream commit d091e579b864fa790dd6a0cd537a22c383126681 ]
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become
invalid. Return NULL if such an skb is passed to skb_copy or
skb_copy_expand, in order to prevent a crash on a potential later
call to skb_gso_segment.
Fixes:
|
||
Felix Fietkau
|
48ab384d2b |
net: bridge: fix multicast-to-unicast with fraglist GSO
[ Upstream commit 59c878cbcdd80ed39315573b3511d0acfd3501b5 ]
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns
an invalid linearized skb. This code only needs to change the ethernet
header, so pskb_copy is the right function to call here.
Fixes:
|
||
David Bauer
|
553b2f6c34 |
net l2tp: drop flow hash on forward
[ Upstream commit 42f853b42899d9b445763b55c3c8adc72be0f0e1 ]
Drop the flow-hash of the skb when forwarding to the L2TP netdev.
This avoids the L2TP qdisc from using the flow-hash from the outer
packet, which is identical for every flow within the tunnel.
This does not affect every platform but is specific for the ethernet
driver. It depends on the platform including L4 information in the
flow-hash.
One such example is the Mediatek Filogic MT798x family of networking
processors.
Fixes:
|
||
Kuniyuki Iwashima
|
bbccf0caef |
nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment().
[ Upstream commit 4b911a9690d72641879ea6d13cce1de31d346d79 ]
syzbot triggered various splats (see [0] and links) by a crafted GSO
packet of VIRTIO_NET_HDR_GSO_UDP layering the following protocols:
ETH_P_8021AD + ETH_P_NSH + ETH_P_IPV6 + IPPROTO_UDP
NSH can encapsulate IPv4, IPv6, Ethernet, NSH, and MPLS. As the inner
protocol can be Ethernet, NSH GSO handler, nsh_gso_segment(), calls
skb_mac_gso_segment() to invoke inner protocol GSO handlers.
nsh_gso_segment() does the following for the original skb before
calling skb_mac_gso_segment()
1. reset skb->network_header
2. save the original skb->{mac_heaeder,mac_len} in a local variable
3. pull the NSH header
4. resets skb->mac_header
5. set up skb->mac_len and skb->protocol for the inner protocol.
and does the following for the segmented skb
6. set ntohs(ETH_P_NSH) to skb->protocol
7. push the NSH header
8. restore skb->mac_header
9. set skb->mac_header + mac_len to skb->network_header
10. restore skb->mac_len
There are two problems in 6-7 and 8-9.
(a)
After 6 & 7, skb->data points to the NSH header, so the outer header
(ETH_P_8021AD in this case) is stripped when skb is sent out of netdev.
Also, if NSH is encapsulated by NSH + Ethernet (so NSH-Ethernet-NSH),
skb_pull() in the first nsh_gso_segment() will make skb->data point
to the middle of the outer NSH or Ethernet header because the Ethernet
header is not pulled by the second nsh_gso_segment().
(b)
While restoring skb->{mac_header,network_header} in 8 & 9,
nsh_gso_segment() does not assume that the data in the linear
buffer is shifted.
However, udp6_ufo_fragment() could shift the data and change
skb->mac_header accordingly as demonstrated by syzbot.
If this happens, even the restored skb->mac_header points to
the middle of the outer header.
It seems nsh_gso_segment() has never worked with outer headers so far.
At the end of nsh_gso_segment(), the outer header must be restored for
the segmented skb, instead of the NSH header.
To do that, let's calculate the outer header position relatively from
the inner header and set skb->{data,mac_header,protocol} properly.
[0]:
BUG: KMSAN: uninit-value in ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
BUG: KMSAN: uninit-value in ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
BUG: KMSAN: uninit-value in ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
ipvlan_start_xmit+0x5c/0x1a0 drivers/net/ipvlan/ipvlan_main.c:222
__netdev_start_xmit include/linux/netdevice.h:4989 [inline]
netdev_start_xmit include/linux/netdevice.h:5003 [inline]
xmit_one net/core/dev.c:3547 [inline]
dev_hard_start_xmit+0x244/0xa10 net/core/dev.c:3563
__dev_queue_xmit+0x33ed/0x51c0 net/core/dev.c:4351
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3819 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
__do_kmalloc_node mm/slub.c:3980 [inline]
__kmalloc_node_track_caller+0x705/0x1000 mm/slub.c:4001
kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582
__alloc_skb+0x352/0x790 net/core/skbuff.c:651
skb_segment+0x20aa/0x7080 net/core/skbuff.c:4647
udp6_ufo_fragment+0xcab/0x1150 net/ipv6/udp_offload.c:109
ipv6_gso_segment+0x14be/0x2ca0 net/ipv6/ip6_offload.c:152
skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
nsh_gso_segment+0x6f4/0xf70 net/nsh/nsh.c:108
skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
__skb_gso_segment+0x4b0/0x730 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x107f/0x1930 net/core/dev.c:3628
__dev_queue_xmit+0x1f28/0x51c0 net/core/dev.c:4343
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
CPU: 1 PID: 5101 Comm: syz-executor421 Not tainted 6.8.0-rc5-syzkaller-00297-gf2e367d6ad3b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Fixes:
|
||
Josef Bacik
|
95ebd5fc15 |
sunrpc: add a struct rpc_stats arg to rpc_create_args
[ Upstream commit 2057a48d0dd00c6a2a94ded7df2bf1d3f2a4a0da ] We want to be able to have our rpc stats handled in a per network namespace manner, so add an option to rpc_create_args to specify a different rpc_stats struct instead of using the one on the rpc_program. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: 24457f1be29f ("nfs: Handle error of rpc_proc_register() in nfs_net_init().") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Johannes Berg
|
5a730a161a |
wifi: nl80211: don't free NULL coalescing rule
[ Upstream commit 801ea33ae82d6a9d954074fbcf8ea9d18f1543a7 ]
If the parsing fails, we can dereference a NULL pointer here.
Cc: stable@vger.kernel.org
Fixes:
|
||
Yick Xie
|
16affc4d73 |
udp: preserve the connected status if only UDP cmsg
commit 680d11f6e5427b6af1321932286722d24a8b16c1 upstream.
If "udp_cmsg_send()" returned 0 (i.e. only UDP cmsg),
"connected" should not be set to 0. Otherwise it stops
the connected socket from using the cached route.
Fixes:
|
||
Eric Dumazet
|
ab31bc5022 |
tcp: Fix NEW_SYN_RECV handling in inet_twsk_purge()
commit 1c4e97dd2d3c9a3e84f7e26346aa39bc426d3249 upstream. inet_twsk_purge() uses rcu to find TIME_WAIT and NEW_SYN_RECV objects to purge. These objects use SLAB_TYPESAFE_BY_RCU semantic and need special care. We need to use refcount_inc_not_zero(&sk->sk_refcnt). Reuse the existing correct logic I wrote for TIME_WAIT, because both structures have common locations for sk_state, sk_family, and netns pointer. If after the refcount_inc_not_zero() the object fields longer match the keys, use sock_gen_put(sk) to release the refcount. Then we can call inet_twsk_deschedule_put() for TIME_WAIT, inet_csk_reqsk_queue_drop_and_put() for NEW_SYN_RECV sockets, with BH disabled. Then we need to restart the loop because we had drop rcu_read_lock(). Fixes: 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()") Link: https://lore.kernel.org/netdev/CANn89iLvFuuihCtt9PME2uS1WJATnf5fKjDToa1WzVnRzHnPfg@mail.gmail.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240308200122.64357-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> [shaozhengchao: resolved conflicts in 5.10] Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Kuniyuki Iwashima
|
74e5e5601d |
tcp: Clean up kernel listener's reqsk in inet_twsk_purge()
commit 740ea3c4a0b2e326b23d7cdf05472a0e92aa39bc upstream. Eric Dumazet reported a use-after-free related to the per-netns ehash series. [0] When we create a TCP socket from userspace, the socket always holds a refcnt of the netns. This guarantees that a reqsk timer is always fired before netns dismantle. Each reqsk has a refcnt of its listener, so the listener is not freed before the reqsk, and the net is not freed before the listener as well. OTOH, when in-kernel users create a TCP socket, it might not hold a refcnt of its netns. Thus, a reqsk timer can be fired after the netns dismantle and access freed per-netns ehash. To avoid the use-after-free, we need to clean up TCP_NEW_SYN_RECV sockets in inet_twsk_purge() if the netns uses a per-netns ehash. [0]: https://lore.kernel.org/netdev/CANn89iLXMup0dRD_Ov79Xt8N9FM0XdhCHEN05sf3eLwxKweM6w@mail.gmail.com/ BUG: KASAN: use-after-free in tcp_or_dccp_get_hashinfo include/net/inet_hashtables.h:181 [inline] BUG: KASAN: use-after-free in reqsk_queue_unlink+0x320/0x350 net/ipv4/inet_connection_sock.c:913 Read of size 8 at addr ffff88807545bd80 by task syz-executor.2/8301 CPU: 1 PID: 8301 Comm: syz-executor.2 Not tainted 6.0.0-syzkaller-02757-gaf7d23f9d96a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 tcp_or_dccp_get_hashinfo include/net/inet_hashtables.h:181 [inline] reqsk_queue_unlink+0x320/0x350 net/ipv4/inet_connection_sock.c:913 inet_csk_reqsk_queue_drop net/ipv4/inet_connection_sock.c:927 [inline] inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:939 [inline] reqsk_timer_handler+0x724/0x1160 net/ipv4/inet_connection_sock.c:1053 call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790 __run_timers kernel/time/timer.c:1768 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803 __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571 invoke_softirq kernel/softirq.c:445 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1107 </IRQ> Fixes: d1e5e6408b30 ("tcp: Introduce optional per-netns ehash.") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221012145036.74960-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> [shaozhengchao: resolved conflicts in 5.10] Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Rahul Rameshbabu
|
f3a2f186a1 |
ethernet: Add helper for assigning packet type when dest address does not match device address
commit 6e159fd653d7ebf6290358e0330a0cb8a75cf73b upstream. Enable reuse of logic in eth_type_trans for determining packet type. Suggested-by: Sabrina Dubroca <sd@queasysnail.net> Cc: stable@vger.kernel.org Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240423181319.115860-3-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Nathan Chancellor
|
de657b2109 |
Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()
commit 9bf4e919ccad613b3596eebf1ff37b05b6405307 upstream. After an innocuous optimization change in LLVM main (19.0.0), x86_64 allmodconfig (which enables CONFIG_KCSAN / -fsanitize=thread) fails to build due to the checks in check_copy_size(): In file included from net/bluetooth/sco.c:27: In file included from include/linux/module.h:13: In file included from include/linux/stat.h:19: In file included from include/linux/time.h:60: In file included from include/linux/time32.h:13: In file included from include/linux/timex.h:67: In file included from arch/x86/include/asm/timex.h:6: In file included from arch/x86/include/asm/tsc.h:10: In file included from arch/x86/include/asm/msr.h:15: In file included from include/linux/percpu.h:7: In file included from include/linux/smp.h:118: include/linux/thread_info.h:244:4: error: call to '__bad_copy_from' declared with 'error' attribute: copy source size is too small 244 | __bad_copy_from(); | ^ The same exact error occurs in l2cap_sock.c. The copy_to_user() statements that are failing come from l2cap_sock_getsockopt_old() and sco_sock_getsockopt_old(). This does not occur with GCC with or without KCSAN or Clang without KCSAN enabled. len is defined as an 'int' because it is assigned from '__user int *optlen'. However, it is clamped against the result of sizeof(), which has a type of 'size_t' ('unsigned long' for 64-bit platforms). This is done with min_t() because min() requires compatible types, which results in both len and the result of sizeof() being casted to 'unsigned int', meaning len changes signs and the result of sizeof() is truncated. From there, len is passed to copy_to_user(), which has a third parameter type of 'unsigned long', so it is widened and changes signs again. This excessive casting in combination with the KCSAN instrumentation causes LLVM to fail to eliminate the __bad_copy_from() call, failing the build. The official recommendation from LLVM developers is to consistently use long types for all size variables to avoid the unnecessary casting in the first place. Change the type of len to size_t in both l2cap_sock_getsockopt_old() and sco_sock_getsockopt_old(). This clears up the error while allowing min_t() to be replaced with min(), resulting in simpler code with no casts and fewer implicit conversions. While len is a different type than optlen now, it should result in no functional change because the result of sizeof() will clamp all values of optlen in the same manner as before. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2007 Link: https://github.com/llvm/llvm-project/issues/85647 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Kuniyuki Iwashima
|
ae7c8f52aa |
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
[ Upstream commit 1971d13ffa84a551d29a81fdf5b5ec5be166ac83 ] syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock(). One is called from recvmsg() for a connected socket, and another is called from GC for TCP_LISTEN socket. So, the splat is false-positive. Let's add a dedicated lock class for the latter to suppress the splat. Note that this change is not necessary for net-next.git as the issue is only applied to the old GC impl. [0]: WARNING: possible circular locking dependency detected 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted ----------------------------------------------------- kworker/u8:1/11 is trying to acquire lock: ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 but task is already holding lock: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (unix_gc_lock){+.+.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] unix_notinflight+0x13d/0x390 net/unix/garbage.c:140 unix_detach_fds net/unix/af_unix.c:1819 [inline] unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188 skb_release_all net/core/skbuff.c:1200 [inline] __kfree_skb net/core/skbuff.c:1216 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252 kfree_skb include/linux/skbuff.h:1262 [inline] manage_oob net/unix/af_unix.c:2672 [inline] unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749 unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981 do_splice_read fs/splice.c:985 [inline] splice_file_to_pipe+0x299/0x500 fs/splice.c:1295 do_splice+0xf2d/0x1880 fs/splice.c:1379 __do_splice fs/splice.c:1436 [inline] __do_sys_splice fs/splice.c:1652 [inline] __se_sys_splice+0x331/0x4a0 fs/splice.c:1634 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&u->lock){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(unix_gc_lock); lock(&u->lock); lock(unix_gc_lock); lock(&u->lock); *** DEADLOCK *** 3 locks held by kworker/u8:1/11: #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 stack backtrace: CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Pablo Neira Ayuso
|
e4bb6da24d |
netfilter: nf_tables: honor table dormant flag from netdev release event path
[ Upstream commit 8e30abc9ace4f0add4cd761dfdbfaebae5632dd2 ]
Check for table dormant flag otherwise netdev release event path tries
to unregister an already unregistered hook.
[524854.857999] ------------[ cut here ]------------
[524854.858010] WARNING: CPU: 0 PID: 3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
[...]
[524854.858848] CPU: 0 PID: 3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365
[524854.858869] Workqueue: netns cleanup_net
[524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260
[524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41
[524854.858914] RSP: 0018:ffff8881e36d79e0 EFLAGS: 00010246
[524854.858926] RAX: 0000000000000000 RBX: ffff8881339ae790 RCX: ffffffff81ba524a
[524854.858936] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881c8a16438
[524854.858945] RBP: ffff8881c8a16438 R08: 0000000000000001 R09: ffffed103c6daf34
[524854.858954] R10: ffff8881e36d79a7 R11: 0000000000000000 R12: 0000000000000005
[524854.858962] R13: ffff8881c8a16000 R14: 0000000000000000 R15: ffff8881351b5a00
[524854.858971] FS: 0000000000000000(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
[524854.858982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[524854.858991] CR2: 00007fc9be0f16f4 CR3: 00000001437cc004 CR4: 00000000001706f0
[524854.859000] Call Trace:
[524854.859006] <TASK>
[524854.859013] ? __warn+0x9f/0x1a0
[524854.859027] ? __nf_unregister_net_hook+0x21a/0x260
[524854.859044] ? report_bug+0x1b1/0x1e0
[524854.859060] ? handle_bug+0x3c/0x70
[524854.859071] ? exc_invalid_op+0x17/0x40
[524854.859083] ? asm_exc_invalid_op+0x1a/0x20
[524854.859100] ? __nf_unregister_net_hook+0x6a/0x260
[524854.859116] ? __nf_unregister_net_hook+0x21a/0x260
[524854.859135] nf_tables_netdev_event+0x337/0x390 [nf_tables]
[524854.859304] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859461] ? packet_notifier+0xb3/0x360
[524854.859476] ? _raw_spin_unlock_irqrestore+0x11/0x40
[524854.859489] ? dcbnl_netdevice_event+0x35/0x140
[524854.859507] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859661] notifier_call_chain+0x7d/0x140
[524854.859677] unregister_netdevice_many_notify+0x5e1/0xae0
Fixes:
|
||
Hyunwoo Kim
|
35880c3fa6 |
net: openvswitch: Fix Use-After-Free in ovs_ct_exit
[ Upstream commit 5ea7b72d4fac2fdbc0425cd8f2ea33abe95235b2 ]
Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
of ovs_ct_limit_exit, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.
To prevent this, it should be changed to hlist_for_each_entry_safe.
Fixes:
|
||
Ismael Luceno
|
aca5dadab1 |
ipvs: Fix checksumming on GSO of SCTP packets
[ Upstream commit e10d3ba4d434ed172914617ed8d74bd411421193 ]
It was observed in the wild that pairs of consecutive packets would leave
the IPVS with the same wrong checksum, and the issue only went away when
disabling GSO.
IPVS needs to avoid computing the SCTP checksum when using GSO.
Fixes:
|
||
Eric Dumazet
|
7da0f91681 |
ipv4: check for NULL idev in ip_route_use_hint()
[ Upstream commit 58a4c9b1e5a3e53c9148e80b90e1e43897ce77d1 ]
syzbot was able to trigger a NULL deref in fib_validate_source()
in an old tree [1].
It appears the bug exists in latest trees.
All calls to __in_dev_get_rcu() must be checked for a NULL result.
[1]
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425
Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf
RSP: 0018:ffffc900015fee40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0
RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0
RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000
R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000
FS: 00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231
ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327
ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline]
ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638
ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673
__netif_receive_skb_list_ptype net/core/dev.c:5572 [inline]
__netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620
__netif_receive_skb_list net/core/dev.c:5672 [inline]
netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764
netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816
xdp_recv_frames net/bpf/test_run.c:257 [inline]
xdp_test_run_batch net/bpf/test_run.c:335 [inline]
bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363
bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376
bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736
__sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115
__do_sys_bpf kernel/bpf/syscall.c:5201 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5199 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199
Fixes:
|
||
Florian Westphal
|
e3b887a9c1 |
netfilter: nft_set_pipapo: do not free live element
[ Upstream commit 3cfc9ec039af60dbd8965ae085b2c2ccdcfbe1cc ]
Pablo reports a crash with large batches of elements with a
back-to-back add/remove pattern. Quoting Pablo:
add_elem("00000000") timeout 100 ms
...
add_elem("0000000X") timeout 100 ms
del_elem("0000000X") <---------------- delete one that was just added
...
add_elem("00005000") timeout 100 ms
1) nft_pipapo_remove() removes element 0000000X
Then, KASAN shows a splat.
Looking at the remove function there is a chance that we will drop a
rule that maps to a non-deactivated element.
Removal happens in two steps, first we do a lookup for key k and return the
to-be-removed element and mark it as inactive in the next generation.
Then, in a second step, the element gets removed from the set/map.
The _remove function does not work correctly if we have more than one
element that share the same key.
This can happen if we insert an element into a set when the set already
holds an element with same key, but the element mapping to the existing
key has timed out or is not active in the next generation.
In such case its possible that removal will unmap the wrong element.
If this happens, we will leak the non-deactivated element, it becomes
unreachable.
The element that got deactivated (and will be freed later) will
remain reachable in the set data structure, this can result in
a crash when such an element is retrieved during lookup (stale
pointer).
Add a check that the fully matching key does in fact map to the element
that we have marked as inactive in the deactivation step.
If not, we need to continue searching.
Add a bug/warn trap at the end of the function as well, the remove
function must not ever be called with an invisible/unreachable/non-existent
element.
v2: avoid uneeded temporary variable (Stefano)
Fixes:
|
||
Ziyang Xuan
|
934e66e231 |
netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get()
[ Upstream commit f969eb84ce482331a991079ab7a5c4dc3b7f89bf ]
nft_unregister_expr() can concurrent with __nft_expr_type_get(),
and there is not any protection when iterate over nf_tables_expressions
list in __nft_expr_type_get(). Therefore, there is potential data-race
of nf_tables_expressions list entry.
Use list_for_each_entry_rcu() to iterate over nf_tables_expressions
list in __nft_expr_type_get(), and use rcu_read_lock() in the caller
nft_expr_type_get() to protect the entire type query process.
Fixes:
|
||
Michal Luczaj
|
2e2a03787f |
af_unix: Fix garbage collector racing against connect()
[ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ]
Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children. Leading to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.
sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped
connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc()
---------------- ------------------------- -----------
NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
// V count=1 inflight=0
NS = unix_peer(S)
skb2 = sock_alloc()
skb_queue_tail(NS, skb2[V])
// V became in-flight
// V count=2 inflight=1
close(V)
// V count=1 inflight=1
// GC candidate condition met
for u in gc_inflight_list:
if (total_refs == inflight_refs)
add u to gc_candidates
// gc_candidates={L, V}
for u in gc_candidates:
scan_children(u, dec_inflight)
// embryo (skb1) was not
// reachable from L yet, so V's
// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
for u in gc_candidates:
if (u.inflight)
scan_children(u, inc_inflight_move_tail)
// V count=1 inflight=2 (!)
If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. After
flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
this point, unix_inflight() can not happen because unix_gc_lock is already
taken. Inflight graph remains unaffected.
Fixes:
|
||
Kuniyuki Iwashima
|
14bea27d1c |
af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
[ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ] When touching unix_sk(sk)->inflight, we are always under spin_lock(&unix_gc_lock). Let's convert unix_sk(sk)->inflight to the normal unsigned long. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Eric Dumazet
|
cf4bc359b7 |
netfilter: complete validation of user input
[ Upstream commit 65acf6e0501ac8880a4f73980d01b5d27648b956 ] In my recent commit, I missed that do_replace() handlers use copy_from_sockptr() (which I fixed), followed by unsafe copy_from_sockptr_offset() calls. In all functions, we can perform the @optlen validation before even calling xt_alloc_table_info() with the following check: if ((u64)optlen < (u64)tmp.size + sizeof(tmp)) return -EINVAL; Fixes: 0c83842df40f ("netfilter: validate user input for expected length") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20240409120741.3538135-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Luiz Augusto von Dentz
|
b0e30c3769 |
Bluetooth: SCO: Fix not validating setsockopt user input
[ Upstream commit 51eda36d33e43201e7a4fd35232e069b2c850b01 ] syzbot reported sco_sock_setsockopt() is copying data without checking user input length. BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in sco_sock_setsockopt+0xc0b/0xf90 net/bluetooth/sco.c:893 Read of size 4 at addr ffff88805f7b15a3 by task syz-executor.5/12578 Fixes: |
||
Jiri Benc
|
3fb02ec57e |
ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
[ Upstream commit 7633c4da919ad51164acbf1aa322cc1a3ead6129 ]
Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
still means hlist_for_each_entry_rcu can return an item that got removed
from the list. The memory itself of such item is not freed thanks to RCU
but nothing guarantees the actual content of the memory is sane.
In particular, the reference count can be zero. This can happen if
ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
timing, this can happen:
1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
2. Then, the whole ipv6_del_addr is executed for the given entry. The
reference count drops to zero and kfree_rcu is scheduled.
3. ipv6_get_ifaddr continues and tries to increments the reference count
(in6_ifa_hold).
4. The rcu is unlocked and the entry is freed.
5. The freed entry is returned.
Prevent increasing of the reference count in such case. The name
in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
[ 41.506330] refcount_t: addition on 0; use-after-free.
[ 41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
[ 41.507413] Modules linked in: veth bridge stp llc
[ 41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14
[ 41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
[ 41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
[ 41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
[ 41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
[ 41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
[ 41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
[ 41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
[ 41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
[ 41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
[ 41.514086] FS: 00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
[ 41.514726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
[ 41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 41.516799] Call Trace:
[ 41.517037] <TASK>
[ 41.517249] ? __warn+0x7b/0x120
[ 41.517535] ? refcount_warn_saturate+0xa5/0x130
[ 41.517923] ? report_bug+0x164/0x190
[ 41.518240] ? handle_bug+0x3d/0x70
[ 41.518541] ? exc_invalid_op+0x17/0x70
[ 41.520972] ? asm_exc_invalid_op+0x1a/0x20
[ 41.521325] ? refcount_warn_saturate+0xa5/0x130
[ 41.521708] ipv6_get_ifaddr+0xda/0xe0
[ 41.522035] inet6_rtm_getaddr+0x342/0x3f0
[ 41.522376] ? __pfx_inet6_rtm_getaddr+0x10/0x10
[ 41.522758] rtnetlink_rcv_msg+0x334/0x3d0
[ 41.523102] ? netlink_unicast+0x30f/0x390
[ 41.523445] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 41.523832] netlink_rcv_skb+0x53/0x100
[ 41.524157] netlink_unicast+0x23b/0x390
[ 41.524484] netlink_sendmsg+0x1f2/0x440
[ 41.524826] __sys_sendto+0x1d8/0x1f0
[ 41.525145] __x64_sys_sendto+0x1f/0x30
[ 41.525467] do_syscall_64+0xa5/0x1b0
[ 41.525794] entry_SYSCALL_64_after_hwframe+0x72/0x7a
[ 41.526213] RIP: 0033:0x7fbc4cfcea9a
[ 41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
[ 41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
[ 41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
[ 41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
[ 41.531573] </TASK>
Fixes:
|
||
Arnd Bergmann
|
9e55a650ac |
ipv4/route: avoid unused-but-set-variable warning
[ Upstream commit cf1b7201df59fb936f40f4a807433fe3f2ce310a ]
The log_martians variable is only used in an #ifdef, causing a 'make W=1'
warning with gcc:
net/ipv4/route.c: In function 'ip_rt_send_redirect':
net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
Change the #ifdef to an equivalent IS_ENABLED() to let the compiler
see where the variable is used.
Fixes:
|
||
Arnd Bergmann
|
1afc86bcfb |
ipv6: fib: hide unused 'pn' variable
[ Upstream commit 74043489fcb5e5ca4074133582b5b8011b67f9e7 ]
When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing
a 'make W=1' warning:
net/ipv6/ip6_fib.c: In function 'fib6_add':
net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
Add another #ifdef around the variable declaration, matching the other
uses in this file.
Fixes:
|
||
Eric Dumazet
|
a82984b3c6 |
xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
[ Upstream commit 237f3cf13b20db183d3706d997eedc3c49eacd44 ]
syzbot reported an illegal copy in xsk_setsockopt() [1]
Make sure to validate setsockopt() @optlen parameter.
[1]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
copy_from_sockptr include/linux/sockptr.h:55 [inline]
xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fb40587de69
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
</TASK>
Allocated by task 7549:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:3966 [inline]
__kmalloc+0x233/0x4a0 mm/slub.c:3979
kmalloc include/linux/slab.h:632 [inline]
__cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
The buggy address belongs to the object at ffff888028c6cde0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 1 bytes to the right of
allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
The buggy address belongs to the physical page:
page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
__alloc_pages+0x256/0x680 mm/page_alloc.c:4569
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page+0x5f/0x160 mm/slub.c:2175
allocate_slab mm/slub.c:2338 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2391
___slab_alloc+0xc73/0x1260 mm/slub.c:3525
__slab_alloc mm/slub.c:3610 [inline]
__slab_alloc_node mm/slub.c:3663 [inline]
slab_alloc_node mm/slub.c:3835 [inline]
__do_kmalloc_node mm/slub.c:3965 [inline]
__kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
kmalloc_node include/linux/slab.h:648 [inline]
__vmalloc_area_node mm/vmalloc.c:3197 [inline]
__vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
__vmalloc_node mm/vmalloc.c:3457 [inline]
vzalloc+0x79/0x90 mm/vmalloc.c:3530
bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
__sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
__do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
page last free pid 6650 tgid 6647 stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1140 [inline]
free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
release_pages+0x2117/0x2400 mm/swap.c:1042
tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
__mmput+0x115/0x3c0 kernel/fork.c:1345
exit_mm+0x220/0x310 kernel/exit.c:569
do_exit+0x99e/0x27e0 kernel/exit.c:865
do_group_exit+0x207/0x2c0 kernel/exit.c:1027
get_signal+0x176e/0x1850 kernel/signal.c:2907
arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Memory state around the buggy address:
ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
>ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
^
ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
Fixes:
|
||
Ilya Maximets
|
583b7b856f |
net: openvswitch: fix unwanted error log on timeout policy probing
[ Upstream commit 4539f91f2a801c0c028c252bffae56030cfb2cae ]
On startup, ovs-vswitchd probes different datapath features including
support for timeout policies. While probing, it tries to execute
certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
attributes set. These attributes tell the openvswitch module to not
log any errors when they occur as it is expected that some of the
probes will fail.
For some reason, setting the timeout policy ignores the PROBE attribute
and logs a failure anyway. This is causing the following kernel log
on each re-start of ovs-vswitchd:
kernel: Failed to associated timeout policy `ovs_test_tp'
Fix that by using the same logging macro that all other messages are
using. The message will still be printed at info level when needed
and will be rate limited, but with a net rate limiter instead of
generic printk one.
The nf_ct_set_timeout() itself will still print some info messages,
but at least this change makes logging in openvswitch module more
consistent.
Fixes:
|
||
Dmitry Antipov
|
8478394f76 |
Bluetooth: Fix memory leak in hci_req_sync_complete()
commit 45d355a926ab40f3ae7bc0b0a00cb0e3e8a5a810 upstream.
In 'hci_req_sync_complete()', always free the previous sync
request state before assigning reference to a new one.
Reported-by: syzbot+39ec16ff6cc18b1d066d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=39ec16ff6cc18b1d066d
Cc: stable@vger.kernel.org
Fixes:
|
||
Sven Eckelmann
|
70a8be9dc2 |
batman-adv: Avoid infinite loop trying to resize local TT
commit b1f532a3b1e6d2e5559c7ace49322922637a28aa upstream.
If the MTU of one of an attached interface becomes too small to transmit
the local translation table then it must be resized to fit inside all
fragments (when enabled) or a single packet.
But if the MTU becomes too low to transmit even the header + the VLAN
specific part then the resizing of the local TT will never succeed. This
can for example happen when the usable space is 110 bytes and 11 VLANs are
on top of batman-adv. In this case, at least 116 byte would be needed.
There will just be an endless spam of
batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)
in the log but the function will never finish. Problem here is that the
timeout will be halved all the time and will then stagnate at 0 and
therefore never be able to reduce the table even more.
There are other scenarios possible with a similar result. The number of
BATADV_TT_CLIENT_NOPURGE entries in the local TT can for example be too
high to fit inside a packet. Such a scenario can therefore happen also with
only a single VLAN + 7 non-purgable addresses - requiring at least 120
bytes.
While this should be handled proactively when:
* interface with too low MTU is added
* VLAN is added
* non-purgeable local mac is added
* MTU of an attached interface is reduced
* fragmentation setting gets disabled (which most likely requires dropping
attached interfaces)
not all of these scenarios can be prevented because batman-adv is only
consuming events without the the possibility to prevent these actions
(non-purgable MAC address added, MTU of an attached interface is reduced).
It is therefore necessary to also make sure that the code is able to handle
also the situations when there were already incompatible system
configuration are present.
Cc: stable@vger.kernel.org
Fixes:
|
||
Pablo Neira Ayuso
|
b58d0ac35f |
netfilter: nf_tables: discard table flag update with pending basechain deletion
commit 1bc83a019bbe268be3526406245ec28c2458a518 upstream. Hook unregistration is deferred to the commit phase, same occurs with hook updates triggered by the table dormant flag. When both commands are combined, this results in deleting a basechain while leaving its hook still registered in the core. Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Pablo Neira Ayuso
|
2cee2ff7f8 |
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
commit 0d459e2ffb541841714839e8228b845458ed3b27 upstream. The commit mutex should not be released during the critical section between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC worker could collect expired objects and get the released commit lock within the same GC sequence. nf_tables_module_autoload() temporarily releases the mutex to load module dependencies, then it goes back to replay the transaction again. Move it at the end of the abort phase after nft_gc_seq_end() is called. Cc: stable@vger.kernel.org Fixes: 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path") Reported-by: Kuan-Ting Chen <hexrabbit@devco.re> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Pablo Neira Ayuso
|
453c8da7ef |
netfilter: nf_tables: release batch on table validation from abort path
commit a45e6889575c2067d3c0212b6bc1022891e65b91 upstream.
Unlike early commit path stage which triggers a call to abort, an
explicit release of the batch is required on abort, otherwise mutex is
released and commit_list remains in place.
Add WARN_ON_ONCE to ensure commit_list is empty from the abort path
before releasing the mutex.
After this patch, commit_list is always assumed to be empty before
grabbing the mutex, therefore
03c1f1ef1584 ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()")
only needs to release the pending modules for registration.
Cc: stable@vger.kernel.org
Fixes:
|
||
Eric Dumazet
|
bc4d1ebca1 |
net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list()
[ Upstream commit 00af2aa93b76b1bade471ad0d0525d4d29ca5cc0 ] Many syzbot reports show extreme rtnl pressure, and many of them hint that smc acquires rtnl in netns creation for no good reason [1] This patch returns early from smc_pnet_net_init() if there is no netdevice yet. I am not even sure why smc_pnet_create_pnetids_list() even exists, because smc_pnet_netdev_event() is also calling smc_pnet_add_base_pnetid() when handling NETDEV_UP event. [1] extract of typical syzbot reports 2 locks held by syz-executor.3/12252: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.4/12253: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.1/12257: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.2/12261: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.0/12265: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.3/12268: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.4/12271: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.1/12274: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.2/12280: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com> Cc: "D. Wythe" <alibuda@linux.alibaba.com> Cc: Tony Lu <tonylu@linux.alibaba.com> Cc: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20240302100744.3868021-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Davide Caratti
|
c137ee44c5 |
mptcp: don't account accept() of non-MPC client as fallback to TCP
commit 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 upstream.
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes:
|
||
Antoine Tenart
|
3391b15778 |
udp: do not accept non-tunnel GSO skbs landing in a tunnel
[ Upstream commit 3d010c8031e39f5fa1e8b13ada77e0321091011f ]
When rx-udp-gro-forwarding is enabled UDP packets might be GROed when
being forwarded. If such packets might land in a tunnel this can cause
various issues and udp_gro_receive makes sure this isn't the case by
looking for a matching socket. This is performed in
udp4/6_gro_lookup_skb but only in the current netns. This is an issue
with tunneled packets when the endpoint is in another netns. In such
cases the packets will be GROed at the UDP level, which leads to various
issues later on. The same thing can happen with rx-gro-list.
We saw this with geneve packets being GROed at the UDP level. In such
case gso_size is set; later the packet goes through the geneve rx path,
the geneve header is pulled, the offset are adjusted and frag_list skbs
are not adjusted with regard to geneve. When those skbs hit
skb_fragment, it will misbehave. Different outcomes are possible
depending on what the GROed skbs look like; from corrupted packets to
kernel crashes.
One example is a BUG_ON[1] triggered in skb_segment while processing the
frag_list. Because gso_size is wrong (geneve header was pulled)
skb_segment thinks there is "geneve header size" of data in frag_list,
although it's in fact the next packet. The BUG_ON itself has nothing to
do with the issue. This is only one of the potential issues.
Looking up for a matching socket in udp_gro_receive is fragile: the
lookup could be extended to all netns (not speaking about performances)
but nothing prevents those packets from being modified in between and we
could still not find a matching socket. It's OK to keep the current
logic there as it should cover most cases but we also need to make sure
we handle tunnel packets being GROed too early.
This is done by extending the checks in udp_unexpected_gso: GSO packets
lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must
be segmented.
[1] kernel BUG at net/core/skbuff.c:4408!
RIP: 0010:skb_segment+0xd2a/0xf70
__udp_gso_segment+0xaa/0x560
Fixes:
|
||
Antoine Tenart
|
360edeb621 |
udp: do not transition UDP GRO fraglist partial checksums to unnecessary
commit f0b8c30345565344df2e33a8417a27503589247d upstream.
UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets
are converted to CHECKSUM_UNNECESSARY to avoid later checks. However
this is an issue for CHECKSUM_PARTIAL packets as they can be looped in
an egress path and then their partial checksums are not fixed.
Different issues can be observed, from invalid checksum on packets to
traces like:
gen01: hw csum failure
skb len=3008 headroom=160 headlen=1376 tailroom=0
mac=(106,14) net=(120,40) trans=160
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0)
hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12
...
Fix this by only converting CHECKSUM_NONE packets to
CHECKSUM_UNNECESSARY by reusing __skb_incr_checksum_unnecessary. All
other checksum types are kept as-is, including CHECKSUM_COMPLETE as
fraglist packets being segmented back would have their skb->csum valid.
Fixes:
|
||
Kuniyuki Iwashima
|
fd307f2d91 |
ipv6: Fix infinite recursion in fib6_dump_done().
commit d21d40605bca7bd5fc23ef03d4c1ca1f48bc2cae upstream.
syzkaller reported infinite recursive calls of fib6_dump_done() during
netlink socket destruction. [1]
From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
the response was generated. The following recvmmsg() resumed the dump
for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
to the fault injection. [0]
12:01:34 executing program 3:
r0 = socket$nl_route(0x10, 0x3, 0x0)
sendmsg$nl_route(r0, ... snip ...)
recvmmsg(r0, ... snip ...) (fail_nth: 8)
Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3]. syzkaller stopped
receiving the response halfway through, and finally netlink_sock_destruct()
called nlk_sk(sk)->cb.done().
fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it
is still not NULL. fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by
nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling
itself recursively and hitting the stack guard page.
To avoid the issue, let's set the destructor after kzalloc().
[0]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117)
should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
should_failslab (mm/slub.c:3733)
kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
rtnl_dump_all (net/core/rtnetlink.c:4029)
netlink_dump (net/netlink/af_netlink.c:2269)
netlink_recvmsg (net/netlink/af_netlink.c:1988)
____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
___sys_recvmsg (net/socket.c:2846)
do_recvmmsg (net/socket.c:2943)
__x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)
[1]:
BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb)
stack guard page: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events netlink_sock_destruct_work
RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570)
Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000d980000 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3
RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358
RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000
R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68
FS: 0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<#DF>
</#DF>
<TASK>
fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
...
fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
netlink_sock_destruct (net/netlink/af_netlink.c:401)
__sk_destruct (net/core/sock.c:2177 (discriminator 2))
sk_destruct (net/core/sock.c:2224)
__sk_free (net/core/sock.c:2235)
sk_free (net/core/sock.c:2246)
process_one_work (kernel/workqueue.c:3259)
worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
kthread (kernel/kthread.c:388)
ret_from_fork (arch/x86/kernel/process.c:153)
ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
Modules linked in:
Fixes:
|
||
Eric Dumazet
|
b14b9f9503 |
erspan: make sure erspan_base_hdr is present in skb->head
commit 17af420545a750f763025149fa7b833a4fc8b8f0 upstream.
syzbot reported a problem in ip6erspan_rcv() [1]
Issue is that ip6erspan_rcv() (and erspan_rcv()) no longer make
sure erspan_base_hdr is present in skb linear part (skb->head)
before getting @ver field from it.
Add the missing pskb_may_pull() calls.
v2: Reload iph pointer in erspan_rcv() after pskb_may_pull()
because skb->head might have changed.
[1]
BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2756 [inline]
BUG: KMSAN: uninit-value in ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
BUG: KMSAN: uninit-value in gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
pskb_may_pull include/linux/skbuff.h:2756 [inline]
ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
ip6_protocol_deliver_rcu+0x1d4c/0x2ca0 net/ipv6/ip6_input.c:438
ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
dst_input include/net/dst.h:460 [inline]
ip6_rcv_finish+0x955/0x970 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:314 [inline]
ipv6_rcv+0xde/0x390 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core net/core/dev.c:5538 [inline]
__netif_receive_skb+0x1da/0xa00 net/core/dev.c:5652
netif_receive_skb_internal net/core/dev.c:5738 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5798
tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1549
tun_get_user+0x5566/0x69e0 drivers/net/tun.c:2002
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2108 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xb63/0x1520 fs/read_write.c:590
ksys_write+0x20f/0x4c0 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__x64_sys_write+0x93/0xe0 fs/read_write.c:652
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1318 [inline]
alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
tun_alloc_skb drivers/net/tun.c:1525 [inline]
tun_get_user+0x209a/0x69e0 drivers/net/tun.c:1846
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2108 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xb63/0x1520 fs/read_write.c:590
ksys_write+0x20f/0x4c0 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__x64_sys_write+0x93/0xe0 fs/read_write.c:652
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 5045 Comm: syz-executor114 Not tainted 6.9.0-rc1-syzkaller-00021-g962490525cff #0
Fixes:
|
||
Eric Dumazet
|
5e45dc4408 |
net/sched: act_skbmod: prevent kernel-infoleak
commit d313eb8b77557a6d5855f42d2234bd592c7b50dd upstream.
syzbot found that tcf_skbmod_dump() was copying four bytes
from kernel stack to user space [1].
The issue here is that 'struct tc_skbmod' has a four bytes hole.
We need to clear the structure before filling fields.
[1]
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline]
BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline]
BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
instrument_copy_to_user include/linux/instrumented.h:114 [inline]
copy_to_user_iter lib/iov_iter.c:24 [inline]
iterate_ubuf include/linux/iov_iter.h:29 [inline]
iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
iterate_and_advance include/linux/iov_iter.h:271 [inline]
_copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
copy_to_iter include/linux/uio.h:196 [inline]
simple_copy_to_iter net/core/datagram.c:532 [inline]
__skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420
skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline]
netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962
sock_recvmsg_nosec net/socket.c:1046 [inline]
sock_recvmsg+0x2c4/0x340 net/socket.c:1068
__sys_recvfrom+0x35a/0x5f0 net/socket.c:2242
__do_sys_recvfrom net/socket.c:2260 [inline]
__se_sys_recvfrom net/socket.c:2256 [inline]
__x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was stored to memory at:
pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253
netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317
netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351
nlmsg_unicast include/net/netlink.h:1144 [inline]
nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610
rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741
rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline]
tcf_add_notify net/sched/act_api.c:2048 [inline]
tcf_action_add net/sched/act_api.c:2071 [inline]
tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119
rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
____sys_sendmsg+0x877/0xb60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was stored to memory at:
__nla_put lib/nlattr.c:1041 [inline]
nla_put+0x1c6/0x230 lib/nlattr.c:1099
tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256
tcf_action_dump_old net/sched/act_api.c:1191 [inline]
tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251
tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628
tcf_add_notify_msg net/sched/act_api.c:2023 [inline]
tcf_add_notify net/sched/act_api.c:2042 [inline]
tcf_action_add net/sched/act_api.c:2071 [inline]
tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119
rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
____sys_sendmsg+0x877/0xb60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Local variable opt created at:
tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244
tcf_action_dump_old net/sched/act_api.c:1191 [inline]
tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
Bytes 188-191 of 248 are uninitialized
Memory access of size 248 starts at ffff888117697680
Data copied to user address 00007ffe56d855f0
Fixes:
|
||
Jakub Sitnicki
|
dd54b48db0 |
bpf, sockmap: Prevent lock inversion deadlock in map delete elem
commit ff91059932401894e6c86341915615c5eb0eca48 upstream.
syzkaller started using corpuses where a BPF tracing program deletes
elements from a sockmap/sockhash map. Because BPF tracing programs can be
invoked from any interrupt context, locks taken during a map_delete_elem
operation must be hardirq-safe. Otherwise a deadlock due to lock inversion
is possible, as reported by lockdep:
CPU0 CPU1
---- ----
lock(&htab->buckets[i].lock);
local_irq_disable();
lock(&host->lock);
lock(&htab->buckets[i].lock);
<Interrupt>
lock(&host->lock);
Locks in sockmap are hardirq-unsafe by design. We expects elements to be
deleted from sockmap/sockhash only in task (normal) context with interrupts
enabled, or in softirq context.
Detect when map_delete_elem operation is invoked from a context which is
_not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an
error.
Note that map updates are not affected by this issue. BPF verifier does not
allow updating sockmap/sockhash from a BPF tracing program today.
Fixes:
|
||
Eric Dumazet
|
0f038242b7 |
netfilter: validate user input for expected length
commit 0c83842df40f86e529db6842231154772c20edcc upstream.
I got multiple syzbot reports showing old bugs exposed
by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc
in cgroup/{s,g}etsockopt")
setsockopt() @optlen argument should be taken into account
before copying data.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
__asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
copy_from_sockptr include/linux/sockptr.h:55 [inline]
do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x72/0x7a
RIP: 0033:0x7fd22067dde9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
</TASK>
Allocated by task 7238:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:4069 [inline]
__kmalloc_noprof+0x200/0x410 mm/slub.c:4082
kmalloc_noprof include/linux/slab.h:664 [inline]
__cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x72/0x7a
The buggy address belongs to the object at ffff88802cd73da0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes inside of
allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
page_type: 0xffffefff(slab)
raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
prep_new_page mm/page_alloc.c:1498 [inline]
get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
__alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
__alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
alloc_slab_page+0x5f/0x120 mm/slub.c:2249
allocate_slab+0x5a/0x2e0 mm/slub.c:2412
new_slab mm/slub.c:2465 [inline]
___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
__slab_alloc+0x58/0xa0 mm/slub.c:3705
__slab_alloc_node mm/slub.c:3758 [inline]
slab_alloc_node mm/slub.c:3936 [inline]
__do_kmalloc_node mm/slub.c:4068 [inline]
kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
kstrdup+0x3a/0x80 mm/util.c:62
device_rename+0xb5/0x1b0 drivers/base/core.c:4558
dev_change_name+0x275/0x860 net/core/dev.c:1232
do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
__rtnl_newlink net/core/rtnetlink.c:3680 [inline]
rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
page last free pid 5146 tgid 5146 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1110 [inline]
free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
discard_slab mm/slub.c:2511 [inline]
__put_partials+0xeb/0x130 mm/slub.c:2980
put_cpu_partial+0x17c/0x250 mm/slub.c:3055
__slab_free+0x2ea/0x3d0 mm/slub.c:4254
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3888 [inline]
slab_alloc_node mm/slub.c:3948 [inline]
__do_kmalloc_node mm/slub.c:4068 [inline]
__kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
kmalloc_node_noprof include/linux/slab.h:681 [inline]
kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
bucket_table_alloc lib/rhashtable.c:186 [inline]
rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
process_one_work kernel/workqueue.c:3218 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
Memory state around the buggy address:
ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
>ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
^
ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
Fixes:
|
||
Ziyang Xuan
|
940d41caa7 |
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
commit 24225011d81b471acc0e1e315b7d9905459a6304 upstream.
nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable().
And thhere is not any protection when iterate over nf_tables_flowtables
list in __nft_flowtable_type_get(). Therefore, there is pertential
data-race of nf_tables_flowtables list entry.
Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
nft_flowtable_type_get() to protect the entire type query process.
Fixes:
|
||
Pablo Neira Ayuso
|
46c4481938 |
netfilter: nf_tables: flush pending destroy work before exit_net release
commit 24cea9677025e0de419989ecb692acd4bb34cac2 upstream.
Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy
work before netlink notifier") to address a race between exit_net and
the destroy workqueue.
The trace below shows an element to be released via destroy workqueue
while exit_net path (triggered via module removal) has already released
the set that is used in such transaction.
[ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
[ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
[ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 1360.547984] Call Trace:
[ 1360.547991] <TASK>
[ 1360.547998] dump_stack_lvl+0x53/0x70
[ 1360.548014] print_report+0xc4/0x610
[ 1360.548026] ? __virt_addr_valid+0xba/0x160
[ 1360.548040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 1360.548054] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548176] kasan_report+0xae/0xe0
[ 1360.548189] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548312] nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548447] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
[ 1360.548577] ? _raw_spin_unlock_irq+0x18/0x30
[ 1360.548591] process_one_work+0x2f1/0x670
[ 1360.548610] worker_thread+0x4d3/0x760
[ 1360.548627] ? __pfx_worker_thread+0x10/0x10
[ 1360.548640] kthread+0x16b/0x1b0
[ 1360.548653] ? __pfx_kthread+0x10/0x10
[ 1360.548665] ret_from_fork+0x2f/0x50
[ 1360.548679] ? __pfx_kthread+0x10/0x10
[ 1360.548690] ret_from_fork_asm+0x1a/0x30
[ 1360.548707] </TASK>
[ 1360.548719] Allocated by task 192061:
[ 1360.548726] kasan_save_stack+0x20/0x40
[ 1360.548739] kasan_save_track+0x14/0x30
[ 1360.548750] __kasan_kmalloc+0x8f/0xa0
[ 1360.548760] __kmalloc_node+0x1f1/0x450
[ 1360.548771] nf_tables_newset+0x10c7/0x1b50 [nf_tables]
[ 1360.548883] nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
[ 1360.548909] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
[ 1360.548927] netlink_unicast+0x367/0x4f0
[ 1360.548935] netlink_sendmsg+0x34b/0x610
[ 1360.548944] ____sys_sendmsg+0x4d4/0x510
[ 1360.548953] ___sys_sendmsg+0xc9/0x120
[ 1360.548961] __sys_sendmsg+0xbe/0x140
[ 1360.548971] do_syscall_64+0x55/0x120
[ 1360.548982] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 1360.548994] Freed by task 192222:
[ 1360.548999] kasan_save_stack+0x20/0x40
[ 1360.549009] kasan_save_track+0x14/0x30
[ 1360.549019] kasan_save_free_info+0x3b/0x60
[ 1360.549028] poison_slab_object+0x100/0x180
[ 1360.549036] __kasan_slab_free+0x14/0x30
[ 1360.549042] kfree+0xb6/0x260
[ 1360.549049] __nft_release_table+0x473/0x6a0 [nf_tables]
[ 1360.549131] nf_tables_exit_net+0x170/0x240 [nf_tables]
[ 1360.549221] ops_exit_list+0x50/0xa0
[ 1360.549229] free_exit_list+0x101/0x140
[ 1360.549236] unregister_pernet_operations+0x107/0x160
[ 1360.549245] unregister_pernet_subsys+0x1c/0x30
[ 1360.549254] nf_tables_module_exit+0x43/0x80 [nf_tables]
[ 1360.549345] __do_sys_delete_module+0x253/0x370
[ 1360.549352] do_syscall_64+0x55/0x120
[ 1360.549360] entry_SYSCALL_64_after_hwframe+0x55/0x5d
(gdb) list *__nft_release_table+0x473
0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
11349 list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
11350 list_del(&flowtable->list);
11351 nft_use_dec(&table->use);
11352 nf_tables_flowtable_destroy(flowtable);
11353 }
11354 list_for_each_entry_safe(set, ns, &table->sets, list) {
11355 list_del(&set->list);
11356 nft_use_dec(&table->use);
11357 if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
11358 nft_map_deactivate(&ctx, set);
(gdb)
[ 1360.549372] Last potentially related work creation:
[ 1360.549376] kasan_save_stack+0x20/0x40
[ 1360.549384] __kasan_record_aux_stack+0x9b/0xb0
[ 1360.549392] __queue_work+0x3fb/0x780
[ 1360.549399] queue_work_on+0x4f/0x60
[ 1360.549407] nft_rhash_remove+0x33b/0x340 [nf_tables]
[ 1360.549516] nf_tables_commit+0x1c6a/0x2620 [nf_tables]
[ 1360.549625] nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink]
[ 1360.549647] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
[ 1360.549671] netlink_unicast+0x367/0x4f0
[ 1360.549680] netlink_sendmsg+0x34b/0x610
[ 1360.549690] ____sys_sendmsg+0x4d4/0x510
[ 1360.549697] ___sys_sendmsg+0xc9/0x120
[ 1360.549706] __sys_sendmsg+0xbe/0x140
[ 1360.549715] do_syscall_64+0x55/0x120
[ 1360.549725] entry_SYSCALL_64_after_hwframe+0x55/0x5d
Fixes:
|
||
Pablo Neira Ayuso
|
7b6fba6918 |
netfilter: nf_tables: reject new basechain after table flag update
commit 994209ddf4f430946f6247616b2e33d179243769 upstream. When dormant flag is toggled, hooks are disabled in the commit phase by iterating over current chains in table (existing and new). The following configuration allows for an inconsistent state: add table x add chain x y { type filter hook input priority 0; } add table x { flags dormant; } add chain x w { type filter hook input priority 1; } which triggers the following warning when trying to unregister chain w which is already unregistered. [ 127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50 1 __nf_unregister_net_hook+0x21a/0x260 [...] [ 127.322519] Call Trace: [ 127.322521] <TASK> [ 127.322524] ? __warn+0x9f/0x1a0 [ 127.322531] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322537] ? report_bug+0x1b1/0x1e0 [ 127.322545] ? handle_bug+0x3c/0x70 [ 127.322552] ? exc_invalid_op+0x17/0x40 [ 127.322556] ? asm_exc_invalid_op+0x1a/0x20 [ 127.322563] ? kasan_save_free_info+0x3b/0x60 [ 127.322570] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322577] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322583] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322590] ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables] [ 127.322655] nft_table_disable+0x75/0xf0 [nf_tables] [ 127.322717] nf_tables_commit+0x2571/0x2620 [nf_tables] Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Mahmoud Adam
|
cfb786b03b |
net/rds: fix possible cp null dereference
commit 62fc3357e079a07a22465b9b6ef71bb6ea75ee4b upstream. cp might be null, calling cp->cp_conn would produce null dereference [Simon Horman adds:] Analysis: * cp is a parameter of __rds_rdma_map and is not reassigned. * The following call-sites pass a NULL cp argument to __rds_rdma_map() - rds_get_mr() - rds_get_mr_for_dest * Prior to the code above, the following assumes that cp may be NULL (which is indicative, but could itself be unnecessary) trans_private = rs->rs_transport->get_mr( sg, nents, rs, &mr->r_key, cp ? cp->cp_conn : NULL, args->vec.addr, args->vec.bytes, need_odp ? ODP_ZEROBASED : ODP_NOT_NEEDED); * The code modified by this patch is guarded by IS_ERR(trans_private), where trans_private is assigned as per the previous point in this analysis. The only implementation of get_mr that I could locate is rds_ib_get_mr() which can return an ERR_PTR if the conn (4th) argument is NULL. * ret is set to PTR_ERR(trans_private). rds_ib_get_mr can return ERR_PTR(-ENODEV) if the conn (4th) argument is NULL. Thus ret may be -ENODEV in which case the code in question will execute. Conclusion: * cp may be NULL at the point where this patch adds a check; this patch does seem to address a possible bug Fixes: c055fc00c07b ("net/rds: fix WARNING in rds_conn_connect_if_down") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Mahmoud Adam <mngyadam@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240326153132.55580-1-mngyadam@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |