android_kernel_xiaomi_sm8450/net/core
Florian Westphal 9705f447bf inet: inet_defrag: prevent sk release while still in use
commit 18685451fc4e546fc0e718580d32df3c0e5c8272 upstream.

ip_local_out() and other functions can pass skb->sk as function argument.

If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.

This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.

Eric Dumazet made an initial analysis of this bug.  Quoting Eric:
  Calling ip_defrag() in output path is also implying skb_orphan(),
  which is buggy because output path relies on sk not disappearing.

  A relevant old patch about the issue was :
  8282f27449 ("inet: frag: Always orphan skbs inside ip_defrag()")

  [..]

  net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
  inet socket, not an arbitrary one.

  If we orphan the packet in ipvlan, then downstream things like FQ
  packet scheduler will not work properly.

  We need to change ip_defrag() to only use skb_orphan() when really
  needed, ie whenever frag_list is going to be used.

Eric suggested to stash sk in fragment queue and made an initial patch.
However there is a problem with this:

If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accouting to reflect the
fully reassembled skb, else wmem will underflow.

This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.

This allows to delay the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.

In the former case, things work as before, skb is orphaned.  This is
safe because skb gets queued/stolen and won't continue past reasm engine.

In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accouting when inet_frag inflates truesize.

Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17 15:07:37 +02:00
..
bpf_sk_storage.c bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing 2023-08-11 11:57:47 +02:00
datagram.c net: datagram: fix data-races in datagram_poll() 2023-05-30 12:57:46 +01:00
datagram.h
dev_addr_lists.c net: core: add nested_level variable in net_device 2020-09-28 15:00:15 -07:00
dev_ioctl.c net: dev: Convert sa_data to flexible array in struct sockaddr 2024-03-01 13:16:50 +01:00
dev.c net: give more chances to rcu in netdev_wait_allrefs_any() 2024-06-16 13:32:07 +02:00
devlink.c devlink: remove reload failed checks in params get/set callbacks 2023-09-23 11:01:05 +02:00
drop_monitor.c drop_monitor: replace spin_lock by raw_spin_lock 2024-07-05 09:12:34 +02:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-12-08 09:03:22 +01:00
dst.c ipv6: remove max_size check inline with ipv4 2024-01-15 18:48:07 +01:00
failover.c
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c ipv6: fix memory leak in fib6_rule_suppress 2021-12-08 09:03:21 +01:00
filter.c bpf: Fix a segment issue when downgrading gso_size 2024-08-19 05:41:05 +02:00
flow_dissector.c net/ipv6: SKB symmetric hash should incorporate transport ports 2023-09-19 12:20:23 +02:00
flow_offload.c netfilter: nf_tables: bail out early if hardware offload is not supported 2022-06-14 18:32:40 +02:00
gen_estimator.c net_sched: gen_estimator: support large ewma log 2021-01-27 11:55:23 +01:00
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c net: Fix data-races around netdev_max_backlog. 2022-08-31 17:15:19 +02:00
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: linkwatch: use system_unbound_wq 2024-08-19 05:41:11 +02:00
lwt_bpf.c lwt: Fix return values of BPF xmit ops 2023-09-19 12:20:09 +02:00
lwtunnel.c lwtunnel: Validate RTA_ENCAP_TYPE attribute length 2022-01-11 15:25:00 +01:00
Makefile ethtool: move to its own directory 2019-12-12 17:07:05 -08:00
neighbour.c neighbour: Don't let neigh_forced_gc() disable preemption for long 2024-01-25 14:37:37 -08:00
net_namespace.c netns: Make get_net_ns() handle zero refcount net 2024-07-05 09:12:38 +02:00
net-procfs.c net-procfs: show net devices bound packet types 2022-02-01 17:25:44 +01:00
net-sysfs.c ethtool: check device is present when getting link settings 2024-09-04 13:17:46 +02:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c page_pool: add tracepoints for page_pool with details need by XDP 2019-06-19 11:23:13 -04:00
netclassid_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2024-09-12 11:06:42 +02:00
netevent.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
netpoll.c netpoll: Fix race condition in netpoll_owner_active 2024-07-05 09:12:35 +02:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2024-09-12 11:06:42 +02:00
page_pool.c mm: fix struct page layout on 32-bit systems 2021-05-19 10:13:17 +02:00
pktgen.c net: pktgen: Fix interface flags printing 2023-10-25 11:54:20 +02:00
ptp_classifier.c ptp: Add generic ptp v2 header parsing function 2020-08-19 16:07:49 -07:00
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-02-23 08:41:55 +01:00
rtnetlink.c rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validation 2024-05-17 11:48:07 +02:00
scm.c io_uring/unix: drop usage of io_uring socket 2024-03-26 18:21:45 -04:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-29 17:19:21 +02:00
skbuff.c kcov: Remove kcov include from sched.h and move it to its users. 2024-05-17 11:48:07 +02:00
skmsg.c bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues 2024-07-18 13:05:43 +02:00
sock_destructor.h inet: inet_defrag: prevent sk release while still in use 2024-10-17 15:07:37 +02:00
sock_diag.c sock_diag: annotate data-races around sock_diag_handlers[family] 2024-03-26 18:21:49 -04:00
sock_map.c bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues 2024-07-18 13:05:43 +02:00
sock_reuseport.c udp: Update reuse->has_conns under reuseport_lock. 2022-10-30 09:41:19 +01:00
sock.c ipv6: Fix data races around sk->sk_prot. 2024-07-05 09:12:56 +02:00
stream.c net: deal with most data-races in sk_wait_event() 2023-05-30 12:57:46 +01:00
sysctl_net_core.c net: Fix data-races around weight_p and dev_weight_[rt]x_bias. 2022-08-31 17:15:19 +02:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c xdp: fix invalid wait context of page_pool_destroy() 2024-08-19 05:40:48 +02:00