Florian Westphal 9705f447bf inet: inet_defrag: prevent sk release while still in use
commit 18685451fc4e546fc0e718580d32df3c0e5c8272 upstream.

ip_local_out() and other functions can pass skb->sk as function argument.

If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.

This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.

Eric Dumazet made an initial analysis of this bug.  Quoting Eric:
  Calling ip_defrag() in output path is also implying skb_orphan(),
  which is buggy because output path relies on sk not disappearing.

  A relevant old patch about the issue was :
  8282f27449 ("inet: frag: Always orphan skbs inside ip_defrag()")

  [..]

  net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
  inet socket, not an arbitrary one.

  If we orphan the packet in ipvlan, then downstream things like FQ
  packet scheduler will not work properly.

  We need to change ip_defrag() to only use skb_orphan() when really
  needed, ie whenever frag_list is going to be used.

Eric suggested stashing sk in the fragment queue and wrote an initial patch.
However, there is a problem with this:

If the skb is refragmented right afterwards, ip_do_fragment() will copy
head->sk to the new fragments and set up sock_wfree as their destructor.
IOW, we have no choice but to fix up sk_wmem accounting to reflect the
fully reassembled skb, else wmem will underflow.

This change moves the orphan down into the core, to the last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.
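
The aliasing problem can be sketched in userspace as follows. The toy_*
structures are hypothetical and only mimic the relevant layout: the "old"
one shares storage between sk and ip_defrag_offset through a union, the
"new" one keeps the offset in a separate per-skb control area (standing in
for what FRAG_CB() points at). Everything else about the real sk_buff is
omitted.

```c
/* Old layout: the offset and the socket pointer share the same storage. */
struct toy_skb_old {
	union {
		void *sk;
		int ip_defrag_offset;
	};
};

/* Hypothetical stand-in for the fragment control block in skb->cb[]. */
struct toy_frag_cb {
	int ip_defrag_offset;
};

/* New layout: the offset has storage of its own. */
struct toy_skb_new {
	void *sk;
	struct toy_frag_cb cb;
};

static int toy_owner;	/* any object whose address can play the socket */

/* Returns 1 if skb.sk still points at the owner after storing an offset. */
static int toy_old_layout_keeps_sk(void)
{
	struct toy_skb_old skb;

	skb.sk = &toy_owner;
	/*
	 * Store an offset that differs from the bytes already there, so the
	 * demonstration does not depend on the pointer's actual value.
	 */
	skb.ip_defrag_offset = skb.ip_defrag_offset ^ 1;
	return skb.sk == (void *)&toy_owner;
}

/* Same check with the offset kept in the control block instead. */
static int toy_new_layout_keeps_sk(void)
{
	struct toy_skb_new skb;

	skb.sk = &toy_owner;
	skb.cb.ip_defrag_offset = 1480;	/* arbitrary example offset */
	return skb.sk == (void *)&toy_owner;
}
```

With the union, storing the offset overwrites part of the pointer, so the
first function reports that sk was lost. With separate storage the pointer
survives, which is what lets the orphaning be delayed.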

This makes it possible to delay the orphaning long enough to learn whether
the skb has to be queued or whether it completes the reasm queue.

In the former case, things work as before: the skb is orphaned.  This is
safe because the skb gets queued/stolen and won't continue past the reasm
engine.

In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accounting when inet_frag inflates truesize.

Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17 15:07:37 +02:00
arch x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency 2024-10-17 15:07:36 +02:00
block block: initialize integrity buffer to zero before writing it to media 2024-09-12 11:06:41 +02:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:13:17 +02:00
crypto crypto: aead,cipher - zeroize key buffer after use 2024-07-18 13:05:38 +02:00
Documentation hwspinlock: Introduce hwspin_lock_bust() 2024-09-12 11:06:41 +02:00
drivers gpio: prevent potential speculation leaks in gpio_device_get_desc() 2024-10-17 15:07:37 +02:00
fs ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry() 2024-10-17 15:07:36 +02:00
include inet: inet_defrag: prevent sk release while still in use 2024-10-17 15:07:37 +02:00
init init/main.c: Fix potential static_command_line memory overflow 2024-05-02 16:23:39 +02:00
io_uring io_uring/io-wq: limit retrying worker initialisation 2024-08-19 05:41:03 +02:00
ipc ipc: replace costly bailout check in sysvipc_find_ipc() 2024-09-04 13:17:44 +02:00
kernel cgroup: Make operations on the cgroup root_list RCU safe 2024-10-17 15:07:36 +02:00
lib lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc() 2024-09-12 11:06:49 +02:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm memcg: protect concurrent access to mem_cgroup_idr 2024-09-12 11:06:51 +02:00
net inet: inet_defrag: prevent sk release while still in use 2024-10-17 15:07:37 +02:00
samples Add gitignore file for samples/fanotify/ subdirectory 2024-08-19 05:41:21 +02:00
scripts scripts: kconfig: merge_config: config files: add a trailing newline 2024-10-17 15:07:32 +02:00
security smack: unix sockets: fix accept()ed socket label 2024-09-12 11:06:45 +02:00
sound ASoC: tda7419: fix module autoloading 2024-10-17 15:07:35 +02:00
tools kselftests: dmabuf-heaps: Ensure the driver name is null-terminated 2024-09-12 11:06:49 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:25:48 +01:00
virt KVM: Always flush async #PF workqueue when vCPU is being destroyed 2024-04-13 12:58:04 +02:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Remove DECnet support from kernel 2023-06-21 15:45:38 +02:00
Makefile Linux 5.10.226 2024-09-12 11:06:51 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.