commit b5fc29233d28be7a3322848ebe73ac327559cdb9 upstream.
After commit d38afeec26 ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.
Now we can remove unnecessary inet6_destroy_sock() calls in
sk->sk_prot->destroy().
DCCP and SCTP have their own sk->sk_destruct() function, so we
change them separately in the following patches.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5cb752b125766524c921faab1a45cc96065b0a7 upstream.
Beyond reducing code duplication this also avoids scheduling
the mptcp_worker on a closed socket on some edge scenarios.
The addressed issue is actually older than the blamed commit
below, but this fix needs it as a pre-requisite.
Fixes: ba8f48f7a4 ("mptcp: introduce mptcp_schedule_work")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a3f4f1f9c27215e4ddcd312558342e57b93e518 ]
Backports notes: one simple conflict in net/mptcp/protocol.c with:
commit f8c9dfbd875b ("mptcp: add pm listener events")
Where one commit removes code in __mptcp_close_ssk() while the other
one adds one line at the same place. We can simply remove the whole
condition because this extra instruction is not present in v6.1.
As reported by Christoph after having refactored the passive
socket initialization, the mptcp listener shutdown path is prone
to an UaF issue.
BUG: KASAN: use-after-free in _raw_spin_lock_bh+0x73/0xe0
Write of size 4 at addr ffff88810cb23098 by task syz-executor731/1266
CPU: 1 PID: 1266 Comm: syz-executor731 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0x91
print_report+0x16a/0x46f
kasan_report+0xad/0x130
kasan_check_range+0x14a/0x1a0
_raw_spin_lock_bh+0x73/0xe0
subflow_error_report+0x6d/0x110
sk_error_report+0x3b/0x190
tcp_disconnect+0x138c/0x1aa0
inet_child_forget+0x6f/0x2e0
inet_csk_listen_stop+0x209/0x1060
__mptcp_close_ssk+0x52d/0x610
mptcp_destroy_common+0x165/0x640
mptcp_destroy+0x13/0x80
__mptcp_destroy_sock+0xe7/0x270
__mptcp_close+0x70e/0x9b0
mptcp_close+0x2b/0x150
inet_release+0xe9/0x1f0
__sock_release+0xd2/0x280
sock_close+0x15/0x20
__fput+0x252/0xa20
task_work_run+0x169/0x250
exit_to_user_mode_prepare+0x113/0x120
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
The msk grace period can legitly expire in between the last
reference count dropped in mptcp_subflow_queue_clean() and
the later eventual access in inet_csk_listen_stop()
After the previous patch we don't need anymore special-casing
msk listener socket cleanup: the mptcp worker will process each
of the unaccepted msk sockets.
Just drop the now unnecessary code.
Please note this commit depends on the two parent ones:
mptcp: refactor passive socket initialization
mptcp: use the workqueue to destroy unaccepted sockets
Fixes: 6aeed90450 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/346
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6985b9b82954caa53f862d6059d06c0526254f0 ]
Backports notes: one simple conflict in net/mptcp/protocol.c with:
commit a5ef058dc4d9 ("net: introduce and use custom sockopt socket flag")
Where the two commits add a new line for different actions in the same
context in mptcp_stream_accept().
Christoph reported a UaF at token lookup time after having
refactored the passive socket initialization part:
BUG: KASAN: use-after-free in __token_bucket_busy+0x253/0x260
Read of size 4 at addr ffff88810698d5b0 by task syz-executor653/3198
CPU: 1 PID: 3198 Comm: syz-executor653 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0x91
print_report+0x16a/0x46f
kasan_report+0xad/0x130
__token_bucket_busy+0x253/0x260
mptcp_token_new_connect+0x13d/0x490
mptcp_connect+0x4ed/0x860
__inet_stream_connect+0x80e/0xd90
tcp_sendmsg_fastopen+0x3ce/0x710
mptcp_sendmsg+0xff1/0x1a20
inet_sendmsg+0x11d/0x140
__sys_sendto+0x405/0x490
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
We need to properly clean-up all the paired MPTCP-level
resources and be sure to release the msk last, even when
the unaccepted subflow is destroyed by the TCP internals
via inet_child_forget().
We can re-use the existing MPTCP_WORK_CLOSE_SUBFLOW infra,
explicitly checking that for the critical scenario: the
closed subflow is the MPC one, the msk is not accepted and
eventually going through full cleanup.
With such change, __mptcp_destroy_sock() is always called
on msk sockets, even on accepted ones. We don't need anymore
to transiently drop one sk reference at msk clone time.
Please note this commit depends on the parent one:
mptcp: refactor passive socket initialization
Fixes: 58b0991962 ("mptcp: create msk early")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/347
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a236aef280ed5122b2d47087eb514d0921ae033 ]
After commit 30e51b923e ("mptcp: fix unreleased socket in accept queue")
unaccepted msk sockets go throu complete shutdown, we don't need anymore
to delay inserting the first subflow into the subflow lists.
The reference counting deserve some extra care, as __mptcp_close() is
unaware of the request socket linkage to the first subflow.
Please note that this is more a refactoring than a fix but because this
modification is needed to include other corrections, see the following
commits. Then a Fixes tag has been added here to help the stable team.
Fixes: 30e51b923e ("mptcp: fix unreleased socket in accept queue")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit cee4034a3db1d30c3243dd51506a9d4ab1a849fa upstream.
Christoph reports a lockdep splat in the mptcp_subflow_create_socket()
error path, when such function is invoked by
mptcp_pm_nl_create_listen_socket().
Such code path acquires two separates, nested socket lock, with the
internal lock operation lacking the "nested" annotation. Adding that
in sock_release() for mptcp's sake only could be confusing.
Instead just add a new lockclass to the in-kernel msk socket,
re-initializing the lockdep infra after the socket creation.
Fixes: ad2171009d96 ("mptcp: fix locking for in-kernel listener creation")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/354
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ba14528684f528566fb7d956bfbfb958b591d86 upstream.
tcp_set_state() is called from tcp_done() already.
There is then no need to first set the state to TCP_CLOSE, then call
tcp_done().
Fixes: d582484726 ("mptcp: fix fallback for MP_JOIN subflows")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/362
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 822467a48e938e661965d09df5fcac66f7291050 upstream.
Add __ro_after_init labels for the variables tcp_prot_override and
tcpv6_prot_override, just like other variables adjacent to them, to
indicate that they are initialised from the init hooks and no writes
occur afterwards.
Fixes: b19bc2945b ("mptcp: implement delegated actions")
Cc: stable@vger.kernel.org
Fixes: 51fa7f8ebf ("mptcp: mark ops structures as ro_after_init")
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7a679ba7c652587b85294f4953f33ac0b756d40 upstream.
Christoph reported a possible deadlock while the TCP stack
destroys an unaccepted subflow due to an incoming reset: the
MPTCP socket error path tries to acquire the msk-level socket
lock while TCP still owns the listener socket accept queue
spinlock, and the reverse dependency already exists in the
TCP stack.
Note that the above is actually a lockdep false positive, as
the chain involves two separate sockets. A different per-socket
lockdep key will address the issue, but such a change will be
quite invasive.
Instead, we can simply stop earlier the socket error handling
for orphaned or unaccepted subflows, breaking the critical
lockdep chain. Error handling in such a scenario is a no-op.
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 15cc104533 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/355
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ad2171009d968104ccda9dc517f5a3ba891515db ]
For consistency, in mptcp_pm_nl_create_listen_socket(), we need to
call the __mptcp_nmpc_socket() under the msk socket lock.
Note that as a side effect, mptcp_subflow_create_socket() needs a
'nested' lockdep annotation, as it will acquire the subflow (kernel)
socket lock under the in-kernel listener msk socket lock.
The current lack of locking is almost harmless, because the relevant
socket is not exposed to the user space, but in future we will add
more complexity to the mentioned helper, let's play safe.
Fixes: 1729cf186d ("mptcp: create the listening socket for new port")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 976d302fb6165ad620778d7ba834cde6e3fe9f9f ]
When endpoint creation fails, we need to free the newly allocated
entry and eventually destroy the paired mptcp listener socket.
Consolidate such action in a single point let all the errors path
reach it.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: ad2171009d96 ("mptcp: fix locking for in-kernel listener creation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21e43569685de4ad773fb060c11a15f3fd5e7ac4 ]
We need to call the __mptcp_nmpc_socket(), and later subflow socket
access under the msk socket lock, or e.g. a racing connect() could
change the socket status under the hood, with unexpected results.
Fixes: 54635bd047 ("mptcp: add TCP_FASTOPEN_CONNECT socket option")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3d429047cc66ff49780c93e4fccd9527723d385 ]
There are other socket options that need to act only on the first
subflow, e.g. all TCP_FASTOPEN* socket options.
This is similar to the getsockopt version.
In the next commit, this new mptcp_setsockopt_first_sf_only() helper is
used by other another option.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 21e43569685d ("mptcp: fix locking for setsockopt corner-case")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1249db44a102d9d3541ed7798d4b01ffdcf03524 upstream.
Currently the subflow error report callback unconditionally
propagates the fallback subflow status to the owning msk.
If the msk is already orphaned, the above prevents the code
from correctly tracking the msk moving to the TCP_CLOSE state
and doing the appropriate cleanup.
All the above causes increasing memory usage over time and
sporadic self-tests failures.
There is a great deal of infrastructure trying to propagate
correctly the fallback subflow status to the owning mptcp socket,
e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed():
in the error propagation path we need only to cope with unorphaned
sockets.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/339
Fixes: 15cc104533 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4e85922e3e7ef2071f91f65e61629b60f3a9cf4 upstream.
If the peer closes all the existing subflows for a given
mptcp socket and later the application closes it, the current
implementation let it survive until the timewait timeout expires.
While the above is allowed by the protocol specification it
consumes resources for almost no reason and additionally
causes sporadic self-tests failures.
Let's move the mptcp socket to the TCP_CLOSE state when there are
no alive subflows at close time, so that the allocated resources
will be freed immediately.
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb00ee4f3343acb2b9222ca9b73b47dd1e1a8efc upstream.
If an MPTCP socket has been created with AF_INET6 and the IPV6_V6ONLY
option has been set, the userspace PM would allow creating subflows
using IPv4 addresses, e.g. mapped in v6.
The kernel side of userspace PM will also accept creating subflows with
local and remote addresses having different families. Depending on the
subflow socket's family, different behaviours are expected:
- If AF_INET is forced with a v6 address, the kernel will take the last
byte of the IP and try to connect to that: a new subflow is created
but to a non expected address.
- If AF_INET6 is forced with a v4 address, the kernel will try to
connect to a v4 address (v4-mapped-v6). A -EBADF error from the
connect() part is then expected.
It is then required to check the given families can be accepted. This is
done by using a new helper for addresses family matching, taking care of
IPv4 vs IPv4-mapped-IPv6 addresses. This helper will be re-used later by
the in-kernel path-manager to use mixed IPv4 and IPv6 addresses.
While at it, a clear error message is now reported if there are some
conflicts with the families that have been passed by the userspace.
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bc1fe7dd748ba5e76e7917d110837cafe7b931c upstream.
Let the caller specify the to-be-created subflow family.
For a given MPTCP socket created with the AF_INET6 family, the current
userspace PM can already ask the kernel to create subflows in v4 and v6.
If "plain" IPv4 addresses are passed to the kernel, they are
automatically mapped in v6 addresses "by accident". This can be
problematic because the userspace will need to pass different addresses,
now the v4-mapped-v6 addresses to destroy this new subflow.
On the other hand, if the MPTCP socket has been created with the AF_INET
family, the command to create a subflow in v6 will be accepted but the
result will not be the one as expected as new subflow will be created in
IPv4 using part of the v6 addresses passed to the kernel: not creating
the expected subflow then.
No functional change intended for the in-kernel PM where an explicit
enforcement is currently in place. This arbitrary enforcement will be
leveraged by other patches in a future version.
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fec3adfd754ccc99a7230e8ab9f105b65fb07bcc ]
MattB reported a lockdep splat in the mptcp listener code cleanup:
WARNING: possible circular locking dependency detected
packetdrill/14278 is trying to acquire lock:
ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069)
but task is already holding lock:
ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
__lock_acquire (kernel/locking/lockdep.c:5055)
lock_acquire (kernel/locking/lockdep.c:466)
lock_sock_nested (net/core/sock.c:3463)
mptcp_worker (net/mptcp/protocol.c:2614)
process_one_work (kernel/workqueue.c:2294)
worker_thread (include/linux/list.h:292)
kthread (kernel/kthread.c:376)
ret_from_fork (arch/x86/entry/entry_64.S:312)
-> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
check_prev_add (kernel/locking/lockdep.c:3098)
validate_chain (kernel/locking/lockdep.c:3217)
__lock_acquire (kernel/locking/lockdep.c:5055)
lock_acquire (kernel/locking/lockdep.c:466)
__flush_work (kernel/workqueue.c:3070)
__cancel_work_timer (kernel/workqueue.c:3160)
mptcp_cancel_work (net/mptcp/protocol.c:2758)
mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817)
__mptcp_close_ssk (net/mptcp/protocol.c:2363)
mptcp_destroy_common (net/mptcp/protocol.c:3170)
mptcp_destroy (include/net/sock.h:1495)
__mptcp_destroy_sock (net/mptcp/protocol.c:2886)
__mptcp_close (net/mptcp/protocol.c:2959)
mptcp_close (net/mptcp/protocol.c:2974)
inet_release (net/ipv4/af_inet.c:432)
__sock_release (net/socket.c:651)
sock_close (net/socket.c:1367)
__fput (fs/file_table.c:320)
task_work_run (kernel/task_work.c:181 (discriminator 1))
exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
syscall_exit_to_user_mode (kernel/entry/common.c:130)
do_syscall_64 (arch/x86/entry/common.c:87)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET);
lock((work_completion)(&msk->work));
lock(sk_lock-AF_INET);
lock((work_completion)(&msk->work));
*** DEADLOCK ***
The report is actually a false positive, since the only existing lock
nesting is the msk socket lock acquired by the mptcp work.
cancel_work_sync() is invoked without the relevant socket lock being
held, but under a different (the msk listener) socket lock.
We could silence the splat adding a per workqueue dynamic lockdep key,
but that looks overkill. Instead just tell lockdep the msk socket lock
is not held around cancel_work_sync().
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322
Fixes: 30e51b923e ("mptcp: fix unreleased socket in accept queue")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d803344fdc3e38079fabcf38b1e4cb6f8faa655 ]
MatM reported a deadlock at fastopening time:
INFO: task syz-executor.0:11454 blocked for more than 143 seconds.
Tainted: G S 6.1.0-rc5-03226-gdb0157db5153 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:25104 pid:11454 ppid:424 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5191 [inline]
__schedule+0x5c2/0x1550 kernel/sched/core.c:6503
schedule+0xe8/0x1c0 kernel/sched/core.c:6579
__lock_sock+0x142/0x260 net/core/sock.c:2896
lock_sock_nested+0xdb/0x100 net/core/sock.c:3466
__mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328
mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171
mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019
__inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720
tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200
mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline]
mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721
inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xf7/0x190 net/socket.c:734
____sys_sendmsg+0x336/0x970 net/socket.c:2476
___sys_sendmsg+0x122/0x1c0 net/socket.c:2530
__sys_sendmmsg+0x18d/0x460 net/socket.c:2616
__do_sys_sendmmsg net/socket.c:2645 [inline]
__se_sys_sendmmsg net/socket.c:2642 [inline]
__x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5920a75e7d
RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d
RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005
RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000
</TASK>
In the error path, tcp_sendmsg_fastopen() ends-up calling
mptcp_disconnect(), and the latter tries to close each
subflow, acquiring the socket lock on each of them.
At fastopen time, we have a single subflow, and such subflow
socket lock is already held by the called, causing the deadlock.
We already track the 'fastopen in progress' status inside the msk
socket. Use it to address the issue, making mptcp_disconnect() a
no op when invoked from the fastopen (error) path and doing the
relevant cleanup after releasing the subflow socket lock.
While at the above, rename the fastopen status bit to something
more meaningful.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321
Fixes: fa9e57468a ("mptcp: fix abba deadlock on fastopen")
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d3295fee3c756ece33ac0d935e172e68c0a4161b upstream.
Before, only the destructor from TCP request sock in IPv4 was called
even if the subflow was IPv6.
It is important to use the right destructor to avoid memory leaks with
some advanced IPv6 features, e.g. when the request socks contain
specific IPv6 options.
Fixes: 79c0949e9a ("mptcp: Add key generation and token tree")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34b21d1ddc8ace77a8fa35c1b1e06377209e0dae upstream.
tcp_request_sock_ops structure is specific to IPv4. It should then not
be used with MPTCP subflows on top of IPv6.
For example, it contains the 'family' field, initialised to AF_INET.
This 'family' field is used by TCP FastOpen code to generate the cookie
but also by TCP Metrics, SELinux and SYN Cookies. Using the wrong family
will not lead to crashes but displaying/using/checking wrong things.
Note that 'send_reset' callback from request_sock_ops structure is used
in some error paths. It is then also important to use the correct one
for IPv4 or IPv6.
The slab name can also be different in IPv4 and IPv6, it will be used
when printing some log messages. The slab pointer will anyway be the
same because the object size is the same for both v4 and v6. A
BUILD_BUG_ON() has also been added to make sure this size is the same.
Fixes: cec37a6e41 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fff88186f047627bb128d65155f42517f8e448f upstream.
To ease the maintenance, it is often recommended to avoid having #ifdef
preprocessor conditions.
Here the section related to CONFIG_MPTCP was quite short but the next
commit needs to add more code around. It is then cleaner to move
specific MPTCP code to functions located in net/mptcp directory.
Now that mptcp_subflow_request_sock_ops structure can be static, it can
also be marked as "read only after init".
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All of the subflows of a msk will be orphaned in mptcp_close(), which
means the subflows are in DEAD state. After then, DATA_FIN will be sent,
and the other side will response with a DATA_ACK for this DATA_FIN.
However, if the other side still has pending data, the data that received
on these subflows will not be passed to the msk, as they are DEAD and
subflow_data_ready() will not be called in tcp_data_ready(). Therefore,
these data can't be acked, and they will be retransmitted again and again,
until timeout.
Fix this by setting ssk->sk_socket and ssk->sk_wq to 'NULL', instead of
orphaning the subflows in __mptcp_close(), as Paolo suggested.
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Our CI reported lockdep splat in the fastopen code:
======================================================
WARNING: possible circular locking dependency detected
6.0.0.mptcp_f5e8bfe9878d+ #1558 Not tainted
------------------------------------------------------
packetdrill/1071 is trying to acquire lock:
ffff8881bd198140 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_wait_for_connect+0x19c/0x310
but task is already holding lock:
ffff8881b8346540 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0xfdf/0x1740
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (k-sk_lock-AF_INET){+.+.}-{0:0}:
__lock_acquire+0xb6d/0x1860
lock_acquire+0x1d8/0x620
lock_sock_nested+0x37/0xd0
inet_stream_connect+0x3f/0xa0
mptcp_connect+0x411/0x800
__inet_stream_connect+0x3ab/0x800
mptcp_stream_connect+0xac/0x110
__sys_connect+0x101/0x130
__x64_sys_connect+0x6e/0xb0
do_syscall_64+0x59/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
check_prev_add+0x15e/0x2110
validate_chain+0xace/0xdf0
__lock_acquire+0xb6d/0x1860
lock_acquire+0x1d8/0x620
lock_sock_nested+0x37/0xd0
inet_wait_for_connect+0x19c/0x310
__inet_stream_connect+0x26c/0x800
tcp_sendmsg_fastopen+0x341/0x650
mptcp_sendmsg+0x109d/0x1740
sock_sendmsg+0xe1/0x120
__sys_sendto+0x1c7/0x2a0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x59/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(k-sk_lock-AF_INET);
lock(sk_lock-AF_INET);
lock(k-sk_lock-AF_INET);
lock(sk_lock-AF_INET);
*** DEADLOCK ***
1 lock held by packetdrill/1071:
#0: ffff8881b8346540 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0xfdf/0x1740
======================================================
The problem is caused by the blocking inet_wait_for_connect() releasing
and re-acquiring the msk socket lock while the subflow socket lock is
still held and the MPTCP socket requires that the msk socket lock must
be acquired before the subflow socket lock.
Address the issue always invoking tcp_sendmsg_fastopen() in an
unblocking manner, and later eventually complete the blocking
__inet_stream_connect() as needed.
Fixes: d98a82a6af ("mptcp: handle defer connect in mptcp_sendmsg")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current MPTCP connect implementation duplicates a bit of inet
code and does not use nor provide a struct proto->connect callback,
which in turn will not fit the upcoming fastopen implementation.
Refactor such implementation to use the common helper, moving the
MPTCP-specific bits into mptcp_connect(). Additionally, avoid an
indirect call to the subflow connect callback.
Note that the fastopen call-path invokes mptcp_connect() while already
holding the subflow socket lock. Explicitly keep track of such path
via a new MPTCP-level flag and handle the locking accordingly.
Additionally, track the connect flags in a new msk field to allow
propagating them to the subflow inet_stream_connect call.
Fixes: d98a82a6af ("mptcp: handle defer connect in mptcp_sendmsg")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mptcp_pm_nl_get_local_id() code assumes that the msk local address
is available at that point. For passive sockets, we initialize such
address at accept() time.
Depending on the running configuration and the user-space timing, a
passive MPJ subflow can join the msk socket before accept() completes.
In such case, the PM assigns a wrong local id to the MPJ subflow
and later PM netlink operations will end-up touching the wrong/unexpected
subflow.
All the above causes sporadic self-tests failures, especially when
the host is heavy loaded.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/308
Fixes: 01cacb00b3 ("mptcp: add netlink-based PM")
Fixes: d045b9eb95 ("mptcp: introduce implicit endpoints")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MPTCP data path is quite complex and hard to understend even
without some foggy comments referring to modified code and/or
completely misleading from the beginning.
Update a few of them to more accurately describing the current
status.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daire reported a user-space application hang-up when the
peer is forcibly closed before the data transfer completion.
The relevant application expects the peer to either
do an application-level clean shutdown or a transport-level
connection reset.
We can accommodate a such user by extending the fastclose
usage: at fd close time, if the msk socket has some unread
data, and at FIN_WAIT timeout.
Note that at MPTCP close time we must ensure that the TCP
subflows will reset: set the linger socket option to a suitable
value.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an mptcp socket is closed due to an incoming FASTCLOSE
option, so specific sk_err is set and later syscall will
fail usually with EPIPE.
Align the current fastclose error handling with TCP reset,
properly setting the socket error according to the current
msk state and propagating such error.
Additionally sendmsg() is currently not handling properly
the sk_err, always returning EPIPE.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mptcp socket and its subflow sockets in accept queue can't be
released after the process exit.
While the release of a mptcp socket in listening state, the
corresponding tcp socket will be released too. Meanwhile, the tcp
socket in the unaccept queue will be released too. However, only init
subflow is in the unaccept queue, and the joined subflow is not in the
unaccept queue, which makes the joined subflow won't be released, and
therefore the corresponding unaccepted mptcp socket will not be released
to.
This can be reproduced easily with following steps:
1. create 2 namespace and veth:
$ ip netns add mptcp-client
$ ip netns add mptcp-server
$ sysctl -w net.ipv4.conf.all.rp_filter=0
$ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
$ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
$ ip link add red-client netns mptcp-client type veth peer red-server \
netns mptcp-server
$ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
$ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
$ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
$ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
$ ip -n mptcp-server link set red-server up
$ ip -n mptcp-client link set red-client up
2. configure the endpoint and limit for client and server:
$ ip -n mptcp-server mptcp endpoint flush
$ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
$ ip -n mptcp-client mptcp endpoint flush
$ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
$ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
1 subflow
3. listen and accept on a port, such as 9999. The nc command we used
here is modified, which makes it use mptcp protocol by default.
$ ip netns exec mptcp-server nc -l -k -p 9999
4. open another *two* terminal and use each of them to connect to the
server with the following command:
$ ip netns exec mptcp-client nc 10.0.0.1 9999
Input something after connect to trigger the connection of the second
subflow. So that there are two established mptcp connections, with the
second one still unaccepted.
5. exit all the nc command, and check the tcp socket in server namespace.
And you will find that there is one tcp socket in CLOSE_WAIT state
and can't release forever.
Fix this by closing all of the unaccepted mptcp socket in
mptcp_subflow_queue_clean() with __mptcp_close().
Now, we can ensure that all unaccepted mptcp sockets will be cleaned by
__mptcp_close() before they are released, so mptcp_sock_destruct(), which
is used to clean the unaccepted mptcp socket, is not needed anymore.
The selftests for mptcp is ran for this commit, and no new failures.
Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Fixes: 6aeed90450 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Factor out __mptcp_close() from mptcp_close(). The caller of
__mptcp_close() should hold the socket lock, and cancel mptcp work when
__mptcp_close() returns true.
This function will be used in the next commit.
Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Fixes: 6aeed90450 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If fastopen is used, poll must allow a first write that will trigger
the SYN+data
Similar to what is done in tcp_poll().
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When TCP_FASTOPEN_CONNECT has been set on the socket before a connect,
the defer flag is set and must be handled when sendmsg is called.
This is similar to what is done in tcp_sendmsg_locked().
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Benjamin Hesmans <benjamin.hesmans@tessares.net>
Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net>
Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set the option for the first subflow only. For the other subflows TFO
can't be used because a mapping would be needed to cover the data in the
SYN.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We will soon introduce an optional per-netns ehash.
This means we cannot use tcp_hashinfo directly in most places.
Instead, access it via net->ipv4.tcp_death_row.hashinfo.
The access will be valid only while initialising tcp_hashinfo
itself and creating/destroying each netns.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that non-root users can configure MPTCP endpoints, account
the memory allocation to the user.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
GENL_ADMIN_PERM checks that the user has CAP_NET_ADMIN in the initial
namespace by calling netlink_capable(). Instead, use GENL_UNS_ADMIN_PERM
which uses netlink_ns_capable(). This checks that the caller has
CAP_NET_ADMIN in the current user namespace.
See also
commit 4a92602aa1 ("openvswitch: allow management from inside user namespaces")
which introduced this mechanism. See also
commit 5617c6cd6f ("nl80211: Allow privileged operations from user namespaces")
which introduced this for nl80211.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch adds a new bool variable 'do_check_data_fin' to replace the
original int variable 'copied' in __mptcp_push_pending(), check it to
determine whether to call __mptcp_check_send_data_fin().
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Similar to mptcp_for_each_subflow(): this is clearer now that the _safe
version is used in multiple places.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.
One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.
To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_max_skb_frags, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 5f74f82ea3 ("net:Add sysctl_max_skb_frags")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>