Pull core fixes from Thomas Gleixner:
"A small set of updates for core code:
- Prevent tracing in functions which are called from trace patching
via stop_machine() to prevent executing half patched function trace
entries.
- Remove old GCC workarounds
- Remove pointless includes of notifier.h"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Remove workaround for unreachable warnings from old GCC
notifier: Remove notifier header file wherever not used
watchdog: Mark watchdog touch functions as notrace
rds is the last in-kernel user of the old do_gettimeofday()
function. Convert it over to ktime_get_real() to make it
work more like the generic socket timestamps, and to let
us kill off do_gettimeofday().
A follow-up patch will have to change the user space interface
to deal better with 32-bit tasks, which may use an incompatible
layout for 'struct timespec'.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tls records are decrypted using asynchronous acclerators such as
NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
getting -EINPROGRESS, the tls record processing stops till the time the
crypto accelerator finishes off and returns the result. This incurs a
context switch and is not an efficient way of accessing the crypto
accelerators. Crypto accelerators work efficient when they are queued
with multiple crypto jobs without having to wait for the previous ones
to complete.
The patch submits multiple crypto requests without having to wait for
for previous ones to complete. This has been implemented for records
which are decrypted in zero-copy mode. At the end of recvmsg(), we wait
for all the asynchronous decryption requests to complete.
The references to records which have been sent for async decryption are
dropped. For cases where record decryption is not possible in zero-copy
mode, asynchronous decryption is not used and we wait for decryption
crypto api to complete.
For crypto requests executing in async fashion, the memory for
aead_request, sglists and skb etc is freed from the decryption
completion handler. The decryption completion handler wakesup the
sleeping user context when recvmsg() flags that it has done sending
all the decryption requests and there are no more decryption requests
pending to be completed.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Reviewed-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 80f1a0f4e0cd ("net/ipv6: Put lwtstate when destroying fib6_info")
partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
with ip6_rt_copy_init(), and it should be done only once there.
rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
ip6_rt_copy_init(), so there is no need to get it again at the end.
With this patch, lwtstate also isn't copied from RTF_REJECT routes.
[1]:
unreferenced object 0xffff880b6aaa14e0 (size 64):
comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
hex dump (first 32 bytes):
01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
[<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
[<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
[<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
[<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
[<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
[<000000004c893c76>] netlink_unicast+0x417/0x5a0
[<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
[<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
[<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
[<000000001e7426c8>] __sys_sendmsg+0xde/0x170
[<00000000fe411443>] do_syscall_64+0x9f/0x4a0
[<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000006d21f353>] 0xffffffffffffffff
Fixes: 6edb3c96a5f0 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 1337 says:
''Ignore RST segments in TIME-WAIT state.
If the 2 minute MSL is enforced, this fix avoids all three hazards.''
So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
expire rather than removing it instantly when a reset is received.
However, Linux will also re-start the TIME-WAIT timer.
This causes connect to fail when tying to re-use ports or very long
delays (until syn retry interval exceeds MSL).
packetdrill test case:
// Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
`sysctl net.ipv4.tcp_rfc1337=1`
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Receive first segment
0.310 < P. 1:1001(1000) ack 1 win 46
// Send one ACK
0.310 > . 1:1(0) ack 1001
// read 1000 byte
0.310 read(4, ..., 1000) = 1000
// Application writes 100 bytes
0.350 write(4, ..., 100) = 100
0.350 > P. 1:101(100) ack 1001
// ACK
0.500 < . 1001:1001(0) ack 101 win 257
// close the connection
0.600 close(4) = 0
0.600 > F. 101:101(0) ack 1001 win 244
// Our side is in FIN_WAIT_1 & waits for ack to fin
0.7 < . 1001:1001(0) ack 102 win 244
// Our side is in FIN_WAIT_2 with no outstanding data.
0.8 < F. 1001:1001(0) ack 102 win 244
0.8 > . 102:102(0) ack 1002 win 244
// Our side is now in TIME_WAIT state, send ack for fin.
0.9 < F. 1002:1002(0) ack 102 win 244
0.9 > . 102:102(0) ack 1002 win 244
// Peer reopens with in-window SYN:
1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
// Therefore, reply with ACK.
1.000 > . 102:102(0) ack 1002 win 244
// Peer sends RST for this ACK. Normally this RST results
// in tw socket removal, but rfc1337=1 setting prevents this.
1.100 < R 1002:1002(0) win 244
// second syn. Due to rfc1337=1 expect another pure ACK.
31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
31.0 > . 102:102(0) ack 1002 win 244
// .. and another RST from peer.
31.1 < R 1002:1002(0) win 244
31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`
// third syn after one minute. Time-Wait socket should have expired by now.
63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
// so we expect a syn-ack & 3whs to proceed from here on.
63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
Without this patch, 'ss' shows restarts of tw timer and last packet is
thus just another pure ack, more than one minute later.
This restores the original code from commit 283fd6cf0be690a83
("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .
For some reason the else branch was removed/lost in 1f28b683339f7
("Merge in TCP/UDP optimizations and [..]") and timer restart became
unconditional.
Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Getting prompt "The RDS Protocol" (RDS) is not too helpful, and it is
easily confused with Radio Data System (which we may want to support
in kernel, too).
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nl_net is set on entry to ip6_route_info_create. Only devices
within that namespace are considered so no need to reset it
before returning.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make IPv4 consistent with IPv6 and return an extack message that the
ONLINK flag requires a nexthop device.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently a Linux IPv6 TCP sender will change the flow label upon
timeouts to potentially steer away from a data path that has gone
bad. However this does not help if the problem is on the ACK path
and the data path is healthy. In this case the receiver is likely
to receive repeated spurious retransmission because the sender
couldn't get the ACKs in time and has recurring timeouts.
This patch adds another feature to mitigate this problem. It
leverages the DSACK states in the receiver to change the flow
label of the ACKs to speculatively re-route the ACK packets.
In order to allow triggering on the second consecutive spurious
RTO, the receiver changes the flow label upon sending a second
consecutive DSACK for a sequence number below RCV.NXT.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 71e41286203c017d24f041a7cd71abea7ca7b1e0.
mmap()/munmap() can not be backed by kmalloced pages :
We fault in :
VM_BUG_ON_PAGE(PageSlab(page), page);
unmap_single_vma+0x8a/0x110
unmap_vmas+0x4b/0x90
unmap_region+0xc9/0x140
do_munmap+0x274/0x360
vm_munmap+0x81/0xc0
SyS_munmap+0x2b/0x40
do_syscall_64+0x13e/0x1c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Fixes: 71e41286203c ("packet: switch kvzalloc to allocate memory")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Cc: Zhang Yu <zhangyu31@baidu.com>
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the new locking rule, we have to take tcf_lock
for both ->init() and ->dump(), as RTNL will be removed.
However, it is missing for act_connmark.
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback").
This extack is never used after 6 months... In fact, it can be just
set in the caller, right after ->lookup().
Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-09-01
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus.
2) BPF verifier improvements by giving each register its own liveness
chain which allows to simplify and getting rid of skip_callee() logic,
from Edward.
3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap
and percpu lru hashmap. Also add generic percpu formatted print on
bpftool so the same can be dumped there, from Yonghong.
4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and
TCP_SAVED_SYN options to allow reflection of tos/tclass from received
SYN packet, from Nikita.
5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2
interaction and removal of incorrect shutdown() calls, from John.
6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit gets rid of the structure xdp_umem_props. It was there to
be able to break a dependency at one point, but this is no longer
needed. The values in the struct are instead stored directly in the
xdp_umem structure. This simplifies the xsk code as well as af_xdp
zero-copy drivers and as a bonus gets rid of one internal header file.
The i40e driver is also adapted to the new interface in this commit.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Since xdp_umem_query() was added one assignment of bpf.command was
missed from cleanup. Removing the assignment statement.
Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set)
and TCP_SAVED_SYN (get). This would allow for bpf program to build
logic based on data from ingress SYN packet (e.g. doing tcp's tos/
tclass reflection (see sample prog)) and do it transparently from
userspace program point of view.
Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Variable 'headroom' is being assigned but is never used hence it is
redundant and can be removed.
Cleans up clang warning:
variable ‘headroom’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did
not include XDP meta data in the data buffers copied out to the user
application.
In this commit, we check if meta data is available, and if so, it is
prepended to the frame.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In the error path of changing the SKB headroom of the second
A-MSDU subframe, we would not account for the already-changed
length of the first frame that just got converted to be in
A-MSDU format and thus is a bit longer now.
Fix this by doing the necessary accounting.
It would be possible to reorder the operations, but that would
make the code more complex (to calculate the necessary pad),
and the headroom expansion should not fail frequently enough
to make that worthwhile.
Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Do not start to aggregate packets in a A-MSDU frame (converting the
first subframe to A-MSDU, adding the header) if max_tx_fragments or
max_amsdu_subframes limits are already exceeded by it. In particular,
this happens when drivers set the limit to 1 to avoid A-MSDUs at all.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
[reword commit message to be more precise]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211_update_ft_ies() tried to validate NL80211_ATTR_IE with
is_valid_ie_attr() before dereferencing it, but that helper function
returns true in case of NULL pointer (i.e., attribute not included).
This can result to dereferencing a NULL pointer. Fix that by explicitly
checking that NL80211_ATTR_IE is included.
Fixes: 355199e02b83 ("cfg80211: Extend support for IEEE 802.11r Fast BSS Transition")
Signed-off-by: Arunk Khandavalli <akhandav@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* new channels in 60 GHz
* clarify (average) ACK signal reporting API
* expose ieee80211_send_layer2_update() for all drivers
* start/stop mac80211's TXQs properly when required
* avoid regulatory restore with IE ignoring
* spelling: contidion -> condition
* fully implement WFA Multi-AP backhaul
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAluGiMwACgkQB8qZga/f
l8SHMRAAhlYZNjZIHMqbyqRMFeGkgyfgQRYKb4xhrYr7v0542g5U99MWMNBtUJmq
8aP4dzUFuPkR0qOi220PCs8PBBjuVcdTK1vq7AYiwiK4Um10/MtAlay6BUJFKVqU
sJMaMbPy4mB3ocWl/q2K4nKaCZARsr854xwiIJZVwbc8n8t60Mr5ELbzELb5prGS
jPpeRzYd7m4y4xnSYaiXWchdNOplFRN04NcuKJx10Pr3oWilGlj/ujGvwp78U6Uy
v1H3T9S4XWMFkvl3deeOS6SVkejx76cvH8Ryoq+/qqQsAgs3c9tPQX+mwj6mNicq
KsQcMIX6WmHNG5IcyaWat4LzdPgb5Xv31brA5tciZ3jebmIbc0P4dSYDLs7Jq1fg
gkYuyNV3Jlwzv93RzqcrxfAIquZAvI7fy4CGiiRwtMk3wuHJlGy21PjGqZtWwqbn
v0MbQf9riv1e653ygKSUpm1UoT5HMFbs6ZzbqpSy7Vr0e9+6B78Xkcp5u1DAvIan
09YSqzKlypoGC+/802BL34HTpoUnf/hiBzVjYFmqvL/X2qv7oEUMOIv2x9Lg7NHh
NZOPWcwjtPN57UP97Y6gCRAI2kJTigNdVnKISbIzZgNJ/HhB0M7ZmJ2UpB7EuJ1M
q5aTIolqdoruwdGJ8d3gRr9xjDcuhhjr+FS8h6KfByJ0Qixroqk=
=BW1J
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2018-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Only a few changes at this point:
* new channels in 60 GHz
* clarify (average) ACK signal reporting API
* expose ieee80211_send_layer2_update() for all drivers
* start/stop mac80211's TXQs properly when required
* avoid regulatory restore with IE ignoring
* spelling: contidion -> condition
* fully implement WFA Multi-AP backhaul
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After the commit 802bfb19152c ("net/sched: user-space can't set
unknown tcfa_action values"), unknown tcfa_action values are
converted to TC_ACT_UNSPEC, but the common agreement is instead
rejecting such configurations.
This change also introduces a helper to simplify the destruction
of a single action, avoiding code duplication.
v1 -> v2:
- helper is now static and renamed according to act_* convention
- updated extack message, according to the new behavior
Fixes: 802bfb19152c ("net/sched: user-space can't set unknown tcfa_action values")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
decrypt_skb fails if the number of sg elements required to map it
is greater than MAX_SKB_FRAGS. nsg must always be calculated, but
skb_cow_data adds unnecessary memcpy's for the zerocopy case.
The new function skb_nsg calculates the number of scatterlist elements
required to map the skb without the extra overhead of skb_cow_data.
This patch reduces memcpy by 50% on my encrypted NBD benchmarks.
Reported-by: Vakul Garg <Vakul.garg@nxp.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Tested-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current behavior of IP defragmentation is inconsistent:
- some overlapping/wrong length fragments are dropped without
affecting the queue;
- most overlapping fragments cause the whole frag queue to be dropped.
This patch brings consistency: if a bad fragment is detected,
the whole frag queue is dropped. Two major benefits:
- fail fast: corrupted frag queues are cleared immediately, instead of
by timeout;
- testing of overlapping fragments is now much easier: any kind of
random fragment length mutation now leads to the frag queue being
discarded (IP packet dropped); before this patch, some overlaps were
"corrected", with tests not seeing expected packet drops.
Note that in one case (see "if (end&7)" conditional) the current
behavior is preserved as there are concerns that this could be
legitimate padding.
Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since [gs]et_settings ethtool_ops callbacks have been deprecated in
February 2016, all in tree NIC drivers have been converted to provide
[gs]et_link_ksettings() and out of tree drivers have had enough time to do
the same.
Drop get_settings() and set_settings() and implement both ETHTOOL_[GS]SET
and ETHTOOL_[GS]LINKSETTINGS only using [gs]et_link_ksettings().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_unregister_all(PF_INET6) gets called from inet6_init in cases when
no handler has been registered for PF_INET6 yet, for example if
ip6_mr_init() fails. Abort and avoid a NULL pointer deref in that case.
Example of panic (triggered by faking a failure of
register_pernet_subsys):
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[...]
RIP: 0010:rtnl_unregister_all+0x17e/0x2a0
[...]
Call Trace:
? rtnetlink_net_init+0x250/0x250
? sock_unregister+0x103/0x160
? kernel_getsockopt+0x200/0x200
inet6_init+0x197/0x20d
Fixes: e2fddf5e96df ("[IPV6]: Make af_inet6 to check ip6_route_init return value.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
contains an error in the cleanup path of inet6_init(): when
proto_register(&pingv6_prot, 1) fails, we try to unregister
&pingv6_prot. When rawv6_init() fails, we skip unregistering
&pingv6_prot.
Example of panic (triggered by faking a failure of
proto_register(&pingv6_prot, 1)):
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[...]
RIP: 0010:__list_del_entry_valid+0x79/0x160
[...]
Call Trace:
proto_unregister+0xbb/0x550
? trace_preempt_on+0x6f0/0x6f0
? sock_no_shutdown+0x10/0x10
inet6_init+0x153/0x1b8
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 15e668070a64 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
moved the cleanup label for ipmr_fail, but should have changed the
contents of the cleanup labels as well. Now we can end up cleaning up
icmpv6 even though it hasn't been initialized (jump to icmp_fail or
ipmr_fail).
Simply undo things in the reverse order of their initialization.
Example of panic (triggered by faking a failure of icmpv6_init):
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[...]
RIP: 0010:__list_del_entry_valid+0x79/0x160
[...]
Call Trace:
? lock_release+0x8a0/0x8a0
unregister_pernet_operations+0xd4/0x560
? ops_free_list+0x480/0x480
? down_write+0x91/0x130
? unregister_pernet_subsys+0x15/0x30
? down_read+0x1b0/0x1b0
? up_read+0x110/0x110
? kmem_cache_create_usercopy+0x1b4/0x240
unregister_pernet_subsys+0x1d/0x30
icmpv6_cleanup+0x1d/0x30
inet6_init+0x1b5/0x23f
Fixes: 15e668070a64 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
where tipc_group_fill_sock_diag() still reads tsk->group meanwhile
tipc_group_delete() just deletes it in tipc_release().
tipc_nl_sk_walk() aims to lock this sock when walking each sock
in the hash table to close race conditions with sock changes like
this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately
this doesn't work at all. All non-BH call path should take
lock_sock() instead to make it work.
tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
where RCU read lock is required, this is the reason why lock_sock()
can't be taken on this path. This could be resolved by switching to
rhashtable iterator API's, where taking a sleepable lock is possible.
Also, the iterator API's are friendly for restartable calls like
diag dump, the last position is remembered behind the scence,
all we need to do here is saving the iterator into cb->args[].
I tested this with parallel tipc diag dump and thousands of tipc
socket creation and release, no crash or memory leak.
Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rhashtable_walk_exit() must be paired with rhashtable_walk_enter().
Fixes: 40f9f4397060 ("tipc: Fix tipc_sk_reinit race conditions")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting
on xmit") '!skb->ignore_df' check was always true because the function
skb_scrub_packet() was called before it, resetting ignore_df to zero.
In the commit, skb_scrub_packet() was moved below, and now this check
can be false for the packet, e.g. when sending it in the two fragments,
this prevents successful PMTU updates in such case. The next attempts
to send the packet lead to the same tx error. Moreover, vti6 initial
MTU value relies on PMTU adjustments.
This issue can be reproduced with the following LTP test script:
udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000
Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the xdp_umem_get_{data,dma} functions to include/net/xdp_sock.h,
so that the upcoming zero-copy implementation in the Ethernet drivers
can utilize them.
Also, supply some dummy function implementations for
CONFIG_XDP_SOCKETS=n configs.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model,
so it can be used from netdev drivers. Also, add additional checks for
the memory type.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit adds proper MEM_TYPE_ZERO_COPY support for
convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an
xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a
MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might
make sense to implement a more sophisticated thread-safe alloc/free
scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is
required in the fast-path.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When we perform the sg shift repair for the scatterlist ring, we
currently start out at i = first_sg + 1. However, this is not
correct since the first_sg could point to the sge sitting at slot
MAX_SKB_FRAGS - 1, and a subsequent i = MAX_SKB_FRAGS will access
the scatterlist ring (sg) out of bounds. Add the sk_msg_iter_var()
helper for iterating through the ring, and apply the same rule
for advancing to the next ring element as we do elsewhere. Later
work will use this helper also in other places.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If first_sg and last_sg wraps around in the scatterlist ring, then we
need to account for that in the shift as well. E.g. crafting such msgs
where this is the case leads to a hang as shift becomes negative. E.g.
consider the following scenario:
first_sg := 14 |=> shift := -12 msg->sg_start := 10
last_sg := 3 | msg->sg_end := 5
round 1: i := 15, move_from := 3, sg[15] := sg[ 3]
round 2: i := 0, move_from := -12, sg[ 0] := sg[-12]
round 3: i := 1, move_from := -11, sg[ 1] := sg[-11]
round 4: i := 2, move_from := -10, sg[ 2] := sg[-10]
[...]
round 13: i := 11, move_from := -1, sg[ 2] := sg[ -1]
round 14: i := 12, move_from := 0, sg[ 2] := sg[ 0]
round 15: i := 13, move_from := 1, sg[ 2] := sg[ 1]
round 16: i := 14, move_from := 2, sg[ 2] := sg[ 2]
round 17: i := 15, move_from := 3, sg[ 2] := sg[ 3]
[...]
This means we will loop forever and never hit the msg->sg_end condition
to break out of the loop. When we see that the ring wraps around, then
the shift should be MAX_SKB_FRAGS - first_sg + last_sg - 1. Meaning,
the remainder slots from the tail of the ring and the head until last_sg
combined.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In the current code, msg->data is set as sg_virt(&sg[i]) + start - offset
and msg->data_end relative to it as msg->data + bytes. Using iterator i
to point to the updated starting scatterlist element holds true for some
cases, however not for all where we'd end up pointing out of bounds. It
is /correct/ for these ones:
1) When first finding the starting scatterlist element (sge) where we
find that the page is already privately owned by the msg and where
the requested bytes and headroom fit into the sge's length.
However, it's /incorrect/ for the following ones:
2) After we made the requested area private and updated the newly allocated
page into first_sg slot of the scatterlist ring; when we find that no
shift repair of the ring is needed where we bail out updating msg->data
and msg->data_end. At that point i will point to last_sg, which in this
case is the next elem of first_sg in the ring. The sge at that point
might as well be invalid (e.g. i == msg->sg_end), which we use for
setting the range of sg_virt(&sg[i]). The correct one would have been
first_sg.
3) Similar as in 2) but when we find that a shift repair of the ring is
needed. In this case we fix up all sges and stop once we've reached the
end. In this case i will point to will point to the new msg->sg_end,
and the sge at that point will be invalid. Again here the requested
range sits in first_sg.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When building building AMSDU from non-linear SKB, we hit a
kernel panic when trying to push the padding to the tail.
Instead, put the padding at the head of the next subframe.
This also fixes the A-MSDU subframes to not have the padding
accounted in the length field and not have pad at all for
the last subframe, both required by the spec.
Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Reviewed-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IEEE 802.11-2016 14.10.8.3 HWMP sequence numbering says:
If it is a target mesh STA, it shall update its own HWMP SN to
maximum (current HWMP SN, target HWMP SN in the PREQ element) + 1
immediately before it generates a PREP element in response to a
PREQ element.
Signed-off-by: Yuan-Chi Pang <fu3mo6goo@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Modify the API to include all ACK frames in average ACK
signal strength reporting, not just ACKs for data frames.
Make exposing the data conditional on implementing the
extended feature flag.
This is how it was really implemented in mac80211, update
the code there to use the new defines and clean up some of
the setting code.
Keep nl80211.h source compatibility by keeping the old names.
Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
[rewrite commit log, change compatibility to be old=new
instead of the other way around, update kernel-doc,
roll in mac80211 changes, make mac80211 depend on valid
bit instead of HW flag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current mac80211 WDS (4-address mode) can be used to cover most of the
Multi-AP requirements for Data frames per the WFA Multi-AP Specification v1.0.
When configuring AP/STA interfaces in 4-address mode, they are able to function
as fronthaul AP/backhaul STA of Multi-AP device complying below
Tx, Rx requirements except one missing STA Rx requirement added by this patch.
Multi-AP specification section 14.1 describes the following requirements:
Transmitter requirements
------------------------
1. Fronthaul AP
i) When DA!=RA of backhaul STA, must use 4-address format
ii) When DA==RA of backhaul STA, shall use either 3-address
or 4-address format with RA updated with STA MAC
(mac80211 support 4-address format via AP/VLAN interface)
2. Backhaul STA
i) When SA!=TA of backhaul STA, must use 4-address format
ii) When SA==TA of backhaul STA, shall use either 3-address
or 4-address format with RA updated with AP MAC
(mac80211 support 4-address format via use_4addr)
Receiver requirements
---------------------
1. Fronthaul AP
i) When SA!=TA of backhaul STA, must support receiving 4-address
format frames
ii) When SA==TA of backhaul STA, must support receiving both
3-address and 4-address format frames
(mac80211 support both 3-addr & 4-addr via AP/VLAN interface)
2. Backhaul STA
i) When DA!=RA of backhaul STA, must support receiving 4-address
format frames
ii) When DA==RA of backhaul STA, must support receiving both
3-address and 4-address format frames
(mac80211 support only receiving 4-address format via
use_4addr)
This patch addresses the above Rx requirement (ii) for backhaul STA to receive
unicast (DA==RA) 3-address frames in addition to 4-address frames.
The current design doesn't accept 3-address frames when configured in 4-address
mode (use_4addr). Hence add a check to allow 3-address frames when DA==RA of
backhaul STA (adhering to Table 9-26 of IEEE Std 802.11™-2016).
This case was tested with a bridged station interface when associated with
a non-mac80211 based vendor AP implementation using 3-address frames for WDS.
STA was able to support the Multi-AP Rx requirement when DA==RA. No issues,
no loops seen when tested with mac80211 based AP as well.
Verified and confirmed all other Tx and Rx requirements of AP and STA for
Multi-AP respectively. They all work using the current mac80211-WDS design.
Signed-off-by: Sathishkumar Muruganandam <murugana@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
While recently going over bpf_msg_pull_data(), I noticed three
issues which are fixed in here:
1) When we attempt to find the first scatterlist element (sge)
for the start offset, we add len to the offset before we check
for start < offset + len, whereas it should come after when
we iterate to the next sge to accumulate the offsets. For
example, given a start offset of 12 with a sge length of 8
for the first sge in the list would lead us to determine this
sge as the first sge thinking it covers first 16 bytes where
start is located, whereas start sits in subsequent sges so
we would end up pulling in the wrong data.
2) After figuring out the starting sge, we have a short-cut test
in !msg->sg_copy[i] && bytes <= len. This checks whether it's
not needed to make the page at the sge private where we can
just exit by updating msg->data and msg->data_end. However,
the length test is not fully correct. bytes <= len checks
whether the requested bytes (end - start offsets) fit into the
sge's length. The part that is missing is that start must not
be sge length aligned. Meaning, the start offset into the sge
needs to be accounted as well on top of the requested bytes
as otherwise we can access the sge out of bounds. For example
the sge could have length of 8, our requested bytes could have
length of 8, but at a start offset of 4, so we also would need
to pull in 4 bytes of the next sge, when we jump to the out
label we do set msg->data to sg_virt(&sg[i]) + start - offset
and msg->data_end to msg->data + bytes which would be oob.
3) The subsequent bytes < copy test for finding the last sge has
the same issue as in point 2) but also it tests for less than
rather than less or equal to. Meaning if the sge length is of
8 and requested bytes of 8 while having the start aligned with
the sge, we would unnecessarily go and pull in the next sge as
well to make it private.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
freq_reg_info expects to get the frequency in kHz. Instead we
accidently pass it in MHz. Thus, currently the function always
return ERR rule. Fix that.
Fixes: 50f32718e125 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[fix kHz/MHz in commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
TXOP (also known as Channel Occupancy Time) is u16 and should be
added using nla_put_u16 instead of u8, fix that.
Fixes: 50f32718e125 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current support in the 60GHz band is for channels 1-4.
Add support for channels 5 and 6.
This requires enlarging ieee80211_channel.center_freq from u16 to u32.
Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>