Commit Graph

63536 Commits

Author SHA1 Message Date
Maciej Żenczykowski
ea53c24cb0 FROMGIT: bpf: Do not change gso_size during bpf_skb_change_proto()
This is technically a backwards incompatible change in behaviour, but I'm
going to argue that it is very unlikely to break things, and likely to fix
*far* more then it breaks.

In no particular order, various reasons follow:

(a) I've long had a bug assigned to myself to debug a super rare kernel crash
on Android Pixel phones which can (per stacktrace) be traced back to BPF clat
IPv6 to IPv4 protocol conversion causing some sort of ugly failure much later
on during transmit deep in the GSO engine, AFAICT precisely because of this
change to gso_size, though I've never been able to manually reproduce it. I
believe it may be related to the particular network offload support of attached
USB ethernet dongle being used for tethering off of an IPv6-only cellular
connection. The reason might be we end up with more segments than max permitted,
or with a GSO packet with only one segment... (either way we break some
assumption and hit a BUG_ON)

(b) There is no check that the gso_size is > 20 when reducing it by 20, so we
might end up with a negative (or underflowing) gso_size or a gso_size of 0.
This can't possibly be good. Indeed this is probably somehow exploitable (or
at least can result in a kernel crash) by delivering crafted packets and perhaps
triggering an infinite loop or a divide by zero... As a reminder: gso_size (MSS)
is related to MTU, but not directly derived from it: gso_size/MSS may be
significantly smaller then one would get by deriving from local MTU. And on
some NICs (which do loose MTU checking on receive, it may even potentially be
larger, for example my work pc with 1500 MTU can receive 1520 byte frames [and
sometimes does due to bugs in a vendor plat46 implementation]). Indeed even just
going from 21 to 1 is potentially problematic because it increases the number
of segments by a factor of 21 (think DoS, or some other crash due to too many
segments).

(c) It's always safe to not increase the gso_size, because it doesn't result in
the max packet size increasing.  So the skb_increase_gso_size() call was always
unnecessary for correctness (and outright undesirable, see later). As such the
only part which is potentially dangerous (ie. could cause backwards compatibility
issues) is the removal of the skb_decrease_gso_size() call.

(d) If the packets are ultimately destined to the local device, then there is
absolutely no benefit to playing around with gso_size. It only matters if the
packets will egress the device. ie. we're either forwarding, or transmitting
from the device.

(e) This logic only triggers for packets which are GSO. It does not trigger for
skbs which are not GSO. It will not convert a non-GSO MTU sized packet into a
GSO packet (and you don't even know what the MTU is, so you can't even fix it).
As such your transmit path must *already* be able to handle an MTU 20 bytes
larger then your receive path (for IPv4 to IPv6 translation) - and indeed 28
bytes larger due to IPv4 fragments. Thus removing the skb_decrease_gso_size()
call doesn't actually increase the size of the packets your transmit side must
be able to handle. ie. to handle non-GSO max-MTU packets, the IPv4/IPv6 device/
route MTUs must already be set correctly. Since for example with an IPv4 egress
MTU of 1500, IPv4 to IPv6 translation will already build 1520 byte IPv6 frames,
so you need a 1520 byte device MTU. This means if your IPv6 device's egress
MTU is 1280, your IPv4 route must be 1260 (and actually 1252, because of the
need to handle fragments). This is to handle normal non-GSO packets. Thus the
reduction is simply not needed for GSO packets, because when they're correctly
built, they will already be the right size.

(f) TSO/GSO should be able to exactly undo GRO: the number of packets (TCP
segments) should not be modified, so that TCP's MSS counting works correctly
(this matters for congestion control). If protocol conversion changes the
gso_size, then the number of TCP segments may increase or decrease. Packet loss
after protocol conversion can result in partial loss of MSS segments that the
sender sent. How's the sending TCP stack going to react to receiving ACKs/SACKs
in the middle of the segments it sent?

(g) skb_{decrease,increase}_gso_size() are already no-ops for GSO_BY_FRAGS
case (besides triggering WARN_ON_ONCE). This means you already cannot guarantee
that gso_size (and thus resulting packet MTU) is changed. ie. you must assume
it won't be changed.

(h) changing gso_size is outright buggy for UDP GSO packets, where framing
matters (I believe that's also the case for SCTP, but it's already excluded
by [g]).  So the only remaining case is TCP, which also doesn't want it
(see [f]).

(i) see also the reasoning on the previous attempt at fixing this
(commit fa7b83bf3b156c767f3e4a25bbf3817b08f3ff8e) which shows that the current
behaviour causes TCP packet loss:

  In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
  coalesced packet payload can be > MSS, but < MSS + 20.

  bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload
  length. After then tcp_gso_segment checks for the payload length if it
  is <= MSS. The condition is causing the packet to be dropped.

  tcp_gso_segment():
    [...]
    mss = skb_shinfo(skb)->gso_size;
    if (unlikely(skb->len <= mss)) goto out;
    [...]

Thus changing the gso_size is simply a very bad idea. Increasing is unnecessary
and buggy, and decreasing can go negative.

Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dongseok Yi <dseok.yi@samsung.com>
Cc: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/CANP3RGfjLikQ6dg=YpBU0OeHvyv7JOki7CyOUS9modaXAi-9vQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210617000953.2787453-2-zenczykowski@gmail.com

(cherry picked from commit 364745fbe981a4370f50274475da4675661104df https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=364745fbe981a4370f50274475da4675661104df )
Test: builds, TreeHugger
Bug: 188690383
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I0ef3174cbd3caaa42d5779334a9c0bfdc9ab81f5
2021-06-28 21:04:04 +00:00
Maciej Żenczykowski
e6c0526092 ANDROID: minor fixups of xt_IDLETIMER support
Add missing newline termination to a bunch of pr_debug()/pr_err()

Test: builds, and kernel net tests passes
Bug: 183485987
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I53eccc88c383259bc7a71ea688f728a0908fa765
2021-06-17 17:44:38 +00:00
Greg Kroah-Hartman
9e08e97ec6 Merge 5.10.43 into android12-5.10
Changes in 5.10.43
	btrfs: tree-checker: do not error out if extent ref hash doesn't match
	net: usb: cdc_ncm: don't spew notifications
	hwmon: (dell-smm-hwmon) Fix index values
	hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_3 for RAA228228
	netfilter: conntrack: unregister ipv4 sockopts on error unwind
	efi/fdt: fix panic when no valid fdt found
	efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
	efi/libstub: prevent read overflow in find_file_option()
	efi: cper: fix snprintf() use in cper_dimm_err_location()
	vfio/pci: Fix error return code in vfio_ecap_init()
	vfio/pci: zap_vma_ptes() needs MMU
	samples: vfio-mdev: fix error handing in mdpy_fb_probe()
	vfio/platform: fix module_put call in error flow
	ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
	HID: logitech-hidpp: initialize level variable
	HID: pidff: fix error return code in hid_pidff_init()
	HID: i2c-hid: fix format string mismatch
	devlink: Correct VIRTUAL port to not have phys_port attributes
	net/sched: act_ct: Offload connections with commit action
	net/sched: act_ct: Fix ct template allocation for zone 0
	mptcp: always parse mptcp options for MPC reqsk
	nvme-rdma: fix in-casule data send for chained sgls
	ACPICA: Clean up context mutex during object deletion
	perf probe: Fix NULL pointer dereference in convert_variable_location()
	net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
	net: sock: fix in-kernel mark setting
	net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
	net/tls: Fix use-after-free after the TLS device goes down and up
	net/mlx5e: Fix incompatible casting
	net/mlx5: Check firmware sync reset requested is set before trying to abort it
	net/mlx5e: Check for needed capability for cvlan matching
	net/mlx5: DR, Create multi-destination flow table with level less than 64
	nvmet: fix freeing unallocated p2pmem
	netfilter: nft_ct: skip expectations for confirmed conntrack
	netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
	drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
	bpf: Simplify cases in bpf_base_func_proto
	bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
	ieee802154: fix error return code in ieee802154_add_iface()
	ieee802154: fix error return code in ieee802154_llsec_getparams()
	igb: add correct exception tracing for XDP
	ixgbevf: add correct exception tracing for XDP
	cxgb4: fix regression with HASH tc prio value update
	ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
	ice: Fix allowing VF to request more/less queues via virtchnl
	ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
	ice: handle the VF VSI rebuild failure
	ice: report supported and advertised autoneg using PHY capabilities
	ice: Allow all LLDP packets from PF to Tx
	i2c: qcom-geni: Add shutdown callback for i2c
	cxgb4: avoid link re-train during TC-MQPRIO configuration
	i40e: optimize for XDP_REDIRECT in xsk path
	i40e: add correct exception tracing for XDP
	ice: simplify ice_run_xdp
	ice: optimize for XDP_REDIRECT in xsk path
	ice: add correct exception tracing for XDP
	ixgbe: optimize for XDP_REDIRECT in xsk path
	ixgbe: add correct exception tracing for XDP
	arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
	optee: use export_uuid() to copy client UUID
	bus: ti-sysc: Fix am335x resume hang for usb otg module
	arm64: dts: ls1028a: fix memory node
	arm64: dts: zii-ultra: fix 12V_MAIN voltage
	arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
	ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
	ARM: dts: imx7d-pico: Fix the 'tuning-step' property
	ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
	bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
	tipc: add extack messages for bearer/media failure
	tipc: fix unique bearer names sanity check
	serial: stm32: fix threaded interrupt handling
	riscv: vdso: fix and clean-up Makefile
	io_uring: fix link timeout refs
	io_uring: use better types for cflags
	drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
	Bluetooth: fix the erroneous flush_work() order
	Bluetooth: use correct lock to prevent UAF of hdev object
	wireguard: do not use -O3
	wireguard: peer: allocate in kmem_cache
	wireguard: use synchronize_net rather than synchronize_rcu
	wireguard: selftests: remove old conntrack kconfig value
	wireguard: selftests: make sure rp_filter is disabled on vethc
	wireguard: allowedips: initialize list head in selftest
	wireguard: allowedips: remove nodes in O(1)
	wireguard: allowedips: allocate nodes in kmem_cache
	wireguard: allowedips: free empty intermediate nodes when removing single node
	net: caif: added cfserl_release function
	net: caif: add proper error handling
	net: caif: fix memory leak in caif_device_notify
	net: caif: fix memory leak in cfusbl_device_notify
	HID: i2c-hid: Skip ELAN power-on command after reset
	HID: magicmouse: fix NULL-deref on disconnect
	HID: multitouch: require Finger field to mark Win8 reports as MT
	gfs2: fix scheduling while atomic bug in glocks
	ALSA: timer: Fix master timer notification
	ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
	ALSA: hda: update the power_state during the direct-complete
	ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
	ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
	ext4: fix memory leak in ext4_fill_super
	ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
	ext4: fix fast commit alignment issues
	ext4: fix memory leak in ext4_mb_init_backend on error path.
	ext4: fix accessing uninit percpu counter variable with fast_commit
	usb: dwc2: Fix build in periphal-only mode
	pid: take a reference when initializing `cad_pid`
	ocfs2: fix data corruption by fallocate
	mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
	mm/page_alloc: fix counting of free pages after take off from buddy
	x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
	x86/sev: Check SME/SEV support in CPUID first
	nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
	drm/amdgpu: Don't query CE and UE errors
	drm/amdgpu: make sure we unpin the UVD BO
	x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
	powerpc/kprobes: Fix validation of prefixed instructions across page boundary
	btrfs: mark ordered extent and inode with error if we fail to finish
	btrfs: fix error handling in btrfs_del_csums
	btrfs: return errors from btrfs_del_csums in cleanup_ref_head
	btrfs: fixup error handling in fixup_inode_link_counts
	btrfs: abort in rename_exchange if we fail to insert the second ref
	btrfs: fix deadlock when cloning inline extents and low on available space
	mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
	drm/msm/dpu: always use mdp device to scale bandwidth
	btrfs: fix unmountable seed device after fstrim
	KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
	KVM: arm64: Fix debug register indexing
	x86/kvm: Teardown PV features on boot CPU as well
	x86/kvm: Disable kvmclock on all CPUs on shutdown
	x86/kvm: Disable all PV features on crash
	lib/lz4: explicitly support in-place decompression
	i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
	netfilter: nf_tables: missing error reporting for not selected expressions
	xen-netback: take a reference to the RX task thread
	neighbour: allow NUD_NOARP entries to be forced GCed
	Linux 5.10.43

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d7ec0878193e4e454076809b7fb71fcc4e3d810
2021-06-12 14:48:14 +02:00
Maciej Żenczykowski
3871aa16fd ANDROID: core of xt_IDLETIMER send_nl_msg support
This was reverted in ee6918c6f7 due to
conflicts with upstream, and this attempts to reapply the majority
of that change in an upstream compatible fashion.

Test: builds, and kernel net tests passes, booted on phone,
  but no real testing
Bug: 183485987
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I0fa1da3cb38d37c6888444dc36d49e6d1828a855
2021-06-11 14:54:24 +00:00
Maciej Żenczykowski
b4355a880a ANDROID: start to re-add xt_IDLETIMER send_nl_msg support
This was reverted in ee6918c6f7 due to conflicts with
upstream, this first patch is just the minimum necessary to make the netfilter
IDLETIMER target with --send_nl_msg load successfully:

  phone-5.10:/ # iptables-save | egrep IDLETIMER
  -A idletimer_raw_PREROUTING -i rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg
  -A idletimer_mangle_POSTROUTING -o rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg
  phone-5.10:/ # ip6tables-save | egrep IDLETIMER
  -A idletimer_raw_PREROUTING -i rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg
  -A idletimer_mangle_POSTROUTING -o rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg

Test: builds, and kernel net tests passes, booted on phone, observed ip{,6}tables loading rules
Bug: 183485987
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I1fe2c4e41a092cc82c3d6d49d1217798b2728bcb
2021-06-11 14:54:06 +00:00
David Ahern
d17d47da59 neighbour: allow NUD_NOARP entries to be forced GCed
commit 7a6b1ab7475fd6478eeaf5c9d1163e7a18125c8f upstream.

IFF_POINTOPOINT interfaces use NUD_NOARP entries for IPv6. It's possible to
fill up the neighbour table with enough entries that it will overflow for
valid connections after that.

This behaviour is more prevalent after commit 58956317c8 ("neighbor:
Improve garbage collection") is applied, as it prevents removal from
entries that are not NUD_FAILED, unless they are more than 5s old.

Fixes: 58956317c8 (neighbor: Improve garbage collection)
Reported-by: Kasper Dupont <kasperd@gjkwv.06.feb.2021.kasperd.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:29 +02:00
Pablo Neira Ayuso
316de9a88c netfilter: nf_tables: missing error reporting for not selected expressions
commit c781471d67a56d7d4c113669a11ede0463b5c719 upstream.

Sometimes users forget to turn on nftables extensions from Kconfig that
they need. In such case, the error reporting from userspace is
misleading:

 $ sudo nft add rule x y counter
 Error: Could not process rule: No such file or directory
 add rule x y counter
 ^^^^^^^^^^^^^^^^^^^^

Add missing NL_SET_BAD_ATTR() to provide a hint:

 $ nft add rule x y counter
 Error: Could not process rule: No such file or directory
 add rule x y counter
              ^^^^^^^

Fixes: 83d9dcba06 ("netfilter: nf_tables: extended netlink error reporting for expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:29 +02:00
Krzysztof Kozlowski
48ee0db61c nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
commit 4ac06a1e013cf5fdd963317ffd3b968560f33bba upstream.

It's possible to trigger NULL pointer dereference by local unprivileged
user, when calling getsockname() after failed bind() (e.g. the bind
fails because LLCP_SAP_MAX used as SAP):

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
  Call Trace:
   llcp_sock_getname+0xb1/0xe0
   __sys_getpeername+0x95/0xc0
   ? lockdep_hardirqs_on_prepare+0xd5/0x180
   ? syscall_enter_from_user_mode+0x1c/0x40
   __x64_sys_getpeername+0x11/0x20
   do_syscall_64+0x36/0x70
   entry_SYSCALL_64_after_hwframe+0x44/0xae

This can be reproduced with Syzkaller C repro (bind followed by
getpeername):
https://syzkaller.appspot.com/x/repro.c?x=14def446e00000

Cc: <stable@vger.kernel.org>
Fixes: d646960f79 ("NFC: Initial LLCP support")
Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:27 +02:00
Pavel Skripkin
46403c1f80 net: caif: fix memory leak in cfusbl_device_notify
commit 7f5d86669fa4d485523ddb1d212e0a2d90bd62bb upstream.

In case of caif_enroll_dev() fail, allocated
link_support won't be assigned to the corresponding
structure. So simply free allocated pointer in case
of error.

Fixes: 7ad65bf68d ("caif: Add support for CAIF over CDC NCM USB interface")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:25 +02:00
Pavel Skripkin
af2806345a net: caif: fix memory leak in caif_device_notify
commit b53558a950a89824938e9811eddfc8efcd94e1bb upstream.

In case of caif_enroll_dev() fail, allocated
link_support won't be assigned to the corresponding
structure. So simply free allocated pointer in case
of error

Fixes: 7c18d2205e ("caif: Restructure how link caif link layer enroll")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+7ec324747ce876a29db6@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:24 +02:00
Pavel Skripkin
d6db727457 net: caif: add proper error handling
commit a2805dca5107d5603f4bbc027e81e20d93476e96 upstream.

caif_enroll_dev() can fail in some cases. Ingnoring
these cases can lead to memory leak due to not assigning
link_support pointer to anywhere.

Fixes: 7c18d2205e ("caif: Restructure how link caif link layer enroll")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:24 +02:00
Pavel Skripkin
dac53568c6 net: caif: added cfserl_release function
commit bce130e7f392ddde8cfcb09927808ebd5f9c8669 upstream.

Added cfserl_release() function.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:24 +02:00
Lin Ma
74caf718cc Bluetooth: use correct lock to prevent UAF of hdev object
commit e305509e678b3a4af2b3cfd410f409f7cdaabb52 upstream.

The hci_sock_dev_event() function will cleanup the hdev object for
sockets even if this object may still be in used within the
hci_sock_bound_ioctl() function, result in UAF vulnerability.

This patch replace the BH context lock to serialize these affairs
and prevent the race condition.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:23 +02:00
Lin Ma
3795007c8d Bluetooth: fix the erroneous flush_work() order
commit 6a137caec23aeb9e036cdfd8a46dd8a366460e5d upstream.

In the cleanup routine for failed initialization of HCI device,
the flush_work(&hdev->rx_work) need to be finished before the
flush_work(&hdev->cmd_work). Otherwise, the hci_rx_work() can
possibly invoke new cmd_work and cause a bug, like double free,
in late processings.

This was assigned CVE-2021-3564.

This patch reorder the flush_work() to fix this bug.

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Hao Xiong <mart1n@zju.edu.cn>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:23 +02:00
Hoang Le
fdf1e5eec3 tipc: fix unique bearer names sanity check
[ Upstream commit f20a46c3044c3f75232b3d0e2d09af9b25efaf45 ]

When enabling a bearer by name, we don't sanity check its name with
higher slot in bearer list. This may have the effect that the name
of an already enabled bearer bypasses the check.

To fix the above issue, we just perform an extra checking with all
existing bearers.

Fixes: cb30a63384 ("tipc: refactor function tipc_enable_bearer()")
Cc: stable@vger.kernel.org
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:22 +02:00
Hoang Le
e31ae45ed1 tipc: add extack messages for bearer/media failure
[ Upstream commit b83e214b2e04204f1fc674574362061492c37245 ]

Add extack error messages for -EINVAL errors when enabling bearer,
getting/setting properties for a media/bearer

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:22 +02:00
Coco Li
0987023582 ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
[ Upstream commit 821bbf79fe46a8b1d18aa456e8ed0a3c208c3754 ]

Reported by syzbot:
HEAD commit:    90c911ad Merge tag 'fixes' of git://git.kernel.org/pub/scm..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
dashboard link: https://syzkaller.appspot.com/bug?extid=123aa35098fd3c000eb7
compiler:       Debian clang version 11.0.1-2

==================================================================
BUG: KASAN: slab-out-of-bounds in fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
BUG: KASAN: slab-out-of-bounds in fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
Read of size 8 at addr ffff8880145c78f8 by task syz-executor.4/17760

CPU: 0 PID: 17760 Comm: syz-executor.4 Not tainted 5.12.0-rc8-syzkaller #0
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x202/0x31e lib/dump_stack.c:120
 print_address_description+0x5f/0x3b0 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report+0x15c/0x200 mm/kasan/report.c:416
 fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
 fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
 fib6_nh_release+0x9a/0x430 net/ipv6/route.c:3536
 fib6_info_destroy_rcu+0xcb/0x1c0 net/ipv6/ip6_fib.c:174
 rcu_do_batch kernel/rcu/tree.c:2559 [inline]
 rcu_core+0x8f6/0x1450 kernel/rcu/tree.c:2794
 __do_softirq+0x372/0x7a6 kernel/softirq.c:345
 invoke_softirq kernel/softirq.c:221 [inline]
 __irq_exit_rcu+0x22c/0x260 kernel/softirq.c:422
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:434
 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:lock_acquire+0x1f6/0x720 kernel/locking/lockdep.c:5515
Code: f6 84 24 a1 00 00 00 02 0f 85 8d 02 00 00 f7 c3 00 02 00 00 49 bd 00 00 00 00 00 fc ff df 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 3d 00 00 00 00 00 4b c7 44 3d 09 00 00 00 00 43 c7 44 3d
RSP: 0018:ffffc90009e06560 EFLAGS: 00000206
RAX: 1ffff920013c0cc0 RBX: 0000000000000246 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90009e066e0 R08: dffffc0000000000 R09: fffffbfff1f992b1
R10: fffffbfff1f992b1 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 1ffff920013c0cb4
 rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:267
 rcu_read_lock include/linux/rcupdate.h:656 [inline]
 ext4_get_group_info+0xea/0x340 fs/ext4/ext4.h:3231
 ext4_mb_prefetch+0x123/0x5d0 fs/ext4/mballoc.c:2212
 ext4_mb_regular_allocator+0x8a5/0x28f0 fs/ext4/mballoc.c:2379
 ext4_mb_new_blocks+0xc6e/0x24f0 fs/ext4/mballoc.c:4982
 ext4_ext_map_blocks+0x2be3/0x7210 fs/ext4/extents.c:4238
 ext4_map_blocks+0xab3/0x1cb0 fs/ext4/inode.c:638
 ext4_getblk+0x187/0x6c0 fs/ext4/inode.c:848
 ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:900
 ext4_append+0x1a4/0x360 fs/ext4/namei.c:67
 ext4_init_new_dir+0x337/0xa10 fs/ext4/namei.c:2768
 ext4_mkdir+0x4b8/0xc00 fs/ext4/namei.c:2814
 vfs_mkdir+0x45b/0x640 fs/namei.c:3819
 ovl_do_mkdir fs/overlayfs/overlayfs.h:161 [inline]
 ovl_mkdir_real+0x53/0x1a0 fs/overlayfs/dir.c:146
 ovl_create_real+0x280/0x490 fs/overlayfs/dir.c:193
 ovl_workdir_create+0x425/0x600 fs/overlayfs/super.c:788
 ovl_make_workdir+0xed/0x1140 fs/overlayfs/super.c:1355
 ovl_get_workdir fs/overlayfs/super.c:1492 [inline]
 ovl_fill_super+0x39ee/0x5370 fs/overlayfs/super.c:2035
 mount_nodev+0x52/0xe0 fs/super.c:1413
 legacy_get_tree+0xea/0x180 fs/fs_context.c:592
 vfs_get_tree+0x86/0x270 fs/super.c:1497
 do_new_mount fs/namespace.c:2903 [inline]
 path_mount+0x196f/0x2be0 fs/namespace.c:3233
 do_mount fs/namespace.c:3246 [inline]
 __do_sys_mount fs/namespace.c:3454 [inline]
 __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3431
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f68f2b87188 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9
RDX: 00000000200000c0 RSI: 0000000020000000 RDI: 000000000040000a
RBP: 00000000004bfbb9 R08: 0000000020000100 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
R13: 00007ffe19002dff R14: 00007f68f2b87300 R15: 0000000000022000

Allocated by task 17768:
 kasan_save_stack mm/kasan/common.c:38 [inline]
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 ____kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506
 kasan_kmalloc include/linux/kasan.h:233 [inline]
 __kmalloc+0xb4/0x380 mm/slub.c:4055
 kmalloc include/linux/slab.h:559 [inline]
 kzalloc include/linux/slab.h:684 [inline]
 fib6_info_alloc+0x2c/0xd0 net/ipv6/ip6_fib.c:154
 ip6_route_info_create+0x55d/0x1a10 net/ipv6/route.c:3638
 ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
 inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
 rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg net/socket.c:674 [inline]
 ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
 ___sys_sendmsg net/socket.c:2404 [inline]
 __sys_sendmsg+0x319/0x400 net/socket.c:2433
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Last potentially related work creation:
 kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
 kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
 __call_rcu kernel/rcu/tree.c:3039 [inline]
 call_rcu+0x1b1/0xa30 kernel/rcu/tree.c:3114
 fib6_info_release include/net/ip6_fib.h:337 [inline]
 ip6_route_info_create+0x10c4/0x1a10 net/ipv6/route.c:3718
 ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
 inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
 rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg net/socket.c:674 [inline]
 ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
 ___sys_sendmsg net/socket.c:2404 [inline]
 __sys_sendmsg+0x319/0x400 net/socket.c:2433
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
 kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
 kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
 insert_work+0x54/0x400 kernel/workqueue.c:1331
 __queue_work+0x981/0xcc0 kernel/workqueue.c:1497
 queue_work_on+0x111/0x200 kernel/workqueue.c:1524
 queue_work include/linux/workqueue.h:507 [inline]
 call_usermodehelper_exec+0x283/0x470 kernel/umh.c:433
 kobject_uevent_env+0x1349/0x1730 lib/kobject_uevent.c:617
 kvm_uevent_notify_change+0x309/0x3b0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4809
 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:877 [inline]
 kvm_put_kvm+0x9c/0xd10 arch/x86/kvm/../../../virt/kvm/kvm_main.c:920
 kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3120
 __fput+0x352/0x7b0 fs/file_table.c:280
 task_work_run+0x146/0x1c0 kernel/task_work.c:140
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
 exit_to_user_mode_prepare+0x10b/0x1e0 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x26/0x70 kernel/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff8880145c7800
 which belongs to the cache kmalloc-192 of size 192
The buggy address is located 56 bytes to the right of
 192-byte region [ffff8880145c7800, ffff8880145c78c0)
The buggy address belongs to the page:
page:ffffea00005171c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x145c7
flags: 0xfff00000000200(slab)
raw: 00fff00000000200 ffffea00006474c0 0000000200000002 ffff888010c41a00
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880145c7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff8880145c7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8880145c7880: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
                                                                ^
 ffff8880145c7900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880145c7980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================

In the ip6_route_info_create function, in the case that the nh pointer
is not NULL, the fib6_nh in fib6_info has not been allocated.
Therefore, when trying to free fib6_info in this error case using
fib6_info_release, the function will call fib6_info_destroy_rcu,
which it will access fib6_nh_release(f6i->fib6_nh);
However, f6i->fib6_nh doesn't have any refcount yet given the lack of allocation
causing the reported memory issue above.
Therefore, releasing the empty pointer directly instead would be the solution.

Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Fixes: 706ec91916 ("ipv6: Fix nexthop refcnt leak when creating ipv6 route info")
Signed-off-by: Coco Li <lixiaoyan@google.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:20 +02:00
Wei Yongjun
e513d88962 ieee802154: fix error return code in ieee802154_llsec_getparams()
[ Upstream commit 373e864cf52403b0974c2f23ca8faf9104234555 ]

Fix to return negative error code -ENOBUFS from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 3e9c156e2c ("ieee802154: add netlink interfaces for llsec")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210519141614.3040055-1-weiyongjun1@huawei.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:19 +02:00
Zhen Lei
2a0ba0125c ieee802154: fix error return code in ieee802154_add_iface()
[ Upstream commit 79c6b8ed30e54b401c873dbad2511f2a1c525fd5 ]

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: be51da0f3e ("ieee802154: Stop using NLA_PUT*().")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210508062517.2574-1-thunder.leizhen@huawei.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:19 +02:00
Pablo Neira Ayuso
8d614eebc0 netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
[ Upstream commit 8971ee8b087750a23f3cd4dc55bff2d0303fd267 ]

The private helper data size cannot be updated. However, updates that
contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is
the same.

Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:19 +02:00
Pablo Neira Ayuso
5f3429c05e netfilter: nft_ct: skip expectations for confirmed conntrack
[ Upstream commit 1710eb913bdcda3917f44d383c32de6bdabfc836 ]

nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed
conntrack entry. However, nf_ct_ext_add() can only be called for
!nf_ct_is_confirmed().

[ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00
[ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202
[ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887
[ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440
[ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447
[ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440
[ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20
[ 1825.352240] FS:  00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000
[ 1825.352343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0
[ 1825.352508] Call Trace:
[ 1825.352544]  nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack]
[ 1825.352641]  nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct]
[ 1825.352716]  nft_do_chain+0x232/0x850 [nf_tables]

Add the ct helper extension only for unconfirmed conntrack. Skip rule
evaluation if the ct helper extension does not exist. Thus, you can
only create expectations from the first packet.

It should be possible to remove this limitation by adding a new action
to attach a generic ct helper to the first packet. Then, use this ct
helper extension from follow up packets to create the ct expectation.

While at it, add a missing check to skip the template conntrack too
and remove check for IPCT_UNTRACK which is implicit to !ct.

Fixes: 857b46027d ("netfilter: nft_ct: add ct expectations support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:18 +02:00
Maxim Mikityanskiy
f1d4184f12 net/tls: Fix use-after-free after the TLS device goes down and up
[ Upstream commit c55dcdd435aa6c6ad6ccac0a4c636d010ee367a4 ]

When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.

This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.

On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.

The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).

A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:18 +02:00
Maxim Mikityanskiy
874ece252e net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
[ Upstream commit 05fc8b6cbd4f979a6f25759c4a17dd5f657f7ecd ]

RCU synchronization is guaranteed to finish in finite time, unlike a
busy loop that polls a flag. This patch is a preparation for the bugfix
in the next patch, where the same synchronize_net() call will also be
used to sync with the TX datapath.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:17 +02:00
Alexander Aring
a5de17bb91 net: sock: fix in-kernel mark setting
[ Upstream commit dd9082f4a9f94280fbbece641bf8fc0a25f71f7a ]

This patch fixes the in-kernel mark setting by doing an additional
sk_dst_reset() which was introduced by commit 50254256f3 ("sock: Reset
dst when changing sk_mark via setsockopt"). The code is now shared to
avoid any further suprises when changing the socket mark value.

Fixes: 84d1c61740 ("net: sock: add sock_set_mark")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:17 +02:00
Vladimir Oltean
09fdb6747b net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
[ Upstream commit 4ef8d857b5f494e62bce9085031563fda35f9563 ]

When using sub-VLANs in the range of 1-7, the resulting value from:

	rx_vid = dsa_8021q_rx_vid_subvlan(ds, port, subvlan);

is wrong according to the description from tag_8021q.c:

 | 11  | 10  |  9  |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
 +-----------+-----+-----------------+-----------+-----------------------+
 |    DIR    | SVL |    SWITCH_ID    |  SUBVLAN  |          PORT         |
 +-----------+-----+-----------------+-----------+-----------------------+

For example, when ds->index == 0, port == 3 and subvlan == 1,
dsa_8021q_rx_vid_subvlan() returns 1027, same as it returns for
subvlan == 0, but it should have returned 1043.

This is because the low portion of the subvlan bits are not masked
properly when writing into the 12-bit VLAN value. They are masked into
bits 4:3, but they should be masked into bits 5:4.

Fixes: 3eaae1d05f ("net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:17 +02:00
Paolo Abeni
b198f77a36 mptcp: always parse mptcp options for MPC reqsk
[ Upstream commit 06f9a435b3aa12f4de6da91f11fdce8ce7b46205 ]

In subflow_syn_recv_sock() we currently skip options parsing
for OoO packet, given that such packets may not carry the relevant
MPC option.

If the peer generates an MPC+data TSO packet and some of the early
segments are lost or get reorder, we server will ignore the peer key,
causing transient, unexpected fallback to TCP.

The solution is always parsing the incoming MPTCP options, and
do the fallback only for in-order packets. This actually cleans
the existing code a bit.

Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:16 +02:00
Ariel Levkovich
be0d850726 net/sched: act_ct: Fix ct template allocation for zone 0
[ Upstream commit fb91702b743dec78d6507c53a2dec8a8883f509d ]

Fix current behavior of skipping template allocation in case the
ct action is in zone 0.

Skipping the allocation may cause the datapath ct code to ignore the
entire ct action with all its attributes (commit, nat) in case the ct
action in zone 0 was preceded by a ct clear action.

The ct clear action sets the ct_state to untracked and resets the
skb->_nfct pointer. Under these conditions and without an allocated
ct template, the skb->_nfct pointer will remain NULL which will
cause the tc ct action handler to exit without handling commit and nat
actions, if such exist.

For example, the following rule in OVS dp:
recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \
in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \
recirc(0x37a)

Will result in act_ct skipping the commit and nat actions in zone 0.

The change removes the skipping of template allocation for zone 0 and
treats it the same as any other zone.

Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:16 +02:00
Paul Blakey
f07c548314 net/sched: act_ct: Offload connections with commit action
[ Upstream commit 0cc254e5aa37cf05f65bcdcdc0ac5c58010feb33 ]

Currently established connections are not offloaded if the filter has a
"ct commit" action. This behavior will not offload connections of the
following scenario:

$ tc_filter add dev $DEV ingress protocol ip prio 1 flower \
  ct_state -trk \
  action ct commit action goto chain 1

$ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \
  action mirred egress redirect dev $DEV2

$ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \
  action ct commit action goto chain 1

$ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \
  ct_state +trk+est \
  action mirred egress redirect dev $DEV

Offload established connections, regardless of the commit flag.

Fixes: 46475bb20f ("net/sched: act_ct: Software offload of established flows")
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/1622029449-27060-1-git-send-email-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:16 +02:00
Parav Pandit
4f00f9c169 devlink: Correct VIRTUAL port to not have phys_port attributes
[ Upstream commit b28d8f0c25a9b0355116cace5f53ea52bd4020c8 ]

Physical port name, port number attributes do not belong to virtual port
flavour. When VF or SF virtual ports are registered they incorrectly
append "np0" string in the netdevice name of the VF/SF.

Before this fix, VF netdevice name were ens2f0np0v0, ens2f0np0v1 for VF
0 and 1 respectively.

After the fix, they are ens2f0v0, ens2f0v1.

With this fix, reading /sys/class/net/ens2f0v0/phys_port_name returns
-EOPNOTSUPP.

Also devlink port show example for 2 VFs on one PF to ensure that any
physical port attributes are not exposed.

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
pci/0000:06:00.3/196608: type eth netdev ens2f0v0 flavour virtual splittable false
pci/0000:06:00.4/262144: type eth netdev ens2f0v1 flavour virtual splittable false

This change introduces a netdevice name change on systemd/udev
version 245 and higher which honors phys_port_name sysfs file for
generation of netdevice name.

This also aligns to phys_port_name usage which is limited to switchdev
ports as described in [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/networking/switchdev.rst

Fixes: acf1ee44ca ("devlink: Introduce devlink port flavour virtual")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210526200027.14008-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:16 +02:00
Julian Anastasov
4b1aba6536 ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
[ Upstream commit 56e4ee82e850026d71223262c07df7d6af3bd872 ]

syzbot reported memory leak [1] when adding service with
HASHED flag. We should ignore this flag both from sockopt
and netlink provided data, otherwise the service is not
hashed and not visible while releasing resources.

[1]
BUG: memory leak
unreferenced object 0xffff888115227800 (size 512):
  comm "syz-executor263", pid 8658, jiffies 4294951882 (age 12.560s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff83977188>] kmalloc include/linux/slab.h:556 [inline]
    [<ffffffff83977188>] kzalloc include/linux/slab.h:686 [inline]
    [<ffffffff83977188>] ip_vs_add_service+0x598/0x7c0 net/netfilter/ipvs/ip_vs_ctl.c:1343
    [<ffffffff8397d770>] do_ip_vs_set_ctl+0x810/0xa40 net/netfilter/ipvs/ip_vs_ctl.c:2570
    [<ffffffff838449a8>] nf_setsockopt+0x68/0xa0 net/netfilter/nf_sockopt.c:101
    [<ffffffff839ae4e9>] ip_setsockopt+0x259/0x1ff0 net/ipv4/ip_sockglue.c:1435
    [<ffffffff839fa03c>] raw_setsockopt+0x18c/0x1b0 net/ipv4/raw.c:857
    [<ffffffff83691f20>] __sys_setsockopt+0x1b0/0x360 net/socket.c:2117
    [<ffffffff836920f2>] __do_sys_setsockopt net/socket.c:2128 [inline]
    [<ffffffff836920f2>] __se_sys_setsockopt net/socket.c:2125 [inline]
    [<ffffffff836920f2>] __x64_sys_setsockopt+0x22/0x30 net/socket.c:2125
    [<ffffffff84350efa>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
    [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-and-tested-by: syzbot+e562383183e4b1766930@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:15 +02:00
Florian Westphal
39a909a972 netfilter: conntrack: unregister ipv4 sockopts on error unwind
[ Upstream commit 22cbdbcfb61acc78d5fc21ebb13ccc0d7e29f793 ]

When ipv6 sockopt register fails, the ipv4 one needs to be removed.

Fixes: a0ae2562c6 ("netfilter: conntrack: remove l3proto abstraction")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:13 +02:00
Greg Kroah-Hartman
556758235b Revert "Revert "mm: fix struct page layout on 32-bit systems""
This reverts commit c34cd7750e.

Bring back the commit in 5.10.38 that broke the kabi.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I570fc31f9e1f196136bbbef479fb2413011ddf0e
2021-06-04 11:15:16 -07:00
Greg Kroah-Hartman
c5d480cd47 Merge 5.10.42 into android12-5.10
Changes in 5.10.42
	ALSA: hda/realtek: the bass speaker can't output sound on Yoga 9i
	ALSA: hda/realtek: Headphone volume is controlled by Front mixer
	ALSA: hda/realtek: Chain in pop reduction fixup for ThinkStation P340
	ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook G8
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 15 G8
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 17 G8
	ALSA: usb-audio: scarlett2: Fix device hang with ehci-pci
	ALSA: usb-audio: scarlett2: Improve driver startup messages
	cifs: set server->cipher_type to AES-128-CCM for SMB3.0
	NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
	iommu/vt-d: Fix sysfs leak in alloc_iommu()
	perf intel-pt: Fix sample instruction bytes
	perf intel-pt: Fix transaction abort handling
	perf scripts python: exported-sql-viewer.py: Fix copy to clipboard from Top Calls by elapsed Time report
	perf scripts python: exported-sql-viewer.py: Fix Array TypeError
	perf scripts python: exported-sql-viewer.py: Fix warning display
	proc: Check /proc/$pid/attr/ writes against file opener
	net: hso: fix control-request directions
	net/sched: fq_pie: re-factor fix for fq_pie endless loop
	net/sched: fq_pie: fix OOB access in the traffic path
	netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version
	mac80211: assure all fragments are encrypted
	mac80211: prevent mixed key and fragment cache attacks
	mac80211: properly handle A-MSDUs that start with an RFC 1042 header
	cfg80211: mitigate A-MSDU aggregation attacks
	mac80211: drop A-MSDUs on old ciphers
	mac80211: add fragment cache to sta_info
	mac80211: check defrag PN against current frame
	mac80211: prevent attacks on TKIP/WEP as well
	mac80211: do not accept/forward invalid EAPOL frames
	mac80211: extend protection against mixed key and fragment cache attacks
	ath10k: add CCMP PN replay protection for fragmented frames for PCIe
	ath10k: drop fragments with multicast DA for PCIe
	ath10k: drop fragments with multicast DA for SDIO
	ath10k: drop MPDU which has discard flag set by firmware for SDIO
	ath10k: Fix TKIP Michael MIC verification for PCIe
	ath10k: Validate first subframe of A-MSDU before processing the list
	ath11k: Clear the fragment cache during key install
	dm snapshot: properly fix a crash when an origin has no snapshots
	drm/amd/pm: correct MGpuFanBoost setting
	drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
	drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error
	drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate
	drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate
	selftests/gpio: Use TEST_GEN_PROGS_EXTENDED
	selftests/gpio: Move include of lib.mk up
	selftests/gpio: Fix build when source tree is read only
	kgdb: fix gcc-11 warnings harder
	Documentation: seccomp: Fix user notification documentation
	seccomp: Refactor notification handler to prepare for new semantics
	serial: core: fix suspicious security_locked_down() call
	misc/uss720: fix memory leak in uss720_probe
	thunderbolt: usb4: Fix NVM read buffer bounds and offset issue
	thunderbolt: dma_port: Fix NVM read buffer bounds and offset issue
	KVM: X86: Fix vCPU preempted state from guest's point of view
	KVM: arm64: Prevent mixed-width VM creation
	mei: request autosuspend after sending rx flow control
	staging: iio: cdc: ad7746: avoid overwrite of num_channels
	iio: gyro: fxas21002c: balance runtime power in error path
	iio: dac: ad5770r: Put fwnode in error case during ->probe()
	iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()
	iio: adc: ad7124: Fix missbalanced regulator enable / disable on error.
	iio: adc: ad7124: Fix potential overflow due to non sequential channel numbers
	iio: adc: ad7923: Fix undersized rx buffer.
	iio: adc: ad7793: Add missing error code in ad7793_setup()
	iio: adc: ad7192: Avoid disabling a clock that was never enabled.
	iio: adc: ad7192: handle regulator voltage error first
	serial: 8250: Add UART_BUG_TXRACE workaround for Aspeed VUART
	serial: 8250_dw: Add device HID for new AMD UART controller
	serial: 8250_pci: Add support for new HPE serial device
	serial: 8250_pci: handle FL_NOIRQ board flag
	USB: trancevibrator: fix control-request direction
	Revert "irqbypass: do not start cons/prod when failed connect"
	USB: usbfs: Don't WARN about excessively large memory allocations
	drivers: base: Fix device link removal
	serial: tegra: Fix a mask operation that is always true
	serial: sh-sci: Fix off-by-one error in FIFO threshold register setting
	serial: rp2: use 'request_firmware' instead of 'request_firmware_nowait'
	USB: serial: ti_usb_3410_5052: add startech.com device id
	USB: serial: option: add Telit LE910-S1 compositions 0x7010, 0x7011
	USB: serial: ftdi_sio: add IDs for IDS GmbH Products
	USB: serial: pl2303: add device id for ADLINK ND-6530 GC
	thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID
	usb: dwc3: gadget: Properly track pending and queued SG
	usb: gadget: udc: renesas_usb3: Fix a race in usb3_start_pipen()
	usb: typec: mux: Fix matching with typec_altmode_desc
	net: usb: fix memory leak in smsc75xx_bind
	Bluetooth: cmtp: fix file refcount when cmtp_attach_device fails
	fs/nfs: Use fatal_signal_pending instead of signal_pending
	NFS: fix an incorrect limit in filelayout_decode_layout()
	NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
	NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
	NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
	drm/meson: fix shutdown crash when component not probed
	net/mlx5e: reset XPS on error flow if netdev isn't registered yet
	net/mlx5e: Fix multipath lag activation
	net/mlx5e: Fix error path of updating netdev queues
	{net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table
	net/mlx5e: Fix nullptr in add_vlan_push_action()
	net/mlx5: Set reformat action when needed for termination rules
	net/mlx5e: Fix null deref accessing lag dev
	net/mlx4: Fix EEPROM dump support
	net/mlx5: Set term table as an unmanaged flow table
	SUNRPC in case of backlog, hand free slots directly to waiting task
	Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv"
	tipc: wait and exit until all work queues are done
	tipc: skb_linearize the head skb when reassembling msgs
	spi: spi-fsl-dspi: Fix a resource leak in an error handling path
	netfilter: flowtable: Remove redundant hw refresh bit
	net: dsa: mt7530: fix VLAN traffic leaks
	net: dsa: fix a crash if ->get_sset_count() fails
	net: dsa: sja1105: update existing VLANs from the bridge VLAN list
	net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic
	net: dsa: sja1105: error out on unsupported PHY mode
	net: dsa: sja1105: add error handling in sja1105_setup()
	net: dsa: sja1105: call dsa_unregister_switch when allocating memory fails
	net: dsa: sja1105: fix VL lookup command packing for P/Q/R/S
	i2c: s3c2410: fix possible NULL pointer deref on read message after write
	i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
	i2c: i801: Don't generate an interrupt on bus reset
	i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
	afs: Fix the nlink handling of dir-over-dir rename
	perf jevents: Fix getting maximum number of fds
	nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
	mptcp: avoid error message on infinite mapping
	mptcp: drop unconditional pr_warn on bad opt
	mptcp: fix data stream corruption
	platform/x86: hp_accel: Avoid invoking _INI to speed up resume
	gpio: cadence: Add missing MODULE_DEVICE_TABLE
	Revert "crypto: cavium/nitrox - add an error message to explain the failure of pci_request_mem_regions"
	Revert "media: usb: gspca: add a missed check for goto_low_power"
	Revert "ALSA: sb: fix a missing check of snd_ctl_add"
	Revert "serial: max310x: pass return value of spi_register_driver"
	serial: max310x: unregister uart driver in case of failure and abort
	Revert "net: fujitsu: fix a potential NULL pointer dereference"
	net: fujitsu: fix potential null-ptr-deref
	Revert "net/smc: fix a NULL pointer dereference"
	net/smc: properly handle workqueue allocation failure
	Revert "net: caif: replace BUG_ON with recovery code"
	net: caif: remove BUG_ON(dev == NULL) in caif_xmit
	Revert "char: hpet: fix a missing check of ioremap"
	char: hpet: add checks after calling ioremap
	Revert "ALSA: gus: add a check of the status of snd_ctl_add"
	Revert "ALSA: usx2y: Fix potential NULL pointer dereference"
	Revert "isdn: mISDNinfineon: fix potential NULL pointer dereference"
	isdn: mISDNinfineon: check/cleanup ioremap failure correctly in setup_io
	Revert "ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()"
	ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()
	Revert "isdn: mISDN: Fix potential NULL pointer dereference of kzalloc"
	isdn: mISDN: correctly handle ph_info allocation failure in hfcsusb_ph_info
	Revert "dmaengine: qcom_hidma: Check for driver register failure"
	dmaengine: qcom_hidma: comment platform_driver_register call
	Revert "libertas: add checks for the return value of sysfs_create_group"
	libertas: register sysfs groups properly
	Revert "ASoC: cs43130: fix a NULL pointer dereference"
	ASoC: cs43130: handle errors in cs43130_probe() properly
	Revert "media: dvb: Add check on sp8870_readreg"
	media: dvb: Add check on sp8870_readreg return
	Revert "media: gspca: mt9m111: Check write_bridge for timeout"
	media: gspca: mt9m111: Check write_bridge for timeout
	Revert "media: gspca: Check the return value of write_bridge for timeout"
	media: gspca: properly check for errors in po1030_probe()
	Revert "net: liquidio: fix a NULL pointer dereference"
	net: liquidio: Add missing null pointer checks
	Revert "brcmfmac: add a check for the status of usb_register"
	brcmfmac: properly check for bus register errors
	btrfs: return whole extents in fiemap
	scsi: ufs: ufs-mediatek: Fix power down spec violation
	scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic
	openrisc: Define memory barrier mb
	scsi: pm80xx: Fix drives missing during rmmod/insmod loop
	btrfs: release path before starting transaction when cloning inline extent
	btrfs: do not BUG_ON in link_to_fixup_dir
	platform/x86: hp-wireless: add AMD's hardware id to the supported list
	platform/x86: intel_punit_ipc: Append MODULE_DEVICE_TABLE for ACPI
	platform/x86: touchscreen_dmi: Add info for the Mediacom Winpad 7.0 W700 tablet
	SMB3: incorrect file id in requests compounded with open
	drm/amd/display: Disconnect non-DP with no EDID
	drm/amd/amdgpu: fix refcount leak
	drm/amdgpu: Fix a use-after-free
	drm/amd/amdgpu: fix a potential deadlock in gpu reset
	drm/amdgpu: stop touching sched.ready in the backend
	platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet
	block: fix a race between del_gendisk and BLKRRPART
	linux/bits.h: fix compilation error with GENMASK
	net: netcp: Fix an error message
	net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count
	interconnect: qcom: bcm-voter: add a missing of_node_put()
	interconnect: qcom: Add missing MODULE_DEVICE_TABLE
	ASoC: cs42l42: Regmap must use_single_read/write
	net: stmmac: Fix MAC WoL not working if PHY does not support WoL
	net: ipa: memory region array is variable size
	vfio-ccw: Check initialized flag in cp_init()
	spi: Assume GPIO CS active high in ACPI case
	net: really orphan skbs tied to closing sk
	net: packetmmap: fix only tx timestamp on request
	net: fec: fix the potential memory leak in fec_enet_init()
	chelsio/chtls: unlock on error in chtls_pt_recvmsg()
	net: mdio: thunder: Fix a double free issue in the .remove function
	net: mdio: octeon: Fix some double free issues
	cxgb4/ch_ktls: Clear resources when pf4 device is removed
	openvswitch: meter: fix race when getting now_ms.
	tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
	net: sched: fix packet stuck problem for lockless qdisc
	net: sched: fix tx action rescheduling issue during deactivation
	net: sched: fix tx action reschedule issue with stopped queue
	net: hso: check for allocation failure in hso_create_bulk_serial_device()
	net: bnx2: Fix error return code in bnx2_init_board()
	bnxt_en: Include new P5 HV definition in VF check.
	bnxt_en: Fix context memory setup for 64K page size.
	mld: fix panic in mld_newpack()
	net/smc: remove device from smcd_dev_list after failed device_add()
	gve: Check TX QPL was actually assigned
	gve: Update mgmt_msix_idx if num_ntfy changes
	gve: Add NULL pointer checks when freeing irqs.
	gve: Upgrade memory barrier in poll routine
	gve: Correct SKB queue index validation.
	iommu/virtio: Add missing MODULE_DEVICE_TABLE
	net: hns3: fix incorrect resp_msg issue
	net: hns3: put off calling register_netdev() until client initialize complete
	iommu/vt-d: Use user privilege for RID2PASID translation
	cxgb4: avoid accessing registers when clearing filters
	staging: emxx_udc: fix loop in _nbu2ss_nuke()
	ASoC: cs35l33: fix an error code in probe()
	bpf, offload: Reorder offload callback 'prepare' in verifier
	bpf: Set mac_len in bpf_skb_change_head
	ixgbe: fix large MTU request from VF
	ASoC: qcom: lpass-cpu: Use optional clk APIs
	scsi: libsas: Use _safe() loop in sas_resume_port()
	net: lantiq: fix memory corruption in RX ring
	ipv6: record frag_max_size in atomic fragments in input path
	ALSA: usb-audio: scarlett2: snd_scarlett_gen2_controls_create() can be static
	net: ethernet: mtk_eth_soc: Fix packet statistics support for MT7628/88
	sch_dsmark: fix a NULL deref in qdisc_reset()
	net: hsr: fix mac_len checks
	MIPS: alchemy: xxs1500: add gpio-au1000.h header file
	MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
	net: zero-initialize tc skb extension on allocation
	net: mvpp2: add buffer header handling in RX
	i915: fix build warning in intel_dp_get_link_status()
	samples/bpf: Consider frame size in tx_only of xdpsock sample
	net: hns3: check the return of skb_checksum_help()
	bpftool: Add sock_release help info for cgroup attach/prog load command
	SUNRPC: More fixes for backlog congestion
	Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference""
	net: hso: bail out on interrupt URB allocation failure
	scripts/clang-tools: switch explicitly to Python 3
	neighbour: Prevent Race condition in neighbour subsytem
	usb: core: reduce power-on-good delay time of root hub
	Linux 5.10.42

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I05d98d1355a080e0951b4b2ae77f0a9ccb6dfc5d
2021-06-03 18:47:38 +02:00
Chinmay Agarwal
5c7b23b796 neighbour: Prevent Race condition in neighbour subsytem
commit eefb45eef5c4c425e87667af8f5e904fbdd47abf upstream.

Following Race Condition was detected:

<CPU A, t0>: Executing: __netif_receive_skb() ->__netif_receive_skb_core()
-> arp_rcv() -> arp_process().arp_process() calls __neigh_lookup() which
takes a reference on neighbour entry 'n'.
Moves further along, arp_process() and calls neigh_update()->
__neigh_update(). Neighbour entry is unlocked just before a call to
neigh_update_gc_list.

This unlocking paves way for another thread that may take a reference on
the same and mark it dead and remove it from gc_list.

<CPU B, t1> - neigh_flush_dev() is under execution and calls
neigh_mark_dead(n) marking the neighbour entry 'n' as dead. Also n will be
removed from gc_list.
Moves further along neigh_flush_dev() and calls
neigh_cleanup_and_release(n), but since reference count increased in t1,
'n' couldn't be destroyed.

<CPU A, t3>- Code hits neigh_update_gc_list, with neighbour entry
set as dead.

<CPU A, t4> - arp_process() finally calls neigh_release(n), destroying
the neighbour entry and we have a destroyed ntry still part of gc_list.

Fixes: eb4e8fac00d1("neighbour: Prevent a dead entry from updating gc_list")
Signed-off-by: Chinmay Agarwal <chinagar@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03 09:00:52 +02:00
Trond Myklebust
899b5131e7 SUNRPC: More fixes for backlog congestion
commit e86be3a04bc4aeaf12f93af35f08f8d4385bcd98 upstream.

Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
as socket based transports.
Ensure we always initialise the request after waking up from the backlog
list.

Fixes: e877a88d1f06 ("SUNRPC in case of backlog, hand free slots directly to waiting task")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03 09:00:51 +02:00
Vlad Buslov
ac493452e9 net: zero-initialize tc skb extension on allocation
[ Upstream commit 9453d45ecb6c2199d72e73c993e9d98677a2801b ]

Function skb_ext_add() doesn't initialize created skb extension with any
value and leaves it up to the user. However, since extension of type
TC_SKB_EXT originally contained only single value tc_skb_ext->chain its
users used to just assign the chain value without setting whole extension
memory to zero first. This assumption changed when TC_SKB_EXT extension was
extended with additional fields but not all users were updated to
initialize the new fields which leads to use of uninitialized memory
afterwards. UBSAN log:

[  778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28
[  778.301495] load of value 107 is not a valid value for type '_Bool'
[  778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2
[  778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  778.307901] Call Trace:
[  778.308680]  <IRQ>
[  778.309358]  dump_stack+0xbb/0x107
[  778.310307]  ubsan_epilogue+0x5/0x40
[  778.311167]  __ubsan_handle_load_invalid_value.cold+0x43/0x48
[  778.312454]  ? memset+0x20/0x40
[  778.313230]  ovs_flow_key_extract.cold+0xf/0x14 [openvswitch]
[  778.314532]  ovs_vport_receive+0x19e/0x2e0 [openvswitch]
[  778.315749]  ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch]
[  778.317188]  ? create_prof_cpu_mask+0x20/0x20
[  778.318220]  ? arch_stack_walk+0x82/0xf0
[  778.319153]  ? secondary_startup_64_no_verify+0xb0/0xbb
[  778.320399]  ? stack_trace_save+0x91/0xc0
[  778.321362]  ? stack_trace_consume_entry+0x160/0x160
[  778.322517]  ? lock_release+0x52e/0x760
[  778.323444]  netdev_frame_hook+0x323/0x610 [openvswitch]
[  778.324668]  ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch]
[  778.325950]  __netif_receive_skb_core+0x771/0x2db0
[  778.327067]  ? lock_downgrade+0x6e0/0x6f0
[  778.328021]  ? lock_acquire+0x565/0x720
[  778.328940]  ? generic_xdp_tx+0x4f0/0x4f0
[  778.329902]  ? inet_gro_receive+0x2a7/0x10a0
[  778.330914]  ? lock_downgrade+0x6f0/0x6f0
[  778.331867]  ? udp4_gro_receive+0x4c4/0x13e0
[  778.332876]  ? lock_release+0x52e/0x760
[  778.333808]  ? dev_gro_receive+0xcc8/0x2380
[  778.334810]  ? lock_downgrade+0x6f0/0x6f0
[  778.335769]  __netif_receive_skb_list_core+0x295/0x820
[  778.336955]  ? process_backlog+0x780/0x780
[  778.337941]  ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core]
[  778.339613]  ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0
[  778.341033]  ? kvm_clock_get_cycles+0x14/0x20
[  778.342072]  netif_receive_skb_list_internal+0x5f5/0xcb0
[  778.343288]  ? __kasan_kmalloc+0x7a/0x90
[  778.344234]  ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core]
[  778.345676]  ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core]
[  778.347140]  ? __netif_receive_skb_list_core+0x820/0x820
[  778.348351]  ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core]
[  778.349688]  ? napi_gro_flush+0x26c/0x3c0
[  778.350641]  napi_complete_done+0x188/0x6b0
[  778.351627]  mlx5e_napi_poll+0x373/0x1b80 [mlx5_core]
[  778.352853]  __napi_poll+0x9f/0x510
[  778.353704]  ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core]
[  778.355158]  net_rx_action+0x34c/0xa40
[  778.356060]  ? napi_threaded_poll+0x3d0/0x3d0
[  778.357083]  ? sched_clock_cpu+0x18/0x190
[  778.358041]  ? __common_interrupt+0x8e/0x1a0
[  778.359045]  __do_softirq+0x1ce/0x984
[  778.359938]  __irq_exit_rcu+0x137/0x1d0
[  778.360865]  irq_exit_rcu+0xa/0x20
[  778.361708]  common_interrupt+0x80/0xa0
[  778.362640]  </IRQ>
[  778.363212]  asm_common_interrupt+0x1e/0x40
[  778.364204] RIP: 0010:native_safe_halt+0xe/0x10
[  778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00
[  778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246
[  778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468
[  778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e
[  778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb
[  778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000
[  778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000
[  778.378491]  ? rcu_eqs_enter.constprop.0+0xb8/0xe0
[  778.379606]  ? default_idle_call+0x5e/0xe0
[  778.380578]  default_idle+0xa/0x10
[  778.381406]  default_idle_call+0x96/0xe0
[  778.382350]  do_idle+0x3d4/0x550
[  778.383153]  ? arch_cpu_idle_exit+0x40/0x40
[  778.384143]  cpu_startup_entry+0x19/0x20
[  778.385078]  start_kernel+0x3c7/0x3e5
[  778.385978]  secondary_startup_64_no_verify+0xb0/0xbb

Fix the issue by providing new function tc_skb_ext_alloc() that allocates
tc skb extension and initializes its memory to 0 before returning it to the
caller. Change all existing users to use new API instead of calling
skb_ext_add() directly.

Fixes: 038ebb1a71 ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:51 +02:00
George McCollister
f6442ee08f net: hsr: fix mac_len checks
[ Upstream commit 48b491a5cc74333c4a6a82fe21cea42c055a3b0b ]

Commit 2e9f60932a2c ("net: hsr: check skb can contain struct hsr_ethhdr
in fill_frame_info") added the following which resulted in -EINVAL
always being returned:
	if (skb->mac_len < sizeof(struct hsr_ethhdr))
		return -EINVAL;

mac_len was not being set correctly so this check completely broke
HSR/PRP since it was always 14, not 20.

Set mac_len correctly and modify the mac_len checks to test in the
correct places since sometimes it is legitimately 14.

Fixes: 2e9f60932a2c ("net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:50 +02:00
Taehee Yoo
a6a0af3c90 sch_dsmark: fix a NULL deref in qdisc_reset()
[ Upstream commit 9b76eade16423ef06829cccfe3e100cfce31afcd ]

If Qdisc_ops->init() is failed, Qdisc_ops->reset() would be called.
When dsmark_init(Qdisc_ops->init()) is failed, it possibly doesn't
initialize dsmark_qdisc_data->q. But dsmark_reset(Qdisc_ops->reset())
uses dsmark_qdisc_data->q pointer wihtout any null checking.
So, panic would occur.

Test commands:
    sysctl net.core.default_qdisc=dsmark -w
    ip link add dummy0 type dummy
    ip link add vw0 link dummy0 type virt_wifi
    ip link set vw0 up

Splat looks like:
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 3 PID: 684 Comm: ip Not tainted 5.12.0+ #910
RIP: 0010:qdisc_reset+0x2b/0x680
Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54
55 48 89 fd 48 83 c7 18 53 48 89 fa 48 c1 ea 03 48 83 ec 20 <80> 3c 02
00 0f 85 09 06 00 00 4c 8b 65 18 0f 1f 44 00 00 65 8b 1d
RSP: 0018:ffff88800fda6bf8 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: ffff8880050ed800 RCX: 0000000000000000
RDX: 0000000000000003 RSI: ffffffff99e34100 RDI: 0000000000000018
RBP: 0000000000000000 R08: fffffbfff346b553 R09: fffffbfff346b553
R10: 0000000000000001 R11: fffffbfff346b552 R12: ffffffffc0824940
R13: ffff888109e83800 R14: 00000000ffffffff R15: ffffffffc08249e0
FS:  00007f5042287680(0000) GS:ffff888119800000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ae1f4dbd90 CR3: 0000000006760002 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rcu_read_lock_bh_held+0xa0/0xa0
 dsmark_reset+0x3d/0xf0 [sch_dsmark]
 qdisc_reset+0xa9/0x680
 qdisc_destroy+0x84/0x370
 qdisc_create_dflt+0x1fe/0x380
 attach_one_default_qdisc.constprop.41+0xa4/0x180
 dev_activate+0x4d5/0x8c0
 ? __dev_open+0x268/0x390
 __dev_open+0x270/0x390

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:50 +02:00
Francesco Ruggeri
5a2e1ce7ab ipv6: record frag_max_size in atomic fragments in input path
[ Upstream commit e29f011e8fc04b2cdc742a2b9bbfa1b62518381a ]

Commit dbd1759e6a ("ipv6: on reassembly, record frag_max_size")
filled the frag_max_size field in IP6CB in the input path.
The field should also be filled in case of atomic fragments.

Fixes: dbd1759e6a ('ipv6: on reassembly, record frag_max_size')
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:50 +02:00
Jussi Maki
9948170c8e bpf: Set mac_len in bpf_skb_change_head
[ Upstream commit 84316ca4e100d8cbfccd9f774e23817cb2059868 ]

The skb_change_head() helper did not set "skb->mac_len", which is
problematic when it's used in combination with skb_redirect_peer().
Without it, redirecting a packet from a L3 device such as wireguard to
the veth peer device will cause skb->data to point to the middle of the
IP header on entry to tcp_v4_rcv() since the L2 header is not pulled
correctly due to mac_len=0.

Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210519154743.2554771-2-joamaki@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:49 +02:00
Julian Wiedmann
8b2cdc004d net/smc: remove device from smcd_dev_list after failed device_add()
[ Upstream commit 444d7be9532dcfda8e0385226c862fd7e986f607 ]

If the device_add() for a smcd_dev fails, there's no cleanup step that
rolls back the earlier list_add(). The device subsequently gets freed,
and we end up with a corrupted list.

Add some error handling that removes the device from the list.

Fixes: c6ba7c9ba4 ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:48 +02:00
Taehee Yoo
beb39adb15 mld: fix panic in mld_newpack()
[ Upstream commit 020ef930b826d21c5446fdc9db80fd72a791bc21 ]

mld_newpack() doesn't allow to allocate high order page,
only order-0 allocation is allowed.
If headroom size is too large, a kernel panic could occur in skb_put().

Test commands:
    ip netns del A
    ip netns del B
    ip netns add A
    ip netns add B
    ip link add veth0 type veth peer name veth1
    ip link set veth0 netns A
    ip link set veth1 netns B

    ip netns exec A ip link set lo up
    ip netns exec A ip link set veth0 up
    ip netns exec A ip -6 a a 2001:db8:0::1/64 dev veth0
    ip netns exec B ip link set lo up
    ip netns exec B ip link set veth1 up
    ip netns exec B ip -6 a a 2001:db8:0::2/64 dev veth1
    for i in {1..99}
    do
        let A=$i-1
        ip netns exec A ip link add ip6gre$i type ip6gre \
	local 2001:db8:$A::1 remote 2001:db8:$A::2 encaplimit 100
        ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev ip6gre$i
        ip netns exec A ip link set ip6gre$i up

        ip netns exec B ip link add ip6gre$i type ip6gre \
	local 2001:db8:$A::2 remote 2001:db8:$A::1 encaplimit 100
        ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev ip6gre$i
        ip netns exec B ip link set ip6gre$i up
    done

Splat looks like:
kernel BUG at net/core/skbuff.c:110!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.12.0+ #891
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:skb_panic+0x15d/0x15f
Code: 92 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 00 ae 79 83
41 57 41 56 41 55 48 8b 54 24 a6 26 f9 ff <0f> 0b 48 8b 6c 24 20 89
34 24 e8 4a 4e 92 fe 8b 34 24 48 c7 c1 20
RSP: 0018:ffff88810091f820 EFLAGS: 00010282
RAX: 0000000000000089 RBX: ffff8881086e9000 RCX: 0000000000000000
RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed1020123efb
RBP: ffff888005f6eac0 R08: ffffed1022fc0031 R09: ffffed1022fc0031
R10: ffff888117e00187 R11: ffffed1022fc0030 R12: 0000000000000028
R13: ffff888008284eb0 R14: 0000000000000ed8 R15: 0000000000000ec0
FS:  0000000000000000(0000) GS:ffff888117c00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b801c5640 CR3: 0000000033c2c006 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 skb_put.cold.104+0x22/0x22
 ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 ? rcu_read_lock_sched_held+0x91/0xc0
 mld_newpack+0x398/0x8f0
 ? ip6_mc_hdr.isra.26.constprop.46+0x600/0x600
 ? lock_contended+0xc40/0xc40
 add_grhead.isra.33+0x280/0x380
 add_grec+0x5ca/0xff0
 ? mld_sendpack+0xf40/0xf40
 ? lock_downgrade+0x690/0x690
 mld_send_initial_cr.part.34+0xb9/0x180
 ipv6_mc_dad_complete+0x15d/0x1b0
 addrconf_dad_completed+0x8d2/0xbb0
 ? lock_downgrade+0x690/0x690
 ? addrconf_rs_timer+0x660/0x660
 ? addrconf_dad_work+0x73c/0x10e0
 addrconf_dad_work+0x73c/0x10e0

Allowing high order page allocation could fix this problem.

Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:48 +02:00
Yunsheng Lin
f9fc21e2b1 net: sched: fix tx action reschedule issue with stopped queue
[ Upstream commit dcad9ee9e0663d74a89b25b987f9c7be86432812 ]

The netdev qeueue might be stopped when byte queue limit has
reached or tx hw ring is full, net_tx_action() may still be
rescheduled if STATE_MISSED is set, which consumes unnecessary
cpu without dequeuing and transmiting any skb because the
netdev queue is stopped, see qdisc_run_end().

This patch fixes it by checking the netdev queue state before
calling qdisc_run() and clearing STATE_MISSED if netdev queue is
stopped during qdisc_run(), the net_tx_action() is rescheduled
again when netdev qeueue is restarted, see netif_tx_wake_queue().

As there is time window between netif_xmit_frozen_or_stopped()
checking and STATE_MISSED clearing, between which STATE_MISSED
may set by net_tx_action() scheduled by netif_tx_wake_queue(),
so set the STATE_MISSED again if netdev queue is restarted.

Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:47 +02:00
Yunsheng Lin
2f23d5bcd9 net: sched: fix tx action rescheduling issue during deactivation
[ Upstream commit 102b55ee92f9fda4dde7a45d2b20538e6e3e3d1e ]

Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
qdisc before calling __qdisc_run(), which ultimately clear the
STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
is set before clearing STATE_MISSED, there may be rescheduling
of net_tx_action() at the end of qdisc_run_end(), see below:

CPU0(net_tx_atcion)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
          .                   .                     .
          .            set STATE_MISSED             .
          .           __netif_schedule()            .
          .                   .           set STATE_DEACTIVATED
          .                   .                qdisc_reset()
          .                   .                     .
          .<---------------   .              synchronize_net()
clear __QDISC_STATE_SCHED  |  .                     .
          .                |  .                     .
          .                |  .            some_qdisc_is_busy()
          .                |  .               return *false*
          .                |  .                     .
  test STATE_DEACTIVATED   |  .                     .
__qdisc_run() *not* called |  .                     .
          .                |  .                     .
   test STATE_MISS         |  .                     .
 __netif_schedule()--------|  .                     .
          .                   .                     .
          .                   .                     .

__qdisc_run() is not called by net_tx_atcion() in CPU0 because
CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
is called at the end of qdisc_run_end(), causing tx action
rescheduling problem.

qdisc_run() called by net_tx_action() runs in the softirq context,
which should has the same semantic as the qdisc_run() called by
__dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
synchronize_net() between STATE_DEACTIVATED flag being set and
qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
bail out for the deactived lockless qdisc in net_tx_action(), and
qdisc_reset() will reset all skb not dequeued yet.

So add the rcu_read_lock() explicitly to protect the qdisc_run()
and do the STATE_DEACTIVATED checking in net_tx_action() before
calling qdisc_run_begin(). Another option is to do the checking in
the qdisc_run_end(), but it will add unnecessary overhead for
non-tx_action case, because __dev_queue_xmit() will not see qdisc
with STATE_DEACTIVATED after synchronize_net(), the qdisc with
STATE_DEACTIVATED can only be seen by net_tx_action() because of
__netif_schedule().

The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
between net_tx_action() and qdisc_reset(), see:
commit d518d2ed86 ("net/sched: fix race between deactivation
and dequeue for NOLOCK qdisc"). As the bailout added above for
deactived lockless qdisc in net_tx_action() provides better
protection for the race without calling qdisc_run() at all, so
remove the STATE_DEACTIVATED checking in qdisc_run().

After qdisc_reset(), there is no skb in qdisc to be dequeued, so
clear the STATE_MISSED in dev_reset_queue() too.

Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has
    avoid the endless rescheduling problem, but there may still
    be a unnecessary rescheduling, so adjust the commit log.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:47 +02:00
Yunsheng Lin
21c7151092 net: sched: fix packet stuck problem for lockless qdisc
[ Upstream commit a90c57f2cedd52a511f739fb55e6244e22e1a2fb ]

Lockless qdisc has below concurrent problem:
    cpu0                 cpu1
     .                     .
q->enqueue                 .
     .                     .
qdisc_run_begin()          .
     .                     .
dequeue_skb()              .
     .                     .
sch_direct_xmit()          .
     .                     .
     .                q->enqueue
     .             qdisc_run_begin()
     .            return and do nothing
     .                     .
qdisc_run_end()            .

cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().

Lockless qdisc has below another concurrent problem when
tx_action is involved:

cpu0(serving tx_action)     cpu1             cpu2
          .                   .                .
          .              q->enqueue            .
          .            qdisc_run_begin()       .
          .              dequeue_skb()         .
          .                   .            q->enqueue
          .                   .                .
          .             sch_direct_xmit()      .
          .                   .         qdisc_run_begin()
          .                   .       return and do nothing
          .                   .                .
 clear __QDISC_STATE_SCHED    .                .
 qdisc_run_begin()            .                .
 return and do nothing        .                .
          .                   .                .
          .            qdisc_run_end()         .

This patch fixes the above data race by:
1. If the first spin_trylock() return false and STATE_MISSED is
   not set, set STATE_MISSED and retry another spin_trylock() in
   case other CPU may not see STATE_MISSED after it releases the
   lock.
2. reschedule if STATE_MISSED is set after the lock is released
   at the end of qdisc_run_end().

For tx_action case, STATE_MISSED is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled again
to dequeue the skb enqueued by cpu2.

Clear STATE_MISSED before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the second
spin_trylock() and __netif_schedule() calling.

Also clear the STATE_MISSED before calling __netif_schedule()
at the end of qdisc_run_end() to avoid doing another round of
dequeuing in the pfifo_fast_dequeue().

The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:

 threads  without+this_patch   with+this_patch      delta
    1        2.61Mpps            2.60Mpps           -0.3%
    2        3.97Mpps            3.82Mpps           -3.7%
    4        5.62Mpps            5.59Mpps           -0.5%
    8        2.78Mpps            2.77Mpps           -0.3%
   16        2.22Mpps            2.22Mpps           -0.0%

Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:47 +02:00
Jim Ma
60e2193a60 tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
[ Upstream commit 974271e5ed45cfe4daddbeb16224a2156918530e ]

In tls_sw_splice_read, checkout MSG_* is inappropriate, should use
SPLICE_*, update tls_wait_data to accept nonblock arguments instead
of flags for recvmsg and splice.

Fixes: c46234ebb4 ("tls: RX path for ktls")
Signed-off-by: Jim Ma <majinjing3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:47 +02:00
Tao Liu
886dd7f3e9 openvswitch: meter: fix race when getting now_ms.
[ Upstream commit e4df1b0c24350a0f00229ff895a91f1072bd850d ]

We have observed meters working unexpected if traffic is 3+Gbit/s
with multiple connections.

now_ms is not pretected by meter->lock, we may get a negative
long_delta_ms when another cpu updated meter->used, then:
    delta_ms = (u32)long_delta_ms;
which will be a large value.

    band->bucket += delta_ms * band->rate;
then we get a wrong band->bucket.

OpenVswitch userspace datapath has fixed the same issue[1] some
time ago, and we port the implementation to kernel datapath.

[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20191025114436.9746-1-i.maximets@ovn.org/

Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:47 +02:00
Richard Sanger
9c386011fa net: packetmmap: fix only tx timestamp on request
[ Upstream commit 171c3b151118a2fe0fc1e2a9d1b5a1570cfe82d2 ]

The packetmmap tx ring should only return timestamps if requested via
setsockopt PACKET_TIMESTAMP, as documented. This allows compatibility
with non-timestamp aware user-space code which checks
tp_status == TP_STATUS_AVAILABLE; not expecting additional timestamp
flags to be set in tp_status.

Fixes: b9c32fb271 ("packet: if hw/sw ts enabled in rx/tx ring, report which ts we got")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Richard Sanger <rsanger@wand.net.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:46 +02:00
Paolo Abeni
1f1b431a4f net: really orphan skbs tied to closing sk
[ Upstream commit 098116e7e640ba677d9e345cbee83d253c13d556 ]

If the owing socket is shutting down - e.g. the sock reference
count already dropped to 0 and only sk_wmem_alloc is keeping
the sock alive, skb_orphan_partial() becomes a no-op.

When forwarding packets over veth with GRO enabled, the above
causes refcount errors.

This change addresses the issue with a plain skb_orphan() call
in the critical scenario.

Fixes: 9adc89af724f ("net: let skb_orphan_partial wake-up waiters.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:46 +02:00
Vladimir Oltean
d7932e6852 net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count
[ Upstream commit b94cbc909f1d80378a1f541968309e5c1178c98b ]

DSA implements a bunch of 'standardized' ethtool statistics counters,
namely tx_packets, tx_bytes, rx_packets, rx_bytes. So whatever the
hardware driver returns in .get_sset_count(), we need to add 4 to that.

That is ok, except that .get_sset_count() can return a negative error
code, for example:

b53_get_sset_count
-> phy_ethtool_get_sset_count
   -> return -EIO

-EIO is -5, and with 4 added to it, it becomes -1, aka -EPERM. One can
imagine that certain error codes may even become positive, although
based on code inspection I did not see instances of that.

Check the error code first, if it is negative return it as-is.

Based on a similar patch for dsa_master_get_strings from Dan Carpenter:
https://patchwork.kernel.org/project/netdevbpf/patch/YJaSe3RPgn7gKxZv@mwanda/

Fixes: 91da11f870 ("net: Distributed Switch Architecture protocol support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:45 +02:00