android_kernel_xiaomi_sm8450/net
Maciej Żenczykowski ea53c24cb0 FROMGIT: bpf: Do not change gso_size during bpf_skb_change_proto()
This is technically a backwards incompatible change in behaviour, but I'm
going to argue that it is very unlikely to break things, and likely to fix
*far* more then it breaks.

In no particular order, various reasons follow:

(a) I've long had a bug assigned to myself to debug a super rare kernel crash
on Android Pixel phones which can (per stacktrace) be traced back to BPF clat
IPv6 to IPv4 protocol conversion causing some sort of ugly failure much later
on during transmit deep in the GSO engine, AFAICT precisely because of this
change to gso_size, though I've never been able to manually reproduce it. I
believe it may be related to the particular network offload support of attached
USB ethernet dongle being used for tethering off of an IPv6-only cellular
connection. The reason might be we end up with more segments than max permitted,
or with a GSO packet with only one segment... (either way we break some
assumption and hit a BUG_ON)

(b) There is no check that the gso_size is > 20 when reducing it by 20, so we
might end up with a negative (or underflowing) gso_size or a gso_size of 0.
This can't possibly be good. Indeed this is probably somehow exploitable (or
at least can result in a kernel crash) by delivering crafted packets and perhaps
triggering an infinite loop or a divide by zero... As a reminder: gso_size (MSS)
is related to MTU, but not directly derived from it: gso_size/MSS may be
significantly smaller then one would get by deriving from local MTU. And on
some NICs (which do loose MTU checking on receive, it may even potentially be
larger, for example my work pc with 1500 MTU can receive 1520 byte frames [and
sometimes does due to bugs in a vendor plat46 implementation]). Indeed even just
going from 21 to 1 is potentially problematic because it increases the number
of segments by a factor of 21 (think DoS, or some other crash due to too many
segments).

(c) It's always safe to not increase the gso_size, because it doesn't result in
the max packet size increasing.  So the skb_increase_gso_size() call was always
unnecessary for correctness (and outright undesirable, see later). As such the
only part which is potentially dangerous (ie. could cause backwards compatibility
issues) is the removal of the skb_decrease_gso_size() call.

(d) If the packets are ultimately destined to the local device, then there is
absolutely no benefit to playing around with gso_size. It only matters if the
packets will egress the device. ie. we're either forwarding, or transmitting
from the device.

(e) This logic only triggers for packets which are GSO. It does not trigger for
skbs which are not GSO. It will not convert a non-GSO MTU sized packet into a
GSO packet (and you don't even know what the MTU is, so you can't even fix it).
As such your transmit path must *already* be able to handle an MTU 20 bytes
larger then your receive path (for IPv4 to IPv6 translation) - and indeed 28
bytes larger due to IPv4 fragments. Thus removing the skb_decrease_gso_size()
call doesn't actually increase the size of the packets your transmit side must
be able to handle. ie. to handle non-GSO max-MTU packets, the IPv4/IPv6 device/
route MTUs must already be set correctly. Since for example with an IPv4 egress
MTU of 1500, IPv4 to IPv6 translation will already build 1520 byte IPv6 frames,
so you need a 1520 byte device MTU. This means if your IPv6 device's egress
MTU is 1280, your IPv4 route must be 1260 (and actually 1252, because of the
need to handle fragments). This is to handle normal non-GSO packets. Thus the
reduction is simply not needed for GSO packets, because when they're correctly
built, they will already be the right size.

(f) TSO/GSO should be able to exactly undo GRO: the number of packets (TCP
segments) should not be modified, so that TCP's MSS counting works correctly
(this matters for congestion control). If protocol conversion changes the
gso_size, then the number of TCP segments may increase or decrease. Packet loss
after protocol conversion can result in partial loss of MSS segments that the
sender sent. How's the sending TCP stack going to react to receiving ACKs/SACKs
in the middle of the segments it sent?

(g) skb_{decrease,increase}_gso_size() are already no-ops for GSO_BY_FRAGS
case (besides triggering WARN_ON_ONCE). This means you already cannot guarantee
that gso_size (and thus resulting packet MTU) is changed. ie. you must assume
it won't be changed.

(h) changing gso_size is outright buggy for UDP GSO packets, where framing
matters (I believe that's also the case for SCTP, but it's already excluded
by [g]).  So the only remaining case is TCP, which also doesn't want it
(see [f]).

(i) see also the reasoning on the previous attempt at fixing this
(commit fa7b83bf3b156c767f3e4a25bbf3817b08f3ff8e) which shows that the current
behaviour causes TCP packet loss:

  In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
  coalesced packet payload can be > MSS, but < MSS + 20.

  bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload
  length. After then tcp_gso_segment checks for the payload length if it
  is <= MSS. The condition is causing the packet to be dropped.

  tcp_gso_segment():
    [...]
    mss = skb_shinfo(skb)->gso_size;
    if (unlikely(skb->len <= mss)) goto out;
    [...]

Thus changing the gso_size is simply a very bad idea. Increasing is unnecessary
and buggy, and decreasing can go negative.

Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dongseok Yi <dseok.yi@samsung.com>
Cc: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/CANP3RGfjLikQ6dg=YpBU0OeHvyv7JOki7CyOUS9modaXAi-9vQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210617000953.2787453-2-zenczykowski@gmail.com

(cherry picked from commit 364745fbe981a4370f50274475da4675661104df https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=364745fbe981a4370f50274475da4675661104df )
Test: builds, TreeHugger
Bug: 188690383
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I0ef3174cbd3caaa42d5779334a9c0bfdc9ab81f5
2021-06-28 21:04:04 +00:00
..
6lowpan
9p net: 9p: advance iov on empty read 2021-04-07 15:00:08 +02:00
802
8021q net: vlan: avoid leaks on register_vlan_dev() failures 2021-01-17 14:16:55 +01:00
appletalk appletalk: Fix skb allocation size in loopback case 2021-04-07 15:00:08 +02:00
atm net: atm: fix update of position index in lec_seq_next 2020-10-31 12:26:30 -07:00
ax25
batman-adv batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field 2021-04-14 08:41:59 +02:00
bluetooth Bluetooth: use correct lock to prevent UAF of hdev object 2021-06-10 13:39:23 +02:00
bpf bpf: Reject too big ctx_size_in for raw_tp test run 2021-01-27 11:55:07 +01:00
bpfilter Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" 2020-10-15 12:33:24 -07:00
bridge bridge: Fix possible races between assigning rx_handler_data and setting IFF_BRIDGE_PORT bit 2021-05-22 11:40:54 +02:00
caif net: caif: fix memory leak in cfusbl_device_notify 2021-06-10 13:39:25 +02:00
can can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE 2021-04-14 08:42:07 +02:00
ceph libceph: clear con->out_msg on Policy::stateful_server faults 2020-10-12 15:29:27 +02:00
core FROMGIT: bpf: Do not change gso_size during bpf_skb_change_proto() 2021-06-28 21:04:04 +00:00
dcb net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands 2021-01-23 16:04:01 +01:00
dccp ipv6: weaken the v4mapped source check 2021-03-30 14:32:01 +02:00
decnet
dns_resolver
dsa net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs 2021-06-10 13:39:17 +02:00
ethernet
ethtool ethtool: fix missing NLM_F_MULTI flag when dumping 2021-05-19 10:13:08 +02:00
hsr net: hsr: fix mac_len checks 2021-06-03 09:00:50 +02:00
ieee802154 ieee802154: fix error return code in ieee802154_llsec_getparams() 2021-06-10 13:39:19 +02:00
ife
ipv4 Merge 5.10.37 into android12-5.10 2021-05-15 09:28:55 +02:00
ipv6 Merge 5.10.43 into android12-5.10 2021-06-12 14:48:14 +02:00
iucv net/af_iucv: remove WARN_ONCE on malformed RX packets 2021-03-07 12:34:05 +01:00
kcm
key af_key: relax availability checks for skb size calculation 2021-02-13 13:55:02 +01:00
l2tp net: l2tp: reduce log level of messages in receive path, add counter instead 2021-03-17 17:06:11 +01:00
l3mdev
lapb net: lapb: Copy the skb before sending a packet 2021-02-10 09:29:14 +01:00
llc
mac80211 mac80211: extend protection against mixed key and fragment cache attacks 2021-06-03 09:00:29 +02:00
mac802154 net: mac802154: Fix general protection fault 2021-04-14 08:42:13 +02:00
mpls net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 2021-03-17 17:06:11 +01:00
mptcp mptcp: always parse mptcp options for MPC reqsk 2021-06-10 13:39:16 +02:00
ncsi net/ncsi: Avoid channel_monitor hrtimer deadlock 2021-04-14 08:42:08 +02:00
netfilter ANDROID: minor fixups of xt_IDLETIMER support 2021-06-17 17:44:38 +00:00
netlabel cipso,calipso: resolve a number of problems with the DOI refcounts 2021-03-17 17:06:15 +01:00
netlink
netrom
nfc nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect 2021-06-10 13:39:27 +02:00
nsh
openvswitch openvswitch: meter: fix race when getting now_ms. 2021-06-03 09:00:47 +02:00
packet net: packetmmap: fix only tx timestamp on request 2021-06-03 09:00:46 +02:00
phonet
psample net: psample: Fix netlink skb length with tunnel info 2021-03-07 12:34:07 +01:00
qrtr net: qrtr: Avoid potential use after free in MHI send 2021-05-07 11:04:31 +02:00
rds net/rds: Fix a use after free in rds_message_map_pages 2021-04-14 08:42:09 +02:00
rfkill rfkill: Fix use-after-free in rfkill_resume() 2020-11-12 09:18:06 +01:00
rose rose: Fix Null pointer dereference in rose_send_frame() 2020-11-20 10:04:58 -08:00
rxrpc rxrpc: Fix clearance of Tx/Rx ring when releasing a call 2021-02-17 11:02:28 +01:00
sched net/sched: act_ct: Fix ct template allocation for zone 0 2021-06-10 13:39:16 +02:00
sctp Merge 5.10.38 into android12-5.10 2021-05-20 15:35:25 +02:00
smc net/smc: remove device from smcd_dev_list after failed device_add() 2021-06-03 09:00:48 +02:00
strparser
sunrpc SUNRPC: More fixes for backlog congestion 2021-06-03 09:00:51 +02:00
switchdev net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP 2021-02-07 15:37:12 +01:00
tipc tipc: fix unique bearer names sanity check 2021-06-10 13:39:22 +02:00
tls net/tls: Fix use-after-free after the TLS device goes down and up 2021-06-10 13:39:18 +02:00
unix networking changes for the 5.10 merge window 2020-10-15 18:42:13 -07:00
vmw_vsock Merge 5.10.37 into android12-5.10 2021-05-15 09:28:55 +02:00
wimax
wireless Merge 5.10.42 into android12-5.10 2021-06-03 18:47:38 +02:00
x25 net/x25: prevent a couple of overflows 2020-12-02 17:26:36 -08:00
xdp xsk: Fix for xp_aligned_validate_desc() when len == chunk_size 2021-05-19 10:13:06 +02:00
xfrm Revert "xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume" 2021-05-13 20:27:08 +00:00
compat.c Revert "Revert "iov_iter: transparently handle compat iovecs in import_iovec"" 2020-11-02 09:27:36 +01:00
devres.c
Kconfig
Makefile
OWNERS ANDROID: Add OWNERS files referring to the respective android-mainline OWNERS 2021-04-03 14:11:30 +00:00
socket.c Merge 9ff9b0d392 ("Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") into android-mainline 2020-10-26 09:20:21 +01:00
sysctl_net.c