android_kernel_samsung_sm8650/net/ipv4
John Fastabend ab90b68f65 bpf, sockmap: TCP data stall on recv before accept
[ Upstream commit ea444185a6bf7da4dd0df1598ee953e4f7174858 ]

A common mechanism to put a TCP socket into the sockmap is to hook the
BPF_SOCK_OPS_{ACTIVE_PASSIVE}_ESTABLISHED_CB event with a BPF program
that can map the socket info to the correct BPF verdict parser. When
the user adds the socket to the map the psock is created and the new
ops are assigned to ensure the verdict program will 'see' the sk_buffs
as they arrive.

Part of this process hooks the sk_data_ready op with a BPF specific
handler to wake up the BPF verdict program when data is ready to read.
The logic is simple enough (posted here for easy reading)

 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
	struct socket *sock = sk->sk_socket;

	if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
		return;
	sock->ops->read_skb(sk, sk_psock_verdict_recv);
 }

The oversight here is sk->sk_socket is not assigned until the application
accepts() the new socket. However, its entirely ok for the peer application
to do a connect() followed immediately by sends. The socket on the receiver
is sitting on the backlog queue of the listening socket until its accepted
and the data is queued up. If the peer never accepts the socket or is slow
it will eventually hit data limits and rate limit the session. But,
important for BPF sockmap hooks when this data is received TCP stack does
the sk_data_ready() call but the read_skb() for this data is never called
because sk_socket is missing. The data sits on the sk_receive_queue.

Then once the socket is accepted if we never receive more data from the
peer there will be no further sk_data_ready calls and all the data
is still on the sk_receive_queue(). Then user calls recvmsg after accept()
and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler.
The handler checks for data in the sk_msg ingress queue expecting that
the BPF program has already run from the sk_data_ready hook and enqueued
the data as needed. So we are stuck.

To fix do an unlikely check in recvmsg handler for data on the
sk_receive_queue and if it exists wake up data_ready. We have the sock
locked in both read_skb and recvmsg so should avoid having multiple
runners.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-7-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05 09:26:19 +02:00
..
bpfilter
netfilter netfilter: tproxy: fix deadlock due to missing BH disable 2023-03-17 08:50:25 +01:00
af_inet.c tcp: add annotations around sk->sk_shutdown accesses 2023-05-24 17:32:32 +01:00
ah4.c xfrm: ah: add extack to ah_init_state, ah6_init_state 2022-09-29 07:17:59 +02:00
arp.c ipv4: move from strlcpy with unused retval to strscpy 2022-08-22 17:59:37 -07:00
bpf_tcp_ca.c bpf: Use 0 instead of NOT_INIT for btf_struct_access() writes 2022-09-10 17:27:32 -07:00
cipso_ipv4.c cipso: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
datagram.c Networking fixes for 6.1-rc2, including fixes from netfilter 2022-10-20 17:24:59 -07:00
devinet.c net: Fix data-races around sysctl_devconf_inherit_init_net. 2022-08-24 13:46:58 +01:00
esp4_offload.c xfrm: replay: Fix ESN wrap around for GSO 2022-10-19 09:00:53 +02:00
esp4.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2022-10-03 07:52:13 +01:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-22 13:33:50 +01:00
fib_lookup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_notifier.c net: ipv4: remove superfluous header files from fib_notifier.c 2021-09-28 17:32:56 -07:00
fib_rules.c ipv4: remove unnecessary type castings 2022-04-30 15:12:58 +01:00
fib_semantics.c ipv4: prevent potential spectre v1 gadget in fib_metrics_match() 2023-02-01 08:34:45 +01:00
fib_trie.c ipv4: Fix error return code in fib_table_insert() 2022-11-22 20:18:20 -08:00
fou.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
gre_demux.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
gre_offload.c net: gro: skb_gro_header helper function 2022-08-25 10:33:21 +02:00
icmp.c icmp: guard against too small mtu 2023-04-13 16:55:21 +02:00
igmp.c treewide: use prandom_u32_max() when possible, part 1 2022-10-11 17:42:55 -06:00
inet_connection_sock.c inet: Add IP_LOCAL_PORT_RANGE socket option 2023-06-05 09:26:16 +02:00
inet_diag.c net: inet: Retire port only listening_hash 2022-05-12 16:52:18 -07:00
inet_fragment.c ipv4: remove unnecessary type castings 2022-04-30 15:12:58 +01:00
inet_hashtables.c inet: Add IP_LOCAL_PORT_RANGE socket option 2023-06-05 09:26:16 +02:00
inet_timewait_sock.c tcp: avoid the lookup process failing to get sk in ehash table 2023-02-01 08:34:25 +01:00
inetpeer.c inetpeer: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
ip_forward.c ip: Fix data-races around sysctl_ip_fwd_update_priority. 2022-07-15 11:49:55 +01:00
ip_fragment.c net: ip: Handle delivery_time in ip defrag 2022-03-03 14:38:48 +00:00
ip_gre.c erspan: do not use skb_mac_header() in ndo_start_xmit() 2023-03-30 12:49:09 +02:00
ip_input.c xfrm: fix "disable_policy" on ipv4 early demux 2022-10-12 10:45:34 +02:00
ip_options.c ipv4: drop fragmentation code from ip_options_build() 2022-01-29 17:53:07 +00:00
ip_output.c ipv4: Fix potential uninit variable access bug in __ip_make_skb() 2023-05-11 23:03:26 +09:00
ip_sockglue.c ipv{4,6}/raw: fix output xfrm lookup wrt protocol 2023-06-05 09:26:16 +02:00
ip_tunnel_core.c net: Add helper function to parse netlink msg of ip_tunnel_parm 2022-10-03 07:59:06 +01:00
ip_tunnel.c net: tunnels: annotate lockless accesses to dev->needed_headroom 2023-03-22 13:33:46 +01:00
ip_vti.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c Driver core / kernfs changes for 6.0-rc1 2022-08-04 11:31:20 -07:00
ipip.c net: Add helper function to parse netlink msg of ip_tunnel_parm 2022-10-03 07:59:06 +01:00
ipmr_base.c ipmr: adopt rcu_read_lock() in mr_dump() 2022-06-24 11:34:38 +01:00
ipmr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-09-22 13:02:10 -07:00
Kconfig tcp: configurable source port perturb table size 2022-11-16 13:02:04 +00:00
Makefile
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-02-01 08:34:45 +01:00
netfilter.c netfilter: Use l3mdev flow key when re-routing mangled packets 2022-05-16 13:03:29 +02:00
netlink.c
nexthop.c nh: fix scope used to find saddr when adding non gw nh 2022-10-27 10:17:40 -07:00
ping.c ping: Fix potentail NULL deref for /proc/net/icmp. 2023-04-13 16:55:24 +02:00
proc.c tcp: Don't allocate tcp_death_row outside of struct netns_ipv4. 2022-09-20 10:21:49 -07:00
protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
raw_diag.c raw: Fix NULL deref in raw_get_next(). 2023-04-13 16:55:23 +02:00
raw.c ipv{4,6}/raw: fix output xfrm lookup wrt protocol 2023-06-05 09:26:16 +02:00
route.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
syncookies.c mptcp: remove MPTCP 'ifdef' in TCP SYN cookies 2023-01-07 11:11:44 +01:00
sysctl_net_ipv4.c tcp: restrict net.ipv4.tcp_app_win 2023-04-20 12:35:08 +02:00
tcp_bbr.c bpf: Switch to new kfunc flags infrastructure 2022-07-21 20:59:42 -07:00
tcp_bic.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_bpf.c bpf, sockmap: TCP data stall on recv before accept 2023-06-05 09:26:19 +02:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c tcp: Add tracepoint for tcp_set_ca_state 2022-04-07 20:33:15 -07:00
tcp_cubic.c bpf: Switch to new kfunc flags infrastructure 2022-07-21 20:59:42 -07:00
tcp_dctcp.c bpf: Switch to new kfunc flags infrastructure 2022-07-21 20:59:42 -07:00
tcp_dctcp.h
tcp_diag.c tcp: Access &tcp_hashinfo via net. 2022-09-20 10:21:49 -07:00
tcp_fastopen.c tcp: Make SYN ACK RTO tunable by BPF programs with TFO 2022-08-17 10:19:22 +01:00
tcp_highspeed.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_htcp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_hybla.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_illinois.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_input.c tcp: add annotations around sk->sk_shutdown accesses 2023-05-24 17:32:32 +01:00
tcp_ipv4.c tcp: fix possible sk_priority leak in tcp_v4_send_reset() 2023-05-24 17:32:44 +01:00
tcp_lp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_metrics.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
tcp_minisocks.c tcp: tcp_check_req() can be called from process context 2023-03-11 13:55:29 +01:00
tcp_nv.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_offload.c gro: add support of (hw)gro packets to gro stack 2022-10-03 12:38:34 +01:00
tcp_output.c tcp: tcp_make_synack() can be called from process context 2023-03-22 13:33:43 +01:00
tcp_rate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-28 13:02:01 -07:00
tcp_recovery.c tcp: Fix data-races around sysctl_tcp_recovery. 2022-07-20 10:14:50 +01:00
tcp_scalable.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_timer.c tcp: Make SYN ACK RTO tunable by BPF programs with TFO 2022-08-17 10:19:22 +01:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-24 07:24:43 +01:00
tcp_vegas.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_vegas.h
tcp_veno.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_westwood.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_yeah.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp.c bpf, sockmap: Pass skb ownership through read_skb 2023-06-05 09:26:18 +02:00
tunnel4.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-17 08:50:24 +01:00
udp_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
udp_impl.h net: remove noblock parameter from recvmsg() entities 2022-04-12 15:00:25 +02:00
udp_offload.c gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
udp_tunnel_core.c net/tunnel: wait until all sk_user_data reader finish before releasing the sock 2022-12-31 13:32:27 +01:00
udp_tunnel_nic.c udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister() 2022-02-23 12:35:00 +00:00
udp_tunnel_stub.c
udp.c bpf, sockmap: Pass skb ownership through read_skb 2023-06-05 09:26:18 +02:00
udplite.c udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). 2023-05-30 14:03:20 +01:00
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
xfrm4_protocol.c net: xfrm: unexport __init-annotated xfrm4_protocol_init() 2022-06-08 10:10:13 -07:00
xfrm4_state.c
xfrm4_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00