Kernel for Galaxy S24, rebased on CLO sources (WIP)
Go to file
mfreemon@cloudflare.com 0796c53424 tcp: enforce receive buffer memory limits by allowing the tcp window to shrink
[ Upstream commit b650d953cd391595e536153ce30b4aab385643ac ]

Under certain circumstances, the tcp receive buffer memory limit
set by autotuning (sk_rcvbuf) is increased due to incoming data
packets as a result of the window not closing when it should be.
This can result in the receive buffer growing all the way up to
tcp_rmem[2], even for tcp sessions with a low BDP.

To reproduce:  Connect a TCP session with the receiver doing
nothing and the sender sending small packets (an infinite loop
of socket send() with 4 bytes of payload with a sleep of 1 ms
in between each send()).  This will cause the tcp receive buffer
to grow all the way up to tcp_rmem[2].

As a result, a host can have individual tcp sessions with receive
buffers of size tcp_rmem[2], and the host itself can reach tcp_mem
limits, causing the host to go into tcp memory pressure mode.

The fundamental issue is the relationship between the granularity
of the window scaling factor and the number of byte ACKed back
to the sender.  This problem has previously been identified in
RFC 7323, appendix F [1].

The Linux kernel currently adheres to never shrinking the window.

In addition to the overallocation of memory mentioned above, the
current behavior is functionally incorrect, because once tcp_rmem[2]
is reached when no remediations remain (i.e. tcp collapse fails to
free up any more memory and there are no packets to prune from the
out-of-order queue), the receiver will drop in-window packets
resulting in retransmissions and an eventual timeout of the tcp
session.  A receive buffer full condition should instead result
in a zero window and an indefinite wait.

In practice, this problem is largely hidden for most flows.  It
is not applicable to mice flows.  Elephant flows can send data
fast enough to "overrun" the sk_rcvbuf limit (in a single ACK),
triggering a zero window.

But this problem does show up for other types of flows.  Examples
are websockets and other type of flows that send small amounts of
data spaced apart slightly in time.  In these cases, we directly
encounter the problem described in [1].

RFC 7323, section 2.4 [2], says there are instances when a retracted
window can be offered, and that TCP implementations MUST ensure
that they handle a shrinking window, as specified in RFC 1122,
section 4.2.2.16 [3].  All prior RFCs on the topic of tcp window
management have made clear that sender must accept a shrunk window
from the receiver, including RFC 793 [4] and RFC 1323 [5].

This patch implements the functionality to shrink the tcp window
when necessary to keep the right edge within the memory limit by
autotuning (sk_rcvbuf).  This new functionality is enabled with
the new sysctl: net.ipv4.tcp_shrink_window

Additional information can be found at:
https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/

[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
[4] https://www.rfc-editor.org/rfc/rfc793
[5] https://www.rfc-editor.org/rfc/rfc1323

Signed-off-by: Mike Freemon <mfreemon@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-19 23:08:54 +02:00
arch riscv, bpf: Sign-extend return values 2023-10-19 23:08:53 +02:00
block block: fix use-after-free of q->q_usage_counter 2023-10-10 22:00:37 +02:00
certs certs: Fix build error when PKCS#11 URI contains semicolon 2023-02-09 11:28:11 +01:00
crypto crypto: lrw,xts - Replace strlcpy with strscpy 2023-09-23 11:11:01 +02:00
Documentation tcp: enforce receive buffer memory limits by allowing the tcp window to shrink 2023-10-19 23:08:54 +02:00
drivers pinctrl: renesas: rzn1: Enable missing PINMUX 2023-10-19 23:08:54 +02:00
fs quota: Fix slow quotaoff 2023-10-19 23:08:50 +02:00
include tcp: enforce receive buffer memory limits by allowing the tcp window to shrink 2023-10-19 23:08:54 +02:00
init sched/psi: Select KERNFS as needed 2023-09-13 09:42:28 +02:00
io_uring io_uring/fs: remove sqe->rw_flags checking from LINKAT 2023-10-06 14:57:02 +02:00
ipc ipc: fix memory leak in init_mqueue_fs() 2022-12-31 13:32:01 +01:00
kernel workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask() 2023-10-19 23:08:54 +02:00
lib lib/test_meminit: fix off-by-one error in test_pages() 2023-10-15 18:32:41 +02:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
mm mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list 2023-10-10 22:00:36 +02:00
net tcp: enforce receive buffer memory limits by allowing the tcp window to shrink 2023-10-19 23:08:54 +02:00
rust rust: allocator: Prevent mis-aligned allocation 2023-08-11 12:08:18 +02:00
samples samples/hw_breakpoint: fix building without module unloading 2023-09-23 11:11:09 +02:00
scripts modpost: add missing else to the "of" check 2023-10-10 22:00:41 +02:00
security KEYS: trusted: Remove redundant static calls usage 2023-10-19 23:08:50 +02:00
sound ALSA: hda/realtek - ALC287 merge RTK codec with CS CS35L41 AMP 2023-10-19 23:08:51 +02:00
tools netfilter: nf_tables: Deduplicate nft_register_obj audit logs 2023-10-10 22:00:43 +02:00
usr usr/gen_init_cpio.c: remove unnecessary -1 values from int file 2022-10-03 14:21:44 -07:00
virt kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add() 2023-09-13 09:42:46 +02:00
.clang-format inet: ping: use hlist_nulls rcu iterator during lookup 2022-12-01 12:42:46 +01:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
.mailmap 9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address 2022-12-10 17:10:52 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Remove Michal Marek from Kbuild maintainers 2022-11-16 14:53:00 +09:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS devlink: move code to a dedicated directory 2023-08-30 16:11:00 +02:00
Makefile Linux 6.1.58 2023-10-15 18:32:41 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.