Kernel for Galaxy S24, rebased on CLO sources (WIP)
Go to file
Wen Gu 9540765d18 net/smc: Reset connection when trying to use SMCRv2 fails.
commit 35112271672ae98f45df7875244a4e33aa215e31 upstream.

We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
can be reproduced by:

- smc_run nginx
- smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port>

 BUG: kernel NULL pointer dereference, address: 0000000000000014
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G        W   E      6.4.0-rc1+ #42
 Workqueue: smc_hs_wq smc_listen_work [smc]
 RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc]
 RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06
 RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000
 RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00
 RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000
 R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180
 R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001
 FS:  0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib]
  ? smcr_buf_map_link+0x24b/0x290 [smc]
  ? __smc_buf_create+0x4ee/0x9b0 [smc]
  smc_clc_send_accept+0x4c/0xb0 [smc]
  smc_listen_work+0x346/0x650 [smc]
  ? __schedule+0x279/0x820
  process_one_work+0x1e5/0x3f0
  worker_thread+0x4d/0x2f0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xe5/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2c/0x50
  </TASK>

During the CLC handshake, server sequentially tries available SMCRv2
and SMCRv1 devices in smc_listen_work().

If an SMCRv2 device is found. SMCv2 based link group and link will be
assigned to the connection. Then assumed that some buffer assignment
errors happen later in the CLC handshake, such as RMB registration
failure, server will give up SMCRv2 and try SMCRv1 device instead. But
the resources assigned to the connection won't be reset.

When server tries SMCRv1 device, the connection creation process will
be executed again. Since conn->lnk has been assigned when trying SMCRv2,
it will not be set to the correct SMCRv1 link in
smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to
correct SMCRv1 link group but conn->lnk points to the SMCRv2 link
mistakenly.

Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx]
will be accessed. Since the link->link_idx is not correct, the related
MR may not have been initialized, so crash happens.

 | Try SMCRv2 device first
 |     |-> conn->lgr:	assign existed SMCRv2 link group;
 |     |-> conn->link:	assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC);
 |     |-> sndbuf & RMB creation fails, quit;
 |
 | Try SMCRv1 device then
 |     |-> conn->lgr:	create SMCRv1 link group and assign;
 |     |-> conn->link:	keep SMCRv2 link mistakenly;
 |     |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0]
 |         initialized.
 |
 | Then smc_clc_send_confirm_accept() accesses
 | conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash.
 v

This patch tries to fix this by cleaning conn->lnk before assigning
link. In addition, it is better to reset the connection and clean the
resources assigned if trying SMCRv2 failed in buffer creation or
registration.

Fixes: e49300a6bf ("net/smc: add listen processing for SMC-Rv2")
Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30 14:03:33 +01:00
arch arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay 2023-05-30 14:03:33 +01:00
block block, bfq: Fix division by zero error on zero wsum 2023-05-24 17:32:38 +01:00
certs certs: Fix build error when PKCS#11 URI contains semicolon 2023-02-09 11:28:11 +01:00
crypto crypto: testmgr - fix RNG performance in fuzz tests 2023-05-24 17:32:53 +01:00
Documentation dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type 2023-05-30 14:03:19 +01:00
drivers regulator: mt6359: add read check for PMIC MT6359 2023-05-30 14:03:33 +01:00
fs cifs: mapchars mount option ignored 2023-05-30 14:03:21 +01:00
include net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE 2023-05-30 14:03:33 +01:00
init gcc: disable '-Warray-bounds' for gcc-13 too 2023-04-26 14:28:43 +02:00
io_uring io_uring/rsrc: use nospec'ed indexes 2023-05-11 23:03:24 +09:00
ipc ipc: fix memory leak in init_mqueue_fs() 2022-12-31 13:32:01 +01:00
kernel x86/pci/xen: populate MSI sysfs entries 2023-05-30 14:03:22 +01:00
lib debugobjects: Don't wake up kswapd from fill_pool() 2023-05-30 14:03:20 +01:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
mm mm: fix zswap writeback race condition 2023-05-24 17:32:51 +01:00
net net/smc: Reset connection when trying to use SMCRv2 fails. 2023-05-30 14:03:33 +01:00
rust rust: kernel: Mark rust_fmt_argument as extern "C" 2023-04-26 14:28:38 +02:00
samples samples/bpf: Fix fout leak in hbm's run_bpf_prog 2023-05-24 17:32:38 +01:00
scripts recordmcount: Fix memory leaks in the uwrite function 2023-05-24 17:32:41 +01:00
security selinux: ensure av_permissions.h is built when needed 2023-05-11 23:03:06 +09:00
sound ASoC: Intel: avs: Access path components under lock 2023-05-30 14:03:32 +01:00
tools selftests: fib_tests: mute cleanup error message 2023-05-30 14:03:20 +01:00
usr usr/gen_init_cpio.c: remove unnecessary -1 values from int file 2022-10-03 14:21:44 -07:00
virt KVM: Fix vcpu_array[0] races 2023-05-24 17:32:50 +01:00
.clang-format inet: ping: use hlist_nulls rcu iterator during lookup 2022-12-01 12:42:46 +01:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
.mailmap 9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address 2022-12-10 17:10:52 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Remove Michal Marek from Kbuild maintainers 2022-11-16 14:53:00 +09:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS platform/x86: Move existing HP drivers to a new hp subdir 2023-05-24 17:32:42 +01:00
Makefile Linux 6.1.30 2023-05-24 17:32:53 +01:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.