Go to file
Amit Cohen 05b6d0234a mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
[ Upstream commit 6d6eeabcfaba2fcadf5443b575789ea606f9de83 ]

Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.

The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.

Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table->base_index and indeed other tables use it.

gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.

Fix this by changing erp_table->base_index only in case of a successful
allocation.

Add a test case for such a scenario. Without this fix it causes
segmentation fault:

$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null

[1]:
kernel BUG at lib/genalloc.c:508!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:gen_pool_free_owner+0xc9/0xe0
...
Call Trace:
 <TASK>
 __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
 objagg_obj_root_destroy+0x18/0x80 [objagg]
 objagg_obj_destroy+0x12c/0x130 [objagg]
 mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
 mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
 mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
 mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
 tc_setup_cb_destroy+0xc1/0x180
 fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
 __fl_delete+0x1ac/0x1c0 [cls_flower]
 fl_destroy+0xc2/0x150 [cls_flower]
 tcf_proto_destroy+0x1a/0xa0
...
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion

Fixes: f465261aa1 ("mlxsw: spectrum_acl: Implement common eRP core")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 14:34:32 -08:00
arch s390/pci: fix max size calculation in zpci_memcpy_toio() 2024-01-25 14:34:32 -08:00
block blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!" 2023-12-20 15:41:20 +01:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:11:22 +02:00
crypto crypto: scomp - fix req->dst buffer overflow 2024-01-25 14:34:24 -08:00
Documentation firmware: ti_sci: Replace HTTP links with HTTPS ones 2023-11-20 10:30:12 +01:00
drivers mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure 2024-01-25 14:34:32 -08:00
fs f2fs: fix to avoid dirent corruption 2024-01-25 14:34:27 -08:00
include drm/bridge: Fix typo in post_disable() description 2024-01-25 14:34:27 -08:00
init rootfs: Fix support for rootfstype= when root= is given 2024-01-25 14:34:30 -08:00
ipc ipc/sem: Fix dangling sem_array access in semtimedop race 2022-12-08 11:23:06 +01:00
kernel kdb: Fix a potential buffer overflow in kdb_local() 2024-01-25 14:34:32 -08:00
lib ida: Fix crash in ida_free when the bitmap is empty 2024-01-25 14:34:21 -08:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm: fix unmap_mapping_range high bits shift bug 2024-01-15 18:25:28 +01:00
net ipvs: avoid stat macros calls from preemptible context 2024-01-25 14:34:32 -08:00
samples samples/bpf: Fix buffer overflow in tcp_basertt 2023-07-27 08:37:07 +02:00
scripts sign-file: Fix incorrect return values check 2023-12-20 15:41:17 +01:00
security apparmor: avoid crash when parsed profile name is empty 2024-01-25 14:34:31 -08:00
sound ALSA: oxygen: Fix right channel of capture volume mixer 2024-01-25 14:34:30 -08:00
tools mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure 2024-01-25 14:34:32 -08:00
usr initramfs: restore default compression behavior 2020-04-08 09:08:38 +02:00
virt KVM: Destroy target device if coalesced MMIO unregistration fails 2023-03-11 16:44:01 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes
.gitignore Modules updates for v5.4 2019-09-22 10:34:46 -07:00
.mailmap ARM: SoC fixes 2019-11-10 13:41:59 -08:00
COPYING
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS iio: stx104: Move to addac subdirectory 2023-08-30 16:27:12 +02:00
Makefile Linux 5.4.267 2024-01-15 18:25:30 +01:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.