[ Upstream commit fb8bc4c74ae4526d9489362ab2793a936d072b84 ]
There are two states for ubifs writing pages:
1. Dirty, Private
2. Not Dirty, Not Private
There is a third possibility which maybe related to [1] that page is
private but not dirty caused by following process:
PA
lock(page)
ubifs_write_end
attach_page_private // set Private
__set_page_dirty_nobuffers // set Dirty
unlock(page)
write_cache_pages
lock(page)
clear_page_dirty_for_io(page) // clear Dirty
ubifs_writepage
write_inode
// fail, goto out, following codes are not executed
// do_writepage
// set_page_writeback // set Writeback
// detach_page_private // clear Private
// end_page_writeback // clear Writeback
out:
unlock(page) // Private, Not Dirty
PB
ksys_fadvise64_64
generic_fadvise
invalidate_inode_page
// page is neither Dirty nor Writeback
invalidate_complete_page
// page_has_private is true
try_to_release_page
ubifs_releasepage
ubifs_assert(c, 0) !!!
Then we may get following assertion failed:
UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]:
UBIFS assert failed: 0, in fs/ubifs/file.c:1499
UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]:
switched to read-only mode, error -22
CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty
Call Trace:
dump_stack+0x13/0x1b
ubifs_ro_mode+0x54/0x60 [ubifs]
ubifs_assert_failed+0x4b/0x80 [ubifs]
ubifs_releasepage+0x7e/0x1e0 [ubifs]
try_to_release_page+0x57/0xe0
invalidate_inode_page+0xfb/0x130
invalidate_mapping_pagevec+0x12/0x20
generic_fadvise+0x303/0x3c0
vfs_fadvise+0x35/0x40
ksys_fadvise64_64+0x4c/0xb0
Jump [2] to find a reproducer.
[1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty
[2] https://bugzilla.kernel.org/show_bug.cgi?id=215357
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 122deabfe1428bffe95e2bf364ff8a5059bdf089 ]
Following process will cause a memleak for copied up znode:
dirty_cow_znode
zn = copy_znode(c, znode);
err = insert_old_idx(c, zbr->lnum, zbr->offs);
if (unlikely(err))
return ERR_PTR(err); // No one refers to zn.
Fix it by adding copied znode back to tnc, then it will be freed
by ubifs_destroy_tnc_subtree() while closing tnc.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216705
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 944e096aa24071d3fe22822f6249d3ae309e39ea ]
Dirty znodes will be written on flash in committing process with
following states:
process A | znode state
------------------------------------------------------
do_commit | DIRTY_ZNODE
ubifs_tnc_start_commit | DIRTY_ZNODE
get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE
layout_commit | DIRTY_ZNODE | COW_ZNODE
fill_gap | 0
write master | 0 or OBSOLETE_ZNODE
process B | znode state
------------------------------------------------------
do_commit | DIRTY_ZNODE[1]
ubifs_tnc_start_commit | DIRTY_ZNODE
get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE
ubifs_tnc_end_commit | DIRTY_ZNODE | COW_ZNODE
write_index | 0
write master | 0 or OBSOLETE_ZNODE[2] or
| DIRTY_ZNODE[3]
[1] znode is dirtied without concurrent committing process
[2] znode is copied up (re-dirtied by other process) before cleaned
up in committing process
[3] znode is re-dirtied after cleaned up in committing process
Currently, the clean znode count is updated in free_obsolete_znodes(),
which is called only in normal path. If do_commit failed, clean znode
count won't be updated, which triggers a failure ubifs assertion[4] in
ubifs_tnc_close():
ubifs_assert_failed [ubifs]: UBIFS assert failed: freed == n
[4] Commit 380347e9ca ("UBIFS: Add an assertion for clean_zn_cnt").
Fix it by re-statisticing cleaned znode count in tnc_destroy_cnext().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216704
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c15859bfd326c10230f09cb48a17f8a35f190342 ]
It willl cause null-ptr-deref in the following case:
uif_init()
ubi_add_volume()
cdev_add() -> if it fails, call kill_volumes()
device_register()
kill_volumes() -> if ubi_add_volume() fails call this function
ubi_free_volume()
cdev_del()
device_unregister() -> trying to delete a not added device,
it causes null-ptr-deref
So in ubi_free_volume(), it delete devices whether they are added
or not, it will causes null-ptr-deref.
Handle the error case whlie calling ubi_add_volume() to fix this
problem. If add volume fails, set the corresponding vol to null,
so it can not be accessed in kill_volumes() and release the
resource in ubi_add_volume() error path.
Fixes: 801c135ce7 ("UBI: Unsorted Block Images")
Suggested-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9af31d6ec1a4be4caab2550096c6bd2ba8fba472 ]
There is an use-after-free problem reported by KASAN:
==================================================================
BUG: KASAN: use-after-free in ubi_eba_copy_table+0x11f/0x1c0 [ubi]
Read of size 8 at addr ffff888101eec008 by task ubirsvol/4735
CPU: 2 PID: 4735 Comm: ubirsvol
Not tainted 6.1.0-rc1-00003-g84fa3304a7fc-dirty #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report+0x171/0x472
kasan_report+0xad/0x130
ubi_eba_copy_table+0x11f/0x1c0 [ubi]
ubi_resize_volume+0x4f9/0xbc0 [ubi]
ubi_cdev_ioctl+0x701/0x1850 [ubi]
__x64_sys_ioctl+0x11d/0x170
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>
When ubi_change_vtbl_record() returns an error in ubi_resize_volume(),
"new_eba_tbl" will be freed on error handing path, but it is holded
by "vol->eba_tbl" in ubi_eba_replace_table(). It means that the liftcycle
of "vol->eba_tbl" and "vol" are different, so when resizing volume in
next time, it causing an use-after-free fault.
Fix it by not freeing "new_eba_tbl" after it replaced in
ubi_eba_replace_table(), while will be freed in next volume resizing.
Fixes: 801c135ce7 ("UBI: Unsorted Block Images")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e874dcde1cbf82c786c0e7f2899811c02630cc52 ]
UBIFS calculates available space by c->main_bytes - c->lst.total_used
(which means non-index lebs' free and dirty space is accounted into
total available), then index lebs and four lebs (one for gc_lnum, one
for deletions, two for journal heads) are deducted.
In following situation, ubifs may get -ENOSPC from make_reservation():
LEB 84: DATAHD free 122880 used 1920 dirty 2176 dark 6144
LEB 110:DELETION free 126976 used 0 dirty 0 dark 6144 (empty)
LEB 201:gc_lnum free 126976 used 0 dirty 0 dark 6144
LEB 272:GCHD free 77824 used 47672 dirty 1480 dark 6144
LEB 356:BASEHD free 0 used 39776 dirty 87200 dark 6144
OTHERS: index lebs, zero-available non-index lebs
UBIFS calculates the available bytes is 6888 (How to calculate it:
126976 * 5[remain main bytes] - 1920[used] - 47672[used] - 39776[used] -
126976 * 1[deletions] - 126976 * 1[gc_lnum] - 126976 * 2[journal heads]
- 6144 * 5[dark] = 6888) after doing budget, however UBIFS cannot use
BASEHD's dirty space(87200), because UBIFS cannot find next BASEHD to
reclaim current BASEHD. (c->bi.min_idx_lebs equals to c->lst.idx_lebs,
the empty leb won't be found by ubifs_find_free_space(), and dirty index
lebs won't be picked as gced lebs. All non-index lebs has dirty space
less then c->dead_wm, non-index lebs won't be picked as gced lebs
either. So new free lebs won't be produced.). See more details in Link.
To fix it, reserve one leb for each journal head while doing budget.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216562
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25fce616a61fc2f1821e4a9ce212d0e064707093 ]
If target inode is a special file (eg. block/char device) with nlink
count greater than 1, the inode with ui->data will be re-written on
disk. However, UBIFS losts target inode's data_len while doing space
budget. Bad space budget may let make_reservation() return with -ENOSPC,
which could turn ubifs to read-only mode in do_writepage() process.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216494
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b248eaf049d9cdc5eb76b59399e4d3de233f02ac ]
Each dirty inode should reserve 'c->bi.inode_budget' bytes in space
budget calculation. Currently, space budget for dirty inode reports
more space than what UBIFS actually needs to write.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b2ba09060e41adb356b9ae58ef94a7390928004 ]
There is no space budget for ubifs_xrename(). It may let
make_reservation() return with -ENOSPC, which could turn
ubifs to read-only mode in do_writepage() process.
Fix it by adding space budget for ubifs_xrename().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216569
Fixes: 9ec64962af ("ubifs: Implement RENAME_EXCHANGE")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2c36cc6ca23e614f9e4238d0ecf48549ee9002a ]
Fix bad space budget when symlink file is encrypted. Bad space budget
may let make_reservation() return with -ENOSPC, which could turn ubifs
to read-only mode in do_writepage() process.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216490
Fixes: ca7f85be8d ("ubifs: Add support for encrypted symlinks")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa6d148e6d6270274e3d5a529b71c54cd329d17f ]
With CONFIG_UBIFS_FS_AUTHENTICATION not set, the compiler can assume that
ubifs_node_check_hash() is never true and drops the call to ubifs_bad_hash().
Is CONFIG_CC_OPTIMIZE_FOR_SIZE enabled this optimization does not happen anymore.
So When CONFIG_UBIFS_FS and CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled but
CONFIG_UBIFS_FS_AUTHENTICATION is not set, the build errors is as followd:
ERROR: modpost: "ubifs_bad_hash" [fs/ubifs/ubifs.ko] undefined!
Fix it by add no-op ubifs_bad_hash() for the CONFIG_UBIFS_FS_AUTHENTICATION=n case.
Fixes: 16a26b20d2 ("ubifs: authentication: Add hashes to index nodes")
Signed-off-by: Li Hua <hucool.lihua@huawei.com>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f88c73afe481f93d40801596927e8c0047b6d96 ]
If the return value of the uml_parse_vector_ifspec function is NULL,
we should call kfree(params) to prevent memory leak.
Fixes: 49da7e64f3 ("High Performance UML Vector Network Driver")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Acked-By: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1b9896718bc1a212dc288ad66a5fa2fef11353d ]
When aops->write_begin() does not initialize fsdata, KMSAN may report
an error passing the latter to aops->write_end().
Fix this by unconditionally initializing fsdata.
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 95ae251fe8 ("f2fs: add fs-verity support")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b87846bd61c7c09560617da416208a5454530d57 ]
This is simpler, and as a side effect it replaces several uses of
kmap_atomic() with its recommended replacement kmap_local_page().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: b1b9896718bc ("fs: f2fs: initialize fsdata in pagecache_write()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3066bc2d58be31275afb51a589668f265e419c37 ]
The ARR (auto reload register) and CMP (compare) registers are
successively written. The status bits to check the update of these
registers are polled together with regmap_read_poll_timeout().
The condition to end the loop may become true, even if one of the
register isn't correctly updated.
So ensure both status bits are set before clearing them.
Fixes: e70a540b4e ("pwm: Add STM32 LPTimer PWM driver")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 334c7b13d38321e47d1a51dba0bef9f4c403ec75 ]
Commit 2cfe9bbec56ea579135cdd92409fff371841904f added support for the
RGB and green PWM controlled LEDs on the HiFive Unmatched board
managed by the leds-pwm-multicolor and leds-pwm drivers respectively.
All three colours of the RGB LED and the green LED run from different
lines of the same PWM, but with the same period so this works fine when
the LED drivers are loaded one after the other.
Unfortunately it does expose a race in the PWM driver when both LED
drivers are loaded at roughly the same time. Here is an example:
| Thread A | Thread B |
| led_pwm_mc_probe | led_pwm_probe |
| devm_fwnode_pwm_get | |
| pwm_sifive_request | |
| ddata->user_count++ | |
| | devm_fwnode_pwm_get |
| | pwm_sifive_request |
| | ddata->user_count++ |
| ... | ... |
| pwm_state_apply | pwm_state_apply |
| pwm_sifive_apply | pwm_sifive_apply |
Now both calls to pwm_sifive_apply will see that ddata->approx_period,
initially 0, is different from the requested period and the clock needs
to be updated. But since ddata->user_count >= 2 both calls will fail
with -EBUSY, which will then cause both LED drivers to fail to probe.
Fix it by letting the first call to pwm_sifive_apply update the clock
even when ddata->user_count != 1.
Fixes: 9e37a53eb0 ("pwm: sifive: Add a driver for SiFive SoC PWM")
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f02f491b786143f08eb19840f1cf4f12aec6dee ]
The lock is only to serialize access and update to user_count and
approx_period between different PWMs served by the same pwm_chip.
So the lock needs only to be taken during the check if the (chip global)
period can and/or needs to be changed.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Stable-dep-of: 334c7b13d383 ("pwm: sifive: Always let the first pwm_apply_state succeed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3da73f102309fe29150e5c35acd20dd82063ff67 ]
strdup() allocates memory for key_name. We need to release the memory in
the following error paths. Add free() to avoid memory leak.
Fixes: 1e7e478838 ("x86/static_call: Add inline static call implementation for x86-64")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20221205080642.558583-1-linmq006@gmail.com
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fad376fce0af58deebc5075b8539dc05bf639af3 ]
As a shift exponent, db_agl2size can not be less than 0. Add the missing
check to fix the shift-out-of-bounds bug reported by syzkaller:
UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2227:15
shift exponent -744642816 is negative
Reported-by: syzbot+0be96567042453c0c820@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8c710f75256bb3cf05ac7b1672c82b92c43f3d28 upstream.
The tcindex classifier has served us well for about a quarter of a century
but has not been getting much TLC due to lack of known users. Most recently
it has become easy prey to syzkaller. For this reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bf73588165ba7d32131a043775557a54b6e1db5 upstream.
Port silent mode detection to the future (post make-4.4) versions of gnu make.
Makefile contains the following piece of make code to detect if option -s is
specified on the command line.
ifneq ($(findstring s,$(filter-out --%,$(MAKEFLAGS))),)
This code is executed by make at parse time and assumes that MAKEFLAGS
does not contain command line variable definitions.
Currently if the user defines a=s on the command line, then at build only
time MAKEFLAGS contains " -- a=s".
However, starting with commit dc2d963989b96161472b2cd38cef5d1f4851ea34
MAKEFLAGS contains command line definitions at both parse time and
build time.
This '-s' detection code then confuses a command line variable
definition which contains letter 's' with option -s.
$ # old make
$ make net/wireless/ocb.o a=s
CALL scripts/checksyscalls.sh
DESCEND objtool
$ # this a new make which defines makeflags at parse time
$ ~/src/gmake/make/l64/make net/wireless/ocb.o a=s
$
We can see here that the letter 's' from 'a=s' was confused with -s.
This patch checks for presence of -s using a method recommended by the
make manual here
https://www.gnu.org/software/make/manual/make.html#Testing-Flags.
Link: https://lists.gnu.org/archive/html/bug-make/2022-11/msg00190.html
Reported-by: Jan Palus <jpalus+gnu@fastmail.com>
Signed-off-by: Dmitry Goncharov <dgoncharov@users.sf.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31b62a98de42cf65d76e4dcfb571af067d27d83a upstream.
When reading pinconf-pins from debugfs it fails to get the configured pull
type on RK3568, "unsupported pinctrl type" error messages is also reported.
Fix this by adding support for RK3568 in rockchip_get_pull, including a
reverse of the pull-up value swap applied in rockchip_set_pull so that
pull-up is correctly reported in pinconf-pins.
Also update the workaround comment to reflect affected pins, GPIO0_D3-D6.
Fixes: c0dadc0e47a8 ("pinctrl: rockchip: add support for rk3568")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Jianqun Xu <jay.xu@rock-chips.com>
Link: https://lore.kernel.org/r/20230110172955.1258840-1-jonas@kwiboo.se
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 431d1531466033909d2e8c754a7dc3704b70843f upstream.
IO mux selection is configured in PMU_GRF_SOC_CON4 and GRF_IOFUNC_SEL0-5
regs on RK3568. pwm0-2 is configured in PMU_GRF reg and the rest is
configured in GRF_IOFUNC regs according to TRM [1].
Update mux route data to reflect this and use proper detection pin for
UART1 IO mux M1.
This fixes HDMITX IO mux M1 selection and makes it possible to enable
HDMI CEC on my Radxa ROCK 3 Model A v1.31 board.
[1] http://opensource.rock-chips.com/images/2/26/Rockchip_RK3568_TRM_Part1_V1.3-20220930P.PDF
Fixes: c0dadc0e47a8 ("pinctrl: rockchip: add support for rk3568")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20230110084636.1141740-1-jonas@kwiboo.se
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7dc753fe33a707379e2254317794a4dad6c0fe2 upstream.
A previous cleanup patch accidentally broke some conditional
expressions by replacing the safe "do {} while (0)" constructs
with empty macros. gcc points this out when extra warnings
are enabled:
drivers/net/wireless/ath/ath9k/hif_usb.c: In function 'ath9k_skb_queue_complete':
drivers/net/wireless/ath/ath9k/hif_usb.c:251:57: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
251 | TX_STAT_INC(hif_dev, skb_failed);
Make both sets of macros proper expressions again.
Fixes: d7fc76039b74 ("ath9k: htc: clean up statistics macros")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221215165553.1950307-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 100d9c94ccf15b02742c326cd04f422ab729153b upstream.
Serdes register space sizes are incorrect, update them to match the
actual sizes from downstream QCA 5.4 kernel.
Fixes: 942bcd33ed45 ("arm64: dts: qcom: Fix IPQ8074 PCIe PHY nodes")
Signed-off-by: Robert Marko <robimarko@gmail.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230113164449.906002-1-robimarko@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cbc1f0d324ba6c4d1b10ac6362b5e0b029f63d5 upstream.
We try to avoid sending VICs defined in the later specs in AVI
infoframes to sinks that conform to the earlier specs, to not upset
them, and use 0 for the VIC instead. However, we do this detection and
conversion to 0 too early, as we'll need the actual VIC to figure out
the aspect ratio.
In particular, for a mode with 64:27 aspect ratio, 0 for VIC fails the
AVI infoframe generation altogether with -EINVAL.
Separate the VIC lookup from the "filtering", and postpone the
filtering, to use the proper VIC for aspect ratio handling, and the 0
VIC for the infoframe video code as needed.
Reported-by: William Tseng <william.tseng@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6153
References: https://lore.kernel.org/r/20220920062316.43162-1-william.tseng@intel.com
Cc: <stable@vger.kernel.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c3e78cc6d01ed237f71ad0038826b08d83d75eef.1672826282.git.jani.nikula@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05eacc198c68cbb35a7281ce4011f8899ee1cfb8 upstream.
Apple iMac11,2 (mid 2010) also with Radeon HD-4670 that has the same
issue as iMac10,1 (late 2009) where the internal eDP panel stays dark on
driver load. This patch treats iMac11,2 the same as iMac10,1,
so the eDP panel stays active.
Additional steps:
Kernel boot parameter radeon.nomodeset=0 required to keep the eDP
panel active.
This patch is an extension of
commit 564d8a2cf3 ("drm/radeon: Fix eDP for single-display iMac10,1 (v2)")
Link: https://lore.kernel.org/all/lsq.1507553064.833262317@decadent.org.uk/
Signed-off-by: Mark Hawrylak <mark.hawrylak@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 046eca5018f8a5dd1dc2cedf87fb5843b9ea3026 upstream.
When a vfio container is preserved across exec, the task does not change,
but it gets a new mm with locked_vm=0, and loses the count from existing
dma mappings. If the user later unmaps a dma mapping, locked_vm underflows
to a large unsigned value, and a subsequent dma map request fails with
ENOMEM in __account_locked_vm.
To avoid underflow, grab and save the mm at the time a dma is mapped.
Use that mm when adjusting locked_vm, rather than re-acquiring the saved
task's mm, which may have changed. If the saved mm is dead, do nothing.
locked_vm is incremented for existing mappings in a subsequent patch.
Fixes: 73fa0d10d0 ("vfio: Type1 IOMMU implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1675184289-267876-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63ba51db24ed1b8f8088a897290eb6c036c5435d upstream.
PCI passthrough to VMs does not work with AMD FCH AHCI adapters: the guest
OS fails to correctly probe devices attached to the controller due to FIS
communication failures:
ata4: softreset failed (1st FIS failed)
...
ata4.00: qc timeout after 5000 msecs (cmd 0xec)
ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Forcing the "bus" reset method before unbinding & binding the adapter to
the vfio-pci driver solves this issue, e.g.:
echo "bus" > /sys/bus/pci/devices/<ID>/reset_method
gives a working guest OS, indicating that the default FLR reset method
doesn't work correctly.
Apply quirk_no_flr() to AMD FCH AHCI devices to work around this issue.
Link: https://lore.kernel.org/r/20230128013951.523247-1-damien.lemoal@opensource.wdc.com
Reported-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74ff8864cc842be994853095dba6db48e716400a upstream.
On surprise removal, pciehp_unconfigure_device() and acpiphp's
trim_stale_devices() call pci_dev_set_disconnected() to mark removed
devices as permanently offline. Thereby, the PCI core and drivers know
to skip device accesses.
However pci_dev_set_disconnected() takes the device_lock and thus waits for
a concurrent driver bind or unbind to complete. As a result, the driver's
->probe and ->remove hooks have no chance to learn that the device is gone.
That doesn't make any sense, so drop the device_lock and instead use atomic
xchg() and cmpxchg() operations to update the device state.
As a byproduct, an AB-BA deadlock reported by Anatoli is fixed which occurs
on surprise removal with AER concurrently performing a bus reset.
AER bus reset:
INFO: task irq/26-aerdrv:95 blocked for more than 120 seconds.
Tainted: G W 6.2.0-rc3-custom-norework-jan11+
schedule
rwsem_down_write_slowpath
down_write_nested
pciehp_reset_slot # acquires reset_lock
pci_reset_hotplug_slot
pci_slot_reset # acquires device_lock
pci_bus_error_reset
aer_root_reset
pcie_do_recovery
aer_process_err_devices
aer_isr
pciehp surprise removal:
INFO: task irq/26-pciehp:96 blocked for more than 120 seconds.
Tainted: G W 6.2.0-rc3-custom-norework-jan11+
schedule_preempt_disabled
__mutex_lock
mutex_lock_nested
pci_dev_set_disconnected # acquires device_lock
pci_walk_bus
pciehp_unconfigure_device
pciehp_disable_slot
pciehp_handle_presence_or_link_change
pciehp_ist # acquires reset_lock
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590
Fixes: a6bd101b8f ("PCI: Unify device inaccessible")
Link: https://lore.kernel.org/r/3dc88ea82bdc0e37d9000e413d5ebce481cbd629.1674205689.git.lukas@wunner.de
Reported-by: Anatoli Antonovitch <anatoli.antonovitch@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.20+
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ef0217227b42e2c34a18de316cee3da16c9bf1e upstream.
If a PCI bridge is suspended to D3cold upon entering system sleep,
resuming it entails a Fundamental Reset per PCIe r6.0 sec 5.8.
The delay prescribed after a Fundamental Reset in PCIe r6.0 sec 6.6.1
is sought to be observed by:
pci_pm_resume_noirq()
pci_pm_bridge_power_up_actions()
pci_bridge_wait_for_secondary_bus()
However, pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3
flag is not set. That flag indicates whether a bridge is allowed to
suspend to D3cold at *runtime*.
Hence *no* delay is observed on resume from system sleep if runtime
D3cold is forbidden. That doesn't make any sense, so drop the bridge_d3
check from pci_bridge_wait_for_secondary_bus().
The purpose of the bridge_d3 check was probably to avoid delays if a
bridge remained in D0 during suspend. However the sole caller of
pci_bridge_wait_for_secondary_bus(), pci_pm_bridge_power_up_actions(),
is only invoked if the previous power state was D3cold. Hence the
additional bridge_d3 check seems superfluous.
Fixes: ad9001f2f4 ("PCI/PM: Add missing link delays required by the PCIe spec")
Link: https://lore.kernel.org/r/eb37fa345285ec8bacabbf06b020b803f77bdd3d.1673769517.git.lukas@wunner.de
Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ddfc3cd806081ce1f6c9c2f988cbb031f35d28f upstream.
Runtime code patching must be done at a naturally aligned address, or we
may execute on a partial instruction.
We have encountered problems traced back to static jump functions during
the test. We switched the tracer randomly for every 1~5 seconds on a
dual-core QEMU setup and found the kernel sucking at a static branch
where it jumps to itself.
The reason is that the static branch was 2-byte but not 4-byte aligned.
Then, the kernel would patch the instruction, either J or NOP, with two
half-word stores if the machine does not have efficient unaligned
accesses. Thus, moments exist where half of the NOP mixes with the other
half of the J when transitioning the branch. In our particular case, on
a little-endian machine, the upper half of the NOP was mixed with the
lower part of the J when enabling the branch, resulting in a jump that
jumped to itself. Conversely, it would result in a HINT instruction when
disabling the branch, but it might not be observable.
ARM64 does not have this problem since all instructions must be 4-byte
aligned.
Fixes: ebc00dde8a ("riscv: Add jump-label implementation")
Link: https://lore.kernel.org/linux-riscv/20220913094252.3555240-6-andy.chiu@sifive.com/
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230206090440.1255001-1-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 578797f0c8cbc2e3ec5fc0dab87087b4c7073686 upstream.
A fix for:
BUG: KASAN: slab-out-of-bounds in ses_intf_remove+0x23f/0x270 [ses]
Read of size 8 at addr ffff88a10d32e5d8 by task rmmod/12013
When edev->components is zero, accessing edev->component[0] members is
wrong.
Link: https://lore.kernel.org/r/20230202162451.15346-5-thenzl@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b4f5028e493cb353a5c8f5c45073eeea0303abd upstream.
A fix for:
BUG: KASAN: slab-out-of-bounds in ses_enclosure_data_process+0x949/0xe30 [ses]
Read of size 1 at addr ffff88a1b043a451 by task systemd-udevd/3271
Checking after (and before in next loop) addl_desc_ptr[1] is sufficient, we
expect the size to be sanitized before first access to addl_desc_ptr[1].
Make sure we don't walk beyond end of page.
Link: https://lore.kernel.org/r/20230202162451.15346-2-thenzl@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fe97ff3d94934649abb0652028dd7296170c8d0 upstream.
An enclosure with no components can't usefully be operated by the driver
(since effectively it has nothing to manage), so report the problem and
don't attach. Not attaching also fixes an oops which could occur if the
driver tries to manage a zero component enclosure.
[mkp: Switched to KERN_WARNING since this scenario is common]
Link: https://lore.kernel.org/r/c5deac044ac409e32d9ad9968ce0dcbc996bfc7a.camel@linux.ibm.com
Cc: stable@vger.kernel.org
Reported-by: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fbc74feb642deb688cc97f76d40b7287ddd4cb1 upstream.
If after an adapter reset the appearance of link is not recovered, the
devices are not rediscovered. This is result of a race condition between
adapter reset (abort_isp) and the topology scan. During adapter reset, the
ABORT_ISP_ACTIVE flag is set. Topology scan usually occurred after adapter
reset. In this case, the topology scan came earlier than usual where it
ran into problem due to ABORT_ISP_ACTIVE flag was still set.
kernel: qla2xxx [0000:13:00.0]-1005:1: Cmd 0x6a aborted with timeout since ISP Abort is pending
kernel: qla2xxx [0000:13:00.0]-28a0:1: MBX_GET_PORT_NAME failed, No FL Port.
kernel: qla2xxx [0000:13:00.0]-286b:1: qla2x00_configure_loop: exiting normally. local port wwpn 51402ec0123d9a80 id 012300)
kernel: qla2xxx [0000:13:00.0]-8017:1: ADAPTER RESET SUCCEEDED nexus=1:0:15.
Allow adapter reset to complete before any scan can start.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c75e6aef5039830cce5d4cf764dd204522f89e6b upstream.
The following message and call trace was seen with debug kernels:
DMA-API: qla2xxx 0000:41:00.0: device driver failed to check map
error [device address=0x00000002a3ff38d8] [size=1024 bytes] [mapped as
single]
WARNING: CPU: 0 PID: 2930 at kernel/dma/debug.c:1017
check_unmap+0xf42/0x1990
Call Trace:
debug_dma_unmap_page+0xc9/0x100
qla_nvme_ls_unmap+0x141/0x210 [qla2xxx]
Remove DMA mapping from the driver altogether, as it is already done by FC
layer. This prevents the warning.
Fixes: c85ab7d9e27a ("scsi: qla2xxx: Fix missed DMA unmap for NVMe ls requests")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1ae65c082f74536ec292b15766f2846f0238373 upstream.
User experienced symptoms of adapter failure in NPIV environment. NPIV
hosts were allowed to trigger chip reset back to back due to NPIV link
state being slow to come online.
Fix link failure in NPIV environment by removing NPIV host from directly
being able to perform chip reset.
kernel: qla2xxx [0000:04:00.1]-6009:261: Loop down - aborting ISP.
kernel: qla2xxx [0000:04:00.1]-6009:262: Loop down - aborting ISP.
kernel: qla2xxx [0000:04:00.1]-6009:281: Loop down - aborting ISP.
kernel: qla2xxx [0000:04:00.1]-6009:285: Loop down - aborting ISP
Fixes: 0d6e61bc6a ("[SCSI] qla2xxx: Correct various NPIV issues.")
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8843e06f67b14f71c044bf6267b2387784c7e198 upstream.
It seems a data race between ring_buffer writing and integrity check.
That is, RB_FLAG of head_page is been updating, while at same time
RB_FLAG was cleared when doing integrity check rb_check_pages():
rb_check_pages() rb_handle_head_page():
-------- --------
rb_head_page_deactivate()
rb_head_page_set_normal()
rb_head_page_activate()
We do intergrity test of the list to check if the list is corrupted and
it is still worth doing it. So, let's refactor rb_check_pages() such that
we no longer clear and set flag during the list sanity checking.
[1] and [2] are the test to reproduce and the crash report respectively.
1:
``` read_trace.sh
while true;
do
# the "trace" file is closed after read
head -1 /sys/kernel/tracing/trace > /dev/null
done
```
``` repro.sh
sysctl -w kernel.panic_on_warn=1
# function tracer will writing enough data into ring_buffer
echo function > /sys/kernel/tracing/current_tracer
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
```
2:
------------[ cut here ]------------
WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653
rb_move_tail+0x450/0x470
Modules linked in:
CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G W 6.2.0-rc6+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:rb_move_tail+0x450/0x470
Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24
83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 <0f> 0b 83
f8 02 0f 84 ce fb ff ff e9 db
RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203
RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18
RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400
RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2
R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000
R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108
FS: 0000000000000000(0000) GS:ffff9db3bdcc0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0
Call Trace:
<TASK>
ring_buffer_lock_reserve+0x136/0x360
? __do_softirq+0x287/0x2df
? __pfx_rcu_softirq_qs+0x10/0x10
trace_function+0x21/0x110
? __pfx_rcu_softirq_qs+0x10/0x10
? __do_softirq+0x287/0x2df
function_trace_call+0xf6/0x120
0xffffffffc038f097
? rcu_softirq_qs+0x5/0x140
rcu_softirq_qs+0x5/0x140
__do_softirq+0x287/0x2df
run_ksoftirqd+0x2a/0x30
smpboot_thread_fn+0x188/0x220
? __pfx_smpboot_thread_fn+0x10/0x10
kthread+0xe7/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
---[ end trace 0000000000000000 ]---
[ crash report and test reproducer credit goes to Zheng Yejian]
Link: https://lore.kernel.org/linux-trace-kernel/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com
Cc: <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 1039221cc2 ("ring-buffer: Do not disable recording when there is an iterator")
Reported-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e7d2a8f0b52abf23b1dc13b3d88bc0923383cd5 upstream.
There is a disconnect between the run_command function and the
wait_for_input. The wait_for_input has a default timeout of 2 minutes. But
if that happens, the run_command loop will exit out to the waitpid() of
the executing command. This fails in that it no longer monitors the
command, and also, the ssh to the test box can hang when its finished, as
it's waiting for the pipe it's writing to to flush, but the loop that
reads that pipe has already exited, leaving the command stuck, and the
test hangs.
Instead, make the default "wait_for_input" of the run_command infinite,
and allow the user to override it if they want with a default timeout
option "RUN_TIMEOUT".
But this fixes the hang that happens when the pipe is full and the ssh
session never exits.
Cc: stable@vger.kernel.org
Fixes: 6e98d1b441 ("ktest: Add timeout to ssh command")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8bf9b98d40dbdf4e39362e3b85a70c61da68cb7 upstream.
In the "reboot" command, it does a check of the machine to see if it is
still alive with a simple "ssh echo" command. If it fails, it will assume
that a normal "ssh reboot" is not possible and force a power cycle.
In this case, the "start_monitor" is executed, but the "end_monitor" is
not, and this causes the screen will not be given back to the console. That
is, after the test, a "reset" command needs to be performed, as "echo" is
turned off.
Cc: stable@vger.kernel.org
Fixes: 6474ace999 ("ktest.pl: Powercycle the box on reboot if no connection can be made")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 83d29d439cd3ef23041570d55841f814af2ecac0 upstream.
When monitoring the console output, the stdout is being redirected to do
so. If Ctrl^C is hit during this mode, the stdout is not back to the
console, the user does not see anything they type (no echo).
Add "end_monitor" to the SIGINT interrupt handler to give back the console
on Ctrl^C.
Cc: stable@vger.kernel.org
Fixes: 9f2cdcbbb9 ("ktest: Give console process a dedicated tty")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 upstream.
Kernel build regression with LLVM was reported here:
https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/ with
commit f35b5d7d676e ("mm: align larger anonymous mappings on THP
boundaries"). And the commit f35b5d7d676e was reverted.
It turned out the regression is related with madvise(MADV_DONTNEED)
was used by ld.lld. But with none PMD_SIZE aligned parameter len.
trace-bpfcc captured:
531607 531732 ld.lld do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
531607 531793 ld.lld do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
If the underneath physical page is THP, the madvise(MADV_DONTNEED) can
trigger split_queue_lock contention raised significantly. perf showed
following data:
14.85% 0.00% ld.lld [kernel.kallsyms] [k]
entry_SYSCALL_64_after_hwframe
11.52%
entry_SYSCALL_64_after_hwframe
do_syscall_64
__x64_sys_madvise
do_madvise.part.0
zap_page_range
unmap_single_vma
unmap_page_range
page_remove_rmap
deferred_split_huge_page
__lock_text_start
native_queued_spin_lock_slowpath
If THP can't be removed from rmap as whole THP, partial THP will be
removed from rmap by removing sub-pages from rmap. Even the THP head page
is added to deferred queue already, the split_queue_lock will be acquired
and check whether the THP head page is in the queue already. Thus, the
contention of split_queue_lock is raised.
Before acquire split_queue_lock, check and bail out early if the THP
head page is in the queue already. The checking without holding
split_queue_lock could race with deferred_split_scan, but it doesn't
impact the correctness here.
Test result of building kernel with ld.lld:
commit 7b5a0b664ebe (parent commit of f35b5d7d676e):
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
6:07.99 real, 26367.77 user, 5063.35 sys
commit f35b5d7d676e:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
7:22.15 real, 26235.03 user, 12504.55 sys
commit f35b5d7d676e with the fixing patch:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
6:08.49 real, 26520.15 user, 5047.91 sys
Link: https://lkml.kernel.org/r/20221223135207.2275317-1-fengwei.yin@intel.com
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da34a8484d162585e22ed8c1e4114aa2f60e3567 upstream.
Charge moving mode in cgroup1 allows memory to follow tasks as they
migrate between cgroups. This is, and always has been, a questionable
thing to do - for several reasons.
First, it's expensive. Pages need to be identified, locked and isolated
from various MM operations, and reassigned, one by one.
Second, it's unreliable. Once pages are charged to a cgroup, there isn't
always a clear owner task anymore. Cache isn't moved at all, for example.
Mapped memory is moved - but if trylocking or isolating a page fails,
it's arbitrarily left behind. Frequent moving between domains may leave a
task's memory scattered all over the place.
Third, it isn't really needed. Launcher tasks can kick off workload tasks
directly in their target cgroup. Using dedicated per-workload groups
allows fine-grained policy adjustments - no need to move tasks and their
physical pages between control domains. The feature was never
forward-ported to cgroup2, and it hasn't been missed.
Despite it being a niche usecase, the maintenance overhead of supporting
it is enormous. Because pages are moved while they are live and subject
to various MM operations, the synchronization rules are complicated.
There are lock_page_memcg() in MM and FS code, which non-cgroup people
don't understand. In some cases we've been able to shift code and cgroup
API calls around such that we can rely on native locking as much as
possible. But that's fragile, and sometimes we need to hold MM locks for
longer than we otherwise would (pte lock e.g.).
Mark the feature deprecated. Hopefully we can remove it soon.
And backport into -stable kernels so that people who develop against
earlier kernels are warned about this deprecation as early as possible.
[akpm@linux-foundation.org: fix memory.rst underlining]
Link: https://lkml.kernel.org/r/Y5COd+qXwk/S+n8N@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>