From 8383226583251858814d5521b542e7bf7dbadc4b Mon Sep 17 00:00:00 2001 From: Wilken Gottwalt Date: Sat, 13 Nov 2021 06:53:52 +0000 Subject: [PATCH 001/143] hwmon: (corsair-psu) fix plain integer used as NULL pointer sparse warnings: (new ones prefixed by >>) >> drivers/hwmon/corsair-psu.c:536:82: sparse: sparse: Using plain integer as NULL pointer Fixes: d115b51e0e56 ("hwmon: add Corsair PSU HID controller driver") Reported-by: kernel test robot Signed-off-by: Wilken Gottwalt Link: https://lore.kernel.org/r/YY9hAL8MZEQYLYPf@monster.localdomain Signed-off-by: Guenter Roeck --- drivers/hwmon/corsair-psu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwmon/corsair-psu.c b/drivers/hwmon/corsair-psu.c index 731d5117f9f1..14389fd7afb8 100644 --- a/drivers/hwmon/corsair-psu.c +++ b/drivers/hwmon/corsair-psu.c @@ -729,7 +729,7 @@ static int corsairpsu_probe(struct hid_device *hdev, const struct hid_device_id corsairpsu_check_cmd_support(priv); priv->hwmon_dev = hwmon_device_register_with_info(&hdev->dev, "corsairpsu", priv, - &corsairpsu_chip_info, 0); + &corsairpsu_chip_info, NULL); if (IS_ERR(priv->hwmon_dev)) { ret = PTR_ERR(priv->hwmon_dev); From dbd3e6eaf3d813939b28e8a66e29d81cdc836445 Mon Sep 17 00:00:00 2001 From: Armin Wolf Date: Fri, 12 Nov 2021 18:14:40 +0100 Subject: [PATCH 002/143] hwmon: (dell-smm) Fix warning on /proc/i8k creation error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The removal function is called regardless of whether /proc/i8k was created successfully or not, the later causing a WARN() on module removal. Fix that by only registering the removal function if /proc/i8k was created successfully. Tested on a Inspiron 3505. Fixes: 039ae58503f3 ("hwmon: Allow to compile dell-smm-hwmon driver without /proc/i8k") Signed-off-by: Armin Wolf Acked-by: Pali Rohár Link: https://lore.kernel.org/r/20211112171440.59006-1-W_Armin@gmx.de Signed-off-by: Guenter Roeck --- drivers/hwmon/dell-smm-hwmon.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c index eaace478f508..5596c211f38d 100644 --- a/drivers/hwmon/dell-smm-hwmon.c +++ b/drivers/hwmon/dell-smm-hwmon.c @@ -627,10 +627,9 @@ static void __init i8k_init_procfs(struct device *dev) { struct dell_smm_data *data = dev_get_drvdata(dev); - /* Register the proc entry */ - proc_create_data("i8k", 0, NULL, &i8k_proc_ops, data); - - devm_add_action_or_reset(dev, i8k_exit_procfs, NULL); + /* Only register exit function if creation was successful */ + if (proc_create_data("i8k", 0, NULL, &i8k_proc_ops, data)) + devm_add_action_or_reset(dev, i8k_exit_procfs, NULL); } #else From 214f525255069a55b4664842c68bc15b2ee049f0 Mon Sep 17 00:00:00 2001 From: Zev Weiss Date: Wed, 10 Nov 2021 18:53:38 -0800 Subject: [PATCH 003/143] hwmon: (nct6775) mask out bank number in nct6775_wmi_read_value() The first call to nct6775_asuswmi_read() in nct6775_wmi_read_value() had been passing the full bank+register number instead of just the lower 8 bits. It didn't end up actually causing problems because the second argument of that function is a u8 anyway, but it seems preferable to be explicit about it at the call site (and consistent with the rest of the code). Signed-off-by: Zev Weiss Fixes: 3fbbfc27f955 ("hwmon: (nct6775) Support access via Asus WMI") Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20211111025339.27520-1-zev@bewilderbeest.net Signed-off-by: Guenter Roeck --- drivers/hwmon/nct6775.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwmon/nct6775.c b/drivers/hwmon/nct6775.c index 93dca471972e..57ce8633a725 100644 --- a/drivers/hwmon/nct6775.c +++ b/drivers/hwmon/nct6775.c @@ -1527,7 +1527,7 @@ static u16 nct6775_wmi_read_value(struct nct6775_data *data, u16 reg) nct6775_wmi_set_bank(data, reg); - err = nct6775_asuswmi_read(data->bank, reg, &tmp); + err = nct6775_asuswmi_read(data->bank, reg & 0xff, &tmp); if (err) return 0; From 0e4190d762ef2609111507e1b9553a166436f556 Mon Sep 17 00:00:00 2001 From: David Mosberger-Tang Date: Sat, 20 Nov 2021 21:28:56 +0000 Subject: [PATCH 004/143] hwmon: (sht4x) Fix EREMOTEIO errors Per datasheet, SHT4x may need up to 8.2ms for a "high repeatability" measurement to complete. Attempting to read the result too early triggers a NAK which then causes an EREMOTEIO error. This behavior has been confirmed with a logic analyzer while running the I2C bus at only 40kHz. The low frequency precludes any signal-integrity issues, which was also confirmed by the absence of any CRC8 errors. In this configuration, a NAK occurred on any read that followed the measurement command within less than 8.2ms. Signed-off-by: David Mosberger-Tang Link: https://lore.kernel.org/r/20211120212849.2300854-2-davidm@egauge.net Signed-off-by: Guenter Roeck --- drivers/hwmon/sht4x.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/hwmon/sht4x.c b/drivers/hwmon/sht4x.c index 09c2a0b06444..3415d7a0e0fc 100644 --- a/drivers/hwmon/sht4x.c +++ b/drivers/hwmon/sht4x.c @@ -23,7 +23,7 @@ /* * I2C command delays (in microseconds) */ -#define SHT4X_MEAS_DELAY 1000 +#define SHT4X_MEAS_DELAY_HPM 8200 /* see t_MEAS,h in datasheet */ #define SHT4X_DELAY_EXTRA 10000 /* @@ -90,7 +90,7 @@ static int sht4x_read_values(struct sht4x_data *data) if (ret < 0) goto unlock; - usleep_range(SHT4X_MEAS_DELAY, SHT4X_MEAS_DELAY + SHT4X_DELAY_EXTRA); + usleep_range(SHT4X_MEAS_DELAY_HPM, SHT4X_MEAS_DELAY_HPM + SHT4X_DELAY_EXTRA); ret = i2c_master_recv(client, raw_data, SHT4X_RESPONSE_LENGTH); if (ret != SHT4X_RESPONSE_LENGTH) { From 12dc48f545fd349ef2cadcc4d816706951b87998 Mon Sep 17 00:00:00 2001 From: David Heidelberg Date: Wed, 24 Nov 2021 16:51:01 +0100 Subject: [PATCH 005/143] ASoC: dt-bindings: wlf,wm8962: add missing interrupt property Both, hardware and drivers does support interrupts. Fix warnings as: arch/arm/boot/dts/tegra30-microsoft-surface-rt-efi.dt.yaml: audio-codec@1a: 'interrupt-parent', 'interrupts' do not match any of the regexes: 'pinctrl-[0-9]+' From schema: /home/runner/work/linux/linux/Documentation/devicetree/bindings/sound/wlf,wm8962.yaml Fixes: cd51b942f344 ("ASoC: dt-bindings: wlf,wm8962: Convert to json-schema") Signed-off-by: David Heidelberg Acked-by: Charles Keepax Reviewed-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20211124155101.59694-1-david@ixit.cz Signed-off-by: Mark Brown --- Documentation/devicetree/bindings/sound/wlf,wm8962.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/sound/wlf,wm8962.yaml b/Documentation/devicetree/bindings/sound/wlf,wm8962.yaml index 0e6249d7c133..5e172e9462b9 100644 --- a/Documentation/devicetree/bindings/sound/wlf,wm8962.yaml +++ b/Documentation/devicetree/bindings/sound/wlf,wm8962.yaml @@ -19,6 +19,9 @@ properties: clocks: maxItems: 1 + interrupts: + maxItems: 1 + "#sound-dai-cells": const: 0 From 70408f755f589f67957b9ec6852e6b01f858d0a2 Mon Sep 17 00:00:00 2001 From: Sameer Pujar Date: Tue, 23 Nov 2021 19:37:34 +0530 Subject: [PATCH 006/143] ASoC: tegra: Balance runtime PM count After successful application of volume/mute settings via mixer control put calls, the control returns without balancing the runtime PM count. This makes device to be always runtime active. Fix this by allowing control to reach pm_runtime_put() call. Fixes: e539891f9687 ("ASoC: tegra: Add Tegra210 based MVC driver") Cc: stable@vger.kernel.org Signed-off-by: Sameer Pujar Link: https://lore.kernel.org/r/1637676459-31191-2-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown --- sound/soc/tegra/tegra210_mvc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/tegra/tegra210_mvc.c b/sound/soc/tegra/tegra210_mvc.c index 85b155887ec2..436bb151fed1 100644 --- a/sound/soc/tegra/tegra210_mvc.c +++ b/sound/soc/tegra/tegra210_mvc.c @@ -164,7 +164,7 @@ static int tegra210_mvc_put_mute(struct snd_kcontrol *kcontrol, if (err < 0) goto end; - return 1; + err = 1; end: pm_runtime_put(cmpnt->dev); @@ -236,7 +236,7 @@ static int tegra210_mvc_put_vol(struct snd_kcontrol *kcontrol, TEGRA210_MVC_VOLUME_SWITCH_MASK, TEGRA210_MVC_VOLUME_SWITCH_TRIGGER); - return 1; + err = 1; end: pm_runtime_put(cmpnt->dev); From af120d07bbb0721708b10204beed66ed2cb0cb62 Mon Sep 17 00:00:00 2001 From: Sameer Pujar Date: Tue, 23 Nov 2021 19:37:35 +0530 Subject: [PATCH 007/143] ASoC: tegra: Use normal system sleep for SFC The driver currently subscribes for a late system sleep call. The initcall_debug log shows that suspend call for SFC device happens after the parent device (AHUB). This seems to cause suspend failure on Jetson TX2 platform. Also there is no use of having late system sleep specifically for SFC device. Fix the order by using normal system sleep. Fixes: b2f74ec53a6c ("ASoC: tegra: Add Tegra210 based SFC driver") Cc: stable@vger.kernel.org Signed-off-by: Sameer Pujar Link: https://lore.kernel.org/r/1637676459-31191-3-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown --- sound/soc/tegra/tegra210_sfc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/tegra/tegra210_sfc.c b/sound/soc/tegra/tegra210_sfc.c index 7a2227ed3df6..368f077e7bee 100644 --- a/sound/soc/tegra/tegra210_sfc.c +++ b/sound/soc/tegra/tegra210_sfc.c @@ -3594,8 +3594,8 @@ static int tegra210_sfc_platform_remove(struct platform_device *pdev) static const struct dev_pm_ops tegra210_sfc_pm_ops = { SET_RUNTIME_PM_OPS(tegra210_sfc_runtime_suspend, tegra210_sfc_runtime_resume, NULL) - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, - pm_runtime_force_resume) + SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, + pm_runtime_force_resume) }; static struct platform_driver tegra210_sfc_driver = { From c83d263a89f30d1c0274827c475f3583cf8e477f Mon Sep 17 00:00:00 2001 From: Sameer Pujar Date: Tue, 23 Nov 2021 19:37:36 +0530 Subject: [PATCH 008/143] ASoC: tegra: Use normal system sleep for MVC The driver currently subscribes for a late system sleep call. The initcall_debug log shows that suspend call for MVC device happens after the parent device (AHUB). This seems to cause suspend failure on Jetson TX2 platform. Also there is no use of having late system sleep specifically for MVC device. Fix the order by using normal system sleep. Fixes: e539891f9687 ("ASoC: tegra: Add Tegra210 based MVC driver") Cc: stable@vger.kernel.org Signed-off-by: Sameer Pujar Link: https://lore.kernel.org/r/1637676459-31191-4-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown --- sound/soc/tegra/tegra210_mvc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/tegra/tegra210_mvc.c b/sound/soc/tegra/tegra210_mvc.c index 436bb151fed1..acf59328dcb6 100644 --- a/sound/soc/tegra/tegra210_mvc.c +++ b/sound/soc/tegra/tegra210_mvc.c @@ -639,8 +639,8 @@ static int tegra210_mvc_platform_remove(struct platform_device *pdev) static const struct dev_pm_ops tegra210_mvc_pm_ops = { SET_RUNTIME_PM_OPS(tegra210_mvc_runtime_suspend, tegra210_mvc_runtime_resume, NULL) - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, - pm_runtime_force_resume) + SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, + pm_runtime_force_resume) }; static struct platform_driver tegra210_mvc_driver = { From b78400e41653b3a752a4cd17d2fcbd4a96bb4bc2 Mon Sep 17 00:00:00 2001 From: Sameer Pujar Date: Tue, 23 Nov 2021 19:37:37 +0530 Subject: [PATCH 009/143] ASoC: tegra: Use normal system sleep for Mixer The driver currently subscribes for a late system sleep call. The initcall_debug log shows that suspend call for Mixer device happens after the parent device (AHUB). This seems to cause suspend failure on Jetson TX2 platform. Also there is no use of having late system sleep specifically for Mixer device. Fix the order by using normal system sleep. Fixes: 05bb3d5ec64a ("ASoC: tegra: Add Tegra210 based Mixer driver") Cc: stable@vger.kernel.org Signed-off-by: Sameer Pujar Link: https://lore.kernel.org/r/1637676459-31191-5-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown --- sound/soc/tegra/tegra210_mixer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/tegra/tegra210_mixer.c b/sound/soc/tegra/tegra210_mixer.c index 51d375573cfa..16e679a95658 100644 --- a/sound/soc/tegra/tegra210_mixer.c +++ b/sound/soc/tegra/tegra210_mixer.c @@ -666,8 +666,8 @@ static int tegra210_mixer_platform_remove(struct platform_device *pdev) static const struct dev_pm_ops tegra210_mixer_pm_ops = { SET_RUNTIME_PM_OPS(tegra210_mixer_runtime_suspend, tegra210_mixer_runtime_resume, NULL) - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, - pm_runtime_force_resume) + SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, + pm_runtime_force_resume) }; static struct platform_driver tegra210_mixer_driver = { From 638c31d542a576714a52bb6a9a7dedff98e32a1d Mon Sep 17 00:00:00 2001 From: Sameer Pujar Date: Tue, 23 Nov 2021 19:37:38 +0530 Subject: [PATCH 010/143] ASoC: tegra: Use normal system sleep for AMX The driver currently subscribes for a late system sleep call. The initcall_debug log shows that suspend call for AMX device happens after the parent device (AHUB). This seems to cause suspend failure on Jetson TX2 platform. Also there is no use of having late system sleep specifically for AMX device. Fix the order by using normal system sleep. Fixes: 77f7df346c45 ("ASoC: tegra: Add Tegra210 based AMX driver") Cc: stable@vger.kernel.org Signed-off-by: Sameer Pujar Link: https://lore.kernel.org/r/1637676459-31191-6-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown --- sound/soc/tegra/tegra210_amx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/tegra/tegra210_amx.c b/sound/soc/tegra/tegra210_amx.c index 689576302ede..d064cc67fea6 100644 --- a/sound/soc/tegra/tegra210_amx.c +++ b/sound/soc/tegra/tegra210_amx.c @@ -583,8 +583,8 @@ static int tegra210_amx_platform_remove(struct platform_device *pdev) static const struct dev_pm_ops tegra210_amx_pm_ops = { SET_RUNTIME_PM_OPS(tegra210_amx_runtime_suspend, tegra210_amx_runtime_resume, NULL) - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, - pm_runtime_force_resume) + SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, + pm_runtime_force_resume) }; static struct platform_driver tegra210_amx_driver = { From cf36de4fc5ce5502ce5070a793addd9d49df4113 Mon Sep 17 00:00:00 2001 From: Sameer Pujar Date: Tue, 23 Nov 2021 19:37:39 +0530 Subject: [PATCH 011/143] ASoC: tegra: Use normal system sleep for ADX The driver currently subscribes for a late system sleep call. The initcall_debug log shows that suspend call for ADX device happens after the parent device (AHUB). This seems to cause suspend failure on Jetson TX2 platform. Also there is no use of having late system sleep specifically for ADX device. Fix the order by using normal system sleep. Fixes: a99ab6f395a9 ("ASoC: tegra: Add Tegra210 based ADX driver") Cc: stable@vger.kernel.org Signed-off-by: Sameer Pujar Link: https://lore.kernel.org/r/1637676459-31191-7-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown --- sound/soc/tegra/tegra210_adx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/tegra/tegra210_adx.c b/sound/soc/tegra/tegra210_adx.c index 933c4503fe50..3785cade2d9a 100644 --- a/sound/soc/tegra/tegra210_adx.c +++ b/sound/soc/tegra/tegra210_adx.c @@ -514,8 +514,8 @@ static int tegra210_adx_platform_remove(struct platform_device *pdev) static const struct dev_pm_ops tegra210_adx_pm_ops = { SET_RUNTIME_PM_OPS(tegra210_adx_runtime_suspend, tegra210_adx_runtime_resume, NULL) - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, - pm_runtime_force_resume) + SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, + pm_runtime_force_resume) }; static struct platform_driver tegra210_adx_driver = { From 4999d703c0e66f9f196b6edc0b8fdeca8846b8b6 Mon Sep 17 00:00:00 2001 From: Rob Clark Date: Wed, 17 Nov 2021 17:04:52 -0800 Subject: [PATCH 012/143] ASoC: rt5682: Fix crash due to out of scope stack vars Move the declaration of temporary arrays to somewhere that won't go out of scope before the devm_clk_hw_register() call, lest we be at the whim of the compiler for whether those stack variables get overwritten. Fixes a crash seen with gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) Fixes: edbd24ea1e5c ("ASoC: rt5682: Drop usage of __clk_get_name()") Signed-off-by: Rob Clark Reviewed-by: Stephen Boyd Link: https://lore.kernel.org/r/20211118010453.843286-1-robdclark@gmail.com Signed-off-by: Mark Brown --- sound/soc/codecs/rt5682.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/sound/soc/codecs/rt5682.c b/sound/soc/codecs/rt5682.c index 04cb747c2b12..5224123d0d3b 100644 --- a/sound/soc/codecs/rt5682.c +++ b/sound/soc/codecs/rt5682.c @@ -2858,6 +2858,8 @@ int rt5682_register_dai_clks(struct rt5682_priv *rt5682) for (i = 0; i < RT5682_DAI_NUM_CLKS; ++i) { struct clk_init_data init = { }; + struct clk_parent_data parent_data; + const struct clk_hw *parent; dai_clk_hw = &rt5682->dai_clks_hw[i]; @@ -2865,17 +2867,17 @@ int rt5682_register_dai_clks(struct rt5682_priv *rt5682) case RT5682_DAI_WCLK_IDX: /* Make MCLK the parent of WCLK */ if (rt5682->mclk) { - init.parent_data = &(struct clk_parent_data){ + parent_data = (struct clk_parent_data){ .fw_name = "mclk", }; + init.parent_data = &parent_data; init.num_parents = 1; } break; case RT5682_DAI_BCLK_IDX: /* Make WCLK the parent of BCLK */ - init.parent_hws = &(const struct clk_hw *){ - &rt5682->dai_clks_hw[RT5682_DAI_WCLK_IDX] - }; + parent = &rt5682->dai_clks_hw[RT5682_DAI_WCLK_IDX]; + init.parent_hws = &parent; init.num_parents = 1; break; default: From 750dc2f622192c08664a15413bc9746d9cbc4361 Mon Sep 17 00:00:00 2001 From: Rob Clark Date: Wed, 17 Nov 2021 17:04:53 -0800 Subject: [PATCH 013/143] ASoC: rt5682s: Fix crash due to out of scope stack vars Move the declaration of temporary arrays to somewhere that won't go out of scope before the devm_clk_hw_register() call, lest we be at the whim of the compiler for whether those stack variables get overwritten. Fixes a crash seen with gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) Fixes: bdd229ab26be ("ASoC: rt5682s: Add driver for ALC5682I-VS codec") Signed-off-by: Rob Clark Reviewed-by: Stephen Boyd Link: https://lore.kernel.org/r/20211118010453.843286-2-robdclark@gmail.com Signed-off-by: Mark Brown --- sound/soc/codecs/rt5682s.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/sound/soc/codecs/rt5682s.c b/sound/soc/codecs/rt5682s.c index 470957fcad6b..d49a4f68566d 100644 --- a/sound/soc/codecs/rt5682s.c +++ b/sound/soc/codecs/rt5682s.c @@ -2693,6 +2693,8 @@ static int rt5682s_register_dai_clks(struct snd_soc_component *component) for (i = 0; i < RT5682S_DAI_NUM_CLKS; ++i) { struct clk_init_data init = { }; + struct clk_parent_data parent_data; + const struct clk_hw *parent; dai_clk_hw = &rt5682s->dai_clks_hw[i]; @@ -2700,17 +2702,17 @@ static int rt5682s_register_dai_clks(struct snd_soc_component *component) case RT5682S_DAI_WCLK_IDX: /* Make MCLK the parent of WCLK */ if (rt5682s->mclk) { - init.parent_data = &(struct clk_parent_data){ + parent_data = (struct clk_parent_data){ .fw_name = "mclk", }; + init.parent_data = &parent_data; init.num_parents = 1; } break; case RT5682S_DAI_BCLK_IDX: /* Make WCLK the parent of BCLK */ - init.parent_hws = &(const struct clk_hw *){ - &rt5682s->dai_clks_hw[RT5682S_DAI_WCLK_IDX] - }; + parent = &rt5682s->dai_clks_hw[RT5682S_DAI_WCLK_IDX]; + init.parent_hws = &parent; init.num_parents = 1; break; default: From 53689f7f91a2ab0079422e1d1b6e096cf68d58f4 Mon Sep 17 00:00:00 2001 From: Nicolas Frattaroli Date: Thu, 25 Nov 2021 09:48:59 +0100 Subject: [PATCH 014/143] ASoC: rockchip: i2s_tdm: Dup static DAI template Previously, the DAI template was used directly, which lead to fun bugs such as "why is my channels_max changing?" when one instantiated more than one i2s_tdm IP block in a device tree. This change makes it so that we instead duplicate the template struct, and then use that. Fixes: 081068fd6414 ("ASoC: rockchip: add support for i2s-tdm controller") Signed-off-by: Nicolas Frattaroli Link: https://lore.kernel.org/r/20211125084900.417102-1-frattaroli.nicolas@gmail.com Signed-off-by: Mark Brown --- sound/soc/rockchip/rockchip_i2s_tdm.c | 52 ++++++++++++++++----------- 1 file changed, 31 insertions(+), 21 deletions(-) diff --git a/sound/soc/rockchip/rockchip_i2s_tdm.c b/sound/soc/rockchip/rockchip_i2s_tdm.c index 17b9b287853a..5f9cb5c4c7f0 100644 --- a/sound/soc/rockchip/rockchip_i2s_tdm.c +++ b/sound/soc/rockchip/rockchip_i2s_tdm.c @@ -95,6 +95,7 @@ struct rk_i2s_tdm_dev { spinlock_t lock; /* xfer lock */ bool has_playback; bool has_capture; + struct snd_soc_dai_driver *dai; }; static int to_ch_num(unsigned int val) @@ -1310,19 +1311,14 @@ static const struct of_device_id rockchip_i2s_tdm_match[] = { {}, }; -static struct snd_soc_dai_driver i2s_tdm_dai = { +static const struct snd_soc_dai_driver i2s_tdm_dai = { .probe = rockchip_i2s_tdm_dai_probe, - .playback = { - .stream_name = "Playback", - }, - .capture = { - .stream_name = "Capture", - }, .ops = &rockchip_i2s_tdm_dai_ops, }; -static void rockchip_i2s_tdm_init_dai(struct rk_i2s_tdm_dev *i2s_tdm) +static int rockchip_i2s_tdm_init_dai(struct rk_i2s_tdm_dev *i2s_tdm) { + struct snd_soc_dai_driver *dai; struct property *dma_names; const char *dma_name; u64 formats = (SNDRV_PCM_FMTBIT_S8 | SNDRV_PCM_FMTBIT_S16_LE | @@ -1337,19 +1333,33 @@ static void rockchip_i2s_tdm_init_dai(struct rk_i2s_tdm_dev *i2s_tdm) i2s_tdm->has_capture = true; } + dai = devm_kmemdup(i2s_tdm->dev, &i2s_tdm_dai, + sizeof(*dai), GFP_KERNEL); + if (!dai) + return -ENOMEM; + if (i2s_tdm->has_playback) { - i2s_tdm_dai.playback.channels_min = 2; - i2s_tdm_dai.playback.channels_max = 8; - i2s_tdm_dai.playback.rates = SNDRV_PCM_RATE_8000_192000; - i2s_tdm_dai.playback.formats = formats; + dai->playback.stream_name = "Playback"; + dai->playback.channels_min = 2; + dai->playback.channels_max = 8; + dai->playback.rates = SNDRV_PCM_RATE_8000_192000; + dai->playback.formats = formats; } if (i2s_tdm->has_capture) { - i2s_tdm_dai.capture.channels_min = 2; - i2s_tdm_dai.capture.channels_max = 8; - i2s_tdm_dai.capture.rates = SNDRV_PCM_RATE_8000_192000; - i2s_tdm_dai.capture.formats = formats; + dai->capture.stream_name = "Capture"; + dai->capture.channels_min = 2; + dai->capture.channels_max = 8; + dai->capture.rates = SNDRV_PCM_RATE_8000_192000; + dai->capture.formats = formats; } + + if (i2s_tdm->clk_trcm != TRCM_TXRX) + dai->symmetric_rate = 1; + + i2s_tdm->dai = dai; + + return 0; } static int rockchip_i2s_tdm_path_check(struct rk_i2s_tdm_dev *i2s_tdm, @@ -1541,8 +1551,6 @@ static int rockchip_i2s_tdm_probe(struct platform_device *pdev) spin_lock_init(&i2s_tdm->lock); i2s_tdm->soc_data = (struct rk_i2s_soc_data *)of_id->data; - rockchip_i2s_tdm_init_dai(i2s_tdm); - i2s_tdm->frame_width = 64; i2s_tdm->clk_trcm = TRCM_TXRX; @@ -1555,8 +1563,10 @@ static int rockchip_i2s_tdm_probe(struct platform_device *pdev) } i2s_tdm->clk_trcm = TRCM_RX; } - if (i2s_tdm->clk_trcm != TRCM_TXRX) - i2s_tdm_dai.symmetric_rate = 1; + + ret = rockchip_i2s_tdm_init_dai(i2s_tdm); + if (ret) + return ret; i2s_tdm->grf = syscon_regmap_lookup_by_phandle(node, "rockchip,grf"); if (IS_ERR(i2s_tdm->grf)) @@ -1678,7 +1688,7 @@ static int rockchip_i2s_tdm_probe(struct platform_device *pdev) ret = devm_snd_soc_register_component(&pdev->dev, &rockchip_i2s_tdm_component, - &i2s_tdm_dai, 1); + i2s_tdm->dai, 1); if (ret) { dev_err(&pdev->dev, "Could not register DAI\n"); From d5c137f41352e8dd864522c417b45d8d1aebca68 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 30 Nov 2021 15:56:33 +0300 Subject: [PATCH 015/143] ASoC: amd: fix uninitialized variable in snd_acp6x_probe() The "index" is potentially used without being initialized on the error path. Fixes: fc329c1de498 ("ASoC: amd: add platform devices for acp6x pdm driver and dmic driver") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20211130125633.GA24941@kili Signed-off-by: Mark Brown --- sound/soc/amd/yc/pci-acp6x.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sound/soc/amd/yc/pci-acp6x.c b/sound/soc/amd/yc/pci-acp6x.c index 957eeb6fb8e3..7e9a9a9d8ddd 100644 --- a/sound/soc/amd/yc/pci-acp6x.c +++ b/sound/soc/amd/yc/pci-acp6x.c @@ -146,10 +146,11 @@ static int snd_acp6x_probe(struct pci_dev *pci, { struct acp6x_dev_data *adata; struct platform_device_info pdevinfo[ACP6x_DEVS]; - int ret, index; + int index = 0; int val = 0x00; u32 addr; unsigned int irqflags; + int ret; irqflags = IRQF_SHARED; /* Yellow Carp device check */ From 046aede2f847676f93a2ea4f48b77909c51dba40 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Tue, 30 Nov 2021 11:06:06 +0200 Subject: [PATCH 016/143] ASoC: SOF: Intel: Retry codec probing if it fails On the latest Lenovo Thinkstation laptops, we often experience the speaker failure after rebooting, check the dmesg, we could see: sof-audio-pci-intel-tgl 0000:00:1f.3: codec #0 probe error, ret: -5 The analogue codec on the machine is ALC287, then we designed a testcase to reboot and check the codec probing result repeatedly, we found the analogue codec probing always failed at least once within several minutes to several hours (roughly 1 reboot per min). This issue happens on all laptops of this Thinkstation model, but with legacy HDA driver, we couldn't reproduce this issue on those laptops. And so far, this issue is not reproduced on machines which don't belong to this model. We tried to make the hda_dsp_ctrl_init_chip() same as hda_intel_init_chip() which is the controller init routine in the legacy HDA driver, but it didn't help. We found when issue happens, the resp is -1, and if we let driver re-run send_cmd() and get_response(), it will get the correct response 10ec0287, then driver continues the rest work, finally boot to the desktop and all audio function work well. Here adding codec probing retries to 3 times, it could fix the issue on this Thinkstation model, and it doesn't bring impact to other machines. Reviewed-by: Bard Liao Reviewed-by: Pierre-Louis Bossart Signed-off-by: Hui Wang Signed-off-by: Kai Vehmanen Link: https://lore.kernel.org/r/20211130090606.529348-1-kai.vehmanen@linux.intel.com Signed-off-by: Mark Brown --- sound/soc/sof/intel/hda-codec.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/sound/soc/sof/intel/hda-codec.c b/sound/soc/sof/intel/hda-codec.c index 6744318de612..13cd96e6724a 100644 --- a/sound/soc/sof/intel/hda-codec.c +++ b/sound/soc/sof/intel/hda-codec.c @@ -22,6 +22,7 @@ #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_AUDIO_CODEC) #define IDISP_VID_INTEL 0x80860000 +#define CODEC_PROBE_RETRIES 3 /* load the legacy HDA codec driver */ static int request_codec_module(struct hda_codec *codec) @@ -121,12 +122,15 @@ static int hda_codec_probe(struct snd_sof_dev *sdev, int address, u32 hda_cmd = (address << 28) | (AC_NODE_ROOT << 20) | (AC_VERB_PARAMETERS << 8) | AC_PAR_VENDOR_ID; u32 resp = -1; - int ret; + int ret, retry = 0; + + do { + mutex_lock(&hbus->core.cmd_mutex); + snd_hdac_bus_send_cmd(&hbus->core, hda_cmd); + snd_hdac_bus_get_response(&hbus->core, address, &resp); + mutex_unlock(&hbus->core.cmd_mutex); + } while (resp == -1 && retry++ < CODEC_PROBE_RETRIES); - mutex_lock(&hbus->core.cmd_mutex); - snd_hdac_bus_send_cmd(&hbus->core, hda_cmd); - snd_hdac_bus_get_response(&hbus->core, address, &resp); - mutex_unlock(&hbus->core.cmd_mutex); if (resp == -1) return -EIO; dev_dbg(sdev->dev, "HDA codec #%d probed OK: response: %x\n", From a2ca752055edd39be38b887e264d3de7ca2bc1bb Mon Sep 17 00:00:00 2001 From: Billy Tsai Date: Tue, 30 Nov 2021 17:22:12 +0800 Subject: [PATCH 017/143] hwmon: (pwm-fan) Ensure the fan going on in .probe() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Before commit 86585c61972f ("hwmon: (pwm-fan) stop using legacy PWM functions and some cleanups") pwm_apply_state() was called unconditionally in pwm_fan_probe(). In this commit this direct call was replaced by a call to __set_pwm(ct, MAX_PWM) which however is a noop if ctx->pwm_value already matches the value to set. After probe the fan is supposed to run at full speed, and the internal driver state suggests it does, but this isn't asserted and depending on bootloader and pwm low-level driver, the fan might just be off. So drop setting pwm_value to MAX_PWM to ensure the check in __set_pwm doesn't make it exit early and the fan goes on as intended. Cc: stable@vger.kernel.org Fixes: 86585c61972f ("hwmon: (pwm-fan) stop using legacy PWM functions and some cleanups") Signed-off-by: Billy Tsai Reviewed-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20211130092212.17783-1-billy_tsai@aspeedtech.com Signed-off-by: Guenter Roeck --- drivers/hwmon/pwm-fan.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/hwmon/pwm-fan.c b/drivers/hwmon/pwm-fan.c index 17518b4cab1b..f12b9a28a232 100644 --- a/drivers/hwmon/pwm-fan.c +++ b/drivers/hwmon/pwm-fan.c @@ -336,8 +336,6 @@ static int pwm_fan_probe(struct platform_device *pdev) return ret; } - ctx->pwm_value = MAX_PWM; - pwm_init_state(ctx->pwm, &ctx->pwm_state); /* From 7dba402807a85fa3723f4a27504813caf81cc9d7 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Tue, 30 Nov 2021 14:23:09 +0100 Subject: [PATCH 018/143] mmc: renesas_sdhi: initialize variable properly when tuning 'cmd_error' is not necessarily initialized on some error paths in mmc_send_tuning(). Initialize it. Fixes: 2c9017d0b5d3 ("mmc: renesas_sdhi: abort tuning when timeout detected") Reported-by: Dan Carpenter Signed-off-by: Wolfram Sang Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211130132309.18246-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson --- drivers/mmc/host/renesas_sdhi_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c index a4407f391f66..f5b2684ad805 100644 --- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -673,7 +673,7 @@ static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode) /* Issue CMD19 twice for each tap */ for (i = 0; i < 2 * priv->tap_num; i++) { - int cmd_error; + int cmd_error = 0; /* Set sampling clock position */ sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_TAPSET, i % priv->tap_num); From 4739d88ad8e1900f809f8a5c98f3c1b65bf76220 Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Tue, 30 Nov 2021 16:31:10 +0000 Subject: [PATCH 019/143] ASoC: qdsp6: q6routing: Fix return value from msm_routing_put_audio_mixer msm_routing_put_audio_mixer() can return incorrect value in various scenarios. scenario 1: amixer cset iface=MIXER,name='SLIMBUS_0_RX Audio Mixer MultiMedia1' 1 amixer cset iface=MIXER,name='SLIMBUS_0_RX Audio Mixer MultiMedia1' 0 return value is 0 instead of 1 eventhough value was changed scenario 2: amixer cset iface=MIXER,name='SLIMBUS_0_RX Audio Mixer MultiMedia1' 1 amixer cset iface=MIXER,name='SLIMBUS_0_RX Audio Mixer MultiMedia1' 1 return value is 1 instead of 0 eventhough the value was not changed scenario 3: amixer cset iface=MIXER,name='SLIMBUS_0_RX Audio Mixer MultiMedia1' 0 return value is 1 instead of 0 eventhough the value was not changed Fix this by adding checks, so that change notifications are sent correctly. Fixes: e3a33673e845 ("ASoC: qdsp6: q6routing: Add q6routing driver") Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20211130163110.5628-1-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown --- sound/soc/qcom/qdsp6/q6routing.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/sound/soc/qcom/qdsp6/q6routing.c b/sound/soc/qcom/qdsp6/q6routing.c index cd74681e811e..928fd23e2c27 100644 --- a/sound/soc/qcom/qdsp6/q6routing.c +++ b/sound/soc/qcom/qdsp6/q6routing.c @@ -498,14 +498,16 @@ static int msm_routing_put_audio_mixer(struct snd_kcontrol *kcontrol, struct session_data *session = &data->sessions[session_id]; if (ucontrol->value.integer.value[0]) { + if (session->port_id == be_id) + return 0; + session->port_id = be_id; snd_soc_dapm_mixer_update_power(dapm, kcontrol, 1, update); } else { - if (session->port_id == be_id) { - session->port_id = -1; + if (session->port_id == -1 || session->port_id != be_id) return 0; - } + session->port_id = -1; snd_soc_dapm_mixer_update_power(dapm, kcontrol, 0, update); } From 23ba28616d3063bd4c4953598ed5e439ca891101 Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Tue, 30 Nov 2021 16:05:04 +0000 Subject: [PATCH 020/143] ASoC: codecs: wcd934x: handle channel mappping list correctly Currently each channel is added as list to dai channel list, however there is danger of adding same channel to multiple dai channel list which endups corrupting the other list where its already added. This patch ensures that the channel is actually free before adding to the dai channel list and also ensures that the channel is on the list before deleting it. This check was missing previously, and we did not hit this issue as we were testing very simple usecases with sequence of amixer commands. Fixes: a70d9245759a ("ASoC: wcd934x: add capture dapm widgets") Fixes: dd9eb19b5673 ("ASoC: wcd934x: add playback dapm widgets") Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20211130160507.22180-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown --- sound/soc/codecs/wcd934x.c | 119 +++++++++++++++++++++++++++---------- 1 file changed, 88 insertions(+), 31 deletions(-) diff --git a/sound/soc/codecs/wcd934x.c b/sound/soc/codecs/wcd934x.c index 4f568abd59e2..eb4e2f2a24ae 100644 --- a/sound/soc/codecs/wcd934x.c +++ b/sound/soc/codecs/wcd934x.c @@ -3326,6 +3326,31 @@ static int slim_rx_mux_get(struct snd_kcontrol *kc, return 0; } +static int slim_rx_mux_to_dai_id(int mux) +{ + int aif_id; + + switch (mux) { + case 1: + aif_id = AIF1_PB; + break; + case 2: + aif_id = AIF2_PB; + break; + case 3: + aif_id = AIF3_PB; + break; + case 4: + aif_id = AIF4_PB; + break; + default: + aif_id = -1; + break; + } + + return aif_id; +} + static int slim_rx_mux_put(struct snd_kcontrol *kc, struct snd_ctl_elem_value *ucontrol) { @@ -3333,43 +3358,59 @@ static int slim_rx_mux_put(struct snd_kcontrol *kc, struct wcd934x_codec *wcd = dev_get_drvdata(w->dapm->dev); struct soc_enum *e = (struct soc_enum *)kc->private_value; struct snd_soc_dapm_update *update = NULL; + struct wcd934x_slim_ch *ch, *c; u32 port_id = w->shift; + bool found = false; + int mux_idx; + int prev_mux_idx = wcd->rx_port_value[port_id]; + int aif_id; - if (wcd->rx_port_value[port_id] == ucontrol->value.enumerated.item[0]) + mux_idx = ucontrol->value.enumerated.item[0]; + + if (mux_idx == prev_mux_idx) return 0; - wcd->rx_port_value[port_id] = ucontrol->value.enumerated.item[0]; - - switch (wcd->rx_port_value[port_id]) { + switch(mux_idx) { case 0: - list_del_init(&wcd->rx_chs[port_id].list); + aif_id = slim_rx_mux_to_dai_id(prev_mux_idx); + if (aif_id < 0) + return 0; + + list_for_each_entry_safe(ch, c, &wcd->dai[aif_id].slim_ch_list, list) { + if (ch->port == port_id + WCD934X_RX_START) { + found = true; + list_del_init(&ch->list); + break; + } + } + if (!found) + return 0; + break; - case 1: - list_add_tail(&wcd->rx_chs[port_id].list, - &wcd->dai[AIF1_PB].slim_ch_list); - break; - case 2: - list_add_tail(&wcd->rx_chs[port_id].list, - &wcd->dai[AIF2_PB].slim_ch_list); - break; - case 3: - list_add_tail(&wcd->rx_chs[port_id].list, - &wcd->dai[AIF3_PB].slim_ch_list); - break; - case 4: - list_add_tail(&wcd->rx_chs[port_id].list, - &wcd->dai[AIF4_PB].slim_ch_list); + case 1 ... 4: + aif_id = slim_rx_mux_to_dai_id(mux_idx); + if (aif_id < 0) + return 0; + + if (list_empty(&wcd->rx_chs[port_id].list)) { + list_add_tail(&wcd->rx_chs[port_id].list, + &wcd->dai[aif_id].slim_ch_list); + } else { + dev_err(wcd->dev ,"SLIM_RX%d PORT is busy\n", port_id); + return 0; + } break; + default: - dev_err(wcd->dev, "Unknown AIF %d\n", - wcd->rx_port_value[port_id]); + dev_err(wcd->dev, "Unknown AIF %d\n", mux_idx); goto err; } + wcd->rx_port_value[port_id] = mux_idx; snd_soc_dapm_mux_update_power(w->dapm, kc, wcd->rx_port_value[port_id], e, update); - return 0; + return 1; err: return -EINVAL; } @@ -3815,6 +3856,7 @@ static int slim_tx_mixer_put(struct snd_kcontrol *kc, struct soc_mixer_control *mixer = (struct soc_mixer_control *)kc->private_value; int enable = ucontrol->value.integer.value[0]; + struct wcd934x_slim_ch *ch, *c; int dai_id = widget->shift; int port_id = mixer->shift; @@ -3822,17 +3864,32 @@ static int slim_tx_mixer_put(struct snd_kcontrol *kc, if (enable == wcd->tx_port_value[port_id]) return 0; + if (enable) { + if (list_empty(&wcd->tx_chs[port_id].list)) { + list_add_tail(&wcd->tx_chs[port_id].list, + &wcd->dai[dai_id].slim_ch_list); + } else { + dev_err(wcd->dev ,"SLIM_TX%d PORT is busy\n", port_id); + return 0; + } + } else { + bool found = false; + + list_for_each_entry_safe(ch, c, &wcd->dai[dai_id].slim_ch_list, list) { + if (ch->port == port_id) { + found = true; + list_del_init(&wcd->tx_chs[port_id].list); + break; + } + } + if (!found) + return 0; + } + wcd->tx_port_value[port_id] = enable; - - if (enable) - list_add_tail(&wcd->tx_chs[port_id].list, - &wcd->dai[dai_id].slim_ch_list); - else - list_del_init(&wcd->tx_chs[port_id].list); - snd_soc_dapm_mixer_update_power(widget->dapm, kc, enable, update); - return 0; + return 1; } static const struct snd_kcontrol_new aif1_slim_cap_mixer[] = { From d9be0ff4796d1b6f5ee391c1b7e3653a43cedfab Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Tue, 30 Nov 2021 16:05:06 +0000 Subject: [PATCH 021/143] ASoC: codecs: wcd934x: return correct value from mixer put wcd934x_compander_set() currently returns zero eventhough it changes the value. Fix this, so that change notifications are sent correctly. Fixes: 1cde8b822332 ("ASoC: wcd934x: add basic controls") Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20211130160507.22180-4-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown --- sound/soc/codecs/wcd934x.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/sound/soc/codecs/wcd934x.c b/sound/soc/codecs/wcd934x.c index eb4e2f2a24ae..e63c6b723d76 100644 --- a/sound/soc/codecs/wcd934x.c +++ b/sound/soc/codecs/wcd934x.c @@ -3256,6 +3256,9 @@ static int wcd934x_compander_set(struct snd_kcontrol *kc, int value = ucontrol->value.integer.value[0]; int sel; + if (wcd->comp_enabled[comp] == value) + return 0; + wcd->comp_enabled[comp] = value; sel = value ? WCD934X_HPH_GAIN_SRC_SEL_COMPANDER : WCD934X_HPH_GAIN_SRC_SEL_REGISTER; @@ -3279,10 +3282,10 @@ static int wcd934x_compander_set(struct snd_kcontrol *kc, case COMPANDER_8: break; default: - break; + return 0; } - return 0; + return 1; } static int wcd934x_rx_hph_mode_get(struct snd_kcontrol *kc, From 3fc27e9a1f619b50700f020e6cd270c1b74755f0 Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Tue, 30 Nov 2021 16:05:07 +0000 Subject: [PATCH 022/143] ASoC: codecs: wsa881x: fix return values from kcontrol put wsa881x_set_port() and wsa881x_put_pa_gain() currently returns zero eventhough it changes the value. Fix this, so that change notifications are sent correctly. Fixes: a0aab9e1404a ("ASoC: codecs: add wsa881x amplifier support") Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20211130160507.22180-5-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown --- sound/soc/codecs/wsa881x.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/sound/soc/codecs/wsa881x.c b/sound/soc/codecs/wsa881x.c index 2da4a5fa7a18..564b78f3cdd0 100644 --- a/sound/soc/codecs/wsa881x.c +++ b/sound/soc/codecs/wsa881x.c @@ -772,7 +772,8 @@ static int wsa881x_put_pa_gain(struct snd_kcontrol *kc, usleep_range(1000, 1010); } - return 0; + + return 1; } static int wsa881x_get_port(struct snd_kcontrol *kcontrol, @@ -816,15 +817,22 @@ static int wsa881x_set_port(struct snd_kcontrol *kcontrol, (struct soc_mixer_control *)kcontrol->private_value; int portidx = mixer->reg; - if (ucontrol->value.integer.value[0]) + if (ucontrol->value.integer.value[0]) { + if (data->port_enable[portidx]) + return 0; + data->port_enable[portidx] = true; - else + } else { + if (!data->port_enable[portidx]) + return 0; + data->port_enable[portidx] = false; + } if (portidx == WSA881X_PORT_BOOST) /* Boost Switch */ wsa881x_boost_ctrl(comp, data->port_enable[portidx]); - return 0; + return 1; } static const char * const smart_boost_lvl_text[] = { From cc5faf26decfcfd9edbafee9b7204ac9a9514c12 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 25 Nov 2021 16:21:54 +0100 Subject: [PATCH 023/143] dt-bindings: iio: adc: exynos-adc: Fix node name in example "make dt_binding_check": Documentation/devicetree/bindings/iio/adc/samsung,exynos-adc.example.dt.yaml: ncp15wb473: $nodename:0: 'ncp15wb473' does not match '^thermistor(.*)?$' From schema: Documentation/devicetree/bindings/hwmon/ntc-thermistor.yaml Signed-off-by: Geert Uytterhoeven Reviewed-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20211125152154.162780-1-geert@linux-m68k.org Signed-off-by: Rob Herring --- .../devicetree/bindings/iio/adc/samsung,exynos-adc.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/iio/adc/samsung,exynos-adc.yaml b/Documentation/devicetree/bindings/iio/adc/samsung,exynos-adc.yaml index c65921e66dc1..81c87295912c 100644 --- a/Documentation/devicetree/bindings/iio/adc/samsung,exynos-adc.yaml +++ b/Documentation/devicetree/bindings/iio/adc/samsung,exynos-adc.yaml @@ -136,7 +136,7 @@ examples: samsung,syscon-phandle = <&pmu_system_controller>; /* NTC thermistor is a hwmon device */ - ncp15wb473 { + thermistor { compatible = "murata,ncp15wb473"; pullup-uv = <1800000>; pullup-ohm = <47000>; From 39bd54d43b3f8b3c7b3a75f5d868d8bb858860e7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Marek=20Beh=C3=BAn?= Date: Thu, 25 Nov 2021 17:01:48 +0100 Subject: [PATCH 024/143] Revert "PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 239edf686c14a9ff926dec2f350289ed7adfefe2. 239edf686c14 ("PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge") added support for the Type 1 Expansion ROM BAR at config offset 0x38, based on the register being listed in the Marvell Armada A3720 spec. But the spec doesn't document it at all for RC mode, and there is no ROM in the SOC, so remove this emulation for now. The PCI bridge which represents aardvark's PCIe Root Port has an Expansion ROM Base Address register at offset 0x30, but its meaning is different than PCI's Expansion ROM BAR register, although the layout is the same. (This is why we thought it does the same thing.) First: there is no ROM (or part of BootROM) in the A3720 SOC dedicated for PCIe Root Port (or controller in RC mode) containing executable code that would initialize the Root Port, suitable for execution in bootloader (this is how Expansion ROM BAR is used on x86). Second: in A3720 spec the register (address 0xD0070030) is not documented at all for Root Complex mode, but similar to other BAR registers, it has an "entangled partner" in register 0xD0075920, which does address translation for the BAR in 0xD0070030: - the BAR register sets the address from the view of PCIe bus - the translation register sets the address from the view of the CPU The other BAR registers also have this entangled partner, and they can be used to: - in RC mode: address-checking on the receive side of the RC (they can define address ranges for memory accesses from remote Endpoints to the RC) - in Endpoint mode: allow the remote CPU to access memory on A3720 The Expansion ROM BAR has only the Endpoint part documented, but from the similarities we think that it can also be used in RC mode in that way. So either Expansion ROM BAR has different meaning (if the hypothesis above is true), or we don't know it's meaning (since it is not documented for RC mode). Remove the register from the emulated bridge accessing functions. [bhelgaas: summarize reason for removal (first paragraph)] Fixes: 239edf686c14 ("PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge") Link: https://lore.kernel.org/r/20211125160148.26029-3-kabel@kernel.org Signed-off-by: Marek Behún Signed-off-by: Bjorn Helgaas Reviewed-by: Pali Rohár --- drivers/pci/controller/pci-aardvark.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c index c5300d49807a..c3b725afa11f 100644 --- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -32,7 +32,6 @@ #define PCIE_CORE_DEV_ID_REG 0x0 #define PCIE_CORE_CMD_STATUS_REG 0x4 #define PCIE_CORE_DEV_REV_REG 0x8 -#define PCIE_CORE_EXP_ROM_BAR_REG 0x30 #define PCIE_CORE_PCIEXP_CAP 0xc0 #define PCIE_CORE_ERR_CAPCTL_REG 0x118 #define PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX BIT(5) @@ -774,10 +773,6 @@ advk_pci_bridge_emul_base_conf_read(struct pci_bridge_emul *bridge, *value = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG); return PCI_BRIDGE_EMUL_HANDLED; - case PCI_ROM_ADDRESS1: - *value = advk_readl(pcie, PCIE_CORE_EXP_ROM_BAR_REG); - return PCI_BRIDGE_EMUL_HANDLED; - case PCI_INTERRUPT_LINE: { /* * From the whole 32bit register we support reading from HW only @@ -810,10 +805,6 @@ advk_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge, advk_writel(pcie, new, PCIE_CORE_CMD_STATUS_REG); break; - case PCI_ROM_ADDRESS1: - advk_writel(pcie, new, PCIE_CORE_EXP_ROM_BAR_REG); - break; - case PCI_INTERRUPT_LINE: if (mask & (PCI_BRIDGE_CTL_BUS_RESET << 16)) { u32 val = advk_readl(pcie, PCIE_CORE_CTRL1_REG); From 9d2479c960875ca1239bcb899f386970c13d9cfe Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 1 Dec 2021 08:36:04 +0100 Subject: [PATCH 025/143] ALSA: pcm: oss: Fix negative period/buffer sizes The period size calculation in OSS layer may receive a negative value as an error, but the code there assumes only the positive values and handle them with size_t. Due to that, a too big value may be passed to the lower layers. This patch changes the code to handle with ssize_t and adds the proper error checks appropriately. Reported-by: syzbot+bb348e9f9a954d42746f@syzkaller.appspotmail.com Reported-by: Bixuan Cui Cc: Link: https://lore.kernel.org/r/1638270978-42412-1-git-send-email-cuibixuan@linux.alibaba.com Link: https://lore.kernel.org/r/20211201073606.11660-2-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/core/oss/pcm_oss.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c index 82a818734a5f..bec7590bc84b 100644 --- a/sound/core/oss/pcm_oss.c +++ b/sound/core/oss/pcm_oss.c @@ -147,7 +147,7 @@ snd_pcm_hw_param_value_min(const struct snd_pcm_hw_params *params, * * Return the maximum value for field PAR. */ -static unsigned int +static int snd_pcm_hw_param_value_max(const struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var, int *dir) { @@ -682,18 +682,24 @@ static int snd_pcm_oss_period_size(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *oss_params, struct snd_pcm_hw_params *slave_params) { - size_t s; - size_t oss_buffer_size, oss_period_size, oss_periods; - size_t min_period_size, max_period_size; + ssize_t s; + ssize_t oss_buffer_size; + ssize_t oss_period_size, oss_periods; + ssize_t min_period_size, max_period_size; struct snd_pcm_runtime *runtime = substream->runtime; size_t oss_frame_size; oss_frame_size = snd_pcm_format_physical_width(params_format(oss_params)) * params_channels(oss_params) / 8; + oss_buffer_size = snd_pcm_hw_param_value_max(slave_params, + SNDRV_PCM_HW_PARAM_BUFFER_SIZE, + NULL); + if (oss_buffer_size <= 0) + return -EINVAL; oss_buffer_size = snd_pcm_plug_client_size(substream, - snd_pcm_hw_param_value_max(slave_params, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, NULL)) * oss_frame_size; - if (!oss_buffer_size) + oss_buffer_size * oss_frame_size); + if (oss_buffer_size <= 0) return -EINVAL; oss_buffer_size = rounddown_pow_of_two(oss_buffer_size); if (atomic_read(&substream->mmap_count)) { @@ -730,7 +736,7 @@ static int snd_pcm_oss_period_size(struct snd_pcm_substream *substream, min_period_size = snd_pcm_plug_client_size(substream, snd_pcm_hw_param_value_min(slave_params, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, NULL)); - if (min_period_size) { + if (min_period_size > 0) { min_period_size *= oss_frame_size; min_period_size = roundup_pow_of_two(min_period_size); if (oss_period_size < min_period_size) @@ -739,7 +745,7 @@ static int snd_pcm_oss_period_size(struct snd_pcm_substream *substream, max_period_size = snd_pcm_plug_client_size(substream, snd_pcm_hw_param_value_max(slave_params, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, NULL)); - if (max_period_size) { + if (max_period_size > 0) { max_period_size *= oss_frame_size; max_period_size = rounddown_pow_of_two(max_period_size); if (oss_period_size > max_period_size) @@ -752,7 +758,7 @@ static int snd_pcm_oss_period_size(struct snd_pcm_substream *substream, oss_periods = substream->oss.setup.periods; s = snd_pcm_hw_param_value_max(slave_params, SNDRV_PCM_HW_PARAM_PERIODS, NULL); - if (runtime->oss.maxfrags && s > runtime->oss.maxfrags) + if (s > 0 && runtime->oss.maxfrags && s > runtime->oss.maxfrags) s = runtime->oss.maxfrags; if (oss_periods > s) oss_periods = s; From 8839c8c0f77ab8fc0463f4ab8b37fca3f70677c2 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 1 Dec 2021 08:36:05 +0100 Subject: [PATCH 026/143] ALSA: pcm: oss: Limit the period size to 16MB Set the practical limit to the period size (the fragment shift in OSS) instead of a full 31bit; a too large value could lead to the exhaust of memory as we allocate temporary buffers of the period size, too. As of this patch, we set to 16MB limit, which should cover all use cases. Reported-by: syzbot+bb348e9f9a954d42746f@syzkaller.appspotmail.com Reported-by: Bixuan Cui Cc: Link: https://lore.kernel.org/r/1638270978-42412-1-git-send-email-cuibixuan@linux.alibaba.com Link: https://lore.kernel.org/r/20211201073606.11660-3-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/core/oss/pcm_oss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c index bec7590bc84b..89c4910daf02 100644 --- a/sound/core/oss/pcm_oss.c +++ b/sound/core/oss/pcm_oss.c @@ -1962,7 +1962,7 @@ static int snd_pcm_oss_set_fragment1(struct snd_pcm_substream *substream, unsign if (runtime->oss.subdivision || runtime->oss.fragshift) return -EINVAL; fragshift = val & 0xffff; - if (fragshift >= 31) + if (fragshift >= 25) /* should be large enough */ return -EINVAL; runtime->oss.fragshift = fragshift; runtime->oss.maxfrags = (val >> 16) & 0xffff; From 6665bb30a6b1a4a853d52557c05482ee50e71391 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 1 Dec 2021 08:36:06 +0100 Subject: [PATCH 027/143] ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*() A couple of calls in snd_pcm_oss_change_params_locked() ignore the possible errors. Catch those errors and abort the operation for avoiding further problems. Cc: Link: https://lore.kernel.org/r/20211201073606.11660-4-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/core/oss/pcm_oss.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c index 89c4910daf02..20a0a4771b9a 100644 --- a/sound/core/oss/pcm_oss.c +++ b/sound/core/oss/pcm_oss.c @@ -884,8 +884,15 @@ static int snd_pcm_oss_change_params_locked(struct snd_pcm_substream *substream) err = -EINVAL; goto failure; } - choose_rate(substream, sparams, runtime->oss.rate); - snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_CHANNELS, runtime->oss.channels, NULL); + + err = choose_rate(substream, sparams, runtime->oss.rate); + if (err < 0) + goto failure; + err = snd_pcm_hw_param_near(substream, sparams, + SNDRV_PCM_HW_PARAM_CHANNELS, + runtime->oss.channels, NULL); + if (err < 0) + goto failure; format = snd_pcm_oss_format_from(runtime->oss.format); From b6409dd6bdc03aa178bbff0d80db2a30d29b63ac Mon Sep 17 00:00:00 2001 From: Alan Young Date: Thu, 2 Dec 2021 15:06:07 +0000 Subject: [PATCH 028/143] ALSA: ctl: Fix copy of updated id with element read/write When control_compat.c:copy_ctl_value_to_user() is used, by ctl_elem_read_user() & ctl_elem_write_user(), it must also copy back the snd_ctl_elem_id value that may have been updated (filled in) by the call to snd_ctl_elem_read/snd_ctl_elem_write(). This matches the functionality provided by snd_ctl_elem_read_user() and snd_ctl_elem_write_user(), via snd_ctl_build_ioff(). Without this, and without making additional calls to snd_ctl_info() which are unnecessary when using the non-compat calls, a userspace application will not know the numid value for the element and consequently will not be able to use the poll/read interface on the control file to determine which elements have updates. Signed-off-by: Alan Young Cc: Link: https://lore.kernel.org/r/20211202150607.543389-1-consult.awy@gmail.com Signed-off-by: Takashi Iwai --- sound/core/control_compat.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/core/control_compat.c b/sound/core/control_compat.c index 470dabc60aa0..edff063e088d 100644 --- a/sound/core/control_compat.c +++ b/sound/core/control_compat.c @@ -264,6 +264,7 @@ static int copy_ctl_value_to_user(void __user *userdata, struct snd_ctl_elem_value *data, int type, int count) { + struct snd_ctl_elem_value32 __user *data32 = userdata; int i, size; if (type == SNDRV_CTL_ELEM_TYPE_BOOLEAN || @@ -280,6 +281,8 @@ static int copy_ctl_value_to_user(void __user *userdata, if (copy_to_user(valuep, data->value.bytes.data, size)) return -EFAULT; } + if (copy_to_user(&data32->id, &data->id, sizeof(data32->id))) + return -EFAULT; return 0; } From 9a61f813fcc8d56d85fcf9ca6119cf2b5ac91dd5 Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Tue, 16 Nov 2021 02:34:07 +0300 Subject: [PATCH 029/143] clk: qcom: regmap-mux: fix parent clock lookup The function mux_get_parent() uses qcom_find_src_index() to find the parent clock index, which is incorrect: qcom_find_src_index() uses src enum for the lookup, while mux_get_parent() should use cfg field (which corresponds to the register value). Add qcom_find_cfg_index() function doing this kind of lookup and use it for mux parent lookup. Fixes: df964016490b ("clk: qcom: add parent map for regmap mux") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20211115233407.1046179-1-dmitry.baryshkov@linaro.org Signed-off-by: Stephen Boyd --- drivers/clk/qcom/clk-regmap-mux.c | 2 +- drivers/clk/qcom/common.c | 12 ++++++++++++ drivers/clk/qcom/common.h | 2 ++ 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/clk/qcom/clk-regmap-mux.c b/drivers/clk/qcom/clk-regmap-mux.c index b2d00b451963..45d9cca28064 100644 --- a/drivers/clk/qcom/clk-regmap-mux.c +++ b/drivers/clk/qcom/clk-regmap-mux.c @@ -28,7 +28,7 @@ static u8 mux_get_parent(struct clk_hw *hw) val &= mask; if (mux->parent_map) - return qcom_find_src_index(hw, mux->parent_map, val); + return qcom_find_cfg_index(hw, mux->parent_map, val); return val; } diff --git a/drivers/clk/qcom/common.c b/drivers/clk/qcom/common.c index 0932e019dd12..75f09e6e057e 100644 --- a/drivers/clk/qcom/common.c +++ b/drivers/clk/qcom/common.c @@ -69,6 +69,18 @@ int qcom_find_src_index(struct clk_hw *hw, const struct parent_map *map, u8 src) } EXPORT_SYMBOL_GPL(qcom_find_src_index); +int qcom_find_cfg_index(struct clk_hw *hw, const struct parent_map *map, u8 cfg) +{ + int i, num_parents = clk_hw_get_num_parents(hw); + + for (i = 0; i < num_parents; i++) + if (cfg == map[i].cfg) + return i; + + return -ENOENT; +} +EXPORT_SYMBOL_GPL(qcom_find_cfg_index); + struct regmap * qcom_cc_map(struct platform_device *pdev, const struct qcom_cc_desc *desc) { diff --git a/drivers/clk/qcom/common.h b/drivers/clk/qcom/common.h index bb39a7e106d8..9c8f7b798d9f 100644 --- a/drivers/clk/qcom/common.h +++ b/drivers/clk/qcom/common.h @@ -49,6 +49,8 @@ extern void qcom_pll_set_fsm_mode(struct regmap *m, u32 reg, u8 bias_count, u8 lock_count); extern int qcom_find_src_index(struct clk_hw *hw, const struct parent_map *map, u8 src); +extern int qcom_find_cfg_index(struct clk_hw *hw, const struct parent_map *map, + u8 cfg); extern int qcom_cc_register_board_clk(struct device *dev, const char *path, const char *name, unsigned long rate); From a1f0019c342bd83240b05be68c9888549dde7935 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Tue, 23 Nov 2021 08:25:08 -0800 Subject: [PATCH 030/143] clk: qcom: clk-alpha-pll: Don't reconfigure running Trion In the event that the bootloader has configured the Trion PLL as source for the display clocks, e.g. for the continuous splashscreen, then there will also be RCGs that are clocked by this instance. Reconfiguring, and in particular disabling the output of, the PLL will cause issues for these downstream RCGs and has been shown to prevent them from being re-parented. Follow downstream and skip configuration if it's determined that the PLL is already running. Fixes: 59128c20a6a9 ("clk: qcom: clk-alpha-pll: Add support for controlling Lucid PLLs") Signed-off-by: Bjorn Andersson Reviewed-by: Robert Foss Reviewed-by: Vinod Koul Link: https://lore.kernel.org/r/20211123162508.153711-1-bjorn.andersson@linaro.org Signed-off-by: Stephen Boyd --- drivers/clk/qcom/clk-alpha-pll.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/clk/qcom/clk-alpha-pll.c b/drivers/clk/qcom/clk-alpha-pll.c index eaedcceb766f..8f65b9bdafce 100644 --- a/drivers/clk/qcom/clk-alpha-pll.c +++ b/drivers/clk/qcom/clk-alpha-pll.c @@ -1429,6 +1429,15 @@ EXPORT_SYMBOL_GPL(clk_alpha_pll_postdiv_fabia_ops); void clk_trion_pll_configure(struct clk_alpha_pll *pll, struct regmap *regmap, const struct alpha_pll_config *config) { + /* + * If the bootloader left the PLL enabled it's likely that there are + * RCGs that will lock up if we disable the PLL below. + */ + if (trion_pll_is_enabled(pll, regmap)) { + pr_debug("Trion PLL is already enabled, skipping configuration\n"); + return; + } + clk_alpha_pll_write_config(regmap, PLL_L_VAL(pll), config->l); regmap_write(regmap, PLL_CAL_L_VAL(pll), TRION_PLL_CAL_VAL); clk_alpha_pll_write_config(regmap, PLL_ALPHA_VAL(pll), config->alpha); From eee377b8f44e7ac4f76bbf2440e5cbbc1d25c25f Mon Sep 17 00:00:00 2001 From: Miles Chen Date: Sun, 5 Sep 2021 07:54:18 +0800 Subject: [PATCH 031/143] clk: imx: use module_platform_driver Replace builtin_platform_driver_probe with module_platform_driver_probe because CONFIG_CLK_IMX8QXP can be set to =m (kernel module). Fixes: e0d0d4d86c766 ("clk: imx8qxp: Support building i.MX8QXP clock driver as module") Cc: Fabio Estevam Cc: Stephen Boyd Signed-off-by: Miles Chen Link: https://lore.kernel.org/r/20210904235418.2442-1-miles.chen@mediatek.com Reviewed-by: Fabio Estevam Reviewed-by: Stephen Boyd Signed-off-by: Stephen Boyd --- drivers/clk/imx/clk-imx8qxp-lpcg.c | 2 +- drivers/clk/imx/clk-imx8qxp.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/clk/imx/clk-imx8qxp-lpcg.c b/drivers/clk/imx/clk-imx8qxp-lpcg.c index d3e905cf867d..b23758083ce5 100644 --- a/drivers/clk/imx/clk-imx8qxp-lpcg.c +++ b/drivers/clk/imx/clk-imx8qxp-lpcg.c @@ -370,7 +370,7 @@ static struct platform_driver imx8qxp_lpcg_clk_driver = { .probe = imx8qxp_lpcg_clk_probe, }; -builtin_platform_driver(imx8qxp_lpcg_clk_driver); +module_platform_driver(imx8qxp_lpcg_clk_driver); MODULE_AUTHOR("Aisheng Dong "); MODULE_DESCRIPTION("NXP i.MX8QXP LPCG clock driver"); diff --git a/drivers/clk/imx/clk-imx8qxp.c b/drivers/clk/imx/clk-imx8qxp.c index c53a688d8ccc..40a2efb1329b 100644 --- a/drivers/clk/imx/clk-imx8qxp.c +++ b/drivers/clk/imx/clk-imx8qxp.c @@ -308,7 +308,7 @@ static struct platform_driver imx8qxp_clk_driver = { }, .probe = imx8qxp_clk_probe, }; -builtin_platform_driver(imx8qxp_clk_driver); +module_platform_driver(imx8qxp_clk_driver); MODULE_AUTHOR("Aisheng Dong "); MODULE_DESCRIPTION("NXP i.MX8QXP clock driver"); From 653926205741add87a6cf452e21950eebc6ac10b Mon Sep 17 00:00:00 2001 From: Igor Pylypiv Date: Tue, 30 Nov 2021 20:16:27 -0800 Subject: [PATCH 032/143] scsi: pm80xx: Do not call scsi_remove_host() in pm8001_alloc() Calling scsi_remove_host() before scsi_add_host() results in a crash: BUG: kernel NULL pointer dereference, address: 0000000000000108 RIP: 0010:device_del+0x63/0x440 Call Trace: device_unregister+0x17/0x60 scsi_remove_host+0xee/0x2a0 pm8001_pci_probe+0x6ef/0x1b90 [pm80xx] local_pci_probe+0x3f/0x90 We cannot call scsi_remove_host() in pm8001_alloc() because scsi_add_host() has not been called yet at that point in time. Function call tree: pm8001_pci_probe() | `- pm8001_pci_alloc() | | | `- pm8001_alloc() | | | `- scsi_remove_host() | `- scsi_add_host() Link: https://lore.kernel.org/r/20211201041627.1592487-1-ipylypiv@google.com Fixes: 05c6c029a44d ("scsi: pm80xx: Increase number of supported queues") Reviewed-by: Vishakha Channapattan Acked-by: Jack Wang Signed-off-by: Igor Pylypiv Signed-off-by: Martin K. Petersen --- drivers/scsi/pm8001/pm8001_init.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c index bed8cc125544..fbfeb0b046dd 100644 --- a/drivers/scsi/pm8001/pm8001_init.c +++ b/drivers/scsi/pm8001/pm8001_init.c @@ -282,12 +282,12 @@ static int pm8001_alloc(struct pm8001_hba_info *pm8001_ha, if (rc) { pm8001_dbg(pm8001_ha, FAIL, "pm8001_setup_irq failed [ret: %d]\n", rc); - goto err_out_shost; + goto err_out; } /* Request Interrupt */ rc = pm8001_request_irq(pm8001_ha); if (rc) - goto err_out_shost; + goto err_out; count = pm8001_ha->max_q_num; /* Queues are chosen based on the number of cores/msix availability */ @@ -423,8 +423,6 @@ static int pm8001_alloc(struct pm8001_hba_info *pm8001_ha, pm8001_tag_init(pm8001_ha); return 0; -err_out_shost: - scsi_remove_host(pm8001_ha->shost); err_out_nodev: for (i = 0; i < pm8001_ha->max_memcnt; i++) { if (pm8001_ha->memoryMap.region[i].virt_ptr != NULL) { From e485382ea7eb4b81f4b59073cd831084820497de Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Thu, 2 Dec 2021 11:29:08 +0100 Subject: [PATCH 033/143] drm/ttm: fix ttm_bo_swapout MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 7120a447c7fe ("drm/ttm: Double check mem_type of BO while eviction") made ttm_bo_evict_swapout_allowable() function actually check the placement, but we always used a dummy placement in ttm_bo_swapout. Fix this by using the real placement instead. Signed-off-by: Christian König Fixes: 7120a447c7fe ("drm/ttm: Double check mem_type of BO while eviction") Reviewed-by: Pan, Xinhui Link: https://patchwork.freedesktop.org/patch/msgid/20211202103828.44573-1-christian.koenig@amd.com --- drivers/gpu/drm/ttm/ttm_bo.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 739f11c0109c..047adc42d9a0 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1103,7 +1103,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, * as an indication that we're about to swap out. */ memset(&place, 0, sizeof(place)); - place.mem_type = TTM_PL_SYSTEM; + place.mem_type = bo->resource->mem_type; if (!ttm_bo_evict_swapout_allowable(bo, ctx, &place, &locked, NULL)) return -EBUSY; @@ -1135,6 +1135,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, struct ttm_place hop; memset(&hop, 0, sizeof(hop)); + place.mem_type = TTM_PL_SYSTEM; ret = ttm_resource_alloc(bo, &place, &evict_mem); if (unlikely(ret)) goto out; From 619764cc2ec9ce1283a8bbcd89a1376a7c68293b Mon Sep 17 00:00:00 2001 From: Werner Sembach Date: Thu, 2 Dec 2021 17:50:10 +0100 Subject: [PATCH 034/143] ALSA: hda/realtek: Fix quirk for TongFang PHxTxX1 This fixes the SND_PCI_QUIRK(...) of the TongFang PHxTxX1 barebone. This fixes the issue of sound not working after s3 suspend. When waking up from s3 suspend the Coef 0x10 is set to 0x0220 instead of 0x0020. Setting the value manually makes the sound work again. This patch does this automatically. While being on it, I also fixed the comment formatting of the quirk and shortened variable and function names. Signed-off-by: Werner Sembach Fixes: dd6dd6e3c791 ("ALSA: hda/realtek: Add quirk for TongFang PHxTxX1") Cc: Link: https://lore.kernel.org/r/20211202165010.876431-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 40 +++++++++++++++++++---------------- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 9ce7457533c9..d361a1260d5a 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6503,22 +6503,26 @@ static void alc287_fixup_legion_15imhg05_speakers(struct hda_codec *codec, /* for alc285_fixup_ideapad_s740_coef() */ #include "ideapad_s740_helper.c" -static void alc256_fixup_tongfang_reset_persistent_settings(struct hda_codec *codec, - const struct hda_fixup *fix, - int action) +static const struct coef_fw alc256_fixup_set_coef_defaults_coefs[] = { + WRITE_COEF(0x10, 0x0020), WRITE_COEF(0x24, 0x0000), + WRITE_COEF(0x26, 0x0000), WRITE_COEF(0x29, 0x3000), + WRITE_COEF(0x37, 0xfe05), WRITE_COEF(0x45, 0x5089), + {} +}; + +static void alc256_fixup_set_coef_defaults(struct hda_codec *codec, + const struct hda_fixup *fix, + int action) { /* - * A certain other OS sets these coeffs to different values. On at least one TongFang - * barebone these settings might survive even a cold reboot. So to restore a clean slate the - * values are explicitly reset to default here. Without this, the external microphone is - * always in a plugged-in state, while the internal microphone is always in an unplugged - * state, breaking the ability to use the internal microphone. - */ - alc_write_coef_idx(codec, 0x24, 0x0000); - alc_write_coef_idx(codec, 0x26, 0x0000); - alc_write_coef_idx(codec, 0x29, 0x3000); - alc_write_coef_idx(codec, 0x37, 0xfe05); - alc_write_coef_idx(codec, 0x45, 0x5089); + * A certain other OS sets these coeffs to different values. On at least + * one TongFang barebone these settings might survive even a cold + * reboot. So to restore a clean slate the values are explicitly reset + * to default here. Without this, the external microphone is always in a + * plugged-in state, while the internal microphone is always in an + * unplugged state, breaking the ability to use the internal microphone. + */ + alc_process_coef_fw(codec, alc256_fixup_set_coef_defaults_coefs); } static const struct coef_fw alc233_fixup_no_audio_jack_coefs[] = { @@ -6759,7 +6763,7 @@ enum { ALC287_FIXUP_LEGION_15IMHG05_AUTOMUTE, ALC287_FIXUP_YOGA7_14ITL_SPEAKERS, ALC287_FIXUP_13S_GEN2_SPEAKERS, - ALC256_FIXUP_TONGFANG_RESET_PERSISTENT_SETTINGS, + ALC256_FIXUP_SET_COEF_DEFAULTS, ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE, ALC233_FIXUP_NO_AUDIO_JACK, }; @@ -8465,9 +8469,9 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE, }, - [ALC256_FIXUP_TONGFANG_RESET_PERSISTENT_SETTINGS] = { + [ALC256_FIXUP_SET_COEF_DEFAULTS] = { .type = HDA_FIXUP_FUNC, - .v.func = alc256_fixup_tongfang_reset_persistent_settings, + .v.func = alc256_fixup_set_coef_defaults, }, [ALC245_FIXUP_HP_GPIO_LED] = { .type = HDA_FIXUP_FUNC, @@ -8929,7 +8933,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1b7d, 0xa831, "Ordissimo EVE2 ", ALC269VB_FIXUP_ORDISSIMO_EVE2), /* Also known as Malata PC-B1303 */ SND_PCI_QUIRK(0x1c06, 0x2013, "Lemote A1802", ALC269_FIXUP_LEMOTE_A1802), SND_PCI_QUIRK(0x1c06, 0x2015, "Lemote A190X", ALC269_FIXUP_LEMOTE_A190X), - SND_PCI_QUIRK(0x1d05, 0x1132, "TongFang PHxTxX1", ALC256_FIXUP_TONGFANG_RESET_PERSISTENT_SETTINGS), + SND_PCI_QUIRK(0x1d05, 0x1132, "TongFang PHxTxX1", ALC256_FIXUP_SET_COEF_DEFAULTS), SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1d72, 0x1901, "RedmiBook 14", ALC256_FIXUP_ASUS_HEADSET_MIC), From de4adddcbcc25dcd82ffbf3a4bbd8db5f64da056 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 1 Dec 2021 11:41:02 +0000 Subject: [PATCH 035/143] of/irq: Add a quirk for controllers with their own definition of interrupt-map Since commit 041284181226 ("of/irq: Allow matching of an interrupt-map local to an interrupt controller"), a handful of interrupt controllers have stopped working correctly. This is due to the DT exposing a non-sensical interrupt-map property, and their drivers relying on the kernel ignoring this property. Since we cannot realistically fix this terrible behaviour, add a quirk for the limited set of devices that have implemented this monster, and document that this is a pretty bad practice. Fixes: 041284181226 ("of/irq: Allow matching of an interrupt-map local to an interrupt controller") Cc: Rob Herring Cc: John Crispin Cc: Biwen Li Cc: Chris Brandt Cc: Geert Uytterhoeven Cc: Sander Vanheule Signed-off-by: Marc Zyngier Tested-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20211201114102.13446-1-maz@kernel.org Signed-off-by: Rob Herring --- drivers/of/irq.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/of/irq.c b/drivers/of/irq.c index b10f015b2e37..2b07677a386b 100644 --- a/drivers/of/irq.c +++ b/drivers/of/irq.c @@ -76,6 +76,26 @@ struct device_node *of_irq_find_parent(struct device_node *child) } EXPORT_SYMBOL_GPL(of_irq_find_parent); +/* + * These interrupt controllers abuse interrupt-map for unspeakable + * reasons and rely on the core code to *ignore* it (the drivers do + * their own parsing of the property). + * + * If you think of adding to the list for something *new*, think + * again. There is a high chance that you will be sent back to the + * drawing board. + */ +static const char * const of_irq_imap_abusers[] = { + "CBEA,platform-spider-pic", + "sti,platform-spider-pic", + "realtek,rtl-intc", + "fsl,ls1021a-extirq", + "fsl,ls1043a-extirq", + "fsl,ls1088a-extirq", + "renesas,rza1-irqc", + NULL, +}; + /** * of_irq_parse_raw - Low level interrupt tree parsing * @addr: address specifier (start of "reg" property of the device) in be32 format @@ -159,12 +179,15 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq) /* * Now check if cursor is an interrupt-controller and * if it is then we are done, unless there is an - * interrupt-map which takes precedence. + * interrupt-map which takes precedence except on one + * of these broken platforms that want to parse + * interrupt-map themselves for $reason. */ bool intc = of_property_read_bool(ipar, "interrupt-controller"); imap = of_get_property(ipar, "interrupt-map", &imaplen); - if (imap == NULL && intc) { + if (intc && + (!imap || of_device_compatible_match(ipar, of_irq_imap_abusers))) { pr_debug(" -> got it !\n"); return 0; } From b54472a02cefd0dc468158bbc4d636b27cd6fc34 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Fri, 3 Dec 2021 10:48:28 -0600 Subject: [PATCH 036/143] dt-bindings: media: nxp,imx7-mipi-csi2: Drop bad if/then schema The if/then schema for 'data-lanes' doesn't work as 'compatible' is at a different level than 'data-lanes'. To make it work, the if/then schema would have to be moved to the top level and then whole hierarchy of nodes down to 'data-lanes' created. I don't think it is worth the complexity to do that, so let's just drop it. The error in this schema is masked by a fixup in the tools causing the 'allOf' to get overwritten. Removing the fixup as part of moving to json-schema draft 2019-09 revealed the issue: Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.example.dt.yaml: mipi-csi@30750000: ports:port@0:endpoint:data-lanes:0: [1] is too short From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.example.dt.yaml: mipi-csi@32e30000: ports:port@0:endpoint:data-lanes:0: [1, 2, 3, 4] is too long From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml The if condition was always true because 'compatible' did not exist in 'endpoint' node and a non-existent property is true for json-schema. Fixes: 85b62ff2cb97 ("media: dt-bindings: media: nxp,imx7-mipi-csi2: Add i.MX8MM support") Cc: Rui Miguel Silva Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: linux-media@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Rob Herring Reviewed-by: Laurent Pinchart Acked-by: Rui Miguel Silva Link: https://lore.kernel.org/r/20211203164828.187642-1-robh@kernel.org --- .../bindings/media/nxp,imx7-mipi-csi2.yaml | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml b/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml index 877183cf4278..1ef849dc74d7 100644 --- a/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml +++ b/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml @@ -79,6 +79,8 @@ properties: properties: data-lanes: + description: + Note that 'fsl,imx7-mipi-csi2' only supports up to 2 data lines. items: minItems: 1 maxItems: 4 @@ -91,18 +93,6 @@ properties: required: - data-lanes - allOf: - - if: - properties: - compatible: - contains: - const: fsl,imx7-mipi-csi2 - then: - properties: - data-lanes: - items: - maxItems: 2 - port@1: $ref: /schemas/graph.yaml#/properties/port description: From 815b6cb37e8e9c4da06e7a52d7215a6dc1965e02 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Thu, 2 Dec 2021 15:27:08 +0900 Subject: [PATCH 037/143] ata: ahci_ceva: Fix id array access in ceva_ahci_read_id() ATA IDENTIFY command returns an array of le16 words. Accessing it as a u16 array triggers the following sparse warning: drivers/ata/ahci_ceva.c:107:33: warning: invalid assignment: &= drivers/ata/ahci_ceva.c:107:33: left side has type unsigned short drivers/ata/ahci_ceva.c:107:33: right side has type restricted __le16 Use a local variable to explicitly cast the id array to __le16 to avoid this warning. Signed-off-by: Damien Le Moal --- drivers/ata/ahci_ceva.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/ata/ahci_ceva.c b/drivers/ata/ahci_ceva.c index 50b56cd0039d..e9c7c07fd84c 100644 --- a/drivers/ata/ahci_ceva.c +++ b/drivers/ata/ahci_ceva.c @@ -94,6 +94,7 @@ struct ceva_ahci_priv { static unsigned int ceva_ahci_read_id(struct ata_device *dev, struct ata_taskfile *tf, u16 *id) { + __le16 *__id = (__le16 *)id; u32 err_mask; err_mask = ata_do_dev_read_id(dev, tf, id); @@ -103,7 +104,7 @@ static unsigned int ceva_ahci_read_id(struct ata_device *dev, * Since CEVA controller does not support device sleep feature, we * need to clear DEVSLP (bit 8) in word78 of the IDENTIFY DEVICE data. */ - id[ATA_ID_FEATURE_SUPP] &= cpu_to_le16(~(1 << 8)); + __id[ATA_ID_FEATURE_SUPP] &= cpu_to_le16(~(1 << 8)); return 0; } From 16cc33b23732d3ec55e428ddadb39c225f23de7e Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Mon, 29 Nov 2021 08:24:34 -0800 Subject: [PATCH 038/143] nvme: show subsys nqn for duplicate cntlids The driver assigned nvme handle isn't persistent across reboots, so is not enough information to match up where the collisions are occuring. Add the subsys nqn string to the output so that it can more easily be identified later. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215099 Signed-off-by: Keith Busch Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 4c63564adeaa..d476ad65def3 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2696,8 +2696,9 @@ static bool nvme_validate_cntlid(struct nvme_subsystem *subsys, if (tmp->cntlid == ctrl->cntlid) { dev_err(ctrl->device, - "Duplicate cntlid %u with %s, rejecting\n", - ctrl->cntlid, dev_name(tmp->device)); + "Duplicate cntlid %u with %s, subsys %s, rejecting\n", + ctrl->cntlid, dev_name(tmp->device), + subsys->subnqn); return false; } From d39ad2a45c0e38def3e0c95f5b90d9af4274c939 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Tue, 30 Nov 2021 08:14:54 -0800 Subject: [PATCH 039/143] nvme: disable namespace access for unsupported metadata The only fabrics target that supports metadata handling through the separate integrity buffer is RDMA. It is currently usable only if the size is 8B per block and formatted for protection information. If an rdma target were to export a namespace with a different format (ex: 4k+64B), the driver will not be able to submit valid read/write commands for that namespace. Suppress setting the metadata feature in the namespace so that the gendisk capacity will be set to 0. This will prevent read/write access through the block stack, but will continue to allow ioctl passthrough commands. Cc: Max Gurtovoy Cc: Sagi Grimberg Signed-off-by: Keith Busch Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d476ad65def3..4ee7d2f8b8d8 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1749,9 +1749,20 @@ static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id) */ if (WARN_ON_ONCE(!(id->flbas & NVME_NS_FLBAS_META_EXT))) return -EINVAL; - if (ctrl->max_integrity_segments) - ns->features |= - (NVME_NS_METADATA_SUPPORTED | NVME_NS_EXT_LBAS); + + ns->features |= NVME_NS_EXT_LBAS; + + /* + * The current fabrics transport drivers support namespace + * metadata formats only if nvme_ns_has_pi() returns true. + * Suppress support for all other formats so the namespace will + * have a 0 capacity and not be usable through the block stack. + * + * Note, this check will need to be modified if any drivers + * gain the ability to use other metadata formats. + */ + if (ctrl->max_integrity_segments && nvme_ns_has_pi(ns)) + ns->features |= NVME_NS_METADATA_SUPPORTED; } else { /* * For PCIe controllers, we can't easily remap the separate From 793fcab83f38661e22e6f7c682dfba6fd0d97bb2 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Fri, 26 Nov 2021 10:42:44 +0000 Subject: [PATCH 040/143] nvme: report write pointer for a full zone as zone start + zone len The write pointer in NVMe ZNS is invalid for a zone in zone state full. The same also holds true for ZAC/ZBC. The current behavior for NVMe is to simply propagate the wp reported by the drive, even for full zones. Since the wp is invalid for a full zone, the wp reported by the drive may be any value. The way that the sd_zbc driver handles a full zone is to always report the wp as zone start + zone len, regardless of what the drive reported. null_blk also follows this convention. Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write pointer for a full zone in a consistent way, regardless of the interface of the underlying zoned block device. blkzone report before patch: start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8 reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)] blkzone report after patch: start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000 reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)] Signed-off-by: Niklas Cassel Reviewed-by: Keith Busch Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig --- drivers/nvme/host/zns.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c index bfc259e0d7b8..9f81beb4df4e 100644 --- a/drivers/nvme/host/zns.c +++ b/drivers/nvme/host/zns.c @@ -166,7 +166,10 @@ static int nvme_zone_parse_entry(struct nvme_ns *ns, zone.len = ns->zsze; zone.capacity = nvme_lba_to_sect(ns, le64_to_cpu(entry->zcap)); zone.start = nvme_lba_to_sect(ns, le64_to_cpu(entry->zslba)); - zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp)); + if (zone.cond == BLK_ZONE_COND_FULL) + zone.wp = zone.start + zone.len; + else + zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp)); return cb(&zone, idx, data); } From fb1af5bea4670c835e42fc0c14c49d3499468774 Mon Sep 17 00:00:00 2001 From: Geraldo Nascimento Date: Sat, 4 Dec 2021 15:52:24 -0300 Subject: [PATCH 041/143] ALSA: usb-audio: Reorder snd_djm_devices[] entries Olivia Mackintosh has posted to alsa-devel reporting that there's a potential bug that could break mixer quirks for Pioneer devices introduced by 6d27788160362a7ee6c0d317636fe4b1ddbe59a7 "ALSA: usb-audio: Add support for the Pioneer DJM 750MK2 Mixer/Soundcard". This happened because the DJM 750 MK2 was added last to the Pioneer DJM device table index and defined as 0x4 but was added to snd_djm_devices[] just after the DJM 750 (MK1) entry instead of last, after the DJM 900 NXS2. This escaped review. To prevent that from ever happening again, Takashi Iwai suggested to use C99 array designators in snd_djm_devices[] instead of simply reordering the entries. Fixes: 6d2778816036 ("ALSA: usb-audio: Add support for the Pioneer DJM 750MK2") Reported-by: Olivia Mackintosh Suggested-by: Takashi Iwai Signed-off-by: Geraldo Nascimento Link: https://lore.kernel.org/r/Yau46FDzoql0SNnW@geday Signed-off-by: Takashi Iwai --- sound/usb/mixer_quirks.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sound/usb/mixer_quirks.c b/sound/usb/mixer_quirks.c index d489c1de3bae..823b6b8de942 100644 --- a/sound/usb/mixer_quirks.c +++ b/sound/usb/mixer_quirks.c @@ -3016,11 +3016,11 @@ static const struct snd_djm_ctl snd_djm_ctls_750mk2[] = { static const struct snd_djm_device snd_djm_devices[] = { - SND_DJM_DEVICE(250mk2), - SND_DJM_DEVICE(750), - SND_DJM_DEVICE(750mk2), - SND_DJM_DEVICE(850), - SND_DJM_DEVICE(900nxs2) + [SND_DJM_250MK2_IDX] = SND_DJM_DEVICE(250mk2), + [SND_DJM_750_IDX] = SND_DJM_DEVICE(750), + [SND_DJM_850_IDX] = SND_DJM_DEVICE(850), + [SND_DJM_900NXS2_IDX] = SND_DJM_DEVICE(900nxs2), + [SND_DJM_750MK2_IDX] = SND_DJM_DEVICE(750mk2), }; From 776b54e97a7d993ba23696e032426d5dea5bbe70 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 6 Dec 2021 08:04:09 +0100 Subject: [PATCH 042/143] mtd_blkdevs: don't scan partitions for plain mtdblock mtdblock / mtdblock_ro set part_bits to 0 and thus nevever scanned partitions. Restore that behavior by setting the GENHD_FL_NO_PART flag. Fixes: 1ebe2e5f9d68e94c ("block: remove GENHD_FL_EXT_DEVT") Reported-by: Geert Uytterhoeven Signed-off-by: Christoph Hellwig Tested-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20211206070409.2836165-1-hch@lst.de Signed-off-by: Jens Axboe --- drivers/mtd/mtd_blkdevs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index 4eaba6f4ec68..a69d064a8eec 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -346,7 +346,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) gd->minors = 1 << tr->part_bits; gd->fops = &mtd_block_ops; - if (tr->part_bits) + if (tr->part_bits) { if (new->devnum < 26) snprintf(gd->disk_name, sizeof(gd->disk_name), "%s%c", tr->name, 'a' + new->devnum); @@ -355,9 +355,11 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) "%s%c%c", tr->name, 'a' - 1 + new->devnum / 26, 'a' + new->devnum % 26); - else + } else { snprintf(gd->disk_name, sizeof(gd->disk_name), "%s%d", tr->name, new->devnum); + gd->flags |= GENHD_FL_NO_PART; + } set_capacity(gd, ((u64)new->size * tr->blksize) >> 9); From 3583521aabac76e58675297cead02f9ecac518b6 Mon Sep 17 00:00:00 2001 From: Vladimir Murzin Date: Tue, 30 Nov 2021 17:29:54 +0000 Subject: [PATCH 043/143] percpu: km: ensure it is used with NOMMU (either UP or SMP) Currently, NOMMU pull km allocator via !SMP dependency because most of them are UP, yet for SMP+NOMMU vm allocator gets pulled which: * may lead to broken build [1] * ...or not working runtime due to [2] It looks like SMP+NOMMU case was overlooked in bbddff054587 ("percpu: use percpu allocator on UP too") so restore that. [1] For ARM SMP+NOMMU (R-class cores) arm-none-linux-gnueabihf-ld: mm/percpu.o: in function `pcpu_post_unmap_tlb_flush': mm/percpu-vm.c:188: undefined reference to `flush_tlb_kernel_range' [2] static inline int vmap_pages_range_noflush(unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, unsigned int page_shift) { return -EINVAL; } Signed-off-by: Vladimir Murzin Tested-by: Rob Landley Tested-by: Rich Felker [Dennis: use depends instead of default for condition] Signed-off-by: Dennis Zhou --- mm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/Kconfig b/mm/Kconfig index 28edafc820ad..356f4f2c779e 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -428,7 +428,7 @@ config THP_SWAP # UP and nommu archs use km based percpu allocator # config NEED_PER_CPU_KM - depends on !SMP + depends on !SMP || !MMU bool default y From e47498afeca9a0c6d07eeeacc46d563555a3f677 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Mon, 6 Dec 2021 10:49:04 -0700 Subject: [PATCH 044/143] io-wq: remove spurious bit clear on task_work addition There's a small race here where the task_work could finish and drop the worker itself, so that by the time that task_work_add() returns with a successful addition we've already put the worker. The worker callbacks clear this bit themselves, so we don't actually need to manually clear it in the caller. Get rid of it. Reported-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com Signed-off-by: Jens Axboe --- fs/io-wq.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 50cf9f92da36..35da9d90df76 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -359,10 +359,8 @@ static bool io_queue_worker_create(struct io_worker *worker, init_task_work(&worker->create_work, func); worker->create_index = acct->index; - if (!task_work_add(wq->task, &worker->create_work, TWA_SIGNAL)) { - clear_bit_unlock(0, &worker->create_state); + if (!task_work_add(wq->task, &worker->create_work, TWA_SIGNAL)) return true; - } clear_bit_unlock(0, &worker->create_state); fail_release: io_worker_release(worker); From 96db48c9d777a73a33b1d516c5cfed7a417a5f40 Mon Sep 17 00:00:00 2001 From: Alexander Stein Date: Tue, 30 Nov 2021 09:27:56 +0100 Subject: [PATCH 045/143] dt-bindings: net: Reintroduce PHY no lane swap binding This binding was already documented in phy.txt, commit 252ae5330daa ("Documentation: devicetree: Add PHY no lane swap binding"), but got accidently removed during YAML conversion in commit d8704342c109 ("dt-bindings: net: Add a YAML schemas for the generic PHY options"). Note: 'enet-phy-lane-no-swap' and the absence of 'enet-phy-lane-swap' are not identical, as the former one disable this feature, while the latter one doesn't change anything. Fixes: d8704342c109 ("dt-bindings: net: Add a YAML schemas for the generic PHY options") Signed-off-by: Alexander Stein Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20211130082756.713919-1-alexander.stein@ew.tq-group.com Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/net/ethernet-phy.yaml | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml index 2766fe45bb98..ee42328a109d 100644 --- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml +++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml @@ -91,6 +91,14 @@ properties: compensate for the board being designed with the lanes swapped. + enet-phy-lane-no-swap: + $ref: /schemas/types.yaml#/definitions/flag + description: + If set, indicates that PHY will disable swap of the + TX/RX lanes. This property allows the PHY to work correcly after + e.g. wrong bootstrap configuration caused by issues in PCB + layout design. + eee-broken-100tx: $ref: /schemas/types.yaml#/definitions/flag description: From c4cb38b54b365edafb351e5371b9ae9eebf8e805 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 3 Dec 2021 14:35:06 +0100 Subject: [PATCH 046/143] dt-bindings: input: gpio-keys: Fix interrupts in example The "interrupts" property in the example looks weird: - The type is not in the last cell, - Level interrupts don't work well with gpio-keys, as they keep the interrupt asserted as long as the key is pressed, causing an interrupt storm. Use a more realistic falling-edge interrupt instead. Signed-off-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/47ecd2d8efcf09f8ab47de87a7bcfafc82208776.1638538079.git.geert+renesas@glider.be Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/input/gpio-keys.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/input/gpio-keys.yaml b/Documentation/devicetree/bindings/input/gpio-keys.yaml index 060a309ff8e7..dbe7ecc19ccb 100644 --- a/Documentation/devicetree/bindings/input/gpio-keys.yaml +++ b/Documentation/devicetree/bindings/input/gpio-keys.yaml @@ -142,7 +142,7 @@ examples: down { label = "GPIO Key DOWN"; linux,code = <108>; - interrupts = <1 IRQ_TYPE_LEVEL_HIGH 7>; + interrupts = <1 IRQ_TYPE_EDGE_FALLING>; }; }; From 656eb419b5076bacc536ddb66d30f2f3bf0bba92 Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Mon, 6 Dec 2021 16:29:05 +0100 Subject: [PATCH 047/143] dt-bindings: bq25980: Fixup the example Use the ti,watchdog-timeout-ms property instead of the unsupported ti,watchdog-timer property to make the example validate correctly. Signed-off-by: Thierry Reding Link: https://lore.kernel.org/r/20211206152905.226239-1-thierry.reding@gmail.com Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/power/supply/bq25980.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/power/supply/bq25980.yaml b/Documentation/devicetree/bindings/power/supply/bq25980.yaml index 06eca6667f67..8367a1fd4057 100644 --- a/Documentation/devicetree/bindings/power/supply/bq25980.yaml +++ b/Documentation/devicetree/bindings/power/supply/bq25980.yaml @@ -105,7 +105,7 @@ examples: reg = <0x65>; interrupt-parent = <&gpio1>; interrupts = <16 IRQ_TYPE_EDGE_FALLING>; - ti,watchdog-timer = <0>; + ti,watchdog-timeout-ms = <0>; ti,sc-ocp-limit-microamp = <2000000>; ti,sc-ovp-limit-microvolt = <17800000>; monitored-battery = <&bat>; From e53f2086856c16ccab80fd0ac012baa1ae88af73 Mon Sep 17 00:00:00 2001 From: Martin Botka Date: Tue, 30 Nov 2021 22:20:15 +0100 Subject: [PATCH 048/143] clk: qcom: sm6125-gcc: Swap ops of ice and apps on sdcc1 Without this change eMMC runs at overclocked freq. Swap the ops to not OC the eMMC. Signed-off-by: Martin Botka Link: https://lore.kernel.org/r/20211130212015.25232-1-martin.botka@somainline.org Reviewed-by: Bjorn Andersson Fixes: 4b8d6ae57cdf ("clk: qcom: Add SM6125 (TRINKET) GCC driver") Signed-off-by: Stephen Boyd --- drivers/clk/qcom/gcc-sm6125.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/clk/qcom/gcc-sm6125.c b/drivers/clk/qcom/gcc-sm6125.c index 543cfab7561f..431b55bb0d2f 100644 --- a/drivers/clk/qcom/gcc-sm6125.c +++ b/drivers/clk/qcom/gcc-sm6125.c @@ -1121,7 +1121,7 @@ static struct clk_rcg2 gcc_sdcc1_apps_clk_src = { .name = "gcc_sdcc1_apps_clk_src", .parent_data = gcc_parent_data_1, .num_parents = ARRAY_SIZE(gcc_parent_data_1), - .ops = &clk_rcg2_ops, + .ops = &clk_rcg2_floor_ops, }, }; @@ -1143,7 +1143,7 @@ static struct clk_rcg2 gcc_sdcc1_ice_core_clk_src = { .name = "gcc_sdcc1_ice_core_clk_src", .parent_data = gcc_parent_data_0, .num_parents = ARRAY_SIZE(gcc_parent_data_0), - .ops = &clk_rcg2_floor_ops, + .ops = &clk_rcg2_ops, }, }; From 3fe5185db46fedea7a6852d6a59d6e7cdb5d818a Mon Sep 17 00:00:00 2001 From: Manish Rangankar Date: Fri, 3 Dec 2021 01:52:18 -0800 Subject: [PATCH 049/143] scsi: qedi: Fix cmd_cleanup_cmpl counter mismatch issue When issued LUN reset under heavy I/O we hit the qedi WARN_ON because of a mismatch in firmware I/O cmd cleanup request count and I/O cmd cleanup response count received. The mismatch is because of a race caused by the postfix increment of cmd_cleanup_cmpl. [qedi_clearsq:1295]:18: fatal error, need hard reset, cid=0x0 WARNING: CPU: 48 PID: 110963 at drivers/scsi/qedi/qedi_fw.c:1296 qedi_clearsq+0xa5/0xd0 [qedi] CPU: 48 PID: 110963 Comm: kworker/u130:0 Kdump: loaded Tainted: G W Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 04/15/2020 Workqueue: iscsi_conn_cleanup iscsi_cleanup_conn_work_fn [scsi_transport_iscsi] RIP: 0010:qedi_clearsq+0xa5/0xd0 [qedi] RSP: 0018:ffffac2162c7fd98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff975213c40ab8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9761bf816858 RDI: ffff9761bf816858 RBP: ffff975247018628 R08: 000000000000522c R09: 000000000000005b R10: 0000000000000000 R11: ffffac2162c7fbd8 R12: ffff97522e1b2be8 R13: 0000000000000000 R14: ffff97522e1b2800 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff9761bf800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1a34e3e1a0 CR3: 0000000108bb2000 CR4: 0000000000350ee0 Call Trace: qedi_ep_disconnect+0x533/0x550 [qedi] ? iscsi_dbg_trace+0x63/0x80 [scsi_transport_iscsi] ? _cond_resched+0x15/0x30 ? iscsi_suspend_queue+0x19/0x40 [libiscsi] iscsi_ep_disconnect+0xb0/0x130 [scsi_transport_iscsi] iscsi_cleanup_conn_work_fn+0x82/0x130 [scsi_transport_iscsi] process_one_work+0x1a7/0x360 ? create_worker+0x1a0/0x1a0 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x116/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x22/0x40 ---[ end trace 5f1441f59082235c ]--- Link: https://lore.kernel.org/r/20211203095218.5477-1-mrangankar@marvell.com Reviewed-by: Lee Duncan Reviewed-by: Mike Christie Signed-off-by: Manish Rangankar Signed-off-by: Martin K. Petersen --- drivers/scsi/qedi/qedi_fw.c | 37 ++++++++++++++-------------------- drivers/scsi/qedi/qedi_iscsi.c | 2 +- drivers/scsi/qedi/qedi_iscsi.h | 2 +- 3 files changed, 17 insertions(+), 24 deletions(-) diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c index 84a4204a2cb4..5916ed7662d5 100644 --- a/drivers/scsi/qedi/qedi_fw.c +++ b/drivers/scsi/qedi/qedi_fw.c @@ -732,7 +732,6 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi, { struct qedi_work_map *work, *work_tmp; u32 proto_itt = cqe->itid; - itt_t protoitt = 0; int found = 0; struct qedi_cmd *qedi_cmd = NULL; u32 iscsi_cid; @@ -812,16 +811,12 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi, return; check_cleanup_reqs: - if (qedi_conn->cmd_cleanup_req > 0) { - QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID, + if (atomic_inc_return(&qedi_conn->cmd_cleanup_cmpl) == + qedi_conn->cmd_cleanup_req) { + QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM, "Freeing tid=0x%x for cid=0x%x\n", cqe->itid, qedi_conn->iscsi_conn_id); - qedi_conn->cmd_cleanup_cmpl++; wake_up(&qedi_conn->wait_queue); - } else { - QEDI_ERR(&qedi->dbg_ctx, - "Delayed or untracked cleanup response, itt=0x%x, tid=0x%x, cid=0x%x\n", - protoitt, cqe->itid, qedi_conn->iscsi_conn_id); } } @@ -1163,7 +1158,7 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn, } qedi_conn->cmd_cleanup_req = 0; - qedi_conn->cmd_cleanup_cmpl = 0; + atomic_set(&qedi_conn->cmd_cleanup_cmpl, 0); QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM, "active_cmd_count=%d, cid=0x%x, in_recovery=%d, lun_reset=%d\n", @@ -1215,16 +1210,15 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn, qedi_conn->iscsi_conn_id); rval = wait_event_interruptible_timeout(qedi_conn->wait_queue, - ((qedi_conn->cmd_cleanup_req == - qedi_conn->cmd_cleanup_cmpl) || - test_bit(QEDI_IN_RECOVERY, - &qedi->flags)), - 5 * HZ); + (qedi_conn->cmd_cleanup_req == + atomic_read(&qedi_conn->cmd_cleanup_cmpl)) || + test_bit(QEDI_IN_RECOVERY, &qedi->flags), + 5 * HZ); if (rval) { QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM, "i/o cmd_cleanup_req=%d, equal to cmd_cleanup_cmpl=%d, cid=0x%x\n", qedi_conn->cmd_cleanup_req, - qedi_conn->cmd_cleanup_cmpl, + atomic_read(&qedi_conn->cmd_cleanup_cmpl), qedi_conn->iscsi_conn_id); return 0; @@ -1233,7 +1227,7 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn, QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM, "i/o cmd_cleanup_req=%d, not equal to cmd_cleanup_cmpl=%d, cid=0x%x\n", qedi_conn->cmd_cleanup_req, - qedi_conn->cmd_cleanup_cmpl, + atomic_read(&qedi_conn->cmd_cleanup_cmpl), qedi_conn->iscsi_conn_id); iscsi_host_for_each_session(qedi->shost, @@ -1242,11 +1236,10 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn, /* Enable IOs for all other sessions except current.*/ if (!wait_event_interruptible_timeout(qedi_conn->wait_queue, - (qedi_conn->cmd_cleanup_req == - qedi_conn->cmd_cleanup_cmpl) || - test_bit(QEDI_IN_RECOVERY, - &qedi->flags), - 5 * HZ)) { + (qedi_conn->cmd_cleanup_req == + atomic_read(&qedi_conn->cmd_cleanup_cmpl)) || + test_bit(QEDI_IN_RECOVERY, &qedi->flags), + 5 * HZ)) { iscsi_host_for_each_session(qedi->shost, qedi_mark_device_available); return -1; @@ -1266,7 +1259,7 @@ void qedi_clearsq(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn, qedi_ep = qedi_conn->ep; qedi_conn->cmd_cleanup_req = 0; - qedi_conn->cmd_cleanup_cmpl = 0; + atomic_set(&qedi_conn->cmd_cleanup_cmpl, 0); if (!qedi_ep) { QEDI_WARN(&qedi->dbg_ctx, diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c index 88aa7d8b11c9..282ecb4e39bb 100644 --- a/drivers/scsi/qedi/qedi_iscsi.c +++ b/drivers/scsi/qedi/qedi_iscsi.c @@ -412,7 +412,7 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session, qedi_conn->iscsi_conn_id = qedi_ep->iscsi_cid; qedi_conn->fw_cid = qedi_ep->fw_cid; qedi_conn->cmd_cleanup_req = 0; - qedi_conn->cmd_cleanup_cmpl = 0; + atomic_set(&qedi_conn->cmd_cleanup_cmpl, 0); if (qedi_bind_conn_to_iscsi_cid(qedi, qedi_conn)) { rc = -EINVAL; diff --git a/drivers/scsi/qedi/qedi_iscsi.h b/drivers/scsi/qedi/qedi_iscsi.h index a282860da0aa..9b9f2e44fdde 100644 --- a/drivers/scsi/qedi/qedi_iscsi.h +++ b/drivers/scsi/qedi/qedi_iscsi.h @@ -155,7 +155,7 @@ struct qedi_conn { spinlock_t list_lock; /* internal conn lock */ u32 active_cmd_count; u32 cmd_cleanup_req; - u32 cmd_cleanup_cmpl; + atomic_t cmd_cleanup_cmpl; u32 iscsi_conn_id; int itt; From 7db0e0c8190a086ef92ce5bb960836cde49540aa Mon Sep 17 00:00:00 2001 From: Shin'ichiro Kawasaki Date: Tue, 7 Dec 2021 10:06:38 +0900 Subject: [PATCH 050/143] scsi: scsi_debug: Fix buffer size of REPORT ZONES command According to ZBC and SPC specifications, the unit of ALLOCATION LENGTH field of REPORT ZONES command is byte. However, current scsi_debug implementation handles it as number of zones to calculate buffer size to report zones. When the ALLOCATION LENGTH has a large number, this results in too large buffer size and causes memory allocation failure. Fix the failure by handling ALLOCATION LENGTH as byte unit. Link: https://lore.kernel.org/r/20211207010638.124280-1-shinichiro.kawasaki@wdc.com Fixes: f0d1cf9378bd ("scsi: scsi_debug: Add ZBC zone commands") Reviewed-by: Damien Le Moal Signed-off-by: Shin'ichiro Kawasaki Signed-off-by: Martin K. Petersen --- drivers/scsi/scsi_debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index 3c0da3770edf..2104973a35cd 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -4342,7 +4342,7 @@ static int resp_report_zones(struct scsi_cmnd *scp, rep_max_zones = min((alloc_len - 64) >> ilog2(RZONES_DESC_HD), max_zones); - arr = kcalloc(RZONES_DESC_HD, alloc_len, GFP_ATOMIC); + arr = kzalloc(alloc_len, GFP_ATOMIC); if (!arr) { mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC, INSUFF_RES_ASCQ); From 69002c8ce914ef0ae22a6ea14b43bb30b9a9a6a8 Mon Sep 17 00:00:00 2001 From: Roman Bolshakov Date: Fri, 12 Nov 2021 17:54:46 +0300 Subject: [PATCH 051/143] scsi: qla2xxx: Format log strings only if needed Commit 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug logs") introduced unconditional log string formatting to ql_dbg() even if ql_dbg_log event is disabled. It harms performance because some strings are formatted in fastpath and/or interrupt context. Link: https://lore.kernel.org/r/20211112145446.51210-1-r.bolshakov@yadro.com Fixes: 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug logs") Cc: Rajan Shanmugavelu Cc: stable@vger.kernel.org Signed-off-by: Roman Bolshakov Signed-off-by: Martin K. Petersen --- drivers/scsi/qla2xxx/qla_dbg.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/qla2xxx/qla_dbg.c b/drivers/scsi/qla2xxx/qla_dbg.c index 25549a8a2d72..7cf1f78cbaee 100644 --- a/drivers/scsi/qla2xxx/qla_dbg.c +++ b/drivers/scsi/qla2xxx/qla_dbg.c @@ -2491,6 +2491,9 @@ ql_dbg(uint level, scsi_qla_host_t *vha, uint id, const char *fmt, ...) struct va_format vaf; char pbuf[64]; + if (!ql_mask_match(level) && !trace_ql_dbg_log_enabled()) + return; + va_start(va, fmt); vaf.fmt = fmt; From d7f32791a9fcf0dae8b073cdea9b79e29098c5f4 Mon Sep 17 00:00:00 2001 From: Kailang Yang Date: Tue, 23 Nov 2021 16:32:44 +0800 Subject: [PATCH 052/143] ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform Lenovo ALC897 platform had headset Mic. This patch enable supported headset Mic. Signed-off-by: Kailang Yang Cc: Link: https://lore.kernel.org/r/baab2c2536cb4cc18677a862c6f6d840@realtek.com Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 40 +++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index d361a1260d5a..3599f4c85ebf 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10235,6 +10235,27 @@ static void alc671_fixup_hp_headset_mic2(struct hda_codec *codec, } } +static void alc897_hp_automute_hook(struct hda_codec *codec, + struct hda_jack_callback *jack) +{ + struct alc_spec *spec = codec->spec; + int vref; + + snd_hda_gen_hp_automute(codec, jack); + vref = spec->gen.hp_jack_present ? (PIN_HP | AC_PINCTL_VREF_100) : PIN_HP; + snd_hda_codec_write(codec, 0x1b, 0, AC_VERB_SET_PIN_WIDGET_CONTROL, + vref); +} + +static void alc897_fixup_lenovo_headset_mic(struct hda_codec *codec, + const struct hda_fixup *fix, int action) +{ + struct alc_spec *spec = codec->spec; + if (action == HDA_FIXUP_ACT_PRE_PROBE) { + spec->gen.hp_automute_hook = alc897_hp_automute_hook; + } +} + static const struct coef_fw alc668_coefs[] = { WRITE_COEF(0x01, 0xbebe), WRITE_COEF(0x02, 0xaaaa), WRITE_COEF(0x03, 0x0), WRITE_COEF(0x04, 0x0180), WRITE_COEF(0x06, 0x0), WRITE_COEF(0x07, 0x0f80), @@ -10315,6 +10336,8 @@ enum { ALC668_FIXUP_ASUS_NO_HEADSET_MIC, ALC668_FIXUP_HEADSET_MIC, ALC668_FIXUP_MIC_DET_COEF, + ALC897_FIXUP_LENOVO_HEADSET_MIC, + ALC897_FIXUP_HEADSET_MIC_PIN, }; static const struct hda_fixup alc662_fixups[] = { @@ -10721,6 +10744,19 @@ static const struct hda_fixup alc662_fixups[] = { {} }, }, + [ALC897_FIXUP_LENOVO_HEADSET_MIC] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc897_fixup_lenovo_headset_mic, + }, + [ALC897_FIXUP_HEADSET_MIC_PIN] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x1a, 0x03a11050 }, + { } + }, + .chained = true, + .chain_id = ALC897_FIXUP_LENOVO_HEADSET_MIC + }, }; static const struct snd_pci_quirk alc662_fixup_tbl[] = { @@ -10765,6 +10801,10 @@ static const struct snd_pci_quirk alc662_fixup_tbl[] = { SND_PCI_QUIRK(0x144d, 0xc051, "Samsung R720", ALC662_FIXUP_IDEAPAD), SND_PCI_QUIRK(0x14cd, 0x5003, "USI", ALC662_FIXUP_USI_HEADSET_MODE), SND_PCI_QUIRK(0x17aa, 0x1036, "Lenovo P520", ALC662_FIXUP_LENOVO_MULTI_CODECS), + SND_PCI_QUIRK(0x17aa, 0x32ca, "Lenovo ThinkCentre M80", ALC897_FIXUP_HEADSET_MIC_PIN), + SND_PCI_QUIRK(0x17aa, 0x32cb, "Lenovo ThinkCentre M70", ALC897_FIXUP_HEADSET_MIC_PIN), + SND_PCI_QUIRK(0x17aa, 0x32cf, "Lenovo ThinkCentre M950", ALC897_FIXUP_HEADSET_MIC_PIN), + SND_PCI_QUIRK(0x17aa, 0x32f7, "Lenovo ThinkCentre M90", ALC897_FIXUP_HEADSET_MIC_PIN), SND_PCI_QUIRK(0x17aa, 0x38af, "Lenovo Ideapad Y550P", ALC662_FIXUP_IDEAPAD), SND_PCI_QUIRK(0x17aa, 0x3a0d, "Lenovo Ideapad Y550", ALC662_FIXUP_IDEAPAD), SND_PCI_QUIRK(0x1849, 0x5892, "ASRock B150M", ALC892_FIXUP_ASROCK_MOBO), From ee91cb570d9b12ae8cfcb96f7ea6fb80983b6a0a Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 23 Nov 2021 18:06:34 +0000 Subject: [PATCH 053/143] PCI: apple: Follow the PCIe specifications when resetting the port MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While the Apple PCIe driver works correctly when directly booted from the firmware, it fails to initialise when the kernel is booted from a bootloader using PCIe such as u-boot. That's because we're missing a proper reset of the port (we only clear the reset, but never assert it). The PCIe spec requirements are two-fold: - PERST# must be asserted before setting up the clocks and stay asserted for at least 100us (Tperst-clk) - Once PERST# is deasserted, the OS must wait for at least 100ms "from the end of a Conventional Reset" before we can start talking to the devices Implementing this results in a booting system. [bhelgaas: #PERST -> PERST#, update spec references to current] Fixes: 1e33888fbe44 ("PCI: apple: Add initial hardware bring-up") Link: https://lore.kernel.org/r/20211123180636.80558-2-maz@kernel.org Signed-off-by: Marc Zyngier Signed-off-by: Bjorn Helgaas Reviewed-by: Luca Ceresoli Acked-by: Pali Rohár Cc: Alyssa Rosenzweig Cc: Lorenzo Pieralisi --- drivers/pci/controller/pcie-apple.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c index 1bf4d75b61be..c384777b0374 100644 --- a/drivers/pci/controller/pcie-apple.c +++ b/drivers/pci/controller/pcie-apple.c @@ -516,7 +516,7 @@ static int apple_pcie_setup_port(struct apple_pcie *pcie, int ret, i; reset = gpiod_get_from_of_node(np, "reset-gpios", 0, - GPIOD_OUT_LOW, "#PERST"); + GPIOD_OUT_LOW, "PERST#"); if (IS_ERR(reset)) return PTR_ERR(reset); @@ -539,13 +539,23 @@ static int apple_pcie_setup_port(struct apple_pcie *pcie, rmw_set(PORT_APPCLK_EN, port->base + PORT_APPCLK); + /* Assert PERST# before setting up the clock */ + gpiod_set_value(reset, 0); + ret = apple_pcie_setup_refclk(pcie, port); if (ret < 0) return ret; + /* The minimal Tperst-clk value is 100us (PCIe CEM r5.0, 2.9.2) */ + usleep_range(100, 200); + + /* Deassert PERST# */ rmw_set(PORT_PERST_OFF, port->base + PORT_PERST); gpiod_set_value(reset, 1); + /* Wait for 100ms after PERST# deassertion (PCIe r5.0, 6.6.1) */ + msleep(100); + ret = readl_relaxed_poll_timeout(port->base + PORT_STATUS, stat, stat & PORT_STATUS_READY, 100, 250000); if (ret < 0) { From c7c15ae3dc50c0ab46c5cbbf8d2f3d3307e51f37 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Fri, 3 Dec 2021 19:47:15 +0800 Subject: [PATCH 054/143] nvme-multipath: set ana_log_size to 0 after free ana_log_buf Set ana_log_size to 0 when ana_log_buf is freed to make sure nvme_mpath_init_identify will do the right thing when retrying after an earlier failure. Signed-off-by: Hou Tao Reviewed-by: Keith Busch Signed-off-by: Christoph Hellwig --- drivers/nvme/host/multipath.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 7f2071f2460c..13e5d503ed07 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -866,7 +866,7 @@ int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) } if (ana_log_size > ctrl->ana_log_size) { nvme_mpath_stop(ctrl); - kfree(ctrl->ana_log_buf); + nvme_mpath_uninit(ctrl); ctrl->ana_log_buf = kmalloc(ana_log_size, GFP_KERNEL); if (!ctrl->ana_log_buf) return -ENOMEM; @@ -886,4 +886,5 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl) { kfree(ctrl->ana_log_buf); ctrl->ana_log_buf = NULL; + ctrl->ana_log_size = 0; } From 8b77fa6fdce0fc7147bab91b1011048758290ca4 Mon Sep 17 00:00:00 2001 From: Ruozhu Li Date: Thu, 4 Nov 2021 15:13:32 +0800 Subject: [PATCH 055/143] nvme: fix use after free when disconnecting a reconnecting ctrl A crash happens when trying to disconnect a reconnecting ctrl: 1) The network was cut off when the connection was just established, scan work hang there waiting for some IOs complete. Those I/Os were retried because we return BLK_STS_RESOURCE to blk in reconnecting. 2) After a while, I tried to disconnect this connection. This procedure also hangs because it tried to obtain ctrl->scan_lock. It should be noted that now we have switched the controller state to NVME_CTRL_DELETING. 3) In nvme_check_ready(), we always return true when ctrl->state is NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom device which was already freed. To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom device only when queue state is live. If not, return host path error to the block layer Signed-off-by: Ruozhu Li Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 1 + drivers/nvme/host/nvme.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 4ee7d2f8b8d8..1af8a4513708 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -666,6 +666,7 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl, struct request *rq) { if (ctrl->state != NVME_CTRL_DELETING_NOIO && + ctrl->state != NVME_CTRL_DELETING && ctrl->state != NVME_CTRL_DEAD && !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) && !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH)) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index b334af8aa264..9b095ee01364 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -709,7 +709,7 @@ static inline bool nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq, return true; if (ctrl->ops->flags & NVME_F_FABRICS && ctrl->state == NVME_CTRL_DELETING) - return true; + return queue_live; return __nvme_check_ready(ctrl, rq, queue_live); } int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, From 089558bc7ba785c03815a49c89e28ad9b8de51f9 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 6 Dec 2021 15:38:20 -0800 Subject: [PATCH 056/143] xfs: remove all COW fork extents when remounting readonly As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount # (0) fsstress & # (1) while true; do fsstress mount -o remount,ro # (2) fsstress mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. Solve this race by forcing the xfs_blockgc_free_space to run synchronously, which causes xfs_icwalk to return to inodes that were skipped because the blockgc code couldn't take the IOLOCK. This is safe to do here because the VFS has already prohibited new writer threads. Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Reviewed-by: Chandan Babu R --- fs/xfs/xfs_super.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e21459f9923a..778b57b1f020 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1765,7 +1765,10 @@ static int xfs_remount_ro( struct xfs_mount *mp) { - int error; + struct xfs_icwalk icw = { + .icw_flags = XFS_ICWALK_FLAG_SYNC, + }; + int error; /* * Cancel background eofb scanning so it cannot race with the final @@ -1773,8 +1776,13 @@ xfs_remount_ro( */ xfs_blockgc_stop(mp); - /* Get rid of any leftover CoW reservations... */ - error = xfs_blockgc_free_space(mp, NULL); + /* + * Clear out all remaining COW staging extents and speculative post-EOF + * preallocations so that we don't leave inodes requiring inactivation + * cleanups during reclaim on a read-only mount. We must process every + * cached inode, so this requires a synchronous cache scan. + */ + error = xfs_blockgc_free_space(mp, &icw); if (error) { xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); return error; From 2d4fcc5ab35fac2e995f497d62439dcbb416babc Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 17 Nov 2021 10:26:05 +0300 Subject: [PATCH 057/143] clk: versatile: clk-icst: use after free on error path This frees "name" and then tries to display in as part of the error message on the next line. Swap the order. Fixes: 1b2189f3aa50 ("clk: versatile: clk-icst: Ensure clock names are unique") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20211117072604.GC5237@kili Reviewed-by: Linus Walleij Signed-off-by: Stephen Boyd --- drivers/clk/versatile/clk-icst.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/versatile/clk-icst.c b/drivers/clk/versatile/clk-icst.c index d52f976dc875..d5cb372f0901 100644 --- a/drivers/clk/versatile/clk-icst.c +++ b/drivers/clk/versatile/clk-icst.c @@ -543,8 +543,8 @@ static void __init of_syscon_icst_setup(struct device_node *np) regclk = icst_clk_setup(NULL, &icst_desc, name, parent_name, map, ctype); if (IS_ERR(regclk)) { - kfree(name); pr_err("error setting up syscon ICST clock %s\n", name); + kfree(name); return; } of_clk_add_provider(np, of_clk_src_simple_get, regclk); From 5b970dfcfee9e04e041c9eeac5dbd1ccc719c249 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 23 Nov 2021 18:06:35 +0000 Subject: [PATCH 058/143] arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT As the name indicates, PERST# is active low. Fix the DT description to match the HW behaviour. Fixes: ff2a8d91d80c ("arm64: apple: Add PCIe node") Link: https://lore.kernel.org/r/20211123180636.80558-3-maz@kernel.org Signed-off-by: Marc Zyngier Signed-off-by: Bjorn Helgaas Reviewed-by: Luca Ceresoli Reviewed-by: Mark Kettenis Acked-by: Arnd Bergmann --- arch/arm64/boot/dts/apple/t8103.dtsi | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/arm64/boot/dts/apple/t8103.dtsi b/arch/arm64/boot/dts/apple/t8103.dtsi index fc8b2bb06ffe..e22c9433d5e0 100644 --- a/arch/arm64/boot/dts/apple/t8103.dtsi +++ b/arch/arm64/boot/dts/apple/t8103.dtsi @@ -7,6 +7,7 @@ * Copyright The Asahi Linux Contributors */ +#include #include #include #include @@ -281,7 +282,7 @@ pcie0: pcie@690000000 { port00: pci@0,0 { device_type = "pci"; reg = <0x0 0x0 0x0 0x0 0x0>; - reset-gpios = <&pinctrl_ap 152 0>; + reset-gpios = <&pinctrl_ap 152 GPIO_ACTIVE_LOW>; max-link-speed = <2>; #address-cells = <3>; @@ -301,7 +302,7 @@ port00: pci@0,0 { port01: pci@1,0 { device_type = "pci"; reg = <0x800 0x0 0x0 0x0 0x0>; - reset-gpios = <&pinctrl_ap 153 0>; + reset-gpios = <&pinctrl_ap 153 GPIO_ACTIVE_LOW>; max-link-speed = <2>; #address-cells = <3>; @@ -321,7 +322,7 @@ port01: pci@1,0 { port02: pci@2,0 { device_type = "pci"; reg = <0x1000 0x0 0x0 0x0 0x0>; - reset-gpios = <&pinctrl_ap 33 0>; + reset-gpios = <&pinctrl_ap 33 GPIO_ACTIVE_LOW>; max-link-speed = <1>; #address-cells = <3>; From 87620512681a20ef24ece85ac21ff90c9efed37d Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 23 Nov 2021 18:06:36 +0000 Subject: [PATCH 059/143] PCI: apple: Fix PERST# polarity MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that PERST# is properly defined as active-low in the device tree, fix the driver to correctly drive the line independently of the implied polarity. Suggested-by: Pali Rohár Fixes: 1e33888fbe44 ("PCI: apple: Add initial hardware bring-up") Link: https://lore.kernel.org/r/20211123180636.80558-4-maz@kernel.org Signed-off-by: Marc Zyngier Signed-off-by: Bjorn Helgaas Reviewed-by: Luca Ceresoli --- drivers/pci/controller/pcie-apple.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c index c384777b0374..b090924b41fe 100644 --- a/drivers/pci/controller/pcie-apple.c +++ b/drivers/pci/controller/pcie-apple.c @@ -540,7 +540,7 @@ static int apple_pcie_setup_port(struct apple_pcie *pcie, rmw_set(PORT_APPCLK_EN, port->base + PORT_APPCLK); /* Assert PERST# before setting up the clock */ - gpiod_set_value(reset, 0); + gpiod_set_value(reset, 1); ret = apple_pcie_setup_refclk(pcie, port); if (ret < 0) @@ -551,7 +551,7 @@ static int apple_pcie_setup_port(struct apple_pcie *pcie, /* Deassert PERST# */ rmw_set(PORT_PERST_OFF, port->base + PORT_PERST); - gpiod_set_value(reset, 1); + gpiod_set_value(reset, 0); /* Wait for 100ms after PERST# deassertion (PCIe r5.0, 6.6.1) */ msleep(100); From 75feae73a28020e492fbad2323245455ef69d687 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 7 Dec 2021 20:16:36 +0000 Subject: [PATCH 060/143] block: fix single bio async DIO error handling BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882 CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: [...] refcount_dec_and_test include/linux/refcount.h:333 [inline] iocb_put fs/aio.c:1161 [inline] io_submit_one+0x496/0x2fe0 fs/aio.c:1882 __do_sys_io_submit fs/aio.c:1938 [inline] __se_sys_io_submit fs/aio.c:1908 [inline] __x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae __blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages() directly, in which case upper layers won't be expecting ->ki_complete to be called by the block layer and will terminate the request. However, there is also bio_endio() leading to a second ->ki_complete and a double free. Fixes: 54a88eb838d37 ("block: add single bio async direct IO helper") Reported-by: George Kennedy Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- block/fops.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/block/fops.c b/block/fops.c index ad732a36f9b3..8d329ca56b0f 100644 --- a/block/fops.c +++ b/block/fops.c @@ -340,8 +340,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb, } else { ret = bio_iov_iter_get_pages(bio, iter); if (unlikely(ret)) { - bio->bi_status = BLK_STS_IOERR; - bio_endio(bio); + bio_put(bio); return ret; } } From 51a08bdeca27988a17c87b87d8e64ffecbd2a172 Mon Sep 17 00:00:00 2001 From: Vincent Whitchurch Date: Tue, 7 Dec 2021 12:54:19 +0100 Subject: [PATCH 061/143] cifs: Fix crash on unload of cifs_arc4.ko The exit function is wrongly placed in the __init section and this leads to a crash when the module is unloaded. Just remove both the init and exit functions since this module does not need them. Fixes: 71c02863246167b3d ("cifs: fork arc4 and create a separate module...") Signed-off-by: Vincent Whitchurch Acked-by: Ronnie Sahlberg Acked-by: Paulo Alcantara (SUSE) Cc: stable@vger.kernel.org # 5.15 Signed-off-by: Steve French --- fs/smbfs_common/cifs_arc4.c | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/fs/smbfs_common/cifs_arc4.c b/fs/smbfs_common/cifs_arc4.c index 85ba15a60b13..043e4cb839fa 100644 --- a/fs/smbfs_common/cifs_arc4.c +++ b/fs/smbfs_common/cifs_arc4.c @@ -72,16 +72,3 @@ void cifs_arc4_crypt(struct arc4_ctx *ctx, u8 *out, const u8 *in, unsigned int l ctx->y = y; } EXPORT_SYMBOL_GPL(cifs_arc4_crypt); - -static int __init -init_smbfs_common(void) -{ - return 0; -} -static void __init -exit_smbfs_common(void) -{ -} - -module_init(init_smbfs_common) -module_exit(exit_smbfs_common) From 250552b925ce400c17d166422fde9bb215958481 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Mon, 29 Nov 2021 10:47:01 +0100 Subject: [PATCH 062/143] KVM: nVMX: Don't use Enlightened MSR Bitmap for L3 When KVM runs as a nested hypervisor on top of Hyper-V it uses Enlightened VMCS and enables Enlightened MSR Bitmap feature for its L1s and L2s (which are actually L2s and L3s from Hyper-V's perspective). When MSR bitmap is updated, KVM has to reset HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP from clean fields to make Hyper-V aware of the change. For KVM's L1s, this is done in vmx_disable_intercept_for_msr()/vmx_enable_intercept_for_msr(). MSR bitmap for L2 is build in nested_vmx_prepare_msr_bitmap() by blending MSR bitmap for L1 and L1's idea of MSR bitmap for L2. KVM, however, doesn't check if the resulting bitmap is different and never cleans HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP in eVMCS02. This is incorrect and may result in Hyper-V missing the update. The issue could've been solved by calling evmcs_touch_msr_bitmap() for eVMCS02 from nested_vmx_prepare_msr_bitmap() unconditionally but doing so would not give any performance benefits (compared to not using Enlightened MSR Bitmap at all). 3-level nesting is also not a very common setup nowadays. Don't enable 'Enlightened MSR Bitmap' feature for KVM's L2s (real L3s) for now. Signed-off-by: Vitaly Kuznetsov Message-Id: <20211129094704.326635-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/vmx.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 9453743ce0c4..5aadad3e7367 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2646,15 +2646,6 @@ int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) if (!loaded_vmcs->msr_bitmap) goto out_vmcs; memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); - - if (IS_ENABLED(CONFIG_HYPERV) && - static_branch_unlikely(&enable_evmcs) && - (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)) { - struct hv_enlightened_vmcs *evmcs = - (struct hv_enlightened_vmcs *)loaded_vmcs->vmcs; - - evmcs->hv_enlightenments_control.msr_bitmap = 1; - } } memset(&loaded_vmcs->host_state, 0, sizeof(struct vmcs_host_state)); @@ -6842,6 +6833,19 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) if (err < 0) goto free_pml; + /* + * Use Hyper-V 'Enlightened MSR Bitmap' feature when KVM runs as a + * nested (L1) hypervisor and Hyper-V in L0 supports it. Enable the + * feature only for vmcs01, KVM currently isn't equipped to realize any + * performance benefits from enabling it for vmcs02. + */ + if (IS_ENABLED(CONFIG_HYPERV) && static_branch_unlikely(&enable_evmcs) && + (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)) { + struct hv_enlightened_vmcs *evmcs = (void *)vmx->vmcs01.vmcs; + + evmcs->hv_enlightenments_control.msr_bitmap = 1; + } + /* The MSR bitmap starts with all ones */ bitmap_fill(vmx->shadow_msr_intercept.read, MAX_POSSIBLE_PASSTHROUGH_MSRS); bitmap_fill(vmx->shadow_msr_intercept.write, MAX_POSSIBLE_PASSTHROUGH_MSRS); From ee7f3666995d8537dec17b1d35425f28877671a9 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 8 Dec 2021 07:57:20 -0500 Subject: [PATCH 063/143] tracefs: Have new files inherit the ownership of their parent If directories in tracefs have their ownership changed, then any new files and directories that are created under those directories should inherit the ownership of the director they are created in. Link: https://lkml.kernel.org/r/20211208075720.4855d180@gandalf.local.home Cc: Kees Cook Cc: Ingo Molnar Cc: Andrew Morton Cc: Linus Torvalds Cc: Al Viro Cc: Greg Kroah-Hartman Cc: Yabin Cui Cc: Christian Brauner Cc: stable@vger.kernel.org Fixes: 4282d60689d4f ("tracefs: Add new tracefs file system") Reported-by: Kalesh Singh Reported: https://lore.kernel.org/all/CAC_TJve8MMAv+H_NdLSJXZUSoxOEq2zB_pVaJ9p=7H6Bu3X76g@mail.gmail.com/ Signed-off-by: Steven Rostedt (VMware) --- fs/tracefs/inode.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 925a621b432e..06cf0534cc60 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -414,6 +414,8 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, inode->i_mode = mode; inode->i_fop = fops ? fops : &tracefs_file_operations; inode->i_private = data; + inode->i_uid = d_inode(dentry->d_parent)->i_uid; + inode->i_gid = d_inode(dentry->d_parent)->i_gid; d_instantiate(dentry, inode); fsnotify_create(dentry->d_parent->d_inode, dentry); return end_creating(dentry); @@ -436,6 +438,8 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent, inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP; inode->i_op = ops; inode->i_fop = &simple_dir_operations; + inode->i_uid = d_inode(dentry->d_parent)->i_uid; + inode->i_gid = d_inode(dentry->d_parent)->i_gid; /* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); From 48b27b6b5191e2e1f2798cd80877b6e4ef47c351 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Tue, 7 Dec 2021 17:17:29 -0500 Subject: [PATCH 064/143] tracefs: Set all files to the same group ownership as the mount option As people have been asking to allow non-root processes to have access to the tracefs directory, it was considered best to only allow groups to have access to the directory, where it is easier to just set the tracefs file system to a specific group (as other would be too dangerous), and that way the admins could pick which processes would have access to tracefs. Unfortunately, this broke tooling on Android that expected the other bit to be set. For some special cases, for non-root tools to trace the system, tracefs would be mounted and change the permissions of the top level directory which gave access to all running tasks permission to the tracing directory. Even though this would be dangerous to do in a production environment, for testing environments this can be useful. Now with the new changes to not allow other (which is still the proper thing to do), it breaks the testing tooling. Now more code needs to be loaded on the system to change ownership of the tracing directory. The real solution is to have tracefs honor the gid=xxx option when mounting. That is, (tracing group tracing has value 1003) mount -t tracefs -o gid=1003 tracefs /sys/kernel/tracing should have it that all files in the tracing directory should be of the given group. Copy the logic from d_walk() from dcache.c and simplify it for the mount case of tracefs if gid is set. All the files in tracefs will be walked and their group will be set to the value passed in. Link: https://lkml.kernel.org/r/20211207171729.2a54e1b3@gandalf.local.home Cc: Ingo Molnar Cc: Kees Cook Cc: Andrew Morton Cc: Linus Torvalds Cc: linux-fsdevel@vger.kernel.org Cc: Al Viro Cc: Greg Kroah-Hartman Reported-by: Kalesh Singh Reported-by: Yabin Cui Fixes: 49d67e445742 ("tracefs: Have tracefs directories not set OTH permission bits by default") Signed-off-by: Steven Rostedt (VMware) --- fs/tracefs/inode.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 06cf0534cc60..3616839c5c4b 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -161,6 +161,77 @@ struct tracefs_fs_info { struct tracefs_mount_opts mount_opts; }; +static void change_gid(struct dentry *dentry, kgid_t gid) +{ + if (!dentry->d_inode) + return; + dentry->d_inode->i_gid = gid; +} + +/* + * Taken from d_walk, but without he need for handling renames. + * Nothing can be renamed while walking the list, as tracefs + * does not support renames. This is only called when mounting + * or remounting the file system, to set all the files to + * the given gid. + */ +static void set_gid(struct dentry *parent, kgid_t gid) +{ + struct dentry *this_parent; + struct list_head *next; + + this_parent = parent; + spin_lock(&this_parent->d_lock); + + change_gid(this_parent, gid); +repeat: + next = this_parent->d_subdirs.next; +resume: + while (next != &this_parent->d_subdirs) { + struct list_head *tmp = next; + struct dentry *dentry = list_entry(tmp, struct dentry, d_child); + next = tmp->next; + + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); + + change_gid(dentry, gid); + + if (!list_empty(&dentry->d_subdirs)) { + spin_unlock(&this_parent->d_lock); + spin_release(&dentry->d_lock.dep_map, _RET_IP_); + this_parent = dentry; + spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_); + goto repeat; + } + spin_unlock(&dentry->d_lock); + } + /* + * All done at this level ... ascend and resume the search. + */ + rcu_read_lock(); +ascend: + if (this_parent != parent) { + struct dentry *child = this_parent; + this_parent = child->d_parent; + + spin_unlock(&child->d_lock); + spin_lock(&this_parent->d_lock); + + /* go into the first sibling still alive */ + do { + next = child->d_child.next; + if (next == &this_parent->d_subdirs) + goto ascend; + child = list_entry(next, struct dentry, d_child); + } while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED)); + rcu_read_unlock(); + goto resume; + } + rcu_read_unlock(); + spin_unlock(&this_parent->d_lock); + return; +} + static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts) { substring_t args[MAX_OPT_ARGS]; @@ -193,6 +264,7 @@ static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts) if (!gid_valid(gid)) return -EINVAL; opts->gid = gid; + set_gid(tracefs_mount->mnt_root, gid); break; case Opt_mode: if (match_octal(&args[0], &option)) From 11f8cb8903ba4e8ba900fa4e4ab29d0fb4c9ef5d Mon Sep 17 00:00:00 2001 From: Chen Yu Date: Tue, 23 Nov 2021 21:23:30 +0800 Subject: [PATCH 065/143] ACPI: tools: Fix compilation when output directory is not present Compiling the ACPI tools when output directory parameter is specified, but the output directory is not present, triggers the following error: make O=/data/test/tmp/ -C tools/power/acpi/ make: Entering directory '/data/src/kernel/linux/tools/power/acpi' DESCEND tools/acpidbg make[1]: Entering directory '/data/src/kernel/linux/tools/power/acpi/tools/acpidbg' MKDIR include CP include CC tools/acpidbg/acpidbg.o Assembler messages: Fatal error: can't create /data/test/tmp/tools/power/acpi/tools/acpidbg/acpidbg.o: No such file or directory make[1]: *** [../../Makefile.rules:24: /data/test/tmp/tools/power/acpi/tools/acpidbg/acpidbg.o] Error 1 make[1]: Leaving directory '/data/src/kernel/linux/tools/power/acpi/tools/acpidbg' make: *** [Makefile:18: acpidbg] Error 2 make: Leaving directory '/data/src/kernel/linux/tools/power/acpi' which occurs because the output directory has not been created yet. Fix this issue by creating the output directory before compiling. Reported-by: Andy Shevchenko Signed-off-by: Chen Yu Reviewed-by: Andy Shevchenko [ rjw: New subject, changelog edits ] Signed-off-by: Rafael J. Wysocki --- tools/power/acpi/Makefile.config | 1 + tools/power/acpi/Makefile.rules | 1 + 2 files changed, 2 insertions(+) diff --git a/tools/power/acpi/Makefile.config b/tools/power/acpi/Makefile.config index 331f6d30f472..cd7106876a5f 100644 --- a/tools/power/acpi/Makefile.config +++ b/tools/power/acpi/Makefile.config @@ -69,6 +69,7 @@ KERNEL_INCLUDE := $(OUTPUT)include ACPICA_INCLUDE := $(srctree)/../../../drivers/acpi/acpica CFLAGS += -D_LINUX -I$(KERNEL_INCLUDE) -I$(ACPICA_INCLUDE) CFLAGS += $(WARNINGS) +MKDIR = mkdir ifeq ($(strip $(V)),false) QUIET=@ diff --git a/tools/power/acpi/Makefile.rules b/tools/power/acpi/Makefile.rules index 2a6c170b57cd..1d7616f5d0ae 100644 --- a/tools/power/acpi/Makefile.rules +++ b/tools/power/acpi/Makefile.rules @@ -21,6 +21,7 @@ $(KERNEL_INCLUDE): $(objdir)%.o: %.c $(KERNEL_INCLUDE) $(ECHO) " CC " $(subst $(OUTPUT),,$@) + $(QUIET) $(MKDIR) -p $(objdir) 2>/dev/null $(QUIET) $(CC) -c $(CFLAGS) -o $@ $< all: $(OUTPUT)$(TOOL) From 444dd878e85fb33fcfb2682cfdab4c236f33ea3e Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 3 Dec 2021 17:19:47 +0100 Subject: [PATCH 066/143] PM: runtime: Fix pm_runtime_active() kerneldoc comment The kerneldoc comment of pm_runtime_active() does not reflect the behavior of the function, so update it accordingly. Fixes: 403d2d116ec0 ("PM: runtime: Add kerneldoc comments to multiple helpers") Signed-off-by: Rafael J. Wysocki Reviewed-by: Ulf Hansson --- include/linux/pm_runtime.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 222da43b7096..eddd66d426ca 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -129,7 +129,7 @@ static inline bool pm_runtime_suspended(struct device *dev) * pm_runtime_active - Check whether or not a device is runtime-active. * @dev: Target device. * - * Return %true if runtime PM is enabled for @dev and its runtime PM status is + * Return %true if runtime PM is disabled for @dev or its runtime PM status is * %RPM_ACTIVE, or %false otherwise. * * Note that the return value of this function can only be trusted if it is From f872f73601b92c86f3da8bdf3e19abd0f1780eb9 Mon Sep 17 00:00:00 2001 From: Sumeet Pawnikar Date: Tue, 7 Dec 2021 18:05:39 +0530 Subject: [PATCH 067/143] thermal: int340x: Fix VCoRefLow MMIO bit offset for TGL The VCoRefLow CPU FIVR register definition for Tiger Lake is incorrect. Current implementation reads it from MMIO offset 0x5A18 and bit offset [12:14], but the actual correct register definition is from bit offset [11:13]. Update to fix the bit offset. Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver") Signed-off-by: Sumeet Pawnikar Cc: 5.14+ # 5.14+ [ rjw: New subject, changelog edits ] Signed-off-by: Rafael J. Wysocki --- drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c index b25b54d4bac1..e693ec8234fb 100644 --- a/drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c +++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c @@ -29,7 +29,7 @@ static const char * const fivr_strings[] = { }; static const struct mmio_reg tgl_fivr_mmio_regs[] = { - { 0, 0x5A18, 3, 0x7, 12}, /* vco_ref_code_lo */ + { 0, 0x5A18, 3, 0x7, 11}, /* vco_ref_code_lo */ { 0, 0x5A18, 8, 0xFF, 16}, /* vco_ref_code_hi */ { 0, 0x5A08, 8, 0xFF, 0}, /* spread_spectrum_pct */ { 0, 0x5A08, 1, 0x1, 8}, /* spread_spectrum_clk_enable */ From d815b3f2f273537cb8afaf5ab11a46851f6c03e5 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 16 Nov 2021 14:50:25 +0300 Subject: [PATCH 068/143] btrfs: fix error pointer dereference in btrfs_ioctl_rm_dev_v2() If memdup_user() fails the error handing will crash when it tries to kfree() an error pointer. Just return directly because there is no cleanup required. Fixes: 1a15eb724aae ("btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls") Reviewed-by: Josef Bacik Signed-off-by: Dan Carpenter Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/ioctl.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 012fbfdfbebf..1b85d98df66b 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3188,10 +3188,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg) return -EPERM; vol_args = memdup_user(arg, sizeof(*vol_args)); - if (IS_ERR(vol_args)) { - ret = PTR_ERR(vol_args); - goto out; - } + if (IS_ERR(vol_args)) + return PTR_ERR(vol_args); if (vol_args->flags & ~BTRFS_DEVICE_REMOVE_ARGS_MASK) { ret = -EOPNOTSUPP; From f981fec12cc5d2c07942301744b9ea61228bf246 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Mon, 22 Nov 2021 17:04:19 -0500 Subject: [PATCH 069/143] btrfs: fail if fstrim_range->start == U64_MAX We've always been failing generic/260 because it's testing things we actually don't care about and thus won't fail for. However we probably should fail for fstrim_range->start == U64_MAX since we clearly can't trim anything past that. This in combination with an update to generic/260 will allow us to pass this test properly. Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3fd736a02c1e..fc4895e6a62c 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6051,6 +6051,9 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range) int dev_ret = 0; int ret = 0; + if (range->start == U64_MAX) + return -EINVAL; + /* * Check range overflow if range->len is set. * The default range->len is U64_MAX. From c2e39305299f0118298c2201f6d6cc7d3485f29e Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Wed, 24 Nov 2021 14:14:23 -0500 Subject: [PATCH 070/143] btrfs: clear extent buffer uptodate when we fail to write it I got dmesg errors on generic/281 on our overnight fstests. Looking at the history this happens occasionally, with errors like this WARNING: CPU: 0 PID: 673217 at fs/btrfs/extent_io.c:6848 assert_eb_page_uptodate+0x3f/0x50 CPU: 0 PID: 673217 Comm: kworker/u4:13 Tainted: G W 5.16.0-rc2+ #469 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: btrfs-cache btrfs_work_helper RIP: 0010:assert_eb_page_uptodate+0x3f/0x50 RSP: 0018:ffffae598230bc60 EFLAGS: 00010246 RAX: 0017ffffc0002112 RBX: ffffebaec4100900 RCX: 0000000000001000 RDX: ffffebaec45733c7 RSI: ffffebaec4100900 RDI: ffff9fd98919f340 RBP: 0000000000000d56 R08: ffff9fd98e300000 R09: 0000000000000000 R10: 0001207370a91c50 R11: 0000000000000000 R12: 00000000000007b0 R13: ffff9fd98919f340 R14: 0000000001500000 R15: 0000000001cb0000 FS: 0000000000000000(0000) GS:ffff9fd9fbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f549fcf8940 CR3: 0000000114908004 CR4: 0000000000370ef0 Call Trace: extent_buffer_test_bit+0x3f/0x70 free_space_test_bit+0xa6/0xc0 load_free_space_tree+0x1d6/0x430 caching_thread+0x454/0x630 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? lock_release+0x1f0/0x2d0 btrfs_work_helper+0xf2/0x3e0 ? lock_release+0x1f0/0x2d0 ? finish_task_switch.isra.0+0xf9/0x3a0 process_one_work+0x270/0x5a0 worker_thread+0x55/0x3c0 ? process_one_work+0x5a0/0x5a0 kthread+0x174/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 This happens because we're trying to read from a extent buffer page that is !PageUptodate. This happens because we will clear the page uptodate when we have an IO error, but we don't clear the extent buffer uptodate. If we do a read later and find this extent buffer we'll think its valid and not return an error, and then trip over this warning. Fix this by also clearing uptodate on the extent buffer when this happens, so that we get an error when we do a btrfs_search_slot() and find this block later. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent_io.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4e03a6d3aa32..dcdb97d9205d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4313,6 +4313,12 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb) if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) return; + /* + * A read may stumble upon this buffer later, make sure that it gets an + * error and knows there was an error. + */ + clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags); + /* * If we error out, we should add back the dirty_metadata_bytes * to make it consistent. From 68b85589ba8114514d83ae87dd6f3fe9b315cae0 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Wed, 24 Nov 2021 14:14:25 -0500 Subject: [PATCH 071/143] btrfs: call mapping_set_error() on btree inode with a write error generic/484 fails sometimes with compression on because the write ends up small enough that it goes into the btree. This means that we never call mapping_set_error() on the inode itself, because the page gets marked as fine when we inline it into the metadata. When the metadata writeback happens we see it and abort the transaction properly and mark the fs as readonly, however we don't do the mapping_set_error() on anything. In syncfs() we will simply return 0 if the sb is marked read-only, so we can't check for this in our syncfs callback. The only way the error gets returned if we called mapping_set_error() on something. Fix this by calling mapping_set_error() on the btree inode mapping. This allows us to properly return an error on syncfs and pass generic/484 with compression on. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik Signed-off-by: David Sterba --- fs/btrfs/extent_io.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index dcdb97d9205d..3258b6f01e85 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4319,6 +4319,14 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb) */ clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags); + /* + * We need to set the mapping with the io error as well because a write + * error will flip the file system readonly, and then syncfs() will + * return a 0 because we are readonly if we don't modify the err seq for + * the superblock. + */ + mapping_set_error(page->mapping, -EIO); + /* * If we error out, we should add back the dirty_metadata_bytes * to make it consistent. From 84c25448929942edacba905cecc0474e91114e7a Mon Sep 17 00:00:00 2001 From: Naohiro Aota Date: Tue, 30 Nov 2021 12:40:21 +0900 Subject: [PATCH 072/143] btrfs: fix re-dirty process of tree-log nodes There is a report of a transaction abort of -EAGAIN with the following script. #!/bin/sh for d in sda sdb; do mkfs.btrfs -d single -m single -f /dev/\${d} done mount /dev/sda /mnt/test mount /dev/sdb /mnt/scratch for dir in test scratch; do echo 3 >/proc/sys/vm/drop_caches fio --directory=/mnt/\${dir} --name=fio.\${dir} --rw=read --size=50G --bs=64m \ --numjobs=$(nproc) --time_based --ramp_time=5 --runtime=480 \ --group_reporting |& tee /dev/shm/fio.\${dir} echo 3 >/proc/sys/vm/drop_caches done for d in sda sdb; do umount /dev/\${d} done The stack trace is shown in below. [3310.967991] BTRFS: error (device sda) in btrfs_commit_transaction:2341: errno=-11 unknown (Error while writing out transaction) [3310.968060] BTRFS info (device sda): forced readonly [3310.968064] BTRFS warning (device sda): Skipping commit of aborted transaction. [3310.968065] ------------[ cut here ]------------ [3310.968066] BTRFS: Transaction aborted (error -11) [3310.968074] WARNING: CPU: 14 PID: 1684 at fs/btrfs/transaction.c:1946 btrfs_commit_transaction.cold+0x209/0x2c8 [3310.968131] CPU: 14 PID: 1684 Comm: fio Not tainted 5.14.10-300.fc35.x86_64 #1 [3310.968135] Hardware name: DIAWAY Tartu/Tartu, BIOS V2.01.B10 04/08/2021 [3310.968137] RIP: 0010:btrfs_commit_transaction.cold+0x209/0x2c8 [3310.968144] RSP: 0018:ffffb284ce393e10 EFLAGS: 00010282 [3310.968147] RAX: 0000000000000026 RBX: ffff973f147b0f60 RCX: 0000000000000027 [3310.968149] RDX: ffff974ecf098a08 RSI: 0000000000000001 RDI: ffff974ecf098a00 [3310.968150] RBP: ffff973f147b0f08 R08: 0000000000000000 R09: ffffb284ce393c48 [3310.968151] R10: ffffb284ce393c40 R11: ffffffff84f47468 R12: ffff973f101bfc00 [3310.968153] R13: ffff971f20cf2000 R14: 00000000fffffff5 R15: ffff973f147b0e58 [3310.968154] FS: 00007efe65468740(0000) GS:ffff974ecf080000(0000) knlGS:0000000000000000 [3310.968157] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [3310.968158] CR2: 000055691bcbe260 CR3: 000000105cfa4001 CR4: 0000000000770ee0 [3310.968160] PKRU: 55555554 [3310.968161] Call Trace: [3310.968167] ? dput+0xd4/0x300 [3310.968174] btrfs_sync_file+0x3f1/0x490 [3310.968180] __x64_sys_fsync+0x33/0x60 [3310.968185] do_syscall_64+0x3b/0x90 [3310.968190] entry_SYSCALL_64_after_hwframe+0x44/0xae [3310.968194] RIP: 0033:0x7efe6557329b [3310.968200] RSP: 002b:00007ffe0236ebc0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [3310.968203] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efe6557329b [3310.968204] RDX: 0000000000000000 RSI: 00007efe58d77010 RDI: 0000000000000006 [3310.968205] RBP: 0000000004000000 R08: 0000000000000000 R09: 00007efe58d77010 [3310.968207] R10: 0000000016cacc0c R11: 0000000000000293 R12: 00007efe5ce95980 [3310.968208] R13: 0000000000000000 R14: 00007efe6447c790 R15: 0000000c80000000 [3310.968212] ---[ end trace 1a346f4d3c0d96ba ]--- [3310.968214] BTRFS: error (device sda) in cleanup_transaction:1946: errno=-11 unknown The abort occurs because of a write hole while writing out freeing tree nodes of a tree-log tree. For zoned btrfs, we re-dirty a freed tree node to ensure btrfs can write the region and does not leave a hole on write on a zoned device. The current code fails to re-dirty a node when the tree-log tree's depth is greater or equal to 2. That leads to a transaction abort with -EAGAIN. Fix the issue by properly re-dirtying a node on walking up the tree. Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers") CC: stable@vger.kernel.org # 5.12+ Link: https://github.com/kdave/btrfs-progs/issues/415 Reviewed-by: Johannes Thumshirn Signed-off-by: Naohiro Aota Signed-off-by: David Sterba --- fs/btrfs/tree-log.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 8ab33caf016f..3e6f14e13918 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2908,6 +2908,8 @@ static noinline int walk_up_log_tree(struct btrfs_trans_handle *trans, path->nodes[*level]->len); if (ret) return ret; + btrfs_redirty_list_add(trans->transaction, + next); } else { if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags)) clear_extent_buffer_dirty(next); @@ -2988,6 +2990,7 @@ static int walk_log_tree(struct btrfs_trans_handle *trans, next->start, next->len); if (ret) goto out; + btrfs_redirty_list_add(trans->transaction, next); } else { if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags)) clear_extent_buffer_dirty(next); @@ -3438,8 +3441,6 @@ static void free_log_tree(struct btrfs_trans_handle *trans, EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); extent_io_tree_release(&log->log_csum_range); - if (trans && log->node) - btrfs_redirty_list_add(trans->transaction, log->node); btrfs_put_root(log); } From da5e817d9d75422eaaa05490d0b9a5e328fc1a51 Mon Sep 17 00:00:00 2001 From: Johannes Thumshirn Date: Fri, 3 Dec 2021 02:55:33 -0800 Subject: [PATCH 073/143] btrfs: free exchange changeset on failures Fstests runs on my VMs have show several kmemleak reports like the following. unreferenced object 0xffff88811ae59080 (size 64): comm "xfs_io", pid 12124, jiffies 4294987392 (age 6.368s) hex dump (first 32 bytes): 00 c0 1c 00 00 00 00 00 ff cf 1c 00 00 00 00 00 ................ 90 97 e5 1a 81 88 ff ff 90 97 e5 1a 81 88 ff ff ................ backtrace: [<00000000ac0176d2>] ulist_add_merge+0x60/0x150 [btrfs] [<0000000076e9f312>] set_state_bits+0x86/0xc0 [btrfs] [<0000000014fe73d6>] set_extent_bit+0x270/0x690 [btrfs] [<000000004f675208>] set_record_extent_bits+0x19/0x20 [btrfs] [<00000000b96137b1>] qgroup_reserve_data+0x274/0x310 [btrfs] [<0000000057e9dcbb>] btrfs_check_data_free_space+0x5c/0xa0 [btrfs] [<0000000019c4511d>] btrfs_delalloc_reserve_space+0x1b/0xa0 [btrfs] [<000000006d37e007>] btrfs_dio_iomap_begin+0x415/0x970 [btrfs] [<00000000fb8a74b8>] iomap_iter+0x161/0x1e0 [<0000000071dff6ff>] __iomap_dio_rw+0x1df/0x700 [<000000002567ba53>] iomap_dio_rw+0x5/0x20 [<0000000072e555f8>] btrfs_file_write_iter+0x290/0x530 [btrfs] [<000000005eb3d845>] new_sync_write+0x106/0x180 [<000000003fb505bf>] vfs_write+0x24d/0x2f0 [<000000009bb57d37>] __x64_sys_pwrite64+0x69/0xa0 [<000000003eba3fdf>] do_syscall_64+0x43/0x90 In case brtfs_qgroup_reserve_data() or btrfs_delalloc_reserve_metadata() fail the allocated extent_changeset will not be freed. So in btrfs_check_data_free_space() and btrfs_delalloc_reserve_space() free the allocated extent_changeset to get rid of the allocated memory. The issue currently only happens in the direct IO write path, but only after 65b3c08606e5 ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range"), and also at defrag_one_locked_target(). Every other place is always calling extent_changeset_free() even if its call to btrfs_delalloc_reserve_space() or btrfs_check_data_free_space() has failed. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana Signed-off-by: Johannes Thumshirn Signed-off-by: David Sterba --- fs/btrfs/delalloc-space.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c index 2059d1504149..40c4d6ba3fb9 100644 --- a/fs/btrfs/delalloc-space.c +++ b/fs/btrfs/delalloc-space.c @@ -143,10 +143,13 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode, /* Use new btrfs_qgroup_reserve_data to reserve precious data space. */ ret = btrfs_qgroup_reserve_data(inode, reserved, start, len); - if (ret < 0) + if (ret < 0) { btrfs_free_reserved_data_space_noquota(fs_info, len); - else + extent_changeset_free(*reserved); + *reserved = NULL; + } else { ret = 0; + } return ret; } @@ -452,8 +455,11 @@ int btrfs_delalloc_reserve_space(struct btrfs_inode *inode, if (ret < 0) return ret; ret = btrfs_delalloc_reserve_metadata(inode, len); - if (ret < 0) + if (ret < 0) { btrfs_free_reserved_data_space(inode, *reserved, start, len); + extent_changeset_free(*reserved); + *reserved = NULL; + } return ret; } From 5911f5382022aff2b817cb88f276756af229664d Mon Sep 17 00:00:00 2001 From: Johannes Thumshirn Date: Thu, 2 Dec 2021 00:47:14 -0800 Subject: [PATCH 074/143] btrfs: zoned: clear data relocation bg on zone finish When finishing a zone that is used by a dedicated data relocation block group, also remove its reference from fs_info, so we're not trying to use a full block group for allocations during data relocation, which will always fail. The result is we're not making any forward progress and end up in a deadlock situation. Fixes: c2707a255623 ("btrfs: zoned: add a dedicated data relocation block group") Reviewed-by: Naohiro Aota Signed-off-by: Johannes Thumshirn Signed-off-by: David Sterba --- fs/btrfs/zoned.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 67d932d70798..678a29469511 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1860,6 +1860,7 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group) block_group->alloc_offset = block_group->zone_capacity; block_group->free_space_ctl->free_space = 0; btrfs_clear_treelog_bg(block_group); + btrfs_clear_data_reloc_bg(block_group); spin_unlock(&block_group->lock); ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH, @@ -1942,6 +1943,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len ASSERT(block_group->alloc_offset == block_group->zone_capacity); ASSERT(block_group->free_space_ctl->free_space == 0); btrfs_clear_treelog_bg(block_group); + btrfs_clear_data_reloc_bg(block_group); spin_unlock(&block_group->lock); map = block_group->physical_map; From 8289ed9f93bef2762f9184e136d994734b16d997 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 1 Dec 2021 19:56:17 +0800 Subject: [PATCH 075/143] btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling I hit the BUG_ON() with generic/475 test case, and to my surprise, all callers of btrfs_del_root_ref() are already aborting transaction, thus there is not need for such BUG_ON(), just go to @out label and caller will properly handle the error. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/root-tree.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 702dc5441f03..db37a3799649 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -336,7 +336,8 @@ int btrfs_del_root_ref(struct btrfs_trans_handle *trans, u64 root_id, key.offset = ref_id; again: ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1); - BUG_ON(ret < 0); + if (ret < 0) + goto out; if (ret == 0) { leaf = path->nodes[0]; ref = btrfs_item_ptr(leaf, path->slots[0], From 30e32f300be6d0160fd1b3fc6d0f62917acd9be2 Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Wed, 8 Dec 2021 15:35:06 +0200 Subject: [PATCH 076/143] nvmet-tcp: fix possible list corruption for unexpected command failure nvmet_tcp_handle_req_failure needs to understand weather to prepare for incoming data or the next pdu. However if we misidentify this, we will wait for 0-length data, and queue the response although nvmet_req_init already did that. The particular command was namespace management command with no data, which was incorrectly categorized as a command with incapsule data. Also, add a code comment of what we are trying to do here. Signed-off-by: Sagi Grimberg Reviewed-by: Keith Busch Signed-off-by: Christoph Hellwig --- drivers/nvme/target/tcp.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index cb6a473c3eaf..7c1c43ce466b 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -922,7 +922,14 @@ static void nvmet_tcp_handle_req_failure(struct nvmet_tcp_queue *queue, size_t data_len = le32_to_cpu(req->cmd->common.dptr.sgl.length); int ret; - if (!nvme_is_write(cmd->req.cmd) || + /* + * This command has not been processed yet, hence we are trying to + * figure out if there is still pending data left to receive. If + * we don't, we can simply prepare for the next pdu and bail out, + * otherwise we will need to prepare a buffer and receive the + * stale data before continuing forward. + */ + if (!nvme_is_write(cmd->req.cmd) || !data_len || data_len > cmd->req.port->inline_data_size) { nvmet_prepare_receive_pdu(queue); return; From b19926d4f3a660a8b76e5d989ffd1168e619a5c4 Mon Sep 17 00:00:00 2001 From: Bas Nieuwenhuizen Date: Wed, 8 Dec 2021 03:39:35 +0100 Subject: [PATCH 077/143] drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit dma_fence_chain_find_seqno only ever returns the top fence in the chain or an unsignalled fence. Hence if we request a seqno that is already signalled it returns a NULL fence. Some callers are not prepared to handle this, like the syncobj transfer functions for example. This behavior is "new" with timeline syncobj and it looks like not all callers were updated. To fix this behavior make sure that a successful drm_sync_find_fence always returns a non-NULL fence. v2: Move the fix to drm_syncobj_find_fence from the transfer functions. Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen Reviewed-by: Christian König Acked-by: Lionel Landwerlin Signed-off-by: Christian König Link: https://patchwork.freedesktop.org/patch/msgid/20211208023935.17018-1-bas@basnieuwenhuizen.nl --- drivers/gpu/drm/drm_syncobj.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index c9a9d74f338c..c313a5b4549c 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -404,8 +404,17 @@ int drm_syncobj_find_fence(struct drm_file *file_private, if (*fence) { ret = dma_fence_chain_find_seqno(fence, point); - if (!ret) + if (!ret) { + /* If the requested seqno is already signaled + * drm_syncobj_find_fence may return a NULL + * fence. To make sure the recipient gets + * signalled, use a new fence instead. + */ + if (!*fence) + *fence = dma_fence_get_stub(); + goto out; + } dma_fence_put(*fence); } else { ret = -EINVAL; From 7d5b7cad79da76f3dad4a9f6040e524217814e5a Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Mon, 6 Dec 2021 19:20:30 +0100 Subject: [PATCH 078/143] ftrace: Use direct_ops hash in unregister_ftrace_direct Now when we have *direct_multi interface the direct_functions hash is no longer owned just by direct_ops. It's also used by any other ftrace_ops passed to *direct_multi interface. Thus to find out that we are unregistering the last function from direct_ops, we need to check directly direct_ops's hash. Link: https://lkml.kernel.org/r/20211206182032.87248-2-jolsa@kernel.org Cc: Ingo Molnar Cc: Heiko Carstens Fixes: f64dd4627ec6 ("ftrace: Add multi direct register/unregister interface") Signed-off-by: Jiri Olsa Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/ftrace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 30bc880c3849..7f0594e28226 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5217,6 +5217,7 @@ int unregister_ftrace_direct(unsigned long ip, unsigned long addr) { struct ftrace_direct_func *direct; struct ftrace_func_entry *entry; + struct ftrace_hash *hash; int ret = -ENODEV; mutex_lock(&direct_mutex); @@ -5225,7 +5226,8 @@ int unregister_ftrace_direct(unsigned long ip, unsigned long addr) if (!entry) goto out_unlock; - if (direct_functions->count == 1) + hash = direct_ops.func_hash->filter_hash; + if (hash->count == 1) unregister_ftrace_function(&direct_ops); ret = ftrace_set_filter_ip(&direct_ops, ip, 1, 0); From fea3ffa48c6d42a11dca766c89284d22eaf5603f Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Mon, 6 Dec 2021 19:20:31 +0100 Subject: [PATCH 079/143] ftrace: Add cleanup to unregister_ftrace_direct_multi Adding ops cleanup to unregister_ftrace_direct_multi, so it can be reused in another register call. Link: https://lkml.kernel.org/r/20211206182032.87248-3-jolsa@kernel.org Cc: Ingo Molnar Cc: Heiko Carstens Fixes: f64dd4627ec6 ("ftrace: Add multi direct register/unregister interface") Signed-off-by: Jiri Olsa Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/ftrace.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 7f0594e28226..be5f6b32a012 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5542,6 +5542,10 @@ int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) err = unregister_ftrace_function(ops); remove_direct_functions_hash(hash, addr); mutex_unlock(&direct_mutex); + + /* cleanup for possible another register call */ + ops->func = NULL; + ops->trampoline = 0; return err; } EXPORT_SYMBOL_GPL(unregister_ftrace_direct_multi); From 9cdb54be3e463f5c0607fcac045d5a9c67575775 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 1 Dec 2021 20:48:31 -0800 Subject: [PATCH 080/143] drm/i915: Fix error pointer dereference in i915_gem_do_execbuffer() Originally "out_fence" was set using out_fence = sync_file_create() but which returns NULL, but now it is set with out_fence = eb_requests_create() which returns error pointers. The error path needs to be modified to avoid an Oops in the "goto err_request;" path. Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Signed-off-by: Dan Carpenter Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost Signed-off-by: John Harrison Link: https://patchwork.freedesktop.org/patch/msgid/20211202044831.29583-1-matthew.brost@intel.com (cherry picked from commit 8722ded49ce8a0c706b373e8087eb810684962ff) Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 4d7da07442f2..9b24d9b5ade1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -3277,6 +3277,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, out_fence = eb_requests_create(&eb, in_fence, out_fence_fd); if (IS_ERR(out_fence)) { err = PTR_ERR(out_fence); + out_fence = NULL; if (eb.requests[0]) goto err_request; else From 75e895343d5a2fcbdf4cb3d31ab7492bd65925f0 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Wed, 8 Dec 2021 15:39:16 -0600 Subject: [PATCH 081/143] Revert "kbuild: Enable DT schema checks for %.dtb targets" This reverts commit 53182e81f47d4ea0c727c49ad23cb782173ab849. This added tool dependencies on various build systems using %.dtb targets. Validation will need to be controlled by a kconfig or make variable instead, but for now let's just revert it. Signed-off-by: Rob Herring --- Makefile | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Makefile b/Makefile index 9e12c14ea0fb..fa5070e53979 100644 --- a/Makefile +++ b/Makefile @@ -1374,17 +1374,17 @@ endif ifneq ($(dtstree),) -%.dtb: dt_binding_check include/config/kernel.release scripts_dtc - $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@ $(dtstree)/$*.dt.yaml +%.dtb: include/config/kernel.release scripts_dtc + $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@ -%.dtbo: dt_binding_check include/config/kernel.release scripts_dtc - $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@ $(dtstree)/$*.dt.yaml +%.dtbo: include/config/kernel.release scripts_dtc + $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@ PHONY += dtbs dtbs_install dtbs_check dtbs: include/config/kernel.release scripts_dtc $(Q)$(MAKE) $(build)=$(dtstree) -ifneq ($(filter dtbs_check %.dtb %.dtbo, $(MAKECMDGOALS)),) +ifneq ($(filter dtbs_check, $(MAKECMDGOALS)),) export CHECK_DTBS=y dtbs: dt_binding_check endif From 9de0737d5ba0425c3154d5d83da12a8fa8595c0f Mon Sep 17 00:00:00 2001 From: Paulo Alcantara Date: Tue, 7 Dec 2021 22:51:04 -0300 Subject: [PATCH 082/143] cifs: fix ntlmssp auth when there is no key exchange Warn on the lack of key exchange during NTLMSSP authentication rather than aborting it as there are some servers that do not set it in CHALLENGE message. Signed-off-by: Paulo Alcantara (SUSE) Acked-by: Ronnie Sahlberg Signed-off-by: Steve French --- fs/cifs/sess.c | 54 +++++++++++++++++++++++++++++++++----------------- 1 file changed, 36 insertions(+), 18 deletions(-) diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c index af63548eaf26..035dc3e245dc 100644 --- a/fs/cifs/sess.c +++ b/fs/cifs/sess.c @@ -590,8 +590,8 @@ int decode_ntlmssp_challenge(char *bcc_ptr, int blob_len, { unsigned int tioffset; /* challenge message target info area */ unsigned int tilen; /* challenge message target info area length */ - CHALLENGE_MESSAGE *pblob = (CHALLENGE_MESSAGE *)bcc_ptr; + __u32 server_flags; if (blob_len < sizeof(CHALLENGE_MESSAGE)) { cifs_dbg(VFS, "challenge blob len %d too small\n", blob_len); @@ -609,12 +609,37 @@ int decode_ntlmssp_challenge(char *bcc_ptr, int blob_len, return -EINVAL; } + server_flags = le32_to_cpu(pblob->NegotiateFlags); + cifs_dbg(FYI, "%s: negotiate=0x%08x challenge=0x%08x\n", __func__, + ses->ntlmssp->client_flags, server_flags); + + if ((ses->ntlmssp->client_flags & (NTLMSSP_NEGOTIATE_SEAL | NTLMSSP_NEGOTIATE_SIGN)) && + (!(server_flags & NTLMSSP_NEGOTIATE_56) && !(server_flags & NTLMSSP_NEGOTIATE_128))) { + cifs_dbg(VFS, "%s: requested signing/encryption but server did not return either 56-bit or 128-bit session key size\n", + __func__); + return -EINVAL; + } + if (!(server_flags & NTLMSSP_NEGOTIATE_NTLM) && !(server_flags & NTLMSSP_NEGOTIATE_EXTENDED_SEC)) { + cifs_dbg(VFS, "%s: server does not seem to support either NTLMv1 or NTLMv2\n", __func__); + return -EINVAL; + } + if (ses->server->sign && !(server_flags & NTLMSSP_NEGOTIATE_SIGN)) { + cifs_dbg(VFS, "%s: forced packet signing but server does not seem to support it\n", + __func__); + return -EOPNOTSUPP; + } + if ((ses->ntlmssp->client_flags & NTLMSSP_NEGOTIATE_KEY_XCH) && + !(server_flags & NTLMSSP_NEGOTIATE_KEY_XCH)) + pr_warn_once("%s: authentication has been weakened as server does not support key exchange\n", + __func__); + + ses->ntlmssp->server_flags = server_flags; + memcpy(ses->ntlmssp->cryptkey, pblob->Challenge, CIFS_CRYPTO_KEY_SIZE); - /* BB we could decode pblob->NegotiateFlags; some may be useful */ /* In particular we can examine sign flags */ /* BB spec says that if AvId field of MsvAvTimestamp is populated then we must set the MIC field of the AUTHENTICATE_MESSAGE */ - ses->ntlmssp->server_flags = le32_to_cpu(pblob->NegotiateFlags); + tioffset = le32_to_cpu(pblob->TargetInfoArray.BufferOffset); tilen = le16_to_cpu(pblob->TargetInfoArray.Length); if (tioffset > blob_len || tioffset + tilen > blob_len) { @@ -721,13 +746,13 @@ int build_ntlmssp_negotiate_blob(unsigned char **pbuffer, flags = NTLMSSP_NEGOTIATE_56 | NTLMSSP_REQUEST_TARGET | NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE | NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC | - NTLMSSP_NEGOTIATE_SEAL; - if (server->sign) - flags |= NTLMSSP_NEGOTIATE_SIGN; + NTLMSSP_NEGOTIATE_ALWAYS_SIGN | NTLMSSP_NEGOTIATE_SEAL | + NTLMSSP_NEGOTIATE_SIGN; if (!server->session_estab || ses->ntlmssp->sesskey_per_smbsess) flags |= NTLMSSP_NEGOTIATE_KEY_XCH; tmp = *pbuffer + sizeof(NEGOTIATE_MESSAGE); + ses->ntlmssp->client_flags = flags; sec_blob->NegotiateFlags = cpu_to_le32(flags); /* these fields should be null in negotiate phase MS-NLMP 3.1.5.1.1 */ @@ -779,15 +804,8 @@ int build_ntlmssp_auth_blob(unsigned char **pbuffer, memcpy(sec_blob->Signature, NTLMSSP_SIGNATURE, 8); sec_blob->MessageType = NtLmAuthenticate; - flags = NTLMSSP_NEGOTIATE_56 | - NTLMSSP_REQUEST_TARGET | NTLMSSP_NEGOTIATE_TARGET_INFO | - NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE | - NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC | - NTLMSSP_NEGOTIATE_SEAL | NTLMSSP_NEGOTIATE_WORKSTATION_SUPPLIED; - if (ses->server->sign) - flags |= NTLMSSP_NEGOTIATE_SIGN; - if (!ses->server->session_estab || ses->ntlmssp->sesskey_per_smbsess) - flags |= NTLMSSP_NEGOTIATE_KEY_XCH; + flags = ses->ntlmssp->server_flags | NTLMSSP_REQUEST_TARGET | + NTLMSSP_NEGOTIATE_TARGET_INFO | NTLMSSP_NEGOTIATE_WORKSTATION_SUPPLIED; tmp = *pbuffer + sizeof(AUTHENTICATE_MESSAGE); sec_blob->NegotiateFlags = cpu_to_le32(flags); @@ -834,9 +852,9 @@ int build_ntlmssp_auth_blob(unsigned char **pbuffer, *pbuffer, &tmp, nls_cp); - if (((ses->ntlmssp->server_flags & NTLMSSP_NEGOTIATE_KEY_XCH) || - (ses->ntlmssp->server_flags & NTLMSSP_NEGOTIATE_EXTENDED_SEC)) - && !calc_seckey(ses)) { + if ((ses->ntlmssp->server_flags & NTLMSSP_NEGOTIATE_KEY_XCH) && + (!ses->server->session_estab || ses->ntlmssp->sesskey_per_smbsess) && + !calc_seckey(ses)) { memcpy(tmp, ses->ntlmssp->ciphertext, CIFS_CPHTXT_SIZE); sec_blob->SessionKey.BufferOffset = cpu_to_le32(tmp - *pbuffer); sec_blob->SessionKey.Length = cpu_to_le16(CIFS_CPHTXT_SIZE); From a66307d473077b7aeba74e9b09c841ab3d399c2d Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Wed, 8 Dec 2021 07:58:53 +0100 Subject: [PATCH 083/143] libata: add horkage for ASMedia 1092 The ASMedia 1092 has a configuration mode which will present a dummy device; sadly the implementation falsely claims to provide a device with 100M which doesn't actually exist. So disable this device to avoid errors during boot. Cc: stable@vger.kernel.org Signed-off-by: Hannes Reinecke Signed-off-by: Damien Le Moal --- drivers/ata/libata-core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 59ad8c979cb3..aba0c67d1bd6 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -3920,6 +3920,8 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "VRFDFC22048UCHC-TE*", NULL, ATA_HORKAGE_NODMA }, /* Odd clown on sil3726/4726 PMPs */ { "Config Disk", NULL, ATA_HORKAGE_DISABLE }, + /* Similar story with ASMedia 1092 */ + { "ASMT109x- Config", NULL, ATA_HORKAGE_DISABLE }, /* Weird ATAPI devices */ { "TORiSAN DVD-ROM DRD-N216", NULL, ATA_HORKAGE_MAX_SEC_128 }, From af6902ec415655236adea91826bd96ed0ab16f42 Mon Sep 17 00:00:00 2001 From: Nicholas Kazlauskas Date: Tue, 23 Nov 2021 11:56:38 -0500 Subject: [PATCH 084/143] drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset [Why] The HW interrupt gets disabled after S3/S4/reset so we don't receive notifications for HPD or AUX from DMUB - leading to timeout and black screen with (or without) DPIA links connected. [How] Re-enable the interrupt after S3/S4/reset like we do for the other DC interrupts. Guard both instances of the outbox interrupt enable or we'll hang during restore on ASIC that don't support it. Fixes: 6eff272dbee7ad ("drm/amd/display: Fix DPIA outbox timeout after GPU reset") Reviewed-by: Jude Shih Acked-by: Pavle Kotarac Signed-off-by: Nicholas Kazlauskas Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 1cd6b9f4a568..122dae1a1813 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2576,7 +2576,8 @@ static int dm_resume(void *handle) */ link_enc_cfg_init(dm->dc, dc_state); - amdgpu_dm_outbox_init(adev); + if (dc_enable_dmub_notifications(adev->dm.dc)) + amdgpu_dm_outbox_init(adev); r = dm_dmub_hw_init(adev); if (r) @@ -2625,6 +2626,10 @@ static int dm_resume(void *handle) /* TODO: Remove dc_state->dccg, use dc->dccg directly. */ dc_resource_state_construct(dm->dc, dm_state->context); + /* Re-enable outbox interrupts for DPIA. */ + if (dc_enable_dmub_notifications(adev->dm.dc)) + amdgpu_dm_outbox_init(adev); + /* Before powering on DC we need to re-initialize DMUB. */ r = dm_dmub_hw_init(adev); if (r) From 0755c38eb007196a5f779298b4a5f46c4eec41d2 Mon Sep 17 00:00:00 2001 From: Mikita Lipski Date: Mon, 15 Nov 2021 16:07:38 -0500 Subject: [PATCH 085/143] drm/amd/display: prevent reading unitialized links [why/how] The function can be called on boot or after suspend when links are not initialized, to prevent it guard it with NULL pointer check Reviewed-by: Nicholas Kazlauskas Acked-by: Pavle Kotarac Signed-off-by: Mikita Lipski Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/dc_link.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h b/drivers/gpu/drm/amd/display/dc/dc_link.h index b01077a6af0e..fad3d883ed89 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_link.h +++ b/drivers/gpu/drm/amd/display/dc/dc_link.h @@ -226,6 +226,8 @@ static inline void get_edp_links(const struct dc *dc, *edp_num = 0; for (i = 0; i < dc->link_count; i++) { // report any eDP links, even unconnected DDI's + if (!dc->links[i]) + continue; if (dc->links[i]->connector_signal == SIGNAL_TYPE_EDP) { edp_links[*edp_num] = dc->links[i]; if (++(*edp_num) == MAX_NUM_EDP) From b503de239f62eca898cfb7e820d9a35499137d22 Mon Sep 17 00:00:00 2001 From: Vincent Whitchurch Date: Thu, 2 Dec 2021 16:32:14 +0100 Subject: [PATCH 086/143] i2c: virtio: fix completion handling The driver currently assumes that the notify callback is only received when the device is done with all the queued buffers. However, this is not true, since the notify callback could be called without any of the queued buffers being completed (for example, with virtio-pci and shared interrupts) or with only some of the buffers being completed (since the driver makes them available to the device in multiple separate virtqueue_add_sgs() calls). This can lead to incorrect data on the I2C bus or memory corruption in the guest if the device operates on buffers which are have been freed by the driver. (The WARN_ON in the driver is also triggered.) BUG kmalloc-128 (Tainted: G W ): Poison overwritten First byte 0x0 instead of 0x6b Allocated in i2cdev_ioctl_rdwr+0x9d/0x1de age=243 cpu=0 pid=28 memdup_user+0x2e/0xbd i2cdev_ioctl_rdwr+0x9d/0x1de i2cdev_ioctl+0x247/0x2ed vfs_ioctl+0x21/0x30 sys_ioctl+0xb18/0xb41 Freed in i2cdev_ioctl_rdwr+0x1bb/0x1de age=68 cpu=0 pid=28 kfree+0x1bd/0x1cc i2cdev_ioctl_rdwr+0x1bb/0x1de i2cdev_ioctl+0x247/0x2ed vfs_ioctl+0x21/0x30 sys_ioctl+0xb18/0xb41 Fix this by calling virtio_get_buf() from the notify handler like other virtio drivers and by actually waiting for all the buffers to be completed. Fixes: 3cfc88380413d20f ("i2c: virtio: add a virtio i2c frontend driver") Acked-by: Viresh Kumar Signed-off-by: Vincent Whitchurch Acked-by: Michael S. Tsirkin Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-virtio.c | 32 ++++++++++++-------------------- 1 file changed, 12 insertions(+), 20 deletions(-) diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c index 95378780da6d..41eb0dcc3204 100644 --- a/drivers/i2c/busses/i2c-virtio.c +++ b/drivers/i2c/busses/i2c-virtio.c @@ -22,24 +22,24 @@ /** * struct virtio_i2c - virtio I2C data * @vdev: virtio device for this controller - * @completion: completion of virtio I2C message * @adap: I2C adapter for this controller * @vq: the virtio virtqueue for communication */ struct virtio_i2c { struct virtio_device *vdev; - struct completion completion; struct i2c_adapter adap; struct virtqueue *vq; }; /** * struct virtio_i2c_req - the virtio I2C request structure + * @completion: completion of virtio I2C message * @out_hdr: the OUT header of the virtio I2C message * @buf: the buffer into which data is read, or from which it's written * @in_hdr: the IN header of the virtio I2C message */ struct virtio_i2c_req { + struct completion completion; struct virtio_i2c_out_hdr out_hdr ____cacheline_aligned; uint8_t *buf ____cacheline_aligned; struct virtio_i2c_in_hdr in_hdr ____cacheline_aligned; @@ -47,9 +47,11 @@ struct virtio_i2c_req { static void virtio_i2c_msg_done(struct virtqueue *vq) { - struct virtio_i2c *vi = vq->vdev->priv; + struct virtio_i2c_req *req; + unsigned int len; - complete(&vi->completion); + while ((req = virtqueue_get_buf(vq, &len))) + complete(&req->completion); } static int virtio_i2c_prepare_reqs(struct virtqueue *vq, @@ -62,6 +64,8 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq, for (i = 0; i < num; i++) { int outcnt = 0, incnt = 0; + init_completion(&reqs[i].completion); + /* * Only 7-bit mode supported for this moment. For the address * format, Please check the Virtio I2C Specification. @@ -106,21 +110,15 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq, struct virtio_i2c_req *reqs, struct i2c_msg *msgs, int num) { - struct virtio_i2c_req *req; bool failed = false; - unsigned int len; int i, j = 0; for (i = 0; i < num; i++) { - /* Detach the ith request from the vq */ - req = virtqueue_get_buf(vq, &len); + struct virtio_i2c_req *req = &reqs[i]; - /* - * Condition req == &reqs[i] should always meet since we have - * total num requests in the vq. reqs[i] can never be NULL here. - */ - if (!failed && (WARN_ON(req != &reqs[i]) || - req->in_hdr.status != VIRTIO_I2C_MSG_OK)) + wait_for_completion(&req->completion); + + if (!failed && req->in_hdr.status != VIRTIO_I2C_MSG_OK) failed = true; i2c_put_dma_safe_msg_buf(reqs[i].buf, &msgs[i], !failed); @@ -156,12 +154,8 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, * remote here to clear the virtqueue, so we can try another set of * messages later on. */ - - reinit_completion(&vi->completion); virtqueue_kick(vq); - wait_for_completion(&vi->completion); - count = virtio_i2c_complete_reqs(vq, reqs, msgs, count); err_free: @@ -210,8 +204,6 @@ static int virtio_i2c_probe(struct virtio_device *vdev) vdev->priv = vi; vi->vdev = vdev; - init_completion(&vi->completion); - ret = virtio_i2c_setup_vqs(vi); if (ret) return ret; From d594b35d3b31bc04b6ef36589f38135d3acb8df5 Mon Sep 17 00:00:00 2001 From: Wenbin Mei Date: Tue, 7 Dec 2021 15:50:13 +0800 Subject: [PATCH 087/143] mmc: mediatek: free the ext_csd when mmc_get_ext_csd success If mmc_get_ext_csd success, the ext_csd are not freed. Add the missing kfree() calls. Signed-off-by: Wenbin Mei Fixes: c4ac38c6539b ("mmc: mtk-sd: Add HS400 online tuning support") Link: https://lore.kernel.org/r/20211207075013.22911-1-wenbin.mei@mediatek.com Signed-off-by: Ulf Hansson --- drivers/mmc/host/mtk-sd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c index 943940b44e83..632775217d35 100644 --- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -2291,8 +2291,10 @@ static int msdc_execute_hs400_tuning(struct mmc_host *mmc, struct mmc_card *card sdr_set_field(host->base + PAD_DS_TUNE, PAD_DS_TUNE_DLY1, i); ret = mmc_get_ext_csd(card, &ext_csd); - if (!ret) + if (!ret) { result_dly1 |= (1 << i); + kfree(ext_csd); + } } host->hs400_tuning = false; From 52255ef662a5d490678fbad64a735f88fcba564d Mon Sep 17 00:00:00 2001 From: Raviteja Goud Talla Date: Fri, 3 Dec 2021 20:26:03 +0530 Subject: [PATCH 088/143] drm/i915/gen11: Moving WAs to icl_gt_workarounds_init() Bspec page says "Reset: BUS", Accordingly moving w/a's: Wa_1407352427,Wa_1406680159 to proper function icl_gt_workarounds_init() Which will resolve guc enabling error v2: - Previous patch rev2 was created by email client which caused the Build failure, This v2 is to resolve the previous broken series Reviewed-by: John Harrison Signed-off-by: Raviteja Goud Talla Signed-off-by: John Harrison Link: https://patchwork.freedesktop.org/patch/msgid/20211203145603.4006937-1-ravitejax.goud.talla@intel.com (cherry picked from commit 67b858dd89932086ae0ee2d0ce4dd070a2c88bb3) Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/i915/gt/intel_workarounds.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index ed73d9bc9d40..2400d6423ba5 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -1127,6 +1127,15 @@ icl_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) GAMT_CHKN_BIT_REG, GAMT_CHKN_DISABLE_L3_COH_PIPE); + /* Wa_1407352427:icl,ehl */ + wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE2, + PSDUNIT_CLKGATE_DIS); + + /* Wa_1406680159:icl,ehl */ + wa_write_or(wal, + SUBSLICE_UNIT_LEVEL_CLKGATE, + GWUNIT_CLKGATE_DIS); + /* Wa_1607087056:icl,ehl,jsl */ if (IS_ICELAKE(i915) || IS_JSL_EHL_GT_STEP(i915, STEP_A0, STEP_B0)) @@ -1852,15 +1861,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE, VSUNIT_CLKGATE_DIS | HSUNIT_CLKGATE_DIS); - /* Wa_1407352427:icl,ehl */ - wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE2, - PSDUNIT_CLKGATE_DIS); - - /* Wa_1406680159:icl,ehl */ - wa_write_or(wal, - SUBSLICE_UNIT_LEVEL_CLKGATE, - GWUNIT_CLKGATE_DIS); - /* * Wa_1408767742:icl[a2..forever],ehl[all] * Wa_1605460711:icl[a0..c0] From ee3a4f666207b5a9d3d4bc7f45c9d59f2aeb3a0d Mon Sep 17 00:00:00 2001 From: "Maciej S. Szmigiero" Date: Fri, 3 Dec 2021 00:10:13 +0100 Subject: [PATCH 089/143] KVM: x86: selftests: svm_int_ctl_test: fix intercept calculation INTERCEPT_x are bit positions, but the code was using the raw value of INTERCEPT_VINTR (4) instead of BIT(INTERCEPT_VINTR). This resulted in masking of bit 2 - that is, SMI instead of VINTR. Signed-off-by: Maciej S. Szmigiero Message-Id: <49b9571d25588870db5380b0be1a41df4bbaaf93.1638486479.git.maciej.szmigiero@oracle.com> --- tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c b/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c index df04f56ce859..30a81038df46 100644 --- a/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c +++ b/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c @@ -75,7 +75,7 @@ static void l1_guest_code(struct svm_test_data *svm) vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK; /* No intercepts for real and virtual interrupts */ - vmcb->control.intercept &= ~(1ULL << INTERCEPT_INTR | INTERCEPT_VINTR); + vmcb->control.intercept &= ~(BIT(INTERCEPT_INTR) | BIT(INTERCEPT_VINTR)); /* Make a virtual interrupt VINTR_IRQ_NUMBER pending */ vmcb->control.int_ctl |= V_IRQ_MASK | (0x1 << V_INTR_PRIO_SHIFT); From e1067a07cfbc5a36abad3752fafe4c79e06db1bb Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Mon, 6 Dec 2021 19:20:32 +0100 Subject: [PATCH 090/143] ftrace/samples: Add module to test multi direct modify interface Adding ftrace-direct-multi-modify.ko kernel module that uses modify_ftrace_direct_multi API. The core functionality is taken from ftrace-direct-modify.ko kernel module and changed to fit multi direct interface. The init function creates kthread that periodically calls modify_ftrace_direct_multi to change the trampoline address for the direct ftrace_ops. The ftrace trace_pipe then shows trace from both trampolines. Link: https://lkml.kernel.org/r/20211206182032.87248-4-jolsa@kernel.org Cc: Ingo Molnar Reviewed-by: Heiko Carstens Tested-by: Heiko Carstens Signed-off-by: Jiri Olsa Signed-off-by: Steven Rostedt (VMware) --- samples/ftrace/Makefile | 1 + samples/ftrace/ftrace-direct-multi-modify.c | 152 ++++++++++++++++++++ 2 files changed, 153 insertions(+) create mode 100644 samples/ftrace/ftrace-direct-multi-modify.c diff --git a/samples/ftrace/Makefile b/samples/ftrace/Makefile index b9198e2eef28..faf8cdb79c5f 100644 --- a/samples/ftrace/Makefile +++ b/samples/ftrace/Makefile @@ -4,6 +4,7 @@ obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct.o obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-too.o obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-modify.o obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace-direct-multi.o +obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace-direct-multi-modify.o CFLAGS_sample-trace-array.o := -I$(src) obj-$(CONFIG_SAMPLE_TRACE_ARRAY) += sample-trace-array.o diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c new file mode 100644 index 000000000000..91bc42a7adb9 --- /dev/null +++ b/samples/ftrace/ftrace-direct-multi-modify.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include + +void my_direct_func1(unsigned long ip) +{ + trace_printk("my direct func1 ip %lx\n", ip); +} + +void my_direct_func2(unsigned long ip) +{ + trace_printk("my direct func2 ip %lx\n", ip); +} + +extern void my_tramp1(void *); +extern void my_tramp2(void *); + +#ifdef CONFIG_X86_64 + +asm ( +" .pushsection .text, \"ax\", @progbits\n" +" .type my_tramp1, @function\n" +" .globl my_tramp1\n" +" my_tramp1:" +" pushq %rbp\n" +" movq %rsp, %rbp\n" +" pushq %rdi\n" +" movq 8(%rbp), %rdi\n" +" call my_direct_func1\n" +" popq %rdi\n" +" leave\n" +" ret\n" +" .size my_tramp1, .-my_tramp1\n" +" .type my_tramp2, @function\n" +"\n" +" .globl my_tramp2\n" +" my_tramp2:" +" pushq %rbp\n" +" movq %rsp, %rbp\n" +" pushq %rdi\n" +" movq 8(%rbp), %rdi\n" +" call my_direct_func2\n" +" popq %rdi\n" +" leave\n" +" ret\n" +" .size my_tramp2, .-my_tramp2\n" +" .popsection\n" +); + +#endif /* CONFIG_X86_64 */ + +#ifdef CONFIG_S390 + +asm ( +" .pushsection .text, \"ax\", @progbits\n" +" .type my_tramp1, @function\n" +" .globl my_tramp1\n" +" my_tramp1:" +" lgr %r1,%r15\n" +" stmg %r0,%r5,"__stringify(__SF_GPRS)"(%r15)\n" +" stg %r14,"__stringify(__SF_GPRS+8*8)"(%r15)\n" +" aghi %r15,"__stringify(-STACK_FRAME_OVERHEAD)"\n" +" stg %r1,"__stringify(__SF_BACKCHAIN)"(%r15)\n" +" lgr %r2,%r0\n" +" brasl %r14,my_direct_func1\n" +" aghi %r15,"__stringify(STACK_FRAME_OVERHEAD)"\n" +" lmg %r0,%r5,"__stringify(__SF_GPRS)"(%r15)\n" +" lg %r14,"__stringify(__SF_GPRS+8*8)"(%r15)\n" +" lgr %r1,%r0\n" +" br %r1\n" +" .size my_tramp1, .-my_tramp1\n" +"\n" +" .type my_tramp2, @function\n" +" .globl my_tramp2\n" +" my_tramp2:" +" lgr %r1,%r15\n" +" stmg %r0,%r5,"__stringify(__SF_GPRS)"(%r15)\n" +" stg %r14,"__stringify(__SF_GPRS+8*8)"(%r15)\n" +" aghi %r15,"__stringify(-STACK_FRAME_OVERHEAD)"\n" +" stg %r1,"__stringify(__SF_BACKCHAIN)"(%r15)\n" +" lgr %r2,%r0\n" +" brasl %r14,my_direct_func2\n" +" aghi %r15,"__stringify(STACK_FRAME_OVERHEAD)"\n" +" lmg %r0,%r5,"__stringify(__SF_GPRS)"(%r15)\n" +" lg %r14,"__stringify(__SF_GPRS+8*8)"(%r15)\n" +" lgr %r1,%r0\n" +" br %r1\n" +" .size my_tramp2, .-my_tramp2\n" +" .popsection\n" +); + +#endif /* CONFIG_S390 */ + +static unsigned long my_tramp = (unsigned long)my_tramp1; +static unsigned long tramps[2] = { + (unsigned long)my_tramp1, + (unsigned long)my_tramp2, +}; + +static struct ftrace_ops direct; + +static int simple_thread(void *arg) +{ + static int t; + int ret = 0; + + while (!kthread_should_stop()) { + set_current_state(TASK_INTERRUPTIBLE); + schedule_timeout(2 * HZ); + + if (ret) + continue; + t ^= 1; + ret = modify_ftrace_direct_multi(&direct, tramps[t]); + if (!ret) + my_tramp = tramps[t]; + WARN_ON_ONCE(ret); + } + + return 0; +} + +static struct task_struct *simple_tsk; + +static int __init ftrace_direct_multi_init(void) +{ + int ret; + + ftrace_set_filter_ip(&direct, (unsigned long) wake_up_process, 0, 0); + ftrace_set_filter_ip(&direct, (unsigned long) schedule, 0, 0); + + ret = register_ftrace_direct_multi(&direct, my_tramp); + + if (!ret) + simple_tsk = kthread_run(simple_thread, NULL, "event-sample-fn"); + return ret; +} + +static void __exit ftrace_direct_multi_exit(void) +{ + kthread_stop(simple_tsk); + unregister_ftrace_direct_multi(&direct, my_tramp); +} + +module_init(ftrace_direct_multi_init); +module_exit(ftrace_direct_multi_exit); + +MODULE_AUTHOR("Jiri Olsa"); +MODULE_DESCRIPTION("Example use case of using modify_ftrace_direct_multi()"); +MODULE_LICENSE("GPL"); From c24be24aed405d64ebcf04526614c13b2adfb1d2 Mon Sep 17 00:00:00 2001 From: Miaoqian Lin Date: Thu, 9 Dec 2021 02:43:17 +0000 Subject: [PATCH 091/143] tracing: Fix possible memory leak in __create_synth_event() error path There's error paths in __create_synth_event() after the argv is allocated that fail to free it. Add a jump to free it when necessary. Link: https://lkml.kernel.org/r/20211209024317.11783-1-linmq006@gmail.com Suggested-by: Steven Rostedt (VMware) Signed-off-by: Miaoqian Lin [ Fixed up the patch and change log ] Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_synth.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c index 22db3ce95e74..ca9c13b2ecf4 100644 --- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -1237,9 +1237,8 @@ static int __create_synth_event(const char *name, const char *raw_fields) argv + consumed, &consumed, &field_version); if (IS_ERR(field)) { - argv_free(argv); ret = PTR_ERR(field); - goto err; + goto err_free_arg; } /* @@ -1262,18 +1261,19 @@ static int __create_synth_event(const char *name, const char *raw_fields) if (cmd_version > 1 && n_fields_this_loop >= 1) { synth_err(SYNTH_ERR_INVALID_CMD, errpos(field_str)); ret = -EINVAL; - goto err; + goto err_free_arg; } fields[n_fields++] = field; if (n_fields == SYNTH_FIELDS_MAX) { synth_err(SYNTH_ERR_TOO_MANY_FIELDS, 0); ret = -EINVAL; - goto err; + goto err_free_arg; } n_fields_this_loop++; } + argv_free(argv); if (consumed < argc) { synth_err(SYNTH_ERR_INVALID_CMD, 0); @@ -1281,7 +1281,6 @@ static int __create_synth_event(const char *name, const char *raw_fields) goto err; } - argv_free(argv); } if (n_fields == 0) { @@ -1307,6 +1306,8 @@ static int __create_synth_event(const char *name, const char *raw_fields) kfree(saved_fields); return ret; + err_free_arg: + argv_free(argv); err: for (i = 0; i < n_fields; i++) free_synth_field(fields[i]); From 42288cb44c4b5fff7653bc392b583a2b8bd6a8c0 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 8 Dec 2021 17:04:51 -0800 Subject: [PATCH 092/143] wait: add wake_up_pollfree() Several ->poll() implementations are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'. However, that has a bug: wake_up_poll() calls __wake_up() with nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters, and the wakeup function for the first one returns a positive value, only that one will be called. That's *not* what's needed for POLLFREE; POLLFREE is special in that it really needs to wake up everyone. Considering the three non-blocking poll systems: - io_uring poll doesn't handle POLLFREE at all, so it is broken anyway. - aio poll is unaffected, since it doesn't support exclusive waits. However, that's fragile, as someone could add this feature later. - epoll doesn't appear to be broken by this, since its wakeup function returns 0 when it sees POLLFREE. But this is fragile. Although there is a workaround (see epoll), it's better to define a function which always sends POLLFREE to all waiters. Add such a function. Also make it verify that the queue really becomes empty after all waiters have been woken up. Reported-by: Linus Torvalds Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211209010455.42744-2-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/linux/wait.h | 26 ++++++++++++++++++++++++++ kernel/sched/wait.c | 7 +++++++ 2 files changed, 33 insertions(+) diff --git a/include/linux/wait.h b/include/linux/wait.h index 2d0df57c9902..851e07da2583 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -217,6 +217,7 @@ void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void void __wake_up_locked_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked(struct wait_queue_head *wq_head, unsigned int mode, int nr); void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode); +void __wake_up_pollfree(struct wait_queue_head *wq_head); #define wake_up(x) __wake_up(x, TASK_NORMAL, 1, NULL) #define wake_up_nr(x, nr) __wake_up(x, TASK_NORMAL, nr, NULL) @@ -245,6 +246,31 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode); #define wake_up_interruptible_sync_poll_locked(x, m) \ __wake_up_locked_sync_key((x), TASK_INTERRUPTIBLE, poll_to_key(m)) +/** + * wake_up_pollfree - signal that a polled waitqueue is going away + * @wq_head: the wait queue head + * + * In the very rare cases where a ->poll() implementation uses a waitqueue whose + * lifetime is tied to a task rather than to the 'struct file' being polled, + * this function must be called before the waitqueue is freed so that + * non-blocking polls (e.g. epoll) are notified that the queue is going away. + * + * The caller must also RCU-delay the freeing of the wait_queue_head, e.g. via + * an explicit synchronize_rcu() or call_rcu(), or via SLAB_TYPESAFE_BY_RCU. + */ +static inline void wake_up_pollfree(struct wait_queue_head *wq_head) +{ + /* + * For performance reasons, we don't always take the queue lock here. + * Therefore, we might race with someone removing the last entry from + * the queue, and proceed while they still hold the queue lock. + * However, rcu_read_lock() is required to be held in such cases, so we + * can safely proceed with an RCU-delayed free. + */ + if (waitqueue_active(wq_head)) + __wake_up_pollfree(wq_head); +} + #define ___wait_cond_timeout(condition) \ ({ \ bool __cond = (condition); \ diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index 76577d1642a5..eca38107b32f 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -238,6 +238,13 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode) } EXPORT_SYMBOL_GPL(__wake_up_sync); /* For internal use only */ +void __wake_up_pollfree(struct wait_queue_head *wq_head) +{ + __wake_up(wq_head, TASK_NORMAL, 0, poll_to_key(EPOLLHUP | POLLFREE)); + /* POLLFREE must have cleared the queue. */ + WARN_ON_ONCE(waitqueue_active(wq_head)); +} + /* * Note: we use "set_current_state()" _after_ the wait-queue add, * because we need a memory barrier there on SMP, so that any From a880b28a71e39013e357fd3adccd1d8a31bc69a8 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 8 Dec 2021 17:04:52 -0800 Subject: [PATCH 093/143] binder: use wake_up_pollfree() wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up all exclusive waiters. Yet, POLLFREE *must* wake up all waiters. epoll and aio poll are fortunately not affected by this, but it's very fragile. Thus, the new function wake_up_pollfree() has been introduced. Convert binder to use wake_up_pollfree(). Reported-by: Linus Torvalds Fixes: f5cb779ba163 ("ANDROID: binder: remove waitqueue when thread exits.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211209010455.42744-3-ebiggers@kernel.org Signed-off-by: Eric Biggers --- drivers/android/binder.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index cffbe57a8e08..c75fb600740c 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -4422,23 +4422,20 @@ static int binder_thread_release(struct binder_proc *proc, __release(&t->lock); /* - * If this thread used poll, make sure we remove the waitqueue - * from any epoll data structures holding it with POLLFREE. - * waitqueue_active() is safe to use here because we're holding - * the inner lock. + * If this thread used poll, make sure we remove the waitqueue from any + * poll data structures holding it. */ - if ((thread->looper & BINDER_LOOPER_STATE_POLL) && - waitqueue_active(&thread->wait)) { - wake_up_poll(&thread->wait, EPOLLHUP | POLLFREE); - } + if (thread->looper & BINDER_LOOPER_STATE_POLL) + wake_up_pollfree(&thread->wait); binder_inner_proc_unlock(thread->proc); /* - * This is needed to avoid races between wake_up_poll() above and - * and ep_remove_waitqueue() called for other reasons (eg the epoll file - * descriptor being closed); ep_remove_waitqueue() holds an RCU read - * lock, so we can be sure it's done after calling synchronize_rcu(). + * This is needed to avoid races between wake_up_pollfree() above and + * someone else removing the last entry from the queue for other reasons + * (e.g. ep_remove_wait_queue() being called due to an epoll file + * descriptor being closed). Such other users hold an RCU read lock, so + * we can be sure they're done after we call synchronize_rcu(). */ if (thread->looper & BINDER_LOOPER_STATE_POLL) synchronize_rcu(); From 9537bae0da1f8d1e2361ab6d0479e8af7824e160 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 8 Dec 2021 17:04:53 -0800 Subject: [PATCH 094/143] signalfd: use wake_up_pollfree() wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up all exclusive waiters. Yet, POLLFREE *must* wake up all waiters. epoll and aio poll are fortunately not affected by this, but it's very fragile. Thus, the new function wake_up_pollfree() has been introduced. Convert signalfd to use wake_up_pollfree(). Reported-by: Linus Torvalds Fixes: d80e731ecab4 ("epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211209010455.42744-4-ebiggers@kernel.org Signed-off-by: Eric Biggers --- fs/signalfd.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/fs/signalfd.c b/fs/signalfd.c index 040e1cf90528..65ce0e72e7b9 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -35,17 +35,7 @@ void signalfd_cleanup(struct sighand_struct *sighand) { - wait_queue_head_t *wqh = &sighand->signalfd_wqh; - /* - * The lockless check can race with remove_wait_queue() in progress, - * but in this case its caller should run under rcu_read_lock() and - * sighand_cachep is SLAB_TYPESAFE_BY_RCU, we can safely return. - */ - if (likely(!waitqueue_active(wqh))) - return; - - /* wait_queue_entry_t->func(POLLFREE) should do remove_wait_queue() */ - wake_up_poll(wqh, EPOLLHUP | POLLFREE); + wake_up_pollfree(&sighand->signalfd_wqh); } struct signalfd_ctx { From 363bee27e25804d8981dd1c025b4ad49dc39c530 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 8 Dec 2021 17:04:54 -0800 Subject: [PATCH 095/143] aio: keep poll requests on waitqueue until completed Currently, aio_poll_wake() will always remove the poll request from the waitqueue. Then, if aio_poll_complete_work() sees that none of the polled events are ready and the request isn't cancelled, it re-adds the request to the waitqueue. (This can easily happen when polling a file that doesn't pass an event mask when waking up its waitqueue.) This is fundamentally broken for two reasons: 1. If a wakeup occurs between vfs_poll() and the request being re-added to the waitqueue, it will be missed because the request wasn't on the waitqueue at the time. Therefore, IOCB_CMD_POLL might never complete even if the polled file is ready. 2. When the request isn't on the waitqueue, there is no way to be notified that the waitqueue is being freed (which happens when its lifetime is shorter than the struct file's). This is supposed to happen via the waitqueue entries being woken up with POLLFREE. Therefore, leave the requests on the waitqueue until they are actually completed (or cancelled). To keep track of when aio_poll_complete_work needs to be scheduled, use new fields in struct poll_iocb. Remove the 'done' field which is now redundant. Note that this is consistent with how sys_poll() and eventpoll work; their wakeup functions do *not* remove the waitqueue entries. Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL") Cc: # v4.18+ Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org Signed-off-by: Eric Biggers --- fs/aio.c | 83 ++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 63 insertions(+), 20 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 9c81cf611d65..2bc1352a83d8 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -181,8 +181,9 @@ struct poll_iocb { struct file *file; struct wait_queue_head *head; __poll_t events; - bool done; bool cancelled; + bool work_scheduled; + bool work_need_resched; struct wait_queue_entry wait; struct work_struct work; }; @@ -1638,14 +1639,26 @@ static void aio_poll_complete_work(struct work_struct *work) * avoid further branches in the fast path. */ spin_lock_irq(&ctx->ctx_lock); + spin_lock(&req->head->lock); if (!mask && !READ_ONCE(req->cancelled)) { - add_wait_queue(req->head, &req->wait); + /* + * The request isn't actually ready to be completed yet. + * Reschedule completion if another wakeup came in. + */ + if (req->work_need_resched) { + schedule_work(&req->work); + req->work_need_resched = false; + } else { + req->work_scheduled = false; + } + spin_unlock(&req->head->lock); spin_unlock_irq(&ctx->ctx_lock); return; } + list_del_init(&req->wait.entry); + spin_unlock(&req->head->lock); list_del_init(&iocb->ki_list); iocb->ki_res.res = mangle_poll(mask); - req->done = true; spin_unlock_irq(&ctx->ctx_lock); iocb_put(iocb); @@ -1659,9 +1672,9 @@ static int aio_poll_cancel(struct kiocb *iocb) spin_lock(&req->head->lock); WRITE_ONCE(req->cancelled, true); - if (!list_empty(&req->wait.entry)) { - list_del_init(&req->wait.entry); + if (!req->work_scheduled) { schedule_work(&aiocb->poll.work); + req->work_scheduled = true; } spin_unlock(&req->head->lock); @@ -1680,20 +1693,26 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, if (mask && !(mask & req->events)) return 0; - list_del_init(&req->wait.entry); - - if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) { + /* + * Complete the request inline if possible. This requires that three + * conditions be met: + * 1. An event mask must have been passed. If a plain wakeup was done + * instead, then mask == 0 and we have to call vfs_poll() to get + * the events, so inline completion isn't possible. + * 2. The completion work must not have already been scheduled. + * 3. ctx_lock must not be busy. We have to use trylock because we + * already hold the waitqueue lock, so this inverts the normal + * locking order. Use irqsave/irqrestore because not all + * filesystems (e.g. fuse) call this function with IRQs disabled, + * yet IRQs have to be disabled before ctx_lock is obtained. + */ + if (mask && !req->work_scheduled && + spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) { struct kioctx *ctx = iocb->ki_ctx; - /* - * Try to complete the iocb inline if we can. Use - * irqsave/irqrestore because not all filesystems (e.g. fuse) - * call this function with IRQs disabled and because IRQs - * have to be disabled before ctx_lock is obtained. - */ + list_del_init(&req->wait.entry); list_del(&iocb->ki_list); iocb->ki_res.res = mangle_poll(mask); - req->done = true; if (iocb->ki_eventfd && eventfd_signal_allowed()) { iocb = NULL; INIT_WORK(&req->work, aio_poll_put_work); @@ -1703,7 +1722,20 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, if (iocb) iocb_put(iocb); } else { - schedule_work(&req->work); + /* + * Schedule the completion work if needed. If it was already + * scheduled, record that another wakeup came in. + * + * Don't remove the request from the waitqueue here, as it might + * not actually be complete yet (we won't know until vfs_poll() + * is called), and we must not miss any wakeups. + */ + if (req->work_scheduled) { + req->work_need_resched = true; + } else { + schedule_work(&req->work); + req->work_scheduled = true; + } } return 1; } @@ -1750,8 +1782,9 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP; req->head = NULL; - req->done = false; req->cancelled = false; + req->work_scheduled = false; + req->work_need_resched = false; apt.pt._qproc = aio_poll_queue_proc; apt.pt._key = req->events; @@ -1766,17 +1799,27 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) spin_lock_irq(&ctx->ctx_lock); if (likely(req->head)) { spin_lock(&req->head->lock); - if (unlikely(list_empty(&req->wait.entry))) { - if (apt.error) + if (list_empty(&req->wait.entry) || req->work_scheduled) { + /* + * aio_poll_wake() already either scheduled the async + * completion work, or completed the request inline. + */ + if (apt.error) /* unsupported case: multiple queues */ cancel = true; apt.error = 0; mask = 0; } if (mask || apt.error) { + /* Steal to complete synchronously. */ list_del_init(&req->wait.entry); } else if (cancel) { + /* Cancel if possible (may be too late though). */ WRITE_ONCE(req->cancelled, true); - } else if (!req->done) { /* actually waiting for an event */ + } else if (!list_empty(&req->wait.entry)) { + /* + * Actually waiting for an event, so add the request to + * active_reqs so that it can be cancelled if needed. + */ list_add_tail(&aiocb->ki_list, &ctx->active_reqs); aiocb->ki_cancel = aio_poll_cancel; } From 50252e4b5e989ce64555c7aef7516bdefc2fea72 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 8 Dec 2021 17:04:55 -0800 Subject: [PATCH 096/143] aio: fix use-after-free due to missing POLLFREE handling signalfd_poll() and binder_poll() are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, by sending a POLLFREE notification to all waiters. Unfortunately, only eventpoll handles POLLFREE. A second type of non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't handle POLLFREE. This allows a use-after-free to occur if a signalfd or binder fd is polled with aio poll, and the waitqueue gets freed. Fix this by making aio poll handle POLLFREE. A patch by Ramji Jiyani (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com) tried to do this by making aio_poll_wake() always complete the request inline if POLLFREE is seen. However, that solution had two bugs. First, it introduced a deadlock, as it unconditionally locked the aio context while holding the waitqueue lock, which inverts the normal locking order. Second, it didn't consider that POLLFREE notifications are missed while the request has been temporarily de-queued. The second problem was solved by my previous patch. This patch then properly fixes the use-after-free by handling POLLFREE in a deadlock-free way. It does this by taking advantage of the fact that freeing of the waitqueue is RCU-delayed, similar to what eventpoll does. Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL") Cc: # v4.18+ Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org Signed-off-by: Eric Biggers --- fs/aio.c | 137 ++++++++++++++++++++++++-------- include/uapi/asm-generic/poll.h | 2 +- 2 files changed, 107 insertions(+), 32 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 2bc1352a83d8..c9bb0d3d8593 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1620,6 +1620,51 @@ static void aio_poll_put_work(struct work_struct *work) iocb_put(iocb); } +/* + * Safely lock the waitqueue which the request is on, synchronizing with the + * case where the ->poll() provider decides to free its waitqueue early. + * + * Returns true on success, meaning that req->head->lock was locked, req->wait + * is on req->head, and an RCU read lock was taken. Returns false if the + * request was already removed from its waitqueue (which might no longer exist). + */ +static bool poll_iocb_lock_wq(struct poll_iocb *req) +{ + wait_queue_head_t *head; + + /* + * While we hold the waitqueue lock and the waitqueue is nonempty, + * wake_up_pollfree() will wait for us. However, taking the waitqueue + * lock in the first place can race with the waitqueue being freed. + * + * We solve this as eventpoll does: by taking advantage of the fact that + * all users of wake_up_pollfree() will RCU-delay the actual free. If + * we enter rcu_read_lock() and see that the pointer to the queue is + * non-NULL, we can then lock it without the memory being freed out from + * under us, then check whether the request is still on the queue. + * + * Keep holding rcu_read_lock() as long as we hold the queue lock, in + * case the caller deletes the entry from the queue, leaving it empty. + * In that case, only RCU prevents the queue memory from being freed. + */ + rcu_read_lock(); + head = smp_load_acquire(&req->head); + if (head) { + spin_lock(&head->lock); + if (!list_empty(&req->wait.entry)) + return true; + spin_unlock(&head->lock); + } + rcu_read_unlock(); + return false; +} + +static void poll_iocb_unlock_wq(struct poll_iocb *req) +{ + spin_unlock(&req->head->lock); + rcu_read_unlock(); +} + static void aio_poll_complete_work(struct work_struct *work) { struct poll_iocb *req = container_of(work, struct poll_iocb, work); @@ -1639,24 +1684,25 @@ static void aio_poll_complete_work(struct work_struct *work) * avoid further branches in the fast path. */ spin_lock_irq(&ctx->ctx_lock); - spin_lock(&req->head->lock); - if (!mask && !READ_ONCE(req->cancelled)) { - /* - * The request isn't actually ready to be completed yet. - * Reschedule completion if another wakeup came in. - */ - if (req->work_need_resched) { - schedule_work(&req->work); - req->work_need_resched = false; - } else { - req->work_scheduled = false; + if (poll_iocb_lock_wq(req)) { + if (!mask && !READ_ONCE(req->cancelled)) { + /* + * The request isn't actually ready to be completed yet. + * Reschedule completion if another wakeup came in. + */ + if (req->work_need_resched) { + schedule_work(&req->work); + req->work_need_resched = false; + } else { + req->work_scheduled = false; + } + poll_iocb_unlock_wq(req); + spin_unlock_irq(&ctx->ctx_lock); + return; } - spin_unlock(&req->head->lock); - spin_unlock_irq(&ctx->ctx_lock); - return; - } - list_del_init(&req->wait.entry); - spin_unlock(&req->head->lock); + list_del_init(&req->wait.entry); + poll_iocb_unlock_wq(req); + } /* else, POLLFREE has freed the waitqueue, so we must complete */ list_del_init(&iocb->ki_list); iocb->ki_res.res = mangle_poll(mask); spin_unlock_irq(&ctx->ctx_lock); @@ -1670,13 +1716,14 @@ static int aio_poll_cancel(struct kiocb *iocb) struct aio_kiocb *aiocb = container_of(iocb, struct aio_kiocb, rw); struct poll_iocb *req = &aiocb->poll; - spin_lock(&req->head->lock); - WRITE_ONCE(req->cancelled, true); - if (!req->work_scheduled) { - schedule_work(&aiocb->poll.work); - req->work_scheduled = true; - } - spin_unlock(&req->head->lock); + if (poll_iocb_lock_wq(req)) { + WRITE_ONCE(req->cancelled, true); + if (!req->work_scheduled) { + schedule_work(&aiocb->poll.work); + req->work_scheduled = true; + } + poll_iocb_unlock_wq(req); + } /* else, the request was force-cancelled by POLLFREE already */ return 0; } @@ -1728,7 +1775,8 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, * * Don't remove the request from the waitqueue here, as it might * not actually be complete yet (we won't know until vfs_poll() - * is called), and we must not miss any wakeups. + * is called), and we must not miss any wakeups. POLLFREE is an + * exception to this; see below. */ if (req->work_scheduled) { req->work_need_resched = true; @@ -1736,6 +1784,28 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, schedule_work(&req->work); req->work_scheduled = true; } + + /* + * If the waitqueue is being freed early but we can't complete + * the request inline, we have to tear down the request as best + * we can. That means immediately removing the request from its + * waitqueue and preventing all further accesses to the + * waitqueue via the request. We also need to schedule the + * completion work (done above). Also mark the request as + * cancelled, to potentially skip an unneeded call to ->poll(). + */ + if (mask & POLLFREE) { + WRITE_ONCE(req->cancelled, true); + list_del_init(&req->wait.entry); + + /* + * Careful: this *must* be the last step, since as soon + * as req->head is NULL'ed out, the request can be + * completed and freed, since aio_poll_complete_work() + * will no longer need to take the waitqueue lock. + */ + smp_store_release(&req->head, NULL); + } } return 1; } @@ -1743,6 +1813,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, struct aio_poll_table { struct poll_table_struct pt; struct aio_kiocb *iocb; + bool queued; int error; }; @@ -1753,11 +1824,12 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head, struct aio_poll_table *pt = container_of(p, struct aio_poll_table, pt); /* multiple wait queues per file are not supported */ - if (unlikely(pt->iocb->poll.head)) { + if (unlikely(pt->queued)) { pt->error = -EINVAL; return; } + pt->queued = true; pt->error = 0; pt->iocb->poll.head = head; add_wait_queue(head, &pt->iocb->poll.wait); @@ -1789,6 +1861,7 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) apt.pt._qproc = aio_poll_queue_proc; apt.pt._key = req->events; apt.iocb = aiocb; + apt.queued = false; apt.error = -EINVAL; /* same as no support for IOCB_CMD_POLL */ /* initialized the list so that we can do list_empty checks */ @@ -1797,9 +1870,10 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) mask = vfs_poll(req->file, &apt.pt) & req->events; spin_lock_irq(&ctx->ctx_lock); - if (likely(req->head)) { - spin_lock(&req->head->lock); - if (list_empty(&req->wait.entry) || req->work_scheduled) { + if (likely(apt.queued)) { + bool on_queue = poll_iocb_lock_wq(req); + + if (!on_queue || req->work_scheduled) { /* * aio_poll_wake() already either scheduled the async * completion work, or completed the request inline. @@ -1815,7 +1889,7 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) } else if (cancel) { /* Cancel if possible (may be too late though). */ WRITE_ONCE(req->cancelled, true); - } else if (!list_empty(&req->wait.entry)) { + } else if (on_queue) { /* * Actually waiting for an event, so add the request to * active_reqs so that it can be cancelled if needed. @@ -1823,7 +1897,8 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) list_add_tail(&aiocb->ki_list, &ctx->active_reqs); aiocb->ki_cancel = aio_poll_cancel; } - spin_unlock(&req->head->lock); + if (on_queue) + poll_iocb_unlock_wq(req); } if (mask) { /* no async, we'd stolen it */ aiocb->ki_res.res = mangle_poll(mask); diff --git a/include/uapi/asm-generic/poll.h b/include/uapi/asm-generic/poll.h index 41b509f410bf..f9c520ce4bf4 100644 --- a/include/uapi/asm-generic/poll.h +++ b/include/uapi/asm-generic/poll.h @@ -29,7 +29,7 @@ #define POLLRDHUP 0x2000 #endif -#define POLLFREE (__force __poll_t)0x4000 /* currently only for epoll */ +#define POLLFREE (__force __poll_t)0x4000 #define POLL_BUSY_LOOP (__force __poll_t)0x8000 From 4b3749865374899e115aa8c48681709b086fe6d3 Mon Sep 17 00:00:00 2001 From: Xie Yongji Date: Mon, 13 Sep 2021 19:19:28 +0800 Subject: [PATCH 097/143] aio: Fix incorrect usage of eventfd_signal_allowed() We should defer eventfd_signal() to the workqueue when eventfd_signal_allowed() return false rather than return true. Fixes: b542e383d8c0 ("eventfd: Make signal recursion protection a task bit") Signed-off-by: Xie Yongji Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com Reviewed-by: Eric Biggers Signed-off-by: Eric Biggers --- fs/aio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/aio.c b/fs/aio.c index c9bb0d3d8593..f6f1cbffef9e 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1760,7 +1760,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, list_del_init(&req->wait.entry); list_del(&iocb->ki_list); iocb->ki_res.res = mangle_poll(mask); - if (iocb->ki_eventfd && eventfd_signal_allowed()) { + if (iocb->ki_eventfd && !eventfd_signal_allowed()) { iocb = NULL; INIT_WORK(&req->work, aio_poll_put_work); schedule_work(&req->work); From a4f1192cb53758a7210ed5a9ee695aeba22f75fb Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 9 Dec 2021 14:30:33 +0200 Subject: [PATCH 098/143] percpu_ref: Replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Signed-off-by: Andy Shevchenko Signed-off-by: Dennis Zhou --- include/linux/percpu-refcount.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h index b31d3f3312ce..d73a1c08c3e3 100644 --- a/include/linux/percpu-refcount.h +++ b/include/linux/percpu-refcount.h @@ -51,9 +51,9 @@ #define _LINUX_PERCPU_REFCOUNT_H #include -#include #include #include +#include #include struct percpu_ref; From 1ebfaa11ebb5b603a3c3f54b2e84fcf1030f5a14 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Thu, 9 Dec 2021 11:29:37 +0100 Subject: [PATCH 099/143] KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall Prior to commit 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs and KVM_REQ_TLB_FLUSH is/was defined as: (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that "This call guarantees that by the time control returns back to the caller, the observable effects of all flushes on the specified virtual processors have occurred." and without KVM_REQUEST_WAIT there's a small chance that the vCPU making the TLB flush will resume running before all IPIs get delivered to other vCPUs and a stale mapping can get read there. Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST: kvm_hv_flush_tlb() is the sole caller which uses it for kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where KVM_REQUEST_WAIT makes a difference. Cc: stable@kernel.org Fixes: 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()") Signed-off-by: Vitaly Kuznetsov Message-Id: <20211209102937.584397-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 860ed500580c..2164b9f4c7b0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -97,7 +97,7 @@ KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_TLB_FLUSH_CURRENT KVM_ARCH_REQ(26) #define KVM_REQ_TLB_FLUSH_GUEST \ - KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP) + KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_APF_READY KVM_ARCH_REQ(28) #define KVM_REQ_MSR_FILTER_CHANGED KVM_ARCH_REQ(29) #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \ From 3244867af8c065e51969f1bffe732d3ebfd9a7d2 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Tue, 7 Dec 2021 22:09:19 +0000 Subject: [PATCH 100/143] KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req Do not bail early if there are no bits set in the sparse banks for a non-sparse, a.k.a. "all CPUs", IPI request. Per the Hyper-V spec, it is legal to have a variable length of '0', e.g. VP_SET's BankContents in this case, if the request can be serviced without the extra info. It is possible that for a given invocation of a hypercall that does accept variable sized input headers that all the header input fits entirely within the fixed size header. In such cases the variable sized input header is zero-sized and the corresponding bits in the hypercall input should be set to zero. Bailing early results in KVM failing to send IPIs to all CPUs as expected by the guest. Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Reviewed-by: Vitaly Kuznetsov Message-Id: <20211207220926.718794-2-seanjc@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/hyperv.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index 5e19e6e4c2ce..8d8c1cc7cb53 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -1922,11 +1922,13 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool all_cpus = send_ipi_ex.vp_set.format == HV_GENERIC_SET_ALL; + if (all_cpus) + goto check_and_send_ipi; + if (!sparse_banks_len) goto ret_success; - if (!all_cpus && - kvm_read_guest(kvm, + if (kvm_read_guest(kvm, hc->ingpa + offsetof(struct hv_send_ipi_ex, vp_set.bank_contents), sparse_banks, @@ -1934,6 +1936,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool return HV_STATUS_INVALID_HYPERCALL_INPUT; } +check_and_send_ipi: if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR)) return HV_STATUS_INVALID_HYPERCALL_INPUT; From c8cc43c1eae2910ac96daa4216e0fb3391ad0504 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 5 Aug 2021 06:54:23 -0400 Subject: [PATCH 101/143] selftests: KVM: avoid failures due to reserved HyperTransport region AMD proceessors define an address range that is reserved by HyperTransport and causes a failure if used for guest physical addresses. Avoid selftests failures by reserving those guest physical addresses; the rules are: - On parts with <40 bits, its fully hidden from software. - Before Fam17h, it was always 12G just below 1T, even if there was more RAM above this location. In this case we just not use any RAM above 1T. - On Fam17h and later, it is variable based on SME, and is either just below 2^48 (no encryption) or 2^43 (encryption). Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()") Cc: stable@vger.kernel.org Cc: David Matlack Reported-by: Maxim Levitsky Signed-off-by: Paolo Bonzini Message-Id: <20210805105423.412878-1-pbonzini@redhat.com> Reviewed-by: Sean Christopherson Tested-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- .../testing/selftests/kvm/include/kvm_util.h | 9 +++ tools/testing/selftests/kvm/lib/kvm_util.c | 2 +- .../selftests/kvm/lib/x86_64/processor.c | 68 +++++++++++++++++++ 3 files changed, 78 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index 6a1a37f30494..da2b702da71a 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -71,6 +71,15 @@ enum vm_guest_mode { #endif +#if defined(__x86_64__) +unsigned long vm_compute_max_gfn(struct kvm_vm *vm); +#else +static inline unsigned long vm_compute_max_gfn(struct kvm_vm *vm) +{ + return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; +} +#endif + #define MIN_PAGE_SIZE (1U << MIN_PAGE_SHIFT) #define PTES_PER_MIN_PAGE ptes_per_page(MIN_PAGE_SIZE) diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 8f2e0bb1ef96..daf6fdb217a7 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -302,7 +302,7 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) (1ULL << (vm->va_bits - 1)) >> vm->page_shift); /* Limit physical addresses to PA-bits. */ - vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; + vm->max_gfn = vm_compute_max_gfn(vm); /* Allocate and setup memory for guest. */ vm->vpages_mapped = sparsebit_alloc(); diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c index 82c39db91369..eef7b34756d5 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c @@ -1431,3 +1431,71 @@ struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpui return cpuid; } + +#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx 0x68747541 +#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx 0x444d4163 +#define X86EMUL_CPUID_VENDOR_AuthenticAMD_edx 0x69746e65 + +static inline unsigned x86_family(unsigned int eax) +{ + unsigned int x86; + + x86 = (eax >> 8) & 0xf; + + if (x86 == 0xf) + x86 += (eax >> 20) & 0xff; + + return x86; +} + +unsigned long vm_compute_max_gfn(struct kvm_vm *vm) +{ + const unsigned long num_ht_pages = 12 << (30 - vm->page_shift); /* 12 GiB */ + unsigned long ht_gfn, max_gfn, max_pfn; + uint32_t eax, ebx, ecx, edx, max_ext_leaf; + + max_gfn = (1ULL << (vm->pa_bits - vm->page_shift)) - 1; + + /* Avoid reserved HyperTransport region on AMD processors. */ + eax = ecx = 0; + cpuid(&eax, &ebx, &ecx, &edx); + if (ebx != X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx || + ecx != X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx || + edx != X86EMUL_CPUID_VENDOR_AuthenticAMD_edx) + return max_gfn; + + /* On parts with <40 physical address bits, the area is fully hidden */ + if (vm->pa_bits < 40) + return max_gfn; + + /* Before family 17h, the HyperTransport area is just below 1T. */ + ht_gfn = (1 << 28) - num_ht_pages; + eax = 1; + cpuid(&eax, &ebx, &ecx, &edx); + if (x86_family(eax) < 0x17) + goto done; + + /* + * Otherwise it's at the top of the physical address space, possibly + * reduced due to SME by bits 11:6 of CPUID[0x8000001f].EBX. Use + * the old conservative value if MAXPHYADDR is not enumerated. + */ + eax = 0x80000000; + cpuid(&eax, &ebx, &ecx, &edx); + max_ext_leaf = eax; + if (max_ext_leaf < 0x80000008) + goto done; + + eax = 0x80000008; + cpuid(&eax, &ebx, &ecx, &edx); + max_pfn = (1ULL << ((eax & 0xff) - vm->page_shift)) - 1; + if (max_ext_leaf >= 0x8000001f) { + eax = 0x8000001f; + cpuid(&eax, &ebx, &ecx, &edx); + max_pfn >>= (ebx >> 6) & 0x3f; + } + + ht_gfn = max_pfn - num_ht_pages; +done: + return min(max_gfn, ht_gfn - 1); +} From 777ab82d7ce0451fd47bb57e331548deba57394e Mon Sep 17 00:00:00 2001 From: Lai Jiangshan Date: Tue, 7 Dec 2021 17:52:30 +0800 Subject: [PATCH 102/143] KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode In the SDM: If the logical processor is in 64-bit mode or if CR4.PCIDE = 1, an attempt to clear CR0.PG causes a general-protection exception (#GP). Software should transition to compatibility mode and clear CR4.PCIDE before attempting to disable paging. Signed-off-by: Lai Jiangshan Message-Id: <20211207095230.53437-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e0aa4dd53c7f..c473498c8512 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -890,7 +890,8 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu))) return 1; - if (!(cr0 & X86_CR0_PG) && kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) + if (!(cr0 & X86_CR0_PG) && + (is_64_bit_mode(vcpu) || kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))) return 1; static_call(kvm_x86_set_cr0)(vcpu, cr0); From d07898eaf39909806128caccb6ebd922ee3edd69 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Mon, 25 Oct 2021 13:13:10 -0700 Subject: [PATCH 103/143] KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit Replace a WARN with a comment to call out that userspace can modify RCX during an exit to userspace to handle string I/O. KVM doesn't actually support changing the rep count during an exit, i.e. the scenario can be ignored, but the WARN needs to go as it's trivial to trigger from userspace. Cc: stable@vger.kernel.org Fixes: 3b27de271839 ("KVM: x86: split the two parts of emulator_pio_in") Signed-off-by: Sean Christopherson Message-Id: <20211025201311.1881846-2-seanjc@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c473498c8512..0cf1082455df 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7122,7 +7122,13 @@ static int emulator_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port, void *val, unsigned int count) { if (vcpu->arch.pio.count) { - /* Complete previous iteration. */ + /* + * Complete a previous iteration that required userspace I/O. + * Note, @count isn't guaranteed to match pio.count as userspace + * can modify ECX before rerunning the vCPU. Ignore any such + * shenanigans as KVM doesn't support modifying the rep count, + * and the emulator ensures @count doesn't overflow the buffer. + */ } else { int r = __emulator_pio_in(vcpu, size, port, count); if (!r) @@ -7131,7 +7137,6 @@ static int emulator_pio_in(struct kvm_vcpu *vcpu, int size, /* Results already available, fall through. */ } - WARN_ON(count != vcpu->arch.pio.count); complete_emulator_pio_in(vcpu, val); return 1; } From 10e7a099bfd860a2b77ea8aaac661f52c16dd865 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Mon, 25 Oct 2021 13:13:11 -0700 Subject: [PATCH 104/143] selftests: KVM: Add test to verify KVM doesn't explode on "bad" I/O Add an x86 selftest to verify that KVM doesn't WARN or otherwise explode if userspace modifies RCX during a userspace exit to handle string I/O. This is a regression test for a user-triggerable WARN introduced by commit 3b27de271839 ("KVM: x86: split the two parts of emulator_pio_in"). Signed-off-by: Sean Christopherson Message-Id: <20211025201311.1881846-3-seanjc@google.com> Signed-off-by: Paolo Bonzini --- tools/testing/selftests/kvm/.gitignore | 1 + tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/userspace_io_test.c | 114 ++++++++++++++++++ 3 files changed, 116 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/userspace_io_test.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 3763105029fb..00814c0f87a6 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -30,6 +30,7 @@ /x86_64/svm_int_ctl_test /x86_64/sync_regs_test /x86_64/tsc_msrs_test +/x86_64/userspace_io_test /x86_64/userspace_msr_exit_test /x86_64/vmx_apic_access_test /x86_64/vmx_close_while_nested_test diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index c4e34717826a..f307c9f61981 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -59,6 +59,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test +TEST_GEN_PROGS_x86_64 += x86_64/userspace_io_test TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test diff --git a/tools/testing/selftests/kvm/x86_64/userspace_io_test.c b/tools/testing/selftests/kvm/x86_64/userspace_io_test.c new file mode 100644 index 000000000000..e4bef2e05686 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/userspace_io_test.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include + +#include "test_util.h" + +#include "kvm_util.h" +#include "processor.h" + +#define VCPU_ID 1 + +static void guest_ins_port80(uint8_t *buffer, unsigned int count) +{ + unsigned long end; + + if (count == 2) + end = (unsigned long)buffer + 1; + else + end = (unsigned long)buffer + 8192; + + asm volatile("cld; rep; insb" : "+D"(buffer), "+c"(count) : "d"(0x80) : "memory"); + GUEST_ASSERT_1(count == 0, count); + GUEST_ASSERT_2((unsigned long)buffer == end, buffer, end); +} + +static void guest_code(void) +{ + uint8_t buffer[8192]; + int i; + + /* + * Special case tests. main() will adjust RCX 2 => 1 and 3 => 8192 to + * test that KVM doesn't explode when userspace modifies the "count" on + * a userspace I/O exit. KVM isn't required to play nice with the I/O + * itself as KVM doesn't support manipulating the count, it just needs + * to not explode or overflow a buffer. + */ + guest_ins_port80(buffer, 2); + guest_ins_port80(buffer, 3); + + /* Verify KVM fills the buffer correctly when not stuffing RCX. */ + memset(buffer, 0, sizeof(buffer)); + guest_ins_port80(buffer, 8192); + for (i = 0; i < 8192; i++) + GUEST_ASSERT_2(buffer[i] == 0xaa, i, buffer[i]); + + GUEST_DONE(); +} + +int main(int argc, char *argv[]) +{ + struct kvm_regs regs; + struct kvm_run *run; + struct kvm_vm *vm; + struct ucall uc; + int rc; + + /* Tell stdout not to buffer its content */ + setbuf(stdout, NULL); + + /* Create VM */ + vm = vm_create_default(VCPU_ID, 0, guest_code); + run = vcpu_state(vm, VCPU_ID); + + memset(®s, 0, sizeof(regs)); + + while (1) { + rc = _vcpu_run(vm, VCPU_ID); + + TEST_ASSERT(rc == 0, "vcpu_run failed: %d\n", rc); + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, + "Unexpected exit reason: %u (%s),\n", + run->exit_reason, + exit_reason_str(run->exit_reason)); + + if (get_ucall(vm, VCPU_ID, &uc)) + break; + + TEST_ASSERT(run->io.port == 0x80, + "Expected I/O at port 0x80, got port 0x%x\n", run->io.port); + + /* + * Modify the rep string count in RCX: 2 => 1 and 3 => 8192. + * Note, this abuses KVM's batching of rep string I/O to avoid + * getting stuck in an infinite loop. That behavior isn't in + * scope from a testing perspective as it's not ABI in any way, + * i.e. it really is abusing internal KVM knowledge. + */ + vcpu_regs_get(vm, VCPU_ID, ®s); + if (regs.rcx == 2) + regs.rcx = 1; + if (regs.rcx == 3) + regs.rcx = 8192; + memset((void *)run + run->io.data_offset, 0xaa, 4096); + vcpu_regs_set(vm, VCPU_ID, ®s); + } + + switch (uc.cmd) { + case UCALL_DONE: + break; + case UCALL_ABORT: + TEST_FAIL("%s at %s:%ld : argN+1 = 0x%lx, argN+2 = 0x%lx", + (const char *)uc.args[0], __FILE__, uc.args[1], + uc.args[2], uc.args[3]); + default: + TEST_FAIL("Unknown ucall %lu", uc.cmd); + } + + kvm_vm_free(vm); + return 0; +} From b10252c7ae9c9d7c90552f88b544a44ee773af64 Mon Sep 17 00:00:00 2001 From: Alexander Sverdlin Date: Tue, 7 Dec 2021 15:00:39 +0100 Subject: [PATCH 105/143] nfsd: Fix nsfd startup race (again) Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") has re-opened rpc_pipefs_event() race against nfsd_net_id registration (register_pernet_subsys()) which has been fixed by commit bb7ffbf29e76 ("nfsd: fix nsfd startup race triggering BUG_ON"). Restore the order of register_pernet_subsys() vs register_cld_notifier(). Add WARN_ON() to prevent a future regression. Crash info: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012 CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1 pc : rpc_pipefs_event+0x54/0x120 [nfsd] lr : rpc_pipefs_event+0x48/0x120 [nfsd] Call trace: rpc_pipefs_event+0x54/0x120 [nfsd] blocking_notifier_call_chain rpc_fill_super get_tree_keyed rpc_fs_get_tree vfs_get_tree do_mount ksys_mount __arm64_sys_mount el0_svc_handler el0_svc Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin Signed-off-by: J. Bruce Fields --- fs/nfsd/nfs4recover.c | 1 + fs/nfsd/nfsctl.c | 14 +++++++------- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index 6fedc49726bf..c634483d85d2 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -2156,6 +2156,7 @@ static struct notifier_block nfsd4_cld_block = { int register_cld_notifier(void) { + WARN_ON(!nfsd_net_id); return rpc_pipefs_notifier_register(&nfsd4_cld_block); } diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index af8531c3854a..51a49e0cfe37 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1521,12 +1521,9 @@ static int __init init_nfsd(void) int retval; printk(KERN_INFO "Installing knfsd (copyright (C) 1996 okir@monad.swb.de).\n"); - retval = register_cld_notifier(); - if (retval) - return retval; retval = nfsd4_init_slabs(); if (retval) - goto out_unregister_notifier; + return retval; retval = nfsd4_init_pnfs(); if (retval) goto out_free_slabs; @@ -1545,9 +1542,14 @@ static int __init init_nfsd(void) goto out_free_exports; retval = register_pernet_subsys(&nfsd_net_ops); if (retval < 0) + goto out_free_filesystem; + retval = register_cld_notifier(); + if (retval) goto out_free_all; return 0; out_free_all: + unregister_pernet_subsys(&nfsd_net_ops); +out_free_filesystem: unregister_filesystem(&nfsd_fs_type); out_free_exports: remove_proc_entry("fs/nfs/exports", NULL); @@ -1561,13 +1563,12 @@ static int __init init_nfsd(void) nfsd4_exit_pnfs(); out_free_slabs: nfsd4_free_slabs(); -out_unregister_notifier: - unregister_cld_notifier(); return retval; } static void __exit exit_nfsd(void) { + unregister_cld_notifier(); unregister_pernet_subsys(&nfsd_net_ops); nfsd_drc_slab_free(); remove_proc_entry("fs/nfs/exports", NULL); @@ -1577,7 +1578,6 @@ static void __exit exit_nfsd(void) nfsd4_free_slabs(); nfsd4_exit_pnfs(); unregister_filesystem(&nfsd_fs_type); - unregister_cld_notifier(); } MODULE_AUTHOR("Olaf Kirch "); From 548ec0805c399c65ed66c6641be467f717833ab5 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 29 Nov 2021 15:08:00 -0500 Subject: [PATCH 106/143] nfsd: fix use-after-free due to delegation race A delegation break could arrive as soon as we've called vfs_setlease. A delegation break runs a callback which immediately (in nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we then exit nfs4_set_delegation without hashing the delegation, it will be freed as soon as the callback is done with it, without ever being removed from del_recall_lru. Symptoms show up later as use-after-free or list corruption warnings, usually in the laundromat thread. I suspect aba2072f4523 "nfsd: grant read delegations to clients holding writes" made this bug easier to hit, but I looked as far back as v3.0 and it looks to me it already had the same problem. So I'm not sure where the bug was introduced; it may have been there from the beginning. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields --- fs/nfsd/nfs4state.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index bfad94c70b84..1956d377d1a6 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1207,6 +1207,11 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp) return 0; } +static bool delegation_hashed(struct nfs4_delegation *dp) +{ + return !(list_empty(&dp->dl_perfile)); +} + static bool unhash_delegation_locked(struct nfs4_delegation *dp) { @@ -1214,7 +1219,7 @@ unhash_delegation_locked(struct nfs4_delegation *dp) lockdep_assert_held(&state_lock); - if (list_empty(&dp->dl_perfile)) + if (!delegation_hashed(dp)) return false; dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID; @@ -4598,7 +4603,7 @@ static void nfsd4_cb_recall_prepare(struct nfsd4_callback *cb) * queued for a lease break. Don't queue it again. */ spin_lock(&state_lock); - if (dp->dl_time == 0) { + if (delegation_hashed(dp) && dp->dl_time == 0) { dp->dl_time = ktime_get_boottime_seconds(); list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru); } From 55df1ce0d4e086e05a8ab20619c73c729350f965 Mon Sep 17 00:00:00 2001 From: Markus Hochholdinger Date: Tue, 16 Nov 2021 10:21:35 +0000 Subject: [PATCH 107/143] md: fix update super 1.0 on rdev size change The superblock of version 1.0 doesn't get moved to the new position on a device size change. This leads to a rdev without a superblock on a known position, the raid can't be re-assembled. The line was removed by mistake and is re-added by this patch. Fixes: d9c0fa509eaf ("md: fix max sectors calculation for super 1.0") Cc: stable@vger.kernel.org Signed-off-by: Markus Hochholdinger Reviewed-by: Xiao Ni Signed-off-by: Song Liu --- drivers/md/md.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/md.c b/drivers/md/md.c index 5111ed966947..e97d2faf1e88 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -2189,6 +2189,7 @@ super_1_rdev_size_change(struct md_rdev *rdev, sector_t num_sectors) if (!num_sectors || num_sectors > max_sectors) num_sectors = max_sectors; + rdev->sb_start = sb_start; } sb = page_address(rdev->sb_page); sb->data_size = cpu_to_le64(num_sectors); From 07641b5f32f6991758b08da9b1f4173feeb64f2a Mon Sep 17 00:00:00 2001 From: zhangyue Date: Tue, 16 Nov 2021 10:35:26 +0800 Subject: [PATCH 108/143] md: fix double free of mddev->private in autorun_array() In driver/md/md.c, if the function autorun_array() is called, the problem of double free may occur. In function autorun_array(), when the function do_md_run() returns an error, the function do_md_stop() will be called. The function do_md_run() called function md_run(), but in function md_run(), the pointer mddev->private may be freed. The function do_md_stop() called the function __md_stop(), but in function __md_stop(), the pointer mddev->private also will be freed without judging null. At this time, the pointer mddev->private will be double free, so it needs to be judged null or not. Signed-off-by: zhangyue Signed-off-by: Song Liu --- drivers/md/md.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index e97d2faf1e88..41d6e2383517 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6271,7 +6271,8 @@ static void __md_stop(struct mddev *mddev) spin_lock(&mddev->lock); mddev->pers = NULL; spin_unlock(&mddev->lock); - pers->free(mddev, mddev->private); + if (mddev->private) + pers->free(mddev, mddev->private); mddev->private = NULL; if (pers->sync_request && mddev->to_remove == NULL) mddev->to_remove = &md_redundancy_group; From e6a59aac8a8713f335a37d762db0dbe80e7f6d38 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Fri, 10 Dec 2021 10:20:58 -0800 Subject: [PATCH 109/143] block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. Serialize ioprio_get to take the tasklist_lock in this case, just like it's set counterpart. Fixes: d69b78ba1de (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()) Acked-by: Oleg Nesterov Signed-off-by: Davidlohr Bueso Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net Signed-off-by: Jens Axboe --- block/ioprio.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/ioprio.c b/block/ioprio.c index 313c14a70bbd..6f01d35a5145 100644 --- a/block/ioprio.c +++ b/block/ioprio.c @@ -220,6 +220,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who) pgrp = task_pgrp(current); else pgrp = find_vpid(who); + read_lock(&tasklist_lock); do_each_pid_thread(pgrp, PIDTYPE_PGID, p) { tmpio = get_task_ioprio(p); if (tmpio < 0) @@ -229,6 +230,8 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who) else ret = ioprio_best(ret, tmpio); } while_each_pid_thread(pgrp, PIDTYPE_PGID, p); + read_unlock(&tasklist_lock); + break; case IOPRIO_WHO_USER: uid = make_kuid(current_user_ns(), who); From 5eff363838654790f67f4bd564c5782967f67bcc Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 10 Dec 2021 11:52:34 -0700 Subject: [PATCH 110/143] Revert "mtd_blkdevs: don't scan partitions for plain mtdblock" This reverts commit 776b54e97a7d993ba23696e032426d5dea5bbe70. Looks like a last minute edit snuck into this patch, and as a result, it doesn't even compile. Revert the change for now. Signed-off-by: Jens Axboe --- drivers/mtd/mtd_blkdevs.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index a69d064a8eec..4eaba6f4ec68 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -346,7 +346,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) gd->minors = 1 << tr->part_bits; gd->fops = &mtd_block_ops; - if (tr->part_bits) { + if (tr->part_bits) if (new->devnum < 26) snprintf(gd->disk_name, sizeof(gd->disk_name), "%s%c", tr->name, 'a' + new->devnum); @@ -355,11 +355,9 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) "%s%c%c", tr->name, 'a' - 1 + new->devnum / 26, 'a' + new->devnum % 26); - } else { + else snprintf(gd->disk_name, sizeof(gd->disk_name), "%s%d", tr->name, new->devnum); - gd->flags |= GENHD_FL_NO_PART; - } set_capacity(gd, ((u64)new->size * tr->blksize) >> 9); From 78a780602075d8b00c98070fa26e389b3b3efa72 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 9 Dec 2021 08:54:29 -0700 Subject: [PATCH 111/143] io_uring: ensure task_work gets run as part of cancelations If we successfully cancel a work item but that work item needs to be processed through task_work, then we can be sleeping uninterruptibly in io_uring_cancel_generic() and never process it. Hence we don't make forward progress and we end up with an uninterruptible sleep warning. While in there, correct a comment that should be IFF, not IIF. Reported-and-tested-by: syzbot+21e6887c0be14181206d@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe --- fs/io_uring.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index c4f217613f56..d5ab0e9a3f29 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -9824,7 +9824,7 @@ static __cold void io_uring_drop_tctx_refs(struct task_struct *task) /* * Find any io_uring ctx that this task has registered or done IO on, and cancel - * requests. @sqd should be not-null IIF it's an SQPOLL thread cancellation. + * requests. @sqd should be not-null IFF it's an SQPOLL thread cancellation. */ static __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd) @@ -9866,8 +9866,10 @@ static __cold void io_uring_cancel_generic(bool cancel_all, cancel_all); } - prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(&tctx->wait, &wait, TASK_INTERRUPTIBLE); + io_run_task_work(); io_uring_drop_tctx_refs(current); + /* * If we've seen completions, retry without waiting. This * avoids a race where a completion comes in before we did From 71a85387546e50b1a37b0fa45dadcae3bfb35cf6 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 10 Dec 2021 08:29:30 -0700 Subject: [PATCH 112/143] io-wq: check for wq exit after adding new worker task_work We check IO_WQ_BIT_EXIT before attempting to create a new worker, and wq exit cancels pending work if we have any. But it's possible to have a race between the two, where creation checks exit finding it not set, but we're in the process of exiting. The exit side will cancel pending creation task_work, but there's a gap where we add task_work after we've canceled existing creations at exit time. Fix this by checking the EXIT bit post adding the creation task_work. If it's set, run the same cancelation that exit does. Reported-and-tested-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com Reviewed-by: Hao Xu Signed-off-by: Jens Axboe --- fs/io-wq.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 35da9d90df76..8d2bb818a3bb 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -142,6 +142,7 @@ static bool io_acct_cancel_pending_work(struct io_wqe *wqe, struct io_wqe_acct *acct, struct io_cb_cancel_data *match); static void create_worker_cb(struct callback_head *cb); +static void io_wq_cancel_tw_create(struct io_wq *wq); static bool io_worker_get(struct io_worker *worker) { @@ -357,10 +358,22 @@ static bool io_queue_worker_create(struct io_worker *worker, test_and_set_bit_lock(0, &worker->create_state)) goto fail_release; + atomic_inc(&wq->worker_refs); init_task_work(&worker->create_work, func); worker->create_index = acct->index; - if (!task_work_add(wq->task, &worker->create_work, TWA_SIGNAL)) + if (!task_work_add(wq->task, &worker->create_work, TWA_SIGNAL)) { + /* + * EXIT may have been set after checking it above, check after + * adding the task_work and remove any creation item if it is + * now set. wq exit does that too, but we can have added this + * work item after we canceled in io_wq_exit_workers(). + */ + if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) + io_wq_cancel_tw_create(wq); + io_worker_ref_put(wq); return true; + } + io_worker_ref_put(wq); clear_bit_unlock(0, &worker->create_state); fail_release: io_worker_release(worker); @@ -1196,13 +1209,9 @@ void io_wq_exit_start(struct io_wq *wq) set_bit(IO_WQ_BIT_EXIT, &wq->state); } -static void io_wq_exit_workers(struct io_wq *wq) +static void io_wq_cancel_tw_create(struct io_wq *wq) { struct callback_head *cb; - int node; - - if (!wq->task) - return; while ((cb = task_work_cancel_match(wq->task, io_task_work_match, wq)) != NULL) { struct io_worker *worker; @@ -1210,6 +1219,16 @@ static void io_wq_exit_workers(struct io_wq *wq) worker = container_of(cb, struct io_worker, create_work); io_worker_cancel_cb(worker); } +} + +static void io_wq_exit_workers(struct io_wq *wq) +{ + int node; + + if (!wq->task) + return; + + io_wq_cancel_tw_create(wq); rcu_read_lock(); for_each_node(node) { From a74c313aca266fab0d1d1a72becbb8b7b5286b6e Mon Sep 17 00:00:00 2001 From: Chris Packham Date: Tue, 7 Dec 2021 17:21:44 +1300 Subject: [PATCH 113/143] i2c: mpc: Use atomic read and fix break condition Maxime points out that the polling code in mpc_i2c_isr should use the _atomic API because it is called in an irq context and that the behaviour of the MCF bit is that it is 1 when the byte transfer is complete. All of this means the original code was effectively a udelay(100). Fix this by using readb_poll_timeout_atomic() and removing the negation of the break condition. Fixes: 4a8ac5e45cda ("i2c: mpc: Poll for MCF") Reported-by: Maxime Bizon Signed-off-by: Chris Packham Tested-by: Maxime Bizon Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-mpc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c index a6ea1eb1394e..53b8da6dbb23 100644 --- a/drivers/i2c/busses/i2c-mpc.c +++ b/drivers/i2c/busses/i2c-mpc.c @@ -636,7 +636,7 @@ static irqreturn_t mpc_i2c_isr(int irq, void *dev_id) status = readb(i2c->base + MPC_I2C_SR); if (status & CSR_MIF) { /* Wait up to 100us for transfer to properly complete */ - readb_poll_timeout(i2c->base + MPC_I2C_SR, status, !(status & CSR_MCF), 0, 100); + readb_poll_timeout_atomic(i2c->base + MPC_I2C_SR, status, status & CSR_MCF, 0, 100); writeb(0, i2c->base + MPC_I2C_SR); mpc_i2c_do_intr(i2c, status); return IRQ_HANDLED; From 9dcc38e2813e0cd3b195940c98b181ce6ede8f20 Mon Sep 17 00:00:00 2001 From: Drew DeVault Date: Fri, 10 Dec 2021 14:46:09 -0800 Subject: [PATCH 114/143] Increase default MLOCK_LIMIT to 8 MiB This limit has not been updated since 2008, when it was increased to 64 KiB at the request of GnuPG. Until recently, the main use-cases for this feature were (1) preventing sensitive memory from being swapped, as in GnuPG's use-case; and (2) real-time use-cases. In the first case, little memory is called for, and in the second case, the user is generally in a position to increase it if they need more. The introduction of IOURING_REGISTER_BUFFERS adds a third use-case: preparing fixed buffers for high-performance I/O. This use-case will take as much of this memory as it can get, but is still limited to 64 KiB by default, which is very little. This increases the limit to 8 MB, which was chosen fairly arbitrarily as a more generous, but still conservative, default value. It is also possible to raise this limit in userspace. This is easily done, for example, in the use-case of a network daemon: systemd, for instance, provides for this via LimitMEMLOCK in the service file; OpenRC via the rc_ulimit variables. However, there is no established userspace facility for configuring this outside of daemons: end-user applications do not presently have access to a convenient means of raising their limits. The buck, as it were, stops with the kernel. It's much easier to address it here than it is to bring it to hundreds of distributions, and it can only realistically be relied upon to be high-enough by end-user software if it is more-or-less ubiquitous. Most distros don't change this particular rlimit from the kernel-supplied default value, so a change here will easily provide that ubiquity. Link: https://lkml.kernel.org/r/20211028080813.15966-1-sir@cmpwn.com Signed-off-by: Drew DeVault Acked-by: Jens Axboe Acked-by: Cyril Hrubis Acked-by: Johannes Weiner Cc: Pavel Begunkov Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Andrew Dona-Couch Cc: Ammar Faizi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/uapi/linux/resource.h | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/resource.h b/include/uapi/linux/resource.h index 74ef57b38f9f..ac5d6a3031db 100644 --- a/include/uapi/linux/resource.h +++ b/include/uapi/linux/resource.h @@ -66,10 +66,17 @@ struct rlimit64 { #define _STK_LIM (8*1024*1024) /* - * GPG2 wants 64kB of mlocked memory, to make sure pass phrases - * and other sensitive information are never written to disk. + * Limit the amount of locked memory by some sane default: + * root can always increase this limit if needed. + * + * The main use-cases are (1) preventing sensitive memory + * from being swapped; (2) real-time operations; (3) via + * IOURING_REGISTER_BUFFERS. + * + * The first two don't need much. The latter will take as + * much as it can get. 8MB is a reasonably sane default. */ -#define MLOCK_LIMIT ((PAGE_SIZE > 64*1024) ? PAGE_SIZE : 64*1024) +#define MLOCK_LIMIT (8*1024*1024) /* * Due to binary compatibility, the actual resource numbers From e943d28db257cc771f193bd75443f75ec8e8d978 Mon Sep 17 00:00:00 2001 From: Dave Young Date: Fri, 10 Dec 2021 14:46:12 -0800 Subject: [PATCH 115/143] MAINTAINERS: update kdump maintainers Remove myself from kdump maintainers as I have no enough time to maintain it now. But I can review patches on demand though. Link: https://lkml.kernel.org/r/YZyKilzKFsWJYdgn@dhcp-128-65.nay.redhat.com Signed-off-by: Dave Young Acked-by: Baoquan He Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 8691c531e297..cf77835aea35 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10279,9 +10279,9 @@ F: lib/Kconfig.kcsan F: scripts/Makefile.kcsan KDUMP -M: Dave Young M: Baoquan He R: Vivek Goyal +R: Dave Young L: kexec@lists.infradead.org S: Maintained W: http://lse.sourceforge.net/kdump/ From d020d9e63d5396fbcc3a2c01cee38e28c7d20a3d Mon Sep 17 00:00:00 2001 From: Guo Ren Date: Fri, 10 Dec 2021 14:46:15 -0800 Subject: [PATCH 116/143] mailmap: update email address for Guo Ren The ren_guo@c-sky.com would be deprecated and use guoren@kernel.org as the main email address. Link: https://lkml.kernel.org/r/20211123022741.545541-1-guoren@kernel.org Signed-off-by: Guo Ren Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index 6277bb27b4bf..b344067e0acb 100644 --- a/.mailmap +++ b/.mailmap @@ -126,6 +126,8 @@ Greg Kroah-Hartman Greg Kroah-Hartman Greg Kurz Gregory CLEMENT +Guo Ren +Guo Ren Gustavo Padovan Gustavo Padovan Hanjun Guo From 0c941cf30b913d4a684d3f8d9eee60e0daffdc79 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 10 Dec 2021 14:46:18 -0800 Subject: [PATCH 117/143] filemap: remove PageHWPoison check from next_uptodate_page() Pages are individually marked as suffering from hardware poisoning. Checking that the head page is not hardware poisoned doesn't make sense; we might be after a subpage. We check each page individually before we use it, so this was an optimisation gone wrong. It will cause us to fall back to the slow path when there was no need to do that Link: https://lkml.kernel.org/r/20211120174429.2596303-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Naoya Horiguchi Cc: Yang Shi Cc: "Kirill A . Shutemov" Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/filemap.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index daa0e23a6ee6..39c4c46c6133 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3253,8 +3253,6 @@ static struct page *next_uptodate_page(struct page *page, goto skip; if (!PageUptodate(page) || PageReadahead(page)) goto skip; - if (PageHWPoison(page)) - goto skip; if (!trylock_page(page)) goto skip; if (page->mapping != mapping) From e4779015fd5d2fb8390c258268addff24d6077c7 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:22 -0800 Subject: [PATCH 118/143] timers: implement usleep_idle_range() Patch series "mm/damon: Fix fake /proc/loadavg reports", v3. This patchset fixes DAMON's fake load report issue. The first patch makes yet another variant of usleep_range() for this fix, and the second patch fixes the issue of DAMON by making it using the newly introduced function. This patch (of 2): Some kernel threads such as DAMON could need to repeatedly sleep in micro seconds level. Because usleep_range() sleeps in uninterruptible state, however, such threads would make /proc/loadavg reports fake load. To help such cases, this commit implements a variant of usleep_range() called usleep_idle_range(). It is same to usleep_range() but sets the state of the current task as TASK_IDLE while sleeping. Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org Signed-off-by: SeongJae Park Suggested-by: Andrew Morton Reviewed-by: Thomas Gleixner Tested-by: Oleksandr Natalenko Cc: John Stultz Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/delay.h | 14 +++++++++++++- kernel/time/timer.c | 16 +++++++++------- 2 files changed, 22 insertions(+), 8 deletions(-) diff --git a/include/linux/delay.h b/include/linux/delay.h index 8eacf67eb212..039e7e0c7378 100644 --- a/include/linux/delay.h +++ b/include/linux/delay.h @@ -20,6 +20,7 @@ */ #include +#include extern unsigned long loops_per_jiffy; @@ -58,7 +59,18 @@ void calibrate_delay(void); void __attribute__((weak)) calibration_delay_done(void); void msleep(unsigned int msecs); unsigned long msleep_interruptible(unsigned int msecs); -void usleep_range(unsigned long min, unsigned long max); +void usleep_range_state(unsigned long min, unsigned long max, + unsigned int state); + +static inline void usleep_range(unsigned long min, unsigned long max) +{ + usleep_range_state(min, max, TASK_UNINTERRUPTIBLE); +} + +static inline void usleep_idle_range(unsigned long min, unsigned long max) +{ + usleep_range_state(min, max, TASK_IDLE); +} static inline void ssleep(unsigned int seconds) { diff --git a/kernel/time/timer.c b/kernel/time/timer.c index e3d2c23c413d..85f1021ad459 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -2054,26 +2054,28 @@ unsigned long msleep_interruptible(unsigned int msecs) EXPORT_SYMBOL(msleep_interruptible); /** - * usleep_range - Sleep for an approximate time - * @min: Minimum time in usecs to sleep - * @max: Maximum time in usecs to sleep + * usleep_range_state - Sleep for an approximate time in a given state + * @min: Minimum time in usecs to sleep + * @max: Maximum time in usecs to sleep + * @state: State of the current task that will be while sleeping * * In non-atomic context where the exact wakeup time is flexible, use - * usleep_range() instead of udelay(). The sleep improves responsiveness + * usleep_range_state() instead of udelay(). The sleep improves responsiveness * by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces * power usage by allowing hrtimers to take advantage of an already- * scheduled interrupt instead of scheduling a new one just for this sleep. */ -void __sched usleep_range(unsigned long min, unsigned long max) +void __sched usleep_range_state(unsigned long min, unsigned long max, + unsigned int state) { ktime_t exp = ktime_add_us(ktime_get(), min); u64 delta = (u64)(max - min) * NSEC_PER_USEC; for (;;) { - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(state); /* Do not return before the requested sleep time has elapsed */ if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS)) break; } } -EXPORT_SYMBOL(usleep_range); +EXPORT_SYMBOL(usleep_range_state); From 70e9274805fccfd175d0431a947bfd11ee7df40e Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:25 -0800 Subject: [PATCH 119/143] mm/damon/core: fix fake load reports due to uninterruptible sleeps Because DAMON sleeps in uninterruptible mode, /proc/loadavg reports fake load while DAMON is turned on, though it is doing nothing. This can confuse users[1]. To avoid the case, this commit makes DAMON sleeps in idle mode. [1] https://lore.kernel.org/all/11868371.O9o76ZdvQC@natalenko.name/ Link: https://lkml.kernel.org/r/20211126145015.15862-3-sj@kernel.org Fixes: 2224d8485492 ("mm: introduce Data Access MONitor (DAMON)") Reported-by: Oleksandr Natalenko Signed-off-by: SeongJae Park Tested-by: Oleksandr Natalenko Cc: John Stultz Cc: Thomas Gleixner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/damon/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/damon/core.c b/mm/damon/core.c index c381b3c525d0..2daffd5820fe 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -981,9 +981,9 @@ static unsigned long damos_wmark_wait_us(struct damos *scheme) static void kdamond_usleep(unsigned long usecs) { if (usecs > 100 * 1000) - schedule_timeout_interruptible(usecs_to_jiffies(usecs)); + schedule_timeout_idle(usecs_to_jiffies(usecs)); else - usleep_range(usecs, usecs + 1); + usleep_idle_range(usecs, usecs + 1); } /* Returns negative error code if it's not activated but should return */ @@ -1038,7 +1038,7 @@ static int kdamond_fn(void *data) ctx->callback.after_sampling(ctx)) done = true; - usleep_range(ctx->sample_interval, ctx->sample_interval + 1); + kdamond_usleep(ctx->sample_interval); if (ctx->primitive.check_accesses) max_nr_accesses = ctx->primitive.check_accesses(ctx); From 4de46a30b9929d3d1b29e481d48e9c25f8ac7919 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:28 -0800 Subject: [PATCH 120/143] mm/damon/core: use better timer mechanisms selection threshold Patch series "mm/damon: Trivial fixups and improvements". This patchset contains trivial fixups and improvements for DAMON and its kunit/kselftest tests. This patch (of 11): DAMON is using hrtimer if requested sleep time is <=100ms, while the suggested threshold[1] is <=20ms. This commit applies the threshold. [1] Documentation/timers/timers-howto.rst Link: https://lkml.kernel.org/r/20211201150440.1088-2-sj@kernel.org Fixes: ee801b7dd7822 ("mm/damon/schemes: activate schemes based on a watermarks mechanism") Signed-off-by: SeongJae Park Cc: Shuah Khan Cc: Brendan Higgins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/damon/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/damon/core.c b/mm/damon/core.c index 2daffd5820fe..eefb2ada67ca 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -980,7 +980,8 @@ static unsigned long damos_wmark_wait_us(struct damos *scheme) static void kdamond_usleep(unsigned long usecs) { - if (usecs > 100 * 1000) + /* See Documentation/timers/timers-howto.rst for the thresholds */ + if (usecs > 20 * USEC_PER_MSEC) schedule_timeout_idle(usecs_to_jiffies(usecs)); else usleep_idle_range(usecs, usecs + 1); From 0bceffa236af401f5206feaf3538526cbc427209 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:31 -0800 Subject: [PATCH 121/143] mm/damon/dbgfs: remove an unnecessary error message When wrong scheme action is requested via the debugfs interface, DAMON prints an error message. Because the function returns error code, this is not really needed. Because the code path is triggered by the user specified input, this can result in kernel log mistakenly being messy. To avoid the case, this commit removes the message. Link: https://lkml.kernel.org/r/20211201150440.1088-3-sj@kernel.org Fixes: af122dd8f3c0 ("mm/damon/dbgfs: support DAMON-based Operation Schemes") Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/damon/dbgfs.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 9b520bb4a3e7..1efac0022e9a 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -210,10 +210,8 @@ static struct damos **str_to_schemes(const char *str, ssize_t len, &wmarks.low, &parsed); if (ret != 18) break; - if (!damos_action_valid(action)) { - pr_err("wrong action %d\n", action); + if (!damos_action_valid(action)) goto fail; - } pos += parsed; scheme = damon_new_scheme(min_sz, max_sz, min_nr_a, max_nr_a, From 1afaf5cb687de85c5e00ac70f6eea5597077cbc5 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:34 -0800 Subject: [PATCH 122/143] mm/damon/core: remove unnecessary error messages DAMON core prints error messages when damon_target object creation is failed or wrong monitoring attributes are given. Because appropriate error code is returned for each case, the messages are not essential. Also, because the code path can be triggered with user-specified input, this could result in kernel log mistakenly being messy. To avoid the case, this commit removes the messages. Link: https://lkml.kernel.org/r/20211201150440.1088-4-sj@kernel.org Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface") Fixes: b9a6ac4e4ede ("mm/damon: adaptively adjust regions") Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: kernel test robot Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/damon/core.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/mm/damon/core.c b/mm/damon/core.c index eefb2ada67ca..e92497895202 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -282,7 +282,6 @@ int damon_set_targets(struct damon_ctx *ctx, for (i = 0; i < nr_ids; i++) { t = damon_new_target(ids[i]); if (!t) { - pr_err("Failed to alloc damon_target\n"); /* The caller should do cleanup of the ids itself */ damon_for_each_target_safe(t, next, ctx) damon_destroy_target(t); @@ -312,16 +311,10 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, unsigned long aggr_int, unsigned long primitive_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg) { - if (min_nr_reg < 3) { - pr_err("min_nr_regions (%lu) must be at least 3\n", - min_nr_reg); + if (min_nr_reg < 3) return -EINVAL; - } - if (min_nr_reg > max_nr_reg) { - pr_err("invalid nr_regions. min (%lu) > max (%lu)\n", - min_nr_reg, max_nr_reg); + if (min_nr_reg > max_nr_reg) return -EINVAL; - } ctx->sample_interval = sample_int; ctx->aggr_interval = aggr_int; From 09e12289cc044afa484e70c0b379d579d52caf9a Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:37 -0800 Subject: [PATCH 123/143] mm/damon/vaddr: remove an unnecessary warning message The DAMON virtual address space monitoring primitive prints a warning message for wrong DAMOS action. However, it is not essential as the code returns appropriate failure in the case. This commit removes the message to make the log clean. Link: https://lkml.kernel.org/r/20211201150440.1088-5-sj@kernel.org Fixes: 6dea8add4d28 ("mm/damon/vaddr: support DAMON-based Operation Schemes") Signed-off-by: SeongJae Park Reviewed-by: Muchun Song Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/damon/vaddr.c | 1 - 1 file changed, 1 deletion(-) diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 47f47f60440e..20a9a9d69eb1 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -627,7 +627,6 @@ int damon_va_apply_scheme(struct damon_ctx *ctx, struct damon_target *t, case DAMOS_STAT: return 0; default: - pr_warn("Wrong action %d\n", scheme->action); return -EINVAL; } From 044cd9750fe010170f5dc812e4824d98f5ea928c Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:40 -0800 Subject: [PATCH 124/143] mm/damon/vaddr-test: split a test function having >1024 bytes frame size On some configuration[1], 'damon_test_split_evenly()' kunit test function has >1024 bytes frame size, so below build warning is triggered: CC mm/damon/vaddr.o In file included from mm/damon/vaddr.c:672: mm/damon/vaddr-test.h: In function 'damon_test_split_evenly': mm/damon/vaddr-test.h:309:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=] 309 | } | ^ This commit fixes the warning by separating the common logic in the function. [1] https://lore.kernel.org/linux-mm/202111182146.OV3C4uGr-lkp@intel.com/ Link: https://lkml.kernel.org/r/20211201150440.1088-6-sj@kernel.org Fixes: 17ccae8bb5c9 ("mm/damon: add kunit tests") Signed-off-by: SeongJae Park Reported-by: kernel test robot Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/damon/vaddr-test.h | 93 ++++++++++++++++++++++--------------------- 1 file changed, 48 insertions(+), 45 deletions(-) diff --git a/mm/damon/vaddr-test.h b/mm/damon/vaddr-test.h index ecfd0b2ed222..3097ef9c662a 100644 --- a/mm/damon/vaddr-test.h +++ b/mm/damon/vaddr-test.h @@ -252,59 +252,62 @@ static void damon_test_apply_three_regions4(struct kunit *test) new_three_regions, expected, ARRAY_SIZE(expected)); } +static void damon_test_split_evenly_fail(struct kunit *test, + unsigned long start, unsigned long end, unsigned int nr_pieces) +{ + struct damon_target *t = damon_new_target(42); + struct damon_region *r = damon_new_region(start, end); + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, + damon_va_evenly_split_region(t, r, nr_pieces), -EINVAL); + KUNIT_EXPECT_EQ(test, damon_nr_regions(t), 1u); + + damon_for_each_region(r, t) { + KUNIT_EXPECT_EQ(test, r->ar.start, start); + KUNIT_EXPECT_EQ(test, r->ar.end, end); + } + + damon_free_target(t); +} + +static void damon_test_split_evenly_succ(struct kunit *test, + unsigned long start, unsigned long end, unsigned int nr_pieces) +{ + struct damon_target *t = damon_new_target(42); + struct damon_region *r = damon_new_region(start, end); + unsigned long expected_width = (end - start) / nr_pieces; + unsigned long i = 0; + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, + damon_va_evenly_split_region(t, r, nr_pieces), 0); + KUNIT_EXPECT_EQ(test, damon_nr_regions(t), nr_pieces); + + damon_for_each_region(r, t) { + if (i == nr_pieces - 1) + break; + KUNIT_EXPECT_EQ(test, + r->ar.start, start + i++ * expected_width); + KUNIT_EXPECT_EQ(test, r->ar.end, start + i * expected_width); + } + KUNIT_EXPECT_EQ(test, r->ar.start, start + i * expected_width); + KUNIT_EXPECT_EQ(test, r->ar.end, end); + damon_free_target(t); +} + static void damon_test_split_evenly(struct kunit *test) { struct damon_ctx *c = damon_new_ctx(); - struct damon_target *t; - struct damon_region *r; - unsigned long i; KUNIT_EXPECT_EQ(test, damon_va_evenly_split_region(NULL, NULL, 5), -EINVAL); - t = damon_new_target(42); - r = damon_new_region(0, 100); - KUNIT_EXPECT_EQ(test, damon_va_evenly_split_region(t, r, 0), -EINVAL); + damon_test_split_evenly_fail(test, 0, 100, 0); + damon_test_split_evenly_succ(test, 0, 100, 10); + damon_test_split_evenly_succ(test, 5, 59, 5); + damon_test_split_evenly_fail(test, 5, 6, 2); - damon_add_region(r, t); - KUNIT_EXPECT_EQ(test, damon_va_evenly_split_region(t, r, 10), 0); - KUNIT_EXPECT_EQ(test, damon_nr_regions(t), 10u); - - i = 0; - damon_for_each_region(r, t) { - KUNIT_EXPECT_EQ(test, r->ar.start, i++ * 10); - KUNIT_EXPECT_EQ(test, r->ar.end, i * 10); - } - damon_free_target(t); - - t = damon_new_target(42); - r = damon_new_region(5, 59); - damon_add_region(r, t); - KUNIT_EXPECT_EQ(test, damon_va_evenly_split_region(t, r, 5), 0); - KUNIT_EXPECT_EQ(test, damon_nr_regions(t), 5u); - - i = 0; - damon_for_each_region(r, t) { - if (i == 4) - break; - KUNIT_EXPECT_EQ(test, r->ar.start, 5 + 10 * i++); - KUNIT_EXPECT_EQ(test, r->ar.end, 5 + 10 * i); - } - KUNIT_EXPECT_EQ(test, r->ar.start, 5 + 10 * i); - KUNIT_EXPECT_EQ(test, r->ar.end, 59ul); - damon_free_target(t); - - t = damon_new_target(42); - r = damon_new_region(5, 6); - damon_add_region(r, t); - KUNIT_EXPECT_EQ(test, damon_va_evenly_split_region(t, r, 2), -EINVAL); - KUNIT_EXPECT_EQ(test, damon_nr_regions(t), 1u); - - damon_for_each_region(r, t) { - KUNIT_EXPECT_EQ(test, r->ar.start, 5ul); - KUNIT_EXPECT_EQ(test, r->ar.end, 6ul); - } - damon_free_target(t); damon_destroy_ctx(c); } From 9f86d624292c238203b3687cdb870a2cde1a6f9b Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:43 -0800 Subject: [PATCH 125/143] mm/damon/vaddr-test: remove unnecessary variables A couple of test functions in DAMON virtual address space monitoring primitives implementation has unnecessary damon_ctx variables. This commit removes those. Link: https://lkml.kernel.org/r/20211201150440.1088-7-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/damon/vaddr-test.h | 8 -------- 1 file changed, 8 deletions(-) diff --git a/mm/damon/vaddr-test.h b/mm/damon/vaddr-test.h index 3097ef9c662a..6a1b9272ea12 100644 --- a/mm/damon/vaddr-test.h +++ b/mm/damon/vaddr-test.h @@ -135,7 +135,6 @@ static void damon_do_test_apply_three_regions(struct kunit *test, struct damon_addr_range *three_regions, unsigned long *expected, int nr_expected) { - struct damon_ctx *ctx = damon_new_ctx(); struct damon_target *t; struct damon_region *r; int i; @@ -145,7 +144,6 @@ static void damon_do_test_apply_three_regions(struct kunit *test, r = damon_new_region(regions[i * 2], regions[i * 2 + 1]); damon_add_region(r, t); } - damon_add_target(ctx, t); damon_va_apply_three_regions(t, three_regions); @@ -154,8 +152,6 @@ static void damon_do_test_apply_three_regions(struct kunit *test, KUNIT_EXPECT_EQ(test, r->ar.start, expected[i * 2]); KUNIT_EXPECT_EQ(test, r->ar.end, expected[i * 2 + 1]); } - - damon_destroy_ctx(ctx); } /* @@ -298,8 +294,6 @@ static void damon_test_split_evenly_succ(struct kunit *test, static void damon_test_split_evenly(struct kunit *test) { - struct damon_ctx *c = damon_new_ctx(); - KUNIT_EXPECT_EQ(test, damon_va_evenly_split_region(NULL, NULL, 5), -EINVAL); @@ -307,8 +301,6 @@ static void damon_test_split_evenly(struct kunit *test) damon_test_split_evenly_succ(test, 0, 100, 10); damon_test_split_evenly_succ(test, 5, 59, 5); damon_test_split_evenly_fail(test, 5, 6, 2); - - damon_destroy_ctx(c); } static struct kunit_case damon_test_cases[] = { From 964e17016cf99902c79a5de095cc5e57e7d58248 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:46 -0800 Subject: [PATCH 126/143] selftests/damon: skip test if DAMON is running Testing the DAMON debugfs files while DAMON is running makes no sense, as any write to the debugfs files will fail. This commit makes the test be skipped in this case. Link: https://lkml.kernel.org/r/20211201150440.1088-8-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/damon/debugfs_attrs.sh | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh index 196b6640bf37..fc80380c59f0 100644 --- a/tools/testing/selftests/damon/debugfs_attrs.sh +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -44,6 +44,15 @@ test_content() { source ./_chk_dependency.sh +ksft_skip=4 + +damon_onoff="$DBGFS/monitor_on" +if [ $(cat "$damon_onoff") = "on" ] +then + echo "monitoring is on" + exit $ksft_skip +fi + # Test attrs file # =============== From c6980e30af356e85699f37142ae435a6aa483ceb Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:49 -0800 Subject: [PATCH 127/143] selftests/damon: test DAMON enabling with empty target_ids case DAMON debugfs didn't check empty targets when starting monitoring, and the issue is fixed with commit b5ca3e83ddb0 ("mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on"). To avoid future regression, this commit adds a test case for that in DAMON selftests. Link: https://lkml.kernel.org/r/20211201150440.1088-9-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/damon/debugfs_attrs.sh | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh index fc80380c59f0..d0916373f310 100644 --- a/tools/testing/selftests/damon/debugfs_attrs.sh +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -94,4 +94,13 @@ test_write_succ "$file" "" "$orig_content" "empty input" test_content "$file" "$orig_content" "" "empty input written" echo "$orig_content" > "$file" +# Test empty targets case +# ======================= + +orig_target_ids=$(cat "$DBGFS/target_ids") +echo "" > "$DBGFS/target_ids" +orig_monitor_on=$(cat "$DBGFS/monitor_on") +test_write_fail "$DBGFS/monitor_on" "on" "orig_monitor_on" "empty target ids" +echo "$orig_target_ids" > "$DBGFS/target_ids" + echo "PASS" From d85570c655cc2c257b7da37a6d1fa4c59443c055 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:52 -0800 Subject: [PATCH 128/143] selftests/damon: test wrong DAMOS condition ranges input A patch titled "mm/damon/schemes: add the validity judgment of thresholds"[1] makes DAMON debugfs interface to validate DAMON scheme inputs. This commit adds a test case for the validation logic in DAMON selftests. [1] https://lore.kernel.org/linux-mm/d78360e52158d786fcbf20bc62c96785742e76d3.1637239568.git.xhao@linux.alibaba.com/ Link: https://lkml.kernel.org/r/20211201150440.1088-10-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/damon/debugfs_attrs.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh index d0916373f310..1ef118617167 100644 --- a/tools/testing/selftests/damon/debugfs_attrs.sh +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -77,6 +77,8 @@ test_write_succ "$file" "1 2 3 4 5 6 4 0 0 0 1 2 3 1 100 3 2 1" \ test_write_fail "$file" "1 2 3 4 5 6 3 0 0 0 1 2 3 1 100 3 2 1" "$orig_content" "multi lines" test_write_succ "$file" "" "$orig_content" "disabling" +test_write_fail "$file" "2 1 2 1 10 1 3 10 1 1 1 1 1 1 1 1 2 3" \ + "$orig_content" "wrong condition ranges" echo "$orig_content" > "$file" # Test target_ids file From b4a002889d24979295ed3c2bf1d5fcfb3901026a Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:55 -0800 Subject: [PATCH 129/143] selftests/damon: test debugfs file reads/writes with huge count DAMON debugfs interface users were able to trigger warning by writing some files with arbitrarily large 'count' parameter. The issue is fixed with commit db7a347b26fe ("mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation"). This commit adds a test case for the issue in DAMON selftests to avoid future regressions. Link: https://lkml.kernel.org/r/20211201150440.1088-11-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/damon/.gitignore | 2 + tools/testing/selftests/damon/Makefile | 2 + .../testing/selftests/damon/debugfs_attrs.sh | 18 +++++++++ .../selftests/damon/huge_count_read_write.c | 39 +++++++++++++++++++ 4 files changed, 61 insertions(+) create mode 100644 tools/testing/selftests/damon/.gitignore create mode 100644 tools/testing/selftests/damon/huge_count_read_write.c diff --git a/tools/testing/selftests/damon/.gitignore b/tools/testing/selftests/damon/.gitignore new file mode 100644 index 000000000000..c6c2965a6607 --- /dev/null +++ b/tools/testing/selftests/damon/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +huge_count_read_write diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile index 8a3f2cd9fec0..f0aa954b5d13 100644 --- a/tools/testing/selftests/damon/Makefile +++ b/tools/testing/selftests/damon/Makefile @@ -1,6 +1,8 @@ # SPDX-License-Identifier: GPL-2.0 # Makefile for damon selftests +TEST_GEN_FILES += huge_count_read_write + TEST_FILES = _chk_dependency.sh TEST_PROGS = debugfs_attrs.sh diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh index 1ef118617167..23a7b48ca7d3 100644 --- a/tools/testing/selftests/damon/debugfs_attrs.sh +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -105,4 +105,22 @@ orig_monitor_on=$(cat "$DBGFS/monitor_on") test_write_fail "$DBGFS/monitor_on" "on" "orig_monitor_on" "empty target ids" echo "$orig_target_ids" > "$DBGFS/target_ids" +# Test huge count read write +# ========================== + +dmesg -C + +for file in "$DBGFS/"* +do + ./huge_count_read_write "$file" +done + +if dmesg | grep -q WARNING +then + dmesg + exit 1 +else + exit 0 +fi + echo "PASS" diff --git a/tools/testing/selftests/damon/huge_count_read_write.c b/tools/testing/selftests/damon/huge_count_read_write.c new file mode 100644 index 000000000000..ad7a6b4cf338 --- /dev/null +++ b/tools/testing/selftests/damon/huge_count_read_write.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Author: SeongJae Park + */ + +#include +#include +#include +#include + +void write_read_with_huge_count(char *file) +{ + int filedesc = open(file, O_RDWR); + char buf[25]; + int ret; + + printf("%s %s\n", __func__, file); + if (filedesc < 0) { + fprintf(stderr, "failed opening %s\n", file); + exit(1); + } + + write(filedesc, "", 0xfffffffful); + perror("after write: "); + ret = read(filedesc, buf, 0xfffffffful); + perror("after read: "); + close(filedesc); +} + +int main(int argc, char *argv[]) +{ + if (argc != 2) { + fprintf(stderr, "Usage: %s \n", argv[0]); + exit(1); + } + write_read_with_huge_count(argv[1]); + + return 0; +} From 9ab3b0c8ef629f60ef25cb7634ad305315ae94d1 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 10 Dec 2021 14:46:59 -0800 Subject: [PATCH 130/143] selftests/damon: split test cases Currently, the single test program, debugfs.sh, contains all test cases for DAMON. When one of the cases fails, finding which case is failed from the test log is not so easy, and all remaining tests will be skipped. To improve the situation, this commit splits the single program into small test programs having their own names. Link: https://lkml.kernel.org/r/20211201150440.1088-12-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/damon/Makefile | 5 +- .../selftests/damon/_debugfs_common.sh | 52 ++++++++ .../testing/selftests/damon/debugfs_attrs.sh | 111 +----------------- .../selftests/damon/debugfs_empty_targets.sh | 13 ++ .../damon/debugfs_huge_count_read_write.sh | 22 ++++ .../selftests/damon/debugfs_schemes.sh | 19 +++ .../selftests/damon/debugfs_target_ids.sh | 19 +++ 7 files changed, 129 insertions(+), 112 deletions(-) create mode 100644 tools/testing/selftests/damon/_debugfs_common.sh create mode 100644 tools/testing/selftests/damon/debugfs_empty_targets.sh create mode 100644 tools/testing/selftests/damon/debugfs_huge_count_read_write.sh create mode 100644 tools/testing/selftests/damon/debugfs_schemes.sh create mode 100644 tools/testing/selftests/damon/debugfs_target_ids.sh diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile index f0aa954b5d13..937d36ae9a69 100644 --- a/tools/testing/selftests/damon/Makefile +++ b/tools/testing/selftests/damon/Makefile @@ -3,7 +3,8 @@ TEST_GEN_FILES += huge_count_read_write -TEST_FILES = _chk_dependency.sh -TEST_PROGS = debugfs_attrs.sh +TEST_FILES = _chk_dependency.sh _debugfs_common.sh +TEST_PROGS = debugfs_attrs.sh debugfs_schemes.sh debugfs_target_ids.sh +TEST_PROGS += debugfs_empty_targets.sh debugfs_huge_count_read_write.sh include ../lib.mk diff --git a/tools/testing/selftests/damon/_debugfs_common.sh b/tools/testing/selftests/damon/_debugfs_common.sh new file mode 100644 index 000000000000..48989d4813ae --- /dev/null +++ b/tools/testing/selftests/damon/_debugfs_common.sh @@ -0,0 +1,52 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +test_write_result() { + file=$1 + content=$2 + orig_content=$3 + expect_reason=$4 + expected=$5 + + echo "$content" > "$file" + if [ $? -ne "$expected" ] + then + echo "writing $content to $file doesn't return $expected" + echo "expected because: $expect_reason" + echo "$orig_content" > "$file" + exit 1 + fi +} + +test_write_succ() { + test_write_result "$1" "$2" "$3" "$4" 0 +} + +test_write_fail() { + test_write_result "$1" "$2" "$3" "$4" 1 +} + +test_content() { + file=$1 + orig_content=$2 + expected=$3 + expect_reason=$4 + + content=$(cat "$file") + if [ "$content" != "$expected" ] + then + echo "reading $file expected $expected but $content" + echo "expected because: $expect_reason" + echo "$orig_content" > "$file" + exit 1 + fi +} + +source ./_chk_dependency.sh + +damon_onoff="$DBGFS/monitor_on" +if [ $(cat "$damon_onoff") = "on" ] +then + echo "monitoring is on" + exit $ksft_skip +fi diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh index 23a7b48ca7d3..902e312bca89 100644 --- a/tools/testing/selftests/damon/debugfs_attrs.sh +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -1,57 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -test_write_result() { - file=$1 - content=$2 - orig_content=$3 - expect_reason=$4 - expected=$5 - - echo "$content" > "$file" - if [ $? -ne "$expected" ] - then - echo "writing $content to $file doesn't return $expected" - echo "expected because: $expect_reason" - echo "$orig_content" > "$file" - exit 1 - fi -} - -test_write_succ() { - test_write_result "$1" "$2" "$3" "$4" 0 -} - -test_write_fail() { - test_write_result "$1" "$2" "$3" "$4" 1 -} - -test_content() { - file=$1 - orig_content=$2 - expected=$3 - expect_reason=$4 - - content=$(cat "$file") - if [ "$content" != "$expected" ] - then - echo "reading $file expected $expected but $content" - echo "expected because: $expect_reason" - echo "$orig_content" > "$file" - exit 1 - fi -} - -source ./_chk_dependency.sh - -ksft_skip=4 - -damon_onoff="$DBGFS/monitor_on" -if [ $(cat "$damon_onoff") = "on" ] -then - echo "monitoring is on" - exit $ksft_skip -fi +source _debugfs_common.sh # Test attrs file # =============== @@ -65,62 +15,3 @@ test_write_fail "$file" "1 2 3 5 4" "$orig_content" \ "min_nr_regions > max_nr_regions" test_content "$file" "$orig_content" "1 2 3 4 5" "successfully written" echo "$orig_content" > "$file" - -# Test schemes file -# ================= - -file="$DBGFS/schemes" -orig_content=$(cat "$file") - -test_write_succ "$file" "1 2 3 4 5 6 4 0 0 0 1 2 3 1 100 3 2 1" \ - "$orig_content" "valid input" -test_write_fail "$file" "1 2 -3 4 5 6 3 0 0 0 1 2 3 1 100 3 2 1" "$orig_content" "multi lines" -test_write_succ "$file" "" "$orig_content" "disabling" -test_write_fail "$file" "2 1 2 1 10 1 3 10 1 1 1 1 1 1 1 1 2 3" \ - "$orig_content" "wrong condition ranges" -echo "$orig_content" > "$file" - -# Test target_ids file -# ==================== - -file="$DBGFS/target_ids" -orig_content=$(cat "$file") - -test_write_succ "$file" "1 2 3 4" "$orig_content" "valid input" -test_write_succ "$file" "1 2 abc 4" "$orig_content" "still valid input" -test_content "$file" "$orig_content" "1 2" "non-integer was there" -test_write_succ "$file" "abc 2 3" "$orig_content" "the file allows wrong input" -test_content "$file" "$orig_content" "" "wrong input written" -test_write_succ "$file" "" "$orig_content" "empty input" -test_content "$file" "$orig_content" "" "empty input written" -echo "$orig_content" > "$file" - -# Test empty targets case -# ======================= - -orig_target_ids=$(cat "$DBGFS/target_ids") -echo "" > "$DBGFS/target_ids" -orig_monitor_on=$(cat "$DBGFS/monitor_on") -test_write_fail "$DBGFS/monitor_on" "on" "orig_monitor_on" "empty target ids" -echo "$orig_target_ids" > "$DBGFS/target_ids" - -# Test huge count read write -# ========================== - -dmesg -C - -for file in "$DBGFS/"* -do - ./huge_count_read_write "$file" -done - -if dmesg | grep -q WARNING -then - dmesg - exit 1 -else - exit 0 -fi - -echo "PASS" diff --git a/tools/testing/selftests/damon/debugfs_empty_targets.sh b/tools/testing/selftests/damon/debugfs_empty_targets.sh new file mode 100644 index 000000000000..87aff8083822 --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_empty_targets.sh @@ -0,0 +1,13 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source _debugfs_common.sh + +# Test empty targets case +# ======================= + +orig_target_ids=$(cat "$DBGFS/target_ids") +echo "" > "$DBGFS/target_ids" +orig_monitor_on=$(cat "$DBGFS/monitor_on") +test_write_fail "$DBGFS/monitor_on" "on" "orig_monitor_on" "empty target ids" +echo "$orig_target_ids" > "$DBGFS/target_ids" diff --git a/tools/testing/selftests/damon/debugfs_huge_count_read_write.sh b/tools/testing/selftests/damon/debugfs_huge_count_read_write.sh new file mode 100644 index 000000000000..922cadac2950 --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_huge_count_read_write.sh @@ -0,0 +1,22 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source _debugfs_common.sh + +# Test huge count read write +# ========================== + +dmesg -C + +for file in "$DBGFS/"* +do + ./huge_count_read_write "$file" +done + +if dmesg | grep -q WARNING +then + dmesg + exit 1 +else + exit 0 +fi diff --git a/tools/testing/selftests/damon/debugfs_schemes.sh b/tools/testing/selftests/damon/debugfs_schemes.sh new file mode 100644 index 000000000000..5b39ab44731c --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_schemes.sh @@ -0,0 +1,19 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source _debugfs_common.sh + +# Test schemes file +# ================= + +file="$DBGFS/schemes" +orig_content=$(cat "$file") + +test_write_succ "$file" "1 2 3 4 5 6 4 0 0 0 1 2 3 1 100 3 2 1" \ + "$orig_content" "valid input" +test_write_fail "$file" "1 2 +3 4 5 6 3 0 0 0 1 2 3 1 100 3 2 1" "$orig_content" "multi lines" +test_write_succ "$file" "" "$orig_content" "disabling" +test_write_fail "$file" "2 1 2 1 10 1 3 10 1 1 1 1 1 1 1 1 2 3" \ + "$orig_content" "wrong condition ranges" +echo "$orig_content" > "$file" diff --git a/tools/testing/selftests/damon/debugfs_target_ids.sh b/tools/testing/selftests/damon/debugfs_target_ids.sh new file mode 100644 index 000000000000..49aeabdb0aae --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_target_ids.sh @@ -0,0 +1,19 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source _debugfs_common.sh + +# Test target_ids file +# ==================== + +file="$DBGFS/target_ids" +orig_content=$(cat "$file") + +test_write_succ "$file" "1 2 3 4" "$orig_content" "valid input" +test_write_succ "$file" "1 2 abc 4" "$orig_content" "still valid input" +test_content "$file" "$orig_content" "1 2" "non-integer was there" +test_write_succ "$file" "abc 2 3" "$orig_content" "the file allows wrong input" +test_content "$file" "$orig_content" "" "wrong input written" +test_write_succ "$file" "" "$orig_content" "empty input" +test_content "$file" "$orig_content" "" "empty input written" +echo "$orig_content" > "$file" From 005a79e5c254c3f60ec269a459cc41b55028c798 Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Fri, 10 Dec 2021 14:47:02 -0800 Subject: [PATCH 131/143] mm/slub: fix endianness bug for alloc/free_traces attributes On big-endian s390, the alloc/free_traces attributes produce endless output, because of always 0 idx in slab_debugfs_show(). idx is de-referenced from *v, which points to a loff_t value, with unsigned int idx = *(unsigned int *)v; This will only give the upper 32 bits on big-endian, which remain 0. Instead of only fixing this de-reference, during discussion it seemed more appropriate to change the seq_ops so that they use an explicit iterator in private loc_track struct. This patch adds idx to loc_track, which will also fix the endianness bug. Link: https://lore.kernel.org/r/20211117193932.4049412-1-gerald.schaefer@linux.ibm.com Link: https://lkml.kernel.org/r/20211126171848.17534-1-gerald.schaefer@linux.ibm.com Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs") Signed-off-by: Gerald Schaefer Reported-by: Steffen Maier Acked-by: Vlastimil Babka Cc: Faiyaz Mohammed Cc: Greg Kroah-Hartman Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/slub.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index a8626825a829..abe7db581d68 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5081,6 +5081,7 @@ struct loc_track { unsigned long max; unsigned long count; struct location *loc; + loff_t idx; }; static struct dentry *slab_debugfs_root; @@ -6052,11 +6053,11 @@ __initcall(slab_sysfs_init); #if defined(CONFIG_SLUB_DEBUG) && defined(CONFIG_DEBUG_FS) static int slab_debugfs_show(struct seq_file *seq, void *v) { - - struct location *l; - unsigned int idx = *(unsigned int *)v; struct loc_track *t = seq->private; + struct location *l; + unsigned long idx; + idx = (unsigned long) t->idx; if (idx < t->count) { l = &t->loc[idx]; @@ -6105,16 +6106,18 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos) { struct loc_track *t = seq->private; - v = ppos; - ++*ppos; + t->idx = ++(*ppos); if (*ppos <= t->count) - return v; + return ppos; return NULL; } static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos) { + struct loc_track *t = seq->private; + + t->idx = *ppos; return ppos; } From a7ebf564de325e1d7d52cd85b12721c424338bcc Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Fri, 10 Dec 2021 14:47:05 -0800 Subject: [PATCH 132/143] mm/memcg: relocate mod_objcg_mlstate(), get_obj_stock() and put_obj_stock() All the calls to mod_objcg_mlstate(), get_obj_stock() and put_obj_stock() are done by functions defined within the same "#ifdef CONFIG_MEMCG_KMEM" compilation block. When CONFIG_MEMCG_KMEM isn't defined, the following compilation warnings will be issued [1] and [2]. mm/memcontrol.c:785:20: warning: unused function 'mod_objcg_mlstate' mm/memcontrol.c:2113:33: warning: unused function 'get_obj_stock' Fix these warning by moving those functions to under the same CONFIG_MEMCG_KMEM compilation block. There is no functional change. [1] https://lore.kernel.org/lkml/202111272014.WOYNLUV6-lkp@intel.com/ [2] https://lore.kernel.org/lkml/202111280551.LXsWYt1T-lkp@intel.com/ Link: https://lkml.kernel.org/r/20211129161140.306488-1-longman@redhat.com Fixes: 559271146efc ("mm/memcg: optimize user context object stock access") Fixes: 68ac5b3c8db2 ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp") Signed-off-by: Waiman Long Reported-by: kernel test robot Reviewed-by: Shakeel Butt Acked-by: Roman Gushchin Reviewed-by: Muchun Song Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 106 ++++++++++++++++++++++++------------------------ 1 file changed, 53 insertions(+), 53 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6863a834ed42..2ed5f2a0879d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -776,24 +776,6 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val) rcu_read_unlock(); } -/* - * mod_objcg_mlstate() may be called with irq enabled, so - * mod_memcg_lruvec_state() should be used. - */ -static inline void mod_objcg_mlstate(struct obj_cgroup *objcg, - struct pglist_data *pgdat, - enum node_stat_item idx, int nr) -{ - struct mem_cgroup *memcg; - struct lruvec *lruvec; - - rcu_read_lock(); - memcg = obj_cgroup_memcg(objcg); - lruvec = mem_cgroup_lruvec(memcg, pgdat); - mod_memcg_lruvec_state(lruvec, idx, nr); - rcu_read_unlock(); -} - /** * __count_memcg_events - account VM events in a cgroup * @memcg: the memory cgroup @@ -2137,41 +2119,6 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, } #endif -/* - * Most kmem_cache_alloc() calls are from user context. The irq disable/enable - * sequence used in this case to access content from object stock is slow. - * To optimize for user context access, there are now two object stocks for - * task context and interrupt context access respectively. - * - * The task context object stock can be accessed by disabling preemption only - * which is cheap in non-preempt kernel. The interrupt context object stock - * can only be accessed after disabling interrupt. User context code can - * access interrupt object stock, but not vice versa. - */ -static inline struct obj_stock *get_obj_stock(unsigned long *pflags) -{ - struct memcg_stock_pcp *stock; - - if (likely(in_task())) { - *pflags = 0UL; - preempt_disable(); - stock = this_cpu_ptr(&memcg_stock); - return &stock->task_obj; - } - - local_irq_save(*pflags); - stock = this_cpu_ptr(&memcg_stock); - return &stock->irq_obj; -} - -static inline void put_obj_stock(unsigned long flags) -{ - if (likely(in_task())) - preempt_enable(); - else - local_irq_restore(flags); -} - /** * consume_stock: Try to consume stocked charge on this cpu. * @memcg: memcg to consume from. @@ -2816,6 +2763,59 @@ static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) */ #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT) +/* + * Most kmem_cache_alloc() calls are from user context. The irq disable/enable + * sequence used in this case to access content from object stock is slow. + * To optimize for user context access, there are now two object stocks for + * task context and interrupt context access respectively. + * + * The task context object stock can be accessed by disabling preemption only + * which is cheap in non-preempt kernel. The interrupt context object stock + * can only be accessed after disabling interrupt. User context code can + * access interrupt object stock, but not vice versa. + */ +static inline struct obj_stock *get_obj_stock(unsigned long *pflags) +{ + struct memcg_stock_pcp *stock; + + if (likely(in_task())) { + *pflags = 0UL; + preempt_disable(); + stock = this_cpu_ptr(&memcg_stock); + return &stock->task_obj; + } + + local_irq_save(*pflags); + stock = this_cpu_ptr(&memcg_stock); + return &stock->irq_obj; +} + +static inline void put_obj_stock(unsigned long flags) +{ + if (likely(in_task())) + preempt_enable(); + else + local_irq_restore(flags); +} + +/* + * mod_objcg_mlstate() may be called with irq enabled, so + * mod_memcg_lruvec_state() should be used. + */ +static inline void mod_objcg_mlstate(struct obj_cgroup *objcg, + struct pglist_data *pgdat, + enum node_stat_item idx, int nr) +{ + struct mem_cgroup *memcg; + struct lruvec *lruvec; + + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + lruvec = mem_cgroup_lruvec(memcg, pgdat); + mod_memcg_lruvec_state(lruvec, idx, nr); + rcu_read_unlock(); +} + int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, gfp_t gfp, bool new_page) { From 4178158ef8cadeb0ee86639749ce2b33ad75f770 Mon Sep 17 00:00:00 2001 From: Zhenguo Yao Date: Fri, 10 Dec 2021 14:47:08 -0800 Subject: [PATCH 133/143] hugetlbfs: fix issue of preallocation of gigantic pages can't work Preallocation of gigantic pages can't work bacause of commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation"). When nid is NUMA_NO_NODE(-1), alloc_bootmem_huge_page will always return without doing allocation. Fix this by adding more check. Link: https://lkml.kernel.org/r/20211129133803.15653-1-yaozhenguo1@gmail.com Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation") Signed-off-by: Zhenguo Yao Reviewed-by: Mike Kravetz Tested-by: Maxim Levitsky Reviewed-by: Muchun Song Reviewed-by: Baolin Wang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index abcd1785c629..a1baa198519a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2973,7 +2973,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid) struct huge_bootmem_page *m = NULL; /* initialize for clang */ int nr_nodes, node; - if (nid >= nr_online_nodes) + if (nid != NUMA_NO_NODE && nid >= nr_online_nodes) return 0; /* do node specific alloc */ if (nid != NUMA_NO_NODE) { From 3c376dfafbf7a8ea0dea212d095ddd83e93280bb Mon Sep 17 00:00:00 2001 From: Manjong Lee Date: Fri, 10 Dec 2021 14:47:11 -0800 Subject: [PATCH 134/143] mm: bdi: initialize bdi_min_ratio when bdi is unregistered Initialize min_ratio if it is set during bdi unregistration. This can prevent problems that may occur a when bdi is removed without resetting min_ratio. For example. 1) insert external sdcard 2) set external sdcard's min_ratio 70 3) remove external sdcard without setting min_ratio 0 4) insert external sdcard 5) set external sdcard's min_ratio 70 << error occur(can't set) Because when an sdcard is removed, the present bdi_min_ratio value will remain. Currently, the only way to reset bdi_min_ratio is to reboot. [akpm@linux-foundation.org: tweak comment and coding style] Link: https://lkml.kernel.org/r/20211021161942.5983-1-mj0123.lee@samsung.com Signed-off-by: Manjong Lee Acked-by: Peter Zijlstra (Intel) Cc: Changheun Lee Cc: Jens Axboe Cc: Christoph Hellwig Cc: Matthew Wilcox Cc: Cc: Cc: Cc: Cc: Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/backing-dev.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 1eead4761011..eae96dfe0261 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -945,6 +945,13 @@ void bdi_unregister(struct backing_dev_info *bdi) wb_shutdown(&bdi->wb); cgwb_bdi_unregister(bdi); + /* + * If this BDI's min ratio has been set, use bdi_set_min_ratio() to + * update the global bdi_min_ratio. + */ + if (bdi->min_ratio) + bdi_set_min_ratio(bdi, 0); + if (bdi->dev) { bdi_debug_unregister(bdi); device_unregister(bdi->dev); From c897899752478d4c905c56f2b54b99ba82b34e13 Mon Sep 17 00:00:00 2001 From: German Gomez Date: Wed, 1 Dec 2021 12:33:29 +0000 Subject: [PATCH 135/143] perf tools: Prevent out-of-bounds access to registers The size of the cache of register values is arch-dependant (PERF_REGS_MAX). This has the potential of causing an out-of-bounds access in the function "perf_reg_value" if the local architecture contains less registers than the one the perf.data file was recorded on. Since the maximum number of registers is bound by the bitmask "u64 cache_mask", and the size of the cache when running under x86 systems is 64 already, fix the size to 64 and add a range-check to the function "perf_reg_value" to prevent out-of-bounds access. Reported-by: Alexandre Truong Reviewed-by: Kajol Jain Signed-off-by: German Gomez Acked-by: Jiri Olsa Cc: Alexander Shishkin Cc: John Garry Cc: Leo Yan Cc: Mark Rutland Cc: Mathieu Poirier Cc: Namhyung Kim Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20211201123334.679131-2-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/event.h | 5 ++++- tools/perf/util/perf_regs.c | 3 +++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index 95ffed66369c..c59331eea1d9 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -44,13 +44,16 @@ struct perf_event_attr; /* perf sample has 16 bits size limit */ #define PERF_SAMPLE_MAX_SIZE (1 << 16) +/* number of register is bound by the number of bits in regs_dump::mask (64) */ +#define PERF_SAMPLE_REGS_CACHE_SIZE (8 * sizeof(u64)) + struct regs_dump { u64 abi; u64 mask; u64 *regs; /* Cached values/mask filled by first register access. */ - u64 cache_regs[PERF_REGS_MAX]; + u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE]; u64 cache_mask; }; diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 5ee47ae1509c..06a7461ba864 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -25,6 +25,9 @@ int perf_reg_value(u64 *valp, struct regs_dump *regs, int id) int i, idx = 0; u64 mask = regs->mask; + if ((u64)id >= PERF_SAMPLE_REGS_CACHE_SIZE) + return -EINVAL; + if (regs->cache_mask & (1ULL << id)) goto out; From 057ae59f5a1d924511beb1b09f395bdb316cfd03 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 10 Dec 2021 18:22:57 +0200 Subject: [PATCH 136/143] perf intel-pt: Fix some PGE (packet generation enable/control flow packets) usage Packet generation enable (PGE) refers to whether control flow (COFI) packets are being produced. PGE may be false even when branch-tracing is enabled, due to being out-of-context, or outside a filter address range. Fix some missing PGE usage. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Fixes: 839598176b0554 ("perf intel-pt: Allow decoding with branch tracing disabled") Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 5f83937bf8f3..6f6f163161a9 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -2678,6 +2678,7 @@ static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, in return HOP_IGNORE; case INTEL_PT_TIP_PGD: + decoder->pge = false; if (!decoder->packet.count) { intel_pt_set_nr(decoder); return HOP_IGNORE; @@ -2707,7 +2708,7 @@ static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, in intel_pt_set_ip(decoder); if (intel_pt_fup_event(decoder)) return HOP_RETURN; - if (!decoder->branch_enable) + if (!decoder->branch_enable || !decoder->pge) *no_tip = true; if (*no_tip) { decoder->state.type = INTEL_PT_INSTRUCTION; @@ -2897,7 +2898,7 @@ static bool intel_pt_psb_with_fup(struct intel_pt_decoder *decoder, int *err) { struct intel_pt_psb_info data = { .fup = false }; - if (!decoder->branch_enable || !decoder->pge) + if (!decoder->branch_enable) return false; intel_pt_pkt_lookahead(decoder, intel_pt_psb_lookahead_cb, &data); @@ -2999,7 +3000,7 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder) break; } intel_pt_set_last_ip(decoder); - if (!decoder->branch_enable) { + if (!decoder->branch_enable || !decoder->pge) { decoder->ip = decoder->last_ip; if (intel_pt_fup_event(decoder)) return 0; From ad106a26aef3a95ac7ca88d033b431661ba346ce Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 10 Dec 2021 18:22:58 +0200 Subject: [PATCH 137/143] perf intel-pt: Fix sync state when a PSB (synchronization) packet is found When syncing, it may be that branch packet generation is not enabled at that point, in which case there will not immediately be a control-flow packet, so some packets before a control flow packet turns up, get ignored. However, the decoder is in sync as soon as a PSB is found, so the state should be set accordingly. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 6f6f163161a9..bddf98123dc3 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -3608,7 +3608,7 @@ static int intel_pt_sync(struct intel_pt_decoder *decoder) } decoder->have_last_ip = true; - decoder->pkt_state = INTEL_PT_STATE_NO_IP; + decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; err = intel_pt_walk_psb(decoder); if (err) From 4c761d805bb2d2ead1b9baaba75496152b394c80 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 10 Dec 2021 18:22:59 +0200 Subject: [PATCH 138/143] perf intel-pt: Fix intel_pt_fup_event() assumptions about setting state type intel_pt_fup_event() assumes it can overwrite the state type if there has been an FUP event, but this is an unnecessary and unexpected constraint on callers. Fix by touching only the state type flags that are affected by an FUP event. Fixes: a472e65fc490a ("perf intel-pt: Add decoder support for ptwrite and power event packets") Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- .../util/intel-pt-decoder/intel-pt-decoder.c | 32 ++++++++----------- 1 file changed, 13 insertions(+), 19 deletions(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index bddf98123dc3..16fbbf07e367 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -1205,61 +1205,55 @@ static int intel_pt_walk_insn(struct intel_pt_decoder *decoder, static bool intel_pt_fup_event(struct intel_pt_decoder *decoder) { + enum intel_pt_sample_type type = decoder->state.type; bool ret = false; + decoder->state.type &= ~INTEL_PT_BRANCH; + if (decoder->set_fup_tx_flags) { decoder->set_fup_tx_flags = false; decoder->tx_flags = decoder->fup_tx_flags; - decoder->state.type = INTEL_PT_TRANSACTION; + decoder->state.type |= INTEL_PT_TRANSACTION; if (decoder->fup_tx_flags & INTEL_PT_ABORT_TX) decoder->state.type |= INTEL_PT_BRANCH; - decoder->state.from_ip = decoder->ip; - decoder->state.to_ip = 0; decoder->state.flags = decoder->fup_tx_flags; - return true; + ret = true; } if (decoder->set_fup_ptw) { decoder->set_fup_ptw = false; - decoder->state.type = INTEL_PT_PTW; + decoder->state.type |= INTEL_PT_PTW; decoder->state.flags |= INTEL_PT_FUP_IP; - decoder->state.from_ip = decoder->ip; - decoder->state.to_ip = 0; decoder->state.ptw_payload = decoder->fup_ptw_payload; - return true; + ret = true; } if (decoder->set_fup_mwait) { decoder->set_fup_mwait = false; - decoder->state.type = INTEL_PT_MWAIT_OP; - decoder->state.from_ip = decoder->ip; - decoder->state.to_ip = 0; + decoder->state.type |= INTEL_PT_MWAIT_OP; decoder->state.mwait_payload = decoder->fup_mwait_payload; ret = true; } if (decoder->set_fup_pwre) { decoder->set_fup_pwre = false; decoder->state.type |= INTEL_PT_PWR_ENTRY; - decoder->state.type &= ~INTEL_PT_BRANCH; - decoder->state.from_ip = decoder->ip; - decoder->state.to_ip = 0; decoder->state.pwre_payload = decoder->fup_pwre_payload; ret = true; } if (decoder->set_fup_exstop) { decoder->set_fup_exstop = false; decoder->state.type |= INTEL_PT_EX_STOP; - decoder->state.type &= ~INTEL_PT_BRANCH; decoder->state.flags |= INTEL_PT_FUP_IP; - decoder->state.from_ip = decoder->ip; - decoder->state.to_ip = 0; ret = true; } if (decoder->set_fup_bep) { decoder->set_fup_bep = false; decoder->state.type |= INTEL_PT_BLK_ITEMS; - decoder->state.type &= ~INTEL_PT_BRANCH; + ret = true; + } + if (ret) { decoder->state.from_ip = decoder->ip; decoder->state.to_ip = 0; - ret = true; + } else { + decoder->state.type = type; } return ret; } From c79ee2b2160909889df67c8801352d3e69d43a1a Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 10 Dec 2021 18:23:00 +0200 Subject: [PATCH 139/143] perf intel-pt: Fix state setting when receiving overflow (OVF) packet An overflow (OVF packet) is treated as an error because it represents a loss of trace data, but there is no loss of synchronization, so the packet state should be INTEL_PT_STATE_IN_SYNC not INTEL_PT_STATE_ERR_RESYNC. To support that, some additional variables must be reset, and the FUP packet that may follow OVF is treated as an FUP event. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- .../util/intel-pt-decoder/intel-pt-decoder.c | 32 ++++++++++++++++--- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 16fbbf07e367..845b0ca866a4 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -1249,6 +1249,20 @@ static bool intel_pt_fup_event(struct intel_pt_decoder *decoder) decoder->state.type |= INTEL_PT_BLK_ITEMS; ret = true; } + if (decoder->overflow) { + decoder->overflow = false; + if (!ret && !decoder->pge) { + if (decoder->hop) { + decoder->state.type = 0; + decoder->pkt_state = INTEL_PT_STATE_RESAMPLE; + } + decoder->pge = true; + decoder->state.type |= INTEL_PT_BRANCH | INTEL_PT_TRACE_BEGIN; + decoder->state.from_ip = 0; + decoder->state.to_ip = decoder->ip; + return true; + } + } if (ret) { decoder->state.from_ip = decoder->ip; decoder->state.to_ip = 0; @@ -1602,7 +1616,16 @@ static int intel_pt_overflow(struct intel_pt_decoder *decoder) intel_pt_clear_tx_flags(decoder); intel_pt_set_nr(decoder); decoder->timestamp_insn_cnt = 0; - decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC; + decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; + decoder->state.from_ip = decoder->ip; + decoder->ip = 0; + decoder->pge = false; + decoder->set_fup_tx_flags = false; + decoder->set_fup_ptw = false; + decoder->set_fup_mwait = false; + decoder->set_fup_pwre = false; + decoder->set_fup_exstop = false; + decoder->set_fup_bep = false; decoder->overflow = true; return -EOVERFLOW; } @@ -2957,6 +2980,7 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder) case INTEL_PT_TIP_PGE: { decoder->pge = true; + decoder->overflow = false; intel_pt_mtc_cyc_cnt_pge(decoder); intel_pt_set_nr(decoder); if (decoder->packet.count == 0) { @@ -3462,10 +3486,10 @@ static int intel_pt_sync_ip(struct intel_pt_decoder *decoder) decoder->set_fup_pwre = false; decoder->set_fup_exstop = false; decoder->set_fup_bep = false; + decoder->overflow = false; if (!decoder->branch_enable) { decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; - decoder->overflow = false; decoder->state.type = 0; /* Do not have a sample */ return 0; } @@ -3480,7 +3504,6 @@ static int intel_pt_sync_ip(struct intel_pt_decoder *decoder) decoder->pkt_state = INTEL_PT_STATE_RESAMPLE; else decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; - decoder->overflow = false; decoder->state.from_ip = 0; decoder->state.to_ip = decoder->ip; @@ -3699,7 +3722,8 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder) if (err) { decoder->state.err = intel_pt_ext_err(err); - decoder->state.from_ip = decoder->ip; + if (err != -EOVERFLOW) + decoder->state.from_ip = decoder->ip; intel_pt_update_sample_time(decoder); decoder->sample_tot_cyc_cnt = decoder->tot_cyc_cnt; intel_pt_set_nr(decoder); From a32e6c5da599dbf49e60622a4dfb5b9b40ece029 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 10 Dec 2021 18:23:01 +0200 Subject: [PATCH 140/143] perf intel-pt: Fix next 'err' value, walking trace Code after label 'next:' in intel_pt_walk_trace() assumes 'err' is zero, but it may not be, if arrived at via a 'goto'. Ensure it is zero. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 845b0ca866a4..75b504aed7f4 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -2942,6 +2942,7 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder) if (err) return err; next: + err = 0; if (decoder->cyc_threshold) { if (decoder->sample_cyc && last_packet_type != INTEL_PT_CYC) decoder->sample_cyc = false; From a882cc94971093e146ffa1163b140ad956236754 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 10 Dec 2021 18:23:02 +0200 Subject: [PATCH 141/143] perf intel-pt: Fix missing 'instruction' events with 'q' option FUP packets contain IP information, which makes them also an 'instruction' event in 'hop' mode i.e. the itrace 'q' option. That wasn't happening, so restructure the logic so that FUP events are added along with appropriate 'instruction' and 'branch' events. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 75b504aed7f4..0e013c2d9eb4 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -2683,6 +2683,8 @@ static int intel_pt_scan_for_psb(struct intel_pt_decoder *decoder); /* Hop mode: Ignore TNT, do not walk code, but get ip from FUPs and TIPs */ static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, int *err) { + *err = 0; + /* Leap from PSB to PSB, getting ip from FUP within PSB+ */ if (decoder->leap && !decoder->in_psb && decoder->packet.type != INTEL_PT_PSB) { *err = intel_pt_scan_for_psb(decoder); @@ -2723,18 +2725,21 @@ static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, in if (!decoder->packet.count) return HOP_IGNORE; intel_pt_set_ip(decoder); - if (intel_pt_fup_event(decoder)) - return HOP_RETURN; + if (decoder->set_fup_mwait || decoder->set_fup_pwre) + *no_tip = true; if (!decoder->branch_enable || !decoder->pge) *no_tip = true; if (*no_tip) { decoder->state.type = INTEL_PT_INSTRUCTION; decoder->state.from_ip = decoder->ip; decoder->state.to_ip = 0; + intel_pt_fup_event(decoder); return HOP_RETURN; } + intel_pt_fup_event(decoder); + decoder->state.type |= INTEL_PT_INSTRUCTION | INTEL_PT_BRANCH; *err = intel_pt_walk_fup_tip(decoder); - if (!*err) + if (!*err && decoder->state.to_ip) decoder->pkt_state = INTEL_PT_STATE_RESAMPLE; return HOP_RETURN; From 6665b8e4836caa8023cbc7e53733acd234969c8c Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 10 Dec 2021 18:23:03 +0200 Subject: [PATCH 142/143] perf intel-pt: Fix error timestamp setting on the decoder error path An error timestamp shows the last known timestamp for the queue, but this is not updated on the error path. Fix by setting it. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/intel-pt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 556a893508da..10c3187e4c5a 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -2565,6 +2565,7 @@ static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp) ptq->sync_switch = false; intel_pt_next_tid(pt, ptq); } + ptq->timestamp = state->est_timestamp; if (pt->synth_opts.errors) { err = intel_ptq_synth_error(ptq, state); if (err) From 9937e8daab29d9e20de6b7bc56c76db7a4eeda69 Mon Sep 17 00:00:00 2001 From: Miaoqian Lin Date: Sat, 11 Dec 2021 05:38:53 +0000 Subject: [PATCH 143/143] perf python: Fix NULL vs IS_ERR_OR_NULL() checking The function trace_event__tp_format_id may return ERR_PTR(-ENOMEM). Use IS_ERR_OR_NULL to check tp_format. Signed-off-by: Miaoqian Lin Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Song Liu Link: http://lore.kernel.org/lkml/20211211053856.19827-1-linmq006@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/python.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index 563a9ba8954f..7f782a31bda3 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -461,7 +461,7 @@ get_tracepoint_field(struct pyrf_event *pevent, PyObject *attr_name) struct tep_event *tp_format; tp_format = trace_event__tp_format_id(evsel->core.attr.config); - if (!tp_format) + if (IS_ERR_OR_NULL(tp_format)) return NULL; evsel->tp_format = tp_format;