A variant may need to perform specific actions before and after a signal
voltage switch.
This patch adds two callbacks to struct mmci_host_ops to manage the signal
voltage switch. ->pre_sig_volt_switch() allows preparing a signal voltage
switch before the SD_SWITCH_VOLTAGE command (CMD11) is sent.
->post_sig_volt_switch() allows specific actions to be executed after the
I/O signal voltage level has been changed.
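A sketch of the two hooks as they would sit in the ops struct (signatures
illustrative, not necessarily verbatim):

    struct mmci_host_ops {
            /* ... */
            /* prepare a signal voltage switch before CMD11 is sent */
            void (*pre_sig_volt_switch)(struct mmci_host *host);
            /* act after the I/O signal voltage level has changed */
            int (*post_sig_volt_switch)(struct mmci_host *host,
                                        struct mmc_ios *ios);
    };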
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-8-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The hardware delay block is used to align the sampling clock on the data
received by SDMMC. It is mandatory for SDMMC to support the SDR104 mode.
The delay block generates an output clock that is phase-shifted relative to
the input clock. The phase of the output clock must be programmed via the
execute-tuning interface.
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-7-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
To support the SDR104 mode, the sdmmc variant has a hardware delay block to
manage the clock phase when sampling data received from the card.
This patch adds a second, optional base register for the sdmmc delay block.
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20200128090636.13689-6-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In the variant init function, some references may be allocated for
variant-specific usage. Adding a private void* to the mmci_host struct
allows variant functions to access these references through the mmci_host
structure.
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-5-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The variant init function may need to add an mmc_host_ops callback, for
example to add execute_tuning support if this feature is available. This
patch adds an mmc_host_ops pointer to the mmci struct.
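Taken together with the previous change, the idea is roughly (field names
are an illustrative sketch):

    struct mmci_host {
            /* ... */
            void *variant_priv;            /* variant-specific references */
            struct mmc_host_ops *mmc_ops;  /* e.g. carries ->execute_tuning */
    };

A variant init function can then allocate its state, stash it in
variant_priv, and point mmc_ops at an ops table that includes
execute_tuning when the hardware supports it.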
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-4-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
sg_dma_xxx should be used after a dma_map_sg call has been done, to get the
bus addresses of each of the SG entries and their lengths. But the
mmci_host_ops validate_data can be called before dma_map_sg. This patch
replaces these macros with sg->offset and sg->length, which are always
defined.
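So validate_data must stick to the CPU-side scatterlist fields;
illustratively:

    /*
     * Before dma_map_sg(), sg_dma_address()/sg_dma_len() are not valid;
     * only sg->offset and sg->length are always defined.
     */
    if (sg->offset & 3 || sg->length & 3)
            return -EINVAL;    /* hypothetical alignment constraint */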
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-2-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In case the host specifies a max_busy_timeout, we need to validate that the
needed timeout for the HPI command conforms to that requirement. If that's
not the case, let's convert from an R1B response to an R1 response, to
instruct the host to avoid HW busy detection.
Additionally, when R1B is used we must also inform the host about the busy
timeout for the command, so let's do that by updating cmd.busy_timeout.
Finally, when R1B is used and the host supports HW busy detection, there
should be no need for polling, so skip it.
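The resulting logic boils down to the following sketch (names follow the
mmc core, but details are illustrative):

    if (!host->max_busy_timeout || busy_timeout_ms <= host->max_busy_timeout) {
            cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
            cmd.busy_timeout = busy_timeout_ms;   /* inform the host */
    } else {
            /* too long for the host: fall back to R1, no HW busy detection */
            cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
            use_r1b_resp = false;
    }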
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-12-ulf.hansson@linaro.org
Rather than open-coding the polling loop in mmc_interrupt_hpi(), let's
convert to use mmc_poll_for_busy().
Note that moving to mmc_poll_for_busy() for HPI also improves the
behaviour as follows:
- Adds support for polling via the optional ->card_busy() host ops.
- Requires R1_READY_FOR_DATA to be set in the CMD13 response before exiting
the polling loop.
- Adds a throttling mechanism to avoid CPU hogging when polling.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-11-ulf.hansson@linaro.org
The 'u32 *status' is unused by the caller, so let's drop it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-10-ulf.hansson@linaro.org
Rather than open-coding the polling loop in mmc_do_erase(), let's convert
to use mmc_poll_for_busy().
To allow slightly different error parsing during polling, compared to the
__mmc_switch() case, a new in-parameter to mmc_poll_for_busy() is needed,
but other than that the conversion is straightforward.
Besides addressing the open-coding issue, moving to mmc_poll_for_busy() for
erase/trim/discard improves the behaviour as follows:
- Adds support for polling via the optional ->card_busy() host ops.
- Returns zero to indicate success when the final polling attempt finds the
card non-busy, even if the timeout expired.
- Exits the polling loop when state moves to R1_STATE_TRAN, rather than
when leaving R1_STATE_PRG.
- Decreases the starting range for throttling to 32-64us.
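The extended interface then looks roughly like this (enum values per this
series, sketched):

    enum mmc_busy_cmd {
            MMC_BUSY_CMD6,
            MMC_BUSY_ERASE,
            MMC_BUSY_HPI,
    };

    int mmc_poll_for_busy(struct mmc_card *card, unsigned int timeout_ms,
                          enum mmc_busy_cmd busy_cmd);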
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-9-ulf.hansson@linaro.org
Through mmc_poll_for_busy() a CMD13 may be sent to get the status of the
(e)MMC card. If the state of the card is R1_STATE_PRG, the card is
considered busy, which means we continue to poll with CMD13. This seems to
be sufficient, but it's also unnecessarily fragile, as it means a new
command/request could potentially be sent to the card while it's in an
unknown state.
To try to improve the situation, but also to move towards a more consistent
CMD13 polling behaviour in the mmc core, let's deploy the same policy we
use for regular I/O write requests. In other words, let's check that the
card returns to R1_STATE_TRAN and that the R1_READY_FOR_DATA bit is set in
the CMD13 response, before exiting the polling loop.
Note that this changed behaviour could potentially lead to unnecessarily
waiting for the timeout to expire if the card, for some reason, moves to an
unexpected error state. However, as we bail out of the polling loop when
the R1_SWITCH_ERROR bit is set or when CMD13 fails, this shouldn't be an
issue.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-8-ulf.hansson@linaro.org
To allow subsequent changes to re-use the code from the static function
mmc_blk_in_tran_state(), let's move it to a public header. While at it,
let's also rename it to mmc_ready_for_data(), to better describe its
purpose.
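The moved helper is essentially a one-liner (as in the patch, modulo
formatting):

    static inline bool mmc_ready_for_data(u32 status)
    {
            return status & R1_READY_FOR_DATA &&
                   R1_CURRENT_STATE(status) == R1_STATE_TRAN;
    }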
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-7-ulf.hansson@linaro.org
To make the code more readable, move the part that gets the busy status of
the card out into a separate function, mmc_busy_status(). Then call it from
mmc_poll_for_busy().
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-6-ulf.hansson@linaro.org
The use_busy_signal in-parameter is set true by all callers of
__mmc_switch(), hence it's redundant so drop it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-5-ulf.hansson@linaro.org
To simplify code, let's extend mmc_switch_status() to cope with needs
addressed in __mmc_switch_status(). Then move all users to the updated
mmc_switch_status() API and drop __mmc_switch_status() altogether.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-4-ulf.hansson@linaro.org
The last user of MMC_OPS_TIMEOUT_MS was recently removed, however the
define stayed around. Let's remove it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-3-ulf.hansson@linaro.org
In mmc_poll_for_busy() we loop continuously, either by sending a CMD13 or
by invoking the ->card_busy() host ops, to detect when the card stops
signaling busy. This behaviour is problematic as it may cause CPU hogging,
especially when the busy signal time reaches beyond a few ms.
Let's fix the issue by adding a throttling mechanism, that inserts a
usleep_range() in between the polling attempts. The sleep range starts at
32-64us, but increases for each loop by a factor of 2, up until the range
reaches ~32-64ms. In this way, we are able to keep the loop fine-grained
enough for short busy signaling times, while also not hogging the CPU for
longer times.
Note that this change is inspired by the similar throttling mechanism that
we already use for mmc_do_erase().
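Sketched, the throttled loop looks something like:

    unsigned int udelay = 32, udelay_max = 32768;

    do {
            /* CMD13 or ->card_busy() check sets 'busy' here */
            if (!busy)
                    break;

            usleep_range(udelay, udelay * 2);
            if (udelay < udelay_max)
                    udelay *= 2;    /* 32-64us doubling up to ~32-64ms */
    } while (time_before(jiffies, timeout));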
Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-2-ulf.hansson@linaro.org
When using the host software queue, the next request is triggered in the
IRQ handler without a context switch. But sdhci_request() cannot be called
in interrupt context when using the host software queue for some host
drivers, because their ->get_cd() ops may sleep.
However, for some host drivers, such as the Spreadtrum host driver, the
card is nonremovable, so the ->get_cd() ops does not sleep, which means we
can complete the data request and trigger the next request in the IRQ
handler, removing the context switch for the Spreadtrum host driver.
As suggested by Adrian, we should introduce a request_atomic() API to
indicate that a request can be issued in interrupt context, removing the
context switch when using the mmc host software queue. But that should be
done in a separate series that converts the users of the mmc host software
queue. Thus, for now, introduce a variable in struct sdhci_host to indicate
that we will always defer request completion when using the host software
queue.
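A minimal sketch of that variable (the field name follows the upstream
change as we read it; the deferral check itself is abbreviated):

    struct sdhci_host {
            /* ... */
            /* when set, never complete requests in the IRQ handler */
            bool always_defer_done;
    };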
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/e693e7a29beb3c1922b333f4603ea81f43d5c5b1.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Add a request_done ops to struct sdhci_ops in preparation for host
controllers that have a different method of completing a request, such as
supporting request completion via the MMC software queue.
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/1539c801c8bbdbcd1d86f8c2dab375f5803c765a.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Enable the MMC host software queue for the SD card if the host controller
supports the MMC host software queue.
On my Spreadtrum platform, I did not see any obvious performance changes
with a 4K block size when changing to use hsq for the SD cards. I think the
reason is that the SD card works at a low speed on my platform, and most of
the time is spent in the hardware. But we can see some obvious improvements
when enabling packed requests based on hsq, which is why we still add hsq
support for the SD cards.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/0065b4631fef2d61c3b89d14a4ea4f2b7499ea56.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Currently the MMC read/write stack always waits for the previous request to
complete via mmc_blk_rw_wait() before sending a new request to the
hardware, or queues a work item to complete the request. That brings
context-switching overhead and spends extra time polling the card for busy
completion for I/O writes via CMD13, especially at high I/O-per-second
rates, which hurts I/O performance.
Thus this patch introduces an MMC software queue interface based on the
hardware command queue engine's interfaces, following the same idea as the
hardware command queue engine, which removes the context switching.
Moreover, we set the default queue depth to 64 for the software queue,
which allows more requests to be prepared, merged and inserted into the I/O
scheduler to improve performance, while we only allow 2 requests in flight;
that is enough to let the IRQ handler always trigger the next request
without a context switch, while also avoiding long latency.
Moreover, the host controller must support HW busy detection for I/O
operations when enabling the host software queue. That means the host
controller must not complete a data transfer request until after the card
stops signaling busy.
From the fio testing data in the cover letter, we can see that the software
queue can improve performance with a 4K block size, increasing random read
by about 16% and random write by about 90%, though there is no obvious
improvement for sequential read and write.
Moreover, we can expand the software queue interface to support MMC packed
requests or packed commands in the future.
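For reference, the defaults described above and a host driver opting in,
sketched from the hsq interfaces (names per this series, abbreviated):

    #define HSQ_NUM_SLOTS    64   /* default software queue depth */
    #define HSQ_NORMAL_DEPTH 2    /* requests allowed in flight */

    /* in a host driver's probe path, hypothetically: */
    struct mmc_hsq *hsq = devm_kzalloc(dev, sizeof(*hsq), GFP_KERNEL);
    if (!hsq)
            return -ENOMEM;
    ret = mmc_hsq_init(hsq, host->mmc);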
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/4409c1586a9b3ed20d57ad2faf6c262fc3ccb6e2.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
SDHC cores with the new 14lpp and later tech DLL should not enable
PWRSAVE_DLL, since such controllers' internal gating cannot meet the
following MCLK requirement:
when MCLK is gated OFF, it must not be gated for less than 0.5us, and MCLK
must be switched on for at least 1us before DATA starts coming.
Add support for this requirement.
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1581077075-26011-1-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
This patch removes the quirk which indicates a broken base clock. It was
making the kernel report a wrong base clock of ~187MHz instead of 200MHz,
even though measurement on the hardware showed 200MHz.
Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1579602095-30060-5-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
DLL resets are required while executing the auto-tuning procedure on
ZynqMP. This patch adds code to support that.
Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1579602095-30060-4-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
SD DLL resets are required for some operations on the ZynqMP platform.
Add SD DLL reset support to the ZynqMP firmware driver.
Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/1579602095-30060-3-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The Tap Delay setup ioctl was not added to the valid-ioctl list, so setting
Tap Delays for SD may fail. This patch fixes that.
Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/1579602095-30060-2-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
After various refactoring, we can populate the mmc_ops callbacks
directly and don't need to have wrappers for them anymore.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-7-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
TAPs are Renesas SDHI specific. Now that we moved all handling to the
SDHI core, we can also move the definitions from the TMIO struct to the
SDHI one.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-6-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Currently, select_tuning() is called after RPM resume. But
select_tuning() needs some additional function calls to work correctly.
Instead of reimplementing the whole postprocessing, just enforce
retuning.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-5-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
check_scc_error() is too Renesas-specific. Let's just call it
check_retune() to make it easier to understand what it does.
Only a rename, no functional change.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-4-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
When the tap array in the driver is too small, this is not a warning but an
error. Also, _once is not helpful; we should make sure it shows up
prominently in the logs. It is safe to do this because it will only show up
during SoC enablement, when a new SoC needs more taps (if that ever
happens).
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-3-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Move the Renesas-specific code for executing tuning with an SCC into the
SDHI driver and leave only a generic call in the TMIO driver. Simplify
the code a little by removing the init_tuning() and prepare_tuning()
callbacks. The latter is directly folded into the new execute_tuning()
callback.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-2-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Use increasing BIT numbers consistently and remove some superfluous
comments.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20191217114034.13290-6-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
HW engineers say that automatic tap correction cannot be used for HS400
in all R-Car Gen3 SoCs. So, check for that SDHI variant and disable it
when HS400 is about to be enabled.
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20191217114034.13290-5-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
R-Car Gen3 cannot use the correction error status with HS400.
HS200: CMD and DAT signal timing are based on the CLK signal.
HS400: the CMD signal is based on CLK; the DAT signal is based on the DS
signal.
In HS400, the CMD signal is 200MHz (SDR) and the DAT signal is 200MHz
(DDR), so the center position of the signal differs between CMD and DAT.
The TAP position should be adjusted to the center position of the CMD
signal. DAT sampling timing is adjusted by the HS400 calibration circuit
regardless of TAP position; refer to renesas_sdhi_adjust_hs400mode_enable().
However, the correction error status contains both CMD and DAT status in
HS400 (the DAT signal is not masked in HS400). Therefore, the correction
error status cannot be used in HS400, which means automatic correction
cannot be used in HS400. Manual correction can move to the correct TAP
position by ignoring the DAT correction error status and using only the CMD
correction error status.
Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
[wsa: refactored patch from BSP]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20191217114034.13290-4-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
This patch adds a manual correction mechanism for SDHI. Currently, SDHI
uses automatic TAP position correction, but the TAP position can also be
corrected manually via the correction error status flags.
Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20191217114034.13290-3-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
hw_reset() clears the automatic correction bit twice. I couldn't find
anything in the docs recommending that. Removing one of them didn't
cause any regressions here, so keep it simple.
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20191217114034.13290-2-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Merge tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two fixes.
The first is a regression: when dropping some incompat bits the
conditions were reversed. The other is a fix for rename whiteout
potentially leaving stack memory linked to a list"
* tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix removal of raid[56|1c34] incompat flags after removing block group
btrfs: fix log context list corruption after rename whiteout error
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
x86/mm: split vmalloc_sync_all()
mm, slub: prevent kmalloc_node crashes and memory leaks
mm/mmu_notifier: silence PROVE_RCU_LIST warnings
epoll: fix possible lost wakeup on epoll_ctl() path
mm: do not allow MADV_PAGEOUT for CoW pages
mm, memcg: throttle allocators based on ancestral memory.high
mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
page-flags: fix a crash at SetPageError(THP_SWAP)
mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
Commit 3f8fd02b1b ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path. While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.
Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.
To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:
* vmalloc_sync_mappings(), and
* vmalloc_sync_unmappings()
Most call-sites of vmalloc_sync_all() only care about new mappings being
synchronized. The only exception is the new call-site added in the
above-mentioned commit.
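Call-sites then change along these lines (sketch):

    /* most call-sites: only new mappings need synchronizing */
    vmalloc_sync_mappings();

    /* only the vunmap() path from the commit above needs: */
    vmalloc_sync_unmappings();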
Shile Zhang directed us to a report of an 80% regression in reaim
throughput.
Fixes: 3f8fd02b1b ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sachin reports [1] a crash in SLUB __slab_alloc():
BUG: Kernel NULL pointer dereference on read at 0x000073b0
Faulting instruction address: 0xc0000000003d55f4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000
CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
NIP ___slab_alloc+0x1f4/0x760
LR __slab_alloc+0x34/0x60
Call Trace:
___slab_alloc+0x334/0x760 (unreliable)
__slab_alloc+0x34/0x60
__kmalloc_node+0x110/0x490
kvmalloc_node+0x58/0x110
mem_cgroup_css_online+0x108/0x270
online_css+0x48/0xd0
cgroup_apply_control_enable+0x2ec/0x4d0
cgroup_mkdir+0x228/0x5f0
kernfs_iop_mkdir+0x90/0xf0
vfs_mkdir+0x110/0x230
do_mkdirat+0xb0/0x1a0
system_call+0x5c/0x68
This is a PowerPC platform with following NUMA topology:
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node 0 1
0: 10 40
1: 40 10
possible numa nodes: 0-31
This only happens with a mmotm patch "mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node" [2] which effectively calls
kmalloc_node for each possible node. SLUB however only allocates
kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
node_to_mem_node to return such valid node for other nodes since commit
a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating
on memoryless node"). This is however not true in this configuration
where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
thus it contains zeroes and get_partial() ends up accessing
non-allocated kmem_cache_node.
A related issue was reported by Bharata (originally by Ramachandran) [3]
where a similar PowerPC configuration, but with a mainline kernel without
patch [2], ends up allocating large amounts of pages in kmalloc-1k and
kmalloc-512. This seems to have the same underlying issue with
node_to_mem_node() not behaving as expected, and might also lead to an
infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].
This patch should fix both issues by not relying on node_to_mem_node()
anymore and instead simply falling back to NUMA_NO_NODE, when
kmalloc_node(node) is attempted for a node that's not online, or has no
usable memory. The "usable memory" condition is also changed from
node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
the condition that SLUB uses to allocate kmem_cache_node structures.
The check in get_partial() is removed completely, as the checks in
___slab_alloc() are now sufficient to prevent get_partial() being
reached with an invalid node.
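The guard in ___slab_alloc(), sketched from the description above:

    if (node != NUMA_NO_NODE && !node_state(node, N_NORMAL_MEMORY))
            node = NUMA_NO_NODE;    /* fall back rather than trust node_to_mem_node() */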
[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/
Fixes: a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is safe to traverse mm->notifier_subscriptions->list either under
SRCU read lock or mm->notifier_subscriptions->lock using
hlist_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false positives,
for example,
WARNING: suspicious RCU usage
-----------------------------
mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by libvirtd/802:
#0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0
#1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800
#2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460
stack backtrace:
CPU: 7 PID: 802 Comm: libvirtd Tainted: G I 5.6.0-rc6-next-20200317+ #2
Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
Call Trace:
dump_stack+0xa4/0xfe
lockdep_rcu_suspicious+0xeb/0xf5
__mmu_notifier_invalidate_range_start+0x3ff/0x460
change_p4d_range+0x746/0x800
change_protection+0x1df/0x300
mprotect_fixup+0x245/0x3e0
do_mprotect_pkey+0x23b/0x3e0
__x64_sys_mprotect+0x51/0x70
do_syscall_64+0x91/0xae8
entry_SYSCALL_64_after_hwframe+0x49/0xb3
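The silencing relies on the optional lockdep-condition argument of
hlist_for_each_entry_rcu(); roughly:

    hlist_for_each_entry_rcu(subscription,
                             &mm->notifier_subscriptions->list, hlist,
                             srcu_read_lock_held(&srcu)) {
            /* ... */
    }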
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a possible lost wakeup introduced by commit a218cc4914.
Originally, modifications to ep->wq were serialized by ep->wq.lock, but in
commit a218cc4914 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") a new rwlock was introduced in order to
relax the fd event path, i.e. callers of the ep_poll_callback() function.
After the change, ep_modify and ep_insert (both called on the epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was still using
ep->wq.lock for wqueue list modification.
The bug doesn't lead to any wqueue list corruptions, because the wake-up
path and list modifications were serialized by ep->wq.lock internally, but
the actual waitqueue_active() check prior to the wake_up() call can be
reordered with modifications of the ep ready list, thus a wake up can be
lost.
And yes, it can be healed by an explicit smp_mb():

    list_add_tail(&epi->rdllink, &ep->rdllist);
    smp_mb();
    if (waitqueue_active(&ep->wq))
            wake_up(&ep->wq);
But let's make it simple: this patch replaces ep->wq.lock with ep->lock for
wqueue modifications, so the wake-up path always observes the activeness of
the wqueue correctly.
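A sketch of the resulting ep_poll() wqueue manipulation, reusing the rwlock
that ep_poll_callback() already takes:

    write_lock_irq(&ep->lock);
    __add_wait_queue_exclusive(&ep->wq, &wait);
    write_unlock_irq(&ep->lock);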
Fixes: a218cc4914 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org> [5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jann has brought up a very interesting point [1]. While shared pages
are excluded from MADV_PAGEOUT normally, CoW pages can be easily
reclaimed that way. This can lead to all sorts of hard to debug
problems. E.g. performance problems outlined by Daniel [2].
There are runtime environments where a substantial amount of memory is
shared among security domains via CoW memory, and an easy way to reclaim
that memory, which MADV_{COLD,PAGEOUT} offers, can lead either to
performance degradation for the parent process, which might be more
privileged, or even open side-channel attacks.
The feasibility of the latter is not really clear to me TBH, but there is
no real reason for exposure at this stage. It seems there is no real use
case that depends on reclaiming CoW memory via madvise at this stage, so it
is much easier to simply disallow it, and this is what this patch does.
Put simply, MADV_{PAGEOUT,COLD} can operate only on exclusively owned
memory, which is a straightforward semantic.
[1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
[2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com
Fixes: 9c276cc65a ("mm: introduce MADV_COLD")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage. However, it's possible that we are being
reclaimed as a result of hitting an ancestor memory.high and should be
penalised based on that, instead.
This patch changes memory.high overage throttling to use the largest
overage in its ancestors when considering how many penalty jiffies to
charge. This makes sure that we penalise poorly behaving cgroups in the
same way regardless of at what level of the hierarchy memory.high was
breached.
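A hedged sketch of the ancestral walk (helper names illustrative):

    static u64 mem_find_max_overage(struct mem_cgroup *memcg)
    {
            u64 overage, max_overage = 0;

            do {
                    overage = calculate_overage(page_counter_read(&memcg->memory),
                                                READ_ONCE(memcg->memory.high));
                    max_overage = max(overage, max_overage);
            } while ((memcg = parent_mem_cgroup(memcg)) &&
                     !mem_cgroup_is_root(memcg));

            return max_overage;
    }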
Fixes: 0e4b01df86 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org> [5.4.x+]
Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0e4b01df86 had a bunch of fixups to use the right division
method. However, it seems that after all that it still wasn't right --
div_u64 takes a 32-bit divisor.
The headroom is still large (2^32 pages), so on mundane systems you
won't hit this, but this should definitely be fixed.
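The distinction, for reference (the fix switches to a full 64-by-64
division):

    /* div_u64(): 64-bit dividend, but the divisor is only 32 bits */
    u64 res = div_u64(dividend, divisor32);
    /* div64_u64(): both operands are 64 bits, as needed here */
    u64 res2 = div64_u64(dividend, divisor64);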
Fixes: 0e4b01df86 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: <stable@vger.kernel.org> [5.4.x+]
Link: http://lkml.kernel.org/r/80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>