Our Android phones occur Panic as follows:
[77522.303024][ T9734] Call trace:
[77522.303039][ T9734] dump_backtrace.cfi_jt+0x0/0x8
[77522.303052][ T9734] dump_stack_lvl+0xc4/0x140
[77522.303061][ T9734] dump_stack+0x1c/0x2c
[77522.303123][ T9734] mrdump_common_die+0x3a8/0x544 [mrdump]
[77522.303177][ T9734] ipanic_die+0x24/0x38 [mrdump]
[77522.303189][ T9734] die+0x340/0x698
[77522.303199][ T9734] bug_handler+0x48/0x108
[77522.303210][ T9734] brk_handler+0xac/0x1a8
[77522.303221][ T9734] do_debug_exception+0xe0/0x1e0
[77522.303233][ T9734] el1_dbg+0x38/0x54
[77522.303242][ T9734] el1_sync_handler+0x40/0x88
[77522.303255][ T9734] el1_sync+0x8c/0x140
[77522.303264][ T9734] plist_requeue+0xd4/0x110
[77522.303297][ T9734] tran_get_swap_pages+0xc8/0x364 [memfusion]
[77522.303329][ T9734] probe_android_vh_get_swap_page+0x1b4/0x220 [memfusion]
[77522.303342][ T9734] get_swap_page+0x258/0x304
[77522.303352][ T9734] shrink_page_list+0xe00/0x1e0c
[77522.303361][ T9734] shrink_inactive_list+0x2f4/0xac8
[77522.303373][ T9734] shrink_lruvec+0x1a4/0x34c
[77522.303383][ T9734] shrink_node_memcgs+0x84/0x3b0
[77522.303391][ T9734] shrink_node+0x2c4/0x6e4
[77522.303400][ T9734] shrink_zones+0x16c/0x29c
[77522.303410][ T9734] do_try_to_free_pages+0xe4/0x2bc
[77522.303418][ T9734] try_to_free_pages+0x388/0x7b4
[77522.303429][ T9734] __alloc_pages_direct_reclaim+0x88/0x278
[77522.303438][ T9734] __alloc_pages_slowpath+0x464/0xb24
[77522.303447][ T9734] __alloc_pages_nodemask+0x1f4/0x3dc
[77522.303458][ T9734] do_anonymous_page+0x164/0x914
[77522.303466][ T9734] handle_pte_fault+0x15c/0x9f8
[77522.303476][ T9734] ___handle_speculative_fault+0x234/0xe18
[77522.303485][ T9734] __handle_speculative_fault+0x78/0x21c
[77522.303497][ T9734] do_page_fault+0x36c/0x754
[77522.303506][ T9734] do_translation_fault+0x48/0x64
[77522.303514][ T9734] do_mem_abort+0x6c/0x164
[77522.303522][ T9734] el0_da+0x24/0x34
[77522.303531][ T9734] el0_sync_handler+0xc8/0xf0
[77522.303539][ T9734] el0_sync+0x1b4/0x1c0
The analysis shows that when we iterate the swap_avail_heads list, we get
node A, but before we access node A, node A is maybe deleted, and by the time
we actually access node A, it no longer exists, as follows:
CPU1 thread1 CPU2 thread2
plist_for_each_entry_safe()
get si->avail_lists[node] from swap_avail_heads
remove si->avail_lists[node] from swap_avail_heads
plist_requeue(&si->avail_lists[node])
BUG_ON(plist_node_empty(node)); // trigger
Due to when we use vendor hook of get_swap_page, the get_swap_pages() function
is overridden, use our own spin_lock to protect when iterate swap_avail_heads
list, but now use native swap_avail_lock spin_lock protect when the
swap_avail_heads list to add and delete nodes, so there will be concurrent
access.
So add vendor hook of add/delete/iterate node for avail_list, in this way, we
can use our own spin_lock to protect the swap_avail_heads list to add, delete
and iterate node.
Due to enable_swap_info function to call vendor hook of add_to_avail_list,
need first init swap_avail_heads, so also add vendor hook of
swap_avail_heads_init.
Due to the vendor hook of __cgroup_throttle_swaprate need to call
blkcg_schedule_throttle function, so export it also.
Bug: 225795494
Change-Id: I03107cbda6310fa7ae85e41b8cf1fa8225cafe78
Signed-off-by: Lincheng Yang <lincheng.yang@transsion.com>
Suggested-by: Bing Han <bing.han@transsion.com>
[ Upstream commit e3ff8887e7db757360f97634e0d6f4b8e27a8c46 ]
If the policy defines pd_online_fn(), it should be called after
pd_init_fn(), like blkg_create().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230103112833.2013432-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmKhrZIACgkQONu9yGCS
aT4bxhAAsahNlwa6uWf6brIeZkHy62w0LrZAEr6+TvO2CHLWwhcKIol5ZjdaJD5y
KX7A839Vcdo5iAk0eNUV2MTigp7YK0f7XH9y/u/L3yNLc9YA4isA9PQhnnPc4R7N
mgkmGT7Oz7BbQydyDiLvSwtXJDxBMOzCDTF3/4/42PsdmRmPzLBxzoTpH8wcY4vG
jwGyiyUjUVWAF99uHo0O/Yp8sw8UvudpOX+lbKed76V+fXsbH0PYk1yMMJfWhZ60
TrFh1dmZY7j2bW0+F7rkVPXVGeQGyOlLSUVSFWlugJ8qvxVNpAItjcBUXZ+nChGe
O25/5UiaBHprTIoms05yG1jPZtBbAO2MgLhw6zBCOySBr/e0bligNfJWpjt5D6H3
17+CQ1QeaL9BlzcYr4Ug/y60o2CkfUc/vr2CEQRQBRgj1gjsFWwBI4HVdO982fKC
QClnC55h1wYDsjSJ6Z4l4TKBuEN8rV9D3RfdIaPex5C6JJMAoUNeAojCL+6iyuem
ODSIufKm1I1eHeIS49+tw0Uu4jiAtn9RJfR4+uiV8zftfrDZ1qM/RPuHZTsE9wAl
3jHx6+8mT8NYjxb9Omn4Dp3aOl7Fcx/vPxx9uoj8YjrJtQ3L0EGgCnk0djmMi0b3
sBdKw15ftoJvNNrhQaLiCo+0M3XkcUUBk37ttNuIo4lvqIY23RE=
=piEC
-----END PGP SIGNATURE-----
Merge 5.10.121 into android12-5.10-lts
Changes in 5.10.121
binfmt_flat: do not stop relocating GOT entries prematurely on riscv
parisc/stifb: Implement fb_is_primary_device()
riscv: Initialize thread pointer before calling C functions
riscv: Fix irq_work when SMP is disabled
ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9520 laptop
ALSA: hda/realtek - Fix microphone noise on ASUS TUF B550M-PLUS
ALSA: usb-audio: Cancel pending work at closing a MIDI substream
USB: serial: option: add Quectel BG95 modem
USB: new quirk for Dell Gen 2 devices
usb: dwc3: gadget: Move null pinter check to proper place
usb: core: hcd: Add support for deferring roothub registration
cifs: when extending a file with falloc we should make files not-sparse
xhci: Allow host runtime PM as default for Intel Alder Lake N xHCI
Fonts: Make font size unsigned in font_desc
parisc/stifb: Keep track of hardware path of graphics card
x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails
perf/x86/intel: Fix event constraints for ICL
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
btrfs: add "0x" prefix for unsupported optional features
btrfs: repair super block num_devices automatically
iommu/vt-d: Add RPLS to quirk list to skip TE disabling
drm/virtio: fix NULL pointer dereference in virtio_gpu_conn_get_modes
mwifiex: add mutex lock for call in mwifiex_dfs_chan_sw_work_queue
b43legacy: Fix assigning negative value to unsigned variable
b43: Fix assigning negative value to unsigned variable
ipw2x00: Fix potential NULL dereference in libipw_xmit()
ipv6: fix locking issues with loops over idev->addr_list
fbcon: Consistently protect deferred_takeover with console_lock()
x86/platform/uv: Update TSC sync state for UV5
ACPICA: Avoid cache flush inside virtual machines
drm/komeda: return early if drm_universal_plane_init() fails.
rcu-tasks: Fix race in schedule and flush work
rcu: Make TASKS_RUDE_RCU select IRQ_WORK
sfc: ef10: Fix assigning negative value to unsigned variable
ALSA: jack: Access input_dev under mutex
spi: spi-rspi: Remove setting {src,dst}_{addr,addr_width} based on DMA direction
tools/power turbostat: fix ICX DRAM power numbers
drm/amd/pm: fix double free in si_parse_power_table()
ath9k: fix QCA9561 PA bias level
media: venus: hfi: avoid null dereference in deinit
media: pci: cx23885: Fix the error handling in cx23885_initdev()
media: cx25821: Fix the warning when removing the module
md/bitmap: don't set sb values if can't pass sanity check
mmc: jz4740: Apply DMA engine limits to maximum segment size
drivers: mmc: sdhci_am654: Add the quirk to set TESTCD bit
scsi: megaraid: Fix error check return value of register_chrdev()
scsi: ufs: Use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
scsi: lpfc: Fix resource leak in lpfc_sli4_send_seq_to_ulp()
ath11k: disable spectral scan during spectral deinit
ASoC: Intel: bytcr_rt5640: Add quirk for the HP Pro Tablet 408
drm/plane: Move range check for format_count earlier
drm/amd/pm: fix the compile warning
ath10k: skip ath10k_halt during suspend for driver state RESTARTING
arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall
drm: msm: fix error check return value of irq_of_parse_and_map()
ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL
net/mlx5: fs, delete the FTE when there are no rules attached to it
ASoC: dapm: Don't fold register value changes into notifications
mlxsw: spectrum_dcb: Do not warn about priority changes
mlxsw: Treat LLDP packets as control
drm/amdgpu/ucode: Remove firmware load type check in amdgpu_ucode_free_bo
HID: bigben: fix slab-out-of-bounds Write in bigben_probe
ASoC: tscs454: Add endianness flag in snd_soc_component_driver
net: remove two BUG() from skb_checksum_help()
s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES
perf/amd/ibs: Cascade pmu init functions' return value
spi: stm32-qspi: Fix wait_cmd timeout in APM mode
dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default
ipmi:ssif: Check for NULL msg when handling events and messages
ipmi: Fix pr_fmt to avoid compilation issues
rtlwifi: Use pr_warn instead of WARN_ONCE
media: rga: fix possible memory leak in rga_probe
media: coda: limit frame interval enumeration to supported encoder frame sizes
media: imon: reorganize serialization
media: cec-adap.c: fix is_configuring state
openrisc: start CPU timer early in boot
nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags
ASoC: rt5645: Fix errorenous cleanup order
nbd: Fix hung on disconnect request if socket is closed before
net: phy: micrel: Allow probing without .driver_data
media: exynos4-is: Fix compile warning
ASoC: max98357a: remove dependency on GPIOLIB
ASoC: rt1015p: remove dependency on GPIOLIB
can: mcp251xfd: silence clang's -Wunaligned-access warning
x86/microcode: Add explicit CPU vendor dependency
m68k: atari: Make Atari ROM port I/O write macros return void
rxrpc: Return an error to sendmsg if call failed
rxrpc, afs: Fix selection of abort codes
eth: tg3: silence the GCC 12 array-bounds warning
selftests/bpf: fix btf_dump/btf_dump due to recent clang change
gfs2: use i_lock spin_lock for inode qadata
IB/rdmavt: add missing locks in rvt_ruc_loopback
ARM: dts: ox820: align interrupt controller node name with dtschema
ARM: dts: s5pv210: align DMA channels with dtschema
arm64: dts: qcom: msm8994: Fix BLSP[12]_DMA channels count
PM / devfreq: rk3399_dmc: Disable edev on remove()
crypto: ccree - use fine grained DMA mapping dir
soc: ti: ti_sci_pm_domains: Check for null return of devm_kcalloc
fs: jfs: fix possible NULL pointer dereference in dbFree()
ARM: OMAP1: clock: Fix UART rate reporting algorithm
powerpc/fadump: Fix fadump to work with a different endian capture kernel
fat: add ratelimit to fat*_ent_bread()
pinctrl: renesas: rzn1: Fix possible null-ptr-deref in sh_pfc_map_resources()
ARM: versatile: Add missing of_node_put in dcscb_init
ARM: dts: exynos: add atmel,24c128 fallback to Samsung EEPROM
ARM: hisi: Add missing of_node_put after of_find_compatible_node
PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
powerpc/powernv/vas: Assign real address to rx_fifo in vas_rx_win_attr
powerpc/xics: fix refcount leak in icp_opal_init()
powerpc/powernv: fix missing of_node_put in uv_init()
macintosh/via-pmu: Fix build failure when CONFIG_INPUT is disabled
powerpc/iommu: Add missing of_node_put in iommu_init_early_dart
RDMA/hfi1: Prevent panic when SDMA is disabled
drm: fix EDID struct for old ARM OABI format
dt-bindings: display: sitronix, st7735r: Fix backlight in example
ath11k: acquire ab->base_lock in unassign when finding the peer by addr
ath9k: fix ar9003_get_eepmisc
drm/edid: fix invalid EDID extension block filtering
drm/bridge: adv7511: clean up CEC adapter when probe fails
spi: qcom-qspi: Add minItems to interconnect-names
ASoC: mediatek: Fix error handling in mt8173_max98090_dev_probe
ASoC: mediatek: Fix missing of_node_put in mt2701_wm8960_machine_probe
x86/delay: Fix the wrong asm constraint in delay_loop()
drm/ingenic: Reset pixclock rate when parent clock rate changes
drm/mediatek: Fix mtk_cec_mask()
drm/vc4: hvs: Reset muxes at probe time
drm/vc4: txp: Don't set TXP_VSTART_AT_EOF
drm/vc4: txp: Force alpha to be 0xff if it's disabled
libbpf: Don't error out on CO-RE relos for overriden weak subprogs
bpf: Fix excessive memory allocation in stack_map_alloc()
nl80211: show SSID for P2P_GO interfaces
drm/komeda: Fix an undefined behavior bug in komeda_plane_add()
drm: mali-dp: potential dereference of null pointer
spi: spi-ti-qspi: Fix return value handling of wait_for_completion_timeout
scftorture: Fix distribution of short handler delays
net: dsa: mt7530: 1G can also support 1000BASE-X link mode
NFC: NULL out the dev->rfkill to prevent UAF
efi: Add missing prototype for efi_capsule_setup_info
target: remove an incorrect unmap zeroes data deduction
drbd: fix duplicate array initializer
EDAC/dmc520: Don't print an error for each unconfigured interrupt line
mtd: rawnand: denali: Use managed device resources
HID: hid-led: fix maximum brightness for Dream Cheeky
HID: elan: Fix potential double free in elan_input_configured
drm/bridge: Fix error handling in analogix_dp_probe
sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq
spi: img-spfi: Fix pm_runtime_get_sync() error checking
cpufreq: Fix possible race in cpufreq online error path
ath9k_htc: fix potential out of bounds access with invalid rxstatus->rs_keyix
media: hantro: Empty encoder capture buffers by default
drm/panel: simple: Add missing bus flags for Innolux G070Y2-L01
ALSA: pcm: Check for null pointer of pointer substream before dereferencing it
inotify: show inotify mask flags in proc fdinfo
fsnotify: fix wrong lockdep annotations
of: overlay: do not break notify on NOTIFY_{OK|STOP}
drm/msm/dpu: adjust display_v_end for eDP and DP
scsi: ufs: qcom: Fix ufs_qcom_resume()
scsi: ufs: core: Exclude UECxx from SFR dump list
selftests/resctrl: Fix null pointer dereference on open failed
libbpf: Fix logic for finding matching program for CO-RE relocation
mtd: spi-nor: core: Check written SR value in spi_nor_write_16bit_sr_and_check()
x86/pm: Fix false positive kmemleak report in msr_build_context()
mtd: rawnand: cadence: fix possible null-ptr-deref in cadence_nand_dt_probe()
x86/speculation: Add missing prototype for unpriv_ebpf_notify()
ASoC: rk3328: fix disabling mclk on pclk probe failure
perf tools: Add missing headers needed by util/data.h
drm/msm/disp/dpu1: set vbif hw config to NULL to avoid use after memory free during pm runtime resume
drm/msm/dp: stop event kernel thread when DP unbind
drm/msm/dp: fix error check return value of irq_of_parse_and_map()
drm/msm/dsi: fix error checks and return values for DSI xmit functions
drm/msm/hdmi: check return value after calling platform_get_resource_byname()
drm/msm/hdmi: fix error check return value of irq_of_parse_and_map()
drm/msm: add missing include to msm_drv.c
drm/panel: panel-simple: Fix proper bpc for AM-1280800N3TZQW-T00H
drm/rockchip: vop: fix possible null-ptr-deref in vop_bind()
perf tools: Use Python devtools for version autodetection rather than runtime
virtio_blk: fix the discard_granularity and discard_alignment queue limits
x86: Fix return value of __setup handlers
irqchip/exiu: Fix acknowledgment of edge triggered interrupts
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value
x86/mm: Cleanup the control_va_addr_alignment() __setup handler
arm64: fix types in copy_highpage()
regulator: core: Fix enable_count imbalance with EXCLUSIVE_GET
drm/msm/dp: fix event thread stuck in wait_event after kthread_stop()
drm/msm/mdp5: Return error code in mdp5_pipe_release when deadlock is detected
drm/msm/mdp5: Return error code in mdp5_mixer_release when deadlock is detected
drm/msm: return an error pointer in msm_gem_prime_get_sg_table()
media: uvcvideo: Fix missing check to determine if element is found in list
iomap: iomap_write_failed fix
spi: spi-fsl-qspi: check return value after calling platform_get_resource_byname()
Revert "cpufreq: Fix possible race in cpufreq online error path"
regulator: qcom_smd: Fix up PM8950 regulator configuration
perf/amd/ibs: Use interrupt regs ip for stack unwinding
ath11k: Don't check arvif->is_started before sending management frames
ASoC: fsl: Fix refcount leak in imx_sgtl5000_probe
ASoC: mxs-saif: Fix refcount leak in mxs_saif_probe
regulator: pfuze100: Fix refcount leak in pfuze_parse_regulators_dt
ASoC: samsung: Use dev_err_probe() helper
ASoC: samsung: Fix refcount leak in aries_audio_probe
kselftest/cgroup: fix test_stress.sh to use OUTPUT dir
scripts/faddr2line: Fix overlapping text section failures
media: aspeed: Fix an error handling path in aspeed_video_probe()
media: exynos4-is: Fix PM disable depth imbalance in fimc_is_probe
media: st-delta: Fix PM disable depth imbalance in delta_probe
media: exynos4-is: Change clk_disable to clk_disable_unprepare
media: pvrusb2: fix array-index-out-of-bounds in pvr2_i2c_core_init
media: vsp1: Fix offset calculation for plane cropping
Bluetooth: fix dangling sco_conn and use-after-free in sco_sock_timeout
Bluetooth: Interleave with allowlist scan
Bluetooth: L2CAP: Rudimentary typo fixes
Bluetooth: LL privacy allow RPA
Bluetooth: use inclusive language in HCI role comments
Bluetooth: use inclusive language when filtering devices
Bluetooth: use hdev lock for accept_list and reject_list in conn req
nvme: set dma alignment to dword
m68k: math-emu: Fix dependencies of math emulation support
lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
sctp: read sk->sk_bound_dev_if once in sctp_rcv()
net: hinic: add missing destroy_workqueue in hinic_pf_to_mgmt_init
ASoC: ti: j721e-evm: Fix refcount leak in j721e_soc_probe_*
media: ov7670: remove ov7670_power_off from ov7670_remove
media: staging: media: rkvdec: Make use of the helper function devm_platform_ioremap_resource()
media: rkvdec: h264: Fix dpb_valid implementation
media: rkvdec: h264: Fix bit depth wrap in pps packet
ext4: reject the 'commit' option on ext2 filesystems
drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init
drm: msm: fix possible memory leak in mdp5_crtc_cursor_set()
x86/sev: Annotate stack change in the #VC handler
drm/msm/dpu: handle pm_runtime_get_sync() errors in bind path
drm/i915: Fix CFI violation with show_dynamic_id()
thermal/drivers/bcm2711: Don't clamp temperature at zero
thermal/drivers/broadcom: Fix potential NULL dereference in sr_thermal_probe
thermal/drivers/core: Use a char pointer for the cooling device name
thermal/core: Fix memory leak in __thermal_cooling_device_register()
thermal/drivers/imx_sc_thermal: Fix refcount leak in imx_sc_thermal_probe
ASoC: wm2000: fix missing clk_disable_unprepare() on error in wm2000_anc_transition()
NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
ASoC: max98090: Move check for invalid values before casting in max98090_put_enab_tlv()
net: stmmac: selftests: Use kcalloc() instead of kzalloc()
net: stmmac: fix out-of-bounds access in a selftest
hv_netvsc: Fix potential dereference of NULL pointer
rxrpc: Fix listen() setting the bar too high for the prealloc rings
rxrpc: Don't try to resend the request if we're receiving the reply
rxrpc: Fix overlapping ACK accounting
rxrpc: Don't let ack.previousPacket regress
rxrpc: Fix decision on when to generate an IDLE ACK
net: huawei: hinic: Use devm_kcalloc() instead of devm_kzalloc()
hinic: Avoid some over memory allocation
net/smc: postpone sk_refcnt increment in connect()
arm64: dts: rockchip: Move drive-impedance-ohm to emmc phy on rk3399
memory: samsung: exynos5422-dmc: Avoid some over memory allocation
ARM: dts: suniv: F1C100: fix watchdog compatible
soc: qcom: smp2p: Fix missing of_node_put() in smp2p_parse_ipc
soc: qcom: smsm: Fix missing of_node_put() in smsm_parse_ipc
PCI: cadence: Fix find_first_zero_bit() limit
PCI: rockchip: Fix find_first_zero_bit() limit
PCI: dwc: Fix setting error return on MSI DMA mapping failure
ARM: dts: ci4x10: Adapt to changes in imx6qdl.dtsi regarding fec clocks
soc: qcom: llcc: Add MODULE_DEVICE_TABLE()
KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry
KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault
platform/chrome: cros_ec: fix error handling in cros_ec_register()
ARM: dts: imx6dl-colibri: Fix I2C pinmuxing
platform/chrome: Re-introduce cros_ec_cmd_xfer and use it for ioctls
can: xilinx_can: mark bit timing constants as const
ARM: dts: stm32: Fix PHY post-reset delay on Avenger96
ARM: dts: bcm2835-rpi-zero-w: Fix GPIO line name for Wifi/BT
ARM: dts: bcm2837-rpi-cm3-io3: Fix GPIO line names for SMPS I2C
ARM: dts: bcm2837-rpi-3-b-plus: Fix GPIO line name of power LED
ARM: dts: bcm2835-rpi-b: Fix GPIO line names
misc: ocxl: fix possible double free in ocxl_file_register_afu
crypto: marvell/cesa - ECB does not IV
gpiolib: of: Introduce hook for missing gpio-ranges
pinctrl: bcm2835: implement hook for missing gpio-ranges
arm: mediatek: select arch timer for mt7629
powerpc/fadump: fix PT_LOAD segment for boot memory area
mfd: ipaq-micro: Fix error check return value of platform_get_irq()
scsi: fcoe: Fix Wstringop-overflow warnings in fcoe_wwn_from_mac()
firmware: arm_scmi: Fix list protocols enumeration in the base protocol
nvdimm: Fix firmware activation deadlock scenarios
nvdimm: Allow overwrite in the presence of disabled dimms
pinctrl: mvebu: Fix irq_of_parse_and_map() return value
drivers/base/node.c: fix compaction sysfs file leak
dax: fix cache flush on PMD-mapped pages
drivers/base/memory: fix an unlikely reference counting issue in __add_memory_block()
powerpc/8xx: export 'cpm_setbrg' for modules
pinctrl: renesas: core: Fix possible null-ptr-deref in sh_pfc_map_resources()
powerpc/idle: Fix return value of __setup() handler
powerpc/4xx/cpm: Fix return value of __setup() handler
ASoC: atmel-pdmic: Remove endianness flag on pdmic component
ASoC: atmel-classd: Remove endianness flag on class d component
proc: fix dentry/inode overinstantiating under /proc/${pid}/net
ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
PCI: imx6: Fix PERST# start-up sequence
tty: fix deadlock caused by calling printk() under tty_port->lock
crypto: sun8i-ss - rework handling of IV
crypto: sun8i-ss - handle zero sized sg
crypto: cryptd - Protect per-CPU resource by disabling BH.
Input: sparcspkr - fix refcount leak in bbc_beep_probe
PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
hwrng: omap3-rom - fix using wrong clk_disable() in omap_rom_rng_runtime_resume()
powerpc/64: Only WARN if __pa()/__va() called with bad addresses
powerpc/perf: Fix the threshold compare group constraint for power9
macintosh: via-pmu and via-cuda need RTC_LIB
powerpc/fsl_rio: Fix refcount leak in fsl_rio_setup
mfd: davinci_voicecodec: Fix possible null-ptr-deref davinci_vc_probe()
mailbox: forward the hrtimer if not queued and under a lock
RDMA/hfi1: Prevent use of lock before it is initialized
Input: stmfts - do not leave device disabled in stmfts_input_open
OPP: call of_node_put() on error path in _bandwidth_supported()
f2fs: fix dereference of stale list iterator after loop body
iommu/mediatek: Add list_del in mtk_iommu_remove
i2c: at91: use dma safe buffers
cpufreq: mediatek: add missing platform_driver_unregister() on error in mtk_cpufreq_driver_init
cpufreq: mediatek: Use module_init and add module_exit
cpufreq: mediatek: Unregister platform device on exit
MIPS: Loongson: Use hwmon_device_register_with_groups() to register hwmon
i2c: at91: Initialize dma_buf in at91_twi_xfer()
dmaengine: idxd: Fix the error handling path in idxd_cdev_register()
NFS: Do not report EINTR/ERESTARTSYS as mapping errors
NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS
NFS: Do not report flush errors in nfs_write_end()
NFS: Don't report errors from nfs_pageio_complete() more than once
NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
video: fbdev: clcdfb: Fix refcount leak in clcdfb_of_vram_setup
dmaengine: stm32-mdma: remove GISR1 register
dmaengine: stm32-mdma: rework interrupt handler
dmaengine: stm32-mdma: fix chan initialization in stm32_mdma_irq_handler()
iommu/amd: Increase timeout waiting for GA log enablement
i2c: npcm: Fix timeout calculation
i2c: npcm: Correct register access width
i2c: npcm: Handle spurious interrupts
i2c: rcar: fix PM ref counts in probe error paths
perf c2c: Use stdio interface if slang is not supported
perf jevents: Fix event syntax error caused by ExtSel
f2fs: fix to avoid f2fs_bug_on() in dec_valid_node_count()
f2fs: fix to do sanity check on block address in f2fs_do_zero_range()
f2fs: fix to clear dirty inode in f2fs_evict_inode()
f2fs: fix deadloop in foreground GC
f2fs: don't need inode lock for system hidden quota
f2fs: fix to do sanity check on total_data_blocks
f2fs: fix fallocate to use file_modified to update permissions consistently
f2fs: fix to do sanity check for inline inode
wifi: mac80211: fix use-after-free in chanctx code
iwlwifi: mvm: fix assert 1F04 upon reconfig
fs-writeback: writeback_sb_inodes:Recalculate 'wrote' according skipped pages
efi: Do not import certificates from UEFI Secure Boot for T2 Macs
bfq: Split shared queues on move between cgroups
bfq: Update cgroup information before merging bio
bfq: Track whether bfq_group is still online
ext4: fix use-after-free in ext4_rename_dir_prepare
ext4: fix warning in ext4_handle_inode_extension
ext4: fix bug_on in ext4_writepages
ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state
ext4: fix bug_on in __es_tree_search
ext4: verify dir block before splitting it
ext4: avoid cycles in directory h-tree
ACPI: property: Release subnode properties with data nodes
tracing: Fix potential double free in create_var_ref()
PCI/PM: Fix bridge_d3_blacklist[] Elo i2 overwrite of Gigabyte X299
PCI: qcom: Fix runtime PM imbalance on probe errors
PCI: qcom: Fix unbalanced PHY init on probe errors
mm, compaction: fast_find_migrateblock() should return pfn in the target zone
s390/perf: obtain sie_block from the right address
dlm: fix plock invalid read
dlm: fix missing lkb refcount handling
ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
scsi: dc395x: Fix a missing check on list iterator
scsi: ufs: qcom: Add a readl() to make sure ref_clk gets enabled
drm/amdgpu/cs: make commands with 0 chunks illegal behaviour.
drm/etnaviv: check for reaped mapping in etnaviv_iommu_unmap_gem
drm/nouveau/clk: Fix an incorrect NULL check on list iterator
drm/nouveau/kms/nv50-: atom: fix an incorrect NULL check on list iterator
drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX
drm/i915/dsi: fix VBT send packet port selection for ICL+
md: fix an incorrect NULL check in does_sb_need_changing
md: fix an incorrect NULL check in md_reload_sb
mtd: cfi_cmdset_0002: Move and rename chip_check/chip_ready/chip_good_for_write
mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064N
media: coda: Fix reported H264 profile
media: coda: Add more H264 levels for CODA960
ima: remove the IMA_TEMPLATE Kconfig option
Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug
RDMA/hfi1: Fix potential integer multiplication overflow errors
csky: patch_text: Fixup last cpu should be master
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
irqchip: irq-xtensa-mx: fix initial IRQ affinity
cfg80211: declare MODULE_FIRMWARE for regulatory.db
mac80211: upgrade passive scan to active scan on DFS channels after beacon rx
um: chan_user: Fix winch_tramp() return value
um: Fix out-of-bounds read in LDT setup
kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]
ftrace: Clean up hash direct_functions on register failures
iommu/msm: Fix an incorrect NULL check on list iterator
nodemask.h: fix compilation error with GCC12
hugetlb: fix huge_pmd_unshare address update
xtensa/simdisk: fix proc_read_simdisk()
rtl818x: Prevent using not initialized queues
ASoC: rt5514: Fix event generation for "DSP Voice Wake Up" control
carl9170: tx: fix an incorrect use of list iterator
stm: ltdc: fix two incorrect NULL checks on list iterator
bcache: improve multithreaded bch_btree_check()
bcache: improve multithreaded bch_sectors_dirty_init()
bcache: remove incremental dirty sector counting for bch_sectors_dirty_init()
bcache: avoid journal no-space deadlock by reserving 1 journal bucket
serial: pch: don't overwrite xmit->buf[0] by x_char
tilcdc: tilcdc_external: fix an incorrect NULL check on list iterator
gma500: fix an incorrect NULL check on list iterator
arm64: dts: qcom: ipq8074: fix the sleep clock frequency
phy: qcom-qmp: fix struct clk leak on probe errors
ARM: dts: s5pv210: Remove spi-cs-high on panel in Aries
ARM: pxa: maybe fix gpio lookup tables
SMB3: EBADF/EIO errors in rename/open caused by race condition in smb2_compound_op
docs/conf.py: Cope with removal of language=None in Sphinx 5.0.0
dt-bindings: gpio: altera: correct interrupt-cells
vdpasim: allow to enable a vq repeatedly
blk-iolatency: Fix inflight count imbalances and IO hangs on offline
coresight: core: Fix coresight device probe failure issue
phy: qcom-qmp: fix reset-controller leak on probe errors
net: ipa: fix page free in ipa_endpoint_trans_release()
net: ipa: fix page free in ipa_endpoint_replenish_one()
xfs: set inode size after creating symlink
xfs: sync lazy sb accounting on quiesce of read-only mounts
xfs: fix chown leaking delalloc quota blocks when fssetxattr fails
xfs: fix incorrect root dquot corruption error when switching group/project quota types
xfs: restore shutdown check in mapped write fault path
xfs: force log and push AIL to clear pinned inodes when aborting mount
xfs: consider shutdown in bmapbt cursor delete assert
xfs: assert in xfs_btree_del_cursor should take into account error
kseltest/cgroup: Make test_stress.sh work if run interactively
thermal/core: fix a UAF bug in __thermal_cooling_device_register()
thermal/core: Fix memory leak in the error path
bfq: Avoid merging queues with different parents
bfq: Drop pointless unlock-lock pair
bfq: Remove pointless bfq_init_rq() calls
bfq: Get rid of __bio_blkcg() usage
bfq: Make sure bfqg for which we are queueing requests is online
block: fix bio_clone_blkg_association() to associate with proper blkcg_gq
Revert "random: use static branch for crng_ready()"
RDMA/rxe: Generate a completion for unsupported/invalid opcode
MIPS: IP27: Remove incorrect `cpu_has_fpu' override
MIPS: IP30: Remove incorrect `cpu_has_fpu' override
ext4: only allow test_dummy_encryption when supported
md: bcache: check the return value of kzalloc() in detached_dev_do_request()
Linux 5.10.121
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I52dd11dc43acfa0ebddd2b6e277c823b96b07327
commit 22b106e5355d6e7a9c3b5cb5ed4ef22ae585ea94 upstream.
Commit d92c370a16 ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio->bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Fixes: d92c370a16 ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmFVcUcACgkQONu9yGCS
aT4/Mw//b3IUn6Vy0r8Jc6MsU16U+UY0Rb6o8X6J5V7PXMI2RuHIf6+AXm4CDLPZ
jpsgaPB3nSYUz63+b699kB6IZiUTbij8r0O/Yjy1p2/Z6HoDgSOX8WvU25kTO697
MWxZT25Nj8sZzigPuXw1zy1ioZCdeGlRGXrDAoeZt8OL8TMd78eSLISYNQYv38L6
Sg3TbtumEwjfZe3FeyzPA82Qc1jlsZ2ViKJ+E/BC74TJ9DBS5K+uMUzDwDyJEIaB
MwswdjvQIbK5cN+uux6Ok3v4/6/bIKeouYkpLnQvnNtIrn8hk8FXO6OamU6XwTGl
oI26Hu5mjL2WecHvpQJCcn6h8L0w/dMfQPg2b/m1gJ5l58NJobFS3Uy1bMaGlJic
L1K2ZFPHQd+CR9Lvz/umiXqaBgL2K4QKKi28TrWxMgKatrMeip3Lo8krxNuxm0/Z
VpJIsOajWkgf3n5HuQ/zfFGl+YUcjtBUqxO+WR3ocTLlN3kcG6ZjEMxHPK8VYmIr
Yp4s+WyU7uRlGhSy6UpWI78AHcijx5WKS5n25ZI56VJRi38Qxgb3Q+EZ6vlpJuvh
yTCgvjwi4FzLWXeYRR/RXpwzvwS8t5TKJT355ufjqZaAtQk/vE27deFdQs6B7Hqy
17KvN8UjycbWKUXX/zM1CcU6ikXgj/h+q3+kAe99kldpEphjpMs=
=vyz1
-----END PGP SIGNATURE-----
Merge 5.10.70 into android12-5.10-lts
Changes in 5.10.70
PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response
ocfs2: drop acl cache for directories too
mm: fix uninitialized use in overcommit_policy_handler
usb: gadget: r8a66597: fix a loop in set_feature()
usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
cifs: fix incorrect check for null pointer in header_assemble
xen/x86: fix PV trap handling on secondary processors
usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
USB: cdc-acm: fix minor-number release
Revert "USB: bcma: Add a check for devm_gpiod_get"
binder: make sure fd closes complete
staging: greybus: uart: fix tty use after free
Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
usb: dwc3: core: balance phy init and exit
usb: core: hcd: Add support for deferring roothub registration
USB: serial: mos7840: remove duplicated 0xac24 device ID
USB: serial: option: add Telit LN920 compositions
USB: serial: option: remove duplicate USB device ID
USB: serial: option: add device id for Foxconn T99W265
mcb: fix error handling in mcb_alloc_bus()
erofs: fix up erofs_lookup tracepoint
btrfs: prevent __btrfs_dump_space_info() to underflow its free space
xhci: Set HCD flag to defer primary roothub registration
serial: 8250: 8250_omap: Fix RX_LVL register offset
serial: mvebu-uart: fix driver's tx_empty callback
scsi: sd_zbc: Ensure buffer size is aligned to SECTOR_SIZE
drm/amd/pm: Update intermediate power state for SI
net: hso: fix muxed tty registration
comedi: Fix memory leak in compat_insnlist()
afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation
afs: Fix updating of i_blocks on file/dir extension
platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()
enetc: Fix illegal access when reading affinity_hint
enetc: Fix uninitialized struct dim_sample field usage
bnxt_en: Fix TX timeout when TX ring size is set to the smallest
net: hns3: fix change RSS 'hfunc' ineffective issue
net: hns3: check queue id range before using
net/smc: add missing error check in smc_clc_prfx_set()
net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
net: dsa: don't allocate the slave_mii_bus using devres
net: dsa: realtek: register the MDIO bus under devres
kselftest/arm64: signal: Add SVE to the set of features we can check for
kselftest/arm64: signal: Skip tests if required features are missing
s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
gpio: uniphier: Fix void functions to remove return value
qed: rdma - don't wait for resources under hw error recovery flow
net/mlx4_en: Don't allow aRFS for encapsulated packets
atlantic: Fix issue in the pm resume flow.
scsi: iscsi: Adjust iface sysfs attr detection
scsi: target: Fix the pgr/alua_support_store functions
tty: synclink_gt, drop unneeded forward declarations
tty: synclink_gt: rename a conflicting function name
fpga: machxo2-spi: Return an error on failure
fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
nvme-tcp: fix incorrect h2cdata pdu offset accounting
treewide: Change list_sort to use const pointers
nvme: keep ctrl->namespaces ordered
thermal/core: Potential buffer overflow in thermal_build_list_of_policies()
cifs: fix a sign extension bug
scsi: qla2xxx: Restore initiator in dual mode
scsi: lpfc: Use correct scnprintf() limit
irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
irqchip/gic-v3-its: Fix potential VPE leak on error
md: fix a lock order reversal in md_alloc
x86/asm: Add a missing __iomem annotation in enqcmds()
x86/asm: Fix SETZ size enqcmds() build failure
io_uring: put provided buffer meta data under memcg accounting
blktrace: Fix uaf in blk_trace access after removing by sysfs
net: phylink: Update SFP selected interface on advertising changes
net: macb: fix use after free on rmmod
net: stmmac: allow CSR clock of 300MHz
blk-mq: avoid to iterate over stale request
m68k: Double cast io functions to unsigned long
ipv6: delay fib6_sernum increase in fib6_add
cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
bpf: Add oversize check before call kvcalloc()
xen/balloon: use a kernel thread instead a workqueue
nvme-multipath: fix ANA state updates when a namespace is not present
nvme-rdma: destroy cm id before destroy qp to avoid use after free
sparc32: page align size in arch_dma_alloc
amd/display: downgrade validation failure log level
block: check if a profile is actually registered in blk_integrity_unregister
block: flush the integrity workqueue in blk_integrity_unregister
blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd
compiler.h: Introduce absolute_pointer macro
net: i825xx: Use absolute_pointer for memcpy from fixed memory location
sparc: avoid stringop-overread errors
qnx4: avoid stringop-overread errors
parisc: Use absolute_pointer() to define PAGE0
arm64: Mark __stack_chk_guard as __ro_after_init
alpha: Declare virt_to_phys and virt_to_bus parameter as pointer to volatile
net: 6pack: Fix tx timeout and slot time
spi: Fix tegra20 build with CONFIG_PM=n
EDAC/synopsys: Fix wrong value type assignment for edac_mode
EDAC/dmc520: Assign the proper type to dimm->edac_mode
thermal/drivers/int340x: Do not set a wrong tcc offset on resume
USB: serial: cp210x: fix dropped characters with CP2102
xen/balloon: fix balloon kthread freezing
qnx4: work around gcc false positive warning bug
Linux 5.10.70
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0be3ab08ab5dd724a79c5c5ff8e49c18d2666193
Introduce an rq-qos policy that assigns an I/O priority to requests based
on blk-cgroup configuration settings. This policy has the following
advantages over the ioprio_set() system call:
- This policy is cgroup based so it has all the advantages of cgroups.
- While ioprio_set() does not affect page cache writeback I/O, this rq-qos
controller affects page cache writeback I/O for filesystems that support
assiociating a cgroup with writeback I/O. See also
Documentation/admin-guide/cgroup-v2.rst.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: If51e608ad37ee7a3f57b507bb17900dcfcb263ed
(cherry picked from commit ee9d2a55c960f152b5710078bbe399a4c51eb0a9 git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Before adding more calls in this function, simplify the error path.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
BUG: 187357408
Change-Id: I8568b87d1bebbd3841e42a79b7efe2d0a1bff2bc
(cherry picked from commit f1a7f539c2720906fb10be0af3514b034e1a9fee git://git.kernel.dk/linux-block/ for-5.14/block)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
[ Upstream commit 6c635caef410aa757befbd8857c1eadde5cc22ed ]
On !PREEMPT kernel, we can get below softlockup when doing stress
testing with creating and destroying block cgroup repeatly. The
reason is it may take a long time to acquire the queue's lock in
the loop of blkcg_destroy_blkgs(), or the system can accumulate a
huge number of blkgs in pathological cases. We can add a need_resched()
check on each loop and release locks and do cond_resched() if true
to avoid this issue, since the blkcg_destroy_blkgs() is not called
from atomic contexts.
[ 4757.010308] watchdog: BUG: soft lockup - CPU#11 stuck for 94s!
[ 4757.010698] Call trace:
[ 4757.010700] blkcg_destroy_blkgs+0x68/0x150
[ 4757.010701] cgwb_release_workfn+0x104/0x158
[ 4757.010702] process_one_work+0x1bc/0x3f0
[ 4757.010704] worker_thread+0x164/0x468
[ 4757.010705] kthread+0x108/0x138
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
disk_get_part needs to be paired with a disk_put_part.
Cc: stable@vger.kernel.org
Fixes: ef45fe470e ("blk-cgroup: show global disk stats in root cgroup io.stat")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Similarly to commit 457e490f2b ("blkcg: allocate struct blkcg_gq
outside request queue spinlock"), blkg_create can also trigger
occasional -ENOMEM failures at the radix insertion because any
allocation inside blkg_create has to be non-blocking, making it more
likely to fail. This causes trouble for userspace tools trying to
configure io weights who need to deal with this condition.
This patch reduces the occurrence of -ENOMEMs on this path by preloading
the radix tree element on a GFP_KERNEL context, such that we guarantee
the later non-blocking insertion won't fail.
A similar solution exists in blkcg_init_queue for the same situation.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The test and the explaination of the patch as bellow.
Before test we added more debug code in blkg_async_bio_workfn():
int count = 0
if (bios.head && bios.head->bi_next) {
need_plug = true;
blk_start_plug(&plug);
}
while ((bio = bio_list_pop(&bios))) {
/*io_punt is a sysctl user interface to control the print*/
if(io_punt) {
printk("[%s:%d] bio start,size:%llu,%d count=%d plug?%d\n",
current->comm, current->pid, bio->bi_iter.bi_sector,
(bio->bi_iter.bi_size)>>9, count++, need_plug);
}
submit_bio(bio);
}
if (need_plug)
blk_finish_plug(&plug);
Steps that need to be set to trigger *PUNT* io before testing:
mount -t btrfs -o compress=lzo /dev/sda6 /btrfs
mount -t cgroup2 nodev /cgroup2
mkdir /cgroup2/cg3
echo "+io" > /cgroup2/cgroup.subtree_control
echo "8:0 wbps=1048576000" > /cgroup2/cg3/io.max #1000M/s
echo $$ > /cgroup2/cg3/cgroup.procs
Then use dd command to test btrfs PUNT io in current shell:
dd if=/dev/zero of=/btrfs/file bs=64K count=100000
Test hardware environment as below:
[root@localhost btrfs]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
With above debug code, test command and test environment, I did the
tests under 3 different system loads, which are triggered by stress:
1, Run 64 threads by command "stress -c 64 &"
[53615.975974] [kworker/u66:18:1490] bio start,size:45583056,8 count=0 plug?1
[53615.975980] [kworker/u66:18:1490] bio start,size:45583064,8 count=1 plug?1
[53615.975984] [kworker/u66:18:1490] bio start,size:45583072,8 count=2 plug?1
[53615.975987] [kworker/u66:18:1490] bio start,size:45583080,8 count=3 plug?1
[53615.975990] [kworker/u66:18:1490] bio start,size:45583088,8 count=4 plug?1
[53615.975993] [kworker/u66:18:1490] bio start,size:45583096,8 count=5 plug?1
... ...
[53615.977041] [kworker/u66:18:1490] bio start,size:45585480,8 count=303 plug?1
[53615.977044] [kworker/u66:18:1490] bio start,size:45585488,8 count=304 plug?1
[53615.977047] [kworker/u66:18:1490] bio start,size:45585496,8 count=305 plug?1
[53615.977050] [kworker/u66:18:1490] bio start,size:45585504,8 count=306 plug?1
[53615.977053] [kworker/u66:18:1490] bio start,size:45585512,8 count=307 plug?1
[53615.977056] [kworker/u66:18:1490] bio start,size:45585520,8 count=308 plug?1
[53615.977058] [kworker/u66:18:1490] bio start,size:45585528,8 count=309 plug?1
2, Run 32 threads by command "stress -c 32 &"
[50586.290521] [kworker/u66:6:32351] bio start,size:45806496,8 count=0 plug?1
[50586.290526] [kworker/u66:6:32351] bio start,size:45806504,8 count=1 plug?1
[50586.290529] [kworker/u66:6:32351] bio start,size:45806512,8 count=2 plug?1
[50586.290531] [kworker/u66:6:32351] bio start,size:45806520,8 count=3 plug?1
[50586.290533] [kworker/u66:6:32351] bio start,size:45806528,8 count=4 plug?1
[50586.290535] [kworker/u66:6:32351] bio start,size:45806536,8 count=5 plug?1
... ...
[50586.299640] [kworker/u66:5:32350] bio start,size:45808576,8 count=252 plug?1
[50586.299643] [kworker/u66:5:32350] bio start,size:45808584,8 count=253 plug?1
[50586.299646] [kworker/u66:5:32350] bio start,size:45808592,8 count=254 plug?1
[50586.299649] [kworker/u66:5:32350] bio start,size:45808600,8 count=255 plug?1
[50586.299652] [kworker/u66:5:32350] bio start,size:45808608,8 count=256 plug?1
[50586.299663] [kworker/u66:5:32350] bio start,size:45808616,8 count=257 plug?1
[50586.299665] [kworker/u66:5:32350] bio start,size:45808624,8 count=258 plug?1
[50586.299668] [kworker/u66:5:32350] bio start,size:45808632,8 count=259 plug?1
3, Don't run thread by stress
[50861.355246] [kworker/u66:19:32376] bio start,size:13544504,8 count=0 plug?0
[50861.355288] [kworker/u66:19:32376] bio start,size:13544512,8 count=0 plug?0
[50861.355322] [kworker/u66:19:32376] bio start,size:13544520,8 count=0 plug?0
[50861.355353] [kworker/u66:19:32376] bio start,size:13544528,8 count=0 plug?0
[50861.355392] [kworker/u66:19:32376] bio start,size:13544536,8 count=0 plug?0
[50861.355431] [kworker/u66:19:32376] bio start,size:13544544,8 count=0 plug?0
[50861.355468] [kworker/u66:19:32376] bio start,size:13544552,8 count=0 plug?0
[50861.355499] [kworker/u66:19:32376] bio start,size:13544560,8 count=0 plug?0
[50861.355532] [kworker/u66:19:32376] bio start,size:13544568,8 count=0 plug?0
[50861.355575] [kworker/u66:19:32376] bio start,size:13544576,8 count=0 plug?0
[50861.355618] [kworker/u66:19:32376] bio start,size:13544584,8 count=0 plug?0
[50861.355659] [kworker/u66:19:32376] bio start,size:13544592,8 count=0 plug?0
[50861.355740] [kworker/u66:0:32346] bio start,size:13544600,8 count=0 plug?1
[50861.355748] [kworker/u66:0:32346] bio start,size:13544608,8 count=1 plug?1
[50861.355962] [kworker/u66:2:32347] bio start,size:13544616,8 count=0 plug?0
[50861.356272] [kworker/u66:7:31962] bio start,size:13544624,8 count=0 plug?0
[50861.356446] [kworker/u66:7:31962] bio start,size:13544632,8 count=0 plug?0
[50861.356567] [kworker/u66:7:31962] bio start,size:13544640,8 count=0 plug?0
[50861.356707] [kworker/u66:19:32376] bio start,size:13544648,8 count=0 plug?0
[50861.356748] [kworker/u66:15:32355] bio start,size:13544656,8 count=0 plug?0
[50861.356825] [kworker/u66:17:31970] bio start,size:13544664,8 count=0 plug?0
Analysis of above 3 test results with different system load:
>From above test, we can see more and more continuous bios can be plugged
with system load increasing. When run "stress -c 64 &", 310 continuous
bios are plugged; When run "stress -c 32 &", 260 continuous bios are
plugged; When don't run stress, at most only 2 continuous bios are
plugged, in most cases, bio_list only contains one single bio.
How to explain above phenomenon:
We know, in submit_bio(), if the bio is a REQ_CGROUP_PUNT io, it will
queue a work to workqueue blkcg_punt_bio_wq. But when the workqueue is
scheduled, it depends on the system load. When system load is low, the
workqueue will be quickly scheduled, and the bio in bio_list will be
quickly processed in blkg_async_bio_workfn(), so there is less chance
that the same io submit thread can add multiple continuous bios to
bio_list before workqueue is scheduled to run. The analysis aligned with
above test "3".
When system load is high, there is some delay before the workqueue can
be scheduled to run, the higher the system load the greater the delay.
So there is more chance that the same io submit thread can add multiple
continuous bios to bio_list. Then when the workqueue is scheduled to run,
there are more continuous bios in bio_list, which will be processed in
blkg_async_bio_workfn(). The analysis aligned with above test "1" and "2".
According to test, we can get io performance improved with the patch,
especially when system load is higher. Another optimazition is to use
the plug only when bio_list contains at least 2 bios.
Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Curently, iocost syncs the delay duration to the outstanding debt amount,
which seemed enough to protect the system from anon memory hogs. However,
that was mostly because the delay calcuation was using hweight_inuse which
quickly converges towards zero under debt for delay duration calculation,
often pusnishing debtors overly harshly for longer than deserved.
The previous patch fixed the delay calcuation and now the protection against
anonymous memory hogs isn't enough because the effect of delay is indirect
and non-linear and a huge amount of future debt can accumulate abruptly
while unthrottled.
This patch implements delay hysteresis so that delay is decayed
exponentially over time instead of getting cleared immediately as debt is
paid off. While the overall behavior is similar to the blk-cgroup
implementation used by blk-iolatency, a lot of the details are different and
due to the empirical nature of the mechanism, it's challenging to adapt the
mechanism for one controller without negatively impacting the other.
As the delay is gradually decayed now, there's no point in running it from
its own hrtimer. Periodic updates are now performed from ioc_timer_fn() and
the dedicated hrtimer is removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Normally, blkcg_iolatency_exit() will free related memory in iolatency
when cleanup queue. But if blk_throtl_init() return error and queue init
fail, blkcg_iolatency_exit() will not do that for us. Then it cause
memory leak.
Fixes: d706751215 ("block: introduce blk-iolatency io controller")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In order to improve consistency and usability in cgroup stat accounting,
we would like to support the root cgroup's io.stat.
Since the root cgroup has processes doing io even if the system has no
explicitly created cgroups, we need to be careful to avoid overhead in
that case. For that reason, the rstat algorithms don't handle the root
cgroup, so just turning the file on wouldn't give correct statistics.
To get around this, we simulate flushing the iostat struct by filling it
out directly from global disk stats. The result is a root cgroup io.stat
file consistent with both /proc/diskstats and io.stat.
Note that in order to collect the disk stats, we needed to iterate over
devices. To facilitate that, we had to change the linkage of a disk_type
to external so that it can be used from blk-cgroup.c to iterate over
disks.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Boris Burkov <boris@bur.io>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Previously, the code which printed io.stat only needed access to the
generic rstat flushing code, but since we plan to write some more
specific code for preparing root cgroup stats, we need to manipulate
iostat structs directly. Since declaring static functions ahead does not
seem like common practice in this file, simply move the iostat functions
up. We only plan to use blkg_iostat_set, but it seems better to keep them
all together.
Signed-off-by: Boris Burkov <boris@bur.io>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never set any congested bits in the group writeback instances of it.
And for the simpler bdi-wide case a simple scalar field is all that
that is needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector. Replace it with
a block_device_operations method called submit_bio (which describes much
better what it does). Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a statement that is indented one level too deeply, fix it
by removing a tab.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkcg_bio_issue_check is a giant inline function that does three entirely
different things. Factor out the blk-cgroup related bio initalization
into a new helper, and the open code the sequence in the only caller,
relying on the fact that all the actual functionality is stubbed out for
non-cgroup builds.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
By moving the initial blkg lookup into blkg_tryget_closest we get
a nicely self contained routines that does all the RCU locking.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The root_blkg is only torn down at the very end of removing a queue.
So in the I/O submission path is always has a life reference and we
can just grab another one using blkg_get instead of doing a tryget
and parent walk that won't lead anywhere.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No good reason to keep these two functions split.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull in block-5.7 fixes for 5.8. Mostly to resolve a conflict with
the blk-iocost changes, but we also need the base of the bdi
use-after-free as well as we build on top of it.
* block-5.7:
nvme: fix possible hang when ns scanning fails during error recovery
nvme-pci: fix "slimmer CQ head update"
bdi: add a ->dev_name field to struct backing_dev_info
bdi: use bdi_dev_name() to get device name
bdi: move bdi_dev_name out of line
vboxsf: don't use the source name in the bdi name
iocost: protect iocg->abs_vdebt with iocg->waitq.lock
block: remove the bd_openers checks in blk_drop_partitions
nvme: prevent double free in nvme_alloc_ns() error handling
null_blk: Cleanup zoned device initialization
null_blk: Fix zoned command handling
block: remove unused header
blk-iocost: Fix error on iocost_ioc_vrate_adj
bdev: Reduce time holding bd_mutex in sync in blkdev_close()
buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use the common interface bdi_dev_name() to get device name.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Add missing <linux/backing-dev.h> include BFQ
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The use_delay mechanism was introduced by blk-iolatency to hold memory
allocators accountable for the reclaim and other shared IOs they cause. The
duration of the delay is dynamically balanced between iolatency increasing the
value on each target miss and it auto-decaying as time passes and threads get
delayed on it.
While this works well for iolatency, iocost's control model isn't compatible
with it. There is no repeated "violation" events which can be balanced against
auto-decaying. iocost instead knows how much a given cgroup is over budget and
wants to prevent that cgroup from issuing IOs while over budget. Until now,
iocost has been adding the cost of force-issued IOs. However, this doesn't
reflect the amount which is already over budget and is simply not enough to
counter the auto-decaying allowing anon-memory leaking low priority cgroup to
go over its alloted share of IOs.
As auto-decaying doesn't make much sense for iocost, this patch introduces a
different mode of operation for use_delay - when blkcg_set_delay() are used
insted of blkcg_add/use_delay(), the delay duration is not auto-decayed until it
is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep the
delay duration synchronized to the budget overage amount.
With this change, iocost can effectively police cgroups which generate
significant amount of force-issued IOs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
don't get offlined while there are active cgwbs on them. However, it
ends up making offlining unordered sometimes causing parents to be
offlined before children.
Let's fix this by making child blkcgs pin the parents' online states.
Note that pin/unpin names are chosen over get/put intentionally
because css uses get/put online for something different.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
don't get offlined while there are active cgwbs on them. However, it
ends up making offlining unordered sometimes causing parents to be
offlined before children.
To fix it, we want child blkcgs to pin the parents' online states
turning the refcnt into a more generic online pinning mechanism.
In prepartion,
* blkcg->cgwb_refcnt -> blkcg->online_pin
* blkcg_cgwb_get/put() -> blkcg_pin/unpin_online()
* Take them out of CONFIG_CGROUP_WRITEBACK
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Current make_request based drivers use either blk_alloc_queue_node or
blk_alloc_queue to allocate a queue, and then set up the make_request_fn
function pointer and a few parameters using the blk_queue_make_request
helper. Simplify this by passing the make_request pointer to
blk_alloc_queue, and while at it merge the _node variant into the main
helper by always passing a node_id, and remove the superfluous gfp_mask
parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since blk_drain_queue had already been removed, so this function
is not needed anymore.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkg_rwstat is now only used by bfq-iosched and blk-throtl when on
cgroup1. Let's move it into its own files and gate it behind a config
option.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-cgroup has been using blkg_rwstat to track basic IO stats.
Unfortunately, reading recursive stats scales badly as itinvolves
walking all descendants. On systems with a huge number of cgroups
(dead or alive), this can lead to substantial CPU cost when reading IO
stats.
This patch reimplements basic IO stats using cgroup rstat which uses
more memory but makes recursive stat reading O(# descendants which
have been active since last reading) instead of O(# descendants).
* blk-cgroup core no longer uses sync/async stats. Introduce new stat
enums - BLKG_IOSTAT_{READ|WRITE|DISCARD}.
* Add blkg_iostat[_set] which encapsulates byte and io stats, last
values for propagation delta calculation and u64_stats_sync for
correctness on 32bit archs.
* Update the new percpu stat counters directly and implement
blkcg_rstat_flush() to implement propagation.
* blkg_print_stat() can now bring the stats up to date by calling
cgroup_rstat_flush() and print them instead of directly summing up
all descendants.
* It now allocates 96 bytes per cpu. It used to be 40 bytes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Dan Schatzberg <dschatzberg@fb.com>
Cc: Daniel Xu <dlxu@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkcg_print_stat() iterates blkgs under RCU and doesn't test whether
the blkg is online. This can call into pd_stat_fn() on a pd which is
still being initialized leading to an oops.
The heaviest operation - recursively summing up rwstat counters - is
already done while holding the queue_lock. Expand queue_lock to cover
the other operations and skip the blkg if it isn't online yet. The
online state is protected by both blkcg and queue locks, so this
guarantees that only online blkgs are processed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Roman Gushchin <guro@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Fixes: 903d23f0a3 ("blk-cgroup: allow controllers to output their own stats")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkcg_activate_policy() has the following bugs.
* cf09a8ee19 ("blkcg: pass @q and @blkcg into
blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
blkcg_activate_policy() ends up using pd's allocated for the root
blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
can be passed in pd's which are allocated for the root blkcg.
For blk-iocost, this means that ->pd_init_fn() can write beyond the
end of the allocated object as it determines the length of the flex
array at the end based on the blkcg's nesting level.
* Each pd is initialized as they get allocated. If alloc fails, the
policy will get freed with pd's initialized on it.
* After the above partial failure, the partial pds are not freed.
This patch fixes all the above issues by
* Restructuring blkcg_activate_policy() so that alloc and init passes
are separate. Init takes place only after all allocs succeeded and
on failure all allocated pds are freed.
* Unifying and fixing the cleanup of the remaining pd_prealloc.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: cf09a8ee19 ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since commit 795fe54c2a ("bfq: Add per-device weight"), bfq uses
blkg_conf_prep() and blkg_conf_finish(), which are not exported. So, it
causes linkage error if bfq compiled as a module.
Fixes: 795fe54c2a ("bfq: Add per-device weight")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Separate out blkcg_conf_get_disk() so that it can be used by blkcg
policy interface file input parsers before the policy is actually
enabled. This doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For policies which can do enough initialization from ->cpd_alloc_fn(),
make ->cpd_init_fn() optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of @node, pass in @q and @blkcg so that the alloc function has
more context. This doesn't cause any behavior change and will be used
by io.weight implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, ->pd_stat() is called only when moduleparam
blkcg_debug_stats is set which prevents it from printing non-debug
policy-specific statistics. Let's move debug testing down so that
->pd_stat() can print non-debug stat too. This patch doesn't cause
any visible behavior change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When a shared kthread needs to issue a bio for a cgroup, doing so
synchronously can lead to priority inversions as the kthread can be
trapped waiting for that cgroup. This patch implements
REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
to a dedicated per-blkcg work item to avoid such priority inversions.
This will be used to fix priority inversions in btrfs compression and
should be generally useful as we grow filesystem support for
comprehensive IO control.
Cc: Chris Mason <clm@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
btrfs is going to use css_put() and wbc helpers to improve cgroup
writeback support. Add dummy css_get() definition and export wbc
helpers to prepare for module and !CONFIG_CGROUP builds.
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With the psi stuff in place we can use the memstall flag to indicate
pressure that happens from throttling.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This structure and assorted infrastructure is only used by the bfq I/O
scheduler. Move it there instead of bloating the common code.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When sampling the blkcg counts we don't need atomics or per-cpu
variables. Introduce a new structure just containing plain u64
counters.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Returning a structure generates rather bad code, so switch to passing
by reference. Also don't require the structure to be zeroed and add
to the 0-initialized counters, but actually set the counters to the
calculated value.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>