Commit Graph

1020 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
9100d24dfd This is the 5.10.215 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmYaZdgACgkQONu9yGCS
 aT4oMxAA0pATFAq8RN5f9CmYlMg5HqHgzZ8lJv8P0/reOINhUa+F5sJb1n+x+Ch4
 WQbmiFeZRzfsKZ2qKhIdNR0Lg+9JOr/DtYXdSBZ6InfSWrTAIrQ9fjl5Warkmcgg
 O4WbgF5BVgU3vGFATgxLvnUZwhR1D7WK93oMDunzrT7+OqyncU3f1Uj53ZAu9030
 z18UNqnTxDLYH/CMGwAeRkaZqBev9gZ1HdgQWA27SVLqWQwZq0al81Cmlo+ECVmk
 5dF6V2pid4qfKGJjDDfx1NS0PVnoP68iK4By1SXyoFV9VBiSwp77nUUyDr7YsHsT
 u8GpZHr9jZvSO5/xtKv20NPLejTPCRKc06CbkwpikDRtGOocBL8em0GuVqlf8hMs
 KwDb6ZEzYhXZGPJHbJM+aRD1tq/KHw9X7TrldOszMQPr6lubBtscPbg1FCg3OlcC
 HUrtub0i275x7TH0dJeRTD8TRE9jRmF+tl7KQytEJM3JRrquFjLyhDj+/VJnZkiB
 lzj3FRf4zshzgz4+CAeqXO/8Lu8b3fGYmcW1acCmk7emjDcXUKojPj/Aig6T4l7P
 oCWDY3+w1E6eiyE8BazxY1KUa/41ld0VJnlW5JWGRaDFTJwrk0h6/rvf9qImSckw
 IGx24UezRyp6NS1op3Qm2iwHLr41pFRfKxNm9ppgH9iBPzOhe38=
 =pkLL
 -----END PGP SIGNATURE-----

Merge 5.10.215 into android12-5.10-lts

Changes in 5.10.215
	amdkfd: use calloc instead of kzalloc to avoid integer overflow
	Documentation/hw-vuln: Update spectre doc
	x86/cpu: Support AMD Automatic IBRS
	x86/bugs: Use sysfs_emit()
	timers: Update kernel-doc for various functions
	timers: Use del_timer_sync() even on UP
	timers: Rename del_timer_sync() to timer_delete_sync()
	wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
	media: staging: ipu3-imgu: Set fields before media_entity_pads_init()
	clk: qcom: gcc-sdm845: Add soft dependency on rpmhpd
	smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
	smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
	arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
	drm/vmwgfx: stop using ttm_bo_create v2
	drm/vmwgfx: switch over to the new pin interface v2
	drm/vmwgfx/vmwgfx_cmdbuf_res: Remove unused variable 'ret'
	drm/vmwgfx: Fix some static checker warnings
	drm/vmwgfx: Fix possible null pointer derefence with invalid contexts
	serial: max310x: fix NULL pointer dereference in I2C instantiation
	media: xc4000: Fix atomicity violation in xc4000_get_frequency
	KVM: Always flush async #PF workqueue when vCPU is being destroyed
	sparc64: NMI watchdog: fix return value of __setup handler
	sparc: vDSO: fix return value of __setup handler
	crypto: qat - fix double free during reset
	crypto: qat - resolve race condition during AER recovery
	selftests/mqueue: Set timeout to 180 seconds
	ext4: correct best extent lstart adjustment logic
	block: introduce zone_write_granularity limit
	block: Clear zone limits for a non-zoned stacked queue
	bounds: support non-power-of-two CONFIG_NR_CPUS
	fat: fix uninitialized field in nostale filehandles
	ubifs: Set page uptodate in the correct place
	ubi: Check for too small LEB size in VTBL code
	ubi: correct the calculation of fastmap size
	mtd: rawnand: meson: fix scrambling mode value in command macro
	parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros
	parisc: Fix ip_fast_csum
	parisc: Fix csum_ipv6_magic on 32-bit systems
	parisc: Fix csum_ipv6_magic on 64-bit systems
	parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
	PM: suspend: Set mem_sleep_current during kernel command line setup
	clk: qcom: gcc-ipq6018: fix terminating of frequency table arrays
	clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
	clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
	clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
	powerpc/fsl: Fix mfpmr build errors with newer binutils
	USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
	USB: serial: add device ID for VeriFone adapter
	USB: serial: cp210x: add ID for MGP Instruments PDS100
	USB: serial: option: add MeiG Smart SLM320 product
	USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
	PM: sleep: wakeirq: fix wake irq warning in system suspend
	mmc: tmio: avoid concurrent runs of mmc_request_done()
	fuse: fix root lookup with nonzero generation
	fuse: don't unhash root
	usb: typec: ucsi: Clean up UCSI_CABLE_PROP macros
	printk/console: Split out code that enables default console
	serial: Lock console when calling into driver before registration
	btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
	PCI: Drop pci_device_remove() test of pci_dev->driver
	PCI/PM: Drain runtime-idle callbacks before driver removal
	PCI/ERR: Cache RCEC EA Capability offset in pci_init_capabilities()
	PCI: Cache PCIe Device Capabilities register
	PCI: Work around Intel I210 ROM BAR overlap defect
	PCI/ASPM: Make Intel DG2 L1 acceptable latency unlimited
	PCI/DPC: Quirk PIO log size for certain Intel Root Ports
	PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports
	Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""
	dm-raid: fix lockdep waring in "pers->hot_add_disk"
	mac802154: fix llsec key resources release in mac802154_llsec_key_del
	mm: swap: fix race between free_swap_and_cache() and swapoff()
	mmc: core: Fix switch on gp3 partition
	drm/etnaviv: Restore some id values
	hwmon: (amc6821) add of_match table
	ext4: fix corruption during on-line resize
	nvmem: meson-efuse: fix function pointer type mismatch
	slimbus: core: Remove usage of the deprecated ida_simple_xx() API
	phy: tegra: xusb: Add API to retrieve the port number of phy
	usb: gadget: tegra-xudc: Use dev_err_probe()
	usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic
	speakup: Fix 8bit characters from direct synth
	PCI/ERR: Clear AER status only when we control AER
	PCI/AER: Block runtime suspend when handling errors
	nfs: fix UAF in direct writes
	kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
	PCI: dwc: endpoint: Fix advertised resizable BAR size
	vfio/platform: Disable virqfds on cleanup
	ring-buffer: Fix waking up ring buffer readers
	ring-buffer: Do not set shortest_full when full target is hit
	ring-buffer: Fix resetting of shortest_full
	ring-buffer: Fix full_waiters_pending in poll
	soc: fsl: qbman: Always disable interrupts when taking cgr_lock
	soc: fsl: qbman: Add helper for sanity checking cgr ops
	soc: fsl: qbman: Add CGR update function
	soc: fsl: qbman: Use raw spinlock for cgr_lock
	s390/zcrypt: fix reference counting on zcrypt card objects
	drm/panel: do not return negative error codes from drm_panel_get_modes()
	drm/exynos: do not return negative values from .get_modes()
	drm/imx/ipuv3: do not return negative values from .get_modes()
	drm/vc4: hdmi: do not return negative values from .get_modes()
	memtest: use {READ,WRITE}_ONCE in memory scanning
	nilfs2: fix failure to detect DAT corruption in btree and direct mappings
	nilfs2: prevent kernel bug at submit_bh_wbc()
	cpufreq: dt: always allocate zeroed cpumask
	x86/CPU/AMD: Update the Zenbleed microcode revisions
	net: hns3: tracing: fix hclgevf trace event strings
	wireguard: netlink: check for dangling peer via is_dead instead of empty list
	wireguard: netlink: access device through ctx instead of peer
	ahci: asm1064: correct count of reported ports
	ahci: asm1064: asm1166: don't limit reported ports
	drm/amd/display: Return the correct HDCP error code
	drm/amd/display: Fix noise issue on HDMI AV mute
	dm snapshot: fix lockup in dm_exception_table_exit
	vxge: remove unnecessary cast in kfree()
	x86/stackprotector/32: Make the canary into a regular percpu variable
	x86/pm: Work around false positive kmemleak report in msr_build_context()
	scripts: kernel-doc: Fix syntax error due to undeclared args variable
	comedi: comedi_test: Prevent timers rescheduling during deletion
	cpufreq: brcmstb-avs-cpufreq: fix up "add check for cpufreq_cpu_get's return value"
	netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
	netfilter: nf_tables: disallow anonymous set with timeout flag
	netfilter: nf_tables: reject constant set with timeout
	Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
	xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
	KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
	ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
	USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
	usb: gadget: ncm: Fix handling of zero block length packets
	usb: port: Don't try to peer unused USB ports based on location
	tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
	mei: me: add arrow lake point S DID
	mei: me: add arrow lake point H DID
	vt: fix unicode buffer corruption when deleting characters
	fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
	tee: optee: Fix kernel panic caused by incorrect error handling
	xen/events: close evtchn after mapping cleanup
	printk: Update @console_may_schedule in console_trylock_spinning()
	btrfs: allocate btrfs_ioctl_defrag_range_args on stack
	x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
	x86/bugs: Add asm helpers for executing VERW
	x86/entry_64: Add VERW just before userspace transition
	x86/entry_32: Add VERW just before userspace transition
	x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
	KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
	KVM/VMX: Move VERW closer to VMentry for MDS mitigation
	x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
	Documentation/hw-vuln: Add documentation for RFDS
	x86/rfds: Mitigate Register File Data Sampling (RFDS)
	KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
	perf/core: Fix reentry problem in perf_output_read_group()
	efivarfs: Request at most 512 bytes for variable names
	powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
	serial: sc16is7xx: convert from _raw_ to _noinc_ regmap functions for FIFO
	mm/memory-failure: fix an incorrect use of tail pages
	mm/migrate: set swap entry values of THP tail pages properly.
	init: open /initrd.image with O_LARGEFILE
	wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
	exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
	hexagon: vmlinux.lds.S: handle attributes section
	mmc: core: Initialize mmc_blk_ioc_data
	mmc: core: Avoid negative index with array access
	net: ll_temac: platform_get_resource replaced by wrong function
	usb: cdc-wdm: close race between read and workqueue
	ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
	scsi: core: Fix unremoved procfs host directory regression
	staging: vc04_services: changen strncpy() to strscpy_pad()
	staging: vc04_services: fix information leak in create_component()
	USB: core: Add hub_get() and hub_put() routines
	usb: dwc2: host: Fix remote wakeup from hibernation
	usb: dwc2: host: Fix hibernation flow
	usb: dwc2: host: Fix ISOC flow in DDMA mode
	usb: dwc2: gadget: LPM flow fix
	usb: udc: remove warning when queue disabled ep
	usb: typec: ucsi: Ack unsupported commands
	usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
	scsi: qla2xxx: Split FCE|EFT trace control
	scsi: qla2xxx: Fix command flush on cable pull
	scsi: qla2xxx: Delay I/O Abort on PCI error
	x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
	PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports
	scsi: lpfc: Correct size for wqe for memset()
	USB: core: Fix deadlock in usb_deauthorize_interface()
	nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
	ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
	tcp: properly terminate timers for kernel sockets
	ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
	bpf: Protect against int overflow for stack access size
	Octeontx2-af: fix pause frame configuration in GMP mode
	dm integrity: fix out-of-range warning
	r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
	x86/cpufeatures: Add new word for scattered features
	Bluetooth: hci_event: set the conn encrypted before conn establishes
	Bluetooth: Fix TOCTOU in HCI debugfs implementation
	netfilter: nf_tables: disallow timeout for anonymous sets
	net/rds: fix possible cp null dereference
	vfio/pci: Disable auto-enable of exclusive INTx IRQ
	vfio/pci: Lock external INTx masking ops
	vfio: Introduce interface to flush virqfd inject workqueue
	vfio/pci: Create persistent INTx handler
	vfio/platform: Create persistent IRQ handlers
	vfio/fsl-mc: Block calling interrupt handler without trigger
	io_uring: ensure '0' is returned on file registration success
	Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
	mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
	x86/srso: Add SRSO mitigation for Hygon processors
	block: add check that partition length needs to be aligned with block size
	netfilter: nf_tables: reject new basechain after table flag update
	netfilter: nf_tables: flush pending destroy work before exit_net release
	netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
	netfilter: validate user input for expected length
	vboxsf: Avoid an spurious warning if load_nls_xxx() fails
	bpf, sockmap: Prevent lock inversion deadlock in map delete elem
	net/sched: act_skbmod: prevent kernel-infoleak
	net: stmmac: fix rx queue priority assignment
	erspan: make sure erspan_base_hdr is present in skb->head
	selftests: reuseaddr_conflict: add missing new line at the end of the output
	ipv6: Fix infinite recursion in fib6_dump_done().
	udp: do not transition UDP GRO fraglist partial checksums to unnecessary
	octeontx2-pf: check negative error code in otx2_open()
	i40e: fix i40e_count_filters() to count only active/new filters
	i40e: fix vf may be used uninitialized in this function warning
	scsi: qla2xxx: Update manufacturer details
	scsi: qla2xxx: Update manufacturer detail
	Revert "usb: phy: generic: Get the vbus supply"
	udp: do not accept non-tunnel GSO skbs landing in a tunnel
	net: ravb: Always process TX descriptor ring
	arm64: dts: qcom: sc7180: Remove clock for bluetooth on Trogdor
	arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
	ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
	ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
	scsi: mylex: Fix sysfs buffer lengths
	ata: sata_mv: Fix PCI device ID table declaration compilation warning
	ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
	driver core: Introduce device_link_wait_removal()
	of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
	x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
	s390/entry: align system call table on 8 bytes
	riscv: Fix spurious errors from __get/put_kernel_nofault
	x86/bugs: Fix the SRSO mitigation on Zen3/4
	x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
	mptcp: don't account accept() of non-MPC client as fallback to TCP
	x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word
	objtool: Add asm version of STACK_FRAME_NON_STANDARD
	wifi: ath9k: fix LNA selection in ath_ant_try_scan()
	VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host()
	panic: Flush kernel log buffer at the end
	arm64: dts: rockchip: fix rk3328 hdmi ports node
	arm64: dts: rockchip: fix rk3399 hdmi ports node
	ionic: set adminq irq affinity
	pstore/zone: Add a null pointer check to the psz_kmsg_read
	tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num()
	btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()
	btrfs: export: handle invalid inode or root reference in btrfs_get_parent()
	btrfs: send: handle path ref underflow in header iterate_inode_ref()
	net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list()
	Bluetooth: btintel: Fix null ptr deref in btintel_read_version
	Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails
	pinctrl: renesas: checker: Limit cfg reg enum checks to provided IDs
	sysv: don't call sb_bread() with pointers_lock held
	scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc()
	isofs: handle CDs with bad root inode but good Joliet root directory
	media: sta2x11: fix irq handler cast
	ext4: add a hint for block bitmap corrupt state in mb_groups
	ext4: forbid commit inconsistent quota data when errors=remount-ro
	drm/amd/display: Fix nanosec stat overflow
	SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int
	Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default"
	libperf evlist: Avoid out-of-bounds access
	block: prevent division by zero in blk_rq_stat_sum()
	RDMA/cm: add timeout to cm_destroy_id wait
	Input: allocate keycode for Display refresh rate toggle
	platform/x86: touchscreen_dmi: Add an extra entry for a variant of the Chuwi Vi8 tablet
	ktest: force $buildonly = 1 for 'make_warnings_file' test type
	ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment
	tools: iio: replace seekdir() in iio_generic_buffer
	usb: typec: tcpci: add generic tcpci fallback compatible
	usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined
	fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
	drivers/nvme: Add quirks for device 126f:2262
	fbmon: prevent division by zero in fb_videomode_from_videomode()
	netfilter: nf_tables: release batch on table validation from abort path
	netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
	netfilter: nf_tables: discard table flag update with pending basechain deletion
	tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc
	virtio: reenable config if freezing device failed
	x86/mm/pat: fix VM_PAT handling in COW mappings
	drm/i915/gt: Reset queue_priority_hint on parking
	Bluetooth: btintel: Fixe build regression
	VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
	kbuild: dummy-tools: adjust to stricter stackprotector check
	scsi: sd: Fix wrong zone_write_granularity value during revalidate
	x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
	x86/head/64: Re-enable stack protection
	Linux 5.10.215

Change-Id: I45a0a9c4a0683ff5ef97315690f1f884f666e1b5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2024-06-01 11:03:55 +00:00
Vlastimil Babka
af47e6a95e mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.  There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d05 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13 12:59:22 +02:00
liwei
0a9731879a ANDROID: mm: Add vendor hooks for recording when kswapd finishing the reclaim job
To monitor the reclaiming ability of kswapd, add vendor hook recording when the kswapd finish the reclaiming job and the reclaim progress.

android_vh_vmscan_kswpad_done(int, unsigned int, unsigned int, unsigned int)

Bug: 301044280

Change-Id: Id6e0a97003f0a156cff4d0996bc38bcd89b1dc69
Signed-off-by: John Hsu <john.hsu@mediatek.com>
Signed-off-by: liwei <liwei1234@oppo.com>
(cherry picked from commit 6c3dd25d2fdeff6fc752ef74e87e22f86ae1d939)
2024-02-02 17:56:46 +00:00
Peifeng Li
61344663df ANDROID: clear memory trylock-bit when page_locked.
Clearing trylock-bit of page shrinked by shrnk_page_list in advance
which avoids page in other scene with the trylock-bit.

The page with trylock-bit will be added ret_pages and handled in
trace_android_vh_handle_failed_page_trylock. In the progress[unlock_page, handled],
the page carried with trylock-bit will cause some error-issues in other
scene, so clear trylock-bit here.
trace_android_vh_page_trylock_get_result will clear trylock-bit and return
if page tyrlock failed in reclaim-process. Here we just want to clear
trylock-bit so that ignore page_trylock_result.

Fixes: 309a6bf81a20 ("ANDROID: vendor_hook: Add hook to not be stuck ro rmap lock in kswapd or direct_reclaim")

Bug: 240003372

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Ifecd308573ef37a51e33856a0b3bb93cd67289ac
2023-04-12 15:05:56 +08:00
Greg Kroah-Hartman
982d7f3eb8 This is the 5.10.157 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmOKKmsACgkQONu9yGCS
 aT73ixAAwyEk1kuY9T0i4JfjPViD9Kg+v64lGLnM88CuGjkxcT4kv2Lg/hURDD+K
 pObBEaOWduKVxqH/4GqpeEpqrw3bxxQJUchw1F5C2ZsLjB5mA4u9U0dqExTPIeY2
 GSLdkBY/3yWBgDlpWsEHRjhzqx16ZuvHyvMGegHLG5+hNbfmfiFBhVpn8knTFaqv
 fXRyC9MAt072thjjuPG6QcWpWAFFTG0PWsEkNWGLw0U07FF+V7O9sWLontHi93sn
 seIEUPbjgGEFND2NqLfiLOLZ9m2fBB3P32L66b9rrZNZ2DPmyrNCD0WSLhlzb1OV
 8yXiVEkDUozkI6W8fzVtUUjH3gYvB9e37zCYPO6WnAl5cwGhCJz1cpQfN7g7hk9H
 iKpetcKf7XFBRmUq2Ftnaq7KPc81dVrQ5mYfrtsT9IYDnWMdF7AcOctN+dKkCS15
 QoiJklSeE28b4PZtdt7Uv7OF2qW6w+tMKSD3PJyiBHB46rcQjuuOy7ifa8VqaXHI
 ZO+mWUjMMUdo3q0lXoy2i5PMNrul41QMsdnrGaZxXU+LfaCVIubpHghSBHFhnFTY
 3r2Fko3ZOsuAOQXX5iCTCstCEev5LH0v74bou355Y0uteueCqpnc/GSEZ8KhP+M0
 kqpcyf3e6KAL7TA7eqQdptpFyDW732IgcbU4bQKUMd038Hb5I4o=
 =1JWA
 -----END PGP SIGNATURE-----

Merge 5.10.157 into android12-5.10-lts

Changes in 5.10.157
	scsi: scsi_transport_sas: Fix error handling in sas_phy_add()
	ata: libata-scsi: simplify __ata_scsi_queuecmd()
	ata: libata-core: do not issue non-internal commands once EH is pending
	bridge: switchdev: Notify about VLAN protocol changes
	bridge: switchdev: Fix memory leaks when changing VLAN protocol
	drm/display: Don't assume dual mode adaptors support i2c sub-addressing
	nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH
	nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro
	iio: ms5611: Simplify IO callback parameters
	iio: pressure: ms5611: fixed value compensation bug
	ceph: do not update snapshot context when there is no new snapshot
	ceph: avoid putting the realm twice when decoding snaps fails
	wifi: mac80211: fix memory free error when registering wiphy fail
	wifi: mac80211_hwsim: fix debugfs attribute ps with rc table support
	riscv: dts: sifive unleashed: Add PWM controlled LEDs
	audit: fix undefined behavior in bit shift for AUDIT_BIT
	wifi: airo: do not assign -1 to unsigned char
	wifi: mac80211: Fix ack frame idr leak when mesh has no route
	spi: stm32: fix stm32_spi_prepare_mbr() that halves spi clk for every run
	selftests/bpf: Add verifier test for release_reference()
	Revert "net: macsec: report real_dev features when HW offloading is enabled"
	platform/x86: touchscreen_dmi: Add info for the RCA Cambio W101 v2 2-in-1
	scsi: ibmvfc: Avoid path failures during live migration
	scsi: scsi_debug: Make the READ CAPACITY response compliant with ZBC
	drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017)
	block, bfq: fix null pointer dereference in bfq_bio_bfqg()
	arm64/syscall: Include asm/ptrace.h in syscall_wrapper header.
	RISC-V: vdso: Do not add missing symbols to version section in linker script
	MIPS: pic32: treat port as signed integer
	xfrm: fix "disable_policy" on ipv4 early demux
	xfrm: replay: Fix ESN wrap around for GSO
	af_key: Fix send_acquire race with pfkey_register
	ARM: dts: am335x-pcm-953: Define fixed regulators in root node
	ASoC: hdac_hda: fix hda pcm buffer overflow issue
	ASoC: sgtl5000: Reset the CHIP_CLK_CTRL reg on remove
	ASoC: soc-pcm: Don't zero TDM masks in __soc_pcm_open()
	scsi: storvsc: Fix handling of srb_status and capacity change events
	regulator: core: fix kobject release warning and memory leak in regulator_register()
	spi: dw-dma: decrease reference count in dw_spi_dma_init_mfld()
	regulator: core: fix UAF in destroy_regulator()
	bus: sunxi-rsb: Support atomic transfers
	tee: optee: fix possible memory leak in optee_register_device()
	ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl
	net: liquidio: simplify if expression
	rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc
	rxrpc: Use refcount_t rather than atomic_t
	rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
	nfc/nci: fix race with opening and closing
	net: pch_gbe: fix potential memleak in pch_gbe_tx_queue()
	9p/fd: fix issue of list_del corruption in p9_fd_cancel()
	netfilter: conntrack: Fix data-races around ct mark
	ARM: mxs: fix memory leak in mxs_machine_init()
	ARM: dts: imx6q-prti6q: Fix ref/tcxo-clock-frequency properties
	net: ethernet: mtk_eth_soc: fix error handling in mtk_open()
	net/mlx4: Check retval of mlx4_bitmap_init
	net/qla3xxx: fix potential memleak in ql3xxx_send()
	net: pch_gbe: fix pci device refcount leak while module exiting
	nfp: fill splittable of devlink_port_attrs correctly
	nfp: add port from netdev validation for EEPROM access
	macsec: Fix invalid error code set
	Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work()
	Drivers: hv: vmbus: fix possible memory leak in vmbus_device_register()
	netfilter: ipset: Limit the maximal range of consecutive elements to add/delete
	netfilter: ipset: regression in ip_set_hash_ip.c
	net/mlx5: Fix FW tracer timestamp calculation
	net/mlx5: Fix handling of entry refcount when command is not issued to FW
	tipc: set con sock in tipc_conn_alloc
	tipc: add an extra conn_get in tipc_conn_alloc
	tipc: check skb_linearize() return value in tipc_disc_rcv()
	xfrm: Fix ignored return value in xfrm6_init()
	sfc: fix potential memleak in __ef100_hard_start_xmit()
	net: sched: allow act_ct to be built without NF_NAT
	NFC: nci: fix memory leak in nci_rx_data_packet()
	regulator: twl6030: re-add TWL6032_SUBCLASS
	bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending()
	dma-buf: fix racing conflict of dma_heap_add()
	netfilter: flowtable_offload: add missing locking
	dccp/tcp: Reset saddr on failure after inet6?_hash_connect().
	ipv4: Fix error return code in fib_table_insert()
	s390/dasd: fix no record found for raw_track_access
	net: arcnet: Fix RESET flag handling
	arcnet: fix potential memory leak in com20020_probe()
	nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION
	nfc: st-nci: fix memory leaks in EVT_TRANSACTION
	net: thunderx: Fix the ACPI memory leak
	s390/crashdump: fix TOD programmable field size
	net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled
	net: enetc: cache accesses to &priv->si->hw
	net: enetc: preserve TX ring priority across reconfiguration
	lib/vdso: use "grep -E" instead of "egrep"
	usb: dwc3: exynos: Fix remove() function
	ext4: fix use-after-free in ext4_ext_shift_extents
	arm64: dts: rockchip: lower rk3399-puma-haikou SD controller clock frequency
	iio: light: apds9960: fix wrong register for gesture gain
	iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails
	init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash
	nios2: add FORCE for vmlinuz.gz
	mmc: sdhci-brcmstb: Re-organize flags
	mmc: sdhci-brcmstb: Enable Clock Gating to save power
	mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI
	usb: cdns3: Add support for DRD CDNSP
	ceph: make ceph_create_session_msg a global symbol
	ceph: make iterate_sessions a global symbol
	ceph: flush mdlog before umounting
	ceph: flush the mdlog before waiting on unsafe reqs
	ceph: fix off by one bugs in unsafe_request_wait()
	ceph: put the requests/sessions when it fails to alloc memory
	ceph: fix possible NULL pointer dereference for req->r_session
	ceph: Use kcalloc for allocating multiple elements
	ceph: fix NULL pointer dereference for req->r_session
	usb: dwc3: gadget: conditionally remove requests
	usb: dwc3: gadget: Return -ESHUTDOWN on ep disable
	usb: dwc3: gadget: Clear ep descriptor last
	nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty
	gcov: clang: fix the buffer overflow issue
	mm: vmscan: fix extreme overreclaim and swap floods
	KVM: x86: nSVM: leave nested mode on vCPU free
	KVM: x86: remove exit_int_info warning in svm_handle_exit
	x86/ioremap: Fix page aligned size calculation in __ioremap_caller()
	binder: avoid potential data leakage when copying txn
	binder: read pre-translated fds from sender buffer
	binder: defer copies of pre-patched txn data
	binder: fix pointer cast warning
	binder: Address corner cases in deferred copy and fixup
	binder: Gracefully handle BINDER_TYPE_FDA objects with num_fds=0
	Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode
	ASoC: Intel: bytcht_es8316: Add quirk for the Nanote UMPC-01
	serial: 8250: 8250_omap: Avoid RS485 RTS glitch on ->set_termios()
	Input: goodix - try resetting the controller when no config is set
	Input: soc_button_array - add use_low_level_irq module parameter
	Input: soc_button_array - add Acer Switch V 10 to dmi_use_low_level_irq[]
	xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
	xen/platform-pci: add missing free_irq() in error path
	platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr()
	platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
	zonefs: fix zone report size in __zonefs_io_error()
	platform/x86: hp-wmi: Ignore Smart Experience App event
	tcp: configurable source port perturb table size
	net: usb: qmi_wwan: add Telit 0x103a composition
	gpu: host1x: Avoid trying to use GART on Tegra20
	dm integrity: flush the journal on suspend
	dm integrity: clear the journal on suspend
	wifi: wilc1000: validate pairwise and authentication suite offsets
	wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_OPER_CHANNEL attribute
	wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_CHANNEL_LIST attribute
	wifi: wilc1000: validate number of channels
	genirq/msi: Shutdown managed interrupts with unsatifiable affinities
	genirq: Always limit the affinity to online CPUs
	irqchip/gic-v3: Always trust the managed affinity provided by the core code
	genirq: Take the proposed affinity at face value if force==true
	btrfs: free btrfs_path before copying root refs to userspace
	btrfs: free btrfs_path before copying fspath to userspace
	btrfs: free btrfs_path before copying subvol info to userspace
	btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs()
	drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN
	drm/amdgpu: always register an MMU notifier for userptr
	drm/i915: fix TLB invalidation for Gen12 video and compute engines
	fuse: lock inode unconditionally in fuse_fallocate()
	Linux 5.10.157

Change-Id: Ie53a7379c392879de240237eb8258857b59564a6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2022-12-04 12:51:21 +00:00
Johannes Weiner
d925dd3e44 mm: vmscan: fix extreme overreclaim and swap floods
commit f53af4285d775cd9a9a146fc438bd0a1bee1838a upstream.

During proactive reclaim, we sometimes observe severe overreclaim, with
several thousand times more pages reclaimed than requested.

This trace was obtained from shrink_lruvec() during such an instance:

    prio:0 anon_cost:1141521 file_cost:7767
    nr_reclaimed:4387406 nr_to_reclaim:1047 (or_factor:4190)
    nr=[7161123 345 578 1111]

While he reclaimer requested 4M, vmscan reclaimed close to 16G, most of it
by swapping.  These requests take over a minute, during which the write()
to memory.reclaim is unkillably stuck inside the kernel.

Digging into the source, this is caused by the proportional reclaim
bailout logic.  This code tries to resolve a fundamental conflict: to
reclaim roughly what was requested, while also aging all LRUs fairly and
in accordance to their size, swappiness, refault rates etc.  The way it
attempts fairness is that once the reclaim goal has been reached, it stops
scanning the LRUs with the smaller remaining scan targets, and adjusts the
remainder of the bigger LRUs according to how much of the smaller LRUs was
scanned.  It then finishes scanning that remainder regardless of the
reclaim goal.

This works fine if priority levels are low and the LRU lists are
comparable in size.  However, in this instance, the cgroup that is
targeted by proactive reclaim has almost no files left - they've already
been squeezed out by proactive reclaim earlier - and the remaining anon
pages are hot.  Anon rotations cause the priority level to drop to 0,
which results in reclaim targeting all of anon (a lot) and all of file
(almost nothing).  By the time reclaim decides to bail, it has scanned
most or all of the file target, and therefor must also scan most or all of
the enormous anon target.  This target is thousands of times larger than
the reclaim goal, thus causing the overreclaim.

The bailout code hasn't changed in years, why is this failing now?  The
most likely explanations are two other recent changes in anon reclaim:

1. Before the series starting with commit 5df741963d ("mm: fix LRU
   balancing effect of new transparent huge pages"), the VM was
   overall relatively reluctant to swap at all, even if swap was
   configured. This means the LRU balancing code didn't come into play
   as often as it does now, and mostly in high pressure situations
   where pronounced swap activity wouldn't be as surprising.

2. For historic reasons, shrink_lruvec() loops on the scan targets of
   all LRU lists except the active anon one, meaning it would bail if
   the only remaining pages to scan were active anon - even if there
   were a lot of them.

   Before the series starting with commit ccc5dc6734 ("mm/vmscan:
   make active/inactive ratio as 1:1 for anon lru"), most anon pages
   would live on the active LRU; the inactive one would contain only a
   handful of preselected reclaim candidates. After the series, anon
   gets aged similarly to file, and the inactive list is the default
   for new anon pages as well, making it often the much bigger list.

   As a result, the VM is now more likely to actually finish large
   anon targets than before.

Change the code such that only one SWAP_CLUSTER_MAX-sized nudge toward the
larger LRU lists is made before bailing out on a met reclaim goal.

This fixes the extreme overreclaim problem.

Fairness is more subtle and harder to evaluate.  No obvious misbehavior
was observed on the test workload, in any case.  Conceptually, fairness
should primarily be a cumulative effect from regular, lower priority
scans.  Once the VM is in trouble and needs to escalate scan targets to
make forward progress, fairness needs to take a backseat.  This is also
acknowledged by the myriad exceptions in get_scan_count().  This patch
makes fairness decrease gradually, as it keeps fairness work static over
increasing priority levels with growing scan targets.  This should make
more sense - although we may have to re-visit the exact values.

Link: https://lkml.kernel.org/r/20220802162811.39216-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-02 17:40:04 +01:00
Peifeng Li
af699fd6a2 ANDROID: vendor_hook: skip trace_android_vh_page_trylock_set when ignore_references is true
Avoid async-reclaim to cause to reclaim-delay when ignore_references is true.

Bug: 240003372
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Iaf50bd4ac53f748da0dac93324c6d94de11e01e9
2022-10-25 20:31:00 +00:00
Minchan Kim
c35cda5280 BACKPORT: mm: don't be stuck to rmap lock on reclaim path
The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be contended
under memory pressure if processes keep working on their vmas(e.g., fork,
mmap, munmap).  It makes reclaim path stuck.  In our real workload traces,
we see kswapd is waiting the lock for 300ms+(worst case, a sec) and it
makes other processes entering direct reclaim, which were also stuck on
the lock.

This patch makes lru aging path try_lock mode like shink_page_list so the
reclaim context will keep working with next lru pages without being stuck.
if it found the rmap lock contended, it rotates the page back to head of
lru in both active/inactive lrus to make them consistent behavior, which
is basic starting point rather than adding more heristic.

Since this patch introduces a new "contended" field as out-param along
with try_lock in-param in rmap_walk_control, it's not immutable any longer
if the try_lock is set so remove const keywords on rmap related functions.
Since rmap walking is already expensive operation, I doubt the const
would help sizable benefit( And we didn't have it until 5.17).

In a heavy app workload in Android, trace shows following statistics.  It
almost removes rmap lock contention from reclaim path.

Martin Liu reported:

Before:

   max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
         1632            0            1631   151.542173        31672    209  page_lock_anon_vma_read
          601            0             601   145.544681        28817    198  rmap_walk_file

After:

   max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
          NaN          NaN              NaN          NaN          NaN    0.0             NaN
            0            0                0     0.127645            1     12  rmap_walk_file

[minchan@kernel.org: add comment, per Matthew]
  Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com
Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: John Dias <joaodias@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Martin Liu <liumartin@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	folio->page

(cherry picked from commit 6d4675e601357834dadd2ba1d803f6484596015c)
Bug: 239681156
Bug: 252333201
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I0c63e0291120c8a1b5f2d83b8a7b210cb56c27a2
Signed-off-by: chenxin <chenxinxin@xiaomi.corp-partner.google.com>
2022-10-11 16:33:36 +00:00
Peifeng Li
f50f24e781 ANDROID: vendor_hooks: Add hooks for lookaround
Add hooks for support lookaround in memory reclamation.

- android_vh_test_clear_look_around_ref
- android_vh_check_page_look_around_ref
- android_vh_look_around_migrate_page
- android_vh_look_around
Bug: 241079328

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I9a606ae71d2f1303df3b02403b30bc8fdc9d06dd
2022-09-13 19:07:41 +08:00
Greg Kroah-Hartman
7b0822a261 Revert "ANDROID: vendor_hooks: tune reclaim scan type for specified mem_cgroup"
This reverts commit e5b4949bfc.

The hook android_vh_tune_memcg_scan_type is not used by any vendor, so
remove it to help with merge issues with future LTS releases.

If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.

Bug: 203756332
Bug: 230450931
Cc: xiaofeng <xiaofeng5@xiaomi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0e32c24d67a9ede087eca5005796512a9451c1e2
2022-08-24 18:50:22 +00:00
Peifeng Li
9ecb2fcca3 ANDROID: avoid huge-page not to clear trylock-bit after shrink_page_list.
Clearing trylock-bit of page shrinked by shrnk_page_list in advance
which avoids huge-page not to clear trylock-bit after shrink_page_list.

Fixes: 1f8f6d59a2 ("ANDROID: vendor_hook: Add hook to not be stuck ro rmap lock in kswapd or direct_reclaim")

Bug: 240003372
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Iac4d60ec3497d9bb7ba1f001a5c08a604daf4f5a
2022-08-24 02:35:43 +00:00
Peifeng Li
fb39cdb9ea ANDROID: export reclaim_pages
export reclaim_pages to support to reclaim pages in drivers-page-list.

Bug: 240003372
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: If92ed66ec9ce9dd66565497e6774d89911652429
2022-08-15 15:51:29 +00:00
Peifeng Li
1f8f6d59a2 ANDROID: vendor_hook: Add hook to not be stuck ro rmap lock in kswapd or direct_reclaim
Add hooks to support trylock in rmaplock when reclaiming in kswapd or
direct_reclaim, in order to avoid wait lock for a long time.

- android_vh_handle_failed_page_trylock
- android_vh_page_trylock_set
- android_vh_page_trylock_clear
- android_vh_page_trylock_get_result
- android_vh_do_page_trylock

Bug: 240003372

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I0f605b35ae41f15b3ca7bc72cd5f003175c318a5
2022-08-14 23:08:07 +08:00
Peifeng Li
e56f8712cf ANDROID: vendor_hooks: protect multi-mapcount pages in kernel
Support two hooks as follows to protect multi-mapcount pages in kernel:

- trace_android_vh_page_should_be_protect
- trace_android_vh_mapped_page_try_sorthead

Bug: 236578020
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I688aceabf17d9de2feac7c3ad7144d307de6ef29
2022-08-03 20:10:45 +00:00
Peifeng Li
3f775b9367 ANDROID: vendor_hooks: account page-mapcount
Support five hooks as follows to account
the amount of multi-mapped pages in kernel:

- android_vh_show_mapcount_pages
- android_vh_do_traversal_lruvec
- android_vh_update_page_mapcount
- android_vh_add_page_to_lrulist
- android_vh_del_page_from_lrulist

Bug: 236578020
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Ia2c7015aab442be7dbb496b8b630b9dff59ab935
2022-08-03 20:10:45 +00:00
Charan Teja Reddy
1002747429 ANDROID: mm: shmem: use reclaim_pages() to recalim pages from a list
Static code analysis tool reported NULL pointer access in
shrink_page_list() as the commit 26aa2d199d6f2 ("mm/migrate: demote
pages during reclaim") expects valid pgdat.

There is already an existing api, reclaim_pages, that tries to reclaim
pages from the list. use it instead of creating custom function.

Bug: 201263305
Fixes: 96f80f628451 ("ANDROID: mm: add reclaim_shmem_address_space() for faster reclaims")
Change-Id: Iaa11feac94c9e8338324ace0276c49d6a0adeb0c
Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com>
2022-07-07 15:49:09 +00:00
Bing Han
6b7243da5e ANDROID: vendor_hook: Add hook in snapshot_refaults()
Provide a vendor hook android_vh_snapshot_refaults to record the
refault statistics of WORKINGSET_RESTORE_ANON;

Bug: 234214858
Signed-off-by: Bing Han <bing.han@transsion.com>
Change-Id: I20eb5ea99bf21fa8ba34b45e87d2ab9e9cdca167
2022-06-30 03:00:23 +00:00
Bing Han
6b04959511 ANDROID: vendor_hook: Add hook in inactive_is_low()
Provide a vendor hook android_vh_inactive_is_low to replace the
calculation process of inactive_ratio.
The alternative calculation algorithm takes into account the
difference between file pages and anonymous pages.

Bug: 234214858
Signed-off-by: Bing Han <bing.han@transsion.com>
Change-Id: I6cf9c47fbc440852cc36e04f49d644146eb2c6af
2022-06-30 03:00:23 +00:00
xiaofeng
e5b4949bfc ANDROID: vendor_hooks: tune reclaim scan type for specified mem_cgroup
Add memcg support for hooks in the reclaim path

Bug: 230450931
Change-Id: Ia3e6949985d915c8640139fbb93800d91e1e46f8
Signed-off-by: xiaofeng <xiaofeng5@xiaomi.com>
2022-04-29 20:11:57 +00:00
Liujie Xie
95380146ce ANDROID: vendor_hooks: Add hook in shrink_node_memcgs
Add vendor hook in shrink_node_memcgs to adjust whether
to skip memory reclamation of memcg.

Bug: 226482420
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I925856353e63c5a821027de4f8476c833e21b982
2022-03-29 19:43:45 +00:00
Peifeng Li
71d560e017 ANDROID: vendor_hooks: export shrink_slab
Export shrink_slab to module for do shrink-memory action.

Bug: 221768451
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I5abe9ad419d64999b714d879c228625a243e90d1
2022-03-28 21:26:10 +00:00
Liujie Xie
f2d0c30576 ANDROID: vendor_hooks: Add hooks for shrink_active_list
Provide a vendor hook to allow page_referenced to be skipped
during shrink_active_list to avoid heavy cpuloading caused by
it.

Bug: 220878851
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ie0e369f8f8739fea59a95470af20ab0e976869d1
2022-03-01 00:02:41 +00:00
Greg Kroah-Hartman
b3e6d6eec6 Revert "ANDROID: vendor_hooks: add hook and OEM data for slab shrink"
This reverts commit bf769b7216.

The hook android_vh_do_shrink_slab is not used by any vendor, so remove
it to help with merge issues with future LTS releases.

If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.

Bug: 203756332
Bug: 188684131
Cc: rongqianfeng <rongqianfeng@vivo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I814e3aad76770140063fad4a81951fc025b2b1dd
2022-01-21 13:11:10 +01:00
Greg Kroah-Hartman
08ed4cb090 Merge 5.10.67 into android12-5.10-lts
Changes in 5.10.67
	rtc: tps65910: Correct driver module alias
	io_uring: limit fixed table size by RLIMIT_NOFILE
	io_uring: place fixed tables under memcg limits
	io_uring: add ->splice_fd_in checks
	io_uring: fail links of cancelled timeouts
	io-wq: fix wakeup race when adding new work
	btrfs: wake up async_delalloc_pages waiters after submit
	btrfs: reset replace target device to allocation state on close
	blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
	blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
	PCI/MSI: Skip masking MSI-X on Xen PV
	powerpc/perf/hv-gpci: Fix counter value parsing
	xen: fix setting of max_pfn in shared_info
	9p/xen: Fix end of loop tests for list_for_each_entry
	ceph: fix dereference of null pointer cf
	selftests/ftrace: Fix requirement check of README file
	tools/thermal/tmon: Add cross compiling support
	clk: socfpga: agilex: fix the parents of the psi_ref_clk
	clk: socfpga: agilex: fix up s2f_user0_clk representation
	clk: socfpga: agilex: add the bypass register for s2f_usr0 clock
	pinctrl: stmfx: Fix hazardous u8[] to unsigned long cast
	pinctrl: ingenic: Fix incorrect pull up/down info
	soc: qcom: aoss: Fix the out of bound usage of cooling_devs
	soc: aspeed: lpc-ctrl: Fix boundary check for mmap
	soc: aspeed: p2a-ctrl: Fix boundary check for mmap
	arm64: mm: Fix TLBI vs ASID rollover
	arm64: head: avoid over-mapping in map_memory
	iio: ltc2983: fix device probe
	wcn36xx: Ensure finish scan is not requested before start scan
	crypto: public_key: fix overflow during implicit conversion
	block: bfq: fix bfq_set_next_ioprio_data()
	power: supply: max17042: handle fails of reading status register
	dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
	crypto: ccp - shutdown SEV firmware on kexec
	VMCI: fix NULL pointer dereference when unmapping queue pair
	media: uvc: don't do DMA on stack
	media: rc-loopback: return number of emitters rather than error
	s390/qdio: fix roll-back after timeout on ESTABLISH ccw
	s390/qdio: cancel the ESTABLISH ccw after timeout
	Revert "dmaengine: imx-sdma: refine to load context only once"
	dmaengine: imx-sdma: remove duplicated sdma_load_context
	libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
	ARM: 9105/1: atags_to_fdt: don't warn about stack size
	f2fs: fix to do sanity check for sb/cp fields correctly
	PCI/portdrv: Enable Bandwidth Notification only if port supports it
	PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported
	PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure
	PCI: xilinx-nwl: Enable the clock through CCF
	PCI: aardvark: Configure PCIe resources from 'ranges' DT property
	PCI: Export pci_pio_to_address() for module use
	PCI: aardvark: Fix checking for PIO status
	PCI: aardvark: Fix masking and unmasking legacy INTx interrupts
	HID: input: do not report stylus battery state as "full"
	f2fs: quota: fix potential deadlock
	pinctrl: remove empty lines in pinctrl subsystem
	pinctrl: armada-37xx: Correct PWM pins definitions
	scsi: bsg: Remove support for SCSI_IOCTL_SEND_COMMAND
	clk: rockchip: drop GRF dependency for rk3328/rk3036 pll types
	IB/hfi1: Adjust pkey entry in index 0
	RDMA/iwcm: Release resources if iw_cm module initialization fails
	docs: Fix infiniband uverbs minor number
	scsi: BusLogic: Use %X for u32 sized integer rather than %lX
	pinctrl: samsung: Fix pinctrl bank pin count
	vfio: Use config not menuconfig for VFIO_NOIOMMU
	scsi: ufs: Fix memory corruption by ufshcd_read_desc_param()
	cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards
	powerpc/stacktrace: Include linux/delay.h
	RDMA/efa: Remove double QP type assignment
	RDMA/mlx5: Delete not-available udata check
	cpuidle: pseries: Mark pseries_idle_proble() as __init
	f2fs: reduce the scope of setting fsck tag when de->name_len is zero
	openrisc: don't printk() unconditionally
	dma-debug: fix debugfs initialization order
	NFSv4/pNFS: Fix a layoutget livelock loop
	NFSv4/pNFS: Always allow update of a zero valued layout barrier
	NFSv4/pnfs: The layout barrier indicate a minimal value for the seqid
	SUNRPC: Fix potential memory corruption
	SUNRPC/xprtrdma: Fix reconnection locking
	SUNRPC query transport's source port
	sunrpc: Fix return value of get_srcport()
	scsi: fdomain: Fix error return code in fdomain_probe()
	pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry()
	powerpc/numa: Consider the max NUMA node for migratable LPAR
	scsi: smartpqi: Fix an error code in pqi_get_raid_map()
	scsi: qedi: Fix error codes in qedi_alloc_global_queues()
	scsi: qedf: Fix error codes in qedf_alloc_global_queues()
	powerpc/config: Renable MTD_PHYSMAP_OF
	iommu/vt-d: Update the virtual command related registers
	HID: i2c-hid: Fix Elan touchpad regression
	clk: imx8m: fix clock tree update of TF-A managed clocks
	KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines
	scsi: ufs: ufs-exynos: Fix static checker warning
	KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live
	platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
	powerpc/smp: Update cpu_core_map on all PowerPc systems
	RDMA/hns: Fix QP's resp incomplete assignment
	fscache: Fix cookie key hashing
	clk: at91: clk-generated: Limit the requested rate to our range
	KVM: PPC: Fix clearing never mapped TCEs in realmode
	soc: mediatek: cmdq: add address shift in jump
	f2fs: fix to account missing .skipped_gc_rwsem
	f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
	f2fs: fix to unmap pages from userspace process in punch_hole()
	f2fs: deallocate compressed pages when error happens
	f2fs: should put a page beyond EOF when preparing a write
	MIPS: Malta: fix alignment of the devicetree buffer
	kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
	userfaultfd: prevent concurrent API initialization
	drm/vc4: hdmi: Set HD_CTL_WHOLSMP and HD_CTL_CHALIGN_SET
	drm/amdgpu: Fix amdgpu_ras_eeprom_init()
	ASoC: atmel: ATMEL drivers don't need HAS_DMA
	media: dib8000: rewrite the init prbs logic
	libbpf: Fix reuse of pinned map on older kernel
	x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable
	crypto: mxs-dcp - Use sg_mapping_iter to copy data
	PCI: Use pci_update_current_state() in pci_enable_device_flags()
	tipc: keep the skb in rcv queue until the whole data is read
	net: phy: Fix data type in DP83822 dp8382x_disable_wol()
	iio: dac: ad5624r: Fix incorrect handling of an optional regulator.
	iavf: do not override the adapter state in the watchdog task
	iavf: fix locking of critical sections
	ARM: dts: qcom: apq8064: correct clock names
	video: fbdev: kyro: fix a DoS bug by restricting user input
	netlink: Deal with ESRCH error in nlmsg_notify()
	Smack: Fix wrong semantics in smk_access_entry()
	drm: avoid blocking in drm_clients_info's rcu section
	drm: serialize drm_file.master with a new spinlock
	drm: protect drm_master pointers in drm_lease.c
	rcu: Fix macro name CONFIG_TASKS_RCU_TRACE
	igc: Check if num of q_vectors is smaller than max before array access
	usb: host: fotg210: fix the endpoint's transactional opportunities calculation
	usb: host: fotg210: fix the actual_length of an iso packet
	usb: gadget: u_ether: fix a potential null pointer dereference
	USB: EHCI: ehci-mv: improve error handling in mv_ehci_enable()
	usb: gadget: composite: Allow bMaxPower=0 if self-powered
	staging: board: Fix uninitialized spinlock when attaching genpd
	tty: serial: jsm: hold port lock when reporting modem line changes
	bus: fsl-mc: fix mmio base address for child DPRCs
	selftests: firmware: Fix ignored return val of asprintf() warn
	drm/amd/display: Fix timer_per_pixel unit error
	media: hantro: vp8: Move noisy WARN_ON to vpu_debug
	media: platform: stm32: unprepare clocks at handling errors in probe
	media: atomisp: Fix runtime PM imbalance in atomisp_pci_probe
	media: atomisp: pci: fix error return code in atomisp_pci_probe()
	nfp: fix return statement in nfp_net_parse_meta()
	ethtool: improve compat ioctl handling
	drm/amdgpu: Fix a printing message
	drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex
	bpf/tests: Fix copy-and-paste error in double word test
	bpf/tests: Do not PASS tests without actually testing the result
	drm/bridge: nwl-dsi: Avoid potential multiplication overflow on 32-bit
	arm64: dts: allwinner: h6: tanix-tx6: Fix regulator node names
	video: fbdev: asiliantfb: Error out if 'pixclock' equals zero
	video: fbdev: kyro: Error out if 'pixclock' equals zero
	video: fbdev: riva: Error out if 'pixclock' equals zero
	ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()
	flow_dissector: Fix out-of-bounds warnings
	s390/jump_label: print real address in a case of a jump label bug
	s390: make PCI mio support a machine flag
	serial: 8250: Define RX trigger levels for OxSemi 950 devices
	xtensa: ISS: don't panic in rs_init
	hvsi: don't panic on tty_register_driver failure
	serial: 8250_pci: make setup_port() parameters explicitly unsigned
	staging: ks7010: Fix the initialization of the 'sleep_status' structure
	samples: bpf: Fix tracex7 error raised on the missing argument
	libbpf: Fix race when pinning maps in parallel
	ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init()
	Bluetooth: skip invalid hci_sync_conn_complete_evt
	workqueue: Fix possible memory leaks in wq_numa_init()
	ARM: dts: stm32: Set {bitclock,frame}-master phandles on DHCOM SoM
	ARM: dts: stm32: Set {bitclock,frame}-master phandles on ST DKx
	ARM: dts: stm32: Update AV96 adv7513 node per dtbs_check
	bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
	ARM: dts: at91: use the right property for shutdown controller
	arm64: tegra: Fix Tegra194 PCIe EP compatible string
	ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output
	ASoC: Intel: update sof_pcm512x quirks
	media: imx258: Rectify mismatch of VTS value
	media: imx258: Limit the max analogue gain to 480
	media: v4l2-dv-timings.c: fix wrong condition in two for-loops
	media: TDA1997x: fix tda1997x_query_dv_timings() return value
	media: tegra-cec: Handle errors of clk_prepare_enable()
	gfs2: Fix glock recursion in freeze_go_xmote_bh
	arm64: dts: qcom: sdm630: Rewrite memory map
	arm64: dts: qcom: sdm630: Fix TLMM node and pinctrl configuration
	serial: 8250_omap: Handle optional overrun-throttle-ms property
	ARM: dts: imx53-ppd: Fix ACHC entry
	arm64: dts: qcom: ipq8074: fix pci node reg property
	arm64: dts: qcom: sdm660: use reg value for memory node
	arm64: dts: qcom: ipq6018: drop '0x' from unit address
	arm64: dts: qcom: sdm630: don't use underscore in node name
	arm64: dts: qcom: msm8994: don't use underscore in node name
	arm64: dts: qcom: msm8996: don't use underscore in node name
	arm64: dts: qcom: sm8250: Fix epss_l3 unit address
	nvmem: qfprom: Fix up qfprom_disable_fuse_blowing() ordering
	net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe()
	drm/msm: mdp4: drop vblank get/put from prepare/complete_commit
	drm/msm/dsi: Fix DSI and DSI PHY regulator config from SDM660
	drm: xlnx: zynqmp_dpsub: Call pm_runtime_get_sync before setting pixel clock
	drm: xlnx: zynqmp: release reset to DP controller before accessing DP registers
	thunderbolt: Fix port linking by checking all adapters
	drm/amd/display: fix missing writeback disablement if plane is removed
	drm/amd/display: fix incorrect CM/TF programming sequence in dwb
	selftests/bpf: Fix xdp_tx.c prog section name
	drm/vmwgfx: fix potential UAF in vmwgfx_surface.c
	Bluetooth: schedule SCO timeouts with delayed_work
	Bluetooth: avoid circular locks in sco_sock_connect
	drm/msm/dp: return correct edid checksum after corrupted edid checksum read
	net/mlx5: Fix variable type to match 64bit
	gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port()
	drm/display: fix possible null-pointer dereference in dcn10_set_clock()
	mac80211: Fix monitor MTU limit so that A-MSDUs get through
	ARM: tegra: acer-a500: Remove bogus USB VBUS regulators
	ARM: tegra: tamonten: Fix UART pad setting
	arm64: tegra: Fix compatible string for Tegra132 CPUs
	arm64: dts: ls1046a: fix eeprom entries
	nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
	nvme: code command_id with a genctr for use-after-free validation
	Bluetooth: Fix handling of LE Enhanced Connection Complete
	opp: Don't print an error if required-opps is missing
	serial: sh-sci: fix break handling for sysrq
	iomap: pass writeback errors to the mapping
	tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
	rpc: fix gss_svc_init cleanup on failure
	selftests/bpf: Fix flaky send_signal test
	hwmon: (pmbus/ibm-cffps) Fix write bits for LED control
	staging: rts5208: Fix get_ms_information() heap buffer size
	net: Fix offloading indirect devices dependency on qdisc order creation
	kselftest/arm64: mte: Fix misleading output when skipping tests
	kselftest/arm64: pac: Fix skipping of tests on systems without PAC
	gfs2: Don't call dlm after protocol is unmounted
	usb: chipidea: host: fix port index underflow and UBSAN complains
	lockd: lockd server-side shouldn't set fl_ops
	drm/exynos: Always initialize mapping in exynos_drm_register_dma()
	rtl8xxxu: Fix the handling of TX A-MPDU aggregation
	rtw88: use read_poll_timeout instead of fixed sleep
	rtw88: wow: build wow function only if CONFIG_PM is on
	rtw88: wow: fix size access error of probe request
	octeontx2-pf: Fix NIX1_RX interface backpressure
	m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch
	btrfs: tree-log: check btrfs_lookup_data_extent return value
	soundwire: intel: fix potential race condition during power down
	ASoC: Intel: Skylake: Fix module configuration for KPB and MIXER
	ASoC: Intel: Skylake: Fix passing loadable flag for module
	of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS
	mmc: sdhci-of-arasan: Modified SD default speed to 19MHz for ZynqMP
	mmc: sdhci-of-arasan: Check return value of non-void funtions
	mmc: rtsx_pci: Fix long reads when clock is prescaled
	selftests/bpf: Enlarge select() timeout for test_maps
	mmc: core: Return correct emmc response in case of ioctl error
	cifs: fix wrong release in sess_alloc_buffer() failed path
	Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set"
	usb: musb: musb_dsps: request_irq() after initializing musb
	usbip: give back URBs for unsent unlink requests during cleanup
	usbip:vhci_hcd USB port can get stuck in the disabled state
	ASoC: rockchip: i2s: Fix regmap_ops hang
	ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B
	drm/amdkfd: Account for SH/SE count when setting up cu masks.
	nfsd: fix crash on LOCKT on reexported NFSv3
	iwlwifi: pcie: free RBs during configure
	iwlwifi: mvm: fix a memory leak in iwl_mvm_mac_ctxt_beacon_changed
	iwlwifi: mvm: avoid static queue number aliasing
	iwlwifi: mvm: fix access to BSS elements
	iwlwifi: fw: correctly limit to monitor dump
	iwlwifi: mvm: Fix scan channel flags settings
	net/mlx5: DR, fix a potential use-after-free bug
	net/mlx5: DR, Enable QP retransmission
	parport: remove non-zero check on count
	selftests/bpf: Fix potential unreleased lock
	wcn36xx: Fix missing frame timestamp for beacon/probe-resp
	ath9k: fix OOB read ar9300_eeprom_restore_internal
	ath9k: fix sleeping in atomic context
	net: fix NULL pointer reference in cipso_v4_doi_free
	fix array-index-out-of-bounds in taprio_change
	net: w5100: check return value after calling platform_get_resource()
	net: hns3: clean up a type mismatch warning
	fs/io_uring Don't use the return value from import_iovec().
	io_uring: remove duplicated io_size from rw
	parisc: fix crash with signals and alloca
	ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
	scsi: BusLogic: Fix missing pr_cont() use
	scsi: qla2xxx: Changes to support kdump kernel
	scsi: qla2xxx: Sync queue idx with queue_pair_map idx
	cpufreq: powernv: Fix init_chip_info initialization in numa=off
	s390/pv: fix the forcing of the swiotlb
	hugetlb: fix hugetlb cgroup refcounting during vma split
	mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
	mm/hugetlb: initialize hugetlb_usage in mm_init
	mm,vmscan: fix divide by zero in get_scan_count
	memcg: enable accounting for pids in nested pid namespaces
	libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
	platform/chrome: cros_ec_proto: Send command again when timeout occurs
	lib/test_stackinit: Fix static initializer test
	net: dsa: lantiq_gswip: fix maximum frame length
	drm/mgag200: Select clock in PLL update functions
	drm/msi/mdp4: populate priv->kms in mdp4_kms_init
	drm/dp_mst: Fix return code on sideband message failure
	drm/panfrost: Make sure MMU context lifetime is not bound to panfrost_priv
	drm/amdgpu: Fix BUG_ON assert
	drm/amd/display: Update number of DCN3 clock states
	drm/amd/display: Update bounding box states (v2)
	drm/panfrost: Simplify lock_region calculation
	drm/panfrost: Use u64 for size in lock_region
	drm/panfrost: Clamp lock region to Bifrost minimum
	fanotify: limit number of event merge attempts
	Linux 5.10.67

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic8df59518265d0cdf724e93e8922cde48fc85ce9
2021-09-30 12:21:03 +02:00
Rik van Riel
388f12dabb mm,vmscan: fix divide by zero in get_scan_count
commit 32d4f4b782bb8f0ceb78c6b5dc46eb577ae25bf7 upstream.

Commit f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to
proportional memory.low reclaim") introduced a divide by zero corner
case when oomd is being used in combination with cgroup memory.low
protection.

When oomd decides to kill a cgroup, it will force the cgroup memory to
be reclaimed after killing the tasks, by writing to the memory.max file
for that cgroup, forcing the remaining page cache and reclaimable slab
to be reclaimed down to zero.

Previously, on cgroups with some memory.low protection that would result
in the memory being reclaimed down to the memory.low limit, or likely
not at all, having the page cache reclaimed asynchronously later.

With f56ce412a59d the oomd write to memory.max tries to reclaim all the
way down to zero, which may race with another reclaimer, to the point of
ending up with the divide by zero below.

This patch implements the obvious fix.

Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com
Fixes: f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18 13:40:36 +02:00
Greg Kroah-Hartman
a6777a7cee Linux 5.10.61
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmEnjxwACgkQ3qZv95d3
 LNxlaw//VC9rmejl9JwdZQaz3NfhNv2UwamLShlliq2aBmy8d+fGTAMXkCW7UK/M
 gS27oweL6UV6wT8BKgK8LoKJ7L1rI1KDAePpxMCSZw8Mrd8jre0FMVgkkQz/2mFQ
 w3aMmVitCX7RIeShoxP/8fmWnGRkugDx7SZ2csoqRP9Txtih7DAamiun2ttTzzra
 4dxFRtyzw1LpJBSv/pMJ3/FusPkrbwKzSv2wYIgWmfhIWjPcvmBc0ufJmGUK7Gxo
 MJGaUWk9RJ3eNSUPUP3pJlKdzeHdOhOQnydiaUA60pWcDoAyQj7qM06af5hVLsCX
 0Z9r97bzWOF+LNuEiNIGbodJ74IgFv0VTgjlRdZfeLC+5yLyIo7fXAmAA2Od0fiH
 04Ak9+n+FTkl5avUufLrEwHljAAOgcbtJX7W4F/XPW+1P1tZxG9H4ZI+uQN7ZyAB
 fXGo3O3p5J14fI/m8Zr4mXDIiq34OPxSHLx89YryubriO8kdNv69+tciLn2m5a5W
 yctpYgVQnRJt44dg5I/aIzCSOW5+FviRY8slAopdXBIAwUZAZZgW9IVGAh2uRiLS
 tCtbFa3cXUV1JoWsgUdae8BDoOp0dm69yeYrP1f5eVCLe+FHa9P/Z3qxpBLhpPEh
 QOTI14tg3wu3vwFxAA2rsFmSInkh231ryjgGq788BnoHHad3DnI=
 =Rtb5
 -----END PGP SIGNATURE-----

Merge 5.10.61 into android12-5.10-lts

Changes in 5.10.61
	ath: Use safer key clearing with key cache entries
	ath9k: Clear key cache explicitly on disabling hardware
	ath: Export ath_hw_keysetmac()
	ath: Modify ath_key_delete() to not need full key entry
	ath9k: Postpone key cache entry deletion for TXQ frames reference it
	mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards
	media: zr364xx: propagate errors from zr364xx_start_readpipe()
	media: zr364xx: fix memory leaks in probe()
	media: drivers/media/usb: fix memory leak in zr364xx_probe
	KVM: x86: Factor out x86 instruction emulation with decoding
	KVM: X86: Fix warning caused by stale emulation context
	USB: core: Avoid WARNings for 0-length descriptor requests
	USB: core: Fix incorrect pipe calculation in do_proc_control()
	dmaengine: xilinx_dma: Fix read-after-free bug when terminating transfers
	dmaengine: usb-dmac: Fix PM reference leak in usb_dmac_probe()
	spi: spi-mux: Add module info needed for autoloading
	net: xfrm: Fix end of loop tests for list_for_each_entry
	ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218
	dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is not yet available
	scsi: pm80xx: Fix TMF task completion race condition
	scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry()
	scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach()
	scsi: core: Avoid printing an error if target_alloc() returns -ENXIO
	scsi: core: Fix capacity set to zero after offlinining device
	drm/amdgpu: fix the doorbell missing when in CGPG issue for renoir.
	qede: fix crash in rmmod qede while automatic debug collection
	ARM: dts: nomadik: Fix up interrupt controller node names
	net: usb: pegasus: Check the return value of get_geristers() and friends;
	net: usb: lan78xx: don't modify phy_device state concurrently
	drm/amd/display: Fix Dynamic bpp issue with 8K30 with Navi 1X
	drm/amd/display: workaround for hard hang on HPD on native DP
	Bluetooth: hidp: use correct wait queue when removing ctrl_wait
	arm64: dts: qcom: c630: fix correct powerdown pin for WSA881x
	arm64: dts: qcom: msm8992-bullhead: Remove PSCI
	iommu: Check if group is NULL before remove device
	cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
	dccp: add do-while-0 stubs for dccp_pr_debug macros
	virtio: Protect vqs list access
	vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
	bus: ti-sysc: Fix error handling for sysc_check_active_timer()
	vhost: Fix the calculation in vhost_overflow()
	vdpa/mlx5: Avoid destroying MR on empty iotlb
	soc / drm: mediatek: Move DDP component defines into mtk-mmsys.h
	drm/mediatek: Fix aal size config
	drm/mediatek: Add AAL output size configuration
	bpf: Clear zext_dst of dead insns
	bnxt: don't lock the tx queue from napi poll
	bnxt: disable napi before canceling DIM
	bnxt: make sure xmit_more + errors does not miss doorbells
	bnxt: count Tx drops
	net: 6pack: fix slab-out-of-bounds in decode_data
	ptp_pch: Restore dependency on PCI
	bnxt_en: Disable aRFS if running on 212 firmware
	bnxt_en: Add missing DMA memory barriers
	vrf: Reset skb conntrack connection on VRF rcv
	virtio-net: support XDP when not more queues
	virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO
	net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
	ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
	sch_cake: fix srchost/dsthost hashing mode
	net: mdio-mux: Don't ignore memory allocation errors
	net: mdio-mux: Handle -EPROBE_DEFER correctly
	ovs: clear skb->tstamp in forwarding path
	iommu/vt-d: Consolidate duplicate cache invaliation code
	iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
	r8152: fix writing USB_BP2_EN
	i40e: Fix ATR queue selection
	iavf: Fix ping is lost after untrusted VF had tried to change MAC
	Revert "flow_offload: action should not be NULL when it is referenced"
	mmc: dw_mmc: Fix hang on data CRC error
	mmc: mmci: stm32: Check when the voltage switch procedure should be done
	mmc: sdhci-msm: Update the software timeout value for sdhc
	clk: imx6q: fix uart earlycon unwork
	clk: qcom: gdsc: Ensure regulator init state matches GDSC state
	ALSA: hda - fix the 'Capture Switch' value change notifications
	tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
	slimbus: messaging: start transaction ids from 1 instead of zero
	slimbus: messaging: check for valid transaction id
	slimbus: ngd: reset dma setup during runtime pm
	ipack: tpci200: fix many double free issues in tpci200_pci_probe
	ipack: tpci200: fix memory leak in the tpci200_register
	ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop
	mmc: sdhci-iproc: Cap min clock frequency on BCM2711
	mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711
	btrfs: prevent rename2 from exchanging a subvol with a directory from different parents
	ALSA: hda/via: Apply runtime PM workaround for ASUS B23E
	s390/pci: fix use after free of zpci_dev
	PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI
	ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8
	ASoC: intel: atom: Fix breakage for PCM buffer address setup
	mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
	fs: warn about impending deprecation of mandatory locks
	io_uring: fix xa_alloc_cycle() error return value check
	io_uring: only assign io_uring_enter() SQPOLL error in actual error case
	Linux 5.10.61

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5b6e2a66b03d1cb01c8310b83dcc2a119c1bd6b3
2021-08-27 20:51:37 +02:00
Johannes Weiner
8132fc2bf4 mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
[ Upstream commit f56ce412a59d7d938b81de8878faef128812482c ]

We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups.  This is unexpected and undesirable as memory.low is
supposed to express non-OOMing memory priorities between cgroups.

The reason for this is proportional memory.low reclaim.  When cgroups
are below their memory.low threshold, reclaim passes them over in the
first round, and then retries if it couldn't find pages anywhere else.
But when cgroups are slightly above their memory.low setting, page scan
force is scaled down and diminished in proportion to the overage, to the
point where it can cause reclaim to fail as well - only in that case we
currently don't retry, and instead trigger OOM.

To fix this, hook proportional reclaim into the same retry logic we have
in place for when cgroups are skipped entirely.  This way if reclaim
fails and some cgroups were scanned with diminished pressure, we'll try
another full-force cycle before giving up and OOMing.

[akpm@linux-foundation.org: coding-style fixes]

Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
Fixes: 9783aa9917 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Leon Yang <lnyng@fb.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>		[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26 08:35:57 -04:00
Liujie Xie
a8385d61f2 ANDROID: Allow vendor module to reclaim a memcg
Export try_to_free_mem_cgroup_pages function to allow vendor modules to reclaim a memory cgroup.

Bug: 192052083

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Iec6ef50f5c71c62d0c9aa6de90e56a143dac61c1
2021-07-12 18:54:56 +00:00
Charan Teja Reddy
daeabfe7fa ANDROID: mm: add reclaim_shmem_address_space() for faster reclaims
Add the functionality that allow users of shmem to reclaim its pages
without going through the kswapd/direct reclaim path. An example usecase
is: Say that device allocates a larger amount of shmem pages and shares
it with hardware. To faster reclaims such pages, drivers can register
the shrinkers and call reclaim_shmem_address_space().

The implementation of this function is mostly borrowed from
reclaim_address_space() implemented for per process reclaim[1].

[1] https://lore.kernel.org/patchwork/cover/378056/

Bug: 187798288
Change-Id: I03d2c3b9610612af977f89ddeabb63b8e9e50918
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-07-08 17:26:58 +00:00
Martin Liu
12902c9996 ANDROID: vendor_hooks: Export direct reclaim trace points
Get direct reclaim info.

Bug: 190795589
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ie66a3c87484a364a918c19b8e044c82f1afd6749
2021-06-24 17:05:05 +00:00
Sudarshan Rajagopalan
2699fa478d ANDROID: mm: vmscan: support equal reclaim for anon and file pages
When performing memory reclaim, support treating anonymous and
file backed pages equally. Swapping anonymous pages out to memory
can be efficient enough to justify treating anonymous and file backed
pages equally.
Create a vendor hook inside of get_scan_count so that equal reclaim of
anon and file pages can be enabled inside of the trace hook based on
certain conditions.

Bug: 175415908
Change-Id: Idac2f1468371549d20dd3e5652c7382dc3d7d9cf
Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
2021-06-09 02:02:44 +00:00
rongqianfeng
bf769b7216 ANDROID: vendor_hooks: add hook and OEM data for slab shrink
Some shrinker add lock in count_objects() may cause lock contention
issues and lead all task stall on slab shrink. Add vendor hook in
do_shrink_slab() for shrinker->count_objects() latency measuring.

Add 3 oem data in shrink_control struct. Two is for
shrinker->count_objects() and shrinker->scan_objects() latency
measuring, other one to store priority, some shrinker know the reclaimer
priority can control the memory reclaim more better.

Bug: 188684131

Change-Id: I80e9d90179bb52a99c54d9a067c6fcee835bb2ad
Signed-off-by: rongqianfeng <rongqianfeng@vivo.com>
2021-06-04 11:15:20 -07:00
zhouhuacai
577f73412f ANDROID: vendor_hooks: tune reclaim inactive ratio
Add hooks to apply oem's memory reclaim optimization

Bug: 186669330
Change-Id: I9a1c713bf3eea2823b2758ee44284a244b1ca7a1
Signed-off-by: zhouhuacai <zhouhuacai@oppo.com>
2021-04-30 00:04:07 +00:00
wudean
396a6adfd3 ANDROID: vendor_hooks: bypass shrink slab
Add hooks for bypass shrink slab.

Bug: 185951972
Change-Id: I343e02ae5cd6d076d525d0e4bfc09ecdfeda1d7b
Signed-off-by: wudean <dean.wu@vivo.com>
2021-04-29 09:57:29 -07:00
xiaofeng
35dafe72dd ANDROID: vendor_hooks: tune reclaim swappiness or scan type
Add hooks for reclaim

Bug: 185438290
Change-Id: Ib9eec302b1df4da7e98c77b94541af28c34a8613
Signed-off-by: xiaofeng <xiaofeng5@xiaomi.com>
2021-04-15 23:08:29 +00:00
Greg Kroah-Hartman
d8c7f0a3cd Merge 5.10.20 into android12-5.10
Changes in 5.10.20
	vmlinux.lds.h: add DWARF v5 sections
	vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
	debugfs: be more robust at handling improper input in debugfs_lookup()
	debugfs: do not attempt to create a new file before the filesystem is initalized
	scsi: libsas: docs: Remove notify_ha_event()
	scsi: qla2xxx: Fix mailbox Ch erroneous error
	kdb: Make memory allocations more robust
	w1: w1_therm: Fix conversion result for negative temperatures
	PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
	PCI: Decline to resize resources if boot config must be preserved
	virt: vbox: Do not use wait_event_interruptible when called from kernel context
	bfq: Avoid false bfq queue merging
	ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
	MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
	vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
	random: fix the RNDRESEEDCRNG ioctl
	ALSA: pcm: Call sync_stop at disconnection
	ALSA: pcm: Assure sync with the pending stop operation at suspend
	ALSA: pcm: Don't call sync_stop if it hasn't been stopped
	drm/i915/gt: One more flush for Baytrail clear residuals
	ath10k: Fix error handling in case of CE pipe init failure
	Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
	Bluetooth: hci_uart: Fix a race for write_work scheduling
	Bluetooth: Fix initializing response id after clearing struct
	arm64: dts: renesas: beacon kit: Fix choppy Bluetooth Audio
	arm64: dts: renesas: beacon: Fix audio-1.8V pin enable
	ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
	ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
	ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
	ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
	ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
	ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
	arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
	arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
	memory: mtk-smi: Fix PM usage counter unbalance in mtk_smi ops
	Bluetooth: hci_qca: Fix memleak in qca_controller_memdump
	staging: vchiq: Fix bulk userdata handling
	staging: vchiq: Fix bulk transfers on 64-bit builds
	arm64: dts: qcom: msm8916-samsung-a5u: Fix iris compatible
	net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
	bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
	bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
	firmware: arm_scmi: Fix call site of scmi_notification_exit
	arm64: dts: allwinner: A64: properly connect USB PHY to port 0
	arm64: dts: allwinner: H6: properly connect USB PHY to port 0
	arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
	arm64: dts: allwinner: H6: Allow up to 150 MHz MMC bus frequency
	arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
	arm64: dts: qcom: msm8916-samsung-a2015: Fix sensors
	cpufreq: brcmstb-avs-cpufreq: Free resources in error path
	cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
	arm64: dts: rockchip: rk3328: Add clock_in_out property to gmac2phy node
	ACPICA: Fix exception code class checks
	usb: gadget: u_audio: Free requests only after callback
	arm64: dts: qcom: sdm845-db845c: Fix reset-pin of ov8856 node
	soc: qcom: socinfo: Fix an off by one in qcom_show_pmic_model()
	soc: ti: pm33xx: Fix some resource leak in the error handling paths of the probe function
	staging: media: atomisp: Fix size_t format specifier in hmm_alloc() debug statemenet
	Bluetooth: drop HCI device reference before return
	Bluetooth: Put HCI device if inquiry procedure interrupts
	memory: ti-aemif: Drop child node when jumping out loop
	ARM: dts: Configure missing thermal interrupt for 4430
	usb: dwc2: Do not update data length if it is 0 on inbound transfers
	usb: dwc2: Abort transaction after errors with unknown reason
	usb: dwc2: Make "trimming xfer length" a debug message
	staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
	x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
	arm64: dts: renesas: beacon: Fix EEPROM compatible value
	can: mcp251xfd: mcp251xfd_probe(): fix errata reference
	ARM: dts: armada388-helios4: assign pinctrl to LEDs
	ARM: dts: armada388-helios4: assign pinctrl to each fan
	arm64: dts: armada-3720-turris-mox: rename u-boot mtd partition to a53-firmware
	opp: Correct debug message in _opp_add_static_v2()
	Bluetooth: btusb: Fix memory leak in btusb_mtk_wmt_recv
	soc: qcom: ocmem: don't return NULL in of_get_ocmem
	arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
	arm64: dts: meson: fix broken wifi node for Khadas VIM3L
	iwlwifi: mvm: set enabled in the PPAG command properly
	ARM: s3c: fix fiq for clang IAS
	optee: simplify i2c access
	staging: wfx: fix possible panic with re-queued frames
	ARM: at91: use proper asm syntax in pm_suspend
	ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
	ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
	ath11k: fix a locking bug in ath11k_mac_op_start()
	soc: aspeed: snoop: Add clock control logic
	iwlwifi: mvm: fix the type we use in the PPAG table validity checks
	iwlwifi: mvm: store PPAG enabled/disabled flag properly
	iwlwifi: mvm: send stored PPAG command instead of local
	iwlwifi: mvm: assign SAR table revision to the command later
	iwlwifi: mvm: don't check if CSA event is running before removing
	bpf_lru_list: Read double-checked variable once without lock
	iwlwifi: pnvm: set the PNVM again if it was already loaded
	iwlwifi: pnvm: increment the pointer before checking the TLV
	ath9k: fix data bus crash when setting nf_override via debugfs
	selftests/bpf: Convert test_xdp_redirect.sh to bash
	ibmvnic: Set to CLOSED state even on error
	bnxt_en: reverse order of TX disable and carrier off
	bnxt_en: Fix devlink info's stored fw.psid version format.
	xen/netback: fix spurious event detection for common event case
	dpaa2-eth: fix memory leak in XDP_REDIRECT
	net: phy: consider that suspend2ram may cut off PHY power
	net/mlx5e: Don't change interrupt moderation params when DIM is enabled
	net/mlx5e: Change interrupt moderation channel params also when channels are closed
	net/mlx5: Fix health error state handling
	net/mlx5e: Replace synchronize_rcu with synchronize_net
	net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
	net/mlx5: Disable devlink reload for multi port slave device
	net/mlx5: Disallow RoCE on multi port slave device
	net/mlx5: Disallow RoCE on lag device
	net/mlx5: Disable devlink reload for lag devices
	net/mlx5e: CT: manage the lifetime of the ct entry object
	net/mlx5e: Check tunnel offload is required before setting SWP
	mac80211: fix potential overflow when multiplying to u32 integers
	libbpf: Ignore non function pointer member in struct_ops
	bpf: Fix an unitialized value in bpf_iter
	bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation
	bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
	selftests: mptcp: fix ACKRX debug message
	tcp: fix SO_RCVLOWAT related hangs under mem pressure
	net: axienet: Handle deferred probe on clock properly
	cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
	b43: N-PHY: Fix the update of coef for the PHY revision >= 3case
	bpf: Clear subreg_def for global function return values
	ibmvnic: add memory barrier to protect long term buffer
	ibmvnic: skip send_request_unmap for timeout reset
	net: dsa: felix: perform teardown in reverse order of setup
	net: dsa: felix: don't deinitialize unused ports
	net: phy: mscc: adding LCPLL reset to VSC8514
	net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
	net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
	net: amd-xgbe: Reset link when the link never comes back
	net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
	net: mvneta: Remove per-cpu queue mapping for Armada 3700
	net: enetc: fix destroyed phylink dereference during unbind
	tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
	tty: implement read_iter
	fbdev: aty: SPARC64 requires FB_ATY_CT
	drm/gma500: Fix error return code in psb_driver_load()
	gma500: clean up error handling in init
	drm/fb-helper: Add missed unlocks in setcmap_legacy()
	drm/panel: mantix: Tweak init sequence
	drm/vc4: hdmi: Take into account the clock doubling flag in atomic_check
	crypto: sun4i-ss - linearize buffers content must be kept
	crypto: sun4i-ss - fix kmap usage
	crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
	hwrng: ingenic - Fix a resource leak in an error handling path
	media: allegro: Fix use after free on error
	kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state()
	drm: rcar-du: Fix PM reference leak in rcar_cmm_enable()
	drm: rcar-du: Fix crash when using LVDS1 clock for CRTC
	drm: rcar-du: Fix the return check of of_parse_phandle and of_find_device_by_node
	drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
	MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
	MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
	drm/virtio: make sure context is created in gem open
	drm/fourcc: fix Amlogic format modifier masks
	media: ipu3-cio2: Build only for x86
	media: i2c: ov5670: Fix PIXEL_RATE minimum value
	media: imx: Unregister csc/scaler only if registered
	media: imx: Fix csc/scaler unregister
	media: mtk-vcodec: fix error return code in vdec_vp9_decode()
	media: camss: missing error code in msm_video_register()
	media: vsp1: Fix an error handling path in the probe function
	media: em28xx: Fix use-after-free in em28xx_alloc_urbs
	media: media/pci: Fix memleak in empress_init
	media: tm6000: Fix memleak in tm6000_start_stream
	media: aspeed: fix error return code in aspeed_video_setup_video()
	ASoC: cs42l56: fix up error handling in probe
	ASoC: qcom: qdsp6: Move frontend AIFs to q6asm-dai
	evm: Fix memleak in init_desc
	crypto: bcm - Rename struct device_private to bcm_device_private
	sched/fair: Avoid stale CPU util_est value for schedutil in task dequeue
	drm/sun4i: tcon: fix inverted DCLK polarity
	media: imx7: csi: Fix regression for parallel cameras on i.MX6UL
	media: imx7: csi: Fix pad link validation
	media: ti-vpe: cal: fix write to unallocated memory
	MIPS: properly stop .eh_frame generation
	MIPS: Compare __SYNC_loongson3_war against 0
	drm/tegra: Fix reference leak when pm_runtime_get_sync() fails
	drm/amdgpu: toggle on DF Cstate after finishing xgmi injection
	bsg: free the request before return error code
	macintosh/adb-iop: Use big-endian autopoll mask
	drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
	drm/amd/display: Fix HDMI deep color output for DCE 6-11.
	media: software_node: Fix refcounts in software_node_get_next_child()
	media: lmedm04: Fix misuse of comma
	media: vidtv: psi: fix missing crc for PMT
	media: atomisp: Fix a buffer overflow in debug code
	media: qm1d1c0042: fix error return code in qm1d1c0042_init()
	media: cx25821: Fix a bug when reallocating some dma memory
	media: mtk-vcodec: fix argument used when DEBUG is defined
	media: pxa_camera: declare variable when DEBUG is defined
	media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
	sched/eas: Don't update misfit status if the task is pinned
	f2fs: compress: fix potential deadlock
	ASoC: qcom: lpass-cpu: Remove bit clock state check
	ASoC: SOF: Intel: hda: cancel D0i3 work during runtime suspend
	perf/arm-cmn: Fix PMU instance naming
	perf/arm-cmn: Move IRQs when migrating context
	mtd: parser: imagetag: fix error codes in bcm963xx_parse_imagetag_partitions()
	crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
	crypto: talitos - Fix ctr(aes) on SEC1
	drm/nouveau: bail out of nouveau_channel_new if channel init fails
	mm: proc: Invalidate TLB after clearing soft-dirty page state
	ata: ahci_brcm: Add back regulators management
	ASoC: cpcap: fix microphone timeslot mask
	ASoC: codecs: add missing max_register in regmap config
	mtd: parsers: afs: Fix freeing the part name memory in failure
	f2fs: fix to avoid inconsistent quota data
	drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
	f2fs: fix a wrong condition in __submit_bio
	ASoC: qcom: Fix typo error in HDMI regmap config callbacks
	KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
	drm/mediatek: Check if fb is null
	Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
	ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A5E
	ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A3E
	locking/lockdep: Avoid unmatched unlock
	ASoC: qcom: lpass: Fix i2s ctl register bit map
	ASoC: rt5682: Fix panic in rt5682_jack_detect_handler happening during system shutdown
	ASoC: SOF: debug: Fix a potential issue on string buffer termination
	btrfs: clarify error returns values in __load_free_space_cache
	btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
	KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
	s390/zcrypt: return EIO when msg retry limit reached
	drm/vc4: hdmi: Move hdmi reset to bind
	drm/vc4: hdmi: Fix register offset with longer CEC messages
	drm/vc4: hdmi: Fix up CEC registers
	drm/vc4: hdmi: Restore cec physical address on reconnect
	drm/vc4: hdmi: Compute the CEC clock divider from the clock rate
	drm/vc4: hdmi: Update the CEC clock divider on HSM rate change
	drm/lima: fix reference leak in lima_pm_busy
	drm/dp_mst: Don't cache EDIDs for physical ports
	hwrng: timeriomem - Fix cooldown period calculation
	crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
	io_uring: fix possible deadlock in io_uring_poll
	nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
	nvmet-tcp: fix potential race of tcp socket closing accept_work
	nvme-multipath: set nr_zones for zoned namespaces
	nvmet: remove extra variable in identify ns
	nvmet: set status to 0 in case for invalid nsid
	ASoC: SOF: sof-pci-dev: add missing Up-Extreme quirk
	ima: Free IMA measurement buffer on error
	ima: Free IMA measurement buffer after kexec syscall
	ASoC: simple-card-utils: Fix device module clock
	fs/jfs: fix potential integer overflow on shift of a int
	jffs2: fix use after free in jffs2_sum_write_data()
	ubifs: Fix memleak in ubifs_init_authentication
	ubifs: replay: Fix high stack usage, again
	ubifs: Fix error return code in alloc_wbufs()
	irqchip/imx: IMX_INTMUX should not default to y, unconditionally
	smp: Process pending softirqs in flush_smp_call_function_from_idle()
	drm/amdgpu/display: remove hdcp_srm sysfs on device removal
	capabilities: Don't allow writing ambiguous v3 file capabilities
	HSI: Fix PM usage counter unbalance in ssi_hw_init
	power: supply: cpcap: Add missing IRQF_ONESHOT to fix regression
	clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
	clk: meson: clk-pll: make "ret" a signed integer
	clk: meson: clk-pll: propagate the error from meson_clk_pll_set_rate()
	selftests/powerpc: Make the test check in eeh-basic.sh posix compliant
	regulator: qcom-rpmh-regulator: add pm8009-1 chip revision
	arm64: dts: qcom: qrb5165-rb5: fix pm8009 regulators
	quota: Fix memory leak when handling corrupted quota file
	i2c: iproc: handle only slave interrupts which are enabled
	i2c: iproc: update slave isr mask (ISR_MASK_SLAVE)
	i2c: iproc: handle master read request
	spi: cadence-quadspi: Abort read if dummy cycles required are too many
	clk: sunxi-ng: h6: Fix CEC clock
	clk: renesas: r8a779a0: Remove non-existent S2 clock
	clk: renesas: r8a779a0: Fix parent of CBFUSA clock
	HID: core: detect and skip invalid inputs to snto32()
	RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
	dmaengine: fsldma: Fix a resource leak in the remove function
	dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
	dmaengine: owl-dma: Fix a resource leak in the remove function
	dmaengine: hsu: disable spurious interrupt
	mfd: bd9571mwv: Use devm_mfd_add_devices()
	power: supply: cpcap-charger: Fix missing power_supply_put()
	power: supply: cpcap-battery: Fix missing power_supply_put()
	power: supply: cpcap-charger: Fix power_supply_put on null battery pointer
	fdt: Properly handle "no-map" field in the memory region
	of/fdt: Make sure no-map does not remove already reserved regions
	RDMA/rtrs: Extend ibtrs_cq_qp_create
	RDMA/rtrs-srv: Release lock before call into close_sess
	RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnect
	RDMA/rtrs-clt: Set mininum limit when create QP
	RDMA/rtrs: Call kobject_put in the failure path
	RDMA/rtrs-srv: Fix missing wr_cqe
	RDMA/rtrs-clt: Refactor the failure cases in alloc_clt
	RDMA/rtrs-srv: Init wr_cnt as 1
	power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
	rtc: s5m: select REGMAP_I2C
	dmaengine: idxd: set DMA channel to be private
	power: supply: fix sbs-charger build, needs REGMAP_I2C
	clocksource/drivers/ixp4xx: Select TIMER_OF when needed
	clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
	spi: imx: Don't print error on -EPROBEDEFER
	RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
	IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
	clk: sunxi-ng: h6: Fix clock divider range on some clocks
	platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT
	platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask
	regulator: axp20x: Fix reference cout leak
	watch_queue: Drop references to /dev/watch_queue
	certs: Fix blacklist flag type confusion
	regulator: s5m8767: Fix reference count leak
	spi: atmel: Put allocated master before return
	regulator: s5m8767: Drop regulators OF node reference
	power: supply: axp20x_usb_power: Init work before enabling IRQs
	power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable
	regulator: core: Avoid debugfs: Directory ... already present! error
	isofs: release buffer head before return
	watchdog: intel-mid_wdt: Postpone IRQ handler registration till SCU is ready
	auxdisplay: ht16k33: Fix refresh rate handling
	objtool: Fix error handling for STD/CLD warnings
	objtool: Fix retpoline detection in asm code
	objtool: Fix ".cold" section suffix check for newer versions of GCC
	scsi: lpfc: Fix ancient double free
	iommu: Switch gather->end to the inclusive end
	IB/umad: Return EIO in case of when device disassociated
	IB/umad: Return EPOLLERR in case of when device disassociated
	KVM: PPC: Make the VMX instruction emulation routines static
	powerpc/47x: Disable 256k page size
	powerpc/time: Enable sched clock for irqtime
	mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function
	mmc: sdhci-sprd: Fix some resource leaks in the remove function
	mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
	mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
	ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
	i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct
	amba: Fix resource leak for drivers without .remove
	iommu: Move iotlb_sync_map out from __iommu_map
	iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
	IB/mlx5: Return appropriate error code instead of ENOMEM
	IB/cm: Avoid a loop when device has 255 ports
	tracepoint: Do not fail unregistering a probe due to memory failure
	rtc: zynqmp: depend on HAS_IOMEM
	perf tools: Fix DSO filtering when not finding a map for a sampled address
	perf vendor events arm64: Fix Ampere eMag event typo
	RDMA/rxe: Fix coding error in rxe_recv.c
	RDMA/rxe: Fix coding error in rxe_rcv_mcast_pkt
	RDMA/rxe: Correct skb on loopback path
	spi: stm32: properly handle 0 byte transfer
	mfd: altera-sysmgr: Fix physical address storing more
	mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
	powerpc/pseries/dlpar: handle ibm, configure-connector delay status
	powerpc/8xx: Fix software emulation interrupt
	clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
	kunit: tool: fix unit test cleanup handling
	kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
	RDMA/hns: Fixed wrong judgments in the goto branch
	RDMA/siw: Fix calculation of tx_valid_cpus size
	RDMA/hns: Fix type of sq_signal_bits
	RDMA/hns: Disable RQ inline by default
	clk: divider: fix initialization with parent_hw
	spi: pxa2xx: Fix the controller numbering for Wildcat Point
	powerpc/uaccess: Avoid might_fault() when user access is enabled
	powerpc/kuap: Restore AMR after replaying soft interrupts
	regulator: qcom-rpmh: fix pm8009 ldo7
	clk: aspeed: Fix APLL calculate formula from ast2600-A2
	selftests/ftrace: Update synthetic event syntax errors
	perf symbols: Use (long) for iterator for bfd symbols
	regulator: bd718x7, bd71828, Fix dvs voltage levels
	spi: dw: Avoid stack content exposure
	spi: Skip zero-length transfers in spi_transfer_one_message()
	printk: avoid prb_first_valid_seq() where possible
	perf symbols: Fix return value when loading PE DSO
	nfsd: register pernet ops last, unregister first
	svcrdma: Hold private mutex while invoking rdma_accept()
	ceph: fix flush_snap logic after putting caps
	RDMA/hns: Fixes missing error code of CMDQ
	RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
	RDMA/rtrs-srv: Fix stack-out-of-bounds
	RDMA/rtrs: Only allow addition of path to an already established session
	RDMA/rtrs-srv: fix memory leak by missing kobject free
	RDMA/rtrs-srv-sysfs: fix missing put_device
	RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
	Input: sur40 - fix an error code in sur40_probe()
	perf record: Fix continue profiling after draining the buffer
	perf intel-pt: Fix missing CYC processing in PSB
	perf intel-pt: Fix premature IPC
	perf intel-pt: Fix IPC with CYC threshold
	perf test: Fix unaligned access in sample parsing test
	Input: elo - fix an error code in elo_connect()
	sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
	sparc: fix led.c driver when PROC_FS is not enabled
	Input: zinitix - fix return type of zinitix_init_touch()
	ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled
	misc: eeprom_93xx46: Fix module alias to enable module autoprobe
	phy: rockchip-emmc: emmc_phy_init() always return 0
	phy: cadence-torrent: Fix error code in cdns_torrent_phy_probe()
	misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
	PCI: rcar: Always allocate MSI addresses in 32bit space
	soundwire: cadence: fix ACK/NAK handling
	pwm: rockchip: Enable APB clock during register access while probing
	pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
	pwm: rockchip: Eliminate potential race condition when probing
	PCI: xilinx-cpm: Fix reference count leak on error path
	VMCI: Use set_page_dirty_lock() when unregistering guest memory
	PCI: Align checking of syscall user config accessors
	mei: hbm: call mei_set_devstate() on hbm stop response
	drm/msm: Fix MSM_INFO_GET_IOVA with carveout
	drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
	drm/msm/mdp5: Fix wait-for-commit for cmd panels
	drm/msm: Fix race of GPU init vs timestamp power management.
	drm/msm: Fix races managing the OOB state for timestamp vs timestamps.
	drm/msm/dp: trigger unplug event in msm_dp_display_disable
	vfio/iommu_type1: Populate full dirty when detach non-pinned group
	vfio/iommu_type1: Fix some sanity checks in detach group
	vfio-pci/zdev: fix possible segmentation fault issue
	ext4: fix potential htree index checksum corruption
	phy: USB_LGM_PHY should depend on X86
	coresight: etm4x: Skip accessing TRCPDCR in save/restore
	nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
	nvmem: core: skip child nodes not matching binding
	soundwire: bus: use sdw_update_no_pm when initializing a device
	soundwire: bus: use sdw_write_no_pm when setting the bus scale registers
	soundwire: export sdw_write/read_no_pm functions
	soundwire: bus: fix confusion on device used by pm_runtime
	misc: fastrpc: fix incorrect usage of dma_map_sgtable
	remoteproc/mediatek: acknowledge watchdog IRQ after handled
	regmap: sdw: use _no_pm functions in regmap_read/write
	ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
	mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL
	device-dax: Fix default return code of range_parse()
	PCI: pci-bridge-emul: Fix array overruns, improve safety
	PCI: cadence: Fix DMA range mapping early return error
	i40e: Fix flow for IPv6 next header (extension header)
	i40e: Add zero-initialization of AQ command structures
	i40e: Fix overwriting flow control settings during driver loading
	i40e: Fix addition of RX filters after enabling FW LLDP agent
	i40e: Fix VFs not created
	Take mmap lock in cacheflush syscall
	nios2: fixed broken sys_clone syscall
	i40e: Fix add TC filter for IPv6
	octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()
	pwm: iqs620a: Fix overflow and optimize calculations
	vfio/type1: Use follow_pte()
	ice: report correct max number of TCs
	ice: Account for port VLAN in VF max packet size calculation
	ice: Fix state bits on LLDP mode switch
	ice: update the number of available RSS queues
	net: stmmac: fix CBS idleslope and sendslope calculation
	net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
	PCI: rockchip: Make 'ep-gpios' DT property optional
	vxlan: move debug check after netdev unregister
	wireguard: device: do not generate ICMP for non-IP packets
	wireguard: kconfig: use arm chacha even with no neon
	ocfs2: fix a use after free on error
	mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
	mm: memcontrol: fix slub memory accounting
	mm/memory.c: fix potential pte_unmap_unlock pte error
	mm/hugetlb: fix potential double free in hugetlb_register_node() error path
	mm/hugetlb: suppress wrong warning info when alloc gigantic page
	mm/compaction: fix misbehaviors of fast_find_migrateblock()
	r8169: fix jumbo packet handling on RTL8168e
	NFSv4: Fixes for nfs4_bitmask_adjust()
	KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
	KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
	arm64: Add missing ISB after invalidating TLB in __primary_switch
	i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
	i2c: exynos5: Preserve high speed master code
	mm,thp,shmem: make khugepaged obey tmpfs mount flags
	mm: fix memory_failure() handling of dax-namespace metadata
	mm/rmap: fix potential pte_unmap on an not mapped pte
	proc: use kvzalloc for our kernel buffer
	csky: Fix a size determination in gpr_get()
	scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
	scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
	block: reopen the device in blkdev_reread_part
	ide/falconide: Fix module unload
	scsi: sd: Fix Opal support
	blk-settings: align max_sectors on "logical_block_size" boundary
	soundwire: intel: fix possible crash when no device is detected
	ACPI: property: Fix fwnode string properties matching
	ACPI: configfs: add missing check after configfs_register_default_group()
	cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
	HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
	HID: wacom: Ignore attempts to overwrite the touch_max value from HID
	Input: raydium_ts_i2c - do not send zero length
	Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
	Input: joydev - prevent potential read overflow in ioctl
	Input: i8042 - add ASUS Zenbook Flip to noselftest list
	media: mceusb: Fix potential out-of-bounds shift
	USB: serial: option: update interface mapping for ZTE P685M
	usb: musb: Fix runtime PM race in musb_queue_resume_work
	usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
	usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
	USB: serial: ftdi_sio: fix FTX sub-integer prescaler
	USB: serial: pl2303: fix line-speed handling on newer chips
	USB: serial: mos7840: fix error code in mos7840_write()
	USB: serial: mos7720: fix error code in mos7720_write()
	phy: lantiq: rcu-usb2: wait after clock enable
	ALSA: fireface: fix to parse sync status register of latter protocol
	ALSA: hda: Add another CometLake-H PCI ID
	ALSA: hda/hdmi: Drop bogus check at closing a stream
	ALSA: hda/realtek: modify EAPD in the ALC886
	ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
	MIPS: Ingenic: Disable HPTLB for D0 XBurst CPUs too
	MIPS: Support binutils configured with --enable-mips-fix-loongson3-llsc=yes
	MIPS: VDSO: Use CLANG_FLAGS instead of filtering out '--target='
	Revert "MIPS: Octeon: Remove special handling of CONFIG_MIPS_ELF_APPENDED_DTB=y"
	Revert "bcache: Kill btree_io_wq"
	bcache: Give btree_io_wq correct semantics again
	bcache: Move journal work to new flush wq
	Revert "drm/amd/display: Update NV1x SR latency values"
	drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()
	drm/amd/display: Remove Assert from dcn10_get_dig_frontend
	drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
	drm/amdkfd: Fix recursive lock warnings
	drm/amdgpu: Set reference clock to 100Mhz on Renoir (v2)
	drm/nouveau/kms: handle mDP connectors
	drm/modes: Switch to 64bit maths to avoid integer overflow
	drm/sched: Cancel and flush all outstanding jobs before finish.
	drm/panel: kd35t133: allow using non-continuous dsi clock
	drm/rockchip: Require the YTR modifier for AFBC
	ASoC: siu: Fix build error by a wrong const prefix
	selinux: fix inconsistency between inode_getxattr and inode_listsecurity
	erofs: initialized fields can only be observed after bit is set
	tpm_tis: Fix check_locality for correct locality acquisition
	tpm_tis: Clean up locality release
	KEYS: trusted: Fix incorrect handling of tpm_get_random()
	KEYS: trusted: Fix migratable=1 failing
	KEYS: trusted: Reserve TPM for seal and unseal operations
	btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
	btrfs: do not warn if we can't find the reloc root when looking up backref
	btrfs: add asserts for deleting backref cache nodes
	btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
	btrfs: fix reloc root leak with 0 ref reloc roots on recovery
	btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
	btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
	btrfs: account for new extents being deleted in total_bytes_pinned
	btrfs: fix extent buffer leak on failure to copy root
	drm/i915/gt: Flush before changing register state
	drm/i915/gt: Correct surface base address for renderclear
	crypto: arm64/sha - add missing module aliases
	crypto: aesni - prevent misaligned buffers on the stack
	crypto: michael_mic - fix broken misalignment handling
	crypto: sun4i-ss - checking sg length is not sufficient
	crypto: sun4i-ss - IV register does not work on A10 and A13
	crypto: sun4i-ss - handle BigEndian for cipher
	crypto: sun4i-ss - initialize need_fallback
	soc: samsung: exynos-asv: don't defer early on not-supported SoCs
	soc: samsung: exynos-asv: handle reading revision register error
	seccomp: Add missing return in non-void function
	arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
	misc: rtsx: init of rts522a add OCP power off when no card is present
	drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
	pstore: Fix typo in compression option name
	dts64: mt7622: fix slow sd card access
	arm64: dts: agilex: fix phy interface bit shift for gmac1 and gmac2
	staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
	staging: gdm724x: Fix DMA from stack
	staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
	floppy: reintroduce O_NDELAY fix
	media: i2c: max9286: fix access to unallocated memory
	media: ir_toy: add another IR Droid device
	media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
	media: marvell-ccic: power up the device on mclk enable
	media: smipcie: fix interrupt handling and IR timeout
	x86/virt: Eat faults on VMXOFF in reboot flows
	x86/reboot: Force all cpus to exit VMX root if VMX is supported
	x86/fault: Fix AMD erratum #91 errata fixup for user code
	x86/entry: Fix instrumentation annotation
	powerpc/prom: Fix "ibm,arch-vec-5-platform-support" scan
	rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
	rcu/nocb: Perform deferred wake up before last idle's need_resched() check
	kprobes: Fix to delay the kprobes jump optimization
	arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
	iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs
	arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
	arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
	arm64 module: set plt* section addresses to 0x0
	arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
	riscv: Disable KSAN_SANITIZE for vDSO
	watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQ
	watchdog: mei_wdt: request stop on unregister
	coresight: etm4x: Handle accesses to TRCSTALLCTLR
	mtd: spi-nor: sfdp: Fix last erase region marking
	mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid region
	mtd: spi-nor: core: Fix erase type discovery for overlaid region
	mtd: spi-nor: core: Add erase size check for erase command initialization
	mtd: spi-nor: hisi-sfc: Put child node np on error path
	fs/affs: release old buffer head on error path
	seq_file: document how per-entry resources are managed.
	x86: fix seq_file iteration for pat/memtype.c
	mm: memcontrol: fix swap undercounting in cgroup2
	mm: memcontrol: fix get_active_memcg return value
	hugetlb: fix update_and_free_page contig page struct assumption
	hugetlb: fix copy_huge_page_from_user contig page struct assumption
	mm/vmscan: restore zone_reclaim_mode ABI
	mm, compaction: make fast_isolate_freepages() stay within zone
	KVM: nSVM: fix running nested guests when npt=0
	nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
	module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
	mmc: sdhci-esdhc-imx: fix kernel panic when remove module
	mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure
	powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
	powerpc/kexec_file: fix FDT size estimation for kdump kernel
	powerpc/32s: Add missing call to kuep_lock on syscall entry
	spmi: spmi-pmic-arb: Fix hw_irq overflow
	mei: fix transfer over dma with extended header
	mei: me: emmitsburg workstation DID
	mei: me: add adler lake point S DID
	mei: me: add adler lake point LP DID
	gpio: pcf857x: Fix missing first interrupt
	mfd: gateworks-gsc: Fix interrupt type
	printk: fix deadlock when kernel panic
	exfat: fix shift-out-of-bounds in exfat_fill_super()
	zonefs: Fix file size of zones in full condition
	kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
	thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
	cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
	cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
	cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
	proc: don't allow async path resolution of /proc/thread-self components
	s390/vtime: fix inline assembly clobber list
	virtio/s390: implement virtio-ccw revision 2 correctly
	um: mm: check more comprehensively for stub changes
	um: defer killing userspace on page table update failures
	irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
	f2fs: fix out-of-repair __setattr_copy()
	f2fs: enforce the immutable flag on open files
	f2fs: flush data when enabling checkpoint back
	sparc32: fix a user-triggerable oops in clear_user()
	spi: fsl: invert spisel_boot signal on MPC8309
	spi: spi-synquacer: fix set_cs handling
	gfs2: fix glock confusion in function signal_our_withdraw
	gfs2: Don't skip dlm unlock if glock has an lvb
	gfs2: Lock imbalance on error path in gfs2_recover_one
	gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
	dm: fix deadlock when swapping to encrypted device
	dm table: fix iterate_devices based device capability checks
	dm table: fix DAX iterate_devices based device capability checks
	dm table: fix zoned iterate_devices based device capability checks
	dm writecache: fix performance degradation in ssd mode
	dm writecache: return the exact table values that were set
	dm writecache: fix writing beyond end of underlying device when shrinking
	dm era: Recover committed writeset after crash
	dm era: Update in-core bitset after committing the metadata
	dm era: Verify the data block size hasn't changed
	dm era: Fix bitset memory leaks
	dm era: Use correct value size in equality function of writeset tree
	dm era: Reinitialize bitset cache before digesting a new writeset
	dm era: only resize metadata in preresume
	drm/i915: Reject 446-480MHz HDMI clock on GLK
	kgdb: fix to kill breakpoints on initmem after boot
	ipv6: silence compilation warning for non-IPV6 builds
	net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
	wireguard: selftests: test multiple parallel streams
	wireguard: queueing: get rid of per-peer ring buffers
	net: sched: fix police ext initialization
	net: qrtr: Fix memory leak in qrtr_tun_open
	net_sched: fix RTNL deadlock again caused by request_module()
	ARM: dts: aspeed: Add LCLK to lpc-snoop
	Linux 5.10.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fbcecd9413ce212dac68d5cc800c9457feba56a
2021-03-07 12:33:33 +01:00
Dave Hansen
54683f81c8 mm/vmscan: restore zone_reclaim_mode ABI
commit 519983645a9f2ec339cabfa0c6ef7b09be985dd0 upstream.

I went to go add a new RECLAIM_* mode for the zone_reclaim_mode sysctl.
Like a good kernel developer, I also went to go update the
documentation.  I noticed that the bits in the documentation didn't
match the bits in the #defines.

The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
however implicitly checked when checking 'node_reclaim_mode==0'.  The
RECLAIM_ZONE #define was removed in a cleanup.  That, by itself is fine.

But, when the bit was removed (bit 0) the _other_ bit locations also got
changed.  That's not OK because the bit values are documented to mean
one specific thing.  Users surely do not expect the meaning to change
from kernel to kernel.

The end result is that if someone had a script that did:

	sysctl vm.zone_reclaim_mode=1

it would have gone from enabling node reclaim for clean unmapped pages
to writing out pages during node reclaim after the commit in question.
That's not great.

Put the bits back the way they were and add a comment so something like
this is a bit harder to do again.  Update the documentation to make it
clear that the first bit is ignored.

Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 648b5cf368 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:38 +01:00
Suren Baghdasaryan
aa8f690d71 ANDROID: vmscan: Fix sparse warnings for kswapd_threads
Fix the following warning by declaring kswapd_threads as static:
mm/vmscan.c:175:5: sparse: symbol 'kswapd_threads' was not declared.

Fixes: 0d61a651e4 ("ANDROID: vmscan: Support multiple kswapd threads per node")
Bug: 171351667
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6ccb1e0e0f597fda27ed6893d254a77341139084
2021-02-22 18:13:47 -08:00
Charan Teja Reddy
0d61a651e4 ANDROID: vmscan: Support multiple kswapd threads per node
Page replacement is handled in the Linux Kernel in one of two ways:

1) Asynchronously via kswapd
2) Synchronously, via direct reclaim

At page allocation time the allocating task is immediately given a page
from the zone free list allowing it to go right back to work doing
whatever it was doing; Probably directly or indirectly executing business
logic.

Just prior to satisfying the allocation, free pages is checked to see if
it has reached the zone low watermark and if so, kswapd is awakened.
Kswapd will start scanning pages looking for inactive pages to evict to
make room for new page allocations. The work of kswapd allows tasks to
continue allocating memory from their respective zone free list without
incurring any delay.

When the demand for free pages exceeds the rate that kswapd tasks can
supply them, page allocation works differently. Once the allocating task
finds that the number of free pages is at or below the zone min watermark,
the task will no longer pull pages from the free list. Instead, the task
will run the same CPU-bound routines as kswapd to satisfy its own
allocation by scanning and evicting pages. This is called a direct reclaim.

The time spent performing a direct reclaim can be substantial, often
taking tens to hundreds of milliseconds for small order0 allocations to
half a second or more for order9 huge-page allocations. In fact, kswapd is
not actually required on a linux system. It exists for the sole purpose of
optimizing performance by preventing direct reclaims.

When memory shortfall is sufficient to trigger direct reclaims, they can
occur in any task that is running on the system. A single aggressive
memory allocating task can set the stage for collateral damage to occur in
small tasks that rarely allocate additional memory. Consider the impact of
injecting an additional 100ms of latency when nscd allocates memory to
facilitate caching of a DNS query.

The presence of direct reclaims 10 years ago was a fairly reliable
indicator that too much was being asked of a Linux system. Kswapd was
likely wasting time scanning pages that were ineligible for eviction.
Adding RAM or reducing the working set size would usually make the problem
go away. Since then hardware has evolved to bring a new struggle for
kswapd. Storage speeds have increased by orders of magnitude while CPU
clock speeds stayed the same or even slowed down in exchange for more
cores per package. This presents a throughput problem for a single
threaded kswapd that will get worse with each generation of new hardware.

Test Details

NOTE: The tests below were run with shadow entries disabled. See the
associated patch and cover letter for details

The tests below were designed with the assumption that a kswapd bottleneck
is best demonstrated using filesystem reads. This way, the inactive list
will be full of clean pages, simplifying the analysis and allowing kswapd
to achieve the highest possible steal rate. Maximum steal rates for kswapd
are likely to be the same or lower for any other mix of page types on the
system.

Tests were run on a 2U Oracle X7-2L with 52 Intel Xeon Skylake 2GHz cores,
756GB of RAM and 8 x 3.6 TB NVMe Solid State Disk drives. Each drive has
an XFS file system mounted separately as /d0 through /d7. SSD drives
require multiple concurrent streams to show their potential, so I created
eleven 250GB zero-filled files on each drive so that I could test with
parallel reads.

The test script runs in multiple stages. At each stage, the number of dd
tasks run concurrently is increased by 2. I did not include all of the
test output for brevity.

During each stage dd tasks are launched to read from each drive in a round
robin fashion until the specified number of tasks for the stage has been
reached. Then iostat, vmstat and top are started in the background with 10
second intervals. After five minutes, all of the dd tasks are killed and
the iostat, vmstat and top output is parsed in order to report the
following:

CPU consumption
- sy - aggregate kernel mode CPU consumption from vmstat output. The value
       doesn't tend to fluctuate much so I just grab the highest value.
       Each sample is averaged over 10 seconds
- dd_cpu - for all of the dd tasks averaged across the top samples since
           there is a lot of variation.

Throughput
- in Kbytes
- Command is iostat -x -d 10 -g total

This first test performs reads using O_DIRECT in order to show the maximum
throughput that can be obtained using these drives. It also demonstrates
how rapidly throughput scales as the number of dd tasks are increased.

The dd command for this test looks like this:

Command Used: dd iflag=direct if=/d${i}/$n of=/dev/null bs=4M

Test #1: Direct IO
dd sy dd_cpu throughput
6  0  2.33   14726026.40
10 1  2.95   19954974.80
16 1  2.63   24419689.30
22 1  2.63   25430303.20
28 1  2.91   26026513.20
34 1  2.53   26178618.00
40 1  2.18   26239229.20
46 1  1.91   26250550.40
52 1  1.69   26251845.60
58 1  1.54   26253205.60
64 1  1.43   26253780.80
70 1  1.31   26254154.80
76 1  1.21   26253660.80
82 1  1.12   26254214.80
88 1  1.07   26253770.00
90 1  1.04   26252406.40

Throughput was close to peak with only 22 dd tasks. Very little system CPU
was consumed as expected as the drives DMA directly into the user address
space when using direct IO.

In this next test, the iflag=direct option is removed and we only run the
test until the pgscan_kswapd from /proc/vmstat starts to increment. At
that point metrics are parsed and reported and the pagecache contents are
dropped prior to the next test. Lather, rinse, repeat.

Test #2: standard file system IO, no page replacement
dd sy dd_cpu throughput
6  2  28.78  5134316.40
10 3  31.40  8051218.40
16 5  34.73  11438106.80
22 7  33.65  14140596.40
28 8  31.24  16393455.20
34 10 29.88  18219463.60
40 11 28.33  19644159.60
46 11 25.05  20802497.60
52 13 26.92  22092370.00
58 13 23.29  22884881.20
64 14 23.12  23452248.80
70 15 22.40  23916468.00
76 16 22.06  24328737.20
82 17 20.97  24718693.20
88 16 18.57  25149404.40
90 16 18.31  25245565.60

Each read has to pause after the buffer in kernel space is populated while
those pages are added to the pagecache and copied into the user address
space. For this reason, more parallel streams are required to achieve peak
throughput. The copy operation consumes substantially more CPU than direct
IO as expected.

The next test measures throughput after kswapd starts running. This is the
same test only we wait for kswapd to wake up before we start collecting
metrics. The script actually keeps track of a few things that were not
mentioned earlier. It tracks direct reclaims and page scans by watching
the metrics in /proc/vmstat. CPU consumption for kswapd is tracked the
same way it is tracked for dd.

Since the test is 100% reads, you can assume that the page steal rate for
kswapd and direct reclaims is almost identical to the scan rate.

Test #3: 1 kswapd thread per node
dd sy dd_cpu kswapd0 kswapd1 throughput  dr    pgscan_kswapd pgscan_direct
10 4  26.07  28.56   27.03   7355924.40  0     459316976     0
16 7  34.94  69.33   69.66   10867895.20 0     872661643     0
22 10 36.03  93.99   99.33   13130613.60 489   1037654473    11268334
28 10 30.34  95.90   98.60   14601509.60 671   1182591373    15429142
34 14 34.77  97.50   99.23   16468012.00 10850 1069005644    249839515
40 17 36.32  91.49   97.11   17335987.60 18903 975417728     434467710
46 19 38.40  90.54   91.61   17705394.40 25369 855737040     582427973
52 22 40.88  83.97   83.70   17607680.40 31250 709532935     724282458
58 25 40.89  82.19   80.14   17976905.60 35060 657796473     804117540
64 28 41.77  73.49   75.20   18001910.00 39073 561813658     895289337
70 33 45.51  63.78   64.39   17061897.20 44523 379465571     1020726436
76 36 46.95  57.96   60.32   16964459.60 47717 291299464     1093172384
82 39 47.16  55.43   56.16   16949956.00 49479 247071062     1134163008
88 42 47.41  53.75   47.62   16930911.20 51521 195449924     1180442208
90 43 47.18  51.40   50.59   16864428.00 51618 190758156     1183203901

In the previous test where kswapd was not involved, the system-wide kernel
mode CPU consumption with 90 dd tasks was 16%. In this test CPU consumption
with 90 tasks is at 43%. With 52 cores, and two kswapd tasks (one per NUMA
node), kswapd can only be responsible for a little over 4% of the increase.
The rest is likely caused by 51,618 direct reclaims that scanned 1.2
billion pages over the five minute time period of the test.

Same test, more kswapd tasks:

Test #4: 4 kswapd threads per node
dd sy dd_cpu kswapd0 kswapd1 throughput  dr    pgscan_kswapd pgscan_direct
10 5  27.09  16.65   14.17   7842605.60  0     459105291     0
16 10 37.12  26.02   24.85   11352920.40 15    920527796     358515
22 11 36.94  37.13   35.82   13771869.60 0     1132169011     0
28 13 35.23  48.43   46.86   16089746.00 0     1312902070     0
34 15 33.37  53.02   55.69   18314856.40 0     1476169080     0
40 19 35.90  69.60   64.41   19836126.80 0     1629999149     0
46 22 36.82  88.55   57.20   20740216.40 0     1708478106     0
52 24 34.38  93.76   68.34   21758352.00 0     1794055559     0
58 24 30.51  79.20   82.33   22735594.00 0     1872794397     0
64 26 30.21  97.12   76.73   23302203.60 176   1916593721     4206821
70 33 32.92  92.91   92.87   23776588.00 3575  1817685086     85574159
76 37 31.62  91.20   89.83   24308196.80 4752  1812262569     113981763
82 29 25.53  93.23   92.33   24802791.20 306   2032093122     7350704
88 43 37.12  76.18   77.01   25145694.40 20310 1253204719     487048202
90 42 38.56  73.90   74.57   22516787.60 22774 1193637495     545463615

By increasing the number of kswapd threads, throughput increased by ~50%
while kernel mode CPU utilization decreased or stayed the same, likely due
to a decrease in the number of parallel tasks at any given time doing page
replacement.

Signed-off-by: Buddy Lumpkin <buddy.lumpkin@oracle.com>
Bug: 171351667
Link: https://lore.kernel.org/lkml/1522661062-39745-1-git-send-email-buddy.lumpkin@oracle.com
[charante@codeaurora.org]: Changes made to select number of kswapds through uapi
Change-Id: I8425cab7f40cbeaf65af0ea118c1a9ac7da0930e
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-02-22 19:47:44 +00:00
Linus Torvalds
72c5ce8942 mm: don't put pinned pages into the swap cache
[ Upstream commit feb889fb40fafc6933339cf1cca8f770126819fb ]

So technically there is nothing wrong with adding a pinned page to the
swap cache, but the pinning obviously means that the page can't actually
be free'd right now anyway, so it's a bit pointless.

However, the real problem is not with it being a bit pointless: the real
issue is that after we've added it to the swap cache, we'll try to unmap
the page.  That will succeed, because the code in mm/rmap.c doesn't know
or care about pinned pages.

Even the unmapping isn't fatal per se, since the page will stay around
in memory due to the pinning, and we do hold the connection to it using
the swap cache.  But when we then touch it next and take a page fault,
the logic in do_swap_page() will map it back into the process as a
possibly read-only page, and we'll then break the page association on
the next COW fault.

Honestly, this issue could have been fixed in any of those other places:
(a) we could refuse to unmap a pinned page (which makes conceptual
sense), or (b) we could make sure to re-map a pinned page writably in
do_swap_page(), or (c) we could just make do_wp_page() not COW the
pinned page (which was what we historically did before that "mm:
do_wp_page() simplification" commit).

But while all of them are equally valid models for breaking this chain,
not putting pinned pages into the swap cache in the first place is the
simplest one by far.

It's also the safest one: the reason why do_wp_page() was changed in the
first place was that getting the "can I re-use this page" wrong is so
fraught with errors.  If you do it wrong, you end up with an incorrectly
shared page.

As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
do_swap_page() would be a serious bug since it is only a (very good)
heuristic.  Re-using the page requires a hard black-and-white rule with
no room for ambiguity.

In contrast, saying "this page is very likely dma pinned, so let's not
add it to the swap cache and try to unmap it" is an obviously safe thing
to do, and if the heuristic might very rarely be a false positive, no
harm is done.

Fixes: 09854ba94c ("mm: do_wp_page() simplification")
Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:29 +01:00
Shakeel Butt
dd156e3fca mm/rmap: always do TTU_IGNORE_ACCESS
[ Upstream commit 013339df116c2ee0d796dd8bfb8f293a2030c063 ]

Since commit 369ea8242c ("mm/rmap: update to new mmu_notifier semantic
v2"), the code to check the secondary MMU's page table access bit is
broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from the
secondary MMU's page table before the check.  More specifically for those
secondary MMUs which unmap the memory in
mmu_notifier_invalidate_range_start() like kvm.

However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the
absence of TTU_IGNORE_ACCESS and it explicitly performs the page table
access check before trying to unmap the page.  So, at worst the reclaim
will miss accesses in a very short window if we remove page table access
check in unmapping code.

There is an unintented consequence of !(TTU_IGNORE_ACCESS) for the memcg
reclaim.  From memcg reclaim the page_referenced() only account the
accesses from the processes which are in the same memcg of the target page
but the unmapping code is considering accesses from all the processes, so,
decreasing the effectiveness of memcg reclaim.

The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
code.

Link: https://lkml.kernel.org/r/20201104231928.1494083-1-shakeelb@google.com
Fixes: 369ea8242c ("mm/rmap: update to new mmu_notifier semantic v2")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:53:55 +01:00
Nicholas Piggin
2da9f6305f mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
Previously the negated unsigned long would be cast back to signed long
which would have the correct negative value.  After commit 730ec8c01a
("mm/vmscan.c: change prototype for shrink_page_list"), the large
unsigned int converts to a large positive signed long.

Symptoms include CMA allocations hanging forever holding the cma_mutex
due to alloc_contig_range->...->isolate_migratepages_block waiting
forever in "while (unlikely(too_many_isolated(pgdat)))".

[akpm@linux-foundation.org: fix -stat.nr_lazyfree_fail as well, per Michal]

Fixes: 730ec8c01a ("mm/vmscan.c: change prototype for shrink_page_list")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vaneet Narang <v.narang@samsung.com>
Cc: Maninder Singh <maninder1.s@samsung.com>
Cc: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201029032320.1448441-1-npiggin@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-14 11:26:03 -08:00
Yu Zhao
ed0173733d mm: use self-explanatory macros rather than "2"
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Link: http://lkml.kernel.org/r/20200831175042.3527153-2-yuzhao@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Matthew Wilcox (Oracle)
3efe62e466 mm/vmscan: allow arbitrary sized pages to be paged out
Remove the assumption that a compound page has HPAGE_PMD_NR pins from the
page cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Link: https://lkml.kernel.org/r/20200908195539.25896-12-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:15 -07:00
Hui Su
01c4776ba0 mm/vmscan: fix comments for isolate_lru_page()
fix comments for isolate_lru_page():
s/fundamentnal/fundamental

Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200927173923.GA8058@rlk
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:34 -07:00
Chunxin Zang
069c411de4 mm/vmscan: fix infinite loop in drop_slab_node
We have observed that drop_caches can take a considerable amount of
time (<put data here>).  Especially when there are many memcgs involved
because they are adding an additional overhead.

It is quite unfortunate that the operation cannot be interrupted by a
signal currently.  Add a check for fatal signals into the main loop so
that userspace can control early bailout.

There are two reasons:

1. We have too many memcgs, even though one object freed in one memcg,
   the sum of object is bigger than 10.

2. We spend a lot of time in traverse memcg once.  So, the memcg who
   traversed at the first have been freed many objects.  Traverse memcg
   next time, the freed count bigger than 10 again.

We can get the following info through 'ps':

  root:~# ps -aux | grep drop
  root  357956 ... R    Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches
  root 1771385 ... R    Aug16 21146421:17 echo 3 > /proc/sys/vm/drop_caches
  root 1986319 ... R    18:56 117:27 echo 3 > /proc/sys/vm/drop_caches
  root 2002148 ... R    Aug24 5720:39 echo 3 > /proc/sys/vm/drop_caches
  root 2564666 ... R    18:59 113:58 echo 3 > /proc/sys/vm/drop_caches
  root 2639347 ... R    Sep03 2383:39 echo 3 > /proc/sys/vm/drop_caches
  root 3904747 ... R    03:35 993:31 echo 3 > /proc/sys/vm/drop_caches
  root 4016780 ... R    Aug21 7882:18 echo 3 > /proc/sys/vm/drop_caches

Use bpftrace follow 'freed' value in drop_slab_node:

  root:~# bpftrace -e 'kprobe:drop_slab_node+70 {@ret=hist(reg("bp")); }'
  Attaching 1 probe...
  ^B^C

  @ret:
  [64, 128)        1 |                                                    |
  [128, 256)      28 |                                                    |
  [256, 512)     107 |@                                                   |
  [512, 1K)      298 |@@@                                                 |
  [1K, 2K)       613 |@@@@@@@                                             |
  [2K, 4K)      4435 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [4K, 8K)       442 |@@@@@                                               |
  [8K, 16K)      299 |@@@                                                 |
  [16K, 32K)     100 |@                                                   |
  [32K, 64K)     139 |@                                                   |
  [64K, 128K)     56 |                                                    |
  [128K, 256K)    26 |                                                    |
  [256K, 512K)     2 |                                                    |

In the while loop, we can check whether the TASK_KILLABLE signal is set,
if so, we should break the loop.

Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Link: https://lkml.kernel.org/r/20200909152047.27905-1-zangchunxin@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:34 -07:00
Hugh Dickins
8d8869ca5d mm: fix check_move_unevictable_pages() on THP
check_move_unevictable_pages() is used in making unevictable shmem pages
evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
i915/gem check_release_pagevec().  Those may pass down subpages of a huge
page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".

That does not crash or warn at present, but the accounting of vmstats
unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
scanned being incremented on each subpage, rescued only on the head (since
tails already appear evictable once the head has been updated).

5.8 commit 5d91f31faf ("mm: swap: fix vmstats for huge page") has
established that vm_events in general (and unevictable_pgs_rescued in
particular) should count every subpage: so follow that precedent here.

Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
(to check page->mem_cgroup is always set), no problem: skip the tails
before calling it, and add thp_nr_pages() to vmstats on the head.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301405000.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:38 -07:00
Xunlei Pang
e3336cab25 mm: memcg: fix memcg reclaim soft lockup
We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg
doesn't have any reclaimable memory.

It can be easily reproduced as below:

  watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
  CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
  Call Trace:
    shrink_lruvec+0x49f/0x640
    shrink_node+0x2a6/0x6f0
    do_try_to_free_pages+0xe9/0x3e0
    try_to_free_mem_cgroup_pages+0xef/0x1f0
    try_charge+0x2c1/0x750
    mem_cgroup_charge+0xd7/0x240
    __add_to_page_cache_locked+0x2fd/0x370
    add_to_page_cache_lru+0x4a/0xc0
    pagecache_get_page+0x10b/0x2f0
    filemap_fault+0x661/0xad0
    ext4_filemap_fault+0x2c/0x40
    __do_fault+0x4d/0xf9
    handle_mm_fault+0x1080/0x1790

It only happens on our 1-vcpu instances, because there's no chance for
oom reaper to run to reclaim the to-be-killed process.

Add a cond_resched() at the upper shrink_node_memcgs() to solve this
issue, this will mean that we will get a scheduling point for each memcg
in the reclaimed hierarchy without any dependency on the reclaimable
memory in that memcg thus making it more predictable.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:29 -07:00
Matthew Wilcox (Oracle)
6c357848b4 mm: replace hpage_nr_pages with thp_nr_pages
The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.

[akpm@linux-foundation.org: fix mm/migrate.c]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:56 -07:00
Randy Dunlap
1eba09c15d mm/vmscan.c: delete or fix duplicated words
Drop the repeated word "marked".
Change "time time" to "same time".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Link: http://lkml.kernel.org/r/20200801173822.14973-14-rdunlap@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:57:58 -07:00