Commit Graph

27 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
ee4c9c95ff Merge 6.1.34 into android14-6.1-lts
Changes in 6.1.34
	scsi: megaraid_sas: Add flexible array member for SGLs
	net: sfp: fix state loss when updating state_hw_mask
	spi: mt65xx: make sure operations completed before unloading
	platform/surface: aggregator: Allow completion work-items to be executed in parallel
	platform/surface: aggregator_tabletsw: Add support for book mode in KIP subsystem
	spi: qup: Request DMA before enabling clocks
	afs: Fix setting of mtime when creating a file/dir/symlink
	wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
	bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
	neighbour: fix unaligned access to pneigh_entry
	net: dsa: lan9303: allow vid != 0 in port_fdb_{add|del} methods
	net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294
	bpf: Fix UAF in task local storage
	bpf: Fix elem_size not being set for inner maps
	net/ipv6: fix bool/int mismatch for skip_notify_on_dev_down
	net/smc: Avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT
	net: enetc: correct the statistics of rx bytes
	net: enetc: correct rx_bytes statistics of XDP
	net/sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values
	drm/i915: Explain the magic numbers for AUX SYNC/precharge length
	drm/i915: Use 18 fast wake AUX sync len
	Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
	Bluetooth: Fix l2cap_disconnect_req deadlock
	Bluetooth: ISO: don't try to remove CIG if there are bound CIS left
	Bluetooth: L2CAP: Add missing checks for invalid DCID
	wifi: mac80211: use correct iftype HE cap
	wifi: cfg80211: reject bad AP MLD address
	wifi: mac80211: mlme: fix non-inheritence element
	wifi: mac80211: don't translate beacon/presp addrs
	qed/qede: Fix scheduling while atomic
	wifi: cfg80211: fix locking in sched scan stop work
	selftests/bpf: Verify optval=NULL case
	selftests/bpf: Fix sockopt_sk selftest
	netfilter: nft_bitwise: fix register tracking
	netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
	netfilter: ipset: Add schedule point in call_ad().
	netfilter: nf_tables: out-of-bound check in chain blob
	ipv6: rpl: Fix Route of Death.
	tcp: gso: really support BIG TCP
	rfs: annotate lockless accesses to sk->sk_rxhash
	rfs: annotate lockless accesses to RFS sock flow table
	net: sched: add rcu annotations around qdisc->qdisc_sleeping
	drm/i915/selftests: Stop using kthread_stop()
	drm/i915/selftests: Add some missing error propagation
	net: sched: move rtm_tca_policy declaration to include file
	net: sched: act_police: fix sparse errors in tcf_police_dump()
	net: sched: fix possible refcount leak in tc_chain_tmplt_add()
	bpf: Add extra path pointer check to d_path helper
	drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram
	lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
	net: bcmgenet: Fix EEE implementation
	bnxt_en: Don't issue AP reset during ethtool's reset operation
	bnxt_en: Query default VLAN before VNIC setup on a VF
	bnxt_en: Skip firmware fatal error recovery if chip is not accessible
	bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
	bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
	batman-adv: Broken sync while rescheduling delayed work
	Input: xpad - delete a Razer DeathAdder mouse VID/PID entry
	Input: psmouse - fix OOB access in Elantech protocol
	Input: fix open count when closing inhibited device
	ALSA: hda: Fix kctl->id initialization
	ALSA: ymfpci: Fix kctl->id initialization
	ALSA: gus: Fix kctl->id initialization
	ALSA: cmipci: Fix kctl->id initialization
	ALSA: hda/realtek: Add quirk for Clevo NS50AU
	ALSA: ice1712,ice1724: fix the kcontrol->id initialization
	ALSA: hda/realtek: Add a quirk for HP Slim Desktop S01
	ALSA: hda/realtek: Add Lenovo P3 Tower platform
	ALSA: hda/realtek: Add quirks for Asus ROG 2024 laptops using CS35L41
	drm/i915/gt: Use the correct error value when kernel_context() fails
	drm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs
	drm/amdgpu: fix xclk freq on CHIP_STONEY
	drm/amdgpu: change reserved vram info print
	drm/amd/pm: Fix power context allocation in SMU13
	drm/amd/display: Reduce sdp bw after urgent to 90%
	wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
	can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in J1939 Socket
	can: j1939: change j1939_netdev_lock type to mutex
	can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
	mptcp: only send RM_ADDR in nl_cmd_remove
	mptcp: add address into userspace pm list
	mptcp: update userspace pm infos
	selftests: mptcp: update userspace pm addr tests
	selftests: mptcp: update userspace pm subflow tests
	ceph: fix use-after-free bug for inodes when flushing capsnaps
	s390/dasd: Use correct lock while counting channel queue length
	Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
	Bluetooth: fix debugfs registration
	Bluetooth: hci_qca: fix debugfs registration
	tee: amdtee: Add return_origin to 'struct tee_cmd_load_ta'
	rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting
	rbd: get snapshot context after exclusive lock is ensured to be held
	virtio_net: use control_buf for coalesce params
	soc: qcom: icc-bwmon: fix incorrect error code passed to dev_err_probe()
	pinctrl: meson-axg: add missing GPIOA_18 gpio group
	usb: usbfs: Enforce page requirements for mmap
	usb: usbfs: Use consistent mmap functions
	mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM
	mm: page_table_check: Ensure user pages are not slab pages
	arm64: dts: qcom: sc8280xp: Flush RSC sleep & wake votes
	ARM: at91: pm: fix imbalanced reference counter for ethernet devices
	ARM: dts: at91: sama7g5ek: fix debounce delay property for shdwc
	ASoC: codecs: wsa883x: do not set can_multi_write flag
	ASoC: codecs: wsa881x: do not set can_multi_write flag
	arm64: dts: qcom: sc7180-lite: Fix SDRAM freq for misidentified sc7180-lite boards
	arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals
	arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
	ASoC: mediatek: mt8195-afe-pcm: Convert to platform remove callback returning void
	ASoC: mediatek: mt8195: fix use-after-free in driver remove path
	ASoC: simple-card-utils: fix PCM constraint error check
	blk-mq: fix blk_mq_hw_ctx active request accounting
	arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
	i2c: mv64xxx: Fix reading invalid status value in atomic mode
	firmware: arm_ffa: Set handle field to zero in memory descriptor
	gpio: sim: fix memory corruption when adding named lines and unnamed hogs
	i2c: sprd: Delete i2c adapter in .remove's error path
	riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable
	eeprom: at24: also select REGMAP
	soundwire: stream: Add missing clear of alloc_slave_rt
	riscv: fix kprobe __user string arg print fault issue
	vduse: avoid empty string for dev name
	vhost: support PACKED when setting-getting vring_base
	vhost_vdpa: support PACKED when setting-getting vring_base
	ksmbd: fix out-of-bound read in deassemble_neg_contexts()
	ksmbd: fix out-of-bound read in parse_lease_state()
	ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop
	Revert "ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled"
	ext4: only check dquot_initialize_needed() when debugging
	wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
	wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS
	Revert "staging: rtl8192e: Replace macro RTL_PCI_DEVICE with PCI_DEVICE"
	Linux 6.1.34

Note, commit 898c9a0ee7 ("bpf, sockmap: Avoid potential NULL
dereference in sk_psock_verdict_data_ready()") is merged away in this
merge, due to missing dependencies, it will come back in later.

Change-Id: I8e57d0914e6114822a8941a4663525d85377ca8a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 19:49:28 +00:00
Ruihan Li
08378f0314 mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM
commit 81a31a860bb61d54eb688af2568d9332ed9b8942 upstream.

Without EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary
physical memory regions into the userspace via /dev/mem. At the same
time, pages may change their properties (e.g., from anonymous pages to
named pages) while they are still being mapped in the userspace, leading
to "corruption" detected by the page table check.

To avoid these false positives, this patch makes PAGE_TABLE_CHECK
depends on EXCLUSIVE_SYSTEM_RAM. This dependency is understandable
because PAGE_TABLE_CHECK is a hardening technique but /dev/mem without
STRICT_DEVMEM (i.e., !EXCLUSIVE_SYSTEM_RAM) is itself a security
problem.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be
mapped via /dev/mem. However, these pages are always considered as named
pages, so they won't break the logic used in the page table check.

Cc: <stable@vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20230515130958.32471-4-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-14 11:15:29 +02:00
Suren Baghdasaryan
71c7092b68 ANDROID: Revert "mm: remove cleancache"
This reverts commit 0a4ee51818.

Conflicts:
	Documentation/mm/cleancache.rst
	Documentation/vm/index.rst
	arch/arm/configs/bcm2835_defconfig
	arch/arm/configs/qcom_defconfig
	arch/m68k/configs/amiga_defconfig
	arch/m68k/configs/apollo_defconfig
	arch/m68k/configs/atari_defconfig
	arch/m68k/configs/bvme6000_defconfig
	arch/m68k/configs/hp300_defconfig
	arch/m68k/configs/mac_defconfig
	arch/m68k/configs/multi_defconfig
	arch/m68k/configs/mvme147_defconfig
	arch/m68k/configs/mvme16x_defconfig
	arch/m68k/configs/q40_defconfig
	arch/m68k/configs/sun3_defconfig
	arch/m68k/configs/sun3x_defconfig
	arch/s390/configs/debug_defconfig
	arch/s390/configs/defconfig
	fs/f2fs/data.c
	fs/mpage.c

1. Skip documentation which was refactored.
2. Skip defconfigs unused in Android.
3. Replaced deprecated __submit_bio() with f2fs_submit_read_bio()
4. Replaced PageUptodate() with folio_test_uptodate()
5. Replaced SetPageUptodate() with folio_mark_uptodate()
6. Changed cleancache_get_page() call to use folio->page

Bug: 271544708
Change-Id: I93359509f7799de72f31b002a2539565d1bda9d6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-04-26 17:01:50 +00:00
Greg Kroah-Hartman
0fff48d6fe Merge 6.1.24 into android14-6.1
Changes in 6.1.24
	dm cache: Add some documentation to dm-cache-background-tracker.h
	dm integrity: Remove bi_sector that's only used by commented debug code
	dm: change "unsigned" to "unsigned int"
	dm: fix improper splitting for abnormal bios
	KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode
	KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
	KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run
	KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU
	gpio: GPIO_REGMAP: select REGMAP instead of depending on it
	Drivers: vmbus: Check for channel allocation before looking up relids
	ASoC: SOF: ipc4: Ensure DSP is in D0I0 during sof_ipc4_set_get_data()
	pwm: Make .get_state() callback return an error code
	pwm: hibvt: Explicitly set .polarity in .get_state()
	pwm: cros-ec: Explicitly set .polarity in .get_state()
	pwm: iqs620a: Explicitly set .polarity in .get_state()
	pwm: sprd: Explicitly set .polarity in .get_state()
	pwm: meson: Explicitly set .polarity in .get_state()
	ASoC: codecs: lpass: fix the order or clks turn off during suspend
	KVM: s390: pv: fix external interruption loop not always detected
	wifi: mac80211: fix the size calculation of ieee80211_ie_len_eht_cap()
	wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
	net: qrtr: Fix a refcount bug in qrtr_recvmsg()
	net: phylink: add phylink_expects_phy() method
	net: stmmac: check if MAC needs to attach to a PHY
	net: stmmac: remove redundant fixup to support fixed-link mode
	l2tp: generate correct module alias strings
	wifi: brcmfmac: Fix SDIO suspend/resume regression
	NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
	nfsd: call op_release, even when op_func returns an error
	icmp: guard against too small mtu
	ALSA: hda/hdmi: Preserve the previous PCM device upon re-enablement
	net: don't let netpoll invoke NAPI if in xmit context
	net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit
	sctp: check send stream number after wait_for_sndbuf
	net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT
	ipv6: Fix an uninit variable access bug in __ip6_make_skb()
	platform/x86: think-lmi: Fix memory leak when showing current settings
	platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
	platform/x86: think-lmi: Clean up display of current_value on Thinkstation
	gpio: davinci: Do not clear the bank intr enable bit in save_context
	gpio: davinci: Add irq chip flag to skip set wake
	net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
	net: stmmac: fix up RX flow hash indirection table when setting channels
	sunrpc: only free unix grouplist after RCU settles
	NFSD: callback request does not use correct credential for AUTH_SYS
	ice: fix wrong fallback logic for FDIR
	ice: Reset FDIR counter in FDIR init stage
	raw: use net_hash_mix() in hash function
	raw: Fix NULL deref in raw_get_next().
	ping: Fix potentail NULL deref for /proc/net/icmp.
	ethtool: reset #lanes when lanes is omitted
	netlink: annotate lockless accesses to nlk->max_recvmsg_len
	gve: Secure enough bytes in the first TX desc for all TCP pkts
	arm64: compat: Work around uninitialized variable warning
	net: stmmac: check fwnode for phy device before scanning for phy
	cxl/pci: Fix CDAT retrieval on big endian
	cxl/pci: Handle truncated CDAT header
	cxl/pci: Handle truncated CDAT entries
	cxl/pci: Handle excessive CDAT length
	PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
	PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
	usb: xhci: tegra: fix sleep in atomic call
	xhci: Free the command allocated for setting LPM if we return early
	xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
	usb: cdnsp: Fixes error: uninitialized symbol 'len'
	usb: dwc3: pci: add support for the Intel Meteor Lake-S
	USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
	usb: typec: altmodes/displayport: Fix configure initial pin assignment
	USB: serial: option: add Telit FE990 compositions
	USB: serial: option: add Quectel RM500U-CN modem
	drivers: iio: adc: ltc2497: fix LSB shift
	iio: adis16480: select CONFIG_CRC32
	iio: adc: qcom-spmi-adc5: Fix the channel name
	iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
	iio: dac: cio-dac: Fix max DAC write value check for 12-bit
	iio: buffer: correctly return bytes written in output buffers
	iio: buffer: make sure O_NONBLOCK is respected
	iio: light: cm32181: Unregister second I2C client if present
	tty: serial: sh-sci: Fix transmit end interrupt handler
	tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
	tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
	nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
	nilfs2: fix sysfs interface lifetime
	dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
	serial: 8250: Prevent starting up DMA Rx on THRI interrupt
	ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
	ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
	ALSA: hda/realtek: Add quirk for Clevo X370SNW
	ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
	x86/acpi/boot: Correct acpi_is_processor_usable() check
	x86/ACPI/boot: Use FADT version to check support for online capable
	KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
	KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
	mm: kfence: fix PG_slab and memcg_data clearing
	mm: kfence: fix handling discontiguous page
	coresight: etm4x: Do not access TRCIDR1 for identification
	coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
	counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
	counter: 104-quad-8: Fix Synapse action reported for Index signals
	blk-mq: directly poll requests
	iio: adc: ad7791: fix IRQ flags
	io_uring: fix return value when removing provided buffers
	io_uring: fix memory leak when removing provided buffers
	scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
	scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
	nvme: fix discard support without oncs
	cifs: sanitize paths in cifs_update_super_prepath.
	block: ublk: make sure that block size is set correctly
	block: don't set GD_NEED_PART_SCAN if scan partition failed
	perf/core: Fix the same task check in perf_event_set_output
	ftrace: Mark get_lock_parent_ip() __always_inline
	ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
	fs: drop peer group ids under namespace lock
	can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
	can: isotp: fix race between isotp_sendsmg() and isotp_release()
	can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
	can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
	ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
	ACPI: video: Make acpi_backlight=video work independent from GPU driver
	ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
	ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
	net: stmmac: Add queue reset into stmmac_xdp_open() function
	tracing/synthetic: Fix races on freeing last_cmd
	tracing/timerlat: Notify new max thread latency
	tracing/osnoise: Fix notify new tracing_max_latency
	tracing: Free error logs of tracing instances
	ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
	tracing/synthetic: Make lastcmd_mutex static
	zsmalloc: document freeable stats
	mm: vmalloc: avoid warn_alloc noise caused by fatal signal
	wifi: mt76: ignore key disable commands
	ublk: read any SQE values upfront
	drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
	drm/nouveau/disp: Support more modes by checking with lower bpc
	drm/i915: Fix context runtime accounting
	drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
	ring-buffer: Fix race while reader and writer are on the same page
	mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
	mm/hugetlb: fix uffd wr-protection for CoW optimization path
	maple_tree: fix get wrong data_end in mtree_lookup_walk()
	maple_tree: fix a potential concurrency bug in RCU mode
	blk-throttle: Fix that bps of child could exceed bps limited in parent
	drm/amd/display: Clear MST topology if it fails to resume
	drm/amdgpu: for S0ix, skip SDMA 5.x+ suspend/resume
	drm/amdgpu: skip psp suspend for IMU enabled ASICs mode2 reset
	drm/display/dp_mst: Handle old/new payload states in drm_dp_remove_payload()
	drm/i915/dp_mst: Fix payload removal during output disabling
	drm/bridge: lt9611: Fix PLL being unable to lock
	drm/i915: Use _MMIO_PIPE() for SKL_BOTTOM_COLOR
	drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
	mm: take a page reference when removing device exclusive entries
	maple_tree: remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()
	maple_tree: fix potential rcu issue
	maple_tree: reduce user error potential
	maple_tree: fix handle of invalidated state in mas_wr_store_setup()
	maple_tree: fix mas_prev() and mas_find() state handling
	maple_tree: be more cautious about dead nodes
	maple_tree: refine ma_state init from mas_start()
	maple_tree: detect dead nodes in mas_start()
	maple_tree: fix freeing of nodes in rcu mode
	maple_tree: remove extra smp_wmb() from mas_dead_leaves()
	maple_tree: add smp_rmb() to dead node detection
	maple_tree: add RCU lock checking to rcu callback functions
	mm: enable maple tree RCU mode by default.
	bpftool: Print newline before '}' for struct with padding only fields
	Linux 6.1.24

Change-Id: I475408e1166927565c7788e7095bdf2cb236c4b2
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-22 08:52:25 +00:00
Sergey Senozhatsky
0d33aa4351 zsmalloc: document freeable stats
commit 618a8a917dbf5830e2064d2fa0568940eb5d2584 upstream.

When freeable class stat was added to classes file (back in 2016) we
forgot to update zsmalloc documentation.  Fix that.

Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org
Fixes: 1120ed5483 ("mm/zsmalloc: add `freeable' column to pool stat")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:55:35 +02:00
T.J. Alumbaugh
e1cf082319 UPSTREAM: mm: multi-gen LRU: section for memcg LRU
Move memcg LRU code into a dedicated section.  Improve the design doc to
outline its architecture.

Link: https://lkml.kernel.org/r/20230118001827.1040870-5-talumbau@google.com
Change-Id: Id252e420cff7a858acb098cf2b3642da5c40f602
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 36c7b4db7c942ae9e1b111f0c6b468c8b2e33842)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
2023-04-12 16:02:15 +00:00
T.J. Alumbaugh
282363eb6f UPSTREAM: mm: multi-gen LRU: section for Bloom filters
Move Bloom filters code into a dedicated section.  Improve the design doc
to explain Bloom filter usage and connection between aging and eviction in
their use.

Link: https://lkml.kernel.org/r/20230118001827.1040870-4-talumbau@google.com
Change-Id: I73e866f687c1ed9f5c8538086aa39408b79897db
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ccbbbb85945d8f0255aa9dbc1b617017e2294f2c)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
2023-04-12 16:02:15 +00:00
T.J. Alumbaugh
4d8cf6f6f0 UPSTREAM: mm: multi-gen LRU: section for rmap/PT walk feedback
Add a section for lru_gen_look_around() in the code and the design doc.

Link: https://lkml.kernel.org/r/20230118001827.1040870-3-talumbau@google.com
Change-Id: I5097af63f61b3b69ec2abee6cdbdc33c296df213
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit db19a43d9b3a8876552f00f656008206ef9a5efa)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
2023-04-12 16:02:15 +00:00
T.J. Alumbaugh
014c372cc3 UPSTREAM: mm: multi-gen LRU: section for working set protection
Patch series "mm: multi-gen LRU: improve".

This patch series improves a few MGLRU functions, collects related
functions, and adds additional documentation.

This patch (of 7):

Add a section for working set protection in the code and the design doc.
The admin doc already contains its usage.

Link: https://lkml.kernel.org/r/20230118001827.1040870-1-talumbau@google.com
Link: https://lkml.kernel.org/r/20230118001827.1040870-2-talumbau@google.com
Change-Id: I65599075fd42951db7739a2ab7cee78516e157b3
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7b8144e63d84716f16a1b929e0c7e03ae5c4d5c1)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
2023-04-12 16:02:15 +00:00
Yu Zhao
0f410148ae UPSTREAM: mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[]
lru_gen_folio will be chained into per-node lists by the coming
lrugen->list.

Link: https://lkml.kernel.org/r/20221222041905.2431096-3-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 6df1b2212950aae2b2188c6645ea18e2a9e3fdd5)
Change-Id: I09f53e0fb2cd6b8b3adbb8a80b15dc5efbeae857
Signed-off-by: T.J. Mercier <tjmercier@google.com>
2023-03-30 10:23:50 +00:00
Linus Torvalds
f2b220ef93 A handful of relatively simple documentation fixes, plus a set of patches
catching the Chinese translation up with the front-page rework.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmNIPyAPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Y6UMH/2ZLziHH0jQkoBAIhxyUzU3ZfXLlq5Xqo6vS
 oBYfJlYLClY/dmfTc3HAI6UhhJLGTcTNDaC1H4YQdgeP6RVJruFThOYF9WgW0FFl
 SgsBKhTbi3dpfdzjID8bJM7ytkbIvV4voNh52J9L1TA3z/CPxKSiXCScFAH/o12t
 E+CMjtmgi2P8w3kqgX59FMavp3W8M8HsT6u/wVoKb+zXjjqXGFYEXTjjKUxufRf6
 QWkaQGb0PHq9+2hAhgF4vdy4tWB9lr7r2ENZ8YKUkYUYfv5KGAqt39J7A4rC+g7w
 4Rvzznd0BJv3nuZ4rdxom7cOJ77i3lmWSJ65FoDHNeQ/8VBNuZc=
 =UpLC
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.1-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A handful of relatively simple documentation fixes, plus a set of
  patches catching the Chinese translation up with the front-page
  rework"

* tag 'docs-6.1-2' of git://git.lwn.net/linux:
  Documentation: rtla: Correct command line example
  docs/zh_CN: add a man-pages link to zh_CN/index.rst
  docs/zh_CN: Rewrite the Chinese translation front page
  docs/zh_CN: add zh_CN/arch.rst
  docs/zh_CN: promote the title of zh_CN/process/index.rst
  docs/zh_CN: Update the translation of page_owner to 6.0-rc7
  docs/zh_CN: Update the translation of ksm to 6.0-rc7
  docs/howto: Replace abundoned URL of gmane.org
  Documentation: ubifs: Fix compression idiom
  Documentation/mm/page_owner.rst: delete frequently changing experimental data
  docs/zh_CN: Fix build warning
  docs: ftrace: Correct access mode
2022-10-13 10:58:32 -07:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Yixuan Cao
0719fdba54 Documentation/mm/page_owner.rst: delete frequently changing experimental data
The kernel size changes due to many factors, such as compiler
version, configuration, and the build environment. This makes
size comparison figures irrelevant to reader's setup.

Remove these figures and describe the effects of page owner
to the kernel size in general instead.

Thanks for Jonathan Corbet, Bagas Sanjaya and Mike Rapoport's
constructive suggestions.

Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Link: https://lore.kernel.org/r/20221005145525.10359-1-caoyixuan2019@email.szu.edu.cn
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-10 12:54:11 -06:00
Linus Torvalds
52abb27abf slab fixes for 6.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmM6/BMACgkQ4CHKc/GJ
 qRBqBAgAh+5JdVkYBxW4MvGEolRw0RDIBNwEwmyJI7WeAegL8FaGI3jmA5Kcww4c
 yA+lL/jcS9zQ/qwwHHoCqZoCLDFa43oiDMjSW4MI6oZpV+T6lx5uaH5kXBKsmxy5
 2dONP7kYG/eFfBGB6F9qQOLJnCz0CXeY7+O99D1Nldx0yKKUVCK0krb018p5oI6a
 RTVRASSVuEGkxvJGo4BbIR1H40s1BKTyRO9eZCKEHSanYM5SVXdBy9GTh5VQWTPk
 WLwvXmd0DehZzlPrgg3PMVPBTNGO/yplWibugWyzUqGcPIhQPk6Z76aWE4vojI2q
 f0w+86BYR2U7SBV2ZaNrGrxk/PZJyg==
 =aDgU
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fixes from Vlastimil Babka:

 - The "common kmalloc v4" series [1] by Hyeonggon Yoo.

   While the plan after LPC is to try again if it's possible to get rid
   of SLOB and SLAB (and if any critical aspect of those is not possible
   to achieve with SLUB today, modify it accordingly), it will take a
   while even in case there are no objections.

   Meanwhile this is a nice cleanup and some parts (e.g. to the
   tracepoints) will be useful even if we end up with a single slab
   implementation in the future:

      - Improves the mm/slab_common.c wrappers to allow deleting
        duplicated code between SLAB and SLUB.

      - Large kmalloc() allocations in SLAB are passed to page allocator
        like in SLUB, reducing number of kmalloc caches.

      - Removes the {kmem_cache_alloc,kmalloc}_node variants of
        tracepoints, node id parameter added to non-_node variants.

 - Addition of kmalloc_size_roundup()

   The first two patches from a series by Kees Cook [2] that introduce
   kmalloc_size_roundup(). This will allow merging of per-subsystem
   patches using the new function and ultimately stop (ab)using ksize()
   in a way that causes ongoing trouble for debugging functionality and
   static checkers.

 - Wasted kmalloc() memory tracking in debugfs alloc_traces

   A patch from Feng Tang that enhances the existing debugfs
   alloc_traces file for kmalloc caches with information about how much
   space is wasted by allocations that needs less space than the
   particular kmalloc cache provides.

 - My series [3] to fix validation races for caches with enabled
   debugging:

      - By decoupling the debug cache operation more from non-debug
        fastpaths, extra locking simplifications were possible and thus
        done afterwards.

      - Additional cleanup of PREEMPT_RT specific code on top, by Thomas
        Gleixner.

      - A late fix for slab page leaks caused by the series, by Feng
        Tang.

 - Smaller fixes and cleanups:

      - Unneeded variable removals, by ye xingchen

      - A cleanup removing a BUG_ON() in create_unique_id(), by Chao Yu

Link: https://lore.kernel.org/all/20220817101826.236819-1-42.hyeyoo@gmail.com/ [1]
Link: https://lore.kernel.org/all/20220923202822.2667581-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/all/20220823170400.26546-1-vbabka@suse.cz/ [3]

* tag 'slab-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (30 commits)
  mm/slub: fix a slab missed to be freed problem
  slab: Introduce kmalloc_size_roundup()
  slab: Remove __malloc attribute from realloc functions
  mm/slub: clean up create_unique_id()
  mm/slub: enable debugging memory wasting of kmalloc
  slub: Make PREEMPT_RT support less convoluted
  mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock()
  mm/slub: convert object_map_lock to non-raw spinlock
  mm/slub: remove slab_lock() usage for debug operations
  mm/slub: restrict sysfs validation to debug caches and make it safe
  mm/sl[au]b: check if large object is valid in __ksize()
  mm/slab_common: move declaration of __ksize() to mm/slab.h
  mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
  mm/slab_common: unify NUMA and UMA version of tracepoints
  mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
  mm/sl[au]b: generalize kmalloc subsystem
  mm/slub: move free_debug_processing() further
  mm/sl[au]b: introduce common alloc/free functions without tracepoint
  mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  mm/slab_common: cleanup kmalloc_large()
  ...
2022-10-10 10:21:22 -07:00
Qi Zheng
21fbd59136 ksm: add the ksm prefix to the names of the ksm private structures
In order to prevent the name of the private structure of ksm from being
the same as the name of the common structure used in subsequent patches,
prefix their names with ksm in advance.

Link: https://lkml.kernel.org/r/20220831031951.43152-5-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:02:43 -07:00
Vernon Yang
9a7d7a80e1 Documentation/mm: modify page_referenced to folio_referenced
Since commit b3ac04132c ("mm/rmap: Turn page_referenced() into
folio_referenced()") the page_referenced function name was modified,
so fix it up to use the correct one.

Signed-off-by: Vernon Yang <vernon2gm@gmail.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220926152032.74621-1-vernon2gm@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-29 13:16:08 -06:00
Yu Zhao
8be976a093 mm: multi-gen LRU: design doc
Add a design doc.

Link: https://lkml.kernel.org/r/20220918080010.2920238-15-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:11 -07:00
Feng Tang
6edf2576a6 mm/slub: enable debugging memory wasting of kmalloc
kmalloc's API family is critical for mm, with one nature that it will
round up the request size to a fixed one (mostly power of 2). Say
when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
could be allocated, so in worst case, there is around 50% memory
space waste.

The wastage is not a big issue for requests that get allocated/freed
quickly, but may cause problems with objects that have longer life
time.

We've met a kernel boot OOM panic (v5.10), and from the dumped slab
info:

    [   26.062145] kmalloc-2k            814056KB     814056KB

From debug we found there are huge number of 'struct iova_magazine',
whose size is 1032 bytes (1024 + 8), so each allocation will waste
1016 bytes. Though the issue was solved by giving the right (bigger)
size of RAM, it is still nice to optimize the size (either use a
kmalloc friendly size or create a dedicated slab for it).

And from lkml archive, there was another crash kernel OOM case [1]
back in 2019, which seems to be related with the similar slab waste
situation, as the log is similar:

    [    4.332648] iommu: Adding device 0000:20:02.0 to group 16
    [    4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
    ...
    [    4.857565] kmalloc-2048           59164KB      59164KB

The crash kernel only has 256M memory, and 59M is pretty big here.
(Note: the related code has been changed and optimised in recent
kernel [2], these logs are just picked to demo the problem, also
a patch changing its size to 1024 bytes has been merged)

So add an way to track each kmalloc's memory waste info, and
leverage the existing SLUB debug framework (specifically
SLUB_STORE_USER) to show its call stack of original allocation,
so that user can evaluate the waste situation, identify some hot
spots and optimize accordingly, for a better utilization of memory.

The waste info is integrated into existing interface:
'/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of
'kmalloc-4k' after boot is:

 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1
     __kmem_cache_alloc_node+0x11f/0x4e0
     __kmalloc_node+0x4e/0x140
     ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe]
     ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe]
     ixgbe_probe+0x165f/0x1d20 [ixgbe]
     local_pci_probe+0x78/0xc0
     work_for_cpu_fn+0x26/0x40
     ...

which means in 'kmalloc-4k' slab, there are 126 requests of
2240 bytes which got a 4KB space (wasting 1856 bytes each
and 233856 bytes in total), from ixgbe_alloc_q_vector().

And when system starts some real workload like multiple docker
instances, there could are more severe waste.

[1]. https://lkml.org/lkml/2019/8/12/266
[2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/

[Thanks Hyeonggon for pointing out several bugs about sorting/format]
[Thanks Vlastimil for suggesting way to reduce memory usage of
 orig_size and keep it only for kmalloc objects]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-23 12:32:45 +02:00
Kassey Li
8f0efa81df mm/page_owner.c: add llseek for page_owner
It is too slow to dump all the pages, in some usage we just want to dump a
given start pfn, for example: a CMA range or a single page.

To speed up and save time, this change allows specifying of a start pfn by
adding llseek for page_owner.

Link: https://lkml.kernel.org/r/20220818022425.31056-1-quic_yingangl@quicinc.com
Signed-off-by: Kassey Li <quic_yingangl@quicinc.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:59 -07:00
Fabio M. De Francesco
a9e9c93966 Documentation/mm: add details about kmap_local_page() and preemption
What happens if a thread is preempted after mapping pages with
kmap_local_page() was questioned recently.[1]

Commit f3ba3c710a ("mm/highmem: Provide kmap_local*") from Thomas
Gleixner explains clearly that on context switch, the maps of an outgoing
task are removed and the map of the incoming task are restored and that
kmap_local_page() can be invoked from both preemptible and atomic
contexts.[2]

Therefore, for the purpose to make it clearer that users can call
kmap_local_page() from contexts that allow preemption, rework a couple of
sentences and add further information in highmem.rst.

[1] https://lore.kernel.org/lkml/5303077.Sb9uPGUboI@opensuse/
[2] https://lore.kernel.org/all/20201118204007.468533059@linutronix.de/

Link: https://lkml.kernel.org/r/20220728154844.10874-8-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:46 -07:00
Fabio M. De Francesco
84b86f6054 Documentation/mm: rrefer kmap_local_page() and avoid kmap()
The reasoning for converting kmap() to kmap_local_page() was questioned
recently.[1]

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) kmap() also requires global TLB invalidation when
its pool wraps and it might block when the mapping space is fully utilized
until a slot becomes available.

Warn users to avoid the use of kmap() and instead use kmap_local_page(),
by designing their code to map pages in the same context the mapping will
be used.

[1] https://lore.kernel.org/lkml/1891319.taCxCBeP46@opensuse/

Link: https://lkml.kernel.org/r/20220728154844.10874-6-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:45 -07:00
Fabio M. De Francesco
6b3afe2eee Documentation/mm: avoid invalid use of addresses from kmap_local_page()
Users of kmap_local_page() must be absolutely sure to not hand kernel
virtual address obtained calling kmap_local_page() on highmem pages to
other contexts because those pointers are thread local, therefore, they
are no longer valid across different contexts.

Extend the documentation of kmap_local_page() to warn users about the
above-mentioned potential invalid use of pointers returned by
kmap_local_page().

Link: https://lkml.kernel.org/r/20220728154844.10874-5-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:45 -07:00
Fabio M. De Francesco
516ea046ec Documentation/mm: don't kmap*() pages which can't come from HIGHMEM
There is no need to kmap*() pages which are guaranteed to come from
ZONE_NORMAL (or lower).  Linux has currently several call sites of
kmap{,_atomic,_local_page}() on pages which are clearly known which can't
come from ZONE_HIGHMEM.

Therefore, add a paragraph to highmem.rst, to explain better that a plain
page_address() may be used for getting the address of pages which cannot
come from ZONE_HIGHMEM, although it is always safe to use
kmap_local_page() / kunmap_local() also on those pages.

Link: https://lkml.kernel.org/r/20220728154844.10874-4-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:45 -07:00
Muchun Song
838691a1c0 mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst
All the comments which explains how HVO works are moved to
vmemmap_dedup.rst since

  commit 4917f55b4e ("mm/sparse-vmemmap: improve memory savings for compound devmaps")

except some comments above page_fixed_fake_head().  This commit moves
those comments to vmemmap_dedup.rst and improve vmemmap_dedup.rst as well.

Link: https://lkml.kernel.org/r/20220628092235.91270-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:43 -07:00
Muchun Song
dff033818a mm: hugetlb_vmemmap: introduce the name HVO
It it inconvenient to mention the feature of optimizing vmemmap pages
associated with HugeTLB pages when communicating with others since there
is no specific or abbreviated name for it when it is first introduced. 
Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now.

This commit also updates the document about "hugetlb_free_vmemmap" by the
way discussed in thread [1].

Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:42 -07:00
Linus Torvalds
6614a3c316 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
 
 - Some kmemleak fixes from Patrick Wang and Waiman Long
 
 - DAMON updates from SeongJae Park
 
 - memcg debug/visibility work from Roman Gushchin
 
 - vmalloc speedup from Uladzislau Rezki
 
 - more folio conversion work from Matthew Wilcox
 
 - enhancements for coherent device memory mapping from Alex Sierra
 
 - addition of shared pages tracking and CoW support for fsdax, from
   Shiyang Ruan
 
 - hugetlb optimizations from Mike Kravetz
 
 - Mel Gorman has contributed some pagealloc changes to improve latency
   and realtime behaviour.
 
 - mprotect soft-dirty checking has been improved by Peter Xu
 
 - Many other singleton patches all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
 jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
 SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
 =w/UH
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
2022-08-05 16:32:45 -07:00
Mike Rapoport
ee65728e10 docs: rename Documentation/vm to Documentation/mm
so it will be consistent with code mm directory and with
Documentation/admin-guide/mm and won't be confused with virtual machines.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Wu XiangCheng <bobwxc@email.cn>
2022-06-27 12:52:53 -07:00