android_kernel_samsung_sm8650

Author	SHA1	Message	Date
Fangzheng Zhang	aa71a02cf3	ANDROID: vendor_hooks: mm: add hook to count the number of pages allocated for each slab Add a tracing interface on the kmalloc_large allocation path that can detect the number of pages allocated by the slab and, if it exceeds a threshold, trigger a panic or other action. Bug: 312897430 Change-Id: I5575d0e4f91dab1c6e074f3e907fee8ea9327fd7 Signed-off-by: Fangzheng Zhang <fangzheng.zhang@unisoc.com>	2023-11-30 18:19:39 +00:00
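The page counting that commit aa71a02cf3 describes can be pictured with a small userspace model. This is only a sketch under stated assumptions: the 4096-byte page size, every `model_*` name, and the threshold policy are hypothetical and are not taken from the vendor hook itself.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumed page size, for illustration only */

/* Smallest order such that (MODEL_PAGE_SIZE << order) covers `size` bytes. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = MODEL_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Number of pages backing a large allocation of the given order. */
static unsigned long model_pages_for_order(unsigned int order)
{
	return 1UL << order;
}

/*
 * Hypothetical policy check: returns nonzero when a request of `size`
 * bytes would be backed by more than `max_pages` pages.
 */
static int model_exceeds_threshold(size_t size, unsigned long max_pages)
{
	return model_pages_for_order(model_get_order(size)) > max_pages;
}
```

In this model a request one byte over two pages is rounded up to four pages, which is why a per-allocation page count can be much larger than the requested size suggests.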
Greg Kroah-Hartman	2950de8b2d	Merge 6.1.56 into android14-6.1-lts Changes in 6.1.56 NFS: Fix error handling for O_DIRECT write scheduling NFS: Fix O_DIRECT locking issues NFS: More O_DIRECT accounting fixes for error paths NFS: Use the correct commit info in nfs_join_page_group() NFS: More fixes for nfs_direct_write_reschedule_io() NFS/pNFS: Report EINVAL errors from connect() to the server SUNRPC: Mark the cred for revalidation if the server rejects it NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server NFSv4.1: fix pnfs MDS=DS session trunking media: v4l: Use correct dependency for camera sensor drivers media: via: Use correct dependency for camera sensor drivers netfs: Only call folio_start_fscache() one time for each folio dm: fix a race condition in retrieve_deps btrfs: improve error message after failure to add delayed dir index item btrfs: remove BUG() after failure to insert delayed dir index item ext4: replace the traditional ternary conditional operator with with max()/min() ext4: move setting of trimmed bit into ext4_try_to_trim_range() ext4: do not let fstrim block system suspend netfilter: nf_tables: don't skip expired elements during walk netfilter: nf_tables: GC transaction API to avoid race with control plane netfilter: nf_tables: adapt set backend to use GC transaction API netfilter: nft_set_hash: mark set element as dead when deleting from packet path netfilter: nf_tables: remove busy mark and gc batch API netfilter: nf_tables: don't fail inserts if duplicate has expired netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path netfilter: nf_tables: GC transaction race with netns dismantle netfilter: nf_tables: GC transaction race with abort path netfilter: nf_tables: use correct lock to protect gc_list netfilter: nf_tables: defer gc run if previous batch is still pending netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nft_set_rbtree: use read spinlock to avoid datapath 
contention netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration netfilter: nf_tables: fix memleak when more than 255 elements expired ASoC: meson: spdifin: start hw on dai probe netfilter: nf_tables: disallow element removal on anonymous sets bpf: Avoid deadlock when using queue and stack maps from NMI ASoC: rt5640: Revert "Fix sleep in atomic context" ASoC: rt5640: Fix IRQ not being free-ed for HDA jack detect mode ALSA: hda/realtek: Splitting the UX3402 into two separate models netfilter: conntrack: fix extension size table selftests: tls: swap the TX and RX sockets in some tests net/core: Fix ETH_P_1588 flow dissector ASoC: hdaudio.c: Add missing check for devm_kstrdup ASoC: imx-audmix: Fix return error with devm_clk_get() octeon_ep: fix tx dma unmap len values in SG iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is set ASoC: SOF: core: Only call sof_ops_free() on remove if the probe was successful iavf: add iavf_schedule_aq_request() helper iavf: schedule a request immediately after add/delete vlan i40e: Fix VF VLAN offloading when port VLAN is configured netfilter, bpf: Adjust timeouts of non-confirmed CTs in bpf_ct_insert_entry() ionic: fix 16bit math issue when PAGE_SIZE >= 64KB igc: Fix infinite initialization loop with early XDP redirect ipv4: fix null-deref in ipv4_link_failure scsi: iscsi_tcp: restrict to TCP sockets powerpc/perf/hv-24x7: Update domain value check dccp: fix dccp_v4_err()/dccp_v6_err() again x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer() net: hsr: Properly parse HSRv1 supervisor frames. 
platform/x86: intel_scu_ipc: Check status after timeout in busy_loop() platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt() platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command() platform/x86: intel_scu_ipc: Fail IPC send if still busy x86/srso: Fix srso_show_state() side effect x86/srso: Fix SBPB enablement for spec_rstack_overflow=off net: hns3: add cmdq check for vf periodic service task net: hns3: fix GRE checksum offload issue net: hns3: only enable unicast promisc when mac table full net: hns3: fix fail to delete tc flower rules during reset issue net: hns3: add 5ms delay before clear firmware reset irq source net: bridge: use DEV_STATS_INC() team: fix null-ptr-deref when team device type is changed net: rds: Fix possible NULL-pointer dereference netfilter: nf_tables: disable toggling dormant table state more than once netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP i915/pmu: Move execlist stats initialization to execlist specific setup locking/seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested() net: ena: Flush XDP packets on error. bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI octeontx2-pf: Do xdp_do_flush() after redirects. 
igc: Expose tx-usecs coalesce setting to user proc: nommu: /proc/<pid>/maps: release mmap read lock proc: nommu: fix empty /proc/<pid>/maps cifs: Fix UAF in cifs_demultiplex_thread() gpio: tb10x: Fix an error handling path in tb10x_gpio_probe() i2c: mux: demux-pinctrl: check the return value of devm_kstrdup() i2c: mux: gpio: Add missing fwnode_handle_put() i2c: xiic: Correct return value check for xiic_reinit() ARM: dts: BCM5301X: Extend RAM to full 256MB for Linksys EA6500 V2 ARM: dts: samsung: exynos4210-i9100: Fix LCD screen's physical size ARM: dts: qcom: msm8974pro-castor: correct inverted X of touchscreen ARM: dts: qcom: msm8974pro-castor: correct touchscreen function names ARM: dts: qcom: msm8974pro-castor: correct touchscreen syna,nosleep-mode f2fs: optimize iteration over sparse directories f2fs: get out of a repeat loop when getting a locked data page s390/pkey: fix PKEY_TYPE_EP11_AES handling in PKEY_CLR2SECK2 IOCTL arm64: dts: qcom: sdm845-db845c: Mark cont splash memory region as reserved wifi: ath11k: fix tx status reporting in encap offload mode wifi: ath11k: Cleanup mac80211 references on failure during tx_complete scsi: qla2xxx: Select qpair depending on which CPU post_cmd() gets called scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id() drm/amdkfd: Flush TLB after unmapping for GFX v9.4.3 drm/amdkfd: Insert missing TLB flush on GFX10 and later btrfs: reset destination buffer when read_extent_buffer() gets invalid range vfio/mdev: Fix a null-ptr-deref bug for mdev_unregister_parent() MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled spi: spi-gxp: BUG: Correct spi write return value drm/bridge: ti-sn65dsi83: Do not generate HFP/HBP/HSA and EOT packet bus: ti-sysc: Use fsleep() instead of usleep_range() in sysc_reset() bus: ti-sysc: Fix missing AM35xx SoC matching firmware: arm_scmi: Harden perf domain info access firmware: arm_scmi: Fixup perf power-cost/microwatt support power: supply: mt6370: Fix missing error 
code in mt6370_chg_toggle_cfo() clk: sprd: Fix thm_parents incorrect configuration clk: tegra: fix error return case for recalc_rate ARM: dts: omap: correct indentation ARM: dts: ti: omap: Fix bandgap thermal cells addressing for omap3/4 ARM: dts: Unify pwm-omap-dmtimer node names ARM: dts: Unify pinctrl-single pin group nodes for omap4 ARM: dts: ti: omap: motorola-mapphone: Fix abe_clkctrl warning on boot bus: ti-sysc: Fix SYSC_QUIRK_SWSUP_SIDLE_ACT handling for uart wake-up power: supply: ucs1002: fix error code in ucs1002_get_property() firmware: imx-dsp: Fix an error handling path in imx_dsp_setup_channels() xtensa: add default definition for XCHAL_HAVE_DIV32 xtensa: iss/network: make functions static xtensa: boot: don't add include-dirs xtensa: umulsidi3: fix conditional expression xtensa: boot/lib: fix function prototypes power: supply: rk817: Fix node refcount leak selftests/powerpc: Use CLEAN macro to fix make warning selftests/powerpc: Pass make context to children selftests/powerpc: Fix emit_tests to work with run_kselftest.sh soc: imx8m: Enable OCOTP clock for imx8mm before reading registers arm64: dts: imx: Add imx8mm-prt8mm.dtb to build firmware: arm_ffa: Don't set the memory region attributes for MEM_LEND gpio: pmic-eic-sprd: Add can_sleep flag for PMIC EIC chip i2c: npcm7xx: Fix callback completion ordering x86/reboot: VMCLEAR active VMCSes before emergency reboot ceph: drop messages from MDS when unmounting dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock bpf: Annotate bpf_long_memcpy with data_race spi: sun6i: reduce DMA RX transfer width to single byte spi: sun6i: fix race between DMA RX transfer completion and RX FIFO drain nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid() parisc: sba: Fix compile warning wrt list of SBA devices parisc: iosapic.c: Fix sparse warnings parisc: drivers: Fix sparse warning parisc: irq: Make irq_stack_union static to avoid sparse warning scsi: qedf: Add synchronization 
between I/O completions and abort scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command selftests/ftrace: Correctly enable event in instance-event.tc ring-buffer: Avoid softlockup in ring_buffer_resize() btrfs: assert delayed node locked when removing delayed item selftests: fix dependency checker script ring-buffer: Do not attempt to read past "commit" net/smc: bugfix for smcr v2 server connect success statistic ata: sata_mv: Fix incorrect string length computation in mv_dump_mem() platform/mellanox: mlxbf-bootctl: add NET dependency into Kconfig platform/x86: asus-wmi: Support 2023 ROG X16 tablet mode thermal/of: add missing of_node_put() drm/amd/display: Don't check registers, if using AUX BL control drm/amdgpu/soc21: don't remap HDP registers for SR-IOV drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV drm/amdgpu: Handle null atom context in VBIOS info ioctl riscv: errata: fix T-Head dcache.cva encoding scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() spi: nxp-fspi: reset the FLSHxCR1 registers spi: stm32: add a delay before SPI disable ASoC: fsl: imx-pcm-rpmsg: Add SNDRV_PCM_INFO_BATCH flag spi: intel-pci: Add support for Granite Rapids SPI serial flash bpf: Clarify error expectations from bpf_clone_redirect ALSA: hda: intel-sdw-acpi: Use u8 type for link index ASoC: cs42l42: Ensure a reset pulse meets minimum pulse width. 
ASoC: cs42l42: Don't rely on GPIOD_OUT_LOW to set RESET initially low firmware: cirrus: cs_dsp: Only log list of algorithms in debug build memblock tests: fix warning: "__ALIGN_KERNEL" redefined memblock tests: fix warning ‘struct seq_file’ declared inside parameter list ASoC: imx-rpmsg: Set ignore_pmdown_time for dai_link media: vb2: frame_vector.c: replace WARN_ONCE with a comment NFSv4.1: fix zero value filehandle in post open getattr ASoC: SOF: Intel: MTL: Reduce the DSP init timeout powerpc/watchpoints: Disable preemption in thread_change_pc() powerpc/watchpoint: Disable pagefaults when getting user instruction powerpc/watchpoints: Annotate atomic context in more places ncsi: Propagate carrier gain/loss events to the NCSI controller net: hsr: Add __packed to struct hsr_sup_tlv. tsnep: Fix NAPI scheduling tsnep: Fix NAPI polling with budget 0 LoongArch: Set all reserved memblocks on Node#0 at initialization fbdev/sh7760fb: Depend on FB=y perf build: Define YYNOMEM as YYNOABORT for bison < 3.81 nvme-pci: factor the iod mempool creation into a helper nvme-pci: factor out a nvme_pci_alloc_dev helper nvme-pci: do not set the NUMA node of device if it has none wifi: ath11k: Don't drop tx_status when peer cannot be found scsi: qla2xxx: Fix NULL pointer dereference in target mode nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev smack: Record transmuting in smk_transmuted smack: Retrieve transmuting information in smack_inode_getsecurity() iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range x86/sgx: Resolves SECS reclaim vs. 
page fault for EAUG race x86/srso: Add SRSO mitigation for Hygon processors KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway KVM: SVM: Fix TSC_AUX virtualization setup KVM: x86/mmu: Open code leaf invalidation from mmu_notifier KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe() mptcp: fix bogus receive window shrinkage with multiple subflows misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux" serial: 8250_port: Check IRQ data before use nilfs2: fix potential use after free in nilfs_gccache_submit_read_data() netfilter: nf_tables: disallow rule removal from chain binding ALSA: hda: Disable power save for solving pop issue on Lenovo ThinkCentre M70q LoongArch: Define relocation types for ABI v2.10 LoongArch: numa: Fix high_memory calculation ata: libata-scsi: link ata port and scsi device ata: libata-scsi: ignore reserved bits for REPORT SUPPORTED OPERATION CODES io_uring/fs: remove sqe->rw_flags checking from LINKAT i2c: i801: unregister tco_pdev in i801_probe() error path ASoC: amd: yc: Fix non-functional mic on Lenovo 82QF and 82UG kernel/sched: Modify initial boot task idle setup sched/rt: Fix live lock between select_fallback_rq() and RT push netfilter: nf_tables: fix kdoc warnings after gc rework Revert "SUNRPC dont update timeout value on connection reset" timers: Tag (hr)timer softirq as hotplug safe drm/tests: Fix incorrect argument in drm_test_mm_insert_range arm64: defconfig: remove CONFIG_COMMON_CLK_NPCM8XX=y mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions() mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy() mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement ring-buffer: Update "shortest_full" in polling btrfs: properly report 0 avail for very full file systems media: uvcvideo: Fix OOB read bpf: Add override check to kprobe multi link attach bpf: Fix BTF_ID symbol generation 
collision bpf: Fix BTF_ID symbol generation collision in tools/ net: thunderbolt: Fix TCPv6 GSO checksum calculation fs/smb/client: Reset password pointer to NULL ata: libata-core: Fix ata_port_request_pm() locking ata: libata-core: Fix port and device removal ata: libata-core: Do not register PM operations for SAS ports ata: libata-sata: increase PMP SRST timeout to 10s drm/i915/gt: Fix reservation address in ggtt_reserve_guc_top power: supply: rk817: Add missing module alias power: supply: ab8500: Set typing and props fs: binfmt_elf_efpic: fix personality for ELF-FDPIC drm/amdkfd: Use gpu_offset for user queue's wptr drm/meson: fix memory leak on ->hpd_notify callback memcg: drop kmem.limit_in_bytes mm, memcg: reconsider kmem.limit_in_bytes deprecation ASoC: amd: yc: Fix a non-functional mic on Lenovo 82TL Linux 6.1.56 Change-Id: Id110614d91d6d60fb6c7622c5af82f219a84a30f Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-10-27 09:17:04 +00:00
Rafael Aquini	a5569bb187	mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy() commit 46a9ea6681907a3be6b6b0d43776dccc62cad6cf upstream. After the commit in Fixes:, if a module that created a slab cache does not release all of its allocated objects before destroying the cache (at rmmod time), we might end up releasing the kmem_cache object without removing it from the slab_caches list thus corrupting the list as kmem_cache_destroy() ignores the return value from shutdown_cache(), which in turn never removes the kmem_cache object from slabs_list in case __kmem_cache_shutdown() fails to release all of the cache's slabs. This is easily observable on a kernel built with CONFIG_DEBUG_LIST=y as after that ill release the system will immediately trip on list_add, or list_del, assertions similar to the one shown below as soon as another kmem_cache gets created, or destroyed: [ 1041.213632] list_del corruption. next->prev should be ffff89f596fb5768, but was 52f1e5016aeee75d. (next=ffff89f595a1b268) [ 1041.219165] ------------[ cut here ]------------ [ 1041.221517] kernel BUG at lib/list_debug.c:62! 
[ 1041.223452] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 1041.225408] CPU: 2 PID: 1852 Comm: rmmod Kdump: loaded Tainted: G B W OE 6.5.0 #15 [ 1041.228244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 1041.231212] RIP: 0010:__list_del_entry_valid+0xae/0xb0 Another quick way to trigger this issue, in a kernel with CONFIG_SLUB=y, is to set slub_debug to poison the released objects and then just run cat /proc/slabinfo after removing the module that leaks slab objects, in which case the kernel will panic: [ 50.954843] general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 [#1] PREEMPT SMP PTI [ 50.961545] CPU: 2 PID: 1495 Comm: cat Kdump: loaded Tainted: G B W OE 6.5.0 #15 [ 50.966808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 50.972663] RIP: 0010:get_slabinfo+0x42/0xf0 This patch fixes this issue by properly checking shutdown_cache()'s return value before taking the kmem_cache_release() branch. Fixes: `0495e337b7` ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock") Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-10-06 14:57:03 +02:00
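The control flow that commit a5569bb187 describes (release the cache only if shutdown succeeded and unlinked it) can be sketched in userspace. This is a simplified model, not the kernel code: the singly linked registry stands in for slab_caches, and all `model_*` and `registry_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kmem_cache on the slab_caches list. */
struct model_cache {
	struct model_cache *next;
	int live_objects;   /* objects the owner never freed */
	bool released;      /* set once the cache object has been released */
};

static struct model_cache *model_registry;

static void registry_add(struct model_cache *c)
{
	c->next = model_registry;
	model_registry = c;
}

static void registry_del(struct model_cache *c)
{
	struct model_cache **pp;

	for (pp = &model_registry; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			c->next = NULL;
			return;
		}
	}
}

/*
 * Returns 0 and unlinks the cache on success. Returns nonzero and leaves
 * the cache on the registry when objects are still allocated.
 */
static int model_shutdown(struct model_cache *c)
{
	if (c->live_objects > 0)
		return -1;
	registry_del(c);
	return 0;
}

/*
 * The point of the fix in this model: honor the shutdown result. Releasing
 * a cache that is still linked would leave a dangling registry entry.
 */
static void model_destroy(struct model_cache *c)
{
	int err;

	assert(!c->released); /* destroying twice would be a caller bug */
	err = model_shutdown(c);
	if (!err)
		c->released = true;
}
```

A cache with leaked objects therefore stays on the registry and is never marked released, so later walks over the registry only ever see valid entries.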
Liujie Xie	573ba7b6e6	ANDROID: vendor_hooks: Add hooks for memory when debug Add vendor hooks for recording memory usage. Vendor modules allocate and manage memory themselves, so this memory might not be included in kernel memory statistics. Also, detailed references and vendor-specific information are managed only inside the modules. When problems such as memory leaks occur, this information should be shown in real time. Bug: 182443489 Bug: 234407991 Bug: 277799025 Signed-off-by: Liujie Xie <xieliujie@oppo.com> Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1 Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>	2023-05-25 21:06:40 +00:00
Vlastimil Babka	c18c20f162	mm, slab: remove duplicate kernel-doc comment for ksize() Akira reports: > "make htmldocs" reports duplicate C declaration of ksize() as follows: > /linux/Documentation/core-api/mm-api:43: ./mm/slab_common.c:1428: WARNING: Duplicate C declaration, also defined at core-api/mm-api:212. > Declaration is '.. c:function:: size_t ksize (const void *objp)'. > This is due to the kernel-doc comment for ksize() declaration added in > include/linux/slab.h by commit `05a940656e` ("slab: Introduce > kmalloc_size_roundup()"). There is an older kernel-doc comment for ksize() definition in mm/slab_common.c, which is not only duplicated, but also contradicts the new one - the additional storage discovered by ksize() should not be used by callers anymore. Delete the old kernel-doc. Reported-by: Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/all/d33440f6-40cf-9747-3340-e54ffaf7afb8@gmail.com/ Fixes: `05a940656e` ("slab: Introduce kmalloc_size_roundup()") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-11-07 17:11:27 +01:00
Kees Cook	328687151b	mm/slab_common: Restore passing "caller" for tracing The "caller" argument was accidentally being ignored in a few places that were recently refactored. Restore these "caller" arguments, instead of _RET_IP_. Fixes: `11e9734bcb` ("mm/slab_common: unify NUMA and UMA version of tracepoints") Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-11-06 21:20:46 +01:00
Vlastimil Babka	eb4940d4ad	mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace() For !CONFIG_TRACING kernels, the kmalloc() implementation tries (in cases where the allocation size is build-time constant) to save a function call, by inlining kmalloc_trace() to a kmem_cache_alloc() call. However since commit `6edf2576a6` ("mm/slub: enable debugging memory wasting of kmalloc") this path now fails to pass the original request size to be eventually recorded (for kmalloc caches with debugging enabled). We could adjust the code to call __kmem_cache_alloc_node() as the CONFIG_TRACING variant, but that would as a result inline a call with 5 parameters, bloating the kmalloc() call sites. The cost of extra function call (to kmalloc_trace()) seems like a lesser evil. It also appears that the !CONFIG_TRACING variant is incompatible with upcoming hardening efforts [1] so it's easier if we just remove it now. Kernels with no tracing are rare these days and the benefit is dubious anyway. [1] https://lore.kernel.org/linux-mm/20221101222520.never.109-kees@kernel.org/T/#m20ecf14390e406247bde0ea9cce368f469c539ed Link: https://lore.kernel.org/all/097d8fba-bd10-a312-24a3-a4068c4f424c@suse.cz/ Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-11-04 14:57:21 +01:00
Lukas Bulwahn	a207620123	mm/slab_common: repair kernel-doc for __ksize() Commit `445d41d7a7` ("Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next") resolved a conflict of two concurrent changes to __ksize(). However, it did not adjust the kernel-doc comment of __ksize(), while the name of the argument to __ksize() was renamed. Hence, ./scripts/kernel-doc -none mm/slab_common.c warns about it. Adjust the kernel-doc comment for __ksize() for make W=1 happiness. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-11-03 18:09:45 +01:00
Linus Torvalds	27bc50fc90	Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It is apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially converted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-uninitialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. 
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...	2022-10-10 17:53:04 -07:00
Vlastimil Babka	445d41d7a7	Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next The first two patches from a series by Kees Cook [1] that introduce kmalloc_size_roundup(). This will allow merging of per-subsystem patches using the new function and ultimately stop (ab)using ksize() in a way that causes ongoing trouble for debugging functionality and static checkers. [1] https://lore.kernel.org/all/20220923202822.2667581-1-keescook@chromium.org/ -- Resolved a conflict of modifying mm/slab.c __ksize() comment with a commit that unifies __ksize() implementation into mm/slab_common.c	2022-09-29 11:30:55 +02:00
Vlastimil Babka	af961f8059	Merge branch 'slab/for-6.1/slub_debug_waste' into slab/for-next A patch from Feng Tang that enhances the existing debugfs alloc_traces file for kmalloc caches with information about how much space is wasted by allocations that need less space than the particular kmalloc cache provides.	2022-09-29 11:28:26 +02:00
Kees Cook	05a940656e	slab: Introduce kmalloc_size_roundup() In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for. There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel: 1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example: data = krealloc(data, ksize(data) + 1, gfp); data_len = ksize(data); 2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators: data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); ... /* kmalloc(size) might give us more room than requested. * Put skb_shared_info exactly at the end of allocated zone, * to allow max possible filling before reallocation. */ osize = ksize(data); size = SKB_WITH_OVERHEAD(osize); In both cases, the "how much was actually allocated?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. 
It is possible to fix this by reordering the use of the "actual size" information. We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?". Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize(). [1] https://github.com/ClangBuiltLinux/linux/issues/1599 https://github.com/KSPP/linux/issues/183 [ vbabka@suse.cz: add SLOB version ] Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-29 11:10:34 +02:00
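The reordering described in the kmalloc_size_roundup() commit above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel implementation: `bucket_size_roundup()`, `struct buf`, `buf_init()` and `MAX_BUCKET` are hypothetical names, the buckets are assumed to be pure powers of two starting at 8 bytes (real kmalloc caches differ), and `malloc()` stands in for kmalloc(). The point is that the bucket size is computed before the allocation, so the allocator is asked for exactly the size the caller intends to use.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper limit for this sketch; not a kernel constant. */
#define MAX_BUCKET ((size_t)1 << 22)

/*
 * Hypothetical stand-in for kmalloc_size_roundup(): assumes power-of-two
 * buckets with a minimum of 8 bytes. Returns 0 if size exceeds MAX_BUCKET.
 */
static size_t bucket_size_roundup(size_t size)
{
	size_t bucket = 8;

	if (size > MAX_BUCKET)
		return 0;
	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Hypothetical growable buffer that claims the whole bucket up front. */
struct buf {
	char *data;
	size_t cap;
};

static int buf_init(struct buf *b, size_t min_size)
{
	/* Ask "how large would I get?" before allocating, not after. */
	size_t cap = bucket_size_roundup(min_size);

	if (cap == 0)
		return -1;
	b->data = malloc(cap);	/* malloc() stands in for kmalloc() */
	if (!b->data)
		return -1;
	b->cap = cap;		/* requested size and usable size now agree */
	return 0;
}
```

With these assumptions, a 126-byte request is rounded to 128 before the allocation, matching the "128 when 126 was asked for" example in the commit message.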
Kees Cook	9ed9cac185	slab: Remove __malloc attribute from realloc functions The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed. Thanks to Geert Uytterhoeven <geert@linux-m68k.org> for reporting build failures with gcc-8 in earlier version which tried to remove the #ifdef. While the "alloc_size" attribute is available on all GCC versions, I forgot that it gets disabled explicitly by the kernel in GCC < 9.1 due to misbehaviors. Add a note to the compiler_attributes.h entry for it. Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Marco Elver <elver@google.com> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-29 11:05:57 +02:00
Feng Tang	6edf2576a6	mm/slub: enable debugging memory wasting of kmalloc kmalloc's API family is critical for mm, and by nature it rounds up the request size to a fixed one (mostly a power of 2). For example, when a user requests '2^n + 1' bytes, 2^(n+1) bytes may actually be allocated, so in the worst case around 50% of the memory is wasted. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have a longer lifetime. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB Debugging showed a huge number of 'struct iova_magazine' objects, whose size is 1032 bytes (1024 + 8), so each allocation wastes 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from the lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related to a similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add a way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show the call stack of the original allocation, so that the user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. 
The waste info is integrated into the existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in the 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). When the system starts a real workload such as multiple docker instances, the waste could be more severe. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-23 12:32:45 +02:00
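The `waste=233856/1856` figures in the alloc_traces example above follow from simple arithmetic, sketched here in userspace C. `struct waste` and `kmalloc_waste()` are hypothetical names for illustration and are not the kernel's accounting code; the sketch only assumes that waste is the cache's object size minus the originally requested size, multiplied by the number of such allocations.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct waste {
	size_t per_object;	/* bytes unused in each object */
	size_t total;		/* bytes unused across all such objects */
};

/*
 * orig_size:  size the caller asked for
 * cache_size: object size of the kmalloc cache that served the request
 * count:      number of allocations from the same call site
 */
static struct waste kmalloc_waste(size_t orig_size, size_t cache_size,
				  size_t count)
{
	struct waste w = { 0, 0 };

	if (orig_size > cache_size)	/* request would not fit this cache */
		return w;
	w.per_object = cache_size - orig_size;
	w.total = w.per_object * count;
	return w;
}
```

For the numbers quoted in the commit message, `kmalloc_waste(2240, 4096, 126)` gives 1856 bytes per object and 233856 bytes in total, and `kmalloc_waste(1032, 2048, 1)` gives the 1016 bytes wasted per 'struct iova_magazine'.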
Vlastimil Babka	3662c13ec6	Merge branch 'slab/for-6.1/common_kmalloc' into slab/for-next The "common kmalloc v4" series [1] by Hyeonggon Yoo. - Improves the mm/slab_common.c wrappers to allow deleting duplicated code between SLAB and SLUB. - Large kmalloc() allocations in SLAB are passed to page allocator like in SLUB, reducing number of kmalloc caches. - Removes the {kmem_cache_alloc,kmalloc}_node variants of tracepoints, node id parameter added to non-_node variants. - 8 files changed, 341 insertions(+), 651 deletions(-) [1] https://lore.kernel.org/all/20220817101826.236819-1-42.hyeyoo@gmail.com/ -- Merge resolves trivial conflict in mm/slub.c with commit `5373b8a09d` ("kasan: call kasan_malloc() from __kmalloc_*track_caller()")	2022-09-23 10:32:02 +02:00
Vlastimil Babka	0467ca385f	Merge branch 'slab/for-6.1/trivial' into slab/for-next Trivial fixes and cleanups: - unneeded variable removals, by ye xingchen	2022-09-23 10:29:53 +02:00
Feng Tang	d71608a877	mm/slab_common: fix possible double free of kmem_cache When running a slub_debug test, kfence's 'test_memcache_typesafe_by_rcu' kunit test case causes a use-after-free error: BUG: KASAN: use-after-free in kobject_del+0x14/0x30 Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261 CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_address_description.constprop.0+0x87/0x2a5 print_report+0x103/0x1ed kasan_report+0xb7/0x140 kobject_del+0x14/0x30 kmem_cache_destroy+0x130/0x170 test_exit+0x1a/0x30 kunit_try_run_case+0xad/0xc0 kunit_generic_run_threadfn_adapter+0x26/0x50 kthread+0x17b/0x1b0 </TASK> The cause is inside kmem_cache_destroy(): kmem_cache_destroy acquire lock/mutex shutdown_cache schedule_work(kmem_cache_release) (if RCU flag set) release lock/mutex kmem_cache_release (if RCU flag not set) With certain timing, the scheduled work can run before the second RCU flag check, which can then read a wrong value and lead to a double kmem_cache_release(). Fix it by caching the RCU flag inside the protected area, just like 'refcnt' Fixes: `0495e337b7` ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock") Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-19 16:27:26 +02:00
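The fix described in the commit above (read the RCU flag once while the lock is held, then decide from that cached copy) can be modelled with a small single-threaded C sketch. Everything here is hypothetical and simplified: `struct toy_cache`, `toy_lock()`, `toy_unlock()`, `toy_release()` and `toy_destroy()` are illustrative names, the lock functions are no-op placeholders, and the "scheduled work running early" is simulated by calling the release function directly inside the protected area.

```c
#include <stdbool.h>

/* Hypothetical toy model; not the kernel's struct kmem_cache. */
struct toy_cache {
	bool rcu_flag;		/* stand-in for the RCU flag */
	int release_count;	/* number of times release ran */
};

/* No-op placeholders marking where the real lock/mutex would be held. */
static void toy_lock(void) { }
static void toy_unlock(void) { }

/*
 * Stand-in for kmem_cache_release(). After it has run, the cache's
 * fields must no longer be trusted.
 */
static void toy_release(struct toy_cache *c)
{
	c->release_count++;
	c->rcu_flag = false;	/* models the flag reading back a wrong value */
}

static void toy_destroy(struct toy_cache *c)
{
	bool rcu_set;

	toy_lock();
	rcu_set = c->rcu_flag;	/* cache the flag inside the protected area */
	if (rcu_set)
		toy_release(c);	/* models the scheduled work running early */
	toy_unlock();

	/* Re-reading c->rcu_flag here instead would release a second time. */
	if (!rcu_set)
		toy_release(c);
}
```

In this model the release function runs exactly once whether or not the flag is set; replacing the final `!rcu_set` test with `!c->rcu_flag` reproduces the double release the commit message describes.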
Waiman Long	0495e337b7	mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock A circular locking problem is reported by lockdep due to the following circular locking dependency. +--> cpu_hotplug_lock --> slab_mutex --> kn->active --+ | | +-----------------------------------------------------+ The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency happens in kmem_cache_destroy(): cpus_read_lock(); mutex_lock(&slab_mutex); ==> sysfs_slab_unlink() ==> kobject_del() ==> kernfs_remove() ==> __kernfs_remove() ==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...); The backward kn->active ==> cpu_hotplug_lock dependency happens in kernfs_fop_write_iter(): kernfs_get_active(); ==> slab_attr_store() ==> cpu_partial_store() ==> flush_all(): cpus_read_lock() One way to break this circular locking chain is to avoid holding cpu_hotplug_lock and slab_mutex while deleting the kobject in sysfs_slab_unlink() which should be equivalent to doing a write_lock and write_unlock pair of the kn->active virtual lock. Since the kobject structures are not protected by slab_mutex or the cpu_hotplug_lock, we can certainly release those locks before doing the delete operation. Move sysfs_slab_unlink() and sysfs_slab_release() to the newly created kmem_cache_release() and call it outside the slab_mutex & cpu_hotplug_lock critical sections. There will be a slight delay in the deletion of sysfs files if kmem_cache_release() is called indirectly from a work function. Fixes: `5a836bf6b0` ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: David Rientjes <rientjes@google.com> Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-01 12:10:31 +02:00
Hyeonggon Yoo	d5eff73690	mm/sl[au]b: check if large object is valid in __ksize() If address of large object is not beginning of folio or size of the folio is too small, it must be invalid. WARN() and return 0 in such cases. Cc: Marco Elver <elver@google.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-01 11:44:39 +02:00
Hyeonggon Yoo	8dfa9d5540	mm/slab_common: move declaration of __ksize() to mm/slab.h __ksize() is only called by KASAN. Remove export symbol and move declaration to mm/slab.h as we don't want to grow its callers. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-01 11:44:39 +02:00
Hyeonggon Yoo	2c1d697fb8	mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc using TRACE_EVENT() macro. And then this patch does: - Do not pass pointer to struct kmem_cache to trace_kmalloc. gfp flag is enough to know if it's accounted or not. - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event. - Avoid dereferencing s->name when not using kmem_cache_free event. - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB Cc: Vasily Averin <vasily.averin@linux.dev> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-01 11:44:26 +02:00
Hyeonggon Yoo	11e9734bcb	mm/slab_common: unify NUMA and UMA version of tracepoints Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and remove _node postfix for NUMA version of tracepoints. This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node, but at this point maintaining both kmem_alloc and kmem_alloc_node event classes does not make sense at all. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-01 10:40:27 +02:00
Hyeonggon Yoo	26a40990ba	mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace() Despite its name, kmem_cache_alloc[_node]_trace() is a hook for inlined kmalloc. So rename it to kmalloc[_node]_trace(). Move its implementation to slab_common.c by using __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n variants to save a function call when CONFIG_TRACING=n. Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of __assume_slab_alignment. Generally kmalloc has larger alignment requirements. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-01 10:40:27 +02:00
Hyeonggon Yoo	b140513524	mm/sl[au]b: generalize kmalloc subsystem Now everything in kmalloc subsystem can be generalized. Let's do it! Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(), kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them to slab_common.c. In the meantime, rename kmalloc_large_node_notrace() to __kmalloc_large_node() and make it static as it's now only called in slab_common.c. [ feng.tang@intel.com: adjust kfence skip list to include __kmem_cache_free so that kfence kunit tests do not fail ] Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-09-01 10:38:06 +02:00
Hyeonggon Yoo	d6a71648db	mm/slab: kmalloc: pass requests larger than order-1 page to page allocator There is not much benefit for serving large objects in kmalloc(). Let's pass large requests to page allocator like SLUB for better maintenance of common code. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-08-24 16:11:41 +02:00
Hyeonggon Yoo	c4cab55752	mm/slab_common: cleanup kmalloc_large() Now that kmalloc_large() and kmalloc_large_node() do mostly same job, make kmalloc_large() wrapper of kmalloc_large_node_notrace(). In the meantime, add missing flag fix code in kmalloc_large_node_notrace(). Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-08-24 16:11:41 +02:00
Hyeonggon Yoo	bf37d79102	mm/slab_common: kmalloc_node: pass large requests to page allocator Now that kmalloc_large_node() is in common code, pass large requests to page allocator in kmalloc_node() using kmalloc_large_node(). One problem is that currently there is no tracepoint in kmalloc_large_node(). Instead of simply putting tracepoint in it, use kmalloc_large_node{,_notrace} depending on its caller to show useful address for both inlined kmalloc_node() and __kmalloc_node_track_caller() when large objects are allocated. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-08-24 16:11:41 +02:00
Hyeonggon Yoo	a0c3b94002	mm/slub: move kmalloc_large_node() to slab_common.c In later patch SLAB will also pass requests larger than order-1 page to page allocator. Move kmalloc_large_node() to slab_common.c. Fold kmalloc_large_node_hook() into kmalloc_large_node() as there is no other caller. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-08-24 16:11:41 +02:00
Hyeonggon Yoo	e4c98d6895	mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() There is no caller of kmalloc_order_trace() except kmalloc_large(). Fold it into kmalloc_large() and remove kmalloc_order{,_trace}(). Also add tracepoint in kmalloc_large() that was previously in kmalloc_order_trace(). Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-08-24 16:11:40 +02:00
ye xingchen	610f9c00ce	mm/slab_common: Remove the unneeded result variable Return the value from __kmem_cache_shrink() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-08-23 16:03:05 +02:00
Hyeonggon Yoo	3041808b52	mm/slab_common: move generic bulk alloc/free functions to SLOB Now that only SLOB uses __kmem_cache_{alloc,free}_bulk(), move them to SLOB. No functional change intended. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-07-20 13:30:12 +02:00
Vasily Averin	b347aa7b57	mm/tracing: add 'accounted' entry into output of allocation tracepoints Slab caches marked with SLAB_ACCOUNT force accounting for every allocation from this cache even if the __GFP_ACCOUNT flag is not passed. Unfortunately, at the moment this flag is not visible in ftrace output, and this makes it difficult to analyze the accounted allocations. This patch adds a boolean "accounted" entry into trace output, and sets it to 'true' for calls that used the __GFP_ACCOUNT flag and for allocations from caches marked with SLAB_ACCOUNT. It is set to 'false' if accounting is disabled in configs. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2022-07-04 17:11:27 +02:00
Linus Torvalds	98931dd95f	Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Almost all of MM here. A few things are still getting finished off, reviewed, etc. - Yang Shi has improved the behaviour of khugepaged collapsing of readonly file-backed transparent hugepages. - Johannes Weiner has arranged for zswap memory use to be tracked and managed on a per-cgroup basis. - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime enablement of the recent huge page vmemmap optimization feature. - Baolin Wang contributes a series to fix some issues around hugetlb pagetable invalidation. - Zhenwei Pi has fixed some interactions between hwpoisoned pages and virtualization. - Tong Tiangen has enabled the use of the presently x86-only page_table_check debugging feature on arm64 and riscv. - David Vernet has done some fixup work on the memcg selftests. - Peter Xu has taught userfaultfd to handle write protection faults against shmem- and hugetlbfs-backed files. - More DAMON development from SeongJae Park - adding online tuning of the feature and support for monitoring of fixed virtual address ranges. Also easier discovery of which monitoring operations are available. - Nadav Amit has done some optimization of TLB flushing during mprotect(). - Neil Brown continues to labor away at improving our swap-over-NFS support. - David Hildenbrand has some fixes to anon page COWing versus get_user_pages(). - Peng Liu fixed some errors in the core hugetlb code. - Joao Martins has reduced the amount of memory consumed by device-dax's compound devmaps. - Some cleanups of the arch-specific pagemap code from Anshuman Khandual. 
- Muchun Song has found and fixed some errors in the TLB flushing of transparent hugepages. - Roman Gushchin has done more work on the memcg selftests. ... and, of course, many smaller fixes and cleanups. Notably, the customary million cleanup serieses from Miaohe Lin" * tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits) mm: kfence: use PAGE_ALIGNED helper selftests: vm: add the "settings" file with timeout variable selftests: vm: add "test_hmm.sh" to TEST_FILES selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests selftests: vm: add migration to the .gitignore selftests/vm/pkeys: fix typo in comment ksm: fix typo in comment selftests: vm: add process_mrelease tests Revert "mm/vmscan: never demote for memcg reclaim" mm/kfence: print disabling or re-enabling message include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace" include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion" mm: fix a potential infinite loop in start_isolate_page_range() MAINTAINERS: add Muchun as co-maintainer for HugeTLB zram: fix Kconfig dependency warning mm/shmem: fix shmem folio swapoff hang cgroup: fix an error handling path in alloc_pagecache_max_30M() mm: damon: use HPAGE_PMD_SIZE tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate nodemask.h: fix compilation error with GCC12 ...	2022-05-26 12:32:41 -07:00
Linus Torvalds	2e17ce1106	slab changes for 5.19 Merge tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Conversion of slub_debug stack traces to stackdepot, allowing more useful debugfs-based inspection for e.g. memory leak debugging. Allocation and free debugfs info now includes full traces and is sorted by the unique trace frequency. The stackdepot conversion was already attempted last year but reverted by `ae14c63a9f`. The memory overhead (while not actually enabled on boot) has been meanwhile solved by making the large stackdepot allocation dynamic. The xfstest issues haven't been reproduced on current kernel locally nor in -next, so the slab cache layout changes that originally made that bug manifest were probably not the root cause. - Refactoring of dma-kmalloc caches creation. - Trivial cleanups such as removal of unused parameters, fixes and clarifications of comments. - Hyeonggon Yoo joins as a reviewer. 
* tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: MAINTAINERS: add myself as reviewer for slab mm/slub: remove unused kmem_cache_order_objects max mm: slab: fix comment for __assume_kmalloc_alignment mm: slab: fix comment for ARCH_KMALLOC_MINALIGN mm/slub: remove unneeded return value of slab_pad_check mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache() mm/slub: remove meaningless node check in ___slab_alloc() mm/slub: remove duplicate flag in allocate_slab() mm/slub: remove unused parameter in setup_object*() mm/slab.c: fix comments slab, documentation: add description of debugfs files for SLUB caches mm/slub: sort debugfs output by frequency of stack traces mm/slub: distinguish and print stack traces in debugfs files mm/slub: use stackdepot to save stack trace in objects mm/slub: move struct track init out of set_track() lib/stackdepot: allow requesting early initialization dynamically mm/slub, kunit: Make slub_kunit unaffected by user specified flags mm/slab: remove some unused functions	2022-05-25 10:24:04 -07:00
Vlastimil Babka	e001897da6	Merge branches 'slab/for-5.19/stackdepot' and 'slab/for-5.19/refactor' into slab/for-linus	2022-05-23 11:14:32 +02:00
Peter Collingbourne	d949a8155d	mm: make minimum slab alignment a runtime property When CONFIG_KASAN_HW_TAGS is enabled we currently increase the minimum slab alignment to 16. This happens even if MTE is not supported in hardware or disabled via kasan=off, which creates an unnecessary memory overhead in those cases. Eliminate this overhead by making the minimum slab alignment a runtime property and only aligning to 16 if KASAN is enabled at runtime. On a DragonBoard 845c (non-MTE hardware) with a kernel built with CONFIG_KASAN_HW_TAGS, waiting for quiescence after a full Android boot I see the following Slab measurements in /proc/meminfo (median of 3 reboots): Before: 169020 kB After: 167304 kB [akpm@linux-foundation.org: make slab alignment type `unsigned int' to avoid casting] Link: https://linux-review.googlesource.com/id/I752e725179b43b144153f4b6f584ceb646473ead Link: https://lkml.kernel.org/r/20220427195820.1716975-2-pcc@google.com Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-05-13 07:20:07 -07:00
Marco Elver	2dfe63e61c	mm, kfence: support kmem_dump_obj() for KFENCE objects Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been producing garbage data due to the object not actually being maintained by SLAB or SLUB. Fix this by implementing __kfence_obj_info() that copies relevant information to struct kmem_obj_info when the object was allocated by KFENCE; this is called by a common kmem_obj_info(), which also calls the slab/slub/slob specific variant now called __kmem_obj_info(). For completeness, kmem_dump_obj() now displays if the object was allocated by KFENCE. Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com Fixes: `b89fb5ef0c` ("mm, kfence: insert KFENCE hooks for SLUB") Fixes: `d3fb45f370` ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-15 14:49:55 -07:00
Ohhoon Kwon	33647783de	mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache() There are four types of kmalloc_caches: KMALLOC_NORMAL, KMALLOC_CGROUP, KMALLOC_RECLAIM, and KMALLOC_DMA. While the first three types are created using new_kmalloc_cache(), KMALLOC_DMA caches are created in a separate logic. Let KMALLOC_DMA caches be also created using new_kmalloc_cache(), to enhance readability. Historically, there were only KMALLOC_NORMAL caches and KMALLOC_DMA caches in the first place, and they were initialized in two separate logics. However, when KMALLOC_RECLAIM was introduced in v4.20 via commit `1291523f2c` ("mm, slab/slub: introduce kmalloc-reclaimable caches") and KMALLOC_CGROUP was introduced in v5.14 via commit `494c1dfe85` ("mm: memcg/slab: create a new set of kmalloc-cg-<n> caches"), their creations were merged with KMALLOC_NORMAL's only. KMALLOC_DMA creation logic should be merged with them, too. By merging KMALLOC_DMA initialization with other types, the following two changes might occur: 1. The order dma-kmalloc-<n> caches added in slab_cache list may be sorted by size. i.e. the order they appear in /proc/slabinfo may change as well. 2. slab_state will be set to UP after KMALLOC_DMA is created. In case of slub, freelist randomization is dependent on slab_state>=UP, and therefore KMALLOC_DMA cache's freelist will not be randomized in creation, but will be deferred to init_freelist_randomization(). Co-developed-by: JaeSang Yoo <jsyoo5b@gmail.com> Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com> Signed-off-by: Ohhoon Kwon <ohkwon1043@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220410162511.656541-1-ohkwon1043@gmail.com	2022-04-13 09:07:06 +02:00
Oliver Glitta	5cf909c553	mm/slub: use stackdepot to save stack trace in objects Many stack traces are similar so there are many similar arrays. Stackdepot saves each unique stack only once. Replace field addrs in struct track with depot_stack_handle_t handle. Use stackdepot to save stack trace. The benefits are smaller memory overhead and possibility to aggregate per-cache statistics in the following patch using the stackdepot handle instead of matching stacks manually. [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ] This was initially merged as commit `788691464c` and reverted by commit `ae14c63a9f` due to several issues, that should now be fixed. The problem of unconditional memory overhead by stackdepot has been addressed by commit `2dba5eb1c7` ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()"), so the dependency on stackdepot will result in extra memory usage only when a slab cache tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds. The build failures on some architectures were also addressed, and the reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this patch. Signed-off-by: Oliver Glitta <glittao@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>	2022-04-06 11:03:32 +02:00
Miaohe Lin	7d6b6cc355	mm/slab_common: use helper function is_power_of_2() Use helper function is_power_of_2() to check if KMALLOC_MIN_SIZE is power of two. Minor readability improvement. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220217091609.8214-1-linmiaohe@huawei.com	2022-02-21 11:38:12 +01:00
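The check that commit `7d6b6cc355` switches to can be sketched in user space. This is a minimal sketch, assuming the usual bit trick (a nonzero value with exactly one bit set); `KMALLOC_MIN_SIZE_EXAMPLE` and `check_min_size()` are hypothetical names for illustration and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for KMALLOC_MIN_SIZE; the real value is arch-dependent. */
#define KMALLOC_MIN_SIZE_EXAMPLE 8UL

/* True when n is nonzero and has exactly one bit set. */
static inline bool is_power_of_2(unsigned long n)
{
	/* n & (n - 1) clears the lowest set bit; zero is left only for powers of two. */
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: fail loudly if the minimum size is not a power of two. */
static void check_min_size(void)
{
	assert(is_power_of_2(KMALLOC_MIN_SIZE_EXAMPLE));
}
```

A named helper states the intent directly, which is the readability gain the commit message describes.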
Linus Torvalds	f56caedaf9	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "146 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits) mm/damon: hide kernel pointer from tracepoint event mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/dbgfs: remove an unnecessary variable mm/damon: move the implementation of damon_insert_region to damon.h mm/damon: add access checking for hugetlb pages Docs/admin-guide/mm/damon/usage: update for schemes statistics mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/reclaim: provide reclamation statistics mm/damon/schemes: account how many times quota limit has exceeded mm/damon/schemes: account scheme actions that successfully applied mm/damon: remove a mistakenly added comment for a future feature Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk\|rm)_contexts Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks mm/damon: convert macro functions to static inline functions mm/damon: modify damon_rand() macro to static inline function mm/damon: move damon_rand() definition into damon.h ...	2022-01-15 20:37:06 +02:00
Quanfa Fu	0b8f0d8700	mm: fix some comment errors Link: https://lkml.kernel.org/r/20211101040208.460810-1-fuqf0919@gmail.com Signed-off-by: Quanfa Fu <fuqf0919@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-15 16:30:31 +02:00
Muchun Song	17c1736775	mm: memcontrol: make cgroup_memory_nokmem static Commit `494c1dfe85` ("mm: memcg/slab: create a new set of kmalloc-cg-<n> caches") makes cgroup_memory_nokmem global, however, it is unnecessary because there is already a function mem_cgroup_kmem_disabled() which exports it. Just make it static and replace it with mem_cgroup_kmem_disabled() in mm/slab_common.c. Link: https://lkml.kernel.org/r/20211109065418.21693-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-15 16:30:27 +02:00
Marco Elver	bed0a9b591	kasan: add ability to detect double-kmem_cache_destroy() Because mm/slab_common.c is not instrumented with software KASAN modes, it is not possible to detect use-after-free of the kmem_cache passed into kmem_cache_destroy(). In particular, because of the s->refcount-- and subsequent early return if non-zero, KASAN would never be able to see the double-free via kmem_cache_free(kmem_cache, s). To be able to detect a double-kmem_cache_destroy(), check accessibility of the kmem_cache, and in case of failure return early. While KASAN_HW_TAGS is able to detect such bugs, by checking accessibility and returning early we fail more gracefully and also avoid corrupting reused objects (where tags mismatch). A recent case of a double-kmem_cache_destroy() was detected by KFENCE: https://lkml.kernel.org/r/0000000000003f654905c168b09d@google.com, which was not detectable by software KASAN modes. Link: https://lkml.kernel.org/r/20211119142219.1519617-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-15 16:30:26 +02:00
Muchun Song	c29b5b3d33	mm: slab: make slab iterator functions static There are no external users of slab_start/next/stop(), so make them static. Also, memory.kmem.slabinfo is deprecated and now outputs nothing, so move memcg_slab_show() into mm/memcontrol.c and rename it to mem_cgroup_slab_show to be consistent with other function names. Link: https://lkml.kernel.org/r/20211109133359.32881-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-15 16:30:25 +02:00
Marco Elver	7302e91f39	mm/slab_common: use WARN() if cache still has objects on destroy Calling kmem_cache_destroy() while the cache still has objects allocated is a kernel bug, and will usually result in the entire cache being leaked. While the message in kmem_cache_destroy() resembles a warning, it is currently not implemented using a real WARN(). This is problematic for infrastructure testing the kernel, all of which rely on the specific format of WARN()s to pick up on bugs. Some 13 years ago this used to be a simple WARN_ON() in slub, but commit `d629d81957` ("slub: improve kmem_cache_destroy() error message") changed it into an open-coded warning to avoid confusion with a bug in slub itself. Instead, turn the open-coded warning into a real WARN() with the message preserved, so that test systems can actually identify these issues, and we get all the other benefits of using a normal WARN(). The warning message is extended with "when called from <caller-ip>" to make it even clearer where the fault lies. For most configurations this is only a cosmetic change, however, note that WARN() here will now also respect panic_on_warn. Link: https://lkml.kernel.org/r/20211102170733.648216-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-15 16:30:25 +02:00
Matthew Wilcox (Oracle)	7213230af5	mm: Use struct slab in kmem_obj_info() All three implementations of slab support kmem_obj_info() which reports details of an object allocated from the slab allocator. By using the slab type instead of the page type, we make it obvious that this can only be called for slabs. [ vbabka@suse.cz: also convert the related kmem_valid_obj() to folios ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>	2022-01-06 12:25:51 +01:00
Stephen Kitt	53944f171a	mm: remove HARDENED_USERCOPY_FALLBACK This has served its purpose and is no longer used. All usercopy violations appear to have been handled by now, any remaining instances (or new bugs) will cause copies to be rejected. This isn't a direct revert of commit `2d891fbc3b` ("usercopy: Allow strict enforcement of whitelists"); since usercopy_fallback is effectively 0, the fallback handling is removed too. This also removes the usercopy_fallback module parameter on slab_common. Link: https://github.com/KSPP/linux/issues/153 Link: https://lkml.kernel.org/r/20210921061149.1091163-1-steve@sk2.org Signed-off-by: Stephen Kitt <steve@sk2.org> Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joel Stanley <joel@jms.id.au> [defconfig change] Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: James Morris <jmorris@namei.org> Cc: "Serge E . Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-06 13:30:43 -07:00
Sebastian Andrzej Siewior	5a836bf6b0	mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context flush_all() flushes a specific SLAB cache on each CPU (where the cache is present). The deactivate_slab()/__free_slab() invocation happens within IPI handler and is problematic for PREEMPT_RT. The flush operation is not a frequent operation or a hot path. The per-CPU flush operation can be moved to within a workqueue. Because a workqueue handler, unlike IPI handler, does not disable irqs, flush_slab() now has to disable them for working with the kmem_cache_cpu fields. deactivate_slab() is safe to call with irqs enabled. [vbabka@suse.cz: adapt to new SLUB changes] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2021-09-04 01:12:23 +02:00
Linus Torvalds	28e92f9903	Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Bitmap parsing support for "all" as an alias for all bits - Documentation updates - Miscellaneous fixes, including some that overlap into mm and lockdep - kvfree_rcu() updates - mem_dump_obj() updates, with acks from one of the slab-allocator maintainers - RCU NOCB CPU updates, including limited deoffloading - SRCU updates - Tasks-RCU updates - Torture-test updates * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits) tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states rcu: Add missing __releases() annotation rcu: Remove obsolete rcu_read_unlock() deadlock commentary rcu: Improve comments describing RCU read-side critical sections rcu: Create an unrcu_pointer() to remove __rcu from a pointer srcu: Early test SRCU polling start rcu: Fix various typos in comments rcu/nocb: Unify timers rcu/nocb: Prepare for fine-grained deferred wakeup rcu/nocb: Only cancel nocb timer if not polling rcu/nocb: Delete bypass_timer upon nocb_gp wakeup rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup rcu/nocb: Allow de-offloading rdp leader rcu/nocb: Directly call __wake_nocb_gp() from bypass timer rcu: Don't penalize priority boosting when there is nothing to boost rcu: Point to documentation of ordering guarantees rcu: Make rcu_gp_cleanup() be noinline for tracing rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP ...	2021-07-04 12:58:33 -07:00

300 Commits