android_kernel_xiaomi_sm8450/mm
Shakeel Butt 912736a043 memcg: protect concurrent access to mem_cgroup_idr
commit 9972605a238339b85bd16b084eed5f18414d22db upstream.

Commit 73f576c04b ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures.  It introduced IDR to maintain the memcg ID
space.  The IDR depends on external synchronization mechanisms for
modifications.  For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications.  However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero.  Fix that.

We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time.  These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code.  Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object.  The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success.  No evidence were found for these cases.

Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them.  So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove().  These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them.  Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.

Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Adapted due to commit be740503ed03 ("mm: memcontrol: fix cannot alloc the
  maximum memcg ID") and 6f0df8e16eb5 ("memcontrol: ensure memcg acquired by id
  is properly set up") not in the tree ]
Signed-off-by: Tomas Krcka <krckatom@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12 11:06:51 +02:00
..
kasan kasan: print the original fault addr when access invalid shadow 2023-11-08 17:30:43 +01:00
backing-dev.c writeback, cgroup: remove extra percpu_ref_exit() 2023-05-30 12:57:56 +01:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
cma.c mm/cma: use nth_page() in place of direct struct page manipulation 2023-11-28 16:54:58 +00:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
compaction.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-13 12:59:22 +02:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-08 18:30:35 +01:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c mm/early_ioremap.c: use %pa to print resource_size_t variables 2020-01-31 10:30:38 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm/filemap: fix infinite loop in generic_file_buffered_read() 2023-09-23 11:01:10 +02:00
frame_vector.c media: vb2: frame_vector.c: replace WARN_ONCE with a comment 2023-10-10 21:53:33 +02:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
gup.c mm/migration: return errno when isolate_huge_page failed 2023-02-15 17:22:21 +01:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault 2022-01-27 10:54:36 +01:00
huge_memory.c mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage 2023-03-22 13:30:04 +01:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c mm/hugetlb: change hugetlb_reserve_pages() to type bool 2024-03-15 10:48:22 -04:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:27 -04:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
khugepaged.c mm/khugepaged: check again on anon uffd-wp during isolation 2023-04-26 11:27:38 +02:00
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 11:32:02 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:13:07 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-25 17:45:53 +01:00
madvise.c fs: add file and path permissions helpers 2024-06-21 14:52:58 +02:00
Makefile mm,kmemleak-test.c: move kmemleak-test.c to samples dir 2020-10-13 18:38:27 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: update huge page-table entry callbacks 2020-04-02 09:35:29 -07:00
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:55:56 +01:00
memcontrol.c memcg: protect concurrent access to mem_cgroup_idr 2024-09-12 11:06:51 +02:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 10:28:09 +02:00
memory_hotplug.c mm/memory_hotplug: use pfn math in place of direct struct page manipulation 2023-11-28 16:54:58 +00:00
memory-failure.c mm/memory-failure: fix an incorrect use of tail pages 2024-04-13 12:59:01 +02:00
memory.c x86/mm/pat: fix VM_PAT handling in COW mappings 2024-04-13 12:59:56 +02:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-15 17:22:21 +01:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-16 09:57:17 +01:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-04-13 12:58:39 +02:00
migrate.c mm/migrate: set swap entry values of THP tail pages properly. 2024-04-13 12:59:02 +02:00
mincore.c fs: add file and path permissions helpers 2024-06-21 14:52:58 +02:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap.c mm/mmap: undo ->mmap() when arch_validate_flags() fails 2022-10-26 13:25:11 +02:00
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:31:55 +01:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-27 13:53:54 +02:00
mmzone.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-23 12:00:57 +01:00
mremap.c mm/mremap: hold the rmap lock in write mode when moving page table entries. 2022-08-21 15:15:21 +02:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c mm: remove alloc_vm_area 2020-10-18 09:27:10 -07:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 13:53:54 +02:00
page_alloc.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-13 12:59:22 +02:00
page_counter.c mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() 2020-10-13 18:38:30 -07:00
page_ext.c mm/page_ext.c: drop pfn_present() check when onlining 2020-04-07 10:43:40 -07:00
page_idle.c mm/page_idle.c: skip offline pages 2020-06-08 11:05:55 -07:00
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-04-20 09:23:25 +02:00
page_isolation.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_owner.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:29 -04:00
page-writeback.c Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again" 2024-07-18 13:05:43 +02:00
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:26 -04:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
readahead.c vfs: fix readahead(2) on block devices 2023-11-20 11:06:44 +01:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:28:56 +02:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c tmpfs: verify {g,u}id mount options correctly 2023-09-19 12:20:06 +02:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c mm/slub: fix redzoning for small allocations 2021-06-23 14:42:54 +02:00
slab.c mm/sl?b.c: remove ctor argument from kmem_cache_flags 2021-05-14 09:50:45 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:39:19 +01:00
slob.c mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 11:10:28 +02:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-02-23 08:42:00 +01:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c mm: swap: get rid of livelock in swapin readahead 2022-03-23 09:13:27 +01:00
swap.c mm: move call to compound_head() in release_pages() 2020-10-13 18:38:33 -07:00
swapfile.c mm: swap: fix race between free_swap_and_cache() and swapoff() 2024-04-13 12:58:26 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:27 -04:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-08 14:40:43 +02:00
userfaultfd.c userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb 2024-03-01 13:16:43 +01:00
util.c mm: vmalloc: introduce array allocation functions 2024-02-23 08:41:55 +01:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm: add a call to flush_cache_vmap() in vmap_pfn() 2023-08-30 16:23:14 +02:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-13 12:59:22 +02:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page 2021-07-14 16:56:51 +02:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:42:43 +02:00
zswap.c mm/zswap: allow setting default status, compressor and allocator in Kconfig 2020-04-07 10:43:41 -07:00