android_kernel_xiaomi_sm8450/mm
Charan Teja Kalla 90ad17575d mm/sparsemem: fix race in accessing memory_section->usage
[ Upstream commit 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 ]

The below race is observed on a PFN which falls into the device memory
region with the system memory configuration where PFN's are such that
[ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL].  Since normal zone start and end
pfn contains the device memory PFN's as well, the compaction triggered
will try on the device memory PFN's too though they end up in NOP(because
pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections).  When
from other core, the section mappings are being removed for the
ZONE_DEVICE region, that the PFN in question belongs to, on which
compaction is currently being operated is resulting into the kernel crash
with CONFIG_SPASEMEM_VMEMAP enabled.  The crash logs can be seen at [1].

compact_zone()			memunmap_pages
-------------			---------------
__pageblock_pfn_to_page
   ......
 (a)pfn_valid():
     valid_section()//return true
			      (b)__remove_pages()->
				  sparse_remove_section()->
				    section_deactivate():
				    [Free the array ms->usage and set
				     ms->usage = NULL]
     pfn_section_valid()
     [Access ms->usage which
     is NULL]

NOTE: From the above it can be said that the race is reduced to between
the pfn_valid()/pfn_section_valid() and the section deactivate with
SPASEMEM_VMEMAP enabled.

The commit b943f045a9af("mm/sparse: fix kernel crash with
pfn_section_valid check") tried to address the same problem by clearing
the SECTION_HAS_MEM_MAP with the expectation of valid_section() returns
false thus ms->usage is not accessed.

Fix this issue by the below steps:

a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.

b) RCU protected read side critical section will either return NULL
   when SECTION_HAS_MEM_MAP is cleared or can successfully access ->usage.

c) Free the ->usage with kfree_rcu() and set ms->usage = NULL.  No
   attempt will be made to access ->usage after this as the
   SECTION_HAS_MEM_MAP is cleared thus valid_section() return false.

Thanks to David/Pavan for their inputs on this patch.

[1] https://lore.kernel.org/linux-mm/994410bb-89aa-d987-1f50-f514903c55aa@quicinc.com/

On Snapdragon SoC, with the mentioned memory configuration of PFN's as
[ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL], we are able to see bunch of
issues daily while testing on a device farm.

For this particular issue below is the log.  Though the below log is
not directly pointing to the pfn_section_valid(){ ms->usage;}, when we
loaded this dump on T32 lauterbach tool, it is pointing.

[  540.578056] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
[  540.578068] Mem abort info:
[  540.578070]   ESR = 0x0000000096000005
[  540.578073]   EC = 0x25: DABT (current EL), IL = 32 bits
[  540.578077]   SET = 0, FnV = 0
[  540.578080]   EA = 0, S1PTW = 0
[  540.578082]   FSC = 0x05: level 1 translation fault
[  540.578085] Data abort info:
[  540.578086]   ISV = 0, ISS = 0x00000005
[  540.578088]   CM = 0, WnR = 0
[  540.579431] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBSBTYPE=--)
[  540.579436] pc : __pageblock_pfn_to_page+0x6c/0x14c
[  540.579454] lr : compact_zone+0x994/0x1058
[  540.579460] sp : ffffffc03579b510
[  540.579463] x29: ffffffc03579b510 x28: 0000000000235800 x27:000000000000000c
[  540.579470] x26: 0000000000235c00 x25: 0000000000000068 x24:ffffffc03579b640
[  540.579477] x23: 0000000000000001 x22: ffffffc03579b660 x21:0000000000000000
[  540.579483] x20: 0000000000235bff x19: ffffffdebf7e3940 x18:ffffffdebf66d140
[  540.579489] x17: 00000000739ba063 x16: 00000000739ba063 x15:00000000009f4bff
[  540.579495] x14: 0000008000000000 x13: 0000000000000000 x12:0000000000000001
[  540.579501] x11: 0000000000000000 x10: 0000000000000000 x9 :ffffff897d2cd440
[  540.579507] x8 : 0000000000000000 x7 : 0000000000000000 x6 :ffffffc03579b5b4
[  540.579512] x5 : 0000000000027f25 x4 : ffffffc03579b5b8 x3 :0000000000000001
[  540.579518] x2 : ffffffdebf7e3940 x1 : 0000000000235c00 x0 :0000000000235800
[  540.579524] Call trace:
[  540.579527]  __pageblock_pfn_to_page+0x6c/0x14c
[  540.579533]  compact_zone+0x994/0x1058
[  540.579536]  try_to_compact_pages+0x128/0x378
[  540.579540]  __alloc_pages_direct_compact+0x80/0x2b0
[  540.579544]  __alloc_pages_slowpath+0x5c0/0xe10
[  540.579547]  __alloc_pages+0x250/0x2d0
[  540.579550]  __iommu_dma_alloc_noncontiguous+0x13c/0x3fc
[  540.579561]  iommu_dma_alloc+0xa0/0x320
[  540.579565]  dma_alloc_attrs+0xd4/0x108

[quic_charante@quicinc.com: use kfree_rcu() in place of synchronize_rcu(), per David]
  Link: https://lkml.kernel.org/r/1698403778-20938-1-git-send-email-quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/1697202267-23600-1-git-send-email-quic_charante@quicinc.com
Fixes: f46edbd1b1 ("mm/sparsemem: add helpers track active portions of a section at boot")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23 08:42:00 +01:00
..
kasan kasan: print the original fault addr when access invalid shadow 2023-11-08 17:30:43 +01:00
backing-dev.c writeback, cgroup: remove extra percpu_ref_exit() 2023-05-30 12:57:56 +01:00
balloon_compaction.c
cleancache.c
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
cma.c mm/cma: use nth_page() in place of direct struct page manipulation 2023-11-28 16:54:58 +00:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
compaction.c mm, compaction: fix fast_isolate_around() to stay within boundaries 2023-01-14 10:16:27 +01:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-08 18:30:35 +01:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c mm/early_ioremap.c: use %pa to print resource_size_t variables 2020-01-31 10:30:38 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm/filemap: fix infinite loop in generic_file_buffered_read() 2023-09-23 11:01:10 +02:00
frame_vector.c media: vb2: frame_vector.c: replace WARN_ONCE with a comment 2023-10-10 21:53:33 +02:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
gup.c mm/migration: return errno when isolate_huge_page failed 2023-02-15 17:22:21 +01:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault 2022-01-27 10:54:36 +01:00
huge_memory.c mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage 2023-03-22 13:30:04 +01:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c mm/migration: return errno when isolate_huge_page failed 2023-02-15 17:22:21 +01:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:27 -04:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
khugepaged.c mm/khugepaged: check again on anon uffd-wp during isolation 2023-04-26 11:27:38 +02:00
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 11:32:02 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:13:07 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-25 17:45:53 +01:00
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-10-05 10:38:40 +02:00
Makefile mm,kmemleak-test.c: move kmemleak-test.c to samples dir 2020-10-13 18:38:27 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: update huge page-table entry callbacks 2020-04-02 09:35:29 -07:00
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:55:56 +01:00
memcontrol.c mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors 2023-11-28 16:55:01 +00:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 10:28:09 +02:00
memory_hotplug.c mm/memory_hotplug: use pfn math in place of direct struct page manipulation 2023-11-28 16:54:58 +00:00
memory-failure.c mm/memory-failure: check the mapcount of the precise page 2024-01-15 18:48:06 +01:00
memory.c mm: fix unmap_mapping_range high bits shift bug 2024-01-15 18:48:06 +01:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-15 17:22:21 +01:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-16 09:57:17 +01:00
memtest.c
migrate.c mm/migration: return errno when isolate_huge_page failed 2023-02-15 17:22:21 +01:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap.c mm/mmap: undo ->mmap() when arch_validate_flags() fails 2022-10-26 13:25:11 +02:00
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:31:55 +01:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-27 13:53:54 +02:00
mmzone.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-23 12:00:57 +01:00
mremap.c mm/mremap: hold the rmap lock in write mode when moving page table entries. 2022-08-21 15:15:21 +02:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c mm: remove alloc_vm_area 2020-10-18 09:27:10 -07:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 13:53:54 +02:00
page_alloc.c mm/page_alloc: correct start page when guard page debug is enabled 2023-11-08 17:30:41 +01:00
page_counter.c mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() 2020-10-13 18:38:30 -07:00
page_ext.c mm/page_ext.c: drop pfn_present() check when onlining 2020-04-07 10:43:40 -07:00
page_idle.c mm/page_idle.c: skip offline pages 2020-06-08 11:05:55 -07:00
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-04-20 09:23:25 +02:00
page_isolation.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_owner.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:29 -04:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-12 20:18:22 +01:00
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:26 -04:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
readahead.c vfs: fix readahead(2) on block devices 2023-11-20 11:06:44 +01:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:28:56 +02:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c tmpfs: verify {g,u}id mount options correctly 2023-09-19 12:20:06 +02:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c mm/slub: fix redzoning for small allocations 2021-06-23 14:42:54 +02:00
slab.c mm/sl?b.c: remove ctor argument from kmem_cache_flags 2021-05-14 09:50:45 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:39:19 +01:00
slob.c mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 11:10:28 +02:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-02-23 08:42:00 +01:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c mm: swap: get rid of livelock in swapin readahead 2022-03-23 09:13:27 +01:00
swap.c mm: move call to compound_head() in release_pages() 2020-10-13 18:38:33 -07:00
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-20 12:10:24 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:27 -04:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-08 14:40:43 +02:00
userfaultfd.c mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() 2022-05-15 20:00:09 +02:00
util.c mm: vmalloc: introduce array allocation functions 2024-02-23 08:41:55 +01:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm: add a call to flush_cache_vmap() in vmap_pfn() 2023-08-30 16:23:14 +02:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm: vmscan: fix extreme overreclaim and swap floods 2022-12-02 17:40:04 +01:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page 2021-07-14 16:56:51 +02:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:42:43 +02:00
zswap.c mm/zswap: allow setting default status, compressor and allocator in Kconfig 2020-04-07 10:43:41 -07:00