android_kernel_samsung_sm8650

Author	SHA1	Message	Date
Todd Kjos	d645236cfd	ANDROID: fix kernelci build failure in vmscan.c Vendor hooks added in vmscan.c directly referenced a vendor-specific field which is only defined if CONFIG_ANDROID_VENDOR_OEM_DATA is enabled. A kernelci config wich CONFIG_ANDROID_VENDOR_OEM_DATA disabled and CONFIG_ANDROID_VENDOR_HOOKS enabled has a build-break due to the undefined field. Fixes: `3e2dc32f59` ("ANDROID: mm: create vendor hooks for memory reclaim") Change-Id: Id7d31af9cf5752eba5ba27c5d31a288230f29114 Signed-off-by: Todd Kjos <tkjos@google.com>	2023-06-09 19:30:11 +00:00
zhouwenhao	87f8c82651	ANDROID: vendor_hooks:vendor hook for madvise_cold_or_pageout_pte_range. add vendor hook in madvise_cold_or_pageout_pte_range to control the pages to be reclaimed more fine-grained. Bug: 284808098 Signed-off-by: zhouwenhao <zhouwenhao@xiaomi.com> Change-Id: I298fde436df192cea9b1541d857f3a46808e06f2	2023-06-08 22:35:52 +00:00
Liujie Xie	3efffff553	ANDROID: Allow vendor module to reclaim a memcg Export try_to_free_mem_cgroup_pages function to allow vendor modules to reclaim a memory cgroup. Bug: 192052083 Signed-off-by: Liujie Xie <xieliujie@oppo.com> (cherry picked from commit a8385d61f27b57d98fb6245a23477c6ed5db4a7c) (cherry picked from commit 1ed025b9a1c8dc1420ccf1a656797b85eacd2bdb) Change-Id: Iec6ef50f5c71c62d0c9aa6de90e56a143dac61c1 Signed-off-by: lvwenhuan <lvwenhuan@oppo.com>	2023-06-08 00:54:10 +00:00
Liujie Xie	f627d47d36	ANDROID: Export memcg functions to allow module to add new files Export cgroup_add_legacy_cftypes and a helper function to allow vendor module to expose additional files in the memory cgroup hierarchy. Bug: 192052083 Signed-off-by: Liujie Xie <xieliujie@oppo.com> (cherry picked from commit f41a95eadca98506e627b21f5cc73332bba4d95c) (cherry picked from commit bf24c43b7f90290d2ac6f8163b43ab00f8f820b9) Change-Id: Ie2b936b3e77c7ab6d740d1bb6d70e03c70a326a7 Signed-off-by: lvwenhuan <lvwenhuan@oppo.com>	2023-06-08 00:54:10 +00:00
Liujie Xie	032458b9cb	ANDROID: vendor_hooks: add hooks in mem_cgroup subsystem Add hooks to tune memory policy based on mem_cgroup. Bug: 192052083 Signed-off-by: Liujie Xie <xieliujie@oppo.com> (cherry picked from commit 1cdcf76b1532ca8092bb6601f45d27c1ed19f448) (cherry picked from commit 7af5027889c760a4e02abf7cbd1b95685af4b233) Change-Id: Ica1a5409eed86fbd466edd2c7557f94972a40175 Signed-off-by: lvwenhuan <lvwenhuan@oppo.com>	2023-06-08 00:54:10 +00:00
Kalesh Singh	b0375cb69c	BACKPORT: mm: Multi-gen LRU: remove wait_event_killable() Android 14 and later default to MGLRU [1] and field telemetry showed occasional long tail latency (>100ms) in the reclaim path. Tracing revealed priority inversion in the reclaim path. In try_to_inc_max_seq(), when high priority tasks were blocked on wait_event_killable(), the preemption of the low priority task to call wake_up_all() caused those high priority tasks to wait longer than necessary. In general, this problem is not different from others of its kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU because it introduced the new wait queue lruvec->mm_state.wait. The purpose of this new wait queue is to avoid the thundering herd problem. If many direct reclaimers rush into try_to_inc_max_seq(), only one can succeed, i.e., the one to wake up the rest, and the rest who failed might cause premature OOM kills if they do not wait. So far there is no evidence supporting this scenario, based on how often the wait has been hit. And this begs the question how useful the wait queue is in practice. Based on Minchan's recommendation, which is in line with his commit `6d4675e601` ("mm: don't be stuck to rmap lock on reclaim path") and the rest of the MGLRU code which also uses trylock when possible, remove the wait queue. [1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf Link: https://lkml.kernel.org/r/20230413214326.2147568-1-kaleshsingh@google.com Fixes: `bd74fdaea1` ("mm: multi-gen LRU: support page table walks") Change-Id: I911f3968fd1adb25171279cc5b6f48ccb7efc8de Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Minchan Kim <minchan@kernel.org> Reported-by: Wei Wang <wvw@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 7f63cf2d9b9bbe7b90f808927558a66ff737d399) Bug: 277906484 [ Kalesh Singh - Fix conflict in mm/vmscan.c (lru_gen_del_mm) ] Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2023-06-07 14:25:07 +00:00
Shreyas K K	d7e1f4f021	ANDROID: vendor hooks: Add hooks to support bootloader based hibernation Add vendor hooks to disable randomization of swap slot allocation for swap partition used for saving hibernation image. Another level of randomization of swap slots takes place at the firmware level as well in order to address the wear leveling for UFS/MMC devices, so this vendor hook checks if a block device represents the swap partition being used for saving hibernation image, if yes, the swap slot allocation for such partition is serialized at kernel level. There is a performance advantage of reading contiguous pages of hibernation image, it makes the restore logic of hibernation image simpler and faster as there are no seeks involved in the secondary storage to read multiple contiguous pages of the image. Bug: 279879797 Change-Id: I8258b5166d8c6952fe9eb91a5a9826f33b836f00 Signed-off-by: Vivek Kumar <quic_vivekuma@quicinc.com> Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>	2023-06-07 14:25:04 +00:00
Suren Baghdasaryan	a264d8efcb	BACKPORT: mm: do not increment pgfault stats when page fault handler retries If the page fault handler requests a retry, we will count the fault multiple times. This is a relatively harmless problem as the retry paths are not often requested, and the only user-visible problem is that the fault counter will be slightly higher than it should be. Nevertheless, userspace only took one fault, and should not see the fact that the kernel had to retry the fault multiple times. Move page fault accounting into mm_account_fault() and skip incomplete faults which will be accounted upon completion. Link: https://lkml.kernel.org/r/20230419175836.3857458-1-surenb@google.com Fixes: `d065bd810b` ("mm: retry page fault when blocking on disk transfer") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Peter Xu <peterx@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 53156443a30368c0759c22e54a8d5cacc1b543cc) [surenb: resolve differences in handle_mm_fault() between 6.1 and 6.4 kernel versions] Bug: 161210518 Change-Id: Ic8cd807128ffd2c77a4db2af85b64bc24cc5052b Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:03 +00:00
Suren Baghdasaryan	78c6875e2f	UPSTREAM: mm: change per-VMA lock statistics to be disabled by default Change CONFIG_PER_VMA_LOCK_STATS to be disabled by default, as most users don't need it. Add configuration help to clarify its usage. Link: https://lkml.kernel.org/r/20230428173533.18158-1-surenb@google.com Fixes: 52f238653e45 ("mm: introduce per-VMA lock statistics") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 6152e53d9671b0ccc21c1bca842617b32ccfc5d8) Bug: 161210518 Change-Id: Ibd57999a415b5433ae3b99365ea50526a35452d1 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:02 +00:00
Suren Baghdasaryan	23fcd3167e	UPSTREAM: mm/mmap: free vm_area_struct without call_rcu in exit_mmap call_rcu() can take a long time when callback offloading is enabled. Its use in the vm_area_free can cause regressions in the exit path when multiple VMAs are being freed. Because exit_mmap() is called only after the last mm user drops its refcount, the page fault handlers can't be racing with it. Any other possible user like oom-reaper or process_mrelease are already synchronized using mmap_lock. Therefore exit_mmap() can free VMAs directly, without the use of call_rcu(). Expose __vm_area_free() and use it from exit_mmap() to avoid possible call_rcu() floods and performance regressions caused by it. Link: https://lkml.kernel.org/r/20230227173632.3292573-33-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a) Bug: 161210518 Change-Id: I4fbf3ef38fdb22a3c80dcc61125ec21d2c426100 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:02 +00:00
Suren Baghdasaryan	ebbbcdfeaf	UPSTREAM: mm: introduce per-VMA lock statistics Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra statistics about handling page fault under VMA lock. Link: https://lkml.kernel.org/r/20230227173632.3292573-29-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 52f238653e452e0fda61e880f263a173d219acd1) Bug: 161210518 Change-Id: I1bc9ab9bc0307af26e0c51ba12f9ad561af5b6c8 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:01 +00:00
Suren Baghdasaryan	4e4c6989ae	UPSTREAM: mm: prevent userfaults to be handled under per-vma lock Due to the possibility of handle_userfault dropping mmap_lock, avoid fault handling under VMA lock and retry holding mmap_lock. This can be handled more gracefully in the future. Link: https://lkml.kernel.org/r/20230227173632.3292573-28-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 444eeb17437a0ef526c606e9141a415d3b7dfddd) Bug: 161210518 Change-Id: I383603d637497ea9917ad08908530f91052a17cc Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:01 +00:00
Suren Baghdasaryan	6e306e82ac	UPSTREAM: mm: prevent do_swap_page from handling page faults under VMA lock Due to the possibility of do_swap_page dropping mmap_lock, abort fault handling under VMA lock and retry holding mmap_lock. This can be handled more gracefully in the future. Link: https://lkml.kernel.org/r/20230227173632.3292573-27-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 17c05f18e54158a3eed0c22c85b7a756b63dcc01) Bug: 161210518 Change-Id: I047f4d0e0ca3b3bf9505e5cda2da768c88bed20e Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:01 +00:00
Suren Baghdasaryan	c06661eab5	UPSTREAM: mm: fall back to mmap_lock if vma->anon_vma is not yet set When vma->anon_vma is not set, page fault handler will set it by either reusing anon_vma of an adjacent VMA if VMAs are compatible or by allocating a new one. find_mergeable_anon_vma() walks VMA tree to find a compatible adjacent VMA and that requires not only the faulting VMA to be stable but also the tree structure and other VMAs inside that tree. Therefore locking just the faulting VMA is not enough for this search. Fall back to taking mmap_lock when vma->anon_vma is not set. This situation happens only on the first page fault and should not affect overall performance. Link: https://lkml.kernel.org/r/20230227173632.3292573-25-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 2ac0af1b66e3b66307f53b1cc446514308ec466d) Bug: 161210518 Change-Id: Iafacad5bda7bb138b290f38421a22d828051b067 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:01 +00:00
Suren Baghdasaryan	5949b78f6c	UPSTREAM: mm: introduce lock_vma_under_rcu to be used from arch-specific code Introduce lock_vma_under_rcu function to lookup and lock a VMA during page fault handling. When VMA is not found, can't be locked or changes after being locked, the function returns NULL. The lookup is performed under RCU protection to prevent the found VMA from being destroyed before the VMA lock is acquired. VMA lock statistics are updated according to the results. For now only anonymous VMAs can be searched this way. In other cases the function returns NULL. Link: https://lkml.kernel.org/r/20230227173632.3292573-24-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 50ee32537206140e4cf6e47024be29a84d458d49) Bug: 161210518 Change-Id: I4872bb04f5c8a515e4b31bc36c95e15b62cbd0da Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:01 +00:00
Suren Baghdasaryan	35ffa4830e	BACKPORT: mm: introduce vma detached flag Per-vma locking mechanism will search for VMA under RCU protection and then after locking it, has to ensure it was not removed from the VMA tree after we found it. To make this check efficient, introduce a vma->detached flag to mark VMAs which were removed from the VMA tree. Link: https://lkml.kernel.org/r/20230227173632.3292573-23-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 457f67be5910a2b5f1fda8af06bfe4d3492a0a4f) [surenb: vma_complete does not exist in 6.1, therefore patch is adjusted to mark VMAs detached directly in vma_expand and __vma_adjust] Bug: 161210518 Change-Id: Id1f31733cb7a36f3f1294b2be83cf3b87ba3f812 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:00 +00:00
Suren Baghdasaryan	3c6748cd51	UPSTREAM: mm/mmap: prevent pagefault handler from racing with mmu_notifier registration Page fault handlers might need to fire MMU notifications while a new notifier is being registered. Modify mm_take_all_locks to write-lock all VMAs and prevent this race with page fault handlers that would hold VMA locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same locking order as in page fault handlers. Link: https://lkml.kernel.org/r/20230227173632.3292573-22-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit eeff9a5d47f89bc641034fea05501c8a6de131cb) Bug: 161210518 Change-Id: I4176bf0e1b07f03dfc1ac7dd37d7941d5a1dbc02 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:00 +00:00
Suren Baghdasaryan	9cc64c7fb9	UPSTREAM: mm: conditionally write-lock VMA in free_pgtables Normally free_pgtables needs to lock affected VMAs except for the case when VMAs were isolated under VMA write-lock. munmap() does just that, isolating while holding appropriate locks and then downgrading mmap_lock and dropping per-VMA locks before freeing page tables. Add a parameter to free_pgtables for such scenario. Link: https://lkml.kernel.org/r/20230227173632.3292573-20-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 98e51a2239d9d419d819cd61a2e720ebf19a8b0a) Bug: 161210518 Change-Id: I3c9177cce187526407754baf7641d3741ca7b0cb Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:00 +00:00
Suren Baghdasaryan	5f1e1ab919	UPSTREAM: mm: write-lock VMAs before removing them from VMA tree Write-locking VMAs before isolating them ensures that page fault handlers don't operate on isolated VMAs. [surenb@google.com: mm/nommu: remove unnecessary VMA locking] Link: https://lkml.kernel.org/r/20230301190457.1498985-1-surenb@google.com Link: https://lore.kernel.org/all/Y%2F8CJQGNuMUTdLwP@localhost/ Link: https://lkml.kernel.org/r/20230227173632.3292573-19-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 73046fd00b069ffd198eda099dae966e152fae39) Bug: 161210518 Change-Id: Ia742da40896e6bc4e8150911596f80dca5ef3e12 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:00 +00:00
Suren Baghdasaryan	24ecdbc5e2	UPSTREAM: mm/mremap: write-lock VMA while remapping it to a new address range Write-lock VMA as locked before copying it and when copy_vma produces a new VMA. Link: https://lkml.kernel.org/r/20230227173632.3292573-18-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit d6ac235de4ba6dc659eebb5f4e5ba0a8523d8424) Bug: 161210518 Change-Id: I38b5c5689380754a366223caff30e1ac4aaf7cc4 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:25:00 +00:00
Suren Baghdasaryan	2554cb4775	FROMLIST: mm/mmap: write-lock VMAs affected by VMA expansion vma_expand changes VMA boundaries and might result in freeing an adjacent VMA. Write-lock affected VMAs to prevent concurrent page faults. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/all/20230109205336.3665937-22-surenb@google.com/ [surenb: using older v1 of patchset due to __vma_adjust() being removed in 6.2-rc4] [surenb: lock next earlier when removing it like we do in v3: https://lore.kernel.org/all/20230216051750.3125598-18-surenb@google.com/] Bug: 161210518 Change-Id: I31aff80996b4ad646bdd6861ff6479c8eb2a690a Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:59 +00:00
Suren Baghdasaryan	57b3f8a5ab	FROMLIST: mm/mmap: write-lock VMAs in vma_adjust vma_adjust modifies a VMA and possibly its neighbors. Write-lock them before making the modifications. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/all/20230109205336.3665937-21-surenb@google.com/ [surenb: using older v1 of patchset due to __vma_adjust() being removed in 6.2-rc4] [surenb: minor fixes in next_next locking inside __vma_adjust] Bug: 161210518 Change-Id: I9ab2f88c82a7071fe2f1a14c51a2e6f1b6196681 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:59 +00:00
Suren Baghdasaryan	998ec9f54d	FROMLIST: mm/mmap: write-lock VMAs before merging, splitting or expanding them Decisions about whether VMAs can be merged, split or expanded must be made while VMAs are protected from the changes which can affect that decision. For example, merge_vma uses vma->anon_vma in its decision whether the VMA can be merged. Meanwhile, page fault handler changes vma->anon_vma during COW operation. Write-lock all VMAs which might be affected by a merge or split operation before making decision how such operations should be performed. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/all/20230216051750.3125598-17-surenb@google.com/ [surenb: using older v3 of patchset due to missing __vma_adjust() refactoring in 6.2-rc4 which introduced vma_prepare()] Bug: 161210518 Change-Id: I56d84aa67366a1988fc81296da7164ad7f89a5c0 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:59 +00:00
Suren Baghdasaryan	d73ebe031c	UPSTREAM: mm/khugepaged: write-lock VMA while collapsing a huge page Protect VMA from concurrent page fault handler while collapsing a huge page. Page fault handler needs a stable PMD to use PTL and relies on per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(), set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will not be detected by a page fault handler without proper locking. Before this patch, page tables can be walked under any one of the mmap_lock, the mapping lock, and the anon_vma lock; so when khugepaged unlinks and frees page tables, it must ensure that all of those either are locked or don't exist. This patch adds a fourth lock under which page tables can be traversed, and so khugepaged must also lock out that one. [surenb@google.com: vm_lock/i_mmap_rwsem inversion in retract_page_tables] Link: https://lkml.kernel.org/r/20230303213250.3555716-1-surenb@google.com [surenb@google.com: build fix] Link: https://lkml.kernel.org/r/CAJuCfpFjWhtzRE1X=J+_JjgJzNKhq-=JT8yTBSTHthwp0pqWZw@mail.gmail.com Link: https://lkml.kernel.org/r/20230227173632.3292573-16-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 55fd6fccad3172c0feaaa817f0a1283629ff183e) Bug: 161210518 Change-Id: I6c3cddd7861dd03fe496c4de20f284dc692c8654 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:59 +00:00
Suren Baghdasaryan	3771808d64	FROMLIST: mm/mmap: move VMA locking before vma_adjust_trans_huge call vma_adjust_trans_huge() modifies the VMA and such modifications should be done after VMA is marked as being written. Therefore move VMA flag modifications before vma_adjust_trans_huge() so that VMA is marked before all these modifications. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/all/20230216051750.3125598-15-surenb@google.com/ [surenb: using older v3 of patchset due to missing __vma_adjust() refactoring in 6.2-rc4 which introduced vma_prepare()] Bug: 161210518 Change-Id: I650162fd85fabee00a8a05ddb32318e654270cb1 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:59 +00:00
Suren Baghdasaryan	a9ea3113d4	UPSTREAM: mm: add per-VMA lock and helper functions to control it Introduce per-VMA locking. The lock implementation relies on a per-vma and per-mm sequence counters to note exclusive locking: - read lock - (implemented by vma_start_read) requires the vma (vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ. If they match then there must be a vma exclusive lock held somewhere. - read unlock - (implemented by vma_end_read) is a trivial vma->lock unlock. - write lock - (vma_start_write) requires the mmap_lock to be held exclusively and the current mm counter is assigned to the vma counter. This will allow multiple vmas to be locked under a single mmap_lock write lock (e.g. during vma merging). The vma counter is modified under exclusive vma lock. - write unlock - (vma_end_write_all) is a batch release of all vma locks held. It doesn't pair with a specific vma_start_write! It is done before exclusive mmap_lock is released by incrementing mm sequence counter (mm_lock_seq). - write downgrade - if the mmap_lock is downgraded to the read lock, all vma write locks are released as well (effectivelly same as write unlock). Link: https://lkml.kernel.org/r/20230227173632.3292573-13-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 5e31275cc997f8ec5d9e8d65fe9840ebed89db19) Bug: 161210518 Change-Id: I5e0db53a4b5562e59dd031fabbae4f97acc1bce1 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:59 +00:00
Suren Baghdasaryan	04f73ad5b4	UPSTREAM: mm: introduce CONFIG_PER_VMA_LOCK Patch series "Per-VMA locks", v4. LWN article describing the feature: https://lwn.net/Articles/906852/ Per-vma locks idea that was discussed during SPF [1] discussion at LSF/MM last year [2], which concluded with suggestion that “a reader/writer semaphore could be put into the VMA itself; that would have the effect of using the VMA as a sort of range lock. There would still be contention at the VMA level, but it would be an improvement.” This patchset implements this suggested approach. When handling page faults we lookup the VMA that contains the faulting page under RCU protection and try to acquire its lock. If that fails we fall back to using mmap_lock, similar to how SPF handled this situation. One notable way the implementation deviates from the proposal is the way VMAs are read-locked. During some of mm updates, multiple VMAs need to be locked until the end of the update (e.g. vma_merge, split_vma, etc). Tracking all the locked VMAs, avoiding recursive locks, figuring out when it's safe to unlock previously locked VMAs would make the code more complex. So, instead of the usual lock/unlock pattern, the proposed solution marks a VMA as locked and provides an efficient way to: 1. Identify locked VMAs. 2. Unlock all locked VMAs in bulk. We also postpone unlocking the locked VMAs until the end of the update, when we do mmap_write_unlock. Potentially this keeps a VMA locked for longer than is absolutely necessary but it results in a big reduction of code complexity. Read-locking a VMA is done using two sequence numbers - one in the vm_area_struct and one in the mm_struct. VMA is considered read-locked when these sequence numbers are equal. To read-lock a VMA we set the sequence number in vm_area_struct to be equal to the sequence number in mm_struct. To unlock all VMAs we increment mm_struct's seq number. This allows for an efficient way to track locked VMAs and to drop the locks on all VMAs at the end of the update. The patchset implements per-VMA locking only for anonymous pages which are not in swap and avoids userfaultfs as their implementation is more complex. Additional support for file-back page faults, swapped and user pages can be added incrementally. Performance benchmarks show similar although slightly smaller benefits as with SPF patchset (~75% of SPF benefits). Still, with lower complexity this approach might be more desirable. Since RFC was posted in September 2022, two separate Google teams outside of Android evaluated the patchset and confirmed positive results. Here are the known usecases when per-VMA locks show benefits: Android: Apps with high number of threads (~100) launch times improve by up to 20%. Each thread mmaps several areas upon startup (Stack and Thread-local storage (TLS), thread signal stack, indirect ref table), which requires taking mmap_lock in write mode. Page faults take mmap_lock in read mode. During app launch, both thread creation and page faults establishing the active workinget are happening in parallel and that causes lock contention between mm writers and readers even if updates and page faults are happening in different VMAs. Per-vma locks prevent this contention by providing more granular lock. Google Fibers: We have several dynamically sized thread pools that spawn new threads under increased load and reduce their number when idling. For example, Google's in-process scheduling/threading framework, UMCG/Fibers, is backed by such a thread pool. When idling, only a small number of idle worker threads are available; when a spike of incoming requests arrive, each request is handled in its own "fiber", which is a work item posted onto a UMCG worker thread; quite often these spikes lead to a number of new threads spawning. Each new thread needs to allocate and register an RSEQ section on its TLS, then register itself with the kernel as a UMCG worker thread, and only after that it can be considered by the in-process UMCG/Fiber scheduler as available to do useful work. In short, during an incoming workload spike new threads have to be spawned, and they perform several syscalls (RSEQ registration, UMCG worker registration, memory allocations) before they can actually start doing useful work. Removing any bottlenecks on this thread startup path will greatly improve our services' latencies when faced with request/workload spikes. At high scale, mmap_lock contention during thread creation and stack page faults leads to user-visible multi-second serving latencies in a similar pattern to Android app startup. Per-VMA locking patchset has been run successfully in limited experiments with user-facing production workloads. In these experiments, we observed that the peak thread creation rate was high enough that thread creation is no longer a bottleneck. TCP zerocopy receive: From the point of view of TCP zerocopy receive, the per-vma lock patch is massively beneficial. In today's implementation, a process with N threads where N - 1 are performing zerocopy receive and 1 thread is performing madvise() with the write lock taken (e.g. needs to change vm_flags) will result in all N -1 receive threads blocking until the madvise is done. Conversely, on a busy process receiving a lot of data, an madvise operation that does need to take the mmap lock in write mode will need to wait for all of the receives to be done - a lose:lose proposition. Per-VMA locking _removes_ by definition this source of contention entirely. There are other benefits for receive as well, chiefly a reduction in cacheline bouncing across receiving threads for locking/unlocking the single mmap lock. On an RPC style synthetic workload with 4KB RPCs: 1a) The find+lock+unlock VMA path in the base case, without the per-vma lock patchset, is about 0.7% of cycles as measured by perf. 1b) mmap_read_lock + mmap_read_unlock in the base case is about 0.5% cycles overall - most of this is within the TCP read hotpath (a small fraction is 'other' usage in the system). 2a) The find+lock+unlock VMA path, with the per-vma patchset and a trivial patch written to take advantage of it in TCP, is about 0.4% of cycles (down from 0.7% above) 2b) mmap_read_lock + mmap_read_unlock in the per-vma patchset is < 0.1% cycles and is out of the TCP read hotpath entirely (down from 0.5% before, the remaining usage is the 'other' usage in the system). So, in addition to entirely removing an onerous source of contention, it also reduces the CPU cycles of TCP receive zerocopy by about 0.5%+ (compared to overall cycles in perf) for the 'small' RPC scenario. In https://lkml.kernel.org/r/87fsaqouyd.fsf_-_@stealth, Punit demonstrated throughput improvements of as much as 188% from this patchset. This patch (of 25): This configuration variable will be used to build the support for VMA locking during page fault handling. This is enabled on supported architectures with SMP and MMU set. The architecture support is needed since the page fault handler is called from the architecture's page faulting code which needs modifications to handle faults under VMA lock. Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com Link: https://lkml.kernel.org/r/20230227173632.3292573-10-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 0b6cc04f3db3604c1485049bc9582523c2b44b75) Bug: 161210518 Change-Id: I787e1d28194655fb717d38718b2b839ef4e6226c Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:58 +00:00
Suren Baghdasaryan	ef8351241d	UPSTREAM: mm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates Provide vm_flags_reset_once() and replace the vm_flags updates which used WRITE_ONCE() to prevent compiler optimizations. Link: https://lkml.kernel.org/r/20230201000116.1333160-1-surenb@google.com Fixes: 0cce31a0aa0e ("mm: replace vma->vm_flags direct modifications with modifier calls") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 601c3c29dbeb049862faa00917f2daf094a71028) Bug: 161210518 Change-Id: Ied961a1bfbdc25b79268ba04515960c664052d61 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:58 +00:00
Suren Baghdasaryan	75977e5919	UPSTREAM: mm: export dump_mm() mmap_assert_write_locked() is used in vm_flags modifiers. Because mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes modified from inside a module, it's necessary to export dump_mm() function. Link: https://lkml.kernel.org/r/20230126193752.297968-8-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit c2fdc235300a027adc04a41b383bd78ab5da56f4) Bug: 161210518 Change-Id: I78d82d04c26c9ae3bcd118e281d2ac8531e1ad81 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:58 +00:00
Suren Baghdasaryan	2ff3b23c7f	UPSTREAM: mm: introduce __vm_flags_mod and use it in untrack_pfn There are scenarios when vm_flags can be modified without exclusive mmap_lock, such as: - after VMA was isolated and mmap_lock was downgraded or dropped - in exit_mmap when there are no other mm users and locking is unnecessary Introduce __vm_flags_mod to avoid assertions when the caller takes responsibility for the required locking. Pass a hint to untrack_pfn to conditionally use __vm_flags_mod for flags modification to avoid assertion. Link: https://lkml.kernel.org/r/20230126193752.297968-7-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 68f48381d7fdd1cbb9d88c37a4dfbb98ac78226d) Bug: 161210518 Change-Id: I6ba44b03cde4c9b96d80423d41accab1effb71ac Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:58 +00:00
Suren Baghdasaryan	5dd0547a3e	UPSTREAM: mm: replace vma->vm_flags direct modifications with modifier calls Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 1c71222e5f2393b5ea1a41795c67589eea7e3490) Bug: 161210518 Change-Id: Ifc352b487db109adab17dd33a83f5c7e68c0bbc6 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:57 +00:00
Suren Baghdasaryan	bf16383ebd	UPSTREAM: mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK To simplify the usage of VM_LOCKED_CLEAR_MASK in vm_flags_clear(), replace it with VM_LOCKED_MASK bitmask and convert all users. Link: https://lkml.kernel.org/r/20230126193752.297968-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit e430a95a04efc557bc4ff9b3035c7c85aee5d63f) Bug: 161210518 Change-Id: I17bbcc01a133511dbfaf3d82fbc4b25ecdd0b376 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-07 14:24:57 +00:00
Jaewon Kim	a390414140	ANDROID: vendor_hooks: add hooks for extra memory Add vendor hooks for extra memory. If there is extra memory, this can be accounted like other memory stats. One of the usecases could be cleancache. If some of ram memory is used for cleancache, its free, cache, and total size could be added through these vendor hooks. Bug: 283896254 Change-Id: Iad7330310528581f09842f45860f05dc84823f41 Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>	2023-06-07 01:06:25 +00:00
xiaofeng	508ca06639	ANDROID: vendor_hooks:vendor hook for control memory dirty rate When the IO pressure increases or the system performs dirty page balancing, the frame rate of the foreground application may become unstable. Therefore, a hook point is added to limit the buffer IO rate from the source. Bug: 262189942 Change-Id: I5214d611a388c5e8d87dc44ffde86ead1834ddff Signed-off-by: xiaofeng <xiaofeng5@xiaomi.com>	2023-06-06 23:03:20 +00:00
Liam R. Howlett	2ea053d317	FROMGIT: userfaultfd: fix regression in userfaultfd_unmap_prep() Android reported a performance regression in the userfaultfd unmap path. A closer inspection on the userfaultfd_unmap_prep() change showed that a second tree walk would be necessary in the reworked code. Fix the regression by passing each VMA that will be unmapped through to the userfaultfd_unmap_prep() function as they are added to the unmap list, instead of re-walking the tree for the VMA. Link: https://lkml.kernel.org/r/20230601015402.2819343-1-Liam.Howlett@oracle.com Fixes: `69dbe6daf1` ("userfaultfd: use maple tree iterator to iterate VMAs") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit de53cc0be1c8b47d595682932beb3c11be9e4e5a git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable) Bug: 274059236 Change-Id: Ia189a5e98ffe86c4ca5ac3b686ada5f51826f2ed Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-06 20:05:25 +00:00
Liam R. Howlett	2f5f352e6a	FROMGIT: BACKPORT: mm: avoid rewalk in mmap_region If the iterator has moved to the previous entry, then step forward one range, back to the gap. Link: https://lkml.kernel.org/r/20230518145544.1722059-36-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: David Binderman <dcb314@hotmail.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vernon Yang <vernon2gm@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit d3f028c7599ea2297dd630e1a6acaf4915c769d3 git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable) Bug: 274059236 Change-Id: Ic45e095c728095d41647a704a287596d03489cdf Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-06 20:05:25 +00:00
Liam R. Howlett	5ff9438fe1	FROMGIT: BACKPORT: mm/mmap: change do_vmi_align_munmap() for maple tree iterator changes The maple tree iterator clean up is incompatible with the way do_vmi_align_munmap() expects it to behave. Update the expected behaviour to map now since the change will work currently. Link: https://lkml.kernel.org/r/20230518145544.1722059-23-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: David Binderman <dcb314@hotmail.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vernon Yang <vernon2gm@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit a4d5b9fbaf42d668c1b5c7f231f79776a9419a91 git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable) [surenb: adjust for missing vma_iter_load] Bug: 274059236 Change-Id: Id05ab617a3539f885a32c7d3031098a8c005fff8 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-06 20:05:25 +00:00
Liam R. Howlett	aede79b81e	ANDROID: mm: Fix __vma_adjust() writes for the maple tree Only write when necessary to the maple tree. This should only occur when the VMA changes. In the __vma_adjust() case, it is either the vma when it is expanded, the next vma when the boundary expands into 'vma', writing the 'insert', or when vma expands/shrinks for shift_arg_pages(). The mas_preallocate() setup should track the intended write to ensure the correct number of nodes are preallocated for the pending write. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Link: `61b337f650` [surenb: __vma_adjust was removed in 6.3, therefore these fixes are not applicable upstream anymore. The patch was obtained from the author's tree] Bug: 274059236 Change-Id: I69d68a5b4ff11c40985f7b03b31eec4bb24dcbb6 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-06 20:05:25 +00:00
Liam R. Howlett	b802573f44	FROMLIST: BACKPORT: mm: Set up vma iterator for vma_iter_prealloc() calls Set the correct limits for vma_iter_prealloc() calls so that the maple tree can be smarter about how many nodes are needed. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Link: https://lore.kernel.org/lkml/20230601021605.2823123-11-Liam.Howlett@oracle.com/ [surenb: remove vma_iter-related changes not present in 6.1 kernel] Bug: 274059236 Change-Id: I05d1989e35b2e72b9346743f290da66739b3ee59 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-06 20:05:25 +00:00
Liam R. Howlett	e9fdabfc2a	FROMLIST: BACKPORT: mm: Change do_vmi_align_munmap() side tree index The majority of the calls to munmap a VMA is for a single vma. The maple tree is able to store a single entry at 0, with a size of 1 as a pointer and avoid any allocations. Change do_vmi_align_munmap() to store the VMAs being munmap()'ed into a tree indexed by the count. This will leverage the ability to store the first entry without a node allocation. Storing the entries into a tree by the count and not the vma start and end means changing the functions which iterate over the entries. Update unmap_vmas() and free_pgtables() to take a maple state and a tree end address to support this functionality. Passing through the same maple state to unmap_vmas() and free_pgtables() means the state needs to be reset between calls. This happens in the static unmap_region() and exit_mmap(). Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Link: https://lore.kernel.org/lkml/20230601021605.2823123-5-Liam.Howlett@oracle.com/ [surenb: skip changes passing maple state to unmap_vmas() and free_pgtables()] Bug: 274059236 Change-Id: If38cfecd51da884bcfdbdfdfbf955a0b338d3d60 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-06 20:05:25 +00:00
Liam R. Howlett	25bed2fdbc	UPSTREAM: mm/mmap: remove preallocation from do_mas_align_munmap() In preparation of passing the vma state through split, the pre-allocation that occurs before the split has to be moved to after. Since the preallocation would then live right next to the store, just call store instead of preallocating. This effectively restores the potential error path of splitting and not munmap'ing which pre-dates the maple tree. Link: https://lkml.kernel.org/r/20230120162650.984577-12-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 0378c0a0e9e463b9e31b94fbbbc10f94b34225b6) Bug: 274059236 Change-Id: I3539fb3a08043dae1bc8aaa6c7f285711a0b5548 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-06 20:05:25 +00:00
Sooyong Suk	aee36dd530	ANDROID: mm: add vendor hooks in madvise for swap entry Add vendor hooks in madvise for swap entry - android_vh_madvise_pageout_swap_entry - android_vh_madvise_swapin_walk_pmd_entry - android_vh_process_madvise_end Bug: 284059805 Change-Id: Ic389244e343737a583286c20cadb6774efd8890c Signed-off-by: Sooyong Suk <s.suk@samsung.com>	2023-06-05 23:12:28 +00:00
Peter Collingbourne	131714e34b	FROMLIST: mm: Call arch_swap_restore() from unuse_pte() We would like to move away from requiring architectures to restore metadata from swap in the set_pte_at() implementation, as this is not only error-prone but adds complexity to the arch-specific code. This requires us to call arch_swap_restore() before calling swap_free() whenever pages are restored from swap. We are currently doing so everywhere except in unuse_pte(); do so there as well. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/I68276653e612d64cde271ce1b5a99ae05d6bbc4f Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/all/20230523004312.1807357-3-pcc@google.com/ Change-Id: I68276653e612d64cde271ce1b5a99ae05d6bbc4f Bug: 274890466	2023-06-05 21:53:19 +00:00
Peter Collingbourne	3805b879f5	FROMLIST: mm: Call arch_swap_restore() from do_swap_page() Commit `c145e0b47c` ("mm: streamline COW logic in do_swap_page()") moved the call to swap_free() before the call to set_pte_at(), which meant that the MTE tags could end up being freed before set_pte_at() had a chance to restore them. Fix it by adding a call to the arch_swap_restore() hook before the call to swap_free(). Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965 Cc: <stable@vger.kernel.org> # 6.1 Fixes: `c145e0b47c` ("mm: streamline COW logic in do_swap_page()") Reported-by: Qun-wei Lin (林群崴) <Qun-wei.Lin@mediatek.com> Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/ Acked-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/all/20230523004312.1807357-2-pcc@google.com/ Change-Id: I6470efa669e8bd2f841049b8c61020c510678965 Bug: 274890466	2023-06-05 21:53:19 +00:00
xiaofeng	025b5a487b	ANDROID: vendor_hooks:vendor hook for __alloc_pages_slowpath. add vendor hook in __alloc_pages_slowpath ahead of __alloc_pages_direct_reclaim and warn_alloc. Bug: 243629905 Change-Id: Ieacc6cf79823c0bfacfdeec9afb55ed66f40d0b0 Signed-off-by: xiaofeng <xiaofeng5@xiaomi.com>	2023-06-05 16:38:22 +00:00
Dezhi Huang	3e2dc32f59	ANDROID: mm: create vendor hooks for memory reclaim we try to adjust page reclaim operations based on the running task and kernel memory pressure. Thus, we want to create some vendor hooks into kernel6.1. Firstly, we add ADNRROID_VENDOR_DATA into the struct scan_control, special operations would be performed based on this special scan option. We measure the importance of the current process in the system and obtain its weight, which is recorded in ANDROID_VENDOR_DATA. The hook function: trace_android_vh_modify_scan_control is added inside of the function modify_scan_control() to adjust reclaim operations based on memory pressure. The hook function: trace_android_vh_should_continue_reclaim is added inside of the function shrink_node() to decide if page_reclaim would continue or not based on memory pressure. The hook function: trace_android_vh_file_is_tiny_bypass is added into the function prepare_scan_count() to decide if the file pages should be skipped in condition to file refualts and memory pressure. Bug: 279793370 Change-Id: I1efe9d3e866f37b0295c7cd94ec8ca0117a9bd4a Signed-off-by: Dezhi Huang <huangdezhi@hihonor.com>	2023-06-05 16:31:49 +00:00
Zhenhua Huang	78fe8913d1	UPSTREAM: mm,kfence: decouple kfence from page granularity mapping judgement Kfence only needs its pool to be mapped as page granularity, if it is inited early. Previous judgement was a bit over protected. From [1], Mark suggested to "just map the KFENCE region a page granularity". So I decouple it from judgement and do page granularity mapping for kfence pool only. Need to be noticed that late init of kfence pool still requires page granularity mapping. Page granularity mapping in theory cost more(2M per 1GB) memory on arm64 platform. Like what I've tested on QEMU(emulated 1GB RAM) with gki_defconfig, also turning off rodata protection: Before: [root@liebao ]# cat /proc/meminfo MemTotal: 999484 kB After: [root@liebao ]# cat /proc/meminfo MemTotal: 1001480 kB To implement this, also relocate the kfence pool allocation before the linear mapping setting up, arm64_kfence_alloc_pool is to allocate phys addr, __kfence_pool is to be set after linear mapping set up. LINK: [1] https://lore.kernel.org/linux-arm-kernel/Y+IsdrvDNILA59UN@FVFF77S0Q05N/ Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com Signed-off-by: Will Deacon <will@kernel.org> BUG: 284812202 Change-Id: I8e7c565d3f4d6349a028a6a060259d62cf5beee7 (cherry picked from commit bfa7965b33ab79fc3b2f8adc14704075fe2416cd) Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>	2023-05-31 17:22:42 +00:00
Tetsuo Handa	8035e57ec7	UPSTREAM: mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock commit 1007843a91909a4995ee78a538f62d8665705b66 upstream. syzbot is reporting circular locking dependency which involves zonelist_update_seq seqlock [1], for this lock is checked by memory allocation requests which do not need to be retried. One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler. CPU0 ---- __build_all_zonelists() { write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd // e.g. timer interrupt handler runs at this moment some_timer_func() { kmalloc(GFP_ATOMIC) { __alloc_pages_slowpath() { read_seqbegin(&zonelist_update_seq) { // spins forever because zonelist_update_seq.seqcount is odd } } } } // e.g. timer interrupt handler finishes write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even } This deadlock scenario can be easily eliminated by not calling read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation requests. But Michal Hocko does not know whether we should go with this approach. Another deadlock scenario which syzbot is reporting is a race between kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with port->lock held and printk() from __build_all_zonelists() with zonelist_update_seq held. CPU0 CPU1 ---- ---- pty_write() { tty_insert_flip_string_and_push_buffer() { __build_all_zonelists() { write_seqlock(&zonelist_update_seq); build_zonelists() { printk() { vprintk() { vprintk_default() { vprintk_emit() { console_unlock() { console_flush_all() { console_emit_next_record() { con->write() = serial8250_console_write() { spin_lock_irqsave(&port->lock, flags); tty_insert_flip_string() { tty_insert_flip_string_fixed_flag() { __tty_buffer_request_room() { tty_buffer_alloc() { kmalloc(GFP_ATOMIC \| __GFP_NOWARN) { __alloc_pages_slowpath() { zonelist_iter_begin() { read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held } } } } } } } } spin_unlock_irqrestore(&port->lock, flags); // message is printed to console spin_unlock_irqrestore(&port->lock, flags); } } } } } } } } } write_sequnlock(&zonelist_update_seq); } } } This deadlock scenario can be eliminated by preventing interrupt context from calling kmalloc(GFP_ATOMIC) and preventing printk() from calling console_flush_all() while zonelist_update_seq.seqcount is odd. Since Petr Mladek thinks that __build_all_zonelists() can become a candidate for deferring printk() [2], let's address this problem by disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) and disabling synchronous printk() in order to avoid console_flush_all() . As a side effect of minimizing duration of zonelist_update_seq.seqcount being odd by disabling synchronous printk(), latency at read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. do not record unnecessary locking dependency) from interrupt context is still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) section... Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp Fixes: `3d36424b3b` ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2] Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1] Change-Id: Ifc0c6ed9be6d36166367811ad412bedc66ed713e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Petr Mladek <pmladek@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Patrick Daly <quic_pdaly@quicinc.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `b528537d13`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-05-31 16:27:26 +00:00
Mel Gorman	fa3ef799ad	UPSTREAM: mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages commit 4d73ba5fa710fe7d432e0b271e6fecd252aef66e upstream. A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is taking an excessive amount of time for large amounts of memory. Further testing allocating huge pages that the cost is linear i.e. if allocating 1G pages in batches of 10 then the time to allocate nr_hugepages from 10->20->30->etc increases linearly even though 10 pages are allocated at each step. Profiles indicated that much of the time is spent checking the validity within already existing huge pages and then attempting a migration that fails after isolating the range, draining pages and a whole lot of other useless work. Commit `eb14d4eefd` ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") removed two checks, one which ignored huge pages for contiguous allocations as huge pages can sometimes migrate. While there may be value on migrating a 2M page to satisfy a 1G allocation, it's potentially expensive if the 1G allocation fails and it's pointless to try moving a 1G page for a new 1G allocation or scan the tail pages for valid PFNs. Reintroduce the PageHuge check and assume any contiguous region with hugetlbfs pages is unsuitable for a new 1G allocation. The hpagealloc test allocates huge pages in batches and reports the average latency per page over time. This test happens just after boot when fragmentation is not an issue. Units are in milliseconds. hpagealloc 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6 vanilla hugeallocrevert-v1r1 hugeallocsimple-v1r2 Min Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%) 1st-qrtle Latency 356.61 ( 0.00%) 5.34 ( 98.50%) 19.85 ( 94.43%) 2nd-qrtle Latency 697.26 ( 0.00%) 5.47 ( 99.22%) 20.44 ( 97.07%) 3rd-qrtle Latency 972.94 ( 0.00%) 5.50 ( 99.43%) 20.81 ( 97.86%) Max-1 Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%) Max-5 Latency 82.14 ( 0.00%) 5.11 ( 93.78%) 19.31 ( 76.49%) Max-10 Latency 150.54 ( 0.00%) 5.20 ( 96.55%) 19.43 ( 87.09%) Max-90 Latency 1164.45 ( 0.00%) 5.53 ( 99.52%) 20.97 ( 98.20%) Max-95 Latency 1223.06 ( 0.00%) 5.55 ( 99.55%) 21.06 ( 98.28%) Max-99 Latency 1278.67 ( 0.00%) 5.57 ( 99.56%) 22.56 ( 98.24%) Max Latency 1310.90 ( 0.00%) 8.06 ( 99.39%) 26.62 ( 97.97%) Amean Latency 678.36 ( 0.00%) 5.44 * 99.20%* 20.44 * 96.99%* 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6 vanilla revert-v1 hugeallocfix-v2 Duration User 0.28 0.27 0.30 Duration System 808.66 17.77 35.99 Duration Elapsed 830.87 18.08 36.33 The vanilla kernel is poor, taking up to 1.3 second to allocate a huge page and almost 10 minutes in total to run the test. Reverting the problematic commit reduces it to 8ms at worst and the patch takes 26ms. This patch fixes the main issue with skipping huge pages but leaves the page_count() out because a page with an elevated count potentially can migrate. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022 Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.net Fixes: `eb14d4eefd` ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") Change-Id: I552f0631f15e41038219e207c994fa7702b269fa Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Yuanxi Liu <y.liu@naruida.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `059f24aff6`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-05-31 16:27:26 +00:00
Alexander Potapenko	f800df6e1f	UPSTREAM: mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() commit 47ebd0310e89c087f56e58c103c44b72a2f6b216 upstream. As reported by Dipanjan Das, when KMSAN is used together with kernel fault injection (or, generally, even without the latter), calls to kcalloc() or __vmap_pages_range_noflush() may fail, leaving the metadata mappings for the virtual mapping in an inconsistent state. When these metadata mappings are accessed later, the kernel crashes. To address the problem, we return a non-zero error code from kmsan_vmap_pages_range_noflush() in the case of any allocation/mapping failure inside it, and make vmap_pages_range_noflush() return an error if KMSAN fails to allocate the metadata. This patch also removes KMSAN_WARN_ON() from vmap_pages_range_noflush(), as these allocation failures are not fatal anymore. Link: https://lkml.kernel.org/r/20230413131223.4135168-1-glider@google.com Fixes: `b073d7f8ae` ("mm: kmsan: maintain KMSAN metadata for page operations") Change-Id: I2a50da1c7cc438a30026b2b18d425fff2ea349b6 Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `bd6f3421a5`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-05-31 15:20:12 +00:00

1 2 3 4 5 ...

19906 Commits