In the ufshcd_clear_cmds(), the 2nd pamameter would be the
bit mask of the command to be cleared in the transfer request
door bell register. This bit mask mechanism does not scale well in
mcq mode when the queue depth becomes much greater than 64. Change the
2nd parameter to the function to be the task_tag number of the
corresponding bit to be cleared in the door bell register.
By doing so, mcq mode with a large queue depth can reuse this function.
Since the behavior of this function is changed from handling
multiple commands into a single command, rename ufshcd_clear_cmds()
into ufshcd_clear_cmd().
Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Bug: 285631559
Link: https://lore.kernel.org/linux-scsi/8411fb5363acc90519bced30ea2c2ac582ff2340.1685396241.git.quic_nguyenb@quicinc.com/
Change-Id: Id8038a4c3b4a690e97aeb5e401396fd181cfca26
Signed-off-by: SEO HOYOUNG <hy50.seo@samsung.com>
The UTP command descriptor base address is a 57-bit field in the
UTP transfer request descriptor. Combine the two 32-bit
command_desc_base_addr_lo/hi fields into a 64-bit for better handling
of this field.
Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Bug: 285631559
Link: https://lore.kernel.org/linux-scsi/4e6f7f5a15000cdae77c3014b477264f57bf572c.1685396241.git.quic_nguyenb@quicinc.com/
Change-Id: Ia976d86b0c8b9e31fd5cd784191ea3eb4eb5c130
Signed-off-by: SEO HOYOUNG <hy50.seo@samsung.com>
Reserve ANDROID_OEM_DATA in struct mutex/rw_semaphore for recording
information about the lock, such as the lock owner, which helps with
OEM specific lock optimization.
Bug: 188869548
Change-Id: I33f767a1823f854a8deb8ba9078079aa6a9d76ea
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit 97f7f2ebf3eaa4d5f2e00b0a9c6d23462a25aa2d)
Android 14 and later default to MGLRU [1] and field telemetry showed
occasional long tail latency (>100ms) in the reclaim path.
Tracing revealed priority inversion in the reclaim path. In
try_to_inc_max_seq(), when high priority tasks were blocked on
wait_event_killable(), the preemption of the low priority task to call
wake_up_all() caused those high priority tasks to wait longer than
necessary. In general, this problem is not different from others of its
kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU
because it introduced the new wait queue lruvec->mm_state.wait.
The purpose of this new wait queue is to avoid the thundering herd
problem. If many direct reclaimers rush into try_to_inc_max_seq(), only
one can succeed, i.e., the one to wake up the rest, and the rest who
failed might cause premature OOM kills if they do not wait. So far there
is no evidence supporting this scenario, based on how often the wait has
been hit. And this begs the question how useful the wait queue is in
practice.
Based on Minchan's recommendation, which is in line with his commit
6d4675e601 ("mm: don't be stuck to rmap lock on reclaim path") and the
rest of the MGLRU code which also uses trylock when possible, remove the
wait queue.
[1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf
Link: https://lkml.kernel.org/r/20230413214326.2147568-1-kaleshsingh@google.com
Fixes: bd74fdaea1 ("mm: multi-gen LRU: support page table walks")
Change-Id: I911f3968fd1adb25171279cc5b6f48ccb7efc8de
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Reported-by: Wei Wang <wvw@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7f63cf2d9b9bbe7b90f808927558a66ff737d399)
Bug: 277906484
[ Kalesh Singh - Fix conflict in mm/vmscan.c (lru_gen_del_mm) ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
This commit adds support for getting the pid and tid information of
the sender for asynchronous transfers in binderfs transfer records.
In previous versions, it was not possible to obtain this information
from the transfer records. While this information may not be necessary
for all use cases, it can be useful in some scenarios.
Signed-off-by: Chuang Zhang <zhangchuang3@xiaomi.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/0c1e8bd37c68dd1518bb737b06b768cde9659386.1682333709.git.zhangchuang3@xiaomi.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 235773151
(cherry picked from commit c21c0f9a20a963f5a1874657a4e3d657503f7815
git: //git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
char-misc-next)
Change-Id: I7e729853353522164c4a3dd1094145dfd19af961
Signed-off-by: Carlos Llamas <cmllamas@google.com>
This patch adds a timestamp field to the binder_transaction
structure to track the time consumed during transmission
when reading binder_transaction records.
Signed-off-by: Chuang Zhang <zhangchuang3@xiaomi.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/5ac8c0d09392290be789423f0dd78a520b830fab.1682333709.git.zhangchuang3@xiaomi.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 235773151
(cherry picked from commit 800936191a26a5aba5caa3cbd70a4154b45eb94a
git: //git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
char-misc-next)
[cmllamas: resolved minor conflicts with local patches]
Change-Id: If6dab2e9b80c71f9ac3084dc8cf0e519976f55d8
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Add ANDROID_OEM_DATA(1) in struct request_queue to support more
request queue's status for fbarrier feature.
Bug: 283021230
Change-Id: Ic946fd08dcebed708f03749557d9289ddb3696b8
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Weichao Guo <guoweichao@oppo.corp-partner.google.com>
Add ANDROID_OEM_DATA(1) in struct bio to record more fields
for fbarrier feature.
Bug: 283021230
Change-Id: I83a804122db830eab2ce9a0c9ad32a732e986ff0
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Weichao Guo <guoweichao@oppo.corp-partner.google.com>
Add ANDROID_OEM_DATA(1) in struct ufs_dev_info to support more
feature status for extend COPY feature.
Bug: 283021230
Change-Id: I961dbd4d4a0939658ce55f608062107f4a0575a6
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Weichao Guo <guoweichao@oppo.corp-partner.google.com>
Add ANDROID_OEM_DATA to struct rq, which is used to implement oem's
scheduler tuning.
Bug: 188899490
Change-Id: I1904b4fd83effc4b309bfb98811e9718398504f4
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit 8fdd8e4bb5e0fc1af6ec2eec73280c2ffe569a37)
Add encryption support to bootloader based hibernation. This will
encrypt the hibernation snapshot image before it is written to the swap
partition.
Bug: 279879797
Change-Id: I07046ad7fb848fc62258871ab41b3e03246c40dc
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
Added changes to support the hibernation feature for serial UART.
Added support for freeze, restore and thaw callbacks to put the
device into hibernation.
Signed-off-by: Aniket Randive <quic_arandive@quicinc.com>
Link: https://lore.kernel.org/r/1665123780-20557-1-git-send-email-quic_arandive@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 35781d8356a2eecaa6074ceeb80ee22e252fcdae)
Bug: 279879797
Change-Id: I3af4deebe1029e7463add27a5dd86f3eb46a6917
Signed-off-by: Aniket Randive <quic_arandive@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
There is a qcom clock driver which uses the clock restore and save context
for the hibernation use case. The PLLs, always enabled clocks are saved and
restored during this use case.
Leaf changes summary: 2 artifact changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 2 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
2 Added functions:
[A] 'function void clk_restore_context()'
[A] 'function void clk_save_context()'
Bug: 279879797
Change-Id: I0f9f0853f9593239dedb7d84941002d346038843
Signed-off-by: Taniya Das <quic_tdas@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
Add vendor hooks to disable randomization of swap slot allocation for
swap partition used for saving hibernation image. Another level of
randomization of swap slots takes place at the firmware level as well
in order to address the wear leveling for UFS/MMC devices, so this
vendor hook checks if a block device represents the swap partition being
used for saving hibernation image, if yes, the swap slot allocation for
such partition is serialized at kernel level.
There is a performance advantage of reading contiguous pages of hibernation
image, it makes the restore logic of hibernation image simpler and faster
as there are no seeks involved in the secondary storage to read multiple
contiguous pages of the image.
Bug: 279879797
Change-Id: I8258b5166d8c6952fe9eb91a5a9826f33b836f00
Signed-off-by: Vivek Kumar <quic_vivekuma@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
Enable CONFIG_HIBERNATION to add support for hibernation.
Bug: 279879797
Change-Id: Ibc7958afce9e2002616dc3e40b0524d98997a798
Signed-off-by: Vivek Kumar <quic_vivekuma@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
To add encryption support to bootloader based hibernation, export
symbols snapshot_get_image_size and alloc_swapdev_block.
These symbols can be used by vendor implementation to be called before
and after storing the snapshot image.
Bug: 279879797
Change-Id: I0d44bf833a97fce5bc5213712b2b2523a9e22607
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
Enable CONFIG_LED_TRIGGER_PHY to support for tracking link state.
This also makes GKI on arm64 consistent with GKI on x86.
This could affect KMI breakage by changing struct phy_device,
so it should be fixed until KMI freeze.
Bug: 278043288
Change-Id: I6cd976de7de2a3518434cf8ca5eaf691e40047a3
Signed-off-by: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
struct swap_info_struct :: ANDROID_VENDOR_DATA(1)
It is pointer to a struct to record the following message:
1) total swapin pages;
2) total swapout pages;
3) total number of cold pages swapin;
4) total number of swapout pages, specified by userspace;
5) total number of swapout pages, specified by kernel;
6) the maxmium number of swapout pages;
7) the maxmium number of swapout pages allowed by kernel;
8) the maxmium number of swapout pages allowed by framework;
Bug: 225795494
Change-Id: I779145a83d87e339db86ec81c7f962be99946afb
Signed-off-by: Bing Han <bing.han@transsion.com>
(cherry picked from commit af4eb0e377b09fe7e2f342bcca0e97fa43eaed41)
(cherry picked from commit 29277e2bf79d36eede562b529c8e7b295e9a53df)
struct swap_slots_cache :: ANDROID_VENDOR_DATA(1)
1) Multiple swap devices can be supported;
2) There are different kinds of data;
3) During data reclamation, different types of data are exchanged
to different swap devices;
4) Each swap device has corresponding arrays of slots and slots_ret;
5) Each swap device has corresponding indexes of nr, cur and n_ret;
6) This field is a pointer, it points to a struct which contains
all the other arrays and indexes;
Bug: 225795494
Change-Id: Icf116135926be98449a2d96fc458e58e5ad3b7e9
Signed-off-by: Bing Han <bing.han@transsion.com>
(cherry picked from commit a034320a6855af99fe1ec1dfe6cafa4c2a15ecce)
(cherry picked from commit 2fd1f19d555cb63bdf2f810f9b944feabf836dff)
struct lruvec :: ANDROID_VENDOR_DATA(1)
It is pointer to a struct to record the following message:
1)the account of workingset_restore pages of cached anonymous and
file pages
This is used to adjust the strategy and amount of reclaiming data.
Bug: 225795494
Change-Id: I34e57ee23b6c97ac91effa5b72513d238335a996
Signed-off-by: Bing Han <bing.han@transsion.com>
(cherry picked from commit 1b14ae01b09dfde89da470cac8415cefaca824fb)
(cherry picked from commit dcac70709fb59478979519d7502b2bb5b8389ff6)
Add a field in mem_cgroup to record additional per-cgroup information
for memory policy tuning.
Bug: 192052083
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit 45fabbd8e3de18cc8f3e8f1d1f5e8639b7350da5)
Change-Id: I28c8bc1c2455d53e68a05555b57b76ded27af98a
Signed-off-by: lvwenhuan <lvwenhuan@oppo.com>
Add a pglist_data field to record additional node parameters.
Bug: 192052083
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit 65115fdbf84d284ea5472e366cc0800896100de9)
(cherry picked from commit 9fa4706bf4120caf55426b2042b7611d8663a94a)
Change-Id: I3d764ab298c71ab9aba245867ee529045551aef4
Signed-off-by: lvwenhuan <lvwenhuan@oppo.com>
Per-vma locks patchset causes db845c module failure:
These symbols are missing from the symbol list and are not available
at runtime for unsigned modules:
down_write required by ['msm.ko']
up_write required by ['msm.ko']
Add the missing symbols.
Bug: 161210518
Change-Id: I6f5777612086cec5331e94e0999670c97229435d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
atomisp driver which is missing in 6.4 kernel fails when building
allmodconfig configuration because if modifies vma->vm_flags directly.
Fix by replacing direct modifications with modifier functions.
Bug: 161210518
Change-Id: I91c233a276ce4bfba3decfaaa58cfe12b423d24d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
If the page fault handler requests a retry, we will count the fault
multiple times. This is a relatively harmless problem as the retry paths
are not often requested, and the only user-visible problem is that the
fault counter will be slightly higher than it should be. Nevertheless,
userspace only took one fault, and should not see the fact that the kernel
had to retry the fault multiple times.
Move page fault accounting into mm_account_fault() and skip incomplete
faults which will be accounted upon completion.
Link: https://lkml.kernel.org/r/20230419175836.3857458-1-surenb@google.com
Fixes: d065bd810b ("mm: retry page fault when blocking on disk transfer")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 53156443a30368c0759c22e54a8d5cacc1b543cc)
[surenb: resolve differences in handle_mm_fault() between 6.1 and 6.4
kernel versions]
Bug: 161210518
Change-Id: Ic8cd807128ffd2c77a4db2af85b64bc24cc5052b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change CONFIG_PER_VMA_LOCK_STATS to be disabled by default, as most users
don't need it. Add configuration help to clarify its usage.
Link: https://lkml.kernel.org/r/20230428173533.18158-1-surenb@google.com
Fixes: 52f238653e45 ("mm: introduce per-VMA lock statistics")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 6152e53d9671b0ccc21c1bca842617b32ccfc5d8)
Bug: 161210518
Change-Id: Ibd57999a415b5433ae3b99365ea50526a35452d1
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
This is the s390 variant of "x86/mm: try VMA lock-based page fault handling
first".
Link: https://lkml.kernel.org/r/20230314132808.1266335-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit e06f47a16573decc57498f2d02f9af3bb3e84cf2)
Bug: 161210518
Change-Id: Icad059c2b458f650ac1c407b0fbbdcede4b0c164
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
vma->lock being part of the vm_area_struct causes performance regression
during page faults because during contention its count and owner fields
are constantly updated and having other parts of vm_area_struct used
during page fault handling next to them causes constant cache line
bouncing. Fix that by moving the lock outside of the vm_area_struct.
All attempts to keep vma->lock inside vm_area_struct in a separate cache
line still produce performance regression especially on NUMA machines.
Smallest regression was achieved when lock is placed in the fourth cache
line but that bloats vm_area_struct to 256 bytes.
Considering performance and memory impact, separate lock looks like the
best option. It increases memory footprint of each VMA but that can be
optimized later if the new size causes issues. Note that after this
change vma_init() does not allocate or initialize vma->lock anymore. A
number of drivers allocate a pseudo VMA on the stack but they never use
the VMA's lock, therefore it does not need to be allocated. The future
drivers which might need the VMA lock should use
vm_area_alloc()/vm_area_free() to allocate the VMA.
Link: https://lkml.kernel.org/r/20230227173632.3292573-34-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f)
Bug: 161210518
Change-Id: I0c4dff518b74e1a9ea2b40636b436a122841564d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
call_rcu() can take a long time when callback offloading is enabled. Its
use in the vm_area_free can cause regressions in the exit path when
multiple VMAs are being freed.
Because exit_mmap() is called only after the last mm user drops its
refcount, the page fault handlers can't be racing with it. Any other
possible user like oom-reaper or process_mrelease are already synchronized
using mmap_lock. Therefore exit_mmap() can free VMAs directly, without
the use of call_rcu().
Expose __vm_area_free() and use it from exit_mmap() to avoid possible
call_rcu() floods and performance regressions caused by it.
Link: https://lkml.kernel.org/r/20230227173632.3292573-33-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a)
Bug: 161210518
Change-Id: I4fbf3ef38fdb22a3c80dcc61125ec21d2c426100
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
Link: https://lkml.kernel.org/r/20230227173632.3292573-31-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit cd7f176aea5f5929a09a91c661a26912cc995d1b)
Bug: 161210518
Change-Id: I0630cfa72467bb5f0ec162c3c276aa7745e49780
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
Link: https://lkml.kernel.org/r/20230227173632.3292573-30-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0bff0aaea03e2a3ed6bfa302155cca8a432a1829)
Bug: 161210518
Change-Id: I9fe8bad1b040d0514de802dd58fe3187d4741ab9
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra statistics
about handling page fault under VMA lock.
Link: https://lkml.kernel.org/r/20230227173632.3292573-29-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 52f238653e452e0fda61e880f263a173d219acd1)
Bug: 161210518
Change-Id: I1bc9ab9bc0307af26e0c51ba12f9ad561af5b6c8
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Due to the possibility of handle_userfault dropping mmap_lock, avoid fault
handling under VMA lock and retry holding mmap_lock. This can be handled
more gracefully in the future.
Link: https://lkml.kernel.org/r/20230227173632.3292573-28-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 444eeb17437a0ef526c606e9141a415d3b7dfddd)
Bug: 161210518
Change-Id: I383603d637497ea9917ad08908530f91052a17cc
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Due to the possibility of do_swap_page dropping mmap_lock, abort fault
handling under VMA lock and retry holding mmap_lock. This can be handled
more gracefully in the future.
Link: https://lkml.kernel.org/r/20230227173632.3292573-27-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 17c05f18e54158a3eed0c22c85b7a756b63dcc01)
Bug: 161210518
Change-Id: I047f4d0e0ca3b3bf9505e5cda2da768c88bed20e
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When vma->anon_vma is not set, page fault handler will set it by either
reusing anon_vma of an adjacent VMA if VMAs are compatible or by
allocating a new one. find_mergeable_anon_vma() walks VMA tree to find a
compatible adjacent VMA and that requires not only the faulting VMA to be
stable but also the tree structure and other VMAs inside that tree.
Therefore locking just the faulting VMA is not enough for this search.
Fall back to taking mmap_lock when vma->anon_vma is not set. This
situation happens only on the first page fault and should not affect
overall performance.
Link: https://lkml.kernel.org/r/20230227173632.3292573-25-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2ac0af1b66e3b66307f53b1cc446514308ec466d)
Bug: 161210518
Change-Id: Iafacad5bda7bb138b290f38421a22d828051b067
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Introduce lock_vma_under_rcu function to lookup and lock a VMA during page
fault handling. When VMA is not found, can't be locked or changes after
being locked, the function returns NULL. The lookup is performed under
RCU protection to prevent the found VMA from being destroyed before the
VMA lock is acquired. VMA lock statistics are updated according to the
results. For now only anonymous VMAs can be searched this way. In other
cases the function returns NULL.
Link: https://lkml.kernel.org/r/20230227173632.3292573-24-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 50ee32537206140e4cf6e47024be29a84d458d49)
Bug: 161210518
Change-Id: I4872bb04f5c8a515e4b31bc36c95e15b62cbd0da
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Per-vma locking mechanism will search for VMA under RCU protection and
then after locking it, has to ensure it was not removed from the VMA tree
after we found it. To make this check efficient, introduce a
vma->detached flag to mark VMAs which were removed from the VMA tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-23-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 457f67be5910a2b5f1fda8af06bfe4d3492a0a4f)
[surenb: vma_complete does not exist in 6.1, therefore patch is adjusted
to mark VMAs detached directly in vma_expand and __vma_adjust]
Bug: 161210518
Change-Id: Id1f31733cb7a36f3f1294b2be83cf3b87ba3f812
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Page fault handlers might need to fire MMU notifications while a new
notifier is being registered. Modify mm_take_all_locks to write-lock all
VMAs and prevent this race with page fault handlers that would hold VMA
locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
locking order as in page fault handlers.
Link: https://lkml.kernel.org/r/20230227173632.3292573-22-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit eeff9a5d47f89bc641034fea05501c8a6de131cb)
Bug: 161210518
Change-Id: I4176bf0e1b07f03dfc1ac7dd37d7941d5a1dbc02
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Assert there are no holders of VMA lock for reading when it is about to be
destroyed.
Link: https://lkml.kernel.org/r/20230227173632.3292573-21-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f2e13784c16a98e269b3111ac02ae44446dd589c)
Bug: 161210518
Change-Id: I67fc318fd08575c9bcc54fd00b0304246b0cd279
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Normally free_pgtables needs to lock affected VMAs except for the case
when VMAs were isolated under VMA write-lock. munmap() does just that,
isolating while holding appropriate locks and then downgrading mmap_lock
and dropping per-VMA locks before freeing page tables. Add a parameter to
free_pgtables for such scenario.
Link: https://lkml.kernel.org/r/20230227173632.3292573-20-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 98e51a2239d9d419d819cd61a2e720ebf19a8b0a)
Bug: 161210518
Change-Id: I3c9177cce187526407754baf7641d3741ca7b0cb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>