Create input symbol files used to generate the GKI module headers
under include/config. By placing files in this generated
directory, the default filters that ignore certain files
will work without any special handling required, and they
will also be available to inspect after the build for
debugging purposes.
abi_gki_protected_exports: Input for gki_module_protected_exports.h
From :- ${objtree}/abi_gki_protected_exports
To :- include/config/abi_gki_protected_exports
all_kmi_symbols: Input for gki_module_unprotected.h
- Rename to abi_gki_kmi_symbols
From :- all_kmi_symbols
To :- include/config/abi_gki_kmi_symbols
Bug: 286529877
Test: TH
Test: Manual verification of the generated files
Change-Id: Iafa10631e7712a8e1e87a2f56cfd614de6b1053a
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parent's
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and an anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
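In sketch form, the fix amounts to write-locking each source vma in
dup_mmap() before its pages are write-protected and copied (placement
within the loop follows the upstream patch):

/* kernel/fork.c: dup_mmap() -- sketch of the fix */
for_each_vma(vmi, mpnt) {
	/*
	 * Exclude concurrent per-VMA-lock page faults on the source
	 * vma while its flags and page table entries are copied.
	 */
	vma_start_write(mpnt);
	...
	tmp = vm_area_dup(mpnt);
	...
}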
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit fb49c455323ff8319a123dd312be9082c49a23a5)
Change-Id: Ic5aa9dc51a888b5b0319ec4ec6d2941424573ca0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Since upstream commit 715f7f9ece ("locking/rtmutex: Squash !RT
tasks to DEFAULT_PRIO"), non-rt tasks do not inherit the
nice-priority values across rt_mutexes. This removes the minor
(and indirect) priority-inheritance that rt-mutexes provided for
CFS tasks.
However, without priority inheritance, time-bounded priority
inversion can occur between CFS tasks of different nice
priorities / cgroup limitations. The proxy-execution efforts
are a work-in-progress to resolve this upstream, but in the
meantime it is left to vendor hooks to provide a near term
solution to avoid priority inversion between CFS tasks.
In our oem scheduler, if a CFS thread has a "user-aware
property", we will always pick it even if its vruntime is
bigger than the smallest one in the runqueue. That's why the
trace_android_rvh_replace_next_task_fair vendorhook was added
previously in commit 53e809978443 ("ANDROID: vendor_hooks: Add
hooks for scheduler").
Thus for our oem scheduler, important CFS tasks(like
RenderThread) are marked with the "user-aware property" in their
struct task_struct. If those tasks are blocked on an rtmutex, we
want to allow the "user-aware property" to be inherited by the lock
owner, so that it will be selected to run immediately and release the
lock.
To support this, we need new hooks to map the "user-aware property"
to a different rtmutex_waiter prio and to update the owner's
"user-aware property" when needed. Thus these additional vendor
hooks are added.
In the future, once an generalized upstream solution for CFS
priority inheritance is in place, this will no longer be needed.
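For reference, an OEM module hooks in roughly like this; the hook and
handler names below are illustrative only, not the exact ones this
patch adds:

/* oem module: boost the rtmutex waiter prio of "user-aware" tasks */
static void oem_rtmutex_waiter_prio(void *data, struct task_struct *task,
				    int *waiter_prio)
{
	if (oem_task_is_user_aware(task))	/* hypothetical helper */
		*waiter_prio = 0;		/* lower value = higher prio */
}

static int __init oem_sched_init(void)
{
	/* hypothetical hook name, following the android_vh convention */
	return register_trace_android_vh_rtmutex_waiter_prio(
			oem_rtmutex_waiter_prio, NULL);
}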
Bug: 290585456
Change-Id: I6521ed2086b147400a54da6b84a324baf16bc649
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change the location of the build-time generated GKI module headers
From :- kernel/module/gki_module_*.h
To :- include/generated/gki_module_*.h
This prevents the kernel source from being contaminated.
By placing the header files in a generated directory,
the default filters that ignore certain files will work
without any special handling required.
Bug: 286529877
Test: Manual verification & TH
Change-Id: Ie247d1c132ddae54906de2e2850e95d7ae9edd50
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
(cherry picked from commit e9cba885543fc50a5b59ff7234d02b74a380573c)
If CONFIG_LOCKDEP is enabled, export `sched_domains_mutex` as it is
indirectly accessed by the macro `for_each_domain()`. This allows
vendors to call the `for_each_domain()` macro with CONFIG_LOCKDEP
enabled via the GKI_BUILD_CONFIG_FRAGMENT.
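For illustration, with CONFIG_LOCKDEP enabled, for_each_domain()
expands to rcu_dereference_check_sched_domain(), which references
sched_domains_mutex -- hence the export. A minimal vendor-side sketch:

/* vendor module: walk the sched domains of a cpu */
struct sched_domain *sd;

rcu_read_lock();
for_each_domain(cpu, sd)
	pr_info("domain span weight: %u\n", sd->span_weight);
rcu_read_unlock();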
Bug: 176254015
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Change-Id: Ia9f2989de41b2224c63855f2fd129cbeeac4f195
Signed-off-by: Will McVicker <willmcvicker@google.com>
(cherry picked from commit 7171a5de9832b53122a9787586e18e9523b5b47e)
(cherry picked from commit e2cdae06e22e6cfdbae696c5e40ad159a2cf9639)
As a supplement to commit 6c1c1552e6
("ANDROID: vendor_hook: add hooks to protect locking-tsk in cpu scheduler"),
add a lock-holding scenario that we missed in the rwsem read path.
Bug: 290868674
Change-Id: I718dd942b24b330a79283fc241dcbf47cc34c0c5
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
commit 43ec16f1450f4936025a9bdf1a273affdb9732c1 upstream.
There is a crash in relay_file_read, as the variable 'from'
points to the end of the last subbuf.
The oops looks something like:
pc : __arch_copy_to_user+0x180/0x310
lr : relay_file_read+0x20c/0x2c8
Call trace:
__arch_copy_to_user+0x180/0x310
full_proxy_read+0x68/0x98
vfs_read+0xb0/0x1d0
ksys_read+0x6c/0xf0
__arm64_sys_read+0x20/0x28
el0_svc_common.constprop.3+0x84/0x108
do_el0_svc+0x74/0x90
el0_svc+0x1c/0x28
el0_sync_handler+0x88/0xb0
el0_sync+0x148/0x180
We get the condition by analyzing the vmcore:
1) The last produced byte and the last consumed byte are
both at the end of the last subbuf.
2) A softirq that writes to the relay buffer (e.g. via
__blk_add_trace) occurs while a program is calling
relay_file_read_avail().
relay_file_read
  relay_file_read_avail
    relay_file_read_consume(buf, 0, 0);
    //interrupted by softirq who will write subbuf
    ....
    return 1;
  //read_start points to the end of the last subbuf
  read_start = relay_file_read_start_pos
  //avail is equal to subbuf_size
  avail = relay_file_read_subbuf_avail
  //from points to an invalid memory address
  from = buf->start + read_start
  //system crashes
  copy_to_user(buffer, from, avail)
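The upstream fix, in sketch form, derives the read position from the
consumed counters and wraps it modulo the total buffer size so that it
can never point past the last subbuf:

/* kernel/relay.c -- sketch of the fixed helper */
static size_t relay_file_read_start_pos(struct rchan_buf *buf)
{
	size_t subbuf_size = buf->chan->subbuf_size;
	size_t n_subbufs = buf->chan->n_subbufs;
	size_t consumed = buf->subbufs_consumed % n_subbufs;
	size_t read_pos = (consumed * subbuf_size + buf->bytes_consumed)
			% (n_subbufs * subbuf_size);
	...
}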
Bug: 288957094
Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com
Fixes: 8d62fdebda ("relay file read: start-pos fix")
Signed-off-by: Zhang Zhengming <zhang.zhengming@h3c.com>
Reviewed-by: Zhao Lei <zhao_lei1@hoperun.com>
Reviewed-by: Zhou Kete <zhou.kete@h3c.com>
Reviewed-by: Pengcheng Yang <yangpc@wangsu.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f6ee841ff2)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ibbdf65d8bf2268c3e8c09520f595167a2ed41e8b
When the kernel is built inside a sandbox container,
a forest of symlinks to the source files may be
created in the container. In this case, the generated
kheaders.tar.xz should follow these symlinks
to access the source files, instead of packing
the symlinks themselves.
Test: manual (add kheaders_data.tar.xz to the output,
then examine the contents)
Bug: 276339429
Fixes: b0acbba3f489 ("Revert "Revert "Revert "FROMLIST: kheaders: Follow symlinks to source files."""")
Link: https://lore.kernel.org/lkml/20230420010029.2702543-1-elsk@google.com/
Signed-off-by: Yifan Hong <elsk@google.com>
(cherry picked from https://android-review.googlesource.com/q/commit:28fa7afc424f3dc53358c0e9b080433d78f0cd54)
Merged-In: Ie4db22dfa13d05fdccb3ad8f4fae2fe3fead994e
Change-Id: Ie4db22dfa13d05fdccb3ad8f4fae2fe3fead994e
Check whether the ui and render threads are preempted
during frame drawing, then adjust their priority and core
selection if they are preempted, because we expect these
threads to be executed first. By introducing the
sched_mode parameter, we can check the prev thread's
preemption status in the hook and identify important
threads for executing business logic.
Bug: 285166029
Change-Id: I6af31dff4c9032940c7f1c991a25a49ebbeac7a8
Signed-off-by: Dezhi Huang <huangdezhi@hihonor.com>
Now that the branch is used to create production GKI
images, we need to institute ACK DrNo for all commits.
The DrNo approvers are in the android-mainline branch
at /OWNERS_DrNo.
Bug: 287162457
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: Id5bb83d7add5f314df6816c1c51b4bf2d8018e79
commit f7abf14f0001a5a47539d9f60bbdca649e43536b upstream.
For some unknown reason the introduction of the timer_wait_running callback
failed to fix up posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.
Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
1) If not enabled, the expiry happens in hard interrupt context so
spin waiting on the remote CPU is reasonably time bound.
Implement an empty stub function for that case.
2) If enabled, the expiry happens in task work before returning to user
space or guest mode. The expired timers are marked as firing and moved
from the timer queue to a local list head with sighand lock held. Once
the timers are moved, sighand lock is dropped and the expiry happens in
fully preemptible context. That means the expiring task can be scheduled
out, migrated, interrupted etc. So spin waiting on it is more than
suboptimal.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way:
- Add a mutex to struct posix_cputimers_work. This struct is per task
and used to schedule the expiry task work from the timer interrupt.
- Add a task_struct pointer to struct cpu_timer which is used to store
the task which runs the expiry. That's filled in when the task
moves the expired timers to the local expiry list. That's not
affecting the size of the k_itimer union as there are bigger union
members already.
- Let the task take the expiry mutex around the expiry function
- Let the waiter acquire a task reference with rcu_read_lock() held and
block on the expiry mutex
This avoids spin-waiting on a task which might not even be on a CPU and
works nicely for RT too.
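In sketch form, the waiter side then becomes:

/* kernel/time/posix-cpu-timers.c -- sketch */
static void posix_cpu_timer_wait_running(struct k_itimer *timr)
{
	struct task_struct *tsk = rcu_dereference(timr->it.cpu.handling);

	/* Has the handling task completed expiry already? */
	if (!tsk)
		return;

	/* Ensure that the task cannot go away */
	get_task_struct(tsk);
	/* Now drop the RCU protection so the mutex can be locked */
	rcu_read_unlock();
	/* Wait on the expiry mutex */
	mutex_lock(&tsk->posix_cputimers_work.mutex);
	/* Release it immediately again */
	mutex_unlock(&tsk->posix_cputimers_work.mutex);
	/* Drop the task reference */
	put_task_struct(tsk);
	/* Relock RCU so the callsite is balanced */
	rcu_read_lock();
}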
Fixes: ec8f954a40 ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Reported-by: Marco Elver <elver@google.com>
Change-Id: Ic069585c15bc968dec3c2b99cc70256f56a70b32
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marco Elver <elver@google.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bccf9fe296)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Export cgroup_add_legacy_cftypes and a helper function to allow vendor modules to expose additional files in the memory cgroup hierarchy.
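A vendor module can then expose a file roughly as follows; the file
name and show callback here are illustrative:

static int vendor_stat_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%lu\n", 0UL);	/* placeholder value */
	return 0;
}

static struct cftype vendor_files[] = {
	{
		.name = "vendor_stat",	/* appears as memory.vendor_stat */
		.seq_show = vendor_stat_show,
	},
	{ }	/* terminator */
};

...
cgroup_add_legacy_cftypes(&memory_cgrp_subsys, vendor_files);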
Bug: 192052083
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit f41a95eadca98506e627b21f5cc73332bba4d95c)
(cherry picked from commit bf24c43b7f90290d2ac6f8163b43ab00f8f820b9)
Change-Id: Ie2b936b3e77c7ab6d740d1bb6d70e03c70a326a7
Signed-off-by: lvwenhuan <lvwenhuan@oppo.com>
With the hooks below, we can mark a lock-owning thread with an identifiable flag, which can protect it from being preempted by other unimportant threads, so that waiters are woken up more quickly.
https://android-review.googlesource.com/c/kernel/common/+/2183353
However, we now find an issue like this one:
static inline void __up_write(struct rw_semaphore *sem)
{
	...
	// Step 1. we clear the flag.
	trace_android_vh_record_rwsem_lock_starttime(current, 0);
	// Step 2. owner may be preempted by unimportant threads.
	rwsem_clear_owner(sem);
	...
	// Step 3. wake up waiter, but it's too late.
	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
		rwsem_wake(sem);
}
This patch clears the protect-flag only after waking up the waiters.
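With the fix, the ordering in sketch form becomes:

static inline void __up_write(struct rw_semaphore *sem)
{
	...
	rwsem_clear_owner(sem);
	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
		rwsem_wake(sem);
	// clear the protect-flag only after the waiters have been woken
	trace_android_vh_record_rwsem_lock_starttime(current, 0);
}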
Bug: 286024926
Change-Id: I71f8b6a7d8a01336fd36b8267c2cb5edab65bd11
Signed-off-by: xieliujie <xieliujie@oppo.com>
Add ANDROID_OEM_DATA to struct rq, which is used to implement oem's
scheduler tuning.
Bug: 188899490
Change-Id: I1904b4fd83effc4b309bfb98811e9718398504f4
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit 8fdd8e4bb5e0fc1af6ec2eec73280c2ffe569a37)
Add encryption support to bootloader based hibernation. This will
encrypt the hibernation snapshot image before it is written to the swap
partition.
Bug: 279879797
Change-Id: I07046ad7fb848fc62258871ab41b3e03246c40dc
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
Add vendor hooks to disable randomization of swap slot allocation for
the swap partition used for saving the hibernation image. Another level
of randomization of swap slots takes place at the firmware level as
well, in order to address wear leveling for UFS/MMC devices, so this
vendor hook checks whether a block device represents the swap partition
being used for saving the hibernation image; if so, the swap slot
allocation for that partition is serialized at the kernel level.
There is a performance advantage in reading contiguous pages of the
hibernation image: it makes the restore logic of the hibernation image
simpler and faster, as no seeks on the secondary storage are involved
in reading multiple contiguous pages of the image.
Bug: 279879797
Change-Id: I8258b5166d8c6952fe9eb91a5a9826f33b836f00
Signed-off-by: Vivek Kumar <quic_vivekuma@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
To add encryption support to bootloader based hibernation, export
symbols snapshot_get_image_size and alloc_swapdev_block.
These symbols can be called by the vendor implementation before
and after storing the snapshot image.
Bug: 279879797
Change-Id: I0d44bf833a97fce5bc5213712b2b2523a9e22607
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
vma->lock being part of the vm_area_struct causes performance regression
during page faults because during contention its count and owner fields
are constantly updated and having other parts of vm_area_struct used
during page fault handling next to them causes constant cache line
bouncing. Fix that by moving the lock outside of the vm_area_struct.
All attempts to keep vma->lock inside vm_area_struct in a separate cache
line still produce performance regression especially on NUMA machines.
Smallest regression was achieved when lock is placed in the fourth cache
line but that bloats vm_area_struct to 256 bytes.
Considering performance and memory impact, a separate lock looks like the
best option. It increases memory footprint of each VMA but that can be
optimized later if the new size causes issues. Note that after this
change vma_init() does not allocate or initialize vma->lock anymore. A
number of drivers allocate a pseudo VMA on the stack but they never use
the VMA's lock, therefore it does not need to be allocated. The future
drivers which might need the VMA lock should use
vm_area_alloc()/vm_area_free() to allocate the VMA.
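After this change, the layout is roughly:

/* sketch following the upstream patch */
struct vma_lock {
	struct rw_semaphore lock;
};

struct vm_area_struct {
	...
#ifdef CONFIG_PER_VMA_LOCK
	/* separately allocated to keep lock contention off hot fields */
	struct vma_lock *vm_lock;
#endif
	...
};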
Link: https://lkml.kernel.org/r/20230227173632.3292573-34-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f)
Bug: 161210518
Change-Id: I0c4dff518b74e1a9ea2b40636b436a122841564d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
call_rcu() can take a long time when callback offloading is enabled. Its
use in the vm_area_free can cause regressions in the exit path when
multiple VMAs are being freed.
Because exit_mmap() is called only after the last mm user drops its
refcount, the page fault handlers can't be racing with it. Any other
possible user like oom-reaper or process_mrelease are already synchronized
using mmap_lock. Therefore exit_mmap() can free VMAs directly, without
the use of call_rcu().
Expose __vm_area_free() and use it from exit_mmap() to avoid possible
call_rcu() floods and performance regressions caused by it.
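In sketch form:

/* kernel/fork.c -- sketch */
void vm_area_free(struct vm_area_struct *vma)
{
#ifdef CONFIG_PER_VMA_LOCK
	/* page fault handlers may still hold an rcu reference */
	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
#else
	__vm_area_free(vma);
#endif
}

/* exit_mmap(): the last mm user is gone, so no faults can race and
 * VMAs are freed directly via __vm_area_free() */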
Link: https://lkml.kernel.org/r/20230227173632.3292573-33-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a)
Bug: 161210518
Change-Id: I4fbf3ef38fdb22a3c80dcc61125ec21d2c426100
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Assert there are no holders of VMA lock for reading when it is about to be
destroyed.
Link: https://lkml.kernel.org/r/20230227173632.3292573-21-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f2e13784c16a98e269b3111ac02ae44446dd589c)
Bug: 161210518
Change-Id: I67fc318fd08575c9bcc54fd00b0304246b0cd279
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Introduce per-VMA locking. The lock implementation relies on per-vma
and per-mm sequence counters to note exclusive locking:
- read lock - (implemented by vma_start_read) requires the vma
(vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ.
If they match then there must be a vma exclusive lock held somewhere.
- read unlock - (implemented by vma_end_read) is a trivial vma->lock
unlock.
- write lock - (vma_start_write) requires the mmap_lock to be held
exclusively and the current mm counter is assigned to the vma counter.
This will allow multiple vmas to be locked under a single mmap_lock
write lock (e.g. during vma merging). The vma counter is modified
under exclusive vma lock.
- write unlock - (vma_end_write_all) is a batch release of all vma
locks held. It doesn't pair with a specific vma_start_write! It is
done before exclusive mmap_lock is released by incrementing mm
sequence counter (mm_lock_seq).
- write downgrade - if the mmap_lock is downgraded to the read lock, all
vma write locks are released as well (effectively the same as write
unlock).
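As an illustration, the read-lock fast path looks roughly like this
(sketch; at this point in the series the lock is still embedded in the
vma):

static inline bool vma_start_read(struct vm_area_struct *vma)
{
	/* matching sequence counts mean the vma is write-locked */
	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
		return false;

	if (unlikely(down_read_trylock(&vma->lock) == 0))
		return false;

	/* recheck under the lock: a writer may have locked it meanwhile */
	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
		up_read(&vma->lock);
		return false;
	}
	return true;
}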
Link: https://lkml.kernel.org/r/20230227173632.3292573-13-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 5e31275cc997f8ec5d9e8d65fe9840ebed89db19)
Bug: 161210518
Change-Id: I5e0db53a4b5562e59dd031fabbae4f97acc1bce1
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This prepares for page faults handling under VMA lock, looking up VMAs
under protection of an rcu read lock, instead of the usual mmap read lock.
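In sketch form, freeing goes through an rcu grace period so that rcu
walkers never see a freed vma:

/* kernel/fork.c -- sketch */
static void __vm_area_free(struct rcu_head *head)
{
	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
						  vm_rcu);
	kmem_cache_free(vm_area_cachep, vma);
}

void vm_area_free(struct vm_area_struct *vma)
{
	call_rcu(&vma->vm_rcu, __vm_area_free);
}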
Link: https://lkml.kernel.org/r/20230227173632.3292573-11-surenb@google.com
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 20cce633f4254cc0df39665449726e3172518f6c)
Bug: 161210518
Change-Id: Ic1262d00de9dc84ed2cc506b3bae7ef714efdfaa
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.
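A typical conversion looks like:

	/* before: open-coded modification */
	vma->vm_flags |= VM_DONTEXPAND;

	/* after: modifier function that can track/assert locking */
	vm_flags_set(vma, VM_DONTEXPAND);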
[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1c71222e5f2393b5ea1a41795c67589eea7e3490)
Bug: 161210518
Change-Id: Ifc352b487db109adab17dd33a83f5c7e68c0bbc6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
To simplify the usage of VM_LOCKED_CLEAR_MASK in vm_flags_clear(), replace
it with VM_LOCKED_MASK bitmask and convert all users.
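A typical conversion looks like:

	/* before */
	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;

	/* after: same effect via the modifier function and new mask */
	vm_flags_clear(vma, VM_LOCKED_MASK);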
Link: https://lkml.kernel.org/r/20230126193752.297968-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit e430a95a04efc557bc4ff9b3035c7c85aee5d63f)
Bug: 161210518
Change-Id: I17bbcc01a133511dbfaf3d82fbc4b25ecdd0b376
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Patch series "introduce vm_flags modifier functions", v4.
This patchset was originally published as a part of per-VMA locking [1]
and was split after suggestion that it's viable on its own and to
facilitate the review process. It is now a prerequisite for the next
version of per-VMA lock patchset, which reuses vm_flags modifier functions
to lock the VMA when vm_flags are being updated.
VMA vm_flags modifications are usually done under exclusive mmap_lock
protection because this attribute affects other decisions like VMA merging
or splitting and races should be prevented. Introduce vm_flags modifier
functions to enforce correct locking.
This patch (of 7):
Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
errors when we add a const modifier to vma->vm_flags.
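In sketch form:

	/* before: struct assignment, rejected once vm_flags becomes const */
	*new = *orig;

	/* after: byte copy that does not trip over the const qualifier */
	memcpy(new, orig, sizeof(*new));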
Link: https://lkml.kernel.org/r/20230126193752.297968-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230126193752.297968-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 06e78b614e3780f9ac32056f2861159fd19d9702)
Bug: 161210518
Change-Id: I8f0b92d5ed602d087316e99b29f59d75a644d0e4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Try to mitigate potential future driver core api changes by adding
padding to a number of core internal scheduler structures.
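In the ACK tree such padding is typically added with the
ANDROID_KABI_RESERVE() helpers; a sketch of the pattern (field count
illustrative):

struct rq {
	...
	ANDROID_KABI_RESERVE(1);
	ANDROID_KABI_RESERVE(2);
	ANDROID_KABI_RESERVE(3);
	ANDROID_KABI_RESERVE(4);
};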
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0ef2f8dd5f3259dcf443c5045aa1e8505ed78a76
commit: da07d2f9c153e457e845d4dcfdd13568d71d18a4 upstream.
Traversing the Perf Domains requires rcu_read_lock() to be held and is
conditional on sched_energy_enabled(). Ensure the right protections are applied.
Also skip capacity inversion detection for our own pd, which was an
error.
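In sketch form, the fixed traversal becomes:

/* update_cpu_capacity() -- sketch */
rcu_read_lock();
if (sched_energy_enabled()) {
	struct perf_domain *pd = rcu_dereference(rq->rd->pd);

	rq->cpu_capacity_inverted = 0;
	for (; pd; pd = pd->next) {
		/* we can't be in capacity inversion against our own pd */
		if (cpumask_test_cpu(cpu_of(rq), perf_domain_span(pd)))
			continue;
		...
	}
}
rcu_read_unlock();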
Fixes: 44c7b80bffc3 ("sched/fair: Detect capacity inversion")
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Change-Id: I1c81232e94a38a68e39cf73f8893f185e268928d
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-3-qyousef@layalina.io
(cherry picked from commit da07d2f9c153e457e845d4dcfdd13568d71d18a4)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d362a03d92)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit: aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8 upstream.
We only consider thermal pressure in util_fits_cpu() for uclamp_min,
with the exception of the biggest cores, which by definition are the max
performance point of the system and into which all tasks by definition should fit.
Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches capacity inversion point, then this is no longer true.
We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(). This not only addresses the problem above, but also
ensures that uclamp_max now considers the inverted capacity. Force-fitting
a task when a CPU is in this adverse state would contribute to making the
thermal throttling last longer.
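In sketch form:

/* util_fits_cpu() -- sketch */
inv_cap = cpu_in_capacity_inversion(cpu);
if (inv_cap) {
	/* the inverted capacity becomes the effective capacity_orig */
	capacity_orig = inv_cap;
	capacity_orig_thermal = inv_cap;
} else {
	capacity_orig = capacity_orig_of(cpu);
	capacity_orig_thermal = capacity_orig -
				arch_scale_thermal_pressure(cpu);
}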
Change-Id: I3c1a4758eb387c5c8e0ba68751840517fff4ae64
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com
(cherry picked from commit aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 799c7301de)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit: 44c7b80bffc3a657a36857098d5d9c49d94e652b upstream.
Check each performance domain to see if thermal pressure is causing its
capacity to be lower than another performance domain.
We assume that each performance domain has CPUs with the same
capacities, which is similar to an assumption made in energy_model.c.
We also assume that thermal pressure impacts all CPUs in a performance
domain equally.
If there are multiple performance domains with the same capacity_orig, we
will trigger a capacity inversion if the domain is under thermal
pressure.
The new cpu_in_capacity_inversion() should help users to know when
information about capacity_orig is not reliable, so that they can opt in
to use the inverted capacity as the 'actual' capacity_orig.
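A simplified sketch of the detection (the equal-capacity_orig case of
the real patch is elided here):

/* update_cpu_capacity() -- sketch */
capacity_orig = arch_scale_cpu_capacity(cpu);
inv_cap = capacity_orig - thermal_load_avg(rq);
rq->cpu_capacity_inverted = 0;

for (; pd; pd = pd->next) {
	unsigned long pd_cap_orig =
		arch_scale_cpu_capacity(cpumask_any(perf_domain_span(pd)));

	/* a nominally smaller pd now has more usable capacity than us */
	if (capacity_orig > pd_cap_orig && pd_cap_orig > inv_cap) {
		rq->cpu_capacity_inverted = inv_cap;
		break;
	}
}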
Change-Id: I23a98594e6b24697eb33d9bbfb8dd46a12bfa050
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com
(cherry picked from commit 44c7b80bffc3a657a36857098d5d9c49d94e652b)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fe1c982958)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Haibo Li reported:
| Unable to handle kernel paging request at virtual address
| ffffff802a0d8d71
| Mem abort info:
|   ESR = 0x96000021
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
|   FSC = 0x21: alignment fault
| Data abort info:
|   ISV = 0, ISS = 0x00000021
|   CM = 0, WnR = 0
| swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000028352000
| [ffffff802a0d8d71] pgd=180000005fbf9003, p4d=180000005fbf9003,
| pud=180000005fbf9003, pmd=180000005fbe8003, pte=006800002a0d8707
| Internal error: Oops: 96000021 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 2 PID: 45 Comm: kworker/u8:2 Not tainted
| 5.15.78-android13-8-g63561175bbda-dirty #1
| ...
| pc : kcsan_setup_watchpoint+0x26c/0x6bc
| lr : kcsan_setup_watchpoint+0x88/0x6bc
| sp : ffffffc00ab4b7f0
| x29: ffffffc00ab4b800 x28: ffffff80294fe588 x27: 0000000000000001
| x26: 0000000000000019 x25: 0000000000000001 x24: ffffff80294fdb80
| x23: 0000000000000000 x22: ffffffc00a70fb68 x21: ffffff802a0d8d71
| x20: 0000000000000002 x19: 0000000000000000 x18: ffffffc00a9bd060
| x17: 0000000000000001 x16: 0000000000000000 x15: ffffffc00a59f000
| x14: 0000000000000001 x13: 0000000000000000 x12: ffffffc00a70faa0
| x11: 00000000aaaaaaab x10: 0000000000000054 x9 : ffffffc00839adf8
| x8 : ffffffc009b4cf00 x7 : 0000000000000000 x6 : 0000000000000007
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc00a70fb70
| x2 : 0005ff802a0d8d71 x1 : 0000000000000000 x0 : 0000000000000000
| Call trace:
| kcsan_setup_watchpoint+0x26c/0x6bc
| __tsan_read2+0x1f0/0x234
| inflate_fast+0x498/0x750
| zlib_inflate+0x1304/0x2384
| __gunzip+0x3a0/0x45c
| gunzip+0x20/0x30
| unpack_to_rootfs+0x2a8/0x3fc
| do_populate_rootfs+0xe8/0x11c
| async_run_entry_fn+0x58/0x1bc
| process_one_work+0x3ec/0x738
| worker_thread+0x4c4/0x838
| kthread+0x20c/0x258
| ret_from_fork+0x10/0x20
| Code: b8bfc2a8 2a0803f7 14000007 d503249f (78bfc2a8)
| ---[ end trace 613a943cb0a572b6 ]---
The reason for this is that on certain arm64 configurations, since
e35123d83e ("arm64: lto: Strengthen READ_ONCE() to acquire when
CONFIG_LTO=y"), READ_ONCE() may be promoted to a full atomic acquire
instruction which cannot be used on unaligned addresses.
Fix it by avoiding READ_ONCE() in read_instrumented_memory(), and simply
forcing the compiler to do the required access by casting to the
appropriate volatile type. In terms of generated code this currently
only affects architectures that do not use the default READ_ONCE()
implementation.
The only downside is that we are not guaranteed atomicity of the access
itself, although on most architectures a plain load up to machine word
size should still be atomic (a fact the default READ_ONCE() still relies
on itself).
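The fix, in sketch form:

/* kernel/kcsan/core.c -- sketch */
static __always_inline u64
read_instrumented_memory(const volatile void *ptr, size_t size)
{
	/*
	 * Cast to a volatile type instead of using READ_ONCE(): with
	 * CONFIG_LTO=y on arm64, READ_ONCE() may become an acquire
	 * instruction that faults on unaligned addresses. Atomicity is
	 * not required here; we only need to observe differing values.
	 */
	switch (size) {
	case 1:  return *(const volatile u8 *)ptr;
	case 2:  return *(const volatile u16 *)ptr;
	case 4:  return *(const volatile u32 *)ptr;
	case 8:  return *(const volatile u64 *)ptr;
	default: return 0; /* Ignored; only value differences matter. */
	}
}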
Bug: 285794521
(cherry picked from commit 8dec88070d964bfeb4198f34cb5956d89dd1f557)
Reported-by: Haibo Li <haibo.li@mediatek.com>
Tested-by: Haibo Li <haibo.li@mediatek.com>
Cc: <stable@vger.kernel.org> # 5.17+
Change-Id: I16c9f83c3b4e28021a936249cafb1501760aa59d
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: 杨辉 <yanghui10@xiaomi.corp-partner.google.com>
android_rvh_effective_cpu_util:
Performs vendor-specific cpu util calculation; it is used in
EAS/schedutil/thermal. effective_cpu_util() is called when thermal
calculates the dynamic power, which is a non-atomic context, so the
hook is made restricted.
Bug: 226686099
Test: build pass
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Change-Id: I6fd77f44ca4328f5ef37d96989aa2e08d65e29bb
A vendor hook is added in post_init_entity_util_avg before
a new cfs task's util is attached to cfs_rq's util so that
vendors can gather and adjust the se's information to change
scheduling behavior and DVFS as they want.
trace_android_rvh_new_task_stats is not a proper hook because
it is called after the task's util is attached to cfs_rq's util,
which means the update of cfs_rq's sched_avg and the DVFS request
are already done.
Bug: 184219858
Signed-off-by: Choonghoon Park <choong.park@samsung.com>
Change-Id: I2deaa93297f8464895978496c9838cdffaa35b7f
(cherry picked from commit 1eea1cbdd3)
In order to allow arches to use code patching to conditionally emit the
shadow stack pushes and pops, rather than always taking the performance
hit even on CPUs that implement alternatives such as stack pointer
authentication on arm64, add a Kconfig symbol that can be set by the
arch to omit the SCS codegen itself, without otherwise affecting how
support code for SCS and compiler options (for register reservation, for
instance) are emitted.
Also, add a static key and some plumbing to omit the allocation of
shadow call stack for dynamic SCS configurations if SCS is disabled at
runtime.
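The runtime part boils down to a static key that is consulted before a
shadow stack is allocated; in sketch form:

/* include/linux/scs.h -- sketch */
DECLARE_STATIC_KEY_FALSE(dynamic_scs_enabled);

static inline bool scs_is_dynamic(void)
{
	if (!IS_ENABLED(CONFIG_DYNAMIC_SCS))
		return false;
	return static_branch_likely(&dynamic_scs_enabled);
}

static inline bool scs_is_enabled(void)
{
	if (!IS_ENABLED(CONFIG_DYNAMIC_SCS))
		return true;
	return scs_is_dynamic();
}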
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 9beccca0984022a844850e32f0d7dd80d4a225de)
Bug: 283954062
Change-Id: I71ed23533124b071bd6bf5ab91b2af3bbf03b42b
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Add vendor hooks for recording memory usage.
Vendor modules allocate and manage memory themselves. This
memory might not be included in the kernel memory
statistics. Also, detailed references and vendor-specific
information are managed only inside the modules. When
problems such as memory leaks occur, this information
should be shown in real time.
Bug: 182443489
Bug: 234407991
Bug: 277799025
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Add hooks to init oem data of mutex.
Bug: 231527236
Signed-off-by: Dezhi Huang <huangdezhi@hihonor.com>
Change-Id: Id0aeac168e81bd3d88051657c32ba709c329dbdd
Add a hook in irqtime_account_process_tick, which helps to get
information about high-load tasks.
Bug: 187904818
Change-Id: I644f7d66b09d047ca6b0a0fbd2915a6387c8c007
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit fe580539f6cec43ddb0d6ecfd39aa2f4e45754ca)
(cherry picked from commit bf3b384e0876a3111d114392b405ed269dcb9c9e)
Add hook to dup_task_struct for vendor data fields initialisation.
Bug: 188004638
Change-Id: I4b58604ee822fb8d1e0cc37bec72e820e7318427
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit f66d96b14aab5051fdf6b5054d87362c17a7b365)
(cherry picked from commit bafafe0ec46160573bef46d3d0f5d6c65fadaa3b)
These hooks do the following work:
a) record the time the process spends in various states
b) apply corresponding optimization strategies in the different hooks
Bug: 205938967
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ia3c47bbf0aadd17337ce18fd910343b1b8c3ef93
(cherry picked from commit a61d61bab7c3d221d5d2250eacab2e4e9f59b252)
Signed-off-by: Carlos Llamas <cmllamas@google.com>
(cherry picked from commit b7a1174cc5dfc68cf769547b91958df407e55981)
(cherry picked from commit e1f430a48702844c01933a925c92b077ba14ec3f)
Add a hook to boost threads when a process is killed.
Bug: 237749933
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I7cc6f248397021f3a8271433144a0e582ed27cfa
(cherry picked from commit 709679142d583b0b7338d931fdd43b27b1bbf9e0)
(cherry picked from commit 226f36edddc3ad523dbb901ca3d5683f5f4e1707)
Add hooks to capture various per-zone memory stats when
a trigger threshold is hit.
Bug: 268290366
Change-Id: Ibe39263ddb05ffc3fa63b5225497a90c6480c8d7
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
In order to update cpufreq, vendor modules invoke cpufreq_update_util(),
but when we build our modules, modpost reports an error:
ERROR: modpost: "cpufreq_update_util_data" [xxx.ko] undefined!
Bug: 192218676
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ib1da70229f04b08d8d812d065021dec0bf891e0e
(cherry picked from commit 8943a2e7a33e33fd89614ac83b33b30f8d8c6b96)
(cherry picked from commit d1bc61dd85e8d28f1df0e22e2d73aa9e99cb645e)
Export the tracepoint task_rename to identify specific new tasks,
to customize a task's util for power and performance, or to optimize
task scheduling parameters.
Bug: 189985971
Change-Id: I3bb71eae316e3096d361e7b47012ba46ea4be509
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit 016d3f7b6986d0ca0acef18c59a47cf6eaa4f562)
(cherry picked from commit cfc14a391adb4d44d8186694b4884815bd85be6c)
Add a hook in account_process_tick, which helps us to get information
about high-load tasks and the cpus they are running on.
Bug: 183260319
Change-Id: I54162ce3c65bd69e08d2d4747e4d4883efe4c442
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
This hook allows ticks to be accounted only once every three or more
ticks, to save cpu time spent on accounting.
Bug: 279549765
Change-Id: I5d18e0167fdce076d13aecc653dcf6387bcb25f2
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Add the vendor hook to freezer.c so that OEM's logic can be executed
when the process is about to be frozen. We need to clear the flag for
some tasks and rebind task dependencies for optimization purposes.
Bug: 187458531
Bug: 281920779
Signed-off-by: heshuai1 <heshuai1@xiaomi.com>
Change-Id: Iea42fd9604d6b33ccd6502425416f0dd28eecebb
(cherry picked from commit a1580311c36ca28344b2f03b3c8a72d9f8db5bde)