* This is in preparation for migrating these
as `kernel_build` attributes. i.e. these will
be removed as a follow-up.
Bug: 236012223
Signed-off-by: Ulises Mendez Martinez <umendez@google.com>
(cherry picked from https://android-review.googlesource.com/q/commit:ccc4fb8185b50958354d8d511823491030988131)
Merged-In: I168c44fd76f9f2732caf8f5c00bec4ed8c96ee65
Change-Id: I168c44fd76f9f2732caf8f5c00bec4ed8c96ee65
IOMMU_SYS_CACHE_NWA allows buffers for non-coherent devices to be
mapped with the correct memory attributes so that the buffers can be
cached in the system cache, with a no write allocate cache policy.
However, this property is only usable by drivers that invoke the IOMMU
API directly; it is not usable by drivers that use the DMA API.
Thus, introduce DMA_ATTR_SYS_CACHE_NWA, so that drivers for
non-coherent devices that use the DMA API can use it to specify if
they want a buffer to be cached in the system cache.
Bug: 189339242
Change-Id: Ic812a1fb144a58deb4279c2bf121fc6cc4c3b208
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
IOMMU_SYS_CACHE allows buffers for non-coherent devices to be mapped
with the correct memory attributes so that the buffers can be cached
in the system cache. However, this property is only usable by drivers
that invoke the IOMMU API directly; it is not usable by drivers that
use the DMA API.
Thus, introduce DMA_ATTR_SYS_CACHE, so that drivers for non-coherent
devices that use the DMA API can use it to specify if they want a
buffer to be cached in the system cache.
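
A minimal sketch of how the two DMA attributes could be translated into the matching IOMMU protection flags, assuming the translation is a plain bit-for-bit mapping. The SKETCH_* names, their bit values, and the helper are hypothetical and exist only for illustration; the real values live in the kernel headers.

```c
#include <stdbool.h>

/* Placeholder bit values for this sketch only, not the kernel's values. */
#define SKETCH_DMA_ATTR_SYS_CACHE      (1UL << 0)
#define SKETCH_DMA_ATTR_SYS_CACHE_NWA  (1UL << 1)

#define SKETCH_IOMMU_CACHE             (1 << 0)
#define SKETCH_IOMMU_SYS_CACHE         (1 << 1)
#define SKETCH_IOMMU_SYS_CACHE_NWA     (1 << 2)

/*
 * Hypothetical helper: derive the cache-related IOMMU protection bits
 * from the device's coherency and the DMA attributes of a mapping.
 */
static int sketch_dma_attrs_to_prot(bool coherent, unsigned long attrs)
{
	int prot = coherent ? SKETCH_IOMMU_CACHE : 0;

	if (attrs & SKETCH_DMA_ATTR_SYS_CACHE)
		prot |= SKETCH_IOMMU_SYS_CACHE;
	if (attrs & SKETCH_DMA_ATTR_SYS_CACHE_NWA)
		prot |= SKETCH_IOMMU_SYS_CACHE_NWA;

	return prot;
}
```

With this shape, a driver for a non-coherent device only has to pass the attribute to the DMA API and never touches the IOMMU API itself.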
Bug: 189339242
Change-Id: I849d7a3f36b689afd2f6ee400507223fd6395158
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Add IOMMU_SYS_CACHE and IOMMU_SYS_CACHE_NWA for device mappings.
IOMMU_SYS_CACHE, used by itself, allows device accesses to be cached
in the system cache (if present). IOMMU_SYS_CACHE_NWA, used by itself,
allows device accesses to be cached in the system cache with a
no-write allocate policy.
On systems in which devices can also snoop the CPU caches (i.e.
IO-coherency is present), IOMMU_SYS_CACHE_NWA and IOMMU_SYS_CACHE can
be combined with IOMMU_CACHE (with IOMMU_SYS_CACHE + IOMMU_CACHE being
a no-op).
Bug: 189339242
Change-Id: Ic91616a148f39fead008a5b87a54ffd781fee734
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Non-coherent devices on systems that support a system or
last level cache may want to request that allocations be
cached in the system cache. For memory that is allocated
by the kernel, and used for DMA with devices, the memory
attributes used for CPU access should match the memory
attributes that will be used for device access.
The memory attributes that need to be programmed into
the MAIR for system cache usage are:
0xf4 - Normal memory, outer write back read/write allocate,
inner non-cacheable.
There is currently no support for this memory attribute for
CPU mappings, so add it.
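
As an illustration of the 0xf4 encoding, the sketch below splits a MAIR attribute byte into its outer (bits 7:4) and inner (bits 3:0) fields and places it into an attribute slot, 8 bits per slot. The helper names and the slot index used in any example are hypothetical; only the 0xf4 value comes from the description above.

```c
#include <stdint.h>

/* Attribute value from the description above. */
#define SKETCH_MAIR_ATTR_SYS_CACHE 0xf4u

/* Outer cacheability lives in the high nibble of the attribute byte. */
static unsigned int sketch_mair_outer(uint8_t attr)
{
	return attr >> 4;
}

/* Inner cacheability lives in the low nibble of the attribute byte. */
static unsigned int sketch_mair_inner(uint8_t attr)
{
	return attr & 0xfu;
}

/* Shift an 8-bit attribute into slot idx (0..7) of a 64-bit MAIR value. */
static uint64_t sketch_mair_attridx(uint64_t attr, unsigned int idx)
{
	return attr << (idx * 8);
}
```

For 0xf4 the outer field is 0xf (write-back, read/write allocate) and the inner field is 0x4 (non-cacheable), which matches the attribute listed above.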
Bug: 189339242
Change-Id: I3abc7becd408f20ac5499cbbe3c6c6f53f784107
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Fix case when an existing bpf prog is being removed
Tidy up code
Bug: 279363668
Test: Boots, can copy file to /sdcardfs/Android/data, fuse_test passes
Signed-off-by: Paul Lawrence <paullawrence@google.com>
(cherry picked from https://android-review.googlesource.com/q/commit:64366661e8a9a6d691e5ab6499872d495aed5266)
Merged-In: If0e682f43cbeb62764a7a2be543b90cb974b0aa0
Change-Id: If0e682f43cbeb62764a7a2be543b90cb974b0aa0
Currently, the frequency is calculated by max freq * 1.25 * util / max cap.
Add a vendor hook to adjust the frequency when the calculation
overestimates.
android_vh_map_util_freq
adjust util to freq calculation
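
A rough sketch of the calculation and of where a hook could adjust it. The default formula follows the description above (max freq * 1.25 * util / max cap); the callback type, its argument list, and the example policy are hypothetical and are not the signature of android_vh_map_util_freq.

```c
#include <stddef.h>

/* Default mapping: max_freq * 1.25 * util / max_cap, in integer math. */
static unsigned long sketch_map_util_freq(unsigned long util,
					  unsigned long max_freq,
					  unsigned long max_cap)
{
	return (max_freq + (max_freq >> 2)) * util / max_cap;
}

/* Hypothetical hook type: may overwrite *next_freq. */
typedef void (*sketch_freq_hook_t)(unsigned long util,
				   unsigned long max_freq,
				   unsigned long max_cap,
				   unsigned long *next_freq);

/* Compute the default frequency, then let the hook (if any) adjust it. */
static unsigned long sketch_next_freq(unsigned long util,
				      unsigned long max_freq,
				      unsigned long max_cap,
				      sketch_freq_hook_t hook)
{
	unsigned long next_freq = sketch_map_util_freq(util, max_freq, max_cap);

	if (hook)
		hook(util, max_freq, max_cap, &next_freq);
	return next_freq;
}

/* Example policy (hypothetical): drop the 25% headroom entirely. */
static void sketch_no_headroom(unsigned long util, unsigned long max_freq,
			       unsigned long max_cap, unsigned long *next_freq)
{
	*next_freq = max_freq * util / max_cap;
}
```

With util at half of max cap and a 2000000 kHz max freq, the default path picks 1250000 kHz while the example policy picks 1000000 kHz.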
Bug: 177845439
Signed-off-by: Yun Hsiang <yun.hsiang@mediatek.com>
Change-Id: I9aa9079f00af7d3380b19f2fe21b75cddd107d15
(cherry picked from commit 3122e3ec9672036384304fdeaa1b1815f60ba817)
(cherry picked from commit a2d89d4f3a)
Consider the following sequence of events:
1) A page in a PROT_READ|PROT_WRITE VMA is faulted.
2) Page migration allocates a page with the KASAN allocator,
causing it to receive a non-match-all tag, and uses it
to replace the page faulted in 1.
3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1.
As a result of step 3, we are left with a non-match-all tag for a page
with tags accessible to userspace, which can lead to the same kind of
tag check faults that commit e74a68468062 ("arm64: Reset KASAN tag in
copy_highpage with HW tags only") intended to fix.
The general invariant that we have for pages in a VMA with VM_MTE_ALLOWED
is that they cannot have a non-match-all tag. As a result of step 2, the
invariant is broken. This means that the fix in the referenced commit
was incomplete and we also need to reset the tag for pages without
PG_mte_tagged.
Fixes: e5b8d92189 ("arm64: mte: reset the page tag in page->flags")
Cc: <stable@vger.kernel.org> # 5.15
Link: https://linux-review.googlesource.com/id/I7409cdd41acbcb215c2a7417c1e50d37b875beff
Link: https://lore.kernel.org/all/20230420210945.2313627-1-pcc@google.com/
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 263910115
Change-Id: I7409cdd41acbcb215c2a7417c1e50d37b875beff
[pcc: fixed merge conflict]
The mte_sync_page_tags() function sets PG_mte_tagged if it initializes
page tags. Then we return to mte_sync_tags(), which sets PG_mte_tagged
again. At best, this is redundant. However, it is possible for
mte_sync_page_tags() to return without having initialized tags for the
page, i.e. in the case where check_swap is true (non-compound page),
is_swap_pte(old_pte) is false and pte_is_tagged is false. So at worst,
we set PG_mte_tagged on a page with uninitialized tags. This can happen
if, for example, page migration causes a PTE for an untagged page to
be replaced. If the userspace program subsequently uses mprotect() to
enable PROT_MTE for that page, the uninitialized tags will be exposed
to userspace.
Fix it by removing the redundant call to set_page_mte_tagged().
Fixes: e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: <stable@vger.kernel.org> # 6.1
Link: https://linux-review.googlesource.com/id/Ib02d004d435b2ed87603b858ef7480f7b1463052
Link: https://lore.kernel.org/all/20230420214327.2357985-1-pcc@google.com/
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 263910115
Change-Id: Ib02d004d435b2ed87603b858ef7480f7b1463052
Since these are unmapped from EL1, kmemleak will crash if it accesses
them.
Bug: 275004094
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: Ieb15033c2dc21e6437a3a3c91a8b36e8dda31e98
Since host stage-2 mappings are created lazily, we cannot rely on the
pte in order to recover the target physical address when checking a
host-initiated memory transition.
Instead, move the addr_is_allowed_memory() check into the host callback
function where it is passed the physical address directly from the
walker.
Bug: 279739439
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I84bdc43eded79f1f5e5a489dbc0874604491e5c8
Add the following symbol for qpnp-smb5 driver.
1 function symbol(s) added
'ktime_t alarm_expires_remaining(const struct alarm*)'
Bug: 279705107
Change-Id: I179eb3a46a9b8f95a4a191fc99a4fdd1758efe8e
Signed-off-by: Jishnu Prakash <quic_jprakash@quicinc.com>
c82ae97ea1 ("ANDROID: ABI: Update QCOM symbol list for display
drivers") lost the race with 7b05b74b3b ("ANDROID: 4/26/2023 KMI
update") and hence the CRCs in the representation are wrong. Fix that.
function symbol '__poll_t v4l2_m2m_poll(struct file*, struct v4l2_m2m_ctx*, struct poll_table_struct*)' changed
CRC changed from 0x66202a46 to 0x927a7513
function symbol 'int v4l2_m2m_querybuf(struct file*, struct v4l2_m2m_ctx*, struct v4l2_buffer*)' changed
CRC changed from 0x477bda98 to 0x9040fcee
function symbol 'int v4l2_m2m_reqbufs(struct file*, struct v4l2_m2m_ctx*, struct v4l2_requestbuffers*)' changed
CRC changed from 0x1b578a39 to 0x55e0e942
... 1 omitted; 4 symbols have only CRC changes
Fixes: c82ae97ea1 ("ANDROID: ABI: Update QCOM symbol list for display drivers")
Change-Id: I19c76907ed62c6f91e61df65920ee58216492fff
Signed-off-by: Matthias Maennich <maennich@google.com>
We've recently added a .data section for the hypervisor, which kmemleak
is eager to parse. This clearly doesn't go well, so add the section to
kmemleak's block list.
Bug: 232768943
Bug: 235903024
Change-Id: Ib1ee0009ce05bf7b0ba5d53fc8ca0429ec592102
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 275004094
Signed-off-by: Keir Fraser <keirf@google.com>
* aosp/upstream-f2fs-stable-linux-6.1.y:
f2fs: remove unnessary comment in __may_age_extent_tree
f2fs: allocate node blocks for atomic write block replacement
f2fs: use cow inode data when updating atomic write
f2fs: remove power-of-two limitation of zoned device
f2fs: allocate trace path buffer from names_cache
f2fs: add has_enough_free_secs()
f2fs: relax sanity check if checkpoint is corrupted
f2fs: refactor f2fs_gc to call checkpoint in urgent condition
f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio
f2fs: remove bulk remove_proc_entry() and unnecessary kobject_del()
f2fs: support iopoll method
f2fs: remove batched_trim_sections node description
f2fs: fix to check return value of inc_valid_block_count()
f2fs: fix to check return value of f2fs_do_truncate_blocks()
f2fs: fix passing relative address when discard zones
f2fs: fix potential corruption when moving a directory
f2fs: add radix_tree_preload_end in error case
f2fs: fix to recover quota data correctly
f2fs: fix to check readonly condition correctly
docs: f2fs: Correct instruction to disable checkpoint
f2fs: fix to keep consistent i_gc_rwsem lock order
f2fs: fix to drop all dirty pages during umount() if cp_error is set
f2fs: fix to avoid use-after-free for cached IPU bio
f2fs: remove unneeded in-memory i_crtime copy
f2fs: use f2fs_hw_is_readonly() instead of bdev_read_only()
f2fs: use common implementation of file type
f2fs: merge lz4hc_compress_pages() to lz4_compress_pages()
f2fs: convert to use sysfs_emit
f2fs: set default compress option only when sb_has_compression
f2fs: Fix system crash due to lack of free space in LFS
f2fs: remove struct victim_selection default_v_ops
f2fs: fix null pointer panic in tracepoint in __replace_atomic_write_block
f2fs: fix iostat lock protection
f2fs: fix align check for npo2
f2fs: add compression feature check for all compress mount opt
f2fs: convert is_extension_exist() to return bool type
f2fs: fix scheduling while atomic in decompression path
f2fs: preserve direct write semantics when buffering is forced
f2fs: compress: fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages()
f2fs: remove else in f2fs_write_cache_pages()
f2fs: apply zone capacity to all zone type
f2fs: fix to handle filemap_fdatawrite() error in f2fs_ioc_decompress_file/f2fs_ioc_compress_file
f2fs: convert to MAX_SBI_FLAG instead of 32 in stat_show()
f2fs: Fix discard bug on zoned block devices with 2MiB zone size
f2fs: remove entire rb_entry sharing
f2fs: factor out discard_cmd usage from general rb_tree use
f2fs: factor out victim_entry usage from general rb_tree use
f2fs: fix uninitialized skipped_gc_rwsem
f2fs: handle dqget error in f2fs_transfer_project_quota()
f2fs: convert to use bitmap API
f2fs: export compress_percent and compress_watermark entries
f2fs: make f2fs_sync_inode_meta() static
f2fs: Fix f2fs_truncate_partial_nodes ftrace event
Bug: 273795759
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I260f4009b3bb6a0ffca20488d0ad0e41e92fb9d2
Set KMI_GENERATION=5 for 4/26 KMI update
4 function symbol(s) added
'int __traceiter_android_rvh_set_gfp_zone_flags(void*, unsigned int*)'
'int __traceiter_android_rvh_set_readahead_gfp_mask(void*, unsigned int*)'
'int __traceiter_android_vh_kswapd_per_node(void*, int, bool*, bool)'
'int kswapd(void*)'
3 variable symbol(s) added
'struct tracepoint __tracepoint_android_rvh_set_gfp_zone_flags'
'struct tracepoint __tracepoint_android_rvh_set_readahead_gfp_mask'
'struct tracepoint __tracepoint_android_vh_kswapd_per_node'
function symbol 'struct block_device* I_BDEV(struct inode*)' changed
CRC changed from 0xbf847796 to 0xbc7aa1fb
function symbol 'void __ClearPageMovable(struct page*)' changed
CRC changed from 0xd312e35b to 0x3607cc69
function symbol 'void __SetPageMovable(struct page*, const struct movable_operations*)' changed
CRC changed from 0x9c92af65 to 0x44efe80c
... 4301 omitted; 4304 symbols have only CRC changes
type 'struct request' changed
byte size changed from 280 to 304
member 'struct { struct io_cq* icq; void* priv[2]; } elv' was added
member 'struct { unsigned int seq; struct list_head list; rq_end_io_fn* saved_end_io; } flush' was added
member 'union { struct { struct io_cq* icq; void* priv[2]; } elv; struct { unsigned int seq; struct list_head list; rq_end_io_fn* saved_end_io; } flush; }' was removed
3 members ('union { struct __call_single_data csd; u64 fifo_time; }' .. 'void* end_io_data') changed
offset changed by 192
type 'struct super_block' changed
member 'int cleancache_poolid' was added
14 members ('struct shrinker s_shrink' .. 'int s_stack_depth') changed
offset changed by 64
type 'struct pglist_data' changed
byte size changed from 9088 to 9216
member 'struct task_struct* mkswapd[16]' was added
18 members ('int kswapd_order' .. 'atomic_long_t vm_stat[42]') changed
offset changed by 1024
type 'struct netns_ipv6' changed
member 'struct list_head mr6_tables' was added
member 'struct fib_rules_ops* mr6_rules_ops' was added
member 'struct mr_table* mrt6' was removed
8 members ('atomic_t dev_addr_genid' .. 'struct ioam6_pernet_data* ioam6_data') changed
offset changed by 128
type 'struct fscrypt_operations' changed
byte size changed from 104 to 136
member 'u64 android_oem_data1[4]' was added
type 'struct dma_heap_ops' changed
byte size changed from 8 to 16
member 'long(* get_pool_size)(struct dma_heap*)' was added
type 'struct per_cpu_pages' changed
byte size changed from 256 to 320
member changed from 'struct list_head lists[13]' to 'struct list_head lists[17]'
type changed from 'struct list_head[13]' to 'struct list_head[17]'
number of elements changed from 13 to 17
Bug: 279074305
Change-Id: I21b301a1a4a761e935ff5679d143c2614e533ad6
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Create a vendor hook inside of gfp_zone() to modify which allocations
get to enter ZONE_MOVABLE, by zeroing out __GFP_HIGHMEM inside of the
trace hook based on certain conditions.
Separately, create a trace hook in the readahead path to affect the
behavior of the tracehook in gfp_zone().
In 5.15, we had set_skip_swapcache_flags trace-hook in do_swap_page()
but commit ac26e9c7b809 ("ANDROID: cma: allow to use CMA in swap-in path")
added __GFP_CMA explicitly, so the set_skip_swapcache_flags trace hook
is no longer needed.
Note: To comply with vendor hook guidelines, avoid including types.h in
trace/hooks/mm.h and use unsigned int for gfp_t.
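
A simplified model of the gfp_zone() hook, assuming a 64-bit configuration where an allocation lands in ZONE_MOVABLE only when both __GFP_HIGHMEM and __GFP_MOVABLE are set. The SKETCH_* names, the two-zone enum, and the callback type are hypothetical; the hook takes unsigned int rather than gfp_t, as noted above.

```c
enum sketch_zone {
	SKETCH_ZONE_NORMAL,
	SKETCH_ZONE_MOVABLE,
};

#define SKETCH_GFP_HIGHMEM 0x02u
#define SKETCH_GFP_MOVABLE 0x08u

/* Hypothetical hook type: may modify the flags before zone selection. */
typedef void (*sketch_gfp_hook_t)(unsigned int *flags);

/* Pick a zone after giving the hook a chance to edit the flags. */
static enum sketch_zone sketch_gfp_zone(unsigned int flags,
					sketch_gfp_hook_t hook)
{
	if (hook)
		hook(&flags);

	if ((flags & SKETCH_GFP_HIGHMEM) && (flags & SKETCH_GFP_MOVABLE))
		return SKETCH_ZONE_MOVABLE;
	return SKETCH_ZONE_NORMAL;
}

/* Example policy: keep the allocation out of ZONE_MOVABLE. */
static void sketch_keep_out_of_movable(unsigned int *flags)
{
	*flags &= ~SKETCH_GFP_HIGHMEM;
}
```

Clearing only __GFP_HIGHMEM leaves __GFP_MOVABLE intact, so the page stays movable but is no longer eligible for ZONE_MOVABLE.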
Bug: 158645321
Change-Id: Idfa6b0b06b1b819d706c847e702bc94ddf7aa55a
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Signed-off-by: Sukadev Bhattiprolu <quic_sukadev@quicinc.com>
Though zram pages are movable, they aren't allowed to enter
MIGRATE_CMA pageblocks. zram is not seen to pin pages for long,
which is what could otherwise cause an issue. Moreover, allowing
zram to pick CMA pages can help in cases where zram order-0
allocations fail while there are lots of free CMA pages, resulting
in kswapd or direct reclaim not making enough progress.
Bug: 158645321
Link: https://lore.kernel.org/linux-mm/4c77bb100706b714213ff840d827a48e40ac9177.1604282969.git.cgoldswo@codeaurora.org/
Change-Id: I31f4a21781cdb31982a768daa59e9546d7667b08
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
[isaacm@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Sukadev Bhattiprolu <quic_sukadev@quicinc.com>
Add a PCP list for __GFP_CMA allocations so as not to deprive
MIGRATE_MOVABLE allocations quick access to pages on their PCP
lists.
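
The extra PCP type explains the 'struct list_head lists[13]' to 'lists[17]' change in the ABI report. A worked sketch of the arithmetic, assuming the list count is derived as PCP migrate types * (PAGE_ALLOC_COSTLY_ORDER + 1) plus one THP list, with PAGE_ALLOC_COSTLY_ORDER equal to 3 (the SKETCH_* names are stand-ins, not kernel macros):

```c
#define SKETCH_PAGE_ALLOC_COSTLY_ORDER 3
#define SKETCH_NR_PCP_THP              1

/* Number of per-CPU page lists for a given count of PCP migrate types. */
static int sketch_nr_pcp_lists(int migrate_pcptypes)
{
	return migrate_pcptypes * (SKETCH_PAGE_ALLOC_COSTLY_ORDER + 1) +
	       SKETCH_NR_PCP_THP;
}
```

Three PCP types give 13 lists and four give 17. On a 64-bit build, where a list head is two pointers, the four extra lists account for the 64-byte growth of 'struct per_cpu_pages' (256 to 320).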
Bug: 158645321
Change-Id: I9831eed113ec9e851b4f651755205ac9cf23b9be
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
[isaacm@codeaurora.org: Resolve merge conflicts related to new mm
features]
Signed-off-by: Isaac J. Manjarres <isaacm@quicinc.com>
[quic_sukadev@quicinc.com: Resolve merge conflicts due to earlier patch
dropping gfp_flags; drop BUILD_BUG_ON related to MIGRATETYPE_HIGHATOMIC
since its value changed.]
Signed-off-by: Sukadev Bhattiprolu <quic_sukadev@quicinc.com>
CMA pages are designed to be used as fallback for movable allocations
and cannot be used for non-movable allocations. If CMA pages are
utilized poorly, non-movable allocations may end up getting starved if
all regular movable pages are allocated and the only pages left are
CMA. Always using CMA pages first creates unacceptable performance
problems. As a midway alternative, use CMA pages for certain
userspace allocations. The userspace pages can be migrated or dropped
quickly, which gives decent utilization.
Additionally, add fallbacks for failed CMA allocations in rmqueue()
and __rmqueue_pcplist() (the latter addition being driven by a report
by the kernel test robot); these fallbacks were dealt with differently
in the original version of the patch, as the rmqueue() call chain has
changed.
Bug: 158645321
Link: https://lore.kernel.org/lkml/cover.1604282969.git.cgoldswo@codeaurora.org/
Change-Id: Iad46f0405b416e29ae788f82b79c9953513a9c9d
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
[cgoldswo@codeaurora.org: Place in bugfixes; remove cma_alloc zone flag]
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
[isaacm@codeaurora.org: Resolve merge conflicts to account for new mm
features]
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
[quic_sukadev@quicinc.com: dropped unused gfp_flags parameter to
__rmqueue_pcplist(), resolved some conflicts]
Signed-off-by: Sukadev Bhattiprolu <quic_sukadev@quicinc.com>
'struct fscrypt_operations' shouldn't really be part of the KMI, as
there's no reason for loadable modules to use it. However, due to the
way MODVERSIONS calculates symbol CRCs by recursively dereferencing
structures, changes to 'struct fscrypt_operations' affect the CRCs of
KMI functions exported from certain core kernel files such as
fs/dcache.c. That brings it in-scope for the KMI freeze.
There is an OEM who wants to add fields to this struct, so add an
ANDROID_OEM_DATA_ARRAY for them to use.
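
A small sketch of what such a reserved array amounts to, assuming the macro expands to a u64 array member. The SKETCH_OEM_DATA_ARRAY macro and the stand-in struct are hypothetical; the real struct has its own callbacks.

```c
#include <stdint.h>

/* Stand-in for the macro: declares a u64 array named android_oem_data<n>. */
#define SKETCH_OEM_DATA_ARRAY(n, s) uint64_t android_oem_data##n[s]

/* Hypothetical ops struct with reserved OEM space appended at the end. */
struct sketch_ops {
	int (*callback)(void *ctx);
	SKETCH_OEM_DATA_ARRAY(1, 4);
};
```

Four u64 slots reserve 32 bytes, which matches the 'struct fscrypt_operations' growth from 104 to 136 bytes in the KMI update.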
Bug: 173475629
Change-Id: Idfc76884fce8a5fcc0837cd9363695d5428b1624
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: j7093.jung <j7093.jung@samsung.com>
We need to pass some device specific flags that are detected from EL1
(as built-in sync device) to the hypervisor. The flags are defined
by the driver but hosted in the main iommu struct.
As we use SMCCC 1.1, we only have 7 args, all of which were already
used, so mem_size is removed: it is not really needed, as all page
donations are 1 page, so passing the base address is enough.
Bug: 255266847
Change-Id: I14e6d2573d7a822334455999aa9fd6f01ac97450
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Page replacement is handled in the Linux Kernel in one of two ways:
1) Asynchronously via kswapd
2) Synchronously, via direct reclaim
At page allocation time, the allocating task is immediately given a page
from the zone free list, allowing it to go right back to work doing
whatever it was doing, probably directly or indirectly executing business
logic.
Just prior to satisfying the allocation, the number of free pages is
checked to see if it has reached the zone low watermark and, if so,
kswapd is awakened.
Kswapd will start scanning pages looking for inactive pages to evict to
make room for new page allocations. The work of kswapd allows tasks to
continue allocating memory from their respective zone free list without
incurring any delay.
When the demand for free pages exceeds the rate that kswapd tasks can
supply them, page allocation works differently. Once the allocating task
finds that the number of free pages is at or below the zone min watermark,
the task will no longer pull pages from the free list. Instead, the task
will run the same CPU-bound routines as kswapd to satisfy its own
allocation by scanning and evicting pages. This is called a direct reclaim.
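
The two watermark checks described above can be sketched as a simple classification. This is a simplified model under the assumption that only the free page count and the zone's min and low watermarks matter; the enum and function names are hypothetical.

```c
enum sketch_alloc_path {
	SKETCH_FAST_PATH,       /* page handed out, nothing else to do */
	SKETCH_WAKE_KSWAPD,     /* page handed out, kswapd is awakened */
	SKETCH_DIRECT_RECLAIM,  /* task must scan and evict pages itself */
};

/* Classify an allocation by comparing free pages against the watermarks. */
static enum sketch_alloc_path sketch_classify(unsigned long free_pages,
					      unsigned long wmark_min,
					      unsigned long wmark_low)
{
	if (free_pages <= wmark_min)
		return SKETCH_DIRECT_RECLAIM;
	if (free_pages <= wmark_low)
		return SKETCH_WAKE_KSWAPD;
	return SKETCH_FAST_PATH;
}
```

Only the last case stalls the allocating task; the first two return a page from the free list immediately.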
The time spent performing a direct reclaim can be substantial, often
taking tens to hundreds of milliseconds for small order-0 allocations to
half a second or more for order-9 huge-page allocations. In fact, kswapd is
not actually required on a Linux system. It exists for the sole purpose of
optimizing performance by preventing direct reclaims.
When memory shortfall is sufficient to trigger direct reclaims, they can
occur in any task that is running on the system. A single aggressive
memory allocating task can set the stage for collateral damage to occur in
small tasks that rarely allocate additional memory. Consider the impact of
injecting an additional 100ms of latency when nscd allocates memory to
facilitate caching of a DNS query.
The presence of direct reclaims 10 years ago was a fairly reliable
indicator that too much was being asked of a Linux system. Kswapd was
likely wasting time scanning pages that were ineligible for eviction.
Adding RAM or reducing the working set size would usually make the problem
go away. Since then hardware has evolved to bring a new struggle for
kswapd. Storage speeds have increased by orders of magnitude while CPU
clock speeds stayed the same or even slowed down in exchange for more
cores per package. This presents a throughput problem for a single
threaded kswapd that will get worse with each generation of new hardware.
Test Details
NOTE: The tests below were run with shadow entries disabled. See the
associated patch and cover letter for details.
The tests below were designed with the assumption that a kswapd bottleneck
is best demonstrated using filesystem reads. This way, the inactive list
will be full of clean pages, simplifying the analysis and allowing kswapd
to achieve the highest possible steal rate. Maximum steal rates for kswapd
are likely to be the same or lower for any other mix of page types on the
system.
Tests were run on a 2U Oracle X7-2L with 52 Intel Xeon Skylake 2GHz cores,
756GB of RAM and 8 x 3.6 TB NVMe Solid State Disk drives. Each drive has
an XFS file system mounted separately as /d0 through /d7. SSD drives
require multiple concurrent streams to show their potential, so I created
eleven 250GB zero-filled files on each drive so that I could test with
parallel reads.
The test script runs in multiple stages. At each stage, the number of dd
tasks run concurrently is increased by 2. I did not include all of the
test output for brevity.
During each stage dd tasks are launched to read from each drive in a round
robin fashion until the specified number of tasks for the stage has been
reached. Then iostat, vmstat and top are started in the background with 10
second intervals. After five minutes, all of the dd tasks are killed and
the iostat, vmstat and top output is parsed in order to report the
following:
CPU consumption
- sy - aggregate kernel mode CPU consumption from vmstat output. The value
doesn't tend to fluctuate much so I just grab the highest value.
Each sample is averaged over 10 seconds
- dd_cpu - for all of the dd tasks averaged across the top samples since
there is a lot of variation.
Throughput
- in Kbytes
- Command is iostat -x -d 10 -g total
This first test performs reads using O_DIRECT in order to show the maximum
throughput that can be obtained using these drives. It also demonstrates
how rapidly throughput scales as the number of dd tasks is increased.
The dd command for this test looks like this:
Command Used: dd iflag=direct if=/d${i}/$n of=/dev/null bs=4M
Test #1: Direct IO
dd sy dd_cpu  throughput
 6  0   2.33 14726026.40
10  1   2.95 19954974.80
16  1   2.63 24419689.30
22  1   2.63 25430303.20
28  1   2.91 26026513.20
34  1   2.53 26178618.00
40  1   2.18 26239229.20
46  1   1.91 26250550.40
52  1   1.69 26251845.60
58  1   1.54 26253205.60
64  1   1.43 26253780.80
70  1   1.31 26254154.80
76  1   1.21 26253660.80
82  1   1.12 26254214.80
88  1   1.07 26253770.00
90  1   1.04 26252406.40
Throughput was close to peak with only 22 dd tasks. Very little system CPU
was consumed, as expected, since the drives DMA directly into the user
address space when using direct IO.
In this next test, the iflag=direct option is removed and we only run the
test until the pgscan_kswapd from /proc/vmstat starts to increment. At
that point metrics are parsed and reported and the pagecache contents are
dropped prior to the next test. Lather, rinse, repeat.
Test #2: standard file system IO, no page replacement
dd sy dd_cpu  throughput
 6  2  28.78  5134316.40
10  3  31.40  8051218.40
16  5  34.73 11438106.80
22  7  33.65 14140596.40
28  8  31.24 16393455.20
34 10  29.88 18219463.60
40 11  28.33 19644159.60
46 11  25.05 20802497.60
52 13  26.92 22092370.00
58 13  23.29 22884881.20
64 14  23.12 23452248.80
70 15  22.40 23916468.00
76 16  22.06 24328737.20
82 17  20.97 24718693.20
88 16  18.57 25149404.40
90 16  18.31 25245565.60
Each read has to pause after the buffer in kernel space is populated while
those pages are added to the pagecache and copied into the user address
space. For this reason, more parallel streams are required to achieve peak
throughput. The copy operation consumes substantially more CPU than direct
IO as expected.
The next test measures throughput after kswapd starts running. This is the
same test only we wait for kswapd to wake up before we start collecting
metrics. The script actually keeps track of a few things that were not
mentioned earlier. It tracks direct reclaims and page scans by watching
the metrics in /proc/vmstat. CPU consumption for kswapd is tracked the
same way it is tracked for dd.
Since the test is 100% reads, you can assume that the page steal rate for
kswapd and direct reclaims is almost identical to the scan rate.
Test #3: 1 kswapd thread per node
dd sy dd_cpu kswapd0 kswapd1  throughput    dr pgscan_kswapd pgscan_direct
10  4  26.07   28.56   27.03  7355924.40     0     459316976             0
16  7  34.94   69.33   69.66 10867895.20     0     872661643             0
22 10  36.03   93.99   99.33 13130613.60   489    1037654473      11268334
28 10  30.34   95.90   98.60 14601509.60   671    1182591373      15429142
34 14  34.77   97.50   99.23 16468012.00 10850    1069005644     249839515
40 17  36.32   91.49   97.11 17335987.60 18903     975417728     434467710
46 19  38.40   90.54   91.61 17705394.40 25369     855737040     582427973
52 22  40.88   83.97   83.70 17607680.40 31250     709532935     724282458
58 25  40.89   82.19   80.14 17976905.60 35060     657796473     804117540
64 28  41.77   73.49   75.20 18001910.00 39073     561813658     895289337
70 33  45.51   63.78   64.39 17061897.20 44523     379465571    1020726436
76 36  46.95   57.96   60.32 16964459.60 47717     291299464    1093172384
82 39  47.16   55.43   56.16 16949956.00 49479     247071062    1134163008
88 42  47.41   53.75   47.62 16930911.20 51521     195449924    1180442208
90 43  47.18   51.40   50.59 16864428.00 51618     190758156    1183203901
In the previous test where kswapd was not involved, the system-wide kernel
mode CPU consumption with 90 dd tasks was 16%. In this test CPU consumption
with 90 tasks is at 43%. With 52 cores, and two kswapd tasks (one per NUMA
node), kswapd can only be responsible for a little over 4% of the increase.
The rest is likely caused by 51,618 direct reclaims that scanned 1.2
billion pages over the five minute time period of the test.
Same test, more kswapd tasks:
Test #4: 4 kswapd threads per node
dd sy dd_cpu kswapd0 kswapd1  throughput    dr pgscan_kswapd pgscan_direct
10  5  27.09   16.65   14.17  7842605.60     0     459105291             0
16 10  37.12   26.02   24.85 11352920.40    15     920527796        358515
22 11  36.94   37.13   35.82 13771869.60     0    1132169011             0
28 13  35.23   48.43   46.86 16089746.00     0    1312902070             0
34 15  33.37   53.02   55.69 18314856.40     0    1476169080             0
40 19  35.90   69.60   64.41 19836126.80     0    1629999149             0
46 22  36.82   88.55   57.20 20740216.40     0    1708478106             0
52 24  34.38   93.76   68.34 21758352.00     0    1794055559             0
58 24  30.51   79.20   82.33 22735594.00     0    1872794397             0
64 26  30.21   97.12   76.73 23302203.60   176    1916593721       4206821
70 33  32.92   92.91   92.87 23776588.00  3575    1817685086      85574159
76 37  31.62   91.20   89.83 24308196.80  4752    1812262569     113981763
82 29  25.53   93.23   92.33 24802791.20   306    2032093122       7350704
88 43  37.12   76.18   77.01 25145694.40 20310    1253204719     487048202
90 42  38.56   73.90   74.57 22516787.60 22774    1193637495     545463615
By increasing the number of kswapd threads, throughput increased by ~50%
while kernel mode CPU utilization decreased or stayed the same, likely due
to a decrease in the number of parallel tasks at any given time doing page
replacement.
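
The size of the gain depends on which rows are compared. As a worked example, the helper below (a hypothetical name, not part of the test script) computes the relative change between two throughput samples; feeding it the 88-task rows from Test #3 and Test #4 gives roughly 48.5%.

```c
/* Relative change from before to after, in percent. */
static double sketch_percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

The same helper applied to the 64-task rows (18001910.00 and 23302203.60) gives a smaller gain of a little under 30%.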
Signed-off-by: Buddy Lumpkin <buddy.lumpkin@oracle.com>
Bug: 201263306
Link: https://lore.kernel.org/lkml/1522661062-39745-1-git-send-email-buddy.lumpkin@oracle.com
[charante@codeaurora.org]: Changes made to select number of kswapds through uapi
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
[quic_vjitta@quicinc.com]: Changes made to move multiple kswapd threads logic to vendor hooks
Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com>
(cherry picked from commit 0d61a651e4dd3c61d1658cc92e0b0450c8374738)
Change-Id: I8425cab7f40cbeaf65af0ea118c1a9ac7da0930e
[quic_vjitta@quicinc.com]: Resolved minor merge conflicts
Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com>
To support multiple kswapd threads, vendor modules need
access to the kswapd() function. So, export it.
Bug: 201263306
Change-Id: I442612710835f39836a295e9d1936f86826ab960
Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com>
(cherry picked from commit 12972dd7bfa306aa07c92966c4efe7b1c0c5e043)
Enable support for multicast policy routing. This will allow border
router devices to run multiple routing tables simultaneously.
Bug: 233821827
Change-Id: Ib029f4db1c5bb9416c06813fa0b66c965fef8fd8
Signed-off-by: Carlos Llamas <cmllamas@google.com>
(cherry picked from commit c9e98bfeeeae4580143ec87b4f1f3ef8571dc331)
Update symbol list after making the DMA-BUF heap page-pool helper
Bug: 275698445
Change-Id: Ic063172de9c81a3f7fcd9c3f8a07a81ddd1a6d8c
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Update symbol list after making the DMA-BUF heap page-pool helper
library built-in.
Bug: 275698445
Change-Id: If06ccb4c916da03a9b6e1a05305089ca7ab9514d
Signed-off-by: T.J. Mercier <tjmercier@google.com>
In order to help with memory accounting, expose the total pool size of
all DMA-BUF heaps at /sys/kernel/dma_heap/total_pools_kb.
This information will be exposed as part of Android Bugreport[1].
[1]: https://android-review.googlesource.com/q/topic:%22b%252F167709539%22+(status:open%20OR%20status:merged)
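
A minimal sketch of how such a total could be computed, assuming each heap may optionally report its pool size in bytes through a get_pool_size() callback and that the sysfs value is the sum converted to KiB. The sketch_* structs and helpers are hypothetical stand-ins for the kernel's heap types.

```c
#include <stddef.h>

struct sketch_heap;

/* Stand-in ops table: get_pool_size() is optional and returns bytes. */
struct sketch_heap_ops {
	long (*get_pool_size)(struct sketch_heap *heap);
};

struct sketch_heap {
	const struct sketch_heap_ops *ops;
	long pool_bytes;
};

/* Example callback: report the heap's recorded pool size. */
static long sketch_pool_bytes(struct sketch_heap *heap)
{
	return heap->pool_bytes;
}

/* Sum the pool sizes of all heaps that report one; return the total in KiB. */
static unsigned long long sketch_total_pools_kb(struct sketch_heap *heaps,
						size_t count)
{
	unsigned long long total_bytes = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (heaps[i].ops && heaps[i].ops->get_pool_size)
			total_bytes += heaps[i].ops->get_pool_size(&heaps[i]);
	}
	return total_bytes / 1024;
}
```

Heaps without the callback simply contribute nothing, so the total only reflects heaps that maintain a page pool.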
Bug: 167709539
Bug: 275698445
Change-Id: I6a1b52517e73103122690f6567f4f295db9ca1ad
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
This patch does not change any functionality. This patch is a subset of
the following patch that is expected to be merged upstream soon:
https://lore.kernel.org/linux-block/20230407235822.1672286-3-bvanassche@acm.org/
Bug: 275581839
Bug: 277112517
Change-Id: I717d1c78233b92fd18297c81ef15335684da5d54
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Merge 6.1.25 into android14-6.1
Changes in 6.1.25
Revert "pinctrl: amd: Disable and mask interrupts on resume"
drm/amd/display: Pass the right info to drm_dp_remove_payload
ALSA: emu10k1: fix capture interrupt handler unlinking
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
ALSA: hda: patch_realtek: add quirk for Asus N7601ZM
ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2
ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
ALSA: emu10k1: don't create old pass-through playback device on Audigy
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
ALSA: hda/hdmi: disable KAE for Intel DG2
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Bluetooth: Fix race condition in hidp_session_thread
bluetooth: btbcm: Fix logic error in forming the board name.
Bluetooth: Free potentially unfreed SCO connection
Bluetooth: hci_conn: Fix possible UAF
btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
btrfs: fix fast csum implementation detection
fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
mtdblock: tolerate corrected bit-flips
mtd: rawnand: meson: fix bitmask for length in command word
mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
mtd: rawnand: stm32_fmc2: use timings.mode instead of checking tRC_min
KVM: arm64: PMU: Restore the guest's EL0 event counting after migration
fbcon: Fix error paths in set_con2fb_map
fbcon: set_con2fb_map needs to set con2fb_map!
drm/i915/dsi: fix DSS CTL register offsets for TGL+
clk: sprd: set max_register according to mapping range
RDMA/irdma: Do not generate SW completions for NOPs
RDMA/irdma: Fix memory leak of PBLE objects
RDMA/irdma: Increase iWARP CM default rexmit count
RDMA/irdma: Add ipv4 check to irdma_find_listener()
IB/mlx5: Add support for 400G_8X lane speed
RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192
RDMA/erdma: Inline mtt entries into WQE if supported
RDMA/erdma: Defer probing if netdevice can not be found
clk: rs9: Fix suspend/resume
RDMA/cma: Allow UD qp_type to join multicast only
bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
LoongArch, bpf: Fix jit to skip speculation barrier opcode
dmaengine: apple-admac: Handle 'global' interrupt flags
dmaengine: apple-admac: Set src_addr_widths capability
dmaengine: apple-admac: Fix 'current_tx' not getting freed
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
bpf, arm64: Fixed a BTI error on returning to patched function
KVM: arm64: Initialise hypervisor copies of host symbols unconditionally
KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMs
niu: Fix missing unwind goto in niu_alloc_channels()
tcp: restrict net.ipv4.tcp_app_win
bonding: fix ns validation on backup slaves
iavf: refactor VLAN filter states
iavf: remove active_cvlans and active_svlans bitmaps
net: openvswitch: fix race on port output
Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
Bluetooth: Fix printing errors if LE Connection times out
Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
Bluetooth: Set ISO Data Path on broadcast sink
drm/armada: Fix a potential double free in an error handling path
qlcnic: check pci_reset_function result
net: wwan: iosm: Fix error handling path in ipc_pcie_probe()
cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
sctp: fix a potential overflow in sctp_ifwdtsn_skip
RDMA/core: Fix GID entry ref leak when create_ah fails
selftests: openvswitch: adjust datapath NL message declaration
udp6: fix potential access to stale information
net: macb: fix a memory corruption in extended buffer descriptor mode
skbuff: Fix a race between coalescing and releasing SKBs
libbpf: Fix single-line struct definition output in btf_dump
ARM: 9290/1: uaccess: Fix KASAN false-positives
ARM: dts: qcom: apq8026-lg-lenok: add missing reserved memory
power: supply: rk817: Fix unsigned comparison with less than zero
power: supply: cros_usbpd: reclassify "default case!" as debug
power: supply: axp288_fuel_gauge: Added check for negative values
selftests/bpf: Fix progs/find_vma_fail1.c build error.
wifi: mwifiex: mark OF related data as maybe unused
i2c: imx-lpi2c: clean rx/tx buffers upon new message
i2c: hisi: Avoid redundant interrupts
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
block: ublk_drv: mark device as LIVE before adding disk
ACPI: video: Add backlight=native DMI quirk for Acer Aspire 3830TG
drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
hwmon: (peci/cputemp) Fix miscalculated DTS for SKX
hwmon: (xgene) Fix ioremap and memremap leak
verify_pefile: relax wrapper length check
asymmetric_keys: log on fatal failures in PE/pkcs7
nvme: send Identify with CNS 06h only to I/O controllers
wifi: iwlwifi: mvm: fix mvmtxq->stopped handling
wifi: iwlwifi: mvm: protect TXQ list manipulation
drm/amdgpu: add mes resume when do gfx post soft reset
drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
drm/amdgpu/gfx: set cg flags to enter/exit safe mode
ACPI: resource: Add Medion S17413 to IRQ override quirk
x86/hyperv: Move VMCB enlightenment definitions to hyperv-tlfs.h
KVM: selftests: Move "struct hv_enlightenments" to x86_64/svm.h
KVM: SVM: Add a proper field for Hyper-V VMCB enlightenments
x86/hyperv: KVM: Rename "hv_enlightenments" to "hv_vmcb_enlightenments"
KVM: SVM: Flush Hyper-V TLB when required
tracing: Add trace_array_puts() to write into instance
tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
maple_tree: fix write memory barrier of nodes once dead for RCU mode
ksmbd: avoid out of bounds access in decode_preauth_ctxt()
riscv: add icache flush for nommu sigreturn trampoline
HID: intel-ish-hid: Fix kernel panic during warm reset
net: sfp: initialize sfp->i2c_block_size at sfp allocation
net: phy: nxp-c45-tja11xx: add remove callback
net: phy: nxp-c45-tja11xx: fix unsigned long multiplication overflow
scsi: ses: Handle enclosure with just a primary component gracefully
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
cgroup: fix display of forceidle time at root
cgroup/cpuset: Fix partition root's cpuset.cpus update bug
cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings
drm/amd/pm: correct SMU13.0.7 max shader clock reporting
mptcp: use mptcp_schedule_work instead of open-coding it
mptcp: stricter state check in mptcp_worker
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
ubi: Fix deadlock caused by recursively holding work_sem
i2c: mchp-pci1xxxx: Update Timing registers
powerpc/papr_scm: Update the NUMA distance table for the target node
sched/fair: Fix imbalance overflow
x86/rtc: Remove __init for runtime functions
i2c: ocores: generate stop condition after timeout in polling mode
cifs: fix negotiate context parsing
nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
cgroup/cpuset: Skip spread flags update on v2
cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
Linux 6.1.25
Change-Id: Ib4d2c49ea9bacb8d8dbdb7b3a4eecce937016427
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
The 6.1.24 release requires the addition of the symbol `strchrnul` for
the db845c target to build properly.
Bug: 279448025
Change-Id: I3643400271513fbd0bad68fca720039d3a5a98db
Signed-off-by: Ulises Mendez Martinez <umendez@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
In release 6.1.24 a new .h file was included in the xhci code, which
caused the CRCs to change as some structures changed into "real"
structures instead of anonymous definitions. So preserve the CRCs by
compiling out the #include while genksyms is calculating them.
This will be removed at the next KABI break, as it shouldn't be
sticking around long.
Bug: 161946584
Fixes: 167c05646f ("xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I59838b0af869d3e17fc73d72eb473190c50281fd
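The guard described above can be sketched as follows. This is an illustration, not the actual xhci change: `<stddef.h>` stands in for the newly added kernel header, and `NEW_HEADER_VISIBLE` and `new_header_visible()` are hypothetical names that exist only to make the effect observable. The one fixed point is that the kernel build passes `-D__GENKSYMS__` to the preprocessor when generating symbol CRCs.

```c
/*
 * __GENKSYMS__ is defined only during the preprocessing pass whose
 * output is fed to genksyms. Guarding an #include with it hides the
 * header from the CRC calculation while the normal compile still
 * sees it.
 */
#ifndef __GENKSYMS__
#include <stddef.h>          /* stand-in for the newly added header */
#define NEW_HEADER_VISIBLE 1 /* hypothetical marker for illustration */
#else
#define NEW_HEADER_VISIBLE 0
#endif

/* Reports whether this translation unit saw the guarded header. */
int new_header_visible(void)
{
    return NEW_HEADER_VISIBLE;
}
```

In a normal compile the header is included as usual. With `-D__GENKSYMS__` the structures it defines stay opaque to genksyms, so the CRCs of symbols that reference them match their earlier values.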
This reverts commit 53a0031217.
It breaks the current Android kernel ABI. It will be brought back at
the next KABI break update.
Bug: 161946584
Change-Id: I7fd2655234ff38dfe11a528e71c7772458a36328
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit b34056bedf.
It breaks the current Android kernel ABI. It will be brought back at
the next KABI break update.
Bug: 161946584
Change-Id: I3664de0db15ba207c8b35530840095083d376dd1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 547cc8dae2.
It breaks the current Android kernel ABI. It will be brought back at
the next KABI break update.
Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3ba5ae5485e8cfb709ad23885053a3db8112913b
This reverts commit 98ba763cc9.
It breaks the current Android kernel ABI. It will be brought back at
the next KABI break update.
Bug: 161946584
Change-Id: Ic6c1286d261b9a502caeb8bac2244cd6037f291e
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 8a3a6a0aca.
It breaks the current Android kernel ABI. It will be brought back at
the next KABI break update.
Bug: 161946584
Change-Id: I7126fe19db71c01814281eb6518f11ededccc3e3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>