Commit Graph

1159035 Commits

Jakub Kicinski
dc6facfe02 UPSTREAM: net: tls: handle backlogging of crypto requests
commit 8590541473188741055d27b955db0777569438e3 upstream.

Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
-EBUSY instead of -EINPROGRESS in valid situations. For example, when
the cryptd queue for AESNI is full (easy to trigger with an
artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
to the backlog but still processed. In that case, the async callback
will also be called twice: first with err == -EINPROGRESS, which it
seems we can just ignore, then with err == 0.

Compared to Sabrina's original patch this version uses the new
tls_*crypt_async_wait() helpers and converts the EBUSY to
EINPROGRESS to avoid having to modify all the error handling
paths. The handling is identical.
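
A minimal sketch of the idea, assuming the 6.1-era crypto completion
callback signature (helper placement is illustrative, not the exact
upstream hunks):

    /* Submission side: a backlogged request is still processed, so
     * report it like any other in-flight async request. */
    rc = crypto_aead_encrypt(aead_req);
    if (rc == -EBUSY)
            rc = -EINPROGRESS;

    /* Completion side: a backlogged request completes twice. */
    static void tls_encrypt_done(struct crypto_async_request *req, int err)
    {
            if (err == -EINPROGRESS)
                    return;         /* first call for a backlogged request:
                                       ignore, the final status follows */
            /* ... normal completion handling ... */
    }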

Bug: 326215202
Fixes: a54667f672 ("tls: Add support for encryption using async offload accelerator")
Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Srish: v2: fixed hunk failures
        fixed merge-conflict in stable branch linux-6.1.y,
        needs to go on top of https://lore.kernel.org/stable/20240307155930.913525-1-lee@kernel.org/]
Signed-off-by: Srish Srinivasan <srish.srinivasan@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cd1bbca03f3c1d845ce274c0d0a66de8e5929f72)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I6aedd018e89a9aa2ace6633e02308336ed19fe13
2024-05-07 13:40:59 +00:00
Kalesh Singh
1794308d46 ANDROID: 16K: Fix show maps CFI failure
If the kernel is built with CONFIG_CFI_CLANG=y, reading smaps
may cause a panic. This is due to a failed CFI check, which
is triggered because the signature of the function pointer for
printing smaps padding VMAs does not match exactly with that
of show_smap().

Fix this by casting the function pointer to the expected type,
based on whether maps or smaps padding is being printed.
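
A sketch of what such a cast can look like; the typedef names are
assumptions for illustration, but the principle is that the indirect
call's prototype must match the callee exactly for CFI to pass:

    typedef void (*show_pad_maps_fn)(struct seq_file *m,
                                     struct vm_area_struct *vma);
    typedef int  (*show_pad_smaps_fn)(struct seq_file *m, void *v);

    /* func arrives as an opaque void *; cast it to the exact signature
     * of the target so the CFI type check at the call site passes. */
    if (smaps)
            ((show_pad_smaps_fn)func)(m, pad_vma);
    else
            ((show_pad_maps_fn)func)(m, pad_vma);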

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I65564a547dacbc4131f8557344c8c96e51f90cd5
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
72a9c0a205 ANDROID: 16K: Handle pad VMA splits and merges
In some cases a VMA with padding representation may be split, and
therefore the padding flags must be updated accordingly.

There are 3 cases to handle:

Given:
    | DDDDPPPP |

where:
    - D represents 1 page of data;
    - P represents 1 page of padding;
    - | represents the boundaries (start/end) of the VMA

1) Split exactly at the padding boundary

    | DDDDPPPP | --> | DDDD | PPPP |

    - Remove padding flags from the first VMA.
    - The second VMA is all padding

2) Split within the padding area

    | DDDDPPPP | --> | DDDDPP | PP |

    - Subtract the length of the second VMA from the first VMA's
      padding.
    - The second VMA is all padding, adjust its padding length (flags)

3) Split within the data area

    | DDDDPPPP | --> | DD | DDPPPP |

    - Remove padding flags from the first VMA.
    - The second VMA has the same padding as before the split.

To simplify the semantics, merging of padding VMAs is not allowed.

If a split produces a VMA that is entirely padding, show_[s]maps()
only outputs the padding VMA entry (as the data entry is of length 0).
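
A sketch of the split handling described above, assuming hypothetical
vma_pad_pages()/set_vma_pad_pages() helpers that read/store the padding
page count kept in the VMA flags:

    void split_pad_vma(struct vm_area_struct *vma, struct vm_area_struct *new,
                       unsigned long addr, int new_below)
    {
            unsigned long nr_pad = vma_pad_pages(vma);
            struct vm_area_struct *first  = new_below ? new : vma;
            struct vm_area_struct *second = new_below ? vma : new;
            unsigned long nr_second = vma_pages(second);

            if (!nr_pad)
                    return;

            if (nr_second >= nr_pad) {
                    /* Cases 1 and 3: split at or before the padding
                     * boundary; all padding lands in the second VMA. */
                    set_vma_pad_pages(first, 0);
                    set_vma_pad_pages(second, nr_pad);
            } else {
                    /* Case 2: split inside the padding area. */
                    set_vma_pad_pages(first, nr_pad - nr_second);
                    set_vma_pad_pages(second, nr_second);
            }
    }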

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Ie2628ced5512e2c7f8af25fabae1f38730c8bb1a
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
b86b5cb22d ANDROID: 16K: madvise_vma_pad_pages: Remove filemap_fault check
Some file systems, like F2FS, use custom filemap_fault ops. Remove this
check, as checking vm_file is sufficient.

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Id6a584d934f06650c0a95afd1823669fc77ba2c2
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
1657717c12 ANDROID: 16K: Only madvise padding from dynamic linker context
Only perform the padding advice from the execution context of bionic's
dynamic linker. This ensures that madvise() doesn't have unwanted
side effects.

Also rearrange the order of fail checks in madvise_vma_pad_pages()
in order of ascending cost.

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I3e05b8780c6eda78007f86b613f8c11dd18ac28f
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
2ca5e076c9 ANDROID: 16K: Separate padding from ELF LOAD segment mappings
It has been found that some in-field apps depend on the output of
/proc/*/maps to determine the address ranges of other operations.

With the extension of LOAD segments VMAs to be contiguous in memory,
the apps may perform operations on an area that is not backed by the
underlying file, which results in a SIGBUS. Other apps have crashed
for as yet unidentified reasons.

To avoid breaking in-field apps, maintain the output of /proc/*/[s]maps
with PROT_NONE VMAs for the padding pages of LOAD segments instead of
showing the segment extensions.

NOTE: This does not allocate actual backing VMAs for the shown
      PROT_NONE mappings.

This approach maintains 2 possible assumptions that userspace (apps)
could be depending on:
   1) That LOAD segment mappings are "contiguous" (not separated by
      unrelated mappings) in memory.
   2) That no virtual address space is available between mappings of
      consecutive LOAD segments for the same ELF.

For example the output of /proc/*/[s]maps before and after this change
is shown below. Segments maintain PROT_NONE gaps ("[page size compat]")
for app compatibility, but these are not backed by actual slab VMA memory.

Maps Before:

7fb03604d000-7fb036051000 r--p 00000000 fe:09 21935719                   /system/lib64/libnetd_client.so
7fb036051000-7fb036055000 r-xp 00004000 fe:09 21935719                   /system/lib64/libnetd_client.so
7fb036055000-7fb036059000 r--p 00008000 fe:09 21935719                   /system/lib64/libnetd_client.so
7fb036059000-7fb03605a000 rw-p 0000c000 fe:09 21935719                   /system/lib64/libnetd_client.so

Maps After:

7fc707390000-7fc707393000 r--p 00000000 fe:09 21935719                   /system/lib64/libnetd_client.so
7fc707393000-7fc707394000 ---p 00000000 00:00 0                          [page size compat]
7fc707394000-7fc707398000 r-xp 00004000 fe:09 21935719                   /system/lib64/libnetd_client.so
7fc707398000-7fc707399000 r--p 00008000 fe:09 21935719                   /system/lib64/libnetd_client.so
7fc707399000-7fc70739c000 ---p 00000000 00:00 0                          [page size compat]
7fc70739c000-7fc70739d000 rw-p 0000c000 fe:09 21935719                   /system/lib64/libnetd_client.so

Smaps Before:

7fb03604d000-7fb036051000 r--p 00000000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:         16 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           16 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd mr mw me
7fb036051000-7fb036055000 r-xp 00004000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:         16 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           16 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd ex mr mw me
7fb036055000-7fb036059000 r--p 00008000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd mr mw me ac
7fb036059000-7fb03605a000 rw-p 0000c000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd wr mr mw me ac

Smaps After:

7fc707390000-7fc707393000 r--p 00000000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                   0 kB
Shared_Clean:         12 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           12 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd mr mw me ??
7fc707393000-7fc707394000 ---p 00000000 00:00 0                          [page size compat]
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: mr mw me
7fc707394000-7fc707398000 r-xp 00004000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                   0 kB
Shared_Clean:         16 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           16 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd ex mr mw me
7fc707398000-7fc707399000 r--p 00008000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd mr mw me ac ?? ??
7fc707399000-7fc70739c000 ---p 00000000 00:00 0                          [page size compat]
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: mr mw me ac
7fc70739c000-7fc70739d000 rw-p 0000c000 fe:09 21935719                   /system/lib64/libnetd_client.so
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd wr mr mw me ac
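
A minimal sketch of how the synthetic "[page size compat]" entries
above could be emitted without a backing VMA (names hypothetical):

    /* After printing the trimmed data entry for the VMA, synthesize a
     * PROT_NONE line covering its padding pages. */
    static void show_pad_entry(struct seq_file *m, struct vm_area_struct *vma)
    {
            unsigned long pad = vma_pad_pages(vma) << PAGE_SHIFT;

            if (!pad)
                    return;
            seq_printf(m, "%08lx-%08lx ---p 00000000 00:00 0          %s\n",
                       vma->vm_end - pad, vma->vm_end,
                       "[page size compat]");
    }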

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I12bf2c106fafc74a500d79155b81dde5db42661e
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
1537dbe21b ANDROID: 16K: Exclude ELF padding for fault around range
Userspace apps often analyze memory consumption by the use of mm
rss_stat counters -- via the kmem/rss_stat trace event or from
/proc/<pid>/statm.

rss_stat counters are only updated when the PTEs are updated. What this
means is that pages can be present in the page cache from readahead but
not visible to userspace (not attributed to the app) as there is no
corresponding VMA (PTEs) for the respective page cache pages.

A side effect of the loader now extending ELF LOAD segments to be
contiguously mapped in the virtual address space, means that the VMA is
extended to cover the padding pages.

When filesystems, such as f2fs and ext4, that implement
vm_ops->map_pages() attempt to perform a do_fault_around() the extent of
the fault around is restricted by the area of the enclosing VMA. Since
the loader extends LOAD segment VMAs to be contiguously mapped, the extent
of the fault around is also increased. The result of which, is that the
PTEs corresponding to the padding pages are updated and reflected in the
rss_stat counters.

It is not common for userspace application developers to be aware of this
nuance in the kernel's memory accounting. To avoid apparent regressions
in memory usage reported to userspace, restrict the fault-around range to
only valid data pages (i.e. exclude the padding pages at the end of the VMA).
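
A sketch of the clamping (helper name hypothetical):

    /* Never let fault-around extend into the padding pages at the tail
     * of the VMA; only real file data should become resident. */
    static inline unsigned long vma_data_end(struct vm_area_struct *vma)
    {
            return vma->vm_end - (vma_pad_pages(vma) << PAGE_SHIFT);
    }

    /* in do_fault_around()-style code: */
    end = min(end, vma_data_end(vmf->vma));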

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I2c7a39ec1b040be2b9fb47801f95042f5dbf869d
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
6815ef3195 ANDROID: 16K: Use MADV_DONTNEED to save VMA padding pages.
When performing LOAD segment extension, the dynamic linker knows what
portion of the VMA is padding. In order for the kernel to implement
mitigations that ensure app compatibility, the extent of the padding
must be made available to the kernel.

To achieve this, reuse MADV_DONTNEED on single VMAs to hint the padding
range to the kernel. This information is then stored in vm_flag bits.
This allows userspace (dynamic linker) to set the padding pages on the
VMA without a need for new out-of-tree UAPI.
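
A sketch of the interception, with flow and helper names illustrative:

    /* in the madvise() path: an MADV_DONTNEED covering the tail of a
     * single file-backed VMA is consumed as a padding hint instead of
     * being applied. */
    case MADV_DONTNEED:
            if (madvise_vma_pad_pages(vma, start, end))
                    return 0;       /* padding recorded in vm_flags */
            break;                  /* otherwise: normal DONTNEED   */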

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I3421de32ab38ad3cb0fbce73ecbd8f7314287cde
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
6b9e404675 ANDROID: 16K: Introduce ELF padding representation for VMAs
The dynamic linker may extend ELF LOAD segment mappings to be contiguous
in memory when loading a 16kB compatible ELF on a 4kB page-size system.
This is done to reduce the use of unreclaimable VMA slab memory for the
otherwise necessary "gap" VMAs. The extended portion of the mapping
(VMA) can be viewed as "padding", meaning that the mapping in that range
corresponds to an area of the file that does not contain contents of the
respective segments (maybe zeros, depending on how the ELF is built).

For some compatibility mitigations, the region of a VMA corresponding to
these padding sections needs to be known.

In order to represent such regions without adding additional overhead or
breaking ABI, some upper bits of vm_flags are used.

Add the VMA padding pages representation and the necessary APIs to
manipulate it.
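
A sketch of such an encoding; the bit positions and macro names are
assumptions (the real tree may reserve different bits):

    #define VM_PAD_SHIFT    (BITS_PER_LONG - 4)
    #define VM_PAD_MASK     (0xfUL << VM_PAD_SHIFT)  /* up to 15 pad pages */

    static inline unsigned long vma_pad_pages(struct vm_area_struct *vma)
    {
            return (vma->vm_flags & VM_PAD_MASK) >> VM_PAD_SHIFT;
    }

    static inline void set_vma_pad_pages(struct vm_area_struct *vma,
                                         unsigned long nr_pages)
    {
            vma->vm_flags &= ~VM_PAD_MASK;
            vma->vm_flags |= nr_pages << VM_PAD_SHIFT;
    }

For 16kB segments on a 4kB page-size system the padding is at most
3 pages, so a handful of bits suffice.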

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Ieb9fa98e30ec9b0bec62256624f14e3ed6062a75
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Kalesh Singh
e79c1d4590 ANDROID: 16K: Introduce /sys/kernel/mm/pgsize_migration/enabled
Migrating from 4kB to 16kB page-size in Android requires first making
the platform page-agnostic, which involves increasing Android-ELFs'
max-page-size (p_align) from 4kB to 16kB.

Increasing the ELF max-page-size was found to cause compatibility issues
in apps that use obfuscation or depend on the ELF segments being mapped
based on 4kB-alignment.

Working around these compatibility issues involves both kernel and
userspace (dynamic linker) changes.

Introduce a knob for userspace (dynamic linker) to determine whether the
kernel supports the mitigations needed for page-size migration compatibility.

The knob also allows userspace to turn these mitigations on or off
by writing 1 or 0 to /sys/kernel/mm/pgsize_migration/enabled:

    echo 1 > /sys/kernel/mm/pgsize_migration/enabled  # Enable
    echo 0 > /sys/kernel/mm/pgsize_migration/enabled  # Disable
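
A minimal sketch of such a sysfs knob (attribute wiring and the flag's
consumers omitted; names are assumptions):

    static bool pgsize_migration_enabled = true;

    static ssize_t enabled_show(struct kobject *kobj,
                                struct kobj_attribute *attr, char *buf)
    {
            return sysfs_emit(buf, "%d\n", pgsize_migration_enabled);
    }

    static ssize_t enabled_store(struct kobject *kobj,
                                 struct kobj_attribute *attr,
                                 const char *buf, size_t n)
    {
            bool val;

            if (kstrtobool(buf, &val))
                    return -EINVAL;
            pgsize_migration_enabled = val;
            return n;
    }
    static struct kobj_attribute enabled_attr = __ATTR_RW(enabled);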

Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I9ac1d15d397b8226b27827ecffa30502da91e10e
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2024-05-02 22:14:25 +00:00
Badhri Jagan Sridharan
ea3c70fb95 FROMGIT: usb: typec: tcpm: Check for port partner validity before consuming it
typec_register_partner() does not guarantee partner registration
to always succeed. In the event of failure, port->partner is set
to the error value or NULL. Given that port->partner validity is
not checked, this results in the following crash:

Unable to handle kernel NULL pointer dereference at virtual address xx
 pc : run_state_machine+0x1bc8/0x1c08
 lr : run_state_machine+0x1b90/0x1c08
..
 Call trace:
   run_state_machine+0x1bc8/0x1c08
   tcpm_state_machine_work+0x94/0xe4
   kthread_worker_fn+0x118/0x328
   kthread+0x1d0/0x23c
   ret_from_fork+0x10/0x20

To prevent the crash, check for port->partner validity before
dereferencing it in all the call sites.
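
A sketch of the guard at a use site (the callee shown is one of the
existing typec helpers; which sites were patched is per the diff):

    /* port->partner may be NULL or an ERR_PTR if
     * typec_register_partner() failed earlier. */
    if (!IS_ERR_OR_NULL(port->partner))
            typec_partner_set_svdm_version(port->partner, svdm_version);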

Cc: stable@vger.kernel.org
Fixes: c97cd0b4b54e ("usb: typec: tcpm: set initial svdm version based on pd revision")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240427202812.3435268-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 321849121
(cherry picked from commit ae11f04b452b5205536e1c02d31f8045eba249dd
https://kernel.googlesource.com/pub/scm/linux/kernel/git/gregkh/usb
usb-linus)
Change-Id: I8e6d61816bd5ef22bc781e5d433e68ae078aac2d
Signed-off-by: Zheng Pan <zhengpan@google.com>
2024-05-01 16:07:02 -07:00
Zheng Pan
13f322e958 Revert "FROMGIT: usb: typec: tcpm: Check for port partner validity before consuming it"
This reverts commit 6657c436ed.

Revert reason: patch has mistake and need resubmit

Change-Id: Ic39b13cfe9b38d7bbbad2a99fa8e3eed44e1374b
Signed-off-by: Zheng Pan <zhengpan@google.com>
2024-05-01 15:58:48 -07:00
Badhri Jagan Sridharan
6657c436ed FROMGIT: usb: typec: tcpm: Check for port partner validity before consuming it
typec_register_partner() does not guarantee partner registration
to always succeed. In the event of failure, port->partner is set
to the error value or NULL. Given that port->partner validity is
not checked, this results in the following crash:

Unable to handle kernel NULL pointer dereference at virtual address xx
 pc : run_state_machine+0x1bc8/0x1c08
 lr : run_state_machine+0x1b90/0x1c08
..
 Call trace:
   run_state_machine+0x1bc8/0x1c08
   tcpm_state_machine_work+0x94/0xe4
   kthread_worker_fn+0x118/0x328
   kthread+0x1d0/0x23c
   ret_from_fork+0x10/0x20

To prevent the crash, check for port->partner validity before
dereferencing it in all the call sites.

Cc: stable@vger.kernel.org
Fixes: c97cd0b4b54e ("usb: typec: tcpm: set initial svdm version based on pd revision")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240427202812.3435268-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 321849121
(cherry picked from commit ae11f04b452b5205536e1c02d31f8045eba249dd
https://kernel.googlesource.com/pub/scm/linux/kernel/git/gregkh/usb
usb-linus)
Change-Id: I01510c86e147b3011afc5d475fc1dc38d2636a60
Signed-off-by: Zheng Pan <zhengpan@google.com>
2024-05-01 12:30:56 -07:00
xieliujie
1d37bc9913 ANDROID: vendor_hooks: add symbols for lazy preemption
Add some symbols needed to implement the lazy preemption feature in our baseline.
- __traceiter_android_vh_read_lazy_flag
- __traceiter_android_vh_set_tsk_need_resched_lazy
- __tracepoint_android_vh_read_lazy_flag
- __tracepoint_android_vh_set_tsk_need_resched_lazy
Bug: 336982374

Change-Id: I7807617575da9365edd2e8fccd01a22913aaffc1
Signed-off-by: xieliujie <xieliujie@oppo.com>
2024-04-30 17:11:10 +00:00
xieliujie
14f07c1db0 ANDROID: vendor_hooks: add two hooks for lazy preemption
Add two hooks to implement the lazy preemption feature in our baseline;
a declaration sketch follows the list below.
- android_vh_read_lazy_flag
- android_vh_set_tsk_need_resched_lazy
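
A declaration sketch in the ANDROID vendor-hook style; the prototypes
below are assumptions, as only the hook names are fixed by this change:

    DECLARE_HOOK(android_vh_read_lazy_flag,
            TP_PROTO(int *flags, struct task_struct *tsk),
            TP_ARGS(flags, tsk));
    DECLARE_HOOK(android_vh_set_tsk_need_resched_lazy,
            TP_PROTO(struct task_struct *tsk, struct rq *rq,
                     int *need_lazy),
            TP_ARGS(tsk, rq, need_lazy));
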
Bug: 336982374

Change-Id: I09f1110a2a11da4dbf0d4d0cca3500d1a6ee6a74
Signed-off-by: xieliujie <xieliujie@oppo.com>
2024-04-30 17:06:35 +00:00
Vincent Donnefort
6364d59412 ANDROID: KVM: arm64: wait_for_initramfs for pKVM module loading procfs
Of course, the initramfs needs to be ready before procfs can be
mounted... in the initramfs. While at it, only mount if a pKVM module
must be loaded and only print a warning in case of failure.

Bug: 278749606
Bug: 301483379
Bug: 331152809
Change-Id: Ie56bd26d4575f69cb1f06ba6317a098649f6da44
Reported-by: Mankyum Kim <mankyum.kim@samsung-slsi.corp-partner.google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
(cherry picked from commit 7d5843b59548672c23c977b4666c3779d31695fb)
2024-04-30 11:00:27 +00:00
Vincent Palomares
4744b3a4ed ANDROID: GKI: Expose device async to userspace
Set CONFIG_PM_ADVANCED_DEBUG=y to expose device async fields
to userspace, allowing fine-tuning of the suspend/resume path.

Bug: 235135485
Change-Id: I75060e88ce0c1e199aa8740f446a2c0f8167f3d7
Signed-off-by: Vincent Palomares <paillon@google.com>
2024-04-29 20:04:23 +00:00
Suzuki K Poulose
08cc4037cf FROMGIT: coresight: etm4x: Fix access to resource selector registers
Resource selector pair 0 is always implemented and reserved. We must not
touch it, even during save/restore for CPU Idle. The rest of the driver
is well behaved. Fix the offending ones.

Reported-by: Yabin Cui <yabinc@google.com>
Fixes: f188b5e76a ("coresight: etm4x: Save/restore state across CPU low power states")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Link: https://lore.kernel.org/r/20240412142702.2882478-5-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit d6fc00d0f640d6010b51054aa8b0fd191177dbc9
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: I5f3385cb269969a299402fa258b30ab43e95805f
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:54:24 -07:00
Suzuki K Poulose
7ff054397a FROMGIT: coresight: etm4x: Safe access for TRCQCLTR
ETM4x implements TRCQCLTR only when the Q elements are supported
and the Q element filtering is supported (TRCIDR0.QFILT). Access
to the register otherwise could be fatal. Fix this by tracking the
availability, like the others.

Fixes: f188b5e76a ("coresight: etm4x: Save/restore state across CPU low power states")
Reported-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Link: https://lore.kernel.org/r/20240412142702.2882478-4-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit 46bf8d7cd8530eca607379033b9bc4ac5590a0cd
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: Id848fa14ba8003149f76b5ca54562593f6164150
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:54:24 -07:00
Suzuki K Poulose
f401cce7d9 FROMGIT: coresight: etm4x: Do not save/restore Data trace control registers
ETM4x doesn't support Data trace on A class CPUs. As such do not access the
Data trace control registers during CPU idle. This could cause problems for
ETE. While at it, remove all references to the Data trace control registers.

Fixes: f188b5e76a ("coresight: etm4x: Save/restore state across CPU low power states")
Reported-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Link: https://lore.kernel.org/r/20240412142702.2882478-3-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit 5eb3a0c2c52368cb9902e9a6ea04888e093c487d
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: I06977d86aa2d876d166db0fac8fbccf48fd07229
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:54:24 -07:00
Suzuki K Poulose
d9604db041 FROMGIT: coresight: etm4x: Do not hardcode IOMEM access for register restore
When we restore the register state for ETM4x, while coming back
from CPU idle, we hardcode IOMEM access. This is wrong and could
blow up for an ETM with system instructions access (and for ETE).

Fixes: f5bd523690 ("coresight: etm4x: Convert all register accesses")
Reported-by: Yabin Cui <yabinc@google.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yabin Cui <yabinc@google.com>
Link: https://lore.kernel.org/r/20240412142702.2882478-2-suzuki.poulose@arm.com

Bug: 335234033
(cherry picked from commit 1e7ba33fa591de1cf60afffcabb45600b3607025
 https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 next)
Change-Id: Id2ea066374933de51a90f1fca8304338b741845d
Signed-off-by: Yabin Cui <yabinc@google.com>
2024-04-26 12:54:24 -07:00
Norihiko Hama
fa87a072a7 ANDROID: GKI: Update honda symbol list for led-trigger
Add some missing symbols required for led-trigger

2 function symbol(s) added
  'u32* led_get_default_pattern(struct led_classdev*, unsigned int*)'
  'void led_set_brightness(struct led_classdev*, unsigned int)'

Bug: 333795249
Change-Id: I9935592d63175a2328c2b8a95556fd3ee6898fdd
Signed-off-by: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
2024-04-24 22:45:35 +00:00
xieliujie
c61278bb70 ANDROID: GKI: Update symbols to symbol list
Update symbols for vendor hooks of reader optimistic spin.

4 function symbol(s) added
  'int __traceiter_android_vh_rwsem_direct_rsteal(void*, struct rw_semaphore*, bool*)'
  'int __traceiter_android_vh_rwsem_optimistic_rspin(void*, struct rw_semaphore*, long*, bool*)'
  'bool osq_lock(struct optimistic_spin_queue*)'
  'void osq_unlock(struct optimistic_spin_queue*)'

2 variable symbol(s) added
  'struct tracepoint __tracepoint_android_vh_rwsem_direct_rsteal'
  'struct tracepoint __tracepoint_android_vh_rwsem_optimistic_rspin'

Bug: 331742151
Change-Id: I6603ec88f84a9a8adb30b802ba2fdd9b0dc8a016
Signed-off-by: xieliujie <xieliujie@oppo.com>
2024-04-24 10:51:56 +08:00
xieliujie
260bfad693 ANDROID: vendor_hook: Add hooks to support reader optimistic spin in rwsem
Since upstream commit 617f3ef951 ("locking/rwsem: Remove
reader optimistic spinning"), vendors have seen increased
contention and blocking on rwsems.

There are attempts to actively fix this upstream:
  https://lore.kernel.org/lkml/20240406081126.8030-1-bongkyu7.kim@samsung.com/

But in the meantime, provide vendor hooks so that vendors can
implement their own optimistic spin routine. In doing so,
vendors see improvements in cold launch times on important apps.
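
A sketch of how the hooks might be wired up, with call placement as an
assumption; the hook prototypes follow the companion symbol-list change:

    /* in rwsem_down_read_slowpath(), before enqueueing the reader: */
    bool rsteal = false, rspin = false;

    trace_android_vh_rwsem_direct_rsteal(sem, &rsteal);
    if (!rsteal)
            trace_android_vh_rwsem_optimistic_rspin(sem, &adjustment,
                                                    &rspin);
    if (rsteal || rspin)
            return sem;     /* acquired by the vendor routine, which can
                               use the exported osq_lock()/osq_unlock() */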

Bug: 331742151
Change-Id: I7466413de9ee1293e86f73880931235d7a9142ac
Signed-off-by: xieliujie <xieliujie@oppo.com>
[jstultz: Rewrote commit message]
Signed-off-by: John Stultz <jstultz@google.com>
2024-04-24 10:30:39 +08:00
Michal Luczaj
d0c6724b0f UPSTREAM: af_unix: Fix garbage collector racing against connect()
[ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ]

Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children. Leading to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.

sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped

connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
----------------	-------------------------	-----------

NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
			// V count=1 inflight=0

 			NS = unix_peer(S)
 			skb2 = sock_alloc()
			skb_queue_tail(NS, skb2[V])

			// V became in-flight
			// V count=2 inflight=1

			close(V)

			// V count=1 inflight=1
			// GC candidate condition met

						for u in gc_inflight_list:
						  if (total_refs == inflight_refs)
						    add u to gc_candidates

						// gc_candidates={L, V}

						for u in gc_candidates:
						  scan_children(u, dec_inflight)

						// embryo (skb1) was not
						// reachable from L yet, so V's
						// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
						for u in gc_candidates:
						  if (u.inflight)
						    scan_children(u, inc_inflight_move_tail)

						// V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. After
flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
this point, unix_inflight() can not happen because unix_gc_lock is already
taken. Inflight graph remains unaffected.
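
The essence of the fix, per the description above (a lock/unlock pair
on candidate listeners in __unix_gc(); placement per the upstream diff):

    /* Sync with concurrent connect(): once the listener's state lock
     * has been flipped, any SCM-laden embryo is already enqueued. */
    list_for_each_entry(u, &gc_candidates, link) {
            struct sock *sk = &u->sk;

            if (sk->sk_state == TCP_LISTEN) {
                    unix_state_lock(sk);
                    unix_state_unlock(sk);
            }
    }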

Bug: 336226035
Fixes: 1fd05ba5a2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 507cc232ffe53a352847893f8177d276c3b532a9)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If321f78b8b3220f5a1caea4b5e9450f1235b0770
2024-04-22 16:23:05 -07:00
Kuniyuki Iwashima
94c88f80ff UPSTREAM: af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
[ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ]

When touching unix_sk(sk)->inflight, we are always under
spin_lock(&unix_gc_lock).

Let's convert unix_sk(sk)->inflight to the normal unsigned long.
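
A sketch of the resulting pattern (mirroring unix_inflight(); all
writers already serialize on unix_gc_lock, so plain operations suffice):

    spin_lock(&unix_gc_lock);
    if (!u->inflight)                  /* was: atomic_long_read() */
            list_add_tail(&u->link, &gc_inflight_list);
    u->inflight++;                     /* was: atomic_long_inc()  */
    spin_unlock(&unix_gc_lock);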

Bug: 336226035
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 301fdbaa0bba4653570f07789909939f977a7620)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I0d965d5f2a863d798c06de9f21d0467f256b538e
2024-04-22 16:22:57 -07:00
Lokesh Gidra
3dfddcb9c2 ANDROID: GKI: fix ABI breakage in struct userfaultfd_ctx
The following two commits move 'userfaultfd_ctx' struct from
fs/userfaultfd.c to header file and then add a rw_semaphore to it. The
ABI is broken by the change. However, given that the type should be
private and not accessed by vendor modules, use some GENKSYMS #define
magic to preserve the CRC. Also update the .stg file for offset
adjustment within 'userfaultfd_ctx'.

5e4c24a57b0c ("userfaultfd: protect mmap_changing with rw_sem in userfaulfd_ctx")
f91e6b41dd11 ("userfaultfd: move userfaultfd_ctx struct to header file")
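
A sketch of the GENKSYMS trick (member name per the upstream patch;
surrounding layout elided):

    struct userfaultfd_ctx {
            /* ... existing members unchanged ... */
    #ifndef __GENKSYMS__
            /* new member hidden from the symbol-CRC computation */
            struct rw_semaphore map_changing_lock;
    #endif
    };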

Bug: 320478828
Change-Id: I5f97ff34dd8c88fe3d18c4dc902452488ba28cbd
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
8dd482be44 UPSTREAM: userfaultfd: fix deadlock warning when locking src and dst VMAs
Use down_read_nested() to avoid the warning.
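
A sketch of the fix, assuming the per-VMA lock backport's field names:

    /* Lock dst first, then src with a nested subclass so lockdep does
     * not warn about recursive read-lock acquisition. */
    down_read(&dst_vma->vm_lock->lock);
    if (src_vma != dst_vma)
            down_read_nested(&src_vma->vm_lock->lock,
                             SINGLE_DEPTH_NESTING);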

Link: https://lkml.kernel.org/r/20240321235818.125118-1-lokeshgidra@google.com
Fixes: 867a43a34ff8 ("userfaultfd: use per-vma locks in userfaultfd operations")
Reported-by: syzbot+49056626fe41e01f2ba7@syzkaller.appspotmail.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jann Horn <jannh@google.com> [Bug #2]
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 30af24facf0aed12dec23bdf6eac6a907f88306a)

Bug: 320478828
Change-Id: I56d7e33878d6248bba28e1e4204e2b9005d87e4d
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
ce2896c0c6 BACKPORT: userfaultfd: use per-vma locks in userfaultfd operations
All userfaultfd operations, except write-protect, opportunistically use
per-vma locks to lock vmas.  On failure, attempt again inside mmap_lock
critical section.

Write-protect operation requires mmap_lock as it iterates over multiple
vmas.
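
A sketch of the opportunistic pattern (simplified; do_uffd_op() is a
hypothetical stand-in for the real copy/zeropage/continue/move work):

    /* fast path: per-VMA lock, no mmap_lock contention */
    vma = lock_vma_under_rcu(mm, dst_start);
    if (vma) {
            err = do_uffd_op(vma, dst_start, len);
            vma_end_read(vma);
            return err;
    }

    /* slow path: classic mmap_lock critical section */
    mmap_read_lock(mm);
    vma = find_vma(mm, dst_start);
    err = vma ? do_uffd_op(vma, dst_start, len) : -ENOENT;
    mmap_read_unlock(mm);
    return err;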

Link: https://lkml.kernel.org/r/20240215182756.3448972-5-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 867a43a34ff8a38772212045262b2c9b77807ea3)
Conflicts:
	mm/userfaultfd.c

1. Resolve conflict in validate_dst_vma() due to absence of
   range_in_vma().
2. Use 'page' instead of 'folio' for BUG_ON on copy_from_user() failure
   in COPY ioctl.
3. Resolve conflict around mfill_file_over_size().
4. Resolve conflict in comment for __mcopy_atomic_hugetlb() due to
   function name change.
5. Resolve conflict due to use of 'flags' instead of 'mode' in
   __mcopy_atomic_hugetlb().
6. Use find_vma() and validate_dst_vma() in mwriteprotect_range()
   instead of find_dst_vma().

Bug: 320478828
Change-Id: I6d5b7101218cb1b11329108c3f31f12bb1caebc6
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
daf0b0fc4a BACKPORT: mm: add vma_assert_locked() for !CONFIG_PER_VMA_LOCK
vma_assert_locked() is needed to replace mmap_assert_locked() once we
start using per-vma locks in userfaultfd operations.

In !CONFIG_PER_VMA_LOCK case when mm is locked, it implies that the given
VMA is locked.
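
In that case the assertion reduces to the mmap_lock one:

    static inline void vma_assert_locked(struct vm_area_struct *vma)
    {
            mmap_assert_locked(vma->vm_mm);
    }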

Link: https://lkml.kernel.org/r/20240215182756.3448972-4-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 32af81af2f6f4c23b1b4ff68410e91da660af102)
Conflicts:
	include/linux/mm.h

1. lock_vma_under_rcu() definition in !CONFIG_PER_VMA_LOCK case doesn't
   exist in 6.1. Resolved cherry-pick conflict due to that.

Bug: 320478828
Change-Id: I76d414cd08c3d696d3886921a7e27cf94fd17b76
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
a5b6040d5c BACKPORT: userfaultfd: protect mmap_changing with rw_sem in userfaulfd_ctx
Increments and loads to mmap_changing are always in mmap_lock critical
section.  This ensures that if userspace requests event notification for
non-cooperative operations (e.g.  mremap), userfaultfd operations don't
occur concurrently.

This can be achieved by using a separate read-write semaphore in
userfaultfd_ctx such that increments are done in write-mode and loads in
read-mode, thereby eliminating the dependency on mmap_lock for this
purpose.

This is a preparatory step before we replace mmap_lock usage with per-vma
locks in fill/move ioctls.
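
A sketch of the resulting discipline (names per the upstream patch; the
exact type of mmap_changing may differ in this backport):

    /* non-cooperative event side (e.g. mremap notification): */
    down_write(&ctx->map_changing_lock);
    atomic_inc(&ctx->mmap_changing);
    up_write(&ctx->map_changing_lock);

    /* userfaultfd operation side: */
    down_read(&ctx->map_changing_lock);
    err = -EAGAIN;
    if (atomic_read(&ctx->mmap_changing))
            goto out_unlock;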

Link: https://lkml.kernel.org/r/20240215182756.3448972-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 5e4c24a57b0c126686534b5b159a406c5dd02400)
Conflicts:
	fs/userfaultfd.c
	include/linux/userfaultfd_k.h
	mm/userfaultfd.c

1. Functions passing control from fs/userfaultfd.c to mm/userfaultfd.c
   were renamed after 6.1.
   a. Replace mfill_atomic_copy() with mcopy_atomic()
   b. Replace mfill_atomic_zeropage() with mfill_zeropage()
   c. Replace mfill_atomic_continue() with mcopy_continue()
   d. Replace mfill_atomic() with __mcopy_atomic()
   e. Replace mfill_atomic_hugetlb() with __mcopy_atomic_hugetlb()
2. uffd flags were unified into a single parameter after 6.1. Replace
   'flags' with 'mcopy_mode' and 'mode'.
3. Fetch dst_mm from dst_vma in __mcopy_atomic_hugetlb().

Bug: 320478828
Change-Id: I77615c36a0c891801c9eb9de3609df4e7f125c39
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
6b5ee039a1 BACKPORT: userfaultfd: move userfaultfd_ctx struct to header file
Patch series "per-vma locks in userfaultfd", v7.

Performing userfaultfd operations (like copy/move etc.) in critical
section of mmap_lock (read-mode) causes significant contention on the lock
when operations requiring the lock in write-mode are taking place
concurrently.  We can use per-vma locks instead to significantly reduce
the contention issue.

Android runtime's Garbage Collector uses userfaultfd for concurrent
compaction.  mmap-lock contention during compaction potentially causes
jittery experience for the user.  During one such reproducible scenario,
we observed the following improvements with this patch-set:

- Wall clock time of compaction phase came down from ~3s to <500ms
- Uninterruptible sleep time (across all threads in the process) was
  ~10ms (none in mmap_lock) during compaction, instead of >20s

This patch (of 4):

Move the struct to userfaultfd_k.h to be accessible from mm/userfaultfd.c.
There are no other changes in the struct.

This is required to prepare for using per-vma locks in userfaultfd
operations.

Link: https://lkml.kernel.org/r/20240215182756.3448972-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20240215182756.3448972-2-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit f91e6b41dd11daffb138e3afdb4804aefc3d4e1b)
Conflicts:
	include/linux/userfaultfd_k.h

1. Retain 'sysctl_unprivileged_userfaultfd' global variable.

Bug: 320478828
Change-Id: Iebaae028d5e793dd50342b141c1d46b79026834a
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
ac96edb501 BACKPORT: userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
In mfill_atomic_hugetlb(), mmap_changing isn't being checked
again if we drop mmap_lock and reacquire it. When the lock is not held,
mmap_changing could have been incremented. This is also inconsistent
with the behavior in mfill_atomic().
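
A sketch of the recheck added on the retry path (parameter shapes per
the 6.1 code):

    /* after dropping mmap_lock to allocate/copy, retake it and bail if
     * a non-cooperative event raced with us: */
    mmap_read_lock(dst_mm);
    err = -EAGAIN;
    if (mmap_changing && atomic_read(mmap_changing))
            goto out_unlock;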

Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com
Fixes: df2cc96e77 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 67695f18d55924b2013534ef3bdc363bc9e14605)
Conflicts:
	mm/userfaultfd.c

1. Update mfill_atomic_hugetlb() parameters to pass 'wp_copy' and 'mode'
   instead of 'flags'.

Bug: 320478828
Change-Id: I11ef09b2b8e477c32cc731205fd48b25bcbd020f
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Suren Baghdasaryan
51eab7ecc4 BACKPORT: selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
Add a test for UFFDIO_MOVE ioctl operating on a hugepage which has to be
split because destination is marked with MADV_NOHUGEPAGE.  With this we
cover all 3 cases: normal page move, hugepage move, hugepage splitting
before move.

Link: https://lkml.kernel.org/r/20231230025636.2477429-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit a5b7620bab81f16e8bbb04f4aea94c4c7feb0d77)
Conflicts:
	tools/testing/selftests/mm/uffd-unit-tests.c
	tools/testing/selftests/vm/userfaultfd.c

1. Add request_src_hugepages() to enable THP on src
2. Add madvise() to enable THP on dst in request_hugepages()
3. Add request_split_hugepages() to enable THP on src and disable on dst
4. Change return type of uffd_move_pmd_split_test() to int

Bug: 274911254
Change-Id: I21147a5b7f3e8bbe2befa8bff536e62826e9f6e3
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Suren Baghdasaryan
f152691515 BACKPORT: selftests/mm: add UFFDIO_MOVE ioctl test
Add tests for new UFFDIO_MOVE ioctl which uses uffd to move source into
destination buffer while checking the contents of both after the move.
After the operation the content of the destination buffer should match the
original source buffer's content while the source buffer should be zeroed.
Separate tests are designed for PMD aligned and unaligned cases because
they utilize different code paths in the kernel.
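
A minimal sketch of driving the ioctl from a test, using the
UFFDIO_MOVE uAPI fields:

    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>

    static int move_range(int uffd, void *dst, void *src, size_t len)
    {
            struct uffdio_move mv = {
                    .dst  = (unsigned long)dst,
                    .src  = (unsigned long)src,
                    .len  = len,
                    .mode = 0,      /* no UFFDIO_MOVE_MODE_* flags */
            };

            if (ioctl(uffd, UFFDIO_MOVE, &mv) < 0)
                    return -1;
            /* the kernel reports bytes actually moved in .move */
            return mv.move == (__s64)len ? 0 : -1;
    }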

Link: https://lkml.kernel.org/r/20231206103702.3873743-6-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit a2bf6a9ca80532b75f8f8b6a1cd75ef7e5150576)
Conflicts:
	tools/testing/selftests/mm/uffd-common.c
	tools/testing/selftests/mm/uffd-common.h
	tools/testing/selftests/mm/uffd-unit-tests.c
	tools/testing/selftests/vm/userfaultfd.c

1. Removed errmsg parameter from prevent_hugepages() and post_hugepages()
2. Removed uffd_test_args parameter from uffd_move_* functions
3. Added uffd_test_case_ops parameter in uffd_move_test_common()
4. Added userfaultfd_move_test() for all 'move' tests, called from
   userfaultfd_stress()
5. Added 'test_uffdio_move' global bool variable, which is set to true
   only when testing anon mappings
6. Added call to uffd_test_ctx_init() and uffd_test_ctx_clear() in
   uffd_move_test_common()
7. Replaced uffd_args with uffd_stats
8. Converted return type of uffd_move_test() and uffd_move_pmd_test() to
   `int`
9. Added uffd_test_page_fault_handler as uffd_args doesn't exist.
   uffd_poll_thread() checks if it is NULL then calls
   uffd_handle_page_fault().
10. Replaced uffd_register() (isn't defined on 6.1) with UFFDIO_REGISTER
    ioctl call
11. Added printf() calls to log when the test is starting and finishing.
12. Change return type of uffd_move_test_common() to int.

Bug: 274911254
Change-Id: I1c68445d9c64533aab0ba27c2e010347d0807981
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Suren Baghdasaryan
a5d504c067 BACKPORT: selftests/mm: add uffd_test_case_ops to allow test case-specific operations
Currently each test can specify unique operations using uffd_test_ops,
however these operations are per-memory type and not per-test.  Add
uffd_test_case_ops which each test case can customize for its own needs
regardless of the memory type being used.  Pre- and post-allocation
operations are added, some of which will be used in the next patch to
implement test-specific operations like madvise after memory is allocated
but before it is accessed.

Link: https://lkml.kernel.org/r/20231206103702.3873743-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit e8a422408ba9760e2640ca57e4b79c3dd7f48bd2)
Conflicts:
	tools/testing/selftests/mm/uffd-common.c
	tools/testing/selftests/mm/uffd-common.h
	tools/testing/selftests/mm/uffd-unit-tests.c
	tools/testing/selftests/vm/userfaultfd.c

1. Userfaultfd selftest was split into separate uffd-* files and moved
   to selftests/mm.
2. In 6.1 there is no mechanism to run individual unit-tests. All
   unit-tests are run after stress test. Consequently, the tests are not
   abstracted using 'uffd_test_case_t'. Therefore, added
   'uffd_test_case_ops' as a parameter to uffd_test_ctx_init().

Bug: 274911254
Change-Id: I6480abf1709ca717d9baad5047bf675852f10726
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Suren Baghdasaryan
ee72d5a7d9 BACKPORT: selftests/mm: call uffd_test_ctx_clear at the end of the test
uffd_test_ctx_clear() is being called from uffd_test_ctx_init() to unmap
areas used in the previous test run.  This approach is problematic because
while unmapping areas uffd_test_ctx_clear() uses page_size and nr_pages
which might differ from one test run to another.  Fix this by calling
uffd_test_ctx_clear() after each test is done.

Link: https://lkml.kernel.org/r/20231206103702.3873743-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 1c8d39fa7b63dcbb77af7b0325fdc519c35fe618)
Conflicts:
	tools/testing/selftests/mm/uffd-common.c
	tools/testing/selftests/mm/uffd-common.h
	tools/testing/selftests/mm/uffd-stress.c
	tools/testing/selftests/mm/uffd-unit-tests.c
	tools/testing/selftests/vm/userfaultfd.c

1. Userfaultfd selftest was split into separate files and as a consequence
   the code moved from selftests/vm/userfaultfd.c to selftests/mm/uffd_* files.

Bug: 274911254
Change-Id: Ic224c3965a645342dc0f41e743d3c072b7bb852e
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
abd6748ba6 UPSTREAM: userfaultfd: fix return error if mmap_changing is non-zero in MOVE ioctl
To be consistent with other uffd ioctl's returning EAGAIN when
mmap_changing is detected, we should change UFFDIO_MOVE to do the same.

Link: https://lkml.kernel.org/r/20240117223922.1445327-1-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 6ca03f1bb5a7427a66df62c954b3500a4255cdb9)

Bug: 274911254
Change-Id: I3499e3987bf72d3d7a307165e0f9d1ed6d2b0611
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Lokesh Gidra
4f658d7723 BACKPORT: userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
Commit d7a08838ab74 ("mm: userfaultfd: fix unexpected change to src_folio
when UFFDIO_MOVE fails") moved the src_folio->{mapping, index} changing to
after clearing the page-table and ensuring that it's not pinned.  This
avoids failure of swapout+migration and possibly memory corruption.

However, the commit missed fixing it in the huge-page case.

Link: https://lkml.kernel.org/r/20240404171726.2302435-1-lokeshgidra@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit c0205eaf3af9f5db14d4b5ee4abacf4a583c3c50)
Conflicts:
	mm/huge_memory.c

1. Replace folio_move_anon_rmap() with page_move_anon_rmap().

Bug: 274911254
Change-Id: I15a07ea22de7ae38ed20320a73c995c7c48ef42b
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Qi Zheng
bfb4b24b64 BACKPORT: mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails
After ptep_clear_flush(), if we find that src_folio is pinned we will fail
UFFDIO_MOVE and put src_folio back to src_pte entry, but the change to
src_folio->{mapping,index} is not restored in this process. This is not
what we expected, so fix it.

This can cause the rmap for that page to be invalid, possibly resulting
in memory corruption.  At least swapout+migration would no longer work,
because we might fail to locate the mappings of that folio.
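
In outline, the fix defers the rmap retargeting until the pin check has
passed. A hedged sketch using generic mm helpers (this backport uses
page_move_anon_rmap() instead of folio_move_anon_rmap(); the local
names are assumptions, not the exact hunk):

    orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
    if (folio_maybe_dma_pinned(src_folio)) {
            /* undo: put the original PTE back, folio untouched */
            set_pte_at(mm, src_addr, src_pte, orig_src_pte);
            err = -EBUSY;
            goto out;
    }
    /* only now is it safe to re-target src_folio->{mapping,index} */
    folio_move_anon_rmap(src_folio, dst_vma);
    WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));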

Link: https://lkml.kernel.org/r/20240222080815.46291-1-zhengqi.arch@bytedance.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit d7a08838ab74652f2b53fee9763f0178278c3a4b)
Conflicts:
	mm/userfaultfd.c

1. Replace folio_move_anon_rmap() with page_move_anon_rmap().

Bug: 274911254
Change-Id: Ie4bf5785244271ab233c6230ed71460fd571bd1a
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Suren Baghdasaryan
6ecd08eaf4 BACKPORT: userfaultfd: handle zeropage moves by UFFDIO_MOVE
The current implementation of UFFDIO_MOVE fails to move zeropages and returns
EBUSY when it encounters one.  We can handle them by mapping a zeropage at
the destination and clearing the mapping at the source.  This is done both
for ordinary and for huge zeropages.
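
A hedged sketch of the PTE-level zeropage case (helper and variable
names assumed from the generic mm API, not the exact patch):

    /* Instead of failing with -EBUSY, install a fresh zeropage
     * mapping at dst and clear the mapping at src. */
    if (is_zero_pfn(pte_pfn(orig_src_pte))) {
            pte_t zero_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
                                           dst_vma->vm_page_prot));
            ptep_clear_flush(src_vma, src_addr, src_pte);
            set_pte_at(mm, dst_addr, dst_pte, zero_pte);
            err = 0;
            goto out;
    }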

Link: https://lkml.kernel.org/r/20240131175618.2417291-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202401300107.U8iMAkTl-lkp@intel.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit eb1521dad8f391d3f3b88f589db27a288b55b8ed)
Conflicts:
	mm/huge_memory.c

1. Replace folio_move_anon_rmap() with page_move_anon_rmap().
2. Remove vma parameter in pmd_mkwrite() calls.

Bug: 274911254
Change-Id: I271aa365bb3930e7e480d5749b44863eeca74dda
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Suren Baghdasaryan
e275c2b743 UPSTREAM: userfaultfd: avoid huge_zero_page in UFFDIO_MOVE
While testing the UFFDIO_MOVE ioctl, syzbot triggered a VM_BUG_ON_PAGE caused by
a call to PageAnonExclusive() with a huge_zero_page as a parameter.
UFFDIO_MOVE does not yet handle zeropages and returns EBUSY when one is
encountered.  Add an early huge_zero_page check in the PMD move path to
avoid this situation.
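
A hedged sketch of the early check in the PMD move path (local names
such as src_pmdval and src_ptl are assumptions):

    /* The huge zero page is never exclusive; bail out before any
     * PageAnonExclusive() check can be reached. */
    if (is_huge_zero_pmd(src_pmdval)) {
            spin_unlock(src_ptl);
            return -EBUSY;
    }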

Link: https://lkml.kernel.org/r/20240112013935.1474648-1-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Reported-by: syzbot+705209281e36404998f6@syzkaller.appspotmail.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 5d4747a6cc8e78ce74742d557fc9b7697fcacc95)

Bug: 274911254
Change-Id: I7096b02b3a5b101e049608703ee77179d469a434
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Suren Baghdasaryan
60c5a0e023 UPSTREAM: userfaultfd: fix move_pages_pte() splitting folio under RCU read lock
While testing the split PMD path with lockdep enabled I've got an "Invalid
wait context" error caused by split_huge_page_to_list() trying to lock
anon_vma->rwsem while inside an RCU read section.  The issue is due to
move_pages_pte() calling split_folio() under RCU read lock.  Fix this by
unmapping the PTEs and exiting RCU read section before splitting the folio
and then retrying.  The same retry pattern is used when locking the folio
or anon_vma in this function.  After splitting the large folio we unlock
and release it because after the split the old folio might not be the one
that contains the src_addr.
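
A hedged sketch of the retry pattern described above (the local names
are assumptions, not the exact hunk):

    if (folio_test_large(src_folio)) {
            /* drop the PTE maps, and with them the RCU read section,
             * before split_folio() takes anon_vma->rwsem */
            pte_unmap(src_pte);
            pte_unmap(dst_pte);
            src_pte = dst_pte = NULL;
            err = split_folio(src_folio);
            /* after a split, the old folio may no longer contain
             * src_addr, so unlock and release it before retrying */
            folio_unlock(src_folio);
            folio_put(src_folio);
            src_folio = NULL;
            if (err)
                    goto out;
            goto retry;
    }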

Link: https://lkml.kernel.org/r/20240102233256.1077959-1-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 982ae058b2f08f576e4f3d4055f8916ba789f3d4)

Bug: 274911254
Change-Id: I382c6631d821b0ed26d9b15afa78a417dafaeb2e
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Andrea Arcangeli
5025ad140e BACKPORT: userfaultfd: UFFDIO_MOVE uABI
Implement the uABI of UFFDIO_MOVE ioctl.
UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
needs pages to be allocated [1]. However, with UFFDIO_MOVE, if pages are
available (in userspace) for recycling, as is usually the case in heap
compaction algorithms, then we can avoid the page allocation and memcpy
(done by UFFDIO_COPY). Also, since the pages are recycled in
userspace, we avoid the need to release (via madvise) the pages back to
the kernel [2].

We see more than a 40% reduction (on a Google Pixel 6 device) in the compacting
thread's completion time by using UFFDIO_MOVE vs.  UFFDIO_COPY.  This was
measured using a benchmark that emulates a heap compaction implementation
using userfaultfd (to allow concurrent accesses by application threads).
More details of the usecase are explained in [2].  Furthermore,
UFFDIO_MOVE enables moving swapped-out pages without touching them within
the same vma.  Today, this can only be done with mremap; however, it forces
splitting the vma.

[1] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redhat.com/
[2] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/

Update for the ioctl_userfaultfd(2) manpage:

   UFFDIO_MOVE
       (Since Linux xxx)  Move a contiguous memory chunk into the
       userfault registered range and optionally wake up the blocked
       thread. The source and destination addresses and the number of
       bytes to move are specified by the src, dst, and len fields of
       the uffdio_move structure pointed to by argp:

           struct uffdio_move {
               __u64 dst;    /* Destination of move */
               __u64 src;    /* Source of move */
               __u64 len;    /* Number of bytes to move */
               __u64 mode;   /* Flags controlling behavior of move */
               __s64 move;   /* Number of bytes moved, or negated error */
           };

       The following value may be bitwise ORed in mode to change the
       behavior of the UFFDIO_MOVE operation:

       UFFDIO_MOVE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault
              resolution

       UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES
              Allow holes in the source virtual range that is being moved.
              When not specified, the holes will result in an ENOENT error.
              When specified, the holes will be accounted as successfully
              moved memory. This is mostly useful to move hugepage aligned
              virtual regions without knowing if there are transparent
              hugepages in the regions or not, but preventing the risk of
              having to split the hugepage during the operation.

       The move field is used by the kernel to return the number of
       bytes that were actually moved, or an error (a negated errno-
       style value).  If the value returned in move doesn't match the
       value that was specified in len, the operation fails with the
       error EAGAIN.  The move field is output-only; it is not read by
       the UFFDIO_MOVE operation.

       The operation may fail for various reasons. Usually, remapping of
       pages that are not exclusive to the given process fails; once KSM
       has deduplicated pages or fork() has COW-shared pages with child
       processes, they are no longer exclusive. Further, the
       kernel might only perform lightweight checks for detecting whether
       the pages are exclusive, and return -EBUSY in case that check fails.
       To make the operation more likely to succeed, KSM should be
       disabled, fork() should be avoided or MADV_DONTFORK should be
       configured for the source VMA before fork().

       This ioctl(2) operation returns 0 on success.  In this case, the
       entire area was moved.  On error, -1 is returned and errno is
       set to indicate the error.  Possible errors include:

       EAGAIN The number of bytes moved (i.e., the value returned in
              the move field) does not equal the value that was
              specified in the len field.

       EINVAL Either dst or len was not a multiple of the system page
              size, or the range specified by src and len or dst and len
              was invalid.

       EINVAL An invalid bit was specified in the mode field.

       ENOENT
              The source virtual memory range has unmapped holes and
              UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES is not set.

       EEXIST
              The destination virtual memory range is fully or partially
              mapped.

       EBUSY
              The pages in the source virtual memory range are either
              pinned or not exclusive to the process. The kernel might
              only perform lightweight checks for detecting whether the
              pages are exclusive. To make the operation more likely to
              succeed, KSM should be disabled, fork() should be avoided
              or MADV_DONTFORK should be configured for the source virtual
              memory area before fork().

       ENOMEM Allocating memory needed for the operation failed.

       ESRCH
              The target process has exited at the time of a UFFDIO_MOVE
              operation.
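
Putting the uABI together, a minimal userspace sketch of issuing the
ioctl (error handling elided; assumes uapi headers that define
UFFDIO_MOVE, and a uffd that is already registered over the
destination range):

    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    /* Move len bytes from src to dst; returns bytes moved or -1. */
    static long uffd_move(int uffd, unsigned long dst,
                          unsigned long src, unsigned long len)
    {
            struct uffdio_move move = {
                    .dst  = dst,
                    .src  = src,
                    .len  = len,
                    .mode = UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES,
            };

            if (ioctl(uffd, UFFDIO_MOVE, &move) == -1)
                    return -1;      /* inspect errno and move.move */
            return move.move;       /* number of bytes actually moved */
    }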

Link: https://lkml.kernel.org/r/20231206103702.3873743-3-surenb@google.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit adef440691bab824e39c1b17382322d195e1fab0)
Conflicts:
        mm/huge_memory.c
        mm/userfaultfd.c

1. Add vma parameter in mmu_notifier_range_init() calls.
2. Replace folio_move_anon_rmap() with page_move_anon_rmap().
3. Remove vma parameter in pmd_mkwrite() calls.
4. Replace pte_offset_map_nolock() with pte_offset_map()+pte_lockptr()
combo.
5. Remove VM_SHADOW_STACK in vma_move_compatible().
6. Replace pmdp_get_lockless() with pmd_read_atomic().

Bug: 274911254
Change-Id: I1116f15a395f1a8bac176906f7f9c2411e59dc54
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Andrea Arcangeli
25db7c13d8 UPSTREAM: mm/rmap: support move to different root anon_vma in folio_move_anon_rmap()
Patch series "userfaultfd move option", v6.

This patch series introduces the UFFDIO_MOVE feature to userfaultfd, which
has long been implemented and maintained by Andrea in his local tree [1],
but was not upstreamed due to a lack of use cases where this approach would
be better than allocating a new page and copying the contents.  Previous
upstreaming attempts can be found at [6] and [7].

UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
needs pages to be allocated [2].  However, with UFFDIO_MOVE, if pages are
available (in userspace) for recycling, as is usually the case in heap
compaction algorithms, then we can avoid the page allocation and memcpy
(done by UFFDIO_COPY).  Also, since the pages are recycled in
userspace, we avoid the need to release (via madvise) the pages back to
the kernel [3].  We see more than a 40% reduction (on a Google Pixel 6
device) in
the compacting thread's completion time by using UFFDIO_MOVE vs.
UFFDIO_COPY.  This was measured using a benchmark that emulates a heap
compaction implementation using userfaultfd (to allow concurrent accesses
by application threads).  More details of the usecase are explained in
[3].

Furthermore, UFFDIO_MOVE enables moving swapped-out pages without
touching them within the same vma. Today, this can only be done with
mremap; however, it forces splitting the vma.

TODOs for follow-up improvements:
- cross-mm support. Known differences from single-mm and missing pieces:
	- memcg recharging (might need to isolate pages in the process)
	- mm counters
	- cross-mm deposit table moves
	- cross-mm test
	- document the address space where src and dest reside in struct
	  uffdio_move

- TLB flush batching.  Will require extensive changes to PTL locking in
  move_pages_pte().  OTOH that might let us reuse parts of mremap code.

This patch (of 5):

So far, folio_move_anon_rmap() has only been used to move a folio to a
different anon_vma after fork(), whereby the root anon_vma stayed
unchanged.  For that, it was sufficient to hold the folio lock when
calling folio_move_anon_rmap().

However, we want to make use of folio_move_anon_rmap() to move folios
between VMAs that have a different root anon_vma.  As folio_referenced()
performs an RMAP walk without holding the folio lock but only holding the
anon_vma in read mode, holding the folio lock is insufficient.

When moving to an anon_vma with a different root anon_vma, we'll have to
hold both the folio lock and the anon_vma lock in write mode.
Consequently, whenever folio_lock_anon_vma_read() succeeds in read-locking
the anon_vma, we have to re-check whether the mapping was changed in the
meantime.  If it was, we have to retry.

Note that folio_move_anon_rmap() must only be called if the anon page is
exclusive to a process, and must not be called on KSM folios.

This is a preparation for UFFDIO_MOVE, which will hold the folio lock, the
anon_vma lock in write mode, and the mmap_lock in read mode.
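
A hedged sketch of the resulting lock protocol on the move side (in
this backport folio_move_anon_rmap() becomes page_move_anon_rmap();
the dst-side names are assumptions):

    folio_lock(folio);                   /* folio must be exclusive */
    anon_vma_lock_write(dst_anon_vma);   /* different root anon_vma */
    folio_move_anon_rmap(folio, dst_vma);
    anon_vma_unlock_write(dst_anon_vma);
    folio_unlock(folio);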

Link: https://lkml.kernel.org/r/20231206103702.3873743-1-surenb@google.com
Link: https://lkml.kernel.org/r/20231206103702.3873743-2-surenb@google.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: kernel-team@android.com
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 880a99b60d467eefd96322e27b0a8c0b805dfa43)

Bug: 274911254
Change-Id: Iad9619c0273e050af26356f66ae9fc88b56d68bd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
Nikhil V
503add1843 ANDROID: PM: hibernate: Encryption support with compression
Currently only the uncompressed hibernation snapshot image is encrypted
before being written to the swap partition. Extend the encryption
support to compression-enabled scenarios as well.

Bug: 335581841
Change-Id: Ida781b727f56b664a67e2887a4db3d6b355dafdb
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
2024-04-22 11:05:17 +05:30
Nikhil V
3e99ae28ea ANDROID: abi_gki_aarch64_qcom: Update symbol list
Add android_vh_hibernate_save_cmp_len, android_vh_hibernated_do_mem_alloc
symbols to support compression with hibernation.

Symbols added:
	__traceiter_android_vh_hibernate_save_cmp_len
	__traceiter_android_vh_hibernated_do_mem_alloc
	__tracepoint_android_vh_hibernate_save_cmp_len
	__tracepoint_android_vh_hibernated_do_mem_alloc

Bug: 335581841
Change-Id: I99e704bd54f220bac180c5bbfec48da44359f27e
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
2024-04-22 11:04:47 +05:30
Nikhil V
8f08ea0d59 ANDROID: vendor_hooks: Add hooks to support hibernation
In case of hibernation with compression enabled, 'n' pages will be
compressed to 'x' pages before being written to the disk. Keep a note
of these compressed block counts so that the bootloader can directly
read 'x' pages and pass them on to the decompressor. An array holding
the counts of these compressed blocks is maintained and later written
to the disk as part of the hibernation image save process.

The vendor hook '__tracepoint_android_vh_hibernated_do_mem_alloc'
performs the required memory allocations, for example the array that
holds the compressed block counts, which is dynamically sized based on
the snapshot image size. This memory is later freed as part of the
PM_POST_HIBERNATION notifier call.

The vendor hook '__tracepoint_android_vh_hibernate_save_cmp_len' saves
the compressed block counts into the array, which is later written to
the disk.
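
A hedged sketch of how a vendor module can attach to these hooks; the
header path and the hook's argument list are assumptions (only the
register_trace_android_vh_*() naming pattern is implied by the symbols
above):

    #include <trace/hooks/hibernation.h>    /* header name assumed */

    /* Probe signature assumed: first arg is the tracepoint data
     * pointer, second the compressed length of the current block. */
    static void vh_save_cmp_len(void *data, size_t cmp_len)
    {
            /* record cmp_len in the bootloader-visible array */
    }

    static int __init my_mod_init(void)
    {
            return register_trace_android_vh_hibernate_save_cmp_len(
                            vh_save_cmp_len, NULL);
    }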

Bug: 335581841
Change-Id: I574b641e2d9f4cd503c7768a66a7be3142c2686b
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
2024-04-22 11:04:11 +05:30
Nikhil V
e7e8932600 ANDROID: gki_defconfig: Sync gki_defconfig
After applying commit 990d3701d0 ("BACKPORT: PM: hibernate: Move
to crypto APIs for LZO compression"), CRYPTO_LZO is selected by
default. Sync the gki_defconfig accordingly. This doesn't add any
functional change.

Bug: 335581841
Change-Id: Iafa7211245f6d66a96f8f0030e2574c7a220d3a4
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
2024-04-22 11:02:55 +05:30
Nikhil V
54c2418b76 UPSTREAM: PM: hibernate: Support to select compression algorithm
Currently the default compression algorithm is selected based on
compile-time options. Introduce a module parameter "hibernate.compressor"
to override this behaviour.

Different compression algorithms have different characteristics, and
hibernation may benefit from using any of them, especially when a
secondary algorithm (LZ4) offers better decompression speed than the
default algorithm (LZO), which in turn reduces the hibernation image
restore time.

Users can override the default algorithm in two ways:

 1) Passing "hibernate.compressor" as kernel command line parameter.
    Usage:
    	LZO: hibernate.compressor=lzo
    	LZ4: hibernate.compressor=lz4

 2) Specifying the algorithm at runtime.
    Usage:
	LZO: echo lzo > /sys/module/hibernate/parameters/compressor
	LZ4: echo lz4 > /sys/module/hibernate/parameters/compressor

Currently LZO and LZ4 are the supported algorithms. LZO is the default
compression algorithm used with hibernation.
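
A hedged sketch of wiring up such a parameter with the generic
moduleparam API (the actual patch may use a custom setter to validate
the algorithm name; the "lzo" default and file mode are assumptions):

    #include <linux/moduleparam.h>
    #include <linux/crypto.h>

    static char hibernate_compressor[CRYPTO_MAX_ALG_NAME] = "lzo";
    module_param_string(compressor, hibernate_compressor,
                        sizeof(hibernate_compressor), 0644);
    MODULE_PARM_DESC(compressor,
                     "Compression algorithm for the hibernation image (lzo | lz4)");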

Bug: 335581841
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 3fec6e5961b77af6a952b77f5c2ea26f7513b216)
Change-Id: I3c787939c8d37dfeb2c164de69078d303411f938
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
2024-04-22 10:56:17 +05:30