android_kernel_samsung_sm8650/Documentation
Andrea Arcangeli 5025ad140e BACKPORT: userfaultfd: UFFDIO_MOVE uABI
Implement the uABI of UFFDIO_MOVE ioctl.
UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
needs pages to be allocated [1]. However, with UFFDIO_MOVE, if pages are
available (in userspace) for recycling, as is usually the case in heap
compaction algorithms, then we can avoid the page allocation and memcpy
(done by UFFDIO_COPY). Also, since the pages are recycled in the
userspace, we avoid the need to release (via madvise) the pages back to
the kernel [2].

We see over 40% reduction (on a Google pixel 6 device) in the compacting
thread's completion time by using UFFDIO_MOVE vs.  UFFDIO_COPY.  This was
measured using a benchmark that emulates a heap compaction implementation
using userfaultfd (to allow concurrent accesses by application threads).
More details of the usecase are explained in [2].  Furthermore,
UFFDIO_MOVE enables moving swapped-out pages without touching them within
the same vma.  Today, it can only be done by mremap, however it forces
splitting the vma.

[1] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redhat.com/
[2] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/

Update for the ioctl_userfaultfd(2)  manpage:

   UFFDIO_MOVE
       (Since Linux xxx)  Move a continuous memory chunk into the
       userfault registered range and optionally wake up the blocked
       thread. The source and destination addresses and the number of
       bytes to move are specified by the src, dst, and len fields of
       the uffdio_move structure pointed to by argp:

           struct uffdio_move {
               __u64 dst;    /* Destination of move */
               __u64 src;    /* Source of move */
               __u64 len;    /* Number of bytes to move */
               __u64 mode;   /* Flags controlling behavior of move */
               __s64 move;   /* Number of bytes moved, or negated error */
           };

       The following value may be bitwise ORed in mode to change the
       behavior of the UFFDIO_MOVE operation:

       UFFDIO_MOVE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault
              resolution

       UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES
              Allow holes in the source virtual range that is being moved.
              When not specified, the holes will result in ENOENT error.
              When specified, the holes will be accounted as successfully
              moved memory. This is mostly useful to move hugepage aligned
              virtual regions without knowing if there are transparent
              hugepages in the regions or not, but preventing the risk of
              having to split the hugepage during the operation.

       The move field is used by the kernel to return the number of
       bytes that was actually moved, or an error (a negated errno-
       style value).  If the value returned in move doesn't match the
       value that was specified in len, the operation fails with the
       error EAGAIN.  The move field is output-only; it is not read by
       the UFFDIO_MOVE operation.

       The operation may fail for various reasons. Usually, remapping of
       pages that are not exclusive to the given process fail; once KSM
       might deduplicate pages or fork() COW-shares pages during fork()
       with child processes, they are no longer exclusive. Further, the
       kernel might only perform lightweight checks for detecting whether
       the pages are exclusive, and return -EBUSY in case that check fails.
       To make the operation more likely to succeed, KSM should be
       disabled, fork() should be avoided or MADV_DONTFORK should be
       configured for the source VMA before fork().

       This ioctl(2) operation returns 0 on success.  In this case, the
       entire area was moved.  On error, -1 is returned and errno is
       set to indicate the error.  Possible errors include:

       EAGAIN The number of bytes moved (i.e., the value returned in
              the move field) does not equal the value that was
              specified in the len field.

       EINVAL Either dst or len was not a multiple of the system page
              size, or the range specified by src and len or dst and len
              was invalid.

       EINVAL An invalid bit was specified in the mode field.

       ENOENT
              The source virtual memory range has unmapped holes and
              UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES is not set.

       EEXIST
              The destination virtual memory range is fully or partially
              mapped.

       EBUSY
              The pages in the source virtual memory range are either
              pinned or not exclusive to the process. The kernel might
              only perform lightweight checks for detecting whether the
              pages are exclusive. To make the operation more likely to
              succeed, KSM should be disabled, fork() should be avoided
              or MADV_DONTFORK should be configured for the source virtual
              memory area before fork().

       ENOMEM Allocating memory needed for the operation failed.

       ESRCH
              The target process has exited at the time of a UFFDIO_MOVE
              operation.

Link: https://lkml.kernel.org/r/20231206103702.3873743-3-surenb@google.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit adef440691bab824e39c1b17382322d195e1fab0)
Conflicts:
        mm/huge_memory.c
        mm/userfaultfd.c

1. Add vma parameter in mmu_notifier_range_init() calls.
2. Replace folio_move_anon_rmap() with page_move_anon_rmap().
3. Remove vma parameter in pmd_mkwrite() calls.
4. Replace pte_offset_map_nolock() with pte_offset_map()+pte_lockptr()
combo.
5. Remove VM_SHADOW_STACK in vma_move_compatible().
6. Replace pmdp_get_lockless() with pmd_read_atomic().

Bug: 274911254
Change-Id: I1116f15a395f1a8bac176906f7f9c2411e59dc54
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
2024-04-22 18:09:14 +00:00
..
ABI FROMGIT: f2fs: introduce FAULT_BLKADDR_CONSISTENCE 2024-03-04 18:26:14 +00:00
accounting Revert "sched/psi: Allow unprivileged polling of N*2s period" 2023-09-07 10:51:18 +00:00
admin-guide BACKPORT: userfaultfd: UFFDIO_MOVE uABI 2024-04-22 18:09:14 +00:00
arc
arm EFI updates for v6.1 2022-10-09 08:56:54 -07:00
arm64 BACKPORT: irqchip/gic-v3: Work around affinity issues on ASR8601 2024-01-26 10:14:07 +00:00
block blk-crypto: don't use struct request_queue for public interfaces 2023-05-11 23:03:00 +09:00
bpf bpf, docs: Fix modulo zero, division by zero, overflow, and underflow 2023-03-10 09:33:50 +01:00
cdrom
core-api FROMGIT: maple_tree: update the documentation of maple tree 2024-01-04 22:44:38 +00:00
cpu-freq
crypto
dev-tools Merge 6.1.16 into android14-6.1 2023-03-13 15:45:34 +00:00
devicetree Reapply "Merge tag 'android14-6.1.75_r00' into android14-6.1" 2024-04-02 19:49:12 +00:00
doc-guide Rust introduction for v6.1-rc1 2022-10-03 16:39:37 -07:00
driver-api Reapply "Merge tag 'android14-6.1.75_r00' into android14-6.1" 2024-04-02 19:49:12 +00:00
fault-injection lkdtm: replace ll_rw_block with submit_bh 2023-07-19 16:21:53 +02:00
fb Documentation: fb: udlfb: clean up text and formatting 2022-09-27 13:21:44 -06:00
features
filesystems FROMGIT: f2fs: introduce FAULT_BLKADDR_CONSISTENCE 2024-03-04 18:26:14 +00:00
firmware_class
firmware-guide Merge branches 'acpi-misc', 'acpi-tools' and 'acpi-docs' 2022-10-03 20:03:49 +02:00
fpga
gpu drm/vmwgfx: Remove vmwgfx_hashtab 2023-01-18 11:58:28 +01:00
hid
hwmon hwmon: (ftsteutates) Fix scaling of measurements 2023-03-10 09:33:10 +01:00
i2c docs: i2c: slave-interface: return errno when handle I2C_SLAVE_WRITE_REQUESTED 2022-09-28 21:41:59 +02:00
ia64
iio docs: iio: add documentation for BNO055 driver 2022-09-21 18:42:56 +01:00
images
infiniband
input Merge branch 'next' into for-linus 2022-10-09 22:30:23 -07:00
isdn
kbuild Linux 6.1-rc4 2022-11-07 09:35:43 +01:00
kernel-hacking Documentation: Fix spelling mistake in hacking.rst 2022-10-24 11:27:51 -06:00
leds
litmus-tests
livepatch
locking Remove duplicate words inside documentation 2022-09-27 13:21:43 -06:00
loongarch docs/LoongArch: Add booting description 2022-12-08 14:59:15 +08:00
m68k
maintainer
mhi
mips
misc-devices
mm mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[] 2023-09-19 12:27:54 +02:00
netlabel
networking Reapply "net: change accept_ra_min_rtr_lft to affect all RA lifetimes" 2024-02-13 01:24:14 +00:00
nios2
nvdimm
openrisc
parisc
PCI
pcmcia
peci
power
powerpc powerpc/64s: update cpu selection options 2022-09-28 19:22:10 +10:00
process docs: Set minimal gtags / GNU GLOBAL version to 6.6.5 2023-07-05 18:27:38 +01:00
RCU There's not a huge amount of activity in the docs tree this time around, 2022-10-03 10:23:32 -07:00
riscv riscv: Move early dtb mapping into the fixmap region 2023-05-01 08:26:28 +09:00
rust x86: enable initial Rust support 2022-09-28 09:02:45 +02:00
s390 vfio/mdev: embedd struct mdev_parent in the parent data structure 2022-10-04 12:06:58 -06:00
scheduler Merge 9388076b4c ("Merge tag 'acpi-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm") into android-mainline 2022-10-04 14:39:52 +02:00
scsi This is the 6.1.53 stable release 2023-09-18 09:57:37 +00:00
security KEYS: encrypted: fix key instantiation with user-provided data 2022-12-21 17:48:11 +01:00
sh
sound ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard 2023-04-20 12:35:05 +02:00
sparc
sphinx docs: Fix the docs build with Sphinx 6.0 2023-01-18 11:58:10 +01:00
sphinx-static
spi
staging docs: put atomic*.txt and memory-barriers.txt into the core-api book 2022-09-29 12:55:06 -06:00
target
timers
tools A handful of relatively simple documentation fixes, plus a set of patches 2022-10-13 10:58:32 -07:00
trace tracing/probes: Add symstr type for dynamic events 2023-08-03 10:23:54 +02:00
translations docs/zh_CN: Add LoongArch booting description's translation 2022-12-08 15:03:14 +08:00
usb usbip: add USBIP_URB_* URB transfer flags 2022-08-31 09:07:53 +02:00
userspace-api Revert "media: uapi: HEVC: Add num_delta_pocs_of_ref_rps_idx field" 2023-10-12 12:03:37 +00:00
virt This is the 6.1.63 stable release 2023-11-27 16:59:46 +00:00
w1 Documentation: W1: minor typo corrections 2022-09-27 13:21:44 -06:00
watchdog
x86 x86/sev: Add SEV-SNP guest feature negotiation support 2023-02-01 08:34:50 +01:00
xtensa
.gitignore
arch.rst
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py There's not a huge amount of activity in the docs tree this time around, 2022-10-03 10:23:32 -07:00
docutils.conf
dontdiff
index.rst Rust introduction for v6.1-rc1 2022-10-03 16:39:37 -07:00
Kconfig
Makefile
memory-barriers.txt docs/memory-barriers.txt: Fixup long lines 2022-08-31 05:15:31 -07:00
SubmittingPatches
subsystem-apis.rst docs: Rewrite the front page 2022-09-29 12:55:06 -06:00