Merge 6.1.23 into android14-6.1
Changes in 6.1.23
thunderbolt: Limit USB3 bandwidth of certain Intel USB4 host routers
cifs: update ip_addr for ses only for primary chan setup
cifs: prevent data race in cifs_reconnect_tcon()
cifs: avoid race conditions with parallel reconnects
zonefs: Reorganize code
zonefs: Simplify IO error handling
zonefs: Reduce struct zonefs_inode_info size
zonefs: Separate zone information from inode information
zonefs: Fix error message in zonefs_file_dio_append()
fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
kernel: kcsan: kcsan_test: build without structleak plugin
kcsan: avoid passing -g for test
btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
btrfs: zoned: count fresh BG region as zone unusable
net: ethernet: ti: am65-cpsw/cpts: Fix CPTS release action
riscv: ftrace: Fixup panic by disabling preemption
ARM: dts: aspeed: p10bmc: Update battery node name
drm/msm/dpu: Refactor sc7280_pp location
drm/msm/dpu: correct sm8250 and sm8350 scaler
drm/msm/disp/dpu: fix sc7280_pp base offset
tty: serial: fsl_lpuart: switch to new dmaengine_terminate_* API
tty: serial: fsl_lpuart: fix race on RX DMA shutdown
tracing: Add .percent suffix option to histogram values
tracing: Add .graph suffix option to histogram value
tracing: Do not let histogram values have some modifiers
net: mscc: ocelot: fix stats region batching
arm64: efi: Set NX compat flag in PE/COFF header
cifs: fix missing unload_nls() in smb2_reconnect()
xfrm: Zero padding when dumping algos and encap
ASoC: codecs: tx-macro: Fix for KASAN: slab-out-of-bounds
ASoC: Intel: avs: max98357a: Explicitly define codec format
ASoC: Intel: avs: da7219: Explicitly define codec format
ASoC: Intel: avs: ssm4567: Remove nau8825 bits
ASoC: Intel: avs: nau8825: Adjust clock control
zstd: Fix definition of assert()
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
ASoC: SOF: ipc3: Check for upper size limit for the received message
ASoC: SOF: ipc4-topology: Fix incorrect sample rate print unit
ASoC: SOF: Intel: pci-tng: revert invalid bar size setting
ASoC: SOF: IPC4: update gain ipc msg definition to align with fw
md: avoid signed overflow in slot_store()
x86/PVH: obtain VGA console info in Dom0
drm/amdkfd: Fix BO offset for multi-VMA page migration
drm/amdkfd: fix a potential double free in pqm_create_queue
drm/amdkfd: fix potential kgd_mem UAFs
net: hsr: Don't log netdev_err message on unknown prp dst node
ALSA: asihpi: check pao in control_message()
ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set()
fbdev: tgafb: Fix potential divide by zero
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
sched_getaffinity: don't assume 'cpumask_size()' is fully initialized
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
drm/amdkfd: Fixed kfd_process cleanup on module exit.
net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisites
fbdev: nvidia: Fix potential divide by zero
fbdev: intelfb: Fix potential divide by zero
fbdev: lxfb: Fix potential divide by zero
fbdev: au1200fb: Fix potential divide by zero
tools/power turbostat: Fix /dev/cpu_dma_latency warnings
tools/power turbostat: fix decoding of HWP_STATUS
tracing: Fix wrong return in kprobe_event_gen_test.c
btrfs: fix uninitialized variable warning in btrfs_update_block_group
btrfs: use temporary variable for space_info in btrfs_update_block_group
mtd: rawnand: meson: initialize struct with zeroes
mtd: nand: mxic-ecc: Fix mxic_ecc_data_xfer_wait_for_completion() when irq is used
ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
riscv/kvm: Fix VM hang in case of timer delta being zero.
mips: bmips: BCM6358: disable RAC flush for TP1
ALSA: usb-audio: Fix recursive locking at XRUN during syncing
PCI: dwc: Fix PORT_LINK_CONTROL update when CDM check enabled
platform/x86: think-lmi: add missing type attribute
platform/x86: think-lmi: use correct possible_values delimiters
platform/x86: think-lmi: only display possible_values if available
platform/x86: think-lmi: Add possible_values for ThinkStation
platform/surface: aggregator: Add missing fwnode_handle_put()
mtd: rawnand: meson: invalidate cache on polling ECC bit
SUNRPC: fix shutdown of NFS TCP client socket
sfc: ef10: don't overwrite offload features at NIC reset
scsi: megaraid_sas: Fix crash after a double completion
scsi: mpt3sas: Don't print sense pool info twice
net: dsa: realtek: fix out-of-bounds access
ptp_qoriq: fix memory leak in probe()
net: dsa: microchip: ksz8: fix ksz8_fdb_dump()
net: dsa: microchip: ksz8: fix ksz8_fdb_dump() to extract all 1024 entries
net: dsa: microchip: ksz8: fix offset for the timestamp filed
net: dsa: microchip: ksz8: ksz8_fdb_dump: avoid extracting ghost entry from empty dynamic MAC table.
net: dsa: microchip: ksz8863_smi: fix bulk access
net: dsa: microchip: ksz8: fix MDB configuration with non-zero VID
r8169: fix RTL8168H and RTL8107E rx crc error
regulator: Handle deferred clk
net/net_failover: fix txq exceeding warning
net: stmmac: don't reject VLANs when IFF_PROMISC is set
drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
platform/x86/intel/pmc: Alder Lake PCH slp_s0_residency fix
can: bcm: bcm_tx_setup(): fix KMSAN uninit-value in vfs_write
s390/vfio-ap: fix memory leak in vfio_ap device driver
ACPI: bus: Rework system-level device notification handling
loop: LOOP_CONFIGURE: send uevents for partitions
net: mvpp2: classifier flow fix fragmentation flags
net: mvpp2: parser fix QinQ
net: mvpp2: parser fix PPPoE
smsc911x: avoid PHY being resumed when interface is not up
ice: Fix ice_cfg_rdma_fltr() to only update relevant fields
ice: add profile conflict check for AVF FDIR
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
ALSA: ymfpci: Fix BUG_ON in probe function
net: ipa: compute DMA pool size properly
i40e: fix registers dump after run ethtool adapter self test
bnxt_en: Fix reporting of test result in ethtool selftest
bnxt_en: Fix typo in PCI id to device description string mapping
bnxt_en: Add missing 200G link speed reporting
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
net: ethernet: mtk_eth_soc: fix flow block refcounting logic
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
pinctrl: ocelot: Fix alt mode for ocelot
Input: xpad - fix incorrectly applied patch for MAP_PROFILE_BUTTON
iommu/vt-d: Allow zero SAGAW if second-stage not supported
Input: i8042 - add TUXEDO devices to i8042 quirk tables for partial fix
Input: alps - fix compatibility with -funsigned-char
Input: focaltech - use explicitly signed char type
cifs: prevent infinite recursion in CIFSGetDFSRefer()
cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
Input: i8042 - add quirk for Fujitsu Lifebook A574/H
Input: goodix - add Lenovo Yoga Book X90F to nine_bytes_report DMI table
btrfs: fix deadlock when aborting transaction during relocation with scrub
btrfs: fix race between quota disable and quota assign ioctls
btrfs: scan device in non-exclusive mode
zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
block/io_uring: pass in issue_flags for uring_cmd task_work handling
io_uring/poll: clear single/double poll flags on poll arming
io_uring/rsrc: fix rogue rsrc node grabbing
io_uring: fix poll/netmsg alloc caches
vmxnet3: use gro callback when UPT is enabled
zonefs: Always invalidate last cached page on append write
dm: fix __send_duplicate_bios() to always allow for splitting IO
can: j1939: prevent deadlock by moving j1939_sk_errqueue()
xen/netback: don't do grant copy across page boundary
net: phy: dp83869: fix default value for tx-/rx-internal-delay
modpost: Fix processing of CRCs on 32-bit build machines
pinctrl: amd: Disable and mask interrupts on resume
pinctrl: at91-pio4: fix domain name assignment
platform/x86: ideapad-laptop: Stop sending KEY_TOUCHPAD_TOGGLE
powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled
powerpc/64s: Fix __pte_needs_flush() false positive warning
NFSv4: Fix hangs when recovering open state after a server reboot
ALSA: hda/conexant: Partial revert of a quirk for Lenovo
ALSA: usb-audio: Fix regression on detection of Roland VS-100
ALSA: hda/realtek: Add quirks for some Clevo laptops
ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
xtensa: fix KASAN report for show_stack
rcu: Fix rcu_torture_read ftrace event
dt-bindings: mtd: jedec,spi-nor: Document CPOL/CPHA support
s390/uaccess: add missing earlyclobber annotations to __clear_user()
s390: reintroduce expoline dependence to scripts
drm/etnaviv: fix reference leak when mmaping imported buffer
drm/amdgpu: allow more APUs to do mode2 reset when go to S4
drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
drm/amd/display: Take FEC Overhead into Timeslot Calculation
drm/i915/gem: Flush lmem contents after construction
drm/i915/dpt: Treat the DPT BO as a framebuffer
drm/i915: Disable DC states for all commits
drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value
KVM: arm64: Disable interrupts while walking userspace PTs
net: dsa: mv88e6xxx: read FID when handling ATU violations
net: dsa: mv88e6xxx: replace ATU violation prints with trace points
net: dsa: mv88e6xxx: replace VTU violation prints with trace points
selftests/bpf: Test btf dump for struct with padding only fields
libbpf: Fix BTF-to-C converter's padding logic
selftests/bpf: Add few corner cases to test padding handling of btf_dump
libbpf: Fix btf_dump's packed struct determination
usb: ucsi: Fix ucsi->connector race
drm/amdkfd: Get prange->offset after svm_range_vram_node_new
hsr: ratelimit only when errors are printed
x86/PVH: avoid 32-bit build warning when obtaining VGA console info
Revert "cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*"
Linux 6.1.23
Change-Id: I15af3697170567c4678bcc9c2380d80e7cef5bc9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit a075bacde257f755bea0e53400c9f1cdd1b8e8e6 ]

The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity. It was
intended to solve a race condition where unverified pages might be left
in the pagecache, but it does not actually solve that race fully.

Since this incomplete fix for the race costs too much performance to be
worth it, remove it for now.

Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable@vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 6.1.22
interconnect: qcom: osm-l3: fix icc_onecell_data allocation
interconnect: qcom: sm8450: switch to qcom_icc_rpmh_* function
interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT
perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output
perf: fix perf_event_context->time
tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
drm/amd/display: Include virtual signal to set k1 and k2 values
drm/amd/display: fix k1 k2 divider programming for phantom streams
drm/amd/display: Remove OTG DIV register write for Virtual signals.
mptcp: refactor passive socket initialization
mptcp: use the workqueue to destroy unaccepted sockets
mptcp: fix UaF in listener shutdown
drm/amd/display: Fix DP MST sinks removal issue
arm64: dts: qcom: sm8450: Mark UFS controller as cache coherent
power: supply: bq24190: Fix use after free bug in bq24190_remove due to race condition
power: supply: da9150: Fix use after free bug in da9150_charger_remove due to race condition
arm64: dts: imx8dxl-evk: Disable hibernation mode of AR8031 for EQOS
arm64: dts: imx8dxl-evk: Fix eqos phy reset gpio
ARM: dts: imx6sll: e70k02: fix usbotg1 pinctrl
ARM: dts: imx6sll: e60k02: fix usbotg1 pinctrl
ARM: dts: imx6sl: tolino-shine2hd: fix usbotg1 pinctrl
arm64: dts: imx8mn: specify #sound-dai-cells for SAI nodes
arm64: dts: imx93: add missing #address-cells and #size-cells to i2c nodes
NFS: Fix /proc/PID/io read_bytes for buffered reads
xsk: Add missing overflow check in xdp_umem_reg
iavf: fix inverted Rx hash condition leading to disabled hash
iavf: fix non-tunneled IPv6 UDP packet type and hashing
iavf: do not track VLAN 0 filters
intel/igbvf: free irq on the error path in igbvf_request_msix()
igbvf: Regard vf reset nack as success
igc: fix the validation logic for taprio's gate list
i2c: imx-lpi2c: check only for enabled interrupt flags
i2c: mxs: ensure that DMA buffers are safe for DMA
i2c: hisi: Only use the completion interrupt to finish the transfer
scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate()
nfsd: don't replace page in rq_pages if it's a continuation of last page
net: dsa: b53: mmap: fix device tree support
net: usb: smsc95xx: Limit packet length to skb->len
efi/libstub: smbios: Use length member instead of record struct size
qed/qed_sriov: guard against NULL derefs from qed_iov_get_vf_info
xirc2ps_cs: Fix use after free bug in xirc2ps_detach
net: phy: Ensure state transitions are processed from phy_stop()
net: mdio: fix owner field for mdio buses registered using device-tree
net: mdio: fix owner field for mdio buses registered using ACPI
net: stmmac: Fix for mismatched host/device DMA address width
thermal/drivers/mellanox: Use generic thermal_zone_get_trip() function
mlxsw: core_thermal: Fix fan speed in maximum cooling state
drm/i915: Print return value on error
drm/i915/fbdev: lock the fbdev obj before vma pin
drm/i915/guc: Rename GuC register state capture node to be more obvious
drm/i915/guc: Fix missing ecodes
drm/i915/gt: perform uc late init after probe error injection
net: qcom/emac: Fix use after free bug in emac_remove due to race condition
net: usb: lan78xx: Limit packet length to skb->len
net/ps3_gelic_net: Fix RX sk_buff length
net/ps3_gelic_net: Use dma_mapping_error
octeontx2-vf: Add missing free for alloc_percpu
bootconfig: Fix testcase to increase max node
keys: Do not cache key in task struct if key is requested from kernel thread
ice: check if VF exists before mode check
iavf: fix hang on reboot with ice
i40e: fix flow director packet filter programming
bpf: Adjust insufficient default bpf_jit_limit
net/mlx5e: Set uplink rep as NETNS_LOCAL
net/mlx5e: Block entering switchdev mode with ns inconsistency
net/mlx5: Fix steering rules cleanup
net/mlx5e: Overcome slow response for first macsec ASO WQE
net/mlx5: Read the TC mapping of all priorities on ETS query
net/mlx5: E-Switch, Fix an Oops in error handling code
net: dsa: tag_brcm: legacy: fix daisy-chained switches
atm: idt77252: fix kmemleak when rmmod idt77252
erspan: do not use skb_mac_header() in ndo_start_xmit()
net/sonic: use dma_mapping_error() for error check
nvme-tcp: fix nvme_tcp_term_pdu to match spec
mlxsw: spectrum_fid: Fix incorrect local port type
hvc/xen: prevent concurrent accesses to the shared ring
ksmbd: add low bound validation to FSCTL_SET_ZERO_DATA
ksmbd: add low bound validation to FSCTL_QUERY_ALLOCATED_RANGES
ksmbd: fix possible refcount leak in smb2_open()
Bluetooth: hci_sync: Resume adv with no RPA when active scan
Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
Bluetooth: btusb: Remove detection of ISO packets over bulk
Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
Bluetooth: Remove "Power-on" check from Mesh feature
gve: Cache link_speed value from device
net: asix: fix modprobe "sysfs: cannot create duplicate filename"
net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup()
net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup()
net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case
net: mdio: thunder: Add missing fwnode_handle_put()
drm/amd/display: Set dcn32 caps.seamless_odm
Bluetooth: btqcomsmd: Fix command timeout after setting BD address
Bluetooth: L2CAP: Fix responding with wrong PDU type
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
Bluetooth: HCI: Fix global-out-of-bounds
platform/chrome: cros_ec_chardev: fix kernel data leak from ioctl
entry: Fix noinstr warning in __enter_from_user_mode()
perf/x86/amd/core: Always clear status for idx
entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up
hwmon: fix potential sensor registration fail if of_node is missing
hwmon (it87): Fix voltage scaling for chips with 10.9mV ADCs
scsi: qla2xxx: Synchronize the IOCB count to be in order
scsi: qla2xxx: Perform lockless command completion in abort path
smb3: lower default deferred close timeout to address perf regression
smb3: fix unusable share after force unmount failure
uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2
thunderbolt: Use scale field when allocating USB3 bandwidth
thunderbolt: Call tb_check_quirks() after initializing adapters
thunderbolt: Add quirk to disable CLx
thunderbolt: Fix memory leak in margining
thunderbolt: Disable interrupt auto clear for rings
thunderbolt: Add missing UNSET_INBOUND_SBTX for retimer access
thunderbolt: Use const qualifier for `ring_interrupt_index`
thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit
ASoC: amd: yp: Add OMEN by HP Gaming Laptop 16z-n000 to quirks
ASoC: amd: yc: Add DMI entries to support HP OMEN 16-n0xxx (8A43)
ACPI: x86: Drop quirk for HP Elitebook
ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3Enable
riscv: Bump COMMAND_LINE_SIZE value to 1024
drm/cirrus: NULL-check pipe->plane.state->fb in cirrus_pipe_update()
HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
ca8210: fix mac_len negative array access
HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse
HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
m68k: mm: Fix systems with memory at end of 32-bit address space
m68k: Only force 030 bus error if PC not in exception table
selftests/bpf: check that modifier resolves after pointer
scsi: target: iscsi: Fix an error message in iscsi_check_key()
scsi: qla2xxx: Add option to disable FC2 Target support
scsi: hisi_sas: Check devm_add_action() return value
scsi: ufs: core: Add soft dependency on governor_simpleondemand
scsi: lpfc: Check kzalloc() in lpfc_sli4_cgn_params_read()
scsi: lpfc: Avoid usage of list iterator variable after loop
scsi: mpi3mr: Driver unload crashes host when enhanced logging is enabled
scsi: mpi3mr: Wait for diagnostic save during controller init
scsi: mpi3mr: NVMe command size greater than 8K fails
scsi: mpi3mr: Bad drive in topology results kernel crash
scsi: storvsc: Handle BlockSize change in Hyper-V VHD/VHDX file
platform/x86: int3472: Add GPIOs to Surface Go 3 Board data
net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
net: usb: qmi_wwan: add Telit 0x1080 composition
drm/amd/display: Update clock table to include highest clock setting
sh: sanitize the flags on sigreturn
drm/amdgpu: Fix call trace warning and hang when removing amdgpu device
drm/amd: Fix initialization mistake for NBIO 7.3.0
net/sched: act_mirred: better wording on protection against excessive stack growth
act_mirred: use the backlog for nested calls to mirred ingress
cifs: lock chan_lock outside match_session
cifs: append path to open_enter trace event
cifs: do not poll server interfaces too regularly
cifs: empty interface list when server doesn't support query interfaces
cifs: dump pending mids for all channels in DebugData
cifs: print session id while listing open files
cifs: fix dentry lookups in directory handle cache
x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
selftests/x86/amx: Add a ptrace test
scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR
usb: misc: onboard-hub: add support for Microchip USB2517 USB 2.0 hub
usb: dwc2: drd: fix inconsistent mode if role-switch-default-mode="host"
usb: dwc2: fix a devres leak in hw_enable upon suspend resume
usb: gadget: u_audio: don't let userspace block driver unbind
btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
Bluetooth: Fix race condition in hci_cmd_sync_clear
efi: sysfb_efi: Fix DMI quirks not working for simpledrm
mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP
fscrypt: destroy keyring after security_sb_delete()
fsverity: Remove WQ_UNBOUND from fsverity read workqueue
lockd: set file_lock start and end when decoding nlm4 testargs
arm64: dts: imx8mm-nitrogen-r2: fix WM8960 clock name
igb: revert rtnl_lock() that causes deadlock
dm thin: fix deadlock when swapping to thin device
usb: typec: tcpm: fix create duplicate source-capabilities file
usb: typec: tcpm: fix warning when handle discover_identity message
usb: cdns3: Fix issue with using incorrect PCI device function
usb: cdnsp: Fixes issue with redundant Status Stage
usb: cdnsp: changes PCI Device ID to fix conflict with CNDS3 driver
usb: chipdea: core: fix return -EINVAL if request role is the same with current role
usb: chipidea: core: fix possible concurrent when switch role
usb: dwc3: gadget: Add 1ms delay after end transfer command without IOC
usb: ucsi: Fix NULL pointer deref in ucsi_connector_change()
usb: ucsi_acpi: Increase the command completion timeout
mm: kfence: fix using kfence_metadata without initialization in show_object()
kfence: avoid passing -g for test
io_uring/net: avoid sending -ECONNABORTED on repeated connection requests
io_uring/rsrc: fix null-ptr-deref in io_file_bitmap_get()
Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare"
test_maple_tree: add more testing for mas_empty_area()
maple_tree: fix mas_skip_node() end slot detection
ksmbd: fix wrong signingkey creation when encryption is AES256
ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATION
ksmbd: don't terminate inactive sessions after a few seconds
ksmbd: return STATUS_NOT_SUPPORTED on unsupported smb2.0 dialect
ksmbd: return unsupported error on smb1 mount
wifi: mac80211: fix qos on mesh interfaces
nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
drm/bridge: lt8912b: return EPROBE_DEFER if bridge is not found
drm/amd/display: fix wrong index used in dccg32_set_dpstreamclk
drm/meson: fix missing component unbind on bind errors
drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi
drm/i915/active: Fix missing debug object activation
drm/i915: Preserve crtc_state->inherited during state clearing
drm/amdgpu: skip ASIC reset for APUs when go to S4
drm/amdgpu: reposition the gpu reset checking for reuse
riscv: mm: Fix incorrect ASID argument when flushing TLB
riscv: Handle zicsr/zifencei issues between clang and binutils
tee: amdtee: fix race condition in amdtee_open_session
firmware: arm_scmi: Fix device node validation for mailbox transport
arm64: dts: qcom: sc7280: Mark PCIe controller as cache coherent
arm64: dts: qcom: sm8150: Fix the iommu mask used for PCIe controllers
soc: qcom: llcc: Fix slice configuration values for SC8280XP
mm/ksm: fix race with VMA iteration and mm_struct teardown
bus: imx-weim: fix branch condition evaluates to a garbage value
i2c: xgene-slimpro: Fix out-of-bounds bug in xgene_slimpro_i2c_xfer()
dm stats: check for and propagate alloc_percpu failure
dm crypt: add cond_resched() to dmcrypt_write()
dm crypt: avoid accessing uninitialized tasklet
sched/fair: sanitize vruntime of entity being placed
sched/fair: Sanitize vruntime of entity being migrated
drm/amdkfd: introduce dummy cache info for property asic
drm/amdkfd: Fix the warning of array-index-out-of-bounds
drm/amdkfd: add GC 11.0.4 KFD support
drm/amdkfd: Fix the memory overrun
Linux 6.1.22
Change-Id: Id13b4655dbfb59c29a0b8953e5e0cda3703f1879
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit f959325e6ac3f499450088b8d9c626d1177be160 upstream.

WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
is problematic for latency-sensitive workloads, such as I/O
post-processing.

Removing WQ_UNBOUND gives a 96% reduction in scheduler latency related
to the fsverity workqueue and improves app cold startup times by ~30ms.
WQ_UNBOUND was also removed from the dm-verity workqueue for the same
reason [1].

This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the runnable state.

Before:
  Total workqueue scheduler latency: 553800us
After:
  Total workqueue scheduler latency: 18962us
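The 96% figure follows directly from the two totals above. The snippet below is only that arithmetic, with the quoted numbers hard-coded; it is not part of the benchmark that produced them.

```python
# Totals quoted in the commit message above, in microseconds.
before_us = 553800
after_us = 18962

# Fraction of the workqueue scheduler latency that went away.
reduction = (before_us - after_us) / before_us
print(f"reduction: {reduction:.1%}")              # reduction: 96.6%
print(f"ratio:     {before_us / after_us:.1f}x")  # ratio:     29.2x
```

That is about 96.6%, consistent with the 96% quoted, or roughly a 29-fold drop in total runnable time.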
[1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Fixes: 8a1d0f9cacc9 ("fs-verity: add data verification hooks for ->readpages()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* aosp/upstream-f2fs-stable-linux-6.1.y:
fscrypt: support decrypting data from large folios
fsverity: support verifying data from large folios
fsverity.rst: update git repo URL for fsverity-utils
ext4: allow verity with fs block size < PAGE_SIZE
fs/buffer.c: support fsverity in block_read_full_folio()
f2fs: simplify f2fs_readpage_limit()
ext4: simplify ext4_readpage_limit()
fsverity: support enabling with tree block size < PAGE_SIZE
fsverity: support verification with tree block size < PAGE_SIZE
fsverity: replace fsverity_hash_page() with fsverity_hash_block()
fsverity: use EFBIG for file too large to enable verity
fsverity: store log2(digest_size) precomputed
fsverity: simplify Merkle tree readahead size calculation
fsverity: use unsigned long for level_start
fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG
fsverity: pass pos and size to ->write_merkle_tree_block
fsverity: optimize fsverity_cleanup_inode() on non-verity files
fsverity: optimize fsverity_prepare_setattr() on non-verity files
fsverity: optimize fsverity_file_open() on non-verity files
fscrypt: clean up fscrypt_add_test_dummy_key()
fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
f2fs: stop calling fscrypt_add_test_dummy_key()
ext4: stop calling fscrypt_add_test_dummy_key()
fscrypt: add the test dummy encryption key on-demand
f2fs: drop unnecessary arg for f2fs_ioc_*()
f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
f2fs: synchronize atomic write aborts
f2fs: fix wrong segment count
f2fs: replace si->sbi w/ sbi in stat_show()
f2fs: export ipu policy in debugfs
f2fs: make kobj_type structures constant
f2fs: fix to do sanity check on extent cache correctly
f2fs: add missing description for ipu_policy node
f2fs: fix to set ipu policy
f2fs: fix typos in comments
f2fs: fix kernel crash due to null io->bio
f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx()
f2fs: add sysfs nodes to set last_age_weight
f2fs: fix f2fs_show_options to show nogc_merge mount option
f2fs: fix cgroup writeback accounting with fs-layer encryption
f2fs: fix wrong calculation of block age
f2fs: fix to update age extent in f2fs_do_zero_range()
f2fs: fix to update age extent correctly during truncation
f2fs: fix to avoid potential memory corruption in __update_iostat_latency()
f2fs: retry to update the inode page given data corruption
f2fs: fix to handle F2FS_IOC_START_ATOMIC_REPLACE in f2fs_compat_ioctl()
f2fs: clean up i_compress_flag and i_compress_level usage
f2fs: reduce stack memory cost by using bitfield in struct f2fs_io_info
f2fs: factor the read/write tracing logic into a helper
f2fs: remove __has_curseg_space
f2fs: refactor next blk selection
f2fs: remove __allocate_new_section
f2fs: refactor __allocate_new_segment
f2fs: add a f2fs_curseg_valid_blocks helper
f2fs: simplify do_checkpoint
f2fs: remove __add_sum_entry
f2fs: fix to abort atomic write only during do_exist()
f2fs: allow set compression option of files without blocks
f2fs: fix information leak in f2fs_move_inline_dirents()
fs: f2fs: initialize fsdata in pagecache_write()
f2fs: fix to check warm_data_age_threshold
f2fs: return true if all cmd were issued or no cmd need to be issued for f2fs_issue_discard_timeout()
f2fs: clarify compress level bit offset
f2fs: fix to show discard_unit mount opt
f2fs: fix to do sanity check on extent cache correctly
f2fs: remove unneeded f2fs_cp_error() in f2fs_create_whiteout()
f2fs: clear atomic_write_task in f2fs_abort_atomic_write()
f2fs: introduce trace_f2fs_replace_atomic_write_block
f2fs: introduce discard_io_aware_gran sysfs node
f2fs: drop useless initializer and unneeded local variable
f2fs: add iostat support for flush
f2fs: support accounting iostat count and avg_bytes
f2fs: convert discard_wake and gc_wake to bool type
f2fs: convert to use MIN_DISCARD_GRANULARITY macro
f2fs: merge f2fs_show_injection_info() into time_to_inject()
f2fs: add a f2fs_ prefix to punch_hole() and expand_inode_data()
f2fs: remove unnecessary blank lines
f2fs: mark f2fs_init_compress_mempool w/ __init
f2fs: judge whether discard_unit is section only when have CONFIG_BLK_DEV_ZONED
f2fs: start freeing cluster pages from the unused number
MAINTAINERS: Add f2fs's patchwork
f2fs: deliver the accumulated 'issued' to __issue_discard_cmd_orderly()
f2fs: avoid to check PG_error flag
f2fs: add missing doc for fault injection sysfs
f2fs: fix to avoid potential deadlock
f2fs: introduce IS_F2FS_IPU_* macro
f2fs: refactor the hole reporting and allocation logic in f2fs_map_blocks
f2fs: factor out a f2fs_map_no_dnode
f2fs: factor a f2fs_map_blocks_cached helper
f2fs: remove the create argument to f2fs_map_blocks
f2fs: remove f2fs_get_block
docs: f2fs: fix html doc error
f2fs: simplify __allocate_data_block
f2fs: reflow prepare_write_begin
f2fs: f2fs_do_map_lock
f2fs: add a f2fs_get_block_locked helper
f2fs: add a f2fs_lookup_extent_cache_block helper
f2fs: split __submit_bio
f2fs: rename F2FS_MAP_UNWRITTEN to F2FS_MAP_DELALLOC
f2fs: decouple F2FS_MAP_ from buffer head flags
f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin
f2fs: fix to call clear_page_private_reference in .{release,invalid}_folio
f2fs: remove unused PAGE_PRIVATE_ATOMIC_WRITE
f2fs: fix to support .migrate_folio for compressed inode
f2fs: file: drop useless initializer in expand_inode_data()
Bug: 264705711
Bug: 269384820
Bug: 269593531
Change-Id: Ib84dc3389b6a06068a10d427c03f6dbc034831a6
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Try to make fs/verity/verify.c aware of large folios. This includes
making fsverity_verify_bio() support the case where the bio contains
large folios, and adding a function fsverity_verify_folio() which is the
equivalent of fsverity_verify_page().
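The kernel code in fs/verity/verify.c is C; the Python below is only an illustration of the idea of checking every data block in a large folio, not a model of the kernel functions. The names verify_folio and level0_hashes are hypothetical, and it simplifies verification: each data block is compared against an already-trusted list of level-0 hashes instead of walking up the Merkle tree to the root hash. SHA-256 and no salt are assumed.

```python
import hashlib
from typing import Sequence


def block_digest(block: bytes, block_size: int) -> bytes:
    # A short final block is zero-padded to block_size before hashing.
    return hashlib.sha256(block.ljust(block_size, b"\0")).digest()


def verify_folio(folio: bytes, folio_pos: int, block_size: int,
                 level0_hashes: Sequence[bytes]) -> bool:
    """Check every data block contained in a (possibly multi-page) folio.

    folio_pos is the file offset of the folio's first byte; level0_hashes[i]
    is assumed to be the already-trusted hash of data block i.
    """
    if folio_pos % block_size:
        raise ValueError("folio must start on a block boundary")
    # Walk the folio one block at a time, regardless of how many pages it spans.
    for off in range(0, len(folio), block_size):
        index = (folio_pos + off) // block_size
        if block_digest(folio[off:off + block_size], block_size) != level0_hashes[index]:
            return False
    return True


if __name__ == "__main__":
    BLOCK = 4096
    data = bytes(i % 251 for i in range(5 * BLOCK))  # a 5-block "file"
    trusted = [block_digest(data[i:i + BLOCK], BLOCK)
               for i in range(0, len(data), BLOCK)]
    folio = data[BLOCK:5 * BLOCK]  # a 16 KiB folio covering blocks 1..4
    print(verify_folio(folio, BLOCK, BLOCK, trusted))  # True
```

Because the loop steps by block_size rather than by page, the same code covers a single 4 KiB page, a 16 KiB folio, or a block size smaller than the page.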
There's no way to actually test this with large folios yet, but I've
tested that this doesn't cause any regressions.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230127221529.299560-1-ebiggers@kernel.org
Make FS_IOC_ENABLE_VERITY support values of
fsverity_enable_arg::block_size other than PAGE_SIZE.
To make this possible, rework build_merkle_tree(), which was reading
data and hash pages from the file and assuming that they were the same
thing as "blocks".
For reading the data blocks, just replace the direct pagecache access
with __kernel_read(), to naturally read one block at a time.
(A disadvantage of the above is that we lose the two optimizations of
hashing the pagecache pages in-place and forcing the maximum readahead.
That shouldn't be very important, though.)
The hash block reads are a bit more difficult to handle, as the only way
to do them is through fsverity_operations::read_merkle_tree_page().
Instead, let's switch to the single-pass tree construction algorithm
that fsverity-utils uses. This eliminates the need to read back any
hash blocks while the tree is being built, at the small cost of an extra
block-sized memory buffer per Merkle tree level. This is probably what
I should have done originally.
Taken together, the above two changes result in page-size independent
code that is also a bit simpler than what we had before.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-8-ebiggers@kernel.org
Add support for verifying data from verity files whose Merkle tree block
size is less than the page size. The main use case for this is to allow
a single Merkle tree block size to be used across all systems, so that
only one set of fsverity file digests and signatures is needed.
To do this, eliminate various assumptions that the Merkle tree block
size and the page size are the same:
- Make fsverity_verify_page() a wrapper around a new function
fsverity_verify_blocks() which verifies one or more blocks in a page.
- When a Merkle tree block is needed, get the corresponding page and
only verify and use the needed portion. (The Merkle tree continues to
be read and cached in page-sized chunks; that doesn't need to change.)
- When the Merkle tree block size and page size differ, use a bitmap
fsverity_info::hash_block_verified to keep track of which Merkle tree
blocks have been verified, as PageChecked cannot be used directly.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-7-ebiggers@kernel.org
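For illustration, the bitmap bookkeeping described above can be sketched in plain userspace C. This is a hypothetical model, not the fs/verity/ code: struct verified_bitmap, block_is_verified(), mark_block_verified() and locate_block() are invented names. It assumes one bit per Merkle tree block and a Merkle tree block size no larger than the page size, and it shows how a block index maps to a page index plus an offset within that page.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for fsverity_info::hash_block_verified:
 * one bit per Merkle tree block, set once that block has been verified. */
struct verified_bitmap {
	unsigned long *words;
	size_t nbits;
};

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static int bitmap_init(struct verified_bitmap *bm, size_t nbits)
{
	size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

	bm->words = calloc(nwords ? nwords : 1, sizeof(unsigned long));
	bm->nbits = nbits;
	return bm->words ? 0 : -1;
}

/* Caller guarantees idx < bm->nbits. */
static bool block_is_verified(const struct verified_bitmap *bm, size_t idx)
{
	return (bm->words[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1UL;
}

static void mark_block_verified(struct verified_bitmap *bm, size_t idx)
{
	bm->words[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
}

static void bitmap_free(struct verified_bitmap *bm)
{
	free(bm->words);
	bm->words = NULL;
}

/* With page size 2^page_shift and Merkle block size 2^log_blocksize
 * (log_blocksize <= page_shift), find which cached page holds block 'idx'
 * and where in that page the block starts. */
static void locate_block(uint64_t idx, unsigned page_shift,
			 unsigned log_blocksize, uint64_t *page_index,
			 size_t *offset_in_page)
{
	unsigned log_blocks_per_page = page_shift - log_blocksize;

	*page_index = idx >> log_blocks_per_page;
	*offset_in_page = (size_t)(idx & ((1ULL << log_blocks_per_page) - 1))
			  << log_blocksize;
}
```

For example, with 4K pages and 1K Merkle tree blocks, block 5 lives in page 1 at offset 1024, and its "verified" bit is tracked separately from the page's own flags.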
In preparation for allowing the Merkle tree block size to differ from
PAGE_SIZE, replace fsverity_hash_page() with fsverity_hash_block(). The
new function is similar to the old one, but it operates on the block at
the given offset in the page instead of on the full page.
(For now, all callers still pass a full page.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-6-ebiggers@kernel.org
Currently, there is an implementation limit where files can't have more
than 8 Merkle tree levels. With SHA-256 and 4K blocks, this limit is
never reached, since a file would need to be larger than 2**64 bytes to
need 9 levels. However, with SHA-512, 9 levels are needed for files
larger than about 1.15 EB, which is possible on btrfs. Therefore, this
limit technically became reachable when btrfs added fsverity support.
Meanwhile, support for merkle_tree_block_size < PAGE_SIZE will introduce
another implementation limit on file size, resulting from the use of an
in-memory bitmap to track which Merkle tree blocks have been verified.
In any case, currently FS_IOC_ENABLE_VERITY fails with EINVAL when the
file is too large. This is undocumented, and also ambiguous since
EINVAL can mean other things too. Let's change the error code to EFBIG,
which is much clearer, and document it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-5-ebiggers@kernel.org
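The level limits above can be checked with a small calculation. The sketch below assumes the level count is obtained by rounding the file size up to whole blocks and then repeatedly dividing (rounding up) by the number of hashes per block until a single block remains; merkle_tree_levels() and MAX_LEVELS are hypothetical names, with MAX_LEVELS set to the 8-level limit mentioned in the text. With SHA-256 (32-byte digests) and 4K blocks, even a UINT64_MAX-byte file needs only 8 levels, whereas with SHA-512 (64-byte digests) a file one byte larger than 2**60 bytes (about 1.15 EB) would need 9.

```c
#include <stdint.h>

#define MAX_LEVELS 8 /* the 8-level implementation limit described above */

/*
 * Count Merkle tree levels for a file of 'size' bytes, with blocks of
 * 2^log_blocksize bytes and digests of 2^log_digestsize bytes.
 * Returns the level count, or -1 if more than MAX_LEVELS would be needed.
 */
static int merkle_tree_levels(uint64_t size, unsigned log_blocksize,
			      unsigned log_digestsize)
{
	unsigned log_arity = log_blocksize - log_digestsize;
	uint64_t hashes_per_block = 1ULL << log_arity;
	uint64_t blocks;
	int levels = 0;

	/* Data blocks, rounded up without risking overflow near UINT64_MAX. */
	blocks = (size >> log_blocksize) +
		 ((size & ((1ULL << log_blocksize) - 1)) != 0);

	/* Each level holds one hash per block of the level below it. */
	while (blocks > 1) {
		if (levels >= MAX_LEVELS)
			return -1;
		blocks = (blocks + hashes_per_block - 1) >> log_arity;
		levels++;
	}
	return levels;
}
```

In this sketch, a return value of -1 corresponds to the "file too large" case that, after this change, FS_IOC_ENABLE_VERITY reports as EFBIG.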
Add log_digestsize to struct merkle_tree_params so that it can be used
in verify.c. Also save memory by using u8 for all the log_* fields.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-4-ebiggers@kernel.org
First, calculate max_ra_pages more efficiently by using the bio size.
Second, calculate the number of readahead pages from the hash page
index, instead of calculating it ahead of time using the data page
index. This ends up being a bit simpler, especially since level 0 is
last in the tree, so we can just limit the readahead to the tree size.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-3-ebiggers@kernel.org
fs/verity/ isn't consistent with whether Merkle tree block indices are
'unsigned long' or 'u64'. There's no real point to using u64 for them,
though, since (a) a Merkle tree with over ULONG_MAX blocks would only be
needed for a file larger than MAX_LFS_FILESIZE, and (b) for reads, the
status of all Merkle tree blocks has to be tracked in memory.
Therefore, let's make things a bit more efficient on 32-bit systems by
using 'unsigned long[]' for merkle_tree_params::level_start, instead of
'u64[]'. Also, to be extra safe, explicitly check that there aren't
more than ULONG_MAX Merkle tree blocks.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-2-ebiggers@kernel.org
I've gotten very little use out of these debug messages, and I'm not
aware of anyone else having used them.
Indeed, sprinkling pr_debug around is not really a best practice these
days, especially for filesystem code. Tracepoints are used instead.
Let's just remove these and start from a clean slate.
This change does not affect info, warning, and error messages.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221215060420.60692-1-ebiggers@kernel.org
fsverity_operations::write_merkle_tree_block is passed the index of the
block to write and the log base 2 of the block size. However, all
implementations of it use these parameters only to calculate the
position and the size of the block, in bytes.
Therefore, make ->write_merkle_tree_block take 'pos' and 'size'
parameters instead of 'index' and 'log_blocksize'.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20221214224304.145712-5-ebiggers@kernel.org
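As a rough userspace sketch of why 'pos' and 'size' are the more convenient parameters: an implementation that only ever writes 'size' bytes at byte offset 'pos' no longer needs to derive them, and a caller that still has an index can convert with a shift. struct mem_tree, mem_write_merkle_tree_block() and write_by_index() are hypothetical, and returning -EFBIG for out-of-range writes is an arbitrary choice for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory backing store for a Merkle tree. */
struct mem_tree {
	unsigned char data[1 << 16];
};

/* New-style callback: the caller supplies the byte position and size. */
static int mem_write_merkle_tree_block(struct mem_tree *t, const void *buf,
				       uint64_t pos, unsigned int size)
{
	if (pos > sizeof(t->data) || size > sizeof(t->data) - pos)
		return -EFBIG; /* arbitrary error choice for this sketch */
	memcpy(t->data + pos, buf, size);
	return 0;
}

/* Old-style (index, log_blocksize) arguments convert with two shifts. */
static int write_by_index(struct mem_tree *t, const void *buf,
			  uint64_t index, int log_blocksize)
{
	return mem_write_merkle_tree_block(t, buf, index << log_blocksize,
					   1U << log_blocksize);
}
```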
Make fsverity_cleanup_inode() an inline function that checks for
non-NULL ->i_verity_info, then (if needed) calls
__fsverity_cleanup_inode() to do the real work. This reduces the
overhead on non-verity files.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20221214224304.145712-4-ebiggers@kernel.org
Make fsverity_prepare_setattr() an inline function that does the
IS_VERITY() check, then (if needed) calls __fsverity_prepare_setattr()
to do the real work. This reduces the overhead on non-verity files.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20221214224304.145712-3-ebiggers@kernel.org
Make fsverity_file_open() an inline function that does the IS_VERITY()
check, then (if needed) calls __fsverity_file_open() to do the real
work. This reduces the overhead on non-verity files.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20221214224304.145712-2-ebiggers@kernel.org
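The three patches above apply the same pattern: a cheap inline check at the call site, with the real work moved out of line. A minimal userspace sketch follows. struct inode and struct verity_info here are simplified stand-ins, the is_verity field models IS_VERITY(), and file_open()/cleanup_inode() with their *_slow counterparts are hypothetical names loosely corresponding to fsverity_file_open()/__fsverity_file_open() and fsverity_cleanup_inode()/__fsverity_cleanup_inode().

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (hypothetical). */
struct verity_info { int dummy; };
struct inode {
	bool is_verity;                  /* models IS_VERITY(inode) */
	struct verity_info *verity_info; /* models ->i_verity_info */
};

static int slow_path_calls; /* counts how often out-of-line code runs */

/* Out-of-line work, analogous to __fsverity_file_open(). */
static int file_open_slow(struct inode *inode)
{
	(void)inode;
	slow_path_calls++;
	return 0;
}

/* Inline fast path: non-verity files return without any extra call. */
static inline int file_open(struct inode *inode)
{
	if (inode->is_verity)
		return file_open_slow(inode);
	return 0;
}

/* Out-of-line work, analogous to __fsverity_cleanup_inode(). */
static void cleanup_inode_slow(struct inode *inode)
{
	slow_path_calls++;
	inode->verity_info = NULL; /* real code would also free the info */
}

/* Inline check for non-NULL verity info, as the first patch describes. */
static inline void cleanup_inode(struct inode *inode)
{
	if (inode->verity_info)
		cleanup_inode_slow(inode);
}
```

For a non-verity inode neither slow-path function runs, which is the overhead reduction the commit messages describe.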
* aosp/upstream-f2fs-stable-linux-6.1.y:
fsverity: simplify fsverity_get_digest()
fsverity: stop using PG_error to track error status
f2fs: reset wait_ms to default if any of the victims have been selected
f2fs: fix some format WARNING in debug.c and sysfs.c
f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super()
f2fs: fix iostat parameter for discard
f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache
f2fs: add block_age-based extent cache
f2fs: allocate the extent_cache by default
f2fs: refactor extent_cache to support for read and more
f2fs: remove unnecessary __init_extent_tree
f2fs: move internal functions into extent_cache.c
f2fs: specify extent cache for read explicitly
f2fs: introduce f2fs_is_readonly() for readability
f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro
f2fs: do some cleanup for f2fs module init
MAINTAINERS: Add f2fs bug tracker link
f2fs: remove the unused flush argument to change_curseg
f2fs: open code allocate_segment_by_default
f2fs: remove struct segment_allocation default_salloc_ops
f2fs: introduce discard_urgent_util sysfs node
f2fs: define MIN_DISCARD_GRANULARITY macro
f2fs: init discard policy after thread wakeup
f2fs: avoid victim selection from previous victim section
f2fs: truncate blocks in batch in __complete_revoke_list()
f2fs: make __queue_discard_cmd() return void
f2fs: fix description about discard_granularity node
f2fs: move set_file_temperature into f2fs_new_inode
f2fs: fix to enable compress for newly created file if extension matches
f2fs: set zstd compress level correctly
f2fs: change type for 'sbi->readdir_ra'
f2fs: cleanup for 'f2fs_tuning_parameters' function
f2fs: fix to alloc_mode changed after remount on a small volume device
f2fs: remove submit label in __submit_discard_cmd()
f2fs: fix to do sanity check on i_extra_isize in is_alive()
f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACE
f2fs: fix to set flush_merge opt and show noflush_merge
f2fs: initialize locks earlier in f2fs_fill_super()
f2fs: optimize iteration over sparse directories
f2fs: fix to avoid accessing uninitialized spinlock
f2fs: correct i_size change for atomic writes
f2fs: add proc entry to show discard_plist info
f2fs: allow to read node block after shutdown
f2fs: replace ternary operator with max()
f2fs: replace gc_urgent_high_remaining with gc_remaining_trials
f2fs: add missing bracket in doc
f2fs: use sysfs_emit instead of sprintf
f2fs: introduce gc_mode sysfs node
f2fs: fix to destroy sbi->post_read_wq in error path of f2fs_fill_super()
f2fs: fix return val in f2fs_start_ckpt_thread()
f2fs: fix the msg data type
f2fs: fix the assign logic of iocb
f2fs: Fix typo in comments
f2fs: introduce max_ordered_discard sysfs node
f2fs: allow to set compression for inlined file
f2fs: add barrier mount option
f2fs: fix normal discard process
f2fs: cleanup in f2fs_create_flush_cmd_control()
f2fs: fix gc mode when gc_urgent_high_remaining is 1
f2fs: remove batched_trim_sections node
f2fs: support fault injection for f2fs_is_valid_blkaddr()
f2fs: fix to invalidate dcc->f2fs_issue_discard in error path
f2fs: Fix the race condition of resize flag between resizefs
f2fs: let's avoid to get cp_rwsem twice by f2fs_evict_inode by d_invalidate
f2fs: should put a page when checking the summary info
Bug: 256243893
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I84a5ebbfbfb58a1f89327ce003a298aaae7a42b9
Instead of looking up the algorithm by name in hash_algo_name[] to get
its hash_algo ID, just store the hash_algo ID in the fsverity_hash_alg
struct. Verify at boot time that every fsverity_hash_alg has a valid
hash_algo ID with matching digest size.
Remove an unnecessary memset() of the whole digest array to 0 before the
digest is copied into it.
Finally, remove the pr_debug statement. There is already a pr_debug for
the fsverity digest when the file is opened.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20221129045139.69803-1-ebiggers@kernel.org
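A sketch of the table-plus-sanity-check approach described above, in userspace C. The enum, table and function names are hypothetical stand-ins for enum hash_algo and struct fsverity_hash_alg; the point is only that each entry carries its algorithm ID directly, and a one-time check confirms that every ID is valid and agrees on the digest size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of an algorithm-ID enum and its digest sizes. */
enum algo_id { ALGO_SHA256, ALGO_SHA512, ALGO_COUNT };
static const unsigned int algo_digest_size[ALGO_COUNT] = {
	[ALGO_SHA256] = 32,
	[ALGO_SHA512] = 64,
};

/* Each entry stores its ID directly instead of looking it up by name. */
struct verity_hash_alg {
	const char *name;
	unsigned int digest_size;
	enum algo_id algo_id;
};

static const struct verity_hash_alg hash_algs[] = {
	{ "sha256", 32, ALGO_SHA256 },
	{ "sha512", 64, ALGO_SHA512 },
};

/* Init-time check: every ID must be valid with a matching digest size. */
static int check_hash_algs(const struct verity_hash_alg *algs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((unsigned)algs[i].algo_id >= ALGO_COUNT)
			return -1;
		if (algo_digest_size[algs[i].algo_id] != algs[i].digest_size)
			return -1;
	}
	return 0;
}

/* Copy the digest and report its ID: no name lookup, no prior memset. */
static enum algo_id get_digest(const struct verity_hash_alg *alg,
			       const unsigned char *file_digest,
			       unsigned char *out)
{
	memcpy(out, file_digest, alg->digest_size);
	return alg->algo_id;
}
```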
As a step towards freeing the PG_error flag for other uses, change ext4
and f2fs to stop using PG_error to track verity errors. Instead, if a
verity error occurs, just mark the whole bio as failed. The coarser
granularity isn't really a problem since it isn't any worse than what
the block layer provides, and errors from a multi-page readahead aren't
reported to applications unless a single-page read fails too.
f2fs supports compression, which makes the f2fs changes a bit more
complicated than desired, but the basic premise still works.
Note: there are still a few uses of PageError in f2fs, but they are on
the write path, so they are unrelated and this patch doesn't touch them.
Reviewed-by: Chao Yu <chao@kernel.org>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmM6zNkACgkQxWXV+ddt
WDsNMg/+LTuwf6Js+mAl1AgtSpLOl2gLfNBJAUXhzwPbc3nF9bwONE/EUYEXTo5h
kTf1cQRj0NCIZ7iHDwXuWNm77diNl+SChEDIoc7k0d6P7Qmmn2AWbTLM4dleyg5S
6jxPpOMbegycQfL9tSJNaiT9zlZxj9Z+0yPibR99otrgtuv6zuvRxcdh34rEFIyf
xoabO3/18lAKHzYzAZxNXMpbUSBmqLPVoZEOcfBAXvcuIJkzKRP6Y9gwlYs+kn+D
J8BPa3LoSNxXrpCvWzlu7vO3gwNp7H7pQQqZKjjEcOZ+dj2UYQeTyJvl1vdzaNyk
EoFYlkaKkYi7RaonuHjNaTeD/igJf8Eo6DTiXzACECssbKutlvNG4HXuFApsWy7M
T7KZ5jTAQ98ZMYjgZ27UbEpFZd8lYHzV952Njjo9zbRVbqwaPEZTTdkjpz+3X6t4
Z0A951ixOYKiOVdu3Uj1fHaBv0n/p0wrXIGt3ZIdjufM9TctV3oJwOZOiM2H0ccb
XJVwsQG92+ja9XLZrw8H62PCKBYo3LL52r9b9NVodY9aTsQWTfiV5OP84RRlncCp
hzPkHmO1YIyVcLoijagiO7cW21pQbKfqsRX/P1F7DXyjosHppmDS7IHDWA7Adf3W
QA6eBnoWqVwBh7P+IyxJuRG0CrnxkPZeAZIhohDwk5Mt4NGATkA=
=NlUz
-----END PGP SIGNATURE-----
Merge tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"There's a bunch of performance improvements, most notably the FIEMAP
speedup, the new block group tree to speed up mount on large
filesystems, more io_uring integration, some sysfs exports and the
usual fixes and core updates.
Summary:
Performance:
- outstanding FIEMAP speed improvement
- algorithmic change how extents are enumerated leads to orders of
magnitude speed boost (uncached and cached)
- extent sharing check speedup (2.2x uncached, 3x cached)
- add more cancellation points, allowing seeking to be interrupted in
files with a large number of extents
- more efficient hole and data seeking (4x uncached, 1.3x cached)
- sample results:
256M, 32K extents: 4s -> 29ms (~150x)
512M, 64K extents: 30s -> 59ms (~550x)
1G, 128K extents: 225s -> 120ms (~1800x)
- improved inode logging, especially for directories (on dbench
workload throughput +25%, max latency -21%)
- improved buffered IO, remove redundant extent state tracking,
lowering memory consumption and avoiding rb tree traversal
- add sysfs tunable to let qgroup temporarily skip exact accounting
when deleting snapshot, leading to a speedup but requiring a rescan
after that, will be used by snapper
- support io_uring with buffered writes; until now it was supported
only for direct IO. With the no-wait semantics implemented in the
buffered write path it now works, improving IOPS (2x), throughput
(2.2x) and latency (varies, 2x to 150x)
- small performance improvements when dropping and searching for
extent maps as well as when flushing delalloc in COW mode
(throughput +5MB/s)
User visible changes:
- new incompatible feature block-group-tree adding a dedicated tree
for tracking block groups; this allows much faster loading during
mount and avoids the seeking needed when block group items are
scattered among the extent tree items
- this reduces mount time for many-terabyte sized filesystems
- conversion tool will be provided so existing filesystem can also
be updated in place
- to reduce test matrix and feature combinations requires no-holes
and free-space-tree (mkfs defaults since 5.15)
- improved reporting of super block corruption detected by scrub
- scrub also tries to repair super block and does not wait until next
commit
- discard stats and tunables are exported in sysfs
(/sys/fs/btrfs/FSID/discard)
- qgroup status is exported in sysfs
(/sys/fs/btrfs/FSID/qgroups/)
- verify that super block was not modified when thawing filesystem
Fixes:
- FIEMAP fixes
- fix extent sharing status, does not depend on the cached status
where merged
- flush delalloc so compressed extents are reported correctly
- fix alignment of VMA for memory mapped files on THP
- send: fix failures when processing inodes with no links (orphan
files and directories)
- fix race between quota enable and quota rescan ioctl
- handle more corner cases for read-only compat feature verification
- fix missed extent on fsync after dropping extent maps
Core:
- lockdep annotations to validate various transaction states and
state transitions
- preliminary support for fs-verity in send
- more effective memory use in scrub for subpage where sector is
smaller than page
- block group caching progress logic has been removed, load is now
synchronous
- simplify end IO callbacks and bio handling, use chained bios
instead of own tracking
- add no-wait semantics to several functions (tree search, nocow,
flushing, buffered write)
- cleanups and refactoring
MM changes:
- export balance_dirty_pages_ratelimited_flags"
* tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (177 commits)
btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer
btrfs: drop extent map range more efficiently
btrfs: avoid pointless extent map tree search when flushing delalloc
btrfs: remove unnecessary next extent map search
btrfs: remove unnecessary NULL pointer checks when searching extent maps
btrfs: assert tree is locked when clearing extent map from logging
btrfs: remove unnecessary extent map initializations
btrfs: remove the refcount warning/check at free_extent_map()
btrfs: add helper to replace extent map range with a new extent map
btrfs: move open coded extent map tree deletion out of inode eviction
btrfs: use cond_resched_rwlock_write() during inode eviction
btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
btrfs: move btrfs_drop_extent_cache() to extent_map.c
btrfs: fix missed extent on fsync after dropping extent maps
btrfs: remove stale prototype of btrfs_write_inode
btrfs: enable nowait async buffered writes
btrfs: assert nowait mode is not used for some btree search functions
btrfs: make btrfs_buffered_write nowait compatible
btrfs: plumb NOWAIT through the write path
btrfs: make lock_and_cleanup_extent_if_need nowait compatible
...
Preserve the fs-verity status of a btrfs file across send/recv.
There is no facility for installing the Merkle tree contents directly on
the receiving filesystem, so we package up the parameters used to enable
verity found in the verity descriptor. This gives the receive side
enough information to properly enable verity again. Note that this means
that receive will have to re-compute the whole Merkle tree, similar to
how compression worked before encoded_write.
Since the file becomes read-only after verity is enabled, it is
important that verity is added to the send stream after any file writes.
Therefore, when we process a verity item, merely note that it happened,
then actually create the command in the send stream during
'finish_inode_if_needed'.
This also creates V3 of the send stream format, without any format
changes besides adding the new commands and attributes.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
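The ordering rule described above (note the verity item, and emit the command only when the inode is finished) can be modeled with a small state machine. Everything here is hypothetical: the item and command enums and struct send_ctx are not the real btrfs send types, and finish_inode() only mirrors the role the message gives to finish_inode_if_needed().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item and command kinds; not the real btrfs send types. */
enum item_kind { ITEM_WRITE, ITEM_VERITY };
enum cmd_kind { CMD_WRITE, CMD_ENABLE_VERITY };

struct send_ctx {
	bool verity_pending;    /* set when a verity item has been seen */
	enum cmd_kind cmds[16]; /* commands emitted into the "stream" */
	size_t ncmds;
};

static void emit(struct send_ctx *sctx, enum cmd_kind c)
{
	if (sctx->ncmds < sizeof(sctx->cmds) / sizeof(sctx->cmds[0]))
		sctx->cmds[sctx->ncmds++] = c;
}

static void process_item(struct send_ctx *sctx, enum item_kind k)
{
	if (k == ITEM_VERITY)
		sctx->verity_pending = true; /* defer: writes may still follow */
	else
		emit(sctx, CMD_WRITE);
}

/* Emit the deferred verity command after all of the inode's writes. */
static void finish_inode(struct send_ctx *sctx)
{
	if (sctx->verity_pending) {
		emit(sctx, CMD_ENABLE_VERITY);
		sctx->verity_pending = false;
	}
}
```

Even if the verity item is processed before some of the file's data, the enable command lands last in the stream, so the receiver never tries to write to a file that has already become read-only.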
Convert the use of kmap() to its recommended replacement
kmap_local_page(). This avoids the overhead of doing a non-local
mapping, which is unnecessary in this case.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Link: https://lore.kernel.org/r/20220818224010.43778-1-ebiggers@kernel.org
Replace extract_hash() with the memcpy_from_page() helper function.
This is simpler, and it has the side effect of replacing the use of
kmap_atomic() with its recommended replacement kmap_local_page().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Link: https://lore.kernel.org/r/20220818223903.43710-1-ebiggers@kernel.org
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first argument
like ->read_folio
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmKNMDUACgkQDpNsjXcp
gj4/mwf/bpHhXH4ZoNIvtUpTF6rZbqeffmc0VrbxCZDZ6igRnRPglxZ9H9v6L53O
7B0FBQIfxgNKHZpdqGdOkv8cjg/GMe/HJUbEy5wOakYPo4L9fZpHbDZ9HM2Eankj
xBqLIBgBJ7doKr+Y62DAN19TVD8jfRfVtli5mqXJoNKf65J7BkxljoTH1L3EXD9d
nhLAgyQjR67JQrT/39KMW+17GqLhGefLQ4YnAMONtB6TVwX/lZmigKpzVaCi4r26
bnk5vaR/3PdjtNxIoYvxdc71y2Eg05n2jEq9Wcy1AaDv/5vbyZUlZ2aBSaIVbtKX
WfrhN9O3L0bU5qS7p9PoyfLc9wpq8A==
=djLv
-----END PGP SIGNATURE-----
Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCYo0tOhQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5QJfAP47Ym9vacLc1m8/MUaRA/QjbJ/8t3TX
h/4McK8kiRudxgD/RiPHII6gJ8q+qpBrYWJZ4ZZaHE8v0oA1viuZfbuN2wc=
=KQYi
-----END PGP SIGNATURE-----
Merge tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull IMA updates from Mimi Zohar:
"New is IMA support for including fs-verity file digests and signatures
in the IMA measurement list as well as verifying the fs-verity file
digest based signatures, both based on policy.
In addition, there are two bug fixes:
- avoid reading UEFI variables, which cause a page fault, on Apple
Macs with T2 chips.
- remove the original "ima" template Kconfig option to address a boot
command line ordering issue.
The rest is a mixture of code/documentation cleanup"
* tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Fix sparse warnings in keyring_handler
evm: Clean up some variables
evm: Return INTEGRITY_PASS for enum integrity_status value '0'
efi: Do not import certificates from UEFI Secure Boot for T2 Macs
fsverity: update the documentation
ima: support fs-verity file digest based version 3 signatures
ima: permit fsverity's file digests in the IMA measurement list
ima: define a new template field named 'd-ngv2' and templates
fs-verity: define a function to return the integrity protected file digest
ima: use IMA default hash algorithm for integrity violations
ima: fix 'd-ng' comments and documentation
ima: remove the IMA_TEMPLATE Kconfig option
ima: remove redundant initialization of pointer 'file'.
The parameter desc_size in fsverity_create_info() is useless: it is
not referenced anywhere. Its main purpose here would be to
indicate the size of struct fsverity_descriptor and from that calculate the
size of the signature. However, desc->sig_size already provides this,
so remove it.
Consequently, there is no need to obtain desc_size from fsverity_get_descriptor()
in ensure_verity_info(), so remove the parameter desc_ret from
fsverity_get_descriptor() too.
Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220518132256.2297655-1-chris.zjh@huawei.com
Removes a couple of calls to compound_head and saves a few bytes.
Also convert verity's read_file_data_page() to be folio-based.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Define a function named fsverity_get_digest() to return the verity file
digest and the associated hash algorithm (enum hash_algo).
This assumes that before calling fsverity_get_digest() the file must have
been opened, which is even true for the IMA measure/appraise on file
open policy rule use case (func=FILE_CHECK). do_open() calls vfs_open()
immediately prior to ima_file_check().
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
All filesystems have now been converted to use ->readahead, so
remove the ->readpages operation and fix all the comments that
used to refer to it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
If the file size is almost S64_MAX, the calculated number of Merkle tree
levels exceeds FS_VERITY_MAX_LEVELS, causing FS_IOC_ENABLE_VERITY to
fail. This is unintentional, since as the comment above the definition
of FS_VERITY_MAX_LEVELS states, it is enough for over U64_MAX bytes of
data using SHA-256 and 4K blocks. (Specifically, 4096*128**8 >= 2**64.)
The bug is actually that when the number of blocks in the first level is
calculated from i_size, there is a signed integer overflow due to i_size
being signed. Fix this by treating i_size as unsigned.
This was found by the new test "generic: test fs-verity EFBIG scenarios"
(https://lkml.kernel.org/r/b1d116cd4d0ea74b9cd86f349c672021e005a75c.1631558495.git.boris@bur.io).
This didn't affect ext4 or f2fs since those have a smaller maximum file
size, but it did affect btrfs which allows files up to S64_MAX bytes.
Reported-by: Boris Burkov <boris@bur.io>
Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Fixes: fd2d1acfcadf ("fs-verity: add the hook for file ->open()")
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Boris Burkov <boris@bur.io>
Link: https://lore.kernel.org/r/20210916203424.113376-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
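A minimal illustration of the fix, assuming i_size is modeled as int64_t (like loff_t): converting to uint64_t before adding block_size - 1 keeps the round-up well defined even for a size of INT64_MAX, where the same addition on the signed value would overflow. data_blocks() is a hypothetical helper, not the kernel function.

```c
#include <stdint.h>

/*
 * Round a file size up to a count of 2^log_blocksize-byte blocks.
 * The cast to uint64_t happens before the addition, so sizes near
 * INT64_MAX do not trigger signed overflow (undefined behavior in C).
 */
static uint64_t data_blocks(int64_t i_size, unsigned log_blocksize)
{
	uint64_t block_size = 1ULL << log_blocksize;

	return ((uint64_t)i_size + block_size - 1) >> log_blocksize;
}
```

For i_size = INT64_MAX and 4K blocks this gives 2^51 blocks, which with SHA-256 fits comfortably within the 8-level limit.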
CONFIG_CRYPTO_SHA256 denotes the generic C implementation of the SHA-256
shash algorithm, which is selected as the default crypto shash provider
for fsverity. However, fsverity has no strict link time dependency, and
the same shash could be exposed by an optimized implementation, and arm64
has a number of those (scalar, NEON-based and one based on special crypto
instructions). In such cases, it makes little sense to require that the
generic C implementation is incorporated as well, given that it will never
be called.
To address this, relax the 'select' clause to 'imply' so that the generic
driver can be omitted from the build if desired.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Steps on the way to 5.12-rc1.
Resolves conflicts in:
fs/overlayfs/inode.c
Note, incfs is broken here, will mark it as BROKEN in another patch to
make it more obvious.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I95a5938fc3cfa27d521c5d4fd18b2adfb13a6d84
Steps on the way to 5.12-rc1.
Resolves merge issues with:
fs/verity/signature.c
include/linux/fsverity.h
Cc: Eric Biggers <ebiggers@google.com>
Cc: Paul Lawrence <paullawrence@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I38ec011e7931f81341afed6cf24de550234b893b
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
=yPaw
-----END PGP SIGNATURE-----
Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts, which have been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.
Idmapped mounts handle a wide range of long-standing use cases. Here
are just a few:
- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.
- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).
- It is possible to idmap a container's rootfs without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.
- It is possible to share files between containers with
non-overlapping idmappings.
- Filesystems that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access control (DAC)
permission checking.
- They allow users to efficiently change ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown(2), changing ownership of large sets of files is
instantaneous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single mount_setattr
syscall will be sufficient to change the ownership of
all files.
- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files, which is common in the aforementioned portable home
directory, container, and VM scenarios.
- Idmapped mounts allow ownership to be changed locally, restricting
it to specific mounts, and temporarily, as the ownership changes only
apply as long as the mount exists.
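To make the idea of an idmapping concrete, the sketch below models one as a list of ranges in the style of /proc/<pid>/uid_map lines (first id, first mapped id, count). This is a simplified, hypothetical model (struct id_extent and map_id() are invented names, and the direction of the mapping is abstracted away), not the kernel's struct uid_gid_map; it only shows how the same stored id can be presented as a different id through a mount without changing what is on disk.

```c
#include <stddef.h>
#include <stdint.h>

/* One range of an idmapping: ids [first, first + count) map to
 * [lower_first, lower_first + count). Illustrative model only. */
struct id_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

#define INVALID_ID ((uint32_t)-1)

/* Map an id through the ranges; ids covered by no range are unmapped. */
static uint32_t map_id(const struct id_extent *ext, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++) {
		if (id >= ext[i].first && id - ext[i].first < ext[i].count)
			return ext[i].lower_first + (id - ext[i].first);
	}
	return INVALID_ID;
}
```

With a single range {1000, 2000, 1}, a file owned by uid 1000 on disk would be presented as owned by uid 2000 through that mount, while other mounts of the same filesystem keep showing uid 1000.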
Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:
- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.
https://systemd.io/HOME_DIRECTORY/
- container runtimes: containerd, runC, LXD: to share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime, has been up for quite
a while now: https://github.com/containerd/containerd/pull/4734
- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.
- ChromeOS: Sharing host-directories with unprivileged containers.
I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this, there are a bunch of
talks about this too. Here are just two recent ones:
https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
https://fosdem.org/2021/schedule/event/containers_idmap/
This comes with an extensive xfstests suite covering both ext4 and
xfs:
https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.
In order to support per-mount idmappings, vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.
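As a rough illustration of what "mapping the ids" means, the sketch below models an idmapping as a list of uid_map-style extents (first id, target id, count). This is a simplified model, not the kernel's implementation; struct id_extent, map_id() and ID_UNMAPPED are hypothetical names. The initial user namespace corresponds to the identity extent "0 0 4294967295", which is why a mount marked with it leaves every id unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one idmapping extent, in the spirit of a
 * uid_map line "first lower_first count". Not a kernel structure. */
struct id_extent {
    uint32_t first;       /* first id of the source range */
    uint32_t lower_first; /* id that `first` is mapped to */
    uint32_t count;       /* number of consecutive ids covered */
};

/* (uint32_t)-1 is never a valid id, so use it to signal "unmapped". */
#define ID_UNMAPPED UINT32_MAX

/* Map `id` through the extents; return ID_UNMAPPED if none covers it. */
uint32_t map_id(const struct id_extent *ext, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        /* `id - first < count` avoids overflowing `first + count`. */
        if (id >= ext[i].first && id - ext[i].first < ext[i].count)
            return ext[i].lower_first + (id - ext[i].first);
    }
    return ID_UNMAPPED;
}
```

With the single extent {1000, 1001, 1}, id 1000 becomes 1001 while every other id is unmapped, which is the kind of translation visible in the ls output of the example later in this message; with the identity extent {0, 0, UINT32_MAX}, every valid id maps to itself.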
Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to set up idmapped mounts, the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.
The user namespace the mount will be marked with can be specified by
passing a file descriptor referring to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.
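A minimal sketch of that call, assuming Linux 5.12+ uapi headers (<linux/mount.h> provides struct mount_attr and MOUNT_ATTR_IDMAP) and invoking the syscall through syscall(2) instead of a libc wrapper. idmap_attr() and set_idmap() are hypothetical helper names. The trailing size argument is what provides the openat2()-style extensibility: it tells the kernel which version of struct mount_attr the caller was built against.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_EMPTY_PATH */
#include <linux/mount.h>  /* struct mount_attr, MOUNT_ATTR_IDMAP */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>  /* SYS_mount_setattr */
#include <unistd.h>

/* Build an attribute request asking for an idmapped mount. userns_fd is
 * assumed to refer to a user namespace, e.g. /proc/<pid>/ns/user. */
struct mount_attr idmap_attr(int userns_fd)
{
    struct mount_attr attr;
    memset(&attr, 0, sizeof(attr)); /* unused fields must be zero */
    attr.attr_set = MOUNT_ATTR_IDMAP;
    attr.userns_fd = (uint64_t)userns_fd;
    return attr;
}

/* Apply the request to the mount referred to by mnt_fd itself
 * (empty path + AT_EMPTY_PATH). Returns 0 or -1 with errno set. */
int set_idmap(int mnt_fd, int userns_fd)
{
    struct mount_attr attr = idmap_attr(userns_fd);
    return (int)syscall(SYS_mount_setattr, mnt_fd, "", AT_EMPTY_PATH,
                        &attr, sizeof(attr));
}
```

In practice mnt_fd would be a detached mount obtained from open_tree() with OPEN_TREE_CLONE, as required by the conditions that follow.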
The following conditions must be met in order to create an idmapped
mount:
- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.
- The underlying filesystem must support idmapped mounts.
- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.
- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.
The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.
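Taken together, these conditions imply a clone, idmap, attach sequence. The sketch below makes the same assumptions as above (Linux 5.12+ uapi headers, raw syscall(2) calls) and additionally needs a caller with CAP_SYS_ADMIN and a filesystem that supports idmapped mounts. make_idmapped_mount() is a hypothetical helper, not code from this series or from the mount-idmapped tool.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_FDCWD, AT_EMPTY_PATH, O_CLOEXEC */
#include <linux/mount.h>  /* OPEN_TREE_CLONE, MOUNT_ATTR_IDMAP, MOVE_MOUNT_F_EMPTY_PATH */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create an idmapped bind mount of `src` at `dst`, using the user
 * namespace behind `userns_path` (e.g. "/proc/<pid>/ns/user").
 * Returns 0 on success, -1 on failure with errno from the failing call. */
int make_idmapped_mount(const char *src, const char *dst,
                        const char *userns_path)
{
    struct mount_attr attr;
    int mnt_fd, userns_fd = -1, ret = -1;

    /* 1. Detached copy of the mount at src: not yet visible anywhere. */
    mnt_fd = (int)syscall(SYS_open_tree, AT_FDCWD, src,
                          OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
    if (mnt_fd < 0)
        return -1;

    userns_fd = open(userns_path, O_RDONLY | O_CLOEXEC);
    if (userns_fd < 0)
        goto out;

    /* 2. Idmap it while it is still detached; this works only once. */
    memset(&attr, 0, sizeof(attr));
    attr.attr_set = MOUNT_ATTR_IDMAP;
    attr.userns_fd = (uint64_t)userns_fd;
    if (syscall(SYS_mount_setattr, mnt_fd, "", AT_EMPTY_PATH,
                &attr, sizeof(attr)) < 0)
        goto out;

    /* 3. Attach the now idmapped mount at dst. */
    if (syscall(SYS_move_mount, mnt_fd, "", AT_FDCWD, dst,
                MOVE_MOUNT_F_EMPTY_PATH) < 0)
        goto out;
    ret = 0;
out:
    if (userns_fd >= 0)
        close(userns_fd);
    close(mnt_fd);
    return ret;
}
```

Idmapping before move_mount() is what keeps the mount from ever being visible with its original ownership, which matches the "must not already have been visible" requirement.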
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.
The manpage with a detailed description can be found here:
1d7b902e28
In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystems are not very large or
complicated overall, as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively being worked on and
will be sent out separately. The xfstests suite can be used to verify
that a port has been done correctly.
The mount_setattr() syscall is motivated independently of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.
Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.
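To illustrate fd-only mounting combined with the openat2() lookup restrictions, the sketch below resolves a path with RESOLVE_NO_SYMLINKS | RESOLVE_BENEATH into an O_PATH fd and then clones the mount using only that fd. It assumes recent uapi headers (<linux/openat2.h>, <linux/mount.h>) and raw syscall(2) calls; clone_tree_no_symlinks() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <fcntl.h>          /* O_PATH, O_CLOEXEC, AT_EMPTY_PATH */
#include <linux/mount.h>    /* OPEN_TREE_CLONE, OPEN_TREE_CLOEXEC */
#include <linux/openat2.h>  /* struct open_how, RESOLVE_* */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Resolve `path` beneath `root_fd` without following any symlinks, then
 * clone the mount found there into a detached mount fd (or return -1). */
int clone_tree_no_symlinks(int root_fd, const char *path)
{
    struct open_how how;
    memset(&how, 0, sizeof(how));
    how.flags = O_PATH | O_CLOEXEC;
    how.resolve = RESOLVE_NO_SYMLINKS | RESOLVE_BENEATH;

    int path_fd = (int)syscall(SYS_openat2, root_fd, path, &how, sizeof(how));
    if (path_fd < 0)
        return -1;

    /* From here on only the fd is used, so the path cannot be swapped. */
    int mnt_fd = (int)syscall(SYS_open_tree, path_fd, "",
                              AT_EMPTY_PATH | OPEN_TREE_CLONE |
                              OPEN_TREE_CLOEXEC);
    close(path_fd);
    return mnt_fd;
}
```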
While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.
With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.
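A sketch of such a recursive change under the same header assumptions: one mount_setattr() call with AT_RECURSIVE sets MOUNT_ATTR_RDONLY and MOUNT_ATTR_NOEXEC and clears MOUNT_ATTR_NOSUID on the mount at `path` and every mount beneath it. The helper names are hypothetical, and the AT_RECURSIVE fallback mirrors the value in <linux/fcntl.h> for libcs that do not export it.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_FDCWD */
#include <linux/mount.h>  /* struct mount_attr, MOUNT_ATTR_* */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000 /* value from <linux/fcntl.h> */
#endif

/* Add and remove mount options in a single request. */
struct mount_attr ro_noexec_allow_suid(void)
{
    struct mount_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.attr_set = MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOEXEC;
    attr.attr_clr = MOUNT_ATTR_NOSUID;
    return attr;
}

/* Apply it to the whole mount tree rooted at `path`. */
int apply_recursive(const char *path)
{
    struct mount_attr attr = ro_noexec_allow_suid();
    return (int)syscall(SYS_mount_setattr, AT_FDCWD, path, AT_RECURSIVE,
                        &attr, sizeof(attr));
}
```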
There is a simple tool available at
https://github.com/brauner/mount-idmapped
that can create idmapped mounts so people can play with this
patch series. Should you decide to pull this, I'll add support for the
regular mount binary in the following weeks.
Here's an example of a simple idmapped mount of another user's home
directory:
u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"
* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...
Allow a file system to provide its own fs-verity implementation
while still hooking into the signature check and control file from
fs-verity.
Bug: 160634504
Bug: 170978993
Test: incfs_test running on this + subsequent changes
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I02020af460d62fa5eb459a083419208e175005e8
Add support for FS_VERITY_METADATA_TYPE_SIGNATURE to
FS_IOC_READ_VERITY_METADATA. This allows a userspace server program to
retrieve the built-in signature (if present) of a verity file for
serving to a client which implements fs-verity compatible verification.
See the patch which introduced FS_IOC_READ_VERITY_METADATA for more
details.
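A minimal userspace sketch of reading the signature, assuming Linux 5.12+ uapi headers (<linux/fsverity.h> with struct fsverity_read_metadata_arg). read_verity_metadata() is a hypothetical helper; it reads in a loop because the ioctl may return fewer bytes than requested and returns 0 at the end of the metadata item. For FS_VERITY_METADATA_TYPE_SIGNATURE, ENODATA means the file has no built-in signature (or is not a verity file).

```c
#include <errno.h>
#include <linux/fsverity.h> /* FS_IOC_READ_VERITY_METADATA, FS_VERITY_METADATA_TYPE_* */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Read one whole metadata item of `type` from an open verity file.
 * Returns a malloc()ed buffer and its length in *out_len, or NULL with
 * errno set on failure. */
void *read_verity_metadata(int fd, uint64_t type, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    uint8_t *buf = malloc(cap);
    if (!buf)
        return NULL;

    for (;;) {
        if (len == cap) { /* buffer full: grow it */
            uint8_t *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        struct fsverity_read_metadata_arg arg;
        memset(&arg, 0, sizeof(arg)); /* __reserved must be zero */
        arg.metadata_type = type;
        arg.offset = len;             /* continue after the last chunk */
        arg.length = cap - len;
        arg.buf_ptr = (uint64_t)(uintptr_t)(buf + len);

        int n = ioctl(fd, FS_IOC_READ_VERITY_METADATA, &arg);
        if (n < 0) {
            int saved = errno;
            free(buf);
            errno = saved;
            return NULL;
        }
        if (n == 0) /* end of the metadata item */
            break;
        len += (size_t)n;
    }
    *out_len = len;
    return buf;
}
```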
The ability for userspace to read the built-in signatures is also useful
because it allows a system that is using the in-kernel signature
verification to migrate to userspace signature verification.
This has been tested using a new xfstest which calls this ioctl via a
new subcommand for the 'fsverity' program from fsverity-utils.
Link: https://lore.kernel.org/r/20210115181819.34732-7-ebiggers@kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add support for FS_VERITY_METADATA_TYPE_DESCRIPTOR to
FS_IOC_READ_VERITY_METADATA. This allows a userspace server program to
retrieve the fs-verity descriptor of a file for serving to a client
which implements fs-verity compatible verification. See the patch which
introduced FS_IOC_READ_VERITY_METADATA for more details.
"fs-verity descriptor" here means only the part that userspace cares
about because it is hashed to produce the file digest. It doesn't
include the signature which ext4 and f2fs append to the
fsverity_descriptor struct when storing it on-disk, since that way of
storing the signature is an implementation detail. The next patch adds
a separate metadata_type value for retrieving the signature separately.
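Per the kernel's fs-verity documentation, the item returned for FS_VERITY_METADATA_TYPE_DESCRIPTOR is the 256-byte struct fsverity_descriptor with sig_size set to 0. Its little-endian fields are version, hash_algorithm, log_blocksize, salt_size, sig_size, data_size, root_hash[64], salt[32], and 144 reserved bytes. The sketch below decodes that layout; parse_verity_descriptor() and struct verity_desc_info are hypothetical names. Hashing the exact 256 bytes with the file's hash algorithm is what yields the file digest.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FSVERITY_DESC_SIZE 256

/* Decoded view of the descriptor (hypothetical, not a uapi struct). */
struct verity_desc_info {
    uint8_t version;        /* must be 1 */
    uint8_t hash_algorithm; /* 1 = SHA-256, 2 = SHA-512 */
    uint8_t log_blocksize;  /* e.g. 12 for 4096-byte Merkle tree blocks */
    uint8_t salt_size;
    uint32_t sig_size;      /* 0 in the copy returned by the ioctl */
    uint64_t data_size;     /* size of the file contents in bytes */
    uint8_t root_hash[64];
    uint8_t salt[32];
};

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_le64(const uint8_t *p)
{
    return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/* Returns 0 on success, -1 if the buffer is short or malformed. */
int parse_verity_descriptor(const uint8_t *buf, size_t len,
                            struct verity_desc_info *out)
{
    if (len < FSVERITY_DESC_SIZE)
        return -1;
    out->version = buf[0];
    out->hash_algorithm = buf[1];
    out->log_blocksize = buf[2];
    out->salt_size = buf[3];
    out->sig_size = get_le32(buf + 4);
    out->data_size = get_le64(buf + 8);
    memcpy(out->root_hash, buf + 16, 64);
    memcpy(out->salt, buf + 80, 32);   /* bytes 112..255 are reserved */
    if (out->version != 1 || out->salt_size > 32)
        return -1;
    return 0;
}
```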
This has been tested using a new xfstest which calls this ioctl via a
new subcommand for the 'fsverity' program from fsverity-utils.
Link: https://lore.kernel.org/r/20210115181819.34732-6-ebiggers@kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>