* refs/heads/tmp-1aff922:
Revert "net: xfrm: Localize sequence counter per network namespace"
ANDROID: Kbuild: Add support for KBUILD_MIXED_TREE
ANDROID: build.config: Add vmlinux.symvers and modules.builtin to DIST_DIR
ANDROID: abi_gki_aarch64_qcom: Add thermal zone enable and unregister
FROMGIT: usb: typec: tcpm: Honour pSnkStdby requirement during negotiation
FROMGIT: dm verity fec: fix misaligned RS roots IO
Revert "Revert "dm verity: fix FEC for RS roots unaligned to blo..."
Revert "Revert "dm bufio: subtract the number of initial sectors..."
ANDROID: smp: fix preprocessor conditional warning
ANDROID: mm/memory_hotplug: fix minor printk format warnings
ANDROID: power_supply: inline empty power_supply_get_by_phandle_array()
FROMGIT: usb: dwc3: core: Add shutdown callback for dwc3
FROMGIT: usb: dwc3: gadget: Ignore Packet Pending bit
Linux 5.10.30
Revert "net: sched: bump refcount for new action in ACT replace mode"
net: ieee802154: stop dump llsec params for monitors
net: ieee802154: forbid monitor for del llsec seclevel
net: ieee802154: forbid monitor for set llsec params
net: ieee802154: fix nl802154 del llsec devkey
net: ieee802154: fix nl802154 add llsec key
net: ieee802154: fix nl802154 del llsec dev
net: ieee802154: fix nl802154 del llsec key
net: ieee802154: nl-mac: fix check on panid
net: mac802154: Fix general protection fault
drivers: net: fix memory leak in peak_usb_create_dev
drivers: net: fix memory leak in atusb_probe
net: tun: set tun->dev->addr_len during TUNSETLINK processing
cfg80211: remove WARN_ON() in cfg80211_sme_connect
gpiolib: Read "gpio-line-names" from a firmware node
net: sched: bump refcount for new action in ACT replace mode
dt-bindings: net: ethernet-controller: fix typo in NVMEM
lockdep: Address clang -Wformat warning printing for %hd
clk: socfpga: fix iomem pointer cast on 64-bit
RAS/CEC: Correct ce_add_elem()'s returned values
vdpa/mlx5: Fix wrong use of bit numbers
vdpa/mlx5: should exclude header length and fcs from mtu
RDMA/addr: Be strict with gid size
i40e: Fix parameters in aq_get_phy_register()
drm/vc4: crtc: Reduce PV fifo threshold on hvs4
RDMA/qedr: Fix kernel panic when trying to access recv_cq
perf report: Fix wrong LBR block sorting
RDMA/cxgb4: check for ipv6 address properly while destroying listener
net/mlx5: Fix PBMC register mapping
net/mlx5: Fix PPLM register mapping
net/mlx5: Fix placement of log_max_flow_counter
net: hns3: clear VF down state bit before request link status
tipc: increment the tmp aead refcnt before attaching it
can: mcp251x: fix support for half duplex SPI host controllers
iwlwifi: fix 11ax disabled bit in the regulatory capability flags
i2c: designware: Adjust bus_freq_hz when refuse high speed mode set
openvswitch: fix send of uninitialized stack memory in ct limit reply
net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
perf inject: Fix repipe usage
s390/cpcmd: fix inline assembly register clobbering
workqueue: Move the position of debug_work_activate() in __queue_work()
clk: fix invalid usage of list cursor in unregister
clk: fix invalid usage of list cursor in register
net: macb: restore cmp registers on resume path
net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb
scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs
scsi: ufs: core: Fix task management request completion timeout
mptcp: forbit mcast-related sockopt on MPTCP sockets
net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);
drm/msm: Set drvdata to NULL when msm_drm_init() fails
RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files
i40e: Fix display statistics for veb_tc
soc/fsl: qbman: fix conflicting alignment attributes
xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model
net/rds: Fix a use after free in rds_message_map_pages
net/mlx5: Don't request more than supported EQs
net/mlx5e: Fix ethtool indication of connector type
net/mlx5e: Fix mapping of ct_label zero
ASoC: sunxi: sun4i-codec: fill ASoC card owner
I2C: JZ4780: Fix bug for Ingenic X1000.
net: phy: broadcom: Only advertise EEE for supported modes
nfp: flower: ignore duplicate merge hints from FW
net: qrtr: Fix memory leak on qrtr_tx_wait failure
net/ncsi: Avoid channel_monitor hrtimer deadlock
ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces
net:tipc: Fix a double free in tipc_sk_mcast_rcv
cxgb4: avoid collecting SGE_QBASE regs during traffic
net: dsa: Fix type was not set for devlink port
gianfar: Handle error code at MAC address change
ethernet: myri10ge: Fix a use after free in myri10ge_sw_tso
mlxsw: spectrum: Fix ECN marking in tunnel decapsulation
can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE
can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZE
xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets
arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0
drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit
sch_red: fix off-by-one checks in red_check_params()
geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply
vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
amd-xgbe: Update DMA coherency values
hostfs: fix memory handling in follow_link()
i40e: Fix kernel oops when i40e driver removes VF's
i40e: Added Asym_Pause to supported link modes
virtchnl: Fix layout of RSS structures
xfrm: Fix NULL pointer dereference on policy lookup
ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
ASoC: SOF: Intel: HDA: fix core status verification
esp: delete NETIF_F_SCTP_CRC bit from features for esp offload
net: xfrm: Localize sequence counter per network namespace
ARM: OMAP4: PM: update ROM return address for OSWR and OFF
ARM: OMAP4: Fix PMIC voltage domains for bionic
regulator: bd9571mwv: Fix AVS and DVFS voltage range
remoteproc: qcom: pil_info: avoid 64-bit division
xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume
xfrm: interface: fix ipv4 pmtu check to honor ip header df
ice: Recognize 860 as iSCSI port in CEE mode
ice: Refactor DCB related variables out of the ice_port_info struct
net: sched: fix err handler in tcf_action_init()
KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp
KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed
KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter
KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn
KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched
KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched
i2c: turn recovery error on init to debug
percpu: make pcpu_nr_empty_pop_pages per chunk type
scsi: target: iscsi: Fix zero tag inside a trace event
scsi: pm80xx: Fix chip initialization failure
driver core: Fix locking bug in deferred_probe_timeout_work_func()
usbip: synchronize event handler with sysfs code paths
usbip: vudc synchronize sysfs code paths
usbip: stub-dev synchronize sysfs code paths
usbip: add sysfs_lock to synchronize sysfs code paths
thunderbolt: Fix off by one in tb_port_find_retimer()
thunderbolt: Fix a leak in tb_retimer_add()
net: let skb_orphan_partial wake-up waiters.
net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()
net: hsr: Reset MAC header for Tx path
mac80211: fix TXQ AC confusion
mac80211: fix time-is-after bug in mlme
cfg80211: check S1G beacon compat element length
nl80211: fix potential leak of ACL params
nl80211: fix beacon head validation
net: sched: fix action overwrite reference counting
net: sched: sch_teql: fix null-pointer dereference
vdpa/mlx5: Fix suspend/resume index restoration
i40e: Fix sparse errors in i40e_txrx.c
i40e: Fix sparse error: uninitialized symbol 'ring'
i40e: Fix sparse error: 'vsi->netdev' could be null
i40e: Fix sparse warning: missing error code 'err'
net: ensure mac header is set in virtio_net_hdr_to_skb()
bpf, sockmap: Fix incorrect fwd_alloc accounting
bpf, sockmap: Fix sk->prot unhash op reset
bpf: Refcount task stack in bpf_get_task_stack
libbpf: Only create rx and tx XDP rings when necessary
libbpf: Restore umem state after socket create failure
libbpf: Ensure umem pointer is non-NULL before dereferencing
ethernet/netronome/nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
bpf: link: Refuse non-O_RDWR flags in BPF_OBJ_GET
bpf: Enforce that struct_ops programs be GPL-only
libbpf: Fix bail out from 'ringbuf_process_ring()' on error
net: hso: fix null-ptr-deref during tty device unregistration
ice: fix memory leak of aRFS after resuming from suspend
iwlwifi: pcie: properly set LTR workarounds on 22000 devices
ice: Cleanup fltr list in case of allocation issues
ice: Use port number instead of PF ID for WoL
ice: Fix for dereference of NULL pointer
ice: remove DCBNL_DEVRESET bit from PF state
ice: fix memory allocation call
ice: prevent ice_open and ice_stop during reset
ice: Increase control queue timeout
ice: Continue probe on link/PHY errors
batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers
parisc: parisc-agp requires SBA IOMMU driver
of: property: fw_devlink: do not link ".*,nr-gpios"
ethtool: fix incorrect datatype in set_eee ops
fs: direct-io: fix missing sdio->boundary
ocfs2: fix deadlock between setattr and dio_end_io_write
nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
ia64: fix user_stack_pointer() for ptrace()
gcov: re-fix clang-11+ support
LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m
drm/i915: Fix invalid access to ACPI _DSM objects
net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
net: dsa: lantiq_gswip: Don't use PHY auto polling
net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock
net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
xen/evtchn: Change irq_info lock to raw_spinlock_t
selinux: fix race between old and new sidtab
selinux: fix cond_list corruption when changing booleans
selinux: make nslot handling in avtab more robust
nfc: Avoid endless loops caused by repeated llcp_sock_connect()
nfc: fix memory leak in llcp_sock_connect()
nfc: fix refcount leak in llcp_sock_connect()
nfc: fix refcount leak in llcp_sock_bind()
ASoC: intel: atom: Stop advertising non working S24LE support
ALSA: hda/conexant: Apply quirk for another HP ZBook G5 model
ALSA: hda/realtek: Fix speaker amp setup on Acer Aspire E1
ALSA: aloop: Fix initialization of controls
xfrm/compat: Cleanup WARN()s that can be user-triggered
ANDROID: usb: typec: tcpm: Update tcpm_update_sink_capabilities
ANDROID: GKI: Update the ABI xml
ANDROID: GKI: Add generic aarch64 symbol list
ANDROID: usb: host: Use old init scheme when hook unavailable
Revert "dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size"
Revert "dm verity: fix FEC for RS roots unaligned to block size"
Revert "ANDROID: AVB error handler to invalidate vbmeta partition."
ANDROID: gki_defconfig: reduce KFENCE pool size
FROMGIT: virt_wifi: Return micros for BSS TSF values
ANDROID: stacktrace: export stack_trace_save_tsk/regs
ANDROID: arm64: declare system_32bit_el0_cpumask as export
ANDROID: Fix compilation warning in __iommu_map_pages()
ANDROID: iommu/io-pgtable-arm: Fix arguments for __arm_lpae_map()
ANDROID: GKI: Bump KMI_GENERATION, ABI representation
ANDROID: GKI: Update virtual device symbol list
ANDROID: usb: host: free the offload TR by vendor hook
Conflicts:
Documentation/devicetree/bindings
Documentation/devicetree/bindings/net/ethernet-controller.yaml
net/qrtr/qrtr.c
Change-Id: I2cd033199ac0993bd0f793aeedee16a2ccbb5245
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
Changes in 5.10.30
xfrm/compat: Cleanup WARN()s that can be user-triggered
ALSA: aloop: Fix initialization of controls
ALSA: hda/realtek: Fix speaker amp setup on Acer Aspire E1
ALSA: hda/conexant: Apply quirk for another HP ZBook G5 model
ASoC: intel: atom: Stop advertising non working S24LE support
nfc: fix refcount leak in llcp_sock_bind()
nfc: fix refcount leak in llcp_sock_connect()
nfc: fix memory leak in llcp_sock_connect()
nfc: Avoid endless loops caused by repeated llcp_sock_connect()
selinux: make nslot handling in avtab more robust
selinux: fix cond_list corruption when changing booleans
selinux: fix race between old and new sidtab
xen/evtchn: Change irq_info lock to raw_spinlock_t
net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock
net: dsa: lantiq_gswip: Don't use PHY auto polling
net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
drm/i915: Fix invalid access to ACPI _DSM objects
ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m
IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
gcov: re-fix clang-11+ support
ia64: fix user_stack_pointer() for ptrace()
nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
ocfs2: fix deadlock between setattr and dio_end_io_write
fs: direct-io: fix missing sdio->boundary
ethtool: fix incorrect datatype in set_eee ops
of: property: fw_devlink: do not link ".*,nr-gpios"
parisc: parisc-agp requires SBA IOMMU driver
parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers
ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
ice: Continue probe on link/PHY errors
ice: Increase control queue timeout
ice: prevent ice_open and ice_stop during reset
ice: fix memory allocation call
ice: remove DCBNL_DEVRESET bit from PF state
ice: Fix for dereference of NULL pointer
ice: Use port number instead of PF ID for WoL
ice: Cleanup fltr list in case of allocation issues
iwlwifi: pcie: properly set LTR workarounds on 22000 devices
ice: fix memory leak of aRFS after resuming from suspend
net: hso: fix null-ptr-deref during tty device unregistration
libbpf: Fix bail out from 'ringbuf_process_ring()' on error
bpf: Enforce that struct_ops programs be GPL-only
bpf: link: Refuse non-O_RDWR flags in BPF_OBJ_GET
ethernet/netronome/nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
libbpf: Ensure umem pointer is non-NULL before dereferencing
libbpf: Restore umem state after socket create failure
libbpf: Only create rx and tx XDP rings when necessary
bpf: Refcount task stack in bpf_get_task_stack
bpf, sockmap: Fix sk->prot unhash op reset
bpf, sockmap: Fix incorrect fwd_alloc accounting
net: ensure mac header is set in virtio_net_hdr_to_skb()
i40e: Fix sparse warning: missing error code 'err'
i40e: Fix sparse error: 'vsi->netdev' could be null
i40e: Fix sparse error: uninitialized symbol 'ring'
i40e: Fix sparse errors in i40e_txrx.c
vdpa/mlx5: Fix suspend/resume index restoration
net: sched: sch_teql: fix null-pointer dereference
net: sched: fix action overwrite reference counting
nl80211: fix beacon head validation
nl80211: fix potential leak of ACL params
cfg80211: check S1G beacon compat element length
mac80211: fix time-is-after bug in mlme
mac80211: fix TXQ AC confusion
net: hsr: Reset MAC header for Tx path
net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()
net: let skb_orphan_partial wake-up waiters.
thunderbolt: Fix a leak in tb_retimer_add()
thunderbolt: Fix off by one in tb_port_find_retimer()
usbip: add sysfs_lock to synchronize sysfs code paths
usbip: stub-dev synchronize sysfs code paths
usbip: vudc synchronize sysfs code paths
usbip: synchronize event handler with sysfs code paths
driver core: Fix locking bug in deferred_probe_timeout_work_func()
scsi: pm80xx: Fix chip initialization failure
scsi: target: iscsi: Fix zero tag inside a trace event
percpu: make pcpu_nr_empty_pop_pages per chunk type
i2c: turn recovery error on init to debug
KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched
KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched
KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn
KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter
KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed
KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp
net: sched: fix err handler in tcf_action_init()
ice: Refactor DCB related variables out of the ice_port_info struct
ice: Recognize 860 as iSCSI port in CEE mode
xfrm: interface: fix ipv4 pmtu check to honor ip header df
xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume
remoteproc: qcom: pil_info: avoid 64-bit division
regulator: bd9571mwv: Fix AVS and DVFS voltage range
ARM: OMAP4: Fix PMIC voltage domains for bionic
ARM: OMAP4: PM: update ROM return address for OSWR and OFF
net: xfrm: Localize sequence counter per network namespace
esp: delete NETIF_F_SCTP_CRC bit from features for esp offload
ASoC: SOF: Intel: HDA: fix core status verification
ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
xfrm: Fix NULL pointer dereference on policy lookup
virtchnl: Fix layout of RSS structures
i40e: Added Asym_Pause to supported link modes
i40e: Fix kernel oops when i40e driver removes VF's
hostfs: fix memory handling in follow_link()
amd-xgbe: Update DMA coherency values
vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply
sch_red: fix off-by-one checks in red_check_params()
drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit
arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0
xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets
can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZE
can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE
mlxsw: spectrum: Fix ECN marking in tunnel decapsulation
ethernet: myri10ge: Fix a use after free in myri10ge_sw_tso
gianfar: Handle error code at MAC address change
net: dsa: Fix type was not set for devlink port
cxgb4: avoid collecting SGE_QBASE regs during traffic
net:tipc: Fix a double free in tipc_sk_mcast_rcv
ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces
net/ncsi: Avoid channel_monitor hrtimer deadlock
net: qrtr: Fix memory leak on qrtr_tx_wait failure
nfp: flower: ignore duplicate merge hints from FW
net: phy: broadcom: Only advertise EEE for supported modes
I2C: JZ4780: Fix bug for Ingenic X1000.
ASoC: sunxi: sun4i-codec: fill ASoC card owner
net/mlx5e: Fix mapping of ct_label zero
net/mlx5e: Fix ethtool indication of connector type
net/mlx5: Don't request more than supported EQs
net/rds: Fix a use after free in rds_message_map_pages
xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model
soc/fsl: qbman: fix conflicting alignment attributes
i40e: Fix display statistics for veb_tc
RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files
drm/msm: Set drvdata to NULL when msm_drm_init() fails
net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);
mptcp: forbit mcast-related sockopt on MPTCP sockets
scsi: ufs: core: Fix task management request completion timeout
scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs
net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb
net: macb: restore cmp registers on resume path
clk: fix invalid usage of list cursor in register
clk: fix invalid usage of list cursor in unregister
workqueue: Move the position of debug_work_activate() in __queue_work()
s390/cpcmd: fix inline assembly register clobbering
perf inject: Fix repipe usage
net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
openvswitch: fix send of uninitialized stack memory in ct limit reply
i2c: designware: Adjust bus_freq_hz when refuse high speed mode set
iwlwifi: fix 11ax disabled bit in the regulatory capability flags
can: mcp251x: fix support for half duplex SPI host controllers
tipc: increment the tmp aead refcnt before attaching it
net: hns3: clear VF down state bit before request link status
net/mlx5: Fix placement of log_max_flow_counter
net/mlx5: Fix PPLM register mapping
net/mlx5: Fix PBMC register mapping
RDMA/cxgb4: check for ipv6 address properly while destroying listener
perf report: Fix wrong LBR block sorting
RDMA/qedr: Fix kernel panic when trying to access recv_cq
drm/vc4: crtc: Reduce PV fifo threshold on hvs4
i40e: Fix parameters in aq_get_phy_register()
RDMA/addr: Be strict with gid size
vdpa/mlx5: should exclude header length and fcs from mtu
vdpa/mlx5: Fix wrong use of bit numbers
RAS/CEC: Correct ce_add_elem()'s returned values
clk: socfpga: fix iomem pointer cast on 64-bit
lockdep: Address clang -Wformat warning printing for %hd
dt-bindings: net: ethernet-controller: fix typo in NVMEM
net: sched: bump refcount for new action in ACT replace mode
gpiolib: Read "gpio-line-names" from a firmware node
cfg80211: remove WARN_ON() in cfg80211_sme_connect
net: tun: set tun->dev->addr_len during TUNSETLINK processing
drivers: net: fix memory leak in atusb_probe
drivers: net: fix memory leak in peak_usb_create_dev
net: mac802154: Fix general protection fault
net: ieee802154: nl-mac: fix check on panid
net: ieee802154: fix nl802154 del llsec key
net: ieee802154: fix nl802154 del llsec dev
net: ieee802154: fix nl802154 add llsec key
net: ieee802154: fix nl802154 del llsec devkey
net: ieee802154: forbid monitor for set llsec params
net: ieee802154: forbid monitor for del llsec seclevel
net: ieee802154: stop dump llsec params for monitors
Revert "net: sched: bump refcount for new action in ACT replace mode"
Linux 5.10.30
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie8754a2e4dfef03bf1f2b878843cde19a4adab21
Use the correct printk length specifier [%llx] for u64 variables.
This fixes several warnings of the following type:
mm/memory_hotplug.c: In function ‘add_memory_subsection’:
./include/linux/kern_levels.h:5:18: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘u64’ {aka ‘long long unsigned int’} [-Wformat=]
mm/memory_hotplug.c:1144:25: note: format string is defined here
    pr_err("%s: start 0x%lx size 0x%lx not aligned to subsection size\n",
                        ~~^
                        %llx
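As a rough userspace illustration of the same fix (not the actual kernel change), the sketch below formats two 64-bit values with %llx. The helper name format_range() and the local u64 typedef are assumptions made for this example; the explicit casts are only needed in userspace, where uint64_t may be unsigned long.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's u64 (assumption for this sketch). */
typedef uint64_t u64;

/*
 * Hypothetical helper: format a start/size pair using %llx, the length
 * specifier that matches a 64-bit "unsigned long long" argument.
 * Returns the number of characters snprintf() would have written.
 */
static int format_range(char *buf, size_t len, u64 start, u64 size)
{
	/* Using %lx here is what triggers -Wformat where long is 32 bits wide. */
	return snprintf(buf, len, "start 0x%llx size 0x%llx",
			(unsigned long long)start,
			(unsigned long long)size);
}
```

Calling format_range(buf, sizeof(buf), 0x100000000ULL, 0x200000ULL) fills buf with "start 0x100000000 size 0x200000".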
Bug: 183339614
Fixes: 417ac617ea5e (ANDROID: mm/memory_hotplug: implement {add/remove}_memory_subsection)
Reported-by: kernelci.org bot <bot@kernelci.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Change-Id: Iee89be07cb40513661e336dd0671f65b1161e830
commit 0760fa3d8f7fceeea508b98899f1c826e10ffe78 upstream.
nr_empty_pop_pages is used to guarantee that there are some free
populated pages to satisfy atomic allocations. Accounted and
non-accounted allocations are using separate sets of chunks,
so both need to have a surplus of empty pages.
This commit makes pcpu_nr_empty_pop_pages and the corresponding logic
per chunk type.
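The idea can be sketched, in heavily simplified form, as one counter per chunk type rather than a single shared counter. Everything named below (chunk_type, nr_empty_pop_pages, EMPTY_POP_PAGES_LOW, and both helpers) is a hypothetical stand-in for illustration and is not the kernel's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's actual definitions. */
enum chunk_type { CHUNK_ROOT, CHUNK_MEMCG, NR_CHUNK_TYPES };

/* Assumed low-water mark of empty populated pages per type. */
#define EMPTY_POP_PAGES_LOW 2

/* One counter per chunk type instead of a single global counter. */
static int nr_empty_pop_pages[NR_CHUNK_TYPES];

/* Add (or, with a negative delta, remove) empty populated pages for one type. */
static void update_empty_pages(enum chunk_type type, int delta)
{
	nr_empty_pop_pages[type] += delta;
}

/*
 * A chunk type needs more populated pages when its own surplus runs low,
 * no matter how many empty pages the other type still has.
 */
static bool needs_populate(enum chunk_type type)
{
	return nr_empty_pop_pages[type] < EMPTY_POP_PAGES_LOW;
}
```

With a single shared counter, a large surplus in the non-accounted chunks could hide a shortage in the accounted ones; keeping the counts separate lets each type be checked on its own.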
[Dennis]
This issue came up as I was reviewing [1] and realized I missed this.
Simultaneously, it was reported btrfs was seeing failed atomic
allocations in fsstress tests [2] and [3].
[1] https://lore.kernel.org/linux-mm/20210324190626.564297-1-guro@fb.com/
[2] https://lore.kernel.org/linux-mm/20210401185158.3275.409509F4@e16-tech.com/
[3] https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
Fixes: 3c7be18ac9a0 ("mm: memcg/percpu: account percpu memory to memory cgroups")
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Roman Gushchin <guro@fb.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-c62f091:
ANDROID: Add GKI_HIDDEN_MM_CONFIGS to support ballooning.
ANDROID: usb: dwc3: gadget: don't cancel the started requests
Linux 5.10.29
init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM
init/Kconfig: make COMPILE_TEST depend on !S390
bpf, x86: Validate computation of branch displacements for x86-32
bpf, x86: Validate computation of branch displacements for x86-64
tools/resolve_btfids: Add /libbpf to .gitignore
kbuild: Do not clean resolve_btfids if the output does not exist
kbuild: Add resolve_btfids clean to root clean target
tools/resolve_btfids: Set srctree variable unconditionally
tools/resolve_btfids: Check objects before removing
tools/resolve_btfids: Build libbpf and libsubcmd in separate directories
math: Export mul_u64_u64_div_u64
io_uring: fix timeout cancel return code
cifs: Silently ignore unknown oplock break handle
cifs: revalidate mapping when we open files for SMB1 POSIX
ia64: fix format strings for err_inject
ia64: mca: allocate early mca with GFP_ATOMIC
selftests/vm: fix out-of-tree build
scsi: target: pscsi: Clean up after failure in pscsi_map_sg()
ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
platform/x86: intel_pmc_core: Ignore GBE LTR on Tiger Lake platforms
block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
x86/build: Turn off -fcf-protection for realmode targets
drm/msm/disp/dpu1: icc path needs to be set before dpu runtime resume
kselftest/arm64: sve: Do not use non-canonical FFR register value
platform/x86: thinkpad_acpi: Allow the FnLock LED to change state
net: ipa: fix init header command validation
netfilter: nftables: skip hook overlap logic if flowtable is stale
netfilter: conntrack: Fix gre tunneling over ipv6
drm/msm: Ratelimit invalid-fence message
drm/msm/adreno: a5xx_power: Don't apply A540 lm_setup to other GPUs
drm/msm/dsi_pll_7nm: Fix variable usage for pll_lockdet_rate
mac80211: choose first enabled channel for monitor
mac80211: Check crypto_aead_encrypt for errors
mISDN: fix crash in fritzpci
kunit: tool: Fix a python tuple typing error
net: pxa168_eth: Fix a potential data race in pxa168_eth_remove
net/mlx5e: Enforce minimum value check for ICOSQ size
bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp
platform/x86: intel-hid: Support Lenovo ThinkPad X1 Tablet Gen 2
bus: ti-sysc: Fix warning on unbind if reset is not deasserted
ARM: dts: am33xx: add aliases for mmc interfaces
FROMGIT: usb: typec: tcpm: update power supply once partner accepts
FROMGIT: usb: typec: tcpm: Address incorrect values of tcpm psy for pps supply
FROMGIT: usb: typec: tcpm: Address incorrect values of tcpm psy for fixed supply
ANDROID: first 4/9/2021 KMI update
ANDROID: Add a new core symbol list
FROMLIST: iommu/arm-smmu: Implement the map_pages() IOMMU driver callback
FROMLIST: iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
FROMLIST: iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
FROMLIST: iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
FROMLIST: iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
FROMLIST: iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
BACKPORT: FROMLIST: iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries
FROMLIST: iommu: Add support for the map_pages() callback
FROMLIST: iommu: Hook up '->unmap_pages' driver callback
FROMLIST: iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts
FROMLIST: iommu: Use bitmap to calculate page size in iommu_pgsize()
BACKPORT: FROMLIST: iommu: Add a map_pages() op for IOMMU drivers
BACKPORT: FROMLIST: iommu/io-pgtable: Introduce map_pages() as a page table op
FROMLIST: iommu: Add an unmap_pages() op for IOMMU drivers
FROMLIST: iommu/io-pgtable: Introduce unmap_pages() as a page table op
Revert "Revert "net: introduce CAN specific pointer in the struct net_device""
Revert "Revert "bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG""
Revert "Revert "bpf: Fix fexit trampoline.""
Revert "ANDROID: GKI: hack to handle genksyms change in sound/soc/soc-core.c"
Revert "Revert "can: dev: Move device back to init netns on owning netns delete""
Revert "Revert "net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M""
Revert "Revert "net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S""
Revert "Revert "net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()""
Revert "Revert "net: phy: introduce phydev->port""
ANDROID: abi_gki_aarch64_qcom: Add __tracepoint_android_rvh_replace_next_task_fair
ANDROID: sched: Update android_rvh_check_preempt_wakeup hook
FROMGIT: scsi: ufs: ufs-debugfs: Add error counters
FROMGIT: scsi: ufs: Refine error history functions
ANDROID: GKI: Add android_rvh_cpu_cgroup_online to qcom symbol list
ANDROID: sched: Add android_rvh_cpu_cgroup_online hook
FROMLIST: userfaultfd/shmem: fix minor fault page leak
FROMLIST: userfaultfd/hugetlbfs: Fix minor fault page leak
BACKPORT: FROMGIT: userfaultfd/selftests: unify error handling
FROMGIT: userfaultfd/selftests: only dump counts if mode enabled
FROMGIT: userfaultfd/selftests: drop VERIFY check in locking_thread
FROMGIT: userfaultfd/selftests: remove the time() check on delayed uffd
FROMGIT: userfaultfd/selftests: use user mode only
FROMGIT: userfaultfd/selftests: exercise minor fault handling shmem support
BACKPORT: FROMGIT: userfaultfd/selftests: reinitialize test context in each test
FROMGIT: userfaultfd/selftests: create alias mappings in the shmem test
FROMGIT: userfaultfd/selftests: use memfd_create for shmem test type
BACKPORT: FROMGIT: userfaultfd: support minor fault handling for shmem
FROMGIT: userfaultfd/selftests: add test exercising minor fault handling
FROMGIT: userfaultfd: update documentation to describe minor fault handling
BACKPORT: FROMGIT: userfaultfd: add UFFDIO_CONTINUE ioctl
BACKPORT: FROMGIT: userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled
FROMGIT: userfaultfd: disable huge PMD sharing for MINOR registered VMAs
BACKPORT: FROMGIT: userfaultfd: add minor fault registration mode
FROMGIT: hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp
FROMGIT: mm/hugetlb: move flush_hugetlb_tlb_range() into hugetlb.h
FROMGIT: mm/hugetlb: fix build with !ARCH_WANT_HUGE_PMD_SHARE
FROMGIT: hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled
BACKPORT: FROMGIT: hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share()
ANDROID: arm64: coresight: Fix a sparse warning
ANDROID: usb: dwc3: export tracepoint for dwc3 read/write
Conflicts:
drivers/iommu/arm/arm-smmu/arm-smmu.c
Change-Id: Id5b65da0d3a7bd2e169e28f227f362c6627048ec
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
* refs/heads/tmp-c64c734:
ANDROID: GKI: hack to handle genksyms change in sound/soc/soc-core.c
Revert "bpf: Fix fexit trampoline."
Revert "bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG"
Revert "net: introduce CAN specific pointer in the struct net_device"
ANDROID: Add vendor hooks to signal.
ANDROID: mm: cma: Add forward definition of cma in vendor hook
ANDROID: arm64/mm: fix minor printk format warning
FROMLIST: gcov: re-fix clang-11+ support
ANDROID: GKI: Add deferred_free to qcom symbol list
ANDROID: android/OWNERS: drop gki-abi-approvers@
ANDROID: GKI: Update abi_gki_aarch64_qcom for VBO support
ANDROID: qcom: Add dev, inet and skb related symbols
FROMGIT: arm64: fix inline asm in load_unaligned_zeropad()
ANDROID: Add Image.lz4 to arm64 GKI outputs
UPSTREAM: drm/drm_vblank: set the dma-fence timestamp during send_vblank_event
UPSTREAM: dma-fence: allow signaling drivers to set fence timestamp
Linux 5.10.28
bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG
Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
riscv: evaluate put_user() arg before enabling user access
drivers: video: fbcon: fix NULL dereference in fbcon_cursor()
driver core: clear deferred probe reason on probe retry
staging: rtl8192e: Change state information from u16 to u8
staging: rtl8192e: Fix incorrect source in memcpy()
soc: qcom-geni-se: Cleanup the code to remove proxy votes
usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable
usb: dwc3: qcom: skip interconnect init for ACPI probe
usb: dwc2: Prevent core suspend when port connection flag is 0
usb: dwc2: Fix HPRT0.PrtSusp bit setting for HiKey 960 board.
usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference
USB: cdc-acm: fix use-after-free after probe failure
USB: cdc-acm: fix double free on probe failure
USB: cdc-acm: downgrade message to debug
USB: cdc-acm: untangle a circular dependency between callback and softint
cdc-acm: fix BREAK rx code path adding necessary calls
usb: xhci-mtk: fix broken streams issue on 0.96 xHCI
usb: musb: Fix suspend with devices connected for a64
USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem
usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control()
firewire: nosy: Fix a use-after-free bug in nosy_ioctl()
video: hyperv_fb: Fix a double free in hvfb_probe
usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield
firmware: stratix10-svc: reset COMMAND_RECONFIG_FLAG_PARTIAL to 0
extcon: Fix error handling in extcon_dev_register
extcon: Add stubs for extcon_register_notifier_all() functions
pinctrl: rockchip: fix restore error in resume
vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends
drm/tegra: sor: Grab runtime PM reference across reset
drm/tegra: dc: Restore coupling of display controllers
drm/imx: fix memory leak when fails to init
reiserfs: update reiserfs_xattrs_initialized() condition
drm/amdgpu: check alignment on CPU page for bo map
drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
drm/amdkfd: dqm fence memory corruption
mm: fix race by making init_zero_pfn() early_initcall
s390/vdso: fix tod_steering_delta type
s390/vdso: copy tod_steering_delta value to vdso_data page
tracing: Fix stack trace event size
PM: runtime: Fix ordering in pm_runtime_get_suppliers()
PM: runtime: Fix race getting/putting suppliers at probe
KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
KVM: SVM: load control fields from VMCB12 before checking them
xtensa: move coprocessor_flush to the .text section
xtensa: fix uaccess-related livelock in do_page_fault
ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8
ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook
ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO
ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks
ALSA: hda: Re-add dropped snd_poewr_change_state() calls
ALSA: usb-audio: Apply sample rate quirk to Logitech Connect
ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
ACPI: tables: x86: Reserve memory occupied by ACPI tables
bpf: Remove MTU check in __bpf_skb_max_len
net: 9p: advance iov on empty read
net: wan/lmc: unregister device when no matching device is found
net: ipa: fix register write command validation
net: ipa: remove two unused register definitions
appletalk: Fix skb allocation size in loopback case
net: ethernet: aquantia: Handle error cleanup of start on open
ath10k: hold RCU lock when calling ieee80211_find_sta_by_ifaddr()
iwlwifi: pcie: don't disable interrupts for reg_lock
netdevsim: dev: Initialize FIB module after debugfs
rtw88: coex: 8821c: correct antenna switch function
ath11k: add ieee80211_unregister_hw to avoid kernel crash caused by NULL pointer
brcmfmac: clear EAP/association status bits on linkdown events
can: tcan4x5x: fix max register value
net: introduce CAN specific pointer in the struct net_device
can: dev: move driver related infrastructure into separate subdir
flow_dissector: fix TTL and TOS dissection on IPv4 fragments
net: mvpp2: fix interrupt mask/unmask skip condition
io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
ext4: do not iput inode under running transaction in ext4_rename()
static_call: Align static_call_is_init() patching condition
io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
nvmet-tcp: fix kmap leak when data digest in use
locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
thermal/core: Add NULL pointer check before using cooling device stats
ASoC: rt711: add snd_soc_component remove callback
ASoC: rt5659: Update MCLK rate in set_sysclk()
staging: comedi: cb_pcidas64: fix request_irq() warn
staging: comedi: cb_pcidas: fix request_irq() warn
scsi: qla2xxx: Fix broken #endif placement
scsi: st: Fix a use after free in st_open()
io_uring: fix ->flags races by linked timeouts
vhost: Fix vhost_vq_reset()
kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
NFSD: fix error handling in NFSv4.0 callbacks
ASoC: cs42l42: Always wait at least 3ms after reset
ASoC: cs42l42: Fix mixer volume control
ASoC: cs42l42: Fix channel width support
ASoC: cs42l42: Fix Bitclock polarity inversion
ASoC: soc-core: Prevent warning if no DMI table is present
ASoC: es8316: Simplify adc_pga_gain_tlv table
ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probe
ASoC: rt5651: Fix dac- and adc- vol-tlv values being off by a factor of 10
ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10
ASoC: rt1015: fix i2c communication error
iomap: Fix negative assignment to unsigned sis->pages in iomap_swapfile_activate
rpc: fix NULL dereference on kmalloc failure
fs: nfsd: fix kconfig dependency warning for NFSD_V4
ext4: fix bh ref count on error paths
ext4: shrink race window in ext4_should_retry_alloc()
virtiofs: Fail dax mount if device does not support it
bpf: Fix fexit trampoline.
arm64: mm: correct the inside linear map range during hotplug check
ANDROID: sched: Initialize arguments of android_rvh_replace_next_task_fair
ANDROID: usb: typec: tcpm: Add vendor hook to update current limit
ANDROID: mm: cma: add vendor hoook in cma_alloc()
ANDROID: GKI: Update ABI XML
ANDROID: GKI: Update virtual_device symbol list
ANDROID: dma-heap: Make the page-pool/deferred-free libraries built-in
ANDROID: vendor_hooks: Add hooks to recognize special worker thread.
ANDROID: usb: typec: tcpm: Add vendor hook to store partner source capabilities
UPSTREAM: KVM: arm64: Fix CPU interface MMIO compatibility detection
FROMGIT: xhci: prevent double-fetch of transfer and transfer event TRBs
FROMGIT: xhci: fix potential array out of bounds with several interrupters
FROMGIT: xhci: check control context is valid before dereferencing it.
FROMGIT: xhci: check port array allocation was successful before dereferencing it
ANDROID: tracing: Make automounting in debugfs optional
ANDROID: usb: add EXPORT_TRACE_SYMBOL to export tracepoint
ANDROID: Add a build config fragment for KHWASan.
FROMGIT: driver core: Use unbound workqueue for deferred probes
Conflicts:
kernel/trace/Kconfig
Change-Id: I9e717422a89ba883c739ea39897904b84fd164d7
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
This fix is analogous to Peter Xu's fix for hugetlb [0]. If we don't
put_page() after getting the page out of the page cache, we leak the
reference.
The fix can be verified by checking /proc/meminfo and running the
userfaultfd selftest in shmem mode. Without the fix, we see MemFree /
MemAvailable steadily decreasing with each run of the test. With the
fix, memory is correctly freed after the test program exits.
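
The leak can be pictured with a toy reference counter. This is only a
userspace sketch under stated assumptions, not the kernel code: toy_page,
toy_get(), toy_put() and toy_minor_fault() are hypothetical names that stand
in for the page cache lookup and for put_page().

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count matters. */
struct toy_page {
    int refcount;
};

/* Models a page cache lookup, which returns the page with an extra reference. */
static struct toy_page *toy_get(struct toy_page *page)
{
    page->refcount++;
    return page;
}

/* Models put_page(): drops one reference. */
static void toy_put(struct toy_page *page)
{
    page->refcount--;
}

/*
 * Models the minor fault path: look the page up, then hand the fault to
 * userspace. With fixed == false the lookup reference is never dropped,
 * so every fault leaves the count one higher than before.
 */
static void toy_minor_fault(struct toy_page *page, bool fixed)
{
    struct toy_page *found = toy_get(page);

    if (fixed)
        toy_put(found);
    /* ...the fault would be passed to the userfaultfd handler here... */
}
```

With fixed set to true the count returns to its starting value after each
fault; with false it grows by one per fault, which matches the steadily
shrinking MemFree seen in the selftest.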
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Link: https://lore.kernel.org/patchwork/patch/1400686/
Bug: 160737021
Bug: 169683130
Change-Id: I599f1434e24fce6e31d0d73c7f9c4714e9875b63
When uffd-minor is enabled, we need to put the page taken from the page
cache before handling the userfault in hugetlb_no_page(); otherwise the
page refcount is leaked. This can be reproduced by running the userfaultfd
selftest in hugetlb_shared mode and then checking /proc/meminfo.
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: f2bf15fb0969 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Link: https://lore.kernel.org/patchwork/patch/1400632/
Bug: 160737021
Bug: 169683130
Change-Id: Iac0ebd6738af8b6212c5a6303e4ee2f482bb5841
Patch series "userfaultfd: support minor fault handling for shmem", v2.
Overview
========
See my original series [1] for a detailed overview of minor fault handling
in general. The feature in this series works exactly like the hugetlbfs
version (from userspace's perspective).
I'm sending this as a separate series because:
- The original minor fault handling series has a full set of R-Bs, and seems
close to being merged. So, it seems reasonable to start looking at this next
step, which extends the basic functionality.
- shmem is different enough that this series may require some additional work
before it's ready, and I don't want to delay the original series
unnecessarily by bundling them together.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live
migration use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope
to optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap.
With this feature, the heap can be shared-mapped at another location where
the GC-thread(s) could continue the compaction operation without the need
to invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java
threads get faults on the heap, UFFDIO_CONTINUE can be used to resume
execution. Furthermore, this feature enables updating references in the
'non-moving' portion of the heap efficiently. Without this feature,
unnecessary page copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t
This patch (of 5):
Modify the userfaultfd register API to allow registering shmem VMAs in
minor mode. Modify the shmem mcopy implementation to support
UFFDIO_CONTINUE in order to resolve such faults.
Combine the shmem mcopy handler functions into a single
shmem_mcopy_atomic_pte, which takes a mode parameter. This matches how
the hugetlbfs implementation is structured, and lets us remove a good
chunk of boilerplate.
Link: https://lkml.kernel.org/r/20210302000133.272579-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210302000133.272579-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 4cc6e15679966aa49afc5b114c3c83ba0ac39b05
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388146/
Conflicts:
mm/shmem.c
(1. Manual rebase
2. Enclosed shmem_mcopy_atomic_pte() with CONFIG_USERFAULTFD to avoid
compile errors when USERFAULTFD is not enabled.)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Idcd822b2a124a089121b9ad8c65061f6979126ec
This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is, userspace is notified that a minor fault has occurred. It might
change the contents of the page using its second non-UFFD mapping, or
not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured
the page contents are correct, carry on setting up the mapping".
Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.
It turns out hugetlb_mcopy_atomic_pte() already does very close to what
we want, if an existing page is provided via `struct page **pagep`. We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)
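
The mode split can be sketched as a small userspace model. Only
MCOPY_ATOMIC_CONTINUE is named above; the other two enumerator names, the
mcopy_steps struct and mcopy_steps_for() are assumptions made for
illustration, and the model deliberately collapses the two non-CONTINUE
modes into one branch.

```c
#include <stdbool.h>

/* Three modes of operation; only MCOPY_ATOMIC_CONTINUE is named in the text. */
enum mcopy_atomic_mode {
    MCOPY_ATOMIC_NORMAL,   /* assumed name, for UFFDIO_COPY */
    MCOPY_ATOMIC_ZEROPAGE, /* assumed name, for UFFDIO_ZEROPAGE */
    MCOPY_ATOMIC_CONTINUE, /* UFFDIO_CONTINUE: map a page that already exists */
};

/* Hypothetical summary of the work each mode implies. */
struct mcopy_steps {
    bool allocate_page;     /* a fresh page is needed */
    bool lookup_existing;   /* the page is looked up instead of allocated */
    bool add_to_page_cache; /* the page is new to the page cache */
};

/* Simplified: CONTINUE reuses the existing page, everything else allocates. */
static struct mcopy_steps mcopy_steps_for(enum mcopy_atomic_mode mode)
{
    struct mcopy_steps steps = { false, false, false };

    if (mode == MCOPY_ATOMIC_CONTINUE) {
        steps.lookup_existing = true;
    } else {
        steps.allocate_page = true;
        steps.add_to_page_cache = true;
    }
    return steps;
}
```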
Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 14ea86439abaf3423cd9b6712ed5ce8451d2d181
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388136/
Conflicts:
mm/hugetlb.c
(8f251a3d5ce3bdea73bd045ed35db64f32e0d0d9 is not cherry-picked yet so
switched SetHPageMigratable() to set_page_huge_active())
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I45b62959dcb1d343154cb831113a26e47e77c8af
For background, mm/userfaultfd.c provides a general mcopy_atomic
implementation. But some types of memory (i.e., hugetlb and shmem) need a
slightly different implementation, so they provide their own helpers for
this. In other words, userfaultfd is the only caller of these functions.
This patch achieves two things:
1. Don't spend time compiling code which will end up never being
referenced anyway (a small build time optimization).
2. In patches later in this series, we extend the signature of these
helpers with UFFD-specific state (a mode enumeration). Once this
happens, we *have to* either not compile the helpers, or
unconditionally define the UFFD-only state (which seems messier to me).
This includes the declarations in the headers, as otherwise they'd
yield warnings about implicitly defining the type of those arguments.
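
A minimal sketch of the pattern, assuming a userspace stand-in for the
Kconfig symbol. The toy_* names are hypothetical; the point is only that the
helper and the type in its signature live under the same #ifdef, so a build
without the option never sees either of them.

```c
#include <errno.h>

/* Stand-in for the Kconfig symbol; remove it to model a !USERFAULTFD build. */
#define CONFIG_USERFAULTFD 1

#ifdef CONFIG_USERFAULTFD
/* UFFD-only state: the type exists only when the feature is built. */
enum toy_mcopy_mode { TOY_MCOPY_NORMAL, TOY_MCOPY_CONTINUE };

/* Hypothetical helper whose signature depends on the UFFD-only type. */
static int toy_mcopy_helper(enum toy_mcopy_mode mode)
{
    return mode == TOY_MCOPY_CONTINUE ? 1 : 0;
}
#endif /* CONFIG_USERFAULTFD */

/* The caller reaches the helper only when the option is set. */
static int toy_uffd_entry(int want_continue)
{
#ifdef CONFIG_USERFAULTFD
    return toy_mcopy_helper(want_continue ? TOY_MCOPY_CONTINUE
                                          : TOY_MCOPY_NORMAL);
#else
    (void)want_continue;
    return -ENOSYS;
#endif
}
```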
Link: https://lkml.kernel.org/r/20210301222728.176417-4-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 0e6e243e1d9a252c047c4cb1b032cfb31caf87ea
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388133/
Conflicts:
include/linux/hugetlb.h
(changed return type of hugetlb_reserve_pages from bool to int)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I765cff74cde5fb4ce8141fb95e41848890ced961
Patch series "userfaultfd: add minor fault handling", v9.
Overview
========
This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults. By "minor" fault, I mean the following
situation:
Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.
We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA). In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".
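
The two-mapping setup can be tried without userfaultfd at all. The sketch
below is an illustration under assumptions: a temporary file stands in for
hugetlbfs, nothing is registered with userfaultfd, and two_mappings_demo()
is a hypothetical name. It fills the page through one mapping and then
touches it for the first time through the other, which is the access that
would be reported as a "minor" fault if that mapping were registered.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Maps the same file page twice with MAP_SHARED, writes through the first
 * mapping and reads through the second. Returns 0 if the second mapping
 * sees the data, -1 on any failure.
 */
static int two_mappings_demo(void)
{
    const char msg[] = "filled via the non-UFFD mapping";
    long page = sysconf(_SC_PAGESIZE);
    char *writer, *reader;
    FILE *f;
    int ret = -1;

    if (page <= 0)
        return -1;
    f = tmpfile();
    if (!f)
        return -1;
    if (ftruncate(fileno(f), page) != 0)
        goto out_close;

    writer = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
                  fileno(f), 0);
    if (writer == MAP_FAILED)
        goto out_close;
    reader = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fileno(f), 0);
    if (reader == MAP_FAILED)
        goto out_unmap_writer;

    /* The page is allocated and filled through the first mapping... */
    memcpy(writer, msg, sizeof(msg));
    /* ...so the first touch through the second finds an existing page. */
    if (memcmp(reader, msg, sizeof(msg)) == 0)
        ret = 0;

    munmap(reader, (size_t)page);
out_unmap_writer:
    munmap(writer, (size_t)page);
out_close:
    fclose(f);
    return ret;
}
```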
Use Case
========
Consider the use case of VM live migration (e.g. under QEMU/KVM):
1. While a VM is still running, we copy the contents of its memory to a
target machine. The pages are populated on the target by writing to the
non-UFFD mapping, using the setup described above. The VM is still running
(and therefore its memory is likely changing), so this may be repeated
several times, until we decide the target is "up to date enough".
2. We pause the VM on the source, and start executing on the target machine.
During this gap, the VM's user(s) will *see* a pause, so it is desirable to
minimize this window.
3. Between the last time any page was copied from the source to the target, and
when the VM was paused, the contents of that page may have changed - and
therefore the copy we have on the target machine is out of date. Although we
can keep track of which pages are out of date, for VMs with large amounts of
memory, it is "slow" to transfer this information to the target machine. We
want to resume execution before such a transfer would complete.
4. So, the guest begins executing on the target machine. The first time it
touches its memory (via the UFFD-registered mapping), userspace wants to
intercept this fault. Userspace checks whether or not the page is up to date,
and if not, copies the updated page from the source machine, via the non-UFFD
mapping. Finally, whether a copy was performed or not, userspace issues a
UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
are correct, carry on setting up the mapping".
We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.
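
Step 4 can be sketched as a small simulation. Everything here is a
hypothetical model: toy_target, its fields and toy_handle_minor_fault() are
invented for illustration, the "source machine" is just an array, and
setting mapped[] stands in for issuing UFFDIO_CONTINUE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGES 4
#define TOY_PAGE_SIZE 8

/* Hypothetical state kept by the userfaultfd manager on the target machine. */
struct toy_target {
    char memory[TOY_PAGES][TOY_PAGE_SIZE]; /* written via the non-UFFD mapping */
    bool stale[TOY_PAGES];  /* page changed on the source after the last copy */
    bool mapped[TOY_PAGES]; /* models the effect of UFFDIO_CONTINUE */
    int copies;             /* number of pages fetched on demand */
};

/* Handles the guest's first touch of one page on the target. */
static void toy_handle_minor_fault(struct toy_target *t,
                                   const char (*source)[TOY_PAGE_SIZE],
                                   size_t page)
{
    if (t->stale[page]) {
        /* Out of date: fetch the current contents from the source. */
        memcpy(t->memory[page], source[page], TOY_PAGE_SIZE);
        t->stale[page] = false;
        t->copies++;
    }
    /* Whether or not a copy was needed, the fault is finished the same way. */
    t->mapped[page] = true;
}
```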
Interaction with Existing APIs
==============================
Because this is a feature, a registered VMA could potentially receive both
missing and minor faults. I spent some time thinking through how the
existing API interacts with the new feature:
UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:
- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.
UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated. This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).
- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
-ENOENT in that case (regardless of the kind of fault).
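
The error codes listed above can be collected into one lookup. This is a
sketch for reading convenience only: the toy_* enums and
toy_mismatch_errno() are hypothetical, and a return value of 0 means "the
list above does not describe a failure for this combination", not that the
kernel is guaranteed to succeed.

```c
#include <errno.h>

enum toy_ioctl { TOY_COPY, TOY_ZEROPAGE, TOY_CONTINUE, TOY_WRITEPROTECT };
enum toy_fault { TOY_FAULT_MISSING, TOY_FAULT_MINOR };
enum toy_mem { TOY_MEM_PRIVATE, TOY_MEM_SHMEM, TOY_MEM_HUGETLB };

/* Returns the negative errno from the list above, or 0 if none is listed. */
static int toy_mismatch_errno(enum toy_ioctl op, enum toy_fault fault,
                              enum toy_mem mem)
{
    switch (op) {
    case TOY_CONTINUE:
        /* CONTINUE does not allocate, so it only fits minor faults. */
        if (fault == TOY_FAULT_MINOR)
            return 0;
        return mem == TOY_MEM_HUGETLB ? -EFAULT : -EINVAL;
    case TOY_COPY:
        return fault == TOY_FAULT_MINOR ? -EEXIST : 0;
    case TOY_ZEROPAGE:
        /* ZEROPAGE is unsupported on hugetlb in any case. */
        if (mem == TOY_MEM_HUGETLB)
            return -EINVAL;
        return fault == TOY_FAULT_MINOR ? -EEXIST : 0;
    case TOY_WRITEPROTECT:
        /* Shared memory is rejected regardless of the kind of fault. */
        return mem == TOY_MEM_PRIVATE ? 0 : -ENOENT;
    }
    return 0;
}
```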
Future Work
===========
This series only supports hugetlbfs. I have a second series in flight to
support shmem as well, extending the functionality. This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.
This patch (of 6):
This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:
Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.
This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered. In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.
This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].
However, doing it this way requires that we allocate a VM_* flag for the new
registration mode. On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.
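
The constraint comes down to the width of the flags word. As a rough
sketch, assuming the new flag sits at some bit above 31: the bit number used
below is illustrative only (the text does not give the real value), and
toy_register_minor() is a hypothetical name.

```c
#include <errno.h>

/* Illustrative bit position above bit 31; not taken from the text. */
#define TOY_VM_UFFD_MINOR_BIT 37u

/*
 * Models registering a VMA in MINOR mode when the flags word is
 * bits_per_long bits wide.
 */
static int toy_register_minor(unsigned int bits_per_long)
{
    if (TOY_VM_UFFD_MINOR_BIT >= bits_per_long)
        return -EINVAL; /* no room for the flag, as on 32-bit */
    return 0;
}
```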
[1] https://lore.kernel.org/patchwork/patch/1380226/
Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388132/
Conflicts:
arch/arm64/Kconfig
fs/userfaultfd.c
mm/hugetlb.c
(All related to SPF feature. Resolved by manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I43b37272d531341439ceaa03213d0e2415e04688
Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
userfaultfd-wp is always based on pgtable entries, so they cannot be
shared.
Walk the hugetlb range and unshare all such mappings, if there are any,
right before UFFDIO_REGISTER succeeds and returns to userspace.
This will pair with want_pmd_share() in hugetlb code so that huge pmd
sharing is completely disabled for userfaultfd-wp registered range.
Link: https://lkml.kernel.org/r/20210218231206.15524-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 267bda5c9993856b86f91a998df632b29cf517e2
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382208/
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I99d541ce45aaf924fa912f00dafa4caefe307755
want_pmd_share() is undefined with !ARCH_WANT_HUGE_PMD_SHARE since it's
put by accident into a "#ifdef ARCH_WANT_HUGE_PMD_SHARE" block. Moving it
out won't work either since vma_shareable() is only defined within the
block. Define it for !ARCH_WANT_HUGE_PMD_SHARE instead.
Link: https://lkml.kernel.org/r/20210310185359.88297-1-peterx@redhat.com
Fixes: 5b109cc1cdcc ("hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 5038f9dd8bbde13ff16435011bb3b0981acc5c1c
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1393174/
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Id716afd43bff303f7eda2c4f70f18d9ea727c698
Huge pmd sharing can cause problems for userfaultfd. Userfaultfd runs its
logic based on special bits in page table entries, but huge pmd sharing can
share page table entries across different address ranges. That can cause
issues in either of these cases:
- When sharing huge pmd page tables for an uffd write protected range, the
newly mapped huge pmd range will also be write protected unexpectedly, or,
- When we try to write protect a range of huge pmd shared range, we'll first
do huge_pmd_unshare() in hugetlb_change_protection(), however that also
means the UFFDIO_WRITEPROTECT could be silently skipped for the shared
region, which could lead to data loss.
While at it, a few other things are done together:
- Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because
that's definitely something that arch code would like to use too
- ARM64 currently checks directly against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when
trying to share huge pmd. Switch to the want_pmd_share() helper.
While at it, move vma_shareable() from huge_pmd_share() into want_pmd_share().
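
The resulting decision can be modelled in a few lines. This is a userspace
sketch, not the helper itself: toy_vma is a hypothetical, reduced view of a
VMA, the shareable field stands in for whatever vma_shareable() would
return, and TOY_ARCH_WANT_HUGE_PMD_SHARE stands in for the Kconfig symbol.

```c
#include <stdbool.h>

/* Stand-in for CONFIG_ARCH_WANT_HUGE_PMD_SHARE; remove to model !ARCH_WANT. */
#define TOY_ARCH_WANT_HUGE_PMD_SHARE 1

/* Hypothetical, reduced view of a VMA. */
struct toy_vma {
    bool uffd_registered; /* the range is registered with userfaultfd */
    bool shareable;       /* result the vma_shareable() check would give */
};

/* Sharing is refused for uffd ranges before the usual check is consulted. */
static bool toy_want_pmd_share(const struct toy_vma *vma)
{
#ifdef TOY_ARCH_WANT_HUGE_PMD_SHARE
    if (vma->uffd_registered)
        return false;
    return vma->shareable;
#else
    (void)vma;
    return false;
#endif
}
```

Keeping a definition in the #else branch means callers on architectures
without huge pmd sharing still compile and simply never share.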
Link: https://lkml.kernel.org/r/20210218231202.15426-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit ab6a0d00a63f92f1f0d220274fa989eb75c09f2b
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382207/
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ie2dff7ab31600cae78914e3278be61516844394e
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4.
This series tries to disable huge pmd unshare of hugetlbfs-backed memory
for uffd-wp. Although uffd-wp for hugetlbfs is still at the RFC stage, the
idea of this series may be needed by multiple tasks (Axel's uffd minor
fault series, and Mike's soft dirty series), so I picked it out from the
larger series.
This patch (of 4):
This is preparation work so that the per-architecture huge_pte_alloc() can
behave differently according to different VMA attributes.
Pass the vma deeper into huge_pmd_share() so that we can avoid the find_vma() call.
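A toy model of why passing the vma down helps. The Mm and Vma classes and the
two function variants are hypothetical stand-ins, not the kernel signatures.
The only point is that the caller already holds the vma, so the callee no
longer has to look it up.

```python
from dataclasses import dataclass, field


@dataclass
class Vma:
    start: int
    end: int


@dataclass
class Mm:
    vmas: list = field(default_factory=list)
    lookups: int = 0  # counts find_vma() calls for the comparison

    def find_vma(self, addr):
        # Like the kernel helper: first VMA whose end lies above addr.
        self.lookups += 1
        for vma in sorted(self.vmas, key=lambda v: v.start):
            if addr < vma.end:
                return vma
        return None


def huge_pmd_share_old(mm, addr):
    # Before: the callee has to find the vma itself.
    return mm.find_vma(addr)


def huge_pmd_share_new(mm, vma, addr):
    # After: the caller passes the vma it already holds.
    return vma
```

Both variants end up with the same vma, but only the old one increments
mm.lookups.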
Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit b92dc1bfd52ecf338c024815a7c1d44e37a507a1
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382205/
Conflicts:
arch/sparc/mm/hugetlbpage.c
(1. Resolved by manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I50db4e27f2951a5ee01b0dfa22c1ece34e79f881
Changes in 5.10.28
arm64: mm: correct the inside linear map range during hotplug check
bpf: Fix fexit trampoline.
virtiofs: Fail dax mount if device does not support it
ext4: shrink race window in ext4_should_retry_alloc()
ext4: fix bh ref count on error paths
fs: nfsd: fix kconfig dependency warning for NFSD_V4
rpc: fix NULL dereference on kmalloc failure
iomap: Fix negative assignment to unsigned sis->pages in iomap_swapfile_activate
ASoC: rt1015: fix i2c communication error
ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10
ASoC: rt5651: Fix dac- and adc- vol-tlv values being off by a factor of 10
ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probe
ASoC: es8316: Simplify adc_pga_gain_tlv table
ASoC: soc-core: Prevent warning if no DMI table is present
ASoC: cs42l42: Fix Bitclock polarity inversion
ASoC: cs42l42: Fix channel width support
ASoC: cs42l42: Fix mixer volume control
ASoC: cs42l42: Always wait at least 3ms after reset
NFSD: fix error handling in NFSv4.0 callbacks
kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
vhost: Fix vhost_vq_reset()
io_uring: fix ->flags races by linked timeouts
scsi: st: Fix a use after free in st_open()
scsi: qla2xxx: Fix broken #endif placement
staging: comedi: cb_pcidas: fix request_irq() warn
staging: comedi: cb_pcidas64: fix request_irq() warn
ASoC: rt5659: Update MCLK rate in set_sysclk()
ASoC: rt711: add snd_soc_component remove callback
thermal/core: Add NULL pointer check before using cooling device stats
locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
nvmet-tcp: fix kmap leak when data digest in use
io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
static_call: Align static_call_is_init() patching condition
ext4: do not iput inode under running transaction in ext4_rename()
io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
net: mvpp2: fix interrupt mask/unmask skip condition
flow_dissector: fix TTL and TOS dissection on IPv4 fragments
can: dev: move driver related infrastructure into separate subdir
net: introduce CAN specific pointer in the struct net_device
can: tcan4x5x: fix max register value
brcmfmac: clear EAP/association status bits on linkdown events
ath11k: add ieee80211_unregister_hw to avoid kernel crash caused by NULL pointer
rtw88: coex: 8821c: correct antenna switch function
netdevsim: dev: Initialize FIB module after debugfs
iwlwifi: pcie: don't disable interrupts for reg_lock
ath10k: hold RCU lock when calling ieee80211_find_sta_by_ifaddr()
net: ethernet: aquantia: Handle error cleanup of start on open
appletalk: Fix skb allocation size in loopback case
net: ipa: remove two unused register definitions
net: ipa: fix register write command validation
net: wan/lmc: unregister device when no matching device is found
net: 9p: advance iov on empty read
bpf: Remove MTU check in __bpf_skb_max_len
ACPI: tables: x86: Reserve memory occupied by ACPI tables
ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
ALSA: usb-audio: Apply sample rate quirk to Logitech Connect
ALSA: hda: Re-add dropped snd_poewr_change_state() calls
ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks
ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO
ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook
ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8
xtensa: fix uaccess-related livelock in do_page_fault
xtensa: move coprocessor_flush to the .text section
KVM: SVM: load control fields from VMCB12 before checking them
KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
PM: runtime: Fix race getting/putting suppliers at probe
PM: runtime: Fix ordering in pm_runtime_get_suppliers()
tracing: Fix stack trace event size
s390/vdso: copy tod_steering_delta value to vdso_data page
s390/vdso: fix tod_steering_delta type
mm: fix race by making init_zero_pfn() early_initcall
drm/amdkfd: dqm fence memory corruption
drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
drm/amdgpu: check alignment on CPU page for bo map
reiserfs: update reiserfs_xattrs_initialized() condition
drm/imx: fix memory leak when fails to init
drm/tegra: dc: Restore coupling of display controllers
drm/tegra: sor: Grab runtime PM reference across reset
vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends
pinctrl: rockchip: fix restore error in resume
extcon: Add stubs for extcon_register_notifier_all() functions
extcon: Fix error handling in extcon_dev_register
firmware: stratix10-svc: reset COMMAND_RECONFIG_FLAG_PARTIAL to 0
usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield
video: hyperv_fb: Fix a double free in hvfb_probe
firewire: nosy: Fix a use-after-free bug in nosy_ioctl()
usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control()
USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem
usb: musb: Fix suspend with devices connected for a64
usb: xhci-mtk: fix broken streams issue on 0.96 xHCI
cdc-acm: fix BREAK rx code path adding necessary calls
USB: cdc-acm: untangle a circular dependency between callback and softint
USB: cdc-acm: downgrade message to debug
USB: cdc-acm: fix double free on probe failure
USB: cdc-acm: fix use-after-free after probe failure
usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference
usb: dwc2: Fix HPRT0.PrtSusp bit setting for HiKey 960 board.
usb: dwc2: Prevent core suspend when port connection flag is 0
usb: dwc3: qcom: skip interconnect init for ACPI probe
usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable
soc: qcom-geni-se: Cleanup the code to remove proxy votes
staging: rtl8192e: Fix incorrect source in memcpy()
staging: rtl8192e: Change state information from u16 to u8
driver core: clear deferred probe reason on probe retry
drivers: video: fbcon: fix NULL dereference in fbcon_cursor()
riscv: evaluate put_user() arg before enabling user access
Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG
Linux 5.10.28
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifdbbeda8de3ee22a7aa3f5d3b10becf0aba1a124
commit e720e7d0e983bf05de80b231bccc39f1487f0f16 upstream.
There are code paths that rely on zero_pfn to be fully initialized
before core_initcall. For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve, which
causes a page fault with a subsequent mmput. If zero_pfn is not
initialized by then it may not get cleaned up properly and result in an
error:
BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:
1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
kobject_uevent_env+0x7e4/0x7ec
kset_register+0x68/0x88
bus_register+0xdc/0x34c
subsys_virtual_register+0x34/0x78
wq_sysfs_init+0x1c/0x4c
do_one_initcall+0x50/0x1a8
kernel_init_freeable+0x230/0x2c8
kernel_init+0x10/0x100
ret_from_kernel_thread+0x14/0x1c
2. kobject_uevent_env() calls call_usermodehelper_exec() which executes
kernel_execve asynchronously.
3. Memory allocations in kernel_execve cause a page fault, bumping the
MM reference counter:
add_mm_counter_fast+0xb4/0xc0
handle_mm_fault+0x6e4/0xea0
__get_user_pages.part.78+0x190/0x37c
__get_user_pages_remote+0x128/0x360
get_arg_page+0x34/0xa0
copy_string_kernel+0x194/0x2a4
kernel_execve+0x11c/0x298
call_usermodehelper_exec_async+0x114/0x194
4. In case zero_pfn has not been initialized yet, zap_pte_range does
not decrement the MM_ANONPAGES RSS counter and the BUG message is
triggered shortly afterwards when __mmdrop checks the ref counters:
__mmdrop+0x98/0x1d0
free_bprm+0x44/0x118
kernel_execve+0x160/0x1d8
call_usermodehelper_exec_async+0x114/0x194
ret_from_kernel_thread+0x14/0x1c
To avoid races such as described above, initialize init_zero_pfn at
early_initcall level. Depending on the architecture, ZERO_PAGE is
either constant or gets initialized even earlier, at paging_init, so
there is no issue with initializing zero_pfn earlier.
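The ordering problem can be illustrated with a small synchronous model. It is
a sketch under stated assumptions: LEVELS is only a subset of the real
initcall levels, the registry dictionaries are hypothetical, and it assumes
that before the fix init_zero_pfn() ran at core_initcall level after
wq_sysfs_init(). The real usermode helper runs asynchronously, which this
model does not capture.

```python
# Subset of the kernel's initcall levels, in the order they run.
LEVELS = ["early", "core", "postcore", "arch", "subsys", "fs", "device", "late"]

ZERO_PFN_AFTER_INIT = 5120  # value reported for the MT7621 device above


def init_zero_pfn(state):
    state["zero_pfn"] = ZERO_PFN_AFTER_INIT


def wq_sysfs_init(state):
    # Stands in for kobject_uevent_env() -> usermode helper -> page fault.
    # Records the zero_pfn value that the fault path would observe.
    state["observed"].append(state["zero_pfn"])


def run_initcalls(registry):
    """Run callbacks level by level, in registration order within a level."""
    state = {"zero_pfn": 0, "observed": []}
    for level in LEVELS:
        for fn in registry.get(level, []):
            fn(state)
    return state


# Assumed ordering before the fix: both at core level, workqueue first.
before_fix = {"core": [wq_sysfs_init, init_zero_pfn]}
# After the fix: init_zero_pfn() moves to the early level.
after_fix = {"early": [init_zero_pfn], "core": [wq_sysfs_init]}
```

With before_fix the fault path observes the uninitialized value 0. With
after_fix it observes 5120, because every early-level callback finishes
before any core-level callback starts.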
Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-519c8c6:
ANDROID: usb: host: xhci: provide function prototype for xhci_address_device
ANDROID: usb: host: add bus_suspend/bus_resume to xhci overrides
ANDROID: usb: host: add address_device to xhci overrides
ANDROID: Add OWNERS files referring to the respective android-mainline OWNERS
ANDROID: usb: host: add max packet parameter on alloc_transfer_ring hook
ANDROID: usb: host: add xhci hooks for vendor specific container context
ANDROID: ABI: Update allowed symbol list for QCOM
ANDROID: abi_gki_aarch64_qcom: Add android_rvh_probe_register
FROMGIT: usb: xhci-mtk: support quirk to disable usb2 lpm
FROMGIT: usb: xhci-mtk: fix broken streams issue on 0.96 xHCI
FROMGIT: usb: xhci-mtk: fix oops when unbind driver
FROMGIT: usb: xhci-mtk: fix wrong remainder of bandwidth budget
FROMGIT: usb: dwc3: add cancelled reasons for dwc3 requests
Revert "net: phy: introduce phydev->port"
Revert "net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()"
Revert "net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S"
Revert "net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M"
Revert "can: dev: Move device back to init netns on owning netns delete"
FROMGIT: pstore: Add mem_type property DT parsing support
ANDROID: usb: host: export xhci symbols for ring management
ANDROID: usb: typec: tcpm: vendor hook for timer adjustments
ANDROID: Incremental fs: Truncate file when complete
ANDROID: Incremental fs: Fix mlock to fail gracefully on corrupt files
ANDROID: Incremental fs: Finer readlog compression internally
ANDROID: Incremental fs: Support STATX_ATTR_VERITY
ANDROID: GKI: sched: add rvh for new cfs task util
ANDROID: GKI: Update abi_gki_aarch64_qcom for binder
ANDROID: mm: Make slub_debug global
ANDROID: mm: Make page_owner_enabled global
ANDROID: scsi: ufs: set crypto keyslot before prepare_command
ANDROID: vendor_hooks: Allow multiple attachments to restricted hooks
FROMGIT: KVM: arm64: Drop the CPU_FTR_REG_HYP_COPY infrastructure
FROMGIT: KVM: arm64: Generate final CTR_EL0 value when running in Protected mode
ANDROID: KVM: arm64: Sync with upstream host stage 2 series
FROMGIT: media: v4l2-ctrls: Fix h264 hierarchical coding type menu ctrl
FROMGIT: mm/page_owner: record the timestamp of all pages during free
UPSTREAM: mm/page_io: use pr_alert_ratelimited for swap read/write errors
Linux 5.10.27
xen-blkback: don't leak persistent grants from xen_blkbk_map()
can: peak_usb: Revert "can: peak_usb: add forgotten supported devices"
nvme: fix the nsid value to print in nvme_validate_or_alloc_ns
Revert "net: bonding: fix error return code of bond_neigh_init()"
Revert "xen: fix p2m size in dom0 for disabled memory hotplug case"
fs/ext4: fix integer overflow in s_log_groups_per_flex
ext4: add reclaim checks to xattr code
mac80211: fix double free in ibss_leave
net: dsa: b53: VLAN filtering is global to all users
r8169: fix DMA being used after buffer free if WoL is enabled
can: dev: Move device back to init netns on owning netns delete
ch_ktls: fix enum-conversion warning
fs/cachefiles: Remove wait_bit_key layout dependency
mm/memcg: fix 5.10 backport of splitting page memcg
x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
locking/mutex: Fix non debug version of mutex_lock_io_nested()
cifs: Adjust key sizes and key generation routines for AES256 encryption
smb3: fix cached file size problems in duplicate extents (reflink)
scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
scsi: qedi: Fix error return code of qedi_alloc_global_queues()
scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
block: recalculate segment count for multi-segment discards correctly
io_uring: fix provide_buffers sign extension
perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
perf auxtrace: Fix auxtrace queue conflict
ACPI: scan: Use unique number for instance_no
ACPI: scan: Rearrange memory allocation in acpi_device_add()
Revert "netfilter: x_tables: Update remaining dereference to RCU"
mm/mmu_notifiers: ensure range_end() is paired with range_start()
dm table: Fix zoned model check and zone sectors check
netfilter: x_tables: Use correct memory barriers.
Revert "netfilter: x_tables: Switch synchronization to RCU"
net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M
net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S
net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()
net: phy: introduce phydev->port
net: axienet: Fix probe error cleanup
net: axienet: Properly handle PCS/PMA PHY for 1000BaseX mode
igb: avoid premature Rx buffer reuse
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
net: Consolidate common blackhole dst ops
bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs
RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUG
octeontx2-af: Fix memory leak of object buf
net: bridge: don't notify switchdev for local FDB addresses
PM: EM: postpone creating the debugfs dir till fs_initcall
net/mlx5e: Fix error path for ethtool set-priv-flag
net/mlx5e: Offload tuple rewrite for non-CT flows
net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
net/mlx5: Add back multicast stats for uplink representor
PM: runtime: Defer suspending suppliers
arm64: kdump: update ppos when reading elfcorehdr
drm/msm: Fix suspend/resume on i.MX5
drm/msm: fix shutdown hook in case GPU components failed to bind
can: isotp: tx-path: zero initialize outgoing CAN frames
bpf: Fix umd memory leak in copy_process()
libbpf: Fix BTF dump of pointer-to-array-of-struct
selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value
selinux: vsock: Set SID for socket returned by accept()
net: stmmac: dwmac-sun8i: Provide TX and RX fifo sizes
r8152: limit the RX buffer size of RTL8153A for USB 2.0
igb: check timestamp validity
net: cdc-phonet: fix data-interface release on probe failure
net: check all name nodes in __dev_alloc_name
octeontx2-af: fix infinite loop in unmapping NPC counter
octeontx2-pf: Clear RSS enable flag on interace down
octeontx2-af: Fix irq free in rvu teardown
octeontx2-af: Remove TOS field from MKEX TX
octeontx2-af: Modify default KEX profile to extract TX packet fields
octeontx2-af: Formatting debugfs entry rsrc_alloc.
ipv6: weaken the v4mapped source check
ARM: dts: imx6ull: fix ubi filesystem mount failed
libbpf: Use SOCK_CLOEXEC when opening the netlink socket
libbpf: Fix error path in bpf_object__elf_init()
netfilter: flowtable: Make sure GC works periodically in idle system
netfilter: nftables: allow to update flowtable flags
netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags
net/sched: cls_flower: fix only mask bit check in the validate_ct_state
ionic: linearize tso skb with too many frags
drm/msm/dsi: fix check-before-set in the 7nm dsi_pll code
ftrace: Fix modify_ftrace_direct.
nfp: flower: fix pre_tun mask id allocation
nfp: flower: add ipv6 bit to pre_tunnel control message
nfp: flower: fix unsupported pre_tunnel flows
selftests/net: fix warnings on reuseaddr_ports_exhausted
mac80211: Allow HE operation to be longer than expected.
mac80211: fix rate mask reset
can: m_can: m_can_rx_peripheral(): fix RX being blocked by errors
can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
can: c_can: move runtime PM enable/disable to c_can_platform
can: c_can_pci: c_can_pci_remove(): fix use-after-free
can: kvaser_pciefd: Always disable bus load reporting
can: flexcan: flexcan_chip_freeze(): fix chip freeze for missing bitrate
can: peak_usb: add forgotten supported devices
can: isotp: TX-path: ensure that CAN frame flags are initialized
can: isotp: isotp_setsockopt(): only allow to set low level TX flags for CAN-FD
tcp: relookup sock for RST+ACK packets handled by obsolete req sock
tipc: better validate user input in tipc_nl_retrieve_key()
net: phylink: Fix phylink_err() function name error in phylink_major_config
net: hdlc_x25: Prevent racing between "x25_close" and "x25_xmit"/"x25_rx"
netfilter: ctnetlink: fix dump of the expect mask attribute
selftests/bpf: Set gopt opt_class to 0 if get tunnel opt failed
flow_dissector: fix byteorder of dissected ICMP ID
net: qrtr: fix a kernel-infoleak in qrtr_recvmsg()
net: ipa: terminate message handler arrays
clk: qcom: gcc-sc7180: Use floor ops for the correct sdcc1 clk
ftgmac100: Restart MAC HW once
net: phy: broadcom: Add power down exit reset state delay
net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template
e1000e: Fix error handling in e1000_set_d0_lplu_state_82571
e1000e: add rtnl_lock() to e1000_reset_task
igc: Fix igc_ptp_rx_pktstamp()
igc: Fix Supported Pause Frame Link Setting
igc: Fix Pause Frame Advertising
igc: reinit_locked() should be called with rtnl_lock
net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
net: sched: validate stab values
macvlan: macvlan_count_rx() needs to be aware of preemption
drop_monitor: Perform cleanup upon probe registration failure
ipv6: fix suspecious RCU usage warning
net/mlx5e: Don't match on Geneve options in case option masks are all zero
net/mlx5e: When changing XDP program without reset, take refs for XSK RQs
net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets
libbpf: Fix INSTALL flag order
bpf: Change inode_storage's lookup_elem return value from NULL to -EBADF
veth: Store queue_mapping independently of XDP prog presence
soc: ti: omap-prm: Fix occasional abort on reset deassert for dra7 iva
ARM: OMAP2+: Fix smartreflex init regression after dropping legacy data
bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD
dm ioctl: fix out of bounds array access when no devices
dm verity: fix DM_VERITY_OPTS_MAX value
drm/i915: Fix the GT fence revocation runtime PM logic
drm/amdgpu: Add additional Sienna Cichlid PCI ID
drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
drm/amd/pm: workaround for audio noise issue
drm/etnaviv: Use FOLL_FORCE for userptr
integrity: double check iint_cache was initialized
ARM: dts: at91-sama5d27_som1: fix phy address to 7
ARM: dts: at91: sam9x60: fix mux-mask to match product's datasheet
ARM: dts: at91: sam9x60: fix mux-mask for PA7 so it can be set to A, B and C
arm64: dts: ls1043a: mark crypto engine dma coherent
arm64: dts: ls1012a: mark crypto engine dma coherent
arm64: dts: ls1046a: mark crypto engine dma coherent
arm64: stacktrace: don't trace arch_stack_walk()
ACPICA: Always create namespace nodes using acpi_ns_create_node()
ACPI: video: Add missing callback back for Sony VPCEH3U1E
gcov: fix clang-11+ support
kasan: fix per-page tags for non-page_alloc pages
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
squashfs: fix xattr id and id lookup sanity checks
squashfs: fix inode lookup sanity checks
z3fold: prevent reclaim/free race for headless pages
psample: Fix user API breakage
platform/x86: intel-vbtn: Stop reporting SW_DOCK events
netsec: restore phy power state after controller reset
selinux: fix variable scope issue in live sidtab conversion
selinux: don't log MAC_POLICY_LOAD record on failed policy load
btrfs: fix sleep while in non-sleep context during qgroup removal
KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
static_call: Fix static_call_set_init()
static_call: Fix the module key fixup
static_call: Allow module use without exposing static_call_key
static_call: Pull some static_call declarations to the type headers
ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
mm/fork: clear PASID for new mm
block: Suppress uevent for hidden device when removed
nfs: we don't support removing system.nfs4_acl
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
nvme-core: check ctrl css before setting up zns
nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
nvme: simplify error logic in nvme_validate_ns()
drm/radeon: fix AGP dependency
drm/amdgpu: fb BO should be ttm_bo_type_device
drm/amd/display: Revert dram_clock_change_latency for DCN2.1
block: Fix REQ_OP_ZONE_RESET_ALL handling
regulator: qcom-rpmh: Correct the pmic5_hfsmps515 buck
kselftest: arm64: Fix exit code of sve-ptrace
u64_stats,lockdep: Fix u64_stats_init() vs lockdep
staging: rtl8192e: fix kconfig dependency on CRYPTO
habanalabs: Call put_pid() when releasing control device
sparc64: Fix opcode filtering in handling of no fault loads
umem: fix error return code in mm_pci_probe()
kbuild: dummy-tools: fix inverted tests for gcc
kbuild: add image_name to no-sync-config-targets
irqchip/ingenic: Add support for the JZ4760
cifs: change noisy error message to FYI
atm: idt77252: fix null-ptr-dereference
atm: uPD98402: fix incorrect allocation
net: enetc: set MAC RX FIFO to recommended value
net: davicom: Use platform_get_irq_optional()
net: wan: fix error return code of uhdlc_init()
net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
NFS: Correct size calculation for create reply length
nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
gpiolib: acpi: Add missing IRQF_ONESHOT
cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
gfs2: fix use-after-free in trans_drain
cifs: ask for more credit on async read/write code paths
gianfar: fix jumbo packets+napi+rx overrun crash
sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count
net: intel: iavf: fix error return code of iavf_init_get_resources()
net: tehuti: fix error return code in bdx_probe()
blk-cgroup: Fix the recursive blkg rwstat
scsi: ufs: ufs-qcom: Disable interrupt in reset path
ixgbe: Fix memleak in ixgbe_configure_clsu32
ALSA: hda: ignore invalid NHLT table
Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"
atm: lanai: dont run lanai_dev_close if not open
atm: eni: dont release is never initialized
powerpc/4xx: Fix build errors from mfdcr()
net: fec: ptp: avoid register access when ipg clock is disabled
net: stmmac: fix dma physical address of descriptor when display ring
mt76: fix tx skb error handling in mt76_dma_tx_queue_skb
mm/memcg: set memcg when splitting page
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
ANDROID: roll back xt_IDLETIMER to 5.10.21 upstream/vanilla version
ANDROID: qcom: Add ip, rtnl and free related symbols
Conflicts:
Documentation/admin-guide/ramoops.rst
Documentation/devicetree/bindings
Documentation/devicetree/bindings/reserved-memory/ramoops.txt
Change-Id: I1bb9e2c15dd1c4bc6f9d75a930a97993bd03be7f
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
Changes in 5.10.27
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
mm/memcg: set memcg when splitting page
mt76: fix tx skb error handling in mt76_dma_tx_queue_skb
net: stmmac: fix dma physical address of descriptor when display ring
net: fec: ptp: avoid register access when ipg clock is disabled
powerpc/4xx: Fix build errors from mfdcr()
atm: eni: dont release is never initialized
atm: lanai: dont run lanai_dev_close if not open
Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"
ALSA: hda: ignore invalid NHLT table
ixgbe: Fix memleak in ixgbe_configure_clsu32
scsi: ufs: ufs-qcom: Disable interrupt in reset path
blk-cgroup: Fix the recursive blkg rwstat
net: tehuti: fix error return code in bdx_probe()
net: intel: iavf: fix error return code of iavf_init_get_resources()
sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count
gianfar: fix jumbo packets+napi+rx overrun crash
cifs: ask for more credit on async read/write code paths
gfs2: fix use-after-free in trans_drain
cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
gpiolib: acpi: Add missing IRQF_ONESHOT
nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
NFS: Correct size calculation for create reply length
net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
net: wan: fix error return code of uhdlc_init()
net: davicom: Use platform_get_irq_optional()
net: enetc: set MAC RX FIFO to recommended value
atm: uPD98402: fix incorrect allocation
atm: idt77252: fix null-ptr-dereference
cifs: change noisy error message to FYI
irqchip/ingenic: Add support for the JZ4760
kbuild: add image_name to no-sync-config-targets
kbuild: dummy-tools: fix inverted tests for gcc
umem: fix error return code in mm_pci_probe()
sparc64: Fix opcode filtering in handling of no fault loads
habanalabs: Call put_pid() when releasing control device
staging: rtl8192e: fix kconfig dependency on CRYPTO
u64_stats,lockdep: Fix u64_stats_init() vs lockdep
kselftest: arm64: Fix exit code of sve-ptrace
regulator: qcom-rpmh: Correct the pmic5_hfsmps515 buck
block: Fix REQ_OP_ZONE_RESET_ALL handling
drm/amd/display: Revert dram_clock_change_latency for DCN2.1
drm/amdgpu: fb BO should be ttm_bo_type_device
drm/radeon: fix AGP dependency
nvme: simplify error logic in nvme_validate_ns()
nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
nvme-core: check ctrl css before setting up zns
nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
nfs: we don't support removing system.nfs4_acl
block: Suppress uevent for hidden device when removed
mm/fork: clear PASID for new mm
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
static_call: Pull some static_call declarations to the type headers
static_call: Allow module use without exposing static_call_key
static_call: Fix the module key fixup
static_call: Fix static_call_set_init()
KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
btrfs: fix sleep while in non-sleep context during qgroup removal
selinux: don't log MAC_POLICY_LOAD record on failed policy load
selinux: fix variable scope issue in live sidtab conversion
netsec: restore phy power state after controller reset
platform/x86: intel-vbtn: Stop reporting SW_DOCK events
psample: Fix user API breakage
z3fold: prevent reclaim/free race for headless pages
squashfs: fix inode lookup sanity checks
squashfs: fix xattr id and id lookup sanity checks
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
kasan: fix per-page tags for non-page_alloc pages
gcov: fix clang-11+ support
ACPI: video: Add missing callback back for Sony VPCEH3U1E
ACPICA: Always create namespace nodes using acpi_ns_create_node()
arm64: stacktrace: don't trace arch_stack_walk()
arm64: dts: ls1046a: mark crypto engine dma coherent
arm64: dts: ls1012a: mark crypto engine dma coherent
arm64: dts: ls1043a: mark crypto engine dma coherent
ARM: dts: at91: sam9x60: fix mux-mask for PA7 so it can be set to A, B and C
ARM: dts: at91: sam9x60: fix mux-mask to match product's datasheet
ARM: dts: at91-sama5d27_som1: fix phy address to 7
integrity: double check iint_cache was initialized
drm/etnaviv: Use FOLL_FORCE for userptr
drm/amd/pm: workaround for audio noise issue
drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
drm/amdgpu: Add additional Sienna Cichlid PCI ID
drm/i915: Fix the GT fence revocation runtime PM logic
dm verity: fix DM_VERITY_OPTS_MAX value
dm ioctl: fix out of bounds array access when no devices
bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD
ARM: OMAP2+: Fix smartreflex init regression after dropping legacy data
soc: ti: omap-prm: Fix occasional abort on reset deassert for dra7 iva
veth: Store queue_mapping independently of XDP prog presence
bpf: Change inode_storage's lookup_elem return value from NULL to -EBADF
libbpf: Fix INSTALL flag order
net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets
net/mlx5e: When changing XDP program without reset, take refs for XSK RQs
net/mlx5e: Don't match on Geneve options in case option masks are all zero
ipv6: fix suspecious RCU usage warning
drop_monitor: Perform cleanup upon probe registration failure
macvlan: macvlan_count_rx() needs to be aware of preemption
net: sched: validate stab values
net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
igc: reinit_locked() should be called with rtnl_lock
igc: Fix Pause Frame Advertising
igc: Fix Supported Pause Frame Link Setting
igc: Fix igc_ptp_rx_pktstamp()
e1000e: add rtnl_lock() to e1000_reset_task
e1000e: Fix error handling in e1000_set_d0_lplu_state_82571
net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template
net: phy: broadcom: Add power down exit reset state delay
ftgmac100: Restart MAC HW once
clk: qcom: gcc-sc7180: Use floor ops for the correct sdcc1 clk
net: ipa: terminate message handler arrays
net: qrtr: fix a kernel-infoleak in qrtr_recvmsg()
flow_dissector: fix byteorder of dissected ICMP ID
selftests/bpf: Set gopt opt_class to 0 if get tunnel opt failed
netfilter: ctnetlink: fix dump of the expect mask attribute
net: hdlc_x25: Prevent racing between "x25_close" and "x25_xmit"/"x25_rx"
net: phylink: Fix phylink_err() function name error in phylink_major_config
tipc: better validate user input in tipc_nl_retrieve_key()
tcp: relookup sock for RST+ACK packets handled by obsolete req sock
can: isotp: isotp_setsockopt(): only allow to set low level TX flags for CAN-FD
can: isotp: TX-path: ensure that CAN frame flags are initialized
can: peak_usb: add forgotten supported devices
can: flexcan: flexcan_chip_freeze(): fix chip freeze for missing bitrate
can: kvaser_pciefd: Always disable bus load reporting
can: c_can_pci: c_can_pci_remove(): fix use-after-free
can: c_can: move runtime PM enable/disable to c_can_platform
can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
can: m_can: m_can_rx_peripheral(): fix RX being blocked by errors
mac80211: fix rate mask reset
mac80211: Allow HE operation to be longer than expected.
selftests/net: fix warnings on reuseaddr_ports_exhausted
nfp: flower: fix unsupported pre_tunnel flows
nfp: flower: add ipv6 bit to pre_tunnel control message
nfp: flower: fix pre_tun mask id allocation
ftrace: Fix modify_ftrace_direct.
drm/msm/dsi: fix check-before-set in the 7nm dsi_pll code
ionic: linearize tso skb with too many frags
net/sched: cls_flower: fix only mask bit check in the validate_ct_state
netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags
netfilter: nftables: allow to update flowtable flags
netfilter: flowtable: Make sure GC works periodically in idle system
libbpf: Fix error path in bpf_object__elf_init()
libbpf: Use SOCK_CLOEXEC when opening the netlink socket
ARM: dts: imx6ull: fix ubi filesystem mount failed
ipv6: weaken the v4mapped source check
octeontx2-af: Formatting debugfs entry rsrc_alloc.
octeontx2-af: Modify default KEX profile to extract TX packet fields
octeontx2-af: Remove TOS field from MKEX TX
octeontx2-af: Fix irq free in rvu teardown
octeontx2-pf: Clear RSS enable flag on interace down
octeontx2-af: fix infinite loop in unmapping NPC counter
net: check all name nodes in __dev_alloc_name
net: cdc-phonet: fix data-interface release on probe failure
igb: check timestamp validity
r8152: limit the RX buffer size of RTL8153A for USB 2.0
net: stmmac: dwmac-sun8i: Provide TX and RX fifo sizes
selinux: vsock: Set SID for socket returned by accept()
selftests: forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value
libbpf: Fix BTF dump of pointer-to-array-of-struct
bpf: Fix umd memory leak in copy_process()
can: isotp: tx-path: zero initialize outgoing CAN frames
drm/msm: fix shutdown hook in case GPU components failed to bind
drm/msm: Fix suspend/resume on i.MX5
arm64: kdump: update ppos when reading elfcorehdr
PM: runtime: Defer suspending suppliers
net/mlx5: Add back multicast stats for uplink representor
net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
net/mlx5e: Offload tuple rewrite for non-CT flows
net/mlx5e: Fix error path for ethtool set-priv-flag
PM: EM: postpone creating the debugfs dir till fs_initcall
net: bridge: don't notify switchdev for local FDB addresses
octeontx2-af: Fix memory leak of object buf
xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUG
RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs
net: Consolidate common blackhole dst ops
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
igb: avoid premature Rx buffer reuse
net: axienet: Properly handle PCS/PMA PHY for 1000BaseX mode
net: axienet: Fix probe error cleanup
net: phy: introduce phydev->port
net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()
net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S
net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M
Revert "netfilter: x_tables: Switch synchronization to RCU"
netfilter: x_tables: Use correct memory barriers.
dm table: Fix zoned model check and zone sectors check
mm/mmu_notifiers: ensure range_end() is paired with range_start()
Revert "netfilter: x_tables: Update remaining dereference to RCU"
ACPI: scan: Rearrange memory allocation in acpi_device_add()
ACPI: scan: Use unique number for instance_no
perf auxtrace: Fix auxtrace queue conflict
perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
io_uring: fix provide_buffers sign extension
block: recalculate segment count for multi-segment discards correctly
scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
scsi: qedi: Fix error return code of qedi_alloc_global_queues()
scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
smb3: fix cached file size problems in duplicate extents (reflink)
cifs: Adjust key sizes and key generation routines for AES256 encryption
locking/mutex: Fix non debug version of mutex_lock_io_nested()
x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
mm/memcg: fix 5.10 backport of splitting page memcg
fs/cachefiles: Remove wait_bit_key layout dependency
ch_ktls: fix enum-conversion warning
can: dev: Move device back to init netns on owning netns delete
r8169: fix DMA being used after buffer free if WoL is enabled
net: dsa: b53: VLAN filtering is global to all users
mac80211: fix double free in ibss_leave
ext4: add reclaim checks to xattr code
fs/ext4: fix integer overflow in s_log_groups_per_flex
Revert "xen: fix p2m size in dom0 for disabled memory hotplug case"
Revert "net: bonding: fix error return code of bond_neigh_init()"
nvme: fix the nsid value to print in nvme_validate_or_alloc_ns
can: peak_usb: Revert "can: peak_usb: add forgotten supported devices"
xen-blkback: don't leak persistent grants from xen_blkbk_map()
Linux 5.10.27
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7eafe976fd6bf33db6db4adb8ebf2ff087294a23
Make slub_debug a global variable so that it can
be used by the minidump module to reserve memory
for slab owner.
Bug: 177377077
Change-Id: I0548a0f0d7abfa1d2df864669fa3aae443fbd6ec
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
Make page_owner_enabled variable global so that it
can be used by the minidump module to reserve memory
for page owner.
Bug: 177374907
Change-Id: Ib6189466c810321d109fa7d32773728215887e84
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
Collect the time when each allocation is freed, to help with memory
analysis with kdump/ramdump. Add the timestamp also in the page_owner
debugfs file and print it in dump_page().
Having another timestamp when we free the page helps for debugging page
migration issues. For example both alloc and free timestamps being the
same can give hints that there is an issue with migrating memory, as
opposed to a page just being dropped during migration.
Link: https://lkml.kernel.org/r/20210203175905.12267-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 178721506
(cherry picked from https://lore.kernel.org/mm-commits/20210309004326.M_rrImRZI%25akpm@linux-foundation.org/)
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Change-Id: I99567a372536b4541ed81378baccecc171f78a72
If there are errors during swap read or write, they can easily fill the
log buffer and remove any previous messages that might be useful for
debugging, especially on systems that rely only on the kernel ring buffer
for logging.
For example, on a system using zram as swap, we are more likely to see
any page allocation errors preceding the swap write errors if the alerts
are ratelimited.
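The effect of ratelimiting can be illustrated with a small user-space
sketch. This is a toy model only, not the kernel's ratelimit code: the
struct toy_ratelimit type, the toy_ratelimit_ok() helper and the tick-based
clock are hypothetical names introduced for illustration. At most "burst"
messages pass per "interval" ticks; the rest are counted as missed
instead of being written to the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy limiter, not the kernel's ratelimit implementation. */
struct toy_ratelimit {
	unsigned long interval;	/* window length, in arbitrary ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

/* Return true if a message may be emitted at tick `now`. */
bool toy_ratelimit_ok(struct toy_ratelimit *rl, unsigned long now)
{
	assert(rl && rl->burst > 0);

	/* Open a fresh window once the current one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget: drop the message but remember that it happened. */
	rl->missed++;
	return false;
}
```

A caller would guard each error message with toy_ratelimit_ok(), so a
burst of swap write errors leaves earlier messages in the buffer intact.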
Link: https://lkml.kernel.org/r/20210201142055.29068-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 178886143
(cherry picked from commit 25eaab438dd58092c5f0c62118d933bf8b2fcc76)
Change-Id: Id9ec7098ee381128090a2aca181baed7f17b9843
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
The straight backport of 5.12's e1baddf8475b ("mm/memcg: set memcg when
splitting page") works fine in 5.11, but turned out to be wrong for 5.10:
because that relies on a separate flag, which must also be set for the
memcg to be recognized and uncharged and cleared when freeing. Fix that.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c2655835fd8cabdfe7dab737253de3ffb88da126 ]
If one or more notifiers fails .invalidate_range_start(), invoke
.invalidate_range_end() for "all" notifiers. If there are multiple
notifiers, those that did not fail are expecting _start() and _end() to
be paired, e.g. KVM's mmu_notifier_count would become imbalanced.
Disallow notifiers that can fail _start() from implementing _end() so
that it's unnecessary to either track which notifiers rejected _start(),
or had already succeeded prior to a failed _start().
Note, the existing behavior of calling _start() on all notifiers even
after a previous notifier failed _start() was an unintended "feature".
Make it canon now that the behavior is depended on for correctness.
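The pairing rule described above can be sketched as a user-space toy
model. It is not the kernel implementation: struct toy_notifier and the
toy_*() functions are hypothetical, and the "count" field only stands in
for a counter such as KVM's mmu_notifier_count. Every notifier gets
_start() even after a failure, and on failure every notifier that
implements _end() gets _end(), so the counters stay balanced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy notifier; the fields are illustrative only. */
struct toy_notifier {
	int fail_start;	/* nonzero: start() reports failure */
	int has_end;	/* nonzero: the notifier implements end() */
	int count;	/* invalidations currently in progress */
};

static int toy_start(struct toy_notifier *n)
{
	if (n->fail_start)
		return -EAGAIN;
	n->count++;
	return 0;
}

static void toy_end(struct toy_notifier *n)
{
	assert(n->count > 0);
	n->count--;
}

/* Call start() on all notifiers; on any failure, pair it with end(). */
int toy_invalidate_range_start(struct toy_notifier *v, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* A notifier that can fail start() must not implement end(). */
		assert(!(v[i].fail_start && v[i].has_end));
		/* Keep going after a failure so that every start() has run. */
		if (toy_start(&v[i]) != 0)
			ret = -EAGAIN;
	}

	if (ret) {
		/* Every end() here pairs with a start() that succeeded. */
		for (i = 0; i < n; i++)
			if (v[i].has_end)
				toy_end(&v[i]);
	}
	return ret;
}
```

Because failing notifiers are not allowed to implement _end(), the model
needs no bookkeeping about which notifier rejected _start().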
As of today, the bug is likely benign:
1. The only caller of the non-blocking notifier is OOM kill.
2. The only notifiers that can fail _start() are the i915 and Nouveau
drivers.
3. The only notifiers that utilize _end() are the SGI UV GRU driver
and KVM.
4. The GRU driver will never coincide with the i915/Nouveau drivers.
5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the
_guest_, and the guest is already doomed due to being an OOM victim.
Fix the bug now to play nice with future usage, e.g. KVM has a
potential use case for blocking memslot updates in KVM while an
invalidation is in-progress, and failure to unblock would result in said
updates being blocked indefinitely and hanging.
Found by inspection. Verified by adding a second notifier in KVM that
periodically returns -EAGAIN on non-blockable ranges, triggering OOM,
and observing that KVM exits with an elevated notifier count.
Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com
Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d85aecf2844ff02a0e5f077252b2461d4f10c9f0 upstream.
The current implementation of hugetlb_cgroup for shared mappings could
have different behavior. Consider the following two scenarios:
1. Assume initial css reference count of hugetlb_cgroup is 1:
1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
count is 2 associated with 1 file_region.
1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
count is 3 associated with 2 file_region.
1.3 coalesce_file_region will coalesce these two file_regions into
one. So css reference count is 3 associated with 1 file_region
now.
2. Assume initial css reference count of hugetlb_cgroup is 1 again:
2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
count is 2 associated with 1 file_region.
Therefore, we might have one file_region while holding one or more css
reference counts. This inconsistency could lead to an imbalanced css_get()
and css_put() pair. If we do css_put one by one (e.g. hole punch case),
scenario 2 would put one css reference too many. If we do css_put all
together (e.g. truncate case), scenario 1 will leak one css reference.
The imbalanced css_get() and css_put() pair would result in a non-zero
reference count when we try to destroy the hugetlb cgroup. The hugetlb
cgroup directory is removed, but the associated resources are not freed.
Under a busy workload this might ultimately result in OOM, or in failure
to create a new hugetlb cgroup.
In order to fix this, we have to make sure that each file_region holds
exactly one css reference. So in the coalesce_file_region case, we
should release one css reference before coalescing. Also, only put the
css reference when the entire file_region is removed.
The last thing to note is that the caller of region_add() will only hold
one reference to h_cg->css for the whole contiguous reservation region.
But this area might be scattered when some file_regions already reside
in it. As a result, many file_regions may share only one h_cg->css
reference. To ensure that each file_region holds exactly one css
reference, we should do css_get() for each file_region and release the
reference held by the caller when they are done.
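The invariant "one file_region holds exactly one css reference" can be
sketched with a toy model. The struct toy_css and struct toy_region types
and the toy_*() helpers are hypothetical and greatly simplified; they are
not the hugetlb code. Each region takes one reference when it is created,
coalescing drops the reference of the region that disappears, and removing
a whole region puts its single reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cgroup css and a reservation region. */
struct toy_css {
	int refs;
};

struct toy_region {
	long from, to;		/* covers [from, to) */
	struct toy_css *css;	/* owner of this region's single reference */
};

/* Creating a region takes exactly one reference on behalf of the region. */
void toy_region_init(struct toy_region *r, long from, long to,
		     struct toy_css *css)
{
	r->from = from;
	r->to = to;
	r->css = css;
	css->refs++;
}

/* Merge b into a when adjacent; b's reference is released first. */
int toy_coalesce(struct toy_region *a, struct toy_region *b)
{
	if (a->to != b->from || a->css != b->css)
		return 0;

	assert(b->css->refs > 0);
	b->css->refs--;		/* the merged region keeps only a's reference */
	a->to = b->to;
	b->css = NULL;
	b->from = b->to = 0;
	return 1;
}

/* Removing an entire region puts the one reference it holds. */
void toy_region_remove(struct toy_region *r)
{
	if (r->css) {
		assert(r->css->refs > 0);
		r->css->refs--;
		r->css = NULL;
	}
}
```

With these rules, reserving [1, 2) and [2, 3) separately and then
coalescing ends in the same reference count as reserving [1, 3) in one
call, which is the consistency the two scenarios above lack.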
[linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings]
Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com
Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR)
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1baddf8475b06cc56f4bafecf9a32a124343d9f upstream.
As described in the split_page() comment, for the non-compound high order
page, the sub-pages must be freed individually. If the memcg of the first
page is valid, the tail pages cannot be uncharged when they are freed.
For example, when alloc_pages_exact is used to allocate 1MB of contiguous
physical memory, 2MB is charged (kmemcg is enabled and __GFP_ACCOUNT is
set). When make_alloc_exact frees the unused 1MB and free_pages_exact
frees the requested 1MB, only 4KB (one page) is actually uncharged.
Therefore, the memcg of the tail page needs to be set when splitting a
page.
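The accounting problem can be reproduced in a toy model. The code below
is a user-space sketch under simplifying assumptions, not kernel code:
struct toy_memcg, struct toy_page and the toy_*() functions are
hypothetical, and only a memcg pointer is modelled (no separate flag).
A charge covers all sub-pages but is recorded on the head page only, so
freeing the sub-pages one by one uncharges a single page unless the memcg
is first copied to the tail pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a memory cgroup and a page descriptor. */
struct toy_memcg {
	long charged_pages;
};

struct toy_page {
	struct toy_memcg *memcg;	/* NULL: page does not know its memcg */
};

/* Charge nr pages but record the memcg on the head page only. */
void toy_charge(struct toy_page *pages, size_t nr, struct toy_memcg *cg)
{
	size_t i;

	cg->charged_pages += (long)nr;
	pages[0].memcg = cg;
	for (i = 1; i < nr; i++)
		pages[i].memcg = NULL;
}

/* The fix in miniature: give every tail page the head page's memcg. */
void toy_split_page_memcg(struct toy_page *pages, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++)
		pages[i].memcg = pages[0].memcg;
}

/* Free one sub-page; it is uncharged only if its memcg is known. */
void toy_free_page(struct toy_page *p)
{
	if (p->memcg) {
		assert(p->memcg->charged_pages > 0);
		p->memcg->charged_pages--;
		p->memcg = NULL;
	}
}
```

Charging 512 pages and freeing them individually leaves 511 pages charged
in this model; calling toy_split_page_memcg() before the frees brings the
charge back to zero.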
Michel:
There are at least two explicit users of __GFP_ACCOUNT with
alloc_pages_exact added recently. See 7efe8ef274024 ("KVM: arm64:
Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c419621873713
("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
just a theoretical issue.
Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Weilong Chen <chenweilong@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be6c8982e4ab9a41907555f601b711a7e2a17d4c upstream.
Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly pass
in page number argument.
This makes the interface name more generic, so that it can be used by
other potential users. In addition, the complete memcg info (memcg and
flag) needs to be set on the tail pages.
Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Weilong Chen <chenweilong@huawei.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-99941e2:
ANDROID: gki_defconfig: Enable NET_CLS_{BASIC,TCINDEX,MATCHALL} & NET_ACT_{GACT,MIRRED}
FROMLIST: selftests: Add a MREMAP_DONTUNMAP selftest for shmem
FROMLIST: mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings
ANDROID: GKI: enable CONFIG_CMA_SYSFS
ANDROID: make cma_sysfs experimental
FROMLIST: mm: cma: support sysfs
ANDROID: cpuidle: Move vendor hook to enter proper state
ANDROID: fix up ext4 build from 5.10.26
ANDROID: GKI: Enable DETECT_HUNG_TASK
ANDROID: refresh ABI XML to new version
ANDROID: GKI: refresh ABI XML
Linux 5.10.26
cifs: Fix preauth hash corruption
x86/apic/of: Fix CPU devicetree-node lookups
genirq: Disable interrupts for force threaded handlers
firmware/efi: Fix a use after bug in efi_mem_reserve_persistent
efi: use 32-bit alignment for efi_guid_t literals
static_call: Fix static_call_update() sanity check
MAINTAINERS: move the staging subsystem to lists.linux.dev
MAINTAINERS: move some real subsystems off of the staging mailing list
ext4: fix rename whiteout with fast commit
ext4: fix potential error in ext4_do_update_inode
ext4: do not try to set xattr into ea_inode if value is empty
ext4: stop inode update before return
ext4: find old entry again if failed to rename whiteout
ext4: fix error handling in ext4_end_enable_verity()
efivars: respect EFI_UNSUPPORTED return from firmware
x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
x86: Move TS_COMPAT back to asm/thread_info.h
kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
x86/ioapic: Ignore IRQ2 again
perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT
perf/x86/intel: Fix a crash caused by zero PEBS status
PCI: rpadlpar: Fix potential drc_name corruption in store functions
counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register
counter: stm32-timer-cnt: fix ceiling write max value
iio: hid-sensor-temperature: Fix issues of timestamp channel
iio: hid-sensor-prox: Fix scale not correct issue
iio: hid-sensor-humidity: Fix alignment issue of timestamp channel
iio: adc: adi-axi-adc: add proper Kconfig dependencies
iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask
iio: adc: ab8500-gpadc: Fix off by 10 to 3
iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler
iio: adis16400: Fix an error code in adis16400_initial_setup()
iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel
iio:adc:stm32-adc: Add HAS_IOMEM dependency
thunderbolt: Increase runtime PM reference count on DP tunnel discovery
thunderbolt: Initialize HopID IDAs in tb_switch_alloc()
usb: dwc3: gadget: Prevent EP queuing while stopping transfers
usb: dwc3: gadget: Allow runtime suspend if UDC unbinded
usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy-
usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct
usb: gadget: configfs: Fix KASAN use-after-free
usbip: Fix incorrect double assignment to udc->ud.tcp_rx
usb-storage: Add quirk to defeat Kindle's automatic unload
powerpc: Force inlining of cpu_has_feature() to avoid build failure
gfs2: bypass signal_our_withdraw if no journal
gfs2: move freeze glock outside the make_fs_rw and _ro functions
gfs2: Add common helper for holding and releasing the freeze glock
regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting
regulator: pca9450: Enable system reset on WDOG_B assertion
regulator: pca9450: Add SD_VSEL GPIO for LDO5
net: bonding: fix error return code of bond_neigh_init()
io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
io_uring: don't attempt IO reissue from the ring exit path
drm/amd/pm: fulfill the Polaris implementation for get_clock_by_type_with_latency()
s390/qeth: schedule TX NAPI on QAOB completion
ibmvnic: remove excessive irqsave
media: cedrus: h264: Support profile controls
io_uring: fix inconsistent lock state
iwlwifi: Add a new card for MA family
drm/amd/display: turn DPMS off on connector unplug
MIPS: compressed: fix build with enabled UBSAN
net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081
i40e: Fix endianness conversions
powerpc/sstep: Fix darn emulation
powerpc/sstep: Fix load-store and update emulation
RDMA/mlx5: Allow creating all QPs even when non RDMA profile is used
scsi: isci: Pass gfp_t flags in isci_port_bc_change_received()
scsi: isci: Pass gfp_t flags in isci_port_link_up()
scsi: isci: Pass gfp_t flags in isci_port_link_down()
scsi: mvsas: Pass gfp_t flags to libsas event notifiers
scsi: libsas: Introduce a _gfp() variant of event notifiers
scsi: libsas: Remove notifier indirection
scsi: pm8001: Neaten debug logging macros and uses
scsi: pm80xx: Fix pm8001_mpi_get_nvmd_resp() race condition
scsi: pm80xx: Make running_req atomic
scsi: pm80xx: Make mpi_build_cmd locking consistent
module: harden ELF info handling
module: avoid *goto*s in module_sig_check()
module: merge repetitive strings in module_sig_check()
RDMA/rtrs: Fix KASAN: stack-out-of-bounds bug
RDMA/rtrs: Introduce rtrs_post_send
RDMA/rtrs-srv: Jump to dereg_mr label if allocate iu fails
RDMA/rtrs: Remove unnecessary argument dir of rtrs_iu_free
bpf: Declare __bpf_free_used_maps() unconditionally
serial: stm32: fix DMA initialization error handling
tty: serial: stm32-usart: Remove set but unused 'cookie' variables
ibmvnic: serialize access to work queue on remove
ibmvnic: add some debugs
nvme-rdma: fix possible hang when failing to set io queues
gpiolib: Assign fwnode to parent's if no primary one provided
counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED
RISC-V: correct enum sbi_ext_rfence_fid
scsi: ufs: ufs-mediatek: Correct operator & -> &&
scsi: myrs: Fix a double free in myrs_cleanup()
scsi: lpfc: Fix some error codes in debugfs
riscv: Correct SPARSEMEM configuration
cifs: fix allocation size on newly created files
kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL again
net/qrtr: fix __netdev_alloc_skb call
io_uring: ensure that SQPOLL thread is started for exit
pstore: Fix warning in pstore_kill_sb()
i915/perf: Start hrtimer only if sampling the OA buffer
sunrpc: fix refcount leak for rpc auth modules
vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation
vfio: IOMMU_API should be selected
svcrdma: disable timeouts on rdma backchannel
NFSD: fix dest to src mount in inter-server COPY
NFSD: Repair misuse of sv_lock in 5.10.16-rt30.
nfsd: don't abort copies early
nfsd: Don't keep looking up unhashed files in the nfsd file cache
nvmet: don't check iosqes,iocqes for discovery controllers
nvme-tcp: fix a NULL deref when receiving a 0-length r2t PDU
nvme-tcp: fix possible hang when failing to set io queues
nvme-tcp: fix misuse of __smp_processor_id with preemption enabled
nvme: fix Write Zeroes limitations
ALSA: usb-audio: Fix unintentional sign extension issue
afs: Stop listxattr() from listing "afs.*" attributes
afs: Fix accessing YFS xattrs on a non-YFS server
ASoC: simple-card-utils: Do not handle device clock
ASoC: qcom: lpass-cpu: Fix lpass dai ids parse
ASoC: codecs: wcd934x: add a sanity check in set channel map
ASoC: qcom: sdm845: Fix array out of range on rx slim channels
ASoC: qcom: sdm845: Fix array out of bounds access
ASoC: SOF: intel: fix wrong poll bits in dsp power down
ASoC: SOF: Intel: unregister DMIC device on probe error
ASoC: Intel: bytcr_rt5640: Fix HP Pavilion x2 10-p0XX OVCD current threshold
ASoC: fsl_ssi: Fix TDM slot setup for I2S mode
drm/amd/display: Correct algorithm for reversed gamma
vhost-vdpa: set v->config_ctx to NULL if eventfd_ctx_fdget() fails
vhost-vdpa: fix use-after-free of v->config_ctx
btrfs: fix slab cache flags for free space tree bitmap
btrfs: fix race when cloning extent buffer during rewind of an old root
zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
zonefs: prevent use of seq files as swap file
zonefs: Fix O_APPEND async write handling
s390/pci: fix leak of PCI device structure
s390/pci: remove superfluous zdev->zbus check
s390/pci: refactor zpci_create_device()
s390/vtime: fix increased steal time accounting
Revert "PM: runtime: Update device status before letting suppliers suspend"
ALSA: hda/realtek: fix mute/micmute LEDs for HP 850 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP 440 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP 840 G8
ALSA: hda/realtek: Apply headset-mic quirks for Xiaomi Redmibook Air
ALSA: hda: generic: Fix the micmute led init state
ALSA: hda/realtek: apply pin quirk for XiaomiNotebook Pro
ALSA: dice: fix null pointer dereference when node is disconnected
spi: cadence: set cqspi to the driver_data field of struct device
ASoC: ak5558: Add MODULE_DEVICE_TABLE
ASoC: ak4458: Add MODULE_DEVICE_TABLE
ANDROID: refresh ABI XML to new version
ANDROID: refresh ABI
Linux 5.10.25
net: dsa: b53: Support setting learning on port
ALSA: usb-audio: Don't avoid stopping the stream at disconnection
Revert "nfsd4: a client's own opens needn't prevent delegations"
Revert "nfsd4: remove check_conflicting_opens warning"
fuse: fix live lock in fuse_iget()
RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes
bpf, selftests: Fix up some test_verifier cases for unprivileged
bpf: Add sanity check for upper ptr_limit
bpf: Simplify alu_limit masking for pointer arithmetic
bpf: Fix off-by-one for area size in creating mask to left
bpf: Prohibit alu ops for pointer types not defining ptr_limit
crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg
Linux 5.10.24
RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
KVM: arm64: Fix nVHE hyp panic host context restore
xen/events: avoid handling the same event on two cpus at the same time
xen/events: don't unmask an event channel when an eoi is pending
mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
mm/madvise: replace ptrace attach requirement for process_madvise
mm/userfaultfd: fix memory corruption due to writeprotect
KVM: arm64: Fix exclusive limit for IPA size
KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM: arm64: nvhe: Save the SPE context early
KVM: arm64: Avoid corrupting vCPU context register in guest exit
KVM: arm64: Fix range alignment when walking page tables
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
x86/sev-es: Use __copy_from_user_inatomic()
x86/sev-es: Correctly track IRQ states in runtime #VC handler
x86/entry: Move nmi entry/exit into common code
x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
x86/sev-es: Introduce ip_within_syscall_gap() helper
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
binfmt_misc: fix possible deadlock in bm_register_write
powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
powerpc: Fix inverted SET_FULL_REGS bitop
powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
sched/membarrier: fix missing local execution of ipi_sync_rq_state()
linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
zram: fix return value on writeback_store
include/linux/sched/mm.h: use rcu_dereference in in_vfork()
stop_machine: mark helpers __always_inline
seqlock,lockdep: Fix seqcount_latch_init()
powerpc/64s/exception: Clean up a missed SRR specifier
hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
perf/core: Flush PMU internal buffers for per-CPU events
arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds
configfs: fix a use-after-free in __configfs_open_file
nvme-fc: fix racing controller reset and create association
block: rsxx: fix error return code of rsxx_pci_probe()
NFSv4.2: fix return value of _nfs4_get_security_label()
NFS: Don't gratuitously clear the inode cache when lookup failed
NFS: Don't revalidate the directory permissions on a lookup failure
SUNRPC: Set memalloc_nofs_save() for sync tasks
arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()
cpufreq: qcom-hw: fix dereferencing freed memory 'data'
sh_eth: fix TRSCER mask for R7S72100
staging: comedi: pcl818: Fix endian problem for AI command data
staging: comedi: pcl711: Fix endian problem for AI command data
staging: comedi: me4000: Fix endian problem for AI command data
staging: comedi: dmm32at: Fix endian problem for AI command data
staging: comedi: das800: Fix endian problem for AI command data
staging: comedi: das6402: Fix endian problem for AI command data
staging: comedi: adv_pci1710: Fix endian problem for AI command data
staging: comedi: addi_apci_1500: Fix endian problem for command sample
staging: comedi: addi_apci_1032: Fix endian problem for COS sample
staging: rtl8192e: Fix possible buffer overflow in _rtl92e_wx_set_scan
staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
staging: rtl8712: unterminated string leads to read overflow
staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
misc: fastrpc: restrict user apps from sending kernel RPC messages
misc/pvpanic: Export module FDT device table
Revert "serial: max310x: rework RX interrupt handling"
usbip: fix vudc usbip_sockfd_store races leading to gpf
usbip: fix vhci_hcd attach_store() races leading to gpf
usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
usbip: fix vudc to check for stream socket
usbip: fix vhci_hcd to check for stream socket
usbip: fix stub_dev to check for stream socket
USB: serial: cp210x: add some more GE USB IDs
USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter
USB: serial: ch341: add new Product ID
USB: serial: io_edgeport: fix memory leak in edge_startup
xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
xhci: Improve detection of device initiated wake signal.
usb: xhci: do not perform Soft Retry for some xHCI hosts
usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
USB: usblp: fix a hang in poll() if disconnected
usb: dwc3: qcom: Honor wakeup enabled/disabled state
usb: dwc3: qcom: add ACPI device id for sc8180x
usb: dwc3: qcom: add URS Host support for sdm845 ACPI boot
usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
usb: gadget: f_uac1: stop playback on function disable
usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
USB: gadget: u_ether: Fix a configfs return code
USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
Goodix Fingerprint device is not a modem
cifs: do not send close in compound create+close requests
mmc: cqhci: Fix random crash when remove mmc module/card
mmc: core: Fix partition switch time for eMMC
mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants
xen/events: reset affinity of 2-level event when tearing it down
software node: Fix node registration
s390/dasd: fix hanging IO request during DASD driver unbind
s390/dasd: fix hanging DASD driver unbind
arm64: perf: Fix 64-bit event counter read truncation
arm64: mte: Map hotplugged memory as Normal Tagged
arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL
block: Try to handle busy underlying device on discard
block: Discard page cache of zone reset target range
Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")
ALSA: usb-audio: fix use after free in usb_audio_disconnect
ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe
ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()
ALSA: usb-audio: Apply the control quirk to Plantronics headsets
ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar
ALSA: hda: Avoid spurious unsol event handling during S3/S4
ALSA: hda: Flush pending unsolicited events before suspend
ALSA: hda: Drop the BATCH workaround for AMD controllers
ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support
ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5
ALSA: hda/hdmi: Cancel pending works before suspend
ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk
ARM: efistub: replace adrl pseudo-op with adr_l macro invocation
ARM: assembler: introduce adr_l, ldr_l and str_l macros
ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler
mmc: sdhci: Update firmware interface API
clk: qcom: gpucc-msm8998: Add resets, cxc, fix flags on gpu_gx_gdsc
scsi: target: core: Prevent underflow for service actions
scsi: target: core: Add cmd length set before cmd complete
scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
sysctl.c: fix underflow value setting risk in vm_table
drivers/base/memory: don't store phys_device in memory blocks
s390/smp: __smp_rescan_cpus() - move cpumask away from stack
kasan: fix memory corruption in kasan_bitops_tags test
i40e: Fix memory leak in i40e_probe
PCI: Fix pci_register_io_range() memory leak
kbuild: clamp SUBLEVEL to 255
ext4: don't try to processed freed blocks until mballoc is initialized
PCI/LINK: Remove bandwidth notification
drivers/base: build kunit tests without structleak plugin
PCI: mediatek: Add missing of_node_put() to fix reference leak
PCI: xgene-msi: Fix race in installing chained irq handler
Input: applespi - don't wait for responses to commands indefinitely.
sparc64: Use arch_validate_flags() to validate ADI flag
sparc32: Limit memblock allocation to low memory
clk: qcom: gdsc: Implement NO_RET_PERIPH flag
iommu/amd: Fix performance counter initialization
powerpc/64: Fix stack trace not displaying final frame
HID: logitech-dj: add support for the new lightspeed connection iteration
powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
powerpc: improve handling of unrecoverable system reset
spi: stm32: make spurious and overrun interrupts visible
powerpc/pci: Add ppc_md.discover_phbs()
Platform: OLPC: Fix probe error handling
mmc: sdhci-iproc: Add ACPI bindings for the RPi
mmc: mediatek: fix race condition between msdc_request_timeout and irq
mmc: mxs-mmc: Fix a resource leak in an error handling path in 'mxs_mmc_probe()'
iommu/vt-d: Clear PRQ overflow only when PRQ is empty
udf: fix silent AED tagLocation corruption
scsi: ufs: WB is only available on LUN #0 to #7
i2c: rcar: optimize cacheline to minimize HW race condition
i2c: rcar: faster irq code to minimize HW race condition
ath11k: fix AP mode for QCA6390
ath11k: start vdev if a bss peer is already created
ath11k: peer delete synchronization with firmware
net: enetc: initialize RFS/RSS memories for unused ports too
enetc: Fix unused var build warning for CONFIG_OF
net: dsa: tag_mtk: fix 802.1ad VLAN egress
net: dsa: tag_ar9331: let DSA core deal with TX reallocation
net: dsa: tag_gswip: let DSA core deal with TX reallocation
net: dsa: tag_dsa: let DSA core deal with TX reallocation
net: dsa: tag_brcm: let DSA core deal with TX reallocation
net: dsa: tag_edsa: let DSA core deal with TX reallocation
net: dsa: tag_lan9303: let DSA core deal with TX reallocation
net: dsa: tag_mtk: let DSA core deal with TX reallocation
net: dsa: tag_ocelot: let DSA core deal with TX reallocation
net: dsa: tag_qca: let DSA core deal with TX reallocation
net: dsa: trailer: don't allocate additional memory for padding/tagging
net: dsa: tag_ksz: don't allocate additional memory for padding/tagging
net: dsa: implement a central TX reallocation procedure
s390/qeth: fix notification for pending buffers during teardown
s390/qeth: improve completion of pending TX buffers
s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state
s390/qeth: don't replace a fully completed async TX buffer
net: hns3: fix error mask definition of flow director
cifs: fix credit accounting for extra channel
media: rc: compile rc-cec.c into rc-core
media: v4l: vsp1: Fix bru null pointer access
media: v4l: vsp1: Fix uif null pointer access
media: rkisp1: params: fix wrong bits settings
media: usbtv: Fix deadlock on suspend
sh_eth: fix TRSCER mask for R7S9210
qxl: Fix uninitialised struct field head.surface_id
s390/crypto: return -EFAULT if copy_to_user() fails
s390/cio: return -EFAULT if copy_to_user() fails
drm/i915: Wedge the GPU if command parser setup fails
drm/shmem-helpers: vunmap: Don't put pages for dma-buf
drm: meson_drv add shutdown function
drm: Use USB controller's DMA mask when importing dmabufs
drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff
drm/shmem-helper: Check for purged buffers in fault handler
drm/amdgpu/display: handle aux backlight in backlight_get_brightness
drm/amdgpu/display: don't assert in set backlight function
drm/amdgpu/display: simplify backlight setting
drm/amd/pm: bug fix for pcie dpm
drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth()
drm/amdgpu/display: use GFP_ATOMIC in dcn21_validate_bandwidth_fp()
drm/amd/display: Add a backlight module option
drm/compat: Clear bounce structures
gpio: fix gpio-device list corruption
gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
gpiolib: acpi: Allow to find GpioInt() resource by name and index
gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
bnxt_en: reliably allocate IRQ table on reset to avoid crash
s390/cio: return -EFAULT if copy_to_user() fails again
net: hns3: fix bug when calculating the TCAM table info
net: hns3: fix query vlan mask value error for flow director
perf report: Fix -F for branch & mem modes
perf traceevent: Ensure read cmdlines are null terminated.
mlxsw: spectrum_ethtool: Add an external speed to PTYS register
selftests: forwarding: Fix race condition in mirror installation
net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused
ethtool: fix the check logic of at least one channel for RX/TX
net: stmmac: fix wrongly set buffer2 valid when sph unsupport
net: stmmac: fix watchdog timeout during suspend/resume stress test
net: stmmac: stop each tx channel independently
perf build: Fix ccache usage in $(CC) when generating arch errno table
tools/resolve_btfids: Fix build error with older host toolchains
ixgbe: fail to create xfrm offload of IPsec tunnel mode SA
r8169: fix r8168fp_adjust_ocp_cmd function
s390/qeth: fix memory leak after failed TX Buffer allocation
net: qrtr: fix error return code of qrtr_sendmsg()
net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
net: davicom: Fix regulator not turned off on driver removal
net: davicom: Fix regulator not turned off on failed probe
net: lapbether: Remove netif_start_queue / netif_stop_queue
stmmac: intel: Fixes clock registration error seen for multiple interfaces
net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII
cipso,calipso: resolve a number of problems with the DOI refcounts
netdevsim: init u64 stats for 32bit hardware
net: usb: qmi_wwan: allow qmimux add/del with master up
net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10
net: mscc: ocelot: properly reject destination IP keys in VCAP IS1
net: sched: avoid duplicates in classes dump
nexthop: Do not flush blackhole nexthops when loopback goes down
net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10
net/mlx4_en: update moderation when config reset
net: ethernet: mtk-star-emac: fix wrong unmap in RX handling
net: enetc: keep RX ring consumer index in sync with hardware
net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr
net: enetc: force the RGMII speed and duplex instead of operating in inband mode
net: enetc: don't disable VLAN filtering in IFF_PROMISC mode
net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets
net: enetc: take the MDIO lock only once per NAPI poll cycle
net: enetc: don't overwrite the RSS indirection table when initializing
sh_eth: fix TRSCER mask for SH771x
net: dsa: tag_rtl4_a: fix egress tags
docs: networking: drop special stable handling
Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
cifs: return proper error code in statfs(2)
mount: fix mounting of detached mounts onto targets that reside on shared mounts
powerpc/603: Fix protection of user pages mapped with PROT_NONE
mt76: dma: do not report truncated frames to mac80211
ibmvnic: always store valid MAC address
ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning.
libbpf: Clear map_info before each bpf_obj_get_info_by_fd
samples, bpf: Add missing munmap in xdpsock
selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier
selftests/bpf: No need to drop the packet when there is no geneve opt
selftests/bpf: Use the last page in test_snprintf_btf on s390
net: phy: fix save wrong speed and duplex problem if autoneg is on
net: always use icmp{,v6}_ndo_send from ndo_start_xmit
netfilter: x_tables: gpf inside xt_find_revision()
netfilter: nf_nat: undo erroneous tcp edemux lookup
tcp: add sanity tests to TCP_QUEUE_SEQ
tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE)
can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode
can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode
can: flexcan: enable RX FIFO after FRZ/HALT valid
can: flexcan: assert FRZ bit in flexcan_chip_freeze()
can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership
net: l2tp: reduce log level of messages in receive path, add counter instead
net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
net: check if protocol extracted by virtio_net_hdr_set_proto is correct
net: Fix gro aggregation for udp encaps with zero csum
ath9k: fix transmitting to stations in dynamic SMPS mode
crypto: mips/poly1305 - enable for all MIPS processors
ethernet: alx: fix order of calls on resume
powerpc/pseries: Don't enforce MSI affinity with kdump
powerpc/perf: Fix handling of privilege level checks in perf interrupt context
uapi: nfnetlink_cthelper.h: fix userspace compilation error
Linux 5.10.23
nvme-pci: add quirks for Lexar 256GB SSD
nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
KVM: SVM: Clear the CR4 register on reset
scsi: ufs: Fix a duplicate dev quirk number
ASoC: Intel: sof_sdw: add quirk for HP Spectre x360 convertible
ASoC: Intel: sof_sdw: reorganize quirks by generation
PCI: cadence: Retrain Link to work around Gen2 training defect
ALSA: usb-audio: add mixer quirks for Pioneer DJM-900NXS2
ALSA: usb-audio: Add DJM750 to Pioneer mixer quirk
HID: i2c-hid: Add I2C_HID_QUIRK_NO_IRQ_AFTER_RESET for ITE8568 EC on Voyo Winpad A15
mmc: sdhci-of-dwcmshc: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN
drm/msm/a5xx: Remove overwriting A5XX_PC_DBG_ECO_CNTL register
scsi: ufs: ufs-exynos: Use UFSHCD_QUIRK_ALIGN_SG_WITH_PAGE_SIZE
scsi: ufs: ufs-exynos: Apply vendor-specific values for three timeouts
scsi: ufs: Introduce a quirk to allow only page-aligned sg entries
misc: eeprom_93xx46: Add quirk to support Microchip 93LC46B eeprom
scsi: ufs: Add a quirk to permit overriding UniPro defaults
scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A32
KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check
PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller
usb: cdns3: fix NULL pointer dereference on no platform data
usb: cdns3: add quirk for enable runtime pm by default
usb: cdns3: host: add xhci_plat_priv quirk XHCI_SKIP_PHY_INIT
usb: cdns3: host: add .suspend_quirk for xhci-plat.c
ASoC: Intel: bytcr_rt5640: Add quirk for ARCHOS Cesium 140
ACPI: video: Add DMI quirk for GIGABYTE GB-BXBT-2807
media: cx23885: add more quirks for reset DMA on some AMD IOMMU
HID: mf: add support for 0079:1846 Mayflash/Dragonrise USB Gamecube Adapter
platform/x86: acer-wmi: Add ACER_CAP_KBD_DOCK quirk for the Aspire Switch 10E SW3-016
platform/x86: acer-wmi: Add support for SW_TABLET_MODE on Switch devices
platform/x86: acer-wmi: Add ACER_CAP_SET_FUNCTION_MODE capability flag
platform/x86: acer-wmi: Add new force_caps module parameter
platform/x86: acer-wmi: Cleanup accelerometer device handling
platform/x86: acer-wmi: Cleanup ACER_CAP_FOO defines
bus: ti-sysc: Implement GPMC debug quirk to drop platform data
ASoC: Intel: sof_sdw: add quirk for new TigerLake-SDCA device
mwifiex: pcie: skip cancel_work_sync() on reset failure path
Bluetooth: btqca: Add valid le states quirk
iommu/amd: Fix sleeping in atomic in increase_address_space()
btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
btrfs: export and rename qgroup_reserve_meta
arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+
parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST
nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
ASoC: SOF: Intel: broadwell: fix mutual exclusion with catpt driver
ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling
Linux 5.10.22
r8169: fix resuming from suspend on RTL8105e if machine runs on battery
tomoyo: recognize kernel threads correctly
of: unittest: Fix build on architectures without CONFIG_OF_ADDRESS
Revert "arm64: dts: amlogic: add missing ethernet reset ID"
iommu/vt-d: Fix status code for Allocate/Free PASID command
rsxx: Return -EFAULT if copy_to_user() fails
ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount
ALSA: hda: intel-nhlt: verify config type
IB/mlx5: Add missing error code
RDMA/rxe: Fix missing kconfig dependency on CRYPTO
RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
ALSA: ctxfi: cthw20k2: fix mask on conf to allow 4 bits
mm: Remove examples from enum zone_type comment
arm64: mm: Set ZONE_DMA size based on early IORT scan
arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
of: unittest: Add test for of_dma_get_max_cpu_address()
of/address: Introduce of_dma_get_max_cpu_address()
arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
arm64: mm: Move reserve_crashkernel() into mem_init()
crypto - shash: reduce minimum alignment of shash_desc structure
drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
drm/amdgpu:disable VCN for Navi12 SKU
dm verity: fix FEC for RS roots unaligned to block size
dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size
io_uring: ignore double poll add on the same waitqueue head
ring-buffer: Force before_stamp and write_stamp to be different on discard
PM: runtime: Update device status before letting suppliers suspend
btrfs: fix warning when creating a directory with smack enabled
btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
btrfs: fix race between extent freeing/allocation when using bitmaps
btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
btrfs: fix race between swap file activation and snapshot creation
btrfs: fix race between writes to swap files and scrub
btrfs: fix raid6 qstripe kmap
btrfs: avoid double put of block group when emptying cluster
tpm, tpm_tis: Decorate tpm_get_timeouts() with request_locality()
tpm, tpm_tis: Decorate tpm_tis_gen_interrupt() with request_locality()
ALSA: usb-audio: Drop bogus dB range in too low level
ALSA: usb-audio: use Corsair Virtuoso mapping for Corsair Virtuoso SE
ALSA: hda/realtek: Enable headset mic of Acer SWIFT with ALC256
Conflicts:
drivers/cpufreq/qcom-cpufreq-hw.c
drivers/vfio/Kconfig
net/qrtr/qrtr.c
Change-Id: Ib622ea353c1c1db4b1cce31729d224df47902a57
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
Updates the branch to the 5.10.26 upstream kernel version.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I84aa29bf4e4e809051eb346830c4c4b5acb78c8c
Currently MREMAP_DONTUNMAP only accepts private anonymous mappings.
This restriction was placed initially for simplicity and not because
there exists a technical reason to do so.
This change will widen the support to include any mappings which are not
VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
MREMAP_DONTUNMAP on mappings which may have been created from a memfd.
This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL
if VM_DONTEXPAND or VM_PFNMAP mappings are specified.
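
The acceptance rule described above can be modelled in a few lines. This is a minimal sketch in Python, not the kernel code: the function name `mremap_dontunmap_check` is hypothetical, and the flag values are quoted from memory of include/linux/mm.h purely for illustration.

```python
# Illustrative model of the MREMAP_DONTUNMAP eligibility rule described in
# this commit message. Flag values are assumptions for illustration only.
VM_SHARED = 0x00000008
VM_PFNMAP = 0x00000400
VM_DONTEXPAND = 0x00040000
EINVAL = 22


def mremap_dontunmap_check(vm_flags: int) -> int:
    """Return 0 if the mapping may be used with MREMAP_DONTUNMAP, else -EINVAL.

    Hypothetical helper: after this change, only VM_DONTEXPAND and VM_PFNMAP
    mappings are rejected; shared mappings (e.g. from a memfd) are accepted.
    """
    if vm_flags & (VM_DONTEXPAND | VM_PFNMAP):
        return -EINVAL
    return 0


if __name__ == "__main__":
    for name, flags in [("private anon", 0),
                        ("shared (memfd)", VM_SHARED),
                        ("pfnmap", VM_PFNMAP),
                        ("dontexpand", VM_DONTEXPAND)]:
        print(name, mremap_dontunmap_check(flags))
```

In this model a shared mapping passes the check, which is the case the memfd use case depends on.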
Lokesh Gidra, who works on the Android JVM, provided an explanation of how
such a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd.
The garbage collector will use userfaultfd (uffd) on the java heap during
compaction. On accessing any uncompacted page, the application threads will
find it missing, at which point the thread will create the compacted page
and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
Before starting this compaction, in a stop-the-world pause the heap will be
mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
UFFD_EVENT_PAGEFAULT events after resuming execution.
To speedup mremap operations, pagetable movement was optimized by moving
PUD entries instead of PTE entries [1]. It was necessary as mremap of even
modest sized memory ranges also took several milliseconds, and stopping the
application for that long isn't acceptable in response-time sensitive
cases.
With UFFDIO_CONTINUE feature [2], it will be even more efficient to
implement this GC, particularly the 'non-moveable' portions of the heap.
It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this
patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
compaction."
[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%25akpm@linux-foundation.org/
[2] https://lore.kernel.org/linux-mm/20210302000133.272579-1-axelrasmussen@google.com/
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Link: https://lore.kernel.org/patchwork/patch/1401224/
Bug: 160737021
Bug: 169683130
Change-Id: Ic4f023dff404d7b0e35adbe92c7a12536aa0f70d
Since this feature is not stable until it is merged into Linus's tree,
mark it as experimental. A vendor who wants to use it must pass
cma_sysfs.experimental=Y on the kernel command line.
Otherwise, it will be disabled.
A vendor who enables it acknowledges that this is an experimental
feature, so Android does not guarantee it in the future.
Bug: 179256052
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ic6566197a7865dfcab6964d008103d3686c9d14b
Since CMA is getting used more widely, it's more important to
keep monitoring CMA statistics for system health since it's
directly related to user experience.
This patch introduces sysfs statistics for CMA, in order to provide
some basic monitoring of the CMA allocator.
* the number of CMA page successful allocations
* the number of CMA page allocation failures
These two values allow the user to calculate the allocation
failure rate for each CMA area.
e.g.)
/sys/kernel/mm/cma/WIFI/alloc_pages_[success|fail]
/sys/kernel/mm/cma/SENSOR/alloc_pages_[success|fail]
/sys/kernel/mm/cma/BLUETOOTH/alloc_pages_[success|fail]
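
As a rough illustration of how the two counters could be consumed from userspace, the sketch below computes the failure rate per CMA area. It assumes that each area directory holds two plain-text files named alloc_pages_success and alloc_pages_fail, each containing a single integer; the helper names are hypothetical.

```python
from pathlib import Path

# Root directory taken from the example paths in the commit message.
CMA_SYSFS_ROOT = Path("/sys/kernel/mm/cma")


def read_counter(path) -> int:
    """Read one sysfs counter file, assumed to hold a single integer."""
    return int(Path(path).read_text().strip())


def cma_failure_rate(area_dir) -> float:
    """Return fail / (success + fail) for one CMA area, or 0.0 if unused."""
    area = Path(area_dir)
    success = read_counter(area / "alloc_pages_success")
    fail = read_counter(area / "alloc_pages_fail")
    total = success + fail
    return fail / total if total else 0.0


def all_failure_rates(root=CMA_SYSFS_ROOT) -> dict:
    """Map each CMA area name under root to its failure rate."""
    root = Path(root)
    if not root.is_dir():
        # The sysfs nodes are absent when the feature is disabled.
        return {}
    return {d.name: cma_failure_rate(d)
            for d in sorted(root.iterdir()) if d.is_dir()}


if __name__ == "__main__":
    for name, rate in all_failure_rates().items():
        print(f"{name}: {rate:.2%}")
```

On a kernel without these nodes, `all_failure_rates()` simply returns an empty dictionary.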
The cma_stat is intentionally allocated dynamically so that its
lifetime follows kobject lifetime management.
https://lore.kernel.org/linux-mm/YCOAmXqt6dZkCQYs@kroah.com/
Link: https://lore.kernel.org/linux-mm/20210324230759.2213957-1-minchan@kernel.org/
Bug: 179256052
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: https://lore.kernel.org/linux-mm/20210316100433.17665-1-colin.king@canonical.com/
Addresses-Coverity: ("Dereference after null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I86239db91c7853a62a22b2161d1bf8c9099152b7
* refs/heads/tmp-44f812e:
ANDROID: sched/core: Move en/dequeue hooks before related callbacks
FROMGIT: kasan: record task_work_add() call stack
FROMGIT: kasan, mm: integrate slab init_on_free with HW_TAGS
FROMGIT: kasan, mm: integrate slab init_on_alloc with HW_TAGS
FROMGIT: kasan, mm: integrate page_alloc init with HW_TAGS
FROMGIT: mm: introduce debug_pagealloc_{map,unmap}_pages() helpers
FROMGIT: mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO
FROMGIT: mm/page_alloc: clear all pages in post_alloc_hook() with init_on_alloc=1
FROMGIT: mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY
FROMGIT: kernel/power: allow hibernation with page_poison sanity checking
FROMGIT: mm, page_poison: use static key more efficiently
BACKPORT: mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters
FROMGIT: kasan: init memory in kasan_(un)poison for HW_TAGS
FROMGIT: arm64: kasan: allow to init memory when setting tags
FROMGIT: mm, kasan: don't poison boot memory with tag-based modes
FROMGIT: kasan: initialize shadow to TAG_INVALID for SW_TAGS
FROMGIT: mm/kasan: switch from strlcpy to strscpy
BACKPORT: kasan: remove redundant config option
FROMGIT: kasan: fix per-page tags for non-page_alloc pages
FROMGIT: kasan: fix KASAN_STACK dependency for HW_TAGS
FROMGIT: kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
FROMGIT: arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL
FROMLIST: configfs: make directories inherit uid/gid from creator
ANDROID: GKI: add some padding to some driver core structures
ANDROID: Initial Android 12 OWNERS for abi metafiles
UPSTREAM: iommu/msm: Hook up iotlb_sync_map
UPSTREAM: memory: mtk-smi: Allow building as module
UPSTREAM: memory: mtk-smi: Use platform_register_drivers
UPSTREAM: iommu/mediatek: Fix error code in probe()
UPSTREAM: iommu/mediatek: Fix unsigned domid comparison with less than zero
UPSTREAM: iommu/mediatek: Add mt8192 support
UPSTREAM: memory: mtk-smi: Add mt8192 support
UPSTREAM: iommu/mediatek: Remove unnecessary check in attach_device
UPSTREAM: iommu/mediatek: Support master use iova over 32bit
UPSTREAM: iommu/mediatek: Add iova reserved function
UPSTREAM: iommu/mediatek: Support for multi domains
UPSTREAM: iommu/mediatek: Add get_domain_id from dev->dma_range_map
UPSTREAM: iommu/mediatek: Add iova_region structure
UPSTREAM: iommu/mediatek: Move geometry.aperture updating into domain_finalise
UPSTREAM: iommu/mediatek: Move domain_finalise into attach_device
UPSTREAM: iommu/mediatek: Adjust the structure
UPSTREAM: iommu/mediatek: Support report iova 34bit translation fault in ISR
UPSTREAM: iommu/mediatek: Support up to 34bit iova in tlb flush
UPSTREAM: iommu/mediatek: Add power-domain operation
UPSTREAM: iommu/mediatek: Add pm runtime callback
UPSTREAM: iommu/mediatek: Add device link for smi-common and m4u
UPSTREAM: iommu/mediatek: Add error handle for mtk_iommu_probe
UPSTREAM: iommu/mediatek: Move hw_init into attach_device
UPSTREAM: iommu/mediatek: Update oas for v7s
UPSTREAM: iommu/mediatek: Add a flag for iova 34bits case
UPSTREAM: iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek
UPSTREAM: iommu/io-pgtable-arm-v7s: Add cfg as a param in some macros
UPSTREAM: iommu/io-pgtable-arm-v7s: Clarify LVL_SHIFT/BITS macro
UPSTREAM: iommu/io-pgtable-arm-v7s: Use ias to check the valid iova in unmap
UPSTREAM: iommu/io-pgtable-arm-v7s: Extend PA34 for MediaTek
UPSTREAM: iommu/mediatek: Use the common mtk-memory-port.h
UPSTREAM: dt-bindings: mediatek: Add binding for mt8192 IOMMU
UPSTREAM: dt-bindings: memory: mediatek: Rename header guard for SMI header file
UPSTREAM: dt-bindings: memory: mediatek: Extend LARB_NR_MAX to 32
UPSTREAM: dt-bindings: memory: mediatek: Add a common memory header file
UPSTREAM: dt-bindings: memory: mediatek: Convert SMI to DT schema
UPSTREAM: dt-bindings: iommu: mediatek: Convert IOMMU to DT schema
UPSTREAM: iommu/mediatek: Remove the tlb-ops for v7s
UPSTREAM: iommu/io-pgtable: Remove TLBI_ON_MAP quirk
UPSTREAM: iommu/io-pgtable: Allow io_pgtable_tlb ops optional
UPSTREAM: iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
UPSTREAM: iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
BACKPORT: UPSTREAM: iommu: Add iova and size as parameters in iotlb_sync_map
UPSTREAM: iommu/io-pgtable: Remove tlb_flush_leaf
ANDROID: abi_gki_aarch64_qcom: Add symbols to allow list
ANDROID: Add vendor hook to binder.
ANDROID: fs: Add vendor hooks for ep_create_wakeup_source & timerfd_create
Revert "FROMLIST: fs/buffer.c: Revoke LRU when trying to drop buffers"
ANDROID: enable LLVM_IAS=1 for clang's integrated assembler for arm
FROMLIST: ARM: kprobes: rewrite test-arm.c in UAL
FROMLIST: ARM: kprobes: fix UNPREDICTABLE warnings
UPSTREAM: ARM: efistub: replace adrl pseudo-op with adr_l macro invocation
UPSTREAM: ARM: assembler: introduce adr_l, ldr_l and str_l macros
UPSTREAM: ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler
FROMGIT: binder: BINDER_GET_FROZEN_INFO ioctl
FROMGIT: binder: use EINTR for interrupted wait for work
BACKPORT: FROMGIT: binder: BINDER_FREEZE ioctl
ANDROID: qcom: Add pci_dev_present to ABI
ANDROID: GKI: Add sysfs_emit to symbol list
ANDROID: gki_defconfig: Enable IFB, NET_SCH_TBF, NET_ACT_POLICE
ANDROID: gki_defconfig: Enable USB_NET_CDC_NCM
ANDROID: gki_defconfig: Enable USB_NET_AQC111
UPSTREAM: usb: dwc3: gadget: Use max speed if unspecified
UPSTREAM: usb: dwc3: gadget: Set gadget_max_speed when set ssp_rate
ANDROID: freezer: export the freezer_cgrp_subsys for GKI purpose.
UPSTREAM: usb: dwc3: qcom: skip interconnect init for ACPI probe
FROMGIT: usb: dwc3: gadget: Ignore EP queue requests during bus reset
FROMGIT: usb: dwc3: gadget: Avoid continuing preparing TRBs during teardown
ANDROID: gpiolib: Add vendor hook for gpio read
ANDROID: abi_gki_aarch64_qcom: Whitelist sched_setattr
ANDROID: GKI: sched: add Android ABI padding to some structures
ANDROID: GKI: mm: add Android ABI padding to some structures
ANDROID: GKI: mount.h: add Android ABI padding to some structures
FROMLIST: mm: fs: Invalidate BH LRU during page migration
FROMLIST: mm: replace migrate_[prep|finish] with lru_cache_[disable|enable]
BACKPORT: FROMLIST: mm: disable LRU pagevec during the migration temporarily
Revert "FROMLIST: mm: replace migrate_prep with lru_add_drain_all"
Revert "BACKPORT: FROMLIST: mm: disable LRU pagevec during the migration temporarily"
Revert "FROMLIST: mm: fs: Invalidate BH LRU during page migration"
ANDROID: vendor_hooks: Add hooks for account process tick
ANDROID: usb: dwc3: gadget: Export dwc3_stop_active_transfer, dwc3_send_gadget_ep_cmd
ANDROID: clang: update to 12.0.4
ANDROID: vendor_hooks: Add hooks for improving binder trans
ANDROID: GKI: Disable DTPM CPU device
UPSTREAM: powercap/drivers/dtpm: Add the experimental label to the option description
UPSTREAM: powercap/drivers/dtpm: Fix root node initialization
ANDROID: GKI: sched.h: add Android ABI padding to some structures
ANDROID: GKI: module.h: add Android ABI padding to some structures
ANDROID: GKI: sock.h: add Android ABI padding to some structures
ANDROID: sched/fair: Do not sync task util with SD_BALANCE_FORK
FROMGIT: selinux: vsock: Set SID for socket returned by accept()
ANDROID: usb: typec: tcpci: Migrate restricted vendor hook
ANDROID: qcom: Add is_dma_buf_file to ABI
ANDROID: GKI: update .xml file
ANDROID: GKI: enable KFENCE by setting the sample interval to 500ms
ANDROID: abi_gki_aarch64_qcom: Add xhci symbols to list
ANDROID: vmlinux.lds.h: Define SANITIZER_DISCARDS with CONFIG_CFI_CLANG
ANDROID: usb: typec: tcpci: Add vendor hook to mask vbus present
ANDROID: usb: typce: tcpci: Add vendor hook for chip specific features
ANDROID: usb: typec: tcpci: Add vendor hooks for tcpci interface
FROMGIT: f2fs: add sysfs nodes to get runtime compression stat
ANDROID: dma-buf: Fix error path on system heaps use of the page pool
ANDROID: usb: typec: tcpm: Fix event storm caused by error in backport
ANDROID: GKI: USB: XHCI: add Android ABI padding to lots of xhci structures
FROMGIT: KVM: arm64: Fix host's ZCR_EL2 restore on nVHE
FROMGIT: KVM: arm64: Force SCTLR_EL2.WXN when running nVHE
FROMGIT: KVM: arm64: Turn SCTLR_ELx_FLAGS into INIT_SCTLR_EL2_MMU_ON
FROMGIT: KVM: arm64: Use INIT_SCTLR_EL2_MMU_OFF to disable the MMU on KVM teardown
FROMGIT: arm64: Use INIT_SCTLR_EL1_MMU_OFF to disable the MMU on CPU restart
FROMGIT: KVM: arm64: Enable SVE support for nVHE
FROMGIT: KVM: arm64: Save/restore SVE state for nVHE
BACKPORT: FROMGIT: KVM: arm64: Trap host SVE accesses when the FPSIMD state is dirty
FROMGIT: KVM: arm64: Save guest's ZCR_EL1 before saving the FPSIMD state
FROMGIT: KVM: arm64: Map SVE context at EL2 when available
BACKPORT: FROMGIT: KVM: arm64: Rework SVE host-save/guest-restore
FROMGIT: arm64: sve: Provide a conditional update accessor for ZCR_ELx
FROMGIT: KVM: arm64: Introduce vcpu_sve_vq() helper
FROMGIT: KVM: arm64: Let vcpu_sve_pffr() handle HYP VAs
FROMGIT: KVM: arm64: Use {read,write}_sysreg_el1 to access ZCR_EL1
FROMGIT: KVM: arm64: Provide KVM's own save/restore SVE primitives
ANDROID: GKI: USB: Gadget: add Android ABI padding to struct usb_gadget
ANDROID: vendor_hooks: Add hooks for memory when debug
ANDROID: vendor_hooks: Add hooks for ufs scheduler
ANDROID: GKI: sound/usb/card.h: add Android ABI padding to struct snd_usb_endpoint
ANDROID: GKI: user_namespace.h: add Android ABI padding to a structure
ANDROID: GKI: timer.h: add Android ABI padding to a structure
ANDROID: GKI: quota.h: add Android ABI padding to some structures
ANDROID: GKI: mmu_notifier.h: add Android ABI padding to some structures
ANDROID: GKI: mm.h: add Android ABI padding to a structure
ANDROID: GKI: kobject.h: add Android ABI padding to some structures
ANDROID: GKI: kernfs.h: add Android ABI padding to some structures
ANDROID: GKI: irqdomain.h: add Android ABI padding to a structure
ANDROID: GKI: ioport.h: add Android ABI padding to a structure
ANDROID: GKI: iomap.h: add Android ABI padding to a structure
ANDROID: GKI: hrtimer.h: add Android ABI padding to a structure
ANDROID: GKI: genhd.h: add Android ABI padding to some structures
ANDROID: GKI: ethtool.h: add Android ABI padding to a structure
ANDROID: GKI: dma-mapping.h: add Android ABI padding to a structure
ANDROID: GKI: networking: add Android ABI padding to a lot of networking structures
ANDROID: GKI: blk_types.h: add Android ABI padding to a structure
ANDROID: GKI: scsi.h: add Android ABI padding to a structure
ANDROID: GKI: pci: add Android ABI padding to some structures
ANDROID: GKI: add Android ABI padding to struct nf_conn
Conflicts:
Documentation/devicetree/bindings
include/linux/usb/gadget.h
Change-Id: Id08dc5a5299b4a780553a44a402d18e9b5b096cb
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
Why record task_work_add() call stack? Syzbot reports many use-after-free
issues for task_work, see [1]. After seeing the free stack and the
current auxiliary stack, we think they are useless: we don't know where
the work was registered. This work may be the free call stack, so we miss
the root cause and don't solve the use-after-free.
Add the task_work_add() call stack into the KASAN auxiliary stack in order
to improve KASAN reports. It helps programmers solve use-after-free
issues.
[1]: https://groups.google.com/g/syzkaller-bugs/search?q=kasan%20use-after-free%20task_work_run
Link: https://lkml.kernel.org/r/20210316024410.19967-1-walter-zh.wu@mediatek.com
Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 357e2e021b3a5c473b43a5a4d752139564bf27b8
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I38b2e1856ba9605bcdf0fb4fd4a7031596c8fe4a
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for slab memory when init_on_free is enabled.
With this change, memory initialization memset() is no longer called when
both HW_TAGS KASAN and init_on_free are enabled. Instead, memory is
initialized in KASAN runtime.
For SLUB, the memory initialization memset() is moved into
slab_free_hook() that currently directly follows the initialization loop.
A new argument is added to slab_free_hook() that indicates whether to
initialize the memory or not.
To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN hook and initialization memset() are
put together and a warning comment is added.
Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_free is enabled.
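The free-path arrangement described above can be modelled in userspace roughly as follows. This is a minimal sketch, not the kernel code: every model_* name, the kasan_hw_tags flag and the two counters are hypothetical stand-ins, and the real slab_free_hook() takes different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
static bool kasan_hw_tags;     /* models "HW_TAGS KASAN is enabled" */
static int memset_calls;       /* explicit init memset() invocations */
static int kasan_init_calls;   /* initializations done by the KASAN model */

/* Models the KASAN free hook: in HW_TAGS mode it tags and zeroes in one pass. */
static void model_kasan_slab_free(unsigned char *obj, size_t size, bool init)
{
	assert(obj != NULL);
	if (kasan_hw_tags && init) {
		memset(obj, 0, size);
		kasan_init_calls++;
	}
}

/*
 * Models slab_free_hook() with the new "init" argument. The memset() and
 * the KASAN hook are kept next to each other so that later changes cannot
 * make them disagree about which memory gets initialized.
 */
static void model_slab_free_hook(unsigned char *obj, size_t size, bool init)
{
	assert(obj != NULL);
	if (init && !kasan_hw_tags) {
		memset(obj, 0, size);
		memset_calls++;
	}
	model_kasan_slab_free(obj, size, init);
}
```

In this model the object is zeroed exactly once per free when init is requested, either by the explicit memset() or by the KASAN stand-in, and never by both.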
Link: https://lkml.kernel.org/r/190fd15c1886654afdec0d19ebebd5ade665b601.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 6b548c253039de9a1658bb4c38e13e963f06489d
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I3bdfe966b27dc93964ad38c9a8385ca744932307
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for slab memory when init_on_alloc is enabled.
With this change, memory initialization memset() is no longer called when
both HW_TAGS KASAN and init_on_alloc are enabled. Instead, memory is
initialized in KASAN runtime.
The memory initialization memset() is moved into slab_post_alloc_hook()
that currently directly follows the initialization loop. A new argument
is added to slab_post_alloc_hook() that indicates whether to initialize
the memory or not.
To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN hook and initialization memset() are
put together and a warning comment is added.
Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_alloc is enabled.
Link: https://lkml.kernel.org/r/c1292aeb5d519da221ec74a0684a949b027d7720.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit c7948d4407ed85251c6de1a09589e69e4072abb4
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I257917062e2cc5bfb3dbb46508200c30631a00a3
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for page_alloc memory when init_on_alloc/free is
enabled.
With this change, kernel_init_free_pages() is no longer called when both
HW_TAGS KASAN and init_on_alloc/free are enabled. Instead, memory is
initialized in KASAN runtime.
To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN and kernel_init_free_pages() hooks
are put together and a warning comment is added.
This patch changes the order in which memory initialization and page
poisoning hooks are called. This doesn't lead to any side-effects, as
whenever page poisoning is enabled, memory initialization gets disabled.
Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_alloc/free is enabled.
Link: https://lkml.kernel.org/r/e77f0d5b1b20658ef0b8288625c74c2b3690e725.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 26a7ee1a170e0bc17505d04120e595cba0b9cc1b
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Iac6cf801657c260b15ec9ef49bd1b02dc83660bc
Patch series "arch, mm: improve robustness of direct map manipulation", v7.
During recent discussion about KVM protected memory, David raised a
concern about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC
scope [1].
Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
possible that __kernel_map_pages() would fail, but since this function is
void, the failure will go unnoticed.
Moreover, there is a lack of consistency in __kernel_map_pages() semantics
across architectures as some guard this function with #ifdef
DEBUG_PAGEALLOC, some refuse to update the direct map if page allocation
debugging is disabled at run time and some allow modifying the direct map
regardless of DEBUG_PAGEALLOC settings.
This set straightens this out by restoring dependency of
__kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
accordingly.
Since currently the only user of __kernel_map_pages() outside
DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses
there more explicit.
[1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a40348269d@redhat.com
This patch (of 4):
When CONFIG_DEBUG_PAGEALLOC is enabled, it unmaps pages from the kernel
direct mapping after free_pages(). The pages then need to be mapped back
before they can be used. These mapping operations use
__kernel_map_pages() guarded with debug_pagealloc_enabled().
The only place that calls __kernel_map_pages() without checking whether
DEBUG_PAGEALLOC is enabled is the hibernation code that presumes
availability of this function when ARCH_HAS_SET_DIRECT_MAP is set. Still,
on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not
enabled but set_direct_map_invalid_noflush() may render some pages not
present in the direct map and hibernation code won't be able to save such
pages.
To make page allocation debugging and hibernation interaction more robust,
the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be
made more explicit.
Start with combining the guard condition and the call to
__kernel_map_pages() into debug_pagealloc_map_pages() and
debug_pagealloc_unmap_pages() functions to emphasize that
__kernel_map_pages() should not be called without DEBUG_PAGEALLOC and use
these new functions to map/unmap pages when page allocation debugging is
enabled.
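A userspace sketch of the wrapper pattern described above, assuming a plain boolean in place of the kernel's static key. All model_* names, debug_pagealloc_on and map_calls are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel checks a static key instead. */
static bool debug_pagealloc_on;   /* models debug_pagealloc being enabled */
static int map_calls;             /* counts calls to the low-level helper */

/* Models __kernel_map_pages(): only valid while page alloc debugging is on. */
static void model_kernel_map_pages(int numpages, int enable)
{
	assert(debug_pagealloc_on);
	(void)numpages;
	(void)enable;
	map_calls++;
}

/* Guard and call live in one place, so a caller cannot skip the check. */
static void model_debug_pagealloc_map_pages(int numpages)
{
	if (debug_pagealloc_on)
		model_kernel_map_pages(numpages, 1);
}

static void model_debug_pagealloc_unmap_pages(int numpages)
{
	if (debug_pagealloc_on)
		model_kernel_map_pages(numpages, 0);
}
```

With debug_pagealloc_on false, both wrappers are no-ops and the low-level helper is never reached.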
Link: https://lkml.kernel.org/r/20201109192128.960-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20201109192128.960-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 77bc7fd607dee2ffb28daff6d0dd8ae42af61ea8
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I9f0dac574bc3a7ea7d88bff051b77eca19610ce9
CONFIG_PAGE_POISONING_ZERO uses the zero pattern instead of 0xAA. It was
introduced by commit 1414c7f4f7d7 ("mm/page_poisoning.c: allow for zero
poisoning"), noting that using zeroes retains the benefit of sanitizing
content of freed pages, with the benefit of not having to zero them again
on alloc, and the downside of making some forms of corruption (stray
writes of NULLs) harder to detect than with the 0xAA pattern. Together
with CONFIG_PAGE_POISONING_NO_SANITY it made possible to sanitize the
contents on free without checking it back on alloc.
These days we have the init_on_free() option to achieve sanitization with
zeroes and to save clearing on alloc (and without checking on alloc).
Arguably if someone does choose to check the poison for corruption on
alloc, the savings of not clearing the page are secondary, and it makes
sense to always use the 0xAA poison pattern. Thus, remove the
CONFIG_PAGE_POISONING_ZERO option for being redundant.
Link: https://lkml.kernel.org/r/20201113104033.22907-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f289041ed4cf9a3f6e8a32068fef9ffb2acc5662
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I57e19d1dd77d3d5eec40f94c1b64a877c3710baa
Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
init_on_free=1 boot options") made init_on_alloc=1 initialize (clear/zero)
all pages leaving the buddy via alloc_pages() and friends at allocation
time.
However, the same logic is currently not applied to alloc_contig_pages():
allocated pages leaving the buddy aren't cleared with init_on_alloc=1 and
init_on_free=0. Let's also properly clear pages on that allocation path.
To achieve that, let's move clearing into post_alloc_hook(). This will
not only affect alloc_contig_pages() allocations but also any pages used
as migration target in compaction code via compaction_alloc().
While this sounds sub-optimal, it's the very same handling as when
allocating migration targets via alloc_migration_target() - pages will get
properly cleared with init_on_free=1. In case we ever want to optimize
migration in that regard, we should tackle all such migration users - if
we believe migration code can be fully trusted.
With this change, we will see double clearing of pages in some cases. One
example is gigantic pages (either allocated via CMA, or allocated
dynamically via alloc_contig_pages()) - which is the right thing to do
(and to be optimized outside of the buddy in the callers) as discussed in:
https://lkml.kernel.org/r/20201019182853.7467-1-gpiccoli@canonical.com
This change implies that with init_on_alloc=1:
- All CMA allocations will be cleared
- Gigantic pages allocated via alloc_contig_pages() will be cleared
- virtio-mem memory to be unplugged will be cleared. While this is
  suboptimal, it's similar to memory balloon drivers handling, where
  all pages to be inflated will get cleared as well.
- Pages isolated for compaction will be cleared
Link: https://lkml.kernel.org/r/20201120180452.19071-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 862b6dee20b0db2ebaa728c302a1b296ff144de3
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ie400a475598a5fae888d6bad32f32355a2d153b7
CONFIG_PAGE_POISONING_NO_SANITY skips the check on page alloc whether the
poison pattern was corrupted, suggesting a use-after-free. The motivation
to introduce it in commit 8823b1dbc05f ("mm/page_poison.c: enable
PAGE_POISONING as a separate option") was to simply sanitize freed pages,
optimally together with CONFIG_PAGE_POISONING_ZERO.
These days we have an init_on_free=1 boot option, which makes this use
case of page poisoning redundant. For sanitizing, writing zeroes is
sufficient; there is pretty much no benefit from writing the 0xAA poison
pattern to freed pages, without checking it back on alloc. Thus, remove
this option and suggest init_on_free instead in the main config's help.
Link: https://lkml.kernel.org/r/20201113104033.22907-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8f424750baaafcef229791882e879da01c9473b5
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I2ecd65191b6954db33d22df9cab0eb11bd934b8a
Page poisoning used to be incompatible with hibernation, as the state of
poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION
forces CONFIG_PAGE_POISONING_NO_SANITY. For the same reason, the
poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable
hibernation. The latter restriction was removed by commit 1ad1410f632d
("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and
similarly for init_on_free by commit 18451f9f9e58 ("PM: hibernate: fix
crashes with init_on_free=1") by making sure free pages are cleared after
resume.
We can use the same mechanism to instead poison free pages with
PAGE_POISON after resume. This covers both zero and 0xAA patterns. Thus
we can remove the Kconfig restriction that disables page poison sanity
checking when hibernation is enabled.
Link: https://lkml.kernel.org/r/20201113104033.22907-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernation]
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 03b6c9a3e8805606c0bb4ad41855fac3bf85c3b9
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ieea49ebb4d3eeddd18eb2040f13b8121978facca
Commit 11c9c7edae06 ("mm/page_poison.c: replace bool variable with static
key") changed page_poisoning_enabled() to a static key check. However,
the function is not inlined, so each check still involves a function call
with overhead not eliminated when page poisoning is disabled.
Analogously to how debug_pagealloc is handled, this patch converts
page_poisoning_enabled() back to a boolean check, and introduces
page_poisoning_enabled_static() for fast paths. Both functions are
inlined.
The function kernel_poison_pages() is also called unconditionally and does
the static key check inside. Remove the check from there and move it to the
callers. Also split it into two functions, kernel_poison_pages() and
kernel_unpoison_pages(), instead of the confusing bool parameter.
Also optimize the check that enables page poisoning instead of
debug_pagealloc for architectures without proper debug_pagealloc support.
Move the check to init_mem_debugging_and_hardening() to enable a single
static key instead of having two static branches in
page_poisoning_enabled_static().
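The split into separate poison and unpoison helpers, with the enable check moved to the callers, might look like this in a simplified userspace model. The 0xAA pattern comes from the text above; the model_* names, page_poisoning_on and the byte-buffer "page" are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_POISON 0xaa   /* poison pattern, as named in the text */

static bool page_poisoning_on;   /* hypothetical stand-in for the static key */

/* Unconditionally fill the page with the poison pattern. */
static void model_kernel_poison_pages(unsigned char *page, size_t len)
{
	assert(page != NULL);
	memset(page, MODEL_PAGE_POISON, len);
}

/* Return true if the poison pattern is intact, false if it was corrupted. */
static bool model_kernel_unpoison_pages(const unsigned char *page, size_t len)
{
	assert(page != NULL);
	for (size_t i = 0; i < len; i++)
		if (page[i] != MODEL_PAGE_POISON)
			return false;
	return true;
}

/* Callers do the cheap enabled check, so the disabled case costs no call. */
static void model_free_page(unsigned char *page, size_t len)
{
	if (page_poisoning_on)
		model_kernel_poison_pages(page, len);
}

static bool model_alloc_page_ok(const unsigned char *page, size_t len)
{
	if (page_poisoning_on)
		return model_kernel_unpoison_pages(page, len);
	return true;
}
```

Two single-purpose helpers replace one function with a bool parameter, and the helpers themselves no longer test whether poisoning is enabled.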
Link: https://lkml.kernel.org/r/20201113104033.22907-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8db26a3d47354ce7271a8cab03cd65b9d3d610b9
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ifc3fdf5cd58f3b8346bf81480df3836811e7458b
Patch series "cleanup page poisoning", v3.
I have identified a number of issues and opportunities for cleanup with
CONFIG_PAGE_POISON and friends:
- interaction with init_on_alloc and init_on_free parameters depends on
the order of parameters (Patch 1)
- the boot time enabling uses static key, but inefficiently (Patch 2)
- sanity checking is incompatible with hibernation (Patch 3)
- CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
init_on_free (Patch 4)
- CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we
have init_on_free (Patch 5)
This patch (of 5):
Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
produces a warning in dmesg that page_poison takes precedence. However,
as these warnings are printed in early_param handlers for
init_on_alloc/free, they are not printed if page_poison is enabled later
on the command line (handlers are called in the order of their
parameters), or when init_on_alloc/free is always enabled by the
respective config option - before the page_poison early param handler is
called, it is not considered to be enabled. This is inconsistent.
We can remove the dependency on order by making the init_on_* parameters
only set a boolean variable, and postponing the evaluation after all early
params have been processed. Introduce a new
init_mem_debugging_and_hardening() function for that, and move the related
debug_pagealloc processing there as well.
As a result, init_mem_debugging_and_hardening() always knows accurately if
init_on_* and/or page_poison options were enabled. Thus we can also
optimize want_init_on_alloc() and want_init_on_free(). We don't need to
check page_poisoning_enabled() there, we can instead not enable the
init_on_* static keys at all, if page poisoning is enabled. This results
in simpler and more efficient code.
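The deferred evaluation described above can be sketched as a small userspace model: the parameter handlers only record what was requested, and one function decides the final state afterwards. The struct, its field names and model_init_mem_debugging_and_hardening() are hypothetical; the kernel uses static keys and early_param handlers rather than plain booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the requested options and the resulting keys. */
struct mem_hardening {
	bool want_init_on_alloc;   /* init_on_alloc=1 seen, or config default */
	bool want_init_on_free;    /* init_on_free=1 seen, or config default */
	bool want_page_poison;     /* page_poison=1 seen */
	bool init_on_alloc_key;    /* models the init_on_alloc static key */
	bool init_on_free_key;     /* models the init_on_free static key */
	bool page_poison_key;      /* models the page poisoning static key */
};

/*
 * Models init_mem_debugging_and_hardening(): runs once, after every early
 * parameter has been parsed, so command-line order cannot change the result.
 */
static void model_init_mem_debugging_and_hardening(struct mem_hardening *s)
{
	assert(s != NULL);
	s->page_poison_key = s->want_page_poison;
	/* Page poisoning takes precedence: the init_on_* keys stay disabled. */
	s->init_on_alloc_key = s->want_init_on_alloc && !s->want_page_poison;
	s->init_on_free_key = s->want_init_on_free && !s->want_page_poison;
}
```

Because the init_on_* keys are never enabled while page poisoning is on, the hot-path checks in this model only need to read one flag each.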
Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Laura Abbott <labbott@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 04013513cc84c401c7de9023ff3eda7863fc4add
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Bug: 182930667
[glider: resolved a minor conflict in init/main.c - API change]
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I6c0ffcb0d8e2f56a688986aa1dc201adf89de067