android11-5.4
148 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Suren Baghdasaryan
|
ad939deb18 |
ANDROID: mm: remove sequence counting when mmap_lock is not exclusively owned
In a number of cases vm_write_{begin|end} is called while mmap_lock is not owned exclusively. This is unnecessary and can affect correctness of the sequence counting protecting speculative page fault handlers. Remove extra calls. Bug: 257443051 Change-Id: I1278638a0794448e22fbdab5601212b3b2eaebdc Signed-off-by: Suren Baghdasaryan <surenb@google.com> Git-commit: bfdcf47ca34dc3b7b63ca16b0a1856e57c57ee47 Git-repo: https://android.googlesource.com/kernel/common/ [quic_c_spathi@quicinc.com: resolve trivial merge conflicts] Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com> |
||
Srinivasarao Pathipati
|
b696332499 |
Merge android11-5.4.219+ (0ce03d1 ) into msm-5.4
* refs/heads/tmp-0ce03d1: Revert "wait: Fix __wait_event_hrtimeout for RT/DL tasks" Reverts below USB and netfilter patches BACKPORT: Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled UPSTREAM: bpf: Ensure correct locking around vulnerable function find_vpid() UPSTREAM: HID: roccat: Fix use-after-free in roccat_read() ANDROID: arm64: mm: perform clean & invalidation in __dma_map_area UPSTREAM: mmc: hsq: Fix data stomping during mmc recovery UPSTREAM: pinctrl: sunxi: Fix name for A100 R_PIO BACKPORT: mmc: core: Fix UHS-I SD 1.8V workaround branch UPSTREAM: Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression UPSTREAM: wifi: mac80211_hwsim: set virtio device ready in probe() BACKPORT: f2fs: don't use casefolded comparison for "." and ".." UPSTREAM: Revert "mm/cma.c: remove redundant cma_mutex lock" UPSTREAM: usb: dwc3: Try usb-role-switch first in dwc3_drd_init BACKPORT: usb: typec: ucsi: Fix reuse of completion structure BACKPORT: tipc: fix incorrect order of state message data sanity check UPSTREAM: net: fix up skbs delta_truesize in UDP GRO frag_list UPSTREAM: cgroup-v1: Correct privileges check in release_agent writes UPSTREAM: mm: don't try to NUMA-migrate COW pages that have other uses UPSTREAM: usb: raw-gadget: fix handling of dual-direction-capable endpoints UPSTREAM: selinux: check return value of sel_make_avc_files UPSTREAM: usb: musb: select GENERIC_PHY instead of depending on it BACKPORT: driver core: Fix error return code in really_probe() UPSTREAM: fscrypt: fix derivation of SipHash keys on big endian CPUs BACKPORT: fscrypt: rename FS_KEY_DERIVATION_NONCE_SIZE UPSTREAM: socionext: account for napi_gro_receive never returning GRO_DROP UPSTREAM: net: socionext: netsec: fix xdp stats accounting BACKPORT: fs: align IOCB_* flags with RWF_* flags UPSTREAM: efi: capsule-loader: Fix use-after-free in efi_capsule_write BACKPORT: ARM: 9039/1: assembler: generalize byte swapping macro into rev_l BACKPORT: ARM: 9035/1: uncompress: Add be32tocpu macro UPSTREAM: drm/meson: Fix overflow implicit truncation warnings UPSTREAM: irqchip/tegra: Fix overflow implicit truncation warnings UPSTREAM: video: fbdev: pxa3xx-gcu: Fix integer overflow in pxa3xx_gcu_write UPSTREAM: mm/mremap: hold the rmap lock in write mode when moving page table entries. FROMLIST: binder: fix UAF of alloc->vma in race with munmap() UPSTREAM: mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() UPSTREAM: mm: Force TLB flush for PFNMAP mappings before unlink_file_vma() UPSTREAM: af_key: Do not call xfrm_probe_algs in parallel UPSTREAM: wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans() UPSTREAM: wifi: cfg80211/mac80211: reject bad MBSSID elements UPSTREAM: wifi: cfg80211: ensure length byte is present before access UPSTREAM: wifi: cfg80211: fix BSS refcounting bugs UPSTREAM: wifi: cfg80211: avoid nontransmitted BSS list corruption UPSTREAM: wifi: mac80211_hwsim: avoid mac80211 warning on bad rate UPSTREAM: wifi: cfg80211: update hidden BSSes to avoid WARN_ON UPSTREAM: mac80211: mlme: find auth challenge directly UPSTREAM: wifi: mac80211: don't parse mbssid in assoc response ANDROID: GKI: db845c: Update symbols list and ABI UPSTREAM: wifi: mac80211: fix MBSSID parsing use-after-free ANDROID: Drop explicit 'CONFIG_INIT_STACK_ALL_ZERO=y' from gki_defconfig UPSTREAM: hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero UPSTREAM: hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO UPSTREAM: hardening: Clarify Kconfig text for auto-var-init ANDROID: GKI: Update FCNT KMI symbol list Linux 5.4.219 wifi: mac80211: fix MBSSID parsing use-after-free wifi: mac80211: don't parse mbssid in assoc response mac80211: mlme: find auth challenge directly Revert "fs: check FMODE_LSEEK to control internal pipe splicing" Linux 5.4.218 Input: xpad - fix wireless 360 controller breaking after suspend Input: xpad - add supported devices as contributed on github wifi: cfg80211: update hidden BSSes to avoid WARN_ON wifi: mac80211_hwsim: avoid mac80211 warning on bad rate wifi: cfg80211: avoid nontransmitted BSS list corruption wifi: cfg80211: fix BSS refcounting bugs wifi: cfg80211: ensure length byte is present before access wifi: cfg80211/mac80211: reject bad MBSSID elements wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans() random: use expired timer rather than wq for mixing fast pool random: avoid reading two cache lines on irq randomness random: restore O_NONBLOCK support USB: serial: qcserial: add new usb-id for Dell branded EM7455 scsi: stex: Properly zero out the passthrough command structure efi: Correct Macmini DMI match in uefi cert quirk ALSA: hda: Fix position reporting on Poulsbo random: clamp credited irq bits to maximum mixed ceph: don't truncate file in atomic_open nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure nilfs2: fix leak of nilfs_root in case of writer thread creation failure nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level() rpmsg: qcom: glink: replace strncpy() with strscpy_pad() mmc: core: Terminate infinite loop in SD-UHS voltage switch mmc: core: Replace with already defined values for readability USB: serial: ftdi_sio: fix 300 bps rate for SIO usb: mon: make mmapped memory read only arch: um: Mark the stack non-executable to fix a binutils warning um: Cleanup compiler warning in arch/x86/um/tls_32.c um: Cleanup syscall_handler_t cast in syscalls_32.h net/ieee802154: fix uninit value bug in dgram_sendmsg scsi: qedf: Fix a UAF bug in __qedf_probe() ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API failure dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property firmware: arm_scmi: Add SCMI PM driver remove routine fs: fix UAF/GPF bug in nilfs_mdt_destroy perf tools: Fixup get_current_dir_name() compilation mm: pagewalk: Fix race between unmap and page walker ANDROID: Fix kenelci build-break for !CONFIG_PERF_EVENTS BACKPORT: HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report Linux 5.4.217 docs: update mediator information in CoC docs Makefile.extrawarn: Move -Wcast-function-type-strict to W=1 Revert "drm/amdgpu: use dirty framebuffer helper" xfs: remove unused variable 'done' xfs: fix uninitialized variable in xfs_attr3_leaf_inactive xfs: streamline xfs_attr3_leaf_inactive xfs: move incore structures out of xfs_da_format.h xfs: fix memory corruption during remote attr value buffer invalidation xfs: refactor remote attr value buffer invalidation xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read xfs: fix s_maxbytes computation on 32-bit kernels xfs: truncate should remove all blocks, not just to the end of the page cache xfs: introduce XFS_MAX_FILEOFF xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag x86/speculation: Add RSB VM Exit protections x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current x86/speculation: Disable RRSBA behavior x86/bugs: Add Cannon lake to RETBleed affected CPU list x86/cpu/amd: Enumerate BTC_NO x86/common: Stamp out the stepping madness x86/speculation: Fill RSB on vmexit for IBRS KVM: VMX: Fix IBRS handling after vmexit KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS KVM: VMX: Convert launched argument to flags KVM: VMX: Flatten __vmx_vcpu_run() KVM/nVMX: Use __vmx_vcpu_run in nested_vmx_check_vmentry_hw KVM/VMX: Use TEST %REG,%REG instead of CMP $0,%REG in vmenter.S x86/speculation: Remove x86_spec_ctrl_mask x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit x86/speculation: Fix SPEC_CTRL write on SMT state change x86/speculation: Fix firmware entry SPEC_CTRL handling x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n x86/speculation: Change FILL_RETURN_BUFFER to work with objtool intel_idle: Disable IBRS during long idle x86/bugs: Report Intel retbleed vulnerability x86/bugs: Split spectre_v2_select_mitigation() and spectre_v2_user_select_mitigation() x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS x86/bugs: Optimize SPEC_CTRL MSR writes x86/entry: Add kernel IBRS implementation x86/entry: Remove skip_r11rcx x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value x86/bugs: Add AMD retbleed= boot parameter x86/bugs: Report AMD retbleed vulnerability x86/cpufeatures: Move RETPOLINE flags to word 11 x86/kvm/vmx: Make noinstr clean x86/cpu: Add a steppings field to struct x86_cpu_id x86/cpu: Add consistent CPU match macros x86/devicetable: Move x86 specific macro out of generic code Revert "x86/cpu: Add a steppings field to struct x86_cpu_id" Revert "x86/speculation: Add RSB VM Exit protections" Linux 5.4.216 clk: iproc: Do not rely on node name for correct PLL setup clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks selftests: Fix the if conditions of in test_extra_filter() nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices nvme: add new line after variable declatation usbnet: Fix memory leak in usbnet_disconnect() Input: melfas_mip4 - fix return value check in mip4_probe() Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time" soc: sunxi: sram: Fix debugfs info for A64 SRAM C soc: sunxi: sram: Fix probe function ordering issues soc: sunxi_sram: Make use of the helper function devm_platform_ioremap_resource() soc: sunxi: sram: Prevent the driver from being unbound soc: sunxi: sram: Actually claim SRAM regions ARM: dts: am33xx: Fix MMCHS0 dma properties ARM: dts: Move am33xx and am43xx mmc nodes to sdhci-omap driver media: dvb_vb2: fix possible out of bound access mm: fix madivse_pageout mishandling on non-LRU page mm/migrate_device.c: flush TLB while holding PTL mm: prevent page_frag_alloc() from corrupting the memory mm/page_alloc: fix race condition between build_all_zonelists and page allocation mmc: moxart: fix 4-bit bus width and remove 8-bit bus width libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205 Revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()" ntfs: fix BUG_ON in ntfs_lookup_inode_by_name() ARM: dts: integrator: Tag PCI host with device_type clk: ingenic-tcu: Properly enable registers before accessing timers net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455 uas: ignore UAS for Thinkplus chips usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS uas: add no-uas quirk for Hiksemi usb_disk ANDROID: ABI: Update allowed list for QCOM UPSTREAM: wifi: mac80211_hwsim: use 32-bit skb cookie UPSTREAM: wifi: mac80211_hwsim: add back erroneously removed cast UPSTREAM: wifi: mac80211_hwsim: fix race condition in pending packet Linux 5.4.215 ext4: make directory inode spreading reflect flexbg size xfs: fix use-after-free when aborting corrupt attr inactivation xfs: fix an ABBA deadlock in xfs_rename xfs: don't commit sunit/swidth updates to disk if that would cause repair failures xfs: split the sunit parameter update into two parts xfs: refactor agfl length computation function xfs: use bitops interface for buf log item AIL flag check xfs: stabilize insert range start boundary to avoid COW writeback race xfs: fix some memory leaks in log recovery xfs: always log corruption errors xfs: constify the buffer pointer arguments to error functions xfs: convert EIO to EFSCORRUPTED when log contents are invalid xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename() xfs: attach dquots and reserve quota blocks during unwritten conversion xfs: range check ri_cnt when recovering log items xfs: add missing assert in xfs_fsmap_owner_from_rmap xfs: slightly tweak an assert in xfs_fs_map_blocks xfs: replace -EIO with -EFSCORRUPTED for corrupt metadata ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 workqueue: don't skip lockdep work dependency in cancel_work_sync() drm/rockchip: Fix return type of cdn_dp_connector_mode_valid drm/amd/display: Limit user regamma to a valid value drm/amdgpu: use dirty framebuffer helper Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region cifs: always initialize struct msghdr smb_msg completely usb: xhci-mtk: fix issue of out-of-bounds array access s390/dasd: fix Oops in dasd_alias_get_start_dev due to missing pavgroup serial: tegra-tcu: Use uart_xmit_advance(), fixes icount.tx accounting serial: tegra: Use uart_xmit_advance(), fixes icount.tx accounting serial: Create uart_xmit_advance() net: sched: fix possible refcount leak in tc_new_tfilter() net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD perf kcore_copy: Do not check /proc/modules is unchanged perf jit: Include program header in ELF files can: gs_usb: gs_can_open(): fix race dev->can.state condition netfilter: ebtables: fix memory leak when blob is malformed net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs net/sched: taprio: avoid disabling offload when it was never enabled of: mdio: Add of_node_put() when breaking out of for_each_xx i40e: Fix set max_tx_rate when it is lower than 1 Mbps i40e: Fix VF set max MTU size iavf: Fix set max MTU size with port VLAN and jumbo frames iavf: Fix bad page state MIPS: Loongson32: Fix PHY-mode being left unspecified MIPS: lantiq: export clk_get_io() for lantiq_wdt.ko net: team: Unsync device addresses on ndo_stop ipvlan: Fix out-of-bound bugs caused by unset skb->mac_header iavf: Fix cached head and tail value for iavf_get_tx_pending netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find() netfilter: nf_conntrack_irc: Tighten matching on DCC message netfilter: nf_conntrack_sip: fix ct_sip_walk_headers arm64: dts: rockchip: Remove 'enable-active-low' from rk3399-puma arm64: dts: rockchip: Set RK3399-Gru PCLK_EDP to 24 MHz arm64: dts: rockchip: Pull up wlan wake# on Gru-Bob mm/slub: fix to return errno if kmalloc() fails efi: libstub: check Shim mode using MokSBStateRT ALSA: hda/realtek: Enable 4-speaker output Dell Precision 5530 laptop ALSA: hda/realtek: Add quirk for ASUS GA503R laptop ALSA: hda/realtek: Add pincfg for ASUS G533Z HP jack ALSA: hda/realtek: Add pincfg for ASUS G513 HP jack ALSA: hda/realtek: Re-arrange quirk table entries ALSA: hda/realtek: Add quirk for Huawei WRT-WX9 ALSA: hda: add Intel 5 Series / 3400 PCI DID ALSA: hda/tegra: set depop delay for tegra USB: serial: option: add Quectel RM520N USB: serial: option: add Quectel BG95 0x0203 composition USB: core: Fix RST error in hub.c Revert "usb: gadget: udc-xilinx: replace memcpy with memcpy_toio" Revert "usb: add quirks for Lenovo OneLink+ Dock" usb: cdns3: fix issue with rearming ISO OUT endpoint usb: gadget: udc-xilinx: replace memcpy with memcpy_toio usb: add quirks for Lenovo OneLink+ Dock tty: serial: atmel: Preserve previous USART mode if RS485 disabled serial: atmel: remove redundant assignment in rs485_config tty/serial: atmel: RS485 & ISO7816: wait for TXRDY before sending data wifi: mac80211: Fix UAF in ieee80211_scan_rx() usb: xhci-mtk: relax TT periodic bandwidth allocation usb: xhci-mtk: allow multiple Start-Split in a microframe usb: xhci-mtk: add some schedule error number usb: xhci-mtk: add a function to (un)load bandwidth info usb: xhci-mtk: use @sch_tt to check whether need do TT schedule usb: xhci-mtk: add only one extra CS for FS/LS INTR usb: xhci-mtk: get the microframe boundary for ESIT usb: dwc3: gadget: Avoid duplicate requests to enable Run/Stop usb: dwc3: gadget: Don't modify GEVNTCOUNT in pullup() usb: dwc3: gadget: Refactor pullup() usb: dwc3: gadget: Prevent repeat pullup() usb: dwc3: Issue core soft reset before enabling run/stop usb: dwc3: gadget: Avoid starting DWC3 gadget during UDC unbind ALSA: hda/sigmatel: Fix unused variable warning for beep power change cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all() video: fbdev: pxa3xx-gcu: Fix integer overflow in pxa3xx_gcu_write mksysmap: Fix the mismatch of 'L0' symbols in System.map MIPS: OCTEON: irq: Fix octeon_irq_force_ciu_mapping() afs: Return -EAGAIN, not -EREMOTEIO, when a file already locked net: usb: qmi_wwan: add Quectel RM520N ALSA: hda/tegra: Align BDL entry to 4KB boundary ALSA: hda/sigmatel: Keep power up while beep is enabled rxrpc: Fix calc of resend age rxrpc: Fix local destruction being repeated regulator: pfuze100: Fix the global-out-of-bounds access in pfuze100_regulator_probe() ASoC: nau8824: Fix semaphore unbalance at error paths iomap: iomap that extends beyond EOF should be marked dirty MAINTAINERS: add Chandan as xfs maintainer for 5.4.y cifs: don't send down the destination address to sendmsg for a SOCK_STREAM cifs: revalidate mapping when doing direct writes tracing: hold caller_addr to hardirq_{enable,disable}_ip task_stack, x86/cea: Force-inline stack helpers ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC parisc: ccio-dma: Add missing iounmap in error path in ccio_probe() drm/meson: Fix OSD1 RGB to YCbCr coefficient drm/meson: Correct OSD1 global alpha value gpio: mpc8xxx: Fix support for IRQ_TYPE_LEVEL_LOW flow_type in mpc85xx NFSv4: Turn off open-by-filehandle and NFS re-export for NFSv4.0 of: fdt: fix off-by-one error in unflatten_dt_nodes() Revert "USB: core: Prevent nested device-reset calls" Revert "io_uring: disable polling pollfree files" Revert "netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y" Revert "sched/deadline: Fix priority inheritance with multiple scheduling classes" Revert "kernel/sched: Remove dl_boosted flag comment" Revert "mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse" Revert "fs: check FMODE_LSEEK to control internal pipe splicing" Linux 5.4.214 tracefs: Only clobber mode/uid/gid on remount if asked soc: fsl: select FSL_GUTS driver for DPIO net: dp83822: disable rx error interrupt mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() usb: storage: Add ASUS <0x0b05:0x1932> to IGNORE_UAS platform/x86: acer-wmi: Acer Aspire One AOD270/Packard Bell Dot keymap fixes perf/arm_pmu_platform: fix tests for platform_get_irq() failure nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change() Input: iforce - add support for Boeder Force Feedback Wheel ieee802154: cc2520: add rc code in cc2520_tx() tg3: Disable tg3 device on system reboot to avoid triggering AER hid: intel-ish-hid: ishtp: Fix ishtp client sending disordered message HID: ishtp-hid-clientHID: ishtp-hid-client: Fix comment typo drm/msm/rd: Fix FIFO-full deadlock Linux 5.4.213 MIPS: loongson32: ls1c: Fix hang during startup x86/nospec: Fix i386 RSB stuffing sch_sfb: Also store skb len before calling child enqueue tcp: fix early ETIMEDOUT after spurious non-SACK RTO nvme-tcp: fix UAF when detecting digest errors RDMA/mlx5: Set local port to one when accessing counters ipv6: sr: fix out-of-bounds read when setting HMAC data. RDMA/siw: Pass a pointer to virt_to_page() i40e: Fix kernel crash during module removal tipc: fix shift wrapping bug in map_get() sch_sfb: Don't assume the skb is still around after enqueueing to child afs: Use the operation issue time instead of the reply time for callbacks rxrpc: Fix an insufficiently large sglist in rxkad_verify_packet_2() netfilter: nf_conntrack_irc: Fix forged IP logic netfilter: br_netfilter: Drop dst references before setting. RDMA/hns: Fix supported page size soc: brcmstb: pm-arm: Fix refcount leak and __iomem leak bugs RDMA/cma: Fix arguments order in net device validation regulator: core: Clean up on enable failure ARM: dts: imx6qdl-kontron-samx6i: remove duplicated node smb3: missing inode locks in punch hole cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree cgroup: Optimize single thread migration scsi: lpfc: Add missing destroy_workqueue() in error path scsi: mpt3sas: Fix use-after-free warning nvmet: fix a use-after-free debugfs: add debugfs_lookup_and_remove() kprobes: Prohibit probes in gate area ALSA: usb-audio: Fix an out-of-bounds bug in __snd_usb_parse_audio_interface() ALSA: aloop: Fix random zeros in capture data when using jiffies timer ALSA: emu10k1: Fix out of bounds access in snd_emu10k1_pcm_channel_alloc() drm/amdgpu: mmVM_L2_CNTL3 register not initialized correctly fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init() arm64: cacheinfo: Fix incorrect assignment of signed error value to unsigned fw_level parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources() drm/radeon: add a force flush to delay work when radeon drm/amdgpu: Check num_gfx_rings for gfx v9_0 rb setup. drm/gem: Fix GEM handle release errors scsi: megaraid_sas: Fix double kfree() USB: serial: ch341: fix disabled rx timer on older devices USB: serial: ch341: fix lost character on LCR updates usb: dwc3: disable USB core PHY management usb: dwc3: fix PHY disable sequence btrfs: harden identification of a stale device drm/i915/glk: ECS Liva Q2 needs GLK HDMI port timing quirk ALSA: seq: Fix data-race at module auto-loading ALSA: seq: oss: Fix data-race for max_midi_devs access net: mac802154: Fix a condition in the receive path ip: fix triggering of 'icmp redirect' wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected driver core: Don't probe devices after bus_type.match() probe deferral usb: gadget: mass_storage: Fix cdrom data transfers on MAC-OS USB: core: Prevent nested device-reset calls s390: fix nospec table alignments s390/hugetlb: fix prepare_hugepage_range() check for 2 GB hugepages usb-storage: Add ignore-residue quirk for NXP PN7462AU USB: cdc-acm: Add Icom PMR F3400 support (0c26:0020) usb: dwc2: fix wrong order of phy_power_on and phy_init usb: typec: altmodes/displayport: correct pin assignment for UFP receptacles USB: serial: option: add support for Cinterion MV32-WA/WB RmNet mode USB: serial: option: add Quectel EM060K modem USB: serial: option: add support for OPPO R11 diag port USB: serial: cp210x: add Decagon UCA device id xhci: Add grace period after xHC start to prevent premature runtime suspend. thunderbolt: Use the actual buffer in tb_async_error() gpio: pca953x: Add mutex_lock for regcache sync in PM hwmon: (gpio-fan) Fix array out of bounds access clk: bcm: rpi: Fix error handling of raspberrypi_fw_get_rate Input: rk805-pwrkey - fix module autoloading clk: core: Fix runtime PM sequence in clk_core_unprepare() Revert "clk: core: Honor CLK_OPS_PARENT_ENABLE for clk gate ops" clk: core: Honor CLK_OPS_PARENT_ENABLE for clk gate ops drm/i915/reg: Fix spelling mistake "Unsupport" -> "Unsupported" usb: dwc3: qcom: fix use-after-free on runtime-PM wakeup binder: fix UAF of ref->proc caused by race condition USB: serial: ftdi_sio: add Omron CS1W-CIF31 device id misc: fastrpc: fix memory corruption on open misc: fastrpc: fix memory corruption on probe iio: adc: mcp3911: use correct formula for AD conversion Input: iforce - wake up after clearing IFORCE_XMIT_RUNNING flag tty: serial: lpuart: disable flow control while waiting for the transmit engine to complete vt: Clear selection before changing the font powerpc: align syscall table for ppc32 staging: rtl8712: fix use after free bugs serial: fsl_lpuart: RS485 RTS polariy is inverse net/smc: Remove redundant refcount increase Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb" tcp: annotate data-race around challenge_timestamp sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb kcm: fix strp_init() order and cleanup ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler net: sched: tbf: don't call qdisc_put() while holding tree lock Revert "xhci: turn off port power in shutdown" wifi: cfg80211: debugfs: fix return type in ht40allow_map_read() ieee802154/adf7242: defer destroy_workqueue call iio: adc: mcp3911: make use of the sign bit platform/x86: pmc_atom: Fix SLP_TYPx bitfield mask drm/msm/dsi: Fix number of regulators for msm8996_dsi_cfg drm/msm/dsi: fix the inconsistent indenting net: dp83822: disable false carrier interrupt Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" fs: only do a memory barrier for the first set_buffer_uptodate() net: mvpp2: debugfs: fix memory leak when using debugfs_lookup() wifi: iwlegacy: 4965: corrected fix for potential off-by-one overflow in il4965_rs_fill_link_cmd() efi: capsule-loader: Fix use-after-free in efi_capsule_write Linux 5.4.212 net: neigh: don't call kfree_skb() under spin_lock_irqsave() net/af_packet: check len when min_header_len equals to 0 io_uring: disable polling pollfree files kprobes: don't call disarm_kprobe() for disabled kprobes lib/vdso: Mark do_hres() and do_coarse() as __always_inline lib/vdso: Let do_coarse() return 0 to simplify the callsite btrfs: tree-checker: check for overlapping extent items netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y drm/amd/display: Fix pixel clock programming s390/hypfs: avoid error message under KVM neigh: fix possible DoS due to net iface start/stop loop drm/amd/display: clear optc underflow before turn off odm clock drm/amd/display: Avoid MPC infinite loop btrfs: unify lookup return value when dir entry is missing btrfs: do not pin logs too early during renames btrfs: introduce btrfs_lookup_match_dir mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse bpf: Don't redirect packets with invalid pkt_len ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead fbdev: fb_pm2fb: Avoid potential divide by zero error HID: hidraw: fix memory leak in hidraw_release() media: pvrusb2: fix memory leak in pvr_probe udmabuf: Set the DMA mask for the udmabuf device (v2) HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report Bluetooth: L2CAP: Fix build errors in some archs kbuild: Fix include path in scripts/Makefile.modpost x86/bugs: Add "unknown" reporting for MMIO Stale Data s390/mm: do not trigger write fault when vma does not allow VM_WRITE mm: Force TLB flush for PFNMAP mappings before unlink_file_vma() scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU md: call __md_stop_writes in md_stop mm/hugetlb: fix hugetlb not supporting softdirty tracking ACPI: processor: Remove freq Qos request for all CPUs s390: fix double free of GS and RI CBs on fork() failure asm-generic: sections: refactor memory_intersects loop: Check for overflow while configuring loop x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry btrfs: check if root is readonly while setting security xattr btrfs: add info when mount fails due to stale replace target btrfs: replace: drop assert for suspended replace btrfs: fix silent failure when deleting root reference ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter net: Fix a data-race around sysctl_somaxconn. net: Fix a data-race around netdev_budget_usecs. net: Fix a data-race around netdev_budget. net: Fix a data-race around sysctl_net_busy_read. net: Fix a data-race around sysctl_net_busy_poll. net: Fix a data-race around sysctl_tstamp_allow_data. ratelimit: Fix data-races in ___ratelimit(). net: Fix data-races around netdev_tstamp_prequeue. net: Fix data-races around weight_p and dev_weight_[rt]x_bias. netfilter: nft_tunnel: restrict it to netdev family netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families netfilter: nft_payload: do not truncate csum_offset and csum_type netfilter: nft_payload: report ERANGE for too long offset and length bnxt_en: fix NQ resource accounting during vf creation on 57500 chips netfilter: ebtables: reject blobs that don't provide all entry points net: ipvtap - add __init/__exit annotations to module init/exit funcs bonding: 802.3ad: fix no transmission of LACPDUs net: moxa: get rid of asymmetry in DMA mapping/unmapping net/mlx5e: Properly disable vlan strip on non-UL reps rose: check NULL rose_loopback_neigh->loopback SUNRPC: RPC level errors should set task->tk_rpc_status af_key: Do not call xfrm_probe_algs in parallel xfrm: fix refcount leak in __xfrm_policy_check() kernel/sched: Remove dl_boosted flag comment sched/deadline: Fix priority inheritance with multiple scheduling classes sched/deadline: Fix stale throttling on de-/boosted tasks sched/deadline: Unthrottle PI boosted threads while enqueuing pinctrl: amd: Don't save/restore interrupt status and wake status bits Revert "selftests/bpf: Fix test_align verifier log patterns" Revert "selftests/bpf: Fix "dubious pointer arithmetic" test" usb: cdns3: Fix issue for clear halt endpoint kernel/sys_ni: add compat entry for fadvise64_64 parisc: Fix exception handler for fldw and fstw instructions audit: fix potential double free on error path from fsnotify_add_inode_mark Revert "USB: HCD: Fix URB giveback issue in tasklet function" Linux 5.4.211 btrfs: raid56: don't trust any cached sector in __raid56_parity_recover() btrfs: only write the sectors in the vertical stripe which has data stripes can: j1939: j1939_session_destroy(): fix memory leak of skbs can: j1939: j1939_sk_queue_activate_next_locked(): replace WARN_ON_ONCE with netdev_warn_once() tracing/probes: Have kprobes and uprobes use $COMM too MIPS: tlbex: Explicitly compare _PAGE_NO_EXEC against 0 video: fbdev: i740fb: Check the argument of i740_calc_vclk() powerpc/64: Init jump labels before parse_early_param() smb3: check xattr value length earlier f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page() ALSA: timer: Use deferred fasync helper ALSA: core: Add async signal helpers powerpc/32: Don't always pass -mcpu=powerpc to the compiler watchdog: export lockup_detector_reconfigure RISC-V: Add fast call path of crash_kexec() riscv: mmap with PROT_WRITE but no PROT_READ is invalid mips: cavium-octeon: Fix missing of_node_put() in octeon2_usb_clocks_start vfio: Clear the caps->buf to NULL after free tty: serial: Fix refcount leak bug in ucc_uart.c lib/list_debug.c: Detect uninitialized lists ext4: avoid resizing to a partial cluster size ext4: avoid remove directory when directory is corrupted drivers:md:fix a potential use-after-free bug nvmet-tcp: fix lockdep complaint on nvmet_tcp_wq flush during queue teardown dmaengine: sprd: Cleanup in .remove() after pm_runtime_get_sync() failed selftests/kprobe: Do not test for GRP/ without event failures um: add "noreboot" command line option for PANIC_TIMEOUT=-1 setups PCI/ACPI: Guard ARM64-specific mcfg_quirks cxl: Fix a memory leak in an error handling path gadgetfs: ep_io - wait until IRQ finishes scsi: lpfc: Prevent buffer overflow crashes in debugfs with malformed user input clk: qcom: ipq8074: dont disable gcc_sleep_clk_src vboxguest: Do not use devm for irq usb: renesas: Fix refcount leak bug usb: host: ohci-ppc-of: Fix refcount leak bug drm/meson: Fix overflow implicit truncation warnings irqchip/tegra: Fix overflow implicit truncation warnings usb: gadget: uvc: call uvc uvcg_warn on completed status instead of uvcg_info usb: cdns3 fix use-after-free at workaround 2 PCI: Add ACS quirk for Broadcom BCM5750x NICs drm/meson: Fix refcount bugs in meson_vpu_has_available_connectors() locking/atomic: Make test_and_*_bit() ordered on failure gcc-plugins: Undefine LATENT_ENTROPY_PLUGIN when plugin disabled for a file igb: Add lock to avoid data race fec: Fix timer capture timing in `fec_ptp_enable_pps()` i40e: Fix to stop tx_timeout recovery if GLOBR fails ice: Ignore EEXIST when setting promisc mode net: dsa: microchip: ksz9477: fix fdb_dump last invalid entry net: moxa: pass pdev instead of ndev to DMA functions net: dsa: mv88e6060: prevent crash on an unused port powerpc/pci: Fix get_phb_number() locking netfilter: nf_tables: really skip inactive sets when allocating name clk: rockchip: add sclk_mac_lbtest to rk3188_critical_clocks iavf: Fix adminq error handling nios2: add force_successful_syscall_return() nios2: restarts apply only to the first sigframe we build... nios2: fix syscall restart checks nios2: traced syscall does need to check the syscall number nios2: don't leave NULLs in sys_call_table[] nios2: page fault et.al. are *not* restartable syscalls... tee: add overflow check in register_shm_helper() dpaa2-eth: trace the allocated address instead of page struct atm: idt77252: fix use-after-free bugs caused by tst_timer xen/xenbus: fix return type in xenbus_file_read() nfp: ethtool: fix the display error of `ethtool -m DEVNAME` NTB: ntb_tool: uninitialized heap data in tool_fn_write() tools build: Switch to new openssl API for test-libcrypto tools/vm/slabinfo: use alphabetic order when two values are equal dt-bindings: arm: qcom: fix MSM8916 MTP compatibles vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout() vsock: Fix memory leak in vsock_connect() plip: avoid rcu debug splat geneve: do not use RT_TOS for IPv6 flowlabel ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool pinctrl: sunxi: Add I/O bias setting for H6 R-PIO pinctrl: qcom: msm8916: Allow CAMSS GP clocks to be muxed pinctrl: nomadik: Fix refcount leak in nmk_pinctrl_dt_subnode_to_map net: bgmac: Fix a BUG triggered by wrong bytes_compl devlink: Fix use-after-free after a failed reload SUNRPC: Reinitialise the backchannel request buffers before reuse sunrpc: fix expiry of auth creds can: mcp251x: Fix race condition on receive interrupt NFSv4/pnfs: Fix a use-after-free bug in open NFSv4.1: RECLAIM_COMPLETE must handle EACCES NFSv4: Fix races in the legacy idmapper upcall NFSv4.1: Handle NFS4ERR_DELAY replies to OP_SEQUENCE correctly NFSv4.1: Don't decrease the value of seq_nr_highest_sent Documentation: ACPI: EINJ: Fix obsolete example apparmor: Fix memleak in aa_simple_write_to_buffer() apparmor: fix reference count leak in aa_pivotroot() apparmor: fix overlapping attachment computation apparmor: fix aa_label_asxprint return check apparmor: Fix failed mount permission check error message apparmor: fix absroot causing audited secids to begin with = apparmor: fix quiet_denied for file rules can: ems_usb: fix clang's -Wunaligned-access warning tracing: Have filter accept "common_cpu" to be consistent btrfs: fix lost error handling when looking up extended ref on log replay mmc: pxamci: Fix an error handling path in pxamci_probe() mmc: pxamci: Fix another error handling path in pxamci_probe() ata: libata-eh: Add missing command name rds: add missing barrier to release_refill ALSA: info: Fix llseek return value when using callback net_sched: cls_route: disallow handle of 0 net/9p: Initialize the iounit field during fid creation Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression Revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" scsi: sg: Allow waiting for commands to complete on removed device tcp: fix over estimation in sk_forced_mem_schedule() KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast() KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq KVM: Add infrastructure and macro to mark VM as bugged btrfs: reject log replay if there is unsupported RO compat flag net_sched: cls_route: remove from list when handle is 0 iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE) firmware: arm_scpi: Ensure scpi_info is not assigned if the probe fails timekeeping: contribute wall clock to rng on time change ACPI: CPPC: Do not prevent CPPC from working in the future dm writecache: set a default MAX_WRITEBACK_JOBS dm thin: fix use-after-free crash in dm_sm_register_threshold_callback dm raid: fix address sanitizer warning in raid_status dm raid: fix address sanitizer warning in raid_resume intel_th: pci: Add Meteor Lake-P support intel_th: pci: Add Raptor Lake-S PCH support intel_th: pci: Add Raptor Lake-S CPU support ext4: correct the misjudgment in ext4_iget_extra_inode ext4: correct max_inline_xattr_value_size computing ext4: fix extent status tree race in writeback error recovery path ext4: update s_overhead_clusters in the superblock during an on-line resize ext4: fix use-after-free in ext4_xattr_set_entry ext4: make sure ext4_append() always allocates new block ext4: add EXT4_INODE_HAS_XATTR_SPACE macro in xattr.h btrfs: reset block group chunk force if we have to wait tpm: eventlog: Fix section mismatch for DEBUG_SECTION_MISMATCH kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification spmi: trace: fix stack-out-of-bound access in SPMI tracing functions x86/olpc: fix 'logical not is only applied to the left hand side' scsi: qla2xxx: Fix erroneous mailbox timeout after PCI error injection scsi: qla2xxx: Turn off multi-queue for 8G adapters scsi: qla2xxx: Fix discovery issues in FC-AL topology scsi: zfcp: Fix missing auto port scan and thus missing target ports video: fbdev: s3fb: Check the size of screen before memset_io() video: fbdev: arkfb: Check the size of screen before memset_io() video: fbdev: vt8623fb: Check the size of screen before memset_io() tools/thermal: Fix possible path truncations video: fbdev: arkfb: Fix a divide-by-zero bug in ark_set_pixclock() x86/numa: Use cpumask_available instead of hardcoded NULL check scripts/faddr2line: Fix vmlinux detection on arm64 genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO powerpc/pci: Fix PHB numbering when using opal-phbid kprobes: Forbid probing on trampoline and BPF code areas perf symbol: Fail to read phdr workaround powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address powerpc/xive: Fix refcount leak in xive_get_max_prio powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader powerpc/pci: Prefer PCI domain assignment via DT 'linux,pci-domain' and alias powerpc/32: Do not allow selection of e5500 or e6500 CPUs on PPC32 video: fbdev: sis: fix typos in SiS_GetModeID() video: fbdev: amba-clcd: Fix refcount leak bugs watchdog: armada_37xx_wdt: check the return value of devm_ioremap() in armada_37xx_wdt_probe() ASoC: audio-graph-card: Add of_node_put() in fail path fuse: Remove the control interface for virtio-fs ASoC: qcom: q6dsp: Fix an off-by-one in q6adm_alloc_copp() s390/zcore: fix race when reading from hardware system area iommu/arm-smmu: qcom_iommu: Add of_node_put() when breaking out of loop mfd: max77620: Fix refcount leak in max77620_initialise_fps mfd: t7l66xb: Drop platform disable callback kfifo: fix kfifo_to_user() return type rpmsg: qcom_smd: Fix refcount leak in qcom_smd_parse_edge iommu/exynos: Handle failed IOMMU device registration properly tty: n_gsm: fix missing corner cases in gsmld_poll() tty: n_gsm: fix DM command tty: n_gsm: fix wrong T1 retry count handling vfio/ccw: Do not change FSM state in subchannel event remoteproc: qcom: wcnss: Fix handling of IRQs tty: n_gsm: fix race condition in gsmld_write() tty: n_gsm: fix packet re-transmission without open control channel tty: n_gsm: fix non flow control frames during mux flow off profiling: fix shift too large makes kernel panic ASoC: codecs: wcd9335: move gains from SX_TLV to S8_TLV ASoC: codecs: msm8916-wcd-digital: move gains from SX_TLV to S8_TLV serial: 8250_dw: Store LSR into lsr_saved_flags in dw8250_tx_wait_empty() ASoC: mediatek: mt8173-rt5650: Fix refcount leak in mt8173_rt5650_dev_probe ASoC: codecs: da7210: add check for i2c_add_driver ASoC: mt6797-mt6351: Fix refcount leak in mt6797_mt6351_dev_probe ASoC: mediatek: mt8173: Fix refcount leak in mt8173_rt5650_rt5676_dev_probe opp: Fix error check in dev_pm_opp_attach_genpd() jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted ext4: recover csum seed of tmp_inode after migrating to extents jbd2: fix outstanding credits assert in jbd2_journal_commit_transaction() null_blk: fix ida error handling in null_add_dev() RDMA/rxe: Fix error unwind in rxe_create_qp() mm/mmap.c: fix missing call to vm_unacct_memory in mmap_region platform/olpc: Fix uninitialized data in debugfs write USB: serial: fix tty-port initialized comments PCI: tegra194: Fix link up retry sequence PCI: tegra194: Fix Root Port interrupt handling HID: alps: Declare U1_UNICORN_LEGACY support mmc: cavium-thunderx: Add of_node_put() when breaking out of loop mmc: cavium-octeon: Add of_node_put() when breaking out of loop gpio: gpiolib-of: Fix refcount bugs in of_mm_gpiochip_add_data() RDMA/hfi1: fix potential memory leak in setup_base_ctxt() RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event RDMA/hns: Fix incorrect clearing of interrupt status register usb: gadget: udc: amd5536 depends on HAS_DMA scsi: smartpqi: Fix DMA direction for RAID requests mmc: sdhci-of-at91: fix set_uhs_signaling rewriting of MC1R memstick/ms_block: Fix a memory leak memstick/ms_block: Fix some incorrect memory allocation mmc: sdhci-of-esdhc: Fix refcount leak in esdhc_signal_voltage_switch staging: rtl8192u: Fix sleep in atomic context bug in dm_fsync_timer_callback intel_th: msu: Fix vmalloced buffers intel_th: msu-sink: Potential dereference of null pointer intel_th: Fix a resource leak in an error handling path soundwire: bus_type: fix remove and shutdown support clk: qcom: camcc-sdm845: Fix topology around titan_top power domain clk: qcom: ipq8074: set BRANCH_HALT_DELAY flag for UBI clocks clk: qcom: ipq8074: fix NSS port frequency tables usb: host: xhci: use snprintf() in xhci_decode_trb() clk: qcom: clk-krait: unlock spin after mux completion driver core: fix potential deadlock in __driver_attach misc: rtsx: Fix an error handling path in rtsx_pci_probe() clk: mediatek: reset: Fix written reset bit offset usb: xhci: tegra: Fix error check usb: ohci-nxp: Fix refcount leak in ohci_hcd_nxp_probe usb: host: Fix refcount leak in ehci_hcd_ppc_of_probe fpga: altera-pr-ip: fix unsigned comparison with less than zero mtd: st_spi_fsm: Add a clk_disable_unprepare() in .probe()'s error path mtd: partitions: Fix refcount leak in parse_redboot_of mtd: sm_ftl: Fix deadlock caused by cancel_work_sync in sm_release HID: cp2112: prevent a buffer overflow in cp2112_xfer() mtd: rawnand: meson: Fix a potential double free issue mtd: maps: Fix refcount leak in ap_flash_init mtd: maps: Fix refcount leak in of_flash_probe_versatile clk: renesas: r9a06g032: Fix UART clkgrp bitsel dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock net: rose: fix netdev reference changes netdevsim: Avoid allocation warnings triggered from user space iavf: Fix max_rate limiting crypto: inside-secure - Add missing MODULE_DEVICE_TABLE for of net/mlx5e: Fix the value of MLX5E_MAX_RQ_NUM_MTTS wifi: libertas: Fix possible refcount leak in if_usb_probe() wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue wifi: wil6210: debugfs: fix uninitialized variable use in `wil_write_file_wmi()` i2c: mux-gpmux: Add of_node_put() when breaking out of loop i2c: cadence: Support PEC for SMBus block read Bluetooth: hci_intel: Add check for platform_driver_register can: pch_can: pch_can_error(): initialize errc before using it can: error: specify the values of data[5..7] of CAN error frames can: usb_8dev: do not report txerr and rxerr during bus-off can: kvaser_usb_leaf: do not report txerr and rxerr during bus-off can: kvaser_usb_hydra: do not report txerr and rxerr during bus-off can: sun4i_can: do not report txerr and rxerr during bus-off can: hi311x: do not report txerr and rxerr during bus-off can: sja1000: do not report txerr and rxerr during bus-off can: rcar_can: do not report txerr and rxerr during bus-off can: pch_can: do not report txerr and rxerr during bus-off selftests/bpf: fix a test for snprintf() overflow wifi: p54: add missing parentheses in p54_flush() wifi: p54: Fix an error handling path in p54spi_probe() wifi: wil6210: debugfs: fix info leak in wil_write_file_wmi() fs: check FMODE_LSEEK to control internal pipe splicing selftests: timers: clocksource-switch: fix passing errors from child selftests: timers: valid-adjtimex: build fix for newer toolchains libbpf: Fix the name of a reused map tcp: make retransmitted SKB fit into the send window drm/exynos/exynos7_drm_decon: free resources when clk_set_parent() failed. mediatek: mt76: mac80211: Fix missing of_node_put() in mt76_led_init() media: platform: mtk-mdp: Fix mdp_ipi_comm structure alignment crypto: hisilicon - Kunpeng916 crypto driver don't sleep when in softirq drm/msm/mdp5: Fix global state lock backoff drm: bridge: sii8620: fix possible off-by-one drm/mediatek: dpi: Only enable dpi after the bridge is enabled drm/mediatek: dpi: Remove output format of YUV drm/rockchip: Fix an error handling path rockchip_dp_probe() drm/rockchip: vop: Don't crash for invalid duplicate_state() crypto: arm64/gcm - Select AEAD for GHASH_ARM64_CE drm/vc4: dsi: Correct DSI divider calculations drm/vc4: plane: Fix margin calculations for the right/bottom edges drm/vc4: plane: Remove subpixel positioning check media: hdpvr: fix error value returns in hdpvr_read drm/mcde: Fix refcount leak in mcde_dsi_bind drm: bridge: adv7511: Add check for mipi_dsi_driver_register wifi: iwlegacy: 4965: fix potential off-by-one overflow in il4965_rs_fill_link_cmd() ath9k: fix use-after-free in ath9k_hif_usb_rx_cb media: tw686x: Register the irq at the end of probe i2c: Fix a potential use after free drm: adv7511: override i2c address of cec before accessing it drm/mediatek: Add pull-down MIPI operation in mtk_dsi_poweroff function drm/radeon: fix potential buffer overflow in ni_set_mc_special_registers() drm/mipi-dbi: align max_chunk to 2 in spi_transfer wifi: rtlwifi: fix error codes in rtl_debugfs_set_write_h2c() ath10k: do not enforce interrupt trigger type dm: return early from dm_pr_call() if DM device is suspended thermal/tools/tmon: Include pthread and time headers in tmon.h nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt() regulator: of: Fix refcount leak bug in of_get_regulation_constraints() blk-mq: don't create hctx debugfs dir until q->debugfs_dir is created erofs: avoid consecutive detection for Highmem memory arm64: dts: mt7622: fix BPI-R64 WPS button bus: hisi_lpc: fix missing platform_device_put() in hisi_lpc_acpi_probe() ARM: dts: qcom: pm8841: add required thermal-sensor-cells soc: qcom: aoss: Fix refcount leak in qmp_cooling_devices_register cpufreq: zynq: Fix refcount leak in zynq_get_revision ARM: OMAP2+: Fix refcount leak in omap3xxx_prm_late_init ARM: OMAP2+: Fix refcount leak in omapdss_init_of ARM: dts: qcom: mdm9615: add missing PMIC GPIO reg soc: fsl: guts: machine variable might be unset ARM: dts: ast2600-evb: fix board compatible ARM: dts: ast2500-evb: fix board compatible x86/pmem: Fix platform-device leak in error path ARM: bcm: Fix refcount leak in bcm_kona_smc_init meson-mx-socinfo: Fix refcount leak in meson_mx_socinfo_init ARM: findbit: fix overflowing offset spi: spi-rspi: Fix PIO fallback on RZ platforms selinux: Add boundary check in put_entry() PM: hibernate: defer device probing when resuming from hibernation ARM: shmobile: rcar-gen2: Increase refcount for new reference arm64: dts: allwinner: a64: orangepi-win: Fix LED node name arm64: dts: qcom: ipq8074: fix NAND node name ACPI: LPSS: Fix missing check in register_device_clock() ACPI: PM: save NVS memory for Lenovo G40-45 ACPI: EC: Remove duplicate ThinkPad X1 Carbon 6th entry from DMI quirks ARM: OMAP2+: display: Fix refcount leak bug spi: synquacer: Add missing clk_disable_unprepare() ARM: dts: imx6ul: fix qspi node compatible ARM: dts: imx6ul: fix lcdif node compatible ARM: dts: imx6ul: fix csi node compatible ARM: dts: imx6ul: change operating-points to uint32-matrix ARM: dts: imx6ul: add missing properties for sram wait: Fix __wait_event_hrtimeout for RT/DL tasks genirq: Don't return error on missing optional irq_request_resources() ext2: Add more validity checks for inode counts arm64: fix oops in concurrently setting insn_emulation sysctls arm64: Do not forget syscall when starting a new thread. x86: Handle idle=nomwait cmdline properly for x86_idle epoll: autoremove wakers even more aggressively netfilter: nf_tables: fix null deref due to zeroed list head netfilter: nf_tables: do not allow RULE_ID to refer to another chain netfilter: nf_tables: do not allow SET_ID to refer to another table arm64: dts: uniphier: Fix USB interrupts for PXs3 SoC ARM: dts: uniphier: Fix USB interrupts for PXs2 SoC USB: HCD: Fix URB giveback issue in tasklet function coresight: Clear the connection field properly MIPS: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK powerpc/powernv: Avoid crashing if rng is NULL powerpc/ptdump: Fix display of RW pages on FSL_BOOK3E powerpc/fsl-pci: Fix Class Code of PCIe Root Port PCI: Add defines for normal and subtractive PCI bridges ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() md-raid10: fix KASAN warning serial: mvebu-uart: uart2 error bits clearing fuse: limit nsec iio: light: isl29028: Fix the warning in isl29028_remove() drm/amdgpu: Check BO's requested pinning domains against its preferred_domains drm/nouveau: fix another off-by-one in nvbios_addr drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error parisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode parisc: Fix device names in /proc/iomem ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh() usbnet: Fix linkwatch use-after-free on disconnect fbcon: Fix boundary checks for fbcon=vc:n1-n2 parameters thermal: sysfs: Fix cooling_device_stats_setup() error code path fs: Add missing umask strip in vfs_tmpfile vfs: Check the truncate maximum size in inode_newsize_ok() tty: vt: initialize unicode screen buffer ALSA: hda/realtek: Add quirk for another Asus K42JZ model ALSA: hda/cirrus - support for iMac 12,1 model ALSA: hda/conexant: Add quirk for LENOVO 20149 Notebook model mm/mremap: hold the rmap lock in write mode when moving page table entries. KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GP KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0 KVM: nVMX: Snapshot pre-VM-Enter DEBUGCTL for !nested_run_pending case KVM: nVMX: Snapshot pre-VM-Enter BNDCFGS for !nested_run_pending case HID: wacom: Don't register pad_input for touch switch HID: wacom: Only report rotation for art pen add barriers to buffer_uptodate and set_buffer_uptodate wifi: mac80211_hwsim: use 32-bit skb cookie wifi: mac80211_hwsim: add back erroneously removed cast wifi: mac80211_hwsim: fix race condition in pending packet igc: Remove _I_PHY_ID checking ALSA: bcd2000: Fix a UAF bug on the error path of probing scsi: Revert "scsi: qla2xxx: Fix disk failure to rediscover" x86: link vdso and boot with -z noexecstack --no-warn-rwx-segments Makefile: link with -z noexecstack --no-warn-rwx-segments Conflicts: Documentation/devicetree/bindings Documentation/devicetree/bindings/arm/qcom.yaml Documentation/devicetree/bindings/dma/moxa,moxart-dma.txt drivers/mmc/core/sd.c drivers/net/usb/ax88179_178a.c drivers/rpmsg/qcom_glink_native.c drivers/usb/dwc3/core.c drivers/usb/typec/ucsi/ucsi.c net/core/dev.c net/wireless/scan.c Change-Id: Id1996866ef5d9b7c097c39a5bdb00db413763104 Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com> |
||
Greg Kroah-Hartman
|
3e7819945e |
This is the 5.4.216 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmM9Ql8ACgkQONu9yGCS aT6i7Q//ZlSl7y1NgrraPyYYiHdzEzNSGD6dHcjrdghC/YhLZ/K93sNiTBnwpj18 TLYzRuKDf/47Dniml0vv62xHIiQcBJ79a7Lo7kT76X4nCg09gFV444fADEjTR6MF Bc3W1Ps4ikoOH9K1xq99uoxjQWq4k55xHrZ6JW5Cq7Dsj/BdBQKuTlzVNDrwOdXD Cb+ubL41ni5qipJ9a4hwlBhhOB04ZeSThRWeyCPwRiIWHFVOAfDAqq2fVNjYu0Cx E+Oz6GMvLaXBjZh2fw1uuClBh1FbDD51wlmtGS0YLmcPqWECAbTOFP2/ozzo3Gh2 Ugm7et0zHXsCXeOCmNULWg60INFhp0+b8e44w9/j3tXofjatDQsJtG3t05TqJN6M kcWg4SKDjc0ope+HyObY5TZA8Oa+D6XCOh2RLfNA/ipJXQ1EtDDbAAYVPPY9Yyqo efsMNRKxMycT1TSc2KINhlpztaxgyz67AJ+vnDfoA42/XMNPYdMjEat+PuOwSiwy RP0+lrGm6zUPQ0wzNHk+S4iwNrpts0Oc29S4IAV1i24aKWqByQSN5vxmiy7UH4mG pbXJ4uSNiCzHtyd9zV7FlyniJCrawm+sAbbAQc99VUp46vJKEJVOVTGGtvuAR+ME iQUhIfqBgs+Eff2ZJ5+QFgPTHMzuQYVfBaigVhXsJwZqPJgHNcs= =m4xD -----END PGP SIGNATURE----- Merge 5.4.216 into android11-5.4-lts Changes in 5.4.216 uas: add no-uas quirk for Hiksemi usb_disk usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS uas: ignore UAS for Thinkplus chips net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455 clk: ingenic-tcu: Properly enable registers before accessing timers ARM: dts: integrator: Tag PCI host with device_type ntfs: fix BUG_ON in ntfs_lookup_inode_by_name() Revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()" libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205 mmc: moxart: fix 4-bit bus width and remove 8-bit bus width mm/page_alloc: fix race condition between build_all_zonelists and page allocation mm: prevent page_frag_alloc() from corrupting the memory mm/migrate_device.c: flush TLB while holding PTL mm: fix madivse_pageout mishandling on non-LRU page media: dvb_vb2: fix possible out of bound access ARM: dts: Move am33xx and am43xx mmc nodes to sdhci-omap driver ARM: dts: am33xx: Fix MMCHS0 dma properties soc: sunxi: sram: Actually claim SRAM regions soc: sunxi: sram: Prevent the driver from being unbound soc: sunxi_sram: Make use of the helper function devm_platform_ioremap_resource() soc: sunxi: sram: Fix probe function ordering issues soc: sunxi: sram: Fix debugfs info for A64 SRAM C Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time" Input: melfas_mip4 - fix return value check in mip4_probe() usbnet: Fix memory leak in usbnet_disconnect() nvme: add new line after variable declatation nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices selftests: Fix the if conditions of in test_extra_filter() clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks clk: iproc: Do not rely on node name for correct PLL setup Linux 5.4.216 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6f013eba455e18b2e39b6092a278d0af09a0c85e |
||
Minchan Kim
|
0f4634f70b |
mm: fix madivse_pageout mishandling on non-LRU page
commit 58d426a7ba92870d489686dfdb9d06b66815a2ab upstream.
MADV_PAGEOUT tries to isolate non-LRU pages and gets a warning from
isolate_lru_page below.
Fix it by checking PageLRU in advance.
------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140
Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
Fixes:
|
||
Charan Teja Reddy
|
6f65360b49 |
mm/madvise: use mmap_sem lock from passed mm_struct
Use the mmap_sem lock from passed 'mm' rather than the 'current' as it
is required because of the patchset[1]. The commit
|
||
Srinivasarao P
|
b403cd66bd |
Merge android11-5.4.86+ (75c93eb ) into msm-5.4
* refs/heads/tmp-75c93eb: Revert one chunk from |
||
Minchan Kim
|
475c63f782 |
mm/madvise: remove racy mm ownership check
Jann spotted the security hole due to race of mm ownership check. If the task is sharing the mm_struct but goes through execve() before mm_access(), it could skip process_madvise_behavior_valid check. That makes *any advice hint* to reach into the remote process. This patch removes the mm ownership check. With it, it will lose the ability that local process could give *any* advice hint with vector interface for some reason (e.g., performance). Since there is no concrete example in upstream yet, it would be better to remove the abiliity at this moment and need to review when such new advice comes up. Change-Id: I536de5167b6cf7378a06a6b7815f1dd2869a56a4 Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API") Reported-by: Jann Horn <jannh@google.com> Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-Commit: a68a0262abdaa251e12c53715f48e698a18ef402 Git-Repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Greg Kroah-Hartman
|
e772bef401 |
This is the 5.4.69 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl91u0cACgkQONu9yGCS aT7KmhAAvuW3edfAfzD/F5h4vHaa9rMRmtvp2/FwefBoE4LEi3F6p2gBrUZMA3ds DNQ8Nheafeqd63wFkfE//TXYR0rYTxTxa0jTrhtuJCUZ4+anRyG00fEbHPOxvMnJ aPwQQVNOfCaUAvRbFdQ4RbuIm5chhX8Bml0ZtqvsAAFJ9XkCh1UPF0VHtSrS7PRL lRMBlamLgZqU72naaJaFY2nMp+pvMFPZrzkR7tpv0Z1bqxuJp6L2n/EmcHpmTOJy Ze+Wvt1wKk8Ep5Vql5ekXt5lEiInjacwsJZXbb5HfHO++Y+1b+ABt1kSjJx+R3/q 2Qdztq+9Eoj0N1A4gXdVFoZHqKihhbD49k8YqX4qO5ujTzqgnNyHGSEXyIKvaU6z b3b12IvjbcMhM1zm3qvFfrVbbQI3kJf66zSi9NAwsZHlsvxRzslALR8I7mila4r5 fVOyfGoZxFs44FNW9JG7I85/isAxgg0ogYraMZbk8gmhTtb1ZaN+r7kJeXuTpzOg UBAIDYPclMyZeny6tn1/qFuzNGYQQ0R9kxFcTC21Cf2zNLWHNfwCL1vE3Ob+ROIS IHcsce6IqWQKGlD8UPjkZiXTLfqCAVi51PsGTVrnidXfa1IBOuvDsVqlghPsjHSD 30N4VB++9Gbw7LFEP4e33cOZLBLjDEdYd4VuoQFYywDZ3cy6xXo= =OoZD -----END PGP SIGNATURE----- Merge 5.4.69 into android11-5.4-lts Changes in 5.4.69 kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec() scsi: lpfc: Fix pt2pt discovery on SLI3 HBAs scsi: mpt3sas: Free diag buffer without any status check selinux: allow labeling before policy is loaded media: mc-device.c: fix memleak in media_device_register_entity drm/amd/display: Do not double-buffer DTO adjustments drm/amdkfd: Fix race in gfx10 context restore handler dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling) scsi: qla2xxx: Add error handling for PLOGI ELS passthrough ath10k: fix array out-of-bounds access ath10k: fix memory leak for tpc_stats_final PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes mm: fix double page fault on arm64 if PTE_AF is cleared scsi: aacraid: fix illegal IO beyond last LBA m68k: q40: Fix info-leak in rtc_ioctl xfs: fix inode fork extent count overflow gma/gma500: fix a memory disclosure bug due to uninitialized bytes ASoC: kirkwood: fix IRQ error handling soundwire: intel/cadence: fix startup sequence media: smiapp: Fix error handling at NVM reading drm/amd/display: Free gamma after calculating legacy transfer function xfs: properly serialise fallocate against AIO+DIO leds: mlxreg: Fix possible buffer overflow dm table: do not allow request-based DM to stack on partitions PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out scsi: fnic: fix use after free scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce powerpc/64s: Always disable branch profiling for prom_init.o net: silence data-races on sk_backlog.tail dax: Fix alloc_dax_region() compile warning iomap: Fix overflow in iomap_page_mkwrite f2fs: avoid kernel panic on corruption test clk/ti/adpll: allocate room for terminating null drm/amdgpu/powerplay: fix AVFS handling with custom powerplay table ice: Fix to change Rx/Tx ring descriptor size via ethtool with DCBx mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup() mfd: mfd-core: Protect against NULL call-back function pointer drm/amdgpu/powerplay/smu7: fix AVFS handling with custom powerplay table tpm_crb: fix fTPM on AMD Zen+ CPUs tracing: Verify if trace array exists before destroying it. tracing: Adding NULL checks for trace_array descriptor pointer bcache: fix a lost wake-up problem caused by mca_cannibalize_lock dmaengine: mediatek: hsdma_probe: fixed a memory leak when devm_request_irq fails x86/kdump: Always reserve the low 1M when the crashkernel option is specified RDMA/qedr: Fix potential use after free RDMA/i40iw: Fix potential use after free PCI: Avoid double hpmemsize MMIO window assignment fix dget_parent() fastpath race xfs: fix attr leaf header freemap.size underflow RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()' ubi: Fix producing anchor PEBs mmc: core: Fix size overflow for mmc partitions gfs2: clean up iopen glock mess in gfs2_create_inode scsi: pm80xx: Cleanup command when a reset times out mt76: do not use devm API for led classdev mt76: add missing locking around ampdu action debugfs: Fix !DEBUG_FS debugfs_create_automount SUNRPC: Capture completion of all RPC tasks CIFS: Use common error handling code in smb2_ioctl_query_info() CIFS: Properly process SMB3 lease breaks f2fs: stop GC when the victim becomes fully valid ASoC: max98090: remove msleep in PLL unlocked workaround xtensa: fix system_call interaction with ptrace s390: avoid misusing CALL_ON_STACK for task stack setup xfs: fix realtime file data space leak drm/amdgpu: fix calltrace during kmd unload(v3) arm64: insn: consistently handle exit text selftests/bpf: De-flake test_tcpbpf kernel/notifier.c: intercept duplicate registrations to avoid infinite loops kernel/sys.c: avoid copying possible padding bytes in copy_to_user KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy() module: Remove accidental change of module_enable_x() xfs: fix log reservation overflows when allocating large rt extents ALSA: hda: enable regmap internal locking tipc: fix link overflow issue at socket shutdown vcc_seq_next should increase position index neigh_stat_seq_next() should increase position index rt_cpu_seq_next should increase position index ipv6_route_seq_next should increase position index drm/mcde: Handle pending vblank while disabling display seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier drm/scheduler: Avoid accessing freed bad job. media: ti-vpe: cal: Restrict DMA to avoid memory corruption opp: Replace list_kref with a local counter scsi: qla2xxx: Fix stuck session in GNL scsi: lpfc: Fix incomplete NVME discovery when target sctp: move trace_sctp_probe_path into sctp_outq_sack ACPI: EC: Reference count query handlers under lock scsi: ufs: Make ufshcd_add_command_trace() easier to read scsi: ufs: Fix a race condition in the tracing code drm/amd/display: Initialize DSC PPS variables to 0 i2c: tegra: Prevent interrupt triggering after transfer timeout btrfs: tree-checker: Check leaf chunk item size dmaengine: zynqmp_dma: fix burst length configuration s390/cpum_sf: Use kzalloc and minor changes nfsd: Fix a soft lockup race in nfsd_file_mark_find_or_create() powerpc/eeh: Only dump stack once if an MMIO loop is detected Bluetooth: btrtl: Use kvmalloc for FW allocations tracing: Set kernel_stack's caller size properly ARM: 8948/1: Prevent OOB access in stacktrace ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter ceph: ensure we have a new cap before continuing in fill_inode selftests/ftrace: fix glob selftest tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility Bluetooth: Fix refcount use-after-free issue mm/swapfile.c: swap_next should increase position index mm: pagewalk: fix termination condition in walk_pte_range() Bluetooth: prefetch channel before killing sock KVM: fix overflow of zero page refcount with ksm running ALSA: hda: Clear RIRB status before reading WP skbuff: fix a data race in skb_queue_len() nfsd: Fix a perf warning drm/amd/display: fix workaround for incorrect double buffer register for DLG ADL and TTU audit: CONFIG_CHANGE don't log internal bookkeeping as an event selinux: sel_avc_get_stat_idx should increase position index scsi: lpfc: Fix RQ buffer leakage when no IOCBs available scsi: lpfc: Fix release of hwq to clear the eq relationship scsi: lpfc: Fix coverity errors in fmdi attribute handling drm/omap: fix possible object reference leak locking/lockdep: Decrement IRQ context counters when removing lock chain clk: stratix10: use do_div() for 64-bit calculation crypto: chelsio - This fixes the kernel panic which occurs during a libkcapi test mt76: clear skb pointers from rx aggregation reorder buffer during cleanup mt76: fix handling full tx queues in mt76_dma_tx_queue_skb_raw ALSA: usb-audio: Don't create a mixer element with bogus volume range perf test: Fix test trace+probe_vfs_getname.sh on s390 RDMA/rxe: Fix configuration of atomic queue pair attributes KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow dmaengine: stm32-mdma: use vchan_terminate_vdesc() in .terminate_all media: staging/imx: Missing assignment in imx_media_capture_device_register() x86/pkeys: Add check for pkey "overflow" bpf: Remove recursion prevention from rcu free callback dmaengine: stm32-dma: use vchan_terminate_vdesc() in .terminate_all dmaengine: tegra-apb: Prevent race conditions on channel's freeing soundwire: bus: disable pm_runtime in sdw_slave_delete drm/amd/display: dal_ddc_i2c_payloads_create can fail causing panic drm/omap: dss: Cleanup DSS ports on initialisation failure iavf: use tc_cls_can_offload_and_chain0() instead of chain check firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp random: fix data races at timer_rand_state bus: hisi_lpc: Fixup IO ports addresses to avoid use-after-free in host removal ASoC: SOF: ipc: check ipc return value before data copy media: go7007: Fix URB type for interrupt handling Bluetooth: guard against controllers sending zero'd events timekeeping: Prevent 32bit truncation in scale64_check_overflow() powerpc/book3s64: Fix error handling in mm_iommu_do_alloc() drm/amd/display: fix image corruption with ODM 2:1 DSC 2 slice ext4: fix a data race at inode->i_disksize perf jevents: Fix leak of mapfile memory mm: avoid data corruption on CoW fault into PFN-mapped VMA drm/amdgpu: increase atombios cmd timeout ARM: OMAP2+: Handle errors for cpu_pm drm/amd/display: Stop if retimer is not available clk: imx: Fix division by zero warning on pfdv2 cpu-topology: Fix the potential data corruption s390/irq: replace setup_irq() by request_irq() perf cs-etm: Swap packets for instruction samples perf cs-etm: Correct synthesizing instruction samples ath10k: use kzalloc to read for ath10k_sdio_hif_diag_read scsi: aacraid: Disabling TM path and only processing IOP reset Bluetooth: L2CAP: handle l2cap config request during open state media: tda10071: fix unsigned sign extension overflow tty: sifive: Finish transmission before changing the clock xfs: don't ever return a stale pointer from __xfs_dir3_free_read xfs: mark dir corrupt when lookup-by-hash fails ext4: mark block bitmap corrupted when found instead of BUGON tpm: ibmvtpm: Wait for buffer to be set before proceeding rtc: sa1100: fix possible race condition rtc: ds1374: fix possible race condition nfsd: Don't add locks to closed or closing open stateids RDMA/cm: Remove a race freeing timewait_info intel_th: Disallow multi mode on devices where it's broken KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones drm/msm: fix leaks if initialization fails drm/msm/a5xx: Always set an OPP supported hardware value tracing: Use address-of operator on section symbols thermal: rcar_thermal: Handle probe error gracefully KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context perf parse-events: Fix 3 use after frees found with clang ASAN btrfs: do not init a reloc root if we aren't relocating btrfs: free the reloc_control in a consistent way r8169: improve RTL8168b FIFO overflow workaround serial: 8250_port: Don't service RX FIFO if throttled serial: 8250_omap: Fix sleeping function called from invalid context during probe serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout perf cpumap: Fix snprintf overflow check net: axienet: Convert DMA error handler to a work queue net: axienet: Propagate failure of DMA descriptor setup cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn tools: gpio-hammer: Avoid potential overflow in main exec: Add exec_update_mutex to replace cred_guard_mutex exec: Fix a deadlock in strace selftests/ptrace: add test cases for dead-locks kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve proc: Use new infrastructure to fix deadlocks in execve proc: io_accounting: Use new infrastructure to fix deadlocks in execve perf: Use new infrastructure to fix deadlocks in execve nvme-multipath: do not reset on unknown status nvme: Fix ctrl use-after-free during sysfs deletion nvme: Fix controller creation races with teardown flow brcmfmac: Fix double freeing in the fmac usb data path xfs: prohibit fs freezing when using empty transactions RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices IB/iser: Always check sig MR before putting it to the free pool scsi: hpsa: correct race condition in offload enabled SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()' svcrdma: Fix leak of transport addresses netfilter: nf_tables: silence a RCU-list warning in nft_table_lookup() PCI: Use ioremap(), not phys_to_virt() for platform ROM ubifs: ubifs_jnl_write_inode: Fix a memory leak bug ubifs: ubifs_add_orphan: Fix a memory leak bug ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor PCI: pciehp: Fix MSI interrupt race NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() drm/amdgpu/vcn2.0: stall DPG when WPTR/RPTR reset powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events. mm/kmemleak.c: use address-of operator on section symbols mm/filemap.c: clear page error before actual read mm/swapfile: fix data races in try_to_unuse() mm/vmscan.c: fix data races using kswapd_classzone_idx SUNRPC: Don't start a timer on an already queued rpc task nvmet-rdma: fix double free of rdma queue workqueue: Remove the warning in wq_worker_sleeping() drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area ALSA: hda: Skip controller resume if not needed scsi: qedi: Fix termination timeouts in session logout serial: uartps: Wait for tx_empty in console setup btrfs: fix setting last_trans for reloc roots KVM: Remove CREATE_IRQCHIP/SET_PIT2 race perf stat: Force error in fallback on :k events bdev: Reduce time holding bd_mutex in sync in blkdev_close() drivers: char: tlclk.c: Avoid data race between init and interrupt handler KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi() net: openvswitch: use u64 for meter bucket scsi: aacraid: Fix error handling paths in aac_probe_one() staging:r8188eu: avoid skb_clone for amsdu to msdu conversion sparc64: vcc: Fix error return code in vcc_probe() arm64: cpufeature: Relax checks for AArch32 support at EL[0-2] sched/fair: Eliminate bandwidth race between throttling and distribution dpaa2-eth: fix error return code in setup_dpni() dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion devlink: Fix reporter's recovery condition atm: fix a memory leak of vcc->user_back media: venus: vdec: Init registered list unconditionally perf mem2node: Avoid double free related to realloc mm/slub: fix incorrect interpretation of s->offset i2c: tegra: Restore pinmux on system resume power: supply: max17040: Correct voltage reading phy: samsung: s5pv210-usb2: Add delay after reset Bluetooth: Handle Inquiry Cancel error after Inquiry Complete USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe() KVM: x86: handle wrap around 32-bit address space tipc: fix memory leak in service subscripting tty: serial: samsung: Correct clock selection logic ALSA: hda: Fix potential race in unsol event handler drm/exynos: dsi: Remove bridge node reference in error handling path in probe function ipmi:bt-bmc: Fix error handling and status check powerpc/traps: Make unrecoverable NMIs die instead of panic svcrdma: Fix backchannel return code fuse: don't check refcount after stealing page fuse: update attr_version counter on fuse_notify_inval_inode() USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int coresight: etm4x: Fix use-after-free of per-cpu etm drvdata arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work scsi: cxlflash: Fix error return code in cxlflash_probe() arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register drm/amdkfd: fix restore worker race condition e1000: Do not perform reset in reset_task if we are already down drm/nouveau/debugfs: fix runtime pm imbalance on error drm/nouveau: fix runtime pm imbalance on error drm/nouveau/dispnv50: fix runtime pm imbalance on error printk: handle blank console arguments passed in. usb: dwc3: Increase timeout for CmdAct cleared by device controller btrfs: don't force read-only after error in drop snapshot btrfs: fix double __endio_write_update_ordered in direct I/O gpio: rcar: Fix runtime PM imbalance on error vfio/pci: fix memory leaks of eventfd ctx KVM: PPC: Book3S HV: Close race with page faults around memslot flushes perf evsel: Fix 2 memory leaks perf trace: Fix the selection for architectures to generate the errno name tables perf stat: Fix duration_time value for higher intervals perf util: Fix memory leak of prefix_if_not_in perf metricgroup: Free metric_events on error perf kcore_copy: Fix module map when there are no modules loaded PCI: tegra194: Fix runtime PM imbalance on error ASoC: img-i2s-out: Fix runtime PM imbalance on error wlcore: fix runtime pm imbalance in wl1271_tx_work wlcore: fix runtime pm imbalance in wlcore_regdomain_config mtd: rawnand: gpmi: Fix runtime PM imbalance on error mtd: rawnand: omap_elm: Fix runtime PM imbalance on error PCI: tegra: Fix runtime PM imbalance on error ceph: fix potential race in ceph_check_caps mm/swap_state: fix a data race in swapin_nr_pages mm: memcontrol: fix stat-corrupting race in charge moving rapidio: avoid data race between file operation callbacks and mport_cdev_add(). mtd: parser: cmdline: Support MTD names containing one or more colons x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline NFS: nfs_xdr_status should record the procedure name vfio/pci: Clear error and request eventfd ctx after releasing cifs: Fix double add page to memcg when cifs_readpages nvme: fix possible deadlock when I/O is blocked mac80211: skip mpath lookup also for control port tx scsi: libfc: Handling of extra kref scsi: libfc: Skip additional kref updating work event selftests/x86/syscall_nt: Clear weird flags after each test vfio/pci: fix racy on error and request eventfd ctx btrfs: qgroup: fix data leak caused by race between writeback and truncate perf tests: Fix test 68 zstd compression for s390 scsi: qla2xxx: Retry PLOGI on FC-NVMe PRLI failure ubi: fastmap: Free unused fastmap anchor peb during detach mt76: fix LED link time failure opp: Increase parsed_static_opps in _of_add_opp_table_v1() perf parse-events: Use strcmp() to compare the PMU name ALSA: hda: Always use jackpoll helper for jack update after resume ALSA: hda: Workaround for spurious wakeups on some Intel platforms net: openvswitch: use div_u64() for 64-by-32 divisions nvme: explicitly update mpath disk capacity on revalidation device_cgroup: Fix RCU list debugging warning ASoC: pcm3168a: ignore 0 Hz settings ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811 ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1 RISC-V: Take text_mutex in ftrace_init_nop() i2c: aspeed: Mask IRQ status to relevant bits s390/init: add missing __init annotations lockdep: fix order in trace_hardirqs_off_caller() EDAC/ghes: Check whether the driver is on the safe list correctly drm/amdkfd: fix a memory leak issue drm/amd/display: update nv1x stutter latencies drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices() objtool: Fix noreturn detection for ignored functions ieee802154: fix one possible memleak in ca8210_dev_com_init ieee802154/adf7242: check status of adf7242_read_reg clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init() mwifiex: Increase AES key storage size to 256 bits batman-adv: bla: fix type misuse for backbone_gw hash indexing atm: eni: fix the missed pci_disable_device() for eni_init_one() batman-adv: mcast/TT: fix wrongly dropped or rerouted packets netfilter: conntrack: nf_conncount_init is failing with IPv6 disabled mac802154: tx: fix use-after-free bpf: Fix clobbering of r2 in bpf_gen_ld_abs drm/vc4/vc4_hdmi: fill ASoC card owner net: qed: Disable aRFS for NPAR and 100G net: qede: Disable aRFS for NPAR and 100G net: qed: RDMA personality shouldn't fail VF load drm/sun4i: sun8i-csc: Secondary CSC register correction batman-adv: Add missing include for in_interrupt() nvme-tcp: fix kconfig dependency warning when !CRYPTO batman-adv: mcast: fix duplicate mcast packets in BLA backbone from LAN batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh bpf: Fix a rcu warning for bpffs map pretty-print lib80211: fix unmet direct dependendices config warning when !CRYPTO ALSA: asihpi: fix iounmap in error handler regmap: fix page selection for noinc reads regmap: fix page selection for noinc writes MIPS: Add the missing 'CPU_1074K' into __get_cpu_type() regulator: axp20x: fix LDO2/4 description KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE KVM: SVM: Add a dedicated INVD intercept routine mm: validate pmd after splitting arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback x86/ioapic: Unbreak check_timer() scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported ALSA: usb-audio: Add delay quirk for H570e USB headsets ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged ALSA: hda/realtek: Enable front panel headset LED on Lenovo ThinkStation P520 lib/string.c: implement stpcpy tracing: fix double free s390/dasd: Fix zero write for FBA devices kprobes: Fix to check probe enabled before disarm_kprobe_ftrace() kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot btrfs: fix overflow when copying corrupt csums for a message dmabuf: fix NULL pointer dereference in dma_buf_release() mm, THP, swap: fix allocating cluster for swapfile by mistake mm/gup: fix gup_fast with dynamic page table folding s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch dm: fix bio splitting and its bio completion order for regular IO kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE ata: define AC_ERR_OK ata: make qc_prep return ata_completion_errors ata: sata_mv, avoid trigerrable BUG_ON Linux 5.4.69 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I2a26b4f6fd89b641fa80e339ee72089da51a1415 |
||
Minchan Kim
|
fe932d4c9e |
mm: validate pmd after splitting
[ Upstream commit ce2684254bd4818ca3995c0d021fb62c4cf10a19 ]
syzbot reported the following KASAN splat:
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
Call Trace:
lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
walk_pmd_range mm/pagewalk.c:89 [inline]
walk_pud_range mm/pagewalk.c:160 [inline]
walk_p4d_range mm/pagewalk.c:193 [inline]
walk_pgd_range mm/pagewalk.c:229 [inline]
__walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
madvise_pageout_page_range mm/madvise.c:521 [inline]
madvise_pageout mm/madvise.c:557 [inline]
madvise_vma mm/madvise.c:946 [inline]
do_madvise+0x12d0/0x2090 mm/madvise.c:1145
__do_sys_madvise mm/madvise.c:1171 [inline]
__se_sys_madvise mm/madvise.c:1169 [inline]
__x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The backing vma was shmem.
In case of split page of file-backed THP, madvise zaps the pmd instead
of remapping of sub-pages. So we need to check pmd validity after
split.
Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
Fixes:
|
||
Greg Kroah-Hartman
|
a3775e2a89 |
This is the 5.4.64 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl9ZDSIACgkQONu9yGCS aT5GkA/+I3VF/vpyQXLEY3lCOxUPWsbsU+NTx5x6g4ujFLPzzXISvxaQT3FdyTez 73nRDbEUwGX8b1Ruylg6PoRjNAilwvYB8gS/4TVxJQ/VtIyG7uFqjDK7vuGJT5xW +Pf+PSoJjJLfQfu6TzSZwMit5K8wfTk4egESeZ7KUH3IsLvlNs4Xegkpm1pkl8nZ jb3FT5vTPd425Qld6odkfVTj0QJ8JniL8U54YTBXjw6uEMRGsjMGsy91gXNQbgjf fhhhRrpFhnDE9rJFtLEVaXUbQ2j3+mjS5lSH/2erpXO+U19yeNLElwpltnHPFrJF vDjkvlWdoQKs1+JXNzVQZF9H+omQbTcU8gcRB+s8EbSV2+bcpIdNeas00GaumJW1 l6660A74mKPN4Vii5YioD9GcsJHgKRkbgJkoxu7QnegiHGoHTfToNVgwz2bQgT34 JXbZXyhfLOTR5zpczJ3gyBHX+Va3dyHJypyRMgvgyvDW+TZS9By8iAaqXs14eGjG 8nm5dlaiZyAeburIUyi8vFZZT/5BA42b1xyUZcduKmqlMjRu9fxCHlBCwj5rjcy5 Psin0EYZcwOtA4mKzIH+w1ZB0qsPLYtLYQZaJzPUsUfzoNvYtU7pbQZEVLtUPMf4 5MbOPLjT+aki4TGQOR+et29kusapeLEfrc3SgfLwYODmDXmR3cE= =N3oJ -----END PGP SIGNATURE----- Merge 5.4.64 into android11-5.4-lts Changes in 5.4.64 HID: quirks: Always poll three more Lenovo PixArt mice drm/msm/dpu: Fix scale params in plane validation tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup drm/msm: add shutdown support for display platform_driver hwmon: (applesmc) check status earlier. nvmet: Disable keep-alive timer when kato is cleared to 0h drm/msm: enable vblank during atomic commits habanalabs: validate FW file size habanalabs: check correct vmalloc return code drm/msm/a6xx: fix gmu start on newer firmware ceph: don't allow setlease on cephfs drm/omap: fix incorrect lock state cpuidle: Fixup IRQ state nbd: restore default timeout when setting it to zero s390: don't trace preemption in percpu macros drm/amd/display: Reject overlay plane configurations in multi-display scenarios drivers: gpu: amd: Initialize amdgpu_dm_backlight_caps object to 0 in amdgpu_dm_update_backlight_caps drm/amd/display: Retry AUX write when fail occurs drm/amd/display: Fix memleak in amdgpu_dm_mode_config_init xen/xenbus: Fix granting of vmalloc'd memory fsldma: fix very broken 32-bit ppc ioread64 functionality dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling batman-adv: Avoid uninitialized chaddr when handling DHCP batman-adv: Fix own OGM check in aggregated OGMs batman-adv: bla: use netif_rx_ni when not in interrupt context dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate() rxrpc: Keep the ACK serial in a var in rxrpc_input_ack() rxrpc: Make rxrpc_kernel_get_srtt() indicate validity MIPS: mm: BMIPS5000 has inclusive physical caches MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040 netfilter: nf_tables: add NFTA_SET_USERDATA if not null netfilter: nf_tables: incorrect enum nft_list_attributes definition netfilter: nf_tables: fix destination register zeroing net: hns: Fix memleak in hns_nic_dev_probe net: systemport: Fix memleak in bcm_sysport_probe ravb: Fixed to be able to unload modules net: arc_emac: Fix memleak in arc_mdio_probe dmaengine: pl330: Fix burst length if burst size is smaller than bus width gtp: add GTPA_LINK info to msg sent to userspace net: ethernet: ti: cpsw: fix clean up of vlan mc entries for host port bnxt_en: Don't query FW when netif_running() is false. bnxt_en: Check for zero dir entries in NVRAM. bnxt_en: Fix PCI AER error recovery flow bnxt_en: Fix possible crash in bnxt_fw_reset_task(). bnxt_en: fix HWRM error when querying VF temperature xfs: fix boundary test in xfs_attr_shortform_verify bnxt: don't enable NAPI until rings are ready media: vicodec: add missing v4l2_ctrl_request_hdl_put() media: cedrus: Add missing v4l2_ctrl_request_hdl_put() selftests/bpf: Fix massive output from test_maps net: dsa: mt7530: fix advertising unsupported 1000baseT_Half netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFS nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()' nvme: fix controller instance leak cxgb4: fix thermal zone device registration perf tools: Correct SNOOPX field offset net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() fix regression in "epoll: Keep a reference on files added to the check list" net: gemini: Fix another missing clk_disable_unprepare() in probe MIPS: add missing MSACSR and upper MSA initialization xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files perf jevents: Fix suspicious code in fixregex() tg3: Fix soft lockup when tg3_reset_task() fails. x86, fakenuma: Fix invalid starting node ID iommu/vt-d: Serialize IOMMU GCMD register modifications thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430 thermal: qcom-spmi-temp-alarm: Don't suppress negative temp iommu/amd: Restore IRTE.RemapEn bit after programming IRTE net/packet: fix overflow in tpacket_rcv include/linux/log2.h: add missing () around n in roundup_pow_of_two() vfio/type1: Support faulting PFNMAP vmas vfio-pci: Fault mmaps to enable vma tracking vfio-pci: Invalidate mmaps and block MMIO access on disabled memory iommu/vt-d: Handle 36bit addressing for x86-32 tracing/kprobes, x86/ptrace: Fix regs argument order for i386 ext2: don't update mtime on COW faults xfs: don't update mtime on COW faults ARC: perf: don't bail setup if pct irq missing in device-tree btrfs: drop path before adding new uuid tree entry btrfs: allocate scrub workqueues outside of locks btrfs: set the correct lockdep class for new nodes btrfs: set the lockdep class for log tree extent buffers btrfs: tree-checker: fix the error message for transid error net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() btrfs: fix potential deadlock in the search ioctl Revert "net: dsa: microchip: set the correct number of ports" Revert "ALSA: hda: Add support for Loongson 7A1000 controller" ALSA: ca0106: fix error code handling ALSA: usb-audio: Add implicit feedback quirk for UR22C ALSA: pcm: oss: Remove superfluous WARN_ON() for mulaw sanity check ALSA: hda/hdmi: always check pin power status in i915 pin fixup ALSA: firewire-digi00x: exclude Avid Adrenaline from detection ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO ALSA; firewire-tascam: exclude Tascam FE-8 from detection ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion NT950XCJ-X716A ALSA: hda/realtek - Improved routing for Thinkpad X1 7th/8th Gen arm64: dts: mt7622: add reset node for mmc device mmc: mediatek: add optional module reset property mmc: dt-bindings: Add resets/reset-names for Mediatek MMC bindings mmc: cqhci: Add cqhci_deactivate() mmc: sdhci-pci: Fix SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers media: rc: do not access device via sysfs after rc_unregister_device() media: rc: uevent sysfs file races with rc_unregister_device() affs: fix basic permission bits to actually work block: allow for_each_bvec to support zero len bvec block: ensure bdi->io_pages is always initialized libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks blk-iocost: ioc_pd_free() shouldn't assume irq disabled dmaengine: dw-edma: Fix scatter-gather address calculation drm/amd/pm: avoid false alarm due to confusing softwareshutdowntemp setting dm writecache: handle DAX to partitions on persistent memory correctly dm mpath: fix racey management of PG initialization dm integrity: fix error reporting in bitmap mode after creation dm crypt: Initialize crypto wait structures dm cache metadata: Avoid returning cmd->bm wild pointer on error dm thin metadata: Avoid returning cmd->bm wild pointer on error dm thin metadata: Fix use-after-free in dm_bm_set_read_only mm: slub: fix conversion of freelist_corrupted() mm: madvise: fix vma user-after-free vfio/pci: Fix SR-IOV VF handling with MMIO blocking perf record: Correct the help info of option "--no-bpf-event" sdhci: tegra: Add missing TMCLK for data timeout checkpatch: fix the usage of capture group ( ... ) mm/hugetlb: fix a race between hugetlb sysctl handlers mm/khugepaged.c: fix khugepaged's request size in collapse_file cfg80211: regulatory: reject invalid hints net: usb: Fix uninit-was-stored issue in asix_read_phy_addr() Linux 5.4.64 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I34f83b15e9f9a59529f8d67a434577becf25d1a6 |
||
Yang Shi
|
f4fa8d937e |
mm: madvise: fix vma user-after-free
commit 7867fd7cc44e63c6673cd0f8fea155456d34d0de upstream.
The syzbot reported the below use-after-free:
BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
madvise_willneed mm/madvise.c:293 [inline]
madvise_vma mm/madvise.c:942 [inline]
do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
do_madvise mm/madvise.c:1169 [inline]
__do_sys_madvise mm/madvise.c:1171 [inline]
__se_sys_madvise mm/madvise.c:1169 [inline]
__x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 9992:
kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
vm_area_alloc+0x1c/0x110 kernel/fork.c:347
mmap_region+0x8e5/0x1780 mm/mmap.c:1743
do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
vm_mmap_pgoff+0x195/0x200 mm/util.c:506
ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 9992:
kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
remove_vma+0x132/0x170 mm/mmap.c:184
remove_vma_list mm/mmap.c:2613 [inline]
__do_munmap+0x743/0x1170 mm/mmap.c:2869
do_munmap mm/mmap.c:2877 [inline]
mmap_region+0x257/0x1780 mm/mmap.c:1716
do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
vm_mmap_pgoff+0x195/0x200 mm/util.c:506
ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It is because vma is accessed after releasing mmap_lock, but someone
else acquired the mmap_lock and the vma is gone.
Releasing mmap_lock after accessing vma should fix the problem.
Fixes:
|
||
qctecmdr
|
6e625d330f | Merge "mm: Fix sleeping while atomic during speculative page fault" | ||
Minchan Kim
|
a95f164e8e |
mm: support vector address ranges for process_madvise
This patch changes process_madvise interface: a) support vector address ranges in a system call b) support the vector address ranges to local process as well as external process c) remove pid but keep only pidfd in argument - [1][2] d) change type of flags with unsgined int Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma. (With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external. MADV_COLD MADV_PAGEOUT MADV_MERGEABLE MADV_UNMERGEABLE Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. [1] https://lore.kernel.org/linux-mm/20200509124817.xmrvsrq3mla6b76k@wittgenstein/ [2] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/ Link: http://lkml.kernel.org/r/20200518211350.GA50295@google.com Link: http://lkml.kernel.org/r/20200423145215.72666-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Dias <joaodias@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Git-Commit: cac63a8674fec9a9e288972475366896d706b53c Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git From: Randy Dunlap <rdunlap@infradead.org> Subject: mm-support-vector-address-ranges-for-process_madvise-fix-fix fix process_madvise prototype. [charante@codeaurora.org]: This is merged with cac63a8674fe ("mm: support vector address ranges for process_madvise" to avoid the compilation errors. Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Git-Commit: beadf725ea3f41cc23c77aefcb3139c081ff528d Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Change-Id: I8aa28294d56874f057779674337986dbbe7cf076 Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Oleksandr Natalenko
|
f7027e2612 |
mm/madvise: allow KSM hints for remote API
It all began with the fact that KSM works only on memory that is marked by madvise(). And the only way to get around that is to either: * use LD_PRELOAD; or * patch the kernel with something like UKSM or PKSM. (i skip ptrace can of worms here intentionally) To overcome this restriction, lets employ a new remote madvise API. This can be used by some small userspace helper daemon that will do auto-KSM job for us. I think of two major consumers of remote KSM hints: * hosts, that run containers, especially similar ones and especially in a trusted environment, sharing the same runtime like Node.js; * heavy applications, that can be run in multiple instances, not limited to opensource ones like Firefox, but also those that cannot be modified since they are binary-only and, maybe, statically linked. Speaking of statistics, more numbers can be found in the very first submission, that is related to this one [1]. For my current setup with two Firefox instances I get 100 to 200 MiB saved for the second instance depending on the amount of tabs. 1 FF instance with 15 tabs: $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc 410 2 FF instances, second one has 12 tabs (all the tabs are different): $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc 592 At the very moment I do not have specific numbers for containerised workload, but those should be comparable in case the containers share similar/same runtime. [1] https://lore.kernel.org/patchwork/patch/1012142/. Change-Id: I325b1ec4ee73fe03511391a687a6ecc4c5df89e7 Link: http://lkml.kernel.org/r/20200302193630.68771-8-minchan@kernel.org Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: SeongJae Park <sjpark@amazon.de> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Git-Commit: bc8bcac991aaa143015c099a0d804f23a21d0c48 Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Minchan Kim
|
306d0c3e29 |
mm/madvise: support both pid and pidfd for process_madvise
There is a demand[1] to support pid as well pidfd for process_madvise to reduce unnecessary syscall to get pidfd if the user has control of the target process (ie, they could guarantee the process is not gone or pid is not reused). This patch aims for supporting both options like waitid(2). So, the syscall is currently, int process_madvise(idtype_t idtype, id_t id, void *addr, size_t length, int advice, unsigned long flags); @which is actually idtype_t for userspace library and currently, it supports P_PID and P_PIDFD. [1] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/. Change-Id: I06249a7621685e120e548a94098e6cce8d32d38d Link: http://lkml.kernel.org/r/20200302193630.68771-6-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <christian@brauner.io> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Git-Commit: 6c7663468de1b093e8ef5c0ef1df42c890f612bf Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Minchan Kim
|
ab3b21e52e |
mm/madvise: check fatal signal pending of target process
Bail out to prevent unnecessary CPU overhead if target process has pending fatal signal during (MADV_COLD|MADV_PAGEOUT) operation. Change-Id: Ic264c3ba5b16c91b6395e5a361a3c47e4fd69256 Link: http://lkml.kernel.org/r/20200302193630.68771-4-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Git-Commit: d6d0112994a95b14016cc7b2825289999fc1b78c Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Minchan Kim
|
f4ed73112f |
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. int process_madvise(int pidfd, void *addr, size_t length, int advice, unsigned long flags); Since it could affect other process's address range, only privileged process(CAP_SYS_PTRACE) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ Conflicts: arch/alpha/kernel/syscalls/syscall.tbl arch/arm/tools/syscall.tbl arch/arm64/include/asm/unistd.h arch/arm64/include/asm/unistd32.h arch/ia64/kernel/syscalls/syscall.tbl arch/m68k/kernel/syscalls/syscall.tbl arch/microblaze/kernel/syscalls/syscall.tbl arch/mips/kernel/syscalls/syscall_n32.tbl arch/mips/kernel/syscalls/syscall_n64.tbl arch/parisc/kernel/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl arch/sh/kernel/syscalls/syscall.tbl arch/sparc/kernel/syscalls/syscall.tbl arch/x86/entry/syscalls/syscall_32.tbl arch/x86/entry/syscalls/syscall_64.tbl arch/xtensa/kernel/syscalls/syscall.tbl include/uapi/asm-generic/unistd.h Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Git-commit: 8422ccd91057d2814466d90ed05d44b359e88ba9 Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git [charante@codeaurora.org: Fixed merged conflicts] Change-Id: I187d2a764db09f0868cd11c7536d7a1ed6a54f3a Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Laurent Dufour
|
06219e6c87 |
mm: protect VMA modifications using VMA sequence count
The VMA sequence count has been introduced to allow fast detection of VMA modification when running a page fault handler without holding the mmap_sem. This patch provides protection against the VMA modification done in : - madvise() - mpol_rebind_policy() - vma_replace_policy() - change_prot_numa() - mlock(), munlock() - mprotect() - mmap_region() - collapse_huge_page() - userfaultd registering services In addition, VMA fields which will be read during the speculative fault path needs to be written using WRITE_ONCE to prevent write to be split and intermediate values to be pushed to other CPUs. Change-Id: Ic36046b7254e538b6baf7144c50ae577ee7f2074 Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Patch-mainline: linux-mm @ Tue, 17 Apr 2018 16:33:15 [vinmenon@codeaurora.org: trivial merge conflict fixes] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> [charante@codeaurora.org: trivial merge conflict fixes] Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Minchan Kim
|
f51a23e5bf |
mm/madvise: pass task and mm to do_madvise
Patch series "introduce memory hinting API for external process", v7. Now, we have MADV_PAGEOUT and MADV_COLD as madvise hinting API. With that, application could give hints to kernel what memory range are preferred to be reclaimed. However, in some platform(e.g., Android), the information required to make the hinting decision is not known to the app. Instead, it is known to a centralized userspace daemon(e.g., ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the concern, this patch introduces new syscall - process_madvise(2). Bascially, it's same with madvise(2) syscall but it has some differences. 1. It needs pidfd of target process to provide the hint 2. It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMEREABLE} at this moment. Other hints in madvise will be opened when there are explicit requests from community to prevent unexpected bugs we couldn't support. 3. Only privileged processes can do something for other process's address space. For more detail of the new API, please see "mm: introduce external memory hinting API" description in this patchset. This patch (of 7): In upcoming patches, do_madvise will be called from external process context so we shouldn't asssume "current" is always hinted process's task_struct. Furthermore, we must not access mm_struct via task->mm, but obtain it via access_mm() once (in the following patch) and only use that pointer [1], so pass it to do_madvise() as well. Note the vma->vm_mm pointers are safe, so we can use them further down the call stack. And let's pass *current* and current->mm as arguments of do_madvise so it shouldn't change existing behavior but prepare next patch to make review easy. Note: io_madvise passes NULL as target_task argument of do_madvise because it couldn't know who is target. [1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com [vbabka@suse.cz: changelog tweak] [minchan@kernel.org: use current->mm for io_uring] Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@kernel.org [akpm@linux-foundation.org: fix it for upstream changes] [akpm@linux-foundation.org: whoops] [rdunlap@infradead.org: add missing includes] Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jann Horn <jannh@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Dias <joaodias@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: <linux-man@vger.kernel.org> From: Minchan Kim <minchan@kernel.org> Subject: mm/madvise: introduce process_madvise() syscall: an external memory hinting API There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. int process_madvise(int pidfd, void *addr, size_t length, int advice, unsigned long flags); Since it could affect other process's address range, only privileged process(CAP_SYS_PTRACE) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: <linux-man@vger.kernel.org> From: Minchan Kim <minchan@kernel.org> Subject: fix process_madvise build break for arm64 0-day reported build break from process_madvise on ARM64. aarch64-linux-ld: arch/arm64/kernel/head.o: relocation R_AARCH64_ABS32 against `_kernel_offset_le_lo32' can not be used when making a shared object aarch64-linux-ld: arch/arm64/kernel/efi-entry.stub.o: relocation R_AARCH64_ABS32 against `__efistub_stext_offset' can not be used when making a shared object arch/arm64/kernel/head.o: In function `kimage_vaddr': (.idmap.text+0x0): dangerous relocation: unsupported relocation arch/arm64/kernel/head.o: In function `__primary_switch': (.idmap.text+0x378): dangerous relocation: unsupported relocation (.idmap.text+0x380): dangerous relocation: unsupported relocation >> arch/arm64/kernel/sys32.o:(.rodata+0xdb8): undefined reference to `__arm64_process_madvise' This patch should fix it. Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> From: Minchan Kim <minchan@kernel.org> Subject: mm: fix build error for mips of process_madvise kbuild test rebot reported build break of process_madvise for mips[1]. This patch should fix it. [1] https://lore.kernel.org/linux-mm/202005080716.cUcbCQ3i%25lkp@intel.com/ Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> From: Andrew Morton <akpm@linux-foundation.org> Subject: mm-introduce-external-memory-hinting-api-fix-2-fix the compat bit comes later Cc: Minchan Kim <minchan@kernel.org> From: Minchan Kim <minchan@kernel.org> Subject: mm/madvise: check fatal signal pending of target process Bail out to prevent unnecessary CPU overhead if target process has pending fatal signal during (MADV_COLD|MADV_PAGEOUT) operation. Link: http://lkml.kernel.org/r/20200302193630.68771-4-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: <linux-man@vger.kernel.org> From: Minchan Kim <minchan@kernel.org> Subject: pid: move pidfd_get_pid() to pid.c process_madvise syscall needs pidfd_get_pid function to translate pidfd to pid so this patch move the function to kernel/pid.c. Link: http://lkml.kernel.org/r/20200302193630.68771-5-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jann Horn <jannh@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: <linux-man@vger.kernel.org> From: Minchan Kim <minchan@kernel.org> Subject: mm/madvise: support both pid and pidfd for process_madvise There is a demand[1] to support pid as well pidfd for process_madvise to reduce unnecessary syscall to get pidfd if the user has control of the target process (ie, they could guarantee the process is not gone or pid is not reused). This patch aims for supporting both options like waitid(2). So, the syscall is currently, int process_madvise(idtype_t idtype, id_t id, void *addr, size_t length, int advice, unsigned long flags); @which is actually idtype_t for userspace library and currently, it supports P_PID and P_PIDFD. [1] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/ Link: http://lkml.kernel.org/r/20200302193630.68771-6-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <christian@brauner.io> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: <linux-man@vger.kernel.org> From: Oleksandr Natalenko <oleksandr@redhat.com> Subject: mm/madvise: allow KSM hints for remote API It all began with the fact that KSM works only on memory that is marked by madvise(). And the only way to get around that is to either: * use LD_PRELOAD; or * patch the kernel with something like UKSM or PKSM. (i skip ptrace can of worms here intentionally) To overcome this restriction, lets employ a new remote madvise API. This can be used by some small userspace helper daemon that will do auto-KSM job for us. I think of two major consumers of remote KSM hints: * hosts, that run containers, especially similar ones and especially in a trusted environment, sharing the same runtime like Node.js; * heavy applications, that can be run in multiple instances, not limited to opensource ones like Firefox, but also those that cannot be modified since they are binary-only and, maybe, statically linked. Speaking of statistics, more numbers can be found in the very first submission, that is related to this one [1]. For my current setup with two Firefox instances I get 100 to 200 MiB saved for the second instance depending on the amount of tabs. 1 FF instance with 15 tabs: $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc 410 2 FF instances, second one has 12 tabs (all the tabs are different): $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc 592 At the very moment I do not have specific numbers for containerised workload, but those should be comparable in case the containers share similar/same runtime. [1] https://lore.kernel.org/patchwork/patch/1012142/ Link: http://lkml.kernel.org/r/20200302193630.68771-8-minchan@kernel.org Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: SeongJae Park <sjpark@amazon.de> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <linux-man@vger.kernel.org> From: Minchan Kim <minchan@kernel.org> Subject: mm: support vector address ranges for process_madvise This patch extends a) process_madvise(2) support vector address ranges in a system call and then b) support the vector address ranges to local process as well as external process. Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma. (With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. So finally, the API is as follows, ssize_t process_madvise(idtype_t idtype, id_t id, const struct iovec *iovec, unsigned long vlen, int advice, unsigned long flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The idtype and id arguments select the target process to be advised as follows: idtype == P_PID select the process whose process ID matches id idtype == P_PIDFD select the process referred to by the PID file descriptor specified in id. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by idtype and id is external. MADV_COLD MADV_PAGEOUT MADV_MERGEABLE MADV_UNMERGEABLE Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. Link: http://lkml.kernel.org/r/20200423145215.72666-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Dias <joaodias@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> From: Minchan Kim <minchan@kernel.org> Subject: mm: support compat_sys_process_madvise This patch supports compat syscall for process_madvise Link: http://lkml.kernel.org/r/20200423195835.GA46847@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> From: Randy Dunlap <rdunlap@infradead.org> Subject: mm-support-vector-address-ranges-for-process_madvise-fix-fix fix process_madvise prototype Cc: Minchan Kim <minchan@kernel.org> From: Zheng Bin <zhengbin13@huawei.com> Subject: mm/madvise: make function 'do_process_madvise' static Fix sparse warnings: mm/madvise.c:1233:9: warning: symbol 'do_process_madvise' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20200429014030.41147-1-zhengbin13@huawei.com Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Cc: Minchan Kim <minchan@kernel.org> From: Minchan Kim <minchan@kernel.org> Subject: mm: fix s390 compat build error Nathan reported build error with sys_compat_process_madvise. This patch should fix it. Link: http://lkml.kernel.org/r/20200429012421.GA132200@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> [build] From: Andrew Morton <akpm@linux-foundation.org> Subject: mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix add compat_sys_process_madvise to mips syscall table Conflicts: fs/io_uring.c mm/madvise.c Change-Id: I89b92904043c6d7fbf9747746d20b823dbc20410 Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Git-commit: 82f576dd0298d675df9b19ac0638d79b5ca79e59 Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git [charante@codeaurora.org: Fixed merge conflicts] Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Linus Torvalds
|
4b3fb647e2 |
mm: check that mm is still valid in madvise()
IORING_OP_MADVISE can end up basically doing mprotect() on the VM of
another process, which means that it can race with our crazy core dump
handling which accesses the VM state without holding the mmap_sem
(because it incorrectly thinks that it is the final user).
This is clearly a core dumping problem, but we've never fixed it the
right way, and instead have the notion of "check that the mm is still
ok" using mmget_still_valid() after getting the mmap_sem for writing in
any situation where we're not the original VM thread.
See commit
|
||
Jens Axboe
|
4f8f769a62 |
mm: make do_madvise() available internally
This is in preparation for enabling this functionality through io_uring. Add a helper that is just exporting what sys_madvise() does, and have the system call use it. No functional changes in this patch. Change-Id: I77394d51a06aeb36e0ba733787dbd71a8e06a582 Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Git-commit: db08ca25253d56f1f76eb4b3fe32a7ac1fbab741 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
Greg Kroah-Hartman
|
2341be6d9d |
This is the 5.4.28 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl57B4gACgkQONu9yGCS aT6KWQ//ToF4D+fDl1Muf4xuT83HnXe1yA1XFlyC54ZmEaxGFnFMSAgAvitBR7HC GEczXlvYbbVJl646ynTmX/LFz2d+V0i2zGv2QKN3xfd8GtDrAq/s+Ffhneaskk1k gKkIUzOyBq0nEAq5vXbCT4LbQDYGLw8BxxvurLim/YywSy5sfnw+hKkYE7cVVSOa rTIdt5qnuL4pxD8VeCAakoU6SajoxfFqS8pX79LC1UPY+++OaVcJjyJvM4APw0Kr e2E1BaeZxxCyY57pQY2oqhjG3cCfIcmfln19JRzMCVNo9+J3MEjZI5EUqP/Zcjwz 1V5FHqDmqMGfA9cn+CexDk79bTKW5+YOyYMEG2RJjV5alWJtvv3Wj6dRPVPpBhnJ O627IuIVGMsHuiEbjziKczaTtYYU5GTpJF+7COH0Jnud0q1w3/RjaJXspnb2Yozh L9BFqc4aonD+AyW2NXTxuuvc3hnD2YSVgLectm7LF/TbM2YFJVu4tounelFsGG6I CPH2VF0Tn+yNR2iWf8igvopvYPCjv+QFGXU6kFGxQxLQFTnXHqoO2sPF4awXx7Hv XF8LrJzPwissX5BbPyhUSIl0FEcmQi6UzSN1/6fpxVq+092OGVacMWpZvwqjUOV/ 3k/OrmcYsu7i2UUbms47YHAK0PRkL2ogKgxgcSO4aNZ7MfkXkm8= =kp1m -----END PGP SIGNATURE----- Merge 5.4.28 into android-5.4 Changes in 5.4.28 locks: fix a potential use-after-free problem when wakeup a waiter locks: reinstate locks_delete_block optimization spi: spi-omap2-mcspi: Support probe deferral for DMA channels drm/mediatek: Find the cursor plane instead of hard coding it phy: ti: gmii-sel: fix set of copy-paste errors phy: ti: gmii-sel: do not fail in case of gmii ARM: dts: dra7-l4: mark timer13-16 as pwm capable spi: qup: call spi_qup_pm_resume_runtime before suspending powerpc: Include .BTF section cifs: fix potential mismatch of UNC paths cifs: add missing mount option to /proc/mounts ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes spi: pxa2xx: Add CS control clock quirk spi/zynqmp: remove entry that causes a cs glitch drm/exynos: dsi: propagate error value and silence meaningless warning drm/exynos: dsi: fix workaround for the legacy clock name drm/exynos: hdmi: don't leak enable HDMI_EN regulator if probe fails drivers/perf: fsl_imx8_ddr: Correct the CLEAR bit definition drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer altera-stapl: altera_get_note: prevent write beyond end of 'key' dm bio record: save/restore bi_end_io and bi_integrity dm integrity: use dm_bio_record and dm_bio_restore riscv: avoid the PIC offset of static percpu data in module beyond 2G limits ASoC: stm32: sai: manage rebind issue spi: spi_register_controller(): free bus id on error paths riscv: Force flat memory model with no-mmu riscv: Fix range looking for kernel image memblock drm/amdgpu: clean wptr on wb when gpu recovery drm/amd/display: Clear link settings on MST disable connector drm/amd/display: fix dcc swath size calculations on dcn1 xenbus: req->body should be updated before req->state xenbus: req->err should be updated before req->state block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group() parse-maintainers: Mark as executable binderfs: use refcount for binder control devices too Revert "drm/fbdev: Fallback to non tiled mode if all tiles not present" USB: Disable LPM on WD19's Realtek Hub usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters USB: serial: option: add ME910G1 ECM composition 0x110b usb: host: xhci-plat: add a shutdown USB: serial: pl2303: add device-id for HP LD381 usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c usb: typec: ucsi: displayport: Fix NULL pointer dereference usb: typec: ucsi: displayport: Fix a potential race during registration USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL USB: cdc-acm: fix rounding error in TIOCSSERIAL ALSA: line6: Fix endless MIDI read loop ALSA: hda/realtek - Enable headset mic of Acer X2660G with ALC662 ALSA: hda/realtek - Enable the headset of Acer N50-600 with ALC662 ALSA: seq: virmidi: Fix running status after receiving sysex ALSA: seq: oss: Fix running status after receiving sysex ALSA: pcm: oss: Avoid plugin buffer overflow ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks tty: fix compat TIOCGSERIAL leaking uninitialized memory tty: fix compat TIOCGSERIAL checking wrong function ptr iio: chemical: sps30: fix missing triggered buffer dependency iio: st_sensors: remap SMO8840 to LIS2DH12 iio: trigger: stm32-timer: disable master mode when stopping iio: accel: adxl372: Set iio_chan BE iio: magnetometer: ak8974: Fix negative raw values in sysfs iio: adc: stm32-dfsdm: fix sleep in atomic context iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode iio: light: vcnl4000: update sampling periods for vcnl4200 iio: light: vcnl4000: update sampling periods for vcnl4040 mmc: rtsx_pci: Fix support for speed-modes that relies on tuning mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2 mmc: sdhci-cadence: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN for UniPhier CIFS: fiemap: do not return EINVAL if get nothing kbuild: Disable -Wpointer-to-enum-cast staging: rtl8188eu: Add device id for MERCUSYS MW150US v2 staging: greybus: loopback_test: fix poll-mask build breakage staging/speakup: fix get_word non-space look-ahead intel_th: msu: Fix the unexpected state warning intel_th: Fix user-visible error codes intel_th: pci: Add Elkhart Lake CPU support modpost: move the namespace field in Module.symvers last rtc: max8907: add missing select REGMAP_IRQ arm64: compat: Fix syscall number of compat_clock_getres xhci: Do not open code __print_symbolic() in xhci trace events btrfs: fix log context list corruption after rename whiteout error drm/amd/amdgpu: Fix GPR read from debugfs (v2) drm/lease: fix WARNING in idr_destroy stm class: sys-t: Fix the use of time_after() memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event mm, memcg: fix corruption on 64-bit divisor in memory.high throttling mm, memcg: throttle allocators based on ancestral memory.high mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case mm: do not allow MADV_PAGEOUT for CoW pages epoll: fix possible lost wakeup on epoll_ctl() path mm: slub: be more careful about the double cmpxchg of freelist mm, slub: prevent kmalloc_node crashes and memory leaks page-flags: fix a crash at SetPageError(THP_SWAP) x86/mm: split vmalloc_sync_all() futex: Fix inode life-time issue futex: Unbreak futex hashing ALSA: hda/realtek: Fix pop noise on ALC225 arm64: smp: fix smp_send_stop() behaviour arm64: smp: fix crash_smp_send_stop() behaviour nvmet-tcp: set MSG_MORE only if we actually have more to send drm/bridge: dw-hdmi: fix AVI frame colorimetry staging: greybus: loopback_test: fix potential path truncation staging: greybus: loopback_test: fix potential path truncations Linux 5.4.28 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5d9d15d6236d8ab7374205c6ceda7efa7a9abb70 |
||
Michal Hocko
|
69f434a05f |
mm: do not allow MADV_PAGEOUT for CoW pages
commit 12e967fd8e4e6c3d275b4c69c890adc838891300 upstream.
Jann has brought up a very interesting point [1]. While shared pages
are excluded from MADV_PAGEOUT normally, CoW pages can be easily
reclaimed that way. This can lead to all sorts of hard to debug
problems. E.g. performance problems outlined by Daniel [2].
There are runtime environments where there is a substantial memory
shared among security domains via CoW memory and a easy to reclaim way
of that memory, which MADV_{COLD,PAGEOUT} offers, can lead to either
performance degradation in for the parent process which might be more
privileged or even open side channel attacks.
The feasibility of the latter is not really clear to me TBH but there is
no real reason for exposure at this stage. It seems there is no real
use case to depend on reclaiming CoW memory via madvise at this stage so
it is much easier to simply disallow it and this is what this patch
does. Put it simply MADV_{PAGEOUT,COLD} can operate only on the
exclusively owned memory which is a straightforward semantic.
[1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
[2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com
Fixes:
|
||
Greg Kroah-Hartman
|
ad5859c6ae |
Linux 5.4-rc8
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3RzgkeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGN18H/0JZbfIpy8/4Irol 0va7Aj2fBi1a5oxfqYsMKN0u3GKbN3OV9tQ+7w1eBNGvL72TGadgVTzTY+Im7A9U UjboAc7jDPCG+YhIwXFufMiIAq5jDIj6h0LDas7ALsMfsnI/RhTwgNtLTAkyI3dH YV/6ljFULwueJHCxzmrYbd1x39PScj3kCNL2pOe6On7rXMKOemY/nbbYYISxY30E GMgKApSS+li7VuSqgrKoq5Qaox26LyR2wrXB1ij4pqEJ9xgbnKRLdHuvXZnE+/5p 46EMirt+yeSkltW3d2/9MoCHaA76ESzWMMDijLx7tPgoTc3RB3/3ZLsm3rYVH+cR cRlNNSk= =0+Cg -----END PGP SIGNATURE----- Merge 5.4-rc8 into android-mainline Linux 5.4-rc8 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I1f55e5d34dc78ddb064910ce1e1b7a7b5b39aaba |
||
zhong jiang
|
8207296297 |
mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
Recently, I hit the following issue when running upstream.
kernel BUG at mm/vmscan.c:1521!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
Call Trace:
reclaim_pages+0x499/0x800 mm/vmscan.c:2188
madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
walk_pmd_range mm/pagewalk.c:53 [inline]
walk_pud_range mm/pagewalk.c:112 [inline]
walk_p4d_range mm/pagewalk.c:139 [inline]
walk_pgd_range mm/pagewalk.c:166 [inline]
__walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
walk_page_range+0x179/0x310 mm/pagewalk.c:349
madvise_pageout_page_range mm/madvise.c:506 [inline]
madvise_pageout+0x1f0/0x330 mm/madvise.c:542
madvise_vma mm/madvise.c:931 [inline]
__do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
madvise_pageout() accesses the specified range of the vma and isolates
them, then runs shrink_page_list() to reclaim its memory. But it also
isolates the unevictable pages to reclaim. Hence, we can catch the
cases in shrink_page_list().
The root cause is that we scan the page tables instead of specific LRU
list. and so we need to filter out the unevictable lru pages from our
end.
Link: http://lkml.kernel.org/r/1572616245-18946-1-git-send-email-zhongjiang@huawei.com
Fixes:
|
||
Greg Kroah-Hartman
|
94139142d9 |
Merge 5.4-rc1-prelrease into android-mainline
To make the 5.4-rc1 merge easier, merge at a prerelease point in time before the final release happens. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e |
||
Minchan Kim
|
d616d51265 |
mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
There are many common parts between MADV_COLD and MADV_PAGEOUT. This patch factor them out to save code duplication. Link: http://lkml.kernel.org/r/20190726023435.214162-6-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: kbuild test robot <lkp@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Minchan Kim
|
1a4e58cce8 |
mm: introduce MADV_PAGEOUT
When a process expects no accesses to a certain memory range for a long time, it could hint kernel that the pages can be reclaimed instantly but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that kernel reclaims *any LRU* pages instantly. The hint can help kernel in deciding which pages to evict proactively. A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit intentionally because it's automatically bounded by PMD size. If PMD size(e.g., 256) makes some trouble, we could fix it later by limit it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/ [minchan@kernel.org: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Minchan Kim
|
9c276cc65a |
mm: introduce MADV_COLD
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7. - Background The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster so we are trying to make hot start more likely than cold start. To increase hot start, Android userspace manages the order that apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd(low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements but those performance improvements will vary based on the memory requirements of individual workloads. - Problem Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidate for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed(we measured performance swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x times faster even though we use zram which is much faster than real storage) so kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap. - Approach The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for platform to use much information to optimize memory efficiency. To achieve the goal, the patchset introduce two new options for madvise. One is MADV_COLD which will deactivate activated pages and the other is MADV_PAGEOUT which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in a way that it hints the kernel that memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in a way that it hints the kernel that memory region is not currently needed and should be reclaimed when memory pressure rises. This patch (of 5): When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
057d338910 |
mm: untag user pointers passed to memory syscalls
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. This patch allows tagged pointers to be passed to the following memory syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect, mremap, msync, munlock, move_pages. The mmap and mremap syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma. Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
f3bc0dba31 |
mm/madvise: reduce code duplication in error handling paths
madvise_behavior() converts -ENOMEM to -EAGAIN in several places using identical code. Move that code to a common error handling path. No functional changes. Link: http://lkml.kernel.org/r/1564640896-1210-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Greg Kroah-Hartman
|
896be8f44d |
Merge 5.4-rc1-prereleae into android-mainline
To make the 5.4-rc1 merge easier, merge at a prerelease point in time before the final release happens. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I29b683c837ed1a3324644dbf9bf863f30740cd0b |
||
Linus Torvalds
|
84da111de0 |
hmm related patches for 5.4
This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl1/nnkACgkQOG33FX4g mxqaRg//c6FqowV1pQlLutvAOAgMdpzfZ9eaaDKngy9RVQxz+k/MmJrdRH/p/mMA Pq93A1XfwtraGKErHegFXGEDk4XhOustVAVFwvjyXO41dTUdoFVUkti6ftbrl/rS 6CT+X90jlvrwdRY7QBeuo7lxx7z8Qkqbk1O1kc1IOracjKfNJS+y6LTamy6weM3g tIMHI65PkxpRzN36DV9uCN5dMwFzJ73DWHp1b0acnDIigkl6u5zp6orAJVWRjyQX nmEd3/IOvdxaubAoAvboNS5CyVb4yS9xshWWMbH6AulKJv3Glca1Aa7QuSpBoN8v wy4c9+umzqRgzgUJUe1xwN9P49oBNhJpgBSu8MUlgBA4IOc3rDl/Tw0b5KCFVfkH yHkp8n6MP8VsRrzXTC6Kx0vdjIkAO8SUeylVJczAcVSyHIo6/JUJCVDeFLSTVymh EGWJ7zX2iRhUbssJ6/izQTTQyCH3YIyZ5QtqByWuX2U7ZrfkqS3/EnBW1Q+j+gPF Z2yW8iT6k0iENw6s8psE9czexuywa/Lttz94IyNlOQ8rJTiQqB9wLaAvg9hvUk7a kuspL+JGIZkrL3ouCeO/VA6xnaP+Q7nR8geWBRb8zKGHmtWrb5Gwmt6t+vTnCC2l olIDebrnnxwfBQhEJ5219W+M1pBpjiTpqK/UdBd92A4+sOOhOD0= =FRGg -----END PGP SIGNATURE----- Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ... |
||
Greg Kroah-Hartman
|
bfa0399bc8 |
Merge Linus's 5.4-rc1-prerelease branch into android-mainline
This merges Linus's tree as of commit
|
||
Christoph Hellwig
|
7b86ac3371 |
pagewalk: separate function pointers from iterator data
The mm_walk structure currently mixed data and code. Split out the operations vectors into a new mm_walk_ops structure, and while we are changing the API also declare the mm_walk structure inside the walk_page_range and walk_page_vma functions. Based on patch from Linus Torvalds. Link: https://lore.kernel.org/r/20190828141955.22210-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> |
||
Christoph Hellwig
|
a520110e4a |
mm: split out a new pagewalk.h header from mm.h
Add a new header for the two handful of users of the walk_page_range / walk_page_vma interface instead of polluting all users of mm.h with it. Link: https://lore.kernel.org/r/20190828141955.22210-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> |
||
Jan Kara
|
692fe62433 |
mm: Handle MADV_WILLNEED through vfs_fadvise()
Currently handling of MADV_WILLNEED hint calls directly into readahead code. Handle it by calling vfs_fadvise() instead so that filesystem can use its ->fadvise() callback to acquire necessary locks or otherwise prepare for the request. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Boaz Harrosh <boazh@netapp.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
||
Greg Kroah-Hartman
|
37766c2946 |
Linus 5.3-rc1
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0006weHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGaDUIAJ4oTyVWpMRZkfG6 vVY8qVMU3zlzEqRiyLYjkXoe/mGpuU/UVTyyStllxZ+Gg9da0mGwlugScKriPJof 4KRUDDTGX5DrfEOo+0brKvM+PYh9uGViPgKXzyv7i6BrnX2z3JdBR4bKNuEYlAJ9 N93Qg7v05SBHIq2Gfp3klrdWbsTTW2EaDTLbcgifXLnfKyFr47kwsmXAHPlTFP0p dYsZHHmf14Y9n1+ToZeVINgjQFr6mFn6ygY/PqTnd6vCgEEfP9eENJ4BZCtN1ZL/ V0BO9MyJ5iZV0AfwSEKydk+kDEvO16TG/nyDrECVuur7AXsBx18ZplVc787f6GK+ dyCQJ3U= =XLAF -----END PGP SIGNATURE----- Merge 5.3.0-rc1 into android-mainline Linus 5.3-rc1 release Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic171e37d4c21ffa495240c5538852bbb5a9dcce8 |
||
Christoph Hellwig
|
25b2995a35 |
mm: remove MEMORY_DEVICE_PUBLIC support
The code hasn't been used since it was added to the tree, and doesn't appear to actually be usable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> |
||
Greg Kroah-Hartman
|
1226c72a32 |
Linux 5.2-rc1
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlzh3PgeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGnhoH/jkl4X66KuOXvGXa 9pgFzyEa3Mhqs0j+AaJYmRyoRty1CbAfMLEWgr0JbZy56zm0PhtXOxcu56/tfdtw f5j8OJLDvld10imHXxUrom9zc546Ff90VTOvWmsYznszTz0rV5HLmKgVIIc7ZN8W 6hshDOy/rviUcPAVrLKdZffzgQZlASNS7To7IBE9okT4QHEtER7dgzM/Z0VAg9R1 guCPaN8tje4vq2Lv7+J5T9LOF1RObCbKXNADwXY1rMRK5ao3xS93eDnJ8Vn08utI UECIVfnYsA6pGl6v1ErCl9izx9MoTU3Crle7BRzVbrw7furvB2lJ1R4RGwqRbvcB HovhmHI= =TMir -----END PGP SIGNATURE----- Merge 5.2-rc1 into android-mainline Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Jérôme Glisse
|
7269f99993 |
mm/mmu_notifier: use correct mmu_notifier events for each invalidation
This updates each existing invalidation to use the correct mmu notifier event that represent what is happening to the CPU page table. See the patch which introduced the events to see the rational behind this. Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Jérôme Glisse
|
6f4f13e8d9 |
mm/mmu_notifier: contextual information for event triggering invalidation
CPU page table update can happens for many reasons, not only as a result of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as a result of kernel activities (memory compression, reclaim, migration, ...). Users of mmu notifier API track changes to the CPU page table and take specific action for them. While current API only provide range of virtual address affected by the change, not why the changes is happening. This patchset do the initial mechanical convertion of all the places that calls mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP event as well as the vma if it is know (most invalidation happens against a given vma). Passing down the vma allows the users of mmu notifier to inspect the new vma page protection. The MMU_NOTIFY_UNMAP is always the safe default as users of mmu notifier should assume that every for the range is going away when that event happens. A latter patch do convert mm call path to use a more appropriate events for each call. This is done as 2 patches so that no call site is forgotten especialy as it uses this following coccinelle patch: %<---------------------------------------------------------------------- @@ identifier I1, I2, I3, I4; @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1, +enum mmu_notifier_event event, +unsigned flags, +struct vm_area_struct *vma, struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... } @@ @@ -#define mmu_notifier_range_init(range, mm, start, end) +#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end) @@ expression E1, E3, E4; identifier I1; @@ <... mmu_notifier_range_init(E1, +MMU_NOTIFY_UNMAP, 0, I1, I1->vm_mm, E3, E4) ...> @@ expression E1, E2, E3, E4; identifier FN, VMA; @@ FN(..., struct vm_area_struct *VMA, ...) { <... mmu_notifier_range_init(E1, +MMU_NOTIFY_UNMAP, 0, VMA, E2, E3, E4) ...> } @@ expression E1, E2, E3, E4; identifier FN, VMA; @@ FN(...) { struct vm_area_struct *VMA; <... mmu_notifier_range_init(E1, +MMU_NOTIFY_UNMAP, 0, VMA, E2, E3, E4) ...> } @@ expression E1, E2, E3, E4; identifier FN; @@ FN(...) { <... mmu_notifier_range_init(E1, +MMU_NOTIFY_UNMAP, 0, NULL, E2, E3, E4) ...> } ---------------------------------------------------------------------->% Applied with: spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place spatch --sp-file mmu-notifier.spatch --dir mm --in-place Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Todd Kjos
|
0f2cb7cf80 |
Merge branch 'linux-mainline' into android-mainline-tmp
Change-Id: I4380c68c3474026a42ffa9f95c525f9a563ba7a3 |
||
Colin Cross
|
60500a4228 |
ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.
Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
|
||
Peter Zijlstra
|
ed6a79352c |
asm-generic/tlb, arch: Provide CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
Move the mmu_gather::page_size things into the generic code instead of PowerPC specific bits. No change in behavior intended. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Jérôme Glisse
|
ac46d4f3c4 |
mm/mmu_notifier: use structure for invalidate_range_start/end calls v2
To avoid having to change many call sites everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end cakks. No functional changes with this patch. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> From: Jérôme Glisse <jglisse@redhat.com> Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3 fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
dad4f140ed |
Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax
Pull XArray conversion from Matthew Wilcox: "The XArray provides an improved interface to the radix tree data structure, providing locking as part of the API, specifying GFP flags at allocation time, eliminating preloading, less re-walking the tree, more efficient iterations and not exposing RCU-protected pointers to its users. This patch set 1. Introduces the XArray implementation 2. Converts the pagecache to use it 3. Converts memremap to use it The page cache is the most complex and important user of the radix tree, so converting it was most important. Converting the memremap code removes the only other user of the multiorder code, which allows us to remove the radix tree code that supported it. I have 40+ followup patches to convert many other users of the radix tree over to the XArray, but I'd like to get this part in first. The other conversions haven't been in linux-next and aren't suitable for applying yet, but you can see them in the xarray-conv branch if you're interested" * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits) radix tree: Remove multiorder support radix tree test: Convert multiorder tests to XArray radix tree tests: Convert item_delete_rcu to XArray radix tree tests: Convert item_kill_tree to XArray radix tree tests: Move item_insert_order radix tree test suite: Remove multiorder benchmarking radix tree test suite: Remove __item_insert memremap: Convert to XArray xarray: Add range store functionality xarray: Move multiorder_check to in-kernel tests xarray: Move multiorder_shrink to kernel tests xarray: Move multiorder account test in-kernel radix tree test suite: Convert iteration test to XArray radix tree test suite: Convert tag_tagged_items to XArray radix tree: Remove radix_tree_clear_tags radix tree: Remove radix_tree_maybe_preload_order radix tree: Remove split/join code radix tree: Remove radix_tree_update_node_t page cache: Finish XArray conversion dax: Convert page fault handlers to XArray ... |
||
Daniel Black
|
d41aa52523 |
mm: madvise(MADV_DODUMP): allow hugetlbfs pages
Reproducer, assuming 2M of hugetlbfs available: Hugetlbfs mounted, size=2M and option user=testuser # mount | grep ^hugetlbfs hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan) # sysctl vm.nr_hugepages=1 vm.nr_hugepages = 1 # grep Huge /proc/meminfo AnonHugePages: 0 kB ShmemHugePages: 0 kB HugePages_Total: 1 HugePages_Free: 1 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB Hugetlb: 2048 kB Code: #include <sys/mman.h> #include <stddef.h> #define SIZE 2*1024*1024 int main() { void *ptr; ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0); madvise(ptr, SIZE, MADV_DONTDUMP); madvise(ptr, SIZE, MADV_DODUMP); } Compile and strace: mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000 madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0 madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument) hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on author testing with analysis from Florian Weimer[1]. The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination was a consequence of the large useage of VM_DONTEXPAND in device drivers. A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be marked DODUMP. A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs memory for a while and later request that madvise(MADV_DODUMP) on the same memory. We correct this omission by allowing madvice(MADV_DODUMP) on hugetlbfs pages. [1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit [2] commit |
||
Matthew Wilcox
|
3159f943aa |
xarray: Replace exceptional entries
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com> |
||
Dan Williams
|
23e7b5c2e2 |
mm, madvise_inject_error: Let memory_failure() optionally take a page reference
The madvise_inject_error() routine uses get_user_pages() to lookup the pfn and other information for injected error, but it does not release that pin. The assumption is that failed pages should be taken out of circulation. However, for dax mappings it is not possible to take pages out of circulation since they are 1:1 physically mapped as filesystem blocks, or device-dax capacity. They also typically represent persistent memory which has an error clearing capability. In preparation for adding a special handler for dax mappings, shift the responsibility of taking the page reference to memory_failure(). I.e. drop the page reference and do not specify MF_COUNT_INCREASED to memory_failure(). Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> |